Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GSoC] Add non-pandas reference implementation for CodeChanges metric #203

Merged
merged 1 commit into from Aug 6, 2019

Conversation

Polaris000
Copy link
Contributor

@Polaris000 Polaris000 commented Jul 14, 2019

This pull request adds the root class, hierarchy classes and utility classes along with the reference implementation for CodeChanges metric.

group = list(group)

timeseries[period] = len(group)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a little trouble generating a timeseries similar to the result of the resample() method in the pandas implementation.

  • For example, df.resample('M').count() performs the count aggregation operation on the groups but also returns a value of 0 for those groups with no elements. It is hard to do this with the itertools I am using.
  • df.resample() allows for easily passing required timeperiods like M or W. Though I can use itertool to groupby month like the current implementation I have done, I'll have to come up with a different way to do it, so that it is easier for a period of a week or day.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, we don't need to do all the fancy stuff that Pandas resample does. We can stick to the basics, or even forget about the timeseries stuff, since we really don't need it for showing the definition of the metric.

I suggest that we start without timeseries, and then in a separate pull request, we discuss about the best way of integrating it. If we find a reasonable way, that's it. If not, we can live without that functionality for this "branch" of the reference implementation.

In any case, for having months with zero items, you can just iterate after you have the dictionary with all months with non-zero items, filling in the gaps. You can do that at the same time you convert the data to a list (which maybe is the best representation for the resulting time serie). That is, you iterate from first month to last month, and if there is nothing in the dictonary for that month, the result is zero. If there is, the result is that one.

You can also try to do that when you build the dictionary, using itertools to iterate on the list of months, but that's a bit complex and I m not sure it's doable without coding it myself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we start without timeseries, and then in a separate pull request, we discuss about the best way of integrating it. If we find a reasonable way, that's it. If not, we can live without that functionality for this "branch" of the reference implementation.

Sure. I am removing the functionality for now.
Thanks for the other suggestions as well!

Copy link
Collaborator

@jgbarah jgbarah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the few comments I included, I find this good. Thanks.

Please, fix those little issues.

group = list(group)

timeseries[period] = len(group)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, we don't need to do all the fancy stuff that Pandas resample does. We can stick to the basics, or even forget about the timeseries stuff, since we really don't need it for showing the definition of the metric.

I suggest that we start without timeseries, and then in a separate pull request, we discuss about the best way of integrating it. If we find a reasonable way, that's it. If not, we can live without that functionality for this "branch" of the reference implementation.

In any case, for having months with zero items, you can just iterate after you have the dictionary with all months with non-zero items, filling in the gaps. You can do that at the same time you convert the data to a list (which maybe is the best representation for the resulting time serie). That is, you iterate from first month to last month, and if there is nothing in the dictonary for that month, the result is zero. If there is, the result is that one.

You can also try to do that when you build the dictionary, using itertools to iterate on the list of months, but that's a bit complex and I m not sure it's doable without coding it myself.

from itertools import groupby
import conditions
import utils
from commit_git import CommitGit
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, leave a blank line. See coding standard on import

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! Overlooked that. Made the change.

while instantiating CodeChangesGit.
"""

commit_hashes = [item['hash'] for item in self.items]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid this, you can either make commit_hashes directly a set, or make it a dictionary, and then just count the dictionary (adding an existing key to a dictionary does not add a new element)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Used a set comprehension to directly make commit_hashes a set.

This patch includes the pure python implementation
for Code Changes metric along with other hierarchy and root classes. 
 
Signed-off-by: Aniruddha Karajgi <akarajgi0@gmail.com>
@Polaris000
Copy link
Contributor Author

Please have a look @jgbarah.

Copy link
Collaborator

@jgbarah jgbarah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for all your changes!

@jgbarah jgbarah merged commit 4bff7f6 into chaoss:master Aug 6, 2019
@Polaris000 Polaris000 deleted the CodeChangesGit_pure branch August 8, 2019 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[GSoC] Create reference implementations without using pandas
2 participants