New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GSoC] Add non-pandas reference implementation for CodeChanges metric #203
Conversation
group = list(group) | ||
|
||
timeseries[period] = len(group) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a little trouble generating a timeseries similar to the result of the resample()
method in the pandas
implementation.
- For example,
df.resample('M').count()
performs the count aggregation operation on the groups but also returns a value of 0 for those groups with no elements. It is hard to do this with theitertools
I am using. df.resample()
allows for easily passing required timeperiods likeM
orW
. Though I can useitertool
to groupby month like the current implementation I have done, I'll have to come up with a different way to do it, so that it is easier for a period of aweek
orday
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, we don't need to do all the fancy stuff that Pandas resample does. We can stick to the basics, or even forget about the timeseries stuff, since we really don't need it for showing the definition of the metric.
I suggest that we start without timeseries, and then in a separate pull request, we discuss about the best way of integrating it. If we find a reasonable way, that's it. If not, we can live without that functionality for this "branch" of the reference implementation.
In any case, for having months with zero items, you can just iterate after you have the dictionary with all months with non-zero items, filling in the gaps. You can do that at the same time you convert the data to a list (which maybe is the best representation for the resulting time serie). That is, you iterate from first month to last month, and if there is nothing in the dictonary for that month, the result is zero. If there is, the result is that one.
You can also try to do that when you build the dictionary, using itertools to iterate on the list of months, but that's a bit complex and I m not sure it's doable without coding it myself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that we start without timeseries, and then in a separate pull request, we discuss about the best way of integrating it. If we find a reasonable way, that's it. If not, we can live without that functionality for this "branch" of the reference implementation.
Sure. I am removing the functionality for now.
Thanks for the other suggestions as well!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for the few comments I included, I find this good. Thanks.
Please, fix those little issues.
group = list(group) | ||
|
||
timeseries[period] = len(group) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, we don't need to do all the fancy stuff that Pandas resample does. We can stick to the basics, or even forget about the timeseries stuff, since we really don't need it for showing the definition of the metric.
I suggest that we start without timeseries, and then in a separate pull request, we discuss about the best way of integrating it. If we find a reasonable way, that's it. If not, we can live without that functionality for this "branch" of the reference implementation.
In any case, for having months with zero items, you can just iterate after you have the dictionary with all months with non-zero items, filling in the gaps. You can do that at the same time you convert the data to a list (which maybe is the best representation for the resulting time serie). That is, you iterate from first month to last month, and if there is nothing in the dictonary for that month, the result is zero. If there is, the result is that one.
You can also try to do that when you build the dictionary, using itertools to iterate on the list of months, but that's a bit complex and I m not sure it's doable without coding it myself.
from itertools import groupby | ||
import conditions | ||
import utils | ||
from commit_git import CommitGit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, leave a blank line. See coding standard on import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! Overlooked that. Made the change.
while instantiating CodeChangesGit. | ||
""" | ||
|
||
commit_hashes = [item['hash'] for item in self.items] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid this, you can either make commit_hashes directly a set, or make it a dictionary, and then just count the dictionary (adding an existing key to a dictionary does not add a new element)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Used a set comprehension to directly make commit_hashes
a set.
15e7b3c
to
b0c0809
Compare
This patch includes the pure python implementation for Code Changes metric along with other hierarchy and root classes. Signed-off-by: Aniruddha Karajgi <akarajgi0@gmail.com>
b0c0809
to
04bff2c
Compare
Please have a look @jgbarah. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for all your changes!
This pull request adds the root class, hierarchy classes and utility classes along with the reference implementation for CodeChanges metric.