[GSoC] Add non-pandas reference implementation for CodeChanges metric #203

Polaris000 · 2019-07-14T05:51:42Z

This pull request adds the root class, the hierarchy classes, and the utility classes, along with the reference implementation for the CodeChanges metric.

Closes [GSoC] Create reference implementations without using pandas #156

Polaris000 · 2019-07-14T18:47:25Z

implementations/code/code_changes_git.py

+            group = list(group)
+
+            timeseries[period] = len(group)
+


I'm having a little trouble generating a timeseries similar to the result of the resample() method in the pandas implementation.

For example, df.resample('M').count() performs the count aggregation on the groups, but it also returns a value of 0 for groups with no elements. This is hard to reproduce with the itertools functions I am using.

df.resample() also makes it easy to pass the required time period, such as M or W. Although I can use itertools to group by month, as the current implementation does, I'll have to come up with a different approach so that it also works easily for a period of a week or a day.
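One possible way to generalize the grouping beyond months is to derive a sortable key per period and pass it to itertools.groupby. This is only a minimal sketch: the names period_key and count_by_period are hypothetical and not part of this pull request, and the period letters merely mirror the pandas ones mentioned above. Like the groupby approach in the diff, it leaves out periods with no items.

```python
from datetime import datetime
from itertools import groupby


def period_key(when, period):
    """Map a datetime to a sortable key for period 'M', 'W' or 'D'."""
    if period == 'M':
        return (when.year, when.month)
    if period == 'W':
        iso = when.isocalendar()
        return (iso[0], iso[1])  # ISO year and ISO week number
    if period == 'D':
        return (when.year, when.month, when.day)
    raise ValueError("unsupported period: " + period)


def count_by_period(dates, period='M'):
    """Count items per period; periods without items are simply absent."""
    def keyfunc(when):
        return period_key(when, period)

    # groupby only merges adjacent items, so sort by the same key first.
    ordered = sorted(dates, key=keyfunc)
    return {key: len(list(group))
            for key, group in groupby(ordered, key=keyfunc)}
```

Calling count_by_period(dates, 'W') or count_by_period(dates, 'D') then only changes the key, not the grouping logic.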

First of all, we don't need to do all the fancy stuff that the pandas resample does. We can stick to the basics, or even forget about the timeseries stuff, since we really don't need it to show the definition of the metric.

I suggest that we start without timeseries, and then, in a separate pull request, we discuss the best way of integrating it. If we find a reasonable way, that's it. If not, we can live without that functionality for this "branch" of the reference implementation.

In any case, to get months with zero items, you can just iterate once you have the dictionary with all the months with non-zero items, filling in the gaps. You can do that at the same time you convert the data to a list (which may be the best representation for the resulting time series). That is, you iterate from the first month to the last month, and if there is nothing in the dictionary for that month, the result is zero. If there is, the result is that value.

You can also try to do that when you build the dictionary, using itertools to iterate over the list of months, but that's a bit complex and I'm not sure it's doable without coding it myself.
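The gap-filling idea described above could look roughly like the following. It is a sketch under assumptions: the sparse dictionary is assumed to be keyed by (year, month) tuples, and the helper names month_range and fill_months are hypothetical, not taken from the pull request.

```python
def month_range(first, last):
    """Yield (year, month) tuples from first to last, inclusive."""
    year, month = first
    while (year, month) <= last:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fill_months(counts):
    """Convert a sparse {(year, month): count} dict into a dense list.

    Months between the first and the last key that are missing from the
    dictionary are reported with a count of zero.
    """
    if not counts:
        return []
    first, last = min(counts), max(counts)
    return [(month, counts.get(month, 0))
            for month in month_range(first, last)]
```

For example, fill_months({(2018, 11): 2, (2019, 2): 1}) yields four entries, with zeros for December 2018 and January 2019.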

I suggest that we start without timeseries, and then, in a separate pull request, we discuss the best way of integrating it. If we find a reasonable way, that's it. If not, we can live without that functionality for this "branch" of the reference implementation.

Sure. I am removing the functionality for now.
Thanks for the other suggestions as well!

jgbarah

Except for the few comments I included, I find this good. Thanks.

Please fix those minor issues.

jgbarah · 2019-07-15T19:19:10Z

jgbarah · 2019-07-15T19:23:08Z

implementations/code/code_changes_git.py

+from itertools import groupby
+import conditions
+import utils
+from commit_git import CommitGit


Please leave a blank line here. See the coding standard on imports.

Oops! Overlooked that. Made the change.

jgbarah · 2019-07-15T19:24:45Z

implementations/code/code_changes_git.py

+            while instantiating CodeChangesGit.
+        """
+
+        commit_hashes = [item['hash'] for item in self.items]


To avoid this, you can either make commit_hashes a set directly, or make it a dictionary and then count its keys (adding an existing key to a dictionary does not add a new element).

Ok. Used a set comprehension to directly make commit_hashes a set.
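As an illustration of the set-based approach, here is a minimal sketch. The function name count_unique_hashes and the sample items are hypothetical; the only thing taken from the diff is that each item is a dictionary with a 'hash' key.

```python
def count_unique_hashes(items):
    """Count distinct commit hashes; repeated hashes are counted once."""
    # A set comprehension discards duplicates as it is built.
    commit_hashes = {item['hash'] for item in items}
    return len(commit_hashes)


# Hypothetical sample data: the hash 'a1' appears twice.
sample_items = [{'hash': 'a1'}, {'hash': 'b2'}, {'hash': 'a1'}]
unique_count = count_unique_hashes(sample_items)
```

Here unique_count is 2, whereas the length of the plain list of hashes would be 3.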

Polaris000 · 2019-07-16T11:00:45Z

Please have a look @jgbarah.

jgbarah

Thanks a lot for all your changes!

jgbarah requested changes Jul 15, 2019

Polaris000 force-pushed the CodeChangesGit_pure branch from 15e7b3c to b0c0809 Compare July 16, 2019 10:54

Add code_changes_git pure python implemenation

04bff2c

This patch includes the pure python implementation for Code Changes metric along with other hierarchy and root classes. Signed-off-by: Aniruddha Karajgi <akarajgi0@gmail.com>

Polaris000 force-pushed the CodeChangesGit_pure branch from b0c0809 to 04bff2c Compare July 16, 2019 10:59

jgbarah approved these changes Aug 6, 2019

jgbarah merged commit 4bff7f6 into chaoss:master Aug 6, 2019

Polaris000 deleted the CodeChangesGit_pure branch August 8, 2019 12:25
