[GSoC][implementions] Provide basic structure for the implementations directory #160

Polaris000 · 2019-06-10T02:52:01Z

This patch adds the basic structure for the latest reference implementations. Implementations will be added from the next pull request.
Closes #154.

This patch adds the basic structure for reference implementations. The structure, at its most basic level, is as follows: Root Class <- Category class <- individual Metric class.

Signed-off-by: Aniruddha Karajgi <akarajgi0@gmail.com>

Polaris000 · 2019-06-10T14:18:38Z

Please have a look, @jgbarah!

This patch fixes a bug in identifying which commits are merge commits or empty commits. It also adds default arguments to the SourceCode init to accommodate the case where no exclude_list is provided.

Signed-off-by: Aniruddha Karajgi <akarajgi0@gmail.com>

jgbarah

My main comment is structural: instead of having code for issues, PRs, and commits in the same pull request, let's have all the code we need for the first metric, and nothing else. That way, we can focus on a clean implementation of the first metric, including all of its hierarchy of classes.

I find it appropriate to work on other metrics, but for now, if that's needed, it would be better done in your repo.

So, for this one, I suggest merging all three current PRs (#160, #161 and #162) into a single PR (and then closing the others), with a reimplementation of CodeChanges, including the root Metric class, Commit (if needed; I'm not sure about it for now, but do as you feel is better), and CodeChanges. In the same commit, include any other auxiliary file that may be needed (util, for example), and the README file describing the new structure.

In any case, I'm adding some specific comments to this PR that you may address in that one.

jgbarah

Please, in addition to my other review, have these comments in mind.

jgbarah · 2019-06-10T22:43:57Z

implementations/scripts/Commit.py

+        """
+        Initilizes self.df, the dataframe with one commit per row.
+
+        :param data_list: A list of dictionaries.


Specify that each item in the list is a Perceval dictionary.

jgbarah · 2019-06-10T22:44:39Z

implementations/scripts/Commit.py

+        Initilizes self.df, the dataframe with one commit per row.
+
+        :param data_list: A list of dictionaries.
+            Each element a line from the JSON file


This is not exact. Items could come from a JSON file, but also directly from Perceval.

Will make the required changes!

jgbarah · 2019-06-10T22:45:48Z

implementations/scripts/Commit.py

+
+        :param data_list: A list of dictionaries.
+            Each element a line from the JSON file
+        :param date_range: A tuple which represents the period of interest


Explain that a tuple is (start, end), with any of those having None as a possible value, and the meaning of None.
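To illustrate the convention requested here, a minimal sketch of how a (start, end) tuple with optional None values could be interpreted. The function name `in_date_range` is hypothetical, and it assumes that None means "no bound on that side" and that both bounds are inclusive; the PR itself does not confirm either choice.

```python
from datetime import datetime


def in_date_range(date, date_range=(None, None)):
    """Return True if `date` falls inside the (start, end) period.

    Assumption: None on either side means "no bound on that side",
    and both bounds are inclusive.
    """
    since, until = date_range  # unpack the (start, end) tuple
    if since is not None and date < since:
        return False
    if until is not None and date > until:
        return False
    return True


commit_date = datetime(2019, 6, 10)
# Open-ended period starting in January 2019.
print(in_date_range(commit_date, (datetime(2019, 1, 1), None)))
# Period that ends before the commit date.
print(in_date_range(commit_date, (None, datetime(2019, 1, 1))))
```

Documenting the tuple this way in the docstring would make the default `(None, None)` read as "the whole history".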

jgbarah · 2019-06-10T22:47:59Z

implementations/scripts/Commit.py

+        :param data_list: A list of dictionaries.
+            Each element a line from the JSON file
+        :param date_range: A tuple which represents the period of interest
+        :param src_code_obj: An object of SourceCode.


If I understand well, SourceCode is a class used to determine what is source code, right? If so, this will be an object of the SourceCode hierarchy. And to make the name more clear, for the class I would use IsSourceCode (maybe), for the parameter something like is_src_code. But we can discuss naming later on, I'm not completely sure about that for now.

For now, I'll make the changes you suggest. We'll make changes later on if required.
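As an illustration of the naming discussed above, a rough sketch of what an IsSourceCode hierarchy might look like. Everything beyond the names IsSourceCode, is_src_code and exclude_list is an assumption: the `check` method, the `ExcludeByPrefix` subclass, and the idea that exclude_list holds path prefixes are all hypothetical.

```python
class IsSourceCode:
    """Root of a hypothetical hierarchy that decides whether a file
    path should be counted as source code."""

    def check(self, path):
        # Default behaviour in this sketch: every file counts.
        return True


class ExcludeByPrefix(IsSourceCode):
    """Hypothetical subclass: anything under an excluded prefix is
    not considered source code."""

    def __init__(self, exclude_list=None):
        # An empty default covers the case where no exclude_list is given.
        self.exclude_list = list(exclude_list) if exclude_list else []

    def check(self, path):
        return not any(path.startswith(prefix) for prefix in self.exclude_list)


is_src_code = ExcludeByPrefix(["docs/", "tests/"])
print(is_src_code.check("implementations/scripts/Commit.py"))
print(is_src_code.check("docs/README.md"))
```

With this shape, the parameter name is_src_code reads naturally at the call site: `is_src_code.check(path)`.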

jgbarah · 2019-06-10T22:48:53Z

implementations/scripts/Commit.py

+
+        super().__init__(data_list)
+
+        clean_data_list = list()


Why "clean_data_list" and not just "items", for example?

In my opinion, it is a better name for the list of cleaned commit dictionaries. items could just as well refer to the uncleaned list of commits, especially when we move on to the non-pandas version, which would have an unclean_list as well as a clean_list.

Is there a particular reason you favor items?

jgbarah · 2019-06-10T22:49:35Z

implementations/scripts/Commit.py

+
+        clean_data_list = list()
+        self.since = date_range[0]
+        self.until = date_range[1]


(since, until) = date_range is more clear and more "Pythonic"

Oops! That shouldn't be there; I will update it.

jgbarah · 2019-06-10T22:51:16Z

implementations/scripts/Commit.py

+        self.since = date_range[0]
+        self.until = date_range[1]
+
+        for line in self.raw_df.iterrows():


So, you're doing some stuff with rows in the dataframe here, that was initialized in Metric, is that right? If so, I don't see the rationale.

Let me tell you how I would do it.

In Metric, __init__ would be like (assuming 'items' is the list of Perceval dictionaries, and just with simplified code):

    for item in items:
        flat_item = flatten_item(item)
        if flat_item:
            add(self.df, flat_item)

In the Commit (or maybe Code_Change) class, you would define flatten_item as a function that, given an item from Perceval containing a commit, produces a flat item (that is, a flat dictionary) if the conditions apply (e.g., if the date is as it should be, if it is source code, etc.), and None if not.

This way, the code for the root class would be much cleaner, with most of the details of filtering items and selecting fields to flatten left to the child classes.

Would you mind trying it that way?
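The design suggested above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: a plain list (`flat_items`) stands in for the dataframe, the input dictionaries use simplified hypothetical keys (`hash`, `date`) rather than the actual Perceval layout, and the date filter treats None as "no bound".

```python
from datetime import datetime


class Metric:
    """Root class: the loop is common, the flattening is left to subclasses."""

    def __init__(self, items, date_range=(None, None)):
        self.since, self.until = date_range
        self.flat_items = []  # stand-in for self.df in this sketch
        for item in items:
            flat_item = self.flatten_item(item)
            if flat_item:
                self.flat_items.append(flat_item)

    def flatten_item(self, item):
        # Each child class decides what to keep and which fields to extract.
        raise NotImplementedError


class Commit(Metric):
    def flatten_item(self, item):
        """Return a flat dictionary, or None if the commit is filtered out."""
        date = datetime.strptime(item["date"], "%Y-%m-%d")
        if self.since is not None and date < self.since:
            return None
        if self.until is not None and date > self.until:
            return None
        return {"hash": item["hash"], "date": date}


# Simplified stand-ins for Perceval items (hypothetical keys).
items = [
    {"hash": "a1", "date": "2019-05-30"},
    {"hash": "b2", "date": "2019-06-10"},
]
commits = Commit(items, date_range=(datetime(2019, 6, 1), None))
print([c["hash"] for c in commits.flat_items])
```

The root class never needs to know what a commit looks like; an Issue or PullRequest class would only override flatten_item.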

What I have done

Metric.py:

Gets a list of raw dictionaries from Perceval/json file

Flattens it using the flatten method in the same class

Commit.py:

Based on since, until / sourcecode, excludes commits

Picks out certain commit attributes to create a final clean df

CodeChanges.py

Computes metric / timeseries metric over the df after Commit has worked with it.

In the Commit (or maybe Code_Change) class, you would define flatten_item as a function that, given an item from Perceval containing a commit, produces a flat item (that is, a flat dictionary) if the conditions apply (e.g., if the date is as it should be, if it is source code, etc.), and None if not.

But isn't the data flattened in the Metric class?
Are you suggesting to use a child class function (flatten_item defined in Commit.py or CodeChanges.py) in the parent class (Metric.py) and do all the flattening as well as cleaning at once in Metric.py?

Can you please elaborate a little @jgbarah ? Thanks.
@aswanipranjal, if possible, could you give your views on this? Thanks :)

aswanipranjal · 2019-06-19

@Polaris000, sorry, I am a little late to the discussion.

But isn't the data flattened in the Metric class?

Correct. The implementations of the flatten method will be in the child classes (Commit, Issue).

Are you suggesting to use a child class function (flatten_item defined in Commit.py or CodeChanges.py) in the parent class (Metric.py) and do all the flattening as well as cleaning at once in Metric.py?

Correct. The current implementation of the Metric class, as merged in the repo, is:

    def __init__(self, items):
        flat_items = self._flatten_data(items)
        self.raw_df = pd.DataFrame(flat_items)

The _flatten_data function should be implemented in the Commit class, which is what you've done and is the correct way.

I think @jgbarah's concern, and mine, is that the following code in the __init__ method of the Commit class:

    self.issrccode_obj = issrccode_obj
    (self.since, self.until) = date_range

    clean_items = list()
    for line in items:
        commit = self._flatten_data(line)
        if commit is not None:
            clean_items.append(commit)

    self.df = pd.DataFrame(clean_items)

    if self.since is None:
        self.since = utils.get_date(self.df, 'since')
    if self.until is None:
        self.until = utils.get_date(self.df, 'until')

will be common to all the other data sources as well (Issue, PullRequest) and hence can be moved to the Metric class's __init__ method.

This way, we have common code to initialize the classes and specific implementations to flatten or clean the data according to the data sources.

Is it clearer now, @Polaris000?

Polaris000 · 2019-06-19

Thanks @aswanipranjal! My latest idea (#175) involves moving the population of since and until to the compute_timeseries method. This way, the __init__ method would have nothing out of place. The Metric (root) class is not implemented anyway.
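For illustration only, a sketch of what resolving since and until inside compute_timeseries could look like. It does not reproduce the code in #175: the class below works on a plain list of dates instead of a dataframe, counts commits per month as an arbitrary choice of period, and falls back to the earliest and latest dates in the data when a bound is None.

```python
from collections import Counter
from datetime import date


class CodeChanges:
    """Hypothetical, simplified stand-in for the metric class."""

    def __init__(self, commit_dates, date_range=(None, None)):
        self.dates = sorted(commit_dates)
        # __init__ only stores the bounds; it does not resolve them.
        self.since, self.until = date_range

    def compute_timeseries(self):
        """Count commits per (year, month) within the period of interest."""
        if not self.dates:
            return {}
        # Bounds are resolved here, only when a time series is requested.
        since = self.since if self.since is not None else self.dates[0]
        until = self.until if self.until is not None else self.dates[-1]
        counts = Counter(
            (d.year, d.month) for d in self.dates if since <= d <= until
        )
        return dict(sorted(counts.items()))


dates = [date(2019, 5, 30), date(2019, 6, 10), date(2019, 6, 11)]
print(CodeChanges(dates).compute_timeseries())
print(CodeChanges(dates, (date(2019, 6, 1), None)).compute_timeseries())
```

Keeping the defaults out of __init__ means the stored bounds still reflect exactly what the caller passed in.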

jgbarah · 2019-06-10T23:00:59Z

implementations/scripts/Commit.py

+        else:
+            self.until = utils.get_date(self.df, "until")
+
+    def _clean_commit(self, line):


This would be similar to the flatten_commit that I mention above.

Polaris000 · 2019-06-11T13:12:30Z

@jgbarah Thanks for the prompt review! I had a look and left a few comments where I was a little unsure. I'll combine the pull requests and create a new one. I thought they would be easier to review if separated.

Polaris000 · 2019-06-11T18:22:38Z

I have combined #160 and #161 into #162 and added commits there. Please have a look. Before that, please have a look at my comment here (just above this one). Thanks :)

Add basic structure for reference implementations

e23cd06


Polaris000 changed the title ~~Provide basic structure for the implementations directory~~ [GSoC][implementions] Provide basic structure for the implementations directory Jun 10, 2019

Polaris000 force-pushed the BasicStructure branch from 195d9d3 to 30b5ac8 Compare June 10, 2019 17:37

jgbarah requested changes Jun 10, 2019


jgbarah reviewed Jun 10, 2019


Polaris000 closed this Jun 11, 2019

Polaris000 mentioned this pull request Jun 11, 2019

[GSoC] Add reference implementation and python script for CodeChanges metric #162

Merged

Polaris000 deleted the BasicStructure branch June 20, 2019 18:05
