GSoC Idea: Support of Source Code Related Metrics #182

valeriocos · 2019-02-02T09:17:41Z

[This issue is for addressing questions and comments related to this GSoC idea, which is one of the ideas proposed by the CHAOSS group for the 2019 edition of GSoC.]

Description

Currently, GrimoireLab can produce analytics with data extracted from more than 30 tools related to contributing to Open Source development, such as version control systems, issue trackers and forums. Despite the large set of metrics available in GrimoireLab, none of them relies on information extracted from source code, which prevents end users from benefiting from a wider spectrum of software development data.

Graal is a tool for conducting customizable and incremental analysis of source code by leveraging existing tools; it produces output that conforms to the data format GrimoireLab can process. Graal already offers analysis of code complexity, quality, dependencies, security and licensing; however, it is not currently integrated with GrimoireLab.

This idea is about adding support to GrimoireLab to produce source code related metrics using Graal.

The aims of the project are as follows:

Understanding the GrimoireLab components (Perceval, ELK, Mordred and Sigils) and the corresponding tool-chain.
Adapting ELK and Mordred to be able to execute Graal and process the data produced.
Producing analytics with Graal data and including them in Sigils.
Evaluating the implementation with projects of different sizes.

Other aims, such as enhancing Graal to support more analyses or improving existing ones, are completely within scope.

The aims will require extending GrimoireLab functionality to integrate Graal.

Difficulty: medium
Requirements: Python programming. Interest in software analytics. Willingness to understand GrimoireLab internals.
Recommended: Experience with ElasticSearch and Kibana would be convenient, but can be learned during the project.
Mentors: @jgbarah , @valeriocos

Microtasks

For becoming familiar with GrimoireLab, you can start by reading some documentation. You can find useful information at:

GrimoireLab Tutorial
Perceval: Software Project Data at Your Will
Graal: The Quest for Source Code Knowledge
The GitHub repositories hosting the tools (https://github.com/chaoss/grimoirelab-perceval, https://github.com/chaoss/grimoirelab-sirmordred, https://github.com/chaoss/grimoirelab-*, and https://github.com/Bitergia/graal)

Once you're familiar with GrimoireLab, you can have a look at the following microtasks.

Microtask 0:
Download PyCharm and get familiar with it (for instance, you can follow this tutorial).
Microtask 1:
Set up Perceval to be executed from PyCharm.
Microtask 2:
Create a Python script to execute Perceval via its Python interface using the Git and GitHub backends. Feel free to select any target repository, for instance the GitHub repository hosting Perceval.
Microtask 3:
Based on the JSON documents produced by Perceval and its source code, try to answer the following questions:
- What is the meaning of the JSON attribute 'timestamp'?
- What is the meaning of the JSON attribute 'updated_on'?
- What is the meaning of the JSON attribute 'origin'?
- What is the meaning of the JSON attribute 'category'?
- What is the meaning of the JSON attribute 'uuid'?
- Which are the common methods of the Perceval backends?
- List and explain at least 3 Git commands used by the Perceval backend (you can rely on the Git documentation)
Microtask 4:
Create a Python script to fetch data from SoftwareHeritage using its API.
Given a target GitHub repository, the script should return a message if the repository is not available on SoftwareHeritage or the date of the last visit.
The script should rely on the endpoints: origin and visits.
Please use the Python library requests to issue requests to the SoftwareHeritage API.
Microtask 5:
Set up Graal to be executed from PyCharm.
Microtask 6:
Create a Python script to execute Graal via its Python interface using the CoCom and CoLic backends. Feel free to select any target repository, for instance the GitHub repository hosting Toolkit.
Microtask 7:
Based on the JSON documents produced by Graal and its source code, try to answer the following questions:
- Which are the common methods of the Graal backends?
- List and explain at least 2 Git commands used by Graal (and not implemented in Perceval).
Microtask 8:
Create a Python script to execute flake8 for a given commit of any Git repository. Given a commit SHA and a Git repository, the script should clone the repository (if it doesn't exist locally), perform a checkout based on the commit SHA and execute flake8 on that checkout. The script should return a message that either lists the errors found or "OK" if flake8 successfully ended.
Microtask 9:
Submit at least a PR to one of the GrimoireLab repositories to fix an issue, improve the documentation, etc.
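
As an illustration of the flow described in Microtask 8, here is a minimal sketch, not a reference solution. It assumes that git and flake8 are available on the PATH; the function names (run, build_report, flake8_at_commit) are hypothetical and chosen only for this example.

```python
import subprocess
from pathlib import Path


def run(cmd, cwd=None):
    """Run a command and return (exit code, captured stdout)."""
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def build_report(returncode, output):
    """Return "OK" when flake8 exited with 0, otherwise the reported lines."""
    if returncode == 0:
        return "OK"
    errors = [line for line in output.splitlines() if line.strip()]
    return "flake8 reported {} issue(s):\n{}".format(len(errors), "\n".join(errors))


def flake8_at_commit(repo_url, commit_sha, workdir):
    """Clone repo_url into workdir if needed, check out commit_sha, run flake8."""
    workdir = Path(workdir)
    # Clone only when no local copy exists yet.
    if not (workdir / ".git").exists():
        code, _ = run(["git", "clone", repo_url, str(workdir)])
        if code != 0:
            raise RuntimeError("git clone failed for {}".format(repo_url))
    # Move the working tree to the requested commit.
    code, _ = run(["git", "checkout", "--quiet", commit_sha], cwd=workdir)
    if code != 0:
        raise RuntimeError("git checkout failed for {}".format(commit_sha))
    # flake8 exits with a non-zero code when it finds violations.
    code, output = run(["flake8", "."], cwd=workdir)
    return build_report(code, output)


# Example call (needs network access, git and flake8):
#   print(flake8_at_commit("https://github.com/chaoss/grimoirelab-perceval",
#                          "<commit SHA>", "/tmp/perceval"))
```

Keeping build_report separate from the subprocess calls makes the message logic easy to check without cloning anything.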

Showing the work you did

If you want to show the work you did, open a GitHub repository, and upload to it:

A README.md file explaining what you did, and linking to the results (which will be in the same repository, see below). This will be the main file to show your skills and interest on the project, so try to make it organized and clear, in a way that we can easily understand what you did.

Submitting information for the application process

If you're interested in this idea, you must complete microtask 0 and at least 6 microtasks.

Once you have completed at least one microtask, go to the governance repository and create a pull request to add yourself, your information, and a link to your repository with the completed microtask(s) in the GSoC-interest.md file.

You are welcome to include in your repository other information that could be of interest, such as open issues or pull requests submitted to the project to which you intend to contribute during GSoC, contributions to other projects, skills, and other related information.

You must complete these things by GSoC deadline for proposals. Make sure to also submit the information required by GSoC for applicants (i.e., project proposal), linking to it from your pull request in the GSoC-interest.md file.

Getting feedback for your proposal & microtasks

Our idea is to have a look at proposals that are registered in the governance repository starting after 25th March, when students can formally apply. But if you have specific doubts, comments, or whatever, use this issue.

In general, we don't want to give advice too specific to one case, because that could give some advantage to some person with respect to the others. Answering questions and addressing comments (if you want, based on your proposal) is not a problem as long as that's done in public, hence the threads in this issue.

Asking for help

If you need help, please use the following channels:

Comments in this issue
#grimoirelab channel in Freenode IRC

Polaris000 · 2019-02-09T06:08:53Z

Why is Pycharm necessary? Does it have something to do with Graal or SoftwareHeritage?

valeriocos · 2019-02-10T14:49:37Z

Hi @Polaris000, sorry for the late reply. PyCharm is not related to Graal or SoftwareHeritage; however, it is the IDE we commonly use with GrimoireLab. It has a good debug mode and lets you create a virtual env in which to install external packages. Furthermore, it provides a mechanism to define a project structure (which allows executing the source code of different packages without installing them via pip), so you can easily work on features that may require changes in several GrimoireLab components at the same time.
Which IDE are you familiar with?

Polaris000 · 2019-02-10T18:10:55Z

Thanks for the reply! I used pycharm in 2017 and 2018, mainly for basic data analytics and backend web development with django.
I was just curious as to why you specifically mentioned pycharm, though I agree with your logic. Pycharm is indeed a powerful tool.
As of now I don't use an IDE, just a text editor (sublime text). Though I do miss out on several features, I love the simple lightweight experience of sublime text.

inishchith · 2019-03-06T15:23:37Z

I'm interested in this Project and have been working on the microtasks.

@jgbarah @valeriocos I had some doubts though.

Microtasks 0, 1 and 5 are mostly about setting up the environment for the tools and getting familiar with them. How do I show this work in my microtasks repository? Would a README.md file with some steps be enough?
I had a similar doubt to the one mentioned in this thread about sharing our work.

Thanks

valeriocos · 2019-03-06T15:37:28Z

Thank you @inishchith for your interest in this idea.

For task 0 there is no need to show anything. WRT tasks 1 and 5, a screenshot of the configurations should be enough, something like the image below, for instance.

WRT 2., you can work on a private repo if it suits you. What do you think @jgbarah ?

inishchith · 2019-03-06T17:16:59Z

@valeriocos Thanks for the response 👍

inishchith · 2019-03-14T17:51:37Z

@jgbarah @valeriocos @aswanipranjal
I've worked on the Microtasks mentioned above and would be looking out for improvements on it from here on and soon will start working on the proposal. I had some questions .

One of the aims mentioned "Evaluating the implementation with projects of different sizes", I'm unsure if i understand this clearly, can you please elaborate on this?
I've read through the discussion here . I'm planning to make my Microtasks repository public before the submission for proposal starts, is there any way we can get some feedback on the tasks we've performed?

Thanks

valeriocos · 2019-03-14T18:08:06Z

great @inishchith !

wrt 1, the idea is to take projects (e.g., github repositories) made of 1000, 10000 or more commits and benchmark the time needed by graal to process them. Ideally, this will help us better understand where optimization efforts should be applied in the future.
does this sound clearer?
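
The benchmarking idea for point 1 could be sketched roughly as follows. This is only an illustration: fake_backend is a hypothetical stand-in that yields one dict per commit, so the timing harness can be tried without Graal installed. In a real run it would be replaced by a callable that returns the fetch() iterator of a Graal backend.

```python
import time


def benchmark(fetch_items):
    """Consume the iterable returned by fetch_items(); return (count, seconds)."""
    start = time.perf_counter()
    count = sum(1 for _ in fetch_items())
    elapsed = time.perf_counter() - start
    return count, elapsed


def fake_backend(n_commits):
    """Hypothetical stand-in for a Graal backend: yields one dict per commit."""
    def fetch():
        for i in range(n_commits):
            yield {"commit": i}
    return fetch


# Time repositories of different (simulated) sizes.
for size in (1000, 10000):
    count, elapsed = benchmark(fake_backend(size))
    print("{} commits processed in {:.4f}s".format(count, elapsed))
```

Comparing the elapsed time across sizes gives a first hint of how processing time scales with the number of commits.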

wrt 2. sure! if you point us to your repo (public or private), we will have a look and give feedback.
is it ok for you?

inishchith · 2019-03-14T18:13:42Z

@valeriocos Thanks for the response.

wrt 1, the idea is to take projects (e.g., github repositories) made of 1000, 10000 or more commits and benchmark the time needed by graal to process them. Ideally, this will allow to better understand where optimization efforts should be applied in the future. does this sound more clear?

Yes, Thanks ;)

wrt 2. sure! if you point us to your repo (public or private), we will have a look and give feedback.
is it ok for you?

I've got a private repository as of now. Do i add mentors as collaborators? if there's any other way i could give access, please do let me know.

valeriocos · 2019-03-14T20:31:39Z

yes, add us as collaborators.
thank you!

sumitskj · 2019-03-14T20:54:48Z

Hii, I am Sumit Kumar Jangir, I am really interested in this project and have done some microtasks as well and will try to solve all.

valeriocos · 2019-03-14T21:58:05Z

welcome @sumitskj ! don't hesitate to ping us if you have some questions or need feedback on your microtasks.

apoorvaanand1998 · 2019-03-15T04:21:51Z

Hi! So I've been going through the tutorial. When I run Kibiter (Which I decided to use over Kibana), I get the following warning:

[warning] You're running Kibana 6.1.4-3 with some different versions of Elasticsearch. Update Kibana or Elasticsearch to the same version to prevent compatibility issues: v6.1.0 @ 127.0.0.1:9200 (127.0.0.1)

I am running Kibiter community-v6.1.4-3 which is based on Kibana 6.1.0. So, I decided to install ElasticSearch 6.1.0 as well. Was I supposed to install some other version? The ElasticSearch support matrix says that all 6.1.x versions of ElasticSearch are compatible with all 6.1.x versions of Kibana. Is there anything I can (or should) do to take that warning away?

valeriocos · 2019-03-15T10:56:51Z

Hi @apoorvaanand1998, sorry for the late reply. Ideally, ElasticSearch and Kibana should have the same versions (at least major and minor); however, some time ago we decided to use Kibana 6.1.4 to benefit from some new features. You can ignore the warning, since it doesn't break anything in the platform.
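
To make the "same major and minor" rule concrete, here is a small sketch that compares two version strings such as the ones in the warning. The helper names are hypothetical, and the parsing only assumes that a version starts with an optional "v" followed by major.minor.

```python
import re


def major_minor(version):
    """Extract (major, minor) from strings such as '6.1.4-3' or 'v6.1.0'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: {!r}".format(version))
    return int(match.group(1)), int(match.group(2))


def same_major_minor(kibana_version, elasticsearch_version):
    """True when both versions share the same major and minor numbers."""
    return major_minor(kibana_version) == major_minor(elasticsearch_version)


# Versions taken from the warning message quoted above.
print(same_major_minor("6.1.4-3", "v6.1.0"))  # True
```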

Btw, I have prepared some docs to get started with grimoirelab (https://github.com/chaoss/grimoirelab-sirmordred/blob/master/README.md#getting-started). Hope this helps (and any feedback is welcome).

inishchith · 2019-03-15T13:54:09Z

@valeriocos I've added all the mentors as collaborators to the repository as you suggested. Looking forward for suggestions ;)
Thanks.

SunflowerPKU · 2019-03-16T14:34:05Z

Hi @valeriocos, I am very interested in this project and have already completed several microtasks.
However, when I was doing microtask 6, I encountered a problem and need your help.
I have already successfully created a Python script to execute Graal via its Python interface using the CoCom backends.
Unfortunately, when I was trying CoLic backends, I initialized CoLic object with function : CoLic(uri=repo_uri, git_path=repo_dir, exec_path=path), but I don't know what parameter 'exec_path' means.
I have tried a lot, but this problem remains unsolved. Could you help me? Thank you very much.

inishchith · 2019-03-16T14:42:25Z

@SunflowerPKU CoLic backend uses ScanCode and Nomos tools in order to process license related information.
exec_path is the executable path of the particular tool. You can install them using these instructions.

valeriocos · 2019-03-16T14:49:00Z

Hi @SunflowerPKU, thank you for your interest in this project.

The exec_path is the local path of the executable of nomos or scancode (the tools currently supported by CoLic). For instance, if you plan to use nomos, you should:

clone the repo: https://github.com/fossology/fossology/tree/master/src/nomos
follow the instructions at: http://archive15.fossology.org/projects/fossology/wiki/Nomos_Test_Cases#standalone-Nomos to get the nomossa executable
set the local path of nomossa to exec_path (e.g., /home/graal-libs/nomossa)

If you plan to use scancode, you should:

clone the repo or download one of the releases here: https://github.com/nexB/scancode-toolkit/releases
set the local path of scancode to exec_path (e.g., /home/graal-libs/scancode-toolkit/scancode)
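
Before passing exec_path to CoLic, it can help to check that the path really points to an executable file. The sketch below is only an illustration: check_exec_path is a hypothetical helper, the demo uses the Python interpreter as a stand-in executable, and the Graal usage is left in comments because it needs Graal plus nomos or scancode installed (the import path shown there is an assumption).

```python
import os
import sys


def check_exec_path(exec_path):
    """Return the absolute path if exec_path is an executable file, else raise."""
    path = os.path.abspath(os.path.expanduser(exec_path))
    if not os.path.isfile(path):
        raise FileNotFoundError("no file at {}".format(path))
    if not os.access(path, os.X_OK):
        raise PermissionError("{} is not executable".format(path))
    return path


# Demo with a stand-in executable (the running Python interpreter).
print(check_exec_path(sys.executable))

# Intended use with Graal (assumed import path, not executed here):
#   from graal.backends.core.colic import CoLic
#   colic = CoLic(uri=repo_uri, git_path=repo_dir,
#                 exec_path=check_exec_path("/home/graal-libs/nomossa"))
#   for item in colic.fetch():
#       print(item["data"])
```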

valeriocos · 2019-03-16T14:49:29Z

sorry @inishchith I have just seen your message

sumitskj · 2019-03-16T18:54:45Z

@valeriocos I was doing microtask 8 and I got stuck at how to checkout at a given commit SHA. I searched graal documentation and other things but I didn't get it. Can you help?

valeriocos · 2019-03-16T19:39:09Z

Hi @sumitskj
please have a look at these methods: https://github.com/chaoss/grimoirelab-graal/blob/master/graal/graal.py#L316 and https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/git.py#L1294

Hope this helps

sumitskj · 2019-03-17T10:32:08Z

@valeriocos

Hi @sumitskj
please have a look at these methods: https://github.com/chaoss/grimoirelab-graal/blob/master/graal/graal.py#L316 and https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/git.py#L1294

Hope this helps

Thanks it worked

sumitskj · 2019-03-21T20:38:58Z

@valeriocos @jgbarah I have done the microtasks and I have also made a private repo to show what I have done. Can I add you both the mentors as collaborators to my repo, to give me feedback on any improvements that can be made.

valeriocos · 2019-03-22T08:33:39Z

@sumitskj feel free to add me as collaborator, thank you

inishchith · 2019-03-24T14:44:46Z

@jgbarah @valeriocos
One of the aims mentioned for the project idea is "Producing analytics with Graal data and including them in Sigils".
What i understood from this is, we're expected to create Kibana dashboards related to analytics produced using data retrieved from Graal and import the panels to grimoirelab-sigils.

When i was exploring the codebase, i found out that e2k.py script 's create_dashboard method facilitates creation of kibana dashboard from Enriched indexes.
But the import of the method is invalid as it's defined under kidash.py script of grimoirelab-kidash package.
Other way is to use grimoirelab-sirmordred.

Please do let me know if there's something i missed out or there's a different way to create a dashboard. :)
Thanks

valeriocos · 2019-03-24T16:46:51Z

Hi @inishchith, the dashboards are created directly in Kibiter (a soft fork of Kibana) and then exported with kidash. For example, the command below exports the dashboard with id f2f2f0d0-4bcc-11e9-a2bb-75841b49fd70 located in the ElasticSearch instance at https://admin:admin@localhost:9200. The exported dashboard will be saved as scava-test.json.

kidash -g \
    -e https://admin:admin@localhost:9200 \
    --dashboard f2f2f0d0-4bcc-11e9-a2bb-75841b49fd70 \
    --export scava-test.json
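
If the export has to be repeated for several dashboards, the same command can be assembled from Python. This is a minimal sketch: kidash_export_command is a hypothetical helper that only reuses the flags shown above, and the actual execution is left as a comment since it needs kidash installed and a reachable ElasticSearch.

```python
import shlex


def kidash_export_command(es_url, dashboard_id, output_file):
    """Build the argument list for exporting one dashboard with kidash."""
    return [
        "kidash", "-g",
        "-e", es_url,
        "--dashboard", dashboard_id,
        "--export", output_file,
    ]


cmd = kidash_export_command(
    "https://admin:admin@localhost:9200",
    "f2f2f0d0-4bcc-11e9-a2bb-75841b49fd70",
    "scava-test.json",
)
# Show the command as it would be typed in a shell.
print(" ".join(shlex.quote(part) for part in cmd))

# To actually run it:
#   import subprocess
#   subprocess.run(cmd, check=True)
```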

For testing the platform, we currently use micro-mordred, which provides the same functionality as mordred except for the scheduling. Thus, micro-mordred can also upload the dashboards related to a setup.cfg (by calling Kidash). They are loaded from Sigils, which contains all dashboards provided by GrimoireLab.

If you want to know more about micro-mordred, please have a look at: https://github.com/chaoss/grimoirelab-sirmordred#micro-mordred
If you plan to use pycharm, you should clone mordred, create a Pycharm project with it and add to the project structure the Grimoirelab components (see the image below).

Let me know if you have more questions :)

inishchith · 2019-03-25T16:10:39Z

@valeriocos Thanks for the response. The configuration really helped :)
As i am about to start working on a Proposal for the project, i had a thought over.

Graal provides analysis related to code complexity, quality, dependencies, security and licensing. Defining and calculating metrics from the source code analysis data provided by Graal needs some clarity and discussion, as only a few such metrics are defined under CHAOSS metrics: some relate to LOC and licenses, while those related to code complexity are missing.
IMO, there has to be a proper discussion about this. Is it expected to be discussed with mentors during the period? If not, can you please give a gist of the idea?

Please do let me know if i got something wrong here.
Thanks.

valeriocos · 2019-03-25T18:40:02Z

Thank you @inishchith for raising this point. I'd say that it will be discussed with mentors during the period. Probably we could start with some metrics already defined in CHAOSS (LOC, licenses), and then based on the data extracted with Graal, add more (e.g., code complexity) or refine the existing ones.

We could also focus just on some category metrics, for instance only the ones about licenses and code complexity since the corresponding backends are agnostic wrt the programming languages used in a repository, thus the work done during GSoC could be applied to a wide range of projects.
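
To give a feel for what repository-level figures on LOC, licenses and code complexity could look like, here is a rough sketch. Everything in it is an assumption for illustration: the per-file records and their field names (loc, ccn, licenses) are hypothetical and may not match the real Graal output, and summarize is not an existing GrimoireLab function.

```python
from collections import Counter


def summarize(files):
    """Aggregate per-file records into repository-level figures."""
    total_loc = sum(f["loc"] for f in files)
    mean_ccn = sum(f["ccn"] for f in files) / len(files) if files else 0.0
    licenses = Counter(lic for f in files for lic in f["licenses"])
    return {"total_loc": total_loc, "mean_ccn": mean_ccn, "licenses": dict(licenses)}


# Hypothetical per-file records; real Graal output may be shaped differently.
sample = [
    {"path": "a.py", "loc": 120, "ccn": 14, "licenses": ["GPL-3.0"]},
    {"path": "b.py", "loc": 80, "ccn": 6, "licenses": ["GPL-3.0"]},
    {"path": "c.js", "loc": 50, "ccn": 4, "licenses": ["MIT"]},
]
print(summarize(sample))
```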

What do you think @jgbarah @inishchith ?

inishchith · 2019-03-26T14:04:56Z

@valeriocos Thanks for the response. This gives some clarity. And Yes, discussion with mentors would also be beneficial during the period.

We could also focus just on some category metrics, for instance only the ones about licenses and code complexity since the corresponding backends are agnostic w.r.t the programming languages used in a repository, thus the work done during GSoC could be applied to a wide range of projects.

Yes. I feel this can be incremental even during the GSoC period.

Also is there any way we could get our proposals reviewed by the mentors before the application period ends ( after we add our details to GSoC Interested students) ?

Thanks

valeriocos · 2019-03-27T18:05:22Z

Sorry @inishchith for the late reply. I guess it is possible, any thoughts @jgbarah ?

inishchith · 2019-03-30T07:06:29Z

@valeriocos @jgbarah Can we have an issue ticket open at Graal in order to have discussions related to improvements in existing analyzers and the addition of new ones under the corresponding backends? I feel that would work as a guide for future work.
In case there's some alternate way of having such a discussion, please do let me know.
Thanks.

valeriocos · 2019-03-30T08:22:05Z

Sure @inishchith ! Thanks

The issue for improvements is here: chaoss/grimoirelab-graal#18

sumitskj · 2019-04-05T12:42:05Z

@valeriocos @jgbarah can you review my proposal https://docs.google.com/document/d/1K2i_nPKQqTCFxi6mNhQGll83Mr-C4KqTaK3W2tL7Qvk/edit?usp=sharing

inishchith · 2019-04-09T17:01:57Z

I've submitted my Final Proposal for the project.
All the best to other candidates :)

inishchith · 2019-04-16T19:18:09Z

@valeriocos @jgbarah @aswanipranjal
I had thought of opening a discussion regarding defining the metrics related to the source code. Is it okay if we open an issue ticket at the metrics repository for further discussion with the working group?

Let me know what you think :)
Thanks

valeriocos · 2019-04-17T07:45:50Z

Thank you @inishchith , that's a good idea in my opinion.
What do you think @jgbarah @aswanipranjal ?

inishchith · 2019-04-19T09:10:51Z

Thanks for the response @valeriocos and @aswanipranjal for a 👍 .
I've added an issue ticket: chaoss/metrics#139
Please do have a look when you get time.

Thanks

valeriocos · 2019-08-28T09:29:13Z

A summary and details of the work carried out for this idea are available on @inishchith's blog.
A demo has been also produced, feel free to have a look at it: https://youtu.be/RXZeuJt0UXM.

valeriocos mentioned this issue Feb 2, 2019

Idea for GSoC: Support of Source Code Related Metrics chaoss/community#74

Merged

ManuelLecaro mentioned this issue Mar 11, 2019

GSoC Idea: Implementing CHAOSS Metrics in Augur chaoss/wg-evolution#82

Closed

valeriocos mentioned this issue Mar 22, 2019

GSoC Idea: Implementing CHAOSS metrics with Perceval chaoss/wg-evolution#81

Closed

This was referenced Mar 26, 2019

Organizing documentation #135

Open

Deprecated call to yaml.load chaoss/grimoirelab-sirmordred#288

Closed

inishchith mentioned this issue Apr 15, 2019

[backend] Add CoLang Backend and Linguist Analyzer chaoss/grimoirelab-graal#19

Merged

inishchith mentioned this issue Apr 19, 2019

New Metrics: Support of source code related metrics chaoss/metrics#139

Closed

inishchith mentioned this issue May 13, 2019

[ GSoC ] Project report and Meeting Log chaoss/community#134

Closed

valeriocos closed this as completed Sep 11, 2019
