GSoC Idea: Support of Source Code Related Metrics #182

Closed
valeriocos opened this issue Feb 2, 2019 · 38 comments

@valeriocos
Member

valeriocos commented Feb 2, 2019

[This issue is for addressing questions and comments related to this GSoC idea, which is one of the ideas proposed by the CHAOSS group for the 2019 edition of GSoC.]

Description

Currently, GrimoireLab can produce analytics with data extracted from more than 30 tools related to contributing to Open Source development, such as version control systems, issue trackers and forums. Despite the large set of metrics available in GrimoireLab, none of them relies on information extracted from source code, which prevents end users from benefiting from a wider spectrum of software development data.

Graal is a tool that conducts customizable and incremental analysis of source code by leveraging existing tools, and produces output in a format that GrimoireLab can process. Graal already offers analyses of code complexity, quality, dependencies, security and licensing; however, it is not yet integrated with GrimoireLab.

This idea is about adding support to GrimoireLab to produce source code related metrics using Graal.

The aims of the project are as follows:

  • Understanding the GrimoireLab components (Perceval, ELK, Mordred and Sigils) and the corresponding tool-chain.
  • Adapting ELK and Mordred to be able to execute Graal and process the data produced.
  • Producing analytics with Graal data and including them in Sigils.
  • Evaluating the implementation with projects of different sizes.

Other aims, such as enhancing Graal to support more analyses or improving existing ones, are completely within scope.

The aims will require extending GrimoireLab functionality to integrate Graal.

  • Difficulty: medium
  • Requirements: Python programming. Interest in software analytics. Willingness to understand GrimoireLab internals.
  • Recommended: Experience with ElasticSearch and Kibana would be convenient, but can be learned during the project.
  • Mentors: @jgbarah , @valeriocos

Microtasks

To become familiar with GrimoireLab, you can start by reading its documentation. You can find useful information at:

Once you're familiar with GrimoireLab, you can have a look at the following microtasks.

  • Microtask 0:
    Download PyCharm and get familiar with it (for instance, you can follow this tutorial).

  • Microtask 1:
    Set up Perceval to be executed from PyCharm.

  • Microtask 2:
    Create a Python script to execute Perceval via its Python interface using the Git and GitHub backends. Feel free to select any target repository, for instance the GitHub repository hosting Perceval.

  • Microtask 3:
    Based on the JSON documents produced by Perceval and its source code, try to answer the following questions:

    • What is the meaning of the JSON attribute 'timestamp'?
    • What is the meaning of the JSON attribute 'updated_on'?
    • What is the meaning of the JSON attribute 'origin'?
    • What is the meaning of the JSON attribute 'category'?
    • What is the meaning of the JSON attribute 'uuid'?
    • What are the common methods of the Perceval backends?
    • List and explain at least 3 Git commands used by the Perceval backend (you can rely on the Git documentation).
  • Microtask 4:
    Create a Python script to fetch data from Software Heritage using its API.
    Given a target GitHub repository, the script should return either a message saying that the repository is not available on Software Heritage, or the date of its last visit.
    The script should rely on the origin and visits endpoints.
    Please use the Python library requests to issue requests to the Software Heritage API.

  • Microtask 5:
    Set up Graal to be executed from PyCharm.

  • Microtask 6:
    Create a Python script to execute Graal via its Python interface using the CoCom and CoLic backends. Feel free to select any target repository, for instance the GitHub repository hosting Toolkit.

  • Microtask 7:
    Based on the JSON documents produced by Graal and its source code, try to answer the following questions:

    • What are the common methods of the Graal backends?
    • List and explain at least 2 Git commands used by Graal (and not implemented in Perceval).
  • Microtask 8:
    Create a Python script to execute flake8 for a given commit of any Git repository. Given a commit SHA and a Git repository, the script should clone the repository (if it doesn't exist locally), check out the commit and run flake8 on that checkout. The script should return a message that either lists the errors found or says "OK" if flake8 reports no errors.

  • Microtask 9:
    Submit at least one PR to one of the GrimoireLab repositories to fix an issue, improve the documentation, etc.
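For microtask 4, the origin-then-visits logic might look like the sketch below. It assumes the current Software Heritage Web API layout (`https://archive.softwareheritage.org/api/1/origin/<url>/get/` and `.../origin/<url>/visits/`), which may differ from what was available in 2019, and it assumes each visit record carries an increasing integer `visit` field plus a `date` field. The HTTP call is passed in through a hypothetical `get` parameter so the logic can be tried offline; by default it uses requests, as the microtask asks.

```python
import urllib.parse

SWH_API = "https://archive.softwareheritage.org/api/1"


def _requests_get(url):
    """Default transport: GET url with requests, return (status, JSON or None)."""
    import requests  # imported lazily so the logic can be tested without it

    response = requests.get(url, timeout=30)
    body = response.json() if response.status_code == 200 else None
    return response.status_code, body


def last_visit(repo_url, get=_requests_get):
    """Return a message with the date of the last Software Heritage visit."""
    origin = urllib.parse.quote(repo_url, safe=":/")
    # 1. origin endpoint: does the archive know this repository at all?
    status, _ = get("%s/origin/%s/get/" % (SWH_API, origin))
    if status == 404:
        return "%s is not available on Software Heritage" % repo_url
    if status != 200:
        raise RuntimeError("origin lookup failed with HTTP %d" % status)
    # 2. visits endpoint: archival visits of that origin
    status, visits = get("%s/origin/%s/visits/" % (SWH_API, origin))
    if status != 200 or not visits:
        return "%s is archived but no visit was returned" % repo_url
    # Assumption: visit ids increase with every new visit of the origin
    latest = max(visits, key=lambda v: v["visit"])
    return "Last visit of %s: %s" % (repo_url, latest["date"])
```

Calling `last_visit("https://github.com/chaoss/grimoirelab-perceval")` without a `get` argument would query the live API.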

Showing the work you did

If you want to show the work you did, open a GitHub repository, and upload to it:

A README.md file explaining what you did, and linking to the results (which will be in the same repository, see below). This will be the main file to show your skills and interest in the project, so try to keep it organized and clear, so that we can easily understand what you did.

Submitting information for the application process

If you're interested in this idea, you must complete microtask 0 and at least 6 microtasks.

Once you have completed at least one microtask, go to the governance repository and create a pull request that adds yourself, your information, and a link to your repository with the completed microtask(s) to the GSoC-interest.md file.

You are welcome to include in your repository other information that could be of interest, such as open issues or pull requests submitted to the project to which you intend to contribute during GSoC, contributions to other projects, skills, and other related information.

You must complete these things by the GSoC deadline for proposals. Make sure to also submit the information required by GSoC for applicants (i.e., the project proposal), linking to it from your pull request in the GSoC-interest.md file.

Getting feedback for your proposal & microtasks

We plan to review the proposals registered in the governance repository starting after 25th March, when students can formally apply. If you have specific doubts, comments, or anything else before then, use this issue.

In general, we don't want to give advice that is too specific to one case, because that could give one person an advantage over the others. Answering questions and addressing comments (based on your proposal, if you want) is not a problem as long as it is done in public, hence the threads in this issue.

Asking for help

If you need help, please use the following channels:

  • Comments in this issue
  • #grimoirelab channel in Freenode IRC

@Polaris000

Polaris000 commented Feb 9, 2019

Why is PyCharm necessary? Does it have something to do with Graal or Software Heritage?

@valeriocos
Member Author

Hi @Polaris000, sorry for the late reply. PyCharm is not related to Graal or Software Heritage; however, it is the IDE we commonly use with GrimoireLab. It has a good debug mode and lets you create a virtual env in which to install external packages. Furthermore, it provides a mechanism to define a project structure (which allows executing the source code of different packages without installing them via pip), so you can easily work on features that may require changes in several GrimoireLab components at the same time.
Which IDE are you familiar with?

@Polaris000

Thanks for the reply! I used PyCharm in 2017 and 2018, mainly for basic data analytics and backend web development with Django.
I was just curious as to why you specifically mentioned PyCharm, though I agree with your logic. PyCharm is indeed a powerful tool.
As of now I don't use an IDE, just a text editor (Sublime Text). Though I do miss out on several features, I love the simple, lightweight experience of Sublime Text.

@inishchith

I'm interested in this project and have been working on the microtasks.

@jgbarah @valeriocos I have some doubts, though.

  1. Microtasks 0, 1 and 5 are mostly about setting up the environment and getting familiar with the tools. How do I show this work in my microtasks repository? Is a README.md file with some steps enough?
  2. I have a similar doubt to the one mentioned in this thread about sharing our work.

Thanks

@valeriocos
Member Author

Thank you @inishchith for your interest in this idea.

For task 0 there is no need to show anything. WRT tasks 1 and 5, a screenshot of the configurations should be enough, for instance something like the image below.
[Screenshot: example configuration (2019-03-06 16-32-45)]

WRT 2., you can work on a private repo if it suits you. What do you think @jgbarah ?

@inishchith

@valeriocos Thanks for the response 👍

@inishchith

inishchith commented Mar 14, 2019

@jgbarah @valeriocos @aswanipranjal
I've worked on the microtasks mentioned above, will keep improving them from here on, and will soon start working on the proposal. I have some questions.

  1. One of the aims mentioned is "Evaluating the implementation with projects of different sizes". I'm not sure I understand this clearly; can you please elaborate on it?
  2. I've read through the discussion here. I'm planning to make my microtasks repository public before the proposal submission period starts. Is there any way we can get some feedback on the tasks we've performed?

Thanks

@valeriocos
Member Author

great @inishchith !

wrt 1, the idea is to take projects (e.g., GitHub repositories) made of 1,000, 10,000 or more commits and benchmark the time Graal needs to process them. Ideally, this will help us better understand where optimization efforts should be applied in the future.
Does this sound clearer?
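One way to run such a benchmark is to time how long it takes to consume a backend's items. The sketch below assumes only that a Graal backend exposes a `fetch()` method returning an iterator of items, as used in the Python-interface scripts from microtask 6; `benchmark` is a hypothetical helper, not part of Graal, and the CoCom lines at the end are an untested usage assumption.

```python
import time


def benchmark(fetch, *args, **kwargs):
    """Time how long it takes to consume every item yielded by fetch().

    Returns a tuple (items, elapsed_seconds, seconds_per_item).
    """
    start = time.perf_counter()
    # fetch() is typically a generator, so the analysis work happens while
    # iterating; consume everything before stopping the clock
    count = sum(1 for _ in fetch(*args, **kwargs))
    elapsed = time.perf_counter() - start
    per_item = elapsed / count if count else 0.0
    return count, elapsed, per_item


# Hypothetical usage, assuming Graal's CoCom backend and a scratch clone path:
# from graal.backends.core.cocom import CoCom
# cocom = CoCom(uri="https://github.com/chaoss/grimoirelab-perceval",
#               git_path="/tmp/graal-perceval")
# print(benchmark(cocom.fetch))
```

Comparing `seconds_per_item` across repositories of roughly 1,000 and 10,000 commits would show whether the cost per commit stays flat or grows with history size.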

wrt 2, sure! If you point us to your repo (public or private), we will have a look and give feedback.
Is that OK for you?

@inishchith

inishchith commented Mar 14, 2019

@valeriocos Thanks for the response.

> wrt 1, the idea is to take projects (e.g., github repositories) made of 1000, 10000 or more commits and benchmark the time needed by graal to process them. Ideally, this will allow to better understand where optimization efforts should be applied in the future. does this sound more clear?

Yes, thanks ;)

> wrt 2. sure! if you point us to your repo (public or private), we will have a look and give feedback.
> is it ok for you?

I've got a private repository as of now. Do I add the mentors as collaborators? If there's any other way I could give access, please do let me know.

@valeriocos
Member Author

yes, add us as collaborators.
thank you!

@sumitskj

Hi, I am Sumit Kumar Jangir. I am really interested in this project, have already done some microtasks, and will try to solve all of them.

@valeriocos
Member Author

welcome @sumitskj ! don't hesitate to ping us if you have some questions or need feedback on your microtasks.

@apoorvaanand1998

Hi! So I've been going through the tutorial. When I run Kibiter (which I decided to use over Kibana), I get the following warning:

[warning] You're running Kibana 6.1.4-3 with some different versions of Elasticsearch. Update Kibana or Elasticsearch to the same version to prevent compatibility issues: v6.1.0 @ 127.0.0.1:9200 (127.0.0.1)

I am running Kibiter community-v6.1.4-3 which is based on Kibana 6.1.0. So, I decided to install ElasticSearch 6.1.0 as well. Was I supposed to install some other version? The ElasticSearch support matrix says that all 6.1.x versions of ElasticSearch are compatible with all 6.1.x versions of Kibana. Is there anything I can (or should) do to take that warning away?

@valeriocos
Member Author

valeriocos commented Mar 15, 2019

Hi @apoorvaanand1998, sorry for the late reply. Ideally, ElasticSearch and Kibana should have the same version (at least major and minor); however, some time ago we decided to use Kibana 6.1.4 to benefit from some new features. You can ignore the warning, since it doesn't break anything in the platform.

Btw, I have prepared some docs to get started with grimoirelab (https://github.com/chaoss/grimoirelab-sirmordred/blob/master/README.md#getting-started). Hope this helps (and any feedback is welcome).

@inishchith

@valeriocos I've added all the mentors as collaborators to the repository, as you suggested. Looking forward to your suggestions ;)
Thanks.

@SunflowerPKU

Hi @valeriocos, I am very interested in this project and have already completed several microtasks.
However, when I was doing microtask 6, I encountered a problem and need your help.
I have already successfully created a Python script to execute Graal via its Python interface using the CoCom backend.
Unfortunately, when I tried the CoLic backend, I initialized the CoLic object with CoLic(uri=repo_uri, git_path=repo_dir, exec_path=path), but I don't know what the parameter 'exec_path' means.
I have tried a lot, but the problem remains unsolved. Could you help me? Thank you very much.

@inishchith

@SunflowerPKU The CoLic backend uses the ScanCode and Nomos tools to process license-related information.
exec_path is the path to the executable of the chosen tool. You can install them using these instructions.
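To make the meaning of `exec_path` concrete, the sketch below resolves and checks the executable before it is handed to CoLic. `resolve_exec_path` is a hypothetical helper built on `shutil.which`; the nomossa location in the comments is an assumed FOSSology build path, and the CoLic call only mirrors the constructor quoted in the question above rather than a verified signature.

```python
import os
import shutil


def resolve_exec_path(tool):
    """Return the absolute path of the license analyzer executable.

    `tool` can be a command name found on PATH (e.g. "scancode") or an
    explicit path to the executable (e.g. the nomossa binary of a local
    FOSSology build). Raises FileNotFoundError if it is not executable.
    """
    # With a directory component, shutil.which only checks that exact path
    path = shutil.which(tool)
    if path is None:
        raise FileNotFoundError("%s not found or not executable" % tool)
    return os.path.abspath(path)


# Hypothetical usage (requires Graal; the nomossa location is an assumption):
# from graal.backends.core.colic import CoLic
# exec_path = resolve_exec_path("/home/user/fossology/src/nomos/agent/nomossa")
# colic = CoLic(uri=repo_uri, git_path=repo_dir, exec_path=exec_path)
```

Failing early with a clear error is usually easier to debug than letting the backend fail later while it invokes a missing tool.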

@valeriocos
Member Author

Hi @SunflowerPKU, thank you for your interest in this project.

The exec_path is the local path of the executable of nomos or scancode (the tools currently supported by CoLic). For instance, if you plan to use nomos, you should:

If you plan to use scancode, you should:

@valeriocos
Member Author

sorry @inishchith I have just seen your message

@sumitskj

sumitskj commented Mar 16, 2019

@valeriocos I was doing microtask 8 and got stuck on how to check out a given commit SHA. I searched the Graal documentation and other resources but couldn't figure it out. Can you help?

@valeriocos
Member Author

valeriocos commented Mar 16, 2019

@sumitskj

@valeriocos

Hi @sumitskj
please have a look at these methods: https://github.com/chaoss/grimoirelab-graal/blob/master/graal/graal.py#L316 and https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/git.py#L1294

Hope this helps

Thanks, it worked!
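Putting the checkout step together with the rest of microtask 8, a minimal sketch could shell out to `git` and `flake8` through `subprocess`. It assumes both commands are available on PATH; `flake8_at_commit` and its `run` parameter are hypothetical names, and `run` is injectable only so the logic can be exercised without network access.

```python
import os
import subprocess


def flake8_at_commit(repo_url, sha, local_dir, run=subprocess.run):
    """Clone repo_url into local_dir if needed, check out sha and run flake8.

    Returns "OK" when flake8 reports nothing, otherwise the list of
    reported problems (one "path:line:col: code message" string each).
    """
    if not os.path.isdir(os.path.join(local_dir, ".git")):
        run(["git", "clone", repo_url, local_dir], check=True)
    # Detached-HEAD checkout of the requested commit
    run(["git", "-C", local_dir, "checkout", "--quiet", sha], check=True)
    # flake8 exits with status 1 when it finds problems, so no check=True here;
    # running it from the repository root lets it pick up the project's config
    result = run(["flake8", "."], cwd=local_dir, capture_output=True, text=True)
    errors = [line for line in result.stdout.splitlines() if line.strip()]
    return errors if errors else "OK"
```

For example, `flake8_at_commit("https://github.com/chaoss/grimoirelab-perceval", "<sha>", "/tmp/perceval")` would clone once and reuse the local copy on later calls.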

@sumitskj

@valeriocos @jgbarah I have done the microtasks and made a private repo to show what I have done. Can I add both of you as collaborators to my repo, so that you can give me feedback on any improvements that can be made?

@valeriocos
Member Author

@sumitskj feel free to add me as collaborator, thank you

@inishchith

inishchith commented Mar 24, 2019

@jgbarah @valeriocos
One of the aims mentioned for the project idea is "Producing analytics with Graal data and including them in Sigils".
What I understood from this is that we're expected to create Kibana dashboards for the analytics produced with data retrieved by Graal, and import the panels into grimoirelab-sigils.

When I was exploring the codebase, I found that the create_dashboard method of the e2k.py script facilitates the creation of Kibana dashboards from enriched indexes.
However, the import of that method is invalid, as it's defined in the kidash.py script of the grimoirelab-kidash package.
Another way is to use grimoirelab-sirmordred.

Please do let me know if there's something I missed or if there's a different way to create a dashboard. :)
Thanks

@valeriocos
Member Author

Hi @inishchith, the dashboards are created directly in Kibiter (a soft-fork of Kibana) and then exported with kidash. For example, the command below exports the dashboard with id f2f2f0d0-4bcc-11e9-a2bb-75841b49fd70 stored in the ElasticSearch instance at https://admin:admin@localhost:9200. The exported dashboard will be saved as scava-test.json.

    kidash -g \
        -e https://admin:admin@localhost:9200 \
        --dashboard f2f2f0d0-4bcc-11e9-a2bb-75841b49fd70 \
        --export scava-test.json

For testing the platform, we currently use micro-mordred, which provides the same functionalities as mordred except for scheduling. Thus, micro-mordred can also upload the dashboards related to a setup.cfg (by calling kidash). They are loaded from Sigils, which contains all the dashboards provided by GrimoireLab.

If you want to know more about micro-mordred, please have a look at: https://github.com/chaoss/grimoirelab-sirmordred#micro-mordred
If you plan to use PyCharm, you should clone mordred, create a PyCharm project with it and add the GrimoireLab components to the project structure (see the image below).

[Screenshot: PyCharm project structure with the GrimoireLab components (2019-03-24 17-34-48)]

Let me know if you have more questions :)

@inishchith

@valeriocos Thanks for the response. The configuration really helped :)
As I am about to start working on a proposal for the project, I have a thought.

  • Graal provides analyses of code complexity, quality, dependencies, security and licensing. Defining and calculating metrics from the source code analysis data provided by Graal needs some clarity and discussion, as only a few are defined among the CHAOSS metrics: some relate to LOC and licenses, while others, such as code complexity, are missing.
  • IMO, there has to be a proper discussion about this. Is it expected to be discussed with the mentors during the GSoC period? If not, can you please give a gist of the idea?

Please do let me know if I got something wrong here.
Thanks.

@valeriocos
Member Author

Thank you @inishchith for raising this point. I'd say that it will be discussed with the mentors during the period. We could probably start with some metrics already defined in CHAOSS (LOC, licenses), and then, based on the data extracted with Graal, add more (e.g., code complexity) or refine the existing ones.

We could also focus on just some metric categories, for instance only those about licenses and code complexity, since the corresponding backends are agnostic wrt the programming languages used in a repository; thus the work done during GSoC could be applied to a wide range of projects.

What do you think @jgbarah @inishchith ?
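As an illustration of the kind of repository-level aggregation discussed here, the sketch below rolls per-file analysis entries up into total LOC, average cyclomatic complexity and license counts. The entry fields (`loc`, `ccn`, `licenses`) are hypothetical placeholders, not necessarily Graal's exact output schema, and `summarize` is not part of any GrimoireLab component.

```python
from collections import Counter


def summarize(analysis):
    """Aggregate per-file analysis entries into repository-level metrics.

    Each entry is assumed (hypothetically) to look like
    {"file_path": str, "loc": int, "ccn": int or None, "licenses": [str, ...]}.
    """
    total_loc = sum(entry.get("loc") or 0 for entry in analysis)
    # Files without a complexity value (e.g. docs) are left out of the average
    ccns = [entry["ccn"] for entry in analysis if entry.get("ccn") is not None]
    licenses = Counter(lic for entry in analysis for lic in entry.get("licenses", []))
    return {
        "files": len(analysis),
        "loc": total_loc,
        "avg_ccn": sum(ccns) / len(ccns) if ccns else None,
        "licenses": dict(licenses),
    }
```

Keeping the aggregation separate from the backend output makes it easy to recompute the same metric per commit or per snapshot once the actual field names are settled.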

@inishchith

inishchith commented Mar 26, 2019

@valeriocos Thanks for the response. This gives some clarity. And yes, discussion with the mentors during the period would also be beneficial.

> We could also focus just on some category metrics, for instance only the ones about licenses and code complexity since the corresponding backends are agnostic w.r.t the programming languages used in a repository, thus the work done during GSoC could be applied to a wide range of projects.

Yes. I feel this can be done incrementally during the GSoC period.

  • Also, is there any way we could get our proposals reviewed by the mentors before the application period ends (after we add our details to the GSoC interested students list)?

Thanks

@valeriocos
Member Author

Sorry @inishchith for the late reply. I guess it is possible, any thoughts @jgbarah ?

@inishchith

@valeriocos @jgbarah Can we have an issue open at Graal to discuss improvements to the existing analyzers and the addition of new ones under the corresponding backends? I feel that would work as a guide for future work.
If there's an alternative way of having such a discussion, please do let me know.
Thanks.

@valeriocos
Member Author

valeriocos commented Mar 30, 2019

Sure @inishchith ! Thanks

The issue for improvements is here: chaoss/grimoirelab-graal#18

@sumitskj

sumitskj commented Apr 5, 2019

@valeriocos @jgbarah can you review my proposal https://docs.google.com/document/d/1K2i_nPKQqTCFxi6mNhQGll83Mr-C4KqTaK3W2tL7Qvk/edit?usp=sharing

@inishchith

inishchith commented Apr 9, 2019

I've submitted my Final Proposal for the project.
All the best to other candidates :)

@inishchith

@valeriocos @jgbarah @aswanipranjal
I thought of opening a discussion about defining the metrics related to source code. Is it okay if we open an issue at the metrics repository to discuss it further with the working group?

Let me know what you think :)
Thanks

@valeriocos
Member Author

Thank you @inishchith , that's a good idea in my opinion.
What do you think @jgbarah @aswanipranjal ?

@inishchith

Thanks for the response @valeriocos, and @aswanipranjal for the 👍.
I've added an issue: chaoss/metrics#139
Please do have a look when you get time.

Thanks

@valeriocos
Member Author

valeriocos commented Aug 28, 2019

A summary and details of the work carried out for this idea are available on @inishchith's blog.
A demo has also been produced; feel free to have a look at it: https://youtu.be/RXZeuJt0UXM.
