This repository provides the replication package for the paper "An Empirical Study of the Maturity in the Eclipse Modeling Ecosystem", submitted to the International Conference on Model Driven Engineering Languages and Systems and currently under review.
The repository is organized in several folders containing the main artefacts developed to conduct our study. In the following, we briefly describe the content of each folder.
This folder includes the set of SQL-based metrics used in our study. The metrics are implemented as SQL queries, and their results are stored in database tables for efficiency. We organized the metrics according to the dimension of the maturity model they assess:
metrics-ecosystem.sql, which includes all the metrics related to the ecosystem dimension of our maturity model. The table collecting the results for the projects is called
metrics-product.sql, which includes all the metrics related to the product dimension of our maturity model. The table collecting the results for the projects is called
This folder also includes the following scripts:
metrics-descriptives.sql, which includes a set of SQL statements to calculate some descriptive information of the dataset (e.g., number of files or authors).
metric-bus-factor, which includes the metric for calculating the bus factor in the projects (related to the extensions of our model). The table collecting the results for the projects is called
metrics-forum-activity, which includes the metrics used to calculate activity metrics in forums (related to the extensions of our model). The table collecting the results for the projects is called
All the scripts include annotations describing what is being done (e.g., tables to be created, queries to execute, etc.). Thus, the SQL statements can be executed individually, or each file can be executed as a whole in a single batch. Executing a script creates a set of tables in the database with the results of the metrics, including the table with the results for each project (as indicated before).
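The two ways of running an annotated script (statement by statement, or the whole file as a batch) can be sketched as follows. This is only an illustration: it uses SQLite from the Python standard library as a stand-in, whereas the database system used in the study may differ, and the script text, the table `demo_metric_results`, and its columns are hypothetical names invented for the example.

```python
import sqlite3

# Hypothetical stand-in for one of the annotated metric scripts; the table
# and column names are invented for illustration only.
SCRIPT = """
-- create the table holding the per-project results
CREATE TABLE demo_metric_results (project TEXT, commits INTEGER);
-- populate it
INSERT INTO demo_metric_results VALUES ('project-a', 120);
INSERT INTO demo_metric_results VALUES ('project-b', 45);
"""


def run_as_batch(conn, script):
    """Execute the whole script in one call."""
    conn.executescript(script)


def run_individually(conn, script):
    """Execute the script one statement at a time.

    Comment-only lines are dropped first. Splitting on ';' is naive: it only
    works when no statement contains a semicolon inside a string literal.
    """
    lines = [ln for ln in script.splitlines() if not ln.strip().startswith("--")]
    for statement in "\n".join(lines).split(";"):
        if statement.strip():
            conn.execute(statement)
    conn.commit()


def fetch_results(conn):
    """Read back the per-project result table created by the script."""
    return conn.execute(
        "SELECT project, commits FROM demo_metric_results ORDER BY project"
    ).fetchall()


batch_conn = sqlite3.connect(":memory:")
run_as_batch(batch_conn, SCRIPT)

step_conn = sqlite3.connect(":memory:")
run_individually(step_conn, SCRIPT)

batch_rows = fetch_results(batch_conn)
step_rows = fetch_results(step_conn)
```

Both paths leave the same result table behind, so the choice mainly depends on whether you want to inspect intermediate tables while stepping through the annotations.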
This folder includes the main R scripts used to generate the graphs of the paper and to perform the statistical tests. In particular:
metric-descriptives-generator.R, which generates the graphs for the descriptive information of the dataset.
metric-ecosystem-generator.R, which generates the graphs for the ecosystem metrics.
metric-product-generator.R, which generates the graphs for the product metrics.
metric-extensions-generator.R, which generates the graphs for the metrics defined as extensions to our model.
individual.R, which includes the procedure followed to study the distribution of the metrics.
Last but not least, the previous scripts rely on some utility functions that are defined in the following scripts:
database.R, which includes some functions to configure the database access and information retrieval. The information is collected from the main tables created by the SQL scripts and SONAR tables (see the corresponding sections of this README).
utils.R, which includes some utility functions to generate the graphs of the paper and imports some libraries required for the analysis.
Each file has been annotated to describe what we are doing in each step.
This folder includes the database dump of the dataset used in the study. Because of space limitations in GitHub, we only provide the dump of the tables with the metric results of the Eclipse projects:
metrics.sql, which includes the main tables generated by the metrics.
metric_bus_factor.sql, which includes the main tables generated by the metrics defined to calculate the bus factor.
metric_forum_jdt.sql, which includes the main tables generated by the metrics defined to be calculated in forums, applied to JDT forum.
metric_forum_papyrus.sql, which includes the main tables generated by the metrics defined to be calculated in forums, applied to the Papyrus forum.
Note that all the SQL scripts include the CREATE SCHEMA statement.
The results of the metrics are also provided as CSV files:
metrics_ecosystem.csv, which includes the metric values for the ecosystem dimension.
metrics_product.csv, which includes the metric values for the product dimension.
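For a quick look at these CSV files outside R, a minimal Python sketch such as the following may help. It assumes comma-separated files with a header row; the column names `project`, `metric_a`, and `metric_b` and the sample values are hypothetical, since the actual columns are not listed here.

```python
import csv
import io
import statistics


def summarize_numeric_columns(csv_file):
    """Return {column: (count, median)} for each fully numeric column.

    Empty cells are skipped; a column with any non-numeric value is ignored.
    """
    reader = csv.DictReader(csv_file)
    columns = {name: [] for name in (reader.fieldnames or [])}
    non_numeric = set()
    for row in reader:
        for name, raw in row.items():
            # name is None for surplus cells, raw is None for missing cells
            if name is None or raw is None or name in non_numeric:
                continue
            if raw.strip() == "":
                continue
            try:
                columns[name].append(float(raw))
            except ValueError:
                non_numeric.add(name)
    return {
        name: (len(values), statistics.median(values))
        for name, values in columns.items()
        if name not in non_numeric and values
    }


# Invented sample standing in for a metrics CSV file.
SAMPLE = """project,metric_a,metric_b
project-a,1.0,10
project-b,3.0,
project-c,2.0,30
"""

summary = summarize_numeric_columns(io.StringIO(SAMPLE))
```

To apply it to one of the provided files, pass an open file handle instead of the sample, for example the result of `open("metrics_ecosystem.csv", newline="")`, adjusting the path to wherever the file sits in your checkout.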
We can also provide the database dumps of the analysis of the Eclipse projects performed by Gitana (including the full information coming from the Eclipse Git repositories) and Sonar upon request. Feel free to contact us.
Also, note that providing access to the full dataset, including Git metadata, could raise privacy concerns (as has already happened in other projects such as GHTorrent). Thus, if you want access to the full dataset, we may ask you to follow some privacy rules.
The folder also includes the files:
full_eclipse_projects.csv, including the full list of Eclipse projects collected and the selection.
project_type.csv, including the list of projects considered in the study, their purpose (modeling vs. non-modeling) and status (incubation vs. non-incubation).
repositories.csv, including the list of repositories considered per project.
This folder includes the set of Python scripts used to collect the projects and launch Sonar.
This folder includes two main figures of the paper for further analysis, in particular:
boxplots-modeling-nonmodeling.pdf, which includes the boxplots for each metric according to the project purpose (i.e., modeling vs. non-modeling).
boxplots-incubation-nonincubation.pdf, which includes the boxplots for each metric according to the project status (i.e., incubation vs. non-incubation).
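The numbers behind a grouped boxplot can be approximated with a five-number summary per group, as in the sketch below. This is not the procedure of the R scripts: the quartile convention (Python's "inclusive" method) is an assumption and plotting tools may use a slightly different one, and the metric values are invented for illustration.

```python
import statistics
from collections import defaultdict


def five_number_summary(values):
    """Minimum, quartiles, and maximum of a sample with at least two values."""
    ordered = sorted(values)
    # Assumption: "inclusive" quartiles; other tools may interpolate differently.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


def summarize_by_group(records):
    """Group (label, value) pairs by label and summarize each group."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    return {label: five_number_summary(vals) for label, vals in groups.items()}


# Invented metric values, one (group, value) pair per project.
records = [
    ("modeling", 1), ("modeling", 2), ("modeling", 3),
    ("modeling", 4), ("modeling", 5),
    ("non-modeling", 10), ("non-modeling", 20), ("non-modeling", 30),
    ("non-modeling", 40), ("non-modeling", 50),
]

group_summary = summarize_by_group(records)
```

The same function works for the incubation vs. non-incubation split by changing the group labels.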
This folder includes the file
graphs.gephi, which includes the graphs generated for the JDT and Papyrus projects in Gephi format.
Who is behind this
This study and the companion artefacts have been developed by:
Javier, Valerio and Jordi are members of SOM, a research team of IN3-UOC.