This repo is a catalog of S-index project outputs: code repositories, released datasets, and the materials needed to reproduce our results. We will keep it updated as the project progresses.
- S-index overview: github.com/data-s-index/overview
- Scholar Data web app: beta.scholardata.io
- All code: github.com/data-s-index
- Data and code archival (Zenodo): zenodo.org/communities/s-index
- Full catalog: see below
- Questions: GitHub Issue or bvhpatel@gmail.com
- Cite this work: Coming soon
| Resource | Type | What it contains | Status | Latest | License |
|---|---|---|---|---|---|
| Overview | Documentation | Details about the S-index formulation and calculation | Stable | v1.0.0 | CC-BY |
| S-index Parameters Analysis | Code | Jupyter notebook used to analyze how different parameters influence the S-index to support our design choice | Stable | v1.0.0 | MIT |
| S-index Real-World Testing and Validation | Code | Jupyter notebook used to analyze the results of processing 49M+ datasets and calculating the S-index of 1M+ researchers | Stable | v1.0.0 | MIT |
| S-index Data Collection and Processing Pipeline | Code | Python code and Jupyter notebooks developed to collect dataset metadata, calculate FAIR scores, find citations, identify mentions, assign research fields, and calculate the S-index for our large-scale testing with 49M+ datasets (and counting). A summary of the results is also included. | Stable | In continuous development | MIT |
| Dataset Research Field Classifier | Code | Code for the custom fine-tuned model we developed to assign research fields to datasets based on their metadata using the OpenAlex Topics taxonomy | Stable | v1.0.0 | MIT |
| F-UJI fork | Code | Fork of the F-UJI repository, with the code adapted for large-scale use | Stable | N/A | MIT |
| Scholar Data app | Code | Code of the Scholar Data web app (beta.scholardata.io) | Stable | In continuous development | MIT |
| Scholar Data dev | Code | Development pipeline for the Scholar Data app | Stable | In continuous development | MIT |
| S-index API | Code | API linking to our Dataset Registry | In continuous development | In continuous development | MIT |
| S-index Real-World Testing and Validation Dataset | Dataset | Dataset metadata, FAIR scores, citations, mentions, and research field data collected or generated during our real-world testing and validation of the S-index (NDJSON format files) | Stable | v1.0.0 | CC0 |
| S-index Real-World Testing and Validation DB Data | Dataset | A DuckDB database file with all the data collected, plus the Dataset Index and S-index calculated as part of our real-world validation. This database is needed to run the real-world analysis Jupyter notebook. | Stable | v1.0.0 | CC0 |
| Research Field Classifier Training Dataset | Dataset | Dataset used to fine-tune our research field classifier | Stable | v1.0.0 | CC0 |
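
The validation dataset above ships as NDJSON files, meaning one JSON object per line. As a rough starting point, the sketch below streams such a file record by record using only the Python standard library. The helper name `iter_ndjson` is hypothetical and is not part of any S-index repository. The sketch also makes no assumption about the field names in each record, so check the Zenodo record for the actual file names and schema.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_ndjson(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of an NDJSON file.

    Hypothetical helper for illustration; not taken from the S-index code.
    """
    with path.open("r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report the offending line so a corrupt record is easy to find.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err
```

Because the function is a generator, it reads one line at a time and does not load the whole file into memory, which helps with large files. For example, `for record in iter_ndjson(Path("your-file.ndjson")): ...` iterates over the records, where `your-file.ndjson` is a placeholder for a file downloaded from the Zenodo community.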