# BASIC DATACITE ANALYSIS

This notebook uses the [DataCite source file](../outputs/datacite/datacite_data_map.json) to explore EarthCube-funded resources (datasets, software, reports, and articles) registered with DataCite.

It produces two files in `../outputs/datacite/`:

* [datacite_data_map.csv](../outputs/datacite/datacite_data_map.csv)
* [datacite_table_01.md](../outputs/datacite/datacite_table_01.md)

In [1]:
import pandas as pd 

In [2]:
df = pd.read_json("../outputs/datacite/datacite_data_map.json")

In [3]:
print(df.describe().loc['count'].sort_values(ascending=False).to_markdown())

|         |   count |
|--------:|--------:|
| 1440351 |       3 |
| 1928393 |       2 |
| 1541390 |       1 |
| 1639683 |       1 |
| 1639764 |       1 |
| 1928406 |       1 |
| 1639694 |       1 |
| 1440066 |       1 |
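
The per-award counts above are the number of non-null cells in each column. As a minimal sketch, `DataFrame.count()` returns the same non-null counts without computing the rest of `describe()`. The toy frame, its award IDs, and its DOIs below are hypothetical stand-ins for `df`, not values from the source file.

```python
import pandas as pd

# Hypothetical stand-in for df: rows are DOIs, columns are award IDs,
# and each non-null cell holds a metadata dictionary.
toy = pd.DataFrame(
    {
        1111111: [{"dc_meta": {}}, {"dc_meta": {}}, None],
        2222222: [None, None, {"dc_meta": {}}],
    },
    index=["10.0000/example-a", "10.0000/example-b", "10.0000/example-c"],
)

# count() tallies the non-null cells per column, i.e. resources per award.
counts = toy.count().sort_values(ascending=False)
print(counts)
```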


In [4]:
df

Unnamed: 0,1440351,1541390,1639683,1639764,1928406,1639694,1928393,1440066
10.1594/ieda/100709,"{'cr_meta': {'type': 'dataset', 'id': 'https:/...",,,,,,,
10.1594/ieda/100691,"{'cr_meta': {'type': 'dataset', 'id': 'https:/...",,,,,,,
10.6084/m9.figshare.4272164.v1,"{'cr_meta': {'type': 'article-journal', 'id': ...",,,,,,,
10.18739/a24m9198b,,"{'cr_meta': {'type': 'dataset', 'id': 'https:/...",,,,,,
10.6084/m9.figshare.14848713.v1,,,"{'cr_meta': {'type': 'graphic', 'id': 'https:/...",,,,,
10.1594/pangaea.892680,,,,"{'cr_meta': {'type': 'dataset', 'id': 'https:/...",,,,
10.5281/zenodo.5496306,,,,,"{'cr_meta': {'type': 'book', 'id': 'https://do...",,,
10.5065/p2jj-9878,,,,,,"{'cr_meta': {'type': 'report', 'id': 'https://...",,
10.5281/zenodo.6369184,,,,,,,"{'cr_meta': {'type': 'book', 'id': 'https://do...",
10.5281/zenodo.4558266,,,,,,,"{'cr_meta': {'type': 'book', 'id': 'https://do...",


In [5]:
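# Load the normalized NSF award titles and give the columns short names.
nsf_project_titles = pd.read_csv("../outputs/nsf/nsfid_project_title_normed.csv")
nsf_project_titles.columns = ['nsfid', 'title']
# Display only: the result is not assigned, so 'nsfid' remains a regular column for the later merge.
nsf_project_titles.set_index('nsfid')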

nsfid,title
1639588,Collaborative Proposal: Earthcube Building Blo...
1639614,Collaborative Proposal: Earthcube Building Blo...
1740719,Collaborative Proposal: Earthcube Integration:...
1740683,Collaborative Proposal: Earthcube Integration:...
1740641,Collaborative Proposal: Earthcube Integration:...
...,...
1540994,Earthcubeia: Collaborative Proposal: Building ...
1340265,Ec3 - Earth-Centered Communication For Cyberin...
1928208,Geosciences Earthcube Community Office
1324760,Rcn: Building A Sediment Experimentalist Netwo...


In [6]:
df_tmp = pd.DataFrame()
for i in df.index:
    # The first non-null cell in the row holds the metadata; its column label is the NSF award ID.
    row = df.loc[i].dropna()
    nsfid = row.index.values[0]
    dc_meta = row.values[0]['dc_meta']

    doi = dc_meta['attributes']['doi']
    title = dc_meta['attributes']['titles'][0]['title']
    types = dc_meta['attributes']['types']

    if types['schemaOrg'] in ['Dataset', 'ScholarlyArticle', 'SoftwareSourceCode', 'Report']:
        # Specific schema.org types are used as-is.
        type_string = types['schemaOrg']
    elif types['schemaOrg'] == 'CreativeWork':
        # Generic CreativeWork: prefer the citeproc type when present.
        type_string = types.get('citeproc', types['schemaOrg']).capitalize()
    else:
        # Anything else: prefer the resourceType value when present.
        type_string = types.get('resourceType', types['schemaOrg'])

    # Append this record as a new column; the next cell transposes the result into rows.
    df_tmp = pd.concat([df_tmp, pd.Series([nsfid, doi, title, type_string])], axis=1)

In [7]:
# Transpose so each record is a row, then replace the repeated index with 0..n-1.
df_tmp = df_tmp.T.reset_index(drop=True)
df_tmp.columns = ['nsfid', 'doi', 'resource_title', 'type']
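
The type-selection rule used in the loop above can be isolated into a small function, which makes it easier to check on its own. This is a sketch of that rule only; the function name `resolve_type` and the example dictionaries are hypothetical and are not taken from the DataCite file.

```python
# schema.org types that are specific enough to report directly.
SPECIFIC_TYPES = {"Dataset", "ScholarlyArticle", "SoftwareSourceCode", "Report"}


def resolve_type(types):
    """Pick a display type from a DataCite-style 'types' dictionary."""
    schema_org = types["schemaOrg"]
    if schema_org in SPECIFIC_TYPES:
        return schema_org
    if schema_org == "CreativeWork":
        # Generic CreativeWork: fall back to the citeproc type if it exists.
        return types.get("citeproc", schema_org).capitalize()
    # Anything else: fall back to resourceType if it exists.
    return types.get("resourceType", schema_org)


# Hypothetical inputs, one per branch.
examples = [
    {"schemaOrg": "Dataset", "resourceType": "ignored"},
    {"schemaOrg": "CreativeWork", "citeproc": "article"},
    {"schemaOrg": "SomeOtherType", "resourceType": "Poster"},
]
for example in examples:
    print(resolve_type(example))
```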

In [8]:
# Inner join on the shared 'nsfid' column to attach each award's project title.
df_datacite_data = df_tmp.merge(nsf_project_titles, on='nsfid')
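
Because `merge` performs an inner join by default, any resource whose award ID is missing from the titles table is dropped without warning. One way to check for this, sketched here on hypothetical toy frames rather than the notebook's data, is a left join with `indicator=True`; the `validate` argument additionally asserts that each award ID appears at most once in the titles table.

```python
import pandas as pd

# Hypothetical stand-ins for df_tmp and nsf_project_titles.
resources = pd.DataFrame(
    {
        "nsfid": [1111111, 1111111, 9999999],
        "doi": ["10.0000/example-a", "10.0000/example-b", "10.0000/example-c"],
    }
)
projects = pd.DataFrame(
    {"nsfid": [1111111, 2222222], "title": ["Project A", "Project B"]}
)

# A left join keeps every resource; the _merge column records which rows found a title.
checked = resources.merge(
    projects, on="nsfid", how="left", indicator=True, validate="many_to_one"
)
unmatched = checked.loc[checked["_merge"] == "left_only", "nsfid"].tolist()
print(unmatched)
```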

In [9]:
df_datacite_data

Unnamed: 0,nsfid,doi,resource_title,type,title
0,1440351,10.1594/ieda/100709,iSamples Sample Management Training Module for...,Dataset,Earthcube Rcn: Isamples: The Internet Of Sampl...
1,1440351,10.1594/ieda/100691,iSamples Sample Management Training Module for...,Dataset,Earthcube Rcn: Isamples: The Internet Of Sampl...
2,1440351,10.6084/m9.figshare.4272164.v1,iSamples user stories: common themes and areas...,ScholarlyArticle,Earthcube Rcn: Isamples: The Internet Of Sampl...
3,1541390,10.18739/a24m9198b,Estimating the Freshwater Flux from the Greenl...,Dataset,Earthcube Rcn: Collaborative Research: Engagin...
4,1639683,10.6084/m9.figshare.14848713.v1,Intelligent Databases and Machine-Learning Ana...,Poster,Earthcube Data Infrastructure: Intelligent Dat...
5,1639764,10.1594/pangaea.892680,"Land2Sea database, Version 2.0",Dataset,Earthcube Building Blocks: Collaborative Propo...
6,1928406,10.5281/zenodo.5496306,Mapping ice flow velocity using an easy and in...,SoftwareSourceCode,Collaborative Research: Earthcube Data Capabil...
7,1639694,10.5065/p2jj-9878,Proceedings of the 2020 Improving Scientific S...,Report,Earthcube Building Blocks: Collaborative Propo...
8,1928393,10.5281/zenodo.6369184,QGreenland,SoftwareSourceCode,Earthcube Data Capabilities: Qgreenland: Enabl...
9,1928393,10.5281/zenodo.4558266,nsidc/qgreenland: v1.0.1,SoftwareSourceCode,Earthcube Data Capabilities: Qgreenland: Enabl...


In [10]:
df_datacite_data.nsfid.value_counts()

1440351    3
1928393    2
1541390    1
1639683    1
1639764    1
1928406    1
1639694    1
1440066    1
Name: nsfid, dtype: int64

In [11]:
df_datacite_data.type.value_counts()

Dataset               4
SoftwareSourceCode    3
ScholarlyArticle      1
Poster                1
Report                1
Article               1
Name: type, dtype: int64

In [12]:
with open("../outputs/datacite/datacite_table_01.md", "w") as fo:
    fo.write("| Resource Type |  EC Project | Resource Title |\n")
    fo.write("|:---:|:----|:----|\n")
    for r in df_datacite_data.itertuples():
        fo.write(
            f"| {r.type}" +
            f"|{r.title} (NSF [#{r.nsfid}](https://nsf.gov/awardsearch/showAward?AWD_ID={r.nsfid}&HistoricalAwards=false))" +
            f"| {r.resource_title} (doi: [{r.doi}](https://doi.org/{r.doi})) |\n"
            )

| Resource Type |  EC Project | Resource Title |
|:---:|:----|:----|
| Dataset |Earthcube Rcn: Isamples: The Internet Of Samples In The Earth Sciences (NSF [#1440351](https://nsf.gov/awardsearch/showAward?AWD_ID=1440351&HistoricalAwards=false)) | iSamples Sample Management Training Module for Soil Cores (doi: [10.1594/ieda/100709](https://doi.org/10.1594/ieda/100709)) |
| Dataset |Earthcube Rcn: Isamples: The Internet Of Samples In The Earth Sciences (NSF [#1440351](https://nsf.gov/awardsearch/showAward?AWD_ID=1440351&HistoricalAwards=false)) | iSamples Sample Management Training Module for Rock Outcrop Samples (doi: [10.1594/ieda/100691](https://doi.org/10.1594/ieda/100691)) |
| ScholarlyArticle |Earthcube Rcn: Isamples: The Internet Of Samples In The Earth Sciences (NSF [#1440351](https://nsf.gov/awardsearch/showAward?AWD_ID=1440351&HistoricalAwards=false)) | iSamples user stories: common themes and areas for future work (doi: [10.6084/m9.figshare.4272164.v1](https://doi.org/10.6084/m9.figshare.4272164.v1)) |
| Dataset |Earthcube Rcn: Collaborative Research: Engaging The Greenland Ice Sheet Ocean (Griso) Science Network (NSF [#1541390](https://nsf.gov/awardsearch/showAward?AWD_ID=1541390&HistoricalAwards=false)) | Estimating the Freshwater Flux from the Greenland Ice Sheet Workshop Report, American Geophysical Union, 2018 (doi: [10.18739/a24m9198b](https://doi.org/10.18739/a24m9198b)) |
| Poster |Earthcube Data Infrastructure: Intelligent Databases And Analysis Tools For Geospace Data (NSF [#1639683](https://nsf.gov/awardsearch/showAward?AWD_ID=1639683&HistoricalAwards=false)) | Intelligent Databases and Machine-Learning Analysis Tools for Heliophysics (doi: [10.6084/m9.figshare.14848713.v1](https://doi.org/10.6084/m9.figshare.14848713.v1)) |
| Dataset |Earthcube Building Blocks: Collaborative Proposal: Earthcube Data Discovery Hub (NSF [#1639764](https://nsf.gov/awardsearch/showAward?AWD_ID=1639764&HistoricalAwards=false)) | Land2Sea database, Version 2.0 (doi: [10.1594/pangaea.892680](https://doi.org/10.1594/pangaea.892680)) |
| SoftwareSourceCode |Collaborative Research: Earthcube Data Capabilities--Jupyter Meets The Earth: Enabling Discovery In Geoscience Through Interactive Computing At Scale (NSF [#1928406](https://nsf.gov/awardsearch/showAward?AWD_ID=1928406&HistoricalAwards=false)) | Mapping ice flow velocity using an easy and interactive feature tracking workflow (doi: [10.5281/zenodo.5496306](https://doi.org/10.5281/zenodo.5496306)) |
| Report |Earthcube Building Blocks: Collaborative Proposal: The Power Of Many: Ensemble Toolkit For Earth Sciences (NSF [#1639694](https://nsf.gov/awardsearch/showAward?AWD_ID=1639694&HistoricalAwards=false)) | Proceedings of the 2020 Improving Scientific Software Conference (doi: [10.5065/p2jj-9878](https://doi.org/10.5065/p2jj-9878)) |
| SoftwareSourceCode |Earthcube Data Capabilities: Qgreenland: Enabling Science Through Gis (NSF [#1928393](https://nsf.gov/awardsearch/showAward?AWD_ID=1928393&HistoricalAwards=false)) | QGreenland (doi: [10.5281/zenodo.6369184](https://doi.org/10.5281/zenodo.6369184)) |
| SoftwareSourceCode |Earthcube Data Capabilities: Qgreenland: Enabling Science Through Gis (NSF [#1928393](https://nsf.gov/awardsearch/showAward?AWD_ID=1928393&HistoricalAwards=false)) | nsidc/qgreenland: v1.0.1 (doi: [10.5281/zenodo.4558266](https://doi.org/10.5281/zenodo.4558266)) |
| Article |Earthcube Rcn: An Earthcube Oceanography And Geobiology Environmental Omics Research Coordination Network (Ecogeo Rcn) (NSF [#1440066](https://nsf.gov/awardsearch/showAward?AWD_ID=1440066&HistoricalAwards=false)) | EarthCube Oceanography and Geobiology Environmental 'Omics Research Coordination Network Workshop 1 Report (doi: [10.13140/rg.2.1.4908.4561](https://doi.org/10.13140/rg.2.1.4908.4561)) |
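
The table above is written by concatenating raw titles into Markdown rows, so a title containing `|` or a line break would break the table layout. A minimal sketch of a guard follows; the helper names `md_cell` and `md_row` are hypothetical, and the sample values are made up for illustration.

```python
def md_cell(text):
    """Make a value safe to place inside one cell of a Markdown pipe table."""
    # Collapse line breaks and runs of whitespace to single spaces, then escape pipes.
    return " ".join(str(text).split()).replace("|", r"\|")


def md_row(*cells):
    """Join already-plain values into one Markdown table row."""
    return "| " + " | ".join(md_cell(cell) for cell in cells) + " |"


# Hypothetical values that would otherwise corrupt the table.
print(md_row("Dataset", "Hypothetical project | phase 2", "Example\nresource title"))
```

Only plain-text fields such as titles should pass through `md_cell`; Markdown link syntax can be assembled afterwards from the escaped text.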

In [13]:
df_datacite_data.to_csv("../outputs/datacite/datacite_data_map.csv", index=False)