# BASIC JOURNAL ANALYSIS

This notebook uses _frozen_ (static) metadata files from summer 2022 to analyze the journal-level citation impact of EarthCube (EC) publications.


It broadly aims to explore:

1. the proportion of EC papers that have received more citations than the average paper in the same journal and year;
2. the cumulative citation count of this group, and whether it exceeds the sum of the corresponding journal/year averages;
3. the top 10 papers by percentage above the average for their journal/year.
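Item 2 amounts to comparing two sums over a per-paper table. The sketch below makes some assumptions. The column names `ec_cites` and `journal_year_avg`, the DOIs, and all values are hypothetical toy data, not the notebook's inputs. The function can be applied to all EC papers or only to the subset identified in item 1.

```python
import pandas as pd

# Hypothetical toy data (not from the notebook): one row per EC paper with its
# actual citation count and the average citation count for its journal/year.
papers = pd.DataFrame({
    "doi": ["10.0000/a", "10.0000/b", "10.0000/c"],
    "ec_cites": [40, 3, 12],
    "journal_year_avg": [10.0, 8.0, 12.0],
})


def cumulative_comparison(df: pd.DataFrame) -> dict:
    """Compare the summed EC citations with the summed journal/year averages."""
    total_ec = int(df["ec_cites"].sum())
    total_expected = float(df["journal_year_avg"].sum())
    return {
        "total_ec_cites": total_ec,
        "total_expected_cites": total_expected,
        "ec_exceeds_expected": total_ec > total_expected,
    }


result = cumulative_comparison(papers)
print(result)
```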
  

Fixed inputs to this notebook are:

* [../inputs/20220805_ec_journal_titles_plus_citations.xlsx](../inputs/20220805_ec_journal_titles_plus_citations.xlsx): journal citation data exported from Web of Science (WOS)
* [../inputs/cr_metadata_20220610012125.json](../inputs/cr_metadata_20220610012125.json): a fixed snapshot of metadata extracted from Crossref on _June 10, 2022_.

**NOTE:** We use fixed input snapshots because both sources change and update on different schedules. Furthermore, WOS data requires a subscription, which makes dynamic replication difficult.

In [1]:
import pandas as pd
import json

In [2]:
df = pd.read_excel("../inputs/20220805_ec_journal_titles_plus_citations.xlsx")

In [3]:
df

Unnamed: 0.1,Unnamed: 0,doi,journal_title,publication_title,url,Year,Journal_Avg_cit
0,15,10.5065/p2jj-9878,--,--,https://doi.org/10.5065/p2jj-9878,,NI
1,79,10.1594/ieda/100709,--,--,https://doi.org/10.1594/ieda/https://doi.org/1...,,NI
2,81,10.5281/zenodo.5496306,--,--,https://doi.org/10.5281/zenodo.5496306,,NI
3,93,10.13140/rg.2.1.4908.4561,--,--,https://doi.org/10.13140/rg.2.1.4908.4561,,NI
4,109,10.6084/m9.figshare.4272164.v1,--,--,https://doi.org/10.6084/m9.figshare.4272164.v1,,NI
...,...,...,...,...,...,...,...
236,102,10.1111/tgis.12233,Transactions in GIS,Crowdsensing smart ambient environments and se...,https://doi.org/10.1111/tgis.12233,2016.0,13.08
237,219,10.1002/2015wr017342,Water Resources Research,Hydrocomplexity: Addressing water security and...,https://doi.org/10.https://doi.org/10.2/2015wr...,2015.0,40.97
238,27,10.22498/pages,,Past Global Changes Magazine,https://doi.org/10.22498/pages,,NI
239,60,10.17504/protocols.io.fjjbkkn,,ECOGEO 'Omics Training: Introduction to Enviro...,https://doi.org/10.17504/protocols.io.fjjbkkn,,NI


In [4]:
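# NOTE: 'NI' values cannot be cast to float, so errors='ignore' returns the column unchanged
# (still object dtype); the 'NI' rows are filtered out explicitly below.
df['Journal_Avg_cit'] = df['Journal_Avg_cit'].astype(float, errors='ignore')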
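Because `astype(float, errors='ignore')` returns the original column when any value cannot be converted, `Journal_Avg_cit` stays an object column that still contains `'NI'` (as the next output shows). Below is a minimal sketch of an alternative. It assumes `'NI'` is the only non-numeric placeholder and uses `pd.to_numeric(..., errors='coerce')` on a toy series. Adopting it would also require changing the later `!= 'NI'` filters to `notna()`.

```python
import pandas as pd

# Toy column mixing WOS averages with the "NI" placeholder, as in Journal_Avg_cit
raw = pd.Series([13.08, "NI", 40.97, "NI"], dtype=object)

# errors="coerce" turns unparseable strings such as "NI" into NaN,
# so the result is a float64 column and NaN rows can be dropped explicitly
avg = pd.to_numeric(raw, errors="coerce")
known = avg.dropna()

print(avg.dtype)        # float64
print(known.tolist())   # [13.08, 40.97]
```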

In [5]:
df

Unnamed: 0.1,Unnamed: 0,doi,journal_title,publication_title,url,Year,Journal_Avg_cit
0,15,10.5065/p2jj-9878,--,--,https://doi.org/10.5065/p2jj-9878,,NI
1,79,10.1594/ieda/100709,--,--,https://doi.org/10.1594/ieda/https://doi.org/1...,,NI
2,81,10.5281/zenodo.5496306,--,--,https://doi.org/10.5281/zenodo.5496306,,NI
3,93,10.13140/rg.2.1.4908.4561,--,--,https://doi.org/10.13140/rg.2.1.4908.4561,,NI
4,109,10.6084/m9.figshare.4272164.v1,--,--,https://doi.org/10.6084/m9.figshare.4272164.v1,,NI
...,...,...,...,...,...,...,...
236,102,10.1111/tgis.12233,Transactions in GIS,Crowdsensing smart ambient environments and se...,https://doi.org/10.1111/tgis.12233,2016.0,13.08
237,219,10.1002/2015wr017342,Water Resources Research,Hydrocomplexity: Addressing water security and...,https://doi.org/10.https://doi.org/10.2/2015wr...,2015.0,40.97
238,27,10.22498/pages,,Past Global Changes Magazine,https://doi.org/10.22498/pages,,NI
239,60,10.17504/protocols.io.fjjbkkn,,ECOGEO 'Omics Training: Introduction to Enviro...,https://doi.org/10.17504/protocols.io.fjjbkkn,,NI


In [6]:
df_subset = df[df['Journal_Avg_cit']!='NI']

In [7]:
df_subset[df_subset['Journal_Avg_cit']>30]

Unnamed: 0.1,Unnamed: 0,doi,journal_title,publication_title,url,Year,Journal_Avg_cit
45,134,10.1145/3129246,ACM Transactions on Database Systems,EmptyHeaded,https://doi.org/10.1145/3129246,2017.0,35.65
53,7,10.1175/bams-d-14-00164.1,Bulletin of the American Meteorological Society,The Earth System Prediction Suite: Toward a Co...,https://doi.org/10.1175/bams-d-14-00164.1,2016.0,30.85
54,65,10.1175/bams-d-15-00239.1,Bulletin of the American Meteorological Society,Sharing Experiences and Outlook on Coupling Te...,https://doi.org/10.1175/bams-d-15-00239.1,2016.0,30.85
70,147,10.1007/s40641-018-0107-0,Current Climate Change Reports,Rising Oceans Guaranteed: Arctic Land Ice Loss...,https://doi.org/10.https://doi.org/10.7/s40641...,2018.0,43.0
109,106,10.1186/s13073-015-0202-y,Genome Medicine,Use of semantic workflows to enhance transpare...,https://doi.org/10.1186/s13073-015-0202-y,2015.0,42.88
110,234,10.1111/gfl.12114,Geofluids,DigitalCrust - a 4D data system of material pr...,https://doi.org/10.1111/gfl.12114,2015.0,30.45
133,82,10.1109/tgrs.2014.2382566,IEEE Transactions on Geoscience and Remote Sen...,Regular Shape Similarity Index: A Novel Index ...,https://doi.org/10.1https://doi.org/10./tgrs.2...,2015.0,43.05
178,225,10.1038/nbt.4306,Nature Biotechnology,Minimum Information about an Uncultivated Viru...,https://doi.org/10.https://doi.org/10.8/nbt.4306,2019.0,67.85
179,47,10.1038/s41561-018-0272-8,Nature Geoscience,Similarity of fast and slow earthquakes illumi...,https://doi.org/10.https://doi.org/10.8/s41561...,2019.0,39.7
206,92,10.1016/j.renene.2017.02.052,Renewable Energy,Short-term photovoltaic power forecasting usin...,https://doi.org/10.https://doi.org/10.6/j.rene...,2017.0,32.19


In [8]:
df_json = pd.read_json("../inputs/cr_metadata_20220610012125.json").T

In [9]:
df_json.columns

Index(['indexed', 'reference-count', 'publisher', 'content-domain',
       'short-container-title', 'published-print', 'DOI', 'type', 'created',
       'source', 'is-referenced-by-count', 'title', 'prefix', 'author',
       'member', 'event', 'container-title', 'original-title', 'link',
       'deposited', 'score', 'resource', 'subtitle', 'short-title', 'issued',
       'references-count', 'URL', 'relation', 'published', 'issue', 'license',
       'funder', 'update-policy', 'volume', 'published-online', 'reference',
       'language', 'journal-issue', 'alternative-id', 'archive', 'ISSN',
       'issn-type', 'subject', 'assertion', 'abstract', 'page',
       'published-other', 'accepted', 'publisher-location', 'editor',
       'article-number', 'posted', 'subtype', 'isbn-type', 'ISBN',
       'institution', 'group-title'],
      dtype='object')

In [10]:
df_journals = df.merge(
    df_json[['DOI', 'is-referenced-by-count']].reset_index(drop=True).rename(columns={'DOI': 'doi'}),
    on='doi'
)
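The merge above is an inner join (the `merge` default) on exact `doi` strings. A paper whose DOI differs only in case or surrounding whitespace, or one that is missing from the Crossref snapshot, is therefore dropped without warning. The sketch below is a more defensive variant. It assumes case and whitespace are the only differences worth normalizing (DOIs are case-insensitive). The tables and DOIs are toy placeholders.

```python
import pandas as pd

# Toy stand-ins for the WOS table (df) and the Crossref table (df_json); DOIs are placeholders
wos = pd.DataFrame({"doi": ["10.0000/abc", "10.0000/def", "10.0000/missing"]})
crossref = pd.DataFrame({
    "DOI": ["10.0000/ABC", " 10.0000/def"],
    "is-referenced-by-count": [16, 25],
})

# DOIs are case-insensitive, so normalize case and whitespace on both sides
crossref = crossref.rename(columns={"DOI": "doi"})
crossref["doi"] = crossref["doi"].str.strip().str.lower()
wos["doi"] = wos["doi"].str.strip().str.lower()

# how="left" keeps every WOS row; indicator=True adds a _merge column flagging unmatched rows
merged = wos.merge(crossref, on="doi", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "doi"].tolist()
print(unmatched)  # ['10.0000/missing']
```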

## Journal Citation Counts vs Actual Citation Counts

This section compares each EC paper's actual citation count (Crossref `is-referenced-by-count`) with its expected citation count, i.e., the WOS average for its journal and publication year.

In [11]:
df_journal_cva = df_journals.query('Journal_Avg_cit != "NI"') \
    [["Journal_Avg_cit", "Year", "journal_title", "is-referenced-by-count", 'doi']].copy()

# note: astype(int) truncates toward zero (e.g. 30.85 -> 30); it does not round
df_journal_cva['Journal_Avg_cit'] = df_journal_cva['Journal_Avg_cit'].astype(int)
df_journal_cva['is-referenced-by-count'] = df_journal_cva['is-referenced-by-count'].astype(int)
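`astype(int)` truncates the journal averages instead of rounding them. The outputs reflect this: the 2016 Bulletin of the American Meteorological Society average of 30.85 appears as 30 after the cast. If rounding were intended, a minimal sketch would look like the one below. The values are taken from the WOS table above. Keeping the averages as floats is another option.

```python
import pandas as pd

avg = pd.Series([30.85, 35.65, 13.08])  # averages as they appear in the WOS table above

truncated = avg.astype(int)        # drops the fractional part: [30, 35, 13]
# Series.round() rounds exact halves to even (NumPy semantics); no .5 cases occur here
rounded = avg.round().astype(int)  # nearest integer: [31, 36, 13]

print(truncated.tolist(), rounded.tolist())
```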

In [12]:
df_journal_cva.describe()

Unnamed: 0,Journal_Avg_cit,Year,is-referenced-by-count
count,150.0,150.0,150.0
mean,14.96,2018.5,24.173333
std,15.534279,2.071701,47.443042
min,0.0,2013.0,0.0
25%,5.0,2017.0,4.0
50%,12.0,2019.0,10.0
75%,20.75,2020.0,21.75
max,126.0,2022.0,344.0


In [13]:
df_journal_cva[df_journal_cva['is-referenced-by-count']>=df_journal_cva['Journal_Avg_cit']]

Unnamed: 0,Journal_Avg_cit,Year,journal_title,is-referenced-by-count,doi
32,35,2017.0,ACM Transactions on Database Systems,43,10.1145/3129246
33,3,2018.0,AI Magazine,13,10.1609/aimag.v39i3.2816
34,0,2022.0,Applied Geochemistry,0,10.1016/j.apgeochem.2022.105273
38,9,2020.0,Biogeosciences,14,10.5194/bg-17-2537-2020
39,19,2018.0,BioScience,77,10.1093/biosci/biy068
...,...,...,...,...,...
211,9,2020.0,The Astrophysical Journal,22,10.3847/1538-4357/aba8a6
215,3,2021.0,The Astrophysical Journal,4,10.3847/1538-4357/abf2c8
217,0,2022.0,The Cryosphere,0,10.5194/tc-16-1431-2022
222,13,2016.0,Transactions in GIS,31,10.1111/tgis.12232
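
The filter above lists the papers at or above their journal/year average but does not report how many there are. The sketch below computes the count and the proportion (item 1 of the aims) from a boolean mask, using five rows copied from `df_journal_cva`. Note that it uses `>=`, matching the filter, whereas item 1 is phrased as "more citations than".

```python
import pandas as pd

# Five rows copied from df_journal_cva (citations vs. truncated journal/year average)
df = pd.DataFrame({
    "is-referenced-by-count": [43, 13, 0, 0, 14],
    "Journal_Avg_cit": [35, 3, 0, 2, 9],
})

# One boolean per paper; its sum counts the papers at or above the average
# and its mean is the proportion
at_or_above = df["is-referenced-by-count"] >= df["Journal_Avg_cit"]
n_at_or_above = int(at_or_above.sum())
share = float(at_or_above.mean())

print(n_at_or_above, f"{share:.1%}")  # 4 80.0%
```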


In [14]:
df_journal_cva[['Journal_Avg_cit', 'Year', 'journal_title']].sort_values(by='journal_title')

Unnamed: 0,Journal_Avg_cit,Year,journal_title
32,35,2017.0,ACM Transactions on Database Systems
33,3,2018.0,AI Magazine
34,0,2022.0,Applied Geochemistry
35,2,2021.0,Atmospheric Measurement Techniques
39,19,2018.0,BioScience
...,...,...,...
217,0,2022.0,The Cryosphere
223,13,2016.0,Transactions in GIS
222,13,2016.0,Transactions in GIS
224,40,2015.0,Water Resources Research


### JOURNAL ANALYSIS

* What is the average journal citation count for each journal (across all years)?
* Grouped by journal, what are the average citation counts of the EC papers?
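The cells below answer these questions with separate `count()` and `mean()` calls. A compact alternative is named aggregation (`groupby(...).agg(name=(column, func))`), which builds the per-journal paper count and both means in one table. The sketch uses hypothetical journal names and values shaped like `df_journal_cva`.

```python
import pandas as pd

# Toy rows shaped like df_journal_cva (journal names and values are hypothetical)
df = pd.DataFrame({
    "journal_title": ["J1", "J1", "J2", "J3", "J3", "J3"],
    "Journal_Avg_cit": [10, 12, 5, 0, 2, 4],
    "is-referenced-by-count": [20, 4, 7, 1, 0, 5],
})

# One pass with named aggregations instead of separate count/mean cells
summary = (
    df.groupby("journal_title")
      .agg(
          paper_counts=("Journal_Avg_cit", "count"),
          journal_mean_cites=("Journal_Avg_cit", "mean"),
          ec_mean_cites=("is-referenced-by-count", "mean"),
      )
      .sort_values("paper_counts", ascending=False)
)
print(summary)
```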

In [15]:
journals = df_journal_cva['journal_title'].unique()

In [16]:
df_journal_cva[['Journal_Avg_cit', 'journal_title']].groupby('journal_title').mean()


Unnamed: 0_level_0,Journal_Avg_cit
journal_title,Unnamed: 1_level_1
ACM Transactions on Database Systems,35.0
AI Magazine,3.0
Applied Geochemistry,0.0
Atmospheric Measurement Techniques,2.0
BioScience,19.0
...,...
The Astrophysical Journal Supplement Series,17.0
The Cryosphere,0.0
Transactions in GIS,13.0
Water Resources Research,40.0


In [17]:
df_journal_cva[['Journal_Avg_cit', 'journal_title']].groupby('journal_title').count().sort_values(by='Journal_Avg_cit', ascending=False)

Unnamed: 0_level_0,Journal_Avg_cit
journal_title,Unnamed: 1_level_1
The Astrophysical Journal,12
Environmental Modelling & Software,8
ISPRS International Journal of Geo-Information,6
Computers & Geosciences,5
Journal of Geophysical Research: Space Physics,5
...,...
International Journal of Digital Earth,1
International Journal of Remote Sensing,1
International Journal of Semantic Computing,1
Journal of Atmospheric and Oceanic Technology,1


In [18]:
df_journal_density_counts = \
    df_journal_cva[['Journal_Avg_cit', 'journal_title']]\
    .groupby('journal_title').count()\
    .sort_values(by='Journal_Avg_cit', ascending=False)
df_journal_density_counts = df_journal_density_counts.rename(columns={'Journal_Avg_cit': 'paper_counts'})

df_journal_density_counts

Unnamed: 0_level_0,paper_counts
journal_title,Unnamed: 1_level_1
The Astrophysical Journal,12
Environmental Modelling & Software,8
ISPRS International Journal of Geo-Information,6
Computers & Geosciences,5
Journal of Geophysical Research: Space Physics,5
...,...
International Journal of Digital Earth,1
International Journal of Remote Sensing,1
International Journal of Semantic Computing,1
Journal of Atmospheric and Oceanic Technology,1


In [19]:
df_journal_density_means = \
    df_journal_cva[['Journal_Avg_cit', 'journal_title']].groupby('journal_title').mean().sort_values(by='Journal_Avg_cit', ascending=False)

df_journal_density_means = df_journal_density_means.rename(columns={'Journal_Avg_cit': 'mean_cites'})

df_journal_density_means

Unnamed: 0_level_0,mean_cites
journal_title,Unnamed: 1_level_1
Science,106.5
Nature Biotechnology,67.0
Current Climate Change Reports,43.0
IEEE Transactions on Geoscience and Remote Sensing,43.0
Genome Medicine,42.0
...,...
"Engaging Science, Technology, and Society",0.0
Remote Sensing in Ecology and Conservation,0.0
The Cryosphere,0.0
Data in Brief,0.0


How did EC papers perform in journals that published 2 or more EC papers?

In [20]:
df_journal_cva.columns

Index(['Journal_Avg_cit', 'Year', 'journal_title', 'is-referenced-by-count',
       'doi'],
      dtype='object')

### JOURNALS WITH MORE THAN 1 PUBLICATION

In [21]:
journal_list = df_journal_density_counts[df_journal_density_counts['paper_counts']>1].index.to_list()

df_journal_ec_mean_analysis = \
    df_journal_cva[df_journal_cva['journal_title'].isin(journal_list)]\
        .groupby('journal_title').mean(numeric_only=True) \
        .drop(columns=['Year']) \
        .merge(df_journal_density_counts, left_index=True, right_index=True) \
        .sort_values('paper_counts', ascending=False) \
        .rename(columns={'Journal_Avg_cit': 'journal_mean_cites', 'is-referenced-by-count': 'ec_mean_cites'})

df_journal_ec_mean_analysis.describe()

Unnamed: 0,journal_mean_cites,ec_mean_cites,paper_counts
count,27.0,27.0,27.0
mean,15.389198,22.391975,3.259259
std,19.555807,33.431244,2.297031
min,1.0,0.5,2.0
25%,7.45,8.6,2.0
50%,12.0,11.5,2.0
75%,17.0625,23.875,3.0
max,106.5,175.0,12.0


In [22]:
df_journal_ec_mean_analysis.paper_counts.sum()

88

* **88 of 150 papers (58.67%)** appeared in journals with 2 or more EC papers (27 distinct journals)
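The split into papers from multi-paper journals and single-paper journals can also be derived per paper, without building separate journal lists. `groupby(...).transform('count')` broadcasts each journal's paper count back onto its rows. Below is a sketch on toy data with hypothetical journals and placeholder DOIs.

```python
import pandas as pd

# Toy rows (hypothetical journals, placeholder DOIs)
df = pd.DataFrame({
    "journal_title": ["J1", "J1", "J2", "J3", "J3", "J3"],
    "doi": ["10.0000/1", "10.0000/2", "10.0000/3", "10.0000/4", "10.0000/5", "10.0000/6"],
})

# Broadcast each journal's paper count back onto every row of that journal
df["paper_counts"] = df.groupby("journal_title")["doi"].transform("count")

multi = df[df["paper_counts"] >= 2]
single = df[df["paper_counts"] == 1]

share_multi = len(multi) / len(df)
print(len(multi), len(single), f"{share_multi:.2%}", multi["journal_title"].nunique())
```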

In [23]:
df_journal_ec_mean_analysis

Unnamed: 0_level_0,journal_mean_cites,ec_mean_cites,paper_counts
journal_title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
The Astrophysical Journal,10.833333,9.166667,12
Environmental Modelling & Software,17.125,11.5,8
ISPRS International Journal of Geo-Information,9.166667,10.5,6
Journal of Geophysical Research: Space Physics,7.4,10.8,5
Computers & Geosciences,8.4,8.2,5
JAWRA Journal of the American Water Resources Association,13.75,24.25,4
Bulletin of the American Meteorological Society,28.333333,34.666667,3
Concurrency and Computation: Practice and Experience,1.0,2.0,3
Earth and Space Science,15.333333,34.0,3
Journal of Proteome Research,7.666667,17.666667,3


In [24]:
df_journal_ec_mean_analysis[ 
    df_journal_ec_mean_analysis.ec_mean_cites>=df_journal_ec_mean_analysis.journal_mean_cites
].paper_counts.sum()

47

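* **47 of 88 papers (53.4%)** are in journals whose EC mean citation count (`ec_mean_cites`) is greater than or equal to the journal mean (`journal_mean_cites`), pooled across all years, among journals with more than 1 EC publication. This is a journal-level comparison: every paper in a qualifying journal is counted.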
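Because the count of 47 includes every paper in a qualifying journal, it can differ from a per-paper count. The toy example below uses hypothetical values to show the two diverging. A single highly cited paper lifts its journal's EC mean above the journal mean, so all three of that journal's papers are counted. Only one of them individually beats the average.

```python
import pandas as pd

# Toy data: in J1, one highly cited paper lifts the EC mean above the journal mean
df = pd.DataFrame({
    "journal_title": ["J1", "J1", "J1", "J2", "J2"],
    "journal_mean_cites": [10, 10, 10, 5, 5],
    "ec_cites": [40, 2, 3, 6, 1],
})

# Journal-level (as in the cell above): compare the EC mean per journal with the journal mean
by_journal = df.groupby("journal_title").agg(
    ec_mean=("ec_cites", "mean"),
    journal_mean=("journal_mean_cites", "mean"),
    papers=("ec_cites", "count"),
)
journal_level = int(by_journal.loc[by_journal["ec_mean"] >= by_journal["journal_mean"], "papers"].sum())

# Paper-level: compare each paper with its own journal mean
paper_level = int((df["ec_cites"] >= df["journal_mean_cites"]).sum())

print(journal_level, paper_level)  # 3 2
```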

### JOURNALS WITH ONLY 1 PUBLICATION

In [25]:
journal_list = df_journal_density_counts[df_journal_density_counts['paper_counts']<2].index.to_list()

df_journal_ec_mean_analysis = \
    df_journal_cva[df_journal_cva['journal_title'].isin(journal_list)]\
        .groupby('journal_title').mean(numeric_only=True) \
        .drop(columns=['Year']) \
        .merge(df_journal_density_counts, left_index=True, right_index=True) \
        .sort_values('paper_counts', ascending=False) \
        .rename(columns={'Journal_Avg_cit': 'journal_mean_cites', 'is-referenced-by-count': 'ec_mean_cites'})

df_journal_ec_mean_analysis.describe()

Unnamed: 0,journal_mean_cites,ec_mean_cites,paper_counts
count,62.0,62.0,62.0
mean,16.370968,32.387097,1.0
std,13.698431,56.452027,0.0
min,0.0,0.0,1.0
25%,6.25,3.25,1.0
50%,12.5,13.5,1.0
75%,23.0,38.5,1.0
max,67.0,344.0,1.0


* **62 of 150 papers (41.3%)** are the only EC publication in their journal
* **journal means = 16.37, EarthCube paper means = 32.39**
* the standard deviations differ sharply (56.45 for EC papers vs. 13.70 for the journal means), so EC citation counts vary far more

In [26]:
df_journal_ec_mean_analysis

Unnamed: 0_level_0,journal_mean_cites,ec_mean_cites,paper_counts
journal_title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
ACM Transactions on Database Systems,35.0,43.0,1
PLoS ONE,26.0,54.0,1
Journal of Climate,14.0,12.0,1
Journal of Environmental Quality,21.0,73.0,1
Journal of Sedimentary Research,10.0,4.0,1
...,...,...,...
Hydrological Processes,21.0,16.0,1
IEEE Systems Journal,25.0,5.0,1
IEEE Transactions on Geoscience and Remote Sensing,43.0,23.0,1
IEEE Transactions on Parallel and Distributed Systems,24.0,2.0,1


In [27]:
df_journal_ec_mean_analysis[ 
    df_journal_ec_mean_analysis.ec_mean_cites>=df_journal_ec_mean_analysis.journal_mean_cites
].paper_counts.sum()

32

In [28]:
df_journal_ec_mean_analysis[ 
    df_journal_ec_mean_analysis.ec_mean_cites>df_journal_ec_mean_analysis.journal_mean_cites
].paper_counts.sum()

26

* **26 of 62 (41.9%) EC papers received more citations** than their journal mean
* **32 of 62 (51.6%) EC papers received at least as many citations** as their journal mean
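The gap between the inclusive (`>=`) and strict (`>`) counts is exactly the number of ties: papers whose citation count equals the journal mean. Here that is 32 - 26 = 6. The sketch below reports all three counts, using five rows copied from the table above.

```python
import pandas as pd

# Five rows copied from df_journal_cva: EC citations vs. journal mean
ec = pd.Series([43, 13, 0, 0, 14])
journal = pd.Series([35, 3, 0, 2, 9])

strict = int((ec > journal).sum())      # strictly above the mean
inclusive = int((ec >= journal).sum())  # at or above the mean
ties = int((ec == journal).sum())       # exactly equal to the mean

# The inclusive count always equals strict + ties
assert inclusive - strict == ties
print(strict, inclusive, ties)  # 3 4 1
```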

* the top 10 papers in terms of % above the average for their journal/year

In [29]:
df_journal_cva

Unnamed: 0,Journal_Avg_cit,Year,journal_title,is-referenced-by-count,doi
32,35,2017.0,ACM Transactions on Database Systems,43,10.1145/3129246
33,3,2018.0,AI Magazine,13,10.1609/aimag.v39i3.2816
34,0,2022.0,Applied Geochemistry,0,10.1016/j.apgeochem.2022.105273
35,2,2021.0,Atmospheric Measurement Techniques,0,10.5194/amt-14-6917-2021
38,9,2020.0,Biogeosciences,14,10.5194/bg-17-2537-2020
...,...,...,...,...,...
216,17,2020.0,The Astrophysical Journal Supplement Series,2,10.3847/1538-4365/aba4aa
217,0,2022.0,The Cryosphere,0,10.5194/tc-16-1431-2022
222,13,2016.0,Transactions in GIS,31,10.1111/tgis.12232
223,13,2016.0,Transactions in GIS,16,10.1111/tgis.12233


In [30]:
df_journal_cva_reindexed = df_journal_cva.rename(columns={
    'Journal_Avg_cit': 'journal_mean_cites',
    'Year': 'year',
    'is-referenced-by-count': 'ec_cites'
    }).reset_index(drop=True)
df_journal_cva_reindexed

Unnamed: 0,journal_mean_cites,year,journal_title,ec_cites,doi
0,35,2017.0,ACM Transactions on Database Systems,43,10.1145/3129246
1,3,2018.0,AI Magazine,13,10.1609/aimag.v39i3.2816
2,0,2022.0,Applied Geochemistry,0,10.1016/j.apgeochem.2022.105273
3,2,2021.0,Atmospheric Measurement Techniques,0,10.5194/amt-14-6917-2021
4,9,2020.0,Biogeosciences,14,10.5194/bg-17-2537-2020
...,...,...,...,...,...
145,17,2020.0,The Astrophysical Journal Supplement Series,2,10.3847/1538-4365/aba4aa
146,0,2022.0,The Cryosphere,0,10.5194/tc-16-1431-2022
147,13,2016.0,Transactions in GIS,31,10.1111/tgis.12232
148,13,2016.0,Transactions in GIS,16,10.1111/tgis.12233


In [31]:
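# filter on items where EC meets or beats the journal mean
# NOTE: the result is not assigned, so this expression has no effect;
# the % diff below is computed for all rows
df_journal_cva_reindexed[
    df_journal_cva_reindexed.ec_cites>=df_journal_cva_reindexed.journal_mean_cites
]

# cast year as int
df_journal_cva_reindexed.year = df_journal_cva_reindexed.year.astype(int)

# % difference from the journal mean: 100 * (ec / journal - 1),
# e.g. a ratio of 2.0 (200%) is stored as 100, i.e. "100% more".
# A journal mean of 0 yields inf (or NaN for 0/0); those rows are excluded later.
df_journal_cva_reindexed['ec_pct_diff'] = \
    100*((df_journal_cva_reindexed.ec_cites / df_journal_cva_reindexed.journal_mean_cites) - 1)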
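In the cell above, rows with a journal mean of 0 produce `inf` (or `NaN` for 0/0). They are excluded only later, by `query("journal_mean_cites > 0")`. The sketch below masks zero denominators up front so the column never contains `inf`. The first row uses the International Journal of Digital Earth values from the results below (344 vs. 17). The other rows are toy values.

```python
import pandas as pd

df = pd.DataFrame({"ec_cites": [344, 0, 5], "journal_mean_cites": [17, 0, 0]})

# Percent difference from the journal mean: 100 * (ec / journal - 1).
# A journal mean of 0 makes the ratio undefined, so mask it to NaN first.
denom = df["journal_mean_cites"].where(df["journal_mean_cites"] > 0)
df["ec_pct_diff"] = 100 * (df["ec_cites"] / denom - 1)

print(df["ec_pct_diff"].round(2).tolist())  # [1923.53, nan, nan]
```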

* Top 20 EarthCube papers by % difference from their journal mean (journals with a mean of 0 are excluded, since the % difference is undefined)

In [90]:
df_tmp = df_journal_cva_reindexed.sort_values(by='ec_pct_diff', ascending=False) \
    .query("journal_mean_cites > 0")[:20]
# df_tmp.doi = df_tmp.doi.apply(lambda d: f"[{d}](https://doi.org/{d})")
df_tmp.columns = ['Journal Mean Cites', 'Publication Year', 'Journal Title', 'EC Cites', 'EC Publication DOI', '% diff mean cites']
df_tmp.to_csv("../outputs/ec_journals_top_pct_delta.csv")
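The cell above sorts, filters, and then slices. A slightly more direct equivalent is `DataFrame.nlargest`, applied after the zero-mean filter. Below is a sketch with toy rows and placeholder DOIs. Ties are resolved by `keep="first"`, the default.

```python
import pandas as pd

# Toy rows (placeholder DOIs); the inf row mimics a journal mean of 0
df = pd.DataFrame({
    "doi": ["10.0000/a", "10.0000/b", "10.0000/c", "10.0000/d"],
    "journal_mean_cites": [17, 0, 25, 10],
    "ec_pct_diff": [1923.5, float("inf"), 1244.0, 1100.0],
})

# Exclude zero-mean journals first, then take the N largest percent differences
top = df[df["journal_mean_cites"] > 0].nlargest(2, "ec_pct_diff")
print(top["doi"].tolist())  # ['10.0000/a', '10.0000/c']
```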

In [91]:
df_tmp

Unnamed: 0,Journal Mean Cites,Publication Year,Journal Title,EC Cites,EC Publication DOI,% diff mean cites
74,17,2017,International Journal of Digital Earth,344,10.1080/17538947.2016.1239771,1923.529412
61,25,2017,Geophysical Research Letters,336,10.1002/2017gl074954,1244.0
117,10,2018,Quaternary Research,120,10.1017/qua.2017.105,1100.0
18,1,2021,Computing in Science & Engineering,10,10.1109/mcse.2021.3059437,900.0
10,4,2019,Communications of the ACM,39,10.1145/3192335,875.0
19,1,2021,Computing in Science & Engineering,8,10.1109/mcse.2021.3059263,700.0
68,4,2021,GSA Bulletin,30,10.1130/b35560.1,650.0
75,8,2019,International Journal of Remote Sensing,49,10.1080/01431161.2018.1516313,512.5
28,15,2016,Earth and Space Science,91,10.1002/2015ea000136,506.666667
85,13,2017,JAWRA Journal of the American Water Resources ...,72,10.1111/1752-1688.12474,453.846154


In [92]:
df_citations = pd.read_csv("../outputs/full_nsf_doi_project_summary.tsv", sep='\t')[['doi', 'ams_bib']]

In [118]:

# print the top 10 as a markdown table, looking up each DOI's formatted citation (ams_bib)
print("""
|Citation| EC Cites | Journal Mean Cites | % diff mean cites|
|---:|:--:|:--:|:--:|""")
for _, r in df_tmp[:10].iterrows():
    citation = df_citations.loc[df_citations['doi'] == r['EC Publication DOI'], 'ams_bib'].unique()[0]
    print(f"{citation} | {r['EC Cites']} | {r['Journal Mean Cites']} | {r['% diff mean cites']:.2f}% |")


|Citation| EC Cites | Journal Mean Cites | % diff mean cites|
|---:|:--:|:--:|:--:|
Yang, C., Q. Huang, Z. Li, K. Liu, and F. Hu, 2016: Big Data and cloud computing: innovation opportunities and challenges. International Journal of Digital Earth, 10, 13–53, https://doi.org/10.1080/17538947.2016.1239771. | 344 | 17 | 1923.53% |
Morlighem, M., and Coauthors, 2017: BedMachine v3: Complete Bed Topography and Ocean Bathymetry Mapping of Greenland From Multibeam Echo Sounding Combined With Mass Conservation. Geophysical Research Letters, 44, https://doi.org/10.1002/2017gl074954. | 336 | 25 | 1244.00% |
Williams, J. W., and Coauthors, 2018: The Neotoma Paleoecology Database, a multiproxy, international, community-curated data resource. Quaternary Research, 89, 156–177, https://doi.org/10.1017/qua.2017.105. | 120 | 10 | 1100.00% |
Abernathey, R. P., and Coauthors, 2021: Cloud-Native Repositories for Big Scientific Data. Computing in Science & Engineering, 23, 26–35, https://doi.org/10.1109/mcse.2021.3059437. | 10 | 1 | 900.00% |
Gil, Y., and Coauthors, 2018: Intelligent systems for geosciences. Communications of the ACM, 62, 76–84, https://doi.org/10.1145/3192335. | 39 | 4 | 875.00% |
Granger, B. E., and F. Perez, 2021: Jupyter: Thinking and Storytelling With Code and Data. Computing in Science & Engineering, 23, 7–14, https://doi.org/10.1109/mcse.2021.3059263. | 8 | 1 | 700.00% |
Schaen, A. J., and Coauthors, 2020: Interpreting and reporting 40Ar/39Ar geochronologic data. GSA Bulletin, 133, 461–487, https://doi.org/10.1130/b35560.1. | 30 | 4 | 650.00% |
Sun, Z., L. Di, and H. Fang, 2018: Using long short-term memory recurrent neural network in land cover classification on Landsat and Cropland data layer time series. International Journal of Remote Sensing, 40, 593–614, https://doi.org/10.1080/01431161.2018.1516313. | 49 | 8 | 512.50% |
Gil, Y., and Coauthors, 2016: Toward the Geoscience Paper of the Future: Best practices for documenting and sharing research from data to software to provenance. Earth and Space Science, 3, 388–415, https://doi.org/10.1002/2015ea000136. | 91 | 15 | 506.67% |
Maidment, D. R., 2016: Conceptual Framework for the National Flood Interoperability Experiment. JAWRA Journal of the American Water Resources Association, 53, 245–257, https://doi.org/10.1111/1752-1688.12474. | 72 | 13 | 453.85% |
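
The table-building loop above runs one lookup per row. The sketch below does the same lookup as a single left merge and emits a leading pipe on every row. It assumes that duplicate `ams_bib` entries for a DOI are identical, as the original cell's `drop_duplicates()` implies. Both input tables are toy stand-ins for `df_tmp` and `df_citations` with placeholder DOIs and citations.

```python
import pandas as pd

# Toy stand-ins for df_tmp and df_citations (placeholder DOIs and citations)
top = pd.DataFrame({
    "EC Publication DOI": ["10.0000/a", "10.0000/b"],
    "EC Cites": [344, 336],
    "Journal Mean Cites": [17, 25],
    "% diff mean cites": [1923.529412, 1244.0],
})
citations = pd.DataFrame({
    "doi": ["10.0000/a", "10.0000/a", "10.0000/b"],
    "ams_bib": ["Author A, 2016: Title A.", "Author A, 2016: Title A.", "Author B, 2017: Title B."],
})

# One merge (after keeping the first citation per DOI) replaces a query per row
table = top.merge(
    citations.drop_duplicates("doi"),
    left_on="EC Publication DOI", right_on="doi", how="left",
)

lines = [
    "|Citation| EC Cites | Journal Mean Cites | % diff mean cites|",
    "|---:|:--:|:--:|:--:|",
]
for cite, ec, jm, pct in zip(table["ams_bib"], table["EC Cites"],
                             table["Journal Mean Cites"], table["% diff mean cites"]):
    lines.append(f"| {cite} | {ec} | {jm} | {pct:.2f}% |")

print("\n".join(lines))
```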

**TAKEAWAYS**

* mean journal/year average citation count (WOS): 14.96
* mean citation count of EC papers: 24.17 (from Crossref `is-referenced-by-count`)
* EC papers matched or exceeded their journal/year average 79 times out of 150 (52.7%)


In [35]:
df_journal_cva[df_journal_cva['Journal_Avg_cit']>20]

Unnamed: 0,Journal_Avg_cit,Year,journal_title,is-referenced-by-count,doi
32,35,2017.0,ACM Transactions on Database Systems,43,10.1145/3129246
40,30,2016.0,Bulletin of the American Meteorological Society,33,10.1175/bams-d-14-00164.1
41,30,2016.0,Bulletin of the American Meteorological Society,6,10.1175/bams-d-15-00239.1
42,25,2018.0,Bulletin of the American Meteorological Society,65,10.1175/bams-d-16-0215.1
50,27,2017.0,"Computers, Environment and Urban Systems",14,10.1016/j.compenvurbsys.2016.11.007
51,27,2017.0,"Computers, Environment and Urban Systems",108,10.1016/j.compenvurbsys.2016.10.010
57,43,2018.0,Current Climate Change Reports,21,10.1007/s40641-018-0107-0
61,26,2017.0,Earth and Planetary Science Letters,41,10.1016/j.epsl.2016.12.012
70,28,2017.0,Environmental Modelling & Software,29,10.1016/j.envsoft.2017.01.021
72,28,2017.0,Environmental Modelling & Software,4,10.1016/j.envsoft.2017.03.032
