##Dataset 1: A dataset of open access journals

This dataset was downloaded from the Directory of Open Access Journals (DOAJ). It contains 57 metadata fields describing 12,642 open access (OA) journals from 128 countries. The fields include each journal's title, URL, unique identifiers such as ISSN and EISSN, publisher, and society or institution, as well as more detailed information such as download statistics, the review process, licensing, and copyright. The data is stored in CSV format. No codebook is available, but the field names are largely self-explanatory. A sample is shown below:



In [0]:
import pandas as pd

print("A list of open access journals from the Directory of Open Access Journals")
doaj_feb2018 = pd.read_csv("https://doaj.org/csv")
print(doaj_feb2018.head(10))

# Select the subset of open access journals whose publisher is located in the U.S.
print("Open access journals published in the U.S.")
doaj_us = doaj_feb2018.loc[doaj_feb2018['Country of publisher'] == "United States"]
print(doaj_us.head(10))



A list of open access journals from the Directory of Open Access Journals
                              Journal title  \
0                  Revista de Microbiologia   
1  Anais da Academia Brasileira de Ciências   
2                                      ACME   
3                Acta Dermato-Venereologica   
4                           Acta Mycologica   
5      Acta Societatis Botanicorum Poloniae   
6               Acta Stomatologica Croatica   
7                     Acta Veterinaria Brno   
8                           Africa Spectrum   
9                    Revista Alergia México   

                                         Journal URL  \
0  http://www.scielo.br/scielo.php/script_sci_ser...   
1  http://www.scielo.br/scielo.php?script=sci_ser...   
2             http://riviste.unimi.it/index.php/ACME   
3                 http://www.medicaljournals.se/acta   
4  https://pbsociety.org.pl/journals/index.php/am...   
5  https://pbsociety.org.pl/journals/index.php/as...   
6  http://hrcak.
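
The counts quoted in the description (number of journals, fields, and countries) can be checked directly once the table is loaded. The following is a minimal sketch, not part of the original notebook: `summarize_journals` is a hypothetical helper, and the `toy` frame holds placeholder rows rather than real DOAJ records. Only the column name `Country of publisher` is taken from the code above.

```python
import pandas as pd


def summarize_journals(df, country_col="Country of publisher"):
    """Return (n_journals, n_fields, n_countries) for a DOAJ-style table."""
    n_journals, n_fields = df.shape
    n_countries = df[country_col].nunique()
    return n_journals, n_fields, n_countries


# Placeholder rows for illustration only; these are not real DOAJ records.
toy = pd.DataFrame({
    "Journal title": ["Journal A", "Journal B", "Journal C"],
    "Country of publisher": ["United States", "Brazil", "United States"],
})

summary = summarize_journals(toy)
print(summary)  # (3, 2, 2)

# Count the rows matching the same filter used for the U.S. subset above.
us_count = int((toy["Country of publisher"] == "United States").sum())
print(us_count)  # 2
```

Calling `summarize_journals(doaj_feb2018)` on the full download should give figures comparable to those stated in the description, although the live CSV changes over time.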

##Dataset 2: A dataset about Open Access Repository Mandates and Policies

This dataset was queried from the Registry of Open Access Repository Mandates and Policies (ROARMAP). It records open access mandates and policies adopted by universities, research institutions, and research funders that require or request their researchers to provide open access to their peer-reviewed research articles by depositing them in an open access repository. The data is stored in JSON format. No codebook is available, but the web query interface describes some of the data fields.

In [0]:
import pandas as pd

print("A dataset about open access mandates and policy")
oapolicy_2018 = pd.read_json("http://roarmap.eprints.org/cgi/search/archive/advanced/export_roarmap_JSON.js?screen=Search&dataset=archive&_action_export=1&output=JSON&exp=0%7C1%7Cpolicymaker_name%7Carchive%7C-%7Ccountry%3Acountry%3AANY%3AEQ%3A840%7Cdeposit_of_item%3Adeposit_of_item%3AANY%3AEQ%3Arequired+requested+not_specified%7Clocus_of_deposit%3Alocus_of_deposit%3AANY%3AEQ%3Ainstitution_repo+suject_repo+any_repo+not_specified%7Cmaking_deposit_open%3Amaking_deposit_open%3AANY%3AEQ%3Arequired+recommended+not_mentioned+other%7Copen_licensing_conditions%3Aopen_licensing_conditions%3AANY%3AEQ%3Ano_req+req_open+req_cc_by+req_cc_by_nc+req_diff_open+other+not_specified%7Cpolicymaker_type%3Apolicymaker_type%3AANY%3AEQ%3Afunder+research_org+funder_and_research_org+multiple_research_orgs+research_org_subunit%7C-%7Ceprint_status%3Aeprint_status%3AANY%3AEQ%3Aarchive%7Cmetadata_visibility%3Ametadata_visibility%3AANY%3AEQ%3Ashow&n=&cache=79843")
print(oapolicy_2018.head(10))

A dataset about open access mandates and policy
  added_by                                        apc_fun_url  \
0      NaN                                                NaN   
1      EOS                                                NaN   
2      EOS  https://www.amherst.edu/media/view/20316/origi...   
3      EOS             http://libguides.asu.edu/OAMemberships   
4      EOS                                                NaN   
5      NaN                                                NaN   
6      EOS                                                NaN   
7      EOS                                                NaN   
8      EOS  http://scholcomm.brandeis.edu/open-access/bran...   
9      EOS                                                NaN   

             apc_funding can_deposit_be_waived  country  \
0          not_mentioned                   yes      840   
1  institutional_funding                    no      840   
2  institutional_funding                   yes      840   
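
Since no codebook exists, tabulating the categorical fields is one way to see which values they take. The sketch below is an illustration rather than part of the original analysis. The `sample` frame reproduces only the three rows of `apc_funding`, `can_deposit_be_waived`, and `country` visible in the output above, and the value 840 is the ISO 3166-1 numeric code for the United States, which matches the `country` filter in the query URL.

```python
import pandas as pd

# The three rows visible in the printed output above; the full query
# result has more rows and many more columns.
sample = pd.DataFrame({
    "apc_funding": ["not_mentioned", "institutional_funding", "institutional_funding"],
    "can_deposit_be_waived": ["yes", "no", "yes"],
    "country": [840, 840, 840],
})

# Cross-tabulate APC funding against whether the deposit can be waived.
waiver_by_funding = pd.crosstab(sample["apc_funding"], sample["can_deposit_be_waived"])
print(waiver_by_funding)
```

The same `pd.crosstab` call could be applied to `oapolicy_2018` to summarize the full set of policies, assuming those two columns are populated for most rows.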


##Dataset 3: Journal ranking by impact factors

This dataset was retrieved from the Web of Science InCites Journal Citation Reports (2017) in CSV format. It contains each journal's title, total cite count, journal impact factor, and Eigenfactor score. I filtered the data to journals in the Computer Science and Information Science disciplines. No codebook is available.

In [0]:
import pandas as pd

jrank_2017 = pd.read_csv("jrank.csv")
print(jrank_2017.head(10))
                         

   Rank                                 Full Journal Title Total Cites  \
0     1                    Journal of Statistical Software      14,900   
1     2                                        IEEE Access       6,291   
2     3  International Journal of Distributed Sensor Ne...       4,254   
3     4                       COLLEGE & RESEARCH LIBRARIES       1,121   
4     5         JOURNAL OF THE MEDICAL LIBRARY ASSOCIATION         932   
5     6                          COMPUTATIONAL LINGUISTICS       1,864   
6     7        JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH       3,157   
7     8               INFORMATION TECHNOLOGY AND LIBRARIES         245   
8     8               INFORMATION TECHNOLOGY AND LIBRARIES         245   

   Journal Impact Factor  Eigenfactor Score  
0                 22.737              0.039  
1                  3.557              0.019  
2                  1.787              0.012  
3                  1.626              0.001  
4                  1.541     
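
The `Total Cites` values are printed with thousands separators (for example `14,900`), which suggests the column was read as text rather than as numbers. The sketch below shows one way to convert such a column before sorting or computing statistics. It is an assumption about how the column was parsed, not something the original notebook confirms. The `sample` frame uses only the first two rows shown in the output above.

```python
import pandas as pd

# First two rows from the printed output; "Total Cites" is kept as text
# to mimic a column read with embedded thousands separators.
sample = pd.DataFrame({
    "Full Journal Title": ["Journal of Statistical Software", "IEEE Access"],
    "Total Cites": ["14,900", "6,291"],
    "Journal Impact Factor": [22.737, 3.557],
})

# Strip the separators, then convert the column to a numeric dtype.
sample["Total Cites"] = pd.to_numeric(
    sample["Total Cites"].str.replace(",", "", regex=False)
)

# Numeric values now sort by magnitude rather than alphabetically.
top = sample.sort_values("Total Cites", ascending=False)
print(top[["Full Journal Title", "Total Cites"]])
```

An alternative is to pass `thousands=","` to `pd.read_csv` when loading `jrank.csv`, so the column is numeric from the start.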