# Part II: Academic Analyzer Framework

In [Part I]() of our tutorial, we explained how to load data from a variety of datasets into SFrame objects and MongoDB collections. While SFrame objects make it easy to answer questions about how various academic trends change over time (see [Part III]() of the tutorial), more complicated questions remain difficult to answer with these types of objects. For example, it would be hard to calculate how many papers in Nature were written by a second author from the University of Washington, or how many papers have first authors who published in PLOS ONE in 2014.

To answer these more complicated questions, we developed a code framework that provides easy object-oriented access to academic data stored in MongoDB. Our framework uses several basic object classes, such as Author, Paper, and AuthorCollection, that let us use Python code to answer complicated questions. In the following sections, we will explain each object class and give examples of how to use it.

## 1. The Paper Class

The Paper class is based on paper data from the MAG dataset and the AMinerMAG dataset (see tutorial Part I). The main idea behind this class is to make it easy to fetch data on a specific paper. Given a paper ID, it is possible to construct a Paper object using the following code (the first two cells only set up the notebook environment):

In [3]:
%load_ext autoreload
%autoreload 2
%aimport
%matplotlib inline

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Modules to reload:
all-except-skipped

Modules to skip:



In [4]:
import os
import sys
nb_dir = os.path.split(os.getcwd())[0]
if nb_dir not in sys.path:
    sys.path.append(nb_dir)

In [12]:
from ScienceDynamics.paper import Paper
p = Paper('75508021')

# We can easily get various basic paper attributes
print(f"Paper's id {p.paper_id}")
print(f"Paper's references count {p.references_count}")
print(f"Paper's venue name {p.venue_name}")
print(f"Paper's publication year {p.publish_year}")
print(f"Paper's title '{p.title}'")
print(f"Paper's keywords list {p.keywords_list}")
print(f"Paper's authors ids list -  {p.author_ids_list}")  # we can also get a list of Author objects using p.authors_list

2019-05-05 11:03:37,998 [MainThread  ] [DEBUG]  Fetching paper 75508021


Paper's id 75508021
Paper's references count 8
Paper's venue name Nature
Paper's publication year 2007
Paper's title 'Cell biology: The checkpoint brake relieved'
Paper's keywords list ['proteomics', 'functional genomics', 'medicine', 'ecology', 'evolution', 'molecular biology', 'pharmacology', 'transcriptomics', 'biology', 'computational biology', 'cell cycle', 'environmental science', 'marine biology', 'biochemistry', 'cancer', 'dna', 'systems biology', 'astrophysics', 'climate change', 'quantum physics', 'cell biology', 'genetics', 'genomics', 'geophysics', 'neurobiology', 'materials science', 'nature', 'bioinformatics', 'structural biology', 'biotechnology', 'earth science', 'metabolomics', 'cell division', 'immunology', 'evolutionary biology', 'palaeobiology', 'cell signalling', 'signal transduction', 'medical research', 'neuroscience', 'rna', 'astronomy', 'nanotechnology', 'physics', 'drug discovery', 'developmental biology']
Paper's authors ids list -  ['83162F0D']


We can also get more complicated paper features.

In [14]:
print(f"Did the paper's last author publish in the venue before? Answer: {p.did_last_author_publish_in_venue()}")
print(f"Paper's total citations in 2015 {p.get_total_citations_number_in_year(2015)}")
print(f"Paper's max citation number in a specific year {p.get_max_citations_number_in_year()}")
print(f"The number of times paper's authors published in the paper's venue in the past - {p.total_number_of_times_authors_published_in_venue}")

Did the paper's last author publish in the venue before? Answer: True
Paper's total citations in 2015 1.0
Paper's max citation number in a specific year 2.0
The number of times paper's authors published in the paper's venue in the past - 2


If the paper is in the AMinerMAG dataset, additional features are available, such as the paper's abstract and ISSN.

In [15]:
p = Paper('778DE072')
print(f"Paper's abstract: \n\n {p.abstract}")
print(f"Paper's ISSN {p.issn}")

2019-05-05 11:06:37,387 [MainThread  ] [DEBUG]  Fetching paper 778DE072


Paper's abstract: 

 In spite of various cytogenetic works on suborder Heteroptera, the chromosome organization, function and its evolution in this group is far from being fully understood. Cytologically, the family Rhyparochromidae constitutes a heterogeneous group differing in chromosome numbers. This family possesses XY sex mechanism in the majority of the species with few exceptions. In the present work, multiple banding techniques viz., C-banding, base-specific fluorochromes (DAPI/CMA3) and silver nitrate staining have been used to cytologically characterize the chromosomes of the seed plant pest Elasmolomus (Aphanus) sordidus Fabricius, 1787 having 2n=12=8A+2m+XY. One pair of the autosomes was large while three others were of almost equal size. At diplotene, C-banding technique revealed, that three autosomal bivalents show terminal constitutive heterochromatic bands while one medium sized bivalent was euchromatic. Microchromosomes (m-chromosomes) were positively heteropycnotic. A

The Paper class contains many functions that can be used to extract additional data on each paper. Moreover, the paper object's full data can be accessed using the following line:

In [16]:
p._json_data

{'_id': ObjectId('5cca6b8d65387acb1290171d'),
 'Aminer Paper ID': '55a5ce0d65ce60f99bf5c02d',
 'Authors List Sorted': '["7FDB7566","80418DAE"]',
 'Authors Number': 2.0,
 'Conference ID mapped to venue name': 'nan',
 'Field of study list': '["0660586C","039D5C06"]',
 'Field of study list names': '[,"Bioinformatics"]',
 'Fields of study parent list (L0)': '[]',
 'Fields of study parent list (L1)': '["039D5C06"]',
 'Fields of study parent list (L2)': '[]',
 'Fields of study parent list (L3)': '[]',
 'Fields of study parent list names (L0)': '[]',
 'Fields of study parent list names (L1)': '["Bioinformatics"]',
 'Fields of study parent list names (L2)': '[]',
 'Fields of study parent list names (L3)': '[]',
 'Journal ID mapped to venue name': '0BDFC074',
 'Keywords List': '["biomedical research","bioinformatics"]',
 'MAG Paper ID': '778DE072',
 'Normalized paper title': 'first report on c banding fluorochrome staining and nor location in holocentric chromosomes of elasmolomus aphanus sordi

More details on the Paper class can be found at [paper.py]() 

## 2. The Author Class

The Author class is based on author data from the MAG dataset (see tutorial Part I), which contains data on over 22 million authors. The main idea behind this class is to make it easy to fetch data on a specific author using the author's ID. To obtain an author's *author id*, we can use the following code:

In [20]:
from ScienceDynamics.author import Author
import re

l = Author.find_authors_id_by_name('tim bernerslee')
print(f"Matching Authors ids {l}")

# We can also use a regex to find matching author IDs
r = re.compile(r"Tim\s+B.*", re.IGNORECASE)
l = Author.find_authors_id_by_name(r)
print(f"Matching Authors ids {l[:10]}")

Matching Authors ids ['79762927']
Matching Authors ids ['7FD415E0', '82479236', '85145FD9', '7DB99A96', '7DB0F51C', '7EEB86E0', '7F309657', '7FBB9CFB', '29542192', '849D27DC']
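The lookup above accepts either a plain string or a compiled pattern. As a rough illustration of that dual behaviour (not the framework's actual implementation, which queries MongoDB), the sketch below matches names against a small in-memory table. The function `find_ids_by_name`, the `AUTHOR_NAMES` table, and every ID except Tim Berners-Lee's are hypothetical.

```python
import re
from typing import Dict, List, Union

# Hypothetical in-memory stand-in for the authors collection (ID -> normalized name).
AUTHOR_NAMES: Dict[str, str] = {
    "79762927": "tim bernerslee",  # ID taken from the tutorial output
    "X0000001": "tim brown",       # made-up entry
    "X0000002": "ada lovelace",    # made-up entry
}


def find_ids_by_name(query: Union[str, re.Pattern]) -> List[str]:
    """Return the IDs whose name equals the string or matches the compiled regex."""
    if isinstance(query, str):
        return [aid for aid, name in AUTHOR_NAMES.items() if name == query]
    return [aid for aid, name in AUTHOR_NAMES.items() if query.match(name)]


exact_ids = find_ids_by_name("tim bernerslee")
regex_ids = find_ids_by_name(re.compile(r"tim\s+b.*", re.IGNORECASE))
print(exact_ids)
print(regex_ids)
```

A broad pattern can match many authors, which is why the tutorial cell prints only the first ten results.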


Given an Author ID, it is possible to construct an Author object using the following code:

In [25]:
from ScienceDynamics.venue import VenueType
a = Author(u'79762927')
print(f"Author's full name: {a.fullname}")
print(f"Author's papers number: {a.papers_number}")
print(f"Author's journals list between 2000-2010:{a.get_venues_list(VenueType.journal, start_year=2000, end_year=2010)}")
print(f"Author's predicted gender and name's male probability: {a.gender} {a.male_probability}")  # the gender is predicted from the author's first name
print(f"Author's last publication year in the dataset: {a.last_publication_year}")

Author's full name: tim bernerslee
Author's papers number: 20
Author's journals list between 2000-2010:['062B05D6', '038E80CE', '09B77941', '06CF2E55', '046515A4', '0ACD8486', '0BB0DA67', '0BB0DA67']
Author's predicted gender and name's male probability: Male 0.9976552374219394
Author's last publication year in the dataset: 2012


We can also use the code to find more complex insights, for example, who the author's most common coauthor is:

In [27]:
from collections import Counter
coauthors = a.get_coauthors_list(start_year=None, end_year=None)
print(f"Author's number of coauthors {len(coauthors)}")
c = Counter(coauthors)
print("Author's most common coauthor's ID - %s (number of joint papers %s)" % c.most_common(1)[0])

Author's number of coauthors 90
Author's most common coauthor's ID - 0E390AA0 (number of joint papers 7)
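To make the year-range arguments more concrete, here is a minimal sketch of how a coauthor list could be assembled from a per-year dictionary shaped like the 'Coauthors by Years Dict' field shown in the author's raw data. The helper `coauthors_in_range` and the abbreviated sample data are assumptions made for illustration; the real `get_coauthors_list` may work differently.

```python
from collections import Counter
from typing import Dict, List, Optional

# Abbreviated, partly made-up sample shaped like the 'Coauthors by Years Dict' field.
coauthors_by_year: Dict[int, List[str]] = {
    2007: ["7E5C9B91", "0E390AA0", "19931D56"],
    2009: ["0E390AA0", "7B1EE4CE"],
    2010: ["0E390AA0"],
}


def coauthors_in_range(data: Dict[int, List[str]],
                       start_year: Optional[int] = None,
                       end_year: Optional[int] = None) -> List[str]:
    """Flatten the per-year lists, keeping only years inside [start_year, end_year]."""
    result: List[str] = []
    for year in sorted(data):
        if start_year is not None and year < start_year:
            continue
        if end_year is not None and year > end_year:
            continue
        result.extend(data[year])
    return result


all_years = Counter(coauthors_in_range(coauthors_by_year))
until_2009 = Counter(coauthors_in_range(coauthors_by_year, end_year=2009))
print(all_years.most_common(1)[0])
print(until_2009["0E390AA0"])
```

Because the flattened list keeps one entry per joint paper-year, the same coauthor can appear several times, which is what lets `Counter` rank coauthors by frequency.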


The Author object's full data can be accessed using the following line:

In [28]:
a._json_data

{'_id': ObjectId('5cc93f9765387acb120bd990'),
 'Author ID': '79762927',
 'Papers by Years Dict': {2012: ['0363BC68'],
  2006: ['69094209',
   '80ED6199',
   '800F70F4',
   '7E22A795',
   '627243A2',
   '0B340A09'],
  2007: ['5B3F9EC1', '77AA1DDA'],
  2009: ['7E4620BC', '5EA67E7B', '781D60AA', '815E4846'],
  2008: ['7EB0A7DF', '75A42130', '762EF5EE', '7FF2B178'],
  2010: ['8075C5EC', '858CCF29', '771F58EB']},
 'Coauthors by Years Dict': {2007: ['7E5C9B91',
   '0E390AA0',
   '7BAA4A2F',
   '8635F2AD',
   '19931D56',
   '18056665',
   '5D6F8B09',
   '76BC859C',
   '7B1EE4CE'],
  2009: ['84D4AC28',
   '0E4BE3F3',
   '79B8ED2E',
   '0EBE528E',
   '7E9E7340',
   '7CF754AE',
   '7F1CB5D0',
   '807F3031',
   '760D21FE',
   '82E93258',
   '7EDC9E57',
   '0C833833',
   '2873C9C0',
   '27DC87D1',
   '19931D56',
   '02D2C557',
   '0E390AA0',
   '01C84A39',
   '456CD085',
   '7B1EE4CE',
   '036D501B',
   '13C11734'],
  2010: ['77E10B49',
   '721704E3',
   '7D11AF84',
   '7628BD51',
   '7F221475',
 

More details on the Author class can be found at [author.py]() 

## 3. The Papers Collection Class

The goal of the Papers Collection class is to make it easy to analyze a list of Paper objects. The class makes it possible to easily filter papers that were published in a specific year and to obtain various insights regarding the papers and their authors. For example, let's select papers that were published in the journal Nature and are part of the AMinerMAG dataset, keeping only those with at least 5 references.

In [30]:
from ScienceDynamics.datasets.microsoft_academic_graph import MicrosoftAcademicGraph
from ScienceDynamics.config.configs import DATASETS_BASE_DIR
mag = MicrosoftAcademicGraph(DATASETS_BASE_DIR / "MicrosoftAcademicGraph.zip")
sf = mag.extended_papers
sf = sf[sf['Original venue name'] == 'Nature'] # Another option to get paper ids is to use PAPERS_FETCHER.get_papers_ids_by_issn(issn) 
sf = sf[sf['Ref Number'] >= 5]
sf.materialize()
paper_ids = list(sf['Paper ID'])

Now let's define a papers collection object which contains all these papers:

In [35]:
from ScienceDynamics.papers_collection_analyer import PapersCollection
pc = PapersCollection(paper_ids) # this is a lazy object
# Create a list of Paper objects
print(f"Number of retrieved paper ids with at least 5 refs- {len(pc.papers_list)}")

100%|██████████| 32536/32536 [00:14<00:00, 2211.42it/s]

Number of retrieved paper ids with at least 5 refs- 32536
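The comment in the cell above describes the collection as a lazy object: nothing is fetched until `papers_list` is first accessed. The following sketch shows one common way to get that behaviour in Python, using `functools.cached_property`. The class `LazyCollection` and the `fake_fetch` function are hypothetical and are not part of ScienceDynamics.

```python
from functools import cached_property
from typing import Callable, List


class LazyCollection:
    """Hypothetical lazy container: items are fetched on first access only."""

    def __init__(self, ids: List[str], fetch: Callable[[str], dict]):
        self._ids = ids
        self._fetch = fetch

    @cached_property
    def papers_list(self) -> List[dict]:
        # Runs once; later accesses reuse the stored list.
        return [self._fetch(paper_id) for paper_id in self._ids]


calls: List[str] = []


def fake_fetch(paper_id: str) -> dict:
    """Stand-in for a database query; records every call."""
    calls.append(paper_id)
    return {"id": paper_id}


collection = LazyCollection(["75508021", "778DE072"], fake_fetch)
calls_before_access = len(calls)  # nothing has been fetched yet
first = collection.papers_list    # triggers the fetch
second = collection.papers_list   # served from the cache
print(calls_before_access, len(calls))
```

Deferring the fetch keeps construction cheap, so the cost (the progress bar in the output above) is paid only when the papers are actually needed.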





Now we can use the papers collection object to gain various insights regarding the papers in the collection.

In [36]:
print(f"Min publication year {pc.min_publication_year}")
print(f"Max publication year {pc.max_publication_year}")
print(f"Last Authors Median Academic Age in 2000 {pc.last_authors_median_age(2000)}")
print(f"Last Authors Median Academic Age in 2010 - {pc.last_authors_median_age(2010)}")

2019-05-05 12:09:39,758 [MainThread  ] [DEBUG]  Fetching author 7F182519
2019-05-05 12:09:39,762 [MainThread  ] [DEBUG]  Fetching author 2761E14D
2019-05-05 12:09:39,765 [MainThread  ] [DEBUG]  Fetching author 7D63AB2B
2019-05-05 12:09:39,768 [MainThread  ] [DEBUG]  Fetching author ]
2019-05-05 12:09:39,770 [MainThread  ] [WARNI]  Failed to fetch author ] features


Min publication year 1930
Max publication year 2015


AuthorNotFound: 

Let's find the most-cited paper in the collection:

In [71]:
p = pc.max_citations_paper(2015, include_self_citations=True)
print(p.title, p.total_citations_number_by_year(2015, include_self_citation=True))

Cleavage of Structural Proteins during the Assembly of the Head of Bacteriophage T4 118108.0


Let's calculate the median number of citations that papers published in 2009 received after 5 years:

In [68]:
print(f"Median citation number for papers that were published in 2009 after 5 years - {pc.papers_median_citations_after_years(2009, 5, True)}")

Median citation number for papers that were published in 2009 after 5 years - 77.0
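As a sketch of what such a statistic involves, the code below computes the median of the citations that papers from one publication year accumulated within a fixed number of years. It assumes that each paper exposes a per-year citation count and that "after N years" means the cumulative total up to publication year + N; `PaperRecord` and `median_citations_after_years` are hypothetical names, and the framework may define the measure differently.

```python
from statistics import median
from typing import Dict, List, NamedTuple


class PaperRecord(NamedTuple):
    """Hypothetical minimal paper record used only for this illustration."""
    publish_year: int
    citations_by_year: Dict[int, int]  # citations received in each calendar year


def median_citations_after_years(papers: List[PaperRecord],
                                 publish_year: int,
                                 after_years: int) -> float:
    """Median of the citations accumulated up to publish_year + after_years."""
    last_year = publish_year + after_years
    totals = [
        sum(n for year, n in p.citations_by_year.items() if year <= last_year)
        for p in papers
        if p.publish_year == publish_year
    ]
    return float(median(totals))


sample = [
    PaperRecord(2000, {2001: 3, 2003: 4, 2007: 10}),  # 7 citations by 2005
    PaperRecord(2000, {2000: 1, 2005: 1}),            # 2 citations by 2005
    PaperRecord(2000, {2002: 20}),                    # 20 citations by 2005
    PaperRecord(2001, {2002: 50}),                    # different year, ignored
]
result = median_citations_after_years(sample, 2000, 5)
print(result)
```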


We can also get the papers’ top keywords in various years:

In [74]:
print(f"Top 10-keywords in 1980 {pc.top_keywords(1980, top_keywords_number=10)}")
print(f"Top 10-keywords in 2015 {pc.top_keywords(2015, top_keywords_number=10)}")

Top 10-keywords in 1980 {'genetics': 46, 'enzyme': 34, 'dna sequence': 23, 'molecular weight': 20, 'biochemistry': 18, 'nucleotide sequence': 18, 'dopamine': 17, 'escherichia coli': 16, 'amino acid': 16, 'central nervous system': 14}
Top 10-keywords in 2015 {'nature': 137, 'x ray crystallography': 25, 'cell biology': 12, 'neuroscience': 11, 'geochemistry': 10, 'palaeontology': 10, 'genetics': 10, 'rna': 9, 'biochemistry': 9, 'stars': 9}
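A keyword ranking of this kind can be reproduced with `collections.Counter`. The sketch below is only an approximation of the idea: it uses made-up `(publish_year, keywords)` pairs instead of Paper objects, and the standalone `top_keywords` function is a hypothetical stand-in for the method of the same name.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical (publish_year, keywords) pairs standing in for Paper objects.
papers: List[Tuple[int, List[str]]] = [
    (1980, ["genetics", "enzyme"]),
    (1980, ["genetics", "dna sequence"]),
    (1980, ["enzyme", "genetics"]),
    (2015, ["nature", "cell biology"]),
]


def top_keywords(records: List[Tuple[int, List[str]]],
                 year: int,
                 top_keywords_number: int = 10) -> Dict[str, int]:
    """Count keywords over all papers of one year and keep the most frequent ones."""
    counts = Counter(kw for y, kws in records if y == year for kw in kws)
    return dict(counts.most_common(top_keywords_number))


top_1980 = top_keywords(papers, 1980, top_keywords_number=2)
print(top_1980)
```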


The Papers Collection class provides a wide range of functionality for obtaining insights about the papers, and it can easily be extended. More details on the Papers Collection class can be found in papers_collections_analyzer.py

## 4. The Authors Collection Class

The goal of the Authors Collection class is to provide an easy way to analyze a variety of features of a list of Author objects, such as age and gender statistics. For example, let's take all the authors who published in Nature in 2010:

In [15]:
from authors_list_analyzer import AuthorsListAnalyzer
authors_list = pc.all_authors_in_year_list(2010) # We can also construct the 
ac = AuthorsListAnalyzer(authors_list)
print(f"Authors average academic age in 2010 - {ac.get_average_age(2010)}")
print(f"Authors average number of publications between 2005 and 2010 - {ac.get_average_publication_number(2005, 2010)}")
print(f"Authors Gender stats - {ac.get_gender_stats()}")

Authors average academic age in 2010 - 12.468946188340807
Authors average number of publications between 2005 and 2010 - 19.47690582959641
Authors Gender stats - Counter({'Male': 4339, 'Unisex': 1954, 'Female': 1611, None: 1016})
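The statistics above can be pictured with a few lines of plain Python. This sketch assumes that academic age means the number of years since an author's first publication, which the tutorial does not state explicitly; `AuthorRecord`, `average_academic_age`, and `gender_stats` are hypothetical names used only here.

```python
from collections import Counter
from statistics import mean
from typing import List, NamedTuple, Optional


class AuthorRecord(NamedTuple):
    """Hypothetical minimal author record for this illustration."""
    first_publication_year: int
    gender: Optional[str]  # e.g. 'Male', 'Female', 'Unisex', or None when unknown


def average_academic_age(authors: List[AuthorRecord], at_year: int) -> float:
    """Assumed definition: academic age = at_year - first publication year."""
    return float(mean(at_year - a.first_publication_year for a in authors))


def gender_stats(authors: List[AuthorRecord]) -> Counter:
    """Count authors per predicted gender label, including unknown (None)."""
    return Counter(a.gender for a in authors)


authors = [
    AuthorRecord(1995, "Female"),
    AuthorRecord(2005, "Male"),
    AuthorRecord(2008, None),
    AuthorRecord(2000, "Male"),
]
avg_age = average_academic_age(authors, 2010)
stats = gender_stats(authors)
print(avg_age, stats)
```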


## 5. The Venue Class

The Venue class extends the Papers Collection class and supports all of its functionality. Its main goal is to make it easy to analyze venues (especially journals) and to understand how a venue's features change over time. The class can be constructed using a MAG venue ID, a venue name, an ISSN list, or a list of MAG paper IDs. For example, if we want to analyze the journal Science, we can use the following code:

In [79]:
from ScienceDynamics.venue import Venue
# this will search and load all the MAG papers from Science  
v = Venue(venue_name="Science")
# Total papers: 212,305 papers
print(f"Total papers {len(v.papers_list)}")
print(f"Top 10-keywords in 1960 {v.top_keywords(1960, top_keywords_number=10)}")
print(f"Top 10-keywords in 2015 {v.top_keywords(2015, top_keywords_number=10)}")

2019-05-05 16:20:23,453 [MainThread  ] [INFO ]  Consturcting a Venue object with the following params venue_id=None, venue_name=Science, issn_list=()
2019-05-05 16:20:23,454 [MainThread  ] [INFO ]  Getting papers id of venue_id=None,venue_name=Science. and issn_list=()
2019-05-05 16:20:34,434 [MainThread  ] [INFO ]  Consturcted a Venue object with 212305 papers
2019-05-05 16:20:34,436 [MainThread  ] [INFO ]  Get SJR data of venue_name=Science, issn_list=()
100%|██████████| 212305/212305 [01:57<00:00, 1800.70it/s]


Total papers 212305
Top 10-keywords in 1960 {'genetics': 6, 'social science': 5, 'elementary particles': 4, 'social change': 4, 'antigens': 3, 'nucleotides': 3, 'enzyme': 3, 'history of science': 3, 'organic chemistry': 3, 'magnetic field': 3}
Top 10-keywords in 2015 {'physical sciences': 7, 'ergonomics': 5, 'injury prevention': 5, 'human factors': 5, 'suicide prevention': 5, 'occupational safety': 5, 'bioinformatics': 4, 'biomedical research': 4, 'membrane protein': 4, 'ribosome': 3}


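In this study, we mainly focus on papers with at least 5 references. Therefore, we will use the *papers_filter_func* parameter to filter out the papers with fewer than 5 references. Before doing so, let's inspect the source of one of the methods we are about to call: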

In [87]:
??v.last_authors_average_age

Signature: v.last_authors_average_age(at_year)
Source:   
    def last_authors_average_age(self, at_year):
        """
        Returns the papers' last authors average age in a specific year
        :param at_year: year
        :return: the average last authors age at the input year
        :rtype: float
        """
        a = AuthorsListAnalyzer(self.last_authors_list(at_year))
        return a.get_average_age(at_year)
File:      ~/Projects/ScienceDynamics/ScienceDyn

In [99]:
v = Venue(venue_name="Science", papers_filter_func=lambda p:p.references_count < 5)
print(f"Median number of citations after five years for papers published in 2000 - {v.papers_median_citations_after_years(2000, 5, True)}")
print(f"papers' average length in 2015 - {v.papers_average_length(2015)[0]}")
print(f"Last authors average age in 2014 - {v.last_authors_average_age(2014)}")


2019-05-05 16:58:31,952 [MainThread  ] [DEBUG]  Fetching author 7697C8D5
2019-05-05 16:58:31,956 [MainThread  ] [DEBUG]  Fetching author 7FC64E58
2019-05-05 16:58:31,959 [MainThread  ] [DEBUG]  Fetching author 79912D3F
2019-05-05 16:58:31,962 [MainThread  ] [DEBUG]  Fetching author 75E906E1
2019-05-05 16:58:31,965 [MainThread  ] [DEBUG]  Fetching author 81525373
2019-05-05 16:58:31,968 [MainThread  ] [DEBUG]  Fetching author 7FA0137F
2019-05-05 16:58:31,971 [MainThread  ] [DEBUG]  Fetching author 7FE082F3
2019-05-05 16:58:31,974 [MainThread  ] [DEBUG]  Fetching author 821E6CA6
2019-05-05 16:58:31,976 [MainThread  ] [DEBUG]  Fetching author 12CE81FF
2019-05-05 16:58:31,979 [MainThread  ] [DEBUG]  Fetching author 81531803
2019-05-05 16:58:31,982 [MainThread  ] [DEBUG]  Fetching author 0581434F
2019-05-05 16:58:31,985 [MainThread  ] [DEBUG]  Fetching author 7FD32899
2019-05-05 16:58:31,988 [MainThread  ] [DEBUG]  Fetching author 7E4CE925
2019-05-05 16:58:31,992 [MainThread  ] [DEBUG]  Fet

Last authors average age in 2014 - 19.81500872600349


It is also possible to construct a Venue object using a list of paper IDs. The VenueFetcher class (in venue_fetcher.py) contains a function that provides an easy way to get all the paper IDs for various journals:

In [109]:
from ScienceDynamics.config.configs import AMINER_MAG_JOIN_SFRAME, SJR_SFRAME

os.path.exists("/storage/homedir/dima/.scidyn/sframes/PapersAMinerMagJoin.sframe")
SJR_SFRAME

PosixPath('/storage/homedir/dima/.scidyn/sjr/sframes/sjr.sframe')

In [110]:
SJR_SFRAME.exists()

True

In [121]:
from ScienceDynamics.config.fetch_config import VENUE_FETCHER

VENUE_FETCHER.get_valid_venues_papers_ids_sframe(min_ref_number=5, min_journal_papers_num=100)

| Journal ID mapped to venue name ... | Count | Paper IDs List | Journal name |
|---|---|---|---|
| 06EE1071 | 171 | [782E6433, 79927624, 7AA179B7, 7A3ADF48, ... | international journal of multilingualism ... |
| 0AD28B1B | 125 | [7C9B8EAC, 7CDFB67F, 7807C85A, 5BC3CAB8, ... | world futures |
| 097C9AE0 | 376 | [752CB61C, 79B8217D, 7BEDF836, 7BCF313D, ... | international journal of pediatrics ... |
| 012F7643 | 4967 | [7E0C0C15, 5972FE63, 7DC3D9CC, 7E7C344F, ... | ieee electron device letters ... |
| 088C8647 | 749 | [7E1F905F, 81928908, 761C9AD5, 75432042, ... | journal of headache and pain ... |
| 06A038F6 | 5982 | [6E099E08, 76E47B7B, 75B35FB1, 757A4709, ... | journal of molecular and cellular cardiology ... |
| 0331F330 | 214 | [7736AFC7, 76ADE811, 75BE9E14, 759D0421, ... | journal of early christian studies ... |
| 0627FF90 | 432 | [750ABC81, 800F235A, 802D07A6, 7FE203BE, ... | fish and fisheries |
| 014C16AA | 138 | [758A56A6, 75F3E050, 7704D695, 7B3471D2, ... | nuclear physics news |
| 01A08C1A | 1915 | [759EF730, 76525930, 7701499F, 77DC7B5C, ... | heterocycles |


One of the main uses of the Venue class is to analyze how various features have changed over time. Namely, the features_dict property returns a dict that includes the venue information and how various features have changed over the years.

In [123]:
v.features_dict

2019-05-05 18:59:07,303 [MainThread  ] [INFO ]  Calculating venue=Science feature=papers_number
2019-05-05 18:59:07,836 [MainThread  ] [INFO ]  Calculating venue=Science feature=authors_number
2019-05-05 18:59:07,838 [MainThread  ] [DEBUG]  Fetching author 7FB622AE
2019-05-05 18:59:07,842 [MainThread  ] [DEBUG]  Fetching author 84710B54
2019-05-05 18:59:07,845 [MainThread  ] [DEBUG]  Fetching author 81C5C83B
2019-05-05 18:59:07,848 [MainThread  ] [DEBUG]  Fetching author 857966BF
2019-05-05 18:59:07,850 [MainThread  ] [DEBUG]  Fetching author 7EDE50FB
2019-05-05 18:59:07,853 [MainThread  ] [DEBUG]  Fetching author 7B1FBCCF
2019-05-05 18:59:07,855 [MainThread  ] [DEBUG]  Fetching author 7DE4A244
2019-05-05 18:59:07,857 [MainThread  ] [DEBUG]  Fetching author 80720214
2019-05-05 18:59:07,859 [MainThread  ] [DEBUG]  Fetching author 7B5E4835
2019-05-05 18:59:07,861 [MainThread  ] [DEBUG]  Fetching author 7AFCB40F
2019-05-05 18:59:07,864 [MainThread  ] [DEBUG]  Fetching author 80252995
2019
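Once such a dictionary is available, its per-year series can be explored with ordinary dictionary operations. The sketch below assumes the venue features use the same nesting as the `features` entry printed for the FieldOfStudy example later in this tutorial (feature name, then series name, then year to value); the four sample values are copied from that output, and the helpers `peak_year` and `yearly_change` are hypothetical.

```python
from typing import Dict

# Year -> papers_number values copied from the 'Social network analysis' output
# shown later in this tutorial; the same nesting is assumed for venues.
features: Dict[str, Dict[str, Dict[int, int]]] = {
    "papers_number": {"papers_number": {2009: 351, 2010: 364, 2011: 406, 2012: 272}},
}


def peak_year(series: Dict[int, int]) -> int:
    """Return the year with the largest value."""
    return max(series, key=series.get)


def yearly_change(series: Dict[int, int]) -> Dict[int, int]:
    """Difference between each year and the previous available year."""
    years = sorted(series)
    return {y: series[y] - series[prev] for prev, y in zip(years, years[1:])}


papers_per_year = features["papers_number"]["papers_number"]
peak = peak_year(papers_per_year)
change = yearly_change(papers_per_year)
print(peak, change)
```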

## 6. The Field of Study Class

The FieldOfStudy class extends the Papers Collection class and supports all of its functionality. Its main goal is to make it easy to analyze how the features of a field of study change over time. The class can be constructed using a MAG field of study ID. For example, if we want to analyze the "social network" field, we can use the following code:

In [128]:
# We use FieldsOfStudyFetcher (fetchers.fields_of_study.py) to get the field IDs of the social network fields
import re
from ScienceDynamics.config.fetch_config import FIELDS_OF_STUDY_FETCHER

d = FIELDS_OF_STUDY_FETCHER.get_field_ids_by_name(re.compile(r".*social.*network.*", re.IGNORECASE))
print(d)

{'06D662E0': 'Social network analysis', '05242AA7': 'Social network'}


Two fields contain the words "social network"; we will analyze the field "social network analysis", whose field ID is 06D662E0. We can use the PapersCollection functions to calculate various field of study features. For example, for each publication year, we can find the paper in the field that was most cited after five years.

In [132]:
from ScienceDynamics.field_of_study import FieldOfStudy
fs = FieldOfStudy(field_id='06D662E0')
fs.get_yearly_most_cited_papers_sframe(citation_after_year=5, max_publish_year=2015)

100%|██████████| 6079/6079 [00:00<00:00, 306817.81it/s]


| citation_number | ids | title | venue_name | venue_type | year |
|---|---|---|---|---|---|
| 9.0 | 740909C8 | Understanding the policy landscape for climate ... |  |  | 2015 |
| 36.0 | 5952A950 | What's different about social media networks? a ... | Management Information Systems Quarterly ... |  | 2014 |
| 68.0 | 816DE0D4 | Value network analysis and value conversion of ... | Journal of Intellectual Capital ... |  | 2013 |
| 176.0 | 803E5C6D | A Review of Facebook Research in the Social ... | Perspectives on Psychological Science ... |  | 2012 |
| 83.0 | 7A42E9EE | A realist evaluation of the role of communities ... | Implementation Science |  | 2011 |
| 165.0 | 7F4AA6EA | A matrix factorization technique with trust ... | conference on recommender systems ... |  | 2010 |
| 494.0 | 75F0BB33 | Network Analysis in the Social Sciences ... | Science |  | 2009 |
| 267.0 | 7DC68258 | Dynamic Spread of Happiness in a Large ... | BMJ |  | 2008 |
| 465.0 | 79FFC3D5 | Why we twitter: understanding ... | knowledge discovery and data mining ... |  | 2007 |
| 90.0 | 7ECDA02D | A Graph-theoretic perspective on centra ... | Social Networks |  | 2006 |
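The table above lists one top paper per publication year. A minimal sketch of that selection step, assuming each paper already carries its citation count five years after publication, could look like this. `PaperRecord` and `yearly_most_cited` are hypothetical; the two real IDs and counts come from the table, and the `Z...` entries are made up.

```python
from typing import Dict, List, NamedTuple


class PaperRecord(NamedTuple):
    """Hypothetical minimal paper record for this illustration."""
    paper_id: str
    publish_year: int
    citations_after_5_years: int


def yearly_most_cited(papers: List[PaperRecord]) -> Dict[int, PaperRecord]:
    """Keep, for every publication year, the paper with the most citations."""
    best: Dict[int, PaperRecord] = {}
    for p in papers:
        current = best.get(p.publish_year)
        if current is None or p.citations_after_5_years > current.citations_after_5_years:
            best[p.publish_year] = p
    return best


sample = [
    PaperRecord("75F0BB33", 2009, 494),  # from the table above
    PaperRecord("Z0000001", 2009, 120),  # made-up competitor
    PaperRecord("Z0000002", 2008, 15),   # made-up competitor
    PaperRecord("7DC68258", 2008, 267),  # from the table above
]
winners = {year: p.paper_id for year, p in yearly_most_cited(sample).items()}
print(winners)
```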


If we select only papers with at least five references, we can calculate various field of study features using the features_dict function:

In [133]:
fs = FieldOfStudy(field_id='06D662E0', papers_filter_func=lambda p: p.references_count < 5) # to calculate authors features only papers with at least five references need to be selected
fs.features_dict(add_field_features_over_time=True)

100%|██████████| 6079/6079 [00:00<00:00, 281792.78it/s]
2019-05-06 08:26:56,436 [MainThread  ] [DEBUG]  Fetching author 8324BCBB
2019-05-06 08:26:56,440 [MainThread  ] [DEBUG]  Fetching author 774C9AEF
2019-05-06 08:26:56,443 [MainThread  ] [DEBUG]  Fetching author 80617ABD
2019-05-06 08:26:56,446 [MainThread  ] [DEBUG]  Fetching author 806E00CA
2019-05-06 08:26:56,448 [MainThread  ] [DEBUG]  Fetching author 74F3910C
2019-05-06 08:26:56,450 [MainThread  ] [DEBUG]  Fetching author 79CC9022
2019-05-06 08:26:56,453 [MainThread  ] [DEBUG]  Fetching author 64C8CEB6
2019-05-06 08:26:56,455 [MainThread  ] [DEBUG]  Fetching author 7E015812
2019-05-06 08:26:56,458 [MainThread  ] [DEBUG]  Fetching author 644FA429
2019-05-06 08:26:56,461 [MainThread  ] [DEBUG]  Fetching author 1316204E
2019-05-06 08:26:56,464 [MainThread  ] [DEBUG]  Fetching author 7F5D0DCB
2019-05-06 08:26:56,466 [MainThread  ] [DEBUG]  Fetching author 7D40B2BC
2019-05-06 08:26:56,470 [MainThread  ] [DEBUG]  Fetching author 7AD6

{'field_id': '06D662E0',
 'name': 'Social network analysis',
 'level': 3,
 'papers_number': 2992,
 'start_year': 1975,
 'end_year': 2016,
 'features': {'papers_number': {'papers_number': {1975: 1,
    1976: 0,
    1977: 0,
    1978: 0,
    1979: 1,
    1980: 0,
    1981: 2,
    1982: 1,
    1983: 0,
    1984: 2,
    1985: 1,
    1986: 0,
    1987: 0,
    1988: 1,
    1989: 1,
    1990: 1,
    1991: 2,
    1992: 1,
    1993: 1,
    1994: 0,
    1995: 4,
    1996: 4,
    1997: 2,
    1998: 7,
    1999: 10,
    2000: 12,
    2001: 10,
    2002: 22,
    2003: 35,
    2004: 41,
    2005: 77,
    2006: 118,
    2007: 196,
    2008: 257,
    2009: 351,
    2010: 364,
    2011: 406,
    2012: 272,
    2013: 337,
    2014: 248,
    2015: 188,
    2016: 16}},
  'authors_number': {'authors_number': {1975: 3,
    1976: 0,
    1977: 0,
    1978: 0,
    1979: 2,
    1980: 0,
    1981: 5,
    1982: 1,
    1983: 0,
    1984: 2,
    1985: 2,
    1986: 0,
    1987: 0,
    1988: 1,
    1989: 3,
    1990: