# Some Starter Code for Retrieving Data Using an API
In this notebook I include a basic example of:
1. retrieving data using the [SemanticScholar APIs](https://api.semanticscholar.org/graph/v1)
2. storing it in a pandas dataframe
3. writing it to a .csv file.

In [2]:
import requests 
import pandas as pd 
import json 

As an example, the following API call performs a search by keyword:
1. The response contains total (the number of matching papers, 843739 in the output below; this count changes over time), offset=0, next=100, and data, which is a list of 100 papers.
2. Each paper has paperId, year, referenceCount, citationCount, influentialCitationCount and fieldsOfStudy.

Feel free to change the strings after 'query=' and 'fields=' to specify the keyword you want to search for and the fields, i.e. the data, you want the API to return. Use 'limit=' to specify how many papers you want it to return.
For more information on other APIs, refer to [SemanticScholar APIs](https://api.semanticscholar.org/graph/v1).
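
The query string can also be assembled from a dictionary instead of being edited by hand. The following is a minimal sketch using only the standard library; the helper name `build_search_url` and its default values are illustrative and not part of the API.

```python
from urllib.parse import urlencode

# Endpoint taken from the request used in this notebook.
SEARCH_URL = 'https://api.semanticscholar.org/graph/v1/paper/search'


def build_search_url(query, fields, offset=0, limit=100):
    """Return the search URL for a keyword, a list of field names, and a page."""
    params = {
        'query': query,
        'fields': ','.join(fields),  # fields are sent as one comma-separated string
        'offset': offset,
        'limit': limit,
    }
    # safe=',' keeps the commas in 'fields' readable instead of encoding them as %2C
    return SEARCH_URL + '?' + urlencode(params, safe=',')


url = build_search_url('covid', ['year', 'citationCount'], limit=10)
print(url)
```

The resulting string can be passed to `requests.get` in the same way as the hard-coded URL in the next cell.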

In [26]:
response = requests.get('https://api.semanticscholar.org/graph/v1/paper/search?query=covid&fields=year,referenceCount,citationCount,influentialCitationCount,fieldsOfStudy&offset=0&limit=100')

In [27]:
print(json.dumps(response.json(), sort_keys = False, indent = 4))

{
    "total": 843739,
    "offset": 0,
    "next": 100,
    "data": [
        {
            "paperId": "8e787e925eeb7ad735a228b2b1e8dd6d9620be83",
            "year": 2020,
            "referenceCount": 43,
            "citationCount": 14223,
            "influentialCitationCount": 483,
            "fieldsOfStudy": [
                "Medicine"
            ]
        },
        {
            "paperId": "97881c6577c310f50fc86738c0268896b970dfa4",
            "year": 2020,
            "referenceCount": 12,
            "citationCount": 10037,
            "influentialCitationCount": 339,
            "fieldsOfStudy": [
                "Medicine"
            ]
        },
        {
            "paperId": "ca019e1e38edf9d2112ea987362da454f909ac1b",
            "year": 2020,
            "referenceCount": 4,
            "citationCount": 4737,
            "influentialCitationCount": 227,
            "fieldsOfStudy": [
                "Medicine"
            ]
        },
        {
            "paper

Note that response.json() is a dictionary with the keys 'total', 'offset', 'next', and 'data'. The value we are interested in is stored under 'data': it is a list of dictionaries, and each dictionary holds the requested fields for one paper matching your query.
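
To make the nesting concrete, here is a small sketch that navigates the structure. The `payload` dictionary is a trimmed, hand-copied version of the response printed above (only the first two papers), so the example runs without a network request.

```python
# Trimmed copy of the response printed above: first two papers only.
payload = {
    'total': 843739,
    'offset': 0,
    'next': 100,
    'data': [
        {'paperId': '8e787e925eeb7ad735a228b2b1e8dd6d9620be83', 'year': 2020,
         'referenceCount': 43, 'citationCount': 14223,
         'influentialCitationCount': 483, 'fieldsOfStudy': ['Medicine']},
        {'paperId': '97881c6577c310f50fc86738c0268896b970dfa4', 'year': 2020,
         'referenceCount': 12, 'citationCount': 10037,
         'influentialCitationCount': 339, 'fieldsOfStudy': ['Medicine']},
    ],
}

papers = payload['data']        # list of dictionaries, one per paper
first = papers[0]               # dictionary for the first paper
print(first['citationCount'])   # 14223

# Pull one field out of every paper with a list comprehension.
citation_counts = [paper['citationCount'] for paper in papers]
print(citation_counts)          # [14223, 10037]
```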

It seems you can only retrieve 100 papers per request. To retrieve the next 100 papers, use 'offset=100&limit=100'.
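
Retrieving several pages in a row could be sketched as the loop below. This is an illustration rather than a tested client: the names `fetch_all` and `fetch_page_from_api` are made up for this example, the query and fields are placeholders, and the loop assumes that the 'next' key is missing from the response once there are no more results.

```python
def fetch_all(fetch_page, max_papers=300, page_size=100):
    """Collect up to max_papers papers by calling fetch_page(offset, limit) repeatedly."""
    papers = []
    offset = 0
    while offset is not None and len(papers) < max_papers:
        page = fetch_page(offset, page_size)
        batch = page.get('data', [])
        if not batch:
            break  # nothing came back, so stop instead of looping forever
        papers.extend(batch)
        # Assumption: 'next' is absent once there are no more results.
        offset = page.get('next')
    return papers[:max_papers]


def fetch_page_from_api(offset, limit):
    """Request one page of search results (needs network access)."""
    import requests  # imported here so fetch_all can be tried offline

    response = requests.get(
        'https://api.semanticscholar.org/graph/v1/paper/search',
        params={'query': 'covid', 'fields': 'year,citationCount',
                'offset': offset, 'limit': limit},
    )
    response.raise_for_status()  # stop early on HTTP errors
    return response.json()


# Example call (performs real requests, so it is left commented out):
# papers = fetch_all(fetch_page_from_api, max_papers=300)
```

Passing the page-fetching function as an argument keeps the paging logic separate from the network call, which makes it easy to try the loop with a stand-in function first.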

The following cells are an example of storing the retrieved data in a pandas dataframe and writing it out as CSV.

In [28]:
df = pd.DataFrame(response.json()['data'])
df

,paperId,year,referenceCount,citationCount,influentialCitationCount,fieldsOfStudy
0,8e787e925eeb7ad735a228b2b1e8dd6d9620be83,2020,43,14223,483,[Medicine]
1,97881c6577c310f50fc86738c0268896b970dfa4,2020,12,10037,339,[Medicine]
2,ca019e1e38edf9d2112ea987362da454f909ac1b,2020,4,4737,227,[Medicine]
3,c273cb0fcab40abe02805c689806f59c50b7d640,2020,16,5247,134,[Medicine]
4,d23288ee99138421d6a771a14a98a9cdddd97f98,2020,5,4954,140,[Medicine]
...,...,...,...,...,...,...
95,90dae4893fea17a5bf57c3cc34bfa9f9c065845b,2020,37,1009,39,[Medicine]
96,770b665c6941f8ae96cc7ef4ec434b059108bdac,2020,23,759,52,"[Computer Science, Engineering, Physics, Medic..."
97,525065701dcc1a0c8ee64e22a832343e663da46e,2020,41,812,44,[Medicine]
98,f593e1eda77495997c0e21deb48656b57099620b,2020,29,722,43,"[Biology, Medicine]"


In [29]:
df.describe()

,year,referenceCount,citationCount,influentialCitationCount
count,100.0,100.0,100.0,100.0
mean,2020.0,52.52,2158.76,73.86
std,0.0,57.113476,1760.657408,61.115063
min,2020.0,0.0,666.0,27.0
25%,2020.0,17.75,1308.25,46.0
50%,2020.0,35.5,1725.5,57.0
75%,2020.0,62.0,2358.0,77.5
max,2020.0,353.0,14223.0,483.0


In [30]:
df.to_csv(index=False)

'paperId,year,referenceCount,citationCount,influentialCitationCount,fieldsOfStudy\n8e787e925eeb7ad735a228b2b1e8dd6d9620be83,2020,43,14223,483,[\'Medicine\']\n97881c6577c310f50fc86738c0268896b970dfa4,2020,12,10037,339,[\'Medicine\']\nca019e1e38edf9d2112ea987362da454f909ac1b,2020,4,4737,227,[\'Medicine\']\nc273cb0fcab40abe02805c689806f59c50b7d640,2020,16,5247,134,[\'Medicine\']\nd23288ee99138421d6a771a14a98a9cdddd97f98,2020,5,4954,140,[\'Medicine\']\ndd86b3551add27004b5bf3f5fb206bec9cd69c4f,2020,18,4936,128,[\'Medicine\']\n9a1210a794670f7b13add9ab9e4d038f0529ca4a,2020,45,4176,85,[\'Medicine\']\ncb2f7b692a3a6fde784aca19531e5df97d25fbfd,2020,45,3749,227,[\'Medicine\']\n00d4f4b2e38a2fbe15c672c21c522e2f95264cb0,2020,36,3705,156,[\'Medicine\']\nd1ae0a43e55e862fe2a3220b8bf0f92942617ffe,2020,36,3499,116,[\'Medicine\']\n754eef845a1fd33405661e9de4e985f020ea949a,2020,22,3525,107,[\'Medicine\']\n5973278f9a9657d8fbdb161cd2c6c33ba0bceac2,2020,111,3548,112,[\'Medicine\']\nb3a6f19fe6ef0d9f7cab2bb896e78
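
Called without a path, as in the cell above, `to_csv` returns the CSV text as a string rather than creating a file. A minimal sketch of writing to disk and reading the file back follows; the file name `papers.csv` is a placeholder, and the two rows are copied from the dataframe shown earlier purely for illustration.

```python
import os
import tempfile

import pandas as pd

# Two rows copied from the dataframe shown earlier, for illustration only.
df = pd.DataFrame([
    {'paperId': '8e787e925eeb7ad735a228b2b1e8dd6d9620be83', 'year': 2020,
     'citationCount': 14223, 'fieldsOfStudy': ['Medicine']},
    {'paperId': 'f593e1eda77495997c0e21deb48656b57099620b', 'year': 2020,
     'citationCount': 722, 'fieldsOfStudy': ['Biology', 'Medicine']},
])

# 'papers.csv' is a placeholder name; a temporary directory keeps the example tidy.
path = os.path.join(tempfile.mkdtemp(), 'papers.csv')
df.to_csv(path, index=False)  # passing a path writes the file and returns None

df_back = pd.read_csv(path)
print(df_back.shape)
# List-valued columns come back as plain strings such as "['Medicine']".
print(type(df_back.loc[0, 'fieldsOfStudy']))
```

Because the lists in fieldsOfStudy are stored as text, they need to be parsed again after loading if you want to work with them as lists.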