# Getting Data - Part 3


Information comes from Chapter 9 of Data Science from Scratch, 2nd Edition by Joel Grus. This book is available for free through the library's connection to O'Reilly's learning platform.

## What have we learned so far?

We have looked at **increasing levels of abstraction**. 

**Read and write to stdin and stdout**  

```
import sys

# for every line read in from stdin
for line in sys.stdin:
    sys.stdout.write(line)
```

**Read and write from/to a file**

```
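# open one file for reading ('r') and another for writing ('w')
f = open('testfile.txt', 'r')
fw = open('testfilewrite.txt', 'w')

fw.write('Hello, file!\n')  # write a string to the output file

f.close()
fw.close()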
```

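**Read from a file**

```
f = open('testfile.txt', 'r')

f.readlines()   # a list containing every line of the file
f.seek(0)       # rewind: each way of reading consumes the file

f.read()        # the whole file as a single string
f.seek(0)

for line in f:  # iterate lazily, one line at a time
    print(line)

f.close()
```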

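**Read from delimited files**

```
import csv

# tab-delimited file: each row comes back as a list of strings
with open('tab_file.txt', newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        print(row)

# colon-delimited file with a header row: each row comes back as a dict
with open('colon_file.txt', newline='') as f:
    reader = csv.DictReader(f, delimiter=':')
    for row in reader:
        print(row)
```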

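**Write to delimited files**

```
import csv
import time

stock, price = 'AAPL', 90.91  # example values

# text mode with newline='' lets the csv module control line endings
with open('data/comma_test.txt', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerow([time.strftime("%m/%d/%Y"), stock, price])
```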

**Read and write data with `pandas`**

```
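import pandas as pd

df = pd.read_csv('filename.csv')    # comma-separated values
df = pd.read_table('filename.txt')  # tab-separated by default
df = pd.read_fwf('filename.txt')    # fixed-width formatted lines
df = pd.read_clipboard()            # a table currently on the clipboard

df.to_csv('filename.csv')           # write a DataFrame out as CSV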
```

**Web scraping**   
Get data using `requests` and parse it using `BeautifulSoup` and some regular expressions / string manipulation.

```
from bs4 import BeautifulSoup
import requests 

url = "http://example.com"
html = requests.get(url).text 
soup = BeautifulSoup(html, 'html5lib')
```

### Today, you will learn about: 

* The JSON format and how to parse it
* How to use APIs to get data from websites and services

Many websites and web services provide application programming interfaces (APIs), which allow you to explicitly request data in a structured format. 

*When an API is available, use it instead of scraping information from the website.*


In [1]:
import requests
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl 
%matplotlib inline  

## JSON

Because HTTP is a protocol for transferring text, the data you request through a web API needs to be serialized into a string format. Often this serialization uses JavaScript Object Notation (JSON).



### loads( )
We can parse JSON using Python’s `json` module. In particular, we will use its `loads` function, which deserializes a string representing a JSON object into a Python object.

In [2]:
serialized = """{ "title" : "Data Science Book",
                  "author" : "Joel Grus",
                  "publicationYear" : 2014,
                  "topics" : [ "data", "science", "data science"] }"""

# parse the JSON to create a Python dict
deserialized = json.loads(serialized)
deserialized

{'title': 'Data Science Book',
 'author': 'Joel Grus',
 'publicationYear': 2014,
 'topics': ['data', 'science', 'data science']}

`loads` has turned the JSON string into a Python `dict`, as the `type` checks below confirm. We can therefore use it like any other dict to find information. For example, does the book cover the topic of "data science"?

In [3]:
type(serialized)

str

In [4]:
type(deserialized)

dict

In [5]:
if "data science" in deserialized["topics"]:
    print("Yes")
else: 
    print("No")

Yes


In [6]:
deserialized.keys()

dict_keys(['title', 'author', 'publicationYear', 'topics'])
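The book record above uses only strings, a number, and an array. The sketch below uses made-up JSON text to show how each JSON value type comes back from `json.loads`:

* objects become `dict` and arrays become `list`
* strings become `str`, and numbers become `int` or `float`
* `true`/`false` become `True`/`False`, and `null` becomes `None`

```python
import json

# Made-up JSON text that uses every JSON value type
text = """{"name": "ACME", "shares": 100, "price": 542.23,
           "active": true, "tags": ["a", "b"], "parent": null}"""

value = json.loads(text)

# Show which Python type each JSON value was converted to
for key, item in value.items():
    print(key, "->", type(item).__name__)
```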

### dumps( )

The `dumps` function takes a Python object (e.g., a dict) and serializes it into a JSON-formatted string.

In [7]:
data = {
   'name' : 'ACME',
   'shares' : 100,
   'price' : 542.23
}
json_obj = json.dumps(data)
json_obj

'{"name": "ACME", "shares": 100, "price": 542.23}'

In [8]:
type(data)

dict

In [9]:
type(json_obj)

str

In [10]:
data2 = json.loads(json_obj)
data2

{'name': 'ACME', 'shares': 100, 'price': 542.23}

In [11]:
type(data2)

dict
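Here `loads` undid `dumps` exactly, but a round trip is not always lossless. The sketch below uses a made-up dict to show three conversions:

* tuples are written as JSON arrays and come back as lists
* non-string keys are turned into strings
* `True` and `None` are written as `true` and `null`

```python
import json

# Made-up data containing a tuple, an int key, True and None
data = {"shares": 100, "active": True, "parent": None, "point": (1, 2), 1: "one"}

text = json.dumps(data)
print(text)
# {"shares": 100, "active": true, "parent": null, "point": [1, 2], "1": "one"}

back = json.loads(text)
print(back == data)  # False: the tuple became a list and the key 1 became "1"
```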

In [12]:
help(json.dumps)

Help on function dumps in module json:

dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
    Serialize ``obj`` to a JSON formatted ``str``.
    
    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``int``, ``float``, ``bool``, ``None``) will be skipped
    instead of raising a ``TypeError``.
    
    If ``ensure_ascii`` is false, then the return value can contain non-ASCII
    characters if they appear in strings contained in ``obj``. Otherwise, all
    such characters are escaped in JSON strings.
    
    If ``check_circular`` is false, then the circular reference check
    for container types will be skipped and a circular reference will
    result in an ``RecursionError`` (or worse).
    
    If ``allow_nan`` is false, then it will be a ``ValueError`` to
    serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``) in
    stri

### Example  
*Ref: https://pythonspot.com/en/json-encoding-and-decoding-with-python/*

In [13]:
# Convert JSON to Python Object, then iterate
array = '{"drinks": ["coffee", "tea", "water"]}'
data = json.loads(array)
 
for element in data['drinks']:
    print(element)

coffee
tea
water


In [14]:
json_data = '{"name": "Brian", "city": "Seattle"}'
python_obj = json.loads(json_data)
print(python_obj["name"])
print(python_obj["city"])

Brian
Seattle


In [15]:
obj = {
    "persons": [
        {
            "city": "Seattle", 
            "name": "Brian"
        }, 
        {
            "city": "Amsterdam", 
            "name": "David"
        }
    ]
}
obj

{'persons': [{'city': 'Seattle', 'name': 'Brian'},
  {'city': 'Amsterdam', 'name': 'David'}]}

In [16]:
obj['persons']

[{'city': 'Seattle', 'name': 'Brian'}, {'city': 'Amsterdam', 'name': 'David'}]

In [17]:
obj['persons']['city']

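TypeError: list indices must be integers or slices, not str

`obj['persons']` is a list, so we first have to pick one of its elements by integer position and only then look up a key: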

In [19]:
obj['persons'][0]['city']

'Seattle'

In [20]:
obj.keys()

dict_keys(['persons'])

In [21]:
obj['persons'][0].keys()

dict_keys(['city', 'name'])
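Chained indexing like `obj['persons'][0]['city']` raises an exception as soon as one step is missing or has the wrong type. This is common with data from web APIs. The helper below, `get_path`, is a hypothetical sketch and not part of the `json` module or any library used here. It walks a path of dict keys and list indices and returns a default instead of raising.

```python
def get_path(obj, *keys, default=None):
    """Hypothetical helper: follow dict keys / list indices into nested data.

    Returns `default` if any step is missing or has the wrong type.
    """
    current = obj
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


obj = {"persons": [{"city": "Seattle", "name": "Brian"},
                   {"city": "Amsterdam", "name": "David"}]}

print(get_path(obj, "persons", 0, "city"))               # Seattle
print(get_path(obj, "persons", "city"))                  # None: a list needs an int index
print(get_path(obj, "persons", 5, "city", default="?"))  # ?: there is no sixth person
```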

### Hierarchical Data  
*Example from Data100*

A lot of structured data isn't stored as CSV but as HTML, XML, JSON, YAML, etc. Such data is often nested (hierarchical), with a structure that pandas can't read directly into a table.

Here's an example: a group of people collected information about US congressional legislators in YAML format.

https://github.com/unitedstates/congress-legislators

Here's one of the data files:

https://github.com/unitedstates/congress-legislators/blob/master/legislators-current.yaml

YAML is a data serialization language commonly used for configuration files. For more information, see https://en.wikipedia.org/wiki/YAML

In [22]:
import requests
from pathlib import Path

legislators_path = 'legislators-current.yaml'
base_url = 'https://theunitedstates.io/congress-legislators/'

def download(url, path):
    """Download the contents of a URL to a local file."""
    path = Path(path) # If path was a string, now it's a Path
    if not path.exists():
        print('Downloading...', end=' ')
        resp = requests.get(url)
        resp.raise_for_status()  # stop instead of saving an error page as the data file
        with path.open('wb') as f:
            f.write(resp.content)
        print('Done!')
        
download(base_url + legislators_path, legislators_path)

Downloading... Done!


The code above downloads the YAML file of current legislators and saves it locally. If the file already exists, it skips the download.

We can then open the local file to look at the information.

Note that you can also find the file in the files directory and open it there.

In [23]:
import yaml

# safe_load builds only standard Python types, never arbitrary objects
with open(legislators_path) as f:
    legislators = yaml.safe_load(f)
len(legislators)

541

In [24]:
type(legislators)

list

In [25]:
legislators[0]

{'id': {'bioguide': 'B000944',
  'thomas': '00136',
  'lis': 'S307',
  'govtrack': 400050,
  'opensecrets': 'N00003535',
  'votesmart': 27018,
  'fec': ['H2OH13033', 'S6OH00163'],
  'cspan': 5051,
  'wikipedia': 'Sherrod Brown',
  'house_history': 9996,
  'ballotpedia': 'Sherrod Brown',
  'maplight': 168,
  'icpsr': 29389,
  'wikidata': 'Q381880',
  'google_entity_id': 'kg:/m/034s80'},
 'name': {'first': 'Sherrod',
  'last': 'Brown',
  'official_full': 'Sherrod Brown'},
 'bio': {'birthday': '1952-11-09', 'gender': 'M'},
 'terms': [{'type': 'rep',
   'start': '1993-01-05',
   'end': '1995-01-03',
   'state': 'OH',
   'district': 13,
   'party': 'Democrat'},
  {'type': 'rep',
   'start': '1995-01-04',
   'end': '1997-01-03',
   'state': 'OH',
   'district': 13,
   'party': 'Democrat'},
  {'type': 'rep',
   'start': '1997-01-07',
   'end': '1999-01-03',
   'state': 'OH',
   'district': 13,
   'party': 'Democrat'},
  {'type': 'rep',
   'start': '1999-01-06',
   'end': '2001-01-03',
   'sta

In [26]:
x = legislators[0]
x.keys()

dict_keys(['id', 'name', 'bio', 'terms'])

In [27]:
x['id']

{'bioguide': 'B000944',
 'thomas': '00136',
 'lis': 'S307',
 'govtrack': 400050,
 'opensecrets': 'N00003535',
 'votesmart': 27018,
 'fec': ['H2OH13033', 'S6OH00163'],
 'cspan': 5051,
 'wikipedia': 'Sherrod Brown',
 'house_history': 9996,
 'ballotpedia': 'Sherrod Brown',
 'maplight': 168,
 'icpsr': 29389,
 'wikidata': 'Q381880',
 'google_entity_id': 'kg:/m/034s80'}

In [28]:
x['id']['fec']

['H2OH13033', 'S6OH00163']

In [29]:
x['id']['fec'][0]

'H2OH13033'

In [30]:
x['name']

{'first': 'Sherrod', 'last': 'Brown', 'official_full': 'Sherrod Brown'}

In [31]:
x['bio']

{'birthday': '1952-11-09', 'gender': 'M'}

Let's write a function that converts a legislator's birthday string (such as `'1952-11-09'`) into a `datetime` object.

In [32]:
from datetime import datetime

def to_date(s):
    return datetime.strptime(s, '%Y-%m-%d')

#to_date('2020-10-06')
to_date(x['bio']['birthday'])

datetime.datetime(1952, 11, 9, 0, 0)

We can create a data frame consisting of each legislator's id, first name, last name, and birthday.

In [33]:
leg_df = pd.DataFrame(
    columns=['id', 'first', 'last', 'birthday'],
    data=[[l['id']['bioguide'], 
           l['name']['first'],
           l['name']['last'],
           to_date(l['bio']['birthday'])] for l in legislators])
leg_df.head()

|   | id | first | last | birthday |
|---|---|---|---|---|
| 0 | B000944 | Sherrod | Brown | 1952-11-09 |
| 1 | C000127 | Maria | Cantwell | 1958-10-13 |
| 2 | C000141 | Benjamin | Cardin | 1943-10-05 |
| 3 | C000174 | Thomas | Carper | 1947-01-23 |
| 4 | C001070 | Robert | Casey | 1960-04-13 |


In [34]:
leg_df.dtypes

id                  object
first               object
last                object
birthday    datetime64[ns]
dtype: object
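The list comprehension above picks each nested field by hand. pandas also provides `pd.json_normalize`, which flattens nested dicts into dotted column names such as `name.first`. With `record_path`, it can also expand a nested list such as `terms` into one row per element.

The sketch below runs on a trimmed copy of the first legislator record shown earlier. Only some fields and the first three terms are kept, and the variable names are illustrative.

```python
import pandas as pd

# Trimmed copy of legislators[0] from above (first three terms only)
legislators_sample = [
    {
        "id": {"bioguide": "B000944", "fec": ["H2OH13033", "S6OH00163"]},
        "name": {"first": "Sherrod", "last": "Brown"},
        "bio": {"birthday": "1952-11-09", "gender": "M"},
        "terms": [
            {"type": "rep", "start": "1993-01-05", "end": "1995-01-03",
             "state": "OH", "district": 13, "party": "Democrat"},
            {"type": "rep", "start": "1995-01-04", "end": "1997-01-03",
             "state": "OH", "district": 13, "party": "Democrat"},
            {"type": "rep", "start": "1997-01-07", "end": "1999-01-03",
             "state": "OH", "district": 13, "party": "Democrat"},
        ],
    },
]

# One row per legislator; nested dicts become dotted column names
leg_flat = pd.json_normalize(legislators_sample)
print(leg_flat[["id.bioguide", "name.first", "name.last", "bio.birthday"]])

# One row per term, carrying the legislator's bioguide id along as metadata
terms_df = pd.json_normalize(legislators_sample, record_path="terms",
                             meta=[["id", "bioguide"]])
print(terms_df[["id.bioguide", "start", "end", "party"]])
```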

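We could also compute each legislator's age. As a first step, subtracting the first legislator's birthday from the current time gives a `Timedelta`: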

In [35]:
datetime.now() - leg_df.loc[0, 'birthday']

Timedelta('25902 days 11:41:00.145091')
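A `Timedelta` in days is not quite an age. One way to get whole years is to subtract the birth year, then subtract one more if this year's birthday hasn't happened yet. The sketch below does this with `age_in_years`, a hypothetical helper rather than a pandas or `datetime` function:

```python
from datetime import date


def age_in_years(birthday, today):
    """Hypothetical helper: whole years between two datetime.date objects."""
    # Compare (month, day) pairs to see whether this year's birthday has passed
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)


print(age_in_years(date(1952, 11, 9), date(2023, 10, 10)))  # 70
```

Applied to the data frame, this could look like `leg_df['age'] = [age_in_years(b.date(), date.today()) for b in leg_df['birthday']]`. This works because each value in the `birthday` column is a pandas `Timestamp`, which has a `.date()` method.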

### Aside:  Lambda functions

A lambda function is a small anonymous function that can take any number of arguments, but can only have one expression. 

It has the following syntax: 

`lambda` *arguments* : *expression*

The expression is evaluated and its result is returned.

#### Example   
A lambda function that adds 10 to the number passed in as an argument.

In [36]:
# Example 
x = lambda a: a+10
print(x(5))

15


#### Example   
A lambda function that takes two inputs and multiplies them together.

In [37]:
# Example 
x = lambda a, b : a*b
print(x(5,6))

30


#### Example  
Apply the lambda function to an argument by surrounding the function and argument in parentheses:

In [38]:
(lambda x: x + 1)(2)

3

#### Example  
Lambda functions are often used with other functions and methods in Python, such as `sorted`, `map`, `filter`, and pandas' `apply`.

In [39]:
ids = ['id1', 'id2', 'id30', 'id3', 'id22', 'id100']
print(sorted(ids)) # Lexicographic sort

['id1', 'id100', 'id2', 'id22', 'id3', 'id30']


In [40]:
sorted_ids = sorted(ids, key=lambda x: int(x[2:])) # Integer sort
print(sorted_ids)

['id1', 'id2', 'id3', 'id22', 'id30', 'id100']
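`filter` is also mentioned above: it keeps only the elements for which the function returns a true value. In the short sketch below, the `numbers` list is made up, and `ids` is the same list as in the previous cell.

```python
numbers = [3, 8, 15, 22, 7, 40]

# keep only the even numbers
evens = list(filter(lambda n: n % 2 == 0, numbers))
print(evens)  # [8, 22, 40]

# keep only the ids whose numeric part is greater than 10
ids = ['id1', 'id2', 'id30', 'id3', 'id22', 'id100']
big_ids = list(filter(lambda s: int(s[2:]) > 10, ids))
print(big_ids)  # ['id30', 'id22', 'id100']
```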


#### Example   
Here is an example using the `map` function, which takes a function object and one or more iterables (a list, tuple, dictionary, etc.). `map` applies the function to each element, taking one element from each iterable at a time when several are given. It returns a lazy `map` iterator rather than a list, so the results are produced only when you loop over it or pass it to `list()`.

In [41]:
def multiply2(x): 
    return x * 2 

x = map(multiply2, [1, 2, 3, 4])
print(x)

<map object at 0x7fcd615a1e10>


In [42]:
def print_iterator(it):
    for x in it:
        print(x, end=' ')
    print('')

print_iterator(x)

2 4 6 8 


In [43]:
mp_it = map(lambda x : x * 2, [1, 2, 3, 4])
print_iterator(mp_it)

2 4 6 8 


In [44]:
list_numbers = [1, 2, 3, 4]
tuple_numbers = (5, 6, 7, 8)
map_iterator = map(lambda x, y: x * y, list_numbers, tuple_numbers)
print_iterator(map_iterator)

5 12 21 32 
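Two details about `map` are worth seeing directly:

* The iterator it returns can be consumed only once. After `print_iterator(x)` above, `x` is exhausted.
* With several iterables, it stops as soon as the shortest one runs out.

A small sketch:

```python
doubled = map(lambda x: x * 2, [1, 2, 3, 4])

first_pass = list(doubled)   # list() pulls every value out of the iterator
second_pass = list(doubled)  # the iterator is already exhausted

print(first_pass)   # [2, 4, 6, 8]
print(second_pass)  # []

# With several iterables, map stops when the shortest one runs out
products = list(map(lambda x, y: x * y, [1, 2, 3], (10, 20)))
print(products)     # [10, 40]
```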


## APIs 

Most APIs these days require you to authenticate yourself before you can use them. This adds a lot of extra boilerplate that would muddy our exposition, so the examples below use APIs that can be called without authentication.

### A Simple API example 

Let's try using the [OpenNotify API](http://open-notify.org/) that serves NASA data. 

Let's send a GET request to see what data we get back. `requests.get` takes a URL, in our case the Open Notify base URL, and returns a `Response` object whose `.text` attribute holds the body of the reply. Let's make a request and print what is returned. If we request the base URL without a specific endpoint, we get the site's HTML page back. **Endpoints** are the locations (URL paths) of the individual resources an API provides.

In [45]:
request = requests.get('http://api.open-notify.org')
print(request.text)


<!doctype html>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <link rel="stylesheet" type="text/css" href="/bootstrap.min.css" />
  <link rel="stylesheet" type="text/css" href="/style.css" />

  <title>open-notify.org APIs</title>
</head>
<body>

  <div id="wrapper">
  <div id="content">
  <div id="header">
    <div class="container">
      <div class="navbar">
        <a class="brand" href="http://www.open-notify.org">Open Notify</a>
        <h2>API Server</h2>
      </div>
    </div>
  </div>

  <div class="container">
    <div class="page-header">
      <h3>Current APIs:</h3>
    </div>
    <div class="row">
      <div class="span12">

        <h4>JSON</h4>
        <p>
          <table class="table table-bordered table-striped">
            <thead>
              <tr>
                <th>Name</th>
                <th>Description</th>
                <th>Documentation</th>
              </tr>
            </thead> 

            
            <t

In [46]:
print(request.status_code)

200


Let's try to request something from an endpoint that doesn't exist.

In [47]:
request2 = requests.get('http://api.open-notify.org/fake-endpoint')
print(request2.status_code)

404


We get a 404 (Not Found) status code.
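Status codes are grouped by their first digit:

* 1xx: informational
* 2xx: success
* 3xx: redirect
* 4xx: client error, such as requesting an endpoint that doesn't exist
* 5xx: server error

The standard library's `http.HTTPStatus` knows the official reason phrase for each code. The helper `describe_status` below is a hypothetical sketch, not part of `requests`:

```python
from http import HTTPStatus


def describe_status(code):
    """Hypothetical helper: turn a status code into e.g. '404 Not Found (client error)'."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    if code < 200:
        kind = "informational"
    elif code < 300:
        kind = "success"
    elif code < 400:
        kind = "redirect"
    elif code < 500:
        kind = "client error"
    else:
        kind = "server error"
    return f"{code} {phrase} ({kind})"


print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (client error)
```

With `requests` itself, `response.ok` is `True` for status codes below 400, and `response.raise_for_status()` raises an `HTTPError` for 4xx and 5xx responses.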

Now let's try a real endpoint. For example, the endpoint `/iss-now.json` returns the current location of the International Space Station. Alternatively, `/iss-pass.json` returns the time at which the space station passes overhead.

In [48]:
issLoc = requests.get('http://api.open-notify.org/iss-now.json')
print(issLoc.text)

obj = json.loads(issLoc.text)

print(obj['timestamp'])
print(obj['iss_position']['longitude'], obj['iss_position']['latitude'])

{"timestamp": 1696954321, "iss_position": {"longitude": "-106.7702", "latitude": "38.3170"}, "message": "success"}
1696954321
-106.7702 38.3170


The API documentation tells us how to make the request and what output to expect.

In this case the data returned has the following format: 

```
{
  "message": "success", 
  "timestamp": UNIX_TIME_STAMP, 
  "iss_position": {
    "latitude": CURRENT_LATITUDE, 
    "longitude": CURRENT_LONGITUDE
  }
}
```

Let's now convert the `UNIX_TIME_STAMP` (seconds since January 1, 1970, UTC) to a human-readable date and time using the `datetime` module.


In [49]:
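from datetime import timezone

# utcfromtimestamp is deprecated as of Python 3.12; ask for a UTC-aware datetime instead
print(datetime.fromtimestamp(obj['timestamp'], tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S'))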

2023-10-10 16:12:01
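In the actual response printed earlier, the latitude and longitude are quoted, so they arrive as strings, not numbers. The sketch below turns the saved response body (copied from above so it runs offline) into typed values:

* it checks the `message` field
* it converts the coordinates to `float`
* it builds a timezone-aware UTC `datetime`

```python
import json
from datetime import datetime, timezone

# The /iss-now.json response body printed above, saved as a string
text = ('{"timestamp": 1696954321, "iss_position": '
        '{"longitude": "-106.7702", "latitude": "38.3170"}, "message": "success"}')

obj = json.loads(text)
if obj["message"] != "success":
    raise ValueError("unexpected API response")

# latitude/longitude are JSON strings, so convert them to numbers
lat = float(obj["iss_position"]["latitude"])
lon = float(obj["iss_position"]["longitude"])

# timezone-aware UTC datetime from the Unix timestamp
when = datetime.fromtimestamp(obj["timestamp"], tz=timezone.utc)

print(when.isoformat(), lat, lon)  # 2023-10-10T16:12:01+00:00 38.317 -106.7702
```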


Another endpoint, `/astros.json`, returns information about the people currently in space.

In [50]:
people = requests.get('http://api.open-notify.org/astros.json')
print(people.text)

{"people": [{"craft": "Tiangong", "name": "Jing Haiping"}, {"craft": "Tiangong", "name": "Gui Haichow"}, {"craft": "Tiangong", "name": "Zhu Yangzhu"}, {"craft": "ISS", "name": "Jasmin Moghbeli"}, {"craft": "ISS", "name": "Andreas Mogensen"}, {"craft": "ISS", "name": "Satoshi Furukawa"}, {"craft": "ISS", "name": "Konstantin Borisov"}, {"craft": "ISS", "name": "Oleg Kononenko"}, {"craft": "ISS", "name": "Nikolai Chub"}, {"craft": "ISS", "name": "Loral O'Hara"}], "number": 10, "message": "success"}


In [51]:
people_json  = people.json()
print(people_json)

{'people': [{'craft': 'Tiangong', 'name': 'Jing Haiping'}, {'craft': 'Tiangong', 'name': 'Gui Haichow'}, {'craft': 'Tiangong', 'name': 'Zhu Yangzhu'}, {'craft': 'ISS', 'name': 'Jasmin Moghbeli'}, {'craft': 'ISS', 'name': 'Andreas Mogensen'}, {'craft': 'ISS', 'name': 'Satoshi Furukawa'}, {'craft': 'ISS', 'name': 'Konstantin Borisov'}, {'craft': 'ISS', 'name': 'Oleg Kononenko'}, {'craft': 'ISS', 'name': 'Nikolai Chub'}, {'craft': 'ISS', 'name': "Loral O'Hara"}], 'number': 10, 'message': 'success'}


We can use the optional `indent` argument of `json.dumps()` to *pretty print* the JSON, putting each array element and object member on its own indented line.

In [52]:
people_json_obj = json.dumps(people.json(), indent=2)
print(people_json_obj)

{
  "people": [
    {
      "craft": "Tiangong",
      "name": "Jing Haiping"
    },
    {
      "craft": "Tiangong",
      "name": "Gui Haichow"
    },
    {
      "craft": "Tiangong",
      "name": "Zhu Yangzhu"
    },
    {
      "craft": "ISS",
      "name": "Jasmin Moghbeli"
    },
    {
      "craft": "ISS",
      "name": "Andreas Mogensen"
    },
    {
      "craft": "ISS",
      "name": "Satoshi Furukawa"
    },
    {
      "craft": "ISS",
      "name": "Konstantin Borisov"
    },
    {
      "craft": "ISS",
      "name": "Oleg Kononenko"
    },
    {
      "craft": "ISS",
      "name": "Nikolai Chub"
    },
    {
      "craft": "ISS",
      "name": "Loral O'Hara"
    }
  ],
  "number": 10,
  "message": "success"
}


In [53]:
#To print the number of people in space
print("Number of people in space:",people_json['number'])
#To print the names of people in space using a for loop
for p in people_json['people']:
    print(p['name'])

Number of people in space: 10
Jing Haiping
Gui Haichow
Zhu Yangzhu
Jasmin Moghbeli
Andreas Mogensen
Satoshi Furukawa
Konstantin Borisov
Oleg Kononenko
Nikolai Chub
Loral O'Hara
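Because `people_json['people']` is just a list of dicts, the usual Python tools apply. The sketch below counts and groups the people by spacecraft with `collections.Counter` and `defaultdict`. The list is copied from the response above so the code runs offline.

```python
from collections import Counter, defaultdict

# Copied from the /astros.json response above
people_in_space = [
    {"craft": "Tiangong", "name": "Jing Haiping"},
    {"craft": "Tiangong", "name": "Gui Haichow"},
    {"craft": "Tiangong", "name": "Zhu Yangzhu"},
    {"craft": "ISS", "name": "Jasmin Moghbeli"},
    {"craft": "ISS", "name": "Andreas Mogensen"},
    {"craft": "ISS", "name": "Satoshi Furukawa"},
    {"craft": "ISS", "name": "Konstantin Borisov"},
    {"craft": "ISS", "name": "Oleg Kononenko"},
    {"craft": "ISS", "name": "Nikolai Chub"},
    {"craft": "ISS", "name": "Loral O'Hara"},
]

# How many people are on each spacecraft
by_craft = Counter(p["craft"] for p in people_in_space)
print(by_craft)  # Counter({'ISS': 7, 'Tiangong': 3})

# Names grouped by spacecraft
names_by_craft = defaultdict(list)
for p in people_in_space:
    names_by_craft[p["craft"]].append(p["name"])
print(names_by_craft["Tiangong"])  # ['Jing Haiping', 'Gui Haichow', 'Zhu Yangzhu']
```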


### Github API example

We’ll take a look at [GitHub’s API](https://developer.github.com/v3/), with which you can do some simple things unauthenticated. 

Here we look at the public repositories of the user joelgrus, the author of *Data Science from Scratch*.

The API documentation specifies the form of the query: 
`GET /users/:username/repos`

In [54]:
resp = requests.get('https://api.github.com/users/joelgrus/repos')
resp

<Response [200]>

In [55]:
repos = json.loads(resp.text)
repos

[{'id': 112873601,
  'node_id': 'MDEwOlJlcG9zaXRvcnkxMTI4NzM2MDE=',
  'name': 'advent2017',
  'full_name': 'joelgrus/advent2017',
  'private': False,
  'owner': {'login': 'joelgrus',
   'id': 1308313,
   'node_id': 'MDQ6VXNlcjEzMDgzMTM=',
   'avatar_url': 'https://avatars.githubusercontent.com/u/1308313?v=4',
   'gravatar_id': '',
   'url': 'https://api.github.com/users/joelgrus',
   'html_url': 'https://github.com/joelgrus',
   'followers_url': 'https://api.github.com/users/joelgrus/followers',
   'following_url': 'https://api.github.com/users/joelgrus/following{/other_user}',
   'gists_url': 'https://api.github.com/users/joelgrus/gists{/gist_id}',
   'starred_url': 'https://api.github.com/users/joelgrus/starred{/owner}{/repo}',
   'subscriptions_url': 'https://api.github.com/users/joelgrus/subscriptions',
   'organizations_url': 'https://api.github.com/users/joelgrus/orgs',
   'repos_url': 'https://api.github.com/users/joelgrus/repos',
   'events_url': 'https://api.github.com/users/j

To pretty print the information we again make use of the `dumps()` function with the indent argument. 

In [56]:
repos_obj = json.dumps(repos, indent=2)  # serialize the list back to an indented JSON string
print(repos_obj)

[
  {
    "id": 112873601,
    "node_id": "MDEwOlJlcG9zaXRvcnkxMTI4NzM2MDE=",
    "name": "advent2017",
    "full_name": "joelgrus/advent2017",
    "private": false,
    "owner": {
      "login": "joelgrus",
      "id": 1308313,
      "node_id": "MDQ6VXNlcjEzMDgzMTM=",
      "avatar_url": "https://avatars.githubusercontent.com/u/1308313?v=4",
      "gravatar_id": "",
      "url": "https://api.github.com/users/joelgrus",
      "html_url": "https://github.com/joelgrus",
      "followers_url": "https://api.github.com/users/joelgrus/followers",
      "following_url": "https://api.github.com/users/joelgrus/following{/other_user}",
      "gists_url": "https://api.github.com/users/joelgrus/gists{/gist_id}",
      "starred_url": "https://api.github.com/users/joelgrus/starred{/owner}{/repo}",
      "subscriptions_url": "https://api.github.com/users/joelgrus/subscriptions",
      "organizations_url": "https://api.github.com/users/joelgrus/orgs",
      "repos_url": "https://api.github.com/users/j

In [57]:
len(repos)

30

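At this point `repos` is a list of Python `dict`s, each representing a public repository in Joel Grus's GitHub account. GitHub returns results in pages (30 items per page by default), so this is only the first page of his repositories. (Feel free to substitute your username and get your GitHub repository data instead.)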


Let's look at the languages in the 5 most recently created repositories. 

In [74]:
last_5_repos = sorted(repos, key = lambda r: r['created_at'],reverse = True)[:5]

In [75]:
last_5_repos

[{'id': 616173441,
  'node_id': 'R_kgDOJLoPgQ',
  'name': 'bracket-filler',
  'full_name': 'joelgrus/bracket-filler',
  'private': False,
  'owner': {'login': 'joelgrus',
   'id': 1308313,
   'node_id': 'MDQ6VXNlcjEzMDgzMTM=',
   'avatar_url': 'https://avatars.githubusercontent.com/u/1308313?v=4',
   'gravatar_id': '',
   'url': 'https://api.github.com/users/joelgrus',
   'html_url': 'https://github.com/joelgrus',
   'followers_url': 'https://api.github.com/users/joelgrus/followers',
   'following_url': 'https://api.github.com/users/joelgrus/following{/other_user}',
   'gists_url': 'https://api.github.com/users/joelgrus/gists{/gist_id}',
   'starred_url': 'https://api.github.com/users/joelgrus/starred{/owner}{/repo}',
   'subscriptions_url': 'https://api.github.com/users/joelgrus/subscriptions',
   'organizations_url': 'https://api.github.com/users/joelgrus/orgs',
   'repos_url': 'https://api.github.com/users/joelgrus/repos',
   'events_url': 'https://api.github.com/users/joelgrus/even

In [76]:
last_5_langs = [r['language'] for r in last_5_repos]
last_5_langs

['Svelte', 'Python', 'Python', 'Python', 'Python']
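Sorting on `r['created_at']` works because GitHub returns these timestamps as fixed-width ISO 8601 strings in UTC (for example `2011-01-26T19:01:12Z`), and such strings sort in chronological order.

The sketch below checks this on a few made-up records; the names, dates, and languages are hypothetical. It also counts languages while skipping repositories whose `language` is `None`, which GitHub reports when it hasn't detected a language.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, trimmed-down repository records
repos_sample = [
    {"name": "repo-a", "created_at": "2017-12-02T18:10:00Z", "language": "Python"},
    {"name": "repo-b", "created_at": "2023-03-16T20:05:31Z", "language": "Svelte"},
    {"name": "repo-c", "created_at": "2019-07-01T09:00:00Z", "language": None},
    {"name": "repo-d", "created_at": "2021-01-15T12:30:00Z", "language": "Python"},
]


def parse_github_time(s):
    # assumes the format 2011-01-26T19:01:12Z (UTC, ISO 8601)
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ")


by_string = sorted(repos_sample, key=lambda r: r["created_at"], reverse=True)
by_datetime = sorted(repos_sample, key=lambda r: parse_github_time(r["created_at"]),
                     reverse=True)
print(by_string == by_datetime)  # True: string order matches chronological order

# Languages of the 3 newest repositories, ignoring missing values
langs = Counter(r["language"] for r in by_string[:3] if r["language"] is not None)
print(langs)
```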

We can also pass query parameters. For example, `type` selects whether to list the repositories the user owns or those the user is a member of.

In [58]:
resp = requests.get('https://api.github.com/users/joelgrus/repos?type=member')
repos2 = json.loads(resp.text)
print(len(repos2))
repos2

1


[{'id': 30509483,
  'node_id': 'MDEwOlJlcG9zaXRvcnkzMDUwOTQ4Mw==',
  'name': 'dynet',
  'full_name': 'clab/dynet',
  'private': False,
  'owner': {'login': 'clab',
   'id': 2374376,
   'node_id': 'MDEyOk9yZ2FuaXphdGlvbjIzNzQzNzY=',
   'avatar_url': 'https://avatars.githubusercontent.com/u/2374376?v=4',
   'gravatar_id': '',
   'url': 'https://api.github.com/users/clab',
   'html_url': 'https://github.com/clab',
   'followers_url': 'https://api.github.com/users/clab/followers',
   'following_url': 'https://api.github.com/users/clab/following{/other_user}',
   'gists_url': 'https://api.github.com/users/clab/gists{/gist_id}',
   'starred_url': 'https://api.github.com/users/clab/starred{/owner}{/repo}',
   'subscriptions_url': 'https://api.github.com/users/clab/subscriptions',
   'organizations_url': 'https://api.github.com/users/clab/orgs',
   'repos_url': 'https://api.github.com/users/clab/repos',
   'events_url': 'https://api.github.com/users/clab/events{/privacy}',
   'received_events_

If we look at the url: 
https://api.github.com/users/joelgrus/repos?type=member

Here `/users/joelgrus/repos` is the endpoint. The part after the `?` is the query string, which specifies the parameters as `key=value` pairs (several pairs are joined with `&`).

We can get the same result by passing the parameters as a dictionary; `requests` then builds the query string for us.

In [59]:
params = {"type": "member"}
resp = requests.get('https://api.github.com/users/joelgrus/repos', params=params)
repos2 = json.loads(resp.text)
repos2

[{'id': 30509483,
  'node_id': 'MDEwOlJlcG9zaXRvcnkzMDUwOTQ4Mw==',
  'name': 'dynet',
  'full_name': 'clab/dynet',
  'private': False,
  'owner': {'login': 'clab',
   'id': 2374376,
   'node_id': 'MDEyOk9yZ2FuaXphdGlvbjIzNzQzNzY=',
   'avatar_url': 'https://avatars.githubusercontent.com/u/2374376?v=4',
   'gravatar_id': '',
   'url': 'https://api.github.com/users/clab',
   'html_url': 'https://github.com/clab',
   'followers_url': 'https://api.github.com/users/clab/followers',
   'following_url': 'https://api.github.com/users/clab/following{/other_user}',
   'gists_url': 'https://api.github.com/users/clab/gists{/gist_id}',
   'starred_url': 'https://api.github.com/users/clab/starred{/owner}{/repo}',
   'subscriptions_url': 'https://api.github.com/users/clab/subscriptions',
   'organizations_url': 'https://api.github.com/users/clab/orgs',
   'repos_url': 'https://api.github.com/users/clab/repos',
   'events_url': 'https://api.github.com/users/clab/events{/privacy}',
   'received_events_
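Behind the scenes, the parameter dictionary becomes a query string of `key=value` pairs joined by `&`. The standard library's `urllib.parse` can show both directions. This is a sketch of the encoding, not the code `requests` runs internally. `per_page` is another parameter of GitHub's list endpoints and also appears in the issues example below.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base_url = "https://api.github.com/users/joelgrus/repos"
params = {"type": "member", "per_page": 5}

# dict -> query string
url = base_url + "?" + urlencode(params)
print(url)  # https://api.github.com/users/joelgrus/repos?type=member&per_page=5

# URL -> endpoint path and parameters (values come back as lists of strings)
parts = urlsplit(url)
print(parts.path)             # /users/joelgrus/repos
print(parse_qs(parts.query))  # {'type': ['member'], 'per_page': ['5']}
```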

#### Other endpoints 

Many other kinds of information are available through other endpoints.

For instance, we can retrieve the details of a single repository ([GitHub API documentation](https://docs.github.com/en/free-pro-team@latest/rest/reference/repos#get-a-repository)).

In [60]:
resp = requests.get('https://api.github.com/repos/octocat/hello-world')
repos3 = json.loads(resp.text)
repos3

{'id': 1296269,
 'node_id': 'MDEwOlJlcG9zaXRvcnkxMjk2MjY5',
 'name': 'Hello-World',
 'full_name': 'octocat/Hello-World',
 'private': False,
 'owner': {'login': 'octocat',
  'id': 583231,
  'node_id': 'MDQ6VXNlcjU4MzIzMQ==',
  'avatar_url': 'https://avatars.githubusercontent.com/u/583231?v=4',
  'gravatar_id': '',
  'url': 'https://api.github.com/users/octocat',
  'html_url': 'https://github.com/octocat',
  'followers_url': 'https://api.github.com/users/octocat/followers',
  'following_url': 'https://api.github.com/users/octocat/following{/other_user}',
  'gists_url': 'https://api.github.com/users/octocat/gists{/gist_id}',
  'starred_url': 'https://api.github.com/users/octocat/starred{/owner}{/repo}',
  'subscriptions_url': 'https://api.github.com/users/octocat/subscriptions',
  'organizations_url': 'https://api.github.com/users/octocat/orgs',
  'repos_url': 'https://api.github.com/users/octocat/repos',
  'events_url': 'https://api.github.com/users/octocat/events{/privacy}',
  'received

### Another example with GitHub API

Here we look at the issues for a given repository:
https://docs.github.com/en/free-pro-team@latest/rest/reference/issues#list-repository-issues

GitHub's REST API treats every pull request as an issue. As a result, this endpoint can return pull requests as well as issues, which is why the results include a `pull_request` column.

We also use the `pandas` `read_json` function to read the JSON output from the API directly into a data frame. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html

In [61]:
df = pd.read_json('https://api.github.com/repos/pydata/pandas/issues?per_page=5')

In [62]:
df[['created_at', 'title', 'body', 'comments']]

|   | created_at | title | body | comments |
|---|---|---|---|---|
| 0 | 2023-10-10 14:52:37+00:00 | DOC: Add Hail to the out-of-core community list | `- [x] closes #55476\r\n- [x] ~~[Tests added an...` | 0 |
| 1 | 2023-10-10 14:51:14+00:00 | DOC: Add Hail to the out-of-core community list | `### Pandas version checks\n\n- [X] I have chec...` | 0 |
| 2 | 2023-10-10 13:45:34+00:00 | CI: Add normal testing for Python 3.12 | `- [ ] closes #xxxx (Replace xxxx with the GitH...` | 0 |
| 3 | 2023-10-10 11:36:42+00:00 | REGR: sort index with sliced MultiIndex | `- [x] closes #55379\r\n- [x] [Tests added and ...` | 0 |
| 4 | 2023-10-10 09:56:53+00:00 | EHN: read_spss stores the metadata in df.attrs | `- [x] closes #54264\r\n- [x] [Tests added and ...` | 0 |


In [63]:
res = df[['created_at', 'title', 'body', 'comments']].head()
res.to_json()

'{"created_at":{"0":1696949557000,"1":1696949474000,"2":1696945534000,"3":1696937802000,"4":1696931813000},"title":{"0":"DOC: Add Hail to the out-of-core community list","1":"DOC: Add Hail to the out-of-core community list","2":"CI: Add normal testing for Python 3.12","3":"REGR: sort index with sliced MultiIndex","4":"EHN: read_spss stores the metadata in df.attrs"},"body":{"0":"- [x] closes #55476\\r\\n- [x] ~~[Tests added and passed](https:\\/\\/pandas.pydata.org\\/pandas-docs\\/dev\\/development\\/contributing_codebase.html#writing-tests) if fixing a bug or adding a new feature~~ irrelevant\\r\\n- [x] ~~All [code checks passed](https:\\/\\/pandas.pydata.org\\/pandas-docs\\/dev\\/development\\/contributing_codebase.html#pre-commit).~~ irrelevant\\r\\n- [x] ~~Added [type annotations](https:\\/\\/pandas.pydata.org\\/pandas-docs\\/dev\\/development\\/contributing_codebase.html#type-hints) to new arguments\\/methods\\/functions.~~ irrelevant\\r\\n- [x] ~~Added an entry in the latest `doc
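By default `DataFrame.to_json()` uses `orient='columns'`, which nests the values by column and then by row label and writes dates as epoch milliseconds, as in the output above. The `orient` and `date_format` parameters change this. The sketch below uses a two-row frame with values copied from the issues table above; the variable names are illustrative.

```python
import json

import pandas as pd

# Two rows with values copied from the issues table above
res_sample = pd.DataFrame({
    "created_at": pd.to_datetime(["2023-10-10 14:52:37+00:00",
                                  "2023-10-10 13:45:34+00:00"]),
    "title": ["DOC: Add Hail to the out-of-core community list",
              "CI: Add normal testing for Python 3.12"],
    "comments": [0, 0],
})

# Default orient='columns': {column: {row label: value}}, dates as epoch milliseconds
print(res_sample.to_json())

# orient='records': one JSON object per row; date_format='iso' keeps dates readable
text = res_sample.to_json(orient="records", date_format="iso")
print(json.dumps(json.loads(text), indent=2))
```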

In [64]:
df.keys()

Index(['url', 'repository_url', 'labels_url', 'comments_url', 'events_url',
       'html_url', 'id', 'node_id', 'number', 'title', 'user', 'labels',
       'state', 'locked', 'assignee', 'assignees', 'milestone', 'comments',
       'created_at', 'updated_at', 'closed_at', 'author_association',
       'active_lock_reason', 'draft', 'pull_request', 'body', 'reactions',
       'timeline_url', 'performed_via_github_app', 'state_reason'],
      dtype='object')

#### Finding APIs

If you need data from a specific site, look for a developers or API section of the site for details, and try searching the Web for “python __ api” to find a library.

If you’re looking for lists of APIs that have Python wrappers, one directory is at [Python for Beginners](http://bit.ly/1L35VOR).
