# Web Data Extraction - Part I

__Web API:__ Application Programming Interface for Web Applications (Client --> Server)

__REST:__ Representational State Transfer

__HTTP library for Python:__ [Requests](https://2.python-requests.org/en/latest/user/quickstart/)

In [2]:
# Import libraries
import pandas as pd
import requests   #!conda install requests

# Pandas display options
pd.set_option('display.max_columns', None)

---

In [3]:
response = requests.get('https://jsonplaceholder.typicode.com/todos')  # get() is a function of the requests library; it returns a Response object
print(type(response))

<class 'requests.models.Response'>


### HTTP Response

[Boring Reference](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)

[Funny Reference](https://http.cat/)

In [4]:
status = response.status_code  # the status code attribute: 200 = OK, 4xx = client error (e.g. 400 Bad Request)
status

200
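Status codes are grouped by their first digit (1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error). As a small offline sketch, the hypothetical helper `describe_status` below uses the standard library's `http.HTTPStatus` to turn a code such as the 200 above into a readable phrase and class:

```python
from http import HTTPStatus

# Classes of HTTP status codes, keyed by the first digit of the code
STATUS_CLASSES = {1: "informational", 2: "success", 3: "redirection",
                  4: "client error", 5: "server error"}

def describe_status(code):
    """Return a short description such as '200 OK (success)' (hypothetical helper)."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # code not registered in the standard library
        phrase = "Unknown"
    return f"{code} {phrase} ({STATUS_CLASSES.get(code // 100, 'unknown class')})"

print(describe_status(200))
print(describe_status(404))
```

With requests itself, `response.ok` (True for codes below 400) and `response.raise_for_status()` cover the most common checks.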

In [5]:
content = response.content  # content attribute. Attributes have no parentheses; methods do.
content  # the raw response body, as bytes

b'[\n  {\n    "userId": 1,\n    "id": 1,\n    "title": "delectus aut autem",\n    "completed": false\n  },\n  {\n    "userId": 1,\n    "id": 2,\n    "title": "quis ut nam facilis et officia qui",\n    "completed": false\n  },\n  {\n    "userId": 1,\n    "id": 3,\n    "title": "fugiat veniam minus",\n    "completed": false\n  },\n  {\n    "userId": 1,\n    "id": 4,\n    "title": "et porro tempora",\n    "completed": true\n  },\n  {\n    "userId": 1,\n    "id": 5,\n    "title": "laboriosam mollitia et enim quasi adipisci quia provident illum",\n    "completed": false\n  },\n  {\n    "userId": 1,\n    "id": 6,\n    "title": "qui ullam ratione quibusdam voluptatem quia omnis",\n    "completed": false\n  },\n  {\n    "userId": 1,\n    "id": 7,\n    "title": "illo expedita consequatur quia in",\n    "completed": false\n  },\n  {\n    "userId": 1,\n    "id": 8,\n    "title": "quo adipisci enim quam ut ab",\n    "completed": true\n  },\n  {\n    "userId": 1,\n    "id": 9,\n    "title": "molest
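`response.json()` does roughly two things: decode the bytes to text, then parse that text as JSON. The sketch below repeats those two steps with the standard library on the first record of the bytes shown above, hard-coded so it runs offline. It assumes UTF-8, which is the charset this response declares in its Content-Type header (requests also exposes the decoded text as `response.text`).

```python
import json

# First record of the byte string printed above, copied verbatim
raw = b'[\n  {\n    "userId": 1,\n    "id": 1,\n    "title": "delectus aut autem",\n    "completed": false\n  }\n]'

text = raw.decode("utf-8")  # bytes -> str
data = json.loads(text)     # str -> Python objects (a list of dicts)

print(type(data), data[0]["title"], data[0]["completed"])
```

Note that JSON `false` becomes Python `False` during parsing.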

In [6]:
json_data = response.json()
print(type(json_data))  # a list of dictionaries
print(len(json_data))  # 200 dictionaries
print(type(json_data[0]))  # each element is a dictionary
json_data[0].keys()  # the keys of each dictionary

<class 'list'>
200
<class 'dict'>


dict_keys(['userId', 'id', 'title', 'completed'])
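Because `json_data` is an ordinary list of dictionaries, plain Python already works on it before any DataFrame is involved. A minimal sketch on the first four records from the output above, hard-coded here so it runs without a network call:

```python
# First four records of the /todos response shown above
todos = [
    {"userId": 1, "id": 1, "title": "delectus aut autem", "completed": False},
    {"userId": 1, "id": 2, "title": "quis ut nam facilis et officia qui", "completed": False},
    {"userId": 1, "id": 3, "title": "fugiat veniam minus", "completed": False},
    {"userId": 1, "id": 4, "title": "et porro tempora", "completed": True},
]

# Keep only the titles of completed tasks
done = [todo["title"] for todo in todos if todo["completed"]]
share_done = len(done) / len(todos)

print(done, share_done)
```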

In [7]:
# Other attributes/methods: .headers, .links, .cookies

headers = response.headers  # response metadata
headers

{'Date': 'Sat, 11 Dec 2021 09:13:41 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'x-powered-by': 'Express', 'x-ratelimit-limit': '1000', 'x-ratelimit-remaining': '999', 'x-ratelimit-reset': '1637712008', 'vary': 'Origin, Accept-Encoding', 'access-control-allow-credentials': 'true', 'cache-control': 'max-age=43200', 'pragma': 'no-cache', 'expires': '-1', 'x-content-type-options': 'nosniff', 'etag': 'W/"5ef7-4Ad6/n39KWY9q6Ykm/ULNQ2F5IM"', 'via': '1.1 vegur', 'CF-Cache-Status': 'HIT', 'Age': '24078', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=tj3%2B7SNVNyiItp2RyiFdQCvWSY81SDfombXN8nOHmr15CvQgVYecjW0Gqv8AhLlNqd57JMU8pNnEW9yAWm5GDChYr3hHPWn3aRNB9jdFLO2G9rg1dARxfuRELLfT9L29a867G%2Fr0dZSoqkxS2eeM"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"success_fraction":0,"report_to":"
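`response.headers` is a `requests.structures.CaseInsensitiveDict`, so header names can be looked up in any letter case. The sketch below rebuilds three of the headers printed above offline and reads the rate-limit pair as integers. The helper `rate_limit_status` is hypothetical, and the `X-RateLimit-*` names are simply what this server sent, not a header every API uses.

```python
from requests.structures import CaseInsensitiveDict

# A few header values copied from the response printed above
headers = CaseInsensitiveDict({
    "Content-Type": "application/json; charset=utf-8",
    "x-ratelimit-limit": "1000",
    "x-ratelimit-remaining": "999",
})

def rate_limit_status(headers):
    """Return (limit, remaining) as ints; None where a header is missing (hypothetical helper)."""
    limit = headers.get("X-RateLimit-Limit")          # lookup ignores letter case
    remaining = headers.get("X-RateLimit-Remaining")
    return (int(limit) if limit is not None else None,
            int(remaining) if remaining is not None else None)

# Media type without the charset parameter
content_type = headers["content-type"].split(";")[0]
print(content_type, rate_limit_status(headers))
```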

In [8]:
# But we like DataFrames

df = pd.DataFrame(json_data)  # each row corresponds to one dictionary
df

|     | userId | id  | title | completed |
|-----|--------|-----|-------|-----------|
| 0   | 1      | 1   | delectus aut autem | False |
| 1   | 1      | 2   | quis ut nam facilis et officia qui | False |
| 2   | 1      | 3   | fugiat veniam minus | False |
| 3   | 1      | 4   | et porro tempora | True |
| 4   | 1      | 5   | laboriosam mollitia et enim quasi adipisci qui... | False |
| ... | ...    | ... | ... | ... |
| 195 | 10     | 196 | consequuntur aut ut fugit similique | True |
| 196 | 10     | 197 | dignissimos quo nobis earum saepe | True |
| 197 | 10     | 198 | quis eius est sint explicabo | True |
| 198 | 10     | 199 | numquam repellendus a magnam | True |
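Once the records are in a DataFrame, aggregations become one-liners. A sketch using eight rows from the table above (the `title` column is left out for brevity). Since `completed` is boolean, its mean is the share of completed tasks:

```python
import pandas as pd

# Eight rows taken from the table above (userId, id, completed)
rows = [
    {"userId": 1, "id": 1, "completed": False},
    {"userId": 1, "id": 2, "completed": False},
    {"userId": 1, "id": 3, "completed": False},
    {"userId": 1, "id": 4, "completed": True},
    {"userId": 10, "id": 196, "completed": True},
    {"userId": 10, "id": 197, "completed": True},
    {"userId": 10, "id": 198, "completed": True},
    {"userId": 10, "id": 199, "completed": True},
]
df_sample = pd.DataFrame(rows)

# Share of completed tasks per user (True counts as 1, False as 0)
share_per_user = df_sample.groupby("userId")["completed"].mean()
print(share_per_user)
```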


---

In [None]:
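# Main end-point: base URL of the GitHub REST API
end_point = 'https://api.github.com/'
# Path segments (these build the URL path, not a request body)
par1 = 'repos/'               # resource type
par2 = 'ih-datapt-mad/'       # repository owner
par3 = 'dataptmad1121_labs/'  # repository name
par4 = 'pulls'                # https://docs.github.com/en/rest/reference/pulls
# Query-string parameters
par5 = '?state=open'          # only open pull requests
#par6 = '&per_page=100'       # results per page (pagination)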
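Concatenating strings works, but it is easy to lose a `/` or mix up the `?` and `&` separators. A sketch of a hypothetical helper `build_url` that joins path segments and encodes the query string with the standard library's `urllib.parse.urlencode`:

```python
from urllib.parse import urlencode

def build_url(base, path_parts, params=None):
    """Join path segments onto base and append an encoded query string (hypothetical helper)."""
    # Strip stray slashes so segments like 'repos/' and 'repos' behave the same
    path = "/".join(part.strip("/") for part in path_parts)
    url = base.rstrip("/") + "/" + path
    if params:
        url += "?" + urlencode(params)  # e.g. {'state': 'open'} -> 'state=open'
    return url

url = build_url("https://api.github.com/",
                ["repos", "ih-datapt-mad", "dataptmad1121_labs", "pulls"],
                {"state": "open", "per_page": 100})
print(url)
```

requests can also handle the query string itself: `requests.get(url, params={'state': 'open', 'per_page': 100})` encodes the dictionary the same way.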


In [None]:
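pulls_response = requests.get(end_point + par1 + par2 + par3 + par4 + par5)  # GET request to the GitHub API
pulls_response.raise_for_status()  # raise an HTTPError on 4xx/5xx (e.g. rate limit, missing repo)
pulls_json = pulls_response.json()  # parse the JSON body: a list of pull-request dicts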
print(type(pulls_json))
print(len(pulls_json))
print(pulls_json[0].keys())
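GitHub returns list results in pages (30 items by default, at most 100 with `per_page`) and points to the following page in the `Link` response header, which requests parses into `response.links`. A minimal sketch of a hypothetical `fetch_all_pages` that keeps following the `rel="next"` link; it takes the GET function as an argument, so it can be called with `requests.get` or with a fake in tests:

```python
def fetch_all_pages(url, get, params=None):
    """Collect the JSON lists from every page by following rel="next" links (hypothetical helper).

    `get` is any callable with the signature of requests.get, such as
    requests.get itself or requests.Session().get.
    """
    items = []
    while url:
        response = get(url, params=params)
        response.raise_for_status()    # stop on 4xx/5xx instead of parsing an error body
        items.extend(response.json())  # each page is assumed to be a JSON list
        # response.links is keyed by rel; no "next" entry means this was the last page
        url = response.links.get("next", {}).get("url")
        params = None  # the "next" URL already carries the query string
    return items
```

For example, `fetch_all_pages(end_point + par1 + par2 + par3 + par4, requests.get, params={'state': 'open', 'per_page': 100})` would collect every open pull request. Unauthenticated requests to api.github.com have a low rate limit (60 per hour per IP address), so large repositories may need an access token.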

In [None]:
df_pulls = pd.DataFrame(pulls_json)
df_pulls.head()

---

In [1]:
# The best way is with the pd.json_normalize function, which flattens nested dictionaries into separate columns

df_flat = pd.json_normalize(pulls_json)#.T.reset_index(drop=True)
df_flat

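`pd.DataFrame` keeps nested objects, such as the `user` of a pull request, as dictionaries inside a single column, while `pd.json_normalize` expands them into dotted columns. A small offline comparison on hand-made records loosely shaped like GitHub pull requests (fields trimmed, values invented for illustration):

```python
import pandas as pd

# Hypothetical records: not real API output
pulls_sample = [
    {"number": 1, "title": "Lab A", "user": {"login": "alice", "id": 10}},
    {"number": 2, "title": "Lab B", "user": {"login": "bob", "id": 11}},
]

nested = pd.DataFrame(pulls_sample)     # 'user' stays one column of dicts
flat = pd.json_normalize(pulls_sample)  # 'user' becomes 'user.login' and 'user.id'
flat_underscore = pd.json_normalize(pulls_sample, sep="_")  # 'user_login', 'user_id'

print(list(flat.columns))
```

The `sep` parameter changes the separator, and `max_level` limits how many levels of nesting are flattened.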

---

__Some useful tools:__

- https://curlconverter.com/

- https://www.postman.com/

__Some REST APIs to practice with:__

- https://jsonplaceholder.typicode.com

- https://docs.github.com/en/rest

- https://github.com/Kaggle/kaggle-api

- https://polygon.io/

- https://coinmarketcap.com/api/documentation/v1/#section/Quick-Start-Guide

- https://datos.gob.es/es/documentacion/guia-practica-para-la-publicacion-de-datos-abiertos-usando-apis