# Web Scraping #2

There are many ways to scrape data from the web. When an API is available, using it is one of the best options, because APIs follow a standardized set of rules for requesting information and return data in standardized formats such as JSON or XML. A data scientist sends an HTTP request to an API, and the API responds with data in JSON or XML. Some APIs still support XML, but JSON has become the more common format.
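To show why JSON is convenient to work with in Python, here is a minimal sketch that parses the same record delivered as JSON and as XML. It uses only the standard library (`json` and `xml.etree.ElementTree`). Both payloads are hypothetical and were not returned by any real API.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads describing the same user, once as JSON and once as XML
json_body = '{"user": {"login": "example-user", "public_repos": 8}}'
xml_body = "<user><login>example-user</login><public_repos>8</public_repos></user>"

# JSON maps directly onto Python dicts, lists, strings, and numbers
data = json.loads(json_body)
login_from_json = data["user"]["login"]
repos_from_json = data["user"]["public_repos"]  # already an int

# XML is parsed into an element tree; text values come back as strings
root = ET.fromstring(xml_body)
login_from_xml = root.findtext("login")
repos_from_xml = int(root.findtext("public_repos"))  # explicit conversion needed

print(login_from_json, repos_from_json)
print(login_from_xml, repos_from_xml)
```

Both approaches recover the same values. With XML, however, you navigate the tree and convert types yourself.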

## HTTP Methods

HTTP defines several request methods. The four you will use most often to work with a web server are:

- GET: retrieves data. It is what your browser uses when you visit a website through the address bar.
- POST: sends data to the server, for example when you fill out and submit a form. Entering your username and password to log in to your Google account uses the POST method.
- PUT: updates an existing object or piece of information on the server. For example, updating the personal information in your account could use the PUT method.
- DELETE: removes an object from the server.
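The sketch below builds, but does not send, one request for each of the four methods. It uses the standard library's `urllib.request.Request` so it runs offline. In the requests library, the counterparts are `requests.get()`, `requests.post()`, `requests.put()`, and `requests.delete()`. The `api.example.com` URLs and JSON bodies are hypothetical.

```python
from urllib.request import Request

# Hypothetical endpoints used only for illustration; nothing is sent over the network
USER_URL = "https://api.example.com/users/42"
USERS_URL = "https://api.example.com/users"

get_req = Request(USER_URL, method="GET")  # read a resource
post_req = Request(USERS_URL, data=b'{"name": "Ada"}', method="POST")  # submit new data
put_req = Request(USER_URL, data=b'{"name": "Ada Lovelace"}', method="PUT")  # update a resource
delete_req = Request(USER_URL, method="DELETE")  # remove a resource

for req in (get_req, post_req, put_req, delete_req):
    # get_method() reports the verb that would be used if the request were sent
    print(req.get_method(), req.full_url, req.data)
```

If you omit `method=`, `Request` defaults to GET when no data is attached and to POST when data is present.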

## Authentication

Some APIs use authentication to charge per call or to offer services on a monthly subscription basis. Others use it to limit the number of calls per month or to restrict access to certain kinds of information. When an API requires authentication, you typically send a key or token with every request, whichever of the four methods you use.
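APIs differ in how they expect the key or token to be sent. A common convention, assumed here, is an `Authorization: Bearer <token>` header. The sketch below attaches such a header with the standard library's `urllib.request.Request`. With requests you would pass the same dictionary through the `headers=` argument of `requests.get()`. The URL, the `EXAMPLE_API_TOKEN` environment variable, and the fallback token are hypothetical. Always check the API's own documentation for the exact header it expects.

```python
import os
from urllib.request import Request

def build_authenticated_request(url, token, scheme="Bearer"):
    """Return a GET request carrying an API token in the Authorization header.

    The "<scheme> <token>" header format is an assumed, common convention.
    """
    return Request(url, headers={"Authorization": f"{scheme} {token}"}, method="GET")

# Read the token from a (hypothetical) environment variable to keep it out of the code
token = os.environ.get("EXAMPLE_API_TOKEN", "my-secret-token")
req = build_authenticated_request("https://api.example.com/v1/data", token)
print(req.get_header("Authorization"))
```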

## Requests

GET is one of the most common HTTP methods. It indicates that you are trying to retrieve data from a specified resource. To make a GET request, call `requests.get()`, which returns a `Response` object for inspecting the results of the request. The response's status code tells you the status of the request: a 200 OK status means that your request was successful, whereas a 404 NOT FOUND status means that the resource you were looking for was not found. Calling `raise_for_status()` on the response raises an `HTTPError` for error status codes (4xx or 5xx) and does nothing otherwise.

```python
import requests

response = requests.get('https://api.github.com')
print(response.status_code)
```

```
200
```

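```python
def check_response_status(res):
    """Print a short message describing the response's status code."""
    if res.status_code == 200:
        print("Success!")
    elif res.status_code == 404:
        # 404 means the requested resource was not found, not the server
        print("Not Found!")
    else:
        print(f"Unexpected status: {res.status_code}")

check_response_status(response)
# Raises requests.exceptions.HTTPError for 4xx/5xx codes; does nothing for 200
response.raise_for_status()
```

```
Success!
```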
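`check_response_status` singles out only 200 and 404. Status codes are grouped into classes by their first digit: 1xx informational, 2xx success, 3xx redirection, 4xx client error, and 5xx server error. The sketch below uses the standard library's `http.HTTPStatus` to look up the standard reason phrase and report the class. The function name `describe_status` is hypothetical.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a status code with its standard reason phrase and response class."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for codes HTTPStatus does not define
    if code < 200:
        category = "informational"
    elif code < 300:
        category = "success"
    elif code < 400:
        category = "redirection"
    elif code < 500:
        category = "client error"
    else:
        category = "server error"
    return f"{code} {phrase} ({category})"

for code in (200, 404, 500):
    print(describe_status(code))
```

For example, `describe_status(404)` returns `"404 Not Found (client error)"`.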


To see the response’s content, use `.content` to get the raw bytes or `.text` to get a decoded string. Decoding bytes into a `str` requires an encoding scheme. requests guesses the encoding from the response, but you can set `.encoding` explicitly before reading `.text`.

```python
# Raw body as bytes
print(response.content)

# Set the encoding explicitly, then read the body as a decoded str
response.encoding = 'utf-8'
print(response.text)
```

```
b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","label_sea
```
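The encoding matters because the same bytes decode to different text under different schemes. A decoding mismatch often produces no error, only garbled characters. The sketch below illustrates this with plain `bytes.decode()` from the standard library. The response body is hypothetical and stands in for `response.content`.

```python
# A hypothetical response body containing a non-ASCII character
raw = '{"city": "Zürich"}'.encode("utf-8")  # bytes, like response.content

text_utf8 = raw.decode("utf-8")      # matching scheme: the original text
text_latin1 = raw.decode("latin-1")  # wrong scheme: no error, but garbled text

print(type(raw).__name__, raw)
print(text_utf8)
print(text_latin1)  # {"city": "ZÃ¼rich"}
```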

To get a dictionary, you can take the string returned by `.text` and deserialize it with `json.loads()`. A simpler way is to call `.json()` on the response, which parses the JSON body for you:

```python
import json

# Deserialize the decoded text yourself...
res_text = response.text
res_json = json.loads(res_text)
print(res_json)
print(type(res_json))

# ...or let requests parse the JSON body directly
res_json = response.json()
```

```
{'current_user_url': 'https://api.github.com/user', 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}', 'authorizations_url': 'https://api.github.com/authorizations', 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}', 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}', 'emails_url': 'https://api.github.com/user/emails', 'emojis_url': 'https://api.github.com/emojis', 'events_url': 'https://api.github.com/events', 'feeds_url': 'https://api.github.com/feeds', 'followers_url': 'https://api.github.com/user/followers', 'following_url': 'https://api.github.com/user/following{/target}', 'gists_url': 'https://api.github.com/gists{/gist_id}', 'hub_url': 'https://api.github.com/hub', 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}', 'issues_url': 'https://api.github.com/issues', 'keys_url': 'https://api.git
```
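Both `json.loads()` and requests' `.json()` raise an exception when the body is not valid JSON, for instance when a server returns an HTML error page. The sketch below wraps `json.loads()`, which accepts `str` or `bytes`, and returns `None` on failure. Returning `None` is a design choice made for this example. The helper name `parse_json_body` and both bodies are hypothetical.

```python
import json

def parse_json_body(body):
    """Deserialize a JSON body given as str or bytes; return None if it is not valid JSON."""
    try:
        return json.loads(body)  # accepts str, bytes, or bytearray
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses
        return None

# Hypothetical bodies, for illustration only
good = parse_json_body(b'{"current_user_url": "https://api.github.com/user"}')
bad = parse_json_body("<html>Service Unavailable</html>")

print(type(good), good["current_user_url"])
print(bad)
```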

To customize a GET request, you pass values through query string parameters in the URL. With `requests.get()`, you supply them as a dictionary to the `params` argument, and requests adds them to the URL for you.
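The `params` dictionary ends up as a URL-encoded query string. The sketch below uses the standard library's `urllib.parse.urlencode` to show comparable encoding. A space becomes `+`, while a literal `+` inside a value, as in `'requests+language:python'` from the search example, is percent-encoded as `%2B`. The server therefore receives a plus sign rather than a space. The `per_page` entry is an assumed extra parameter added only for illustration.

```python
from urllib.parse import urlencode, parse_qs

# The same kind of dictionary you would pass to requests.get(..., params=...)
params = {"q": "requests language:python", "per_page": 5}

query = urlencode(params)
print(query)  # q=requests+language%3Apython&per_page=5

# A literal "+" is percent-encoded, so it is not interpreted as a space
print(urlencode({"q": "requests+language:python"}))  # q=requests%2Blanguage%3Apython

# Decoding the query string recovers the values (as lists of strings)
print(parse_qs(query))
```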

```python
import requests

# Search GitHub's repositories for requests
response = requests.get(
    'https://api.github.com/search/repositories',
    params={'q': 'requests+language:python'},
)

json_response = response.json()
repository = json_response['items'][0]
print(f'Repository name: {repository["name"]}')
print(f'Repository description: {repository["description"]}')
```

```
Repository name: grequests
Repository description: Requests + Gevent = <3
```
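In the search response, results are nested in a list under `"items"`. Indexing `items[0]` therefore raises an `IndexError` when a search has no matches. The sketch below reads the first few results defensively. It runs on hand-written sample data that only mimics the shape of a search response and is not real GitHub output. The helper name `top_repositories` is hypothetical.

```python
def top_repositories(search_json, limit=3):
    """Return (name, description) pairs for the first `limit` search results.

    Expects a dict with results under an "items" list; returns an empty
    list when there are no matches instead of raising IndexError.
    """
    items = search_json.get("items", [])
    return [(repo.get("name"), repo.get("description")) for repo in items[:limit]]

# Hypothetical, hand-written data mimicking the shape of a search response
sample = {
    "total_count": 2,
    "items": [
        {"name": "repo-one", "description": "First example"},
        {"name": "repo-two", "description": None},
    ],
}

for name, description in top_repositories(sample):
    print(f"Repository name: {name}")
    print(f"Repository description: {description}")

print(top_repositories({"total_count": 0, "items": []}))  # []
```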




