### APIs and web scraping

- __Exercise 1__: Let's use the REST API httpbin to get our user agent. The User-Agent request header is a characteristic string that lets servers and network peers identify the application, operating system, vendor, and/or version of the requesting user agent

1. Go to __[https://httpbin.org/](https://httpbin.org/)__
2. Expand the section Request inspection
3. Expand the section User agent
4. There, we are told to perform a GET request to the URL: __[https://httpbin.org/user-agent](https://httpbin.org/user-agent)__

In [1]:
# We import requests to perform HTTP requests and json to decode the response body
import requests
import json

# We perform a get request to that URL (with no parameters)
response: requests.Response = requests.get("https://httpbin.org/user-agent")

# We get the response status to see if everything went well
if response.status_code in [200, 201]:
    # We extract the response
    json_response: str = response.text
    # We convert it to dict
    dict_response: dict = json.loads(json_response)
    print(dict_response)
else:
    raise requests.RequestException("Status code not OK")

{'user-agent': 'python-requests/2.31.0'}
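
The body that httpbin returns is a small JSON object, so once it is decoded the user agent can be split into product and version. The sketch below works on a hard-coded copy of the response shown above rather than on a live request, and the helper name `split_user_agent` is our own; it is not part of requests or httpbin.

```python
import json


def split_user_agent(body: str) -> tuple[str, str]:
    """Decode a /user-agent response body and split 'product/version'."""
    user_agent: str = json.loads(body)["user-agent"]
    # partition returns (before, separator, after); version is "" if there is no "/"
    product, _, version = user_agent.partition("/")
    return product, version


# Hard-coded copy of the body shown above (no network access needed)
sample_body: str = '{"user-agent": "python-requests/2.31.0"}'
print(split_user_agent(sample_body))
```

With a live response, `response.json()` returns the same dict as `json.loads(response.text)`, so the conversion step can be shortened.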


- __Exercise 2__: Let's use the REST API [https://reqres.in/](https://reqres.in/) to delete a user

1. Go to __[https://reqres.in/](https://reqres.in/)__
2. Scroll down to the instructions for deleting users
3. Perform a DELETE request to that URL

In [9]:
# The URL of a user is https://reqres.in/api/users/<_id_>, let's delete user 11 for instance
url: str = "https://reqres.in/api/users/11"

# Let's execute the request
response: requests.Response = requests.delete(url)

# Let's check status code
if response.status_code in [200, 201, 204]:
    # Status code 204 (No Content) means the request succeeded and the response has no body
    print("Success")
else:
    print(response.status_code)

Success
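
The exercises check the status code against short lists such as `[200, 201, 204]`, but every code in the 2xx range signals success. As a minimal sketch, the hypothetical helper `describe_status` below (our own name, built only on the standard library) classifies a code and looks up its reason phrase.

```python
from http import HTTPStatus


def describe_status(status_code: int) -> tuple[bool, str]:
    """Return (is_success, reason phrase) for an HTTP status code."""
    # Every code in the 2xx range signals success, not only 200, 201 and 204
    is_success: bool = 200 <= status_code < 300
    try:
        phrase: str = HTTPStatus(status_code).phrase
    except ValueError:
        # The code is not one of the standard codes known to the standard library
        phrase = "Unknown"
    return is_success, phrase


for code in (200, 204, 404):
    print(code, describe_status(code))
```

requests offers related shortcuts: `response.ok` is true for any status code below 400, and `response.raise_for_status()` raises `requests.HTTPError` for 4xx and 5xx codes.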


- __Exercise 3__: Let's build an application that uses the same REST API to update one of its users. At the prompt, we will ask for the ID of the user to update, the new name and the new job

1. Go to __[https://reqres.in/](https://reqres.in/)__
2. Scroll down to the instructions for updating users
3. Perform a PUT request (rather than a PATCH request) to that URL with the required data

In [17]:
# We create a variable that will contain the base URL we have to perform our request to
url: str = "https://reqres.in/api/users/"

# We ask which user we want to update and their new name and new job
user_id: str = input("Which user do you want to update? Provide their ID: ")
new_name: str = input("What's the new name of this user? ")
new_job: str = input("What's the new job of this user? ")

# We use the new name and new job to create the payload the request will receive
update_data: dict = {"name": new_name, "job": new_job}
# We convert it to JSON format
update_data: str = json.dumps(update_data)

# We perform the PUT request to the base URL plus the ID given by the user
response: requests.Response = requests.put(url = url + user_id, headers = {"Content-Type": "application/json"}, data = update_data)
# If the request went well, we display its response in JSON format
if response.status_code in [200, 201, 204]:
    print(response.json())
else:
    # Otherwise, something went wrong and we display the corresponding status code
    print(response.status_code)

Which user do you want to update? Provide their ID:  3
What's the new name of this user?  Morphine Love Dion
What's the new job of this user?  Lip sync assassin


{'name': 'Morphine Love Dion', 'job': 'Lip sync assassin', 'updatedAt': '2024-03-29T18:32:46.972Z'}
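
Since the user ID comes straight from the prompt, it can help to validate it before it is appended to the URL. The sketch below separates that step from the request itself; `build_update_request` and `BASE_URL` are hypothetical names of our own, and the rule that an ID is a whole number is an assumption based on the IDs used in these exercises.

```python
import json

BASE_URL: str = "https://reqres.in/api/users/"


def build_update_request(user_id: str, name: str, job: str) -> tuple[str, str]:
    """Validate the typed values and return (url, JSON body) for the PUT request."""
    user_id = user_id.strip()
    # Assumption: user IDs are whole numbers such as 3 or 11
    if not (user_id.isascii() and user_id.isdigit()):
        raise ValueError(f"User ID must be a whole number, got {user_id!r}")
    payload: dict = {"name": name.strip(), "job": job.strip()}
    return BASE_URL + user_id, json.dumps(payload)


url, body = build_update_request(" 3", "Morphine Love Dion", "Lip sync assassin")
print(url)
print(body)
```

requests can also serialize the payload itself: passing `json=payload` instead of `data=...` encodes the dict and sets the `Content-Type: application/json` header.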


- __Exercise 4__: Let's now build a similar application to create new users

1. Go to __[https://reqres.in/](https://reqres.in/)__
2. Scroll down to the instructions for creating users
3. Perform a POST request to that URL, providing the correct payload in the parameter data

In [27]:
# We create a variable with the endpoint URL
url: str = "https://reqres.in/api/users/"

# We ask for the name and the job of this new user
user_name: str = input("What's the new name of this user? ")
user_job: str = input("What's the new job of this user? ")

# We use the new name and new job to create the payload the request will receive
user_data: dict = {"name": user_name, "job": user_job}
user_data: str = json.dumps(user_data)

# We perform a POST request to the endpoint providing the payload user_data in the parameter data
response: requests.Response = requests.post(url = url, headers = {"Content-Type": "application/json"}, data = user_data)

# If the request went well, we display its response in JSON format
if response.status_code in [200, 201]:
    print(response.json())
else:
    # Otherwise, we display the status code
    print(response.status_code)

What's the new name of this user?  Morphine Love Dion
What's the new job of this user?  Lip sync assassin


{'name': 'Morphine Love Dion', 'job': 'Lip sync assassin', 'id': '41', 'createdAt': '2024-03-29T18:46:32.259Z'}


- __Exercise 5__: Let's write a piece of code that, given an item, retrieves the prices of the different offers for that item on Amazon.es and works out the average price

In [38]:
# We need to import NumPy and BeautifulSoup
import numpy as np
from bs4 import BeautifulSoup

In [51]:
# We build the base URL
url: str = "https://www.amazon.es/s?k="

# We ask which item we want to search for
item_to_search: str = input("Which item do you want to research?")

# We replace blank spaces with the symbol + (this is how spaces are encoded in URL query strings)
item_url: str = item_to_search.replace(" ", "+")

# We perform the GET request to that URL, but we need to add the header "User-Agent":"Defined" for the request to work
response: requests.Response = requests.get(url + item_url, headers={"User-Agent":"Defined"})

Which item do you want to research? medieval books
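
Replacing spaces by hand works for simple search terms, but characters such as `&` or `+` also have a special meaning in a query string. As a sketch, the standard library function `urllib.parse.quote_plus` handles both cases; the helper name `build_search_url` is our own.

```python
from urllib.parse import quote_plus

BASE_URL: str = "https://www.amazon.es/s?k="


def build_search_url(item: str) -> str:
    """Encode the search term for a query string and append it to the base URL."""
    # quote_plus turns spaces into "+" and percent-encodes reserved characters
    return BASE_URL + quote_plus(item.strip())


print(build_search_url("medieval books"))
print(build_search_url("rock & roll"))
```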


In [52]:
# We parse the HTML
parsed_html = BeautifulSoup(response.text, 'html5lib')

# All the prices are in a span whose class is called 'a-price'
a_prices = parsed_html.find_all("span", {"class": "a-price"})

# We create a NumPy array to contain all the prices
# We have to get the inner span and its navigable text in every element
# We have to turn the price string with its currency symbol into a float
amazon_prices: np.ndarray = np.array([float(a_price.span.string.strip("\xa0€").replace(",",".")) for a_price in a_prices])

print(amazon_prices)

[ 20.89  15.26  18.99  26.45  13.19  13.19  28.4   29.9   13.31  20.79
  23.97  12.46   8.77  24.06  26.1   27.62  15.99  19.95   8.76  19.49
  23.97  24.5   18.95  19.95  15.75   8.35  11.93  12.45  20.95  19.17
  16.26  13.75   6.71   8.76  11.99  65.8   99.    19.5   11.5   10.75
  42.66  13.05  14.1   13.82  19.75  19.25   1.19  12.46  14.37  16.29
   8.38  27.52  37.15 111.67 144.35  21.09  12.46  11.39  16.27  35.25
  26.95  14.37  18.    23.75  25.    20.79]
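
The conversion above only swaps the decimal comma, so a price with a thousands separator such as `1.234,56 €` would raise a `ValueError`. The sketch below is one way to handle that case. It assumes that prices always use a dot for thousands and a comma for decimals, and `parse_euro_price` is a hypothetical helper of our own.

```python
def parse_euro_price(text: str) -> float:
    """Convert a price string such as '1.234,56 €' into a float."""
    # Drop the currency symbol and any regular or non-breaking spaces
    cleaned: str = text.replace("€", "").replace("\xa0", "").strip()
    # Assumption: dots are thousands separators and the comma is the decimal mark
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return float(cleaned)


print(parse_euro_price("20,89\xa0€"))
print(parse_euro_price("1.234,56\xa0€"))
```

In the list comprehension, `parse_euro_price(a_price.span.string)` would then take the place of the `float(...)` expression.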


In [53]:
print("Average price: {0}".format(amazon_prices.mean()))

Average price: 23.467575757575755
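
A few expensive offers can pull the mean upwards, so the median is a useful companion figure. The sketch below uses only a handful of the prices printed above, including the two largest ones, so its numbers are not those of the full result set.

```python
import statistics

# A handful of the prices printed above, including the two largest ones
sample_prices: list[float] = [20.89, 15.26, 18.99, 111.67, 144.35]

mean_price: float = statistics.mean(sample_prices)
median_price: float = statistics.median(sample_prices)

print("Mean price: {0:.2f}".format(mean_price))
print("Median price: {0:.2f}".format(median_price))
```

For the full array, `np.median(amazon_prices)` gives the same statistic.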


- __Exercise 6__: Let's put all the new releases curated by AllMusic's editors in a dataframe

In [60]:
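# We build the base URL of Wikipedia articles
base_url: str = "https://en.wikipedia.org/wiki/"

# We request the article about France
response: requests.Response = requests.get(base_url + "France")

# We parse the HTML
parsed_html = BeautifulSoup(response.text, 'html5lib')

# We look for every text node that is exactly "Population"
population_elements = parsed_html.find_all(string = "Population")

# We print the siblings that follow the first match
for next_sib in population_elements[0].next_siblings:
    print(next_sib)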