<div class='bar_title'></div>

*Introduction to Data Science (IDS)*

# Assignment 5 - Getting Data in Python Solutions

Gunther Gust / Vanessa Haustein <br>
Chair for Enterprise AI<br>
Data Driven Decisions (D3) Group<br>
Center for Artificial Intelligence and Data Science (CAIDAS)

<img src="images/d3.png" style="width:20%; float:left;" />

<img src="images/CAIDASlogo.png" style="width:20%; float:left;" />

In [17]:
import requests
from bs4 import BeautifulSoup
import lxml
import pandas as pd

## Exercise 1: Random Cat Facts
Connect to the [catfact API](https://catfact.ninja/fact) and retrieve one random fact about cats.

In [3]:
url = 'https://catfact.ninja/fact'

In [4]:
r = requests.get(url)

# print a status update for the requests command
print(f"Status code: {r.status_code}")

# store API response to variable
response_dict = r.json()

Status code: 200


In [5]:
response_dict

{'fact': 'A cat sees about 6 times better than a human at night, and needs 1/6 the amount of of light that a human does - it has a layer of extra reflecting cells which absorb light.',
 'length': 172}
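
The dictionary returned by `r.json()` is ordinary Python data, so individual fields can be read by key. The following is a minimal offline sketch: `sample_text` is a hand-written stand-in with the same two keys (`fact`, `length`) as the response above, not real API output.

```python
import json

# Hypothetical stand-in for r.text; it only mimics the keys of the catfact response.
sample_text = '{"fact": "Cats sleep a lot.", "length": 17}'

# Parse the JSON body into a dict, which is essentially what r.json() does.
sample_dict = json.loads(sample_text)

# Access the fact text by key.
fact = sample_dict["fact"]
print(fact)

# In this stand-in, "length" is the character count of the fact text.
print(len(fact) == sample_dict["length"])
```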

## Exercise 2: Fetching Random User Data
In this exercise, you’ll use the Random User Generator API to retrieve and display information about randomly generated users. This API is free and doesn’t require an API key.

Write a Python function to make a GET request to the Random User Generator API and display details for a few random users.

Make sure to handle status code errors in your function and display the retrieved data in a readable manner.

In [6]:
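def get_random_users(count=1):
    """Return a list of `count` random user records, or None on an HTTP error."""
    url = f"https://randomuser.me/api/?results={count}"
    # A timeout keeps the call from hanging if the server does not answer
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        # The user records are stored under the 'results' key
        return response.json()['results']
    else:
        print("Error:", response.status_code)
        return None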

In [7]:
user_count = 3  # Adjust this to get more or fewer users
random_users = get_random_users(user_count)

if random_users:
    for i, user in enumerate(random_users, start=1):
        name = f"{user['name']['first']} {user['name']['last']}"
        country = user['location']['country']
        email = user['email']
        print(f"User {i}:")
        print(f"Name: {name}")
        print(f"Country: {country}")
        print(f"Email: {email}")
        print("-" * 50)

User 1:
Name: Edward Cooper
Country: New Zealand
Email: edward.cooper@example.com
--------------------------------------------------
User 2:
Name: Espartaco Zamora
Country: Mexico
Email: espartaco.zamora@example.com
--------------------------------------------------
User 3:
Name: Erique Barros
Country: Brazil
Email: erique.barros@example.com
--------------------------------------------------
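
As an optional variation on the display loop above, the formatting can be moved into a small helper that tolerates missing fields. This is only a sketch: `format_user` is a hypothetical helper name, and `sample_user` is a hand-built record trimmed to the fields used in this exercise (values taken from the first user printed above), so it runs without a network call.

```python
def format_user(user, index):
    """Return a readable multi-line summary of one user record."""
    # .get() with defaults avoids a KeyError if a field is missing.
    name = user.get("name", {})
    full_name = f"{name.get('first', '?')} {name.get('last', '?')}"
    country = user.get("location", {}).get("country", "unknown")
    email = user.get("email", "unknown")
    return "\n".join([
        f"User {index}:",
        f"Name: {full_name}",
        f"Country: {country}",
        f"Email: {email}",
    ])


# Hand-built record shaped like one entry of response.json()['results'],
# reduced to the three fields that are displayed.
sample_user = {
    "name": {"first": "Edward", "last": "Cooper"},
    "location": {"country": "New Zealand"},
    "email": "edward.cooper@example.com",
}

print(format_user(sample_user, 1))
```

Returning a string instead of printing directly makes the helper easy to check and to reuse, for example when writing the summaries to a file.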


## Exercise 3: Scraping Book Information

Scrape the title and price of all books on http://books.toscrape.com/ and export them to a CSV file.

In [12]:
url = 'http://books.toscrape.com/'
response = requests.get(url)
# requests guesses the wrong encoding for this page (the pound sign would
# show up as 'Â£'); the page is UTF-8, so set the encoding explicitly
response.encoding = 'utf-8'
soup = BeautifulSoup(response.text, 'lxml')

In [15]:
# Every book on the page sits in an <article class="product_pod"> element
books = soup.find_all('article', class_='product_pod')

book_data = []
for book in books:
    # The full title is stored in the 'title' attribute of the link
    title = book.h3.a['title']
    price = book.find('p', class_='price_color').text

    book_data.append({
        'Title': title,
        'Price': price
    })

# Display the first few books
for idx, book in enumerate(book_data[:5], 1):
    print(f"{idx}. {book['Title']} - {book['Price']}")

1. A Light in the Attic - £51.77
2. Tipping the Velvet - £53.74
3. Soumission - £50.10
4. Sharp Objects - £47.82
5. Sapiens: A Brief History of Humankind - £54.23


In [18]:
df = pd.DataFrame(book_data)
df.to_csv('books_data.csv', index=False)
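
The prices above are exported as text, including the currency symbol. If numeric prices are wanted in the CSV file, one option is to extract the number first. The sketch below is an optional extra, not part of the assignment; `parse_price` is a hypothetical helper name, and the sample strings are prices from the output above.

```python
import re


def parse_price(price_text):
    """Extract the numeric part of a price string as a float."""
    # Match the first run of digits, optionally followed by a decimal part.
    match = re.search(r"\d+(?:\.\d+)?", price_text)
    if match is None:
        raise ValueError(f"No number found in {price_text!r}")
    return float(match.group())


# Works on cleanly decoded prices and on strings with stray leading characters.
prices = ["£51.77", "£53.74", "£50.10"]
numeric_prices = [parse_price(p) for p in prices]
print(numeric_prices)
```

With the DataFrame from the cell above, the same helper could be applied with `df['Price'].map(parse_price)` before calling `to_csv`.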

## Exercise 4: Scraping Nobel Peace Prize Winners

Go to https://en.wikipedia.org/wiki/List_of_Nobel_laureates and inspect the HTML code. The task is to extract the name(s) of the Nobel Peace Prize winner(s) of a given year.
Write a function `find_peace_winners(target_year)` that returns all names.

Hint: Work from top to bottom. First, locate the table that you want to search. Then go through its rows to find the correct year (in the table, the year is a string, so compare strings), pick the correct column, and extract the text of the cell.

In [8]:
# URL of the Wikipedia page
url = "https://en.wikipedia.org/wiki/List_of_Nobel_laureates"

# Send a GET request to the page
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
table = soup.find('table', {'class': 'wikitable'})

In [9]:
def find_peace_winners(target_year):
    """Return the Peace Prize winners of target_year as a list of names."""
    # Iterate over the table rows, skipping the header row
    for row in table.find_all('tr')[1:]:
        # Extract the year for this row; skip rows without a header cell
        year_cell = row.find('th')
        if year_cell is None:
            continue
        row_year = year_cell.get_text(strip=True)

        # If the row's year matches the target year, find the names
        if row_year == str(target_year):
            # Get the data cells in this row
            columns = row.find_all('td')

            # The Peace column is the fifth data cell (index 4)
            if len(columns) > 4:
                winner_names = columns[4].get_text(strip=True).split(';')
                return winner_names

    # No matching row was found
    return []

In [10]:
find_peace_winners('1978')

['Anwar Sadat', 'Menachem Begin']
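
The two string steps mentioned in the hint, comparing the year as text and splitting the cell text into names, can be tried out without touching the web page. The sketch below assumes that names in a cell are separated by `;`, as in the solution above; `year_matches` and `split_winner_names` are hypothetical helper names introduced only for illustration.

```python
def year_matches(row_year_text, target_year):
    """Compare the year text of a table row with the requested year."""
    # Convert the target to a string so that 1978 and '1978' both work.
    return row_year_text.strip() == str(target_year)


def split_winner_names(cell_text, separator=";"):
    """Split the text of a table cell into a list of cleaned names."""
    names = [part.strip() for part in cell_text.split(separator)]
    # Drop empty entries, e.g. from a trailing separator.
    return [name for name in names if name]


print(year_matches("1978", 1978))
print(split_winner_names("Anwar Sadat;Menachem Begin"))
```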