# Choose a Data Set

Create your own dataset by scraping one of the following websites *(level 5)*:
- [Wikipedia](https://www.wikipedia.org/)
- [OpenLibrary](https://openlibrary.org/)

**OR** 

Use data gathered from one of the following APIs *(level 4)*: 
- [TMDB](https://developer.themoviedb.org/reference/intro/getting-started)
- [College Scorecard](https://collegescorecard.ed.gov/data/api-documentation/)

**OR** 

Pick a JSON dataset *(level 3)*:
- [Food/Restaurant Data](https://drive.google.com/drive/folders/1V94S6WpclvQmbnW88KVMD4EruryA1oma?usp=drive_link)
- [Fashion Data](https://drive.google.com/drive/folders/1V8SbFjtRRW8WVf3xBzg0gzLjOtMhHea_?usp=drive_link)

**OR** 

Pick a CSV dataset *(level 2)*:
- [LA Parking Tickets](https://drive.google.com/drive/folders/1vaOfwMi6QmZEGsXr8VM0ulPGzvTTBCgm?usp=drive_link)
- [Hotels](https://drive.google.com/drive/folders/1IpVFxgwBJvJHKoOuBsk6WK2qYqFYP4hi?usp=drive_link)

# My Question
### What is the probability that a book on the OpenLibrary trending page has the letter "z" in its title?

# My Answer

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [2]:
books = {'Title':[], 'Author':[], 'Year Published':[], 'Number of Logs':[]}

for i in range(1, 11):
    url = f'https://openlibrary.org/trending/forever?page={i}'
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    
    trends = soup.find_all('div', class_="sri__main")
    
    for book in trends:
        Title = book.find('div', class_='resultTitle').text.strip()
        Author = book.find('span', class_='bookauthor').text.strip()
        
        detailed = book.find('span', class_='resultStats')
        pubdate = detailed.find('span', class_='resultDetails').text.strip()
        
        pubdate_cleaned = pubdate.replace("—", "").replace(" editions", "").strip()

        logged_text = None
        # `string=` replaces the deprecated `text=` argument in BeautifulSoup
        log_info = book.find(string="Logged")
        if log_info:
            logged_text = log_info.strip()
            
        books['Title'].append(Title)
        books['Author'].append(Author)
        books['Year Published'].append(pubdate_cleaned)
        books['Number of Logs'].append(logged_text)

df = pd.DataFrame(books)

books_with_z = df[df['Title'].str.contains('z', case=False, na=False)]

probability = len(books_with_z) / len(df)

print(f"{probability:.4f}")


0.0201


<p style="color: #635b9b; text-align: center;">There is about a 2% chance (a probability of 0.0201) that a book on the trending page has a "z" in its title.</p>
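
The arithmetic behind this kind of answer can be checked on a small scale. The sketch below is only illustrative: `share_with_letter` is a hypothetical helper and the titles are made-up samples, not rows from the scraped data. It counts the titles that contain the letter, divides by the total, and shows that a proportion is converted to a percentage by multiplying by 100, so a proportion such as 0.0201 reads as 2.01%.

```python
def share_with_letter(titles, letter="z"):
    """Return the fraction of titles that contain `letter`, ignoring case."""
    if not titles:
        raise ValueError("titles must not be empty")
    # Each True counts as 1, so the sum is the number of matching titles
    hits = sum(letter.lower() in title.lower() for title in titles)
    return hits / len(titles)


# Hypothetical sample titles (assumption: not taken from the scraped results)
sample_titles = ["The Wizard of Oz", "Dune", "Emma", "Zorba the Greek", "Ulysses"]

proportion = share_with_letter(sample_titles)

# The `%` format spec multiplies by 100 before printing
print(f"{proportion:.4f} = {proportion:.2%}")  # prints "0.4000 = 40.00%"
print(f"{0.0201:.4f} = {0.0201:.2%}")          # prints "0.0201 = 2.01%"
```

Matching case-insensitively mirrors the `case=False` option used in the notebook's `str.contains` call, so a title starting with a capital "Z" is counted as well.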