# Choose a Data Set

You can choose to analyze any data that you would like! Remember, you need 1000 rows of non-null data in order to get 5 points for the "Data" criteria of my [rubric](https://docs.google.com/document/d/1s3wllcF3LLnytxwD8mZ-BCypXKnfaahnizWGNojT-B4/edit?usp=sharing). Consider looking at [Kaggle](https://www.kaggle.com/datasets) or [free APIs](https://free-apis.github.io/#/browse) for datasets of this size. Alternatively, you can scrape the web to make your own dataset! :D

Once you have chosen your dataset, please read your data into a dataframe and call `.info()` below. If you don't call `info` I will give you 0 points for the first criteria described on the [rubric](https://docs.google.com/document/d/1s3wllcF3LLnytxwD8mZ-BCypXKnfaahnizWGNojT-B4/edit?usp=sharing).

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import numpy as np
import requests

In [2]:
pages = []
url = 'https://openlibrary.org/trending/yearly'
response = requests.get(url)
soup2 = BeautifulSoup(response.content, "html.parser")
pages.append(soup2)

In [3]:
page = 1
for i in range(page, 10):
    url = 'https://openlibrary.org/trending/yearly?page=' + str(page)
    response = requests.get(url)
    soup2 = BeautifulSoup(response.content, "html.parser")
    pages.append(soup2)

In [4]:
soup2


<!DOCTYPE html>

<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="" name="title"/>
<meta content="free books, books to read, free ebooks, audio books, read books for free, read books online, online library" name="keywords"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<meta content="OpenLibrary.org" name="author">
<meta content="OpenLibrary.org" name="creator">
<meta content="Original content copyright; 2007-2015" name="copyright"/>
<meta content="Global" name="distribution"/>
<meta content="#e2dcc5" name="theme-color"/>
<link href="https://openlibrary.org/trending/yearly?page=2" rel="canonical"/>
<link href="https://athena.archive.org" rel="preconnect"/>
<link href="/static/opensearch.xml" rel="search" title="Open Library" type="application/opensearchdescription+xml"/>
<link href="/static/manifest.json" rel="manifest"/>
<link href="/static/images/openlibrary-128

# My Question

### In our current year,

# My Analysis

In [5]:
book_info = {"Book Title": []}
titles = soup2.find_all('a', class_='results')

In [6]:
for title in titles:
    book_info["Book Title"]. append(title.contents)

In [7]:
book_info

{'Book Title': [['Twisted Lies'],
  ['A Court of Mist and Fury'],
  ['Twisted Games'],
  ['Ikigai: The Japanese Secret to a Long and Happy Life'],
  ["Can't Hurt Me: Master Your Mind and Defy the Odds"],
  ['The Art of Seduction'],
  ['Nineteen Eighty-Four'],
  ['The Lightning Thief'],
  ['Fifty Shades of Grey'],
  ['The Art of War'],
  ['Shatter Me'],
  ['Twisted Hate'],
  ['The Silent Patient'],
  ['Pride and Prejudice'],
  ['The Love Hypothesis'],
  ['The 7 Habits of Highly Effective People'],
  ['Ugly Love'],
  ['Sapiens: A Brief History of Humankind'],
  ["A Good Girl's Guide to Murder"],
  ['The Cruel Prince']]}

In [8]:
books = pd.DataFrame(book_info)

In [9]:
years = []
for element in soup2.find_all("span", class_="resultDetails"):  # Adjust tag and class as needed
    year = element.get_text(strip=True)  # Extract and clean text
    years.append(year)

# Step 3: Store in a DataFrame
df1 = pd.DataFrame({"Publication Year": years})

In [10]:
df1

Unnamed: 0,Publication Year
0,First published in 2022—8 editions
1,First published in 2014—25 editions
2,First published in 2021—10 editions
3,First published in 2017—10 editions
4,First published in 2018—7 editions
5,First published in 1901—20 editions
6,First published in 1949—431 editions
7,First published in 2005—77 editions
8,First published in 2000—138 editions
9,First published in 1900—1431 editions


In [11]:
authors = []
for element in soup2.find_all("span", class_="bookauthor"):  # Adjust tag and class as needed
    author = element.get_text(strip=True)  # Extract and clean text
    authors.append(author)

# Step 3: Store in a DataFrame
df2 = pd.DataFrame({"Author": authors})

In [12]:
df2

Unnamed: 0,Author
0,byAna HuangandMariona Gastó Jiménez
1,bySarah J. Maas
2,byAna Huang
3,byHéctor GarcíaandFrancesc Miralles
4,byDavid Goggins
5,byRobert GreeneandJoost Elffers
6,byGeorge Orwell
7,byRick Riordan
8,byE. L. James
9,bySun Tzu


In [13]:
elements = soup2.find_all("div", class_="details")

# Initialize the list to store numbers
numbers = []

# Loop through all elements to extract numbers
for element in elements:
    text = element.get_text(strip=True)  # Get the text content from each element
    words = text.split()  # Split the text into a list of words
    
    # Loop through each word and check if it's a number
    for word in words:
        if word.isdigit():  # Check if the word is a number
            
            # Check if the number is not 48
            if word != "48":
                numbers.append(word)
            if word != "7":
                numbers.append(word)  # Append the number to the list
            if word != "7":
                numbers.append(word)
            

# Print the list of numbers, excluding 48
print(numbers)

['1761', '1761', '1715', '1715', '1674', '1674', '1516', '1516', '1458', '1458', '1405', '1405', '1319', '1319', '1271', '1271', '1254', '1254', '1246', '1246', '1231', '1231', '1211', '1211', '1195', '1195', '1186', '1186', '1138', '1138', '7', '1131', '1131', '1107', '1107', '1097', '1097', '1048', '1048', '1047', '1047']


In [14]:
df = pd.DataFrame(numbers, columns=["Times Logged In"])

In [15]:
trendingYearly = pd.concat([books, df1, df2, df], axis=1)

In [16]:
trendingYearly

Unnamed: 0,Book Title,Publication Year,Author,Times Logged In
0,[Twisted Lies],First published in 2022—8 editions,byAna HuangandMariona Gastó Jiménez,1761
1,[A Court of Mist and Fury],First published in 2014—25 editions,bySarah J. Maas,1761
2,[Twisted Games],First published in 2021—10 editions,byAna Huang,1715
3,[Ikigai: The Japanese Secret to a Long and Hap...,First published in 2017—10 editions,byHéctor GarcíaandFrancesc Miralles,1715
4,[Can't Hurt Me: Master Your Mind and Defy the ...,First published in 2018—7 editions,byDavid Goggins,1674
5,[The Art of Seduction],First published in 1901—20 editions,byRobert GreeneandJoost Elffers,1674
6,[Nineteen Eighty-Four],First published in 1949—431 editions,byGeorge Orwell,1516
7,[The Lightning Thief],First published in 2005—77 editions,byRick Riordan,1516
8,[Fifty Shades of Grey],First published in 2000—138 editions,byE. L. James,1458
9,[The Art of War],First published in 1900—1431 editions,bySun Tzu,1458


# My Answer

### Write your answer here.