# Exploratory Data Analysis on Bob's Bookstore
Name: Seyona Creger

This analysis looks for trends in the books listed at Bob's Bookstore.


## About the data
Bob's Bookstore is a fictional business that sells books to customers through an online platform. The store mostly carries animal-themed books, and information about all of its books is available on the store's website: [https://btech-data-analytics.github.io/bridgerland-technical-college/home.htmlLinks](https://btech-data-analytics.github.io/bridgerland-technical-college/home.htmlLinks)

In [1]:
# pandas builds the final table (DataFrame)
import pandas as pd
# requests fetches the page from the internet
import requests
# BeautifulSoup parses the HTML so the data can be loaded into a pandas DataFrame
from bs4 import BeautifulSoup

## Scraping Bob's Bookstore Website

In [35]:
# Fetch the page once and store its HTML text in the variable 'response'
response = requests.get('https://btech-data-analytics.github.io/bridgerland-technical-college/bookstore.html').text
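A slightly more defensive version of the request above might look like the following sketch. The URL is the one used in the notebook; the helper name `fetch_html` and the 10-second timeout are illustrative assumptions, not part of the original analysis.

```python
import requests

# URL taken from the notebook cell above.
BOOKSTORE_URL = (
    "https://btech-data-analytics.github.io/"
    "bridgerland-technical-college/bookstore.html"
)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page once and return its HTML as text.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    resp = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on a 4xx/5xx status instead of silently
    # parsing an error page.
    resp.raise_for_status()
    return resp.text
```

With this helper, the cell above would become `response = fetch_html(BOOKSTORE_URL)`, and a failed download would stop the notebook before any parsing happens.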

In [41]:
# Parse the HTML with BeautifulSoup using Python's built-in 'html.parser'
soup = BeautifulSoup(response, 'html.parser')

In [42]:
# This code looks for a table and then searches for a tr tag with the class of book. The first one is selected.
soup.find('table').find_all('tr', class_='book')[0]

<tr class="book">
<td>978-1234567890</td>
<td>Whiskers of Wisdom: Tales from Feline Philosophers</td>
<td>Penelope Wainwright</td>
<td>English</td>
<td>256</td>
<td>Cats</td>
<td>$19.99</td>
<td><button>Buy now</button></td>
</tr>

In [43]:
# Empty lists that will hold the data for each column
ISBN = []
Title = []
Author = []
Language = []
Pages = []
Topic = []
Price = []
# Loop over every book row and append its data to the lists above
for book in soup.find('table').find_all('tr', class_='book'):
    # Look up the row's cells once instead of once per column
    cells = book.find_all('td')
    ISBN.append(cells[0].text)
    Title.append(cells[1].text)
    Author.append(cells[2].text)
    Language.append(cells[3].text)
    Pages.append(cells[4].text)
    Topic.append(cells[5].text)
    Price.append(cells[6].text)
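As an alternative to keeping seven parallel lists, each row can be collected as a dictionary. The sketch below runs on the single row printed earlier in the notebook, wrapped in a `<table>` so it is self-contained; the names `sample_html`, `COLUMNS`, `rows`, and `sample_df` are illustrative and not part of the original notebook.

```python
from bs4 import BeautifulSoup
import pandas as pd

# One row copied from the notebook output, wrapped in a table element.
sample_html = """
<table>
<tr class="book">
<td>978-1234567890</td>
<td>Whiskers of Wisdom: Tales from Feline Philosophers</td>
<td>Penelope Wainwright</td>
<td>English</td>
<td>256</td>
<td>Cats</td>
<td>$19.99</td>
<td><button>Buy now</button></td>
</tr>
</table>
"""

COLUMNS = ["ISBN", "Title", "Author", "Language", "Pages", "Topic", "Price"]

soup = BeautifulSoup(sample_html, "html.parser")
rows = []
for book in soup.find("table").find_all("tr", class_="book"):
    # Read the cells once per row; get_text(strip=True) trims whitespace.
    cells = [td.get_text(strip=True) for td in book.find_all("td")]
    # zip stops at the shorter input, so the trailing "Buy now" cell is dropped.
    rows.append(dict(zip(COLUMNS, cells)))

sample_df = pd.DataFrame(rows, columns=COLUMNS)
print(sample_df.iloc[0].to_dict())
```

Building the frame from a list of dictionaries keeps each book's values together, so a malformed row cannot shift one column out of step with the others.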

In [45]:
print(ISBN, Title, Author, Language, Pages, Topic, Price)

['978-1234567890', '978-2345678901', '978-3456789012', '978-4567890123', '978-5678901234', '978-6789012345', '978-7890123456', '978-8901234567', '978-9012345678', '978-0123456789', '978-1234567890', '978-2345678901', '978-3456789012', '978-4567890123', '978-5678901234'] ['Whiskers of Wisdom: Tales from Feline Philosophers', "Purrfectly Pawesome: A Cat's Life", 'Cat Tales: Adventures in Whiskerland', 'The Enigmatic Paws: Mysteries of Meowville', 'Cats in Wonderland', 'Whisker Wisdom: Life Lessons from Feline Sages', 'Catnip Chronicles: A Purrfect Journey', 'Cat-astrophe: Tales of Misadventures', "The Cat's Whisker: A Feline Fantasy", 'Fur and Friendship: Stories of Feline Companions', 'Tails of Loyalty: Canine Chronicles', "Pawsitive Adventures: A Dog's Journey", 'Barking Wisdom: Lessons from Wise Canines', 'Dogged Determination: Stories of Resilient Pooches', 'The Bark Brigade: Canine Heroes Among Us'] ['Penelope Wainwright', 'Jasper Sterling', 'Penelope Wainwright', 'Maximilian Thorne

In [46]:
# Putting this all into a pandas data frame
df = pd.DataFrame({
    'ISBN': ISBN,
    'Title': Title,
    'Author': Author,
    'Language': Language,
    'Pages': Pages,
    'Topic': Topic,
    'Price': Price
})
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   ISBN      15 non-null     object
 1   Title     15 non-null     object
 2   Author    15 non-null     object
 3   Language  15 non-null     object
 4   Pages     15 non-null     object
 5   Topic     15 non-null     object
 6   Price     15 non-null     object
dtypes: object(7)
memory usage: 972.0+ bytes


In [None]:
# Convert prices to float; regex=False treats '$' as a literal character
df['Price'] = df['Price'].str.replace('$', '', regex=False).astype(float)

In [65]:
# Convert page counts to int
df['Pages'] = df['Pages'].astype(int)
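The two conversions can be checked on a small sample before running them on the full table. The sketch below uses the page counts and prices of the first three books; the dollar-prefixed string form is assumed from the first scraped row, and the names `raw` and `clean` are illustrative.

```python
import pandas as pd

# Raw strings as a scraper would return them (first three books).
raw = pd.DataFrame({
    "Pages": ["256", "192", "320"],
    "Price": ["$19.99", "$15.99", "$21.99"],
})

clean = raw.copy()
# regex=False treats "$" as a literal character rather than the
# regular-expression end-of-string anchor.
clean["Price"] = clean["Price"].str.replace("$", "", regex=False).astype(float)
# pd.to_numeric raises on a non-numeric page count (errors="raise" is the default).
clean["Pages"] = pd.to_numeric(clean["Pages"], errors="raise").astype(int)

print(clean.dtypes)
```

Passing `regex=False` explicitly makes the intent clear regardless of the pandas version, since the default for `regex` changed in pandas 2.0.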

In [57]:
df.head()

|   | ISBN | Title | Author | Language | Pages | Topic | Price |
|---|------|-------|--------|----------|-------|-------|-------|
| 0 | 978-1234567890 | Whiskers of Wisdom: Tales from Feline Philosop... | Penelope Wainwright | English | 256 | Cats | 19.99 |
| 1 | 978-2345678901 | Purrfectly Pawesome: A Cat's Life | Jasper Sterling | English | 192 | Cats | 15.99 |
| 2 | 978-3456789012 | Cat Tales: Adventures in Whiskerland | Penelope Wainwright | English | 320 | Cats | 21.99 |
| 3 | 978-4567890123 | The Enigmatic Paws: Mysteries of Meowville | Maximilian Thorne | English | 288 | Cats | 17.99 |
| 4 | 978-5678901234 | Cats in Wonderland | Isadora Harrington | English | 224 | Cats | 16.99 |


## Analysis
The analysis answers the following questions:

* Which author has the most books listed at Bob's Bookstore?
* Which is the most popular topic among books at Bob's Bookstore?
* Which topic of books is the most expensive, on average?
* Which topic of books has the most pages, on average?

### Which author has the most books listed at Bob's Bookstore?

Penelope Wainwright has the most books listed at Bob's Bookstore, with four. Jasper Sterling and Benjamin Barkley have two each, and the remaining seven authors have one book apiece.

In [67]:
df['Author'].value_counts()

Author
Penelope Wainwright    4
Jasper Sterling        2
Benjamin Barkley       2
Maximilian Thorne      1
Celeste Nightshade     1
Isadora Harrington     1
Seraphina Montague     1
Sophie Shepherd        1
Oliver Obedience       1
Ruby Ruffington        1
Name: count, dtype: int64
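The leading author and the runners-up can also be pulled out of the counts programmatically rather than read off the table. This is a minimal sketch that rebuilds the counts from the output above; the variable names are illustrative.

```python
import pandas as pd

# Counts copied from the value_counts() output.
author_counts = pd.Series({
    "Penelope Wainwright": 4,
    "Jasper Sterling": 2,
    "Benjamin Barkley": 2,
    "Maximilian Thorne": 1,
    "Celeste Nightshade": 1,
    "Isadora Harrington": 1,
    "Seraphina Montague": 1,
    "Sophie Shepherd": 1,
    "Oliver Obedience": 1,
    "Ruby Ruffington": 1,
})

# Author with the highest count and that author's share of all listings.
top_author = author_counts.idxmax()
top_share = author_counts.max() / author_counts.sum()

# Everyone tied for second place.
second_best = author_counts.drop(top_author).max()
runners_up = author_counts[author_counts == second_best].index.tolist()

print(top_author, round(top_share, 3), runners_up)
```

On the real data, `df['Author'].value_counts()` would take the place of the hand-built series.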

### Which is the most popular topic among books at Bob's Bookstore?
Judging from the featured page on Bob's website, I guessed that cats would be the most popular topic.

The data confirms that guess: 'Cats' is the leading topic on the site with 10 books, twice the 5 listed under 'Dogs'.

In [52]:
df['Topic'].value_counts()

Topic
Cats    10
Dogs     5
Name: count, dtype: int64
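To express the same result as a share of the catalogue, `value_counts` accepts `normalize=True`. The sketch below rebuilds the topic column from the counts shown above (10 Cats, 5 Dogs); the names `topics` and `shares` are illustrative.

```python
import pandas as pd

# Topic column rebuilt from the counts in the output above.
topics = pd.Series(["Cats"] * 10 + ["Dogs"] * 5, name="Topic")

# normalize=True returns fractions of the total instead of raw counts.
shares = topics.value_counts(normalize=True)

print(shares.round(3))
```

On the real data the equivalent call would be `df['Topic'].value_counts(normalize=True)`.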

### Which topic of books is the most expensive, on average?

On average, books in the 'Dogs' topic are the most expensive. They average \$26.59, compared with \$17.79 for 'Cats', a difference of \$8.80.

In [60]:
df.groupby('Topic')['Price'].mean().sort_values(ascending=False)

Topic
Dogs    26.59
Cats    17.79
Name: Price, dtype: float64
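The size of the price gap follows directly from the two averages. A short worked calculation, using only the means reported in the output above (variable names are illustrative):

```python
# Average prices copied from the groupby output.
mean_price = {"Dogs": 26.59, "Cats": 17.79}

# Absolute gap in dollars and the ratio between the two averages.
gap = mean_price["Dogs"] - mean_price["Cats"]
ratio = mean_price["Dogs"] / mean_price["Cats"]

print(f"gap: ${gap:.2f}, ratio: {ratio:.2f}x")
```

With only 5 Dogs books and 10 Cats books in the table, a single expensive title can move these averages noticeably.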

### Which topic of books has the most pages, on average?

On average, the code below shows that 'Dogs' books also have the most pages: 256, compared with 238.4 for 'Cats'.


In [66]:
df.groupby('Topic')['Pages'].mean().sort_values(ascending=False)

Topic
Dogs    256.0
Cats    238.4
Name: Pages, dtype: float64
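The price and page averages can be computed in a single pass with named aggregation. The sketch below uses only the five rows visible in the `df.head()` output, so it contains Cats books only and its figures differ from the full-table results above; the names `sample` and `summary` are illustrative.

```python
import pandas as pd

# Pages and prices of the five books shown by df.head().
sample = pd.DataFrame({
    "Topic": ["Cats"] * 5,
    "Pages": [256, 192, 320, 288, 224],
    "Price": [19.99, 15.99, 21.99, 17.99, 16.99],
})

# One groupby produces the book count and both averages per topic.
summary = (
    sample.groupby("Topic")
    .agg(
        books=("Price", "size"),
        mean_price=("Price", "mean"),
        mean_pages=("Pages", "mean"),
    )
    .sort_values("mean_price", ascending=False)
)

print(summary)
```

Run against the full `df`, the same call would return one row per topic, with the count alongside each average so small groups are easy to spot.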

## Conclusion
The analysis of books at Bob's Bookstore yields several key findings. The most prolific author on the site is Penelope Wainwright, with four titles. The most common topic is Cats, with 10 of the 15 listed books. Surprisingly, books under the Dogs topic are the most expensive on average (\$26.59 versus \$17.79), which may reflect higher perceived value or niche demand, although listing data alone cannot confirm this. Dogs books also have the highest average page count (256 versus 238.4), which suggests somewhat longer content in that category.

These findings can help inform Bob's business decisions. With Cats identified as the most common topic, Bob's Bookstore can focus marketing efforts on Cats books to attract more customers and increase sales. Because Dogs books command higher prices and have more pages, the store can price them to maximize revenue while highlighting their length to appeal to interested buyers.

To further enhance decision-making and customer satisfaction, future studies could explore:

* **Customer Preferences**: Conducting surveys or analyzing sales data to understand why certain topics or authors resonate more with customers.

* **Market Trends**: Monitoring industry trends and competitor analysis to stay ahead of evolving customer interests and market demands.

* **Content Analysis**: Delving deeper into the themes and narratives within popular topics like Cats and Dogs to uncover underlying reasons for their popularity and potential areas for expansion.

By leveraging these insights and conducting further studies, Bob's Bookstore can continue to refine its inventory selection, marketing strategies, and overall customer experience, ultimately driving growth and success in the competitive bookstore market.