# Top Books of Different Genres

## Picked a website and described my objective

  - Browsed through different sites and picked one to scrape. Checked the "Project Ideas" section for inspiration.


  - Identified the information I wanted to scrape from the site. Decided the format of the output CSV file.


  - Summarized the project idea and outlined the strategy in a Jupyter notebook.

## Used the requests library to download web pages

  - Inspected the website's HTML source and identified the right URLs to download.


  - Downloaded and saved web pages locally using the requests library.


  - Created a function to automate downloading for different topics/search queries.
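
The reusable download function mentioned in the last step might look like the minimal sketch below. The name `download_page`, its `path`, `base_url` and `timeout` parameters, and the use of `urljoin` to resolve relative links are assumptions made for illustration; they are not taken from the cells in this notebook.

```python
from urllib.parse import urljoin

import requests

BASE_URL = 'http://books.toscrape.com/'


def download_page(href, path, base_url=BASE_URL, timeout=10):
    """Download the page at base_url + href, save it to path, and return its HTML."""
    # urljoin resolves relative links such as 'catalogue/category/books_1/index.html'
    url = urljoin(base_url, href)
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of saving an error page
    response.raise_for_status()
    # Write with an explicit encoding so the saved file does not depend on the platform default
    with open(path, 'w', encoding='utf-8') as f:
        f.write(response.text)
    return response.text
```

Calling `download_page('index.html', 'webpage.html')` would then cover the same ground as the individual cells that follow, with the status check done by `raise_for_status()` rather than by inspecting `status_code` by hand.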

In [None]:
!pip install requests --upgrade --quiet

In [None]:
import requests

In [None]:
books_url = 'http://books.toscrape.com'

In [None]:
response = requests.get(books_url)

In [None]:
response.status_code

In [None]:
len(response.text)

In [None]:
page_contents = response.text

In [None]:
page_contents[:1000]

In [None]:
# Save the page locally; an explicit encoding avoids platform-dependent defaults
with open('webpage.html', 'w', encoding='utf-8') as f:
    f.write(page_contents)

## Used Beautiful Soup to parse and extract information

  - Parsed and explored the structure of downloaded web pages using Beautiful Soup.


  - Used the right properties and methods to extract the required information.


  - Created functions to extract from the page into lists and dictionaries.
  

  - (Optional) Used a REST API to acquire additional information if required.
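
As a rough illustration of the "functions to extract into lists and dictionaries" step, the sketch below wraps link extraction in a single function. The name `extract_links` and the `sample_html` snippet are hypothetical; the snippet is hand-written so the example runs without a network connection.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return one {'title', 'url'} dictionary per <a> tag that has an href."""
    doc = BeautifulSoup(html, 'html.parser')
    links = []
    # href=True skips anchors without an href, which would raise KeyError on tag['href']
    for tag in doc.find_all('a', href=True):
        links.append({'title': tag.text.strip(), 'url': tag['href']})
    return links


# Hand-written sample markup (an assumption, not the downloaded page)
sample_html = """
<ul>
  <li><a href="catalogue/category/books/travel_2/index.html">
        Travel
      </a></li>
  <li><a name="no-link">Skipped</a></li>
</ul>
"""

print(extract_links(sample_html))
```

A list of dictionaries like this can be passed straight to `pd.DataFrame`, which is one reason to prefer it over keeping titles and URLs in two parallel lists.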


In [None]:
!pip install beautifulsoup4 --upgrade --quiet

In [None]:
from bs4 import BeautifulSoup

In [None]:
doc = BeautifulSoup(page_contents,'html.parser')

In [None]:
a_tags = doc.find_all('a')

In [None]:
len(a_tags)

In [None]:
a_tags[:50]

In [None]:
topic_url = "http://books.toscrape.com/" + a_tags[0]['href']
print(topic_url)

In [None]:
# Collect the visible text of every link
topic_titles = [tag.text.strip() for tag in a_tags]
print(topic_titles[:50])

In [None]:
# Collect the link target of every <a> tag
topic_urls = [tag['href'] for tag in a_tags]
print(topic_urls[:50])

In [None]:
!pip install pandas --quiet

In [None]:
import pandas as pd

In [None]:
topics_dict = {
    'title': topic_titles[:50],
    'url': topic_urls[:50]
}

In [None]:
topics_df = pd.DataFrame(topics_dict)

In [None]:
topics_df

## Created CSV file(s) with the extracted information

- Created functions for the end-to-end process of downloading, parsing, and saving CSVs.


- Executed the function with different inputs to create a dataset of CSV files.


- Verified the information in the CSV files by reading them back using Pandas.
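
The verification step could be sketched as a small round-trip helper like the one below. The name `save_and_verify` and the sample rows are assumptions for illustration; the helper only checks that column names and row count survive the trip through a CSV file.

```python
import os
import tempfile

import pandas as pd


def save_and_verify(df, path):
    """Write df to a CSV file, read it back, and check that the shape survived."""
    df.to_csv(path, index=False)
    loaded = pd.read_csv(path)
    # Column names and row count should be unchanged by the round trip
    assert list(loaded.columns) == list(df.columns), 'columns changed'
    assert len(loaded) == len(df), 'row count changed'
    return loaded


# Hypothetical sample data standing in for the scraped topics
sample_df = pd.DataFrame({
    'title': ['Travel', 'Mystery'],
    'url': ['travel/index.html', 'mystery/index.html'],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'sample_topics.csv')
    print(save_and_verify(sample_df, csv_path))
```

The check is deliberately limited to shape. By default `pd.read_csv` reads empty strings back as `NaN`, so a cell-by-cell comparison can fail for links that have no visible text.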

In [None]:
topics_df.to_csv('topics.csv', index=False)

## Extracted information from a topic page

In [None]:
topic_page_url = 'http://books.toscrape.com/' + topic_urls[3]

In [None]:
topic_page_url

In [None]:
response = requests.get(topic_page_url)

In [None]:
response.status_code

In [None]:
len(response.text)

In [None]:
topic_doc = BeautifulSoup(response.text, 'html.parser')

In [None]:
book_div = topic_doc.find_all('li', class_='col-xs-6 col-sm-4 col-md-3 col-lg-3')
print(book_div)

In [None]:
len(book_div)

In [None]:
book_title = []
for container in book_div:
    name = container.article.h3.a['title']
    book_title.append(name)

In [None]:
book_title

In [None]:
product_price = []
for container in book_div:
    price = container.article.find('div', class_='product_price').p.text
    product_price.append(price)

In [None]:
product_price

In [None]:
star_rating = []
for container in book_div:
    # The last entry in the <p> tag's class list is taken as the rating
    rating = container.article.p['class'][-1]
    star_rating.append(rating)

In [None]:
star_rating

In [None]:
import pandas as pd

In [None]:
col_dict = {'title':book_title, 
            'price':product_price, 
            'rating':star_rating}

In [None]:
book_store = pd.DataFrame(col_dict)

In [None]:
book_store

In [None]:
# index=False keeps the row index out of the file, matching topics.csv
book_store.to_csv('book_store.csv', index=False)
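
The three extraction loops in this section could be folded into one function, roughly as sketched below. The name `parse_books` is hypothetical, and `sample_html` is hand-written markup that only imitates the structure the loops rely on; it is not a copy of the real page.

```python
from bs4 import BeautifulSoup

# Same class string as the find_all call used earlier in this section
BOOK_CLASS = 'col-xs-6 col-sm-4 col-md-3 col-lg-3'


def parse_books(html):
    """Return one {'title', 'price', 'rating'} dictionary per book container."""
    doc = BeautifulSoup(html, 'html.parser')
    books = []
    for container in doc.find_all('li', class_=BOOK_CLASS):
        article = container.article
        books.append({
            'title': article.h3.a['title'],
            'price': article.find('div', class_='product_price').p.text,
            # Last class name on the first <p> inside the article
            'rating': article.p['class'][-1],
        })
    return books


# Hand-written sample markup (an assumption made so the sketch runs offline)
sample_html = """
<ol>
  <li class="col-xs-6 col-sm-4 col-md-3 col-lg-3">
    <article class="product_pod">
      <p class="star-rating Three"></p>
      <h3><a href="#" title="Sample Book One">Sample Book ...</a></h3>
      <div class="product_price"><p class="price_color">£10.00</p></div>
    </article>
  </li>
</ol>
"""

print(parse_books(sample_html))
```

Reading all three fields in a single pass keeps each book's values together, so the title, price, and rating columns cannot drift out of alignment the way three separate lists could.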