# Day 18: Web Scraping with Python

This folder demonstrates how to collect data from websites with Python by downloading pages, parsing their HTML, and saving the extracted fields.

## Key Concepts Covered

- Making HTTP Requests:
  - Using the requests library to download web pages.
- Parsing HTML:
  - Using BeautifulSoup and lxml to parse and extract information from HTML content.
- Extracting Data:
  - Finding and extracting quotes, authors, and tags from a sample website (quotes.toscrape.com).
- Storing Data:
  - Saving the scraped data into a pandas DataFrame and exporting it to a CSV file.

## Libraries Used

- requests: For downloading web pages.
- BeautifulSoup (bs4): For parsing HTML.
- lxml: For fast HTML parsing.
- pandas: For storing and saving the scraped data.

## Why This Is Important

Web scraping lets you collect data from websites that do not expose it through an API, which makes it a valuable skill for data collection and research.

This folder is a beginner-friendly introduction to web scraping in Python!

In [113]:
import pandas as pd
import requests
from io import StringIO
from bs4 import BeautifulSoup
import lxml  # imported only to confirm that lxml is installed
print("lxml is working!")

lxml is working!


In [114]:
import bs4
import lxml
import sys

print("Python executable:", sys.executable)
print("bs4 location:", bs4.__file__)
print("lxml location:", lxml.__file__)


Python executable: C:\Program Files\Python312\python.exe
bs4 location: C:\Users\USMAN-PC\AppData\Roaming\Python\Python312\site-packages\bs4\__init__.py
lxml location: C:\Users\USMAN-PC\AppData\Roaming\Python\Python312\site-packages\lxml\__init__.py
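
The check above shows where each package is installed. As a complementary sketch, the standard-library `importlib.metadata` module can report which versions are installed. The helper name `installed_versions` is hypothetical, and the package list simply mirrors the "Libraries Used" section.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as published on PyPI (bs4 is distributed as "beautifulsoup4").
versions = installed_versions(["requests", "beautifulsoup4", "lxml", "pandas"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```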


In [115]:
# Send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'}
url = "http://quotes.toscrape.com"
webpage = requests.get(url, headers=headers)
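
The request above has no timeout and does not check the status code. A slightly more defensive version might look like the sketch below. The function name `fetch_html`, the 10-second default, and the optional `session` parameter are assumptions made for illustration, not part of the original notebook.

```python
import requests


def fetch_html(url, headers=None, timeout=10, session=None):
    """Download a page and return its HTML, raising on HTTP error statuses."""
    # Use the supplied session if there is one, otherwise the requests module itself.
    http = session if session is not None else requests
    response = http.get(url, headers=headers, timeout=timeout)
    # Raises requests.exceptions.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.text
```

With network access, `fetch_html(url, headers=headers)` could stand in for the `requests.get` call, returning the page text directly.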

In [116]:
webpage

<Response [200]>

In [117]:
soup = BeautifulSoup(webpage.text, "html.parser")
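
The cell above uses Python's built-in `html.parser`. Since lxml is listed as the fast parser, here is a minimal sketch of selecting it by name and falling back to the built-in parser when lxml is not installed. The helper `make_soup` and the inline `SAMPLE_HTML` are hypothetical stand-ins so the example runs without a network request.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Small inline page used instead of a live download (illustrative only).
SAMPLE_HTML = "<html><head><title>Quotes to Scrape</title></head><body></body></html>"


def make_soup(html):
    """Parse with lxml when it is installed, otherwise use the built-in parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(html, "html.parser")


soup_demo = make_soup(SAMPLE_HTML)
print(soup_demo.title.text)
```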


In [118]:
print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Quotes to Scrape
  </title>
  <link href="/static/bootstrap.min.css" rel="stylesheet"/>
  <link href="/static/main.css" rel="stylesheet"/>
 </head>
 <body>
  <div class="container">
   <div class="row header-box">
    <div class="col-md-8">
     <h1>
      <a href="/" style="text-decoration: none">
       Quotes to Scrape
      </a>
     </h1>
    </div>
    <div class="col-md-4">
     <p>
      <a href="/login">
       Login
      </a>
     </p>
    </div>
   </div>
   <div class="row">
    <div class="col-md-8">
     <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
      <span class="text" itemprop="text">
       “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
      </span>
      <span>
       by
       <small class="author" itemprop="author">
        Albert Einstein
       </small>
       <a href="/author/Albert

In [119]:
# Print the text of every <small> element (the author names on this page)
for tag in soup.find_all("small"):
    print(tag.text.strip())

Albert Einstein
J.K. Rowling
Albert Einstein
Jane Austen
Marilyn Monroe
Albert Einstein
André Gide
Thomas A. Edison
Eleanor Roosevelt
Steve Martin
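
Searching for every `<small>` element works here because the page uses that tag only for author names. On a page that uses `<small>` for other things too, a CSS selector can narrow the match to the `author` class. The sketch below uses a made-up HTML snippet (the `note` element is an assumption added to show the difference).

```python
from bs4 import BeautifulSoup

# Made-up markup: two quotes, plus one <small> that is not an author.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">"First quote."</span>
  <span>by <small class="author">Author One</small></span>
  <small class="note">not an author</small>
</div>
<div class="quote">
  <span class="text">"Second quote."</span>
  <span>by <small class="author">Author Two</small></span>
</div>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all("small") matches every <small>, whatever its class.
all_small = [tag.text.strip() for tag in sample_soup.find_all("small")]

# select("small.author") matches only <small class="author"> elements.
authors = [tag.text.strip() for tag in sample_soup.select("small.author")]

print(all_small)
print(authors)
```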


In [121]:
all_quotes = []


In [122]:
# Each quote on the page sits in its own <div class="quote"> block
for quote in soup.find_all("div", class_="quote"):
    text = quote.find("span", class_="text").text.strip()
    author = quote.find("small", class_="author").text.strip()
    tags = [tag.text for tag in quote.find_all("a", class_="tag")]
    # The first <a> inside a quote block links to the author's page
    author_link = quote.find("a")["href"]

    # Store as a dict
    all_quotes.append({
        "quote": text,
        "author": author,
        "tags": ", ".join(tags),
        "author_link": f"http://quotes.toscrape.com{author_link}"
    })
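
The loop above assumes every quote block contains all of the expected elements. One way to make it reusable and a little more forgiving is to wrap it in a function that skips incomplete blocks. This is only a sketch: `parse_quotes`, `BASE_URL`, and the sample markup are hypothetical names and data, and `urljoin` replaces the f-string for building the absolute link.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://quotes.toscrape.com"

# Made-up markup: one complete quote block and one that lacks an author.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">"A sample quote."</span>
  <span>by <small class="author">Sample Author</small>
    <a href="/author/Sample-Author">(about)</a></span>
  <div class="tags">
    <a class="tag" href="/tag/one/">one</a>
    <a class="tag" href="/tag/two/">two</a>
  </div>
</div>
<div class="quote"><span class="text">"No author here."</span></div>
"""


def parse_quotes(html, base_url=BASE_URL):
    """Return one dict per complete quote block found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="quote"):
        text_el = block.find("span", class_="text")
        author_el = block.find("small", class_="author")
        if text_el is None or author_el is None:
            continue  # skip blocks that lack a quote text or an author
        link_el = block.find("a", href=True)
        tags = [t.text.strip() for t in block.find_all("a", class_="tag")]
        rows.append({
            "quote": text_el.text.strip(),
            "author": author_el.text.strip(),
            "tags": ", ".join(tags),
            "author_link": urljoin(base_url, link_el["href"]) if link_el else None,
        })
    return rows


sample_rows = parse_quotes(SAMPLE_HTML)
print(sample_rows)
```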

In [123]:
df = pd.DataFrame(all_quotes)


In [125]:
df.to_csv("quotes.csv", index=False)

df

| | quote | author | tags | author_link |
|---|---|---|---|---|
| 0 | “The world as we have created it is a process ... | Albert Einstein | change, deep-thoughts, thinking, world | http://quotes.toscrape.com/author/Albert-Einstein |
| 1 | “It is our choices, Harry, that show what we t... | J.K. Rowling | abilities, choices | http://quotes.toscrape.com/author/J-K-Rowling |
| 2 | “There are only two ways to live your life. On... | Albert Einstein | inspirational, life, live, miracle, miracles | http://quotes.toscrape.com/author/Albert-Einstein |
| 3 | “The person, be it gentleman or lady, who has ... | Jane Austen | aliteracy, books, classic, humor | http://quotes.toscrape.com/author/Jane-Austen |
| 4 | “Imperfection is beauty, madness is genius and... | Marilyn Monroe | be-yourself, inspirational | http://quotes.toscrape.com/author/Marilyn-Monroe |
| 5 | “Try not to become a man of success. Rather be... | Albert Einstein | adulthood, success, value | http://quotes.toscrape.com/author/Albert-Einstein |
| 6 | “It is better to be hated for what you are tha... | André Gide | life, love | http://quotes.toscrape.com/author/Andre-Gide |
| 7 | “I have not failed. I've just found 10,000 way... | Thomas A. Edison | edison, failure, inspirational, paraphrased | http://quotes.toscrape.com/author/Thomas-A-Edison |
| 8 | “A woman is like a tea bag; you never know how... | Eleanor Roosevelt | misattributed-eleanor-roosevelt | http://quotes.toscrape.com/author/Eleanor-Roos... |
| 9 | “A day without sunshine is like, you know, nig... | Steve Martin | humor, obvious, simile | http://quotes.toscrape.com/author/Steve-Martin |
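
Passing `index=False` to `to_csv` matters when the file is read back later. The sketch below, which uses a small made-up DataFrame and in-memory buffers instead of files on disk, shows how the default setting writes the row index as an extra column that `read_csv` then labels `Unnamed: 0`.

```python
from io import StringIO

import pandas as pd

# Made-up rows standing in for the scraped quotes.
sample = pd.DataFrame([
    {"quote": "A sample quote.", "author": "Sample Author"},
    {"quote": "Another sample quote.", "author": "Other Author"},
])

# Default behaviour: the row index is written as an unnamed first column.
with_index = StringIO()
sample.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written.
without_index = StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'quote', 'author']
print(cols_without_index)  # ['quote', 'author']
```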
