# Web Scraping & Data Analysis Project
## Scraping Quotes and Analyzing Author Data

**Project Overview:**
In this project, you will:
1. Scrape quotes and author information from https://quotes.toscrape.com/
2. Clean and organize the data
3. Perform exploratory data analysis
4. Create visualizations
5. Answer analytical questions

**Learning Objectives:**
- Web scraping with BeautifulSoup
- Data cleaning and manipulation with Pandas
- Data visualization with Matplotlib/Seaborn
- Basic statistical analysis

## Solution

In [2]:
# Import the required libraries.
# If a library is missing, install it from your terminal or command prompt:
## pip install requests beautifulsoup4 pandas matplotlib seaborn


import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import re

In [3]:
# Function to scrape quotes from a single page
def scrape_quotes_page(url):
    """
    Scrape quotes, authors, and tags from a given URL
    
    Parameters:
    url (str): URL to scrape
    
    Returns:
    list: List of dictionaries containing quote data
    """
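    # Fetch the page; fail fast on network stalls or HTTP errors
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')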
    
    quotes_data = []
    
    # Find all quote containers
    quotes = soup.find_all('div', class_='quote')
    
    for quote in quotes:
        # Extract quote text
        text = quote.find('span', class_='text').text.strip()
        # Remove the surrounding quotation marks (the site wraps each quote in curly “ ”)
        text = text.strip('“”"')
        
        # Extract author
        author = quote.find('small', class_='author').text
        
        # Extract tags
        tags = [tag.text for tag in quote.find_all('a', class_='tag')]
        
        # Compute quote length (in characters) and word count
        quote_length = len(text)
        word_count = len(text.split())
        
        quotes_data.append({
            'quote': text,
            'author': author,
            'tags': ', '.join(tags),
            'num_tags': len(tags),
            'quote_length': quote_length,
            'word_count': word_count
        })
    
    return quotes_data
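
The function above handles one page at a time. A minimal sketch of looping over several pages follows. It assumes the site serves pages at `page/1/`, `page/2/`, and so on, and that a page past the end contains no quote containers; both are assumptions about the site, not guarantees. The helper name `scrape_all_pages` and its parameters are hypothetical.

```python
import time

BASE_URL = "https://quotes.toscrape.com/"


def scrape_all_pages(scrape_page, base_url=BASE_URL, max_pages=50, delay=0.5):
    """Call scrape_page on page/1/, page/2/, ... until a page yields no rows.

    scrape_page: any callable that takes a URL and returns a list of dicts.
    max_pages:   safety cap so the loop always terminates.
    delay:       pause between requests, in seconds.
    """
    all_quotes = []
    for page in range(1, max_pages + 1):
        rows = scrape_page(f"{base_url}page/{page}/")
        if not rows:  # assumed signal that we ran past the last page
            break
        all_quotes.extend(rows)
        time.sleep(delay)  # be polite to the server
    return all_quotes
```

With the single-page function defined above, the call would be `all_quotes = scrape_all_pages(scrape_quotes_page)`.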

## Q1a. Create a DataFrame from the scraped data and display it
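
One possible approach, sketched on hypothetical sample rows rather than real scraped output: a list of dictionaries with the same keys that `scrape_quotes_page` returns can be passed straight to `pd.DataFrame`. The helper `make_row` exists only to build the illustration data.

```python
import pandas as pd


def make_row(quote, author, tags):
    """Build one hypothetical row with the same keys as the scraper output."""
    return {
        "quote": quote,
        "author": author,
        "tags": ", ".join(tags),
        "num_tags": len(tags),
        "quote_length": len(quote),
        "word_count": len(quote.split()),
    }


# Made-up rows for illustration only; replace with the scraped list
sample_rows = [
    make_row("First sample quote.", "Author A", ["life", "love"]),
    make_row("A second, slightly longer sample quote.", "Author B", ["humor"]),
    make_row("Third one.", "Author A", []),
]

df = pd.DataFrame(sample_rows)
print(df.head())  # in a notebook, a bare `df` on the last line also displays it
```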

## Q1b. What is the shape of this data?
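
A short sketch of reading the shape, using a small made-up frame in place of the scraped data. `DataFrame.shape` is a `(rows, columns)` tuple, so it can be unpacked directly.

```python
import pandas as pd

# Made-up stand-in for the scraped DataFrame
df = pd.DataFrame({"quote": ["q1", "q2", "q3"], "author": ["A", "B", "A"]})

n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")
```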

## Q2. Data Cleaning: Check for and handle any issues in the data
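
A minimal cleaning sketch on deliberately messy, made-up data. It checks for missing values and duplicates, strips stray whitespace, then drops incomplete and repeated rows. Which issues actually occur in the scraped data is not known here, so treat the steps as a checklist rather than a required recipe.

```python
import pandas as pd

# Made-up rows with typical problems: padding, a duplicate, a missing author
raw = pd.DataFrame({
    "quote": ["  Stay curious. ", "Stay curious.", "Keep going."],
    "author": ["Author A", "Author A", None],
    "tags": ["life", "life", ""],
})

# 1. Inspect: missing values per column and exact duplicate rows
print(raw.isna().sum())
print("duplicate rows:", raw.duplicated().sum())

# 2. Normalize whitespace in the text columns
clean = raw.copy()
for col in ["quote", "author", "tags"]:
    clean[col] = clean[col].str.strip()

# 3. Drop rows without an author, then repeated (quote, author) pairs
clean = (
    clean.dropna(subset=["author"])
    .drop_duplicates(subset=["quote", "author"])
    .reset_index(drop=True)
)
print(clean)
```

Stripping whitespace before `drop_duplicates` matters: the first two rows only become identical once the padding is removed.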

## Q3. Find the top 5 authors with the most quotes
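
One way to approach this, shown on a made-up `author` column: `value_counts` counts the rows per author and sorts them in descending order, so `head(5)` keeps the five most frequent.

```python
import pandas as pd

# Made-up author column standing in for the scraped data
df = pd.DataFrame({"author": ["A", "B", "A", "C", "A", "B"]})

top_authors = df["author"].value_counts().head(5)
print(top_authors)
```

If several authors tie at the cutoff, `head(5)` keeps only the first five it encounters; `value_counts().nlargest(5, keep="all")` is an alternative that retains all tied authors.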

## Q4. Create a scatter plot showing the relationship between word count and quote length
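
A hedged sketch of the plot using Matplotlib directly, with made-up numbers in place of the scraped columns. The axis labels and figure size are illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for the scraped columns
df = pd.DataFrame({
    "word_count": [3, 8, 15, 22],
    "quote_length": [19, 45, 88, 130],
})

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(df["word_count"], df["quote_length"], alpha=0.6)
ax.set_xlabel("Word count")
ax.set_ylabel("Quote length (characters)")
ax.set_title("Word count vs. quote length")
fig.tight_layout()
# In a notebook the figure renders at the end of the cell;
# in a script, call plt.show() to open it.
```

Because longer quotes generally contain more words, a roughly linear trend is the expected pattern; `df["word_count"].corr(df["quote_length"])` gives a number to go with the picture.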

## Q5a. Analyze the most common tags. Extract all tags and count them
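
Since the scraper stores tags as a single comma-separated string per quote, counting them means splitting each string first. A small sketch with made-up tag strings, including an empty one for a quote without tags:

```python
from collections import Counter

# Made-up values in the same format as the 'tags' column
tag_strings = ["life, love", "humor", "", "life, humor, books"]

# Split each string on commas, trim whitespace, and skip empty entries
all_tags = [
    tag.strip()
    for tags in tag_strings
    for tag in tags.split(",")
    if tag.strip()
]

tag_counts = Counter(all_tags)
print(tag_counts.most_common(3))
```

With the real data, `df["tags"]` would take the place of `tag_strings`.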

## Q5b. Visualize the data above
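
A possible visualization, sketched as a horizontal bar chart of the most frequent tags. The counts below are hypothetical placeholders, not results from the site; in practice they would come from the tag counts computed in Q5a.

```python
import matplotlib.pyplot as plt

# Hypothetical counts for illustration only
tag_counts = {"life": 5, "love": 4, "humor": 3, "books": 2, "truth": 1}

# Keep the ten most frequent tags, largest first
top_tags = sorted(tag_counts.items(), key=lambda kv: kv[1], reverse=True)[:10]
labels = [tag for tag, _ in top_tags]
counts = [count for _, count in top_tags]

fig, ax = plt.subplots(figsize=(8, 5))
# Reverse the order so the most common tag appears at the top of the chart
ax.barh(labels[::-1], counts[::-1])
ax.set_xlabel("Number of quotes")
ax.set_ylabel("Tag")
ax.set_title("Most common tags")
fig.tight_layout()
```

Horizontal bars keep long tag names readable, which is why they are used here instead of vertical ones.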