# 🌍 CO2 Emissions Data Analysis from Worldometer
This notebook demonstrates web scraping, data cleaning, and analysis of CO2 emissions data for the United States.
We will answer four analytical questions using real-world data scraped from [Worldometer](https://www.worldometers.info/co2-emissions/us-co2-emissions/), and visualize key trends using **Seaborn**.

## 🔍 Objective
1. Scrape at least five rows of CO2 emissions data from the Worldometer website.
2. Store and clean the data using `pandas`.
3. Answer at least four data analysis questions:
   - Trend of CO2 emissions over time.
   - Year of highest emissions.
   - Average emissions over all years.
   - Emissions compared across decades.
4. Plot at least two of the answers using **Seaborn**, with labeled, styled charts.

## Step 1: Web Scraping
We use `requests` to download the Worldometer page and `BeautifulSoup` to extract the U.S. CO2 emissions table from its HTML.
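The table-parsing logic can be tried offline on a small hand-written HTML snippet. This is a minimal sketch that assumes the page uses a plain `<table>` with `<th>` header cells and `<td>` data cells; the snippet, its values, and the helper name `parse_table` are hypothetical and not copied from Worldometer.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML mimicking a simple emissions table (values are made up).
SAMPLE_HTML = """
<table>
  <thead><tr><th>Year</th><th>Fossil CO2 Emissions (tons)</th></tr></thead>
  <tbody>
    <tr><td>2016</td><td>5,000,000</td></tr>
    <tr><td>2015</td><td>5,100,000</td></tr>
    <tr><td>Note</td></tr>
  </tbody>
</table>
"""

def parse_table(html):
    """Return (headers, rows) from the first <table> in `html` (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found")
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # The header row has no <td> cells and the "Note" row has too few,
        # so filtering on cell count skips both.
        if len(cells) == len(headers):
            rows.append(cells)
    return headers, rows

headers, rows = parse_table(SAMPLE_HTML)
print(headers)  # ['Year', 'Fossil CO2 Emissions (tons)']
print(rows)     # [['2016', '5,000,000'], ['2015', '5,100,000']]
```

Filtering on cell count rather than on row position keeps the parser tolerant of footnote or spacer rows inside the table body.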

## Step 2: Data Cleaning
We convert the scraped text to numbers (stripping thousands separators such as `5,000,000`) and derive a `Decade` column from `Year`. Numeric types are required for sorting, aggregating, and plotting the data correctly.
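The cleaning step can be illustrated on a few toy rows shaped like the scraped strings. This sketch swaps in `pd.to_numeric(..., errors="coerce")`, which turns unparseable cells into `NaN` instead of raising; the values are made up, and the final sort assumes the source table may list the newest year first.

```python
import pandas as pd

# Toy rows shaped like the scraped strings (values are made up).
raw = pd.DataFrame({
    "Year": ["2016", "2009", "1999"],
    "Fossil CO2 Emissions (tons)": ["5,000,000", "4,900,000", "5,200,000"],
})

clean = raw.copy()
clean["Year"] = pd.to_numeric(clean["Year"])
# Drop thousands separators before converting; errors="coerce" maps bad cells to NaN.
clean["Fossil CO2 Emissions (tons)"] = pd.to_numeric(
    clean["Fossil CO2 Emissions (tons)"].str.replace(",", "", regex=False),
    errors="coerce",
)
# Integer division floors each year to its decade: 2016 -> 2010, 1999 -> 1990.
clean["Decade"] = (clean["Year"] // 10) * 10
# Sort oldest-first so positional lookups such as iloc[-1] mean "latest year".
clean = clean.sort_values("Year").reset_index(drop=True)
print(clean)
```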

## Step 3: Data Analysis
Here we answer four key questions about the data. Two are answered numerically and two using visualizations created with **Seaborn**.

## Q1: What is the trend of CO2 emissions over time?
We visualize the year-by-year CO2 emissions using a Seaborn line plot.
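A line plot shows the trend visually; two numbers can back it up. The sketch below, on a made-up five-year series rather than the scraped data, computes year-over-year percentage change with `Series.pct_change` and a least-squares straight-line slope with `numpy.polyfit`.

```python
import numpy as np
import pandas as pd

# Toy series (made-up values) standing in for the cleaned DataFrame.
toy = pd.DataFrame({
    "Year": [2012, 2013, 2014, 2015, 2016],
    "Fossil CO2 Emissions (tons)": [100.0, 104.0, 106.0, 110.0, 112.0],
})

# Year-over-year percentage change; the first year has no predecessor, so it is NaN.
toy["YoY change (%)"] = toy["Fossil CO2 Emissions (tons)"].pct_change() * 100

# Fit a degree-1 polynomial: the slope is the average change in tons per year.
slope, intercept = np.polyfit(toy["Year"], toy["Fossil CO2 Emissions (tons)"], deg=1)
print(toy)
print(f"Fitted trend: {slope:+.2f} tons/year")
```

The sign of the slope gives the overall direction, but a single straight line can hide a rise-then-fall shape, so it is best read alongside the plot.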

## Q2: What year had the highest CO2 emissions?
We identify the year with the highest CO2 output.
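One idiomatic way to find the peak year is `Series.idxmax`, which returns the index label of the largest value so `.loc` can fetch the matching `Year`. A minimal sketch on made-up values:

```python
import pandas as pd

# Made-up values, not real emissions.
toy = pd.DataFrame({
    "Year": [2005, 2006, 2007, 2008],
    "Fossil CO2 Emissions (tons)": [98.0, 97.0, 99.5, 96.0],
})

col = "Fossil CO2 Emissions (tons)"
# idxmax gives the label of the first maximum; .loc then reads that row.
peak_idx = toy[col].idxmax()
peak_year = int(toy.loc[peak_idx, "Year"])
peak_value = toy.loc[peak_idx, col]
print(f"Peak: {peak_value:,.1f} tons in {peak_year}")  # Peak: 99.5 tons in 2007
```

If ties or the top few years matter, `toy.nlargest(3, col)` returns the three highest rows instead of a single one.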

## Q3: What is the average annual CO2 emission?
We calculate the mean annual emission level for all recorded years.
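The mean is sensitive to unusual years, so reporting the median next to it is a cheap robustness check. A sketch on made-up annual totals, where one value is deliberately out of line:

```python
import pandas as pd

# Made-up annual totals; 130.0 is a deliberately unusual year.
emissions = pd.Series([100.0, 102.0, 98.0, 101.0, 130.0])

mean_val = emissions.mean()      # sum of values / number of values
median_val = emissions.median()  # middle value after sorting
print(f"mean={mean_val:.1f}, median={median_val:.1f}")  # mean=106.2, median=101.0
```

A large gap between the two suggests a few extreme years are pulling the average.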

## Q4: How do emissions compare across decades?
We group emissions by decade and visualize the total emissions per decade using a bar chart with labels.
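Decade totals are only comparable when every decade contains the same number of recorded years; a partial decade at either end of the table will look smaller. This sketch uses made-up, constant yearly values over 2010 to 2021 and aggregates with `sum`, `count`, and `mean` to expose the effect.

```python
import pandas as pd

# Made-up data: the 2010s are complete here, the 2020s only have two years.
years = list(range(2010, 2022))
toy = pd.DataFrame({"Year": years, "Fossil CO2 Emissions (tons)": [100.0] * len(years)})
toy["Decade"] = (toy["Year"] // 10) * 10

# count shows how many years each decade actually covers;
# mean normalizes the total by that count.
summary = toy.groupby("Decade")["Fossil CO2 Emissions (tons)"].agg(["sum", "count", "mean"])
print(summary)
```

Here the 2020s total is one fifth of the 2010s total even though every year is identical, while the per-year mean is the same for both decades.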

## Conclusion
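- The line plot (Q1) shows how U.S. fossil CO2 emissions have changed from year to year over the scraped period.
- Q2 identifies the single year with the highest emissions, and Q3 gives the average annual output across all recorded years.
- The decade bar chart (Q4) compares total emissions per decade; decades that are only partly covered by the data (typically the first or last) have smaller totals for that reason alone.
- Together, these results summarize how U.S. CO2 emissions have shifted over time and can inform future environmental decisions.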

In [None]:
# Import necessary libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Basic style setup
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

In [None]:
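# Step 1: Scrape data from Worldometer (CO2 Emissions - USA)

url = 'https://www.worldometers.info/co2-emissions/us-co2-emissions/'
# Send a browser-like User-Agent (some sites reject the default one) and set a timeout so the request cannot hang forever
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=30)
response.raise_for_status()  # fail early on HTTP errors such as 403 or 404
soup = BeautifulSoup(response.content, 'html.parser')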

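# Locate the emissions table (assumed to be the first <table> on the page)
table = soup.find('table')
if table is None:
    raise ValueError("No <table> found; the page layout may have changed.")
headers = [th.text.strip() for th in table.find_all('th')]

# Extract data rows (skip the header row; keep only rows with one cell per header)
data = []
for row in table.find_all('tr')[1:]:
    cells = [td.text.strip() for td in row.find_all('td')]
    if len(cells) == len(headers):
        data.append(cells)

# Create DataFrame
df = pd.DataFrame(data, columns=headers)
df.head()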

In [None]:
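# Step 2: Clean the Data

# Convert data types (strip thousands separators before casting)
df['Year'] = df['Year'].astype(int)
df['Fossil CO2 Emissions (tons)'] = df['Fossil CO2 Emissions (tons)'].str.replace(",", "").astype(float)

# Sort oldest-first, in case the table lists the newest year at the top
df = df.sort_values('Year').reset_index(drop=True)

# Add Decade column (e.g. 1987 -> 1980)
df['Decade'] = (df['Year'] // 10) * 10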

# Final check
df.info()

In [None]:
# Q1: What is the trend of CO2 emissions over time?

plt.figure()
sns.lineplot(data=df, x="Year", y="Fossil CO2 Emissions (tons)", marker='o', color='green')

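# Annotate the most recent year (idxmax on Year works regardless of row order)
last_row = df.loc[df['Year'].idxmax()]
plt.text(last_row['Year'], last_row['Fossil CO2 Emissions (tons)'] + 5e7,
         f"{int(last_row['Fossil CO2 Emissions (tons)']):,}", ha='center', fontsize=10)

# Plain-text title: Matplotlib's default font has no emoji glyphs
plt.title("Trend of U.S. CO2 Emissions Over Time")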
plt.ylabel("CO2 Emissions (tons)")
plt.xlabel("Year")
plt.tight_layout()
plt.show()

In [None]:
# Q2: What year had the highest CO2 emissions?

col = 'Fossil CO2 Emissions (tons)'
max_idx = df[col].idxmax()           # row label of the largest emission value
max_val = df.loc[max_idx, col]
max_year = df.loc[max_idx, 'Year']
print(f"🔺 Highest CO2 emission: {max_val:,.0f} tons in {max_year}")

In [None]:
# Q3: What is the average annual CO2 emission?

average_emission = df['Fossil CO2 Emissions (tons)'].mean()
print(f"📉 Average annual CO2 emission: {average_emission:,.0f} tons")

In [None]:
# Q4: How do emissions compare across decades?

plt.figure()
decade_data = df.groupby("Decade")['Fossil CO2 Emissions (tons)'].sum().reset_index()

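# seaborn >= 0.13: assign hue when passing a palette (avoids a FutureWarning); legend=False hides the redundant legend
sns.barplot(data=decade_data, x="Decade", y="Fossil CO2 Emissions (tons)",
            hue="Decade", palette='crest', legend=False)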

# Annotate bars
for i in range(len(decade_data)):
    val = decade_data['Fossil CO2 Emissions (tons)'].iloc[i]
    plt.text(i, val + 5e8, f"{val/1e9:.2f}B", ha='center', fontsize=10)

plt.title("Total U.S. CO2 Emissions by Decade")
plt.xlabel("Decade")
plt.ylabel("Total CO2 Emissions (tons)")
plt.tight_layout()
plt.show()