# Step 1: Introduction + Research Question

# Does Economic Growth Hurt the Environment?

* This project investigates how GDP correlates with environmental metrics like air pollution and temperature anomalies, using regression and forecasting techniques to understand trends globally and regionally.


# Step 2: Data Collection & Cleaning

# Datasets:

* World Bank GDP
* WHO/OWID Air Pollution
* NASA Temperature Anomalies

In [1]:
import pandas as pd

# Load datasets
gdp_url = "https://raw.githubusercontent.com/SaiAnirudh659/Vis/refs/heads/main/ps4/gdp_data.csv"
pollution_url = "https://raw.githubusercontent.com/SaiAnirudh659/Vis/refs/heads/main/ps4/air_pollution.csv"
temp_url = "https://raw.githubusercontent.com/SaiAnirudh659/Vis/refs/heads/main/ps4/temperature_anomalies.csv"

gdp = pd.read_csv(gdp_url)
pollution = pd.read_csv(pollution_url)
temperature = pd.read_csv(temp_url)

# Rename for consistency
gdp = gdp.rename(columns={'Country Name': 'Country', 'Value': 'GDP'})
pollution = pollution.rename(columns={
    'Entity': 'Country',
    'PM2.5 air pollution (µg/m³)': 'Pollution'
})
temperature = temperature.rename(columns={'TemperatureAnomaly': 'Temp_Anomaly'})

# Step 3: Data Merging & Cleaning

In [2]:
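# The pollution file may label this column 'PollutionIndex' instead; rename() ignores
# keys that are not present, so this line does nothing if the column was already renamed
pollution = pollution.rename(columns={'PollutionIndex': 'Pollution'})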
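
Because pandas `rename` silently skips keys that are not present, the two renames above never report which column name the pollution file actually uses. A minimal sketch, assuming the column is named either `PM2.5 air pollution (µg/m³)` or `PollutionIndex`, of a rename that fails loudly instead. `rename_first_present` is a hypothetical helper and the toy data is made up:

```python
import pandas as pd

# Hypothetical helper (not part of the original notebook): rename the first
# candidate column that actually exists, and raise if none of them do.
def rename_first_present(frame, candidates, new_name):
    for col in candidates:
        if col in frame.columns:
            return frame.rename(columns={col: new_name})
    raise KeyError(f"none of {candidates} found in columns {list(frame.columns)}")

# Toy frame standing in for the pollution CSV (values are made up)
toy = pd.DataFrame({"Entity": ["A", "B"], "Year": [2010, 2010], "PollutionIndex": [12.0, 30.5]})
toy = toy.rename(columns={"Entity": "Country"})
toy = rename_first_present(toy, ["PM2.5 air pollution (µg/m³)", "PollutionIndex"], "Pollution")
print(list(toy.columns))  # ['Country', 'Year', 'Pollution']
```

With the real data, a `KeyError` here would surface a schema change immediately instead of producing an empty merge later.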

In [3]:
# Merge GDP and Pollution
df = pd.merge(gdp, pollution, on=["Country", "Year"], how="inner")

# Merge with Temperature Anomalies
df = pd.merge(df, temperature, on=["Country", "Year"], how="left")

# Keep the analysis columns; dropna() also removes rows that found no
# temperature match in the left merge above
df = df[['Country', 'Year', 'GDP', 'Pollution', 'Temp_Anomaly']].dropna()
df.head()

|   | Country | Year | GDP | Pollution | Temp_Anomaly |
|---|---|---|---|---|---|
| 0 | United States | 2010 | 3577771000000.0 | 15.222024 | 1.377048 |
| 1 | United States | 2011 | 4546423000000.0 | 38.644959 | 1.251963 |
| 2 | United States | 2012 | 9087469000000.0 | 80.118787 | 0.724579 |
| 3 | United States | 2013 | 13935730000000.0 | 43.927663 | 1.222565 |
| 4 | United States | 2014 | 9677496000000.0 | 81.627678 | 1.416284 |
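
The merge cell joins GDP and pollution with `how="inner"` and then calls `dropna()`, so any country-year present in only one source disappears without a trace, and a duplicated (Country, Year) key would silently multiply rows. A minimal sketch, on made-up toy frames, of how the `validate` and `indicator` options of `pd.merge` can audit this before the real merge:

```python
import pandas as pd

# Toy inputs (made-up values) with the same key columns as the notebook's data
gdp_toy = pd.DataFrame({"Country": ["A", "A", "B"], "Year": [2010, 2011, 2010], "GDP": [1.0, 1.1, 2.0]})
pol_toy = pd.DataFrame({"Country": ["A", "B", "C"], "Year": [2010, 2010, 2010], "Pollution": [10.0, 20.0, 30.0]})

# validate="one_to_one" raises MergeError if (Country, Year) is duplicated on either side;
# indicator=True adds a _merge column recording which table each row came from.
audit = pd.merge(gdp_toy, pol_toy, on=["Country", "Year"], how="outer",
                 validate="one_to_one", indicator=True)
print(audit["_merge"].value_counts())

# Keys an inner join would drop, i.e. present in only one of the two tables
dropped = audit[audit["_merge"] != "both"][["Country", "Year", "_merge"]]
print(dropped)
```

Running the same audit on `gdp` and `pollution` would show how many country-years the inner join discards.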


# Step 4: Exploratory Data Analysis (EDA)

In [4]:
import plotly.express as px

fig = px.scatter(
    df,
    x='Pollution',
    y='GDP',
    color='Temp_Anomaly',
    hover_name='Country',
    title='🌍 GDP vs Pollution (Colored by Temperature Anomaly)',
    width=1000,     # 🔍 Wider
    height=600      # 📏 Taller
)

fig.update_traces(marker=dict(size=8, opacity=0.7))
fig.update_layout(title_font_size=20)

fig.show()

# My Interpretation of This Plot
1. Relationship between GDP and pollution
   * If the dots trend upward to the right, higher pollution tends to go with higher GDP (a positive correlation).
   * If GDP rises while pollution stays flat or falls, that may indicate cleaner economic growth.
2. Color = temperature anomaly
   * Brighter (orange/yellow) points mark countries and years with larger temperature anomalies.
   * If those points also sit high on GDP and pollution, that is consistent with a climate impact of growth, although a correlation alone does not establish it.
3. Outliers
   * Low GDP with high pollution may point to a less developed economy with weak environmental regulation.
   * High GDP with low pollution (e.g. Sweden) may reflect effective sustainability policy.
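
The readings above are visual. A minimal sketch of putting numbers on them, using made-up toy data in the same layout as `df`: a pooled correlation across all country-years, and a per-country correlation over time. Spearman is computed here as the Pearson correlation of ranks to avoid a SciPy dependency. In this toy example the pooled correlation is positive even though country B's pollution falls as its GDP rises, which is the kind of pattern a single scatter cloud can hide:

```python
import pandas as pd

# Toy panel (made-up numbers) with the same columns as df
toy = pd.DataFrame({
    "Country":   ["A", "A", "A", "B", "B", "B"],
    "Year":      [2010, 2011, 2012, 2010, 2011, 2012],
    "GDP":       [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
    "Pollution": [5.0, 6.0, 7.0, 40.0, 35.0, 30.0],
})

# Pooled correlation across all country-years
pearson = toy["GDP"].corr(toy["Pollution"])                   # linear association
spearman = toy["GDP"].rank().corr(toy["Pollution"].rank())    # rank-based (Spearman)

# Within-country correlation over time, which pooling can mask
per_country = toy.groupby("Country")[["GDP", "Pollution"]].apply(
    lambda g: g["GDP"].corr(g["Pollution"])
)
print(round(pearson, 3), round(spearman, 3))
print(per_country)
```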

## Animated Scatter Plot Over Years
* Tracks how each country moves over time (GDP vs Pollution)

In [5]:
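import plotly.express as px

fig = px.scatter(
    df,
    x='Pollution',
    y='GDP',
    animation_frame='Year',
    animation_group='Country',
    color='Temp_Anomaly',
    hover_name='Country',
    # Fix axis and color ranges so every frame uses the same scales
    range_x=[0, df['Pollution'].max() * 1.05],
    range_y=[0, df['GDP'].max() * 1.05],
    range_color=[df['Temp_Anomaly'].min(), df['Temp_Anomaly'].max()],
    title='GDP vs Pollution Over Years (Animated)',
    width=1000,
    height=600
)

# size_max only applies when a size column is given, so marker size is set here instead
fig.update_traces(marker=dict(size=8, opacity=0.7))
fig.update_layout(title_font_size=20)
fig.show()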
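
The animation shows each country's path, but it can be hard to tell by eye whether a given step was "cleaner growth" in the sense used in the interpretation above. A minimal sketch, on made-up toy data, that labels each country-year step by the sign of its GDP growth and pollution change. The labels and the zero thresholds are illustrative assumptions, not a standard decoupling definition:

```python
import pandas as pd

# Toy panel (made-up numbers) with the same columns as df
toy = pd.DataFrame({
    "Country":   ["A", "A", "A", "B", "B", "B"],
    "Year":      [2010, 2011, 2012, 2010, 2011, 2012],
    "GDP":       [100.0, 105.0, 110.0, 50.0, 52.0, 51.0],
    "Pollution": [20.0, 19.0, 18.0, 30.0, 33.0, 34.0],
})

# Year-over-year percentage changes, computed within each country
toy = toy.sort_values(["Country", "Year"])
toy["GDP_growth"] = toy.groupby("Country")["GDP"].pct_change()
toy["Pollution_change"] = toy.groupby("Country")["Pollution"].pct_change()

def classify(row):
    # Illustrative labels only; the first year of each country has no prior year
    if pd.isna(row["GDP_growth"]):
        return None
    if row["GDP_growth"] > 0 and row["Pollution_change"] <= 0:
        return "growth, flat or lower pollution"
    if row["GDP_growth"] > 0:
        return "growth, higher pollution"
    return "no GDP growth"

toy["Pattern"] = toy.apply(classify, axis=1)
print(toy[["Country", "Year", "GDP_growth", "Pollution_change", "Pattern"]])
```

Counting the labels per country or per continent would turn the animation's impression into a table that can be compared directly.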

## Separate Plots by Continent (Faceted Scatter Plot)
* We can compare continents side by side.
* Highlights regional differences in the relationship between pollution and GDP.

In [6]:
!pip install pycountry-convert -q
import pycountry_convert as pc

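def get_continent(country):
    """Return a two-letter continent code for a country name, or None if the lookup fails."""
    try:
        country_code = pc.country_name_to_country_alpha2(country)
        return pc.country_alpha2_to_continent_code(country_code)
    except Exception:
        # Names the library does not recognize (e.g. regional aggregates) get no continent
        return None

# Assign continent codes to each country
df['Continent'] = df['Country'].apply(get_continent)

# Map codes to readable names; codes not listed here (e.g. 'AN' for Antarctica) become NaN
continent_map = {
    'AF': 'Africa',
    'AS': 'Asia',
    'EU': 'Europe',
    'NA': 'North America',
    'SA': 'South America',
    'OC': 'Oceania'
}
df['Continent'] = df['Continent'].map(continent_map)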
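
`get_continent` returns `None` whenever the lookup fails, and `.map(continent_map)` turns any code missing from the dictionary into NaN, so some rows can silently lose their continent. A minimal sketch, on a made-up frame, of listing those rows before plotting. Which names fail in practice depends on the real data, and dropping them at the end is a judgment call rather than part of the original notebook:

```python
import pandas as pd

# Toy frame (made-up) after the continent lookup; "World" and "Atlantis"
# stand in for names the lookup could not place on a continent.
toy = pd.DataFrame({
    "Country":   ["France", "World", "Japan", "World", "Atlantis"],
    "Year":      [2010, 2010, 2010, 2011, 2010],
    "Continent": ["Europe", None, "Asia", None, None],
})

unmapped = sorted(toy.loc[toy["Continent"].isna(), "Country"].unique())
print("No continent for:", unmapped)

# Keep only rows with a continent so every facet panel is a real continent
toy = toy[toy["Continent"].notna()]
```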

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.3/6.3 MB[0m [31m40.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m244.0/244.0 kB[0m [31m14.1 MB/s[0m eta [36m0:00:00[0m
[?25h

In [7]:
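import plotly.express as px

# One animation frame per year. With trendline='ols' (requires statsmodels), px fits a
# separate line for each continent in each frame and skips groups with fewer than two points.
# Caveat: px animations assume the same continents appear in every year.
fig = px.scatter(
    df,
    x='Pollution',
    y='GDP',
    color='Temp_Anomaly',
    facet_col='Continent',
    animation_frame='Year',
    hover_name='Country',
    color_continuous_scale='plasma',
    # Keep axes and colors fixed so years are comparable
    range_x=[0, df['Pollution'].max() * 1.05],
    range_y=[0, df['GDP'].max() * 1.05],
    range_color=[df['Temp_Anomaly'].min(), df['Temp_Anomaly'].max()],
    trendline='ols',
    trendline_color_override='black',
    opacity=0.85,
    height=700,
    width=1200,
    title="🌍 GDP vs Pollution by Continent (Colored by Temp Anomaly) with Trendlines"
)

# Update visuals
fig.update_traces(marker=dict(size=12, line=dict(width=1, color='black')))
fig.update_layout(title_font_size=20, yaxis_title="GDP (in USD)")

# Dropdown to jump to a year. "visible" toggles whole traces, not individual rows,
# so the year filter switches animation frames instead.
dropdown_buttons = [
    {
        "label": frame.name,
        "method": "animate",
        "args": [
            [frame.name],
            {"mode": "immediate", "frame": {"duration": 0, "redraw": True}, "transition": {"duration": 0}},
        ],
    } for frame in sorted(fig.frames, key=lambda f: f.name)
]

fig.update_layout(
    updatemenus=[  # replaces px's default play/pause buttons
        {
            "buttons": dropdown_buttons,
            "direction": "down",
            "showactive": True,
            "x": 1.1,  # Pushes dropdown to the right side
            "xanchor": "left",  # Anchors from the left of the dropdown
            "y": 1.05,
            "yanchor": "top"
        }
    ],
    sliders=[]  # remove px's default year slider; the dropdown replaces it
)

fig.show()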

### Understanding the Chart

- **Each dot** represents a **country in a specific year**, showing:
  - **X-axis:** Pollution level
  - **Y-axis:** GDP (in USD)
  - **Color of the dot:** Temperature anomaly (deviation from a long-term baseline temperature)
  - **Tooltip (on hover):** Country name and exact values

- **Each subplot** shows data for one **Continent**.

- **The black trendline** in each subplot:
  - Is an ordinary least squares (OLS) **regression line** of GDP on pollution, fitted to the dots in that panel.
  - Only appears if **two or more countries** have data for that year in that continent.
  - Its **slope summarizes the association** (not a causal effect):
    - **Positive slope:** higher-pollution points tend to have higher GDP
    - **Negative slope:** higher-pollution points tend to have lower GDP

- Use the **dropdown on the right** to switch between years and observe how the relationship between **GDP, Pollution, and Temperature Anomaly** changes over time.

> **Note:** No trendline appears in some panels if there's not enough data for that year in that continent.
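
Each black trendline comes from `trendline='ols'`, i.e. an ordinary least squares fit of GDP on pollution within a panel. A minimal sketch, on made-up toy data, that computes those slopes directly with `numpy.polyfit` for every continent-year, returning NaN where there is only one point and a line is undefined. This is an illustration of the same least squares fit, not the code plotly runs internally:

```python
import numpy as np
import pandas as pd

# Toy data (made-up) with the columns used in the chart
toy = pd.DataFrame({
    "Continent": ["Europe", "Europe", "Europe", "Asia", "Asia", "Oceania"],
    "Year":      [2010, 2010, 2010, 2010, 2010, 2010],
    "Pollution": [10.0, 20.0, 30.0, 40.0, 60.0, 8.0],
    "GDP":       [3.0, 5.0, 7.0, 9.0, 5.0, 1.0],
})

def ols_slope(group):
    # A line needs at least two distinct x values; otherwise report no slope
    if group["Pollution"].nunique() < 2:
        return np.nan
    slope, _intercept = np.polyfit(group["Pollution"], group["GDP"], deg=1)
    return slope

# One slope per (Continent, Year) panel
slopes = toy.groupby(["Continent", "Year"])[["Pollution", "GDP"]].apply(ols_slope)
print(slopes)
```

Here Europe gets a positive slope, Asia a negative one, and Oceania (a single point) no slope, mirroring the panels where no trendline is drawn.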