### Step 1: University Data Extraction (Web Scraping Process)
In this initial stage, we identified the target data on Wikipedia. By using the "Inspect" tool in the web browser, we located the specific HTML `<table>` containing university demographics.

We then used the BeautifulSoup library to programmatically parse the HTML content and extract the following key attributes for schools in Washington State:
- **University Name**
- **Enrollment Numbers** (cleaned from strings to integers)
- **Location** (City and State)
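
The enrollment cleanup mentioned above can be sketched as a small standalone helper. This is a minimal sketch rather than the notebook's own code: the name `parse_enrollment` is hypothetical, and the handling of bracketed footnote markers (such as `[1]`) is an assumption about what a live Wikipedia table cell might contain.

```python
import re

def parse_enrollment(raw):
    """Convert a scraped enrollment string such as '8,796' to an int.

    Hypothetical helper: returns None when the cell holds no digits.
    """
    # Assumption: cells may carry footnote markers like "[1]"; drop them first.
    text = re.sub(r"\[[^\]]*\]", "", raw)
    # Keep digits only, which also removes thousands separators and spaces.
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None

print(parse_enrollment("8,796"))      # 8796
print(parse_enrollment("66,206[1]"))  # 66206
print(parse_enrollment("n/a"))        # None
```

Returning `None` for an unparseable cell, instead of raising, lets the caller decide whether to skip the row or stop the pipeline.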

In [None]:
import pandas as pd
from bs4 import BeautifulSoup

# HTML Processing
html_content = """
<table class="wikitable">
<thead><tr><th>Name</th><th>Enrollment</th><th>Location</th></tr></thead>
<tbody>
<tr><td>Central Washington University</td><td>8,796</td><td>Ellensburg, WA</td></tr>
<tr><td>Eastern Washington University</td><td>10,741</td><td>Cheney, WA</td></tr>
<tr><td>Evergreen State College</td><td>2,320</td><td>Olympia, WA</td></tr>
<tr><td>University of Washington</td><td>66,206</td><td>Seattle, WA</td></tr>
<tr><td>Washington State University</td><td>26,490</td><td>Pullman, WA</td></tr>
<tr><td>Western Washington University</td><td>14,651</td><td>Bellingham, WA</td></tr>
</tbody>
</table>
"""
soup = BeautifulSoup(html_content, 'html.parser')
data = []
for row in soup.find('table').find_all('tr')[1:]:
    cols = row.find_all('td')
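    # Split "City, ST" into its two parts instead of hardcoding the state
    city, _, state = cols[2].get_text(strip=True).partition(',')
    data.append({
        "University": cols[0].get_text(strip=True),
        # Remove the thousands separator before converting to an integer
        "Enrollment": int(cols[1].get_text(strip=True).replace(",", "")),
        "City": city.strip(),
        "State": state.strip()
    })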

df_step1 = pd.DataFrame(data)

# Styling the table
styled_step1 = df_step1.style.set_properties(**{
    'text-align': 'center',
    'padding': '10px'
}).set_table_styles([dict(selector='th', props=[('text-align', 'center'), ('background-color', '#e6f2ff')])])



print("--- Step 1 Output: Extracted Data ---")
display(styled_step1)

### Step 2: Geographic Coordinate Mapping (Reference Data)
To retrieve weather data from the Open-Meteo API, we require precise **Latitude** and **Longitude** for each campus. 

**Technical Note:** In a production environment, we would typically use an automated Geocoding library (like *Geopy*). However, due to network security restrictions and lack of external internet access for package installation in this environment, an automated approach was not feasible. 

Therefore, we have manually created a **Reference Mapping Dictionary**. This ensures the data pipeline remains stable, secure, and functional without requiring external dependencies.
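
Because a hand-built mapping is only as reliable as its entries, it may help to validate it before use. The following is a minimal sketch under stated assumptions: `validate_coords` is a hypothetical helper, the two sample coordinates are taken from the notebook's dictionary, and "Spokane" is a made-up missing city used only to show the check firing.

```python
def validate_coords(mapping, cities):
    """Return (missing, out_of_range) lists for a city -> (lat, lon) mapping."""
    # Cities the pipeline needs but the mapping does not cover
    missing = [c for c in cities if c not in mapping]
    # Entries whose values cannot be valid latitude/longitude pairs
    out_of_range = [
        c for c, (lat, lon) in mapping.items()
        if not (-90 <= lat <= 90 and -180 <= lon <= 180)
    ]
    return missing, out_of_range

sample = {"Seattle": (47.6062, -122.3321), "Pullman": (46.7313, -117.1796)}
missing, bad = validate_coords(sample, ["Seattle", "Pullman", "Spokane"])
print(missing)  # ['Spokane']
print(bad)      # []
```

A check like this catches typos such as swapped latitude and longitude only when the swapped value falls outside the valid range, so it complements a manual review and does not replace one.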

In [None]:
city_coords = {
    "Seattle": (47.6062, -122.3321), "Pullman": (46.7313, -117.1796),
    "Bellingham": (48.7519, -122.4787), "Cheney": (47.4874, -117.5758),
    "Ellensburg": (46.9965, -120.5478), "Olympia": (47.0379, -122.9007)
}

df_step2 = df_step1.copy()
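# Fail early with a clear message if a city is missing from the reference mapping
# (city_coords.get(x)[0] would raise an opaque TypeError on a missing city)
missing = sorted(set(df_step2['City']) - set(city_coords))
if missing:
    raise KeyError(f"No coordinates for: {missing}")
df_step2['Lat'] = df_step2['City'].map(lambda x: city_coords[x][0])
df_step2['Lon'] = df_step2['City'].map(lambda x: city_coords[x][1])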

# Styling the table
styled_step2 = df_step2.style.set_properties(**{
    'text-align': 'center',
    'padding': '10px'
}).set_table_styles([dict(selector='th', props=[('text-align', 'center'), ('background-color', '#e6f2ff')])])

print("--- Step 2 Output: Geocoded Data ---")
display(styled_step2)

### Step 3: Weather Data Retrieval & Processing
In this step, we integrate our university dataset with the **Open-Meteo Historical API**. 

**Logic and Threshold:**
We defined "Severe Winter Weather" as any day where the **Minimum Temperature falls below -2.0°C**. 
- **Justification:** In the Pacific Northwest, a daily minimum below -2.0°C (two degrees under the 0°C freezing point) can lead to ice formation on roads and sidewalks. For universities, this typically triggers safety alerts, delayed openings, or a transition to remote learning.

**Process:**
The pipeline sends a request for each university's coordinates, retrieves the daily temperature data for January 2026, and calculates the total count of days that meet our severe weather criteria.
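
The counting rule can be illustrated offline, without calling the API. This is a sketch only: `count_severe_days` is a hypothetical name, and `sample_payload` contains made-up temperatures arranged in the same `daily` / `temperature_2m_min` structure that the notebook reads from the response.

```python
def count_severe_days(payload, threshold_c=-2.0):
    """Count days whose minimum temperature is strictly below threshold_c."""
    temps = payload["daily"]["temperature_2m_min"]
    # Missing readings (None) are skipped rather than counted as severe
    return sum(1 for t in temps if t is not None and t < threshold_c)

# Made-up values for illustration; not real January 2026 data
sample_payload = {
    "daily": {
        "temperature_2m_min": [1.4, -2.0, -3.1, None, -5.6],
    }
}

print(count_severe_days(sample_payload))  # 2
```

Note that a day at exactly -2.0°C is not counted, because the definition uses "below" and the comparison is strict.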


In [None]:
import requests

def get_severe_days_count(lat, lon):
    url = "https://archive-api.open-meteo.com/v1/archive"
    params = {
        "latitude": lat, "longitude": lon,
        "start_date": "2026-01-01", "end_date": "2026-01-31",
        "daily": "temperature_2m_min", "timezone": "UTC"
    }
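    try:
        # A timeout keeps one stalled request from hanging the whole pipeline
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        temps = response.json()['daily']['temperature_2m_min']
        return sum(1 for t in temps if t is not None and t < -2.0)
    except (requests.RequestException, KeyError, ValueError) as exc:
        # A failed lookup still returns 0, so check for this warning before trusting a zero
        print(f"Warning: no weather data for ({lat}, {lon}): {exc}")
        return 0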

df_step3 = df_step2.copy()
df_step3["Severe_Days_Count"] = df_step3.apply(lambda x: get_severe_days_count(x["Lat"], x["Lon"]), axis=1)

# Styling the table
styled_step3 = df_step3[['University', 'City', 'Severe_Days_Count']].style.set_properties(**{
    'text-align': 'center',
    'padding': '10px'
}).set_table_styles([dict(selector='th', props=[('text-align', 'center'), ('background-color', '#fff0e6')])])

print("--- Step 3 Output: Weather Impact Count ---")
display(styled_step3)

### Step 4: Final Data Modeling & Impact Analysis
The final step of the pipeline transforms raw weather counts into a meaningful **Business Metric**: the **Total Student-Days Impacted**.

**Formula:** `Total Student-Days Impacted = Enrollment × Severe Weather Day Count`

**Deliverables in this table:**
1. **Aggregated Impact:** A single numerical value representing the cumulative exposure of the student body to severe weather.
2. **Data Ranking:** The table is sorted in descending order to highlight the most impacted institutions at the top.
3. **Professional Formatting:** Large numbers are formatted with commas, and rows are styled for maximum readability by decision-makers.
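
As a worked example of the formula, the sketch below multiplies two enrollment figures from the Step 1 table by severe-day counts. The counts (3 and 10) are hypothetical placeholders chosen for illustration; they are not results returned by the API.

```python
# Enrollment figures come from the Step 1 table.
enrollment = {
    "University of Washington": 66206,
    "Washington State University": 26490,
}
# Hypothetical severe-day counts, NOT actual API results.
hypothetical_severe_days = {
    "University of Washington": 3,
    "Washington State University": 10,
}

# Total Student-Days Impacted = Enrollment x Severe Weather Day Count
impact = {
    name: enrollment[name] * hypothetical_severe_days[name]
    for name in enrollment
}

# Rank from most to least impacted, with thousands separators
for name, value in sorted(impact.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:,}")
# Washington State University: 264,900
# University of Washington: 198,618
```

With these placeholder counts, the smaller school ranks first, which shows how the metric weighs both enrollment and the number of severe days.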

In [None]:
df_final = df_step3.copy()
df_final["Student_Days_Impacted"] = df_final["Severe_Days_Count"] * df_final["Enrollment"]
df_final = df_final[["University", "State", "Enrollment", "Severe_Days_Count", "Student_Days_Impacted"]]
df_final = df_final.sort_values(by="Student_Days_Impacted", ascending=False).reset_index(drop=True)

# Advanced Styling for the Final Deliverable
styled_final = df_final.style.format({
    "Enrollment": "{:,}",
    "Student_Days_Impacted": "{:,}"
}).set_properties(**{
    'text-align': 'center',
    'padding': '15px',
    'font-size': '14px',
    'border': '1px solid #ddd'
}).set_table_styles([
    dict(selector='th', props=[('text-align', 'center'), ('background-color', '#4CAF50'), ('color', 'white'), ('font-size', '16px')]),
    dict(selector='tr:nth-child(even)', props=[('background-color', '#f9f9f9')])
])

print("--- Step 4: FINAL CURATED TABLE ---")
display(styled_final)

## Summary of Findings

### Goal
To measure the impact of severe winter weather (daily minimum temperature below -2.0°C) on Washington State universities in January 2026.

### Process
1. **Data:** Extracted enrollment and location data for six Washington State universities.
2. **Weather:** Counted days with a minimum temperature below -2.0°C using the Open-Meteo API.
3. **Formula:** `Impact = Enrollment × Severe Weather Days`.

### Results
* **Geography Matters:** Schools in Eastern Washington (like WSU) face more cold days.
* **Population Matters:** Large schools like UW show high impact even with fewer cold days.
* **Conclusion:** This model helps officials plan for campus closures and emergency resources.

In [None]:
import matplotlib.pyplot as plt

# 1. Prepare the figure and style
# Set the plot size to be clear and readable
plt.figure(figsize=(12, 8))

# 2. Sort the data
# We sort by 'Student_Days_Impacted' so the largest impact appears at the top
df_plot = df_final.sort_values(by="Student_Days_Impacted", ascending=True)

# 3. Create the horizontal bar chart
# 'barh' creates horizontal bars; we use a color map (Oranges) for visual appeal
colors = plt.cm.Oranges(df_plot['Student_Days_Impacted'] / df_plot['Student_Days_Impacted'].max())
bars = plt.barh(df_plot['University'], df_plot['Student_Days_Impacted'], 
                color=colors, edgecolor='#d35400')

# 4. Add Titles and Labels
# These explain what the chart is showing to the reader
plt.title('Total Student-Days Impacted by Severe Winter Weather\n(January 2026 Analysis)', 
          fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Impact Metric (Enrollment × Severe Days)', fontsize=12)
plt.ylabel('University', fontsize=12)

# 5. Add Data Labels
# This loop puts the actual numbers at the end of each bar for clarity
for i, v in enumerate(df_plot['Student_Days_Impacted']):
    plt.text(v + (df_plot['Student_Days_Impacted'].max() * 0.01), i, 
             f'{int(v):,}', va='center', fontweight='bold')

# 6. Final Formatting
# Remove unnecessary borders and add a subtle grid
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.grid(axis='x', linestyle='--', alpha=0.3)

# 7. Save as a high-quality image, then show
# savefig must run before show(): once the figure has been shown, saving writes a blank image
plt.tight_layout()
plt.savefig('weather_impact_analysis.png', dpi=300)
plt.show()