# Part 3: Connecting to an API/Pulling in the Data and Cleaning/Formatting


*Milestone Objective:* This milestone performs data transformations and cleansing on weather and air quality data collected through the OpenWeatherMap API for each U.S. state. The goal is to create a structured, clean dataset that supports further analysis by applying at least five key data transformation steps.

### Transformation Steps and Code Outline

```python
# Importing os to read the API key from an environment variable
import os
# Importing the requests library for handling HTTP requests
import requests
# Importing pandas for data manipulation and analysis
import pandas as pd

# OpenWeatherMap API key. It is read from an environment variable (the name
# OPENWEATHERMAP_API_KEY is our own choice) so the key is not published with the notebook.
API_KEY = os.environ["OPENWEATHERMAP_API_KEY"]

# Base URL shared by every OpenWeatherMap endpoint used below
BASE_URL = "https://api.openweathermap.org"

# List of U.S. states with their abbreviations
# This dictionary maps state names to their standard two-letter abbreviations.
# It is used to label the data clearly and consistently across our dataset,
# and can help organize or join data from other sources if needed.
states = {
    "Alabama": "AL", "Alaska": "AK", "Arizona": "AZ", "Arkansas": "AR", "California": "CA",
    "Colorado": "CO", "Connecticut": "CT", "Delaware": "DE", "Florida": "FL", "Georgia": "GA",
    "Hawaii": "HI", "Idaho": "ID", "Illinois": "IL", "Indiana": "IN", "Iowa": "IA",
    "Kansas": "KS", "Kentucky": "KY", "Louisiana": "LA", "Maine": "ME", "Maryland": "MD",
    "Massachusetts": "MA", "Michigan": "MI", "Minnesota": "MN", "Mississippi": "MS",
    "Missouri": "MO", "Montana": "MT", "Nebraska": "NE", "Nevada": "NV", "New Hampshire": "NH",
    "New Jersey": "NJ", "New Mexico": "NM", "New York": "NY", "North Carolina": "NC",
    "North Dakota": "ND", "Ohio": "OH", "Oklahoma": "OK", "Oregon": "OR", "Pennsylvania": "PA",
    "Rhode Island": "RI", "South Carolina": "SC", "South Dakota": "SD", "Tennessee": "TN",
    "Texas": "TX", "Utah": "UT", "Vermont": "VT", "Virginia": "VA", "Washington": "WA",
    "West Virginia": "WV", "Wisconsin": "WI", "Wyoming": "WY"
}

# Step #1: Retrieve Geolocation Coordinates for Each State
# The weather and air quality endpoints look data up by latitude and longitude,
# so we first geocode each state name to a single representative point.
state_coordinates = {}
for state in states:
    # Passing the query through `params` lets requests URL-encode names with spaces
    response = requests.get(
        f"{BASE_URL}/geo/1.0/direct",
        params={"q": f"{state},US", "limit": 1, "appid": API_KEY},
        timeout=10,
    )
    response.raise_for_status()  # Stop early on an invalid key or other HTTP error
    data = response.json()

    # The endpoint returns a JSON list; an empty list means no match was found
    if isinstance(data, list) and data:
        state_coordinates[state] = {"lat": data[0]["lat"], "lon": data[0]["lon"]}
    else:
        print(f"Coordinates not found for {state}")

# Step #2: Retrieve Weather and Air Quality Data for Each State
# Using the coordinates from Step #1, we request current weather (temperature,
# humidity, description) and current air quality (the Air Quality Index) for each state.
weather_data = []
for state, coords in state_coordinates.items():
    # Both endpoints take the same coordinates and API key
    params = {"lat": coords["lat"], "lon": coords["lon"], "appid": API_KEY}

    # Current weather; temperatures are in Kelvin because no `units` parameter is passed
    weather_info = requests.get(f"{BASE_URL}/data/2.5/weather", params=params, timeout=10).json()

    # Current air pollution, including the Air Quality Index
    air_quality_info = requests.get(f"{BASE_URL}/data/2.5/air_pollution", params=params, timeout=10).json()

    # Collect data only if both responses contain the expected fields
    if "main" in weather_info and air_quality_info.get("list"):
        weather_data.append({
            "State": state,
            "State Code": states[state],
            "Temperature (K)": weather_info["main"]["temp"],
            "Humidity (%)": weather_info["main"]["humidity"],
            "Weather Description": weather_info["weather"][0]["description"],
            "Air Quality Index": air_quality_info["list"][0]["main"]["aqi"],
        })
    else:
        print(f"Weather or air quality data missing for {state}")

# Step #3: Convert Data into DataFrame
df = pd.DataFrame(weather_data)

# Step #4: Data Transformation and Cleaning Steps

# Transformation #1: Convert Temperature from Kelvin to Fahrenheit, F = (K - 273.15) * 9/5 + 32
df['Temperature (F)'] = (df['Temperature (K)'] - 273.15) * 9 / 5 + 32
df.drop(columns=['Temperature (K)'], inplace=True)  # Drop the Kelvin column

# Transformation #2: Standardize Casing for Weather Descriptions
df['Weather Description'] = df['Weather Description'].str.title()

# Transformation #3: Replace the Numerical Air Quality Index (1-5 scale) with Descriptive Labels
aqi_labels = {1: "Good", 2: "Fair", 3: "Moderate", 4: "Poor", 5: "Very Poor"}
df['Air Quality Index'] = df['Air Quality Index'].map(aqi_labels)

# Transformation #4: Handle Missing Values in Air Quality Index
# .map() yields NaN for any code outside 1-5; label those (and any other gaps) "Unavailable"
df['Air Quality Index'] = df['Air Quality Index'].fillna("Unavailable")

# Transformation #5: Ensure Unique State Names by Dropping Duplicates
# (a safeguard: the loop above should already produce at most one row per state)
df.drop_duplicates(subset=["State"], keep="first", inplace=True)

# Transformation #6: Temperature Grouping
# Categorize Fahrenheit temperatures into Cold, Cool, Warm, and Hot for easier analysis.
def categorize_temperature(temp_f):
    if temp_f <= 32:
        return "Cold"
    elif temp_f <= 60:
        return "Cool"
    elif temp_f <= 80:
        return "Warm"
    else:
        return "Hot"

df['Temperature Category'] = df['Temperature (F)'].apply(categorize_temperature)

# Final Output: Print the cleaned DataFrame
print(df)
```

```text
Coordinates not found for North Carolina
Coordinates not found for North Dakota
Coordinates not found for South Carolina
```


Note: The geocoding API returned no results for North Carolina, North Dakota, and South Carolina, so the code printed "Coordinates not found" for each of them. One possible cause is that the direct geocoding endpoint is designed to look up place names such as cities, so a bare state name does not always match an entry; incomplete location data in the API's database is another possibility. Without coordinates we cannot request weather or air quality data for these states, and we lack the geographic identifiers needed to join them reliably with data from other sources. These three states are therefore excluded from our analysis.
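To keep the exclusion explicit rather than implicit, the states that dropped out can be listed by comparing the full `states` mapping with the states that actually reached the cleaned DataFrame. The sketch below uses a hypothetical helper, `report_excluded_states` (our own name, not part of pandas or the API), and runs on a small hand-made sample instead of live API results.

```python
import pandas as pd


def report_excluded_states(all_states, df):
    """Return the sorted state names present in `all_states` but missing from df["State"].

    `all_states` is any iterable of state names (for example, the keys of a
    name -> abbreviation dict); `df` is a DataFrame with a "State" column.
    """
    retrieved = set(df["State"])
    return sorted(set(all_states) - retrieved)


# Hypothetical sample: four states, two of which have no retrieved row
sample_states = {"Alabama": "AL", "North Carolina": "NC", "Ohio": "OH", "South Carolina": "SC"}
sample_df = pd.DataFrame({"State": ["Alabama", "Ohio"], "Temperature (F)": [71.6, 50.0]})

excluded = report_excluded_states(sample_states, sample_df)
print(f"Excluded {len(excluded)} of {len(sample_states)} states: {', '.join(excluded)}")
# prints: Excluded 2 of 4 states: North Carolina, South Carolina
```

In the notebook itself, `report_excluded_states(states, df)` would list every state missing from the final table, covering both failed geocoding and any state whose weather or air quality response came back incomplete.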

### Ethical Implications of Wrangling the Weather API Data

In the process of data wrangling for this project, several transformations were performed to standardize and clean the data, including temperature conversion, categorical grouping, handling of missing values, and mapping numerical air quality indices to descriptive labels. Because the data consists of public weather and air quality information, it's generally compliant with data usage regulations, provided we adhere to the API's terms of service and do not attempt to identify individuals. One risk created by these transformations is the potential misrepresentation of the states missing coordinates (North Carolina, North Dakota, and South Carolina): excluding them could skew results or limit the geographic scope of the analysis. Assumptions were also made in the temperature categories (the 32°F, 60°F, and 80°F cut-offs) and in labeling missing air quality values "Unavailable", and these choices may affect interpretation. The data was sourced from the OpenWeatherMap API, a reputable provider of weather and geolocation data, which supports its credibility, and it was acquired ethically through a public API with proper attribution. To mitigate ethical concerns, particularly around the missing data, we explicitly note the exclusions and their potential impact on our findings, and we refrain from drawing conclusions about the excluded regions.
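The temperature cut-offs are one of the assumptions that may affect interpretation. One way to keep them visible and easy to revisit is to declare the bin edges and labels as data and apply them with `pandas.cut`. This is a sketch of an alternative to the notebook's `categorize_temperature` function, not the method used above; `TEMP_BINS_F`, `TEMP_LABELS`, and `categorize_temperatures` are hypothetical names. With `right=True` each bin includes its upper edge, which reproduces the original rules (at or below 32 is Cold, up to 60 is Cool, up to 80 is Warm, above 80 is Hot).

```python
import math

import pandas as pd

# Explicit, documented thresholds (degrees Fahrenheit) behind the temperature categories.
# Each bin is (lower, upper], so 32 -> "Cold", 60 -> "Cool", 80 -> "Warm".
TEMP_BINS_F = [-math.inf, 32, 60, 80, math.inf]
TEMP_LABELS = ["Cold", "Cool", "Warm", "Hot"]


def categorize_temperatures(temps_f: pd.Series) -> pd.Series:
    """Bin Fahrenheit temperatures into TEMP_LABELS; missing values stay missing."""
    return pd.cut(temps_f, bins=TEMP_BINS_F, labels=TEMP_LABELS, right=True)


# Sample temperatures, including the boundary values and one missing reading
temps = pd.Series([10.0, 32.0, 45.5, 60.0, 75.0, 80.0, 95.0, float("nan")])
categories = categorize_temperatures(temps)
print(categories.tolist())
```

One behavioral difference is worth noting: a missing temperature stays missing with `pd.cut`, whereas the `if/elif` chain in `categorize_temperature` falls through to "Hot" for `NaN`, so the binned version keeps missing data distinguishable from genuinely hot readings.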