# First Sprint (10/27 deadline):
## Issues:
1. Find Data and Note Source

2. Read in Data

3. Convert Data to a Usable Format

4. Build Ways to Handle Errors

5. Get General Info from Data

## Issue #1: Find Data and Note Source
Data set #1: 2024 Louisville Daily Max and Min Temperatures

Data: 4147807.csv

Source: https://www.ncdc.noaa.gov/cdo-web/search

Data set #2: 2024 Louisville Daily Ozone Readings

Data: ad_viz_plotval_data.csv

Source: https://www.epa.gov/outdoor-air-quality-data/download-daily-data

## Issue #2: Read in Data

In [273]:
import pandas as pd 

Reading in temperature data set:

In [274]:
temp_df = pd.read_csv("4147807.csv")
temp_df.head()

Unnamed: 0,STATION,NAME,DATE,PRCP,SNOW,SNWD,TMAX,TMIN,TOBS
0,USC00154958,"LOUISVILLE WEATHER FORECAST OFFICE, KY US",2024-01-01,0.0,0.0,0.0,37,31,35
1,USC00154958,"LOUISVILLE WEATHER FORECAST OFFICE, KY US",2024-01-02,0.0,0.0,0.0,40,29,29
2,USC00154958,"LOUISVILLE WEATHER FORECAST OFFICE, KY US",2024-01-03,0.0,0.0,0.0,42,23,35
3,USC00154958,"LOUISVILLE WEATHER FORECAST OFFICE, KY US",2024-01-04,0.0,0.0,0.0,41,27,27
4,USC00154958,"LOUISVILLE WEATHER FORECAST OFFICE, KY US",2024-01-05,0.0,0.0,0.0,44,22,39


Reading in ozone data set:

In [275]:
ozone_df = pd.read_csv("ad_viz_plotval_data.csv")
ozone_df.head()

Unnamed: 0,Date,Source,Site ID,POC,Daily Max 8-hour Ozone Concentration,Units,Daily AQI Value,Local Site Name,Daily Obs Count,Percent Complete,...,AQS Parameter Description,Method Code,CBSA Code,CBSA Name,State FIPS Code,State,County FIPS Code,County,Site Latitude,Site Longitude
0,01/01/2024,AQS,180190008,1,0.018,ppm,17,Charlestown State Park- 1051.8 meters East of ...,17,100.0,...,Ozone,47,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,19,Clark,38.393822,-85.664118
1,01/02/2024,AQS,180190008,1,0.022,ppm,20,Charlestown State Park- 1051.8 meters East of ...,17,100.0,...,Ozone,47,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,19,Clark,38.393822,-85.664118
2,01/03/2024,AQS,180190008,1,0.024,ppm,22,Charlestown State Park- 1051.8 meters East of ...,17,100.0,...,Ozone,47,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,19,Clark,38.393822,-85.664118
3,01/04/2024,AQS,180190008,1,0.025,ppm,23,Charlestown State Park- 1051.8 meters East of ...,17,100.0,...,Ozone,47,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,19,Clark,38.393822,-85.664118
4,01/05/2024,AQS,180190008,1,0.024,ppm,22,Charlestown State Park- 1051.8 meters East of ...,17,100.0,...,Ozone,47,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,19,Clark,38.393822,-85.664118
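
Issue #4 in the sprint list calls for ways to handle errors, and the two `read_csv` calls above are a natural place to start. The sketch below is one possible approach, not the notebook's actual method: `load_csv` is a hypothetical helper, and the in-memory sample only stands in for `4147807.csv` so the example runs without the real file.

```python
import io

import pandas as pd


def load_csv(source, required_columns):
    """Read a CSV and confirm the expected columns are present (hypothetical helper)."""
    try:
        df = pd.read_csv(source)
    except FileNotFoundError:
        raise FileNotFoundError(f"Could not find data file: {source}") from None
    except pd.errors.EmptyDataError:
        raise ValueError(f"Data file is empty: {source}") from None

    # Fail early if a column the later steps rely on is missing
    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


# Small in-memory sample standing in for 4147807.csv (illustrative only)
sample = io.StringIO(
    "STATION,NAME,DATE,TMAX,TMIN\n"
    'USC00154958,"LOUISVILLE WEATHER FORECAST OFFICE, KY US",2024-01-01,37,31\n'
)
sample_temp_df = load_csv(sample, ["DATE", "TMAX", "TMIN"])
print(sample_temp_df.shape)
```

With the real files, the same helper could be called as `load_csv("4147807.csv", ["DATE", "TMAX", "TMIN"])`.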


## Issue #3: Convert Data to a Usable Format

### Temperature data:

Checking for null values:

In [276]:
temp_df.isnull().sum()


STATION    0
NAME       0
DATE       0
PRCP       0
SNOW       0
SNWD       0
TMAX       0
TMIN       0
TOBS       0
dtype: int64

In [277]:
temp_df.isnull().values.any()

np.False_

Converting date from object to datetime:

In [278]:
temp_df['DATE'] = pd.to_datetime(temp_df['DATE'])


In [279]:
temp_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 366 entries, 0 to 365
Data columns (total 9 columns):
 #   Column   Non-Null Count  Dtype         
---  ------   --------------  -----         
 0   STATION  366 non-null    object        
 1   NAME     366 non-null    object        
 2   DATE     366 non-null    datetime64[ns]
 3   PRCP     366 non-null    float64       
 4   SNOW     366 non-null    float64       
 5   SNWD     366 non-null    float64       
 6   TMAX     366 non-null    int64         
 7   TMIN     366 non-null    int64         
 8   TOBS     366 non-null    int64         
dtypes: datetime64[ns](1), float64(3), int64(3), object(2)
memory usage: 25.9+ KB
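
The conversion above works because every `DATE` value in this file parses cleanly. If a future download contained a malformed date, `pd.to_datetime` would raise an error by default. A minimal sketch of a more defensive conversion, assuming the `YYYY-MM-DD` format seen in the output above (the sample values here are made up for illustration):

```python
import pandas as pd

# Illustrative values only; the last entry is deliberately malformed
raw_dates = pd.Series(["2024-01-01", "2024-01-02", "not a date"])

# errors="coerce" turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(raw_dates, format="%Y-%m-%d", errors="coerce")

# List the original strings that failed to parse so they can be inspected
bad_rows = raw_dates[parsed.isna()]
print(bad_rows.tolist())  # ['not a date']
```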


Renaming columns and removing unneeded columns:

In [280]:
temp_df.columns

Index(['STATION', 'NAME', 'DATE', 'PRCP', 'SNOW', 'SNWD', 'TMAX', 'TMIN',
       'TOBS'],
      dtype='object')

In [281]:
temp_df = temp_df.rename(columns = {'TMAX': 'Max_Temp', 'TMIN': 'Min_Temp', 'NAME': 'Station_Name', 'DATE': 'Date'})


In [282]:
temp_df.drop(['STATION', 'PRCP', 'SNOW', 'SNWD', 'TOBS'], axis=1, inplace=True)

In [283]:
temp_df.head()

Unnamed: 0,Station_Name,Date,Max_Temp,Min_Temp
0,"LOUISVILLE WEATHER FORECAST OFFICE, KY US",2024-01-01,37,31
1,"LOUISVILLE WEATHER FORECAST OFFICE, KY US",2024-01-02,40,29
2,"LOUISVILLE WEATHER FORECAST OFFICE, KY US",2024-01-03,42,23
3,"LOUISVILLE WEATHER FORECAST OFFICE, KY US",2024-01-04,41,27
4,"LOUISVILLE WEATHER FORECAST OFFICE, KY US",2024-01-05,44,22
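
The rename and drop steps above could also be written as a single expression that lists the columns to keep rather than the columns to drop. This is only an alternative sketch; the one-row frame below is a stand-in for `temp_df`, and `column_names` is a name introduced here for illustration.

```python
import pandas as pd

# One-row stand-in for temp_df (a subset of the original columns)
sample = pd.DataFrame({
    "STATION": ["USC00154958"],
    "NAME": ["LOUISVILLE WEATHER FORECAST OFFICE, KY US"],
    "DATE": pd.to_datetime(["2024-01-01"]),
    "PRCP": [0.0],
    "TMAX": [37],
    "TMIN": [31],
})

# Map original names to the new names; anything not listed is dropped
column_names = {
    "NAME": "Station_Name",
    "DATE": "Date",
    "TMAX": "Max_Temp",
    "TMIN": "Min_Temp",
}

# Rename first, then select only the renamed columns
cleaned = sample.rename(columns=column_names)[list(column_names.values())]
print(cleaned.columns.tolist())  # ['Station_Name', 'Date', 'Max_Temp', 'Min_Temp']
```

Selecting the columns to keep means a new column appearing in a later download is ignored instead of silently carried along.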


### Ozone data:

Checking for null values:

In [284]:
ozone_df.isnull().sum()

Date                                      0
Source                                    0
Site ID                                   0
POC                                       0
Daily Max 8-hour Ozone Concentration      0
Units                                     0
Daily AQI Value                           0
Local Site Name                         341
Daily Obs Count                           0
Percent Complete                          0
AQS Parameter Code                        0
AQS Parameter Description                 0
Method Code                               0
CBSA Code                                 0
CBSA Name                                 0
State FIPS Code                           0
State                                     0
County FIPS Code                          0
County                                    0
Site Latitude                             0
Site Longitude                            0
dtype: int64

In [285]:
ozone_df[ozone_df['Local Site Name'].isnull()]

Unnamed: 0,Date,Source,Site ID,POC,Daily Max 8-hour Ozone Concentration,Units,Daily AQI Value,Local Site Name,Daily Obs Count,Percent Complete,...,AQS Parameter Description,Method Code,CBSA Code,CBSA Name,State FIPS Code,State,County FIPS Code,County,Site Latitude,Site Longitude
344,01/01/2024,AQS,180430008,1,0.017,ppm,16,,17,100.0,...,Ozone,47,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,43,Floyd,38.317813,-85.833322
345,01/03/2024,AQS,180430008,1,0.024,ppm,22,,17,100.0,...,Ozone,47,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,43,Floyd,38.317813,-85.833322
346,01/04/2024,AQS,180430008,1,0.022,ppm,20,,17,100.0,...,Ozone,47,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,43,Floyd,38.317813,-85.833322
347,01/05/2024,AQS,180430008,1,0.016,ppm,15,,17,100.0,...,Ozone,47,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,43,Floyd,38.317813,-85.833322
348,01/06/2024,AQS,180430008,1,0.019,ppm,18,,17,100.0,...,Ozone,47,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,43,Floyd,38.317813,-85.833322
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
680,12/26/2024,AQS,180430008,1,0.022,ppm,20,,17,100.0,...,Ozone,87,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,43,Floyd,38.317813,-85.833322
681,12/27/2024,AQS,180430008,1,0.025,ppm,23,,17,100.0,...,Ozone,87,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,43,Floyd,38.317813,-85.833322
682,12/28/2024,AQS,180430008,1,0.018,ppm,17,,17,100.0,...,Ozone,87,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,43,Floyd,38.317813,-85.833322
683,12/29/2024,AQS,180430008,1,0.041,ppm,38,,17,100.0,...,Ozone,87,31140,"Louisville/Jefferson County, KY-IN",18,Indiana,43,Floyd,38.317813,-85.833322


In [286]:
ozone_df['Local Site Name'] = ozone_df['Local Site Name'].fillna('Unknown')

In [287]:
ozone_df.isnull().sum()

Date                                    0
Source                                  0
Site ID                                 0
POC                                     0
Daily Max 8-hour Ozone Concentration    0
Units                                   0
Daily AQI Value                         0
Local Site Name                         0
Daily Obs Count                         0
Percent Complete                        0
AQS Parameter Code                      0
AQS Parameter Description               0
Method Code                             0
CBSA Code                               0
CBSA Name                               0
State FIPS Code                         0
State                                   0
County FIPS Code                        0
County                                  0
Site Latitude                           0
Site Longitude                          0
dtype: int64
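
In the rows displayed above, the missing site names all belong to Site ID 180430008 in Floyd County. A quick grouping can confirm whether that holds for every null row before filling with 'Unknown'. The sketch below uses a three-row stand-in for `ozone_df`; the site name "Example Site" is a placeholder, not a value from the data.

```python
import pandas as pd

# Three-row stand-in for ozone_df; "Example Site" is a placeholder name
sample = pd.DataFrame({
    "Site ID": [180190008, 180430008, 180430008],
    "Local Site Name": ["Example Site", None, None],
    "County": ["Clark", "Floyd", "Floyd"],
})

# Count the rows with a missing site name, grouped by site and county
missing = sample[sample["Local Site Name"].isnull()]
missing_by_site = missing.groupby(["Site ID", "County"]).size()
print(missing_by_site)

# Fill the gaps the same way the notebook does
filled = sample.copy()
filled["Local Site Name"] = filled["Local Site Name"].fillna("Unknown")
```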

Converting 'Date' column from object to datetime:

In [288]:
ozone_df['Date'] = pd.to_datetime(ozone_df['Date'])

In [289]:
ozone_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2243 entries, 0 to 2242
Data columns (total 21 columns):
 #   Column                                Non-Null Count  Dtype         
---  ------                                --------------  -----         
 0   Date                                  2243 non-null   datetime64[ns]
 1   Source                                2243 non-null   object        
 2   Site ID                               2243 non-null   int64         
 3   POC                                   2243 non-null   int64         
 4   Daily Max 8-hour Ozone Concentration  2243 non-null   float64       
 5   Units                                 2243 non-null   object        
 6   Daily AQI Value                       2243 non-null   int64         
 7   Local Site Name                       2243 non-null   object        
 8   Daily Obs Count                       2243 non-null   int64         
 9   Percent Complete                      2243 non-null   float64       
 10  

Renaming columns and removing unneeded columns:

In [290]:
ozone_df.columns = ozone_df.columns.str.replace(' ', '_')

In [291]:
ozone_df.rename(columns = {'Daily_Max_8-hour_Ozone_Concentration': 'Max_Concentration', 'Daily_AQI_Value': 'AQI_Value', 'AQS_Parameter_Description': 'Substance_Measured'}, inplace=True)

In [292]:
ozone_df.drop(['Source', 'POC', 'Method_Code', 'CBSA_Code', 'CBSA_Name', 'State_FIPS_Code', 'State', 'County_FIPS_Code', 'AQS_Parameter_Code', 'Percent_Complete', 'Site_ID', 'Site_Latitude', 'Site_Longitude'], axis=1, inplace=True)

In [293]:
ozone_df.head()

Unnamed: 0,Date,Max_Concentration,Units,AQI_Value,Local_Site_Name,Daily_Obs_Count,Substance_Measured,County
0,2024-01-01,0.018,ppm,17,Charlestown State Park- 1051.8 meters East of ...,17,Ozone,Clark
1,2024-01-02,0.022,ppm,20,Charlestown State Park- 1051.8 meters East of ...,17,Ozone,Clark
2,2024-01-03,0.024,ppm,22,Charlestown State Park- 1051.8 meters East of ...,17,Ozone,Clark
3,2024-01-04,0.025,ppm,23,Charlestown State Park- 1051.8 meters East of ...,17,Ozone,Clark
4,2024-01-05,0.024,ppm,22,Charlestown State Park- 1051.8 meters East of ...,17,Ozone,Clark


Not all sites are in Louisville Metro, so rows from counties other than Jefferson are removed:

In [294]:
ozone_df['County'].value_counts()

County
Jefferson    1077
Clark         344
Floyd         341
Bullitt       241
Oldham        240
Name: count, dtype: int64

In [295]:
ozone_df = ozone_df[ozone_df['County'] == 'Jefferson']

In [296]:
ozone_df

Unnamed: 0,Date,Max_Concentration,Units,AQI_Value,Local_Site_Name,Daily_Obs_Count,Substance_Measured,County
926,2024-03-01,0.023,ppm,21,Watson Lane,17,Ozone,Jefferson
927,2024-03-02,0.024,ppm,22,Watson Lane,17,Ozone,Jefferson
928,2024-03-03,0.034,ppm,31,Watson Lane,17,Ozone,Jefferson
929,2024-03-04,0.032,ppm,30,Watson Lane,17,Ozone,Jefferson
930,2024-03-05,0.029,ppm,27,Watson Lane,17,Ozone,Jefferson
...,...,...,...,...,...,...,...,...
1998,2024-10-26,0.039,ppm,36,Algonquin Parkway,17,Ozone,Jefferson
1999,2024-10-27,0.040,ppm,37,Algonquin Parkway,17,Ozone,Jefferson
2000,2024-10-28,0.036,ppm,33,Algonquin Parkway,17,Ozone,Jefferson
2001,2024-10-29,0.046,ppm,43,Algonquin Parkway,17,Ozone,Jefferson


Checking datatypes:

In [297]:
ozone_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1077 entries, 926 to 2002
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   Date                1077 non-null   datetime64[ns]
 1   Max_Concentration   1077 non-null   float64       
 2   Units               1077 non-null   object        
 3   AQI_Value           1077 non-null   int64         
 4   Local_Site_Name     1077 non-null   object        
 5   Daily_Obs_Count     1077 non-null   int64         
 6   Substance_Measured  1077 non-null   object        
 7   County              1077 non-null   object        
dtypes: datetime64[ns](1), float64(1), int64(2), object(4)
memory usage: 75.7+ KB


In [298]:
ozone_df['County'].value_counts()

County
Jefferson    1077
Name: count, dtype: int64
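
The filtered frame keeps its original index labels (926 to 2002), and depending on the pandas version, adding columns to a filtered subset can trigger a `SettingWithCopyWarning`. One optional way to sidestep both is to copy the subset and reset its index. The sketch below uses made-up AQI values, and the `Alert` column is hypothetical.

```python
import pandas as pd

# Four-row stand-in for ozone_df; AQI values are illustrative
sample = pd.DataFrame({
    "County": ["Clark", "Jefferson", "Floyd", "Jefferson"],
    "AQI_Value": [17, 21, 16, 105],
})

# .copy() makes the subset independent; reset_index renumbers rows from 0
jefferson = sample[sample["County"] == "Jefferson"].copy().reset_index(drop=True)

# Adding a column now modifies only the independent copy
jefferson["Alert"] = jefferson["AQI_Value"] >= 101
print(jefferson)
```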

Note about ozone levels: AQI = Air Quality Index.
AQI ranges: 0-50 good, 51-100 moderate, 101-150 unhealthy for sensitive groups, 151-200 unhealthy.

Alerts are issued when the AQI is expected to reach 101 or more.
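
The AQI ranges in the note can be turned into a category column with `pd.cut`. This is a sketch under the ranges listed in the note only; the sample values, the `bins` and `labels` names, and the `alert` flag are introduced here for illustration.

```python
import pandas as pd

# Illustrative AQI values chosen to sit on each range boundary
aqi = pd.Series([17, 50, 51, 100, 101, 150, 151, 200])

# Right-inclusive bins: (-1, 50], (50, 100], (100, 150], (150, 200]
bins = [-1, 50, 100, 150, 200]
labels = ["Good", "Moderate", "Unhealthy for Sensitive Groups", "Unhealthy"]

# Values above 200 fall outside the ranges in the note and would become NaN
categories = pd.cut(aqi, bins=bins, labels=labels)

# Flag days at or above the alert threshold of 101
alert = aqi >= 101
print(pd.DataFrame({"AQI": aqi, "Category": categories, "Alert": alert}))
```

Applied to the cleaned data, the same call could take `ozone_df['AQI_Value']` in place of the sample series.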