# Project Outline: Weather Data Cleaning
This notebook imports, cleans, and merges the weather data for the project "To Everything There is a Season: Using Weather Data and Demographic Information in the Predictive Modeling of Crimes in Dallas, Texas" by Ashley Steele.

[1. Importing Libraries and Cleaning Function](#1.-Importing-Libraries-and-Cleaning-Function)


[2. Exploratory Data Analysis (EDA) and Cleaning by Year](#2.-Exploratory-Data-Analysis-(EDA)-and-Cleaning-by-Year)
- [2.1: 2015 Weather Data](#2.1:-2015-Weather-Data)
- [2.2: 2016 Weather Data](#2.2:-2016-Weather-Data)
- [2.3: 2017 Weather Data](#2.3:-2017-Weather-Data)
- [2.4: 2018 Weather Data](#2.4:-2018-Weather-Data)


[3. Combining Weather Data](#3.-Combining-Weather-Data)


[4. Final Export of Weather Data to CSV](#4.-Final-Export-of-Weather-Data-to-CSV)

## 1. Importing Libraries and Cleaning Function
[Return to Outline](#Project-Outline:-Weather-Data-Cleaning)

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
%matplotlib inline

In [10]:
# Importing our important datetime stuff!
from datetime import datetime
from dateutil.parser import parse

In [11]:
# Setting the number of decimal places this notebook uses
pd.set_option('display.precision', 2)

In [12]:
# Creating a reusable function to help us clean up all column names (renames in place)
def clean_columns(name):
    old_column_names= list(name.columns)
    new_column_names= [str(x).lower().replace(' ', '_').replace(';', '_').replace('-', '').replace('__', '_') for x in list(name.columns)]
    return name.rename(columns= dict(zip(old_column_names, new_column_names)), inplace = True)
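
As a quick illustration of what this helper does, here is a minimal sketch on a made-up two-row frame (the `demo` frame is hypothetical; its column names mimic the scraper's headers). Because `rename(..., inplace=True)` returns `None`, the function is called for its side effect rather than for its return value.

```python
import pandas as pd

def clean_columns(name):
    # Same cleaning rules as the notebook's helper; renames the frame in place
    old_column_names = list(name.columns)
    new_column_names = [str(x).lower().replace(' ', '_').replace(';', '_').replace('-', '').replace('__', '_')
                        for x in old_column_names]
    name.rename(columns=dict(zip(old_column_names, new_column_names)), inplace=True)

# Hypothetical frame with scraper-style headers
demo = pd.DataFrame({'Unnamed: 0': [0, 1], 'TemperatureF': [34.0, 34.0], 'HourlyPrecipIn': [0.0, 0.0]})
clean_columns(demo)
cleaned = list(demo.columns)
print(cleaned)
```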

[Return to Outline](#Project-Outline:-Weather-Data-Cleaning)

## 2. Exploratory Data Analysis (EDA) and Cleaning by Year
[Return to Outline](#Project-Outline:-Weather-Data-Cleaning)

For this project we needed daily weather information for the years 2015 to 2018 to see what influence, if any, weather has on our crime calls. To get it, we scraped the readings available on Weather Underground using [this script](https://github.com/SteeleAlloy/final_capstone/blob/master/Weather_scraping_script.ipynb). Let's see what we can find in this data!

**To help us keep our data organized before we merge, here are the naming conventions for the working dfs by year:**
- df1= 2015
- df2= 2016
- df3= 2017
- df4= 2018

### 2.1: 2015 Weather Data
[Return to Outline](#Project-Outline:-Weather-Data-Cleaning)

In [13]:
# Importing the csv so we can use it
df1= pd.read_csv('2015_weather.csv')

In [14]:
# What does this data look like right out of the scraper?
df1.head()

Unnamed: 0.1,Unnamed: 0,Time,TemperatureF,DewpointF,PressureIn,WindDirection,WindDirectionDegrees,WindSpeedMPH,WindSpeedGustMPH,Humidity,HourlyPrecipIn,Conditions,Clouds,dailyrainin,SoftwareType,DateUTC,station
0,0,2015-01-01 00:01:00,34.0,25.6,30.32,SSE,157,0.0,3.2,71,0.0,,,0.0,VISReader 3.5.9.91,2015-01-01 06:01:00,KTXIRVIN10
1,1,2015-01-01 00:06:00,34.0,25.6,30.32,NW,315,3.2,4.2,71,0.0,,,0.0,VISReader 3.5.9.91,2015-01-01 06:06:00,KTXIRVIN10
2,2,2015-01-01 00:11:00,34.0,25.6,30.32,West,270,3.7,4.2,71,0.0,,,0.0,VISReader 3.5.9.91,2015-01-01 06:11:00,KTXIRVIN10
3,3,2015-01-01 00:16:00,34.0,25.9,30.33,North,360,1.1,4.2,72,0.0,,,0.0,VISReader 3.5.9.91,2015-01-01 06:16:00,KTXIRVIN10
4,4,2015-01-01 00:21:00,34.0,25.9,30.33,North,360,2.7,4.2,72,0.0,,,0.0,VISReader 3.5.9.91,2015-01-01 06:21:00,KTXIRVIN10


In [15]:
# Let's clean up those columns!
clean_columns(df1)

In [16]:
# Sanity check: do our columns look beautiful?
df1.head()

Unnamed: 0,unnamed:_0,time,temperaturef,dewpointf,pressurein,winddirection,winddirectiondegrees,windspeedmph,windspeedgustmph,humidity,hourlyprecipin,conditions,clouds,dailyrainin,softwaretype,dateutc,station
0,0,2015-01-01 00:01:00,34.0,25.6,30.32,SSE,157,0.0,3.2,71,0.0,,,0.0,VISReader 3.5.9.91,2015-01-01 06:01:00,KTXIRVIN10
1,1,2015-01-01 00:06:00,34.0,25.6,30.32,NW,315,3.2,4.2,71,0.0,,,0.0,VISReader 3.5.9.91,2015-01-01 06:06:00,KTXIRVIN10
2,2,2015-01-01 00:11:00,34.0,25.6,30.32,West,270,3.7,4.2,71,0.0,,,0.0,VISReader 3.5.9.91,2015-01-01 06:11:00,KTXIRVIN10
3,3,2015-01-01 00:16:00,34.0,25.9,30.33,North,360,1.1,4.2,72,0.0,,,0.0,VISReader 3.5.9.91,2015-01-01 06:16:00,KTXIRVIN10
4,4,2015-01-01 00:21:00,34.0,25.9,30.33,North,360,2.7,4.2,72,0.0,,,0.0,VISReader 3.5.9.91,2015-01-01 06:21:00,KTXIRVIN10


In [17]:
# What columns does this dataset have?
df1.columns

Index(['unnamed:_0', 'time', 'temperaturef', 'dewpointf', 'pressurein',
       'winddirection', 'winddirectiondegrees', 'windspeedmph',
       'windspeedgustmph', 'humidity', 'hourlyprecipin', 'conditions',
       'clouds', 'dailyrainin', 'softwaretype', 'dateutc', 'station'],
      dtype='object')

In [18]:
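# Let's whittle this df down to the columns we want (time, temp, humidity, hourly precipitation)
# .copy() gives us an independent frame, so the later in-place edits don't raise SettingWithCopyWarning
df1= df1[['time', 'temperaturef','humidity', 'hourlyprecipin']].copy()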

In [19]:
# Creating a variable of columns to keep for later use
keep_cols= ['time', 'temperaturef','humidity', 'hourlyprecipin']

In [20]:
# Doing the same for the column renaming
cols_renammed= {'temperaturef': 'temp_in_F', 'hourlyprecipin':'percip_inches'}

In [21]:
# Sanity check: does everything look good?
df1.head()

Unnamed: 0,time,temperaturef,humidity,hourlyprecipin
0,2015-01-01 00:01:00,34.0,71,0.0
1,2015-01-01 00:06:00,34.0,71,0.0
2,2015-01-01 00:11:00,34.0,71,0.0
3,2015-01-01 00:16:00,34.0,72,0.0
4,2015-01-01 00:21:00,34.0,72,0.0


In [22]:
# Renaming columns to make easier to understand
df1.rename(columns= {'temperaturef': 'temp_in_F', 'hourlyprecipin':'percip_inches'}, inplace = True)

In [23]:
# Double checking for any nulls and funny stuff before going on
df1.isnull().sum()

time             0
temp_in_F        0
humidity         0
percip_inches    0
dtype: int64

In [24]:
# What do these variables look like, dtype wise?
df1.dtypes

time              object
temp_in_F        float64
humidity           int64
percip_inches    float64
dtype: object

In [25]:
# Converting time to a date time so it can be split
df1['time']= pd.to_datetime(df1['time'])

In [26]:
# Sanity check: Did the conversion actually work?
df1['time'].head()

0   2015-01-01 00:01:00
1   2015-01-01 00:06:00
2   2015-01-01 00:11:00
3   2015-01-01 00:16:00
4   2015-01-01 00:21:00
Name: time, dtype: datetime64[ns]

In [27]:
# Creating new columns for year, month, and day
df1['year']= df1['time'].dt.year
df1['month'] = df1['time'].dt.month
df1['day'] = df1['time'].dt.day

In [28]:
# Visual check of the first 300 rows of data
df1.head(300)

Unnamed: 0,time,temp_in_F,humidity,percip_inches,year,month,day
0,2015-01-01 00:01:00,34.0,71,0.00,2015,1,1
1,2015-01-01 00:06:00,34.0,71,0.00,2015,1,1
2,2015-01-01 00:11:00,34.0,71,0.00,2015,1,1
3,2015-01-01 00:16:00,34.0,72,0.00,2015,1,1
4,2015-01-01 00:21:00,34.0,72,0.00,2015,1,1
5,2015-01-01 00:26:00,34.0,71,0.00,2015,1,1
6,2015-01-01 00:31:00,34.0,71,0.00,2015,1,1
7,2015-01-01 00:36:00,34.0,70,0.00,2015,1,1
8,2015-01-01 00:43:00,34.0,70,0.00,2015,1,1
9,2015-01-01 00:48:00,34.0,73,0.00,2015,1,1


In [29]:
# Creating a date feature to make grouping easier
df1['date']= df1['time'].dt.date

In [30]:
# Grouping by date and averaging (numeric_only skips the datetime 'time' column)
df1_final = df1.groupby('date').mean(numeric_only=True)

In [31]:
# Sanity check: did it work?
df1_final.head()

date,temp_in_F,humidity,percip_inches,year,month,day
2015-01-01,34.35,89.47,0.03,2015.0,1.0,1.0
2015-01-02,38.73,96.56,0.02,2015.0,1.0,2.0
2015-01-03,43.9,85.07,0.02,2015.0,1.0,3.0
2015-01-04,34.73,67.04,0.0,2015.0,1.0,4.0
2015-01-05,35.69,60.75,0.0,2015.0,1.0,5.0


In [32]:
# Last minute check for weird temp outliers!
df1_final.loc[df1_final['temp_in_F']< 0]

date,temp_in_F,humidity,percip_inches,year,month,day


Excellent! Moving on to the next time frame!
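
Since the same steps repeat for each year, the 2015 workflow above could be gathered into one helper. The following is only a sketch: `daily_averages` is a hypothetical name that does not appear in this notebook, and the three-row `sample` frame is made up to keep the example self-contained.

```python
import pandas as pd

def daily_averages(raw):
    """Collapse sub-hourly readings into one averaged row per calendar day."""
    df = raw[['time', 'temperaturef', 'humidity', 'hourlyprecipin']].copy()
    df = df.rename(columns={'temperaturef': 'temp_in_F', 'hourlyprecipin': 'percip_inches'})
    df['time'] = pd.to_datetime(df['time'])
    df['year'] = df['time'].dt.year
    df['month'] = df['time'].dt.month
    df['day'] = df['time'].dt.day
    df['date'] = df['time'].dt.date
    # numeric_only=True leaves the datetime 'time' column out of the average
    return df.groupby('date').mean(numeric_only=True)

# Made-up readings: two on January 1st, one on January 2nd
sample = pd.DataFrame({
    'time': ['2015-01-01 00:01:00', '2015-01-01 00:06:00', '2015-01-02 00:01:00'],
    'temperaturef': [34.0, 36.0, 40.0],
    'humidity': [71, 73, 80],
    'hourlyprecipin': [0.0, 0.0, 0.1],
})
daily = daily_averages(sample)
print(daily)
```

The sections below keep the step-by-step cells so that each year's data can be inspected along the way.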

### 2.2: 2016 Weather Data
[Return to Outline](#Project-Outline:-Weather-Data-Cleaning)

In [33]:
# Creating df2 and importing data
df2= pd.read_csv('2016_weather.csv')

In [34]:
# What does our data look like at import?
df2.head()

Unnamed: 0.1,Unnamed: 0,Time,TemperatureF,DewpointF,PressureIn,WindDirection,WindDirectionDegrees,WindSpeedMPH,WindSpeedGustMPH,Humidity,HourlyPrecipIn,Conditions,Clouds,dailyrainin,SoftwareType,DateUTC,station
0,0,2016-01-01 00:03:00,43.0,35.3,30.47,NNE,22,6.3,8.3,74,0.0,,,0.0,VISReader 3.6.2.3,2016-01-01 06:03:00,KTXIRVIN10
1,1,2016-01-01 00:08:00,43.0,35.3,30.47,NNE,22,3.2,8.3,74,0.0,,,0.0,VISReader 3.6.2.3,2016-01-01 06:08:00,KTXIRVIN10
2,2,2016-01-01 00:14:00,42.8,35.1,30.47,NNE,22,4.2,7.3,74,0.0,,,0.0,VISReader 3.6.2.3,2016-01-01 06:14:00,KTXIRVIN10
3,3,2016-01-01 00:19:00,42.8,35.1,30.47,NNW,337,6.3,7.8,74,0.0,,,0.0,VISReader 3.6.2.3,2016-01-01 06:19:00,KTXIRVIN10
4,4,2016-01-01 00:24:00,42.8,35.1,30.47,North,360,3.2,7.8,74,0.0,,,0.0,VISReader 3.6.2.3,2016-01-01 06:24:00,KTXIRVIN10


In [35]:
# Cleaning up column names/standardizing
clean_columns(df2)

In [36]:
# Sanity check: did the cleaning work?
df2.head()

Unnamed: 0,unnamed:_0,time,temperaturef,dewpointf,pressurein,winddirection,winddirectiondegrees,windspeedmph,windspeedgustmph,humidity,hourlyprecipin,conditions,clouds,dailyrainin,softwaretype,dateutc,station
0,0,2016-01-01 00:03:00,43.0,35.3,30.47,NNE,22,6.3,8.3,74,0.0,,,0.0,VISReader 3.6.2.3,2016-01-01 06:03:00,KTXIRVIN10
1,1,2016-01-01 00:08:00,43.0,35.3,30.47,NNE,22,3.2,8.3,74,0.0,,,0.0,VISReader 3.6.2.3,2016-01-01 06:08:00,KTXIRVIN10
2,2,2016-01-01 00:14:00,42.8,35.1,30.47,NNE,22,4.2,7.3,74,0.0,,,0.0,VISReader 3.6.2.3,2016-01-01 06:14:00,KTXIRVIN10
3,3,2016-01-01 00:19:00,42.8,35.1,30.47,NNW,337,6.3,7.8,74,0.0,,,0.0,VISReader 3.6.2.3,2016-01-01 06:19:00,KTXIRVIN10
4,4,2016-01-01 00:24:00,42.8,35.1,30.47,North,360,3.2,7.8,74,0.0,,,0.0,VISReader 3.6.2.3,2016-01-01 06:24:00,KTXIRVIN10


In [37]:
# Keeping columns we need (.copy() avoids SettingWithCopyWarning on the later in-place edits)
df2= df2[keep_cols].copy()

In [38]:
# Did our columns drop?
df2.head()

Unnamed: 0,time,temperaturef,humidity,hourlyprecipin
0,2016-01-01 00:03:00,43.0,74,0.0
1,2016-01-01 00:08:00,43.0,74,0.0
2,2016-01-01 00:14:00,42.8,74,0.0
3,2016-01-01 00:19:00,42.8,74,0.0
4,2016-01-01 00:24:00,42.8,74,0.0


In [39]:
# Doing standard renaming
df2.rename(columns= cols_renammed, inplace = True)

In [40]:
# Converting time variable to datetime
df2['time']= pd.to_datetime(df2['time'])

In [41]:
# Making new year, month, and day columns
df2['year']= df2['time'].dt.year
df2['month']= df2['time'].dt.month
df2['day']= df2['time'].dt.day

In [42]:
# Checking out our data so far
df2.head()

Unnamed: 0,time,temp_in_F,humidity,percip_inches,year,month,day
0,2016-01-01 00:03:00,43.0,74,0.0,2016,1,1
1,2016-01-01 00:08:00,43.0,74,0.0,2016,1,1
2,2016-01-01 00:14:00,42.8,74,0.0,2016,1,1
3,2016-01-01 00:19:00,42.8,74,0.0,2016,1,1
4,2016-01-01 00:24:00,42.8,74,0.0,2016,1,1


In [43]:
# Creating a date feature to make grouping by easier!
df2['date'] = df2['time'].dt.date

In [44]:
# Grouping by date and averaging (numeric_only skips the datetime 'time' column)
df2_final = df2.groupby('date').mean(numeric_only=True)

In [45]:
# What does the result of this grouping look like?
df2_final.head()

date,temp_in_F,humidity,percip_inches,year,month,day
2016-01-01,42.58,67.71,0.0,2016.0,1.0,1.0
2016-01-02,45.1,56.95,0.0,2016.0,1.0,2.0
2016-01-03,45.84,64.74,0.0,2016.0,1.0,3.0
2016-01-04,42.85,69.03,0.0,2016.0,1.0,4.0
2016-01-05,40.41,72.44,0.0,2016.0,1.0,5.0


In [46]:
# Last minute outlier check!
df2_final.loc[df2_final['temp_in_F']< 0]

date,temp_in_F,humidity,percip_inches,year,month,day


Excellent! Moving on to the next time frame!

### 2.3: 2017 Weather Data
[Return to Outline](#Project-Outline:-Weather-Data-Cleaning)

In [47]:
# Creating df3 and importing file!
df3= pd.read_csv('2017_weather.csv')

In [48]:
# Cleaning column names
clean_columns(df3)

In [49]:
# Dropping columns we don't need (.copy() avoids SettingWithCopyWarning on the later in-place edits)
df3= df3[keep_cols].copy()

In [50]:
# Renaming columns we kept
df3.rename(columns= cols_renammed, inplace = True)

In [51]:
# Checking to make sure that everything up to this point worked out ok
df3.head()

Unnamed: 0,time,temp_in_F,humidity,percip_inches
0,2017-01-01 00:01:00,48.9,60,0.0
1,2017-01-01 00:06:00,48.9,60,0.0
2,2017-01-01 00:11:00,48.9,60,0.0
3,2017-01-01 00:16:00,48.7,61,0.0
4,2017-01-01 00:21:00,48.4,61,0.0


In [52]:
# Converting time variable to a datetime type
df3['time']= pd.to_datetime(df3['time'])

In [53]:
# Using this new datetime to make a year, month, and day column
df3['year']= df3['time'].dt.year
df3['month']= df3['time'].dt.month
df3['day']= df3['time'].dt.day

In [54]:
# What does this look like now?
df3.head()

Unnamed: 0,time,temp_in_F,humidity,percip_inches,year,month,day
0,2017-01-01 00:01:00,48.9,60,0.0,2017,1,1
1,2017-01-01 00:06:00,48.9,60,0.0,2017,1,1
2,2017-01-01 00:11:00,48.9,60,0.0,2017,1,1
3,2017-01-01 00:16:00,48.7,61,0.0,2017,1,1
4,2017-01-01 00:21:00,48.4,61,0.0,2017,1,1


In [55]:
# Creating a date feature to make grouping easier!
df3['date'] = df3['time'].dt.date

In [56]:
# Grouping by date to get the daily average (numeric_only skips the datetime 'time' column)
df3_final = df3.groupby('date').mean(numeric_only=True)

In [57]:
# What does our final dataset look like now?
df3_final.head()

date,temp_in_F,humidity,percip_inches,year,month,day
2017-01-01,53.93,82.89,0.00258,2017.0,1.0,1.0
2017-01-02,62.35,75.07,0.0199,2017.0,1.0,2.0
2017-01-03,48.51,68.21,0.0,2017.0,1.0,3.0
2017-01-04,35.31,59.4,0.0,2017.0,1.0,4.0
2017-01-05,34.18,71.59,0.0,2017.0,1.0,5.0


In [58]:
# Last minute outlier check!
df3_final.loc[df3_final['temp_in_F']< 0]

date,temp_in_F,humidity,percip_inches,year,month,day


Excellent! Moving on to the next time frame!

### 2.4: 2018 Weather Data
[Return to Outline](#Project-Outline:-Weather-Data-Cleaning)

In [59]:
# Creating df4 and reading in weather csv!
df4= pd.read_csv('2018_weather.csv')

In [60]:
# What do our data look like after import?
df4.head()

Unnamed: 0.1,Unnamed: 0,Clouds,Conditions,DateUTC,DewpointF,HourlyPrecipIn,Humidity,PressureIn,SoftwareType,SolarRadiationWatts/m^2,TemperatureF,Time,WindDirection,WindDirectionDegrees,WindSpeedGustMPH,WindSpeedMPH,dailyrainin,station
0,0,,,2018-01-01 06:04:00,7.2,0.0,52.0,30.68,VISReader 3.7.9.2,,22.1,2018-01-01 00:04:00,North,360.0,10.9,2.7,0.0,KTXIRVIN10
1,1,,,2018-01-01 06:09:00,7.2,0.0,52.0,30.68,VISReader 3.7.9.2,,22.1,2018-01-01 00:09:00,ESE,112.0,8.9,3.2,0.0,KTXIRVIN10
2,2,,,2018-01-01 06:10:00,7.6,0.0,53.0,30.68,VISReader 3.7.9.2,,22.1,2018-01-01 00:10:00,NNE,22.0,8.9,4.2,0.0,KTXIRVIN10
3,3,,,2018-01-01 06:15:00,7.4,0.0,53.0,30.69,VISReader 3.7.9.2,,21.9,2018-01-01 00:15:00,NW,315.0,6.8,3.2,0.0,KTXIRVIN10
4,4,,,2018-01-01 06:20:00,8.2,0.0,55.0,30.69,VISReader 3.7.9.2,,21.9,2018-01-01 00:20:00,North,360.0,7.8,5.3,0.0,KTXIRVIN10


In [61]:
# Cleaning up column names
clean_columns(df4)

Ah! It seems that more fields were available in 2018, so this file has an extra column and a different column order than the earlier years. Let's account for that now.

In [62]:
# What are the columns we collected for this year?
df4.columns

Index(['unnamed:_0', 'clouds', 'conditions', 'dateutc', 'dewpointf',
       'hourlyprecipin', 'humidity', 'pressurein', 'softwaretype',
       'solarradiationwatts/m^2', 'temperaturef', 'time', 'winddirection',
       'winddirectiondegrees', 'windspeedgustmph', 'windspeedmph',
       'dailyrainin', 'station'],
      dtype='object')
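
The differing column set shown above can also be spotted programmatically with a set difference. This is a minimal sketch: the `earlier` and `later` frames are hypothetical stand-ins for two yearly frames after `clean_columns`, with shortened column lists.

```python
import pandas as pd

# Hypothetical stand-ins for two yearly frames after clean_columns()
earlier = pd.DataFrame(columns=['time', 'temperaturef', 'humidity', 'hourlyprecipin'])
later = pd.DataFrame(columns=['time', 'temperaturef', 'humidity', 'hourlyprecipin',
                              'solarradiationwatts/m^2'])

# Columns present in one frame but not the other
only_in_later = sorted(set(later.columns) - set(earlier.columns))
only_in_earlier = sorted(set(earlier.columns) - set(later.columns))
print(only_in_later)
print(only_in_earlier)
```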

In [63]:
# Keeping only columns we need (.copy() avoids SettingWithCopyWarning on the in-place rename)
df4= df4[['dateutc', 'temperaturef','hourlyprecipin', 'humidity']].copy()

In [64]:
# Renaming kept columns
df4.rename(columns= {'dateutc':'time', 'temperaturef':'temp_in_F','hourlyprecipin':'percip_inches'}, inplace = True)
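
Note that this rename takes the 2018 timestamps from `dateutc`, whereas the 2015 to 2017 frames use the local `time` column; in the 2018 rows shown earlier the two differ by six hours. If local time were wanted instead, the conversion could look roughly like the sketch below. It assumes the station's local zone is `America/Chicago`, which is an assumption on my part; the data itself only shows the offset.

```python
import pandas as pd

# Two UTC timestamps copied from the 2018 preview
utc_times = pd.Series(pd.to_datetime(['2018-01-01 06:04:00', '2018-01-01 06:09:00']))

local_times = (
    utc_times.dt.tz_localize('UTC')        # mark the naive timestamps as UTC
    .dt.tz_convert('America/Chicago')      # assumed local zone for the station
    .dt.tz_localize(None)                  # drop the tz info to match the other years
)
print(local_times)
```

Using the file's own local `time` column, as the earlier years do, would be the simpler route.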

In [65]:
# Reordering our columns (.copy() avoids SettingWithCopyWarning when we add columns below)
df4 = df4[['time', 'temp_in_F','percip_inches', 'humidity']].copy()

In [66]:
# Did it all work?
df4.head()

Unnamed: 0,time,temp_in_F,percip_inches,humidity
0,2018-01-01 06:04:00,22.1,0.0,52.0
1,2018-01-01 06:09:00,22.1,0.0,52.0
2,2018-01-01 06:10:00,22.1,0.0,53.0
3,2018-01-01 06:15:00,21.9,0.0,53.0
4,2018-01-01 06:20:00,21.9,0.0,55.0


In [67]:
# Converting time variable into datetime
df4['time']= pd.to_datetime(df4['time'])

In [68]:
# Creating year, month, and day columns using new time
df4['year'] = df4['time'].dt.year
df4['month'] = df4['time'].dt.month
df4['day'] = df4['time'].dt.day

In [69]:
# Double checking that all of the commands above worked!
df4.head()

Unnamed: 0,time,temp_in_F,percip_inches,humidity,year,month,day
0,2018-01-01 06:04:00,22.1,0.0,52.0,2018,1,1
1,2018-01-01 06:09:00,22.1,0.0,52.0,2018,1,1
2,2018-01-01 06:10:00,22.1,0.0,53.0,2018,1,1
3,2018-01-01 06:15:00,21.9,0.0,53.0,2018,1,1
4,2018-01-01 06:20:00,21.9,0.0,55.0,2018,1,1


In [70]:
# Creating a date feature to make grouping by easier
df4['date'] = df4['time'].dt.date

Within this time frame, several individual readings reported odd values, such as a temperature of -999. These account for only one or two of the reports on the days in question (readings in our original dataset arrive roughly every five to six minutes). Let's go ahead and drop these erroneous readings so they don't distort our daily averages!

In [71]:
# Dropping the weird error temperatures
df4= df4.loc[df4['temp_in_F']>0]
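
The filter above also discards the humidity and precipitation values on those rows, and it would drop any genuine reading at or below 0°F. An alternative, sketched here on made-up readings (the `readings` frame is hypothetical), is to mark the -999 sentinel as missing so that `mean()` simply skips it:

```python
import numpy as np
import pandas as pd

# Made-up readings for one day, including one -999 sentinel
readings = pd.DataFrame({
    'date': ['2018-02-01', '2018-02-01', '2018-02-01'],
    'temp_in_F': [41.0, -999.0, 43.0],
})

# Treat the sentinel as missing instead of dropping the whole row
readings['temp_in_F'] = readings['temp_in_F'].replace(-999, np.nan)

# mean() ignores NaN by default, so only the valid readings are averaged
daily_mean = readings.groupby('date')['temp_in_F'].mean()
print(daily_mean)
```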

In [72]:
# Grouping by day and averaging (numeric_only skips the datetime 'time' column)
df4_final = df4.groupby('date').mean(numeric_only=True)

In [73]:
# Checking out the final layout of our grouped data
df4_final.head()

date,temp_in_F,percip_inches,humidity,year,month,day
2018-01-01,22.41,0.0,55.36,2018,1,1
2018-01-02,24.72,0.0,52.45,2018,1,2
2018-01-03,29.8,0.0,53.61,2018,1,3
2018-01-04,35.89,0.0,61.28,2018,1,4
2018-01-05,42.94,0.0,54.08,2018,1,5


In [74]:
# Last minute outlier check!
df4_final.loc[df4_final['temp_in_F']< 0]

date,temp_in_F,percip_inches,humidity,year,month,day


We now finally have all of our time frames aggregated and organized! Let's go ahead and combine them to make one csv!

## 3. Combining Weather Data
[Return to Outline](#Project-Outline:-Weather-Data-Cleaning)

In [75]:
# Let's add all of our final dfs together in one place!
weather_final = pd.concat([df1_final, df2_final, df3_final, df4_final], sort= False)
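
After a concatenation like this, it can be worth confirming that no date appears twice and that no calendar day is missing. The sketch below uses two small made-up frames (`part_a` and `part_b` are hypothetical stand-ins for the yearly `_final` frames), with one day deliberately left out.

```python
import pandas as pd

# Hypothetical stand-ins for two yearly daily-average frames, indexed by date objects
part_a = pd.DataFrame({'temp_in_F': [34.0, 38.0]},
                      index=pd.to_datetime(['2015-12-30', '2015-12-31']).date)
part_b = pd.DataFrame({'temp_in_F': [42.0, 45.0]},
                      index=pd.to_datetime(['2016-01-01', '2016-01-03']).date)
combined = pd.concat([part_a, part_b], sort=False)

# Any date that shows up more than once?
has_duplicates = combined.index.duplicated().any()

# Any calendar day between the first and last date with no row?
expected_days = pd.date_range(min(combined.index), max(combined.index), freq='D')
missing_days = expected_days.difference(pd.to_datetime(combined.index))
print(has_duplicates, list(missing_days.date))
```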

In [76]:
# Sanity check: How does it look?
weather_final

date,temp_in_F,humidity,percip_inches,year,month,day
2015-01-01,34.35,89.47,3.25e-02,2015.0,1.0,1.0
2015-01-02,38.73,96.56,2.14e-02,2015.0,1.0,2.0
2015-01-03,43.90,85.07,1.91e-02,2015.0,1.0,3.0
2015-01-04,34.73,67.04,0.00e+00,2015.0,1.0,4.0
2015-01-05,35.69,60.75,0.00e+00,2015.0,1.0,5.0
2015-01-06,44.74,66.53,0.00e+00,2015.0,1.0,6.0
2015-01-07,33.85,52.47,0.00e+00,2015.0,1.0,7.0
2015-01-08,28.98,48.42,0.00e+00,2015.0,1.0,8.0
2015-01-09,34.05,51.09,0.00e+00,2015.0,1.0,9.0
2015-01-10,34.25,57.01,1.27e-03,2015.0,1.0,10.0


In [77]:
# Let's double check that all of our temps in Fahrenheit are reasonable and not weird errors
weather_final.loc[weather_final['temp_in_F']< 0]

date,temp_in_F,humidity,percip_inches,year,month,day


Woo hoo!! All that we have left is to export this file so we can combine it with all of our other data!

## 4. Final Export of Weather Data to CSV
[Return to Outline](#Project-Outline:-Weather-Data-Cleaning)

In [78]:
# Exporting the final product!
weather_final.to_csv('weather_final.csv')
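
When the exported file is loaded again later, the `date` index comes back as plain text unless it is parsed on read. The following sketch shows one way to round-trip it; it writes to an in-memory buffer instead of `weather_final.csv` so it can run on its own, and the two-row `daily` frame reuses values from the preview above.

```python
import io
import pandas as pd

# Two rows taken from the combined preview, indexed by date objects
daily = pd.DataFrame(
    {'temp_in_F': [34.35, 38.73], 'humidity': [89.47, 96.56]},
    index=pd.Index(pd.to_datetime(['2015-01-01', '2015-01-02']).date, name='date'),
)

# In-memory stand-in for the CSV file on disk
buffer = io.StringIO()
daily.to_csv(buffer)
buffer.seek(0)

# index_col restores the index; parse_dates turns it back into datetimes
restored = pd.read_csv(buffer, index_col='date', parse_dates=['date'])
print(restored.index.dtype)
```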

Want to know what I did with this data? Check out my project page [here](https://steelealloy.github.io/final_capstone/)