In [142]:
import sys
import pandas as pd

print(sys.version)
print("")
print(f"Pandas: version {pd.__version__}")

3.8.3 (default, Jul  2 2020, 16:21:59) 
[GCC 7.3.0]

Pandas: version 1.1.1


In [143]:
new_york = pd.read_csv("../data/weather/new_york_ny.csv")

print(new_york.shape)
new_york.head()

(17056, 25)


Unnamed: 0,date_time,maxtempC,mintempC,totalSnow_cm,sunHour,uvIndex,moon_illumination,moonrise,moonset,sunrise,...,WindGustKmph,cloudcover,humidity,precipMM,pressure,tempC,visibility,winddirDegree,windspeedKmph,location
0,2009-01-01 00:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,38,22,62,0.0,1017,-6,10,316,27,10007
1,2009-01-01 06:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,33,7,63,0.0,1023,-8,10,315,24,10007
2,2009-01-01 12:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,26,4,42,0.0,1025,-3,10,304,23,10007
3,2009-01-01 18:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,21,16,49,0.0,1025,-5,10,294,13,10007
4,2009-01-02 00:00:00,0,0,0.1,7.0,2,38,11:28 AM,11:54 PM,08:20 AM,...,14,46,49,0.0,1023,-6,10,260,8,10007


In [144]:
new_york.columns

Index(['date_time', 'maxtempC', 'mintempC', 'totalSnow_cm', 'sunHour',
       'uvIndex', 'moon_illumination', 'moonrise', 'moonset', 'sunrise',
       'sunset', 'DewPointC', 'FeelsLikeC', 'HeatIndexC', 'WindChillC',
       'WindGustKmph', 'cloudcover', 'humidity', 'precipMM', 'pressure',
       'tempC', 'visibility', 'winddirDegree', 'windspeedKmph', 'location'],
      dtype='object')

### Planning

#### Pre-modeling Steps

**Accounting for Seasonal Variations**


- Create dynamic datetime splits separating the four seasons and taking the year as an input,
- Use these datetime splits to subset New York City's data by year and season into smaller DataFrames,
- Identify max, min, avg, and med of the following seasonal weather metrics (both metric and imperial because **AMERICA**):
    - tempC
    - FeelsLikeC, see above
    - totalSnow_cm  **<--**  if None, final app will return a verbal message saying it does not snow there
    - humidity
    
- Store the above data in a single DataFrame and create visualizations for the annual progression of each


**NOTE:** Data will begin on the first day of spring 2009 and be cut off on the last day of summer 2020 to avoid potentially erroneous statistics arising from partial seasons. Go back and make this adjustment to the *weather.py* file.


**Applying the above process to all of the data**


- Encapsulate the above process in a function or functions,
- Devise means of applying the above function(s) to all csv files in the root data/weather directory,
- Organize and store yearly averages in a by_year DataFrame,
- Create visualizations showing how the average seasonal weather of each compares with the average of all,
- Encapsulate the above visualization process into a function for easy use.
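
One way to apply the process to every city file is sketched below. It is a minimal sketch, not the project's actual loader: the function name `load_weather_dir` is hypothetical, and it assumes every CSV in the directory has a `date_time` column like the New York file.

```python
from pathlib import Path

import pandas as pd


def load_weather_dir(data_dir: str) -> dict:
    """Read every CSV in data_dir into a dict keyed by file stem (hypothetical helper)."""
    frames = {}
    for path in sorted(Path(data_dir).glob("*.csv")):
        # Assumes each file has a date_time column; parse it while reading.
        frames[path.stem] = pd.read_csv(path, parse_dates=["date_time"])
    return frames
```

Calling `load_weather_dir("../data/weather")` would then return one DataFrame per city, for example under the key `"new_york_ny"`, ready to be passed to the per-city summary functions.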


**Testing in this notebook that the process will work with FastAPI**


- Replicate Ryan Herr's example notebook material for testing FastAPI locally in this notebook.
- Test that the above visualization-making functions work locally.

In [145]:
new_york.dtypes

date_time             object
maxtempC               int64
mintempC               int64
totalSnow_cm         float64
sunHour              float64
uvIndex                int64
moon_illumination      int64
moonrise              object
moonset               object
sunrise               object
sunset                object
DewPointC              int64
FeelsLikeC             int64
HeatIndexC             int64
WindChillC             int64
WindGustKmph           int64
cloudcover             int64
humidity               int64
precipMM             float64
pressure               int64
tempC                  int64
visibility             int64
winddirDegree          int64
windspeedKmph          int64
location               int64
dtype: object

In [146]:
new_york.date_time

0        2009-01-01 00:00:00
1        2009-01-01 06:00:00
2        2009-01-01 12:00:00
3        2009-01-01 18:00:00
4        2009-01-02 00:00:00
                ...         
17051    2020-09-02 18:00:00
17052    2020-09-03 00:00:00
17053    2020-09-03 06:00:00
17054    2020-09-03 12:00:00
17055    2020-09-03 18:00:00
Name: date_time, Length: 17056, dtype: object

In [147]:
# Checking date_time format

date = new_york.date_time[0]
print(type(date), date)

<class 'str'> 2009-01-01 00:00:00


In [148]:
# converting datetime from string to datetime object

from datetime import datetime

date = datetime.strptime(date, '%Y-%m-%d %H:%M:%S')

print(type(date), date)

<class 'datetime.datetime'> 2009-01-01 00:00:00


In [149]:
# Applying change to all dates

new_york.date_time = new_york.date_time.apply(lambda d: datetime.strptime(d, '%Y-%m-%d %H:%M:%S'))
new_york.date_time.head()

0   2009-01-01 00:00:00
1   2009-01-01 06:00:00
2   2009-01-01 12:00:00
3   2009-01-01 18:00:00
4   2009-01-02 00:00:00
Name: date_time, dtype: datetime64[ns]
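
The same conversion can be done in one vectorized call. A minimal sketch using `pd.to_datetime` with an explicit format; the three sample strings are copied from the output above and stand in for the full column.

```python
import pandas as pd

# Small stand-in for the date_time column.
raw = pd.Series([
    "2009-01-01 00:00:00",
    "2009-01-01 06:00:00",
    "2009-01-01 12:00:00",
])

# An explicit format keeps parsing strict and avoids a Python call per row.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S")
print(parsed.dtype)
```

On the real data this would read `new_york["date_time"] = pd.to_datetime(new_york["date_time"], format="%Y-%m-%d %H:%M:%S")`.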

### Meteorological Seasons

**Spring:** March 1 - May 31

**Summer:** June 1 - August 31

**Fall:** September 1 - November 30

**Winter:** December 1 - February 28 (February 29 in leap years)
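
The planned "dynamic datetime splits" taking the year as input could look something like the sketch below. It assumes the standard meteorological definition, in which winter ends on the last day of February, and it labels winter by the year in which it starts. The function name `season_bounds` and the half-open `[start, end)` convention are choices made for this illustration.

```python
from datetime import datetime


def season_bounds(year: int) -> dict:
    """Return half-open [start, end) bounds for each meteorological season.

    Winter is labeled by the year in which it starts (December), so it
    runs into the following calendar year.
    """
    return {
        "Spring": (datetime(year, 3, 1), datetime(year, 6, 1)),
        "Summer": (datetime(year, 6, 1), datetime(year, 9, 1)),
        "Fall": (datetime(year, 9, 1), datetime(year, 12, 1)),
        "Winter": (datetime(year, 12, 1), datetime(year + 1, 3, 1)),
    }


bounds = season_bounds(2009)
print(bounds["Winter"])
```

Using an exclusive end of March 1 sidesteps leap years: a row belongs to a season when `start <= date_time < end`, so February 29 is covered without special handling.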

In [150]:
splits = []

for i in range(2009, 2021):
    for j in range(1, 13):
        splits.append(datetime(i, j, 1))
    
print(splits)

[datetime.datetime(2009, 1, 1, 0, 0), datetime.datetime(2009, 2, 1, 0, 0), datetime.datetime(2009, 3, 1, 0, 0), datetime.datetime(2009, 4, 1, 0, 0), datetime.datetime(2009, 5, 1, 0, 0), datetime.datetime(2009, 6, 1, 0, 0), datetime.datetime(2009, 7, 1, 0, 0), datetime.datetime(2009, 8, 1, 0, 0), datetime.datetime(2009, 9, 1, 0, 0), datetime.datetime(2009, 10, 1, 0, 0), datetime.datetime(2009, 11, 1, 0, 0), datetime.datetime(2009, 12, 1, 0, 0), datetime.datetime(2010, 1, 1, 0, 0), datetime.datetime(2010, 2, 1, 0, 0), datetime.datetime(2010, 3, 1, 0, 0), datetime.datetime(2010, 4, 1, 0, 0), datetime.datetime(2010, 5, 1, 0, 0), datetime.datetime(2010, 6, 1, 0, 0), datetime.datetime(2010, 7, 1, 0, 0), datetime.datetime(2010, 8, 1, 0, 0), datetime.datetime(2010, 9, 1, 0, 0), datetime.datetime(2010, 10, 1, 0, 0), datetime.datetime(2010, 11, 1, 0, 0), datetime.datetime(2010, 12, 1, 0, 0), datetime.datetime(2011, 1, 1, 0, 0), datetime.datetime(2011, 2, 1, 0, 0), datetime.datetime(2011, 3, 1, 0

In [151]:
splits = splits[1:]
len(splits)

143

In [161]:
# Testing split:)

jan_2009 = new_york[new_york.date_time < splits[0]]

print(jan_2009.shape)
jan_2009

(124, 32)


Unnamed: 0,date_time,maxtempC,mintempC,totalSnow_cm,sunHour,uvIndex,moon_illumination,moonrise,moonset,sunrise,...,winddirDegree,windspeedKmph,location,maxtempF,mintempF,DewPointF,FeelsLikeF,HeatIndexF,WindChillF,tempF
0,2009-01-01 00:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,316,27,10007,32.0,32.0,10.4,6.8,21.2,6.8,21.2
1,2009-01-01 06:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,315,24,10007,32.0,32.0,6.8,3.2,17.6,3.2,17.6
2,2009-01-01 12:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,304,23,10007,32.0,32.0,6.8,15.8,26.6,15.8,26.6
3,2009-01-01 18:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,294,13,10007,32.0,32.0,8.6,14.0,23.0,14.0,23.0
4,2009-01-02 00:00:00,0,0,0.1,7.0,2,38,11:28 AM,11:54 PM,08:20 AM,...,260,8,10007,32.0,32.0,6.8,17.6,23.0,17.6,21.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
119,2009-01-30 18:00:00,0,0,0.1,10.0,2,27,09:56 AM,10:51 PM,08:08 AM,...,236,9,10007,32.0,32.0,26.6,24.8,30.2,24.8,30.2
120,2009-01-31 00:00:00,0,0,0.0,10.0,2,31,10:19 AM,11:58 PM,08:07 AM,...,288,18,10007,32.0,32.0,17.6,14.0,24.8,14.0,24.8
121,2009-01-31 06:00:00,0,0,0.0,10.0,2,31,10:19 AM,11:58 PM,08:07 AM,...,291,18,10007,32.0,32.0,14.0,8.6,21.2,8.6,21.2
122,2009-01-31 12:00:00,0,0,0.0,10.0,2,31,10:19 AM,11:58 PM,08:07 AM,...,291,24,10007,32.0,32.0,10.4,15.8,26.6,15.8,26.6


In [162]:
desc = jan_2009.tempC.describe()[1:] 
desc

mean    -2.435484
std      3.965793
min    -14.000000
25%     -5.000000
50%     -3.000000
75%      1.000000
max      7.000000
Name: tempC, dtype: float64

In [154]:
# to_fahr function

def to_fahr(temp: float, system: str = "celsius") -> float:
    """
    Converts a temperature in Celsius or Kelvin to Fahrenheit.
    """
    # Raise the specific exception types directly rather than wrapping them in Exception
    if not isinstance(system, str):
        raise TypeError(f'Invalid system type {type(system)}, expected {str}')
    elif system == "celsius":
        return ((temp * 9) / 5) + 32
    elif system == "kelvin":
        return ((temp * 9) / 5) - 459.67
    else:
        raise ValueError(f'Invalid system parameter "{system}"')

In [163]:
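# Applying to_fahr function to jan_2009 temp. stats
# in order to ensure that the function works properly:)
# NOTE: the std row is not a valid conversion; a standard deviation
# scales by 9/5 only (3.965793 * 9/5 = 7.138427), without the +32 offset.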

fahr = desc.apply(lambda temp: to_fahr(temp))
fahr

mean    27.616129
std     39.138427
min      6.800000
25%     23.000000
50%     26.600000
75%     33.800000
max     44.600000
Name: tempC, dtype: float64

In [166]:
# Applying to_fahr function to the complete dataset

new_york["maxtempF"] = new_york["maxtempC"].apply(lambda temp: to_fahr(temp))
new_york["mintempF"] = new_york["mintempC"].apply(lambda temp: to_fahr(temp))
new_york["DewPointF"] = new_york["DewPointC"].apply(lambda temp: to_fahr(temp))
new_york["FeelsLikeF"] = new_york["FeelsLikeC"].apply(lambda temp: to_fahr(temp))
new_york["HeatIndexF"] = new_york["HeatIndexC"].apply(lambda temp: to_fahr(temp))
new_york["WindChillF"] = new_york["WindChillC"].apply(lambda temp: to_fahr(temp))
new_york["tempF"] = new_york["tempC"].apply(lambda temp: to_fahr(temp))

print(new_york.shape)
new_york.head()

(17056, 32)


Unnamed: 0,date_time,maxtempC,mintempC,totalSnow_cm,sunHour,uvIndex,moon_illumination,moonrise,moonset,sunrise,...,winddirDegree,windspeedKmph,location,maxtempF,mintempF,DewPointF,FeelsLikeF,HeatIndexF,WindChillF,tempF
0,2009-01-01 00:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,316,27,10007,32.0,32.0,10.4,6.8,21.2,6.8,21.2
1,2009-01-01 06:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,315,24,10007,32.0,32.0,6.8,3.2,17.6,3.2,17.6
2,2009-01-01 12:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,304,23,10007,32.0,32.0,6.8,15.8,26.6,15.8,26.6
3,2009-01-01 18:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,294,13,10007,32.0,32.0,8.6,14.0,23.0,14.0,23.0
4,2009-01-02 00:00:00,0,0,0.1,7.0,2,38,11:28 AM,11:54 PM,08:20 AM,...,260,8,10007,32.0,32.0,6.8,17.6,23.0,17.6,21.2


In [167]:
new_york.columns

Index(['date_time', 'maxtempC', 'mintempC', 'totalSnow_cm', 'sunHour',
       'uvIndex', 'moon_illumination', 'moonrise', 'moonset', 'sunrise',
       'sunset', 'DewPointC', 'FeelsLikeC', 'HeatIndexC', 'WindChillC',
       'WindGustKmph', 'cloudcover', 'humidity', 'precipMM', 'pressure',
       'tempC', 'visibility', 'winddirDegree', 'windspeedKmph', 'location',
       'maxtempF', 'mintempF', 'DewPointF', 'FeelsLikeF', 'HeatIndexF',
       'WindChillF', 'tempF'],
      dtype='object')
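
The seven conversions in the cell above all follow one pattern, so they can be collapsed into a loop over column names using vectorized arithmetic. A minimal sketch: the helper name `add_fahrenheit_columns` is hypothetical, it assumes every listed column name ends in `C`, and the demo values are illustrative.

```python
import pandas as pd


def add_fahrenheit_columns(df: pd.DataFrame, celsius_cols: list) -> pd.DataFrame:
    """Return a copy of df with a Fahrenheit column for each Celsius column listed."""
    out = df.copy()
    for col in celsius_cols:
        # Assumes each name ends in "C" (e.g. tempC -> tempF).
        out[col[:-1] + "F"] = out[col] * 9 / 5 + 32
    return out


demo = pd.DataFrame({"tempC": [-6, -8, -3], "FeelsLikeC": [-14, -16, -9]})
demo = add_fahrenheit_columns(demo, ["tempC", "FeelsLikeC"])
print(demo)
```

Operating on whole columns avoids calling a Python function once per row, which matters more once the same step runs over every city file.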

In [168]:
# Re-subsetting jan_2009 to account for the Fahrenheit columns

jan_2009 = new_york[new_york.date_time < splits[0]]

print(jan_2009.shape)
jan_2009

(124, 32)


Unnamed: 0,date_time,maxtempC,mintempC,totalSnow_cm,sunHour,uvIndex,moon_illumination,moonrise,moonset,sunrise,...,winddirDegree,windspeedKmph,location,maxtempF,mintempF,DewPointF,FeelsLikeF,HeatIndexF,WindChillF,tempF
0,2009-01-01 00:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,316,27,10007,32.0,32.0,10.4,6.8,21.2,6.8,21.2
1,2009-01-01 06:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,315,24,10007,32.0,32.0,6.8,3.2,17.6,3.2,17.6
2,2009-01-01 12:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,304,23,10007,32.0,32.0,6.8,15.8,26.6,15.8,26.6
3,2009-01-01 18:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,294,13,10007,32.0,32.0,8.6,14.0,23.0,14.0,23.0
4,2009-01-02 00:00:00,0,0,0.1,7.0,2,38,11:28 AM,11:54 PM,08:20 AM,...,260,8,10007,32.0,32.0,6.8,17.6,23.0,17.6,21.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
119,2009-01-30 18:00:00,0,0,0.1,10.0,2,27,09:56 AM,10:51 PM,08:08 AM,...,236,9,10007,32.0,32.0,26.6,24.8,30.2,24.8,30.2
120,2009-01-31 00:00:00,0,0,0.0,10.0,2,31,10:19 AM,11:58 PM,08:07 AM,...,288,18,10007,32.0,32.0,17.6,14.0,24.8,14.0,24.8
121,2009-01-31 06:00:00,0,0,0.0,10.0,2,31,10:19 AM,11:58 PM,08:07 AM,...,291,18,10007,32.0,32.0,14.0,8.6,21.2,8.6,21.2
122,2009-01-31 12:00:00,0,0,0.0,10.0,2,31,10:19 AM,11:58 PM,08:07 AM,...,291,24,10007,32.0,32.0,10.4,15.8,26.6,15.8,26.6


In [177]:
# Subsetting remaining data

subsets = []

subsets.append(jan_2009)

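# Loop through the splits list, pairing each boundary with the next one.
# NOTE: the strict > comparison drops the 00:00 row on the first of each
# month (see subsets[-1] below, which starts at 06:00); >= would keep it.
for j in range(len(splits) - 1):
    subset = new_york[new_york.date_time > splits[j]]
    subset = subset[subset.date_time < splits[j + 1]]
    subset = subset.reset_index(drop=True)
    subsets.append(subset)
    
# Keep January 2009 through August 2020 (140 months)
subsets = subsets[:140]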
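
The monthly subsets can also be built without explicit boundary comparisons by grouping on a monthly period label. This is a minimal sketch on an illustrative four-row frame (not the real data); the variable names are assumptions made for the example.

```python
import pandas as pd

# Illustrative 6-hourly rows spanning a month boundary.
demo = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2009-01-31 12:00:00", "2009-01-31 18:00:00",
        "2009-02-01 00:00:00", "2009-02-01 06:00:00",
    ]),
    "tempC": [-3, -5, -6, -8],
})

# Each row gets a monthly period label; the midnight row on the 1st lands in February.
month = demo["date_time"].dt.to_period("M")
monthly = {
    str(period): frame.reset_index(drop=True)
    for period, frame in demo.groupby(month)
}
print({key: len(frame) for key, frame in monthly.items()})
```

Because every row is assigned to exactly one period, no timestamp can fall between two subsets.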

In [178]:
print(len(subsets))
subsets[0]

140


Unnamed: 0,date_time,maxtempC,mintempC,totalSnow_cm,sunHour,uvIndex,moon_illumination,moonrise,moonset,sunrise,...,winddirDegree,windspeedKmph,location,maxtempF,mintempF,DewPointF,FeelsLikeF,HeatIndexF,WindChillF,tempF
0,2009-01-01 00:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,316,27,10007,32.0,32.0,10.4,6.8,21.2,6.8,21.2
1,2009-01-01 06:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,315,24,10007,32.0,32.0,6.8,3.2,17.6,3.2,17.6
2,2009-01-01 12:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,304,23,10007,32.0,32.0,6.8,15.8,26.6,15.8,26.6
3,2009-01-01 18:00:00,0,0,0.0,8.7,2,31,11:07 AM,10:50 PM,08:20 AM,...,294,13,10007,32.0,32.0,8.6,14.0,23.0,14.0,23.0
4,2009-01-02 00:00:00,0,0,0.1,7.0,2,38,11:28 AM,11:54 PM,08:20 AM,...,260,8,10007,32.0,32.0,6.8,17.6,23.0,17.6,21.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
119,2009-01-30 18:00:00,0,0,0.1,10.0,2,27,09:56 AM,10:51 PM,08:08 AM,...,236,9,10007,32.0,32.0,26.6,24.8,30.2,24.8,30.2
120,2009-01-31 00:00:00,0,0,0.0,10.0,2,31,10:19 AM,11:58 PM,08:07 AM,...,288,18,10007,32.0,32.0,17.6,14.0,24.8,14.0,24.8
121,2009-01-31 06:00:00,0,0,0.0,10.0,2,31,10:19 AM,11:58 PM,08:07 AM,...,291,18,10007,32.0,32.0,14.0,8.6,21.2,8.6,21.2
122,2009-01-31 12:00:00,0,0,0.0,10.0,2,31,10:19 AM,11:58 PM,08:07 AM,...,291,24,10007,32.0,32.0,10.4,15.8,26.6,15.8,26.6


In [179]:
subsets[-1]

Unnamed: 0,date_time,maxtempC,mintempC,totalSnow_cm,sunHour,uvIndex,moon_illumination,moonrise,moonset,sunrise,...,winddirDegree,windspeedKmph,location,maxtempF,mintempF,DewPointF,FeelsLikeF,HeatIndexF,WindChillF,tempF
0,2020-08-01 06:00:00,29,25,0.0,14.5,7,82,07:08 PM,03:31 AM,05:53 AM,...,228,9,10007,84.2,77.0,68.0,80.6,80.6,77.0,77.0
1,2020-08-01 12:00:00,29,25,0.0,14.5,7,82,07:08 PM,03:31 AM,05:53 AM,...,98,10,10007,84.2,77.0,66.2,86.0,86.0,82.4,82.4
2,2020-08-01 18:00:00,29,25,0.0,14.5,7,82,07:08 PM,03:31 AM,05:53 AM,...,155,20,10007,84.2,77.0,68.0,84.2,84.2,80.6,80.6
3,2020-08-02 00:00:00,31,25,0.0,10.2,7,90,07:56 PM,04:30 AM,05:54 AM,...,179,15,10007,87.8,77.0,68.0,80.6,80.6,77.0,77.0
4,2020-08-02 06:00:00,31,25,0.0,10.2,7,90,07:56 PM,04:30 AM,05:54 AM,...,163,17,10007,87.8,77.0,71.6,80.6,80.6,77.0,77.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
118,2020-08-30 18:00:00,26,20,0.0,11.6,6,79,06:35 PM,03:24 AM,06:21 AM,...,315,11,10007,78.8,68.0,59.0,80.6,80.6,78.8,78.8
119,2020-08-31 00:00:00,23,21,0.0,10.3,6,82,07:10 PM,04:27 AM,06:22 AM,...,251,12,10007,73.4,69.8,57.2,77.0,77.0,75.2,75.2
120,2020-08-31 06:00:00,23,21,0.0,10.3,6,82,07:10 PM,04:27 AM,06:22 AM,...,62,12,10007,73.4,69.8,55.4,69.8,75.2,69.8,69.8
121,2020-08-31 12:00:00,23,21,0.0,10.3,6,82,07:10 PM,04:27 AM,06:22 AM,...,116,17,10007,73.4,69.8,57.2,77.0,77.0,73.4,73.4


In [136]:
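temp = []

# NOTE: this cell was run earlier (In [136]) against a seasonal version of
# subsets whose elements were (season, year, DataFrame) tuples, not the
# monthly DataFrames built in In [177].
for sub in subsets: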
    tuple_ = (
        sub[0],
        sub[1],
        sub[2].tempC.describe()[1:],
        sub[2].tempF.describe()[1:]
    )
    
    temp.append(tuple_)
    
for table in temp:
    print(f"{table[0]} {table[1]}")
    print("------------------")
    print("Celsius")
    print(table[2])
    print("")
    print("Fahrenheit")
    print(table[3])
    print("")

Spring 2009
------------------
Celsius
mean     9.885559
std      6.725937
min    -12.000000
25%      6.000000
50%     10.000000
75%     15.000000
max     27.000000
Name: tempC, dtype: float64

Fahrenheit
mean    49.794005
std     12.106687
min     10.400000
25%     42.800000
50%     50.000000
75%     59.000000
max     80.600000
Name: tempF, dtype: float64

Summer 2009
------------------
Celsius
mean    21.316076
std      3.672872
min     11.000000
25%     19.000000
50%     21.000000
75%     24.000000
max     32.000000
Name: tempC, dtype: float64

Fahrenheit
mean    70.368937
std      6.611169
min     51.800000
25%     66.200000
50%     69.800000
75%     75.200000
max     89.600000
Name: tempF, dtype: float64

Fall 2009
------------------
Celsius
mean    13.366391
std      5.062665
min      1.000000
25%     10.000000
50%     13.000000
75%     17.500000
max     28.000000
Name: tempC, dtype: float64

Fahrenheit
mean    56.059504
std      9.112798
min     33.800000
25%     50.000000
50%  

In [141]:
Season = []
Year = []
avg_tempC = []
avg_tempF = []
min_tempC = []
min_tempF = []
max_tempC = []
max_tempF = []
med_tempC = []
med_tempF = []
std_C = []
std_F = []



for table in temp:
    Season.append(table[0])
    Year.append(table[1])
    avg_tempC.append(table[2]["mean"])
    avg_tempF.append(table[3]["mean"])
    min_tempC.append(table[2]["min"])
    min_tempF.append(table[3]["min"])
    max_tempC.append(table[2]["max"])
    max_tempF.append(table[3]["max"])
    med_tempC.append(table[2]["50%"])
    med_tempF.append(table[3]["50%"])
    std_C.append(table[2]["std"])
    std_F.append(table[3]["std"])
    

temp = {
    "Season" : Season,
    "Year" : Year,
    "Avg. Temp C" : avg_tempC,
    "Avg. Temp F" : avg_tempF,
    "Min Temp C" : min_tempC,
    "Min Temp F" : min_tempF,
    "Max Temp C" : max_tempC,
    "Max Temp F" : max_tempF,
    "Median Temp C" : med_tempC,
    "Median Temp F" : med_tempF,
    "Standard Deviation C" : std_C,
    "Standard Deviation F" : std_F
}

temp_df = pd.DataFrame(temp)

print(temp_df.shape)
temp_df

(47, 12)


Unnamed: 0,Season,Year,Avg. Temp C,Avg. Temp F,Min Temp C,Min Temp F,Max Temp C,Max Temp F,Median Temp C,Median Temp F,Standard Deviation C,Standard Deviation F
0,Spring,2009,9.885559,49.794005,-12.0,10.4,27.0,80.6,10.0,50.0,6.725937,12.106687
1,Summer,2009,21.316076,70.368937,11.0,51.8,32.0,89.6,21.0,69.8,3.672872,6.611169
2,Fall,2009,13.366391,56.059504,1.0,33.8,28.0,82.4,13.0,55.4,5.062665,9.112798
3,Winter,2009,0.142061,32.25571,-11.0,12.2,16.0,60.8,0.0,32.0,4.576401,8.237522
4,Spring,2010,11.746594,53.143869,-1.0,30.2,29.0,84.2,11.0,51.8,5.923334,10.662001
5,Summer,2010,23.520436,74.336785,10.0,50.0,36.0,96.8,24.0,75.2,3.859414,6.946944
6,Fall,2010,14.044077,57.279339,1.0,33.8,30.0,86.0,14.0,57.2,6.055748,10.900347
7,Winter,2010,-0.54039,31.027298,-15.0,5.0,15.0,59.0,-1.0,30.2,4.180015,7.524027
8,Spring,2011,9.640327,49.352589,-3.0,26.6,24.0,75.2,10.0,50.0,5.803363,10.446054
9,Summer,2011,22.517711,72.53188,14.0,57.2,31.0,87.8,23.0,73.4,2.9641,5.33538
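
A table of this shape can also be produced in one pass with `groupby` and `agg`. The sketch below is an assumption-laden alternative, not the code that produced the output above: the name `seasonal_summary` is hypothetical, January and February are counted toward the winter that began the previous December, and the demo rows are made up.

```python
import pandas as pd

SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def seasonal_summary(df: pd.DataFrame, col: str = "tempC") -> pd.DataFrame:
    """Summarize col by (Year, Season); Jan/Feb count toward the prior year's winter."""
    month = df["date_time"].dt.month
    keys = pd.DataFrame({
        "Year": df["date_time"].dt.year - month.isin([1, 2]).astype(int),
        "Season": month.map(SEASON_BY_MONTH),
    })
    stats = df[col].groupby([keys["Year"], keys["Season"]]).agg(
        ["mean", "min", "max", "median", "std"]
    )
    return stats.reset_index()


demo = pd.DataFrame({
    "date_time": pd.to_datetime(["2009-12-15", "2010-01-15", "2010-03-15", "2010-03-16"]),
    "tempC": [0, -2, 8, 10],
})
print(seasonal_summary(demo))
```

Running the same function with `col="tempF"` gives the Fahrenheit statistics, including a correctly scaled standard deviation, since it is computed from the converted column rather than converted after the fact.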


### Planning

**Visualizations and Modeling**

**NOTE:** To keep these notebooks readable and a manageable size, this notebook will henceforth be dedicated to temperature data. All other analyses will be performed in other notebooks.

- Go back and redo the splits, but by month and preserving the date-time stamps
- Re-subset and recalculate the statistics for each month
- Follow the steps outlined by *Jose Portilla* in the *General Forecasting Models* section of his *Python for Time Series Data Analysis* course on **Udemy**
    - Perform train-test split in order to predict 2 years of monthly highs and lows into the future.
    - Fit and forecast on the data shown using statsmodels
    - Compare predictions with test data
    - Forecast into the future:)
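
The train-test split step might be sketched as a chronological hold-out of the final 24 months. This is only an illustration of the split itself (no model is fitted): the helper name `split_last_months` is hypothetical, and the series holds placeholder values rather than real monthly highs.

```python
import pandas as pd


def split_last_months(series: pd.Series, test_months: int = 24):
    """Split a monthly series chronologically, holding out the final test_months points."""
    if len(series) <= test_months:
        raise ValueError("series is too short for the requested hold-out")
    return series.iloc[:-test_months], series.iloc[-test_months:]


# Illustrative monthly series: 140 placeholder values indexed from January 2009.
index = pd.period_range("2009-01", periods=140, freq="M")
monthly_high = pd.Series(range(140), index=index, name="maxtempC")

train, test = split_last_months(monthly_high)
print(len(train), len(test))
```

Splitting by position rather than at random keeps the test set strictly after the training set, which is what a forecasting comparison needs.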