What follows is recorded weather data for Sault Ste. Marie, Michigan, from January 1, 1942 through November 28, 2022.
The original data can be found [here](https://www.ncei.noaa.gov/access/past-weather/49783).

We'll be isolating just December 25 in this notebook and saving that to a CSV file for use in a supervised machine learning notebook. 

In [19]:
# Import the necessary modules
import pandas as pd
from datetime import datetime
import seaborn as sns


In [20]:
# Import all weather station files

# Create a list of CSV file names, one per weather station located in Chippewa County, Michigan.
# Weather stations just across the river in Ontario, Canada are not included in this dataset.
# Raw strings keep the Windows backslashes from being read as escape sequences.
allsnow = [r'..\DATA\Dunbar.csv', r'..\DATA\Kincheloe AFB.csv', r'..\DATA\Kinross.csv', r'..\DATA\Rudyard.csv', r'..\DATA\SSM 7-1SSW.csv', r'..\DATA\SSM.csv', r'..\DATA\SSM2-1E.csv', r'..\DATA\SSM7-8SE.csv', r'..\DATA\Sault-Weather.csv']

In [21]:
# Read each weather station file, stack them into one DataFrame, and reset the index for the new row order.
allsnow = pd.concat((pd.read_csv(i) for i in allsnow)).reset_index(drop=True)
allsnow.head()

Unnamed: 0,Date,AvgTemp,MaxTemp,MinTemp,Precip,Snowfall,SnowDepth,Unnamed: 7
0,1/1/1942,,27.0,6.0,0.06,1.0,5.0,
1,1/2/1942,,20.0,3.0,0.66,8.0,13.0,
2,1/3/1942,,9.0,-6.0,0.01,0.5,,
3,1/4/1942,,17.0,-9.0,0.0,0.1,,
4,1/5/1942,,10.0,-16.0,0.0,0.0,,


In [22]:
allsnow.tail()

Unnamed: 0,Date,AvgTemp,MaxTemp,MinTemp,Precip,Snowfall,SnowDepth,Unnamed: 7
70726,11/24/2022,37.0,46.0,31.0,0.05,0.0,2.0,
70727,11/25/2022,39.0,39.0,29.0,0.01,0.0,0.0,
70728,11/26/2022,40.0,51.0,29.0,0.0,0.0,0.0,
70729,11/27/2022,40.0,42.0,34.0,0.0,0.0,0.0,
70730,11/28/2022,35.0,35.0,26.0,0.0,0.0,0.0,
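
Once the files are stacked, nothing in the frame records which station a row came from. Below is a minimal sketch of tagging rows at load time. It uses in-memory CSV text in place of the real files, with rows copied from the `head()` and `tail()` outputs above (which, given the concatenation order, come from the first and last files in the list). The `Station` column and the `load_station` helper are hypothetical names, not part of the original notebook.

```python
import io
import pandas as pd

# In-memory stand-ins for two of the station files. The rows are copied from
# the head() and tail() outputs above; only three columns are kept for brevity.
files = {
    "Dunbar": io.StringIO(
        "Date,MaxTemp,MinTemp\n1/1/1942,27.0,6.0\n1/2/1942,20.0,3.0\n"
    ),
    "Sault-Weather": io.StringIO(
        "Date,MaxTemp,MinTemp\n11/27/2022,42.0,34.0\n11/28/2022,35.0,26.0\n"
    ),
}


def load_station(name, source):
    """Read one station file and record where its rows came from (hypothetical helper)."""
    return pd.read_csv(source).assign(Station=name)


# Stack the tagged frames and renumber the rows
tagged = pd.concat(
    (load_station(name, src) for name, src in files.items()),
    ignore_index=True,
)
print(tagged)
```

With real files, the dictionary values would be the file paths instead of `StringIO` objects; `pd.read_csv` accepts either.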


In [23]:
# Drop the stray 'Unnamed: 7' column (blank in the rows shown above)
allsnow.drop('Unnamed: 7', axis=1, inplace=True)

In [24]:
allsnow.shape

(70731, 7)

In [25]:
allsnow.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 70731 entries, 0 to 70730
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       70731 non-null  object 
 1   AvgTemp    6130 non-null   float64
 2   MaxTemp    67601 non-null  float64
 3   MinTemp    67229 non-null  float64
 4   Precip     70557 non-null  float64
 5   Snowfall   68801 non-null  float64
 6   SnowDepth  66675 non-null  float64
dtypes: float64(6), object(1)
memory usage: 3.8+ MB
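
The `info()` output shows `Date` stored as `object`, meaning plain strings. Below is a minimal sketch of parsing it into real datetimes, assuming every value follows the month/day/year layout seen in `head()`. The small `sample` frame is a stand-in that reuses three rows from above.

```python
import pandas as pd

# Three rows copied from the head() output above, as a stand-in for allsnow
sample = pd.DataFrame({
    "Date": ["1/1/1942", "1/2/1942", "1/3/1942"],
    "MaxTemp": [27.0, 20.0, 9.0],
})

# Assumption: every date string is month/day/four-digit year
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y")
print(sample.dtypes)
```

Once the column is a datetime, the `.dt` accessor (month, day, year) becomes available for filtering.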


In [26]:
# AvgTemp is populated in only 6,130 of the 70,731 rows, so drop the column
allsnow = allsnow.drop('AvgTemp', axis=1)

In [27]:
allsnow.isnull().sum()

Date            0
MaxTemp      3130
MinTemp      3502
Precip        174
Snowfall     1930
SnowDepth    4056
dtype: int64

In [28]:
# Drop every row that has at least one missing value
allsnow = allsnow.dropna(axis=0)
allsnow.isnull().sum()

Date         0
MaxTemp      0
MinTemp      0
Precip       0
Snowfall     0
SnowDepth    0
dtype: int64

In [29]:
allsnow.shape

(64369, 6)
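
Dropping every row with any missing value removes 6,362 of the 70,731 rows. Since the question here is about snow, one alternative is to drop rows only when the snow columns are missing. The sketch below compares the two approaches on the five rows from the first `head()` output; whether the looser filter suits the later machine learning notebook is an open assumption.

```python
import numpy as np
import pandas as pd

# The five rows from the first head() output; SnowDepth is missing in three
sample = pd.DataFrame({
    "Date": ["1/1/1942", "1/2/1942", "1/3/1942", "1/4/1942", "1/5/1942"],
    "MaxTemp": [27.0, 20.0, 9.0, 17.0, 10.0],
    "Snowfall": [1.0, 8.0, 0.5, 0.1, 0.0],
    "SnowDepth": [5.0, 13.0, np.nan, np.nan, np.nan],
})

# Strict: any missing value drops the row (what the notebook does)
strict = sample.dropna(axis=0)

# Looser: only require Snowfall to be present
snowfall_only = sample.dropna(subset=["Snowfall"])

print(len(strict), len(snowfall_only))
```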

From Margaret Martinez:
https://sparkbyexamples.com/pandas/pandas-filter-dataframe-rows-on-dates/


In [None]:
# df2 = df[(df['Date'] > "2020-09-20") & (df['Date'] < "2021-11-17")]

# Filter by a single date
#df2 = df[df['Date'].dt.strftime('%Y-%m-%d') == "2021-10-08"]


In [30]:
allsnow.head()

Unnamed: 0,Date,MaxTemp,MinTemp,Precip,Snowfall,SnowDepth
0,1/1/1942,27.0,6.0,0.06,1.0,5.0
1,1/2/1942,20.0,3.0,0.66,8.0,13.0
10,1/11/1942,16.0,-6.0,0.07,1.8,16.0
13,1/14/1942,33.0,8.0,0.02,0.5,14.0
14,1/15/1942,32.0,2.0,0.1,1.8,15.0


In [31]:
Xmas = allsnow[allsnow["Date"].str.contains("12/25/")]
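
The string match works because the dates are stored as text. A datetime-based alternative, in the spirit of the commented-out filters above, could look like the sketch below. It assumes month/day/year strings, and the `sample` frame reuses a few rows from the outputs above rather than the full dataset.

```python
import pandas as pd

# Rows copied from the outputs above: two ordinary days and two Christmas Days
sample = pd.DataFrame({
    "Date": ["1/1/1942", "12/25/1945", "12/25/1946", "1/2/1942"],
    "Snowfall": [1.0, 1.7, 4.5, 8.0],
})

# Assumption: dates are month/day/four-digit year strings
dates = pd.to_datetime(sample["Date"], format="%m/%d/%Y")

# Keep only rows where the month is 12 and the day is 25
is_xmas = (dates.dt.month == 12) & (dates.dt.day == 25)
xmas_sample = sample[is_xmas]
print(xmas_sample)
```

Comparing month and day directly avoids relying on how the date happens to be written out.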


In [32]:
Xmas.head()

Unnamed: 0,Date,MaxTemp,MinTemp,Precip,Snowfall,SnowDepth
1454,12/25/1945,25.0,18.0,0.12,1.7,8.0
1805,12/25/1946,20.0,13.0,0.22,4.5,9.0
3084,12/25/1950,11.0,-7.0,0.0,0.4,14.0
3449,12/25/1951,17.0,-16.0,0.0,0.0,11.0
3815,12/25/1952,37.0,30.0,0.05,0.8,1.0


In [34]:
# Save the Christmas Day rows to CSV
Xmas.to_csv(r"..\DATA\UPNorthChristmas.csv")

## How many white Christmases have there been?  


With some amount of snow on the ground for roughly six months of the year, predicting whether there will be a so-called white Christmas is not much of a stretch. I wanted to see how many times over the years there was a blanket of fresh snowfall on Christmas Day.
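
One way to put a number on that is to count the Christmas Days with any recorded snowfall, and separately those with snow already on the ground. The sketch below uses the first five Christmas rows shown earlier as a stand-in for `Xmas`; the `MIN_DEPTH` threshold is a hypothetical choice, not something defined in this notebook.

```python
import pandas as pd

# The first five Christmas rows from the Xmas.head() output above
xmas_sample = pd.DataFrame({
    "Date": ["12/25/1945", "12/25/1946", "12/25/1950", "12/25/1951", "12/25/1952"],
    "Snowfall": [1.7, 4.5, 0.4, 0.0, 0.8],
    "SnowDepth": [8.0, 9.0, 14.0, 11.0, 1.0],
})

# Christmas Days with any fresh snowfall
fresh_snow = int((xmas_sample["Snowfall"] > 0).sum())

# Christmas Days with snow on the ground; the threshold is an assumption
MIN_DEPTH = 1.0
snow_on_ground = int((xmas_sample["SnowDepth"] >= MIN_DEPTH).sum())

print(fresh_snow, snow_on_ground)
```

The two counts answer different questions: 12/25/1951 had no new snow but 11.0 of snow depth, so it only counts under the second definition.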

In [None]:
Xmas.describe()

In [16]:
len(Xmas)
#DataFrame.to_markdown()

167
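
There are only 80 Christmas Days between 1942 and 2021, so 167 rows means some years appear more than once, presumably once per weather station. Below is a minimal sketch of keeping one row per date according to a station priority list. It assumes a `Station` column that the frames in this notebook do not have (it would need to be added when the files are read), and the toy values are invented placeholders, not real measurements.

```python
import pandas as pd

# Toy rows only: the Station labels and every number here are invented
# placeholders, not measurements from the real files
toy = pd.DataFrame({
    "Date": ["12/25/1990", "12/25/1990", "12/25/1991", "12/25/1991"],
    "Station": ["B", "A", "A", "B"],
    "SnowDepth": [3.0, 4.0, 0.0, 1.0],
})

# Hypothetical precedence: stations earlier in this list win
priority = ["A", "B"]
rank = {name: i for i, name in enumerate(priority)}

one_per_year = (
    toy.assign(_rank=toy["Station"].map(rank))   # numeric priority per row
       .sort_values(["Date", "_rank"])           # best-ranked station first
       .drop_duplicates(subset="Date", keep="first")
       .drop(columns="_rank")
       .reset_index(drop=True)
)
print(one_per_year)
```

Which station deserves the top spot is a judgment call this sketch does not settle; it only shows the mechanics once a ranking is chosen.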

In [18]:
pd.set_option('display.max_rows', None)  
print(Xmas)

# The output follows the year order of each concatenated weather station dataset: it starts over at the earliest year and counts up again for each dataset.
# So which weather station should get precedence to have its data represented for Christmas Day?
      

             Date  MaxTemp  MinTemp  Precip  Snowfall  SnowDepth
1454   12/25/1945     25.0     18.0    0.12       1.7        8.0
1805   12/25/1946     20.0     13.0    0.22       4.5        9.0
3084   12/25/1950     11.0     -7.0    0.00       0.4       14.0
3449   12/25/1951     17.0    -16.0    0.00       0.0       11.0
3815   12/25/1952     37.0     30.0    0.05       0.8        1.0
4180   12/25/1953     41.0     31.0    0.00       0.0        0.0
4545   12/25/1954     36.0     16.0    0.00       0.0        4.0
4910   12/25/1955     36.0     19.0    0.01       0.4       13.0
5276   12/25/1956     26.0     -1.0    0.00       0.0        9.0
5641   12/25/1957     36.0     17.0    0.30       3.0        3.0
6006   12/25/1958     10.0    -18.0    0.00       0.0       19.0
6371   12/25/1959     33.0     21.0    0.03       0.6        4.0
6737   12/25/1960     28.0     19.0    0.08       1.1        8.0
7102   12/25/1961     32.0      2.0    0.00       0.0       12.0
7832   12/25/1963     32.