## Introduction

The purpose of this notebook is to determine whether it rains more in New York City or Seattle and to present the findings. The results are meant to help Professor Egan's family decide whether to visit him.

This data comes from the National Oceanic and Atmospheric Administration (NOAA), which works to understand climate and weather. The data was found using NOAA's Climate Data Online search tool: https://www.ncei.noaa.gov/cdo-web/search?datasetid=GHCND.

The data was prepared and cleaned before use in this notebook. The steps that produced the dataset can be found here: https://colab.research.google.com/drive/1cQ8WlPSXBgfm6vCniygzgs4ZNOrTTfLd?usp=sharing

## Import libraries

In [2]:
import pandas as pd
import altair as alt
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style='whitegrid')
import missingno as msno

## Load clean data

In [3]:
df = pd.read_csv('https://docs.google.com/spreadsheets/d/1rdgVQwEmNjwnLUHfG7Lvn_NxkBx_nxDjETl9WRPkEg4/export?format=csv')

##### $\rightarrow$ Review the contents of the data set.

In [4]:
df.head(10)

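| | date | city | precipitation | month | year |
|---|---|---|---|---|---|
| 0 | 2020-01-01 | NY | 0.0 | 1 | 2020 |
| 1 | 2020-01-02 | NY | 0.0 | 1 | 2020 |
| 2 | 2020-01-03 | NY | 0.13 | 1 | 2020 |
| 3 | 2020-01-04 | NY | 0.16 | 1 | 2020 |
| 4 | 2020-01-05 | NY | 0.0 | 1 | 2020 |
| 5 | 2020-01-06 | NY | 0.03 | 1 | 2020 |
| 6 | 2020-01-07 | NY | 0.03 | 1 | 2020 |
| 7 | 2020-01-08 | NY | 0.0 | 1 | 2020 |
| 8 | 2020-01-09 | NY | 0.0 | 1 | 2020 |
| 9 | 2020-01-10 | NY | 0.0 | 1 | 2020 |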
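
Beyond looking at the first rows, a quick review can also confirm the column types, count missing values, and check that both cities cover the same number of days. The sketch below is only an illustration: `toy` is a hypothetical stand-in for `df` with made-up values (including one deliberately missing reading), and the variable names are assumptions rather than part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`, using the same column names.
# The values, including the missing reading, are made up for illustration.
toy = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"],
    "city": ["NY", "NY", "Seattle", "Seattle"],
    "precipitation": [0.0, 0.13, 0.2, None],
})

# Parse the date strings so that later date-based operations are unambiguous.
toy["date"] = pd.to_datetime(toy["date"])

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# Number of distinct days recorded for each city.
days_per_city = toy.groupby("city")["date"].nunique()

print(toy.dtypes)
print(missing_per_column)
print(days_per_city)
```

Running the same three checks on `df` would show whether the cleaned data is complete before any comparison is made.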


## State your questions

The overall problem is to compare how much it rains in Seattle and New York City. The following specific questions about the data will help me solve it:
1. What is the average rainfall for each city?
2. How frequently does it rain/not rain in each city?
3. Does rainfall vary greatly month to month or year over year?

## Analysis


Table 1: This table shows summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numeric column, grouped by city.

In [5]:
df.groupby(by = 'city').describe()

| city | precipitation count | precipitation mean | precipitation std | precipitation min | precipitation 25% | precipitation 50% | precipitation 75% | precipitation max | month count | month mean | ... | month 75% | month max | year count | year mean | year std | year min | year 25% | year 50% | year 75% | year max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NY | 1461.0 | 0.118207 | 0.35926 | 0.0 | 0.0 | 0.0 | 0.05 | 8.05 | 1461.0 | 6.52293 | ... | 10.0 | 12.0 | 1461.0 | 2021.498973 | 1.118723 | 2020.0 | 2020.0 | 2021.0 | 2022.0 | 2023.0 |
| Seattle | 1461.0 | 0.105921 | 0.237903 | 0.0 | 0.0 | 0.0 | 0.11 | 2.97 | 1461.0 | 6.52293 | ... | 10.0 | 12.0 | 1461.0 | 2021.498973 | 1.118723 | 2020.0 | 2020.0 | 2021.0 | 2022.0 | 2023.0 |


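Table 1: From the table above we can see that the average daily precipitation is higher in New York City (0.118 inches) than in Seattle (0.106 inches). The median is 0.0 in both cities, so no precipitation was recorded on at least half of the days in each.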
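
Because `describe()` also summarizes the `month` and `year` columns, the precipitation figures can be hard to pick out. A narrower summary is one option, sketched here under stated assumptions: `toy` is a hypothetical frame with made-up values standing in for `df`, and `city_summary` is an illustrative helper name that does not appear in the original notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's `df`; values are made up.
toy = pd.DataFrame({
    "city": ["NY", "NY", "NY", "Seattle", "Seattle", "Seattle"],
    "precipitation": [0.0, 0.0, 0.9, 0.1, 0.2, 0.3],
})


def city_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return mean, median, and maximum daily precipitation for each city."""
    return frame.groupby("city")["precipitation"].agg(["mean", "median", "max"])


summary = city_summary(toy)
print(summary.round(3))
```

Calling `city_summary(df)` on the real data would give the same means as Table 1, limited to the one column that matters for the comparison.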

Figure 1: This chart was created by separating the data by city and then plotting the average daily precipitation across all days in the dataset.

In [6]:
alt.Chart(df).mark_bar().encode(
    alt.Y('city', title = ""),
    alt.X('mean(precipitation)', title = "Average Rainfall (inches)"),
    alt.Color('city:N', title = 'City', scale=alt.Scale(domain=['NY', 'Seattle'], range=['black', 'steelblue'])),
).properties(width=350, height=50)


Figure 1: This chart gives us a visual representation of the mean precipitation values from Table 1. We can see that the average precipitation is higher in New York.

Figure 2: This chart looks at the average monthly rainfall for both cities.

In [7]:
alt.Chart(df).mark_line(point = True).encode(
    alt.X('month(date):T', title = 'Month'),
    alt.Y('mean(precipitation):Q', title = 'Precipitation (inches)'),
    alt.Color('city:N', title = 'City', scale=alt.Scale(domain=['NY', 'Seattle'], range=['black', 'steelblue']))
).properties(title='Average Monthly Rainfall', width=400, height=400)

Figure 2: This chart shows that Seattle has close to no rainfall in July and August, so that would be one of the best times to visit.
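
The monthly averages behind a chart like Figure 2 can also be read as numbers. The following is a minimal sketch, assuming a frame with `city`, `month`, and `precipitation` columns like `df`; the `toy` values are made up and `monthly_means` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the notebook's `df`.
toy = pd.DataFrame({
    "city": ["NY", "NY", "Seattle", "Seattle", "NY", "Seattle"],
    "month": [7, 7, 7, 7, 12, 12],
    "precipitation": [0.2, 0.4, 0.0, 0.02, 0.1, 0.3],
})


def monthly_means(frame: pd.DataFrame) -> pd.DataFrame:
    """Average daily precipitation with one row per month and one column per city."""
    return frame.pivot_table(
        index="month", columns="city", values="precipitation", aggfunc="mean"
    )


table = monthly_means(toy)

# For each city, the month with the lowest average daily precipitation.
driest_month = table.idxmin()

print(table)
print(driest_month)
```

Applied to `df`, `driest_month` would name the driest month for each city directly, which is a useful check on what the line chart appears to show.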

Figure 3: This chart looks at days without rainfall year over year. It was created by filtering out days with precipitation over 0.00 and plotting the number of remaining days for each city. The data was separated by year to see if there were any significant differences between years.

In [8]:
alt.Chart(df, title='Days Without Rainfall (Yearly)').mark_bar(size = 30).transform_filter('(datum.precipitation) == 0.00').encode(
    alt.X('year(date):T', bin = True, title = ''),
    alt.Y('count(precipitation):Q', title = 'Number of Days without rainfall'),
    alt.Column('city:N', title = 'City'),
    alt.Color('city:N', title = '', scale=alt.Scale(domain=['NY', 'Seattle'], range=['black', 'steelblue']))
).properties(width=250, height=275)

Figure 3: The chart shows that the number of days without rainfall is fairly consistent year over year in both cities. An important takeaway from this chart is that New York has more days without rainfall than Seattle despite having a higher average precipitation. This suggests that it rains harder but less often in New York.
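
The "harder but less often" reading can be checked by separating how often it rains from how much falls when it does. The sketch below assumes that a wet day means any day with precipitation above zero; `toy` is a hypothetical frame with made-up values and `rain_frequency_and_intensity` is an illustrative name, not something from the original notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's `df`; values are made up.
toy = pd.DataFrame({
    "city": ["NY"] * 4 + ["Seattle"] * 4,
    "precipitation": [0.0, 0.0, 0.0, 1.2, 0.0, 0.2, 0.3, 0.1],
})


def rain_frequency_and_intensity(frame: pd.DataFrame) -> pd.DataFrame:
    """Share of dry days and mean precipitation on wet days, per city.

    Assumption: a day counts as wet when its precipitation is above zero.
    """
    is_dry = (frame["precipitation"] == 0).astype(float)
    wet_days = frame[frame["precipitation"] > 0]
    return pd.DataFrame({
        "dry_day_share": is_dry.groupby(frame["city"]).mean(),
        "mean_on_wet_days": wet_days.groupby("city")["precipitation"].mean(),
    })


result = rain_frequency_and_intensity(toy)
print(result.round(3))
```

If the same call on `df` shows a higher `dry_day_share` and a higher `mean_on_wet_days` for NY than for Seattle, that would support the interpretation above.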


### Results for communication assignment

This section reproduces the table and charts that appear in the communication assignment.

In [9]:
df.groupby(by = 'city').describe()

| city | precipitation count | precipitation mean | precipitation std | precipitation min | precipitation 25% | precipitation 50% | precipitation 75% | precipitation max | month count | month mean | ... | month 75% | month max | year count | year mean | year std | year min | year 25% | year 50% | year 75% | year max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NY | 1461.0 | 0.118207 | 0.35926 | 0.0 | 0.0 | 0.0 | 0.05 | 8.05 | 1461.0 | 6.52293 | ... | 10.0 | 12.0 | 1461.0 | 2021.498973 | 1.118723 | 2020.0 | 2020.0 | 2021.0 | 2022.0 | 2023.0 |
| Seattle | 1461.0 | 0.105921 | 0.237903 | 0.0 | 0.0 | 0.0 | 0.11 | 2.97 | 1461.0 | 6.52293 | ... | 10.0 | 12.0 | 1461.0 | 2021.498973 | 1.118723 | 2020.0 | 2020.0 | 2021.0 | 2022.0 | 2023.0 |


In [10]:
alt.Chart(df).mark_bar().encode(
    alt.Y('city', title = ""),
    alt.X('mean(precipitation)', title = "Average Rainfall (inches)"),
    alt.Color('city:N', title = 'City', scale=alt.Scale(domain=['NY', 'Seattle'], range=['black', 'steelblue'])),
).properties(width=350, height=50)


In [11]:
alt.Chart(df).mark_line(point = True).encode(
    alt.X('month(date):T', title = 'Month'),
    alt.Y('mean(precipitation):Q', title = 'Precipitation (inches)'),
    alt.Color('city:N', title = 'City', scale=alt.Scale(domain=['NY', 'Seattle'], range=['black', 'steelblue']))
).properties(title='Average Monthly Rainfall', width=400, height=400)

In [12]:
alt.Chart(df, title='Days Without Rainfall (Yearly)').mark_bar(size = 30).transform_filter('(datum.precipitation) == 0.00').encode(
    alt.X('year(date):T', bin = True, title = ''),
    alt.Y('count(precipitation):Q', title = 'Number of Days without rainfall'),
    alt.Column('city:N', title = 'City'),
    alt.Color('city:N', title = '', scale=alt.Scale(domain=['NY', 'Seattle'], range=['black', 'steelblue']))
).properties(width=250, height=275)

## Conclusion


Overall, the data shows that it rains less often in New York than in Seattle, but New York generally has higher precipitation on days when rainfall occurs. Looking at rainfall month by month, however, shows that rainfall in Seattle is much lower than in New York during the summer months (July and August). If Professor Egan's family wanted to visit him in Seattle while avoiding the rain, this would be the driest time.
