<a href="https://www.kaggle.com/code/chalseo/seattle-historical-precipitation-1948-2010?scriptVersionId=113130564" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import pandas as pd
import numpy as np

import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

# Read in data from file
seattle_weather_df = pd.read_csv('/kaggle/input/did-it-rain-in-seattle-19482017/seattleWeather_1948-2017.csv')

# Did it Rain in Seattle?

Data source: [Did it Rain in Seattle? (1948-2017)](https://www.kaggle.com/rtatman/did-it-rain-in-seattle-19482017) 
<br><br>
The dataset I am using is from Kaggle.com and is called “Did it rain in Seattle? (1948-2017)”. This simple dataset was aggregated from publicly available data on the NOAA (National Oceanic and Atmospheric Administration) website and contains only a few columns: the date, whether it rained, minimum temperature, maximum temperature, and precipitation. The dataset was prepared as described below so that weather and precipitation trends can be represented clearly over time. This notebook focuses on higher-level views of 60+ years of data.<br>

To prepare this dataset for analysis, I add columns for year and decade (derived from the DATE column) and for average daily temperature (the mean of the TMIN and TMAX columns). If preferred, column names can also be changed for use on the x and y axes of the charts below. The new columns let the data be sliced by year and decade, and they make it possible to show averages of the numerical variables over the dataset’s 1948-2017 time period.
<br>

I have downloaded the dataset as a CSV file and included it with the notebook submission.
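
As a minimal sketch of the preparation described above, the same columns can also be derived with pandas datetime accessors instead of string slicing. The `sample_df` name is hypothetical, and its three rows are taken from the start of the dataset purely for illustration; this is an alternative under those assumptions, not the code this notebook actually runs.

```python
import pandas as pd

# Hypothetical three-row sample using the first dates and temperatures in the dataset
sample_df = pd.DataFrame({
    'DATE': ['1948-01-01', '1948-01-02', '1948-01-03'],
    'TMAX': [51, 45, 45],
    'TMIN': [42, 36, 35],
})

# Parse the date strings once, then derive the calendar fields from them
dates = pd.to_datetime(sample_df['DATE'])
sample_df['YEAR'] = dates.dt.year
sample_df['DECADE'] = dates.dt.year // 10 * 10  # e.g. 1948 -> 1940

# Average daily temperature is the midpoint of the daily max and min
sample_df['TAVG'] = (sample_df['TMAX'] + sample_df['TMIN']) / 2

print(sample_df)
```

Parsing the dates first means a malformed DATE value fails loudly instead of producing a wrong year from a string slice.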

In [3]:
# Prepare data for processing by adding DECADE, YEAR, DAY, and TAVG columns, relabeling RAIN, and rescaling PRCP
seattle_weather_df['DECADE'] = seattle_weather_df['DATE'].apply(lambda x: int(x[0:3] + '0'))
seattle_weather_df['YEAR'] = seattle_weather_df['DATE'].apply(lambda x: int(x[0:4]))
seattle_weather_df['DAY'] = seattle_weather_df['DATE'].apply(lambda x: x[-4:])
seattle_weather_df['RAIN'] = seattle_weather_df['RAIN'].apply(lambda d: 'Yes' if d else 'No' )
seattle_weather_df['TAVG'] = np.divide(np.add(seattle_weather_df['TMAX'], seattle_weather_df['TMIN']), 2)
seattle_weather_df['PRCP'] = seattle_weather_df['PRCP'].apply(lambda p: round(p*100,2))

# Show seattle_weather_df data
seattle_weather_df.head()

,DATE,PRCP,TMAX,TMIN,RAIN,DECADE,YEAR,DAY,TAVG
0,1948-01-01,47.0,51,42,Yes,1940,1948,1-01,46.5
1,1948-01-02,59.0,45,36,Yes,1940,1948,1-02,40.5
2,1948-01-03,42.0,45,35,Yes,1940,1948,1-03,40.0
3,1948-01-04,31.0,45,34,Yes,1940,1948,1-04,39.5
4,1948-01-05,17.0,45,32,Yes,1940,1948,1-05,38.5


In [4]:
# Get yearly averages of weather data (numeric columns only, so text columns such as DATE and RAIN are skipped)
yearly_weather_df = seattle_weather_df.groupby('YEAR').mean(numeric_only=True)
yearly_weather_df = yearly_weather_df.round(2)
yearly_weather_df['DECADE'] = yearly_weather_df['DECADE'].astype('int')

# Show yearly_weather_df data
yearly_weather_df.head()

YEAR,PRCP,TMAX,TMIN,DECADE,TAVG
1948,12.51,57.01,41.2,1940,49.11
1949,8.89,59.15,41.39,1940,50.27
1950,15.11,57.04,41.0,1950,49.02
1951,11.04,58.55,41.05,1950,49.8
1952,6.5,58.74,41.47,1950,50.11


### Figure 1: Bubble Chart examining relationships between temperature, precipitation and year<br>
In this chart:
<ul>
    <li>Temperature shows a generally positive trend over time</li>
    <li>Earlier years are colder on average, and later years are hotter on average</li>
    <li>1955 was the coldest year on average, and 2015 was the hottest</li>
    <li>A few outlier years don't match the trend: 1985 (colder than expected), and 1958 and 1967 (hotter than expected)</li>
    <li>Judging by the size of the bubbles, precipitation doesn't vary too widely</li>
</ul>
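
Claims such as "coldest year" and "hottest year" can be checked directly rather than read off the chart. The sketch below is a minimal example that assumes TAVG is the measure being compared. The `yearly_sample` name is hypothetical, and it holds only the first five yearly averages shown earlier, so its result applies to 1948-1952 only and does not confirm the 1955 and 2015 observations. The same two calls on the full `yearly_weather_df` would.

```python
import pandas as pd

# Hypothetical sample: the first five yearly TAVG values from yearly_weather_df
yearly_sample = pd.DataFrame(
    {'TAVG': [49.11, 50.27, 49.02, 49.80, 50.11]},
    index=pd.Index([1948, 1949, 1950, 1951, 1952], name='YEAR'),
)

# idxmin/idxmax return the index label (the year) of the extreme value
coldest_year = yearly_sample['TAVG'].idxmin()
hottest_year = yearly_sample['TAVG'].idxmax()

print(coldest_year, hottest_year)  # 1950 1949 for this five-year sample
```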

In [5]:
# Plot bubble chart
fig = px.scatter(yearly_weather_df, x='TMIN', y="TMAX", 
            title="Seattle Average Temperature v. Precipitation by Year, 1948-2017", size='PRCP', color=yearly_weather_df.index)

fig.update_layout(
    xaxis=dict(title='Min. Temperature (F)'),
    yaxis=dict(title='Max. Temperature (F)')
)

fig.show()

### Figure 2: Strip Chart examining average temperature over time, with respect to whether it rained
This strip chart focuses on the distribution of the data. In this chart:
<ul>
    <li>The precipitation distribution doesn't vary much over the years</li>
    <li>Rainy days are colder on average, and non-rainy days are generally hotter</li>
    <li>Rainy days generally have a narrower spread (less variance)</li>
    <li>Non-rainy days are hotter, with more variance and more temperature extremes</li>
    <li>The hottest day was in 2009, and the coldest was in 1950</li>
</ul>

In [6]:
# Plot strip chart
fig2 = px.strip(seattle_weather_df, x='YEAR', y='TAVG', color='RAIN')

fig2.update_layout(
    title="Seattle Daily Average Temperatures, 1948-2017",
    xaxis=dict(title='Year'),
    yaxis=dict(title='Temperature (F)')
)

fig2.show()

### Figure 3: Bar Chart examining precipitation over time
This chart allows the user to dig in and look at specific precipitation data points by year. In this chart:
<ul>
    <li>The year with the most precipitation is 1950</li>
    <li>The year with the least precipitation is 1952</li>
</ul>
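
Because the chart is built from daily rows, Plotly stacks the daily values for each year, so the height of a bar is effectively that year's total. Ranking the years numerically is a useful cross-check on what the bars suggest. The sketch below is only an approximation: it ranks the yearly mean PRCP values shown earlier as a stand-in for yearly totals (an assumption, since year lengths differ slightly), and the `yearly_prcp` name and its five-year range are for illustration only.

```python
import pandas as pd

# Hypothetical sample: mean daily PRCP for the first five years of yearly_weather_df
yearly_prcp = pd.Series(
    [12.51, 8.89, 15.11, 11.04, 6.5],
    index=pd.Index([1948, 1949, 1950, 1951, 1952], name='YEAR'),
    name='PRCP',
)

# Sort from wettest to driest, then read the extremes off either end
ranked = yearly_prcp.sort_values(ascending=False)
wettest_year = ranked.index[0]
driest_year = ranked.index[-1]

print(ranked)
print(wettest_year, driest_year)
```

Within these five years the extremes are 1950 and 1952, which is consistent with the observations above but does not by itself verify them across the full 1948-2017 range.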

In [7]:
# Create bar chart 
fig3 = px.bar(seattle_weather_df, x='YEAR', y='PRCP')

fig3.update_layout(
    title="Seattle Precipitation by Year, 1948-2017",
    xaxis=dict(title='Year'),
    yaxis=dict(title='Precipitation %'),
    )

fig3.show()

### Figure 4: Area/Line Chart examining min and max temperature, and precipitation over time
This chart shows an example of using graph_objects traces to plot multiple chart types on the same figure (created with make_subplots). It also shows the use of two y-axes. In this chart:
<ul>
    <li>Precipitation varies widely from year to year, and there doesn't appear to be a trend</li>
    <li>From 1948 to 2017, the average maximum temperature increased by about 4.7%</li>
    <li>From 1948 to 2017, the average minimum temperature increased by about 2.6%</li>
</ul>
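
The notebook does not show how the percentage increases above were calculated. One plausible reading is a simple first-to-last relative change, sketched below. The `percent_change` helper is hypothetical, and because the 2017 averages are not displayed in this notebook, the example uses the 1948 and 1952 values from the yearly table purely to demonstrate the arithmetic.

```python
def percent_change(first, last):
    """Relative change from first to last, expressed in percent."""
    return (last - first) / first * 100

# Demonstration only: 1948 vs. 1952 yearly averages from yearly_weather_df
tmax_change = round(percent_change(57.01, 58.74), 2)
tmin_change = round(percent_change(41.2, 41.47), 2)

print(tmax_change, tmin_change)  # 3.03 0.66
```

A percentage of a Fahrenheit value depends on where the scale's zero sits, so the absolute change in degrees is worth reporting alongside it.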

In [8]:
# Create a figure with subplots
fig4 = make_subplots(specs=[[{"secondary_y": True}]])

# Get data for each axis
X = yearly_weather_df.index
Y = yearly_weather_df['PRCP']
Y2 = yearly_weather_df['TMIN']
Y3 = yearly_weather_df['TMAX']

# Add traces and specify secondary axis
fig4.add_trace(go.Scatter(x=X, y=Y, name="Precip. %", line_color='indigo'), secondary_y=True)
fig4.add_trace(go.Scatter(x=X, y=Y2, name="Min Temp F", fill='tonexty', mode='lines', line=dict(color='paleturquoise')), secondary_y=False)
fig4.add_trace(go.Scatter(x=X, y=Y3, name="Max Temp F", fill='tonexty', mode='lines', line=dict(color='teal')), secondary_y=False)

# Update chart labels
fig4.update_layout(title_text="Seattle Average Precipitation & Temperature by Year, 1948-2017")
fig4.update_xaxes(title_text="Year")
fig4.update_yaxes(title_text="Precipitation %", secondary_y=True)
fig4.update_yaxes(title_text="Temperature (F)", secondary_y=False)

fig4.show()