Weather dataset

Questions: Where is it hottest, driest, and windiest during the summer?
Looking only at the contiguous US.

In [1]:
import altair as alt
import folium
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from datetime import datetime
from vega_datasets import data

In [2]:
weather = pd.read_csv('weather.csv')

Where are the weather stations?

In [3]:
stations = weather.groupby(['station', 'latitude', 'longitude']).count().reset_index()
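The groupby/count above is only used to get one row per station; the counts themselves are not needed for the map. A minimal sketch of an alternative with `drop_duplicates`, run on a small made-up table (the rows in `toy` are illustrative and not taken from weather.csv):

```python
import pandas as pd

# Toy stand-in for the weather table (hypothetical rows, same column names).
toy = pd.DataFrame({
    'station': ['A', 'A', 'B', 'B', 'B'],
    'latitude': [40.0, 40.0, 35.5, 35.5, 35.5],
    'longitude': [-100.0, -100.0, -90.0, -90.0, -90.0],
    'TMIN': [50.0, 52.0, 60.0, 61.0, 59.0],
})

# Keep one row per (station, latitude, longitude) combination.
toy_stations = (
    toy[['station', 'latitude', 'longitude']]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(toy_stations)
```

One difference to keep in mind: groupby drops rows whose key columns contain NaN by default, while drop_duplicates keeps them.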

In [4]:
m = folium.Map(location=[40, -100], zoom_start=1)
# Add one marker per station
for lat, lon in zip(stations['latitude'], stations['longitude']):
    folium.Marker([lat, lon]).add_to(m)
m

Delete stations that are not in the contiguous 48 states.

In [5]:
subdata = weather[~weather['state'].isin(['AB', 'AK', 'BC', 'GU', 'HI', 'MP', 'MB', 'NB', 'NL', 'NS', 'NT', 'ON', 'PE', 'PR', 'QC', 'VI'])]

In [6]:
stations = subdata.groupby(['station', 'latitude', 'longitude']).count().reset_index()
m = folium.Map(location=[40, -100], zoom_start=4)
# Add one marker per remaining station
for lat, lon in zip(stations['latitude'], stations['longitude']):
    folium.Marker([lat, lon]).add_to(m)
m

Sanity check: Look for values that are too big or too small

In [7]:
subdata.min()

station      ABERDEEN
state              AL
latitude       24.555
longitude    -124.555
elevation       -36.0
date         20170101
TMIN           -98.86
TMAX           -10.84
TAVG           -20.56
AWND              0.0
WDF5              2.0
WSF5         4.026492
SNOW              0.0
SNWD              0.0
PRCP              0.0
dtype: object

-98.86 seems low. Let us look at the rows with small values of TMIN.

In [8]:
subdata.loc[subdata.TMIN < -50]

Unnamed: 0,station,state,latitude,longitude,elevation,date,TMIN,TMAX,TAVG,AWND,WDF5,WSF5,SNOW,SNWD,PRCP
91571,DECATUR PRYOR FLD,AL,34.6525,-86.9453,180.4,20170804,-71.86,87.98,,6.039738,240.0,16.105968,0.0,0.0,0.0
148276,ALTUS AFB,OK,34.3622,-98.9761,386.2,20170915,-98.86,96.08,,13.869028,160.0,29.974996,0.0,0.0,0.0
154693,MAYPORT PILOT STN,FL,30.4,-81.4167,4.9,20170517,-70.78,86.0,,13.869028,150.0,27.066974,,,0.0
154694,MAYPORT PILOT STN,FL,30.4,-81.4167,4.9,20170518,-79.78,84.02,,12.750558,150.0,23.935258,,,0.0
168315,MAYPORT PILOT STN,FL,30.4,-81.4167,4.9,20170516,-50.8,80.06,,9.171454,130.0,23.040482,,,0.0
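One quick way to probe these rows is to ask whether a flipped sign would give a believable minimum, that is, one that does not exceed the same day's TMAX. The sketch below copies TMIN and TMAX from the five rows above; the column names `TMIN_flipped` and `plausible` are introduced here for illustration only.

```python
import pandas as pd

# The five suspicious rows shown above (TMIN and TMAX copied from the output).
suspect = pd.DataFrame({
    'station': ['DECATUR PRYOR FLD', 'ALTUS AFB', 'MAYPORT PILOT STN',
                'MAYPORT PILOT STN', 'MAYPORT PILOT STN'],
    'date': [20170804, 20170915, 20170517, 20170518, 20170516],
    'TMIN': [-71.86, -98.86, -70.78, -79.78, -50.8],
    'TMAX': [87.98, 96.08, 86.0, 84.02, 80.06],
})

# If only the sign is wrong, -TMIN should not be above that day's TMAX.
suspect['TMIN_flipped'] = -suspect['TMIN']
suspect['plausible'] = suspect['TMIN_flipped'] <= suspect['TMAX']
print(suspect[['station', 'date', 'TMIN_flipped', 'TMAX', 'plausible']])
```

Four of the five rows pass this check. For ALTUS AFB the flipped value (98.86) is above TMAX (96.08), so a sign flip alone does not explain every row, which is an argument for treating the values as missing rather than correcting them.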


Probably errors (bad sign?). Replace them with NaN.

In [9]:
# .loc accepts a boolean mask; .at is meant for a single row/column label pair
subdata.loc[subdata['TMIN'] < -50, 'TMIN'] = np.nan

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)
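The warning above appears because subdata was obtained by filtering weather, so pandas cannot tell whether the assignment is meant to reach the original table. A minimal sketch of one common way to avoid it, assuming the filtered table is meant to be independent of the original (the `toy` and `sub` frames are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the weather table (hypothetical rows).
toy = pd.DataFrame({
    'state': ['AL', 'AK', 'OK', 'HI'],
    'TMIN': [-71.86, 10.0, 55.0, 70.0],
})

# .copy() makes the filtered subset an independent DataFrame,
# so later assignments to it are unambiguous.
sub = toy[~toy['state'].isin(['AK', 'HI'])].copy()

# Assign through .loc with a boolean mask to replace the outliers.
sub.loc[sub['TMIN'] < -50, 'TMIN'] = np.nan
print(sub)
```

The original table is left untouched, and only the copy has the outlier replaced.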


In [10]:
subdata.max()

station      Younts Peak
state                 WY
latitude           48.98
longitude       -67.7928
elevation         3541.8
date            20170921
TMIN               98.96
TMAX              129.92
TAVG              105.26
AWND          112.518082
WDF5               360.0
WSF5           180.07367
SNOW           67.992163
SNWD          280.000151
PRCP            26.03151
dtype: object

Convert the date column to datetime

In [11]:
date = pd.to_datetime(subdata['date'], format='%Y%m%d')

In [12]:
subdata['date'] = date

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  subdata['date'] = date


Select the summer period (June 21 to September 20)

In [13]:
summer = subdata.loc[(date >= datetime(2017, 6, 21)) & (date <= datetime(2017, 9, 20))]
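The date parsing and the summer filter can be tried on a handful of values to confirm that both end dates are included. This is a sketch on made-up dates; `Series.between` is used here as an alternative spelling of the two comparisons above, not as the notebook's own method.

```python
import pandas as pd

# Hypothetical integer dates in the same YYYYMMDD layout as the dataset.
toy = pd.DataFrame({'date': [20170620, 20170621, 20170801, 20170920, 20170921]})
toy['date'] = pd.to_datetime(toy['date'], format='%Y%m%d')

# Series.between is inclusive on both ends by default.
in_summer = toy['date'].between(pd.Timestamp(2017, 6, 21), pd.Timestamp(2017, 9, 20))
toy_summer = toy.loc[in_summer]
print(toy_summer)
```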

Get background map of USA for plotting

In [14]:
usa = data.us_10m.url

Minimum temperature during summer

In [15]:
Tmin = summer.groupby(['station', 'latitude', 'longitude']).agg({'TMIN': 'min'}).reset_index()
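The same groupby pattern is reused further down for TMAX and TAVG. As a sketch, the three per-station statistics could also be computed in a single pass with named aggregation; the table `toy_summer` and the result name `stats` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the summer table (hypothetical rows).
toy_summer = pd.DataFrame({
    'station': ['A', 'A', 'B', 'B'],
    'latitude': [40.0, 40.0, 35.5, 35.5],
    'longitude': [-100.0, -100.0, -90.0, -90.0],
    'TMIN': [60.0, 64.0, 70.0, 72.0],
    'TMAX': [85.0, 90.0, 95.0, 99.0],
    'TAVG': [72.0, 76.0, 82.0, 86.0],
})

# One row per station: lowest TMIN, highest TMAX, mean TAVG.
stats = (
    toy_summer
    .groupby(['station', 'latitude', 'longitude'])
    .agg(TMIN=('TMIN', 'min'), TMAX=('TMAX', 'max'), TAVG=('TAVG', 'mean'))
    .reset_index()
)
print(stats)
```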

In [27]:
alt.layer(
    alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
        fill='#ddd', stroke='#fff', strokeWidth=1
    ),
    alt.Chart(Tmin).mark_circle(size=15).encode(
        latitude='latitude:Q',
        longitude='longitude:Q',
        color=alt.Color('TMIN:Q', scale=alt.Scale(domain=[0, 80], clamp=True, scheme='plasma'))
    )
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)

Maximum temperature during summer

In [18]:
Tmax = summer.groupby(['station', 'latitude', 'longitude']).agg({'TMAX': 'max'}).reset_index()

In [26]:
alt.layer(
    alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
        fill='#ddd', stroke='#fff', strokeWidth=1
    ),
    alt.Chart(Tmax).mark_circle(size=15).encode(
        latitude='latitude:Q',
        longitude='longitude:Q',
        color=alt.Color('TMAX:Q', scale=alt.Scale(domain=[65, 130], clamp=True, scheme='plasma'))
    )
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)

Average temperature during summer

In [20]:
Tavg = summer.groupby(['station', 'latitude', 'longitude']).agg({'TAVG': 'mean'}).reset_index()

In [28]:
alt.layer(
    alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
        fill='#ddd', stroke='#fff', strokeWidth=1
    ),
    alt.Chart(Tavg).mark_circle(size=15).encode(
        latitude='latitude:Q',
        longitude='longitude:Q',
        color=alt.Color('TAVG:Q', scale=alt.Scale(domain=[45, 100], clamp=True, scheme='plasma'))
    )
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)

Heat-wave days ("canicule"): days when the temperature does not cool down at night. On how many days is the minimum temperature at least 70 degrees?

In [29]:
canicule = summer.loc[summer['TMIN'] >= 70]
canicule.groupby(['station', 'latitude', 'longitude', 'state']).count().reset_index()

Unnamed: 0,station,latitude,longitude,state,elevation,date,TMIN,TMAX,TAVG,AWND,WDF5,WSF5,SNOW,SNWD,PRCP
0,ABILENE RGNL AP,32.4106,-99.6822,TX,41,41,41,41,41,41,41,41,41,41,41
1,ADRIAN LENAWEE CO AP,41.8678,-84.0794,MI,1,1,1,1,0,1,1,1,0,0,1
2,AKRON CANTON RGNL AP,40.9167,-81.4333,OH,5,5,5,5,5,5,5,5,5,5,5
3,AKRON FULTON INTL AP,41.0375,-81.4642,OH,7,7,7,7,0,7,7,7,0,0,7
4,ALABASTER SHELBY CO AP,33.1783,-86.7817,AL,58,58,58,58,58,58,58,58,58,58,58
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
609,WOOSTER WAYNE CO AP,40.8731,-81.8867,OH,1,1,1,1,0,1,1,1,0,0,1
610,YAKIMA AIR TERMINAL,46.5683,-120.5428,WA,2,2,2,2,2,2,2,2,2,2,2
611,YORK AP,39.9181,-76.8742,PA,3,3,3,3,0,3,3,3,0,0,3
612,YOUNGSTOWN RGNL AP,41.2544,-80.6739,OH,1,1,1,1,1,1,1,1,1,1,1


In [30]:
len(canicule.loc[canicule['station'] == 'ABILENE RGNL AP'])

41
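In the table above, count() reports the number of non-null cells in each column, which is why some stations show 0 under TAVG, SNOW or SNWD even though they have warm nights. A minimal sketch of counting rows per station with size() instead, on made-up data (`toy_summer`, `warm` and `warm_nights` are illustrative names):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the summer table; station A has no TAVG readings.
toy_summer = pd.DataFrame({
    'station': ['A', 'A', 'A', 'B', 'B'],
    'TMIN': [72.0, 68.0, 75.0, 71.0, 65.0],
    'TAVG': [np.nan, np.nan, np.nan, 80.0, 78.0],
})

warm = toy_summer.loc[toy_summer['TMIN'] >= 70]

# count(): non-null cells per column, so TAVG is 0 for station A.
per_column = warm.groupby('station').count()

# size(): number of rows per station, independent of missing values.
warm_nights = warm.groupby('station').size().reset_index(name='warm_nights')
print(warm_nights)
```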