    
# Working With Time Series in Pandas
## Exercises
    

<hr style="border:2px solid blue">


In [1]:
import pandas as pd

from vega_datasets import data

### 1. Resample by the day and take the average temperature. Visualize the average temperature over time.


In [2]:
# Load the San Francisco hourly temperature data into a DataFrame
df = data.sf_temps()
df.head()

| | temp | date |
|---|---|---|
| 0 | 47.8 | 2010-01-01 00:00:00 |
| 1 | 47.4 | 2010-01-01 01:00:00 |
| 2 | 46.9 | 2010-01-01 02:00:00 |
| 3 | 46.5 | 2010-01-01 03:00:00 |
| 4 | 46.0 | 2010-01-01 04:00:00 |


In [3]:
# Step 1: check that pandas already parsed 'date' as a datetime64 dtype
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8759 entries, 0 to 8758
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   temp    8759 non-null   float64       
 1   date    8759 non-null   datetime64[ns]
dtypes: datetime64[ns](1), float64(1)
memory usage: 137.0 KB


In [4]:
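# info() shows 'date' is already datetime64[ns], so this is only a safeguard here; on raw strings,
# an explicit format speeds up parsing. %H is the 24-hour clock; %I (12-hour) would reject 00:00 and 13:00-23:00
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S')
df['date']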

0      2010-01-01 00:00:00
1      2010-01-01 01:00:00
2      2010-01-01 02:00:00
3      2010-01-01 03:00:00
4      2010-01-01 04:00:00
               ...        
8754   2010-12-31 19:00:00
8755   2010-12-31 20:00:00
8756   2010-12-31 21:00:00
8757   2010-12-31 22:00:00
8758   2010-12-31 23:00:00
Name: date, Length: 8759, dtype: datetime64[ns]
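A quick check of the two hour directives, using a made-up afternoon timestamp: `%H` reads hours on the 24-hour clock (00-23), while `%I` is the 12-hour clock (01-12) and is meant to be paired with `%p`. Because `date` is already `datetime64[ns]`, pandas passes the column through without parsing it, which is why a `%I` format would not raise in the cell above; on raw strings it would fail.

```python
import pandas as pd

stamp = '2010-01-01 13:00:00'  # hypothetical afternoon reading (hour 13)

# %H is the 24-hour clock, which matches timestamps like the ones in this dataset
parsed = pd.to_datetime(stamp, format='%Y-%m-%d %H:%M:%S')
print(parsed)

# %I only accepts hours 01-12, so parsing hour 13 raises a ValueError
try:
    pd.to_datetime(stamp, format='%Y-%m-%d %I:%M:%S')
    i_format_failed = False
except ValueError as err:
    i_format_failed = True
    print('ValueError:', err)
```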

In [5]:
# Steps 2 and 3 combined: set the 'date' column as the index and sort by it
df = df.set_index('date').sort_index()

In [6]:
df.head()

| date | temp |
|---|---|
| 2010-01-01 00:00:00 | 47.8 |
| 2010-01-01 01:00:00 | 47.4 |
| 2010-01-01 02:00:00 | 46.9 |
| 2010-01-01 03:00:00 | 46.5 |
| 2010-01-01 04:00:00 | 46.0 |


In [7]:
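# resample() alone only returns a lazy Resampler; nothing is computed until an aggregation such as .mean() is called
df.resample('D')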

<pandas.core.resample.DatetimeIndexResampler object at 0x7f8e406de0d0>

In [8]:
# Resample to daily frequency ('D') and average each day's hourly readings
df.resample('D').mean()

| date | temp |
|---|---|
| 2010-01-01 | 49.170833 |
| 2010-01-02 | 49.304167 |
| 2010-01-03 | 49.391667 |
| 2010-01-04 | 49.445833 |
| 2010-01-05 | 49.491667 |
| ... | ... |
| 2010-12-27 | 48.991667 |
| 2010-12-28 | 49.000000 |
| 2010-12-29 | 49.012500 |
| 2010-12-30 | 49.012500 |
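The cell above stops at the daily means, while the exercise also asks for a plot. A minimal sketch, assuming matplotlib is installed (pandas' `.plot()` uses it). So the block runs on its own, it builds a hypothetical synthetic stand-in for `data.sf_temps()` with made-up values; in the notebook, skip that part and reuse the indexed `df` from above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.sf_temps(): one year of hourly readings with a
# seasonal cycle plus a daily cycle. Made-up values; in the notebook, reuse df instead.
idx = pd.date_range('2010-01-01', periods=24 * 365, freq='h', name='date')
doy, hour = idx.dayofyear.to_numpy(), idx.hour.to_numpy()
season = np.sin(2 * np.pi * (doy - 110) / 365)
df = pd.DataFrame({'temp': 55 + 6 * season + (5 + 3 * season) * np.sin(2 * np.pi * (hour - 9) / 24)}, index=idx)

# Collapse the hourly readings into one mean value per calendar day
daily_mean = df.resample('D').mean()

# Line plot of the daily mean over the year
ax = daily_mean['temp'].plot(figsize=(12, 4), title='Average daily temperature')
ax.set_ylabel('Temperature')
plt.show()
```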


### 2. Write the code necessary to visualize the minimum temperature over time.
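One way to answer this, sketched under the same assumptions as a pandas-plus-matplotlib setup: resample to days and keep each day's lowest hourly reading with `.min()`. The data-building lines are a hypothetical synthetic stand-in for `data.sf_temps()` so the block runs alone; in the notebook, reuse the indexed `df` instead.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.sf_temps() (made-up hourly values for one year)
idx = pd.date_range('2010-01-01', periods=24 * 365, freq='h', name='date')
doy, hour = idx.dayofyear.to_numpy(), idx.hour.to_numpy()
season = np.sin(2 * np.pi * (doy - 110) / 365)
df = pd.DataFrame({'temp': 55 + 6 * season + (5 + 3 * season) * np.sin(2 * np.pi * (hour - 9) / 24)}, index=idx)

# Keep the lowest hourly reading of each calendar day
daily_min = df.resample('D').min()

ax = daily_min['temp'].plot(figsize=(12, 4), title='Minimum daily temperature', color='tab:blue')
ax.set_ylabel('Temperature')
plt.show()
```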


### 3. Write the code necessary to visualize the maximum temperature over time.
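The maximum is the mirror image of the minimum: resample to days and keep each day's highest hourly reading with `.max()`. As before, this is a sketch assuming matplotlib is available, and the first lines build a hypothetical synthetic stand-in for `data.sf_temps()`; in the notebook, reuse the indexed `df`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.sf_temps() (made-up hourly values for one year)
idx = pd.date_range('2010-01-01', periods=24 * 365, freq='h', name='date')
doy, hour = idx.dayofyear.to_numpy(), idx.hour.to_numpy()
season = np.sin(2 * np.pi * (doy - 110) / 365)
df = pd.DataFrame({'temp': 55 + 6 * season + (5 + 3 * season) * np.sin(2 * np.pi * (hour - 9) / 24)}, index=idx)

# Keep the highest hourly reading of each calendar day
daily_max = df.resample('D').max()

ax = daily_max['temp'].plot(figsize=(12, 4), title='Maximum daily temperature', color='tab:red')
ax.set_ylabel('Temperature')
plt.show()
```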


### 4. Which month is the coldest, on average?
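A minimal sketch: group the hourly readings by month name, average them, and take `idxmin()`. Because the dataset covers a single year, this matches `df.resample('M').mean()` (spelled `'ME'` from pandas 2.2 on) while sidestepping that alias change. The data here is a hypothetical synthetic stand-in for `data.sf_temps()`, so the month it prints says nothing about the real San Francisco data; rerun it on the notebook's `df` for the actual answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.sf_temps() (made-up hourly values for one year)
idx = pd.date_range('2010-01-01', periods=24 * 365, freq='h', name='date')
doy, hour = idx.dayofyear.to_numpy(), idx.hour.to_numpy()
season = np.sin(2 * np.pi * (doy - 110) / 365)
df = pd.DataFrame({'temp': 55 + 6 * season + (5 + 3 * season) * np.sin(2 * np.pi * (hour - 9) / 24)}, index=idx)

# Average every hourly reading that falls in each month
monthly_avg = df.groupby(df.index.month_name())['temp'].mean()

# The month label with the lowest average
coldest_month = monthly_avg.idxmin()
print(coldest_month, round(monthly_avg.min(), 2))
```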


### 5. Which month has the highest average temperature?
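The same monthly averages answer this question with `idxmax()` in place of `idxmin()`. Sketch only: the data is a hypothetical synthetic stand-in for `data.sf_temps()`, so run it on the notebook's `df` to get the real month.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.sf_temps() (made-up hourly values for one year)
idx = pd.date_range('2010-01-01', periods=24 * 365, freq='h', name='date')
doy, hour = idx.dayofyear.to_numpy(), idx.hour.to_numpy()
season = np.sin(2 * np.pi * (doy - 110) / 365)
df = pd.DataFrame({'temp': 55 + 6 * season + (5 + 3 * season) * np.sin(2 * np.pi * (hour - 9) / 24)}, index=idx)

# Average temperature per month, sorted so the warmest month is listed first
monthly_avg = df.groupby(df.index.month_name())['temp'].mean().sort_values(ascending=False)

warmest_month = monthly_avg.idxmax()
print(warmest_month, round(monthly_avg.max(), 2))
```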


### 6. Resample by the day and calculate the min and max temp for the day (Hint: .agg(['min', 'max'])). Use this resampled dataframe to calculate the change in temperature for the day. Which month has the highest daily temperature variability?
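A sketch following the hint: `.agg(['min', 'max'])` on the daily resample gives one row per day, the difference is that day's temperature range, and averaging the range per month ranks months by variability. Averaging is one reading of "highest daily temperature variability"; taking the monthly `max()` of the range instead would answer "largest single-day swing". The data is a hypothetical synthetic stand-in for `data.sf_temps()`; in the notebook, reuse the indexed `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.sf_temps() (made-up hourly values for one year)
idx = pd.date_range('2010-01-01', periods=24 * 365, freq='h', name='date')
doy, hour = idx.dayofyear.to_numpy(), idx.hour.to_numpy()
season = np.sin(2 * np.pi * (doy - 110) / 365)
df = pd.DataFrame({'temp': 55 + 6 * season + (5 + 3 * season) * np.sin(2 * np.pi * (hour - 9) / 24)}, index=idx)

# One row per day with columns 'min' and 'max'
daily = df['temp'].resample('D').agg(['min', 'max'])

# Daily temperature change: how far the temperature swung within each day
daily['range'] = daily['max'] - daily['min']

# Mean daily range per month; the largest value marks the most variable month
monthly_range = daily.groupby(daily.index.month_name())['range'].mean()
most_variable_month = monthly_range.idxmax()
print(most_variable_month, round(monthly_range.max(), 2))
```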


### 7. Bonus: Visualize the daily min, average, and max temperature over time on a single line plot, i.e., the min, average, and max temperature should be 3 separate lines.
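A minimal sketch: asking `.agg()` for three statistics at once yields a DataFrame with `min`, `mean`, and `max` columns, and `DataFrame.plot()` draws each column as its own line with a legend. It assumes matplotlib is installed and builds a hypothetical synthetic stand-in for `data.sf_temps()`; in the notebook, reuse the indexed `df`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.sf_temps() (made-up hourly values for one year)
idx = pd.date_range('2010-01-01', periods=24 * 365, freq='h', name='date')
doy, hour = idx.dayofyear.to_numpy(), idx.hour.to_numpy()
season = np.sin(2 * np.pi * (doy - 110) / 365)
df = pd.DataFrame({'temp': 55 + 6 * season + (5 + 3 * season) * np.sin(2 * np.pi * (hour - 9) / 24)}, index=idx)

# Three statistics per day, one column each
daily = df['temp'].resample('D').agg(['min', 'mean', 'max'])

# Each column becomes a separate line on the same axes
ax = daily.plot(figsize=(12, 4), title='Daily min, mean, and max temperature')
ax.set_ylabel('Temperature')
plt.show()
```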