In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# What is Time Series Data?

Time series data is data where the progression of time matters.
> - Is the temporal information a key focus of the data?  

## Examples

- Stock prices
- Temperature over the year
- Atmospheric changes over the course of decades

## Loading in time series

In [None]:
# Load and display
df_temp = pd.read_csv("min_temp.csv")
display(df_temp.head(20))
df_temp.info()  # info() prints its summary and returns None, so display() is not needed

## Make data readable as a datetime

In [None]:
# Parse the date strings into proper datetimes using an explicit format string
df_temp['Date'] = pd.to_datetime(df_temp['Date'], format='%d/%m/%y')

# Make the dates the index so that time is the focus of the DataFrame
df_temp = df_temp.set_index('Date')

In [None]:
display(df_temp.head(10))
df_temp.info()  # info() prints its summary and returns None
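
The `format` string matters because day-first and month-first dates look identical for many values. A minimal sketch on made-up strings (not the contents of `min_temp.csv`) shows how the same text parses to different dates:

```python
import pandas as pd

# Hypothetical date strings, for illustration only
raw = pd.Series(["02/01/81", "03/01/81"])

# Same text, two different interpretations
day_first = pd.to_datetime(raw, format="%d/%m/%y")    # 2 Jan and 3 Jan 1981
month_first = pd.to_datetime(raw, format="%m/%d/%y")  # 1 Feb and 1 Mar 1981

print(day_first.dt.strftime("%Y-%m-%d").tolist())
print(month_first.dt.strftime("%Y-%m-%d").tolist())
```

Checking a few parsed rows against the raw file is a cheap way to confirm the format string is the right one.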

## Slicing time series data

In [None]:
# Select every row from 1990 onward (partial-string slicing on the DatetimeIndex)
after_1990 = df_temp.loc['1990':]
display(after_1990.head(15))

In [None]:
df_temp

## Follow-up: Why should we make the date the index?
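
One way to explore this question is with a small synthetic series (the values and the `demo` name below are made up). With a `DatetimeIndex`, pandas accepts partial date strings in `.loc`, so whole years or months can be selected without writing comparisons:

```python
import numpy as np
import pandas as pd

# Synthetic daily data, for illustration only
idx = pd.date_range("1989-01-01", "1990-12-31", freq="D")
demo = pd.DataFrame({"Temp": np.arange(len(idx), dtype=float)}, index=idx)

year_1990 = demo.loc["1990"]       # every row in 1990
march_1990 = demo.loc["1990-03"]   # every row in March 1990
from_july = demo.loc["1990-07":]   # July 1990 through the end of the data

print(len(year_1990), len(march_1990), len(from_july))
```

The same index is also what methods such as `resample` rely on.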

# Visualizing Time Series

## Showing Changes Over Time

Visualizations can help us identify patterns and trends.

In [None]:
# New York Stock Exchange average monthly returns [1961-1966] from curriculum
nyse = pd.read_csv("NYSE_monthly.csv")
col_name = 'Month'
nyse[col_name] = pd.to_datetime(nyse[col_name])
nyse.set_index(col_name, inplace=True)

display(nyse.head(10))
nyse.info()  # info() prints its summary and returns None

### Line Plot

In [None]:
nyse.plot(figsize = (16,6))
plt.show()

### Dot Plot

In [None]:
nyse.plot(figsize = (16,6), style="*")
plt.show()

### Question time: Dot vs Line Plots

Note the difference between this and the line plot

When would you want a dot vs a line plot?

### Grouping Plots

What if we wanted to compare year to year (e.g., temperature across many years)?

There are a couple of options to choose from.

### Example: each year in its own subplot (from curriculum)

In [None]:
# Annual frequency: group the rows by calendar year
# (newer pandas versions prefer the alias 'YE' over 'A')
year_groups = nyse.groupby(pd.Grouper(freq='A'))

In [None]:
# Create a new DataFrame and store yearly values in columns
nyse_annual = pd.DataFrame()

for yr, group in year_groups:
    nyse_annual[yr.year] = group.values.ravel()
    
# Plot the yearly groups as subplots
nyse_annual.plot(figsize = (13,8), subplots=True, legend=True)
plt.show()

### Example: all years together in one plot (from curriculum)

In [None]:
# Plot overlapping yearly groups 
nyse_annual.plot(figsize = (15,5), subplots=False, legend=True)
plt.show()

## Showing Distributions

Sometimes the distribution of the values is important.

What are some reasons?

- Checking for normality (for stat testing)
- First check on raw & transformed data
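
As a rough numeric companion to the plots below, skewness and excess kurtosis are both close to zero for normally distributed data. The sketch uses randomly generated values, not the NYSE returns, and is a quick sanity check rather than a formal test:

```python
import numpy as np
import pandas as pd

# Synthetic, normally distributed sample (seeded so the run is repeatable)
rng = np.random.default_rng(0)
sample = pd.Series(rng.normal(loc=0.0, scale=1.0, size=1000))

skewness = sample.skew()         # ~0 for a symmetric distribution
excess_kurtosis = sample.kurt()  # pandas reports excess kurtosis, ~0 for a normal

print(round(skewness, 2), round(excess_kurtosis, 2))
```

Values far from zero suggest a skewed or heavy-tailed distribution, which is usually visible in the histogram as well.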

### Histogram

In [None]:
nyse.hist(figsize = (10,6))
plt.show()

In [None]:
# Use fewer bins so the overall shape (normal or not) is easier to see
nyse.hist(figsize = (10,6), bins = 7)
plt.show()

### Density

In [None]:
nyse.plot(kind='kde', figsize = (15,10))
plt.show()

### Box Plot

- Shows how the distribution changes over time
- Can help show outliers
- Can reveal seasonal trends

#### Example

In [None]:
# Generate a box-and-whisker plot for the nyse_annual DataFrame (one box per year)
nyse_annual.boxplot(figsize = (12,7))
plt.show()

### Heat Maps

Heat maps use color to show patterns in the data across a time period.

#### Example of how heat maps are useful

In [None]:
# Create a new DataFrame and store yearly values in columns for temperature
temp_annual = pd.DataFrame()

for yr, group in df_temp.groupby(pd.Grouper(freq ='A')):
    temp_annual[yr.year] = group.values.ravel()
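
The loop above relies on every year contributing the same number of rows. If the data included a 29 February, the column lengths would differ and the assignment would raise an error. A sketch of one alternative, using synthetic data and hypothetical `year` and `doy` helper columns, is to pivot on year and day of year, which leaves missing days as `NaN`:

```python
import numpy as np
import pandas as pd

# Synthetic daily data spanning a leap year (1984), for illustration only
idx = pd.date_range("1981-01-01", "1984-12-31", freq="D")
values = np.sin(2 * np.pi * idx.dayofyear.to_numpy() / 365.25)
demo = pd.DataFrame({"Temp": values}, index=idx)

# Helper columns (hypothetical names) used as the pivot keys
demo["year"] = demo.index.year
demo["doy"] = demo.index.dayofyear

# One row per year, one column per day of year; day 366 is NaN outside leap years
matrix = demo.pivot(index="year", columns="doy", values="Temp")
print(matrix.shape)
```

The resulting `matrix` already has years as rows, so it could be passed to a heat map without transposing.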

##### Plotting each line plot in a subplot

Let's use our strategy in plotting multiple line plots to see if we can see a pattern:

In [None]:
# Plot the yearly groups as subplots
temp_annual.plot(figsize = (16,8), subplots=True, legend=True)
plt.show()

You will likely have a hard time seeing exactly what the temperature shift is throughout the year (if it even exists!)

We can try plotting all the lines together to see if a pattern is more obvious in our visual.

##### Plotting all line plots in one plot

In [None]:
# Plot overlapping yearly groups 
temp_annual.plot(figsize = (15,5), subplots=False, legend=True)
plt.show()

That's great: we can see that the temperature decreases in the middle of the data! But we have now sacrificed the ability to observe any pattern for an individual year.

This is where a heat map can help visualize temperature patterns throughout the year! And of course, heat maps can be used for more than just temperature-related data.

##### And finally, using a heat map to visualize a pattern

In [None]:
# Transpose so that each row is a year and each column is a day of that year
year_matrix = temp_annual.T
plt.matshow(year_matrix, interpolation=None, aspect='auto', cmap=plt.cm.Spectral_r)
plt.show()

☝🏼 Look at that beautiful visual pattern! Makes me want to weep with joy for all the information density available to us!

# Resampling

Resampling converts the time series to a particular frequency.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling

## Downsampling

- resample at a lower rate
- may lose information
- more computationally efficient

### Example

In [None]:
# Average out so we have monthly means (compared to using days)
monthly = df_temp.resample('MS')
month_mean = monthly.mean()

In [None]:
# Only the last expression in a cell is shown automatically, so display both
display(month_mean.head(10))
display(df_temp.head(10))
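
Downsampling always needs an aggregation, and the choice determines which information survives. The monthly mean above hides the extremes, for example. A sketch on synthetic daily values (not the temperature file) that keeps several aggregates per month:

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering three months, for illustration only
idx = pd.date_range("1990-01-01", "1990-03-31", freq="D")
daily = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="Temp")

# One row per month, one column per aggregate
monthly_stats = daily.resample("MS").agg(["mean", "min", "max"])
print(monthly_stats)
```

Keeping `min` and `max` next to the mean retains some of the spread that a single average discards.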

## Upsampling

- resample at a higher rate
- should keep the original information (the new time points have to be filled in)

### Example

In [None]:
# Resample to every 12 hours, keeping only the known values (new rows are NaN)
bidaily = df_temp.resample('12H').asfreq()
bidaily.head(10)

In [None]:
# Resample to every 12 hours and forward-fill the unknown rows with the last known value (no blanks)
bidaily = df_temp.resample('12H').ffill()
bidaily.head(10)

In [None]:
hourly = df_temp.resample('1H').ffill()
hourly.head(30)
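
Forward-filling repeats the last known value, so the upsampled series moves in steps. Linear interpolation is one alternative that estimates the values in between. A small sketch with made-up numbers comparing the two (the 12-hour step is passed as a `pd.Timedelta` here rather than a string alias):

```python
import pandas as pd

# Three made-up daily readings, for illustration only
idx = pd.date_range("1990-01-01", periods=3, freq="D")
daily = pd.Series([10.0, 20.0, 30.0], index=idx, name="Temp")

step = pd.Timedelta(hours=12)

# Forward fill: each new row copies the previous known value
filled = daily.resample(step).ffill()

# Linear interpolation: each new row is estimated between its neighbours
interpolated = daily.resample(step).interpolate(method="linear")

print(pd.DataFrame({"ffill": filled, "linear": interpolated}))
```

Neither approach adds real information: both are assumptions about what happened between observations.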