# Time Series Data

* Time series data opens up additional ways to manipulate our data, and pandas was originally developed with time series data in mind.
* Pandas has dedicated date/time types (such as `Timestamp` and `DatetimeIndex`) and can parse most common date/time formats
* See https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html

In [None]:
import numpy as np
import pandas as pd
from datetime import datetime



## Resampling 
* Resampling is the process of converting a time series from one frequency to another.
  * downsampling: going from a higher frequency (e.g. daily) to a lower frequency (e.g. weekly)
  * upsampling: going from a lower frequency (e.g. weekly) to a higher frequency (e.g. daily)
  * remapping: aligning data to a set frequency (e.g. mapping weekly data to Sundays)
  
Offset aliases: https://pandas.pydata.org/docs/user_guide/timeseries.html

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# We can create date ranges with
#pd.date_range?
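
As a minimal sketch of what `pd.date_range` can do, the cell below builds one range from a start date and a number of periods, and another from two endpoints and a weekly frequency. The dates are arbitrary and chosen only for illustration.

```python
import pandas as pd

# Nine consecutive days, given a start date and a number of periods
daily = pd.date_range("2016-01-01", periods=9, freq="D")

# Every Sunday between two endpoints (an endpoint is included if it is a Sunday)
sundays = pd.date_range("2016-01-01", "2016-01-31", freq="W-SUN")

print(daily)
print(sundays)
```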

In [None]:
# Some sample data


In [None]:
# Resampling needs a date/time index on the dataframe
# When we resample we need to decide on the new frequency and on how to combine or fill values
# Let's change our daily data down to weekly data


In [None]:
# Just like groupby, this is an object which will do the resampling for us
# Since we are downsampling (D->W) we need to decide how to aggregate our datapoints
# We are now very used to this!
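
A minimal sketch of the daily-to-weekly downsampling described above. The data here is hypothetical (28 made-up daily values starting on a Monday), and the mean is only one possible choice of aggregation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 28 daily values starting Monday 2020-10-05
dates = pd.date_range("2020-10-05", periods=28, freq="D")
daily_df = pd.DataFrame({"value": np.arange(28, dtype=float)}, index=dates)

# resample() returns a Resampler object, much like groupby() returns a GroupBy
weekly = daily_df.resample("W")

# Downsampling needs an aggregation; here we take the mean of each week
weekly_mean = weekly.mean()
print(weekly_mean)
print(weekly_mean.index.freqstr)
```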


* Notice the frequency is now W-SUN (weekly, anchored on Sunday: each bin ends on a Sunday and is labeled with that date)
* When we downsample we are "binning" our values and need to determine which end of each bin is open and which is closed
* By default the right side is **closed** for weekly binning, which is what we used here
  * Closed vs. open can be confusing! For example, with daily bins, does an observation at exactly 00:00 on Tuesday, October 13, 2020 belong with Tuesday's observations or with Monday's?
  * If you have defined the bins as left closed, then it belongs with Tuesday's. If you have defined them as right closed, then it sits on the closed right edge of Monday's bin and belongs with Monday's.

# Here's an example
* If you have a bunch of data sampled in seconds and you are downsampling to minutes then:
  * if you are **left closed** you are saying "each bin holds the values at or after its starting minute and strictly before (**<**) the next whole minute"
  * if you are **right closed** you are saying "each bin holds the values after its starting minute and up to and including (**<=**) the next whole minute"


In [None]:
# Let's look at 9 seconds which cross the minute boundary


In [None]:
# if we resample this to 1 minute intervals closed on the left 
# then the first five seconds will be binned to the left value (<)


In [None]:
# if we resample this to 1 minute intervals closed on the right 
# then the first six seconds will be binned to the left value (<=)
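
The two cases above can be sketched as follows. This assumes nine one-second observations running from 00:00:55 to 00:01:03 (the values 0 to 8 are made up) and counts how many land in each one-minute bin.

```python
import pandas as pd

# Nine one-second observations, 00:00:55 through 00:01:03 (hypothetical values)
seconds = pd.date_range("2020-10-13 00:00:55", periods=9, freq=pd.Timedelta(seconds=1))
ts = pd.Series(range(9), index=seconds)

# Left closed: bins are [00:00, 00:01) and [00:01, 00:02)
left_counts = ts.resample("1min", closed="left").count()

# Right closed: bins are (00:00, 00:01] and (00:01, 00:02]
right_counts = ts.resample("1min", closed="right").count()

print(left_counts)   # 5 observations in the first bin, 4 in the second
print(right_counts)  # 6 observations in the first bin, 3 in the second
```

The only observation that moves between bins is the one at exactly 00:01:00, which is the point of the open/closed distinction.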


<a href="https://stackoverflow.com/questions/48340463/how-to-understand-closed-and-label-arguments-in-pandas-resample-method">https://stackoverflow.com/questions/48340463/how-to-understand-closed-and-label-arguments-in-pandas-resample-method</a>
<img src="https://i.stack.imgur.com/nX6yv.png"></img>

In [None]:
# Also, downsampling really is an aggregation exercise, so you can apply any aggregation you like
# With upsampling there is nothing to aggregate; instead we decide how to fill the new rows.

# Let's create a dataframe with two weekly index values and four columns. First the index:


In [None]:
# Now we upsample from weekly frequency to daily frequency
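
A minimal sketch of this upsampling step, assuming a hypothetical dataframe with two Sunday timestamps and four made-up columns named `A` to `D`. Calling `asfreq()` on the resampler adds the new daily rows without filling them.

```python
import numpy as np
import pandas as pd

# Two weekly timestamps (Sundays) and four columns of hypothetical values
weekly_index = pd.date_range("2016-01-03", periods=2, freq="W-SUN")
weekly_df = pd.DataFrame(
    np.arange(8, dtype=float).reshape(2, 4),
    index=weekly_index,
    columns=["A", "B", "C", "D"],
)

# Upsample to daily frequency; the six days in between have no data yet
daily_df = weekly_df.resample("D").asfreq()
print(daily_df)
```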


In [None]:
# As you can see, upsampling introduces NaN values. Let's fill them in,
# using either forward fill or backward fill


In [None]:
# We can also choose to fill only a certain number of periods, using the limit
# parameter of the ffill() function. For instance, here we fill at most
# three observations after each known value
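
The fill options above can be sketched on a small hypothetical series with two weekly observations. The values 1.0 and 2.0 are made up, and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical weekly series with two observations, one week apart
weekly = pd.Series(
    [1.0, 2.0],
    index=pd.date_range("2016-01-03", periods=2, freq="W-SUN"),
)
upsampled = weekly.resample("D")

filled_forward = upsampled.ffill()         # carry the last known value forward
filled_backward = upsampled.bfill()        # pull the next known value backward
filled_limited = upsampled.ffill(limit=3)  # fill at most three rows after each known value

print(filled_limited)
```

With `limit=3`, the three days right after the first observation are filled and the remaining three days of the gap stay NaN.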


# Working with time series data
* We've now seen downsampling and upsampling, and have a better sense of how date ranges are handled in pandas
* Let's go back to a favorite dataset of ours which has lots of interesting time series data in it and try and explore a bit

In [None]:
df=pd.read_excel("datasets/AnnArbor-TicketViolation2016.xls",skiprows=1)
print(df.columns)
print(df.dtypes)

In [None]:
df.head()

In [None]:
# First up, let's create a date/time index. We have an issue date column and 
# an issuetime column
def clean_time(x):
    pass

df=df.set_index(df[["Issue Date ","IssueTime"]].apply(clean_time, axis=1))
df.head()
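
One way the index could be built is sketched below. This is not the notebook's own `clean_time`: it assumes, without having checked the spreadsheet, that `Issue Date ` holds a date and that `IssueTime` holds an integer of the form HHMM (e.g. `1343` for 13:43). The helper name `combine_date_and_time` and the toy rows are hypothetical.

```python
import pandas as pd

def combine_date_and_time(row):
    """Hypothetical helper: a date plus an HHMM integer gives a single Timestamp."""
    date = pd.to_datetime(row["Issue Date "])
    hours, minutes = divmod(int(row["IssueTime"]), 100)
    return date + pd.Timedelta(hours=hours, minutes=minutes)

# Toy rows standing in for the real spreadsheet (all values are made up)
toy = pd.DataFrame({
    "Issue Date ": pd.to_datetime(["2016-01-04", "2016-01-04"]),
    "IssueTime": [905, 1343],
    " Fine ": [15.0, 35.0],
})
toy = toy.set_index(toy[["Issue Date ", "IssueTime"]].apply(combine_date_and_time, axis=1))
print(toy.index)
```

If the real columns are stored differently (for example, times as strings), the parsing inside the helper would need to change accordingly.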

In [None]:
%matplotlib inline
# Now let's plot the fines over the year!
import matplotlib.pyplot as plt


In [None]:
# That's meaningless... How would we find signal in that noise?
# Let's zoom in on a single month, pandas does the "right thing" with date/time slicing!
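
A minimal sketch of date/time slicing, using a hypothetical stand-in for the ticket data (one made-up value per day in 2016) so that it runs without the spreadsheet.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ticket data: one made-up fine per day in 2016
index = pd.date_range("2016-01-01", "2016-12-31", freq="D")
fines = pd.Series(np.arange(len(index), dtype=float), index=index, name=" Fine ")

# A partial date string selects every row that falls inside that period
january = fines.loc["2016-01"]

# Slices written with date strings include both endpoints
mid_january = fines.loc["2016-01-10":"2016-01-16"]

print(len(january), len(mid_january))
```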


In [None]:
# This is, btw, much cooler than it seems at first; check this out


In [None]:
# so this means we can use date/times as masks!
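
As a sketch of the masking idea, comparing a `DatetimeIndex` with a date string produces a boolean array that can be used like any other mask. The ten daily values below are hypothetical.

```python
import pandas as pd

# Hypothetical data: ten daily values starting 2016-01-01
index = pd.date_range("2016-01-01", periods=10, freq="D")
fines = pd.Series(range(10), index=index)

# Comparing the index with a date string yields one boolean per row
mask = fines.index > "2016-01-07"

# The mask selects rows like any other boolean mask
late = fines[mask]
print(late)
```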


In [None]:
# Now let's resample this and look at daily totals
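
A minimal sketch of daily totals on made-up data. It assumes a hypothetical pattern of one 10-dollar fine per hour with nothing issued on Sundays, purely to show how an empty day appears after resampling; it says nothing about the real dataset.

```python
import pandas as pd

# Hypothetical tickets: one per hour for two weeks, with Sundays removed
hourly = pd.date_range("2016-01-04", "2016-01-17 23:00", freq=pd.Timedelta(hours=1))
hourly = hourly[hourly.dayofweek != 6]  # Monday=0 ... Sunday=6
fines = pd.Series(10.0, index=hourly)

# Downsample to one row per day, summing the fines issued that day
daily_totals = fines.resample("D").sum()

print(daily_totals)
print(pd.Timestamp("2016-01-10").day_name())
```

Days with no observations still get a row after resampling, and their sum is 0.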


In [None]:
# January 10th, 2016 was a Sunday! It looks pretty clear that very few tickets are given out on Sundays.
# We could also look at tickets per hour in a single week


In [None]:
# That 13th-14th has some big values, let's zoom in a bit


In [None]:
# We can also explore multiple series of data plotted on the same chart by executing plot() on a
# dataframe multiple times in a single cell
# Here: the number of tickets issued in each 15-minute window
df.loc["2016-01-13":"2016-01-14", " Fine "].resample("15min").apply(len).plot()
