## Pandas time series
A Series or DataFrame that uses a DatetimeIndex instead of a RangeIndex.

Used for storing events or data that fit on a timeline, for example:
- Weather data
- Temperature readings
- Heart rate monitoring (EKG)
- Quarterly sales
- Stock prices


In [3]:
import pandas as pd
import numpy as np


### Creating a DatetimeIndex
Use pd.date_range() with exactly three of the following four parameters:
- start
- end
- periods
- freq
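As a minimal sketch of the rule above, the calls below each pass a different combination of three parameters and still describe the same ten daily timestamps. The dates and variable names are illustrative only and are not taken from the notebook.

```python
import pandas as pd

# start + end + freq: every day between the two dates (both inclusive)
a = pd.date_range(start="2018-01-01", end="2018-01-10", freq="D")

# start + periods + freq: ten daily steps forward from the start date
b = pd.date_range(start="2018-01-01", periods=10, freq="D")

# end + periods + freq: ten daily steps ending on the end date
c = pd.date_range(end="2018-01-10", periods=10, freq="D")

# start + end + periods: ten evenly spaced timestamps, no freq given
d = pd.date_range(start="2018-01-01", end="2018-01-10", periods=10)

print(a.equals(b), b.equals(c), len(d))
```

Which combination to use depends on what is known up front: a fixed date span, or a fixed number of samples.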

In [18]:
datetimeindex = pd.date_range(start='2018-01-01', end='2019-12-31', freq="D")
datetimeindex

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',
               '2018-01-09', '2018-01-10',
               ...
               '2019-12-22', '2019-12-23', '2019-12-24', '2019-12-25',
               '2019-12-26', '2019-12-27', '2019-12-28', '2019-12-29',
               '2019-12-30', '2019-12-31'],
              dtype='datetime64[ns]', length=730, freq='D')

In [21]:
df = pd.DataFrame(
    dict(
        n = range(len(datetimeindex)),
        rand=np.random.random(len(datetimeindex))
    ),
    index=datetimeindex
)
df

Unnamed: 0,n,rand
2018-01-01,0,0.309735
2018-01-02,1,0.775406
2018-01-03,2,0.716517
2018-01-04,3,0.162927
2018-01-05,4,0.925509
...,...,...
2019-12-27,725,0.149614
2019-12-28,726,0.964352
2019-12-29,727,0.604193
2019-12-30,728,0.038104


In [30]:
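df.loc["2018-03-15"]               # A specific date
df.loc[:"2018-03-15"]              # From the start up to and including a date
df.loc["2018-03-15":"2018-03-20"]  # From one date to another (both inclusive)
df.loc["2018-03"]                  # All of March 2018
df.loc["2018"]                     # All of 2018
df.loc["2018-03-15":"2018-04"]     # From 15 March through the end of April 2018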

Unnamed: 0,n,rand
2018-03-15,73,0.014592
2018-03-16,74,0.514785
2018-03-17,75,0.50513
2018-03-18,76,0.951583
2018-03-19,77,0.183226
2018-03-20,78,0.165062
2018-03-21,79,0.605856
2018-03-22,80,0.752729
2018-03-23,81,0.297867
2018-03-24,82,0.398402


### Resampling 
Resampling means creating a new set of samples at a lower or higher frequency than the original data.

#### Downsampling
When the new sample frequency is lower than the original, each new sample covers several original values, which we aggregate (for example with sum, mean or max).
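A small self-contained sketch of downsampling, assuming a made-up series of 14 daily values (the names `daily` and `weekly` are illustrative). It groups the days into weeks and aggregates each week in three ways.

```python
import pandas as pd

# 14 daily values starting on Monday 2018-01-01
idx = pd.date_range(start="2018-01-01", periods=14, freq="D")
daily = pd.Series(range(14), index=idx)

# "W" groups the days into weeks ending on Sunday;
# agg() with a list returns one column per aggregation
weekly = daily.resample("W").agg(["sum", "mean", "max"])
print(weekly)
```

Each row of the result is labelled with the Sunday that closes its week, so the 14 daily rows collapse into 2 weekly rows.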

In [36]:
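from helpers import hdisplay  # local notebook helper, not part of pandas

resample_method = "Y"  # year-end frequency; pandas 2.2+ spells this alias "YE"
hdisplay(
    [
        df.head(),
        # One row per year: largest n and the sum of rand
        df.resample(resample_method).agg({"n": "max", "rand": "sum"}).head(10),
    ],
    ["Original", f"Resampled using '{resample_method}'"],
)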

Unnamed: 0,n,rand
2018-01-01 00:00:00,0,0.309735
2018-01-02 00:00:00,1,0.775406
2018-01-03 00:00:00,2,0.716517
2018-01-04 00:00:00,3,0.162927
2018-01-05 00:00:00,4,0.925509

Unnamed: 0,n,rand
2018-12-31 00:00:00,364,184.751663
2019-12-31 00:00:00,729,190.059239


#### Upsampling
When the new sample frequency is higher than the original, the new rows between the original values are empty and can be filled with one of:
- ffill()
- bfill()
- nearest()
- interpolate()
- fillna()
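The fill methods listed above can be compared side by side in a minimal sketch. It assumes a made-up series of three daily values upsampled to 12-hour steps; the names `daily`, `upsampled` and `comparison` are illustrative. The constant fill uses the series mean purely as an example value.

```python
import pandas as pd

# Three daily values: 0, 10, 20
idx = pd.date_range(start="2018-01-01", periods=3, freq="D")
daily = pd.Series([0.0, 10.0, 20.0], index=idx)

# Upsample to 12-hour steps; the new midday rows have no source value
upsampled = daily.resample("12h")

comparison = pd.DataFrame({
    "asfreq": upsampled.asfreq(),            # leave the gaps as NaN
    "ffill": upsampled.ffill(),              # repeat the previous value
    "bfill": upsampled.bfill(),              # use the next value
    "nearest": upsampled.nearest(),          # use the closest value in time
    "interpolate": upsampled.interpolate(),  # linear interpolation between neighbours
    "fillna(mean)": upsampled.asfreq().fillna(daily.mean()),  # constant fill
})
print(comparison)
```

Forward and backward fill keep only values that were actually observed, while interpolation and a constant fill introduce values that never occurred in the data, so the choice depends on what the series represents.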

In [51]:
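resample_method = "6H"  # every 6 hours; pandas 2.2+ spells this alias "6h"
hdisplay(
    [
        df.head(10),
        # New rows have no source value, so max() gives NaN; fill them with the column means
        df.resample(resample_method).max().fillna(df[["n", "rand"]].mean()).head(10),
    ],
    ["Original", f"Resampled using '{resample_method}'"],
)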

Unnamed: 0,n,rand
2018-01-01 00:00:00,0,0.309735
2018-01-02 00:00:00,1,0.775406
2018-01-03 00:00:00,2,0.716517
2018-01-04 00:00:00,3,0.162927
2018-01-05 00:00:00,4,0.925509
2018-01-06 00:00:00,5,0.904025
2018-01-07 00:00:00,6,0.308862
2018-01-08 00:00:00,7,0.862209
2018-01-09 00:00:00,8,0.488998
2018-01-10 00:00:00,9,0.887913

Unnamed: 0,n,rand
2018-01-01 00:00:00,0.0,0.309735
2018-01-01 06:00:00,364.5,0.51344
2018-01-01 12:00:00,364.5,0.51344
2018-01-01 18:00:00,364.5,0.51344
2018-01-02 00:00:00,1.0,0.775406
2018-01-02 06:00:00,364.5,0.51344
2018-01-02 12:00:00,364.5,0.51344
2018-01-02 18:00:00,364.5,0.51344
2018-01-03 00:00:00,2.0,0.716517
2018-01-03 06:00:00,364.5,0.51344
