## Pandas time series 
A pandas time series is a Series or DataFrame that uses a DatetimeIndex instead of the default RangeIndex.

Time series are used for storing events or data that fit on a timeline:
- Weather data.
- Temperature readings.
- Heart rate monitoring (EKG).
- Quarterly sales.
- Stock prices.
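
To make the difference between the two index types concrete, here is a minimal sketch. The temperature values and dates are made up for illustration.

```python
import pandas as pd

values = [20.5, 21.0, 19.8]  # made-up temperature readings

# Default: the values are labelled 0, 1, 2 by a RangeIndex
plain = pd.Series(values)

# Time series: the same values labelled by timestamps
dates = pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"])
timed = pd.Series(values, index=dates)

print(type(plain.index).__name__)
print(type(timed.index).__name__)
```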

In [35]:
import pandas as pd
import numpy as np

### Creating a datetimeindex

Use `date_range()` with exactly three of the four parameters:
- start
- end
- periods
- freq
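
As a small sketch of how the four parameters interact (the dates are arbitrary, chosen only for illustration), each combination of three lets pandas work out the fourth:

```python
import pandas as pd

# start + end + freq: pandas works out the number of periods
by_end = pd.date_range(start="2018-01-01", end="2018-01-10", freq="D")

# start + periods + freq: pandas works out the end date
by_periods = pd.date_range(start="2018-01-01", periods=10, freq="D")

# end + periods + freq: pandas works out the start date
by_start = pd.date_range(end="2018-01-10", periods=10, freq="D")

# start + end + periods: pandas spaces the timestamps evenly between the endpoints
evenly_spaced = pd.date_range(start="2018-01-01", end="2018-01-10", periods=4)

print(len(by_end), by_periods.equals(by_end), by_start.equals(by_end))
print(evenly_spaced)
```

The first three calls describe the same ten days; the last one has no `freq` and simply places four timestamps at equal distances.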

In [36]:
datetimeindex = pd.date_range(start="2018-01-01", end="2019-12-31", freq="D")
datetimeindex

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',
               '2018-01-09', '2018-01-10',
               ...
               '2019-12-22', '2019-12-23', '2019-12-24', '2019-12-25',
               '2019-12-26', '2019-12-27', '2019-12-28', '2019-12-29',
               '2019-12-30', '2019-12-31'],
              dtype='datetime64[ns]', length=730, freq='D')

In [37]:
df = pd.DataFrame(
    dict(
        n=range(len(datetimeindex)),
        rand=np.random.random(len(datetimeindex))
    ),
    index=datetimeindex
)

df

Unnamed: 0,n,rand
2018-01-01,0,597.79m
2018-01-02,1,201.75m
2018-01-03,2,199.47m
2018-01-04,3,575.06m
2018-01-05,4,944.94m
...,...,...
2019-12-27,725,225.78m
2019-12-28,726,356.04m
2019-12-29,727,855.43m
2019-12-30,728,942.21m


In [38]:
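# Partial string indexing on a DatetimeIndex.
# Only the result of the last expression is displayed.
df.loc["2018-03-15"]                # a single day
df.loc["2018-01-15"]                # another single day
df.loc["2018-03-15":"2018-03-20"]   # a range of days, both endpoints included
df.loc["2018-03"]                   # every row in March 2018
df.loc["2019"]                      # every row in 2019
df.loc["2019-03-29":"2019-05"]      # 29 March 2019 through the end of May 2019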


Unnamed: 0,n,rand
2019-03-29,452,74.46m
2019-03-30,453,18.90m
2019-03-31,454,133.16m
2019-04-01,455,385.59m
2019-04-02,456,123.10m
...,...,...
2019-05-27,511,362.75m
2019-05-28,512,843.95m
2019-05-29,513,158.09m
2019-05-30,514,962.62m


### Resampling
Resampling is the practice of creating new samples with a lower or higher frequency than the original data.

### Downsampling

When the new sample frequency is lower than the original, we can aggregate values from the original data.
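
A minimal sketch of the idea, using six made-up daily values so the aggregates are easy to check by hand:

```python
import pandas as pd

# Six daily readings with known values
index = pd.date_range(start="2018-01-01", periods=6, freq="D")
daily = pd.Series([1, 2, 3, 4, 5, 6], index=index)

# Downsample to 2-day bins and aggregate the values in each bin
two_day_sum = daily.resample("2D").sum()
two_day_mean = daily.resample("2D").mean()

print(two_day_sum)
print(two_day_mean)
```

Each 2-day bin is labelled by its first day, and `resample()` on its own only groups the rows: nothing is computed until an aggregation such as `sum()` or `mean()` is called.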

In [39]:
df.resample("2D")

<pandas.core.resample.DatetimeIndexResampler object at 0x0000020F63383A90>

In [40]:
from helpers import hdisplay

resample_method = "Y"

hdisplay([
    df.head(10),
    df.resample(resample_method).agg({"n": "max", "rand": "sum"}).head(10)],
    ["Original", f"Resampled using '{resample_method}'"]
)

Unnamed: 0,n,rand
2018-01-01 00:00:00,0,0.597794
2018-01-02 00:00:00,1,0.20175
2018-01-03 00:00:00,2,0.199467
2018-01-04 00:00:00,3,0.575055
2018-01-05 00:00:00,4,0.944936
2018-01-06 00:00:00,5,0.295957
2018-01-07 00:00:00,6,0.823455
2018-01-08 00:00:00,7,0.472547
2018-01-09 00:00:00,8,0.770428
2018-01-10 00:00:00,9,0.541101

Unnamed: 0,n,rand
2018-12-31 00:00:00,364,189.113335
2019-12-31 00:00:00,729,170.61996


### Upsampling
When the new sample frequency is higher than the original, the new rows have no data of their own, so we fill the gaps between the original values using one of:
- ffill
- bfill
- nearest
- interpolation
- fillna
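
The fill strategies listed above can be compared side by side in a small sketch. The two values (0.0 and 9.0, three days apart) and the constant -1.0 are made up so the differences are easy to see.

```python
import pandas as pd

# Two readings three days apart
index = pd.date_range(start="2018-01-01", periods=2, freq="3D")
sparse = pd.Series([0.0, 9.0], index=index)

# Upsample to daily frequency and fill the two new days in different ways
filled = pd.DataFrame({
    "ffill": sparse.resample("D").ffill(),               # repeat the previous value
    "bfill": sparse.resample("D").bfill(),               # use the next value
    "nearest": sparse.resample("D").nearest(),           # use the closest value in time
    "interpolate": sparse.resample("D").interpolate(),   # linear interpolation by default
    "fillna": sparse.resample("D").asfreq().fillna(-1.0),  # fill with a constant
})

print(filled)
```

Which strategy is appropriate depends on the data: forward fill suits values that stay valid until the next reading, while interpolation assumes the value changes smoothly between readings.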

In [41]:
resample_method = "6H"

hdisplay([
    df.head(10),
    df.resample(resample_method).max().fillna(df[["n", "rand"]].mean()).head(10)],
    ["Original", f"Resampled using '{resample_method}'"]
)

Unnamed: 0,n,rand
2018-01-01 00:00:00,0,0.597794
2018-01-02 00:00:00,1,0.20175
2018-01-03 00:00:00,2,0.199467
2018-01-04 00:00:00,3,0.575055
2018-01-05 00:00:00,4,0.944936
2018-01-06 00:00:00,5,0.295957
2018-01-07 00:00:00,6,0.823455
2018-01-08 00:00:00,7,0.472547
2018-01-09 00:00:00,8,0.770428
2018-01-10 00:00:00,9,0.541101

Unnamed: 0,n,rand
2018-01-01 00:00:00,0.0,0.597794
2018-01-01 06:00:00,364.5,0.492785
2018-01-01 12:00:00,364.5,0.492785
2018-01-01 18:00:00,364.5,0.492785
2018-01-02 00:00:00,1.0,0.20175
2018-01-02 06:00:00,364.5,0.492785
2018-01-02 12:00:00,364.5,0.492785
2018-01-02 18:00:00,364.5,0.492785
2018-01-03 00:00:00,2.0,0.199467
2018-01-03 06:00:00,364.5,0.492785


### Working on real data

In [44]:
apple = pd.read_csv("../Data/HistoricalData_Apple.csv", parse_dates=["Date"], index_col="Date")
apple.rename(columns={"Close/Last": "Close"}, inplace=True)

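to_float_columns = ["Close", "Open", "High", "Low"]

# Strip the "$" prefix and convert the price columns to floats.
# DataFrame.map only exists in pandas 2.1 and later; older versions raise
# AttributeError: 'DataFrame' object has no attribute 'map'.
# Applying the string method column by column works in both.
apple[to_float_columns] = (
    apple[to_float_columns]
    .apply(lambda column: column.str.replace("$", "", regex=False))
    .astype("float64")
)

apple.head()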

In [None]:
#pd.set_option("display.float_format", '{:.2f}'.format)
pd.set_eng_float_format(accuracy=2, use_eng_prefix=True)

apple.loc["2018", "Close"].mean()
apple.resample("Y").mean()


In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

data = apple.loc["2020-10"]
#data = apple.loc["2020"].resample("M").mean()
#data = apple.resample("QS").mean()
plt.xticks(rotation=15)
sns.lineplot(data=data, x=data.index, y="Open")