In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set() 
sns.set_style("whitegrid")


# Time series/date functionality in Pandas

Pandas was initially developed for analyzing financial time series data. It therefore contains a wide range of functionality for working with date- and time-related data.

## Timestamp
+ Date and time are encapsulated together in ```Timestamp``` objects.
+ Internally, a ```Timestamp``` is stored as an integer: the number of nanoseconds ($10^{-9}$ seconds) elapsed since the Unix epoch, 1970-01-01 00:00:00 UTC.
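
To make the integer representation concrete, here is a minimal sketch that reads the nanosecond count from a ```Timestamp``` through its `value` attribute. The date used is only an illustration.

```python
import pandas as pd

# An arbitrary example timestamp (illustrative value)
ts = pd.Timestamp("2021-11-22 09:00:10")

# .value holds the nanoseconds elapsed since 1970-01-01 00:00:00
print(ts.value)

# Integer division by 10**9 converts nanoseconds to whole seconds
print(ts.value // 10**9)

# The integer 0 corresponds to the epoch itself
print(pd.Timestamp(0))
```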


https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html

#### ```pd.to_datetime()```

It converts many common date and time representations to timestamps: strings in human-readable formats, as well as numeric offsets from the epoch, as in the example below.

In [2]:
pd.to_datetime(1637571610, unit='s')

Timestamp('2021-11-22 09:00:10')
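
The same point in time can also be parsed from strings. The sketch below is only an illustration: the strings are made up for this example, and the explicit `format` argument is used for the day-first string to avoid ambiguity.

```python
import pandas as pd

# ISO-like strings are parsed without further hints
iso = pd.to_datetime("2021-11-22 09:00:10")

# For other layouts, an explicit format string removes any ambiguity
european = pd.to_datetime("22/11/2021 09:00:10", format="%d/%m/%Y %H:%M:%S")

# Seconds since the epoch, as in the cell above
from_epoch = pd.to_datetime(1637571610, unit="s")

# All three describe the same Timestamp
print(iso == european == from_epoch)
```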

**`pd.to_datetime()` can also take a dictionary of datetime components**

In [3]:
my_dict = {'year': [2015, 2016],
                'month': [2, 3],
                'day': [4, 5],
                'minute' : [10, 20]
               }
my_dict

{'year': [2015, 2016], 'month': [2, 3], 'day': [4, 5], 'minute': [10, 20]}

In [None]:
pd.to_datetime(my_dict) # --> it returns a pandas.Series

#### Feature Engineering of dates


- ts.dt.year
- ts.dt.month: January == 1 ... December == 12
- ts.dt.month_name()
- ts.dt.day
- ts.dt.weekday: Monday == 0 … Sunday == 6
- ts.dt.day_name()
- ts.dt.minute
- ts.dt.quarter: January-->March Q1 == 1, April-->June Q2 == 2, July-->September Q3 == 3, October-->December Q4 == 4

**Let's create a toy dataframe with dates**

Q: How do we create a two-column pandas dataframe with 'description' and 'date' as column names and the description and date lists as the corresponding values?

In [None]:
# We can pass a complete column to convert all entries to date time objects [<- usual case]
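
One possible answer, as a sketch: the two lists below are hypothetical placeholders (the notebook does not define them), and `toy_df` is a name chosen only for this example.

```python
import pandas as pd

# Hypothetical example lists, not taken from the notebook
description = ["first entry", "second entry", "third entry"]
date = ["2021-11-22", "2021-12-24", "2022-02-05"]

# Build the two-column dataframe from a dictionary of lists
toy_df = pd.DataFrame({"description": description, "date": date})

# Convert the whole string column to datetime objects in one call
toy_df["date"] = pd.to_datetime(toy_df["date"])

print(toy_df.dtypes)
```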


**Time-Related Features of df**

In [None]:
# Let's create new date features
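
A sketch of how the ```.dt``` accessor listed earlier could be used to add such features. The dataframe and its values are hypothetical and rebuilt here so that the example runs on its own.

```python
import pandas as pd

# Hypothetical toy dataframe with a datetime column
toy_df = pd.DataFrame({
    "description": ["first entry", "second entry", "third entry"],
    "date": pd.to_datetime(["2021-11-22", "2021-12-24", "2022-02-05"]),
})

# Each .dt attribute returns a Series aligned with the original rows
toy_df["year"] = toy_df["date"].dt.year
toy_df["month"] = toy_df["date"].dt.month
toy_df["weekday"] = toy_df["date"].dt.weekday      # Monday == 0
toy_df["day_name"] = toy_df["date"].dt.day_name()
toy_df["quarter"] = toy_df["date"].dt.quarter

print(toy_df)
```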


---

## DatetimeIndex

```pd.date_range()```

It generates a range of dates as a **DatetimeIndex**, that is, an array of **Timestamps**.


In [None]:
start_date = ''
end_date = ''

In [None]:
# Specify start and end

pd.date_range(start = start_date, end = end_date)

In [None]:
# Specify start and periods (periods = number of timestamps to generate; with the default daily frequency, the number of days)

pd.date_range(start = start_date, periods= )

In [None]:
# Specify 'end' and 'periods'

pd.date_range(end=end_date,periods=) 

In [None]:
# Change the frequency

pd.date_range(start=start_date, periods=10, freq='')

In [None]:
date_range = pd.date_range(start=start_date, end=end_date,freq='') 
date_range
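
As a sketch of how the blanks above could be filled in: the dates, the number of periods and the weekly frequency below are assumptions chosen for illustration, not values prescribed by the notebook.

```python
import pandas as pd

# Assumed example bounds
start_date = "2021-01-01"
end_date = "2021-01-10"

# start + end: every day in the closed interval (default freq is daily)
by_bounds = pd.date_range(start=start_date, end=end_date)

# start + periods: 5 consecutive days beginning at start_date
by_start = pd.date_range(start=start_date, periods=5)

# end + periods: 5 consecutive days finishing at end_date
by_end = pd.date_range(end=end_date, periods=5)

# A different frequency: 'W' yields week ends (Sundays)
weekly = pd.date_range(start=start_date, periods=3, freq="W")

print(by_bounds)
print(weekly.day_name())
```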

**If you have a DatetimeIndex object, you can extract time-related features from it as well**

In [None]:
date_range.day_name()

**Let's read some data in a dataframe**

The file contains 8 years of stock prices.

In [None]:
!head stock_px.csv

### In time series, the time usually goes into the index, as a special DatetimeIndex
For your project, however, you don't need a DatetimeIndex, because the date is one of your features.

In [None]:
# read in data
df = pd.read_csv('stock_px.csv',parse_dates=True,index_col=0)

# parse_dates=True will try to interpret the index_col as a pd.DatetimeIndex
df
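
Since `stock_px.csv` may not be at hand, the following sketch shows the same `read_csv` call on a small in-memory CSV. The column names and prices are invented for illustration only.

```python
import io
import pandas as pd

# Invented miniature CSV standing in for a price file
csv_text = """date,AAA,BBB
2021-01-04,10.0,20.0
2021-01-05,10.5,19.5
2021-01-06,11.0,21.0
"""

# index_col=0 uses the first column as index; parse_dates=True parses it as dates
demo = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col=0)

print(type(demo.index))   # the index class
print(demo.index.year)    # date attributes are available directly on the index
```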

In [None]:
df.info()

In [None]:
df.index

In [None]:
df.index.year

In [None]:
df

In [None]:
# Plots - you don't have to specify x because pandas uses the index by default

df[['AAPL','MSFT','XOM']].plot(figsize=(12, 5)) #plot from pandas
plt.ylabel('Prices')

### Resample
```df.resample()```

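It works a bit like the ```.groupby()``` function: the rows are grouped into time bins, and an aggregation or fill method is then applied to each bin.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html

It changes the rows to a different time interval. It requires a datetime-like index, such as a DatetimeIndex.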

#### down (fewer rows)
e.g. :
+ df.resample('2D').max()
+ df.resample('3D').mean()


#### up (more rows)
e.g.:
+ df.resample('8h').ffill()
+ df.resample('8h').interpolate(method='linear')
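
A minimal sketch of both directions on a small synthetic daily series. The values and the '2D' and '12h' frequencies are chosen only for illustration.

```python
import pandas as pd

# Synthetic daily series: six days with values 1.0 ... 6.0
idx = pd.date_range("2021-01-01", periods=6, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Downsampling: two days per bin, aggregated with the mean (fewer rows)
down = s.resample("2D").mean()

# Upsampling: one row every 12 hours (more rows); the new rows must be filled
up_ffill = s.resample("12h").ffill()                         # repeat last known value
up_interp = s.resample("12h").interpolate(method="linear")   # linear between known values

print(down)
print(up_ffill.head(4))
print(up_interp.head(4))
```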

#### down

In [None]:
apple_df = df[['AAPL']]
apple_df.info()

In [None]:
apple_df.index

In [None]:
apple_df

In [None]:
for i in apple_df.resample('5D'):
    print(i)

In [None]:
apple_df.resample('5D').mean()

#### up

In [None]:
apple_df.head(10)

In [None]:
for i in apple_df.resample(''):
    print(i)

### Rolling averages

Example where the rolling average is used: https://interaktiv.tagesspiegel.de/lab/karte-sars-cov-2-in-deutschland-landkreise/

We can use it, for example, to remove the day-to-day noise from the observations.
- df.rolling('3D').mean() 
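
A small sketch on synthetic daily data, contrasting a time-based window ('3D') with a fixed window of 3 rows. The series is invented for illustration.

```python
import pandas as pd

# Synthetic daily series: six days with values 1.0 ... 6.0
idx = pd.date_range("2021-01-01", periods=6, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Time-based window: mean over the current day and the two days before it.
# At the start, the window simply contains fewer observations.
by_time = s.rolling("3D").mean()

# Row-based window: exactly 3 rows are required, so the first two results are NaN
by_rows = s.rolling(3).mean()

print(by_time)
print(by_rows)
```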

In [None]:
apple_df.head(10)

In [None]:
for i in apple_df.rolling(window='3D'): # window of three consecutive days
    print(i)

In [None]:
apple_df.rolling(window='3D').mean().head() # Average of the current day and the previous 2 days

In [None]:
# Plot with one single line of pandas
apple_df.rolling(window='7D').mean().rename(columns={"AAPL": "AAPL_7D_mean"}).plot(figsize=(12,5));

In [None]:
# Plotting all together (with the original data)
ax = apple_df.plot(color='r', figsize=(15, 9))
apple_df.rolling('30D').mean().rename(columns={"AAPL": "AAPL_30D_mean"}).plot(ax=ax)

---
#### Bikeshare data

In [None]:
train = pd.read_csv('../data/train.csv')

In [None]:
train.sample()

In [None]:
train.info()