# 3.1. DateTime in Pandas 🐼

## Additional Learning Resources
Refer to [scikit-learn documentation](https://scikit-learn.org/stable/) and the [Pandas user guide](https://pandas.pydata.org/docs/) for detailed explanations of the functions used in this notebook.
For a quick refresher on splitting data:
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```


## How do I work with dates and time in Pandas?

### Time series/date functionality in Pandas

Pandas was initially developed for analyzing financial time series data.
It therefore offers a wide range of functionality for working with date- and time-related data.



In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set() 
sns.set_style("darkgrid")

## Timestamp objects
* Date and time are encapsulated together in Timestamp objects.
* The datatype of a Timestamp object is datetime64

### There are two main ways of creating Timestamps or a DatetimeIndex:
+ 1) pd.to_datetime()
+ 2) pd.date_range()


## 1. Converting Strings to Dates 

* ### Method: pd.to_datetime( )

It converts most common human-readable date and time strings to Timestamps. Ambiguous strings (for example day-first versus month-first) may need a hint such as `dayfirst=True`.

In [None]:
type(pd.to_datetime("Monday 24 January 2000 1:10 pm"))

In [None]:
pd.to_datetime("24.01.2022")

In [None]:
pd.to_datetime("24/01/2022", dayfirst=True)

In [None]:
pd.to_datetime("24.01.2022 13:10:01")

In [None]:
pd.to_datetime("today")

In [None]:
pd.to_datetime(123, unit = 'm') 

# Numeric input is counted from the Unix epoch, 1970-01-01 00:00:00 UTC; here the result is 123 minutes after it.

In [None]:
dates = ['24/01/2024', '25/01/2024']
dates = pd.to_datetime(dates, dayfirst=True)  # the strings are day-first, so state the order explicitly
dates

In [None]:
my_dict = {'year': [2015, 2016],
           'month': [2, 3],
           'day': [4, 5],
           'minute': [10, 20]}

In [None]:
dict_1 = pd.to_datetime(my_dict)

In [None]:
dict_1
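
The conversions above rely on pandas inferring the date format. When the format is known, passing it explicitly is usually safer. The following is a minimal sketch, assuming day-first strings and one deliberately invalid entry (the list `raw` is made up for illustration):

```python
import pandas as pd

# Hypothetical input: two day-first dates and one entry that is not a date
raw = ["24/01/2024", "25/01/2024", "not a date"]

# An explicit format removes the day-first / month-first ambiguity.
# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

print(parsed)
print(parsed.isna())
```

`NaT` ("not a time") plays the same role for dates that `NaN` plays for numbers, so the usual missing-value tools such as `isna()` apply.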

## 2. Creating Date Ranges





* ### Method: pd.date_range()

* It generates a range of dates as a DatetimeIndex, that is, an array of Timestamps

In [None]:
pd.date_range('1 May 2021', periods=31, freq='D')

In [None]:

pd.date_range(start= '1 May 2021', end= '31 May 2021', freq='D')

In [None]:
dates = pd.date_range('1 May 2021', '31 May 2021')
dates

In [None]:
dates.day_name()

In [None]:
max_temperature = [18,24,11,15,20,20,22,15,16,17,10,11,20,18,16,16,14,10,10,10,15,15,15,17,13,16,17,17,12,13,12]
min_temperature = [9,15,5,7,10,9,12,7,6,7,3,6,3,4,3,2,3,4,2,6,7,6,5,6,5,7,7,4,6,2,2,]

In [None]:
df = pd.DataFrame({'dates': dates, 'max_t': max_temperature, 'min_t': min_temperature})

In [None]:
df.info()

In [None]:
df.head(5)

### Very important for your project!!! 

## 3. Accessing DateTime Indices

* ts.dt.year

* ts.dt.month

* ts.dt.month_name()

* ts.dt.day

* ts.dt.weekday: Monday == 0 … Sunday == 6

* ts.dt.day_name()

* ts.dt.minute

* ts.dt.quarter
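
In the list above, `ts` stands for a Series (for example a DataFrame column) of datetime values, which is why the `.dt` accessor is needed. On a DatetimeIndex the same attributes are available directly, without `.dt`. A small sketch with made-up variable names `idx` and `ts`:

```python
import pandas as pd

idx = pd.date_range("2021-05-01", periods=3, freq="D")  # DatetimeIndex
ts = pd.Series(idx)                                      # Series of datetime64 values

# On a Series, datetime attributes live under the .dt accessor
print(ts.dt.day_name().tolist())   # ['Saturday', 'Sunday', 'Monday']

# On a DatetimeIndex, the same attributes are accessed directly
print(idx.day_name().tolist())     # ['Saturday', 'Sunday', 'Monday']
print(idx.weekday.tolist())        # [5, 6, 0]
```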

In [None]:
dates.weekday

In [None]:
dates.month_name()

In [None]:
dates.day

In [None]:
dates.quarter

## 4. Timedelta

### Pandas has built-in Timedelta objects
* An array of Timedelta objects is a TimedeltaIndex
* The datatype of a Timedelta object is timedelta64
* There are three ways of creating Timedeltas or a TimedeltaIndex:

+ #### pd.to_timedelta()
+ #### pd.timedelta_range()
+ ####  Subtract two pd.Timestamp objects
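
As a quick illustration of the first method, `pd.to_timedelta()` accepts both strings and numbers with a unit. The values below are arbitrary examples:

```python
import pandas as pd

# A single Timedelta from a string
one_and_half_days = pd.to_timedelta("1 days 12:00:00")

# A TimedeltaIndex from numbers, interpreted as hours
from_numbers = pd.to_timedelta([1, 2, 3], unit="h")

# Timedeltas can be added to Timestamps
start = pd.Timestamp("2021-05-01")
print(start + one_and_half_days)   # 2021-05-02 12:00:00
print(from_numbers)
```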


In [None]:
dates

In [None]:
dates[-1]

In [None]:
dates[0]

In [None]:
dates[-1] - dates[0]

In [None]:
pd.timedelta_range('10 days', periods=20, freq='12h')

### Other datetime functionalities:

- conveniently slicing/indexing based on dates
- convert time zones (dt.tz_localize, dt.tz_convert)
- rolling aggregates and resampling
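
The first two points can be sketched on a small hourly series. The data is made up, and the example assumes the time zone database is available in the environment. Note that on a Series with a DatetimeIndex, `tz_localize` and `tz_convert` act on the index, while `dt.tz_localize` and `dt.tz_convert` are the equivalents for a datetime column.

```python
import pandas as pd

# Hypothetical data: 48 hourly values starting on 1 May 2021
idx = pd.date_range("2021-05-01", periods=48, freq="h")
s = pd.Series(range(48), index=idx)

# Partial-string indexing: select every row that falls on 2 May 2021
may_2 = s.loc["2021-05-02"]
print(len(may_2))  # 24

# Attach a time zone to the naive timestamps, then convert to another zone
s_utc = s.tz_localize("UTC")
s_berlin = s_utc.tz_convert("Europe/Berlin")
print(s_berlin.index[0])  # 2021-05-01 02:00:00+02:00
```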

## 5. Creating DatetimeIndex

In [None]:
df.set_index(pd.to_datetime(df['dates']), inplace= True)


In [None]:
df

In [None]:
df.drop(['dates'], axis = 1, inplace= True)


In [None]:
df.head(5)

In [None]:
plt.figure(figsize=(15,8))
sns.lineplot(data=df, markers=True)

plt.xticks(rotation=50)
plt.ylabel('Temperature [C]')
plt.xlabel('Date')

plt.title('Max and Min Temp')

plt.show()


## 6. Other Concepts:

### Resample


#### Downsampling (7-day bins in the cell below)

* df.resample(rule).mean()
* df.resample(rule).max()

In [None]:
df.head(7)

In [None]:
df.resample('7D').max()

#### Upsampling to 8-hour intervals

* df.resample(rule).ffill()
* df.resample(rule).bfill()

In [None]:
df.resample('8h').ffill()

In [None]:
df.resample('8h').bfill()
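
To see what the two directions do, it can help to resample a tiny frame where the result is easy to check by hand. This sketch uses a made-up frame `toy` holding the first six `max_t` values from above:

```python
import pandas as pd

idx = pd.date_range("2021-05-01", periods=6, freq="D")
toy = pd.DataFrame({"max_t": [18, 24, 11, 15, 20, 20]}, index=idx)

# Downsampling: each 3-day bin is reduced to a single value
down = toy.resample("3D").max()
print(down["max_t"].tolist())        # [24, 20]

# Upsampling: new 12-hourly rows are created and must be filled
up_ffill = toy.resample("12h").ffill()   # copy the previous known value forward
up_bfill = toy.resample("12h").bfill()   # copy the next known value backward
print(up_ffill["max_t"].iloc[1])     # 18 (value of 1 May carried forward)
print(up_bfill["max_t"].iloc[1])     # 24 (value of 2 May carried backward)
```

Downsampling needs an aggregation (`mean`, `max`, ...) because several rows collapse into one. Upsampling needs a fill rule because the new rows have no observed values.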

### Rolling averages


* df.rolling(window).mean()

In [None]:
df.rolling('3D').mean()

In [None]:
# A 3-day moving average takes the values in the current 3-day window, adds them up, and divides by the number of values.
# The window moves row by row, so in the end you get a series of averages.
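
The notebook uses both `rolling('3D')` and `rolling(3)`. On daily data they cover the same rows, but they treat the start of the series differently. A small sketch with made-up values:

```python
import pandas as pd

idx = pd.date_range("2021-05-01", periods=4, freq="D")
s = pd.Series([18.0, 24.0, 11.0, 15.0], index=idx)

# Count-based window: needs 3 observations, so the first two results are NaN
by_count = s.rolling(3).mean()

# Time-based window: averages whatever falls within the past 3 days,
# so the first rows are computed from fewer values
by_time = s.rolling("3D").mean()

print(by_count.tolist())
print(by_time.tolist())
```

Here `by_time` starts with 18.0 and 21.0 (averages over one and two values), whereas `by_count` starts with two missing values.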

In [None]:
df.rolling(3).mean().rename(columns={"max_t": "mean_max_T", "min_t": "mean_min_T"}).plot(figsize=(12,5))

In [None]:
ax = df.plot.line( y=['min_t', 'max_t'], figsize=(12,5), color = 'k', style="--")
df.rolling('3D').mean().rename(columns={"max_t": "mean_max_T", "min_t": "mean_min_T"}).plot(ax = ax)
plt.ylabel('Temperature [C]')
plt.ylim([0,35])


## 7. Bikeshare Dataset

In [None]:
df_bike = pd.read_csv('train.csv', index_col=0, parse_dates=True)

In [None]:
df_bike.info()

In [None]:
df_bike.head()

In [None]:
df_bike.loc['2011-11-01':'2011-11-15'].between_time('00:00:00', '01:00:00')

## 👀  What next?  
### Create time-related features
+ Examine whether the bicycle count shows any time-related patterns.
+ Extract features like hour, month etc. from the datetime column.
+ Plot small sections of the data (1 day, 1 week etc.)
+ Group by a time feature and observe grouped means
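
The steps above could look roughly like the sketch below. It does not use `train.csv`; instead it builds a hypothetical stand-in frame `demo` with a DatetimeIndex and a random `count` column (the column name is an assumption), so the numbers themselves carry no real pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the bikeshare data: two weeks of hourly rows
idx = pd.date_range("2011-01-01", periods=24 * 14, freq="h")
rng = np.random.default_rng(0)
demo = pd.DataFrame({"count": rng.integers(0, 200, size=len(idx))}, index=idx)

# Extract time features from the DatetimeIndex
demo["hour"] = demo.index.hour
demo["weekday"] = demo.index.weekday
demo["month"] = demo.index.month

# Group by a time feature and look at the grouped means
hourly_mean = demo.groupby("hour")["count"].mean()
print(hourly_mean.head())

# Select a small section (one day) for a closer look or a plot
one_day = demo.loc["2011-01-03"]
print(one_day.shape)
```

On the real data, the same `groupby` on `hour` or `weekday` is one way to check whether demand follows a daily or weekly rhythm.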

[Course Material: 3.1. DateTime in Pandas](https://spiced.space/ordinal-oregano/ds-course/chapters/project_bicycles/ts_pandas/README.html)

In [None]:
# Practice: implement the steps discussed above
