In [1]:
import pandas as pd

- The way to convert a given date or time to a datetime is with **pd.to_datetime()**

In [2]:
# Example of date time conversion using pandas

pd.to_datetime('January/02/2011')

Timestamp('2011-01-02 00:00:00')

In [3]:
pd.to_datetime('Jan, 01, 2010')

Timestamp('2010-01-01 00:00:00')

- As long as the date is written in a reasonably common format, pandas can parse it into a datetime on its own

In [4]:
# the following produces an error because this is not a common way to write a date

pd.to_datetime('Jan:1:1970')

ParserError: Unknown string format: Jan:1:1970

- To fix the error above, supply the `format` argument, which describes how the date is written.
**Example**

In [None]:
# fixing the error from above

pd.to_datetime('Jan:1:1970', format='%b:%d:%Y')

# here we have supplied the format of the date. The format codes (%b, %d, %Y, ...) are
# Python's strftime/strptime directives, listed in the Python datetime documentation
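The `format` string matters most when a date is ambiguous. The sketch below reuses the string from above and adds one made-up numeric date, `'02/01/1970'`, which reads differently depending on whether the day or the month comes first.

```python
import pandas as pd

# Same text as above: abbreviated month name (%b), day (%d), 4-digit year (%Y)
ts_colon = pd.to_datetime('Jan:1:1970', format='%b:%d:%Y')

# An ambiguous numeric string: the format decides which field is the day
day_first = pd.to_datetime('02/01/1970', format='%d/%m/%Y')    # 2 January 1970
month_first = pd.to_datetime('02/01/1970', format='%m/%d/%Y')  # 1 February 1970

print(ts_colon, day_first, month_first, sep='\n')
```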

- **Let's learn datetime manipulation with a hands-on approach**

In [None]:
# getting data

df = pd.read_csv('https://gist.githubusercontent.com/zgulde/c60209f379155744ced4dfc57068e55b/raw/97fc3ab1e2f6fff6ffd409d60029c53954d5784e/coffee_consumption.csv')

In [None]:
df

- The dataframe has two columns plus an index. One column holds the date and the other holds the amount of coffee consumed on that date

In [None]:
# let's check the datatypes of the dataframe

df.dtypes

# the date is not of datetime type, so we must convert it first

In [None]:
# let's convert our date column to datetime type
df['date'] = pd.to_datetime(df['date'])
df.dtypes
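To see the dtype change in isolation, here is a minimal sketch on a small hypothetical frame; the dates and consumption values are made up and are not taken from the coffee dataset.

```python
import pandas as pd

# Hypothetical stand-in for the coffee data: the dates arrive as plain strings
toy = pd.DataFrame({'date': ['2019-01-01', '2019-01-02', '2019-01-03'],
                    'coffee_consumption': [14.3, 12.9, 13.7]})
before_dtype = toy['date'].dtype   # a string/object dtype, not a datetime
print(toy.dtypes)

toy['date'] = pd.to_datetime(toy['date'])
print(toy.dtypes)                  # 'date' is now a datetime64 dtype
```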

- We can access the year, month, month name, day, and day name from a datetime column
    - one way is through the `.dt` accessor: df.date.dt.year, .month, .month_name(), .day or .day_name()
    - another way is to set the date as the index and then use df.index.year, .month, .month_name(), .day or .day_name()
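The cell below uses the `.dt` route; as a rough sketch of the index route, here is a hypothetical two-row frame whose dates already form a `DatetimeIndex`.

```python
import pandas as pd

# Hypothetical two-row frame with the dates as a DatetimeIndex
idx = pd.to_datetime(['2019-01-01', '2019-02-14'])
toy = pd.DataFrame({'coffee_consumption': [14.3, 12.9]}, index=idx)

# With a datetime index, the same attributes hang directly off toy.index (no .dt)
print(toy.index.year.tolist())          # [2019, 2019]
print(toy.index.month.tolist())         # [1, 2]
print(toy.index.month_name().tolist())  # ['January', 'February']
print(toy.index.day.tolist())           # [1, 14]
print(toy.index.day_name().tolist())    # ['Tuesday', 'Thursday']
```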

In [None]:
# Accessing the year, month, day, and day name from our datetime dtype.

df['year'] = df.date.dt.year
df['month'] = df.date.dt.month
df['day'] = df.date.dt.day
df['weekday'] = df.date.dt.day_name()
df.head()

In [None]:
# clean up the demonstration columns
df = df.drop(columns=['year', 'month', 'day', 'weekday'])

**DateTime Indexers**
- Setting a datetime column as the index opens up many ways to manipulate the data (slicing by date, upsampling, resampling) and is an extremely handy tool
- So, we almost always set the date as the index whenever we manipulate data with respect to time

In [None]:
# setting date as index. sort_index() matters: slicing by a date range relies on the index being in chronological order

df = df.set_index('date').sort_index()

In [None]:
df

In [None]:
df.index.min(), df.index.max()

- We can slice the dataframe using the datetime index. Unlike Python list slicing, both the start and the end dates are included.
**Example**

In [None]:
# unlike list slicing, the end date here is included in the result

df.loc['2019-05-03':'2019-05-25']
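A minimal sketch of the inclusive endpoints on a hypothetical 10-day series, contrasted with positional slicing, which behaves like a Python list:

```python
import pandas as pd

# Hypothetical daily series covering 1-10 May 2019, values 0..9
s = pd.Series(range(10), index=pd.date_range('2019-05-01', periods=10, freq='D'))

# Label-based slicing on a DatetimeIndex keeps BOTH endpoints
window = s.loc['2019-05-03':'2019-05-05']
print(window.index.strftime('%Y-%m-%d').tolist())  # ['2019-05-03', '2019-05-04', '2019-05-05']

# Positional slicing, like a Python list, excludes the stop position
print(len(s.iloc[2:4]))  # 2
```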

**We can downsample or upsample our data when the dates are stored as a datetime index**


**downsample:** wider time window == fewer data points

**upsample:** smaller time window == more data points
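As a small illustration on hypothetical data (one reading every two days), upsampling with `asfreq('D')` adds empty daily slots, while downsampling with `resample('4D').sum()` collapses readings into 4-day windows:

```python
import pandas as pd

# Hypothetical data: one reading every 2 days (1, 3, 5, 7, 9 January)
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range('2019-01-01', periods=5, freq='2D'))

up = s.asfreq('D')             # upsample: one row per day, new rows are NaN
down = s.resample('4D').sum()  # downsample: 4-day windows, values summed

print(len(s), len(up), len(down))  # 5 9 3
print(down.tolist())               # [3.0, 7.0, 5.0]
```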

In [None]:
# Our data appears to skip two days after every five (likely weekends). Upsampling with
# asfreq('D') adds a row for every calendar day; the newly added rows contain NaN

df_by_day = df.asfreq('D')

In [None]:
df_by_day

**Filling missing values that we just introduced**

In [None]:
df_by_day.coffee_consumption.ffill()

In [None]:
df_by_day.coffee_consumption.bfill()

In [None]:
# we fill with 0 because no one comes to the office on weekends,
# so coffee consumption on those days is 0
df = df_by_day.fillna(0)
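The three filling strategies above give different results for the same gap. Here is a quick sketch on a hypothetical series with a two-day hole:

```python
import numpy as np
import pandas as pd

# Hypothetical series: known values on the first and last day, a two-day gap between
s = pd.Series([10.0, np.nan, np.nan, 16.0],
              index=pd.date_range('2019-01-04', periods=4, freq='D'))

forward = s.ffill()   # carry the last known value forward
backward = s.bfill()  # carry the next known value backward
zeros = s.fillna(0)   # treat the gap as zero consumption

print(forward.tolist())   # [10.0, 10.0, 10.0, 16.0]
print(backward.tolist())  # [10.0, 16.0, 16.0, 16.0]
print(zeros.tolist())     # [10.0, 0.0, 0.0, 16.0]
```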

In [None]:
df

### Resampling
- Works like `groupby`, but groups rows into time windows (e.g. weeks or months) instead of by category
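To make the groupby comparison concrete, here is a minimal sketch on a hypothetical two-week series: `resample('W')` gives the same result as grouping with `pd.Grouper(freq='W')`.

```python
import pandas as pd

# Hypothetical daily series starting Monday 7 January 2019, values 0..13
s = pd.Series(range(14), index=pd.date_range('2019-01-07', periods=14, freq='D'))

by_resample = s.resample('W').sum()                 # weekly bins ending on Sunday
by_groupby = s.groupby(pd.Grouper(freq='W')).sum()  # the same bins via groupby

print(by_resample.tolist())               # [21, 70]
print(by_resample.equals(by_groupby))     # True
```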

In [None]:
df.resample('3W').mean()

In [None]:
# 'M' means month end; pandas 2.2+ deprecates it in favour of 'ME' (e.g. '3ME')
df.resample('3M').sum()

In [None]:
# getting the monthly average of weekly total coffee consumption
df.resample('W').sum().resample('M').mean()

In [None]:
# df.resample('MS') - month start
# df.resample('Q') - Quarterly

## Plotting

In [None]:
df.plot()

In [None]:
df.resample('3M').mean().plot(marker = 'o')

### Rolling windows
- A rolling window computes a statistic over the last *n* rows at each point, e.g. a rolling (moving) average
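A minimal sketch of what `rolling(2).mean()` produces on a hypothetical four-day series; the first value is NaN because a 2-row window is not yet full, unless `min_periods` allows partial windows:

```python
import pandas as pd

# Hypothetical four-day series
s = pd.Series([2.0, 4.0, 6.0, 10.0],
              index=pd.date_range('2019-01-01', periods=4, freq='D'))

avg2 = s.rolling(2).mean()                        # average of each day and the day before
avg2_partial = s.rolling(2, min_periods=1).mean()  # allow a 1-row window at the start

print(avg2.tolist())          # [nan, 3.0, 5.0, 8.0]
print(avg2_partial.tolist())  # [2.0, 3.0, 5.0, 8.0]
```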

In [None]:
df.rolling(2).mean()

### Lagging and Leading
- A way to compare each data point with earlier (lagging) or later (leading) data points
- **shift** moves data points forward or backward
- **diff** takes the difference between each data point and a past (or, with a negative period, future) data point
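This section has no code cell, so here is a small sketch on a hypothetical four-day series showing a lag, a lead, and a difference side by side:

```python
import pandas as pd

# Hypothetical four-day series
s = pd.Series([10.0, 12.0, 9.0, 15.0],
              index=pd.date_range('2019-01-01', periods=4, freq='D'))

lag = s.shift(1)    # previous day's value lined up with today (first row NaN)
lead = s.shift(-1)  # next day's value lined up with today (last row NaN)
change = s.diff()   # today minus yesterday, i.e. s - s.shift(1)

print(pd.DataFrame({'value': s, 'lag': lag, 'lead': lead, 'diff': change}))
```

`s.diff(-1)` works the same way in the other direction, computing `s - s.shift(-1)`.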

### Other Things You Can Do

**Create a range of datetime**

In [7]:
pd.date_range('2020-01-01', periods = 365, freq = 'D')

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
               '2020-01-09', '2020-01-10',
               ...
               '2020-12-21', '2020-12-22', '2020-12-23', '2020-12-24',
               '2020-12-25', '2020-12-26', '2020-12-27', '2020-12-28',
               '2020-12-29', '2020-12-30'],
              dtype='datetime64[ns]', length=365, freq='D')
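The range above ends on 2020-12-30 because 2020 is a leap year. Giving a start and an end date instead of `periods` lets pandas work out the count; a minimal sketch:

```python
import pandas as pd

# Start and end instead of periods: pandas works out how many days fit
year_2020 = pd.date_range('2020-01-01', '2020-12-31', freq='D')
print(len(year_2020))  # 366, because 2020 is a leap year
print(year_2020[-1])   # 2020-12-31 00:00:00
```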

In [13]:
# creating a range of periods (whole time spans, here days, rather than single instants)

dti = pd.period_range('2020/10/01', freq = 'D', periods = 5)
dti

PeriodIndex(['2020-10-01', '2020-10-02', '2020-10-03', '2020-10-04',
             '2020-10-05'],
            dtype='period[D]', freq='D')
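Each element of a `PeriodIndex` covers a whole span of time. As a rough sketch, `start_time` and `end_time` show the span's bounds, and `to_timestamp()` converts the periods back to timestamps:

```python
import pandas as pd

pi = pd.period_range('2020/10/01', freq='D', periods=5)
first = pi[0]

print(first.start_time)      # 2020-10-01 00:00:00
print(first.end_time)        # the last instant of 1 October 2020
print(pi.to_timestamp()[0])  # 2020-10-01 00:00:00
```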

**Manipulate the timezone**

In [9]:
dti = pd.date_range('2020-01-01', periods = 5, freq = 'H')

# attaching the US/Central time zone to the naive timestamps (tz_localize does not shift the clock times)
dti = dti.tz_localize('US/Central')
dti

DatetimeIndex(['2020-01-01 00:00:00-06:00', '2020-01-01 01:00:00-06:00',
               '2020-01-01 02:00:00-06:00', '2020-01-01 03:00:00-06:00',
               '2020-01-01 04:00:00-06:00'],
              dtype='datetime64[ns, US/Central]', freq=None)
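`tz_localize` only labels naive timestamps with a zone; to express the same instants in another zone, use `tz_convert`. A minimal sketch (the UTC target zone is an illustrative choice):

```python
import pandas as pd

# Naive daily timestamps, labelled as US/Central (CST, UTC-6, in January)
local = pd.date_range('2020-01-01', periods=3, freq='D').tz_localize('US/Central')

# tz_convert keeps the same instants but shows them in UTC
utc = local.tz_convert('UTC')
print(utc[0])  # 2020-01-01 06:00:00+00:00
```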