# Working With Dates in Pandas

What is it?
- this lesson covers the ways we can create, convert, and manipulate dates in pandas

Why do we care?
- being able to manipulate dates will allow us to prepare the data to analyze trends over time

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

## Create your own date

#### using pandas

In [None]:
#with Timestamp()
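A minimal sketch of building a single date with `pd.Timestamp()`. The dates and variable names here are illustrative, not part of the lesson data.

```python
import pandas as pd

# build a Timestamp from a date string
ts_from_string = pd.Timestamp('2022-03-26')

# build a Timestamp from individual keyword parts
ts_from_parts = pd.Timestamp(year=2022, month=3, day=26, hour=14)

print(ts_from_string)
print(type(ts_from_parts))
```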


#### using the datetime class from the datetime module

In [None]:
#import


In [None]:
#with datetime()
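One way this could look, assuming we import the `datetime` class from the standard-library `datetime` module. The specific date is made up for illustration; `datetime.now()` returns the current local time, so its value changes on every run.

```python
from datetime import datetime

# year, month, and day are required; hour, minute, second default to 0
my_date = datetime(2022, 3, 26, 14, 30)

# the current local date and time
right_now = datetime.now()

print(my_date, type(my_date))
print(right_now)
```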


In [None]:
#datatype


#### calculate now (using datetime module)

In [None]:
#datatype


## Add/subtract dates

#### subtract two dates
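A small sketch with two illustrative dates: subtracting one Timestamp from another gives a `Timedelta`, which stores the elapsed time.

```python
import pandas as pd

start = pd.Timestamp('2022-01-01')
end = pd.Timestamp('2022-03-26')

# date - date returns a Timedelta
elapsed = end - start

print(elapsed)
print(elapsed.days)  # 84
```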

#### use Timedelta to alter a date
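A sketch of shifting a date with `pd.Timedelta()`, using an illustrative starting date. Adding or subtracting a Timedelta returns a new Timestamp.

```python
import pandas as pd

date = pd.Timestamp('2022-03-26')

# move forward one week
one_week_later = date + pd.Timedelta(days=7)

# move backward twelve hours
twelve_hours_earlier = date - pd.Timedelta(hours=12)

print(one_week_later)
print(twelve_hours_earlier)
```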

## Transform to date format

### One date

In [None]:
date = 'Jan 1 1970'

In [None]:
#datatype


In [None]:
#use pd.to_datetime() to convert
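A sketch of the conversion for the string used in this section: `pd.to_datetime()` can usually infer common layouts like this one on its own.

```python
import pandas as pd

date = 'Jan 1 1970'
print(type(date))  # a plain Python string

# let pandas infer the layout and return a Timestamp
converted = pd.to_datetime(date)
print(converted, type(converted))
```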


In [None]:
#datatype


### One date, but confuse pandas

In [None]:
date = 'Jan:7:1970'

In [None]:
#datatype


In [None]:
#use pd.to_datetime() to convert


We can fix this error using the `format` argument.

For info on formatting: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
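A sketch of how the `format` argument might be used on the colon-separated string above. The pattern `'%b:%d:%Y'` is our reading of that string: abbreviated month name, day, four-digit year, separated by colons.

```python
import pandas as pd

date = 'Jan:7:1970'

# describe the layout of the string with strftime/strptime codes
converted = pd.to_datetime(date, format='%b:%d:%Y')

print(converted)
```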

In [None]:
#use format argument


#### using `strftime` to reformat date to more readable version
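A sketch of going the other direction: `strftime()` turns a Timestamp back into a string in whatever layout we choose. The output pattern is just one readable option, and month names assume the default English locale.

```python
import pandas as pd

date = pd.to_datetime('Jan:7:1970', format='%b:%d:%Y')

# full month name, zero-padded day, four-digit year
readable = date.strftime('%B %d, %Y')

print(readable)
print(type(readable))  # the result is a string, not a Timestamp
```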

In [None]:
#datatype


### Now a whole column of dates

Data: the amount of coffee consumed per day

In [None]:
url = "https://gist.githubusercontent.com/ryanorsinger/\
b309f8db19e0ca71b213d4877d835e77/raw/f5841017310e2f4ca070b313529ceec2375336ba/coffee_consumption.csv"
df = pd.read_csv(url)

In [None]:
#look at data


In [None]:
#datatypes


In [None]:
#use pd.to_datetime() to convert


In [None]:
#can also use .astype() to convert
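A sketch of both conversion routes on a small stand-in dataframe. The rows and column names below are hypothetical, chosen to resemble the coffee data rather than copied from it. Note that recent pandas versions expect a unit in the dtype string, so `'datetime64[ns]'` is used here.

```python
import pandas as pd

# hypothetical stand-in for the coffee data
coffee = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-02', '2019-01-03'],
    'coffee_consumption': [14.3, 12.9, 10.0],
})

# option 1: pd.to_datetime (also accepts a format argument)
coffee['date_via_to_datetime'] = pd.to_datetime(coffee['date'])

# option 2: .astype with an explicit datetime dtype
coffee['date_via_astype'] = coffee['date'].astype('datetime64[ns]')

print(coffee.dtypes)
```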


<div class="alert alert-block alert-info"> <b>NOTE:</b> use pd.to_datetime when you have weird dates, so you can use the format argument </div>

In [None]:
#datatypes


## Now that they are in a date format, let's manipulate them

### extract pieces of the date

<div class="alert alert-block alert-info"> <b>NOTE:</b> use <code>.dt</code> when using datetime methods/attributes on a series (just like using <code>.str</code> when using string functions) </div>

the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DatetimeIndex.html
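A sketch of pulling pieces out of a datetime series with `.dt`. The dates and the new column names are illustrative.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2019-01-01', '2019-02-15', '2019-12-31']))

# attributes (no parentheses) and methods (parentheses) both go through .dt
parts = pd.DataFrame({
    'date': dates,
    'year': dates.dt.year,
    'month': dates.dt.month,
    'day': dates.dt.day,
    'weekday': dates.dt.day_name(),
})

print(parts)
```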

### add them back to our initial dataframe

In [None]:
#add them all


### reformat date using `strftime()`
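A sketch of `strftime()` on a whole series: it goes through `.dt` and returns strings, so the result is no longer a datetime column. The dates and pattern are illustrative.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2019-01-01', '2019-02-15']))

# abbreviated month, zero-padded day, four-digit year
formatted = dates.dt.strftime('%b %d, %Y')

print(formatted)
```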

## Mini exercise 

1. import the datetime class from the datetime module
2. convert the date column to a datetime
3. subtract one day from each date
4. reformat the dates as "Sat - March 26, 2022"


In [None]:
url = 'https://gist.githubusercontent.com/misty-garcia/\
8c099128d3f59c32afaa5aa2c3e4fb62/raw/2a4c06ea955266e276a78af5d2e1083cfd348703/mockdates'

df = pd.read_csv(url,sep='\t')

## Time to make it more complex!

Scenario: We're looking at cryptocurrency close value and volume over time. 

In [None]:
#save url
sheet_url = 'https://docs.google.com/spreadsheets/d/1kTrAFSrr-xP3REs0Lly0TdV4ekrHahBXLg9r5qKxmV8/edit#gid=0'
csv_export_url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')

#read in df
df = pd.read_csv(csv_export_url)
df

In [None]:
#lowercase columns


#### let's plot our close value

In [None]:
#distribution

# plt.title('close')
# plt.show()

In [None]:
#line plot

# plt.title('the close values on a plot')
# plt.show()

<div class="alert alert-block alert-info"> 
    
<b>NOTE:</b> when we plot a single series using .plot(), the x-axis is the index value 

</div> 
 
 

## How do we make pandas time aware?

1. Convert 'date' column to datetime object
2. Set the datetime column as index
3. Sort the datetime index
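The three steps above can be sketched as follows. The rows are hypothetical and deliberately out of order, and the `format` string is an assumption about how the dates are written, so it may need adjusting for the real sheet.

```python
import pandas as pd

# hypothetical rows, out of order
prices = pd.DataFrame({
    'date': ['2017-07-03 00:00', '2017-07-01 00:00', '2017-07-02 00:00'],
    'close': [280.0, 265.0, 270.0],
})

# 1. convert the 'date' column to datetime
prices['date'] = pd.to_datetime(prices['date'], format='%Y-%m-%d %H:%M')

# 2. set the datetime column as the index
prices = prices.set_index('date')

# 3. sort the datetime index
prices = prices.sort_index()

print(prices)
```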

### 1. Convert 'date' column to datetime object

Reminder: the `format` argument tells pandas how our date strings are laid out, so they can be parsed correctly

### 2. Set the datetime column as index

### 3. Sort the datetime index

#### Now let's plot it again!

In [None]:
#line plot

# plt.title('the close value over time')
# plt.show()

Q: Why is the graph different now?

## Let's look at the mean close value on each day of the week

<div class="alert alert-block alert-info"> <b>NOTE:</b> when the datetime is an index, we don't use <code>.dt</code> to use datetime methods/attributes</div>

In [None]:
#pull out weekday name & save
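A sketch with a tiny made-up dataframe: because the dates are the index, `day_name()` is called directly on the index without `.dt`, and the saved column can then be used to group.

```python
import pandas as pd

# hypothetical prices on two Mondays and two Tuesdays
index = pd.to_datetime(['2017-07-03', '2017-07-04', '2017-07-10', '2017-07-11'])
prices = pd.DataFrame({'close': [10.0, 20.0, 30.0, 40.0]}, index=index)

# no .dt needed on a DatetimeIndex
prices['weekday'] = prices.index.day_name()

# mean close for each day of the week
mean_by_day = prices.groupby('weekday')['close'].mean()
print(mean_by_day)
```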


#### let's plot it!

In [None]:
# plt.figure(figsize=(10,6))


# plt.title('the mean close value each day of the week')
# plt.show()

### I don't like that the days aren't in order

In [None]:
#use dayofweek attribute
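One possible way to get the days in calendar order, sketched on made-up data: `dayofweek` numbers the days Monday=0 through Sunday=6, so grouping by that number first sorts the result by weekday instead of alphabetically. The column name `weekday_num` is our own.

```python
import pandas as pd

# hypothetical prices: a Sunday, a Monday, a Wednesday, another Monday
index = pd.to_datetime(['2017-07-09', '2017-07-03', '2017-07-05', '2017-07-10'])
prices = pd.DataFrame({'close': [40.0, 10.0, 20.0, 30.0]}, index=index)

prices['weekday'] = prices.index.day_name()
prices['weekday_num'] = prices.index.dayofweek  # Monday=0 ... Sunday=6

# group by the number first so the output follows the calendar week
mean_by_day = prices.groupby(['weekday_num', 'weekday'])['close'].mean()
print(mean_by_day)
```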


#### let's plot it better this time!

In [None]:
# plt.figure(figsize=(10,6))


# plt.title('the mean close value each day of the week')
# plt.show()

## How do we get a subset of the dataframe?

`.loc` vs `.iloc`

- loc subsets based on NAME
- iloc subsets based on POSITION

Why is this helpful?
- we can use `.loc` to name a date or range of dates to subset our df
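A sketch of date-based subsetting on made-up hourly data. With a sorted DatetimeIndex, a date string passed to `.loc` selects every row that falls inside that date, and a slice of date strings includes both endpoints.

```python
import numpy as np
import pandas as pd

# hypothetical hourly data covering three days (72 rows)
index = pd.Timestamp('2017-07-01') + pd.to_timedelta(np.arange(72), unit='h')
prices = pd.DataFrame({'close': np.arange(72.0)}, index=index)

# single value: every hour on July 2nd
one_day = prices.loc['2017-07-02']

# range: July 1st through the end of July 2nd
two_days = prices.loc['2017-07-01':'2017-07-02']

print(len(one_day), len(two_days))
```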

In [None]:
#single .loc value


In [None]:
#range of .loc values


## What if we want a different period of data?
- downsampling
- upsampling
- resampling
- rolling averages
- shift/difference

In [None]:
#drop extra columns


### Downsampling: reduce frequency

reduce the number of rows by removing more precise units of time
- use `asfreq` to change the period
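A sketch of downsampling with `asfreq` on made-up hourly data. `asfreq('D')` keeps only the rows that land exactly on the new daily timestamps and drops the rest; it does not aggregate.

```python
import numpy as np
import pandas as pd

# hypothetical hourly data covering two days (48 rows)
index = pd.Timestamp('2017-07-01') + pd.to_timedelta(np.arange(48), unit='h')
hourly = pd.DataFrame({'close': np.arange(48.0)}, index=index)

# keep one row per day (the midnight observation)
daily = hourly.asfreq('D')

print(daily)
```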

#### Example: the level of granularity of our data is currently to the hour

#### reduce granularity

In [None]:
#set frequency to daily


In [None]:
#set frequency to monthly


### Upsampling: Increase frequency
increase the number of rows by adding more precise units of time
- use `asfreq` to change the period (same as before)

#### Example: the level of granularity of our data is currently to the hour

#### increase granularity

In [None]:
#set frequency to minutes


#### fill the nulls that were generated

'ffill' = forward fill

'bfill' = backward fill
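A sketch of upsampling and filling on three made-up hourly rows. A 30-minute frequency is used instead of minutes only to keep the output short. The new rows start out as nulls; `method='ffill'` copies the previous known value forward and `method='bfill'` copies the next known value backward.

```python
import numpy as np
import pandas as pd

# hypothetical hourly data (3 rows)
index = pd.Timestamp('2017-07-01') + pd.to_timedelta(np.arange(3), unit='h')
hourly = pd.DataFrame({'close': [10.0, 20.0, 30.0]}, index=index)

# upsample: the new half-hour rows are NaN
half_hourly = hourly.asfreq('30min')

# fill the gaps while upsampling
forward = hourly.asfreq('30min', method='ffill')
backward = hourly.asfreq('30min', method='bfill')

print(half_hourly)
print(forward)
print(backward)
```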

In [None]:
#use ffill method


In [None]:
# use bfill method


### Resampling - Aggregating over time
select a level of granularity and get an aggregated value from it
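A sketch of `resample()` on made-up hourly data. Unlike `asfreq`, resampling groups the rows in each period and aggregates them, here by day.

```python
import numpy as np
import pandas as pd

# hypothetical hourly data covering two days (48 rows)
index = pd.Timestamp('2017-07-01') + pd.to_timedelta(np.arange(48), unit='h')
hourly = pd.DataFrame({'close': np.arange(48.0)}, index=index)

# one aggregate per day
daily_mean = hourly.resample('D').mean()

# several aggregates at once
daily_stats = hourly['close'].resample('D').agg(['mean', 'min', 'max'])

print(daily_mean)
print(daily_stats)
```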

In [None]:
#get the daily mean


In [None]:
#get mean, min, max


#### let's plot it!

In [None]:
# plt.figure(figsize=(14,10))


# plt.title('plotting close over time with various resampling techniques')
# plt.legend()
# plt.show()

### Rolling averages

used to smooth out short-term fluctuations in time series data and highlight long-term trends
- use `rolling()` to calculate

In [None]:
#use the rolling function
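A sketch of a 5-row rolling mean on made-up daily values. The window size of 5 is an assumption for illustration. The first four rows are null because there are not yet five values to average.

```python
import numpy as np
import pandas as pd

# hypothetical daily close values for one week
index = pd.Timestamp('2017-07-01') + pd.to_timedelta(np.arange(7), unit='D')
daily = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], index=index, name='close')

# mean of the current row and the four rows before it
rolling_5 = daily.rolling(5).mean()

print(rolling_5)
```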


> the rolling average at each point is the mean of that point and the preceding points in the window  
> in this example, the rolling average on 2017-07-05 is the average of the values from 2017-07-01 through 2017-07-05

#### let's plot it!

In [None]:
# plt.figure(figsize=(12,8))

#original granularity

#resample by week and look at 4 weeks 

#resample by week and look at 12 weeks 

# plt.legend()
# plt.title('original vs rolling averages')
# plt.show()

### How about Lagging or Leading the data?

* `.shift`: move the data backwards and forwards by a given amount
* `.diff`: find the difference with the previous observation (or a specified further back observation)
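A sketch of both methods on a short made-up series. A positive shift lags the data (each row sees the previous value), a negative shift leads it (each row sees the next value), and `diff(1)` is the current value minus the previous one.

```python
import pandas as pd

close = pd.Series([10.0, 12.0, 15.0, 11.0])

lagged = close.shift(1)    # NaN, 10, 12, 15
led = close.shift(-1)      # 12, 15, 11, NaN
change = close.diff(1)     # NaN, 2, 3, -4

print(pd.DataFrame({'close': close, 'lagged': lagged, 'led': led, 'change': change}))
```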

In [None]:
#shift by one


In [None]:
#shift by negative 1


In [None]:
#difference by one


# Recap

- use the `datetime.datetime` class
- to cast as a date
    - `.astype('datetime64[ns]')`
    - `pd.to_datetime()`
        - can use `format` argument for funky dates
- strftime notation
    - https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
- attributes and methods
    - https://pandas.pydata.org/docs/reference/api/pandas.DatetimeIndex.html
- to make pandas time aware
    1. Convert 'date' column to datetime object
    2. Set the datetime column as index
    3. Sort the datetime index
- sampling methods
    - downsampling/upsampling
        - `.asfreq()` 
        - fill nulls
            - `.ffill`/`.bfill`
    - resampling
        - `.resample()`
    - rolling average
        - `.rolling()`
        
    