<img src="Images\Slide1.PNG" />

In [None]:
import pandas as pd
import datetime as dt

<img src="Images\Slide2.PNG" />

<img src="Images\Slide3.PNG" />

In [None]:
someday = dt.date(2010, 1, 20)

In [None]:
#Try running each one of these lines

someday.year
#someday.month
#someday.day

In [None]:
#And it can also be turned into a string
str(someday)

<div class = "alert alert-block alert-info">
There's also a datetime object, which contains both date and time

In [None]:
str(dt.datetime(2010, 1, 10, 15, 13, 57)) #the time defaults to 00:00:00 if omitted; values that are not real dates or times raise a ValueError

In [None]:
sometime = dt.datetime(2010, 1, 10, 17, 13, 57)

In [None]:
#Try running each one of these lines

#sometime.year
#sometime.month
sometime.day
#sometime.hour
#sometime.minute
#sometime.second

<img src="Images\Slide4.PNG" />

## The `pandas Timestamp` Object

<div class = "alert alert-block alert-info">
OK, now back to pandas. Pandas was originally developed to analyze financial data, which is why it is so flexible when dealing with time series data. <br>
The first object we'll learn is the Timestamp object. As you can see, pandas accepts many different string formats when defining one.

In [None]:
#Let's run each of these lines:

pd.Timestamp("2015-03-31")
#pd.Timestamp("2015/03/31")
#pd.Timestamp("2013, 11, 04")
#pd.Timestamp("1/1/2015")
#pd.Timestamp("19/12/2015")
#pd.Timestamp("12/19/2015")
#pd.Timestamp("4/3/2000")
#pd.Timestamp("2021-03-08 08:35:15")
pd.Timestamp("2021-03-08 6:13:29 PM")
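
Slash-separated strings such as "4/3/2000" are ambiguous. The following is a minimal sketch (the variable names are made up for illustration) comparing the default month-first reading with an explicit format string passed to `pd.to_datetime`, a function covered later in this lesson:

```python
import pandas as pd

# With no other hints, "4/3/2000" is read month-first: April 3rd
month_first = pd.Timestamp("4/3/2000")

# An explicit format string removes the ambiguity: March 4th
day_first = pd.to_datetime("4/3/2000", format="%d/%m/%Y")

print(month_first.month, month_first.day)
print(day_first.month, day_first.day)
```

When the source of the data is known, spelling out the format is usually safer than relying on inference.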

<div class = "alert alert-block alert-info">
Timestamp can also accept datetime objects, not just strings.


In [None]:
pd.Timestamp(dt.date(2015, 1, 1))

In [None]:
pd.Timestamp(dt.datetime(2000, 2, 3, 21, 35, 22))

<div class = "alert alert-block alert-info">
The Timestamp object has pretty cool attributes

In [None]:
someday = pd.Timestamp("12/01/2015")
#someday.is_month_end
#someday.is_month_start
#someday.is_quarter_start

<img src="Images\Slide6.PNG" />

## The `pandas DatetimeIndex` Object

<div class = "alert alert-block alert-info">
Next, we'll learn about the other main object, the DatetimeIndex <br>
First let's try to create a DatetimeIndex - 

In [None]:
dates = ["2016/01/02", "2016/04/12", "2009/09/07"]
pd.DatetimeIndex(dates)

<div class = "alert alert-block alert-info">
Now, let's create a Series where the index is a DatetimeIndex

In [None]:
dates = [dt.date(2016, 1, 10), dt.date(1994, 6, 13), dt.date(2003, 12, 29)]
dtIndex = pd.DatetimeIndex(dates)

In [None]:
values = [100, 200, 300]
pd.Series(data = values, index = dtIndex)

<div class = "alert alert-block alert-info">
There's a lot we can do once our index is a DatetimeIndex - we'll return to that soon.
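
As a small preview, here is a sketch of selecting by date strings on a Series with a DatetimeIndex. It reuses the three dates from above. The names `ser`, `in_2016` and `from_1990_to_2003` are illustrative, and the index is sorted first because slicing behaves most predictably on a sorted index:

```python
import datetime as dt
import pandas as pd

dates = [dt.date(2016, 1, 10), dt.date(1994, 6, 13), dt.date(2003, 12, 29)]
ser = pd.Series([100, 200, 300], index=pd.DatetimeIndex(dates)).sort_index()

# Partial string indexing: a year string selects every row in that year
in_2016 = ser.loc["2016"]

# Slicing by date strings includes both endpoints (all of 2003 is covered)
from_1990_to_2003 = ser.loc["1990":"2003"]

print(in_2016)
print(from_1990_to_2003)
```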

## The `pd.to_datetime()` Function

<div class = "alert alert-block alert-info">
Next, let's look at a simple function that turns strings or Python datetime objects into Timestamps or a DatetimeIndex

In [None]:
pd.to_datetime("2001-04-19")
pd.to_datetime(dt.date(2015, 1, 1))
#pd.to_datetime(dt.datetime(2015, 1, 1, 14, 35, 20))

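#And what if we input a list?
#Note: pandas 2.0+ infers a single format from the first element, so for mixed formats like these you may need to add format = "mixed"
pd.to_datetime(["2015-01-03", "2014/02/08", "2016", "July 4th, 1996"])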

In [None]:
times = pd.Series(["2015-01-03", "2014/02/08", "2016", "July 4th, 1996"])
times

In [None]:
pd.to_datetime(times)

In [None]:
dates = pd.Series(["July 4th, 1996", "10/04/1991", "Hello", "2015-02-31"])
dates

<div class = "alert alert-block alert-info">
Will this work?

In [None]:
pd.to_datetime(dates, errors = "raise") #change to coerce
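
To make the effect of `errors = "coerce"` concrete, here is a minimal sketch on a small made-up Series (`raw` and `parsed` are illustrative names). It uses one consistent format so that the result does not depend on how a particular pandas version infers formats:

```python
import pandas as pd

raw = pd.Series(["2015-01-03", "Hello", "2015-02-31"])

# errors="coerce" replaces anything that cannot be parsed with NaT instead of raising
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed)
print(parsed.isna().sum())  # how many values could not be parsed
```

Both "Hello" and the non-existent February 31st end up as `NaT`, which can then be handled with the usual missing-data tools such as `dropna()`.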

In [None]:
#And of course it also accepts unix times
pd.to_datetime([1349720105, 1349806505, 1349892905, 1349979305, 1350065705], unit = "s")
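
A quick sketch of the round trip for the first value above, assuming the numbers are whole seconds since 1970-01-01 00:00:00 (the names `ts` and `seconds` are illustrative):

```python
import pandas as pd

# Unix seconds -> Timestamp
ts = pd.to_datetime(1349720105, unit="s")
print(ts)

# Timestamp -> Unix seconds: divide the elapsed time by one second
seconds = (ts - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
print(seconds)
```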

## Create Range of Dates with the `pd.date_range()` Method

<div class = "alert alert-block alert-info">
Next, let's learn how to create date ranges - 

In [None]:
pd.date_range(start = "2016-01-01", end = "2016-01-25", freq = "D")
#pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "2D")
#pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "W")
#pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "12H")
#pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "W-MON")
#pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "M")
#pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "MS")
#pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "A")

In [None]:
times = pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "D")

<div class = "alert alert-block alert-info">
If we look at our date range object, we'll see it is indeed a DatetimeIndex of Timestamps

In [None]:
type(times)

In [None]:
type(times[0])

<div class = "alert alert-block alert-info">
We can also use the "periods" parameter, along with either start or end, to create a date_range

In [None]:
pd.date_range(start = "2012-09-09", periods = 50, freq = "6H")

In [None]:
pd.date_range(end = "1999-12-31", periods = 100, freq = "7H")

<img src="Images\Slide5.PNG" />

## The `.dt` Accessor

In [None]:
bunch_of_dates = pd.date_range(start = "2000-01-01", end = "2010-12-31", freq = "24D")

In [None]:
s = pd.Series(bunch_of_dates)
s.head(3)

<div class = "alert alert-block alert-info">
What if we want to check the day of a single object in our series?

In [None]:
s[0].day

<div class = "alert alert-block alert-info">
And if we want to check the day of all objects in a series?

In [None]:
s.day #this raises an AttributeError - a Series needs the .dt accessor

In [None]:
s.dt.day
s.dt.day_name() #weekday_name was removed in pandas 1.0
#s.dt.is_month_start

In [None]:
mask = s.dt.is_month_end
s[mask]
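
Here is a self-contained sketch of the same ideas on three hand-picked dates, so the results are easy to check by eye (the names `small`, `day_names` and `month_ends` are illustrative):

```python
import pandas as pd

small = pd.Series(pd.to_datetime(["2000-01-31", "2000-02-15", "2000-02-29"]))

# Element-wise attributes and methods go through the .dt accessor
day_names = small.dt.day_name()

# Boolean attributes work directly as masks
month_ends = small[small.dt.is_month_end]

print(day_names)
print(month_ends)
```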

## The `.dt` Accessor - `floor()`, `ceil()`, `round()`

In [None]:
ser = pd.Series(pd.date_range(end = "1999-12-31", periods = 100, freq = "1S"))
#ser.dt.floor("T")
#ser.dt.ceil("T")
#ser.dt.round("T")
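
The sketch below puts the three methods side by side on two made-up timestamps, one just before and one just after the half-minute mark. It uses the "min" alias for minutes, which newer pandas versions prefer over "T". The names `stamps` and `compared` are illustrative:

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime(["1999-12-31 23:58:29", "1999-12-31 23:59:31"]))

compared = pd.DataFrame({
    "original": stamps,
    "floor": stamps.dt.floor("min"),  # always down to the start of the minute
    "ceil": stamps.dt.ceil("min"),    # always up to the next full minute
    "round": stamps.dt.round("min"),  # to whichever full minute is closer
})
print(compared)
```

Note that the second timestamp crosses into the year 2000 when rounded or ceiled.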

## Selecting from a `DataFrame` with a `DatetimeIndex`

<div class = "alert alert-block alert-info">
Now let's get serious - let's work with real Time Series data. <br>
First, let's import Apple's daily stock price from 1980 to 2014 (shame you didn't buy any in 1980, huh?)

In [None]:
pd.read_csv('appl_1980_2014.csv')

In [None]:
data = pd.read_csv('appl_1980_2014.csv', parse_dates=[0]) #Why did we add parse_dates?

In [None]:
data.head()

<div class = "alert alert-block alert-info">
But something isn't right here - look at the index... <br>
Let's fix that

In [None]:
data = data.sort_values('Date').set_index('Date')

<div class = "alert alert-block alert-info">
Excellent - now, let's see how we can access specific rows

In [None]:
data.loc["2014-07-08"]
data.iloc[300]

In [None]:
data.loc["2013-10-01" : "2013-10-07"]
data.iloc[100 : 110]
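
One difference worth seeing on a small example: slicing by date labels with `.loc` includes both endpoints, while slicing by position with `.iloc` excludes the end. The sketch below uses a made-up frame (`prices`, with a hypothetical `Close` column) rather than the Apple data:

```python
import pandas as pd

idx = pd.date_range(start="2013-09-28", periods=14, freq="D")
prices = pd.DataFrame({"Close": range(14)}, index=idx)

by_label = prices.loc["2013-10-01":"2013-10-07"]  # 7 rows, both endpoints included
by_position = prices.iloc[3:10]                   # 7 rows, position 10 excluded
september = prices.loc["2013-09"]                 # partial string: every row in September

print(by_label)
print(len(september))
```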

<div class = "alert alert-block alert-info">
Let's try to extract all the rows that occurred on my birthday each year

In [None]:
pd.date_range(start = "1994-10-18", end = "2014-12-31", freq = pd.DateOffset(years = 1)) 
#We can't use A because we'll get the 31/12 of each year

In [None]:
birthdays = pd.date_range(start = "1994-10-18", end = "2014-12-31", freq = pd.DateOffset(years = 1)) 

In [None]:
mask = data.index.isin(birthdays)

In [None]:
data[mask]
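
An alternative sketch, not the approach used above: the same kind of mask can be built from the index's `month` and `day` attributes, which avoids having to pick a start year. The frame here is made up for illustration:

```python
import pandas as pd

idx = pd.date_range(start="1994-01-01", end="1996-12-31", freq="D")
frame = pd.DataFrame({"value": range(len(idx))}, index=idx)

# True only where the index falls on October 18th, in any year
mask = (frame.index.month == 10) & (frame.index.day == 18)
birthdays = frame[mask]

print(birthdays)
```

With real stock data either approach will skip years in which the date fell on a non-trading day, since those rows do not exist.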

## Resample

<div class = "alert alert-block alert-info">
Now we're going to learn an extremely powerful method for working with time series <br>
For that, let's import a dataframe of events which occurred at specific points in time.

In [None]:
df_io = pd.read_csv('data.csv',index_col=0,parse_dates=['date_'])
df_io.head()

<div class = "alert alert-block alert-info">
This doesn't look so good - let's fix that

In [None]:
df = df_io.copy().sort_values('date_').set_index('date_').drop(columns='val_updated')
df.head()

<div class = "alert alert-block alert-info">
Okay, now, what do we do if we want to count how many events occurred each hour? <br>
Here's a bad way to do it:

In [None]:
bad = df.copy()
bad['day'] = bad.index.date
bad['hour'] = bad.index.hour
(bad
.groupby(['day','hour'])
.count()
)

<div class = "alert alert-block alert-info">
And here's a good way

In [None]:
df.resample('H').count() # H is for Hour

In [None]:
df.resample('10T').count()
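
To see why `resample` is the better tool, here is a sketch on four made-up events (the `events` frame and its `event_id` column are assumptions for illustration). It uses the lowercase "h" alias, which newer pandas versions prefer over "H":

```python
import pandas as pd

times = pd.to_datetime([
    "2020-01-01 09:05", "2020-01-01 09:40",
    "2020-01-01 10:15", "2020-01-01 12:30",
])
events = pd.DataFrame({"event_id": [1, 2, 3, 4]}, index=times)

# resample builds a regular hourly grid, so the empty 11:00 hour shows up as 0
hourly = events.resample("h").count()

# groupby only knows about hours that actually occur in the data
grouped = events.groupby(events.index.hour).count()

print(hourly)
print(grouped)
```

The hour with no events is simply missing from the groupby result, which is easy to overlook when plotting or averaging.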

<div class = "alert alert-block alert-info">
It also does other functions - 

In [None]:
#df.resample('H').sum()
#df.resample('H').mean()

<div class = "alert alert-block alert-info">
And best of all? You can use it with groupby, and it's extremely simple <br>
In our specific case, we want to count events in an interval per event type.

In [None]:
per_event = (df
             .groupby('event_type')
             .resample('15T')
             .apply('count')
             .rename(columns={'event_type':'amount'})
            )
per_event.groupby('event_type').head(2)
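
The same pattern on a tiny made-up frame, where the counts can be verified by hand. This sketch selects a single column before resampling, which is a slight variation on the cell above. The `events` frame and its column names are assumptions for illustration:

```python
import pandas as pd

times = pd.to_datetime([
    "2020-01-01 00:01", "2020-01-01 00:02", "2020-01-01 00:05",
    "2020-01-01 00:20", "2020-01-01 00:31",
])
events = pd.DataFrame(
    {"event_type": ["a", "b", "a", "a", "b"], "event_id": [1, 2, 3, 4, 5]},
    index=times,
)

# Each event type gets its own 15-minute grid, covering that type's own time span
counts = events.groupby("event_type")["event_id"].resample("15min").count()

print(counts)
```

The result has a two-level index (event type, then interval start), so `counts.loc["a"]` returns the intervals for a single type.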

## `Timestamp` Object Attributes

<div class = "alert alert-block alert-info">
Because we saw that Timestamps have some useful attributes, let's look what we can do with them

In [None]:
data.insert(0, "Day of Week", data.index.day_name()) #weekday_name was removed in pandas 1.0

In [None]:
data.head(5)

In [None]:
data.insert(1, "Is Start of Month", data.index.is_month_start)

In [None]:
data[data["Is Start of Month"]]

## The `.truncate()` Method

<div class = "alert alert-block alert-info">
Let's learn a quick method to slice parts of a dataframe with a DatetimeIndex

In [None]:
data = pd.read_csv('appl_1980_2014.csv', index_col="Date", parse_dates=[0]) 
data.sort_index(inplace = True)

In [None]:
data.truncate(before = "2012-06-12", after = "2012-06-14") #If we hadn't sorted the index chronologically, this wouldn't work!

<img src="Images\Slide7.PNG" />

## `pd.DateOffset` Objects

<div class = "alert alert-block alert-info">
The DateOffset Object is a span of time that pandas can deal with (remember it from my birthday frequencies?) <br>
For example, if we wanted to add a day to every date in our index, how can we do that?

In [None]:
# this is obviously not going to work
data.index +1

In [None]:
data.index + pd.DateOffset(days =1)
#data.index + pd.DateOffset(months = 1, days =1)
#data.index + pd.DateOffset(months = 1, weeks = 2, days =1)
#data.index + pd.DateOffset(years = 3, weeks = 2, days =1)
#data.index + pd.DateOffset(months = 1, weeks = 2, days =1, hours = 2)
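
A short sketch of what makes DateOffset calendar-aware. It compares adding one month with adding a fixed 30 days as a Timedelta (an object introduced later in this lesson). The variable names are illustrative:

```python
import pandas as pd

jan31 = pd.Timestamp("2016-01-31")

# One calendar month later: clipped to the last valid day of February
plus_month = jan31 + pd.DateOffset(months=1)

# A fixed duration: always exactly 30 * 24 hours
plus_30_days = jan31 + pd.Timedelta(days=30)

# Offsets are applied element-wise to a whole index as well
shifted = pd.DatetimeIndex(["2016-01-30", "2016-01-31"]) + pd.DateOffset(days=1)

print(plus_month, plus_30_days)
print(shifted)
```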


<div class = "alert alert-block alert-info">
Let's try something more complex. Pandas has a module called tseries, which contains many different functions. <br>
Today we'll use pandas.tseries.offsets <br>
For example, what if we want to move each date to the end of a month (the size of the shift is different for each date)?

In [None]:
from pandas.tseries import offsets

In [None]:
#data.index - offsets.MonthEnd()
#data.index + offsets.MonthEnd()
#data.index - offsets.Day()
#data.index - offsets.YearBegin()
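
The behavior of anchored offsets is easiest to see on single Timestamps. A minimal sketch with made-up dates; note what happens when the date is already a month end:

```python
import pandas as pd
from pandas.tseries import offsets

mid_month = pd.Timestamp("2016-03-15")
month_end = pd.Timestamp("2016-03-31")

forward = mid_month + offsets.MonthEnd()   # next month end: 2016-03-31
backward = mid_month - offsets.MonthEnd()  # previous month end: 2016-02-29
jumped = month_end + offsets.MonthEnd()    # already on a month end, so it moves to 2016-04-30

# rollforward leaves dates that are already on a month end untouched
kept = offsets.MonthEnd().rollforward(month_end)

print(forward, backward, jumped, kept)
```

So adding `MonthEnd()` is not quite "rounding": dates that already sit on the anchor are pushed a full month ahead, and `rollforward` is the safer choice when they should stay put.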

## The `Timedelta` Object

<div class = "alert alert-block alert-info">
Last object for this lesson - the Timedelta object <br>
The Timedelta is basically a span of time, a duration <br>
There are two ways to create a Timedelta object: subtracting one Timestamp from another, or calling pd.Timedelta directly

In [None]:
timeA = pd.Timestamp("2016-03-31 04:35:16 PM")
timeB = pd.Timestamp("2016-03-20 02:16:49 AM")

In [None]:
timeA - timeB
#timeB - timeA

In [None]:
type(timeA - timeB)

In [None]:
type(timeA)

In [None]:
pd.Timedelta(days = 3, hours = 12, minutes = 45)
#pd.Timedelta(weeks = 8, days = 3, hours = 12, minutes = 45)
#years (and months) are not accepted because they have no fixed length - so just use days

In [None]:
pd.Timedelta("14 days 6 hours 12 minutes 49 seconds")
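
The different ways of building a Timedelta produce the same kind of object, which a short sketch can confirm (the variable names are illustrative, and the two Timestamps are the ones used above):

```python
import pandas as pd

from_kwargs = pd.Timedelta(days=14, hours=6, minutes=12, seconds=49)
from_string = pd.Timedelta("14 days 6 hours 12 minutes 49 seconds")
from_subtraction = (
    pd.Timestamp("2016-03-31 04:35:16 PM") - pd.Timestamp("2016-03-20 02:16:49 AM")
)

print(from_kwargs == from_string)
print(from_string.total_seconds())  # the whole duration expressed in seconds
print(from_subtraction.days)        # only the whole-days component
```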

<div class = "alert alert-block alert-info">
Now let's look at examples of using Timedeltas in our work

In [None]:
shipping = pd.read_csv("ecommerce.csv", index_col = "ID", parse_dates = ["order_date", "delivery_date"])
shipping.head(3)

In [None]:
shipping["Delivery Time"] = shipping["delivery_date"] - shipping["order_date"]

In [None]:
shipping.head(3)

<div class = "alert alert-block alert-info">
As you can see, it's a different data type

In [None]:
shipping.dtypes

In [None]:
mask = shipping["Delivery Time"] > pd.Timedelta("1000 days")
shipping[mask]

In [None]:
shipping["Delivery Time"].mean()
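
For readers without the CSV at hand, here is a sketch of the same steps on a two-row made-up frame. The column names mirror the ones above, while `orders`, `Delivery Days` and `slow` are illustrative additions:

```python
import pandas as pd

orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2020-01-01", "2020-01-10"]),
    "delivery_date": pd.to_datetime(["2020-01-04", "2020-01-15"]),
})

# Subtracting two datetime columns gives a timedelta column
orders["Delivery Time"] = orders["delivery_date"] - orders["order_date"]

# The .dt accessor works on timedelta columns too
orders["Delivery Days"] = orders["Delivery Time"].dt.days

# Timedelta columns can be compared and averaged like numbers
slow = orders[orders["Delivery Time"] > pd.Timedelta(days=4)]
average = orders["Delivery Time"].mean()

print(orders)
print(average)
```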