# Working with Dates and Times

If you need to work with dates and times, you can quickly run into trouble. 4 - 3 is 1, but how many days are there between January 7, 2012 and March 23, 2017? Thankfully, computers are good at these sorts of calculations, and Python provides a library for working with dates and times that makes them very straightforward. If you don't intend to work with date and time information in this way, you can safely skip this section; if you do, it will save you a lot of time and frustration.
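As a quick preview of the library introduced below, the opening question can be answered with Python's standard `datetime` module. This sketch uses `date` objects because only calendar days matter here; the leap days in 2012 and 2016 are handled automatically.

```python
import datetime as dt

# Dates only: no time-of-day component is needed for a day count
start = dt.date(2012, 1, 7)
end = dt.date(2017, 3, 23)

# Subtracting two dates gives a timedelta
gap = end - start
print(gap.days)  # 1902
```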

### The Datetime Library
Python provides a library for handling dates and times called "datetime". We'll start here because it will help you understand datetimes in pandas, and also because you may want to use datetime alongside pandas.

First, we need to import datetime. Because I hate typing long names, this will import datetime and rename it to "dt".

In [None]:
import datetime as dt

Next, let's create two datetime objects that we can play with.

In [None]:
present = dt.datetime.now()
print(present)
past = dt.datetime(year=1997, month=2, day=18, hour=2, minute=17, second=39)
print(past)

Above, we created a variable `present` that holds the time at the moment you ran that cell. If you run the cell again, `present` will change, because it is reset to the new current time. We also created a variable `past` that we set to 2:17:39 am on February 18th, 1997. (We could easily make it 2 pm by setting `hour` to 14, or, if you don't love 24-hour time, to 2 + 12.)
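A small check of the 2 pm remark, assuming the same `past` timestamp with the hour moved into the afternoon. `datetime` objects are immutable, so `replace()` returns a new object rather than changing `past`, and `strftime` with `%I` and `%p` formats the result in 12-hour time.

```python
import datetime as dt

past = dt.datetime(year=1997, month=2, day=18, hour=2, minute=17, second=39)

# replace() builds a new datetime; past itself is unchanged
afternoon = past.replace(hour=14)
also_afternoon = past.replace(hour=2 + 12)

print(afternoon == also_afternoon)        # True
print(afternoon.strftime('%I:%M:%S %p'))  # 02:17:39 PM in the default C/English locale
print(past.hour)                          # 2
```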

You can also now see another reason I like to rename datetime: the datetime library contains an object that is also called datetime! When we ran `dt.datetime.now()`, we called the `now` method of the `datetime` class in the datetime library, which we imported under the alias `dt`. It is easier to read when one of these is `dt` and the other is `datetime`.

Let's do some simple math now to determine how long it has been since 2:17:39 am on February 18th, 1997.

In [None]:
diff = present - past
print(diff)

`diff` is a `timedelta` object, the datetime library's type for storing the difference between two dates or times. It has three attributes we can access: `days`, `seconds`, and `microseconds`. For instance, if we just want the days since `past`, we can write:

In [None]:
diff.days

If we want seconds, this will work:

In [None]:
diff.seconds

Note, however, that `diff.seconds` only gave us the seconds that were NOT already counted in the days! If we actually wanted the total number of seconds since `past`, we would need to write this:

In [None]:
seconds = diff.seconds + (diff.days * 24 * 60 * 60)
print(seconds)
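The manual formula above works, and `timedelta` also offers a `total_seconds()` method that does the same arithmetic and includes the microseconds as a fraction. The sketch below uses fixed, made-up timestamps (not `present`) so the numbers are reproducible. It also shows that a `timedelta` normalizes whatever you give it into days, seconds, and microseconds, which is why `.seconds` never exceeds 86399.

```python
import datetime as dt

start = dt.datetime(1997, 2, 18, 2, 17, 39)
# Two days, one hour, and half a second later (illustrative value)
end = dt.datetime(1997, 2, 20, 3, 17, 39, 500000)

diff = end - start
print(diff.days, diff.seconds, diff.microseconds)  # 2 3600 500000

# Manual total, ignoring microseconds, as in the formula above
manual = diff.seconds + diff.days * 24 * 60 * 60
print(manual)                # 176400
print(diff.total_seconds())  # 176400.5

# Inputs are normalized: 25 hours and 1 minute roll over into an extra day
rolled = dt.timedelta(days=1, hours=25, minutes=1)
print(rolled.days, rolled.seconds)  # 2 3660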

#### Quickly, just for practice, make another variable that uses dt.datetime.now() to capture the current time, and print how many seconds have passed since the variable called present.

In [None]:
# your code goes here


### Datetimes in Pandas
Let's now work with some datetime objects in a pandas dataframe. The code below will create a dataframe with an intake date and a discharge date for 50 people, presumably at a hospital.

In [None]:
import random
import pandas as pd

date_dict = {'Intake': [], 'Discharge': []}

for x in range(0, 50):
    year = random.randint(1990, 2020)
    intake = dt.datetime(year=year, month=random.randint(1, 12), day=random.randint(1, 28), 
                         hour=random.randint(0, 23))
    discharge = intake + dt.timedelta(days=random.randint(0, 30), seconds=random.randint(60 * 60, 23 * 60 * 60))
    date_dict['Intake'].append(intake)
    date_dict['Discharge'].append(discharge)
    
date_frame = pd.DataFrame(date_dict)
date_frame.head()

Let's check the dtypes.

In [None]:
date_frame.dtypes

As you can see, the dtypes are `datetime64[ns]`. These are pandas datetime columns: you can do math with them, and subtracting one from another gives you timedelta values. Technically, they are slightly different from Python's datetime objects, but the concepts are very similar.
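Because the resolution shown in brackets can vary between pandas versions, a version-independent way to check column types is with the helpers in `pd.api.types`. The sketch below builds a tiny hypothetical two-row frame (not `date_frame`) and confirms that subtracting one datetime column from another produces a timedelta column.

```python
import datetime as dt
import pandas as pd

# Hypothetical two-row frame with the same column names as date_frame
frame = pd.DataFrame({
    'Intake': [dt.datetime(2001, 5, 1, 8), dt.datetime(2010, 11, 20, 22)],
    'Discharge': [dt.datetime(2001, 5, 3, 12), dt.datetime(2010, 11, 21, 1)],
})

stay = frame['Discharge'] - frame['Intake']

print(pd.api.types.is_datetime64_any_dtype(frame['Intake']))  # True
print(pd.api.types.is_timedelta64_dtype(stay))                 # True
print(stay.iloc[0])  # 2 days 04:00:00
```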

One difference is how we access the year, month, day, etc. For a pandas column (a Series), we need to write `.dt.year` instead of just `.year` (or whatever time period). The code below gets the year of every intake.

In [None]:
date_frame['Intake'].dt.year
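The `.dt` accessor is only available on datetime-like columns. If dates are still stored as text, pandas raises an `AttributeError`, which is a quick signal that the column needs converting first. A minimal check with a hypothetical Series of date strings:

```python
import pandas as pd

as_text = pd.Series(['1997-02-18', '2012-01-07'])  # strings, not datetimes

raised = False
try:
    as_text.dt.year
except AttributeError as err:
    raised = True
    print('AttributeError:', err)

# After conversion, the .dt accessor works
as_dates = pd.to_datetime(as_text)
print(as_dates.dt.year.tolist())  # [1997, 2012]
```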

However, the pandas version has some nice extra features. What if you suspected that the care patients got was different during weekdays versus weekends, or that certain sorts of patients were more likely to wait until a weekend to get care? You could do a lot of math, or you could do this:

In [None]:
date_frame['Intake'].dt.weekday

That's the day of the week, starting at Monday (0) and ending on Sunday (6). Now, we could start writing out a list of days and use this as a list index to pull the name of the day (`weekday_names[date_frame['Intake'].dt.weekday]`, roughly), but pandas has anticipated this need.

In [None]:
date_frame['Intake'].dt.day_name()

Note that unlike `weekday`, which is an attribute, `day_name` is a method, and so it must be called as a function (`day_name()`).
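Putting the pieces together, here is a sketch of both ideas on a few hand-picked dates (January 1, 2024 was a Monday). The hypothetical list `weekday_names` reproduces what `day_name()` returns, and comparing `weekday` against 5 flags Saturdays and Sundays, which is one way to start on the weekday-versus-weekend question.

```python
import pandas as pd

intake = pd.Series(pd.to_datetime(['2024-01-01', '2024-01-06', '2024-01-07']))

# Hypothetical lookup list; index 0 is Monday, matching .dt.weekday
weekday_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']
manual_names = intake.dt.weekday.map(lambda i: weekday_names[i])

# day_name() is a method, so it needs parentheses
print(manual_names.tolist() == intake.dt.day_name().tolist())  # True

# Saturday is 5 and Sunday is 6, so >= 5 marks weekends
is_weekend = intake.dt.weekday >= 5
print(is_weekend.tolist())  # [False, True, True]
```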

When dealing with timedelta objects in pandas, we again use the `dt` notation. Like a regular timedelta, these have days, seconds, and microseconds, but they are accessed via `dt.days`, `dt.seconds`, and `dt.microseconds`.

Below, we create a new column holding the number of seconds each person was in the hospital. Note that to use dot notation on the calculated timedelta, we need to enclose the subtraction in parentheses, which makes it clear to Python that it should finish that calculation first and then call `.dt.total_seconds()` on the result. (`.dt.seconds` would not work here: just like `diff.seconds` earlier, it only counts the seconds left over after the whole days.)

In [None]:
date_frame['Seconds of Hospitalization'] = (date_frame['Discharge'] - date_frame['Intake']).dt.total_seconds()
date_frame.head()
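A caution worth checking on your own data: for timedelta columns, `.dt.seconds` is only the seconds left over after whole days are removed, exactly like `diff.seconds` earlier, while `.dt.total_seconds()` counts the entire duration. A small sketch with two hypothetical stays makes the difference visible.

```python
import datetime as dt
import pandas as pd

# Hypothetical stays: 1 day 2 hours, and 30 minutes
stays = pd.Series([dt.timedelta(days=1, hours=2), dt.timedelta(minutes=30)])

leftover = stays.dt.seconds       # seconds beyond the whole days
total = stays.dt.total_seconds()  # the full duration in seconds (float)

print(leftover.tolist())  # [7200, 1800]
print(total.tolist())     # [93600.0, 1800.0]
```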

#### Below, create a new column that has the number of days each person was in the hospital. (Note the name of the dataframe, too: unlike in the last notebook, it isn't df.)

In [None]:
# your code goes here


What about dates that come as text? The dataframe below has dates in three formats, all text: month-day-year, day-month-year, and year-month-day.

In [None]:
text_dict = {'Month First': [], 'Day First': [], 'Year First': []}

for x in range(0, 50):
    year = str(random.randint(1990, 2020))
    month = str(random.randint(1, 12))
    day = str(random.randint(1, 28))
    text_dict['Month First'].append(month + '-' + day + '-' + year)
    text_dict['Day First'].append(day + '-' + month + '-' + year)
    text_dict['Year First'].append(year + '-' + month + '-' + day)
    
text_frame = pd.DataFrame(text_dict)
text_frame.head()

Pandas provides us with the `to_datetime` function. Let's run that on the month first data.

In [None]:
pd.to_datetime(text_frame['Month First'])

To parse a day-first column, we only need to pass the keyword argument `dayfirst=True`.

In [None]:
pd.to_datetime(text_frame['Day First'], dayfirst=True)
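When you already know the layout, you can also spell it out with the `format` argument, using the same `%d`, `%m`, and `%Y` directives as Python's `strftime` and `strptime`. This removes any guessing for ambiguous strings such as `3-7-2005`, which could be March 7 or July 3. The two strings below are made-up examples in the day-first layout.

```python
import pandas as pd

day_first_text = pd.Series(['3-7-2005', '25-12-1999'])

# Explicit layout: day, then month, then four-digit year
parsed = pd.to_datetime(day_first_text, format='%d-%m-%Y')

print(parsed.dt.month.tolist())  # [7, 12]
print(parsed.dt.day.tolist())    # [3, 25]
```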

#### Now, write code that handles the year-first column and makes a new column, Date, that is the date in a datetime format. Year-first does not require any special arguments.

In [None]:
# your code goes here


What about reassembling datetimes from their components? The frame below (`separate_frame`) has four columns that, together, could be the year, month, day, and hour of a datetime object.

In [None]:
separate_frame = pd.DataFrame()
separate_frame['Intake Year'] = date_frame['Intake'].dt.year
separate_frame['Intake Month'] = date_frame['Intake'].dt.month
separate_frame['Intake Day'] = date_frame['Intake'].dt.day
separate_frame['Intake Hour'] = date_frame['Intake'].dt.hour
separate_frame.head()

It turns out that `pd.to_datetime` can also convert a selection of columns into a pandas Series of datetime objects if the column names match time units it recognizes, like "year" or "Month". (Capitalization, in this particular case, does not matter.) At minimum, you need a year, a month, and a day.

The code below selects Intake Year, Intake Month, and Intake Day and renames them "year", "month", and "day". `rename` is NOT an in-place operation here: it returns a new dataframe, which is what we want because this object is only meant to be temporary.

In [None]:
separate_frame[['Intake Year', 'Intake Month', 'Intake Day']].rename(
    columns={'Intake Year': 'year', 'Intake Month': 'month', 'Intake Day': 'day'})

Below, we take the whole operation above and enclose it in the parentheses of `pd.to_datetime`.

In [None]:
pd.to_datetime(separate_frame[['Intake Year', 'Intake Month', 'Intake Day']].rename(
    columns={'Intake Year': 'year', 'Intake Month': 'month', 'Intake Day': 'day'}))

If that were assigned to a column we would have a new, assembled, datetime column.
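Two details from this section can be checked on a small hypothetical frame: columns already named `year`, `month`, and `day` need no renaming, and leaving out one of the three required parts makes `pd.to_datetime` raise a `ValueError`.

```python
import pandas as pd

parts = pd.DataFrame({'year': [1997, 2012], 'month': [2, 1], 'day': [18, 7]})

assembled = pd.to_datetime(parts)
print(assembled.dt.strftime('%Y-%m-%d').tolist())  # ['1997-02-18', '2012-01-07']

# Dropping 'day' leaves too little information to build a date
missing_day = False
try:
    pd.to_datetime(parts[['year', 'month']])
except ValueError:
    missing_day = True
print(missing_day)  # True
```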

#### Below, make a new column in the separate_frame dataframe that is the datetime object that represents all the columns together.

In [None]:
# your code goes here


And that's it. Dates and times can be very tricky if you treat them like plain numbers, but converting them to datetime objects makes them much easier to deal with.