# Working with Dates and Times

## Summary
In this notebook, we'll be covering:
- [The Datetime Library](#The-Datetime-Library)
- [Datetimes in Pandas](#Datetimes-in-Pandas)

Working with dates and times in Python can often be confusing at first. For example, you know how to use Python to evaluate 4 - 3 (which equals 1, of course). But how can we use Python to compute the number of days between Jan 7, 2012 and March 23, 2017?

Thankfully, computers are good at these sorts of calculations, and Python provides a library for working with dates and times that makes them very straightforward. If you don't intend to work with date and time information in this way, you can safely skip this section. If you do, it will save you a lot of time and frustration.

### The Datetime Library
Python provides a library for handling dates and times called "datetime". Not only is the datetime library useful on its own, but it also integrates nicely with pandas.

First, we need to import datetime. By convention, datetime is typically imported and renamed to "dt".

In [1]:
import datetime as dt


First, let's create two datetime objects that we can play with.

In [2]:
present = dt.datetime.now()
print(present)
past = dt.datetime(year=1997, month=2, day=18, hour=2, minute=17, second=39)
print(past)

2024-07-09 09:58:59.462661
1997-02-18 02:17:39


Above, we created a variable `present` that captured the time at the moment you ran that cell. If you run it again, `present` will change because it will be reset to the new time. We also created a variable `past` that we set to 2:17:39 am on February 18th, 1997. (We could easily set it to 2 pm instead by setting `hour` to 14. Or, if you don't love 24-hour time, by setting it to `2 + 12`.)

You can also now see why the datetime library is renamed as "dt": datetime (the library) contains a class that is also called datetime! When we ran `dt.datetime.now()` we asked for the `now()` method of the `datetime` class in the `datetime` library, which is under the alias `dt`. It is easier to read when one of these datetimes is `dt` and the other is `datetime`.
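
To make the module-versus-class distinction concrete, here is a small sketch. The `from datetime import datetime` import style is not used anywhere else in this notebook; it appears here only for comparison.

```python
import datetime as dt
from datetime import datetime  # alternative import, shown only for comparison

# dt is the module (the library); dt.datetime is a class defined inside it.
print(type(dt))
print(type(dt.datetime))

# Both names refer to the very same class.
same_class = datetime is dt.datetime
print(same_class)
```

Sticking with `dt.datetime` keeps it obvious at a glance which name is the library and which is the class.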

Let's do some simple math now to determine how long it has been since 2:17:39 am on February 18th, 1997.

In [3]:
diff = present - past
print(diff)

10003 days, 7:41:20.462661


`diff` is a `timedelta` object, a type from the datetime library that stores the difference between two datetime objects. It has three attributes we can access: `days`, `seconds`, and `microseconds`. For instance, if we just want the days since `past` we can write:

In [4]:
diff.days

10003
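
The same subtraction answers the question from the introduction. As a minimal sketch (the variable names `start`, `end`, and `gap` are made up for this example), we can count the days between Jan 7, 2012 and March 23, 2017:

```python
import datetime as dt

# The two dates from the introduction; the time of day defaults to midnight.
start = dt.datetime(year=2012, month=1, day=7)
end = dt.datetime(year=2017, month=3, day=23)

gap = end - start  # subtracting two datetimes gives a timedelta
print(gap.days)    # 1902
```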

If we want seconds this will work:

In [5]:
diff.seconds

27680

Note, however, that `diff.seconds` only gave us the seconds that were NOT already counted as whole days! If we actually wanted the total number of seconds since `past`, we would need to add the days back in, converted to seconds:

In [6]:
seconds = diff.seconds + (diff.days * 24 * 60 * 60)
print(seconds)

864286880
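
As an alternative to the manual arithmetic, `timedelta` objects also have a `total_seconds()` method. The sketch below uses fixed datetimes copied from the outputs above (under the hypothetical names `fixed_present` and `fixed_past`) so the result is reproducible and your own `present` is left untouched.

```python
import datetime as dt

# Fixed values copied from the printed outputs above (names are illustrative).
fixed_present = dt.datetime(2024, 7, 9, 9, 58, 59, 462661)
fixed_past = dt.datetime(year=1997, month=2, day=18, hour=2, minute=17, second=39)
fixed_diff = fixed_present - fixed_past

# Manual version: leftover seconds plus whole days converted to seconds.
manual = fixed_diff.seconds + (fixed_diff.days * 24 * 60 * 60)

# Built-in version: returns a float that also includes the microseconds.
total = fixed_diff.total_seconds()

print(manual)
print(int(total) == manual)
```

Note that `total_seconds()` returns a float, while the manual calculation drops the microseconds and returns an integer.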


#### Quickly, just for practice, make another variable that uses dt.datetime.now() to capture the current time, and print how many seconds have passed since the variable called present was set.

In [None]:
# your code goes here


### Datetimes in Pandas
Let's now work with some datetime objects in a pandas dataframe. The code below will create a dataframe with an intake date and a discharge date for 50 people, presumably at a hospital.

In [7]:
import random
import pandas as pd

date_dict = {'Intake': [], 'Discharge': []}

for x in range(0, 50):
    year = random.randint(1990, 2020)
    intake = dt.datetime(year=year, month=random.randint(1, 12), day=random.randint(1, 28), 
                         hour=random.randint(0, 23))
    discharge = intake + dt.timedelta(days=random.randint(0, 30), seconds=random.randint(60 * 60, 23 * 60 * 60))
    date_dict['Intake'].append(intake)
    date_dict['Discharge'].append(discharge)
    
date_frame = pd.DataFrame(date_dict)
date_frame.head()

Unnamed: 0,Intake,Discharge
0,1990-06-04 04:00:00,1990-06-08 06:29:53
1,2014-02-05 07:00:00,2014-02-09 13:10:29
2,1990-04-02 23:00:00,1990-04-29 16:38:05
3,1992-11-04 11:00:00,1992-11-16 01:35:04
4,2019-01-14 17:00:00,2019-02-01 10:19:14


Let's check the dtypes.

In [8]:
date_frame.dtypes

Intake       datetime64[ns]
Discharge    datetime64[ns]
dtype: object

As you can see, the dtypes are `datetime64[ns]`. They are pandas datetime objects, and you can do math with them and get timedelta objects, just like we did above using the datetime library. Technically, they are slightly different from datetime objects, but the concepts are very similar.

One difference is how we access the year, month, day, etc. For a pandas column (a Series) we need to write `.dt.year` instead of just `.year` (or whichever component we want). The code below prints the year for every intake.

In [9]:
date_frame['Intake'].dt.year

0     1990
1     2014
2     1990
3     1992
4     2019
5     1993
6     2012
7     1996
8     2008
9     1999
10    1995
11    1992
12    2010
13    2007
14    1997
15    2019
16    2009
17    1993
18    1996
19    2012
20    2012
21    1992
22    2004
23    1998
24    1990
25    2010
26    1993
27    2016
28    2006
29    1995
30    1995
31    2016
32    2018
33    2006
34    2000
35    1997
36    2020
37    2018
38    2014
39    1995
40    2020
41    2002
42    2003
43    2013
44    2016
45    2013
46    2016
47    1992
48    2016
49    1992
Name: Intake, dtype: int32
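
To put the two access styles side by side, here is a small sketch on a made-up two-element Series (`example_series` and the other names are just for illustration). A plain datetime, or a single element pulled out of the Series, uses `.year` directly; the Series as a whole needs `.dt.year`.

```python
import datetime as dt
import pandas as pd

moments = [dt.datetime(1990, 6, 4, 4), dt.datetime(2014, 2, 5, 7)]
example_series = pd.Series(moments)

single_year = moments[0].year          # plain datetime: attribute directly
series_years = example_series.dt.year  # whole Series: go through .dt
element_year = example_series[0].year  # one element of the Series: directly again

print(single_year)
print(series_years.tolist())
print(element_year)
```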

However, the pandas version has some nice extra features. What if you suspected that the care patients got was different during weekdays versus weekends, or that certain sorts of patients were more likely to wait until a weekend to get care? You could do a lot of math, or you could do this:

In [10]:
date_frame['Intake'].dt.weekday

0     0
1     2
2     0
3     2
4     0
5     4
6     4
7     5
8     5
9     3
10    5
11    2
12    1
13    0
14    0
15    2
16    3
17    6
18    3
19    3
20    4
21    4
22    2
23    0
24    3
25    6
26    4
27    3
28    4
29    5
30    2
31    5
32    4
33    5
34    1
35    1
36    3
37    6
38    5
39    2
40    6
41    5
42    0
43    1
44    3
45    2
46    0
47    3
48    4
49    6
Name: Intake, dtype: int32

That's the day of the week, starting at Monday (0) and ending on Sunday (6). Now, we could start writing out a list of days and use this as a list index to pull the name of the day (`weekday_names[date_frame['Intake'].dt.weekday]`, roughly), but pandas has anticipated this need with the `day_name()` method.

In [11]:
date_frame['Intake'].dt.day_name()

0        Monday
1     Wednesday
2        Monday
3     Wednesday
4        Monday
5        Friday
6        Friday
7      Saturday
8      Saturday
9      Thursday
10     Saturday
11    Wednesday
12      Tuesday
13       Monday
14       Monday
15    Wednesday
16     Thursday
17       Sunday
18     Thursday
19     Thursday
20       Friday
21       Friday
22    Wednesday
23       Monday
24     Thursday
25       Sunday
26       Friday
27     Thursday
28       Friday
29     Saturday
30    Wednesday
31     Saturday
32       Friday
33     Saturday
34      Tuesday
35      Tuesday
36     Thursday
37       Sunday
38     Saturday
39    Wednesday
40       Sunday
41     Saturday
42       Monday
43      Tuesday
44     Thursday
45    Wednesday
46       Monday
47     Thursday
48       Friday
49       Sunday
Name: Intake, dtype: object

Note that unlike `weekday`, which is an attribute, `day_name` is a method, and so it must be called as a function (`day_name()`).
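
Coming back to the weekday-versus-weekend question, one possible approach is to compare `weekday` against 5, since Saturday is 5 and Sunday is 6. This is only a sketch on a tiny made-up Series (`example_intakes` and `is_weekend` are illustrative names), not a column in `date_frame`.

```python
import pandas as pd

# A Monday, a Saturday, and a Sunday.
example_intakes = pd.Series(pd.to_datetime(['1990-06-04', '1996-04-06', '1993-12-05']))

# weekday runs from Monday (0) to Sunday (6), so 5 and 6 are the weekend.
is_weekend = example_intakes.dt.weekday >= 5
print(is_weekend.tolist())  # [False, True, True]
```

The result is a boolean Series, which could be stored as a new column or used to filter rows.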

When dealing with timedelta objects in pandas we again use the `dt` notation. Like a regular timedelta this has days, seconds, and microseconds, but these will be accessed via `dt.days`, `dt.seconds`, and `dt.microseconds`.

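Below, we create a column from the seconds component of each person's hospital stay. Be careful: as with a regular timedelta, `.dt.seconds` only returns the seconds left over after whole days are removed, so despite the column name this is not the total length of the stay in seconds (the `.dt.total_seconds()` method gives the full duration). Note also that to use the dot notation on the calculated timedelta we need to enclose the calculation in parentheses, which tells Python to finish the subtraction first and then get `.dt.seconds` from the result.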

In [12]:
date_frame['Seconds of Hospitalization'] = (date_frame['Discharge'] - date_frame['Intake']).dt.seconds
date_frame.head()

Unnamed: 0,Intake,Discharge,Seconds of Hospitalization
0,1990-06-04 04:00:00,1990-06-08 06:29:53,8993
1,2014-02-05 07:00:00,2014-02-09 13:10:29,22229
2,1990-04-02 23:00:00,1990-04-29 16:38:05,63485
3,1992-11-04 11:00:00,1992-11-16 01:35:04,52504
4,2019-01-14 17:00:00,2019-02-01 10:19:14,62354
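
To see the difference between the seconds component and the full duration, here is a sketch using a one-row dataframe that copies the first row shown above (the names `example`, `stay`, `leftover_seconds`, and `full_seconds` are made up for this illustration).

```python
import pandas as pd

# One-row frame mirroring the first row of the output above.
example = pd.DataFrame({
    'Intake': pd.to_datetime(['1990-06-04 04:00:00']),
    'Discharge': pd.to_datetime(['1990-06-08 06:29:53']),
})

stay = example['Discharge'] - example['Intake']

leftover_seconds = stay.dt.seconds      # only the part beyond whole days
full_seconds = stay.dt.total_seconds()  # the whole stay, as a float

print(leftover_seconds[0])  # 8993
print(full_seconds[0])      # 354593.0
```

The stay lasted 4 days plus 2:29:53, so the leftover is 8993 seconds while the full duration is 4 * 86400 + 8993 = 354593 seconds.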


#### Below, create a new column that has the number of days the person was in the hospital. (Note the name of the dataframe, as well. Unlike last notebook, it isn't df.)

In [None]:
# your code goes here


What about dates that come as text? The dataframe below has dates in three formats, all text: month-day-year, day-month-year, and year-month-day.

In [13]:
text_dict = {'Month First': [], 'Day First': [], 'Year First': []}

for x in range(0, 50):
    year = str(random.randint(1990, 2020))
    month = str(random.randint(1, 12))
    day = str(random.randint(1, 28))
    text_dict['Month First'].append(month + '-' + day + '-' + year)
    text_dict['Day First'].append(day + '-' + month + '-' + year)
    text_dict['Year First'].append(year + '-' + month + '-' + day)
    
text_frame = pd.DataFrame(text_dict)
text_frame.head()

Unnamed: 0,Month First,Day First,Year First
0,3-3-2018,3-3-2018,2018-3-3
1,1-14-2002,14-1-2002,2002-1-14
2,2-9-2016,9-2-2016,2016-2-9
3,5-12-2015,12-5-2015,2015-5-12
4,6-26-1995,26-6-1995,1995-6-26


Pandas provides us with the `to_datetime` function, which serves to convert text to datetime format. Let's run that on the month first data.

In [14]:
pd.to_datetime(text_frame['Month First'])

0    2018-03-03
1    2002-01-14
2    2016-02-09
3    2015-05-12
4    1995-06-26
5    1993-02-28
6    2008-11-12
7    2011-04-15
8    2009-12-09
9    1995-04-05
10   2015-09-24
11   2000-05-02
12   2008-06-16
13   2011-12-20
14   2020-06-22
15   1992-02-05
16   2006-04-02
17   2007-03-01
18   2006-05-04
19   2008-09-10
20   1997-06-26
21   2002-05-03
22   1998-02-24
23   2012-11-26
24   2019-02-02
25   1997-11-02
26   1991-04-05
27   1996-07-04
28   2002-10-25
29   2005-09-28
30   2016-10-26
31   1996-02-18
32   1991-04-05
33   1997-10-09
34   2006-08-18
35   1994-07-11
36   1993-10-03
37   1997-07-28
38   2010-01-04
39   1997-04-01
40   2009-11-07
41   2012-10-13
42   1991-10-15
43   2001-08-27
44   2010-12-07
45   1993-07-06
46   2008-05-19
47   1995-01-08
48   1996-11-23
49   2020-04-13
Name: Month First, dtype: datetime64[ns]

To run a day-first column we only need to pass the keyword argument `dayfirst=True`.

In [15]:
pd.to_datetime(text_frame['Day First'], dayfirst=True)

0    2018-03-03
1    2002-01-14
2    2016-02-09
3    2015-05-12
4    1995-06-26
5    1993-02-28
6    2008-11-12
7    2011-04-15
8    2009-12-09
9    1995-04-05
10   2015-09-24
11   2000-05-02
12   2008-06-16
13   2011-12-20
14   2020-06-22
15   1992-02-05
16   2006-04-02
17   2007-03-01
18   2006-05-04
19   2008-09-10
20   1997-06-26
21   2002-05-03
22   1998-02-24
23   2012-11-26
24   2019-02-02
25   1997-11-02
26   1991-04-05
27   1996-07-04
28   2002-10-25
29   2005-09-28
30   2016-10-26
31   1996-02-18
32   1991-04-05
33   1997-10-09
34   2006-08-18
35   1994-07-11
36   1993-10-03
37   1997-07-28
38   2010-01-04
39   1997-04-01
40   2009-11-07
41   2012-10-13
42   1991-10-15
43   2001-08-27
44   2010-12-07
45   1993-07-06
46   2008-05-19
47   1995-01-08
48   1996-11-23
49   2020-04-13
Name: Day First, dtype: datetime64[ns]
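
If you would rather not rely on pandas inferring the layout, `pd.to_datetime` also accepts a `format` argument with the usual strftime-style codes. The sketch below uses a small made-up Series (`ambiguous`) whose values could be read either way, to show how much the choice matters.

```python
import pandas as pd

# Both strings are valid as month-day-year and as day-month-year.
ambiguous = pd.Series(['3-4-2018', '5-12-2015'])

as_month_first = pd.to_datetime(ambiguous, format='%m-%d-%Y')
as_day_first = pd.to_datetime(ambiguous, format='%d-%m-%Y')

print(as_month_first[0])  # March 4, 2018
print(as_day_first[0])    # April 3, 2018
```

Neither reading raises an error, so it is worth confirming which convention your data actually uses before converting.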

#### Now, write code that handles the year-first column and makes a new column, Date, that is the date in a datetime format. Year-first does not require any special arguments.

In [16]:
# your code goes here


What about reassembling datetimes from components? The dataframe below (`separate_frame`) has four pieces of data that, together, could be the year, month, day, and hour of a datetime object.

In [17]:
separate_frame = pd.DataFrame()
separate_frame['Intake Year'] = date_frame['Intake'].dt.year
separate_frame['Intake Month'] = date_frame['Intake'].dt.month
separate_frame['Intake Day'] = date_frame['Intake'].dt.day
separate_frame['Intake Hour'] = date_frame['Intake'].dt.hour
separate_frame.head()

Unnamed: 0,Intake Year,Intake Month,Intake Day,Intake Hour
0,1990,6,4,4
1,2014,2,5,7
2,1990,4,2,23
3,1992,11,4,11
4,2019,1,14,17


It turns out that `pd.to_datetime` can also convert a selection of columns into a pandas Series of datetime objects if the column names match the expected component names, like "year" or "Month". (Capitalization, in this particular case, does not matter.) Minimally, you need a year, a month, and a day.
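
As a quick check of the capitalization point, here is a sketch with a small made-up dataframe (`parts`) whose column names deliberately mix upper and lower case.

```python
import pandas as pd

# Column names differ in case but still spell year, month, and day.
parts = pd.DataFrame({'Year': [1990, 2014], 'MONTH': [6, 2], 'day': [4, 5]})

assembled = pd.to_datetime(parts)
print(assembled)
```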

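The code below asks for Intake Year, Intake Month, and Intake Day and renames them "year", "month", and "day". This is NOT an in-place operation: `rename` returns a new dataframe and leaves `separate_frame` unchanged, which is what we want because the renamed copy is only meant to be temporary.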

In [18]:
separate_frame[['Intake Year', 'Intake Month', 'Intake Day']].rename(
    columns={'Intake Year': 'year', 'Intake Month': 'month', 'Intake Day': 'day'}
)

Unnamed: 0,year,month,day
0,1990,6,4
1,2014,2,5
2,1990,4,2
3,1992,11,4
4,2019,1,14
5,1993,1,22
6,2012,7,6
7,1996,4,6
8,2008,12,13
9,1999,4,22


Below, we take the whole operation above and enclose it in the parentheses of `pd.to_datetime`.

In [19]:
pd.to_datetime(
    separate_frame[['Intake Year', 'Intake Month', 'Intake Day']].rename(
        columns={'Intake Year': 'year', 'Intake Month': 'month', 'Intake Day': 'day'}
    )
)

0    1990-06-04
1    2014-02-05
2    1990-04-02
3    1992-11-04
4    2019-01-14
5    1993-01-22
6    2012-07-06
7    1996-04-06
8    2008-12-13
9    1999-04-22
10   1995-10-07
11   1992-01-22
12   2010-06-01
13   2007-04-23
14   1997-12-08
15   2019-01-02
16   2009-05-28
17   1993-12-05
18   1996-07-18
19   2012-11-15
20   2012-05-11
21   1992-07-10
22   2004-01-28
23   1998-03-09
24   1990-04-12
25   2010-03-07
26   1993-01-01
27   2016-10-20
28   2006-02-10
29   1995-07-08
30   1995-08-02
31   2016-05-07
32   2018-10-19
33   2006-03-25
34   2000-05-02
35   1997-08-05
36   2020-01-23
37   2018-07-08
38   2014-01-11
39   1995-04-26
40   2020-06-21
41   2002-03-23
42   2003-10-13
43   2013-05-21
44   2016-04-14
45   2013-02-27
46   2016-10-10
47   1992-11-12
48   2016-04-22
49   1992-11-01
dtype: datetime64[ns]

If that were assigned to a column we would have a new, assembled, datetime column.

#### Below, make a new column in the separate_frame dataframe that is the datetime object that represents all the columns together.

In [None]:
# your code goes here


And that's it. While dates and times can be very tricky if you treat them like numbers, converting them to datetime objects can make them easy to deal with.