# Week 6 Munging Continued - Date Time Datatype and Integrative Lab 

Week 6 reading: **Pandas for Everyone** chapter 11 (pages 213 - 240)

Outline: 

* Chapter 11 - Datetime
    1. Python datetime calculations
* Lab



## Overview

Working with dates and times is difficult. 

**Quick quiz:** How many time zones does the United States have?
1. 4 (Pacific, Mountain, etc.)
2. 6 
3. 8
4. 11

The reflex answer for most of us is #1 - four time zones (Pacific, Mountain, Central, Eastern), but that answer is wrong!

From [timeanddate.com](https://www.timeanddate.com/time/zone/usa):

>**How Many Time Zones Are There in the US?**

>There are 9 time zones by law in the USA and its dependencies, however, adding the time zones of 2 uninhabited US territories gives **11 time zones in total** \[emphasis mine\]. The contiguous US has 4 standard time zones. In addition Alaska, Hawaii, and 5 US dependencies all have their own time zones. As neither Hawaii nor the 5 dependencies use Daylight Saving Time (DST), there are only 6 corresponding DST time zones.

In general, times and time zones are a **nightmare** for programmers. For example, a few years ago I worked for a company that created and sold software to hospitals for patient tracking and charting. With that thought in mind, consider this:

1. Many, if not all, hospital orders (medications, therapy, lab tests & results) are **EXTREMELY** time sensitive. In many cases, **_life or death_**-type time sensitive.
2. During the spring change to Daylight Saving Time, **an entire hour doesn't exist**: clocks jump from 1:59 a.m. straight to 3:00 a.m. What happens to an order written for "every four hours" when one of its scheduled times falls at 2:28 a.m.?
3. During the fall change from DST back to Standard Time, **an entire hour exists twice** (1:00 a.m. to 1:59 a.m. happens once in DST and once again in Standard Time).

The answer at that time (more than a decade ago) was to literally _turn off the application_ for the time-change hour, go back to charting everything by hand, and then have the clinicians enter all the charting later.
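To make the missing hour and the repeated hour concrete, here is a minimal sketch using the standard-library `zoneinfo` module (Python 3.9 or later). The `America/Chicago` zone and the 2015 changeover dates are my own choices for illustration, not something from the hospital system described above. `zoneinfo` also needs a time zone database on the machine, so the sketch quietly skips the demo if none is found.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

try:
    # Assumption: US Central time. Requires the IANA time zone database
    # (installing the "tzdata" package provides it if the system has none).
    central = ZoneInfo('America/Chicago')
except ZoneInfoNotFoundError:
    central = None

if central is not None:
    # 2:28 a.m. on March 8, 2015 never appeared on a Central-time wall clock,
    # yet Python will happily build the object.
    missing = datetime(2015, 3, 8, 2, 28, tzinfo=central)
    # A round trip through UTC lands on a wall-clock time that really existed.
    round_trip = missing.astimezone(timezone.utc).astimezone(central)
    print(missing.time(), '->', round_trip.time())

    # 1:30 a.m. on November 1, 2015 happened twice; `fold` selects which one.
    first = datetime(2015, 11, 1, 1, 30, tzinfo=central)           # still on DST
    second = datetime(2015, 11, 1, 1, 30, fold=1, tzinfo=central)  # after falling back
    # Convert to UTC before subtracting so the real elapsed time is measured.
    gap = second.astimezone(timezone.utc) - first.astimezone(timezone.utc)
    print(gap)
```

Notice that nothing raises an error for the nonexistent 2:28 a.m. time. That silence is exactly why these bugs are so easy to ship.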

Since **Pandas for Everyone** does a good job of highlighting Pandas datetime functionality, the remainder of this *From the Expert* will discuss datetime calculations using Python's native datetime library and then finish with topics pertinent to this week's lab.

## 1. Python datetime calculations

Python's `datetime` module is organized around five main classes:

* date
* time
* datetime
* timedelta
* tzinfo

The first three are reasonably self-explanatory, but what about `timedelta` and `tzinfo`? The `timedelta` class represents a duration (the difference between two dates or times), and `tzinfo` is the abstract base class that time zone support is built on.
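A short sketch may help make `timedelta` and `tzinfo` less abstract. The "every four hours" schedule below is a hypothetical example with made-up times, and `timezone` is the ready-made fixed-offset subclass of `tzinfo` that ships in the same module.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical order: one dose every four hours, starting at 10 p.m.
first_dose = datetime(2015, 1, 5, 22, 0)
interval = timedelta(hours=4)
doses = [first_dose + n * interval for n in range(4)]
for dose in doses:
    print(dose)

# Subtracting two datetimes gives a timedelta back.
elapsed = doses[-1] - doses[0]
print(elapsed)                         # 12:00:00
print(elapsed.total_seconds() / 3600)  # 12.0

# timezone is a concrete tzinfo for fixed UTC offsets (no DST rules).
utc_dose = first_dose.replace(tzinfo=timezone.utc)
eastern_standard = timezone(timedelta(hours=-5), 'EST')
print(utc_dose.astimezone(eastern_standard))
```

Adding a `timedelta` rolls over midnight for us: the second dose lands on January 6 without any manual day arithmetic.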

The first thing we will do is import the classes we need:

In [12]:
from datetime import date
from datetime import time
from datetime import datetime

Next, we will read the Ebola data file into a list so that we can manipulate the date and day columns. 

In [6]:
ebola = []
with open('../pandas_for_everyone/data/country_timeseries.csv') as infile:
    for line in infile:
        ebola.append(line)

# The first row is header information
print(ebola[0])

# The next line is actual data
print(ebola[1])

Date,Day,Cases_Guinea,Cases_Liberia,Cases_SierraLeone,Cases_Nigeria,Cases_Senegal,Cases_UnitedStates,Cases_Spain,Cases_Mali,Deaths_Guinea,Deaths_Liberia,Deaths_SierraLeone,Deaths_Nigeria,Deaths_Senegal,Deaths_UnitedStates,Deaths_Spain,Deaths_Mali

1/5/2015,289,2776,,10030,,,,,,1786,,2977,,,,,



Remember, each line loaded this way is just one long string. We have to use something like `split()` to get the element we want.

In [8]:
ebola[1].split(',')

['1/5/2015',
 '289',
 '2776',
 '',
 '10030',
 '',
 '',
 '',
 '',
 '',
 '1786',
 '',
 '2977',
 '',
 '',
 '',
 '',
 '\n']
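Notice the stray `'\n'` at the end of that list. As an alternative to splitting by hand, the standard-library `csv` module handles line endings and lets us look up fields by column name. The sketch below uses an in-memory stand-in for the file, trimmed to the first five columns of the header and data row shown above, so it runs without the data file.

```python
import csv
import io

# Stand-in for the real file: the first five columns of the header
# and of the first data row printed earlier.
sample = io.StringIO(
    'Date,Day,Cases_Guinea,Cases_Liberia,Cases_SierraLeone\n'
    '1/5/2015,289,2776,,10030\n'
)

# DictReader uses the first row as the column names.
rows = list(csv.DictReader(sample))
first = rows[0]

print(first['Date'])                 # 1/5/2015
print(first['Day'])                  # 289
print(repr(first['Cases_Liberia']))  # '' (missing values stay empty strings)
```

With the real file, the same `csv.DictReader` call would wrap the `infile` object from the `with open(...)` block.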

Now, from that split, we want the first (zeroth) element:

In [13]:
# Remember, this line just explores "what if": it displays the value and nothing more.
# To keep the value, I have to assign it to a variable.

ebola[1].split(',')[0] 

'1/5/2015'

The function that converts this string to a date is `strptime()`, a method of the `datetime` class. It takes two parameters: the string to parse and a **format** string that describes its layout. These format strings are explained on page 216 of **Pandas for Everyone** and are a 'semi-standard' way to represent time and date string formats.

Let's convert the string date above to a date object. `strptime()` always returns a full `datetime`, and since we only have a date string to work with, we will chain the `date()` method on the end to drop the time portion.

In [17]:
date_string = ebola[1].split(',')[0] 

the_date = datetime.strptime(date_string, '%m/%d/%Y').date()

In [18]:
print(the_date)

2015-01-05


In [19]:
type(the_date)

datetime.date
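The same format codes work in the other direction through `strftime()`, and a format that does not match the string raises a `ValueError`. Here is a minimal sketch of both behaviors, reusing the date string from above. The output formats are just ones I picked for illustration.

```python
from datetime import datetime

parsed = datetime.strptime('1/5/2015', '%m/%d/%Y')

# strftime() goes the other way: datetime -> string.
print(parsed.strftime('%Y-%m-%d'))  # 2015-01-05
print(parsed.strftime('%d %B %Y'))  # month name depends on the locale

# The format has to describe the string; this one does not.
try:
    datetime.strptime('1/5/2015', '%Y-%m-%d')
    mismatch_caught = False
except ValueError as err:
    mismatch_caught = True
    print('Could not parse:', err)
```

When a real data file mixes date layouts, that `ValueError` is usually the first sign of it.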

It is worth noting that on page 218, the book prints the type of an extracted date and gets a `Timestamp` object. `Timestamp` is pandas' counterpart to Python's native `datetime` object, and in most cases the two can be used interchangeably.

In [20]:
the_date.year

2015

In [21]:
the_date.month

1

In [22]:
the_date.day

5
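Once we have a real date object, we can finally do a calculation with it. The sketch below combines the parsed date with the `Day` value from the same row (289) to work backward to day 0. It assumes that `Day` counts the days elapsed since the first day in the data set, which is my reading of the column rather than something the file states.

```python
from datetime import date, timedelta

the_date = date(2015, 1, 5)  # the date parsed above
day_count = int('289')       # the Day column from the same row (a string in the file)

# Assumption: Day counts days elapsed since day 0 of the data set.
day_zero = the_date - timedelta(days=day_count)
print(day_zero)  # 2014-03-22

# Subtracting two dates gives a timedelta; .days recovers the count.
print((the_date - day_zero).days)  # 289
```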

We can even get the day number (0 = Monday, 6 = Sunday) and use it to convert to a text day name:

In [24]:
day_names = {0: 'Monday',
             1: 'Tuesday',
             2: 'Wednesday',
             3: 'Thursday',
             4: 'Friday',
             5: 'Saturday',
             6: 'Sunday'}


day_number = the_date.weekday()
print(day_number)
print(day_names[day_number])

0
Monday
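Building the dictionary by hand is a good exercise, but the standard library already knows the day names. Here is a short sketch of two alternatives, `calendar.day_name` and the `%A` format code. Both follow the current locale, so the names are in English only under an English or default locale.

```python
import calendar
from datetime import date

the_date = date(2015, 1, 5)

# calendar.day_name is indexed the same way as weekday(): 0 = Monday.
print(calendar.day_name[the_date.weekday()])

# The %A format code produces the full day name directly.
print(the_date.strftime('%A'))

# isoweekday() uses the ISO numbering instead: 1 = Monday, 7 = Sunday.
print(the_date.isoweekday())  # 1
```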


## Lab Info

TBD