# **WORKING WITH DATES AND TIMES IN PYTHON**

- In Python, it is important to understand how to parse, analyse and sort dates and times, especially in complex situations. Common examples of date/time data include:

    - Weather information
    
    - Computer Logs of timestamps
    
    - Sales data including the date/time range
    
    
- As you progress in your learning in Python, you will work with more complex date and time data, such as:

    - A date formatted as February 2, 1998
    
    - Times written in the 12-hour or 24-hour format
    
    - Addition and subtraction across date/time boundaries.
    
    
- Python's `datetime` module allows us to carry out all of these operations, regardless of how complex they might appear.


- For this lesson, we will use the archive containing the [records of visitors](https://raw.githubusercontent.com/Tess-hacker/WORKING-WITH-COMPLEX-DATE-TIME-DATASET-IN-PYTHON/master/potus_visitors_2015.csv) who were at the White House and met with the President in the year 2015. The purpose of this record is to document all appointments for all White House visitors, excluding staff members and people not categorized as visitors.


- To help you understand the dataset, the columns are explained below:

    - *name*: The name of the visitor.

    - *appt_made_date*: The date and time that the appointment was created.
    
    - *appt_start_date*: The date and time that the appointment was scheduled to start.

    - *appt_end_date*: The date and time that the appointment was scheduled to end.
    
    - *visitee_namelast*: The last name of the visitee (the person the visitor was meeting with).
    
    - *visitee_namefirst*: The first name of the visitee.
    
    - *meeting_room*: The room in which the appointment was scheduled.

    - *description*: Optional comments added by the WAVES operator.
    
- **WAVES** means the **Workers and Visitors Entry System** used for scheduling appointments for all White House Visitors.


## **LEARNING OBJECTIVES**


- At the end of this lesson, you will have learnt:

    - How to calculate the month in which the White House had the most visitors.
    
    - How to calculate the time of day at which the White House was visited the most during the year.
    
    - How to calculate the summary statistics on **visit length** and **how early visits were booked ahead of time**.
    
    - How to create a clean and well-formatted summary of daily visits during the year.
    
    
- Ready? LET'S HAVE FUN!

In [1]:
# The first step is to import our dataset using pandas.
# This method is for those using the raw dataset on GitHub.
import csv
from csv import reader
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
url = 'https://raw.githubusercontent.com/Tess-hacker/WORKING-WITH-COMPLEX-DATE-TIME-DATASET-IN-PYTHON/master/potus_visitors_2015.csv'
# error_bad_lines=False was removed in pandas 2.0; on_bad_lines='skip' (pandas 1.3+) does the same job.
potus_csv = pd.read_csv(url, sep=',', on_bad_lines='skip')
print(potus_csv[:10])


                     name       appt_made_date appt_start_date appt_end_date  \
0       Joshua T. Blanton  2014-12-18T00:00:00     1/6/15 9:30  1/6/15 23:59   
1         Jack T. Gutting  2014-12-18T00:00:00     1/6/15 9:30  1/6/15 23:59   
2       Bradley T. Guiles  2014-12-18T00:00:00     1/6/15 9:30  1/6/15 23:59   
3          Loryn F. Grieb  2014-12-18T00:00:00     1/6/15 9:30  1/6/15 23:59   
4        Travis D. Gordon  2014-12-18T00:00:00     1/6/15 9:30  1/6/15 23:59   
5         Taylor D. Gibbs  2014-12-18T00:00:00     1/6/15 9:30  1/6/15 23:59   
6       Dameriah A. Smith  2014-12-18T00:00:00     1/6/15 9:30  1/6/15 23:59   
7  Dylan S. Hopkinstaylor  2014-12-18T00:00:00     1/6/15 9:30  1/6/15 23:59   
8      Joseph S. Barbaria  2014-12-18T00:00:00     1/6/15 9:30  1/6/15 23:59   
9    Jonathan L. Buckland  2014-12-18T00:00:00     1/6/15 9:30  1/6/15 23:59   

  visitee_namelast visitee_namefirst meeting_room  \
0              NaN             potus    west wing   
1            

In [2]:
# This is the method to use if you have the dataset available on your computer. I'll be referring to this import subsequently.
from csv import reader
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
# Open the file in a `with` block so that it is closed automatically after reading.
with open('potus_visitors_2015.csv', 'r', encoding='utf-8') as opened_file:
    read_file = reader(opened_file)
    potus = list(read_file)
potus = potus[1:]  # we have eliminated the header
print(potus[:10])

[['Joshua T. Blanton', '2014-12-18T00:00:00', '1/6/15 9:30', '1/6/15 23:59', '', 'potus', 'west wing', 'JointService Military Honor Guard'], ['Jack T. Gutting', '2014-12-18T00:00:00', '1/6/15 9:30', '1/6/15 23:59', '', 'potus', 'west wing', 'JointService Military Honor Guard'], ['Bradley T. Guiles', '2014-12-18T00:00:00', '1/6/15 9:30', '1/6/15 23:59', '', 'potus', 'west wing', 'JointService Military Honor Guard'], ['Loryn F. Grieb', '2014-12-18T00:00:00', '1/6/15 9:30', '1/6/15 23:59', '', 'potus', 'west wing', 'JointService Military Honor Guard'], ['Travis D. Gordon', '2014-12-18T00:00:00', '1/6/15 9:30', '1/6/15 23:59', '', 'potus', 'west wing', 'JointService Military Honor Guard'], ['Taylor D. Gibbs', '2014-12-18T00:00:00', '1/6/15 9:30', '1/6/15 23:59', '', 'potus', 'west wing', 'JointService Military Honor Guard'], ['Dameriah A. Smith', '2014-12-18T00:00:00', '1/6/15 9:30', '1/6/15 23:59', '', 'potus', 'west wing', 'JointService Military Honor Guard'], ['Dylan S. Hopkinstaylor', 

- In Python, to make use of a module, we have to import it into our workspace. A module or package simply contains variables, functions and classes that can be imported into a Python script.


- There are several ways through which we can import these modules into our Python script. Let's examine them below:

    -   **By Name**: We can import modules using their names. This is, by far, the most commonly used method for importing modules. This is shown as follows:
    
         `import csv`
        
         `csv.reader`
         
         
    -   **Import by Alias**: We can import a whole module using an alias. This is also a very common method of importing modules in Python. This is shown as:

         `import pandas as pd`

         `import numpy as np`



    -   **Import Definitions from Module by Name**: This is the approach of importing definitions from a module by specifying the name of each definition we want.
    
         `from pandas import Series, DataFrame` - multiple definitions
        
         `from csv import reader` - single definition



    -   **Import all Definitions using a Wildcard**: This approach imports every public definition in a module at once, which is convenient when you need to use many of them. It is done as follows:
    
         `from csv import *`
        
         `reader()`
        
         `writer()`
        
         `get_dialect()`
     
     
     
- Pay close attention to the fact that each of the import methods above has its upsides and downsides. We need to understand the purpose of each approach, and the effect it will have on our code's readability, before we use it.

    - If we're importing a long-name module by its name and using it often, our code can become harder to read.

    - If we use an uncommon alias, it may not be clear in our code which module we are using.
    
    - If we use the specific definition or wildcard approach, and the script is long or complex, it may not be immediately clear where a definition comes from. This can also be a problem if we use this approach with multiple modules.

    - If we use the specific definition or wildcard approach, it's easier to accidentally overwrite an imported definition.
    


- Therefore, it is important that you choose wisely before you embark on any of the approaches.
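
- As a small illustration of the trade-offs listed above, the sketch below imports with three of the styles and then overwrites a definition that was imported by name. The variable names are only for illustration and are not part of the lesson's dataset code.

```python
import csv                # import by name
import datetime as dt     # import by alias
from csv import reader    # import a single definition by name

# The by-name and by-definition styles point at the same underlying function.
same_object = csv.reader is reader
print(same_object)

# The alias is simply a shorter name for the whole module.
first_visit = dt.date(2015, 1, 6)
print(first_visit)

# Downside of importing a definition by name: it is easy to overwrite by accident.
reader = "my own variable"          # silently replaces the imported function
overwritten = csv.reader is reader
print(overwritten)

# The module-qualified name still works, because `csv` itself was not touched.
rows = list(csv.reader(["a,b,c"]))
print(rows)
```

- Keeping the module name (or a well-known alias) in front of the definition makes this kind of accident less likely, at the cost of slightly longer lines.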

## **PYTHON DATETIME MODULE**

- The `datetime` module in Python is used to work with datasets that involve dates and times. The module contains several classes, including:

   `datetime.datetime` - This class is used for working with both **date and time data**.
    
    
   `datetime.time` - This class is used when we are working with **time-related data only**.
    
    
   `datetime.timedelta` - This is the class used when we are **representing time periods** within our dataset.
   
   
- We can go ahead and choose the way to import the `datetime` module as well as the definition that would be needed for analysing our dataset.


- The `datetime.datetime` class is the most commonly used class from the `datetime` module because of its capability and versatility. When we instantiate this class, the **first six arguments** are (in order):

    - Year (required)
    
    - Month (required)
    
    - Day (required)
    
    - Hour (Optional)
    
    - Minute (Optional)
    
    - Second (Optional)
    
    
- Let us put the class to good use by applying it below:

In [3]:
import datetime as dt #using the alias import method
ibm_founded = dt.datetime(1911, 6, 16)
man_on_moon = dt.datetime(1969, 7, 20, 20, 17)
print (ibm_founded)
print ('\n')
print (man_on_moon)

1911-06-16 00:00:00


1969-07-20 20:17:00


## **DATETIME.STRPTIME CONSTRUCTOR**

- In our dataset, we have the 2nd, 3rd and 4th columns containing the date/time data. Because this is the area of focus in this lesson, we need to examine these columns carefully and determine the structure and nature of the date/time information contained within the columns.


- However, looking at the first few rows is not enough to determine the structure. For the date, we cannot tell whether the arrangement is day/month/year or month/day/year, and for the time, we need to know whether it is structured in the 12-hour or 24-hour format.


- To be sure of the structure, we need to look at a value in which the day is greater than 12 and the time is after midday. The last row of the dataset contains such a value, so let us print it:

In [4]:
print (potus[-1][2])

12/18/15 16:30


- The above information shows us that our dates are structured in the month/day/year format and the time is formatted using the 24-hour time structure.


- So, we can now convert these string values into datetime objects. We could split the strings manually and convert each part into a number, but this is where the `datetime.strptime()` constructor comes in handy. In the lesson titled [Object Oriented Python](https://github.com/Tess-hacker/Object-Oriented-Python), you were taught that the `__init__()` method is sometimes known as the constructor. 


- A class can also provide additional constructors that build its objects in other ways. The `datetime` class has such a constructor, `strptime()`, which builds a datetime object from a string.


- The `datetime.strptime()` constructor uses a special syntax, shared with the `strftime()` method, to describe date/time formats. The syntax uses format codes, each made up of the `%` character followed by a single character, to specify the date and/or time format which we want.


- The first argument of the `datetime.strptime` constructor is **the string that we want to parse** while the second is **a string that specifies the format.** Let me show you a demonstration:

In [5]:
parse = dt.datetime.strptime("18/12/15","%d/%m/%y") # where '%d', '%m' and '%y' stand for day, month and two-digit year respectively.
print(type(parse))
print (parse)

<class 'datetime.datetime'>
2015-12-18 00:00:00


- **Pay attention**: if you pass a string which is not arranged in the format given in the second argument, a `ValueError` is raised. So, be sure to check the structure of your string before you run the code. Let's demonstrate this below:

In [6]:
wrong_parse = dt.datetime.strptime("12/18/15","%d/%m/%y") # fails: 18 is not a valid month, so the string does not match the day/month/year format.
print(type(wrong_parse))
print (wrong_parse)

ValueError: time data '12/18/15' does not match format '%d/%m/%y'

- If our string uses a dash (-) instead of a slash (/) as the separator, we simply use the same separator in the format string and we get the same result. Likewise, where the string is arranged in a different order, say month/day/year, and the format string matches it, the resulting datetime object is still displayed in the year-month-day format.
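
- The point about separators and ordering can be checked with a short sketch. The strings below are made-up examples rather than values taken from the dataset.

```python
import datetime as dt

# The separator in the format string must match the separator in the data.
with_slashes = dt.datetime.strptime("18/12/15", "%d/%m/%y")
with_dashes = dt.datetime.strptime("18-12-15", "%d-%m-%y")

# A different ordering in the string is fine as long as the format matches it.
month_first = dt.datetime.strptime("12/18/15", "%m/%d/%y")

print(with_slashes == with_dashes == month_first)
print(month_first)

# A format whose separator does not match the string raises a ValueError.
try:
    dt.datetime.strptime("18/12/15", "%d-%m-%y")
    mismatch_error = None
except ValueError as error:
    mismatch_error = error
print(mismatch_error)
```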


- There are many other format codes that can be used with this constructor, and you can find them in the [documentation here](https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior). The most common ones are shown below:

| **Strftime Code** | **Meaning** | **Examples** |
| --- | --- | --- |
| `%d` | Day of the month as a zero-padded number | `04` |
| `%A` | Day of the week as a word | `Monday` |
| `%m` | Month as a zero-padded number | `09` |
| `%Y` | Year as a four-digit number | `1901` |
| `%y` | Year as a two-digit number with zero-padding | `01` (2001) |
|  |  | `88` (1988) |
| `%B` | Month as a word | `September` |
| `%H` | Hour in 24 hour time as zero-padded number | `05` (5 a.m.) |
|  |  | `15` (3 p.m.) |
| `%p` | a.m. or p.m. | `AM` |
| `%I` | Hour in 12 hour time as zero-padded number | `05` (5 a.m., or 5 p.m. if `AM`/`PM` indicates otherwise) |
| `%M` | Minute as a zero-padded number | `07` |


- For `%d`, `%m`, `%y`, `%H`, `%I` and `%M` (the zero-padded numbers): The `strptime` parser will parse non-zero-padded numbers without raising an error.


- For `%A`, `%B` and `%p` (the codes that match words): The date parts containing words will be interpreted using the locale settings on your computer, so `strptime` won't be able to parse "febrero" ("February" in Spanish) if your locale is set to an English language locale.


- For `%y`: The year values from 00-68 will be interpreted as 2000-2068 while values from 69-99 are interpreted as 1969-1999.
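
- The notes above can be tried out with a few one-line parses. This is only a sketch: the strings are illustrative, and the `%B` and `%p` examples assume that your computer uses an English language locale.

```python
import datetime as dt

# Non-zero-padded numbers are accepted when parsing.
unpadded = dt.datetime.strptime("1/6/15 9:30", "%m/%d/%y %H:%M")
print(unpadded)

# Two-digit years: 00-68 map to 2000-2068, 69-99 map to 1969-1999.
year_68 = dt.datetime.strptime("68", "%y").year
year_69 = dt.datetime.strptime("69", "%y").year
print(year_68, year_69)

# Month names and AM/PM markers depend on the locale (English assumed here).
written_date = dt.datetime.strptime("February 2, 1998", "%B %d, %Y")
afternoon = dt.datetime.strptime("5:07 PM", "%I:%M %p")
print(written_date)
print(afternoon.hour, afternoon.minute)
```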


- Now, let us make use of this technique to convert the date values in the third column of the dataset (`appt_start_date`) to datetime objects:

In [8]:
strptime_format = "%m/%d/%y %H:%M" # the format in which our date strings are structured
for row in potus: # iterate over the dataset
    app_start_date = row[2] # assign the start date string to a variable
    app_start_date = dt.datetime.strptime(app_start_date, strptime_format) # parse it into a datetime object
    row[2] = app_start_date # store the datetime object back in the row
print (app_start_date) # the start date of the last row

2015-12-18 16:30:00


- The `datetime` class has certain attributes which allow us to extract individual pieces of date and time information from a datetime object. These attributes include:

    - `datetime.day` : To extract the day of the month.
    
    - `datetime.month` : To extract the month of the year.
    
    - `datetime.year` : To extract the year itself.
    
    - `datetime.hour` : To extract the hour of the day.
    
    - `datetime.minute` : To extract the minute of the hour.
    
    
- If we wanted to turn a datetime object back into a string, in a format similar to the ones we parsed with `strptime`, we could use these attributes. However, writing code that constructs the string from these attributes one by one would be inefficient, which is why the `strftime` method comes in handy.
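
- To make the comparison concrete, the sketch below builds the same string twice: once by hand from the individual attributes, and once with a single `strftime` call. The date used is the one printed by the cell above, and the day/month/year output format is just an example.

```python
import datetime as dt

appointment = dt.datetime(2015, 12, 18, 16, 30)

# Building the string by hand from the individual attributes.
manual = "{:02d}/{:02d}/{} {:02d}:{:02d}".format(
    appointment.day, appointment.month, appointment.year,
    appointment.hour, appointment.minute,
)

# The same string from a single strftime call.
formatted = appointment.strftime("%d/%m/%Y %H:%M")

print(manual)
print(formatted)
```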


- Kindly note that it is easy to confuse the `strftime` method with the `strptime` constructor. An efficient way to avoid this confusion is to differentiate them as follows:

    - `strptime` is a combination of 'str-P-time' meaning **string parse time**
    
    - `strftime` is a combination of 'str-F-time' meaning **string format time**
    
    
- With the `strftime` method, we can format our datetime objects using any of the format codes shown in the table above. Now, let us create a formatted frequency table in order to analyse the appointment dates shown in the dataset. With our result, we will be able to **ascertain the number of appointments that fell within each month of the year**.

In [9]:
strftime_format = "%B, %Y"
visitors_per_month = {}
for row in potus:
    app_start_date = row[2]
    app_start_date = app_start_date.strftime(strftime_format)
    if app_start_date not in visitors_per_month:
        visitors_per_month[app_start_date] = 1
    else:
        visitors_per_month[app_start_date] += 1
print (visitors_per_month)

{'January, 2015': 1248, 'February, 2015': 2165, 'March, 2015': 2262, 'April, 2015': 4996, 'May, 2015': 3013, 'June, 2015': 7743, 'July, 2015': 2930, 'August, 2015': 1350, 'September, 2015': 4416, 'October, 2015': 3669, 'November, 2015': 1133, 'December, 2015': 13029}
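
- From a frequency table like this one, the busiest month can be picked out with `max()`. The sketch below copies four of the counts printed above into a small dictionary so that it runs on its own; with the full `visitors_per_month` dictionary the same two lines would apply.

```python
# A subset of the counts printed above, copied here so the example is self-contained.
visitors_per_month = {
    'April, 2015': 4996,
    'June, 2015': 7743,
    'September, 2015': 4416,
    'December, 2015': 13029,
}

# max() iterates over the keys; key=... tells it to compare them by their counts.
busiest_month = max(visitors_per_month, key=visitors_per_month.get)
busiest_count = visitors_per_month[busiest_month]
print(busiest_month, busiest_count)
```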


## **THE TIME CLASS**


- As discussed earlier, the `datetime` class holds values for both dates and times. The `time` class, however, only holds time data, i.e. hours, minutes, seconds and microseconds. 


- To instantiate the `time` class, we use the `datetime.time()` syntax. All of the arguments are optional and default to 0. They are arranged in order as follows:

    - *hour*
    
    - *minute*
    
    - *second*
    
    - *microsecond*
    
    
- So, if we instantiate the class with positional arguments, the values are assigned in the order above, and any argument we leave out is set to 0.


- We can also create a `time` object from a `datetime` object by calling its `time()` method, i.e. `datetime.datetime.time()`. 


- Be informed that the `time` class does not have a `strptime()` constructor. Where this constructor is needed, we can use `datetime.strptime()` and then convert the result to a `time` object. 


- Just like the `datetime` class, the `time` class has attributes for extracting individual values from a `time` object. For instance, we can write `time.hour`, `time.minute` and so on. These are attributes rather than methods, so they take no parentheses. 
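
- The different ways of obtaining a `time` object described above can be sketched as follows. The values are illustrative and the variable names are hypothetical.

```python
import datetime as dt

# Positional arguments are read as hour, minute, second, microsecond.
half_past_nine = dt.time(9, 30)              # second and microsecond default to 0
with_keywords = dt.time(hour=9, minute=30)   # the same time, spelled out

# Extracting the time part of a datetime object.
from_datetime = dt.datetime(2015, 1, 6, 9, 30).time()

# The time class has no strptime, so parse a datetime first, then take its time.
parsed_time = dt.datetime.strptime("16:30", "%H:%M").time()

# Attributes are read without parentheses.
print(half_past_nine.hour, half_past_nine.minute, half_past_nine.second)
print(parsed_time)
```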


- Let us now go ahead and apply this to the *potus* data.

In [12]:
appt_times = []
for row in potus:
    datetime_variable = row[2]
    time_object = datetime_variable.time()
    appt_times.append(time_object)
print (appt_times[:30]) # reducing our output for efficiency sake

[datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30)]


### **Comparing `Time` Objects**


- Another good feature of the `time` class is that we can compare time objects with one another to know the maximum and minimum values. Since the class supports comparisons, the in-built Python functions of `min()` and `max()` are compatible with this class.


- We will now apply `min()` and `max()` to the list of time values which we have created and assign the results to variables. This gives us the **earliest appointment time** and the **latest appointment time** for all the visits to the potus.

In [14]:
min_appointmenttime = min(appt_times)
max_appointmenttime = max(appt_times)
print (min_appointmenttime)
print (max_appointmenttime)

06:00:00
21:30:00
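
- The same list of times can also be used to estimate the hour of the day with the most appointments. The sketch below uses a short, hypothetical list of times in place of `appt_times`, and `collections.Counter` is one possible way of counting; the lesson's dictionary-based frequency table would work just as well.

```python
import datetime as dt
from collections import Counter

# Hypothetical sample standing in for the full appt_times list.
sample_times = [
    dt.time(9, 30), dt.time(9, 45), dt.time(16, 30),
    dt.time(6, 0), dt.time(9, 0),
]

# time objects can be compared directly, which is what min() and max() rely on.
earliest = min(sample_times)
latest = max(sample_times)
print(earliest, latest)

# Count the appointments per hour and pick the most frequent hour.
visits_per_hour = Counter(t.hour for t in sample_times)
busiest_hour, visits = visits_per_hour.most_common(1)[0]
print(busiest_hour, visits)
```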


## **CALCULATIONS WITH DATES AND TIMES**


- As stated above, `time` objects support comparisons; so do `datetime` objects. This means that comparison operators such as `<`, `>` and `==` can be used with `datetime` objects. Some arithmetic operators also work, but not in every combination: for example, subtracting one `datetime` object from another is allowed, while adding two `datetime` objects together raises a `TypeError`. 


- When we use the `-` operator on two `datetime` objects, the result type we get is the `datetime.timedelta` object. Recall that this is one of the classes in the `datetime` module which I listed in the introductory part of the lesson. 


- The `timedelta` object represents **a period of time**, whereas the other classes refer to a specific moment in time. To instantiate the `timedelta` type, the following syntax is used:

    `datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)`


- Since the arguments are not ordered from the largest unit to the smallest, it is best to use keyword arguments to specify which argument we are assigning a value to. This avoids passing a value to the wrong unit by mistake. 
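
- A short sketch of how the arguments behave is given below. It shows that positional values are read as days and then seconds, and that whatever units we pass in are stored internally as days, seconds and microseconds. The variable names are only for illustration.

```python
import datetime as dt

# Positional arguments follow the signature order: days first, then seconds.
positional = dt.timedelta(5, 30)
print(positional == dt.timedelta(days=5, seconds=30))

# Keyword arguments make the intended units explicit.
explicit = dt.timedelta(hours=2, minutes=30)

# Internally only days, seconds and microseconds are stored.
print(explicit.days, explicit.seconds)
print(explicit.total_seconds())

# Different spellings of the same period compare as equal.
same_period = dt.timedelta(minutes=90) == dt.timedelta(hours=1, minutes=30)
print(same_period)
```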


- Below are the different operations possible between the `datetime()` and the `timedelta()` objects:

| **Operation** | **Explanation** | **Resultant Type** |
| --- | --- | --- |
| `datetime-datetime` | Calculate the time between two specific dates/times | timedelta |
| `datetime-timedelta` | Subtract a time period from a date or time | datetime |
| `datetime+timedelta` | Add a time period to a date or time | datetime |
| `timedelta+timedelta` | Add two periods of time together | timedelta |
| `timedelta-timedelta` | Calculate the difference between two time periods | timedelta |

- *Note that the `datetime` object in the table can be substituted with a `date` object. `time` objects, on the other hand, do not support these arithmetic operations*.
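
- As a sketch of the first two rows of the table, the code below works out how far ahead of time one appointment was booked. The two strings are the `appt_made_date` and `appt_start_date` values from the first row of the dataset shown earlier; the one-day reminder is a hypothetical extra.

```python
import datetime as dt

# appt_made_date is stored in a different format from appt_start_date.
made = dt.datetime.strptime("2014-12-18T00:00:00", "%Y-%m-%dT%H:%M:%S")
start = dt.datetime.strptime("1/6/15 9:30", "%m/%d/%y %H:%M")

# datetime - datetime gives a timedelta.
lead_time = start - made
print(type(lead_time).__name__, lead_time)

# datetime - timedelta gives a datetime.
reminder = start - dt.timedelta(days=1)
print(type(reminder).__name__, reminder)
```

- Applying the same subtraction to every row would give the values needed for summary statistics on how early visits were booked.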


- Let us experiment with this syntax in the following codes:

In [17]:
five_days = dt.timedelta(5)
print ('The resulting number of days is:')
print (five_days)
print ('\n')
ten_weeks = dt.timedelta(weeks = 10)
print ('The resulting number of weeks is:')
print (ten_weeks)
print ('\n')
two_hours_thirty_minutes = dt.timedelta(hours = 2, minutes = 30)
print ('The resulting number of hours and minutes is:')
print (two_hours_thirty_minutes)
print ('\n')
# We can also use timedelta to add time to, or subtract time from, date and datetime objects. Let's see this below:
date1 = dt.date(2000, 10, 21)
date1_plustwoweeks = date1 + dt.timedelta(weeks= 2)
print ('The date increased by 2 weeks is:')
print (date1_plustwoweeks)
print ('\n')

dt_1 = dt.datetime(1981, 1, 31)
dt_2 = dt.datetime(1984, 6, 28)
dt_3 = dt.datetime(2016, 5, 24)
dt_4 = dt.datetime(2001, 1, 1, 8, 24, 13)
dt_2minus_dt_1 = dt_2 - dt_1
dt_3plus_fiftysixdays = dt_3 + dt.timedelta(days = 56)
dt_4minus_3600seconds = dt_4 - dt.timedelta(seconds = 3600)
print ('The subtraction of dt_1 from dt_2 is:')
print (dt_2minus_dt_1)
print ('\n')
print ('The addition of 56 days to dt_3 is:')
print (dt_3plus_fiftysixdays)
print ('\n')
print ('The subtraction of 3600 seconds from dt_4 is:')
print (dt_4minus_3600seconds)

The resulting number of days is:
5 days, 0:00:00


The resulting number of weeks is:
70 days, 0:00:00


The resulting number of hours and minutes is:
2:30:00


The date increased by 2 weeks is:
2000-11-04


The subtraction of dt_1 from dt_2 is:
1244 days, 0:00:00


The addition of 56 days to dt_3 is:
2016-07-19 00:00:00


The subtraction of 3600 seconds from dt_4 is:
2001-01-01 07:24:13


## **SUMMARIZING APPOINTMENT LENGTHS**


- The final task in this lesson is to calculate summary values for the appointment length of each visit to the potus, using the `datetime` module and the classes contained in it.


- Our task is to create a frequency table, using a for loop to get the meeting length of each visit, and then use the `min` and `max` functions to find the shortest and longest appointment lengths.

In [36]:
# First, convert the appointment end date (the fourth column) to a datetime object.
# The end dates use the same format as the start dates, e.g. '1/6/15 23:59'.
for row in potus:
    end_date = row[3]
    end_date = dt.datetime.strptime(end_date, "%m/%d/%y %H:%M")
    row[3] = end_date
appointment_lengths = {} # now we create an empty dictionary for the frequency table
for row in potus: # loop over each row and count how often each appointment length occurs
    start_date = row[2]
    end_date = row[3]
    length = end_date - start_date # datetime - datetime gives a timedelta
    if length not in appointment_lengths:
        appointment_lengths[length] = 1
    else:
        appointment_lengths[length] += 1
min_length = min(appointment_lengths) # min() and max() compare the dictionary keys
max_length = max(appointment_lengths)
print (appointment_lengths)
print (min_length)
print (max_length)

{datetime.timedelta(seconds=52140): 1213, datetime.timedelta(seconds=50340): 1543, datetime.timedelta(seconds=48540): 696, datetime.timedelta(seconds=46740): 681, datetime.timedelta(seconds=44940): 357, datetime.timedelta(seconds=41340): 1115, datetime.timedelta(seconds=53940): 511, datetime.timedelta(seconds=17940): 301, datetime.timedelta(seconds=47040): 2, datetime.timedelta(seconds=43140): 1041, datetime.timedelta(seconds=39540): 1548, datetime.timedelta(seconds=37740): 5897, datetime.timedelta(seconds=35940): 996, datetime.timedelta(seconds=34140): 921, datetime.timedelta(seconds=21540): 8173, datetime.timedelta(seconds=30540): 2855, datetime.timedelta(seconds=28740): 2027, datetime.timedelta(seconds=32340): 862, datetime.timedelta(seconds=49140): 12, datetime.timedelta(seconds=24240): 103, datetime.timedelta(seconds=44040): 39, datetime.timedelta(seconds=42840): 6, datetime.timedelta(seconds=35040): 119, datetime.timedelta(seconds=32640): 13, datetime.timedelta(seconds=42240): 32
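
- Beyond the shortest and longest lengths, the frequency table can also give the most common and the average appointment length. The sketch below copies three entries from the output above so that it runs on its own; the results therefore describe only this subset, not the full dataset.

```python
import datetime as dt

# Three entries copied from the printed frequency table.
appointment_lengths = {
    dt.timedelta(seconds=52140): 1213,
    dt.timedelta(seconds=37740): 5897,
    dt.timedelta(seconds=21540): 8173,
}

# The length that occurs most often.
most_common_length = max(appointment_lengths, key=appointment_lengths.get)
print(most_common_length)

# Weighted mean: total time of all visits divided by the number of visits.
total_visits = sum(appointment_lengths.values())
total_time = sum(
    (length * count for length, count in appointment_lengths.items()),
    dt.timedelta(),  # start value, so that sum() adds timedeltas rather than integers
)
mean_length = total_time / total_visits
print(mean_length)
```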

## **CONCLUSION**

- So far, we have learnt how to carry out complex date and time operations on the records of appointments with the potus.


- Now, you should go and practice further to test the skills you have acquired.