---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Pandas</h1>

## _14-Handling DateTime.ipynb_

## Learning agenda of this notebook
The importance of date and time features in data science can be seen in domains such as sales, marketing, finance, HR, e-commerce and many more. To answer questions like the following, one needs a clear understanding of how to handle datetime in Python:
- How will the stock markets behave tomorrow?
- How many products will be sold in the upcoming week?
- When is the best time to launch a new product?
- How long does it take to fill a position at the company?

1. Recap of Python Modules related to date and time
    - Python Time module
    - Python Datetime module
2. Overview of DateTime Series in Pandas
    - Parsing a string column containing datetime manually
    - Change a string column to Datetime while Reading the CSV File
    - Change a string column to Datetime in Dataframe using `pd.to_datetime()` method
3. Dealing with DateTime Series in Pandas

## 1. Recap of Python Modules Related to Date and Time

## a. Python Time Module
- The Python `time` module is mainly used for working with UNIX timestamps. A timestamp is expressed as a floating-point number of seconds since the UNIX epoch (00:00:00 UTC on 1 January 1970).

**Use `dir()` to get the list of methods in the Python `time` module**

In [1]:
import time
print(dir(time))

['_STRUCT_TM_ITEMS', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'altzone', 'asctime', 'ctime', 'daylight', 'get_clock_info', 'gmtime', 'localtime', 'mktime', 'monotonic', 'monotonic_ns', 'perf_counter', 'perf_counter_ns', 'process_time', 'process_time_ns', 'sleep', 'strftime', 'strptime', 'struct_time', 'time', 'time_ns', 'timezone', 'tzname', 'tzset']


**The `time.time()` method returns the current time in seconds since UNIX Epoch (00:00:00 UTC on 1 January 1970)**

In [2]:
seconds = time.time()
seconds

1638686542.7508328

**Getting the number of seconds elapsed since the UNIX epoch from the command line**

In [3]:
!date +%s

1638686542
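Here is a minimal sketch that shows the epoch itself, independent of the machine's time zone. It uses `time.gmtime()`, which converts seconds since the epoch into a `time.struct_time` in UTC. By contrast, `time.localtime()` uses the local zone. The sketch also shows `time.strftime()` and `time.mktime()` from the `dir(time)` list above. Everything here is standard library; the variable names are only illustrative.

```python
import time

# gmtime() interprets its argument as seconds since the epoch and returns UTC
epoch = time.gmtime(0)
print(epoch.tm_year, epoch.tm_mon, epoch.tm_mday, epoch.tm_hour)  # 1970 1 1 0

# strftime() formats a struct_time into a readable string
print(time.strftime("%Y-%m-%d %H:%M:%S", epoch))  # 1970-01-01 00:00:00

# mktime() is the inverse of localtime(): local struct_time -> seconds
now = time.time()
roundtrip = time.mktime(time.localtime(now))
print(now - roundtrip < 1)  # True: only the fractional seconds are lost
```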


**The `time.ctime(seconds)` function takes seconds passed since epoch as argument and returns a string representing local time**

In [4]:
# The system's local time zone here is PKT (UTC+05:00), so the UNIX epoch is shown as 05:00,
# five hours ahead of midnight UTC (Coordinated Universal Time, the successor to Greenwich Mean Time) on 1 January 1970
dtg1 = time.ctime(0)
dtg1

'Thu Jan  1 05:00:00 1970'

In [5]:
seconds = time.time()
dtg2 = time.ctime(seconds)
dtg2

'Sun Dec  5 11:42:22 2021'

In [6]:
# Get the current date and time using a shell command
!date

Sun Dec  5 11:42:22 PKT 2021


## b. Python Datetime Module
The `datetime` module supports many of the same operations as the `time` module. However, it provides a more object-oriented set of types (`date`, `time`, `datetime`, `timedelta`) and offers limited support for time zones.
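To illustrate the limited time zone support, here is a minimal sketch. It builds an aware `datetime` for the epoch in UTC and converts it to a fixed UTC+05:00 offset, the offset of the PKT clock used in the outputs above. A fixed offset is an assumption made for simplicity. Named zones with daylight-saving rules would need something like the standard-library `zoneinfo` module (Python 3.9+), which is not shown here.

```python
import datetime

# fromtimestamp() with tz=timezone.utc returns an *aware* datetime in UTC
epoch_utc = datetime.datetime.fromtimestamp(0, tz=datetime.timezone.utc)
print(epoch_utc)  # 1970-01-01 00:00:00+00:00

# A fixed-offset zone: UTC+05:00
pkt = datetime.timezone(datetime.timedelta(hours=5))
epoch_pkt = epoch_utc.astimezone(pkt)
print(epoch_pkt)  # 1970-01-01 05:00:00+05:00

# Aware datetimes compare by the instant they represent, not the wall clock
print(epoch_utc == epoch_pkt)  # True
```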

In [7]:
# Use dir() to list the names (classes and constants) defined in the datetime module
import datetime
print(dir(datetime))

['MAXYEAR', 'MINYEAR', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'date', 'datetime', 'datetime_CAPI', 'sys', 'time', 'timedelta', 'timezone', 'tzinfo']


**(i) The `datetime.date(year, month, day)` constructor creates a date object for any given date**

In [8]:
import datetime
d1 = datetime.date(2021,10,9)
print(d1)
print(type(d1))

2021-10-09
<class 'datetime.date'>


**(ii) The `datetime.datetime(year, month, day[, hour[, minute[, second[, microsecond[, tzinfo]]]]])` constructor creates a date along with a time of day**

In [9]:
dtg = datetime.datetime(2021,12,31)
print(dtg)
print(type(dtg))

2021-12-31 00:00:00
<class 'datetime.datetime'>


In [10]:
print(datetime.datetime(2021, 12, 31, 4, 30, 54, 678))

2021-12-31 04:30:54.000678


**(iii) The `datetime.datetime.today()` and `datetime.datetime.now()` methods are used to fetch the current date and time**

In [11]:
print(datetime.datetime.today())

2021-12-05 11:42:23.048861


In [12]:
print(datetime.datetime.now())

2021-12-05 11:42:23.055799


**(iv) The `datetime.time([hour[, minute[, second[, microsecond[, tzinfo]]]]])` constructor returns a time object. All arguments are optional; the numeric ones default to 0 and `tzinfo` defaults to `None`**

In [13]:
t1 = datetime.time(10, 15)
print(t1)
print(type(t1))

10:15:00
<class 'datetime.time'>
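A `date` and a `time` object can be merged into a single `datetime`, and a `datetime` can be split back into its parts. The following is a small sketch using the standard-library `datetime.datetime.combine()` together with the `date()` and `time()` methods, reusing the values from the cells above.

```python
import datetime

d1 = datetime.date(2021, 10, 9)
t1 = datetime.time(10, 15)

# combine() merges a date object and a time object into one datetime
dtg = datetime.datetime.combine(d1, t1)
print(dtg)  # 2021-10-09 10:15:00

# date() and time() split a datetime back into its two parts
print(dtg.date(), dtg.time())  # 2021-10-09 10:15:00
```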


**(v) Let us explore some commonly used attributes of a `<class 'datetime.datetime'>` object.**
- `dtg.year:` returns the year
- `dtg.month:` returns the month
- `dtg.day:` returns the day of the month
- `dtg.hour:` returns the hour
- `dtg.minute:` returns the minutes
- `dtg.second:` returns the seconds

In [14]:
dtg = datetime.datetime(2021, 12, 31, 4, 30, 54, 678)
print(dtg)
print(type(dtg))
print(dtg.month)

2021-12-31 04:30:54.000678
<class 'datetime.datetime'>
12
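The sketch below prints every component attribute of the same `dtg` value. It also shows a common pitfall: `dtg.min` is not the minute. It is a class attribute holding the earliest representable `datetime`.

```python
import datetime

dtg = datetime.datetime(2021, 12, 31, 4, 30, 54, 678)

# Component attributes of a datetime object
print(dtg.year, dtg.month, dtg.day)      # 2021 12 31
print(dtg.hour, dtg.minute, dtg.second)  # 4 30 54
print(dtg.microsecond)                   # 678

# Pitfall: .min is the earliest representable datetime, not the minute
print(dtg.min)                           # 0001-01-01 00:00:00
```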


## 2. Overview of DateTime Series in Pandas

In [15]:
import pandas as pd
df = pd.read_csv("datasets/uforeports.csv")
df

|  | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 0 | Ithaca |  | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro |  | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke |  | OVAL | CO | 2/15/1931 14:00 |
| 3 | Abilene |  | DISK | KS | 6/1/1931 13:00 |
| 4 | New York Worlds Fair |  | LIGHT | NY | 4/18/1933 19:00 |
| ... | ... | ... | ... | ... | ... |
| 18236 | Grant Park |  | TRIANGLE | IL | 12/31/2000 23:00 |
| 18237 | Spirit Lake |  | DISK | IA | 12/31/2000 23:00 |
| 18238 | Eagle River |  |  | WI | 12/31/2000 23:45 |
| 18239 | Eagle River | RED | LIGHT | WI | 12/31/2000 23:45 |


In [16]:
df.dtypes

City               object
Colors Reported    object
Shape Reported     object
State              object
Time               object
dtype: object

**You can observe that the datatype of the `Time` column is `object`, i.e., each value is stored as a string in the format `month/day/year hour:minute`**

### a. Extract Information from a Datetime Column Stored as String
- Let us try to slice substrings from each element of the `Time` Series using the `df.Time.str.slice(start=None, stop=None, step=None)` method. Assuming each value ends in a two-digit `hh:mm`, `slice(-5, -3)` picks out the two hour characters

In [17]:
df.Time.str.slice(-5,-3)

0        22
1        20
2        14
3        13
4        19
         ..
18236    23
18237    23
18238    23
18239    23
18240    23
Name: Time, Length: 18241, dtype: object

In [18]:
df.Time.str.slice(-5,-3).astype(int)

0        22
1        20
2        14
3        13
4        19
         ..
18236    23
18237    23
18238    23
18239    23
18240    23
Name: Time, Length: 18241, dtype: int64

- **So we got the hours. However, this approach is cumbersome and error-prone for extracting all the datetime components from the entire column**
- **Let us change the datatype of the `Time` column to Pandas datetime instead**
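To make the contrast concrete, here is a minimal sketch on two hypothetical sample strings written in the same `month/day/year hour:minute` layout as the `Time` column. String slicing only works because the hour happens to sit at positions `-5:-3`. Once the strings are parsed with `pd.to_datetime()`, any component can be read through `.dt`. The explicit `format` string is an assumption matching this layout.

```python
import pandas as pd

# Sample values in the same layout as the Time column
times = pd.Series(["6/1/1930 22:00", "6/30/1930 20:00"])

# String approach: depends on the hour always being at positions -5:-3
hours_from_strings = times.str.slice(-5, -3).astype(int)

# Datetime approach: parse once, then read any component via .dt
parsed = pd.to_datetime(times, format="%m/%d/%Y %H:%M")
print(parsed.dt.hour.tolist())   # [22, 20]
print(parsed.dt.month.tolist())  # [6, 6]
print(hours_from_strings.tolist() == parsed.dt.hour.tolist())  # True
```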

### b. Change Datatype of `Time` Column While Reading the CSV File
- The `pd.read_csv()` function provides a `parse_dates` parameter, which converts the specified columns to the `datetime64` type
- Let us see this in practice

In [19]:
import pandas as pd
df = pd.read_csv("datasets/uforeports.csv", parse_dates=['Time'])
df

|  | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 0 | Ithaca |  | TRIANGLE | NY | 1930-06-01 22:00:00 |
| 1 | Willingboro |  | OTHER | NJ | 1930-06-30 20:00:00 |
| 2 | Holyoke |  | OVAL | CO | 1931-02-15 14:00:00 |
| 3 | Abilene |  | DISK | KS | 1931-06-01 13:00:00 |
| 4 | New York Worlds Fair |  | LIGHT | NY | 1933-04-18 19:00:00 |
| ... | ... | ... | ... | ... | ... |
| 18236 | Grant Park |  | TRIANGLE | IL | 2000-12-31 23:00:00 |
| 18237 | Spirit Lake |  | DISK | IA | 2000-12-31 23:00:00 |
| 18238 | Eagle River |  |  | WI | 2000-12-31 23:45:00 |
| 18239 | Eagle River | RED | LIGHT | WI | 2000-12-31 23:45:00 |


In [20]:
df.dtypes

City                       object
Colors Reported            object
Shape Reported             object
State                      object
Time               datetime64[ns]
dtype: object

In [21]:
type(df.Time[0])

pandas._libs.tslibs.timestamps.Timestamp
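If `datasets/uforeports.csv` is not at hand, the following self-contained sketch reproduces the `parse_dates` behaviour on a tiny in-memory CSV read through `io.StringIO`. The CSV text is a hypothetical stand-in built from the first two rows shown above. The sketch also shows that a pandas `Timestamp` is a subclass of the standard-library `datetime.datetime`, so code expecting a `datetime` accepts it. The printed dtype resolution (`ns`) is what pandas 2.x reports and may differ in other versions.

```python
import datetime
import io

import pandas as pd

# Tiny in-memory CSV standing in for datasets/uforeports.csv
csv_text = """City,Shape Reported,State,Time
Ithaca,TRIANGLE,NY,6/1/1930 22:00
Willingboro,OTHER,NJ,6/30/1930 20:00
"""

df_small = pd.read_csv(io.StringIO(csv_text), parse_dates=["Time"])
print(df_small.dtypes["Time"])  # datetime64[ns] in pandas 2.x

# Each element is a Timestamp, which subclasses datetime.datetime
first = df_small.Time[0]
print(isinstance(first, datetime.datetime))  # True
print(first.to_pydatetime())                 # 1930-06-01 22:00:00
```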

### c. Change Datatype of `Time` Column in Dataframe using `pd.to_datetime()` method
- If the CSV file has already been read into a DataFrame, we can convert the string `Time` column to Pandas datetime with the `pd.to_datetime()` function, which converts its argument to datetime:
```
pd.to_datetime(arg)
```
where `arg` can be an int, float, str, datetime, list, tuple, 1-d array, Series, or DataFrame/dict-like object to convert to a datetime.
You can check the other arguments using the `help()` function, e.g., `help(pd.to_datetime)`.
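The sketch below shows how the return type follows the kind of `arg` passed: a scalar gives a `Timestamp`, a list gives a `DatetimeIndex`, and a Series gives a Series. A number is read as an offset from the UNIX epoch in the given `unit`. The value 1638686542 is the `date +%s` output from earlier. The result is a naive timestamp in UTC, which is why it shows 06:42:22 rather than the 11:42:22 PKT clock time printed above.

```python
import pandas as pd

# A scalar string -> Timestamp
print(pd.to_datetime("2021-02-19"))         # 2021-02-19 00:00:00

# A number -> offset from the UNIX epoch; unit="s" means seconds
print(pd.to_datetime(1638686542, unit="s"))  # 2021-12-05 06:42:22

# A list -> DatetimeIndex; a Series -> Series with a datetime64 dtype
idx = pd.to_datetime(["2021-01-05", "2021-01-06"])
ser = pd.to_datetime(pd.Series(["2021-01-05", "2021-01-06"]))
print(type(idx).__name__, type(ser).__name__)  # DatetimeIndex Series
```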

In [22]:
pd.to_datetime('Feb 19, 2021')

Timestamp('2021-02-19 00:00:00')

Note that the resulting Timestamp is displayed in `year-month-day` order (ISO 8601), regardless of the input format

In [23]:
pd.to_datetime('19/02/2021')

Timestamp('2021-02-19 00:00:00')

In [24]:
pd.to_datetime('19-02-2021')

Timestamp('2021-02-19 00:00:00')

In [25]:
# to_datetime() function will convert all these different formats into a common format
dates = ['2021-01-05', 'Jan 5, 2021', '01/05/2021', '2021.01.05', '2021/01/05','20210105']
pd.to_datetime(dates)

DatetimeIndex(['2021-01-05', '2021-01-05', '2021-01-05', '2021-01-05',
               '2021-01-05', '2021-01-05'],
              dtype='datetime64[ns]', freq=None)

All six strings were parsed as 5 January 2021. Now let us write 6 March 2021 with the day first, as `'06/03/2021'`

In [26]:
pd.to_datetime('06/03/2021')

Timestamp('2021-06-03 00:00:00')

**Oops! It has interpreted it as 3 June 2021**
By default, `pd.to_datetime()` reads an ambiguous date such as `'06/03/2021'` with the month first (`month/day/year`). So, for the time being, use the `month/day/year` format, or tell pandas otherwise through the `dayfirst` or `format` parameters. In our dataset, the `Time` column already stores its dates as strings in `month/day/year` order (e.g., `6/1/1930 22:00`).
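Here is a short sketch of the two standard ways to resolve the ambiguity. `dayfirst=True` asks the parser to prefer day-first order. According to the pandas documentation it is not strict, so unparseable combinations may still fall back. An explicit `format` string removes the ambiguity entirely.

```python
import pandas as pd

# Default: the ambiguous string is read month-first
print(pd.to_datetime("06/03/2021"))                     # 2021-06-03 00:00:00

# dayfirst=True prefers day-first order
print(pd.to_datetime("06/03/2021", dayfirst=True))      # 2021-03-06 00:00:00

# An explicit format is strict and unambiguous
print(pd.to_datetime("06/03/2021", format="%d/%m/%Y"))  # 2021-03-06 00:00:00
```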

In [27]:
# You can also pass a custom format string through the format parameter; the result is a Timestamp, as before
pd.to_datetime('2021$01$05', format='%Y$%m$%d')

Timestamp('2021-01-05 00:00:00')
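Real-world columns often contain values that do not match the expected format. By default, `pd.to_datetime()` raises an error on such values. With `errors="coerce"`, unparseable entries become `NaT` ("not a time") instead. The sample strings below are hypothetical.

```python
import pandas as pd

# One valid date, one impossible date, and one non-date string
raw = pd.Series(["2021-01-05", "2021-13-45", "not a date"])

# errors="coerce" turns values that cannot be parsed into NaT
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
print(parsed.tolist())
print(parsed.isna().sum())  # 2
```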

In [28]:
import pandas as pd
df = pd.read_csv("datasets/uforeports.csv")
df.head()

|  | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 0 | Ithaca |  | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro |  | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke |  | OVAL | CO | 2/15/1931 14:00 |
| 3 | Abilene |  | DISK | KS | 6/1/1931 13:00 |
| 4 | New York Worlds Fair |  | LIGHT | NY | 4/18/1933 19:00 |


In [29]:
df.dtypes

City               object
Colors Reported    object
Shape Reported     object
State              object
Time               object
dtype: object

In [30]:
df['Time'] = pd.to_datetime(df.Time)

In [31]:
df.head()

|  | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 0 | Ithaca |  | TRIANGLE | NY | 1930-06-01 22:00:00 |
| 1 | Willingboro |  | OTHER | NJ | 1930-06-30 20:00:00 |
| 2 | Holyoke |  | OVAL | CO | 1931-02-15 14:00:00 |
| 3 | Abilene |  | DISK | KS | 1931-06-01 13:00:00 |
| 4 | New York Worlds Fair |  | LIGHT | NY | 1933-04-18 19:00:00 |


In [32]:
df.dtypes

City                       object
Colors Reported            object
Shape Reported             object
State                      object
Time               datetime64[ns]
dtype: object

**You can explore different attributes/methods of a datetime Series through its `.dt` accessor**

`Series.dt.year`: Returns the year of each datetime value

`Series.dt.month`: Returns the month (1-12)

`Series.dt.day`: Returns the day of the month

`Series.dt.hour`: Returns the hour (0-23)

`Series.dt.minute`: Returns the minute (0-59)

`Series.dt.dayofweek`: Returns the day of the week as 0-6, where Monday is 0 and Sunday is 6

`Series.dt.day_name()`: Returns the name of the day as a string

`Series.dt.month_name()`: Returns the name of the month as a string

For details, read: https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.year.html
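A quick check of these accessors on the first two timestamps of the dataset. The output above lists 1 June 1930 as a Sunday and 30 June 1930 as a Monday, which matches `dayofweek` counting from Monday = 0.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["1930-06-01 22:00", "1930-06-30 20:00"]))

print(s.dt.year.tolist(), s.dt.month.tolist(), s.dt.day.tolist())  # [1930, 1930] [6, 6] [1, 30]
print(s.dt.hour.tolist(), s.dt.minute.tolist())                    # [22, 20] [0, 0]

# dayofweek counts from Monday = 0 to Sunday = 6
print(s.dt.dayofweek.tolist())     # [6, 0]
print(s.dt.day_name().tolist())    # ['Sunday', 'Monday']
print(s.dt.month_name().tolist())  # ['June', 'June']
```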

In [33]:
df.Time.dt.hour

0        22
1        20
2        14
3        13
4        19
         ..
18236    23
18237    23
18238    23
18239    23
18240    23
Name: Time, Length: 18241, dtype: int64

In [34]:
df.Time.dt.day_name()

0         Sunday
1         Monday
2         Sunday
3         Monday
4        Tuesday
          ...   
18236     Sunday
18237     Sunday
18238     Sunday
18239     Sunday
18240     Sunday
Name: Time, Length: 18241, dtype: object

In [35]:
df.Time.dt.month_name()

0            June
1            June
2        February
3            June
4           April
           ...   
18236    December
18237    December
18238    December
18239    December
18240    December
Name: Time, Length: 18241, dtype: object

## 3. Dealing with DateTime Series in Pandas

In [37]:
import pandas as pd
df = pd.read_csv("datasets/uforeports.csv")
df['Time'] = pd.to_datetime(df.Time)
df

|  | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 0 | Ithaca |  | TRIANGLE | NY | 1930-06-01 22:00:00 |
| 1 | Willingboro |  | OTHER | NJ | 1930-06-30 20:00:00 |
| 2 | Holyoke |  | OVAL | CO | 1931-02-15 14:00:00 |
| 3 | Abilene |  | DISK | KS | 1931-06-01 13:00:00 |
| 4 | New York Worlds Fair |  | LIGHT | NY | 1933-04-18 19:00:00 |
| ... | ... | ... | ... | ... | ... |
| 18236 | Grant Park |  | TRIANGLE | IL | 2000-12-31 23:00:00 |
| 18237 | Spirit Lake |  | DISK | IA | 2000-12-31 23:00:00 |
| 18238 | Eagle River |  |  | WI | 2000-12-31 23:45:00 |
| 18239 | Eagle River | RED | LIGHT | WI | 2000-12-31 23:45:00 |


**Suppose I want to display only those UFO sightings that were reported on or after 24 March 1995**

In [38]:
# Create a datetime object to be used for comparison
ts = pd.to_datetime('1995/03/24')
df.loc[df.Time >= ts, :]

|  | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 7948 | North Dade |  |  | FL | 1995-03-24 01:27:00 |
| 7949 | Las Vegas |  |  | NV | 1995-03-24 05:00:00 |
| 7950 | Las Vegas |  |  | NV | 1995-03-24 05:00:00 |
| 7951 | Grover Beach |  |  | CA | 1995-03-25 00:00:00 |
| 7952 | Monterey |  |  | CA | 1995-03-25 00:00:00 |
| ... | ... | ... | ... | ... | ... |
| 18236 | Grant Park |  | TRIANGLE | IL | 2000-12-31 23:00:00 |
| 18237 | Spirit Lake |  | DISK | IA | 2000-12-31 23:00:00 |
| 18238 | Eagle River |  |  | WI | 2000-12-31 23:45:00 |
| 18239 | Eagle River | RED | LIGHT | WI | 2000-12-31 23:45:00 |


**Suppose I want to display only those UFO sightings that were reported between 1 March 1995 and 6 March 1995 (inclusive)**

In [39]:
# Create two Timestamp objects that bound the date range
ts1 = pd.to_datetime('1995/03/1')
ts2 = pd.to_datetime('1995/03/7')
df.loc[(df.Time >= ts1) & (df.Time <= ts2), :]

|  | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 7860 | Greenville |  | LIGHT | IL | 1995-03-01 21:00:00 |
| 7861 | Sedalia |  |  | MO | 1995-03-01 21:00:00 |
| 7862 | Redmond | RED |  | WA | 1995-03-02 22:30:00 |
| 7863 | Prescott Valley |  | OVAL | AZ | 1995-03-04 00:00:00 |
| 7864 | Folsom |  |  | NJ | 1995-03-04 16:32:00 |
| 7865 | Anaheim |  | OTHER | CA | 1995-03-05 12:00:00 |
| 7866 | Columbus |  |  | OH | 1995-03-06 00:55:00 |
| 7867 | Hilltop |  |  | NJ | 1995-03-06 19:00:00 |
| 7868 | Florence |  |  | OR | 1995-03-06 19:10:00 |
| 7869 | Mountain City |  |  | TN | 1995-03-06 19:45:00 |
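The same kind of filter can be written with `Series.between()`, which is inclusive on both ends by default, like `>= ... <=`. One subtlety: an upper bound of midnight on 7 March combined with `<=` would also keep a record stamped exactly `1995-03-07 00:00:00`, if one existed. A half-open range with `<` avoids that. The small DataFrame below is hypothetical, built only to show the edge case.

```python
import pandas as pd

# Hypothetical mini-frame with timestamps around early March 1995
df_demo = pd.DataFrame({
    "City": ["A", "B", "C", "D"],
    "Time": pd.to_datetime(["1995-02-28 23:00", "1995-03-01 21:00",
                            "1995-03-06 19:45", "1995-03-07 00:00"]),
})

# between() includes both bounds, so midnight on 7 March is kept
mask = df_demo.Time.between(pd.Timestamp("1995-03-01"), pd.Timestamp("1995-03-07"))
print(df_demo.loc[mask, "City"].tolist())       # ['B', 'C', 'D']

# Half-open range: on or after 1 March and strictly before 7 March
half_open = (df_demo.Time >= "1995-03-01") & (df_demo.Time < "1995-03-07")
print(df_demo.loc[half_open, "City"].tolist())  # ['B', 'C']
```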


**Suppose I want to display the record with the latest timestamp in the `Time` column**

In [40]:
ts = df.Time.max()
ts

Timestamp('2000-12-31 23:59:00')

In [41]:
df.loc[df.Time == ts]

|  | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 18240 | Ybor |  | OVAL | FL | 2000-12-31 23:59:00 |


**Suppose I want to display the oldest record as per the `Time` column**

In [42]:
ts = df.Time.min()
ts

Timestamp('1930-06-01 22:00:00')

In [43]:
df.loc[df.Time == ts]

|  | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 0 | Ithaca |  | TRIANGLE | NY | 1930-06-01 22:00:00 |
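An alternative to computing `max()`/`min()` and then filtering is `idxmax()`/`idxmin()`, which return the index label of the latest or earliest timestamp directly. The design trade-off: these return only the first label when several rows tie, whereas the `df.Time == ts` filter above returns every tied row. The mini-frame below reuses three rows from the dataset and is only illustrative.

```python
import pandas as pd

# Three rows taken from the outputs above
df_demo = pd.DataFrame({
    "City": ["Ithaca", "Ybor", "Holyoke"],
    "Time": pd.to_datetime(["1930-06-01 22:00", "2000-12-31 23:59", "1931-02-15 14:00"]),
})

# Index label of the latest / earliest timestamp
print(df_demo.loc[df_demo.Time.idxmax(), "City"])  # Ybor
print(df_demo.loc[df_demo.Time.idxmin(), "City"])  # Ithaca

# sort_values() gives the full chronological order instead
print(df_demo.sort_values("Time").City.tolist())   # ['Ithaca', 'Holyoke', 'Ybor']
```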


**Suppose I want to compute the time span between the oldest and the newest records as per the `Time` column**

In [44]:
td = df.Time.max() - df.Time.min()
print(td)
print(type(td))

25781 days 01:59:00
<class 'pandas._libs.tslibs.timedeltas.Timedelta'>
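A `Timedelta` can be broken down further. The sketch rebuilds the same span from the two extreme timestamps above. `.days` holds the whole days and `.seconds` holds the remaining seconds (here 01:59:00, i.e. 7140 s). `total_seconds()` gives the full duration. Dividing by another `Timedelta` expresses the span in a chosen unit; the 365.25-day "average year" used here is an assumption for illustration.

```python
import pandas as pd

td = pd.Timestamp("2000-12-31 23:59:00") - pd.Timestamp("1930-06-01 22:00:00")
print(td)                    # 25781 days 01:59:00
print(td.days, td.seconds)   # 25781 7140
print(td.total_seconds())    # whole span in seconds, as a float

# Express the span in (average) years by dividing two Timedeltas
print(td / pd.Timedelta(days=365.25))
```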


# Summary:
Pandas captures four general time-related concepts:
- **Date times:** A specific date and time with time zone support. Similar to `datetime.datetime` from the standard library. To create this object, we can use the `to_datetime()` or `date_range()` functions
- **Time deltas:** An absolute time duration. Similar to `datetime.timedelta` from the standard library. To create this object, we can use the `to_timedelta()` or `timedelta_range()` functions
- **Time spans:** A span of time defined by a point in time and its associated frequency. To create this object, we can use `Period()` or `period_range()`
- **Date offsets:** A relative time duration that respects calendar arithmetic. To create this object, we can use `DateOffset()`
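A compact sketch of the four concepts side by side. Adding 31 days with a `Timedelta` is absolute arithmetic, while adding one month with a `DateOffset` is calendar arithmetic: from 31 January it lands on the last day of February instead of spilling into March. A `Period` covers a whole interval. The specific dates are arbitrary examples.

```python
import pandas as pd

# Date time: a single point in time
ts = pd.Timestamp("2021-01-31")

# Time delta: an absolute duration of exactly 31 days
print(ts + pd.Timedelta(days=31))    # 2021-03-03 00:00:00

# Date offset: calendar-aware "one month later", clipped to the month end
print(ts + pd.DateOffset(months=1))  # 2021-02-28 00:00:00

# Time span: a monthly Period covering all of February 2021
p = pd.Period("2021-02", freq="M")
print(p.start_time, p.end_time.date())  # 2021-02-01 00:00:00 2021-02-28
```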

**Read the documentation for details:**
https://pandas.pydata.org/docs/user_guide/timeseries.html#overview