---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 3.17 (Pandas-09)</h1>

## _Handling DateTime.ipynb_

## Learning agenda of this notebook
Date and time features matter in many data science domains, including sales, marketing, finance, HR, and e-commerce. Answering questions such as the following requires a clear understanding of how to handle datetime data in Python:
- How will the stock markets behave tomorrow?
- How many products will be sold in the upcoming week?
- When is the best time to launch a new product?
- How long before a position at the company gets filled?

1. Recap of Python Modules related to date and time
    - Python Time module
    - Python Datetime module
2. Overview of DateTime Series in Pandas
    - Parsing a string column containing datetime manually
    - Change a string column to Datetime while Reading the CSV File
    - Change a string column to Datetime in Dataframe using `pd.to_datetime()` method
3. Dealing with DateTime Series in Pandas

## 1. Recap of Python Modules Related to Date and Time

## a. Python Time Module
- The Python `time` module is principally for working with UNIX timestamps, which are expressed as a floating-point number of seconds since the UNIX epoch (00:00:00 UTC on 1 January 1970)

**Use `dir()` to get the list of methods in the Python `time` module**

In [None]:
import time
print(dir(time))

**The `time.time()` method returns the current time in seconds since UNIX Epoch (00:00:00 UTC on 1 January 1970)**

In [None]:
seconds = time.time()
seconds

**Getting number of seconds elapsed since UNIX epoch from the command line**

In [None]:
!date +%s

**The `time.ctime(seconds)` function takes seconds passed since epoch as argument and returns a string representing local time**

In [None]:
# time.ctime(0) shows the UNIX epoch in the system's local time; on this machine that is
# five hours ahead of midnight, 1 January 1970 UTC
# (Coordinated Universal Time, the successor to Greenwich Mean Time)
dtg1 = time.ctime(0)
dtg1

In [None]:
seconds = time.time()
dtg2 = time.ctime(seconds)
dtg2

In [None]:
#Get time using shell command
!date
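
Because `time.ctime()` reports local time, its output differs from machine to machine. As a minimal sketch, the standard-library functions `time.gmtime()` and `time.strftime()` can show the same epoch value in UTC. The variable names `epoch_utc` and `epoch_text` are only illustrative.

```python
import time

# time.gmtime() converts epoch seconds to a struct_time in UTC,
# so the result does not depend on the machine's time zone.
epoch_utc = time.gmtime(0)
epoch_text = time.strftime('%Y-%m-%d %H:%M:%S', epoch_utc)
print(epoch_text)

# time.localtime() does the same conversion in the system's local time zone.
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(0)))
```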

## b. Python Datetime Module
The `datetime` module supports many of the same operations as the `time` module, but provides a more object-oriented set of types and has some limited support for time zones.

In [None]:
# use dir() to get the list of complete functions in datetime module
import datetime
print(dir(datetime))

**(i) The `datetime.date(year, month, day)` constructor is used to create a date object for any given date**

In [None]:
import datetime
d1 = datetime.date(2021,10,9)
print(d1)
print(type(d1))

**(ii) The `datetime.datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])` constructor is used to create an object for any given date along with a time**

In [None]:
dtg = datetime.datetime(2021,12,31)
print(dtg)
print(type(dtg))

In [None]:
print(datetime.datetime(2021, 12, 31, 4, 30, 54, 678))

**(iii) The `datetime.datetime.today()` and `datetime.datetime.now()` methods are used to fetch the current date and time**

In [None]:
print(datetime.datetime.today())

In [None]:
print(datetime.datetime.now())

**(iv) The `datetime.time([hour[, minute[, second[, microsecond[, tzinfo]]]]])` constructor returns a time object. All arguments are optional**

In [None]:
t1 = datetime.time(10, 15)
print(t1)
print(type(t1))
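
A `date` object and a `time` object can also be merged into a single `datetime` object. The sketch below uses `datetime.datetime.combine()` from the standard library; the variable name `combined` is only illustrative.

```python
import datetime

d1 = datetime.date(2021, 10, 9)   # a date without a time
t1 = datetime.time(10, 15)        # a time without a date

# combine() builds one datetime from the two parts
combined = datetime.datetime.combine(d1, t1)
print(combined)
print(type(combined))
```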

**(v) Let us explore some commonly used attributes of a `<class 'datetime.datetime'>` object such as `dtg`.**
- `dtg.year:` returns the year
- `dtg.month:` returns the month
- `dtg.day:` returns the day of the month
- `dtg.hour:` returns the hour
- `dtg.minute:` returns the minutes (note that `dtg.min` is something else: the earliest representable datetime)

In [None]:
dtg = datetime.datetime(2021, 12, 31, 4, 30, 54, 678)
print(dtg)
print(type(dtg))
print(dtg.month)
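
As a short sketch, the remaining components can be read in the same way. Note that the minutes are held in `minute`, while `min` is a class-level constant for the earliest representable datetime. The `strftime()` call and the name `formatted` are additions for illustration.

```python
import datetime

dtg = datetime.datetime(2021, 12, 31, 4, 30, 54, 678)

# Date components
print(dtg.year, dtg.month, dtg.day)
# Time components (the last positional argument above is microseconds)
print(dtg.hour, dtg.minute, dtg.second, dtg.microsecond)

# `min` is not the minutes: it is the smallest datetime Python can represent
print(datetime.datetime.min)

# strftime() turns the object back into a string in a chosen layout
formatted = dtg.strftime('%Y-%m-%d %H:%M')
print(formatted)
```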

## 2. Overview of DateTime Series in Pandas

In [None]:
import pandas as pd
df = pd.read_csv("datasets/uforeports.csv")
df

In [None]:
df.dtypes

**Note that the datatype of the `Time` column is object, i.e., each value is stored as a string in the form `month/day/year hr:min`**

### a. Extract Information from a Datetime Column Stored as String
- Let us slice substrings from each element of the `Time` Series using the `df.Time.str.slice(start=None, stop=None, step=None)` method

In [None]:
df.Time.str.slice(-5,-3)

In [None]:
df.Time.str.slice(-5,-3).astype(int)

- **So we got the hours. However, this is a cumbersome and error-prone way to extract all the datetime components from the entire column**
- **Let us change the datatype of the `Time` column to Pandas datetime**
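
To see why slicing is fragile, here is a minimal sketch on a small hand-made Series (the two values are made up, in the same `month/day/year hr:min` layout). Fixed positions pick up a leading space when the hour has one digit, so a split-based approach is shown for comparison.

```python
import pandas as pd

# Hypothetical sample values, not taken from the dataset
times = pd.Series(['6/1/1930 22:00', '12/31/2000 9:05'])

# Fixed positions: works for '22', but yields ' 9' for a one-digit hour
sliced = times.str.slice(-5, -3)
print(sliced.tolist())

# Splitting on the separators does not depend on the string length
hours = times.str.split(' ').str[1].str.split(':').str[0].astype(int)
print(hours.tolist())
```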

### b. Change Datatype of `Time` Column While Reading the CSV File
- The `pd.read_csv()` function provides a `parse_dates` parameter, which converts the specified columns to the `datetime64` type
- Let us see this in practice

In [None]:
import pandas as pd
df = pd.read_csv("datasets/uforeports.csv", parse_dates=['Time'])
df

In [None]:
df.dtypes

In [None]:
type(df.Time[0])
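
The same idea can be tried without the dataset file. In this sketch the CSV text is a small made-up sample held in memory with `io.StringIO`; the column names `City` and `Time` and the three rows are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical CSV content in the month/day/year hr:min layout
csv_text = """City,Time
Ithaca,6/1/1930 22:00
Willingboro,6/30/1930 20:00
Holyoke,2/15/1931 14:00
"""

# parse_dates converts the listed columns while the file is being read
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Time'])
print(sample.dtypes)
print(type(sample.Time[0]))
```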

### c. Change Datatype of `Time` Column in Dataframe using `pd.to_datetime()` method
- If we have already read the CSV file into a dataframe, we can change the `Time` column containing string data to Pandas datetime using the `pd.to_datetime()` method, which converts its argument to datetime
```
pd.to_datetime(arg)
```
Where arg can be int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like object to convert to a datetime.
You can check for other arguments using help() function

In [None]:
pd.to_datetime('Feb 19, 2021')

Note that the result is displayed in `year-month-day` format

In [None]:
pd.to_datetime('19/02/2021')

In [None]:
pd.to_datetime('19-02-2021')

In [None]:
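# to_datetime() converts all of these different formats into one common format.
# pandas 2.0 and later infer a single format from the first element, so a list that
# mixes formats needs format='mixed'; on older versions, drop that argument.
dates = ['2021-01-05', 'Jan 5, 2021', '01/05/2021', '2021.01.05', '2021/01/05','20210105']
pd.to_datetime(dates, format='mixed')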

This seems to work fine (the output always has the month in the middle). Let us try storing 6 March 2021 as '06/03/2021'

In [None]:
pd.to_datetime('06/03/2021')

**Oops! It has been interpreted as 3 June 2021, because `pd.to_datetime()` assumes by default that the month comes first in an ambiguous string**

You can look up the different datetime formats for a better understanding, but for the time being use the format `month/day/year`. This matches our dataset, where the `Time` column stores the date as a string in `month/day/year` sequence
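
When a string really is written day first, there are two ways to say so. The following sketch uses the `dayfirst` and `format` parameters of `pd.to_datetime()`; the variable names are only illustrative.

```python
import pandas as pd

# Default behaviour: the first number is read as the month
default_parse = pd.to_datetime('06/03/2021')

# dayfirst=True tells pandas that the first number is the day
day_first = pd.to_datetime('06/03/2021', dayfirst=True)

# An explicit format removes the ambiguity altogether
explicit = pd.to_datetime('06/03/2021', format='%d/%m/%Y')

print(default_parse, day_first, explicit)
```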

In [None]:
# You can also pass a custom format through the format parameter and get the same result as before
pd.to_datetime('2021$01$05', format='%Y$%m$%d')
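
Real columns often contain values that do not match the expected format. As a hedged sketch, the `errors='coerce'` option of `pd.to_datetime()` turns such values into `NaT` instead of raising an error; the sample Series below is made up.

```python
import pandas as pd

# Hypothetical raw values: one valid date, one impossible date, one non-date
raw = pd.Series(['2021-01-05', '2021-02-30', 'unknown'])

# Values that cannot be parsed with the given format become NaT
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')
print(parsed)
print(parsed.isna().tolist())
```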

In [None]:
import pandas as pd
df = pd.read_csv("datasets/uforeports.csv")
df.head()

In [None]:
df.dtypes

In [None]:
df['Time'] = pd.to_datetime(df.Time)

In [None]:
df.head()

In [None]:
df.dtypes

**You can explore different attributes/methods of a datetime series**

`Series.dt.year`: Returns the year of each datetime

`Series.dt.month`: Returns the month of each datetime (1-12)

`Series.dt.day`: Returns the day of the month of each datetime

`Series.dt.hour`: Returns the hour of each datetime

`Series.dt.minute`: Returns the minutes of each datetime

`Series.dt.dayofweek`: Returns 0-6, where Monday is taken as 0 and Sunday as 6


`Series.dt.day_name()`: Returns name of the day as string

`Series.dt.month_name()`: Returns month as string

For details Read: https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.year.html
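
A small sketch of the `.dt` accessor on a hand-made Series (the two timestamps are chosen only for illustration) shows the Monday-based numbering of `dayofweek` next to the day names.

```python
import pandas as pd

# 31 December 2021 was a Friday and 2 January 2022 was a Sunday
s = pd.Series(pd.to_datetime(['2021-12-31 04:30', '2022-01-02 18:45']))

print(s.dt.year.tolist())
print(s.dt.hour.tolist(), s.dt.minute.tolist())
print(s.dt.dayofweek.tolist())   # Monday is 0, so Friday is 4 and Sunday is 6
print(s.dt.day_name().tolist())
```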

In [None]:
df.Time.dt.hour

In [None]:
df.Time.dt.day_name()

In [None]:
df.Time.dt.month_name()


## 3. Dealing with DateTime Series in Pandas

In [None]:
import pandas as pd
df = pd.read_csv("datasets/uforeports.csv")
df['Time'] = pd.to_datetime(df.Time)
df

**Suppose I want to display only those UFO sightings that were seen on or after 1st January 1995**

In [None]:
# Create a datetime object to be used for comparison
ts = pd.to_datetime('1995/01/01')
df.loc[df.Time >= ts, :]

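**Suppose I want to display only those UFO sightings that were seen between 1st March 1995 and 6th March 1995 (both days included)**

In [None]:
# Create datetime objects to be used for comparison.
# ts2 is midnight at the start of 7 March, so a strict < keeps all of 6 March
# without also matching a sighting at exactly 00:00 on 7 March.
ts1 = pd.to_datetime('1995/03/01')
ts2 = pd.to_datetime('1995/03/07')
df.loc[(df.Time >= ts1) & (df.Time < ts2), :]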
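
A range filter like this can also be written with `Series.between()`. The sketch below runs on a small made-up dataframe and assumes pandas 1.3 or later, where `inclusive` accepts the string `'left'` (start included, end excluded).

```python
import pandas as pd

# Hypothetical sightings around the boundaries of the range
sample = pd.DataFrame({'Time': pd.to_datetime([
    '1995-02-28 23:00',   # before the range
    '1995-03-01 00:00',   # exactly at the start
    '1995-03-06 21:30',   # late on the last day
    '1995-03-07 00:00',   # exactly at the end boundary
])})

# Half-open interval: 1 March 00:00 <= Time < 7 March 00:00
mask = sample.Time.between(pd.Timestamp('1995-03-01'),
                           pd.Timestamp('1995-03-07'),
                           inclusive='left')
selected = sample.loc[mask]
print(selected)
```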

**Suppose I want to display the most recent record, i.e., the one with the maximum date in the `Time` column**

In [None]:
ts = df.Time.max()
ts

In [None]:
df.loc[df.Time == ts]

**Suppose I want to display the oldest record as per the `Time` column**

In [None]:
ts = df.Time.min()
ts

In [None]:
df.loc[df.Time == ts]
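
An alternative sketch uses `idxmin()` and `idxmax()`, which return the index label of the earliest and latest value. Unlike the comparison above, they return only the first matching row when several rows share the same timestamp. The dataframe here is made up for illustration.

```python
import pandas as pd

# Hypothetical records
sample = pd.DataFrame({
    'City': ['A', 'B', 'C'],
    'Time': pd.to_datetime(['1995-03-06 21:30', '1930-06-01 22:00', '2000-12-31 23:59']),
})

oldest = sample.loc[sample.Time.idxmin()]   # row with the earliest Time
newest = sample.loc[sample.Time.idxmax()]   # row with the latest Time
print(oldest.City, newest.City)
```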

**Suppose I want to check out the difference between the oldest and the newest record as per the `Time` column**

In [None]:
td = df.Time.max() - df.Time.min()
print(td)
print(type(td))
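
The result of subtracting two timestamps is a `Timedelta`. As a minimal sketch with two made-up timestamps, its size can be read through `days` and `total_seconds()`, and a `Timedelta` can be added back to a timestamp.

```python
import pandas as pd

start = pd.Timestamp('2021-01-01 00:00')
end = pd.Timestamp('2021-01-03 12:00')

td = end - start
print(td)                   # the difference as a Timedelta
print(td.days)              # whole days only
print(td.total_seconds())   # the full duration in seconds

# Adding a Timedelta shifts a timestamp forward
later = start + pd.Timedelta(days=7)
print(later)
```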

## Summary
Pandas captures four general time-related concepts:
- **Date times:** A specific date and time with timezone support. Similar to `datetime.datetime` from the standard library. To create these objects, use `to_datetime()` or `date_range()`
- **Time deltas:** An absolute time duration. Similar to `datetime.timedelta` from the standard library. To create these objects, use `to_timedelta()` or `timedelta_range()`
- **Time spans:** A span of time defined by a point in time and its associated frequency. To create these objects, use `Period()` or `period_range()`
- **Date offsets:** A relative time duration that respects calendar arithmetic. To create these objects, use `DateOffset()`
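
The four concepts can be sketched side by side as follows. The dates are arbitrary examples, and the variable names are only illustrative.

```python
import pandas as pd

# Date times: a single point in time, and a range of them
stamp = pd.to_datetime('2021-01-31')
days = pd.date_range('2021-01-01', periods=3, freq='D')

# Time delta: an absolute duration (always exactly 36 hours)
delta = pd.to_timedelta('1 days 12:00:00')

# Time span: the whole of January 2021 as one Period
month = pd.Period('2021-01', freq='M')

# Date offset: "one calendar month later", which respects month lengths
next_month = stamp + pd.DateOffset(months=1)

print(stamp + delta)
print(month.start_time, month.end_time)
print(next_month)
```

Adding the absolute `delta` always moves the timestamp by the same amount, while the calendar-aware offset lands on the last valid day of February.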

**Read Documentation for details:** 
https://pandas.pydata.org/docs/user_guide/timeseries.html#overview