---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 3.21 (Pandas-13)</h1>

<img align="right" width="400" height="400"  src="images/pandas-apps.png"  >

## _Working with Time Series Data_

**Read Documentation for details:** 
https://pandas.pydata.org/docs/user_guide/timeseries.html#overview

In [None]:
# To install this library in Jupyter notebook
#import sys
#!{sys.executable} -m pip install pandas

In [None]:
import pandas as pd
pd.__version__ , pd.__path__

## Learning agenda of this notebook
1. Recap of Python's Built-in Time and Datetime Modules
    - Python Time module
    - Python Datetime module
    - Time Zones
2. Converting Strings to Pandas DateTime64 type
    - Convert a Scalar String to DateTime
    - Convert Pandas Series to DateTime
    - Handling Issues of DateTime Formats
    - Convert a Single Integer to Pandas DateTime
3. Reading a sample dataset having DateTime Data
4. Practicing with UFO Dataset
5. Practicing with Crypto Dataset
6. Making the Date Column as Index of the Dataframe

## 1. Recap of Python Modules Related to Date and Time

### a. Python Time Module
- Python's `time` module is principally for working with UNIX timestamps, expressed as a floating-point number of seconds since the UNIX epoch (00:00:00 UTC on 1 January 1970).

In [None]:
# Use `dir()` to get the list of methods in the Python `time` module
import time
print(dir(time))

**(i) The `time.time()` method returns the current time in seconds since UNIX Epoch (00:00:00 UTC on 1 January 1970)**

In [None]:
seconds = time.time()
seconds

> You can achieve the same using the system `date` command by passing it the `+%s` command-line argument

In [None]:
!date +%s

**(ii) The `time.ctime()` method returns a date time string corresponding to the number of seconds passed to it since UNIX Epoch.**

In [None]:
# Shows a `+5:00` hour offset because the local time zone (PKT) is 5 hours ahead of UTC
dtg1 = time.ctime(0)
dtg1

In [None]:
#If you pass the current elapsed seconds since UNIX epoch to the `ctime()` method, it returns current datetime
seconds = time.time()
dtg2 = time.ctime(seconds)
dtg2

In [None]:
#Get time using shell command
!date

### b. Python Datetime Module
The `datetime` module supports many of the same operations as the `time` module, but provides a more object-oriented set of types and has some limited support for time zones.

In [None]:
# use dir() to get the list of complete functions in datetime module
import datetime
print(dir(datetime))

**(i) The `datetime.datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])` constructor creates a datetime object for any given date, optionally with a time**

In [None]:
dtg = datetime.datetime(2021,12,31)
print(dtg)
print(type(dtg))

In [None]:
print(datetime.datetime(2021, 12, 31, 4, 30, 54, 678))

**(ii) The `datetime.time([hour[, minute[, second[, microsecond[, tzinfo]]]]])` constructor returns a time object. All arguments are optional**

In [None]:
t1 = datetime.time(10, 15)
print(t1)
print(type(t1))

**(iii) You can explore some commonly used attributes of `<class 'datetime.datetime'>` objects.**
- `dtg.year:` returns the year
- `dtg.month:` returns the month
- `dtg.day:` returns the day of the month
- `dtg.hour:` returns the hour
- `dtg.minute:` returns the minutes
- `dtg.second:` returns the seconds

In [None]:
dtg = datetime.datetime(2021, 12, 31, 4, 25, 58)
print(dtg)
print(type(dtg))

In [None]:
dtg.year

In [None]:
dtg.month

In [None]:
dtg.day

In [None]:
dtg.hour

In [None]:
dtg.minute

In [None]:
dtg.second

### c. Time Zones:
- Since noon happens at different times in different parts of the world, the world is divided into different time zones.
- On UNIX-based systems such as Mac and Linux, the information about these time zones is kept in files under `/usr/share/zoneinfo/`.
- Let me show you the contents of these files on my Mac system

In [None]:
# The UNIX epoch displayed in system local time (PKT) is five hours ahead of midnight, 1 Jan 1970 UTC
# (UTC: Coordinated Universal Time, the successor to Greenwich Mean Time)
dtg1 = time.ctime(0)
dtg1

> You may have noticed that the above cell does not display the exact UNIX epoch, i.e., midnight 1st January 1970; rather, it is 5 hours ahead. This is because my machine is configured for the time zone of Pakistan, which has a `+5:00` offset from Coordinated Universal Time (UTC, the successor to GMT)

In [None]:
!ls /usr/share/zoneinfo/

In [None]:
!ls /usr/share/zoneinfo/Asia

>On all UNIX-based systems (Mac, Linux), `TZ` is an environment variable that can be set to any of the above files to get the date and time of that zone. By default, the system is configured to use the local time of the country

In [None]:
! date

In [None]:
! TZ=Asia/Karachi    date

In [None]:
! TZ=Asia/Calcutta   date

>So you can observe that if we run the `date` command after setting the `TZ` variable to Karachi and Calcutta, their local date times are displayed. Being in different time zones, Pakistan Standard Time (UTC+5:00) is 30 minutes behind Indian Standard Time (UTC+5:30)
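
The same 30-minute gap can be reproduced in Python itself. The following is a minimal sketch that uses hand-built fixed offsets from the standard `datetime` module; the names `PKT`, `IST` and `gap` are illustrative and do not come from the system's zoneinfo files.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones built by hand for illustration (assumed offsets: PKT=+5:00, IST=+5:30)
PKT = timezone(timedelta(hours=5), name="PKT")
IST = timezone(timedelta(hours=5, minutes=30), name="IST")

# The UNIX epoch as a timezone-aware datetime in UTC
epoch_utc = datetime(1970, 1, 1, tzinfo=timezone.utc)

# The same instant expressed as local wall-clock time in each zone
epoch_pkt = epoch_utc.astimezone(PKT)
epoch_ist = epoch_utc.astimezone(IST)
print(epoch_pkt)
print(epoch_ist)

# Difference between the two UTC offsets
gap = epoch_ist.utcoffset() - epoch_pkt.utcoffset()
print(gap)
```

Both objects describe the same instant; only the wall-clock reading differs, which is also why `time.ctime(0)` shows 05:00 on a machine configured for Pakistan.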

## Pandas Time Series Data Structures
- **Timestamp & DatetimeIndex:**
    - A Timestamp refers to a particular moment in time, e.g., 28 July, 1969 at 11:00 am
    - It is the Pandas replacement for Python's built-in datetime object
    - The `pd.to_datetime()` method is used to create a Timestamp object
    - The `pd.date_range()` method is used to generate a DatetimeIndex object
- **Period & PeriodIndex:**
    - A Period refers to a span of time between a start and an end point, with each interval of uniform length
    - The `pd.Period()` constructor (or the `to_period()` method of a Timestamp) is used to create a Period object
    - The `pd.period_range()` method is used to create a PeriodIndex
- **Timedelta & TimedeltaIndex:**
    - A time delta or duration refers to an exact length of time, e.g., a duration of 235.54 seconds
    - A Timedelta is created when you subtract two Timestamps, while a TimedeltaIndex is created when you subtract a Timestamp (or another DatetimeIndex) from a DatetimeIndex
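
As a quick orientation, the six structures listed above can each be created in one line. This is only a sketch with made-up sample values; the variable names are illustrative.

```python
import pandas as pd

# A single moment in time and a sequence of such moments
ts = pd.Timestamp("1969-07-28 11:00")
dti = pd.date_range("2022-01-01", periods=3, freq="D")

# A span of time (the whole month of March 2022) and a sequence of such spans
p = pd.Period("2022-03", freq="M")
pi = pd.period_range("2022-01", periods=3, freq="M")

# A duration, obtained by subtracting two Timestamps
td = pd.Timestamp("2022-01-02") - pd.Timestamp("2022-01-01")

# A sequence of durations, obtained by subtracting a Timestamp from a DatetimeIndex
tdi = dti - dti[0]

for obj in (ts, dti, p, pi, td, tdi):
    print(type(obj).__name__)
```

Note that a Period knows its own boundaries: `p.start_time` is the first instant of March 2022.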

## 2. Converting Strings to Pandas Timestamp Object
- The `to_datetime()` method converts its only required argument `arg` to a Timestamp object (or, for list-like input, to a DatetimeIndex or a Series of timestamps). The argument can be an int, float, str, datetime, list, tuple, 1-d array, Series, or DataFrame/dict-like object.

```
pd.to_datetime(arg, format=None, unit=None, origin='unix')
```
- Where,
    - `arg` can be a string, Series, int, datetime, list, tuple, 1-d array, DataFrame/dict-like object.
- Rest of the arguments will be discussed later
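
To make the list of accepted `arg` types concrete, here is a small sketch that feeds a string, a Python `datetime`, a list, and a DataFrame of date parts to `pd.to_datetime()`. The sample values and variable names are made up for illustration.

```python
import datetime
import pandas as pd

# Scalar inputs give a single Timestamp
from_str = pd.to_datetime("2022-03-06 08:30:15")
from_dt = pd.to_datetime(datetime.datetime(2022, 3, 6, 8, 30, 15))

# A list gives a DatetimeIndex
from_list = pd.to_datetime(["2022-03-06", "2022-03-07"])

# A DataFrame with year/month/day columns gives a Series of timestamps
parts = pd.DataFrame({"year": [2022, 2022], "month": [3, 3], "day": [6, 7]})
from_frame = pd.to_datetime(parts)

print(type(from_str).__name__, type(from_list).__name__, type(from_frame).__name__)
```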

### a. Basic Conversion with Scalar String

In [2]:
# YYYY-MM-DD HH:MM:SS
str_date = '2022-03-06 08:30:15'
print(str_date)
print(type(str_date))

2022-03-06 08:30:15
<class 'str'>


In [3]:
ts = pd.to_datetime(str_date)
print(ts)
print(type(ts))

2022-03-06 08:30:15
<class 'pandas._libs.tslibs.timestamps.Timestamp'>


**`pd.Timestamp Attributes`**

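`ts.second`: Returns the seconds

`ts.minute`: Returns the minutes

`ts.hour`: Returns the hour

`ts.day`: Returns the day of the month

`ts.month`: Returns the month as January=1, December=12

`ts.year`: Returns the year of the datetime object

`ts.day_name()`: Returns the name of the day as a string

`ts.month_name()`: Returns the name of the month as a string

For a Series of timestamps, the same attributes and methods are available through the `.dt` accessor, e.g., `Series.dt.second` or `Series.dt.day_name()`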

For details Read: https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.year.html

In [4]:
ts.year

2022

In [5]:
ts.month

3

In [10]:
ts.day

6

In [11]:
ts.month_name()

'March'

In [12]:
ts.hour

8

In [13]:
ts.minute

30

In [14]:
ts.quarter

1
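
The attributes shown above work on one Timestamp at a time. For a whole Series of timestamps, the `.dt` accessor applies them element-wise. A minimal sketch with two made-up dates:

```python
import pandas as pd

# A small Series of timestamps (sample values chosen for illustration)
s = pd.Series(pd.to_datetime(["2022-03-06 08:30:15", "2022-12-25 17:45:00"]))

years = s.dt.year           # integer year of each element
months = s.dt.month_name()  # month name of each element
days = s.dt.day_name()      # weekday name of each element

print(pd.DataFrame({"year": years, "month": months, "day": days}))
```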

### b. Convert Pandas Series of Strings to Series of Timestamps

In [15]:
# A pandas series having same date but in different formats
s1 = pd.Series(['2022-03-06 08:30', '2022/03/06 08:30', '6 March, 2022 08:30', 'Mar 06, 2022 08:30', '202203060830'])
type(s1)
s1

0       2022-03-06 08:30
1       2022/03/06 08:30
2    6 March, 2022 08:30
3     Mar 06, 2022 08:30
4           202203060830
dtype: object

In [16]:
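# to_datetime() function will convert all these different formats into a common format
# (pandas 2.0 and later expect one consistent format per call; pass format='mixed' for input like this)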
s2 = pd.to_datetime(s1)
s2

0   2022-03-06 08:30:00
1   2022-03-06 08:30:00
2   2022-03-06 08:30:00
3   2022-03-06 08:30:00
4   2022-03-06 08:30:00
dtype: datetime64[ns]

In [17]:
type(s2)

pandas.core.series.Series

In [19]:
type(s2[0])

pandas._libs.tslibs.timestamps.Timestamp

In [20]:
s2[0].day, s2[0].month

(6, 3)

### c. Handling Issues of DateTime Formats
From the above examples, it appears that `pd.to_datetime()` works fine for all date formats. Let us try storing 6 March, 2022 as '06/03/2022' or '06-03-2022'

**(i) Problem 1:**

In [21]:
ts = pd.to_datetime('06-03-2022')
ts

Timestamp('2022-06-03 00:00:00')

In [22]:
ts.day, ts.month

(3, 6)

**Oops!** The Pandas `to_datetime()` method has converted the string to a datetime, but interpreted it as 3 June 2022
>It seems that `pd.to_datetime()` expects the month to be given first when all three components are given as decimal numbers. However, it handles the string intelligently if the first value is greater than 12, or when the month is given as a string.

**(ii) Problem 2:**

In [23]:
#ts = pd.to_datetime('2022-03-06 08-PM')

**Oops again!** The Pandas `to_datetime()` method raises an error saying `ParserError: Unknown string format: 2022-03-06 08-PM`
>It seems that `pd.to_datetime()` cannot infer this time format, in which the hour on a 12-hour clock is joined by a hyphen to AM (Ante-Meridiem, meaning before midday) or PM (Post-Meridiem, meaning after midday).

**(iii) Solution of above two Problems:**
>The solution is to pass a `format string` to the `format` argument of the `pd.to_datetime()` method. The format string needs to be prepared as per the format of the date string.
Visit this link to see the format codes: https://pandas.pydata.org/docs/reference/api/pandas.Period.strftime.html

In [27]:
# With an explicit format string the conversion succeeds; without it, a ParserError is raised as shown above
ts = pd.to_datetime('06-03-2022 08-PM', format = '%d-%m-%Y %I-%p')

In [28]:
ts

Timestamp('2022-03-06 20:00:00')

In [29]:
ts.day, ts.month

(6, 3)
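
Besides an explicit `format`, the day/month ambiguity of Problem 1 can also be resolved with the `dayfirst` argument, and the same format codes can be used in the other direction with `strftime()`. The sketch below assumes an English/C locale for the `%p` marker; the variable names are illustrative.

```python
import pandas as pd

# Problem 1: tell the parser that the first number is the day
day_first = pd.to_datetime("06-03-2022", dayfirst=True)

# Problems 1 and 2 together: spell out the layout with format codes
ts = pd.to_datetime("06-03-2022 08-PM", format="%d-%m-%Y %I-%p")

# The same codes turn a Timestamp back into a string
text = ts.strftime("%d-%m-%Y %I-%p")

# Unparseable input can be turned into NaT instead of raising an error
bad = pd.to_datetime("not a date", errors="coerce")

print(day_first, ts, text, bad)
```

An explicit `format` is usually the safer choice for whole columns, since every row is then read the same way.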

### d. Convert a Single Integer to Pandas Timestamp
- The `to_datetime()` method can also be used to convert an integer passed as the first argument to a Pandas Timestamp object.
- The `unit` argument gives the unit of `arg`; it can be `D`, `s`, `ms`, `us` or `ns` (days, seconds, milliseconds, microseconds or nanoseconds)
- The `origin` argument can be any reference point from which you want to start counting your units. The default value of `origin` is the UNIX epoch.
```
pd.to_datetime(arg, format=None, unit=None, origin='unix')
```

In [30]:
!date +%s

1641381351


In [31]:
ts = pd.to_datetime(1641347642, unit='s', origin='unix')
ts

Timestamp('2022-01-05 01:54:02')

>You can mention the origin as some other reference point of your choice

In [32]:
ts = pd.to_datetime(10, unit='D', origin='2022-01-01')
ts

Timestamp('2022-01-11 00:00:00')
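
The `unit` and `origin` arguments also work on whole sequences of integers. A short sketch, reusing the epoch value from the cell above and a few made-up integers:

```python
import pandas as pd

# The same instant given in seconds and in milliseconds since the UNIX epoch
from_seconds = pd.to_datetime(1641347642, unit="s")
from_millis = pd.to_datetime(1641347642000, unit="ms")

# A Series of second counts: 0, 1 and 2 days after the epoch
stamps = pd.to_datetime(pd.Series([0, 86400, 172800]), unit="s")

# Day counts measured from a custom origin instead of the epoch
offset_days = pd.to_datetime([0, 1, 2], unit="D", origin="2022-01-01")

print(from_seconds)
print(stamps)
print(offset_days)
```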

## 3. Reading a Sample Dataset having DateTime Data

### a.  Option 1: Read the Dataset as such and then convert the Column Datatype to datetime64

**Example 1:** A dataset with datetime in a format as expected by `pd.to_datetime()`

In [33]:
# yyyy-mm-dd hr:min
! cat datasets/datetime1.csv

name,dob,address,gender
Khurram,2022-03-06 21:10,Lahore,Male
Fatima,2022/03/06 08:30,Islamabad,Female
Huzaifa,2022-03-06 20:15,Karachi,Male
Shaista,2022.03.06 18:05,Peshawer,Female

In [34]:
import pandas as pd
df = pd.read_csv("datasets/datetime1.csv")
df

Unnamed: 0,name,dob,address,gender
0,Khurram,2022-03-06 21:10,Lahore,Male
1,Fatima,2022/03/06 08:30,Islamabad,Female
2,Huzaifa,2022-03-06 20:15,Karachi,Male
3,Shaista,2022.03.06 18:05,Peshawer,Female


In [35]:
df.dtypes

name       object
dob        object
address    object
gender     object
dtype: object

In [36]:
pd.to_datetime(df.loc[:,'dob'])

0   2022-03-06 21:10:00
1   2022-03-06 08:30:00
2   2022-03-06 20:15:00
3   2022-03-06 18:05:00
Name: dob, dtype: datetime64[ns]

In [37]:
df['dob'] = pd.to_datetime(df.loc[:,'dob'])

In [38]:
df.dtypes

name               object
dob        datetime64[ns]
address            object
gender             object
dtype: object

**Example 2:** A dataset with datetime in a format NOT expected by `pd.to_datetime()`

In [48]:
# dd-mm-yyyy hr-PM
! cat datasets/datetime2.csv

name,dob,address,gender
Khurram,02-07-1980 08-PM,Lahore,Male
Fatima,15-06-2001 06-AM,Islamabad,Female
Huzaifa,08-04-1999 05-PM,Karachi,Male
Shaista,10-09-2005 02-AM,Peshawer,Female

In [49]:
df = pd.read_csv("datasets/datetime2.csv")
df

Unnamed: 0,name,dob,address,gender
0,Khurram,02-07-1980 08-PM,Lahore,Male
1,Fatima,15-06-2001 06-AM,Islamabad,Female
2,Huzaifa,08-04-1999 05-PM,Karachi,Male
3,Shaista,10-09-2005 02-AM,Peshawer,Female


In [50]:
df.dtypes

name       object
dob        object
address    object
gender     object
dtype: object

In [52]:
# Following LOC will now generate `ParserError: Unknown string format: 02-07-1980 08-PM`
#pd.to_datetime(df.loc[:,'dob'])

In [53]:
pd.to_datetime(df.loc[:,'dob'], format = '%d-%m-%Y %I-%p')

0   1980-07-02 20:00:00
1   2001-06-15 06:00:00
2   1999-04-08 17:00:00
3   2005-09-10 02:00:00
Name: dob, dtype: datetime64[ns]

In [54]:
df['dob'] = pd.to_datetime(df.loc[:,'dob'], format = '%d-%m-%Y %I-%p')

In [55]:
df.dtypes

name               object
dob        datetime64[ns]
address            object
gender             object
dtype: object

### b.  Option 2: Do the Conversion while Reading the CSV File

>**One can use the `parse_dates` and `date_parser` arguments of the `pd.read_csv()` method to do this conversion while reading the csv file (in pandas 2.0 and later, `date_parser` is deprecated in favour of `date_format`). However, the `pd.to_datetime()` method discussed above is recommended.**
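
For completeness, here is a minimal sketch of Option 2 using only `parse_dates`. It reads a small in-memory CSV (two rows taken from `datetime1.csv`, both in the standard `YYYY-MM-DD HH:MM` layout) instead of a file on disk; `csv_text` and `df_demo` are illustrative names.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file whose dob column uses one standard layout
csv_text = """name,dob,address,gender
Khurram,2022-03-06 21:10,Lahore,Male
Huzaifa,2022-03-06 20:15,Karachi,Male
"""

# parse_dates names the columns to convert while the file is being read
df_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["dob"])
print(df_demo.dtypes)
```

For non-standard layouts such as `02-07-1980 08-PM`, reading the column as strings and then calling `pd.to_datetime()` with a `format` keeps the code working across Pandas versions.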

## 4. Practicing with UFO Dataset

### a. Understanding the Dataset

In [56]:
import pandas as pd
df = pd.read_csv("datasets/ufo.csv")

In [57]:
df

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00
...,...,...,...,...,...
18236,Grant Park,,TRIANGLE,IL,12/31/2000 23:00
18237,Spirit Lake,,DISK,IA,12/31/2000 23:00
18238,Eagle River,,,WI,12/31/2000 23:45
18239,Eagle River,RED,LIGHT,WI,12/31/2000 23:45


In [58]:
# The Time column of the dataframe contains strings
df.loc[0,'Time']

'6/1/1930 22:00'

In [59]:
pd.to_datetime(df.loc[:,'Time'])

0       1930-06-01 22:00:00
1       1930-06-30 20:00:00
2       1931-02-15 14:00:00
3       1931-06-01 13:00:00
4       1933-04-18 19:00:00
                ...        
18236   2000-12-31 23:00:00
18237   2000-12-31 23:00:00
18238   2000-12-31 23:45:00
18239   2000-12-31 23:45:00
18240   2000-12-31 23:59:00
Name: Time, Length: 18241, dtype: datetime64[ns]

In [60]:
df['Time'] = pd.to_datetime(df.loc[:,'Time'])

In [61]:
df.dtypes

City                       object
Colors Reported            object
Shape Reported             object
State                      object
Time               datetime64[ns]
dtype: object

**Suppose I want to display only those UFO sightings that were reported on or after 24th March 1995**

In [62]:
df.loc[df.Time >= '1995/03/24', :]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
7948,North Dade,,,FL,1995-03-24 01:27:00
7949,Las Vegas,,,NV,1995-03-24 05:00:00
7950,Las Vegas,,,NV,1995-03-24 05:00:00
7951,Grover Beach,,,CA,1995-03-25 00:00:00
7952,Monterey,,,CA,1995-03-25 00:00:00
...,...,...,...,...,...
18236,Grant Park,,TRIANGLE,IL,2000-12-31 23:00:00
18237,Spirit Lake,,DISK,IA,2000-12-31 23:00:00
18238,Eagle River,,,WI,2000-12-31 23:45:00
18239,Eagle River,RED,LIGHT,WI,2000-12-31 23:45:00


In [63]:
# Create a datetime object to be used for comparison
ts = pd.to_datetime('1995/03/24')
df.loc[df.Time >= ts, :]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
7948,North Dade,,,FL,1995-03-24 01:27:00
7949,Las Vegas,,,NV,1995-03-24 05:00:00
7950,Las Vegas,,,NV,1995-03-24 05:00:00
7951,Grover Beach,,,CA,1995-03-25 00:00:00
7952,Monterey,,,CA,1995-03-25 00:00:00
...,...,...,...,...,...
18236,Grant Park,,TRIANGLE,IL,2000-12-31 23:00:00
18237,Spirit Lake,,DISK,IA,2000-12-31 23:00:00
18238,Eagle River,,,WI,2000-12-31 23:45:00
18239,Eagle River,RED,LIGHT,WI,2000-12-31 23:45:00


**Suppose I want to display only those UFO sightings that were reported between 1st March 1995 and 6th March 1995**

In [64]:
# Create datetime objects to be used for comparison
# The upper bound is midnight at the start of 7 March, so `<` keeps every sighting up to the end of 6 March
ts1 = pd.to_datetime('1995/03/1')
ts2 = pd.to_datetime('1995/03/7')
df.loc[(df.Time >= ts1) & (df.Time < ts2), :]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
7860,Greenville,,LIGHT,IL,1995-03-01 21:00:00
7861,Sedalia,,,MO,1995-03-01 21:00:00
7862,Redmond,RED,,WA,1995-03-02 22:30:00
7863,Prescott Valley,,OVAL,AZ,1995-03-04 00:00:00
7864,Folsom,,,NJ,1995-03-04 16:32:00
7865,Anaheim,,OTHER,CA,1995-03-05 12:00:00
7866,Columbus,,,OH,1995-03-06 00:55:00
7867,Hilltop,,,NJ,1995-03-06 19:00:00
7868,Florence,,,OR,1995-03-06 19:10:00
7869,Mountain City,,,TN,1995-03-06 19:45:00


**Suppose I want to display the record of the maximum date under the `Time` column**

In [65]:
ts = df.Time.max()
ts

Timestamp('2000-12-31 23:59:00')

In [66]:
df.loc[df.Time == ts]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
18240,Ybor,,OVAL,FL,2000-12-31 23:59:00


**Suppose I want to display the oldest record as per the `Time` column**

In [67]:
ts = df.Time.min()
ts

Timestamp('1930-06-01 22:00:00')

In [68]:
df.loc[df.Time == ts]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00


**Suppose I want to check out the difference between the oldest and the newest record as per the `Time` column**

In [69]:
td = df.Time.max() - df.Time.min()
print(td)
print(type(td))

25781 days 01:59:00
<class 'pandas._libs.tslibs.timedeltas.Timedelta'>


## 5. Practicing with Crypto Dataset

In [70]:
import pandas as pd
df = pd.read_csv("datasets/cryptodata.csv")
df

Unnamed: 0,Date,Symbol,Open,High,Low,Close,Volume
0,2020-03-13 08-PM,ETHUSD,129.94,131.82,126.87,128.71,1940673.93
1,2020-03-13 07-PM,ETHUSD,119.51,132.02,117.10,129.94,7579741.09
2,2020-03-13 06-PM,ETHUSD,124.47,124.85,115.50,119.51,4898735.81
3,2020-03-13 05-PM,ETHUSD,124.08,127.42,121.63,124.47,2753450.92
4,2020-03-13 04-PM,ETHUSD,124.85,129.51,120.17,124.08,4461424.71
...,...,...,...,...,...,...,...
23669,2017-07-01 03-PM,ETHUSD,265.74,272.74,265.00,272.57,1500282.55
23670,2017-07-01 02-PM,ETHUSD,268.79,269.90,265.00,265.74,1702536.85
23671,2017-07-01 01-PM,ETHUSD,274.83,274.93,265.00,268.79,3010787.99
23672,2017-07-01 12-PM,ETHUSD,275.01,275.01,271.00,274.83,824362.87


In [71]:
df.dtypes

Date       object
Symbol     object
Open      float64
High      float64
Low       float64
Close     float64
Volume    float64
dtype: object

In [72]:
# ParserError: Unknown string format: 2020-03-13 08-PM
#pd.to_datetime(df.loc[:,'Date'])

In [73]:
pd.to_datetime(df.loc[:,'Date'], format = '%Y-%m-%d %I-%p')

0       2020-03-13 20:00:00
1       2020-03-13 19:00:00
2       2020-03-13 18:00:00
3       2020-03-13 17:00:00
4       2020-03-13 16:00:00
                ...        
23669   2017-07-01 15:00:00
23670   2017-07-01 14:00:00
23671   2017-07-01 13:00:00
23672   2017-07-01 12:00:00
23673   2017-07-01 11:00:00
Name: Date, Length: 23674, dtype: datetime64[ns]

In [74]:
df['Date'] = pd.to_datetime(df.loc[:,'Date'], format = '%Y-%m-%d %I-%p')

In [75]:
df.dtypes

Date      datetime64[ns]
Symbol            object
Open             float64
High             float64
Low              float64
Close            float64
Volume           float64
dtype: object

**Let us create a new column in the dataframe that shows the day of week in each row**

In [76]:
df['dayofweek'] = df['Date'].dt.day_name()

In [77]:
df

Unnamed: 0,Date,Symbol,Open,High,Low,Close,Volume,dayofweek
0,2020-03-13 20:00:00,ETHUSD,129.94,131.82,126.87,128.71,1940673.93,Friday
1,2020-03-13 19:00:00,ETHUSD,119.51,132.02,117.10,129.94,7579741.09,Friday
2,2020-03-13 18:00:00,ETHUSD,124.47,124.85,115.50,119.51,4898735.81,Friday
3,2020-03-13 17:00:00,ETHUSD,124.08,127.42,121.63,124.47,2753450.92,Friday
4,2020-03-13 16:00:00,ETHUSD,124.85,129.51,120.17,124.08,4461424.71,Friday
...,...,...,...,...,...,...,...,...
23669,2017-07-01 15:00:00,ETHUSD,265.74,272.74,265.00,272.57,1500282.55,Saturday
23670,2017-07-01 14:00:00,ETHUSD,268.79,269.90,265.00,265.74,1702536.85,Saturday
23671,2017-07-01 13:00:00,ETHUSD,274.83,274.93,265.00,268.79,3010787.99,Saturday
23672,2017-07-01 12:00:00,ETHUSD,275.01,275.01,271.00,274.83,824362.87,Saturday


**Let us find the oldest and newest record in the dataframe**

In [78]:
df['Date'].min()

Timestamp('2017-07-01 11:00:00')

In [79]:
df['Date'].max()

Timestamp('2020-03-13 20:00:00')

In [80]:
df['Date'].max() - df['Date'].min()

Timedelta('986 days 09:00:00')
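
A Timedelta such as the one above can be broken into components or converted to a single unit. A small sketch, rebuilding the same two timestamps by hand so that it runs without the dataset:

```python
import pandas as pd

# The newest and oldest timestamps reported by the cells above
newest = pd.Timestamp("2020-03-13 20:00:00")
oldest = pd.Timestamp("2017-07-01 11:00:00")
td = newest - oldest

whole_days = td.days                    # whole days in the duration
leftover = td.seconds                   # remaining seconds after the whole days
total = td.total_seconds()              # the entire duration in seconds
in_hours = td / pd.Timedelta(hours=1)   # the entire duration in hours

print(td, whole_days, leftover, total, in_hours)
```

Dividing one Timedelta by another is a convenient way to express a duration in any unit; here the number of hours is one less than the number of hourly rows in the dataset.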

**Let us find the records of the year 2020 only**

In [81]:
mask = df['Date'] >= '2020'
mask

0         True
1         True
2         True
3         True
4         True
         ...  
23669    False
23670    False
23671    False
23672    False
23673    False
Name: Date, Length: 23674, dtype: bool

In [82]:
df.loc[mask]

Unnamed: 0,Date,Symbol,Open,High,Low,Close,Volume,dayofweek
0,2020-03-13 20:00:00,ETHUSD,129.94,131.82,126.87,128.71,1940673.93,Friday
1,2020-03-13 19:00:00,ETHUSD,119.51,132.02,117.10,129.94,7579741.09,Friday
2,2020-03-13 18:00:00,ETHUSD,124.47,124.85,115.50,119.51,4898735.81,Friday
3,2020-03-13 17:00:00,ETHUSD,124.08,127.42,121.63,124.47,2753450.92,Friday
4,2020-03-13 16:00:00,ETHUSD,124.85,129.51,120.17,124.08,4461424.71,Friday
...,...,...,...,...,...,...,...,...
1744,2020-01-01 04:00:00,ETHUSD,129.57,130.00,129.50,129.56,702786.82,Wednesday
1745,2020-01-01 03:00:00,ETHUSD,130.37,130.44,129.38,129.57,496704.23,Wednesday
1746,2020-01-01 02:00:00,ETHUSD,130.14,130.50,129.91,130.37,396315.72,Wednesday
1747,2020-01-01 01:00:00,ETHUSD,128.34,130.14,128.32,130.14,635419.40,Wednesday


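**Let us find the records of January 2020 only** (note that `'2020-01-31'` is read as midnight at the start of 31 January, so the later hours of that day are not matched)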

In [83]:
mask = (df['Date'] >= '2020-01-01') & (df['Date'] <= '2020-01-31')
mask

0        False
1        False
2        False
3        False
4        False
         ...  
23669    False
23670    False
23671    False
23672    False
23673    False
Name: Date, Length: 23674, dtype: bool

In [84]:
df.loc[mask]

Unnamed: 0,Date,Symbol,Open,High,Low,Close,Volume,dayofweek
1028,2020-01-31 00:00:00,ETHUSD,184.55,185.68,183.48,183.82,1107068.24,Friday
1029,2020-01-30 23:00:00,ETHUSD,186.62,186.89,182.99,184.55,1262371.00,Thursday
1030,2020-01-30 22:00:00,ETHUSD,185.03,186.63,183.90,186.62,992325.34,Thursday
1031,2020-01-30 21:00:00,ETHUSD,184.40,185.03,183.19,185.03,701167.77,Thursday
1032,2020-01-30 20:00:00,ETHUSD,181.26,185.14,181.26,184.40,2180199.04,Thursday
...,...,...,...,...,...,...,...,...
1744,2020-01-01 04:00:00,ETHUSD,129.57,130.00,129.50,129.56,702786.82,Wednesday
1745,2020-01-01 03:00:00,ETHUSD,130.37,130.44,129.38,129.57,496704.23,Wednesday
1746,2020-01-01 02:00:00,ETHUSD,130.14,130.50,129.91,130.37,396315.72,Wednesday
1747,2020-01-01 01:00:00,ETHUSD,128.34,130.14,128.32,130.14,635419.40,Wednesday
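
The output above stops at `2020-01-31 00:00:00` because a date string without a time is compared as midnight. The sketch below reproduces this on a small made-up hourly range and shows two ways to capture the whole month; all names are illustrative.

```python
import pandas as pd

# Hourly timestamps from 30 January to 2 February 2020 (made-up range)
dates = pd.Series(pd.date_range("2020-01-30", "2020-02-02", freq="h"))

# Upper bound read as 2020-01-31 00:00, so most of 31 January is missed
mask_midnight = (dates >= "2020-01-01") & (dates <= "2020-01-31")

# Half-open interval: everything strictly before 1 February
mask_half_open = (dates >= "2020-01-01") & (dates < "2020-02-01")

# Same result using the year and month components
mask_parts = (dates.dt.year == 2020) & (dates.dt.month == 1)

print(mask_midnight.sum(), mask_half_open.sum(), mask_parts.sum())
```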


## 6. Making the Date Column as Index of the Dataframe
- Once you load a csv file having a datetime column into a dataframe, the column is read as strings
- You use the `pd.to_datetime()` method to change the datatype to the Pandas datetime64 type
- Now you can use the `df.set_index()` method to make the datetime column the row index of the dataframe.
- This allows you to treat the entire dataset in the dataframe as Time Series Data
    - Selecting/Indexing using strings
    - Slicing using `df[date1:date2]`
    - Use of `df.loc[date1, col]`
    - Select missing values `df[df.col.isnull()]`
    - Do upsampling/downsampling using `df.asfreq()` or `df.resample()`

In [None]:
import pandas as pd
df = pd.read_csv("datasets/cryptodata.csv")
df

In [None]:
df['Date'] = pd.to_datetime(df.loc[:,'Date'], format = '%Y-%m-%d %I-%p')

In [None]:
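df.set_index('Date', inplace=True)
# The file is stored newest-first; sorting the index makes date slices run from oldest to newest
df.sort_index(inplace=True)
df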

>Now, since the data of the `Date` column has become the row index of this dataframe, we can use `.loc[]` on the dates :)
- Since the index is still unique, a lookup is done in O(1) time
- If the index is non-unique but sorted, a lookup takes O(log n) time
- If the index is non-unique and unsorted, a lookup takes O(n) time
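
Before running this on the crypto data, the selection styles listed earlier can be tried on a tiny frame. This is a sketch on four days of made-up hourly data with an ascending index; `demo` and its `Close` values are invented for illustration.

```python
import pandas as pd

# Four days of hourly rows: 30 Jan, 31 Jan, 1 Feb and 2 Feb 2020
idx = pd.date_range("2020-01-30", periods=96, freq="h")
demo = pd.DataFrame({"Close": range(96)}, index=idx)

# Partial-string selection: every row that falls in January 2020
jan = demo.loc["2020-01"]

# Every row of a single day
one_day = demo.loc["2020-01-31"]

# Slice between two days (both ends included), keeping one column
window = demo.loc["2020-01-31":"2020-02-01", "Close"]

print(len(jan), len(one_day), len(window))
print(demo.index.is_unique, demo.index.is_monotonic_increasing)
```

The last line reports the two properties that the lookup costs above depend on: whether the index is unique and whether it is sorted.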

In [None]:
# Slice data of January and February 2020
df.loc['2020-01':'2020-02']

In [None]:
# Get only the Close column showing closing of January and February 2020
df.loc['2020-01':'2020-02', 'Close']

In [None]:
# Compute the mean
df.loc['2020-01':'2020-02', 'Close'].mean()

>The given dataframe shows data on an hourly basis. What if we want to resample it on a daily, weekly or monthly basis?

In [None]:
df.loc[:,'High'].resample('D').max()
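
To see what resampling does to the numbers, here is a sketch on two days of made-up hourly values, where the results are easy to check by eye. Downsampling aggregates several rows into one, while upsampling with `asfreq()` creates new, empty slots.

```python
import pandas as pd

# Two days of hourly values 0..47 (made-up data standing in for the High column)
idx = pd.date_range("2020-01-01", periods=48, freq="h")
high = pd.Series(range(48), index=idx, name="High")

# Downsampling: one row per day, holding that day's maximum or mean
daily_max = high.resample("D").max()
daily_mean = high.resample("D").mean()

# Upsampling: one row every 30 minutes; the new slots have no data yet (NaN)
half_hourly = high.asfreq("30min")

print(daily_max)
print(daily_mean)
print(half_hourly.head())
```

After upsampling, the missing values are typically filled, for example with `ffill()` or `interpolate()`, depending on what the data represents.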


# Bonus: Creating a DatetimeIndex
- The `pd.date_range()` method returns a range of equally spaced time points as a DatetimeIndex, which is an immutable container for datetimes.

```
pd.date_range(start=None, end=None, periods=None, freq=None)
```

- Where,
    - `start` is the left bound (str or datetime)
    - `end` is the right bound (str or datetime)
    - `periods` is the number of periods to generate
    - `freq` is the spacing between points, e.g., `s`, `min`, `h`, `D` for seconds, minutes, hours, days (the aliases for months, quarters and years differ between Pandas versions)


- Out of the four parameters: start, end, periods, and freq, exactly three must be specified
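
The rule of three can be illustrated by leaving out a different parameter each time. A minimal sketch with made-up dates:

```python
import pandas as pd

# start + end + periods: four evenly spaced points, spacing is worked out for us
a = pd.date_range(start="2022-01-01", end="2022-01-10", periods=4)

# start + end + freq: the number of points is worked out for us
b = pd.date_range(start="2022-01-01", end="2022-01-03", freq="12h")

# end + periods + freq: the range is counted backwards from the end
c = pd.date_range(end="2022-01-10", periods=3, freq="D")

print(a)
print(b)
print(c)
```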

In [91]:
dr = pd.date_range("2022-01-01 11:20 ", periods=10, freq="h")
dr

DatetimeIndex(['2022-01-01 11:20:00', '2022-01-01 12:20:00',
               '2022-01-01 13:20:00', '2022-01-01 14:20:00',
               '2022-01-01 15:20:00', '2022-01-01 16:20:00',
               '2022-01-01 17:20:00', '2022-01-01 18:20:00',
               '2022-01-01 19:20:00', '2022-01-01 20:20:00'],
              dtype='datetime64[ns]', freq='H')

In [92]:
type(dr)

pandas.core.indexes.datetimes.DatetimeIndex

In [86]:
pd.date_range("2022-01-01", periods=10, freq="d")

DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04',
               '2022-01-05', '2022-01-06', '2022-01-07', '2022-01-08',
               '2022-01-09', '2022-01-10'],
              dtype='datetime64[ns]', freq='D')

In [87]:
s = pd.Series(pd.date_range("2022-01-01", periods=10, freq="d"))
s

0   2022-01-01
1   2022-01-02
2   2022-01-03
3   2022-01-04
4   2022-01-05
5   2022-01-06
6   2022-01-07
7   2022-01-08
8   2022-01-09
9   2022-01-10
dtype: datetime64[ns]

In [88]:
s.dt.day_name()

0     Saturday
1       Sunday
2       Monday
3      Tuesday
4    Wednesday
5     Thursday
6       Friday
7     Saturday
8       Sunday
9       Monday
dtype: object

In [90]:
type(s[0])

pandas._libs.tslibs.timestamps.Timestamp

# Bonus: Creating a Period

# Bonus: Creating a PeriodIndex

# Bonus: Creating a Timedelta

# Bonus: Creating a TimedeltaIndex