---
# Data Science and Artificial Intelligence Practicum
## Module 3.2. Data Wrangling
---

## 3.2.6 - DateTime

In [1]:
import numpy as np
import pandas as pd
import datetime as dt

### `datetime` module
Basic date and time types

#### *class* `datetime.`**datetime**
*`(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0)`*

A combination of a date and a time. Attributes: **`year`**, **`month`**, **`day`**, **`hour`**, **`minute`**, **`second`**, **`microsecond`**, and **`tzinfo`**.

In [2]:
now = dt.datetime.now()
print(now)

2022-04-06 13:43:36.857521


In [3]:
print(now.date())

2022-04-06


In [4]:
print(now.time())

13:43:36.857521


In [5]:
now.day

6

In [6]:
now.year

2022

In [7]:
now.month

4

In [9]:
now.hour

13

In [10]:
now.minute

43

In [11]:
now.second

36

In [12]:
now.microsecond

857521
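The cells above read attributes from `now()`. As a minimal sketch, the same class can also be built from explicit components using the constructor signature shown earlier. The specific values below are chosen for illustration only.

```python
import datetime as dt

# Build a datetime from explicit components (illustrative values).
moment = dt.datetime(2022, 4, 6, 13, 43, 36, 857521)

# Time components that are omitted default to zero.
midnight = dt.datetime(2022, 4, 6)

# datetime objects are immutable; replace() returns a modified copy.
rounded = moment.replace(second=0, microsecond=0)

print(moment)    # 2022-04-06 13:43:36.857521
print(midnight)  # 2022-04-06 00:00:00
print(rounded)   # 2022-04-06 13:43:00
```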

####  *class* `datetime.`**date**
*`(year, month, day)`*

An idealized naive date, assuming the current Gregorian calendar always was, and always will be, in effect. Attributes: **`year`**, **`month`**, and **`day`**.

In [13]:
print(dt.date.today())

2022-04-06


In [14]:
print(dt.date(2022, 4, 6))  # -> year | month | day

2022-04-06
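Beyond `year`, `month`, and `day`, a `date` object offers a few helper methods. The short sketch below uses the same date as the cell above and shows ISO formatting, parsing, and weekday lookup (`date.fromisoformat` assumes Python 3.7 or newer).

```python
import datetime as dt

d = dt.date(2022, 4, 6)

print(d.isoformat())   # 2022-04-06
print(d.weekday())     # 2  (Monday is 0, so 2 is Wednesday)
print(d.isoweekday())  # 3  (Monday is 1)

# Parse an ISO-formatted string back into a date.
parsed = dt.date.fromisoformat("2022-04-06")
print(parsed == d)     # True
```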


#### *class* `datetime.`**time**
*`(hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0)`*

An idealized time, independent of any particular day, assuming that every day has exactly 24\*60\*60 seconds. (There is no notion of “leap seconds” here.) Attributes: **`hour`**, **`minute`**, **`second`**, **`microsecond`**, and **`tzinfo`**.

In [15]:
t1 = dt.time(18, 33, 55)
print(t1.minute)

33


In [16]:
print(dt.time(21, 12, 34))

21:12:34
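The note about leap seconds can be checked directly: a `time` with `second=60` is rejected. The sketch below also shows one way to attach a `time` to a calendar day with `datetime.combine`. The date used is an arbitrary example.

```python
import datetime as dt

# Seconds must be in the range 0..59, so a leap second cannot be represented.
try:
    dt.time(23, 59, 60)
    leap_second_accepted = True
except ValueError:
    leap_second_accepted = False

print(leap_second_accepted)  # False

# A time on its own has no date; combine() attaches one.
t1 = dt.time(18, 33, 55)
moment = dt.datetime.combine(dt.date(2022, 4, 6), t1)
print(moment)  # 2022-04-06 18:33:55
```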


In [17]:
today = dt.date.today()
independence_day = dt.date(2022, 9, 1)
diff = independence_day - today

print(f"{diff.days} days left until Uzbekistan's Independence Day")

148 days left until Uzbekistan's Independence Day
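Subtracting two dates yields a `timedelta`, and the reverse also works: adding a `timedelta` to a date gives another date. A small sketch, with the start date fixed to the day this notebook was run so that the result is reproducible:

```python
import datetime as dt

start = dt.date(2022, 4, 6)  # fixed instead of date.today() for reproducibility

# Adding 148 days lands on 1 September 2022.
target = start + dt.timedelta(days=148)
print(target)                  # 2022-09-01
print((target - start).days)   # 148

# timedelta also accepts weeks, hours, minutes, and so on.
one_week_earlier = target - dt.timedelta(weeks=1)
print(one_week_earlier)        # 2022-08-25
```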


In [18]:
now = dt.datetime.now()
event = dt.datetime(2022, 4, 9, 18, 35, 0)
diff = event - now  # a timedelta

# diff.seconds is only the remainder within the last day (0-86399),
# so split it into hours, minutes, and seconds
days = diff.days
hours, remainder = divmod(diff.seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(f"{days} days, {hours} hours, {minutes} minutes, and {seconds} seconds left until the event")

3 days, 4 hours, 51 minutes, and 22 seconds left until the event
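A `timedelta` stores only days, seconds, and microseconds, so `diff.seconds` is the part left over after whole days, not the full duration. To express the whole interval in a single unit, `total_seconds()` can be used. The sketch below fixes `now` to a value close to the run above (an assumption, so that the numbers are reproducible).

```python
import datetime as dt

now = dt.datetime(2022, 4, 6, 13, 43, 38)   # fixed stand-in for datetime.now()
event = dt.datetime(2022, 4, 9, 18, 35, 0)
diff = event - now

print(diff)                  # 3 days, 4:51:22
print(diff.days)             # 3
print(diff.seconds)          # 17482  (remainder within the last day)
print(diff.total_seconds())  # 276682.0  (whole interval)

# Whole interval expressed in hours.
total_hours = diff.total_seconds() / 3600
print(round(total_hours, 2))
```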


#### `strftime`
`strftime` formats a date or time object as a string according to a format code. Cheatsheet of format codes: https://strftime.org

In [19]:
print(now.strftime("%H:%M:%S"))

print(now.strftime("%d-%B, %A"))

print(now.strftime("%d/%m/%Y, %H:%M"))

13:43:37
06-April, Wednesday
06/04/2022, 13:43
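The inverse operation is `strptime`, which parses a string into a `datetime` using the same format codes. A minimal round-trip sketch, reusing the last format from the cell above:

```python
import datetime as dt

text = "06/04/2022, 13:43"
fmt = "%d/%m/%Y, %H:%M"

# String -> datetime
parsed = dt.datetime.strptime(text, fmt)
print(parsed)  # 2022-04-06 13:43:00

# datetime -> string gives back the original text
print(parsed.strftime(fmt) == text)  # True
```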


In [20]:
type(now)

datetime.datetime

### Working with datetime

In [21]:
df = pd.read_csv("https://github.com/anvarnarz/praktikum_datasets/blob/main/global_earthquake.csv?raw=true")
df.head()

Unnamed: 0,Date,Time,Latitude,Longitude,Type,Depth,Depth Error,Depth Seismic Stations,Magnitude,Magnitude Type,...,Magnitude Seismic Stations,Azimuthal Gap,Horizontal Distance,Horizontal Error,Root Mean Square,ID,Source,Location Source,Magnitude Source,Status
0,01/02/1965,13:44:18,19.246,145.616,Earthquake,131.6,,,6.0,MW,...,,,,,,ISCGEM860706,ISCGEM,ISCGEM,ISCGEM,Automatic
1,01/04/1965,11:29:49,1.863,127.352,Earthquake,80.0,,,5.8,MW,...,,,,,,ISCGEM860737,ISCGEM,ISCGEM,ISCGEM,Automatic
2,01/05/1965,18:05:58,-20.579,-173.972,Earthquake,20.0,,,6.2,MW,...,,,,,,ISCGEM860762,ISCGEM,ISCGEM,ISCGEM,Automatic
3,01/08/1965,18:49:43,-59.076,-23.557,Earthquake,15.0,,,5.8,MW,...,,,,,,ISCGEM860856,ISCGEM,ISCGEM,ISCGEM,Automatic
4,01/09/1965,13:32:50,11.938,126.427,Earthquake,15.0,,,5.8,MW,...,,,,,,ISCGEM860890,ISCGEM,ISCGEM,ISCGEM,Automatic


In [22]:
df[['Date', 'Time']].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23409 entries, 0 to 23408
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Date    23409 non-null  object
 1   Time    23409 non-null  object
dtypes: object(2)
memory usage: 365.9+ KB


It looks like the *Date* and *Time* columns are not actually `datetime` objects; they are stored as strings (dtype `object`).

In [65]:
# this is not an actual datetime object
date = df.at[0, 'Date']
date.year

AttributeError: 'str' object has no attribute 'year'

Let's convert the *Date* and *Time* columns to the `datetime` data type.

### `pandas.to_datetime`
Convert argument to datetime.

In [41]:
date1 = df.loc[0, 'Date']

print(date1)
print(f"type: {type(date1)}")

01/02/1965
type: <class 'str'>
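As a rough sketch of how `pd.to_datetime` behaves, the example below converts a single string and then a small Series. The Series is a made-up sample, not rows from the dataset, and includes one invalid entry to show `errors="coerce"`, which turns unparseable values into `NaT` instead of raising an error.

```python
import pandas as pd

# A single string becomes a pandas Timestamp.
ts = pd.to_datetime("01/02/1965", format="%m/%d/%Y")
print(type(ts).__name__, ts.year)  # Timestamp 1965

# Hypothetical sample with one bad value.
raw = pd.Series(["01/02/1965", "not a date", "12/30/2016"])
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

print(parsed.isna().sum())  # 1
print(parsed.dtype)
```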


#### convert Date

In [42]:
df['Date']

0        01/02/1965
1        01/04/1965
2        01/05/1965
3        01/08/1965
4        01/09/1965
            ...    
23404    12/28/2016
23405    12/28/2016
23406    12/28/2016
23407    12/29/2016
23408    12/30/2016
Name: Date, Length: 23409, dtype: object

From this Series we can see that the date format is **`month/day/year`**

In [48]:
df['datetime'] = pd.to_datetime(df['Date'], format="%m/%d/%Y")
df.head()

Unnamed: 0,Date,Time,Latitude,Longitude,Type,Depth,Depth Error,Depth Seismic Stations,Magnitude,Magnitude Type,...,Azimuthal Gap,Horizontal Distance,Horizontal Error,Root Mean Square,ID,Source,Location Source,Magnitude Source,Status,datetime
0,01/02/1965,13:44:18,19.246,145.616,Earthquake,131.6,,,6.0,MW,...,,,,,ISCGEM860706,ISCGEM,ISCGEM,ISCGEM,Automatic,1965-01-02
1,01/04/1965,11:29:49,1.863,127.352,Earthquake,80.0,,,5.8,MW,...,,,,,ISCGEM860737,ISCGEM,ISCGEM,ISCGEM,Automatic,1965-01-04
2,01/05/1965,18:05:58,-20.579,-173.972,Earthquake,20.0,,,6.2,MW,...,,,,,ISCGEM860762,ISCGEM,ISCGEM,ISCGEM,Automatic,1965-01-05
3,01/08/1965,18:49:43,-59.076,-23.557,Earthquake,15.0,,,5.8,MW,...,,,,,ISCGEM860856,ISCGEM,ISCGEM,ISCGEM,Automatic,1965-01-08
4,01/09/1965,13:32:50,11.938,126.427,Earthquake,15.0,,,5.8,MW,...,,,,,ISCGEM860890,ISCGEM,ISCGEM,ISCGEM,Automatic,1965-01-09
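Passing `format` explicitly matters because a string such as `01/02/1965` is ambiguous. The sketch below parses the same string with a month-first and a day-first format, and shows that a day-first format fails on a value whose second field is larger than 12.

```python
import pandas as pd

raw = "01/02/1965"

month_first = pd.to_datetime(raw, format="%m/%d/%Y")
day_first = pd.to_datetime(raw, format="%d/%m/%Y")

print(month_first.date())  # 1965-01-02
print(day_first.date())    # 1965-02-01

# "12/28/2016" cannot be day-first: there is no month 28.
try:
    pd.to_datetime("12/28/2016", format="%d/%m/%Y")
    day_first_ok = True
except ValueError:
    day_first_ok = False

print(day_first_ok)  # False
```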


In [50]:
date2 = df.loc[0, 'datetime']

print(date2)
print(f"type: {type(date2)}")

1965-01-02 00:00:00
type: <class 'pandas._libs.tslibs.timestamps.Timestamp'>
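A pandas `Timestamp` is a subclass of the standard library's `datetime.datetime`, which is why the attributes and `strftime` from the first part of this notebook keep working on it. A brief sketch:

```python
import datetime as dt
import pandas as pd

ts = pd.Timestamp("1965-01-02")

print(isinstance(ts, dt.datetime))  # True
print(ts.strftime("%d/%m/%Y"))      # 02/01/1965

# Convert to a plain datetime.datetime when a library requires one.
py_dt = ts.to_pydatetime()
print(type(py_dt))  # <class 'datetime.datetime'>
```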


In [66]:
# now this is an actual datetime object
date3 = df.at[0, 'datetime']
date3.year

1965

In [78]:
df.at[1, 'datetime'].year

1965

In [81]:
# show only years
df['datetime'].dt.year

0        1965
1        1965
2        1965
3        1965
4        1965
         ... 
23404    2016
23405    2016
23406    2016
23407    2016
23408    2016
Name: datetime, Length: 23409, dtype: int64

In [93]:
df['month'] = df['datetime'].dt.month
df['month'].head()

0    1
1    1
2    1
3    1
4    1
Name: month, dtype: int64
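The `.dt` accessor exposes more than `year` and `month`. The sketch below uses a hypothetical three-row sample in the same month/day/year format as the *Date* column to show a few other components.

```python
import pandas as pd

# Hypothetical sample in the same format as the Date column.
dates = pd.to_datetime(
    pd.Series(["01/02/1965", "01/04/1965", "12/30/2016"]),
    format="%m/%d/%Y",
)

print(dates.dt.day.tolist())         # [2, 4, 30]
print(dates.dt.dayofweek.tolist())   # [5, 0, 4]  (Monday is 0)
print(dates.dt.day_name().tolist())  # ['Saturday', 'Monday', 'Friday']
```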

### **Question**: In which month do most earthquakes occur?

In [87]:
# we can see which month had the most earthquakes
df['month'].value_counts()

3     2113
8     2014
12    2001
11    1987
9     1985
4     1970
5     1964
10    1952
1     1891
7     1880
2     1828
6     1824
Name: month, dtype: int64

### **Answer**: March (2113 earthquakes)
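The answer can also be read off programmatically instead of by eye. In the sketch below the counts are copied from the `value_counts()` output above into a standalone Series, so that the example runs without downloading the dataset. With the real data, `counts` would simply be `df['month'].value_counts()`.

```python
import pandas as pd

# Monthly counts copied from the value_counts() output above.
counts = pd.Series(
    {3: 2113, 8: 2014, 12: 2001, 11: 1987, 9: 1985, 4: 1970,
     5: 1964, 10: 1952, 1: 1891, 7: 1880, 2: 1828, 6: 1824},
    name="month",
)

# idxmax() returns the index label (the month) with the largest count.
print(counts.idxmax())  # 3
print(counts.max())     # 2113

# sort_index() puts the months in calendar order for easier comparison.
by_month = counts.sort_index()
print(by_month.index.tolist())
```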

#### convert Time

In [91]:
df['Time']

0        13:44:18
1        11:29:49
2        18:05:58
3        18:49:43
4        13:32:50
           ...   
23404    08:22:12
23405    09:13:47
23406    12:38:51
23407    22:30:19
23408    20:08:28
Name: Time, Length: 23409, dtype: object

In [94]:
df['time'] = pd.to_datetime(df['Time'], format="%H:%M:%S")
df['time'].head()

0   1900-01-01 13:44:18
1   1900-01-01 11:29:49
2   1900-01-01 18:05:58
3   1900-01-01 18:49:43
4   1900-01-01 13:32:50
Name: time, dtype: datetime64[ns]
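Because the strings contain no date, `pd.to_datetime` fills in the placeholder date `1900-01-01`. If only the time of day is needed, one option is `.dt.time`, which returns plain `datetime.time` objects. A sketch on a hypothetical two-row sample:

```python
import datetime as dt
import pandas as pd

# Hypothetical sample in the same format as the Time column.
raw = pd.Series(["13:44:18", "11:29:49"])
parsed = pd.to_datetime(raw, format="%H:%M:%S")

# Every row carries the same placeholder date.
placeholder_only = (parsed.dt.date == dt.date(1900, 1, 1)).all()
print(placeholder_only)  # True

# Drop the placeholder date and keep only the time of day.
times = parsed.dt.time
print(times[0])  # 13:44:18
```

Note that the resulting column has dtype `object`, so the `.dt` accessor is no longer available on it.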

In [101]:
df.at[0, 'time'].minute

44

In [102]:
# show only hours
df['time'].dt.hour

0        13
1        11
2        18
3        18
4        13
         ..
23404     8
23405     9
23406    12
23407    22
23408    20
Name: time, Length: 23409, dtype: int64
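An alternative to keeping separate date and time columns is to join the two strings and parse them once, which gives a single column with the real date and time. The sketch below uses a hypothetical two-row frame with the same column names and formats as the dataset. The column name `timestamp` is an illustrative choice.

```python
import pandas as pd

# Hypothetical sample with the same Date and Time formats as the dataset.
sample = pd.DataFrame({
    "Date": ["01/02/1965", "01/04/1965"],
    "Time": ["13:44:18", "11:29:49"],
})

# Concatenate the strings, then parse with a combined format.
sample["timestamp"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"],
    format="%m/%d/%Y %H:%M:%S",
)

print(sample["timestamp"].iloc[0])         # 1965-01-02 13:44:18
print(sample["timestamp"].dt.hour.tolist())  # [13, 11]
```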