To represent dates and times, pandas provides the following types:

- Date times (Timestamp) - dtype datetime64[ns]
- Time deltas (Timedelta) - dtype timedelta64[ns]
- Time spans (Period) - dtype period[freq]
- Date offsets (DateOffset) - a special type with no dedicated dtype
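
As a quick orientation, the four types listed above can be created side by side. This is only a minimal sketch; the concrete dates and durations are made-up illustration values, not taken from the dataset used later.

```python
import pandas as pd

ts = pd.Timestamp("2022-04-08 12:15")      # Timestamp: a single point in time
td = pd.Timedelta(hours=1, minutes=30)     # Timedelta: an absolute duration
period = pd.Period("2022-04", freq="M")    # Period: the whole month of April 2022
offset = pd.DateOffset(months=1)           # DateOffset: a calendar-aware shift

# Wrapping each scalar in a Series reveals the dtype pandas stores it under.
for name, value in [("Timestamp", ts), ("Timedelta", td), ("Period", period)]:
    print(name, pd.Series([value]).dtype)

shifted = ts + offset   # one calendar month ahead
later = ts + td         # exactly 90 minutes ahead
print(shifted, later)
```

The difference between the last two lines is the point: a Timedelta always adds a fixed amount of time, while a DateOffset follows the calendar, so "one month" can be 28 to 31 days.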

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Datetime
The datetime type represents a date and time in Python. pandas calls its counterpart Timestamp, but it can also work with plain Python datetime objects.

Why a separate type?
Unlike the pandas Timestamp, the Python datetime object does not support nanoseconds.

In [4]:
from datetime import datetime as dt

now = dt.now()
print(type(now))
# print(now.nanosecond) # <-- no such attribute exists; datetime supports only microseconds
now

<class 'datetime.datetime'>


datetime.datetime(2022, 4, 8, 12, 15, 41, 142766)

In [5]:
# convert a datetime to a Timestamp
now_timestamp = pd.Timestamp(now)
type(now_timestamp)

pandas._libs.tslibs.timestamps.Timestamp

In [6]:
now_timestamp.nanosecond

0

## Timestamp
The pandas Timestamp is built on the more efficient NumPy data type numpy.datetime64. In pandas we therefore work with Timestamp rather than datetime.

In pandas, a Timestamp denotes a single point in time with nanosecond precision.

In [7]:
timestamp = pd.Timestamp(year=2020, month=6, day=9, hour=8, minute=30, second=20, microsecond=79, nanosecond=99)
print(timestamp)
print(type(timestamp))
timestamp.nanosecond

2020-06-09 08:30:20.000079099
<class 'pandas._libs.tslibs.timestamps.Timestamp'>


99

In [8]:
print(pd.Timestamp('2019-8-1'))
print(pd.Timestamp(2020, 6, 9, 12))
print(pd.Timestamp('2020-06-09 00:00:00'))
print(pd.Timestamp('August 9, 2020 13:45'))
print(pd.Timestamp('2020-01-01T14'))
print(pd.Timestamp(300)) # <--- number of nanoseconds after the UNIX epoch (January 1, 1970); pass unit="s" for seconds
print(pd.Timestamp(1513393355.5))

2019-08-01 00:00:00
2020-06-09 12:00:00
2020-06-09 00:00:00
2020-08-09 13:45:00
2020-01-01 14:00:00
1970-01-01 00:00:00.000000300
1970-01-01 00:00:01.513393355
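
The last two outputs land in 1970 because pd.Timestamp reads a bare number as nanoseconds since the epoch. If the number is really an epoch value in seconds or milliseconds, the unit parameter says so. A minimal sketch, reusing the value from the cell above (the variable names are just for illustration):

```python
import pandas as pd

epoch_seconds = 1513393355.5

as_nanoseconds = pd.Timestamp(300)                   # default unit: nanoseconds
as_seconds = pd.Timestamp(epoch_seconds, unit="s")   # same number, read as seconds
as_millis = pd.Timestamp(1513393355500, unit="ms")   # the same instant in milliseconds

print(as_nanoseconds)
print(as_seconds)
print(as_millis)
```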


In [9]:
# a missing (NaN) value has its own special object: NaT (Not a Time)
nan_dt = pd.Timestamp(np.nan)
print(type(nan_dt))
nan_dt

<class 'pandas._libs.tslibs.nattype.NaTType'>


NaT
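
NaT behaves much like NaN does for floats. The sketch below, on a small made-up series, shows how a missing value shows up after parsing and how to detect it; note that NaT does not compare equal to itself, so isna is the reliable check.

```python
import pandas as pd

# None in the input becomes NaT once the strings are parsed to datetimes.
s = pd.to_datetime(pd.Series(["2022-04-08", None, "2022-04-10"]))

missing_mask = s.isna()                 # True where the value is NaT
nat_equals_itself = pd.NaT == pd.NaT    # comparisons with NaT are never equal
filled = s.fillna(pd.Timestamp("2022-04-09"))

print(missing_mask.tolist(), nat_equals_itself)
print(filled)
```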

In [11]:
sample_timestamps = pd.date_range("2022-04-08", freq="D", periods=3) # date_range can create a so-called DatetimeIndex
sample_timestamps

DatetimeIndex(['2022-04-08', '2022-04-09', '2022-04-10'], dtype='datetime64[ns]', freq='D')

In [13]:
df = pd.DataFrame(sample_timestamps, columns=["time"])
df

        time
0 2022-04-08
1 2022-04-09
2 2022-04-10


## Time zones
Timestamp can also work with time zones. By default it is time-zone naive, but we can pass the constructor a time zone name from the pytz library.

In [14]:
import pytz
len(pytz.all_timezones)

594

In [15]:
pytz.all_timezones[:10]

['Africa/Abidjan',
 'Africa/Accra',
 'Africa/Addis_Ababa',
 'Africa/Algiers',
 'Africa/Asmara',
 'Africa/Asmera',
 'Africa/Bamako',
 'Africa/Bangui',
 'Africa/Banjul',
 'Africa/Bissau']

In [17]:
# an integer is read as nanoseconds since the UNIX epoch unless unit= is given (e.g. unit="s")
ts = pd.Timestamp(1565469465, tz="Europe/Bratislava")
ts

Timestamp('1970-01-01 01:00:01.565469465+0100', tz='Europe/Bratislava')
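
The output above is in 1970 because the integer was interpreted as nanoseconds. Assuming the value was meant as epoch seconds, the sketch below passes unit="s", and also shows the two usual ways of attaching or changing a zone: tz_localize for a naive timestamp and tz_convert for an aware one.

```python
import pandas as pd

epoch_seconds = 1565469465  # assumed here to be seconds since the UNIX epoch

# unit="s" reads the integer as seconds (in UTC); tz= then expresses
# that same instant in the requested zone.
ts_bratislava = pd.Timestamp(epoch_seconds, unit="s", tz="Europe/Bratislava")

# tz_localize attaches a zone to a naive wall-clock time without shifting it.
naive = pd.Timestamp("2019-08-10 22:37:45")
localized = naive.tz_localize("Europe/Bratislava")

# tz_convert keeps the instant and changes only the zone it is shown in.
in_utc = localized.tz_convert("UTC")

print(ts_bratislava)
print(localized)
print(in_utc)
```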

In [18]:
sample_timestamps = pd.date_range("2022-04-08", freq="D", periods=5, tz="Etc/GMT+1") # date_range can also create a time-zone-aware DatetimeIndex; note that "Etc/GMT+1" means UTC-01:00
sample_timestamps

DatetimeIndex(['2022-04-08 00:00:00-01:00', '2022-04-09 00:00:00-01:00',
               '2022-04-10 00:00:00-01:00', '2022-04-11 00:00:00-01:00',
               '2022-04-12 00:00:00-01:00'],
              dtype='datetime64[ns, Etc/GMT+1]', freq='D')

In [20]:

df = pd.DataFrame(sample_timestamps, columns=["Time"])
df

                       Time
0 2022-04-08 00:00:00-01:00
1 2022-04-09 00:00:00-01:00
2 2022-04-10 00:00:00-01:00
3 2022-04-11 00:00:00-01:00
4 2022-04-12 00:00:00-01:00


In [21]:
df = pd.read_csv("dataset/timestamps_dataset.csv")
df.head()

Unnamed: 0,timestamp,event,xpos,ypos,url,displayHeight,displayWidth,classOnly,expressed,relative,tagOnly,xpath,userId,task_id
0,2018-11-13 09:39:52.794,mousemove,956.0,531.0,https://www.firotour.sk/#yeself3,4093.0,1903.0,,,html > body > div.body__wrapper:nth-child(10) ...,,,ZglodR,3
1,2018-11-13 09:39:52.813,mousemove,957.0,536.0,https://www.firotour.sk/#yeself3,4093.0,1903.0,,,html > body > div.body__wrapper:nth-child(10) ...,,,ZglodR,3
2,2018-11-13 09:39:52.840,mousemove,984.0,553.0,https://www.firotour.sk/#yeself3,4093.0,1903.0,,,html > body > div.body__wrapper:nth-child(10) ...,,,ZglodR,3
3,2018-11-13 09:39:54.800,mousemove,985.0,554.0,https://www.firotour.sk/#yeself3,4093.0,1903.0,,,html > body > div.body__wrapper:nth-child(10) ...,,,ZglodR,3
4,2018-11-13 09:39:54.867,mousemove,986.0,554.0,https://www.firotour.sk/#yeself3,4093.0,1903.0,,,html > body > div.body__wrapper:nth-child(10) ...,,,ZglodR,3


In [22]:
df.timestamp

0        2018-11-13 09:39:52.794
1        2018-11-13 09:39:52.813
2        2018-11-13 09:39:52.840
3        2018-11-13 09:39:54.800
4        2018-11-13 09:39:54.867
                  ...           
10416    2018-11-13 10:00:36.872
10417    2018-11-13 10:00:36.956
10418    2018-11-13 10:00:36.989
10419    2018-11-13 10:00:37.037
10420    2018-11-13 10:00:37.054
Name: timestamp, Length: 10421, dtype: object

In [24]:
df = pd.read_csv("dataset/timestamps_dataset.csv", parse_dates=["timestamp"])
df.timestamp

0       2018-11-13 09:39:52.794
1       2018-11-13 09:39:52.813
2       2018-11-13 09:39:52.840
3       2018-11-13 09:39:54.800
4       2018-11-13 09:39:54.867
                  ...          
10416   2018-11-13 10:00:36.872
10417   2018-11-13 10:00:36.956
10418   2018-11-13 10:00:36.989
10419   2018-11-13 10:00:37.037
10420   2018-11-13 10:00:37.054
Name: timestamp, Length: 10421, dtype: datetime64[ns]
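
When the data is already loaded, or does not come from a CSV at all, a string column can be converted after the fact with pd.to_datetime. A minimal sketch on a tiny hand-made frame (the third row is a deliberately invalid value added for illustration):

```python
import pandas as pd

raw = pd.DataFrame({
    "timestamp": [
        "2018-11-13 09:39:52.794",
        "2018-11-13 09:39:52.813",
        "not a date",              # made-up bad value
    ]
})

# An explicit format avoids guessing; errors="coerce" turns
# unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(
    raw["timestamp"], format="%Y-%m-%d %H:%M:%S.%f", errors="coerce"
)

print(raw["timestamp"].dtype)   # object: plain strings
print(parsed)
```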

In [25]:
df.timestamp.dt.day

0        13
1        13
2        13
3        13
4        13
         ..
10416    13
10417    13
10418    13
10419    13
10420    13
Name: timestamp, Length: 10421, dtype: int64

In [26]:
df.timestamp.dt.weekday

0        1
1        1
2        1
3        1
4        1
        ..
10416    1
10417    1
10418    1
10419    1
10420    1
Name: timestamp, Length: 10421, dtype: int64
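
The .dt accessor exposes more than day and weekday. The sketch below shows a few other commonly used members on two timestamps copied from the dataset output above.

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime([
    "2018-11-13 09:39:52.794",
    "2018-11-13 10:00:37.054",
]))

day_names = stamps.dt.day_name()    # weekday as text instead of 0-6
hours = stamps.dt.hour              # hour component
floored = stamps.dt.floor("min")    # round down to the start of the minute

print(day_names.tolist())
print(hours.tolist())
print(floored)
```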

## Time Delta
What if we want to subtract timestamps?

Timedelta exists as a data type in Python itself, not only in pandas.

In [27]:
x = pd.Timestamp(year=2020, month=6, day=9, hour=8, minute=30, second=20, microsecond=79, nanosecond=99)
y = pd.Timestamp(year=2020, month=6, day=9, hour=8, minute=30, second=20, microsecond=79, nanosecond=89)
result=x-y
result

Timedelta('0 days 00:00:00.000000010')

In [30]:
# a negative Timedelta:
y-x

Timedelta('-1 days +23:59:59.999999990')
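
The printed form of a negative Timedelta is easy to misread: "-1 days +23:59:59.999999990" means minus one day plus almost a full day, which is just minus 10 nanoseconds. A short sketch with the same x and y makes that explicit:

```python
import pandas as pd

x = pd.Timestamp(year=2020, month=6, day=9, hour=8, minute=30, second=20,
                 microsecond=79, nanosecond=99)
y = pd.Timestamp(year=2020, month=6, day=9, hour=8, minute=30, second=20,
                 microsecond=79, nanosecond=89)

negative = y - x

print(negative.value)        # the whole duration as an integer of nanoseconds
print(abs(negative))         # magnitude without the sign
print(negative.components)   # the normalized fields used in the printed form
```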

In [33]:
# Timedelta constructor in pandas
td1 = pd.Timedelta("1 days 00:42:00") # built from just a string
td1

Timedelta('1 days 00:42:00')
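
A string is only one of the accepted inputs. As a small sketch, the same duration can also be built from keyword arguments or from a number with a unit; all three results compare equal.

```python
import pandas as pd

from_string = pd.Timedelta("1 days 00:42:00")
from_kwargs = pd.Timedelta(days=1, minutes=42)
from_number = pd.Timedelta(1482, unit="min")   # 1 day = 1440 min, plus 42

print(from_string == from_kwargs == from_number)
print(from_string.total_seconds())
```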

In [34]:
# Timedelta constructor in Python
from datetime import timedelta

td2 = timedelta(days=55, seconds=3621, microseconds=992006)
print(td2)

td1 + td2

55 days, 1:00:21.992006


Timedelta('56 days 01:42:21.992006')

In [35]:
ts = pd.Timestamp(dt.now())
ts

Timestamp('2022-04-08 12:43:27.312231')

In [36]:
ts + td2

Timestamp('2022-06-02 13:43:49.304237')

In [38]:
df.timestamp + td2 # vector + scalar = broadcast

0       2019-01-07 10:40:14.786006
1       2019-01-07 10:40:14.805006
2       2019-01-07 10:40:14.832006
3       2019-01-07 10:40:16.792006
4       2019-01-07 10:40:16.859006
                   ...            
10416   2019-01-07 11:00:58.864006
10417   2019-01-07 11:00:58.948006
10418   2019-01-07 11:00:58.981006
10419   2019-01-07 11:00:59.029006
10420   2019-01-07 11:00:59.046006
Name: timestamp, Length: 10421, dtype: datetime64[ns]
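
Subtraction works on whole columns too. For event logs like this one, a typical use is the gap between consecutive events. The sketch below uses the first five timestamps shown earlier; the one-second threshold for a "pause" is an arbitrary value chosen for illustration.

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime([
    "2018-11-13 09:39:52.794",
    "2018-11-13 09:39:52.813",
    "2018-11-13 09:39:52.840",
    "2018-11-13 09:39:54.800",
    "2018-11-13 09:39:54.867",
]))

# diff() subtracts each timestamp from the next one; the first row has
# no predecessor, so its gap is NaT.
gaps = stamps.diff()
gap_ms = gaps.dt.total_seconds() * 1000

# Rows that follow a pause longer than the (hypothetical) threshold.
long_pauses = stamps[gaps > pd.Timedelta(seconds=1)]

print(gaps)
print(long_pauses)
```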