11.1: Date and Time Data Types and Tools.

The Python standard library has three modules for dates and times:
- datetime
- time
- calendar

In [1]:
import pandas as pd
from datetime import datetime

In [2]:
now = datetime.now()

In [3]:
now

datetime.datetime(2022, 5, 22, 19, 48, 22, 225843)

In [4]:
now.year

2022

In [5]:
now.day

22

In [6]:
now.month

5

now is a datetime.datetime object, which stores both the date and the time down to the microsecond.

**timedelta**: represents the temporal difference between two datetime objects

In [7]:
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
# datetime(year, month, day[, hour[, minute[, second[, microsecond]]]])

In [8]:
delta

datetime.timedelta(days=926, seconds=56700)

In [9]:
delta.days

926

In [10]:
delta.seconds

56700

A *timedelta* represents an interval (a duration), so it can be added to or subtracted from a datetime.
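Note that delta.seconds is only the seconds component left over after whole days, not the full length of the interval. A minimal sketch of getting the full length, reusing the same two datetimes as above (the names total and whole_hours are just illustrative):

```python
from datetime import datetime, timedelta

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# .days and .seconds are separate components of the same interval
print(delta.days, delta.seconds)

# total_seconds() folds days, seconds and microseconds into one float
total = delta.total_seconds()
print(total)

# Floor division by another timedelta counts whole units (here, hours)
whole_hours = delta // timedelta(hours=1)
print(whole_hours)
```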

In [11]:
from datetime import timedelta

In [12]:
start = datetime(2011,12,8)

In [13]:
start + timedelta(12)
# timedelta(12) means 12 days; a timedelta stores days, seconds and microseconds internally

datetime.datetime(2011, 12, 20, 0, 0)

In [14]:
2*timedelta(12) 

datetime.timedelta(days=24)

In [15]:
start - 2 * timedelta(12)

datetime.datetime(2011, 11, 14, 0, 0)
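The cells above pass days positionally. A small sketch of the keyword form, which may read more clearly when the interval is not a whole number of days (span is an illustrative name):

```python
from datetime import datetime, timedelta

# Keyword arguments are normalized into days, seconds and microseconds
span = timedelta(days=1, hours=12, minutes=30)
print(span.days, span.seconds)  # 1 45000

start = datetime(2011, 12, 8)
shifted = start + span
print(shifted)  # 2011-12-09 12:30:00
```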

Converting Between Strings and Datetime

Two ways to convert a datetime to a string:
- str
- strftime

In [16]:
stamp = datetime(2011,1,3)

In [17]:
type(str(stamp))

str

In [18]:
type(stamp.strftime('%Y-%m-%d'))

str

In [19]:
# converting strings to dates

In [20]:
value = '2011-01-03'

In [21]:
datetime.strptime(value,'%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)

In [22]:
datestrs = ['7/6/2011', '8/6/2011']

In [23]:
[datetime.strptime(x,'%m/%d/%Y') for x in datestrs]

[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
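strptime and strftime use the same format codes, so one format string can drive both directions. A minimal sketch, also showing that strptime raises ValueError when the string does not match the format (fmt, round_trip and error are illustrative names):

```python
from datetime import datetime

fmt = '%m/%d/%Y'

# string -> datetime
parsed = datetime.strptime('7/6/2011', fmt)

# datetime -> string; %m and %d are zero-padded on output
round_trip = parsed.strftime(fmt)
print(round_trip)  # 07/06/2011

# A string in a different layout does not parse with this format
error = None
try:
    datetime.strptime('2011-07-06', fmt)
except ValueError as exc:
    error = str(exc)
print(error)
```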

In [24]:
# using the dateutil parser

In [25]:
from dateutil.parser import parse

In [26]:
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

In [27]:
parse('6/12/2011',dayfirst=True)

datetime.datetime(2011, 12, 6, 0, 0)
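The dayfirst flag matters because a string like '6/12/2011' is ambiguous. A short sketch comparing the default (month first) with dayfirst=True, plus one free-form string to show that no format needs to be spelled out:

```python
from datetime import datetime
from dateutil.parser import parse

# Default: the first number is read as the month
month_first = parse('6/12/2011')

# dayfirst=True: the first number is read as the day
day_first = parse('6/12/2011', dayfirst=True)

# dateutil also handles many human-readable layouts
free_form = parse('Jan 31, 1997 10:45 PM')

print(month_first, day_first, free_form)
```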

In [28]:
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']

In [29]:
pd.to_datetime(datestrs)

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)
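pd.to_datetime also copes with missing entries: a None in the input becomes NaT (Not a Time), the null value for timestamps in pandas. A minimal sketch using the same datestrs list:

```python
import pandas as pd

datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']

# The appended None is converted to NaT
idx = pd.to_datetime(datestrs + [None])
print(idx)

# NaT is detected by the usual null checks
missing = pd.isna(idx)
print(missing)
```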

#### 11.2: Time Series Basics

In [30]:
import numpy as np

In [31]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
datetime(2011, 1, 7), datetime(2011, 1, 8),
datetime(2011, 1, 10), datetime(2011, 1, 12)]

In [32]:
dates

[datetime.datetime(2011, 1, 2, 0, 0),
 datetime.datetime(2011, 1, 5, 0, 0),
 datetime.datetime(2011, 1, 7, 0, 0),
 datetime.datetime(2011, 1, 8, 0, 0),
 datetime.datetime(2011, 1, 10, 0, 0),
 datetime.datetime(2011, 1, 12, 0, 0)]

In [33]:
ts = pd.Series(np.random.randn(6),index=dates)

In [34]:
ts

2011-01-02    0.117631
2011-01-05   -1.788784
2011-01-07   -1.070296
2011-01-08    1.110210
2011-01-10    1.644853
2011-01-12   -0.377348
dtype: float64

In [35]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [36]:
ts[::2]

2011-01-02    0.117631
2011-01-07   -1.070296
2011-01-10    1.644853
dtype: float64

In [37]:
ts+ts[::2]

2011-01-02    0.235262
2011-01-05         NaN
2011-01-07   -2.140593
2011-01-08         NaN
2011-01-10    3.289705
2011-01-12         NaN
dtype: float64
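The NaN values above come from index alignment: arithmetic between two series matches rows by date, and a date present on only one side has nothing to add to. A sketch with fixed values instead of random ones so the result is easy to follow; using Series.add with fill_value is one possible way to avoid the NaN, not something the original cells do:

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8)]
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)

# Rows are matched by date; dates missing from ts[::2] give NaN
aligned = ts + ts[::2]
print(aligned)

# Treat a missing date on one side as 0 instead
filled = ts.add(ts[::2], fill_value=0)
print(filled)
```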

In [38]:
# Indexing, Selection, Subsetting

In [39]:
stamp = ts.index[2]

In [40]:
stamp

Timestamp('2011-01-07 00:00:00')
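The index stores pandas Timestamp objects rather than plain datetime objects. Timestamp is a subclass of datetime, so the two can mostly be used interchangeably. A small sketch:

```python
from datetime import datetime
import pandas as pd

stamp = pd.Timestamp('2011-01-07')

# A Timestamp passes isinstance checks for datetime
is_dt = isinstance(stamp, datetime)

# It compares equal to the matching datetime
same = stamp == datetime(2011, 1, 7)

# Convert back to a plain datetime when needed
plain = stamp.to_pydatetime()

print(is_dt, same, plain)
```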

In [41]:
# you can pass a string too
ts['1/10/2011']

1.6448526839926028

In [42]:
# longer series
longer_ts = pd.Series(np.random.randn(1000),index=pd.date_range('1/1/2000',periods=1000))

In [43]:
longer_ts

2000-01-01    0.496316
2000-01-02    0.315413
2000-01-03    0.767056
2000-01-04   -0.599891
2000-01-05    1.026140
                ...   
2002-09-22    0.719053
2002-09-23   -0.065869
2002-09-24   -0.954851
2002-09-25   -0.302666
2002-09-26    1.172769
Freq: D, Length: 1000, dtype: float64

In [44]:
# subsetting
longer_ts['2001']

2001-01-01   -0.215884
2001-01-02   -0.156254
2001-01-03   -0.688635
2001-01-04    2.495151
2001-01-05    0.996077
                ...   
2001-12-27   -2.090104
2001-12-28   -0.361548
2001-12-29    1.721702
2001-12-30    0.030523
2001-12-31    0.148049
Freq: D, Length: 365, dtype: float64

In [47]:
longer_ts['2001-05']

2001-05-01    0.337013
2001-05-02    0.719560
2001-05-03    0.658236
2001-05-04    1.144798
2001-05-05    1.109920
2001-05-06   -0.720272
2001-05-07   -1.878936
2001-05-08   -0.729151
2001-05-09   -0.278509
2001-05-10    0.549187
2001-05-11   -0.427244
2001-05-12    0.135380
2001-05-13    0.193580
2001-05-14    1.001890
2001-05-15    0.827002
2001-05-16   -0.190671
2001-05-17   -0.024091
2001-05-18   -0.298219
2001-05-19    0.757886
2001-05-20   -0.321029
2001-05-21   -0.607205
2001-05-22   -0.841080
2001-05-23   -0.175788
2001-05-24   -0.261565
2001-05-25   -1.040185
2001-05-26   -1.416381
2001-05-27    1.174368
2001-05-28   -0.165266
2001-05-29    0.114656
2001-05-30    0.464543
2001-05-31    1.563331
Freq: D, dtype: float64

In [51]:
# slicing with datetime object works as well
ts[datetime(2011, 1, 7):]

2011-01-07   -1.070296
2011-01-08    1.110210
2011-01-10    1.644853
2011-01-12   -0.377348
dtype: float64

In [52]:
ts

2011-01-02    0.117631
2011-01-05   -1.788784
2011-01-07   -1.070296
2011-01-08    1.110210
2011-01-10    1.644853
2011-01-12   -0.377348
dtype: float64

In [53]:
# you can slice with range queries
ts['1/6/2011':'1/11/2011']

2011-01-07   -1.070296
2011-01-08    1.110210
2011-01-10    1.644853
dtype: float64
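The slice endpoints do not have to be in the index, as the range query above shows ('1/6/2011' and '1/11/2011' are both absent from ts). A sketch with fixed values; truncate is an alternative way to cut a series between two dates, and .loc is used here only to make the label-based slicing explicit:

```python
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07',
                        '2011-01-08', '2011-01-10', '2011-01-12'])
ts = pd.Series(range(6), index=dates)

# Both endpoints are missing from the index; the slice still works
window = ts.loc['2011-01-06':'2011-01-11']
print(window)

# truncate drops everything after (or before) a given date
head = ts.truncate(after='2011-01-09')
print(head)
```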

In [54]:
dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')
long_df = pd.DataFrame(np.random.randn(100, 4),
index=dates,
columns=['Colorado', 'Texas',
'New York', 'Ohio'])

In [55]:
long_df

             Colorado      Texas   New York       Ohio
2000-01-05   0.098711   1.565876  -0.461956  -0.037028
2000-01-12  -0.933961  -0.737093   0.920102  -1.115268
2000-01-19   0.644758  -0.441188   0.189316  -3.454829
2000-01-26   0.060567   2.228219  -0.849120  -1.092678
2000-02-02   0.322181   0.173213   1.342561  -0.696317
...               ...        ...        ...        ...
2001-10-31   1.543711   1.845427  -0.857620  -0.795808
2001-11-07  -1.265315   0.419243  -1.150999   0.326631
2001-11-14   0.893491  -1.983335   0.675635   0.174976
2001-11-21   1.297340   0.084668   0.418598  -2.079990


In [56]:
long_df.loc['5-2001']

             Colorado      Texas   New York       Ohio
2001-05-02   0.525346   1.252253   0.976747  -0.444626
2001-05-09  -0.663276  -0.336301  -0.941531  -0.368713
2001-05-16  -1.834747   0.725608   1.214688  -0.195962
2001-05-23  -0.412804  -0.735857  -0.025250  -0.608240
2001-05-30  -0.876080  -0.239259  -0.749275  -1.316176
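Partial-string selection with .loc can be combined with a column label or a date range. A sketch on a frame built the same way as long_df but filled with np.arange so the values are deterministic (df, may and texas are illustrative names):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')
df = pd.DataFrame(np.arange(400).reshape(100, 4),
                  index=dates,
                  columns=['Colorado', 'Texas', 'New York', 'Ohio'])

# All rows in May 2001 (five Wednesdays)
may = df.loc['2001-05']
print(may)

# May through June 2001, one column only
texas = df.loc['2001-05':'2001-06', 'Texas']
print(texas)
```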


Time Series With Duplicate Indices

In [57]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
'1/2/2000', '1/3/2000'])

In [58]:
dup_ts = pd.Series(np.arange(5), index=dates)

In [62]:
dup_ts.index.duplicated()

array([False, False,  True,  True, False])

In [66]:
dup_ts.groupby(level=0).count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
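Once duplicates are detected there are a few ways to deal with them. A sketch, assuming the goal is either to aggregate the repeated timestamps or to keep only the first observation for each one (several, means and deduped are illustrative names):

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

# Quick check for a non-unique index
print(dup_ts.index.is_unique)  # False

# Selecting a duplicated date returns every matching row
several = dup_ts.loc['2000-01-02']

# Aggregate the duplicates: one value per timestamp
means = dup_ts.groupby(level=0).mean()

# Or keep only the first row for each timestamp
deduped = dup_ts[~dup_ts.index.duplicated(keep='first')]

print(several, means, deduped, sep='\n')
```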

11.3: Date Ranges, Frequencies and Shifting

In [67]:
ts

2011-01-02    0.117631
2011-01-05   -1.788784
2011-01-07   -1.070296
2011-01-08    1.110210
2011-01-10    1.644853
2011-01-12   -0.377348
dtype: float64

In [69]:
resampler = ts.resample('D')

In [70]:
resampler

<pandas.core.resample.DatetimeIndexResampler object at 0x7fd210edc9a0>
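On its own, resample('D') only returns a Resampler object, similar to a groupby; nothing is computed until an aggregation is applied. A sketch with fixed values, using mean() as the aggregation (any other would do) to turn the irregular series into a daily one:

```python
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07',
                        '2011-01-08', '2011-01-10', '2011-01-12'])
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# One row per calendar day; days with no observation become NaN
daily = ts.resample('D').mean()
print(daily)
print(daily.index.freq)
```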

In [71]:
# Generating Date Ranges

In [72]:
index = pd.date_range('2012-04-01','2012-06-01')

In [73]:
index

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20',
               '2012-04-21', '2012-04-22', '2012-04-23', '2012-04-24',
               '2012-04-25', '2012-04-26', '2012-04-27', '2012-04-28',
               '2012-04-29', '2012-04-30', '2012-05-01', '2012-05-02',
               '2012-05-03', '2012-05-04', '2012-05-05', '2012-05-06',
               '2012-05-07', '2012-05-08', '2012-05-09', '2012-05-10',
               '2012-05-11', '2012-05-12', '2012-05-13', '2012-05-14',
               '2012-05-15', '2012-05-16', '2012-05-17', '2012-05-18',
               '2012-05-19', '2012-05-20', '2012-05-21', '2012-05-22',
               '2012-05-23', '2012-05-24', '2012-05-25', '2012-05-26',
      

In [74]:
pd.date_range(start='2012-04-01',periods=20)

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
              dtype='datetime64[ns]', freq='D')
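date_range needs any two of start, end and periods when the default daily frequency is used. A sketch of a few other combinations: counting back from an end date, using the weekly 'W-WED' frequency seen earlier, and normalize=True to snap a start time to midnight:

```python
import pandas as pd

# 20 daily dates ending on 2012-06-01
by_end = pd.date_range(end='2012-06-01', periods=20)

# Only the Wednesdays that fall inside the interval
wednesdays = pd.date_range('2012-04-01', '2012-04-30', freq='W-WED')

# The start carries a time of day; normalize=True resets it to midnight
normalized = pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True)

print(by_end[0], by_end[-1])
print(wednesdays)
print(normalized)
```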