In [1]:
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
from pandas.tseries.offsets import Hour, Minute, Second

### Basic date manipulations in Python

Python's built-in datetime module offers several features:
1. Datetime objects that represent a specific point in time, from year, month and day down to microseconds
2. Addition and subtraction of datetime objects
3. Formatting of datetime objects according to a format specification

Pandas represents missing timestamps with NaT, which stands for Not a Time.

In [2]:
current_date = datetime.now()
current_date

datetime.datetime(2023, 10, 1, 23, 40, 11, 884016)

In [3]:
current_date.year, current_date.month, current_date.day

(2023, 10, 1)

In [4]:
delta = datetime(2019, 1, 7) - datetime(2003, 10, 2, 8, 15)
delta

datetime.timedelta(days=5575, seconds=56700)

In [5]:
delta = delta + timedelta(10)
delta

datetime.timedelta(days=5585, seconds=56700)

In [6]:
stamp = datetime(2011,1,3)
print(str(stamp))

2011-01-03 00:00:00


In [7]:
stamp = stamp.strftime("%m/%d/%y")  # portable spelling of "%D", which is not available on every platform
stamp

'01/03/11'

In [8]:
value = "2011-01-03"
datetime.strptime(value, "%Y-%m-%d")

datetime.datetime(2011, 1, 3, 0, 0)

In [9]:
dates_arr = ["2015-03-18", "1996-07-03", None]
datetime_index = pd.to_datetime(dates_arr)
datetime_index

DatetimeIndex(['2015-03-18', '1996-07-03', 'NaT'], dtype='datetime64[ns]', freq=None)
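
Missing entries can be detected with `isna()`, and strings that cannot be parsed can be turned into `NaT` instead of raising an error by passing `errors="coerce"`. The following is a minimal sketch; the input strings are made up for illustration.

```python
import pandas as pd

# Hypothetical input: one valid date, one missing value, one unparseable string
raw_dates = ["2015-03-18", None, "not a date"]

# errors="coerce" converts anything that cannot be parsed into NaT
parsed = pd.to_datetime(raw_dates, errors="coerce")

# Boolean mask marking the NaT positions
missing_mask = parsed.isna()

print(parsed)
print(missing_mask)
```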

### Time Series in Pandas

A time series is essentially a Pandas Series whose index consists of timestamps.

Each timestamp records the value of something at that moment. For now we will just fill the values with random numbers.

In [10]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.random.standard_normal(6), index = dates)
ts

2011-01-02   -0.214287
2011-01-05   -0.767551
2011-01-07   -0.709757
2011-01-08   -1.373033
2011-01-10   -0.289797
2011-01-12   -1.214900
dtype: float64

In [11]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

Everything that holds for a regular Pandas Series applies here: operations are broadcast across all values, and we can select a value by using its date as the index label.

In [12]:
ts = ts*2
ts

2011-01-02   -0.428575
2011-01-05   -1.535101
2011-01-07   -1.419514
2011-01-08   -2.746065
2011-01-10   -0.579593
2011-01-12   -2.429800
dtype: float64

In [13]:
ts["2011-01-12"]

-2.4297996592298485

Things get more interesting with a longer timeseries. We will use ```pd.date_range``` to create one. Let's demonstrate selection and slicing by date.

In [14]:
longer_ts = pd.Series(np.random.standard_normal(1000),index = pd.date_range("2000-01-01", periods = 1000))


In [15]:
longer_ts["2001-03-04" : "2002-03-04"]

2001-03-04    0.607415
2001-03-05   -0.027249
2001-03-06   -0.701389
2001-03-07    2.450786
2001-03-08   -0.685847
                ...   
2002-02-28    1.637894
2002-03-01   -1.236736
2002-03-02    0.959028
2002-03-03    1.930938
2002-03-04   -1.084801
Freq: D, Length: 366, dtype: float64

In [16]:
longer_ts[datetime(2000,3,4): datetime(2002,1,1)]

2000-03-04    0.431030
2000-03-05   -1.674624
2000-03-06   -0.269372
2000-03-07   -0.153318
2000-03-08   -0.785142
                ...   
2001-12-28    0.713485
2001-12-29    0.502124
2001-12-30   -0.458151
2001-12-31    0.760570
2002-01-01   -1.563111
Freq: D, Length: 669, dtype: float64
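
Besides slicing between two dates, a partial date string selects a whole month or year at once. This is a small sketch on a series built the same way as `longer_ts`; the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Same shape as longer_ts above: 1000 daily observations starting 2000-01-01
daily = pd.Series(np.random.standard_normal(1000),
                  index=pd.date_range("2000-01-01", periods=1000))

# A year-month string selects every row in that month
may_2001 = daily.loc["2001-05"]

# A year string selects every row in that year
year_2001 = daily.loc["2001"]

print(len(may_2001), len(year_2001))
```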

We can also slice with timestamps that are not present in our timeseries.

In [17]:
ts

2011-01-02   -0.428575
2011-01-05   -1.535101
2011-01-07   -1.419514
2011-01-08   -2.746065
2011-01-10   -0.579593
2011-01-12   -2.429800
dtype: float64

In [18]:
ts["2011-01-06":"2011-01-11"]

2011-01-07   -1.419514
2011-01-08   -2.746065
2011-01-10   -0.579593
dtype: float64

In [19]:
ts.truncate(after = "2011-01-04") #drop every row after the provided date, keeping everything up to and including it

2011-01-02   -0.428575
dtype: float64
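
To make the direction of `truncate` explicit, here is a sketch that uses both the `before` and the `after` parameter on a small series with the same dates as `ts`. The integer values are placeholders.

```python
import pandas as pd

# Hypothetical series on the same dates as ts above, with placeholder values
idx = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07",
                      "2011-01-08", "2011-01-10", "2011-01-12"])
sample = pd.Series(range(6), index=idx)

# after=... drops every row later than the given date
head_part = sample.truncate(after="2011-01-04")

# before=... drops every row earlier than the given date
tail_part = sample.truncate(before="2011-01-09")

print(head_part)
print(tail_part)
```

Note that `truncate` expects a sorted index.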

In [20]:
dates = pd.date_range("2000-01-01", periods = 100, freq = "W-WED")
df_towns = pd.DataFrame(np.random.standard_normal((100,4)), index = dates, columns = ["Tokyo", "Montreal", "Madrid", "London"])
df_towns

               Tokyo  Montreal    Madrid    London
2000-01-05  0.286703 -0.678211  0.771036 -1.499259
2000-01-12 -0.048176 -0.156228 -0.622029 -2.331602
2000-01-19 -1.415016  0.335937  1.536523  0.212271
2000-01-26  0.435100 -0.877103  0.822028 -0.346504
2000-02-02  0.446094  0.745248  2.507659 -0.886549
...              ...       ...       ...       ...
2001-10-31 -1.371002 -0.293811 -1.015878  1.082889
2001-11-07  0.834216  0.301002  0.917793  0.013535
2001-11-14  0.294612 -1.344772  0.233542 -1.223276
2001-11-21 -0.530712  0.456243 -0.513399  0.241145


In [21]:
df_towns.loc["2001-10-31"]

Tokyo      -1.371002
Montreal   -0.293811
Madrid     -1.015878
London      1.082889
Name: 2001-10-31 00:00:00, dtype: float64

### Working with duplicates

In [22]:
dates = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02","2000-01-02", "2000-01-03"])
duplicates = pd.Series(data = [1,2,3,3,4], index = dates)
duplicates

2000-01-01    1
2000-01-02    2
2000-01-02    3
2000-01-02    3
2000-01-03    4
dtype: int64

In [23]:
duplicates["2000-01-02"]

2000-01-02    2
2000-01-02    3
2000-01-02    3
dtype: int64

In [24]:
duplicates.groupby(level = 0).mean()

2000-01-01    1.000000
2000-01-02    2.666667
2000-01-03    4.000000
dtype: float64
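
Before aggregating, it can help to check whether an index contains duplicates at all, or to keep only one observation per timestamp. A minimal sketch with the same data as above; the variable names are illustrative.

```python
import pandas as pd

dup_index = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02",
                              "2000-01-02", "2000-01-03"])
dup_series = pd.Series([1, 2, 3, 3, 4], index=dup_index)

# False as soon as any timestamp appears more than once
print(dup_series.index.is_unique)

# Keep only the first observation for each timestamp
first_only = dup_series[~dup_series.index.duplicated(keep="first")]
print(first_only)
```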

### Working with frequencies
When working with time series we can choose among many frequencies, for example once a week or every hour. Pandas provides a ```resample``` method for converting a series from one frequency to another.

In [25]:
ts

2011-01-02   -0.428575
2011-01-05   -1.535101
2011-01-07   -1.419514
2011-01-08   -2.746065
2011-01-10   -0.579593
2011-01-12   -2.429800
dtype: float64

In [26]:
resampler = ts.resample("D") #"D" means daily; this returns a lazy Resampler, call e.g. .mean() on it to get values
resampler

<pandas.core.resample.DatetimeIndexResampler object at 0x0000017F673964F0>
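
The Resampler object holds no values until an aggregation or fill method is called on it. The sketch below assumes a small series on the same dates as `ts`, with placeholder values, and shows two common choices.

```python
import pandas as pd

# Hypothetical irregular series on the same dates as ts above
idx = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07",
                      "2011-01-08", "2011-01-10", "2011-01-12"])
irregular = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# One row per calendar day; days without observations become NaN
daily_mean = irregular.resample("D").mean()

# Same daily grid, with gaps filled by the last known value
daily_ffill = irregular.resample("D").ffill()

print(daily_mean)
print(daily_ffill)
```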

```pd.date_range``` can generate long sequences of dates that start or end on a particular date. The sequence can be bounded by an end date or by a number of periods, and the step can be a day or any other frequency.

In [27]:
dates = pd.date_range("2016-12-12", "2017-12-12")
dates

DatetimeIndex(['2016-12-12', '2016-12-13', '2016-12-14', '2016-12-15',
               '2016-12-16', '2016-12-17', '2016-12-18', '2016-12-19',
               '2016-12-20', '2016-12-21',
               ...
               '2017-12-03', '2017-12-04', '2017-12-05', '2017-12-06',
               '2017-12-07', '2017-12-08', '2017-12-09', '2017-12-10',
               '2017-12-11', '2017-12-12'],
              dtype='datetime64[ns]', length=366, freq='D')

In [28]:
date_quarterly = pd.date_range("2016-12-01", "2017-12-12", freq = "Q-JAN")
date_quarterly

DatetimeIndex(['2017-01-31', '2017-04-30', '2017-07-31', '2017-10-31'], dtype='datetime64[ns]', freq='Q-JAN')

In [29]:
date_20days = pd.date_range("2018-01-03 08:46:21", periods = 20, normalize = True)
date_20days

DatetimeIndex(['2018-01-03', '2018-01-04', '2018-01-05', '2018-01-06',
               '2018-01-07', '2018-01-08', '2018-01-09', '2018-01-10',
               '2018-01-11', '2018-01-12', '2018-01-13', '2018-01-14',
               '2018-01-15', '2018-01-16', '2018-01-17', '2018-01-18',
               '2018-01-19', '2018-01-20', '2018-01-21', '2018-01-22'],
              dtype='datetime64[ns]', freq='D')

In [30]:
hour = Hour()
minute = Minute()
second = Second()
fifteen_seconds = Second(15)
high_freq = pd.date_range("2023-01-01", "2023-01-01 23:59",freq = fifteen_seconds)
high_freq

DatetimeIndex(['2023-01-01 00:00:00', '2023-01-01 00:00:15',
               '2023-01-01 00:00:30', '2023-01-01 00:00:45',
               '2023-01-01 00:01:00', '2023-01-01 00:01:15',
               '2023-01-01 00:01:30', '2023-01-01 00:01:45',
               '2023-01-01 00:02:00', '2023-01-01 00:02:15',
               ...
               '2023-01-01 23:56:45', '2023-01-01 23:57:00',
               '2023-01-01 23:57:15', '2023-01-01 23:57:30',
               '2023-01-01 23:57:45', '2023-01-01 23:58:00',
               '2023-01-01 23:58:15', '2023-01-01 23:58:30',
               '2023-01-01 23:58:45', '2023-01-01 23:59:00'],
              dtype='datetime64[ns]', length=5757, freq='15S')
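
Fixed-length offsets such as `Hour` and `Minute` can also be combined. The sketch below builds a 90-minute frequency in two ways; the start date and the number of periods are arbitrary.

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# Offsets of fixed length can be added together
ninety_minutes = Hour(1) + Minute(30)

# Pass the combined offset as freq, or use the equivalent string alias
by_offset = pd.date_range("2023-01-01", periods=4, freq=ninety_minutes)
by_alias = pd.date_range("2023-01-01", periods=4, freq="1h30min")

print(by_offset)
```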

Shifting provides us with a way to move data backwards or forwards in time.


In [32]:
shifting_example = pd.date_range("2015-01-01", "2015-01-05")
shifting_example

DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04',
               '2015-01-05'],
              dtype='datetime64[ns]', freq='D')

In [33]:
shifting_example.shift(5)

DatetimeIndex(['2015-01-06', '2015-01-07', '2015-01-08', '2015-01-09',
               '2015-01-10'],
              dtype='datetime64[ns]', freq='D')

In [34]:
shifting_example.shift(-5)

DatetimeIndex(['2014-12-27', '2014-12-28', '2014-12-29', '2014-12-30',
               '2014-12-31'],
              dtype='datetime64[ns]', freq='D')

In [35]:
series_shifting = pd.Series([1,2,3,4,5], index = shifting_example)
series_shifting

2015-01-01    1
2015-01-02    2
2015-01-03    3
2015-01-04    4
2015-01-05    5
Freq: D, dtype: int64
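
On a Series, `shift` behaves differently depending on whether a frequency is passed. A short sketch of both variants, using the same values as `series_shifting`:

```python
import pandas as pd

idx = pd.date_range("2015-01-01", "2015-01-05")
values = pd.Series([1, 2, 3, 4, 5], index=idx)

# Without freq: the index stays put, the values slide down and a NaN enters at the top
shifted_values = values.shift(1)

# With freq: the values stay attached to their rows and the timestamps move instead
shifted_index = values.shift(1, freq="D")

print(shifted_values)
print(shifted_index)
```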

A useful trick is to compute the percentage change of a timeseries from one observation to the next, based on a shift.

In [36]:
percentage = (series_shifting / series_shifting.shift(1) - 1) * 100
percentage

2015-01-01           NaN
2015-01-02    100.000000
2015-01-03     50.000000
2015-01-04     33.333333
2015-01-05     25.000000
Freq: D, dtype: float64
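
Pandas also ships a `pct_change` method that computes the same ratio as a fraction. The sketch below compares it with the manual shift-based formula on the same values.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2015-01-01", "2015-01-05")
values = pd.Series([1, 2, 3, 4, 5], index=idx)

# Manual version based on a one-step shift
manual = (values / values.shift(1) - 1) * 100

# Built-in version; pct_change returns a fraction, so multiply by 100
builtin = values.pct_change() * 100

# Compare the two, ignoring the leading NaN
print(np.allclose(manual.dropna(), builtin.dropna()))
```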