In [1]:
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

### Basic date manipulations in Python

Python's standard library includes a datetime module that provides several features:
1. Datetime objects that represent a specific point in time (year, month and day, plus an optional time of day)
2. Addition and subtraction of datetime objects
3. Formatting of datetime objects according to different format specifications

Pandas represents missing timestamps with NaT, which stands for Not a Time.

In [2]:
current_date = datetime.now()
current_date

datetime.datetime(2023, 9, 30, 22, 47, 40, 674855)

In [3]:
current_date.year, current_date.month, current_date.day

(2023, 9, 30)

In [4]:
delta = datetime(2019, 1, 7) - datetime(2003, 10, 2, 8, 15)
delta

datetime.timedelta(days=5575, seconds=56700)

In [5]:
delta = delta + timedelta(10)
delta

datetime.timedelta(days=5585, seconds=56700)
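
To make the arithmetic above a bit more concrete, here is a minimal sketch that inspects the parts of a timedelta and shifts a datetime by one. The variable names `start`, `end` and `shifted` are illustrative and not part of the original notebook.

```python
from datetime import datetime, timedelta

start = datetime(2003, 10, 2, 8, 15)
end = datetime(2019, 1, 7)

# Subtracting two datetimes gives a timedelta
delta = end - start

# A timedelta stores only days, seconds and microseconds internally
print(delta.days, delta.seconds)   # 5575 56700

# total_seconds() collapses the whole duration into a single number
print(delta.total_seconds())

# Adding a timedelta to a datetime gives a new datetime
shifted = start + timedelta(days=10, hours=2)
print(shifted)                     # 2003-10-12 10:15:00
```

Note that `timedelta(10)` in the cell above is shorthand for `timedelta(days=10)`, since `days` is the first positional argument.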

In [6]:
stamp = datetime(2011,1,3)
print(str(stamp))

2011-01-03 00:00:00


In [7]:
stamp = stamp.strftime("%m/%d/%y")  # explicit form of the "%D" shortcut, which is not available on every platform
stamp

'01/03/11'

In [8]:
value = "2011-01-03"
datetime.strptime(value, "%Y-%m-%d")

datetime.datetime(2011, 1, 3, 0, 0)
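
Formatting and parsing are inverse operations when the same format string is used for both. The sketch below illustrates that round trip; the names `text`, `parsed` and `iso` are illustrative only.

```python
from datetime import datetime

stamp = datetime(2011, 1, 3, 14, 30)

# Format the datetime as text, then parse it back with the same format string
text = stamp.strftime("%Y-%m-%d %H:%M")
parsed = datetime.strptime(text, "%Y-%m-%d %H:%M")
print(text)             # 2011-01-03 14:30
print(parsed == stamp)  # True

# ISO 8601 strings can be produced and parsed without a format string
iso = stamp.isoformat()
print(iso)              # 2011-01-03T14:30:00
print(datetime.fromisoformat(iso) == stamp)  # True
```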

In [9]:
dates_arr = ["2015-03-18", "1996-07-03", None]
datetime_index = pd.to_datetime(dates_arr)
datetime_index

DatetimeIndex(['2015-03-18', '1996-07-03', 'NaT'], dtype='datetime64[ns]', freq=None)
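
Once NaT values are present, it is often useful to detect or drop them. A small sketch, assuming the same input list as above; the unparseable string in the second part is a made-up example value.

```python
import pandas as pd

dates_arr = ["2015-03-18", "1996-07-03", None]
datetime_index = pd.to_datetime(dates_arr)

# isna() returns a boolean mask that is True where the value is NaT
missing = datetime_index.isna()
print(missing)

# Keep only the valid timestamps
valid = datetime_index[~missing]
print(valid)

# errors="coerce" turns strings that cannot be parsed into NaT instead of raising
coerced = pd.to_datetime(["2015-03-18", "not a date"], errors="coerce")
print(coerced)
```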

### Time Series in Pandas

A time series in Pandas is essentially a Series whose index is made of timestamps.

Each timestamp labels the value observed at that point in time. For now we will simply fill the values with random numbers.

In [10]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.random.standard_normal(6), index = dates)
ts

2011-01-02    1.315954
2011-01-05    0.568645
2011-01-07    0.857548
2011-01-08    2.074120
2011-01-10    0.086525
2011-01-12   -1.008184
dtype: float64

In [11]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

Everything that applies to a regular Pandas series applies here as well: operations are broadcast, and we can select a value by using a date as the index label.

In [12]:
ts = ts*2
ts

2011-01-02    2.631909
2011-01-05    1.137291
2011-01-07    1.715096
2011-01-08    4.148240
2011-01-10    0.173050
2011-01-12   -2.016368
dtype: float64

In [13]:
ts["2011-01-12"]

-2.0163683747939287

Things get more interesting when we work with a longer time series. We will use ```pd.date_range``` to create it. Let's demonstrate selecting months and slicing.

In [15]:
longer_ts = pd.Series(np.random.standard_normal(1000),index = pd.date_range("2000-01-01", periods = 1000))


In [20]:
longer_ts["2001-03-04" : "2002-03-04"]

2001-03-04   -0.319586
2001-03-05    1.436451
2001-03-06    2.149790
2001-03-07   -0.831536
2001-03-08    1.851785
                ...   
2002-02-28   -0.778848
2002-03-01   -0.101942
2002-03-02    1.336625
2002-03-03   -0.030887
2002-03-04   -0.847817
Freq: D, Length: 366, dtype: float64

In [21]:
longer_ts[datetime(2000,3,4): datetime(2002,1,1)]

2000-03-04   -1.226807
2000-03-05    1.476355
2000-03-06   -0.533667
2000-03-07    0.188040
2000-03-08   -2.060823
                ...   
2001-12-28    1.105419
2001-12-29    0.512233
2001-12-30   -0.037159
2001-12-31    0.234620
2002-01-01    1.880367
Freq: D, Length: 669, dtype: float64
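
Selecting a whole year or month works by passing only part of a date string as the label. The sketch below rebuilds a series of the same shape with a seeded generator so the lengths are reproducible; the seed and the names `year_2001` and `may_2001` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Same shape as longer_ts above: 1000 daily values starting on 2000-01-01
rng = np.random.default_rng(0)
longer_ts = pd.Series(rng.standard_normal(1000),
                      index=pd.date_range("2000-01-01", periods=1000))

# A year string selects every row in that year
year_2001 = longer_ts.loc["2001"]
print(len(year_2001))   # 365

# A year-month string selects every row in that month
may_2001 = longer_ts.loc["2001-05"]
print(len(may_2001))    # 31
```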

We can also slice with timestamps that are not present in our timeseries.

In [22]:
ts

2011-01-02    2.631909
2011-01-05    1.137291
2011-01-07    1.715096
2011-01-08    4.148240
2011-01-10    0.173050
2011-01-12   -2.016368
dtype: float64

In [23]:
ts["2011-01-06":"2011-01-11"]

2011-01-07    1.715096
2011-01-08    4.148240
2011-01-10    0.173050
dtype: float64

In [27]:
ts.truncate(after = "2011-01-04") # drop every row after the given date, keeping the rows up to and including it

2011-01-02    2.631909
dtype: float64
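
The `before` and `after` arguments of `truncate` can be used separately or together. A minimal sketch on the same six dates, using integer placeholder values instead of random ones so the results are easy to follow:

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(range(6), index=dates)  # placeholder values 0..5

# after: drop everything later than the given date
head = ts.truncate(after="2011-01-04")

# before: drop everything earlier than the given date
tail = ts.truncate(before="2011-01-09")

# both: keep only the rows between the two dates
middle = ts.truncate(before="2011-01-06", after="2011-01-11")

print(head.tolist())    # [0]
print(tail.tolist())    # [4, 5]
print(middle.tolist())  # [2, 3, 4]
```

Note that `truncate` expects the index to be sorted.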

In [32]:
dates = pd.date_range("2000-01-01", periods = 100, freq = "W-WED")
df_towns = pd.DataFrame(np.random.standard_normal((100,4)), index = dates, columns = ["Tokyo", "Montreal", "Madrid", "London"])
df_towns

               Tokyo  Montreal    Madrid    London
2000-01-05 -0.217649 -0.507561 -0.309086 -0.038593
2000-01-12  0.988522  0.608522  1.186053  0.503930
2000-01-19 -0.259732 -0.124327 -0.200456  1.853130
2000-01-26  0.065074 -1.271344  0.173177 -0.249478
2000-02-02 -0.548159  0.636478  0.257262 -0.760347
...              ...       ...       ...       ...
2001-10-31  1.053324 -0.690643  0.845887 -0.637795
2001-11-07  1.035659  1.056141 -0.439073  0.523738
2001-11-14  0.599747 -0.943704 -1.049486 -0.121962
2001-11-21  0.570294 -1.081300 -0.255620  0.162231


In [36]:
df_towns.loc["2001-10-31"]

Tokyo       1.053324
Montreal   -0.690643
Madrid      0.845887
London     -0.637795
Name: 2001-10-31 00:00:00, dtype: float64
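
Date-based selection on a DataFrame also accepts partial date strings and can be combined with column labels. A short sketch, assuming a frame built the same way as `df_towns` but with a seeded generator; the names `may`, `tokyo_may` and `autumn` are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=100, freq="W-WED")
df_towns = pd.DataFrame(rng.standard_normal((100, 4)), index=dates,
                        columns=["Tokyo", "Montreal", "Madrid", "London"])

# All rows (Wednesdays) that fall in May 2001
may = df_towns.loc["2001-05"]
print(may.shape)        # (5, 4)

# The same rows, restricted to a single column
tokyo_may = df_towns.loc["2001-05", "Tokyo"]
print(len(tokyo_may))   # 5

# Date slices include both endpoints
autumn = df_towns.loc["2001-10-01":"2001-10-31"]
print(autumn.index.min(), autumn.index.max())
```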

### Working with duplicates

In [37]:
dates = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02","2000-01-02", "2000-01-03"])
duplicates = pd.Series(data = [1,2,3,3,4], index = dates)
duplicates

2000-01-01    1
2000-01-02    2
2000-01-02    3
2000-01-02    3
2000-01-03    4
dtype: int64

In [38]:
duplicates["2000-01-02"]

2000-01-02    2
2000-01-02    3
2000-01-02    3
dtype: int64

In [39]:
duplicates.groupby(level = 0).mean()

2000-01-01    1.000000
2000-01-02    2.666667
2000-01-03    4.000000
dtype: float64
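
Before aggregating, it can help to check whether the index contains duplicates at all and how many rows share each timestamp. A minimal sketch on the same data; `counts` and `deduped` are illustrative names.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02",
                          "2000-01-02", "2000-01-03"])
duplicates = pd.Series(data=[1, 2, 3, 3, 4], index=dates)

# is_unique is False as soon as any timestamp repeats
print(duplicates.index.is_unique)   # False

# Number of rows per timestamp
counts = duplicates.groupby(level=0).count()
print(counts.tolist())              # [1, 3, 1]

# Keep only the first row for each timestamp
deduped = duplicates[~duplicates.index.duplicated(keep="first")]
print(deduped.tolist())             # [1, 2, 4]
```

Whether to aggregate (as with `mean` above) or to keep a single row per timestamp depends on what the repeated observations mean in the data.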