Time series data is an important form of structured data.
A time series is anything that is recorded repeatedly at many points in time.
Fixed frequency -> Data points occur at regular intervals.
A time series can also be irregular, with no fixed unit of time between observations.
### Ways to mark and refer to time series Data
- Timestamps -> Specific instant of time
- Fixed periods -> The whole month of January, the whole year 2020
- Intervals of time -> Indicated by a start and end timestamp. Periods are special cases of intervals
- Experiment or elapsed time -> Each timestamp is a measure of time relative to a particular start time, starting from 0
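
The four conventions above map roughly onto pandas objects as follows. This is a minimal sketch; the variable names and the dates are illustrative assumptions, not part of the original notes.

```python
import pandas as pd

# Timestamp: a specific instant
instant = pd.Timestamp("2020-01-15 09:30")

# Fixed period: the whole month of January 2020
january = pd.Period("2020-01", freq="M")

# Interval: an explicit start and end timestamp
span = pd.Interval(pd.Timestamp("2020-01-01"), pd.Timestamp("2020-01-31"), closed="both")

# Elapsed time: offsets measured from a start time, beginning at 0
elapsed = pd.to_timedelta([0, 30, 60], unit="s")

print(instant in span)                                   # is the instant inside the interval?
print(january.start_time <= instant <= january.end_time) # is it inside the period?
print(elapsed)
```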

In [1]:
import numpy as np
import pandas as pd
from datetime import datetime, date

### Date and Time Data Types and Tools
The Python standard library provides the datetime, time, and calendar modules for working with dates and times.


In [2]:
now = datetime.now()
now

datetime.datetime(2024, 3, 6, 11, 49, 47, 96857)

In [3]:
now.year ,now.month, now.day

(2024, 3, 6)

In [4]:
delta = datetime(2011,1,7) - datetime(2008,6,24,8,15)
delta

datetime.timedelta(days=926, seconds=56700)

In [5]:
delta.days,delta.seconds

(926, 56700)

In [6]:
from datetime import timedelta
start = datetime(2023,4,21)
start + timedelta(2)

datetime.datetime(2023, 4, 23, 0, 0)

In [7]:
start - 2 * timedelta(12)

datetime.datetime(2023, 3, 28, 0, 0)

In [8]:
date(2022,12,2)

datetime.date(2022, 12, 2)

### Converting between String and Datetime
You can format datetime objects and pandas Timestamp objects as strings using str or the strftime method.

Converting date to string

In [9]:
stamp = datetime(2022,12,2,10,00,10)
str(stamp)

'2022-12-02 10:00:10'

In [10]:
stamp.strftime("%d---%m---%y///%w")

'02---12---22///5'

Converting string to date

In [11]:
datestr = '2011-01-06'
datetime.strptime(datestr,"%Y-%m-%d")

datetime.datetime(2011, 1, 6, 0, 0)
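
As a small sketch of how the two directions fit together, the same format string can be used to write a datetime out and read it back. The format string `fmt` is an assumption chosen for illustration. Note that `%w`, used earlier, is the weekday number with Sunday as 0, which is why a Friday prints as 5.

```python
from datetime import datetime

stamp = datetime(2022, 12, 2, 10, 0, 10)
fmt = "%Y-%m-%d %H:%M:%S"   # four-digit year, month, day, 24-hour time

text = stamp.strftime(fmt)              # datetime -> str
parsed = datetime.strptime(text, fmt)   # str -> datetime

print(text)
print(parsed == stamp)   # the round trip preserves the value
```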

pandas usually works with arrays of dates.
To parse many different kinds of date representations -> pandas.to_datetime

In [12]:
datestrs = ['2011-04-04 12:00:00','2011-08-04 16:10:00']
pd.to_datetime(datestrs) 

DatetimeIndex(['2011-04-04 12:00:00', '2011-08-04 16:10:00'], dtype='datetime64[ns]', freq=None)

In [13]:
# Missing values (None, empty strings, etc.) are parsed as NaT (Not a Time)
idx = pd.to_datetime(['2011-04-06 12:00:00','2012-12-23 17:35:20',None])
idx

DatetimeIndex(['2011-04-06 12:00:00', '2012-12-23 17:35:20', 'NaT'], dtype='datetime64[ns]', freq=None)

In [14]:
pd.isna(idx)

array([False, False,  True])
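
Real data often contains strings that are not dates at all. A hedged sketch of one way to handle them: the `errors="coerce"` option of `pd.to_datetime` turns unparseable entries into NaT instead of raising. The sample values in `raw` are made up for illustration.

```python
import pandas as pd

raw = ["2011-04-06 12:00:00", "not a date", None]

# Invalid or missing entries become NaT instead of raising an exception
parsed = pd.to_datetime(raw, errors="coerce")

print(parsed)
print(parsed.isna())
```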

### Time Series Basics
The basic kind of time series object in pandas is a Series indexed by timestamps.

In [15]:
cant_valores = 2000
dates= []
for i in range(cant_valores):
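    anio = np.random.randint(2000,2024)  # year: 2000..2023 (the upper bound is exclusive)
    mes = np.random.randint(1,13)        # month: 1..12; randint(1,12) would never produce December
    dia = np.random.randint(1,29)        # day: 1..28, valid in every month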
    dates.append(datetime(anio,mes,dia))

In [16]:
ts = pd.Series(np.random.standard_normal(cant_valores),index = dates)
ts

2020-10-24   -0.266615
2014-02-13   -0.492832
2002-11-21   -0.875787
2015-08-01    0.263407
2021-03-09   -0.693588
                ...   
2011-08-05   -0.367949
2022-01-08   -1.388875
2018-08-16   -1.846368
2018-01-13   -1.726898
2022-01-18   -0.672891
Length: 2000, dtype: float64

In [17]:
# Under the hood, the datetime objects have been put in a DatetimeIndex
ts.index

DatetimeIndex(['2020-10-24', '2014-02-13', '2002-11-21', '2015-08-01',
               '2021-03-09', '2019-01-08', '2000-03-19', '2021-06-10',
               '2011-06-23', '2021-06-03',
               ...
               '2000-02-09', '2012-04-27', '2007-08-19', '2014-09-10',
               '2000-11-13', '2011-08-05', '2022-01-08', '2018-08-16',
               '2018-01-13', '2022-01-18'],
              dtype='datetime64[ns]', length=2000, freq=None)

In [18]:
ts + ts[::2]

2000-01-15         NaN
2000-01-16         NaN
2000-01-18         NaN
2000-01-28   -1.327169
2000-02-02         NaN
                ...   
2023-10-24         NaN
2023-10-25   -0.052337
2023-11-14         NaN
2023-11-15         NaN
2023-11-23         NaN
Length: 2138, dtype: float64
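
Arithmetic between differently indexed time series aligns on the dates, and dates that appear in only one operand produce NaN. The sketch below uses a small unique index (an assumption made to keep the result easy to read). In the notebook output above the result is longer than the input, most likely because the random index contains duplicate dates, which alignment pairs up with each other.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=4)
s = pd.Series(np.arange(4.0), index=idx)

# s[::2] keeps only the 1st and 3rd dates; the other dates have no partner
total = s + s[::2]
print(total)
```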

In [19]:
# Pandas stores timestamps using np datetime64 data type at nanosecond resolution
ts.index.dtype

dtype('<M8[ns]')

In [20]:
# Scalar values from a DatetimeIndex are pandas Timestamp objects
stamp = ts.index[19]
stamp
# A datetime converts losslessly to a Timestamp; the reverse (Timestamp.to_pydatetime) drops nanoseconds, because datetime only has microsecond resolution

Timestamp('2014-01-07 00:00:00')

### Indexing, selection, Subsetting
A time series behaves like any other Series when you index and select data by label.

In [21]:
stamps = ts.index[2]
ts[stamps]

-0.8757865501545752

In [22]:
# Passing the date as a string; this raises KeyError when no observation falls on that exact date
ts["2023-10-21"]

KeyError: '2023-10-21'
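
When a date may be absent from the index, the lookup can be guarded instead of letting it raise. A minimal sketch with made-up data; `key` and `s` are hypothetical names.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(["2023-10-20", "2023-10-22"])
s = pd.Series(np.arange(2), index=idx)

key = pd.Timestamp("2023-10-21")

print(key in s.index)             # membership test, no exception
print(s.get(key, default=None))   # returns the default when the date is missing
```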

In [None]:
ts['2012']

2012-10-25   -0.539872
2012-05-20   -0.406968
2012-11-18    0.700766
2012-09-22    0.096256
2012-08-18   -1.808650
                ...   
2012-09-22    0.502252
2012-03-17   -0.171005
2012-11-24   -0.576308
2012-11-19    0.976699
2012-11-13    0.689605
Length: 93, dtype: float64

In [None]:
ts['2009-06']

2009-06-06   -0.513876
2009-06-28   -0.779704
2009-06-05   -0.102042
2009-06-28    0.930869
2009-06-09   -0.883572
2009-06-22    1.409019
2009-06-14   -0.512740
2009-06-28   -0.857682
2009-06-02    2.895874
2009-06-08   -0.339620
2009-06-18   -0.298440
2009-06-10   -0.227081
2009-06-14    2.044585
dtype: float64

In [None]:
ts[f"{datetime(2009,10,12).year}"]

2009-07-27    0.519202
2009-07-24    0.718345
2009-08-02    0.051677
2009-04-20    0.369590
2009-11-20    0.244438
                ...   
2009-06-10   -0.227081
2009-01-04   -0.367019
2009-04-27    0.804367
2009-09-18    0.800959
2009-06-14    2.044585
Length: 92, dtype: float64

In [None]:
cant_valores = 2000
dates_order= []
time_inicial = datetime(2010,1,1)
for i in range(cant_valores):
    time_inicial = time_inicial + timedelta(2)  # advance two days per step
    dates_order.append(time_inicial)

In [None]:
# Despite the df_ prefix, this is a Series indexed by the chronologically ordered dates
df_dates_order = pd.Series(np.random.standard_normal(cant_valores),index =dates_order)

In [None]:
df_dates_order

2010-01-03   -1.707199
2010-01-05   -0.229068
2010-01-07   -1.262733
2010-01-09    0.250963
2010-01-11    0.244032
                ...   
2020-12-06    1.657929
2020-12-08    0.532510
2020-12-10   -0.197188
2020-12-12   -0.294816
2020-12-14   -1.113642
Length: 2000, dtype: float64

In [None]:
df_dates_order.truncate(after='2013')

2010-01-03   -1.707199
2010-01-05   -0.229068
2010-01-07   -1.262733
2010-01-09    0.250963
2010-01-11    0.244032
                ...   
2012-12-24   -0.383410
2012-12-26   -1.124439
2012-12-28   -0.057711
2012-12-30   -0.552704
2013-01-01   -1.207644
Length: 548, dtype: float64

In [None]:
df_dates_order[datetime(2011,4,5):]

2011-04-06    0.880651
2011-04-08   -0.762355
2011-04-10   -0.956297
2011-04-12   -1.037614
2011-04-14   -0.553805
                ...   
2020-12-06    1.657929
2020-12-08    0.532510
2020-12-10   -0.197188
2020-12-12   -0.294816
2020-12-14   -1.113642
Length: 1771, dtype: float64

In [None]:
df_dates_order['2012-01-01':'2012-08-01']

2012-01-01    0.930477
2012-01-03   -0.255456
2012-01-05   -0.236914
2012-01-07    0.129057
2012-01-09    0.615081
                ...   
2012-07-23    0.871294
2012-07-25   -1.033324
2012-07-27    0.124317
2012-07-29   -0.064275
2012-07-31    0.009502
Length: 107, dtype: float64
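
Two properties of date slicing are easy to miss: both endpoints are included, and the endpoints do not have to exist in the index. The sketch below assumes a sorted index; slicing with absent keys on an unsorted DatetimeIndex generally fails.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2012-01-01", periods=6, freq="D")
s = pd.Series(np.arange(6), index=idx)

# Both endpoints are part of the result
window = s["2012-01-02":"2012-01-04"]
print(window)

# The start date is not in the index, yet the slice still works
early = s["2011-12-25":"2012-01-02"]
print(early)
```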

### Time Series with Duplicate Indices
There may be multiple data observations falling on a particular timestamp

In [None]:
dates = pd.DatetimeIndex(["2000-01-01","2000-01-02","2000-01-02","2000-01-02","2000-01-03"])


In [None]:
dup_ts = pd.Series(np.arange(5),index=dates)
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32

In [None]:
# Checking if the index is unique
dup_ts.index.is_unique

False

In [None]:
dup_ts['2000-01-02'] #Duplicated

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32

In [None]:
dup_ts['2000-01-03'] #Not duplicated

4

In [None]:
# To aggregate duplicate timestamps into unique ones, group by the index with groupby(level=0)
grouped = dup_ts.groupby(level=0)
grouped.mean()

2000-01-01    0.0
2000-01-02    2.0
2000-01-03    4.0
dtype: float64

In [None]:
grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
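
Aggregation is not the only option. If only one observation per timestamp should survive, the duplicates can be dropped instead. A sketch using `Index.duplicated`; keeping the first occurrence is an arbitrary choice made for this example.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02", "2000-01-03"])
dup = pd.Series(np.arange(4), index=dates)

# duplicated() marks every repeat after the first occurrence; ~ inverts the mask
first_only = dup[~dup.index.duplicated(keep="first")]
print(first_only)
```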

### Date Ranges, Frequencies, and Shifting
Generic time series in pandas are assumed to be irregular; that is, they have no fixed frequency. It is often desirable to work with a fixed frequency instead (daily, monthly, every 15 minutes, etc.), even if that means introducing missing values.
pandas has a full suite of standard time series frequencies and tools for resampling, inferring frequencies, and generating fixed-frequency date ranges.



In [None]:
resampler = df_dates_order.resample("D")
resampler

<pandas.core.resample.DatetimeIndexResampler object at 0x000001B46D9EA890>

### Generating Date Ranges
pandas.date_range is responsible for generating a DatetimeIndex with an indicated length according to a particular frequency.

In [None]:
index = pd.date_range("2020-04-01","2022-04-01") #It generates daily stamps by default
index

DatetimeIndex(['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04',
               '2020-04-05', '2020-04-06', '2020-04-07', '2020-04-08',
               '2020-04-09', '2020-04-10',
               ...
               '2022-03-23', '2022-03-24', '2022-03-25', '2022-03-26',
               '2022-03-27', '2022-03-28', '2022-03-29', '2022-03-30',
               '2022-03-31', '2022-04-01'],
              dtype='datetime64[ns]', length=731, freq='D')

In [None]:
pd.date_range(start="2019-01-01", periods=60)

DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08',
               '2019-01-09', '2019-01-10', '2019-01-11', '2019-01-12',
               '2019-01-13', '2019-01-14', '2019-01-15', '2019-01-16',
               '2019-01-17', '2019-01-18', '2019-01-19', '2019-01-20',
               '2019-01-21', '2019-01-22', '2019-01-23', '2019-01-24',
               '2019-01-25', '2019-01-26', '2019-01-27', '2019-01-28',
               '2019-01-29', '2019-01-30', '2019-01-31', '2019-02-01',
               '2019-02-02', '2019-02-03', '2019-02-04', '2019-02-05',
               '2019-02-06', '2019-02-07', '2019-02-08', '2019-02-09',
               '2019-02-10', '2019-02-11', '2019-02-12', '2019-02-13',
               '2019-02-14', '2019-02-15', '2019-02-16', '2019-02-17',
               '2019-02-18', '2019-02-19', '2019-02-20', '2019-02-21',
               '2019-02-22', '2019-02-23', '2019-02-24', '2019-02-25',
      

In [None]:
pd.date_range(end="2018-12-31",periods=20)

DatetimeIndex(['2018-12-12', '2018-12-13', '2018-12-14', '2018-12-15',
               '2018-12-16', '2018-12-17', '2018-12-18', '2018-12-19',
               '2018-12-20', '2018-12-21', '2018-12-22', '2018-12-23',
               '2018-12-24', '2018-12-25', '2018-12-26', '2018-12-27',
               '2018-12-28', '2018-12-29', '2018-12-30', '2018-12-31'],
              dtype='datetime64[ns]', freq='D')

In [None]:
# BME -> business month end (spelled "BM" before pandas 2.2)
pd.date_range("2019-01-01","2020-05-01",freq="BME")

DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-29', '2019-04-30',
               '2019-05-31', '2019-06-28', '2019-07-31', '2019-08-30',
               '2019-09-30', '2019-10-31', '2019-11-29', '2019-12-31',
               '2020-01-31', '2020-02-28', '2020-03-31', '2020-04-30'],
              dtype='datetime64[ns]', freq='BME')

In [None]:
pd.date_range("2012-05-02 15:00:04",periods=10)

DatetimeIndex(['2012-05-02 15:00:04', '2012-05-03 15:00:04',
               '2012-05-04 15:00:04', '2012-05-05 15:00:04',
               '2012-05-06 15:00:04', '2012-05-07 15:00:04',
               '2012-05-08 15:00:04', '2012-05-09 15:00:04',
               '2012-05-10 15:00:04', '2012-05-11 15:00:04'],
              dtype='datetime64[ns]', freq='D')

In [None]:
#normalize option -> sets timestamps to midnight as a convention
pd.date_range("2022-03-01 12:12:03","2022-07-01 12:14:31",normalize=True)

DatetimeIndex(['2022-03-01', '2022-03-02', '2022-03-03', '2022-03-04',
               '2022-03-05', '2022-03-06', '2022-03-07', '2022-03-08',
               '2022-03-09', '2022-03-10',
               ...
               '2022-06-22', '2022-06-23', '2022-06-24', '2022-06-25',
               '2022-06-26', '2022-06-27', '2022-06-28', '2022-06-29',
               '2022-06-30', '2022-07-01'],
              dtype='datetime64[ns]', length=123, freq='D')
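
The `freq` argument combines with either an end date or a number of periods. A short sketch with two other frequencies, business days ("B") and month starts ("MS"); the dates are chosen only for illustration.

```python
import pandas as pd

# Three business days starting on Friday 2024-01-05: the weekend is skipped
bdays = pd.date_range("2024-01-05", periods=3, freq="B")
print(bdays)

# Month starts that fall between two dates
starts = pd.date_range("2024-01-15", "2024-04-15", freq="MS")
print(starts)
```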

### Frequencies and Date Offsets
Frequencies in pandas are composed of a base frequency and a multiplier.
Base frequencies are typically referred to by a string alias -> "M" (monthly), "H" (hourly); recent pandas versions spell these "ME" and "h".
For each base frequency, there is an object referred to as a date offset.


In [None]:
from pandas.tseries.offsets import Hour,Minute,Second
hour = Hour()
hour


<Hour>

In [None]:
four_hours = Hour(4)
four_hours

<4 * Hours>

In [None]:
two_seconds = Second(2)
two_seconds

<2 * Seconds>

In [None]:
pd.date_range("2015-01-01","2015-01-03",freq="2h")  # "2H" is deprecated since pandas 2.2

DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 02:00:00',
               '2015-01-01 04:00:00', '2015-01-01 06:00:00',
               '2015-01-01 08:00:00', '2015-01-01 10:00:00',
               '2015-01-01 12:00:00', '2015-01-01 14:00:00',
               '2015-01-01 16:00:00', '2015-01-01 18:00:00',
               '2015-01-01 20:00:00', '2015-01-01 22:00:00',
               '2015-01-02 00:00:00', '2015-01-02 02:00:00',
               '2015-01-02 04:00:00', '2015-01-02 06:00:00',
               '2015-01-02 08:00:00', '2015-01-02 10:00:00',
               '2015-01-02 12:00:00', '2015-01-02 14:00:00',
               '2015-01-02 16:00:00', '2015-01-02 18:00:00',
               '2015-01-02 20:00:00', '2015-01-02 22:00:00',
               '2015-01-03 00:00:00'],
              dtype='datetime64[ns]', freq='2h')

In [None]:
Hour(2)+ Second(30) + Minute(10)

<7830 * Seconds>

In [None]:
pd.date_range("2015-01-01",periods=10,freq="1h50min")

DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:50:00',
               '2015-01-01 03:40:00', '2015-01-01 05:30:00',
               '2015-01-01 07:20:00', '2015-01-01 09:10:00',
               '2015-01-01 11:00:00', '2015-01-01 12:50:00',
               '2015-01-01 14:40:00', '2015-01-01 16:30:00'],
              dtype='datetime64[ns]', freq='110min')
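
A frequency string and a sum of offset objects describe the same thing. As a sketch, `to_offset` from `pandas.tseries.frequencies` parses the string into an offset object that can be compared with one built by hand; the names `combined` and `same_span` are illustrative.

```python
from pandas.tseries.frequencies import to_offset
from pandas.tseries.offsets import Hour, Minute

combined = to_offset("1h50min")       # parse the alias string into an offset object
same_span = Hour(1) + Minute(50)      # build the same span from offset objects

print(combined)
print(combined == same_span)
```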

### Week of month dates
Lets you get dates like the third Friday of each month. The frequency string has to start with "WOM"; the example below uses "WOM-2FRI", the second Friday of each month.

In [None]:
monthly_dates = pd.date_range("2020-01-01","2020-10-04",freq="WOM-2FRI")
monthly_dates

DatetimeIndex(['2020-01-10', '2020-02-14', '2020-03-13', '2020-04-10',
               '2020-05-08', '2020-06-12', '2020-07-10', '2020-08-14',
               '2020-09-11'],
              dtype='datetime64[ns]', freq='WOM-2FRI')
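
Changing the week number in the alias selects a different week. A minimal sketch for the third Friday of each month, over a shorter date range picked for illustration:

```python
import pandas as pd

third_fridays = pd.date_range("2020-01-01", "2020-03-31", freq="WOM-3FRI")
print(third_fridays)
print(third_fridays.dayofweek)   # Monday is 0, so Friday is 4
```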

### Shifting (Leading and Lagging) Data
Shifting -> Moving data backward and forward through time.
Series and DataFrame have a shift method for doing naive shifts forward or backward, leaving the index unmodified.

In [None]:
ts = pd.Series(np.random.standard_normal(4),index = pd.date_range("2000-01-01",periods=4,freq="MS"))   

In [None]:
ts

2000-01-01    1.470773
2000-02-01   -0.881609
2000-03-01    0.426917
2000-04-01    0.353415
Freq: MS, dtype: float64

In [None]:
ts.shift(2)

2000-01-01         NaN
2000-02-01         NaN
2000-03-01    1.470773
2000-04-01   -0.881609
Freq: MS, dtype: float64

In [None]:
ts.shift(-2)

2000-01-01    0.426917
2000-02-01    0.353415
2000-03-01         NaN
2000-04-01         NaN
Freq: MS, dtype: float64

In [None]:
ts/ts.shift(1)-1

2000-01-01         NaN
2000-02-01   -1.599419
2000-03-01   -1.484248
2000-04-01   -0.172171
Freq: MS, dtype: float64
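
The expression `ts/ts.shift(1)-1` computes the period-over-period percent change. As a sketch, the built-in `pct_change` method gives the same result on data without missing values; the `prices` series is made up for the example.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0],
                   index=pd.date_range("2000-01-01", periods=3, freq="MS"))

manual = prices / prices.shift(1) - 1   # current value relative to the previous one
builtin = prices.pct_change()           # same computation as a method

print(manual)
print(builtin)
```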

In [None]:
ts.shift(2*80,freq="D")

2000-06-09    1.470773
2000-07-10   -0.881609
2000-08-08    0.426917
2000-09-08    0.353415
dtype: float64

In [None]:
ts.shift(4,freq="20s")  # "20S" is deprecated since pandas 2.2

2000-01-01 00:01:20    1.470773
2000-02-01 00:01:20   -0.881609
2000-03-01 00:01:20    0.426917
2000-04-01 00:01:20    0.353415
dtype: float64

### Shifting dates with offsets
pandas date offsets can also be used with datetime and Timestamp objects.

In [None]:
now = datetime(2014,9,8)

In [None]:
from pandas.tseries.offsets import Day, MonthEnd,YearEnd
now + 3* Day()

Timestamp('2014-09-11 00:00:00')

In [None]:
now + MonthEnd()

Timestamp('2014-09-30 00:00:00')

In [None]:
now + MonthEnd(3)

Timestamp('2014-11-30 00:00:00')

In [None]:
offset = MonthEnd()

In [None]:
offset.rollforward(now)

Timestamp('2014-09-30 00:00:00')

In [None]:
offset_year = YearEnd()
offset_year.rollforward(now)

Timestamp('2014-12-31 00:00:00')

In [None]:
offset_year.rollback(now)

Timestamp('2013-12-31 00:00:00')
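
Adding an anchored offset and rolling forward differ when the date already sits on the anchor. A sketch of the difference, using a date that is already a month end (chosen for illustration):

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

month_end = pd.Timestamp("2014-09-30")   # already on a month end
offset = MonthEnd()

rolled = offset.rollforward(month_end)   # stays where it is
added = month_end + offset               # moves to the next month end

print(offset.is_on_offset(month_end))
print(rolled)
print(added)
```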

#### Using offsets with groupby

In [None]:
ts = pd.Series(np.random.standard_normal(20),index=pd.date_range("2000-01-15",periods=20,freq="5D"))

In [None]:
ts

2000-01-15    1.397860
2000-01-20   -0.623344
2000-01-25   -0.624504
2000-01-30   -1.034503
2000-02-04   -1.566878
2000-02-09   -0.752331
2000-02-14    0.178677
2000-02-19   -0.270577
2000-02-24   -0.320109
2000-02-29    1.704615
2000-03-05   -0.532875
2000-03-10    0.862089
2000-03-15   -0.684777
2000-03-20   -0.282238
2000-03-25   -0.376023
2000-03-30   -0.441349
2000-04-04    0.038781
2000-04-09    1.494000
2000-04-14   -0.425358
2000-04-19    1.030108
Freq: 5D, dtype: float64

In [None]:
ts.groupby(MonthEnd().rollforward).mean()

2000-01-31   -0.221123
2000-02-29   -0.171100
2000-03-31   -0.242529
2000-04-30    0.534383
dtype: float64
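
Grouping by `MonthEnd().rollforward` maps every date to the end of its month and then aggregates. As a sketch, resampling with the same offset object should produce the same monthly means when every month contains data; the deterministic values below are an assumption made so the result can be checked by hand.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

ts = pd.Series(np.arange(20.0),
               index=pd.date_range("2000-01-15", periods=20, freq="5D"))

by_rollforward = ts.groupby(MonthEnd().rollforward).mean()
by_resample = ts.resample(MonthEnd()).mean()

print(by_rollforward)
print(np.allclose(by_rollforward.to_numpy(), by_resample.to_numpy()))
```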

### Time zone handling
UTC -> Coordinated Universal Time, the geography-independent international standard.
Time zones are expressed as offsets from UTC.
In Python, time zone information comes from the third-party pytz library.

In [None]:
import pytz

In [None]:
pytz.common_timezones[50:70]

['Africa/Tunis',
 'Africa/Windhoek',
 'America/Adak',
 'America/Anchorage',
 'America/Anguilla',
 'America/Antigua',
 'America/Araguaina',
 'America/Argentina/Buenos_Aires',
 'America/Argentina/Catamarca',
 'America/Argentina/Cordoba',
 'America/Argentina/Jujuy',
 'America/Argentina/La_Rioja',
 'America/Argentina/Mendoza',
 'America/Argentina/Rio_Gallegos',
 'America/Argentina/Salta',
 'America/Argentina/San_Juan',
 'America/Argentina/San_Luis',
 'America/Argentina/Tucuman',
 'America/Argentina/Ushuaia',
 'America/Aruba']

In [None]:
tz = pytz.timezone('America/Argentina/Tucuman')
tz

<DstTzInfo 'America/Argentina/Tucuman' LMT-1 day, 19:39:00 STD>

In [None]:
tz = pytz.timezone('America/Argentina/San_Luis')
tz

<DstTzInfo 'America/Argentina/San_Luis' LMT-1 day, 19:35:00 STD>

### Time Zone Localization and Conversion
Time series in pandas are time zone naive by default.

In [None]:
dates = pd.date_range("2021-03-04",periods = 5)

In [None]:
ts = pd.Series(np.random.standard_normal(len(dates)),index=dates)
ts

2021-03-04    0.924290
2021-03-05   -1.490790
2021-03-06    0.844316
2021-03-07    1.106616
2021-03-08   -1.543674
Freq: D, dtype: float64

In [None]:
print(ts.index.tz) # The date range was generated without a time zone set

None


In [None]:
# Creating a date range with a time zone set
tz_utc = pd.date_range("2020-01-09",periods=10,tz="UTC")
tz_utc

DatetimeIndex(['2020-01-09 00:00:00+00:00', '2020-01-10 00:00:00+00:00',
               '2020-01-11 00:00:00+00:00', '2020-01-12 00:00:00+00:00',
               '2020-01-13 00:00:00+00:00', '2020-01-14 00:00:00+00:00',
               '2020-01-15 00:00:00+00:00', '2020-01-16 00:00:00+00:00',
               '2020-01-17 00:00:00+00:00', '2020-01-18 00:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')

In [None]:
ts

2021-03-04    0.924290
2021-03-05   -1.490790
2021-03-06    0.844316
2021-03-07    1.106616
2021-03-08   -1.543674
Freq: D, dtype: float64

Conversions between time zones:
- tz_localize -> Attaches a time zone to naive timestamps, keeping the wall-clock time unchanged
- tz_convert -> Converts time zone-aware timestamps to another time zone, keeping the instant unchanged

In [None]:
ts
tz_utc_ = ts.tz_localize("UTC")
tz_utc_

2021-03-04 00:00:00+00:00    0.924290
2021-03-05 00:00:00+00:00   -1.490790
2021-03-06 00:00:00+00:00    0.844316
2021-03-07 00:00:00+00:00    1.106616
2021-03-08 00:00:00+00:00   -1.543674
Freq: D, dtype: float64

In [None]:
tz_utc_.tz_convert("America/Buenos_Aires")

2021-03-03 21:00:00-03:00    0.924290
2021-03-04 21:00:00-03:00   -1.490790
2021-03-05 21:00:00-03:00    0.844316
2021-03-06 21:00:00-03:00    1.106616
2021-03-07 21:00:00-03:00   -1.543674
Freq: D, dtype: float64
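
The difference between the two methods can be checked directly: localizing only labels the timestamps, while converting changes the wall-clock time but not the instant. A minimal sketch with a two-element series; the names are illustrative.

```python
import pandas as pd

naive = pd.Series([1, 2], index=pd.date_range("2021-03-04", periods=2))

# Same wall-clock time, now labelled as UTC
as_utc = naive.tz_localize("UTC")

# Same instants, shown in Buenos Aires local time
in_ba = as_utc.tz_convert("America/Argentina/Buenos_Aires")

print(as_utc.index[0], in_ba.index[0])
print(as_utc.index[0] == in_ba.index[0])   # both labels refer to the same instant
```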

In [None]:
no_utc  = pd.date_range("2022-02-01",periods=4)

In [None]:
buenos_aires_utc = ts.tz_localize("America/Buenos_Aires")
buenos_aires_utc

2021-03-04 00:00:00-03:00    0.924290
2021-03-05 00:00:00-03:00   -1.490790
2021-03-06 00:00:00-03:00    0.844316
2021-03-07 00:00:00-03:00    1.106616
2021-03-08 00:00:00-03:00   -1.543674
dtype: float64

In [None]:
# with_utc was never defined; assuming the localized series from the previous cell was intended
buenos_aires_utc.tz_convert("America/Argentina/San_Luis")

### Operations with Time Zone-Aware TimeStamp Objects
Timestamp objects can also be localized from naive to time zone-aware and converted from one time zone to another.

In [29]:
stamp = pd.Timestamp("2011-03-03 05:00")
utc_stamp = stamp.tz_localize("UTC").tz_convert("America/Argentina/San_Luis")
utc_stamp


Timestamp('2011-03-03 02:00:00-0300', tz='America/Argentina/San_Luis')
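
Time zone-aware Timestamp objects store a UTC value internally (nanoseconds since the Unix epoch), and that value does not change under conversion. A short sketch that checks this through the `value` attribute:

```python
import pandas as pd

stamp_utc = pd.Timestamp("2011-03-03 05:00", tz="UTC")
stamp_local = stamp_utc.tz_convert("America/Argentina/San_Luis")

print(stamp_utc.value == stamp_local.value)   # same nanoseconds since the epoch
print(stamp_local.hour)                       # only the wall-clock reading differs
```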

### Periods and Period Arithmetic
Periods represent time spans -> Days, months, quarters, years.
The pandas.Period class represents this data type.

In [30]:
p = pd.Period("2022","Y-DEC")
p

Period('2022', 'Y-DEC')

In [31]:
p + 2

Period('2024', 'Y-DEC')

In [32]:
pd.Period("2024",freq="Y-DEC")-p

<2 * YearEnds: month=12>
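
The same arithmetic works at any frequency, and every period exposes the boundaries of the span it covers. A sketch with monthly periods, picked here only because the alias "M" is stable across pandas versions:

```python
import pandas as pd

p = pd.Period("2022-01", freq="M")

later = p + 2                                  # shift by two months
gap = pd.Period("2022-06", freq="M") - p       # difference as a multiple of the frequency

print(later)
print(gap)
print(p.start_time, p.end_time)                # first and last instant of the span
```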

In [33]:
periods = pd.period_range("2000-01-01","2000-06-01",freq="M")
periods

PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]')

In [34]:
data = pd.Series(np.random.standard_normal(len(periods)),index=periods)
data

2000-01    1.741146
2000-02   -1.141041
2000-03    0.172894
2000-04    0.514657
2000-05    0.679588
2000-06    0.109751
Freq: M, dtype: float64

In [35]:
# Using PeriodIndex class -> All of the values are periods
values = ["2001Q3","2002Q2","2003Q3"]
index = pd.PeriodIndex(values,freq="Q-DEC")
index

PeriodIndex(['2001Q3', '2002Q2', '2003Q3'], dtype='period[Q-DEC]')

### Period Frequency Conversion
Period and PeriodIndex objects can be converted to another frequency with their asfreq method.
We can convert an annual period into a finer-grained period (monthly, daily, hourly) at either the start or the end of the year.
You can think of a Period as a sort of cursor pointing to a span of time, subdivided by smaller periods.

In [36]:
p = pd.Period("2022",freq="Y-DEC")
p

Period('2022', 'Y-DEC')

In [37]:
p.asfreq("h",how="start")

Period('2022-01-01 00:00', 'h')

In [38]:
p.asfreq("D",how="end")

Period('2022-12-31', 'D')
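
The conversion works in both directions. Going to a finer frequency needs `how` to pick the start or the end of the span; going to a coarser one picks the enclosing period. A sketch with monthly and daily periods (an assumption chosen for brevity):

```python
import pandas as pd

march = pd.Period("2022-03", freq="M")

first_day = march.asfreq("D", how="start")
last_day = march.asfreq("D", how="end")

# From a finer to a coarser frequency: the month that contains the day
enclosing = pd.Period("2022-03-15", freq="D").asfreq("M")

print(first_day, last_day, enclosing)
```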

### Quarterly period frequencies
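Quarterly data is standard in accounting, finance, and other fields. Much quarterly data is reported relative to a fiscal year end, typically the last calendar or business day of one of the 12 months of the year.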

In [39]:
p = pd.Period("2012Q4",freq="Q-JAN")
p

Period('2012Q4', 'Q-JAN')

In [40]:
# Fiscal year ending in January, so the fourth quarter runs from November through January
p.asfreq("D",how='start')

Period('2011-11-01', 'D')

In [41]:
p.asfreq("D",how="end")

Period('2012-01-31', 'D')
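
Listing all four quarters makes the fiscal-year convention easier to see. The sketch below prints the first and last day of each quarter of fiscal year 2012 under "Q-JAN"; `qyear` is the fiscal year a quarter belongs to.

```python
import pandas as pd

quarters = pd.period_range("2012Q1", "2012Q4", freq="Q-JAN")

for q in quarters:
    # first and last calendar day covered by each fiscal quarter
    print(q, q.asfreq("D", how="start"), q.asfreq("D", how="end"))

print(quarters[3].qyear, quarters[3].quarter)
```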

### Converting Timestamps to Periods (and Back)
Series and DataFrame objects indexed by timestamps can be converted to periods with the to_period method.

In [42]:
dates = pd.date_range("2014-01-01",periods=3,freq="ME")
dates

DatetimeIndex(['2014-01-31', '2014-02-28', '2014-03-31'], dtype='datetime64[ns]', freq='ME')

In [43]:
ts = pd.Series(np.random.standard_normal(len(dates)),index=dates)
ts

2014-01-31    1.377287
2014-02-28   -0.257653
2014-03-31    0.520616
Freq: ME, dtype: float64

In [44]:
pts = ts.to_period("Y")
pts

2014    1.377287
2014   -0.257653
2014    0.520616
Freq: Y-DEC, dtype: float64

In [45]:
# Several timestamps can fall into the same period, so duplicate periods are allowed in the result
dates= pd.date_range("2019-05-26",periods=10)
df_period = pd.Series(np.random.standard_normal(len(dates)),index=dates)

In [46]:
pts = df_period.to_period('M')
pts

2019-05   -0.266102
2019-05    1.133521
2019-05   -1.871074
2019-05    0.503907
2019-05   -0.466850
2019-05   -0.114891
2019-06   -0.066931
2019-06    0.887533
2019-06   -0.427700
2019-06   -0.338890
Freq: M, dtype: float64

In [47]:
#Converting back to time stamp
pts.to_timestamp(how="start")

2019-05-01   -0.266102
2019-05-01    1.133521
2019-05-01   -1.871074
2019-05-01    0.503907
2019-05-01   -0.466850
2019-05-01   -0.114891
2019-06-01   -0.066931
2019-06-01    0.887533
2019-06-01   -0.427700
2019-06-01   -0.338890
dtype: float64
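
If one row per period is wanted, the duplicate periods produced by `to_period` can be aggregated by grouping on the index. A sketch with deterministic data; summing is an arbitrary choice for the example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2019-05-30", periods=4)   # two days in May, two in June
daily = pd.Series(np.arange(4.0), index=dates)

# Convert to monthly periods, then collapse the repeated periods
monthly = daily.to_period("M").groupby(level=0).sum()
print(monthly)
```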

### Creating a PeriodIndex from Arrays
Sometimes fixed-frequency datasets are stored with time span information spread across multiple columns.
For example, the year and the quarter could be in different columns.

In [54]:
years = pd.date_range("2022",periods=10,freq="YE").to_period("Y")
quarters = np.random.randint(1,5,size=len(years))  # quarters 1..4; the upper bound is exclusive
df_year_quarter = pd.DataFrame({"year":years,"quarter":quarters,"realgdp":np.random.standard_normal(len(years)),"realgovt":np.random.standard_normal(len(years))})
df_year_quarter

   year  quarter   realgdp  realgovt
0  2022        2  1.211558 -0.184452
1  2023        2 -1.075024  1.230993
2  2024        1 -1.804687 -0.618291
3  2025        1 -1.170101 -0.507327
4  2026        2  0.252773  3.359104
5  2027        2 -0.325155 -0.021460
6  2028        2 -0.766725 -1.057505
7  2029        1  0.237990  0.938347
8  2030        3 -0.737763  0.471409
9  2031        1  0.729092 -0.610598


In [55]:
years

PeriodIndex(['2022', '2023', '2024', '2025', '2026', '2027', '2028', '2029',
             '2030', '2031'],
            dtype='period[Y-DEC]')

In [57]:
# from_fields needs integer years, so extract them from the period-typed column
index = pd.PeriodIndex.from_fields(year=df_year_quarter['year'].dt.year,quarter=df_year_quarter['quarter'],freq="Q-DEC")
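
An alternative that does not depend on `from_fields` is to build quarter labels such as "2022Q3" as strings and let PeriodIndex parse them. This is a sketch with a small hypothetical frame holding integer year and quarter columns, not the notebook's data.

```python
import pandas as pd

# Hypothetical data: integer year and quarter stored in separate columns
frame = pd.DataFrame({"year": [2022, 2022, 2023], "quarter": [3, 4, 1]})

# Build labels like "2022Q3" and parse them as quarterly periods
labels = (frame["year"].astype(str) + "Q" + frame["quarter"].astype(str)).tolist()
index = pd.PeriodIndex(labels, freq="Q-DEC")
print(index)
```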

### Resampling and frequency conversion
Resampling refers to the process of converting a time series from one frequency to another.
Downsampling -> Aggregating higher frequency data to a lower frequency
Upsampling -> Converting lower frequency data to a higher frequency
pandas objects provide the resample method for both.


In [61]:
dates = pd.date_range("2000-09-01",periods=2000)
ts = pd.Series(np.random.standard_normal(2000),index=dates)
ts

2000-09-01    0.277180
2000-09-02    0.761805
2000-09-03    0.013506
2000-09-04    0.869341
2000-09-05    0.461480
                ...   
2006-02-17    0.543173
2006-02-18    0.962202
2006-02-19   -1.376517
2006-02-20   -0.004335
2006-02-21    1.267418
Freq: D, Length: 2000, dtype: float64

In [68]:
ts.resample("YE").sum()  # "Y" is deprecated for resampling since pandas 2.2

2000-12-31   -30.628569
2001-12-31    11.489635
2002-12-31   -43.924417
2003-12-31    38.224807
2004-12-31    24.875504
2005-12-31   -22.203509
2006-12-31    -0.975762
Freq: YE-DEC, dtype: float64

In [71]:
# kind="period" and the "M" alias are deprecated since pandas 2.2; convert the result to periods instead
ts.resample("ME").mean().to_period("M")

2000-09   -0.252050
2000-10   -0.165332
2000-11   -0.513141
2000-12   -0.082179
2001-01   -0.147889
             ...   
2005-10   -0.218972
2005-11    0.089218
2005-12    0.030950
2006-01   -0.021771
2006-02   -0.014327
Freq: M, Length: 66, dtype: float64

### Downsampling
Aggregating data to a regular, lower frequency. The desired frequency defines bin edges that are used to slice the time series into pieces to aggregate.

In [72]:
dates = pd.date_range("2000-01-01",periods = 12,freq="min")

In [75]:
ts = pd.Series(np.random.standard_normal(12),index=dates)
ts

2000-01-01 00:00:00    0.175696
2000-01-01 00:01:00    1.138459
2000-01-01 00:02:00    0.554458
2000-01-01 00:03:00   -0.022394
2000-01-01 00:04:00   -1.109108
2000-01-01 00:05:00    0.184905
2000-01-01 00:06:00    0.937348
2000-01-01 00:07:00    1.261099
2000-01-01 00:08:00    0.827628
2000-01-01 00:09:00   -0.481473
2000-01-01 00:10:00   -0.815505
2000-01-01 00:11:00    0.169219
Freq: min, dtype: float64

In [78]:
# Aggregate this data into five-minute chunks or bars, here by counting the observations in each group
ts.resample("5min").count() # The frequency passed defines how the data is grouped

2000-01-01 00:00:00    5
2000-01-01 00:05:00    5
2000-01-01 00:10:00    2
Freq: 5min, dtype: int64

In [80]:
ts.resample("5min",closed="right").count() # The right bin edge is inclusive

1999-12-31 23:55:00    1
2000-01-01 00:00:00    5
2000-01-01 00:05:00    5
2000-01-01 00:10:00    1
Freq: 5min, dtype: int64

In [82]:
ts.resample("5min",closed="right",label="right").count() # you can label with right bin edge

2000-01-01 00:00:00    1
2000-01-01 00:05:00    5
2000-01-01 00:10:00    5
2000-01-01 00:15:00    1
Freq: 5min, dtype: int64
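
The effect of `closed` is easier to follow with values that can be added by hand. The sketch below repeats the resampling with the integers 0 to 11 (an assumption replacing the random data) and sums each bin.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(12),
               index=pd.date_range("2000-01-01", periods=12, freq="min"))

# Left-closed bins: [00:00, 00:05), [00:05, 00:10), [00:10, 00:15)
left = ts.resample("5min").sum()

# Right-closed bins: (23:55, 00:00], (00:00, 00:05], (00:05, 00:10], (00:10, 00:15]
right = ts.resample("5min", closed="right").sum()

print(left.tolist())
print(right.tolist())
```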

### Upsampling and interpolation
Converting from lower frequency to a higher one. No aggregation is needed.

In [85]:
df = pd.DataFrame(np.random.standard_normal((2,4)),index = pd.date_range("2002-04-05",periods=2,freq="W-WED"),columns=["Colorado","Texas","New York","Ohio"])
df

            Colorado     Texas  New York      Ohio
2002-04-10 -1.221048 -0.622091  1.291566 -0.259372
2002-04-17 -0.118697 -1.597851  1.581528  1.631301


In [87]:
df_daily = df.resample("D").asfreq()
df_daily

            Colorado     Texas  New York      Ohio
2002-04-10 -1.221048 -0.622091  1.291566 -0.259372
2002-04-11       NaN       NaN       NaN       NaN
2002-04-12       NaN       NaN       NaN       NaN
2002-04-13       NaN       NaN       NaN       NaN
2002-04-14       NaN       NaN       NaN       NaN
2002-04-15       NaN       NaN       NaN       NaN
2002-04-16       NaN       NaN       NaN       NaN
2002-04-17 -0.118697 -1.597851  1.581528  1.631301


We want to fill forward each weekly value on the non-Wednesdays.
The same filling and interpolation methods available in fillna and reindex are available for resampling.

In [91]:
df.resample("D").ffill() #Fills with the last seen value

            Colorado     Texas  New York      Ohio
2002-04-10 -1.221048 -0.622091  1.291566 -0.259372
2002-04-11 -1.221048 -0.622091  1.291566 -0.259372
2002-04-12 -1.221048 -0.622091  1.291566 -0.259372
2002-04-13 -1.221048 -0.622091  1.291566 -0.259372
2002-04-14 -1.221048 -0.622091  1.291566 -0.259372
2002-04-15 -1.221048 -0.622091  1.291566 -0.259372
2002-04-16 -1.221048 -0.622091  1.291566 -0.259372
2002-04-17 -0.118697 -1.597851  1.581528  1.631301
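
Forward filling does not have to cover the whole gap, and it is not the only way to fill. A sketch with a two-point weekly series (made-up values 0 and 7) showing a limited forward fill and linear interpolation:

```python
import pandas as pd

weekly = pd.Series([0.0, 7.0],
                   index=pd.date_range("2002-04-10", periods=2, freq="W-WED"))

# Fill at most two days forward; later days in the gap stay missing
limited = weekly.resample("D").ffill(limit=2)

# Linear interpolation between the two weekly observations
linear = weekly.resample("D").interpolate()

print(limited)
print(linear)
```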


### Resampling with periods
Resampling data indexed by periods is similar to resampling data indexed by timestamps.


In [104]:
len_filas= 36
df = pd.DataFrame(np.random.standard_normal((len_filas,4)),index=pd.period_range("1-2000","12-2002",freq="M"),columns=["Colorado","Texas","New York","Ohio"])
df

         Colorado     Texas  New York      Ohio
2000-01 -0.564307 -0.495542  0.233251  0.657542
2000-02  1.457021 -0.775012 -0.395933 -0.547389
2000-03 -0.234324  0.013667  1.035171 -1.261998
2000-04  0.020024 -1.419585 -0.511165 -1.025225
2000-05  1.017842 -0.058982  3.057284  0.309466
2000-06 -0.009116 -1.310084 -1.324950 -1.311602
2000-07 -0.303826 -1.032383 -0.849310 -1.126515
2000-08 -1.331738  2.370479 -1.954390  0.618653
2000-09 -0.388065 -1.276457 -0.878539 -1.840942
2000-10  0.898683 -0.589787  0.221265 -0.305899


In [105]:
annual_frame = df.resample("Y-DEC").mean()
annual_frame


      Colorado     Texas  New York      Ohio
2000  0.285210 -0.357941 -0.286669 -0.306269
2001  0.494397  0.394715 -0.101219 -0.516678
2002 -0.619428 -0.066704  0.334570  0.041215


### Grouped Time Resampling
For time series data, the resample method is semantically a group operation based on a time intervalization.



In [106]:
N = 15
times = pd.date_range("2017-01-01 12:00",freq="1min",periods=N)
df=pd.DataFrame({"time":times,"value":np.arange(N)})

In [107]:
df

                 time  value
0 2017-01-01 12:00:00      0
1 2017-01-01 12:01:00      1
2 2017-01-01 12:02:00      2
3 2017-01-01 12:03:00      3
4 2017-01-01 12:04:00      4
5 2017-01-01 12:05:00      5
6 2017-01-01 12:06:00      6
7 2017-01-01 12:07:00      7
8 2017-01-01 12:08:00      8
9 2017-01-01 12:09:00      9


In [110]:
# Index by time and then resample
df.set_index("time").resample("5min").sum()

                     value
time
2017-01-01 12:00:00     10
2017-01-01 12:05:00     35
2017-01-01 12:10:00     60
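
When the frame also carries a group key, the time bucketing can be combined with an ordinary groupby through `pd.Grouper`. The sketch below assumes an extra hypothetical column named `key`; the time column has to be the index for `pd.Grouper(freq=...)` to apply to it.

```python
import numpy as np
import pandas as pd

N = 15
times = pd.date_range("2017-01-01 12:00", freq="1min", periods=N)
df2 = pd.DataFrame({
    "time": times.repeat(3),              # each minute appears once per key
    "key": np.tile(["a", "b", "c"], N),   # hypothetical group label
    "value": np.arange(N * 3.0),
})

time_key = pd.Grouper(freq="5min")        # five-minute buckets on the index
resampled = df2.set_index("time").groupby(["key", time_key]).sum()
print(resampled)
```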


### Moving window functions
Moving window functions are an important class of array transformations used for time series operations.
They can be useful for smoothing noisy or gappy data.
Like other statistical functions, they automatically exclude missing data.
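
A minimal sketch of the `rolling` method on a small made-up series, showing a fixed-size window, a window that tolerates fewer observations at the start, and a time-based window:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range("2020-01-01", periods=5))

full = s.rolling(3).mean()                    # NaN until three observations are available
partial = s.rolling(3, min_periods=1).mean()  # allow shorter windows at the start
by_time = s.rolling("2D").sum()               # window covering the last two days

print(full)
print(partial)
print(by_time)
```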