## DON'T FORGET

There is a reason for having both M and MS (and Q and QS) as frequency offsets. When resampling or calling asfreq, M or Q labels each period (bin) with its last day, while MS or QS labels it with its first day. Some functions, such as the *_range() family, have no label='...' argument, so MS or QS is the way to get start-of-period labels there. In resampling, the choice only affects how the period (bin) is labelled.
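
A minimal sketch of the difference, assuming the offset classes `pd.offsets.MonthEnd` and `pd.offsets.MonthBegin` as stand-ins for the `M` and `MS` aliases (offset objects are used because the accepted spelling of the end-anchored string alias differs between pandas versions). The dates are illustrative only.

```python
import pandas as pd

# MonthEnd corresponds to the month-end alias, MonthBegin to 'MS'.
month_end = pd.date_range("2020-01-01", "2020-03-31", freq=pd.offsets.MonthEnd())
month_start = pd.date_range("2020-01-01", "2020-03-31", freq=pd.offsets.MonthBegin())

print(month_end)    # labels fall on the last day of each month
print(month_start)  # labels fall on the first day of each month
```

Both ranges describe the same three months; only the label attached to each month differs.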

### SM is not the same as MS: SM is semi-month end (15th and month end), MS is month start. SMS is semi-month start, and there is no SQ counterpart to QS

In [None]:
import pandas as pd
df = pd.read_csv('./nasdaq_goog.csv', parse_dates=['Date'], index_col=['Date'])
df.head(20)

In [None]:
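df.loc['2020-07', 'Close'].mean()   # partial-string row selection goes through .loc; newer pandas versions no longer accept df['2020-07'] for rows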

In [None]:
df.loc['2020-07' : '2020-08'].head(10)

In [None]:
df['Close'].resample('M').mean()    # downsampling: closed='left'|'right', label='left'|'right' (default is 'left' for most frequencies, but 'right' for M, Q, A, BM, BQ, BA and W) ;;; upsampling a PeriodIndex: convention='start'|'end' (places the data point, the rest will be NaN)
# newer pandas versions (2.2+) spell the month-end alias 'ME' and warn about 'M'
# resampling splits the data into bins; the number of bins depends on the resampling frequency. An aggregate function on resampled data is applied to each bin
# without an aggregate function, this returns a DatetimeIndexResampler object, which behaves much like a groupby object (it can be iterated as (label, group) pairs)
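
A small sketch of the binning described above, on made-up data (60 consecutive days with values 0 to 59, not the NASDAQ file). It uses `'MS'` so the bins are calendar months labelled by their first day.

```python
import pandas as pd

# Hypothetical data: 60 consecutive days starting 2020-01-01, values 0..59.
s = pd.Series(range(60), index=pd.date_range("2020-01-01", periods=60, freq="D"))

resampler = s.resample("MS")      # one bin per calendar month, labelled by month start
monthly_mean = resampler.mean()   # the aggregate runs once per bin

# Like a groupby object, the resampler yields (label, group) pairs.
bin_sizes = {label.strftime("%Y-%m"): len(group) for label, group in resampler}

print(monthly_mean)
print(bin_sizes)
```

January 2020 holds 31 values and February 29, so the two bins have different sizes even though each produces a single aggregated row.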

### Date Range

In [None]:
df1 = df    # df1 is another name for the same object, not a copy
df1.reset_index(drop=True, inplace=True)    # so this also drops the Date index of df
df1

In [None]:
date_rng = pd.date_range(start='2020-01-01', end='2020-12-31', freq='B')    # 'B' only eliminates Weekends, NOT the occasional holidays
print(date_rng)


In [None]:
pd.date_range(start='2020-01-12', end='2020-12-31', freq='Q') 

## OFFSET aliases
https://pandas.pydata.org/pandas-docs/version/0.22/timeseries.html#offset-aliases

In [None]:
print(pd.date_range(start='2020-01-12', end='2020-12-31', freq='QS'))  # a quarter whose start label (2020-01-01 here) falls before the start date is not included
print(pd.date_range(start='2020-01-12', end='2020-12-31', freq='SMS'))  # there is no SQ offset, because the middle of a quarter is not a useful anchor
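
To make the semi-month offsets concrete, here is a short sketch that compares them with month start over the same window. It assumes the offset classes `pd.offsets.SemiMonthEnd`, `pd.offsets.SemiMonthBegin` and `pd.offsets.MonthBegin` as the object forms of `SM`, `SMS` and `MS`; the window is arbitrary.

```python
import pandas as pd

start, end = "2020-01-12", "2020-02-20"

# SemiMonthEnd ('SM'): the 15th and the last day of each month.
semi_month_end = pd.date_range(start, end, freq=pd.offsets.SemiMonthEnd())
# SemiMonthBegin ('SMS'): the 1st and the 15th of each month.
semi_month_begin = pd.date_range(start, end, freq=pd.offsets.SemiMonthBegin())
# MonthBegin ('MS'): the 1st of each month only.
month_begin = pd.date_range(start, end, freq=pd.offsets.MonthBegin())

for name, rng in [("SM", semi_month_end), ("SMS", semi_month_begin), ("MS", month_begin)]:
    print(name, list(rng.strftime("%Y-%m-%d")))
```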

In [None]:
df.set_index(date_rng[:len(df)], inplace=True)
df

In [None]:
%matplotlib inline
df['Close'].plot()  # kind='bar'

In [None]:
df.asfreq('M', method='ffill')  # without method, dates missing from the original index get NaN. fill_value=<some_default_value> is also available
# 'ffill'|'pad' and 'bfill'|'backfill' take the nearest previous/next value from the original df, not from the newly generated frame
# asfreq works like a filter: it keeps the rows that fall on the new frequency. An aggregate applied afterwards runs over all selected rows and gives a single answer
# during upsampling, each original value stays attached to its original timestamp. how='start'|'end' only works with a PeriodIndex, not with any other index type
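
The filling behaviour of `asfreq` is easiest to see on a tiny series. The sketch below uses three made-up observations (a Friday, then the following Monday and Tuesday) and upsamples them to daily frequency with and without `method='ffill'`.

```python
import pandas as pd

# Hypothetical observations: Fri 2020-01-03, Mon 2020-01-06, Tue 2020-01-07.
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2020-01-03", "2020-01-06", "2020-01-07"]),
)

daily_nan = s.asfreq("D")                    # weekend rows are created and left as NaN
daily_ffill = s.asfreq("D", method="ffill")  # weekend rows reuse Friday's value

print(daily_nan)
print(daily_ffill)
```

No aggregation happens here: `asfreq` only selects or creates rows at the requested frequency.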

In [None]:
pd.date_range(start='1/1/2020', periods=20, freq='M')   # use periods when the end date is unknown but the number of dates to generate is known

In [None]:
# GENERATING RANDOM DATA
rng = pd.date_range(start='1947-12-18', periods=50, freq='B')
import numpy as np
dum_df = pd.DataFrame(np.random.randint(1, 10, len(rng)), index=rng)
dum_df.head(10)

In [None]:
pd.date_range(start='1947-11-18', periods=50, freq='Q')

In [None]:
pd.date_range(start='1947-10-18', periods=50, freq='Q-FEB') # remember that there is also QS-FEB
# date_range has no label argument, which is why the QS variants exist

## HOLIDAY CALENDAR for handling the extra holiday dates in a date_range with 'B' frequency

In [None]:
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

usb = CustomBusinessDay(calendar=USFederalHolidayCalendar())
usb

In [None]:
pd.date_range(start='7/1/2017', end='7/21/2017', freq=usb)  # 2017-07-04 (Independence Day) is skipped along with the weekends
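
When no ready-made calendar fits, `CustomBusinessDay` also accepts an explicit list of holidays. The sketch below passes a single hand-picked date (an assumption for illustration, not a full calendar) and compares the result with plain business days.

```python
import pandas as pd

# Plain business days: weekends removed, holidays kept.
plain_bdays = pd.date_range("2017-07-01", "2017-07-21", freq=pd.offsets.BDay())

# Custom business days with one explicitly listed holiday (illustrative only).
custom = pd.offsets.CustomBusinessDay(holidays=["2017-07-04"])
custom_bdays = pd.date_range("2017-07-01", "2017-07-21", freq=custom)

# Dates that the holiday list removed.
skipped = plain_bdays.difference(custom_bdays)
print(len(plain_bdays), len(custom_bdays), list(skipped))
```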

## to_datetime

In [None]:
dates = ['2017-01-05', 'Jan 5, 2017', '01/05/2017', '2017.01.05', '2017/01/05', '20170105']
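pd.to_datetime(dates)   # pandas >= 2.0 infers one format from the first element; pass format='mixed' there to parse differently formatted strings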

In [None]:
dates = ['2017-01-05 2:30:00', '2017-01-05 2:30:00 PM', 'Jan 5, 2017 14:30:00', '01/05/2017', '2017.01.05', '2017/01/05', '20170105']
pd.to_datetime(dates)

In [None]:
dates = ['2017-01-05', 'Jan 5, 2017', '01/05/2017', '2017.01.05', '2017/01/05', '20170105']
pd.to_datetime(dates, dayfirst=True)

In [None]:
pd.to_datetime('5#1#2017', format='%d#%m#%Y')

In [None]:
dates = ['2017-01-05', 'Jan 5, 2017', '01/05/2017', '2017.01.05', '2017/01/05', '20170105', 'abc']
pd.to_datetime(dates, errors='coerce')
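
The three parsing controls used above (`format`, `dayfirst`, `errors`) can be compared side by side. This is a small sketch on made-up strings; each call uses only one string format so it does not depend on how a given pandas version handles mixed formats.

```python
import pandas as pd

# An explicit format removes any ambiguity about day/month order.
explicit = pd.to_datetime("05/01/2017", format="%d/%m/%Y")   # 5 January 2017

# dayfirst=True is a parsing hint for ambiguous strings, not a strict format.
day_first = pd.to_datetime("05/01/2017", dayfirst=True)      # also 5 January 2017

# errors='coerce' turns unparseable entries into NaT instead of raising.
coerced = pd.to_datetime(["2017-01-05", "abc"], errors="coerce")

print(explicit, day_first)
print(coerced)
```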

In [None]:
# making datetime via epoch
t = 1501356749
pd.to_datetime(t, unit='s') # the default unit is 'ns', so it must be overridden here: a Unix epoch value counts seconds since 1970-01-01

In [None]:
t = 1501356749
pd.to_datetime([t], unit='s')

In [None]:
t = 1501356749
dt = pd.to_datetime([t], unit='s')
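dt.astype('int64')  # back to integers in the index's own resolution (nanoseconds by default); astype replaces the deprecated DatetimeIndex.view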
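
To get back to whole seconds regardless of the index resolution, one option is to subtract the epoch and floor-divide by a one-second `Timedelta`. A minimal sketch using the same epoch value as above:

```python
import pandas as pd

t = 1501356749                      # seconds since 1970-01-01 00:00:00 UTC
ts = pd.to_datetime(t, unit="s")    # naive Timestamp holding the UTC wall time

# Round trip: subtract the epoch and count whole seconds.
seconds = (ts - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)

print(ts)
print(seconds)
```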

## period; period index

A Period represents a span of time rather than a single instant, so there are no start-anchored frequencies such as QS or MS. Only Q, M and the other span frequencies apply.

In [None]:
y = pd.Period('2016')
y   # in o/p, A-DEC means that period is Annual, ending with December

In [None]:
dir(y)

In [None]:
y.start_time

In [None]:
y.end_time

In [None]:
m = pd.Period('2011-01', freq='M')
m

In [None]:
m.start_time

In [None]:
m.end_time

In [None]:
# integers can be added to or subtracted from a Period, and two Periods of the same frequency can be subtracted from each other
m+13
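
A short sketch of Period arithmetic at two frequencies. The gap between two periods is computed here from the `ordinal` attribute, which is just one way to obtain it; the dates are illustrative.

```python
import pandas as pd

m = pd.Period("2011-01", freq="M")
later = m + 13                       # 13 months after January 2011
gap = later.ordinal - m.ordinal      # number of monthly steps between the two

d = pd.Period("2016-02-28", freq="D")
next_day = d + 1                     # 2016 is a leap year, so this is Feb 29

print(later, gap, next_day)
```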

In [None]:
d = pd.Period('2016-02-28', freq='D')   # freq='D' is also what would be inferred from a plain date string
d

In [None]:
d+1 # gives 2016-02-29: Period arithmetic is aware of the leap year

In [None]:
d0 = pd.Period('2017-01-09 11:50:34.345')       # for fractional seconds, the freq is inferred from the number of digits: 1-3 digits give 'L' (ms), 4-6 give 'U' (us), 7-9 give 'N' (ns)
d0+1010

### Working with quarters

With pd.Period, the quarterly frequency defaults to 'Q-DEC'. The quarter number is appended to the year as YYYYQn. With a custom frequency such as 'Q-JAN', the year is a fiscal year named after the calendar year in which its fourth quarter ends, so YYYYQn means quarter n of the fiscal year whose Q4 ends in YYYY. The period is still labelled YYYYQn even when its dates fall in the previous calendar year.

In [None]:
q = pd.Period('2017Q1') # Q should be appended by number 1-4 (both inclusive), otherwise error will be thrown
q

In [None]:
q = pd.Period('2017Q2')
q+1

In [None]:
print(q.start_time, q.end_time)

In [None]:
q2 = pd.Period('2017Q1', freq='Q-JAN')  # shifting quarter ending time
q2  # remember: although it runs from 2016-02-01 to 2016-04-30, it is still labelled (2017Q1, Q-JAN)

In [None]:
print(q2.start_time, q2.end_time)

In [None]:
# asfreq() on a Period or PeriodIndex is a little different from asfreq() on datetime data
q2.asfreq('M', how='start') # changes the frequency to monthly. how='start'|'end' decides which month is returned: the one at start_time or the one at end_time

In [None]:
q2.asfreq('M', how='end')
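
The fiscal-quarter labelling can be checked directly. The sketch below repeats the `Q-JAN` example and inspects the boundaries of the quarter and the months returned by `asfreq`.

```python
import pandas as pd

# Fiscal year 2017 ends in January 2017, so its first quarter is Feb-Apr 2016.
q = pd.Period("2017Q1", freq="Q-JAN")

first_month = q.asfreq("M", how="start")   # month containing q.start_time
last_month = q.asfreq("M", how="end")      # month containing q.end_time

print(q.start_time, q.end_time)
print(first_month, last_month)
```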

## Period Index

In [None]:
pd.period_range('2011', '2017', freq='Q')   # by default, both the YYYY years will be taken as YYYYQ1.

In [None]:
pd.period_range(start=pd.Period('2011Q1'), end=pd.Period('2017Q3'), freq='Q')   # custom quarter range

In [None]:
pd.period_range(start=pd.Period('2011Q1', freq='Q-FEB'), end=pd.Period('2017Q1', freq='Q-FEB'), freq='Q')
# Observe that the output runs from 2010Q2 to 2016Q2. The Q-FEB periods cover 01 Mar 2010 to 31 May 2016, and these dates are then re-expressed in the outer frequency 'Q'
# The inner frequency is resolved before the outer frequency is applied. Hence, the outer freq only sees these dates and forms Q-DEC quarters

In [None]:
pd.period_range(start=pd.Period('2011Q1', freq='Q-FEB'), end=pd.Period('2017Q1', freq='Q-MAY'), freq='Q-NOV')

In [None]:
pd.period_range('2011Q1', '2017Q4', freq='Q-JAN')
# pd.period_range(start=pd.Period('2011Q1', freq='Q'), end=pd.Period('2017Q1', freq='Q'), freq='Q-JAN')

In [None]:
pd.period_range('2011', '2017', freq='Q-JAN')[4].end_time

In [None]:
lux = pd.period_range('2011', periods=10, freq='Q-AUG') # for Q-AUG, fiscal year 2011 ends in Aug 2011: 2011Q1 is Sep-Nov 2010 and 2011Q2 is Dec 2010-Feb 2011. Since '2011' is read as 2011-01-01, the range starts at 2011Q2.
lux

In [None]:
lux[0].start_time

In [None]:
import numpy as np
ps = pd.Series(np.random.rand(len(lux)), index=lux)
ps

In [None]:
ps['2010']  # partial-string indexing looks at the underlying dates, NOT at the fiscal label

In [None]:
ps['2011']  # 2011Q2 runs from Dec 2010 to Feb 2011, so it shows up under both '2010' and '2011'

In [None]:
ps['2011':'2013']

In [None]:
pst = ps.to_timestamp() # a period (a range) is converted to a point in time, so how='start'(default)|'end' chooses which end to use. freq='...' can also be given
pst

In [None]:
pst.index

In [None]:
pst.to_period() # freq='......' can also be given
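
The conversion between periods and timestamps can be sketched as a round trip. The series below uses made-up values on a four-quarter `Q-AUG` index; the frequency is passed explicitly to `to_period` so nothing has to be inferred.

```python
import pandas as pd

quarters = pd.period_range("2011", periods=4, freq="Q-AUG")
ps = pd.Series(range(4), index=quarters)   # hypothetical values 0..3

starts = ps.to_timestamp()                 # how='start' is the default
ends = ps.to_timestamp(how="end")          # last instant of each quarter
back = starts.to_period("Q-AUG")           # back to fiscal quarters

print(quarters[0], starts.index[0], ends.index[0])
print(back.index)
```

The first label is 2011Q2 because January 2011 falls in the second quarter of the fiscal year that ends in August 2011.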

In [None]:
df69 = pd.read_csv('./wmt.csv')
df69

In [None]:
df69.set_index('Line Item', inplace=True)

In [None]:
df69 = df69.T   # transpose

In [None]:
df69

In [None]:
df69.index  # it is of type object. We need to convert it to period type

In [None]:
df69.index = pd.PeriodIndex(df69.index, freq='Q-JAN')
df69.index

In [None]:
df69['start date'] = df69.index.map( lambda x : x.start_time)
df69

In [None]:
df69['end date'] = df69.index.map( lambda x : x.end_time.date())   # date is a method, so it must be called
df69

# TZ
Two types: naive datetime (not aware of any TZ) and TZ-aware datetime

In [None]:
df = pd.read_csv('./nasdaq_goog.csv', index_col='Date', parse_dates=True)
df

In [None]:
# converting naive datetime to some timezone aware datetime
df = df.tz_localize(tz='US/Eastern')
df.index

In [None]:
from pytz import all_timezones
all_timezones

In [None]:
# df = df.tz_localize(tz='Asia/Calcutta')   # tz_localize only attaches a TZ to naive data; calling it again on TZ-aware data raises TypeError
# to move TZ-aware data to another zone, we need to use tz_convert
df = df.tz_convert('Asia/Calcutta') # None can also be passed: it converts to UTC and drops the TZ information
# df.index = df.index.tz_convert('Asia/Calcutta')       # same effect
df
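
The difference between `tz_localize` and `tz_convert` can be sketched on a two-row series. The values and timestamps are made up, and the canonical zone names `America/New_York` and `Asia/Kolkata` are used here as an assumption in place of the `US/Eastern` and `Asia/Calcutta` aliases.

```python
import pandas as pd

naive = pd.Series(
    [1, 2],
    index=pd.to_datetime(["2020-01-01 09:30", "2020-01-02 09:30"]),
)

eastern = naive.tz_localize("America/New_York")   # attach a zone, wall time unchanged
india = eastern.tz_convert("Asia/Kolkata")        # same instants, different wall time

# Localizing data that is already TZ-aware is rejected.
try:
    eastern.tz_localize("Asia/Kolkata")
    relocalize_error = None
except TypeError as exc:
    relocalize_error = exc

print(eastern.index[0], india.index[0])
print(relocalize_error)
```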

In [None]:
#  normally date_range creates naive datetimeIndex
rng = pd.date_range(start='1/1/2017', periods=10, freq='H')
rng

In [None]:
rng = pd.date_range(start='1/1/2017', periods=10, freq='H', tz='Asia/Calcutta')
rng

## arithmetic between two different TZs

In [None]:
rng = pd.date_range(start='2017-08-22 09:00:00', periods=10, freq='30min')  # '30min' is accepted by old and new pandas; the 'T' alias is deprecated in newer versions
s = pd.Series(range(10), index=rng)
s

In [None]:
b = s.tz_localize(tz='Europe/Berlin')
b.index

In [None]:
m = s.tz_localize(tz='Asia/Calcutta')
m

In [None]:
b+m # both indexes are converted to UTC and values are added only where the UTC instants coincide. All other rows are NaN
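
The alignment can be verified by working out the overlap by hand. In this sketch of the same setup, 09:00 in Berlin (UTC+2 in August) is 07:00 UTC and 09:00 in India (UTC+5:30) is 03:30 UTC, so only three UTC instants are shared. `Asia/Kolkata` is used as the canonical name for `Asia/Calcutta`.

```python
import pandas as pd

rng = pd.date_range("2017-08-22 09:00:00", periods=10, freq="30min")
s = pd.Series(range(10), index=rng)

berlin = s.tz_localize("Europe/Berlin")   # covers 07:00-11:30 UTC
india = s.tz_localize("Asia/Kolkata")     # covers 03:30-08:00 UTC

total = berlin + india                    # aligned on UTC instants
overlap = total.dropna()                  # rows present in both series

print(total.index.tz)
print(overlap)
```

The shared instants are 07:00, 07:30 and 08:00 UTC, where the Berlin values 0, 1, 2 meet the India values 7, 8, 9.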

### Shifting and Lagging

In [None]:
df = pd.read_csv('./nasdaq_goog.csv', parse_dates=['Date'], index_col='Date')
df.index = df.index.date
print(df.index)
df.index = pd.to_datetime(df.index)
df = df[['Open']]
df

In [None]:
df.shift(1) # shifts the data one row down: the last value drops off and the first row becomes NaN. shift can be called on both a DataFrame and a Series

In [None]:
df.shift(2)

In [None]:
df.shift(-2)    # shifts 2 cells up

In [None]:
# application: to check change in price
df['Prev Day Price'] = df['Open'].shift(1)
df

In [None]:
df['Price Change in 1 day'] = df['Open'] - df['Prev Day Price']   # today minus previous day, so a price rise is positive
df
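
The shift-and-subtract pattern has a built-in shortcut. The sketch below uses three made-up prices to show that `diff()` gives the same result as subtracting the shifted series, and that `pct_change()` gives the relative change.

```python
import pandas as pd

# Hypothetical opening prices on three consecutive days.
prices = pd.Series(
    [100.0, 102.0, 101.0],
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

manual = prices - prices.shift(1)   # today minus the previous day
builtin = prices.diff()             # same calculation in one call
relative = prices.pct_change()      # change as a fraction of the previous value

print(manual)
print(relative)
```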

In [None]:
# now instead of shifting of data points, we will shift the dates
df = df[['Open']]
df.index

In [None]:
df.index = pd.to_datetime(df.index)
df

In [None]:
df.index        # its freq is None although the dates are business days, so a date shift has no frequency to move the dates by

In [None]:
df.index = pd.date_range(start=df.index.min(), periods=len(df.index), freq='B')
df.index    # index has now frequency

In [None]:
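df.shift(1, freq='B')    # shifts the dates instead of the data (tshift was deprecated and later removed from pandas). Negative values are also supported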
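
The two kinds of shifting can be put next to each other on a tiny frame. This sketch uses three made-up prices on business days (Wednesday to Friday) and `shift` with and without the `freq` argument.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=3, freq="B")   # Wed, Thu, Fri
df = pd.DataFrame({"Open": [10.0, 11.0, 12.0]}, index=idx)

shifted_values = df.shift(1)            # data moves down, index stays
shifted_index = df.shift(1, freq="B")   # index moves one business day, data stays

print(shifted_values)
print(shifted_index)
```

With `freq='B'`, Friday moves to the following Monday, and no value is lost or replaced by NaN.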

# Corey's

In [None]:
import datetime

In [None]:
d = datetime.date(2016,7,24)    # pass plain integers: a leading zero in the day or month (07) is a syntax error in Python 3
print(d)
d

In [None]:
tday = datetime.date.today()
tday

In [None]:
print(tday.year, tday.day, tday.month)

In [None]:
print( tday.isoweekday(), tday.weekday())   # iso::Monday:1 & Sunday:7      #normal::Monday:0 & Sunday:6

In [None]:
tdelta = datetime.timedelta(days=7) # timedelta gives a duration of time in certain units
tdelta

In [None]:
# date = date <operator> timedelta
# timedelta = date <operator> date

In [None]:
print(tday + tdelta)
print(tday - tdelta)

In [None]:
bday = datetime.date(2020, 7, 15)
till_bday = bday - tday
print(till_bday)
print(till_bday.days)
print(till_bday.total_seconds())
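
Subtracting a past date gives a negative timedelta, so a countdown usually needs to roll over to the next year. A minimal sketch, where `days_until_birthday` is a hypothetical helper (not part of `datetime`) and a birthday on 29 February is deliberately not handled:

```python
import datetime


def days_until_birthday(today, month, day):
    """Return the number of days from `today` to the next month/day date."""
    candidate = datetime.date(today.year, month, day)
    if candidate < today:
        # This year's date has passed, so use next year's.
        candidate = datetime.date(today.year + 1, month, day)
    return (candidate - today).days


print(days_until_birthday(datetime.date(2020, 7, 1), 7, 15))
print(days_until_birthday(datetime.date(2020, 7, 16), 7, 15))
```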

In [None]:
t = datetime.time(9, 30, 45, 100000)
print(t)
print(t.hour)

In [None]:
t = datetime.datetime(2016,7,26,12,30,45,100000)
print(t)
print(t.date())
print(t.time())
print(t.year)
print(t.hour)

In [None]:
tdelta = datetime.timedelta(days=7)
print(t + tdelta)

In [None]:
tdelta = datetime.timedelta(hours=12)
print(t + tdelta)

In [None]:
dt_today = datetime.datetime.today()   # current local datetime with no TZ info: naive datetime
dt_now1 = datetime.datetime.now()   # without a tz argument it works like today(): naive local datetime
# dt_now2 = datetime.datetime.now(tz=...)   # with a tz argument, gives the current time in that TZ: TZ-aware datetime
dt_utcnow1 = datetime.datetime.utcnow() # current UTC time without any TZ info: naive datetime (deprecated since Python 3.12)
# utcnow() takes no tz argument; for a TZ-aware UTC datetime use datetime.datetime.now(tz=datetime.timezone.utc)
print(dt_today)
print(dt_now1)
print(dt_utcnow1)
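
The practical consequence of naive versus aware is that the two kinds cannot be compared with each other. A short standard-library sketch, using `datetime.timezone.utc` so that no third-party package is needed:

```python
import datetime

naive_local = datetime.datetime.now()                       # no tzinfo attached
aware_utc = datetime.datetime.now(datetime.timezone.utc)    # tzinfo is UTC

print(naive_local.tzinfo)      # None
print(aware_utc.utcoffset())   # 0:00:00

# Ordering a naive datetime against an aware one is an error.
try:
    naive_local < aware_utc
    mixing_error = None
except TypeError as exc:
    mixing_error = exc

print(mixing_error)
```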

In [None]:
import pytz
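# pytz is a widely used third-party package for time zones (Python 3.9+ also ships zoneinfo in the standard library). Keeping datetimes TZ-aware in UTC is the usual advice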

dt = datetime.datetime(2016, 7, 27, 12, 30, 45, tzinfo=pytz.UTC)
print(dt)

# useful for getting current time
dt_now = datetime.datetime.now(tz=pytz.UTC)
print(dt_now)

# works, but is roundabout: utcnow() returns a naive datetime, so the tzinfo has to be attached afterwards
dt_utcnow = datetime.datetime.utcnow().replace(tzinfo=pytz.UTC)
print(dt_utcnow)


In [None]:
dt_now = datetime.datetime.now(tz=pytz.UTC)
print(dt_now)

dt_ind = dt_now.astimezone(pytz.timezone('Asia/Calcutta'))   # astimezone converts a TZ-aware datetime to another zone. A naive datetime would be treated as system local time (Python 3.6+), so dt_now was created TZ-aware above
print(dt_ind)

In [None]:
for tz in pytz.all_timezones:
    print(tz)

In [None]:
dt_mtn = datetime.datetime.now()    # naive datetime
mtn_tz = pytz.timezone('US/Mountain')   # fetched TZ

dt_mtn = mtn_tz.localize(dt_mtn)    # converted naive datetime to TZ aware datetime
print(dt_mtn)   # prints the TZ-aware datetime

# without the localize step, astimezone would treat the naive value as system local time (Python 3.6+) or raise an error (older versions)
dt_east = dt_mtn.astimezone(pytz.timezone('US/Eastern'))
print(dt_east)

# below we see how to convert the one time of one TZ to another TZ
dt_ind = dt_east.astimezone(pytz.timezone('Asia/Calcutta'))
print(dt_ind)
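
With pytz, the zone should be attached through `localize` rather than through `tzinfo=` or `replace`. The sketch below uses a fixed, made-up wall time so the results are reproducible, and shows both the conversions and the offset mismatch that attaching the zone directly produces.

```python
import datetime
import pytz

mountain = pytz.timezone("US/Mountain")
naive = datetime.datetime(2016, 7, 27, 12, 30)   # hypothetical wall time

aware = mountain.localize(naive)                 # picks the offset valid on that date (MDT)
east = aware.astimezone(pytz.timezone("US/Eastern"))
india = aware.astimezone(pytz.timezone("Asia/Calcutta"))

# Attaching the pytz zone directly skips that lookup and yields a different offset.
attached = naive.replace(tzinfo=mountain)

print(aware, east, india, sep="\n")
print(aware.utcoffset(), attached.utcoffset())
```

For UTC this distinction does not matter, which is why `tzinfo=pytz.UTC` is safe.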

In [None]:
dt_mtn = datetime.datetime.now(tz=pytz.timezone('US/Mountain'))
print(dt_mtn.isoformat())
print(dt_mtn.strftime('%B %d, %Y'))


dt_str = 'August 13, 2021'
dt = datetime.datetime.strptime(dt_str, '%B %d, %Y')
print(dt)
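
Formatting and parsing are inverses when the same format string is used. A small standard-library sketch, assuming the default English month names (`%B` depends on the active locale):

```python
import datetime

fmt = "%B %d, %Y"

parsed = datetime.datetime.strptime("August 13, 2021", fmt)   # string -> datetime
formatted = parsed.strftime(fmt)                              # datetime -> string

# ISO 8601 strings round-trip without a format string.
iso = parsed.isoformat()
restored = datetime.datetime.fromisoformat(iso)

print(formatted)
print(iso)
```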