# Introduction
<b>pandas</b> is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.
<br>
<br>
<br>
<b>Interesting Read</b> : [mlcourse.ai : EDA with Pandas](https://mlcourse.ai/articles/topic1-exploratory-data-analysis-with-pandas/)

<img src='https://habrastorage.org/webt/ia/m9/zk/iam9zkyzqebnf_okxipihkgjwnw.jpeg' width='300' align='left'>
<br>
<br>
<br>
<br>
<br>
<b>Much of this notebook has been adapted from the `pandas` docs</b>

# Imports

In [None]:
import numpy as np  # used below for random data and ranges
import pandas as pd
from datetime import datetime
from pytz import all_timezones

In [None]:
pd.__version__

In [None]:
?pd

# Load and Explore the Dataset

In [None]:
root_path = '../'
raw_datapath = root_path+'Raw Data/'
prepared_datapath = root_path+'Prepared Data/'

britannia_datapath = raw_datapath+'BRITANNIA.NS.csv'
mpc61_datapath = raw_datapath+'MPC61.txt'

# Pandas DataStructures

`pandas` stores data in labeled, rectangular (tabular) structures.

Broadly, `pandas` has two main data structures:
- [Series](https://pandas.pydata.org/docs/user_guide/dsintro.html#series) : "Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.)."
- [DataFrame](https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe) : "DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects."

## Series

<b>Series - Generating One</b>

In [None]:
data_size=20
sdata=np.random.normal(20, scale=5.0, size=(data_size,))
sindex=np.arange(20)
pd_series = pd.Series(sdata, 
                      index=sindex)
pd_series

<b>Series - Indexing</b>

In [None]:
pd_series.iloc[-1]

**Q)** How do you get all the values in the series after index 15?

In [None]:
pd_series.loc[15]

In [None]:
#----------YOUR SOLUTION-----------#
#....


<b>Series - Query</b>

In [None]:
pd_series[pd_series>20]

<b>Series - Stats</b>

In [None]:
pd_series.describe()

<b>Series - To a Dictionary</b>

In [None]:
pd_series.to_dict()

<b>Series - Numpy Like Vectorized Operations</b>

In [None]:
pd_series*pd_series

<b>Series - Giving it a name</b>

In [None]:
pd_series.name = 'Random Normal Series'
pd_series

<b>Series - To a DataFrame</b>

In [None]:
pd_series.to_frame()

## DataFrame

In [None]:
# What will happen to the name that we gave to the series above?
dfdata_dict = {'one':pd_series,
               'two':pd_series*2}
datadf = pd.DataFrame(dfdata_dict)
datadf.head()

<b>DataFrame - Creating One</b>

A DataFrame can be created from any of the following:

- Dict of 1D ndarrays, lists, dicts, or Series
- 2-D numpy.ndarray
- Structured or record ndarray
- A Series
- Another DataFrame
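
As a minimal sketch of a few of these input types, the snippet below builds small frames from a dict of lists, a 2-D array, and a named Series. The variable names and values are made up for illustration only.

```python
import numpy as np
import pandas as pd

# From a dict of lists: the keys become the column names.
df_from_dict = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# From a 2-D ndarray: column names are passed explicitly
# (without them the columns would be labeled 0, 1, ...).
df_from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# From a named Series: the Series name becomes the single column name.
df_from_series = pd.DataFrame(pd.Series([10, 20, 30], name="z"))

print(df_from_dict.shape, df_from_array.shape, df_from_series.columns.tolist())
```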

In [None]:
countrytemp_data = {'Country':['Brazil', 'India', 'India', 'Germany', 'China', 'Zambia'],
                    'City':['Brasília', 'New Delhi', 'Kashmir', 'Berlin', 'Beijing', 'Lusaka'],
                    'AverageTemperature':[30.1, 34.3, 22.4, 19.9, 26.2, 30.3],
                    'Humidity':[0.65, 0.67, 0.49, 0.44, 0.45, 0.76]}

In [None]:
temperatureDf = pd.DataFrame(countrytemp_data,
                             index=['a', 'b', 'c', 'd', 'e', 'f'])
temperatureDf

<b>DataFrame - Indexing</b>

In [None]:
temperatureDf.index

In [None]:
temperatureDf[temperatureDf.index>'b']

In [None]:
temperatureDf.iloc[2:4,:]

In [None]:
temperatureDf.iloc[:,1:3]

In [None]:
temperatureDf['b':'d']

**Q)** Select the rows with index labels 'b' to 'd' and the first two columns, 'Country' and 'City'.

In [None]:
temperatureDf.loc['b':'d'].iloc[:,:2]
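
The same selection can also be written as a single `.loc` or `.iloc` call. The sketch below uses a small stand-in frame (`demo`, a hypothetical name) to show the one difference worth remembering: label slices with `.loc` include the end label, while position slices with `.iloc` exclude the end position.

```python
import pandas as pd

# Hypothetical stand-in for the temperature table above.
demo = pd.DataFrame(
    {"Country": ["Brazil", "India", "India", "Germany"],
     "City": ["Brasília", "New Delhi", "Kashmir", "Berlin"],
     "Humidity": [0.65, 0.67, 0.49, 0.44]},
    index=["a", "b", "c", "d"],
)

# Label-based: the slice 'b':'d' includes the row labeled 'd'.
by_label = demo.loc["b":"d", ["Country", "City"]]

# Position-based: the slice 1:4 stops before position 4.
by_position = demo.iloc[1:4, :2]

print(by_label.equals(by_position))
```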

In [None]:
#----------YOUR SOLUTION-----------#
#....



<b>DataFrame - Stats</b>

In [None]:
temperatureDf.describe()

In [None]:
temperatureDf.AverageTemperature.mean()

In [None]:
temperatureDf.Humidity.min(), temperatureDf.Humidity.max()

<b>DataFrame - Plot</b>

In [None]:
temperatureDf

In [None]:
temperatureDf.plot()

In [None]:
temperatureDf.set_index('City').plot(subplots=True)

<b>DataFrame - Groupby Operations</b>

In [None]:
temperatureDf

In [None]:
temperatureDf.groupby(['Country']).size()

In [None]:
# Aggregating a single column returns a Series indexed by the group keys
# (a MultiIndex appears only when grouping by more than one key)
temperatureDf.groupby(['Country']).AverageTemperature.mean()
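
To see how the index of an aggregated result depends on the number of grouping keys, here is a small sketch on made-up data (`demo` is a hypothetical frame, not the one above).

```python
import pandas as pd

demo = pd.DataFrame(
    {"Country": ["India", "India", "Germany"],
     "City": ["New Delhi", "Kashmir", "Berlin"],
     "AverageTemperature": [34.3, 22.4, 19.9]}
)

# One grouping key: the result is indexed by that key alone.
single = demo.groupby("Country")["AverageTemperature"].mean()

# Two grouping keys: the result carries a MultiIndex (Country, City).
multi = demo.groupby(["Country", "City"])["AverageTemperature"].mean()

print(isinstance(single.index, pd.MultiIndex))
print(isinstance(multi.index, pd.MultiIndex))
```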

In [None]:
for egidx, eg in temperatureDf.groupby(['Country']):
    print(egidx)
    print(eg)
    print()

In [None]:
# Why does it show only one value for each country?
temperatureDf.groupby(['Country']).AverageTemperature.nth(0)
# temperatureDf.groupby(['Country']).AverageTemperature.nth(1)
# temperatureDf.groupby(['Country']).AverageTemperature.nth(-1)
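
A small sketch of what `nth` does, on made-up data: `nth(0)` picks the first row of every group, while `nth(1)` returns a row only for groups that have at least two rows. The index labels of the result may differ between pandas versions, so only the values are inspected here.

```python
import pandas as pd

demo = pd.DataFrame(
    {"Country": ["India", "India", "Germany"],
     "AverageTemperature": [34.3, 22.4, 19.9]}
)
grouped = demo.groupby("Country")["AverageTemperature"]

first_rows = grouped.nth(0)   # first row of each group
second_rows = grouped.nth(1)  # second row, only where a group has one

print(len(first_rows), len(second_rows))
print(second_rows.tolist())
```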

## Reading a file to Pandas DataFrame

<b>Reading a .csv file</b>

In [None]:
britannia_data = pd.read_csv(britannia_datapath, index_col=0, parse_dates=True)
britannia_data.head()

In [None]:
britannia_data.dtypes

In [None]:
britannia_data.index

<b>Reading a .txt file</b>

In [None]:
pd.read_fwf(mpc61_datapath)

In [None]:
mpc61_datapath

In [None]:
pd.read_csv(mpc61_datapath)

In [None]:
mpc61_data = pd.read_fwf(mpc61_datapath,
                         skiprows=range(50),
                         names=['RUN', 'WAFER', 'PROBE',
                                'MONTH', 'DAY', 'OP',
                                'TEMP', 'AVERAGE', 'STDDEV'])
mpc61_data.head()
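
The call above combines `skiprows` (to drop a free-text preamble) with `names` (to label the columns). The sketch below reproduces that pattern on a tiny in-memory fixed-width text; the text and column names are invented for illustration and are not taken from `MPC61.txt`.

```python
import io
import pandas as pd

# Hypothetical fixed-width text: two preamble lines, then aligned columns.
raw = (
    "Header line to skip\n"
    "Another header line\n"
    "  1   110.0   5.10\n"
    "  2   111.5   5.25\n"
    " 10    99.0   4.80\n"
)

# read_fwf infers the column boundaries from the whitespace in the data rows.
fwf_demo = pd.read_fwf(
    io.StringIO(raw),
    skiprows=2,
    names=["RUN", "TEMP", "AVERAGE"],
)
print(fwf_demo)
```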

# Pandas Datetime Operations

"Pandas builds upon `dateutil`, `datetime` & `numpy.datetime64` the tools just discussed to provide a Timestamp object, which combines the ease-of-use of datetime and dateutil with the efficient storage and vectorized interface of numpy.datetime64. From a group of these Timestamp objects, Pandas can construct a DatetimeIndex that can be used to index data in a Series or DataFrame" - [Jake-Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html#Dates-and-times-in-pandas:-best-of-both-worlds)

<b>Generating a DatetimeIndex</b>

Look at the `dtype` and `freq`

In [None]:
current_date = pd.to_datetime(datetime.now().strftime('%Y-%m-%d'))
roll_dates = current_date+pd.to_timedelta(range(10), 'D')
current_date, type(current_date), roll_dates

<b>How can we set the frequency? And is it possible to infer the frequency intrinsically?</b>

In [None]:
pd.infer_freq(roll_dates)

In [None]:
roll_dates.freq='D'
roll_dates

In [None]:
roll_dates.freq = 'M'  # expected to raise a ValueError: the dates are daily, not month ends
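
The cells above can be summarized in a short sketch: a `DatetimeIndex` built from plain values carries no frequency, `pd.infer_freq` can recover it, and assigning a frequency that does not match the dates is rejected. The dates are made up, and the weekly alias `'W'` is used here as the non-matching frequency.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2020-12-07", "2020-12-08", "2020-12-09"])

print(idx.freq)             # no frequency is attached at construction
inferred = pd.infer_freq(idx)
print(inferred)

idx.freq = "D"              # matches the data, so it is accepted

try:
    idx.freq = "W"          # weekly does not match daily data
    conflict = None
except ValueError as err:
    conflict = str(err)
print(conflict)
```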

<b>Let's make a time series with dates as the index</b>

In [None]:
sindex = ['2020-01-01', '2020-01-15', '2020-01-31',
          '2020-02-01', '2020-02-15', '2020-02-28',
          '2020-03-01', '2020-03-15', '2020-03-31',
          '2021-01-01', '2021-01-15', '2021-01-31',
          '2021-02-01', '2021-02-15', '2021-02-29',
          '2021-03-01', '2021-03-15', '2021-03-31']
dt_series = pd.Series(np.arange(18), index=sindex)
dt_series, dt_series.index

<b>Let's try to slice the series and retrieve only the data for the year `2020`</b>

In [None]:
dt_series['2020']

<b>Why can't I index this series using just the year? Why didn't pandas understand that?</b>

In [None]:
dt_series.index

In [None]:
dt_series.index = pd.to_datetime(dt_series.index)

pandas fails to parse the date `2021-02-29`: 2021 is not a leap year, so February 29 does not exist in that year's calendar, and `pd.to_datetime` raises an error.
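
One way to locate such invalid dates, sketched here on a shortened, made-up list of labels, is to pass `errors="coerce"` so that unparseable entries become `NaT` instead of raising.

```python
import pandas as pd

labels = ["2021-02-15", "2021-02-29", "2021-03-01"]

# Invalid dates are turned into NaT rather than raising an error.
parsed = pd.to_datetime(labels, errors="coerce")

# Collect the original strings that could not be parsed.
bad_labels = [label for label, ts in zip(labels, parsed) if pd.isna(ts)]
print(bad_labels)
```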

In [None]:
sindex = ['2020-01-01', '2020-01-15', '2020-01-31',
          '2020-02-01', '2020-02-15', '2020-02-28',
          '2020-03-01', '2020-03-15', '2020-03-31',
          '2021-01-01', '2021-01-15', '2021-01-31',
          '2021-02-01', '2021-02-15', '2021-02-28', # Change the date
          '2021-03-01', '2021-03-15', '2021-03-31']
dt_series.index = sindex
dt_series

<b>Now that the cause of the parsing error has been fixed, let's create a `DatetimeIndex` for our series</b>

In [None]:
dt_series.index = pd.to_datetime(dt_series.index)
dt_series.index

<b>Can we now select only `2020`?</b>

In [None]:
dt_series['2020']
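
With a `DatetimeIndex` in place, partial date strings also work at month resolution and in slices. A minimal sketch on a small made-up series (`s` is a hypothetical name):

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-15", "2020-02-15", "2020-02-28", "2021-01-15"])
s = pd.Series(range(4), index=dates)

only_2020 = s.loc["2020"]           # every entry in the year 2020
feb_2020 = s.loc["2020-02"]         # every entry in February 2020
span = s.loc["2020-02":"2021-01"]   # both ends of the slice are included

print(len(only_2020), feb_2020.tolist(), len(span))
```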

****
<b>[Pandas has four main concepts pertaining to Time](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#overview)</b>

> <b>Where methods of Pandas shine - Human Readable Times..</b>

1.) Timestamp - Similar to `datetime`, it's an instant in time, e.g. `pd.Timestamp(2020, 12, 7) == pd.Timestamp("7th December 2020")`

2.) Timedeltas - An absolute duration of time, e.g. `pd.Timedelta('1 day')` equals 24 hours

3.) Timespans - A span of time defined by a point in time and an associated frequency, e.g. the whole month of December 2020

4.) DateOffsets - Relative durations that support calendar arithmetic, similar to `dateutil.relativedelta.relativedelta`
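
A short sketch with one object per concept, using made-up dates. It shows in particular how a `DateOffset` follows the calendar (one month after 31 January 2020 is 29 February 2020), whereas a `Timedelta` is a fixed length of time.

```python
import pandas as pd

ts = pd.Timestamp("2020-12-07")           # a point in time
delta = pd.Timedelta("1 day")             # a fixed duration (24 hours)
period = pd.Period("2020-12", freq="M")   # the span covering December 2020
offset = pd.DateOffset(months=1)          # a calendar-aware step

next_day = ts + delta
month_later = pd.Timestamp("2020-01-31") + offset

print(next_day)
print(period.start_time, period.end_time)
print(month_later)
```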


<table class="table">
<colgroup>
<col style="width: 15%">
<col style="width: 12%">
<col style="width: 13%">
<col style="width: 31%">
<col style="width: 28%">
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>Concept</p></th>
<th class="head"><p>Scalar Class</p></th>
<th class="head"><p>Array Class</p></th>
<th class="head"><p>pandas Data Type</p></th>
<th class="head"><p>Primary Creation Method</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>Date times</p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">Timestamp</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">DatetimeIndex</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">datetime64[ns]</span></code> or <code class="docutils literal notranslate"><span class="pre">datetime64[ns,</span> <span class="pre">tz]</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">to_datetime</span></code> or <code class="docutils literal notranslate"><span class="pre">date_range</span></code></p></td>
</tr>
<tr class="row-odd"><td><p>Time deltas</p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">Timedelta</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">TimedeltaIndex</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">timedelta64[ns]</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">to_timedelta</span></code> or <code class="docutils literal notranslate"><span class="pre">timedelta_range</span></code></p></td>
</tr>
<tr class="row-even"><td><p>Time spans</p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">Period</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">PeriodIndex</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">period[freq]</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">Period</span></code> or <code class="docutils literal notranslate"><span class="pre">period_range</span></code></p></td>
</tr>
<tr class="row-odd"><td><p>Date offsets</p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">DateOffset</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">None</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">None</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">DateOffset</span></code></p></td>
</tr>
</tbody>
</table>

****
<b>Playing around with Timestamps</b>

Whatever the input format, a Timestamp is always displayed as **YYYY-MM-DD HH:MM:SS...**

<b>Attributes available to Pandas Timestamp object :</b>

<table class="longtable table autosummary">
<colgroup>
<col style="width: 10%">
<col style="width: 90%">
</colgroup>
<tbody>
<tr class="row-odd"><td><p><a class="reference internal" href="pandas.Timestamp.asm8.html#pandas.Timestamp.asm8" title="pandas.Timestamp.asm8"><code class="xref py py-obj docutils literal notranslate"><span class="pre">asm8</span></code></a></p></td>
<td><p>Return numpy datetime64 format in nanoseconds.</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="pandas.Timestamp.dayofweek.html#pandas.Timestamp.dayofweek" title="pandas.Timestamp.dayofweek"><code class="xref py py-obj docutils literal notranslate"><span class="pre">dayofweek</span></code></a></p></td>
<td><p>Return day of the week.</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="pandas.Timestamp.dayofyear.html#pandas.Timestamp.dayofyear" title="pandas.Timestamp.dayofyear"><code class="xref py py-obj docutils literal notranslate"><span class="pre">dayofyear</span></code></a></p></td>
<td><p>Return the day of the year.</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="pandas.Timestamp.days_in_month.html#pandas.Timestamp.days_in_month" title="pandas.Timestamp.days_in_month"><code class="xref py py-obj docutils literal notranslate"><span class="pre">days_in_month</span></code></a></p></td>
<td><p>Return the number of days in the month.</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="pandas.Timestamp.daysinmonth.html#pandas.Timestamp.daysinmonth" title="pandas.Timestamp.daysinmonth"><code class="xref py py-obj docutils literal notranslate"><span class="pre">daysinmonth</span></code></a></p></td>
<td><p>Return the number of days in the month.</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="pandas.Timestamp.freqstr.html#pandas.Timestamp.freqstr" title="pandas.Timestamp.freqstr"><code class="xref py py-obj docutils literal notranslate"><span class="pre">freqstr</span></code></a></p></td>
<td><p>Return the frequency string associated with the Timestamp.</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="pandas.Timestamp.is_leap_year.html#pandas.Timestamp.is_leap_year" title="pandas.Timestamp.is_leap_year"><code class="xref py py-obj docutils literal notranslate"><span class="pre">is_leap_year</span></code></a></p></td>
<td><p>Return True if year is a leap year.</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="pandas.Timestamp.is_month_end.html#pandas.Timestamp.is_month_end" title="pandas.Timestamp.is_month_end"><code class="xref py py-obj docutils literal notranslate"><span class="pre">is_month_end</span></code></a></p></td>
<td><p>Return True if date is last day of month.</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="pandas.Timestamp.is_month_start.html#pandas.Timestamp.is_month_start" title="pandas.Timestamp.is_month_start"><code class="xref py py-obj docutils literal notranslate"><span class="pre">is_month_start</span></code></a></p></td>
<td><p>Return True if date is first day of month.</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="pandas.Timestamp.is_quarter_end.html#pandas.Timestamp.is_quarter_end" title="pandas.Timestamp.is_quarter_end"><code class="xref py py-obj docutils literal notranslate"><span class="pre">is_quarter_end</span></code></a></p></td>
<td><p>Return True if date is last day of the quarter.</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="pandas.Timestamp.is_quarter_start.html#pandas.Timestamp.is_quarter_start" title="pandas.Timestamp.is_quarter_start"><code class="xref py py-obj docutils literal notranslate"><span class="pre">is_quarter_start</span></code></a></p></td>
<td><p>Return True if date is first day of the quarter.</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="pandas.Timestamp.is_year_end.html#pandas.Timestamp.is_year_end" title="pandas.Timestamp.is_year_end"><code class="xref py py-obj docutils literal notranslate"><span class="pre">is_year_end</span></code></a></p></td>
<td><p>Return True if date is last day of the year.</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="pandas.Timestamp.is_year_start.html#pandas.Timestamp.is_year_start" title="pandas.Timestamp.is_year_start"><code class="xref py py-obj docutils literal notranslate"><span class="pre">is_year_start</span></code></a></p></td>
<td><p>Return True if date is first day of the year.</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="pandas.Timestamp.quarter.html#pandas.Timestamp.quarter" title="pandas.Timestamp.quarter"><code class="xref py py-obj docutils literal notranslate"><span class="pre">quarter</span></code></a></p></td>
<td><p>Return the quarter of the year.</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="pandas.Timestamp.tz.html#pandas.Timestamp.tz" title="pandas.Timestamp.tz"><code class="xref py py-obj docutils literal notranslate"><span class="pre">tz</span></code></a></p></td>
<td><p>Alias for tzinfo.</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="pandas.Timestamp.week.html#pandas.Timestamp.week" title="pandas.Timestamp.week"><code class="xref py py-obj docutils literal notranslate"><span class="pre">week</span></code></a></p></td>
<td><p>Return the week number of the year.</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="pandas.Timestamp.weekofyear.html#pandas.Timestamp.weekofyear" title="pandas.Timestamp.weekofyear"><code class="xref py py-obj docutils literal notranslate"><span class="pre">weekofyear</span></code></a></p></td>
<td><p>Return the week number of the year.</p></td>
</tr>
</tbody>
</table>

In [None]:
pd.Timestamp(datetime.now())

In [None]:
pd.Timestamp("7th December 2020")

In [None]:
pd.Timestamp("December 7 2020")

In [None]:
pd.Timestamp("7th of December 2020 9:20:23.123123")

In [None]:
pd.Timestamp("7th Dec 2020 7 PM")

In [None]:
pd.Timestamp("7th Dec 2020 7 PM")==pd.Timestamp("7th Dec 2020 19:00")

In [None]:
pd.Timestamp("7/12 2020")==pd.Timestamp("2020-7/12")

In [None]:
pd.Timestamp("7/12/2020")==pd.Timestamp("12/7/2020")
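
Strings such as `"7/12/2020"` are ambiguous: by default pandas reads them month first. The sketch below shows two ways to ask for day-first parsing, either with `dayfirst=True` or with an explicit `format`.

```python
import pandas as pd

month_first = pd.Timestamp("7/12/2020")                       # read as July 12
day_first = pd.to_datetime("7/12/2020", dayfirst=True)        # read as December 7
explicit = pd.to_datetime("7/12/2020", format="%d/%m/%Y")     # no guessing at all

print(month_first, day_first, explicit)
```

An explicit `format` is usually the safer choice when the layout of the input is known in advance.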

In [None]:
pd.Timestamp(year=2020, month=12, day=7)

In [None]:
pdtstamp = pd.Timestamp("7th Dec 2020 7 PM")
pdtstamp

In [None]:
print('**Timestamp Attributes**')
print('------------------------')
print('Day Name   : ', pdtstamp.day_name())
print('Quarter    : ', pdtstamp.quarter)
print('Date       : ', pdtstamp.day)
print('Month      : ', pdtstamp.month)
print('Monthdays  : ', pdtstamp.daysinmonth)
print('Year       : ', pdtstamp.year)
print('Week No.   : ', pdtstamp.weekofyear)

****
<b>Playing around with `date_range`, `bdate_range`, Time Period</b>

What is the difference between a time period (`Period`) and a `Timestamp`?
> In many cases, at least in time series problems, you need to represent a span of time rather than a single point in time.
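
As a minimal sketch of the three range builders named in the heading, the snippet below generates seven calendar days, seven business days, and three monthly periods. The start date 2020-12-07 (a Monday) is chosen only for illustration.

```python
import pandas as pd

# Seven consecutive calendar days, weekends included.
calendar_days = pd.date_range("2020-12-07", periods=7, freq="D")

# Seven business days: Saturdays and Sundays are skipped.
business_days = pd.bdate_range("2020-12-07", periods=7)

# Three monthly spans rather than three points in time.
months = pd.period_range("2020-12", periods=3, freq="M")

print(calendar_days[-1], business_days[-1], months[-1])
```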

****
<b>Available `freq` or `offset_aliases` :</b>
<table class="colwidths-given table">
<colgroup>
<col style="width: 16%">
<col style="width: 16%">
<col style="width: 68%">
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>Date Offset</p></th>
<th class="head"><p>Frequency String</p></th>
<th class="head"><p>Description</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.DateOffset.html#pandas.tseries.offsets.DateOffset" title="pandas.tseries.offsets.DateOffset"><code class="xref py py-class docutils literal notranslate"><span class="pre">DateOffset</span></code></a></p></td>
<td><p>None</p></td>
<td><p>Generic offset class, defaults to 1 calendar day</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.BDay.html#pandas.tseries.offsets.BDay" title="pandas.tseries.offsets.BDay"><code class="xref py py-class docutils literal notranslate"><span class="pre">BDay</span></code></a> or <a class="reference internal" href="../reference/api/pandas.tseries.offsets.BusinessDay.html#pandas.tseries.offsets.BusinessDay" title="pandas.tseries.offsets.BusinessDay"><code class="xref py py-class docutils literal notranslate"><span class="pre">BusinessDay</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'B'</span></code></p></td>
<td><p>business day (weekday)</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.CDay.html#pandas.tseries.offsets.CDay" title="pandas.tseries.offsets.CDay"><code class="xref py py-class docutils literal notranslate"><span class="pre">CDay</span></code></a> or <a class="reference internal" href="../reference/api/pandas.tseries.offsets.CustomBusinessDay.html#pandas.tseries.offsets.CustomBusinessDay" title="pandas.tseries.offsets.CustomBusinessDay"><code class="xref py py-class docutils literal notranslate"><span class="pre">CustomBusinessDay</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'C'</span></code></p></td>
<td><p>custom business day</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.Week.html#pandas.tseries.offsets.Week" title="pandas.tseries.offsets.Week"><code class="xref py py-class docutils literal notranslate"><span class="pre">Week</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'W'</span></code></p></td>
<td><p>one week, optionally anchored on a day of the week</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.WeekOfMonth.html#pandas.tseries.offsets.WeekOfMonth" title="pandas.tseries.offsets.WeekOfMonth"><code class="xref py py-class docutils literal notranslate"><span class="pre">WeekOfMonth</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'WOM'</span></code></p></td>
<td><p>the x-th day of the y-th week of each month</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.LastWeekOfMonth.html#pandas.tseries.offsets.LastWeekOfMonth" title="pandas.tseries.offsets.LastWeekOfMonth"><code class="xref py py-class docutils literal notranslate"><span class="pre">LastWeekOfMonth</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'LWOM'</span></code></p></td>
<td><p>the x-th day of the last week of each month</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.MonthEnd.html#pandas.tseries.offsets.MonthEnd" title="pandas.tseries.offsets.MonthEnd"><code class="xref py py-class docutils literal notranslate"><span class="pre">MonthEnd</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'M'</span></code></p></td>
<td><p>calendar month end</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.MonthBegin.html#pandas.tseries.offsets.MonthBegin" title="pandas.tseries.offsets.MonthBegin"><code class="xref py py-class docutils literal notranslate"><span class="pre">MonthBegin</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'MS'</span></code></p></td>
<td><p>calendar month begin</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.BMonthEnd.html#pandas.tseries.offsets.BMonthEnd" title="pandas.tseries.offsets.BMonthEnd"><code class="xref py py-class docutils literal notranslate"><span class="pre">BMonthEnd</span></code></a> or <a class="reference internal" href="../reference/api/pandas.tseries.offsets.BusinessMonthEnd.html#pandas.tseries.offsets.BusinessMonthEnd" title="pandas.tseries.offsets.BusinessMonthEnd"><code class="xref py py-class docutils literal notranslate"><span class="pre">BusinessMonthEnd</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'BM'</span></code></p></td>
<td><p>business month end</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.BMonthBegin.html#pandas.tseries.offsets.BMonthBegin" title="pandas.tseries.offsets.BMonthBegin"><code class="xref py py-class docutils literal notranslate"><span class="pre">BMonthBegin</span></code></a> or <a class="reference internal" href="../reference/api/pandas.tseries.offsets.BusinessMonthBegin.html#pandas.tseries.offsets.BusinessMonthBegin" title="pandas.tseries.offsets.BusinessMonthBegin"><code class="xref py py-class docutils literal notranslate"><span class="pre">BusinessMonthBegin</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'BMS'</span></code></p></td>
<td><p>business month begin</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.CBMonthEnd.html#pandas.tseries.offsets.CBMonthEnd" title="pandas.tseries.offsets.CBMonthEnd"><code class="xref py py-class docutils literal notranslate"><span class="pre">CBMonthEnd</span></code></a> or <a class="reference internal" href="../reference/api/pandas.tseries.offsets.CustomBusinessMonthEnd.html#pandas.tseries.offsets.CustomBusinessMonthEnd" title="pandas.tseries.offsets.CustomBusinessMonthEnd"><code class="xref py py-class docutils literal notranslate"><span class="pre">CustomBusinessMonthEnd</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'CBM'</span></code></p></td>
<td><p>custom business month end</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.CBMonthBegin.html#pandas.tseries.offsets.CBMonthBegin" title="pandas.tseries.offsets.CBMonthBegin"><code class="xref py py-class docutils literal notranslate"><span class="pre">CBMonthBegin</span></code></a> or <a class="reference internal" href="../reference/api/pandas.tseries.offsets.CustomBusinessMonthBegin.html#pandas.tseries.offsets.CustomBusinessMonthBegin" title="pandas.tseries.offsets.CustomBusinessMonthBegin"><code class="xref py py-class docutils literal notranslate"><span class="pre">CustomBusinessMonthBegin</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'CBMS'</span></code></p></td>
<td><p>custom business month begin</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.SemiMonthEnd.html#pandas.tseries.offsets.SemiMonthEnd" title="pandas.tseries.offsets.SemiMonthEnd"><code class="xref py py-class docutils literal notranslate"><span class="pre">SemiMonthEnd</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'SM'</span></code></p></td>
<td><p>15th (or other day_of_month) and calendar month end</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.SemiMonthBegin.html#pandas.tseries.offsets.SemiMonthBegin" title="pandas.tseries.offsets.SemiMonthBegin"><code class="xref py py-class docutils literal notranslate"><span class="pre">SemiMonthBegin</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'SMS'</span></code></p></td>
<td><p>15th (or other day_of_month) and calendar month begin</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.QuarterEnd.html#pandas.tseries.offsets.QuarterEnd" title="pandas.tseries.offsets.QuarterEnd"><code class="xref py py-class docutils literal notranslate"><span class="pre">QuarterEnd</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'Q'</span></code></p></td>
<td><p>calendar quarter end</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.QuarterBegin.html#pandas.tseries.offsets.QuarterBegin" title="pandas.tseries.offsets.QuarterBegin"><code class="xref py py-class docutils literal notranslate"><span class="pre">QuarterBegin</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'QS'</span></code></p></td>
<td><p>calendar quarter begin</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.BQuarterEnd.html#pandas.tseries.offsets.BQuarterEnd" title="pandas.tseries.offsets.BQuarterEnd"><code class="xref py py-class docutils literal notranslate"><span class="pre">BQuarterEnd</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'BQ'</span></code></p></td>
<td><p>business quarter end</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.BQuarterBegin.html#pandas.tseries.offsets.BQuarterBegin" title="pandas.tseries.offsets.BQuarterBegin"><code class="xref py py-class docutils literal notranslate"><span class="pre">BQuarterBegin</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'BQS'</span></code></p></td>
<td><p>business quarter begin</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.FY5253Quarter.html#pandas.tseries.offsets.FY5253Quarter" title="pandas.tseries.offsets.FY5253Quarter"><code class="xref py py-class docutils literal notranslate"><span class="pre">FY5253Quarter</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'REQ'</span></code></p></td>
<td><p>retail (aka 52-53 week) quarter</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.YearEnd.html#pandas.tseries.offsets.YearEnd" title="pandas.tseries.offsets.YearEnd"><code class="xref py py-class docutils literal notranslate"><span class="pre">YearEnd</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'A'</span></code></p></td>
<td><p>calendar year end</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.YearBegin.html#pandas.tseries.offsets.YearBegin" title="pandas.tseries.offsets.YearBegin"><code class="xref py py-class docutils literal notranslate"><span class="pre">YearBegin</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'AS'</span></code> or <code class="docutils literal notranslate"><span class="pre">'BYS'</span></code></p></td>
<td><p>calendar year begin</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.BYearEnd.html#pandas.tseries.offsets.BYearEnd" title="pandas.tseries.offsets.BYearEnd"><code class="xref py py-class docutils literal notranslate"><span class="pre">BYearEnd</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'BA'</span></code></p></td>
<td><p>business year end</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.BYearBegin.html#pandas.tseries.offsets.BYearBegin" title="pandas.tseries.offsets.BYearBegin"><code class="xref py py-class docutils literal notranslate"><span class="pre">BYearBegin</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'BAS'</span></code></p></td>
<td><p>business year begin</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.FY5253.html#pandas.tseries.offsets.FY5253" title="pandas.tseries.offsets.FY5253"><code class="xref py py-class docutils literal notranslate"><span class="pre">FY5253</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'RE'</span></code></p></td>
<td><p>retail (aka 52-53 week) year</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.Easter.html#pandas.tseries.offsets.Easter" title="pandas.tseries.offsets.Easter"><code class="xref py py-class docutils literal notranslate"><span class="pre">Easter</span></code></a></p></td>
<td><p>None</p></td>
<td><p>Easter holiday</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.BusinessHour.html#pandas.tseries.offsets.BusinessHour" title="pandas.tseries.offsets.BusinessHour"><code class="xref py py-class docutils literal notranslate"><span class="pre">BusinessHour</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'BH'</span></code></p></td>
<td><p>business hour</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.CustomBusinessHour.html#pandas.tseries.offsets.CustomBusinessHour" title="pandas.tseries.offsets.CustomBusinessHour"><code class="xref py py-class docutils literal notranslate"><span class="pre">CustomBusinessHour</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'CBH'</span></code></p></td>
<td><p>custom business hour</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.Day.html#pandas.tseries.offsets.Day" title="pandas.tseries.offsets.Day"><code class="xref py py-class docutils literal notranslate"><span class="pre">Day</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'D'</span></code></p></td>
<td><p>one absolute day</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.Hour.html#pandas.tseries.offsets.Hour" title="pandas.tseries.offsets.Hour"><code class="xref py py-class docutils literal notranslate"><span class="pre">Hour</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'H'</span></code></p></td>
<td><p>one hour</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.Minute.html#pandas.tseries.offsets.Minute" title="pandas.tseries.offsets.Minute"><code class="xref py py-class docutils literal notranslate"><span class="pre">Minute</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'T'</span></code> or <code class="docutils literal notranslate"><span class="pre">'min'</span></code></p></td>
<td><p>one minute</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.Second.html#pandas.tseries.offsets.Second" title="pandas.tseries.offsets.Second"><code class="xref py py-class docutils literal notranslate"><span class="pre">Second</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'S'</span></code></p></td>
<td><p>one second</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.Milli.html#pandas.tseries.offsets.Milli" title="pandas.tseries.offsets.Milli"><code class="xref py py-class docutils literal notranslate"><span class="pre">Milli</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'L'</span></code> or <code class="docutils literal notranslate"><span class="pre">'ms'</span></code></p></td>
<td><p>one millisecond</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.Micro.html#pandas.tseries.offsets.Micro" title="pandas.tseries.offsets.Micro"><code class="xref py py-class docutils literal notranslate"><span class="pre">Micro</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'U'</span></code> or <code class="docutils literal notranslate"><span class="pre">'us'</span></code></p></td>
<td><p>one microsecond</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="../reference/api/pandas.tseries.offsets.Nano.html#pandas.tseries.offsets.Nano" title="pandas.tseries.offsets.Nano"><code class="xref py py-class docutils literal notranslate"><span class="pre">Nano</span></code></a></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">'N'</span></code></p></td>
<td><p>one nanosecond</p></td>
</tr>
</tbody>
</table>

In [None]:
pd.Timestamp('2020-01-01')

In [None]:
pd.Period('2020'), pd.Period('2020-01'), pd.Period('2020-01-01 12'), pd.Period('2020-01-01 12:8')

In [None]:
pd.Period('2020', freq='A-MAR') + 1

In [None]:
(pd.Period('2020', freq='A-MAR') + 1).month

In [None]:
pd.date_range('2020-01-01', freq='B',periods=10)

In [None]:
pd.date_range('2020-01-01', freq='3h43min', periods=10)

<b>Custom Business days with a list of Holidays</b>

In [None]:
pd.bdate_range('2020-01-01', freq='B', periods=10, holidays=['2020-01-10','2020-01-13'])

**Q)** How to fix it? HINT : Check the table above.

In [None]:
pd.bdate_range('2020-01-01', freq='C', periods=10, holidays=['2020-01-10','2020-01-13'])
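
The `holidays` argument is only accepted together with a custom business-day frequency, which is why the `freq='B'` call raises an error. A minimal sketch, reusing the same two holiday dates, that builds the range in two equivalent ways and checks that the holidays are skipped (the variable names are illustrative):

```python
import pandas as pd

holidays = ['2020-01-10', '2020-01-13']  # same holiday dates as in the cell above

# freq='C' (custom business day) is required when holidays are passed
custom_days = pd.bdate_range('2020-01-01', freq='C', periods=10, holidays=holidays)

# The same calendar expressed as a reusable offset object
cbd = pd.offsets.CustomBusinessDay(holidays=holidays)
same_days = pd.date_range('2020-01-01', freq=cbd, periods=10)

print(custom_days)
print(custom_days.equals(same_days))

# Neither holiday appears in the generated range
print(pd.Timestamp('2020-01-10') in custom_days, pd.Timestamp('2020-01-13') in custom_days)
```

Building the `CustomBusinessDay` offset once is convenient when the same holiday calendar is needed in several places.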

In [None]:
#----------YOUR SOLUTION-----------#
#....


****

<b>Notice the difference between the two :</b>

In [None]:
pd.date_range('2020-01-01', freq='D',periods=10)

In [None]:
pd.period_range('2020-01-01', freq='D', periods=10)

<b>Well, that's not much of a difference to worry about, is it? So why do we need Period?</b>

In [None]:
pd.date_range('2020-01-01', freq='Q-MAR', periods=10)

In [None]:
pd.date_range('2020-01-01', freq='QS-MAR', periods=10)

In [None]:
pd.period_range('2020-01-01', freq='Q-MAR', periods=10)
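
One way to read the outputs above: a `Timestamp` marks a single instant, while a `Period` stands for a whole span with its own start and end. A small sketch using the `start_time`, `end_time`, `to_period` and `to_timestamp` members; the dates are chosen only for illustration:

```python
import pandas as pd

ts = pd.Timestamp('2020-01-01')        # a single instant
per = pd.Period('2020-01', freq='M')   # the whole of January 2020

print(per.start_time, per.end_time)

# An instant can be tested against the span a Period covers
inside = per.start_time <= pd.Timestamp('2020-01-15') <= per.end_time
print(inside)

# Converting between the two representations
print(ts.to_period('M') == per)
print(per.to_timestamp())              # start of the period by default

# The fiscal quarter (year ending in March) that contains 2020-01-01
q = pd.Period('2020-01-01', freq='Q-MAR')
print(q, q.start_time, q.end_time)
```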

****
<b>Playing around with Time Deltas</b>

In [None]:
pd.Period('2020-01-01')+pd.Timedelta(days=3)

In [None]:
pd.Timestamp('2020-01-01')+pd.Timedelta(days=3)

In [None]:
dtrange = pd.date_range('2020-01-01', freq='D', periods=10)
dtrange

In [None]:
dtrange+pd.Timedelta(days=+4, hours=+2)

# Pandas TimeZones

In [None]:
all_timezones

In [None]:
[k for k in all_timezones if 'Aus' in k]

<b>By default a pandas `Timestamp` is time-zone naive: it carries no time zone unless one is given explicitly</b>

In [None]:
pdtstamp = pd.Timestamp(datetime.now())
pdtstamp

In [None]:
print(pdtstamp.tz)

<b>To attach a time zone, pass `tz` to the constructor, or call `tz_localize()` on a naive `Timestamp`</b>

In [None]:
pdtstamp = pd.Timestamp(datetime.now(), tz='Asia/Calcutta')
pdtstamp.tz

**Converting the timestamps between timezones**

In [None]:
pdtstamp.tz_convert('Australia/Melbourne')
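
The two calls do different things: `tz_localize` attaches a zone to a naive timestamp and leaves the wall-clock reading unchanged, while `tz_convert` keeps the instant fixed and changes the wall-clock reading. A sketch with a fixed, made-up timestamp so that the result is reproducible:

```python
import pandas as pd

naive = pd.Timestamp('2020-01-01 12:00')            # no time zone attached
ist = naive.tz_localize('Asia/Calcutta')            # still 12:00, now labelled as Indian time
melbourne = ist.tz_convert('Australia/Melbourne')   # same instant, different wall clock

print(ist)
print(melbourne)
print(ist == melbourne)   # both describe the same moment
```

Calling `tz_convert` on a naive `Timestamp` raises a `TypeError`, so localize first.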

**Q)** Give me the current time in Calcutta, Bangkok & New York ->

In [None]:
current_tznaive_time = pd.Timestamp(datetime.now())
current_tzlocalised_time = current_tznaive_time.tz_localize('Asia/Calcutta')
current_tzconverted_time1 = current_tzlocalised_time.tz_convert('Asia/Bangkok')
current_tzconverted_time2 = current_tzlocalised_time.tz_convert('America/New_York')


In [None]:
print("Current Naive Time                     : ", current_tznaive_time)
print("Current Localised Time (Asia/Calcutta) : ", current_tzlocalised_time)
print("What's the time in Bangkok?            -> ", current_tzconverted_time1)
print("What's the time in New York?           -> ", current_tzconverted_time2)
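
Note that `datetime.now()` returns the machine's local time, so localizing it to `Asia/Calcutta` is only correct on a machine whose clock is set to Indian time. An alternative sketch that does not depend on the machine's zone starts from a tz-aware UTC timestamp (the names `now_utc`, `zones` and `current_times` are illustrative):

```python
import pandas as pd

# A tz-aware "now" that does not depend on the machine's local zone
now_utc = pd.Timestamp.now(tz='UTC')

zones = ['Asia/Calcutta', 'Asia/Bangkok', 'America/New_York']
current_times = {zone: now_utc.tz_convert(zone) for zone in zones}

for zone, ts in current_times.items():
    print(f'{zone:<20} -> {ts}')
```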


In [None]:
#----------YOUR SOLUTION-----------#
#....


<b>Playing with your time series data</b>

In [None]:
print(britannia_data.index)
britannia_data.head()

In [None]:
britannia_data.index = britannia_data.index.tz_localize('Asia/Calcutta')
print(britannia_data.index)
britannia_data.head()

In [None]:
britannia_data['Date(SNG)'] = britannia_data.index.tz_convert('Asia/Singapore')
britannia_data.head()

In [None]:
britannia_data.drop('Date(SNG)', axis=1, inplace=True)

# Pandas Resampling

***NOTE : Do not confuse this with Undersampling and Oversampling***

<img src='https://raw.githubusercontent.com/rafjaa/machine_learning_fecib/master/src/static/img/resampling.png'>

In [None]:
britannia_data.head()

<b>Finer Granularity (upsampling)</b>

In [None]:
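# Use .loc for date-string row selection; plain [] row lookup by date string fails on recent pandas
britannia_data.asfreq('1min', method='ffill').loc['1996-01-02']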

<b>Coarser Granularity (downsampling)</b>

In [None]:
britannia_data.resample('3D', label='right')
# .last()

<b>Can you see the problem if we do this to our dataset?</b>

- Some dates have been added even though they may have been off days or holidays, even at 'B' frequency

- The values the features take on those dates are now defined by us, and can no longer be said to be the true values
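
Both points can be reproduced on a tiny made-up series. The prices and dates below are invented for illustration and are not taken from the Britannia data:

```python
import pandas as pd

# Three trading days around a weekend: Friday, Monday, Tuesday
idx = pd.to_datetime(['2020-01-03', '2020-01-06', '2020-01-07'])
close = pd.Series([100.0, 102.0, 101.0], index=idx, name='Close')

# Upsampling to calendar days inserts Saturday and Sunday rows and forward-fills them
daily = close.asfreq('D', method='ffill')
print(daily)

# Downsampling needs an explicit aggregation; which one (last, mean, ...) is our choice
three_day = close.resample('3D', label='right').last()
print(three_day)
```

The weekend rows in `daily` carry Friday's price although no trading took place, and each value in `three_day` depends entirely on the aggregation picked.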

# Random Testing Space