## 18 Time Analysis Mini Series Pt5 - PeriodIndex & Period_Range
This tutorial continues the pandas time series analysis series by introducing Period and PeriodIndex. A sequence of periods can be used as the index of a df. The function that we will use to create a PeriodIndex is pd.period_range.

In [2]:
import pandas as pd

In [3]:
idx = pd.period_range('2011', '2017', freq='Q') # Quarterly periods from 2011Q1 through 2017Q1 (the end point is included)
idx

PeriodIndex(['2011Q1', '2011Q2', '2011Q3', '2011Q4', '2012Q1', '2012Q2',
             '2012Q3', '2012Q4', '2013Q1', '2013Q2', '2013Q3', '2013Q4',
             '2014Q1', '2014Q2', '2014Q3', '2014Q4', '2015Q1', '2015Q2',
             '2015Q3', '2015Q4', '2016Q1', '2016Q2', '2016Q3', '2016Q4',
             '2017Q1'],
            dtype='period[Q-DEC]', freq='Q-DEC')

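These are all the quarters from 2011Q1 through 2017Q1: the end point '2017' resolves to the first quarter of that year and is included. The default Q-DEC means the year ends in December. For a company like Walmart, whose fiscal year ends in January, the freq would be Q-JAN.<br><br>
If you do not want to give both a start and an end date, you can specify a start and the number of periods to be created...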

In [4]:
idx = pd.period_range('2011', periods=10, freq='Q') # We're creating 10 quarterly periods starting in 2011
idx

PeriodIndex(['2011Q1', '2011Q2', '2011Q3', '2011Q4', '2012Q1', '2012Q2',
             '2012Q3', '2012Q4', '2013Q1', '2013Q2'],
            dtype='period[Q-DEC]', freq='Q-DEC')
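
To make the fiscal-year remark above concrete, here is a minimal sketch that compares the same label, 2017Q1, under the default Q-DEC and under Q-JAN. The variable names are illustrative only and do not appear in the original notebook.

```python
import pandas as pd

# Same label, two different year-end conventions
calendar_q1 = pd.Period('2017Q1', freq='Q-DEC')  # year ends in December
fiscal_q1 = pd.Period('2017Q1', freq='Q-JAN')    # year ends in January

print(calendar_q1.start_time)  # first instant of the calendar quarter
print(fiscal_q1.start_time)    # first instant of the fiscal quarter
```

With Q-JAN, fiscal year 2017 ends in January 2017, so its first quarter starts in February 2016. This matches the Start Date values in the Walmart exercise.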

We can use numpy's random number generator to create a series...

In [5]:
import numpy as np
ps = pd.Series(np.random.randn(len(idx)), index=idx) # One random value per quarter, indexed by the PeriodIndex
ps

2011Q1    0.124910
2011Q2    0.826870
2011Q3   -2.244380
2011Q4   -1.449153
2012Q1    0.023171
2012Q2    0.603193
2012Q3    1.619980
2012Q4   -0.363176
2013Q1   -1.126276
2013Q2   -2.061746
Freq: Q-DEC, dtype: float64

In [6]:
ps.index

PeriodIndex(['2011Q1', '2011Q2', '2011Q3', '2011Q4', '2012Q1', '2012Q2',
             '2012Q3', '2012Q4', '2013Q1', '2013Q2'],
            dtype='period[Q-DEC]', freq='Q-DEC')

**Benefits of PeriodIndex**<br>
The main benefit is that we can slice with partial date strings, such as a year, to pull information out of a series that has a PeriodIndex...

In [8]:
ps['2011'] # To return just the quarters for 2011

2011Q1    0.124910
2011Q2    0.826870
2011Q3   -2.244380
2011Q4   -1.449153
Freq: Q-DEC, dtype: float64

In [9]:
ps['2011':'2013'] # To get just the quarters from the specified years

2011Q1    0.124910
2011Q2    0.826870
2011Q3   -2.244380
2011Q4   -1.449153
2012Q1    0.023171
2012Q2    0.603193
2012Q3    1.619980
2012Q4   -0.363176
2013Q1   -1.126276
2013Q2   -2.061746
Freq: Q-DEC, dtype: float64
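
The same kind of selection also works at quarter level. The sketch below rebuilds a comparable series with a seeded generator so that it runs on its own. The seed and the variable names are assumptions for illustration, so the values will differ from the output shown above.

```python
import numpy as np
import pandas as pd

idx = pd.period_range('2011', periods=10, freq='Q')
rng = np.random.default_rng(0)  # seeded so the values are reproducible
ps = pd.Series(rng.standard_normal(len(idx)), index=idx)

one_year = ps.loc['2011']                   # all four quarters of 2011
some_quarters = ps.loc['2011Q3':'2012Q2']   # quarter-level slice, both ends included
single = ps.loc['2012Q4']                   # an exact label returns a single value

print(one_year)
print(some_quarters)
```

Using .loc makes it explicit that the selection is label based.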

In [10]:
pst = ps.to_timestamp() # To convert our period index to a timestamp index
pst

2011-01-01    0.124910
2011-04-01    0.826870
2011-07-01   -2.244380
2011-10-01   -1.449153
2012-01-01    0.023171
2012-04-01    0.603193
2012-07-01    1.619980
2012-10-01   -0.363176
2013-01-01   -1.126276
2013-04-01   -2.061746
Freq: QS-OCT, dtype: float64

In [11]:
pst.index

DatetimeIndex(['2011-01-01', '2011-04-01', '2011-07-01', '2011-10-01',
               '2012-01-01', '2012-04-01', '2012-07-01', '2012-10-01',
               '2013-01-01', '2013-04-01'],
              dtype='datetime64[ns]', freq='QS-OCT')

In [13]:
pst.to_period() # Will convert the timestamp index back to a period index

2011Q1    0.124910
2011Q2    0.826870
2011Q3   -2.244380
2011Q4   -1.449153
2012Q1    0.023171
2012Q2    0.603193
2012Q3    1.619980
2012Q4   -0.363176
2013Q1   -1.126276
2013Q2   -2.061746
Freq: Q-DEC, dtype: float64

In [14]:
pst.index # pst itself is unchanged because to_period() returned a new series; a period index would show labels such as 2011Q1

DatetimeIndex(['2011-01-01', '2011-04-01', '2011-07-01', '2011-10-01',
               '2012-01-01', '2012-04-01', '2012-07-01', '2012-10-01',
               '2013-01-01', '2013-04-01'],
              dtype='datetime64[ns]', freq='QS-OCT')
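
Because to_period() and to_timestamp() both return new objects, the result has to be assigned if you want to keep it. A minimal sketch of the round trip follows, with illustrative variable names and made-up values. It also shows the how='end' option of to_timestamp, which the cells above do not use.

```python
import numpy as np
import pandas as pd

idx = pd.period_range('2011', periods=4, freq='Q')
ps = pd.Series(np.arange(len(idx), dtype=float), index=idx)

pst_start = ps.to_timestamp()          # default how='start': first day of each quarter
pst_end = ps.to_timestamp(how='end')   # last instant of each quarter instead

# to_period() returns a new series, so assign the result to keep it
back = pst_start.to_period('Q')

print(type(pst_start.index).__name__)
print(type(back.index).__name__)
```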

#### Walmart Exercise

In [18]:
df = pd.read_csv('D:\\Pandas\\CodeBasics\\datasets\\18wmt.csv', sep = '\t')
df

,Line Item,2017Q1,2017Q2,2017Q3,2017Q4,2018Q1
0,Revenue,115904,120854,118179,130936,117542
1,Expenses,86544,89485,87484,97743,87688
2,Profit,29360,31369,30695,33193,29854


**Getting the Index correct**

In [19]:
df.set_index('Line Item', inplace = True) # To make the Line Item column our index

In [20]:
df


Line Item,2017Q1,2017Q2,2017Q3,2017Q4,2018Q1
Revenue,115904,120854,118179,130936,117542
Expenses,86544,89485,87484,97743,87688
Profit,29360,31369,30695,33193,29854


In [21]:
df = df.T # Transpose: rows become columns and vice versa, so the quarters become the index
df

Line Item,Revenue,Expenses,Profit
2017Q1,115904,86544,29360
2017Q2,120854,89485,31369
2017Q3,118179,87484,30695
2017Q4,130936,97743,33193
2018Q1,117542,87688,29854


In [22]:
df.index # Our index is currently an object type but we want it to be a period index

Index(['2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1'], dtype='object')

In [23]:
df.index = pd.PeriodIndex(df.index, freq = 'Q-JAN')
df

Line Item,Revenue,Expenses,Profit
2017Q1,115904,86544,29360
2017Q2,120854,89485,31369
2017Q3,118179,87484,30695
2017Q4,130936,97743,33193
2018Q1,117542,87688,29854


In [24]:
df.index # The table looks the same as before, so check the index type explicitly

PeriodIndex(['2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1'], dtype='period[Q-JAN]', freq='Q-JAN')
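
If you do not have the CSV file to hand, the conversion can be tried on a frame built inline. This is a sketch that copies the figures from the table above. The qyear filter at the end is an extra illustration that is not part of the original exercise.

```python
import pandas as pd

# Figures copied from the table above; the CSV file itself is not needed here
df = pd.DataFrame(
    {
        'Revenue': [115904, 120854, 118179, 130936, 117542],
        'Expenses': [86544, 89485, 87484, 97743, 87688],
        'Profit': [29360, 31369, 30695, 33193, 29854],
    },
    index=['2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1'],
)

print(df.index.dtype)  # plain string labels at this point
df.index = pd.PeriodIndex(df.index, freq='Q-JAN')
print(df.index.dtype)  # now a period dtype

# qyear is the fiscal year each quarter belongs to
fiscal_2017 = df[df.index.qyear == 2017]
print(fiscal_2017)
```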

**Creating the two new columns**

In [25]:
df["Start Date"]=df.index.map(lambda x: x.start_time)
df

Line Item,Revenue,Expenses,Profit,Start Date
2017Q1,115904,86544,29360,2016-02-01
2017Q2,120854,89485,31369,2016-05-01
2017Q3,118179,87484,30695,2016-08-01
2017Q4,130936,97743,33193,2016-11-01
2018Q1,117542,87688,29854,2017-02-01


A new column in a df needs one value per row; in this case we want the start time of each period object in the index. One way to get it is by calling map on the index.<br>
map takes a function as an argument, and the quickest way to define a function inline is to use lambda. The function receives each element of the index and returns its start_time.<br>
We can see that the column has been created. Because the index uses Q-JAN, fiscal 2017Q1 starts on 2016-02-01.

In [26]:
df["End Date"]=df.index.map(lambda x: x.end_time) # End date column is created in much the same way
df

Line Item,Revenue,Expenses,Profit,Start Date,End Date
2017Q1,115904,86544,29360,2016-02-01,2016-04-30
2017Q2,120854,89485,31369,2016-05-01,2016-07-31
2017Q3,118179,87484,30695,2016-08-01,2016-10-31
2017Q4,130936,97743,33193,2016-11-01,2017-01-31
2018Q1,117542,87688,29854,2017-02-01,2017-04-30
