## Pandas Time Series Analysis Part 1

In [8]:
import pandas as pd
apple = pd.read_csv('D:\\Pandas\\CodeBasics\\datasets\\15_applnodates.csv', sep = '\t')
apple.head() # The df has no dates, which matter for most time series data and especially for stock prices

Unnamed: 0,Open,High,Low,Close,Volume
0,153.17,153.33,152.22,153.18,16404088
1,153.58,155.45,152.89,155.45,27770715
2,154.34,154.45,153.46,153.93,25331662
3,153.9,155.81,153.78,154.45,26624926
4,155.02,155.98,154.48,155.37,21069647


**Scenario** - We want to generate dates for the df and set them as its index. We are going to use the pandas date_range() function to achieve this.<br>
 <br>**Creating the date range...**

In [9]:
rng = pd.date_range(start = '6/1/2017', end = '6/30/2017', freq = 'B')
#First we call the date_range function
#start - the first date that we want in the range
#end - the last date that we want in the range (it is included if it matches the frequency)
#freq - the spacing between dates, e.g. daily, weekly or monthly; here 'B' means business days (excl weekends)
rng

DatetimeIndex(['2017-06-01', '2017-06-02', '2017-06-05', '2017-06-06',
               '2017-06-07', '2017-06-08', '2017-06-09', '2017-06-12',
               '2017-06-13', '2017-06-14', '2017-06-15', '2017-06-16',
               '2017-06-19', '2017-06-20', '2017-06-21', '2017-06-22',
               '2017-06-23', '2017-06-26', '2017-06-27', '2017-06-28',
               '2017-06-29', '2017-06-30'],
              dtype='datetime64[ns]', freq='B')

Looking at the range that was created, we see that 3rd & 4th June are missing as they are weekends. This is just what we wanted.
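A quick way to double-check this, rather than reading the output by eye, is to look at the day of the week of each date. This is only a sketch; `weekend_dates` is a name made up for illustration.

```python
import pandas as pd

# Same business-day range as above
rng = pd.date_range(start='6/1/2017', end='6/30/2017', freq='B')

# dayofweek runs from 0 (Monday) to 6 (Sunday), so weekends are 5 and 6
weekend_dates = rng[rng.dayofweek >= 5]

print(len(rng))            # number of business days in June 2017
print(len(weekend_dates))  # 0 means no weekend dates slipped in
```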

**Setting this range as our df index**

In [11]:
apple.set_index(rng, inplace = True) # This sets our index to the range that we just created
apple.head() # And we have our datetime index 

Unnamed: 0,Open,High,Low,Close,Volume
2017-06-01,153.17,153.33,152.22,153.18,16404088
2017-06-02,153.58,155.45,152.89,155.45,27770715
2017-06-05,154.34,154.45,153.46,153.93,25331662
2017-06-06,153.9,155.81,153.78,154.45,26624926
2017-06-07,155.02,155.98,154.48,155.37,21069647
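Setting the index this way only works when the range has exactly one date per row of the df. A minimal sketch of a guard for that is below. Note that `attach_dates` is a hypothetical helper written for these notes, not a pandas function, and `demo` is a small stand-in df that reuses the first five Close prices shown above.

```python
import pandas as pd

def attach_dates(df, dates):
    """Return a copy of df indexed by dates, checking the lengths first.

    Hypothetical helper for illustration only (not part of pandas).
    """
    if len(df) != len(dates):
        raise ValueError(
            f"df has {len(df)} rows but the range has {len(dates)} dates"
        )
    # set_index without inplace=True returns a new df
    return df.set_index(dates)

# Stand-in for the apple df: first five Close prices from the output above
demo = pd.DataFrame({'Close': [153.18, 155.45, 153.93, 154.45, 155.37]})
dates = pd.date_range(start='6/1/2017', periods=5, freq='B')

demo = attach_dates(demo, dates)
print(demo)
```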


**See previous notes for the benefits of setting the index to a DatetimeIndex, but in summary...**<br>You can chart your df/Series data against time<br>You can select rows by specifying a date range<br>You can resample the data and apply aggregate methods like mean, max, min, count etc<br>
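The date-range selection and aggregation benefits can be sketched on dummy data. The `demo` df below is made up for illustration (its Close values are just 0, 1, 2, ...), so the numbers mean nothing; only the mechanics matter.

```python
import numpy as np
import pandas as pd

# Dummy df with a business-day DatetimeIndex (values are placeholders)
rng = pd.date_range(start='2017-06-01', end='2017-06-30', freq='B')
demo = pd.DataFrame({'Close': np.arange(len(rng), dtype=float)}, index=rng)

# Select rows by date range using date strings (both ends are included)
first_full_week = demo.loc['2017-06-05':'2017-06-09']

# Resample to weekly buckets (each ending on a Sunday) and aggregate
weekly_mean = demo['Close'].resample('W').mean()
weekly_max = demo['Close'].resample('W').max()

print(first_full_week)
print(weekly_mean.head())
```

Charting follows the same idea, e.g. calling `.plot()` on the Close column, assuming matplotlib is installed.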

**Getting our weekend data back - asfreq()**<br>As there was no trading over the weekend, we can assume that the closing price on Friday will be the price for Saturday and Sunday.<br>**asfreq()** - Converts your df to the frequency that you specify, adding rows for any dates that are missing...

In [14]:
apple.asfreq('D',method = 'pad')
#Firstly, we call the asfreq method
#We pass it 'D', the daily (calendar day) frequency, which does include weekends
#pad is the fill method that carries the prices forward from 2nd June to 3rd & 4th June
apple.head()

Unnamed: 0,Open,High,Low,Close,Volume
2017-06-01,153.17,153.33,152.22,153.18,16404088
2017-06-02,153.58,155.45,152.89,155.45,27770715
2017-06-05,154.34,154.45,153.46,153.93,25331662
2017-06-06,153.9,155.81,153.78,154.45,26624926
2017-06-07,155.02,155.98,154.48,155.37,21069647


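**NOTE:** This has not worked as expected. The output still shows only business days because asfreq() returns a new df rather than changing apple in place. The result was never assigned to a variable, so apple.head() displays the original, unchanged df.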
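One way to get the weekend rows to show up is to keep the df that asfreq() returns. The sketch below uses a small stand-in df (`prices`, built from the first five Close values above) instead of the apple df, and uses 'ffill', which is the same fill method as 'pad'.

```python
import pandas as pd

# Stand-in for the apple df: five business days with the Close prices from above
rng = pd.date_range(start='2017-06-01', end='2017-06-07', freq='B')
prices = pd.DataFrame({'Close': [153.18, 155.45, 153.93, 154.45, 155.37]}, index=rng)

# asfreq returns a NEW df, so assign the result to a variable
daily = prices.asfreq('D', method='ffill')

print(len(prices))  # original df: business days only
print(len(daily))   # new df: every calendar day from 1st to 7th June
print(daily)        # 3rd & 4th June carry Friday's (2nd June) price
```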

**Scenario** - Imagine that we want to create a date range, as above, but we only have the start date and no end date. We do know the number of periods that we want, i.e. how many datetime elements the range should contain. We can create that range by...

In [15]:
rng = pd.date_range(start = '1/1/2017', periods = 72, freq = 'B')
#When using periods, instead of end, and a frequency of business days,
#pandas will generate 72 business days starting from 1/1/2017
#(1/1/2017 is a Sunday, so the first date in the range is 2017-01-02)
#This will work for any periods and freq you specify, e.g. we could have had 72 hourly elements
rng

DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05',
               '2017-01-06', '2017-01-09', '2017-01-10', '2017-01-11',
               '2017-01-12', '2017-01-13', '2017-01-16', '2017-01-17',
               '2017-01-18', '2017-01-19', '2017-01-20', '2017-01-23',
               '2017-01-24', '2017-01-25', '2017-01-26', '2017-01-27',
               '2017-01-30', '2017-01-31', '2017-02-01', '2017-02-02',
               '2017-02-03', '2017-02-06', '2017-02-07', '2017-02-08',
               '2017-02-09', '2017-02-10', '2017-02-13', '2017-02-14',
               '2017-02-15', '2017-02-16', '2017-02-17', '2017-02-20',
               '2017-02-21', '2017-02-22', '2017-02-23', '2017-02-24',
               '2017-02-27', '2017-02-28', '2017-03-01', '2017-03-02',
               '2017-03-03', '2017-03-06', '2017-03-07', '2017-03-08',
               '2017-03-09', '2017-03-10', '2017-03-13', '2017-03-14',
               '2017-03-15', '2017-03-16', '2017-03-17', '2017-03-20',
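The hourly variant mentioned in the comments could look something like the sketch below. It passes a `pd.offsets.Hour()` object as the freq rather than a string alias, because the accepted spelling of the hourly alias differs between pandas versions. The variable names are made up for illustration.

```python
import pandas as pd

# 72 business days from a start date (same call as above)
business_days = pd.date_range(start='1/1/2017', periods=72, freq='B')

# 72 hourly timestamps from the same start date
hourly = pd.date_range(start='1/1/2017', periods=72, freq=pd.offsets.Hour())

print(business_days[0], business_days[-1])  # first and last business day
print(hourly[0], hourly[-1])                # first and last hour
```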
      

**Use Case**<br>This is most useful for generating dummy data, e.g. when testing code

In [19]:
import numpy as np
np.random.randint(1,10,len(rng))
#random.randint - generates random integers
#Here the integers range from 1 to 9, as the upper bound (10) is excluded
#We generate one number per element in rng, i.e. 72 numbers in our example

array([7, 6, 3, 4, 5, 6, 9, 8, 5, 7, 5, 5, 5, 2, 6, 8, 6, 2, 6, 1, 3, 2,
       1, 3, 5, 8, 3, 4, 7, 2, 5, 2, 8, 5, 6, 7, 8, 3, 1, 4, 4, 2, 8, 8,
       3, 8, 2, 9, 3, 8, 7, 7, 9, 4, 1, 5, 8, 3, 6, 5, 5, 3, 6, 7, 3, 6,
       1, 3, 2, 7, 1, 1])

We can generate a pandas series out of this...

In [21]:
ts = pd.Series(np.random.randint(1,10,len(rng)), index = rng)
ts.head(15)

2017-01-02    7
2017-01-03    7
2017-01-04    2
2017-01-05    7
2017-01-06    2
2017-01-09    6
2017-01-10    1
2017-01-11    5
2017-01-12    6
2017-01-13    8
2017-01-16    4
2017-01-17    6
2017-01-18    3
2017-01-19    9
2017-01-20    9
Freq: B, dtype: int32
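The values change every time the cell is run, which is why this output differs from the array printed earlier. If repeatable test data is needed, one option is a seeded NumPy generator. This is a sketch only; the seed value 42 and the names `gen` and `ts_again` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range(start='1/1/2017', periods=72, freq='B')

# A seeded generator produces the same "random" numbers on every run
gen = np.random.default_rng(seed=42)
ts = pd.Series(gen.integers(1, 10, size=len(rng)), index=rng)  # integers 1 to 9

# Re-creating the generator with the same seed gives an identical Series
gen_again = np.random.default_rng(seed=42)
ts_again = pd.Series(gen_again.integers(1, 10, size=len(rng)), index=rng)

print(ts.head())
print(ts.equals(ts_again))
```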

**Note:** While date_range() with freq = 'B' can deal with weekends, it cannot deal with holidays like bank holidays or Independence Day. For that, you have to use a holiday calendar, but more on that in another tutorial.
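As a brief preview, a holiday-aware range might be sketched like this, assuming the US federal holiday calendar that ships with pandas is the one you want. Other markets and countries would need their own calendar.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Business-day frequency that also skips US federal holidays
us_business_day = CustomBusinessDay(calendar=USFederalHolidayCalendar())

plain = pd.date_range(start='7/1/2017', end='7/31/2017', freq='B')
with_holidays = pd.date_range(start='7/1/2017', end='7/31/2017', freq=us_business_day)

# 4th July 2017 is a Tuesday: present in the plain range, skipped in the other
print(pd.Timestamp('2017-07-04') in plain)
print(pd.Timestamp('2017-07-04') in with_holidays)
```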