In [1]:
import numpy as np
import pandas as pd
from datetime import datetime

In [2]:
data = {'date': ['2018-01-01', '2018-02-01',
                 '2018-03-01', '2018-04-01',
                 '2018-05-01', '2018-06-01',
                 '2018-01-01', '2018-02-01',
                 '2018-03-01', '2018-04-01',
                 '2018-05-01', '2018-06-01'],
        'visitors': [35, 30, 82, 26,
                     83, 46, 40, 57, 95, 57, 87, 42]}

In [3]:
df = pd.DataFrame(data,
                  columns=['date', 'visitors'])

In [4]:
df.head()

,date,visitors
0,2018-01-01,35
1,2018-02-01,30
2,2018-03-01,82
3,2018-04-01,26
4,2018-05-01,83


In [5]:
df.dtypes

date        object
visitors     int64
dtype: object

The visitors column is of integer type, but the date
column is shown as an object. We know that it holds dates,
so it would be preferable to use a more relevant type. We
can convert the column with the Pandas to_datetime
function:

In [6]:
df['date'] = pd.to_datetime(df['date'])

In [7]:
df.dtypes

date        datetime64[ns]
visitors             int64
dtype: object
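
As a side note, to_datetime accepts optional arguments that control parsing. The following is a minimal sketch, assuming a small made-up series called `raw` that contains one malformed entry. It shows an explicit `format` string together with `errors='coerce'`, which replaces unparseable values with NaT instead of raising an error.

```python
import pandas as pd

# Hypothetical input: two valid dates and one malformed entry
raw = pd.Series(['2018-01-01', '2018-02-01', 'not a date'])

# An explicit format removes ambiguity; errors='coerce' turns
# entries that cannot be parsed into NaT (not-a-time)
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(parsed)
print(parsed.isna().sum())  # number of entries that failed to parse
```

Whether to coerce or to let the error surface depends on the data; coercing is convenient for exploration, but it can hide problems in the input.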

We can use to_datetime to convert Pandas columns into
date objects.

Furthermore, since the date provides an ordered sequence for
our data, we can do a couple of useful things. First, we can
set the index to be given by the date column, and second,
we can order the dataframe by this index:

In [8]:
df.set_index('date',  inplace=True)
df.sort_index(inplace=True)
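
The same two steps can also be written without `inplace=True`, by chaining the calls so that each returns a new dataframe. This is only a sketch on a small made-up dataframe called `sample`, not a replacement for the cell above.

```python
import pandas as pd

# Hypothetical, deliberately unsorted sample data
sample = pd.DataFrame({
    'date': pd.to_datetime(['2018-02-01', '2018-01-01', '2018-01-01']),
    'visitors': [30, 35, 40],
})

# set_index and sort_index each return a new dataframe,
# so the original `sample` is left untouched
indexed = sample.set_index('date').sort_index()

print(indexed)
print(indexed.index.is_monotonic_increasing)
```

Chaining keeps the original dataframe available, which can be handy when experimenting in a notebook.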



In [9]:
df.head()

date,visitors
2018-01-01,35
2018-01-01,40
2018-02-01,30
2018-02-01,57
2018-03-01,82


As we can see in the output above, the rows of the dataset
have been ordered by the date index. We can now apply
some slicing and dicing to our dataframe. For instance, we
can look at the visitors for May 2018:

In [10]:
# Suppose we are interested in the visitors recorded in May 2018.
# Recent Pandas versions require .loc for this kind of
# partial-string row selection.
df.loc['2018-05']


date,visitors
2018-05-01,83
2018-05-01,87
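
Partial-string selection works at other resolutions too. Here is a minimal sketch, assuming a small made-up dataframe called `visits` with a sorted DatetimeIndex, that selects a whole month and a whole year through `.loc`.

```python
import pandas as pd

# Hypothetical data spanning two years, indexed by date
idx = pd.to_datetime(['2018-01-01', '2018-05-01', '2018-05-01', '2019-01-01'])
visits = pd.DataFrame({'visitors': [35, 83, 87, 10]}, index=idx)

may_2018 = visits.loc['2018-05']   # every row dated in May 2018
year_2018 = visits.loc['2018']     # every row dated in 2018

print(len(may_2018), len(year_2018))  # → 2 3
```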


We can also use the colon operator to select the data within a date range. This is similar to list slicing, except that both endpoints are included.

In [12]:
df[datetime(2018,3,1):]

date,visitors
2018-03-01,82
2018-03-01,95
2018-04-01,26
2018-04-01,57
2018-05-01,83
2018-05-01,87
2018-06-01,46
2018-06-01,42


In [14]:
df[datetime(2018,3,1):datetime(2018,5,1)]

date,visitors
2018-03-01,82
2018-03-01,95
2018-04-01,26
2018-04-01,57
2018-05-01,83
2018-05-01,87
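
Date strings can be used as slice bounds as well. The sketch below assumes a made-up dataframe called `visits` with one row per month, and illustrates that a label-based slice through `.loc` keeps both endpoints.

```python
import pandas as pd

# Hypothetical monthly data, indexed by date
idx = pd.to_datetime(['2018-02-01', '2018-03-01', '2018-04-01',
                      '2018-05-01', '2018-06-01'])
visits = pd.DataFrame({'visitors': [30, 82, 26, 83, 46]}, index=idx)

# Unlike list slicing, the end label is part of the result
window = visits.loc['2018-03-01':'2018-05-01']

print(window['visitors'].tolist())  # → [82, 26, 83]
```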



# Truncate Method

The truncate method can help us keep all the data points
before or after a given date. The index must be sorted for
this to work. In this case, let us ask for the data up to and
including 1 March 2018:

In [18]:
df.truncate(after='2018-03-01')

date,visitors
2018-01-01,35
2018-01-01,40
2018-02-01,30
2018-02-01,57
2018-03-01,82
2018-03-01,95


In [20]:
df.truncate(before='2018-05-01')

date,visitors
2018-05-01,83
2018-05-01,87
2018-06-01,46
2018-06-01,42
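
The `before` and `after` arguments can also be combined in a single call to keep only a window of dates. A minimal sketch, assuming a made-up dataframe called `visits` with a sorted date index:

```python
import pandas as pd

# Hypothetical monthly data; truncate needs a sorted index
idx = pd.to_datetime(['2018-02-01', '2018-03-01', '2018-04-01',
                      '2018-05-01', '2018-06-01'])
visits = pd.DataFrame({'visitors': [30, 82, 26, 83, 46]}, index=idx)

# Drop everything before 1 March and everything after 1 May
window = visits.truncate(before='2018-03-01', after='2018-05-01')

print(window['visitors'].tolist())  # → [82, 26, 83]
```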


We can count the number of data points per entry in the index:

In [22]:
df.groupby('date').count()

date,visitors
2018-01-01,2
2018-02-01,2
2018-03-01,2
2018-04-01,2
2018-05-01,2
2018-06-01,2


In [24]:
# We can also add up the visitor counts that share the same date
df.groupby('date').sum()

date,visitors
2018-01-01,75
2018-02-01,87
2018-03-01,177
2018-04-01,83
2018-05-01,170
2018-06-01,88
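
Several aggregations can be computed in one pass with `agg`. The following sketch assumes a small made-up dataframe called `visits` whose index is named `date`, mirroring the first two months of the data above.

```python
import pandas as pd

# Hypothetical data: two entries per date, index named 'date'
idx = pd.to_datetime(['2018-01-01', '2018-01-01', '2018-02-01', '2018-02-01'])
visits = pd.DataFrame({'visitors': [35, 40, 30, 57]}, index=idx)
visits.index.name = 'date'

# One column per aggregation function
summary = visits.groupby('date')['visitors'].agg(['count', 'sum', 'mean'])

print(summary)
```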


# Resample Method
We can change the frequency of the data set with the resample method.
Let us use the 'M' offset alias to tell Pandas to
create monthly statistics. Note that newer Pandas releases
deprecate 'M' in favour of 'ME'. For the mean we have:

In [26]:
df.resample('M').mean()

date,visitors
2018-01-31,37.5
2018-02-28,43.5
2018-03-31,88.5
2018-04-30,41.5
2018-05-31,85.0
2018-06-30,44.0
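
Resampling is most useful when the observations are finer-grained than the target frequency. Here is a minimal sketch, assuming made-up observations on arbitrary days of the month. It uses the 'MS' (month-start) alias, so each monthly total is labelled with the first day of its month rather than the last.

```python
import pandas as pd

# Hypothetical observations on arbitrary days within each month
idx = pd.to_datetime(['2018-01-01', '2018-01-15', '2018-02-01', '2018-02-20'])
visits = pd.DataFrame({'visitors': [35, 40, 30, 57]}, index=idx)

# Group the rows into calendar months and add up the visitors
monthly_total = visits.resample('MS')['visitors'].sum()

print(monthly_total.tolist())  # → [75, 87]
print(monthly_total.index[0])  # → 2018-01-01 00:00:00
```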


An offset alias, such as 'M' in the resample call above, is a
string that represents a common time series frequency. We
can see some of these aliases in Table 1.1.

| Alias | Description |
|---|---|
| B | business day frequency |
| C | custom business day frequency |
| D | calendar day frequency |
| W | weekly frequency |
| M | month-end frequency |
| SM | semi-month-end frequency (15th and end of month) |
| BM | business month-end frequency |
| CBM | custom business month-end frequency |
| MS | month-start frequency |
| SMS | semi-month-start frequency (1st and 15th) |
| BMS | business month-start frequency |
| CBMS | custom business month-start frequency |
| Q | quarter-end frequency |
| BQ | business quarter-end frequency |
| QS | quarter-start frequency |
| BQS | business quarter-start frequency |
| A, Y | year-end frequency |
| BA, BY | business year-end frequency |
| AS, YS | year-start frequency |
| BAS, BYS | business year-start frequency |
| BH | business hour frequency |
| H | hourly frequency |
| T, min | minutely frequency |
| S | secondly frequency |
| L, ms | milliseconds |
| U, us | microseconds |
| N | nanoseconds |
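
Offset aliases are not specific to resample; they can also drive `pd.date_range`. The sketch below generates three timestamps for a few of the aliases in the table. The start dates are arbitrary choices for illustration.

```python
import pandas as pd

# Three consecutive calendar days
daily = pd.date_range(start='2018-01-01', periods=3, freq='D')

# The first day of three consecutive months
month_starts = pd.date_range(start='2018-01-01', periods=3, freq='MS')

# Three business days starting on a Friday: the weekend is skipped
business = pd.date_range(start='2018-01-05', periods=3, freq='B')

print(daily)
print(month_starts)
print(business)
```

Only aliases that have kept the same spelling across Pandas versions are used here, since some of the others have been renamed in newer releases.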

In [29]:
df.groupby('date').describe()

,visitors,visitors,visitors,visitors,visitors,visitors,visitors,visitors
date,count,mean,std,min,25%,50%,75%,max
2018-01-01,2.0,37.5,3.535534,35.0,36.25,37.5,38.75,40.0
2018-02-01,2.0,43.5,19.091883,30.0,36.75,43.5,50.25,57.0
2018-03-01,2.0,88.5,9.192388,82.0,85.25,88.5,91.75,95.0
2018-04-01,2.0,41.5,21.92031,26.0,33.75,41.5,49.25,57.0
2018-05-01,2.0,85.0,2.828427,83.0,84.0,85.0,86.0,87.0
2018-06-01,2.0,44.0,2.828427,42.0,43.0,44.0,45.0,46.0


In the table above, we see the descriptive statistics, grouped
by date, for the data entered manually earlier on. The count
column confirms that each date has two entries.
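
If the count column is not needed, it can be dropped from the describe output. A minimal sketch, assuming a small made-up dataframe called `visits` with an index named `date`; selecting the `visitors` column first gives flat column names that are easy to drop.

```python
import pandas as pd

# Hypothetical data: two entries per date, index named 'date'
idx = pd.to_datetime(['2018-01-01', '2018-01-01', '2018-02-01', '2018-02-01'])
visits = pd.DataFrame({'visitors': [35, 40, 30, 57]}, index=idx)
visits.index.name = 'date'

# describe() on a single column returns one flat column per statistic
stats = visits.groupby('date')['visitors'].describe()

# Remove the count column for a more compact table
stats_no_count = stats.drop(columns='count')

print(stats_no_count.columns.tolist())
# → ['mean', 'std', 'min', '25%', '50%', '75%', 'max']
```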

In [31]:
date = pd.to_datetime("14th of October, 2016")
print(date)

2016-10-14 00:00:00


We have successfully transformed a date given in natural
language to a time stamp. We can also do the opposite; in
other words, we can obtain a string of the time stamp to tel