## Get data


[This data](https://population.un.org/wpp/DVD/Files/2_Indicators%20(Probabilistic%20Projections)/UN_PPP2017_Output_PopTot.xls) comes from the United Nations Population Division. The dataset is a projection of world population from 2015 to 2100 at 5-year intervals. We use a small extracted subset, which can be found in `data/world_population.csv`.


In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('../data/world_population.csv')
df.head()

In [None]:
df.describe()

# Set Index

Setting the time column as the index has several benefits:
* The data is easier to read and understand.
* Plotting and computations become simpler.
* Operations such as rolling or moving averages can refer to time labels rather than positional indices.
* Each row is identified by a unique label, provided no year is repeated.


In [None]:
df.set_index('Year', inplace=True)
df.head()
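
The benefits listed above can be illustrated with a minimal sketch. The frame below is a hypothetical stand-in with made-up values and an assumed `Population` column name; it is not the UN data.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's data (made-up values, assumed column name)
demo = pd.DataFrame({
    'Year': [2015, 2020, 2025, 2030, 2035],
    'Population': [7_400_000, 7_800_000, 8_200_000, 8_500_000, 8_900_000],
})
demo = demo.set_index('Year')

# Label-based lookup: select by year instead of by row position
pop_2025 = demo.loc[2025, 'Population']

# Label-based slicing includes both endpoints
window = demo.loc[2020:2030]

# 3-point rolling mean; at 5-year spacing each window spans three consecutive rows
rolling_mean = demo['Population'].rolling(window=3).mean()

# set_index does not enforce uniqueness by default, so check it explicitly
labels_are_unique = demo.index.is_unique
```

Passing `verify_integrity=True` to `set_index` is another option if duplicate years should raise an error instead.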

## Plotting Data

In [None]:
# Set up plotting options
import matplotlib.pyplot as plt

# %pylab is deprecated; %matplotlib inline enables inline figures
%matplotlib inline
plt.rcParams['figure.figsize'] = (10, 6)

plt.style.use('ggplot')

In [None]:
df.plot()

In [None]:
# Changing the x-limit and y-limit for better readability
df.plot(xlim=(2010, 2110), ylim=(7e6,12e6))

In [None]:
# Use different colors for different time intervals

In [None]:
fig, ax = plt.subplots(1, 1)

# Plot the full series in blue
df.plot(ax=ax, color='b')

# Overplot the rows from position 8 onward in red
df.iloc[8:].plot(ax=ax, color='r')

ax.set_xlim(2010, 2110)
ax.set_ylim(7e6, 12e6)

# More analysis on new data

This section uses the Kaggle dataset of international airline passengers (monthly totals, in thousands). [Download](https://www.kaggle.com/andreazzini/international-airline-passengers) it and add the `international-airline-passengers.csv` file to the `data` folder.

In [None]:
airline_df = pd.read_csv('../data/international-airline-passengers.csv')

In [None]:
print(airline_df.dtypes)
airline_df.describe()

In [None]:
# Inspecting the tail reveals a problem in the last row.
# Always check both the head and the tail of a dataset.
airline_df.tail()
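
Checking the tail by eye works for small files. A programmatic check could look like the sketch below, which assumes the problem is a non-data footer row; the frame, its `Passengers` column, and the footer text are hypothetical and only mimic that situation.

```python
import pandas as pd

# Hypothetical frame mimicking a CSV that ends with a descriptive footer row
raw = pd.DataFrame({
    'Month': ['1960-10', '1960-11', '1960-12', 'Footer text, not a month'],
    'Passengers': [461.0, 390.0, 432.0, float('nan')],
})

# Values that do not match YYYY-MM become NaT instead of raising an error
parsed = pd.to_datetime(raw['Month'], format='%Y-%m', errors='coerce')

# Flag rows with an unparseable month or a missing value
bad_rows = raw[parsed.isna() | raw['Passengers'].isna()]

# Drop the flagged rows by index label
clean = raw.drop(bad_rows.index)
```

Inspecting `bad_rows` before dropping anything keeps the cleanup step explicit rather than assuming the problem is always in the last row.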

## Cleaning up data

In [None]:
# Drop the last row, which is not used in the analysis.
# .copy() avoids modifying a view of the original frame in later steps.
airline_df = airline_df.iloc[:-1].copy()

# Rename the column for readability
airline_df.rename(
    columns={'International airline passengers: monthly totals in thousands. Jan 49 ? Dec 60': 'No of Passanger'},
    inplace=True)

In [None]:
airline_df.tail()

In [None]:
airline_df.dtypes

In [None]:

# Parse the 'YYYY-MM' strings as datetimes; each maps to the first day of its month
airline_df['Month'] = pd.to_datetime(airline_df['Month'], format='%Y-%m')

In [None]:
airline_df.set_index('Month', inplace=True)

In [None]:
# Set the x-axis limits using datetime values
ax = airline_df.plot()
ax.set_xlim(pd.Timestamp('1948-09-01'), pd.Timestamp('1961-03-01'))

In [None]:
# Drill down to a specific time span (both endpoints are included)
airline_df.loc['1950-05-01':'1951-03-01'].plot(style='g--')
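
A `DatetimeIndex` also accepts partial date strings, so a whole year or a range of months can be selected without writing out full dates. The sketch below uses a hypothetical monthly frame with placeholder values rather than the airline data.

```python
import pandas as pd

# Hypothetical monthly series: 36 month-start timestamps beginning January 1949
idx = pd.date_range('1949-01-01', periods=36, freq='MS')
ts = pd.DataFrame({'value': range(36)}, index=idx)

# A year string selects every row in that year
year_1950 = ts.loc['1950']

# Month strings work as slice bounds; both endpoints are included
spring_1950 = ts.loc['1950-03':'1950-05']
```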

In [None]:
# Create a month-number column (1-12) from the index
airline_df['mon'] = airline_df.index.month
# Box plot of passenger counts grouped by month
airline_df.boxplot(column=['No of Passanger'], by='mon')
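
The box plot shows the per-month distribution visually. As a numeric complement, the same grouping can be summarized with `groupby`. The sketch below runs on a made-up series with an artificial July and August bump; the `passengers` column name is an assumption.

```python
import pandas as pd

# Made-up two-year monthly series: a yearly step plus a bump in July and August
idx = pd.date_range('1949-01-01', periods=24, freq='MS')
values = [100 + 10 * (d.year - 1949) + (30 if d.month in (7, 8) else 0) for d in idx]
demo = pd.DataFrame({'passengers': values}, index=idx)

# Group by calendar month and compute the statistics a box plot would show
monthly = demo.groupby(demo.index.month)['passengers'].agg(['median', 'min', 'max'])

# Month with the highest median (the first one in case of ties)
peak_month = monthly['median'].idxmax()
```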