![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Provincial GDP by Industry

This notebook explores gross domestic product using [data published by Statistics Canada on 2018-05-02](http://www.statcan.gc.ca/daily-quotidien/180502/dq180502a-fra.htm). The goal is to build an overview of the Canadian economy, broken down by province and by industry sector. Amounts are in millions of chained (2007) dollars.

Specific questions I want to answer:

+ What are the 10 largest sector line items in Canada?
+ What are the 10 largest sector line items in each province?

> Author: [J. Colliander](http://colliand.com)
> Date: 2018-05-03

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 

In [None]:
## The data was downloaded from StatsCan.
## The first 3 rows contain header information that we don't want in the data frame.
## The last 28 rows contain footnotes and information about the data set. We don't want those in the frame.
## skipfooter seemed to require changing the engine from default 'c' to 'python'.
gitems = pd.read_csv('./data/cansim-3790030-eng-4789079444265787071.csv', skiprows = 3, skipfooter = 28, engine='python')
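
To make the effect of `skiprows` and `skipfooter` concrete, here is a minimal sketch on an in-memory file. The miniature CSV and every value in it are invented for illustration and are not the Statistics Canada data; only the layout (title lines, header row, data rows, footnotes) mimics the real download.

```python
import io
import pandas as pd

# Hypothetical miniature file: 3 title lines, 1 header row,
# 2 data rows, then 2 footnote lines. All values are made up.
raw = "\n".join([
    "Table title",
    "Survey description",
    "Units: millions of dollars",
    "Geography,NAICS,2016,2017",
    "Alberta,Oil and gas extraction [211],100,110",
    "Ontario,Real estate [531],90,95",
    "Footnotes:",
    "1. Illustrative footnote",
])

# skiprows drops the title lines, skipfooter drops the footnotes.
# skipfooter is not supported by the default 'c' engine, hence engine='python'.
demo = pd.read_csv(io.StringIO(raw), skiprows=3, skipfooter=2, engine="python")
print(demo)
```

The year columns come back with string labels such as `'2017'`, which is why the rest of the notebook indexes years as strings.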

In [None]:
## Throw away lines with internal Provincial and territorial totals.
## There may be other internal totals included here.
## TODO Review the footnotes and accompanying information on the data set.
df = gitems.loc[~(gitems['North American Industry Classification System (NAICS)'].str.contains('All industries'))]
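
The filtering step can be sketched on a toy frame as follows. The rows and numbers are hypothetical; the point is only to show how negating a `str.contains` mask removes the total rows so they are not counted twice in later sums. Passing `na=False` and `regex=False` is an optional precaution assumed here, so that missing labels do not break the negation and the pattern is matched literally.

```python
import pandas as pd

# Hypothetical rows: one "All industries" total per province plus one detail row.
toy = pd.DataFrame({
    "Geography": ["Alberta", "Alberta", "Ontario", "Ontario"],
    "NAICS": [
        "All industries",
        "Oil and gas extraction [211]",
        "All industries",
        "Real estate [531]",
    ],
    "2017": [300, 110, 700, 95],
})

# True for the rows that carry a provincial total.
is_total = toy["NAICS"].str.contains("All industries", regex=False, na=False)

# Keep only the detail rows; .copy() gives an independent frame.
detail = toy.loc[~is_total].copy()
print(detail)
```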

In [None]:
df

In [None]:
years = ['2013', '2014','2015', '2016', '2017']

In [None]:
## Total GDP vs Years
for year in years:
    print(year, df[year].sum())

In [None]:
df['Geography'].unique()

In [None]:
## Shorten the long column label.
df = df.rename(columns={'North American Industry Classification System (NAICS)': 'NAICS'})

In [None]:
df.columns

In [None]:
## Industry sectors sorted by total value across all provinces and territories
df.groupby('NAICS')['2017'].sum().sort_values(ascending=False).head(n=25)

In [None]:
## GDP by province or territory, 2013-2017, sorted by 2017 total
pd.pivot_table(df, index=['Geography'], aggfunc='sum', values=years).sort_values(
    by='2017', ascending=False).plot(kind='bar', figsize=(10, 6))

In [None]:
## Top 10 sectors by 2017 GDP, summed across all provinces and territories
pd.pivot_table(df, index=['NAICS'], aggfunc='sum', values=years).sort_values(
    by='2017', ascending=False).head(n=10).plot(kind='bar', figsize=(10, 6))

In [None]:
## 2017 GDP for each (Geography, NAICS) pair, largest first
pd.pivot_table(df, index=['Geography', 'NAICS'], aggfunc='max', values=['2017']).sort_values(
    by='2017', ascending=False).head(n=25)
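
The table above ranks all (Geography, NAICS) pairs together, so large provinces dominate it. For the second question posed at the top of this notebook (the largest sectors of each province), one possible approach is to rank within each province instead. The sketch below assumes that approach; `top_sectors` is a hypothetical helper and the toy values are invented.

```python
import pandas as pd

# Hypothetical data: three sectors in each of two provinces.
toy = pd.DataFrame({
    "Geography": ["Alberta", "Alberta", "Alberta", "Ontario", "Ontario", "Ontario"],
    "NAICS": [
        "Oil and gas extraction", "Real estate", "Construction",
        "Real estate", "Manufacturing", "Finance and insurance",
    ],
    "2017": [110, 30, 25, 95, 80, 70],
})

def top_sectors(frame, year="2017", n=10):
    """Return the n largest sectors of every province for the given year."""
    totals = frame.groupby(["Geography", "NAICS"], as_index=False)[year].sum()
    # Sort provinces alphabetically and sectors by descending value within each.
    ranked = totals.sort_values(["Geography", year], ascending=[True, False])
    # head(n) on a groupby keeps the first n rows of each group, in order.
    return ranked.groupby("Geography").head(n)

top2 = top_sectors(toy, n=2)
print(top2)
```

On the real frame the call would be `top_sectors(df, '2017', n=10)`, assuming the `NAICS` rename above has been applied.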

In [None]:
## Top 10 individual line items (Geography x NAICS) by 2017 GDP
pd.pivot_table(df, index=['Geography', 'NAICS'], aggfunc='sum', values=years).sort_values(
    by='2017', ascending=False).head(n=10).plot(kind='bar', figsize=(10, 6))

Alberta's GDP from oil and gas extraction [211] is comparable to Ontario's GDP from real estate [531]. The real estate [531] GDP of British Columbia (66) is roughly half of either Alberta's oil and gas extraction or Ontario's real estate.

## Analyzing provincial GDP across sectors

In [None]:
df['Geography'].unique()

In [None]:
provinces = ['Newfoundland and Labrador', 'Prince Edward Island', 'Nova Scotia',
       'New Brunswick', 'Quebec', 'Ontario', 'Manitoba', 'Saskatchewan',
       'Alberta', 'British Columbia (66)', 'Yukon',
       'Northwest Territories', 'Nunavut']

In [None]:
whichprovince = 'British Columbia (66)'
pd.pivot_table(df.loc[df['Geography'] == whichprovince], index=['NAICS'],
               aggfunc='sum', values=years).sort_values(by='2017', ascending=False).head(n=20).plot(
                kind='bar', cmap="viridis", figsize=(16, 10), title=whichprovince)

In [None]:
whichprovince = 'Alberta'
pd.pivot_table(df.loc[df['Geography'] == whichprovince], index=['NAICS'],
               aggfunc='sum', values=years).sort_values(by='2017', ascending=False).head(n=20).plot(
                kind='bar', cmap="viridis", figsize=(12, 7), title=whichprovince)
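
The two cells above differ only in the province name and figure size, so the pattern could be wrapped in a small function. This is a sketch, not part of the original analysis: `top_sector_table` and `plot_top_sectors` are hypothetical names, the toy values are invented, and `groupby` is used in place of `pivot_table` to compute the same per-sector sums.

```python
import pandas as pd

def top_sector_table(frame, province, years, n=10):
    """Return the n largest sectors of one province, ranked by the last year in `years`."""
    subset = frame.loc[frame["Geography"] == province]
    table = subset.groupby("NAICS")[years].sum()
    return table.sort_values(by=years[-1], ascending=False).head(n)

def plot_top_sectors(frame, province, years, n=10, **plot_kwargs):
    """Bar chart of top_sector_table; extra keyword arguments go to DataFrame.plot."""
    table = top_sector_table(frame, province, years, n)
    return table.plot(kind="bar", title=province, **plot_kwargs)

# Hypothetical data to exercise the table helper.
toy = pd.DataFrame({
    "Geography": ["British Columbia (66)"] * 3 + ["Alberta"],
    "NAICS": ["Real estate", "Construction", "Forestry and logging",
              "Oil and gas extraction"],
    "2016": [50, 20, 5, 100],
    "2017": [55, 22, 6, 110],
})

bc = top_sector_table(toy, "British Columbia (66)", ["2016", "2017"], n=2)
print(bc)
```

With the notebook's own frame, a loop such as `for p in ['Alberta', 'Ontario']: plot_top_sectors(df, p, years, n=20, cmap='viridis', figsize=(12, 7))` would draw one chart per province.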

In [None]:
whichprovince = 'Alberta'
pd.pivot_table(df.loc[df['Geography'] == whichprovince], index=['NAICS'],
               aggfunc='sum', values=['2017']).sort_values(by='2017', ascending=False).head(n=15).plot(
                kind='pie', subplots=True, title=whichprovince, legend=False, figsize=(6, 6), autopct='%1.1f%%')

In [None]:
whichprovince = 'Ontario'
pd.pivot_table(df.loc[df['Geography'] == whichprovince], index=['NAICS'],
               aggfunc='sum', values=['2017']).sort_values(by='2017', ascending=False).head(n=10).plot(
                kind='pie', subplots=True, title=whichprovince, legend=False, figsize=(6, 6), autopct='%1.1f%%')

In [None]:
whichprovince = 'British Columbia (66)'
pd.pivot_table(df.loc[df['Geography'] == whichprovince], index=['NAICS'],
               aggfunc='sum', values=['2017']).sort_values(by='2017', ascending=False).head(n=10).plot(
                kind='pie', subplots=True,
                title=whichprovince, legend=False, autopct='%1.1f%%', radius=1, figsize=(6, 6))
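
One caveat when reading the pie charts: the percentages printed by `autopct` are relative to the slices that are plotted (the top 10 or 15 sectors), not to the whole provincial economy. A possible way to get shares of the provincial total is to fold the remaining sectors into an "Other" slice. The sketch below assumes that approach; `sector_shares` is a hypothetical helper and the toy numbers are invented.

```python
import pandas as pd

def sector_shares(frame, province, year="2017", n=10):
    """Percent share of each top-n sector in the province total, plus an 'Other' remainder."""
    totals = frame.loc[frame["Geography"] == province].groupby("NAICS")[year].sum()
    totals = totals.sort_values(ascending=False)
    top = totals.head(n)
    other = totals.iloc[n:].sum()
    if other > 0:
        # Collect every sector outside the top n into a single slice.
        top = pd.concat([top, pd.Series({"Other": other})])
    return 100 * top / totals.sum()

# Hypothetical data: four sectors in one province, summing to 100.
toy = pd.DataFrame({
    "Geography": ["Ontario"] * 4,
    "NAICS": ["Real estate", "Manufacturing", "Finance and insurance", "Utilities"],
    "2017": [50, 30, 15, 5],
})

shares = sector_shares(toy, "Ontario", n=2)
print(shares)
```

Because the shares already sum to 100, plotting them with `shares.plot(kind='pie', autopct='%1.1f%%')` would label each slice with its share of the provincial total.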

In [None]:
## This code creates a pivot table. Drag the column headings and play with the drop down menus to explore.
from pivottablejs import pivot_ui
pivot_ui(df)

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)