
<img src="http://www.nserc-crsng.gc.ca/_gui/wmms.gif" alt="Canada logo" align="right">

<br>

<img src="http://www.triumf.ca/sites/default/files/styles/gallery_large/public/images/nserc_crsng.gif?itok=H7AhTN_F" alt="NSERC logo" align="right" width = 90>



# Exploring NSERC Awards Data


Canada's [Open Government Portal](http://open.canada.ca/en) includes [NSERC Awards Data](http://open.canada.ca/data/en/dataset/c1b0f627-8c29-427c-ab73-33968ad9176e) from 1995 through 2016.

The awards data (in .csv format) were copied to an [Amazon Web Services S3 bucket](http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html). This open Jupyter notebook is an instance of the Selecti

> **Acknowledgement:** I thank [Ian Allison](https://github.com/ianabc) and [James Colliander](http://colliand.com) of the [Pacific Institute for the Mathematical Sciences](http://www.pims.math.ca/) for building the [JupyterHub service](http://syzygy.ca) and for help with this notebook. -- I. Heisz

In [14]:
import numpy as np
import pandas as pd
import sys

frames = []

startYear = 1995
endYear   = 2017  # range() excludes the end value, so 2017 here means the 2016 file is the last one loaded.

for year in range(startYear, endYear):
    file = 'https://s3.ca-central-1.amazonaws.com/open-data-ro/NSERC/NSERC_GRT_FYR' + str(year) + '_AWARD.csv.gz'
    frames.append(pd.read_csv(file,
                              compression='gzip',
                              usecols = [1, 2, 3, 4, 5, 7, 9, 11, 12, 13, 17, 28],
                              encoding='latin-1'
                             )
                 )
    print(year)

# DataFrame.append was removed in pandas 2.0, so the yearly frames are concatenated once here.
df = pd.concat(frames)
 
## Rename columns for better readability.
df.columns = ['Name', 'Department', 'OrganizationID',
                 'Institution', 'ProvinceEN', 'CountryEN',
                 'FiscalYear', 'AwardAmount', 'ProgramID',
                 'ProgramNameEN', 'Committee', 'ResearchSubjectEN']

## Strip out any leading or trailing whitespace in the ProgramID column
df['ProgramID'] = df['ProgramID'].str.strip()

1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
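
The loading cell above reads one CSV per fiscal year and stacks the results into a single frame. As a minimal offline sketch of that pattern, the example below uses two made-up in-memory CSV strings (`fake_files` and its contents are hypothetical stand-ins, not NSERC data) and combines them with `pd.concat`. Unlike the notebook, it passes `ignore_index=True`, so the combined frame gets a fresh 0..n-1 index instead of keeping each file's own row numbers.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two yearly files; the notebook itself reads
# gzipped CSVs from an S3 bucket instead.
fake_files = {
    1995: "Name,FiscalYear,AwardAmount\nResearcher A,1995,10000\n",
    1996: "Name,FiscalYear,AwardAmount\nResearcher B,1996,12000\nResearcher C,1996,8000\n",
}

start_year = 1995
end_year = 1997  # range() stops before this value, so 1996 is the last year read

# Read each year into its own frame, then concatenate once at the end.
frames = [pd.read_csv(io.StringIO(fake_files[year]))
          for year in range(start_year, end_year)]

# ignore_index=True renumbers the rows instead of repeating each file's index.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Collecting the frames in a list and concatenating once avoids copying the growing frame on every loop iteration.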


## Define Methods

In [15]:
# Install plotly before importing it, so the imports below succeed on a fresh server.
!pip3 install plotly --user
import matplotlib.ticker as mtick
import plotly.graph_objs as go
import plotly.offline as py
from plotly.offline import init_notebook_mode, iplot
import plotly.tools as tls
import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas

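def nsercPlot (data):
    # Note: this function reads the notebook-level variables `title`,
    # `plotPointSizes` and `opacity`, which must be set before it is called.
    fig, axes = plt.subplots()

    xAxis = 'FiscalYear'
    yAxis = 'AwardAmount'

    # Total award amount per fiscal year. Selecting the column before summing
    # keeps non-numeric columns out of the aggregation.
    y = data.groupby(xAxis)[yAxis].sum()
    x = y.index

    plt.xlabel(xAxis, fontsize=14)
    plt.ylabel(yAxis, fontsize=14)
    plt.title(title)

    plt.plot(x,y)

    init_notebook_mode(connected=True)

    axes.scatter(x,y,s=plotPointSizes,alpha=opacity)
    canvas = FigureCanvas(fig)
    # Convert the matplotlib figure into an interactive plotly figure.
    plotly_fig = tls.mpl_to_plotly(fig)
    py.iplot(plotly_fig)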

def viewAvailableSearch(column, searchString):
    # List the distinct values of `column` that contain `searchString`.
    available = df.drop_duplicates(subset = column)
    available = available[available[column].str.contains(searchString, na=False)]
    # Named `sortedValues` to avoid shadowing the built-in sorted().
    sortedValues = available.sort_values(by=[column], ascending=[True])
    print(sortedValues.to_string(columns= [column], index=False))

def overview(column, data):
    # Print summary statistics for `column` of `data`.
    mean = data[column].mean()
    print('The mean of ' + str(column) + ' is ' + str(mean))

    median = data[column].median()
    print('The median of ' + str(column) + ' is ' + str(median))

    standardDeviation = data[column].std()
    print('The standard deviation of ' + str(column) + ' is ' + str(standardDeviation))

    awardCount = data.AwardAmount.count()
    print('The total number of awards for your selection is ' + str(awardCount))
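
The plotting method above reduces the awards to one value per fiscal year by grouping on `FiscalYear` and summing `AwardAmount`. The sketch below shows that aggregation step on its own, using a small made-up frame (`toy` and its values are illustrative, not taken from the NSERC files).

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "FiscalYear": [2009, 2009, 2010, 2010, 2010],
    "AwardAmount": [25000, 23400, 25000, 20200, 24500],
})

# One total per fiscal year; the group keys become the index of the result.
yearly_total = toy.groupby("FiscalYear")["AwardAmount"].sum()

x = yearly_total.index   # fiscal years, used for the horizontal axis
y = yearly_total.values  # summed award amounts, used for the vertical axis
print(yearly_total)
```

Because the plotted value is a yearly sum, the curve reflects total funding per year rather than the size of individual awards.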

## Engage Grants Program

In [79]:
selectedData = df
viewAvailableSearch('ProgramNameEN','nga')

ProgramNameEN
Engage Grants Program                         ...
Engage Plus Grants Program                    ...
                  Engage Plus Grants for Colleges
                  Engage Plus Grants for colleges
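
The search output above lists "Engage Plus Grants for Colleges" twice because the two spellings differ in letter case, and the trailing `...` suggests that some names carry padding. `str.contains` is also case-sensitive by default and treats the search string as a regular expression. As a rough sketch of one way to tidy such a list, the example below strips whitespace and case-folds the names before dropping duplicates. The `names` series is a hand-made stand-in; "Some Other Program" is a hypothetical filler entry.

```python
import pandas as pd

# Hand-made sample of program names, including padding and a case variant.
names = pd.Series([
    "Engage Grants Program          ",
    "Engage Plus Grants for Colleges",
    "Engage Plus Grants for colleges",
    "Some Other Program",
])

# case=False makes the substring search case-insensitive.
matches = names[names.str.contains("engage", case=False, na=False)]

# Strip padding and fold case so that spelling variants collapse into one entry.
normalized = matches.str.strip().str.casefold().drop_duplicates()
print(normalized.tolist())
```

Filtering on `ProgramID`, as the next cell does, avoids the issue entirely because the identifier does not depend on how the name is spelled.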


In [68]:
onlyEngageGrantsProgram = selectedData.loc[(selectedData['ProgramID'] == "EGP")]
onlyEngageGrantsProgram

Unnamed: 0,Name,Department,OrganizationID,Institution,ProvinceEN,CountryEN,FiscalYear,AwardAmount,ProgramID,ProgramNameEN,Committee,ResearchSubjectEN
1003,"Bahrami, Majid","Engineering Science, School of",5,Simon Fraser University,British Columbia,CANADA,2009,25000,EGP,Engage Grants Program ...,1554,Mechanical engineering
1325,"Bassi, Amarjeet",Chemical and Biochemical Engineering,36,University of Western Ontario,Ontario,CANADA,2009,25000,EGP,Engage Grants Program ...,1553,Biochemical engineering
1664,"Bengio, Yoshua",Informatique et recherche opérationnelle,63,Université de Montréal,Québec,CANADA,2009,25000,EGP,Engage Grants Program ...,1552,Learning and inference theories
1726,"Benzaazoua, Mostafa",Sciences appliquées,47,Université du Québec en Abitibi-Témiscamingue,Québec,CANADA,2009,23400,EGP,Engage Grants Program ...,1552,Mining and mineral processing
1927,"Bhattacharjee, Subir",Mechanical Engineering,9,University of Alberta,Alberta,CANADA,2009,25000,EGP,Engage Grants Program ...,1555,Rheology and processing
3091,"Butler, Michael",Microbiology,19,University of Manitoba,Manitoba,CANADA,2009,25000,EGP,Engage Grants Program ...,1555,Biochemical engineering
3173,"Callaghan, Jack",Kinesiology,33,University of Waterloo,Ontario,CANADA,2009,24548,EGP,Engage Grants Program ...,1553,Human factors engineering
3478,"Cercone, Nick","Science and Engineering , Faculty of",38,York University,Ontario,CANADA,2009,25000,EGP,Engage Grants Program ...,1553,Software and development
3924,"Cheng, Yufeng(Frank)",Mechanical and Manufacturing Engineering,11,University of Calgary,Alberta,CANADA,2009,20200,EGP,Engage Grants Program ...,1555,"Materials structure, properties and testing"
4491,"Coops, Nicholas",Forest Resources Management,2,University of British Columbia,British Columbia,CANADA,2009,24500,EGP,Engage Grants Program ...,1554,Renewable and non-renewable resources management


In [81]:
title = 'Engage Grants Program Awards over Time'
plotPointSizes = 7 
opacity = 1

nsercPlot(onlyEngageGrantsProgram)

In [80]:
overview('AwardAmount', onlyEngageGrantsProgram)

The mean of AwardAmount is 24689.7360244
The median of AwardAmount is 25000.0
The standard deviation of AwardAmount is 1263.45508332
The total number of awards for your selection is 7531
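
The same four statistics can be computed in a single call with `Series.agg`. The sketch below applies it to the ten award amounts visible in the table above, so the numbers describe only those ten rows and not the full selection of 7531 awards. Note that pandas' `std` is the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Award amounts copied from the ten rows displayed above.
amounts = pd.Series(
    [25000, 25000, 25000, 23400, 25000, 25000, 24548, 25000, 20200, 24500],
    name="AwardAmount",
)

# agg with a list of function names returns one labelled value per statistic.
summary = amounts.agg(["mean", "median", "std", "count"])
print(summary)
```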


## Engage Grants and Related Program Awards
This selection includes Engage Grants, Engage Plus Grants, and Engage Plus Grants for Colleges.

In [82]:
allRelatedPrograms = selectedData

allRelatedPrograms = allRelatedPrograms.loc[(allRelatedPrograms['ProgramID'] == "EGP") | (allRelatedPrograms['ProgramID'] == "EGP2") | (allRelatedPrograms['ProgramID'] == "CEGP2")]

allRelatedPrograms

Unnamed: 0,Name,Department,OrganizationID,Institution,ProvinceEN,CountryEN,FiscalYear,AwardAmount,ProgramID,ProgramNameEN,Committee,ResearchSubjectEN
1003,"Bahrami, Majid","Engineering Science, School of",5,Simon Fraser University,British Columbia,CANADA,2009,25000,EGP,Engage Grants Program ...,1554,Mechanical engineering
1325,"Bassi, Amarjeet",Chemical and Biochemical Engineering,36,University of Western Ontario,Ontario,CANADA,2009,25000,EGP,Engage Grants Program ...,1553,Biochemical engineering
1664,"Bengio, Yoshua",Informatique et recherche opérationnelle,63,Université de Montréal,Québec,CANADA,2009,25000,EGP,Engage Grants Program ...,1552,Learning and inference theories
1726,"Benzaazoua, Mostafa",Sciences appliquées,47,Université du Québec en Abitibi-Témiscamingue,Québec,CANADA,2009,23400,EGP,Engage Grants Program ...,1552,Mining and mineral processing
1927,"Bhattacharjee, Subir",Mechanical Engineering,9,University of Alberta,Alberta,CANADA,2009,25000,EGP,Engage Grants Program ...,1555,Rheology and processing
3091,"Butler, Michael",Microbiology,19,University of Manitoba,Manitoba,CANADA,2009,25000,EGP,Engage Grants Program ...,1555,Biochemical engineering
3173,"Callaghan, Jack",Kinesiology,33,University of Waterloo,Ontario,CANADA,2009,24548,EGP,Engage Grants Program ...,1553,Human factors engineering
3478,"Cercone, Nick","Science and Engineering , Faculty of",38,York University,Ontario,CANADA,2009,25000,EGP,Engage Grants Program ...,1553,Software and development
3924,"Cheng, Yufeng(Frank)",Mechanical and Manufacturing Engineering,11,University of Calgary,Alberta,CANADA,2009,20200,EGP,Engage Grants Program ...,1555,"Materials structure, properties and testing"
4491,"Coops, Nicholas",Forest Resources Management,2,University of British Columbia,British Columbia,CANADA,2009,24500,EGP,Engage Grants Program ...,1554,Renewable and non-renewable resources management
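
The filter above chains three equality tests with `|`. An equivalent and more compact form, sketched below on a made-up frame, is `Series.isin` with a list of program IDs. The `toy` frame, its amounts, and the `"XYZ"` ID are hypothetical. The three Engage-related IDs are the ones used in the cell above.

```python
import pandas as pd

# Made-up rows; "XYZ" stands for any unrelated program.
toy = pd.DataFrame({
    "ProgramID": ["EGP", "EGP2", "CEGP2", "XYZ", "EGP"],
    "AwardAmount": [25000, 12500, 12500, 30000, 24500],
})

related_ids = ["EGP", "EGP2", "CEGP2"]

# Chained comparisons, as written in the notebook cell.
with_or = toy.loc[(toy["ProgramID"] == "EGP")
                  | (toy["ProgramID"] == "EGP2")
                  | (toy["ProgramID"] == "CEGP2")]

# The same selection expressed as a membership test.
with_isin = toy.loc[toy["ProgramID"].isin(related_ids)]
print(with_isin)
```

With `isin`, adding another related program means extending the list rather than lengthening the boolean expression.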


In [72]:
title = 'Engage Grants and Related Program Awards over Time'
plotPointSizes = 7 
opacity = 1

nsercPlot(allRelatedPrograms)

In [83]:
overview('AwardAmount', allRelatedPrograms)

The mean of AwardAmount is 24055.6383517
The median of AwardAmount is 25000.0
The standard deviation of AwardAmount is 3114.87975677
The total number of awards for your selection is 7911
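
Compared with the Engage Grants Program alone, the combined selection has a lower mean and a larger standard deviation, which suggests that award sizes differ between the related programs. One way to check this would be to compute the statistics per `ProgramID`. The sketch below shows the pattern on a made-up frame: the amounts are invented for illustration and say nothing about the actual programs.

```python
import pandas as pd

# Invented amounts, used only to demonstrate the grouping pattern.
toy = pd.DataFrame({
    "ProgramID": ["EGP", "EGP", "EGP2", "CEGP2"],
    "AwardAmount": [25000, 24000, 12000, 10000],
})

# One row of statistics per program; std is NaN for groups with a single award.
by_program = toy.groupby("ProgramID")["AwardAmount"].agg(
    ["mean", "median", "std", "count"]
)
print(by_program)
```

In the notebook, the same call could be applied to `allRelatedPrograms` in place of `toy`.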
