
<img src="http://www.nserc-crsng.gc.ca/_gui/wmms.gif" alt="Canada logo" align="right">

<br>

<img src="http://www.triumf.ca/sites/default/files/styles/gallery_large/public/images/nserc_crsng.gif?itok=H7AhTN_F" alt="NSERC logo" align="right" width = 90>



# Exploring NSERC Awards Data


Canada's [Open Government Portal](http://open.canada.ca/en) includes [NSERC Awards Data](http://open.canada.ca/data/en/dataset/c1b0f627-8c29-427c-ab73-33968ad9176e) from 1995 through 2016.

The awards data (in .csv format) were copied to an [Amazon Web Services S3 bucket](http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html). <br>

This open Jupyter notebook lists the "orphaned" researchers: those funded in 2016 by NSERC committee 1508 (Mathematics and Statistics) whose department name contains neither "Math" nor "Stat", so they are not counted as members of a mathematics or statistics department.
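
The filter described above can be tried on a few made-up rows before running it on the full data set. The following is a minimal sketch rather than the notebook's own code: the `toy` table and the `find_orphans` helper are hypothetical, and only the column names `Department` and `Committee`, the committee number 1508, and the substrings `Math` and `Stat` come from the notebook.

```python
import pandas as pd

# Hypothetical rows for illustration only; these are not real award records.
toy = pd.DataFrame({
    'Name':       ['A', 'B', 'C', 'D'],
    'Department': ['Mathematics', 'Statistics', 'Biostatistics', 'Economics'],
    'Committee':  [1508, 1508, 1508, 1505],
})

def find_orphans(df, committee=1508, patterns=('Math', 'Stat')):
    """Rows funded by `committee` whose Department contains none of `patterns`."""
    out = df[df['Committee'] == committee]
    for p in patterns:
        # Case-sensitive literal substring match; a missing Department
        # counts as a match (na=True), so such rows are removed as well.
        out = out[~out['Department'].str.contains(p, regex=False, na=True)]
    return out

orphans = find_orphans(toy)
print(orphans['Name'].tolist())  # prints ['C']
```

Because the match is case-sensitive, a department such as "Biostatistics" does not match `Stat` and is kept as an orphan, which is consistent with the biostatistics departments that appear in the results.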

> **Acknowledgement:** I thank [Ian Allison](https://github.com/ianabc) and [James Colliander](http://colliand.com) of the [Pacific Institute for the Mathematical Sciences](http://www.pims.math.ca/) for building the [JupyterHub service](http://syzygy.ca) and for help with this notebook. -- I. Heisz

In [1]:
import numpy as np
import pandas as pd
import sys

startYear = 2016
endYear   = 2017  # Exclusive upper bound: range(2016, 2017) loads only the 2016 data.

frames = []
for year in range(startYear, endYear):
    file = 'https://s3.ca-central-1.amazonaws.com/open-data-ro/NSERC/NSERC_GRT_FYR' + str(year) + '_AWARD.csv.gz'
    # Read only the columns that are used below.
    frames.append(pd.read_csv(file,
                              compression='gzip',
                              usecols = [1, 2, 3, 4, 5, 7, 11, 12, 13, 17, 28],
                              encoding='latin-1'
                             )
                 )
    print(year)

# DataFrame.append was removed in pandas 2.0, so the yearly frames are combined with pd.concat.
df = pd.concat(frames)
 
## Rename columns for better readability.
df.columns = ['Name', 'Department', 'OrganizationID',
              'Institution', 'ProvinceEN', 'CountryEN',
              'AwardAmount', 'ProgramID',
              'ProgramNameEN', 'Committee', 'ResearchSubjectEN']

## Strip out any leading or trailing whitespace in the ProgramID column.
df['ProgramID'] = df['ProgramID'].str.strip()

2016


In [12]:
# keep only the awards handled by committee 1508 (Mathematics and Statistics)
selectedData = df.loc[df['Committee'] == 1508]

In [13]:
# remove people in the math department
# (the match is case-sensitive; na=True treats a missing Department as a match, so those rows are dropped too)
subject = 'Math'
selectedData = selectedData[~selectedData['Department'].str.contains(subject, na=True)]

# remove people in the statistics department
subject = 'Stat'
selectedData = selectedData[~selectedData['Department'].str.contains(subject, na=True)]

In [14]:
selectedData = selectedData.drop('Committee', axis=1)
selectedData

|  | Name | Department | OrganizationID | Institution | ProvinceEN | CountryEN | AwardAmount | ProgramID | ProgramNameEN | ResearchSubjectEN |
|---|---|---|---|---|---|---|---|---|---|---|
| 83 | Abrahamowicz, Michal | Epidemiology, Biostatistics and Occupational H... | 61 | McGill University | Québec | CANADA | 22000 | RGPIN | Discovery Grants Program - Individual | Biostatistics |
| 689 | Anton, Cristina | Arts and Science Division - Arts and Science D... | 3205 | Grant MacEwan University | Alberta | CANADA | 10000 | DDG | Discovery Development Grant | Not available |
| 1101 | Bailey, Robert | Grenfell Campus - Grenfell Campus | 89 | Memorial University of Newfoundland | Newfoundland and Labrador | CANADA | 18000 | RGPIN | Discovery Grants Program - Individual | Combinatorics |
| 1719 | Beltaos, Elaine | Arts and Science Division - Arts and Science D... | 3205 | Grant MacEwan University | Alberta | CANADA | 10000 | DDG | Discovery Development Grant | Not available |
| 1737 | BenAbdallah, Ramzi | Finance (École des sciences de la gestion) | 57 | Université du Québec à Montréal | Québec | CANADA | 17000 | RGPIN | Discovery Grants Program - Individual | Civil engineering |
| 2021 | Beyene, Joseph | Clinical Epidemiology and Biostatistics - Clin... | 27 | McMaster University | Ontario | CANADA | 14000 | RGPIN | Discovery Grants Program - Individual | Biostatistics |
| 2074 | Bickel, David | Biochemistry, Microbiology and Immunology | 28 | University of Ottawa | Ontario | CANADA | 14000 | RGPIN | Discovery Grants Program - Individual | Statistical theory |
| 2335 | Bohun, Christopher | Science, Faculty of | 19222 | University of Ontario Institute of Technology | Ontario | CANADA | 11000 | RGPIN | Discovery Grants Program - Individual | Mathematical modelling |
| 2856 | Briollais, Laurent | Public Health, Dalla Lana School of - Public H... | 31 | University of Toronto | Ontario | CANADA | 11000 | RGPIN | Discovery Grants Program - Individual | Biostatistics |
| 2976 | Brown, Patrick | Public Health, Dalla Lana School of | 31 | University of Toronto | Ontario | CANADA | 12000 | RGPIN | Discovery Grants Program - Individual | Applied statistics |


### UBC Orphans

In [21]:
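# OrganizationID 2 selects the UBC rows
selectedData = selectedData.loc[selectedData['OrganizationID'] == 2]
# display the result without the columns that are now the same for every row
selectedData.drop(['ProvinceEN', 'CountryEN', 'OrganizationID', 'Institution'], axis=1)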

|  | Name | Department | AwardAmount | ProgramID | ProgramNameEN | ResearchSubjectEN |
|---|---|---|---|---|---|---|
| 6962 | Fisher, Adlai | Sauder School of Business - Sauder School of B... | 14000 | RGPIN | Discovery Grants Program - Individual | Statistics and probability |
| 11068 | Kasahara, Hiroyuki | Economics | 11000 | RGPIN | Discovery Grants Program - Individual | Statistical theory |
| 13602 | Loeppky, Jason | Okanagan - Irving K Barber School of Arts and ... | 20000 | RGPIN | Discovery Grants Program - Individual | Applied statistics |
| 14039 | MacNab, Ying | Population and Public Health, School of | 15000 | RGPIN | Discovery Grants Program - Individual | Biostatistics |
| 19131 | Rolfsen, Dale | Science, Faculty of | 21000 | RGPIN | Discovery Grants Program - Individual | Algebra |
| 19786 | Savalei, Victoria | Psychology - Psychology | 14000 | RGPIN | Discovery Grants Program - Individual | Applied statistics |
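
The UBC selection above can in principle be repeated for any institution by changing the `OrganizationID` value. The following is a minimal sketch under that assumption: the `orphans_at` helper and the sample rows are hypothetical, the column names mirror the ones used in this notebook, and the only ID the notebook itself uses is 2.

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the notebook.
sample = pd.DataFrame({
    'Name':           ['A', 'B', 'C'],
    'Department':     ['Economics', 'Psychology', 'Finance'],
    'OrganizationID': [2, 2, 31],
    'Institution':    ['Inst X', 'Inst X', 'Inst Y'],
    'AwardAmount':    [11000, 14000, 12000],
})

def orphans_at(df, organization_id):
    """Rows for one institution, without the columns that are constant for it."""
    rows = df.loc[df['OrganizationID'] == organization_id]
    # errors='ignore' skips any listed column that is absent from df.
    return rows.drop(columns=['ProvinceEN', 'CountryEN', 'OrganizationID', 'Institution'],
                     errors='ignore')

result = orphans_at(sample, 2)
print(result)
```

Returning a new frame instead of overwriting `selectedData` keeps the full list of orphans available, so several institutions can be inspected in the same session.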
