
<img src="http://www.nserc-crsng.gc.ca/_gui/wmms.gif" alt="Canada logo" align="right">

<br>

<img src="http://www.triumf.ca/sites/default/files/styles/gallery_large/public/images/nserc_crsng.gif?itok=H7AhTN_F" alt="NSERC logo" align="right" width = 90>



# Exploring NSERC Awards Data


Canada's [Open Government Portal](http://open.canada.ca/en) includes [NSERC Awards Data](http://open.canada.ca/data/en/dataset/c1b0f627-8c29-427c-ab73-33968ad9176e) from 1995 through 2016.

The awards data (gzip-compressed .csv files, one per award year) were copied to an [Amazon Web Services S3 bucket](http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html).

This open Jupyter notebook lists the "orphaned" researchers: those funded through the 1508 Mathematics and Statistics committee whose listed department is neither a mathematics nor a statistics department. The award files loaded below cover 2014 through 2016.

> **Acknowledgement:** I thank [Ian Allison](https://github.com/ianabc) and [James Colliander](http://colliand.com) of the [Pacific Institute for the Mathematical Sciences](http://www.pims.math.ca/) for building the [JupyterHub service](http://syzygy.ca) and for help with this notebook. -- I. Heisz

In [1]:
import numpy as np
import pandas as pd
import sys

startYear = 2014
endYear   = 2017  # Exclusive upper bound: 2017 loads the 2014, 2015 and 2016 files. Requests for later years returned "HTTP Error 403: Forbidden".

frames = []
for year in range(startYear, endYear):
    file = 'https://s3.ca-central-1.amazonaws.com/open-data-ro/NSERC/NSERC_GRT_FYR' + str(year) + '_AWARD.csv.gz'
    frames.append(pd.read_csv(file,
                              compression='gzip',
                              usecols = [1, 2, 3, 4, 5, 7, 11, 12, 13, 17, 28],
                              encoding='latin-1'
                             )
                 )
    print(year)

## Combine the yearly tables. DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
df = pd.concat(frames)

## Rename columns for better readability.
df.columns = ['Name', 'Department', 'OrganizationID',
                 'Institution', 'ProvinceEN', 'CountryEN',
                 'AwardAmount', 'ProgramID',
                 'ProgramNameEN', 'Committee', 'ResearchSubjectEN']

## Strip out any leading or trailing whitespace in the ProgramID column
df['ProgramID'] = df['ProgramID'].str.strip()

2014
2015
2016
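
A request for a year whose file is not available in the bucket raises an `HTTPError`, which stops the whole loading loop. As a minimal sketch (not the notebook's own code), the loop could record such years and carry on. The helper name `load_award_years` and its `read_one` callback are hypothetical; in this notebook, `read_one` would wrap the `pd.read_csv` call for a single year.

```python
from urllib.error import HTTPError

import pandas as pd


def load_award_years(years, read_one):
    """Concatenate yearly tables, skipping years that raise HTTPError.

    `read_one` is a hypothetical callback: it takes a year and
    returns that year's awards as a DataFrame.
    """
    frames = []
    skipped = []
    for year in years:
        try:
            frames.append(read_one(year))
        except HTTPError:
            # The file for this year could not be fetched; remember it and move on.
            skipped.append(year)
    combined = pd.concat(frames) if frames else pd.DataFrame()
    return combined, skipped
```

Returning the skipped years alongside the data keeps the gap visible instead of silently shortening the date range.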

In [14]:
selectedData = df
selectedData = selectedData.loc[(selectedData['Committee'] == 1508)]

In [15]:
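# Remove awards held in mathematics departments.
# str.contains is a case-sensitive substring match, so 'Math' does not match a lowercase 'math'.
subject = 'Math'
selectedData = selectedData[selectedData['Department'].str.contains(subject)==False]

# Remove awards held in statistics departments (same case-sensitive match, so 'statistics' is kept).
subject = 'Stat'
selectedData = selectedData[selectedData['Department'].str.contains(subject)==False]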
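
Because the match above is case-sensitive, a department recorded in lowercase survives the filter. A minimal sketch of a case-insensitive variant follows. The function name `drop_matching_departments` and the toy rows are hypothetical and are not taken from the NSERC files. Note one behavioural difference: with `na=False`, rows with a missing `Department` are kept, whereas the `== False` comparison drops them.

```python
import pandas as pd


def drop_matching_departments(awards, patterns):
    """Return the rows whose Department matches none of `patterns`, ignoring case."""
    combined = '|'.join(patterns)  # e.g. 'math|stat', used as a regular expression
    is_match = awards['Department'].str.contains(combined, case=False, na=False)
    return awards[~is_match]


# Toy data for illustration only.
toy = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                    'Department': ['Mathematics', 'statistics', 'Psychology', None]})
kept = drop_matching_departments(toy, ['math', 'stat'])
print(kept['Name'].tolist())
```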

In [16]:
selectedData = selectedData.drop('Committee', axis=1)
selectedData

Unnamed: 0,Name,Department,OrganizationID,Institution,ProvinceEN,CountryEN,AwardAmount,ProgramID,ProgramNameEN,ResearchSubjectEN
47,"Ramsay, James",Psychology,61,McGill University,Québec,CANADA,35000,RGPIN,Discovery Grants Program - Individual,Statistics and probability
698,"Whittington, Stuart",Chemistry (St. George Campus),31,University of Toronto,Ontario,CANADA,15000,RGPIN,Discovery Grants Program - Individual,Statistical mechanics and thermodynamics
742,"Heunis, Andrew",Electrical & Computer Engineering,33,University of Waterloo,Ontario,CANADA,20000,RGPIN,Discovery Grants Program - Individual,Applied probability
882,"Rolfsen, Dale","Science, Faculty of",2,University of British Columbia,British Columbia,CANADA,21000,RGPIN,Discovery Grants Program - Individual,Algebra
902,"Jackson, David",Combinatorics and Optimization,33,University of Waterloo,Ontario,CANADA,18000,RGPIN,Discovery Grants Program - Individual,Combinatorics
950,"Donner, Allan",Epidemiology and Biostatistics,36,University of Western Ontario,Ontario,CANADA,16000,RGPIN,Discovery Grants Program - Individual,Biostatistics
951,"Dufour, JeanMarie",Economics,61,McGill University,Québec,CANADA,28000,RGPIN,Discovery Grants Program - Individual,Statistics and probability
970,"Delfour, Michel","Recherches mathématiques, Centre de",63,Université de Montréal,Québec,CANADA,25000,RGPIN,Discovery Grants Program - Individual,Optimisation and optimal control theory
972,"Walter, Stephen",Clinical Epidemiology and Biostatistics,27,McMaster University,Ontario,CANADA,16000,RGPIN,Discovery Grants Program - Individual,Biostatistics
996,"Goulden, Ian",Combinatorics and Optimization,33,University of Waterloo,Ontario,CANADA,34000,RGPIN,Discovery Grants Program - Individual,Combinatorics


## Institution Orphans

In [47]:
pims_sites = ['SFU', 'UA', 'UBC', 'UC', 'UL', 'UM', 'UR', 'US', 'UV']

In [48]:
org_id = {'SFU': 5,
          'UA': 9,
          'UBC': 2,
          'UC': 11,
          'UL': 12,
          'UM': 19,
          'UR': 17,
          'US': 16,
          'UV': 7}

In [57]:
## Localize to a single institution (here UC, the University of Calgary)
df = selectedData.loc[(selectedData['OrganizationID'] == org_id['UC'])]
df.drop(['ProvinceEN', 'CountryEN', 'OrganizationID', 'Institution'], axis = 1)

Unnamed: 0,Name,Department,AwardAmount,ProgramID,ProgramNameEN,ResearchSubjectEN
5129,"Deardon, Rob","Veterinary Medicine, Faculty of - Veterinary M...",25000,RGPIN,Discovery Grants Program - Individual,Biostatistics
12942,"Li, Haocheng",Oncology - Oncology,13000,RGPIN,Discovery Grants Program - Individual,Nonparametric inference
5205,"Deardon, Rob","Veterinary Medicine, Faculty of - Veterinary M...",25000,RGPIN,Discovery Grants Program - Individual,Biostatistics
13184,"Li, Haocheng",Oncology - Oncology,13000,RGPIN,Discovery Grants Program - Individual,Nonparametric inference
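
The table above lists each researcher more than once, presumably once per award year loaded, since the columns that were read do not include a year to tell the rows apart. A minimal sketch of collapsing the repeats with `drop_duplicates` follows. The toy rows only mimic the shape of the output above, and the names and departments in them are hypothetical.

```python
import pandas as pd

# Toy rows for illustration only; the index mimics the repeated per-file row numbers.
toy = pd.DataFrame(
    {'Name': ['Researcher A', 'Researcher B', 'Researcher A', 'Researcher B'],
     'Department': ['Dept X', 'Dept Y', 'Dept X', 'Dept Y'],
     'AwardAmount': [25000, 13000, 25000, 13000]},
    index=[5129, 12942, 5205, 13184],
)

# Keep the first row for each (Name, Department) pair and discard later repeats.
unique_awards = toy.drop_duplicates(subset=['Name', 'Department'])
print(unique_awards)
```

Deduplicating on `Name` and `Department` rather than on every column keeps one row per person even if the award amount changes from year to year; whether that is the desired behaviour depends on the question being asked.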


In [54]:
## List the records of orphans for each PIMS site
for inst in pims_sites:
    orph_inst = selectedData.loc[(selectedData['OrganizationID'] == org_id[inst])]
    # drop() returns a new DataFrame; the result is not assigned, so display() shows every column.
    orph_inst.drop(['ProvinceEN', 'CountryEN', 'OrganizationID', 'Institution'], axis = 1)
    display(orph_inst)

Unnamed: 0,Name,Department,OrganizationID,Institution,ProvinceEN,CountryEN,AwardAmount,ProgramID,ProgramNameEN,ResearchSubjectEN
7571,"Gencay, Ramazan",Economics,5,Simon Fraser University,British Columbia,CANADA,14000,RGPIN,Discovery Grants Program - Individual,Nonparametric inference
7614,"Gencay, Ramazan",Economics,5,Simon Fraser University,British Columbia,CANADA,14000,RGPIN,Discovery Grants Program - Individual,Nonparametric inference
14531,"McCandless, Lawrence","Health Sciences, Faculty of - Health Sciences,...",5,Simon Fraser University,British Columbia,CANADA,17000,RGPIN,Discovery Grants Program - Individual,Biostatistics
7778,"Gencay, Ramazan",Economics,5,Simon Fraser University,British Columbia,CANADA,14000,RGPIN,Discovery Grants Program - Individual,Nonparametric inference
14786,"McCandless, Lawrence","Health Sciences, Faculty of - Health Sciences,...",5,Simon Fraser University,British Columbia,CANADA,17000,RGPIN,Discovery Grants Program - Individual,Biostatistics


Unnamed: 0,Name,Department,OrganizationID,Institution,ProvinceEN,CountryEN,AwardAmount,ProgramID,ProgramNameEN,ResearchSubjectEN
3731,"Rosychuk, Rhonda",Pediatrics,9,University of Alberta,Alberta,CANADA,12000,RGPIN,Discovery Grants Program - Individual,Statistics and probability
4765,"Safouhi, Hassan",Campus Saint-Jean,9,University of Alberta,Alberta,CANADA,15000,RGPIN,Discovery Grants Program - Individual,Numerical analysis
5615,"Heo, Giseon",Dentistry,9,University of Alberta,Alberta,CANADA,12000,RGPIN,Discovery Grants Program - Individual,Applied statistics
7253,"Tcaciuc, Adi",Arts and Science Division,9,University of Alberta,Alberta,CANADA,5838,RGPIN,Discovery Grants Program - Individual,Functional analysis and operator theory
9248,"Heo, Giseon",Dentistry,9,University of Alberta,Alberta,CANADA,12000,RGPIN,Discovery Grants Program - Individual,Applied statistics
18587,"Rosychuk, Rhonda",Pediatrics,9,University of Alberta,Alberta,CANADA,12000,RGPIN,Discovery Grants Program - Individual,Statistics and probability
18875,"Safouhi, Hassan",Campus Saint-Jean,9,University of Alberta,Alberta,CANADA,15000,RGPIN,Discovery Grants Program - Individual,Numerical analysis
1611,"Belhamadia, Youssef",Biomedical Engineering - Biomedical Engineering,9,University of Alberta,Alberta,CANADA,11000,RGPIN,Discovery Grants Program - Individual,Mathematical biology andphysiology
9317,"Heo, Giseon",Dentistry,9,University of Alberta,Alberta,CANADA,12000,RGPIN,Discovery Grants Program - Individual,Applied statistics
18781,"Rosychuk, Rhonda",Pediatrics,9,University of Alberta,Alberta,CANADA,12000,RGPIN,Discovery Grants Program - Individual,Statistics and probability


Unnamed: 0,Name,Department,OrganizationID,Institution,ProvinceEN,CountryEN,AwardAmount,ProgramID,ProgramNameEN,ResearchSubjectEN
882,"Rolfsen, Dale","Science, Faculty of",2,University of British Columbia,British Columbia,CANADA,21000,RGPIN,Discovery Grants Program - Individual,Algebra
4200,"MacNab, Ying","Population and Public Health, School of",2,University of British Columbia,British Columbia,CANADA,15000,RGPIN,Discovery Grants Program - Individual,Biostatistics
3703,"Chen, Jiahua",statistics,2,University of British Columbia,British Columbia,CANADA,38000,RGPIN,Discovery Grants Program - Individual,Nonparametric inference
10737,"Kasahara, Hiroyuki",No Department/Division,2,University of British Columbia,British Columbia,CANADA,11000,RGPIN,Discovery Grants Program - Individual,Statistical theory
13605,"MacNab, Ying","Population and Public Health, School of",2,University of British Columbia,British Columbia,CANADA,15000,RGPIN,Discovery Grants Program - Individual,Biostatistics
18491,"Rolfsen, Dale","Science, Faculty of",2,University of British Columbia,British Columbia,CANADA,21000,RGPIN,Discovery Grants Program - Individual,Algebra
6862,"Fisher, Adlai",Sauder School of Business - Sauder School of B...,2,University of British Columbia,British Columbia,CANADA,14000,RGPIN,Discovery Grants Program - Individual,Statistics and probability
10838,"Kasahara, Hiroyuki",Economics,2,University of British Columbia,British Columbia,CANADA,11000,RGPIN,Discovery Grants Program - Individual,Statistical theory
13361,"Loeppky, Jason",Okanagan - Irving K Barber School of Arts and ...,2,University of British Columbia,British Columbia,CANADA,20000,RGPIN,Discovery Grants Program - Individual,Applied statistics
13800,"MacNab, Ying","Population and Public Health, School of",2,University of British Columbia,British Columbia,CANADA,15000,RGPIN,Discovery Grants Program - Individual,Biostatistics


Unnamed: 0,Name,Department,OrganizationID,Institution,ProvinceEN,CountryEN,AwardAmount,ProgramID,ProgramNameEN,ResearchSubjectEN
5129,"Deardon, Rob","Veterinary Medicine, Faculty of - Veterinary M...",11,University of Calgary,Alberta,CANADA,25000,RGPIN,Discovery Grants Program - Individual,Biostatistics
12942,"Li, Haocheng",Oncology - Oncology,11,University of Calgary,Alberta,CANADA,13000,RGPIN,Discovery Grants Program - Individual,Nonparametric inference
5205,"Deardon, Rob","Veterinary Medicine, Faculty of - Veterinary M...",11,University of Calgary,Alberta,CANADA,25000,RGPIN,Discovery Grants Program - Individual,Biostatistics
13184,"Li, Haocheng",Oncology - Oncology,11,University of Calgary,Alberta,CANADA,13000,RGPIN,Discovery Grants Program - Individual,Nonparametric inference


Unnamed: 0,Name,Department,OrganizationID,Institution,ProvinceEN,CountryEN,AwardAmount,ProgramID,ProgramNameEN,ResearchSubjectEN


Unnamed: 0,Name,Department,OrganizationID,Institution,ProvinceEN,CountryEN,AwardAmount,ProgramID,ProgramNameEN,ResearchSubjectEN
3548,"Pizzi, Nicolino",Computer Science,19,University of Manitoba,Manitoba,CANADA,12000,RGPIN,Discovery Grants Program - Individual,Mathematical modelling
8735,"Hao, Xuemiao",Warren Centre for Actuarial Studies and Research,19,University of Manitoba,Manitoba,CANADA,16700,RGPIN,Discovery Grants Program - Individual,Applied probability
9381,"Frank, Julieta",Agribusiness & Agricultural Economics,19,University of Manitoba,Manitoba,CANADA,17000,RGPIN,Discovery Grants Program - Individual,Time series analysis
9392,"Torabi, Mahmoud",Community Health Sciences,19,University of Manitoba,Manitoba,CANADA,17000,RGPIN,Discovery Grants Program - Individual,Survey methodology
7098,"Frank, Julieta",Agribusiness & Agricultural Economics,19,University of Manitoba,Manitoba,CANADA,17000,RGPIN,Discovery Grants Program - Individual,Time series analysis
17276,"Pizzi, Nicolino",Computer Science,19,University of Manitoba,Manitoba,CANADA,12000,RGPIN,Discovery Grants Program - Individual,Mathematical modelling
21541,"Torabi, Mahmoud",Community Health Sciences,19,University of Manitoba,Manitoba,CANADA,17000,RGPIN,Discovery Grants Program - Individual,Survey methodology
17457,"Pizzi, Nicolino",Computer Science,19,University of Manitoba,Manitoba,CANADA,12000,RGPIN,Discovery Grants Program - Individual,Mathematical modelling
21769,"Torabi, Mahmoud",Community Health Sciences,19,University of Manitoba,Manitoba,CANADA,17000,RGPIN,Discovery Grants Program - Individual,Survey methodology
24366,"Zhou, Rui",Warren Centre for Actuarial Studies and Resear...,19,University of Manitoba,Manitoba,CANADA,13000,RGPIN,Discovery Grants Program - Individual,Applied statistics


Unnamed: 0,Name,Department,OrganizationID,Institution,ProvinceEN,CountryEN,AwardAmount,ProgramID,ProgramNameEN,ResearchSubjectEN


Unnamed: 0,Name,Department,OrganizationID,Institution,ProvinceEN,CountryEN,AwardAmount,ProgramID,ProgramNameEN,ResearchSubjectEN
13228,"Feng, CindyXin","Public Health, School of",16,University of Saskatchewan,Saskatchewan,CANADA,15000,RGPIN,Discovery Grants Program - Individual,Statistics and probability
6712,"Feng, CindyXin","Public Health, School of",16,University of Saskatchewan,Saskatchewan,CANADA,15000,RGPIN,Discovery Grants Program - Individual,Statistics and probability
6704,"Feng, CindyXin","Public Health, School of",16,University of Saskatchewan,Saskatchewan,CANADA,15000,RGPIN,Discovery Grants Program - Individual,Statistics and probability
6810,"Feng, CindyXin","Public Health, School of",16,University of Saskatchewan,Saskatchewan,CANADA,15000,RGPIN,Discovery Grants Program - Individual,Statistics and probability


Unnamed: 0,Name,Department,OrganizationID,Institution,ProvinceEN,CountryEN,AwardAmount,ProgramID,ProgramNameEN,ResearchSubjectEN
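
The per-site tables above still carry the province, country, organization and institution columns, because `DataFrame.drop` returns a new table rather than modifying the one it is called on. A minimal sketch of keeping the trimmed result follows; the toy row and the names `hidden` and `trimmed` are hypothetical.

```python
import pandas as pd

# Columns that are constant within one institution and can be hidden.
hidden = ['ProvinceEN', 'CountryEN', 'OrganizationID', 'Institution']

# One toy row for illustration only.
toy = pd.DataFrame({'Name': ['Researcher A'],
                    'Department': ['Dept X'],
                    'OrganizationID': [5],
                    'Institution': ['Some University'],
                    'ProvinceEN': ['Some Province'],
                    'CountryEN': ['CANADA'],
                    'AwardAmount': [14000]})

# Assign the result: `toy` itself is left unchanged.
trimmed = toy.drop(columns=hidden)
print(list(trimmed.columns))
```

Inside the loop, passing the assigned result to `display` would show only the remaining columns.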


In [56]:
## List the unique names of orphans for each PIMS site
for inst in pims_sites:
    orph_inst = selectedData.loc[(selectedData['OrganizationID'] == org_id[inst])]
    print(inst,orph_inst['Name'].unique())

SFU ['Gencay, Ramazan' 'McCandless, Lawrence']
UA ['Rosychuk, Rhonda' 'Safouhi, Hassan' 'Heo, Giseon' 'Tcaciuc, Adi'
 'Belhamadia, Youssef']
UBC ['Rolfsen, Dale' 'MacNab, Ying' 'Chen, Jiahua' 'Kasahara, Hiroyuki'
 'Fisher, Adlai' 'Loeppky, Jason' 'Savalei, Victoria']
UC ['Deardon, Rob' 'Li, Haocheng']
UL []
UM ['Pizzi, Nicolino' 'Hao, Xuemiao' 'Frank, Julieta' 'Torabi, Mahmoud'
 'Zhou, Rui']
UR []
US ['Feng, CindyXin']
UV []
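
The same per-site summary can be produced without an explicit loop. The sketch below inverts a site-to-ID mapping, tags each row with its site code, and groups by that code. It uses a subset of the `org_id` mapping from this notebook and toy rows; the names `site_of`, `pims` and `counts` are hypothetical.

```python
import pandas as pd

# Subset of the notebook's site -> OrganizationID mapping.
org_id = {'SFU': 5, 'UA': 9, 'UBC': 2}
# Inverted mapping: OrganizationID -> site code.
site_of = {number: site for site, number in org_id.items()}

# Toy rows for illustration only; OrganizationID 61 is not a PIMS site here.
toy = pd.DataFrame({'Name': ['A', 'A', 'B', 'C'],
                    'OrganizationID': [5, 5, 9, 61]})

# Keep only rows from the listed sites and label them with the site code.
pims = toy[toy['OrganizationID'].isin(list(site_of))].copy()
pims['Site'] = pims['OrganizationID'].map(site_of)

# Unique names and their count per site.
names_by_site = pims.groupby('Site')['Name'].unique()
counts = pims.groupby('Site')['Name'].nunique()
print(counts)
```

Unlike the loop, `groupby` omits sites with no matching rows, so a site with no orphans does not appear as an empty entry.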
