<a class="anchor" id="top"></a>

# Classify Risk - Batch Process
Author: Ainesh Pandey

This notebook is an example of classifying the risk of future NASA projects in batch: it scores multiple projects supplied together in one input file (for example, a CSV). To use it, change the user inputs in the Specify User Inputs section below and run the notebook.

Each project is assigned one of the following risk classes:
- `Risk Class 0`: Technical Execution Risk
- `Risk Class 1`: Managerial Process Risk
- `Risk Class 2`: Operational Cost Risk

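The classifier reports each prediction both as a numeric code and as a description. The small sketch below mirrors the list above as a plain Python dictionary. The names `RISK_CLASSES` and `describe_risk` are hypothetical and are not part of `scripts.classify_risk`.

```python
# Hypothetical lookup table mirroring the risk classes listed above
# (not part of scripts.classify_risk).
RISK_CLASSES = {
    0: "Technical Execution Risk",
    1: "Managerial Process Risk",
    2: "Operational Cost Risk",
}


def describe_risk(code: int) -> str:
    """Return the description for a risk class code, rejecting unknown codes."""
    if code not in RISK_CLASSES:
        raise ValueError(f"unknown risk class code: {code!r}")
    return RISK_CLASSES[code]


print(describe_risk(2))  # Operational Cost Risk
```

With pandas, `df['Risk Class Code'].map(RISK_CLASSES)` would apply the same lookup to a whole column.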
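```python
import pandas as pd
from IPython.display import display  # display() is also available by default inside Jupyter
from scripts.classify_risk import ClassifyRisk

# suppress library warnings to keep the notebook output readable
import warnings
warnings.filterwarnings("ignore")
```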

## Specify User Inputs

Change the values in the following code cell if necessary. Every variable already has a default value, along with a comment explaining how to change it as needed.

```python
# path to the input file containing the projects to classify
file_location = '../data/sample_data/SampleProjects_original.pkl'

# model used for classification; the default, 'ensemble', is the best model developed in ModelRiskClass.ipynb
# other options are: 'lm'  for the Logistic Regression model
#                    'rfc' for the Random Forest Classifier
#                    'lgb' for the LightGBM Classifier
#                    'knn' for the KNN Classifier
#                    'gnb' for the Gaussian Naive Bayes Classifier
model_type = 'ensemble'

# if the column names for the Lesson ID, Title, and Abstract differ in the input file, change them here
id_col = 'Lesson ID'
title_col = 'Title'
abstract_col = 'Abstract'

# to save the results, set this to the path of the new CSV file
# otherwise, leave it as an empty string ''
new_file_loc = '../data/sample_data/SampleProjects_predictions.csv'
```

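A typo in these inputs usually surfaces only as an error deep inside the classification step. A minimal pre-flight check is sketched below. `validate_inputs` is hypothetical and not part of the project's scripts; the set of model names simply copies the options listed in the comments above, and requiring a `.csv` output path is an assumption based on the comment about saving a new CSV.

```python
# Hypothetical pre-flight check for the user inputs; not part of scripts.classify_risk.
# The model names below are copied from the options listed in the inputs cell.
VALID_MODELS = {"ensemble", "lm", "rfc", "lgb", "knn", "gnb"}


def validate_inputs(model_type, columns, id_col, title_col, abstract_col, new_file_loc):
    """Return a list of problems found in the user inputs (empty if none)."""
    problems = []
    if model_type not in VALID_MODELS:
        problems.append(f"unknown model_type {model_type!r}; expected one of {sorted(VALID_MODELS)}")
    # the three configured column names must all exist in the input data
    missing = [c for c in (id_col, title_col, abstract_col) if c not in columns]
    if missing:
        problems.append(f"columns not found in input file: {missing}")
    # ASSUMPTION: an empty string means "don't save"; anything else should be a .csv path
    if new_file_loc and not new_file_loc.endswith(".csv"):
        problems.append(f"new_file_loc should end in .csv, got {new_file_loc!r}")
    return problems


print(validate_inputs("ensemble", ["Lesson ID", "Title", "Abstract"],
                      "Lesson ID", "Title", "Abstract", ""))  # []
for problem in validate_inputs("xgb", ["Lesson ID", "Title"],
                               "Lesson ID", "Title", "Abstract", "out.txt"):
    print(problem)
```

In the notebook, `columns` would be `df.columns` once the input file has been loaded; the `in` test works directly on a pandas `Index`.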

## Classify Using the Script

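```python
# load the projects: pickled DataFrames (.pkl) via read_pickle, anything else as CSV
if file_location.endswith('.pkl'):
    df = pd.read_pickle(file_location)
else:
    df = pd.read_csv(file_location)

# score every project with the chosen model
df_classified = ClassifyRisk(df, model_type=model_type).batch_classify(id_col, title_col, abstract_col)

# save the results only if an output path was specified
if new_file_loc:
    df_classified.to_csv(new_file_loc, index=False)

display(df_classified.shape)
df_classified.head(10)
```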

(10, 6)

| | Lesson ID | Title | Abstract | Risk Class Code | Risk Class Description | Risk Class Probability |
|---|---|---|---|---|---|---|
| 0 | 1196 | Test as You Fly, Fly as You Test, and Demonstr... | Mars Polar Lander had deficiencies in complian... | 2 | Operational Cost Risk | 0.850275 |
| 1 | 1443 | Accident Investigations/Information Technology... | More IT pre-planning would have enabled a quic... | 1 | Managerial Process Risk | 0.885726 |
| 2 | 654 | Assessment and Control of Electrical Charges | | 2 | Operational Cost Risk | 0.696342 |
| 3 | 2045 | Failure of Pyrotechnic Operated Valves with Du... | Four spacecraft propulsion system pyrovalve no... | 2 | Operational Cost Risk | 0.739755 |
| 4 | 1075 | Space Shuttle Program/Logistics/Workforce | | 1 | Managerial Process Risk | 0.861898 |
| 5 | 511 | Placement of Aluminum Oxide Grit on Tape Aft o... | | 0 | Technical Execution Risk | 0.487202 |
| 6 | 262 | Concurrent Real-Time Operations and Advanced P... | The magnitude of both the real-time cruise act... | 2 | Operational Cost Risk | 0.843077 |
| 7 | 1557 | Management Reviews, Reporting and Technical Pu... | A. Reviews: General feedback received from all... | 1 | Managerial Process Risk | 0.982099 |
| 8 | 430 | Permeability, Swelling and Solvent-Stress-Crac... | During fabrication of the TIROS-N Microwave So... | 0 | Technical Execution Risk | 0.95089 |
| 9 | 1621 | Protection of Rocket Chamber Pressure Transduc... | The Dryden Aerospike Rocket Test Director's Di... | 0 | Technical Execution Risk | 0.834837 |

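Each row in the output carries a single `Risk Class Code` and a `Risk Class Probability`. The notebook does not show how the `'ensemble'` model produces these. One common approach, assumed here rather than taken from ModelRiskClass.ipynb, is soft voting: average the per-class probabilities of the individual classifiers, take the class with the highest mean as the code, and report that mean as the probability. The model names and probabilities in the sketch are made-up inputs, not real model outputs.

```python
# Soft-voting sketch: combine per-class probabilities from several classifiers.
# ASSUMPTION: this illustrates a common ensembling technique, not necessarily
# how the 'ensemble' model in ModelRiskClass.ipynb is built.
from typing import Dict, List, Tuple


def soft_vote(model_probs: Dict[str, List[float]]) -> Tuple[int, float]:
    """Average class probabilities across models; return (class code, mean probability)."""
    if not model_probs:
        raise ValueError("need at least one model's probabilities")
    n_models = len(model_probs)
    n_classes = len(next(iter(model_probs.values())))
    # mean probability of each class across all models
    mean_probs = [
        sum(probs[k] for probs in model_probs.values()) / n_models
        for k in range(n_classes)
    ]
    # the predicted code is the class with the highest mean probability
    best = max(range(n_classes), key=lambda k: mean_probs[k])
    return best, mean_probs[best]


# illustrative probabilities for classes 0, 1, 2 from three hypothetical models
example = {
    "lm": [0.2, 0.1, 0.7],
    "rfc": [0.1, 0.3, 0.6],
    "gnb": [0.3, 0.2, 0.5],
}
code, prob = soft_vote(example)
print(code, round(prob, 3))  # 2 0.6
```

Under this reading, a low top probability, such as the 0.487202 reported for Lesson ID 511 above, means that no single class clearly dominated for that project.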

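Once `df_classified` exists, ordinary pandas operations are enough to summarize it. The sketch below rebuilds the ID, code, and probability columns from the ten rows shown above into a separate `sample` DataFrame (in the notebook, use `df_classified` directly), counts the projects per risk class, and flags low-confidence predictions for manual review. The 0.5 cutoff is an arbitrary, illustrative choice.

```python
import pandas as pd

# the Lesson ID, code and probability columns from the ten rows shown above
sample = pd.DataFrame({
    "Lesson ID": [1196, 1443, 654, 2045, 1075, 511, 262, 1557, 430, 1621],
    "Risk Class Code": [2, 1, 2, 2, 1, 0, 2, 1, 0, 0],
    "Risk Class Probability": [0.850275, 0.885726, 0.696342, 0.739755, 0.861898,
                               0.487202, 0.843077, 0.982099, 0.95089, 0.834837],
})

# number of projects assigned to each risk class
class_counts = sample["Risk Class Code"].value_counts().sort_index()
print(class_counts)

# ILLUSTRATIVE threshold: predictions below it are flagged for manual review
REVIEW_THRESHOLD = 0.5
needs_review = sample[sample["Risk Class Probability"] < REVIEW_THRESHOLD]
print(needs_review["Lesson ID"].tolist())  # [511]
```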
[Back to Top](#top)