# Donors Choose Project Approval Prediction
The goal of the DonorsChoose competition is to build a model that can accurately predict whether a teacher's project proposal was accepted, based on the data they provided in their application.

## Read in data

In [446]:
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, accuracy_score, precision_score, recall_score, confusion_matrix, classification_report
from sklearn.model_selection import train_test_split, cross_val_score
import numpy as np

In [447]:
pd.options.display.max_columns = 999

In [448]:
raw_data = pd.read_csv('train.csv', sep=',', parse_dates = ['project_submitted_datetime'])

In [449]:
raw_data

Unnamed: 0,id,teacher_id,teacher_prefix,school_state,project_submitted_datetime,project_grade_category,project_subject_categories,project_subject_subcategories,project_title,project_essay_1,project_essay_2,project_essay_3,project_essay_4,project_resource_summary,teacher_number_of_previously_posted_projects,project_is_approved
0,p036502,484aaf11257089a66cfedc9461c6bd0a,Ms.,NV,2016-11-18 14:45:59,Grades PreK-2,Literacy & Language,Literacy,Super Sight Word Centers,Most of my kindergarten students come from low...,I currently have a differentiated sight word c...,,,My students need 6 Ipod Nano's to create and d...,26,1
1,p039565,df72a3ba8089423fa8a94be88060f6ed,Mrs.,GA,2017-04-26 15:57:28,Grades 3-5,"Music & The Arts, Health & Sports","Performing Arts, Team Sports",Keep Calm and Dance On,Our elementary school is a culturally rich sch...,We strive to provide our diverse population of...,,,My students need matching shirts to wear for d...,1,0
2,p233823,a9b876a9252e08a55e3d894150f75ba3,Ms.,UT,2017-01-01 22:57:44,Grades 3-5,"Math & Science, Literacy & Language","Applied Sciences, Literature & Writing",Lets 3Doodle to Learn,Hello;\r\nMy name is Mrs. Brotherton. I teach ...,We are looking to add some 3Doodler to our cla...,,,My students need the 3doodler. We are an SEM s...,5,1
3,p185307,525fdbb6ec7f538a48beebaa0a51b24f,Mr.,NC,2016-08-12 15:42:11,Grades 3-5,Health & Sports,Health & Wellness,"\""Kid Inspired\"" Equipment to Increase Activit...",My students are the greatest students but are ...,"The student's project which is totally \""kid-i...",,,My students need balls and other activity equi...,16,0
4,p013780,a63b5547a7239eae4c1872670848e61a,Mr.,CA,2016-08-06 09:09:11,Grades 6-8,Health & Sports,Health & Wellness,We need clean water for our culinary arts class!,My students are athletes and students who are ...,For some reason in our kitchen the water comes...,,,My students need a water filtration system for...,42,1
5,p063374,403c6783e9286e51ab318fba40f8d729,Mrs.,DE,2016-11-05 10:01:51,Grades PreK-2,"Applied Learning, Literacy & Language","Character Education, Literature & Writing",Need to Reach Our Virtual Mentors!!!,My kids tell me each day that they want to mak...,I started a program called Telementoring in ho...,,,My students need tablets in order to communic...,0,1
6,p103285,4e156c5fb3eea2531601c8736f3751a7,Mrs.,MO,2016-08-31 00:30:43,Grades PreK-2,Health & Sports,Health & Wellness,Active Kindergartners,Kindergarten is the new first grade. My studen...,With balance discs and stools as flexible seat...,,,My students need stability stools and inflatab...,1,1
7,p181781,c71f2ef13b4bc91afac61ca8fd4c0f9f,Mrs.,SC,2016-08-03 13:26:01,Grades PreK-2,"Applied Learning, Literacy & Language","Early Development, Literature & Writing",Fabulous Firsties-Wiggling to Learn!,First graders are fantastic! They are excited ...,First graders love learning! We need 6 wiggle-...,,,My students need wiggle stools to allow them t...,0,1
8,p114989,b580c11b1497a0a67317763b7f03eb27,Ms.,IN,2016-09-13 22:35:57,Grades 6-8,Math & Science,Mathematics,Wobble Chairs Help Fidgety Kids Focus,My seventh graders dream big. They can't wait ...,I have used alternative seating in my classroo...,,,My students need seating that allows the most ...,13,1
9,p191410,2071fb0af994f8f16e7c6ed0f35062a1,Mrs.,IL,2016-09-24 18:38:59,Grades PreK-2,Literacy & Language,Literacy,Snuggle Up With A Good Book,I teach first grade in a small farming town in...,There is nothing better than snuggling up with...,,,My students need 2 youth sized reclining chair...,12,1


In [450]:
raw_data.dtypes

id                                                      object
teacher_id                                              object
teacher_prefix                                          object
school_state                                            object
project_submitted_datetime                      datetime64[ns]
project_grade_category                                  object
project_subject_categories                              object
project_subject_subcategories                           object
project_title                                           object
project_essay_1                                         object
project_essay_2                                         object
project_essay_3                                         object
project_essay_4                                         object
project_resource_summary                                object
teacher_number_of_previously_posted_projects             int64
project_is_approved                                      int64
dtype: object

## Clean Data
We'll impute null values and reduce high cardinality by mapping low-occurrence categories to a single 'other' class.

In [451]:
raw_data[raw_data.isnull().any(axis=1)]

Unnamed: 0,id,teacher_id,teacher_prefix,school_state,project_submitted_datetime,project_grade_category,project_subject_categories,project_subject_subcategories,project_title,project_essay_1,project_essay_2,project_essay_3,project_essay_4,project_resource_summary,teacher_number_of_previously_posted_projects,project_is_approved
0,p036502,484aaf11257089a66cfedc9461c6bd0a,Ms.,NV,2016-11-18 14:45:59,Grades PreK-2,Literacy & Language,Literacy,Super Sight Word Centers,Most of my kindergarten students come from low...,I currently have a differentiated sight word c...,,,My students need 6 Ipod Nano's to create and d...,26,1
1,p039565,df72a3ba8089423fa8a94be88060f6ed,Mrs.,GA,2017-04-26 15:57:28,Grades 3-5,"Music & The Arts, Health & Sports","Performing Arts, Team Sports",Keep Calm and Dance On,Our elementary school is a culturally rich sch...,We strive to provide our diverse population of...,,,My students need matching shirts to wear for d...,1,0
2,p233823,a9b876a9252e08a55e3d894150f75ba3,Ms.,UT,2017-01-01 22:57:44,Grades 3-5,"Math & Science, Literacy & Language","Applied Sciences, Literature & Writing",Lets 3Doodle to Learn,Hello;\r\nMy name is Mrs. Brotherton. I teach ...,We are looking to add some 3Doodler to our cla...,,,My students need the 3doodler. We are an SEM s...,5,1
3,p185307,525fdbb6ec7f538a48beebaa0a51b24f,Mr.,NC,2016-08-12 15:42:11,Grades 3-5,Health & Sports,Health & Wellness,"\""Kid Inspired\"" Equipment to Increase Activit...",My students are the greatest students but are ...,"The student's project which is totally \""kid-i...",,,My students need balls and other activity equi...,16,0
4,p013780,a63b5547a7239eae4c1872670848e61a,Mr.,CA,2016-08-06 09:09:11,Grades 6-8,Health & Sports,Health & Wellness,We need clean water for our culinary arts class!,My students are athletes and students who are ...,For some reason in our kitchen the water comes...,,,My students need a water filtration system for...,42,1
5,p063374,403c6783e9286e51ab318fba40f8d729,Mrs.,DE,2016-11-05 10:01:51,Grades PreK-2,"Applied Learning, Literacy & Language","Character Education, Literature & Writing",Need to Reach Our Virtual Mentors!!!,My kids tell me each day that they want to mak...,I started a program called Telementoring in ho...,,,My students need tablets in order to communic...,0,1
6,p103285,4e156c5fb3eea2531601c8736f3751a7,Mrs.,MO,2016-08-31 00:30:43,Grades PreK-2,Health & Sports,Health & Wellness,Active Kindergartners,Kindergarten is the new first grade. My studen...,With balance discs and stools as flexible seat...,,,My students need stability stools and inflatab...,1,1
7,p181781,c71f2ef13b4bc91afac61ca8fd4c0f9f,Mrs.,SC,2016-08-03 13:26:01,Grades PreK-2,"Applied Learning, Literacy & Language","Early Development, Literature & Writing",Fabulous Firsties-Wiggling to Learn!,First graders are fantastic! They are excited ...,First graders love learning! We need 6 wiggle-...,,,My students need wiggle stools to allow them t...,0,1
8,p114989,b580c11b1497a0a67317763b7f03eb27,Ms.,IN,2016-09-13 22:35:57,Grades 6-8,Math & Science,Mathematics,Wobble Chairs Help Fidgety Kids Focus,My seventh graders dream big. They can't wait ...,I have used alternative seating in my classroo...,,,My students need seating that allows the most ...,13,1
9,p191410,2071fb0af994f8f16e7c6ed0f35062a1,Mrs.,IL,2016-09-24 18:38:59,Grades PreK-2,Literacy & Language,Literacy,Snuggle Up With A Good Book,I teach first grade in a small farming town in...,There is nothing better than snuggling up with...,,,My students need 2 youth sized reclining chair...,12,1


Find the modal (most frequent) class, used to impute missing values in a categorical column.

In [452]:
def modalClass(df, colname):
    counts = df.dropna(subset=[colname])[colname].value_counts()
    return counts.idxmax()
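pandas can also compute this directly. value_counts() already skips missing values, and Series.mode() returns the most frequent value(s). Below is a minimal sketch of the same imputation on a made-up toy column. When a column has a single most frequent value, it fills the gaps the same way modalClass would. The helper name impute_mode is hypothetical.

```python
import numpy as np
import pandas as pd

def impute_mode(series):
    """Hypothetical helper: fill missing values with the most frequent non-missing value."""
    # mode() ignores NaN by default; [0] takes the first mode if there is a tie
    return series.fillna(series.mode()[0])

# Toy data for illustration only
prefixes = pd.Series(["Mrs.", "Ms.", np.nan, "Mrs.", "Mr."])
print(impute_mode(prefixes).tolist())
# ['Mrs.', 'Ms.', 'Mrs.', 'Mrs.', 'Mr.']
```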

Whittle function to reduce cardinality in the training data: categories that occur at most min_count times are replaced with 'other'.

In [453]:
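def whittle_train(df, cols_to_whittle, min_count):
    for col in cols_to_whittle:
        count = df[col].value_counts()
        # Categories seen at most min_count times are collapsed into 'other'
        classes_to_replace = count[count <= min_count].index.tolist()
        if classes_to_replace:
            # Assign back instead of calling replace(inplace=True) on a column selection,
            # which may not modify df under pandas copy-on-write
            df[col] = df[col].replace(classes_to_replace, 'other')
            print("replaced " + col)
    return df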

Whittle function to apply the same whittling to the test data: any test category that was not frequent enough in the training data is replaced with 'other'.

In [454]:
def whittle_test(train_df, test_df, cols_to_whittle, min_count):
    for col in cols_to_whittle:
        # Check the column exists before indexing into it
        if col not in test_df.columns:
            continue
        print(col)
        # Categories frequent enough in the training data to keep
        count = train_df[col].value_counts()
        classes_to_keep = count[count > min_count].index.tolist()
        print(len(classes_to_keep))
        # Test categories that are rare in training or never seen there
        test_classes = test_df[col].value_counts().index.tolist()
        classes_to_replace = list(set(test_classes) - set(classes_to_keep))
        print(len(classes_to_replace))
        if classes_to_replace:
            # Assign back instead of calling replace(inplace=True) on a column selection
            test_df[col] = test_df[col].replace(classes_to_replace, 'other')
            print("replaced " + col)
    return test_df

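The two whittle functions share one rule, which is learned on the training data and applied to any frame. Below is a minimal sketch that packages that rule in a fit/transform style, so the kept categories are computed once. CategoryWhittler is a hypothetical helper, not part of any library. It assumes plain object (string) columns, since assigning a new 'other' value to a pandas category column would fail. It keeps missing values as they are, like the functions above.

```python
import pandas as pd

class CategoryWhittler:
    """Hypothetical helper: learn which categories to keep on training data,
    then map every other (rare or unseen) category to 'other'."""

    def __init__(self, cols, min_count, other="other"):
        self.cols = cols
        self.min_count = min_count
        self.other = other
        self.keep_ = {}

    def fit(self, df):
        for col in self.cols:
            counts = df[col].value_counts()
            # Same rule as whittle_train: keep categories seen more than min_count times
            self.keep_[col] = set(counts[counts > self.min_count].index)
        return self

    def transform(self, df):
        out = df.copy()  # leave the caller's DataFrame untouched
        for col in self.cols:
            # Keep known frequent categories and NaN; everything else becomes 'other'
            mask = out[col].isin(self.keep_[col]) | out[col].isna()
            out[col] = out[col].where(mask, self.other)
        return out

# Toy usage: CA and NY appear 3 times (> 2), VT only once
train = pd.DataFrame({"state": ["CA", "CA", "CA", "NY", "NY", "NY", "VT"]})
test = pd.DataFrame({"state": ["CA", "VT", "WY", "NY"]})
whittler = CategoryWhittler(["state"], min_count=2).fit(train)
print(whittler.transform(train)["state"].tolist())
# ['CA', 'CA', 'CA', 'NY', 'NY', 'NY', 'other']
print(whittler.transform(test)["state"].tolist())
# ['CA', 'other', 'other', 'NY']
```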
In [455]:
cols_to_whittle = ["school_state", "project_subject_categories", "project_grade_category", "project_subject_subcategories"]
# Categories covering at most 1/25 (4%) of the training rows are grouped into 'other'
whittle_min = raw_data.shape[0] / 25

In [456]:
clean_data = raw_data  # an alias, not a copy: in-place changes to clean_data also modify raw_data

In [457]:
clean_data.fillna({"project_essay_3": "exempt"}, inplace=True)
clean_data.fillna({"project_essay_4": "exempt"}, inplace=True)
clean_data.fillna({"teacher_prefix": modalClass(clean_data, 'teacher_prefix')}, inplace=True)
whittle_train(clean_data, cols_to_whittle, whittle_min)

replaced school_state
replaced project_subject_categories
replaced project_subject_subcategories


Unnamed: 0,id,teacher_id,teacher_prefix,school_state,project_submitted_datetime,project_grade_category,project_subject_categories,project_subject_subcategories,project_title,project_essay_1,project_essay_2,project_essay_3,project_essay_4,project_resource_summary,teacher_number_of_previously_posted_projects,project_is_approved
0,p036502,484aaf11257089a66cfedc9461c6bd0a,Ms.,other,2016-11-18 14:45:59,Grades PreK-2,Literacy & Language,Literacy,Super Sight Word Centers,Most of my kindergarten students come from low...,I currently have a differentiated sight word c...,exempt,exempt,My students need 6 Ipod Nano's to create and d...,26,1
1,p039565,df72a3ba8089423fa8a94be88060f6ed,Mrs.,other,2017-04-26 15:57:28,Grades 3-5,other,other,Keep Calm and Dance On,Our elementary school is a culturally rich sch...,We strive to provide our diverse population of...,exempt,exempt,My students need matching shirts to wear for d...,1,0
2,p233823,a9b876a9252e08a55e3d894150f75ba3,Ms.,other,2017-01-01 22:57:44,Grades 3-5,other,other,Lets 3Doodle to Learn,Hello;\r\nMy name is Mrs. Brotherton. I teach ...,We are looking to add some 3Doodler to our cla...,exempt,exempt,My students need the 3doodler. We are an SEM s...,5,1
3,p185307,525fdbb6ec7f538a48beebaa0a51b24f,Mr.,NC,2016-08-12 15:42:11,Grades 3-5,Health & Sports,other,"\""Kid Inspired\"" Equipment to Increase Activit...",My students are the greatest students but are ...,"The student's project which is totally \""kid-i...",exempt,exempt,My students need balls and other activity equi...,16,0
4,p013780,a63b5547a7239eae4c1872670848e61a,Mr.,CA,2016-08-06 09:09:11,Grades 6-8,Health & Sports,other,We need clean water for our culinary arts class!,My students are athletes and students who are ...,For some reason in our kitchen the water comes...,exempt,exempt,My students need a water filtration system for...,42,1
5,p063374,403c6783e9286e51ab318fba40f8d729,Mrs.,other,2016-11-05 10:01:51,Grades PreK-2,other,other,Need to Reach Our Virtual Mentors!!!,My kids tell me each day that they want to mak...,I started a program called Telementoring in ho...,exempt,exempt,My students need tablets in order to communic...,0,1
6,p103285,4e156c5fb3eea2531601c8736f3751a7,Mrs.,other,2016-08-31 00:30:43,Grades PreK-2,Health & Sports,other,Active Kindergartners,Kindergarten is the new first grade. My studen...,With balance discs and stools as flexible seat...,exempt,exempt,My students need stability stools and inflatab...,1,1
7,p181781,c71f2ef13b4bc91afac61ca8fd4c0f9f,Mrs.,other,2016-08-03 13:26:01,Grades PreK-2,other,other,Fabulous Firsties-Wiggling to Learn!,First graders are fantastic! They are excited ...,First graders love learning! We need 6 wiggle-...,exempt,exempt,My students need wiggle stools to allow them t...,0,1
8,p114989,b580c11b1497a0a67317763b7f03eb27,Ms.,other,2016-09-13 22:35:57,Grades 6-8,Math & Science,Mathematics,Wobble Chairs Help Fidgety Kids Focus,My seventh graders dream big. They can't wait ...,I have used alternative seating in my classroo...,exempt,exempt,My students need seating that allows the most ...,13,1
9,p191410,2071fb0af994f8f16e7c6ed0f35062a1,Mrs.,IL,2016-09-24 18:38:59,Grades PreK-2,Literacy & Language,Literacy,Snuggle Up With A Good Book,I teach first grade in a small farming town in...,There is nothing better than snuggling up with...,exempt,exempt,My students need 2 youth sized reclining chair...,12,1


In [458]:
if (clean_data[clean_data.isnull().any(axis=1)]).shape[0] == 0:
    print("No NAs left :)")
else: 
    print("Check for remaining missing values")

No NAs left :)


In [459]:
#clean_data['project_subject_categories'].str.contains(',')

In [460]:
clean_data.head()

Unnamed: 0,id,teacher_id,teacher_prefix,school_state,project_submitted_datetime,project_grade_category,project_subject_categories,project_subject_subcategories,project_title,project_essay_1,project_essay_2,project_essay_3,project_essay_4,project_resource_summary,teacher_number_of_previously_posted_projects,project_is_approved
0,p036502,484aaf11257089a66cfedc9461c6bd0a,Ms.,other,2016-11-18 14:45:59,Grades PreK-2,Literacy & Language,Literacy,Super Sight Word Centers,Most of my kindergarten students come from low...,I currently have a differentiated sight word c...,exempt,exempt,My students need 6 Ipod Nano's to create and d...,26,1
1,p039565,df72a3ba8089423fa8a94be88060f6ed,Mrs.,other,2017-04-26 15:57:28,Grades 3-5,other,other,Keep Calm and Dance On,Our elementary school is a culturally rich sch...,We strive to provide our diverse population of...,exempt,exempt,My students need matching shirts to wear for d...,1,0
2,p233823,a9b876a9252e08a55e3d894150f75ba3,Ms.,other,2017-01-01 22:57:44,Grades 3-5,other,other,Lets 3Doodle to Learn,Hello;\r\nMy name is Mrs. Brotherton. I teach ...,We are looking to add some 3Doodler to our cla...,exempt,exempt,My students need the 3doodler. We are an SEM s...,5,1
3,p185307,525fdbb6ec7f538a48beebaa0a51b24f,Mr.,NC,2016-08-12 15:42:11,Grades 3-5,Health & Sports,other,"\""Kid Inspired\"" Equipment to Increase Activit...",My students are the greatest students but are ...,"The student's project which is totally \""kid-i...",exempt,exempt,My students need balls and other activity equi...,16,0
4,p013780,a63b5547a7239eae4c1872670848e61a,Mr.,CA,2016-08-06 09:09:11,Grades 6-8,Health & Sports,other,We need clean water for our culinary arts class!,My students are athletes and students who are ...,For some reason in our kitchen the water comes...,exempt,exempt,My students need a water filtration system for...,42,1


## Data Exploration

Our dataset contains rows of project applications, and we want to predict approval -- the project_is_approved column.

Hopefully, a rich feature will be the contents of the project essays. There's a lot of text featurizing we could do with the essay columns, but I will try to keep it simple first to see what we can get with the other columns and basic information about the essays before digging deeper.

Let's start by getting an idea of our average approval rate.

In [461]:
clean_data["project_is_approved"].mean()

0.8476823374340949

The approval rate in the training data is much higher than I expected: about 85% of projects get approved, so a naive model that always predicts approval would already reach 85% accuracy. The classes are imbalanced, though not so severely that identifying the minority class should be out of reach if we can extract useful features.

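To make the naive baseline concrete: always predicting the majority class gives an accuracy equal to that class's share of the labels. Below is a small numpy sketch on made-up toy labels with the same 85/15 split. The helper name is hypothetical. scikit-learn's DummyClassifier(strategy='most_frequent') gives the same baseline as an estimator.

```python
import numpy as np

def majority_baseline_accuracy(y):
    """Hypothetical helper: accuracy of always predicting the most common label in y."""
    y = np.asarray(y)
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / counts.sum()

# Toy labels with an 85/15 split (illustrative only)
y_toy = np.array([1] * 85 + [0] * 15)
print(majority_baseline_accuracy(y_toy))  # 0.85
```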
## Feature Extraction
We have a few opportunities for features.
1. Parse out the project subject categories and subcategories and one-hot encode them.
2. Length of each essay (word count).
3. Look for certain words in the project title.

Not going to do anything more intense with the text columns yet.
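A rough sketch of how these three ideas could look in pandas, using the DonorsChoose column names. It assumes the subject categories are separated by ", ", as in the sample rows above. It counts words only in essays 1 and 2, because in this notebook the missing essays 3 and 4 are filled with the placeholder "exempt". The function name and the keyword list are hypothetical choices for illustration.

```python
import pandas as pd

def basic_text_features(df, keywords):
    """Hypothetical sketch of the three feature ideas; `keywords` is a hand-picked list."""
    feats = pd.DataFrame(index=df.index)

    # 1. Multi-hot encode the comma-separated subject categories
    cats = df["project_subject_categories"].str.get_dummies(sep=", ")
    feats = feats.join(cats.add_prefix("cat_"))

    # 2. Word count of each essay (missing text counts as 0 words)
    for col in ["project_essay_1", "project_essay_2"]:
        feats[col + "_words"] = df[col].fillna("").str.split().str.len()

    # 3. Flag titles that mention each keyword (case-insensitive substring match)
    title = df["project_title"].str.lower()
    for word in keywords:
        feats["title_has_" + word] = title.str.contains(word, regex=False).astype(int)

    return feats

# Toy rows for illustration only
toy = pd.DataFrame({
    "project_subject_categories": ["Literacy & Language", "Literacy & Language, Math & Science"],
    "project_essay_1": ["My students love to read", "We build robots"],
    "project_essay_2": ["Books help", None],
    "project_title": ["Classroom Library", "Robot Lab"],
})
print(basic_text_features(toy, keywords=["library", "robot"]))
```

Note that the multi-hot columns only cover categories present in the frame being encoded. So the same column-alignment care applies as with one-hot encoding.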

In [462]:
non_model_cols = ['id', 'project_submitted_datetime', 'project_essay_1', 
                  'project_essay_2', 'project_essay_3', 'project_essay_4',
                  'project_title', 'project_resource_summary', 'teacher_id']
categorical_cols = ['teacher_prefix', 'school_state', 'project_grade_category', 'project_subject_categories',
                   'project_subject_subcategories']
numeric_cols = ['teacher_number_of_previously_posted_projects']
label_col = 'project_is_approved'

model_df = clean_data.drop(non_model_cols, axis=1)

for col in categorical_cols:
    model_df[col] = model_df[col].astype('category')
    
model_df.dtypes

teacher_prefix                                  category
school_state                                    category
project_grade_category                          category
project_subject_categories                      category
project_subject_subcategories                   category
teacher_number_of_previously_posted_projects       int64
project_is_approved                                int64
dtype: object

In [463]:
model_df = pd.get_dummies(model_df, columns=categorical_cols)

In [464]:
model_df.shape

(182080, 31)

Thanks to the whittling step, one-hot encoding leaves a manageable 31 columns: the label, the previous-projects count, and 29 dummy columns.

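One caveat for scoring later on: pd.get_dummies only creates columns for the categories present in the frame it is given. Encoding the test set separately can therefore produce a different set of columns than training. Below is a minimal sketch of the problem on toy data, plus one common fix: reindexing to the training columns.

```python
import pandas as pd

# Toy data: the test frame has no "Mr." or "Dr." rows
train = pd.DataFrame({"teacher_prefix": ["Mr.", "Mrs.", "Ms.", "Dr."]})
test = pd.DataFrame({"teacher_prefix": ["Mrs.", "Ms."]})

train_X = pd.get_dummies(train, columns=["teacher_prefix"])
test_X = pd.get_dummies(test, columns=["teacher_prefix"])
print(train_X.shape[1], test_X.shape[1])  # 4 2

# Add dummy columns missing from test (filled with 0), drop any extras,
# and put the columns in the training order
test_X = test_X.reindex(columns=train_X.columns, fill_value=0)
print(list(test_X.columns) == list(train_X.columns))  # True
```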
In [465]:
model_df.head()

Unnamed: 0,teacher_number_of_previously_posted_projects,project_is_approved,teacher_prefix_Dr.,teacher_prefix_Mr.,teacher_prefix_Mrs.,teacher_prefix_Ms.,teacher_prefix_Teacher,school_state_CA,school_state_FL,school_state_IL,school_state_NC,school_state_NY,school_state_TX,school_state_other,project_grade_category_Grades 3-5,project_grade_category_Grades 6-8,project_grade_category_Grades 9-12,project_grade_category_Grades PreK-2,project_subject_categories_Health & Sports,project_subject_categories_Literacy & Language,"project_subject_categories_Literacy & Language, Math & Science",project_subject_categories_Math & Science,project_subject_categories_Music & The Arts,project_subject_categories_other,project_subject_subcategories_Literacy,"project_subject_subcategories_Literacy, Literature & Writing","project_subject_subcategories_Literacy, Mathematics",project_subject_subcategories_Literature & Writing,"project_subject_subcategories_Literature & Writing, Mathematics",project_subject_subcategories_Mathematics,project_subject_subcategories_other
0,26,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0
1,1,0,0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1
2,5,1,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1
3,16,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1
4,42,1,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1


## Model Setup

### Split into training and dev sets

In [466]:
model_df_X = model_df.drop(['project_is_approved'], axis=1)
model_df_y = model_df['project_is_approved']

In [467]:
X_train, X_dev, y_train, y_dev = train_test_split(model_df_X, model_df_y, train_size=0.8, random_state=99)
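This split is random but not stratified. With an 85/15 imbalance, one possible tweak is to pass stratify=y, which keeps the class ratio the same in both the training and dev sets. Below is a small sketch on made-up toy labels with the same train_size and random_state as above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced labels: 850 ones and 150 zeros (illustrative only)
y = np.array([1] * 850 + [0] * 150)
X = np.arange(len(y)).reshape(-1, 1)

# stratify=y preserves the 85/15 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.8, random_state=99, stratify=y
)
print(y_tr.mean(), y_te.mean())
```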

## Fit model

In [468]:
lrModel = LogisticRegression()
lrModel.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [469]:
cross_val_score(lrModel, X_train, y_train, cv=10, scoring='neg_log_loss')

array([-0.42060839, -0.42049355, -0.41970193, -0.42059005, -0.420159  ,
       -0.42102165, -0.41966781, -0.4207027 , -0.42008026, -0.42116455])
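cross_val_score with scoring='neg_log_loss' returns negated losses, so the ten folds above correspond to a mean log loss of about 0.420. A useful reference point is a constant predictor that always outputs the training approval rate. Its log loss equals the binary entropy of that rate, about 0.427. The sketch below computes both from the numbers printed earlier. The small gap suggests the current features add only a little beyond the class prior. prior_log_loss is a hypothetical helper name.

```python
import numpy as np

def prior_log_loss(p):
    """Hypothetical helper: log loss of always predicting P(approved) = p
    when a fraction p of the labels are 1 (the binary entropy of p)."""
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

# Approval rate and fold scores copied from the outputs above
approval_rate = 0.8476823374340949
cv_scores = np.array([-0.42060839, -0.42049355, -0.41970193, -0.42059005, -0.420159,
                      -0.42102165, -0.41966781, -0.4207027, -0.42008026, -0.42116455])

print(round(prior_log_loss(approval_rate), 3), round(-cv_scores.mean(), 3))
```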

In [470]:
dev_preds = lrModel.predict(X_dev)
print(confusion_matrix(y_dev, dev_preds)) # rows: actual class, columns: predicted class
print(accuracy_score(y_dev, dev_preds))
print(classification_report(y_dev, dev_preds))

[[    0  5632]
 [    0 30784]]
0.8453427065026362
             precision    recall  f1-score   support

          0       0.00      0.00      0.00      5632
          1       0.85      1.00      0.92     30784

avg / total       0.71      0.85      0.77     36416




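The confusion matrix shows that the model predicts "approved" for every dev project and never flags a rejected one. Its 0.845 accuracy is therefore just the majority-class rate. One common remedy is class_weight='balanced' in LogisticRegression, which weights each class inversely to its frequency. Another is to lower the decision threshold on predict_proba. Below is a sketch of the first option. It runs on synthetic data from make_classification as a stand-in for the real features, and it scores on the training data only to keep the example short. Results on the real DonorsChoose data may differ.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Synthetic stand-in: roughly 15% class 0 and 85% class 1
X, y = make_classification(n_samples=2000, n_features=10, n_informative=3,
                           weights=[0.15], flip_y=0.05, random_state=0)

default_model = LogisticRegression(max_iter=1000).fit(X, y)
# Reweight each class inversely to its frequency
balanced_model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

recalls = {}
for name, model in [("default", default_model), ("balanced", balanced_model)]:
    # Recall on class 0: the share of minority-class rows the model actually flags
    recalls[name] = recall_score(y, model.predict(X), pos_label=0)
    print(name, round(recalls[name], 3))
```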

In [471]:
importances = pd.DataFrame({'feature':X_train.columns,'coefficients':np.round(lrModel.coef_[0],3)})
importances = importances.sort_values('coefficients',ascending=False).set_index('feature')
print(importances)

                                                    coefficients
feature                                                         
teacher_prefix_Mrs.                                        0.335
project_subject_subcategories_Literacy, Mathema...         0.325
project_subject_categories_Literacy & Language             0.287
project_subject_subcategories_Literature & Writ...         0.262
project_grade_category_Grades 3-5                          0.255
teacher_prefix_Ms.                                         0.243
teacher_prefix_Mr.                                         0.240
project_subject_subcategories_Literacy                     0.221
project_grade_category_Grades 6-8                          0.219
school_state_IL                                            0.216
project_subject_categories_Music & The Arts                0.207
school_state_CA                                            0.197
project_grade_category_Grades 9-12                         0.185
school_state_other       

## Submit Kaggle Entry

In [472]:
# Filepath to main test dataset.
test_file_path = 'test.csv'

# Read data and store in DataFrame.
test_data = pd.read_csv(test_file_path, sep=',')



In [473]:
test_data.head()

Unnamed: 0,id,teacher_id,teacher_prefix,school_state,project_submitted_datetime,project_grade_category,project_subject_categories,project_subject_subcategories,project_title,project_essay_1,project_essay_2,project_essay_3,project_essay_4,project_resource_summary,teacher_number_of_previously_posted_projects
0,p233245,5724a0c3ce11008366fff36dab4b943c,Ms.,CA,2016-04-27 13:45:41,Grades PreK-2,Music & The Arts,Visual Arts,Art Will Make You Happy!,My 2nd grade students are amazing! They are v...,My class is made up of 12 boys and 12 girls. ...,My second grade class will really benefit from...,The genorous donations to my project will make...,My students need a drying rack for their art p...,2
1,p096795,445619941dc7cbe81c7be109dc61a56a,Mrs.,SC,2016-04-28 12:43:56,Grades 3-5,"Literacy & Language, Math & Science","Literature & Writing, Mathematics",Keeping up with the TIMEs,Students within the classroom work in small gr...,My students are all very talented young indivi...,"We do a lot of small group, where the student ...",This project is very important to my classroom...,My students need Time Magazines for next year ...,1
2,p236235,e92a4902b1611a189643d6f12c51e6a0,Mrs.,SC,2016-04-29 21:16:05,Grades PreK-2,Math & Science,"Applied Sciences, Mathematics",Building Bridges to Problem Solving,My students share a love of learning. These s...,My class consists of 14 energetic learners. O...,These Fairy Tale Problem Solving STEM kits wil...,These materials will be help my students with ...,My students need to be mentally stimulated to ...,0
3,p233680,8e92622d2985d3faac1de71609c4be72,Mrs.,IA,2016-04-27 22:32:43,Grades PreK-2,Literacy & Language,Literacy,Classroom Library,Reading is the gateway to the soul. Guiding c...,First graders enter the classroom each day rea...,Book bins will help to organize our classroom ...,"When students begin the first grade, many are ...",My students need an organized classroom library.,0
4,p171879,91a3c89981f626d9a0d067c65fb186ce,Mr.,CA,2016-04-27 18:59:15,Grades 6-8,"Music & The Arts, Special Needs","Performing Arts, Special Needs",Reeds so we can Read,"\""Mr. Reyes! I need another reed!\"" I hear t...",We have a diverse population with almost entir...,Each day in my class students consume supplies...,My students come from very troubled homes and ...,My students need reeds to perform in class eac...,1


In [474]:
test_data.fillna({"project_essay_3": "exempt"}, inplace=True)
test_data.fillna({"project_essay_4": "exempt"}, inplace=True)
test_data.fillna({"teacher_prefix": modalClass(raw_data, 'teacher_prefix')}, inplace=True)
whittle_test(train_df=raw_data, test_df=test_data, cols_to_whittle=cols_to_whittle, min_count=whittle_min)

school_state
7
45
replaced school_state
project_subject_categories
6
46
replaced project_subject_categories
project_grade_category
4
0
project_subject_subcategories
7
384
replaced project_subject_subcategories


Unnamed: 0,id,teacher_id,teacher_prefix,school_state,project_submitted_datetime,project_grade_category,project_subject_categories,project_subject_subcategories,project_title,project_essay_1,project_essay_2,project_essay_3,project_essay_4,project_resource_summary,teacher_number_of_previously_posted_projects
0,p233245,5724a0c3ce11008366fff36dab4b943c,Ms.,CA,2016-04-27 13:45:41,Grades PreK-2,Music & The Arts,other,Art Will Make You Happy!,My 2nd grade students are amazing! They are v...,My class is made up of 12 boys and 12 girls. ...,My second grade class will really benefit from...,The genorous donations to my project will make...,My students need a drying rack for their art p...,2
1,p096795,445619941dc7cbe81c7be109dc61a56a,Mrs.,other,2016-04-28 12:43:56,Grades 3-5,"Literacy & Language, Math & Science","Literature & Writing, Mathematics",Keeping up with the TIMEs,Students within the classroom work in small gr...,My students are all very talented young indivi...,"We do a lot of small group, where the student ...",This project is very important to my classroom...,My students need Time Magazines for next year ...,1
2,p236235,e92a4902b1611a189643d6f12c51e6a0,Mrs.,other,2016-04-29 21:16:05,Grades PreK-2,Math & Science,other,Building Bridges to Problem Solving,My students share a love of learning. These s...,My class consists of 14 energetic learners. O...,These Fairy Tale Problem Solving STEM kits wil...,These materials will be help my students with ...,My students need to be mentally stimulated to ...,0
3,p233680,8e92622d2985d3faac1de71609c4be72,Mrs.,other,2016-04-27 22:32:43,Grades PreK-2,Literacy & Language,Literacy,Classroom Library,Reading is the gateway to the soul. Guiding c...,First graders enter the classroom each day rea...,Book bins will help to organize our classroom ...,"When students begin the first grade, many are ...",My students need an organized classroom library.,0
4,p171879,91a3c89981f626d9a0d067c65fb186ce,Mr.,CA,2016-04-27 18:59:15,Grades 6-8,other,other,Reeds so we can Read,"\""Mr. Reyes! I need another reed!\"" I hear t...",We have a diverse population with almost entir...,Each day in my class students consume supplies...,My students come from very troubled homes and ...,My students need reeds to perform in class eac...,1
5,p016071,3964746c32aa70b7161236d1eed9e98b,Ms.,other,2016-04-28 17:21:52,Grades 3-5,Literacy & Language,Literacy,Classroom Library,A typical day in our classroom can be a little...,My students come from a variety of backgrounds...,The books that I am requesting came directly f...,This project will truly change the lives of my...,My students need books that are engaging and e...,2
6,p099906,04dcdcf90807262e5cbe3a7a1435ca8b,Mrs.,other,2016-04-29 08:12:53,Grades 6-8,Literacy & Language,"Literacy, Literature & Writing",A Tisket A Tasket A Set of Student Tablets,My students crave engaging texts with which th...,My students face many obstacles in their quest...,Having access to a class set of Kindle Fires w...,Donations to this project will open up a world...,My students need a class set of Kindle Fires s...,13
7,p200236,d0f34635d41cf68d4d4a538194c88f4c,Mrs.,other,2016-04-29 12:35:48,Grades PreK-2,other,other,Literacy and Engineering,My students have to sit around a CD player to ...,My kindergarten students attend a school that ...,Our classroom needs a good listening center so...,When this project is fully funded it will bene...,My students need a listening center so they ca...,0
8,p129452,f725cbd914b053fabc2c234e54b11828,Mrs.,other,2016-04-29 13:52:41,Grades PreK-2,"Literacy & Language, Math & Science","Literature & Writing, Mathematics",Learning is Fun!,My students are new to school when they come t...,I teach 19 wonderful kids who want to learn to...,We need some fun letter and number stamping to...,My classroom needs to be fun and exciting. Fiv...,My students need some updated supplies for rea...,13
9,p186652,cee98e1a685275fe26656db019d85fc5,Mrs.,other,2016-04-29 22:48:48,Grades 3-5,"Literacy & Language, Math & Science","Literacy, Mathematics",Writing Our Way Through Math,My students have been working on DIY whiteboar...,Currently I have 21 students who love to prove...,In my math class we have adopted the guided ma...,Classroom never have enough surfaces to write ...,My students need a surface to write on and sho...,0


In [475]:
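score_df = test_data.drop(non_model_cols, axis=1)
score_df = pd.get_dummies(score_df, columns=categorical_cols)
# Match the training features: add dummies missing from test (as 0), drop extras, keep training column order
score_df = score_df.reindex(columns=X_train.columns, fill_value=0)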

In [476]:
preds = lrModel.predict_proba(score_df)

In [478]:
my_submission = pd.DataFrame({'id': test_data["id"], 'project_is_approved': preds[:,1]})
print(my_submission.values)

[['p233245' 0.8326046741498686]
 ['p096795' 0.8718497686750228]
 ['p236235' 0.8070193031655466]
 ...
 ['p210728' 0.8826860412936758]
 ['p060531' 0.8549587115890799]
 ['p087783' 0.8774236802826603]]

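preds[:,1] is the approval probability because predict_proba orders its columns by the fitted model's classes_, which is [0, 1] here. A slightly more defensive version looks up the column for label 1 explicitly. Below is a minimal sketch on a made-up four-row dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy model: the order of model.classes_ decides the predict_proba column order
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X)
# Find the column for label 1 instead of hard-coding index 1
pos_col = list(model.classes_).index(1)
p_approved = proba[:, pos_col]
print(model.classes_, p_approved.round(3))
```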

In [479]:
my_submission.to_csv('my_submission.csv', index=False)