# Data to Policy Spring 2021

Weston Grewe and Angela Morrison

University of Colorado Denver

Math 7594 Integer Programming

Instructor: Dr. Steffen Borgwardt

## Summary
A college degree is becoming a necessary requirement of entering the middle class. A college education is also an excellent way to lift people out of poverty. Some high schools have high immediate college enrollments while others have low immediate college enrollments. For any one high school, it is nearly impossible to determine which factors contribute significantly to college enrollment. Often, it is a blend of many factors such as class size, proportion of low income students, teacher pay, number of AP classes offered, and many other factors. 

In this project, we create an interpretable optimal decision tree to understand which factors make the greatest impact. We will use data from Massachusetts' public schools in 2017 which can be found on Kaggle.

In [240]:
import numpy as np
import pandas as pd
from numpy import savetxt

In [187]:
raw_data = pd.read_csv('MA_Public_Schools_2017.csv')

The dataset contains 302 fields for 1861 schools. This includes elementary, middle, and high schools as well as schools that serve many grade levels. We will begin by selecting only schools which serve senior high school students. A school which does not serve seniors cannot have immediate college enrollment. We will then select only fields which would be beneficial to this analysis. For example, the number of AP classes taken is relevant while the school's principal is not.

In [189]:
slice1 = raw_data[raw_data['12_Enrollment'] > 1];
slice1.columns.tolist();

Now, we must choose which columns to include in our analysis. It would also be interesting to study mutable and immutable factors in two different analyses to understand what changes. For instance, some factors may be most determining, e.g. poverty/wealth, but schools have no control over these factors. For a decision, schools can only be concerned with mutable factors, e.g. teacher pay, number of AP classes. 

Factors (in order of Col #)
- School type (Public/Charter)
- ZIP 
- District/District Code
- Total Enrollment
- First Lang Not English
- English Lang Learner
- Disability
- High Need
- Economically Disadvantaged
- Race Makeup
- Average Class Size
- Average Salary
- Average Expenditure per Pupil
- % Graduated
- % Dropped Out
- AP Test takers
- Number of Tests Taken
- AP Score
- Average SAT Math
- Average SAT Reading
- Average SAT Writing
- 10th Grade MCAS (If used, filter for 10th graders)
- Accountability Metrics

Replace "%" symbols in column names to avoid possible errors in future column name calling.

In [190]:
slice1.columns = slice1.columns.str.replace('%', 'Percent');

Take only columns with information we care about.

In [191]:
>>> important_cols = slice1[['Total # of Classes', 'TOTAL_Enrollment','Percent First Language Not English',\
                             'Average Class Size','Percent Attending College','AP_Test Takers',\
                             'AP_One Test', 'AP_Two Tests', 'AP_Three Tests','AP_Four Tests', 'AP_Five or More Tests',\
                             'SAT_Tests Taken']]

Get current shape of new dataframe and remove rows with any missing data and get shape of new dataset

In [192]:
print(important_cols.shape)
important_cols.isnull().sum().sum()
clean_imp_cols = important_cols.dropna()
print(clean_imp_cols.shape)

(393, 12)
(295, 12)


Print names of columns in cleaned dataset

In [193]:
clean_imp_cols.columns.tolist();

In [194]:
clean_imp_cols['AP_Test Takers'] = clean_imp_cols['AP_Test Takers'].str.replace(',', '')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  clean_imp_cols['AP_Test Takers'] = clean_imp_cols['AP_Test Takers'].str.replace(',', '')


In [195]:
print(pd.to_numeric(clean_imp_cols['AP_Test Takers']).max())
print(clean_imp_cols['SAT_Tests Taken'].max())
print(clean_imp_cols['Total # of Classes'].max())

1037
623.0
2001.0


Fucntions to Convert Columns to Binary Variables

In [196]:
def convert_SAT_to_bin(column, first_100, second_100, third_100, fourth_100, fifth_100, sixth_100,seventh_100):
    for i in range(len(column)):
        if column[i] <= 100:
            first_100.append(1)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
        elif ((column[i] > 100) and(column[i] <= 200)):
            first_100.append(0)
            second_100.append(1)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
        elif ((column[i] > 200) and(column[i] <= 300)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(1)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
        elif ((column[i] > 300) and(column[i] <= 400)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(1)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
        elif ((column[i] > 400) and(column[i] <= 500)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(1)
            sixth_100.append(0)
            seventh_100.append(0)
        elif ((column[i] > 500) and(column[i] <= 600)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(1)
            seventh_100.append(0)
        else:
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(1)
            
    return first_100, second_100, third_100, fourth_100, fifth_100, sixth_100,seventh_100

In [197]:
def convert_AP_taken_bin(column,first_100, second_100, third_100,fourth_100,\
                         fifth_100, sixth_100, seventh_100,eighth_100,\
                         nineth_100,tenth_100,eleventh_100):
    for i in range(len(column)):
        if column[i] <= 100:
            first_100.append(1)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
            eleventh_100.append(0)
        elif ((column[i] > 100) and(column[i] <= 200)):
            first_100.append(0)
            second_100.append(1)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
            eleventh_100.append(0)
        elif ((column[i] > 200) and(column[i] <= 300)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(1)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
            eleventh_100.append(0)
        elif ((column[i] > 300) and(column[i] <= 400)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(1)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
            eleventh_100.append(0)
        elif ((column[i] > 400) and(column[i] <= 500)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(1)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
            eleventh_100.append(0)
        elif ((column[i] > 500) and(column[i] <= 600)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(1)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
            eleventh_100.append(0)
        elif ((column[i] > 600) and(column[i] <=700)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(1)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
            eleventh_100.append(0)
        elif ((column[i] > 700) and(column[i] <=800)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(1)
            nineth_100.append(0)
            tenth_100.append(0)
            eleventh_100.append(0)
        elif ((column[i] > 800) and(column[i] <=900)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(1)
            tenth_100.append(0)
            eleventh_100.append(0)
        elif ((column[i] > 900) and(column[i] <=1000)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(1)
            eleventh_100.append(0)
        else:
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
            eleventh_100.append(1)
    
    return first_100,second_100,third_100,fourth_100,fifth_100,sixth_100,seventh_100,eighth_100,\
           nineth_100,tenth_100, eleventh_100

In [198]:
def convert_tot_class_bin(column, fist_250, second_250, third_250, fourth_250, fifth_250,\
                          sitxth_250, seventh_250,eight_250, nineth_250):
    for i in range(len(column)):
        if column[i] <= 250:
            fist_250.append(1) 
            second_250.append(0) 
            third_250.append(0) 
            fourth_250.append(0)
            fifth_250.append(0)
            sitxth_250.append(0)
            seventh_250.append(0)
            eight_250.append(0)
            nineth_250.append(0)
        elif ((column[i] > 250) and (column[i] <=500)):
            fist_250.append(0) 
            second_250.append(1) 
            third_250.append(0) 
            fourth_250.append(0)
            fifth_250.append(0)
            sitxth_250.append(0)
            seventh_250.append(0)
            eight_250.append(0)
            nineth_250.append(0)
        elif ((column[i] > 500) and (column[i] <=750)):
            fist_250.append(0) 
            second_250.append(0) 
            third_250.append(1) 
            fourth_250.append(0)
            fifth_250.append(0)
            sitxth_250.append(0)
            seventh_250.append(0)
            eight_250.append(0)
            nineth_250.append(0)
        elif ((column[i] > 750) and (column[i] <1000)):
            fist_250.append(0) 
            second_250.append(0) 
            third_250.append(0) 
            fourth_250.append(1)
            fifth_250.append(0)
            sitxth_250.append(0)
            seventh_250.append(0)
            eight_250.append(0)
            nineth_250.append(0)
        elif ((column[i] > 1000) and (column[i] <=1250)):
            fist_250.append(0) 
            second_250.append(0) 
            third_250.append(0) 
            fourth_250.append(0)
            fifth_250.append(1)
            sitxth_250.append(0)
            seventh_250.append(0)
            eight_250.append(0)
            nineth_250.append(0)
        elif ((column[i] > 1250) and (column[i] <=1500)):
            fist_250.append(0) 
            second_250.append(0) 
            third_250.append(0) 
            fourth_250.append(0)
            fifth_250.append(0)
            sitxth_250.append(1)
            seventh_250.append(0)
            eight_250.append(0)
            nineth_250.append(0)
        elif ((column[i] > 1500) and (column[i] <=1750)):
            fist_250.append(0) 
            second_250.append(0) 
            third_250.append(0) 
            fourth_250.append(0)
            fifth_250.append(0)
            sitxth_250.append(0)
            seventh_250.append(1)
            eight_250.append(0)
            nineth_250.append(0)
        elif ((column[i] > 1750) and (column[i] <=2000)):
            fist_250.append(0) 
            second_250.append(0) 
            third_250.append(0) 
            fourth_250.append(0)
            fifth_250.append(0)
            sitxth_250.append(0)
            seventh_250.append(0)
            eight_250.append(1)
            nineth_250.append(0)
        else:
            fist_250.append(0) 
            second_250.append(0) 
            third_250.append(0) 
            fourth_250.append(0)
            fifth_250.append(0)
            sitxth_250.append(0)
            seventh_250.append(0)
            eight_250.append(0)
            nineth_250.append(1)
            
    return fist_250, second_250, third_250, fourth_250, fifth_250,sitxth_250, seventh_250,eight_250, nineth_250
    

In [199]:
def convert_to_thirds_perc(column,first_third,middle_third,last_third):
    for i in range(len(column)):
        if column[i] <= 33.0:
            first_third.append(1)
            middle_third.append(0)
            last_third.append(0)
        elif (column[i] > 33.0 and column[i] <=66.0):
            first_third.append(0)
            middle_third.append(1)
            last_third.append(0)
        else:
            first_third.append(0)
            middle_third.append(0)
            last_third.append(1)
            
    return first_third,middle_third,last_third

In [200]:
def convert_outcome_bin(column,bin_outcome):
    for i in range(len(column)):
        if column[i] < 64.6:
            bin_outcome.append(-1)
        else:
            bin_outcome.append(1)
    
    return bin_outcome

In [201]:
def convert_tot_enroll_binary(column, first_500, second_500, third_500, \
                              fourth_500, fifth_500, sixth_500, seventh_500, \
                              eighth_500, nineth_500):
    for i in range(len(column)):
        if column[i] <= 500:
            first_500.append(1)
            second_500.append(0)
            third_500.append(0)
            fourth_500.append(0)
            fifth_500.append(0)
            sixth_500.append(0)
            seventh_500.append(0)
            eighth_500.append(0)
            nineth_500.append(0)
        elif ((column[i] > 500) and(column[i] <= 1000)):
            first_500.append(0)
            second_500.append(1)
            third_500.append(0)
            fourth_500.append(0)
            fifth_500.append(0)
            sixth_500.append(0)
            seventh_500.append(0)
            eighth_500.append(0)
            nineth_500.append(0)
        elif ((column[i] > 1000) and(column[i] <= 1500)):
            first_500.append(0)
            second_500.append(0)
            third_500.append(1)
            fourth_500.append(0)
            fifth_500.append(0)
            sixth_500.append(0)
            seventh_500.append(0)
            eighth_500.append(0)
            nineth_500.append(0)
        elif ((column[i] > 1500) and(column[i] <= 2000)):
            first_500.append(0)
            second_500.append(0)
            third_500.append(0)
            fourth_500.append(1)
            fifth_500.append(0)
            sixth_500.append(0)
            seventh_500.append(0)
            eighth_500.append(0)
            nineth_500.append(0)
        elif ((column[i] > 2000) and(column[i] <= 2500)):
            first_500.append(0)
            second_500.append(0)
            third_500.append(0)
            fourth_500.append(0)
            fifth_500.append(1)
            sixth_500.append(0)
            seventh_500.append(0)
            eighth_500.append(0)
            nineth_500.append(0)
        elif ((column[i] > 2500) and(column[i] <= 3000)):
            first_500.append(0)
            second_500.append(0)
            third_500.append(0)
            fourth_500.append(0)
            fifth_500.append(0)
            sixth_500.append(1)
            seventh_500.append(0)
            eighth_500.append(0)
            nineth_500.append(0)
        elif ((column[i] > 3000) and(column[i] <=3500)):
            first_500.append(0)
            second_500.append(0)
            third_500.append(0)
            fourth_500.append(0)
            fifth_500.append(0)
            sixth_500.append(0)
            seventh_500.append(1)
            eighth_500.append(0)
            nineth_500.append(0)
        elif ((column[i] > 3500) and(column[i] <=4000)):
            first_500.append(0)
            second_500.append(0)
            third_500.append(0)
            fourth_500.append(0)
            fifth_500.append(0)
            sixth_500.append(0)
            seventh_500.append(0)
            eighth_500.append(1)
            nineth_500.append(0)
        else:
            first_500.append(0)
            second_500.append(0)
            third_500.append(0)
            fourth_500.append(0)
            fifth_500.append(0)
            sixth_500.append(0)
            seventh_500.append(0)
            eighth_500.append(0)
            nineth_500.append(1)
    
    return first_500,second_500,third_500,fourth_500,fifth_500,sixth_500,seventh_500,eighth_500,nineth_500

In [202]:
def convert_12th_enroll_binary(column, first_100, second_100, third_100, \
                              fourth_100, fifth_100, sixth_100, seventh_100, \
                              eighth_100, nineth_100,tenth_100):
    for i in range(len(column)):
        if column[i] <= 100:
            first_100.append(1)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
        elif ((column[i] > 100) and(column[i] <= 200)):
            first_100.append(0)
            second_100.append(1)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
        elif ((column[i] > 200) and(column[i] <= 300)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(1)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
        elif ((column[i] > 300) and(column[i] <= 400)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(1)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
        elif ((column[i] > 400) and(column[i] <= 500)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(1)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
        elif ((column[i] > 500) and(column[i] <= 600)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(1)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
        elif ((column[i] > 600) and(column[i] <=700)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(1)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(0)
        elif ((column[i] > 700) and(column[i] <=800)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(1)
            nineth_100.append(0)
            tenth_100.append(0)
        elif ((column[i] > 800) and(column[i] <=900)):
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(1)
            tenth_100.append(0)
        else:
            first_100.append(0)
            second_100.append(0)
            third_100.append(0)
            fourth_100.append(0)
            fifth_100.append(0)
            sixth_100.append(0)
            seventh_100.append(0)
            eighth_100.append(0)
            nineth_100.append(0)
            tenth_100.append(1)
    
    return first_100,second_100,third_100,fourth_100,fifth_100,sixth_100,seventh_100,eighth_100, nineth_100,tenth_100

In [203]:
def convert_class_size_bin(column, first_5, second_5, third_5, fourth_5, fifth_5, sixth_5, seventh_5):
    for i in range(len(column)):
        if column[i] <= 5:
            first_5.append(1)
            second_5.append(0)
            third_5.append(0)
            fourth_5.append(0)
            fifth_5.append(0)
            sixth_5.append(0)
            seventh_5.append(0)
        elif ((column[i] > 5) and (column[i] <= 10)):
            first_5.append(0)
            second_5.append(1)
            third_5.append(0)
            fourth_5.append(0)
            fifth_5.append(0)
            sixth_5.append(0)
            seventh_5.append(0)
        elif ((column[i] > 10) and (column[i] <= 15)):
            first_5.append(0)
            second_5.append(0)
            third_5.append(1)
            fourth_5.append(0)
            fifth_5.append(0)
            sixth_5.append(0)
            seventh_5.append(0)
        elif ((column[i] > 15) and (column[i] <= 20)):
            first_5.append(0)
            second_5.append(0)
            third_5.append(0)
            fourth_5.append(1)
            fifth_5.append(0)
            sixth_5.append(0)
            seventh_5.append(0)
        elif ((column[i] > 20) and (column[i] <= 25)):
            first_5.append(0)
            second_5.append(0)
            third_5.append(0)
            fourth_5.append(0)
            fifth_5.append(1)
            sixth_5.append(0)
            seventh_5.append(0)
        elif ((column[i] > 25) and (column[i] <= 30)):
            first_5.append(0)
            second_5.append(0)
            third_5.append(0)
            fourth_5.append(0)
            fifth_5.append(0)
            sixth_5.append(1)
            seventh_5.append(0)
        else:
            first_5.append(0)
            second_5.append(0)
            third_5.append(0)
            fourth_5.append(0)
            fifth_5.append(0)
            sixth_5.append(0)
            seventh_5.append(1)
            
    return first_5, second_5, third_5, fourth_5, fifth_5, sixth_5, seventh_5

Actually converting columns to binary variables

In [204]:
#convert columns to arrays for functions
# disab_array = np.array(clean_imp_cols['Percent Students With Disabilities'])
# high_needs_array = np.array(clean_imp_cols['Percent High Needs'])
# econ_dis_array = np.array(clean_imp_cols['Percent Economically Disadvantaged'])
# african_array = np.array(clean_imp_cols['Percent African American'])
# asian_array = np.array(clean_imp_cols['Percent Asian'])
# hispanic_array = np.array(clean_imp_cols['Percent Hispanic'])
# white_array = np.array(clean_imp_cols['Percent White'])
# native_array = np.array(clean_imp_cols['Percent Native American'])
# pacific_array = np.array(clean_imp_cols['Percent Native Hawaiian, Pacific Islander'])
# multi_race_array = np.array(clean_imp_cols['Percent Multi-Race, Non-Hispanic'])

ap_test_taken_1_array = np.array(clean_imp_cols['AP_One Test'])
ap_test_taken_2_array = np.array(clean_imp_cols['AP_Two Tests'])
ap_test_taken_3_array = np.array(clean_imp_cols['AP_Three Tests'])
ap_test_taken_4_array = np.array(clean_imp_cols['AP_Four Tests'])
ap_test_taken_5_plus_array = np.array(clean_imp_cols['AP_Five or More Tests'])


not_eng_array = np.array(clean_imp_cols['Percent First Language Not English'])



#Create empty lists for conversion function
# disab_33_below = []
# disab_33_66 = []
# disab_66_above = []

# high_needs_33_below = []
# high_needs_33_66 = []
# high_needs_66_above = []

# econ_dis_33_below = []
# econ_dis_33_66 = []
# econ_dis_66_above = []

# african_33_below = []
# african_33_66 = []
# african_66_above = []

# asian_33_below = []
# asian_33_66 = []
# asian_66_above = []

# hispanic_33_below = []
# hispanic_33_66 = []
# hispanic_66_above = []

# white_33_below = []
# white_33_66 = []
# white_66_above = []

# native_33_below = []
# native_33_66 = []
# native_66_above = []

# pacific_33_below = []
# pacific_33_66 = []
# pacific_66_above = []

# multi_race_33_below = []
# multi_race_33_66 = []
# multi_race_66_above = []

ap_test_taken_1_33_below = []
ap_test_taken_1_33_66 = []
ap_test_taken_1_66_above = []

ap_test_taken_2_33_below = []
ap_test_taken_2_33_66 = []
ap_test_taken_2_66_above = []

ap_test_taken_3_33_below = []
ap_test_taken_3_33_66 = []
ap_test_taken_3_66_above = []

ap_test_taken_4_33_below = []
ap_test_taken_4_33_66 = []
ap_test_taken_4_66_above = []

ap_test_taken_5_plus_33_below = []
ap_test_taken_5_plus_33_66 = []
ap_test_taken_5_plus_66_above = []

not_eng_33_below = []
not_eng_33_66 = []
not_eng_66_above = []

#Calling conversion fucntion
# disab_33_below,disab_33_66,disab_66_above = convert_to_thirds_perc(disab_array,disab_33_below,disab_33_66,disab_66_above);
# high_needs_33_below,high_needs_33_66,high_needs_66_above = convert_to_thirds_perc(high_needs_array,high_needs_33_below,high_needs_33_66,high_needs_66_above);
# econ_dis_33_below,econ_dis_33_66,econ_dis_66_above = convert_to_thirds_perc(econ_dis_array,econ_dis_33_below,econ_dis_33_66,econ_dis_66_above);
# african_33_below,african_33_66,african_66_above = convert_to_thirds_perc(african_array,african_33_below,african_33_66,african_66_above);
# asian_33_below,asian_33_66,asian_66_above = convert_to_thirds_perc(asian_array,asian_33_below,asian_33_66,asian_66_above);
# hispanic_33_below,hispanic_33_66,hispanic_66_above = convert_to_thirds_perc(hispanic_array,hispanic_33_below,hispanic_33_66,hispanic_66_above);
# white_33_below,white_33_66,white_66_above = convert_to_thirds_perc(white_array,white_33_below,white_33_66,white_66_above);
# native_33_below,native_33_66,native_66_above = convert_to_thirds_perc(native_array,native_33_below,native_33_66,native_66_above);
# pacific_33_below,pacific_33_66,pacific_66_above = convert_to_thirds_perc(pacific_array,pacific_33_below,pacific_33_66,pacific_66_above);
# multi_race_33_below,multi_race_33_66,multi_race_66_above = convert_to_thirds_perc(multi_race_array,multi_race_33_below,multi_race_33_66,multi_race_66_above);


ap_test_taken_1_33_below,ap_test_taken_1_33_66,ap_test_taken_1_66_above = convert_to_thirds_perc(ap_test_taken_1_array,ap_test_taken_1_33_below,ap_test_taken_1_33_66,ap_test_taken_1_66_above);
ap_test_taken_2_33_below,ap_test_taken_2_33_66,ap_test_taken_2_66_above = convert_to_thirds_perc(ap_test_taken_2_array,ap_test_taken_2_33_below,ap_test_taken_2_33_66,ap_test_taken_2_66_above);
ap_test_taken_3_33_below,ap_test_taken_3_33_66,ap_test_taken_3_66_above = convert_to_thirds_perc(ap_test_taken_3_array,ap_test_taken_3_33_below,ap_test_taken_3_33_66,ap_test_taken_3_66_above);
ap_test_taken_4_33_below,ap_test_taken_4_33_66,ap_test_taken_4_66_above = convert_to_thirds_perc(ap_test_taken_4_array,ap_test_taken_4_33_below,ap_test_taken_4_33_66,ap_test_taken_4_66_above);
ap_test_taken_5_plus_33_below,ap_test_taken_5_plus_33_66,ap_test_taken_5_plus_66_above = convert_to_thirds_perc(ap_test_taken_5_plus_array,ap_test_taken_5_plus_33_below,ap_test_taken_5_plus_33_66,ap_test_taken_5_plus_66_above);

not_eng_33_below,not_eng_33_66,not_eng_66_above = convert_to_thirds_perc(not_eng_array,not_eng_33_below,not_eng_33_66,not_eng_66_above);

In [205]:
#convert college percentage to binary outcomes
outcome_array = np.array(clean_imp_cols['Percent Attending College'])
outcome_bin = []

outcome_bin = convert_outcome_bin(outcome_array,outcome_bin)

In [206]:
# enroll_12th_array = np.array(clean_imp_cols['12_Enrollment'])
tot_enroll_array = np.array(clean_imp_cols['TOTAL_Enrollment'])
avg_class_array = np.array(clean_imp_cols['Average Class Size'])
SAT_taken_array = np.array(clean_imp_cols['SAT_Tests Taken'])
AP_taken_array = np.array(pd.to_numeric(clean_imp_cols['AP_Test Takers']))
Tot_num_classes_array = np.array(clean_imp_cols['Total # of Classes'])

enroll_12th_0_100 = []
enroll_12th_100_200 = []
enroll_12th_200_300 = []
enroll_12th_300_400 = []
enroll_12th_400_500 = []
enroll_12th_500_600 = []
enroll_12th_600_700 = []
enroll_12th_700_800 = []
enroll_12th_800_900 = []
enroll_12th_900_1000 = []

tot_enroll_0_500 = []
tot_enroll_500_1000 = []
tot_enroll_1000_1500 = []
tot_enroll_1500_2000 = []
tot_enroll_2000_2500 = []
tot_enroll_2500_3000 = []
tot_enroll_3000_3500 = []
tot_enroll_3500_4000 = []
tot_enroll_4000_4500 = []

avg_class_0_5 = []
avg_class_5_10 = []
avg_class_10_15 = []
avg_class_15_20 = []
avg_class_20_25 = []
avg_class_25_30 = []
avg_class_30_35 = []

SAT_taken_0_100 = [];
SAT_taken_100_200 = [];
SAT_taken_200_300 = [];
SAT_taken_300_400 = [];
SAT_taken_400_500 = [];
SAT_taken_500_600 = [];
SAT_taken_600_700 = [];

AP_taken_0_100 = [];
AP_taken_100_200 = [];
AP_taken_200_300 = [];
AP_taken_300_400 = [];
AP_taken_400_500 = [];
AP_taken_500_600 = [];
AP_taken_600_700 = [];
AP_taken_700_800 = [];
AP_taken_800_900 = [];
AP_taken_900_1000 = [];
AP_taken_1000_1100 = [];

Tot_num_classes_0_250 = [];
Tot_num_classes_250_500 = [];
Tot_num_classes_500_750 = [];
Tot_num_classes_750_1000 = [];
Tot_num_classes_1000_1250 = [];
Tot_num_classes_1250_1500 = [];
Tot_num_classes_1500_1750 = [];
Tot_num_classes_1750_2000 = [];
Tot_num_classes_2000_2250 = [];


# enroll_12th_0_100,enroll_12th_100_200,enroll_12th_200_300,\
# enroll_12th_300_400,enroll_12th_400_500,enroll_12th_500_600,\
# enroll_12th_600_700,enroll_12th_700_800,enroll_12th_800_900,\
# enroll_12th_900_1000 = convert_12th_enroll_binary(enroll_12th_array,enroll_12th_0_100,enroll_12th_100_200,\
#                                                   enroll_12th_200_300, enroll_12th_300_400,enroll_12th_400_500,\
#                                                   enroll_12th_500_600,enroll_12th_600_700,enroll_12th_700_800,\
#                                                   enroll_12th_800_900,enroll_12th_900_1000);

tot_enroll_0_500,tot_enroll_500_1000,tot_enroll_1000_1500,\
tot_enroll_1500_2000,tot_enroll_2000_2500,tot_enroll_2500_3000,\
tot_enroll_3000_3500,tot_enroll_3500_4000,tot_enroll_4000_4500 = convert_tot_enroll_binary(tot_enroll_array,\
                                                                                           tot_enroll_0_500,\
                                                                                           tot_enroll_500_1000,\
                                                                                           tot_enroll_1000_1500,\
                                                                                           tot_enroll_1500_2000,\
                                                                                           tot_enroll_2000_2500,\
                                                                                           tot_enroll_2500_3000,\
                                                                                           tot_enroll_3000_3500,\
                                                                                           tot_enroll_3500_4000,\
                                                                                           tot_enroll_4000_4500);

avg_class_0_5,avg_class_5_10,avg_class_10_15,\
avg_class_15_20,avg_class_20_25,avg_class_25_30,\
avg_class_30_35 = convert_class_size_bin(avg_class_array,avg_class_0_5,avg_class_5_10,avg_class_10_15,\
                                         avg_class_15_20,avg_class_20_25,avg_class_25_30,avg_class_30_35)

SAT_taken_0_100,SAT_taken_100_200,SAT_taken_200_300,SAT_taken_300_400,\
SAT_taken_400_500,SAT_taken_500_600,SAT_taken_600_700=convert_SAT_to_bin(SAT_taken_array,SAT_taken_0_100,\
                                                                         SAT_taken_100_200,SAT_taken_200_300,\
                                                                         SAT_taken_300_400,SAT_taken_400_500,\
                                                                         SAT_taken_500_600,SAT_taken_600_700)
AP_taken_0_100,AP_taken_100_200,AP_taken_200_300,AP_taken_300_400,\
AP_taken_400_500,AP_taken_500_600,AP_taken_600_700,AP_taken_700_800,\
AP_taken_800_900, AP_taken_900_1000,AP_taken_1000_1100 =convert_AP_taken_bin(AP_taken_array,AP_taken_0_100,\
                                                                             AP_taken_100_200,AP_taken_200_300,\
                                                                             AP_taken_300_400,AP_taken_400_500,\
                                                                             AP_taken_500_600,AP_taken_600_700,\
                                                                             AP_taken_700_800,AP_taken_800_900,\
                                                                             AP_taken_900_1000,AP_taken_1000_1100)
Tot_num_classes_0_250,Tot_num_classes_250_500,Tot_num_classes_500_750,\
Tot_num_classes_750_1000,Tot_num_classes_1000_1250,Tot_num_classes_1250_1500,\
Tot_num_classes_1500_1750,Tot_num_classes_1750_2000,\
Tot_num_classes_2000_2250 =convert_tot_class_bin(Tot_num_classes_array,Tot_num_classes_0_250,Tot_num_classes_250_500,\
                                                 Tot_num_classes_500_750,Tot_num_classes_750_1000,\
                                                 Tot_num_classes_1000_1250,Tot_num_classes_1250_1500,\
                                                 Tot_num_classes_1500_1750,Tot_num_classes_1750_2000,\
                                                 Tot_num_classes_2000_2250)


In [207]:
print(len(Tot_num_classes_array))

295


Add new binary columns to dataframe

In [208]:
# columns_to_add = [disab_33_below,disab_33_66,disab_66_above,high_needs_33_below,high_needs_33_66,\
#                   high_needs_66_above,econ_dis_33_below,econ_dis_33_66,econ_dis_66_above,\
#                   african_33_below,african_33_66,african_66_above,asian_33_below,asian_33_66,\
#                   asian_66_above,hispanic_33_below,hispanic_33_66,hispanic_66_above,white_33_below,\
#                   white_33_66,white_66_above,native_33_below,native_33_66,native_66_above,pacific_33_below,\
#                   pacific_33_66,pacific_66_above,multi_race_33_below,multi_race_33_66,multi_race_66_above,\
#                   ap_test_taken_1_33_below,ap_test_taken_1_33_66,ap_test_taken_1_66_above,ap_test_taken_2_33_below,\
#                   ap_test_taken_2_33_66,ap_test_taken_2_66_above,ap_test_taken_3_33_below,ap_test_taken_3_33_66,\
#                   ap_test_taken_3_66_above,ap_test_taken_4_33_below,ap_test_taken_4_33_66,ap_test_taken_4_66_above,\
#                   ap_test_taken_5_plus_33_below,ap_test_taken_5_plus_33_66,ap_test_taken_5_plus_66_above,\
#                   not_eng_33_below,not_eng_33_66,not_eng_66_above,enroll_12th_0_100,enroll_12th_100_200,\
#                   enroll_12th_200_300,enroll_12th_300_400,enroll_12th_400_500,enroll_12th_500_600,enroll_12th_600_700,\
#                   enroll_12th_700_800,enroll_12th_800_900,enroll_12th_900_1000,tot_enroll_0_500,\
#                   tot_enroll_500_1000,tot_enroll_1000_1500,tot_enroll_1500_2000,tot_enroll_2000_2500,\
#                   tot_enroll_2500_3000,tot_enroll_3000_3500,tot_enroll_3500_4000,tot_enroll_4000_4500,avg_class_0_5,\
#                   avg_class_5_10,avg_class_10_15,avg_class_15_20,avg_class_20_25,avg_class_25_30,avg_class_30_35]

columns_to_add = [ap_test_taken_1_33_below,ap_test_taken_1_33_66,ap_test_taken_1_66_above,ap_test_taken_2_33_below,\
                  ap_test_taken_2_33_66,ap_test_taken_2_66_above,ap_test_taken_3_33_below,ap_test_taken_3_33_66,\
                  ap_test_taken_3_66_above,ap_test_taken_4_33_below,ap_test_taken_4_33_66,ap_test_taken_4_66_above,\
                  ap_test_taken_5_plus_33_below,ap_test_taken_5_plus_33_66,ap_test_taken_5_plus_66_above,\
                  not_eng_33_below,not_eng_33_66,not_eng_66_above,tot_enroll_0_500,tot_enroll_500_1000,\
                  tot_enroll_1000_1500,tot_enroll_1500_2000,tot_enroll_2000_2500,tot_enroll_2500_3000,\
                  tot_enroll_3000_3500,tot_enroll_3500_4000,tot_enroll_4000_4500,avg_class_0_5,avg_class_5_10,\
                  avg_class_10_15,avg_class_15_20,avg_class_20_25,avg_class_25_30,avg_class_30_35,SAT_taken_0_100,\
                  SAT_taken_100_200,SAT_taken_200_300,SAT_taken_300_400,SAT_taken_400_500,SAT_taken_500_600,\
                  SAT_taken_600_700,AP_taken_0_100,AP_taken_100_200,AP_taken_200_300,AP_taken_300_400,\
                  AP_taken_400_500,AP_taken_500_600,AP_taken_600_700,AP_taken_700_800,AP_taken_800_900,\
                  AP_taken_900_1000,AP_taken_1000_1100,Tot_num_classes_0_250,Tot_num_classes_250_500,\
                  Tot_num_classes_500_750,Tot_num_classes_750_1000,Tot_num_classes_1000_1250,Tot_num_classes_1250_1500,\
                  Tot_num_classes_1500_1750,Tot_num_classes_1750_2000,Tot_num_classes_2000_2250]

# new_col_names = ['Disability Percent 0-33','Disability Percent 33-66','Disability Percent 66-100',\
#                  'High Needs Percent 0-33','High Needs Percent 33-66','High Needs Percent 66-100',\
#                  'Econ Disadvantage Percent 0-33','Econ Disadvantage Percent 33-66','Econ Disadvantage 66-100',\
#                  'African American 0-33','African American Percent 33-66','African American Percent 66-100',\
#                  'Asian Percent 0-33','Asian Percent 33-66','Asian Percent 66-100','Hispanic Percent 0-33',\
#                  'Hispanic Percent 33-66','Hispanic Percent 66-100','White Percent 0-33','White Percent 33-66',\
#                  'White Percent 66-100','Native Percent 0-33','Native Percent 33-66','Native Percent 66-100',\
#                  'Pacific Percent 0-33','Pacific Percent 33-66','Pacific Percent 66-100','Multi-Race Percent 0-33',\
#                  'Multi-Race Percent 33-66','Multi-Race Percent 66-100','1 AP Test Percent 0-33',\
#                  '1 AP Test Percent 33-66','1 AP Test Percent 66-100','2 AP Test Percent 0-33',\
#                  '2 AP Test Percent 33-66','2 AP Test Percent 66-100','3 AP Test Percent 0-33',\
#                  '3 AP Test Percent 33-66','3 AP Test Percent 66-100','4 AP Test Percent 0-33',\
#                  '4 AP Test Percent 33-66','4 AP Test Percent 66-100','5+ AP Test Percent 0-33',\
#                  '5+ AP Test Percent 33-66','5+ AP Test Percent 66-100','First Lang Not Eng Percent 0-33',\
#                  'First Lang Not Eng Percent 33-66','First Lang Not Eng Percent 66-100','12th Enroll 0-100',\
#                  '12th Enroll 100-200','12th Enroll 200-300','12th Enroll 300-400','12th Enroll 400-500',\
#                  '12th Enroll 500-600','12th Enroll 600-700','12th Enroll 700-800','12th Enroll 800-900',\
#                  '12th Enroll 900-1000','Total Enroll 0-500','Total Enroll 500-1000','Total Enroll 1000-1500',\
#                  'Total Enroll 1500-2000','Total Enroll 2000-2500','Total Enroll 2500-3000','Total Enroll 3000-3500',\
#                  'Total Enroll 3500-4000','Total Enroll 4000-4500', 'Avg Class Size 0-5','Avg Class Size 5-10',\
#                  'Avg Class Size 10-15','Avg Class Size 15-20','Avg Class Size 20-25','Avg Class Size 25-30',\
#                  'Avg Class Size 30-35']

new_col_names = ['1 AP Test Percent 33-66','1 AP Test Percent 66-100','2 AP Test Percent 0-33',\
                 '2 AP Test Percent 33-66','2 AP Test Percent 66-100','3 AP Test Percent 0-33',\
                 '3 AP Test Percent 33-66','3 AP Test Percent 66-100','4 AP Test Percent 0-33',\
                 '4 AP Test Percent 33-66','4 AP Test Percent 66-100','5+ AP Test Percent 0-33',\
                 '5+ AP Test Percent 33-66','5+ AP Test Percent 66-100','First Lang Not Eng Percent 0-33',\
                 'First Lang Not Eng Percent 33-66','First Lang Not Eng Percent 66-100','Total Enroll 0-500',\
                 'Total Enroll 500-1000','Total Enroll 1000-1500','Total Enroll 1500-2000','Total Enroll 2000-2500',\
                 'Total Enroll 2500-3000','Total Enroll 3000-3500','Total Enroll 3500-4000','Total Enroll 4000-4500',\
                 'Avg Class Size 0-5','Avg Class Size 5-10','Avg Class Size 10-15','Avg Class Size 15-20',\
                 'Avg Class Size 20-25','Avg Class Size 25-30','Avg Class Size 30-35','SAT Taken 0-100',\
                 'SAT Taken 100-200','SAT Taken 200-300','SAT Taken 300-400','SAT Taken 400-500','SAT Taken 500-600',\
                 'SAT Taken 600-700', 'AP Test Takers 0-100','AP Test Takers 100-200','AP Test Takers 200-300',\
                 'AP Test Takers 300-400','AP Test Takers 400-500','AP Test Takers 500-600',\
                 'AP Test Takers 600-700','AP Test Takers 700-800','AP Test Takers 800-900',\
                 'AP Test Takers 900-1000','AP Test Takers 1000-1100','Total Num Classes 0-250',\
                 'Total Num Classes 250-500','Total Num Classes 500-750','Total Num Classes 750-1000',\
                 'Total Num Classes 1000-1250','Total Num Classes 1250-1500',\
                 'Total Num Classes 1500-1750','Total Num Classes 1750-2000','Total Num Classes 2000-2250']

for i in range(len(new_col_names)):
    clean_imp_cols[new_col_names[i]] = pd.DataFrame(columns_to_add[i], index=clean_imp_cols.index) 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  clean_imp_cols[new_col_names[i]] = pd.DataFrame(columns_to_add[i], index=clean_imp_cols.index)


Remove non-binary/unused columns

In [209]:
clean_imp_cols = clean_imp_cols.drop(columns=['Total # of Classes', 'TOTAL_Enrollment',\
                                              'Percent First Language Not English','Average Class Size',\
                                              'Percent Attending College','AP_Test Takers','AP_One Test',\
                                              'AP_Two Tests', 'AP_Three Tests','AP_Four Tests',\
                                              'AP_Five or More Tests','SAT_Tests Taken']);

In [210]:
clean_imp_cols.columns.tolist()

['1 AP Test Percent 33-66',
 '1 AP Test Percent 66-100',
 '2 AP Test Percent 0-33',
 '2 AP Test Percent 33-66',
 '2 AP Test Percent 66-100',
 '3 AP Test Percent 0-33',
 '3 AP Test Percent 33-66',
 '3 AP Test Percent 66-100',
 '4 AP Test Percent 0-33',
 '4 AP Test Percent 33-66',
 '4 AP Test Percent 66-100',
 '5+ AP Test Percent 0-33',
 '5+ AP Test Percent 33-66',
 '5+ AP Test Percent 66-100',
 'First Lang Not Eng Percent 0-33',
 'First Lang Not Eng Percent 33-66',
 'First Lang Not Eng Percent 66-100',
 'Total Enroll 0-500',
 'Total Enroll 500-1000',
 'Total Enroll 1000-1500',
 'Total Enroll 1500-2000',
 'Total Enroll 2000-2500',
 'Total Enroll 2500-3000',
 'Total Enroll 3000-3500',
 'Total Enroll 3500-4000',
 'Total Enroll 4000-4500',
 'Avg Class Size 0-5',
 'Avg Class Size 5-10',
 'Avg Class Size 10-15',
 'Avg Class Size 15-20',
 'Avg Class Size 20-25',
 'Avg Class Size 25-30',
 'Avg Class Size 30-35',
 'SAT Taken 0-100',
 'SAT Taken 100-200',
 'SAT Taken 200-300',
 'SAT Taken 300-400

Add Outcome column to end of dataframe

In [211]:
num_non_zeros = [];
for i in range(len(clean_imp_cols.columns)):
    num_non_zeros.append(np.count_nonzero(np.array(clean_imp_cols)[:,i])) 
    
print(np.count_nonzero(np.array(num_non_zeros)))
print(len(num_non_zeros))
print(num_non_zeros)

52
60
[63, 79, 153, 118, 99, 78, 208, 73, 14, 288, 6, 1, 293, 2, 0, 246, 42, 7, 65, 114, 73, 29, 11, 1, 1, 0, 1, 0, 1, 106, 172, 12, 3, 1, 106, 97, 62, 23, 6, 0, 1, 100, 93, 60, 26, 13, 1, 0, 0, 1, 0, 1, 52, 117, 76, 33, 11, 3, 2, 0]


In [212]:
clean_imp_cols['Outcome'] = pd.DataFrame(outcome_bin, index=clean_imp_cols.index)

In [245]:
clean_imp_cols = np.array(clean_imp_cols)

In [246]:
print(np.shape(clean_imp_cols));

(295, 61)


In [247]:
pos_cases = clean_imp_cols[clean_imp_cols[:,60]==1]
neg_cases = clean_imp_cols[clean_imp_cols[:,60]==-1]

In [248]:
print(pos_cases)
index = [];

for i in range(1,len(pos_cases)+1):
    index.append(int(i))
    
pos_cases = np.concatenate((np.array(index) [:,np.newaxis],pos_cases), axis = 1) 

[[1 0 0 ... 0 0 1]
 [0 0 1 ... 0 0 1]
 [0 0 1 ... 0 0 1]
 ...
 [0 0 1 ... 0 0 1]
 [0 1 0 ... 0 0 1]
 [0 0 1 ... 0 0 1]]


In [249]:
print(pos_cases)

[[  1   1   0 ...   0   0   1]
 [  2   0   0 ...   0   0   1]
 [  3   0   0 ...   0   0   1]
 ...
 [253   0   0 ...   0   0   1]
 [254   0   1 ...   0   0   1]
 [255   0   0 ...   0   0   1]]


In [250]:
print(neg_cases)
index = [];

for i in range(1,len(neg_cases)+1):
    index.append(int(i))
    
neg_cases = np.concatenate((np.array(index) [:,np.newaxis],neg_cases), axis = 1) 

[[ 1  0  0 ...  0  0 -1]
 [ 0  0  1 ...  0  0 -1]
 [ 0  1  0 ...  0  0 -1]
 ...
 [ 1  0  0 ...  0  0 -1]
 [ 0  0  1 ...  0  0 -1]
 [ 0  0  1 ...  0  0 -1]]


In [251]:
print(neg_cases)

[[ 1  1  0 ...  0  0 -1]
 [ 2  0  0 ...  0  0 -1]
 [ 3  0  1 ...  0  0 -1]
 ...
 [38  1  0 ...  0  0 -1]
 [39  0  0 ...  0  0 -1]
 [40  0  0 ...  0  0 -1]]


Save cleaned data to csv file to be used in AMPL

In [252]:
# clean_imp_cols.to_csv('clean_school_data.csv', index=False)
savetxt('clean_school_data.csv',clean_imp_cols,delimiter = ',')

In [253]:
# pos_cases.to_csv('positive_cases.csv', index=False)
savetxt('positive_cases.csv',pos_cases,delimiter = ',')

In [254]:
# neg_cases.to_csv('negative_cases.csv', index=False)
savetxt('negative_cases.csv',neg_cases,delimiter = ',')