# Donors' Choices Report

Patrick King

Professor Morgan

DATA 512

7 December 2018

## Introduction

This report covers my statistical analysis of the open data shared by DonorsChoose.org, regarding project funding success rate by subject. It details:
* why I examined DonorsChoose open data
* how I went about acquiring, cleaning, and preparing the data for processing
* the statistical methods I used to produce results
* analysis of the implications of said results
* all resources used in conducting this project

In addition to providing some insight into which projects attract donations and how DonorsChoose project managers might revise their tactics to increase funding success, a goal of this project is to provide transparency regarding the data science applied. That transparency should make the work easier to understand, share, and scale, and should invite feedback, correction, and follow-on work by others.

## Background

[DonorsChoose](https://www.donorschoose.org) is a crowdfunding nonprofit organization that provides a platform for teachers to request funding for classroom supplies and projects. Donors can peruse the advertised projects on the site and choose which causes to support based on the page the teacher or class created for each request. These "attractor" pages are standardized, containing uniform components such as classroom data, project category, and essays or short-form paragraphs. The components describe how funds will be used, state the financial target, show a status bar tracking progress toward the goal, and more. DonorsChoose has made many years of its project data public, tracking these attributes and the resulting donations. This project statistically analyzes part of the DonorsChoose public dataset to identify the project subjects that result in more (or less) effective project funding requests.


## Research Questions and Analysis

I intend to explore the data to find correlations (or lack thereof) in areas such as:

1. Do specific project types garner more funding than others?
2. Do specific class locations garner more funding than others?
3. Do specific giving page traits (such as "Thank you notes sent") garner more funding than others?

I expect that these questions and related ones will support traditional linear regression modelling and plotting, where I hope to identify variables that have proven particularly successful or unsuccessful at attracting donations to projects. For example, given that a giving page must use pre-defined categories for primary_focus_subject (e.g. "Environmental Science"), some subjects might be better funded than others. Perhaps certain donor regions (such as the Seattle area or San Francisco area) might support particular subjects ("Computer Science"), while other regions (such as Iowa, from which I write this project plan) might better support different subjects ("Agricultural Science").

## Methods

### Deliverables

I expect to use [Python](https://www.python.org/) in a [Jupyter Notebook](http://jupyter.org/) to load and explore the data. I will use standard statistical analyses to investigate my research questions, perhaps develop new ideas, and report meaningful correlations between project variables and outcomes. For example, I should be able to find and plot which subject results in funding success, or whether the proposed cost of a project produces significantly different success rates. (Should a teacher create one project for all gaps in Art for $1000 or a collection of $200 projects, in order to reach the same financial end?) The deliverable should be one IPYNB file, its supporting data CSV files, and any PNG files corresponding to generated visualizations, in a [Github](https://github.com/) repository parallel to this one.
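
The cost question above could be explored by grouping projects into price bands and comparing completion rates per band. The following is a minimal sketch on made-up rows, not a result from the dataset: the column names match the project export used later in this notebook, while the band edges, the labels, and the `completed` helper column are illustrative assumptions.

```python
import pandas

# Hypothetical toy rows; real values would come from the projects export.
toy = pandas.DataFrame({
    "total_price_including_optional_support": [150.0, 220.0, 480.0, 950.0, 1200.0, 90.0],
    "funding_status": ["completed", "completed", "expired", "expired", "completed", "completed"],
})

# 1 when the project was fully funded, 0 otherwise.
toy["completed"] = (toy["funding_status"] == "completed").astype(int)

# Assumed band edges in dollars; each interval is open on the left, closed on the right.
bins = [0, 200, 500, 1000, float("inf")]
labels = ["<=200", "200-500", "500-1000", ">1000"]
toy["price_band"] = pandas.cut(
    toy["total_price_including_optional_support"], bins=bins, labels=labels
)

# Share of completed projects in each price band.
rate_by_band = toy.groupby("price_band", observed=False)["completed"].mean()
print(rate_by_band)
```

On the real data, each band would also need its project count reported, since a rate computed from very few projects is not very informative.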

## Data Sources

DonorsChoose has made [thirteen years of its data](https://data.donorschoose.org/docs/overview/) open and publicly accessible for analysis. The data includes information that I have downloaded and begun analysis on:

* Projects (including classroom projects that have been posted, school information such as government-issued NCES ID, lat/long, and city/state/zip)
* Donations (including donation amounts, donor city, state)
* Project resources (including materials/resources requested for the classroom projects, including vendor name)
* Project essays (including the full text of the teacher-written requests accompanying all classroom projects)
* Giving pages (number of teachers, students, amount raised)

and more, such as gift card data, which I do not intend to use in my analysis.


In [50]:
#import requests   # For the ORES API call 
#import numpy      # For replacement of unread values with NaNs
import pandas     # For dataframing, merging, numeric conversions, and reading CSVs
import os


## Findings

This is body text.

## Discussion

I want to do this project to learn how teachers could improve the effectiveness of their crowdfunding efforts. I hope to show trends and correlations that might help teachers attract more donations by building their funding pages with the options and tactics associated with higher rates of project completion or donation. I could share this information with teaching communities or discussion groups for feedback, which might help others better use crowdfunding through DonorsChoose to meet their classroom expense needs.

The DonorsChoose approach to funding education expenses is a valuable alternative for teachers who often must use personal funds to equip their classrooms or to enable novel learning experiences. Helping teachers understand the effectiveness or shortcomings of various options in setting up, describing, and categorizing their projects might increase the rates of donation. However, it might instead produce a "zero-sum" competitive result, where some teachers attract even more donations, causing others who are less informed or adept with DonorsChoose to go without. Finally, my motivations are not purely altruistic: I intend to share my analysis with my wife, an underfunded high school teacher who often funds her ideas and classrooms from our savings, typically without reimbursement from her school district. "Charity begins at home..." (Browne, 141).


## Conclusion

This is body text.

## References

This is body text.

In [51]:
cwd = os.getcwd()

In [52]:
print(cwd)

/Users/pking


In [53]:
data = pandas.read_csv('All Projects 2018-12-02T1458.csv')


In [54]:
data.head()

Unnamed: 0,Teacher Name,School Name,Project Title,Project Total Price Including Optional Support,Project Url,Project Subject,Project Grade Level,School State,School County,School City,School District,Project Completed Date
0,Mr. Stowell,Andrew Carnegie Middle School,Traveling Throughout History,176.26,http://donorschoose.org/project/48674,History & Geography,Grades 6-8,California,Los Angeles,Carson,Los Angeles Unif Sch Dist,
1,Mr. Brosnan,Warren Harding High School,My History Class Needs Art Supplies,213.71,http://donorschoose.org/project/485593,History & Geography,Grades 9-12,Connecticut,Fairfield,Bridgeport,Bridgeport City Sch District,
2,Ms. Oshen,South Loop Elementary School,New Bass Drum: Help Boost Our Rhythm Section!,632.5,http://donorschoose.org/project/48339,Music,Grades 6-8,Illinois,Cook,Chicago,Chicago Psd-Network 6,
3,Mr. O'Neal,Roland Senior High School,Sound Success,853.73,http://donorschoose.org/project/1870751,Other,Grades 9-12,Oklahoma,Sequoyah,Roland,Roland Ind School District 5,
4,Ms. Strout,Ephraim Curtis Middle School,"Role Plays, Dramatization and Performance",388.76,http://donorschoose.org/project/454146,Performing Arts,Grades 6-8,Massachusetts,Middlesex,Sudbury,Sudbury Public School District,


In [55]:
data.count()

Teacher Name                                      1300815
School Name                                       1300815
Project Title                                     1300791
Project Total Price Including Optional Support    1300815
Project Url                                       1300815
Project Subject                                   1300791
Project Grade Level                               1300775
School State                                      1300815
School County                                     1300809
School City                                       1290113
School District                                   1300530
Project Completed Date                            1300758
dtype: int64

In [56]:
data.shape[0]

1300815

In [57]:
projects = pandas.read_csv('opendata_projects000.gz', escapechar='\\', names=['_projectid', '_teacher_acctid', '_schoolid', 'school_ncesid', 'school_latitude', 'school_longitude', 'school_city', 'school_state', 'school_zip', 'school_metro', 'school_district', 'school_county', 'school_charter', 'school_magnet', 'school_year_round', 'school_nlns', 'school_kipp', 'school_charter_ready_promise', 'teacher_prefix', 'teacher_teach_for_america', 'teacher_ny_teaching_fellow', 'primary_focus_subject', 'primary_focus_area' ,'secondary_focus_subject', 'secondary_focus_area', 'resource_type', 'poverty_level', 'grade_level', 'vendor_shipping_charges', 'sales_tax', 'payment_processing_charges', 'fulfillment_labor_materials', 'total_price_excluding_optional_support', 'total_price_including_optional_support', 'students_reached', 'total_donations', 'num_donors', 'eligible_double_your_impact_match', 'eligible_almost_home_match', 'funding_status', 'date_posted', 'date_completed', 'date_thank_you_packet_mailed', 'date_expiration'])

In [58]:
projects.shape[0]

1203287
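
Since only `funding_status` and `primary_focus_subject` are used in the analysis that follows, the load could be restricted to those columns to save memory. This is a minimal sketch, assuming a small in-memory stand-in for the headerless export; the four-column layout and sample rows are hypothetical, and only `names`, `usecols`, and `escapechar` are actual `pandas.read_csv` parameters.

```python
import io
import pandas

# Hypothetical two-row stand-in for the headerless export file.
sample = io.StringIO(
    "p1,completed,Music,100.0\n"
    "p2,expired,Literacy,250.5\n"
)

# Names are assigned positionally, as in the full read above.
column_names = ["_projectid", "funding_status", "primary_focus_subject", "total_donations"]

# usecols keeps only the listed columns after the names are applied.
subset = pandas.read_csv(
    sample,
    escapechar="\\",
    names=column_names,
    usecols=["funding_status", "primary_focus_subject"],
)
print(subset)
```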

In [59]:
projects.count()

_projectid                                1203287
_teacher_acctid                           1203287
_schoolid                                 1203287
school_ncesid                             1130941
school_latitude                           1203287
school_longitude                          1203287
school_city                               1193492
school_state                              1203287
school_zip                                1203283
school_metro                              1059692
school_district                           1202934
school_county                             1203270
school_charter                            1203287
school_magnet                             1203287
school_year_round                         1203287
school_nlns                               1203287
school_kipp                               1203287
school_charter_ready_promise              1203287
teacher_prefix                            1203241
teacher_teach_for_america                 1203287


In [60]:
projects.head()

Unnamed: 0,_projectid,_teacher_acctid,_schoolid,school_ncesid,school_latitude,school_longitude,school_city,school_state,school_zip,school_metro,...,students_reached,total_donations,num_donors,eligible_double_your_impact_match,eligible_almost_home_match,funding_status,date_posted,date_completed,date_thank_you_packet_mailed,date_expiration
0,7342bd01a2a7725ce033a179d22e382d,5c43ef5eac0f5857c266baa1ccfa3d3f,9e72d6f2f1e9367b578b6479aa5852b7,360009700000.0,40.688454,-73.910432,New York City,NY,11207.0,urban,...,0.0,251.9,1,f,f,completed,2002-09-13 00:00:00,2002-09-23 00:00:00,2003-01-27 00:00:00,2003-12-31 00:00:00
1,ed87d61cef7fda668ae70be7e0c6cebf,1f4493b3d3fe4a611f3f4d21a249376a,1ae4695be589a36816188e2b301a0395,360007700000.0,40.765517,-73.96009,New York City,NY,10065.0,,...,0.0,137.0,1,f,f,completed,2002-09-13 00:00:00,2002-09-23 00:00:00,2003-01-03 00:00:00,2003-12-31 00:00:00
2,b56b502d25666e29550d107bf7e17910,57426949b47700ccf62098e1e9b0220c,4a06a328dd87bd29892d73310052f45f,360007700000.0,40.770233,-73.95076,New York City,NY,10075.0,,...,0.0,125.0,1,f,f,completed,2002-09-16 00:00:00,2002-09-19 00:00:00,2002-12-19 00:00:00,2003-12-31 00:00:00
3,016f03312995d5c89d6b348be4682166,9c0aa56b63b743454d6da9effcf122fc,bb0af5dac1b54693ba86ef63eacd6594,360007600000.0,40.727826,-73.978721,New York City,NY,10009.0,urban,...,0.0,205.0,1,f,f,completed,2002-09-17 00:00:00,2002-09-17 00:00:00,2002-12-02 00:00:00,2003-12-31 00:00:00
4,cf6275558534ca1b276b0d8d5130dd9a,1d4d8a42730dbb66af1ebb6ab37456b7,768dab263f87881fe7c68ffb3965df7c,360008300000.0,40.841216,-73.938605,New York City,NY,10032.0,urban,...,0.0,264.0,1,f,f,completed,2002-09-17 00:00:00,2002-09-23 00:00:00,2003-02-26 00:00:00,2003-12-31 00:00:00


In [61]:
projects.funding_status.unique()

array(['completed', 'expired', 'reallocated', 'live'], dtype=object)

In [62]:
projects.primary_focus_subject.unique()

array(['Other', 'Literacy', 'Early Development', 'History & Geography',
       'Economics', 'Environmental Science', 'Health & Life Science',
       'Literature & Writing', 'Mathematics', 'Music', 'Visual Arts',
       'College & Career Prep', 'Parent Involvement', 'Social Sciences',
       'Civics & Government', 'Extracurricular', 'Performing Arts',
       'Character Education', 'Applied Sciences', 'Team Sports',
       'Foreign Languages', 'Community Service', 'Special Needs',
       'Gym & Fitness', 'ESL', 'Health & Wellness', 'Nutrition', nan,
       'Financial Literacy'], dtype=object)

In [63]:
projects.primary_focus_area.unique()

array(['Applied Learning', 'Literacy & Language', 'History & Civics',
       'Math & Science', 'Music & The Arts', 'Health & Sports',
       'Special Needs', nan], dtype=object)

In [64]:
projects.secondary_focus_subject.unique()

array([nan, 'History & Geography', 'Early Development', 'Extracurricular',
       'Other', 'College & Career Prep', 'Performing Arts', 'Literacy',
       'Literature & Writing', 'Mathematics', 'Social Sciences',
       'Applied Sciences', 'Environmental Science', 'Visual Arts',
       'Parent Involvement', 'Foreign Languages', 'Team Sports',
       'Character Education', 'Music', 'Health & Life Science',
       'Civics & Government', 'Community Service', 'Economics',
       'Special Needs', 'ESL', 'Health & Wellness', 'Gym & Fitness',
       'Nutrition', 'Financial Literacy'], dtype=object)

In [65]:
projects.secondary_focus_area.unique()

array([nan, 'History & Civics', 'Applied Learning', 'Music & The Arts',
       'Literacy & Language', 'Math & Science', 'Health & Sports',
       'Special Needs'], dtype=object)

In [66]:
data_funding = pandas.DataFrame({'funding':projects['funding_status'],
                                 'subject':projects['primary_focus_subject']})

# ,
#                                 'primary_focus_area':projects['primary_focus_area'],
#                                 'secondary_focus_subject':projects['secondary_focus_subject'],
#                                 'secondary_focus_area':projects['secondary_focus_area']


In [67]:
data_funding.count()

funding    1203287
subject    1203241
dtype: int64

In [68]:
data_funding.head()

Unnamed: 0,funding,subject
0,completed,Other
1,completed,Literacy
2,completed,Early Development
3,completed,History & Geography
4,completed,Other


In [69]:
data_funding.subject.unique()

array(['Other', 'Literacy', 'Early Development', 'History & Geography',
       'Economics', 'Environmental Science', 'Health & Life Science',
       'Literature & Writing', 'Mathematics', 'Music', 'Visual Arts',
       'College & Career Prep', 'Parent Involvement', 'Social Sciences',
       'Civics & Government', 'Extracurricular', 'Performing Arts',
       'Character Education', 'Applied Sciences', 'Team Sports',
       'Foreign Languages', 'Community Service', 'Special Needs',
       'Gym & Fitness', 'ESL', 'Health & Wellness', 'Nutrition', nan,
       'Financial Literacy'], dtype=object)

In [70]:
data_funding.funding.unique()

array(['completed', 'expired', 'reallocated', 'live'], dtype=object)

In [71]:
data_funding[(data_funding['funding']=='completed')].count()

funding    797071
subject    797047
dtype: int64

In [72]:
data_funding[(data_funding['funding']=='expired')].count()

funding    333630
subject    333608
dtype: int64

In [73]:
data_funding[(data_funding['funding']=='reallocated')].count()

funding    9086
subject    9086
dtype: int64

In [74]:
data_funding[(data_funding['funding']=='live')].count()

funding    63500
subject    63500
dtype: int64
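
The four per-status counts above can also be produced in a single call with `value_counts`. The sketch below uses a made-up six-row frame rather than the real data, so its numbers are illustrative only.

```python
import pandas

# Hypothetical toy frame with the same 'funding' column name used above.
toy = pandas.DataFrame(
    {"funding": ["completed", "expired", "completed", "live", "completed", "reallocated"]}
)

# Absolute count of rows per funding status.
status_counts = toy["funding"].value_counts()

# The same tally expressed as a share of all rows.
status_share = toy["funding"].value_counts(normalize=True)

print(status_counts)
print(status_share)
```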

In [75]:
df_dummy = pandas.get_dummies(data_funding)

In [76]:
df_dummy

Unnamed: 0,funding_completed,funding_expired,funding_live,funding_reallocated,subject_Applied Sciences,subject_Character Education,subject_Civics & Government,subject_College & Career Prep,subject_Community Service,subject_ESL,...,subject_Mathematics,subject_Music,subject_Nutrition,subject_Other,subject_Parent Involvement,subject_Performing Arts,subject_Social Sciences,subject_Special Needs,subject_Team Sports,subject_Visual Arts
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
5,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [77]:
df_dummy = df_dummy[(df_dummy['funding_completed']==1) | (df_dummy['funding_expired']==1)]

In [78]:
df_dummy.shape[0]

1130701
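
The row count above matches the sum of the completed and expired counts reported earlier (797071 + 333630). An alternative way to arrive at the same subset, sketched here on a hypothetical toy frame, is to filter on the status column with `isin` before encoding and derive the binary target directly; the variable names `resolved` and `base_rate` are illustrative.

```python
import pandas

# Hypothetical toy frame using the column names of data_funding.
toy = pandas.DataFrame({
    "funding": ["completed", "expired", "live", "completed", "reallocated"],
    "subject": ["Music", "Literacy", "Music", None, "Other"],
})

# Keep only projects whose outcome is known: funded or expired.
resolved = toy[toy["funding"].isin(["completed", "expired"])].copy()

# Binary target: 1 for completed, 0 for expired.
resolved["funding_completed"] = (resolved["funding"] == "completed").astype(int)

# Overall completion rate implied by the counts reported in this notebook.
completed, expired = 797071, 333630
base_rate = completed / (completed + expired)
print(len(resolved), round(base_rate, 3))
```

The base rate is a useful yardstick: a subject only stands out if its completion rate differs noticeably from the overall share of completed projects.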

In [79]:
df_dummy.head()

Unnamed: 0,funding_completed,funding_expired,funding_live,funding_reallocated,subject_Applied Sciences,subject_Character Education,subject_Civics & Government,subject_College & Career Prep,subject_Community Service,subject_ESL,...,subject_Mathematics,subject_Music,subject_Nutrition,subject_Other,subject_Parent Involvement,subject_Performing Arts,subject_Social Sciences,subject_Special Needs,subject_Team Sports,subject_Visual Arts
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0


In [80]:
df_dummy = df_dummy.drop(["funding_expired", "funding_live", "funding_reallocated"], axis=1)

In [81]:
df_dummy.shape[0]

1130701

In [82]:
df_dummy.head()

Unnamed: 0,funding_completed,subject_Applied Sciences,subject_Character Education,subject_Civics & Government,subject_College & Career Prep,subject_Community Service,subject_ESL,subject_Early Development,subject_Economics,subject_Environmental Science,...,subject_Mathematics,subject_Music,subject_Nutrition,subject_Other,subject_Parent Involvement,subject_Performing Arts,subject_Social Sciences,subject_Special Needs,subject_Team Sports,subject_Visual Arts
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0


In [83]:
# Count the rows flagged 1 in each indicator column:
# first the completed projects, then the projects in each subject.
for column in df_dummy:
    print(column)
    print(df_dummy[df_dummy[column] == 1].shape[0])

funding_completed
797071
subject_Applied Sciences
59784
subject_Character Education
14016
subject_Civics & Government
3855
subject_College & Career Prep
11116
subject_Community Service
2326
subject_ESL
15212
subject_Early Development
23352
subject_Economics
2995
subject_Environmental Science
45862
subject_Extracurricular
4811
subject_Financial Literacy
2829
subject_Foreign Languages
8152
subject_Gym & Fitness
13029
subject_Health & Life Science
37320
subject_Health & Wellness
23951
subject_History & Geography
25061
subject_Literacy
323646
subject_Literature & Writing
136983
subject_Mathematics
156476
subject_Music
33262
subject_Nutrition
2325
subject_Other
19690
subject_Parent Involvement
1545
subject_Performing Arts
15341
subject_Social Sciences
13978
subject_Special Needs
73955
subject_Team Sports
8778
subject_Visual Arts
51005


In [84]:
correlation = df_dummy.corr(method='pearson')
#df_corr = pandas.DataFrame[{'funding':projects['funding_completed']}]
correlation.head()

Unnamed: 0,funding_completed,subject_Applied Sciences,subject_Character Education,subject_Civics & Government,subject_College & Career Prep,subject_Community Service,subject_ESL,subject_Early Development,subject_Economics,subject_Environmental Science,...,subject_Mathematics,subject_Music,subject_Nutrition,subject_Other,subject_Parent Involvement,subject_Performing Arts,subject_Social Sciences,subject_Special Needs,subject_Team Sports,subject_Visual Arts
funding_completed,1.0,0.00936,0.001238,0.000382,-0.00973,0.001169,-0.007633,-0.011572,0.006215,0.026534,...,-0.003131,0.020708,0.004368,-0.026524,-0.006884,0.005843,-0.001748,0.000255,0.011137,0.013172
subject_Applied Sciences,0.00936,1.0,-0.02647,-0.01382,-0.023543,-0.010727,-0.027591,-0.034311,-0.012176,-0.04858,...,-0.094691,-0.041134,-0.010725,-0.031454,-0.00874,-0.02771,-0.026434,-0.062505,-0.020899,-0.051354
subject_Character Education,0.001238,-0.02647,1.0,-0.006553,-0.011163,-0.005087,-0.013083,-0.016269,-0.005774,-0.023035,...,-0.044899,-0.019504,-0.005085,-0.014915,-0.004144,-0.013139,-0.012534,-0.029638,-0.00991,-0.02435
subject_Civics & Government,0.000382,-0.01382,-0.006553,1.0,-0.005828,-0.002656,-0.00683,-0.008494,-0.003014,-0.012026,...,-0.023441,-0.010183,-0.002655,-0.007787,-0.002164,-0.00686,-0.006544,-0.015473,-0.005174,-0.012713
subject_College & Career Prep,-0.00973,-0.023543,-0.011163,-0.005828,1.0,-0.004524,-0.011636,-0.01447,-0.005135,-0.020488,...,-0.039934,-0.017347,-0.004523,-0.013265,-0.003686,-0.011686,-0.011148,-0.02636,-0.008814,-0.021657


In [85]:
corr = correlation[['funding_completed']]
corr

Unnamed: 0,funding_completed
funding_completed,1.0
subject_Applied Sciences,0.00936
subject_Character Education,0.001238
subject_Civics & Government,0.000382
subject_College & Career Prep,-0.00973
subject_Community Service,0.001169
subject_ESL,-0.007633
subject_Early Development,-0.011572
subject_Economics,0.006215
subject_Environmental Science,0.026534


In [86]:
corr.sort_values('funding_completed')

Unnamed: 0,funding_completed
subject_Other,-0.026524
subject_Literature & Writing,-0.015453
subject_Early Development,-0.011572
subject_Gym & Fitness,-0.009968
subject_College & Career Prep,-0.00973
subject_Health & Wellness,-0.008901
subject_ESL,-0.007633
subject_Foreign Languages,-0.007441
subject_Parent Involvement,-0.006884
subject_Literacy,-0.004181


In [87]:
import statsmodels.api as sm

import warnings
warnings.filterwarnings('ignore')

X = df_dummy.loc[:, df_dummy.columns != 'funding_completed']
y = df_dummy.loc[:, df_dummy.columns == 'funding_completed']


In [88]:
print(X.shape)
print(y.shape)

(1130701, 28)
(1130701, 1)


In [89]:
X.head()

Unnamed: 0,subject_Applied Sciences,subject_Character Education,subject_Civics & Government,subject_College & Career Prep,subject_Community Service,subject_ESL,subject_Early Development,subject_Economics,subject_Environmental Science,subject_Extracurricular,...,subject_Mathematics,subject_Music,subject_Nutrition,subject_Other,subject_Parent Involvement,subject_Performing Arts,subject_Social Sciences,subject_Special Needs,subject_Team Sports,subject_Visual Arts
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0


In [90]:
y.head()

Unnamed: 0,funding_completed
0,1
1,1
2,1
3,1
4,1


In [91]:
logit_model = sm.Logit(y,X)
result = logit_model.fit()
print(result.summary2())

Optimization terminated successfully.
         Current function value: 0.246008
         Iterations 5
                              Results: Logit
Model:                  Logit               Pseudo R-squared:   inf        
Dependent Variable:     funding_completed   AIC:                556379.8427
Date:                   2018-12-09 11:30    BIC:                556714.1165
No. Observations:       1130701             Log-Likelihood:     -2.7816e+05
Df Model:               27                  LL-Null:            0.0000     
Df Residuals:           1130673             LLR p-value:        1.0000     
Converged:              1.0000              Scale:              1.0000     
No. Iterations:         5.0000                                             
---------------------------------------------------------------------------
                              Coef.  Std.Err.    z     P>|z|  [0.025 0.975]
---------------------------------------------------------------------------
subject_Applied S
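
A note on reading coefficients from the intercept-free model above: every row has at most one subject indicator set, so each fitted coefficient should equal the log-odds of completion within that subject, and the logistic function maps it back to a completion rate. The sketch below uses only the standard library. The helper names are hypothetical, and the 0.7 input is an illustrative value, not a coefficient from the truncated summary.

```python
import math


def log_odds_to_probability(coef):
    """Map a logit coefficient (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-coef))


def probability_to_log_odds(p):
    """Inverse mapping: probability to log-odds."""
    return math.log(p / (1.0 - p))


# Illustrative value only: a subject with a 70% completion rate.
example_rate = 0.7
example_coef = probability_to_log_odds(example_rate)
print(example_coef, log_odds_to_probability(example_coef))
```

A positive coefficient in that model therefore means a completion rate above 50 percent, not a rate above the overall average.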

In [92]:
X_int = sm.add_constant(X)  # prepend an intercept column named 'const'

In [93]:
X_int.head()

Unnamed: 0,const,subject_Applied Sciences,subject_Character Education,subject_Civics & Government,subject_College & Career Prep,subject_Community Service,subject_ESL,subject_Early Development,subject_Economics,subject_Environmental Science,...,subject_Mathematics,subject_Music,subject_Nutrition,subject_Other,subject_Parent Involvement,subject_Performing Arts,subject_Social Sciences,subject_Special Needs,subject_Team Sports,subject_Visual Arts
0,1.0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
1,1.0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1.0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1.0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1.0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0


In [94]:
logit_model2 = sm.Logit(y,X_int)
result2 = logit_model2.fit()
print(result2.summary2())

Optimization terminated successfully.
         Current function value: 0.245994
         Iterations 5
                              Results: Logit
Model:                 Logit               Pseudo R-squared:   inf        
Dependent Variable:    funding_completed   AIC:                556349.3014
Date:                  2018-12-09 11:31    BIC:                556695.5135
No. Observations:      1130701             Log-Likelihood:     -2.7815e+05
Df Model:              28                  LL-Null:            0.0000     
Df Residuals:          1130672             LLR p-value:        1.0000     
Converged:             1.0000              Scale:              1.0000     
No. Iterations:        5.0000                                             
--------------------------------------------------------------------------
                              Coef.  Std.Err.   z    P>|z|   [0.025 0.975]
--------------------------------------------------------------------------
const                       
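
One caveat when reading the intercept model: `get_dummies` encodes a missing subject as an all-zero row, and the counts above suggest only 46 of the completed or expired projects lack a subject (24 completed and 22 expired). With a constant plus all 28 subject indicators, those few rows act as the baseline that every coefficient is measured against. A possible alternative, sketched below on toy data, is to drop one well-populated subject so that it becomes the reference category; the choice of `subject_Literacy` as reference is an assumption for illustration.

```python
import pandas

# Hypothetical toy column with one missing subject.
toy = pandas.DataFrame({"subject": ["Music", "Literacy", "Other", None, "Literacy"]})

# One indicator per observed subject; the missing value becomes an all-zero row.
dummies = pandas.get_dummies(toy["subject"], prefix="subject")

# Assumed reference category: drop it so the other coefficients compare against it.
reference = "subject_Literacy"
X_ref = dummies.drop(columns=[reference])

print(dummies.iloc[3].sum())   # number of indicators set for the missing-subject row
print(list(X_ref.columns))
```

Rows with a missing subject would still need to be dropped or given their own indicator, otherwise they are folded into the reference group.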