TODO:

- [ ] Update Planning TODO list
- [ ] Create script version

# Executive Summary

This notebook will attempt to answer the following research question:

    What's money got to do with it?

## PLANNING

- [X] Planning
    - [X] import libraries/packages
    - [X] configure notebook environment
    - [X] define helper functions
- [X] Acquire data
    - [X] get PEIMS financial data
    - [X] get STAAR performance data
    - [X] get ETHNICITY data
- [X] Use D-Tale and Sweetviz to explore the dataset
         
*First, let's prepare the notebook environment*

In [16]:
# for manipulating dataframes
import pandas as pd

# for EDA
import dtale
import sweetviz as sv

# display the output of every expression in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)

In [2]:
# get the ETHNICITY, PEIMS, and STAAR datasets
ethnic = pd.read_csv('../data/inter/clean_ethnic_2019.csv')
staar = pd.read_csv('../data/inter/clean_staar_2019.csv')
peims = pd.read_csv('../data/inter/clean_peims_2019.csv')

In [3]:
ethnic.head()
staar.head()
peims.head()

Unnamed: 0,District,Percentage of Non-White Students
0,1902,0.145315
1,1903,0.211415
2,1904,0.213158
3,1906,0.268657
4,1907,0.717973


Unnamed: 0,District,Total Number of Students,Total Number of Passing Students
0,1902,932,842
1,1903,1991,1651
2,1904,1365,1064
3,1906,569,420
4,1907,5497,3274


Unnamed: 0,DISTRICT,TOTAL PAYROLL EXPENDITURES,TOTAL PROFESSIONAL & CONTRACTED SERVICES EXPENDITURES,TOTAL SUPPLIES & MATERIALS EXPENDITURES,TOTAL OTHER OPERATING EXPENDITURES,"INSTRUCTION + TRANSFER EXPEND-FCT11,95","INSTRUC RESOURCE MEDIA SERVICE EXP, FCT12","CURRICULUM/STAFF DEVELOPMENT EXP, FCT13","INSTRUC LEADERSHIP EXPEND, FCT21","CAMPUS ADMINISTRATION EXPEND, FCT23","GUIDANCE & COUNSELING SERVICES EXP, FCT31","SOCIAL WORK SERVICES EXP, FCT32","HEALTH SERVICES EXP, FCT33","TRANSPORTATION EXPENDITURES, FCT34","FOOD SERVICE EXPENDITURES, FCT35","EXTRACURRICULAR EXPENDITURES,FCT36","GENERAL ADMINISTRAT EXPEND-FCT41,92","PLANT MAINTENANCE/OPERA EXPEND, FCT51","SECURITY/MONITORING SERVICE EXPEND, FCT52","DATA PROCESSING SERVICES EXPEND, FCT53","COMMUNITY SERVICES, FCT61",REGULAR PROGRAM EXPEND--11,GIFTED/TALENTED PROGRAM EXPEND--21,CAREER & TECHNOLOGY PGM EXPEND--22,STUDENTS WITH DISABILITIES PGM EXPEND--23,"STATE COMPENSATORY ED EXPEND--24, 26, 28, 29, 30, 34","BILINGUAL PROGRAM EXP--25, 35",HIGH SCHOOL ALLOTMENT PROGRAM--31,"PREKINDERGARTEN--32,35",ATHLETICS PROGRAM EXPEND--91,UNDISTRIBUTED PROGRAM EXP--99,TOTAL PROGRAM OPERATING EXPENDITURES,TOTAL OTHER USES
0,1902,6025217,1075904,648206,809559,4649118,66490,4986,270353,306385,998314,0,37882,293070,287406,413755,284553,773085,0,173489,0,2778638,3968,251350,3005575,273747,9599,40285,32890,304174,1858660,8558886,48633
1,1903,9093950,1514689,784631,303052,7043892,117860,33175,66374,574699,202086,0,33657,422887,630202,598484,558948,1248908,13530,151120,500,5313722,93,852319,1028587,799037,0,101243,0,339045,3262276,11696322,102465
2,1904,6659596,927209,937810,278109,4611747,51126,157830,0,466345,199338,0,102385,38800,411195,754465,539512,1014501,45482,409998,0,3945494,10154,552217,726827,377013,0,59567,114404,571388,2445660,8802724,481
3,1906,3134475,373513,408024,105878,2087166,19990,0,7905,379101,75235,0,40628,148301,257465,210240,201520,465549,10415,118375,0,1499301,14498,164641,447072,402415,2706,48748,29920,0,1412589,4021890,53786
4,1907,25587063,5603896,4134969,1048416,18807861,167823,535649,1033275,2201907,1443630,170074,208736,1442619,2071781,1422648,1287489,3937087,242658,1006175,394932,15527277,39671,1625090,2422707,3147717,231026,302531,923035,1214433,10940857,36374344,0


In [4]:
peims = peims.rename(columns={'DISTRICT':'District'})

In [5]:
# cast the District columns to str so string operations can be applied
ethnic = ethnic.astype({"District": str})
staar = staar.astype({"District": str})
peims = peims.astype({"District": str})

In [6]:
# left-pad District numbers with 0's to a width of 6
ethnic['District'] = ethnic['District'].str.zfill(6)
staar['District'] = staar['District'].str.zfill(6)
peims['District'] = peims['District'].str.zfill(6)
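
The cast-then-pad step above can be checked on a tiny frame. This is a minimal sketch: the `toy` frame and its three ID values are hypothetical stand-ins for the real District columns, not data from the notebook.

```python
import pandas as pd

# Hypothetical toy frame; the integer IDs mimic how District is read from CSV.
toy = pd.DataFrame({"District": [1902, 1903, 57905]})

# Cast to str first, because the .str accessor only works on string data,
# then left-pad every ID with zeros to a fixed width of 6 characters.
toy["District"] = toy["District"].astype(str).str.zfill(6)

print(toy["District"].tolist())
# ['001902', '001903', '057905']
```

Padding to a fixed width keeps the key format identical across all three frames, which matters because the later merge compares the District strings exactly.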

In [7]:
ethnic.head()
staar.head()
peims.head()

Unnamed: 0,District,Percentage of Non-White Students
0,1902,0.145315
1,1903,0.211415
2,1904,0.213158
3,1906,0.268657
4,1907,0.717973


Unnamed: 0,District,Total Number of Students,Total Number of Passing Students
0,1902,932,842
1,1903,1991,1651
2,1904,1365,1064
3,1906,569,420
4,1907,5497,3274


Unnamed: 0,District,TOTAL PAYROLL EXPENDITURES,TOTAL PROFESSIONAL & CONTRACTED SERVICES EXPENDITURES,TOTAL SUPPLIES & MATERIALS EXPENDITURES,TOTAL OTHER OPERATING EXPENDITURES,"INSTRUCTION + TRANSFER EXPEND-FCT11,95","INSTRUC RESOURCE MEDIA SERVICE EXP, FCT12","CURRICULUM/STAFF DEVELOPMENT EXP, FCT13","INSTRUC LEADERSHIP EXPEND, FCT21","CAMPUS ADMINISTRATION EXPEND, FCT23","GUIDANCE & COUNSELING SERVICES EXP, FCT31","SOCIAL WORK SERVICES EXP, FCT32","HEALTH SERVICES EXP, FCT33","TRANSPORTATION EXPENDITURES, FCT34","FOOD SERVICE EXPENDITURES, FCT35","EXTRACURRICULAR EXPENDITURES,FCT36","GENERAL ADMINISTRAT EXPEND-FCT41,92","PLANT MAINTENANCE/OPERA EXPEND, FCT51","SECURITY/MONITORING SERVICE EXPEND, FCT52","DATA PROCESSING SERVICES EXPEND, FCT53","COMMUNITY SERVICES, FCT61",REGULAR PROGRAM EXPEND--11,GIFTED/TALENTED PROGRAM EXPEND--21,CAREER & TECHNOLOGY PGM EXPEND--22,STUDENTS WITH DISABILITIES PGM EXPEND--23,"STATE COMPENSATORY ED EXPEND--24, 26, 28, 29, 30, 34","BILINGUAL PROGRAM EXP--25, 35",HIGH SCHOOL ALLOTMENT PROGRAM--31,"PREKINDERGARTEN--32,35",ATHLETICS PROGRAM EXPEND--91,UNDISTRIBUTED PROGRAM EXP--99,TOTAL PROGRAM OPERATING EXPENDITURES,TOTAL OTHER USES
0,1902,6025217,1075904,648206,809559,4649118,66490,4986,270353,306385,998314,0,37882,293070,287406,413755,284553,773085,0,173489,0,2778638,3968,251350,3005575,273747,9599,40285,32890,304174,1858660,8558886,48633
1,1903,9093950,1514689,784631,303052,7043892,117860,33175,66374,574699,202086,0,33657,422887,630202,598484,558948,1248908,13530,151120,500,5313722,93,852319,1028587,799037,0,101243,0,339045,3262276,11696322,102465
2,1904,6659596,927209,937810,278109,4611747,51126,157830,0,466345,199338,0,102385,38800,411195,754465,539512,1014501,45482,409998,0,3945494,10154,552217,726827,377013,0,59567,114404,571388,2445660,8802724,481
3,1906,3134475,373513,408024,105878,2087166,19990,0,7905,379101,75235,0,40628,148301,257465,210240,201520,465549,10415,118375,0,1499301,14498,164641,447072,402415,2706,48748,29920,0,1412589,4021890,53786
4,1907,25587063,5603896,4134969,1048416,18807861,167823,535649,1033275,2201907,1443630,170074,208736,1442619,2071781,1422648,1287489,3937087,242658,1006175,394932,15527277,39671,1625090,2422707,3147717,231026,302531,923035,1214433,10940857,36374344,0


In [8]:
# merge all the dataframes together on District (inner join by default)
df = pd.merge(peims, staar, on=['District'])
df = pd.merge(df, ethnic, on=['District'])
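
Because `pd.merge` defaults to an inner join, any district missing from one of the frames is dropped silently. The sketch below shows one possible way to audit that. The `left` and `right` frames and their values are hypothetical, and the use of `validate` and `indicator` is a suggestion rather than part of the original workflow.

```python
import pandas as pd

# Hypothetical toy frames standing in for two of the District-keyed datasets.
left = pd.DataFrame({"District": ["001902", "001903", "001904"],
                     "spend": [10, 20, 30]})
right = pd.DataFrame({"District": ["001902", "001903", "001906"],
                      "students": [932, 1991, 569]})

# Inner join: keeps only districts present in both frames.
# validate="one_to_one" raises MergeError if a District appears twice on either side.
inner = pd.merge(left, right, on="District", validate="one_to_one")

# Outer join with indicator=True adds a "_merge" column that records
# whether each row came from the left frame, the right frame, or both.
audit = pd.merge(left, right, on="District", how="outer", indicator=True)
dropped = audit.loc[audit["_merge"] != "both", ["District", "_merge"]]

print(len(inner))
print(dropped)
```

Running the same audit on the real frames would list every district that the inner join excludes from `df`.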

In [9]:
df.head()

Unnamed: 0,District,TOTAL PAYROLL EXPENDITURES,TOTAL PROFESSIONAL & CONTRACTED SERVICES EXPENDITURES,TOTAL SUPPLIES & MATERIALS EXPENDITURES,TOTAL OTHER OPERATING EXPENDITURES,"INSTRUCTION + TRANSFER EXPEND-FCT11,95","INSTRUC RESOURCE MEDIA SERVICE EXP, FCT12","CURRICULUM/STAFF DEVELOPMENT EXP, FCT13","INSTRUC LEADERSHIP EXPEND, FCT21","CAMPUS ADMINISTRATION EXPEND, FCT23","GUIDANCE & COUNSELING SERVICES EXP, FCT31","SOCIAL WORK SERVICES EXP, FCT32","HEALTH SERVICES EXP, FCT33","TRANSPORTATION EXPENDITURES, FCT34","FOOD SERVICE EXPENDITURES, FCT35","EXTRACURRICULAR EXPENDITURES,FCT36","GENERAL ADMINISTRAT EXPEND-FCT41,92","PLANT MAINTENANCE/OPERA EXPEND, FCT51","SECURITY/MONITORING SERVICE EXPEND, FCT52","DATA PROCESSING SERVICES EXPEND, FCT53","COMMUNITY SERVICES, FCT61",REGULAR PROGRAM EXPEND--11,GIFTED/TALENTED PROGRAM EXPEND--21,CAREER & TECHNOLOGY PGM EXPEND--22,STUDENTS WITH DISABILITIES PGM EXPEND--23,"STATE COMPENSATORY ED EXPEND--24, 26, 28, 29, 30, 34","BILINGUAL PROGRAM EXP--25, 35",HIGH SCHOOL ALLOTMENT PROGRAM--31,"PREKINDERGARTEN--32,35",ATHLETICS PROGRAM EXPEND--91,UNDISTRIBUTED PROGRAM EXP--99,TOTAL PROGRAM OPERATING EXPENDITURES,TOTAL OTHER USES,Total Number of Students,Total Number of Passing Students,Percentage of Non-White Students
0,1902,6025217,1075904,648206,809559,4649118,66490,4986,270353,306385,998314,0,37882,293070,287406,413755,284553,773085,0,173489,0,2778638,3968,251350,3005575,273747,9599,40285,32890,304174,1858660,8558886,48633,932,842,0.145315
1,1903,9093950,1514689,784631,303052,7043892,117860,33175,66374,574699,202086,0,33657,422887,630202,598484,558948,1248908,13530,151120,500,5313722,93,852319,1028587,799037,0,101243,0,339045,3262276,11696322,102465,1991,1651,0.211415
2,1904,6659596,927209,937810,278109,4611747,51126,157830,0,466345,199338,0,102385,38800,411195,754465,539512,1014501,45482,409998,0,3945494,10154,552217,726827,377013,0,59567,114404,571388,2445660,8802724,481,1365,1064,0.213158
3,1906,3134475,373513,408024,105878,2087166,19990,0,7905,379101,75235,0,40628,148301,257465,210240,201520,465549,10415,118375,0,1499301,14498,164641,447072,402415,2706,48748,29920,0,1412589,4021890,53786,569,420,0.268657
4,1907,25587063,5603896,4134969,1048416,18807861,167823,535649,1033275,2201907,1443630,170074,208736,1442619,2071781,1422648,1287489,3937087,242658,1006175,394932,15527277,39671,1625090,2422707,3147717,231026,302531,923035,1214433,10940857,36374344,0,5497,3274,0.717973


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1250 entries, 0 to 1249
Data columns (total 36 columns):
 #   Column                                                 Non-Null Count  Dtype  
---  ------                                                 --------------  -----  
 0   District                                               1250 non-null   object 
 1   TOTAL PAYROLL EXPENDITURES                             1250 non-null   int64  
 2   TOTAL PROFESSIONAL & CONTRACTED SERVICES EXPENDITURES  1250 non-null   int64  
 3   TOTAL SUPPLIES & MATERIALS EXPENDITURES                1250 non-null   int64  
 4   TOTAL OTHER OPERATING EXPENDITURES                     1250 non-null   int64  
 5   INSTRUCTION + TRANSFER EXPEND-FCT11,95                 1250 non-null   int64  
 6   INSTRUC RESOURCE MEDIA SERVICE EXP, FCT12              1250 non-null   int64  
 7   CURRICULUM/STAFF DEVELOPMENT EXP, FCT13                1250 non-null   int64  
 8   INSTRUC LEADERSHIP EXPEND, FCT21                

In [13]:
# perform EDA using Dtale
dtale.show(df)




is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
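
If the repeated deprecation message clutters the cell output, one option is to filter it by its text before launching the EDA tools. This is a minimal sketch using only the standard-library `warnings` module. The helper name `silence_categorical_deprecation` is hypothetical, and it assumes the warning is raised in the same Python process as the notebook kernel.

```python
import warnings


def silence_categorical_deprecation() -> None:
    """Ignore only warnings whose message starts with the text below."""
    # `message` is a regular expression matched against the start of the
    # warning text; every other warning is still shown as usual.
    warnings.filterwarnings(
        "ignore",
        message=r"is_categorical_dtype is deprecated",
    )


# Call once, before the cells that trigger the warning.
silence_categorical_deprecation()
```

Filtering on the message rather than on a whole warning category keeps unrelated deprecation notices visible.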



In [17]:
# EDA using Sweetviz
sweet_report = sv.analyze(df)

# save the report to an HTML file
sweet_report.show_html('../viz/sweet_report.html')


is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead

Done! Use 'show' commands to display/save.   [100%]   00:04 -> (00:00 left)


Report ../viz/sweet_report.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
