#### Dataset: DepEd 10% Schools (Stratified) 2015 NAT takers

- Classification goal: To predict NAT Overall Score
- Number of Instances: 3377 for DepEd_ES.csv
- Number of Attributes: 20 + output attribute.

### Attributes (DepEd ES data)
   1. SchID: unique 6-digit code assigned to each public school (numeric)
   2. SchName: name of school (object: text)
   3. DivName: division name (object: mixed characters)
   4. DivCode: division code (object: mixed characters)
   5. Clusters: NAT grouping by DepEd (categorical: "1", "2", "3", "4", "5", "6")
   6. Region: school's regional location (categorical: values 1 to 16)
   7. Filipino: school average Filipino NAT score (categorical: 1 means below 75; 2 means above 75)*
   8. AralinP: school average Araling Panlipunan NAT score (categorical: 1 means below 75; 2 means above 75)*
   9. Mathematics: school average Mathematics NAT score (categorical: 1 means below 75; 2 means above 75)*
   10. Science: school average Science NAT score (categorical: 1 means below 75; 2 means above 75)*
   11. English: school average English NAT score (categorical: 1 means below 75; 2 means above 75)*
   12. CriticalThinking: critical thinking NAT score (categorical: 1 means below 75; 2 means above 75)*
   13. Overall: school average NAT score (categorical: 1 means below 75; 2 means above 75)*
   14. Internet: whether the school has internet (binary: 0 means "no", 1 means "yes")
   15. Lat: latitude coordinate of the school (numeric: -90 to 90)
   16. Long: longitude coordinate of the school (numeric: -180 to 180)
   17. Enrolled: number of students (numeric)
   18. Teacher: number of teacher items (numeric)
   19. Ratio: teacher-to-student ratio (numeric between 0 and 1)
   20. Energized: whether the school has electricity (binary: 0 means "no", 1 means "yes")
   21. SchType: type of school (categorical: "School with no annexes", "mother school", "Annex or extension")
   *75 was used as the cutoff because it is the median.


Missing Attribute Values: Several categorical attributes have missing values, all coded with the "unknown" label. These can be kept as a separate class label, or handled with deletion or imputation techniques.
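The three options named above can be sketched on a small toy frame. This is only an illustration under assumptions: the rows below are made up, and choosing mode imputation (rather than some other imputation technique) is a choice made here for the example, not something the notebook itself does.

```python
import pandas as pd

# Hypothetical toy rows; "unknown" stands in for a missing category.
toy = pd.DataFrame({
    "SchoolType": ["Mother school", "unknown",
                   "School with no Annexes", "School with no Annexes"],
    "Ratio": [23, 37, 32, 20],
})

# Option 1: keep "unknown" as its own class label when encoding.
as_label = pd.get_dummies(toy["SchoolType"], dtype=int)

# Option 2: deletion, i.e. drop the rows whose category is "unknown".
deleted = toy[toy["SchoolType"] != "unknown"]

# Option 3: imputation, here with the most frequent known category.
known = toy.loc[toy["SchoolType"] != "unknown", "SchoolType"]
imputed = toy.copy()
imputed["SchoolType"] = imputed["SchoolType"].replace("unknown", known.mode()[0])

print(list(as_label.columns))
print(len(deleted))
print(imputed["SchoolType"].tolist())
```

Which option is appropriate depends on how many rows are affected and on whether "unknown" itself carries information about the school.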


### Load dataset

In [1]:
import pandas as pd 
import numpy as np
DF = pd.read_csv('DepEd_ES.csv')
DF.head()

Unnamed: 0,SchID,SchName,DivName,DivisionCode,Province,MunicipalityORCity,Clusters,Region,Filipino,AralinP,...,Teachers,Ratio,Energized,Grid,SchoolType,PBOR,PovertyCluster,PovertyIncidence,LowerLimit,UpperLimit
0,134968,Bangued East CS,Abra,N01,ABRA,BANGUED (Capital),4,CAR,44.0141,37.4648,...,19,23,1,1,School with no Annexes,15781.0,3,20,16,53
1,135008,Siwasiw ES,Abra,N01,ABRA,BUCAY,5,CAR,64.4022,64.7283,...,6,37,1,1,School with no Annexes,15781.0,3,20,16,53
2,134995,Layugan ES,Abra,N01,ABRA,BUCAY,5,CAR,67.8049,75.9756,...,9,32,1,1,School with no Annexes,15781.0,3,20,16,53
3,135182,Ducligan ES,Abra,N01,ABRA,BUCLOC,6,CAR,72.8846,73.8462,...,12,20,1,1,School with no Annexes,15781.0,3,20,16,53
4,135022,Pacac ES,Abra,N01,ABRA,DOLORES,6,CAR,72.7083,80.4167,...,4,19,1,1,School with no Annexes,15781.0,3,20,16,53


In [2]:
DF.describe(include='all')

Unnamed: 0,SchID,SchName,DivName,DivisionCode,Province,MunicipalityORCity,Clusters,Region,Filipino,AralinP,...,Teachers,Ratio,Energized,Grid,SchoolType,PBOR,PovertyCluster,PovertyIncidence,LowerLimit,UpperLimit
count,3377.0,3377,3377,3377,3377,3377,3377.0,3377,3377.0,3377.0,...,3377.0,3377.0,3377.0,3377.0,3377,3377.0,3377.0,3377.0,3377.0,3377.0
unique,,2990,210,212,86,1160,,17,,,...,,,,,4,,,,,
top,,San Isidro ES,Leyte,H04,LEYTE,DAVAO CITY,,VIII,,,...,,,,,School with no Annexes,,,,,
freq,,23,105,105,119,34,,347,,,...,,,,,3310,,,,,
mean,121042.840983,,,,,,4.888955,,71.915518,74.100833,...,14.55345,31.981344,0.910868,0.864377,,5889.640163,3.031093,21.677821,15.552561,27.097424
std,20254.523068,,,,,,1.082683,,11.823959,16.099594,...,35.390663,10.282097,0.284977,0.342439,,5666.438632,1.236028,12.920435,10.649507,16.04268
min,100003.0,,,,,,1.0,,27.0,16.8421,...,1.0,2.0,0.0,0.0,,277.29,1.0,0.0,0.0,0.0
25%,109134.0,,,,,,4.0,,64.3671,65.0,...,6.0,25.0,1.0,1.0,,277.29,2.0,12.0,8.0,16.0
50%,118739.0,,,,,,5.0,,73.2813,77.7119,...,8.0,32.0,1.0,1.0,,8982.0,3.0,19.0,14.0,25.0
75%,127712.0,,,,,,6.0,,81.0556,86.3889,...,14.0,38.0,1.0,1.0,,10732.0,4.0,30.0,21.0,39.0


In [3]:
DF['PovertyI'] = DF.PovertyIncidence.astype(int)

In [4]:
#DF['pbor'] = DF.PBOR.astype(int)

In [5]:
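# Bin the numeric Overall score into two classes with np.searchsorted:
# scores in (0, 65] get label 1 and scores in (65, 100] get label 2.
bins = [0, 65, 100]
DF['overall'] = np.searchsorted(bins, DF['Overall'].values)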
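How `np.searchsorted` turns scores into class labels can be checked on a few hand-picked values. The scores below are made up for illustration; only the `bins` list is taken from the cell above.

```python
import numpy as np

bins = [0, 65, 100]

# Hypothetical scores chosen to sit on and around the 65 boundary.
scores = np.array([44.0, 65.0, 65.1, 100.0])

# With the default side='left', each score gets the index of the first
# bin edge that is greater than or equal to it.
labels = np.searchsorted(bins, scores)
print(labels)  # [1 1 2 2]
```

A score of exactly 65 lands in class 1. Values outside the (0, 100] range would receive other indices (0 for a score of exactly 0, 3 for anything above 100), so the two-class encoding relies on all scores falling inside that range.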

In [6]:
DFI = DF.drop(['PBOR', 'PovertyIncidence', 'Mathematics', 'Science', 'Filipino', 'English', 'AralinP', 'Overall'], axis=1)

In [7]:
DFI.columns

Index([u'SchID', u'SchName', u'DivName', u'DivisionCode', u'Province',
       u'MunicipalityORCity', u'Clusters', u'Region', u'Internet', u'Lat',
       u'Long', u'Enrolled', u'Teachers', u'Ratio', u'Energized', u'Grid',
       u'SchoolType', u'PovertyCluster', u'LowerLimit', u'UpperLimit',
       u'PovertyI', u'overall'],
      dtype='object')

In [8]:
DFI.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3377 entries, 0 to 3376
Data columns (total 22 columns):
SchID                 3377 non-null int64
SchName               3377 non-null object
DivName               3377 non-null object
DivisionCode          3377 non-null object
Province              3377 non-null object
MunicipalityORCity    3377 non-null object
Clusters              3377 non-null int64
Region                3377 non-null object
Internet              3377 non-null int64
Lat                   3377 non-null float64
Long                  3377 non-null float64
Enrolled              3377 non-null int64
Teachers              3377 non-null int64
Ratio                 3377 non-null int64
Energized             3377 non-null int64
Grid                  3377 non-null int64
SchoolType            3377 non-null object
PovertyCluster        3377 non-null int64
LowerLimit            3377 non-null int64
UpperLimit            3377 non-null int64
PovertyI              3377 non-null int64


- Drop the columns that are not used as model features, keeping only Region, Internet, Ratio, Energized, SchoolType, PovertyI, and the target overall.

In [9]:
# Drop the columns that are not used as model features
drop_cols = ['SchID', 'SchName', 'DivName', 'MunicipalityORCity', 'Province', 'DivisionCode', 'Clusters', 'Lat',
             'Long', 'Enrolled', 'Teachers', 'Grid', 'PovertyCluster', 'LowerLimit', 'UpperLimit']
df = DFI.drop(drop_cols, axis=1)

In [10]:
df.head()

Unnamed: 0,Region,Internet,Ratio,Energized,SchoolType,PovertyI,overall
0,CAR,0,23,1,School with no Annexes,20,1
1,CAR,1,37,1,School with no Annexes,20,2
2,CAR,0,32,1,School with no Annexes,20,2
3,CAR,0,20,1,School with no Annexes,20,2
4,CAR,1,19,1,School with no Annexes,20,2


## One Hot Encoding

In [11]:
df_processed = pd.get_dummies(df['Region'])

In [12]:
df_processed.head()

Unnamed: 0,ARMM,CAR,CARAGA,I,II,III,IV-A,IV-B,IX,NCR,V,VI,VII,VIII,X,XI,XII
0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [13]:
df_proc = pd.get_dummies(df['SchoolType'])

In [14]:
df_proc.head()

Unnamed: 0,Annex or Extension school(s),Mobile School(s)/Center(s),Mother school,School with no Annexes
0,0,0,0,1
1,0,0,0,1
2,0,0,0,1
3,0,0,0,1
4,0,0,0,1


In [15]:
combine = pd.concat([df, df_processed, df_proc], axis=1)

In [16]:
combine.head()

Unnamed: 0,Region,Internet,Ratio,Energized,SchoolType,PovertyI,overall,ARMM,CAR,CARAGA,...,VI,VII,VIII,X,XI,XII,Annex or Extension school(s),Mobile School(s)/Center(s),Mother school,School with no Annexes
0,CAR,0,23,1,School with no Annexes,20,1,0,1,0,...,0,0,0,0,0,0,0,0,0,1
1,CAR,1,37,1,School with no Annexes,20,2,0,1,0,...,0,0,0,0,0,0,0,0,0,1
2,CAR,0,32,1,School with no Annexes,20,2,0,1,0,...,0,0,0,0,0,0,0,0,0,1
3,CAR,0,20,1,School with no Annexes,20,2,0,1,0,...,0,0,0,0,0,0,0,0,0,1
4,CAR,1,19,1,School with no Annexes,20,2,0,1,0,...,0,0,0,0,0,0,0,0,0,1


In [17]:
comb = combine.drop(['Region', 'SchoolType'], axis=1)

In [18]:
comb.head()

Unnamed: 0,Internet,Ratio,Energized,PovertyI,overall,ARMM,CAR,CARAGA,I,II,...,VI,VII,VIII,X,XI,XII,Annex or Extension school(s),Mobile School(s)/Center(s),Mother school,School with no Annexes
0,0,23,1,20,1,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1,1,37,1,20,2,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,0,32,1,20,2,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,1
3,0,20,1,20,2,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,1
4,1,19,1,20,2,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,1
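The encode, concatenate, and drop steps in cells 11 to 17 can also be written as a single `pd.get_dummies` call on the whole frame. The sketch below compares the two routes on a hypothetical three-row frame. The `dtype=int` argument is added here so that both routes yield integer columns regardless of pandas version, and the empty `prefix` and `prefix_sep` keep the bare category names used in the notebook.

```python
import pandas as pd

# Hypothetical toy frame with the same kinds of columns as df.
toy = pd.DataFrame({
    "Region": ["CAR", "VIII", "CAR"],
    "SchoolType": ["Mother school", "School with no Annexes",
                   "School with no Annexes"],
    "Ratio": [23, 37, 32],
})

# Route used in the notebook: encode each column, concatenate, then drop.
manual = pd.concat(
    [toy,
     pd.get_dummies(toy["Region"], dtype=int),
     pd.get_dummies(toy["SchoolType"], dtype=int)],
    axis=1,
).drop(["Region", "SchoolType"], axis=1)

# One-step route: let get_dummies replace the listed columns in place.
one_step = pd.get_dummies(
    toy, columns=["Region", "SchoolType"],
    prefix="", prefix_sep="", dtype=int,
)

print(list(one_step.columns))
print((manual.values == one_step.values).all())
```

The one-step form only avoids the separate `concat` and `drop` calls. Keeping a non-empty prefix would be safer if two encoded columns could ever share a category name.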


In [19]:
comb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3377 entries, 0 to 3376
Data columns (total 26 columns):
Internet                        3377 non-null int64
Ratio                           3377 non-null int64
Energized                       3377 non-null int64
PovertyI                        3377 non-null int64
overall                         3377 non-null int64
ARMM                            3377 non-null uint8
CAR                             3377 non-null uint8
CARAGA                          3377 non-null uint8
I                               3377 non-null uint8
II                              3377 non-null uint8
III                             3377 non-null uint8
IV-A                            3377 non-null uint8
IV-B                            3377 non-null uint8
IX                              3377 non-null uint8
NCR                             3377 non-null uint8
V                               3377 non-null uint8
VI                              3377 non-null uint8
VII      

### MLP Neural Network Model Creation

In [20]:
X=comb.drop('overall',axis=1)
X.head()

Unnamed: 0,Internet,Ratio,Energized,PovertyI,ARMM,CAR,CARAGA,I,II,III,...,VI,VII,VIII,X,XI,XII,Annex or Extension school(s),Mobile School(s)/Center(s),Mother school,School with no Annexes
0,0,23,1,20,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1,1,37,1,20,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,0,32,1,20,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
3,0,20,1,20,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
4,1,19,1,20,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1


In [21]:
y= comb['overall']
y.head()
y = np.asarray(comb['overall'], dtype="|S6")

In [22]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y)
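Called without further arguments, `train_test_split` holds out 25% of the rows and shuffles them differently on every run. A minimal sketch of a reproducible, class-balanced variant follows. The arrays are synthetic stand-ins with the same number of rows as the dataset, and `random_state=0` and `stratify` are additions for illustration, not settings used in the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 3377 rows, roughly 20% of them in class 1.
n_rows = 3377
X_demo = np.arange(n_rows).reshape(-1, 1)
y_demo = np.where(np.arange(n_rows) % 5 == 0, 1, 2)

# random_state fixes the shuffle; stratify keeps the class shares
# of y_demo approximately equal in the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, random_state=0, stratify=y_demo
)

print(len(X_tr), len(X_te))
print((y_demo == 1).mean(), (y_te == 1).mean())
```

With 3377 rows the default split gives 2532 training rows and 845 test rows, which matches the total of the confusion matrix reported later in the notebook.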

In [23]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit the Training Data 
scaler.fit(X_train)


StandardScaler(copy=True, with_mean=True, with_std=True)
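Fitting the scaler on the training split only means the test data is later standardized with the training mean and standard deviation. The sketch below checks that against a manual computation. The numbers are hypothetical stand-ins for a single feature column.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values standing in for one feature column.
train = np.array([[23.0], [37.0], [32.0], [20.0]])
test = np.array([[19.0], [40.0]])

demo_scaler = StandardScaler().fit(train)

# Manual equivalent: subtract the training mean and divide by the
# training standard deviation (population form, ddof=0).
manual_test = (test - train.mean(axis=0)) / train.std(axis=0)

print(np.allclose(demo_scaler.transform(test), manual_test))
```

Using training statistics for both splits keeps information about the test set out of the preprocessing step.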

In [24]:
X_train.head()

Unnamed: 0,Internet,Ratio,Energized,PovertyI,ARMM,CAR,CARAGA,I,II,III,...,VI,VII,VIII,X,XI,XII,Annex or Extension school(s),Mobile School(s)/Center(s),Mother school,School with no Annexes
1854,0,28,0,36,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2255,0,40,1,19,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,1
2943,0,64,0,50,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1048,1,49,1,26,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,1
1869,0,41,1,36,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1


- An MLP classifier has at least three layers: an input layer, one or more hidden layers, and an output layer.
- For simplicity, the model below takes the 25 input features and uses three hidden layers of 30, 25, and 1 units, with a maximum of 1000 iterations.
- These parameters can be tuned later, based on the domain and the data, to improve accuracy.
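To see how `hidden_layer_sizes` maps onto the layers of a fitted network, the weight-matrix shapes can be inspected after training. The sketch below uses random synthetic data with 25 features. The data, the low `max_iter`, and `random_state=0` are assumptions made to keep the example fast and repeatable, not the notebook's settings.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic two-class data with 25 features (hypothetical values).
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 25))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# A short run is enough to inspect the architecture, so silence the
# convergence warning that a low max_iter may trigger.
warnings.simplefilter("ignore", ConvergenceWarning)
mlp_demo = MLPClassifier(hidden_layer_sizes=(30, 25, 1), max_iter=50,
                         random_state=0)
mlp_demo.fit(X_demo, y_demo)

# One weight matrix per pair of consecutive layers.
print([w.shape for w in mlp_demo.coefs_])
print(mlp_demo.n_layers_)
```

The shapes show that the last hidden layer of size 1 forces everything the network has learned through a single unit before the output. Whether that bottleneck helps or hurts on the school data is something to test when tuning.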

In [25]:
X_train = scaler.transform(X_train)
X_train

array([[-0.8314943 , -0.37865543, -3.17124085, ..., -0.01987714,
        -0.11839265,  0.14762035],
       [-0.8314943 ,  0.77058706,  0.31533398, ..., -0.01987714,
        -0.11839265,  0.14762035],
       [-0.8314943 ,  3.06907204, -3.17124085, ..., -0.01987714,
        -0.11839265,  0.14762035],
       ...,
       [ 1.20265407,  0.48327644,  0.31533398, ..., -0.01987714,
        -0.11839265,  0.14762035],
       [-0.8314943 ,  0.19596582, -3.17124085, ..., -0.01987714,
        -0.11839265,  0.14762035],
       [ 1.20265407,  1.15366789,  0.31533398, ..., -0.01987714,
        -0.11839265,  0.14762035]])

In [26]:
X_test = scaler.transform(X_test)
from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(30, 25, 1),max_iter=1000) 
mlp.fit(X_train,y_train)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(30, 25, 1), learning_rate='constant',
       learning_rate_init=0.001, max_iter=1000, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)

##### Model Validations
Now that the model has been fit to the training data, it is important to validate its accuracy on the held-out test set.

In [27]:
predictions =mlp.predict(X_test)
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(y_test,predictions))

[[ 55 116]
 [ 40 634]]


In [28]:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test,predictions)
print(accuracy)

0.8153846153846154


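As the output shows, the accuracy of this model is about 81.5%. That is only slightly above the roughly 79.8% (674 of 845 test schools) that always predicting the majority class would achieve, and the confusion matrix shows that only 55 of the 171 schools in class 1 were classified correctly. The result is therefore better read as a baseline to improve on than as evidence that the MLP classifier is robust on this data.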
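The reported accuracy can be checked by hand from the confusion matrix printed in cell 27, and compared with the score of a trivial classifier that always predicts the larger class. The counts are copied from that output. The baseline comparison is an addition for context.

```python
import numpy as np

# Counts copied from the confusion matrix in cell 27
# (rows: true classes, columns: predicted classes).
cm = np.array([[55, 116],
               [40, 634]])

total = cm.sum()                                   # 845 test schools
accuracy = np.trace(cm) / total                    # (55 + 634) / 845
majority_baseline = cm.sum(axis=1).max() / total   # 674 / 845

print("accuracy: %.4f" % accuracy)
print("majority-class baseline: %.4f" % majority_baseline)
```

The first figure reproduces the 0.8154 printed by `accuracy_score`, and the gap to the baseline indicates how much the features actually contribute.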

Source: https://becominghuman.ai/multi-layer-perceptron-mlp-models-on-real-world-banking-data-f6dd3d7e998f