
## Tuition Fee Prediction based on Student's Family's Socio-Economic Factors

Education remains a major problem in Indonesia. One of the main reasons is the unequal distribution of educational facilities, combined with socio-economic factors that cause students to drop out of school. This project predicts which tuition fee category each student should pay based on their family's socio-economic factors, such as income, electricity bill, water bill, internet bill, house ownership, house size, and number of dependents in the family. The aim is to make the payment of education fees more equitable, so that families who are better off help pay for the education of poorer ones.
Manually analyzing these factors is mundane, error-prone, and time-consuming (and time is money!). Luckily, this task can be automated with machine learning. In this notebook, we will build an automatic tuition fee predictor using machine learning techniques.

We built the dataset ourselves by collecting data from parents in Java (mostly Jakarta and West Java). We asked them about their socio-economic background and the tuition fee they pay each month.
The structure of this notebook is as follows:

1.   First, we will import the libraries needed in this project.
2.   Then we will load and view the dataset.
3.   We will see that the dataset has a mixture of numerical and non-numerical features, and that it contains values from different ranges.
4.   We will preprocess the dataset so that the machine learning model we choose can make good predictions.
5.   Once our data is in good shape, we will do some exploratory data analysis to build our intuition.
6.   Finally, we will build a machine learning model that can predict the tuition fee category.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

import tensorflow as tf

In [2]:
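from google.colab import drive  # imported for Google Drive access; not mounted in this cell

# Load the survey data (socio-economic factors and tuition fees) from the Colab working directory
tfp = pd.read_csv('/content/Dataset Sosial Ekonomi dan Uang Sekolah.csv')
tfp.head()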

Unnamed: 0,Nama,Domisili,Pekerjaan,Tingkat,Nama sekolah,Jenis sekolah,Biaya sekolah,Kepemilikan rumah,Jenis bangunan,Ukuran rumah,Listrik,Air,Internet,Penghasilan,Jumlah tanggungan
0,HN,Jakarta,IRT,SD,Chandra Kusuma,Swasta,Rp 500.001 - Rp 1.000.000,Milik sendiri/ keluarga,Rumah,< 36 m2,Rp. 500.001 – Rp. 1.000.000,Rp. 250.001 – Rp. 500.000,Rp. 100.000 – Rp. 250.000,Rp. 3.000.001 - Rp. 6.000.000,2
1,HN,Jakarta,IRT,SD,SDN 01 pejagalan,Negri,≤ Rp. 100.000,Milik sendiri/ keluarga,Rumah,< 36 m2,Rp. 500.001 – Rp. 1.000.000,Rp. 250.001 – Rp. 500.000,Rp. 100.000 – Rp. 250.000,Rp. 3.000.001 - Rp. 6.000.000,2
2,SG,Banten,Guru,SD,SD 03,Negri,≤ Rp. 100.000,Kontrak/ sewa,Rumah,36 – 60 m2,Rp. 250.001 – Rp. 500.000,≤ Rp. 100.000,Rp. 100.000 – Rp. 250.000,Rp. 3.000.001 - Rp. 6.000.000,3
3,Ros,Jakarta,Wiraswasta,SD,SD CHANDRA KUSUMA,Swasta,Rp 500.001 - Rp 1.000.000,Kontrak/ sewa,Rumah,36 – 60 m2,Lebih dari Rp 1.000.000,Rp. 500.001 – Rp. 1.000.000,Rp. 250.001 – Rp. 500.000,Rp. 10.000.001 - Rp.20.000.000,1
4,DS,Jakarta,IRT,SD,Chandra Kusuma,Swasta,Rp 500.001 - Rp 1.000.000,Milik sendiri/ keluarga,Rumah,61 – 90 m2,Lebih dari Rp 1.000.000,Rp. 100.000 – Rp. 250.000,Rp. 100.000 – Rp. 250.000,Rp. 10.000.001 - Rp.20.000.000,3
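The preview above shows that the bracket labels are not written consistently: some use an en dash (`–`) and others a hyphen (`-`), some use `Rp.` and others `Rp`, and one label has no space after `Rp.`. A minimal sketch of how such labels could be normalized before encoding is given below. The helper name `normalize_label` and the sample values in `raw` are illustrative assumptions, not part of the original notebook.

```python
import re
import pandas as pd

def normalize_label(text: str) -> str:
    """Hypothetical helper: bring one bracket label into a canonical spelling."""
    text = text.strip()
    text = text.replace("–", "-")            # en dash -> plain hyphen
    text = re.sub(r"Rp\.?\s*", "Rp ", text)  # "Rp.", "Rp" and "Rp.20" all become "Rp "
    text = re.sub(r"\s+", " ", text)         # collapse repeated whitespace
    return text

# Sample labels copied from the spellings visible in the preview
raw = pd.Series([
    "Rp. 500.001 – Rp. 1.000.000",
    "Rp 500.001 - Rp 1.000.000",
    "Rp. 10.000.001 - Rp.20.000.000",
    "≤ Rp. 100.000",
])

clean = raw.map(normalize_label)
print(clean.tolist())
```

After normalization, the first two labels become identical, so a single dictionary key would cover both spellings.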


# Check Data Type

In [3]:
tfp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 113 entries, 0 to 112
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Nama               113 non-null    object
 1   Domisili           113 non-null    object
 2   Pekerjaan          113 non-null    object
 3   Tingkat            113 non-null    object
 4   Nama sekolah       113 non-null    object
 5   Jenis sekolah      113 non-null    object
 6   Biaya sekolah      113 non-null    object
 7   Kepemilikan rumah  113 non-null    object
 8   Jenis bangunan     113 non-null    object
 9   Ukuran rumah       113 non-null    object
 10  Listrik            113 non-null    object
 11  Air                113 non-null    object
 12  Internet           113 non-null    object
 13  Penghasilan        113 non-null    object
 14  Jumlah tanggungan  113 non-null    object
dtypes: object(15)
memory usage: 13.4+ KB
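Every column is reported as `object`, including `Jumlah tanggungan` (number of dependents), which looks numeric in the preview. One possible way to turn such a column into numbers is sketched below on a toy frame. The non-numeric answer `"lebih dari 5"` is an invented example of why a column might load as text, and the column name `Jumlah tanggungan num` is an assumption made for illustration.

```python
import pandas as pd

# Toy frame mimicking the situation above: a count column stored as strings.
# "lebih dari 5" is an invented example of a non-numeric survey answer.
demo = pd.DataFrame({"Jumlah tanggungan": ["2", "3", "1", "lebih dari 5"]})

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising
demo["Jumlah tanggungan num"] = pd.to_numeric(
    demo["Jumlah tanggungan"], errors="coerce"
)

print(demo.dtypes)
# Rows that could not be converted and need a manual decision
print(demo[demo["Jumlah tanggungan num"].isna()])
```

Inspecting the rows that became `NaN` shows which answers need to be handled by hand before modeling.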


# Change categorical data

In [13]:
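# Ordinal encoding of the tuition fee bracket (1 = lowest, 5 = highest)
biaya_dict = {'≤ Rp. 100.000':1,'Rp 100.001 - Rp 500.000':2,'Rp 500.001 - Rp 1.000.000':3,'Rp 1.000.001 - Rp 2.000.000':4,'lebih dari 2 juta':5}
tfp['Biaya kat'] = tfp['Biaya sekolah'].map(biaya_dict)

# Encode house ownership: rented/leased = 1, owned by self or family = 2
kepemilikan_dict = {'Milik sendiri/ keluarga':2,'Kontrak/ sewa':1}
tfp['Kepemilikan kat'] = tfp['Kepemilikan rumah'].map(kepemilikan_dict)

# Placeholder: the mapping for 'Jenis bangunan' (building type) has not been filled in yet
jenis_bangunan_dict = {}

tfp.head()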

Unnamed: 0,Nama,Domisili,Pekerjaan,Tingkat,Nama sekolah,Jenis sekolah,Biaya sekolah,Kepemilikan rumah,Jenis bangunan,Ukuran rumah,Listrik,Air,Internet,Penghasilan,Jumlah tanggungan,Biaya kat,Kepemilikan kat
0,HN,Jakarta,IRT,SD,Chandra Kusuma,Swasta,Rp 500.001 - Rp 1.000.000,Milik sendiri/ keluarga,Rumah,< 36 m2,Rp. 500.001 – Rp. 1.000.000,Rp. 250.001 – Rp. 500.000,Rp. 100.000 – Rp. 250.000,Rp. 3.000.001 - Rp. 6.000.000,2,3,2
1,HN,Jakarta,IRT,SD,SDN 01 pejagalan,Negri,≤ Rp. 100.000,Milik sendiri/ keluarga,Rumah,< 36 m2,Rp. 500.001 – Rp. 1.000.000,Rp. 250.001 – Rp. 500.000,Rp. 100.000 – Rp. 250.000,Rp. 3.000.001 - Rp. 6.000.000,2,1,2
2,SG,Banten,Guru,SD,SD 03,Negri,≤ Rp. 100.000,Kontrak/ sewa,Rumah,36 – 60 m2,Rp. 250.001 – Rp. 500.000,≤ Rp. 100.000,Rp. 100.000 – Rp. 250.000,Rp. 3.000.001 - Rp. 6.000.000,3,1,1
3,Ros,Jakarta,Wiraswasta,SD,SD CHANDRA KUSUMA,Swasta,Rp 500.001 - Rp 1.000.000,Kontrak/ sewa,Rumah,36 – 60 m2,Lebih dari Rp 1.000.000,Rp. 500.001 – Rp. 1.000.000,Rp. 250.001 – Rp. 500.000,Rp. 10.000.001 - Rp.20.000.000,1,3,1
4,DS,Jakarta,IRT,SD,Chandra Kusuma,Swasta,Rp 500.001 - Rp 1.000.000,Milik sendiri/ keluarga,Rumah,61 – 90 m2,Lebih dari Rp 1.000.000,Rp. 100.000 – Rp. 250.000,Rp. 100.000 – Rp. 250.000,Rp. 10.000.001 - Rp.20.000.000,3,3,2
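`Series.map` with a dictionary returns `NaN` for any value that is not a key, so a label spelled slightly differently is silently left unencoded. The sketch below shows one way to list such labels after mapping. It runs on a small toy frame, and the third label (with an en dash) is an invented variant used only to trigger the case.

```python
import pandas as pd

demo = pd.DataFrame({"Biaya sekolah": [
    "≤ Rp. 100.000",
    "Rp 500.001 - Rp 1.000.000",
    "Rp 500.001 – Rp 1.000.000",   # invented variant spelled with an en dash
]})

# Subset of the mapping used in the cell above
biaya_dict = {"≤ Rp. 100.000": 1, "Rp 500.001 - Rp 1.000.000": 3}

demo["Biaya kat"] = demo["Biaya sekolah"].map(biaya_dict)

# Labels that had no matching key and therefore became NaN
unmapped = demo.loc[demo["Biaya kat"].isna(), "Biaya sekolah"].unique()
print(unmapped)
```

An empty result means every label was covered by the dictionary.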


# Choosing the most relevant variables


# Preprocessing Data

# Split the Data
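The notebook imports `train_test_split` but does not show the split itself. The following is only a sketch of how a stratified split might look, run on synthetic data rather than the survey. The feature columns, the 80/20 ratio, the `random_state`, and the use of `stratify` are all assumptions, not the authors' settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the encoded features and target (not the real survey data)
X = pd.DataFrame({
    "Kepemilikan kat": rng.integers(1, 3, size=100),
    "Jumlah tanggungan": rng.integers(1, 6, size=100),
})
y = pd.Series(np.repeat([1, 2, 3, 4, 5], 20), name="Biaya kat")

# stratify=y keeps the share of each fee category the same in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
print(y_test.value_counts().sort_index())
```

Stratification requires at least two samples per category, which may not hold for rare fee brackets in a dataset of 113 rows.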

# Modeling and Training

# Result & Visualization
