
census_data_model: modeling census data with machine learning and deep learning algorithms


Realworldnewpeopple/census_data



Project Description

In this project, I used sklearn and supervised learning techniques on data collected for the U.S. census to help a fictitious charity organization identify people most likely to donate to their cause.

Here, I first investigate the factors that affect the likelihood of a charity donation being made. Then, I use a training and predicting pipeline to evaluate the accuracy and the speed of six supervised machine learning algorithms ('logistic_regression', 'support_vector_machine', 'RandomForestClassifier', 'knearest_neigbour_classification', 'AdaBoostClassifier', 'GradientBoostingClassifier'). I then fine-tune the parameters of the algorithm that provides the highest donation yield (while reducing mailing efforts and costs). Finally, I explore the impact of reducing the number of features in the data.
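The training and predicting pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes each model follows the scikit-learn `fit`/`predict` convention, and `MajorityClassifier`, `train_predict` and the toy data are hypothetical names and values introduced only so the sketch runs without extra dependencies.

```python
import time


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def train_predict(name, model, X_train, y_train, X_test, y_test):
    """Fit one model and report its accuracy plus training and prediction time."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test)
    pred_time = time.perf_counter() - start

    correct = sum(1 for p, t in zip(predictions, y_test) if p == t)
    return {
        "model": name,
        "accuracy": correct / len(y_test),
        "train_time": train_time,
        "pred_time": pred_time,
    }


# Toy data (made up for illustration): 1 = likely donor, 0 = otherwise.
X_train, y_train = [[25], [40], [52], [33]], [0, 0, 1, 0]
X_test, y_test = [[47], [29], [61], [38]], [1, 0, 0, 0]

# Any object with fit/predict could be added here, e.g. a scikit-learn classifier.
models = {"majority_baseline": MajorityClassifier()}
results = [
    train_predict(name, model, X_train, y_train, X_test, y_test)
    for name, model in models.items()
]
best = max(results, key=lambda r: r["accuracy"])
```

Keeping the timing and scoring in one function means every candidate model is measured the same way, so the accuracy and speed figures are directly comparable.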

After carefully studying the scores of the best machine learning model, whose prediction accuracy peaked at around 87%, I shifted my attention to a deep learning model, which reached an accuracy score of 97% after some tuning.

`main.py` runs prediction and reports an accuracy score for either the machine learning model or the deep learning model, depending on the choice the user makes in config.json (in the config folder). The trained deep learning model is saved in the model folder, and the learning-curve graph (training vs. validation) is saved in the img folder according to the same choice.
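As a rough sketch of how a config-driven choice like this might work: the keys `model_type`, `img_dir` and `model_dir`, the two allowed values, the graph file name, and the helpers `load_config` and `output_paths` are all assumptions made for illustration. The real config.json schema may differ.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical config contents; the real config.json keys may differ.
EXAMPLE_CONFIG = {"model_type": "deep_learning", "img_dir": "img", "model_dir": "model"}


def load_config(path):
    """Read a JSON config file and return it as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def output_paths(config):
    """Derive where the learning-curve graph and trained model would be saved."""
    model_type = config["model_type"]
    if model_type not in ("machine_learning", "deep_learning"):
        raise ValueError(f"unknown model_type: {model_type!r}")
    # The graph file name is an assumption; only the img folder is documented.
    paths = {"graph": Path(config["img_dir"]) / f"{model_type}_learning_curve.png"}
    if model_type == "deep_learning":
        # my_model.h5 is the model file name mentioned in this README.
        paths["model"] = Path(config["model_dir"]) / "my_model.h5"
    return paths


# Demo: write the example config to a temporary file and resolve the paths.
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "config.json"
    config_file.write_text(json.dumps(EXAMPLE_CONFIG), encoding="utf-8")
    paths = output_paths(load_config(config_file))
```

Validating the choice up front gives a clear error for a mistyped value instead of a failure later in the run.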

For a detailed analysis and the complete machine learning project, see project.ipynb; for the deep learning model, see the src folder and main.py.

Install

This project requires **Python 3.7** and the following Python libraries installed:

  • NumPy
  • Pandas
  • matplotlib
  • scikit-learn
  • keras

Code

The main code for the machine learning part of this project is in the `project.ipynb` notebook. The rest of the code is driven by `main.py`, which works together with the folders (config, img, data, log, model, score, src) and the files (census.csv, config.json, census.py, jsonstore.py, model_graph.py) to print output to the terminal and to write output files such as accuracy_val.json, info.log and my_model.h5, as well as the graph produced in the img folder.
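To illustrate how a run might produce the accuracy_val.json and info.log outputs, here is a minimal sketch using only the standard library. The function `save_score`, the logger name, the JSON layout and the placeholder accuracy value are assumptions for illustration. The actual logic lives in the repository's own modules (for example jsonstore.py) and may be organized differently.

```python
import json
import logging
import tempfile
from pathlib import Path


def save_score(score_dir, log_dir, model_name, accuracy):
    """Write the accuracy to accuracy_val.json and record the run in info.log."""
    score_dir, log_dir = Path(score_dir), Path(log_dir)
    score_dir.mkdir(parents=True, exist_ok=True)
    log_dir.mkdir(parents=True, exist_ok=True)

    # Assumed JSON layout: one object holding the model name and its accuracy.
    score_file = score_dir / "accuracy_val.json"
    payload = {"model": model_name, "accuracy": accuracy}
    score_file.write_text(json.dumps(payload, indent=2), encoding="utf-8")

    logger = logging.getLogger("census_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_dir / "info.log", encoding="utf-8")
    logger.addHandler(handler)
    try:
        logger.info("%s accuracy: %.4f", model_name, accuracy)
    finally:
        # Detach and close the handler so the log file is flushed and released.
        logger.removeHandler(handler)
        handler.close()
    return score_file


# Demo in a temporary directory with a placeholder accuracy value.
with tempfile.TemporaryDirectory() as tmp:
    written = save_score(Path(tmp) / "score", Path(tmp) / "log", "example_model", 0.5)
    saved = json.loads(written.read_text(encoding="utf-8"))
    log_text = (Path(tmp) / "log" / "info.log").read_text(encoding="utf-8")
```

Writing the score as JSON keeps it machine-readable for later comparison, while the log file keeps a human-readable history of runs.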