Boosting - Step by step guide

Use the data you analyzed in the previous two projects.
Continue developing your work to find a model that fits the data better.

🌱 How to start this project

Follow the instructions below:

Create a new repository based on the machine learning project template by clicking here.
Open the newly created repository in Codespaces using the Codespace button extension.
Once VS Code has finished opening in the Codespace, start your project by following the instructions below.

🚛 How to deliver this project

Once you have finished solving the exercises, be sure to commit your changes, push them to your repository, and go to 4Geeks.com to upload the repository link.

📝 Instructions

Predicting diabetes

In the two previous projects, we saw how a decision tree and then a random forest could be used to predict diabetes. We have now reached a point where the results need further improvement. Could boosting be the best alternative for optimizing them?

Boosting is a sequential composition of models (usually decision trees) in which each new model tries to correct the errors of the previous ones. This approach may be useful for this dataset, since several of the assumptions studied in the module hold for it.
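To make the "each new model corrects the previous errors" idea concrete, here is a minimal from-scratch sketch of gradient boosting for a binary target, using one-split decision trees ("stumps") and NumPy only. It is an illustration under simplifying assumptions (plain gradient steps, no Newton leaf values, no regularization), not how production libraries implement boosting; the function names are hypothetical.

```python
import numpy as np


def fit_stump(X, r):
    """Return (feature, threshold, left_value, right_value) for the single split
    that best fits the residuals r under squared error."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best_sse:
                best, best_sse = (j, t, lv, rv), sse
    # No split possible (all features constant): predict the mean residual everywhere
    return best if best is not None else (0, np.inf, r.mean(), r.mean())


def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_boosting(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with log-loss for labels y in {0, 1}.
    Each new stump is fitted to the residuals left by the current ensemble."""
    p = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    f0 = np.log(p / (1 - p))  # start from the log-odds of the positive class
    F = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - sigmoid(F)  # negative gradient of the log-loss w.r.t. F
        stump = fit_stump(X, residuals)
        F = F + learning_rate * predict_stump(stump, X)  # small corrective step
        stumps.append(stump)
    return f0, learning_rate, stumps


def predict(model, X):
    """Sum the initial log-odds and all scaled stump outputs, then threshold at 0.5."""
    f0, learning_rate, stumps = model
    F = np.full(X.shape[0], f0)
    for stump in stumps:
        F = F + learning_rate * predict_stump(stump, X)
    return (sigmoid(F) >= 0.5).astype(int)


if __name__ == "__main__":
    # Synthetic data, only to exercise the sketch
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    model = fit_boosting(X, y, n_rounds=50)
    print("training accuracy:", (predict(model, X) == y).mean())
```

The `learning_rate` shrinks each correction, so more rounds are usually needed when it is small; this trade-off is exactly what the hyperparameter exploration in Step 2 examines.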

In this project, you will put this idea into practice by training a boosting model on the dataset to improve its $accuracy$.
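For reference, accuracy is the fraction of the $n$ test samples whose predicted label $\hat{y}_i$ matches the true label $y_i$:

```latex
\text{accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left[\hat{y}_i = y_i\right]
```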

Remember that previous projects can be found here (decision trees) and here (random forest).

Step 1: Loading the dataset

Load the processed dataset from the previous project (already split into training and test samples and analyzed with EDA).
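A minimal loading sketch with pandas, assuming the previous project saved the training and test samples as two CSV files and that the target column is named `Outcome` (the label column of the Pima Indians diabetes dataset). The function name and any file paths you pass to it are hypothetical; adjust them to your repository.

```python
import pandas as pd


def load_processed_split(train_path, test_path, target="Outcome"):
    """Load the processed train/test CSV files and separate features from the target."""
    train = pd.read_csv(train_path)
    test = pd.read_csv(test_path)
    X_train = train.drop(columns=[target])
    y_train = train[target]
    X_test = test.drop(columns=[target])
    y_test = test[target]
    return X_train, X_test, y_train, y_test
```

For example, `load_processed_split("data/processed/train.csv", "data/processed/test.csv")`, where both paths are placeholders for wherever your previous project stored its output.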

Step 2: Build a boosting model

One way to improve the results is to build a boosting model, in which learners are trained one after another so that each one corrects the errors of the previous ones. Train it and analyze its results. Then try different values for the hyperparameters that define the model, analyze their impact on the final accuracy, and plot your conclusions.
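A sketch of the training and hyperparameter exploration, using scikit-learn's `GradientBoostingClassifier` as one possible boosting implementation (XGBoost or scikit-learn's `AdaBoostClassifier` are alternatives). The grids of `n_estimators` and `learning_rate` values are arbitrary illustrative choices, and the helper names are hypothetical.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score


def sweep_boosting(X_train, y_train, X_test, y_test,
                   n_estimators_grid=(50, 100, 200),
                   learning_rates=(0.01, 0.1, 0.5),
                   random_state=42):
    """Train one model per (n_estimators, learning_rate) pair and
    return the test accuracy of each combination."""
    results = {}
    for lr in learning_rates:
        for n in n_estimators_grid:
            model = GradientBoostingClassifier(
                n_estimators=n, learning_rate=lr, random_state=random_state
            )
            model.fit(X_train, y_train)
            results[(n, lr)] = accuracy_score(y_test, model.predict(X_test))
    return results


def plot_sweep(results):
    """Plot test accuracy against n_estimators, one line per learning rate."""
    import matplotlib.pyplot as plt  # imported here so the sweep runs without matplotlib

    for lr in sorted({rate for _, rate in results}):
        points = sorted((n, acc) for (n, rate), acc in results.items() if rate == lr)
        plt.plot([n for n, _ in points], [acc for _, acc in points],
                 marker="o", label=f"learning_rate={lr}")
    plt.xlabel("n_estimators")
    plt.ylabel("test accuracy")
    plt.legend()
    plt.show()
```

Choosing hyperparameters by their test accuracy tends to give an optimistic estimate of performance; scikit-learn's `GridSearchCV`, which cross-validates on the training set, is a common alternative.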

Step 3: Save the model

Store the model in the corresponding folder.
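A minimal persistence sketch using Python's built-in `pickle` module (scikit-learn's documentation also describes `joblib` for this). The helper names and the example path are hypothetical; use the folder your repository reserves for models.

```python
import pickle
from pathlib import Path


def save_model(model, path):
    """Serialize a fitted model with pickle, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def load_model(path):
    """Load a model saved with save_model. Only unpickle files you trust."""
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

For example, `save_model(best_model, "models/boosting_classifier.pkl")`, where `best_model` stands for the model you selected in Step 2.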

Step 4: Analyze and compare model results

Now study the three models you have built (decision tree, random forest, and boosting) and analyze their predictions: which class each model predicts with the highest accuracy and which with the lowest. Which of the three models would you choose?
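One way to run this comparison is sketched below. It assumes that "the accuracy of a class" means the fraction of that class's test samples the model predicts correctly, which is the per-class recall computed by scikit-learn's `recall_score` with `average=None`. The function name and the model dictionary keys are hypothetical.

```python
from sklearn.metrics import accuracy_score, recall_score


def compare_models(models, X_test, y_test):
    """For each fitted model, report overall test accuracy, per-class accuracy
    (per-class recall), and the classes predicted best and worst."""
    labels = sorted(set(y_test))
    report = {}
    for name, model in models.items():
        y_pred = model.predict(X_test)
        per_class = dict(zip(labels, recall_score(y_test, y_pred, labels=labels, average=None)))
        report[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "per_class": per_class,
            "best_class": max(per_class, key=per_class.get),
            "worst_class": min(per_class, key=per_class.get),
        }
    return report
```

For example, `compare_models({"decision_tree": tree, "random_forest": forest, "boosting": booster}, X_test, y_test)`, where the three variables stand for your fitted models from this and the previous projects.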

Note: We have also included a sample solution in ./solution.ipynb. We strongly suggest you use it only if you have been stuck for more than 30 minutes, or if you have already finished and want to compare it with your approach.

4GeeksAcademy/boosting-algorithms-project-tutorial

Folders and files

Latest commit

History

Repository files navigation

Boosting - Step by step guide

🌱 How to start this project

🚛 How to deliver this project

📝 Instructions

Predicting diabetes

Step 1: Loading the dataset

Step 2: Build a boosting

Step 3: Save the model

Step 4: Analyze and compare model results

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 8

Uh oh!

Languages

Packages