This is a practice project that applies the IBM Data Science Methodology to a recipe dataset, to demonstrate how the methodology works on a specific problem. The dataset is loaded with Pandas from a URL and contains each recipe's country and ingredients.
The project consists of a Python script that works through each stage of the methodology for the problem of analyzing recipe data. The script is documented throughout and explains each step as it is carried out.
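
As a rough illustration of the loading step, here is a minimal sketch. The dataset URL is not reproduced here, so the sketch reads a small inline CSV instead; the column names `country` and `ingredients`, the sample rows, and the helper `load_recipes` are all assumptions made for the example and may not match the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the remote file. The rows and column names
# below are made up for illustration and are not the project's data.
SAMPLE_CSV = """country,ingredients
Italian,"tomato, basil, olive oil"
Japanese,"rice, soy sauce, seaweed"
Indian,"rice, cumin, turmeric"
"""


def load_recipes(source):
    """Read a recipe table from a URL, a file path, or a file-like object."""
    return pd.read_csv(source)


# pd.read_csv also accepts a URL string, so the same call should work
# with the real dataset address in place of the in-memory buffer.
recipes = load_recipes(io.StringIO(SAMPLE_CSV))
print(recipes.shape)
print(recipes.head())
```

Passing the source in as an argument keeps the URL out of the function body, so the same helper can be pointed at a local copy while experimenting.
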
To run the script, you need Python and the following libraries installed on your machine: Pandas, NumPy, scikit-learn, and Matplotlib. The script can be run in a Jupyter Notebook or from the command line.
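
One way to confirm that the libraries are available before running the script is a short check like the one below. It is only a sketch and not part of the project; the helper name `missing_packages` is made up for this example. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names of the libraries listed above.
# scikit-learn is imported as "sklearn".
REQUIRED = ["pandas", "numpy", "sklearn", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries were found.")
```
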
The script contains the following steps:
- Business Understanding: Defining the problem and identifying the data required to solve it.
- Analytic Approach: Identifying the type of analysis to be performed on the data.
- Data Requirements: Outlining the data needed to solve the problem.
- Data Collection: Loading the recipe dataset from a URL and performing an initial inspection of the data.
- Data Understanding and Preparation: Preprocessing the data by removing missing values, transforming categorical variables, and scaling the data.
- Modeling: Building a Decision Tree model that classifies the country of a recipe based on its ingredients.
- Evaluation: Evaluating the Decision Tree model with a confusion matrix, which shows how well the model classifies the recipes.
- Deployment: Deploying the model in a production environment and iterating on the process.
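
The data preparation, modeling, and evaluation steps above can be sketched on a tiny example. This is not the project's script: the table below is made up, the column names `country` and `ingredients` are assumptions, and the model is scored on its own training data only because the toy table is too small to split.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier

# Made-up toy data for illustration; the last row has a missing country.
toy = pd.DataFrame({
    "country": ["Italian", "Italian", "Japanese", "Japanese",
                "Indian", "Indian", None],
    "ingredients": [
        "tomato, basil, olive oil",
        "tomato, garlic, olive oil",
        "rice, soy sauce, seaweed",
        "rice, soy sauce, ginger",
        "rice, cumin, turmeric",
        "lentils, cumin, turmeric",
        "flour, sugar",
    ],
})

# Data preparation: drop rows with missing values, then turn the
# ingredient lists into one binary (0/1) column per ingredient.
clean = toy.dropna().reset_index(drop=True)
X = clean["ingredients"].str.get_dummies(sep=", ")
y = clean["country"]

# Modeling: fit a decision tree that predicts country from ingredients.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Evaluation: confusion matrix with true countries as rows and
# predicted countries as columns (computed on the training data).
labels = sorted(y.unique())
predicted = model.predict(X)
matrix = confusion_matrix(y, predicted, labels=labels)
print(pd.DataFrame(matrix, index=labels, columns=labels))
```

Because the features here are already 0/1 indicators and decision trees do not depend on feature scale, the sketch skips the scaling step. On real data, the confusion matrix would normally be computed on a held-out test set, since training-set results overstate how well the model generalizes.
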
The results of the analysis and the evaluation metrics are presented in the Jupyter Notebook or console output.