Predicting heart disease with decision tree classification
This project applies decision tree classification to a heart disease dataset (`hearts.csv`). It covers data analysis, model building, evaluation, and tree visualization.
The project performs the following tasks:
- Data Analysis:
- Exploratory data analysis (EDA) using Pandas to understand the dataset's characteristics.
- Visualization of data distributions using Matplotlib and Seaborn (e.g., histograms, bar charts, pie charts).
- Duplicate data handling.
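The data analysis steps above can be sketched with Pandas alone. This is a minimal illustration, not the notebook's actual code: the small DataFrame stands in for `hearts.csv`, and the column names `age`, `chol`, and `target` are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for hearts.csv; the real column names may differ.
df = pd.DataFrame({
    "age": [63, 45, 63, 58, 45],
    "chol": [233, 210, 233, 280, 210],
    "target": [1, 0, 1, 1, 0],
})

# Basic EDA: shape, column types, and summary statistics.
print(df.shape)
print(df.dtypes)
print(df.describe())

# Class balance of the (assumed) label column.
class_counts = df["target"].value_counts()
print(class_counts)

# Count fully duplicated rows, then keep the first occurrence of each.
n_duplicates = int(df.duplicated().sum())
df_clean = df.drop_duplicates().reset_index(drop=True)
print(f"removed {n_duplicates} duplicate rows, {len(df_clean)} remain")
```

With the real dataset, the DataFrame would instead come from `pd.read_csv("hearts.csv")`.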
- Model Building:
- Building a decision tree classifier using Scikit-learn's `DecisionTreeClassifier`.
- Splitting the data into training and testing sets.
- Hyperparameter tuning using `GridSearchCV` to optimize the model and mitigate overfitting.
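A minimal sketch of the model building steps, assuming synthetic data from `make_classification` in place of the heart disease features. The parameter grid, the 80/20 split, and the random seeds are illustrative choices, not values taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart disease features (X) and labels (y).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=42
)

# Hold out 20% of the rows for testing, preserving the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Illustrative grid: limiting depth and leaf size constrains tree growth,
# which is the usual way to reduce overfitting in a decision tree.
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_leaf": [1, 5, 10],
    "criterion": ["gini", "entropy"],
}

# 5-fold cross-validated search over the grid, scored by accuracy.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is refit on all training data.
best_tree = search.best_estimator_
print(search.best_params_)
print(f"cross-validated accuracy: {search.best_score_:.3f}")
```

The test split is deliberately left untouched during the search so that it can serve as an unbiased check afterwards.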
- Model Evaluation:
- Evaluating the model's performance using accuracy scores.
- Generating and visualizing a confusion matrix.
- Generating a classification report with precision, recall, and F1-scores.
- Analyzing feature importance.
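The evaluation steps above might look like the following sketch. It again uses synthetic data, a fixed `max_depth=3`, and placeholder feature names (`feature_0`, `feature_1`, ...), all of which are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with hypothetical feature names.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Accuracy: fraction of test rows predicted correctly.
accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {accuracy:.3f}")

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall, and F1-score.
report = classification_report(y_test, y_pred)
print(report)

# Feature importances, sorted from most to least important.
importances = sorted(
    zip(feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in importances:
    print(f"{name}: {score:.3f}")
```

To plot the confusion matrix rather than print it, one option is `sklearn.metrics.ConfusionMatrixDisplay.from_predictions(y_test, y_pred)`, which requires Matplotlib.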
- Tree Visualization:
- Visualizing the decision tree using `sklearn.tree.plot_tree` for basic visualization.
- Visualizing the decision tree using `export_graphviz` and `graphviz` for more detailed and customizable visualization.
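As a rough sketch of the Graphviz route, `export_graphviz` can return the tree as DOT source text, which the `graphviz` package then renders. The synthetic data, the feature names, the class labels, and the output name `heart_tree` are all hypothetical. The rendering step is left commented out because it needs the Graphviz binaries on PATH.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic stand-in data with hypothetical feature names.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, random_state=1
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# A shallow tree keeps the rendered diagram readable.
clf = DecisionTreeClassifier(max_depth=2, random_state=1).fit(X, y)

# With out_file=None, export_graphviz returns the DOT source as a string.
# class_names are listed in ascending order of the class labels (0, then 1).
dot_source = export_graphviz(
    clf,
    out_file=None,
    feature_names=feature_names,
    class_names=["no disease", "disease"],  # hypothetical label names
    filled=True,
    rounded=True,
)
print(dot_source[:200])

# Rendering requires the graphviz Python package AND the Graphviz software:
# import graphviz
# graphviz.Source(dot_source).render("heart_tree", format="png", cleanup=True)
```

For a quick look without Graphviz, `sklearn.tree.plot_tree(clf, feature_names=feature_names, filled=True)` draws the same tree on a Matplotlib axis.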
- Clone the Repository:
git clone https://github.com/Viraj97-SL/Machine-learning-Assignment01.git
cd Machine-learning-Assignment01
- Install Dependencies:
- Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate  # On macOS/Linux
venv\Scripts\activate  # On Windows
- Install the required Python libraries:
pip install pandas scikit-learn matplotlib seaborn graphviz
- Install Graphviz software:
- You must also install the Graphviz software on your system.
- Windows: Download the installer from http://www.graphviz.org/download/ and add the `bin` directory to your system's PATH.
- macOS: `brew install graphviz`
- Linux (Debian/Ubuntu): `sudo apt-get install graphviz`
- Linux (Fedora/CentOS): `sudo yum install graphviz`
- Run the Jupyter Notebook:
- Start Jupyter Notebook:
jupyter notebook
- Open the `.ipynb` file and execute the cells.
- Python Libraries:
- `pandas` (for data manipulation)
- `scikit-learn` (for machine learning)
- `matplotlib` (for basic plotting)
- `seaborn` (for advanced plotting)
- `graphviz` (for decision tree visualization)
- Graphviz Software:
- Graphviz is required for the advanced tree visualization. You need to install it separately according to your operating system.
- `Assignment(ML&NN).ipynb`: Jupyter Notebook containing the code and analysis.
- `hearts.csv`: The dataset used in the analysis.
- `README.md`: This file.
- Ensure that the Graphviz `bin` directory is added to your system's PATH environment variable for the `graphviz` visualization to work correctly.
- The notebook assumes that the data file is in the same directory as the notebook. Update the file paths if necessary.
- Feel free to modify the code and parameters to experiment with different settings and datasets.
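The first two notes can be checked before running the notebook. The sketch below assumes that the Graphviz layout program is named `dot` (its standard executable) and that the dataset is `hearts.csv` in the current working directory. It only reports what it finds and changes nothing.

```python
import shutil
from pathlib import Path

# shutil.which returns the full path of an executable on PATH, or None.
dot_path = shutil.which("dot")
if dot_path is None:
    print("Graphviz 'dot' not found on PATH; graphviz rendering will fail.")
else:
    print(f"Graphviz found at: {dot_path}")

# The path is resolved against the current working directory, which for a
# Jupyter notebook is normally the folder containing the notebook.
data_file = Path("hearts.csv")
if data_file.exists():
    print(f"dataset found: {data_file.resolve()}")
else:
    print(f"dataset not found at {data_file.resolve()}; update the file path.")
```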