Breast Cancer Prediction - 5th Semester Data Analytics Project

Team name: Patterns n Parameters

This is the final project repository for our Data Analytics course. It classifies breast tumors as malignant or benign, and thereby predicts the likelihood of breast cancer. We used the Kaggle dataset linked below and implemented Logistic Regression, Naive Bayes, KNeighbors Classifier, Decision Tree Classifier, Random Forest Classifier and AdaBoost Classifier.

Dataset Description:

The dataset given to us was already cleaned and ready for pre-processing. Its main characteristics are:

Each record consists of 32 features, including tumor details such as radius, perimeter, texture, density and symmetry, as well as the diagnosis.
Each tumor belongs to one of 2 classes, benign or malignant.
400+ examples in the training set and 100+ in the test set.

Link to the original dataset: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data/tasks (Breast Cancer Wisconsin (Diagnostic) Data Set)

Libraries Used:

Pandas, Numpy, Matplotlib, Seaborn, Scipy

Steps:

Import all the libraries mentioned above along with os, warnings and datetime. Also import drive from google.colab, then read the dataset.
Perform data visualization to find out the state of the data, i.e., whether it is ready to use or needs pre-processing. Use pie charts, bar graphs, histograms, heat maps etc.
Pre-processing: either drop the entries with null values or impute them, i.e., replace them with the mean or median. We dropped the null/NaN entries.
Store the diagnosis in a separate list. Split the dataset into train and test sets in the ratio 80:20.
Fit each model and print its accuracy score.
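
The pre-processing, split and modelling steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. It makes several assumptions: the copy of the Breast Cancer Wisconsin (Diagnostic) data bundled with scikit-learn stands in for data.csv (so no Google Drive mount is needed, and the label column is called target rather than diagnosis), Naive Bayes is taken to be GaussianNB, every model uses default hyperparameters, and random_state=0 is an arbitrary seed. The accuracies it prints will therefore not necessarily match the figures reported in this README.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

warnings.filterwarnings("ignore")  # hide convergence/deprecation warnings

# Assumption: scikit-learn's bundled copy of the dataset replaces data.csv.
frame = load_breast_cancer(as_frame=True).frame

# Pre-processing: drop any rows that contain null/NaN values.
frame = frame.dropna()

# Store the diagnosis separately from the feature columns.
y = frame.pop("target")
X = frame

# 80:20 train/test split (the seed is an arbitrary choice for repeatability).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Assumption: default hyperparameters; GaussianNB chosen as the Naive Bayes variant.
models = {
    "Logistic Regression": LogisticRegression(),
    "Naive Bayes": GaussianNB(),
    "KNeighbors Classifier": KNeighborsClassifier(),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=0),
    "Random Forest Classifier": RandomForestClassifier(random_state=0),
    "AdaBoost Classifier": AdaBoostClassifier(random_state=0),
}

# Fit each model on the training set and report accuracy on the test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.2%}")
```

With 569 records, the 80:20 split yields 455 training and 114 test examples, consistent with the "400+" and "100+" figures in the dataset description.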

Conclusions and Performance Metrics

The accuracy of each model, from highest to lowest:

AdaBoost Classifier = 98.24 %
Random Forest Classifier = 95.61 %
Decision Tree Classifier = 94.78 %
KNeighbors Classifier = 70.18 %
Naive Bayes = 63.30 %
Logistic Regression = 58.82 %
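
For reference, accuracy is the fraction of examples whose predicted class matches the true diagnosis. The sketch below shows the arithmetic on made-up labels; the accuracy helper and the label lists are hypothetical and are not taken from the project's data or notebook.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only: "M" = malignant, "B" = benign.
y_true = ["M", "B", "B", "M", "B"]
y_pred = ["M", "B", "M", "M", "B"]

print(f"{accuracy(y_true, y_pred):.2%}")  # 4 of 5 correct -> 80.00%
```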

Acknowledgements

I'd like to thank Prof. Bharathi R for her guidance throughout the project. I'd also like to thank my teammates, Tankala Sunaina, Sanjana Murthy and Susan Mathew, for their contributions to the project.

