CIND820 Capstone Project: Classification of Water Quality samples using ML

annsam0115/Water-Classification-Capstone-Project

CIND820 - Big Data Analysis Project

This repository consists of the following:

  1. Final Code in Python/Jupyter Notebook
  2. Final Code saved in HTML format
  3. Final project report (doc)
  4. Final project report (pdf)
  5. PPT Final presentation
  6. Copy of the original dataset (https://www.kaggle.com/datasets/adityakadiwal/water-potability/code)

The project report follows the structure and level of detail of an academic paper. The uploaded code is written in Python and can be run in Google Colab.

The notebook includes headings and some comments to help readers follow the process of data analysis.

  1. Data preparation section includes general investigation of the dataset
  2. Handling of missing values, outliers, and class imbalance, with a walkthrough of each process
  3. Exploration of the dataset using ANOVA and PCA
  4. Preprocessing of the dataset
  5. Modeling using Logistic Regression, KNN, Decision Tree, Random Forest, SVM, and XGBoost classifiers
  6. Hyperparameter tuning for each of the prediction/classification models
  7. Comparison of the models' performance
  8. Cross Validation
  9. Conclusions
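
For readers who want a feel for the workflow before opening the notebook, the steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code. It assumes scikit-learn is available, uses synthetic data in place of the Kaggle dataset, picks Logistic Regression as one example from the listed models, and uses an arbitrary grid of `C` values. Median imputation and `class_weight="balanced"` are stand-ins for whatever missing-value and imbalance handling the notebook actually applies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the water-quality table (NOT the Kaggle data):
# 9 numeric features and a mildly imbalanced binary label.
X, y = make_classification(
    n_samples=600, n_features=9, n_informative=5,
    weights=[0.6, 0.4], random_state=0,
)

# Blank out roughly 5% of the values to mimic missing measurements.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Imputation and scaling live inside the pipeline so they are refit on
# each training fold, which avoids leaking validation data.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

# Hyperparameter tuning: grid search over the regularization strength C
# (illustrative values only).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    pipeline, {"model__C": [0.01, 0.1, 1.0, 10.0]}, cv=cv, scoring="accuracy"
)
search.fit(X_train, y_train)

# Cross-validate the tuned pipeline, then score it on the held-out split.
cv_scores = cross_val_score(search.best_estimator_, X_train, y_train, cv=cv)
test_accuracy = search.score(X_test, y_test)

print("best C:", search.best_params_["model__C"])
print("mean CV accuracy:", round(cv_scores.mean(), 3))
print("held-out accuracy:", round(test_accuracy, 3))
```

Swapping the `"model"` step for another classifier (and adjusting the parameter grid to match) would give the same tune-then-compare loop for each model in the list.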