CIND820 - Big Data Analysis Project

This repository consists of the following:

Final Code in Python/Jupyter Notebook
Final Code saved in HTML format
Final project report (doc)
Final project report (pdf)
PPT Final presentation
Copy of the original dataset (https://www.kaggle.com/datasets/adityakadiwal/water-potability/code)

The project report follows the structure and level of detail of an academic paper. The uploaded code is written in Python and can be opened in Google Colab.

The notebook includes headings and some comments to help readers follow the process of data analysis.

Data preparation, including a general investigation of the dataset
Handling of missing values, outliers, and class imbalance, with a walkthrough of each process
Exploration of the dataset using ANOVA and PCA
Preprocessing of the dataset
Modeling using Logistic Regression, KNN Regression, Decision Tree, Random Forest, SVM, and XGBoost Classifiers
Hyperparameter tuning for each of the prediction/classification models
Comparison of the models' performance
Cross Validation
Conclusions
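
The steps above (imputing missing values, preprocessing, tuning a model, and cross-validating it) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the notebook's actual code: the data is synthetic rather than read from water_potability.csv, and every function name here (`impute_median`, `standardize`, `knn_predict`, `cross_val_accuracy`, `make_synthetic`) is hypothetical. The notebook itself may rely on library implementations instead.

```python
import random
import statistics


def impute_median(train, test):
    """Fill None values with per-column medians computed on the training rows only."""
    medians = []
    for j in range(len(train[0])):
        observed = [row[j] for row in train if row[j] is not None]
        medians.append(statistics.median(observed))

    def fill(rows):
        return [[medians[j] if v is None else v for j, v in enumerate(row)]
                for row in rows]

    return fill(train), fill(test)


def standardize(train, test):
    """Scale each feature to zero mean and unit variance using training statistics."""
    columns = list(zip(*train))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

    return scale(train), scale(test)


def knn_predict(train_X, train_y, x, k):
    """Majority vote (labels 0/1) among the k nearest training points."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))

    nearest = sorted(range(len(train_X)), key=sq_dist)[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def cross_val_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean accuracy of k-NN over n_folds shuffled folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train_idx]
        test_X = [X[i] for i in fold]
        # Preprocessing is fitted inside each fold to avoid leaking test data.
        train_X, test_X = impute_median(train_X, test_X)
        train_X, test_X = standardize(train_X, test_X)
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, x, k) == y[i]
                      for x, i in zip(test_X, fold))
        scores.append(correct / len(fold))
    return statistics.mean(scores)


def make_synthetic(n=200, seed=42):
    """Synthetic stand-in for the real dataset: two features, ~10% missing values."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)]
        if rng.random() < 0.1:
            row[1] = None  # simulate a missing measurement
        X.append(row)
        y.append(label)
    return X, y


X, y = make_synthetic()
# Hyperparameter tuning: try several odd values of k and keep the best CV score.
results = {k: cross_val_accuracy(X, y, k) for k in (1, 3, 5, 7)}
best_k = max(results, key=results.get)
print(results, "best k:", best_k)
```

Fitting the imputation and scaling inside each fold, rather than once on the full dataset, keeps the held-out rows from influencing the preprocessing and so gives a more honest cross-validation estimate.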

Repository files:

CIND820_CapstoneProject_ASAM (Final).html
CIND820_CapstoneProject_ASAM.ipynb
CIND820_FInalReport_ASAM.docx
CIND820_FinalReport_ASAM.pdf
CIND820_PPT.pdf
README.md
water_potability.csv
