This repository consists of the following:
- Final Code in Python/Jupyter Notebook
- Final Code saved in HTML format
- Final project report (doc)
- Final project report (pdf)
- Final presentation (PPT)
- Copy of the original dataset (https://www.kaggle.com/datasets/adityakadiwal/water-potability/code)
The project report follows the structure and level of detail of an academic paper. The uploaded Python code can be opened and run in Google Colab.
The notebook includes headings and some comments to help readers follow the process of data analysis.
- Data preparation section includes general investigation of the dataset
- Handling of missing values, outliers, and class imbalance, with a walkthrough of each process
- Exploration of the dataset using ANOVA and PCA
- Preprocessing of the dataset
- Modeling using Logistic Regression, KNN, Decision Tree, Random Forest, SVM, and XGBoost classifiers
- Hyperparameter tuning for each of the prediction/classification models
- Comparison of the models' performance
- Cross Validation
- Conclusions
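
The modeling steps listed above (imputation, preprocessing, hyperparameter tuning, and cross-validated comparison) might look roughly like the sketch below. It is not the notebook's actual code. It assumes scikit-learn, uses a synthetic stand-in dataset instead of the Kaggle file, covers only two of the listed models, and the parameter grids and the `build_pipeline` helper are hypothetical choices made for illustration. Outlier handling, class rebalancing, ANOVA, and PCA are left out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption: numeric features,
# binary target, mildly imbalanced classes). Not the Kaggle data.
X, y = make_classification(
    n_samples=400, n_features=9, n_informative=5,
    weights=[0.6, 0.4], random_state=0,
)

# Inject roughly 5% missing values so the imputation step has work to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan


def build_pipeline(model):
    """Hypothetical helper: impute, scale, then fit the given classifier."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("model", model),
    ])


# Illustrative candidates and grids; the notebook's real grids may differ.
candidates = {
    "logreg": (
        LogisticRegression(max_iter=1000),
        {"model__C": [0.1, 1.0, 10.0]},
    ),
    "forest": (
        RandomForestClassifier(random_state=0),
        {"model__n_estimators": [50, 100], "model__max_depth": [None, 5]},
    ),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(build_pipeline(model), grid, cv=cv, scoring="accuracy")
    search.fit(X, y)
    results[name] = (search.best_params_, search.best_score_)

for name, (params, score) in results.items():
    print(f"{name}: best params={params}, mean CV accuracy={score:.3f}")
```

Wrapping imputation and scaling inside the pipeline means both are refit on the training portion of each fold, which avoids leaking information from the validation fold into preprocessing.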