A collection of machine learning mini-projects completed for the K. N. Toosi University of Technology Machine Learning course. Each project script is self-contained and saves its plots and reports to a dedicated folder.
Internet access is required for datasets fetched from OpenML and public repositories.
Machine-Learning-Projects/
├── P2.py # Mini Project 2: Naive Bayes, KNN+PCA, Decision Tree
├── P3.py # Mini Project 3: SVM (Air Quality) + PCA/LDA (Fashion-MNIST)
├── P4.py # Mini Project 4: McCulloch-Pitts, Weather NN, Q-learning (optional)
├── requirements.txt # Base Python dependencies
├── project2_plots/ # Generated plots for P2
├── project3_data/ # Engineered dataset artifacts for P3
├── project3_plots/ # Generated plots for P3
├── project4_plots/ # Generated plots for P4
├── report_plots/ # Duplicated/curated plots for reports
└── Figure_1.png, Figure_2.png # Additional figures
Python 3.9–3.10 is recommended on Windows; TensorFlow is likely easiest to install with Python 3.10.
- Create and activate a virtual environment (PowerShell on Windows):
py -3.10 -m venv .venv
.\.venv\Scripts\Activate.ps1
- Install base requirements:
pip install -r requirements.txt
- Install extra packages used by P3 and P4:
pip install imbalanced-learn liac-arff tensorflow
Notes:
- If fetch_openml raises a parser warning, installing liac-arff helps (already listed above).
- If you have trouble installing TensorFlow on Windows (CPU), try pinning a specific version with Python 3.10, e.g.
  pip install tensorflow==2.13.*
All scripts save their plots into their respective projectX_plots/ folders and print progress to the console. Ensure you have an active internet connection for dataset downloads.
File: P2.py
This script runs three tasks sequentially:
- Spam detection via Multinomial Naive Bayes (from scratch and scikit-learn).
- KNN digit classification on MNIST-784 with k-tuning and PCA-based dimensionality reduction.
- Decision Tree classification on the Carseats dataset with hyperparameter tuning and visualization.
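As a rough illustration of the "from scratch" Naive Bayes step, here is a minimal sketch of multinomial Naive Bayes with Laplace (add-alpha) smoothing. It assumes lowercase whitespace tokenization and a tiny hypothetical in-memory corpus instead of the SMS Spam file; the function names are illustrative and are not taken from P2.py.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class log-priors and Laplace-smoothed word log-likelihoods (hypothetical helper)."""
    vocab = {w for doc in docs for w in doc.lower().split()}
    classes = sorted(set(labels))
    log_prior, log_lik = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d.lower().split())
        total = sum(counts.values())
        # Add-alpha smoothing so a word unseen in one class never gets zero probability
        log_lik[c] = {w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                      for w in vocab}
    return classes, log_prior, log_lik

def predict(model, doc):
    classes, log_prior, log_lik = model
    scores = {}
    for c in classes:
        # Sum log-probabilities instead of multiplying probabilities (avoids underflow);
        # words outside the training vocabulary are ignored
        scores[c] = log_prior[c] + sum(
            log_lik[c][w] for w in doc.lower().split() if w in log_lik[c])
    return max(scores, key=scores.get)

# Toy corpus (made up for illustration, not from the SMS Spam dataset)
docs = ["win a free prize now", "free cash claim now", "see you at lunch", "are we meeting today"]
labels = ["spam", "spam", "ham", "ham"]
model = train_multinomial_nb(docs, labels)
print(predict(model, "claim your free prize"))  # spam
print(predict(model, "lunch today"))            # ham
```

scikit-learn's MultinomialNB applies the same smoothing through its alpha parameter, which makes it a convenient cross-check for a hand-written version.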
Run:
python P2.py
Outputs (saved to project2_plots/):
chart1_knn_vs_k.png
chart2_knn_vs_pca.png
chart3_dt_confusion_matrix.png
chart4_optimized_tree.png
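Judging by their names, chart1_knn_vs_k.png and chart2_knn_vs_pca.png plot KNN performance against k and against PCA dimensionality. The sweep behind such plots can be sketched in NumPy as follows. This assumes synthetic Gaussian data in place of MNIST-784 and a hand-written PCA (via SVD) and KNN (majority vote) rather than whatever estimators P2.py actually uses; all function names are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

def knn_predict(X_train, y_train, X_test, k):
    # Squared Euclidean distance between every test point and every training point
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    # Majority vote among the k nearest labels (ties go to the smaller label)
    return np.array([np.bincount(y_train[idx]).argmax() for idx in nearest])

# Synthetic two-class data standing in for MNIST-784
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 20)), rng.normal(1.5, 1.0, (100, 20))])
y = np.array([0] * 100 + [1] * 100)
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

for k in (1, 3, 5, 7):
    acc = (knn_predict(X_tr, y_tr, X_te, k) == y_te).mean()
    print(f"k={k}: accuracy={acc:.2f}")

for n in (2, 5, 10):
    mean, comps = pca_fit(X_tr, n)  # fit PCA on the training split only
    pred = knn_predict(pca_transform(X_tr, mean, comps), y_tr,
                       pca_transform(X_te, mean, comps), k=5)
    print(f"PCA {n} dims: accuracy={(pred == y_te).mean():.2f}")
```

Fitting PCA on the training split only and reusing its mean and components for the test split keeps test information out of the projection.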
Datasets Used:
- SMS Spam: https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv
- MNIST-784: OpenML via sklearn.datasets.fetch_openml
- Carseats: https://raw.githubusercontent.com/JWarmenhoven/ISLR-python/master/Notebooks/Data/Carseats.csv
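To make the Decision Tree step more concrete, the search for the best split on a single numeric feature using Gini impurity can be sketched as below. This is an illustration under assumptions: the feature values and yes/no labels are made-up toy numbers loosely modeled on Carseats-style columns (not real data), and P2.py may well rely on a library implementation instead of code like this.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Threshold on one numeric feature that minimizes the weighted child Gini impurity."""
    best = (None, gini(labels))  # (threshold, impurity); "no split" keeps the parent impurity
    pairs = sorted(zip(values, labels))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical toy values: a price-like feature and a binary "high sales" label
prices = [80, 95, 100, 120, 130, 150]
high_sales = ["yes", "yes", "yes", "no", "no", "no"]
print(best_split(prices, high_sales))  # (110.0, 0.0)
```

A full tree applies this search recursively to every feature; hyperparameters such as maximum depth or minimum samples per leaf then decide when the recursion stops, which is what tuning adjusts.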
File: P3.py
This script has two parts:
- SVM classification of engineered air quality categories for Beijing with SMOTE, scaling, and GridSearchCV.
  - Saves engineered data to project3_data/beijing_aq_engineered.csv.
  - Saves evaluation plots to project3_plots/.
- Dimensionality reduction on Fashion-MNIST:
  - Explained variance analysis with PCA.
  - Denoising reconstruction using PCA.
  - 2D visualizations comparing PCA vs. LDA.
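SMOTE balances classes by creating synthetic minority samples on line segments between a minority point and one of its nearest minority-class neighbors. P3.py presumably uses imbalanced-learn for this (it is among the extra packages installed above). The NumPy sketch below only illustrates the interpolation idea on random toy data; the function name and parameters are hypothetical, and it omits the scaling and grid search steps.

```python
import numpy as np

def smote(X_minority, n_new, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward random nearest neighbors."""
    rng = np.random.default_rng(seed)
    n = len(X_minority)
    k = min(k, n - 1)
    # Pairwise squared distances within the minority class; a point is not its own neighbor
    d2 = ((X_minority[:, None, :] - X_minority[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    neighbors = np.argsort(d2, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_minority.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                # random minority sample
        b = neighbors[a, rng.integers(k)]  # one of its k nearest minority neighbors
        gap = rng.random()                 # position along the segment from a to b
        synthetic[i] = X_minority[a] + gap * (X_minority[b] - X_minority[a])
    return synthetic

rng = np.random.default_rng(1)
X_min = rng.normal(size=(10, 3))  # toy minority class
X_new = smote(X_min, n_new=20, k=3)
print(X_new.shape)  # (20, 3)
```

In practice, imblearn.over_sampling.SMOTE placed inside an imblearn.pipeline.Pipeline (together with the scaler and the SVM) lets GridSearchCV resample only the training folds, so synthetic points never leak into validation folds.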
Run:
python P3.py
Outputs (saved to project3_plots/):
plot_pca_explained_variance.png
plot_pca_reconstruction.png
plot_pca_vs_lda.png
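The explained-variance and reconstruction plots rest on two PCA operations: the fraction of variance per component, and projecting onto the top components and mapping back. A minimal NumPy sketch, assuming synthetic low-rank data plus Gaussian noise in place of Fashion-MNIST (the helper names are hypothetical), could look like this:

```python
import numpy as np

def pca_explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

def pca_denoise(X_noisy, n_components):
    """Project onto the top principal components and map back to the original space."""
    mean = X_noisy.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_noisy - mean, full_matrices=False)
    W = Vt[:n_components]
    return (X_noisy - mean) @ W.T @ W + mean

rng = np.random.default_rng(0)
# "Clean" data of rank 3 in 50 dimensions, plus isotropic noise
clean = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 50))
noisy = clean + 0.3 * rng.normal(size=clean.shape)

ratios = pca_explained_variance_ratio(noisy)
n90 = int(np.searchsorted(np.cumsum(ratios), 0.90) + 1)  # components needed for 90% variance
print("components for 90% variance:", n90)

denoised = pca_denoise(noisy, n_components=3)
print("MSE noisy vs clean:   ", np.mean((noisy - clean) ** 2))
print("MSE denoised vs clean:", np.mean((denoised - clean) ** 2))
```

Denoising works here because the signal concentrates in a few components while the noise is spread across all 50; scikit-learn's PCA exposes the same quantities through explained_variance_ratio_ and inverse_transform.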
plot_svm_final_confusion_matrix.png
Datasets Used:
- Beijing PM2.5 Data: UCI ML Repository, https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv
- Fashion-MNIST: OpenML via sklearn.datasets.fetch_openml
Tips:
- The OpenML downloads can be slow and memory-intensive; consider working with a subset of the samples if needed.
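One way to limit samples, sketched here under the assumption that the data are already NumPy arrays (e.g. from fetch_openml with as_frame=False), is to draw a fixed number of rows per class so the class balance is preserved. The helper name is hypothetical, and the arrays below are stand-ins rather than real data. Note that this reduces downstream memory and compute; it does not shrink the download itself.

```python
import numpy as np

def stratified_subsample(X, y, n_per_class, seed=0):
    """Keep at most n_per_class randomly chosen rows from every class."""
    rng = np.random.default_rng(seed)
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        keep.append(rng.choice(idx, size=min(n_per_class, len(idx)), replace=False))
    keep = np.sort(np.concatenate(keep))  # preserve the original row order
    return X[keep], y[keep]

# Stand-in arrays: 10 classes with 100 rows each
X = np.arange(1000 * 4).reshape(1000, 4)
y = np.repeat(np.arange(10), 100)
X_small, y_small = stratified_subsample(X, y, n_per_class=20)
print(X_small.shape)  # (200, 4)
```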
File: P4.py
This script includes:
- McCulloch-Pitts neuron network that decides whether a point lies inside a triangle; generates a plot.
- Weather prediction using a feedforward neural network (Keras/TensorFlow) on a sliding-window dataset.
  - Requires weather_prediction_dataset.csv in the project root. If the file is missing, the script prints a helpful message and skips training.
- Optional Q-learning agent for the Wumpus World (commented out by default).
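A triangle is the intersection of three half-planes, so a two-layer network of threshold units can test membership: one unit per edge, then an AND unit. The sketch below assumes a hypothetical triangle with vertices (0,0), (4,0) and (0,4) and hand-picked weights; the triangle and weights used in P4.py may differ.

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts-style unit: fire (1) if the weighted sum reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

def inside_triangle(x, y):
    # Hidden layer: one unit per edge of the hypothetical triangle (0,0), (4,0), (0,4)
    above_x_axis = mp_neuron([x, y], [0, 1], 0)         # y >= 0
    right_of_y_axis = mp_neuron([x, y], [1, 0], 0)      # x >= 0
    below_hypotenuse = mp_neuron([x, y], [-1, -1], -4)  # x + y <= 4
    # Output unit: logical AND of the three half-plane tests
    return mp_neuron([above_x_axis, right_of_y_axis, below_hypotenuse], [1, 1, 1], 3)

print(inside_triangle(1, 1))  # 1
print(inside_triangle(3, 3))  # 0
```

Points on an edge count as inside because every unit uses a non-strict comparison.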
Run:
python P4.py
Outputs (saved to project4_plots/):
plot1_mcculloch_pitts.png
plot2_simple_nn_loss.png
plot3_deep_nn_loss.png
plot4_q_learning_reward.png (only if you uncomment the Q-learning section)
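The weather networks train on a sliding-window dataset: each input is a block of consecutive time steps and the target is the value that follows. Turning a series into such pairs can be sketched as follows. The window length, horizon and toy temperature values are assumptions, and the actual columns of weather_prediction_dataset.csv are not reproduced here.

```python
import numpy as np

def make_windows(series, window, horizon=1):
    """Split a 1-D or 2-D (time x features) array into input windows and future targets."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        y.append(series[start + window + horizon - 1])
    return np.array(X), np.array(y)

temps = np.array([10.0, 11.0, 12.5, 13.0, 12.0, 11.5])  # toy daily temperatures
X, y = make_windows(temps, window=3)
print(X.shape, y.shape)  # (3, 3) (3,)
```

For a Dense (feedforward) Keras model, multi-feature windows would typically be flattened to shape (samples, window * features) before training.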
To enable Q-learning in P4.py, uncomment the line near the bottom:
# question_three_wumpus_world()
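The Wumpus World agent itself is not reproduced here. As a hedged illustration of the tabular Q-learning update it presumably relies on, Q(s,a) ← Q(s,a) + α[r + γ·max over a' of Q(s',a') − Q(s,a)], the sketch below uses a hypothetical five-cell corridor with a +1 reward at the last cell; the environment and hyperparameters are made up for illustration.

```python
import random

def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor: start at cell 0, reward +1 on reaching the last cell."""
    rng = random.Random(seed)
    actions = (-1, +1)  # move left / move right
    Q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection (ties broken toward "right")
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = min(max(s + actions[a], 0), goal)  # walls at both ends
            r = 1.0 if s_next == goal else 0.0
            # The goal is terminal, so no future value is bootstrapped from it
            target = r if s_next == goal else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q

Q = q_learning()
print([("L", "R")[row.index(max(row))] for row in Q[:-1]])  # greedy action per non-goal cell
```

Per-episode returns from a loop like this are the kind of quantity a reward plot such as plot4_q_learning_reward.png would track.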
Troubleshooting:
- OpenML/Network:
  - Ensure you are online; some datasets are fetched at runtime.
  - If downloads are slow or fail, retry later or configure an OpenML API key/cache.
- TensorFlow on Windows:
  - Prefer Python 3.10 and TensorFlow 2.13+ for CPU installs.
  - If GPU acceleration is desired, note that TensorFlow 2.11 and later no longer support GPUs on native Windows (use WSL2 instead), and install CUDA/cuDNN versions compatible with your TensorFlow version.
- Virtual environment activation policy (PowerShell):
  - If activation is blocked, run PowerShell as Administrator and execute:
    Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
This is a coursework repository. If you plan to extend it or accept contributions, consider adding a LICENSE file and contribution guidelines.