Ultimate Machine Learning Algorithms, published by Orange, AVA™
This repository contains comprehensive implementations of machine learning algorithms organized by chapters, covering topics from linear regression to advanced ensemble methods, clustering, dimensionality reduction, recommender systems, anomaly detection, and spam email classification.
- Chapter 2: Regression Algorithms (Linear, Polynomial, Robust, Quantile Regression)
- Chapter 3: Classification Algorithms (Logistic Regression, Naive Bayes, SVM, Decision Trees, Neural Networks)
- Chapter 4: Ensemble Methods (Random Forest, AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost, Stacking)
- Chapter 5: Model Evaluation (Metrics, Cross-Validation, Performance Analysis)
- Chapter 6: Clustering Algorithms (K-Means, Hierarchical, DBSCAN, OPTICS, Spectral, Gaussian Mixture)
- Chapter 7: Dimensionality Reduction (PCA, SVD, t-SNE, UMAP, Kernel PCA, Autoencoders)
- Chapter 8: Clustering Validation (Hopkins Statistics, Silhouette, Gap Statistics)
- Chapter 9: Recommender Systems (Content-Based, Collaborative Filtering, Matrix Factorization)
- Chapter 10: Anomaly Detection (Statistical Methods, Clustering-Based, Isolation Forest, Autoencoders)
- Chapter 11: Spam Email Classification (Preprocessing, Extraction, Selection, Text Classification)
- Python 3.8 or higher
- pip (Python package manager)
- virtualenv or venv
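The first prerequisite can be confirmed by asking the interpreter directly. The following is a minimal sketch, not part of the repository; the helper names `meets_minimum_version` and `in_virtual_environment` are hypothetical. The environment check assumes `venv` or a recent `virtualenv` release, both of which set `sys.base_prefix`.

```python
import sys

MINIMUM_VERSION = (3, 8)  # taken from the prerequisites list above


def meets_minimum_version(version_info=None):
    """Return True if the given (or running) interpreter is 3.8 or newer."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MINIMUM_VERSION


def in_virtual_environment():
    """Return True when running inside a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python version OK:", meets_minimum_version())
print("Virtual environment active:", in_virtual_environment())
```

Running it once before setup and once after activating the environment should show the second line change from `False` to `True`.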
git clone <repository-url>
cd Ultimate-Machine-Learning-Algorithms
# Create virtual environment (macOS/Linux)
python3 -m venv test
# Activate virtual environment
source test/bin/activate
# Create virtual environment (Windows)
python -m venv test
# Activate virtual environment
test\Scripts\activate
Once the virtual environment is activated, install all required packages:
pip install --upgrade pip
pip install -r requirements.txt
Note: Chapter 11 scripts automatically download NLTK data when first run. Alternatively, you can manually download them:
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab'); nltk.download('stopwords'); nltk.download('wordnet')"
Required NLTK packages:
- punkt: Sentence tokenizer
- punkt_tab: Updated tokenizer models
- stopwords: Common stopwords
- wordnet: Lexical database for lemmatization
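To download only what is missing, each package can be looked up before it is fetched. This is a minimal sketch under stated assumptions, not the logic the Chapter 11 scripts use: the helper names are hypothetical, and the resource paths in `NLTK_RESOURCES` are the locations assumed for each package. If a lookup misses a package that is already installed, the only cost is that `nltk.download` is called again.

```python
# Assumed resource path for each required package, as used by nltk.data.find.
NLTK_RESOURCES = {
    "punkt": "tokenizers/punkt",
    "punkt_tab": "tokenizers/punkt_tab",
    "stopwords": "corpora/stopwords",
    "wordnet": "corpora/wordnet",
}


def missing_packages(find):
    """Return the package names whose resource path `find` cannot locate.

    `find` is any callable that raises LookupError for a missing resource,
    which is how nltk.data.find behaves.
    """
    missing = []
    for package, path in NLTK_RESOURCES.items():
        try:
            find(path)
        except LookupError:
            missing.append(package)
    return missing


def ensure_nltk_data():
    """Download any of the required NLTK packages that are not yet present."""
    import nltk  # imported here so missing_packages works without NLTK

    for package in missing_packages(nltk.data.find):
        nltk.download(package, quiet=True)
```

Calling `ensure_nltk_data()` once, after `pip install -r requirements.txt`, has the same effect as the one-line command above but skips packages that are already on disk.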
Navigate to any chapter directory and run the Python scripts:
# Example: Running Linear Regression from Chapter 2
python Chapter2/LinearRegression.py
# Example: Running XGBoost from Chapter 4
python Chapter4/XGBoost.py
# Example: Running K-Means from Chapter 6
python Chapter6/Kmeans.py
You can also run scripts from the project root:
python Chapter3/NeuralNetwork.py
python Chapter7/UMAP.py
python Chapter10/IsolationForestAnomaly.py
The project uses the following main libraries:
- Core Libraries: NumPy, Pandas, SciPy
- Visualization: Matplotlib, Seaborn
- Machine Learning: Scikit-learn
- Gradient Boosting: XGBoost, LightGBM, CatBoost
- Deep Learning: TensorFlow/Keras
- Dimensionality Reduction: UMAP
- Text Processing: NLTK, BeautifulSoup4
- Statistical Testing: diptest
See requirements.txt for the complete list with version specifications.
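After installation, a quick check can report which of the main libraries are importable. The sketch below is illustrative only: `find_missing` is a hypothetical helper, and the import names in `MAIN_MODULES` are assumptions based on each package's usual import name (for example, scikit-learn is imported as `sklearn` and BeautifulSoup4 as `bs4`).

```python
from importlib.util import find_spec

# Assumed import names for the libraries listed above.
MAIN_MODULES = [
    "numpy", "pandas", "scipy", "matplotlib", "seaborn", "sklearn",
    "xgboost", "lightgbm", "catboost", "tensorflow", "umap", "nltk",
    "bs4", "diptest",
]


def find_missing(modules):
    """Return the module names that cannot be located for import."""
    # find_spec locates a top-level module without importing it, so the
    # check stays fast even for heavy packages.
    return [name for name in modules if find_spec(name) is None]


missing = find_missing(MAIN_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All main libraries are importable.")
```

The check only confirms that each module can be found; it does not verify that the installed versions match requirements.txt.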
When you're done working, deactivate the virtual environment:
deactivate
If you encounter issues installing TensorFlow:
- On macOS with Apple Silicon (M1/M2): Use tensorflow-macos
- Consider using a specific version:
pip install tensorflow==2.13.0
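Whether the Apple Silicon advice applies can be read from the `platform` module. The following sketch simply mirrors the note above; `suggested_tensorflow_package` is a hypothetical helper, and it assumes a native arm64 Python (an interpreter running under Rosetta reports `x86_64` instead).

```python
import platform


def suggested_tensorflow_package(system=None, machine=None):
    """Return the pip package name the note above suggests for a platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # Apple Silicon Macs report system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "tensorflow-macos"
    return "tensorflow"


print("Suggested package:", suggested_tensorflow_package())
```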
The gradient boosting libraries (such as XGBoost and LightGBM) may require the OpenMP runtime as an additional system dependency:
- On macOS: brew install libomp
- On Linux: sudo apt-get install libgomp1
If automatic NLTK downloads fail, manually download:
import nltk
nltk.download('all')
- Some scripts generate visualizations that will display in popup windows
- Generated plots may also be saved as PNG files in the project directory
- Some scripts (especially deep learning models) may take time to train
- Ensure sufficient disk space for datasets (some scripts download data automatically)
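On a machine without a display, or when running many scripts in a row, the popup windows mentioned above can be suppressed by selecting Matplotlib's non-interactive Agg backend through the `MPLBACKEND` environment variable. The sketch below is one possible approach, not something the repository provides; `headless_env` and `run_headless` are hypothetical helpers. A script that selects a backend itself with `matplotlib.use(...)` would override this setting.

```python
import os
import subprocess
import sys


def headless_env(base_env=None):
    """Return a copy of the environment that selects Matplotlib's Agg backend."""
    env = dict(os.environ if base_env is None else base_env)
    env["MPLBACKEND"] = "Agg"  # non-interactive backend: no popup windows
    return env


def run_headless(script_path):
    """Run one chapter script with the current interpreter, without popups."""
    return subprocess.run(
        [sys.executable, script_path],
        env=headless_env(),
        check=False,  # leave return-code handling to the caller
    )
```

From the project root, `run_headless("Chapter2/LinearRegression.py")` would run that script with the active interpreter. Plots that a script saves to PNG are still written; only the interactive windows are skipped.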
Ultimate-Machine-Learning-Algorithms/
├── Chapter2/ # Regression algorithms
├── Chapter3/ # Classification algorithms
├── Chapter4/ # Ensemble methods
├── Chapter5/ # Supervised ML Model evaluation
├── Chapter6/ # Clustering algorithms
├── Chapter7/ # Dimensionality reduction
├── Chapter8/ # Clustering evaluation
├── Chapter9/ # Recommender systems
├── Chapter10/ # Anomaly detection
├── Chapter11/ # Spam Classification
├── requirements.txt # Python dependencies
└── README.md # This file
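Given this layout, the available scripts can be enumerated per chapter. The sketch below is illustrative and assumes only the `ChapterN/` directory naming shown in the tree; `list_chapter_scripts` is a hypothetical helper.

```python
import re
from pathlib import Path


def list_chapter_scripts(root="."):
    """Map each ChapterN directory under `root` to its sorted .py file names."""
    chapters = {}
    for directory in Path(root).iterdir():
        match = re.fullmatch(r"Chapter(\d+)", directory.name)
        if directory.is_dir() and match:
            scripts = sorted(p.name for p in directory.glob("*.py"))
            chapters[int(match.group(1))] = (directory.name, scripts)
    # Sort numerically so Chapter10 follows Chapter9 rather than Chapter1.
    return {name: scripts for _, (name, scripts) in sorted(chapters.items())}


for name, scripts in list_chapter_scripts(".").items():
    print(name, scripts)
```

Run from the project root, this prints one line per chapter directory with the scripts it contains.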