ava-orange-education/Ultimate-Machine-Learning-Algorithms-with-Python


Ultimate-Machine-Learning-Algorithms

Ultimate Machine Learning Algorithms, published by Orange, AVA™

Overview

This repository contains implementations of machine learning algorithms, organized by chapter. Topics range from linear regression and ensemble methods to clustering, dimensionality reduction, recommender systems, anomaly detection, and spam email classification.

Table of Contents

  • Chapter 2: Regression Algorithms (Linear, Polynomial, Robust, Quantile Regression)
  • Chapter 3: Classification Algorithms (Logistic Regression, Naive Bayes, SVM, Decision Trees, Neural Networks)
  • Chapter 4: Ensemble Methods (Random Forest, AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost, Stacking)
  • Chapter 5: Model Evaluation (Metrics, Cross-Validation, Performance Analysis)
  • Chapter 6: Clustering Algorithms (K-Means, Hierarchical, DBSCAN, OPTICS, Spectral, Gaussian Mixture)
  • Chapter 7: Dimensionality Reduction (PCA, SVD, t-SNE, UMAP, Kernel PCA, Autoencoders)
  • Chapter 8: Clustering Validation (Hopkins Statistic, Silhouette, Gap Statistic)
  • Chapter 9: Recommender Systems (Content-Based, Collaborative Filtering, Matrix Factorization)
  • Chapter 10: Anomaly Detection (Statistical Methods, Clustering-Based, Isolation Forest, Autoencoders)
  • Chapter 11: Spam Email Classification (Preprocessing, Feature Extraction, Feature Selection, Text Classification)

Prerequisites

  • Python 3.8 or higher
  • pip (Python package manager)
  • virtualenv or venv
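
Before setting anything up, the first prerequisite can be confirmed with a short check like the one below. This is a minimal sketch and not part of the repository; the names `meets_minimum` and `in_virtualenv` are illustrative assumptions.

```python
import sys

MIN_VERSION = (3, 8)  # minimum stated in the prerequisites above


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the interpreter version is at least `minimum`."""
    current = tuple((version_info or sys.version_info)[:2])
    return current >= minimum


def in_virtualenv():
    """Return True when running inside a venv or a recent virtualenv."""
    # Inside an environment, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print("Python", sys.version.split()[0], status)
    print("Virtual environment active:", in_virtualenv())
```

Running it again after activating the environment in step 2 should report that a virtual environment is active.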

Setup Instructions

1. Clone the Repository

git clone <repository-url>
cd Ultimate-Machine-Learning-Algorithms

2. Create a Virtual Environment

On macOS/Linux:

# Create virtual environment
python3 -m venv test

# Activate virtual environment
source test/bin/activate

On Windows:

# Create virtual environment
python -m venv test

# Activate virtual environment
test\Scripts\activate

3. Install Dependencies

Once the virtual environment is activated, install all required packages:

pip install --upgrade pip
pip install -r requirements.txt

4. Download NLTK Data (Required for Chapter 11)

Note: Chapter 11 scripts automatically download the NLTK data they need when first run. Alternatively, you can download it manually:

python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab'); nltk.download('stopwords'); nltk.download('wordnet')"

Required NLTK packages:

  • punkt - Sentence tokenizer
  • punkt_tab - Punkt tokenizer models in the tabular (non-pickle) format used by newer NLTK releases
  • stopwords - Common stopwords
  • wordnet - Lexical database for lemmatization
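
To avoid re-running the downloader for data that is already present, the four packages above can be checked first. The sketch below is an assumption about how one might do this, not the logic the Chapter 11 scripts use; `ensure_nltk_data` and `NLTK_RESOURCES` are hypothetical names, while `nltk.data.find` and `nltk.download` are the real NLTK calls.

```python
# Lookup paths as understood by nltk.data.find(); the keys are the
# download IDs of the four packages listed above.
NLTK_RESOURCES = {
    "punkt": "tokenizers/punkt",
    "punkt_tab": "tokenizers/punkt_tab",
    "stopwords": "corpora/stopwords",
    "wordnet": "corpora/wordnet",
}


def ensure_nltk_data(find=None, download=None):
    """Download only the NLTK packages that are not found locally.

    `find` and `download` default to nltk.data.find and nltk.download.
    They are parameters only so the logic can be exercised without NLTK.
    Returns the list of package IDs that had to be downloaded.
    """
    if find is None or download is None:
        import nltk  # imported lazily so this module loads without NLTK

        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for package, path in NLTK_RESOURCES.items():
        try:
            find(path)  # raises LookupError when the resource is missing
        except LookupError:
            download(package)
            fetched.append(package)
    return fetched
```

Calling `ensure_nltk_data()` with no arguments uses the installed NLTK. If a lookup path fails to match an already-downloaded package, the only cost is a redundant call to the downloader.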

Running the Code

Running Individual Scripts

Each chapter directory contains standalone Python scripts. The paths in the commands below are relative to the project root:

# Example: Running Linear Regression from Chapter 2
python Chapter2/LinearRegression.py

# Example: Running XGBoost from Chapter 4
python Chapter4/XGBoost.py

# Example: Running K-Means from Chapter 6
python Chapter6/Kmeans.py

Running from Project Root

From the project root, any script can be run by its relative path:

python Chapter3/NeuralNetwork.py
python Chapter7/UMAP.py
python Chapter10/IsolationForestAnomaly.py
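
To run every script in a chapter in one go, a small driver along these lines may help. It is a sketch under assumptions: `run_chapter` is a hypothetical helper that is not included in the repository, and it sets Matplotlib's `MPLBACKEND` environment variable to `Agg` so that plot windows do not block the batch.

```python
import os
import subprocess
import sys
from pathlib import Path


def run_chapter(chapter_dir, timeout=None):
    """Run every .py file in `chapter_dir`; return {script name: exit code}."""
    # A non-interactive Matplotlib backend keeps plt.show() from blocking.
    env = dict(os.environ, MPLBACKEND="Agg")
    results = {}
    for script in sorted(Path(chapter_dir).glob("*.py")):
        # sys.executable keeps each run inside the active virtual environment.
        completed = subprocess.run(
            [sys.executable, str(script)], env=env, timeout=timeout
        )
        results[script.name] = completed.returncode
    return results
```

For example, `run_chapter("Chapter2")` called from the project root returns a dictionary in which any non-zero value marks a script that failed. Scripts inherit the caller's working directory, so any relative data paths they use resolve against the project root.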

Dependencies

The project uses the following main libraries:

  • Core Libraries: NumPy, Pandas, SciPy
  • Visualization: Matplotlib, Seaborn
  • Machine Learning: Scikit-learn
  • Gradient Boosting: XGBoost, LightGBM, CatBoost
  • Deep Learning: TensorFlow/Keras
  • Dimensionality Reduction: UMAP
  • Text Processing: NLTK, BeautifulSoup4
  • Statistical Testing: diptest

See requirements.txt for the complete list with version specifications.
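
After installation, one way to confirm which of these libraries are present is to query the installed package metadata. The sketch below uses the standard-library `importlib.metadata` module. The distribution names are the usual PyPI names for the libraries listed above and are an assumption; requirements.txt remains the authoritative list.

```python
from importlib import metadata  # standard library since Python 3.8

# PyPI distribution names assumed for the libraries listed above.
DISTRIBUTIONS = [
    "numpy", "pandas", "scipy", "matplotlib", "seaborn", "scikit-learn",
    "xgboost", "lightgbm", "catboost", "tensorflow", "umap-learn",
    "nltk", "beautifulsoup4", "diptest",
]


def installed_versions(names=DISTRIBUTIONS):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:<16}{version or 'NOT INSTALLED'}")
```

A `None` entry means the distribution was not found under that exact name. TensorFlow installed as `tensorflow-macos`, for instance, would not appear under `tensorflow`.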

Deactivating Virtual Environment

When you're done working, deactivate the virtual environment:

deactivate

Troubleshooting

TensorFlow Installation Issues

If you encounter issues installing TensorFlow:

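  • On macOS with Apple Silicon (M1/M2): if the standard tensorflow package fails to install, use tensorflow-macos (needed for TensorFlow releases before 2.13)
  • Consider pinning a specific version: pip install tensorflow==2.13.0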
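
If it is unclear whether the Apple Silicon advice applies to a given machine, the platform can be checked from Python. This is a minimal sketch; `is_apple_silicon` is an illustrative name, not something the repository provides.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True for macOS running natively on an ARM64 (M-series) CPU."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    print("Apple Silicon:", is_apple_silicon())
```

A Python interpreter running under Rosetta reports `x86_64`, so this check returns False there even on M-series hardware.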

XGBoost/LightGBM Installation Issues

These libraries may require additional system dependencies:

  • On macOS: brew install libomp
  • On Linux (Debian/Ubuntu): sudo apt-get install libgomp1

NLTK Data Download Issues

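If automatic NLTK downloads fail, download the data manually from a Python session. Note that 'all' fetches every NLTK package, which is a large download: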

import nltk
nltk.download('all')

Notes

  • Some scripts generate visualizations that will display in popup windows
  • Generated plots may also be saved as PNG files in the project directory
  • Some scripts (especially deep learning models) may take time to train
  • Ensure sufficient disk space for datasets (some scripts download data automatically)

Project Structure

Ultimate-Machine-Learning-Algorithms/
├── Chapter2/          # Regression algorithms
├── Chapter3/          # Classification algorithms
├── Chapter4/          # Ensemble methods
├── Chapter5/          # Supervised ML model evaluation
├── Chapter6/          # Clustering algorithms
├── Chapter7/          # Dimensionality reduction
├── Chapter8/          # Clustering evaluation
├── Chapter9/          # Recommender systems
├── Chapter10/         # Anomaly detection
├── Chapter11/         # Spam email classification
├── requirements.txt   # Python dependencies
└── README.md          # This file

License

Published by Orange, AVA™
