Data Mining Algorithms From Scratch

This repository contains "from scratch" Python implementations of fundamental data mining and machine learning algorithms. The code is intentionally kept simple and easy to understand, designed for educational purposes and as a preparation resource for my Data Warehousing and Mining (DWM) lab exam.

Each implementation is self-contained, heavily commented, and uses only basic Python libraries.

📚 Algorithms Included

This project includes from-scratch implementations of the following algorithms:

1. K-Means Clustering
- File: kmeans.py
- Purpose: A popular partitioning algorithm that groups data points into $k$ clusters, where each point belongs to the cluster with the nearest mean (centroid).
- Distance: Uses Euclidean Distance.
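
The assign-then-update loop described above can be sketched as follows. This is a minimal illustration, not the code in the repository file: the function names, the fixed starting centroids, and the sample points are all assumptions made for the example.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, centroids, max_iters=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid

        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical sample data: two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```

Starting centroids are passed in explicitly here so the run is reproducible; picking them at random from the data is the more common choice.
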
2. K-Medoids Clustering (PAM)
- File: kmediode.py
- Purpose: A variation of K-Means that is more robust to outliers because it uses an actual data point (medoid) as the cluster center.
- Distance: Uses Manhattan Distance.
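
A rough sketch of the medoid-swap idea follows. It is a simplified, greedy variant of PAM written for illustration and may differ from the repository file: it accepts any swap that lowers the total cost, and the function names and sample points are assumptions.

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def total_cost(points, medoids):
    """Total distance from every point to its nearest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)


def k_medoids(points, medoids):
    """Swap medoids with non-medoids while a swap lowers the total cost."""
    medoids = list(medoids)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(len(medoids)):
            for candidate in points:
                if candidate in medoids:
                    continue
                # Try replacing medoid i with this data point.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids, best


# Hypothetical sample data: two groups of three points each.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
medoids, cost = k_medoids(points, medoids=[(2, 1), (9, 8)])
print(medoids, cost)  # → [(1, 1), (8, 8)] 4
```

Because every medoid is one of the input points, a single extreme value cannot drag a cluster center away from the data the way it can shift a mean.
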
3. Naive Bayes (Categorical)
- File: naivebayes.py
- Purpose: A probabilistic classifier based on Bayes' theorem with the "naive" assumption of feature independence. This implementation is designed for categorical data (e.g., "Sunny," "Hot").
- Features: Includes Laplace (add-1) smoothing, so a feature value never seen with a class during training does not force that class's probability to zero.
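
As an illustration of how add-1 smoothing enters the calculation, here is a minimal categorical Naive Bayes sketch. It is not the repository's code: the function names, the model layout, and the tiny weather-style dataset are assumptions made for the example.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count classes, per-class feature values, and distinct values per feature."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(i, label)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def predict(model, row):
    """Return the most probable class and the unnormalized score of each class."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, v in enumerate(row):
            # Laplace smoothing: add 1 to the count, and the number of
            # distinct values of this feature to the denominator.
            score *= (feature_counts[(i, label)][v] + 1) / (n + len(values[i]))
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical training data: (outlook, temperature) -> play?
rows = [("Sunny", "Hot"), ("Sunny", "Mild"), ("Rainy", "Mild"),
        ("Rainy", "Cool"), ("Overcast", "Hot")]
labels = ["No", "No", "Yes", "Yes", "Yes"]

model = train(rows, labels)
label, scores = predict(model, ("Overcast", "Mild"))
print(label, scores)
```

In this data "Overcast" never occurs with class "No". Without smoothing the score for "No" would be exactly zero; with add-1 smoothing it is small but positive (0.4 × 1/5 × 2/5 = 0.032).
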
4. Apriori Algorithm
- File: apriorialgo.py
- Purpose: The classic algorithm for Association Rule Mining. It discovers frequent itemsets in a transactional dataset (e.g., "items frequently bought together").
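- Logic: Implements the L(k-1) -> C(k) -> L(k) cycle: join the frequent (k-1)-itemsets L(k-1) into candidate k-itemsets C(k), prune any candidate that has an infrequent (k-1)-subset, then keep the candidates that meet the minimum support as L(k).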
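
The join-and-prune cycle can be sketched like this. It is a compact illustration rather than the repository's implementation: the function name, the use of an absolute support count for `min_support`, and the sample transactions are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count of transactions (an assumption of
    this sketch; a fraction would work equally well).
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # L(1): frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}

    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join: combine pairs from L(k-1) whose union has exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates with any (k-1)-subset that is not frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        # L(k): candidates that meet the minimum support.
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]
frequent = apriori(transactions, min_support=2)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these four transactions every single item and every pair is frequent, while the triple {bread, butter, milk} appears only once and is dropped.
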
5. PageRank
- File: pagerank.py
- Purpose: An algorithm that measures the importance of nodes in a directed graph based on its link structure. It's famously used by Google to rank web pages.
- Features: Includes the damping factor and handles dangling nodes (nodes with no outgoing links).
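
One common way to combine the damping factor with dangling-node handling is shown below. This is a sketch under assumptions, not necessarily what the repository's pagerank.py does: it spreads the rank of dangling nodes evenly over all nodes, expects every link target to appear as a key in the graph, and uses illustrative names and defaults.

```python
def pagerank(graph, damping=0.85, tol=1e-8, max_iters=100):
    """Power iteration over an adjacency dict {node: [nodes it links to]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iters):
        # Rank held by dangling nodes is shared equally among all nodes.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n
                    for v in nodes}

        # Every other node splits its rank evenly across its out-links.
        for u in nodes:
            for v in graph[u]:
                new_rank[v] += damping * rank[u] / len(graph[u])

        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical graph: C has no outgoing links, so it is a dangling node.
graph = {"A": ["B", "C"], "B": ["C"], "C": []}
rank = pagerank(graph)
print(rank)
```

Redistributing the dangling rank keeps the scores summing to 1 on every iteration; without it, rank would leak out of the graph through node C.
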

🎯 Project Goal

This repository does not aim for optimized, production-ready code. The focus is on clarity and readability: each file is heavily commented to explain the core logic step by step, so that it works as a study guide for how these algorithms operate internally.

🚀 How to Use

Each algorithm is a standalone Python script that includes a small, static dataset directly in the file (under the if __name__ == "__main__": block) for demonstration.

To run any of the algorithms, simply execute the file using Python:

# Example for K-Means
python kmeans.py

# Example for Apriori
python apriorialgo.py

# Example for Naive Bayes
python naivebayes.py

You can modify the dataset variable within any of the files to test the algorithms with your own simple data.

Repository: Soham-droid-pixel/Data-Mining-Algorithms
