core-datascience-models

A structured collection of machine learning notebooks covering core algorithms from supervised and unsupervised learning to NLP and time series forecasting. Each notebook includes line-by-line comments explaining both the code and the underlying concepts, with references where applicable.


Repository Structure

core-datascience-models/
├── Supervised_Learning/
│   ├── DecisionTree_CART_Classification.ipynb
│   ├── NaiveBayes_FromScratch_SpamDetection.ipynb
│   ├── NaiveBayes_TestScript_Iris.ipynb
│   ├── SVM_AirQuality_Pollution.ipynb
│   ├── SVM_Classification_Iris.ipynb
│   └── SVM_HeartAttack.ipynb
├── Unsupervised_Learning/
│   └── Clustering_AssociationRuleMining.ipynb
├── Regression_and_Neural_Networks/
│   ├── LogisticRegression_PoissonRegression_AdultIncome.ipynb
│   └── NeuralNetwork_Keras_LoanPrediction.ipynb
├── NLP/
│   └── SentimentAnalysis_NLP_Tweets.ipynb
├── Time_Series/
│   └── TimeSeries_ARIMA_Forecasting.ipynb
└── data_mining_practice/
    ├── Chapter_01_Data_Loading.ipynb
    ├── Chapter_03_Data_Preparation.ipynb
    ├── Chapter_04_EDA.ipynb
    ├── Chapter_05.ipynb ... Chapter_13.ipynb
    ├── SA01_BankMarketing.ipynb
    └── SA02_ChurnPrediction.ipynb

Notebooks

Supervised Learning

| Notebook | Algorithm | Dataset | Core Concept |
|---|---|---|---|
| DecisionTree_CART_Classification | Decision Tree (CART) | Custom | Recursive splitting, information gain, Gini impurity |
| NaiveBayes_FromScratch_SpamDetection | Naive Bayes | Spambase (UCI) | Built from scratch without sklearn: Gaussian likelihood, prior/posterior calculation |
| NaiveBayes_TestScript_Iris | Naive Bayes | Iris | Manual accuracy function, model evaluation |
| SVM_Classification_Iris | SVM | Iris | Hyperplane optimization, kernel tricks |
| SVM_AirQuality_Pollution | SVM | Air Quality / Pollution | Seaborn pair plots, margin analysis on real-world data |
| SVM_HeartAttack | SVM | Heart Attack Risk (PH) | Feature exploration, classification on health data |
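
As a rough illustration of the split criteria named for the decision tree notebook, the sketch below computes Gini impurity, entropy, and the impurity reduction of a candidate split. It is not taken from the notebook; the function names (`gini`, `entropy`, `split_gain`) and the toy labels are assumptions made for this example.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits, the impurity measure behind information gain."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_gain(parent, left, right, impurity=gini):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Hypothetical toy labels: a split that separates the classes perfectly.
parent = ["yes", "yes", "no", "no"]
left, right = ["yes", "yes"], ["no", "no"]

print(gini(parent))                              # Gini impurity of the parent node
print(split_gain(parent, left, right))           # Gini reduction of the split
print(split_gain(parent, left, right, entropy))  # information gain of the split
```

A recursive tree builder would evaluate `split_gain` for every candidate threshold at a node, keep the best one, and repeat on each child until a stopping rule is met.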

Unsupervised Learning

| Notebook | Algorithm | Dataset | Core Concept |
|---|---|---|---|
| Clustering_AssociationRuleMining | K-Means, Association Rules | Custom | Centroid clustering, Apriori, support/confidence/lift |
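
The three association rule metrics listed above can be sketched in a few lines of plain Python. This is an illustrative example only, not the notebook's implementation; the helper names and the toy transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so treat its confidence as zero
    return support(transactions, set(antecedent) | set(consequent)) / base


def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    base = support(transactions, consequent)
    if base == 0:
        return 0.0
    return confidence(transactions, antecedent, consequent) / base


# Hypothetical toy basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
print(lift(baskets, {"bread"}, {"milk"}))
```

A lift below 1 means the two itemsets co-occur less often than independence would predict, and a lift above 1 means they co-occur more often.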

Regression and Neural Networks

| Notebook | Algorithm | Dataset | Core Concept |
|---|---|---|---|
| LogisticRegression_PoissonRegression_AdultIncome | Logistic Regression, Poisson Regression | Adult Income (UCI) | MLE via statsmodels, multicollinearity check, model comparison |
| NeuralNetwork_Keras_LoanPrediction | Feedforward Neural Network | Loans (49,698 records) | Keras Sequential API, ReLU + Sigmoid activations, forward propagation |
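
To make the forward propagation step concrete, here is a minimal sketch of one ReLU hidden layer followed by a sigmoid output unit, written in plain Python rather than Keras. The layer sizes, weights, and function names are assumptions chosen for illustration and do not reflect the notebook's actual architecture.

```python
from math import exp


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer; `weights` holds one row per output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: ReLU hidden layer, then a single sigmoid output."""
    hidden = dense(x, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)[0]


# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
hidden_w = [[0.5, -0.5], [1.0, 1.0]]
hidden_b = [0.0, -1.0]
out_w = [[1.0, 0.5]]
out_b = [-1.0]

probability = forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
print(probability)
```

The sigmoid output lies strictly between 0 and 1, so it can be read as the predicted probability of the positive class in a binary task such as loan approval.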

NLP

| Notebook | Algorithm | Dataset | Core Concept |
|---|---|---|---|
| SentimentAnalysis_NLP_Tweets | Sentiment Analysis | Kaggle Tweet Dataset | Text preprocessing, emoticon-based labeling, binary classification |
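
Emoticon-based labeling can be sketched as follows: a tweet's emoticons supply a weak sentiment label, and are then stripped so a classifier cannot simply memorize them. The emoticon sets, the cleaning rules, and the function names here are assumptions for illustration; the notebook may use different ones.

```python
import re

# Hypothetical emoticon lexicons; a real project would likely use larger sets.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", ":'("}


def label_by_emoticon(text):
    """Return 1 (positive), 0 (negative), or None if absent or conflicting.

    Only whitespace-separated emoticons are detected in this sketch.
    """
    tokens = text.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos and not has_neg:
        return 1
    if has_neg and not has_pos:
        return 0
    return None


def clean(text):
    """Drop URLs, @mentions, and emoticons, then lowercase the rest."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    tokens = [t for t in text.split() if t not in POSITIVE | NEGATIVE]
    return " ".join(tokens).lower()


tweet = "@bob Loving this :) http://t.co/x"
print(label_by_emoticon(tweet), clean(tweet))
```

Tweets that return `None` would typically be left out of the training set, since they carry no usable label under this scheme.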

Time Series

| Notebook | Algorithm | Dataset | Core Concept |
|---|---|---|---|
| TimeSeries_ARIMA_Forecasting | ARIMA | Custom | pmdarima auto-ARIMA, stationarity testing, forecast evaluation |
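
Two of the ideas above can be shown without any libraries: differencing, which is the "I" in ARIMA and the usual way to remove a trend before fitting, and error metrics for comparing a forecast against held-out values. This sketch does not perform a statistical stationarity test and does not call pmdarima; the function names and numbers are hypothetical.

```python
from math import sqrt


def difference(series, lag=1):
    """Lagged differences: series[i] - series[i - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error; penalizes large misses more than MAE."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Hypothetical series with an accelerating upward trend.
series = [10, 12, 15, 19]
print(difference(series))              # first differences
print(difference(difference(series)))  # second differences

# Hypothetical held-out values and forecasts.
actual = [3, 5, 7]
forecast = [2, 5, 9]
print(mae(actual, forecast), rmse(actual, forecast))
```

In the toy series the second differences are constant, which is the kind of pattern that would suggest a differencing order of 2.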

Data Mining Practice (data_mining_practice/)

Textbook exercises that follow a structured data mining workflow, from data loading and preparation through EDA, modeling, and evaluation. The exercises use the Bank Marketing and Churn datasets from the UCI Machine Learning Repository.


Tools and Libraries

  • Python 3.11
  • scikit-learn — model training and evaluation
  • statsmodels — logistic and Poisson regression
  • keras / tensorflow — neural network construction
  • pandas, numpy — data manipulation
  • matplotlib, seaborn — visualization
  • pmdarima — auto-ARIMA for time series

Notes

  • All notebooks were developed and run on Google Colab.
  • The NaiveBayes_FromScratch notebook implements the algorithm without sklearn to demonstrate understanding of the underlying math.
  • Datasets used are either publicly available (UCI ML Repository, Kaggle) or course-provided.
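
For readers curious what a Naive Bayes classifier without sklearn involves, the following is a compact sketch of the Gaussian variant: per-class priors, per-feature means and variances, and a log-posterior comparison at prediction time. It is written for this README as an illustration and is not the notebook's code; the function names, the variance smoothing constant, and the toy data are assumptions.

```python
from collections import defaultdict
from math import log, pi


def fit(X, y):
    """Estimate the log prior and per-feature mean and variance for each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps the variance strictly positive (assumed value).
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (log(len(rows) / len(X)), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * log(2 * pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, row):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    def score(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances)
        )
    return max(model, key=score)


# Hypothetical two-feature, two-class toy data.
X = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.2]]
y = [0, 0, 1, 1]
model = fit(X, y)
print(predict(model, [1.1, 2.1]), predict(model, [5.1, 5.9]))
```

Working in log space avoids the numerical underflow that comes from multiplying many small likelihoods, which matters for a dataset with many features.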
