
This repository encompasses various techniques of Regression, Classification, Clustering, Dimensionality Reduction, Deep Learning, and Recommendation Systems.


KAFSALAH/IBM_MachineLearning


Welcome to my IBM—ML Repository 😄

Python, Jupyter Notebook, Pandas, NumPy, scikit-learn, Keras

This repository aims to build highly interpretable and accurate machine learning models that balance bias, variance, and time complexity. The machine learning models are built with scikit-learn, and the deep learning models with Keras 💡

Courses

The repository also contains the hands-on labs of six machine learning courses created by IBM, which cover numerous ML concepts in both depth and breadth.

Hands-on Labs: SQL, Hypothesis Testing, Feature Transformation, Scaling, Skewness & Importance.

Hands-on Labs: Cross-Validation, Ridge, Lasso, ElasticNet, Pipelines.

Hands-on Labs: Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Decision Tree, Random Forest, Extra Trees, Ensemble, Bagging, Boosting, Stacking, Model-Agnostic, Resampling Techniques.

Hands-on Labs: Principal Component Analysis, Distance Metrics, Inertia & Distortion, K-means, Hierarchical, DBSCAN, Mean Shift Clustering.

Hands-on Labs: Gradient Descent, Backpropagation, Artificial NN, Convolutional NN, Recurrent NN.

Hands-on Labs: Bag of Words, User-Profile Recommendation, Similarity-Index Recommendation.

Capstone Projects

You are welcome to explore my findings in the personal capstone projects I created during my learning journey.

• Aim: Predict the cost of medical treatments from six features: age, sex, BMI, number of children, smoking status, and region.

• Procedure: In-depth EDA using pair, bar, box, violin, and regression plots to examine the effect of smoking on charges, followed by hypothesis testing on the relationship between treatment costs and smoking status.

• Findings: The test indicates that a person with charges of $35K or more is likely a smoker (p-value = 0.023, significant at the 5% level).
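The procedure does not name the test behind the reported p-value. As an assumption, the finding can be read as a chi-square test of independence between two binary variables: smoker vs. non-smoker, and charges of at least $35K vs. below. The sketch below implements that test for a 2×2 table using only the standard library; the counts are hypothetical, not the project's data.

```python
import math


def chi2_independence_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    table[i][j] holds observed counts. Returns (statistic, p_value) for
    1 degree of freedom, without Yates' continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, NOT the project's data:
#         charges >= 35K, charges < 35K
table = [[30, 10],   # smokers
         [5, 55]]    # non-smokers
stat, p = chi2_independence_2x2(table)
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```

With SciPy available, `scipy.stats.chi2_contingency(table, correction=False)` should return the same statistic; by default it applies Yates' continuity correction to 2×2 tables.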

• Aim: Create a regression model that predicts the power generated by PV panels to facilitate energy management in power plants.

• Procedure: Build a pipeline of polynomial feature transformation, standard scaling, and a regressor. Then use GridSearchCV for hyperparameter tuning and benchmark plain Linear, Lasso, Ridge, Elastic Net, and Gradient Boosting regressors.

• Findings: The winner is the Gradient Boosting Regressor model with an R² score of ~0.79.
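A minimal sketch of the pipeline-plus-grid-search procedure, assuming scikit-learn and using a synthetic `make_regression` dataset in place of the PV data. "Regular" is read here as plain `LinearRegression`; the hyperparameter values are illustrative, not the ones used in the project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the PV dataset (hypothetical, 4 numeric features).
X, y = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("model", LinearRegression()),  # placeholder, replaced by the grid below
])

# Each dict benchmarks one regressor family with its own hyperparameters.
param_grid = [
    {"poly__degree": [1, 2], "model": [LinearRegression()]},
    {"poly__degree": [1, 2], "model": [Ridge()],
     "model__alpha": [0.1, 1.0, 10.0]},
    {"poly__degree": [1, 2], "model": [Lasso(max_iter=10_000)],
     "model__alpha": [0.1, 1.0]},
    {"poly__degree": [1, 2], "model": [ElasticNet(max_iter=10_000)],
     "model__alpha": [0.1, 1.0], "model__l1_ratio": [0.2, 0.8]},
    # Trees do not need polynomial terms, so degree 1 keeps this step cheap.
    {"poly__degree": [1], "model": [GradientBoostingRegressor(random_state=0)],
     "model__n_estimators": [100, 200]},
]

search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)
best_model_name = type(search.best_params_["model"]).__name__
test_r2 = search.score(X_test, y_test)  # R² of the refitted best pipeline
print(best_model_name, round(test_r2, 3))
```

Putting the estimator itself in `param_grid` lets a single `GridSearchCV` run compare all five regressor families on the same cross-validation splits.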

• Aim: Classify the faults that can occur in photovoltaic panels: Short-Circuit, Open-Circuit, Degradation, and Shadowing.

• Procedure: Stratified data split, feature scaling, and re-weighting of the imbalanced classes. Then use GridSearchCV for hyperparameter tuning and benchmark Logistic Regression, Decision Tree, and Random Forest.

• Findings: The winner is the Decision Tree algorithm with an accuracy and weighted F1-score of ~97%.
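The classification procedure can be sketched the same way. This assumes scikit-learn, a synthetic imbalanced four-class dataset standing in for the fault data, and `class_weight="balanced"` as the re-weighting scheme; the project's exact scheme and grid values are not stated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical imbalanced 4-class data (one class per fault type).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=4, weights=[0.55, 0.2, 0.15, 0.1],
                           random_state=0)

# stratify=y keeps the class proportions the same in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])

# class_weight="balanced" weights samples inversely to their class frequency.
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000, class_weight="balanced")],
     "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [DecisionTreeClassifier(class_weight="balanced", random_state=0)],
     "clf__max_depth": [3, 5, None]},
    {"clf": [RandomForestClassifier(class_weight="balanced", random_state=0)],
     "clf__n_estimators": [100, 200]},
]

search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1_weighted")
search.fit(X_train, y_train)
y_pred = search.predict(X_test)
acc = accuracy_score(y_test, y_pred)
f1_w = f1_score(y_test, y_pred, average="weighted")
print(type(search.best_params_["clf"]).__name__, round(acc, 3), round(f1_w, 3))
```

Scoring the search with `f1_weighted` rather than accuracy keeps the minority fault classes from being ignored during model selection.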

• Aim: Cluster date fruits based on their physical features.

• Procedure: Check for multicollinearity, scale the data, and reduce the number of features with PCA. Then compare K-means, Agglomerative, Mean Shift, and DBSCAN clustering.

• Findings: The winner is K-means with k-means++ initialization. An accuracy of 76% was achieved using only two principal components.
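A sketch of the scale, reduce, and cluster workflow, assuming scikit-learn and synthetic `make_blobs` data in place of the date-fruit measurements. Mean Shift and DBSCAN are left out for brevity. Because the findings report an accuracy for clustering, the sketch assumes ground-truth labels are available and matches cluster ids to labels with the Hungarian method; the project may have computed accuracy differently.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the fruit measurements: 3 groups, 6 features.
X, y_true = make_blobs(n_samples=300, n_features=6, centers=3, random_state=0)

X_scaled = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X_scaled)  # keep two components


def cluster_accuracy(y_true, y_pred):
    """Accuracy after the best one-to-one mapping of cluster ids to labels."""
    cm = confusion_matrix(y_true, y_pred)
    rows, cols = linear_sum_assignment(-cm)  # negate to maximise matches
    return cm[rows, cols].sum() / cm.sum()


models = {
    "kmeans++": KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=3),
}
results = {}
for name, model in models.items():
    labels = model.fit_predict(X_pca)
    results[name] = (silhouette_score(X_pca, labels),
                     cluster_accuracy(y_true, labels))
    print(name, [round(float(v), 3) for v in results[name]])
```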

• Aim: Detect whether a patient has a brain tumor or not.

• Procedure: Convert the images to NumPy arrays and scale them. Build and train a convolutional neural network (CNN) to classify brain tumors. Then deploy the deep learning model with a Flask app.

• Findings: The CNN model accuracy is 97%.
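A minimal Keras sketch of the preprocessing and CNN steps. The 64×64 grayscale input size, the layer sizes, and the binary tumor / no-tumor output are assumptions rather than the project's architecture, and the Flask deployment step is not shown.

```python
import numpy as np
import keras
from keras import layers

IMG_SIZE = 64  # hypothetical input size; the original image size is not stated


def to_scaled_array(images):
    """Stack equally sized grayscale uint8 images and scale pixels to [0, 1]."""
    arr = np.stack(images).astype("float32") / 255.0
    return arr[..., np.newaxis]  # add a channel axis: (n, H, W, 1)


def build_cnn(img_size=IMG_SIZE):
    """Small binary CNN: tumor (1) vs. no tumor (0)."""
    model = keras.Sequential([
        keras.Input(shape=(img_size, img_size, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # probability of a tumor
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_cnn()
model.summary()
```

Training would then be a call such as `model.fit(X_train, y_train, validation_split=0.2, epochs=...)` on the scaled arrays, with the epoch count left to the reader.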

• Aim: Build a recommendation system that recommends the most suitable courses to learners on educational platforms.

• Procedure: Several techniques are used to build the recommendation system, as listed in the findings.

• Findings: The recommender system is built with eight approaches. The first three are content-based:

Approach 1 - Content-Based Recommender Using User Profile and Course Genres

Approach 2 - Content-Based Recommender Using Course Similarities

Approach 3 - Content-Based Recommender Using PCA Clustering
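Approaches 1 and 2 can be sketched with NumPy alone. The course-genre matrix below is hypothetical. The user profile is taken as the rating-weighted sum of the genre vectors of the courses the user has rated, and course similarity as cosine similarity between genre vectors; both are common formulations assumed here rather than confirmed by the project.

```python
import numpy as np

# Hypothetical data: 4 courses x 3 genres; 1 means the course covers the genre.
# Every course must cover at least one genre (cosine needs non-zero vectors).
course_genres = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
], dtype=float)


def user_profile(ratings, genres):
    """Approach 1: sum of course genre vectors, weighted by the user's ratings."""
    return ratings @ genres  # shape: (n_genres,)


def recommend_by_profile(ratings, genres):
    """Rank courses by the dot product of their genres with the user profile."""
    scores = genres @ user_profile(ratings, genres)
    scores[ratings > 0] = -np.inf  # push already-rated courses to the end
    return np.argsort(-scores)  # course indices, best first


def cosine_similarity_matrix(genres):
    """Approach 2: pairwise cosine similarity between course genre vectors."""
    unit = genres / np.linalg.norm(genres, axis=1, keepdims=True)
    return unit @ unit.T


ratings = np.array([3, 0, 0, 0], dtype=float)  # the user rated course 0 only
print(recommend_by_profile(ratings, course_genres))  # course 1 comes first
print(cosine_similarity_matrix(course_genres).round(3))
```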

• Findings: The remaining five approaches are based on collaborative filtering and are compared using RMSE.

Approach 4 - Collaborative-Filtering Recommender Using K Nearest Neighbor

Approach 5 - Collaborative-Filtering Recommender Using Non-negative Matrix Factorization

Approach 6 - Collaborative-Filtering Recommender Using Neural Networks

Approach 7 - Collaborative-Filtering Recommender Using Embedding Features Regression

Approach 8 - Collaborative-Filtering Recommender Using Embedding Features Classification
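For the collaborative-filtering side, a sketch of Approach 5 (non-negative matrix factorization) evaluated with RMSE. It is written by hand in NumPy as an assumption: scikit-learn's `NMF` does not handle missing ratings and would treat unrated courses as zeros, so this version uses weighted multiplicative updates that fit only the observed ratings. The rating matrix is hypothetical.

```python
import numpy as np


def masked_nmf(R, mask, rank=2, n_iter=2000, seed=0, eps=1e-9):
    """Factor R ~ W @ H with W, H >= 0, fitting only entries where mask == 1.

    Weighted multiplicative updates keep both factors non-negative and
    ignore missing ratings instead of treating them as zeros.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    W = rng.random((n_users, rank))
    H = rng.random((rank, n_items))
    MR = mask * R
    for _ in range(n_iter):
        W *= (MR @ H.T) / ((mask * (W @ H)) @ H.T + eps)
        H *= (W.T @ MR) / (W.T @ (mask * (W @ H)) + eps)
    return W, H


def rmse(R, R_hat, mask):
    """Root-mean-square error over the entries selected by mask."""
    diff = (R - R_hat)[mask.astype(bool)]
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical user x course ratings built from a known rank-2 factorization.
W0 = np.array([[1, 0], [0, 1], [1, 1], [2, 1], [1, 2]], dtype=float)
H0 = np.array([[5, 4, 1, 0], [0, 1, 4, 5]], dtype=float)
R = W0 @ H0
train_mask = np.ones_like(R)
for u, i in [(0, 3), (2, 1), (4, 0)]:  # hold out three ratings for testing
    train_mask[u, i] = 0
test_mask = 1 - train_mask

W, H = masked_nmf(R, train_mask, rank=2)
R_hat = W @ H
print("train RMSE:", round(rmse(R, R_hat, train_mask), 3))
print("test RMSE:", round(rmse(R, R_hat, test_mask), 3))
```

The other collaborative approaches can be compared on the same held-out entries with the same `rmse` helper.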

Acknowledgment

My friend Mohamad Osman's ML-Repo has been a great source of inspiration. I encourage you to have a look at his remarkable work.
