
Interactive website illustrating ML concepts and providing a playground. Enjoy :)


MikheilKvizhinadze2001/ML_playground


Machine learning playground, tools, and tutorials!

Part 1: machine learning playground and object detection tools in Streamlit

This part contains demos of my Streamlit website.

Credit card fraud detection demo

Dataset

The dataset used is the Credit Card Fraud Detection dataset from Kaggle. It contains transactions made by European cardholders in September 2013. With 492 frauds out of 284,807 transactions, it presents a highly imbalanced class distribution. Features include transformed numerical variables, plus 'Time' and 'Amount' as non-transformed variables.

Methodology

The project follows these steps:

  • Data Preprocessing: Cleaning and preprocessing the data.
  • Exploratory Data Analysis (EDA): Analyzing variable distributions and relationships.
  • Model Building: Training a suitable machine learning model.
  • Model Evaluation: Assessing model performance using appropriate metrics.
  • Model Interpretation: Utilizing techniques like SHAP for feature importance.
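The steps above can be sketched in code. This is a minimal, hedged illustration only: the project itself uses XGBoost with SHAP on the Kaggle data, while this sketch substitutes a scikit-learn random forest on synthetic data so it stays self-contained; the ~0.5% positive rate mimics the real dataset's imbalance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Kaggle data: roughly 0.5% positives,
# similar in spirit to the 492-in-284,807 fraud rate.
X, y = make_classification(n_samples=20_000, n_features=10, n_informative=5,
                           weights=[0.995], random_state=0)

# Stratified split preserves the rare-positive ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# class_weight="balanced" compensates for the skewed class distribution.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             random_state=0)
clf.fit(X_train, y_train)

# Average precision (area under the PR curve) is far more informative
# than accuracy when positives are this rare.
ap = average_precision_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"average precision: {ap:.3f}")
```

With SHAP installed, feature importance for the fitted model would follow the same pattern as in the demo video (explainer over the trained booster, then a summary plot).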
18.05.2024_11.45.32_REC.online-video-cutter.com.mp4

SHAP values of the XGBoost model used in this project

shap_values

Real-time object detection demo

18.05.2024_11.52.19_REC.mp4

Playground demo

This playground lets you experiment with various machine learning algorithms and datasets. You can select a dataset, visualize and investigate it, tweak parameters and hyperparameters, and see the results in real time. It is a great way to learn how machine learning algorithms work and how they apply to different datasets.
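The core loop behind a playground like this can be sketched as a function mapping the user's choices (dataset, model, hyperparameters) to a score; the dataset and model registries below are illustrative, not the app's actual code, and a Streamlit UI would call such a function on each widget change.

```python
from sklearn.datasets import load_iris, load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical registries the UI's selectboxes would index into.
DATASETS = {"iris": load_iris, "wine": load_wine}
MODELS = {
    "knn": lambda p: KNeighborsClassifier(n_neighbors=p.get("n_neighbors", 5)),
    "svm": lambda p: SVC(C=p.get("C", 1.0)),
}

def evaluate(dataset: str, model: str, params: dict) -> float:
    """Return mean 5-fold cross-validation accuracy for one configuration."""
    X, y = DATASETS[dataset](return_X_y=True)
    clf = MODELS[model](params)
    return cross_val_score(clf, X, y, cv=5).mean()

# One slider tweak = one call; the app re-renders the new score.
score = evaluate("iris", "knn", {"n_neighbors": 3})
print(f"iris + kNN(k=3): {score:.3f}")
```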

18.05.2024_11.53.29_REC.mp4

Video annotation tool demo

The tool offers the following functionality:

  • Video Upload: Users can upload a video file of their choice.
  • Annotation: The uploaded video is annotated with relevant information or markings.
  • Output: An annotated version of the video is generated and made available for download.

Video before:

before.online-video-cutter.com.mp4

Video after:

after.online-video-cutter.com.mp4

Part 2: tutorials

The 'notebooks' directory contains several Jupyter notebooks; feel free to check them out for more detail.

This part contains various machine learning projects, each demonstrating different techniques and algorithms on distinct datasets. The projects include ensemble learning, K-Means clustering, rice image classification using CNNs, and cat vs dog image classification.

1. Ensemble Learning on MNIST Dataset

Goal: Classify handwritten digits using ensemble learning techniques.

Steps:

  • Data Preparation: Load and split the MNIST dataset.
  • Training Individual Classifiers: Train Random Forest, Extra Trees, SVM, and MLP classifiers.
  • Voting Classifier: Combine the classifiers into a voting classifier for majority-vote predictions.
  • Model Evaluation: Assess individual and ensemble model performance on the validation and test sets.
  • Improvement and Blending: Optimize the voting classifier by removing SVM, then build a blender and a stacking classifier.
  • Comparison: Compare the performance of the different ensemble methods.

Key Findings:

  • The stacking classifier outperformed the other methods.
  • Removing SVM improved the voting classifier's performance.
  • The blender was less effective, due in part to the quality of the individual predictions.

Conclusion: Ensemble methods enhance model performance, with the best approach varying by dataset and model specifics.
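The voting-versus-stacking comparison can be sketched with scikit-learn. This is a small-scale stand-in, not the notebook's code: it uses sklearn's 8x8 digits dataset instead of full MNIST to keep the run fast, and SVC in place of the SVM/MLP pair.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimators = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("svm", SVC()),
]

# Hard voting: each base classifier gets one vote per prediction.
voting = VotingClassifier(estimators).fit(X_train, y_train)

# Stacking: a meta-learner (logistic regression by default) is trained
# on the base classifiers' cross-validated predictions.
stacking = StackingClassifier(estimators).fit(X_train, y_train)

print("voting  :", voting.score(X_test, y_test))
print("stacking:", stacking.score(X_test, y_test))
```

Dropping an estimator from the `estimators` list and refitting reproduces the "remove SVM and compare" experiment from the notebook.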

2. K-Means Clustering on Customer Dataset

Goal: Implement K-Means clustering to segment customers.

Dataset: Customer Segmentation Dataset from Kaggle.

Implementation:

  • Library Implementation: Benchmark K-Means using library implementations.
  • Scratch Implementation: Manually implement K-Means, including centroid initialization, data point assignment, and centroid updates.

Link to Dataset: https://www.kaggle.com/datasets/yasserh/customer-segmentation-dataset
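A minimal from-scratch K-Means following those three steps (initialize centroids, assign points, update centroids) might look like this; it is a sketch on toy 2-D data, not the notebook's implementation.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialize centroids from k randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. Assign each point to its nearest centroid (squared Euclidean).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # 3. Move each centroid to the mean of its assigned points
        # (keep empty clusters where they are).
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # converged
            break
        centroids = new
    return centroids, labels

# Two well-separated blobs as a toy stand-in for customer features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids.round(1))
```

Benchmarking this against `sklearn.cluster.KMeans` on the same data mirrors the library-vs-scratch comparison in the notebook.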

3. Rice Image Classification using CNNs

Goal: Classify rice images into five types using a Convolutional Neural Network (CNN).

Dataset: 75,000 rice images, with 15,000 images per class.

Steps:

  • Model Architecture: Three convolutional layers with max-pooling, followed by a fully connected layer and a softmax output.
  • Training: Use the Adam optimizer and sparse categorical cross-entropy loss; apply data augmentation techniques.
  • Evaluation: Assess model accuracy on a test set and visualize training/validation loss and accuracy.

Link to Dataset: https://www.kaggle.com/datasets/muratkokludataset/rice-image-dataset

4. Cat vs Dog Image Classification

Goal: Classify images of cats and dogs using different models.

Steps:

  • Logistic Regression: Baseline model; inadequate for complex image data.
  • Random Forest: More expressive, but prone to overfitting; requires hyperparameter tuning.
  • VGG16: Pre-trained CNN with the best performance, though it still underfits the data; potential improvements with increased capacity and fine-tuning.

Link to Dataset: https://www.kaggle.com/datasets/karakaggle/kaggle-cat-vs-dog-dataset
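The logistic-regression baseline idea (a linear model on raw, flattened pixels) can be sketched as follows. This is not the notebook's code: sklearn's small digits dataset stands in for the cat/dog photos so the sketch stays runnable, and the point it illustrates is that a linear model on pixels ignores spatial structure, which is why CNNs such as VGG16 pull ahead on real images.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # each row is a flattened 8x8 image
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling the pixel intensities helps the solver converge.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)
print(f"baseline accuracy: {baseline.score(X_test, y_test):.3f}")
```

On tiny, centered digits a linear baseline does well; on full-size, variably posed cat/dog photos the same recipe degrades sharply, motivating the move to random forests and then a pre-trained CNN.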

Conclusion

These projects collectively showcase various machine learning techniques applied to different datasets. They highlight the strengths and limitations of each method, emphasizing the importance of model selection and tuning based on specific tasks and datasets.

If you have any questions, contact me at mikheilkvizhinadze@gmail.com