Data Science and Machine Learning projects

This repository showcases my journey through different domains, applying various machine learning models and techniques to solve real-world problems. Each project is documented, and includes both the code and a comprehensive analysis.

Projects overview:

1. Brain Stroke Classifier

A machine learning project that predicts the occurrence of brain strokes. It covers exploratory data analysis (EDA), feature engineering, model training, evaluation, comparison, and fine-tuning. Models used: Logistic Regression, K-Nearest Neighbors, Random Forest.
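The model comparison step could look roughly like the sketch below. It is not the project's actual code: the data is a synthetic, imbalanced stand-in generated with scikit-learn, and the hyperparameters, the 5-fold split, and the ROC AUC metric are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the stroke data (assumption: NOT the real dataset).
# Roughly 10% positives, to mimic a rare outcome.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=0
)

# The three model families named in the project description.
# Scaling is added for the two distance/gradient-sensitive models.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "K-Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean ROC AUC over 5 stratified folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: ROC AUC = {score:.3f}")
```

ROC AUC is used here instead of accuracy because accuracy can look high on imbalanced data even when the minority class is never predicted.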

2. CO2 Emissions

This project investigates the factors affecting vehicle CO2 emissions in Canada and constructs a Random Forest regression model to predict the emissions based on various features. It includes feature engineering, EDA, visualization, hypothesis testing, model building, and fine-tuning. Model used: Random Forest.
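As a rough illustration of how a Random Forest regressor can both predict emissions and rank the contributing factors, here is a minimal sketch. The feature names, the synthetic data, and the formula that generates the target are all hypothetical and are not taken from the project's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300

# Hypothetical feature names and synthetic values (assumption: not the real data).
feature_names = ["engine_size_l", "cylinders", "fuel_consumption_l_per_100km"]
engine = rng.uniform(1.0, 6.0, n)
cylinders = rng.integers(3, 9, n)
fuel = rng.uniform(4.0, 20.0, n)
X = np.column_stack([engine, cylinders, fuel])

# Made-up target: emissions driven mostly by fuel consumption, plus noise.
y = 20.0 * fuel + 5.0 * engine + rng.normal(0.0, 5.0, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# R^2 on the held-out split, and impurity-based importance per feature.
r2 = model.score(X_test, y_test)
importances = dict(zip(feature_names, model.feature_importances_))

print(f"R^2 on test split: {r2:.3f}")
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Impurity-based importances are a quick first look only; they can be misleading when features are strongly correlated.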

3. Stock Price Prediction

Utilizing Long Short-Term Memory (LSTM) networks implemented with Keras, this project aims to predict Apple Inc.'s stock prices. The project encompasses data preprocessing, model building and training, evaluation, hyperparameter tuning, visualization, and model persistence. Model used: LSTM model implemented with Keras.
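For an LSTM, the preprocessing step usually turns a price series into overlapping windows shaped (samples, timesteps, features). The sketch below shows one common way to do that with NumPy. The helper name `make_windows`, the window length, and the toy prices are assumptions for illustration, not the project's code.

```python
import numpy as np


def make_windows(series, window):
    """Split a 1-D series into (inputs, targets) for next-step prediction.

    inputs has shape (n_samples, window, 1), the 3-D layout that recurrent
    layers such as Keras LSTM expect; targets holds the value that follows
    each window.
    """
    series = np.asarray(series, dtype=float)
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    inputs = np.stack(
        [series[i:i + window] for i in range(len(series) - window)]
    )
    targets = series[window:]
    return inputs[..., np.newaxis], targets


# Toy prices (hypothetical values, not real stock data).
prices = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
X, y = make_windows(prices, window=3)

print(X.shape)  # (3, 3, 1)
print(y)        # [13. 14. 15.]
```

In practice the series is typically scaled first, with the scaler fitted on the training portion only, so that no information from the test period leaks into training.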

4. Mail Spam Classifier

A text classification project that identifies spam emails. It compares TF-IDF vectorization with Multinomial Naive Bayes, a Support Vector Machine (SVM), and an LSTM network built with TensorFlow and Keras. The project covers data preprocessing, visualization, model training, evaluation, fine-tuning, and testing on custom data. Models used: TF-IDF Vectorization + Multinomial Naive Bayes, SVM, LSTM (TensorFlow and Keras).
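The TF-IDF plus Multinomial Naive Bayes baseline can be sketched as a scikit-learn pipeline. The four training messages below are invented for illustration, and the default vectorizer and classifier settings are an assumption rather than the project's tuned configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus (assumption: not the project's dataset).
train_texts = [
    "win free money now",
    "free prize claim now",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Vectorize the text with TF-IDF, then fit Multinomial Naive Bayes on top.
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_labels)

# "Testing on custom data": classify messages the model has not seen.
custom_messages = ["claim your free prize", "team meeting tomorrow"]
predictions = classifier.predict(custom_messages)

for message, label in zip(custom_messages, predictions):
    print(f"{label}: {message}")
```

Wrapping both steps in one pipeline keeps the vocabulary learned during training attached to the classifier, so custom messages are vectorized the same way.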

5. Medical Abstracts: Sequential Sentence Classification

This project replicates and extends Franck Dernoncourt and Ji Young Lee's 2017 study on sequential sentence classification in medical abstracts. It uses the 'PubMed 200k RCT' dataset to explore and evaluate different natural language processing (NLP) models that label each sentence of an abstract with the section it belongs to. The work draws on the methodology of that study and was guided by the Zero to Mastery TensorFlow course, particularly the SkimLit project. Best model: a TensorFlow model with character, token, and positional embedding layers.
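The positional inputs can be illustrated with a small pure-Python sketch: each sentence gets a one-hot vector for its position in the abstract and another for the abstract's length. The depths (15 and 20), the clipping of long abstracts into the last bucket, and the helper names are assumptions made for this example and may differ from what the notebook does.

```python
# Assumed one-hot depths; the notebook may use different values.
LINE_NUMBER_DEPTH = 15
TOTAL_LINES_DEPTH = 20


def one_hot(index, depth):
    """Return a one-hot list of length depth; large indices share the last slot."""
    vector = [0.0] * depth
    vector[min(index, depth - 1)] = 1.0
    return vector


def positional_features(sentences):
    """Attach line-number and total-lines one-hot vectors to each sentence."""
    total = len(sentences)
    return [
        {
            "text": sentence,
            "line_number": one_hot(i, LINE_NUMBER_DEPTH),
            "total_lines": one_hot(total, TOTAL_LINES_DEPTH),
        }
        for i, sentence in enumerate(sentences)
    ]


# Invented three-sentence abstract, for illustration only.
abstract = [
    "We studied the effect of treatment X.",
    "Patients were randomized into two groups.",
    "The primary outcome improved in the treatment group.",
]
features = positional_features(abstract)

for row in features:
    print(row["line_number"].index(1.0), row["total_lines"].index(1.0), row["text"])
```

Position carries signal because abstract sections tend to appear in a fixed order, so a sentence near the start is more likely to be background than conclusions.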

6. Dog Vision Project

A deep learning project that uses transfer learning with TensorFlow to classify dog breeds. The model builds on mobilenet_v2_130_224 from TensorFlow Hub, which is pre-trained on the ImageNet database, and shows how effective transfer learning can be for image classification. Model used: mobilenet_v2_130_224 from TensorFlow Hub.
Note: The project uses methods from TensorFlow 2.2. The model works, but some parts of the code may look outdated. The project was built while completing the bootcamp. The bootcamp's project can be found here

7. Movie Genre Prediction

This project classifies movie plots into one or more genres (multi-label classification). Two models were used and optimized for the task: LSTM (Long Short-Term Memory) and BERT (Bidirectional Encoder Representations from Transformers). I also tried modifying the data to improve predictions for minority classes. However, some techniques (translation) were left out of training because they took too long to run. I could not achieve high accuracy with the described models and techniques. The project showed me that very high results are not always achievable and that experiments sometimes do not lead to the expected outcome. Still, it was useful experience in working with text data. The data is taken from Kaggle.
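In multi-label classification each plot has a multi-hot target vector, and predictions are made by thresholding one probability per genre independently. A minimal sketch with NumPy follows. The genre list, the probabilities, and the 0.5 threshold are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical label set (assumption: not the project's genre list).
GENRES = ["action", "comedy", "drama", "romance"]


def to_multi_hot(genres, classes):
    """Encode a list of genre names as a 0/1 vector over classes."""
    return np.array(
        [1 if name in genres else 0 for name in classes], dtype=int
    )


def decode(probabilities, classes, threshold=0.5):
    """Return every class whose probability reaches the threshold."""
    return [
        name for name, p in zip(classes, probabilities) if p >= threshold
    ]


# Target for a plot tagged with two genres.
target = to_multi_hot(["comedy", "romance"], GENRES)

# Made-up per-genre probabilities, such as a sigmoid output layer would give.
probabilities = np.array([0.10, 0.80, 0.30, 0.65])
predicted = decode(probabilities, GENRES)

print(target)     # [0 1 0 1]
print(predicted)  # ['comedy', 'romance']
```

Because a prediction counts as exactly right only when every genre matches, exact-match accuracy is a strict metric for this task; per-label metrics such as F1 are often reported alongside it.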

Dependencies

The projects use various libraries and frameworks, primarily:

TensorFlow (including Keras)
scikit-learn
pandas
numpy
matplotlib and seaborn for visualization

Contact and Feedback

For any inquiries or collaboration requests, feel free to contact me or open an issue.

License

These projects are open-sourced under the MIT license.


AlexeyKarz/Machine-Learning-Projects
