Veto2922/README.md

Hi there 👋

Data Science Portfolio

End to End Projects

  • The Web Application Firewall (WAF) project is aimed at developing a robust and comprehensive solution to safeguard web applications against a wide range of security threats. The project incorporates machine learning techniques for threat detection, real-time monitoring, and response mechanisms to protect web applications from common attacks such as Cross-Site Scripting (XSS), SQL Injection (SQLI), Command Injection (CMDI), and Path Traversal (PATHT).

  • Project Paper: Project Paper Link

  • Project Video: Project Video Link

  • Project Website: Project Website Link

  • This application helps you find the most suitable job based on your technological skills. Additionally, it suggests skills to develop further in your chosen field.
  • Project steps: feature selection, data cleaning, feature engineering, EDA, data preprocessing, modeling, prediction pipeline, deployment
  • Project article: Medium
  • Project website: Streamlit
  • The Hybrid-based Book Recommendation System encompasses an end-to-end development process, incorporating data exploration, cleaning, and the implementation of both content-based and collaborative filtering recommendation systems using Python.
  • Project article: Medium
  • This project builds a movie recommendation system that combines popularity-based and content-based filtering. The model integrates features such as genres, cast, crew, overview, release year, runtime, keywords, vote average, and director information, drawn from a dataset of several thousand films. Movies are compared using cosine similarity and Euclidean distance, so that suggestions reflect how close two films are in feature space.
  • Project steps: feature selection, data cleaning, feature engineering, data preprocessing, similarity matrix, prediction app, deployment
  • Project website: Streamlit
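
The two similarity measures named in the movie recommender description can be sketched as follows. This is a minimal illustration, not the project's code: the movie titles, the four-element feature vectors, and the `most_similar` helper are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention used here: an all-zero vector matches nothing
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two points (smaller = more alike)."""
    return math.dist(a, b)  # requires Python 3.8+

# Hypothetical feature vectors: [is_action, is_comedy, is_drama, scaled_vote_average]
movies = {
    "Movie A": [1, 1, 0, 0.8],
    "Movie B": [1, 0, 0, 0.7],
    "Movie C": [0, 0, 1, 0.2],
}

def most_similar(title, catalog):
    """Return the other title whose vector has the highest cosine similarity."""
    query = catalog[title]
    scored = [(t, cosine_similarity(query, v)) for t, v in catalog.items() if t != title]
    return max(scored, key=lambda pair: pair[1])[0]

print(most_similar("Movie A", movies))
```

Cosine similarity ignores vector length and compares direction only, while Euclidean distance is sensitive to scale, so features are usually normalized before using the latter.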

Skill Based Projects

Data Analysis and Visualization:

  • Airbnb-Listing-EDA: This project involves performing an exploratory data analysis (EDA) on Airbnb listing data for a particular city. The analysis focuses on factors such as price, availability, location, and property type to identify trends and patterns in the demand for Airbnb listings in the city. The project includes data cleaning, visualization, and statistical analysis.

  • Electronics store sales EDA: In this data analysis project, I took on the challenge of exploring 12 months' worth of sales data for an electronics store. The dataset included a vast array of purchase information, including product types, costs, purchase addresses, and more.

  • Startup expansion: In this project, I worked with a dataset on the dynamics of startup growth, covering key aspects such as location, marketing spending, and revenue. I also built a dashboard and a report to make the project's insights easy to digest.

  • Tracking Maji Ndogo's water funds (Power BI): Our mission is to communicate with transparency: Where did the money go? We will track the total budget against project completion, monitor teams' performance, and compare budgeted versus actual costs to flag potential corruption, promoting transparency and accountability in addressing Maji Ndogo’s water crisis.


Machine Learning:

  • FastML With EDA Python package (my own package): FastML_With_EDA is a Python package designed to simplify the machine learning pipeline, from exploratory data analysis (EDA) and preprocessing to automated machine learning (AutoML) model training and evaluation. Whether you're a beginner or an experienced data scientist, FastML provides the tools you need to streamline your workflow and make data-driven decisions.
  • I also created a web app as a GUI for this package.

Regression


Classification


Clustering

  • Customer Segmentation for E-commerce (K-means, PCA): This project focuses on customer segmentation using K-means clustering and PCA (Principal Component Analysis). The goal is to identify distinct groups of customers based on their purchasing behavior in an e-commerce dataset. Customer segmentation enables personalized marketing strategies and recommendations.

  • Detecting outliers using DBSCAN

  • PCA Manual Implementation: Principal Component Analysis (PCA) is a dimensionality reduction technique commonly used in machine learning and data analysis. This project provides step-by-step instructions for implementing PCA.
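
As a rough idea of what a manual PCA involves, the sketch below centers the data, builds the covariance matrix, and finds the first principal component by power iteration. It is an assumption-laden illustration rather than the project's implementation: the function name, the choice of power iteration (instead of a full eigendecomposition), and the sample points are all hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, variance along it) for the first principal component."""
    n, d = len(rows), len(rows[0])

    # Step 1: center every feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Step 2: sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Step 3: power iteration. Repeatedly multiplying by the covariance matrix
    # pulls the vector toward the eigenvector with the largest eigenvalue.
    # Limitation: this fails if the start vector is orthogonal to that eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]

    # Step 4: the Rayleigh quotient v^T C v is the variance explained along v.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, variance

# Hypothetical points lying exactly on the line y = 2x.
points = [[1, 2], [2, 4], [3, 6], [4, 8]]
direction, variance = first_principal_component(points)
print(direction, variance)
```

For these points the component should align with the direction (1, 2), since all of the variance lies along that line. Further components can be obtained by removing the found direction from the data and repeating.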


Recommendation System


Time series analysis and forecast

  • Monthly Retail Sales Forecasting: This project forecasts monthly sales for clothing and clothing accessory stores using LSTM (Long Short-Term Memory) neural networks. The dataset consists of monthly sales data (in millions of dollars, not seasonally adjusted) retrieved from FRED, the economic data service of the Federal Reserve Bank of St. Louis.

  • Time series forecasting using LSTM
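
Sequence models such as LSTMs are commonly trained on fixed-length windows of past values paired with the next value to predict. The sketch below shows that windowing step only. The `make_windows` helper and the sales figures are made up for illustration and are not taken from the project or from FRED.

```python
def make_windows(series, window):
    """Split a 1-D series into (inputs, target) pairs for one-step-ahead forecasting."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    pairs = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]   # the `window` most recent values
        target = series[start + window]         # the value that follows them
        pairs.append((inputs, target))
    return pairs

# Made-up monthly sales figures.
sales = [10, 12, 15, 11, 14, 18]
pairs = make_windows(sales, window=3)
for inputs, target in pairs:
    print(inputs, "->", target)
```

For time series, the train/test split should keep chronological order (earlier windows for training, later ones for testing) so that the model is never evaluated on data older than what it was trained on.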


Deep Learning:


ANN

  • TensorFlow and Keras regression: This project analyzes housing data using Pandas, NumPy, TensorFlow, and Keras. It explores data distribution and geographical properties, and performs feature engineering. A neural network model is built, trained, and evaluated for predicting house prices.

  • TensorFlow and Keras classification: The project conducts t-tests on each feature to assess its significance in predicting a binary target variable (breast cancer diagnosis). It then selects features with p-values below a significance level of 0.05. The data is split into training and testing sets, scaled, and used to train a neural network model. Overfitting is addressed with early stopping and dropout.

  • LendingClub Loan Default Prediction: This project aims to build a deep learning model to predict whether a borrower will repay their loan based on historical loan data from LendingClub. LendingClub is a peer-to-peer lending platform that facilitates loan origination by connecting borrowers with investors. The dataset includes various features such as loan amount, interest rate, borrower employment details, and credit history. The main target variable for prediction is loan_status, which indicates if a loan was "Fully Paid" or "Charged Off" (defaulted).
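
The early stopping rule mentioned in the classification project can be illustrated without any framework. The sketch below is a plain-Python approximation of the idea (stop once the validation loss has not improved for `patience` consecutive epochs). The function name and the loss values are hypothetical. In Keras the same behavior is normally obtained with the built-in `EarlyStopping` callback, though the README does not say how the project configures it.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch where the loss has failed to improve
    by more than `min_delta` for `patience` epochs in a row. If that never
    happens, stop_epoch is the last epoch.
    """
    best = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, waited = loss, epoch, 0  # improvement: reset the counter
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: improvement stalls after epoch 2.
losses = [0.9, 0.7, 0.6, 0.65, 0.66, 0.5]
stop, best = early_stopping_epoch(losses, patience=2)
print(stop, best)
```

With `patience=2` training halts before the late improvement at epoch 5 is ever seen, which shows the trade-off: a small patience saves time but can stop too soon.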


CNN

  • Image Classification with CNN and Transfer Learning: This project uses a Convolutional Neural Network (CNN) with TensorFlow and Keras to classify cat and dog images. It demonstrates transfer learning with the VGG16 model, which improves model performance and reduces training time. It also covers interpreting model decisions, to help understand how the CNN makes predictions from the features it has learned.

  • Malaria Cell Image Classification: This project aims to classify cell images as either parasitized or uninfected using a Convolutional Neural Network (CNN). The dataset consists of cell images with labels indicating whether the cells are parasitized by the malaria parasite or not. The goal is to build and train a CNN model that can accurately classify the cell images.


RNN and LSTMs

  • Monthly Retail Sales Forecasting: This project forecasts monthly sales for clothing and clothing accessory stores using LSTM (Long Short-Term Memory) neural networks. The dataset consists of monthly sales data (in millions of dollars, not seasonally adjusted) retrieved from FRED, the economic data service of the Federal Reserve Bank of St. Louis.

  • Time series forecasting using LSTM


SOMs

  • Self-Organizing Map for Fraud Detection: An implementation of a Self-Organizing Map (SOM) for detecting fraud in credit card applications. The SOM is trained on a dataset of credit card applications to learn patterns and identify potential outliers that may indicate fraudulent activity.

AutoEncoder

Natural Language Processing (NLP):


  • Text classification using TFIDF: The goal of this project is to develop a machine learning model that can accurately classify tweets as either related to real disasters or not.

  • Text Classification with GloVe and LSTM: This project uses LSTM neural networks for text classification on disaster-related messages. It preprocesses and tokenizes text data, uses pre-trained word embeddings, and trains the model with Keras. Finally, it evaluates the model's performance, generates predictions, and creates a submission file for a Kaggle competition.

  • Text Generation with GRU: This code employs TensorFlow and Keras to build a text generation model. It preprocesses Shakespearean text data, creating sequences for training. The model architecture includes an Embedding layer and a GRU layer. After training, the model can generate text based on given starting seeds. This approach facilitates the generation of coherent text sequences.
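
For the TF-IDF project, the weighting itself can be sketched in a few lines. This uses the plain textbook formula tf × log(N / df). Library implementations such as scikit-learn's apply smoothing and normalization, so their numbers will differ. The `tfidf` function and the example tweets are invented for illustration.

```python
import math
from collections import Counter

def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Invented example tweets.
tweets = [
    "forest fire near town",
    "fire crews evacuate town",
    "this song is fire",
]
weights = tfidf(tweets)
print(weights[0])
```

Because "fire" occurs in every tweet, its weight is zero: it cannot help separate real disasters from figurative uses, whereas a rarer word such as "forest" receives the highest weight in the first tweet.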

Pinned

  1. Tech_Skills_Map-End-to-End-data-science-project- Public

    Jupyter Notebook 2

  2. EXPLORE-Al-Integrated-project-Transparency-in-tracking-Maji-Ndogo-s-water-funds Public

    Our mission is to communicate with transparency: Where did the money go? We will track the total budget against project completion, monitor teams' performance, and compare budgeted versus actual c…

  3. Fast-Machine-Learning-With-EDA-python-package Public

    Python

  4. My-data-analysis-projects Public

    This repo contains several data analysis projects I've worked on

    Jupyter Notebook

  5. Project-Customer-Segmentation-for-E-commerce Public

    Jupyter Notebook

  6. streamlit_app_for_FastML_With_EDA_python_package Public

    Python