A carefully curated selection of real-world data science projects designed for both beginners and seasoned practitioners.

ProjectProRepo/Data-Science-Projects

Data-Science-Projects

Welcome to ProjectPro's Data Science Projects Repository! This repository is a collaborative space for data developers interested in exploring and contributing to a wide range of data science projects. These projects span various domains—from forecasting and classification to recommendation systems and anomaly detection—providing real-world examples that help you build your data science skillset.

Introduction

Data science isn't just about theory; it's about building, experimenting, and solving real-world business problems. Hands-on projects are one of the fastest ways to learn new tools and technologies, sharpen your skills, and build a portfolio that stands out. This repository collects practical, enterprise-grade projects spanning machine learning, deep learning, computer vision, NLP, big data, and MLOps. Each project is designed around a real-world challenge and comes with step-by-step guidance, datasets, and best practices, so that you not only understand the concepts but can also apply them effectively.

Here’s What You’ll Get

  • Beginner to Advanced Data Science Projects across various verticals
  • Real-world Datasets and detailed explanations to guide your learning
  • Code implementation in Python, SQL, TensorFlow, and more
  • Industry use cases in finance, healthcare, e-commerce, and beyond

If you’re serious about breaking into data science or mastering advanced techniques, these projects give you plenty to build on. All data science project solutions can be found here: ProjectPro Data Science Projects

List of Best Data Science Projects on GitHub

Below is a curated list of data science projects included in the ProjectPro repository. Each project includes a sample dataset or reference, along with documentation to help you get started quickly and experiment with real-world scenarios.

| Sl No. | Project Name | Category/Focus | Description |
|---|---|---|---|
| 1 | Walmart Sales Forecast | Forecasting | Predict Walmart sales using historical data and time series analysis. |
| 2 | Card Default Predictor | Classification | Predict credit card defaults using ML techniques on transactional data. |
| 3 | BigMart Sales Forecast | Forecasting | Forecast sales at BigMart using store and product features. |
| 4 | Insurance Claims Severity | Regression/Ensemble | Predict insurance claims severity using ensemble methods. |
| 5 | House Price Predictor | Regression | Predict house prices using ML techniques in Python. |
| 6 | Music Recommender System | Recommendation Systems | Build a music recommender using KKBox's dataset and collaborative filtering. |
| 7 | Plant Species Classifier | Image Classification | Create a CNN-based classifier for plant species identification. |
| 8 | Avocado Price Predictor | Regression | Predict avocado prices using historical pricing data and ML models. |
| 9 | Insurance Pricing Forecast | Regression | Forecast insurance pricing using an XGBoost regressor. |
| 10 | Loan Eligibility Predictor | Classification | Predict loan eligibility using a gradient boosting classifier. |
| 11 | Churn Prediction Using Ensemble Models | Classification/Ensemble | Analyze customer churn using ensemble techniques. |
| 12 | Topic Modeling using KMeans | Clustering/NLP | Group customer reviews using KMeans clustering for topic modeling. |
| 13 | Card Default Prediction | Classification | Predict credit card defaults with advanced ML techniques. |
| 14 | Expedia Hotel Recommendation System | Recommendation Systems | Generate hotel recommendations using collaborative filtering and NLP. |
| 15 | Rossmann Sales Forecast | Forecasting | Forecast Rossmann store sales using time series analysis and ML. |
| 16 | Credit Fraud Detector | Classification | Detect fraudulent transactions using anomaly detection and classification methods. |
| 17 | Personalized Cancer Therapy | Healthcare/Personalization | Personalize cancer treatment using patient data and ML insights. |
| 18 | Telecom Churn Predictor | Classification | Predict telecom churn using logistic regression. |
| 19 | Loan Eligibility Prediction | Classification | Build a loan eligibility model using H2O.ai in Python. |
| 20 | Census Income Prediction | Classification | Predict adult census income using the Census Income dataset. |
| 21 | Ecommerce Product Reviews Analysis | Sentiment Analysis/Ranking | Analyze product reviews using pairwise ranking and sentiment analysis. |
| 22 | Retail Price Optimization | Optimization/Regression | Optimize retail pricing using historical sales data and ML models. |
| 23 | Market Basket Analysis | Association Rules | Discover frequent item sets using Apriori and FPGrowth algorithms. |
| 24 | Driver Demand Forecasting | Time Series Forecasting | Forecast driver availability using multistep time series analysis. |
| 25 | Collaborative Recommender System | Recommendation Systems | Build a collaborative filtering recommender system in Python. |
| 26 | Similar Images Finder | Computer Vision | Develop a tool to find similar images using Keras and TensorFlow. |
| 27 | Mask R-CNN Segmentation | Image Processing | Implement Mask R-CNN with TensorFlow for accurate image segmentation. |
| 28 | Face Recognition System | Computer Vision | Build a face recognition system in Python using FaceNet. |
| 29 | Business KPI Forecasting | Forecasting | Forecast key business KPIs using TensorFlow and Python. |
| 30 | Real-Time Fruit Detection | Deep Learning | Detect fruits in real time using YOLOv4. |
| 31 | Scaling ML using FEAST Feature Store | Classification | A beginner project using FEAST for scaling ML. |
| 32 | GCP Mask R-CNN MLOps | MLOps/Computer Vision | Deploy Mask R-CNN on GCP with uWSGI and Flask. |
| 33 | Rule Based Recommender System | Recommendation Systems | Build a rule-based recommender system in Python. |
| 34 | Recommender System using Association Rule Mining | Recommendation Systems | Build a recommender system for market basket analysis using association rule mining. |
| 35 | Multi-Class Image Classifier | Image Classification | Build a CNN for multi-class image classification. |
| 36 | Ola Bike Demand Forecast | Time Series Forecasting | Forecast ride demand for Ola Bike using time series analysis. |
| 37 | Medical Image Segmentation | Image Processing/Healthcare | Apply deep learning for accurate medical image segmentation. |
| 38 | MNIST Digit Recognition | Classification | Use CNNs to recognize handwritten digits from the MNIST dataset. |
| 39 | LSH Look-Alike Modeling | NLP/Similarity | Implement locality sensitive hashing to identify look-alike items. |
| 40 | Custom OCR System | Computer Vision/NLP | Build an OCR system from scratch using YOLO and Tesseract. |
| 41 | Multi-Touch Attribution Model | Marketing Analytics | Develop a model to understand multi-touch attribution in marketing. |
| 42 | OpenCV Beginner Level Project | Computer Vision | Learn computer vision basics with OpenCV. |
| 43 | Banking Classification Model | Classification/Finance | Build classification algorithms for digital transformation in banking. |
| 44 | Advanced OpenCV Project | Computer Vision | Master advanced OpenCV concepts for computer vision tasks. |
| 45 | ARMA Time Series Model | Time Series Analysis | Build autoregressive and moving average models for time series forecasting. |
| 46 | ARIMA Forecasting Model | Time Series Forecasting | Build an ARIMA model for forecasting time series data. |
| 47 | AWS Topic Modeling MLOps | MLOps/NLP | Deploy a topic modeling pipeline on AWS using Gunicorn and Flask. |
| 48 | MLR Time Series Model | Regression/Forecasting | Build a multiple linear regression model for time series forecasting. |
| 49 | Predict License Status | Classification | Implement various ensemble techniques to predict license status for a given business. |
| 50 | Logistic Regression Scratch | Classification | Build a logistic regression model in Python from scratch. |
| 51 | Resume Parsing Deployment | NLP/Model Deployment | Deploy a resume parsing model on GCP using Streamlit. |
| 52 | ARCH/GARCH Time Series Models | Time Series Analysis | Build ARCH and GARCH models for volatility forecasting. |
| 53 | Flask ML Deployment | Model Deployment | Deploy ML models using Flask for a beginner-friendly introduction. |
| 54 | Azure Medical Text Analytics | NLP/Cloud Deployment | Deploy Azure Text Analytics for a medical search engine. |
| 55 | AWS ARCH/GARCH MLOps | MLOps/Time Series | Deploy ARCH and GARCH models on AWS with MLOps practices. |
| 56 | TF Transfer Learning Classifier | Image Classification | Use TensorFlow transfer learning for image classification. |
| 57 | GCP Loan Eligibility Predictor | Classification | Predict loan eligibility on GCP using advanced ML techniques. |
| 58 | Anomaly Detection with LOF | Anomaly Detection | Detect anomalies using Isolation Forest and LOF in Python. |
| 59 | Build a Content Based Recommender System | Recommendation Systems | Build a content-based product recommender app with Streamlit. |
| 60 | GCP ARIMA MLOps Deployment | MLOps/Time Series | Deploy an ARIMA model on GCP using uWSGI and Flask. |
| 61 | Build an Optimal MLOps Pipeline | MLOps | Build an optimal end-to-end MLOps pipeline on GCP. |
| 62 | Transformer BART Summarizer | NLP/Model Deployment | Deploy a Transformer BART model on GCP for text summarization. |
| 63 | Gaussian Process Time Series | Time Series Forecasting | Build time series models using Gaussian Processes in Python. |
| 64 | AWS MLR MLOps Deployment | Regression/MLOps | Deploy a multiple linear regression model on AWS with MLOps. |
| 65 | GCP Kubeflow MLOps | MLOps | Deploy ML models on GCP using Kubeflow for scalable production. |
| 66 | AWS Gaussian Process Time Series MLOps | MLOps/Time Series | Deploy Gaussian Process time series models on AWS. |
| 67 | AWS Churn Prediction Deployment | Model Deployment | Deploy a customer churn prediction model on AWS. |
| 68 | Detectron2 Object Detection | Computer Vision | Implement Detectron2 for advanced object detection and segmentation. |
| 69 | Build a Simple Linear Regression Model | Regression | A beginner-friendly project on linear regression in Python. |
| 70 | Deep Time Series Forecasting | Deep Learning | Build a deep learning model for time series forecasting in Python. |
| 71 | Graph Recommender System | Recommendation Systems | Develop a graph-based recommendation system in Python. |
| 72 | AWS Banking Classification MLOps | MLOps/Finance | Deploy a classification model for banking on AWS. |
| 73 | Portfolio Optimization in R | Finance/Optimization | Build portfolio optimization models in R using ML-driven strategies. |
| 74 | Object Tracking with OpenCV | Computer Vision | Learn single and multi-object tracking using OpenCV and Python. |
| 75 | Paperspace BART Model Deployment | NLP/Cloud Deployment | Deploy a Transformer-BART model for text summarization on Paperspace Cloud. |
| 76 | Saturn Cloud Multi-Class Classifier | Classification/Cloud | Build a multi-class classification model on Saturn Cloud. |
| 77 | Ensemble Churn Predictor | Classification | Build a churn prediction model using ensemble learning techniques. |
| 78 | Azure DevOps Classification MLOps | MLOps/Classification | Deploy a classification model using Azure DevOps CI/CD pipelines. |
| 79 | Build a Multiple Linear Regression Model | Regression | Build a multiple linear regression model on a soccer player dataset. |
| 80 | Polynomial Regression Model | Regression | Learn to build a polynomial regression model from scratch. |
| 81 | PyCaret ML App Deployment | Automated ML/Deployment | Build and deploy an ML app using PyCaret and Streamlit. |
| 82 | Causal Inference in ML | Causal Inference | Explore causal inference techniques to determine cause-and-effect relationships in data. |
| 83 | Real Estate Price Predictor | Regression/NLP/Deployment | Build a real estate price prediction model with NLP and FastAPI. |
| 84 | Decision Tree Churn Predictor | Classification | Build a churn prediction model using decision trees. |
| 85 | Collaborative Filtering Recommender System | Recommendation Systems | Compare model-based and memory-based methods for building a recommendation system with collaborative filtering. |
| 86 | Paperspace Resume Parser MLOps | MLOps/NLP | Deploy a resume parser model with an end-to-end MLOps workflow on Paperspace. |
| 87 | NumPy Regression Models | Regression | Build linear, ridge, and lasso regression models in NumPy from scratch. |
| 88 | Deploy RNN CNN models for TimeSeries on Azure | Deep Learning/Cloud | Deploy RNN/CNN models for time series forecasting on Azure. |
| 89 | Time Series Analysis with Prophet & Cesium | Time Series Forecasting | Analyze time series data using Facebook Prophet and Cesium. |
| 90 | Hybrid Recommender System | Recommendation Systems | Build a hybrid recommender system using LightFM. |
| 91 | House Price Regression Models | Regression | Implement regression models for predicting house prices. |
| 92 | Graph Recommendation System | Recommendation Systems | Build a recommender system project for eCommerce platforms and learn to use FAISS for efficient similarity search. |
| 93 | Purchase Propensity Predictor | Classification | Predict customer propensity to purchase using ML techniques. |
| 94 | Elevator Failure Classifier | Time Series Classification | Predict elevator failures using time series classification. |
| 95 | Regression Discontinuity Design | Causal Inference | Apply regression discontinuity design to evaluate causal impacts in data. |
| 96 | AWS SageMaker Classifier | Model Deployment/Cloud | Build and deploy a classification model on AWS SageMaker. |
| 97 | End-to-End ML Monitoring | MLOps | Implement an ML monitoring pipeline using Airflow and Docker. |
| 98 | AWS LSTM Deployment | Deep Learning/Time Series | Build and deploy an LSTM model on AWS SageMaker for forecasting tasks. |
| 99 | End-to-End Snowflake Healthcare Analytics Project on AWS | Healthcare/Data Engineering | Execute healthcare analytics using Snowflake and AWS (Part 1). |
| 100 | End-to-End Snowflake Healthcare Analytics Project on AWS Part 2 | Healthcare/Data Engineering | Continue building advanced healthcare analytics pipelines (Part 2). |
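
To give a flavour of the "from scratch" projects in the list (for example, project 50, Logistic Regression Scratch), here is a minimal sketch of logistic regression trained with batch gradient descent in plain Python. It is not taken from the repository: the function names (`sigmoid`, `fit_logistic`, `predict`), the hyperparameters, and the toy data are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss.

    X is a list of feature lists, y a list of 0/1 labels.
    lr and epochs are illustrative defaults, not tuned values.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: p(y=1|x) - y
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Return 0/1 labels using a 0.5 probability threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


# Toy, linearly separable data (hypothetical): one feature, two classes.
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
preds = predict(X, w, b)
```

A real project would normally vectorise this with NumPy, standardise the features, and evaluate on held-out data; the loop form is used here only to keep each step of the update visible.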
