Welcome to ProjectPro's Data Science Projects Repository! This repository is a collaborative space for data developers interested in exploring and contributing to a wide range of data science projects. These projects span various domains—from forecasting and classification to recommendation systems and anomaly detection—providing real-world examples that help you build your data science skillset.
Data science isn't just about theory; it's about building, experimenting, and solving real-world business problems. Hands-on projects are one of the most effective ways to learn new tools and technologies, sharpen your skills, and build a portfolio that stands out. This repository is a goldmine of practical, enterprise-grade projects covering machine learning, deep learning, computer vision, NLP, big data, and MLOps. Each project is designed to help you tackle real-world challenges, with step-by-step guidance, datasets, and best practices so that you not only understand the concepts but can also apply them effectively.
- Beginner to Advanced Data Science Projects across various verticals
- Real-world Datasets and detailed explanations to guide your learning
- Code implementation in Python, SQL, TensorFlow, and more
- Industry use cases in finance, healthcare, e-commerce, and beyond
If you’re serious about breaking into data science or mastering advanced techniques, these projects give you what you need to start building. All data science project solutions can be found here: ProjectPro Data Science Projects
Below is a curated list of data science projects included in the ProjectPro repository. Each project includes a sample dataset or reference, along with documentation to help you get started quickly and experiment with real-world scenarios.
Sl No. | Project Name | Category/Focus | Description |
---|---|---|---|
1 | Walmart Sales Forecast | Forecasting | Predict Walmart sales using historical data and time series analysis. |
2 | Card Default Predictor | Classification | Predict credit card defaults using ML techniques on transactional data. |
3 | BigMart Sales Forecast | Forecasting | Forecast sales at BigMart using store and product features. |
4 | Insurance Claims Severity | Regression/Ensemble | Predict insurance claims severity using ensemble methods. |
5 | House Price Predictor | Regression | Predict house prices using ML techniques in Python. |
6 | Music Recommender System | Recommendation Systems | Build a music recommender using KKBox's dataset and collaborative filtering. |
7 | Plant Species Classifier | Image Classification | Create a CNN-based classifier for plant species identification. |
8 | Avocado Price Predictor | Regression | Predict avocado prices using historical pricing data and ML models. |
9 | Insurance Pricing Forecast | Regression | Forecast insurance pricing using an XGBoost regressor. |
10 | Loan Eligibility Predictor | Classification | Predict loan eligibility using a gradient boosting classifier. |
11 | Churn Prediction Using Ensemble Models | Classification/Ensemble | Analyze customer churn using ensemble techniques. |
12 | Topic Modeling using KMeans | Clustering/NLP | Group customer reviews using KMeans clustering for topic modeling. |
13 | Card Default Prediction | Classification | Predict credit card defaults with advanced ML techniques. |
14 | Expedia Hotel Recommendation System | Recommendation Systems | Generate hotel recommendations using collaborative filtering and NLP. |
15 | Rossmann Sales Forecast | Forecasting | Forecast Rossmann store sales using time series analysis and ML. |
16 | Credit Fraud Detector | Classification | Detect fraudulent transactions using anomaly detection and classification methods. |
17 | Personalized Cancer Therapy | Healthcare/Personalization | Personalize cancer treatment using patient data and ML insights. |
18 | Telecom Churn Predictor | Classification | Predict telecom churn using logistic regression. |
19 | Loan Eligibility Prediction | Classification | Build a loan eligibility model using H2O.ai in Python. |
20 | Census Income Prediction | Classification | Predict adult census income using the Census Income dataset. |
21 | Ecommerce Product Reviews Analysis | Sentiment Analysis/Ranking | Analyze product reviews using pairwise ranking and sentiment analysis. |
22 | Retail Price Optimization | Optimization/Regression | Optimize retail pricing using historical sales data and ML models. |
23 | Market Basket Analysis | Association Rules | Discover frequent item sets using Apriori and FPGrowth algorithms. |
24 | Driver Demand Forecasting | Time Series Forecasting | Forecast driver availability using multistep time series analysis. |
25 | Collaborative Recommender System | Recommendation Systems | Build a collaborative filtering recommender system in Python. |
26 | Similar Images Finder | Computer Vision | Develop a tool to find similar images using Keras and TensorFlow. |
27 | Mask R-CNN Segmentation | Image Processing | Implement Mask R-CNN with TensorFlow for accurate image segmentation. |
28 | Face Recognition System | Computer Vision | Build a face recognition system in Python using FaceNet. |
29 | Business KPI Forecasting | Forecasting | Forecast key business KPIs using TensorFlow and Python. |
30 | Real-Time Fruit Detection | Deep Learning | Detect fruits in real time using YOLOv4. |
31 | Scaling ML using FEAST Feature Store | Classification | A beginner project using FEAST for scaling ML. |
32 | GCP Mask R-CNN MLOps | MLOps/Computer Vision | Deploy Mask R-CNN on GCP with uWSGI and Flask. |
33 | Rule Based Recommender System | Recommendation Systems | Build a rule-based recommender system in Python. |
34 | Recommender System using Association Rule Mining | Recommendation Systems | Build a recommender system for market basket analysis using association rule mining. |
35 | Multi-Class Image Classifier | Image Classification | Build a CNN for multi-class image classification. |
36 | Ola Bike Demand Forecast | Time Series Forecasting | Forecast ride demand for Ola Bike using time series analysis. |
37 | Medical Image Segmentation | Image Processing/Healthcare | Apply deep learning for accurate medical image segmentation. |
38 | MNIST Digit Recognition | Classification | Use CNNs to recognize handwritten digits from the MNIST dataset. |
39 | LSH Look-Alike Modeling | NLP/Similarity | Implement locality sensitive hashing to identify look-alike items. |
40 | Custom OCR System | Computer Vision/NLP | Build an OCR system from scratch using YOLO and Tesseract. |
41 | Multi-Touch Attribution Model | Marketing Analytics | Develop a model to understand multi-touch attribution in marketing. |
42 | OpenCV Beginner Level Project | Computer Vision | Learn computer vision basics with OpenCV. |
43 | Banking Classification Model | Classification/Finance | Build classification algorithms for digital transformation in banking. |
44 | Advanced OpenCV Project | Computer Vision | Master advanced OpenCV concepts for computer vision tasks. |
45 | ARMA Time Series Model | Time Series Analysis | Build autoregressive and moving average models for time series forecasting. |
46 | ARIMA Forecasting Model | Time Series Forecasting | Build an ARIMA model for forecasting time series data. |
47 | AWS Topic Modeling MLOps | MLOps/NLP | Deploy a topic modeling pipeline on AWS using Gunicorn and Flask. |
48 | MLR Time Series Model | Regression/Forecasting | Build a multiple linear regression model for time series forecasting. |
49 | Predict License Status | Classification | Implement various ensemble techniques to predict license status for a given business. |
50 | Logistic Regression Scratch | Classification | Build a logistic regression model in Python from scratch. |
51 | Resume Parsing Deployment | NLP/Model Deployment | Deploy a resume parsing model on GCP using Streamlit. |
52 | ARCH/GARCH Time Series Models | Time Series Analysis | Build ARCH and GARCH models for volatility forecasting. |
53 | Flask ML Deployment | Model Deployment | Deploy ML models using Flask for a beginner-friendly introduction. |
54 | Azure Medical Text Analytics | NLP/Cloud Deployment | Deploy Azure Text Analytics for a medical search engine. |
55 | AWS ARCH/GARCH MLOps | MLOps/Time Series | Deploy ARCH and GARCH models on AWS with MLOps practices. |
56 | TF Transfer Learning Classifier | Image Classification | Use TensorFlow transfer learning for image classification. |
57 | GCP Loan Eligibility Predictor | Classification | Predict loan eligibility on GCP using advanced ML techniques. |
58 | Anomaly Detection with LOF | Anomaly Detection | Detect anomalies using Isolation Forest and LOF in Python. |
59 | Build a Content Based Recommender System | Recommendation Systems | Build a content-based product recommender app with Streamlit. |
60 | GCP ARIMA MLOps Deployment | MLOps/Time Series | Deploy an ARIMA model on GCP using uWSGI and Flask. |
61 | Build an Optimal MLOps Pipeline | MLOps | Build an optimal end-to-end MLOps pipeline on GCP. |
62 | Transformer BART Summarizer | NLP/Model Deployment | Deploy a Transformer BART model on GCP for text summarization. |
63 | Gaussian Process Time Series | Time Series Forecasting | Build time series models using Gaussian Processes in Python. |
64 | AWS MLR MLOps Deployment | Regression/MLOps | Deploy a multiple linear regression model on AWS with MLOps. |
65 | GCP Kubeflow MLOps | MLOps | Deploy ML models on GCP using Kubeflow for scalable production. |
66 | AWS Gaussian Process Time Series MLOps | MLOps/Time Series | Deploy Gaussian Process time series models on AWS. |
67 | AWS Churn Prediction Deployment | Model Deployment | Deploy a customer churn prediction model on AWS. |
68 | Detectron2 Object Detection | Computer Vision | Implement Detectron2 for advanced object detection and segmentation. |
69 | Build a Simple Linear Regression Model | Regression | A beginner-friendly project on linear regression in Python. |
70 | Deep Time Series Forecasting | Deep Learning | Build a deep learning model for time series forecasting in Python. |
71 | Graph Recommender System | Recommendation Systems | Develop a graph-based recommendation system in Python. |
72 | AWS Banking Classification MLOps | MLOps/Finance | Deploy a classification model for banking on AWS. |
73 | Portfolio Optimization in R | Finance/Optimization | Build portfolio optimization models in R using ML-driven strategies. |
74 | Object Tracking with OpenCV | Computer Vision | Learn single and multi-object tracking using OpenCV and Python. |
75 | Paperspace BART Model Deployment | NLP/Cloud Deployment | Deploy a Transformer-BART model for text summarization on Paperspace Cloud. |
76 | Saturn Cloud Multi-Class Classifier | Classification/Cloud | Build a multi-class classification model on Saturn Cloud. |
77 | Ensemble Churn Predictor | Classification | Build a churn prediction model using ensemble learning techniques. |
78 | Azure DevOps Classification MLOps | MLOps/Classification | Deploy a classification model using Azure DevOps CI/CD pipelines. |
79 | Build a Multiple Linear Regression Model | Regression | Build a multiple linear regression model on a soccer player dataset. |
80 | Polynomial Regression Model | Regression | Learn to build a polynomial regression model from scratch. |
81 | PyCaret ML App Deployment | Automated ML/Deployment | Build and deploy an ML app using PyCaret and Streamlit. |
82 | Causal Inference in ML | Causal Inference | Explore causal inference techniques to determine cause-and-effect relationships in data. |
83 | Real Estate Price Predictor | Regression/NLP/Deployment | Build a real estate price prediction model with NLP and FastAPI. |
84 | Decision Tree Churn Predictor | Classification | Build a churn prediction model using decision trees. |
85 | Collaborative Filtering Recommender System | Recommendation Systems | Compare model-based and memory-based collaborative filtering methods for building a recommendation system. |
86 | Paperspace Resume Parser MLOps | MLOps/NLP | Deploy a resume parser model with an end-to-end MLOps workflow on Paperspace. |
87 | NumPy Regression Models | Regression | Build linear, ridge, and lasso regression models in NumPy from scratch. |
88 | Deploy RNN CNN models for TimeSeries on Azure | Deep Learning/Cloud | Deploy RNN/CNN models for time series forecasting on Azure. |
89 | Time Series Analysis with Prophet & Cesium | Time Series Forecasting | Analyze time series data using Facebook Prophet and Cesium. |
90 | Hybrid Recommender System | Recommendation Systems | Build a hybrid recommender system using LightFM. |
91 | House Price Regression Models | Regression | Implement regression models for predicting house prices. |
92 | Graph Recommendation System | Recommendation Systems | Build a recommender system project for eCommerce platforms and learn to use FAISS for efficient similarity search. |
93 | Purchase Propensity Predictor | Classification | Predict customer propensity to purchase using ML techniques. |
94 | Elevator Failure Classifier | Time Series Classification | Predict elevator failures using time series classification. |
95 | Regression Discontinuity Design | Causal Inference | Apply regression discontinuity design to evaluate causal impacts in data. |
96 | AWS SageMaker Classifier | Model Deployment/Cloud | Build and deploy a classification model on AWS SageMaker. |
97 | End-to-End ML Monitoring | MLOps | Implement an ML monitoring pipeline using Airflow and Docker. |
98 | AWS LSTM Deployment | Deep Learning/Time Series | Build and deploy an LSTM model on AWS SageMaker for forecasting tasks. |
99 | End-to-End Snowflake Healthcare Analytics Project on AWS | Healthcare/Data Engineering | Execute healthcare analytics using Snowflake and AWS – Part 1. |
100 | End-to-End Snowflake Healthcare Analytics Project on AWS Part 2 | Healthcare/Data Engineering | Continue building advanced healthcare analytics pipelines – Part 2. |
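
To give a feel for the kind of technique these projects cover, here is a minimal sketch of the support-counting idea behind market basket analysis (project 23). It is a brute-force pair count in plain Python, not the Apriori or FPGrowth implementation the project itself uses. The function name `frequent_pairs` and the toy baskets are hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs whose support is at least min_support.

    Support of a pair = fraction of baskets that contain both items.
    Hypothetical helper for illustration; not taken from the project code.
    """
    pair_counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair is counted once per basket,
        # always in the same (alphabetical) order.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1

    n = len(baskets)
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= min_support
    }


# Made-up toy data, assumed purely for this example.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "eggs"],
]

result = frequent_pairs(baskets, min_support=0.5)
print(result)
# {('bread', 'milk'): 0.75, ('butter', 'milk'): 0.5}
```

Counting every pair in every basket is fine for a handful of transactions, but the number of pairs grows quickly with basket size. Algorithms such as Apriori and FPGrowth exist to prune that search on real datasets.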