This repository contains the Python code, data, and presentation for a group project completed as part of the MScBA in Business Analytics & Management program at Erasmus University Rotterdam. The project, undertaken for the Business Analytics Applications with Python course (February to April 2024), analyzed investment rounds and acquisitions of real startups from 1995 to 2014. Our goal was to develop a model to predict each startup's total funding.
Our project consisted of two main stages:
- Data Engineering and Exploration: We cleaned and prepared the data for analysis, identified key trends and patterns, and created visualizations to communicate our findings effectively.
- Predictive Machine Learning: We built and compared various models, including K-Nearest Neighbors (K-NN), Lasso Regression, Random Forests, and LightGBM, to predict total startup funding as accurately as possible.
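
As a rough illustration of the model comparison in the second stage, the sketch below scores three of the model families with cross-validated RMSE using scikit-learn. It is not the code from question2.ipynb: the synthetic data from `make_regression`, the hyperparameters, the `compare_models` helper, and the choice of RMSE as the metric are all assumptions made for this example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the startup data (assumption: NOT the project dataset).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Candidate models; hyperparameters are illustrative, not tuned values.
# K-NN and Lasso are scale-sensitive, so they are wrapped with a scaler.
models = {
    "K-NN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=1.0)),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}


def compare_models(models, X, y, cv=5):
    """Return the mean cross-validated RMSE per model (lower is better)."""
    scores = {}
    for name, model in models.items():
        neg_rmse = cross_val_score(
            model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
        )
        scores[name] = float(-neg_rmse.mean())
    return scores


results = compare_models(models, X, y)
for name, rmse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {rmse:.2f}")
```

If the `lightgbm` package is installed, an `LGBMRegressor` could be added to the `models` dictionary in the same way.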
This repository contains the following files:
- Project data.7z: This compressed file contains the original datasets used in the project. It must be decompressed before the notebooks can be run.
- question1.ipynb: This Jupyter Notebook focuses on data engineering and visualization tasks.
- question2.ipynb: This Jupyter Notebook explores the different modeling techniques used in the project.
- Presentation.pptx: This presentation summarizes the findings of our data analysis, including methodology, results, and conclusions.
The project produced several predictive models, built on the data engineering and exploration from the first stage. Of these, Lasso regression performed best on our evaluation metrics.