# India Start-ups Ecosystem Data Analysis

**Business Understanding**      

**Objective**

This project analyzes the Indian start-up ecosystem, focusing on the funding that start-ups in India received from 2018 to 2021, and proposes the best course of action based on the findings. The data comes from three sources: a database, OneDrive, and a GitHub repository. The analysis is intended to give entrepreneurs, investors, policymakers, and researchers insight into how start-ups have grown over time, which sectors and industries show high growth potential, and how funding, mergers, and acquisitions have trended, among other insights the analysis may reveal.

**Hypothesis Statements**

**Null Hypothesis (H0):**
Startups with multiple founders do not raise significantly more money than those with a single founder.

**Alternative Hypothesis (H1):**
Startups with multiple founders raise significantly more money than those with a single founder.
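One possible way to compare the two groups is a one-sided permutation test, in which the group labels are shuffled under the assumption that founder count makes no difference. The sketch below is only an illustration: `permutation_p_value` is a hypothetical helper, the amounts are made up, and the notebook itself may use a different test.

```python
import random
from statistics import mean


def permutation_p_value(multi, single, n_resamples=10_000, seed=0):
    """One-sided permutation test on mean(multi) - mean(single)."""
    rng = random.Random(seed)
    observed = mean(multi) - mean(single)
    pooled = list(multi) + list(single)
    n_multi = len(multi)
    hits = 0
    for _ in range(n_resamples):
        # Shuffling the pooled values simulates "founder count has no effect".
        rng.shuffle(pooled)
        diff = mean(pooled[:n_multi]) - mean(pooled[n_multi:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Made-up amounts purely to exercise the function; not taken from the dataset.
multi_founder = [5.0, 7.5, 6.0, 9.0, 8.0]
single_founder = [1.0, 2.0, 1.5, 2.5, 3.0]
p_value = permutation_p_value(multi_founder, single_founder)
print(f"p-value: {p_value:.4f}")
```

A small p-value would count as evidence that multi-founder startups raise more. Funding amounts are typically heavy-tailed, so comparing medians or log amounts instead of means may be more robust.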


**Understanding each column in the data set**

**1. Company_Brand:** The name of the company.

**2. Founded:** The year in which the company was founded.

**3. HeadQuarter:** The city in which the company's headquarters is located.

**4. Sector:** The sector in which the company operates.

**5. What_it_does:** A short description of what the company does.

**6. Founders:** The names of the company's founders.

**7. Investor:** The names of the company's investors.

**8. Amount:** The total funding amount received.

**9. Stage:** The funding stage the company is in, e.g. Pre-seed, Seed, Series C.
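In the data previews in this notebook, the Founders column holds comma-separated names, so a founder count has to be derived before single-founder and multi-founder startups can be compared. A minimal sketch, assuming names never contain commas themselves (`count_founders` is a hypothetical helper, not part of the original notebook):

```python
def count_founders(founders):
    """Return the number of comma-separated names, or None when the value is missing."""
    if not isinstance(founders, str) or not founders.strip():
        return None
    names = [name.strip() for name in founders.split(",")]
    return len([name for name in names if name])


print(count_founders("Prasanna Manogaran, Bharani C L"))  # 2
print(count_founders("Ajinkya Dhariya"))                   # 1
print(count_founders(None))                                # None
```

Applied to a DataFrame column, this could be used as `df["Founders"].apply(count_founders)`.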


With a clear understanding of each column, we can formulate analytical questions that the data set can answer.

**Analytical questions:**

Question 1: What is the average funding amount for startups based in different cities?

Question 2: How does the total investment compare across different sectors?

Which sectors have received the most and least funding?

Question 3: What is the distribution of investment amounts across different stages of funding?

How does the funding amount differ between Seed, Pre-series, Series A, Series B, etc.?

Question 4: Is there a significant difference in the amount of funding received by companies with single founders versus multiple founders?

What is the average funding for single-founder startups compared to multi-founder startups?

Question 5: How does the age of the company (years since founding) relate to the stage of funding?

Are newer companies more likely to be in the Seed or Pre-series stage?

Question 6: What is the distribution of funding amounts within specific sectors?

For example, within HealthTech or FinTech, how are the investments distributed?

Question 7: Which investors are most common across different sectors and stages?

Are there any investors that frequently appear across multiple companies or sectors?

Question 8: How many companies are in each stage of funding?

What is the proportion of companies in Seed, Pre-series, Series A, Series B, etc.?

Question 9: How does the number of founders correlate with the stage of funding?

Are there more single-founder companies in the early stages of funding compared to later stages?
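Several of these questions reduce to grouped aggregations. The sketch below shows how Question 1 and Question 8 might be approached with pandas, assuming Amount has already been converted to numbers; the rows are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy rows, invented for illustration; column names follow the column list above.
toy = pd.DataFrame({
    "HeadQuarter": ["Bangalore", "Bangalore", "Pune", "Mumbai"],
    "Stage": ["Seed", "Series A", "Seed", "Seed"],
    "Amount": [100_000.0, 300_000.0, 50_000.0, None],
})

# Question 1: mean funding per city (missing amounts are skipped by default).
mean_by_city = toy.groupby("HeadQuarter")["Amount"].mean()

# Question 8: share of companies in each funding stage.
stage_share = toy["Stage"].value_counts(normalize=True)

print(mean_by_city)
print(stage_share)
```

A city whose amounts are all missing ends up with a mean of NaN, so missing values should be reviewed before the averages are interpreted.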

**Data Understanding**




In [32]:
#Install pyodbc and python-dotenv
#%pip install pyodbc  
#%pip install python-dotenv 



In [2]:
# Import the necessary libraries and packages.
import pyodbc 
from dotenv import dotenv_values 
import pandas as pd
import warnings
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

warnings.filterwarnings('ignore')

## Create .env file 

Because the database credentials for the first dataset are sensitive, details such as the username and password must be kept out of public view. They are therefore stored in a .env file rather than in the notebook.

In [3]:
# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')

# Read the credential values set in the .env file
database = environment_variables.get("DATABASE")
server = environment_variables.get("SERVER")
login = environment_variables.get("USERNAME")
password = environment_variables.get("PASSWORD")

# Build the connection string with an f-string.
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={login};PWD={password}"
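Because `.get` returns `None` for a key that is missing from the .env file, the f-string would embed the text `None` in the connection string, and the problem would only surface when connecting. A small check before connecting can make that failure easier to diagnose. This is a sketch: `missing_settings` is a hypothetical helper, and the server and username values are placeholders.

```python
def missing_settings(values, required=("DATABASE", "SERVER", "USERNAME", "PASSWORD")):
    """Return the required keys that are absent or empty in a dict of settings."""
    return [key for key in required if not values.get(key)]


# Placeholder values standing in for the result of dotenv_values('.env').
example = {"DATABASE": "dapDB", "SERVER": "example-server", "USERNAME": "analyst"}
missing = missing_settings(example)
if missing:
    print(f"Missing .env entries: {', '.join(missing)}")
```

In the notebook, the same check could be run on `environment_variables` before the connection string is built.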


In [5]:
connection = pyodbc.connect(connection_string)

In [6]:
# List the tables in the database and read the result into a DataFrame.
query = ''' SELECT *
            FROM INFORMATION_SCHEMA.TABLES
            WHERE TABLE_TYPE = 'BASE TABLE' '''

database_tables=pd.read_sql(query,connection)
print(database_tables)


  TABLE_CATALOG TABLE_SCHEMA               TABLE_NAME  TABLE_TYPE
0         dapDB          dbo  LP1_startup_funding2021  BASE TABLE
1         dapDB          dbo  LP1_startup_funding2020  BASE TABLE


# Read the two datasets into pandas DataFrames one after the other.

In [7]:
# Load the 2020 start-up funding table from the database into data_2020

query="Select * from dbo.LP1_startup_funding2020"
data_2020=pd.read_sql(query,connection)

data_2020.head(10)

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,
5,qZense,2019.0,Bangalore,AgriTech,qZense Labs is building the next-generation Io...,"Rubal Chib, Dr Srishti Batra","Venture Catalysts, 9Unicorns Accelerator Fund",600000.0,Seed,
6,MyClassboard,2008.0,Hyderabad,EdTech,MyClassboard is a full-fledged School / Colleg...,Ajay Sakhamuri,ICICI Bank.,600000.0,Pre-series A,
7,Metvy,2018.0,Gurgaon,Networking platform,AI driven networking platform for individuals ...,Shawrya Mehrotra,HostelFund,,Pre-series,
8,Rupeek,2015.0,Bangalore,FinTech,Rupeek is an online lending platform that spec...,"Amar Prabhu, Ashwin Soni, Sumit Maniyar","KB Investment, Bertelsmann India Investments",45000000.0,Series C,
9,Gig India,2017.0,Pune,Crowdsourcing,GigIndia is a marketplace that provides on-dem...,"Aditya Shirole, Sahil Sharma","Shantanu Deshpande, Subramaniam Ramadorai",1000000.0,Pre-series A,


In [11]:
# Add the start_up funding year column to the 2020 dataset to indicate the funding year.
data_2020['year_collected']=2020
data_2020.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10,year_collected
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,,2020
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,,2020
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,,2020
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,,2020
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,,2020


In [8]:
# Load the 2021 start-up funding table from the database into data_2021

query="Select * from dbo.LP1_startup_funding2021"
data_2021=pd.read_sql(query,connection)

data_2021.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


In [12]:
# Add the start_up funding year column to the 2021 dataset to indicate the funding year.
data_2021['year_collected']=2021
data_2021.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,year_collected
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A,2021
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",,2021
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D,2021
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C,2021
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed,2021
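The load-and-tag steps for the two database tables follow the same pattern, so they could be folded into one function. The sketch below assumes a hypothetical helper named `load_year` and uses an in-memory SQLite database as a stand-in so that it runs without the SQL Server. In the notebook, the existing `connection` and a table name such as `dbo.LP1_startup_funding2020` would be passed instead.

```python
import sqlite3

import pandas as pd


def load_year(connection, table_name, year):
    """Read a whole table into a DataFrame and tag it with its funding year."""
    # The table name is interpolated directly, so only pass trusted names.
    frame = pd.read_sql(f"SELECT * FROM {table_name}", connection)
    frame["year_collected"] = year
    return frame


# Stand-in database with a single row, used only to exercise the helper.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE LP1_startup_funding2020 (Company_Brand TEXT, Amount REAL)")
demo.execute("INSERT INTO LP1_startup_funding2020 VALUES ('Aqgromalin', 200000.0)")
data_demo = load_year(demo, "LP1_startup_funding2020", 2020)
print(data_demo)
```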


With the 2020 and 2021 data loaded, we next load the 2018 data, startup_funding2018.csv, which is stored in a GitHub repository.

The 2019 dataset was downloaded from OneDrive and is likewise read into a pandas DataFrame.

In [13]:
# Load the 2018 dataset into a DataFrame with read_csv, since it is a CSV file
data_2018=pd.read_csv(r'C:\Users\USER\Desktop\indian start-up\Indian-start-up-ecosystem-Analysis\Data\startup_funding2018.csv')
data_2018.head()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...


In [14]:
# Add the start_up funding year column to the 2018 dataset to indicate the funding year.
data_2018['year_collected']=2018

data_2018.head()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company,year_collected
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",2018
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,2018
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,2018


In [10]:
# Load the 2019 dataset into a DataFrame with read_csv, since it is also a CSV file.
data_2019=pd.read_csv(r'C:\Users\USER\Desktop\indian start-up\Indian-start-up-ecosystem-Analysis\Data\startup_funding2019.csv')
data_2019.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",


In [15]:
# Add the start_up funding year column to the 2019 dataset to indicate the funding year.
data_2019['year_collected']=2019
data_2019.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,year_collected
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",,2019
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C,2019
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding,2019
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D,2019
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",,2019


In [17]:
# First, rename all columns in each dataframe to conform to the 2021 dataset for easy concatenation.

data_2018.columns=['Company_Brand','Sector','Stage', 'Amount', 'HeadQuarter', 'What_it_does','year_collected']

data_2019.columns=['Company_Brand','Founded','HeadQuarter','Sector','What_it_does','Founders','Investor', 'Amount', 'Stage', 'year_collected']

# Now, concatenate all the dataframes

df = pd.concat([data_2021, data_2020, data_2019, data_2018], axis=0)
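Assigning to `.columns` renames by position, so it depends on the column order of each file and would mislabel columns silently if that order ever changed. Concatenating along `axis=0` also keeps each frame's original row labels, so index values repeat in the combined frame. A sketch of a name-based alternative follows; the mappings are read off the 2018 and 2019 previews, while `combine` and the demo frames are hypothetical.

```python
import pandas as pd

# Old name -> new name, taken from the column headers in the 2018 and 2019 previews.
RENAME_2018 = {
    "Company Name": "Company_Brand",
    "Industry": "Sector",
    "Round/Series": "Stage",
    "Location": "HeadQuarter",
    "About Company": "What_it_does",
}
RENAME_2019 = {
    "Company/Brand": "Company_Brand",
    "What it does": "What_it_does",
    "Amount($)": "Amount",
}


def combine(frames_with_maps):
    """Rename each frame by column name, then stack them with a fresh index."""
    renamed = [frame.rename(columns=mapping) for frame, mapping in frames_with_maps]
    return pd.concat(renamed, axis=0, ignore_index=True)


# Tiny invented frames, used only to show the behavior.
demo_2018 = pd.DataFrame({"Company Name": ["A"], "Industry": ["FinTech"], "Amount": ["250000"]})
demo_2019 = pd.DataFrame({"Company/Brand": ["B"], "Amount($)": ["$6,300,000"]})
combined = combine([(demo_2018, RENAME_2018), (demo_2019, RENAME_2019)])
print(combined)
```

Columns that exist in only some years, such as Sector in the demo, are filled with NaN for the other years.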

In [18]:
df.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,year_collected,column10
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A,2021,
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",,2021,
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D,2021,
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C,2021,
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed,2021,
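The previews show that Amount arrives in several formats: plain floats in 2020, strings such as `$1,200,000` in 2019 and 2021, and rupee-prefixed strings, bare digits, or a dash placeholder in 2018. Before any numeric analysis, these need to be brought to one representation. The sketch below assumes a hypothetical helper, `parse_amount`, that only recognizes the formats visible in the previews and returns the currency separately. It does not convert rupees to dollars, because no exchange rate is given in this notebook.

```python
import re

_AMOUNT = re.compile(r"([₹$]?)\s*(\d[\d,]*(?:\.\d+)?)")
_CURRENCY = {"₹": "INR", "$": "USD"}


def parse_amount(raw):
    """Return (currency, value); currency is None when no symbol is present."""
    if isinstance(raw, (int, float)):
        # NaN is the only float that is not equal to itself.
        return (None, None) if raw != raw else (None, float(raw))
    if not isinstance(raw, str):
        return (None, None)
    match = _AMOUNT.fullmatch(raw.strip())
    if match is None:
        # Placeholders and any unrecognized text are treated as missing.
        return (None, None)
    symbol, number = match.groups()
    return (_CURRENCY.get(symbol), float(number.replace(",", "")))


print(parse_amount("$1,200,000"))   # ('USD', 1200000.0)
print(parse_amount("₹40,000,000"))  # ('INR', 40000000.0)
print(parse_amount("250000"))       # (None, 250000.0)
print(parse_amount("\u2014"))       # (None, None), the dash placeholder in the 2018 preview
```

Values without a symbol keep a currency of `None`, so the decision about which currency they represent stays explicit rather than being guessed by the parser.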
