## INDIAN START-UP ECOSYSTEM FUNDING ANALYSIS  (2018-2021)

This project studies start-up businesses in the Indian ecosystem in order to make data-driven recommendations to my organisation on whether it is a ripe space to invest in. It is a comprehensive study of datasets spanning four years (2018 to 2021), with information on what each business does, who funds it, and where it is located, among other attributes. As the Data Analyst on the team, I will use these insights to give the team an informed recommendation on whether or not to invest.

# BUSINESS UNDERSTANDING 

Summary of Task:

This project involves analyzing the funding received by start-ups in India from 2018 to 2021. The goal is to investigate the Indian start-up ecosystem and propose strategic recommendations based on data-driven insights. The datasets are provided for each year, and the analysis will cover start-up details, funding amounts, and investors' information. Data is stored across various sources, and it is crucial to gather, clean, and analyze this data to derive meaningful insights.

Libraries and Packages:

- pandas for data manipulation and analysis
- numpy for numerical operations
- pyodbc for database connectivity
- sqlalchemy for database ORM (optional)
- matplotlib and seaborn for data visualization
- scikit-learn for machine learning (if applicable)
- python-dotenv for managing environment variables
- requests for handling HTTP requests (if needed)
- os and pathlib for handling file paths and directories


# BUSINESS QUESTIONS

To reach the goal stated above, we need business questions that point the analysis in the right direction. These are the questions I have posed:

1. What sectors have shown the highest growth in terms of funding received over the past four years?

2. What geographical regions within India have emerged as the primary hubs for startup activity and investment, and what factors contribute to their prominence?

3. Are there any notable differences in funding patterns between early-stage startups and more established companies?

4. Which sectors receive the lowest and the highest levels of funding in India, and what factors contribute to this?

5. Which investors have had the most impact on startups over the years?

6. What are the key characteristics of startups that successfully secure funding, and how do they differ from those that struggle to attract investment?

# HYPOTHESIS 

This project tests two competing hypotheses about the funding given to these start-ups; only one of them can hold.

Null Hypothesis (Ho): There is no significant difference in the amount of funding between startups in different locations (the "Location" column).

Alternative Hypothesis (Ha): There is a significant difference in the amount of funding between startups in different locations (the "Location" column).
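
One possible way to test these hypotheses is sketched below, assuming the cleaned data provides a numeric funding amount and a location for each row. It runs a simple permutation test on the difference in median funding between two locations. The function name, the sample figures, and the choice of a permutation test are all illustrative assumptions, not the method this notebook commits to.

```python
import numpy as np

def permutation_test_median(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in medians.

    Returns the observed difference and an approximate p-value.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = np.median(a) - np.median(b)

    pooled = np.concatenate([a, b])
    extreme = 0
    for _ in range(n_permutations):
        # Reassign the pooled amounts to the two groups at random.
        shuffled = rng.permutation(pooled)
        diff = np.median(shuffled[:a.size]) - np.median(shuffled[a.size:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Made-up funding amounts (USD) for two hypothetical locations.
location_a = [200_000, 400_000, 340_000, 1_500_000, 8_000_000, 13_200_000]
location_b = [100_000, 250_000, 300_000, 120_000, 90_000, 500_000]

observed_diff, p_value = permutation_test_median(location_a, location_b)
print(f"difference in medians: {observed_diff:,.0f}, p-value: {p_value:.4f}")
```

A small p-value would count as evidence against Ho. Medians are used here because funding amounts tend to be heavily skewed; with more than two locations, a test designed for several groups would be the natural extension.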

In [1]:
%pip install pyodbc
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd 
import numpy as np
import pyodbc 
import matplotlib.pyplot as plt 
from dotenv import dotenv_values 
import warnings 

warnings.filterwarnings('ignore')

# CONNECTING, COLLECTING AND LOADING DATASETS  

Importing Data Set into VSO Environment 

1. Databases for 2020 & 2021

In [3]:
# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')
 
# Get the values for the credentials you set in the '.env' file
server = environment_variables.get("SERVER")
database = environment_variables.get("DATABASE")
username = environment_variables.get("UID")
password = environment_variables.get("PWD")

connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password};MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"
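
If a key is missing from `.env`, the problem would otherwise only surface as an opaque driver error at connect time. The sketch below assumes the same four keys used above and fails early with a readable message. `build_connection_string` is a hypothetical helper, not part of `pyodbc` or `python-dotenv`, and the values in `example_env` are placeholders.

```python
REQUIRED_KEYS = ("SERVER", "DATABASE", "UID", "PWD")

def build_connection_string(env):
    """Build the ODBC connection string, failing early on missing keys."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing entries in .env: {', '.join(missing)}")
    return (
        "DRIVER={SQL Server};"
        f"SERVER={env['SERVER']};"
        f"DATABASE={env['DATABASE']};"
        f"UID={env['UID']};"
        f"PWD={env['PWD']};"
        "MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"
    )

# Placeholder values for illustration only; real values come from dotenv_values('.env').
example_env = {
    "SERVER": "example-server",
    "DATABASE": "example-db",
    "UID": "example-user",
    "PWD": "example-password",
}
print(build_connection_string(example_env))
```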

In [4]:
# Use the connect method of the pyodbc library and pass in the connection string.
# This will connect to the server and might take a few seconds to be complete.
 
connection = pyodbc.connect(connection_string)

In [5]:
# SQL query listing the base tables available in the database
db_query ="""SELECT *
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' """

In [6]:
data = pd.read_sql(db_query,connection)

In [7]:
query = "Select * from dbo.LP1_startup_funding2020"
data20 = pd.read_sql(query, connection)
data20


Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,
...,...,...,...,...,...,...,...,...,...,...
1050,Leverage Edu,,Delhi,Edtech,AI enabled marketplace that provides career gu...,Akshay Chaturvedi,"DSG Consumer Partners, Blume Ventures",1500000.0,,
1051,EpiFi,,,Fintech,It offers customers with a single interface fo...,"Sujith Narayanan, Sumit Gwalani","Sequoia India, Ribbit Capital",13200000.0,Seed Round,
1052,Purplle,2012.0,Mumbai,Cosmetics,Online makeup and beauty products retailer,"Manish Taneja, Rahul Dash",Verlinvest,8000000.0,,
1053,Shuttl,2015.0,Delhi,Transport,App based bus aggregator serice,"Amit Singh, Deepanshu Malviya",SIG Global India Fund LLP.,8043000.0,Series C,


In [8]:
# pulling up the first few rows of the data 
data20.head()


Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,


In [9]:
# Finding out necessary information about the dataframe 
data20.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
dtypes: float64(2), object(8)
memory usage: 82.5+ KB


In [10]:
# finding out the structure of the dataframe 
# Number of rows and columns
data20.shape

(1055, 10)

In [11]:
# Add a column recording the funding year for every row
data20['Funding_Year'] = 2020

In [12]:
# Make sure Funding_Year is stored as an integer
data20['Funding_Year'] = data20['Funding_Year'].astype(int)

In [13]:
data20.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 11 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
 10  Funding_Year   1055 non-null   int32  
dtypes: float64(2), int32(1), object(8)
memory usage: 86.7+ KB


In [14]:
data20.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10,Funding_Year
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,,2020
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,,2020
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,,2020
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,,2020
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,,2020


In [15]:
# printing out dataframe column names 
print(data20.columns)

Index(['Company_Brand', 'Founded', 'HeadQuarter', 'Sector', 'What_it_does',
       'Founders', 'Investor', 'Amount', 'Stage', 'column10', 'Funding_Year'],
      dtype='object')


In [16]:

data = pd.read_sql(db_query,connection)

query1 = "Select * from dbo.LP1_startup_funding2021"
data21 = pd.read_sql(query1, connection)
data21


Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed
...,...,...,...,...,...,...,...,...,...
1204,Gigforce,2019.0,Gurugram,Staffing & Recruiting,A gig/on-demand staffing company.,"Chirag Mittal, Anirudh Syal",Endiya Partners,$3000000,Pre-series A
1205,Vahdam,2015.0,New Delhi,Food & Beverages,VAHDAM is among the world’s first vertically i...,Bala Sarda,IIFL AMC,$20000000,Series D
1206,Leap Finance,2019.0,Bangalore,Financial Services,International education loans for high potenti...,"Arnav Kumar, Vaibhav Singh",Owl Ventures,$55000000,Series C
1207,CollegeDekho,2015.0,Gurugram,EdTech,"Collegedekho.com is Student’s Partner, Friend ...",Ruchir Arora,"Winter Capital, ETS, Man Capital",$26000000,Series B


In [17]:
data21.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


In [18]:
data21.shape

(1209, 9)

In [19]:
data21.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What_it_does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount         1206 non-null   object 
 8   Stage          781 non-null    object 
dtypes: float64(1), object(8)
memory usage: 85.1+ KB


In [20]:
# Add a column recording the funding year for every row
data21['Funding_Year'] = 2021

In [21]:
data21

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,Funding_Year
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A,2021
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",,2021
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D,2021
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C,2021
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed,2021
...,...,...,...,...,...,...,...,...,...,...
1204,Gigforce,2019.0,Gurugram,Staffing & Recruiting,A gig/on-demand staffing company.,"Chirag Mittal, Anirudh Syal",Endiya Partners,$3000000,Pre-series A,2021
1205,Vahdam,2015.0,New Delhi,Food & Beverages,VAHDAM is among the world’s first vertically i...,Bala Sarda,IIFL AMC,$20000000,Series D,2021
1206,Leap Finance,2019.0,Bangalore,Financial Services,International education loans for high potenti...,"Arnav Kumar, Vaibhav Singh",Owl Ventures,$55000000,Series C,2021
1207,CollegeDekho,2015.0,Gurugram,EdTech,"Collegedekho.com is Student’s Partner, Friend ...",Ruchir Arora,"Winter Capital, ETS, Man Capital",$26000000,Series B,2021


In [22]:
data21.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,Funding_Year
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A,2021
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",,2021
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D,2021
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C,2021
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed,2021


In [23]:
data21.shape

(1209, 10)

In [24]:
data21.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What_it_does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount         1206 non-null   object 
 8   Stage          781 non-null    object 
 9   Funding_Year   1209 non-null   int64  
dtypes: float64(1), int64(1), object(8)
memory usage: 94.6+ KB


In [25]:
# printing out dataframe column names 
print(data21.columns)

Index(['Company_Brand', 'Founded', 'HeadQuarter', 'Sector', 'What_it_does',
       'Founders', 'Investor', 'Amount', 'Stage', 'Funding_Year'],
      dtype='object')
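
The 2020 and 2021 loads above follow the same read-then-tag pattern, so they could be folded into one helper. The sketch below is a minimal illustration under stated assumptions: an in-memory SQLite database with made-up rows stands in for the SQL Server connection, and `load_year` is a hypothetical function name. Against the real server the table names would carry the `dbo.` prefix used above.

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the SQL Server connection used above.
connection = sqlite3.connect(":memory:")
pd.DataFrame({"Company_Brand": ["A", "B"], "Amount": [1.0, 2.0]}).to_sql(
    "LP1_startup_funding2020", connection, index=False)
pd.DataFrame({"Company_Brand": ["C"], "Amount": ["$3"]}).to_sql(
    "LP1_startup_funding2021", connection, index=False)

def load_year(connection, table, year):
    """Read one yearly table and tag every row with its funding year."""
    frame = pd.read_sql(f"SELECT * FROM {table}", connection)
    frame["Funding_Year"] = year
    return frame

tables = {2020: "LP1_startup_funding2020", 2021: "LP1_startup_funding2021"}
frames = {year: load_year(connection, table, year) for year, table in tables.items()}

for year, frame in frames.items():
    print(year, frame.shape)
```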


Loading Data Sets Into VSO Environment 2 

2. Database Collection 2019

In [26]:
# Load the 2019 dataset from the CSV file downloaded from OneDrive to the local machine
data19 = pd.read_csv('startup_funding2019.csv') 
data19

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",
...,...,...,...,...,...,...,...,...,...
84,Infra.Market,,Mumbai,Infratech,It connects client requirements to their suppl...,"Aaditya Sharda, Souvik Sengupta","Tiger Global, Nexus Venture Partners, Accel Pa...","$20,000,000",Series A
85,Oyo,2013.0,Gurugram,Hospitality,Provides rooms for comfortable stay,Ritesh Agarwal,"MyPreferred Transformation, Avendus Finance, S...","$693,000,000",
86,GoMechanic,2016.0,Delhi,Automobile & Technology,Find automobile repair and maintenance service...,"Amit Bhasin, Kushal Karwa, Nitin Rana, Rishabh...",Sequoia Capital,"$5,000,000",Series B
87,Spinny,2015.0,Delhi,Automobile,Online car retailer,"Niraj Singh, Ramanshu Mahaur, Ganesh Pawar, Mo...","Norwest Venture Partners, General Catalyst, Fu...","$50,000,000",


In [27]:
data19.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",


In [28]:
# Add a column recording the funding year for every row
data19['Funding_Year'] = 2019

In [29]:
data19.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Funding_Year
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",,2019
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C,2019
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding,2019
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D,2019
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",,2019


In [30]:
data19.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
 9   Funding_Year   89 non-null     int64  
dtypes: float64(1), int64(1), object(8)
memory usage: 7.1+ KB


In [31]:
data19.shape

(89, 10)

In [32]:
# printing out dataframe column names 
print(data19.columns)

Index(['Company/Brand', 'Founded', 'HeadQuarter', 'Sector', 'What it does',
       'Founders', 'Investor', 'Amount($)', 'Stage', 'Funding_Year'],
      dtype='object')


Loading Dataset into VSO Environment 

3. Data Collection 2018


In [33]:
# Load the 2018 dataset from the CSV file downloaded from the GitHub repository to the local machine
data18 = pd.read_csv('startup_funding2018.csv') 
data18

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...
...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif..."
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...
524,Droni Tech,Information Technology,Seed,"₹35,000,000","Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...


In [34]:
data18.head()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...


In [35]:
# Add a column recording the funding year for every row
data18['Funding_year'] = 2018
data18

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company,Funding_year
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",2018
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,2018
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,2018
...,...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif...",2018
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.,2018
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...,2018
524,Droni Tech,Information Technology,Seed,"₹35,000,000","Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...,2018


In [36]:
data18.head()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company,Funding_year
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",2018
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,2018
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,2018


In [37]:
data18.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
 6   Funding_year   526 non-null    int64 
dtypes: int64(1), object(6)
memory usage: 28.9+ KB


In [38]:
data18.shape

(526, 7)
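
The `Amount` column in the 2018 data is stored as text and mixes plain digits, values prefixed with `₹`, and a dash placeholder, while the 2019 and 2021 data use `$` and thousands separators. A minimal sketch of one way to turn such strings into numbers follows. `parse_amount` is a hypothetical helper, the conversion rate is a placeholder rather than a sourced figure, and treating unprefixed values as USD is an assumption that should be checked against the data source.

```python
import re

def parse_amount(raw, inr_to_usd):
    """Convert a raw amount string to a float in USD, or None if unusable.

    inr_to_usd is a caller-supplied conversion rate; no rate is built in.
    """
    if raw is None:
        return None
    text = str(raw).strip()
    digits = re.sub(r"[^\d.]", "", text)  # drop currency symbols and commas
    if not digits:
        return None  # placeholders such as a lone dash carry no amount
    try:
        value = float(digits)
    except ValueError:
        return None  # malformed numbers such as "1.2.3"
    if "₹" in text:
        value *= inr_to_usd
    return value

rate = 0.0146  # hypothetical INR-to-USD rate, for illustration only
# "\u2014" is the dash placeholder seen in the 2018 Amount column.
for sample in ["250000", "₹40,000,000", "$1,200,000", "\u2014"]:
    print(repr(sample), "->", parse_amount(sample, rate))
```

Applied with `Series.map`, the unusable entries would become missing values that later steps can count and handle explicitly.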

In [39]:
# Renaming some columns 
data20.rename(columns = {'Company_Brand' :'Company_Name'}, inplace =True)

data20.rename(columns = {'HeadQuarter': 'Location'}, inplace =True)

data20.head()

Unnamed: 0,Company_Name,Founded,Location,Sector,What_it_does,Founders,Investor,Amount,Stage,column10,Funding_Year
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,,2020
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,,2020
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,,2020
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,,2020
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,,2020


In [40]:
# Select the columns needed for the analysis and comparisons that follow.
# .copy() makes the selection an independent DataFrame, so later assignments do not act on a view.
data20 = data20[['Company_Name', 'Founded','Location','Sector','Investor','Amount','Stage','Funding_Year']].copy()
data20.head() 

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
0,Aqgromalin,2019.0,Chennai,AgriTech,Angel investors,200000.0,,2020
1,Krayonnz,2019.0,Bangalore,EdTech,GSF Accelerator,100000.0,Pre-seed,2020
2,PadCare Labs,2018.0,Pune,Hygiene management,Venture Center,,Pre-seed,2020
3,NCOME,2020.0,New Delhi,Escrow,"Venture Catalysts, PointOne Capital",400000.0,,2020
4,Gramophone,2016.0,Indore,AgriTech,"Siana Capital Management, Info Edge",340000.0,,2020


In [41]:
# Convert the Founded column to a nullable integer type (missing years stay missing)
data20['Founded'] = pd.to_numeric(data20['Founded'], errors='coerce').astype('Int64')
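
The four yearly frames use different column names (`Company Name`, `Company/Brand`, `Company_Brand`; `Industry` versus `Sector`; `Round/Series` versus `Stage`), so the renaming done for 2020 has to be repeated for the other years before they can be compared. The sketch below shows one way to do that with a per-year mapping. The mapping, the `harmonise` helper and the one-row sample frames are assumptions made for illustration; columns a year does not have are filled with missing values.

```python
import pandas as pd

# Per-year mapping from source column names to one shared schema.
COLUMN_MAPS = {
    2018: {"Company Name": "Company_Name", "Industry": "Sector",
           "Round/Series": "Stage", "Funding_year": "Funding_Year"},
    2019: {"Company/Brand": "Company_Name", "HeadQuarter": "Location",
           "Amount($)": "Amount"},
    2020: {"Company_Brand": "Company_Name", "HeadQuarter": "Location"},
    2021: {"Company_Brand": "Company_Name", "HeadQuarter": "Location"},
}
SHARED_COLUMNS = ["Company_Name", "Founded", "Location", "Sector",
                  "Investor", "Amount", "Stage", "Funding_Year"]

def harmonise(frame, year):
    """Rename columns to the shared schema and keep only the shared columns."""
    renamed = frame.rename(columns=COLUMN_MAPS.get(year, {}))
    # reindex adds any shared column the year lacks, filled with NaN.
    return renamed.reindex(columns=SHARED_COLUMNS)

# One-row stand-ins that reuse the column names printed earlier.
sample18 = pd.DataFrame({
    "Company Name": ["TheCollegeFever"], "Industry": ["Brand Marketing"],
    "Round/Series": ["Seed"], "Amount": ["250000"],
    "Location": ["Bangalore, Karnataka, India"], "Funding_year": [2018],
})
sample19 = pd.DataFrame({
    "Company/Brand": ["Ruangguru"], "Founded": [2014.0], "HeadQuarter": ["Mumbai"],
    "Sector": ["Edtech"], "Investor": ["General Atlantic"],
    "Amount($)": ["$150,000,000"], "Stage": ["Series C"], "Funding_Year": [2019],
})

harmonised = [harmonise(sample18, 2018), harmonise(sample19, 2019)]
for frame in harmonised:
    print(frame.columns.tolist())
```

Once every year shares the same columns, the frames can be stacked with `pd.concat` for cross-year analysis.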

# EXPLORATORY DATA ANALYSIS 


In [42]:
data18.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
 6   Funding_year   526 non-null    int64 
dtypes: int64(1), object(6)
memory usage: 28.9+ KB


In [43]:
data18.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Funding_year,526.0,2018.0,0.0,2018.0,2018.0,2018.0,2018.0,2018.0


In [44]:
data18.shape

(526, 7)

In [45]:
# Checking for Missing Values in 2018 data 
print(data18.isna().sum())

Company Name     0
Industry         0
Round/Series     0
Amount           0
Location         0
About Company    0
Funding_year     0
dtype: int64


In [46]:
# Checking for duplicated values in 2018 data 
print(data18.duplicated().sum())

1
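
The count above shows that a duplicated row exists but not which one. A short sketch of how the repeated rows could be inspected and then removed is given below; the frame contains made-up rows rather than the actual 2018 data.

```python
import pandas as pd

# Made-up rows; the second and third are exact duplicates.
frame = pd.DataFrame({
    "Company Name": ["Alpha", "Beta", "Beta", "Gamma"],
    "Amount": ["250000", "₹40,000,000", "₹40,000,000", "7500"],
})

# keep=False marks every copy, so the repeated rows can be compared side by side.
print(frame[frame.duplicated(keep=False)])

# drop_duplicates keeps the first occurrence by default.
deduplicated = frame.drop_duplicates().reset_index(drop=True)
print(len(frame), "->", len(deduplicated))
```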


In [47]:
data19.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
 9   Funding_Year   89 non-null     int64  
dtypes: float64(1), int64(1), object(8)
memory usage: 7.1+ KB


In [48]:
data19.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Founded,60.0,2014.533333,2.937003,2004.0,2013.0,2015.0,2016.25,2019.0
Funding_Year,89.0,2019.0,0.0,2019.0,2019.0,2019.0,2019.0,2019.0


In [49]:
data19.shape

(89, 10)

In [50]:
# Checking for missing values in 2019 data
print(data19.isna().sum())

Company/Brand     0
Founded          29
HeadQuarter      19
Sector            5
What it does      0
Founders          3
Investor          0
Amount($)         0
Stage            46
Funding_Year      0
dtype: int64


In [51]:
# Checking for duplicated values in 2019 data
print(data19.duplicated().sum())

0


In [52]:
data20.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Company_Name  1055 non-null   object 
 1   Founded       842 non-null    Int64  
 2   Location      961 non-null    object 
 3   Sector        1042 non-null   object 
 4   Investor      1017 non-null   object 
 5   Amount        801 non-null    float64
 6   Stage         591 non-null    object 
 7   Funding_Year  1055 non-null   int32  
dtypes: Int64(1), float64(1), int32(1), object(5)
memory usage: 63.0+ KB


In [53]:
data20.shape

(1055, 8)

In [54]:
# Checking for Missing Values in 2020 data 
print(data20.isna().sum())

Company_Name      0
Founded         213
Location         94
Sector           13
Investor         38
Amount          254
Stage           464
Funding_Year      0
dtype: int64


In [55]:
data20.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Founded,842.0,2015.36342,4.097909,1973.0,2014.0,2016.0,2018.0,2020.0
Amount,801.0,113042969.543071,2476634939.888347,12700.0,1000000.0,3000000.0,11000000.0,70000000000.0
Funding_Year,1055.0,2020.0,0.0,2020.0,2020.0,2020.0,2020.0,2020.0


In [56]:
# Checking for duplicated values in 2020 data 
print(data20.duplicated().sum())

3


In [57]:
data21.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What_it_does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount         1206 non-null   object 
 8   Stage          781 non-null    object 
 9   Funding_Year   1209 non-null   int64  
dtypes: float64(1), int64(1), object(8)
memory usage: 94.6+ KB


In [58]:
data21.shape

(1209, 10)

In [59]:
data21.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Founded,1208.0,2016.655629,4.517364,1963.0,2015.0,2018.0,2020.0,2021.0
Funding_Year,1209.0,2021.0,0.0,2021.0,2021.0,2021.0,2021.0,2021.0


In [60]:
# Checking for missing values in 2021 data 
print(data21.isna().sum())

Company_Brand      0
Founded            1
HeadQuarter        1
Sector             0
What_it_does       0
Founders           4
Investor          62
Amount             3
Stage            428
Funding_Year       0
dtype: int64


In [61]:
data21.duplicated()

0       False
1       False
2       False
3       False
4       False
        ...  
1204    False
1205    False
1206    False
1207    False
1208    False
Length: 1209, dtype: bool

In [62]:
# Checking for Duplicate Values in 2021 data
print(data21.duplicated().sum())

19


# DATA CLEANING 

Year 2021

During the exploratory stage, the following issues were identified:
1. There are 19 duplicate rows in the dataset.
2. Six columns have missing values: Founded (1), HeadQuarter (1), Founders (4), Investor (62), Amount (3), and Stage (428).
3. The Amount column has dtype object instead of a numeric dtype (int64 or float64).
4. The values in the Sector column need to be grouped into consistent categories.
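
The checks behind this list (duplicates, missing values, non-numeric columns) can be gathered into one helper and reused for each yearly dataset. This is only a minimal sketch: `quality_report` is a hypothetical name that does not appear in the notebook, and the demo frame is made-up data.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Summarise row count, duplicates, missing values and non-numeric columns."""
    missing = df.isna().sum()
    return {
        "rows": len(df),
        "duplicates": int(df.duplicated().sum()),
        # Keep only the columns that actually have missing values
        "missing": {col: int(n) for col, n in missing.items() if n > 0},
        # Non-numeric columns are candidates for type conversion (e.g. Amount)
        "non_numeric_columns": [
            col for col in df.columns
            if not pd.api.types.is_numeric_dtype(df[col])
        ],
    }


# Toy data standing in for one of the yearly datasets
demo = pd.DataFrame(
    {
        "Company_Brand": ["A", "B", "B"],
        "Amount": ["$100", None, None],
        "Funding_Year": [2021, 2021, 2021],
    }
)
report = quality_report(demo)
print(report)
```

Calling `quality_report(data21)` before and after each cleaning step would give a quick way to confirm that a step had the intended effect.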

In [63]:
# Filtering the Duplicated Values in  2021 dataset 
data21[data21.duplicated()]

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,Funding_Year
107,Curefoods,2020.0,Bangalore,Food & Beverages,Healthy & nutritious foods and cold pressed ju...,Ankit Nagori,"Iron Pillar, Nordstar, Binny Bansal",$13000000,,2021
109,Bewakoof,2012.0,Mumbai,Apparel & Fashion,Bewakoof is a lifestyle fashion brand that mak...,Prabhkiran Singh,InvestCorp,$8000000,,2021
111,FanPlay,2020.0,Computer Games,Computer Games,A real money game app specializing in trivia g...,YC W21,"Pritesh Kumar, Bharat Gupta",Upsparks,$1200000,2021
117,Advantage Club,2014.0,Mumbai,HRTech,Advantage Club is India's largest employee eng...,"Sourabh Deorah, Smiti Bhatt Deorah","Y Combinator, Broom Ventures, Kunal Shah",$1700000,,2021
119,Ruptok,2020.0,New Delhi,FinTech,Ruptok fintech Pvt. Ltd. is an online gold loa...,Ankur Gupta,Eclear Leasing,$1000000,,2021
243,Trinkerr,2021.0,Bangalore,Capital Markets,Trinkerr is India's first social trading platf...,"Manvendra Singh, Gaurav Agarwal",Accel India,$6600000,Series A,2021
244,Zorro,2021.0,Gurugram,Social network,Pseudonymous social network platform,"Jasveer Singh, Abhishek Asthana, Deepak Kumar","Vijay Shekhar Sharma, Ritesh Agarwal, Ankiti Bose",$32000000,Seed,2021
245,Ultraviolette,2021.0,Bangalore,Automotive,Create and Inspire the future of sustainable u...,"Subramaniam Narayan, Niraj Rajmohan","TVS Motor, Zoho",$150000000,Series C,2021
246,NephroPlus,2009.0,Hyderabad,Hospital & Health Care,A vision and passion of redefining healthcare ...,Vikram Vuppala,IIFL Asset Management,$24000000,Series E,2021
247,Unremot,2020.0,Bangalore,Information Technology & Services,Unremot is a personal office for consultants!,Shiju Radhakrishnan,Inflection Point Ventures,$700000,Seed,2021


In [64]:
# Dropping Duplicates 
data21.drop_duplicates(inplace = True)

In [65]:
# Confirming the Changes made with regards to duplicates 
data21.duplicated().sum()

0

All duplicate rows in the 2021 dataset have been dropped.

In [66]:
# Entries in wrong columns 
data21.loc[[98, 241, 242, 257, 538, 545, 551, 674, 1100, 1148, 1176]]

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,Funding_Year
98,FanPlay,2020.0,Computer Games,Computer Games,A real money game app specializing in trivia g...,YC W21,"Pritesh Kumar, Bharat Gupta",Upsparks,$1200000,2021
241,MasterChow,2020.0,Food & Beverages,Hauz Khas,A ready-to-cook Asian cuisine brand,"Vidur Kataria, Sidhanth Madan",WEH Ventures,$461000,Seed,2021
242,Fullife Healthcare,2009.0,Pharmaceuticals\t#REF!,Primary Business is Development and Manufactur...,Varun Khanna,Morgan Stanley Private Equity Asia,$22000000,Series C,,2021
257,MoEVing,2021.0,Gurugram\t#REF!,MoEVing is India's only Electric Mobility focu...,"Vikash Mishra, Mragank Jain","Anshuman Maheshwary, Dr Srihari Raju Kalidindi",$5000000,Seed,,2021
538,Little Leap,2020.0,New Delhi,EdTech,Soft Skills that make Smart Leaders,Holistic Development Programs for children in ...,Vishal Gupta,ah! Ventures,$300000,2021
545,AdmitKard,2016.0,Noida,EdTech,A tech solution for end to end career advisory...,"Vamsi Krishna, Pulkit Jain, Gaurav Munjal\t#REF!",$1000000,Pre-series A,,2021
551,BHyve,2020.0,Mumbai,Human Resources,A Future of Work Platform for diffusing Employ...,Backed by 100x.VC,"Omkar Pandharkame, Ketaki Ogale","ITO Angel Network, LetsVenture",$300000,2021
674,MYRE Capital,2020.0,Mumbai,Commercial Real Estate,Democratising Real Estate Ownership,Own rent yielding commercial properties,Aryaman Vir,,$6000000,2021
1100,Sochcast,2020.0,Online Media\t#REF!,Sochcast is an Audio experiences company that ...,"CA Harvinderjit Singh Bhatia, Garima Surana, A...","Vinners, Raj Nayak, Amritaanshu Agrawal",$Undisclosed,,,2021
1148,Godamwale,2016.0,Mumbai,Logistics & Supply Chain,Godamwale is tech enabled integrated logistics...,"Basant Kumar, Vivek Tiwari, Ranbir Nandan",1000000\t#REF!,Seed,,2021


In [67]:
# Correcting Entries in Wrong Columns 

# LABEL 98 (FanPlay)
data21.loc[98, "Amount"] = "$1200000"
data21.loc[98, "Stage"] = "Undisclosed"
data21.loc[98, "Investor"] = "Upsparks"
data21.loc[98, "HeadQuarter"] = "undisclosed"

## LABEL 241 (MasterChow)
data21.loc[241,  "HeadQuarter"] = "Hauz Khas"
data21.loc[241, "Sector"] = "Food & Beverages"

#LABEL 257(MoEVing)
data21.loc[257, "Amount"] = "$5000000"
data21.loc[257, "HeadQuarter"] = "Gurugram"
data21.loc[257, "Sector"] = "Electronic Vehicle"
data21.loc[257, "Stage"] = "Seed"
data21.loc[257, "Investor"] = "Anshuman Maheshwary, Dr Srihari Raju Kalidindi"

#LABEL 242, 256(Fullife Healthcare)
data21.loc[242, "Amount"] = "$22000000"
data21.loc[242, "Investor"] = "Morgan Stanley Private Equity Asia"
data21.loc[242, "Stage"] = "Series C"
data21.loc[242, "HeadQuarter"] = "Mumbai"
data21.loc[242, "Sector"] = "Pharmaceuticals"
data21.loc[242, "Founders"] = "Varun Khanna"
data21.loc[242, "What_it_does"] = "Primary Business is Development and Manufacture of sports nutrition and supplements"

#LABEL 538 (Little Leap)
data21.loc[538, "Amount"] = "$300000"
data21.loc[538, "Investor"] = "ah! Ventures"
data21.loc[538, "Stage"] = "Undisclosed"
data21.loc[538, "Founders"] = "Vishal Gupta"

#LABEL 545 (AdmitKard)
data21.loc[545, "Amount"] = "$1000000"
data21.loc[545, "Investor"] = "Unknown"
data21.loc[545, "Stage"] = "Pre-series A"
data21.loc[545, "Founders"] = "Vamsi Krishna, Pulkit Jain, Gaurav Munjal"

#LABEL 551 (BHyve)
data21.loc[551, "Amount"] = "$300000"
data21.loc[551, "Investor"] = "ITO Angel Network, LetsVenture, 100x.VC"
data21.loc[551, "Stage"] = "Undisclosed"
data21.loc[551, "Sector"] = "HR"

#LABEL 674 (MYRE Capital)
data21.loc[674, "Amount"] = "$6000000"
data21.loc[674, "Stage"] = "Undisclosed"

#LABEL 677 (Saarthi Pedagogy)
data21.loc[677, "Amount"] = "$1000000"
data21.loc[677, "Investor"] = "JITO Angel Network, LetsVenture"
data21.loc[677, "Stage"] = "Undisclosed"

#LABEL 1100 (Sochcast)
data21.loc[1100, "Investor"] = "Undisclosed"
data21.loc[1100, "HeadQuarter"] = "Undisclosed"
data21.loc[1100, "Sector"] = "Online Media"

#LABEL 1148 (Godamwale)
data21.loc[1148, "Amount"] = "1000000"
data21.loc[1148, "Investor"] = "Undisclosed"
data21.loc[1148, "Stage"] = "Seed"

#LABEL 1176 (Peak)
data21.loc[1176, "Sector"] = "Information Technology & Services"
data21.loc[1176, "HeadQuarter"] = "Manchester"

data21

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,Funding_Year
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A,2021
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",,2021
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D,2021
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C,2021
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed,2021
...,...,...,...,...,...,...,...,...,...,...
1204,Gigforce,2019.0,Gurugram,Staffing & Recruiting,A gig/on-demand staffing company.,"Chirag Mittal, Anirudh Syal",Endiya Partners,$3000000,Pre-series A,2021
1205,Vahdam,2015.0,New Delhi,Food & Beverages,VAHDAM is among the world’s first vertically i...,Bala Sarda,IIFL AMC,$20000000,Series D,2021
1206,Leap Finance,2019.0,Bangalore,Financial Services,International education loans for high potenti...,"Arnav Kumar, Vaibhav Singh",Owl Ventures,$55000000,Series C,2021
1207,CollegeDekho,2015.0,Gurugram,EdTech,"Collegedekho.com is Student’s Partner, Friend ...",Ruchir Arora,"Winter Capital, ETS, Man Capital",$26000000,Series B,2021


To locate and correct these entries, cross-referencing and manual inspection were used, as the affected rows were few in number.
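
With a larger dataset, manual inspection may not scale, so a simple heuristic could shortlist candidate rows first. The sketch below flags rows where the stage looks like a dollar amount, or where the amount is neither a figure nor an "Undisclosed" placeholder. `find_shifted_rows` is a hypothetical helper, both columns are assumed to hold text, and the demo rows only imitate the pattern seen above.

```python
import pandas as pd


def find_shifted_rows(df: pd.DataFrame,
                      amount_col: str = "Amount",
                      stage_col: str = "Stage") -> pd.DataFrame:
    """Return rows whose amount/stage values look like they sit in the wrong column."""
    # A stage that starts with "$" is almost certainly a misplaced amount
    stage_looks_like_money = df[stage_col].str.startswith("$", na=False)

    # An amount is accepted if it is missing, or is digits/commas or
    # "Undisclosed", each with an optional leading "$"
    amount_ok = df[amount_col].isna() | df[amount_col].str.fullmatch(
        r"\$?(?:[\d,]+|[Uu]ndisclosed)", na=False
    )

    return df[stage_looks_like_money | ~amount_ok]


demo = pd.DataFrame(
    {
        "Company_Brand": ["FanPlay", "MasterChow", "Qube Health", "Godamwale"],
        "Amount": ["Upsparks", "$461000", "Undisclosed", "Seed"],
        "Stage": ["$1200000", "Seed", "Pre-series A", None],
    }
)
flagged = find_shifted_rows(demo)
print(flagged)
```

The heuristic only narrows the search; each flagged row would still need to be checked by hand before it is corrected.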

In [68]:
# Cleaning the column names of the 2021 dataset
data21 = data21.rename(columns=lambda x: x.lower().replace(' ', '_'))
# Rename company_brand as company/brand
data21 = data21.rename(columns={"company_brand":"company/brand"})
# Adding the year column
data21["funded_year"] = 2021
data21

Unnamed: 0,company/brand,founded,headquarter,sector,what_it_does,founders,investor,amount,stage,funding_year,funded_year
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A,2021,2021
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",,2021,2021
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D,2021,2021
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C,2021,2021
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed,2021,2021
...,...,...,...,...,...,...,...,...,...,...,...
1204,Gigforce,2019.0,Gurugram,Staffing & Recruiting,A gig/on-demand staffing company.,"Chirag Mittal, Anirudh Syal",Endiya Partners,$3000000,Pre-series A,2021,2021
1205,Vahdam,2015.0,New Delhi,Food & Beverages,VAHDAM is among the world’s first vertically i...,Bala Sarda,IIFL AMC,$20000000,Series D,2021,2021
1206,Leap Finance,2019.0,Bangalore,Financial Services,International education loans for high potenti...,"Arnav Kumar, Vaibhav Singh",Owl Ventures,$55000000,Series C,2021,2021
1207,CollegeDekho,2015.0,Gurugram,EdTech,"Collegedekho.com is Student’s Partner, Friend ...",Ruchir Arora,"Winter Capital, ETS, Man Capital",$26000000,Series B,2021,2021
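
The yearly datasets use different column names for the same information (for example `Company Name` in 2018 and `Company_Brand` in 2021), so the same renaming idea could be applied to every year before the data is combined. The sketch below is one possible approach. The mapping in `RENAME_2018` is an assumption about which 2018 columns correspond to the 2021 ones, and `standardise_columns` is a hypothetical helper.

```python
import pandas as pd

# Assumed mapping from the 2018 column names to the 2021-style names
RENAME_2018 = {
    "Company Name": "company/brand",
    "Industry": "sector",
    "Round/Series": "stage",
    "Location": "headquarter",
    "About Company": "what_it_does",
    "Funding_year": "funding_year",
}


def standardise_columns(df: pd.DataFrame, rename_map: dict) -> pd.DataFrame:
    """Apply the explicit renames, then lower-case and underscore whatever is left."""
    out = df.rename(columns=rename_map)
    out.columns = [col.lower().replace(" ", "_") for col in out.columns]
    return out


# Empty toy frames that carry only the column headers
demo18 = pd.DataFrame(columns=["Company Name", "Industry", "Round/Series", "Amount",
                               "Location", "About Company", "Funding_year"])
demo21 = pd.DataFrame(columns=["company/brand", "sector", "stage", "amount", "funding_year"])

clean18 = standardise_columns(demo18, RENAME_2018)
shared = [col for col in clean18.columns if col in demo21.columns]
print(shared)
```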


Exchange rates (annual average, Indian Rupee to US Dollar)

Source: OFX

exchange_rates = {
    2018: 0.014649,
    2019: 0.014209,
    2020: 0.013501,
    2021: 0.013527
}
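
As a rough illustration of how these rates could be applied, the sketch below converts a rupee amount to dollars using the rate for the funding year. `inr_to_usd` is a hypothetical helper and the amounts are made up; the rates are copied from the dictionary above.

```python
# Annual average INR -> USD rates, copied from the dictionary above
exchange_rates = {
    2018: 0.014649,
    2019: 0.014209,
    2020: 0.013501,
    2021: 0.013527,
}


def inr_to_usd(amount_inr: float, year: int) -> float:
    """Convert a rupee amount to US dollars using that year's average rate."""
    if year not in exchange_rates:
        raise KeyError(f"No exchange rate recorded for {year}")
    return round(amount_inr * exchange_rates[year], 2)


print(inr_to_usd(40_000_000, 2018))
print(inr_to_usd(100, 2021))
```

Looking the rate up by year avoids hard-coding a single rate inside each cleaning function.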

In [69]:
# Cleaning the Amount Column 
# Checking for values that do not have the dollar sign attached to them 
data21[~data21.amount.str.get(0).isin(['$'])]

Unnamed: 0,company/brand,founded,headquarter,sector,what_it_does,founders,investor,amount,stage,funding_year,funded_year
7,Qube Health,2016.0,Mumbai,HealthTech,India's Most Respected Workplace Healthcare Ma...,Gagan Kapur,Inflection Point Ventures,Undisclosed,Pre-series A,2021,2021
8,Vitra.ai,2020.0,Bangalore,Tech Startup,Vitra.ai is an AI-based video translation plat...,Akash Nidhi PS,Inflexor Ventures,Undisclosed,,2021,2021
21,Uable,2020.0,Bangalore,EdTech,Uable offers role based programmes to empower ...,Saurabh Saxena,"Chiratae Ventures, JAFCO Asia",Undisclosed,Seed,2021,2021
39,TruNativ,2019.0,Mumbai,Food & Beverages,TruNativ Foods & Beverages Pvt Ltd,"Pranav Malhotra, Mamta Malhotra",9Unicorns,Undisclosed,Seed,2021,2021
54,AntWak,2019.0,Bangalore,EdTech,AntWak provides a video platform for e-learnin...,"Basav Nagur, Joybroto Ganguly, Sudhanshu Shekh...","Vaibhav Domkundwar, Kunal Shah",Undisclosed,Seed,2021,2021
64,Rage Coffee,2018.0,New Delhi,Food & Beverages,Rage Coffee is infused with plant based vitami...,Bharat Sethi,"GetVantage, Prakash Katama",Undisclosed,Pre-series A,2021,2021
67,Kudos,2014.0,Pune,FinTech,"Kudos is a full stack NBFC, a digital lending ...",Naresh Vigh,Marquee fintech founders,Undisclosed,Pre-series A,2021,2021
86,Onsurity,2020.0,Bangalore,HealthCare,Onsurity is an employee healthcare platform pr...,"Kulin Shah, Yogesh Agarwal","Jitendra Gupta, Harsh Shah",,,2021,2021
316,Hubhopper,2015.0,New Delhi,Podcast,"Hubhopper is India's leading podcast hosting, ...",Gautam Raj Anand,"ITI Growth Opportunities Fund, Unit-E Ventures",Undisclosed,,2021,2021
319,Battery Smart,2019.0,New Delhi,Battery,Battery Smart provides advanced Li-ion batteri...,"Pulkit Khurana, Siddharth Sikka",Orios Venture Partners,Undisclosed,Seed,2021,2021


In [70]:
data21.amount.unique()

array(['$1,200,000', '$120,000,000', '$30,000,000', '$51,000,000',
       '$2,000,000', '$188,000,000', '$200,000', 'Undisclosed',
       '$1,000,000', '$3,000,000', '$100,000', '$700,000', '$9,000,000',
       '$40,000,000', '$49,000,000', '$400,000', '$300,000',
       '$25,000,000', '$160,000,000', '$150,000', '$1,800,000',
       '$5,000,000', '$850,000', '$53,000,000', '$500,000', '$1,100,000',
       '$6,000,000', '$800,000', '$10,000,000', '$21,000,000',
       '$7,500,000', '$26,000,000', '$7,400,000', '$1,500,000',
       '$600,000', '$800,000,000', '$17,000,000', '$3,500,000',
       '$15,000,000', '$215,000,000', '$2,500,000', '$350,000,000',
       '$5,500,000', '$83,000,000', '$110,000,000', '$500,000,000',
       '$65,000,000', '$150,000,000,000', '$300,000,000', '$2,200,000',
       '$35,000,000', '$140,000,000', '$4,000,000', '$13,000,000', None,
       '$Undisclosed', '$2000000', '$800000', '$6000000', '$2500000',
       '$9500000', '$13000000', '$5000000', '$8000000',

In [71]:
# Remove the "#REF!" residue from the amount and replace the "Undisclosed" placeholders with NaN
data21['amount'] = data21['amount'].str.replace("1000000\t#REF!", '1000000', regex=False).replace(['$Undisclosed', '$undisclosed', 'Undisclosed'], np.nan)


In [72]:
data21.amount.unique()

array(['$1,200,000', '$120,000,000', '$30,000,000', '$51,000,000',
       '$2,000,000', '$188,000,000', '$200,000', nan, '$1,000,000',
       '$3,000,000', '$100,000', '$700,000', '$9,000,000', '$40,000,000',
       '$49,000,000', '$400,000', '$300,000', '$25,000,000',
       '$160,000,000', '$150,000', '$1,800,000', '$5,000,000', '$850,000',
       '$53,000,000', '$500,000', '$1,100,000', '$6,000,000', '$800,000',
       '$10,000,000', '$21,000,000', '$7,500,000', '$26,000,000',
       '$7,400,000', '$1,500,000', '$600,000', '$800,000,000',
       '$17,000,000', '$3,500,000', '$15,000,000', '$215,000,000',
       '$2,500,000', '$350,000,000', '$5,500,000', '$83,000,000',
       '$110,000,000', '$500,000,000', '$65,000,000', '$150,000,000,000',
       '$300,000,000', '$2,200,000', '$35,000,000', '$140,000,000',
       '$4,000,000', '$13,000,000', None, '$2000000', '$800000',
       '$6000000', '$2500000', '$9500000', '$13000000', '$5000000',
       '$8000000', '$1000000', '$1200000', '

In [73]:
# Defining a function that cleans the amount column of data21 in place when it is called
def clean_amount():
    # Copy the original amount column and remove the thousands separators
    amount_column = data21["amount"].copy().str.replace(",", "", regex=False)

    # Strip every leading dollar sign; this also covers values that start with "$$"
    amount_column = amount_column.str.lstrip("$")

    # Convert the cleaned column to numeric; non-numeric values become NaN
    amount_column = pd.to_numeric(amount_column, errors="coerce")

    # Update the amount column
    data21["amount"] = amount_column
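
A variant that takes a Series and returns a new one, instead of modifying `data21` directly, may be easier to test on a few values before it is applied to the whole column. The sketch below is one such version; `parse_usd` is a hypothetical name and the demo values imitate the formats seen in the amount column.

```python
import pandas as pd


def parse_usd(amount: pd.Series) -> pd.Series:
    """Return numeric amounts; values that are not plain dollar figures become NaN."""
    cleaned = (
        amount.str.replace(",", "", regex=False)  # drop thousands separators
        .str.lstrip("$")                          # drop any leading dollar signs
    )
    return pd.to_numeric(cleaned, errors="coerce")


demo = pd.Series(["$1,200,000", "Undisclosed", "$Undisclosed", None, "$2000000"])
parsed = parse_usd(demo)
print(parsed)
```

It could then be applied with `data21["amount"] = parse_usd(data21["amount"])`.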

In [74]:
# Checking for null values in the amount column
data21["amount"].isna().sum()
data21["amount"]

0         $1,200,000
1       $120,000,000
2        $30,000,000
3        $51,000,000
4         $2,000,000
            ...     
1204        $3000000
1205       $20000000
1206       $55000000
1207       $26000000
1208        $8000000
Name: amount, Length: 1190, dtype: object

In [75]:
# Cleaning the stage column 
modified_stage= data21.stage.unique()
modified_stage

array(['Pre-series A', None, 'Series D', 'Series C', 'Seed', 'Series B',
       'Series E', 'Pre-seed', 'Series A', 'Pre-series B', 'Debt',
       'Undisclosed', 'Bridge', 'Seed+', 'Series F2', 'Series A+',
       'Series G', 'Series F', 'Series H', 'Series B3', 'PE', 'Series F1',
       'Pre-series A1', 'Early seed', 'Series D1', 'Seies A',
       'Pre-series', 'Series A2', 'Series I'], dtype=object)
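
The list of unique values includes at least one misspelling (`'Seies A'`). A small mapping can correct such entries while leaving missing values untouched. This is a minimal sketch: `STAGE_FIXES` and `normalise_stage` are hypothetical names, and the mapping would need to be extended after reviewing the full list.

```python
import pandas as pd

# Hypothetical corrections; extend after reviewing stage.unique()
STAGE_FIXES = {"Seies A": "Series A"}


def normalise_stage(stage: pd.Series) -> pd.Series:
    """Trim whitespace and fix known misspellings, leaving missing values as they are."""
    return stage.str.strip().replace(STAGE_FIXES)


demo = pd.Series(["Seies A", " Seed ", None, "Series F2"])
cleaned = normalise_stage(demo)
print(cleaned.tolist())
```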

In [76]:
data21["amount"].isna().sum()

139

In [77]:
# Performing Descriptive Statistics after light cleaning of the amount column 
data21.describe(include="all").T

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
company/brand,1190.0,1033.0,BharatPe,8.0,,,,,,,
founded,1189.0,,,,2016.634146,4.522098,1963.0,2015.0,2018.0,2020.0,2021.0
headquarter,1189.0,68.0,Bangalore,421.0,,,,,,,
sector,1190.0,252.0,FinTech,121.0,,,,,,,
what_it_does,1190.0,1143.0,BharatPe develops a QR code-based payment app ...,4.0,,,,,,,
founders,1186.0,1095.0,"Ashneer Grover, Shashvat Nakrani",7.0,,,,,,,
investor,1129.0,933.0,Inflection Point Ventures,23.0,,,,,,,
amount,1051.0,269.0,$1000000,52.0,,,,,,,
stage,773.0,28.0,Seed,243.0,,,,,,,
funding_year,1190.0,,,,2021.0,0.0,2021.0,2021.0,2021.0,2021.0,2021.0


In [None]:
# Creating a function to clean the amount column of the 2021 dataset

def clean_amount_2021(Amount):
    try:
        Amount = str(Amount)
        # Remove commas and dash placeholders
        Amount = Amount.replace(",", "")
        Amount = Amount.replace('—', "")
        # If the value is in Indian Rupees, convert it to US Dollars using the 2021 average rate (1 INR = 0.013527 USD)
        if "₹" in Amount:
            Amount = Amount.replace("₹", "")
            return round(float(Amount) * 0.013527, 2)
        # Check if the value is in US Dollars
        elif "$" in Amount:
            Amount = Amount.replace("$", "")
            return round(float(Amount), 2)
        # If no currency symbol is present, assume US Dollars
        else:
            return round(float(Amount), 2)
    except ValueError:
        # If the value is not a number, return NaN
        return np.nan

# Clean the amount column of the 2021 DataFrame (the column was renamed to lowercase earlier)
data21["amount"] = data21["amount"].apply(clean_amount_2021)

Year 2020

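During the exploratory stage, the following issues were identified:
1. There are 3 duplicate rows in the dataset.
2. Six columns have missing values: Founded (213), Location (94), Sector (13), Investor (38), Amount (254), and Stage (464).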

In [None]:
# Displaying the 2020 DataFrame
data20

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
0,Aqgromalin,2019,Chennai,AgriTech,Angel investors,200000.0,,2020
1,Krayonnz,2019,Bangalore,EdTech,GSF Accelerator,100000.0,Pre-seed,2020
2,PadCare Labs,2018,Pune,Hygiene management,Venture Center,,Pre-seed,2020
3,NCOME,2020,New Delhi,Escrow,"Venture Catalysts, PointOne Capital",400000.0,,2020
4,Gramophone,2016,Indore,AgriTech,"Siana Capital Management, Info Edge",340000.0,,2020
...,...,...,...,...,...,...,...,...
1050,Leverage Edu,,Delhi,Edtech,"DSG Consumer Partners, Blume Ventures",1500000.0,,2020
1051,EpiFi,,,Fintech,"Sequoia India, Ribbit Capital",13200000.0,Seed Round,2020
1052,Purplle,2012,Mumbai,Cosmetics,Verlinvest,8000000.0,,2020
1053,Shuttl,2015,Delhi,Transport,SIG Global India Fund LLP.,8043000.0,Series C,2020


In [None]:
# Removing duplicates from the dataset (the result is assigned back so the change persists)
data20 = data20.drop_duplicates()
data20

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
0,Aqgromalin,2019,Chennai,AgriTech,Angel investors,200000.0,,2020
1,Krayonnz,2019,Bangalore,EdTech,GSF Accelerator,100000.0,Pre-seed,2020
2,PadCare Labs,2018,Pune,Hygiene management,Venture Center,,Pre-seed,2020
3,NCOME,2020,New Delhi,Escrow,"Venture Catalysts, PointOne Capital",400000.0,,2020
4,Gramophone,2016,Indore,AgriTech,"Siana Capital Management, Info Edge",340000.0,,2020
...,...,...,...,...,...,...,...,...
1050,Leverage Edu,,Delhi,Edtech,"DSG Consumer Partners, Blume Ventures",1500000.0,,2020
1051,EpiFi,,,Fintech,"Sequoia India, Ribbit Capital",13200000.0,Seed Round,2020
1052,Purplle,2012,Mumbai,Cosmetics,Verlinvest,8000000.0,,2020
1053,Shuttl,2015,Delhi,Transport,SIG Global India Fund LLP.,8043000.0,Series C,2020


In [None]:
# Confirming changes 
data20

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
0,Aqgromalin,2019,Chennai,AgriTech,Angel investors,200000.0,,2020
1,Krayonnz,2019,Bangalore,EdTech,GSF Accelerator,100000.0,Pre-seed,2020
2,PadCare Labs,2018,Pune,Hygiene management,Venture Center,,Pre-seed,2020
3,NCOME,2020,New Delhi,Escrow,"Venture Catalysts, PointOne Capital",400000.0,,2020
4,Gramophone,2016,Indore,AgriTech,"Siana Capital Management, Info Edge",340000.0,,2020
...,...,...,...,...,...,...,...,...
1050,Leverage Edu,,Delhi,Edtech,"DSG Consumer Partners, Blume Ventures",1500000.0,,2020
1051,EpiFi,,,Fintech,"Sequoia India, Ribbit Capital",13200000.0,Seed Round,2020
1052,Purplle,2012,Mumbai,Cosmetics,Verlinvest,8000000.0,,2020
1053,Shuttl,2015,Delhi,Transport,SIG Global India Fund LLP.,8043000.0,Series C,2020


In [None]:
# Dropping column10, if present, because it is mostly missing values and is not needed
data20.drop(columns='column10', inplace=True, errors='ignore')


In [None]:
# Converting the Founded column to datetime (year only)

data20['Founded']=pd.to_datetime(data20['Founded'],format='%Y',errors='coerce')
