Indian Start-up Ecosystem Funding Analysis (2018-2021)

Business Understanding

Summary of the Task:
This project analyzes the funding received by start-ups in India from 2018 to 2021. The goal is to investigate the Indian start-up ecosystem and propose strategic recommendations based on data-driven insights. A dataset is provided for each year, and the analysis covers start-up details, funding amounts, and investor information. The data is stored across various sources, so it must be gathered, cleaned, and analyzed before meaningful insights can be derived.


Project Name:
Indian Start-up Funding Project (2018-2021)

Libraries and Packages:
pandas for data manipulation and analysis
numpy for numerical operations
pyodbc for database connectivity
sqlalchemy for database ORM (optional)
matplotlib and seaborn for data visualization
scikit-learn for machine learning (if applicable)
python-dotenv for managing environment variables
requests for handling HTTP requests (if needed)
os and pathlib for handling file paths and directories

## Business Questions

1. What sectors have shown the highest growth in terms of funding received over the past four years?

2. What geographical regions within India have emerged as the primary hubs for startup activity and investment, and what factors contribute to their prominence?

3. Are there any notable differences in funding patterns between early-stage startups and more established companies?

4. Which sectors receive the lowest and the highest levels of funding in India, and what factors contribute to this?

5. Which investors have had the most impact on startups over the years?

6. What are the key characteristics of startups that successfully secure funding, and how do they differ from those that struggle to attract investment?

1. Sectors with Highest Growth: This question helps identify the sectors that are experiencing rapid growth in terms of funding received, providing valuable insights into where investor interest and capital are flowing. Understanding these sectors can help investors identify potential high-growth opportunities for investment.

2. Geographical Regions for Startup Activity: Understanding the primary hubs for startup activity and investment within India helps investors gauge where the most vibrant ecosystems are located. Factors contributing to their prominence, such as infrastructure, government support, and access to talent, can influence investment decisions and strategies.

3. Funding Patterns Across Startup Stages: Comparing funding patterns between early-stage startups and more established companies helps investors understand how investment behavior varies depending on the maturity and growth stage of the startup. This insight can inform investment strategies tailored to different stages of the startup lifecycle.

4. Sectorial Funding Disparities: Identifying sectors with the lowest and highest levels of funding sheds light on where capital is concentrated and where there may be untapped opportunities. Understanding the factors contributing to these disparities can help investors assess sector-specific risks and opportunities.

5. Impactful Investors: Analyzing the influence of different investors on startups over the years provides insights into which investors have been most active and successful in driving startup growth. This understanding can help investors identify potential partners or co-investors and assess the reputations and track records of different investment firms.

6. Characteristics of Funded Startups: Identifying key characteristics shared by startups that successfully secure funding helps investors understand what factors contribute to investment readiness and attractiveness. Contrasting these characteristics with those of startups that struggle to attract investment can provide valuable lessons for entrepreneurs and investors alike.

Null Hypothesis (H0): There is no significant difference in the amount of funding between startups in different locations.

Alternative Hypothesis (Ha): There is a significant difference in the amount of funding between startups in different locations.
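
One possible way to test this hypothesis is sketched below. It uses the Kruskal-Wallis H-test from SciPy, a library the notebook does not import, because funding amounts tend to be heavily skewed and this test does not assume normality. The choice of test, the 0.05 threshold, the helper name `location_funding_test`, and the toy figures are all illustrative assumptions, not the notebook's data or method.

```python
import pandas as pd
from scipy.stats import kruskal

# Hypothetical toy data standing in for the cleaned funding table
toy = pd.DataFrame({
    "Location": ["Bangalore"] * 4 + ["Mumbai"] * 4 + ["Pune"] * 4,
    "Amount": [5e6, 1.2e7, 3e6, 8e6, 2e6, 4e6, 1e6, 6e6, 5e5, 7e5, 3e5, 9e5],
})

def location_funding_test(df: pd.DataFrame, alpha: float = 0.05) -> dict:
    """Kruskal-Wallis test of Amount across Location groups (missing amounts are dropped)."""
    groups = [g["Amount"].dropna().to_numpy() for _, g in df.groupby("Location")]
    groups = [g for g in groups if len(g) > 0]
    statistic, p_value = kruskal(*groups)
    return {
        "statistic": float(statistic),
        "p_value": float(p_value),
        "reject_h0": bool(p_value < alpha),  # True means the location groups differ
    }

result = location_funding_test(toy)
print(result)
```

A p-value below the chosen threshold would lead to rejecting H0 in favour of Ha; on the real data the test would be run only after the Amount and Location columns have been cleaned.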

In [3]:
# Data manipulation and analysis
import pandas as pd
import numpy as np

# Database connectivity
import pyodbc

# Database ORM (optional)
from sqlalchemy import create_engine

# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning (if applicable)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Managing environment variables
from dotenv import dotenv_values

# Handling HTTP requests (if needed)
import requests

# Handling file paths and directories
import os
from pathlib import Path


import warnings 

warnings.filterwarnings('ignore')

Loading Data to Python VSO Environment:

1. Database Connection (2020 and 2021 Data):

In [4]:
# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')

# Get the values for the credentials you set in the '.env' file
server = environment_variables.get("SERVER")
database = environment_variables.get("DATABASE")
username = environment_variables.get("UID")
password = environment_variables.get("PWD")


In [5]:
# Create a connection string
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password};MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"

In [6]:
# Use the connect method of the pyodbc library and pass in the connection string.
# This will connect to the server and might take a few seconds to be complete. 
# Check your internet connection if it takes more time than necessary

connection = pyodbc.connect(connection_string)
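
A missing entry in the `.env` file would only surface as an opaque driver error at connect time. The sketch below shows one way to fail earlier with a readable message. The helper `build_connection_string`, the `REQUIRED_KEYS` tuple, and the placeholder credentials are hypothetical; the connection string layout mirrors the one used above.

```python
REQUIRED_KEYS = ("SERVER", "DATABASE", "UID", "PWD")

def build_connection_string(env: dict) -> str:
    """Build the ODBC connection string, failing early if a credential is missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing values in .env: {', '.join(missing)}")
    return (
        "DRIVER={SQL Server};"
        f"SERVER={env['SERVER']};DATABASE={env['DATABASE']};"
        f"UID={env['UID']};PWD={env['PWD']};"
        "MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"
    )

# Placeholder values for illustration only
example_env = {"SERVER": "example-server", "DATABASE": "example_db", "UID": "user", "PWD": "secret"}
print(build_connection_string(example_env))
```

In the notebook, the dictionary returned by `dotenv_values('.env')` would be passed in place of `example_env`.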





In [7]:
# The SQL query below fetches the 2020 funding table.
# Note that you will not have permission to insert, delete or update this database table.

query = "SELECT * FROM LP1_startup_funding2020"

data20 = pd.read_sql(query, connection)
data20.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,


In [8]:
data20.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
dtypes: float64(2), object(8)
memory usage: 82.5+ KB


In [9]:
data20.shape

(1055, 10)

In [10]:
# creating a column to identify each dataset by addition of data year

data20['Funding_Year'] = 2020

#Change the funding year to integer type

data20['Funding_Year'] = data20['Funding_Year'].astype(int)

data20.info()

data20.head()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 11 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
 10  Funding_Year   1055 non-null   int32  
dtypes: float64(2), int32(1), object(8)
memory usage: 86.7+ KB


Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10,Funding_Year
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,,2020
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,,2020
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,,2020
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,,2020
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,,2020


In [11]:
data20.shape

(1055, 11)

In [12]:
#printing columns to compare if the column names are matching
print(data20.columns)

Index(['Company_Brand', 'Founded', 'HeadQuarter', 'Sector', 'What_it_does',
       'Founders', 'Investor', 'Amount', 'Stage', 'column10', 'Funding_Year'],
      dtype='object')


In [13]:
# Renaming some columns

data20.rename(columns = {'Company_Brand' :'Company_Name'}, inplace =True)

data20.rename(columns = {'HeadQuarter': 'Location'}, inplace =True)

data20.head()

Unnamed: 0,Company_Name,Founded,Location,Sector,What_it_does,Founders,Investor,Amount,Stage,column10,Funding_Year
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,,2020
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,,2020
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,,2020
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,,2020
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,,2020


In [14]:
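# select specific columns; .copy() makes the result independent of the original frame
# so that later column assignments do not trigger SettingWithCopyWarning
data20 = data20[['Company_Name', 'Founded','Location','Sector','Investor','Amount','Stage','Funding_Year']].copy()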
                
data20.head() 

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
0,Aqgromalin,2019.0,Chennai,AgriTech,Angel investors,200000.0,,2020
1,Krayonnz,2019.0,Bangalore,EdTech,GSF Accelerator,100000.0,Pre-seed,2020
2,PadCare Labs,2018.0,Pune,Hygiene management,Venture Center,,Pre-seed,2020
3,NCOME,2020.0,New Delhi,Escrow,"Venture Catalysts, PointOne Capital",400000.0,,2020
4,Gramophone,2016.0,Indore,AgriTech,"Siana Capital Management, Info Edge",340000.0,,2020


In [15]:
# Convert the Founded column to a nullable integer type (missing years stay as <NA>)
data20['Founded'] = pd.to_numeric(data20['Founded'], errors='coerce').astype('Int64')

In [16]:
# Before the Amount column can be converted to numeric, symbols such as commas and currency signs must be removed

data20['Amount'] = data20['Amount'].apply(lambda x:str(x).replace('$', ''))

data20['Amount'] = data20['Amount'].apply(lambda x:str(x).replace(',', ''))

data20['Amount'] = data20['Amount'].replace('—', np.nan)
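
The three steps above can be folded into one vectorized helper. This is a minimal sketch, assuming the same cleaning rules as the cell above; the function name `clean_amount` and the sample values are illustrative, and anything that still fails to parse (such as `Undisclosed`) is coerced to NaN rather than handled separately.

```python
import pandas as pd

def clean_amount(amount: pd.Series) -> pd.Series:
    """Strip '$' and ',' from each value, then convert to float (unparseable values become NaN)."""
    cleaned = (
        amount.astype(str)                      # floats, None and NaN all become strings
        .str.replace("$", "", regex=False)
        .str.replace(",", "", regex=False)
        .str.strip()
    )
    return pd.to_numeric(cleaned, errors="coerce")

# Illustrative values covering the cases seen in the 2020 and 2021 tables
sample = pd.Series(["$1,200,000", "Undisclosed", "—", 200000.0, None])
cleaned_sample = clean_amount(sample)
print(cleaned_sample)
```

Coercing everything in one pass is convenient, but it also hides rows where the text in Amount really belongs to another column, so it is worth inspecting non-numeric values before relying on it.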




In [17]:
#Find the number of rows with undisclosed amounts 

index1 = data20.index[data20['Amount']=='Undisclosed']

print('The total number of undisclosed records is', len(index1))

The total number of undisclosed records is 0


In [18]:
# convert undisclosed to NAN
data20['Amount'] = data20['Amount'].replace('Undisclosed', np.nan)

In [19]:
#print a summary information on the 2020 data 
data20.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Company_Name  1055 non-null   object
 1   Founded       842 non-null    Int64 
 2   Location      961 non-null    object
 3   Sector        1042 non-null   object
 4   Investor      1017 non-null   object
 5   Amount        1055 non-null   object
 6   Stage         591 non-null    object
 7   Funding_Year  1055 non-null   int32 
dtypes: Int64(1), int32(1), object(6)
memory usage: 63.0+ KB


In [20]:
#Find the row with 887000 23000000 in the amount section
index1 = data20.index[data20['Amount']=='887000 23000000']
index1

Index([], dtype='int64')

In [21]:
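# The lookup above returns no matching rows, so the record is addressed directly
# by its index label (465) and set to the average of the two figures
avg = str((887000+23000000)/2)
data20.at[465, 'Amount'] = avg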


In [22]:
#print the row record to confirm
print(data20.iloc[(465)])

Company_Name    True Balance
Founded                 2014
Location            Gurugram
Sector               Finance
Investor         Balancehero
Amount            11943500.0
Stage               Series C
Funding_Year            2020
Name: 465, dtype: object


In [23]:

#Find the row with 800000000 to 850000000 in the amount section
index2 = data20.index[data20['Amount']=='800000000 to 850000000']

In [24]:
#replace the values with the average 
avg = str((800000000+850000000)/2)

data20.at[472, 'Amount'] = avg 

In [25]:
#print the row record to confirm 
print(data20.iloc[(472)])

Company_Name                                             Eruditus
Founded                                                      2010
Location                                                   Mumbai
Sector                                                  Education
Investor        Bertelsmann India Investments, Sequoia Capital...
Amount                                                825000000.0
Stage                                                        None
Funding_Year                                                 2020
Name: 472, dtype: object
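
Averaging a range by hand works for one or two rows. A more general approach is sketched below, assuming a range is any string containing two or more numbers. The helper `average_range` is hypothetical and uses only the standard library.

```python
import math
import re

def average_range(value) -> float:
    """Return the mean of all numbers found in a string such as '800000000 to 850000000'."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", str(value))]
    return sum(numbers) / len(numbers) if numbers else math.nan

print(average_range("887000 23000000"))
print(average_range("800000000 to 850000000"))
```

Applied with `Series.apply`, this would also leave single numeric strings unchanged in value, since the mean of one number is the number itself.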


In [26]:
#Convert the Amount column to numeric 

data20['Amount'] = pd.to_numeric(data20['Amount'], errors='coerce')

In [27]:
#print a summary information on the 2020 data 
data20.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Company_Name  1055 non-null   object 
 1   Founded       842 non-null    Int64  
 2   Location      961 non-null    object 
 3   Sector        1042 non-null   object 
 4   Investor      1017 non-null   object 
 5   Amount        803 non-null    float64
 6   Stage         591 non-null    object 
 7   Funding_Year  1055 non-null   int32  
dtypes: Int64(1), float64(1), int32(1), object(5)
memory usage: 63.0+ KB


In [28]:
duplicates = data20[data20.duplicated()]

duplicates

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
145,Krimanshi,2015,Jodhpur,Biotechnology company,"Rajasthan Venture Capital Fund, AIM Smart City",600000.0,Seed,2020
205,Nykaa,2012,Mumbai,Cosmetics,"Alia Bhatt, Katrina Kaif",,,2020
362,Byju’s,2011,Bangalore,EdTech,"Owl Ventures, Tiger Global Management",500000000.0,,2020


In [29]:
#drop all duplicates and leave only one record 

data20 = data20.drop_duplicates(keep='first')
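
`duplicated()` with its default `keep='first'` lists only the later copies, which is why each company appears once in the output above. To review every member of a duplicate group before dropping, `keep=False` can be used. The toy frame below is illustrative and reuses three company names from the output above.

```python
import pandas as pd

toy = pd.DataFrame({
    "Company_Name": ["Nykaa", "Krimanshi", "Nykaa", "Byju's"],
    "Amount": [None, 600000.0, None, 500000000.0],
})

# keep=False marks every member of a duplicate group, not only the later copies
all_copies = toy[toy.duplicated(keep=False)]

# keep='first' retains one record per group, as in the cell above
deduplicated = toy.drop_duplicates(keep="first")

print(all_copies)
print(len(toy), "->", len(deduplicated))
```

Note that pandas treats missing values as equal when comparing rows, so two records that differ only in undisclosed fields are still flagged as duplicates.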

In [30]:
#Check the 2020 dataset information to confirm the datatypes 
data20.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1052 entries, 0 to 1054
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Company_Name  1052 non-null   object 
 1   Founded       839 non-null    Int64  
 2   Location      958 non-null    object 
 3   Sector        1039 non-null   object 
 4   Investor      1014 non-null   object 
 5   Amount        801 non-null    float64
 6   Stage         590 non-null    object 
 7   Funding_Year  1052 non-null   int32  
dtypes: Int64(1), float64(1), int32(1), object(5)
memory usage: 70.9+ KB


In [31]:
#Check the first few rows 
data20.head()

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
0,Aqgromalin,2019,Chennai,AgriTech,Angel investors,200000.0,,2020
1,Krayonnz,2019,Bangalore,EdTech,GSF Accelerator,100000.0,Pre-seed,2020
2,PadCare Labs,2018,Pune,Hygiene management,Venture Center,,Pre-seed,2020
3,NCOME,2020,New Delhi,Escrow,"Venture Catalysts, PointOne Capital",400000.0,,2020
4,Gramophone,2016,Indore,AgriTech,"Siana Capital Management, Info Edge",340000.0,,2020


In [32]:
# select data from 2021

query = "SELECT * FROM LP1_startup_funding2021"

data21 = pd.read_sql(query, connection)
data21.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


In [33]:
data21.shape

(1209, 9)

In [34]:
data21.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What_it_does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount         1206 non-null   object 
 8   Stage          781 non-null    object 
dtypes: float64(1), object(8)
memory usage: 85.1+ KB


In [35]:
# creating a column to identify each dataset by addition of data year

data21['Funding_Year'] = 2021

# change the Funding_Year to integer type

data21['Funding_Year'] = data21['Funding_Year'].astype(int)

data21.info()

data21.head()






<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What_it_does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount         1206 non-null   object 
 8   Stage          781 non-null    object 
 9   Funding_Year   1209 non-null   int32  
dtypes: float64(1), int32(1), object(8)
memory usage: 89.9+ KB


Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,Funding_Year
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A,2021
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",,2021
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D,2021
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C,2021
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed,2021


In [36]:
#printing columns to compare if the column names are matching

print(data21.columns)






Index(['Company_Brand', 'Founded', 'HeadQuarter', 'Sector', 'What_it_does',
       'Founders', 'Investor', 'Amount', 'Stage', 'Funding_Year'],
      dtype='object')


In [37]:
# Renaming some columns

data21.rename(columns = {'Company_Brand' :'Company_Name'}, inplace =True)

data21.rename(columns = {'HeadQuarter': 'Location'}, inplace =True)

data21.head()

Unnamed: 0,Company_Name,Founded,Location,Sector,What_it_does,Founders,Investor,Amount,Stage,Funding_Year
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A,2021
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",,2021
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D,2021
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C,2021
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed,2021


In [38]:
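# select specific columns; .copy() makes the result independent of the original frame
# so that later column assignments do not trigger SettingWithCopyWarning
data21 = data21[['Company_Name', 'Founded','Location','Sector','Investor','Amount','Stage','Funding_Year']].copy()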
                
data21.head() 

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
0,Unbox Robotics,2019.0,Bangalore,AI startup,"BEENEXT, Entrepreneur First","$1,200,000",Pre-series A,2021
1,upGrad,2015.0,Mumbai,EdTech,"Unilazer Ventures, IIFL Asset Management","$120,000,000",,2021
2,Lead School,2012.0,Mumbai,EdTech,"GSV Ventures, Westbridge Capital","$30,000,000",Series D,2021
3,Bizongo,2015.0,Mumbai,B2B E-commerce,"CDC Group, IDG Capital","$51,000,000",Series C,2021
4,FypMoney,2021.0,Gurugram,FinTech,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed,2021
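
The 2020 and 2021 tables go through the same sequence: tag the year, rename two columns, keep eight columns. A sketch of wrapping that sequence in one function is shown below. `standardize`, `COLUMN_MAP`, and `KEEP` are hypothetical names, and the sample row is copied from the 2021 output above.

```python
import pandas as pd

COLUMN_MAP = {"Company_Brand": "Company_Name", "HeadQuarter": "Location"}
KEEP = ["Company_Name", "Founded", "Location", "Sector",
        "Investor", "Amount", "Stage", "Funding_Year"]

def standardize(raw: pd.DataFrame, year: int) -> pd.DataFrame:
    """Rename columns to the shared names, tag the funding year, and keep the analysis columns."""
    out = raw.rename(columns=COLUMN_MAP).assign(Funding_Year=year)
    return out[KEEP].copy()

# One row in the shape returned by the 2021 query
raw21 = pd.DataFrame({
    "Company_Brand": ["Unbox Robotics"],
    "Founded": [2019.0],
    "HeadQuarter": ["Bangalore"],
    "Sector": ["AI startup"],
    "What_it_does": ["Unbox Robotics builds on-demand AI-driven ware..."],
    "Founders": ["Pramod Ghadge, Shahid Memon"],
    "Investor": ["BEENEXT, Entrepreneur First"],
    "Amount": ["$1,200,000"],
    "Stage": ["Pre-series A"],
})

tidy21 = standardize(raw21, 2021)
print(tidy21.columns.tolist())
```

Using one function for every year keeps the column names identical, which matters once the yearly frames are combined.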


In [61]:
# total undisclosed in the dataset
index5 = data21.index[data21['Amount']=='Undisclosed']

print(len(index5))

43


In [62]:
#print the row records 
data21.loc[(index5)].tail()

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
824,Avalon Labs,2017.0,Bangalore,FinTech,"Tanglin Ventures, Better Capital, Whiteboard C...",Undisclosed,Pre-series A,2021
827,Rezo.ai,2017.0,Noida,AI startup,"Devesh Sachdev, Bhavesh Manglani",Undisclosed,Seed,2021
833,Polygon,2017.0,Mumbai,Crypto,"Mark Cuban, MiH Ventures",Undisclosed,,2021
846,Ingenium,2018.0,New Delhi,EdTech,Lead Angels,Undisclosed,Seed,2021
853,Celcius,2020.0,Mumbai,Logistics,Eaglewings Ventures,Undisclosed,Seed,2021


In [63]:
# Replace the Undisclosed with NAN

data21['Amount'] = data21['Amount'].replace('Undisclosed', np.nan)

In [65]:
#print the last 5 row records 
data21.loc[(index5)].tail()

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
824,Avalon Labs,2017.0,Bangalore,FinTech,"Tanglin Ventures, Better Capital, Whiteboard C...",,Pre-series A,2021
827,Rezo.ai,2017.0,Noida,AI startup,"Devesh Sachdev, Bhavesh Manglani",,Seed,2021
833,Polygon,2017.0,Mumbai,Crypto,"Mark Cuban, MiH Ventures",,,2021
846,Ingenium,2018.0,New Delhi,EdTech,Lead Angels,,Seed,2021
853,Celcius,2020.0,Mumbai,Logistics,Eaglewings Ventures,,Seed,2021


In [66]:
# number of rows with 'Upsparks' in the Amount column
index6 = data21.index[data21['Amount']=='Upsparks']

print(len(index6), index6)

2 Index([98, 111], dtype='int64')

In [69]:
# display them
data21.loc[index6]

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
98,FanPlay,2020.0,Computer Games,Computer Games,"Pritesh Kumar, Bharat Gupta",Upsparks,$1200000,2021
111,FanPlay,2020.0,Computer Games,Computer Games,"Pritesh Kumar, Bharat Gupta",Upsparks,$1200000,2021


In [70]:
#drop the duplicate

data21 = data21.drop(labels=index6[1], axis=0)

In [71]:
#Rearrange the record data correctly 

data21.loc[index6[0], ['Amount', 'Stage']] = ['$1200000', '']


In [72]:
# display the changes (label-based lookup, so it stays correct after rows are dropped)
data21.loc[98]

Company_Name                        FanPlay
Founded                              2020.0
Location                     Computer Games
Sector                       Computer Games
Investor        Pritesh Kumar, Bharat Gupta
Amount                             $1200000
Stage                                      
Funding_Year                           2021
Name: 98, dtype: object

In [74]:
# Find rows with 'Series C' in the Amount column
index7 = data21.index[data21['Amount']=='Series C']

print(len(index7), index7)

2 Index([242, 256], dtype='int64')

In [75]:
# show the entry
data21.loc[index7]

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
242,Fullife Healthcare,2009.0,Pharmaceuticals\t#REF!,Primary Business is Development and Manufactur...,$22000000,Series C,,2021
256,Fullife Healthcare,2009.0,Pharmaceuticals\t#REF!,Primary Business is Development and Manufactur...,$22000000,Series C,,2021


In [76]:
# since it is a duplicate, drop one copy
data21 = data21.drop(labels=index7[1], axis=0)

In [78]:
# rearrange the column entries
data21.loc[index7[0], ['Sector', 'Location', 'Amount', 'Investor', 'Stage']] = ['Pharmaceuticals', '', '$22000000', '', 'Series C']

data21.loc[242]

Company_Name    Fullife Healthcare
Founded                     2009.0
Location                          
Sector             Pharmaceuticals
Investor                          
Amount                   $22000000
Stage                     Series C
Funding_Year                  2021
Name: 242, dtype: object

In [80]:
index8 = data21.index[data21['Amount']=='Seed']

print(index8)

Index([257, 1148], dtype='int64')


In [81]:
data21.loc[index8]

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
257,MoEVing,2021.0,Gurugram\t#REF!,MoEVing is India's only Electric Mobility focu...,$5000000,Seed,,2021
1148,Godamwale,2016.0,Mumbai,Logistics & Supply Chain,1000000\t#REF!,Seed,,2021


In [84]:
data21.loc[index8[0], ['Sector', 'Location', 'Amount', 'Investor', 'Stage']] = ['Electric Mobility', 'Gurugram', '$5000000', '', 'Seed']
data21.loc[index8[1], ['Amount', 'Investor', 'Stage']] = ['1000000', '', 'Seed']

In [85]:
data21.loc[257]

Company_Name              MoEVing
Founded                    2021.0
Location                 Gurugram
Sector          Electric Mobility
Investor                         
Amount                   $5000000
Stage                        Seed
Funding_Year                 2021
Name: 257, dtype: object

In [86]:
index9 = data21.index[data21['Amount']=='ah! Ventures']

print(index9)

Index([538], dtype='int64')


In [87]:
data21.loc[index9]

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
538,Little Leap,2020.0,New Delhi,EdTech,Vishal Gupta,ah! Ventures,$300000,2021


In [88]:
data21.loc[index9, ['Amount', 'Stage']] = ['$300000', '']

In [89]:
data21.loc[538]

Company_Name     Little Leap
Founded               2020.0
Location           New Delhi
Sector                EdTech
Investor        Vishal Gupta
Amount               $300000
Stage                       
Funding_Year            2021
Name: 538, dtype: object

In [91]:
# Pre-series A
index10 = data21.index[data21['Amount']=='Pre-series A']

index10

Index([545], dtype='int64')

In [92]:
data21.loc[index10]

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
545,AdmitKard,2016.0,Noida,EdTech,$1000000,Pre-series A,,2021
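
The AdmitKard record shows the same one-column shift as the rows above: the amount sits under Investor and the stage sits under Amount. No cell in this part of the notebook rearranges it, so the later numeric conversion would turn its Amount into NaN. A sketch of a fix, following the notebook's convention of blanking the vacated field with an empty string, is given below. `shift_right` is a hypothetical helper and the one-row frame is a stand-in for `data21`.

```python
import pandas as pd

# Single-row stand-in for the record displayed above
toy = pd.DataFrame(
    {
        "Company_Name": ["AdmitKard"],
        "Investor": ["$1000000"],
        "Amount": ["Pre-series A"],
        "Stage": [None],
    },
    index=[545],
)

def shift_right(df: pd.DataFrame, label) -> pd.DataFrame:
    """Move Investor -> Amount and Amount -> Stage for one row, leaving Investor blank."""
    fixed = df.copy()
    investor = fixed.at[label, "Investor"]
    amount = fixed.at[label, "Amount"]
    fixed.loc[label, ["Investor", "Amount", "Stage"]] = ["", investor, amount]
    return fixed

fixed = shift_right(toy, 545)
print(fixed.loc[545])
```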


In [93]:
# Rows with 'ITO Angel Network, LetsVenture' in the Amount column
index11 = data21.index[data21['Amount']=='ITO Angel Network, LetsVenture']

index11

Index([551], dtype='int64')

In [94]:
data21.loc[index11]

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
551,BHyve,2020.0,Mumbai,Human Resources,"Omkar Pandharkame, Ketaki Ogale","ITO Angel Network, LetsVenture",$300000,2021


In [96]:

# rearranging 
data21.at[551, 'Amount'] = '$300000'
data21.at[551, 'Investor'] = 'Omkar Pandharkame, Ketaki Ogale, JITO Angel Network, LetsVenture'
data21.at[551, 'Stage'] = ''



In [97]:
data21.loc[index11]

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
551,BHyve,2020.0,Mumbai,Human Resources,"Omkar Pandharkame, Ketaki Ogale, JITO Angel Ne...",$300000,,2021


In [98]:
# JITO Angel Network, LetsVenture
index12 = data21.index[data21['Amount']=='JITO Angel Network, LetsVenture']

index12

Index([677], dtype='int64')

In [99]:
data21.loc[index12]

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
677,Saarthi Pedagogy,2015.0,Ahmadabad,EdTech,Sushil Agarwal,"JITO Angel Network, LetsVenture",$1000000,2021


In [100]:
# rearranging 
data21.at[677, 'Amount'] = '$1000000'
data21.at[677, 'Investor'] = 'Sushil Agarwal, JITO Angel Network, LetsVenture'
data21.at[677, 'Stage'] = ''

In [102]:
data21.loc[index12]

Unnamed: 0,Company_Name,Founded,Location,Sector,Investor,Amount,Stage,Funding_Year
677,Saarthi Pedagogy,2015.0,Ahmadabad,EdTech,"Sushil Agarwal, JITO Angel Network, LetsVenture",$1000000,,2021
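
The searches above each look for one known literal ('Upsparks', 'Series C', 'Seed', and so on). A sketch of listing every suspicious row in one pass is shown below: it flags values that are present, are not the `Undisclosed` marker, and still fail to parse as a number once `$` and `,` are stripped. The helper `non_numeric_amounts` and the toy frame are hypothetical.

```python
import pandas as pd

def non_numeric_amounts(df: pd.DataFrame, ignore=("Undisclosed",)) -> pd.DataFrame:
    """Return rows whose Amount is present but cannot be parsed as a number."""
    raw = df["Amount"]
    stripped = (
        raw.astype(str)
        .str.replace("$", "", regex=False)
        .str.replace(",", "", regex=False)
    )
    parsed = pd.to_numeric(stripped, errors="coerce")
    suspicious = parsed.isna() & raw.notna() & ~raw.isin(ignore)
    return df[suspicious]

toy = pd.DataFrame({
    "Company_Name": ["Unbox Robotics", "FanPlay", "Avalon Labs", "Little Leap", "PadCare Labs"],
    "Amount": ["$1,200,000", "Upsparks", "Undisclosed", "ah! Ventures", None],
})

flagged = non_numeric_amounts(toy)
print(flagged)
```

The flagged rows still need a manual decision about where each value belongs; the helper only replaces the one-literal-at-a-time search.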


In [103]:
index13 = data21.index[data21['Amount']=='nan']

data21['Amount'] = data21['Amount'].replace('nan', np.nan)

In [105]:
# strip '$' and ',' from Amount, and replace the '—' placeholder with NaN
data21['Amount'] = data21['Amount'].apply(lambda x:str(x).replace('$', ''))

data21['Amount'] = data21['Amount'].apply(lambda x:str(x).replace(',', ''))

data21['Amount'] = data21['Amount'].replace('—', np.nan)

In [106]:
data21.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1207 entries, 0 to 1208
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Company_Name  1207 non-null   object 
 1   Founded       1206 non-null   float64
 2   Location      1206 non-null   object 
 3   Sector        1207 non-null   object 
 4   Investor      1145 non-null   object 
 5   Amount        1207 non-null   object 
 6   Stage         783 non-null    object 
 7   Funding_Year  1207 non-null   int32  
dtypes: float64(1), int32(1), object(6)
memory usage: 112.4+ KB


In [107]:
# convert amount column to numeric
data21['Amount']  = pd.to_numeric(data21['Amount'], errors='coerce')
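
The stripping and conversion steps above could also be gathered into one reusable helper, so the same cleaning can be applied to each year's data. This is a sketch under the assumption that every amount is already in US dollars; clean_amount is a hypothetical name and the sample values are illustrative.

```python
import numpy as np
import pandas as pd

def clean_amount(series):
    """Strip '$' and ',' from amount strings and convert to float.

    Values that cannot be parsed (placeholders, free text, missing) become NaN.
    """
    cleaned = (
        series.astype(str)
        .str.replace("$", "", regex=False)   # literal dollar sign, not a regex anchor
        .str.replace(",", "", regex=False)   # thousands separators
        .str.strip()
    )
    return pd.to_numeric(cleaned, errors="coerce")

# Illustrative values only
toy = pd.Series(["$1,000,000", "300000", "Undisclosed", np.nan, "—"])
result = clean_amount(toy)
print(result)
```

Because errors='coerce' silently turns anything unparseable into NaN, it is worth counting the missing values before and after the call to see how many rows were affected.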

In [108]:
# Considering Location Column
data21.loc[98]


Company_Name                        FanPlay
Founded                              2020.0
Location                     Computer Games
Sector                       Computer Games
Investor        Pritesh Kumar, Bharat Gupta
Amount                            1200000.0
Stage                                      
Funding_Year                           2021
Name: 98, dtype: object

In [109]:
data21.loc[752]

Company_Name    NewLink Group
Founded                2016.0
Location              Beijing
Sector           Tech Startup
Investor         Bain Capital
Amount            200000000.0
Stage                    None
Funding_Year             2021
Name: 752, dtype: object

In [111]:
data21['Location'] = data21.Location.str.split(',').str[0]
data21.at[32, 'Location'] = 'Andhra Pradesh'
data21.at[98, 'Location'] = ''
data21.at[241, 'Location'] = ''
data21.at[255, 'Location'] = ''
data21.at[752, 'Location'] = ''
data21.at[1100, 'Location'] = ''
data21.at[1176, 'Location'] = ''
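
Row 98 shows one recognisable pattern: the Location cell repeats the Sector value. A minimal sketch of how rows with that pattern might be flagged is given below. It uses a toy frame with column names taken from the notebook. It would not catch other cases such as row 752, where the location is a real city outside India.

```python
import pandas as pd

# Toy frame for illustration; values mirror rows shown in this notebook
toy = pd.DataFrame({
    "Company_Name": ["FanPlay", "BHyve"],
    "Location": ["Computer Games", "Mumbai"],
    "Sector": ["Computer Games", "Human Resources"],
})

# Compare the two columns case-insensitively, ignoring surrounding whitespace
location = toy["Location"].str.strip().str.lower()
sector = toy["Sector"].str.strip().str.lower()
suspect_rows = toy[location == sector]
print(suspect_rows)
```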

In [112]:
# Considering Sector Attribute

data21['Sector'] = data21.Sector.str.split(',').str[0]
data21.at[1100, 'Sector'] = 'Audio experience'

Loading Data to Python VSO Environment:

2. CSV File from OneDrive (2019 Data):

In [39]:
# The 2019 data is stored on OneDrive in the file startup_funding2019.csv

data19 = pd.read_csv('startup_funding2019.csv')
data19.head()



Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",


In [40]:
data19.shape


(89, 9)

In [41]:
data19.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
dtypes: float64(1), object(8)
memory usage: 6.4+ KB


In [42]:
# Creating a column to identify each dataset by addition of data year
data19['Year'] = 2019

data19.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Year
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",,2019
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C,2019
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding,2019
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D,2019
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",,2019


In [43]:
# Rename the 'Amount($)' column to 'Amount' in the 2019 dataset

data19 = data19.rename(columns={'Amount($)':'Amount'})
data19

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount,Stage,Year
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",,2019
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C,2019
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding,2019
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D,2019
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",,2019
...,...,...,...,...,...,...,...,...,...,...
84,Infra.Market,,Mumbai,Infratech,It connects client requirements to their suppl...,"Aaditya Sharda, Souvik Sengupta","Tiger Global, Nexus Venture Partners, Accel Pa...","$20,000,000",Series A,2019
85,Oyo,2013.0,Gurugram,Hospitality,Provides rooms for comfortable stay,Ritesh Agarwal,"MyPreferred Transformation, Avendus Finance, S...","$693,000,000",,2019
86,GoMechanic,2016.0,Delhi,Automobile & Technology,Find automobile repair and maintenance service...,"Amit Bhasin, Kushal Karwa, Nitin Rana, Rishabh...",Sequoia Capital,"$5,000,000",Series B,2019
87,Spinny,2015.0,Delhi,Automobile,Online car retailer,"Niraj Singh, Ramanshu Mahaur, Ganesh Pawar, Mo...","Norwest Venture Partners, General Catalyst, Fu...","$50,000,000",,2019


In [44]:
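# Define a function to convert rupee amounts to dollars
# (0.0118 is the INR-to-USD rate assumed in this notebook)
def convert_to_dollars(Amount):
    # Leave missing or non-string values untouched
    if isinstance(Amount, str) and Amount.startswith('₹'):
        return float(Amount[1:].replace(',', '')) * 0.0118
    return Amount
 
# Apply the conversion function to the 'Amount' column
data19['Amount'] = data19['Amount'].apply(convert_to_dollars)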
data19

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount,Stage,Year
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",,2019
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C,2019
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding,2019
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D,2019
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",,2019
...,...,...,...,...,...,...,...,...,...,...
84,Infra.Market,,Mumbai,Infratech,It connects client requirements to their suppl...,"Aaditya Sharda, Souvik Sengupta","Tiger Global, Nexus Venture Partners, Accel Pa...","$20,000,000",Series A,2019
85,Oyo,2013.0,Gurugram,Hospitality,Provides rooms for comfortable stay,Ritesh Agarwal,"MyPreferred Transformation, Avendus Finance, S...","$693,000,000",,2019
86,GoMechanic,2016.0,Delhi,Automobile & Technology,Find automobile repair and maintenance service...,"Amit Bhasin, Kushal Karwa, Nitin Rana, Rishabh...",Sequoia Capital,"$5,000,000",Series B,2019
87,Spinny,2015.0,Delhi,Automobile,Online car retailer,"Niraj Singh, Ramanshu Mahaur, Ganesh Pawar, Mo...","Norwest Venture Partners, General Catalyst, Fu...","$50,000,000",,2019


In [45]:
# Remove the dollar sign (regex=False so '$' is treated as a literal character, not a regex anchor)
data19['Amount'] = data19['Amount'].str.replace('$', '', regex=False)
data19

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount,Stage,Year
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,6300000,,2019
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,150000000,Series C,2019
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey",28000000,Fresh funding,2019
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...",30000000,Series D,2019
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),6000000,,2019
...,...,...,...,...,...,...,...,...,...,...
84,Infra.Market,,Mumbai,Infratech,It connects client requirements to their suppl...,"Aaditya Sharda, Souvik Sengupta","Tiger Global, Nexus Venture Partners, Accel Pa...",20000000,Series A,2019
85,Oyo,2013.0,Gurugram,Hospitality,Provides rooms for comfortable stay,Ritesh Agarwal,"MyPreferred Transformation, Avendus Finance, S...",693000000,,2019
86,GoMechanic,2016.0,Delhi,Automobile & Technology,Find automobile repair and maintenance service...,"Amit Bhasin, Kushal Karwa, Nitin Rana, Rishabh...",Sequoia Capital,5000000,Series B,2019
87,Spinny,2015.0,Delhi,Automobile,Online car retailer,"Niraj Singh, Ramanshu Mahaur, Ganesh Pawar, Mo...","Norwest Venture Partners, General Catalyst, Fu...",50000000,,2019


In [46]:
# Remove thousands separators
data19['Amount'] = data19['Amount'].str.replace(',', '', regex=False)

data19.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount         89 non-null     object 
 8   Stage          43 non-null     object 
 9   Year           89 non-null     int64  
dtypes: float64(1), int64(1), object(8)
memory usage: 7.1+ KB


In [47]:
data19['Amount'].unique()

array(['6300000', '150000000', '28000000', '30000000', '6000000',
       'Undisclosed', '1000000', '20000000', '275000000', '22000000',
       '5000000', '140500', '540000000', '15000000', '182700', '12000000',
       '11000000', '15500000', '1500000', '5500000', '2500000', '140000',
       '230000000', '49400000', '32000000', '26000000', '150000',
       '400000', '2000000', '100000000', '8000000', '100000', '50000000',
       '120000000', '4000000', '6800000', '36000000', '5700000',
       '25000000', '600000', '70000000', '60000000', '220000', '2800000',
       '2100000', '7000000', '311000000', '4800000', '693000000',
       '33000000'], dtype=object)
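
The unique values above still include the text 'Undisclosed', so the column cannot be used in arithmetic yet. One possible next step, sketched here on a toy frame, is to convert to numeric while keeping a flag that records which rows had no disclosed figure. The column names Amount_USD and Amount_Disclosed are assumptions for illustration.

```python
import pandas as pd

# Toy frame; the three values appear in the unique() output above
toy19 = pd.DataFrame({"Amount": ["6300000", "Undisclosed", "140500"]})

# 'Undisclosed' cannot be parsed as a number and becomes NaN
toy19["Amount_USD"] = pd.to_numeric(toy19["Amount"], errors="coerce")

# Record which rows carried a disclosed figure before the text is lost
toy19["Amount_Disclosed"] = toy19["Amount_USD"].notna()
print(toy19)
```

Keeping the flag makes it possible to report later how many deals had no public amount, rather than treating them as ordinary missing data.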

Loading Data to Python VSO Environment:

3. CSV File from GitHub Repository (2018 Data):

In [48]:
# The third dataset (2018) is hosted in the GitHub repository, in a file called startup_funding2018.csv

data18 = pd.read_csv('startup_funding2018.csv')
data18.head()



Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...


In [49]:
data18.shape

(526, 6)

In [50]:
# Creating a column to identify each dataset by addition of data year

data18['Year'] = 2018

data18.head()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company,Year
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",2018
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,2018
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,2018


In [51]:
data18.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
 6   Year           526 non-null    int64 
dtypes: int64(1), object(6)
memory usage: 28.9+ KB


In [52]:
data18['Amount'].unique()

array(['250000', '₹40,000,000', '₹65,000,000', '2000000', '—', '1600000',
       '₹16,000,000', '₹50,000,000', '₹100,000,000', '150000', '1100000',
       '₹500,000', '6000000', '650000', '₹35,000,000', '₹64,000,000',
       '₹20,000,000', '1000000', '5000000', '4000000', '₹30,000,000',
       '2800000', '1700000', '1300000', '₹5,000,000', '₹12,500,000',
       '₹15,000,000', '500000', '₹104,000,000', '₹45,000,000', '13400000',
       '₹25,000,000', '₹26,400,000', '₹8,000,000', '₹60,000', '9000000',
       '100000', '20000', '120000', '₹34,000,000', '₹342,000,000',
       '$143,145', '₹600,000,000', '$742,000,000', '₹1,000,000,000',
       '₹2,000,000,000', '$3,980,000', '$10,000', '₹100,000',
       '₹250,000,000', '$1,000,000,000', '$7,000,000', '$35,000,000',
       '₹550,000,000', '$28,500,000', '$2,000,000', '₹240,000,000',
       '₹120,000,000', '$2,400,000', '$30,000,000', '₹2,500,000,000',
       '$23,000,000', '$150,000', '$11,000,000', '₹44,000,000',
       '$3,240,000', '₹60

In [53]:
data18[data18['Amount'].str.startswith('$')]

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company,Year
86,WHR,"Health Care, Information Technology",Seed,"$143,145","Pune, Maharashtra, India",WHR is to make affordable healthcare a reality...,2018
90,SBI Life,Insurance,Private Equity,"$742,000,000","Mumbai, Maharashtra, India",SBI Life is one of the life insurance company ...,2018
93,NoPaperForms Solutions Pvt. Ltd.,"EdTech, Education, Information Services, SaaS",Series B,"$3,980,000","New Delhi, Delhi, India","NoPaperForms is a marketing automation, lead n...",2018
95,AuthMetrik,"B2B, Biometrics, Cyber Security, Fraud Detecti...",Grant,"$10,000","Gurgaon, Haryana, India","SaaS, B2B, Security, Stop account sharing, Fra...",2018
101,Swiggy,"Food Delivery, Food Processing, Internet",Series H,"$1,000,000,000","Bangalore, Karnataka, India",Swiggy is a food ordering and delivery company...,2018
102,Milkbasket,"E-Commerce, Food and Beverage, Internet",Series A,"$7,000,000","Haryana, Haryana, India","Milkbasket delivers milk, bread, eggs, butter,...",2018
104,Toppr,"EdTech, Education, Knowledge Management",Series C,"$35,000,000","Mumbai, Maharashtra, India",Toppr.com is an online preparation platform fo...,2018
106,Vivriti Capital,Financial Services,Venture - Series Unknown,"$28,500,000","Chennai, Tamil Nadu, India",Vivriti Capital is an online platform for inst...,2018
108,Impact Guru,"Creative Agency, Crowdfunding, EdTech, Health ...",Series A,"$2,000,000","Mumbai, Maharashtra, India",We're a Harvard incubated crowdfunding platfor...,2018
114,OneAssist,"Financial Services, SaaS, Security",Debt Financing,"$2,400,000","Mumbai, Maharashtra, India",OneAssist is a protection & assistance service...,2018


In [54]:
data18['Amount'] = data18['Amount'].str.replace(',', '')
data18

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company,Year
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",2018
1,Happy Cow Dairy,"Agriculture, Farming",Seed,₹40000000,"Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,₹65000000,"Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,2018
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,2018
...,...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif...",2018
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.,2018
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...,2018
524,Droni Tech,Information Technology,Seed,₹35000000,"Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...,2018


GitHub Repository and Cloning to VSO Python Notebook:

In [55]:
# Convert rupee amounts to dollars (0.0117 is the INR-to-USD rate assumed for 2018)
def convert_to_dollars(Amount):
    # Leave missing or non-string values untouched; strip any remaining commas
    if isinstance(Amount, str) and Amount.startswith('₹'):
        return float(Amount[1:].replace(',', '')) * 0.0117
    return Amount
 
# Apply the conversion function to the 'Amount' column
data18['Amount'] = data18['Amount'].apply(convert_to_dollars)
data18





Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company,Year
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",2018
1,Happy Cow Dairy,"Agriculture, Farming",Seed,468000.0,"Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,760500.0,"Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,2018
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,2018
...,...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif...",2018
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.,2018
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...,2018
524,Droni Tech,Information Technology,Seed,409500.0,"Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...,2018
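
After this step the Amount column holds a mix of floats (converted rupee values) and strings (plain digits, '$'-prefixed values, and the dash placeholder). A single parser that returns a float for every case could look like the sketch below. It assumes that unprefixed numbers are already in US dollars and reuses the 0.0117 rate from the cell above. parse_amount_usd is a hypothetical helper, not the notebook's own function.

```python
import numpy as np

INR_TO_USD = 0.0117  # rate used in the cell above; an assumption, not an official figure

def parse_amount_usd(value, inr_rate=INR_TO_USD):
    """Return the amount in USD as a float, or NaN when no figure is given."""
    if value is None:
        return np.nan
    if not isinstance(value, str):
        return float(value)  # already numeric (NaN stays NaN)
    text = value.replace(",", "").strip()
    rate = 1.0
    if text.startswith("₹"):
        text, rate = text[1:], inr_rate
    elif text.startswith("$"):
        text = text[1:]
    try:
        return float(text) * rate
    except ValueError:
        return np.nan  # placeholders such as '—'

# Sample values taken from the unique() output shown earlier
samples = ["250000", "₹40,000,000", "$143,145", "—"]
parsed = [parse_amount_usd(s) for s in samples]
print(parsed)
```

Applied with data18['Amount'].apply(parse_amount_usd), this would give a float column directly, without a separate pd.to_numeric pass.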


In [56]:
data18.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
 6   Year           526 non-null    int64 
dtypes: int64(1), object(6)
memory usage: 28.9+ KB


In [57]:
# Rename columns: Company Name to company_brand, Industry to sector, Round/Series to stage,
# About Company to what_it_does, and Location to headquarter
data18.rename(columns={
    'Company Name': 'company_brand', 
    'Industry': 'sector', 
    'Round/Series': 'stage', 
    'About Company': 'what_it_does', 
    'Location': 'headquarter'
    },
    inplace=True
)

data18.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   company_brand  526 non-null    object
 1   sector         526 non-null    object
 2   stage          526 non-null    object
 3   Amount         526 non-null    object
 4   headquarter    526 non-null    object
 5   what_it_does   526 non-null    object
 6   Year           526 non-null    int64 
dtypes: int64(1), object(6)
memory usage: 28.9+ KB
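
Once each year uses the same column names, the yearly frames can be stacked into one dataset. The following is a sketch on toy frames, assuming a lowercase snake_case target schema. The mapping rename_19, the list common, and the choice of shared columns are illustrative and may differ from what the notebook ends up using.

```python
import pandas as pd

# Toy stand-ins for the cleaned yearly frames (column names as in this notebook)
toy18 = pd.DataFrame({
    "company_brand": ["TheCollegeFever"], "sector": ["Brand Marketing"],
    "stage": ["Seed"], "Amount": [250000.0],
    "headquarter": ["Bangalore"], "Year": [2018],
})
toy19 = pd.DataFrame({
    "Company/Brand": ["GoMechanic"], "Sector": ["Automobile & Technology"],
    "Stage": ["Series B"], "Amount": [5000000.0],
    "HeadQuarter": ["Delhi"], "Year": [2019],
})

# Hypothetical mapping from the 2019 names to the shared schema
rename_19 = {
    "Company/Brand": "company_brand", "Sector": "sector",
    "Stage": "stage", "HeadQuarter": "headquarter",
}
common = ["company_brand", "sector", "stage", "headquarter", "amount", "year"]

frames = []
for frame, mapping in [(toy18, {}), (toy19, rename_19)]:
    renamed = frame.rename(columns=mapping)
    renamed = renamed.rename(columns={"Amount": "amount", "Year": "year"})
    frames.append(renamed[common])  # keep only the shared columns, in a fixed order

combined = pd.concat(frames, ignore_index=True)
print(combined)
```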


In [58]:
# git clone https://github.com/SamuelAsong/indian-startup-funding-analysis.git
# cd indian-startup-funding-analysis
# pip install -r requirements.txt


CRISP-DM Process:

Business Understanding:

Define project objectives and requirements
Understand the start-up ecosystem and the importance of funding data

Data Understanding:

Gather and explore datasets for 2018, 2019, 2020, and 2021
Identify key features and initial insights

Data Preparation:

Clean and preprocess data
Handle missing values, duplicates, and inconsistent data
Merge datasets into a single comprehensive dataset

Data Analysis:

Perform exploratory data analysis (EDA)
Identify trends, patterns, and outliers
Visualize funding trends over the years

Modeling (if applicable):

Develop machine learning models to predict funding success (optional)
Evaluate model performance

Evaluation:

Assess the analysis results and model performance
Validate findings against business objectives

Deployment:

Present findings and recommendations
Prepare a final report and presentation

Conclusion and Findings:

Summarize key insights from the data analysis
Highlight significant trends and patterns in the Indian start-up funding landscape
Provide actionable recommendations based on data-driven insights
Discuss potential limitations and future work

This structured approach ensures a comprehensive analysis and effective communication of results, helping to make strategic, data-driven decisions in the Indian start-up ecosystem.
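
As an illustration of the Data Analysis step, funding totals per sector and year can be computed with a groupby once the data is merged. The sketch below uses four EdTech rows that appear in the outputs earlier in this notebook. It assumes the sector labels have already been normalised to a single spelling and the amounts are numeric USD values. The column names are illustrative.

```python
import pandas as pd

# Rows shown earlier in this notebook (NoPaperForms, Toppr, Ruangguru, Eduisfun)
toy = pd.DataFrame({
    "sector": ["EdTech", "EdTech", "EdTech", "EdTech"],
    "year": [2018, 2018, 2019, 2019],
    "amount": [3980000.0, 35000000.0, 150000000.0, 28000000.0],
})

# Total funding per sector and year, with one column per year
totals = toy.groupby(["sector", "year"])["amount"].sum().unstack("year")

# Year-over-year growth as a fraction (for this toy sample only)
growth = totals[2019] / totals[2018] - 1
print(totals)
print(growth)
```

Four rows are far too few to support any conclusion. The point is only the shape of the calculation that would be run on the full merged dataset.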








The CRISP-DM reference model 
1 Business understanding 

1.1 Determine business objectives 

1.2 Assess situation 

1.3 Determine data mining goals 

1.4 Produce project plan 

2 Data understanding 

2.1 Collect initial data 

2.2 Describe data 

2.3 Explore data 

2.4 Verify data quality 

3 Data preparation 

3.1 Select data 

3.2 Clean data 

3.3 Construct data 

3.4 Integrate data 

3.5 Format data 

4 Modeling 

4.1 Select modeling technique 

4.2 Generate test design 

4.3 Build model 

4.4 Assess model 

5 Evaluation 

5.1 Evaluate results 

5.2 Review process 

5.3 Determine next steps 

6 Deployment 

6.1 Plan deployment 

6.2 Plan monitoring and maintenance 

6.3 Produce final report 

6.4 Review project 