# **START-UP FUNDING ANALYSIS IN THE INDIAN ECOSYSTEM**

**Project Description**

The Indian start-up ecosystem is dynamic and constantly evolving. For any business entering unfamiliar territory, such as a new country or market, uncertainty about whether the venture will succeed is usually the first concern. By examining start-up funding data from 2018 to 2021, this project identifies key patterns, investment behaviors, and emerging sectors within the Indian start-up ecosystem to inform strategic decisions about entering this market.


**Task Outline**

- Analyze venture funding in India from 2018 to 2021.
- Carry out a comprehensive study of the datasets, examining funding distributions, sector-specific details, and geographic focal points in the Indian ecosystem.

**Hypothesis**

Null Hypothesis (H0) – The funding a company receives does not depend on the sector it operates in.

Alternative Hypothesis (H1) – The funding a company receives depends on the sector it operates in.
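
One possible way to check H0 is a permutation test on a one-way ANOVA F statistic computed over log-scaled amounts. The sketch below is only an illustration under stated assumptions: the helper names `f_statistic` and `permutation_p_value` are hypothetical, the toy values are invented, and the notebook itself does not commit to a particular test at this point.

```python
import numpy as np
import pandas as pd

def f_statistic(values, groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = values.mean()
    ss_between = sum(
        (groups == g).sum() * (values[groups == g].mean() - grand_mean) ** 2
        for g in labels
    )
    ss_within = sum(
        ((values[groups == g] - values[groups == g].mean()) ** 2).sum()
        for g in labels
    )
    df_between = len(labels) - 1
    df_within = len(values) - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)

def permutation_p_value(values, groups, n_permutations=2000, seed=0):
    """Share of label-shuffled F statistics at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = f_statistic(values, groups)
    hits = sum(
        f_statistic(rng.permutation(values), groups) >= observed
        for _ in range(n_permutations)
    )
    return (hits + 1) / (n_permutations + 1)

# Toy data invented for illustration only (not taken from the datasets)
toy = pd.DataFrame({
    "Sector": ["FinTech"] * 4 + ["EdTech"] * 4 + ["AgriTech"] * 4,
    "Amount": [5e6, 8e6, 1.2e7, 9e6, 1e6, 2e6, 1.5e6, 3e6, 2e5, 4e5, 3e5, 1e5],
})
log_amount = np.log10(toy["Amount"])  # log scale tames the heavy right tail
p_value = permutation_p_value(log_amount, toy["Sector"])
print(f"p-value: {p_value:.4f}")
```

A small p-value would count as evidence against H0. On the real data, the amounts and sectors would need cleaning before such a test is meaningful.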


**Questions**

1. Does the location of a start-up influence the sector it operates in?
2. Which industries have received the most funding in each year, and how has this distribution changed over time?
3. What is the distribution of funding amounts among start-ups each year?
4. What are the average funding amounts for different funding stages (e.g., Seed, Series A, Series B, etc.) each year?
5. How does funding vary across the geographical locations of start-ups?


# **1. DATA EXPLORATION, UNDERSTANDING AND ANALYSIS**



In [2]:
#Import the required libraries (they must already be installed in the environment)

import pyodbc 
from dotenv import dotenv_values    #import the dotenv_values function from the dotenv package
import pandas as pd
import warnings
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
 
warnings.filterwarnings('ignore')

In [3]:
# Load environment variables from the .env file into a dictionary
environment_variables = dotenv_values('.env')

# Get the values for the credentials you set in the '.env' file
database=environment_variables.get("DATABASE")
server=environment_variables.get("SERVER")
username=environment_variables.get("LOGIN")
password=environment_variables.get("PASSWORD")
connection_string=f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"
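
If any of the four keys is absent from `.env`, the f-string above silently embeds `None` in the connection string and the failure only surfaces when connecting. A small guard along the following lines could catch that earlier. This is a sketch only: `build_connection_string` is a hypothetical helper, and the example values are placeholders rather than real credentials.

```python
def build_connection_string(env):
    """Build the SQL Server connection string, failing early on missing keys."""
    required = ("SERVER", "DATABASE", "LOGIN", "PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"Missing settings in .env: {', '.join(missing)}")
    return (
        "DRIVER={SQL Server};"
        f"SERVER={env['SERVER']};DATABASE={env['DATABASE']};"
        f"UID={env['LOGIN']};PWD={env['PASSWORD']}"
    )

# Placeholder values for illustration only
example_env = {
    "SERVER": "example-server",
    "DATABASE": "example-db",
    "LOGIN": "user",
    "PASSWORD": "secret",
}
example_string = build_connection_string(example_env)
print(example_string)
```

In the notebook, the dictionary returned by `dotenv_values('.env')` would be passed in place of `example_env`.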

In [4]:
# Connect to the database using the connection string
connection=pyodbc.connect(connection_string)

In [5]:
# SQL query to list the base tables available in the database

query = ''' SELECT *
            FROM INFORMATION_SCHEMA.TABLES
            WHERE TABLE_TYPE = 'BASE TABLE' '''


In [6]:
indian=pd.read_sql(query, connection)
indian

Unnamed: 0,TABLE_CATALOG,TABLE_SCHEMA,TABLE_NAME,TABLE_TYPE
0,dapDB,dbo,LP1_startup_funding2021,BASE TABLE
1,dapDB,dbo,LP1_startup_funding2020,BASE TABLE


In [7]:
#Load data for 2020

query= "SELECT * FROM dbo.LP1_startup_funding2020"
data_2020 =pd.read_sql(query, connection)

data_2020.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,


In [8]:
data_2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
dtypes: float64(2), object(8)
memory usage: 82.5+ KB


In [9]:
data_2020.shape

(1055, 10)

In [10]:
#Load data for 2021

query= "SELECT * FROM dbo.LP1_startup_funding2021"
data_2021 =pd.read_sql(query, connection)

data_2021.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


In [11]:
data_2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What_it_does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount         1206 non-null   object 
 8   Stage          781 non-null    object 
dtypes: float64(1), object(8)
memory usage: 85.1+ KB


In [12]:
data_2021.shape

(1209, 9)

In [13]:
#Load CSV files
#Load 2018 data
data_2018 =pd.read_csv("C:\\Users\\User\\Desktop\\Jamaica\\Indian-Startup\\Data\\startup_funding2018.csv")
data_2018.head(10)

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...
5,Hasura,"Cloud Infrastructure, PaaS, SaaS",Seed,1600000,"Bengaluru, Karnataka, India",Hasura is a platform that allows developers to...
6,Tripshelf,"Internet, Leisure, Marketplace",Seed,"₹16,000,000","Kalkaji, Delhi, India",Tripshelf is an online market place for holida...
7,Hyperdata.IO,Market Research,Angel,"₹50,000,000","Hyderabad, Andhra Pradesh, India",Hyperdata combines advanced machine learning w...
8,Freightwalla,"Information Services, Information Technology",Seed,—,"Mumbai, Maharashtra, India",Freightwalla is an international forwarder tha...
9,Microchip Payments,Mobile Payments,Seed,—,"Bangalore, Karnataka, India",Microchip payments is a mobile-based payment a...


In [14]:
#Load 2019 data
data_2019 =pd.read_csv("C:\\Users\\User\\Desktop\\Jamaica\\Indian-Startup\\Data\\startup_funding2019.csv")
data_2019.head(10)

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",
5,FlytBase,,Pune,Technology,A drone automation platform,Nitin Gupta,Undisclosed,Undisclosed,
6,Finly,,Bangalore,SaaS,It builds software products that makes work si...,"Vivek AG, Veekshith C Rai","Social Capital, AngelList India, Gemba Capital...",Undisclosed,
7,Kratikal,2013.0,Noida,Technology,It is a product-based cybersecurity solutions ...,"Pavan Kushwaha, Paratosh Bansal, Dip Jung Thapa","Gilda VC, Art Venture, Rajeev Chitrabhanu.","$1,000,000",Pre series A
8,Quantiphi,,,AI & Tech,It is an AI and big data services company prov...,Renuka Ramnath,Multiples Alternate Asset Management,"$20,000,000",Series A
9,Lenskart,2010.0,Delhi,E-commerce,It is a eyewear company,"Peyush Bansal, Amit Chaudhary, Sumeet Kapahi",SoftBank,"$275,000,000",Series G
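
The four previews use different headers for the same information (for example `Company Name`, `Company/Brand` and `Company_Brand`). Before the years can be combined, the headers need a common scheme. The sketch below assumes the 2020/2021 names as the target; the mappings, the `harmonize` helper and the added `Year` column are illustrative choices, not something the source data prescribes.

```python
import pandas as pd

# Target names follow the 2020/2021 tables; the mapping itself is an assumption
RENAME_2018 = {
    "Company Name": "Company_Brand",
    "Industry": "Sector",
    "Round/Series": "Stage",
    "Location": "HeadQuarter",
    "About Company": "What_it_does",
}
RENAME_2019 = {
    "Company/Brand": "Company_Brand",
    "What it does": "What_it_does",
    "Amount($)": "Amount",
}

def harmonize(df, rename_map, year):
    """Rename columns to the shared scheme and tag each row with its year."""
    out = df.rename(columns=rename_map).copy()
    out["Year"] = year
    return out

# One-row stand-ins with the same headers as the real files
demo_2018 = pd.DataFrame({
    "Company Name": ["MyLoanCare"],
    "Industry": ["Credit, Financial Services, Lending, Marketplace"],
    "Round/Series": ["Series A"],
    "Amount": ["₹65,000,000"],
    "Location": ["Gurgaon, Haryana, India"],
    "About Company": ["Leading Online Loans Marketplace in India"],
})
demo_2019 = pd.DataFrame({
    "Company/Brand": ["Lenskart"],
    "Founded": [2010.0],
    "HeadQuarter": ["Delhi"],
    "Sector": ["E-commerce"],
    "What it does": ["It is a eyewear company"],
    "Founders": ["Peyush Bansal, Amit Chaudhary, Sumeet Kapahi"],
    "Investor": ["SoftBank"],
    "Amount($)": ["$275,000,000"],
    "Stage": ["Series G"],
})

combined = pd.concat(
    [harmonize(demo_2018, RENAME_2018, 2018), harmonize(demo_2019, RENAME_2019, 2019)],
    ignore_index=True,
)
print(combined.columns.tolist())
```

Columns that exist in only one year (such as `Founded` for 2018) are filled with `NaN` by `pd.concat`.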


# **2. EXPLORATORY DATA ANALYSIS (EDA)**
**EDA For 2018 Dataset**

In [16]:
#EDA with the 2018 dataset
#Preview the rows and columns for this dataset
 
data_2018.sample(5)

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
144,HalaPlay Technologies,"Digital Entertainment, Fantasy Sports, Sports",Series A,"$5,000,000","Bangalore, Karnataka, India",A daily fantasy sports platform.
77,HappyEMI,"Consumer, Financial Services, FinTech",Seed,1000000,"Bangalore, Karnataka, India",HappyEMI is a point of sale digital lending pl...
243,Steradian Semiconductors,—,Seed,—,"Bangalore, Karnataka, India",It is a fabless semiconductor company focused ...
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...
97,LetsTransport,"Logistics, Transportation, Travel",Series B,"₹1,000,000,000","Bangalore, Karnataka, India",Lets transport is a logistics solution provider.


In [17]:
#Now, we check for information on the datatypes
data_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
dtypes: object(6)
memory usage: 24.8+ KB


In [18]:
#Moving on to find the number of rows and columns
data_2018.shape

(526, 6)

In [27]:
#Information on the number of rows and columns
print(f"There are {data_2018.shape[0]} rows and {data_2018.shape[1]} columns")

There are 526 rows and 6 columns


In [19]:
#Finding missing values
data_2018.isnull().sum()

Company Name     0
Industry         0
Round/Series     0
Amount           0
Location         0
About Company    0
dtype: int64

In [21]:
#Counting fully duplicated rows
data_2018.duplicated().sum()

1

In [22]:
#Finally,describe the data for this set
data_2018.describe()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
count,526,526,526,526,526,526
unique,525,405,21,198,50,524
top,TheCollegeFever,—,Seed,—,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
freq,2,30,280,148,102,2
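
Although `isnull().sum()` reports no missing values for 2018, the summary shows `—` as the most frequent `Amount` (148 rows) and `Industry` (30 rows), so the gaps are encoded as placeholder text. A minimal sketch of how such placeholders could be counted as missing follows. The placeholder list and the helper name `count_hidden_missing` are assumptions; `Undisclosed` and `$Undisclosed` are included because they appear in the 2019 and 2021 amounts.

```python
import numpy as np
import pandas as pd

# Placeholder strings observed in the previews; extend the list as needed
PLACEHOLDERS = ["—", "Undisclosed", "$Undisclosed"]

def count_hidden_missing(df, placeholders=PLACEHOLDERS):
    """Treat placeholder strings as NaN, then count missing values per column."""
    cleaned = df.replace(placeholders, np.nan)
    return cleaned.isnull().sum()

# Small stand-in frame using values from the 2018 preview
sample = pd.DataFrame({
    "Company Name": ["Eunimart", "Hasura", "Freightwalla"],
    "Amount": ["—", "1600000", "—"],
})
hidden = count_hidden_missing(sample)
print(hidden)
```

Calling `count_hidden_missing(data_2018)` on the full frame would be the equivalent check in the notebook.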


**EDA For 2019 Dataset**

In [23]:
#EDA with the 2019 dataset
#Preview the rows and columns for this dataset
 
data_2019.sample(5)

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
64,Moms Co,,New Delhi,E-commerce,It is into mother and baby care-focused consum...,Malika Sadani,"Saama Capital, DSG Consumer Partners","$5,000,000",Series B
43,Slintel,2016.0,,SaaS,It helps sales and marketing teams understand ...,Deepak Anchala,Stellaris Ventures,"$1,500,000",
42,Bombay Shirt Company,2012.0,Mumbai,E-commerce,Online custom shirt brand,Akshay Narvekar,Lightbox Ventures,"$8,000,000",
87,Spinny,2015.0,Delhi,Automobile,Online car retailer,"Niraj Singh, Ramanshu Mahaur, Ganesh Pawar, Mo...","Norwest Venture Partners, General Catalyst, Fu...","$50,000,000",
41,VMate,,,,A short video platform,,Alibaba,"$100,000,000",


In [24]:
#Now, we check for information on the datatypes
data_2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
dtypes: float64(1), object(8)
memory usage: 6.4+ KB


In [25]:
#Moving on to find the number of rows and columns
data_2019.shape

(89, 9)

In [26]:
#Information on the number of rows and columns
print(f"There are {data_2019.shape[0]} rows and {data_2019.shape[1]} columns")

There are 89 rows and 9 columns


In [28]:
#Finding missing values
data_2019.isnull().sum()

Company/Brand     0
Founded          29
HeadQuarter      19
Sector            5
What it does      0
Founders          3
Investor          0
Amount($)         0
Stage            46
dtype: int64

In [29]:
#Counting fully duplicated rows
data_2019.duplicated().sum()

0

In [30]:
#Finally,describe the data for this set
data_2019.describe()

Unnamed: 0,Founded
count,60.0
mean,2014.533333
std,2.937003
min,2004.0
25%,2013.0
50%,2015.0
75%,2016.25
max,2019.0


**EDA For 2020 Dataset**

In [31]:
#EDA with the 2020 dataset
#Preview the rows and columns for this dataset
 
data_2020.sample(5)

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
237,GoodGamer,2020.0,Bangalore,Gaming,GoodGamer is India's first Daily Fantasy Sport...,Charles Creighton,,2500000.0,Seed,
637,Jai Kisan,2017.0,Mumbai,Fintech,A platform that caters the need of customers i...,"Arjun Ahluwalia, Adriel Maniego","Arkam Ventures, Nabventures",3937000.0,Pre series A,
1050,Leverage Edu,,Delhi,Edtech,AI enabled marketplace that provides career gu...,Akshay Chaturvedi,"DSG Consumer Partners, Blume Ventures",1500000.0,,
767,Pariksha,2015.0,Pune,Edtech,Platform for vernacular test-preparation,"Karanvir Singh, Utkarsh Bagri, Vikram Shekhawa...","INSEAD Angels, IIT Kanpur Angels, Sixth Sense ...",,Pre series A,
380,Infilect,2015.0,Bangalore,SaaS startup,Infilect Technologies specialises in visual co...,"Anand Prabhu Subramanian, Vijay Gabale","Mela Ventures, 1Crowd",1500000.0,Pre-series A,


In [32]:
#Now, we check for information on the datatypes
data_2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
dtypes: float64(2), object(8)
memory usage: 82.5+ KB


In [33]:
#Moving on to find the number of rows and columns
data_2020.shape

(1055, 10)

In [34]:
#Information on the number of rows and columns
print(f"There are {data_2020.shape[0]} rows and {data_2020.shape[1]} columns")

There are 1055 rows and 10 columns


In [35]:
#Finding missing values
data_2020.isnull().sum()

Company_Brand       0
Founded           213
HeadQuarter        94
Sector             13
What_it_does        0
Founders           12
Investor           38
Amount            254
Stage             464
column10         1053
dtype: int64

In [36]:
#Counting fully duplicated rows
data_2020.duplicated().sum()

3

In [37]:
#Finally,describe the data for this set
data_2020.describe()

Unnamed: 0,Founded,Amount
count,842.0,801.0
mean,2015.36342,113043000.0
std,4.097909,2476635000.0
min,1973.0,12700.0
25%,2014.0,1000000.0
50%,2016.0,3000000.0
75%,2018.0,11000000.0
max,2020.0,70000000000.0
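
The 2020 `Amount` summary is heavily right-skewed: the mean (about 113 million) is far above the median (3 million), and the maximum is 70 billion. One common way to flag such extremes is the 1.5 × IQR rule, sketched below together with a log transform for plotting. The helper name `iqr_upper_fence` is hypothetical, and the five toy values simply echo the min, quartiles and max reported above.

```python
import numpy as np
import pandas as pd

def iqr_upper_fence(amounts):
    """Upper outlier threshold: Q3 + 1.5 * (Q3 - Q1)."""
    q1, q3 = amounts.quantile([0.25, 0.75])
    return q3 + 1.5 * (q3 - q1)

# Toy series built from the summary statistics above, for illustration only
amounts = pd.Series(
    [12_700, 1_000_000, 3_000_000, 11_000_000, 70_000_000_000], dtype=float
)
fence = iqr_upper_fence(amounts)
outliers = amounts[amounts > fence]

# A log10 scale makes the bulk of the distribution visible in plots
log_amounts = np.log10(amounts)
print(fence)
print(outliers.tolist())
```

Whether flagged values are data-entry errors or genuine mega-rounds would still need a manual check before anything is dropped.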


**EDA For 2021 Dataset**

In [38]:
#EDA with the 2021 dataset
#Preview the rows and columns for this dataset
 
data_2021.sample(5)

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
1015,Revfin,2017.0,New Delhi,Financial Services,RevFin is a financial technology (FinTech) com...,Sameer Aggarwal,"Ruchirans Jaipuria, Rishi Kajaria",$4000000,Pre-series A
720,Zotalabs,2019.0,Pune,Tech Startup,Zotalabs gives the power of Emerging Technolog...,"Nausherwan Shah, Wasim Khan",Alfa Ventures,"$1,250,000",Seed
671,Pathfndr.io,2015.0,Bangalore,SaaS startup,Intelligent Travel Tech Stack making sense of ...,Varun Gupta,Arali Ventures,$Undisclosed,Pre-series A
61,WESS,1989.0,Mumbai,Renewable Energy,Waaree is India's Largest Solar Module Manufac...,Hitesh Doshi,Centrum Financial Services,"$2,000,000",Seed
79,CareerLabs,2019.0,Bangalore,EdTech,"Aim to help students become future-ready, sett...","PN Santosh, Prasanna Alagesan, Krithika Sriniv...",Global Founders Capital,"$2,200,000",


In [39]:
#Now, we check for information on the datatypes
data_2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What_it_does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount         1206 non-null   object 
 8   Stage          781 non-null    object 
dtypes: float64(1), object(8)
memory usage: 85.1+ KB


In [40]:
#Moving on to find the number of rows and columns
data_2021.shape

(1209, 9)

In [41]:
#Information on the number of rows and columns
print(f"There are {data_2021.shape[0]} rows and {data_2021.shape[1]} columns")

There are 1209 rows and 9 columns


In [42]:
#Finding missing values
data_2021.isnull().sum()

Company_Brand      0
Founded            1
HeadQuarter        1
Sector             0
What_it_does       0
Founders           4
Investor          62
Amount             3
Stage            428
dtype: int64

In [43]:
#Counting fully duplicated rows
data_2021.duplicated().sum()

19
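
The count alone does not show which rows are repeated. A short sketch of how the duplicates could be inspected and then removed is given below, using a small stand-in frame rather than the real table.

```python
import pandas as pd

# Stand-in frame with one repeated row, for illustration only
demo = pd.DataFrame({
    "Company_Brand": ["upGrad", "upGrad", "Bizongo"],
    "Amount": ["$120,000,000", "$120,000,000", "$51,000,000"],
})

# keep=False marks every copy, so repeated rows can be viewed side by side
duplicate_rows = demo[demo.duplicated(keep=False)]

# drop_duplicates keeps the first occurrence of each repeated row
deduplicated = demo.drop_duplicates().reset_index(drop=True)
print(duplicate_rows)
print(deduplicated)
```

The same two calls applied to `data_2021` would list the repeated rows and leave one copy of each.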

In [44]:
#Finally,describe the data for this set
data_2021.describe()

Unnamed: 0,Founded
count,1208.0
mean,2016.655629
std,4.517364
min,1963.0
25%,2015.0
50%,2018.0
75%,2020.0
max,2021.0


**Issues Identified And Possible Solutions**                           

1. *Different Currencies in Amounts*:
   - *Issue*: Funding amounts are recorded in different currencies (e.g., `₹` and `$` prefixes in the 2018 data), and some values carry no currency symbol at all.
   - *Solution*: Normalize the amounts to a single currency using historical exchange rates.

2. *Missing Data*:
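   - *Issue*: Several columns contain missing values (for example, `Stage` is null in 464 of the 1,055 rows for 2020), and placeholders such as `—` and `Undisclosed` hide further gaps that `isnull()` does not count.
   - *Solution*: Convert placeholders to missing values, then handle them through imputation, filling with default values, or dropping the affected entries.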

3. *Column Consistency*:
   - *Issue*: Inconsistencies in column names across the datasets for different years.
   - *Solution*: Standardize column names across datasets to ensure consistency.

4. *Irrelevant columns*:
   - *Issue*: Some columns carry little usable information, for example `column10` in the 2020 data, which has only 2 non-null values.
   - *Solution*: Drop the irrelevant columns.

5. *Inconsistent values in categorical features*:
   - *Issue*: Inconsistent values in categorical columns, e.g., stages and sectors (`Pre series A` vs. `Pre-series A`, `Fintech` vs. `FinTech`).
   - *Solution*: Standardize the values of the categorical features.
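
Two of the fixes listed above, normalizing amounts to one currency and standardizing stage labels, could be sketched as follows. Everything here is illustrative: `amount_to_usd` and `normalize_stage` are hypothetical helpers, `INR_PER_USD = 70.0` is a placeholder and not a sourced historical rate, and treating amounts without a symbol as USD is an assumption that should be verified against the data source.

```python
import re

import numpy as np
import pandas as pd

INR_PER_USD = 70.0  # placeholder only, NOT a sourced historical exchange rate

def amount_to_usd(raw, inr_per_usd=INR_PER_USD):
    """Parse an amount string; convert rupee values, return NaN if unparseable."""
    if pd.isna(raw):
        return np.nan
    text = str(raw).strip().replace(",", "")
    is_inr = text.startswith("₹")
    digits = text.lstrip("₹$")
    try:
        value = float(digits)
    except ValueError:
        # Covers placeholders such as "—" and "Undisclosed"
        return np.nan
    # Assumption: values without a currency symbol are already in USD
    return value / inr_per_usd if is_inr else value

def normalize_stage(stage):
    """Lower-case a stage label and unify hyphens and repeated spaces."""
    if pd.isna(stage):
        return np.nan
    return re.sub(r"\s+", " ", str(stage).replace("-", " ")).strip().lower()

# Values taken from the previews above, plus an explicit missing entry
raw_amounts = pd.Series(
    ["250000", "₹40,000,000", "$5,000,000", "—", "$Undisclosed", None]
)
usd = raw_amounts.apply(amount_to_usd)
stages = pd.Series(["Pre series A", "Pre-series A", "Seed"]).apply(normalize_stage)
print(usd.tolist())
print(stages.tolist())
```

A per-year rate table would be closer to the "historical rates" idea than a single constant; the constant is used here only to keep the sketch short.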

# **DATA CLEANING AND PREPARATION**