**Analysis of Funding Distribution of Indian Startups**

**BUSINESS UNDERSTANDING**

We are exploring the Indian startup ecosystem to understand funding trends and identify promising opportunities. The goal is to analyze funding data from 2018 to 2021, focusing on key factors such as the amount of funding received, sectors, stages of investment, and geographic locations.

**HYPOTHESIS TESTING**
Null Hypothesis (H0): There is no significant difference in the amount of funding received by startups across different sectors and stages.
 
Alternate Hypothesis (H1): There is a significant difference in the amount of funding received by startups across different sectors and stages.
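
One way to test H0 without assuming normally distributed amounts is a permutation test on the one-way ANOVA F statistic. The block below is a minimal sketch under stated assumptions, not the method used in this notebook: the function names `f_statistic` and `permutation_p_value` are hypothetical, and the three groups hold made-up log10 funding amounts rather than rows from the dataset.

```python
import random
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Share of label shuffles whose F statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical log10 funding amounts for three sectors (illustration only).
fintech = [6.9, 7.2, 7.5, 7.1, 7.4, 7.0]
edtech = [5.9, 6.1, 6.3, 6.0, 6.2, 5.8]
agritech = [5.2, 5.5, 5.1, 5.4, 5.3, 5.6]

p_value = permutation_p_value([fintech, edtech, agritech])
```

A small `p_value` would count as evidence against H0 for the grouping tested. Because funding amounts are heavily skewed, working on log-transformed amounts, as assumed here, is one reasonable choice.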

**DATA UNDERSTANDING**
COLUMNS
Company_Brand: Name of the startup.
Founded: Year the startup was founded.
HeadQuarter: City where the startup is headquartered.
Sector: Industry sector of the startup.
What_it_does: Brief description of the startup's business.
Founders: Names of the founders.
Investor: Investors or investment firms that funded the startup.
Amount: Amount of funding received (in dollars).
Stage: Stage of investment (e.g., Pre-seed, Seed, Series A).

**ANALYTICAL QUESTIONS**
1. Funding Trends:
How has the total funding amount changed year over year from 2018 to 2021?
How has the average funding amount in each sector changed over the years (2018 to 2021)?
 
2. Sector Analysis:
Which sectors have received the most funding, and how does the funding distribution vary across sectors?
 
3. Stage Analysis:
What is the distribution of funding across different investment stages (e.g., Pre-seed, Seed, Series A)?
 
4. Geographical Analysis:
Which cities or regions have the highest concentration of funded startups?
 
5. Investor Influence:
Who are the top investors in the Indian startup ecosystem, and what is their funding pattern?
 
6. Founder Impact:
Is there a correlation between the number of founders and the amount of funding received?
 
7. What are the characteristics of startups in the highest-funded sectors (e.g., number of founders, location)?
 
8. Which types of business are the most viable to set up, judging by the best-performing startups?
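
Two of the questions above (the year-over-year totals in question 1 and the founder count in question 6) reduce to short pandas operations. The sketch below uses a hypothetical toy table; the `Year` and `n_founders` columns are assumptions introduced for illustration and do not exist in the source tables.

```python
import pandas as pd

# Hypothetical rows; "Year" is an assumed column added when the yearly tables are combined.
toy = pd.DataFrame({
    "Year": [2020, 2020, 2021, 2021, 2021],
    "Founders": ["A, B", "C", "D, E, F", "G", None],
    "Amount": [200000.0, 100000.0, 1200000.0, 2000000.0, 500000.0],
})

# Question 1: total funding per year and its relative change from the previous year.
yearly_total = toy.groupby("Year")["Amount"].sum()
yoy_change = yearly_total.pct_change()

# Question 6: count founders by splitting the comma-separated names.
# Rows with no founders listed stay missing instead of being counted as zero.
toy["n_founders"] = toy["Founders"].str.split(",").str.len()

# Spearman correlation computed as the Pearson correlation of ranks,
# using only rows where both values are present.
pairs = toy[["n_founders", "Amount"]].dropna()
founder_corr = pairs["n_founders"].rank().corr(pairs["Amount"].rank())
```

A rank-based correlation is used here because a few very large funding rounds can dominate a plain Pearson correlation.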

In [1]:
%pip install pyodbc  
%pip install python-dotenv
%pip install seaborn

Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip is available: 23.2.1 -> 24.0
[notice] To update, run: python.exe -m pip install --upgrade pip




In [2]:
import pyodbc  # ODBC interface used to connect to SQL Server
from dotenv import dotenv_values  # reads key-value pairs from a .env file into a dictionary
import pandas as pd
import warnings

warnings.filterwarnings('ignore')

In [3]:
# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')

# Get the values for the credentials you set in the '.env' file
server = environment_variables.get("server")
database = environment_variables.get("database")
username = environment_variables.get("username")
password = environment_variables.get("password")
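
Because `.get` returns `None` for a key that is absent, a typo in the `.env` file only surfaces later as a failed connection. The sketch below shows one way to check the credentials up front. The helper `missing_credentials` is hypothetical, and a plain dictionary with placeholder values stands in for the result of `dotenv_values('.env')`.

```python
REQUIRED_KEYS = ("server", "database", "username", "password")


def missing_credentials(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in a config mapping."""
    return [key for key in required if not config.get(key)]


# Stand-in for dotenv_values('.env'); the values are placeholders, not real credentials.
example_config = {
    "server": "example-server",
    "database": "example-db",
    "username": "reader",
    "password": "",
}

missing = missing_credentials(example_config)
if missing:
    print(f"Missing or empty settings in .env: {', '.join(missing)}")
```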

In [4]:
# Create a connection string
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password};MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"
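
The doubled braces around the driver name are f-string escapes: `{{` and `}}` produce literal `{` and `}`. The sketch below wraps the same template in a hypothetical helper, `build_connection_string`, and fills it with placeholder values to show the resulting text.

```python
def build_connection_string(server, database, username, password):
    """Fill the connection-string template used above with the given values."""
    # Doubled braces in an f-string emit literal braces, so the driver appears as {SQL Server}.
    return (
        f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};"
        f"UID={username};PWD={password};"
        "MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"
    )


# Placeholder values for illustration only.
example = build_connection_string("example-server", "example-db", "reader", "secret")
```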


In [5]:
# Use the connect method of the pyodbc library and pass in the connection string.
# This connects to the server and might take a few seconds to complete.
# Check your internet connection if it takes longer than expected.

connection = pyodbc.connect(connection_string)

TABLE 1

In [6]:
# The SQL query below retrieves the data.
# Note that you will not have permission to insert, delete, or update rows in this database table.

query = '''SELECT * FROM dbo.LP1_startup_funding2020'''

data = pd.read_sql(query, connection)
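
This notebook reuses the name `data` for each table, so loading a second table replaces the first. One alternative is to keep each year's table in a dictionary. The sketch below is an illustration under assumptions: an in-memory SQLite database stands in for the SQL Server connection, the tables hold only two columns, and the `Year` column is a helper added here.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the SQL Server connection (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LP1_startup_funding2020 (Company_Brand TEXT, Amount REAL)")
conn.execute("CREATE TABLE LP1_startup_funding2021 (Company_Brand TEXT, Amount TEXT)")
conn.execute("INSERT INTO LP1_startup_funding2020 VALUES ('Aqgromalin', 200000.0)")
conn.execute("INSERT INTO LP1_startup_funding2021 VALUES ('Unbox Robotics', '$1,200,000')")

tables = {2020: "LP1_startup_funding2020", 2021: "LP1_startup_funding2021"}

frames = {}
for year, table in tables.items():
    frame = pd.read_sql(f"SELECT * FROM {table}", conn)
    frame["Year"] = year  # assumed helper column recording the source table's year
    frames[year] = frame

conn.close()
```

With this layout, `frames[2020]` and `frames[2021]` stay available side by side for cleaning and later concatenation.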

In [7]:
data.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,


Data Cleaning

In [8]:
# pandas is already imported as pd in an earlier cell
import matplotlib.pyplot as plt
import seaborn as sns

In [9]:
## types of data
data.dtypes

Company_Brand     object
Founded          float64
HeadQuarter       object
Sector            object
What_it_does      object
Founders          object
Investor          object
Amount           float64
Stage             object
column10          object
dtype: object

In [13]:
## finding missing values
# sum must be called with parentheses; without them pandas returns the bound method rather than the counts
data.isna().sum()

In [16]:
print(data.isna().sum())

Company_Brand       0
Founded           213
HeadQuarter        94
Sector             13
What_it_does        0
Founders           12
Investor           38
Amount            254
Stage             464
column10         1053
dtype: int64
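
The counts above show that `column10` is missing in 1053 rows, which is nearly the whole table. A possible next step is to express missingness as a share of rows and drop columns above a cutoff. The sketch below uses a hypothetical toy table, and the 0.9 threshold is an assumption, not a value taken from this analysis.

```python
import pandas as pd

# Hypothetical toy table that mimics a partly missing and an entirely missing column.
toy = pd.DataFrame({
    "Company_Brand": ["A", "B", "C", "D"],
    "Stage": ["Seed", None, None, "Series A"],
    "column10": [None, None, None, None],
})

# The mean of a boolean mask is the share of missing values per column.
missing_share = toy.isna().mean()

threshold = 0.9  # assumed cutoff for "almost entirely missing"
sparse_columns = missing_share[missing_share > threshold].index.tolist()
cleaned = toy.drop(columns=sparse_columns)
```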


In [18]:
##duplicates
data.duplicated().sum()

3
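
Once duplicates have been counted, they can be removed with `drop_duplicates`. The sketch below uses a hypothetical toy table rather than the actual data.

```python
import pandas as pd

# Hypothetical toy table with one exact duplicate row.
toy = pd.DataFrame({
    "Company_Brand": ["Krayonnz", "Krayonnz", "NCOME"],
    "Amount": [100000.0, 100000.0, 400000.0],
})

n_duplicates = int(toy.duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean 0..n-1 index.
deduplicated = toy.drop_duplicates().reset_index(drop=True)
```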

In [19]:
data.describe()

Unnamed: 0,Founded,Amount
count,842.0,801.0
mean,2015.36342,113043000.0
std,4.097909,2476635000.0
min,1973.0,12700.0
25%,2014.0,1000000.0
50%,2016.0,3000000.0
75%,2018.0,11000000.0
max,2020.0,70000000000.0
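
In the summary above the mean amount (about 113 million) is far larger than the median (3 million), and the maximum is 70 billion, which points to a few extreme values. The sketch below flags such values with the common 1.5 × IQR rule. The five amounts are the minimum, quartiles, and maximum from the summary, used as stand-in data; the choice of the IQR rule is an assumption, not a step taken in this notebook.

```python
import pandas as pd

# Stand-in data: min, 25%, 50%, 75% and max of Amount from the summary above.
amounts = pd.Series([12700.0, 1000000.0, 3000000.0, 11000000.0, 70000000000.0])

q1 = amounts.quantile(0.25)
q3 = amounts.quantile(0.75)

# Values above Q3 + 1.5 * IQR are flagged as high outliers.
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = amounts[amounts > upper_fence]

# A ratio far above 1 indicates a right-skewed distribution.
mean_to_median = amounts.mean() / amounts.median()
```

Flagged rows are worth checking against the source before any decision to cap or drop them, since a value of this size may be a data-entry error or a genuine large round.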


TABLE 2

In [20]:
query = '''SELECT * FROM dbo.LP1_startup_funding2021'''

data = pd.read_sql(query, connection)

In [25]:
data.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


In [22]:
data.dtypes

Company_Brand     object
Founded          float64
HeadQuarter       object
Sector            object
What_it_does      object
Founders          object
Investor          object
Amount            object
Stage             object
dtype: object
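
Here `Amount` has dtype `object` because the values are strings such as `$1,200,000`, so numeric summaries skip the column. The sketch below shows one way to convert such strings to numbers. The `"Undisclosed"` entry is a hypothetical example of a non-numeric value; whether the real column contains such entries is an assumption.

```python
import pandas as pd

# Two amounts in the format shown in the table preview, plus hypothetical problem values.
raw = pd.Series(["$1,200,000", "$120,000,000", "Undisclosed", None])

# Strip dollar signs and thousands separators, then convert.
# errors="coerce" turns anything that is still non-numeric into NaN.
cleaned = pd.to_numeric(
    raw.str.replace(r"[$,]", "", regex=True),
    errors="coerce",
)
```

Coercing to NaN keeps the row in the table while marking the amount as unknown, which avoids silently treating undisclosed rounds as zero.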

In [23]:
## finding missing values
# sum must be called with parentheses; without them pandas returns the bound method rather than the counts
data.isna().sum()

In [24]:
print(data.isna().sum())

Company_Brand      0
Founded            1
HeadQuarter        1
Sector             0
What_it_does       0
Founders           4
Investor          62
Amount             3
Stage            428
dtype: int64


In [26]:
##duplicates
data.duplicated().sum()

19
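
Before dropping duplicates it can help to look at them. Passing `keep=False` to `duplicated` marks every copy of a repeated row, not only the later ones. The sketch below uses a hypothetical toy table.

```python
import pandas as pd

# Hypothetical toy table in which rows 0 and 2 are identical, including the missing Stage.
toy = pd.DataFrame({
    "Company_Brand": ["upGrad", "Bizongo", "upGrad"],
    "Stage": [None, "Series C", None],
})

# keep=False flags all members of each duplicate group for inspection.
duplicate_rows = toy[toy.duplicated(keep=False)]
```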

In [30]:
data.describe()

Unnamed: 0,Founded
count,1208.0
mean,2016.655629
std,4.517364
min,1963.0
25%,2015.0
50%,2018.0
75%,2020.0
max,2021.0


In [31]:
data.shape

(1209, 9)
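
The two tables differ in shape: the 2020 table has an extra `column10`, and the 2021 table stores `Amount` as text. The sketch below shows one way to align them before combining, using one row from each table preview. The frame names `df_2020` and `df_2021` and the `Year` column are assumptions introduced for illustration.

```python
import pandas as pd

# One row per table, reduced to the columns needed for the illustration.
df_2020 = pd.DataFrame({
    "Company_Brand": ["Aqgromalin"],
    "Amount": [200000.0],
    "column10": [None],
})
df_2021 = pd.DataFrame({
    "Company_Brand": ["Unbox Robotics"],
    "Amount": ["$1,200,000"],
})

# Drop the extra column from 2020 and record the source year.
df_2020 = df_2020.drop(columns=["column10"]).assign(Year=2020)

# Convert the 2021 amounts from text to numbers and record the source year.
df_2021 = df_2021.assign(
    Amount=pd.to_numeric(
        df_2021["Amount"].str.replace(r"[$,]", "", regex=True),
        errors="coerce",
    ),
    Year=2021,
)

combined = pd.concat([df_2020, df_2021], ignore_index=True)
```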