# INDIAN STARTUP ECOSYSTEM ANALYSIS

BUSINESS UNDERSTANDING


Our team is exploring the Indian startup ecosystem to understand funding trends and identify promising opportunities. The goal is to analyze funding data from 2018 to 2021, focusing on the amount of funding received, sectors, stages of investment, and geographic locations.



HYPOTHESIS TESTING


Null Hypothesis (H0): There is no significant difference in the amount of funding received by startups across different sectors and stages.



Alternate Hypothesis (H1): There is a significant difference in the amount of funding received by startups across different sectors and stages.
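
One possible way to put H0 to the test is a one-way ANOVA across sectors (or stages). The sketch below computes the F-statistic by hand with pandas on a toy frame. The helper name `anova_f_statistic` and the toy values are illustrative assumptions, not part of the notebook. Funding amounts are heavily skewed, so a log transform or a rank-based test may be a better fit on the real data.

```python
import pandas as pd

def anova_f_statistic(df, group_col, value_col):
    """Return the one-way ANOVA F-statistic of value_col across group_col."""
    groups = df.groupby(group_col)[value_col]
    grand_mean = df[value_col].mean()
    # Between-group variation: group size times squared distance of each group mean from the grand mean
    ss_between = (groups.count() * (groups.mean() - grand_mean) ** 2).sum()
    # Within-group variation: squared distance of each row from its own group mean
    ss_within = ((df[value_col] - groups.transform("mean")) ** 2).sum()
    df_between = groups.ngroups - 1
    df_within = len(df) - groups.ngroups
    return (ss_between / df_between) / (ss_within / df_within)

# Toy data for illustration only (not taken from the funding tables)
toy = pd.DataFrame({
    "Sector": ["FinTech", "FinTech", "EdTech", "EdTech", "AgriTech", "AgriTech"],
    "Amount": [1.0, 2.0, 2.0, 3.0, 6.0, 7.0],
})
f_stat = anova_f_statistic(toy, "Sector", "Amount")
print(f_stat)
```

A large F-statistic relative to the F distribution with (groups - 1, rows - groups) degrees of freedom would count as evidence against H0.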



DATA UNDERSTANDING


COLUMNS


Company_Brand: Name of the startup.

Founded: Year the startup was founded.

HeadQuarter: City where the startup is headquartered.

Sector: Industry sector of the startup.

What_it_does: Brief description of the startup's business.

Founders: Names of the founders.

Investor: Investors or investment firms that funded the startup.

Amount: Amount of funding received (in dollars).

Stage: Stage of investment (e.g., Pre-seed, Seed, Series A).


ANALYTICAL QUESTIONS


1. Funding Trends:
How has the total funding amount changed year over year from 2018 to 2021?

How has the average funding amount in each sector changed over the years (2018 - 2021)?


2. Sector Analysis:
Which sectors have received the most funding, and how does the funding distribution vary across sectors?


3. Stage Analysis:
What is the distribution of funding across different investment stages (e.g., Pre-seed, Seed, Series A)?


4. Geographical Analysis:
Which cities or regions have the highest concentration of funded startups?


5. Investor Influence:
Who are the top investors in the Indian startup ecosystem, and what is their funding pattern?


6. Founder Impact:
Is there a correlation between the number of founders and the amount of funding received?


7. What are the characteristics of startups in the highest-funded sectors (e.g., number of founders, location)?


8. Which types of business are most viable to set up, judging by the best-performing startups?
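
Question 6 needs a founder count per startup, but the Founders column stores names as one comma-separated string. A minimal sketch of deriving that count follows. The helper `count_founders` and the `Founder_Count` column are hypothetical names, and treating 'Unknown' as zero founders is an assumption.

```python
import pandas as pd

def count_founders(founders):
    """Count comma-separated founder names; missing or 'Unknown' values give 0."""
    if pd.isna(founders) or founders.strip() in ("", "Unknown"):
        return 0
    return len([name for name in founders.split(",") if name.strip()])

# Small sample in the same format as the Founders column
sample = pd.DataFrame({
    "Founders": ["Prasanna Manogaran, Bharani C L", "Ajinkya Dhariya", None],
})
sample["Founder_Count"] = sample["Founders"].apply(count_founders)
founder_counts = sample["Founder_Count"].tolist()
print(founder_counts)
```

The resulting count can then be compared with Amount, for example through a rank correlation, once the amounts have been cleaned.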


In [647]:
%pip install pyodbc  
%pip install python-dotenv 

Note: you may need to restart the kernel to use updated packages.


In [648]:
import pyodbc     
from dotenv import dotenv_values    #import the dotenv_values function from the dotenv package
import pandas as pd
import numpy as np
import warnings 

warnings.filterwarnings('ignore')

In [649]:
# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')

# Get the values for the credentials you set in the '.env' file
server = environment_variables.get("server")
database = environment_variables.get("database")
username = environment_variables.get("username")
password = environment_variables.get("password")
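
Because `.get()` returns `None` for any key missing from the `.env` file, a misconfigured file only surfaces later as a failed connection. A small check like the following can catch that earlier. This is a sketch: `missing_credentials` is a hypothetical helper and the example values are placeholders.

```python
def missing_credentials(env, required=("server", "database", "username", "password")):
    """Return the required keys that are absent or empty in the env mapping."""
    return [key for key in required if not env.get(key)]

# Example with a hypothetical, incomplete configuration
example_env = {"server": "example-server", "database": "example-db", "username": ""}
missing = missing_credentials(example_env)
print(missing)
```

In the notebook, the same function could be called on `environment_variables` before the connection string is built.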

In [650]:
# Create a connection string
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password};MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"

In [651]:
# This will connect to the server and might take a few seconds to be complete.

connection = pyodbc.connect(connection_string)

# STARTUP FUNDING 2020

In [652]:
# SQL query to fetch the 2020 funding data.
# Note: you do not have permission to insert, delete, or update rows in this database table.

query = '''SELECT * FROM dbo.LP1_startup_funding2020'''

data = pd.read_sql(query, connection)

# EXPLORING THE DATA

In [653]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


In [654]:
# Preview the first 10 rows
data.head(10)

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,
5,qZense,2019.0,Bangalore,AgriTech,qZense Labs is building the next-generation Io...,"Rubal Chib, Dr Srishti Batra","Venture Catalysts, 9Unicorns Accelerator Fund",600000.0,Seed,
6,MyClassboard,2008.0,Hyderabad,EdTech,MyClassboard is a full-fledged School / Colleg...,Ajay Sakhamuri,ICICI Bank.,600000.0,Pre-series A,
7,Metvy,2018.0,Gurgaon,Networking platform,AI driven networking platform for individuals ...,Shawrya Mehrotra,HostelFund,,Pre-series,
8,Rupeek,2015.0,Bangalore,FinTech,Rupeek is an online lending platform that spec...,"Amar Prabhu, Ashwin Soni, Sumit Maniyar","KB Investment, Bertelsmann India Investments",45000000.0,Series C,
9,Gig India,2017.0,Pune,Crowdsourcing,GigIndia is a marketplace that provides on-dem...,"Aditya Shirole, Sahil Sharma","Shantanu Deshpande, Subramaniam Ramadorai",1000000.0,Pre-series A,


In [655]:
#Summary Statistics
data.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Founded,842.0,2015.363,4.097909,1973.0,2014.0,2016.0,2018.0,2020.0
Amount,801.0,113043000.0,2476635000.0,12700.0,1000000.0,3000000.0,11000000.0,70000000000.0


In [656]:
# Data types
print(data.dtypes)

Company_Brand     object
Founded          float64
HeadQuarter       object
Sector            object
What_it_does      object
Founders          object
Investor          object
Amount           float64
Stage             object
column10          object
dtype: object


In [657]:
# Unique values in each column
data.nunique()

Company_Brand    905
Founded           26
HeadQuarter       77
Sector           302
What_it_does     990
Founders         927
Investor         848
Amount           300
Stage             42
column10           2
dtype: int64

In [658]:
# Information on the dataset
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
dtypes: float64(2), object(8)
memory usage: 82.6+ KB


In [659]:
# Shape of the dataset (rows, columns)
data.shape

(1055, 10)

In [660]:
# Count duplicate rows
data.duplicated().sum()

3

In [661]:
# Count missing values in each column
data.isna().sum()

Company_Brand       0
Founded           213
HeadQuarter        94
Sector             13
What_it_does        0
Founders           12
Investor           38
Amount            254
Stage             464
column10         1053
dtype: int64
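
Raw counts are easier to act on when read as a share of the 1055 rows: for instance, column10 is missing in 1053 rows (about 99.8%) and Stage in 464 rows (about 44.0%). A sketch of a helper that reports both figures is shown here on a toy frame. `missing_summary` is a hypothetical name, not something defined in the notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Return missing counts and percentages per column, most-missing first."""
    summary = pd.DataFrame({
        "missing": df.isna().sum(),
        "percent": (df.isna().mean() * 100).round(1),
    })
    return summary.sort_values("percent", ascending=False)

# Toy frame for illustration only
toy = pd.DataFrame({
    "Amount": [200000.0, np.nan, np.nan, 400000.0],
    "Stage": [None, "Pre-seed", "Seed", "Seed"],
})
summary = missing_summary(toy)
print(summary)
```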

# DATA CLEANING

In [662]:
# Drop 'column10', which is empty in all but 2 rows
data.drop(columns=['column10'], inplace=True)

## Handling missing values
For text columns with few missing values (HeadQuarter, Sector, Founders, and Investor), we fill missing values with the placeholder 'Unknown'.

## Handling columns with many missing values

Founded, Amount, and Stage have a significant number of missing values. We fill Amount with its median, Stage with the placeholder 'Unknown', and Founded with 0 as a sentinel for an unknown year.

In [663]:

# For text columns, fill missing values with 'Unknown'
data['HeadQuarter'] = data['HeadQuarter'].fillna('Unknown')
data['Sector'] = data['Sector'].fillna('Unknown')
data['Founders'] = data['Founders'].fillna('Unknown')
data['Investor'] = data['Investor'].fillna('Unknown')

# For 'Founded', fill missing values with 0 as a sentinel for an unknown year
# (0 keeps the column numeric so it can be cast to int in the next cell)
data['Founded'] = data['Founded'].fillna(0)

# For 'Amount', fill missing values with the median funding amount
data['Amount'] = data['Amount'].fillna(data['Amount'].median())

# For 'Stage', fill missing values with a placeholder
data['Stage'] = data['Stage'].fillna('Unknown')

In [664]:
#Treating the 'founded' as Year
# Convert 'Founded' to integer type
data['Founded'] = data['Founded'].astype(int)

In [665]:
# Optimizing memory usage
# Convert 'Stage' to categorical type
data['Stage'] = data['Stage'].astype('category')

In [666]:
#Verifying the changes
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   Company_Brand  1055 non-null   object  
 1   Founded        1055 non-null   int32   
 2   HeadQuarter    1055 non-null   object  
 3   Sector         1055 non-null   object  
 4   What_it_does   1055 non-null   object  
 5   Founders       1055 non-null   object  
 6   Investor       1055 non-null   object  
 7   Amount         1055 non-null   float64 
 8   Stage          1055 non-null   category
dtypes: category(1), float64(1), int32(1), object(6)
memory usage: 64.4+ KB


In [667]:
#Identifying Duplicate Rows

duplicates = data[data.duplicated()]
print(duplicates)

    Company_Brand  Founded HeadQuarter                 Sector  \
145     Krimanshi     2015     Jodhpur  Biotechnology company   
205         Nykaa     2012      Mumbai              Cosmetics   
362        Byju’s     2011   Bangalore                 EdTech   

                                          What_it_does         Founders  \
145  Krimanshi aims to increase rural income by imp...     Nikhil Bohra   
205  Nykaa is an online marketplace for different b...    Falguni Nayar   
362  An Indian educational technology and online tu...  Byju Raveendran   

                                           Investor       Amount    Stage  
145  Rajasthan Venture Capital Fund, AIM Smart City     600000.0     Seed  
205                        Alia Bhatt, Katrina Kaif    3000000.0  Unknown  
362           Owl Ventures, Tiger Global Management  500000000.0  Unknown  


In [668]:
# Remove duplicate rows
data = data.drop_duplicates()


# Verify that duplicates have been removed
print(data.duplicated().sum())

0


# STARTUP FUNDING 2021

In [669]:
query = '''SELECT * FROM dbo.LP1_startup_funding2021'''
data2= pd.read_sql(query, connection)

In [670]:
# Preview the first 10 rows
data2.head(10)

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed
5,Urban Company,2014.0,New Delhi,Home services,Urban Company (Formerly UrbanClap) is a home a...,"Abhiraj Singh Bhal, Raghav Chandra, Varun Khaitan",Vy Capital,"$188,000,000",
6,Comofi Medtech,2018.0,Bangalore,HealthTech,Comofi Medtech is a healthcare robotics startup.,Gururaj KB,"CIIE.CO, KIIT-TBI","$200,000",
7,Qube Health,2016.0,Mumbai,HealthTech,India's Most Respected Workplace Healthcare Ma...,Gagan Kapur,Inflection Point Ventures,Undisclosed,Pre-series A
8,Vitra.ai,2020.0,Bangalore,Tech Startup,Vitra.ai is an AI-based video translation plat...,Akash Nidhi PS,Inflexor Ventures,Undisclosed,
9,Taikee,2010.0,Mumbai,E-commerce,"Taikee is the ISO-certified, B2B e-commerce pl...","Nidhi Ramachandran, Sachin Chhabra",,"$1,000,000",


In [671]:
data2.tail(10)

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
1199,Proeon,2018.0,Pune,Food Production,Innovating plant protein ingredients with supe...,"Ashish Korde, Kevin Parekh","Shaival Desai, Flowstate Ventures",$2000000,Seed
1200,InfyU Labs,2019.0,Gandhinagar,AgriTech,InfyU Labs is a team of dedicated professional...,"Amit Srivastava, Ankit Chauhan",IAN,$200000,Seed
1201,TechEagle,2015.0,Gurugram,Aviation & Aerospace,"Safe, secure & reliable On-Demand Drone delive...",Vikram Singh Meena,India Accelerator,$500000,Seed
1202,Voxelgrids,2017.0,Bangalore,Deeptech,Voxelgrids is an Magnetic Resonance Imaging te...,Arjun Arunachalam,Zoho,$5000000,
1203,Cogos Technologies,2016.0,Bangalore,Logistics & Supply Chain,A smart-tech-enabled platform offering a one-s...,Prasad Sreeram,Transworld Group,$2000000,Pre-series A
1204,Gigforce,2019.0,Gurugram,Staffing & Recruiting,A gig/on-demand staffing company.,"Chirag Mittal, Anirudh Syal",Endiya Partners,$3000000,Pre-series A
1205,Vahdam,2015.0,New Delhi,Food & Beverages,VAHDAM is among the world’s first vertically i...,Bala Sarda,IIFL AMC,$20000000,Series D
1206,Leap Finance,2019.0,Bangalore,Financial Services,International education loans for high potenti...,"Arnav Kumar, Vaibhav Singh",Owl Ventures,$55000000,Series C
1207,CollegeDekho,2015.0,Gurugram,EdTech,"Collegedekho.com is Student’s Partner, Friend ...",Ruchir Arora,"Winter Capital, ETS, Man Capital",$26000000,Series B
1208,WeRize,2019.0,Bangalore,Financial Services,India’s first socially distributed full stack ...,"Vishal Chopra, Himanshu Gupta","3one4 Capital, Kalaari Capital",$8000000,Series A


In [672]:
#Summary statistics
data2.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Founded,1208.0,2016.655629,4.517364,1963.0,2015.0,2018.0,2020.0,2021.0


In [673]:
# Data types
print(data2.dtypes)

Company_Brand     object
Founded          float64
HeadQuarter       object
Sector            object
What_it_does      object
Founders          object
Investor          object
Amount            object
Stage             object
dtype: object


In [674]:
# Unique values in each column
data2.nunique()

Company_Brand    1033
Founded            30
HeadQuarter        70
Sector            254
What_it_does     1143
Founders         1095
Investor          937
Amount            278
Stage              31
dtype: int64

In [675]:
#Information on the data
data2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What_it_does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount         1206 non-null   object 
 8   Stage          781 non-null    object 
dtypes: float64(1), object(8)
memory usage: 85.1+ KB


In [676]:
data2.shape

(1209, 9)

In [677]:
# Count duplicate rows
data2.duplicated().sum()

19

In [678]:
# Count missing values in each column
data2.isna().sum()

Company_Brand      0
Founded            1
HeadQuarter        1
Sector             0
What_it_does       0
Founders           4
Investor          62
Amount             3
Stage            428
dtype: int64

# CLEANING THE DATA

In [679]:
# Handle missing values
# Fill missing 'Founded' years with 0 (sentinel for unknown) and convert from float to integer
data2['Founded'] = data2['Founded'].fillna(0).astype(int)

# Fill missing HeadQuarter, Founders and Investor values with 'Unknown'
data2['HeadQuarter'] = data2['HeadQuarter'].fillna('Unknown')
data2['Founders'] = data2['Founders'].fillna('Unknown')
data2['Investor'] = data2['Investor'].fillna('Unknown')



In [680]:
# Convert 'Amount' to float and fill missing values with 0
# Step 1: Convert all values in the 'Amount' column to strings
data2['Amount'] = data2['Amount'].astype(str)

# Step 2: Remove the '$' sign and thousands separators
data2['Amount'] = data2['Amount'].str.replace(r'[$,]', '', regex=True)

# Step 3: Replace 'Undisclosed' (in any letter case) with NaN
data2['Amount'] = data2['Amount'].replace(r'(?i)^\s*undisclosed\s*$', np.nan, regex=True)

# Step 4: Convert the column to numeric (float); values that cannot be parsed become NaN
data2['Amount'] = pd.to_numeric(data2['Amount'], errors='coerce')

# Step 5: Fill NaN values with 0 (note: this treats undisclosed amounts as zero funding)
data2['Amount'] = data2['Amount'].fillna(0)
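
The same parsing steps recur for each year's table, so they could be folded into one reusable function. The sketch below is one way to do that. `clean_amount` is a hypothetical helper, and it deliberately leaves unparseable values as NaN instead of 0, which is a different choice from the cell above.

```python
import numpy as np
import pandas as pd

def clean_amount(series):
    """Parse funding strings such as '$1,200,000' into floats; unparseable values become NaN."""
    text = series.astype(str).str.replace(r"[$,]", "", regex=True).str.strip()
    # errors='coerce' maps 'Undisclosed', 'None', 'nan' and other non-numeric text to NaN
    return pd.to_numeric(text, errors="coerce").astype(float)

# Values in the formats seen in the 2021 table
raw = pd.Series(["$1,200,000", "Undisclosed", None, "$2000000"])
amounts = clean_amount(raw)
print(amounts)
```

Keeping NaN means undisclosed rounds are skipped by `mean()` and `median()`, whereas filling with 0 pulls those statistics down. Which behavior is appropriate depends on the question being asked.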


In [681]:
# Optimizing memory usage
# Step 1: Convert 'Stage' to categorical type
data2['Stage'] = data2['Stage'].astype('category')

# Step 2: Add 'Unknown' to the categories so it can be used as a fill value
data2['Stage'] = data2['Stage'].cat.add_categories('Unknown')

# Step 3: Fill missing values with 'Unknown'
data2['Stage'] = data2['Stage'].fillna('Unknown')

print(data2)


       Company_Brand  Founded HeadQuarter                 Sector  \
0     Unbox Robotics     2019   Bangalore             AI startup   
1             upGrad     2015      Mumbai                 EdTech   
2        Lead School     2012      Mumbai                 EdTech   
3            Bizongo     2015      Mumbai         B2B E-commerce   
4           FypMoney     2021    Gurugram                FinTech   
...              ...      ...         ...                    ...   
1204        Gigforce     2019    Gurugram  Staffing & Recruiting   
1205          Vahdam     2015   New Delhi       Food & Beverages   
1206    Leap Finance     2019   Bangalore     Financial Services   
1207    CollegeDekho     2015    Gurugram                 EdTech   
1208          WeRize     2019   Bangalore     Financial Services   

                                           What_it_does  \
0     Unbox Robotics builds on-demand AI-driven ware...   
1        UpGrad is an online higher education platform.   
2     

In [682]:
# Verify missing values are handled
print(data2.isnull().sum())

Company_Brand    0
Founded          0
HeadQuarter      0
Sector           0
What_it_does     0
Founders         0
Investor         0
Amount           0
Stage            0
dtype: int64


In [683]:
#Identifying Duplicate Rows

duplicates = data2[data2.duplicated()]
print(duplicates)

          Company_Brand  Founded             HeadQuarter  \
107           Curefoods     2020               Bangalore   
109            Bewakoof     2012                  Mumbai   
111             FanPlay     2020          Computer Games   
117      Advantage Club     2014                  Mumbai   
119              Ruptok     2020               New Delhi   
243            Trinkerr     2021               Bangalore   
244               Zorro     2021                Gurugram   
245       Ultraviolette     2021               Bangalore   
246          NephroPlus     2009               Hyderabad   
247             Unremot     2020               Bangalore   
248         FanAnywhere     2021               Bangalore   
249          PingoLearn     2021                    Pune   
250                Spry     2021                  Mumbai   
251             Enmovil     2015               Hyderabad   
252       ASQI Advisors     2019                  Mumbai   
253  Insurance Samadhan     2018        

In [684]:
# Remove duplicate rows
data2 = data2.drop_duplicates()


# Verify that duplicates have been removed
print(data2.duplicated().sum())

0


In [685]:
data2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1190 entries, 0 to 1208
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   Company_Brand  1190 non-null   object  
 1   Founded        1190 non-null   int32   
 2   HeadQuarter    1190 non-null   object  
 3   Sector         1190 non-null   object  
 4   What_it_does   1190 non-null   object  
 5   Founders       1190 non-null   object  
 6   Investor       1190 non-null   object  
 7   Amount         1190 non-null   float64 
 8   Stage          1190 non-null   category
dtypes: category(1), float64(1), int32(1), object(6)
memory usage: 81.5+ KB


# STARTUP FUNDING 2018

In [686]:
data3 = pd.read_csv('C:\\Users\\Admin\\Pictures\\Demo Indian Startup\\Demo-indian-start-up\\Notebooks\\startup_funding2018.csv')
data3

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...
...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif..."
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...
524,Droni Tech,Information Technology,Seed,"₹35,000,000","Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...


In [687]:
data3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
dtypes: object(6)
memory usage: 24.8+ KB


In [688]:
data3.describe().T

Unnamed: 0,count,unique,top,freq
Company Name,526,525,TheCollegeFever,2
Industry,526,405,—,30
Round/Series,526,21,Seed,280
Amount,526,198,—,148
Location,526,50,"Bangalore, Karnataka, India",102
About Company,526,524,"TheCollegeFever is a hub for fun, fiesta and f...",2


In [689]:
data3.dtypes

Company Name     object
Industry         object
Round/Series     object
Amount           object
Location         object
About Company    object
dtype: object

In [690]:
data3.nunique()

Company Name     525
Industry         405
Round/Series      21
Amount           198
Location          50
About Company    524
dtype: int64

In [691]:
data3.shape

(526, 6)

In [692]:
data3.duplicated().sum()

1

In [708]:
data3.isna().sum()

Company Name     0
Industry         0
Round/Series     0
Amount           0
Location         0
About Company    0
dtype: int64

# CLEANING



In [694]:
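# Check how 'Amount' parses as a number before cleaning.
# The result is kept in a separate variable: overwriting the column here with
# errors='coerce' would turn rupee-formatted values such as '₹40,000,000' into NaN
# before the cleaning cell below can parse them.
amount_preview = pd.to_numeric(data3['Amount'], errors='coerce')

print(amount_preview.dtype)
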

float64


In [709]:
# Step 1: Convert all values in the 'Amount' column to strings
data3['Amount'] = data3['Amount'].astype(str)

# Step 2: Remove currency symbols ('₹', '$') and thousands separators
# Note: rupee amounts are not converted to dollars here, so the column mixes currencies
data3['Amount'] = data3['Amount'].str.replace(r'[₹$,]', '', regex=True)

# Step 3: Replace the '—' placeholder used for missing amounts with NaN
data3['Amount'] = data3['Amount'].replace('—', np.nan)

# Step 4: Convert the column to numeric (float); values that cannot be parsed become NaN
data3['Amount'] = pd.to_numeric(data3['Amount'], errors='coerce')

# Step 5: Fill NaN values with 0
data3['Amount'] = data3['Amount'].fillna(0)
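
The 2018 table mixes rupee-prefixed amounts with unprefixed ones, so stripping the '₹' sign alone leaves two currencies in one column. A hedged sketch of converting the rupee rows follows. `amount_to_usd` is a hypothetical helper, the rate `INR_TO_USD` is a placeholder to be replaced with a sourced 2018 figure, and treating unprefixed values as dollars is an assumption.

```python
import pandas as pd

# Placeholder rate for illustration only; substitute a sourced 2018 INR-to-USD rate
INR_TO_USD = 0.0146

def amount_to_usd(series, inr_to_usd=INR_TO_USD):
    """Parse amount strings and convert rupee-denominated values to dollars."""
    text = series.astype(str)
    # Remember which rows carried the rupee sign before it is stripped
    is_rupee = text.str.contains("₹", regex=False)
    numeric = pd.to_numeric(text.str.replace(r"[₹$,]", "", regex=True), errors="coerce")
    # Scale only the rupee rows; other rows are assumed to be in dollars already
    return numeric.where(~is_rupee, numeric * inr_to_usd)

# Values in the formats seen in the 2018 table
raw = pd.Series(["250000", "₹40,000,000", "—"])
usd = amount_to_usd(raw)
print(usd)
```

The currency flag has to be captured before the symbol is removed, which is why this would need to run on the raw column instead of the cleaned one.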

In [696]:
data3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company Name   526 non-null    object 
 1   Industry       526 non-null    object 
 2   Round/Series   526 non-null    object 
 3   Amount         526 non-null    float64
 4   Location       526 non-null    object 
 5   About Company  526 non-null    object 
dtypes: float64(1), object(5)
memory usage: 24.8+ KB


In [697]:
data3 = data3.drop_duplicates()


# Verify that duplicates have been removed
print(data3.duplicated().sum())

0


# STARTUP FUNDING 2019

In [698]:
data4= pd.read_csv('C:\\Users\\Admin\\Pictures\\Demo Indian Startup\\Demo-indian-start-up\\Notebooks\\startup_funding2019.csv')
data4                  

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",
...,...,...,...,...,...,...,...,...,...
84,Infra.Market,,Mumbai,Infratech,It connects client requirements to their suppl...,"Aaditya Sharda, Souvik Sengupta","Tiger Global, Nexus Venture Partners, Accel Pa...","$20,000,000",Series A
85,Oyo,2013.0,Gurugram,Hospitality,Provides rooms for comfortable stay,Ritesh Agarwal,"MyPreferred Transformation, Avendus Finance, S...","$693,000,000",
86,GoMechanic,2016.0,Delhi,Automobile & Technology,Find automobile repair and maintenance service...,"Amit Bhasin, Kushal Karwa, Nitin Rana, Rishabh...",Sequoia Capital,"$5,000,000",Series B
87,Spinny,2015.0,Delhi,Automobile,Online car retailer,"Niraj Singh, Ramanshu Mahaur, Ganesh Pawar, Mo...","Norwest Venture Partners, General Catalyst, Fu...","$50,000,000",


In [699]:
data4.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
dtypes: float64(1), object(8)
memory usage: 6.4+ KB


In [700]:
data4.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Founded,60.0,2014.533333,2.937003,2004.0,2013.0,2015.0,2016.25,2019.0


In [702]:
data4.shape

(89, 9)

In [703]:
print(data4.dtypes)

Company/Brand     object
Founded          float64
HeadQuarter       object
Sector            object
What it does      object
Founders          object
Investor          object
Amount($)         object
Stage             object
dtype: object


In [704]:
data4.duplicated().sum()

0

In [705]:
data4.isnull().sum()

Company/Brand     0
Founded          29
HeadQuarter      19
Sector            5
What it does      0
Founders          3
Investor          0
Amount($)         0
Stage            46
dtype: int64
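
The four yearly tables use different column names (for example `Company Name` and `Round/Series` in 2018, `Company/Brand` and `Amount($)` in 2019), so they need a shared schema before they can be combined for year-over-year analysis. The sketch below shows one way to do that. The `harmonize` helper, the `Year` column, and the mapping of 2018 fields onto the 2020/2021 names are assumptions, and the tiny frames stand in for the real tables.

```python
import pandas as pd

# Column names taken from the 2018 and 2019 outputs, mapped to the 2020/2021 schema
RENAME_2018 = {
    "Company Name": "Company_Brand",
    "Industry": "Sector",
    "Round/Series": "Stage",
    "Location": "HeadQuarter",
    "About Company": "What_it_does",
}
RENAME_2019 = {
    "Company/Brand": "Company_Brand",
    "What it does": "What_it_does",
    "Amount($)": "Amount",
}

def harmonize(df, year, rename=None):
    """Rename columns to the shared schema and tag each row with its funding year."""
    out = df.rename(columns=rename or {})
    out["Year"] = year
    return out

# Tiny stand-in frames; in the notebook these would be data3 and data4
df2018 = pd.DataFrame({"Company Name": ["A"], "Industry": ["FinTech"], "Amount": [1.0]})
df2019 = pd.DataFrame({"Company/Brand": ["B"], "Sector": ["EdTech"], "Amount($)": [2.0]})
combined = pd.concat(
    [harmonize(df2018, 2018, RENAME_2018), harmonize(df2019, 2019, RENAME_2019)],
    ignore_index=True,
)
print(combined.columns.tolist())
```

The 2020 and 2021 frames already share the target names, so they would only need the `Year` tag before being appended to the same list.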