## Business Understanding 

Start-up funding plays a crucial role in providing the capital that new ventures need to drive economic growth and technological advancement. The Indian start-up ecosystem spans many sectors and domains, such as e-commerce, fintech, edtech, healthtech, and agritech. This project aims to equip the team with the knowledge and strategic insights needed to identify the most promising sectors, cities, funding trends, and key players, so that it can make informed decisions and engage successfully with the dynamic and rapidly evolving Indian start-up landscape.

### **Data understanding**
 
The datasets contain information about start-up funding from 2018 to 2021. They include attributes such as the company’s name, sector, funding amount, stage, investor details, and location.
 
The key attributes in the dataset include:
 
**Company/Brand**: Name of the company/start-up
 
**Founded**: Year start-up was founded
 
**Sector**: Sector of service
 
**What it does**: Description of the company
 
**Founders**: Founders of the Company
 
**Investor**: Investors
 
**Amount($)**: Amount of funding raised
 
**Stage**: Round of funding reached
 
**Headquarters**: Location of the start-up

## Hypothesis
 
Null Hypothesis (H0): Funding to start-ups is centralized around specific locations and sectors.
 
Alternative Hypothesis (H1): Funding to start-ups is spread across different locations and sectors.
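
One way to make "centralized" measurable is to compute the share of total funding captured by the top few locations or sectors. The sketch below is illustrative only: `top_n_share` is a hypothetical helper, the toy values are not taken from the datasets, and it assumes the funding column has already been cleaned to numeric values. It is a descriptive measure, not a formal statistical test.

```python
import pandas as pd


def top_n_share(df, group_col, amount_col, n=3):
    """Return the fraction of total funding received by the n largest groups."""
    totals = df.groupby(group_col)[amount_col].sum().sort_values(ascending=False)
    return float(totals.head(n).sum() / totals.sum())


# Toy data for illustration only; not taken from the funding datasets.
toy = pd.DataFrame({
    "HeadQuarter": ["A", "A", "B", "C", "D"],
    "Fund_Amount": [50.0, 30.0, 10.0, 5.0, 5.0],
})
share = top_n_share(toy, "HeadQuarter", "Fund_Amount", n=1)
print(share)  # location "A" holds 80 of the 100 units in the toy data
```

A share close to 1 for a small `n` would point toward concentration, while a share close to `n` divided by the number of groups would point toward an even spread.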
 
 
 
## Research Questions
 
1. How has funding to start-ups changed over time?

2. What is the average amount of funding for start-ups in?

3. Which headquarters location is most preferred by start-ups?

4. Which sectors are most favoured by investors?

5. What are the most common funding stages among Indian start-ups?

## Install Packages

In [3]:

pip install pyodbc


Note: you may need to restart the kernel to use updated packages.


In [4]:
pip install python-dotenv 




## Import Packages Needed for This Analysis 

In [173]:
import pyodbc
#import the dotenv_values function from the dotenv package  
from dotenv import dotenv_values    
import pandas as pd
import warnings 
import numpy as np
# Suppress warnings (for example, deprecation warnings raised by libraries)
warnings.filterwarnings('ignore')

## Loading the .env file and creating a connection to the database

In [6]:
# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')

# Get the values for the credentials you set in the '.env' file
server = environment_variables.get("server")
database = environment_variables.get("database")
username = environment_variables.get("Login")
password = environment_variables.get("password")

In [7]:
# Create a connection string
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"
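
If a key is missing from the `.env` file, `dotenv_values(...).get(...)` returns `None` and the connection string silently contains the text "None". A minimal sketch of a guard against that is shown here; `build_connection_string` is a hypothetical helper, and the example values are placeholders rather than real credentials.

```python
def build_connection_string(env):
    """Build the ODBC connection string, failing early if a credential is missing."""
    required = ["server", "database", "Login", "password"]
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"Missing .env entries: {', '.join(missing)}")
    return (
        "DRIVER={SQL Server};"
        f"SERVER={env['server']};DATABASE={env['database']};"
        f"UID={env['Login']};PWD={env['password']}"
    )


# Placeholder values for illustration; real values come from dotenv_values('.env').
example = build_connection_string({
    "server": "example-host",
    "database": "example_db",
    "Login": "user",
    "password": "secret",
})
```

In the notebook, the dictionary returned by `dotenv_values('.env')` could be passed to this function in place of the placeholder dictionary.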

In [8]:
# Use the pyodbc library to pass in the connection string.

connection = pyodbc.connect(connection_string)

## Accessing 2021 data from the database

In [9]:
query = "SELECT * FROM dbo.LP1_startup_funding2021"

data_2021 = pd.read_sql(query, connection)

data_2021

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed
...,...,...,...,...,...,...,...,...,...
1204,Gigforce,2019.0,Gurugram,Staffing & Recruiting,A gig/on-demand staffing company.,"Chirag Mittal, Anirudh Syal",Endiya Partners,$3000000,Pre-series A
1205,Vahdam,2015.0,New Delhi,Food & Beverages,VAHDAM is among the world’s first vertically i...,Bala Sarda,IIFL AMC,$20000000,Series D
1206,Leap Finance,2019.0,Bangalore,Financial Services,International education loans for high potenti...,"Arnav Kumar, Vaibhav Singh",Owl Ventures,$55000000,Series C
1207,CollegeDekho,2015.0,Gurugram,EdTech,"Collegedekho.com is Student’s Partner, Friend ...",Ruchir Arora,"Winter Capital, ETS, Man Capital",$26000000,Series B


## Accessing 2020 data from the database

In [10]:
query = "SELECT * FROM dbo.LP1_startup_funding2020"

data_2020 = pd.read_sql(query, connection)

data_2020

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,
...,...,...,...,...,...,...,...,...,...,...
1050,Leverage Edu,,Delhi,Edtech,AI enabled marketplace that provides career gu...,Akshay Chaturvedi,"DSG Consumer Partners, Blume Ventures",1500000.0,,
1051,EpiFi,,,Fintech,It offers customers with a single interface fo...,"Sujith Narayanan, Sumit Gwalani","Sequoia India, Ribbit Capital",13200000.0,Seed Round,
1052,Purplle,2012.0,Mumbai,Cosmetics,Online makeup and beauty products retailer,"Manish Taneja, Rahul Dash",Verlinvest,8000000.0,,
1053,Shuttl,2015.0,Delhi,Transport,App based bus aggregator serice,"Amit Singh, Deepanshu Malviya",SIG Global India Fund LLP.,8043000.0,Series C,


## Accessing a csv file containing 2019 data from the root directory of this project

In [11]:

data_2019 = pd.read_csv("startup_funding2019.csv")

data_2019

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",
...,...,...,...,...,...,...,...,...,...
84,Infra.Market,,Mumbai,Infratech,It connects client requirements to their suppl...,"Aaditya Sharda, Souvik Sengupta","Tiger Global, Nexus Venture Partners, Accel Pa...","$20,000,000",Series A
85,Oyo,2013.0,Gurugram,Hospitality,Provides rooms for comfortable stay,Ritesh Agarwal,"MyPreferred Transformation, Avendus Finance, S...","$693,000,000",
86,GoMechanic,2016.0,Delhi,Automobile & Technology,Find automobile repair and maintenance service...,"Amit Bhasin, Kushal Karwa, Nitin Rana, Rishabh...",Sequoia Capital,"$5,000,000",Series B
87,Spinny,2015.0,Delhi,Automobile,Online car retailer,"Niraj Singh, Ramanshu Mahaur, Ganesh Pawar, Mo...","Norwest Venture Partners, General Catalyst, Fu...","$50,000,000",


## Accessing a csv file containing 2018 data from the root directory of this project

In [12]:
data_2018 = pd.read_csv("startup_funding2018.csv")

data_2018

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...
...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif..."
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...
524,Droni Tech,Information Technology,Seed,"₹35,000,000","Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...
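
The amount columns mix several formats across the four datasets: plain numbers, dollar strings with commas, rupee strings, and placeholders for unknown values. A possible cleaning step is sketched below. `parse_amount` is a hypothetical helper, the conversion rate is a required argument with a placeholder value (not a real exchange rate), and treating unprefixed numbers as US dollars is an assumption that would need checking against the data source.

```python
import re

import numpy as np
import pandas as pd


def parse_amount(value, inr_to_usd):
    """Return the amount as a float in USD, or NaN when no number is present."""
    if pd.isna(value):
        return np.nan
    text = str(value).strip()
    digits = re.sub(r"[^0-9.]", "", text)  # drop currency symbols and commas
    try:
        amount = float(digits)
    except ValueError:  # placeholders such as "—" or "Undisclosed" leave no digits
        return np.nan
    # Assumption: only entries prefixed with the rupee sign are in INR.
    return amount * inr_to_usd if text.startswith("₹") else amount


# Sample values in the formats that occur in the datasets.
# 0.01 is a placeholder rate for illustration, not a real exchange rate.
raw = pd.Series(["250000", "₹40,000,000", "—", "$1,200,000", "Undisclosed"])
amounts = raw.map(lambda v: parse_amount(v, inr_to_usd=0.01))
```

Keeping the rate as an explicit argument makes the currency assumption visible wherever the function is called.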


# **Data preparation** 

## Inspecting all four datasets

In [13]:
data_2021.shape

(1209, 9)

In [14]:
data_2020.shape

(1055, 10)

In [15]:
data_2019.shape

(89, 9)

In [16]:
data_2018.shape

(526, 6)

## Checking the datatypes and the number of columns of the four datasets using the **.info()** method

In [17]:
data_2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What_it_does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount         1206 non-null   object 
 8   Stage          781 non-null    object 
dtypes: float64(1), object(8)
memory usage: 85.1+ KB


In [18]:
data_2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
dtypes: float64(2), object(8)
memory usage: 82.5+ KB


In [19]:
data_2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
dtypes: float64(1), object(8)
memory usage: 6.4+ KB


In [20]:
data_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
dtypes: object(6)
memory usage: 24.8+ KB


## Checking for null values using **.isnull().sum()**

In [21]:
data_2021.isnull().sum()

Company_Brand      0
Founded            1
HeadQuarter        1
Sector             0
What_it_does       0
Founders           4
Investor          62
Amount             3
Stage            428
dtype: int64

In [22]:
data_2020.isnull().sum()

Company_Brand       0
Founded           213
HeadQuarter        94
Sector             13
What_it_does        0
Founders           12
Investor           38
Amount            254
Stage             464
column10         1053
dtype: int64

In [23]:
data_2019.isnull().sum()

Company/Brand     0
Founded          29
HeadQuarter      19
Sector            5
What it does      0
Founders          3
Investor          0
Amount($)         0
Stage            46
dtype: int64

In [24]:
data_2018.isnull().sum()

Company Name     0
Industry         0
Round/Series     0
Amount           0
Location         0
About Company    0
dtype: int64
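
A zero null count does not necessarily mean the data is complete: the 2018 preview shows "—" in the amount column, and such placeholder strings are not counted by `.isnull()`. The sketch below counts them as missing too. `count_missing` and the `PLACEHOLDERS` list are illustrative assumptions; the list would need extending if other markers turn up.

```python
import pandas as pd

# Placeholder strings assumed to mean "no value"; extend as needed.
PLACEHOLDERS = ["—", "Undisclosed"]


def count_missing(df, placeholders=PLACEHOLDERS):
    """Count per column the cells that are null or hold a placeholder string."""
    return (df.isnull() | df.isin(placeholders)).sum()


# Three rows copied from the 2018 preview.
sample = pd.DataFrame({
    "Company Name": ["PayMe India", "Eunimart", "Freightwalla"],
    "Amount": ["2000000", "—", "—"],
})
missing = count_missing(sample)
print(missing)
```

Here `.isnull().sum()` alone would report 0 for both columns, while `count_missing` reports 2 for `Amount`.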

## **"column10"** is dropped because it is almost entirely empty (only 2 non-null values in 1,055 rows)

In [25]:
# drop() with inplace=True modifies data_2020 and returns None, so the result is not assigned
data_2020.drop(columns=["column10"], inplace=True)

In [26]:
data_2020.shape

(1055, 9)
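
The same decision can be expressed as a rule instead of naming the column by hand. The sketch below drops every column whose share of nulls exceeds a threshold; `drop_sparse_columns` and the threshold values are illustrative assumptions. With the null counts reported above for the 2020 data, a 0.95 threshold would remove only `column10` (1053 of 1055 null) and keep `Stage` (464 of 1055 null).

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_null_fraction=0.95):
    """Return a copy of df without columns whose null share exceeds the threshold."""
    null_fraction = df.isnull().mean()  # share of nulls per column
    sparse = null_fraction[null_fraction > max_null_fraction].index
    return df.drop(columns=sparse)


# Small stand-in frame; a lower threshold is used because it has only four rows.
toy = pd.DataFrame({
    "Company_Brand": ["a", "b", "c", "d"],
    "Stage": ["Seed", np.nan, "Seed", "Seed"],
    "column10": [np.nan, np.nan, np.nan, "x"],
})
cleaned = drop_sparse_columns(toy, max_null_fraction=0.5)
```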

## Adding an additional column called **"Funding_Year"** to show the year the company received funding

In [27]:
data_2021["Funding_Year"] = 2021

data_2021.head(15)

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,Funding_Year
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A,2021
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",,2021
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D,2021
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C,2021
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed,2021
5,Urban Company,2014.0,New Delhi,Home services,Urban Company (Formerly UrbanClap) is a home a...,"Abhiraj Singh Bhal, Raghav Chandra, Varun Khaitan",Vy Capital,"$188,000,000",,2021
6,Comofi Medtech,2018.0,Bangalore,HealthTech,Comofi Medtech is a healthcare robotics startup.,Gururaj KB,"CIIE.CO, KIIT-TBI","$200,000",,2021
7,Qube Health,2016.0,Mumbai,HealthTech,India's Most Respected Workplace Healthcare Ma...,Gagan Kapur,Inflection Point Ventures,Undisclosed,Pre-series A,2021
8,Vitra.ai,2020.0,Bangalore,Tech Startup,Vitra.ai is an AI-based video translation plat...,Akash Nidhi PS,Inflexor Ventures,Undisclosed,,2021
9,Taikee,2010.0,Mumbai,E-commerce,"Taikee is the ISO-certified, B2B e-commerce pl...","Nidhi Ramachandran, Sachin Chhabra",,"$1,000,000",,2021


In [28]:
data_2020["Funding_Year"] = 2020

data_2020.head(15)

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,Funding_Year
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,2020
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,2020
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,2020
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,2020
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,2020
5,qZense,2019.0,Bangalore,AgriTech,qZense Labs is building the next-generation Io...,"Rubal Chib, Dr Srishti Batra","Venture Catalysts, 9Unicorns Accelerator Fund",600000.0,Seed,2020
6,MyClassboard,2008.0,Hyderabad,EdTech,MyClassboard is a full-fledged School / Colleg...,Ajay Sakhamuri,ICICI Bank.,600000.0,Pre-series A,2020
7,Metvy,2018.0,Gurgaon,Networking platform,AI driven networking platform for individuals ...,Shawrya Mehrotra,HostelFund,,Pre-series,2020
8,Rupeek,2015.0,Bangalore,FinTech,Rupeek is an online lending platform that spec...,"Amar Prabhu, Ashwin Soni, Sumit Maniyar","KB Investment, Bertelsmann India Investments",45000000.0,Series C,2020
9,Gig India,2017.0,Pune,Crowdsourcing,GigIndia is a marketplace that provides on-dem...,"Aditya Shirole, Sahil Sharma","Shantanu Deshpande, Subramaniam Ramadorai",1000000.0,Pre-series A,2020


In [29]:
data_2019["Funding_Year"] = 2019

data_2019.head(15)

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Funding_Year
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",,2019
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C,2019
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding,2019
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D,2019
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",,2019
5,FlytBase,,Pune,Technology,A drone automation platform,Nitin Gupta,Undisclosed,Undisclosed,,2019
6,Finly,,Bangalore,SaaS,It builds software products that makes work si...,"Vivek AG, Veekshith C Rai","Social Capital, AngelList India, Gemba Capital...",Undisclosed,,2019
7,Kratikal,2013.0,Noida,Technology,It is a product-based cybersecurity solutions ...,"Pavan Kushwaha, Paratosh Bansal, Dip Jung Thapa","Gilda VC, Art Venture, Rajeev Chitrabhanu.","$1,000,000",Pre series A,2019
8,Quantiphi,,,AI & Tech,It is an AI and big data services company prov...,Renuka Ramnath,Multiples Alternate Asset Management,"$20,000,000",Series A,2019
9,Lenskart,2010.0,Delhi,E-commerce,It is a eyewear company,"Peyush Bansal, Amit Chaudhary, Sumeet Kapahi",SoftBank,"$275,000,000",Series G,2019


In [30]:
data_2018["Funding_Year"] = 2018

data_2018.head(15)

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company,Funding_Year
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",2018
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,2018
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,2018
5,Hasura,"Cloud Infrastructure, PaaS, SaaS",Seed,1600000,"Bengaluru, Karnataka, India",Hasura is a platform that allows developers to...,2018
6,Tripshelf,"Internet, Leisure, Marketplace",Seed,"₹16,000,000","Kalkaji, Delhi, India",Tripshelf is an online market place for holida...,2018
7,Hyperdata.IO,Market Research,Angel,"₹50,000,000","Hyderabad, Andhra Pradesh, India",Hyperdata combines advanced machine learning w...,2018
8,Freightwalla,"Information Services, Information Technology",Seed,—,"Mumbai, Maharashtra, India",Freightwalla is an international forwarder tha...,2018
9,Microchip Payments,Mobile Payments,Seed,—,"Bangalore, Karnataka, India",Microchip payments is a mobile-based payment a...,2018


## Renaming columns to facilitate merging 

In [31]:
data_2021.rename(columns={"Company_Brand" : "CompanyName"}, inplace = True)

data_2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   CompanyName   1209 non-null   object 
 1   Founded       1208 non-null   float64
 2   HeadQuarter   1208 non-null   object 
 3   Sector        1209 non-null   object 
 4   What_it_does  1209 non-null   object 
 5   Founders      1205 non-null   object 
 6   Investor      1147 non-null   object 
 7   Amount        1206 non-null   object 
 8   Stage         781 non-null    object 
 9   Funding_Year  1209 non-null   int64  
dtypes: float64(1), int64(1), object(8)
memory usage: 94.6+ KB


In [32]:
data_2020.rename(columns={"Company_Brand" : "CompanyName"}, inplace = True)

data_2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   CompanyName   1055 non-null   object 
 1   Founded       842 non-null    float64
 2   HeadQuarter   961 non-null    object 
 3   Sector        1042 non-null   object 
 4   What_it_does  1055 non-null   object 
 5   Founders      1043 non-null   object 
 6   Investor      1017 non-null   object 
 7   Amount        801 non-null    float64
 8   Stage         591 non-null    object 
 9   Funding_Year  1055 non-null   int64  
dtypes: float64(2), int64(1), object(7)
memory usage: 82.5+ KB


In [33]:
data_2019.rename(columns={"Company/Brand" : "CompanyName"}, inplace = True)

data_2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   CompanyName   89 non-null     object 
 1   Founded       60 non-null     float64
 2   HeadQuarter   70 non-null     object 
 3   Sector        84 non-null     object 
 4   What it does  89 non-null     object 
 5   Founders      86 non-null     object 
 6   Investor      89 non-null     object 
 7   Amount($)     89 non-null     object 
 8   Stage         43 non-null     object 
 9   Funding_Year  89 non-null     int64  
dtypes: float64(1), int64(1), object(8)
memory usage: 7.1+ KB


In [34]:
data_2018.rename(columns={"Company Name" : "CompanyName"}, inplace=True)

data_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   CompanyName    526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
 6   Funding_Year   526 non-null    int64 
dtypes: int64(1), object(6)
memory usage: 28.9+ KB


In [35]:
## close the database connection 

connection.close()

In [36]:
data_2021.rename(columns={"Amount" : "Fund_Amount"}, inplace=True)

data_2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   CompanyName   1209 non-null   object 
 1   Founded       1208 non-null   float64
 2   HeadQuarter   1208 non-null   object 
 3   Sector        1209 non-null   object 
 4   What_it_does  1209 non-null   object 
 5   Founders      1205 non-null   object 
 6   Investor      1147 non-null   object 
 7   Fund_Amount   1206 non-null   object 
 8   Stage         781 non-null    object 
 9   Funding_Year  1209 non-null   int64  
dtypes: float64(1), int64(1), object(8)
memory usage: 94.6+ KB


In [37]:
data_2020.rename(columns={"Amount" : "Fund_Amount"}, inplace=True)

data_2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   CompanyName   1055 non-null   object 
 1   Founded       842 non-null    float64
 2   HeadQuarter   961 non-null    object 
 3   Sector        1042 non-null   object 
 4   What_it_does  1055 non-null   object 
 5   Founders      1043 non-null   object 
 6   Investor      1017 non-null   object 
 7   Fund_Amount   801 non-null    float64
 8   Stage         591 non-null    object 
 9   Funding_Year  1055 non-null   int64  
dtypes: float64(2), int64(1), object(7)
memory usage: 82.5+ KB


In [38]:
data_2019.rename(columns={"Amount($)" : "Fund_Amount"}, inplace=True)

data_2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   CompanyName   89 non-null     object 
 1   Founded       60 non-null     float64
 2   HeadQuarter   70 non-null     object 
 3   Sector        84 non-null     object 
 4   What it does  89 non-null     object 
 5   Founders      86 non-null     object 
 6   Investor      89 non-null     object 
 7   Fund_Amount   89 non-null     object 
 8   Stage         43 non-null     object 
 9   Funding_Year  89 non-null     int64  
dtypes: float64(1), int64(1), object(8)
memory usage: 7.1+ KB


In [39]:
data_2018.rename(columns={"Amount" : "Fund_Amount"}, inplace=True)

data_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   CompanyName    526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Fund_Amount    526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
 6   Funding_Year   526 non-null    int64 
dtypes: int64(1), object(6)
memory usage: 28.9+ KB


In [40]:
data_2018.rename(columns={"Industry" : "Sector"}, inplace=True)

data_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   CompanyName    526 non-null    object
 1   Sector         526 non-null    object
 2   Round/Series   526 non-null    object
 3   Fund_Amount    526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
 6   Funding_Year   526 non-null    int64 
dtypes: int64(1), object(6)
memory usage: 28.9+ KB


In [41]:
data_2018.rename(columns={"Round/Series" : "Stage"}, inplace=True)

data_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   CompanyName    526 non-null    object
 1   Sector         526 non-null    object
 2   Stage          526 non-null    object
 3   Fund_Amount    526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
 6   Funding_Year   526 non-null    int64 
dtypes: int64(1), object(6)
memory usage: 28.9+ KB


In [42]:
data_2018.rename(columns={"Location" : "HeadQuarter"}, inplace=True)

data_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   CompanyName    526 non-null    object
 1   Sector         526 non-null    object
 2   Stage          526 non-null    object
 3   Fund_Amount    526 non-null    object
 4   HeadQuarter    526 non-null    object
 5   About Company  526 non-null    object
 6   Funding_Year   526 non-null    int64 
dtypes: int64(1), object(6)
memory usage: 28.9+ KB


In [43]:
data_2019.rename(columns={"What it does" : "What_it_does"}, inplace=True)

data_2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   CompanyName   89 non-null     object 
 1   Founded       60 non-null     float64
 2   HeadQuarter   70 non-null     object 
 3   Sector        84 non-null     object 
 4   What_it_does  89 non-null     object 
 5   Founders      86 non-null     object 
 6   Investor      89 non-null     object 
 7   Fund_Amount   89 non-null     object 
 8   Stage         43 non-null     object 
 9   Funding_Year  89 non-null     int64  
dtypes: float64(1), int64(1), object(8)
memory usage: 7.1+ KB


In [44]:
data_2018.rename(columns={"About Company" : "What_it_does"}, inplace=True)

data_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   CompanyName   526 non-null    object
 1   Sector        526 non-null    object
 2   Stage         526 non-null    object
 3   Fund_Amount   526 non-null    object
 4   HeadQuarter   526 non-null    object
 5   What_it_does  526 non-null    object
 6   Funding_Year  526 non-null    int64 
dtypes: int64(1), object(6)
memory usage: 28.9+ KB
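
The renames above can be collected into a single mapping and applied to every dataset in one call, because `DataFrame.rename` ignores keys that are not present in a frame. The mapping below only restates the renames performed in the preceding cells; `RENAME_MAP` and `standardize_columns` are hypothetical names.

```python
import pandas as pd

# Every source column name used in the four datasets, mapped to the shared name.
RENAME_MAP = {
    "Company_Brand": "CompanyName",
    "Company/Brand": "CompanyName",
    "Company Name": "CompanyName",
    "Amount": "Fund_Amount",
    "Amount($)": "Fund_Amount",
    "Industry": "Sector",
    "Round/Series": "Stage",
    "Location": "HeadQuarter",
    "What it does": "What_it_does",
    "About Company": "What_it_does",
}


def standardize_columns(df):
    """Return a copy of df with the shared column names applied."""
    return df.rename(columns=RENAME_MAP)


# Empty frame with the original 2018 column names, for illustration.
raw_2018 = pd.DataFrame(columns=[
    "Company Name", "Industry", "Round/Series", "Amount", "Location", "About Company",
])
renamed_2018 = standardize_columns(raw_2018)
print(list(renamed_2018.columns))
```

Keeping the mapping in one place makes it easier to see which source columns end up under which shared name.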


## Dropping Columns to facilitate Merging

In [45]:
columns_to_drop = ["Founded", "Founders", "Investor"]

data_2019 = data_2019.drop(columns=columns_to_drop)

data_2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   CompanyName   89 non-null     object
 1   HeadQuarter   70 non-null     object
 2   Sector        84 non-null     object
 3   What_it_does  89 non-null     object
 4   Fund_Amount   89 non-null     object
 5   Stage         43 non-null     object
 6   Funding_Year  89 non-null     int64 
dtypes: int64(1), object(6)
memory usage: 5.0+ KB


In [46]:
columns_to_drop = ["Founded", "Founders", "Investor"]

data_2020 = data_2020.drop(columns=columns_to_drop)

data_2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   CompanyName   1055 non-null   object 
 1   HeadQuarter   961 non-null    object 
 2   Sector        1042 non-null   object 
 3   What_it_does  1055 non-null   object 
 4   Fund_Amount   801 non-null    float64
 5   Stage         591 non-null    object 
 6   Funding_Year  1055 non-null   int64  
dtypes: float64(1), int64(1), object(5)
memory usage: 57.8+ KB


In [47]:
columns_to_drop = ["Founded", "Founders", "Investor"]

data_2021 = data_2021.drop(columns=columns_to_drop)

data_2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   CompanyName   1209 non-null   object
 1   HeadQuarter   1208 non-null   object
 2   Sector        1209 non-null   object
 3   What_it_does  1209 non-null   object
 4   Fund_Amount   1206 non-null   object
 5   Stage         781 non-null    object
 6   Funding_Year  1209 non-null   int64 
dtypes: int64(1), object(6)
memory usage: 66.2+ KB
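The three cells above repeat the same drop for each year. A minimal sketch of doing it in one pass is shown below. The helper name `drop_unshared_columns` and the tiny demo frames are assumptions for illustration only; in the notebook the inputs would be `data_2019`, `data_2020` and `data_2021`.

```python
import pandas as pd

COLUMNS_TO_DROP = ["Founded", "Founders", "Investor"]


def drop_unshared_columns(frames, columns=COLUMNS_TO_DROP):
    """Return a dict of copies of each frame without the given columns.

    errors="ignore" skips any column that a frame does not have.
    """
    return {
        year: df.drop(columns=columns, errors="ignore")
        for year, df in frames.items()
    }


# Hypothetical stand-in frames, not the real datasets
demo_frames = {
    2019: pd.DataFrame({"CompanyName": ["A"], "Founded": [2015], "Investor": ["X"]}),
    2020: pd.DataFrame({"CompanyName": ["B"], "Founders": ["Y"]}),
}

trimmed = drop_unshared_columns(demo_frames)
for year, df in trimmed.items():
    print(year, list(df.columns))
```

Keeping the frames in a dictionary keyed by year also makes the later `pd.concat` call easy to build from `trimmed.values()`.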


## Merge all datasets into one

In [48]:
joined_data = pd.concat([data_2018, data_2019, data_2020, data_2021], ignore_index=True)

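# Note: `info` is a method. Written without parentheses, the notebook echoes the bound
# method (the DataFrame preview shown below); call joined_data.info() for the column summary.
joined_data.info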

<bound method DataFrame.info of           CompanyName                                             Sector  \
0     TheCollegeFever  Brand Marketing, Event Promotion, Marketing, S...   
1     Happy Cow Dairy                               Agriculture, Farming   
2          MyLoanCare   Credit, Financial Services, Lending, Marketplace   
3         PayMe India                        Financial Services, FinTech   
4            Eunimart                 E-Commerce Platforms, Retail, SaaS   
...               ...                                                ...   
2874         Gigforce                              Staffing & Recruiting   
2875           Vahdam                                   Food & Beverages   
2876     Leap Finance                                 Financial Services   
2877     CollegeDekho                                             EdTech   
2878           WeRize                                 Financial Services   

             Stage  Fund_Amount                       H

In [49]:
joined_data.to_csv(r"C:\Users\Safowaa\Documents\Azibiafrica\AzubiPython\Indian_start-up_ecosystem\joined_data.csv", index=False)
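The file is written to an absolute Windows path but read back later with the relative name `joined_data.csv`, which only works if the notebook runs from that same folder. A minimal sketch of using one shared path object for both steps follows. The temporary directory and the demo frame are assumptions standing in for the project folder and `joined_data`.

```python
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical stand-in for joined_data
demo = pd.DataFrame({"CompanyName": ["A", "B"], "Funding_Year": [2018, 2019]})

# A temporary directory stands in for the project folder (assumption)
output_dir = Path(tempfile.mkdtemp())
csv_path = output_dir / "joined_data.csv"

# Write and read through the same path so the two cells cannot drift apart
demo.to_csv(csv_path, index=False)
reloaded = pd.read_csv(csv_path)

print(list(reloaded.columns), len(reloaded))
```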


In [121]:
combined21_18 = pd.read_csv("joined_data.csv")

combined21_18.head(20)

Unnamed: 0,CompanyName,Sector,Stage,Fund_Amount,HeadQuarter,What_it_does,Funding_Year
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",2018
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,2018
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,2018
5,Hasura,"Cloud Infrastructure, PaaS, SaaS",Seed,1600000,"Bengaluru, Karnataka, India",Hasura is a platform that allows developers to...,2018
6,Tripshelf,"Internet, Leisure, Marketplace",Seed,"₹16,000,000","Kalkaji, Delhi, India",Tripshelf is an online market place for holida...,2018
7,Hyperdata.IO,Market Research,Angel,"₹50,000,000","Hyderabad, Andhra Pradesh, India",Hyperdata combines advanced machine learning w...,2018
8,Freightwalla,"Information Services, Information Technology",Seed,—,"Mumbai, Maharashtra, India",Freightwalla is an international forwarder tha...,2018
9,Microchip Payments,Mobile Payments,Seed,—,"Bangalore, Karnataka, India",Microchip payments is a mobile-based payment a...,2018


In [122]:
combined21_18.sample(n=20)

Unnamed: 0,CompanyName,Sector,Stage,Fund_Amount,HeadQuarter,What_it_does,Funding_Year
916,Equiwatt,Tech company,,300000.0,"Newcastle Upon Tyne, Newcastle upon Tyne, Unit...",Equiwatt is a digital platform enabling househ...,2020
69,Enakshi,Fashion,Seed,"₹8,000,000","Ahmedabad, Gujarat, India",Enakshi is an online apparel brand for women.,2018
1430,SaScan,Healthcare,,,Bangalore,Cancer diagnostics startup,2020
1254,LetsTransport,Logitech,,1313500.0,Bangalore,Offers techno logistics solutions,2020
70,Ultraviolette Automotive,"Automotive, Electric Vehicle, Energy Storage",Series A,"₹60,000","Bangalore, Karnataka, India",Ultraviolette is a startup working on electric...,2018
1417,RaRa Delivery,Logistics,Seed Round,800000.0,,Provides same day delivery service for ecommer...,2020
935,Mobile Premier League,Entertainment,,74000000.0,Bangalore,Mobile Premier League(MPL) is a skill based E-...,2020
281,Inntot Technologies,Consumer Electronics,Seed,—,"Kochi, Kerala, India","Inntot Technologies, is a technology driven co...",2018
742,SignalX,AI startup,Seed,800000.0,Hyderabad,SignalX offers an artificial intelligence-powe...,2020
106,Vivriti Capital,Financial Services,Venture - Series Unknown,"$28,500,000","Chennai, Tamil Nadu, India",Vivriti Capital is an online platform for inst...,2018


## Checking for columns with empty or null values

In [123]:
combined21_18.isna().any()

CompanyName     False
Sector           True
Stage            True
Fund_Amount      True
HeadQuarter      True
What_it_does    False
Funding_Year    False
dtype: bool

In [124]:
combined21_18.isnull().any()

CompanyName     False
Sector           True
Stage            True
Fund_Amount      True
HeadQuarter      True
What_it_does    False
Funding_Year    False
dtype: bool

## Checking the number of missing values in each column

In [125]:
if combined21_18["Sector"].isna().any():
    print(combined21_18["Sector"].isna().sum())
else:
    print("There are no empty rows in the column.")

18


In [126]:
if combined21_18["Stage"].isna().any():
    print(combined21_18["Stage"].isna().sum())
else:
    print("There are no empty rows in the column.")

938


In [127]:
if combined21_18["Fund_Amount"].isna().any():
    print(combined21_18["Fund_Amount"].isna().sum())
else:
    print("There are no empty rows in the column.")


257


In [128]:
if combined21_18["HeadQuarter"].isna().any():
    print(combined21_18["HeadQuarter"].isna().sum())
else:
    print("There are no empty rows in the column.")

114
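The four per-column checks above can also be collapsed into a single call, since `isna().sum()` on a DataFrame returns one count per column. This is a small sketch on a made-up frame; the values are illustrative and not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame with a few gaps
demo = pd.DataFrame({
    "CompanyName": ["A", "B", "C"],
    "Sector": ["FinTech", np.nan, "EdTech"],
    "Stage": [np.nan, np.nan, "Seed"],
    "Fund_Amount": [1000000.0, 250000.0, np.nan],
})

# One count per column; keep only the columns that actually have gaps
missing_counts = demo.isna().sum()
print(missing_counts[missing_counts > 0])
```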


In [129]:
# Calculate the percentage of rows that contain at least one missing value
total_rows = len(combined21_18)
missing_rows = combined21_18.isnull().any(axis=1).sum()
missing_percentage = (missing_rows / total_rows) * 100

print(f"Percentage of rows with missing values: {missing_percentage:.2f}%")

Percentage of rows with missing values: 38.59%


### Percentage of rows with at least one missing value: 38.59%
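The 38.59% figure counts a row as missing if any of its fields is empty, so it is higher than the share of missing cells in any single column. The sketch below shows the three related measures side by side on a made-up frame; the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame: 4 rows, 2 columns, 3 missing cells
demo = pd.DataFrame({
    "Sector": ["FinTech", np.nan, "EdTech", "Gaming"],
    "Stage": [np.nan, np.nan, "Seed", "Seed"],
})

# Share of rows with at least one missing value (the measure used in the notebook)
row_pct = demo.isna().any(axis=1).mean() * 100

# Share of missing values within each column
column_pct = demo.isna().mean() * 100

# Share of missing cells across the whole frame
overall_pct = demo.isna().to_numpy().mean() * 100

print(f"rows: {row_pct:.1f}%  overall cells: {overall_pct:.1f}%")
print(column_pct)
```

Looking at the per-column shares helps decide which columns drive the row-level figure before choosing a fill strategy.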

## Since more than 10% of the rows have missing data, we fill the gaps with the mode or median of each column instead of dropping those rows

## From our findings above, the "Sector" column has 18 empty rows, so we fill it with the most common category

In [130]:
# First we take a look at a sample
combined21_18["Sector"].sample(n=15)

411                         Fashion, Jewelry
513                Beauty, Fashion, Wellness
796                              QSR startup
1251                         Yoga & wellness
38      E-Commerce, Fashion, Jewelry, Retail
1952                   Industrial Automation
1083                              E-commerce
1037                                  EdTech
908                                   Gaming
939                                  FinTech
1999                                 FinTech
99            Health Care, Hospital, Medical
1746                                 FinTech
2874                   Staffing & Recruiting
1831                      Financial Services
Name: Sector, dtype: object

In [131]:
# Replace NaN values, empty strings and the "—" placeholder in the "Sector" column with "Unknown"

combined21_18["Sector"] = combined21_18["Sector"].fillna("Unknown").replace({"": "Unknown", "—": "Unknown"})

In [132]:
# Split the strings into lists
combined21_18["Sector"] = combined21_18["Sector"].str.split(", ")

In [133]:
# Flatten the lists into a single list
split_sector = combined21_18["Sector"].explode()

In [134]:
# Determine the most common category
most_common_sector = split_sector.value_counts().idxmax()

In [135]:
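# Fill missing or empty rows with the most common category.
# Caution: the NaN values were already replaced with "Unknown" in In [131], and every value
# became a list in In [132], so these statements leave the column unchanged; rows with a
# missing sector keep the value ["Unknown"].
combined21_18["Sector"] = combined21_18["Sector"].fillna(most_common_sector)
combined21_18["Sector"] = combined21_18["Sector"].replace("", most_common_sector)
combined21_18["Sector"] = combined21_18["Sector"].apply(lambda x: most_common_sector if isinstance(x, str) and ',' not in x else x)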
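For comparison, here is a minimal sketch of one way a mode fill on a list-valued "Sector" column could be written. It is not the notebook's code: it assumes the placeholders are turned into NaN before splitting (rather than into "Unknown"), and the demo values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data with a NaN and a placeholder row
demo = pd.DataFrame({"Sector": ["FinTech, Lending", "FinTech", np.nan, "EdTech", "—"]})

# Treat empty strings and the "—" placeholder as missing, then split the rest into lists
sector = demo["Sector"].mask(demo["Sector"].isin(["", "—"]))
sector_lists = sector.str.split(", ")

# Most common single category across all rows; missing rows are ignored by value_counts
most_common_sector = sector_lists.explode().value_counts().idxmax()

# Rows that are still missing (not lists) receive a one-element list with the mode
demo["Sector"] = sector_lists.apply(
    lambda x: x if isinstance(x, list) else [most_common_sector]
)

print(most_common_sector)
print(demo["Sector"].tolist())
```

Whether to impute the mode or keep an explicit "Unknown" label is a judgment call: the mode inflates the largest sector, while "Unknown" keeps the gap visible in later charts.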

In [151]:
# Test to see if there are any missing values
combined21_18["Sector"].isna().any()

False

In [153]:
# Run samples to check the data
combined21_18["Sector"].sample(n=25)

965                                     [Entertainment]
1432                               [Health and Fitness]
2263                                        [Logistics]
2446                                      [Hospitality]
2579                 [Professional Training & Coaching]
1043                                             [Tech]
2421                                      [Health care]
1046                                      [Home Design]
863                                      [Tech Startup]
2789                                       [E-learning]
1511                                         [Foodtech]
1097                             [Linguistic Spiritual]
2699                                          [Finance]
1225                                          [Fintech]
528                                            [Edtech]
68      [Food and Beverage, Food Processing, Nutrition]
326                                           [Unknown]
247                                           [U

## From our findings earlier, the "Stage" column has 938 empty rows, so we fill it with "Unknown"

In [155]:
combined21_18["Stage"] = combined21_18["Stage"].fillna("Unknown").replace("", "Unknown")

In [156]:
combined21_18["Stage"].isna().any()

False

In [157]:
# Run samples to check the data
combined21_18["Stage"].sample(n=25)

1067        Series C
43              Seed
252             Seed
698             Seed
1997            Seed
2742    Pre-series A
1355         Unknown
1375         Unknown
1017        Series C
1123    Pre Series A
2584            Seed
2492            Seed
723          Unknown
2115         Unknown
1241         Unknown
69              Seed
2832            Seed
1185         Unknown
521         Series C
2761         Unknown
2636            Seed
2853        Series A
2782        Pre-seed
1527         Unknown
691             Seed
Name: Stage, dtype: object

## From our findings earlier, the "Fund_Amount" column has 257 empty rows, so we fill it with its median

#### But first we convert every amount to US dollars, our assumed currency, and remove the currency symbols. Rupee amounts are converted using the average INR to USD exchange rate for each funding year. **Source: [exchangerates.org.uk](https://www.exchangerates.org.uk/INR-USD-spot-exchange-rates-history.html)**

In [169]:
# Run samples to check the data
combined21_18["Fund_Amount"].sample(n=25)

91      ₹1,000,000,000
2714      $Undisclosed
1527               NaN
1033          650000.0
1046         9000000.0
378         $3,300,000
2628          $5000000
2465        $1,000,000
1342               NaN
287                  —
743           300000.0
1013        30000000.0
1734       Undisclosed
242       ₹103,000,000
101     $1,000,000,000
542       $540,000,000
597         $1,500,000
1266               NaN
2196          $5000000
794          1000000.0
1677       Undisclosed
1838         $28000000
2875         $20000000
1053          500000.0
31             1000000
Name: Fund_Amount, dtype: object

#### Replace Non-numeric Values with NaN

In [174]:
# Replace the placeholders 'Undisclosed', '$Undisclosed', '—' and '-' with NaN
# (any other non-numeric value is turned into NaN later by the conversion function)

combined21_18['Fund_Amount'] = combined21_18['Fund_Amount'].replace(['Undisclosed', '$Undisclosed', '—', '-'], np.nan)

#### Define Exchange Rates

In [176]:
# Exchange rates from INR to USD for each year
exchange_rates = {
    2018: 0.0146,
    2019: 0.0142,
    2020: 0.0135,
    2021: 0.0135
}


#### Define the Conversion Function

In [178]:
# Function to convert string amounts to float and handle currency symbols
def convert_currency(row):
    amount = row['Fund_Amount']
    year = row['Funding_Year']
    
    if pd.isna(amount):
        return np.nan
    amount = str(amount)
    
    if '₹' in amount:
        # Remove commas and ₹ symbol
        cleaned_amount = amount.replace('₹', '').replace(',', '')
        try:
            return float(cleaned_amount) * exchange_rates[year]
        except ValueError:
            return np.nan
    elif '$' in amount:
        # Remove commas and $ symbol
        cleaned_amount = amount.replace('$', '').replace(',', '')
        try:
            return float(cleaned_amount)
        except ValueError:
            return np.nan
    else:
        # Remove commas and convert to float if possible
        cleaned_amount = amount.replace(',', '')
        try:
            return float(cleaned_amount)
        except ValueError:
            return np.nan


#### Apply the conversion function to the column

In [186]:
combined21_18["Fund_Amount"] = combined21_18.apply(convert_currency, axis=1)
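The row-by-row `apply` above is easy to follow. As an alternative, the same idea can be sketched with vectorized string operations, which also makes it simple to check a few known cases. The demo rows and the `Fund_Amount_USD` column name are assumptions for illustration; the rates are the ones defined in the notebook, and amounts without a symbol are assumed to be in USD, as in the original function.

```python
import numpy as np
import pandas as pd

# Average INR -> USD rates per year, copied from the notebook
exchange_rates = {2018: 0.0146, 2019: 0.0142, 2020: 0.0135, 2021: 0.0135}

# Made-up rows covering the formats seen in the samples
demo = pd.DataFrame({
    "Fund_Amount": ["₹40,000,000", "$1,500,000", "250000", "Undisclosed", np.nan, 300000.0],
    "Funding_Year": [2018, 2021, 2018, 2021, 2020, 2020],
})

raw = demo["Fund_Amount"].astype(str)

# Rows quoted in rupees need the yearly rate; everything else is assumed to be USD already
is_inr = raw.str.contains("₹", regex=False, na=False)

# Strip currency symbols and thousands separators, then coerce non-numeric text to NaN
numeric = pd.to_numeric(raw.str.replace(r"[₹$,]", "", regex=True), errors="coerce")

# Rate is the yearly INR rate for rupee rows and 1.0 for all other rows
rate = demo["Funding_Year"].map(exchange_rates).where(is_inr, 1.0)
demo["Fund_Amount_USD"] = numeric * rate

print(demo)
```

For example, ₹40,000,000 raised in 2018 becomes 40,000,000 × 0.0146, roughly 584,000 USD.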

In [189]:
combined21_18.sample(n=30)

Unnamed: 0,CompanyName,Sector,Stage,Fund_Amount,HeadQuarter,What_it_does,Funding_Year
1671,upGrad,[EdTech],Unknown,120000000.0,Mumbai,UpGrad is an online higher education platform.,2021
1383,DailyHunt,[Media],Unknown,24000000.0,,News aggregator,2020
2490,LegalPay,[FinTech],Seed,,New Delhi,A trusted financial partner for advocates and ...,2021
234,HealthPlix,"[Fitness, Health Care, Wellness]",Series A,3000000.0,"Bangalore, Karnataka, India",HealthPlix is a healthtech startup,2018
2616,Smartstaff,[Recruitment],Unknown,4300000.0,Bangalore,Smartstaff (previously Qikwork) is a full stac...,2021
1365,AsknBid,[Investment Tech],Unknown,,Bangalore,It builds algorithmic investing-based tech pro...,2020
897,Genrobotics,[AI startup],Unknown,300000.0,"Trivandrum, Kerala, India",GenRobotic specializes in powered robotic exos...,2020
1187,Crio,[Edtech],Unknown,934000.0,Bangalore,A learning platform for developers,2020
949,Dream11,[Entertainment],Unknown,225000000.0,Mumbai,Dream11 is India’s Biggest Sports Game with 30...,2020
1621,Capital Quotient,[Fintech],Unknown,600000.0,Bangalore,Investment advisor,2020


#### Now we fill all NaN with the median 

In [192]:
# Calculate the median of the 'Fund_Amount' column
median_value = combined21_18["Fund_Amount"].median()

In [193]:
median_value

3000000.0

In [195]:
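# Fill NaN values in the "Fund_Amount" column with the median
# (assigning the result back avoids the chained in-place fill, which newer pandas versions warn about)
combined21_18["Fund_Amount"] = combined21_18["Fund_Amount"].fillna(median_value)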

In [200]:
combined21_18["Fund_Amount"].isna().any()

False
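After the fill, imputed amounts are indistinguishable from deals that really raised the median amount. One optional safeguard, sketched here on made-up values, is to record which rows were imputed before filling. The `Fund_Amount_imputed` column name is an assumption and is not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical demo amounts with two gaps
demo = pd.DataFrame({"Fund_Amount": [1_000_000.0, np.nan, 3_000_000.0, 11_000_000.0, np.nan]})

# Remember which rows had no disclosed amount before filling
demo["Fund_Amount_imputed"] = demo["Fund_Amount"].isna()

# The median ignores NaN by default, so it is based on disclosed amounts only
median_value = demo["Fund_Amount"].median()
demo["Fund_Amount"] = demo["Fund_Amount"].fillna(median_value)

print(median_value)
print(demo)
```

The flag lets later averages or charts be computed with and without the imputed rows to see how much the fill affects them.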

In [201]:
combined21_18.sample(n=25)

Unnamed: 0,CompanyName,Sector,Stage,Fund_Amount,HeadQuarter,What_it_does,Funding_Year
827,Jumbotail,[Retail],Series B2,11000000.0,Bangalore,Jumbotail is solving an important problem of o...,2020
123,FeedMyPockets,"[Advertising, Human Resources, Marketing]",Seed,642400.0,"Bangalore City, Karnataka, India",On Demand Staffing Platform,2018
715,Credgenics,[FinTech],Pre-series A,3000000.0,New Delhi,Credgenics is a tech-enabled platform backed b...,2020
2325,saarthi.ai,[AI startup],Seed,3000000.0,Bangalore,Multilingual Conversational Enterprise AI Plat...,2021
985,Hire Me Car,[TravelTech],Seed,3000000.0,Noida,HireMeCar.com offers customers the fastest and...,2020
1271,YoloBus,[Mobility/Transport],Series A,3300000.0,Gurugram,Intercity bus service startup,2020
587,Stanza Living,[Accomodation],Unknown,5700000.0,Delhi,Provides comfortable and secure accomodation f...,2019
993,StayQrious,[EdTech],Seed,2000000.0,Bangalore,Live online coding courses with social learnin...,2020
558,Pumpkart,[E-marketplace],Unknown,3000000.0,Chandigarh,B2B model for appliances and electrical products,2019
2203,Slang Labs,[Computer software],Unknown,3000000.0,Bangalore,Slang Labs provides accurate and multilingual ...,2021
