## PROJECT NAME: INDIAN START-UP FUNDING DATA ANALYSIS

## PROJECT DESCRIPTION/SCENARIO
This project seeks to gain insight into the funding received by start-up companies in India between 2018 and 2021, and to advise a team looking to venture into the Indian start-up ecosystem by proposing the best course of action. This will be done by developing a story from the dataset: stating and testing a hypothesis, asking questions, performing analysis and sharing insights with relevant visualisations.

## HYPOTHESIS
Null Hypothesis: There is no significant difference in the funding patterns of Indian startups in the last few years.

Alternate Hypothesis: There is a significant difference in the funding patterns of Indian startups in the last few years.
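
One way the hypotheses above could be tested is a permutation test on the funding amounts of two years. The sketch below is only an illustration: reading "funding patterns" as "median funding amount" is an assumption, the function name `permutation_test_median` is hypothetical, and the two lists are toy numbers rather than values computed from the dataset.

```python
import random
import statistics


def permutation_test_median(sample_a, sample_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in medians.

    Returns an approximate p-value: the share of random relabellings whose
    absolute median difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.median(sample_a) - statistics.median(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(
            statistics.median(pooled[:n_a]) - statistics.median(pooled[n_a:])
        )
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Toy amounts in USD (illustrative only, not results from the funding files).
amounts_year_a = [250_000, 1_600_000, 2_000_000, 500_000, 750_000]
amounts_year_b = [6_300_000, 30_000_000, 28_000_000, 6_000_000, 150_000_000]

p_value = permutation_test_median(amounts_year_a, amounts_year_b)
print(f"approximate p-value: {p_value:.4f}")
```

A small p-value would count as evidence against the null hypothesis for that pair of years. With the real data the two lists would be the cleaned, dollar-denominated amounts of the years being compared.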


## BUSINESS QUESTIONS
1. What Industry/sector received the most funding from investors?
2. What are the top five (5) cities with the most start-ups?
3. What are the top ten (10) start-ups and their major investors?
4. Who are the leading or top investors in the Indian start-up ecosystem?
5. What is the trend of funding in the Indian start-up ecosystem?
6. How has the COVID-19 pandemic affected funding in the Indian start-up ecosystem?
7. What is the percentage of startups that receive multiple rounds of funding?
8. What is the average amount a start-up is likely to receive as a seed fund?
9. Is investor confidence rising or dwindling in the Indian start-up ecosystem?

In [1]:
# Import libraries

import pandas as pd
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
import seaborn as sns

%matplotlib inline

In [2]:
# Import funding datasets

funding_2018 = pd.read_csv("startup_funding2018.csv")
funding_2019 = pd.read_csv("startup_funding2019.csv")
funding_2020 = pd.read_csv("startup_funding2020.csv")
funding_2021 = pd.read_csv("startup_funding2021.csv")

### EXPLORING THE 2018 FUNDING DATASET

In [114]:
# Previewing the 2018 funding data

funding_2018

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...
5,Hasura,"Cloud Infrastructure, PaaS, SaaS",Seed,1600000,"Bengaluru, Karnataka, India",Hasura is a platform that allows developers to...
6,Tripshelf,"Internet, Leisure, Marketplace",Seed,"₹16,000,000","Kalkaji, Delhi, India",Tripshelf is an online market place for holida...
7,Hyperdata.IO,Market Research,Angel,"₹50,000,000","Hyderabad, Andhra Pradesh, India",Hyperdata combines advanced machine learning w...
8,Freightwalla,"Information Services, Information Technology",Seed,—,"Mumbai, Maharashtra, India",Freightwalla is an international forwarder tha...
9,Microchip Payments,Mobile Payments,Seed,—,"Bangalore, Karnataka, India",Microchip payments is a mobile-based payment a...


In [6]:
# Checking the shape of the dataset
funding_2018.shape

(526, 6)

In [8]:
# Checking the info of the 2018 dataset
funding_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
dtypes: object(6)
memory usage: 24.8+ KB


In [115]:
# Checking the statistical info of the 2018 dataset
funding_2018.describe(include="all").transpose()

Unnamed: 0,count,unique,top,freq
Company Name,526,525,TheCollegeFever,2
Industry,526,405,—,30
Round/Series,526,21,Seed,280
Amount,526,198,—,148
Location,526,50,"Bangalore, Karnataka, India",102
About Company,526,524,"TheCollegeFever is a hub for fun, fiesta and f...",2


In [13]:
# Checking for unavailable values in the dataset
funding_2018.isna().any()

Company Name     False
Industry         False
Round/Series     False
Amount           False
Location         False
About Company    False
dtype: bool

In [9]:
# isnull() is an alias of isna(), so this repeats the check above
funding_2018.isnull().any()

Company Name     False
Industry         False
Round/Series     False
Amount           False
Location         False
About Company    False
dtype: bool
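
Both checks above report no missing values, yet the `describe` output lists "—" as the most frequent entry in `Industry` and `Amount`. A minimal sketch of how such placeholder strings could be counted is given here. The `sample` frame is a small stand-in for the real file, and the `PLACEHOLDERS` list is an assumption about which strings mean "no value".

```python
import pandas as pd

# Stand-in rows that mimic the 2018 layout; not the real file.
sample = pd.DataFrame(
    {
        "Company Name": ["TheCollegeFever", "Eunimart", "Freightwalla"],
        "Industry": ["Brand Marketing", "—", "Information Services"],
        "Amount": ["250000", "—", "—"],
    }
)

# Strings assumed to stand for "no value" in the raw data.
PLACEHOLDERS = ["—", "Undisclosed", ""]

# isna() only sees real NaN/None, so ordinary strings are not flagged.
print(sample.isna().sum().sum())

# isin() flags the placeholder strings; sum() counts them per column.
placeholder_counts = sample.isin(PLACEHOLDERS).sum()
print(placeholder_counts)
```

Running the same `isin` check on `funding_2018` would show how many entries per column are placeholders rather than usable values.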

In [28]:
# Checking for duplicates in the 2018 funding dataset
funding_2018.duplicated().sum()

1

In [69]:
# Displaying the duplicated entry
pd.set_option('display.max_rows', None)
funding_2018[funding_2018.duplicated()]

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
348,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."


In [71]:
# More info on duplicated entry
funding_2018[funding_2018["Company Name"]=="TheCollegeFever"]

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
348,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."


In [14]:
# Checking the Amount column
funding_2018["Amount"]

0           250000
1      ₹40,000,000
2      ₹65,000,000
3          2000000
4                —
          ...     
521      225000000
522              —
523           7500
524    ₹35,000,000
525       35000000
Name: Amount, Length: 526, dtype: object

In [29]:
funding_2018["Company Name"].value_counts()

TheCollegeFever                      2
NIRAMAI Health Analytix              1
Drivezy                              1
Hush - Speak Up. Make Work Better    1
The Souled Store                     1
                                    ..
Qandle                               1
iChamp                               1
Credy                                1
Survaider                            1
Netmeds                              1
Name: Company Name, Length: 525, dtype: int64

In [75]:
# Further checking of selected columns

pd.set_option('display.max_rows', None)
funding_2018[["Company Name", "Industry", "Round/Series", "Location"]]


Unnamed: 0,Company Name,Industry,Round/Series,Location
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,"Bangalore, Karnataka, India"
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"Mumbai, Maharashtra, India"
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"Gurgaon, Haryana, India"
3,PayMe India,"Financial Services, FinTech",Angel,"Noida, Uttar Pradesh, India"
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,"Hyderabad, Andhra Pradesh, India"
5,Hasura,"Cloud Infrastructure, PaaS, SaaS",Seed,"Bengaluru, Karnataka, India"
6,Tripshelf,"Internet, Leisure, Marketplace",Seed,"Kalkaji, Delhi, India"
7,Hyperdata.IO,Market Research,Angel,"Hyderabad, Andhra Pradesh, India"
8,Freightwalla,"Information Services, Information Technology",Seed,"Mumbai, Maharashtra, India"
9,Microchip Payments,Mobile Payments,Seed,"Bangalore, Karnataka, India"


#### UNIVARIATE ANALYSIS OF 2018 FUNDING DATA

In [117]:
# sns.boxplot(x='Amount', data=funding_2018, color='lightblue')

# # Set the title and axis labels
# plt.title('Distribution of Funding Amounts in 2018')
# plt.xlabel('Amount')
# plt.ylabel('')

# # Display the boxplot
# plt.show()

In [116]:
# plt.boxplot(funding_2018['Amount'],
#             vert=False,  # Plot the boxplot horizontally
#             widths=0.5,  # Set the width of the boxplot
#             notch=True,  # Add a notch to the boxplot
#             patch_artist=True,  # Fill the box with color
#             boxprops=dict(facecolor='lightblue', linewidth=1.5),
#             medianprops=dict(color='red', linewidth=2),
#             whiskerprops=dict(color='black', linewidth=1.5),
#             capprops=dict(color='black', linewidth=1.5),
#             flierprops=dict(marker='o', markersize=5, markerfacecolor='black'))

### EXPLORING THE 2019 FUNDING DATASET


In [9]:
funding_2019.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",


In [10]:
funding_2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
dtypes: float64(1), object(8)
memory usage: 6.4+ KB


In [77]:
# Checking the summary description of the 2019 funding data
funding_2019.describe(include="all").transpose()

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Company/Brand,89.0,87.0,Kratikal,2.0,,,,,,,
Founded,60.0,,,,2014.533333,2.937003,2004.0,2013.0,2015.0,2016.25,2019.0
HeadQuarter,70.0,17.0,Bangalore,21.0,,,,,,,
Sector,84.0,52.0,Edtech,7.0,,,,,,,
What it does,89.0,88.0,Online meat shop,2.0,,,,,,,
Founders,86.0,85.0,"Vivek Gupta, Abhay Hanjura",2.0,,,,,,,
Investor,89.0,86.0,Undisclosed,3.0,,,,,,,
Amount($),89.0,50.0,Undisclosed,12.0,,,,,,,
Stage,43.0,15.0,Series A,10.0,,,,,,,


In [13]:
funding_2019.isna().any()

Company/Brand    False
Founded           True
HeadQuarter       True
Sector            True
What it does     False
Founders          True
Investor         False
Amount($)        False
Stage             True
dtype: bool

In [14]:
funding_2019.isna().sum()

Company/Brand     0
Founded          29
HeadQuarter      19
Sector            5
What it does      0
Founders          3
Investor          0
Amount($)         0
Stage            46
dtype: int64

In [15]:
funding_2019.duplicated().any()

False

In [27]:
pd.set_option('display.max_rows', None)
funding_2019[["Company/Brand", "Sector", "Investor", "Stage"]]

Unnamed: 0,Company/Brand,Sector,Investor,Stage
0,Bombay Shaving,Ecommerce,Sixth Sense Ventures,
1,Ruangguru,Edtech,General Atlantic,Series C
2,Eduisfun,Edtech,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey",Fresh funding
3,HomeLane,Interior design,"Evolvence India Fund (EIF), Pidilite Group, FJ...",Series D
4,Nu Genes,AgriTech,Innovation in Food and Agriculture (IFA),
5,FlytBase,Technology,Undisclosed,
6,Finly,SaaS,"Social Capital, AngelList India, Gemba Capital...",
7,Kratikal,Technology,"Gilda VC, Art Venture, Rajeev Chitrabhanu.",Pre series A
8,Quantiphi,AI & Tech,Multiples Alternate Asset Management,Series A
9,Lenskart,E-commerce,SoftBank,Series G


### EXPLORING 2020 FUNDING DATA

In [29]:
funding_2020.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Unnamed: 9
0,Aqgromalin,2019,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,"$200,000",,
1,Krayonnz,2019,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,"$100,000",Pre-seed,
2,PadCare Labs,2018,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,Undisclosed,Pre-seed,
3,NCOME,2020,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital","$400,000",,
4,Gramophone,2016,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge","$340,000",,


In [30]:
funding_2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company/Brand  1055 non-null   object
 1   Founded        843 non-null    object
 2   HeadQuarter    961 non-null    object
 3   Sector         1042 non-null   object
 4   What it does   1055 non-null   object
 5   Founders       1043 non-null   object
 6   Investor       1017 non-null   object
 7   Amount($)      1052 non-null   object
 8   Stage          591 non-null    object
 9   Unnamed: 9     2 non-null      object
dtypes: object(10)
memory usage: 82.5+ KB


In [31]:
pd.set_option('display.max_rows', None)
funding_2020["Unnamed: 9"]

0              NaN
1              NaN
2              NaN
3              NaN
4              NaN
5              NaN
6              NaN
7              NaN
8              NaN
9              NaN
      ...

In [32]:
pd.set_option('display.max_rows', None)
funding_2020[["Company/Brand", "Sector", "Investor", "Stage", "HeadQuarter"]]

Unnamed: 0,Company/Brand,Sector,Investor,Stage,HeadQuarter
0,Aqgromalin,AgriTech,Angel investors,,Chennai
1,Krayonnz,EdTech,GSF Accelerator,Pre-seed,Bangalore
2,PadCare Labs,Hygiene management,Venture Center,Pre-seed,Pune
3,NCOME,Escrow,"Venture Catalysts, PointOne Capital",,New Delhi
4,Gramophone,AgriTech,"Siana Capital Management, Info Edge",,Indore
5,qZense,AgriTech,"Venture Catalysts, 9Unicorns Accelerator Fund",Seed,Bangalore
6,MyClassboard,EdTech,ICICI Bank.,Pre-series A,Hyderabad
7,Metvy,Networking platform,HostelFund,Pre-series,Gurgaon
8,Rupeek,FinTech,"KB Investment, Bertelsmann India Investments",Series C,Bangalore
9,Gig India,Crowdsourcing,"Shantanu Deshpande, Subramaniam Ramadorai",Pre-series A,Pune


In [34]:
funding_2020.describe().transpose()

Unnamed: 0,count,unique,top,freq
Company/Brand,1055,905,Nykaa,6
Founded,843,27,2015,136
HeadQuarter,961,77,Bangalore,317
Sector,1042,302,Fintech,80
What it does,1055,990,Provides online learning classes,4
Founders,1043,927,Falguni Nayar,6
Investor,1017,848,Venture Catalysts,20
Amount($),1052,309,Undisclosed,243
Stage,591,42,Series A,96
Unnamed: 9,2,2,Pre-Seed,1


In [44]:
# List of duplicated start-ups

pd.set_option('display.max_rows', None)
funding_2020[funding_2020.duplicated()]

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Unnamed: 9
145,Krimanshi,2015,Jodhpur,Biotechnology company,Krimanshi aims to increase rural income by imp...,Nikhil Bohra,"Rajasthan Venture Capital Fund, AIM Smart City","$600,000",Seed,
205,Nykaa,2012,Mumbai,Cosmetics,Nykaa is an online marketplace for different b...,Falguni Nayar,"Alia Bhatt, Katrina Kaif",Undisclosed,,
362,Byju’s,2011,Bangalore,EdTech,An Indian educational technology and online tu...,Byju Raveendran,"Owl Ventures, Tiger Global Management","$500,000,000",,


In [37]:
funding_2020.duplicated().sum()

3

In [65]:
# Highlighting duplicated value(Krimanshi)
funding_2020[funding_2020["Company/Brand"]=="Krimanshi"]

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Unnamed: 9
129,Krimanshi,2015.0,Jodhpur,Biotechnology company,Krimanshi aims to increase rural income by imp...,Nikhil Bohra,"Rajasthan Venture Capital Fund, AIM Smart City","$600,000",Seed,
145,Krimanshi,2015.0,Jodhpur,Biotechnology company,Krimanshi aims to increase rural income by imp...,Nikhil Bohra,"Rajasthan Venture Capital Fund, AIM Smart City","$600,000",Seed,
941,Krimanshi,,Jodhpur,Agritech,Sustainable system to feed animals by valorizi...,Nikhil Bohra,Arunachal Pradesh Social Entrepreneurship Meet,"$20,000",,


In [66]:
# Highlighting duplicated value(Nykaa)
funding_2020[funding_2020["Company/Brand"]=="Nykaa"]

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Unnamed: 9
120,Nykaa,2012,Mumbai,Cosmetics,Nykaa is an online marketplace for different b...,Falguni Nayar,"Alia Bhatt, Katrina Kaif",Undisclosed,,
205,Nykaa,2012,Mumbai,Cosmetics,Nykaa is an online marketplace for different b...,Falguni Nayar,"Alia Bhatt, Katrina Kaif",Undisclosed,,
213,Nykaa,2012,Mumbai,E-commerce,Nykaa is an online marketplace for different b...,Falguni Nayar,"Katrina Kaif, Steadview Capital",Undisclosed,,
340,Nykaa,2012,Mumbai,Fashion,Cosmetics & beauty products online,Falguni Nayar,Steadview Capital,"$24,700,000",,
712,Nykaa,2012,Mumbai,Ecommerce,Deals in cosmetic and wellness products,Falguni Nayar,Steadview capital,"$8,800,000",,
813,Nykaa,2012,Mumbai,Ecommerce,Deals in cosmetic and wellness products,Falguni Nayar,Steadview capital,"$13,137,000",,


In [67]:
# Highlighting duplicated value(Byju’s)
funding_2020[funding_2020["Company/Brand"]=="Byju’s"]

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Unnamed: 9
326,Byju’s,2011,Bangalore,EdTech,An Indian educational technology and online tu...,Byju Raveendran,"Owl Ventures, Tiger Global Management","$500,000,000",,
362,Byju’s,2011,Bangalore,EdTech,An Indian educational technology and online tu...,Byju Raveendran,"Owl Ventures, Tiger Global Management","$500,000,000",,


### EXPLORING 2021 FUNDING DATASET


In [79]:
# Checking the head of the data
funding_2021.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


In [80]:
# Checking the tail of the data
funding_2021.tail()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
1204,Gigforce,2019.0,Gurugram,Staffing & Recruiting,A gig/on-demand staffing company.,"Chirag Mittal, Anirudh Syal",Endiya Partners,$3000000,Pre-series A
1205,Vahdam,2015.0,New Delhi,Food & Beverages,VAHDAM is among the world’s first vertically i...,Bala Sarda,IIFL AMC,$20000000,Series D
1206,Leap Finance,2019.0,Bangalore,Financial Services,International education loans for high potenti...,"Arnav Kumar, Vaibhav Singh",Owl Ventures,$55000000,Series C
1207,CollegeDekho,2015.0,Gurugram,EdTech,"Collegedekho.com is Student’s Partner, Friend ...",Ruchir Arora,"Winter Capital, ETS, Man Capital",$26000000,Series B
1208,WeRize,2019.0,Bangalore,Financial Services,India’s first socially distributed full stack ...,"Vishal Chopra, Himanshu Gupta","3one4 Capital, Kalaari Capital",$8000000,Series A


In [81]:
# Checking the basic info of the data
funding_2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What it does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount($)      1206 non-null   object 
 8   Stage          781 non-null    object 
dtypes: float64(1), object(8)
memory usage: 85.1+ KB


In [82]:
# Summary description of the data
funding_2021.describe(include="all").transpose()

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Company/Brand,1209.0,1033.0,BharatPe,8.0,,,,,,,
Founded,1208.0,,,,2016.655629,4.517364,1963.0,2015.0,2018.0,2020.0,2021.0
HeadQuarter,1208.0,70.0,Bangalore,426.0,,,,,,,
Sector,1209.0,254.0,FinTech,122.0,,,,,,,
What it does,1209.0,1143.0,BharatPe develops a QR code-based payment app ...,4.0,,,,,,,
Founders,1205.0,1095.0,"Ashneer Grover, Shashvat Nakrani",7.0,,,,,,,
Investor,1147.0,937.0,Inflection Point Ventures,24.0,,,,,,,
Amount($),1206.0,278.0,$Undisclosed,73.0,,,,,,,
Stage,781.0,31.0,Seed,246.0,,,,,,,


In [84]:
# Checking for missing values
funding_2021.isna().any()

Company/Brand    False
Founded           True
HeadQuarter       True
Sector           False
What it does     False
Founders          True
Investor          True
Amount($)         True
Stage             True
dtype: bool

In [85]:
# Counting the missing values in each column
funding_2021.isna().sum()

Company/Brand      0
Founded            1
HeadQuarter        1
Sector             0
What it does       0
Founders           4
Investor          62
Amount($)          3
Stage            428
dtype: int64

In [87]:
# Checking for duplicated values
funding_2021.duplicated().any()

True

In [88]:
# dataframe showing duplicated values
funding_2021[funding_2021.duplicated()]

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
107,Curefoods,2020.0,Bangalore,Food & Beverages,Healthy & nutritious foods and cold pressed ju...,Ankit Nagori,"Iron Pillar, Nordstar, Binny Bansal",$13000000,
109,Bewakoof,2012.0,Mumbai,Apparel & Fashion,Bewakoof is a lifestyle fashion brand that mak...,Prabhkiran Singh,InvestCorp,$8000000,
111,FanPlay,2020.0,Computer Games,Computer Games,A real money game app specializing in trivia g...,YC W21,"Pritesh Kumar, Bharat Gupta",Upsparks,$1200000
117,Advantage Club,2014.0,Mumbai,HRTech,Advantage Club is India's largest employee eng...,"Sourabh Deorah, Smiti Bhatt Deorah","Y Combinator, Broom Ventures, Kunal Shah",$1700000,
119,Ruptok,2020.0,New Delhi,FinTech,Ruptok fintech Pvt. Ltd. is an online gold loa...,Ankur Gupta,Eclear Leasing,$1000000,
243,Trinkerr,2021.0,Bangalore,Capital Markets,Trinkerr is India's first social trading platf...,"Manvendra Singh, Gaurav Agarwal",Accel India,$6600000,Series A
244,Zorro,2021.0,Gurugram,Social network,Pseudonymous social network platform,"Jasveer Singh, Abhishek Asthana, Deepak Kumar","Vijay Shekhar Sharma, Ritesh Agarwal, Ankiti Bose",$32000000,Seed
245,Ultraviolette,2021.0,Bangalore,Automotive,Create and Inspire the future of sustainable u...,"Subramaniam Narayan, Niraj Rajmohan","TVS Motor, Zoho",$150000000,Series C
246,NephroPlus,2009.0,Hyderabad,Hospital & Health Care,A vision and passion of redefining healthcare ...,Vikram Vuppala,IIFL Asset Management,$24000000,Series E
247,Unremot,2020.0,Bangalore,Information Technology & Services,Unremot is a personal office for consultants!,Shiju Radhakrishnan,Inflection Point Ventures,$700000,Seed


## ISSUES WITH THE DATA
#### FUNDING YEAR 2018
1. The Amount column has dtype object. It should be a float.
2. The Amount column mixes currencies (rupee-prefixed and unprefixed values), and some values contain commas. All amounts should be expressed in dollars.
3. Missing values are recorded as "—" rather than NaN, so `isna()` does not detect them (148 entries in Amount and 30 in Industry).
4. Some company names are mixed with website addresses.
5. There is a Google document link in the Round/Series column.
6. The About Company column has no bearing on the analysis and should be removed.
7. There is one duplicated row in the dataset (Company Name = TheCollegeFever).
8. The Location column is not consistent with the other datasets: it contains city, state and country.
9. Some locations are given as "India, Asia". These will be replaced with the most frequently occurring city.
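
Issues 1 to 3 for 2018 could be handled by a small parsing helper along the following lines. This is a sketch under stated assumptions: the function name `parse_amount_2018` is hypothetical, `INR_PER_USD` is a placeholder rate rather than an authoritative 2018 figure, and unprefixed values are assumed to already be in dollars.

```python
import math
import re

# Placeholder exchange rate (rupees per dollar). An assumption for
# illustration only; substitute a sourced rate before real use.
INR_PER_USD = 70.0


def parse_amount_2018(raw, inr_per_usd=INR_PER_USD):
    """Convert a raw 2018 Amount string to a float in US dollars.

    - "—" and other values without digits become NaN
    - values starting with "₹" are treated as rupees and converted
    - all other values are assumed to be dollar amounts already
    """
    text = str(raw).strip()
    is_rupees = text.startswith("₹")
    digits = re.sub(r"[^\d.]", "", text)  # drop symbols, commas, spaces
    try:
        value = float(digits)
    except ValueError:
        return math.nan
    return value / inr_per_usd if is_rupees else value


for raw in ["250000", "₹40,000,000", "—", "$1,600,000"]:
    print(raw, "->", parse_amount_2018(raw))
```

On the real frame the helper could be applied with `funding_2018["Amount"].map(parse_amount_2018)`, which also turns the column into floats.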

#### FUNDING YEAR 2019
1. There are many NaN values in the Stage (46), Founded (29) and HeadQuarter (19) columns, out of 89 rows.
2. Currency signs and commas are attached to the amounts.
3. The "What it does" column has no bearing on the analysis.
4. The Founded column is stored as float.
5. The Amount($) column is stored as object.
6. The Founders column is not important to the analysis.
7. The Company/Brand column must be renamed to Company.
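
The 2019 issues above map onto a few pandas calls. The sketch below uses a three-row stand-in frame (the third row, `Example Co`, is made up to show an "Undisclosed" amount), and storing `Founded` as a nullable integer is one possible choice, not something the analysis prescribes.

```python
import pandas as pd

# Stand-in rows with some of the 2019 column names; not the real file.
sample_2019 = pd.DataFrame(
    {
        "Company/Brand": ["Bombay Shaving", "Ruangguru", "Example Co"],
        "Founded": [float("nan"), 2014.0, 2016.0],
        "Sector": ["Ecommerce", "Edtech", "Technology"],
        "What it does": ["Grooming products", "Learning platform", "n/a"],
        "Founders": ["Shantanu Deshpande", "Adamas Belva Syah Devara", "n/a"],
        "Amount($)": ["$6,300,000", "$150,000,000", "Undisclosed"],
    }
)

cleaned = (
    sample_2019
    .rename(columns={"Company/Brand": "Company"})
    .drop(columns=["What it does", "Founders"])
)

# Strip "$" and "," then convert; non-numeric text such as
# "Undisclosed" becomes NaN because of errors="coerce".
cleaned["Amount($)"] = pd.to_numeric(
    cleaned["Amount($)"].str.replace(r"[$,]", "", regex=True),
    errors="coerce",
)

# Nullable integer dtype keeps missing years as <NA> instead of floats.
cleaned["Founded"] = cleaned["Founded"].astype("Int64")

print(cleaned)
print(cleaned.dtypes)
```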

#### FUNDING YEAR 2020
1. The Company/Brand column must be renamed to Company.
2. The "Unnamed: 9" column has no bearing on the analysis.
3. Currency signs and commas are attached to the amounts.
4. There is a spelling error (>Vikram Sud, row 192) in the Investor column.
5. Some HeadQuarter cities are outside India.
6. There are three duplicated entries (Byju’s, Nykaa, Krimanshi).
7. Some columns contain the names of both cities and states.
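
For the stray column and the duplicated rows in the 2020 data, a minimal sketch could look like this. The frame is a stand-in built from a few of the rows shown earlier, reduced to four columns.

```python
import pandas as pd

# Stand-in rows modelled on the 2020 output above; not the real file.
sample_2020 = pd.DataFrame(
    {
        "Company/Brand": ["Byju’s", "Byju’s", "Krimanshi", "Krimanshi"],
        "Sector": ["EdTech", "EdTech", "Biotechnology company", "Agritech"],
        "Amount($)": ["$500,000,000", "$500,000,000", "$600,000", "$20,000"],
        "Unnamed: 9": [None, None, None, None],
    }
)

deduped = (
    sample_2020
    .drop(columns=["Unnamed: 9"])
    # Only rows identical in every column are removed; the first copy stays.
    .drop_duplicates(keep="first")
    .reset_index(drop=True)
)

print(deduped)
```

Because `drop_duplicates` compares whole rows, a company that raised several distinct rounds (as Krimanshi and Nykaa appear to have done) keeps one row per round. The two rows with a value in `Unnamed: 9` may deserve a look before that column is dropped.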

#### FUNDING YEAR 2021
1. The Company/Brand column must be renamed to Company.
2. Currency signs and commas are attached to the amounts.
3. The HeadQuarter column must be renamed to Location.
4. There are many NaN values, mostly in the Stage (428) and Investor (62) columns.
5. There are duplicated rows (Curefoods, Bewakoof, FanPlay, Advantage Club, Ruptok, Trinkerr, Zorro, Ultraviolette, NephroPlus, Unremot, Fansanywhere, Pingolearn, Spy, Enmovil, ASQI Advisers, Insurance Samadhan, Evenflow Brands, MasterChow, Fullife healthcare).
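
The missing values noted for 2021 could be filled roughly as follows, using "Undisclosed" for categorical columns and the mode for numeric ones, as the resolution steps in this notebook propose. The frame and its company names `A` to `D` are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-in rows with gaps in three columns.
sample_2021 = pd.DataFrame(
    {
        "Company": ["A", "B", "C", "D"],
        "Investor": ["Owl Ventures", None, "IIFL AMC", None],
        "Stage": ["Seed", None, "Seed", "Series A"],
        "Founded": [2019.0, 2015.0, None, 2019.0],
    }
)

filled = sample_2021.copy()

# Categorical gaps get an explicit label instead of NaN.
categorical_cols = ["Investor", "Stage"]
filled[categorical_cols] = filled[categorical_cols].fillna("Undisclosed")

# mode() ignores NaN by default and can return several values on a tie;
# taking the first one is an arbitrary but deterministic choice.
founded_mode = filled["Founded"].mode().iloc[0]
filled["Founded"] = filled["Founded"].fillna(founded_mode)

print(filled)
```

Mode filling keeps every row but pulls the distribution towards its most common value, which is worth bearing in mind when interpreting averages afterwards.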

### RESOLVING ISSUES RAISED FROM THE DATASET
1. Write Python functions and pandas code to remove currency symbols, empty spaces and commas from the numeric values.
2. Write Python functions and code to convert numeric values stored as objects (strings) to floats or ints.
3. Replace all categorical null and NaN values with "Unknown" or "Undisclosed".
4. Remove columns that have no bearing on the analysis.
5. Provide a common name for values with the same meaning in the Stage column.
6. Rename the HeadQuarter column to Location.
7. Remove headquarters cities that are outside India.
8. Use the mode to fill missing or NaN numeric values. In some columns these make up almost half of the data, so the rows cannot be dropped.
9. Replace locations that contain city, state and country names with the city name only.
10. Change the location in row 78 to Chennai, row 161 to Jaipur, row 184 to Dhingsara, row 282 to Trivanduram, row 284 to Samastipor and row 288 to Tomkor.
11. Remove duplicated rows.
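
Steps 5 and 9 lend themselves to two small helpers. The names `city_from_location`, `normalise_stage` and `STAGE_ALIASES` are hypothetical, and the alias table only covers spellings visible in the outputs above. A full mapping would have to be built from the unique values of the Stage column.

```python
def city_from_location(location):
    """Keep only the first comma-separated part of a location string."""
    return str(location).split(",")[0].strip()


# Partial, illustrative mapping from lower-cased spellings to one label.
STAGE_ALIASES = {
    "pre series a": "Pre-Series A",
    "pre-series a": "Pre-Series A",
    "pre seed": "Pre-Seed",
    "pre-seed": "Pre-Seed",
}


def normalise_stage(stage):
    """Map known spelling variants of a funding stage to a common name."""
    if not isinstance(stage, str):
        # NaN/None: follow step 3 and label the gap explicitly.
        return "Undisclosed"
    key = " ".join(stage.lower().split())  # lower-case, collapse spaces
    return STAGE_ALIASES.get(key, stage.strip())


print(city_from_location("Bangalore, Karnataka, India"))
print(normalise_stage("Pre series A"))
print(normalise_stage("Pre-series A"))
```

Both helpers work on single values, so they could be applied to a column with `Series.map`.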

## DATA CLEANING AND PREPARATION

1. All identified errors or issues in the dataset will be corrected in the cleaning process.
2. The datasets will be concatenated into a single dataset.
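
The concatenation step could be sketched as below, assuming each yearly frame has already been cleaned to the same column names. The added `Year` column and the `frames_by_year` dictionary are assumptions for illustration, and the two one-row frames stand in for the real data.

```python
import pandas as pd

# One-row stand-ins for the cleaned yearly frames (same columns in each).
frames_by_year = {
    2018: pd.DataFrame({"Company": ["Hasura"], "Amount": [1_600_000.0]}),
    2019: pd.DataFrame({"Company": ["HomeLane"], "Amount": [30_000_000.0]}),
}

# Tag each frame with its year before stacking, so the source year
# is still known after the frames are combined.
combined = pd.concat(
    [frame.assign(Year=year) for year, frame in frames_by_year.items()],
    ignore_index=True,
)

print(combined)
```

With a `Year` column in place, a call such as `combined.groupby("Year")["Amount"].sum()` gives a first view of the funding trend across years.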

## RECOMMENDATIONS