### Data Analysis Project -- Indian Start-up Funding Analysis

### BUSINESS UNDERSTANDING

### Ideas, creativity, and execution are essential for a start-up to flourish. But are they enough? Investors provide start-ups and other entrepreneurial ventures with the capital, popularly known as "funding", to think big, grow rich, and leave a lasting impact.

### In this project, you are going to analyse the funding received by start-ups in India from 2018 to 2021. The dataset provided contains one CSV file for each year of funding.

### In these files you'll find the start-ups' details, the funding amounts received, and the investors' information.



### Scenario

### Your team is trying to venture into the Indian start-up ecosystem. As the data expert of the team, you are to investigate the ecosystem and propose the best course of action.

### Instructions

### Your task is to develop a unique story from this dataset by stating and testing a hypothesis, asking questions, performing analysis, and sharing insights with appropriate visualisations.

### So as part of the project you are to:

### 1) Ask questions

### 2) Develop a hypothesis

### 3) Process the data

### 4) Analyse the data

### 5) Visualise the data



### UNDERSTANDING YOUR DATA

### IMPORTING LIBRARIES

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import plotly.express as px
import plotly.graph_objs as go


### LOADING DATASETS

In [2]:
startup_2018 = pd.read_csv(r'C:\Users\User\Desktop\AZUBI AFRICA\Projects\LP 1\startup_funding2018.csv')

startup_2018
                           


Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...
...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif..."
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...
524,Droni Tech,Information Technology,Seed,"₹35,000,000","Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...


In [3]:
startup_2019 = pd.read_csv(r'C:\Users\User\Desktop\AZUBI AFRICA\Projects\LP 1\startup_funding2019.csv')

startup_2020 = pd.read_csv(r'C:\Users\User\Desktop\AZUBI AFRICA\Projects\LP 1\startup_funding2020.csv')

startup_2021 = pd.read_csv(r'C:\Users\User\Desktop\AZUBI AFRICA\Projects\LP 1\startup_funding2021.csv')

startup_2019

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",
...,...,...,...,...,...,...,...,...,...
84,Infra.Market,,Mumbai,Infratech,It connects client requirements to their suppl...,"Aaditya Sharda, Souvik Sengupta","Tiger Global, Nexus Venture Partners, Accel Pa...","$20,000,000",Series A
85,Oyo,2013.0,Gurugram,Hospitality,Provides rooms for comfortable stay,Ritesh Agarwal,"MyPreferred Transformation, Avendus Finance, S...","$693,000,000",
86,GoMechanic,2016.0,Delhi,Automobile & Technology,Find automobile repair and maintenance service...,"Amit Bhasin, Kushal Karwa, Nitin Rana, Rishabh...",Sequoia Capital,"$5,000,000",Series B
87,Spinny,2015.0,Delhi,Automobile,Online car retailer,"Niraj Singh, Ramanshu Mahaur, Ganesh Pawar, Mo...","Norwest Venture Partners, General Catalyst, Fu...","$50,000,000",


In [4]:
startup_2020


Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Unnamed: 9
0,Aqgromalin,2019,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,"$200,000",,
1,Krayonnz,2019,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,"$100,000",Pre-seed,
2,PadCare Labs,2018,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,Undisclosed,Pre-seed,
3,NCOME,2020,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital","$400,000",,
4,Gramophone,2016,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge","$340,000",,
...,...,...,...,...,...,...,...,...,...,...
1050,Leverage Edu,,Delhi,Edtech,AI enabled marketplace that provides career gu...,Akshay Chaturvedi,"DSG Consumer Partners, Blume Ventures","$1,500,000",,
1051,EpiFi,,,Fintech,It offers customers with a single interface fo...,"Sujith Narayanan, Sumit Gwalani","Sequoia India, Ribbit Capital","$13,200,000",Seed Round,
1052,Purplle,2012,Mumbai,Cosmetics,Online makeup and beauty products retailer,"Manish Taneja, Rahul Dash",Verlinvest,"$8,000,000",,
1053,Shuttl,2015,Delhi,Transport,App based bus aggregator serice,"Amit Singh, Deepanshu Malviya",SIG Global India Fund LLP.,"$8,043,000",Series C,


In [5]:
startup_2021

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed
...,...,...,...,...,...,...,...,...,...
1204,Gigforce,2019.0,Gurugram,Staffing & Recruiting,A gig/on-demand staffing company.,"Chirag Mittal, Anirudh Syal",Endiya Partners,$3000000,Pre-series A
1205,Vahdam,2015.0,New Delhi,Food & Beverages,VAHDAM is among the world’s first vertically i...,Bala Sarda,IIFL AMC,$20000000,Series D
1206,Leap Finance,2019.0,Bangalore,Financial Services,International education loans for high potenti...,"Arnav Kumar, Vaibhav Singh",Owl Ventures,$55000000,Series C
1207,CollegeDekho,2015.0,Gurugram,EdTech,"Collegedekho.com is Student’s Partner, Friend ...",Ruchir Arora,"Winter Capital, ETS, Man Capital",$26000000,Series B
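
The four files above are read from an absolute Windows path, which only works on the original machine. A minimal sketch of a more portable loader follows. It assumes the files keep the `startup_funding<year>.csv` naming shown above and sit together in one folder; the function name `load_funding_files` and its `data_dir` argument are hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_funding_files(data_dir, years=(2018, 2019, 2020, 2021)):
    """Read one CSV per year into a dict keyed by year.

    Assumes every file is named startup_funding<year>.csv and lives
    in `data_dir` (a hypothetical folder chosen by the caller).
    """
    data_dir = Path(data_dir)
    frames = {}
    for year in years:
        path = data_dir / f"startup_funding{year}.csv"
        frames[year] = pd.read_csv(path)
    return frames
```

Calling `load_funding_files("data")[2018]` would then play the role of `startup_2018`, provided a `data` folder with the four files exists next to the notebook.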


### STATING THE HYPOTHESIS

### H0 (Null): The amount raised does not depend on the start-up's sector.

### H1 (Alternative): The amount raised depends on the start-up's sector.
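
One possible way to test the hypothesis stated above, once the amounts are numeric, is a permutation test on the one-way ANOVA F statistic. The sketch below is an assumption about method, not something the notebook prescribes: it compares the observed F value with F values obtained after shuffling the amounts across sectors. The toy data, the function names, and the `n_perm` and `seed` parameters are all hypothetical.

```python
import numpy as np
import pandas as pd


def one_way_f_statistic(df, group_col, value_col):
    """One-way ANOVA F statistic of value_col across the groups in group_col."""
    data = df[[group_col, value_col]].dropna()
    groups = data.groupby(group_col)[value_col]
    grand_mean = data[value_col].mean()
    k = groups.ngroups          # number of sectors
    n = len(data)               # number of deals with a known amount
    # Spread of the sector means around the overall mean
    ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    # Spread of each deal around its own sector mean
    ss_within = ((data[value_col] - groups.transform("mean")) ** 2).sum()
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(df, group_col, value_col, n_perm=500, seed=0):
    """Share of shuffled datasets whose F statistic reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = one_way_f_statistic(df, group_col, value_col)
    data = df[[group_col, value_col]].dropna().copy()
    hits = 0
    for _ in range(n_perm):
        # Shuffling breaks any link between sector and amount (the null hypothesis)
        data[value_col] = rng.permutation(data[value_col].to_numpy())
        if one_way_f_statistic(data, group_col, value_col) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data for illustration only; not taken from the funding files
toy = pd.DataFrame({
    "Sector": ["FinTech"] * 3 + ["EdTech"] * 3 + ["AgriTech"] * 3,
    "Amount ($)": [10.0, 12.0, 11.0, 20.0, 22.0, 21.0, 30.0, 32.0, 31.0],
})
f_stat = one_way_f_statistic(toy, "Sector", "Amount ($)")
p_value = permutation_p_value(toy, "Sector", "Amount ($)", n_perm=200, seed=1)
```

A small p-value would argue against H0. A permutation test is chosen here because funding amounts tend to be heavily skewed, and it does not assume normally distributed values.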

### ANALYTICAL QUESTIONS

### 1) What is the total amount raised from all investors?

### 2) Which sector received the highest amount of investment?

### 3) In which year were the most start-ups founded?

### 4) Which sector has the highest stage of funding?

### 5) Which company received the largest investment, and who was the investor?

### 6) What are the maximum, minimum, and average amounts raised?
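
Several of these questions map onto short pandas expressions once the data is clean. The sketch below uses a made-up four-row frame (the company names, investors, and amounts are hypothetical) with the column names seen in the funding files, and covers questions 1, 2, 3, 5 and 6.

```python
import pandas as pd

# Hypothetical rows for illustration; the real data comes from the CSV files
toy = pd.DataFrame({
    "Company/Brand": ["A", "B", "C", "D"],
    "Sector": ["FinTech", "EdTech", "FinTech", "AgriTech"],
    "Investor": ["X Capital", "Y Fund", "Z Partners", "X Capital"],
    "Founded": [2015, 2019, 2015, 2012],
    "Amount ($)": [1_000_000.0, 250_000.0, 4_000_000.0, None],
})

# Q1: total amount raised (missing amounts are skipped by sum)
total_raised = toy["Amount ($)"].sum()

# Q2: sector with the highest total investment
top_sector = toy.groupby("Sector")["Amount ($)"].sum().idxmax()

# Q3: year in which the most start-ups were founded
busiest_year = toy["Founded"].value_counts().idxmax()

# Q5: company with the largest single amount, and its investor
top_deal = toy.loc[toy["Amount ($)"].idxmax(), ["Company/Brand", "Investor"]]

# Q6: maximum, minimum and average amount
summary = toy["Amount ($)"].agg(["max", "min", "mean"])
```

Question 4 is left out because "highest stage" first needs an agreed ordering of the stage labels.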

### DATA PRE-PROCESSING

In [6]:
# 2018 STARTUP DATA PROCESSING

startup_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
dtypes: object(6)
memory usage: 24.8+ KB


In [7]:
startup_2018.describe()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
count,526,526,526,526,526,526
unique,525,405,21,198,50,524
top,TheCollegeFever,—,Seed,—,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
freq,2,30,280,148,102,2


In [8]:
startup_2018.head(10)

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...
5,Hasura,"Cloud Infrastructure, PaaS, SaaS",Seed,1600000,"Bengaluru, Karnataka, India",Hasura is a platform that allows developers to...
6,Tripshelf,"Internet, Leisure, Marketplace",Seed,"₹16,000,000","Kalkaji, Delhi, India",Tripshelf is an online market place for holida...
7,Hyperdata.IO,Market Research,Angel,"₹50,000,000","Hyderabad, Andhra Pradesh, India",Hyperdata combines advanced machine learning w...
8,Freightwalla,"Information Services, Information Technology",Seed,—,"Mumbai, Maharashtra, India",Freightwalla is an international forwarder tha...
9,Microchip Payments,Mobile Payments,Seed,—,"Bangalore, Karnataka, India",Microchip payments is a mobile-based payment a...


In [9]:
startup_2018.shape

(526, 6)

In [10]:
startup_2018.dtypes

Company Name     object
Industry         object
Round/Series     object
Amount           object
Location         object
About Company    object
dtype: object

In [11]:
startup_2018.isna().sum()

Company Name     0
Industry         0
Round/Series     0
Amount           0
Location         0
About Company    0
dtype: int64
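
The zero counts above do not mean the 2018 data is complete: the `describe()` output shows the placeholder `—` as the most frequent `Amount` value (148 rows), and `isna()` treats it as an ordinary string. A small sketch on made-up values shows the difference once the placeholder is masked.

```python
import pandas as pd

# Hypothetical sample that mimics the formats seen in the Amount column
amounts = pd.Series(["250000", "₹40,000,000", "—", "2000000", "—"], name="Amount")

# The dash is a normal string, so nothing is reported as missing
missing_before = amounts.isna().sum()

# Mask the placeholder so that it becomes NaN, then count again
cleaned = amounts.mask(amounts == "—")
missing_after = cleaned.isna().sum()
```

The same effect can be had at load time by passing `na_values=["—"]` to `pd.read_csv`.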

### RENAMING THE COLUMNS

In [12]:
startup_2018.rename(columns={'Company Name':'Company/Brand', 'Industry':'Sector', 'Round/Series':'Stage', 'Amount':'Amount ($)',
                             'Location':'HeadQuarter', 'About Company':'What it does'}, inplace = True)

In [13]:
startup_2018

Unnamed: 0,Company/Brand,Sector,Stage,Amount ($),HeadQuarter,What it does
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...
...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif..."
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...
524,Droni Tech,Information Technology,Seed,"₹35,000,000","Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...
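
The rename is meant to align the 2018 columns with the later files, so it is worth checking the result. A quick set comparison, using the column lists printed above, shows that the new name `Amount ($)` (with a space) still differs from `Amount($)` in the 2019 file.

```python
# Column names copied from the outputs shown in this notebook
renamed_2018 = ["Company/Brand", "Sector", "Stage", "Amount ($)",
                "HeadQuarter", "What it does"]
columns_2019 = ["Company/Brand", "Founded", "HeadQuarter", "Sector",
                "What it does", "Founders", "Investor", "Amount($)", "Stage"]

# Names that appear in one year but not in the other
only_in_2018 = sorted(set(renamed_2018) - set(columns_2019))
only_in_2019 = sorted(set(columns_2019) - set(renamed_2018))

print(only_in_2018)  # ['Amount ($)']
print(only_in_2019)  # ['Amount($)', 'Founded', 'Founders', 'Investor']
```

Picking one spelling for the amount column in every year avoids ending up with two separate amount columns if the frames are later combined.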


### FEATURE ENGINEERING 

### CREATING THE YEAR OF FUNDING COLUMN

In [14]:
startup_2018['Year of Funding'] = '2018'

In [15]:
startup_2018

Unnamed: 0,Company/Brand,Sector,Stage,Amount ($),HeadQuarter,What it does,Year of Funding
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",2018
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,2018
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,2018
...,...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif...",2018
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.,2018
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...,2018
524,Droni Tech,Information Technology,Seed,"₹35,000,000","Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...,2018


### CREATING ANOTHER AMOUNT (USD) COLUMN

### THIS COLUMN WILL BE USED TO ISOLATE THE RUPEES, DOLLARS AND EMPTY SIGNS

In [16]:
# CREATING THE AMOUNT (USD) COLUMN

startup_2018['Amount(USD)'] = ' '


# SELECTING THE ROWS WHOSE AMOUNT STARTS WITH THE RUPEE SIGN

inr = startup_2018[startup_2018["Amount ($)"].str.startswith('₹')]

inr


Unnamed: 0,Company/Brand,Sector,Stage,Amount ($),HeadQuarter,What it does,Year of Funding,Amount(USD)
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018,
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018,
6,Tripshelf,"Internet, Leisure, Marketplace",Seed,"₹16,000,000","Kalkaji, Delhi, India",Tripshelf is an online market place for holida...,2018,
7,Hyperdata.IO,Market Research,Angel,"₹50,000,000","Hyderabad, Andhra Pradesh, India",Hyperdata combines advanced machine learning w...,2018,
15,Pitstop,"Automotive, Search Engine, Service Industry",Seed,"₹100,000,000","Bengaluru, Karnataka, India",Pitstop offers general repair and maintenance ...,2018,
...,...,...,...,...,...,...,...,...
513,Nykaa,"Beauty, Fashion, Wellness",Secondary Market,"₹1,130,000,000","Mumbai, Maharashtra, India",Nykaa.com is a premier online beauty and welln...,2018,
514,Chaayos,"Food and Beverage, Restaurants, Tea",Series B,"₹810,000,000","New Delhi, Delhi, India",Chaayos was born in November 2012 out of this ...,2018,
516,LT Foods,"Food and Beverage, Food Processing, Manufacturing",Post-IPO Equity,"₹1,400,000,000","New Delhi, Delhi, India",LT Foods believe that nature will continue to ...,2018,
517,Multibashi,"E-Learning, Internet",Seed,"₹10,000,000","Bengaluru, Karnataka, India",Free language learning platform.,2018,


### FILLING THE AMOUNT (USD) COLUMN WITH INR FOR ₹, A BLANK FOR $ AND NUL FOR —


In [17]:
startup_2018.loc[startup_2018['Amount ($)'].str.startswith('₹'), 'Amount(USD)'] = 'INR'

startup_2018.loc[startup_2018['Amount ($)'].str.startswith('$'), 'Amount(USD)'] = ' '

startup_2018.loc[startup_2018['Amount ($)'] =='—', 'Amount(USD)'] = 'NUL'



startup_2018

Unnamed: 0,Company/Brand,Sector,Stage,Amount ($),HeadQuarter,What it does,Year of Funding,Amount(USD)
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",2018,
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018,INR
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018,INR
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,2018,
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,2018,NUL
...,...,...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif...",2018,
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.,2018,NUL
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...,2018,
524,Droni Tech,Information Technology,Seed,"₹35,000,000","Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...,2018,INR
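
The three `.loc` assignments can also be written as a single labelling step. The sketch below uses `numpy.select` on a made-up series; unlike the cell above, it labels dollar amounts explicitly as `USD`, which is an assumption made here for readability.

```python
import numpy as np
import pandas as pd

# Hypothetical values covering each format seen in the Amount column
amount = pd.Series(["250000", "₹40,000,000", "$1,000", "—"])

conditions = [
    amount.str.startswith("₹").to_numpy(),   # rupee amounts
    amount.str.startswith("$").to_numpy(),   # dollar amounts
    amount.eq("—").to_numpy(),               # undisclosed amounts
]
labels = ["INR", "USD", "NUL"]

# Rows matching none of the conditions (plain numbers) keep an empty label
currency = pd.Series(np.select(conditions, labels, default=""), index=amount.index)
```

Keeping the conditions and labels side by side makes it easier to add another currency later.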


### REMOVING THE DOLLAR AND RUPEE SIGNS

### AND REMOVING ALL COMMAS AND HYPHENS IN THE AMOUNT ($) COLUMN

In [18]:
startup_2018.loc[startup_2018['Amount ($)'].str.startswith('₹'), 'Amount ($)'] = startup_2018['Amount ($)'].str[1:]

startup_2018.loc[startup_2018['Amount ($)'].str.startswith('$'), 'Amount ($)'] = startup_2018['Amount ($)'].str[1:]

startup_2018.loc[startup_2018['Amount ($)'].str.startswith('—'), 'Amount ($)'] = '0.0'

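# NOTE: the commas are replaced with a space (' '), which leaves values such as '40 000 000'.
# pd.to_numeric(..., errors='coerce') cannot parse these, so they later become NaN.
# Replacing the commas with an empty string ('') would keep the values numeric.
startup_2018.loc[startup_2018['Amount ($)'].str.contains(',', regex=True), 'Amount ($)'] = startup_2018['Amount ($)'].str.replace(',',' ')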


startup_2018

Unnamed: 0,Company/Brand,Sector,Stage,Amount ($),HeadQuarter,What it does,Year of Funding,Amount(USD)
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",2018,
1,Happy Cow Dairy,"Agriculture, Farming",Seed,40 000 000,"Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018,INR
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,65 000 000,"Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018,INR
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,2018,
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,0.0,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,2018,NUL
...,...,...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif...",2018,
522,Happyeasygo Group,"Tourism, Travel",Series A,0.0,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.,2018,NUL
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...,2018,
524,Droni Tech,Information Technology,Seed,35 000 000,"Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...,2018,INR
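
The output above still holds values such as `40 000 000`, which cannot be converted to numbers. A minimal sketch of an alternative parsing step follows: it strips currency signs, commas and spaces in one pass and lets anything non-numeric become `NaN`. The helper name `parse_amount` and the sample values are hypothetical.

```python
import pandas as pd


def parse_amount(raw):
    """Turn strings such as '₹40,000,000' or '$1,000' into floats."""
    digits = raw.astype(str).str.replace(r"[₹$,\s]", "", regex=True)
    # Text that is still not a number (for example '—') becomes NaN
    return pd.to_numeric(digits, errors="coerce")


# Hypothetical sample covering the formats seen in the Amount column
raw = pd.Series(["250000", "₹40,000,000", "$1,000", "—", "40 000 000"])
parsed = parse_amount(raw)
```

Here the `—` placeholder becomes `NaN` rather than `0.0`. That is a design choice: an undisclosed amount stored as zero would pull the mean and the quartiles down.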


In [19]:
# checking datatype

startup_2018.dtypes

Company/Brand      object
Sector             object
Stage              object
Amount ($)         object
HeadQuarter        object
What it does       object
Year of Funding    object
Amount(USD)        object
dtype: object

### CONVERTING AMOUNT ($) AND YEAR OF FUNDING DATATYPE FROM OBJECT TO FLOAT AND INT

In [20]:
#startup_2018['Amount ($)'] = startup_2018['Amount ($)'].astype(float)

startup_2018['Amount ($)'] = pd.to_numeric(startup_2018['Amount ($)'], errors='coerce')

startup_2018['Year of Funding'] = pd.to_numeric(startup_2018['Year of Funding'], errors='coerce')

print(startup_2018.dtypes)

Company/Brand       object
Sector              object
Stage               object
Amount ($)         float64
HeadQuarter         object
What it does        object
Year of Funding      int64
Amount(USD)         object
dtype: object


### CURRENCY CONVERSION - WE ARE USING 68.41 RUPEES TO 1 USD

In [21]:
# indian exchange rate = 68.41

indian_rate = 68.41

indian_rate

68.41

In [22]:
startup_2018.loc[startup_2018['Amount(USD)'] == 'INR', 'Amount ($)'] = startup_2018['Amount ($)']/indian_rate

startup_2018

Unnamed: 0,Company/Brand,Sector,Stage,Amount ($),HeadQuarter,What it does,Year of Funding,Amount(USD)
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000.0,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",2018,
1,Happy Cow Dairy,"Agriculture, Farming",Seed,,"Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018,INR
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,,"Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018,INR
3,PayMe India,"Financial Services, FinTech",Angel,2000000.0,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,2018,
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,0.0,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,2018,NUL
...,...,...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000.0,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif...",2018,
522,Happyeasygo Group,"Tourism, Travel",Series A,0.0,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.,2018,NUL
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500.0,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...,2018,
524,Droni Tech,Information Technology,Seed,,"Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...,2018,INR
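
In the output above the INR rows are `NaN`, so the division has nothing to convert. The sketch below shows the same conversion on a made-up frame whose amounts are already numeric, using the 68.41 rate stated in this notebook; the constant name `INR_PER_USD` is hypothetical.

```python
import pandas as pd

INR_PER_USD = 68.41  # exchange rate used in this notebook

# Hypothetical rows: one plain amount, one rupee amount, one undisclosed
df = pd.DataFrame({
    "Amount ($)": [250000.0, 40_000_000.0, 0.0],
    "Amount(USD)": [" ", "INR", "NUL"],
})

# Divide only the rows flagged as rupees; all other rows stay untouched
is_inr = df["Amount(USD)"].eq("INR")
df.loc[is_inr, "Amount ($)"] = df.loc[is_inr, "Amount ($)"] / INR_PER_USD
```

Running `df["Amount ($)"].isna().sum()` before and after such a step is a cheap check that the conversion is working on real numbers.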


In [23]:
startup_2018.head(100)

Unnamed: 0,Company/Brand,Sector,Stage,Amount ($),HeadQuarter,What it does,Year of Funding,Amount(USD)
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000.0,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",2018,
1,Happy Cow Dairy,"Agriculture, Farming",Seed,,"Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,2018,INR
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,,"Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,2018,INR
3,PayMe India,"Financial Services, FinTech",Angel,2000000.0,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,2018,
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,0.0,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,2018,NUL
...,...,...,...,...,...,...,...,...
95,AuthMetrik,"B2B, Biometrics, Cyber Security, Fraud Detecti...",Grant,,"Gurgaon, Haryana, India","SaaS, B2B, Security, Stop account sharing, Fra...",2018,
96,Khidki,"Artificial Intelligence, Social",Seed,0.0,"Bangalore, Karnataka, India",Vernacular Social Network Focused on Town Leve...,2018,NUL
97,LetsTransport,"Logistics, Transportation, Travel",Series B,,"Bangalore, Karnataka, India",Lets transport is a logistics solution provider.,2018,INR
98,Next Digital Solutions,"Digital Marketing, SEM, SEO, Web Development",Angel,,"Kota, Rajasthan, India",Next Digital Solutions is website design & Dig...,2018,INR


In [24]:
startup_2018['Amount ($)'].describe()

count    3.230000e+02
mean     6.609109e+06
std      3.063988e+07
min      0.000000e+00
25%      0.000000e+00
50%      1.500000e+05
75%      1.500000e+06
max      3.650000e+08
Name: Amount ($), dtype: float64

### REMOVING THE REGION AND COUNTRY FROM THE HEADQUARTER LEAVING ONLY THE CITY TO ALIGN WITH THE REST OF THE DATA 

In [25]:
startup_2018['HeadQuarter'] = startup_2018['HeadQuarter'].str.split(',').str[0]

startup_2018

Unnamed: 0,Company/Brand,Sector,Stage,Amount ($),HeadQuarter,What it does,Year of Funding,Amount(USD)
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000.0,Bangalore,"TheCollegeFever is a hub for fun, fiesta and f...",2018,
1,Happy Cow Dairy,"Agriculture, Farming",Seed,,Mumbai,A startup which aggregates milk from dairy far...,2018,INR
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,,Gurgaon,Leading Online Loans Marketplace in India,2018,INR
3,PayMe India,"Financial Services, FinTech",Angel,2000000.0,Noida,PayMe India is an innovative FinTech organizat...,2018,
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,0.0,Hyderabad,Eunimart is a one stop solution for merchants ...,2018,NUL
...,...,...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000.0,Bangalore,"Udaan is a B2B trade platform, designed specif...",2018,
522,Happyeasygo Group,"Tourism, Travel",Series A,0.0,Haryana,HappyEasyGo is an online travel domain.,2018,NUL
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500.0,Mumbai,Mombay is a unique opportunity for housewives ...,2018,
524,Droni Tech,Information Technology,Seed,,Mumbai,Droni Tech manufacture UAVs and develop softwa...,2018,INR


### REPLACING THE ALTERNATIVE SPELLING BENGALURU WITH BANGALORE IN THE HEADQUARTER COLUMN

In [26]:
# Count the rows whose city is spelt 'Bengaluru' and replace them only if any exist
bengaluru_count = (startup_2018['HeadQuarter'] == 'Bengaluru').sum()
if bengaluru_count > 0:
    startup_2018['HeadQuarter'] = startup_2018['HeadQuarter'].str.replace('Bengaluru', 'Bangalore')

startup_2018

Unnamed: 0,Company/Brand,Sector,Stage,Amount ($),HeadQuarter,What it does,Year of Funding,Amount(USD)
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000.0,Bangalore,"TheCollegeFever is a hub for fun, fiesta and f...",2018,
1,Happy Cow Dairy,"Agriculture, Farming",Seed,,Mumbai,A startup which aggregates milk from dairy far...,2018,INR
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,,Gurgaon,Leading Online Loans Marketplace in India,2018,INR
3,PayMe India,"Financial Services, FinTech",Angel,2000000.0,Noida,PayMe India is an innovative FinTech organizat...,2018,
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,0.0,Hyderabad,Eunimart is a one stop solution for merchants ...,2018,NUL
...,...,...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000.0,Bangalore,"Udaan is a B2B trade platform, designed specif...",2018,
522,Happyeasygo Group,"Tourism, Travel",Series A,0.0,Haryana,HappyEasyGo is an online travel domain.,2018,NUL
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500.0,Mumbai,Mombay is a unique opportunity for housewives ...,2018,
524,Droni Tech,Information Technology,Seed,,Mumbai,Droni Tech manufacture UAVs and develop softwa...,2018,INR
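
The city extraction and the spelling fix can be combined into one short step. The sketch below runs on three made-up location strings in the 2018 format and uses a mapping dictionary, so other variants can be added in one place.

```python
import pandas as pd

# Hypothetical locations in the "City, State, Country" format of the 2018 file
hq = pd.Series([
    "Bangalore, Karnataka, India",
    "Bengaluru, Karnataka, India",
    "Gurgaon, Haryana, India",
])

# Keep the part before the first comma and trim stray spaces
city = hq.str.split(",").str[0].str.strip()

# Map alternative spellings onto a single label; extend the dict as needed
city = city.replace({"Bengaluru": "Bangalore"})
```

The files shown earlier also contain both `Gurgaon` and `Gurugram`, which could be handled with the same mapping if the analysis should treat them as one city.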


In [27]:
# DROPPING AMOUNT(USD) COLUMN (drop returns a new DataFrame, so the result is assigned back)

startup_2018 = startup_2018.drop(['Amount(USD)'], axis=1)

startup_2018

Unnamed: 0,Company/Brand,Sector,Stage,Amount ($),HeadQuarter,What it does,Year of Funding
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000.0,Bangalore,"TheCollegeFever is a hub for fun, fiesta and f...",2018
1,Happy Cow Dairy,"Agriculture, Farming",Seed,,Mumbai,A startup which aggregates milk from dairy far...,2018
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,,Gurgaon,Leading Online Loans Marketplace in India,2018
3,PayMe India,"Financial Services, FinTech",Angel,2000000.0,Noida,PayMe India is an innovative FinTech organizat...,2018
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,0.0,Hyderabad,Eunimart is a one stop solution for merchants ...,2018
...,...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000.0,Bangalore,"Udaan is a B2B trade platform, designed specif...",2018
522,Happyeasygo Group,"Tourism, Travel",Series A,0.0,Haryana,HappyEasyGo is an online travel domain.,2018
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500.0,Mumbai,Mombay is a unique opportunity for housewives ...,2018
524,Droni Tech,Information Technology,Seed,,Mumbai,Droni Tech manufacture UAVs and develop softwa...,2018


### SAVING CLEANED 2018 DATASET

In [28]:
startup_2018.to_csv('cleaned_2018.csv', index = False)

### PROCESSING 2020 DATASET

In [29]:
startup_2020

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Unnamed: 9
0,Aqgromalin,2019,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,"$200,000",,
1,Krayonnz,2019,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,"$100,000",Pre-seed,
2,PadCare Labs,2018,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,Undisclosed,Pre-seed,
3,NCOME,2020,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital","$400,000",,
4,Gramophone,2016,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge","$340,000",,
...,...,...,...,...,...,...,...,...,...,...
1050,Leverage Edu,,Delhi,Edtech,AI enabled marketplace that provides career gu...,Akshay Chaturvedi,"DSG Consumer Partners, Blume Ventures","$1,500,000",,
1051,EpiFi,,,Fintech,It offers customers with a single interface fo...,"Sujith Narayanan, Sumit Gwalani","Sequoia India, Ribbit Capital","$13,200,000",Seed Round,
1052,Purplle,2012,Mumbai,Cosmetics,Online makeup and beauty products retailer,"Manish Taneja, Rahul Dash",Verlinvest,"$8,000,000",,
1053,Shuttl,2015,Delhi,Transport,App based bus aggregator serice,"Amit Singh, Deepanshu Malviya",SIG Global India Fund LLP.,"$8,043,000",Series C,


In [30]:
startup_2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company/Brand  1055 non-null   object
 1   Founded        843 non-null    object
 2   HeadQuarter    961 non-null    object
 3   Sector         1042 non-null   object
 4   What it does   1055 non-null   object
 5   Founders       1043 non-null   object
 6   Investor       1017 non-null   object
 7   Amount($)      1052 non-null   object
 8   Stage          591 non-null    object
 9   Unnamed: 9     2 non-null      object
dtypes: object(10)
memory usage: 82.5+ KB


In [31]:
startup_2020.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Unnamed: 9
0,Aqgromalin,2019,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,"$200,000",,
1,Krayonnz,2019,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,"$100,000",Pre-seed,
2,PadCare Labs,2018,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,Undisclosed,Pre-seed,
3,NCOME,2020,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital","$400,000",,
4,Gramophone,2016,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge","$340,000",,


In [32]:
startup_2020.isna().sum()

Company/Brand       0
Founded           212
HeadQuarter        94
Sector             13
What it does        0
Founders           12
Investor           38
Amount($)           3
Stage             464
Unnamed: 9       1053
dtype: int64

In [33]:
startup_2020.shape

(1055, 10)

### WE ARE GOING TO CLEAN THIS DATA COLUMN BY COLUMN

### CLEANING COMPANY/BRAND COLUMN

In [34]:
startup_2020['Company/Brand'].isna().sum()

0

In [35]:
startup_2020['Company/Brand']

0         Aqgromalin
1           Krayonnz
2       PadCare Labs
3              NCOME
4         Gramophone
            ...     
1050    Leverage Edu
1051           EpiFi
1052         Purplle
1053          Shuttl
1054           Pando
Name: Company/Brand, Length: 1055, dtype: object

### THERE ARE NO MISSING VALUES IN THE COMPANY/BRAND COLUMN

### CLEANING THE 'FOUNDED' COLUMN

In [36]:
startup_2020["Founded"].isnull().sum()

212

### There are 212 empty values. 

### We will use the mode or the mean of the column to fill the missing values.

In [40]:
# copying the column

founded_2020 = startup_2020.Founded 

founded_2020

0       2019
1       2019
2       2018
3       2020
4       2016
        ... 
1050     NaN
1051     NaN
1052    2012
1053    2015
1054    2017
Name: Founded, Length: 1055, dtype: object

### The column contains "-" and NaN values. They have to be dropped in order to find the mean and mode.

In [41]:
# searching for the index of the value "-" from the column
for i in range(startup_2020.shape[0]):
    if founded_2020[i] == "-":
        print(i)

482


### The only "-" value is at index 482. That row and the NaN values need to be dropped.

In [42]:
founded_2020 = founded_2020.drop(482)

In [45]:
founded_2020 = founded_2020.dropna()

founded_2020

0       2019
1       2019
2       2018
3       2020
4       2016
        ... 
1048    2016
1049    2017
1052    2012
1053    2015
1054    2017
Name: Founded, Length: 842, dtype: object
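
### As a minimal sketch, the search for "-" and the two drop steps above could also be done in one pass with pd.to_numeric. The sample series below is hypothetical and only stands in for startup_2020["Founded"].

```python
import pandas as pd

# Hypothetical sample standing in for startup_2020["Founded"]
founded = pd.Series(["2019", "2018", "-", None, "2016"], name="Founded")

# Non-numeric strings such as "-" become NaN instead of raising an error
founded_numeric = pd.to_numeric(founded, errors="coerce")

# Index labels that held a value which is not a valid year
bad_positions = founded.index[founded.notna() & founded_numeric.isna()].tolist()

# Drop the missing and invalid entries, then cast to int
founded_clean = founded_numeric.dropna().astype(int)

print(bad_positions)
print(founded_clean.tolist())
```

### With errors="coerce", any other unexpected placeholder would be caught as well, without having to know its exact spelling in advance.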

In [44]:
founded_2020.unique()

array(['2019', '2018', '2020', '2016', '2008', '2015', '2017', '2014',
       '1998', '2007', '2011', '1982', '2013', '2009', '2012', '1995',
       '2010', '2006', '1978', '1999', '1994', '2005', '1973', '2002',
       '2004', '2001'], dtype=object)

### founded_2020 is now free of invalid values, so it can be used for our calculation.

### Let us convert it to integer

In [46]:
# converting to int data type

founded_2020 = founded_2020.astype(int) 

founded_2020

0       2019
1       2019
2       2018
3       2020
4       2016
        ... 
1048    2016
1049    2017
1052    2012
1053    2015
1054    2017
Name: Founded, Length: 842, dtype: int32

In [48]:
founded_2020.info()

<class 'pandas.core.series.Series'>
Int64Index: 842 entries, 0 to 1054
Series name: Founded
Non-Null Count  Dtype
--------------  -----
842 non-null    int32
dtypes: int32(1)
memory usage: 9.9 KB


### Calculating Mode and Mean

In [49]:
founded_2020.mode()

0    2015
Name: Founded, dtype: int32

In [50]:
founded_2020.mean()

2015.3634204275534

### The mean is about 2015.36, which rounds to 2015. The mode is also 2015.

### 2015 will be used to impute the missing values in the Founded column.

In [51]:
startup_2020.Founded.unique()

array(['2019', '2018', '2020', '2016', '2008', '2015', '2017', '2014',
       '1998', '2007', '2011', '1982', '2013', '2009', '2012', '1995',
       '2010', '2006', '1978', nan, '1999', '1994', '2005', '1973', '-',
       '2002', '2004', '2001'], dtype=object)

In [52]:
# replacing "-" with 2015

startup_2020.Founded = startup_2020.Founded.replace("-",2015) 


# replacing NaN 

startup_2020.Founded = startup_2020.Founded.fillna(2015) 

In [53]:
startup_2020.Founded.isnull().any()

False
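
### The cell above writes the integer 2015 into a column that otherwise holds year strings, which leaves mixed types. A possible alternative, sketched here on a hypothetical sample rather than on startup_2020 itself, converts the column to numbers first and then fills the gaps with the mode.

```python
import pandas as pd

# Hypothetical sample standing in for startup_2020["Founded"]
founded = pd.Series(["2019", "2015", "-", None, "2015", "2012"], name="Founded")

# "-" and missing entries both become NaN
years = pd.to_numeric(founded, errors="coerce")

# Most frequent valid year (NaN is ignored by mode())
fill_value = int(years.mode()[0])

# Fill the gaps and keep a single integer dtype
founded_filled = years.fillna(fill_value).astype(int)

print(fill_value)
print(founded_filled.tolist())
```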

### NOW, OUR "FOUNDED" COLUMN IS CLEANED

### CLEANING THE SECTOR COLUMN

In [54]:
# checking if there are missing values

startup_2020.Sector.isnull().sum() 

13

### WE HAVE 13 MISSING VALUES IN THE SECTOR COLUMN

In [55]:
# boolean mask marking the missing values in the column

sector_2020 = startup_2020.Sector.isnull() 

In [56]:
# finding the most frequent value

mode = startup_2020.Sector.mode() 

mode

0    Fintech
Name: Sector, dtype: object

In [57]:
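# filling each missing value with the most frequent sector

for i in range(len(sector_2020)):
    if sector_2020[i]:
        # mode is a Series, so take its first element; .loc avoids chained assignment
        startup_2020.loc[i, "Sector"] = mode[0]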

In [58]:
startup_2020.Sector.isnull().any()

False
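
### The row-by-row loop above can also be written as a single fillna call. This is only a sketch on a hypothetical DataFrame named df, assuming the most frequent sector is an acceptable stand-in for the missing ones.

```python
import pandas as pd

# Hypothetical sample standing in for startup_2020
df = pd.DataFrame({"Sector": ["Fintech", None, "EdTech", "Fintech", None]})

# mode() returns a Series, so take its first element as the fill value
most_common = df["Sector"].mode()[0]

# Fill every missing Sector value in one vectorised step
df["Sector"] = df["Sector"].fillna(most_common)

print(most_common)
print(df["Sector"].tolist())
```

### The same pattern works with a fixed string instead of the mode when a text column should simply be marked as unknown.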

### SECTOR COLUMN IS NOW CLEANED

### CLEANING INVESTOR COLUMN

In [59]:
# CHECKING FOR MISSING VALUES IN INVESTOR COLUMN

startup_2020.Investor.isnull().sum()

38

### It has a total of 38 missing values

In [61]:
# FILLING THE MISSING VALUES WITH "undefined"


investor_2020 = startup_2020.Investor.isnull()

In [62]:
for i in range(len(investor_2020)):
    if investor_2020[i]:
        # .loc writes to the DataFrame directly and avoids chained assignment
        startup_2020.loc[i, "Investor"] = "undefined"

In [63]:
startup_2020.Investor.isnull().sum()

0

### THE INVESTOR COLUMN IS CLEANED NOW

### CLEANING THE AMOUNT COLUMN

In [64]:
startup_2020["Amount($)"].unique()

array(['$200,000', '$100,000', 'Undisclosed', '$400,000', '$340,000',
       '$600,000', '$45,000,000', '$1,000,000', '$2,000,000',
       '$1,200,000', '$660,000,000', '$120,000', '$7,500,000',
       '$5,000,000', '$500,000', '$3,000,000', '$10,000,000',
       '$145,000,000', '$100,000,000', nan, '$21,000,000', '$4,000,000',
       '$20,000,000', '$560,000', '$275,000', '$4,500,000', '$15,000,000',
       '$390,000,000', '$7,000,000', '$5,100,000', '$700,000,000',
       '$2,300,000', '$700,000', '$19,000,000', '$9,000,000',
       '$40,000,000', '$750,000', '$1,500,000', '$7,800,000',
       '$50,000,000', '$80,000,000', '$30,000,000', '$1,700,000',
       '$2,500,000', '$40,000', '$33,000,000', '$35,000,000', '$300,000',
       '$25,000,000', '$3,500,000', '$200,000,000', '$6,000,000',
       '$1,300,000', '$4,100,000', '$575,000', '$800,000', '$28,000,000',
       '$18,000,000', '$3,200,000', '$900,000', '$250,000', '$4,700,000',
       '$75,000,000', '$8,000,000', '$121,000,000'

### FROM THE ABOVE, THE COLUMN HAS "NAN" AND "UNDISCLOSED" VALUES WHICH NEED TO BE TREATED

### WE ARE GOING TO TREAT THEM BY REPLACING THEM WITH THE MEAN VALUE OF THE COLUMN

In [65]:
# copying the column

amount_2020 = startup_2020["Amount($)"] 

In [66]:
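# replacing values

# removing the thousands separators
amount_2020 = amount_2020.apply(lambda x: str(x).replace(",",""))
# replacing "Undisclosed" with "0"
amount_2020 = amount_2020.apply(lambda x: str(x).replace("Undisclosed","0"))
# removing the currency symbol
amount_2020 = amount_2020.apply(lambda x: str(x).replace("$",""))
# replacing the misspelled variants of "Undisclosed" with "0"
amount_2020 = amount_2020.apply(lambda x: str(x).replace("Undislosed","0"))
amount_2020 = amount_2020.apply(lambda x: str(x).replace("Undiclsosed","0"))
# replacing missing values (the string "nan" after str()) with "0"
amount_2020 = amount_2020.apply(lambda x: str(x).replace("nan","0"))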



In [67]:
amount_2020

0         200000
1         100000
2              0
3         400000
4         340000
          ...   
1050     1500000
1051    13200000
1052     8000000
1053     8043000
1054     9000000
Name: Amount($), Length: 1055, dtype: object
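
### The cleaned values above are still strings, and the undisclosed amounts are held as "0". A minimal sketch of the mean replacement, assuming the intent is to fill the gaps with the mean of the disclosed amounts, is shown below on a hypothetical sample.

```python
import pandas as pd

# Hypothetical sample standing in for startup_2020["Amount($)"]
amount = pd.Series(["$200,000", "Undisclosed", None, "$1,500,000"], name="Amount($)")

# Strip the currency symbol and the thousands separators
cleaned = amount.str.replace(r"[$,]", "", regex=True)

# Text such as "Undisclosed" (and its misspellings) becomes NaN
amount_numeric = pd.to_numeric(cleaned, errors="coerce")

# Mean of the disclosed amounts only; NaN is skipped by mean()
mean_amount = amount_numeric.mean()

# Replace undisclosed and missing amounts with that mean
amount_filled = amount_numeric.fillna(mean_amount)

print(mean_amount)
print(amount_filled.tolist())
```

### Keeping the gaps as NaN until the mean is computed avoids pulling the mean down, which would happen if the placeholder zeros were included in the calculation.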
