## Business Understanding
### Business Scenario
Your team is trying to venture into the Indian start-up
ecosystem. As the data expert of the team, you are to
investigate the ecosystem and propose the best course
of action.

*Analyze funding received by start-ups in India from
2018 to 2021.*
- Separate data is provided for each year of
funding.
- In these datasets, you'll find the start-ups' details,
the funding amounts received, and the investors'
information.

### Business Objective
The aim of this project is to analyze the Indian start-up ecosystem and advise stakeholders on which ventures to invest in to increase the potential for high profit/income.

In [1]:
# import all necessary libraries

# data manipulation
import pandas as pd
import numpy as np

# data visualization libraries
import matplotlib.pyplot as plt
from plotly import express as px
import seaborn as sns

# statistical libraries
from scipy import stats
import statistics as stat

# database manipulation libraries
import pyodbc
from dotenv import dotenv_values

# hide warnings
import warnings
warnings.filterwarnings("ignore")


 



## Set Up Database Connection

In [2]:
# load environment variables
environment_variables = dotenv_values(".env")

# load database configurations
database = environment_variables.get("DB_DATABASENAME")
server = environment_variables.get("DB_SERVER")
username = environment_variables.get("DB_USERNAME")
password = environment_variables.get("DB_PASSWORD")

# database connection string
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"



In [3]:
# create pyodbc connector
connection = pyodbc.connect(connection_string)
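If a key is missing from `.env`, `.get()` returns `None` and the f-string above embeds the literal text `None` in the connection string, which only surfaces later as a confusing login error. A minimal sketch of a helper that fails early instead; `build_connection_string` and `ODBC_KEYS` are hypothetical names, and the `.env` keys and the `{SQL Server}` driver are the ones used above.

```python
# Map each ODBC keyword to the .env key it is read from (same keys as above)
ODBC_KEYS = {
    "SERVER": "DB_SERVER",
    "DATABASE": "DB_DATABASENAME",
    "UID": "DB_USERNAME",
    "PWD": "DB_PASSWORD",
}


def build_connection_string(config, driver="SQL Server"):
    """Return a pyodbc connection string, or raise KeyError naming missing settings."""
    missing = [env_key for env_key in ODBC_KEYS.values() if not config.get(env_key)]
    if missing:
        raise KeyError(f"missing .env settings: {', '.join(missing)}")
    # Doubled braces in the f-string produce literal braces: DRIVER={SQL Server}
    parts = [f"DRIVER={{{driver}}}"]
    parts += [f"{odbc_key}={config[env_key]}" for odbc_key, env_key in ODBC_KEYS.items()]
    return ";".join(parts)


# Usage in the notebook (requires a reachable server):
# connection = pyodbc.connect(build_connection_string(dotenv_values(".env")))
```

The keyword order matches the original string, so a complete `.env` produces the same connection string as before.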

## Data Understanding
### Loading the datasets from different sources


In [4]:
# Loading 2021 dataset from MS SQL server
query_2021 = " SELECT * FROM dbo.LP1_startup_funding2021"
df_2021 = pd.read_sql(query_2021,connection)
df_2021.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


In [5]:
# Load 2020 dataset from MS SQL Server
query_2020 = "SELECT * FROM dbo.LP1_startup_funding2020"
df_2020 = pd.read_sql(query_2020,connection)
df_2020.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,


In [6]:
# load 2019 dataset
df_2019 = pd.read_csv("data/startup_funding2019.csv")
df_2019.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",


In [7]:
# load 2018 dataset (path relative to the project folder, as for the 2019 file)
df_2018 = pd.read_csv("data/startup_funding2018.csv")
df_2018.head()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...
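
The `Amount` values mix several formats across the years: plain numbers (`250000`, `200000.0`), dollar strings (`$1,200,000`, `$3000000`), rupee strings (`₹40,000,000`) and a dash placeholder for undisclosed amounts. They need a common numeric unit before funding can be compared. A minimal sketch, assuming that values without a currency symbol are already in US dollars; the INR-to-USD rate is a parameter because choosing it (for example, a 2018 average) is an analysis decision this notebook does not make. `parse_amount` is a hypothetical helper.

```python
import re

import numpy as np
import pandas as pd


def parse_amount(value, inr_to_usd):
    """Convert one raw funding value to a float in US dollars (NaN if unknown).

    Assumption: values without a currency symbol are already in US dollars.
    """
    if pd.isna(value):
        return np.nan
    if isinstance(value, (int, float, np.number)):
        return float(value)
    text = str(value).strip()
    is_rupees = text.startswith("₹")
    # Drop currency symbols, commas and anything else that is not part of the number
    digits = re.sub(r"[^0-9.]", "", text)
    try:
        amount = float(digits)
    except ValueError:
        return np.nan  # e.g. the dash placeholder or other non-numeric text
    return amount * inr_to_usd if is_rupees else amount


raw = pd.Series([250000, "₹40,000,000", "$1,200,000", "\u2014", np.nan])
# 0.0146 is an illustrative placeholder rate, not an official figure
usd = raw.apply(parse_amount, inr_to_usd=0.0146)
```

It does not interpret word suffixes such as "million"; checking `value_counts()` on the raw column would reveal any formats not visible in these previews.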


In [8]:
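# concatenate all datasets; ignore_index=False keeps each year's original
# row labels, so index values repeat across years (the tail ends at 1208, not 2878)
data = pd.concat([df_2018, df_2019, df_2020, df_2021], ignore_index=False)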
data.head() 

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Company_Brand,What_it_does,column10
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",,,,,,,,,,,,
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,,,,,,,,,,,,
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,,,,,,,,,,,,
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,,,,,,,,,,,,
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,,,,,,,,,,,,
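
Because each year's file names the same fields differently (`Company Name`, `Company/Brand`, `Company_Brand`), the concatenation produces 18 sparsely filled columns instead of one shared set. A minimal sketch of aligning the names before concatenating, tagging each row with its funding year and giving the result a fresh index. The mapping is an assumption read off the previews above: treating 2018's `Industry` as `Sector`, `Location` as `HeadQuarter` and `Round/Series` as `Stage` is a judgement call. `COLUMN_MAP` and `harmonize` are hypothetical names.

```python
import pandas as pd

# Assumed mapping from each year's column names to the 2021 naming scheme
COLUMN_MAP = {
    "Company Name": "Company_Brand",
    "Company/Brand": "Company_Brand",
    "Industry": "Sector",
    "Round/Series": "Stage",
    "Location": "HeadQuarter",
    "About Company": "What_it_does",
    "What it does": "What_it_does",
    "Amount($)": "Amount",
}


def harmonize(frames):
    """Rename columns to one schema, add a Year column, and stack with a 0..n-1 index."""
    aligned = []
    for year, frame in frames.items():
        frame = frame.rename(columns=COLUMN_MAP)
        # 'column10' in the 2020 table has no documented meaning; drop it if present
        frame = frame.drop(columns=["column10"], errors="ignore")
        aligned.append(frame.assign(Year=year))
    return pd.concat(aligned, ignore_index=True)


# Usage in the notebook:
# combined = harmonize({2018: df_2018, 2019: df_2019, 2020: df_2020, 2021: df_2021})
```

Keeping the year as a column preserves the information that the repeated index labels would otherwise be the only trace of, and the 2018 rows can later be singled out for currency conversion.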


In [9]:
# save dataframe to csv
data.to_csv("Indian_startup_dataset")

In [10]:
# reload the saved csv into a dataframe (index_col=0 restores the saved row index)
data = pd.read_csv("Indian_startup_dataset", index_col=0)
data.head()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Company_Brand,What_it_does,column10
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",,,,,,,,,,,,
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,,,,,,,,,,,,
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,,,,,,,,,,,,
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,,,,,,,,,,,,
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,,,,,,,,,,,,


In [11]:
# view the last five rows
data.tail()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Company_Brand,What_it_does,column10
1204,,,,$3000000,,,,2019.0,Gurugram,Staffing & Recruiting,,"Chirag Mittal, Anirudh Syal",Endiya Partners,,Pre-series A,Gigforce,A gig/on-demand staffing company.,
1205,,,,$20000000,,,,2015.0,New Delhi,Food & Beverages,,Bala Sarda,IIFL AMC,,Series D,Vahdam,VAHDAM is among the world’s first vertically i...,
1206,,,,$55000000,,,,2019.0,Bangalore,Financial Services,,"Arnav Kumar, Vaibhav Singh",Owl Ventures,,Series C,Leap Finance,International education loans for high potenti...,
1207,,,,$26000000,,,,2015.0,Gurugram,EdTech,,Ruchir Arora,"Winter Capital, ETS, Man Capital",,Series B,CollegeDekho,"Collegedekho.com is Student’s Partner, Friend ...",
1208,,,,$8000000,,,,2019.0,Bangalore,Financial Services,,"Vishal Chopra, Himanshu Gupta","3one4 Capital, Kalaari Capital",,Series A,WeRize,India’s first socially distributed full stack ...,


In [12]:
# check the shape of data
data.shape

(2879, 18)

In [13]:
# check for columns in the data
data.columns

Index(['Company Name', 'Industry', 'Round/Series', 'Amount', 'Location',
       'About Company', 'Company/Brand', 'Founded', 'HeadQuarter', 'Sector',
       'What it does', 'Founders', 'Investor', 'Amount($)', 'Stage',
       'Company_Brand', 'What_it_does', 'column10'],
      dtype='object')

In [14]:
# descriptive statistics of data
data.describe(include="all").T

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Company Name,526.0,525.0,TheCollegeFever,2.0,,,,,,,
Industry,526.0,405.0,—,30.0,,,,,,,
Round/Series,526.0,21.0,Seed,280.0,,,,,,,
Amount,2533.0,754.0,—,148.0,,,,,,,
Location,526.0,50.0,"Bangalore, Karnataka, India",102.0,,,,,,,
About Company,526.0,524.0,"TheCollegeFever is a hub for fun, fiesta and f...",2.0,,,,,,,
Company/Brand,89.0,87.0,Kratikal,2.0,,,,,,,
Founded,2110.0,,,,2016.079621,4.368006,1963.0,2015.0,2017.0,2019.0,2021.0
HeadQuarter,2239.0,123.0,Bangalore,764.0,,,,,,,
Sector,2335.0,502.0,FinTech,173.0,,,,,,,


In [16]:
# list the unique sectors
sectors = data["Sector"].unique()
sectors

array([nan, 'Ecommerce', 'Edtech', 'Interior design', 'AgriTech',
       'Technology', 'SaaS', 'AI & Tech', 'E-commerce', 'E-commerce & AR',
       'Fintech', 'HR tech', 'Food tech', 'Health', 'Healthcare',
       'Safety tech', 'Pharmaceutical', 'Insurance technology', 'AI',
       'Foodtech', 'Food', 'IoT', 'E-marketplace', 'Robotics & AI',
       'Logistics', 'Travel', 'Manufacturing', 'Food & Nutrition',
       'Social Media', 'E-Sports', 'Cosmetics', 'B2B', 'Jewellery',
       'B2B Supply Chain', 'Games', 'Food & tech', 'Accomodation',
       'Automotive tech', 'Legal tech', 'Mutual Funds', 'Cybersecurity',
       'Automobile', 'Sports', 'Healthtech', 'Yoga & wellness',
       'Virtual Banking', 'Transportation', 'Transport & Rentals',
       'Marketing & Customer loyalty', 'Infratech', 'Hospitality',
       'Automobile & Technology', 'Banking', 'EdTech',
       'Hygiene management', 'Escrow', 'Networking platform', 'FinTech',
       'Crowdsourcing', 'Food & Bevarages', 'HealthTec
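
The unique values show the same sector spelled several ways (`Edtech`/`EdTech`, `Fintech`/`FinTech`, `Ecommerce`/`E-commerce`, `Foodtech`/`Food tech`), which would split one sector's funding across several labels. A minimal sketch that groups labels differing only in case, spaces or hyphens and relabels each group with its most frequent spelling; semantic duplicates such as `Health` and `Healthcare` would still need a hand-written mapping. `sector_key` and `normalize_sectors` are hypothetical names.

```python
import re

import pandas as pd


def sector_key(label):
    """Lower-case a label and remove spaces and hyphens so spelling variants match."""
    if pd.isna(label):
        return label
    return re.sub(r"[\s\-]+", "", str(label).strip().lower())


def normalize_sectors(sectors):
    """Replace each label with the most frequent original spelling of its key."""
    keys = sectors.map(sector_key)
    canonical = (
        pd.DataFrame({"key": keys, "label": sectors})
        .dropna()
        .groupby("key")["label"]
        .agg(lambda s: s.value_counts().index[0])
    )
    # Missing labels have no entry in `canonical` and stay missing
    return keys.map(canonical)


# Usage in the notebook:
# data["Sector"] = normalize_sectors(data["Sector"])
```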

## Hypothesis Testing
*Hypothesis* - The amount of funding a company receives depends on the sector it operates in.
- Null Hypothesis (H_0) - The funding a company receives does not depend on its sector.
- Alternative Hypothesis (H_a) - The funding a company receives depends on its sector.
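
These hypotheses call for a test that compares funding amounts across more than two sectors at once. The sketch below uses the Kruskal-Wallis H-test from `scipy.stats` (imported above as `stats`), chosen here as an assumption because start-up funding amounts are typically right-skewed; one-way ANOVA (`stats.f_oneway`) is the parametric alternative. It assumes a cleaned frame with a numeric US-dollar `Amount` column and a normalized `Sector` column; `kruskal_by_sector` and the minimum group size of 5 are hypothetical choices, and the toy data is illustrative only.

```python
import pandas as pd
from scipy import stats


def kruskal_by_sector(df, sector_col="Sector", amount_col="Amount", min_group_size=5):
    """Kruskal-Wallis H-test of whether funding amounts differ across sectors."""
    clean = df[[sector_col, amount_col]].dropna()
    # Skip sectors with very few deals; the chi-square approximation behind
    # the p-value is poor for very small groups
    groups = [
        group[amount_col].to_numpy()
        for _, group in clean.groupby(sector_col)
        if len(group) >= min_group_size
    ]
    if len(groups) < 2:
        raise ValueError("need at least two sectors with enough observations")
    result = stats.kruskal(*groups)
    return result.statistic, result.pvalue


# Illustrative toy data only (not from the start-up datasets)
toy = pd.DataFrame({
    "Sector": ["FinTech"] * 5 + ["EdTech"] * 5,
    "Amount": [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})
h_stat, p_value = kruskal_by_sector(toy)
# Reject H_0 at the 5% significance level when p_value < 0.05
```

Because the test works on ranks, a handful of very large rounds cannot by themselves drive the result, which is the main reason to prefer it over ANOVA for funding data.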

### Business Questions
- Which particular sector received the most funding?
- What is the distribution of sectors based on location?
- In which year were the most start-ups founded?
- What is the distribution of start-ups based on their locations?
- What is the distribution of companies based on their funding round (Round/Series)?