## Data-Driven Insight for Indian Startup Growth

        Business Understanding
The goal is to identify the factors that contribute to startup growth in India.
        

    Hypotheses for Data-Driven Insight for Indian Startup Growth 
    
  Null Hypothesis (H0):
There is no significant difference in the growth rates among startups located in different sectors and regions within the Indian startup ecosystem.

  Alternative Hypothesis (H1):
There is a significant difference in the growth rates among startups located in different sectors and regions within the Indian startup ecosystem.

These hypotheses explore whether the growth rates of startups in India are influenced not only by their sector but also by their geographical location. By analyzing data on startup growth, sectors, and regional dynamics, we seek to identify patterns that can provide strategic insights for investors.
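One way a hypothesis of this kind could be tested is sketched below. It is only an illustration under stated assumptions: the notebook's datasets hold funding amounts rather than growth rates, so the `growth` values and `sector` labels are made-up toy data, and `f_statistic` and `permutation_p_value` are hypothetical helper names. The sketch runs a permutation test on the one-way ANOVA F statistic, which does not assume normally distributed values.

```python
import numpy as np
import pandas as pd


def f_statistic(values, groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    frame = pd.DataFrame({"value": values, "group": groups})
    grouped = frame.groupby("group")["value"]
    grand_mean = frame["value"].mean()
    ss_between = (grouped.size() * (grouped.mean() - grand_mean) ** 2).sum()
    ss_within = ((frame["value"] - grouped.transform("mean")) ** 2).sum()
    df_between = grouped.ngroups - 1
    df_within = len(frame) - grouped.ngroups
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_p_value(values, groups, n_permutations=500, seed=0):
    """Share of label shuffles whose F statistic is at least the observed one."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = f_statistic(values, groups)
    exceed = sum(
        f_statistic(rng.permutation(values), groups) >= observed
        for _ in range(n_permutations)
    )
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (exceed + 1) / (n_permutations + 1)


# Toy data (assumption): invented growth rates for three sectors.
toy = pd.DataFrame({
    "sector": ["FinTech"] * 5 + ["EdTech"] * 5 + ["AgriTech"] * 5,
    "growth": [0.9, 1.1, 1.0, 1.2, 0.8,
               0.4, 0.5, 0.6, 0.5, 0.4,
               0.1, 0.2, 0.1, 0.3, 0.2],
})
f_obs, p_value = permutation_p_value(toy["growth"], toy["sector"])
print(f"F = {f_obs:.2f}, p = {p_value:.4f}")
```

A small p-value would count as evidence against H0 for the grouping variable used. Testing sector and region together would need a two-factor design, which this sketch does not cover.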




In [3]:
# Import the necessary libraries
import os, sys
from sqlalchemy import create_engine
import pyodbc  # Database connectivity via Open Database Connectivity (ODBC)

from dotenv import dotenv_values  # Loads environment variables from a .env file
# Libraries for data manipulation, analysis, and visualization
import pandas as pd
import numpy as np
import warnings
import matplotlib.pyplot as plt

warnings.filterwarnings('ignore')

    Load environment variables and assign login credentials

In [4]:
# Load environment variables from the .env file

environment_variables = dotenv_values('.env')

# Get the login credentials from the '.env' file
server = environment_variables.get("SERVER")
database = environment_variables.get("DATABASE")
username = environment_variables.get("UID")
password = environment_variables.get("PWD")
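Because `.get()` returns `None` for a key that is absent from the `.env` file, a missing credential would only surface later as a confusing connection error. The sketch below shows one way to catch that early. `find_missing_keys` and `example_config` are hypothetical names, and a plain dict stands in for the mapping returned by `dotenv_values('.env')`.

```python
REQUIRED_KEYS = ("SERVER", "DATABASE", "UID", "PWD")


def find_missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in required if not config.get(key)]


# Assumption: a plain dict standing in for dotenv_values('.env').
example_config = {
    "SERVER": "example-server",
    "DATABASE": "example_db",
    "UID": "analyst",
}

missing = find_missing_keys(example_config)
if missing:
    print(f"Missing credentials in .env: {', '.join(missing)}")
```

In the notebook, the same check could be applied to `environment_variables` before the connection string is built.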


    Build the connection string

In [5]:

connection = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"
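The f-string above works as long as no credential contains characters that are special in an ODBC connection string, such as `;`. The sketch below shows one possible way to assemble the string more defensively. It assumes the usual ODBC convention of wrapping a value in braces and doubling any closing brace inside it. `odbc_escape` and `build_connection_string` are hypothetical helpers, not part of pyodbc.

```python
def odbc_escape(value):
    """Wrap a value in braces and double any '}' so that ';' cannot end it early."""
    return "{" + str(value).replace("}", "}}") + "}"


def build_connection_string(server, database, username, password,
                            driver="SQL Server"):
    """Assemble an ODBC connection string from its parts."""
    parts = {
        "DRIVER": "{" + driver + "}",
        "SERVER": server,
        "DATABASE": database,
        "UID": username,
        "PWD": odbc_escape(password),  # passwords are the likeliest to need escaping
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Assumption: placeholder credentials, not real ones.
example = build_connection_string(
    "example-server", "example_db", "analyst", "p;ss}word"
)
print(example)
```

The result can be passed to `pyodbc.connect` in the same way as the `connection` string above.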


    Connect to the server

In [6]:
# Connect to the server using pyodbc

con = pyodbc.connect(connection)

      Data loading with login credentials

In [7]:
# SQL queries for the 2020 and 2021 datasets
query1 = "Select * from dbo.LP1_startup_funding2020"
query2 = "Select * from dbo.LP1_startup_funding2021"

In [8]:
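# Read the 2018 and 2019 data from CSV files and the 2020 and 2021 data from the database
# Forward slashes avoid the invalid '\s' escape sequence and also work on Windows
data_2018 = pd.read_csv('dataset/startup_funding2018.csv')
data_2019 = pd.read_csv('dataset/startup_funding2019.csv')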
data_2020 = pd.read_sql(query1, con)
data_2021 = pd.read_sql(query2, con)


    Data Overview 

In [11]:
# 2018 dataset overview
data_2018.head()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...


In [12]:
# 2019 dataset overview
data_2019.tail()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
84,Infra.Market,,Mumbai,Infratech,It connects client requirements to their suppl...,"Aaditya Sharda, Souvik Sengupta","Tiger Global, Nexus Venture Partners, Accel Pa...","$20,000,000",Series A
85,Oyo,2013.0,Gurugram,Hospitality,Provides rooms for comfortable stay,Ritesh Agarwal,"MyPreferred Transformation, Avendus Finance, S...","$693,000,000",
86,GoMechanic,2016.0,Delhi,Automobile & Technology,Find automobile repair and maintenance service...,"Amit Bhasin, Kushal Karwa, Nitin Rana, Rishabh...",Sequoia Capital,"$5,000,000",Series B
87,Spinny,2015.0,Delhi,Automobile,Online car retailer,"Niraj Singh, Ramanshu Mahaur, Ganesh Pawar, Mo...","Norwest Venture Partners, General Catalyst, Fu...","$50,000,000",
88,Ess Kay Fincorp,,Rajasthan,Banking,Organised Non-Banking Finance Company,Rajendra Setia,"TPG, Norwest Venture Partners, Evolvence India","$33,000,000",


In [13]:
# 2020 dataset overview
data_2020.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,


In [14]:
# 2021 dataset overview
data_2021.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


#   Data Cleaning for all datasets

    Check for duplicates in each dataset

In [18]:
# Check for duplicates in the 2018 data
data_2018.duplicated().sum()

1

In [16]:
# View the duplicated rows based on Company Name

duplicates = data_2018[data_2018.duplicated(subset=['Company Name'], keep=False)]
duplicates

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
348,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."


In [22]:
# Drop duplicate rows from the 2018 data
data_2018 = data_2018.drop_duplicates()

In [10]:
data_2019.duplicated().sum()

0
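The duplicate check can be repeated for every year in one pass. The sketch below is a minimal illustration: `duplicate_report` is a hypothetical helper, and the toy frames stand in for `data_2018` through `data_2021`.

```python
import pandas as pd


def duplicate_report(frames):
    """Count fully duplicated rows in each named DataFrame."""
    return pd.Series(
        {name: int(frame.duplicated().sum()) for name, frame in frames.items()},
        name="duplicate_rows",
    )


# Assumption: toy frames standing in for the yearly datasets.
toy_frames = {
    "2018": pd.DataFrame({"Company Name": ["A", "B", "A"], "Amount": [1, 2, 1]}),
    "2019": pd.DataFrame({"Company/Brand": ["C", "D"], "Amount($)": [3, 4]}),
}

report = duplicate_report(toy_frames)
print(report)
```

In the notebook, the call might look like `duplicate_report({"2018": data_2018, "2019": data_2019, "2020": data_2020, "2021": data_2021})`.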

    Check for missing data

In [48]:
# Check which 2018 columns contain missing data
missing_values_2018 = data_2018.isnull().any()
missing_values_2018



Company Name    False
Industry        False
Amount (₹)      False
Location        False
dtype: bool

In [49]:
# Check which 2019 columns contain missing data
missing_values_2019 = data_2019.isnull().any()
missing_values_2019


Company/Brand    False
HeadQuarter      False
Sector            True
Amount ($)       False
dtype: bool

In [37]:
# Count missing values per column for 2020
missing_values_2020 = data_2020.isnull().sum()
missing_values_2020

Company_Brand       0
Founded           213
HeadQuarter        94
Sector             13
What_it_does        0
Founders           12
Investor           38
Amount            254
Stage             464
column10         1053
dtype: int64

In [38]:
# Count missing values per column for 2021
missing_values_2021 = data_2021.isnull().sum()
missing_values_2021

Company_Brand      0
Founded            1
HeadQuarter        1
Sector             0
What_it_does       0
Founders           4
Investor          62
Amount             3
Stage            428
dtype: int64
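Raw counts are hard to compare across datasets of different sizes, so it can help to report the share of missing values as well. The sketch below is one way to do that. `missing_summary` is a hypothetical helper and the toy frame is made up for illustration.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Missing-value count and percentage per column, worst columns first."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(1)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    return summary.sort_values("missing", ascending=False)


# Assumption: a toy frame standing in for one of the yearly datasets.
toy = pd.DataFrame({
    "Company_Brand": ["A", "B", "C", "D"],
    "Amount": [100.0, np.nan, np.nan, 50.0],
    "Stage": [None, None, None, "Seed"],
})

summary = missing_summary(toy)
print(summary)
```

Columns that are mostly empty, such as `column10` in the 2020 data, then stand out as candidates for dropping.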

In [42]:

# Select the columns needed for analysis
# .copy() makes each selection an independent DataFrame, so later column assignments are safe
selected_columns_2018 = data_2018[["Company Name", "Industry", "Amount", "Location"]].copy()
data_2018 = selected_columns_2018

selected_columns_2019 = data_2019[["Company/Brand", "HeadQuarter", "Sector", "Amount($)"]].copy()
data_2019 = selected_columns_2019

selected_columns_2020 = data_2020[["Company_Brand", "Sector", "Amount", "HeadQuarter", "Investor"]].copy()
data_2020 = selected_columns_2020

selected_columns_2021 = data_2021[["Company_Brand", "Sector", "Amount", "HeadQuarter", "Investor"]].copy()
data_2021 = selected_columns_2021
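The four yearly frames use different names for the same fields (`Company Name`, `Company/Brand`, `Company_Brand`, and so on). If they are later combined, a shared schema is needed. The sketch below shows one possible approach. The target names (`Company`, `Sector`, `HeadQuarter`, `Amount`, `Year`) and the helper `combine_years` are assumptions made for illustration, and the toy frames are invented.

```python
import pandas as pd

# Assumed mapping from each year's column names to a shared schema.
COLUMN_MAPS = {
    2018: {"Company Name": "Company", "Industry": "Sector",
           "Location": "HeadQuarter"},
    2019: {"Company/Brand": "Company", "Amount($)": "Amount"},
    2020: {"Company_Brand": "Company"},
    2021: {"Company_Brand": "Company"},
}


def combine_years(frames):
    """Rename columns to the shared schema, tag each row with its year, and stack."""
    tidy = []
    for year, frame in frames.items():
        renamed = frame.rename(columns=COLUMN_MAPS.get(year, {}))
        tidy.append(renamed.assign(Year=year))
    return pd.concat(tidy, ignore_index=True)


# Assumption: one-row toy frames with the column names used in the notebook.
toy_2018 = pd.DataFrame({"Company Name": ["A"], "Industry": ["FinTech"],
                         "Amount": ["250000"], "Location": ["Bangalore"]})
toy_2020 = pd.DataFrame({"Company_Brand": ["B"], "Sector": ["EdTech"],
                         "Amount": [100000.0], "HeadQuarter": ["Pune"],
                         "Investor": ["Example Fund"]})

combined = combine_years({2018: toy_2018, 2020: toy_2020})
print(combined)
```

Columns missing from a given year, such as `Investor` in 2018, come out as `NaN`. The amounts are also still in mixed currencies at this point, so they are not yet comparable across years.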

In [45]:

# Strip the currency symbol from the Amount column: ₹ in 2018, $ in 2019 and 2021
# regex=False treats '$' as a literal character rather than a regex end-of-string anchor

data_2018['Amount'] = data_2018['Amount'].str.replace('₹', '', regex=False)
data_2019['Amount($)'] = data_2019['Amount($)'].str.replace('$', '', regex=False)
# 2020: Amount is already numeric in the overview above, so there is no symbol to strip
# data_2020['Amount'] = data_2020['Amount'].str.replace('$', '', regex=False)
data_2021['Amount'] = data_2021['Amount'].str.replace('$', '', regex=False)

# Rename the Amount columns so each one states its currency
data_2018 = data_2018.rename(columns={'Amount': 'Amount (₹)'})
data_2019 = data_2019.rename(columns={'Amount($)': 'Amount ($)'})
data_2020 = data_2020.rename(columns={'Amount': 'Amount ($)'})
data_2021 = data_2021.rename(columns={'Amount': 'Amount ($)'})
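Stripping the currency symbol still leaves the amounts as strings, with thousands separators and placeholders such as `—`. The sketch below shows one way to convert such a column to numbers. `clean_amount` is a hypothetical helper and the sample values mirror the formats seen in the overviews above. It does not convert between currencies, so rupee and dollar amounts still have to be tracked separately.

```python
import pandas as pd


def clean_amount(series):
    """Strip currency symbols and commas, then convert to numbers.

    Values that cannot be parsed (for example '—' or missing entries)
    become NaN.
    """
    cleaned = (
        series.astype(str)
        .str.replace(r"[₹$,]", "", regex=True)
        .str.strip()
    )
    return pd.to_numeric(cleaned, errors="coerce")


# Assumption: toy values in the formats that appear in the datasets.
toy = pd.Series(["250000", "₹40,000,000", "$1,200,000", "—", None])
amounts = clean_amount(toy)
print(amounts)
```

Using `errors="coerce"` turns unparseable entries into `NaN`, so they can be handled together with the other missing values.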

In [46]:
data_2018.head()

Unnamed: 0,Company Name,Industry,Amount (₹),Location
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",250000,"Bangalore, Karnataka, India"
1,Happy Cow Dairy,"Agriculture, Farming",40000000,"Mumbai, Maharashtra, India"
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",65000000,"Gurgaon, Haryana, India"
3,PayMe India,"Financial Services, FinTech",2000000,"Noida, Uttar Pradesh, India"
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",—,"Hyderabad, Andhra Pradesh, India"


    Handling the missing values