ANALYSIS OF THE INDIAN STARTUP ECOSYSTEM, 2018-2021

## Business Objective
The objective of this project is to analyze the funding data of Indian start-ups from 2018 to 2021 in order to understand the financial landscape of the Indian start-up ecosystem. The primary focus is to identify the sectors that have consistently shown strong growth potential and attracted investment. The analysis is intended to inform strategic decisions about entering or expanding within the ecosystem, so that resources go to the areas with the best prospects for success and return on investment.
1.	Hypothesis 1: Funding Amount Growth
•	Null Hypothesis (H0): The average funding amount for startups in India has remained constant from 2018 to 2021.
•	Alternative Hypothesis (H1): The average funding amount for startups in India has increased from 2018 to 2021.
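
H1 is one-sided (an increase), and the funding amounts in the tables below span several orders of magnitude, from a few thousand to hundreds of millions of dollars. One assumption-light way to test H1 is therefore a one-sided permutation test on the difference in mean funding, which does not require the amounts to be normally distributed. The sketch below uses only the standard library (Python 3.8+ for `statistics.fmean`). The function name and the toy numbers are illustrative and are not part of the original analysis.

```python
import random
import statistics


def permutation_test_mean_increase(before, after, n_permutations=10_000, seed=0):
    """One-sided permutation test of H1: mean(after) > mean(before).

    Returns (observed difference in means, estimated p-value).
    """
    before, after = list(before), list(after)
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = statistics.fmean(after) - statistics.fmean(before)

    pooled = before + after
    n_after = len(after)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Under H0 the year labels are exchangeable, so relabel at random:
        # the first n_after values act as the "after" group.
        diff = statistics.fmean(pooled[:n_after]) - statistics.fmean(pooled[n_after:])
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Toy numbers only, not the startup data
diff, p = permutation_test_mean_increase([1, 2, 3, 4, 5], [10, 11, 12, 13, 14])
print(f"difference in means = {diff}, p-value ~ {p:.4f}")
```

Once the amounts are cleaned and converted to USD, the function could be called with, for example, `data2018['Amount($)'].dropna()` and `data2021['Amount($)'].dropna()`. A small p-value would count as evidence against H0. The 2018 USD figures depend on an assumed exchange rate and a currency heuristic, so any result inherits those assumptions.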

In [1]:
#%pip install pyodbc
#%pip install python-dotenv

In [2]:
#Import necessary packages
import pyodbc 
from dotenv import dotenv_values 
import pandas as pd
import warnings
import numpy as np
warnings.filterwarnings('ignore')

In [3]:
# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('configuration.env')
# Read the database credentials defined in configuration.env
database = environment_variables.get("DATABASE")
server = environment_variables.get("SERVER")
username = environment_variables.get("USERNAME")
password = environment_variables.get("PASSWORD")

# Build the ODBC connection string from the credentials
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"

In [4]:
#Establishing connection to the database
connection = pyodbc.connect(connection_string)
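
`dotenv_values()` returns `None` for a key that is missing from configuration.env. The f-string above would then silently embed the text "None" in the connection string, which makes a failed login hard to trace. A minimal sketch that fails early instead is shown below. It assumes the same four keys and the same `{SQL Server}` driver, and `build_connection_string` is a hypothetical helper, not part of pyodbc or python-dotenv.

```python
def build_connection_string(credentials, driver="SQL Server"):
    """Build the ODBC connection string, failing early if a credential is missing.

    `credentials` is any mapping with SERVER, DATABASE, USERNAME and PASSWORD keys,
    such as the dict returned by dotenv_values().
    """
    required = ("SERVER", "DATABASE", "USERNAME", "PASSWORD")
    missing = [key for key in required if not credentials.get(key)]
    if missing:
        raise ValueError(f"Missing credentials in the .env file: {', '.join(missing)}")
    # Tripled braces in the f-string produce the literal {SQL Server} driver name
    return (
        f"DRIVER={{{driver}}};SERVER={credentials['SERVER']};"
        f"DATABASE={credentials['DATABASE']};UID={credentials['USERNAME']};"
        f"PWD={credentials['PASSWORD']}"
    )


# Toy values only
print(build_connection_string(
    {"SERVER": "myserver", "DATABASE": "mydb", "USERNAME": "me", "PASSWORD": "secret"}
))
```

In the notebook this would be used as `connection_string = build_connection_string(environment_variables)` before `pyodbc.connect(connection_string)`.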

In [5]:
#Getting the 2018 data
data2018 = pd.read_csv('data_sources/startup_funding2018.csv')
data2018.head()

,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...


In [6]:
#Getting the 2019 data
data2019 = pd.read_csv('data_sources/startup_funding2019.csv')
data2019.head()

,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",


In [7]:
#Getting the 2020 data from the server
query = "Select * from dbo.LP1_startup_funding2020"
data2020 = pd.read_sql(query, connection)
data2020.head()

,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,


In [8]:
#Getting the 2021 data from the server
query = "Select * from dbo.LP1_startup_funding2021"
data2021 = pd.read_sql(query, connection)
data2021.head()

,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


## EDA & Clean-up of the 2018 Data

In [9]:
#Getting the 2018 info
data2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
dtypes: object(6)
memory usage: 24.8+ KB


In [10]:
#Checking for the shape of the data in 2018.
data2018.shape

(526, 6)

In [11]:
#Checking for any null values from the 2018 dataset.
Null_values = data2018.isnull().sum()
print(Null_values)

Company Name     0
Industry         0
Round/Series     0
Amount           0
Location         0
About Company    0
dtype: int64


In [12]:
#Check for duplicate values in 2018 data frame
duplicate = data2018[data2018.duplicated()] 
print(duplicate)

        Company Name                                           Industry  \
348  TheCollegeFever  Brand Marketing, Event Promotion, Marketing, S...   

    Round/Series  Amount                     Location  \
348         Seed  250000  Bangalore, Karnataka, India   

                                         About Company  
348  TheCollegeFever is a hub for fun, fiesta and f...  


In [13]:
#Finding out more about the duplicate values.
data2018[data2018['Company Name'] == "TheCollegeFever"]

,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
348,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."


In [14]:
#Dropping duplicate values in the 2018 dataframe (drop_duplicates() returns a new DataFrame, so assign it back)
data2018 = data2018.drop_duplicates()
data2018

,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...
...,...,...,...,...,...,...
521,Udaan,"B2B, Business Development, Internet, Marketplace",Series C,225000000,"Bangalore, Karnataka, India","Udaan is a B2B trade platform, designed specif..."
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...
524,Droni Tech,Information Technology,Seed,"₹35,000,000","Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...


In [15]:
#Check for missing values
data2018.isna().any()

Company Name     False
Industry         False
Round/Series     False
Amount           False
Location         False
About Company    False
dtype: bool

In [16]:
#Dropping the 'About Company' column
data2018.drop(['About Company'], axis=1, inplace=True)

In [17]:
#Cleaning up the Amount column.
# Function to convert Indian Rupees (INR) to USD
def convert_to_usd(amount):
    # Assumes an exchange rate of 0.014 USD per INR, for illustration purposes
    exchange_rate = 0.014
    return amount * exchange_rate

# Function to classify each raw amount by currency and convert it to USD
def classify_and_convert(row):
    amount = row['Amount']
    if '₹' in amount:
        return convert_to_usd(float(amount[1:].replace(',', '')))  # Strip the '₹' sign and commas, then convert to USD
    elif '$' in amount:
        return float(amount[1:].replace(',', ''))  # Strip the '$' sign and commas
    elif '\u2014' in amount:
        return np.nan  # An em dash (U+2014) marks an undisclosed amount
    else:
        # Heuristic: amounts without a currency sign at or above the threshold are assumed to be INR
        threshold = 10000000.0  # Adjust the threshold as needed
        if float(amount) >= threshold:
            return convert_to_usd(float(amount))
        else:
            return float(amount)

#Apply the custom function to create the new column 'Amount($)'
data2018['Amount($)'] = data2018.apply(classify_and_convert, axis=1)
 



In [18]:
#Checking the DataFrame with the new Amount($) column
data2018.head()

,Company Name,Industry,Round/Series,Amount,Location,Amount($)
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India",250000.0
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",560000.0
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",910000.0
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",2000000.0
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",


In [19]:
#Sampling a random row to spot-check the currency conversion
data2018.sample()

,Company Name,Industry,Round/Series,Amount,Location,Amount($)
350,Career Anna,"Commercial, E-Learning, Education",Seed,"₹40,000,000","Bangalore, Karnataka, India",560000.0
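
The `sample()` spot check above tests a single random row per run. A complementary sketch restates the per-row rules as a plain function of one string and asserts it against values that can be read off the 2018 tables above. `amount_to_usd`, `INR_TO_USD` and `INR_THRESHOLD` are hypothetical names that mirror `convert_to_usd` and `classify_and_convert`. The rate and threshold are the same illustrative assumptions.

```python
import math

INR_TO_USD = 0.014          # same illustrative rate as convert_to_usd()
INR_THRESHOLD = 10_000_000  # un-prefixed amounts at or above this are treated as INR


def amount_to_usd(raw):
    """Apply the 2018 cleaning rules to one raw 'Amount' string."""
    if "₹" in raw:
        return float(raw[1:].replace(",", "")) * INR_TO_USD
    if "$" in raw:
        return float(raw[1:].replace(",", ""))
    if "\u2014" in raw:  # an em dash marks an undisclosed amount
        return math.nan
    value = float(raw)
    return value * INR_TO_USD if value >= INR_THRESHOLD else value


# Spot checks against raw/converted pairs visible in the 2018 tables
checks = {
    "₹40,000,000": 560_000.0,  # Happy Cow Dairy, Career Anna
    "250000": 250_000.0,       # TheCollegeFever
    "2000000": 2_000_000.0,    # PayMe India
}
for raw, expected in checks.items():
    assert math.isclose(amount_to_usd(raw), expected), raw
```

Checks like these also make the threshold heuristic visible. An un-prefixed value such as Udaan's 225000000 is above the threshold, so it is converted as rupees to about 3,150,000 USD. Whether that is correct depends on the currency the source actually reported, so large un-prefixed rows may deserve a manual look.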


In [20]:
# Drop the original 'Amount' column now that 'Amount($)' holds the converted values
data2018.drop('Amount', axis=1, inplace=True)

## EDA & Clean-up of the 2019 Data

In [21]:
#EDA on 2019 data
data2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
dtypes: float64(1), object(8)
memory usage: 6.4+ KB


In [22]:
#Checking for null values
data2019.isna().sum()

Company/Brand     0
Founded          29
HeadQuarter      19
Sector            5
What it does      0
Founders          3
Investor          0
Amount($)         0
Stage            46
dtype: int64

In [23]:
#Check for duplicate values
duplicate = data2019[data2019.duplicated()] 
print(duplicate)

Empty DataFrame
Columns: [Company/Brand, Founded, HeadQuarter, Sector, What it does, Founders, Investor, Amount($), Stage]
Index: []


In [25]:
#Removing the '$' signs and commas, replacing 'Undisclosed' with NaN, and converting the column to numbers
data2019['Amount($)'] = pd.to_numeric(data2019['Amount($)'].replace('Undisclosed', np.nan).str.replace('[$,]', '', regex=True))


In [26]:
#Dropping the columns Founded, Founders and What it does
data2019.drop(['Founded','Founders', 'What it does'], axis=1, inplace=True)

In [27]:
data2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   HeadQuarter    70 non-null     object 
 2   Sector         84 non-null     object 
 3   Investor       89 non-null     object 
 4   Amount($)      77 non-null     float64
 5   Stage          43 non-null     object 
dtypes: float64(1), object(5)
memory usage: 4.3+ KB


In [None]:
#Sampling a row of the 2019 data to check that the changes were applied
data2019.sample()

## EDA & Clean-up of the 2020 Data

In [28]:
#Check for the shape of the data
data2020.shape

(1055, 10)

In [29]:
#Checking for the information of the 2020 dataframe
data2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
dtypes: float64(2), object(8)
memory usage: 82.6+ KB


In [30]:
#Checking for null values
data2020.isna().sum()

Company_Brand       0
Founded           213
HeadQuarter        94
Sector             13
What_it_does        0
Founders           12
Investor           38
Amount            254
Stage             464
column10         1053
dtype: int64

In [31]:
#Dropping the rows where Amount is null
data2020.dropna(subset=['Amount'], inplace=True)

In [32]:
#Dropping the columns Founded, Founders, column10 and What_it_does
data2020.drop(['Founded','Founders','column10','What_it_does'], axis=1, inplace=True)

In [33]:
data2020.head()

,Company_Brand,HeadQuarter,Sector,Investor,Amount,Stage
0,Aqgromalin,Chennai,AgriTech,Angel investors,200000.0,
1,Krayonnz,Bangalore,EdTech,GSF Accelerator,100000.0,Pre-seed
3,NCOME,New Delhi,Escrow,"Venture Catalysts, PointOne Capital",400000.0,
4,Gramophone,Indore,AgriTech,"Siana Capital Management, Info Edge",340000.0,
5,qZense,Bangalore,AgriTech,"Venture Catalysts, 9Unicorns Accelerator Fund",600000.0,Seed


In [34]:
#Check for duplicate values
duplicate = data2020[data2020.duplicated()] 
print(duplicate)

    Company_Brand HeadQuarter                 Sector  \
145     Krimanshi     Jodhpur  Biotechnology company   
362        Byju’s   Bangalore                 EdTech   

                                           Investor       Amount Stage  
145  Rajasthan Venture Capital Fund, AIM Smart City     600000.0  Seed  
362           Owl Ventures, Tiger Global Management  500000000.0  None  
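
By default (`keep='first'`), `DataFrame.duplicated()` flags only the later copies of a repeated row. That is why rows 145 and 362 are listed rather than the earlier rows they repeat. The cell above also only prints these rows. They stay in `data2020` unless the result of `drop_duplicates()` is kept, for example `data2020 = data2020.drop_duplicates()`.

The toy sketch below restates the keep-first rule in plain Python. `duplicate_positions` and `drop_repeats` are hypothetical helpers, and `None` stands in for a missing value.

```python
def duplicate_positions(rows):
    """Positions of rows that repeat an earlier row (keep='first' semantics)."""
    seen = set()
    positions = []
    for position, row in enumerate(rows):
        key = tuple(row)
        if key in seen:
            positions.append(position)  # a later copy: flagged
        else:
            seen.add(key)  # first occurrence: kept
    return positions


def drop_repeats(rows):
    """Keep each row's first occurrence and drop the later copies."""
    repeated = set(duplicate_positions(rows))
    return [row for position, row in enumerate(rows) if position not in repeated]


# Toy rows (company, city, amount, stage), not taken from the dataset
rows = [
    ("Company A", "City X", 600000.0, "Seed"),
    ("Company B", "City Y", 500000000.0, None),
    ("Company A", "City X", 600000.0, "Seed"),
    ("Company B", "City Y", 500000000.0, None),
]
print(duplicate_positions(rows))
print(drop_repeats(rows))
```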


## EDA & Clean-up of the 2021 Data

In [36]:
#Dropping the columns Founded, Founders and What_it_does
data2021.drop(['Founded','Founders','What_it_does'], axis=1, inplace=True)

In [38]:
data2021.sample()

,Company_Brand,HeadQuarter,Sector,Investor,Amount,Stage
316,Hubhopper,New Delhi,Podcast,"ITI Growth Opportunities Fund, Unit-E Ventures",Undisclosed,


In [39]:
#Removing '$' signs and commas, turning 'Undisclosed' and other non-numeric labels into NaN, then converting to float
non_numeric_labels = ['Pre-series A','Undisclosed','Upsparks','J0','undisclosed','ah! Ventures','ITO Angel Network LetsVenture', 'LetsVenture','JITO Angel Network','Series C','Seed',' None']
data2021['Amount'] = data2021['Amount'].replace(r'[$,]', '', regex=True).replace(non_numeric_labels, np.nan, regex=True).replace('', np.nan)
data2021['Amount'] = data2021['Amount'].astype(float) #Keeping the float conversion (astype returns a new Series)
data2021['Amount']

0         1200000.0
1       120000000.0
2        30000000.0
3        51000000.0
4         2000000.0
           ...     
1204      3000000.0
1205     20000000.0
1206     55000000.0
1207     26000000.0
1208      8000000.0
Name: Amount, Length: 1209, dtype: float64
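
The hard-coded list of labels has to be extended whenever another stray value, such as an investor name or a funding stage, turns up in the Amount column. An alternative design is to strip '$' and ',' and treat anything that still does not parse as a number as missing. In pandas this idea is usually expressed with `pd.to_numeric(..., errors='coerce')`. The plain-Python sketch below uses a hypothetical helper, `coerce_amount`, and the same approach would apply to the 2019 `Amount($)` column.

```python
import math
import re

CURRENCY_CHARS = re.compile(r"[$,]")  # '$' signs and thousands separators


def coerce_amount(raw):
    """Return raw as a float after stripping '$' and ',', or NaN if it is not numeric."""
    if raw is None:
        return math.nan
    cleaned = CURRENCY_CHARS.sub("", str(raw)).strip()
    try:
        return float(cleaned)
    except ValueError:
        # Stage names, investor names, 'Undisclosed', empty strings and similar labels
        return math.nan


# Raw strings of the kinds handled by the cell above (toy list)
raw_values = ["$1,200,000", "$120,000,000", "Undisclosed", "Seed", " None", "$", None]
print([coerce_amount(value) for value in raw_values])
```

The trade-off is that coercion hides unexpected values without complaint. Before dropping the resulting NaNs, it is worth counting how many the coercion introduces and looking at a few of the raw strings that produced them.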

In [42]:
#Dropping the rows where Amount is null
data2021.dropna(subset=['Amount'], inplace=True)

## Renaming Columns for a Uniform Schema

In [43]:
# Map old column names to new column names, then rename the columns in place
column_mapping_2021 = {'Company_Brand': 'Company_Name', 'Amount': 'Amount($)'}
data2021.rename(columns=column_mapping_2021, inplace=True)

In [44]:
# Map old column names to new column names, then rename the columns in place
column_mapping_2020 = {'Company_Brand': 'Company_Name', 'Amount': 'Amount($)'}
data2020.rename(columns=column_mapping_2020, inplace=True)

In [45]:
# Map old column names to new column names, then rename the columns in place
column_mapping_2019 = {'Company/Brand': 'Company_Name'}
data2019.rename(columns=column_mapping_2019, inplace=True)

In [46]:
#Creating dictionary mapping old column names to new column names
column_mapping_2018 = {'Company Name': 'Company_Name', 'Industry': 'Sector', 'Round/Series':'Series', 'Location':'HeadQuarter'}
# Rename multiple columns using the rename() method
data2018.rename(columns=column_mapping_2018, inplace=True)
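
After the renames, a quick way to check that the four DataFrames really share one schema is to compare their column sets. This plain-Python sketch uses a hypothetical helper, `schema_report`. It works from the column names each DataFrame should have at this point, traced from the info() outputs and the drop and rename cells above. In the notebook, the lists could instead come from `list(df.columns)`.

```python
def schema_report(columns_by_year):
    """Return {year: sorted columns that are not shared by every year}."""
    shared = set.intersection(*(set(cols) for cols in columns_by_year.values()))
    return {year: sorted(set(cols) - shared) for year, cols in columns_by_year.items()}


# Expected columns after the drop and rename cells (traced by hand from the outputs above)
columns_by_year = {
    2018: ["Company_Name", "Sector", "Series", "HeadQuarter", "Amount($)"],
    2019: ["Company_Name", "HeadQuarter", "Sector", "Investor", "Amount($)", "Stage"],
    2020: ["Company_Name", "HeadQuarter", "Sector", "Investor", "Amount($)", "Stage"],
    2021: ["Company_Name", "HeadQuarter", "Sector", "Investor", "Amount($)", "Stage"],
}
print(schema_report(columns_by_year))
```

With the mappings as written, the report shows two mismatches:
- 2018 keeps its funding round under `Series`, while the other years use `Stage`.
- 2018 has no `Investor` column.

If the years are combined next, mapping 'Round/Series' to 'Stage' would align the round column.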