# Indian Start-up Investment Analysis (2018 - 2021)

### Aim:
To assess the investment potential and attractiveness of the Indian startup ecosystem and recommend an optimal course of action.

### Objectives:
 
1. To assess the overall attractiveness of the Indian startup ecosystem based on funding trends and investor activity from 2018 to 2021.
2. To identify key sectors with high potential for investment based on their funding attractiveness and growth prospects.
3. To evaluate the investment opportunities across different stages of startup development and their risk-return profiles.
4. To analyze the geographical distribution of startups and funding to identify strategic investment locations and regional investment disparities.
5. To determine the correlation between funding amounts received by startups and their subsequent performance, providing insights into potential returns on investment and success rates.

### Business Questions:
1. What are the trends in funding amounts for Indian startups from 2018 to 2021? Are there any significant fluctuations or consistent growth patterns observed over this period?

2. Which sectors within the Indian startup ecosystem attracted the highest total funding during the specified timeframe? Are there any emerging sectors that have shown rapid growth in terms of investment?

3. What is the distribution of investment amounts across different stages of startup development (e.g., seed, early-stage, growth)? Are certain stages more favored by investors, and if so, why?

4. How are startups and funding distributed geographically within India? Are there specific regions or cities that have emerged as hubs for startup activity and investment, and are there any notable regional disparities?

5. Is there a correlation between the funding amounts received by startups and their subsequent performance metrics such as revenue growth, user acquisition, or market share? What insights can be gleaned from this correlation in terms of potential returns on investment and success rates?

6. Who are the top investors in the Indian startup ecosystem during the specified period? What sectors do they predominantly invest in, and are there any patterns in their investment strategies?

7. What are the characteristics of successful Indian startups in terms of founding team composition, industry focus, and funding trajectory? Can these characteristics be used to identify potential investment opportunities or predict startup success?

### Hypothesis to Test:
 
Given the goal of assessing the investment potential in the Indian startup ecosystem, we hypothesize that:
 
**Null Hypothesis (H0)**: There is no clear pattern in the funding received by Indian startups from 2018 to 2021, and factors like sector, stage, location, and funding amount do not affect startup success.

**Alternative Hypothesis (H1)**: There is a clear pattern in the funding received by Indian startups from 2018 to 2021, and factors like sector, stage, location, and funding amount affect startup success.

## Import Packages for Analysis

In [84]:
# import relevant packages
import pyodbc
from dotenv import dotenv_values
import pandas as pd
import warnings
import numpy as np

warnings.filterwarnings('ignore')


#### Connect to server for 2020 and 2021 datasets

In [85]:
# load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')

# Get the values for the credentials from .env file
database=environment_variables.get("DATABASE")
server=environment_variables.get("SERVER")
login=environment_variables.get("LOGIN")
password=environment_variables.get("PASSWORD")

# create a connection string
connection_string=f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={login};PWD={password}"

In [86]:
# create connection using the pyodbc method 

connection = pyodbc.connect(connection_string)
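
If a credential is missing from `.env`, `environment_variables.get(...)` returns `None` and the connection string above ends up containing the literal text `None`. A minimal sketch of a guard that fails early instead; the helper name `build_connection_string` and the `REQUIRED_KEYS` tuple are hypothetical and not part of the original notebook:

```python
REQUIRED_KEYS = ("SERVER", "DATABASE", "LOGIN", "PASSWORD")


def build_connection_string(env, driver="SQL Server"):
    """Build an ODBC connection string, failing early on missing credentials."""
    # Collect every key that is absent or empty so the error names all of them
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing values in .env: {', '.join(missing)}")
    return (
        f"DRIVER={{{driver}}};SERVER={env['SERVER']};DATABASE={env['DATABASE']};"
        f"UID={env['LOGIN']};PWD={env['PASSWORD']}"
    )
```

With this in place, the notebook could call `pyodbc.connect(build_connection_string(environment_variables))` and get a readable error before any network call is attempted.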

#### Select tables of interest from the Database

In [87]:
# selecting tables from Database
db_query = ''' SELECT *
            FROM INFORMATION_SCHEMA.TABLES
            WHERE TABLE_TYPE = 'BASE TABLE' '''

#### View tables of interest from the Database for verification purposes

In [88]:
# call selected table from SQL Database
data=pd.read_sql(db_query, connection)

data

Unnamed: 0,TABLE_CATALOG,TABLE_SCHEMA,TABLE_NAME,TABLE_TYPE
0,dapDB,dbo,LP1_startup_funding2021,BASE TABLE
1,dapDB,dbo,LP1_startup_funding2020,BASE TABLE


### Data_2020

In [100]:
# Call DataFrame to understand DataFrame details for 2020
query= "SELECT * FROM dbo.LP1_startup_funding2020"
data_2020 =pd.read_sql(query, connection)

data_2020.head()



Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,


### Data_2021

In [93]:
# Call DataFrame to understand DataFrame details for 2021.
query= "SELECT * FROM dbo.LP1_startup_funding2021"
data_2021 =pd.read_sql(query, connection)

data_2021.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


### Data_2019 

#### Load csv data from other sources for analysis

In [101]:
# Read 2019 DataFrame to understand data structure.
data_2019=pd.read_csv("D:\\JHanson\\Justice Hanson\\DS Career Accelerator\\Project 1\\Indian-Start-up-Investment-Analysis\\CSV Data\\startup_funding2019.csv")

data_2019.head(5)

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",


### Data_2018 

In [102]:
# Read 2018 DataFrame to understand data structure.
data_2018=pd.read_csv("D:\\JHanson\\Justice Hanson\\DS Career Accelerator\\Project 1\\Indian-Start-up-Investment-Analysis\\CSV Data\\startup_funding2018.csv")

data_2018.head(5)

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...
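
The `Amount` column in the 2018 preview mixes plain numbers, rupee-prefixed strings such as `₹40,000,000`, and a dash placeholder for undisclosed amounts. A minimal sketch of splitting such a value into a currency label and a number, before any currency conversion is attempted; the function name `parse_amount` is hypothetical, and values without a symbol are deliberately left with no currency because the data does not state one:

```python
import re


def parse_amount(raw):
    """Return (currency, value) for a raw amount, or (None, None) if undisclosed."""
    text = str(raw).strip()
    if text.startswith("₹"):
        currency = "INR"
    elif text.startswith("$"):
        currency = "USD"
    else:
        currency = None  # no symbol: the currency is not stated in the data
    # Keep only digits and the decimal point (drops symbols, commas, dashes)
    digits = re.sub(r"[^\d.]", "", text)
    try:
        return currency, float(digits)
    except ValueError:
        # Empty or malformed after stripping: treat as undisclosed
        return None, None
```

Applied to a column, this could look like `data_2018["Amount"].map(parse_amount)`, which yields one `(currency, value)` tuple per row.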


#### Print Column Names for Comparison

In [103]:
# Print column names for comparison
print("Column names in data_2018:")
print(data_2018.columns)
print("\n")

# Column names for data_2019
print("Column names in data_2019:")
print(data_2019.columns)
print("\n")

# Column names for data_2020
print("Column names in data_2020:")
print(data_2020.columns)
print("\n")

# Column names for data_2021
print("Column names in data_2021:")
print(data_2021.columns)
print("\n")

Column names in data_2018:
Index(['Company Name', 'Industry', 'Round/Series', 'Amount', 'Location',
       'About Company'],
      dtype='object')


Column names in data_2019:
Index(['Company/Brand', 'Founded', 'HeadQuarter', 'Sector', 'What it does',
       'Founders', 'Investor', 'Amount($)', 'Stage'],
      dtype='object')


Column names in data_2020:
Index(['Company_Brand', 'Founded', 'HeadQuarter', 'Sector', 'What_it_does',
       'Founders', 'Investor', 'Amount', 'Stage', 'column10'],
      dtype='object')


Column names in data_2021:
Index(['Company_Brand', 'Founded', 'HeadQuarter', 'Sector', 'What_it_does',
       'Founders', 'Investor', 'Amount', 'Stage'],
      dtype='object')
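
The printed indexes can also be compared programmatically. A small sketch using plain Python sets, with the column lists copied from the output above; the variable names are illustrative only:

```python
columns_by_year = {
    2018: ["Company Name", "Industry", "Round/Series", "Amount", "Location",
           "About Company"],
    2019: ["Company/Brand", "Founded", "HeadQuarter", "Sector", "What it does",
           "Founders", "Investor", "Amount($)", "Stage"],
    2020: ["Company_Brand", "Founded", "HeadQuarter", "Sector", "What_it_does",
           "Founders", "Investor", "Amount", "Stage", "column10"],
    2021: ["Company_Brand", "Founded", "HeadQuarter", "Sector", "What_it_does",
           "Founders", "Investor", "Amount", "Stage"],
}

# Columns whose exact name appears in every dataset
common = set.intersection(*(set(cols) for cols in columns_by_year.values()))

# Columns whose exact name appears in only one dataset
unique = {
    year: sorted(
        set(cols)
        - set().union(*(set(c) for y, c in columns_by_year.items() if y != year))
    )
    for year, cols in columns_by_year.items()
}

print("Common to all years:", sorted(common))
for year, cols in unique.items():
    print(year, "only:", cols)
```

Because the comparison is on exact names, no column counts as common to all four years, which is the motivation for renaming before merging.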




#### Observations
1. **Inconsistency in Column Names**
Column names differ from year to year, so the datasets cannot be compared or concatenated directly.

2. **Variations in Column Names**
The same information appears under different names across years (e.g., 'Company Name', 'Company/Brand', 'Company_Brand').

3. **Unique Columns**
Some datasets contain columns that are absent from the others (e.g., 'column10' in 2020, or 'Founded', 'Founders' and 'Investor', which are missing from 2018), which complicates merging.

*To address this, column names will be standardized across all datasets, with equivalent columns mapped to a single name. Columns unique to one dataset will be kept or dropped according to their relevance to the analysis.*
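
One way to apply this plan is to keep a single lookup from every raw column name to its standard name, then rename and drop in one pass per dataset. A minimal sketch on toy frames; the helper `standardize_columns`, the added `year` column, the choice of `sector` as the standard name, and the decision to drop `column10` are all assumptions for illustration rather than the notebook's actual steps:

```python
import pandas as pd

# Raw column name -> standard name, covering the variants seen across years
COLUMN_MAP = {
    "Company Name": "company_name", "Company/Brand": "company_name",
    "Company_Brand": "company_name",
    "Industry": "sector", "Sector": "sector",
    "Round/Series": "stage", "Stage": "stage",
    "Amount": "funding_amount", "Amount($)": "funding_amount",
    "Location": "location", "HeadQuarter": "location",
    "About Company": "description", "What it does": "description",
    "What_it_does": "description",
    "Founded": "founded", "Founders": "founders", "Investor": "investor",
}


def standardize_columns(df, year, drop=("column10",)):
    """Return a copy with standard names, unwanted columns dropped, and a year tag."""
    out = df.drop(columns=[c for c in drop if c in df.columns])
    out = out.rename(columns=COLUMN_MAP)
    out["year"] = year  # hypothetical extra column to keep rows traceable
    return out


# Toy frames standing in for two of the yearly datasets
toy_2018 = pd.DataFrame(
    {"Company Name": ["A"], "Industry": ["FinTech"], "Amount": ["250000"]}
)
toy_2020 = pd.DataFrame(
    {"Company_Brand": ["B"], "Sector": ["EdTech"], "Amount": [100000.0],
     "column10": [None]}
)

merged = pd.concat(
    [standardize_columns(toy_2018, 2018), standardize_columns(toy_2020, 2020)],
    ignore_index=True,
)
print(merged.columns.tolist())
```

Keeping the mapping in one dictionary means a newly discovered variant only needs one extra entry, and every dataset goes through the same function.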


#### Column renaming for consistency and merging

In [110]:
# Rename columns in each dataset
data_2018.rename(columns={
    'Company Name': 'company_name',
    'Industry': 'industry',
    'Round/Series': 'stage',
    'Amount': 'funding_amount',
    'Location': 'location',
    'About Company': 'description'
}, inplace=True)

data_2019.rename(columns={
    'Company/Brand': 'company_name',
    'HeadQuarter': 'location',
    'Sector': 'industry',
    'What it does': 'description',
    'Amount($)': 'funding_amount'
}, inplace=True)

data_2020.rename(columns={
    'Company_Brand': 'company_name',
    'What_it_does': 'description',
    'Amount': 'funding_amount'
}, inplace=True)

data_2021.rename(columns={
    'Company_Brand': 'company_name',
    'What_it_does': 'description',
    'Amount': 'funding_amount'
}, inplace=True)

# Merge datasets using the standardized column names
merged_data = pd.concat([data_2018, data_2019, data_2020, data_2021], ignore_index=True)

merged_data


Unnamed: 0,company_name,Sector,stage,funding_amount,location,description,founded,sector,Founders,investor,founders,column10
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",,,,,,
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,,,,,,
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,,,,,,
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,,,,,,
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
2874,Gigforce,,Pre-series A,$3000000,Gurugram,A gig/on-demand staffing company.,2019.0,Staffing & Recruiting,,Endiya Partners,"Chirag Mittal, Anirudh Syal",
2875,Vahdam,,Series D,$20000000,New Delhi,VAHDAM is among the world’s first vertically i...,2015.0,Food & Beverages,,IIFL AMC,Bala Sarda,
2876,Leap Finance,,Series C,$55000000,Bangalore,International education loans for high potenti...,2019.0,Financial Services,,Owl Ventures,"Arnav Kumar, Vaibhav Singh",
2877,CollegeDekho,,Series B,$26000000,Gurugram,"Collegedekho.com is Student’s Partner, Friend ...",2015.0,EdTech,,"Winter Capital, ETS, Man Capital",Ruchir Arora,


In [113]:
# Check if 'Sector' and 'sector' columns exist before merging
if 'Sector' in merged_data.columns and 'sector' in merged_data.columns:
    # Merge data from 'Sector' and 'sector' columns into a single column
    merged_data['sector'] = merged_data['Sector'].combine_first(merged_data['sector'])
    # Drop the original 'Sector' column
    merged_data.drop(columns=['Sector'], inplace=True)
elif 'Sector' in merged_data.columns:
    # Rename 'Sector' column to 'sector'
    merged_data.rename(columns={'Sector': 'sector'}, inplace=True)
elif 'sector' not in merged_data.columns:
    # If neither 'Sector' nor 'sector' column exists, print a warning message
    print("Warning: 'Sector' and 'sector' columns are missing. No action taken.")
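
The merged output above also lists both `Founders` and `founders`, so the same pattern applies to more than one pair of columns. A minimal sketch of a reusable helper built on `combine_first`; the name `coalesce_columns` and the toy data are hypothetical:

```python
import pandas as pd


def coalesce_columns(df, keep, other):
    """Fill gaps in `keep` from `other`, then drop `other`. Returns a new DataFrame."""
    out = df.copy()
    if keep in out.columns and other in out.columns:
        # Values already in `keep` win; `other` only fills missing entries
        out[keep] = out[keep].combine_first(out[other])
        out = out.drop(columns=[other])
    elif other in out.columns:
        # Only the variant exists: rename it to the standard name
        out = out.rename(columns={other: keep})
    return out


toy = pd.DataFrame({
    "Sector": ["FinTech", None],
    "sector": [None, "EdTech"],
    "Founders": [None, "X"],
    "founders": ["Y", None],
})
toy = coalesce_columns(toy, keep="sector", other="Sector")
toy = coalesce_columns(toy, keep="founders", other="Founders")
print(toy)
```

Unlike the in-place cell above, this version returns a new DataFrame, which makes it easier to rerun the cell without depending on earlier notebook state.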



In [114]:
merged_data

Unnamed: 0,company_name,stage,funding_amount,location,description,founded,sector,Founders,investor,founders,column10
0,TheCollegeFever,Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",,"Brand Marketing, Event Promotion, Marketing, S...",,,,
1,Happy Cow Dairy,Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,,"Agriculture, Farming",,,,
2,MyLoanCare,Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,,"Credit, Financial Services, Lending, Marketplace",,,,
3,PayMe India,Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,,"Financial Services, FinTech",,,,
4,Eunimart,Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,,"E-Commerce Platforms, Retail, SaaS",,,,
...,...,...,...,...,...,...,...,...,...,...,...
2874,Gigforce,Pre-series A,$3000000,Gurugram,A gig/on-demand staffing company.,2019.0,Staffing & Recruiting,,Endiya Partners,"Chirag Mittal, Anirudh Syal",
2875,Vahdam,Series D,$20000000,New Delhi,VAHDAM is among the world’s first vertically i...,2015.0,Food & Beverages,,IIFL AMC,Bala Sarda,
2876,Leap Finance,Series C,$55000000,Bangalore,International education loans for high potenti...,2019.0,Financial Services,,Owl Ventures,"Arnav Kumar, Vaibhav Singh",
2877,CollegeDekho,Series B,$26000000,Gurugram,"Collegedekho.com is Student’s Partner, Friend ...",2015.0,EdTech,,"Winter Capital, ETS, Man Capital",Ruchir Arora,
