## INDIAN START-UP FUNDING ANALYSIS

### BUSINESS UNDERSTANDING

BACKGROUND: India has emerged as one of the most dynamic startup ecosystems, attracting significant investment and fostering innovation across various sectors. As our team plans to venture into this market, it is crucial to understand the key trends, investor patterns and sector-specific insights to make informed strategic decisions.


OBJECTIVE: To comprehensively analyze the Indian startup ecosystem from 2018 to 2021, identify key trends and insights, and provide data-driven recommendations for strategic entry and investment opportunities in the Indian startup market.

### UNDERSTANDING THE DATA

In [46]:
#!pip install pyodbc  
#!pip install python-dotenv 

In [47]:
import pyodbc     
from dotenv import dotenv_values    #import the dotenv_values function from the dotenv package
import pandas as pd
import warnings 

warnings.filterwarnings('ignore')

In [48]:
# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('Notebooks/database_connection.env')

# Get the values for the credentials you set in the '.env' file
server = environment_variables.get("SERVER")
database = environment_variables.get("DATABASE")
username = environment_variables.get("USERNAME")
password = environment_variables.get("PASSWORD")
# Build the connection string from the loaded credentials instead of hard-coding them
connection_string = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    f"SERVER={server};"
    f"DATABASE={database};"
    f"UID={username};"
    f"PWD={password};"
)

In [49]:
# Create a connection string
#connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password};MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"
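
The credentials loaded from the `.env` file can be checked before a connection is attempted. The sketch below is one possible way to do that; `build_connection_string` and the placeholder values in `example_env` are hypothetical names introduced here for illustration, and the key names are assumed to match the ones read above (`SERVER`, `DATABASE`, `USERNAME`, `PASSWORD`).

```python
def build_connection_string(env, driver="ODBC Driver 17 for SQL Server"):
    """Assemble an ODBC connection string from a dict of settings.

    Hypothetical helper: raises KeyError if any expected setting is
    missing or empty, so a bad .env file fails early with a clear message.
    """
    required = ("SERVER", "DATABASE", "USERNAME", "PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"Missing settings in .env file: {', '.join(missing)}")
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER={env['SERVER']};"
        f"DATABASE={env['DATABASE']};"
        f"UID={env['USERNAME']};"
        f"PWD={env['PASSWORD']};"
    )


# Placeholder values for illustration only; in the notebook the dict
# would come from dotenv_values(...)
example_env = {
    "SERVER": "example.server.net",
    "DATABASE": "exampleDB",
    "USERNAME": "example_user",
    "PASSWORD": "example_password",
}
example_connection_string = build_connection_string(example_env)
print(example_connection_string)
```

Keeping the string assembly in one function means no credential has to appear in the notebook itself.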



In [50]:
# Use the connect method of the pyodbc library and pass in the connection string.
# This will connect to the server and might take a few seconds to be complete. 
# Check your internet connection if it takes more time than necessary

connection = pyodbc.connect(connection_string)

In [51]:
# SQL queries to get the data from the two tables.
# Note that you will not have permissions to insert, delete or update these database tables.
query1 = "SELECT * FROM dbo.LP1_startup_funding2020"
query2 = "SELECT * FROM dbo.LP1_startup_funding2021"

# Execute the queries and load the data into pandas DataFrames
data_2020 = pd.read_sql(query1, connection)
data_2021 = pd.read_sql(query2, connection)



In [52]:
data_2020.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,


In [53]:
data_2021.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


In [54]:
data_2020.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
dtypes: float64(2), object(8)
memory usage: 82.6+ KB


In [55]:
data_2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What_it_does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount         1206 non-null   object 
 8   Stage          781 non-null    object 
dtypes: float64(1), object(8)
memory usage: 85.1+ KB


In [56]:
data_2020.shape


(1055, 10)

In [57]:
data_2021.shape

(1209, 9)

In [58]:
data_2019 = pd.read_csv(r"C:\Users\magyir\Documents\New folder\Team-Belize-Live-Project-1\Notebooks\startup_funding2019.csv")
data_2019.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",


In [59]:
data_2019.shape


(89, 9)

In [60]:
data_2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
dtypes: float64(1), object(8)
memory usage: 6.4+ KB


In [61]:
data_2018 = pd.read_csv(r"C:\Users\magyir\Documents\New folder\Team-Belize-Live-Project-1\Notebooks\startup_funding2018.csv")
data_2018.head()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...


In [62]:
data_2018.shape

(526, 6)

In [63]:
data_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
dtypes: object(6)
memory usage: 24.8+ KB


In [64]:
# checking for the number of undisclosed amounts in the data
# Function to check for 'Undisclosed' or 'undisclosed' amounts
def count_undisclosed(data, amount_column):
    return data[amount_column].apply(lambda amount: str(amount).strip().lower() == 'undisclosed' if pd.notna(amount) else False).sum()

# Check for 'Undisclosed' or 'undisclosed' amounts in each dataset
undisclosed_2018 = count_undisclosed(data_2018, 'Amount')
undisclosed_2019 = count_undisclosed(data_2019, 'Amount($)')
undisclosed_2020 = count_undisclosed(data_2020, 'Amount')
undisclosed_2021 = count_undisclosed(data_2021, 'Amount')

print(f"Number of 'Undisclosed' amounts in 2018: {undisclosed_2018}")
print(f"Number of 'Undisclosed' amounts in 2019: {undisclosed_2019}")
print(f"Number of 'Undisclosed' amounts in 2020: {undisclosed_2020}")
print(f"Number of 'Undisclosed' amounts in 2021: {undisclosed_2021}")


Number of 'Undisclosed' amounts in 2018: 0
Number of 'Undisclosed' amounts in 2019: 12
Number of 'Undisclosed' amounts in 2020: 0
Number of 'Undisclosed' amounts in 2021: 43


In [65]:
#removing all rows with undisclosed amounts

data_2018 = data_2018[data_2018['Amount'].apply(lambda x: str(x).strip().lower() != 'undisclosed')]
data_2019 = data_2019[data_2019['Amount($)'].apply(lambda x: str(x).strip().lower() != 'undisclosed')]
data_2020 = data_2020[data_2020['Amount'].apply(lambda x: str(x).strip().lower() != 'undisclosed')]
data_2021 = data_2021[data_2021['Amount'].apply(lambda x: str(x).strip().lower() != 'undisclosed')]
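
The same count-and-filter step can also be written with pandas string methods instead of `apply` with a lambda. This is only a sketch on a small made-up frame; `undisclosed_mask` and `sample` are hypothetical names, not part of the notebook.

```python
import pandas as pd


def undisclosed_mask(series):
    """Return a boolean mask that is True where the value reads 'undisclosed'.

    Values are cast to strings first, so floats and missing values
    never match.
    """
    return series.astype(str).str.strip().str.lower().eq("undisclosed")


# Made-up values covering the cases seen in the data: currency strings,
# padded/mixed-case 'Undisclosed', missing values and plain numbers
sample = pd.DataFrame(
    {"Amount": ["$1,000", " Undisclosed ", None, "undisclosed", 250000]}
)

mask = undisclosed_mask(sample["Amount"])
print(f"Number of 'Undisclosed' amounts: {mask.sum()}")

# Keep only the rows whose amount is not 'undisclosed'
filtered = sample[~mask].reset_index(drop=True)
print(filtered)
```

One mask serves both purposes: `mask.sum()` gives the count and `~mask` selects the rows to keep.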




In [66]:
# Define exchange rates for each year
exchange_rates = {
    2018: 68,   # 1 USD = 68 INR in 2018
    2019: 70,   # 1 USD = 70 INR in 2019
    2020: 74,   # 1 USD = 74 INR in 2020
    2021: 75    # 1 USD = 75 INR in 2021
}

# Function to convert amount to dollars based on the year
def convert_to_dollars(amount, year):
    # Missing values cannot be converted
    if pd.isna(amount):
        return None

    # Numeric amounts are assumed to be in dollars already
    if isinstance(amount, (int, float)):
        return float(amount)

    amount = str(amount).replace(',', '').replace('—', '').replace(' ', '').strip()
    if amount == '' or amount.lower() == 'undisclosed':
        return None

    try:
        if amount.startswith('$'):
            return float(amount.replace('$', ''))
        elif amount.startswith('₹'):
            return float(amount.replace('₹', '')) / exchange_rates[year]
        return float(amount)  # If no currency symbol, assume it's already in dollars
    except ValueError:
        return None  # Return None if conversion fails


# Add a 'Year' column to each dataset
data_2018['Year'] = 2018
data_2019['Year'] = 2019
data_2020['Year'] = 2020
data_2021['Year'] = 2021





### MERGING THE DATASETS

In [67]:


# Standardize column names for 2018 data to match others
data_2018.columns = ['Company/Brand', 'Sector', 'Stage', 'Amount', 'HeadQuarter', 'What it does', 'Year']
data_2018['Founded'] = None  # Add missing columns with None values
data_2018['Founders'] = None
data_2018['Investor'] = None

# Select relevant columns from 2018 data
data_2018 = data_2018[['Company/Brand', 'Founded', 'Sector', 'What it does', 'Amount', 'Stage', 'Year']]

# Standardize columns for 2019, 2020, and 2021 data
data_2019 = data_2019[['Company/Brand', 'Founded','Sector', 'What it does', 'Amount($)', 'Stage', 'Year']]
data_2020 = data_2020[['Company_Brand', 'Founded', 'Sector', 'What_it_does', 'Amount','Stage', 'Year']]
data_2021 = data_2021[['Company_Brand', 'Founded', 'Sector', 'What_it_does', 'Amount','Stage', 'Year']]

# Rename columns to maintain consistency
data_2019.columns = ['Company/Brand', 'Founded', 'Sector', 'What it does', 'Amount', 'Stage', 'Year']
data_2020.columns = ['Company/Brand', 'Founded', 'Sector', 'What it does', 'Amount', 'Stage', 'Year']
data_2021.columns = ['Company/Brand', 'Founded', 'Sector', 'What it does', 'Amount', 'Stage', 'Year']

# Combine all data into a single DataFrame based on common columns
combined_data = pd.concat([data_2018, data_2019, data_2020, data_2021], ignore_index=True)

# Display the first few rows of the combined dataset
combined_data.head()




Unnamed: 0,Company/Brand,Founded,Sector,What it does,Amount,Stage,Year
0,TheCollegeFever,,"Brand Marketing, Event Promotion, Marketing, S...","TheCollegeFever is a hub for fun, fiesta and f...",250000,Seed,2018
1,Happy Cow Dairy,,"Agriculture, Farming",A startup which aggregates milk from dairy far...,"₹40,000,000",Seed,2018
2,MyLoanCare,,"Credit, Financial Services, Lending, Marketplace",Leading Online Loans Marketplace in India,"₹65,000,000",Series A,2018
3,PayMe India,,"Financial Services, FinTech",PayMe India is an innovative FinTech organizat...,2000000,Angel,2018
4,Eunimart,,"E-Commerce Platforms, Retail, SaaS",Eunimart is a one stop solution for merchants ...,—,Seed,2018
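
The column standardization above could also be expressed as a rename mapping per year, followed by a reindex to the common columns. The sketch below runs on two tiny made-up frames; `RENAME_MAPS`, `COMMON_COLUMNS` and `standardize` are hypothetical names, and the sample rows are invented for illustration.

```python
import pandas as pd

COMMON_COLUMNS = ["Company/Brand", "Founded", "Sector", "What it does",
                  "Amount", "Stage", "Year"]

# Per-year mapping from the original column names to the common names
RENAME_MAPS = {
    2018: {"Company Name": "Company/Brand", "Industry": "Sector",
           "Round/Series": "Stage", "About Company": "What it does"},
    2019: {"Amount($)": "Amount"},
    2020: {"Company_Brand": "Company/Brand", "What_it_does": "What it does"},
    2021: {"Company_Brand": "Company/Brand", "What_it_does": "What it does"},
}


def standardize(frame, year):
    """Rename columns, add the Year column and keep only the common columns.

    reindex() drops extra columns (e.g. Location) and creates any missing
    ones (e.g. Founded for 2018) filled with NaN.
    """
    renamed = frame.rename(columns=RENAME_MAPS[year]).assign(Year=year)
    return renamed.reindex(columns=COMMON_COLUMNS)


# Invented one-row frames that mimic the 2018 and 2021 layouts
frames = {
    2018: pd.DataFrame({"Company Name": ["Example Co"], "Industry": ["FinTech"],
                        "Round/Series": ["Seed"], "Amount": ["250000"],
                        "Location": ["Mumbai"], "About Company": ["A sample"]}),
    2021: pd.DataFrame({"Company_Brand": ["Sample Labs"], "Founded": [2019.0],
                        "Sector": ["EdTech"], "What_it_does": ["A sample"],
                        "Amount": ["$1,200,000"], "Stage": ["Series A"]}),
}

combined = pd.concat(
    [standardize(frame, year) for year, frame in frames.items()],
    ignore_index=True,
)
print(combined)
```

Renaming by name rather than by position avoids silently mislabelling a column if the source files change their column order.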


In [68]:
combined_data.tail()

Unnamed: 0,Company/Brand,Founded,Sector,What it does,Amount,Stage,Year
2819,Gigforce,2019.0,Staffing & Recruiting,A gig/on-demand staffing company.,$3000000,Pre-series A,2021
2820,Vahdam,2015.0,Food & Beverages,VAHDAM is among the world’s first vertically i...,$20000000,Series D,2021
2821,Leap Finance,2019.0,Financial Services,International education loans for high potenti...,$55000000,Series C,2021
2822,CollegeDekho,2015.0,EdTech,"Collegedekho.com is Student’s Partner, Friend ...",$26000000,Series B,2021
2823,WeRize,2019.0,Financial Services,India’s first socially distributed full stack ...,$8000000,Series A,2021


In [69]:
# Display information about the combined dataset
print(combined_data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2824 entries, 0 to 2823
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  2824 non-null   object 
 1   Founded        2059 non-null   float64
 2   Sector         2806 non-null   object 
 3   What it does   2824 non-null   object 
 4   Amount         2567 non-null   object 
 5   Stage          1911 non-null   object 
 6   Year           2824 non-null   int64  
dtypes: float64(1), int64(1), object(5)
memory usage: 154.6+ KB
None
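
The `Amount` column is still of `object` dtype in the combined data, mixing dollar strings, rupee strings, plain numbers and placeholders. A minimal sketch of a year-aware conversion applied row by row is shown below. It assumes the same approximate exchange rates as the `exchange_rates` dictionary defined earlier; `to_usd`, `INR_PER_USD` and the `Amount_USD` column are hypothetical names, and the sample rows are made up.

```python
import pandas as pd

# Approximate INR per USD for each year, mirroring exchange_rates above
INR_PER_USD = {2018: 68, 2019: 70, 2020: 74, 2021: 75}


def to_usd(amount, year):
    """Convert one raw amount to a float in dollars, or None if not possible."""
    if pd.isna(amount):
        return None
    if isinstance(amount, (int, float)):
        return float(amount)  # numeric values are assumed to be dollars
    text = str(amount).replace(",", "").replace(" ", "").strip()
    # Rupee amounts are divided by that year's rate; everything else by 1
    rate = INR_PER_USD[year] if text.startswith("₹") else 1.0
    try:
        return float(text.lstrip("$₹")) / rate
    except ValueError:
        return None  # placeholders such as '—' or 'Undisclosed'


# Made-up rows covering the formats seen in the yearly datasets
sample = pd.DataFrame({
    "Amount": ["₹68,000,000", "$1,200,000", 200000.0, "—", None],
    "Year": [2018, 2021, 2020, 2018, 2019],
})

converted = [to_usd(a, y) for a, y in zip(sample["Amount"], sample["Year"])]
sample["Amount_USD"] = pd.Series(converted, index=sample.index, dtype="float64")
print(sample)
```

Writing the result to a new column keeps the raw `Amount` values available for checking any conversions that came back empty.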


### ANALYTICAL QUESTIONS

1. Which sectors received the most funding from 2018 to 2021?
     a. Analyze the distribution of funding across various sectors and determine the proportion of funding each sector received.
     b. Highlight any emerging sectors that have seen significant growth in funding during this period.

2. How has the total funding amount changed over the years from 2018 to 2021?
     a. Identify trends and patterns in annual funding such as periods of rapid growth or decline.
     b. Investigate any external factors or events that may have influenced changes in funding levels during these years.

3. Is there a significant difference in the total amount of funding received by technology-related startups compared to non-technology-related startups?
     a. Compare the total funding amounts received by technology-related startups with those received by non-technology-related startups.
     b. Perform statistical analysis to determine if the differences in the funding amounts are significant.

4. How do funding amounts differ across sectors?
     a. Perform a detailed analysis of the funding amounts across different sectors.
     b. Calculate the mean, median and range of funding amounts for different sectors.
     c. Investigate the distribution of funding amounts within sectors to understand if funding is concentrated among a few startups or more evenly distributed.
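
The per-sector figures asked for in questions 1 and 4 (total, share of funding, mean, median and range) can be sketched with a single `groupby`. The example below uses a small made-up frame and assumes the amounts have already been converted to a numeric dollar column, here called `Amount_USD` (a hypothetical name).

```python
import pandas as pd

# Made-up funding rows; Amount_USD is assumed to be numeric already
funding = pd.DataFrame({
    "Sector": ["FinTech", "FinTech", "EdTech", "EdTech", "AgriTech"],
    "Amount_USD": [1_000_000.0, 3_000_000.0, 500_000.0, 1_500_000.0, 4_000_000.0],
})

# Named aggregation: one output column per statistic
sector_stats = funding.groupby("Sector")["Amount_USD"].agg(
    total="sum", mean="mean", median="median", minimum="min", maximum="max"
)

# Range of funding within each sector
sector_stats["range"] = sector_stats["maximum"] - sector_stats["minimum"]

# Share of overall funding that each sector received, in percent
sector_stats["share_pct"] = 100 * sector_stats["total"] / sector_stats["total"].sum()

sector_stats = sector_stats.sort_values("total", ascending=False)
print(sector_stats)
```

A large gap between a sector's mean and median would hint that its funding is concentrated in a few large deals.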

### HYPOTHESIS

Null Hypothesis (H0) - The sector in which a company operates has no significant impact on the funding amount it receives.

Alternative Hypothesis (H1) - The sector in which a company operates has a significant impact on the funding amount it receives.
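
One possible way to test this hypothesis is a Kruskal-Wallis test, which compares the distributions of funding amounts across sectors without assuming normality. The notebook does not specify a test, so this choice is an assumption, as are the 0.05 significance level, the `Amount_USD` column name and the made-up sample values.

```python
import pandas as pd
from scipy.stats import kruskal

# Made-up funding amounts for three sectors (illustration only)
funding = pd.DataFrame({
    "Sector": ["AgriTech"] * 4 + ["EdTech"] * 4 + ["FinTech"] * 4,
    "Amount_USD": [1.0, 2.0, 3.0, 4.0,
                   10.0, 11.0, 12.0, 13.0,
                   100.0, 110.0, 120.0, 130.0],
})

# One array of amounts per sector, with missing values dropped
groups = [
    group["Amount_USD"].dropna().to_numpy()
    for _, group in funding.groupby("Sector")
]

statistic, p_value = kruskal(*groups)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha

print(f"H statistic: {statistic:.3f}, p-value: {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

With real data, sectors that have only one or two funded startups would probably need to be grouped or excluded before running the test.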

## EXPLORATORY DATA ANALYSIS

In [70]:
# Check for missing values
missing_values = combined_data.isnull().sum()
print("Missing Values in the Dataset")
print(missing_values)

Missing Values in the Dataset
Company/Brand      0
Founded          765
Sector            18
What it does       0
Amount           257
Stage            913
Year               0
dtype: int64
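
Raw counts are easier to judge next to percentages: in the output above, 913 missing `Stage` values out of 2824 rows is about 32% of the data. A sketch of a combined count-and-percentage summary follows, using a small made-up frame; `missing_summary` is a hypothetical name.

```python
import pandas as pd

# Made-up frame with different amounts of missing data per column
sample = pd.DataFrame({
    "Sector": ["FinTech", None, "EdTech", "AgriTech"],
    "Amount": ["$1,000", None, None, "$5,000"],
    "Stage": [None, None, None, "Seed"],
})

# isnull().mean() gives the fraction of missing values per column
missing_summary = pd.DataFrame({
    "missing_count": sample.isnull().sum(),
    "missing_pct": (sample.isnull().mean() * 100).round(1),
}).sort_values("missing_pct", ascending=False)

print(missing_summary)
```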


In [71]:
# Display the 18 rows with missing values in the Sector column

missing_sector_rows = combined_data[combined_data['Sector'].isnull()]
missing_sector_rows


Unnamed: 0,Company/Brand,Founded,Sector,What it does,Amount,Stage,Year
560,VMate,,,A short video platform,"$100,000,000",,2019
567,Awign Enterprises,2016.0,,It supplies workforce to the economy,"$4,000,000",Series A,2019
570,TapChief,2016.0,,It connects individuals in need of advice in a...,"$1,500,000",Pre series A,2019
572,KredX,,,Invoice discounting platform,"$26,000,000",Series B,2019
573,m.Paani,,,It digitizes and organises local retailers,"$5,500,000",Series A,2019
1121,Text Mercato,2015.0,,Cataloguing startup that serves ecommerce plat...,649600.0,Series A,2020
1172,Magicpin,2015.0,,"It is a local discovery, rewards, and commerce...",7000000.0,Series D,2020
1290,Leap Club,,,Community led professional network for women,340000.0,Pre seed round,2020
1302,Juicy Chemistry,2014.0,,It focuses on organic based skincare products,650000.0,Series A,2020
1310,Magicpin,2015.0,,"It is a local discovery, rewards, and commerce...",3879000.0,,2020
