# Indian Startup Funding Analysis

## Business Understanding

### Background

A *'Startup'* in India is characterized by its age, size, and funding method, though there is no strict definition. Generally, a startup is a young company, only a few years old, that has not yet achieved consistent revenue. These companies operate on a small scale, often with just a working prototype or a paid pilot, but they have the potential for rapid growth and expansion. Initially, they are funded through the founders' personal networks, including friends and family, and they actively seek additional financing to support their growth and establish a sustainable business.

As an example, the Government of India's startup program defines a startup as a company (PB 2017) that:
-  Is headquartered in India, with no more than 10 years since incorporation or registration
-  Has an annual turnover of less than INR 1 billion (roughly $14 million)

Source: [ADBI Working Paper Series](https://www.adb.org/publications/policy-regulatory-changes-successful-startup-revolution-india)


#### Scenario
You are trying to venture into the Indian startup ecosystem. As the data expert of the team, you are to investigate the ecosystem and propose the best course of action.

## Business Objective

The goal of this project is to investigate the Indian startup ecosystem and provide valuable insight into its opportunities and challenges. The findings, drawn from analyzing the dataset covering 2018 to 2021, are meant to help stakeholders who plan to venture into the ecosystem make informed decisions.

### Hypothesis

Null: The funding amount received by tech startups does not differ significantly from that received by non-tech startups.

Alternative: The funding amount received by tech startups differs significantly from that received by non-tech startups.
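
One way to check this hypothesis is a two-sided permutation test on the difference in mean funding between the two groups. The sketch below is only an illustration under stated assumptions: the funding amounts are made up, the function name `permutation_test` is hypothetical, and the 0.05 significance level is a conventional choice rather than something fixed by this project. The test actually used in the analysis may differ.

```python
import random


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the absolute difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled amounts to the two groups
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimated p-value above zero
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical funding amounts in USD millions, for illustration only
tech_amounts = [1.2, 5.0, 0.8, 12.0, 3.5, 7.1]
non_tech_amounts = [0.9, 2.0, 1.1, 4.2, 0.6, 3.0]

p_value = permutation_test(tech_amounts, non_tech_amounts)
reject_null = p_value < 0.05  # assumed significance level
print(f"p-value: {p_value:.4f}, reject null: {reject_null}")
```

A permutation test makes no normality assumption, which may suit funding amounts because they tend to be heavily skewed.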

### Business Questions
1. What are the funding trends over the years?
2. Which sectors received the most funding?
3. Who are the key investors in the Indian start-up space?
4. What are the differences in funding by region?
5. What strategic recommendations can be made based on the analysis?

## Data Understanding

<span style="background-color: grey"><strong>Data Collection</strong></span>

To effectively analyze the Indian startup ecosystem from 2018 to 2021, comprehensive data collection is crucial.
The data was sourced from multiple datasets that detail startup funding activities within this period. Each dataset covers aspects essential for a holistic understanding of the funding landscape. Specifically, the datasets include:

**A. Startup Details**
- Company/Brand: Name of the company/startup
- Founded: Year the company was founded
- Headquarters/Location: Geographical location of the startup, including city and region
- Sector/Industry: The industry or sector in which the company operates, e.g. healthtech, fintech
- What it does/About company: Description of the company
- Founders: Founders of the company

**B. Funding Information**
- Amount: The total amount of funding received by the startup in each funding round
- Stage/Round: Details of the funding stage, such as Seed, Series A, Series B

**C. Investors Information**
- Investors: The names of the investors or investment firms involved

<span style="background-color: grey"><strong>Data Quality Considerations</strong></span>

Ensuring high data quality is paramount for reliable analysis and actionable insights. Key considerations for maintaining data quality include:

**A. Completeness**

Ensuring that all necessary fields are filled across the datasets. This includes making sure that no critical information is missing for any of the startups, funding rounds, or investors.
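
A completeness check can be as simple as computing the share of missing values per field. The sketch below uses plain Python on a few made-up records. The field names, the sample values, and the helper `missing_share` are assumptions for illustration, and treating both `None` and empty strings as missing is a choice that may need adjusting for the real data.

```python
# Hypothetical records; None and empty strings are treated as missing
records = [
    {"Company/Brand": "Alpha", "Sector": "Fintech", "Amount": 1_000_000},
    {"Company/Brand": "Beta", "Sector": None, "Amount": None},
    {"Company/Brand": "Gamma", "Sector": "Healthtech", "Amount": ""},
]


def missing_share(rows, fields):
    """Return the fraction of rows with a missing value, per field."""
    total = len(rows)
    return {
        field: sum(1 for row in rows if row.get(field) in (None, "")) / total
        for field in fields
    }


report = missing_share(records, ["Company/Brand", "Sector", "Amount"])
for field, share in report.items():
    print(f"{field}: {share:.0%} missing")
```

With the data loaded into a pandas DataFrame `df`, `df.isna().mean()` gives a comparable per-column summary, although it does not count empty strings as missing.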

**B. Consistency**

Handling any inconsistencies in data entry. This may involve standardizing entries for sectors, locations, and investor names to ensure uniformity. For example, "Health tech" and "Health Technology" were standardized to a single format.
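
The standardization step can be sketched as a lookup against an alias table after normalizing case and whitespace. The alias table, the canonical names, and the function `standardize_sector` below are hypothetical. The real mapping depends on the labels that actually occur in the datasets.

```python
import re

# Hypothetical alias table; keys are lowercase with single spaces
SECTOR_ALIASES = {
    "health tech": "Healthtech",
    "health technology": "Healthtech",
    "healthtech": "Healthtech",
    "fin tech": "Fintech",
    "financial technology": "Fintech",
    "fintech": "Fintech",
}


def standardize_sector(raw):
    """Map a free-text sector label to a canonical name."""
    if raw is None:
        return None
    # Lowercase, trim, and collapse runs of whitespace before the lookup
    key = re.sub(r"\s+", " ", raw.strip().lower())
    # Labels without an alias are kept, in title case, rather than dropped
    return SECTOR_ALIASES.get(key, key.title())


for label in ["Health tech", "  HEALTH   Technology ", "edtech"]:
    print(repr(label), "->", standardize_sector(label))
```

The same function could be applied to a pandas column with `Series.map`, and the same pattern extends to locations and investor names with their own alias tables.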

**C. Handling Missing Data**

Identifying and addressing missing data points. Techniques such as imputation, where appropriate, or excluding certain records when the missing data is minimal and does not affect the overall analysis, will be employed.
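
The two options mentioned here, imputation and dropping records, can be sketched as follows. This is a minimal illustration: median imputation is one possible technique among several, the 5% threshold for "minimal" is an assumed value, and both helper names are hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to base an imputation on
    fill = median(observed)
    return [fill if v is None else v for v in values]


def drop_if_minimal(rows, field, max_missing_share=0.05):
    """Drop rows missing `field`, but only if they are a small share of all rows."""
    missing = [row for row in rows if row.get(field) is None]
    if rows and len(missing) / len(rows) <= max_missing_share:
        return [row for row in rows if row.get(field) is not None]
    return list(rows)  # too many gaps to drop safely; leave rows unchanged


# Hypothetical funding amounts with gaps
amounts = [100_000, None, 300_000, 500_000, None]
imputed = impute_median(amounts)
print(imputed)
```

The median is used instead of the mean because a few very large funding rounds would otherwise pull the imputed value upward.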






### Import all necessary libraries



In [26]:
# A package for creating a connection
%pip install pyodbc

Note: you may need to restart the kernel to use updated packages.





In [27]:
# Database Library
import pyodbc

# Suppress all warnings
from warnings import filterwarnings
filterwarnings('ignore')


# Set display options
import pandas as pd
# pd.set_option("display.max_rows", 100)
# pd.set_option("display.max_columns", 100)

### Create a connection using SQLAlchemy

In [23]:
import os
from dotenv import load_dotenv

# Load the credentials from the local .env file into environment variables
load_dotenv('.env')

# Read the connection settings
host = os.getenv('host')
database_name = os.getenv('database_name')
username1 = os.getenv('username1')
password = os.getenv('password')

In [27]:
print(database_name)

dapDB


In [33]:
from sqlalchemy import create_engine, text

conn_strn = f"mssql+pyodbc://{username1}:{password}@{host}/{database_name}?driver=ODBC+Driver+18+for+SQL+Server"

engine1 = create_engine(conn_strn)

In [34]:
engine1

Engine(mssql+pyodbc://LP1_learner:***@dap-projects-database.database.windows.net/dapDB?driver=ODBC+Driver+18+for+SQL+Server)
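
The connection string above interpolates the credentials directly into the URL. If the password contains characters such as `@`, `:` or `/`, the URL can be misparsed, so it is common to URL-encode the credentials first. The sketch below shows one way to do that with the standard library. The helper `build_mssql_url` and the sample credentials are made up for illustration and are not part of the original notebook.

```python
from urllib.parse import quote_plus


def build_mssql_url(username, password, host, database,
                    driver="ODBC Driver 18 for SQL Server"):
    """Build an mssql+pyodbc URL with URL-encoded credentials and driver name."""
    return (
        f"mssql+pyodbc://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}/{database}?driver={quote_plus(driver)}"
    )


# Hypothetical credentials, chosen to include characters that need encoding
url = build_mssql_url("demo_user", "p@ss:w/rd",
                      "example.database.windows.net", "demoDB")
print(url)
```

The resulting string can be passed to `create_engine` in the same way as `conn_strn`.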

In [30]:
#"SELECT*FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE='BASE TABLE'"

In [35]:
with engine1.connect() as connection:
    result = connection.execute(text("SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE='BASE TABLE'"))

    # Fetch results
    rows = result.fetchall()

    for row in rows:
        print(row)
  


OperationalError: (pyodbc.OperationalError) ('08001', '[08001] [Microsoft][ODBC Driver 18 for SQL Server]TCP Provider: Timeout error [258].  (258) (SQLDriverConnect); [08001] [Microsoft][ODBC Driver 18 for SQL Server]Login timeout expired (0); [08001] [Microsoft][ODBC Driver 18 for SQL Server]Unable to complete login process due to delay in prelogin response (258)')
(Background on this error at: https://sqlalche.me/e/20/e3q8)
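
A login timeout like the one above can be transient, in which case retrying the connection after a short wait may succeed. The helper below is a generic sketch of retrying with exponential backoff. The name `with_retries`, the number of attempts, and the delays are assumptions, and the demo uses a simulated failure instead of a real database call.

```python
import time


def with_retries(operation, attempts=3, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call operation(), retrying with exponential backoff when it fails."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))


# Demo with a simulated flaky operation; the sleep is stubbed out
demo_calls = []


def flaky_query():
    demo_calls.append(1)
    if len(demo_calls) < 3:
        raise TimeoutError("simulated login timeout")
    return ["row 1", "row 2"]


demo_rows = with_retries(flaky_query, retry_on=(TimeoutError,),
                         sleep=lambda seconds: None)
print(demo_rows, "after", len(demo_calls), "attempts")
```

In the notebook, the query cell could be wrapped in a function and passed as `operation`, with `retry_on=(sqlalchemy.exc.OperationalError,)`. If the timeout persists across retries, the cause is more likely network access or server availability than the code.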

# Show 2021 table
# SELECT * FROM LP1_startup_funding2021

In [36]:
with engine1.connect() as connection:
    result1 = connection.execute(text("SELECT * FROM LP1_startup_funding2021"))

    # Fetch results
    rows1 = result1.fetchall()

    for row in rows1:
        print(row)