# Title: Indian Startup Funding Analysis

In [1]:
# CRISP- DM FRAMEWORK - 6 steps
# Business understanding 
    # Background of the problem 
    # Scenarios
    # Business objective - reason
    # Hypothesis
    # Business questions

# Data Understanding
    # Data collection
    # - columns and various observations
    # - complete, consistency, handling missing values, 
    # Import all necessary libraries
    # Connect to the load / load the data
    # Apply all necessary transformation

# Data Preparation
    # Scikit learn library - ml
    # normalize a data
    # standardize a data

# Modelling 
    # selct the algorithm to create a model

# Evaluation
     # Performance of the model
    # Evaluation metrics
    # Hyperparameter tuning - grid search & parameter grid

# Deployment 
    # - streamlit, Swagger UI, 



## Business Understanding

Background

A "startup" is typically characterized by its age, size, and funding method, though there isn't a strict definition. Generally, a startup is a young company, only a few years old, that hasn't yet achieved consistent revenue. These companies operate on a small scale, often with just a working prototype or a paid pilot, but they have the potential for rapid growth and expansion. Initially, they are funded by the founders' personal networks, including friends and family, and they actively seek additional financing to support their growth and establish a sustainable business.

As an example, the Government of India’s Startup India program defines a “startup” as a company (PIB 2017) that is:

1. Headquartered in India with not more than ten years since incorporation or registration

2. Having an annual turnover of less than INR 1 billion (roughly $14 million) (Startup India 2019)

Available: ADBI Working Paper Series       


Scenario

Your team is trying to venture into the Indian start-up ecosystem. As the data expert of the team, you are to investigate the ecosystem and propose the best course of action.     


Business Objective:

The goal of this project is to investigate the Indian Startup Ecosystem to better understand and provide valuable insight into the opportunities and challenges in the ecosystem to help stakeholders who plan on venturing into the startup ecosystem in India to make informed decisions based on findings from analyzing the dataset from 2018 to 2021.


Hypothesis       

Null: Funding amount received by tech startups does not differ significantly from non-tech startups funding received.    

Alternative: Funding amount received by tech startups does differ significantly from non-tech startups funding received.

Business Questions  

    What is the total disclosed amount of funding received from investors from 2018 to 2021?    
    What is the trend for the number of startups that received a disclosed amount of funding from 2018-2021?   
    What is the total disclosed amount of funding for startups based on their sector?       
    What are the three locations that received the most disclosed funding amount?   
    What are the three locations that received the least disclosed funding amount?    

# Data Understanding 

In [2]:
%pip install pyodbc

Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip is available: 24.3.1 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [3]:
# importing the pyodbc library after installation

import pyodbc
import pandas as pd

In [4]:
pip list

Package            Version
------------------ -----------
asttokens          3.0.0
certifi            2025.1.31
charset-normalizer 3.4.1
colorama           0.4.6
comm               0.2.2
contourpy          1.3.1
cycler             0.12.1
debugpy            1.8.9
decorator          5.1.1
executing          2.1.0
fonttools          4.56.0
idna               3.10
ipykernel          6.29.5
ipython            8.30.0
jedi               0.19.2
jupyter_client     8.6.3
jupyter_core       5.7.2
kiwisolver         1.4.8
matplotlib         3.10.0
matplotlib-inline  0.1.7
mysqlclient        2.2.7
nest-asyncio       1.6.0
packaging          24.2
pandas             2.2.3
parso              0.8.4
pillow             11.1.0
pip                24.3.1
platformdirs       4.3.6
prompt_toolkit     3.0.48
psutil             6.1.0
pure_eval          0.2.3
Pygments           2.18.0
pymssql            2.3.2
pyodbc             5.2.0
pyparsing          3.2.1
python-dateutil    2.9.0.post0
pytz               2025.



In [23]:
# data collection

from sqlalchemy import create_engine, text, inspect

# Database credentials 
host="dap-projects-database.database.windows.net";
database_name="dapDB";
username="LP1_learner";
password="Hyp0th3s!$T3$t!ng";

# create the connection string 

connct_str = f"mssql+pyodbc://{username}:{password}@{host}/{database_name}?driver=ODBC+Driver+18+for+SQL+Server"

# ctreate connection / engine
engine = create_engine(connct_str)


In [27]:
engine

Engine(mssql+pyodbc://LP1_learner:***@dap-projects-database.database.windows.net/dapDB?driver=ODBC+Driver+18+for+SQL+Server)

In [28]:
# # using the inspect to check the tables

t = inspect(engine)

tables = t.get_table_names()
print(f"tables in databeses:", tables)

InterfaceError: (pyodbc.InterfaceError) ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
(Background on this error at: https://sqlalche.me/e/20/rvf5)

In [None]:
# the pyodbc is a library that is used for open source databases 
%pip install pyodbc

Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip is available: 24.3.1 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [None]:
# collecting the data 

from sqlalchemy import create_engine, text

#database credentials 
host = "dap-projects-database.database.windows.net";
username = "LP1_learner";
password = "Hyp0th3s!$T3$t!ng";
database_name = "dapDB";

# create connection / engine to database 
engine = create_engine(f"mssql+pyodbc://{username}:{password}@{host}/{database_name}?driver=ODBC+Driver+18+for+SQL+Server")

In [None]:
engine

Engine(mssql+pyodbc://LP1_learner:***@dap-projects-database.database.windows.net/dapDB?driver=ODBC+Driver+18+for+SQL+Server)

In [None]:
# # using the inspect to check the tables

# t = inspect(engine)

# tables = t.get_table_names()
# print(f"tables in databeses:", tables)

In [None]:
# load the data from the mssql database

query = "SELECT * FROM LP1_startup_funding2020"

In [None]:
# fetching data 

with engine.connect() as connection:
    result = connection.execute(text(query))

    data = result.fetchall()



NameError: name 'engine' is not defined

In [None]:
# this the lines below converts the data in text format above to a dataframe 

import pandas as pd #this is to import the pandas library 
dp = pd.DataFrame(data) #this code is to convert the data

NameError: name 'data' is not defined

In [None]:
# loading the data in the LP1_startup_funding2020 table 

with engine.connect() as connection:
    result1  = connection.execute(text("select * from LP1_startup_funding2020"))

rows1 = result1.fetchall()

for row in rows1:
    print(row)

DBAPIError: (pyodbc.Error) ('HY010', '[HY010] [Microsoft][ODBC Driver 18 for SQL Server]Function sequence error (0) (SQLFetch)')
(Background on this error at: https://sqlalche.me/e/20/dbapi)

In [None]:
# Load a data

query = "SELECT * FROM LP1_startup_funding2021"

In [None]:
with engine.connect() as connection:
    result = connection.execute(text(query))

    data = result.fetchall()

DBAPIError: (pyodbc.Error) ('HY000', '[HY000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Cannot open server " dap-projects-database.database.windows.net" requested by the login.  The login failed. (40532) (SQLDriverConnect); [HY000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Cannot open server " dap-projects-database.database.windows.net" requested by the login.  The login failed. (40532)')
(Background on this error at: https://sqlalche.me/e/20/dbapi)

In [None]:
# convert to a data frame
import pandas as pd

df = pd.DataFrame(data)

df

In [None]:
pip freeze

altair==5.4.1
annotated-types==0.7.0
anyio @ file:///C:/ci_311/anyio_1676425491996/work/dist
argon2-cffi @ file:///opt/conda/conda-bld/argon2-cffi_1645000214183/work
argon2-cffi-bindings @ file:///C:/ci_311/argon2-cffi-bindings_1676424443321/work
arrow==1.3.0
astor==0.8.1
asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work
async-lru @ file:///C:/b/abs_e0hjkvwwb5/croot/async-lru_1699554572212/work
attrs @ file:///C:/b/abs_35n0jusce8/croot/attrs_1695717880170/work
Babel @ file:///C:/ci_311/babel_1676427169844/work
backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work
beautifulsoup4 @ file:///C:/b/abs_0agyz1wsr4/croot/beautifulsoup4-split_1681493048687/work
bleach @ file:///opt/conda/conda-bld/bleach_1641577558959/work
blinker==1.8.2
Brotli @ file:///C:/ci_311/brotli-split_1676435766766/work
bs4==0.0.2
cachetools==5.5.0
catboost==1.2.7
certifi @ file:///C:/b/abs_91u83siphd/croot/certifi_1700501720658/work/certifi
cffi @ file:///C:/b/abs_924gv1kxzj/croot/cffi_1

In [None]:
import pyodbc

d = pyodbc.drivers()

d

['SQL Server',
 'SQL Server Native Client RDA 11.0',
 'ODBC Driver 17 for SQL Server',
 'MySQL ODBC 8.3 ANSI Driver',
 'MySQL ODBC 8.3 Unicode Driver',
 'Devart ODBC Driver for SQL Server',
 'ODBC Driver 18 for SQL Server',
 'Microsoft Access Driver (*.mdb, *.accdb)',
 'Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)',
 'Microsoft Access Text Driver (*.txt, *.csv)',
 'Microsoft Access dBASE Driver (*.dbf, *.ndx, *.mdx)']