## Title: Indian Startup Funding Analysis

## Business Understanding

### Background
 A "startup" is typically characterized by its age, size, and funding method, though there isn't a strict definition. Generally, a startup is a young company, only a few years old, that hasn't yet achieved consistent revenue. These companies operate on a small scale, often with just a working prototype or a paid pilot, but they have the potential for rapid growth and expansion. Initially, they are funded by the founders' personal networks, including friends and family, and they actively seek additional financing to support their growth and establish a sustainable business.

As an example, the Government of India’s Startup India program defines a “startup” as a company (PIB 2017) that is:

1. Headquartered in India, with not more than ten years since incorporation or registration

2. Having an annual turnover of less than INR 1 billion (roughly $14 million) (Startup India 2019)

`Available:` [ADBI Working Paper Series](https://www.adb.org/publications/startup-environment-and-funding-activity-india)
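
The two criteria above can be written as a simple check. This is a minimal sketch: the function name `meets_startup_criteria` and its parameters are illustrative assumptions, not part of any official API, and it encodes only the two conditions listed here.

```python
from datetime import date

# Thresholds taken from the definition quoted above.
MAX_AGE_YEARS = 10                  # not more than ten years since incorporation
MAX_TURNOVER_INR = 1_000_000_000    # annual turnover below INR 1 billion


def meets_startup_criteria(incorporated_year, annual_turnover_inr,
                           headquartered_in_india, as_of_year=None):
    """Return True if a company satisfies both criteria listed above.

    Hypothetical helper for illustration only.
    """
    if as_of_year is None:
        as_of_year = date.today().year
    age_years = as_of_year - incorporated_year
    return bool(
        headquartered_in_india
        and 0 <= age_years <= MAX_AGE_YEARS
        and annual_turnover_inr < MAX_TURNOVER_INR
    )
```

For instance, a company incorporated in 2015 with a turnover of INR 500 million would pass the check when evaluated as of 2021, while one incorporated in 2005 would not.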

### Scenario
Your team is trying to venture into the Indian start-up ecosystem. As the data expert of the team, you are to investigate the ecosystem and propose the best course of action. 


## Business Objective:
The goal of this project is to investigate the Indian startup ecosystem using funding data from 2018 to 2021. The analysis aims to surface the opportunities and challenges in the ecosystem so that stakeholders planning to enter it can make informed decisions.

<h3>Hypothesis</h3>

<h3>Business Questions</h3>

## Step 2: Data Understanding

`Data Collection`

Analyzing the Indian start-up ecosystem from 2018 to 2021 requires data on the funding activity in that period. The data will be sourced from multiple datasets covering start-up funding during these years. The datasets include the following fields:

**A. Start-up Details:**

-   Company/Brand: The name of the company or start-up.

-   Founded: The year the start-up was founded.

-   Headquarters/Location: The geographical location of the start-up, including city and region.

-   Sector/Industry: The industry or sector in which the start-up operates, such as health tech, fintech, etc.

-   What it does/About Company: A short description of what the company does.

-   Founders: The names of the company's founders.

**B. Funding Information:**

-   Amount: The total amount of funding received by the start-up in each funding round.

-   Stage/Round: Details of the funding stages such as seed, series A, series B, etc.

**C. Investors' Information:**

-   Investors: The names of the investors or investment firms involved.
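
The slash-separated names above (for example, Company/Brand) suggest that the same field can appear under different column names in different datasets. One way to reconcile them before combining the data is a rename map. The sketch below is illustrative: the canonical names and the raw spellings are assumptions derived from the list above and should be checked against the actual column headers of each dataset.

```python
# Hypothetical mapping: canonical field name -> raw column names it may appear under.
CANONICAL_COLUMNS = {
    "company": ("Company/Brand", "Company", "Brand"),
    "founded": ("Founded",),
    "headquarters": ("Headquarters", "Location"),
    "sector": ("Sector", "Industry"),
    "description": ("What it does", "About Company"),
    "founders": ("Founders",),
    "amount": ("Amount",),
    "stage": ("Stage", "Round"),
    "investors": ("Investors",),
}

# Reverse lookup: case-insensitive raw name -> canonical name.
_ALIAS_LOOKUP = {
    alias.casefold(): canonical
    for canonical, aliases in CANONICAL_COLUMNS.items()
    for alias in aliases
}


def build_rename_map(raw_columns):
    """Map each recognised raw column name to its canonical name.

    Unrecognised columns are left out so they can be reviewed manually.
    """
    rename_map = {}
    for raw in raw_columns:
        key = raw.strip().casefold()
        if key in _ALIAS_LOOKUP:
            rename_map[raw] = _ALIAS_LOOKUP[key]
    return rename_map
```

With pandas, the resulting dictionary can be passed to `DataFrame.rename(columns=...)` for each dataset before the datasets are concatenated.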

`Data Quality Considerations`

Ensuring high data quality is paramount for reliable analysis and actionable insights. Key considerations for maintaining data quality include:

**A. Completeness:**

Ensure that all necessary fields are filled in across the datasets. This includes making sure that no critical information is missing for any of the start-ups, funding rounds, or investors.

**B. Consistency:**

Handle any inconsistencies in data entries. This may involve standardizing entries for sectors, locations, and investor names to ensure uniformity. For example, variations in how sectors are labeled (e.g., "Healthtech" vs. "Health Technology") should be standardized to a single format.
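
A minimal sketch of this kind of standardization is shown below. The synonym table is a hypothetical starting point, not a list derived from the data; in practice it would be built by inspecting the distinct sector values in the datasets.

```python
import re

# Hypothetical synonym table: normalised variant -> standard label.
SECTOR_SYNONYMS = {
    "health technology": "healthtech",
    "health tech": "healthtech",
    "financial technology": "fintech",
    "fin tech": "fintech",
}


def standardize_sector(label):
    """Normalise case, whitespace, and separators, then apply known synonyms."""
    # Treat runs of whitespace, hyphens, and underscores as a single space.
    cleaned = re.sub(r"[\s\-_]+", " ", label).strip().casefold()
    return SECTOR_SYNONYMS.get(cleaned, cleaned)
```

Both "Healthtech" and "Health Technology" map to the same label, `healthtech`, under this table. Labels that are not in the table are only lower-cased and trimmed, so new variants remain visible for review.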

**C. Handling Missing Data:**

Identify and address missing data points. Techniques such as imputation, where appropriate, or excluding certain records if the missing data is minimal and does not impact overall analysis, will be employed.
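
The decision between dropping and imputing can be guided by the share of missing values in each field. The sketch below works on plain lists of dictionaries to stay self-contained. The placeholder tokens and the 5% threshold are assumptions chosen for illustration and should be tuned to the data.

```python
# Hypothetical placeholder strings that should count as missing.
MISSING_TOKENS = {"", "nan", "none", "n/a"}


def is_missing(value):
    """Return True for None, NaN, and placeholder strings."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN never equals itself
        return True
    return isinstance(value, str) and value.strip().casefold() in MISSING_TOKENS


def missing_share(records, fields):
    """Fraction of records in which each field is missing."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(is_missing(record.get(field)) for record in records) / total
        for field in fields
    }


def plan_missing_data(shares, drop_threshold=0.05):
    """Suggest an action per field based on its missing share."""
    plan = {}
    for field, share in shares.items():
        if share == 0:
            plan[field] = "complete"
        elif share <= drop_threshold:
            plan[field] = "drop_rows"       # few gaps: excluding records is low-impact
        else:
            plan[field] = "impute_or_flag"  # many gaps: dropping would lose too much data
    return plan
```

In pandas, the per-column shares can be obtained with `df.isna().mean()`, after placeholder strings have been converted to missing values.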


In [14]:
%pip install sqlalchemy pyodbc

Collecting pyodbc
  Downloading pyodbc-5.2.0-cp312-cp312-win_amd64.whl.metadata (2.8 kB)
Downloading pyodbc-5.2.0-cp312-cp312-win_amd64.whl (69 kB)
Installing collected packages: pyodbc
Successfully installed pyodbc-5.2.0
Note: you may need to restart the kernel to use updated packages.





In [16]:
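import warnings

import pandas as pd

# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

# Display settings: show all rows and up to 11 columns
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", 11)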

In [17]:
from sqlalchemy import create_engine, text, inspect

In [18]:
host = "dap-projects-database.database.windows.net"
database_name = "dapDB"
username = "LP1_learner"
password = "Hyp0th3s!$T3$t!ng"

In [21]:
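# Build the SQLAlchemy connection URL for SQL Server via the pyodbc driver
conn_string = f"mssql+pyodbc://{username}:{password}@{host}/{database_name}?driver=ODBC+Driver+18+for+SQL+Server"

# Create the engine, then use an inspector to list the tables in the database
engine = create_engine(conn_string)
tables = inspect(engine)
my_tables = tables.get_table_names()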
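
The connection string above embeds the credentials directly in the notebook. A common alternative is to read them from environment variables and percent-encode the username and password, so that characters such as `@`, `/`, or `:` cannot break the URL. The sketch below uses only the standard library; the function name `build_mssql_url` and the environment variable names (`DB_HOST`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`) are hypothetical and not part of the original notebook.

```python
import os
from urllib.parse import quote, quote_plus


def build_mssql_url(host, database, username, password,
                    driver="ODBC Driver 18 for SQL Server"):
    """Build a mssql+pyodbc URL with percent-encoded credentials."""
    user_enc = quote(username, safe="")
    password_enc = quote(password, safe="")
    # quote_plus turns spaces into "+", matching the driver format used above.
    driver_enc = quote_plus(driver)
    return f"mssql+pyodbc://{user_enc}:{password_enc}@{host}/{database}?driver={driver_enc}"


# Hypothetical environment variable names; the defaults are placeholders only.
conn_string_from_env = build_mssql_url(
    host=os.environ.get("DB_HOST", "localhost"),
    database=os.environ.get("DB_NAME", "example_db"),
    username=os.environ.get("DB_USER", "example_user"),
    password=os.environ.get("DB_PASSWORD", ""),
)
```

The resulting string can be passed to `create_engine` in the same way as `conn_string`. Keeping credentials out of the notebook also makes it safer to share or commit to version control.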