## INDIAN STARTUP FUNDING ANALYSIS

### BUSINESS UNDERSTANDING

BACKGROUND: India has emerged as one of the most dynamic startup ecosystems, attracting significant investment and fostering innovation across various sectors. As our team plans to venture into this market, it is crucial to understand the key trends, investor patterns and sector-specific insights to make informed strategic decisions.


OBJECTIVE: To comprehensively analyze the Indian startup ecosystem from 2018 to 2021, identify key trends and insights, and provide data-driven recommendations for strategic entry and investment opportunities in the Indian startup market.

### UNDERSTANDING THE DATA

In [1]:
!pip install pyodbc  
!pip install python-dotenv 



In [2]:
import pyodbc     
from dotenv import dotenv_values    #import the dotenv_values function from the dotenv package
import pandas as pd
import warnings 

warnings.filterwarnings('ignore')

In [3]:
# Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('database_connection.env')

# Get the values for the credentials you set in the '.env' file
server = environment_variables.get("SERVER")
database = environment_variables.get("DATABASE")
username = environment_variables.get("USERNAME")
password = environment_variables.get("PASSWORD")

In [4]:
# Create a connection string
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password};MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"

In [5]:
# Use the connect method of the pyodbc library and pass in the connection string.
# This will connect to the server and might take a few seconds to complete.
# Check your internet connection if it takes more time than necessary

connection = pyodbc.connect(connection_string)

OperationalError: ('08001', '[08001] [Microsoft][ODBC SQL Server Driver][DBNETLIB]SQL Server does not exist or access denied. (17) (SQLDriverConnect); [08001] [Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionOpen (Connect()). (53); [08001] [Microsoft][ODBC SQL Server Driver]Invalid connection string attribute (0)')
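
One possible contributor to the `08001` error above is a credential that never loaded: if the `.env` file is not found or a key is misspelled, `.get(...)` returns `None` and the f-string renders it literally as `SERVER=None`. The sketch below shows a guard that fails early in that case. The helper name `build_connection_string` is hypothetical (it is not part of pyodbc or python-dotenv), the values are placeholders, and the string it builds simply mirrors the one used in this notebook.

```python
REQUIRED_KEYS = ("SERVER", "DATABASE", "USERNAME", "PASSWORD")


def build_connection_string(env):
    """Build the ODBC connection string, failing early on missing values.

    `env` is any mapping, e.g. the result of dotenv_values(...).
    Hypothetical helper: it only wraps the f-string used in this notebook.
    """
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError(f"Missing credentials in .env file: {', '.join(missing)}")
    return (
        "DRIVER={SQL Server};"
        f"SERVER={env['SERVER']};"
        f"DATABASE={env['DATABASE']};"
        f"UID={env['USERNAME']};"
        f"PWD={env['PASSWORD']};"
        "MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"
    )


# Example with placeholder values (not real credentials)
example_env = {"SERVER": "example-server", "DATABASE": "example_db",
               "USERNAME": "user", "PASSWORD": "secret"}
connection_string = build_connection_string(example_env)
```

A guard like this only rules out missing values; an unreachable server or a firewall would still surface as the same driver error.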

In [None]:
# SQL queries to fetch the 2020 and 2021 funding tables.
# Note that you will not have permission to insert, delete or update this database table.
query1 = "SELECT * FROM dbo.LP1_startup_funding2020"
query2 = "SELECT * FROM dbo.LP1_startup_funding2021"

# Execute the queries and load the data into pandas DataFrames
data_2020 = pd.read_sql(query1, connection)
data_2021 = pd.read_sql(query2, connection)



In [None]:
data_2020.head()

In [None]:
data_2021.head()

In [None]:
data_2020.info()


In [None]:
data_2021.info()

In [None]:
data_2020.shape


In [None]:
data_2021.shape

In [None]:
data_2019 = pd.read_csv(r"C:\Users\magyir\Documents\New folder\Team-Belize-Live-Project-1\Notebooks\startup_funding2019.csv")
data_2019.head()

In [None]:
data_2019.shape


In [None]:
data_2019.info()

In [None]:
data_2018 = pd.read_csv(r"C:\Users\magyir\Documents\New folder\Team-Belize-Live-Project-1\Notebooks\startup_funding2018.csv")
data_2018.head()

In [None]:
data_2018.shape

In [None]:
data_2018.info()
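
Before harmonising the four files, it can help to list which column names are shared and which differ by year. The sketch below uses the 2019 to 2021 column names exactly as this notebook selects them in its cleaning step; the real frames may contain further columns, and `compare_columns` is a hypothetical helper name. In the notebook itself, the dictionary could be built from `list(data_2019.columns)` and so on.

```python
# Column names as this notebook references them for each year
columns_by_year = {
    2019: ['Company/Brand', 'Founded', 'Sector', 'What it does', 'Amount($)', 'Stage'],
    2020: ['Company_Brand', 'Founded', 'Sector', 'What_it_does', 'Amount', 'Stage'],
    2021: ['Company_Brand', 'Founded', 'Sector', 'What_it_does', 'Amount', 'Stage'],
}


def compare_columns(columns_by_year):
    """Return (names shared by every year, names not shared, keyed by year)."""
    all_sets = {year: set(cols) for year, cols in columns_by_year.items()}
    common = set.intersection(*all_sets.values())
    extras = {year: sorted(cols - common) for year, cols in all_sets.items()}
    return sorted(common), extras


common, extras = compare_columns(columns_by_year)
print("Shared by all years:", common)
for year, names in extras.items():
    print(year, "needs renaming:", names)
```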

### DATA CLEANING AND PROCESSING

In [None]:


# Standardize column names for 2018 data to match others
data_2018.columns = ['Company/Brand', 'Sector', 'Stage', 'Amount', 'HeadQuarter', 'What it does']
data_2018['Founded'] = None  # Placeholder so the 2018 data has the same columns as the other years

# Select relevant columns from 2018 data
data_2018 = data_2018[['Company/Brand', 'Founded', 'Sector', 'What it does', 'Amount', 'Stage']]

# Standardize columns for 2019, 2020, and 2021 data
data_2019 = data_2019[['Company/Brand', 'Founded','Sector', 'What it does', 'Amount($)', 'Stage']]
data_2020 = data_2020[['Company_Brand', 'Founded', 'Sector', 'What_it_does', 'Amount','Stage']]
data_2021 = data_2021[['Company_Brand', 'Founded', 'Sector', 'What_it_does', 'Amount','Stage']]

# Rename columns to maintain consistency
data_2019.columns = ['Company/Brand', 'Founded', 'Sector', 'What it does', 'Amount', 'Stage']
data_2020.columns = ['Company/Brand', 'Founded', 'Sector', 'What it does', 'Amount', 'Stage']
data_2021.columns = ['Company/Brand', 'Founded', 'Sector', 'What it does', 'Amount', 'Stage']

# Combine all data into a single DataFrame based on common columns
combined_data = pd.concat([data_2018, data_2019, data_2020, data_2021], ignore_index=True)

# Display the first few rows of the combined dataset
combined_data.head()
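
The combined frame above keeps no record of which year each row came from, and the year-over-year question in this notebook needs that. A minimal sketch of one way to retain it is shown below, using small made-up frames in place of the real data. The `Year` column name is an assumption, not something the original notebook defines.

```python
import pandas as pd

# Toy stand-ins for the per-year frames; values are illustrative only
frames_by_year = {
    2018: pd.DataFrame({"Company/Brand": ["A"], "Amount": [100]}),
    2019: pd.DataFrame({"Company/Brand": ["B", "C"], "Amount": [200, 300]}),
}

# Tag each frame with its source year, then stack them into one frame
tagged = [frame.assign(Year=year) for year, frame in frames_by_year.items()]
combined = pd.concat(tagged, ignore_index=True)
print(combined)
```

In the notebook, the dictionary would map 2018 through 2021 to `data_2018` through `data_2021` after their columns have been renamed.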




In [None]:
combined_data.tail()

In [None]:
# Display information about the combined dataset
combined_data.info()
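
If the summary reports `Amount` as an `object` column, the values will need converting to numbers before any totals can be computed. The sketch below assumes (this is not confirmed by the notebook) that amounts appear as text such as `'$1,200,000'` or as placeholders such as `'Undisclosed'`. `parse_amount` is a hypothetical helper; it does not attempt currency conversion, so anything it cannot read as a plain number becomes NaN.

```python
import math
import re


def parse_amount(value):
    """Convert a raw amount to float; return NaN when no number can be read.

    Assumes values look like '$1,200,000' or '500000'. Other currencies
    and free-text placeholders are returned as NaN rather than guessed.
    """
    if value is None:
        return math.nan
    if isinstance(value, (int, float)):
        return float(value)
    cleaned = re.sub(r"[,$\s]", "", str(value))  # drop commas, dollar signs, spaces
    try:
        return float(cleaned)
    except ValueError:
        return math.nan


for raw in ["$1,200,000", "500000", "Undisclosed", None]:
    print(repr(raw), "->", parse_amount(raw))
```

Applied to the notebook's data, this could be used as `combined_data['Amount'].map(parse_amount)`, after checking the actual formats present in the column.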

### ANALYTICAL QUESTIONS

1. Which sectors received the most funding from 2018 to 2021?
     a. Analyze the distribution of funding across sectors and determine the proportion of total funding each sector received.
     b. Highlight any emerging sectors that have seen significant growth in funding during this period.

2. How has the total funding amount changed over the years from 2018 to 2021?
     a. Identify trends and patterns in annual funding, such as periods of rapid growth or decline.
     b. Investigate any external factors or events that may have influenced changes in funding levels during these years.

3. Is there a significant difference in the total amount of funding received by Technology related startups compared to Non-Technology related startups?
     a. Compare the total funding amounts received by Technology related startups with those received by Non-Technology related startups.
     b. Perform statistical analysis to determine if the differences in the funding amounts are significant.

4. How do funding amounts differ across sectors?
     a. Perform a detailed analysis of the funding amounts across different sectors.
     b. Calculate the mean, median and range of funding amounts for different sectors.
     c. Investigate the distribution of funding amounts within sectors to understand whether funding is concentrated among a few startups or more evenly distributed.
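
The aggregations behind questions 1, 2 and 4 can be sketched with pandas as follows. The rows are made up for illustration, and the sketch assumes a numeric `Amount` column and a `Year` column, neither of which the combined dataset is guaranteed to have at this point in the notebook.

```python
import pandas as pd

# Illustrative rows only; not taken from the real dataset
toy = pd.DataFrame({
    "Sector": ["FinTech", "FinTech", "EdTech", "EdTech", "HealthTech"],
    "Year": [2020, 2021, 2020, 2021, 2021],
    "Amount": [100.0, 300.0, 50.0, 150.0, 400.0],
})

# Questions 1 and 4: total, mean, median and extremes per sector
by_sector = toy.groupby("Sector")["Amount"].agg(["sum", "mean", "median", "min", "max"])
by_sector["share"] = by_sector["sum"] / by_sector["sum"].sum()  # proportion of all funding
by_sector["range"] = by_sector["max"] - by_sector["min"]

# Question 2: total funding per year
by_year = toy.groupby("Year")["Amount"].sum()

print(by_sector.sort_values("sum", ascending=False))
print(by_year)
```

Comparing the mean with the median per sector gives a first hint about concentration: a mean far above the median suggests a few large rounds dominate that sector.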

### HYPOTHESIS

Null Hypothesis (H0) - The sector in which a company operates has no significant impact on the funding amount it receives.

Alternative Hypothesis (H1) - The sector in which a company operates has a significant impact on the funding amount it receives.
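
The notebook does not name a statistical test for these hypotheses. One option that avoids assuming normally distributed amounts is a permutation test on the one-way ANOVA F-statistic, sketched below with the standard library only. The function names and the toy groups are illustrative assumptions, and each group is assumed to hold at least two values that are not all identical.

```python
import random
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of numeric groups."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Share of random label shuffles whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:  # re-cut the shuffled pool into groups of the original sizes
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_permutations + 1)


# Toy funding amounts for three hypothetical sectors
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [10.0, 11.0, 12.0]]
p_value = permutation_p_value(groups)
print("F =", f_statistic(groups), "p =", p_value)
```

Under this approach, H0 would be rejected when the p-value falls below the chosen significance level (0.05 is a common choice). With heavily skewed funding amounts, running the test on log-transformed values may be worth considering.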