# **CAREER ACCELERATOR LP1 - PROJECT**

### **Introduction:**

Ideas, creativity, and execution are essential for a start-up to flourish. But are they enough? Investors provide start-ups and other entrepreneurial ventures with the capital---popularly known as "funding"---to think big, grow rich, and leave a lasting impact. In this project, you are going to analyse funding received by start-ups in India in 2018. You will find the data for the year of funding in a separate csv file in the dataset provided. In these files you'll find the start-ups' details, the funding amounts received, and the investors' information.


### **Scenario:**
Team Iridium has been tasked with analyzing the Indian Startup Ecosystem. The analysis should provide insight into the best course of action for the company.

### **Task:**

Our task is to develop a unique story from this dataset by stating and testing a hypothesis, asking questions, performing analysis, and sharing insights with appropriate visualisations.

# **INDIAN STARTUP ECOSYSTEM ANALYSIS 2018**

# **1. Business Understanding**

To understand anything, we must first break it apart and examine its components before we can see how it works as a whole. The task is to perform an analysis of the 'Indian Start-Up Ecosystem', but what exactly does each of these terms mean? Let's dive into the definitions of each element of the task:

#### **Definitions** ####
##### **Ecosystem:**
In the natural sciences, an ‘ecosystem’ is generally defined as a system, or a group of interconnected elements, formed by the interaction of a community of organisms with their environment.

##### **Startup:**
A startup or start-up is a company or project undertaken by an entrepreneur to seek, develop, and validate a scalable business model. Startups are new businesses that intend to grow large beyond the solo founder. At the beginning, startups face high uncertainty and have high rates of failure, but a minority of them do go on to become successful and influential.

##### **India:**
India is a country that occupies the greater part of South Asia. It is made up of 28 states and eight union territories, and its national capital is New Delhi. It is the seventh-largest country by area and, as of June 2023, the most populous country in the world.

#### **So What is a Start-Up Ecosystem and why should we care?**

A startup ecosystem is a community of people, startups in their various stages, and various types of organizations (funders, governments, etc.) in a location (physical or virtual), interacting as a system to create and scale new startups.

Neither biological nor startup ecosystems can be created, designed or built by an outside actor. While this makes the term ‘start-up ecosystem’ hard to grasp, it does underline that start-ups operate in complex and highly dynamic environments. For this reason, it is particularly important to take sufficient time to analyse and understand the ecosystem before designing interventions to partake in it.

Just like biological ecosystems, a startup ecosystem consists of different elements, which can be individuals, groups, organisations and institutions that form a community by interacting with one another, but also environmental determinants that have an influence on how these actors work and interconnect; in startup ecosystems, these can be laws and policies or cultural norms.

![**A Start-Up Ecosystem**](https://upload.wikimedia.org/wikipedia/commons/thumb/3/35/StartupEcosystem.png/300px-StartupEcosystem.png)

#### **Previous Studies / Research**

In nature, for any and all participants to thrive, the ecosystem must be healthy and in balance. For a company, this could be the best indicator of whether or not to invest in an ecosystem. Previous studies have identified five key aspects of an ecosystem that can be tracked to measure its vibrancy:


**1. What is the Density and ecosystem value?**  \
A first step in mapping an ecosystem is to look at its actual size, growth, and value. This can be tracked by the number of new startups founded in a region during a specific period, and also by the total combined valuation of all these companies over time, broken down by funding year to monitor each cohort. The number of exits, especially the larger ones, is also an interesting indicator of startup success.

**2. How does the Funding activity look in the Ecosystem?** \
To assess the health of a startup ecosystem, we need to keep an eye on the quality, quantity, and ease of access to funding. To evaluate the ease of access to funding, we start by tracking early-stage funding rounds. Their volume and growth over time will tell us whether start-ups are getting the support they need to get their business off the ground. The location of the investors will help us identify foreign VCs already investing in the Indian startup ecosystem and allow us to build bridges for potential collaboration and partnerships.

**3. Market reach and scaling opportunities** \
The easiest way to gauge the success of startups is to watch the unicorns (companies valued at over $1 billion) in the ecosystem. Although this metric may become less relevant in the future as the number of unicorns increases, it remains an interesting indicator of startup ecosystem success.

**4. Knowledge and innovation** \
Innovation and entrepreneurship often flourish alongside world-class knowledge institutes and R&D incentives. These institutions often foster high-impact innovation, collaboration, and success across sectors. You can measure the level of innovation and new technology in your local ecosystem through research and patent activity, and by keeping tabs on the number of spinouts your local knowledge institutions produce. 

**5. Connectedness, Talent, Diversity, and more…** \
A vibrant ecosystem is not simply a collection of isolated elements; the connections between the elements matter just as much as the elements themselves. The metrics for connectedness and for access to quality and diverse talent are a little more complex. You could, however, look at the number of accelerators and incubators in your region, at job boards to assess the type of talent your startups are looking for the most, and at investment heatmaps to understand the breadth of industries or depth of expertise present in your community.

### **Business Objective** 
To find out whether to invest in the Indian start-up ecosystem or not.

#### **Hypothesis**
Null - The Indian Startup Ecosystem is healthy and worth an investment\
Alternative  - The Indian Startup Ecosystem is weak and not worthy of investment

#### **Key Questions**

Using metrics similar to those of previous researchers enables the company to easily compare the Indian case with other global ecosystems, giving the company a broader worldview and the ability to make a more informed decision.
This means our key questions are heavily influenced by the body of previous research.

**1. What is the Total Value of the Indian Startup Ecosystem?**
* How many start-ups were founded in the period?
* How much money has the ecosystem received in funding?

**2. How has the Ecosystem changed over time?**
* What is the change in performance year on year?
* Which region has the best performance?

**3. What is the Success rate of Start-ups in the ecosystem?**
* Are there any unicorns from the ecosystem?
* How many unicorns are there?

**4. Who is already in the Ecosystem?**
* How many companies are already involved in the ecosystem?
* What fields are they invested in?

**5. Which is the best performing sector in the ecosystem?**
* Sector with the highest amount raised
* Sector with the most start-ups


#### **Success Criteria**

1. To produce a dashboard that showcases the metrics monitoring the health of the Indian Start-up Ecosystem.
2. To provide an objective metric that can be used to compare with other startup ecosystems.
3. If the decision is to invest, to provide guidance on the best path of investment into the Indian Startup Ecosystem.

# **2. Data Understanding**

### **Data Preparation**

#### **Importations**

In [35]:
# import all necessary libraries
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

import plotly.express as px
import plotly.graph_objs as go

#remove pandas display limits
pd.set_option('display.max_columns', None)

#hide warnings
import warnings

warnings.filterwarnings('ignore')


#confirmation that all libraries loaded
print("all libraries loaded successfully")

all libraries loaded successfully


#### **Reading the Data**

##### *YEAR: 2018*

In [36]:
# import 2018 data from GitHub
# Available from Azubi Africa Career Accelerator LP1 Repository as csv

df_2018 = pd.read_csv("https://raw.githubusercontent.com/Azubi-Africa/Career_Accelerator_LP1-Data_Analysis/main/startup_funding2018.csv")

#Reading the first five(5) rows of data.
df_2018.head()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...


In [37]:
#Reading the last four (4) rows of data.
df_2018.tail(4)

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...
524,Droni Tech,Information Technology,Seed,"₹35,000,000","Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...
525,Netmeds,"Biotechnology, Health Care, Pharmaceutical",Series C,35000000,"Chennai, Tamil Nadu, India",Welcome to India's most convenient pharmacy!


**Notes:** \
    1. The data for each year is saved in variables named 'df_year'

## **Exploratory Data Analysis**

The data provided is expected to have the following columns to be used in the analysis:


|  | **COLUMN NAME** | **DESCRIPTION** | **EXPECTED DATATYPE** |
|--|-----------------|-----------------|-----------------------|
|**1**| **Company/Brand** | Name of the company/start-up | Object |
|**2**| **Founded** | Year start-up was founded | Datetime / int / float |
|**3**| **Sector** | Sector of service | Object |
|**4**| **What it does** | Description about Company | Object |
|**5**| **Founders** | Founders of the Company | Object |
|**6**| **Investor** | Investors | Object |
|**7**| **Amount(\$)** | Raised funds | float / int |
|**8**| **Stage** | Round of funding reached | Object / int |
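
The 2018 file shown earlier uses different headers from the expected ones, and it has no obvious counterpart for `Founded`, `Founders`, or `Investor`. A minimal sketch of aligning the headers is given here. The mapping `COLUMN_MAP_2018` is our own assumption based on the column descriptions, not something defined by the dataset, and the one-row `sample` frame stands in for `df_2018`.

```python
import pandas as pd

# Hypothetical mapping from the 2018 headers to the expected schema.
# The pairing is an assumption inferred from the column descriptions.
COLUMN_MAP_2018 = {
    "Company Name": "Company/Brand",
    "Industry": "Sector",
    "Round/Series": "Stage",
    "Amount": "Amount($)",
    "About Company": "What it does",
}

EXPECTED_COLUMNS = [
    "Company/Brand", "Founded", "Sector", "What it does",
    "Founders", "Investor", "Amount($)", "Stage",
]

# One-row stand-in for df_2018, using the headers seen in the file.
sample = pd.DataFrame({
    "Company Name": ["TheCollegeFever"],
    "Industry": ["Brand Marketing, Event Promotion"],
    "Round/Series": ["Seed"],
    "Amount": ["250000"],
    "Location": ["Bangalore, Karnataka, India"],
    "About Company": ["TheCollegeFever is a hub for fun"],
})

renamed = sample.rename(columns=COLUMN_MAP_2018)

# Expected columns that the 2018 file cannot supply.
missing = [col for col in EXPECTED_COLUMNS if col not in renamed.columns]
print(missing)
```

Listing the missing columns up front makes it clear which of the key questions the 2018 file alone can and cannot answer.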

#### **1. EDA FOR 2018**

In [38]:
#checking info
df_2018.shape

(526, 6)

The dataframe has 6 columns and 526 rows

In [39]:
#checking nulls
df_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
dtypes: object(6)
memory usage: 24.8+ KB


Amount records should have an integer data type, not object as shown above.

**Decision**: To convert the Amount records to an appropriate numeric data type ("int").

In [40]:
#checking for duplicates
df_2018.duplicated().sum()

1

There is one duplicate record.\
**Decision:**
    To drop the duplicate row.
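
Before dropping anything, it can help to look at the duplicated record itself. The following is a minimal sketch on a small made-up frame (not the real file): `duplicated(keep=False)` marks every copy of a repeated row, so both occurrences can be inspected side by side.

```python
import pandas as pd

# Small illustrative frame; the rows are stand-ins, not the real dataset.
sample = pd.DataFrame({
    "Company Name": ["TheCollegeFever", "Happy Cow Dairy", "TheCollegeFever"],
    "Round/Series": ["Seed", "Seed", "Seed"],
})

# keep=False flags all copies of a duplicated row, not only the later ones.
dupes = sample[sample.duplicated(keep=False)]
print(dupes)

# drop_duplicates keeps the first occurrence by default.
deduped = sample.drop_duplicates().reset_index(drop=True)
print(len(deduped))
```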

In [41]:
#describing the data
df_2018.describe(include = 'all')

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
count,526,526,526,526,526,526
unique,525,405,21,198,50,524
top,TheCollegeFever,—,Seed,—,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
freq,2,30,280,148,102,2
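
In the summary above, the most frequent value in both Industry (30 times) and Amount (148 times) is the `—` placeholder. Because it is an ordinary string, `isna()` and `dropna()` do not treat it as missing. A minimal sketch of counting such placeholders as missing values follows; the three-row `sample` frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic how the file marks unknown values.
sample = pd.DataFrame({
    "Industry": ["Agriculture, Farming", "—", "Tourism, Travel"],
    "Amount": ["₹40,000,000", "—", "—"],
})

# Before the replacement, nothing is reported as missing.
print(sample.isna().sum())

# Treat the placeholder as a real missing value, then count again.
as_missing = sample.replace("—", np.nan)
missing_counts = as_missing.isna().sum()
print(missing_counts)
```

Whether such rows are then dropped, imputed, or kept as unknown is a separate decision. The sketch only makes the gaps visible.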


## Data Cleaning

**Handling Missing Values**

In [42]:
#Dropping rows with any missing values
df_2018.dropna(inplace=True)

**Removing Duplicates**

In [43]:
#Removing the duplicates
df_2018.drop_duplicates(inplace=True)

**Handling inconsistent Data**

In [44]:
#Inspecting the Amount records before removing the "₹", ",", "—" signs
df_2018["Amount"]

0           250000
1      ₹40,000,000
2      ₹65,000,000
3          2000000
4                —
          ...     
521      225000000
522              —
523           7500
524    ₹35,000,000
525       35000000
Name: Amount, Length: 525, dtype: object
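
The column mixes values prefixed with `₹` and values with no prefix, so stripping the symbol alone would leave rupee and non-rupee figures on the same scale. A hedged sketch of one way to handle this is shown here. It assumes that unprefixed values are already in US dollars, and the helper name `amounts_to_usd` and the exchange rate passed to it are hypothetical placeholders, not values taken from the project.

```python
import pandas as pd

def amounts_to_usd(raw: pd.Series, inr_per_usd: float) -> pd.Series:
    """Convert mixed-currency amount strings to USD floats.

    Assumption: values starting with '₹' are rupees, all others are USD.
    Unparseable values (such as the '—' placeholder) become NaN.
    """
    text = raw.astype(str)
    is_inr = text.str.startswith("₹")
    values = pd.to_numeric(
        text.str.replace(r"[₹,]", "", regex=True), errors="coerce"
    )
    # Divide only the rupee rows by the exchange rate.
    return values.where(~is_inr, values / inr_per_usd)

raw = pd.Series(["250000", "₹40,000,000", "—"])
# 70.0 is a placeholder rate for illustration only; use a sourced 2018 rate.
usd = amounts_to_usd(raw, inr_per_usd=70.0)
print(usd)
```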

In [45]:
# Amount column
df_2018["Amount"]=df_2018.Amount.apply(lambda x:str(x).replace("₹", "")) # removes ₹
df_2018["Amount"]=df_2018.Amount.apply(lambda x:str(x).replace(",", "")) # removes ,
df_2018["Amount"]=df_2018.Amount.apply(lambda x:str(x).replace("—", "")) # removes —
df_2018["Amount"]

0         250000
1       40000000
2       65000000
3        2000000
4               
         ...    
521    225000000
522             
523         7500
524     35000000
525     35000000
Name: Amount, Length: 525, dtype: object
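
The three `apply` calls can also be written as a single vectorised step. This is a sketch of an equivalent approach on a short made-up series, not a change to the cell above: one regular expression removes all three characters, and `pd.to_numeric` with `errors="coerce"` turns the resulting empty strings into `NaN`.

```python
import pandas as pd

# Short stand-in for df_2018["Amount"].
raw = pd.Series(["250000", "₹40,000,000", "—", "7500"], name="Amount")

# Remove the rupee sign, thousands separators and the dash placeholder in one pass.
cleaned = raw.astype(str).str.replace(r"[₹,—]", "", regex=True)

# Empty strings cannot be parsed as numbers, so they are coerced to NaN.
numeric = pd.to_numeric(cleaned, errors="coerce")
print(numeric)
```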

In [46]:
#Industry Column
df_2018["Industry"]=df_2018.Industry.apply(lambda x:str(x).replace("—", "")) # removes — in the Industry column
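
Each Industry value is a comma-separated list, so the sector questions need a rule for assigning a start-up to a sector. A minimal sketch under one such rule, taking the first listed industry as the primary sector, is given here. That rule, the `Primary Sector` column name, and the sample figures are all assumptions made for illustration.

```python
import pandas as pd

# Illustrative rows; the amounts are made up for the example.
funding = pd.DataFrame({
    "Industry": [
        "Agriculture, Farming",
        "Financial Services, FinTech",
        "Financial Services, Lending",
    ],
    "Amount": [40_000_000, 2_000_000, 65_000_000],
})

# Assumed rule: the first industry in the list is the primary sector.
funding["Primary Sector"] = (
    funding["Industry"].str.split(",").str[0].str.strip()
)

# Total raised and number of start-ups per primary sector.
by_sector = (
    funding.groupby("Primary Sector")["Amount"]
    .agg(total_raised="sum", startups="count")
    .sort_values("total_raised", ascending=False)
)
print(by_sector)
```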

**Converting Amount records to numeric (int) and replacing the empty Amount records with 0**

In [47]:
#Converting Amount records to numeric and replacing the empty records to zero (0)
df_2018["Amount"]=pd.to_numeric(df_2018["Amount"], errors="coerce").fillna(0).astype(int)
df_2018.info()


<class 'pandas.core.frame.DataFrame'>
Index: 525 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   525 non-null    object
 1   Industry       525 non-null    object
 2   Round/Series   525 non-null    object
 3   Amount         525 non-null    int32 
 4   Location       525 non-null    object
 5   About Company  525 non-null    object
dtypes: int32(1), object(5)
memory usage: 26.7+ KB


In [13]:
#Showing the Amount Column with zeros replacement for empty cells.
df_2018["Amount"]

0         250000
1       40000000
2       65000000
3        2000000
4              0
         ...    
521    225000000
522            0
523         7500
524     35000000
525     35000000
Name: Amount, Length: 525, dtype: int32

**Handling Missing values**

In [48]:
#Checking missing values
df_2018.isna().any()

Company Name     False
Industry         False
Round/Series     False
Amount           False
Location         False
About Company    False
dtype: bool

**There are no missing values in the dataset**

**Describing the Amount records**

In [49]:
df_2018.describe()

Unnamed: 0,Amount
count,525.0
mean,38695270.0
std,356391200.0
min,-2147484000.0
25%,0.0
50%,500000.0
75%,16000000.0
max,2029600000.0
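
The minimum of about -2,147,484,000 deserves a second look: a funding amount cannot be negative, and the figure is consistent with the lower bound of a 32-bit integer (-2,147,483,648) after rounding for display. One plausible explanation, which we have not verified against the raw file, is that at least one amount was larger than an `int32` can hold when `astype(int)` produced the `int32` column shown earlier. A minimal sketch of checking for such values and casting to a 64-bit integer instead follows; the series is made up, and the 3,000,000,000 entry is a hypothetical oversized amount.

```python
import numpy as np
import pandas as pd

# Made-up amounts; the last one is deliberately above the int32 range.
amounts = pd.Series([250000.0, np.nan, 3_000_000_000.0])
filled = amounts.fillna(0)

# Largest value a 32-bit signed integer can represent.
int32_max = np.iinfo(np.int32).max
print(int32_max)

# Values that would not fit in an int32 column.
too_big = filled[filled > int32_max]
print(too_big)

# An explicit 64-bit cast holds these values without wrapping.
as_int64 = filled.astype("int64")
print(as_int64.min(), as_int64.max())
```

Requesting `"int64"` explicitly also removes the dependence on the platform's default integer size.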


**Exporting data to an Excel file**

In [51]:
# df_2018 now holds the cleaned data; write it to an Excel workbook
# (pandas needs an Excel writer engine such as openpyxl for this).
# index=False leaves the DataFrame index out of the file.
df_2018.to_excel('cleaned_data_df_2018.xlsx', index=False)