# **CAREER ACCELERATOR LP1 - PROJECT**

### **Introduction:**

Ideas, creativity, and execution are essential for a start-up to flourish. But are they enough? Investors provide start-ups and other entrepreneurial ventures with the capital---popularly known as "funding"---to think big, grow rich, and leave a lasting impact. In this project, you are going to analyse funding received by start-ups in India from 2018 to 2021. You will find the data for each year of funding in a separate csv file in the dataset provided. In these files you'll find the start-ups' details, the funding amounts received, and the investors' information.


### **Scenario:**
My team has been tasked with analyzing the Indian Startup Ecosystem. The analysis should provide insight into the best course of action for the company.

### **Task:**

Our task is to develop a unique story from this dataset by stating and testing a hypothesis, asking questions, performing analysis, and sharing insights with appropriate visualisations.

# **INDIAN STARTUP ECOSYSTEM ANALYSIS 2018 - 2021**

# **1. Business Understanding**

To understand anything, we must first break it apart and examine its components before we can understand how it works as a whole. The task is to perform an analysis of the 'Indian Start-Up Ecosystem', but what exactly does each of these terms mean? Let's dive into the definitions of each element in the task:

#### **Definitions** ####
##### **Ecosystem:**
In the natural sciences, an ‘ecosystem’ is generally defined as a system, or a group of interconnected elements, formed by the interaction of a community of organisms with their environment.

##### **Startup:**
A startup or start-up is a company or project undertaken by an entrepreneur to seek, develop, and validate a scalable business model. Startups are new businesses that intend to grow large beyond the solo founder. At the beginning, startups face high uncertainty and have high rates of failure, but a minority of them do go on to become successful and influential.

##### **India:**
India is a country that occupies the greater part of South Asia. It is made up of 28 states and eight union territories, and its national capital is New Delhi. It is the seventh-largest country by area and, as of June 2023, the most populous country.

#### **So What is a Start-Up Ecosystem and why should we care?**

A startup ecosystem is a community of people, startups in their various stages, and various types of organizations (funders, governments, etc.) in a location (physical or virtual), interacting as a system to create and scale new startups.

Neither biological nor startup ecosystems can be created, designed or built by an outside actor. While this makes the term ‘start-up ecosystem’ hard to grasp, it does underline that start-ups operate in complex and highly dynamic environments. For this reason, it is particularly important to take sufficient time to analyse and understand the ecosystem before designing interventions to partake in it.

Just like biological ecosystems, a startup ecosystem consists of different elements, which can be individuals, groups, organisations and institutions that form a community by interacting with one another, but also environmental determinants that have an influence on how these actors work and interconnect; in startup ecosystems, these can be laws and policies or cultural norms.

![**A Start-Up Ecosystem**](https://upload.wikimedia.org/wikipedia/commons/thumb/3/35/StartupEcosystem.png/300px-StartupEcosystem.png)

#### **Previous Studies / Research**

In nature, for any and all participants to thrive, the ecosystem must be healthy and in balance. For a company, this could be the best indicator of whether or not to invest in an ecosystem. Previous studies have identified five key aspects of an ecosystem that can be tracked to measure its vibrancy:


**1. What is the Density and ecosystem value?**  \
A first step in mapping an ecosystem is to look at its actual size, growth, and value. This can be tracked by the number of new startups founded in a region during a specific period, but also by the total combined valuation of all these companies over time, which can be broken down by funding year to monitor each cohort. The number of exits, especially the larger ones, is also an interesting indicator of startup success.

**2. How does the Funding activity look in the Ecosystem?** \
To assess the health of a startup ecosystem, we need to keep an eye on the quality, quantity, and ease of access to funding. To evaluate ease of access, we can track early-stage funding rounds: their volume and growth over time show whether start-ups are getting the support they need to get their business off the ground. The location of the investors helps to identify foreign VCs already investing in the Indian startup ecosystem and allows us to build bridges for potential collaboration and partnerships.

**3. Market reach and scaling opportunities** \
The easiest way to gauge the success of your startups is to watch the unicorns (companies valued at over $1 billion) in your ecosystem. Although this metric may become less relevant in the future (due to the growing number of unicorns), it remains an interesting indicator of startup ecosystem success.

**4. Knowledge and innovation** \
Innovation and entrepreneurship often flourish alongside world-class knowledge institutes and R&D incentives. These institutions often foster high-impact innovation, collaboration, and success across sectors. You can measure the level of innovation and new technology in your local ecosystem through research and patent activity, and by keeping tabs on the number of spinouts your local knowledge institutions produce. 

**5. Connectedness, Talent, Diversity, and more…** \
A vibrant ecosystem is not simply a collection of isolated elements; the connections between the elements matter just as much as the elements themselves. The metrics for connectedness and access to quality and diverse talent are a little more complex. You could, however, look at the number of accelerators and incubators in your region, at job boards to assess the type of talent your startups are looking for most, and at investment heatmaps to understand the breadth of industries or depth of expertise present in your community.

### **Business Objective** 
To find out whether to invest in the Indian start-up ecosystem or not.

#### **Hypothesis**
Null - The Indian Startup Ecosystem is healthy and worth an investment\
Alternative  - The Indian Startup Ecosystem is weak and not worthy of investment

#### **Key Questions**

Using metrics similar to those of previous researchers enables the company to compare the Indian case with other global ecosystems, giving it a broader worldview and the ability to make a more informed decision.
Our key questions are therefore heavily influenced by the body of previous research.

**1. What is the Total Value of the Indian Startup Ecosystem?**
* How many startups were founded in the period?
* How much money has the ecosystem received in funding?

**2. How has the Ecosystem changed over time?**
* What is the change in performance year on year?
* Which region has the best performance?

**3. What is the Success rate of Start-ups in the ecosystem?**
* Are there any unicorns from the ecosystem?
* How many unicorns are there?

**4. Who is already in the Ecosystem?**
* How many companies are already involved in the ecosystem?
* What fields are they invested in?

**5. Which is the best performing sector in the ecosystem?**
* Which sector has raised the highest amount?
* Which sector has the most start-ups?


#### **Success Criteria**

1. To produce a dashboard that showcases the metrics monitoring the health of the Indian Start-up Ecosystem.
2. To provide an objective metric that can be used to compare with other startup ecosystems.
3. If the decision is to invest, to provide guidance on the best path of investment into the Indian Startup Ecosystem.

# **2. Data Understanding**

### **Data Preparation**

#### **Importations**

In [4]:
# import all necessary libraries
import os
import pandas as pd
import numpy as np
import pyodbc
from dotenv import dotenv_values

#remove pandas display limits
pd.set_option('display.max_columns', None)

#hide warnings
import warnings

warnings.filterwarnings('ignore')


#confirmation that all libraries loaded
print("all libraries loaded successfully")

all libraries loaded successfully


### **Database Connection**

In [9]:
#reading data from database


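# Database details; the password is read from the '.env' file
# (the key name "PASSWORD" is an assumption; use the key defined in your own '.env' file)
environment_variables = dotenv_values('.env')

database = "dapDB"
server = "dap-projects-database.database.windows.net"
username = "LP1_learner"
password = environment_variables.get("PASSWORD")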

#Connecting to the database
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"

# Using the connect method of the pyodbc library.
# This will connect to the server. 
connection=pyodbc.connect(connection_string)

print("connected successfully")

connected successfully


In [18]:
#SQL query to retrieve data
sql_query = "SELECT * FROM dbo.LP1_startup_funding2020"

#Fetching data
data = pd.read_sql(sql_query, connection)

#### **Reading the Data**

##### *YEAR: 2018*

In [26]:
# import 2018 data from GitHub
# Available from Azubi Africa Career Accelerator LP1 Repository as csv

df_2018 = pd.read_csv("https://raw.githubusercontent.com/Azubi-Africa/Career_Accelerator_LP1-Data_Analysis/main/startup_funding2018.csv")

#Reading the first five(5) rows of data.
df_2018.head()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...


In [28]:
#Reading the last four (4) rows of data.
df_2018.tail(4)

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
522,Happyeasygo Group,"Tourism, Travel",Series A,—,"Haryana, Haryana, India",HappyEasyGo is an online travel domain.
523,Mombay,"Food and Beverage, Food Delivery, Internet",Seed,7500,"Mumbai, Maharashtra, India",Mombay is a unique opportunity for housewives ...
524,Droni Tech,Information Technology,Seed,"₹35,000,000","Mumbai, Maharashtra, India",Droni Tech manufacture UAVs and develop softwa...
525,Netmeds,"Biotechnology, Health Care, Pharmaceutical",Series C,35000000,"Chennai, Tamil Nadu, India",Welcome to India's most convenient pharmacy!


##### *YEAR: 2019*

In [6]:
# import 2019 data from csv
df_2019 = pd.read_csv("startup_funding2019.csv")
df_2019.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",


##### *YEAR: 2020*

In [23]:
#reading the 2020 SQL table into a dataframe

#SQL query to retrieve data
sql_query = "SELECT * FROM dbo.LP1_startup_funding2020"

#Fetching data
df_2020 = pd.read_sql(sql_query, connection)
df_2020.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,


##### *YEAR: 2021*

In [22]:
#reading the 2021 SQL table into a dataframe

#SQL query to retrieve data
sql_query = "SELECT * FROM dbo.LP1_startup_funding2021"

#Fetching data
df_2021 = pd.read_sql(sql_query, connection)
df_2021.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


**Notes:** \
    1. The data for each year is saved in a variable named `df_<year>` (for example, `df_2018`).
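
One way to keep the four yearly dataframes together is a dictionary keyed by year, which makes it easy to compare their shapes in one pass. The sketch below is illustrative only: the tiny frames stand in for `df_2018` and `df_2019`, and `summarise_shapes` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Small stand-in frames; in the notebook these would be df_2018 ... df_2021.
frames_by_year = {
    2018: pd.DataFrame({"Company Name": ["TheCollegeFever"], "Amount": ["250000"]}),
    2019: pd.DataFrame({"Company/Brand": ["Bombay Shaving"], "Amount($)": ["$6,300,000"]}),
}


def summarise_shapes(frames):
    """Return one row per year with the number of rows and columns."""
    rows = [
        {"year": year, "rows": df.shape[0], "columns": df.shape[1]}
        for year, df in frames.items()
    ]
    return pd.DataFrame(rows).set_index("year")


shape_summary = summarise_shapes(frames_by_year)
print(shape_summary)
```

With the real data, the dictionary would simply be `{2018: df_2018, 2019: df_2019, 2020: df_2020, 2021: df_2021}`.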

## **Exploratory Data Analysis**

The data provided is expected to have the following columns to be used in the analysis:


|  | **COLUMN NAME** | **DESCRIPTION** | **EXPECTED DATATYPE** |
|--|-----------------|-----------------|-----------------------|
|**1**| **Company/Brand** | Name of the company/start-up | Object |
|**2**| **Founded** | Year start-up was founded | Datetime / int / float |
|**3**| **Sector** | Sector of service | Object |
|**4**| **What it does** | Description about Company | Object |
|**5**| **Founders** | Founders of the Company | Object |
|**6**| **Investor** | Investors | Object |
|**7**| **Amount(\$)** | Raised funds | float / int |
|**8**| **Stage** | Round of funding reached | Object / int |
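
The expected columns in the table above can be turned into a quick check that reports which names a dataframe is missing and which it has in addition. This is a minimal sketch: `compare_columns` is a hypothetical helper, and the empty stand-in frame only carries the 2018 column names shown in the preview earlier.

```python
import pandas as pd

# Expected column names, taken from the table above.
EXPECTED_COLUMNS = [
    "Company/Brand", "Founded", "Sector", "What it does",
    "Founders", "Investor", "Amount($)", "Stage",
]


def compare_columns(df, expected=EXPECTED_COLUMNS):
    """Return (missing, extra) column names relative to the expected list."""
    missing = [col for col in expected if col not in df.columns]
    extra = [col for col in df.columns if col not in expected]
    return missing, extra


# Stand-in frame with the 2018 column names from the preview.
sample_2018 = pd.DataFrame(columns=[
    "Company Name", "Industry", "Round/Series",
    "Amount", "Location", "About Company",
])

missing, extra = compare_columns(sample_2018)
print("missing:", missing)
print("extra:", extra)
```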

#### **1. EDA FOR 2018**

In [9]:
#checking info
df_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
dtypes: object(6)
memory usage: 24.8+ KB


The dataframe has 6 columns and 526 rows

**Notes:**
1. There are fewer columns than expected, which suggests that some data is missing or incomplete.
2. The column names do not match the expected names; they need to be renamed.
3. Some columns are not in the expected datatype (for example, Amount is stored as object rather than as a number).
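
The renaming mentioned in note 2 could look roughly like the sketch below. The mapping in `RENAME_2018` is an assumption about which 2018 column corresponds to which expected column, and the single row is copied from the 2018 preview as a stand-in for `df_2018`.

```python
import pandas as pd

# Assumed mapping from the 2018 column names to the expected names.
RENAME_2018 = {
    "Company Name": "Company/Brand",
    "Industry": "Sector",
    "Round/Series": "Stage",
    "Amount": "Amount($)",
    "Location": "HeadQuarter",
    "About Company": "What it does",
}

# One row copied from the 2018 preview, used as a stand-in for df_2018.
sample_2018 = pd.DataFrame([{
    "Company Name": "PayMe India",
    "Industry": "Financial Services, FinTech",
    "Round/Series": "Angel",
    "Amount": "2000000",
    "Location": "Noida, Uttar Pradesh, India",
    "About Company": "PayMe India is an innovative FinTech organizat...",
}])

renamed_2018 = sample_2018.rename(columns=RENAME_2018)
print(list(renamed_2018.columns))
```

Note that renaming `Amount` to `Amount($)` only changes the label: the 2018 preview also contains rupee amounts, so the values would still need converting before the name is accurate.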

In [10]:
#checking nulls
df_2018.isnull().sum()

Company Name     0
Industry         0
Round/Series     0
Amount           0
Location         0
About Company    0
dtype: int64

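`isnull()` finds no nulls in the dataset. However, the preview above shows a dash placeholder in the Amount column (see the Eunimart row), so missing values appear to be stored as strings rather than as nulls.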
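
The 2018 preview shows a dash in the Amount column (for example, in the Eunimart row), which `isnull()` does not count as missing. A minimal sketch of counting such placeholders and converting them to real nulls follows; the three rows are stand-ins taken from the preview, and treating the dash as "no value" is an assumption.

```python
import numpy as np
import pandas as pd

PLACEHOLDER = "\u2014"  # the dash character shown in the preview; assumed to mean "no value"

# Stand-in for df_2018, using Amount values from the preview.
sample_2018 = pd.DataFrame({
    "Company Name": ["TheCollegeFever", "Happy Cow Dairy", "Eunimart"],
    "Amount": ["250000", "₹40,000,000", PLACEHOLDER],
})

# Count placeholder entries per column, then convert them to real nulls.
placeholder_counts = (sample_2018 == PLACEHOLDER).sum()
cleaned = sample_2018.replace(PLACEHOLDER, np.nan)

print(placeholder_counts)
print(cleaned.isnull().sum())
```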

In [11]:
#checking for duplicates
df_2018.duplicated().sum()

1

There is one duplicate record.\
**Decision:** Drop the duplicate record.
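
A minimal sketch of the duplicate check and the drop, using a three-row stand-in frame rather than `df_2018` itself:

```python
import pandas as pd

# Stand-in frame in which the first and last rows are identical.
sample = pd.DataFrame({
    "Company Name": ["TheCollegeFever", "Happy Cow Dairy", "TheCollegeFever"],
    "Round/Series": ["Seed", "Seed", "Seed"],
})

n_duplicates = sample.duplicated().sum()  # rows identical to an earlier row
deduplicated = sample.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(deduplicated))
```

`drop_duplicates()` keeps the first occurrence by default, and resetting the index avoids gaps in the row labels afterwards.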

In [12]:
#describing the data
df_2018.describe(include = 'all')

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
count,526,526,526,526,526,526
unique,525,405,21,198,50,524
top,TheCollegeFever,—,Seed,—,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
freq,2,30,280,148,102,2


#### **2. EDA FOR 2019**

In [13]:
#checking the info
df_2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
dtypes: float64(1), object(8)
memory usage: 6.4+ KB


The data has 9 columns and 89 records

**Notes:**
1. The dataset has all the expected columns and they are properly named
2. The amount column is not in the expected datatype
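
The amount strings in the 2019 preview (such as `$6,300,000`) can be converted to numbers by stripping the dollar sign and thousands separators. The sketch below assumes all values are in US dollars; the `Undisclosed` entry is a hypothetical non-numeric value added to show how unparseable text becomes a null.

```python
import pandas as pd

# Two values from the 2019 preview plus one hypothetical non-numeric entry.
sample_amounts = pd.Series(["$6,300,000", "$150,000,000", "Undisclosed"])

# Remove "$" and "," and surrounding whitespace, then convert;
# anything that is still not a number becomes NaN.
stripped = sample_amounts.str.replace(r"[$,]", "", regex=True).str.strip()
parsed = pd.to_numeric(stripped, errors="coerce")

print(parsed)
```

This does not handle other currencies: the rupee amounts seen in the 2018 data would need a separate conversion step.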

In [14]:
#checking nulls
df_2019.isnull().sum()

Company/Brand     0
Founded          29
HeadQuarter      19
Sector            5
What it does      0
Founders          3
Investor          0
Amount($)         0
Stage            46
dtype: int64

**Notes:**
Several columns contain null values: Stage (46), Founded (29), HeadQuarter (19), Sector (5) and Founders (3).

In [15]:
#checking for duplicates
df_2019.duplicated().sum()

0

There are no duplicates in the dataframe

#### **3. EDA FOR 2020**

In [25]:
#  View the data
df_2020

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,
...,...,...,...,...,...,...,...,...,...,...
1050,Leverage Edu,,Delhi,Edtech,AI enabled marketplace that provides career gu...,Akshay Chaturvedi,"DSG Consumer Partners, Blume Ventures",1500000.0,,
1051,EpiFi,,,Fintech,It offers customers with a single interface fo...,"Sujith Narayanan, Sumit Gwalani","Sequoia India, Ribbit Capital",13200000.0,Seed Round,
1052,Purplle,2012.0,Mumbai,Cosmetics,Online makeup and beauty products retailer,"Manish Taneja, Rahul Dash",Verlinvest,8000000.0,,
1053,Shuttl,2015.0,Delhi,Transport,App based bus aggregator serice,"Amit Singh, Deepanshu Malviya",SIG Global India Fund LLP.,8043000.0,Series C,


In [18]:
#checking the shapes of the dataframe
df_2020.shape

(1055, 10)

The dataframe has 1055 observations and 10 features; the number of features is more than expected.

In [20]:
# Checking the info of the Dataframe
df_2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
dtypes: float64(2), object(8)
memory usage: 82.6+ KB


**Notes:**
1. The Founded column should be in the int datatype.
2. Some of the columns are not named as expected.
3. There is a column (column10) that is not required for the analysis; it will be dropped.
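
Because Founded contains nulls, a plain `int` conversion would fail; pandas' nullable `Int64` dtype is one way around that. The sketch below uses a two-row stand-in for `df_2020` and shows both the conversion and the drop of `column10`.

```python
import pandas as pd

# Stand-in for df_2020 with one known and one missing founding year.
sample_2020 = pd.DataFrame({
    "Company_Brand": ["Aqgromalin", "Leverage Edu"],
    "Founded": [2019.0, None],
    "column10": [None, None],
})

tidy_2020 = sample_2020.drop(columns=["column10"])
# "Int64" (capital I) is the nullable integer dtype, so missing years stay missing.
tidy_2020["Founded"] = tidy_2020["Founded"].astype("Int64")

print(tidy_2020.dtypes)
```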

In [23]:
# Checking for nulls
df_2020.isna().sum()

Company_Brand       0
Founded           213
HeadQuarter        94
Sector             13
What_it_does        0
Founders           12
Investor           38
Amount            254
Stage             464
column10         1053
dtype: int64

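**Notes:**
The data has many null values, most notably in Stage (464), Amount (254) and Founded (213); column10 is null in 1053 of the 1055 rows.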
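
Null counts are easier to compare as a share of all rows. The sketch below recomputes those shares from the counts printed above and the row count from `df_2020.shape`; on the real dataframe, `df_2020.isna().mean() * 100` would give the same figures directly.

```python
import pandas as pd

# Null counts copied from the isna().sum() output above.
null_counts = pd.Series({
    "Company_Brand": 0, "Founded": 213, "HeadQuarter": 94, "Sector": 13,
    "What_it_does": 0, "Founders": 12, "Investor": 38, "Amount": 254,
    "Stage": 464, "column10": 1053,
})
n_rows = 1055  # from df_2020.shape

# Percentage of rows that are null in each column, largest first.
null_share = (null_counts / n_rows * 100).round(1).sort_values(ascending=False)
print(null_share)
```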

In [26]:
# Check for duplicates
df_2020.duplicated().sum()

3

**Notes:**
There are 3 duplicate records in the data; they will be dropped.

In [28]:
# Descriptive statistics
df_2020.describe(include='all')

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
count,1055,842.0,961,1042,1055,1043,1017,801.0,591,2
unique,905,,77,302,990,927,848,,42,2
top,Nykaa,,Bangalore,Fintech,Provides online learning classes,Falguni Nayar,Venture Catalysts,,Series A,Pre-Seed
freq,6,,317,80,4,6,20,,96,1
mean,,2015.36342,,,,,,113043000.0,,
std,,4.097909,,,,,,2476635000.0,,
min,,1973.0,,,,,,12700.0,,
25%,,2014.0,,,,,,1000000.0,,
50%,,2016.0,,,,,,3000000.0,,
75%,,2018.0,,,,,,11000000.0,,


In [32]:
# Checking the uniqueness of the data
df_2020.nunique()

Company_Brand    905
Founded           26
HeadQuarter       77
Sector           302
What_it_does     990
Founders         927
Investor         848
Amount           300
Stage             42
column10           2
dtype: int64

#### **4. EDA FOR 2021**

In [27]:
#checking the shapes of the dataframes
df_2021.shape

(1209, 9)

##### Notes:
*The dataframes have different shapes, which indicates they may be a challenge to concatenate into one dataframe.*
*We need to apply some preliminary processing.*

* The 2018 dataset has 6 columns, all in the object datatype.

In [32]:
#checking the column names and dtypes
df_2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
dtypes: float64(1), object(8)
memory usage: 6.4+ KB


* The 2019 dataset has 9 columns, all in the object datatype except Founded, which is in the float64 datatype.

In [37]:
#checking the column names and dtypes

df_2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
dtypes: float64(2), object(8)
memory usage: 82.6+ KB


* The 2020 dataset has 10 columns, all in the object datatype except Founded and Amount, which are in the float64 datatype.

In [35]:
#checking the column names and dtypes
df_2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What_it_does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount         1206 non-null   object 
 8   Stage          781 non-null    object 
dtypes: float64(1), object(8)
memory usage: 85.1+ KB


In [None]:
dfs_info = []

##### **Notes:** #####
1. Some columns appear in all four dataframes under different names:
    - Company Name / Company/Brand / Company_Brand
    - Round/Series / Stage
    - Industry / Sector
    - Location / HeadQuarter
    - About Company / What it does / What_it_does
    - Amount / Amount($)
2. Some columns appear in the 2019, 2020 and 2021 dataframes but not in 2018:
    - Founded, Founders, Investor
3. One column is unique to a single dataframe:
    - column10 (2020 only)

##### **Decisions:** #####
Our first attempt will be to concatenate the similar columns across the years and create dataframes with those specific columns to avoid mismatched columns

In [43]:
common_cols_all = ['Company', 'Stage','Series','Sector', 'Industry', 'Location', 'HeadQuarter', 'About', 'What_it_does', 'Amount']
dfs = [df_2018,df_2019,df_2020,df_2021]

# concatenate the yearly dataframes as they are to see how the columns line up
pd.concat(dfs)

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage,Company_Brand,What_it_does,column10
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",,,,,,,,,,,,
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,,,,,,,,,,,,
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,,,,,,,,,,,,
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,,,,,,,,,,,,
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1204,,,,$3000000,,,,2019.0,Gurugram,Staffing & Recruiting,,"Chirag Mittal, Anirudh Syal",Endiya Partners,,Pre-series A,Gigforce,A gig/on-demand staffing company.,
1205,,,,$20000000,,,,2015.0,New Delhi,Food & Beverages,,Bala Sarda,IIFL AMC,,Series D,Vahdam,VAHDAM is among the world’s first vertically i...,
1206,,,,$55000000,,,,2019.0,Bangalore,Financial Services,,"Arnav Kumar, Vaibhav Singh",Owl Ventures,,Series C,Leap Finance,International education loans for high potenti...,
1207,,,,$26000000,,,,2015.0,Gurugram,EdTech,,Ruchir Arora,"Winter Capital, ETS, Man Capital",,Series B,CollegeDekho,"Collegedekho.com is Student’s Partner, Friend ...",
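
The output above shows what happens when the dataframes are concatenated with their original column names: each naming variant becomes its own column and most cells are empty. One possible approach, sketched below, is to rename every year's columns onto a shared set first. `COLUMN_MAP`, `SHARED_COLUMNS` and `harmonise` are hypothetical names, the mapping itself is an assumption, and the one-row frames are stand-ins built from the previews.

```python
import pandas as pd

# Assumed mapping of each year's column names onto one shared set of names.
COLUMN_MAP = {
    "Company Name": "Company", "Company/Brand": "Company", "Company_Brand": "Company",
    "Industry": "Sector",
    "Round/Series": "Stage",
    "Location": "HeadQuarter",
    "About Company": "What_it_does", "What it does": "What_it_does",
    "Amount($)": "Amount",
}
SHARED_COLUMNS = ["Company", "Sector", "Stage", "HeadQuarter", "What_it_does", "Amount"]


def harmonise(df, year):
    """Rename columns to the shared names, keep only those, and tag the funding year."""
    out = df.rename(columns=COLUMN_MAP)[SHARED_COLUMNS].copy()
    out["Year"] = year
    return out


# One-row stand-ins built from the 2018 and 2019 previews.
sample_2018 = pd.DataFrame([{
    "Company Name": "PayMe India", "Industry": "Financial Services, FinTech",
    "Round/Series": "Angel", "Amount": "2000000",
    "Location": "Noida, Uttar Pradesh, India",
    "About Company": "PayMe India is an innovative FinTech organizat...",
}])
sample_2019 = pd.DataFrame([{
    "Company/Brand": "Ruangguru", "Founded": 2014.0, "HeadQuarter": "Mumbai",
    "Sector": "Edtech", "What it does": "A learning platform that provides topic-based ...",
    "Founders": "Adamas Belva Syah Devara, Iman Usman.", "Investor": "General Atlantic",
    "Amount($)": "$150,000,000", "Stage": "Series C",
}])

combined = pd.concat(
    [harmonise(sample_2018, 2018), harmonise(sample_2019, 2019)],
    ignore_index=True,  # give the combined frame a fresh 0..n-1 index
)
print(combined)
```

Adding a `Year` column keeps track of which file each row came from, and `ignore_index=True` avoids the repeated row labels visible in the output above.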


## Data Cleaning