# AN ANALYSIS OF THE INDIAN STARTUP ECOSYSTEM

## Business Understanding

India has emerged as the world's third-largest startup ecosystem. With more than 50,000 startups in 2018, it is imperative to understand the key factors that have driven this rapid growth over the years.

### Business Objectives

1. Identify Different Fundraising Methods.
Research and assess typical means of obtaining funds, such as equity financing, debt financing, venture capital, crowdfunding, and bootstrapping, to determine which is best suited to the company's needs and growth stage.

2. Optimize cost-effective funding sources.
Prioritize obtaining cash from the most cost-effective sources, such as personal savings, government grants, or low-interest loans, to reduce financial stress while increasing returns.

3. Leverage readily available funding opportunities.
Build ties with local angel investors, look at business accelerators, and use crowdfunding sites to acquire capital quickly with few barriers.

4. Implement an efficient legal structure.
Examine typical startup legal structures such as sole proprietorships, partnerships, LLCs, and corporations to find a model that strikes a balance between liability protection, taxation, and operational simplicity.

5. Promote popular legal choices.
Understand why LLCs and corporations are frequently preferred for their flexibility, investment attractiveness, and liability protection, and use this knowledge to attract new stakeholders.

6. Ensure Compliance and Scalability.
Align the business's legal structure with current requirements while allowing for future scalability and restructuring as it expands.

7. Establish a Strategic Location for Operations.
Choose an operating base based on characteristics such as accessibility to target markets, availability of skilled staff, and rent affordability, ensuring that it aligns with business objectives.

8. Explore Startup-Friendly Hubs.
Investigate common startup hotspots, such as metropolitan centers, coworking spaces, or innovation districts, to take advantage of networking, mentoring, and funding options.

9. Enhance Location Attractiveness.
Identify and leverage features that make certain regions attractive to entrepreneurs, such as tax breaks, infrastructure, resource availability, and supportive local ecosystems.


### Key Hypotheses

On Fundraising Methods

- Hypothesis: Startups that utilize a combination of equity financing and crowdfunding will achieve higher initial funding compared to those relying solely on debt financing.
- Hypothesis: Bootstrapping is more effective for early-stage startups in reducing financial dependency compared to external funding methods.

On Cost-Effective Sources

- Hypothesis: Startups that secure funding through government grants or low-interest loans will experience lower financial strain and higher ROI compared to those funded through high-interest private loans.
- Hypothesis: Personal savings provide the fastest and most reliable funding option for startups with minimal financial requirements.

On Readily Available Funding Opportunities

- Hypothesis: Establishing strong ties with local angel investors leads to faster funding acquisition compared to applying for accelerator programs.
- Hypothesis: Crowdfunding platforms with niche audiences yield higher engagement and funding success rates for targeted product-based startups.

On Efficient Legal Structures

- Hypothesis: LLCs offer the optimal balance of liability protection and operational simplicity for startups compared to corporations and sole proprietorships.
- Hypothesis: Sole proprietorships are more viable for solopreneurs with minimal risk exposure but scale poorly as businesses grow.

On Popular Legal Choices

- Hypothesis: Startups that adopt LLCs or corporations attract more investors compared to those structured as sole proprietorships or partnerships.
- Hypothesis: Legal structures with limited liability features are preferred due to higher stakeholder confidence and risk mitigation.

On Compliance and Scalability

- Hypothesis: Startups that align their legal structures with scalability considerations face fewer regulatory hurdles during expansion compared to those that prioritize simplicity in early stages.
- Hypothesis: Regularly updating legal structures to meet compliance needs reduces the risk of operational disruptions and penalties.

On Strategic Location for Operations

- Hypothesis: Startups located closer to target markets and talent pools achieve faster market penetration and operational efficiency compared to those in remote locations.
- Hypothesis: Affordable rent plays a more significant role in location selection for startups with tight budgets than proximity to markets.

On Startup-Friendly Hubs

- Hypothesis: Startups based in metropolitan centers with established innovation districts secure funding faster than those in rural areas.
- Hypothesis: Coworking spaces provide higher networking benefits for early-stage startups compared to traditional office spaces.

On Location Attractiveness

- Hypothesis: Startups in regions with tax incentives and robust infrastructure experience faster growth compared to those without these benefits.
- Hypothesis: Access to industry-specific resources (e.g., technology hubs for tech startups) increases the likelihood of operational success.

### Problem Areas

Fundraising

1. What are the common methods of raising funds?
2. What are the cheapest sources of funds?
3. What are the readily available sources of funds?

Location

1. How do startups choose the locations for their operations?
2. Where are the common locations for these startups?
3. Why are these locations attractive to the startups?

Legal Structure

1. What are the common legal structures among the startups?
2. Why are they so popular?


## Data Understanding

In [1]:
import pyodbc
from dotenv import dotenv_values
import numpy as np
import pandas as pd 
import warnings

warnings.filterwarnings('ignore')

In [2]:
#Load environment variables from .env file into a dictionary
environment_variables = dotenv_values('.env')

#Get the values for the credentials you set in the '.env' file
server = environment_variables.get("SERVER")
database = environment_variables.get("DATABASE")
username = environment_variables.get("USERNAME")
password = environment_variables.get("PASSWORD")

# Create a connection string
connection_string = f"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password};MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"
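
If any of the four credentials is missing from the `.env` file, `environment_variables.get(...)` returns `None`, and the f-string quietly embeds the text `None` in the connection string. A minimal sketch of a guard follows; the helper `build_connection_string` and the constant `REQUIRED_KEYS` are hypothetical names, not part of the original notebook.

```python
REQUIRED_KEYS = ("SERVER", "DATABASE", "USERNAME", "PASSWORD")


def build_connection_string(env):
    """Build the ODBC connection string, failing early on missing credentials.

    `env` is any mapping, such as the dictionary returned by dotenv_values('.env').
    This helper is hypothetical and only illustrates the check.
    """
    # Collect every key that is absent or empty so the error names all of them
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing credentials in .env: {', '.join(missing)}")
    return (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={env['SERVER']};"
        f"DATABASE={env['DATABASE']};"
        f"UID={env['USERNAME']};"
        f"PWD={env['PASSWORD']};"
        "MARS_Connection=yes;MinProtocolVersion=TLSv1.2;"
    )
```

With this in place, `build_connection_string(environment_variables)` would raise before any connection attempt if the `.env` file is incomplete.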



In [3]:
# Use the connect method of the pyodbc library and pass in the connection string.
# This will connect to the server and might take a few seconds to complete.
# Check your internet connection if it takes longer than expected.

connection = pyodbc.connect(connection_string)

In [4]:
#Query to get the data from the database
query = "SELECT * FROM LP2_Telco_churn_first_3000"

data = pd.read_sql(query, connection)

In [5]:
data.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,False,True,False,1,False,,DSL,False,...,False,False,False,False,Month-to-month,True,Electronic check,29.85,29.85,False
1,5575-GNVDE,Male,False,False,False,34,True,False,DSL,True,...,True,False,False,False,One year,False,Mailed check,56.950001,1889.5,False
2,3668-QPYBK,Male,False,False,False,2,True,False,DSL,True,...,False,False,False,False,Month-to-month,True,Mailed check,53.849998,108.150002,True
3,7795-CFOCW,Male,False,False,False,45,False,,DSL,True,...,True,True,False,False,One year,False,Bank transfer (automatic),42.299999,1840.75,False
4,9237-HQITU,Female,False,False,False,2,True,False,Fiber optic,False,...,False,False,False,False,Month-to-month,True,Electronic check,70.699997,151.649994,True


In [6]:
data.columns

Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')

In [7]:
data.describe(include='object').columns

Index(['customerID', 'gender', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaymentMethod', 'Churn'],
      dtype='object')

In [8]:
data.describe(include='number').columns

Index(['tenure', 'MonthlyCharges', 'TotalCharges'], dtype='object')

In [9]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        3000 non-null   object 
 1   gender            3000 non-null   object 
 2   SeniorCitizen     3000 non-null   bool   
 3   Partner           3000 non-null   bool   
 4   Dependents        3000 non-null   bool   
 5   tenure            3000 non-null   int64  
 6   PhoneService      3000 non-null   bool   
 7   MultipleLines     2731 non-null   object 
 8   InternetService   3000 non-null   object 
 9   OnlineSecurity    2349 non-null   object 
 10  OnlineBackup      2349 non-null   object 
 11  DeviceProtection  2349 non-null   object 
 12  TechSupport       2349 non-null   object 
 13  StreamingTV       2349 non-null   object 
 14  StreamingMovies   2349 non-null   object 
 15  Contract          3000 non-null   object 
 16  PaperlessBilling  3000 non-null   bool   


### Problems

1. The data types of some columns are incorrect. MultipleLines, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV and StreamingMovies hold True/False values but are stored as object.
2. There are null values in various columns.
3. The MultipleLines and Churn columns each contain three unique values: None, False and True.

### Solution

1. Change the data types of the mentioned columns to boolean.
2. Replace None with False in 'MultipleLines' and 'Churn' so that both columns are boolean.
3. Fill the missing values in the 'TotalCharges' column with its mean.
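
The three steps above can be sketched on a small made-up frame. The values below are illustrative only, and the `pd.notna(v) and bool(v)` mapping is just one possible way to treat missing flags as False; it is not necessarily how the cells that follow do it.

```python
import pandas as pd

# Toy data standing in for the real table (values are made up)
toy = pd.DataFrame({
    "MultipleLines": [None, False, True],
    "OnlineSecurity": [True, None, False],
    "TotalCharges": [10.0, None, 30.0],
})

# Steps 1 and 2: treat missing flags as False and store the columns as bool
for col in ["MultipleLines", "OnlineSecurity"]:
    toy[col] = toy[col].map(lambda v: pd.notna(v) and bool(v)).astype(bool)

# Step 3: fill missing TotalCharges with the column mean
toy["TotalCharges"] = toy["TotalCharges"].fillna(toy["TotalCharges"].mean())

print(toy.dtypes)
print(toy)
```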

In [10]:
data.shape

(3000, 21)

In [11]:
# Let's check for null values

data.isna().sum()

customerID            0
gender                0
SeniorCitizen         0
Partner               0
Dependents            0
tenure                0
PhoneService          0
MultipleLines       269
InternetService       0
OnlineSecurity      651
OnlineBackup        651
DeviceProtection    651
TechSupport         651
StreamingTV         651
StreamingMovies     651
Contract              0
PaperlessBilling      0
PaymentMethod         0
MonthlyCharges        0
TotalCharges          5
Churn                 1
dtype: int64

In [12]:
data.duplicated()


0       False
1       False
2       False
3       False
4       False
        ...  
2995    False
2996    False
2997    False
2998    False
2999    False
Length: 3000, dtype: bool
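
The output above is a row-by-row flag, which is hard to scan for 3,000 rows. Summing the flags gives a single count instead. A small sketch on made-up data:

```python
import pandas as pd

toy = pd.DataFrame({"id": [1, 2, 2, 3], "value": ["a", "b", "b", "c"]})

# duplicated() marks every repeat of an earlier row; summing the flags counts them
n_duplicates = int(toy.duplicated().sum())
print(n_duplicates)

# drop_duplicates() keeps the first occurrence of each row
deduplicated = toy.drop_duplicates()
print(len(deduplicated))
```

On the notebook's own frame, the equivalent call would be `data.duplicated().sum()`.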

In [13]:
# Check the unique values in the 'MultipleLines' column.

multiplelines_values = data['MultipleLines'].unique()

print(multiplelines_values)

[None False True]


In [14]:
# Replace None with False in the 'MultipleLines' and 'Churn' columns

# Convert to string (None becomes 'None'), strip whitespace, and lower-case
data['MultipleLines'] = data['MultipleLines'].astype(str).str.strip().str.lower()

# Map the string labels back to booleans; 'none' is treated as False
data['MultipleLines'] = data['MultipleLines'].replace({'none': False, 'false': False, 'true': True})

# Convert to string (None becomes 'None'), strip whitespace, and lower-case
data['Churn'] = data['Churn'].astype(str).str.strip().str.lower()

# Map the string labels back to booleans; 'none' is treated as False
data['Churn'] = data['Churn'].replace({'none': False, 'false': False, 'true': True})






In [15]:
multiplelines_values1 = data['MultipleLines'].unique()

print(multiplelines_values1)

[False  True]


In [16]:
# Change data types of some columns to boolean

data['OnlineSecurity'] = data['OnlineSecurity'].astype(bool)
data['OnlineBackup'] = data['OnlineBackup'].astype(bool)
data['DeviceProtection'] = data['DeviceProtection'].astype(bool)
data['TechSupport'] = data['TechSupport'].astype(bool)
data['StreamingTV'] = data['StreamingTV'].astype(bool)
data['StreamingMovies'] = data['StreamingMovies'].astype(bool)
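
`astype(bool)` applies Python truthiness to each value of an object column. It maps `None` to `False`, which matches the intent here, but a float `NaN` would become `True`. The sketch below uses made-up values to show the difference. It assumes the missing entries in this table arrive as `None`, as the `unique()` output earlier suggests.

```python
import numpy as np
import pandas as pd

with_none = pd.Series([None, True, False], dtype=object)
with_nan = pd.Series([np.nan, True, False], dtype=object)

# None is falsy, so it becomes False
none_result = with_none.astype(bool).tolist()
print(none_result)

# NaN is a non-zero float, so it becomes True
nan_result = with_nan.astype(bool).tolist()
print(nan_result)
```

If a column could contain `NaN` rather than `None`, filling or mapping the missing values explicitly before the cast would avoid the surprise.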

In [17]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        3000 non-null   object 
 1   gender            3000 non-null   object 
 2   SeniorCitizen     3000 non-null   bool   
 3   Partner           3000 non-null   bool   
 4   Dependents        3000 non-null   bool   
 5   tenure            3000 non-null   int64  
 6   PhoneService      3000 non-null   bool   
 7   MultipleLines     3000 non-null   bool   
 8   InternetService   3000 non-null   object 
 9   OnlineSecurity    3000 non-null   bool   
 10  OnlineBackup      3000 non-null   bool   
 11  DeviceProtection  3000 non-null   bool   
 12  TechSupport       3000 non-null   bool   
 13  StreamingTV       3000 non-null   bool   
 14  StreamingMovies   3000 non-null   bool   
 15  Contract          3000 non-null   object 
 16  PaperlessBilling  3000 non-null   bool   


In [18]:
data.isna().sum()

customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        5
Churn               0
dtype: int64

In [19]:
churn_values = data['Churn'].unique()

print(churn_values)

[False  True]


In [20]:
#fill the missing value in the TotalCharges column with its mean

data['TotalCharges'] = data['TotalCharges'].fillna(data['TotalCharges'].mean())

In [21]:
data.isna().sum()

customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64

In [22]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        3000 non-null   object 
 1   gender            3000 non-null   object 
 2   SeniorCitizen     3000 non-null   bool   
 3   Partner           3000 non-null   bool   
 4   Dependents        3000 non-null   bool   
 5   tenure            3000 non-null   int64  
 6   PhoneService      3000 non-null   bool   
 7   MultipleLines     3000 non-null   bool   
 8   InternetService   3000 non-null   object 
 9   OnlineSecurity    3000 non-null   bool   
 10  OnlineBackup      3000 non-null   bool   
 11  DeviceProtection  3000 non-null   bool   
 12  TechSupport       3000 non-null   bool   
 13  StreamingTV       3000 non-null   bool   
 14  StreamingMovies   3000 non-null   bool   
 15  Contract          3000 non-null   object 
 16  PaperlessBilling  3000 non-null   bool   


In [23]:
data2 = pd.read_csv("C:/Users/MST/Desktop/Marlon/LP1/India Start Up Ecosystem/india-ecosystem/Data/startup_funding2018.csv")


In [24]:
data2.head(10)

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...
5,Hasura,"Cloud Infrastructure, PaaS, SaaS",Seed,1600000,"Bengaluru, Karnataka, India",Hasura is a platform that allows developers to...
6,Tripshelf,"Internet, Leisure, Marketplace",Seed,"₹16,000,000","Kalkaji, Delhi, India",Tripshelf is an online market place for holida...
7,Hyperdata.IO,Market Research,Angel,"₹50,000,000","Hyderabad, Andhra Pradesh, India",Hyperdata combines advanced machine learning w...
8,Freightwalla,"Information Services, Information Technology",Seed,—,"Mumbai, Maharashtra, India",Freightwalla is an international forwarder tha...
9,Microchip Payments,Mobile Payments,Seed,—,"Bangalore, Karnataka, India",Microchip payments is a mobile-based payment a...


In [25]:
data2.shape

(526, 6)

In [26]:
data2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
dtypes: object(6)
memory usage: 24.8+ KB


In [27]:
data2.describe()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
count,526,526,526,526,526,526
unique,525,405,21,198,50,524
top,TheCollegeFever,—,Seed,—,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
freq,2,30,280,148,102,2


In [28]:
data2.columns

Index(['Company Name', 'Industry', 'Round/Series', 'Amount', 'Location',
       'About Company'],
      dtype='object')

In [29]:
data2.describe(include='object').columns

Index(['Company Name', 'Industry', 'Round/Series', 'Amount', 'Location',
       'About Company'],
      dtype='object')

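### Problems

1. Column names are inconsistent and need to be renamed.
2. The data type of the 'Amount' column is incorrect: it is stored as object because the values mix plain numbers, currency symbols, thousands separators, and a dash placeholder.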


In [30]:
# Change column names (only the columns whose names actually change are listed)
data2 = data2.rename(columns={'Company Name': 'Company', 'Round/Series': 'Series'})

In [31]:
#Check new column names
data2.head()

Unnamed: 0,Company,Industry,Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...


In [32]:
# Change the 'Amount' column data type

# Step 1: Extract the currency symbol, then strip it from the amount.
# The first character that is not a digit, comma, or period is taken as the symbol;
# the dash placeholder used where no amount is given is captured here too.
data2['Currency'] = data2['Amount'].str.extract(r'([^\d,\.])', expand=False)
data2['Amount'] = data2['Amount'].str.replace(r'[^\d.]', '', regex=True)  # Keep only digits and periods

# Step 2: Convert Amount to float
data2['Amount'] = pd.to_numeric(data2['Amount'], errors='coerce')  # Empty or invalid strings become NaN

# Step 3: Define exchange rates
exchange_rates = {'$': 1.0, '₹': 0.012, '€': 1.1}  # Example exchange rates (to USD)

# Step 4: Apply exchange rates to normalize amounts.
# Rows without a recognized currency symbol get NaN at this stage.
data2['Amount_in_USD'] = data2.apply(
    lambda row: row['Amount'] * exchange_rates.get(row['Currency'], float('nan')),
    axis=1
)
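
To see what Steps 1 and 2 do to individual values, the same two regular expressions can be applied to a few sample strings of the kind shown in the `head()` output above. The helper name `split_amount` is hypothetical and exists only for this illustration.

```python
import re


def split_amount(raw):
    """Return (currency_symbol, numeric_amount) for one raw 'Amount' string.

    Mirrors the two patterns used in the cell above: the first character that
    is not a digit, comma, or period is taken as the currency symbol, and
    everything except digits and periods is stripped before conversion.
    """
    match = re.search(r"[^\d,\.]", raw)
    symbol = match.group(0) if match else None
    digits = re.sub(r"[^\d.]", "", raw)
    amount = float(digits) if digits else None
    return symbol, amount


# "\u2014" is the dash placeholder that appears where no amount is given
for raw in ["250000", "₹40,000,000", "\u2014"]:
    print(repr(raw), "->", split_amount(raw))
```

The third case shows why the `Currency` column ends up holding the dash for rows with no amount: it is simply the first non-numeric character found.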


In [33]:
# Replace NaN values in 'Amount_in_USD' with the corresponding values from 'Amount'
# (amounts without a recognized currency symbol are treated as already being in USD)
data2['Amount_in_USD'] = data2['Amount_in_USD'].fillna(data2['Amount'])

In [34]:
# Fill the remaining null values in the Amount_in_USD column
# by linear interpolation between neighboring rows
data2['Amount_in_USD'] = data2['Amount_in_USD'].interpolate(method='linear')
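
Linear interpolation fills each gap from the nearest non-missing values above and below it in row order, so the imputed amount depends on which companies happen to sit next to each other in the file. A small sketch on made-up numbers follows, with a median fill shown purely as an assumed alternative for comparison.

```python
import numpy as np
import pandas as pd

amounts = pd.Series([100.0, np.nan, 300.0, np.nan, np.nan, 900.0])

# Each gap is filled on a straight line between its neighbours
interpolated = amounts.interpolate(method="linear")
print(interpolated.tolist())

# Alternative (an assumption, not the notebook's method): one value for all gaps
median_filled = amounts.fillna(amounts.median())
print(median_filled.tolist())
```

Which approach is more appropriate depends on whether row order carries any meaning for funding amounts; the notebook does not state that it does.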

In [36]:
# Rename the Amount_in_USD column

data2 = data2.rename(columns={'Amount_in_USD': 'Amount($)'})

In [37]:
data2.head()

Unnamed: 0,Company,Industry,Series,Amount,Location,About Company,Currency,Amount($)
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000.0,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f...",,250000.0
1,Happy Cow Dairy,"Agriculture, Farming",Seed,40000000.0,"Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...,₹,480000.0
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,65000000.0,"Gurgaon, Haryana, India",Leading Online Loans Marketplace in India,₹,780000.0
3,PayMe India,"Financial Services, FinTech",Angel,2000000.0,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...,,2000000.0
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...,—,1800000.0


In [38]:
data2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company        526 non-null    object 
 1   Industry       526 non-null    object 
 2   Series         526 non-null    object 
 3   Amount         378 non-null    float64
 4   Location       526 non-null    object 
 5   About Company  526 non-null    object 
 6   Currency       351 non-null    object 
 7   Amount($)      526 non-null    float64
dtypes: float64(2), object(6)
memory usage: 33.0+ KB


In [39]:
# Check for null values
data2.isna().sum()

Company            0
Industry           0
Series             0
Amount           148
Location           0
About Company      0
Currency         175
Amount($)          0
dtype: int64

In [41]:
data2.duplicated()

0      False
1      False
2      False
3      False
4      False
       ...  
521    False
522    False
523    False
524    False
525    False
Length: 526, dtype: bool

In [43]:
data3 = pd.read_csv("C:/Users/MST/Desktop/Marlon/LP1/India Start Up Ecosystem/india-ecosystem/Data/startup_funding2019.csv")

In [44]:
data3.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",


In [45]:
data3.shape

(89, 9)

In [46]:
data3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
dtypes: float64(1), object(8)
memory usage: 6.4+ KB


In [47]:
data3.describe()

Unnamed: 0,Founded
count,60.0
mean,2014.533333
std,2.937003
min,2004.0
25%,2013.0
50%,2015.0
75%,2016.25
max,2019.0


In [48]:
data3.columns

Index(['Company/Brand', 'Founded', 'HeadQuarter', 'Sector', 'What it does',
       'Founders', 'Investor', 'Amount($)', 'Stage'],
      dtype='object')

In [49]:
#Rename the column names

data3 = data3.rename(columns={'Company/Brand': 'Company', 'What it does': 'About Company', 'Stage' : 'Series'})

In [52]:
# Change the 'Amount($)' column data type

# Step 1: Replace empty strings with NaN
data3['Amount($)'] = data3['Amount($)'].replace('', np.nan)

# Step 2: Remove non-numeric characters (like $ or ₹)
data3['Amount($)'] = data3['Amount($)'].str.replace(r'[^\d.]', '', regex=True)

# Step 3: Convert to float; values with no digits left become NaN
data3['Amount($)'] = pd.to_numeric(data3['Amount($)'], errors='coerce')

In [54]:
data3.isnull().sum()

Company           0
Founded          29
HeadQuarter      19
Sector            5
About Company     0
Founders          3
Investor          0
Amount($)        12
Series           46
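
Raw null counts are easier to compare across datasets of different sizes when expressed as a share of rows. A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Founded": [2014.0, np.nan, 2016.0, np.nan],
    "Sector": ["Edtech", "AgriTech", None, "Ecommerce"],
    "Investor": ["A", "B", "C", "D"],
})

# isna().mean() gives the fraction of missing rows per column
missing_pct = toy.isna().mean().mul(100).round(1).sort_values(ascending=False)
print(missing_pct)
```

Applied to `data3`, the same expression would show, for example, that Series is missing in 46 of 89 rows, roughly 52%.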
dtype: int64