## Case Study - Questions

### Question 1

Your friend has developed a product and wants to build a startup around it. He is looking for a location where the chance of getting investment is high, but because of financial restrictions he can choose only between three locations: Bangalore, Mumbai, and NCR. NCR includes Gurgaon, Noida and New Delhi. As a friend, you want to help him decide on the location. Find the location where funding has happened the greatest number of times, that is, the location whose startups have received funding most often. Plot a bar graph of location against number of fundings.

Treat the city name "Delhi" as "New Delhi". Watch out for inconsistent casing of city names as well: in some places "bangalore" is given instead of "Bangalore", and both should be counted as "Bangalore". A few startups list multiple locations, one Indian and one foreign. Count such a startup if any one of its cities is among the given locations.
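The last rule, counting a startup when any one of its listed cities matches, can be sketched as below. This is only an illustration: the helper name `match_location`, the `TARGET_CITIES` set and the sample strings are hypothetical and are not taken from the dataset.

```python
# Cities that belong to the three candidate locations (NCR is split into its cities).
TARGET_CITIES = {"Bangalore", "Mumbai", "New Delhi", "Gurgaon", "Noida"}

def match_location(location):
    """Return the first target city found in a '/'-separated location, else None."""
    for part in location.split("/"):
        city = part.strip().title()   # normalises case, e.g. "bangalore" -> "Bangalore"
        if city == "Delhi":           # the question asks to treat "Delhi" as "New Delhi"
            city = "New Delhi"
        if city in TARGET_CITIES:
            return city
    return None

# Hypothetical sample values, for illustration only
for raw in ["bangalore", "Delhi", "SFO / Bangalore", "Pune / US"]:
    print(raw, "->", match_location(raw))
```

Unlike a version that looks only at the first entry, this checks every part of the string, so the order in which the Indian and the foreign city are written does not matter.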

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
df_startup = pd.read_csv("startup_funding.csv", encoding = "utf-8")

# Since there are many NaN entries in the data, we can go ahead and drop them 
# as they won't be helpful in any way in this example and will only increase the complexity

df_startup.dropna(subset = ["CityLocation"], inplace = True)

# Now we can correct the problems in our "CityLocation" column such as Delhi -> New Delhi and bangalore -> Bangalore
# and only keep the Indian city (this assumes the Indian city is the one listed first, before the '/')

def updateCity(location):
    cities = location.split('/')
    indian_city = cities[0].strip()
    if indian_city == "Delhi":
        indian_city = "New Delhi"
    if indian_city == "bangalore":
        indian_city = "Bangalore"
    return indian_city

df_startup["CityLocation"] = df_startup.CityLocation.apply(updateCity)

# Now let's extract the startups in Bangalore, Mumbai and Delhi NCR (New Delhi, Gurgaon, Noida)

def chooseCity(city):
    return city in ["Bangalore", "Mumbai", "New Delhi", "Gurgaon", "Noida"]

df_startup = df_startup[df_startup["CityLocation"].apply(chooseCity)]

# Our dataframe now has only the required cities, so we can use value_counts to get the number of fundings per city

city_counts = df_startup["CityLocation"].value_counts()
cities = city_counts.index
no_of_fundings = city_counts.values

# Let's plot this data on a graph

plt.bar(cities, no_of_fundings, color = "green", width = 0.5)
plt.grid()
plt.title("Number of fundings in Bangalore, Mumbai and Delhi NCR (January 2015 to August 2017)")
plt.xlabel("Cities")
plt.ylabel("No. of fundings")
plt.show()

# Let's print the data as well, so that the exact numbers are clear

for i in range(len(cities)):
    print(cities[i], no_of_fundings[i])

<Figure size 640x480 with 1 Axes>

Bangalore 635
Mumbai 449
New Delhi 389
Gurgaon 241
Noida 79


### Inference : 
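Taken as an individual city, **Bangalore** leads: its startups have received funding the greatest number of times, i.e. 635. The question, however, treats NCR as a single location, and New Delhi, Gurgaon and Noida together account for 389 + 241 + 79 = 709 fundings. On that basis **NCR** is the location with the most fundings, followed by Bangalore (635) and Mumbai (449).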
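Because the question treats NCR as one location, it can help to add the three NCR cities together before comparing. The sketch below does this with the counts printed above. The `region_of` mapping and the variable names are illustrative and are not part of the original solution.

```python
import pandas as pd

# Counts copied from the printed output above
city_counts = pd.Series(
    {"Bangalore": 635, "Mumbai": 449, "New Delhi": 389, "Gurgaon": 241, "Noida": 79}
)

# Map each NCR city to the single label "NCR"; other cities keep their own name
region_of = {"New Delhi": "NCR", "Gurgaon": "NCR", "Noida": "NCR"}

# Group the index labels by region and add up the fundings
region_counts = (
    city_counts.groupby(lambda city: region_of.get(city, city))
    .sum()
    .sort_values(ascending=False)
)
print(region_counts)
```

The same `region_counts` Series could be passed to `plt.bar` to plot the three candidate locations instead of the five individual cities.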

### Question 2

Even after trying many times, your friend's startup could not find investment. So you decide to take the matter into your own hands and put together a list of investors who are likely to invest in your friend's startup. Contacting these investors should increase the chance of the startup getting some initial investment. Find the top 5 investors who have invested the greatest number of times (count repeat investments in the same company as well). Several investors may have invested in one startup, so count each investor for that startup. Ignore undisclosed investors.

In [2]:
import pandas as pd
df_startup = pd.read_csv("startup_funding.csv", encoding = "utf-8")

# Let us deal with the undisclosed investors first. Rows with a NaN value in the "InvestorsName" column are dropped,
# and rows labelled "Undisclosed Investor(s)" are filtered out

df_startup.dropna(subset = ["InvestorsName"], inplace = True)
def removeUndisclosed(investor):
    return investor.lower() != "undisclosed investor" and investor.lower() != "undisclosed investors"
df_startup = df_startup[df_startup["InvestorsName"].apply(removeUndisclosed)]

# We can store all the investors and the number of investments they have made in a dictionary

investment_dict = {}
def getInvestors(investors):
    list_of_investors = investors.split(',')
    for inv in list_of_investors:
        inv = inv.strip()
        # Skip empty names left behind by trailing or doubled commas
        if inv == '':
            continue
        investment_dict[inv] = investment_dict.get(inv, 0) + 1

df_startup["InvestorsName"].apply(getInvestors)

# We have the dictionary filled up, let's convert it to a DataFrame

df_new_startup = pd.DataFrame.from_dict(investment_dict, orient='index')

# All the data is in our dataframe, let's sort it in descending order of investments

df_new_startup.sort_values([0], ascending = False, inplace = True)

# Once the DataFrame is sorted in descending order, we can print the top 5 rows

print("Investor","No. of Investments",sep='\t\t')
for i in range(5):
    print(df_new_startup.index[i], df_new_startup.values[i][0], sep = '\t\t')

Investor		No. of Investments
Sequoia Capital		64
Accel Partners		53
Kalaari Capital		44
SAIF Partners		41
Indian Angel Network		40


### Inference : 
The top 5 investors with the maximum number of investments are as follows:
**Sequoia Capital** with 64 investments, **Accel Partners** with 53 investments, **Kalaari Capital** with 44 investments, **SAIF Partners** with 41 investments and **Indian Angel Network** with 40 investments
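The same count can also be written without a hand-built dictionary, using pandas string methods. The sketch below runs on a small made-up frame: the investor names are placeholders and the real `startup_funding.csv` is not read. It also drops undisclosed entries per name rather than per row, which is a slightly different choice from the solution above.

```python
import pandas as pd

# Hypothetical sample with the same column name as the dataset
sample = pd.DataFrame(
    {"InvestorsName": ["Fund A, Fund B", "Fund A", "Undisclosed Investors", None, "Fund B, Fund C, "]}
)

names = (
    sample["InvestorsName"]
    .dropna()          # rows with no investor information
    .str.split(",")    # one list of names per funding round
    .explode()         # one row per (round, investor)
    .str.strip()
)

# Drop empty strings and undisclosed entries
undisclosed = ["undisclosed investor", "undisclosed investors"]
names = names[(names != "") & ~names.str.lower().isin(undisclosed)]

investment_counts = names.value_counts()
print(investment_counts.head(5))
```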

### Question 3

After re-analysing the dataset, you find that some investors have invested in the same startup across several funding rounds. So before finalising the previous list, you want to improve it by finding the top 5 investors who have invested in the greatest number of different startups. This list will be more helpful than the previous one in finding investment for your friend's startup. Find the top 5 investors who have invested the greatest number of times in different companies. That means that if one investor has invested multiple times in one startup, it counts only once for that company. There are many errors in the startup names. Do not try to correct them all, just handle the important ones: Ola, Flipkart, Oyo and Paytm.

In [3]:
import pandas as pd
df_startup = pd.read_csv("startup_funding.csv", encoding = "utf-8")

# Let's start with removing the undisclosed investors

df_startup.dropna(subset = ["InvestorsName"], inplace = True)
def removeUndisclosed(investor):
    return investor.lower() != "undisclosed investor" and investor.lower() != "undisclosed investors"
# .copy() gives an independent DataFrame, so later column assignments do not act on a slice
df_startup = df_startup[df_startup["InvestorsName"].apply(removeUndisclosed)].copy()


# Handling the errors in popular startup names
# (assigning the result back avoids relying on an inplace replace on a single column)

startup_name_fixes = {
    "Flipkart.com": "Flipkart",
    "Ola Cabs": "Ola",
    "Olacabs": "Ola",
    "Paytm Marketplace": "Paytm",
    "Oyo Rooms": "Oyo",
    "Oyorooms": "Oyo",
    "OyoRooms": "Oyo",
    "OYO Rooms": "Oyo",
}
df_startup["StartupName"] = df_startup["StartupName"].replace(startup_name_fixes)

# Let's create a dictionary that maps each investor to the companies they have invested in

invest_dict = {}

def getUniqueInvestors(row):
    list_of_investors = row["InvestorsName"].split(',')
    startup = row["StartupName"]
    for investor in list_of_investors:
        # Strip first, so that whitespace-only entries are skipped too
        investor = investor.strip()
        if investor == '':
            continue
        if investor not in invest_dict:
            invest_dict[investor] = list()
        # Record each startup only once per investor
        if startup not in invest_dict[investor]:
            invest_dict[investor].append(startup)
    
df_startup.apply(getUniqueInvestors, axis = 1)

# Now we can use this dictionary to make another data frame
# (DataFrame.append was removed in pandas 2.0, so we build all the rows in one go)

df_unique_startup = pd.DataFrame(
    [{'Investor': investor, 'UniqueInvestments': len(startups)} for investor, startups in invest_dict.items()],
    columns=['Investor', 'UniqueInvestments']
)

# Our dataframe is now ready, let's sort it and reset the index
    
df_unique_startup.sort_values(["UniqueInvestments"], ascending = False, inplace = True)
df_unique_startup.reset_index(inplace = True, drop = True)

# Now all that's left is to print the top 5 values from this Dataframe

print("Investor","No. of Unique Investments",sep='\t\t')

for i in range(5):
    print(df_unique_startup.Investor[i], df_unique_startup.UniqueInvestments[i], sep='\t\t')

Investor		No. of Unique Investments
Sequoia Capital		48
Accel Partners		47
Kalaari Capital		41
Indian Angel Network		40
Blume Ventures		36


### Inference : 
The top 5 investors with the maximum number of unique investments are as follows:
**Sequoia Capital** with 48 unique investments, **Accel Partners** with 47 unique investments, **Kalaari Capital** with 41 unique investments, **Indian Angel Network** with 40 unique investments and **Blume Ventures** with 36 unique investments
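The "count each startup once per investor" rule can also be expressed with `explode` and `nunique`. The following is a sketch on made-up rows, not the author's method: the investor names are placeholders and only two of the name fixes are shown.

```python
import pandas as pd

# Hypothetical sample using the dataset's column names
sample = pd.DataFrame({
    "StartupName": ["Ola Cabs", "Ola", "Flipkart.com", "Paytm"],
    "InvestorsName": ["Fund A, Fund B", "Fund A", "Fund A", "Fund B"],
})

name_fixes = {"Ola Cabs": "Ola", "Flipkart.com": "Flipkart"}

pairs = (
    sample.assign(
        StartupName=sample["StartupName"].replace(name_fixes),
        Investor=sample["InvestorsName"].str.split(","),
    )
    .explode("Investor")          # one row per (startup, investor)
    .reset_index(drop=True)
)
pairs = pairs.assign(Investor=pairs["Investor"].str.strip())
pairs = pairs[pairs["Investor"] != ""]

# nunique counts every startup at most once for each investor
unique_counts = (
    pairs.groupby("Investor")["StartupName"].nunique().sort_values(ascending=False)
)
print(unique_counts.head(5))
```

Here "Fund A" appears in three rows but in only two distinct startups once "Ola Cabs" is merged into "Ola", so its count is 2.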

### Question 4

Even after so much effort spent on finding probable investors, the list did not turn out to be helpful for your friend. So you go to an investor friend to understand the situation better, and he explains the different investment types and their features. This new information will help in finding the right investor. Since your friend's startup is at an early stage, the best-suited investment types are Seed Funding and Crowd Funding. Find the top 5 investors who have invested in the greatest number of different startups where the investment type is Crowd Funding or Seed Funding. The correct spellings of the investment types are "Private Equity", "Seed Funding", "Debt Funding", and "Crowd Funding". Keep an eye out for spelling mistakes, which you can find by printing the unique values of this column. There are many errors in the startup names. Do not try to correct them all, just handle the important ones: Ola, Flipkart, Oyo and Paytm.
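Printing the unique values of the column is a quick way to spot the misspellings. The sketch below uses a hand-written Series instead of the real column, so the exact set of variants in `startup_funding.csv` may differ. The three fixes are the ones applied in the solution that follows.

```python
import pandas as pd

# Hypothetical sample of raw labels, mixing correct and misspelled variants
raw = pd.Series(
    ["Seed Funding", "SeedFunding", "Crowd funding", "PrivateEquity", "Private Equity", "Debt Funding"],
    name="InvestmentType",
)

# Step 1: look at what is actually in the column
print(raw.unique())

# Step 2: map every misspelling to its correct form
fixes = {
    "SeedFunding": "Seed Funding",
    "PrivateEquity": "Private Equity",
    "Crowd funding": "Crowd Funding",
}
cleaned = raw.replace(fixes)

# Step 3: confirm that only the four expected labels remain
print(sorted(cleaned.unique()))
```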

In [4]:
import pandas as pd
df_startup = pd.read_csv("startup_funding.csv", encoding = "utf-8")

# Let's start with removing the undisclosed investors

df_startup.dropna(subset = ["InvestorsName"], inplace = True)

def removeUndisclosed(investor):
    return investor.lower() != "undisclosed investor" and investor.lower() != "undisclosed investors"
# .copy() gives an independent DataFrame, so later column assignments do not act on a slice
df_startup = df_startup[df_startup["InvestorsName"].apply(removeUndisclosed)].copy()

# Handling the errors in "InvestmentType"

investment_type_fixes = {
    "SeedFunding": "Seed Funding",
    "PrivateEquity": "Private Equity",
    "Crowd funding": "Crowd Funding",
}
df_startup["InvestmentType"] = df_startup["InvestmentType"].replace(investment_type_fixes)

# We only need Seed Funding and Crowd Funding

df_startup = df_startup[df_startup["InvestmentType"].isin(["Crowd Funding", "Seed Funding"])].copy()

# Handling the errors in popular startup names
# (assigning the result back avoids relying on an inplace replace on a single column)

startup_name_fixes = {
    "Flipkart.com": "Flipkart",
    "Ola Cabs": "Ola",
    "Olacabs": "Ola",
    "Paytm Marketplace": "Paytm",
    "Oyo Rooms": "Oyo",
    "Oyorooms": "Oyo",
    "OyoRooms": "Oyo",
    "OYO Rooms": "Oyo",
}
df_startup["StartupName"] = df_startup["StartupName"].replace(startup_name_fixes)

# Let's create a dictionary that maps each investor to the companies they have invested in

invest_dict = {}

def getUniqueInvestors(row):
    list_of_investors = row["InvestorsName"].split(',')
    startup = row["StartupName"]
    for investor in list_of_investors:
        # Strip first, so that whitespace-only entries are skipped too
        investor = investor.strip()
        if investor == '':
            continue
        if investor not in invest_dict:
            invest_dict[investor] = list()
        # Record each startup only once per investor
        if startup not in invest_dict[investor]:
            invest_dict[investor].append(startup)
    
df_startup.apply(getUniqueInvestors, axis = 1)

# Now we can use this dictionary to make another data frame
# (DataFrame.append was removed in pandas 2.0, so we build all the rows in one go)

df_unique_startup = pd.DataFrame(
    [{'Investor': investor, 'UniqueInvestments': len(startups)} for investor, startups in invest_dict.items()],
    columns=['Investor', 'UniqueInvestments']
)

# Our dataframe is now ready, let's sort it and reset the index
    
df_unique_startup.sort_values(["UniqueInvestments"], ascending = False, inplace = True)
df_unique_startup.reset_index(inplace = True, drop = True)

# Now all that's left is to print the top 5 values from this Dataframe

print("Investor","No. of Unique Investments (Seed Funding and Crowd Funding)",sep='\t\t')

for i in range(5):
    print(df_unique_startup.Investor[i], df_unique_startup.UniqueInvestments[i], sep='\t\t')

Investor		No. of Unique Investments (Seed Funding and Crowd Funding)
Indian Angel Network		33
Rajan Anandan		23
LetsVenture		16
Anupam Mittal		16
Group of Angel Investors		14


### Inference : 
The top 5 investors with investment type "Seed Funding" or "Crowd Funding" and the maximum number of unique investments are as follows:
**Indian Angel Network** with 33 unique investments, **Rajan Anandan** with 23 unique investments, **LetsVenture** with 16 unique investments, **Anupam Mittal** with 16 unique investments and **Group of Angel Investors** with 14 unique investments
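Two investors in this list are tied at 16, so a fixed cut-off can split a tie depending on where it falls. One way to make that visible is `Series.nlargest` with `keep="all"`. The sketch below uses the counts printed above, and the cut-off of 3 is chosen only to demonstrate the effect.

```python
import pandas as pd

# Counts copied from the printed output above
counts = pd.Series({
    "Indian Angel Network": 33,
    "Rajan Anandan": 23,
    "LetsVenture": 16,
    "Anupam Mittal": 16,
    "Group of Angel Investors": 14,
})

# Default behaviour: exactly 3 rows, so one of the two investors tied at 16 is dropped
top3 = counts.nlargest(3)

# keep="all": every investor tied with the last kept value is included as well
top3_with_ties = counts.nlargest(3, keep="all")

print(top3)
print(top3_with_ties)
```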

### Question 5

Thanks to your help, your friend's startup has successfully raised seed funding and is now operational. Your friend wants to expand the startup and is looking for new investors. Once again you come to the rescue and want to create a list of probable new investors. Before moving forward, you remember your investor friend's advice to look for investors by analysing the investment type. Since your friend's startup is no longer in the early phase but in the growth stage, the best-suited investment type is Private Equity. Find the top 5 investors who have invested in the greatest number of different startups where the investment type is Private Equity. The correct spellings of the investment types are "Private Equity", "Seed Funding", "Debt Funding", and "Crowd Funding". Keep an eye out for spelling mistakes, which you can find by printing the unique values of this column. There are many errors in the startup names. Do not try to correct them all, just handle the important ones: Ola, Flipkart, Oyo and Paytm.

In [5]:
import pandas as pd
df_startup = pd.read_csv("startup_funding.csv", encoding = "utf-8")

# Let's start with removing the undisclosed investors

df_startup.dropna(subset = ["InvestorsName"], inplace = True)

def removeUndisclosed(investor):
    return investor.lower() != "undisclosed investor" and investor.lower() != "undisclosed investors"
# .copy() gives an independent DataFrame, so later column assignments do not act on a slice
df_startup = df_startup[df_startup["InvestorsName"].apply(removeUndisclosed)].copy()

# Handling the errors in "InvestmentType"

investment_type_fixes = {
    "SeedFunding": "Seed Funding",
    "PrivateEquity": "Private Equity",
    "Crowd funding": "Crowd Funding",
}
df_startup["InvestmentType"] = df_startup["InvestmentType"].replace(investment_type_fixes)

# We only need the investment type Private Equity

df_startup = df_startup[df_startup["InvestmentType"] == "Private Equity"].copy()

# Handling the errors in popular startup names
# (assigning the result back avoids relying on an inplace replace on a single column)

startup_name_fixes = {
    "Flipkart.com": "Flipkart",
    "Ola Cabs": "Ola",
    "Olacabs": "Ola",
    "Paytm Marketplace": "Paytm",
    "Oyo Rooms": "Oyo",
    "Oyorooms": "Oyo",
    "OyoRooms": "Oyo",
    "OYO Rooms": "Oyo",
}
df_startup["StartupName"] = df_startup["StartupName"].replace(startup_name_fixes)

# Let's create a dictionary that maps each investor to the companies they have invested in

invest_dict = {}

def getUniqueInvestors(row):
    list_of_investors = row["InvestorsName"].split(',')
    startup = row["StartupName"]
    for investor in list_of_investors:
        # Strip first, so that whitespace-only entries are skipped too
        investor = investor.strip()
        if investor == '':
            continue
        if investor not in invest_dict:
            invest_dict[investor] = list()
        # Record each startup only once per investor
        if startup not in invest_dict[investor]:
            invest_dict[investor].append(startup)
    
df_startup.apply(getUniqueInvestors, axis = 1)

# Now we can use this dictionary to make another data frame
# (DataFrame.append was removed in pandas 2.0, so we build all the rows in one go)

df_unique_startup = pd.DataFrame(
    [{'Investor': investor, 'UniqueInvestments': len(startups)} for investor, startups in invest_dict.items()],
    columns=['Investor', 'UniqueInvestments']
)

# Our dataframe is now ready, let's sort it and reset the index
    
df_unique_startup.sort_values(["UniqueInvestments"], ascending = False, inplace = True)
df_unique_startup.reset_index(inplace = True, drop = True)

# Now all that's left is to print the top 5 values from this Dataframe

print("Investor","No. of Unique Investments (Private Equity)",sep='\t\t')

for i in range(5):
    print(df_unique_startup.Investor[i], df_unique_startup.UniqueInvestments[i], sep='\t\t')

Investor		No. of Unique Investments (Private Equity)
Sequoia Capital		45
Accel Partners		43
Kalaari Capital		35
Blume Ventures		27
SAIF Partners		24


### Inference : 
The top 5 investors with investment type "Private Equity" and the maximum number of unique investments are as follows:
**Sequoia Capital** with 45 unique investments, **Accel Partners** with 43 unique investments, **Kalaari Capital** with 35 unique investments, **Blume Ventures** with 27 unique investments and **SAIF Partners** with 24 unique investments
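Questions 3 to 5 repeat the same pipeline and differ only in the investment-type filter, so the steps could be collected into one function. The sketch below is one possible arrangement, not the original solution: the function name `top_unique_investors`, its parameters and the sample rows are hypothetical, while the column names and the spelling fixes are the ones used above.

```python
import pandas as pd

NAME_FIXES = {
    "Flipkart.com": "Flipkart", "Ola Cabs": "Ola", "Olacabs": "Ola",
    "Paytm Marketplace": "Paytm", "Oyo Rooms": "Oyo", "Oyorooms": "Oyo",
    "OyoRooms": "Oyo", "OYO Rooms": "Oyo",
}
TYPE_FIXES = {
    "SeedFunding": "Seed Funding",
    "PrivateEquity": "Private Equity",
    "Crowd funding": "Crowd Funding",
}
UNDISCLOSED = ["undisclosed investor", "undisclosed investors"]

def top_unique_investors(df, investment_types=None, n=5):
    """Count distinct startups per investor, optionally restricted to some investment types."""
    df = df.dropna(subset=["InvestorsName"])
    df = df[~df["InvestorsName"].str.lower().isin(UNDISCLOSED)]
    df = df.assign(
        InvestmentType=df["InvestmentType"].replace(TYPE_FIXES),
        StartupName=df["StartupName"].replace(NAME_FIXES),
    )
    if investment_types is not None:
        df = df[df["InvestmentType"].isin(investment_types)]
    # One row per (startup, investor) pair
    pairs = (
        df.assign(Investor=df["InvestorsName"].str.split(","))
        .explode("Investor")
        .reset_index(drop=True)
    )
    pairs = pairs.assign(Investor=pairs["Investor"].str.strip())
    pairs = pairs[pairs["Investor"] != ""]
    counts = pairs.groupby("Investor")["StartupName"].nunique()
    return counts.sort_values(ascending=False).head(n)

# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({
    "StartupName": ["Oyo Rooms", "Oyo", "Paytm", "Flipkart"],
    "InvestorsName": ["Fund A", "Fund A, Fund B", "Fund B", "Undisclosed Investors"],
    "InvestmentType": ["PrivateEquity", "Private Equity", "Seed Funding", "Private Equity"],
})

overall = top_unique_investors(sample)
private_equity = top_unique_investors(sample, ["Private Equity"])
print(overall)
print(private_equity)
```

With a helper like this, Question 3 corresponds to a call with no type filter, Question 4 to `["Seed Funding", "Crowd Funding"]` and Question 5 to `["Private Equity"]`.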