# Visualization Project: Unicorn Startup Analysis

**By Cara-Li Farrell**

Language(s): Python

Software: Jupyter Notebook, Tableau

## I. Data collection

This project uses the "Unicorn Startups" dataset from Kaggle (https://www.kaggle.com/datasets/ramjasmaurya/unicorn-startups?resource=download).


*I could have done this step in Tableau Prep Builder, but publishing my data source requires access to Tableau Cloud, which my free student Tableau subscription does not include.*

## II. Data preprocessing

```python
import pandas as pd

unicorns = pd.read_csv("unicorns_until_sep_2022.csv")
unicorns
```

| | Company | Valuation ($B) | Date Joined | Country | City | Industry | Investors |
|---|---|---|---|---|---|---|---|
| 0 | ByteDance | $140 | 4/7/2017 | China | Beijing | Artificial intelligence | Sequoia Capital China, SIG Asia Investments, S... |
| 1 | SpaceX | $127 | 12/1/2012 | United States | Hawthorne | Other | Founders Fund, Draper Fisher Jurvetson, Rothen... |
| 2 | SHEIN | $100 | 7/3/2018 | China | Shenzhen | E-commerce & direct-to-consumer | Tiger Global Management, Sequoia Capital China... |
| 3 | Stripe | $95 | 1/23/2014 | United States | San Francisco | Fintech | Khosla Ventures, LowercaseCapital, capitalG |
| 4 | Canva | $40 | 1/8/2018 | Australia | Surry Hills | Internet software & services | Sequoia Capital China, Blackbird Ventures, Mat... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1181 | LeadSquared | $1 | 6/21/2022 | India | Bengaluru | Internet software & services | Gaja Capital Partners, Stakeboat Capital, West... |
| 1182 | FourKites | $1 | 6/21/2022 | United States | Chicago | Supply chain, logistics, & delivery | Hyde Park Venture Partners, Bain Capital Ventu... |
| 1183 | VulcanForms | $1 | 7/5/2022 | United States | Burlington | Supply chain, logistics, & delivery | Eclipse Ventures, D1 Capital Partners, Industr... |
| 1184 | SingleStore | $1 | 7/12/2022 | United States | San Francisco | Data management & analytics | Google Ventures, Accel, Data Collective |


### Data Overview

```python
# There are 1186 rows and 7 variables (columns)
print(unicorns.shape)
```

```
(1186, 7)
```


### Data Summary

```python
# Every column except "Investors" is complete, with 1186 non-null values each.
# "Investors" has only 1168 non-null values, so 18 values are missing.
# DataFrame.info() prints its summary itself (and returns None), so print() is not needed.
unicorns.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1186 entries, 0 to 1185
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Company         1186 non-null   object
 1   Valuation ($B)  1186 non-null   object
 2   Date Joined     1186 non-null   object
 3   Country         1186 non-null   object
 4   City            1186 non-null   object
 5   Industry        1186 non-null   object
 6   Investors       1168 non-null   object
dtypes: object(7)
memory usage: 65.0+ KB
```
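
The summary also shows that "Valuation ($B)" and "Date Joined" are stored as `object` (text). If numeric or date operations are needed in pandas rather than in Tableau, a minimal sketch of the conversion follows. It assumes every valuation is a dollar string such as "$140" or "$3.50" and every date follows the month/day/year pattern seen in the table above; the function name `convert_types` and the new columns `valuation_b` and `date_joined` are hypothetical, not part of the original notebook.

```python
import pandas as pd


def convert_types(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric valuations and parsed join dates.

    The added column names are hypothetical choices for this sketch.
    """
    out = df.copy()
    # Strip the leading "$" and convert the rest to a float (billions of USD)
    out["valuation_b"] = out["Valuation ($B)"].str.lstrip("$").astype(float)
    # Parse month/day/year strings such as "4/7/2017"
    out["date_joined"] = pd.to_datetime(out["Date Joined"], format="%m/%d/%Y")
    return out


# Small sample built from values shown in this notebook
sample = pd.DataFrame({
    "Valuation ($B)": ["$140", "$3.50", "$1"],
    "Date Joined": ["4/7/2017", "5/26/2020", "7/12/2022"],
})
converted = convert_types(sample)
print(converted.dtypes)
```

On the full dataset this would be applied as `convert_types(unicorns)`, which leaves the original text columns in place alongside the converted ones.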


```python
# Identify the distinct unicorn companies included in this dataset
unicorn_list = unicorns["Company"].unique()
print("The unique unicorn companies in this dataset are: " + str(unicorn_list))

# Count the number of distinct company names
count = len(unicorn_list)
print("\nThere are " + str(count) + " different unicorn companies in this dataset.")
```

```
The unique unicorn companies in this dataset are: ['ByteDance' 'SpaceX' 'SHEIN' ... 'VulcanForms' 'SingleStore'
 'Unstoppable Domains']

There are 1183 different unicorn companies in this dataset.
```
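
With 1186 rows but only 1183 distinct names, some company names occur more than once (1186 − 1183 = 3 extra rows). Whether these are true duplicates or different companies sharing a name is not established here, so they are worth inspecting before building the dashboard. A minimal sketch using `Series.duplicated(keep=False)`, which flags every occurrence of a repeated value rather than only the later ones; the function name `rows_with_repeated_names` and the toy company names are hypothetical.

```python
import pandas as pd


def rows_with_repeated_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return every row whose "Company" value appears more than once, grouped by name."""
    # keep=False marks all occurrences of a repeated name, including the first
    mask = df["Company"].duplicated(keep=False)
    return df[mask].sort_values("Company")


# Toy example (hypothetical names, not the real data): "Acme" appears twice
demo = pd.DataFrame({
    "Company": ["Acme", "Globex", "Acme"],
    "Country": ["United States", "China", "India"],
})
repeated_names = rows_with_repeated_names(demo)
print(repeated_names)
```

Calling `rows_with_repeated_names(unicorns)` on the full dataset would list the repeated names side by side so their countries, cities and valuations can be compared.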


### Dealing with missing investor values

```python
# Collect all rows with missing investor values
missing_investors = unicorns[unicorns["Investors"].isna()]
missing_investors

# Most of these rows are missing a city, so every value after "Country" has shifted one
# column to the left, which leaves "Investors" empty. To fix them, shift the values from
# the "City" column onward one column to the right and fill the now-empty "City" with "None".
# LinkSure Network is the exception: its city (Shanghai) and industry are in place,
# and it simply has no investors listed.
```

| | Company | Valuation ($B) | Date Joined | Country | City | Industry | Investors |
|---|---|---|---|---|---|---|---|
| 10 | FTX | $32 | 7/20/2021 | Bahamas | Fintech | Sequoia Capital, Thoma Bravo, Softbank | NaN |
| 242 | HyalRoute | $3.50 | 5/26/2020 | Singapore | Mobile & telecommunications | Kuang-Chi | NaN |
| 316 | Amber Group | $3 | 6/21/2021 | Hong Kong | Fintech | Tiger Global Management, Tiger Brokers, DCM Ve... | NaN |
| 346 | Moglix | $2.60 | 5/17/2021 | Singapore | E-commerce & direct-to-consumer | Jungle Ventures, Accel, Venture Highway | NaN |
| 371 | Coda Payments | $2.50 | 4/15/2022 | Singapore | Fintech | GIC. Apis Partners, Insight Partners | NaN |
| 482 | Advance Intelligence Group | $2 | 9/23/2021 | Singapore | Artificial intelligence | Vision Plus Capital, GSR Ventures, ZhenFund | NaN |
| 495 | Trax | $2 | 7/22/2019 | Singapore | Artificial intelligence | Hopu Investment Management, Boyu Capital, DC T... | NaN |
| 865 | Carousell | $1.10 | 9/15/2021 | Singapore | E-commerce & direct-to-consumer | 500 Global, Rakuten Ventures, Golden Gate Vent... | NaN |
| 917 | LinkSure Network | $1 | 1/1/2015 | China | Shanghai | Mobile & telecommunications | NaN |
| 941 | WeLab | $1 | 11/8/2017 | Hong Kong | Fintech | Sequoia Capital China, ING, Alibaba Entreprene... | NaN |
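
One way to perform the right shift described above is sketched below. It assumes a row is shifted when its "Investors" value is missing and its "City" value matches an industry name used by an intact row elsewhere in the dataset; that detection rule and the function name `fix_shifted_rows` are assumptions for this sketch, not the author's method. Under this rule LinkSure Network is left untouched, because "Shanghai" is not an industry.

```python
import numpy as np
import pandas as pd


def fix_shifted_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Shift "City" and "Industry" one column right for rows whose "City" holds an industry.

    Detection rule (an assumption): "Investors" is missing AND "City" matches an
    industry name that appears in a row with investors present.
    """
    out = df.copy()
    # Industry names taken from rows that look intact (investors present)
    known_industries = set(out.loc[out["Investors"].notna(), "Industry"])
    shifted = out["Investors"].isna() & out["City"].isin(known_industries)
    # Move values one column to the right, starting from the last column
    # so that no value is overwritten before it has been copied
    out.loc[shifted, "Investors"] = out.loc[shifted, "Industry"]
    out.loc[shifted, "Industry"] = out.loc[shifted, "City"]
    # The city was never recorded for these rows
    out.loc[shifted, "City"] = "None"
    return out


# Three rows taken from the outputs above ("Valuation ($B)" and "Date Joined" omitted)
sample = pd.DataFrame({
    "Company": ["FTX", "LinkSure Network", "Stripe"],
    "Country": ["Bahamas", "China", "United States"],
    "City": ["Fintech", "Shanghai", "San Francisco"],
    "Industry": [
        "Sequoia Capital, Thoma Bravo, Softbank",
        "Mobile & telecommunications",
        "Fintech",
    ],
    "Investors": [np.nan, np.nan, "Khosla Ventures, LowercaseCapital, capitalG"],
})
cleaned = fix_shifted_rows(sample)
print(cleaned)
```

On the full dataset this would be applied as `unicorns = fix_shifted_rows(unicorns)`. Copying the columns from right to left is what makes the shift safe: copying "City" into "Industry" first would destroy the investor list before it could be moved.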


## III. Descriptive Analysis

Because this project focuses mainly on visualization, the cleaned dataset is used in Tableau to build a dashboard.
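
Tableau can connect directly to a CSV file, so one way to hand the cleaned data over is to write it back out with `DataFrame.to_csv`. This is a minimal sketch; the function name `export_for_tableau` and the filename `unicorns_cleaned.csv` are hypothetical. Passing `index=False` keeps pandas' row index out of the file; without it, reading the file back with `pd.read_csv` produces an extra column named "Unnamed: 0".

```python
import os
import tempfile

import pandas as pd


def export_for_tableau(df: pd.DataFrame, path: str) -> None:
    """Write the cleaned DataFrame to CSV, omitting pandas' row index."""
    df.to_csv(path, index=False)


# Demo: round-trip a small stand-in DataFrame through a temporary directory
demo = pd.DataFrame({"Company": ["Stripe"], "Valuation ($B)": ["$95"]})
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "unicorns_cleaned.csv")  # hypothetical filename
    export_for_tableau(demo, out_path)
    reloaded = pd.read_csv(out_path)
print(reloaded.columns.tolist())
```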