# Raising Series A Fund Prediction
InReach Venture's technical interview task: predict if a company will raise series A, with a stretch to predict how much.

Here I will take the necessary steps to make this happen. I will start by importing the necessary packages. Next I will load the data and conduct exploratory data analysis (EDA) this is critical when deciding on which machine specific features which is then fed intomy machine learning model. I'd like to then do a quick accuracy measure using an ROC curve to see the performance of the chosen machine learning algorithm.

I will then do a quick summary of my findings.

In [1]:
#importing necessary packages
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import statsmodels.api as sm
from sklearn.model_selection import train_test_split,KFold,cross_validate
from sklearn.linear_model import LinearRegression

  import pandas.util.testing as tm


In [2]:
#importing the necessary data
funding_rounds = pd.read_csv('../venturecapitalproject/data/funding_rounds.csv', delimiter=',', quotechar='"', escapechar='\\')
objects = pd.read_csv('../venturecapitalproject/data/objects.csv',  delimiter=',', quotechar='"', escapechar='\\')

In [3]:
funding_rounds.head()

Unnamed: 0,id,funding_round_id,object_id,funded_at,funding_round_type,funding_round_code,raised_amount_usd,raised_amount,raised_currency_code,pre_money_valuation_usd,...,post_money_valuation,post_money_currency_code,participants,is_first_round,is_last_round,source_url,source_description,created_by,created_at,updated_at
0,1,1,c:4,2006-12-01,series-b,b,8500000,8500000,USD,N,...,N,N,2,0,0,http://www.marketingvox.com/archives/2006/12/2...,N,initial-importer,2007-07-04 04:52:57,2008-02-27 23:14:29
1,2,2,c:5,2004-09-01,angel,angel,500000,500000,USD,N,...,N,USD,2,0,1,N,N,initial-importer,2007-05-27 06:08:18,2013-06-28 20:07:23
2,3,3,c:5,2005-05-01,series-a,a,12700000,12700000,USD,115000000,...,N,USD,3,0,0,http://www.techcrunch.com/2007/11/02/jim-breye...,Jim Breyer: Extra $500 Million Round For Faceb...,initial-importer,2007-05-27 06:09:10,2013-06-28 20:07:23
3,4,4,c:5,2006-04-01,series-b,b,27500000,27500000,USD,525000000,...,N,USD,4,0,0,http://www.facebook.com/press/info.php?factsheet,Facebook Funding,initial-importer,2007-05-27 06:09:36,2013-06-28 20:07:24
4,5,5,c:7299,2006-05-01,series-b,b,10500000,10500000,USD,N,...,N,N,2,0,0,http://www.techcrunch.com/2006/05/14/photobuck...,PhotoBucket Closes $10.5M From Trinity Ventures,initial-importer,2007-05-29 11:05:59,2008-04-16 17:09:12


In [4]:
funding_rounds.shape

(52928, 23)

In [5]:
funding_rounds.keys()

Index(['id', 'funding_round_id', 'object_id', 'funded_at',
       'funding_round_type', 'funding_round_code', 'raised_amount_usd',
       'raised_amount', 'raised_currency_code', 'pre_money_valuation_usd',
       'pre_money_valuation', 'pre_money_currency_code',
       'post_money_valuation_usd', 'post_money_valuation',
       'post_money_currency_code', 'participants', 'is_first_round',
       'is_last_round', 'source_url', 'source_description', 'created_by',
       'created_at', 'updated_at '],
      dtype='object')

In [6]:
funding_rounds.dtypes

id                           int64
funding_round_id             int64
object_id                   object
funded_at                   object
funding_round_type          object
funding_round_code          object
raised_amount_usd           object
raised_amount               object
raised_currency_code        object
pre_money_valuation_usd     object
pre_money_valuation         object
pre_money_currency_code     object
post_money_valuation_usd    object
post_money_valuation        object
post_money_currency_code    object
participants                 int64
is_first_round               int64
is_last_round                int64
source_url                  object
source_description          object
created_by                  object
created_at                  object
updated_at                  object
dtype: object

In [7]:
funding_rounds.describe()

Unnamed: 0,id,funding_round_id,participants,is_first_round,is_last_round
count,52928.0,52928.0,52928.0,52928.0,52928.0
mean,28962.894536,28962.894536,1.528567,0.604576,0.604538
std,16821.871803,16821.871803,2.060192,0.488946,0.488954
min,1.0,1.0,0.0,0.0,0.0
25%,14343.75,14343.75,0.0,0.0,0.0
50%,28885.5,28885.5,1.0,1.0,1.0
75%,43561.25,43561.25,2.0,1.0,1.0
max,57952.0,57952.0,36.0,1.0,1.0


In [8]:
objects.head()

Unnamed: 0,id,entity_type,entity_id,parent_id,name,normalized_name,permalink,category_code,status,founded_at,...,last_funding_at,funding_rounds,funding_total_usd,first_milestone_at,last_milestone_at,milestones,relationships,created_by,created_at,updated_at
0,c:1,Company,1,N,Wetpaint,wetpaint,/company/wetpaint,web,operating,2005-10-17,...,2008-05-19,3,39750000,2010-09-05,2013-09-18,5,17,initial-importer,2007-05-25 06:51:27,2013-04-13 03:29:00
1,c:10,Company,10,N,Flektor,flektor,/company/flektor,games_video,acquired,N,...,N,N,N,N,N,N,6,initial-importer,2007-05-31 21:11:51,2008-05-23 23:23:14
2,c:100,Company,100,N,There,there,/company/there,games_video,acquired,N,...,N,N,N,2003-02-01,2011-09-23,4,12,initial-importer,2007-08-06 23:52:45,2013-11-04 02:09:48
3,c:10000,Company,10000,N,MYWEBBO,mywebbo,/company/mywebbo,network_hosting,operating,2008-07-26,...,N,N,N,N,N,N,N,N,2008-08-24 16:51:57,2008-09-06 14:19:18
4,c:10001,Company,10001,N,THE Movie Streamer,the movie streamer,/company/the-movie-streamer,games_video,operating,2008-07-26,...,N,N,N,N,N,N,N,N,2008-08-24 17:10:34,2008-09-06 14:19:18


In [9]:
objects.shape

(462651, 40)

In [10]:
objects.keys()

Index(['id', 'entity_type', 'entity_id', 'parent_id', 'name',
       'normalized_name', 'permalink', 'category_code', 'status', 'founded_at',
       'closed_at', 'domain', 'homepage_url', 'twitter_username', 'logo_url',
       'logo_width', 'logo_height', 'short_description', 'description',
       'overview', 'tag_list', 'country_code', 'state_code', 'city', 'region',
       'first_investment_at', 'last_investment_at', 'investment_rounds',
       'invested_companies', 'first_funding_at', 'last_funding_at',
       'funding_rounds', 'funding_total_usd', 'first_milestone_at',
       'last_milestone_at', 'milestones', 'relationships', 'created_by',
       'created_at', 'updated_at'],
      dtype='object')

In [11]:
objects.dtypes

id                     object
entity_type            object
entity_id               int64
parent_id              object
name                   object
normalized_name        object
permalink              object
category_code          object
status                 object
founded_at             object
closed_at              object
domain                 object
homepage_url           object
twitter_username       object
logo_url               object
logo_width             object
logo_height            object
short_description      object
description            object
overview               object
tag_list               object
country_code           object
state_code             object
city                   object
region                 object
first_investment_at    object
last_investment_at     object
investment_rounds      object
invested_companies     object
first_funding_at       object
last_funding_at        object
funding_rounds         object
funding_total_usd      object
first_mile

## One common occurence within the datasets is the `N` listed in various columns. I am going to opt out of removing any columns with this entry. When having a closer look the N is placed in spaces for which companies on the crunchbase dataset choose not to disclose certain information within columns. 

# Cleaning & Preprocessing

In [12]:
funding_rounds.head()

Unnamed: 0,id,funding_round_id,object_id,funded_at,funding_round_type,funding_round_code,raised_amount_usd,raised_amount,raised_currency_code,pre_money_valuation_usd,...,post_money_valuation,post_money_currency_code,participants,is_first_round,is_last_round,source_url,source_description,created_by,created_at,updated_at
0,1,1,c:4,2006-12-01,series-b,b,8500000,8500000,USD,N,...,N,N,2,0,0,http://www.marketingvox.com/archives/2006/12/2...,N,initial-importer,2007-07-04 04:52:57,2008-02-27 23:14:29
1,2,2,c:5,2004-09-01,angel,angel,500000,500000,USD,N,...,N,USD,2,0,1,N,N,initial-importer,2007-05-27 06:08:18,2013-06-28 20:07:23
2,3,3,c:5,2005-05-01,series-a,a,12700000,12700000,USD,115000000,...,N,USD,3,0,0,http://www.techcrunch.com/2007/11/02/jim-breye...,Jim Breyer: Extra $500 Million Round For Faceb...,initial-importer,2007-05-27 06:09:10,2013-06-28 20:07:23
3,4,4,c:5,2006-04-01,series-b,b,27500000,27500000,USD,525000000,...,N,USD,4,0,0,http://www.facebook.com/press/info.php?factsheet,Facebook Funding,initial-importer,2007-05-27 06:09:36,2013-06-28 20:07:24
4,5,5,c:7299,2006-05-01,series-b,b,10500000,10500000,USD,N,...,N,N,2,0,0,http://www.techcrunch.com/2006/05/14/photobuck...,PhotoBucket Closes $10.5M From Trinity Ventures,initial-importer,2007-05-29 11:05:59,2008-04-16 17:09:12


In [13]:
funding_rounds.dtypes

id                           int64
funding_round_id             int64
object_id                   object
funded_at                   object
funding_round_type          object
funding_round_code          object
raised_amount_usd           object
raised_amount               object
raised_currency_code        object
pre_money_valuation_usd     object
pre_money_valuation         object
pre_money_currency_code     object
post_money_valuation_usd    object
post_money_valuation        object
post_money_currency_code    object
participants                 int64
is_first_round               int64
is_last_round                int64
source_url                  object
source_description          object
created_by                  object
created_at                  object
updated_at                  object
dtype: object

My first step is to remove any currencies/columns that are not in USD. Most companies even outside of the U.S. choose to raise funds in dollars. Another common occurence that I see within the dataset is that most of the companies listed here are based in the U.S. This is important to InReach, given that InReach aims to change the market and make venture capital more global and standardized across the UK & EU. 

In [14]:
funding_rounds = funding_rounds.drop(['raised_amount', 'raised_currency_code',
                                     'pre_money_valuation', 'pre_money_currency_code', 
                                      'post_money_valuation', 'post_money_currency_code',
                                     'source_url', 'source_description', 'updated_at '], axis =1)

In [15]:
funding_rounds.head()

Unnamed: 0,id,funding_round_id,object_id,funded_at,funding_round_type,funding_round_code,raised_amount_usd,pre_money_valuation_usd,post_money_valuation_usd,participants,is_first_round,is_last_round,created_by,created_at
0,1,1,c:4,2006-12-01,series-b,b,8500000,N,N,2,0,0,initial-importer,2007-07-04 04:52:57
1,2,2,c:5,2004-09-01,angel,angel,500000,N,N,2,0,1,initial-importer,2007-05-27 06:08:18
2,3,3,c:5,2005-05-01,series-a,a,12700000,115000000,N,3,0,0,initial-importer,2007-05-27 06:09:10
3,4,4,c:5,2006-04-01,series-b,b,27500000,525000000,N,4,0,0,initial-importer,2007-05-27 06:09:36
4,5,5,c:7299,2006-05-01,series-b,b,10500000,N,N,2,0,0,initial-importer,2007-05-29 11:05:59


In [16]:
print(funding_rounds.shape)

(52928, 14)


In [17]:
#check to see object_id in terms of uniqueness, this will show us how many companies have listed multiple rounds on
#this crunchbase dataset
funding_rounds['object_id'].nunique()

31939

This shows that 31,939 entries are different. My assumption here would be that the remaining amount of data entries are from the same companines on different funding rounds. Now that I have seen the number of unique `object_id`, next I would like to query which `object_id` have made it to series A and obtain the sum. 

In [18]:
obtained_series_a = funding_rounds.query('funding_round_type == "series-a"').groupby(['object_id']).count()
obtained_series_a


Unnamed: 0_level_0,id,funding_round_id,funded_at,funding_round_type,funding_round_code,raised_amount_usd,pre_money_valuation_usd,post_money_valuation_usd,participants,is_first_round,is_last_round,created_by,created_at
object_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
c:1,1,1,1,1,1,1,1,1,1,1,1,1,1
c:1001,1,1,1,1,1,1,1,1,1,1,1,1,1
c:10015,1,1,1,1,1,1,1,1,1,1,1,1,1
c:100271,1,1,1,1,1,1,1,1,1,1,1,1,1
c:1003,1,1,1,1,1,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
c:993,1,1,1,1,1,1,1,1,1,1,1,1,1
c:994,1,1,1,1,1,1,1,1,1,1,1,1,1
c:99685,1,1,1,1,1,1,1,1,1,1,1,1,1
c:9972,1,1,1,1,1,1,1,1,1,1,1,1,1


In [19]:
obtained_series_a.sum()

id                          9873
funding_round_id            9873
funded_at                   9873
funding_round_type          9873
funding_round_code          9873
raised_amount_usd           9873
pre_money_valuation_usd     9873
post_money_valuation_usd    9873
participants                9873
is_first_round              9873
is_last_round               9873
created_by                  9873
created_at                  9873
dtype: int64

In [20]:
#who has raised series angel, convertible, seed
funding_rounds['raised_before_a'] = np.where((funding_rounds['funding_round_type'] == 'angel') |
                                      (funding_rounds['funding_round_type'] == 'convertible') |
                                      (funding_rounds['funding_round_type'] == 'seed'), 1, 0) 

In [21]:
#who has raised series a
funding_rounds['raised_series_a'] = np.where((funding_rounds['funding_round_type'] == 'series-a'), 1, 0)

In [22]:
funding_rounds.head()

Unnamed: 0,id,funding_round_id,object_id,funded_at,funding_round_type,funding_round_code,raised_amount_usd,pre_money_valuation_usd,post_money_valuation_usd,participants,is_first_round,is_last_round,created_by,created_at,raised_before_a,raised_series_a
0,1,1,c:4,2006-12-01,series-b,b,8500000,N,N,2,0,0,initial-importer,2007-07-04 04:52:57,0,0
1,2,2,c:5,2004-09-01,angel,angel,500000,N,N,2,0,1,initial-importer,2007-05-27 06:08:18,1,0
2,3,3,c:5,2005-05-01,series-a,a,12700000,115000000,N,3,0,0,initial-importer,2007-05-27 06:09:10,0,1
3,4,4,c:5,2006-04-01,series-b,b,27500000,525000000,N,4,0,0,initial-importer,2007-05-27 06:09:36,0,0
4,5,5,c:7299,2006-05-01,series-b,b,10500000,N,N,2,0,0,initial-importer,2007-05-29 11:05:59,0,0


## Drop rows that are post series A

In [23]:
funding_rounds['funding_round_code'].unique()

array(['b', 'angel', 'a', 'seed', 'c', 'd', 'unattributed', 'debt_round',
       'e', 'f', 'private_equity', 'grant', 'post_ipo_equity',
       'post_ipo_debt', 'partial', 'convertible', 'crowd', 'g',
       'secondary_market', 'crowd_equity'], dtype=object)

In [24]:
#deleting all rounds after series A
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'b']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'c']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'd']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'unattributed']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'debt_round']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'e']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'f']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'private_equity']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'grant']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'post_ipo_equity']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'post_ipo_debt']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'partial']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'convertible']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'crowed']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'g']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'secondary_market']
funding_rounds = funding_rounds[funding_rounds.funding_round_code != 'crowd_equity']
funding_rounds.shape #checking to see amount reduced

(22957, 16)

In [25]:
funding_rounds.head()

Unnamed: 0,id,funding_round_id,object_id,funded_at,funding_round_type,funding_round_code,raised_amount_usd,pre_money_valuation_usd,post_money_valuation_usd,participants,is_first_round,is_last_round,created_by,created_at,raised_before_a,raised_series_a
1,2,2,c:5,2004-09-01,angel,angel,500000,N,N,2,0,1,initial-importer,2007-05-27 06:08:18,1,0
2,3,3,c:5,2005-05-01,series-a,a,12700000,115000000,N,3,0,0,initial-importer,2007-05-27 06:09:10,0,1
5,6,6,c:9,2007-01-01,series-a,a,1500000,8500000,10000000,1,0,1,initial-importer,2007-05-31 20:19:28,0,1
7,8,8,c:13,2005-12-01,series-a,seed,1500000,N,N,4,0,0,initial-importer,2007-06-01 19:14:34,0,1
8,9,9,c:14,2007-05-01,series-a,a,6300000,N,N,2,0,1,initial-importer,2007-06-01 20:09:47,0,1


In [26]:
funding_rounds.dtypes

id                           int64
funding_round_id             int64
object_id                   object
funded_at                   object
funding_round_type          object
funding_round_code          object
raised_amount_usd           object
pre_money_valuation_usd     object
post_money_valuation_usd    object
participants                 int64
is_first_round               int64
is_last_round                int64
created_by                  object
created_at                  object
raised_before_a              int64
raised_series_a              int64
dtype: object

## Split Pre-Series A and Series A rounds 

In [27]:
funding_rounds['angel'] = funding_rounds['funding_round_code'].apply(lambda x: 1 if x == 'angel' else 0)
funding_rounds['seed'] = funding_rounds['funding_round_code'].apply(lambda x: 1 if x == 'seed' else 0)
funding_rounds['convertible'] = funding_rounds['funding_round_code'].apply(lambda x: 1 if x == 'convertible' else 0)

funding_rounds.head(100)


Unnamed: 0,id,funding_round_id,object_id,funded_at,funding_round_type,funding_round_code,raised_amount_usd,pre_money_valuation_usd,post_money_valuation_usd,participants,is_first_round,is_last_round,created_by,created_at,raised_before_a,raised_series_a,angel,seed,convertible
1,2,2,c:5,2004-09-01,angel,angel,500000,N,N,2,0,1,initial-importer,2007-05-27 06:08:18,1,0,1,0,0
2,3,3,c:5,2005-05-01,series-a,a,12700000,115000000,N,3,0,0,initial-importer,2007-05-27 06:09:10,0,1,0,0,0
5,6,6,c:9,2007-01-01,series-a,a,1500000,8500000,10000000,1,0,1,initial-importer,2007-05-31 20:19:28,0,1,0,0,0
7,8,8,c:13,2005-12-01,series-a,seed,1500000,N,N,4,0,0,initial-importer,2007-06-01 19:14:34,0,1,0,1,0
8,9,9,c:14,2007-05-01,series-a,a,6300000,N,N,2,0,1,initial-importer,2007-06-01 20:09:47,0,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
164,173,173,c:180,2004-09-01,series-a,a,6600000,N,N,2,0,1,initial-importer,2007-07-18 06:30:19,0,1,0,0,0
167,176,176,c:192,2005-03-01,series-a,a,1500000,N,N,9,0,1,initial-importer,2007-07-18 07:02:03,0,1,0,0,0
168,177,177,c:253,2000-05-01,series-a,a,7700000,N,N,1,0,1,initial-importer,2007-07-24 10:56:39,0,1,0,0,0
172,181,181,c:190,2007-07-01,series-a,a,5000000,N,N,2,1,0,initial-importer,2007-07-18 08:15:32,0,1,0,0,0


In [28]:
pre_series_a = funding_rounds[funding_rounds.funding_round_code != 'a']
pre_series_a.head(50)


Unnamed: 0,id,funding_round_id,object_id,funded_at,funding_round_type,funding_round_code,raised_amount_usd,pre_money_valuation_usd,post_money_valuation_usd,participants,is_first_round,is_last_round,created_by,created_at,raised_before_a,raised_series_a,angel,seed,convertible
1,2,2,c:5,2004-09-01,angel,angel,500000,N,N,2,0,1,initial-importer,2007-05-27 06:08:18,1,0,1,0,0
7,8,8,c:13,2005-12-01,series-a,seed,1500000,N,N,4,0,0,initial-importer,2007-06-01 19:14:34,0,1,0,1,0
9,10,10,c:15,2006-06-01,angel,seed,12000,N,N,1,0,1,initial-importer,2007-06-02 07:36:21,1,0,0,1,0
10,11,11,c:15,2007-01-01,angel,angel,40000,N,N,1,0,0,initial-importer,2007-06-02 07:38:17,1,0,1,0,0
31,35,35,c:43,2007-06-01,angel,seed,1000000,N,N,2,1,1,initial-importer,2007-06-16 05:19:47,1,0,0,1,0
39,43,43,c:54,2007-02-01,angel,seed,700000,N,N,1,0,1,initial-importer,2007-06-20 11:04:09,1,0,0,1,0
45,49,49,c:65,2007-01-01,series-a,seed,3000000,N,N,1,1,1,initial-importer,2007-06-23 07:10:02,0,1,0,1,0
48,53,53,c:75,2005-04-01,angel,seed,N,N,N,4,1,1,initial-importer,2007-08-15 09:31:41,1,0,0,1,0
62,68,68,c:87,2007-01-01,angel,angel,100000,N,N,0,1,1,initial-importer,2007-06-28 08:54:12,1,0,1,0,0
94,101,101,c:122,2006-11-01,angel,angel,287033,0,0,0,1,1,initial-importer,2008-02-05 22:44:24,1,0,1,0,0


In [31]:
data_clean = pre_series_a[['object_id', 'raised_amount_usd', 'angel', 'seed', 'convertible', 'participants']]
data_clean

Unnamed: 0,object_id,raised_amount_usd,angel,seed,convertible,participants
1,c:5,500000,1,0,0,2
7,c:13,1500000,0,1,0,4
9,c:15,12000,0,1,0,1
10,c:15,40000,1,0,0,1
31,c:43,1000000,0,1,0,2
...,...,...,...,...,...,...
52857,c:285244,N,1,0,0,3
52866,c:161321,50000,0,1,0,0
52877,c:286151,1969827,0,1,0,0
52879,c:266189,2014005,0,0,0,0


In [32]:
data_clean.query('raised_amount_usd=="N"')

Unnamed: 0,object_id,raised_amount_usd,angel,seed,convertible,participants
48,c:75,N,0,1,0,4
97,c:127,N,0,1,0,0
221,c:265,N,0,1,0,0
236,c:295,N,1,0,0,1
265,c:329,N,0,1,0,1
...,...,...,...,...,...,...
52677,c:285707,N,0,1,0,1
52718,c:25287,N,1,0,0,0
52750,c:161247,N,1,0,0,1
52840,c:283015,N,0,1,0,1


In [33]:
data_clean = data_clean[data_clean.raised_amount_usd != 'N'] #removing N values
data_clean

Unnamed: 0,object_id,raised_amount_usd,angel,seed,convertible,participants
1,c:5,500000,1,0,0,2
7,c:13,1500000,0,1,0,4
9,c:15,12000,0,1,0,1
10,c:15,40000,1,0,0,1
31,c:43,1000000,0,1,0,2
...,...,...,...,...,...,...
52825,c:286052,1000000,0,1,0,0
52866,c:161321,50000,0,1,0,0
52877,c:286151,1969827,0,1,0,0
52879,c:266189,2014005,0,0,0,0


In [35]:
#transform raised_amount_usd into float and then into int
data_clean['raised_amount_usd'] = (data_clean['raised_amount_usd'].astype(float))
data_clean['raised_amount_usd'] = (data_clean['raised_amount_usd'].astype(int))


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until


In [36]:
data_clean.dtypes

object_id            object
raised_amount_usd     int64
angel                 int64
seed                  int64
convertible           int64
participants          int64
dtype: object

### In the cell below I am trying to sum up the same object_ids that show up more than once and simplify it into one row. I have tried countless options and due to the constraint on time I have chosen to leave this for now.

In [52]:
#data_clean.groupby(['object_id']).sum()
data_clean.groupby(data_clean['object_id']).sum()
#data_clean.groupby(['object_id']).transform('sum')
data_clean.head(50)

Unnamed: 0,object_id,raised_amount_usd,angel,seed,convertible,participants
1,c:5,500000,1,0,0,2
7,c:13,1500000,0,1,0,4
9,c:15,12000,0,1,0,1
10,c:15,40000,1,0,0,1
31,c:43,1000000,0,1,0,2
39,c:54,700000,0,1,0,1
45,c:65,3000000,0,1,0,1
62,c:87,100000,1,0,0,0
94,c:122,287033,1,0,0,0
99,c:125,50000,1,0,0,1


## I will check the other columns that I find to be important when it comes to raising a round. Pre and Post money valuations. 

In [53]:
premoney_n = funding_rounds.query('pre_money_valuation_usd == "N"').count()
total_entries = 52928
n_val_premoney = premoney_n/total_entries * 100
print(n_val_premoney, "%")

id                          43.317337
funding_round_id            43.317337
object_id                   43.317337
funded_at                   43.317337
funding_round_type          43.317337
funding_round_code          43.317337
raised_amount_usd           43.317337
pre_money_valuation_usd     43.317337
post_money_valuation_usd    43.317337
participants                43.317337
is_first_round              43.317337
is_last_round               43.317337
created_by                  43.317337
created_at                  43.317337
raised_before_a             43.317337
raised_series_a             43.317337
angel                       43.317337
seed                        43.317337
convertible                 43.317337
dtype: float64 %


In [54]:
postmoney_n = funding_rounds.query('post_money_valuation_usd == "N"').count()
total_entries = 52928
n_val_postmoney = postmoney_n/total_entries * 100
print(n_val_postmoney, "%")

id                          41.48088
funding_round_id            41.48088
object_id                   41.48088
funded_at                   41.48088
funding_round_type          41.48088
funding_round_code          41.48088
raised_amount_usd           41.48088
pre_money_valuation_usd     41.48088
post_money_valuation_usd    41.48088
participants                41.48088
is_first_round              41.48088
is_last_round               41.48088
created_by                  41.48088
created_at                  41.48088
raised_before_a             41.48088
raised_series_a             41.48088
angel                       41.48088
seed                        41.48088
convertible                 41.48088
dtype: float64 %
