




![Kickstarter_logo](Kickstarter_Logo.png)

Launched in 2009, **Kickstarter** is one of the world's leading crowdfunding platforms. As of December 2019, Kickstarter has received more than $4.6 billion in pledges from 17.2 million backers to fund 445,000 projects, such as films, music, stage shows, comics, journalism, video games, technology, publishing, and food-related projects. It's mission is to bring "*creative projects to life*"

Kickstarter, founded in 2009, is one particularly well-known and popular crowdfunding platform. It has an all-or-nothing funding model, whereby a project is only funded if it meets its goal amount; otherwise no money is given by backers to a project.
A huge variety of factors contribute to the success or failure of a project — in general, and also on Kickstarter. Some of these are able to be quantified or categorized, which allows for the construction of a model to attempt to predict whether a project will succeed or not.

## Objective of the Analysis 

1. Primary **business objective** is to provide creators with a recommendation on how to launch a successful Kickstarter campaign! 

2. Determine which factors decide whether or not a project will achieved is funding goal. 

### How do we define a "successful" Project? 

Success in the context of the Kickstarter Dataset is defined as achieving the funding goal.

## Project Requirements: 

1. Try different (at least 3) machine learning algorithms to check which performs best on the problem at hand
2. What would be the right performance metric- precision, recall, accuracy, F1 score, or something else? (Check TPR?)

**Hint**: Check for Data imbalance


## Description of the Dataset 

**'backers_count':** Number of folks who pledge money to join creators in bringing projects to life

**'blurb':** Description of the project / company

**'category':** Describes the topic of the project (e.g. music, fashion)

**'converted_pledged_amount':** Amount of money pledged, converted to the currency on the `current_currency`column 

**'country':** Country where the project creators originates from 

**'created_at':** Date and time when the project was initially created on Kickstarter

**'creator':** The person or team behind the project idea, working to bring it to life

**'currency':** Name of original currency 

**'currency_symbol':** corresponding currency symbol

**'currency_trailing_code':** ?

**'current_currency':** Currency after the conversion has taken place

**'deadline':** Final crowdfunding date

**'disable_communication':** whether or not a project creator is able to communicate with they backers

**'friends':** unclear, null or empty

**'fx_rate':** Foreign exchange rate between the original currency and the current currency

**'goal':** The amount of money that a creator needs to complete their project. Minimum requirement for the project to be financed

**'id':** Project ID

**'is_backing':** 

**'is_starrable':** provides the option to leave a star review

**'is_starred':** has received a star review

**'launched_at':** state and time when the project was launched for funding at Kickstarter

**'location':** Contains the town or city of the project creator

**'name':** Name of the campaign

**'permissions':** unclear; is either NA or empty in the dataset

**'photo':** contains a link and information to the projects photos

**'pledged':** Amount pledged by the contributors in the original currency 

**'profile':** Details about the projects profile including ID number and various visual settings

**'slug':** Name of the project with hyphens and lowercase letters instead of spaces and uppercase letters

**'source_url':** link to the project category on the Kickstarter website

**'spotlight':** Option to put the campaign in a spotlight via a landing page on Kickstarter after it has been successfully financed

**'staff_pick':** Whether a project was handpicked and highlighted by the Kickstarter team. These projects are displayed favorably on the Kickstarter page.

**'state':** Status of the campaign that can be classified into one of the following categories:
   * *'successful'*: project has achieved the funding goal and is neither canceled or suspended. (only classified as successful after the deadline has passed?)
   * *'failed'*: project has failed to achieve the funding goal within the deadline
   * *'live'*: Campaign are classified as live when they are still ongoing regardless of whether they have already achieved the funding goal or not
   * *'suspended'*: A project may be suspended if the Trust & Safety team uncovers evidence that it is in violation of Kickstarter's rules
   * *'canceled'*: A project may be canceled if the creator wants to make any major changes to the project, such as the funding goal or campaign duration, or likes to rework the idea and start again
   
**'state_changed_at':** Date and time when a project status was changed (e.g. from live to successful / failed)

**'static_usd_rate':** Conversion rate between the original currency and USD

**'urls':** link to the creator's campaign on Kickstarter

**'usd_pledged':** Pledged amount converted to USD done by Kickstarter

**'usd_type':** unclear, classifies either as domestic or international 

##  Import Libaries

In [63]:
import pandas as pd
import numpy as np
import glob
import matplotlib.pyplot as plt
import seaborn as sns
import time
import calendar

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score, plot_confusion_matrix
from sklearn.metrics import roc_curve, accuracy_score, precision_recall_curve, f1_score, precision_score, recall_score
from sklearn.preprocessing import StandardScaler


%matplotlib inline
sns.set_theme(palette="light:#5A9")
sns.set_context("paper", rc={"font.size":8,"axes.titlesize":8,"axes.labelsize":5})

##  Import the Datasets

We have a total of 55 csv documents which combined make up our dataset. In a first step, we will load and concatenate all of the csv files into one dataframe. 

In [56]:
import glob

In [58]:
df = pd.concat([pd.read_csv(i) for i in glob.glob("data/Kickstarter*.csv")], ignore_index=True)

In [1]:
import pandas as pd

In [59]:
df.shape

(209222, 37)

**Remark:**
Our combined Kickstarter dataset has a total of 209222 observations and 37 columns.  

## Data Cleaning

### Understanding the dataset

In [72]:
df.head(2).T

Unnamed: 0,0,1
backers_count,315,47
blurb,Babalus Shoes,A colorful Dia de los Muertos themed oracle de...
category,"{""id"":266,""name"":""Footwear"",""slug"":""fashion/fo...","{""id"":273,""name"":""Playing Cards"",""slug"":""games..."
converted_pledged_amount,28645,1950
country,US,US
created_at,1541459205,1501684093
creator,"{""id"":2094277840,""name"":""Lucy Conroy"",""slug"":""...","{""id"":723886115,""name"":""Lisa Vollrath"",""slug"":..."
currency,USD,USD
currency_symbol,$,$
currency_trailing_code,True,True


In [73]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 209222 entries, 0 to 209221
Data columns (total 37 columns):
backers_count               209222 non-null int64
blurb                       209214 non-null object
category                    209222 non-null object
converted_pledged_amount    209222 non-null int64
country                     209222 non-null object
created_at                  209222 non-null int64
creator                     209222 non-null object
currency                    209222 non-null object
currency_symbol             209222 non-null object
currency_trailing_code      209222 non-null bool
current_currency            209222 non-null object
deadline                    209222 non-null int64
disable_communication       209222 non-null bool
friends                     300 non-null object
fx_rate                     209222 non-null float64
goal                        209222 non-null float64
id                          209222 non-null int64
is_backing                  300 

**Commentary**:

Looking at the output from the commands `head()` and `info()`, there seem to a number of columns which contain information that are not usable in their current format.

In addition, `permissions`, `is_backing`, `friends` and `is_starred` only have 300 observations each. Hence, we should consider removing them. In contrast, the remaining columns are almost complete. 

Furthermore, the time related columns are not in a date format yet and therefore need to be converted. 
Finally, the columns `name`and `slug`contain the same information just in a slightly different form. We can therefore consider removing one of the columns. 

In [71]:
df.duplicated(subset='id', keep='first').sum()

26958

In [60]:
df.head(1).T

Unnamed: 0,0
backers_count,315
blurb,Babalus Shoes
category,"{""id"":266,""name"":""Footwear"",""slug"":""fashion/fo..."
converted_pledged_amount,28645
country,US
created_at,1541459205
creator,"{""id"":2094277840,""name"":""Lucy Conroy"",""slug"":""..."
currency,USD
currency_symbol,$
currency_trailing_code,True


In [45]:
df['is_starred'].unique()

array([nan, False], dtype=object)

In [41]:
df['is_backing'].unique()

array([nan, False], dtype=object)

In [42]:
df['is_starrable'].unique()

array([False,  True])

In [36]:
df["category"].unique()

array(['{"id":43,"name":"Rock","slug":"music/rock","position":17,"parent_id":14,"color":10878931,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/music/rock"}}}',
       '{"id":54,"name":"Mixed Media","slug":"art/mixed media","position":6,"parent_id":1,"color":16760235,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/art/mixed%20media"}}}',
       '{"id":280,"name":"Photobooks","slug":"photography/photobooks","position":5,"parent_id":15,"color":58341,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/photography/photobooks"}}}',
       '{"id":266,"name":"Footwear","slug":"fashion/footwear","position":5,"parent_id":9,"color":16752598,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/fashion/footwear"}}}',
       '{"id":51,"name":"Software","slug":"technology/software","position":11,"parent_id":16,"color":6526716,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/techno

In [33]:
list(df.columns)

['backers_count',
 'blurb',
 'category',
 'converted_pledged_amount',
 'country',
 'created_at',
 'creator',
 'currency',
 'currency_symbol',
 'currency_trailing_code',
 'current_currency',
 'deadline',
 'disable_communication',
 'friends',
 'fx_rate',
 'goal',
 'id',
 'is_backing',
 'is_starrable',
 'is_starred',
 'launched_at',
 'location',
 'name',
 'permissions',
 'photo',
 'pledged',
 'profile',
 'slug',
 'source_url',
 'spotlight',
 'staff_pick',
 'state',
 'state_changed_at',
 'static_usd_rate',
 'urls',
 'usd_pledged',
 'usd_type']

In [46]:
list(df['state'].unique())

['successful', 'failed', 'live', 'canceled', 'suspended']

In [20]:
print(len(df[df['state']=="successful"]))
print(len(df[df['state']=="failed"]))
print(len(df[df['state']=="live"]))
print(len(df[df['state']=="canceled"]))
print(len(df[df['state']=="suspended"]))

2224
1276
120
149
10


In [53]:
df['usd_type'].unique()

array(['international', 'domestic', nan], dtype=object)

In [21]:
df_successful = df[df['state']=='successful']

In [39]:
df["current_currency"].unique()

array(['USD', 'CAD', 'GBP', 'AUD'], dtype=object)

In [51]:
df['permissions'].unique()

array([nan, '[]'], dtype=object)

In [25]:
df_successful.head(2).T

Unnamed: 0,0,1
backers_count,21,97
blurb,2006 was almost 7 years ago.... Can you believ...,An adorable fantasy enamel pin series of princ...
category,"{""id"":43,""name"":""Rock"",""slug"":""music/rock"",""po...","{""id"":54,""name"":""Mixed Media"",""slug"":""art/mixe..."
converted_pledged_amount,802,2259
country,US,US
created_at,1387659690,1549659768
creator,"{""id"":1495925645,""name"":""Daniel"",""is_registere...","{""id"":1175589980,""name"":""Katherine"",""slug"":""fr..."
currency,USD,USD
currency_symbol,$,$
currency_trailing_code,True,True


In [27]:
df_failed = df[df['state']=='failed']

In [30]:
df_failed.sample(2).T

Unnamed: 0,1197,1036
backers_count,4,5
blurb,"Wir wollen ein Event veranstalten, bei dem all...",This is the start of the commercial space race...
category,"{""id"":259,""name"":""Civic Design"",""slug"":""design...","{""id"":340,""name"":""Space Exploration"",""slug"":""t..."
converted_pledged_amount,16,50
country,DE,US
created_at,1513432517,1498228691
creator,"{""id"":210132004,""name"":""Robin Gassmann"",""is_re...","{""id"":1640189118,""name"":""Jeneen Price-Mills"",""..."
currency,EUR,USD
currency_symbol,€,$
currency_trailing_code,False,True


In [31]:
df_canceled = df[df['state']=='canceled']

In [32]:
df_canceled.sample(2).T

Unnamed: 0,449,1284
backers_count,1,10
blurb,"Education overcomes oppression in Treemonisha,...",Let our app - The Butler - host a party at you...
category,"{""id"":24,""name"":""Performance Art"",""slug"":""art/...","{""id"":271,""name"":""Live Games"",""slug"":""games/li..."
converted_pledged_amount,10,502
country,US,US
created_at,1487210294,1410621005
creator,"{""id"":390369028,""name"":""Miriam Miller"",""is_reg...","{""id"":788704897,""name"":""Moonlark Mysteries"",""i..."
currency,USD,USD
currency_symbol,$,$
currency_trailing_code,True,True


In [47]:
df_live = df[df['state']=='live']

In [48]:
df_live.sample(3).T

Unnamed: 0,1839,2469,1445
backers_count,8,4,107
blurb,Help a small business grow and expand to reach...,"A Trading Card Game inspired by the Silent, Go...",A RPG zine focusing on epic/immortal games for...
category,"{""id"":10,""name"":""Food"",""slug"":""food"",""position...","{""id"":34,""name"":""Tabletop Games"",""slug"":""games...","{""id"":34,""name"":""Tabletop Games"",""slug"":""games..."
converted_pledged_amount,166,42,2963
country,AU,US,US
created_at,1548495035,1548834235,1548758900
creator,"{""id"":637032816,""name"":""Teegan K"",""is_register...","{""id"":2130740106,""name"":""HappyPlumz"",""slug"":""h...","{""id"":1443596074,""name"":""dplunkett.hcgaming@gm..."
currency,AUD,USD,USD
currency_symbol,$,$,$
currency_trailing_code,True,True,True


In [50]:
df[df['currency']=='JPY'].sample(2).T

Unnamed: 0,994,3007
backers_count,1705,4
blurb,"We wish to bring our SRPG Visual Novel ""VenusB...","Using a new artificial opal ""Hybrid Opal"" proc..."
category,"{""id"":35,""name"":""Video Games"",""slug"":""games/vi...","{""id"":267,""name"":""Jewelry"",""slug"":""fashion/jew..."
converted_pledged_amount,210046,737
country,JP,JP
created_at,1530788869,1508401681
creator,"{""id"":1396516721,""name"":""Ninetail"",""is_registe...","{""id"":1438345805,""name"":""Takeshi Hashimoto"",""i..."
currency,JPY,JPY
currency_symbol,¥,¥
currency_trailing_code,False,False


In [29]:
df["spotlight"].unique()

array([ True, False])

In [19]:
import numpy as np

In [None]:
df["state"].nu

In [6]:
df.columns

Index(['backers_count', 'blurb', 'category', 'converted_pledged_amount',
       'country', 'created_at', 'creator', 'currency', 'currency_symbol',
       'currency_trailing_code', 'current_currency', 'deadline',
       'disable_communication', 'friends', 'fx_rate', 'goal', 'id',
       'is_backing', 'is_starrable', 'is_starred', 'launched_at', 'location',
       'name', 'permissions', 'photo', 'pledged', 'profile', 'slug',
       'source_url', 'spotlight', 'staff_pick', 'state', 'state_changed_at',
       'static_usd_rate', 'urls', 'usd_pledged', 'usd_type'],
      dtype='object')

In [11]:
#df.info()

In [8]:
df1 = pd.read_csv("data/Kickstarter001.csv")

In [37]:
df1.head(2).T

Unnamed: 0,0,1
backers_count,30,0
blurb,Experience tea and coffee as it should be in o...,Playing Roles Outside of Basic Education (P.R....
category,"{""id"":350,""name"":""Pottery"",""slug"":""crafts/pott...","{""id"":49,""name"":""Periodicals"",""slug"":""publishi..."
converted_pledged_amount,1547,0
country,GB,US
created_at,1515610761,1426362805
creator,"{""id"":1322978446,""name"":""Harley Boden"",""is_reg...","{""id"":1083773055,""name"":""Bobby Walker"",""is_reg..."
currency,GBP,USD
currency_symbol,£,$
currency_trailing_code,False,True


In [10]:
df1.shape

(3784, 37)

In [13]:
df2=pd.read_csv("data/Kickstarter055.csv")

In [38]:
df2.head(2).T

Unnamed: 0,0,1
backers_count,36,15
blurb,"I love drawing, and I want to share this love!...","An historic event, romantic getaway, favorite ..."
category,"{""id"":22,""name"":""Illustration"",""slug"":""art/ill...","{""id"":20,""name"":""Conceptual Art"",""slug"":""art/c..."
converted_pledged_amount,465,987
country,NL,US
created_at,1429872801,1353894822
creator,"{""id"":1855992790,""name"":""Nouchka"",""is_register...","{""id"":2007614172,""name"":""Eric Hine"",""is_regist..."
currency,EUR,USD
currency_symbol,€,$
currency_trailing_code,False,True


In [15]:
df2.shape

(965, 37)