# Capstone Project

## Goal

This project uses predictive analytics on historical Kickstarter data to identify what makes a campaign more likely to succeed. The historical data records which projects were successful and which were not.

https://www.kickstarter.com/help/handbook/funding

Kickstarter provides a creator's handbook on funding (linked above). The original objective of this analysis was to determine what leads to a successful board game, and then to design a board game based on those findings. An important first phase, however, was to see whether I could predict if a project would be successful, so that is the focus here.

## Import Libraries

In [1]:
import os
import glob
import pandas as pd
# os.chdir("./datasets/kickstarter_data/") # uncomment to run initially

import re
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, LassoCV, RidgeCV
from sklearn.preprocessing import PolynomialFeatures, PowerTransformer
from sklearn.model_selection import train_test_split, cross_val_score, cross_val_predict
from sklearn.metrics import r2_score

%matplotlib inline

## Gather Data

Data came from:
https://webrobots.io/kickstarter-datasets/

## Combine Data

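The cell below should only be run once. The code combines every CSV file in the data directory into a single `combined.csv`, so running it again after that file exists would fold the combined file back into itself.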

In [2]:
## uncomment to run initially
## credit: https://www.freecodecamp.org/news/how-to-combine-multiple-csv-files-with-8-lines-of-code-265183e0854/
# extension = 'csv'
# all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

# #combine all files in the list
# combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
# #export to csv
# combined_csv.to_csv( "combined.csv", index=False, encoding='utf-8-sig')
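Re-running the combine step from the same directory would pick up `combined.csv` itself, because the output is written next to the inputs and matches `*.csv`. The following is a minimal sketch of one way to guard against that. `combine_csvs` and its parameters are hypothetical names that are not part of the original notebook, the demo files contain made-up data, and the sketch assumes every other CSV in the folder belongs to the dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_csvs(src_dir, out_name="combined.csv"):
    """Concatenate every CSV in src_dir except the output file itself."""
    src = Path(src_dir)
    # Skip the output file so a second run does not re-read earlier results.
    inputs = sorted(p for p in src.glob("*.csv") if p.name != out_name)
    if not inputs:
        raise FileNotFoundError(f"no CSV files found in {src}")
    combined = pd.concat([pd.read_csv(p) for p in inputs], ignore_index=True)
    combined.to_csv(src / out_name, index=False, encoding="utf-8-sig")
    return combined


# Demo on two tiny throwaway files (made-up data), run twice on purpose.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "part_a.csv").write_text("id,goal\n1,100\n2,200\n")
    Path(tmp, "part_b.csv").write_text("id,goal\n3,300\n")
    first = combine_csvs(tmp)
    second = combine_csvs(tmp)
```

With this guard the second call returns the same three rows as the first, so the cell could be left uncommented without duplicating data.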

## Read in Data

In [3]:
df = pd.read_csv('./datasets/kickstarter_data/combined.csv')

## Exploratory Data Analysis (EDA)

In [43]:
pd.set_option('display.max_rows', 9999)
pd.set_option('display.max_columns', 9999)
pd.set_option('display.width', 9999)

In [41]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 217006 entries, 0 to 217432
Data columns (total 34 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   backers_count             217006 non-null  int64  
 1   blurb                     217006 non-null  object 
 2   category                  217006 non-null  object 
 3   converted_pledged_amount  217006 non-null  int64  
 4   country                   217006 non-null  object 
 5   country_displayable_name  217006 non-null  object 
 6   created_at                217006 non-null  int64  
 7   creator                   217006 non-null  object 
 8   currency                  217006 non-null  object 
 9   currency_symbol           217006 non-null  object 
 10  currency_trailing_code    217006 non-null  bool   
 11  current_currency          217006 non-null  object 
 12  deadline                  217006 non-null  int64  
 13  disable_communication     217006 non-null  b

In [5]:
df.describe()

Unnamed: 0,backers_count,converted_pledged_amount,created_at,deadline,fx_rate,goal,id,launched_at,pledged,state_changed_at,static_usd_rate,usd_pledged
count,217433.0,217433.0,217433.0,217433.0,217433.0,217433.0,217433.0,217433.0,217433.0,217433.0,217433.0,217433.0
mean,153.312377,13914.86,1475045000.0,1482085000.0,0.972468,50864.0,1073505000.0,1479240000.0,25285.57,1481932000.0,1.001734,13919.28
std,955.46558,111587.3,73251890.0,72977420.0,0.224465,1225217.0,619408500.0,72985110.0,914915.4,72873330.0,0.239715,111583.7
min,0.0,0.0,1240366000.0,1242468000.0,0.009327,0.01,18520.0,1240674000.0,0.0,1242468000.0,0.008771,0.0
25%,4.0,125.0,1422421000.0,1428688000.0,1.0,1500.0,536953800.0,1425783000.0,130.0,1428555000.0,1.0,125.0
50%,29.0,1632.0,1476545000.0,1483462000.0,1.0,5000.0,1073543000.0,1480562000.0,1677.0,1483387000.0,1.0,1633.22
75%,93.0,6820.0,1540860000.0,1549209000.0,1.0,15000.0,1610309000.0,1546381000.0,7340.0,1549132000.0,1.0,6833.0
max,105857.0,12969610.0,1589423000.0,1594600000.0,9.464383,100000000.0,2147476000.0,1589431000.0,235320500.0,1589432000.0,1.716408,12969610.0


### Missing Data

In [7]:
# count missing values per column, sorted from most to least
missing_values = df.isnull().sum()
missing_values.sort_values(ascending=False)

is_backing                  217361
permissions                 217361
friends                     217361
is_starred                  217361
location                       215
usd_type                       204
blurb                            8
staff_pick                       0
spotlight                        0
category                         0
converted_pledged_amount         0
country                          0
country_displayable_name         0
created_at                       0
creator                          0
currency                         0
currency_symbol                  0
currency_trailing_code           0
current_currency                 0
deadline                         0
disable_communication            0
urls                             0
fx_rate                          0
goal                             0
id                               0
usd_pledged                      0
is_starrable                     0
static_usd_rate                  0
launched_at         
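The counts above are easier to judge as a share of all rows. With 217,433 rows in the frame at this point, the 217,361 missing entries in each of the four sparsest columns amount to roughly 99.97% of the data. Here is a small sketch that reports counts and fractions side by side. `missing_summary` is a hypothetical helper and the toy frame is made-up data, not the Kickstarter dataset.

```python
import pandas as pd


def missing_summary(frame):
    """Return missing-value counts and fractions per column, largest first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "fraction": counts / len(frame),
    })
    return summary.sort_values("missing", ascending=False)


# Toy frame standing in for df (made-up values).
toy = pd.DataFrame({
    "friends": [None, None, None, None],
    "location": ["Austin", None, "Toronto", "Sydney"],
    "goal": [1500.0, 5000.0, 15000.0, 100.0],
})
summary = missing_summary(toy)
```

On the real data, `missing_summary(df)` would be the equivalent call, assuming `df` is the frame loaded earlier.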

### Resolve Missing Values

In [8]:
# drop these features due to having a significant number of missing values
df.drop([
    'friends',
    'is_backing',
    'is_starred',
    'permissions'
], axis=1, inplace=True)

In [9]:
# eliminate remaining missing values
df.dropna(inplace=True)
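It is worth checking how much data `dropna` removes. The two `describe()` outputs in this notebook report 217,433 rows before and 217,006 rows after, a loss of 427 rows. That equals 215 + 204 + 8, the missing counts for `location`, `usd_type`, and `blurb`, which suggests that no row was missing more than one of those fields. The same check can be sketched on a toy frame (made-up values, hypothetical variable names):

```python
import pandas as pd

# Toy frame: one row lacks a blurb, a different row lacks a location.
toy = pd.DataFrame({
    "blurb": ["a", None, "c", "d"],
    "location": ["x", "y", None, "z"],
    "goal": [1.0, 2.0, 3.0, 4.0],
})

rows_before = len(toy)
cleaned = toy.dropna()  # drop any row with at least one missing value
rows_removed = rows_before - len(cleaned)

# If the rows removed equal the total missing cells, no row had two gaps.
missing_cells = int(toy.isnull().sum().sum())
no_overlap = rows_removed == missing_cells
```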

In [11]:
# verify missing values were resolved
# recount missing values per column; every count should now be 0
missing_values = df.isnull().sum()
missing_values.sort_values(ascending=False)

usd_type                    0
currency                    0
fx_rate                     0
disable_communication       0
deadline                    0
current_currency            0
currency_trailing_code      0
currency_symbol             0
creator                     0
usd_pledged                 0
created_at                  0
country_displayable_name    0
country                     0
converted_pledged_amount    0
category                    0
blurb                       0
goal                        0
id                          0
is_starrable                0
launched_at                 0
location                    0
name                        0
photo                       0
pledged                     0
profile                     0
slug                        0
source_url                  0
spotlight                   0
staff_pick                  0
state                       0
state_changed_at            0
static_usd_rate             0
urls                        0
backers_co

### Re-Explore Data

In [12]:
df.head()

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,country_displayable_name,created_at,creator,currency,currency_symbol,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
0,1,we are going Production herbal teabag of plan...,"{""id"":313,""name"":""Small Batch"",""slug"":""food/sm...",19,AU,Australia,1441269202,"{""id"":1555219532,""name"":""ehsan"",""is_registered...",AUD,$,...,production-herbal-teabag-of-plants-native-to-iran,https://www.kickstarter.com/discover/categorie...,False,False,failed,1444141184,0.691164,"{""web"":{""project"":""https://www.kickstarter.com...",18.66144,domestic
1,637,Two agents battle each other in another dimens...,"{""id"":34,""name"":""Tabletop Games"",""slug"":""games...",16233,US,the United States,1576048498,"{""id"":99575233,""name"":""David Gerrard"",""is_regi...",USD,$,...,slip-strike-0,https://www.kickstarter.com/discover/categorie...,True,False,successful,1583987400,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",16233.0,domestic
2,50,A collection of Hard Enamel pins inspired by T...,"{""id"":262,""name"":""Accessories"",""slug"":""fashion...",983,CA,Canada,1560821709,"{""id"":1855173855,""name"":""Caitlin Peters"",""slug...",CAD,$,...,tattoo-shop-flash,https://www.kickstarter.com/discover/categorie...,True,False,successful,1564165825,0.7629,"{""web"":{""project"":""https://www.kickstarter.com...",987.4137,domestic
3,8,"Low carb, no sugar sauces and marinades using ...","{""id"":313,""name"":""Small Batch"",""slug"":""food/sm...",361,US,the United States,1563139848,"{""id"":1148188586,""name"":""Ian"",""slug"":""penningt...",USD,$,...,penningtons-keto-sauces-and-marinades,https://www.kickstarter.com/discover/categorie...,False,False,failed,1569530544,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",361.0,domestic
4,6452,The everyday bag fused with Parisian chic and ...,"{""id"":28,""name"":""Product Design"",""slug"":""desig...",1385803,US,the United States,1561364892,"{""id"":1085606247,""name"":""Laflore"",""slug"":""bobo...",USD,$,...,bobobark-designed-for-women-made-for-life,https://www.kickstarter.com/discover/categorie...,True,False,successful,1568408340,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",1385803.0,domestic


In [13]:
df.columns

Index(['backers_count', 'blurb', 'category', 'converted_pledged_amount',
       'country', 'country_displayable_name', 'created_at', 'creator',
       'currency', 'currency_symbol', 'currency_trailing_code',
       'current_currency', 'deadline', 'disable_communication', 'fx_rate',
       'goal', 'id', 'is_starrable', 'launched_at', 'location', 'name',
       'photo', 'pledged', 'profile', 'slug', 'source_url', 'spotlight',
       'staff_pick', 'state', 'state_changed_at', 'static_usd_rate', 'urls',
       'usd_pledged', 'usd_type'],
      dtype='object')

In [18]:
df.describe()

Unnamed: 0,backers_count,converted_pledged_amount,created_at,deadline,fx_rate,goal,id,launched_at,pledged,state_changed_at,static_usd_rate,usd_pledged
count,217006.0,217006.0,217006.0,217006.0,217006.0,217006.0,217006.0,217006.0,217006.0,217006.0,217006.0,217006.0
mean,153.397791,13918.43,1475155000.0,1482196000.0,0.971724,50935.42,1073520000.0,1479353000.0,25307.48,1482044000.0,1.001748,13929.65
std,956.261442,111670.7,72962000.0,72681080.0,0.213597,1226416.0,619465700.0,72684190.0,915798.9,72577960.0,0.239873,111681.6
min,0.0,0.0,1240366000.0,1242468000.0,0.009327,0.01,18520.0,1240920000.0,0.0,1242468000.0,0.008771,0.0
25%,4.0,125.0,1422486000.0,1428764000.0,1.0,1500.0,536864300.0,1425915000.0,130.0,1428638000.0,1.0,125.0
50%,29.0,1630.0,1476549000.0,1483467000.0,1.0,5000.0,1073560000.0,1480564000.0,1675.0,1483394000.0,1.0,1631.0
75%,93.0,6818.0,1540804000.0,1549072000.0,1.0,15000.0,1610402000.0,1546214000.0,7341.0,1549039000.0,1.0,6831.308
max,105857.0,12969610.0,1589423000.0,1594600000.0,1.226759,100000000.0,2147476000.0,1589431000.0,235320500.0,1589432000.0,1.716408,12969610.0
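The values in `created_at`, `deadline`, `launched_at`, and `state_changed_at` fall between roughly 1.24e9 and 1.59e9, which is consistent with Unix timestamps in seconds. That reading is an assumption, since no data dictionary is available. Under it, a timestamp can be made readable and a campaign length derived as sketched below. `to_utc` and `campaign_days` are hypothetical helper names, and the launch and deadline values in the demo are made up.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def to_utc(epoch_seconds):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def campaign_days(launched_at, deadline):
    """Length of a campaign in days, given two Unix timestamps in seconds."""
    return (deadline - launched_at) / SECONDS_PER_DAY


# created_at value taken from the second row of df.head()
created = to_utc(1576048498)

# Made-up launch and deadline exactly 30 days apart
duration = campaign_days(1576100000, 1576100000 + 30 * SECONDS_PER_DAY)
```

A duration feature of this kind may be useful later when modelling success, although whether it helps is untested here.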


### Observations

At the time of writing, I could not find a data dictionary, so I have made some assumptions about what these features mean based on my own research into the terms. Features that I cannot explain will likely be removed unless they prove substantially meaningful.

After all of the missing values were removed, 34 columns remained:

||Feature|Data Type|Description|
|--------|--------|--------|-------|
|1|Backers count|integer|number of backers supporting the project|
|2|Blurb| text|text that describes the project|
|3|Category|object|a JSON string that includes the category ID, the category 'name', a 'slug' that combines the broader category with the name, position number, parent id, parent name (the broader category), color number, and the url|
|4|Converted pledged amount|integer| -------------- |
|5|Country|nominal| --------------|
|6|Country displayable name|nominal|----------------|
|7|Created at|timestamp| ----------------|
|8|Creator|object|a JSON string that includes the creator's ID, 'name', 'slug', registration and superbacker flags, chosen currency, avatar image URLs, and profile URLs|
|9|Currency|nominal| ----------------|
|10|Currency symbol|nominal| the symbol for the type of currency|
|11|Currency trailing code|boolean| ----------|
|12|Current currency|object|------------|
|13|Deadline|integer|--------|
|14|Disable communication|boolean|----------------------|
|15|FX_rate|float|-----------|
|16|Goal|float|--------------|
|17|ID|integer|--------------|
|18|Is starrable|boolean|--------------|
|19|Launched at|integer|appears to be a Unix timestamp|
|20|Location|object|--------------|
|21|Name|text|--------------|
|22|Photo|object|--------------|
|23|Pledged|float|--------------|
|24|Profile|object|--------------|
|25|Slug|text|URL-style version of the project name (for example, `slip-strike-0`)|
|26|Source url|text|Kickstarter category discovery URL|
|27|Spotlight|boolean|--------------|
|28|Staff pick|boolean|--------------|
|29|State|nominal|outcome of the project (for example, `failed` or `successful`)|
|30|State changed at|integer|appears to be a Unix timestamp|
|31|Static usd rate|float|--------------|
|32|Urls|object|a JSON string of URLs for the project, including its web page|
|33|USD pledged|float|--------------|
|34|USD type|nominal|--------------|
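Because `category` is stored as a JSON string, it has to be parsed before projects can be filtered by category, for example to isolate Tabletop Games for the board game question. The sketch below pulls out the category name and the broader category. `parse_category` is a hypothetical helper, and the sample string is a shortened stand-in modelled on the truncated values shown in `df.head()`, so the exact slug format is an assumption.

```python
import json


def parse_category(raw):
    """Extract id, name, and broader (parent) category from a category JSON string."""
    data = json.loads(raw)
    slug = data.get("slug", "")
    # Assumption: the slug looks like "<parent>/<name>"; no slash means no parent.
    parent = slug.split("/")[0] if "/" in slug else None
    return {"id": data.get("id"), "name": data.get("name"), "parent": parent}


# Hypothetical sample, not copied verbatim from the dataset.
sample = '{"id":34,"name":"Tabletop Games","slug":"games/tabletop games"}'
parsed = parse_category(sample)

# A category with no slash in its slug has no parent under this assumption.
top_level = parse_category('{"id":12,"name":"Games","slug":"games"}')
```

If the real strings follow this shape, `df["category"].apply(parse_category)` would give one dictionary per project that could then be expanded into columns.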

### Value Counts

In [32]:
df.columns

Index(['backers_count', 'blurb', 'category', 'converted_pledged_amount',
       'country', 'country_displayable_name', 'created_at', 'creator',
       'currency', 'currency_symbol', 'currency_trailing_code',
       'current_currency', 'deadline', 'disable_communication', 'fx_rate',
       'goal', 'id', 'is_starrable', 'launched_at', 'location', 'name',
       'photo', 'pledged', 'profile', 'slug', 'source_url', 'spotlight',
       'staff_pick', 'state', 'state_changed_at', 'static_usd_rate', 'urls',
       'usd_pledged', 'usd_type'],
      dtype='object')

In [62]:
df.creator.value_counts()

{"id":2053011023,"name":"Benjamin Hennessey","slug":"combatmedallions","is_registered":null,"chosen_currency":null,"is_superbacker":null,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/008/647/822/59acad1fb0a00a22cd0c5df2db43343f_original.jpg?ixlib=rb-2.1.0&w=40&h=40&fit=crop&v=1461536749&auto=format&frame=1&q=92&s=681661727d91252651719bdc7202b454","small":"https://ksr-ugc.imgix.net/assets/008/647/822/59acad1fb0a00a22cd0c5df2db43343f_original.jpg?ixlib=rb-2.1.0&w=160&h=160&fit=crop&v=1461536749&auto=format&frame=1&q=92&s=119a85455aafb64c83a17e481c02a595","medium":"https://ksr-ugc.imgix.net/assets/008/647/822/59acad1fb0a00a22cd0c5df2db43343f_original.jpg?ixlib=rb-2.1.0&w=160&h=160&fit=crop&v=1461536749&auto=format&frame=1&q=92&s=119a85455aafb64c83a17e481c02a595"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/combatmedallions"},"api":{"user":"https://api.kickstarter.com/v1/users/2053011023?signature=1589516310.c7ebe463c5a4b9915638287eb55c3dbe464dffc5"}}}    12
{"id":1712
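Counting the raw `creator` strings treats two entries as the same creator only when the entire JSON text matches. The rows in `df.head()` show that the fields present can differ from one record to the next, so counting by the creator's `id` may be more reliable. The following is a minimal sketch using made-up records. `creator_id` is a hypothetical helper, and it assumes every string is valid JSON with an `id` key.

```python
import json
from collections import Counter


def creator_id(raw):
    """Return the creator's numeric id from a creator JSON string."""
    return json.loads(raw)["id"]


# Made-up records: creator 1 appears twice with different extra fields.
records = [
    '{"id":1,"name":"Ann","is_registered":null}',
    '{"id":1,"name":"Ann","slug":"ann-games"}',
    '{"id":2,"name":"Raj","is_registered":null}',
]

raw_counts = Counter(records)
id_counts = Counter(creator_id(r) for r in records)
```

Here the raw strings yield three distinct entries, while counting by `id` correctly groups the two records for creator 1.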