![kickstarter-logo](https://ksr-static.imgix.net/tq0sfld-kickstarter-logo-green.png?ixlib=rb-2.1.0&s=0cce952d7b55823ff451a58887a0c578)

# Workshop: Kickstarter data

Kickstarter is a crowdfunding website. Anyone can launch a campaign to build something cool, and anybody can chip in! It's an amazing resource for indie builders, and when it first launched it was a really novel way to raise money around an idea.

We're going to go through the process of exploring and engineering the data, building a model, and visualising the results. We'll try to predict whether or not a Kickstarter project will be funded.

**Links**
 - Kickstarter website: https://www.kickstarter.com/
 - Shared folder for class: https://drive.google.com/open?id=1PlcVyu8PmquwxkqAAZO0mSidS68xbNZW
 - This code: https://git.generalassemb.ly/DSga38/ds_ga_38

# 1. Load data

First of all we just load up the data and take a peek at what we have.

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker
import seaborn as sns
from sklearn.linear_model import SGDClassifier
from sklearn import metrics

In [3]:
# Two columns ('deadline' and 'launched') hold dates, so we parse them as datetimes on load
kaggle_data = pd.read_csv('data/ks-projects-201801.csv', parse_dates=['deadline', 'launched'])
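A quick way to confirm that `parse_dates` did its job is to look at the column dtypes. The sketch below does not read the real file: it assumes a tiny in-memory CSV built from two rows of the dataset (trimmed to four columns), and the names `sample_csv`, `sample` and `campaign_length` are made up for illustration.

```python
import io
import pandas as pd

# Two rows taken from the dataset, trimmed to a few columns (illustrative only).
sample_csv = io.StringIO(
    "ID,deadline,launched,state\n"
    "1000002330,2015-10-09,2015-08-11 12:12:28,failed\n"
    "1000003930,2017-11-01,2017-09-02 04:43:57,failed\n"
)
sample = pd.read_csv(sample_csv, parse_dates=["deadline", "launched"])

# Both date columns should show a datetime64 dtype rather than object (string).
print(sample.dtypes)

# With real datetimes we can subtract columns to get a campaign length.
campaign_length = sample["deadline"] - sample["launched"]
print(campaign_length.dt.days)
```

The same `dtypes` check can be run on `kaggle_data` itself once the full file is loaded.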

In [9]:
# deadline = deadline for the project
# launched = launch date and time of the project
# pledged = amount pledged, in the project's own currency
# backers = number of people who pledged money
# usd pledged = amount pledged, converted to US dollars (note the space in the column name)
# usd_pledged_real = amount of US dollars the project had raised at the deadline
# usd_goal_real = amount of US dollars the project asked for initially
kaggle_data.head()

Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real
0,1000002330,The Songs of Adelaide & Abullah,Poetry,Publishing,GBP,2015-10-09,1000.0,2015-08-11 12:12:28,0.0,failed,0,GB,0.0,0.0,1533.95
1,1000003930,Greeting From Earth: ZGAC Arts Capsule For ET,Narrative Film,Film & Video,USD,2017-11-01,30000.0,2017-09-02 04:43:57,2421.0,failed,15,US,100.0,2421.0,30000.0
2,1000004038,Where is Hank?,Narrative Film,Film & Video,USD,2013-02-26,45000.0,2013-01-12 00:20:50,220.0,failed,3,US,220.0,220.0,45000.0
3,1000007540,ToshiCapital Rekordz Needs Help to Complete Album,Music,Music,USD,2012-04-16,5000.0,2012-03-17 03:24:11,1.0,failed,1,US,1.0,1.0,5000.0
4,1000011046,Community Film Project: The Art of Neighborhoo...,Film & Video,Film & Video,USD,2015-08-29,19500.0,2015-07-04 08:35:03,1283.0,canceled,14,US,1283.0,1283.0,19500.0


In [6]:
# Failed - no longer active and didn't reach its goal
# Canceled - no longer active because someone stopped the project
# Undefined - campaign hasn't launched yet, or the data is incomplete
# Suspended - approved for launch, but on closer inspection did not follow the rules

# Live - currently active, between launch date and deadline (usd_goal_real may be more indicative?)
# Successful - no longer active and reached its goal (usd_pledged_real should be equal to or more than usd_goal_real?)
kaggle_data['state'].value_counts()

failed        197719
successful    133956
canceled       38779
undefined       3562
live            2799
suspended       1846
Name: state, dtype: int64
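Since the goal is to predict whether a project gets funded, it helps to see the class balance and to turn `state` into a 0/1 target. The sketch below is one possible approach, not necessarily the one used later in the workshop: it assumes we keep only `failed` and `successful` projects (dropping canceled, undefined, live and suspended ones), and `add_funded_target` and the `funded` column are hypothetical names.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
state_counts = pd.Series(
    {"failed": 197719, "successful": 133956, "canceled": 38779,
     "undefined": 3562, "live": 2799, "suspended": 1846},
    name="state",
)

# Share of each state across all projects.
state_share = state_counts / state_counts.sum()
print(state_share.round(3))


def add_funded_target(df):
    """Keep failed/successful projects and add a 0/1 'funded' column (hypothetical name)."""
    finished = df[df["state"].isin(["failed", "successful"])].copy()
    finished["funded"] = (finished["state"] == "successful").astype(int)
    return finished


# Toy frame to show the behaviour; on the real data, pass kaggle_data instead.
toy = pd.DataFrame({"state": ["failed", "successful", "canceled", "live"]})
print(add_funded_target(toy))
```

Whether canceled projects should count as failures is a modelling choice; excluding them here is only an assumption made to keep the example simple.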

In [11]:
kaggle_data[kaggle_data['state'] == 'successful']

Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real
5,1000014025,Monarch Espresso Bar,Restaurants,Food,USD,2016-04-01,50000.0,2016-02-26 13:38:27,52375.00,successful,224,US,52375.00,52375.00,50000.00
6,1000023410,Support Solar Roasted Coffee & Green Energy! ...,Food,Food,USD,2014-12-21,1000.0,2014-12-01 18:30:44,1205.00,successful,16,US,1205.00,1205.00,1000.00
11,100005484,Lisa Lim New CD!,Indie Rock,Music,USD,2013-04-08,12500.0,2013-03-09 06:42:58,12700.00,successful,100,US,12700.00,12700.00,12500.00
14,1000057089,Tombstone: Old West tabletop game and miniatur...,Tabletop Games,Games,GBP,2017-05-03,5000.0,2017-04-05 19:44:18,94175.00,successful,761,GB,57763.78,121857.33,6469.73
18,1000070642,Mike Corey's Darkness & Light Album,Music,Music,USD,2012-08-17,250.0,2012-08-02 14:11:32,250.00,successful,7,US,250.00,250.00,250.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
378642,999929142,ÉPOUVANTAILS : 28mm Figurines de jeux pour KIN...,Tabletop Games,Games,EUR,2017-10-31,1000.0,2017-10-04 11:26:44,1246.00,successful,35,FR,66.72,1452.47,1165.70
378644,999934908,The Manual Bar Blade,Product Design,Design,USD,2015-12-15,3500.0,2015-11-23 07:33:14,6169.00,successful,120,US,6169.00,6169.00,3500.00
378646,999943841,The Dog Coffee Book,Children's Books,Publishing,USD,2013-11-30,950.0,2013-10-18 21:35:04,1732.02,successful,31,US,1732.02,1732.02,950.00
378651,999969812,AT THE BEACH,Classical Music,Music,CAD,2014-03-22,5000.0,2014-02-20 01:00:16,5501.00,successful,78,CA,5019.92,4983.69,4529.81
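The guess in the comments above, that a successful project has `usd_pledged_real` equal to or greater than `usd_goal_real`, can be checked directly. This is a minimal sketch on four of the successful rows shown above, rebuilt by hand into a small frame (`successful_sample` and `met_goal` are illustrative names); on the full data the same comparison would run against the filtered `kaggle_data`.

```python
import pandas as pd

# Four successful projects copied from the output above (index = original row label).
successful_sample = pd.DataFrame(
    {
        "state": ["successful"] * 4,
        "usd_pledged_real": [52375.00, 121857.33, 250.00, 4983.69],
        "usd_goal_real": [50000.00, 6469.73, 250.00, 4529.81],
    },
    index=[5, 14, 18, 378651],
)

# True where the amount raised covers the goal.
met_goal = successful_sample["usd_pledged_real"] >= successful_sample["usd_goal_real"]
print(met_goal)

# If the guess holds for every row, this prints True.
print(met_goal.all())
```

Four rows prove nothing about the whole dataset, so treat this as a template for the check rather than a result.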
