**Plan**

Questions:
1. Which model best predicts whether a Kickstarter project will be funded?
2. Which "live" projects are likely to be funded?

Steps:
1. Acquire and Prep
2. Explore
    - Which features are the strongest predictors of funding? (Feature selection)
    - Of the projects that received funding, which are overfunded and which are underfunded?
3. Model
4. Evaluate
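
The overfunded/underfunded question in the Explore step could be approached along these lines. This is only a sketch: the helper `label_funding`, the `funded_ratio` and `funding_level` columns, and the 1.5 cutoff are assumptions made for illustration, and the toy rows are made up.

```python
import numpy as np
import pandas as pd


def label_funding(df: pd.DataFrame, over_threshold: float = 1.5) -> pd.DataFrame:
    """Add a pledged/goal ratio and a coarse funding label (hypothetical helper)."""
    out = df.copy()
    out["funded_ratio"] = out["pledged"] / out["goal"]
    # Assumed definitions: below 100% of goal is "underfunded",
    # at or above the cutoff is "overfunded", anything in between is "funded".
    conditions = [
        out["funded_ratio"] < 1.0,
        out["funded_ratio"] >= over_threshold,
    ]
    out["funding_level"] = np.select(
        conditions, ["underfunded", "overfunded"], default="funded"
    )
    return out


# Made-up rows that reuse the notebook's `goal` and `pledged` column names.
projects = pd.DataFrame({
    "goal": [1000.0, 2000.0, 4000.0, 500.0],
    "pledged": [1100.0, 6000.0, 20.0, 500.0],
})
labeled = label_funding(projects)
print(labeled)
```

The cutoff is a free parameter; a different threshold, or quantiles of the ratio, may suit the data better.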

### Packages

In [1]:
import wrangle
import pandas as pd
import numpy as np

## Acquire
> Skip to EDA

Read the csv file from the local repo using the `get_kickstarter` function from the wrangle.py module.

In [35]:
kick = wrangle.get_kickstarter()

`recode_locations` from wrangle.py splits the `location` column into `city` and `origin`, corrects truncated values (for example, "Argent" becomes "Argentina"), and standardizes other international locations (for example, "Kyoto, Japan" becomes "Japan").

In [36]:
kick = wrangle.recode_locations(kick)
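
The source of wrangle.py is not shown here, so the following is only a guess at the shape of such a function. The name `recode_locations_sketch`, the `ORIGIN_FIXES` mapping, and the demo rows are hypothetical; the two corrections are the ones named in the description above.

```python
import pandas as pd

# Hypothetical subset of corrections, limited to the examples named in the text.
ORIGIN_FIXES = {
    "Argent": "Argentina",
    "Kyoto, Japan": "Japan",
}


def recode_locations_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Split `location` into `city` and `origin`, then normalize `origin`."""
    out = df.copy()
    # Split only at the first ", " so that "Kyoto, Kyoto, Japan" keeps
    # "Kyoto, Japan" together in the origin part.
    out[["city", "origin"]] = out["location"].str.split(", ", n=1, expand=True)
    # Exact-match replacement of known bad values; other values pass through.
    out["origin"] = out["origin"].replace(ORIGIN_FIXES)
    return out


# Made-up locations for illustration.
demo = pd.DataFrame(
    {"location": ["Miami, FL", "Buenos Aires, Argent", "Kyoto, Kyoto, Japan"]}
)
result = recode_locations_sketch(demo)
print(result)
```

Returning a copy rather than modifying the input keeps the function safe to re-run in a notebook cell.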

In [37]:
kick.sample(5)

|  | project id | name | url | category | subcategory | location | status | goal | pledged | funded percentage | backers | funded date | levels | reward levels | updates | comments | duration | city | origin |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 32179 | 1514323028 | Help Kickstart the Entire Dvorah Ivy Fashion Line | http://www.kickstarter.com/projects/dvorahivy/... | Fashion | Fashion | Miami, FL | failed | 6000.0 | 41.0 | 0.006833 | 3 | Fri, 17 Feb 2012 06:20:16 -0000 | 9 | $1,$25,$50,$100,$150,$200,$500,$2,000,$5,000 | 3 | 0 | 45.0 | Miami | FL |
| 184 | 9291808 | Year of the Bunny - a feature film. Help find ... | http://www.kickstarter.com/projects/2025837335... | Film & Video | Narrative Film | Hong Kong, Hong Kong | failed | 75000.0 | 1055.0 | 0.014067 | 20 | Mon, 27 Jun 2011 17:19:23 -0000 | 9 | $1,$10,$25,$50,$100,$250,$1,000,$5,000,$10,000 | 2 | 4 | 60.0 | Hong Kong | Hong Kong |
| 14562 | 690118578 | NeverLanding the Series: The Complete First Se... | http://www.kickstarter.com/projects/1359388650... | Film &amp; Video | Film &amp; Video | New Brunswick, NJ | successful | 2000.0 | 2435.0 | 1.2175 | 36 | Sat, 09 Apr 2011 21:00:00 -0000 | 7 | $1,$10,$25,$50,$100,$500,$1,000 | 2 | 0 | 40.19 | New Brunswick | NJ |
| 13602 | 646596755 | The Birth Write : A collection of poems, quote... | http://www.kickstarter.com/projects/1097999862... | Publishing | Poetry | Burbank, CA | live | 545.0 | 0.0 | 0.0 | 0 | Mon, 23 Jul 2012 21:49:50 -0000 | 2 | $10,$25 | 0 | 0 | 60.0 | Burbank | CA |
| 24248 | 1138666715 | Hail Columbia: Behind the Scenes with the Spac... | http://www.kickstarter.com/projects/1129748399... | Publishing | Publishing | Houston, TX | successful | 3500.0 | 4463.0 | 1.275143 | 70 | Thu, 03 Mar 2011 03:36:32 -0000 | 6 | $1,$15,$25,$50,$102,$250 | 6 | 0 | 30.0 | Houston | TX |


## Clean Data

- Missing locations (NaN) are recoded as "Unknown" in `origin` (about 1,322 rows, too many to drop)
- Split `location` into city and state/country

In [39]:
# Split "City, Region" at the first ", " into separate city and origin columns
kick[["city", "origin"]] = kick["location"].str.split(', ', n=1, expand=True)

# Map truncated or over-specific origin values to their standard form
origin_fixes = {
    "Argent": "Argentina",
    "Mt": "MT",
    "Dominican Re": "Dominican Republic",
    "Kyoto, Japan": "Japan",
    "Nakagyo Ward": "Kyoto",
    "Kamigyo Ward": "Kyoto",
    "Scotland, United Kingdom": "Scotland",
    "Middleburg, MD": "MD",
}
# Assign the result back instead of using inplace=True on an attribute-accessed column
kick["origin"] = kick["origin"].replace(origin_fixes)

In [40]:
kick.head()

|  | project id | name | url | category | subcategory | location | status | goal | pledged | funded percentage | backers | funded date | levels | reward levels | updates | comments | duration | city | origin |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 39409 | WHILE THE TREES SLEEP | http://www.kickstarter.com/projects/emiliesaba... | Film & Video | Short Film | Columbia, MO | successful | 10500.0 | 11545.0 | 1.099524 | 66 | Fri, 19 Aug 2011 19:28:17 -0000 | 7 | $25,$50,$100,$250,$500,$1,000,$2,500 | 10 | 2 | 30.0 | Columbia | MO |
| 1 | 126581 | Educational Online Trading Card Game | http://www.kickstarter.com/projects/972789543/... | Games | Board & Card Games | Maplewood, NJ | failed | 4000.0 | 20.0 | 0.005 | 2 | Mon, 02 Aug 2010 03:59:00 -0000 | 5 | $1,$5,$10,$25,$50 | 6 | 0 | 47.18 | Maplewood | NJ |
| 2 | 138119 | STRUM | http://www.kickstarter.com/projects/185476022/... | Film & Video | Animation | Los Angeles, CA | live | 20000.0 | 56.0 | 0.0028 | 3 | Fri, 08 Jun 2012 00:00:31 -0000 | 10 | $1,$10,$25,$40,$50,$100,$250,$1,000,$1,337,$9,001 | 1 | 0 | 28.0 | Los Angeles | CA |
| 3 | 237090 | GETTING OVER - One son's search to finally kno... | http://www.kickstarter.com/projects/charnick/g... | Film & Video | Documentary | Los Angeles, CA | successful | 6000.0 | 6535.0 | 1.089167 | 100 | Sun, 08 Apr 2012 02:14:00 -0000 | 13 | $1,$10,$25,$30,$50,$75,$85,$100,$110,$250,$500... | 4 | 0 | 32.22 | Los Angeles | CA |
| 4 | 246101 | The Launch of FlyeGrlRoyalty &quot;The New Nam... | http://www.kickstarter.com/projects/flyegrlroy... | Fashion | Fashion | Novi, MI | failed | 3500.0 | 0.0 | 0.0 | 0 | Wed, 01 Jun 2011 15:25:39 -0000 | 6 | $10,$25,$50,$100,$150,$250 | 2 | 0 | 30.0 | Novi | MI |


In [41]:
kick.isnull().sum()

project id              0
name                    0
url                     0
category                0
subcategory             0
location             1322
status                  0
goal                    0
pledged                12
funded percentage       0
backers                 0
funded date             0
levels                  0
reward levels          59
updates                 0
comments                0
duration                0
city                 1322
origin               1323
dtype: int64

In [43]:
kick.origin = kick.origin.replace(np.nan, "Unknown")

In [44]:
kick[kick.origin=="Unknown"].count()

project id           1323
name                 1323
url                  1323
category             1323
subcategory          1323
location                1
status               1323
goal                 1323
pledged              1323
funded percentage    1323
backers              1323
funded date          1323
levels               1323
reward levels        1307
updates              1323
comments             1323
duration             1323
city                    1
origin               1323
dtype: int64

In [45]:
sorted(list(kick.origin.unique()))

['AK',
 'AL',
 'AR',
 'AZ',
 'Afghanistan',
 'Argentina',
 'Armenia',
 'Australia',
 'Austria',
 'Bahamas',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Bermuda',
 'Bhutan',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Brazil',
 'Bulgaria',
 'Burkina Faso',
 'CA',
 'CO',
 'CT',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Central African Republic',
 'Chile',
 'China',
 'Colombia',
 'Congo',
 'Costa Rica',
 'Cuba',
 'Cyprus',
 'Czech Republic',
 'DC',
 'DE',
 'Denmark',
 'Dominica',
 'Dominican Republic',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Estonia',
 'Ethiopia',
 'FL',
 'Falkland Islands',
 'Finland',
 'France',
 'GA',
 'Georgia',
 'Germany',
 'Ghana',
 'Greece',
 'Greenland',
 'Guam',
 'Guatemala',
 'Guinea',
 'HI',
 'Haiti',
 'Honduras',
 'Hong Kong',
 'Hungary',
 'IA',
 'ID',
 'IL',
 'IN',
 'Iceland',
 'India',
 'Indonesia',
 'Iran',
 'Iraq',
 'Ireland',
 'Isle of Man',
 'Israel',
 'Italy',
 'Jamaica',
 'Japan',
 'Jordan',
 'KS',
 'KY',
 'Kazakhst

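- Fix "Scotland, United Kingdom" (recode to "Scotland")
- Fix "Middleburg, MD" (recode to "MD")
- Use a regex to detect two-letter US state codes, so that `origin` becomes "United States" and the state code is kept in a separate `state` column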

In [46]:
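# Anchor the pattern so only values that are exactly two capital letters count as US states
us_only = kick[kick.origin.str.match(r'^[A-Z]{2}$')]

In [47]:
# Keep the state code for US rows; label everything else "International"
kick["state"] = np.where(kick.origin.str.match(r'^[A-Z]{2}$'), kick.origin, "International")

In [48]:
# Collapse all US state codes in origin to a single country value
kick.origin = np.where(kick.origin.str.match(r'^[A-Z]{2}$'), "United States", kick.origin)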
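
A pattern such as `[A-Z]{2}` passed to `str.contains` matches anywhere in a string, so any value containing two consecutive capital letters would be treated as a US state. The sketch below compares that with an anchored match on a small made-up Series; "Washington DC" and "USSR" are hypothetical values chosen to show the difference and are not taken from the dataset.

```python
import pandas as pd

# Made-up origin values; the last two are hypothetical edge cases.
origin = pd.Series(["CA", "DC", "Japan", "Unknown", "Washington DC", "USSR"])

# Unanchored: True whenever two capital letters appear next to each other.
loose = origin.str.contains(r"[A-Z]{2}")

# Anchored: True only when the whole value is exactly two capital letters.
strict = origin.str.match(r"^[A-Z]{2}$")

# Keep the state code where the strict test passes, otherwise "International".
state = origin.where(strict, "International")

print(pd.DataFrame({"origin": origin, "loose": loose, "strict": strict, "state": state}))
```

A two-letter pattern still cannot tell a state code from any other two-letter uppercase value, so checking against an explicit list of state abbreviations would be stricter again.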

In [49]:
kick

|  | project id | name | url | category | subcategory | location | status | goal | pledged | funded percentage | backers | funded date | levels | reward levels | updates | comments | duration | city | origin | state |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 39409 | WHILE THE TREES SLEEP | http://www.kickstarter.com/projects/emiliesaba... | Film & Video | Short Film | Columbia, MO | successful | 10500.0 | 11545.0 | 1.099524 | 66 | Fri, 19 Aug 2011 19:28:17 -0000 | 7 | $25,$50,$100,$250,$500,$1,000,$2,500 | 10 | 2 | 30.00 | Columbia | United States | MO |
| 1 | 126581 | Educational Online Trading Card Game | http://www.kickstarter.com/projects/972789543/... | Games | Board & Card Games | Maplewood, NJ | failed | 4000.0 | 20.0 | 0.005000 | 2 | Mon, 02 Aug 2010 03:59:00 -0000 | 5 | $1,$5,$10,$25,$50 | 6 | 0 | 47.18 | Maplewood | United States | NJ |
| 2 | 138119 | STRUM | http://www.kickstarter.com/projects/185476022/... | Film & Video | Animation | Los Angeles, CA | live | 20000.0 | 56.0 | 0.002800 | 3 | Fri, 08 Jun 2012 00:00:31 -0000 | 10 | $1,$10,$25,$40,$50,$100,$250,$1,000,$1,337,$9,001 | 1 | 0 | 28.00 | Los Angeles | United States | CA |
| 3 | 237090 | GETTING OVER - One son's search to finally kno... | http://www.kickstarter.com/projects/charnick/g... | Film & Video | Documentary | Los Angeles, CA | successful | 6000.0 | 6535.0 | 1.089167 | 100 | Sun, 08 Apr 2012 02:14:00 -0000 | 13 | $1,$10,$25,$30,$50,$75,$85,$100,$110,$250,$500... | 4 | 0 | 32.22 | Los Angeles | United States | CA |
| 4 | 246101 | The Launch of FlyeGrlRoyalty &quot;The New Nam... | http://www.kickstarter.com/projects/flyegrlroy... | Fashion | Fashion | Novi, MI | failed | 3500.0 | 0.0 | 0.000000 | 0 | Wed, 01 Jun 2011 15:25:39 -0000 | 6 | $10,$25,$50,$100,$150,$250 | 2 | 0 | 30.00 | Novi | United States | MI |
| 5 | 316217 | Dinner Party - a short film about friendship..... | http://www.kickstarter.com/projects/249354515/... | Film & Video | Short Film | Portland, OR | successful | 3500.0 | 3582.0 | 1.023331 | 39 | Wed, 22 Jun 2011 13:33:00 -0000 | 7 | $5,$25,$50,$100,$250,$500,$1,000 | 8 | 0 | 21.43 | Portland | United States | OR |
| 6 | 325034 | Mezzo | http://www.kickstarter.com/projects/geoffsaysh... | Film & Video | Short Film | Collegedale, TN | failed | 1000.0 | 280.0 | 0.280000 | 8 | Sat, 18 Feb 2012 02:17:08 -0000 | 5 | $5,$10,$25,$50,$100 | 0 | 0 | 30.00 | Collegedale | United States | TN |
| 7 | 407836 | Help APORTA continue to make handwoven/knit ac... | http://www.kickstarter.com/projects/1078097864... | Fashion | Fashion | Chicago, IL | successful | 2000.0 | 2180.0 | 1.090000 | 46 | Fri, 30 Dec 2011 04:36:53 -0000 | 7 | $10,$20,$50,$100,$250,$500,$1,000 | 13 | 5 | 30.00 | Chicago | United States | IL |
| 8 | 436325 | Music - Comedy - Album! | http://www.kickstarter.com/projects/mattgriffo... | Music | Music | Chicago, IL | successful | 1000.0 | 1125.0 | 1.125000 | 30 | Sun, 18 Apr 2010 04:59:00 -0000 | 12 | $5,$8,$10,$15,$20,$30,$50,$100,$120,$250,$500,... | 10 | 1 | 67.53 | Chicago | United States | IL |
| 9 | 610918 | The Apocalypse Calendar | http://www.kickstarter.com/projects/tqvinn/the... | Art | Illustration | Chicago, IL | successful | 7500.0 | 9836.0 | 1.311527 | 255 | Tue, 01 Nov 2011 04:59:00 -0000 | 10 | $1,$20,$35,$50,$60,$100,$110,$500,$1,000,$1,500 | 6 | 5 | 35.29 | Chicago | United States | IL |


- Slice the data frame to the relevant columns only:
  - project id
  - name
  - category
  - subcategory
  - status
  - goal
  - pledged
  - backers
  - funded date
  - duration
  - city
  - origin
  - state

In [51]:
kick = kick[["project id", "name","category","subcategory","status","goal","pledged","backers","funded date","duration","city","origin","state"]]

Save the data frame to a .csv file.

In [53]:
# kick.to_csv("preprocessed_kick.csv",index=False)
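
The `index=False` argument matters when the file is read back later. As a small illustration on made-up rows, using in-memory buffers instead of a file on disk, writing the index produces an extra `Unnamed: 0` column on re-read:

```python
import io

import pandas as pd

# Two made-up rows that reuse the notebook's column names.
frame = pd.DataFrame({
    "project id": [39409, 126581],
    "status": ["successful", "failed"],
})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)

cols_with = pd.read_csv(io.StringIO(with_index.getvalue())).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index.getvalue())).columns.tolist()

print(cols_with)
print(cols_without)
```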

## Exploratory Data Analysis

In [58]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

In [56]:
kick = pd.read_csv("preprocessed_kick.csv")

In [57]:
kick.head()

|  | project id | name | category | subcategory | status | goal | pledged | backers | funded date | duration | city | origin | state |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 39409 | WHILE THE TREES SLEEP | Film & Video | Short Film | successful | 10500.0 | 11545.0 | 66 | Fri, 19 Aug 2011 19:28:17 -0000 | 30.0 | Columbia | United States | MO |
| 1 | 126581 | Educational Online Trading Card Game | Games | Board & Card Games | failed | 4000.0 | 20.0 | 2 | Mon, 02 Aug 2010 03:59:00 -0000 | 47.18 | Maplewood | United States | NJ |
| 2 | 138119 | STRUM | Film & Video | Animation | live | 20000.0 | 56.0 | 3 | Fri, 08 Jun 2012 00:00:31 -0000 | 28.0 | Los Angeles | United States | CA |
| 3 | 237090 | GETTING OVER - One son's search to finally kno... | Film & Video | Documentary | successful | 6000.0 | 6535.0 | 100 | Sun, 08 Apr 2012 02:14:00 -0000 | 32.22 | Los Angeles | United States | CA |
| 4 | 246101 | The Launch of FlyeGrlRoyalty &quot;The New Nam... | Fashion | Fashion | failed | 3500.0 | 0.0 | 0 | Wed, 01 Jun 2011 15:25:39 -0000 | 30.0 | Novi | United States | MI |


Questions:
1. 