**Plan**

Questions:
1. Which model best predicts whether a Kickstarter project will be funded?
2. Which "live" projects are likely to be funded?

Steps:
1. Acquire and Prep
2. Explore
    - Which features are the strongest predictors of funding? (Feature selection)
    - Of the projects that received funding, which were overfunded and which were underfunded?
3. Model
4. Evaluate

### Packages

In [1]:
import wrangle
import pandas as pd
import numpy as np

## Acquire

Read the CSV file from the local repo using the `get_kickstarter` function from the wrangle.py module.

In [2]:
kick = wrangle.get_kickstarter()

`recode_locations` from wrangle.py splits the `location` column into `city` and `origin`, re-encodes truncated values (for example, "Argent" to "Argentina"), and normalizes other values to a standard form (for example, "Kyoto, Japan" to "Japan" for international locations).

In [3]:
kick = wrangle.recode_locations(kick)
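To make the split-and-recode step concrete, here is a minimal sketch on toy data. The three location strings are hypothetical examples chosen to trigger the mappings named above; they are not rows from the Kickstarter file, and the sketch is not the wrangle.py implementation itself.

```python
import pandas as pd

# Hypothetical location strings (assumed for illustration only).
toy = pd.DataFrame(
    {"location": ["Columbia, MO", "Buenos Aires, Argent", "Sakyo Ward, Kyoto, Japan"]}
)

# Split at the first ", " only, so everything after the city lands in `origin`.
toy[["city", "origin"]] = toy["location"].str.split(", ", n=1, expand=True)

# Exact-value replacements: only whole-cell matches are rewritten.
toy["origin"] = toy["origin"].replace({"Argent": "Argentina", "Kyoto, Japan": "Japan"})

print(toy)
```

Because `n=1` limits the split to the first separator, a three-part location leaves "Kyoto, Japan" in `origin`, which is why that value needs its own mapping.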

In [6]:
kick.sample(20)

Unnamed: 0,project id,name,url,category,subcategory,location,status,goal,pledged,funded percentage,backers,funded date,levels,reward levels,updates,comments,duration,city,origin
10251,484453422,"Viva Arte em Goiânia, Brasil",http://www.kickstarter.com/projects/799918872/...,Art,Sculpture,"Goiânia, Brazil",successful,1200.0,1440.0,1.2,29,"Tue, 11 Oct 2011 04:32:14 -0000",4,"$15,$25,$50,$100",17,0,30.0,Goiânia,
33420,1571949865,GAYBY - The Feature Film!,http://www.kickstarter.com/projects/jonnynyc/g...,Film & Video,Narrative Film,"New York, NY",successful,15000.0,16209.0,1.0806,134,"Sun, 11 Sep 2011 03:59:00 -0000",11,"$10,$25,$50,$100,$350,$350,$350,$500,$500,$1,0...",6,2,54.31,New York,
26222,1234313946,Painting Wild Horses from Life,http://www.kickstarter.com/projects/735205269/...,Art,Painting,"Apache Junction, AZ",successful,2000.0,2226.0,1.113,39,"Mon, 30 Apr 2012 16:31:34 -0000",11,"$1,$10,$25,$45,$75,$85,$100,$200,$300,$500,$1,000",15,2,25.0,Apache Junction,
3121,147117174,THREE ASTRONAUTS The Opera,http://www.kickstarter.com/projects/familyoper...,Theater,Theater,"New York, NY",failed,15000.0,2063.0,0.137533,25,"Wed, 18 Jan 2012 00:34:45 -0000",13,"$1,$10,$25,$50,$75,$100,$150,$250,$500,$750,$1...",7,3,60.0,New York,
25444,1196158444,ARI's Music Campaign,http://www.kickstarter.com/projects/1213756254...,Music,Pop,"New York, NY",live,10000.0,60.0,0.006,2,"Fri, 29 Jun 2012 15:59:00 -0000",7,"$10,$15,$25,$50,$65,$100,$150",0,0,51.09,New York,
13233,629398151,4th Wave Ska Album,http://www.kickstarter.com/projects/brose/4th-...,Music,Music,"Topeka, KS",failed,3000.0,50.0,0.016667,1,"Mon, 08 Aug 2011 01:41:22 -0000",3,"$5,$10,$20",3,0,60.0,Topeka,
4946,235049360,&quot;Hotel Velow&quot; The next Polar Eye EP,http://www.kickstarter.com/projects/polareye/h...,Music,Indie Rock,"Boston, MA",successful,2000.0,2000.0,1.0,18,"Wed, 12 Jan 2011 04:11:00 -0000",6,"$10,$25,$50,$75,$150,$250",3,0,55.36,Boston,
33806,1589850904,Glen Rock Fae - A documentary about the fairy ...,http://www.kickstarter.com/projects/61152447/g...,Film & Video,Documentary,"Carlisle, PA",failed,2000.0,205.0,0.1025,5,"Thu, 17 Mar 2011 22:00:00 -0000",6,"$1,$25,$40,$50,$65,$2,500",3,0,36.87,Carlisle,
32262,1517846611,Red Light Winter - Portland Premiere,http://www.kickstarter.com/projects/885282698/...,Theater,Theater,"Portland, OR",successful,2500.0,3003.0,1.2012,75,"Mon, 14 May 2012 09:01:00 -0000",8,"$10,$25,$50,$100,$250,$500,$1,000,$5,000",7,3,16.58,Portland,
10617,502139071,Who Will Save The World? A WW1 Zombie epic,http://www.kickstarter.com/projects/darkslinge...,Comics,Comics,"Molalla, OR",live,3000.0,2578.0,0.859333,80,"Wed, 20 Jun 2012 00:48:41 -0000",26,"$1,$3,$5,$8,$15,$20,$25,$30,$35,$45,$50,$50,$5...",15,1,60.0,Molalla,


**Data Cleaning**

- Kept these columns: "name", "category", "subcategory", "location", "status", "goal", "funded percentage", "backers", "duration"
- Replaced missing (NaN) `location` values with "Unknown" (1,322 rows, too many to drop)
- Split `location` into city and state/country
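The first two cleaning steps can be sketched as follows. The helper name `keep_and_fill` and the two toy rows are assumptions made for illustration; only the column list comes from the notes above.

```python
import numpy as np
import pandas as pd

# Columns listed in the cleaning notes.
KEEP = ["name", "category", "subcategory", "location", "status",
        "goal", "funded percentage", "backers", "duration"]

def keep_and_fill(df):
    """Hypothetical helper: keep the listed columns and label missing locations."""
    out = df[KEEP].copy()
    out["location"] = out["location"].fillna("Unknown")
    return out

# Two made-up rows with an extra column that should be dropped.
toy = pd.DataFrame({
    "name": ["A", "B"], "url": ["u1", "u2"],
    "category": ["Art", "Music"], "subcategory": ["Painting", "Pop"],
    "location": ["Novi, MI", np.nan], "status": ["failed", "live"],
    "goal": [3500.0, 10000.0], "funded percentage": [0.0, 0.006],
    "backers": [0, 2], "duration": [30.0, 51.09],
})

cleaned = keep_and_fill(toy)
print(cleaned)
```

Filling rather than dropping keeps those rows available for modeling, at the cost of one catch-all category.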

In [51]:
kick = recode_locations(kick)

In [53]:
kick

Unnamed: 0,project id,name,url,category,subcategory,location,status,goal,pledged,funded percentage,backers,funded date,levels,reward levels,updates,comments,duration,city,origin
0,39409,WHILE THE TREES SLEEP,http://www.kickstarter.com/projects/emiliesaba...,Film & Video,Short Film,"Columbia, MO",successful,10500.0,11545.0,1.099524,66,"Fri, 19 Aug 2011 19:28:17 -0000",7,"$25,$50,$100,$250,$500,$1,000,$2,500",10,2,30.00,Columbia,MO
1,126581,Educational Online Trading Card Game,http://www.kickstarter.com/projects/972789543/...,Games,Board & Card Games,"Maplewood, NJ",failed,4000.0,20.0,0.005000,2,"Mon, 02 Aug 2010 03:59:00 -0000",5,"$1,$5,$10,$25,$50",6,0,47.18,Maplewood,NJ
2,138119,STRUM,http://www.kickstarter.com/projects/185476022/...,Film & Video,Animation,"Los Angeles, CA",live,20000.0,56.0,0.002800,3,"Fri, 08 Jun 2012 00:00:31 -0000",10,"$1,$10,$25,$40,$50,$100,$250,$1,000,$1,337,$9,001",1,0,28.00,Los Angeles,CA
3,237090,GETTING OVER - One son's search to finally kno...,http://www.kickstarter.com/projects/charnick/g...,Film & Video,Documentary,"Los Angeles, CA",successful,6000.0,6535.0,1.089167,100,"Sun, 08 Apr 2012 02:14:00 -0000",13,"$1,$10,$25,$30,$50,$75,$85,$100,$110,$250,$500...",4,0,32.22,Los Angeles,CA
4,246101,The Launch of FlyeGrlRoyalty &quot;The New Nam...,http://www.kickstarter.com/projects/flyegrlroy...,Fashion,Fashion,"Novi, MI",failed,3500.0,0.0,0.000000,0,"Wed, 01 Jun 2011 15:25:39 -0000",6,"$10,$25,$50,$100,$150,$250",2,0,30.00,Novi,MI
5,316217,Dinner Party - a short film about friendship.....,http://www.kickstarter.com/projects/249354515/...,Film & Video,Short Film,"Portland, OR",successful,3500.0,3582.0,1.023331,39,"Wed, 22 Jun 2011 13:33:00 -0000",7,"$5,$25,$50,$100,$250,$500,$1,000",8,0,21.43,Portland,OR
6,325034,Mezzo,http://www.kickstarter.com/projects/geoffsaysh...,Film & Video,Short Film,"Collegedale, TN",failed,1000.0,280.0,0.280000,8,"Sat, 18 Feb 2012 02:17:08 -0000",5,"$5,$10,$25,$50,$100",0,0,30.00,Collegedale,TN
7,407836,Help APORTA continue to make handwoven/knit ac...,http://www.kickstarter.com/projects/1078097864...,Fashion,Fashion,"Chicago, IL",successful,2000.0,2180.0,1.090000,46,"Fri, 30 Dec 2011 04:36:53 -0000",7,"$10,$20,$50,$100,$250,$500,$1,000",13,5,30.00,Chicago,IL
8,436325,Music - Comedy - Album!,http://www.kickstarter.com/projects/mattgriffo...,Music,Music,"Chicago, IL",successful,1000.0,1125.0,1.125000,30,"Sun, 18 Apr 2010 04:59:00 -0000",12,"$5,$8,$10,$15,$20,$30,$50,$100,$120,$250,$500,...",10,1,67.53,Chicago,IL
9,610918,The Apocalypse Calendar,http://www.kickstarter.com/projects/tqvinn/the...,Art,Illustration,"Chicago, IL",successful,7500.0,9836.0,1.311527,255,"Tue, 01 Nov 2011 04:59:00 -0000",10,"$1,$20,$35,$50,$60,$100,$110,$500,$1,000,$1,500",6,5,35.29,Chicago,IL


In [52]:
# Split "City, State/Country" into two columns at the first ", ".
kick[["city","origin"]] = kick["location"].str.split(', ', n=1, expand=True)
# Standardize truncated or over-specific origin values in one pass.
origin_fixes = {
    "Argent": "Argentina",
    "Mt": "MT",
    "Dominican Re": "Dominican Republic",
    "Kyoto, Japan": "Japan",
    "Nakagyo Ward": "Kyoto",
    "Kamigyo Ward": "Kyoto",
    "Scotland, United Kingdom": "Scotland",
    "Middleburg, MD": "MD",
}
kick["origin"] = kick["origin"].replace(origin_fixes)

In [28]:
kick.head()

Unnamed: 0,project id,name,url,category,subcategory,location,status,goal,pledged,funded percentage,backers,funded date,levels,reward levels,updates,comments,duration,city,origin
0,39409,WHILE THE TREES SLEEP,http://www.kickstarter.com/projects/emiliesaba...,Film & Video,Short Film,"Columbia, MO",successful,10500.0,11545.0,1.099524,66,"Fri, 19 Aug 2011 19:28:17 -0000",7,"$25,$50,$100,$250,$500,$1,000,$2,500",10,2,30.0,Columbia,MO
1,126581,Educational Online Trading Card Game,http://www.kickstarter.com/projects/972789543/...,Games,Board & Card Games,"Maplewood, NJ",failed,4000.0,20.0,0.005,2,"Mon, 02 Aug 2010 03:59:00 -0000",5,"$1,$5,$10,$25,$50",6,0,47.18,Maplewood,NJ
2,138119,STRUM,http://www.kickstarter.com/projects/185476022/...,Film & Video,Animation,"Los Angeles, CA",live,20000.0,56.0,0.0028,3,"Fri, 08 Jun 2012 00:00:31 -0000",10,"$1,$10,$25,$40,$50,$100,$250,$1,000,$1,337,$9,001",1,0,28.0,Los Angeles,CA
3,237090,GETTING OVER - One son's search to finally kno...,http://www.kickstarter.com/projects/charnick/g...,Film & Video,Documentary,"Los Angeles, CA",successful,6000.0,6535.0,1.089167,100,"Sun, 08 Apr 2012 02:14:00 -0000",13,"$1,$10,$25,$30,$50,$75,$85,$100,$110,$250,$500...",4,0,32.22,Los Angeles,CA
4,246101,The Launch of FlyeGrlRoyalty &quot;The New Nam...,http://www.kickstarter.com/projects/flyegrlroy...,Fashion,Fashion,"Novi, MI",failed,3500.0,0.0,0.0,0,"Wed, 01 Jun 2011 15:25:39 -0000",6,"$10,$25,$50,$100,$150,$250",2,0,30.0,Novi,MI


In [29]:
kick.isnull().sum()

project id              0
name                    0
url                     0
category                0
subcategory             0
location             1322
status                  0
goal                    0
pledged                12
funded percentage       0
backers                 0
funded date             0
levels                  0
reward levels          59
updates                 0
comments                0
duration                0
city                 1322
origin               1323
dtype: int64

In [30]:
# kick[["city","origin"]] = kick["location"].str.split(', ', n=1, expand=True)
# # df[["city","state-country"]] = df["location"].str.split(',', n=1, expand=True)
# #     df = df.drop(["location"],axis=1)
# #     return df

In [31]:
kick.origin = kick.origin.replace(np.nan, "Unknown")

In [32]:
kick[kick.origin=="Unknown"].count()

project id           1323
name                 1323
url                  1323
category             1323
subcategory          1323
location                1
status               1323
goal                 1323
pledged              1323
funded percentage    1323
backers              1323
funded date          1323
levels               1323
reward levels        1307
updates              1323
comments             1323
duration             1323
city                    1
origin               1323
dtype: int64
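The counts above show one more missing `origin` than missing `location`, and one "Unknown" row with a non-null `location`. A likely explanation is a location string without a ", " separator. The sketch below reproduces that pattern on toy data; "Somewhere" is a hypothetical value, not one taken from the dataset.

```python
import numpy as np
import pandas as pd

# One normal location, one without a separator (hypothetical), one missing.
toy = pd.DataFrame({"location": ["Chicago, IL", "Somewhere", np.nan]})

parts = toy["location"].str.split(", ", n=1, expand=True)
toy["city"], toy["origin"] = parts[0], parts[1]

# The separator-less row has a city but no origin.
missing = toy.isnull().sum()
print(missing)

# After filling, that row shows up as "Unknown" with a non-null location.
toy["origin"] = toy["origin"].fillna("Unknown")
unknown_counts = toy[toy["origin"] == "Unknown"].count()
print(unknown_counts)
```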

In [33]:
sorted(list(kick.origin.unique()))

['AK',
 'AL',
 'AR',
 'AZ',
 'Afghanistan',
 'Argentina',
 'Armenia',
 'Australia',
 'Austria',
 'Bahamas',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Bermuda',
 'Bhutan',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Brazil',
 'Bulgaria',
 'Burkina Faso',
 'CA',
 'CO',
 'CT',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Central African Republic',
 'Chile',
 'China',
 'Colombia',
 'Congo',
 'Costa Rica',
 'Cuba',
 'Cyprus',
 'Czech Republic',
 'DC',
 'DE',
 'Denmark',
 'Dominica',
 'Dominican Republic',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Estonia',
 'Ethiopia',
 'FL',
 'Falkland Islands',
 'Finland',
 'France',
 'GA',
 'Georgia',
 'Germany',
 'Ghana',
 'Greece',
 'Greenland',
 'Guam',
 'Guatemala',
 'Guinea',
 'HI',
 'Haiti',
 'Honduras',
 'Hong Kong',
 'Hungary',
 'IA',
 'ID',
 'IL',
 'IN',
 'Iceland',
 'India',
 'Indonesia',
 'Iran',
 'Iraq',
 'Ireland',
 'Isle of Man',
 'Israel',
 'Italy',
 'Jamaica',
 'Japan',
 'Jordan',
 'KS',
 'KY',
 'Kazakhst

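- Fix "Scotland, United Kingdom" so that origin becomes "Scotland".
- Fix "Middleburg, MD" so that origin becomes "MD".
- Use a regex to detect US state codes, so that origin becomes "United States".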
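A small sketch of the regex step on toy values follows. It assumes that a US origin is exactly two uppercase letters. "US Virgin Islands" is a hypothetical value, included only to show that an unanchored pattern also matches longer strings that happen to contain two capitals in a row.

```python
import numpy as np
import pandas as pd

origin = pd.Series(["CA", "MT", "Japan", "Unknown", "US Virgin Islands"])

# Unanchored: True wherever two consecutive capitals appear anywhere.
loose = origin.str.contains(r"[A-Z]{2}")

# Anchored: True only when the whole value is a two-letter code.
strict = origin.str.fullmatch(r"[A-Z]{2}")

# Keep the state code for US rows, then collapse US rows to one country label.
state = np.where(strict, origin, "International")
country = np.where(strict, "United States", origin)

print(pd.DataFrame({"origin": origin, "loose": loose, "strict": strict,
                    "state": state, "country": country}))
```

Note that under this rule "Unknown" origins fall into the "International" bucket of `state`, which may or may not be the intended treatment.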

In [38]:
# Match only values that are exactly two uppercase letters (US state codes).
us_only = kick[kick.origin.str.fullmatch(r'[A-Z]{2}', na=False)]

In [48]:
def recode_locations(kick):
    # Split "City, State/Country" into two columns at the first ", ".
    kick[["city","origin"]] = kick["location"].str.split(', ', n=1, expand=True)
    # Standardize truncated or over-specific origin values in one pass.
    origin_fixes = {
        "Argent": "Argentina",
        "Mt": "MT",
        "Dominican Re": "Dominican Republic",
        "Kyoto, Japan": "Japan",
        "Nakagyo Ward": "Kyoto",
        "Kamigyo Ward": "Kyoto",
        "Scotland, United Kingdom": "Scotland",
        "Middleburg, MD": "MD",
    }
    kick["origin"] = kick["origin"].replace(origin_fixes)
    # Label missing origins; fillna returns a new Series, so assign it back.
    kick["origin"] = kick["origin"].fillna("Unknown")
    return kick

In [40]:
# Keep the two-letter state code for US rows; label everything else "International".
kick["state"] = np.where(kick.origin.str.fullmatch(r'[A-Z]{2}', na=False), kick.origin, "International")

In [46]:
# Collapse two-letter state codes into a single "United States" origin.
kick["origin"] = np.where(kick.origin.str.fullmatch(r'[A-Z]{2}', na=False), "United States", kick.origin)