**Plan**

Questions:
1. Which model best predicts whether a Kickstarter project will be funded?
2. Which "live" projects are likely to be funded?

Steps:
1. Acquire and Prep
2. Explore
    - Which features are the strongest predictors of funding? (Feature selection)
    - Among the projects that received funding, which were overfunded and which were underfunded?
3. Model
4. Evaluate

### Packages

In [1]:
import wrangle
import pandas as pd

## Acquire

Read the CSV file from the local repo using the `get_kickstarter` function from `wrangle.py`.

In [54]:
def get_kickstarter():
    df = pd.read_csv("/Users/cris/codeup-data-science/kickstarter_project/kickstarter.csv")
    return df

In [75]:
org = pd.read_csv("/Users/cris/codeup-data-science/kickstarter_project/countryContinent.csv",encoding="latin-1")

In [76]:
states = pd.read_csv("/Users/cris/codeup-data-science/kickstarter_project/us_states.csv",encoding="latin-1")

In [77]:
org.head()

Unnamed: 0,country,code_2,code_3,country_code,iso_3166_2,continent,sub_region,region_code,sub_region_code
0,Afghanistan,AF,AFG,4,ISO 3166-2:AF,Asia,Southern Asia,142.0,34.0
1,Åland Islands,AX,ALA,248,ISO 3166-2:AX,Europe,Northern Europe,150.0,154.0
2,Albania,AL,ALB,8,ISO 3166-2:AL,Europe,Southern Europe,150.0,39.0
3,Algeria,DZ,DZA,12,ISO 3166-2:DZ,Africa,Northern Africa,2.0,15.0
4,American Samoa,AS,ASM,16,ISO 3166-2:AS,Oceania,Polynesia,9.0,61.0


In [78]:
org = org.rename(columns={"country":"origin"})

In [79]:
org = org[["origin","continent","sub_region"]]

In [80]:
org

Unnamed: 0,origin,continent,sub_region
0,Afghanistan,Asia,Southern Asia
1,Åland Islands,Europe,Northern Europe
2,Albania,Europe,Southern Europe
3,Algeria,Africa,Northern Africa
4,American Samoa,Oceania,Polynesia
5,Andorra,Europe,Southern Europe
6,Angola,Africa,Middle Africa
7,Anguilla,Americas,Caribbean
8,Antarctica,,
9,Antigua and Barbuda,Americas,Caribbean


In [83]:
states = states[["State","Code"]]
states.head()

Unnamed: 0,State,Code
0,Alabama,AL
1,Alaska,AK
2,Arizona,AZ
3,Arkansas,AR
4,California,CA


In [55]:
kick = get_kickstarter()

**Data Cleaning**

- Kept these columns: "name", "category", "subcategory", "location", "status", "goal", "funded percentage", "backers", "duration"
- Changed NaN locations to "Unknown" (1,322 rows, too many to drop)
- Split location into city and state/country
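The cleaning steps above can be sketched as one function. This is a minimal sketch, not the contents of `wrangle.py`: the function name `clean_kickstarter`, the tiny `sample` frame, and the shortened column list are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the Kickstarter data; the real frame comes from get_kickstarter().
sample = pd.DataFrame({
    "name": ["Project A", "Project B", "Project C"],
    "location": ["Columbia, MO", None, "Taipei, Taiwan"],
    "status": ["successful", "failed", "live"],
    "url": ["u1", "u2", "u3"],
})

# Shortened column list for the sample; the notebook keeps nine columns.
KEEP = ["name", "location", "status"]

def clean_kickstarter(df, keep=KEEP):
    """Keep selected columns, fill missing locations, split city from state/country."""
    out = df[keep].copy()
    out["location"] = out["location"].fillna("Unknown")
    # n=1 splits at the first ", " only; rows without a comma get a missing origin.
    parts = out["location"].str.split(", ", n=1, expand=True)
    # Guarantee both columns exist even if no row contains a comma.
    parts = parts.reindex(columns=[0, 1])
    out["city"] = parts[0]
    out["origin"] = parts[1]
    return out

cleaned = clean_kickstarter(sample)
print(cleaned)
```

Rows whose location is "Unknown" have no comma, so their `origin` stays missing in this sketch; whether to fill it with "Unknown" as well is a separate decision.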

In [15]:
kick[kick.location.isnull()]

Unnamed: 0,project id,name,url,category,subcategory,location,status,goal,pledged,funded percentage,backers,funded date,levels,reward levels,updates,comments,duration
11,727286,Offline Wikipedia iPhone app,http://www.kickstarter.com/projects/dphiffer/o...,Technology,Open Software,,successful,99.0,145.0,1.464646,25,"Tue, 14 Jul 2009 06:59:59 -0000",1,$1,5,19,79.64
14,893085,Esperanza Farm: A Novel,http://www.kickstarter.com/projects/JesusMaria...,Publishing,Fiction,,failed,6500.0,765.0,0.117692,20,"Fri, 16 Jul 2010 03:59:00 -0000",7,"$5,$10,$25,$50,$75,$150,$300",10,0,60.34
49,2442649,Dream with Me--A documentary about one year in...,http://www.kickstarter.com/projects/1800556280...,Film & Video,Documentary,,successful,5000.0,5615.0,1.123000,90,"Fri, 16 Oct 2009 01:08:00 -0000",8,"$10,$25,$50,$100,$500,$1,000,$2,500,$5,000",6,7,30.89
98,4719237,Indie Music Website (iTunes for Indie Musicians),http://www.kickstarter.com/projects/binarykoll...,Music,Indie Rock,,failed,10000.0,10.0,0.001000,1,"Thu, 01 Oct 2009 06:19:00 -0000",3,"$1,$20,$100",0,0,29.98
100,4732285,Insiders/Out: Exploring Outsider Art in America,http://www.kickstarter.com/projects/insidersou...,Art,Art,,successful,2000.0,2000.0,0.999830,30,"Thu, 08 Jul 2010 21:39:00 -0000",13,"$1,$5,$10,$15,$20,$25,$35,$50,$75,$100,$150,$2...",4,1,42.97
104,4948516,STRANGE POSITIONING SYSTEMS (SPS) - A global l...,http://www.kickstarter.com/projects/sps/strang...,Art,Performance Art,,failed,8500.0,695.0,0.081765,15,"Fri, 20 Aug 2010 21:37:00 -0000",10,"$10,$25,$50,$75,$100,$250,$500,$600,$1,000,$1,500",3,1,42.19
120,5785733,Faith - a film by Eli Daughdrill,http://www.kickstarter.com/projects/771183968/...,Film & Video,Film &amp; Video,,failed,50000.0,7890.0,0.157800,25,"Fri, 16 Apr 2010 05:48:00 -0000",8,"$10,$25,$50,$100,$250,$500,$1,000,$5,000",8,2,74.39
151,7808555,Fabric Fiction: A collaborative literary exper...,http://www.kickstarter.com/projects/1930061782...,Publishing,Fiction,,failed,3000.0,582.0,0.194000,13,"Fri, 04 Jun 2010 16:30:00 -0000",10,"$1,$5,$10,$12,$20,$25,$40,$50,$75,$100",7,0,89.94
176,9094253,"The People, Places &amp; Patterns Project",http://www.kickstarter.com/projects/pwc/the-pe...,Photography,Photography,,failed,10000.0,3078.0,0.307752,43,"Tue, 16 Feb 2010 19:11:00 -0000",9,"$30,$60,$80,$100,$150,$250,$500,$1,000,$2,000",11,6,61.67
186,9316432,Produce JigGsaw's never before heard &quot;The...,http://www.kickstarter.com/projects/jiggsaw/pr...,Music,Rock,,failed,700.0,25.0,0.035714,1,"Mon, 07 Jun 2010 04:00:00 -0000",6,"$15,$25,$50,$100,$500,$1,000",0,0,89.78


In [16]:
kick.isnull().sum()

project id              0
name                    0
url                     0
category                0
subcategory             0
location             1322
status                  0
goal                    0
pledged                12
funded percentage       0
backers                 0
funded date             0
levels                  0
reward levels          59
updates                 0
comments                0
duration                0
dtype: int64

In [27]:
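# Split location at the first ", " into city and state/country (origin)
kick[["city","origin"]] = kick["location"].str.split(', ', n=1, expand=True)
# Earlier draft, kept for reference:
# df[["city","state-country"]] = df["location"].str.split(',', n=1, expand=True)
# df = df.drop(["location"],axis=1)
# return df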

In [52]:
list(kick.origin.unique())

['MO',
 'NJ',
 'CA',
 'MI',
 'OR',
 'TN',
 'IL',
 nan,
 'NY',
 'DC',
 'NE',
 'ID',
 'FL',
 'TX',
 'CO',
 'ME',
 'Taiwan',
 'IN',
 'OH',
 'Norway',
 'MA',
 'MN',
 'PA',
 'NC',
 'WV',
 'CT',
 'Chile',
 'MD',
 'HI',
 'VA',
 'WA',
 'AZ',
 'OK',
 'NV',
 'Haiti',
 'GA',
 'AL',
 'UT',
 'Canada',
 'LA',
 'SC',
 'Ecuador',
 'WI',
 'Jamaica',
 'Argentina',
 'Hong Kong',
 'Germany',
 'NM',
 'Guatemala',
 'NH',
 'IA',
 'WY',
 'Australia',
 'RI',
 'Sweden',
 'France',
 'DE',
 'South Africa',
 'AK',
 'Nepal',
 'MT',
 'KY',
 'VT',
 'Kenya',
 'Bosnia and Herzegovina',
 'Iceland',
 'Mexico',
 'KS',
 'Hungary',
 'Indonesia',
 'China',
 'SD',
 'Cuba',
 'Peru',
 'Italy',
 'Netherlands',
 'Singapore',
 'Ethiopia',
 'New Zealand',
 'United Kingdom',
 'Austria',
 'Turkey',
 'AR',
 'Mt',
 'Congo',
 'Colombia',
 'India',
 'Mongolia',
 'MS',
 'Israel',
 'Dominica',
 'Spain',
 'Finland',
 'Czech Republic',
 'Japan',
 'Virgin Islands, U.S.',
 'Lebanon',
 'Armenia',
 'Portugal',
 'Qatar',
 'Morocco',
 'Martinique'

In [28]:
kick.groupby("origin").count()

origin,project id,name,url,category,subcategory,location,status,goal,pledged,funded percentage,backers,funded date,levels,reward levels,updates,comments,duration,extra,city
AK,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110,110
AL,166,166,166,166,166,166,166,166,166,166,166,166,166,166,166,166,166,166,166
AR,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106
AZ,649,649,649,649,649,649,649,649,649,649,649,649,649,649,649,649,649,649,649
Afghanistan,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16
Argent,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
Argentina,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29,29
Armenia,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
Australia,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57,57
Austria,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19


Unusual Origins:  
- None
- Argent (Argentina?)
- Mt >> Montana
- nan
- Kyoto, Japan (appears as origin; drop Kyoto)
- Dominican Re (truncated form of Dominican Republic)

**TASKS**
- Change Mt to Montana (36 entries)
- Change Argent to Argentina
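One way to carry out these tasks is a lookup of corrections passed to `Series.replace`. The sketch below runs on a small made-up series rather than `kick.origin`, and the `fixes` mapping is an assumption: it maps `Mt` to the code `MT` because the other US entries are two-letter codes, and it adds the Kyoto and Dominican Republic cases from the list of unusual origins.

```python
import pandas as pd

# Hypothetical sample of origin values, including the oddities noted above.
origins = pd.Series(
    ["MT", "Mt", "Argent", "Argentina", "Kyoto, Japan", None],
    name="origin",
)

# Assumed correction map; extend it as more unusual values turn up.
fixes = {
    "Mt": "MT",                      # Montana, written with the wrong case
    "Argent": "Argentina",           # truncated country name
    "Kyoto, Japan": "Japan",         # city leaked into the origin field
    "Dominican Re": "Dominican Republic",
}

# replace() swaps exact matches only; unmatched and missing values pass through.
fixed = origins.replace(fixes)
print(fixed.value_counts(dropna=False))
```

Keeping the corrections in a single dictionary makes it easy to review them in one place before applying them to the full column.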

In [84]:
org.head()

Unnamed: 0,origin,continent,sub_region
0,Afghanistan,Asia,Southern Asia
1,Åland Islands,Europe,Northern Europe
2,Albania,Europe,Southern Europe
3,Algeria,Africa,Northern Africa
4,American Samoa,Oceania,Polynesia


In [86]:
kick.join(org, on="origin")

KeyError: 'origin'
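The `KeyError` indicates that `kick` has no `origin` column when this cell runs. One plausible cause, judging by the cell numbers, is that `kick` was reloaded in cell 55 after the split in cell 27, which discards the `city` and `origin` columns. Separately, `DataFrame.join(other, on=...)` matches the caller's column against the index of `other`, so `org` would need `origin` as its index; a column-to-column `merge` is more direct. The sketch below uses small made-up frames (`kick_demo`, `org_demo`) to show the idea.

```python
import pandas as pd

# Hypothetical stand-ins for kick and org.
kick_demo = pd.DataFrame({
    "name": ["Project A", "Project B", "Project C"],
    "location": ["Oslo, Norway", "Columbia, MO", None],
})
org_demo = pd.DataFrame({
    "origin": ["Norway", "Albania"],
    "continent": ["Europe", "Europe"],
    "sub_region": ["Northern Europe", "Southern Europe"],
})

# Recreate the origin column before merging; it does not survive a reload.
kick_demo[["city", "origin"]] = kick_demo["location"].str.split(", ", n=1, expand=True)

# Left merge keeps every project; origins absent from org_demo get NaN.
merged = kick_demo.merge(org_demo, on="origin", how="left")
print(merged[["name", "origin", "continent", "sub_region"]])
```

US state codes such as `MO` find no match in the country table, so they would need to be mapped separately (for example through the `states` table) before or after this merge.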