# Capstone Project

## Part 1: Topic Proposal

### Creating a Popular Board Game

#### Problem Statement

Can a recommender system identify which characteristics make a board game successful on Kickstarter?

#### Description of goals

The goal of this analysis is to use a recommender system to design a board game that runs a successful Kickstarter campaign.

#### Criteria for success

This project will be successful if I can, first, predict which board games become the most popular on Kickstarter based on certain criteria and, second, create a game that becomes popular on Kickstarter.

#### Audience

This project's audience will be entrepreneurs and game enthusiasts.

#### Potential datasets

1. Several years of Kickstarter data, already scraped and published by a website.
2. The most popular game mechanics, which I will likely scrape from multiple websites.

Goal:

Describe your proposed problem statement and approach, summarize your initial EDA/data collection, and perform your initial EDA in a Jupyter notebook on *your personal GitHub* with a link submitted to your instructors on Google Classroom. 

Overview:

In this section you will update us on your project, including the project you have chosen, your problem statement, an extensive outline of EDA and modeling to date, the goal of your predictive model, and the data you will use to explore that model.

Your data must be fully in hand by this point OR you must have a solid, achievable plan to do so that has been communicated to your instructors.

Requirements:

We expect a formatted and complete Jupyter notebook hosted on your personal (*not* GA) GitHub by EOD on Monday June 22, 2020, which accomplishes the following:

- Identifies which of the three proposals you outlined in your lightning talk you have chosen
- Articulates the main goal of your project (your problem statement)
- Outlines your proposed methods and models
- Defines the risks & assumptions of your data
- Revises initial goals & success criteria, as needed
- Documents your data source
- Performs & summarizes preliminary EDA of your data

In [1]:
import os
import glob
import pandas as pd
os.chdir("./datasets/kickstarter_data/")

In [2]:
## credit: https://www.freecodecamp.org/news/how-to-combine-multiple-csv-files-with-8-lines-of-code-265183e0854/
# extension = 'csv'
# all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

# #combine all files in the list
# combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
# #export to csv
# combined_csv.to_csv( "combined.csv", index=False, encoding='utf-8-sig')

In [17]:
df = pd.read_csv('combined.csv')
df.head()

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,country_displayable_name,created_at,creator,currency,currency_symbol,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
0,1,we are going Production herbal teabag of plan...,"{""id"":313,""name"":""Small Batch"",""slug"":""food/sm...",19,AU,Australia,1441269202,"{""id"":1555219532,""name"":""ehsan"",""is_registered...",AUD,$,...,production-herbal-teabag-of-plants-native-to-iran,https://www.kickstarter.com/discover/categorie...,False,False,failed,1444141184,0.691164,"{""web"":{""project"":""https://www.kickstarter.com...",18.66144,domestic
1,637,Two agents battle each other in another dimens...,"{""id"":34,""name"":""Tabletop Games"",""slug"":""games...",16233,US,the United States,1576048498,"{""id"":99575233,""name"":""David Gerrard"",""is_regi...",USD,$,...,slip-strike-0,https://www.kickstarter.com/discover/categorie...,True,False,successful,1583987400,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",16233.0,domestic
2,50,A collection of Hard Enamel pins inspired by T...,"{""id"":262,""name"":""Accessories"",""slug"":""fashion...",983,CA,Canada,1560821709,"{""id"":1855173855,""name"":""Caitlin Peters"",""slug...",CAD,$,...,tattoo-shop-flash,https://www.kickstarter.com/discover/categorie...,True,False,successful,1564165825,0.7629,"{""web"":{""project"":""https://www.kickstarter.com...",987.4137,domestic
3,8,"Low carb, no sugar sauces and marinades using ...","{""id"":313,""name"":""Small Batch"",""slug"":""food/sm...",361,US,the United States,1563139848,"{""id"":1148188586,""name"":""Ian"",""slug"":""penningt...",USD,$,...,penningtons-keto-sauces-and-marinades,https://www.kickstarter.com/discover/categorie...,False,False,failed,1569530544,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",361.0,domestic
4,6452,The everyday bag fused with Parisian chic and ...,"{""id"":28,""name"":""Product Design"",""slug"":""desig...",1385803,US,the United States,1561364892,"{""id"":1085606247,""name"":""Laflore"",""slug"":""bobo...",USD,$,...,bobobark-designed-for-women-made-for-life,https://www.kickstarter.com/discover/categorie...,True,False,successful,1568408340,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",1385803.0,domestic


In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 217433 entries, 0 to 217432
Data columns (total 38 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   backers_count             217433 non-null  int64  
 1   blurb                     217425 non-null  object 
 2   category                  217433 non-null  object 
 3   converted_pledged_amount  217433 non-null  int64  
 4   country                   217433 non-null  object 
 5   country_displayable_name  217433 non-null  object 
 6   created_at                217433 non-null  int64  
 7   creator                   217433 non-null  object 
 8   currency                  217433 non-null  object 
 9   currency_symbol           217433 non-null  object 
 10  currency_trailing_code    217433 non-null  bool   
 11  current_currency          217433 non-null  object 
 12  deadline                  217433 non-null  int64  
 13  disable_communication     217433 non-null  b

In [19]:
df.category.str.split(',')

0         [{"id":313, "name":"Small Batch", "slug":"food...
1         [{"id":34, "name":"Tabletop Games", "slug":"ga...
2         [{"id":262, "name":"Accessories", "slug":"fash...
3         [{"id":313, "name":"Small Batch", "slug":"food...
4         [{"id":28, "name":"Product Design", "slug":"de...
                                ...                        
217428    [{"id":13, "name":"Journalism", "slug":"journa...
217429    [{"id":277, "name":"Nature", "slug":"photograp...
217430    [{"id":52, "name":"Hardware", "slug":"technolo...
217431    [{"id":307, "name":"Drinks", "slug":"food/drin...
217432    [{"id":258, "name":"Architecture", "slug":"des...
Name: category, Length: 217433, dtype: object

In [29]:
df.category.str.split(':')

0         [{"id", 313,"name", "Small Batch","slug", "foo...
1         [{"id", 34,"name", "Tabletop Games","slug", "g...
2         [{"id", 262,"name", "Accessories","slug", "fas...
3         [{"id", 313,"name", "Small Batch","slug", "foo...
4         [{"id", 28,"name", "Product Design","slug", "d...
                                ...                        
217428    [{"id", 13,"name", "Journalism","slug", "journ...
217429    [{"id", 277,"name", "Nature","slug", "photogra...
217430    [{"id", 52,"name", "Hardware","slug", "technol...
217431    [{"id", 307,"name", "Drinks","slug", "food/dri...
217432    [{"id", 258,"name", "Architecture","slug", "de...
Name: category, Length: 217433, dtype: object
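Splitting on `,` or `:` cuts the string into fragments but does not pair keys with values. Each `category` entry looks like a JSON object, so one option is to decode it with the standard `json` module and pull out the fields of interest. The following is a minimal sketch, assuming every row is a JSON object with `name` and `slug` keys. The sample rows, including the full slug values, are invented stand-ins for the truncated output above, and `parse_category`, `category_name`, and `category_slug` are hypothetical names.

```python
import json
import pandas as pd

# Illustrative rows only: the slugs are invented stand-ins for the truncated values above.
sample = pd.DataFrame({
    "category": [
        '{"id":313,"name":"Small Batch","slug":"food/small-batch"}',
        '{"id":34,"name":"Tabletop Games","slug":"games/tabletop-games"}',
    ]
})

def parse_category(raw):
    """Hypothetical helper: decode one JSON string, returning {} if it cannot be parsed."""
    try:
        return json.loads(raw)
    except (TypeError, ValueError):
        return {}

# Decode each row once, then lift individual keys into their own columns.
parsed = sample["category"].apply(parse_category)
sample["category_name"] = parsed.apply(lambda d: d.get("name"))
sample["category_slug"] = parsed.apply(lambda d: d.get("slug"))

# Keep only the rows relevant to a board game analysis.
tabletop = sample[sample["category_name"] == "Tabletop Games"]
```

Applied to the full frame, the same pattern would allow filtering down to tabletop projects before any modeling, provided the real rows follow the structure assumed here.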

In [20]:
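# numeric_only=True keeps this working on pandas 2.0+, which no longer drops non-numeric columns silently
df.corr(numeric_only=True)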

Unnamed: 0,backers_count,converted_pledged_amount,created_at,currency_trailing_code,deadline,disable_communication,fx_rate,goal,id,is_starrable,launched_at,pledged,spotlight,staff_pick,state_changed_at,static_usd_rate,usd_pledged
backers_count,1.0,0.796269,0.04416,0.011253,0.048626,,-0.001494,0.011092,-0.001067,0.00591,0.048571,0.202854,0.115542,0.147338,0.048835,-0.007335,0.796088
converted_pledged_amount,0.796269,1.0,0.038041,0.013555,0.041907,,0.000234,0.010006,-0.001896,0.003982,0.041651,0.203503,0.08971,0.124877,0.042076,-0.006712,0.999935
created_at,0.04416,0.038041,1.0,-0.166883,0.986535,,-0.090553,0.002338,-0.001614,0.22517,0.98669,0.01811,0.028742,-0.030163,0.986496,-0.123077,0.038013
currency_trailing_code,0.011253,0.013555,-0.166883,1.0,-0.163362,,-0.391075,-0.002633,0.0053,-0.044349,-0.163263,-0.012693,0.023707,0.009232,-0.16326,-0.543039,0.013484
deadline,0.048626,0.041907,0.986535,-0.163362,1.0,,-0.089952,0.003031,-0.001116,0.229351,0.999896,0.019075,0.030754,-0.02347,0.999956,-0.124356,0.041884
disable_communication,,,,,,,,,,,,,,,,,
fx_rate,-0.001494,0.000234,-0.090553,-0.391075,-0.089952,,1.0,-0.038095,-0.001354,-0.004373,-0.089432,-0.053011,0.019383,0.000156,-0.089965,0.902261,-0.000738
goal,0.011092,0.010006,0.002338,-0.002633,0.003031,,-0.038095,1.0,0.001349,0.000132,0.00259,0.068371,-0.035551,-0.004182,0.002864,-0.036774,0.010051
id,-0.001067,-0.001896,-0.001614,0.0053,-0.001116,,-0.001354,0.001349,1.0,-0.005013,-0.001117,-0.001441,0.001608,0.003511,-0.001082,-0.001952,-0.001909
is_starrable,0.00591,0.003982,0.22517,-0.044349,0.229351,,-0.004373,0.000132,-0.005013,1.0,0.228254,0.001531,-0.181931,-0.021474,0.22294,-0.041101,0.003583
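The near-perfect correlations among `created_at`, `deadline`, `launched_at`, and `state_changed_at` suggest that these columns all track calendar time rather than carrying independent signal. Their magnitudes look like Unix timestamps in seconds, although that is an assumption and not something the dataset confirms here. Under that assumption, a campaign length in days may be a more useful feature than the raw integers. This is a sketch with made-up values, and `campaign_days` is a hypothetical column name.

```python
import pandas as pd

# Made-up epoch values (seconds): two campaigns that each run for 30 days.
sample = pd.DataFrame({
    "launched_at": [1577836800, 1580515200],  # 2020-01-01 and 2020-02-01 UTC
    "deadline": [1580428800, 1583107200],     # 30 days after each launch
})

# Assumption: the integers are seconds since the Unix epoch.
for col in ["launched_at", "deadline"]:
    sample[col] = pd.to_datetime(sample[col], unit="s")

# Hypothetical feature: how long the campaign was open, in whole days.
sample["campaign_days"] = (sample["deadline"] - sample["launched_at"]).dt.days
```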


In [21]:
df.state.value_counts()

successful    127093
failed         76260
canceled        9029
live            5051
Name: state, dtype: int64
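Because `live` campaigns have no final outcome yet, one possible way to turn `state` into a modeling target is to keep only finished campaigns and encode success as 1. This is a sketch under that assumption. Whether `canceled` campaigns should count as failures is a judgment call that is left open here, and `is_successful` is a hypothetical column name.

```python
import pandas as pd

# Small stand-in for the `state` column; the real frame has 217,433 rows.
sample = pd.DataFrame(
    {"state": ["successful", "failed", "live", "canceled", "successful"]}
)

# Assumption: only campaigns with a final funded/unfunded outcome are used.
finished = sample[sample["state"].isin(["successful", "failed"])].copy()

# Hypothetical target column: 1 for a funded campaign, 0 otherwise.
finished["is_successful"] = (finished["state"] == "successful").astype(int)

# Share of finished campaigns that succeeded, i.e. the majority-class baseline.
baseline = finished["is_successful"].mean()
```

With the counts shown above, the same calculation gives 127093 / (127093 + 76260), or roughly 62.5%, which is the accuracy a model would need to beat to be better than always predicting success.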

In [22]:
df.state = df.state.astype('category')

In [23]:
df.state

0             failed
1         successful
2         successful
3             failed
4         successful
             ...    
217428    successful
217429        failed
217430        failed
217431        failed
217432        failed
Name: state, Length: 217433, dtype: category
Categories (4, object): [canceled, failed, live, successful]

In [24]:
df.dtypes

backers_count                  int64
blurb                         object
category                      object
converted_pledged_amount       int64
country                       object
country_displayable_name      object
created_at                     int64
creator                       object
currency                      object
currency_symbol               object
currency_trailing_code          bool
current_currency              object
deadline                       int64
disable_communication           bool
friends                       object
fx_rate                      float64
goal                         float64
id                             int64
is_backing                    object
is_starrable                    bool
is_starred                    object
launched_at                    int64
location                      object
name                          object
permissions                   object
photo                         object
pledged                      float64
p

In [28]:
# A Series has no .split method of its own; string methods must go through the .str accessor
df.category = df.category.str.replace('"', '', regex=False).str.split('\n')