# Capstone Project

Data came from:
https://webrobots.io/kickstarter-datasets/

https://www.kickstarter.com/help/handbook/funding

Kickstarter provides what is called a creator's handbook for funding. The original objective of this analysis was to determine what leads to successful boardgames. From there the idea was to create a boardgame based on my findings to see if I could create a successful boardgame based on the findings. However, an important first phase of this analysis was to see if I could predict whether or not a project would be successful. So that is what I did here.

## Import Libraries

In [1]:
import os
import glob
import pandas as pd
# os.chdir("./datasets/kickstarter_data/") # uncomment to run initially

import re

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, LassoCV, RidgeCV
from sklearn.preprocessing import PolynomialFeatures, PowerTransformer
from sklearn.model_selection import train_test_split, cross_val_score, cross_val_predict
from sklearn.metrics import r2_score

%matplotlib inline

## Gather Data

## Combine Data

In [3]:
## uncomment to run initially
## credit: https://www.freecodecamp.org/news/how-to-combine-multiple-csv-files-with-8-lines-of-code-265183e0854/
# extension = 'csv'
# all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

# #combine all files in the list
# combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
# #export to csv
# combined_csv.to_csv( "combined.csv", index=False, encoding='utf-8-sig')

## Read in Data

In [4]:
df = pd.read_csv('./datasets/kickstarter_data/combined.csv')

## Exploratory Data Analysis (EDA)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 217433 entries, 0 to 217432
Data columns (total 38 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   backers_count             217433 non-null  int64  
 1   blurb                     217425 non-null  object 
 2   category                  217433 non-null  object 
 3   converted_pledged_amount  217433 non-null  int64  
 4   country                   217433 non-null  object 
 5   country_displayable_name  217433 non-null  object 
 6   created_at                217433 non-null  int64  
 7   creator                   217433 non-null  object 
 8   currency                  217433 non-null  object 
 9   currency_symbol           217433 non-null  object 
 10  currency_trailing_code    217433 non-null  bool   
 11  current_currency          217433 non-null  object 
 12  deadline                  217433 non-null  int64  
 13  disable_communication     217433 non-null  b

In [6]:
missing_values= df.isnull().sum()
missing_values/len(df)
missing_values.sort_values(ascending=True)

backers_count                    0
source_url                       0
slug                             0
profile                          0
pledged                          0
photo                            0
state                            0
name                             0
state_changed_at                 0
launched_at                      0
static_usd_rate                  0
is_starrable                     0
usd_pledged                      0
id                               0
goal                             0
fx_rate                          0
urls                             0
disable_communication            0
deadline                         0
current_currency                 0
currency_trailing_code           0
currency_symbol                  0
currency                         0
creator                          0
created_at                       0
country_displayable_name         0
country                          0
converted_pledged_amount         0
category            

In [7]:
# drop these features due to having a significant number of missing values
df.drop([
    'friends',
    'is_backing',
    'is_starred',
    'permissions'
], axis=1, inplace=True)

In [8]:
missing_values= df.isnull().sum()
missing_values/len(df)

backers_count               0.000000
blurb                       0.000037
category                    0.000000
converted_pledged_amount    0.000000
country                     0.000000
country_displayable_name    0.000000
created_at                  0.000000
creator                     0.000000
currency                    0.000000
currency_symbol             0.000000
currency_trailing_code      0.000000
current_currency            0.000000
deadline                    0.000000
disable_communication       0.000000
fx_rate                     0.000000
goal                        0.000000
id                          0.000000
is_starrable                0.000000
launched_at                 0.000000
location                    0.000989
name                        0.000000
photo                       0.000000
pledged                     0.000000
profile                     0.000000
slug                        0.000000
source_url                  0.000000
spotlight                   0.000000
s

In [9]:
# eliminate remaining missing values
df.dropna(inplace=True)

In [10]:
missing_values= df.isnull().sum()
missing_values/len(df)

backers_count               0.0
blurb                       0.0
category                    0.0
converted_pledged_amount    0.0
country                     0.0
country_displayable_name    0.0
created_at                  0.0
creator                     0.0
currency                    0.0
currency_symbol             0.0
currency_trailing_code      0.0
current_currency            0.0
deadline                    0.0
disable_communication       0.0
fx_rate                     0.0
goal                        0.0
id                          0.0
is_starrable                0.0
launched_at                 0.0
location                    0.0
name                        0.0
photo                       0.0
pledged                     0.0
profile                     0.0
slug                        0.0
source_url                  0.0
spotlight                   0.0
staff_pick                  0.0
state                       0.0
state_changed_at            0.0
static_usd_rate             0.0
urls    

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 217006 entries, 0 to 217432
Data columns (total 34 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   backers_count             217006 non-null  int64  
 1   blurb                     217006 non-null  object 
 2   category                  217006 non-null  object 
 3   converted_pledged_amount  217006 non-null  int64  
 4   country                   217006 non-null  object 
 5   country_displayable_name  217006 non-null  object 
 6   created_at                217006 non-null  int64  
 7   creator                   217006 non-null  object 
 8   currency                  217006 non-null  object 
 9   currency_symbol           217006 non-null  object 
 10  currency_trailing_code    217006 non-null  bool   
 11  current_currency          217006 non-null  object 
 12  deadline                  217006 non-null  int64  
 13  disable_communication     217006 non-null  b

In [12]:
df.created_at

0         1441269202
1         1576048498
2         1560821709
3         1563139848
4         1561364892
             ...    
217428    1315249915
217429    1422761841
217430    1434634988
217431    1521649603
217432    1433781463
Name: created_at, Length: 217006, dtype: int64

In [13]:
# df['created_at'] = pd.to_datetime(df['created_at'])

In [14]:
df.head()

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,country_displayable_name,created_at,creator,currency,currency_symbol,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
0,1,we are going Production herbal teabag of plan...,"{""id"":313,""name"":""Small Batch"",""slug"":""food/sm...",19,AU,Australia,1441269202,"{""id"":1555219532,""name"":""ehsan"",""is_registered...",AUD,$,...,production-herbal-teabag-of-plants-native-to-iran,https://www.kickstarter.com/discover/categorie...,False,False,failed,1444141184,0.691164,"{""web"":{""project"":""https://www.kickstarter.com...",18.66144,domestic
1,637,Two agents battle each other in another dimens...,"{""id"":34,""name"":""Tabletop Games"",""slug"":""games...",16233,US,the United States,1576048498,"{""id"":99575233,""name"":""David Gerrard"",""is_regi...",USD,$,...,slip-strike-0,https://www.kickstarter.com/discover/categorie...,True,False,successful,1583987400,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",16233.0,domestic
2,50,A collection of Hard Enamel pins inspired by T...,"{""id"":262,""name"":""Accessories"",""slug"":""fashion...",983,CA,Canada,1560821709,"{""id"":1855173855,""name"":""Caitlin Peters"",""slug...",CAD,$,...,tattoo-shop-flash,https://www.kickstarter.com/discover/categorie...,True,False,successful,1564165825,0.7629,"{""web"":{""project"":""https://www.kickstarter.com...",987.4137,domestic
3,8,"Low carb, no sugar sauces and marinades using ...","{""id"":313,""name"":""Small Batch"",""slug"":""food/sm...",361,US,the United States,1563139848,"{""id"":1148188586,""name"":""Ian"",""slug"":""penningt...",USD,$,...,penningtons-keto-sauces-and-marinades,https://www.kickstarter.com/discover/categorie...,False,False,failed,1569530544,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",361.0,domestic
4,6452,The everyday bag fused with Parisian chic and ...,"{""id"":28,""name"":""Product Design"",""slug"":""desig...",1385803,US,the United States,1561364892,"{""id"":1085606247,""name"":""Laflore"",""slug"":""bobo...",USD,$,...,bobobark-designed-for-women-made-for-life,https://www.kickstarter.com/discover/categorie...,True,False,successful,1568408340,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",1385803.0,domestic


In [15]:
# df['created_at'] = pd.to_datetime(df['created_at']).dt.date

In [16]:
df.head()

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,country_displayable_name,created_at,creator,currency,currency_symbol,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
0,1,we are going Production herbal teabag of plan...,"{""id"":313,""name"":""Small Batch"",""slug"":""food/sm...",19,AU,Australia,1441269202,"{""id"":1555219532,""name"":""ehsan"",""is_registered...",AUD,$,...,production-herbal-teabag-of-plants-native-to-iran,https://www.kickstarter.com/discover/categorie...,False,False,failed,1444141184,0.691164,"{""web"":{""project"":""https://www.kickstarter.com...",18.66144,domestic
1,637,Two agents battle each other in another dimens...,"{""id"":34,""name"":""Tabletop Games"",""slug"":""games...",16233,US,the United States,1576048498,"{""id"":99575233,""name"":""David Gerrard"",""is_regi...",USD,$,...,slip-strike-0,https://www.kickstarter.com/discover/categorie...,True,False,successful,1583987400,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",16233.0,domestic
2,50,A collection of Hard Enamel pins inspired by T...,"{""id"":262,""name"":""Accessories"",""slug"":""fashion...",983,CA,Canada,1560821709,"{""id"":1855173855,""name"":""Caitlin Peters"",""slug...",CAD,$,...,tattoo-shop-flash,https://www.kickstarter.com/discover/categorie...,True,False,successful,1564165825,0.7629,"{""web"":{""project"":""https://www.kickstarter.com...",987.4137,domestic
3,8,"Low carb, no sugar sauces and marinades using ...","{""id"":313,""name"":""Small Batch"",""slug"":""food/sm...",361,US,the United States,1563139848,"{""id"":1148188586,""name"":""Ian"",""slug"":""penningt...",USD,$,...,penningtons-keto-sauces-and-marinades,https://www.kickstarter.com/discover/categorie...,False,False,failed,1569530544,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",361.0,domestic
4,6452,The everyday bag fused with Parisian chic and ...,"{""id"":28,""name"":""Product Design"",""slug"":""desig...",1385803,US,the United States,1561364892,"{""id"":1085606247,""name"":""Laflore"",""slug"":""bobo...",USD,$,...,bobobark-designed-for-women-made-for-life,https://www.kickstarter.com/discover/categorie...,True,False,successful,1568408340,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",1385803.0,domestic


In [17]:
# df.set_index('created_at', inplace=True)

In [18]:
df.head()

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,country_displayable_name,created_at,creator,currency,currency_symbol,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
0,1,we are going Production herbal teabag of plan...,"{""id"":313,""name"":""Small Batch"",""slug"":""food/sm...",19,AU,Australia,1441269202,"{""id"":1555219532,""name"":""ehsan"",""is_registered...",AUD,$,...,production-herbal-teabag-of-plants-native-to-iran,https://www.kickstarter.com/discover/categorie...,False,False,failed,1444141184,0.691164,"{""web"":{""project"":""https://www.kickstarter.com...",18.66144,domestic
1,637,Two agents battle each other in another dimens...,"{""id"":34,""name"":""Tabletop Games"",""slug"":""games...",16233,US,the United States,1576048498,"{""id"":99575233,""name"":""David Gerrard"",""is_regi...",USD,$,...,slip-strike-0,https://www.kickstarter.com/discover/categorie...,True,False,successful,1583987400,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",16233.0,domestic
2,50,A collection of Hard Enamel pins inspired by T...,"{""id"":262,""name"":""Accessories"",""slug"":""fashion...",983,CA,Canada,1560821709,"{""id"":1855173855,""name"":""Caitlin Peters"",""slug...",CAD,$,...,tattoo-shop-flash,https://www.kickstarter.com/discover/categorie...,True,False,successful,1564165825,0.7629,"{""web"":{""project"":""https://www.kickstarter.com...",987.4137,domestic
3,8,"Low carb, no sugar sauces and marinades using ...","{""id"":313,""name"":""Small Batch"",""slug"":""food/sm...",361,US,the United States,1563139848,"{""id"":1148188586,""name"":""Ian"",""slug"":""penningt...",USD,$,...,penningtons-keto-sauces-and-marinades,https://www.kickstarter.com/discover/categorie...,False,False,failed,1569530544,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",361.0,domestic
4,6452,The everyday bag fused with Parisian chic and ...,"{""id"":28,""name"":""Product Design"",""slug"":""desig...",1385803,US,the United States,1561364892,"{""id"":1085606247,""name"":""Laflore"",""slug"":""bobo...",USD,$,...,bobobark-designed-for-women-made-for-life,https://www.kickstarter.com/discover/categorie...,True,False,successful,1568408340,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",1385803.0,domestic


In [19]:
# df.sort_index(inplace=True)

In [20]:
df.head()

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country,country_displayable_name,created_at,creator,currency,currency_symbol,...,slug,source_url,spotlight,staff_pick,state,state_changed_at,static_usd_rate,urls,usd_pledged,usd_type
0,1,we are going Production herbal teabag of plan...,"{""id"":313,""name"":""Small Batch"",""slug"":""food/sm...",19,AU,Australia,1441269202,"{""id"":1555219532,""name"":""ehsan"",""is_registered...",AUD,$,...,production-herbal-teabag-of-plants-native-to-iran,https://www.kickstarter.com/discover/categorie...,False,False,failed,1444141184,0.691164,"{""web"":{""project"":""https://www.kickstarter.com...",18.66144,domestic
1,637,Two agents battle each other in another dimens...,"{""id"":34,""name"":""Tabletop Games"",""slug"":""games...",16233,US,the United States,1576048498,"{""id"":99575233,""name"":""David Gerrard"",""is_regi...",USD,$,...,slip-strike-0,https://www.kickstarter.com/discover/categorie...,True,False,successful,1583987400,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",16233.0,domestic
2,50,A collection of Hard Enamel pins inspired by T...,"{""id"":262,""name"":""Accessories"",""slug"":""fashion...",983,CA,Canada,1560821709,"{""id"":1855173855,""name"":""Caitlin Peters"",""slug...",CAD,$,...,tattoo-shop-flash,https://www.kickstarter.com/discover/categorie...,True,False,successful,1564165825,0.7629,"{""web"":{""project"":""https://www.kickstarter.com...",987.4137,domestic
3,8,"Low carb, no sugar sauces and marinades using ...","{""id"":313,""name"":""Small Batch"",""slug"":""food/sm...",361,US,the United States,1563139848,"{""id"":1148188586,""name"":""Ian"",""slug"":""penningt...",USD,$,...,penningtons-keto-sauces-and-marinades,https://www.kickstarter.com/discover/categorie...,False,False,failed,1569530544,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",361.0,domestic
4,6452,The everyday bag fused with Parisian chic and ...,"{""id"":28,""name"":""Product Design"",""slug"":""desig...",1385803,US,the United States,1561364892,"{""id"":1085606247,""name"":""Laflore"",""slug"":""bobo...",USD,$,...,bobobark-designed-for-women-made-for-life,https://www.kickstarter.com/discover/categorie...,True,False,successful,1568408340,1.0,"{""web"":{""project"":""https://www.kickstarter.com...",1385803.0,domestic


In [21]:
df.blurb.value_counts()

ALL-NEW SEXY BADGIRL characters from comic book INDIE legend Everette Hartsoe. 100% artwork in book                                       35
A beautiful natural Fine art nude book exemplifying the female form presented by female producer Nina Vain.                               28
Hard Enamel Pins                                                                                                                          22
The Decentralized Dance Party was founded on the belief that Partying is an art form that has the power to change the world.              17
Award Winning Footwear Designs | Crafted Using Italian Leathers with Bold and Comfortable Features | London Navy Men's Luxury Footwear    15
                                                                                                                                          ..
Brokntalk is a social game in which you listen to a line, repeat it as good as you can and get scored for how well you repeat it!          1
"Cities Witho

In [22]:
df.category.value_counts()

{"id":28,"name":"Product Design","slug":"design/product design","position":5,"parent_id":7,"parent_name":"Design","color":2577151,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/design/product%20design"}}}       4411
{"id":34,"name":"Tabletop Games","slug":"games/tabletop games","position":6,"parent_id":12,"parent_name":"Games","color":51627,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/games/tabletop%20games"}}}           4050
{"id":262,"name":"Accessories","slug":"fashion/accessories","position":1,"parent_id":9,"parent_name":"Fashion","color":16752598,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/fashion/accessories"}}}             3690
{"id":250,"name":"Comic Books","slug":"comics/comic books","position":2,"parent_id":3,"parent_name":"Comics","color":16776056,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/comics/comic%20books"}}}              3559
{"id":22,"name":"Illustratio

In [23]:
df.country.value_counts()
df.drop(['country'], axis=1, inplace=True)

In [24]:
df.country_displayable_name.value_counts()

the United States     149510
the United Kingdom     25023
Canada                 10232
Australia               5190
Germany                 3940
France                  3138
Mexico                  3054
Italy                   2740
Spain                   2462
the Netherlands         1920
Sweden                  1596
Hong Kong               1538
Denmark                  996
New Zealand              964
Singapore                884
Switzerland              752
Ireland                  709
Belgium                  645
Japan                    579
Austria                  548
Norway                   514
Luxembourg                72
Name: country_displayable_name, dtype: int64

In [25]:
df.creator.value_counts()
df.drop(['creator'], axis=1, inplace=True)

In [26]:
df.currency.value_counts()

USD    149510
GBP     25023
EUR     16174
CAD     10232
AUD      5190
MXN      3054
SEK      1596
HKD      1538
DKK       996
NZD       964
SGD       884
CHF       752
JPY       579
NOK       514
Name: currency, dtype: int64

In [27]:
df.currency_symbol.value_counts()

$      171372
£       25023
€       16174
kr       3106
Fr        752
¥         579
Name: currency_symbol, dtype: int64

In [28]:
df.currency_trailing_code.value_counts()

True     174478
False     42528
Name: currency_trailing_code, dtype: int64

In [29]:
df.current_currency.value_counts()
df.drop(['current_currency'], axis=1, inplace=True)

In [30]:
df.disable_communication.value_counts()
df.drop(['disable_communication'], axis=1, inplace=True)

In [31]:
df.is_starrable.value_counts()
df.drop(['is_starrable'], axis=1, inplace=True)

In [32]:
df.location.value_counts()
df.drop(['location'], axis=1, inplace=True)

In [33]:
df.name.value_counts()

Debut Album                                                     8
Home                                                            8
A Midsummer Night's Dream                                       7
Reflections                                                     6
My Hero Academia Enamel Pins                                    6
                                                               ..
Bohemian-Inspired and Modern Plus Size Clothes Sizes 14 - 30    1
SUPER SQUARE CHAOS                                              1
Storie Fantastiche di Gente Comune                              1
Prison, And How It Affects Our Youth.                           1
If everyone gave a dollar                                       1
Name: name, Length: 188992, dtype: int64

In [34]:
df.photo.value_counts()
df.drop(['photo'], axis=1, inplace=True)

In [35]:
df.profile.value_counts()
df.drop(['profile'], axis=1, inplace=True)

In [36]:
df.slug.value_counts()

infinite-academy-a-super-new-way-of-learning                  3
final-clancy-cup-tournament-extended-cut                      2
travel-sax-the-smallest-electronic-saxophone-in-th            2
bissy-the-kolanut-energy-drink                                2
the-bilby-400-lumen-high-powered-silicone-headlamp-by-knog    2
                                                             ..
producing-a-raw-diamond-into-a-star-lyrically-rap             1
eat-our-feelings                                              1
what-the-woof-a-dog-game-for-card-people                      1
the-thieves-lute-haunted-songs                                1
the-lost-princess-studio                                      1
Name: slug, Length: 189615, dtype: int64

In [37]:
df.source_url.value_counts()
df.drop(['source_url'], axis=1, inplace=True)

In [38]:
df.spotlight.value_counts()

True     126821
False     90185
Name: spotlight, dtype: int64

In [39]:
df.staff_pick.value_counts()

False    188376
True      28630
Name: staff_pick, dtype: int64

In [40]:
df.state.value_counts()

successful    126821
failed         76210
canceled        9015
live            4960
Name: state, dtype: int64

In [41]:
df.urls.value_counts()
df.drop(['urls'], axis=1, inplace=True)

In [42]:
df.usd_type.value_counts()
df.drop(['usd_type'], axis=1, inplace=True)

In [43]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 217006 entries, 0 to 217432
Data columns (total 23 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   backers_count             217006 non-null  int64  
 1   blurb                     217006 non-null  object 
 2   category                  217006 non-null  object 
 3   converted_pledged_amount  217006 non-null  int64  
 4   country_displayable_name  217006 non-null  object 
 5   created_at                217006 non-null  int64  
 6   currency                  217006 non-null  object 
 7   currency_symbol           217006 non-null  object 
 8   currency_trailing_code    217006 non-null  bool   
 9   deadline                  217006 non-null  int64  
 10  fx_rate                   217006 non-null  float64
 11  goal                      217006 non-null  float64
 12  id                        217006 non-null  int64  
 13  launched_at               217006 non-null  i

In [44]:
df.category = df.category.str.replace(':', ',')

In [45]:
df.category

0         {"id",313,"name","Small Batch","slug","food/sm...
1         {"id",34,"name","Tabletop Games","slug","games...
2         {"id",262,"name","Accessories","slug","fashion...
3         {"id",313,"name","Small Batch","slug","food/sm...
4         {"id",28,"name","Product Design","slug","desig...
                                ...                        
217428    {"id",13,"name","Journalism","slug","journalis...
217429    {"id",277,"name","Nature","slug","photography/...
217430    {"id",52,"name","Hardware","slug","technology/...
217431    {"id",307,"name","Drinks","slug","food/drinks"...
217432    {"id",258,"name","Architecture","slug","design...
Name: category, Length: 217006, dtype: object

In [46]:
df.category[0]

'{"id",313,"name","Small Batch","slug","food/small batch","position",10,"parent_id",10,"parent_name","Food","color",16725570,"urls",{"web",{"discover","http,//www.kickstarter.com/discover/categories/food/small%20batch"}}}'

In [47]:
import string

punctuation = "!\"#$%&'()*+-.:;<=>?@[\\]^_`{|}~"

def remove_punctuation(s):
    s_sans_punct = ""
    for letter in s:
        if letter not in punctuation:
            s_sans_punct += letter
    return s_sans_punct

In [48]:
# splits record strings up into lists
new_category = []
for line in df.category:
    line = remove_punctuation(line)
    new_category.append(line.split(','))
    
df.category = new_category

In [49]:
df.category[0]

['id',
 '313',
 'name',
 'Small Batch',
 'slug',
 'food/small batch',
 'position',
 '10',
 'parentid',
 '10',
 'parentname',
 'Food',
 'color',
 '16725570',
 'urls',
 'web',
 'discover',
 'http',
 '//wwwkickstartercom/discover/categories/food/small20batch']

In [50]:
for line in df.category:
    for element in line:
        clean_data = remove_punctuation(element)

In [51]:
df.category[42]

['id',
 '38',
 'name',
 'Electronic Music',
 'slug',
 'music/electronic music',
 'position',
 '6',
 'parentid',
 '14',
 'parentname',
 'Music',
 'color',
 '10878931',
 'urls',
 'web',
 'discover',
 'http',
 '//wwwkickstartercom/discover/categories/music/electronic20music']

In [52]:
df['category']

0         [id, 313, name, Small Batch, slug, food/small ...
1         [id, 34, name, Tabletop Games, slug, games/tab...
2         [id, 262, name, Accessories, slug, fashion/acc...
3         [id, 313, name, Small Batch, slug, food/small ...
4         [id, 28, name, Product Design, slug, design/pr...
                                ...                        
217428    [id, 13, name, Journalism, slug, journalism, p...
217429    [id, 277, name, Nature, slug, photography/natu...
217430    [id, 52, name, Hardware, slug, technology/hard...
217431    [id, 307, name, Drinks, slug, food/drinks, pos...
217432    [id, 258, name, Architecture, slug, design/arc...
Name: category, Length: 217006, dtype: object

In [53]:
all_categories = {}
for j, line in enumerate(df.category):
    categories = {}
    for i, ele in enumerate(line[:-4]):
        if i % 2 == 0:
            categories[ele] = line[i+1]
    all_categories[j] = categories

In [54]:
category = pd.DataFrame(all_categories).T

In [55]:
category.drop([
    'position',
    'parentid',
    'slug',
    'color',
    'urls'
],
axis = 1, inplace=True)

In [56]:
category.columns

Index(['id', 'name', 'parentname'], dtype='object')

In [57]:
df.columns

Index(['backers_count', 'blurb', 'category', 'converted_pledged_amount',
       'country_displayable_name', 'created_at', 'currency', 'currency_symbol',
       'currency_trailing_code', 'deadline', 'fx_rate', 'goal', 'id',
       'launched_at', 'name', 'pledged', 'slug', 'spotlight', 'staff_pick',
       'state', 'state_changed_at', 'static_usd_rate', 'usd_pledged'],
      dtype='object')

In [58]:
df.head()

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country_displayable_name,created_at,currency,currency_symbol,currency_trailing_code,deadline,...,launched_at,name,pledged,slug,spotlight,staff_pick,state,state_changed_at,static_usd_rate,usd_pledged
0,1,we are going Production herbal teabag of plan...,"[id, 313, name, Small Batch, slug, food/small ...",19,Australia,1441269202,AUD,$,True,1444141184,...,1441549184,Production herbal teabag of plants native to Iran,27.0,production-herbal-teabag-of-plants-native-to-iran,False,False,failed,1444141184,0.691164,18.66144
1,637,Two agents battle each other in another dimens...,"[id, 34, name, Tabletop Games, slug, games/tab...",16233,the United States,1576048498,USD,$,True,1583987400,...,1581353979,Slip Strike,16233.0,slip-strike-0,True,False,successful,1583987400,1.0,16233.0
2,50,A collection of Hard Enamel pins inspired by T...,"[id, 262, name, Accessories, slug, fashion/acc...",983,Canada,1560821709,CAD,$,True,1564165822,...,1562005822,Tattoo Shop Flash,1294.29,tattoo-shop-flash,True,False,successful,1564165825,0.7629,987.4137
3,8,"Low carb, no sugar sauces and marinades using ...","[id, 313, name, Small Batch, slug, food/small ...",361,the United States,1563139848,USD,$,True,1569530542,...,1564346542,Pennington's - Keto Sauces and Marinades,361.0,penningtons-keto-sauces-and-marinades,False,False,failed,1569530544,1.0,361.0
4,6452,The everyday bag fused with Parisian chic and ...,"[id, 28, name, Product Design, slug, design/pr...",1385803,the United States,1561364892,USD,$,True,1568408340,...,1564502174,bobobark - Designed for Women. Made for Life.,1385803.0,bobobark-designed-for-women-made-for-life,True,False,successful,1568408340,1.0,1385803.0


In [59]:
df = df.merge(category, how='outer', left_index=True, right_index=True)

In [60]:
df.head()

Unnamed: 0,backers_count,blurb,category,converted_pledged_amount,country_displayable_name,created_at,currency,currency_symbol,currency_trailing_code,deadline,...,slug,spotlight,staff_pick,state,state_changed_at,static_usd_rate,usd_pledged,id_y,name_y,parentname
0,1.0,we are going Production herbal teabag of plan...,"[id, 313, name, Small Batch, slug, food/small ...",19.0,Australia,1441269000.0,AUD,$,True,1444141000.0,...,production-herbal-teabag-of-plants-native-to-iran,False,False,failed,1444141000.0,0.691164,18.66144,313,Small Batch,Food
1,637.0,Two agents battle each other in another dimens...,"[id, 34, name, Tabletop Games, slug, games/tab...",16233.0,the United States,1576048000.0,USD,$,True,1583987000.0,...,slip-strike-0,True,False,successful,1583987000.0,1.0,16233.0,34,Tabletop Games,Games
2,50.0,A collection of Hard Enamel pins inspired by T...,"[id, 262, name, Accessories, slug, fashion/acc...",983.0,Canada,1560822000.0,CAD,$,True,1564166000.0,...,tattoo-shop-flash,True,False,successful,1564166000.0,0.7629,987.4137,262,Accessories,Fashion
3,8.0,"Low carb, no sugar sauces and marinades using ...","[id, 313, name, Small Batch, slug, food/small ...",361.0,the United States,1563140000.0,USD,$,True,1569531000.0,...,penningtons-keto-sauces-and-marinades,False,False,failed,1569531000.0,1.0,361.0,313,Small Batch,Food
4,6452.0,The everyday bag fused with Parisian chic and ...,"[id, 28, name, Product Design, slug, design/pr...",1385803.0,the United States,1561365000.0,USD,$,True,1568408000.0,...,bobobark-designed-for-women-made-for-life,True,False,successful,1568408000.0,1.0,1385803.0,28,Product Design,Design


In [61]:
df.columns

Index(['backers_count', 'blurb', 'category', 'converted_pledged_amount',
       'country_displayable_name', 'created_at', 'currency', 'currency_symbol',
       'currency_trailing_code', 'deadline', 'fx_rate', 'goal', 'id_x',
       'launched_at', 'name_x', 'pledged', 'slug', 'spotlight', 'staff_pick',
       'state', 'state_changed_at', 'static_usd_rate', 'usd_pledged', 'id_y',
       'name_y', 'parentname'],
      dtype='object')

In [62]:
missing_values= df.isnull().sum()
missing_values/len(df)

backers_count               0.001964
blurb                       0.001964
category                    0.001964
converted_pledged_amount    0.001964
country_displayable_name    0.001964
created_at                  0.001964
currency                    0.001964
currency_symbol             0.001964
currency_trailing_code      0.001964
deadline                    0.001964
fx_rate                     0.001964
goal                        0.001964
id_x                        0.001964
launched_at                 0.001964
name_x                      0.001964
pledged                     0.001964
slug                        0.001964
spotlight                   0.001964
staff_pick                  0.001964
state                       0.001964
state_changed_at            0.001964
static_usd_rate             0.001964
usd_pledged                 0.001964
id_y                        0.001964
name_y                      0.001964
parentname                  0.040399
dtype: float64

In [63]:
df.dropna(inplace=True)

In [64]:
missing_values= df.isnull().sum()
missing_values/len(df)

backers_count               0.0
blurb                       0.0
category                    0.0
converted_pledged_amount    0.0
country_displayable_name    0.0
created_at                  0.0
currency                    0.0
currency_symbol             0.0
currency_trailing_code      0.0
deadline                    0.0
fx_rate                     0.0
goal                        0.0
id_x                        0.0
launched_at                 0.0
name_x                      0.0
pledged                     0.0
slug                        0.0
spotlight                   0.0
staff_pick                  0.0
state                       0.0
state_changed_at            0.0
static_usd_rate             0.0
usd_pledged                 0.0
id_y                        0.0
name_y                      0.0
parentname                  0.0
dtype: float64

In [65]:
df.category
df.drop(['category'], axis=1, inplace=True)

In [66]:
df.converted_pledged_amount.sort_values(ascending=False)

207883    12969608.0
206264    12969608.0
51782     12143435.0
197998    12143435.0
71884     11385449.0
             ...    
136678           0.0
64297            0.0
64327            0.0
136653           0.0
126425           0.0
Name: converted_pledged_amount, Length: 208235, dtype: float64

In [67]:
df.currency_trailing_code.value_counts()

True     167459
False     40776
Name: currency_trailing_code, dtype: int64

In [68]:
df.deadline.value_counts()

1.572581e+09    31
1.559362e+09    28
1.583039e+09    27
1.459483e+09    22
1.572592e+09    20
                ..
1.483826e+09     1
1.445598e+09     1
1.445597e+09     1
1.445595e+09     1
1.428631e+09     1
Name: deadline, Length: 172438, dtype: int64

In [69]:
df.fx_rate.value_counts()

1.000000    143456
1.221140     17718
1.080912     11807
0.709285      7400
1.226759      6245
0.643694      3799
1.085077      3723
0.711371      2441
0.041296      2061
0.647046      1214
0.101724      1189
0.129025      1006
0.041245       871
0.144964       712
0.598910       692
0.703586       638
1.027844       571
0.129018       479
0.009354       419
0.098205       390
0.102376       340
0.145478       237
0.601356       229
0.705470       198
1.031539       158
0.009327       135
0.098548       107
Name: fx_rate, dtype: int64

In [70]:
df.goal.value_counts()

5000.0     14847
10000.0    13102
1000.0      9816
2000.0      8505
3000.0      8353
           ...  
28100.0        1
3978.0         1
5205.0         1
29011.0        1
10130.0        1
Name: goal, Length: 5405, dtype: int64

In [71]:
df.id_x.value_counts()

1.829455e+09    3
5.207550e+08    2
1.540653e+09    2
1.041477e+09    2
1.156990e+09    2
               ..
1.418861e+09    1
7.656159e+08    1
1.773568e+08    1
7.363103e+08    1
1.610614e+09    1
Name: id_x, Length: 183173, dtype: int64

In [72]:
df.name_x.value_counts()

Debut Album                                              8
A Midsummer Night's Dream                                7
Home                                                     6
Romeo & Juliet                                           6
Reflections                                              6
                                                        ..
Superman and Batman The Worlds Finest Parody Fan Film    1
Make Him Cry                                             1
"Reflections" by R. Chadwick Drewes                      1
CATTLE - AN AWESOME SCI-FI SHORT (Canceled)              1
If everyone gave a dollar                                1
Name: name_x, Length: 182587, dtype: int64

In [73]:
df.pledged.value_counts()

0.00         15846
1.00          6527
2.00          1674
10.00         1652
5.00          1144
             ...  
2081.88          1
60144.00         1
70715.20         1
155138.71        1
262143.00        1
Name: pledged, Length: 46998, dtype: int64

In [74]:
df.slug.value_counts()

infinite-academy-a-super-new-way-of-learning               3
author-of-book-hansel-and-gretta                           2
evacuation-pandemic                                        2
kawaii-anime-inspired-hard-enamel-pins                     2
isospine-trigger-point-therapy-to-stretch-out-your-back    2
                                                          ..
how-to-make-money-from-art-camden-fringe-festival          1
retronator-pixel-art-academy                               1
krista-branch-is-recording-her-1st-full-length-alb         1
blog-about-what-international-students-think-about         1
the-lost-princess-studio                                   1
Name: slug, Length: 183173, dtype: int64

In [75]:
df.spotlight.value_counts()

True     121680
False     86555
Name: spotlight, dtype: int64

In [76]:
df.staff_pick.value_counts()

False    180698
True      27537
Name: staff_pick, dtype: int64

In [77]:
df.state.value_counts()

successful    121680
failed         73128
canceled        8678
live            4749
Name: state, dtype: int64

In [78]:
df.state_changed_at.value_counts()

1.572581e+09    30
1.559362e+09    28
1.583039e+09    26
1.459483e+09    20
1.572592e+09    20
                ..
1.444937e+09     1
1.444931e+09     1
1.431899e+09     1
1.444927e+09     1
1.353211e+09     1
Name: state_changed_at, Length: 173298, dtype: int64

In [79]:
df.static_usd_rate.value_counts()
df.drop(['static_usd_rate'], axis=1, inplace=True)

In [80]:
df.usd_pledged.value_counts()

0.000000        15846
1.000000         4496
2.000000         1117
10.000000        1080
25.000000         941
                ...  
911.482860          1
2754.510000         1
802.013808          1
43533.000000        1
32.723074           1
Name: usd_pledged, Length: 83701, dtype: int64

In [81]:
df.id_y.value_counts()
df.drop(['id_y'], axis=1, inplace=True)

In [82]:
df.name_y.value_counts()

Web                4621
Product Design     4402
Tabletop Games     4040
Accessories        3683
Comic Books        3556
                   ... 
Social Practice      78
Chiptune             59
Farmers Markets      28
Toys                 16
Taxidermy            12
Name: name_y, Length: 146, dtype: int64

In [83]:
df.parentname.value_counts()

Film  Video    28223
Music          27057
Technology     21453
Publishing     20865
Art            20622
Food           16010
Games          14097
Fashion        12575
Design          8846
Comics          8818
Photography     8117
Theater         6794
Crafts          6607
Journalism      5373
Dance           2778
Name: parentname, dtype: int64

In [84]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 208235 entries, 0 to 217005
Data columns (total 23 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   backers_count             208235 non-null  float64
 1   blurb                     208235 non-null  object 
 2   converted_pledged_amount  208235 non-null  float64
 3   country_displayable_name  208235 non-null  object 
 4   created_at                208235 non-null  float64
 5   currency                  208235 non-null  object 
 6   currency_symbol           208235 non-null  object 
 7   currency_trailing_code    208235 non-null  object 
 8   deadline                  208235 non-null  float64
 9   fx_rate                   208235 non-null  float64
 10  goal                      208235 non-null  float64
 11  id_x                      208235 non-null  float64
 12  launched_at               208235 non-null  float64
 13  name_x                    208235 non-null  o

In [85]:
df.currency_symbol.value_counts()
df.drop(['currency_symbol'], axis=1, inplace=True)

In [86]:
df.columns

Index(['backers_count', 'blurb', 'converted_pledged_amount',
       'country_displayable_name', 'created_at', 'currency',
       'currency_trailing_code', 'deadline', 'fx_rate', 'goal', 'id_x',
       'launched_at', 'name_x', 'pledged', 'slug', 'spotlight', 'staff_pick',
       'state', 'state_changed_at', 'usd_pledged', 'name_y', 'parentname'],
      dtype='object')

In [87]:
df.state

0             failed
1         successful
2         successful
3             failed
4         successful
             ...    
217000        failed
217002    successful
217003    successful
217004    successful
217005        failed
Name: state, Length: 208235, dtype: object

In [88]:
df.state = pd.get_dummies(df.state, columns=['dummy'], drop_first=True)

In [89]:
df.state

0         1
1         0
2         0
3         1
4         0
         ..
217000    1
217002    0
217003    0
217004    0
217005    1
Name: state, Length: 208235, dtype: uint8

In [90]:
df.state = df.state.replace({1:0, 0:1})

In [91]:
df.state

0         0
1         1
2         1
3         0
4         1
         ..
217000    0
217002    1
217003    1
217004    1
217005    0
Name: state, Length: 208235, dtype: int64

In [92]:
df.columns

Index(['backers_count', 'blurb', 'converted_pledged_amount',
       'country_displayable_name', 'created_at', 'currency',
       'currency_trailing_code', 'deadline', 'fx_rate', 'goal', 'id_x',
       'launched_at', 'name_x', 'pledged', 'slug', 'spotlight', 'staff_pick',
       'state', 'state_changed_at', 'usd_pledged', 'name_y', 'parentname'],
      dtype='object')

In [93]:
corr_matrix = df.corr()
print(corr_matrix["state"].sort_values(ascending=False))

state                       1.000000
backers_count               0.107838
launched_at                 0.092956
deadline                    0.090893
state_changed_at            0.089478
created_at                  0.088051
usd_pledged                 0.084288
converted_pledged_amount    0.084233
pledged                     0.018619
fx_rate                     0.015172
id_x                        0.000451
goal                       -0.029612
Name: state, dtype: float64


In [94]:
df[df.columns[1:]].corr()['state'][:]

converted_pledged_amount    0.084233
created_at                  0.088051
deadline                    0.090893
fx_rate                     0.015172
goal                       -0.029612
id_x                        0.000451
launched_at                 0.092956
pledged                     0.018619
state                       1.000000
state_changed_at            0.089478
usd_pledged                 0.084288
Name: state, dtype: float64

In [95]:
df.head()

Unnamed: 0,backers_count,blurb,converted_pledged_amount,country_displayable_name,created_at,currency,currency_trailing_code,deadline,fx_rate,goal,...,name_x,pledged,slug,spotlight,staff_pick,state,state_changed_at,usd_pledged,name_y,parentname
0,1.0,we are going Production herbal teabag of plan...,19.0,Australia,1441269000.0,AUD,True,1444141000.0,0.643694,14000.0,...,Production herbal teabag of plants native to Iran,27.0,production-herbal-teabag-of-plants-native-to-iran,False,False,0,1444141000.0,18.66144,Small Batch,Food
1,637.0,Two agents battle each other in another dimens...,16233.0,the United States,1576048000.0,USD,True,1583987000.0,1.0,6000.0,...,Slip Strike,16233.0,slip-strike-0,True,False,1,1583987000.0,16233.0,Tabletop Games,Games
2,50.0,A collection of Hard Enamel pins inspired by T...,983.0,Canada,1560822000.0,CAD,True,1564166000.0,0.709285,450.0,...,Tattoo Shop Flash,1294.29,tattoo-shop-flash,True,False,1,1564166000.0,987.4137,Accessories,Fashion
3,8.0,"Low carb, no sugar sauces and marinades using ...",361.0,the United States,1563140000.0,USD,True,1569531000.0,1.0,28000.0,...,Pennington's - Keto Sauces and Marinades,361.0,penningtons-keto-sauces-and-marinades,False,False,0,1569531000.0,361.0,Small Batch,Food
4,6452.0,The everyday bag fused with Parisian chic and ...,1385803.0,the United States,1561365000.0,USD,True,1568408000.0,1.0,15000.0,...,bobobark - Designed for Women. Made for Life.,1385803.0,bobobark-designed-for-women-made-for-life,True,False,1,1568408000.0,1385803.0,Product Design,Design


In [96]:
df.shape

(208235, 22)

## Logistic Regression

In [97]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 208235 entries, 0 to 217005
Data columns (total 22 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   backers_count             208235 non-null  float64
 1   blurb                     208235 non-null  object 
 2   converted_pledged_amount  208235 non-null  float64
 3   country_displayable_name  208235 non-null  object 
 4   created_at                208235 non-null  float64
 5   currency                  208235 non-null  object 
 6   currency_trailing_code    208235 non-null  object 
 7   deadline                  208235 non-null  float64
 8   fx_rate                   208235 non-null  float64
 9   goal                      208235 non-null  float64
 10  id_x                      208235 non-null  float64
 11  launched_at               208235 non-null  float64
 12  name_x                    208235 non-null  object 
 13  pledged                   208235 non-null  f

In [98]:
df.country_displayable_name.value_counts()

the United States     143456
the United Kingdom     23963
Canada                  9841
Australia               5013
Germany                 3785
France                  3015
Mexico                  2932
Italy                   2636
Spain                   2354
the Netherlands         1842
Sweden                  1529
Hong Kong               1485
Denmark                  949
New Zealand              921
Singapore                836
Switzerland              729
Ireland                  677
Belgium                  617
Japan                    554
Austria                  532
Norway                   497
Luxembourg                72
Name: country_displayable_name, dtype: int64

In [99]:
df.state = df.state.replace({1:0, 0:1})

In [100]:
from sklearn.preprocessing import LabelEncoder
gle = LabelEncoder()
genre_labels = gle.fit_transform(df['parentname'])
genre_mappings = {index: label for index, label in enumerate(gle.classes_)}

In [101]:
genre_mappings

{0: 'Art',
 1: 'Comics',
 2: 'Crafts',
 3: 'Dance',
 4: 'Design',
 5: 'Fashion',
 6: 'Film  Video',
 7: 'Food',
 8: 'Games',
 9: 'Journalism',
 10: 'Music',
 11: 'Photography',
 12: 'Publishing',
 13: 'Technology',
 14: 'Theater'}

In [102]:
df.parentname = genre_labels

In [103]:
gle = LabelEncoder()
type_labels = gle.fit_transform(df['name_y'])
type_mappings = {index: label for index, label in enumerate(gle.classes_)}

In [104]:
df.name_y = type_labels

In [105]:
X = df.drop(['state',
             'blurb',
             'country_displayable_name',
             'currency',
             'currency_trailing_code',
             'name_x',
             'slug',
             'spotlight',
             'staff_pick'
            ], axis=1)
y = df.state

In [119]:
features = ['backers_count', 
            'converted_pledged_amount', 
            'created_at', 
            'deadline', 
            'fx_rate', 
            'goal', 
            'id_x', 
            'launched_at', 
            'pledged', 
            'state_changed_at',
            'usd_pledged',
            'parentname',
            'name_y'
           ]

In [120]:
X = df[features]
y = df.state

In [121]:
from sklearn.feature_selection import RFE

In [122]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import seaborn as sns

plt.style.use('fivethirtyeight')

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

# Import LogisticRegression and LinearRegression from sklearn.linear_model
from sklearn.linear_model import LogisticRegression, LinearRegression

In [123]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split (X, y, random_state = 42)

In [124]:
# Scale our data.
# Relabeling scaled data as "Z" is common.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
Z_train = sc.fit_transform(X_train)
Z_test = sc.transform(X_test)

In [125]:
from sklearn.metrics import confusion_matrix
logreg = LogisticRegression(C=1e9, solver='lbfgs')
logreg.fit(Z_train, y_train)

# Predict the labels of the test set: y_pred
y_pred = logreg.predict(Z_test)

# Compute and print the confusion matrix and classification report
print(confusion_matrix(y_test, y_pred))

[[29714  3960]
 [ 2397 15988]]


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


In [131]:
df.state.value_counts()

0    135107
1     73128
Name: state, dtype: int64

In [128]:
print(logreg.score(Z_train, y_train))
print(logreg.score(Z_test, y_test))

0.8752177031041901
0.8778885495303406


In [129]:
logreg.coef_

array([[-5.05052588e+01, -1.30061806e+00, -6.82103366e-01,
        -9.00615146e+01,  1.93034496e-02,  1.13618028e+00,
        -8.08978579e-03, -3.23738813e+01,  1.32262340e+00,
         1.22890009e+02, -1.07419233e+00, -3.86592953e-03,
        -1.12429611e-02]])

In [130]:
# Assign the coefficients to a list coef
coef = logreg.coef_
for p,c in zip(features,list(coef[0])):
    print(p + '\t' + str(c))

backers_count	-50.50525880971376
converted_pledged_amount	-1.3006180553080446
created_at	-0.6821033657711582
deadline	-90.06151459324798
fx_rate	0.019303449567795262
goal	1.1361802788704174
id_x	-0.008089785794514302
launched_at	-32.37388128548797
pledged	1.3226233970463004
state_changed_at	122.89000856446515
usd_pledged	-1.0741923313397805
parentname	-0.003865929532217644
name_y	-0.011242961065191022


In [None]:
# exponentiate

In [116]:
df.columns

Index(['backers_count', 'blurb', 'converted_pledged_amount',
       'country_displayable_name', 'created_at', 'currency',
       'currency_trailing_code', 'deadline', 'fx_rate', 'goal', 'id_x',
       'launched_at', 'name_x', 'pledged', 'slug', 'spotlight', 'staff_pick',
       'state', 'state_changed_at', 'usd_pledged', 'name_y', 'parentname'],
      dtype='object')

In [132]:
df.parentname.value_counts()

6     28223
10    27057
13    21453
12    20865
0     20622
7     16010
8     14097
5     12575
4      8846
1      8818
11     8117
14     6794
2      6607
9      5373
3      2778
Name: parentname, dtype: int64