# Importance of Category in Determining Kickstarter Success

## Introduction

Kickstarter exists to help bring creative projects to life.
Before you get excited, note that Kickstarter projects work on an all-or-nothing basis. The project owner sets a target amount needed to make the project possible, and if the campaign fails to raise 100% of that amount, the project receives no funding.
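The all-or-nothing rule can be sketched in a few lines. The function name `is_funded` and the sample amounts are illustrative assumptions, not part of any Kickstarter API or of the dataset used later.

```python
def is_funded(pledged: float, goal: float) -> bool:
    """Return True when pledges reach at least 100% of the goal."""
    return pledged >= goal


# Illustrative amounts only: one dollar short means no funding at all.
print(is_funded(pledged=4_999.0, goal=5_000.0))  # False
print(is_funded(pledged=5_000.0, goal=5_000.0))  # True
```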

Now the question is, what kinds of projects get funded and what kinds don't?

### Choosing your project
If you plan to obtain funding from external sources, you need to ensure that your project is desirable. To help you choose a direction for your project, I have gathered data on Kickstarter projects that succeeded or failed to get funded, along with their categories.

The data is obtained from https://www.kaggle.com/kemical/kickstarter-projects#ks-projects-201801.csv.

In [17]:
import pandas as pd

In [18]:
data = pd.read_csv("ks-projects-201801.csv")
data

Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real
0,1000002330,The Songs of Adelaide & Abullah,Poetry,Publishing,GBP,2015-10-09,1000.0,2015-08-11 12:12:28,0.0,failed,0,GB,0.0,0.0,1533.95
1,1000003930,Greeting From Earth: ZGAC Arts Capsule For ET,Narrative Film,Film & Video,USD,2017-11-01,30000.0,2017-09-02 04:43:57,2421.0,failed,15,US,100.0,2421.0,30000.00
2,1000004038,Where is Hank?,Narrative Film,Film & Video,USD,2013-02-26,45000.0,2013-01-12 00:20:50,220.0,failed,3,US,220.0,220.0,45000.00
3,1000007540,ToshiCapital Rekordz Needs Help to Complete Album,Music,Music,USD,2012-04-16,5000.0,2012-03-17 03:24:11,1.0,failed,1,US,1.0,1.0,5000.00
4,1000011046,Community Film Project: The Art of Neighborhoo...,Film & Video,Film & Video,USD,2015-08-29,19500.0,2015-07-04 08:35:03,1283.0,canceled,14,US,1283.0,1283.0,19500.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
378656,999976400,ChknTruk Nationwide Charity Drive 2014 (Canceled),Documentary,Film & Video,USD,2014-10-17,50000.0,2014-09-17 02:35:30,25.0,canceled,1,US,25.0,25.0,50000.00
378657,999977640,The Tribe,Narrative Film,Film & Video,USD,2011-07-19,1500.0,2011-06-22 03:35:14,155.0,failed,5,US,155.0,155.0,1500.00
378658,999986353,Walls of Remedy- New lesbian Romantic Comedy f...,Narrative Film,Film & Video,USD,2010-08-16,15000.0,2010-07-01 19:40:30,20.0,failed,1,US,20.0,20.0,15000.00
378659,999987933,BioDefense Education Kit,Technology,Technology,USD,2016-02-13,15000.0,2016-01-13 18:13:53,200.0,failed,6,US,200.0,200.0,15000.00


To make the data more readable, I keep only the two columns needed for this analysis: `category` and `state`.

In [19]:
filtered_data = data[['category','state']]
filtered_data

Unnamed: 0,category,state
0,Poetry,failed
1,Narrative Film,failed
2,Narrative Film,failed
3,Music,failed
4,Film & Video,canceled
...,...,...
378656,Documentary,canceled
378657,Narrative Film,failed
378658,Narrative Film,failed
378659,Technology,failed


##### There are 159 different categories. 

In [90]:
categories = filtered_data.category.unique()
categories

array(['Poetry', 'Narrative Film', 'Music', 'Film & Video', 'Restaurants',
       'Food', 'Drinks', 'Product Design', 'Documentary', 'Nonfiction',
       'Indie Rock', 'Crafts', 'Games', 'Tabletop Games', 'Design',
       'Comic Books', 'Art Books', 'Fashion', 'Childrenswear', 'Theater',
       'Comics', 'DIY', 'Webseries', 'Animation', 'Food Trucks',
       'Public Art', 'Illustration', 'Photography', 'Pop', 'People',
       'Art', 'Family', 'Fiction', 'Accessories', 'Rock', 'Hardware',
       'Software', 'Weaving', 'Gadgets', 'Web', 'Jazz', 'Ready-to-wear',
       'Festivals', 'Video Games', 'Anthologies', 'Publishing', 'Shorts',
       'Electronic Music', 'Radio & Podcasts', 'Apps', 'Cookbooks',
       'Apparel', 'Metal', 'Comedy', 'Hip-Hop', 'Periodicals', 'Dance',
       'Technology', 'Painting', 'World Music', 'Photobooks', 'Drama',
       'Architecture', 'Young Adult', 'Latin', 'Mobile Games', 'Flight',
       'Fine Art', 'Action', 'Playing Cards', 'Makerspaces', 'Punk',
       

##### Let's look at the category "Residencies" in more detail.

In [91]:
# Select rows whose category is exactly 'Residencies'
category_data = filtered_data.category == 'Residencies'
total = len(filtered_data[category_data])
# Within that subset, flag the projects that were successfully funded
category_success = filtered_data[category_data].state == 'successful'
success = len(filtered_data[category_data][category_success])
print('Total kickstarters with Residencies as its category: {}'.format(total))
print('Total successful kickstarters with Residencies as its category: {}'.format(success))

Total kickstarters with Residencies as its category: 69
Total successful kickstarters with Residencies as its category: 50
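When filtering on a category name, it helps to remember that `Series.str.contains` matches substrings (and treats the pattern as a regular expression by default), whereas `==` matches the whole value. A minimal sketch on a made-up frame, not the Kickstarter file, shows the difference:

```python
import pandas as pd

# Toy rows for illustration; the category names also appear in the dataset.
toy = pd.DataFrame({
    "category": ["Music", "Electronic Music", "World Music", "Residencies"],
    "state": ["failed", "successful", "successful", "successful"],
})

# Substring match: 'Music' is also found inside the two longer names.
substring_hits = toy.category.str.contains("Music").sum()
# Exact match: only the row whose category is exactly 'Music'.
exact_hits = (toy.category == "Music").sum()

print(substring_hits, exact_hits)  # 3 1
```

For a name like "Residencies" that is not part of any longer category name, both approaches select the same rows.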


### To identify the categories with the highest and lowest success rates, we do the following calculation.

In [93]:
# Most successful: categories tied for the highest rate seen so far
ms_list = []
ms_rate = 0
# Least successful: categories tied for the lowest rate seen so far
ls_list = []
ls_rate = 100  # sentinel above any possible rate (rates are fractions in [0, 1])

df = pd.DataFrame(columns = ['Category','Successes','Attempts', 'Success rate'])

for category in categories:
    # Exact match: str.contains would also count e.g. 'Indie Rock' under 'Rock'
    filter_category = filtered_data.category == category
    count = len(filtered_data[filter_category])
    s = len(filtered_data[filter_category & (filtered_data.state == 'successful')])
    rate = s/count

    df.loc[len(df)] = [category,s,count,rate]

    # Track the maximum and minimum independently so one category can set both
    if rate > ms_rate:
        ms_rate = rate
        ms_list = [[category,s,count,rate]]
    elif rate == ms_rate:
        ms_list.append([category,s,count,rate])

    if rate < ls_rate:
        ls_rate = rate
        ls_list = [[category,s,count,rate]]
    elif rate == ls_rate:
        ls_list.append([category,s,count,rate])

m_count = len(ms_list)
if m_count == 1:
    temp = ms_list[0]
    print("The most successful category is {0} with {1} successes out of {2} attempts. The success rate is {3}.".format(*temp))
else:
    print('The most successful categories are ')
    for ms in ms_list:
        print("{0} with {1} successes out of {2} attempts. The success rate is {3}.\n".format(ms[0], ms[1], ms[2], ms[3]))

l_count = len(ls_list)
if l_count == 1:
    temp2 = ls_list[0]
    print("The least successful category is {0} with {1} successes out of {2} attempts. The success rate is {3}.".format(*temp2))
else:
    print('The least successful categories are ')
    for ls in ls_list:
        print("{0} with {1} successes out of {2} attempts. The success rate is {3}.\n".format(ls[0], ls[1], ls[2], ls[3]))

                                                 

The most successful category is Chiptune with 27 successes out of 35 attempts. The success rate is 0.7714285714285715.
The least successful category is Apps with 378 successes out of 6345 attempts. The success rate is 0.059574468085106386.
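The per-category counts and rates from the loop above can also be computed without an explicit loop. The sketch below uses `groupby` on a small made-up frame (the rows and the helper column `success` are illustrative assumptions, not taken from the dataset) and matches category names exactly.

```python
import pandas as pd

# Toy rows for illustration only.
toy = pd.DataFrame({
    "category": ["Dance", "Dance", "Apps", "Apps", "Apps", "Apps"],
    "state": ["successful", "failed", "failed", "canceled", "successful", "failed"],
})

# True where the project was funded; any other state counts as not successful.
toy["success"] = toy["state"] == "successful"

# Per category: number of successes, number of attempts, and their ratio.
summary = toy.groupby("category")["success"].agg(["sum", "count", "mean"])
summary.columns = ["Successes", "Attempts", "Success rate"]
summary = summary.sort_values("Success rate", ascending=False).reset_index()

print(summary)
```

Applied to the real data, the same pattern would group `filtered_data` by its `category` column; the first and last rows of the sorted result then correspond to the most and least successful categories.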


## Here is a table of the categories organized by their success rate from highest to lowest.

In [95]:
df = df.sort_values(by='Success rate', ascending=False,ignore_index=True)
df

Unnamed: 0,Category,Successes,Attempts,Success rate
0,Chiptune,27,35,0.771429
1,Residencies,50,69,0.724638
2,Anthologies,521,784,0.664541
3,Dance,1542,2322,0.664083
4,Indie Rock,3618,5657,0.639562
...,...,...,...,...
154,Candles,55,429,0.128205
155,Food Trucks,217,1752,0.123858
156,Software,371,3048,0.121719
157,Mobile Games,153,1789,0.085523


### Now you have a better idea of which categories to invest your time and money in :)