Kickstarter has established itself as the leading platform for funding creative ventures. Aspiring entrepreneurs in the arts can launch fundraising campaigns on Kickstarter to support their projects. Some projects have been hugely successful, whereas many others have fallen well short of their fundraising goals. The attached data file contains sample data on over 4,000 Kickstarter fundraising campaigns. Each row summarizes one campaign, including the goal and amount pledged, the state of the project (e.g., successful, failed), the category of the project (i.e., type of art), and whether the project was featured via a staff pick or spotlight (i.e., on the Kickstarter home page). Complete the following exercises using the data.

In [107]:
import numpy as np

## Exercise #1

Load the amount pledged (in U.S. dollars) from the data file into an array of floating point values. Then, produce the following descriptive statistics:

- Total number of projects
- Minimum, mean, standard deviation, and maximum amount pledged
- Proportion (or percentage) of projects that earned total pledges of at least $1,000

In [96]:
usd_pledged = np.loadtxt('kickstarter.csv', delimiter=',', usecols=11, skiprows=1)
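
As a rough illustration of what `skiprows` and `usecols` do in the call above, the sketch below reads a tiny in-memory CSV instead of `kickstarter.csv`. The three rows, the column names, and the variable names `sample_csv` and `sample_pledged` are made up for the example and are not taken from the data file.

```python
import io
import numpy as np

# Hypothetical three-row sample; column names and values are invented.
sample_csv = io.StringIO(
    "id,category,usd_pledged\n"
    "1,Painting,250.0\n"
    "2,Ceramics,0.0\n"
    "3,Painting,1500.5\n"
)

# skiprows=1 skips the header line; usecols=2 keeps only the third column.
sample_pledged = np.loadtxt(sample_csv, delimiter=',', usecols=2, skiprows=1)
print(sample_pledged)
```

Note that `np.loadtxt` splits on every delimiter by default, so this approach assumes that no field contains a comma inside quotes.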

In [99]:
print('total_projects :',usd_pledged.shape[0])

total_projects : 4184


In [106]:
print('min :',usd_pledged.min())
print('mean :',usd_pledged.mean())
print('std :',usd_pledged.std())
print('max :',usd_pledged.max())

min : 0.0
mean : 1242.1242686279124
std : 5177.92798555567
max : 111111.77


In [105]:
total_projects = usd_pledged.shape[0]
projects_percent = ((usd_pledged >= 1000).sum() / total_projects) * 100
print('Percentage of projects that earned total pledges of at least $1,000 is',projects_percent)

Percentage of projects that earned total pledges of at least $1,000 is 19.622370936902485
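
The same proportion can also be obtained as the mean of a boolean mask, since `True` counts as 1 and `False` as 0. The sketch below uses a small made-up array (`pledged_sample` is a hypothetical name, not the Kickstarter data) to show the idea.

```python
import numpy as np

# Made-up pledge amounts, for illustration only.
pledged_sample = np.array([0.0, 250.0, 1000.0, 4200.0, 999.99])

# Boolean mask: True where the pledge total is at least $1,000.
at_least_1000 = pledged_sample >= 1000

# The mean of a boolean array is the fraction of True values.
share = at_least_1000.mean()
print(f'{share:.1%}')  # prints 40.0%
```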


## Exercise #2 

Load the project categories from the data file into an array of strings. Count the frequency of each category, and then calculate the proportion of observations that fall into each category. Return both results in the form of a dictionary. Which project category is the most popular (in terms of the number of projects)? Least popular?

In [54]:
categories = np.loadtxt('kickstarter.csv', delimiter=',', usecols=12, skiprows=1, dtype=str)
categories

array(['Conceptual Art', 'Conceptual Art', 'Conceptual Art', ...,
       'Painting', 'Painting', 'Painting'], dtype='<U14')

In [112]:
# Count how many projects fall into each category.
frequency = {}
for category in categories:
    frequency[category] = frequency.get(category, 0) + 1

# Pair each count with its share of all projects.
proportion = {}
for k, v in frequency.items():
    proportion[k] = {'count': v, 'proportion': v / categories.size}
proportion

{'Conceptual Art': {'count': 879, 'proportion': 0.21008604206500955},
 'Digital Art': {'count': 1054, 'proportion': 0.2519120458891013},
 'Illustration': {'count': 461, 'proportion': 0.11018164435946463},
 'Painting': {'count': 1586, 'proportion': 0.37906309751434036},
 'Ceramics': {'count': 204, 'proportion': 0.04875717017208413}}
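
As an alternative to the explicit loop, `np.unique` with `return_counts=True` returns the distinct labels and their counts in one call. This is a minimal sketch on a made-up array; `sample_categories`, `sample_frequency`, and `sample_proportion` are hypothetical names used only here.

```python
import numpy as np

# Made-up category labels, for illustration only.
sample_categories = np.array(
    ['Painting', 'Ceramics', 'Painting', 'Digital Art', 'Painting', 'Digital Art']
)

# Distinct labels (sorted) and how often each one occurs.
labels, counts = np.unique(sample_categories, return_counts=True)

# Convert to plain Python types so the dictionaries print cleanly.
sample_frequency = {str(label): int(count) for label, count in zip(labels, counts)}
sample_proportion = {
    label: count / sample_categories.size
    for label, count in sample_frequency.items()
}
print(sample_frequency)
print(sample_proportion)
```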

In [110]:
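# Look up the categories with the largest and smallest counts.
most_popular = max(frequency, key=frequency.get)
least_popular = min(frequency, key=frequency.get)
print(f"Most Popular Category is '{most_popular}'")
print(f"Least Popular Category is '{least_popular}'")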

Most Popular Category is 'Painting'
Least Popular Category is 'Ceramics'


## Exercise #3

Import the project states from the data file into an array of strings. For each project category, calculate the proportion (or percentage) of projects that were successful. Which project category is the most successful (on average)? Least successful (on average)?

In [113]:
states = np.loadtxt('kickstarter.csv', delimiter=',', usecols=4, skiprows=1, dtype=str)

for category in np.unique(categories) :
    print(category,':',(np.logical_and(categories == category, states == 'successful').sum() / frequency[category]) * 100)

Ceramics : 41.17647058823529
Conceptual Art : 36.6325369738339
Digital Art : 27.13472485768501
Illustration : 0.0
Painting : 0.0
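
A rate of 0.0 only says that no project in the category carries the label `'successful'`. One way to see what lies behind it is to list the distinct state labels per category alongside the rate. The sketch below does this on made-up arrays; the names `sample_categories` and `sample_states` and the label `'live'` are assumptions for illustration and are not taken from the data file.

```python
import numpy as np

# Made-up data; the state label 'live' is a hypothetical example.
sample_categories = np.array(['Painting', 'Painting', 'Ceramics', 'Ceramics', 'Ceramics'])
sample_states = np.array(['live', 'live', 'successful', 'failed', 'successful'])

success_rate = {}
for category in np.unique(sample_categories):
    # Select the states of the projects in this category only.
    in_category = sample_categories == category
    category_states = sample_states[in_category]
    # Fraction of those projects whose state is 'successful'.
    success_rate[str(category)] = float((category_states == 'successful').mean())
    # Show the distinct labels next to the rate for a quick sanity check.
    print(category, np.unique(category_states), success_rate[str(category)])
```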


In [14]:
print("Most Successful Category is 'Ceramics'")
print("Least Successful Categories are 'Illustration' & 'Painting'")

Most Successful Category is 'Ceramics'
Least Successful Categories are 'Illustration' & 'Painting'


## Exercise #4

Load the staff pick and spotlight columns from the data file into (separate) arrays of strings. Calculate the total number of projects in each featured category, and then calculate the associated success rates in each category (proportion of successful projects). Which feature (staff pick or spotlight) is associated with a higher proportion of successful projects?

In [95]:
staff_pick = np.loadtxt('kickstarter.csv', delimiter=',', usecols=8, skiprows=1, dtype=str)
spotlight = np.loadtxt('kickstarter.csv', delimiter=',', usecols=13, skiprows=1, dtype=str)

In [114]:
total_staff_pick = (staff_pick == 'TRUE').sum()
total_spotlight = (spotlight == 'TRUE').sum()
success_staff_pick = (np.logical_and(staff_pick == 'TRUE', states == 'successful').sum() / total_staff_pick) * 100
success_spotlight = (np.logical_and(spotlight == 'TRUE', states == 'successful').sum() / total_spotlight) * 100
print('staff_pick ----\tNumber of projects :',total_staff_pick, '\tSuccess Rate :', success_staff_pick)
print('spotlight ----\tNumber of projects :',total_spotlight, '\tSuccess Rate :', success_spotlight)

staff_pick ----	Number of projects : 224 	Success Rate : 72.76785714285714
spotlight ----	Number of projects : 692 	Success Rate : 100.0
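
Because the staff pick and spotlight calculations follow the same pattern, they could be wrapped in a small helper. The function `feature_summary` below is a hypothetical sketch, shown on made-up arrays, and assumes the flag columns use the string `'TRUE'` as in the cells above.

```python
import numpy as np

def feature_summary(flag, states, true_label='TRUE'):
    """Return (number of featured projects, share of them that succeeded)."""
    featured = flag == true_label
    total = int(featured.sum())
    # Avoid dividing by zero when no project carries the flag.
    if total == 0:
        return total, float('nan')
    rate = float((states[featured] == 'successful').mean())
    return total, rate

# Made-up flags and states, for illustration only.
sample_flag = np.array(['TRUE', 'FALSE', 'TRUE', 'TRUE'])
sample_states = np.array(['successful', 'failed', 'failed', 'successful'])

total, rate = feature_summary(sample_flag, sample_states)
print(total, rate)
```

With the notebook's arrays, the calls would be `feature_summary(staff_pick, states)` and `feature_summary(spotlight, states)`; the rate is returned as a fraction, so multiply by 100 for a percentage.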


In [115]:
print("'spotlight' is associated with a higher proportion of successful projects")

'spotlight' is associated with a higher proportion of successful projects
