# NumPy Basics

## Exercise 1

### Load the amount pledged (in U.S. dollars) from the data file into an array of floating point values. Then, produce the following descriptive statistics:
- Total number of projects
- Amount pledged: minimum, mean, standard deviation, and maximum
- Proportion (or percentage) of projects that earned total pledges of at least $1,000

In [1]:
# Import numpy
import numpy as np
# Print the total number of projects
p = np.loadtxt("kickstarter.csv", dtype=str, delimiter=',', skiprows=1, usecols=1)
print(len(p))

4184
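As a side note, `np.loadtxt` also accepts a file-like object and a tuple for `usecols`, so several columns can be read in one pass. The sketch below assumes a tiny in-memory CSV with made-up column names and values (`toy_csv` is hypothetical and is not the real `kickstarter.csv`); it reads two columns as strings and converts one of them to floats afterwards.

```python
import io
import numpy as np

# Hypothetical three-row CSV standing in for the real data file
toy_csv = io.StringIO(
    "id,category,usd_pledged\n"
    "1,Painting,120.5\n"
    "2,Ceramics,0\n"
    "3,Painting,1500\n"
)

# Read two columns at once as strings; the result is a 2-D array (rows x columns)
columns = np.loadtxt(toy_csv, dtype=str, delimiter=",", skiprows=1, usecols=(1, 2))

categories = columns[:, 0]                # stays an array of strings
pledged = columns[:, 1].astype(float)     # convert the numeric column explicitly
n_projects = columns.shape[0]             # one row per project

print(n_projects, categories, pledged)
```

Reading the file once avoids re-parsing it for every column, at the cost of an explicit `astype` call for the numeric columns.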


In [2]:
# Load the amount pledged (in U.S. dollars) from the data file into an array of floating point values
pledged_amount = np.loadtxt("kickstarter.csv", dtype=float, delimiter=',', skiprows=1, usecols=11)
print('The minimum of the amount pledged:', pledged_amount.min())
print('The mean of the amount pledged:', pledged_amount.mean())
print('The standard deviation of the amount pledged:', pledged_amount.std())
print('The maximum of the amount pledged:', pledged_amount.max())

The minimum of the amount pledged: 0.0
The mean of the amount pledged: 1242.1242686279124
The standard deviation of the amount pledged: 5177.92798555567
The maximum of the amount pledged: 111111.77
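Note that `ndarray.std()` uses `ddof=0` by default, so the standard deviation reported above is the population version (dividing by N). If the sample version (dividing by N - 1) is wanted instead, pass `ddof=1`. The sketch below illustrates the difference on a small array of made-up amounts (`toy_pledged` is hypothetical, not taken from the data file).

```python
import numpy as np

# Hypothetical pledge amounts, standing in for the real column
toy_pledged = np.array([0.0, 50.0, 250.0, 1000.0, 2500.0])

n_projects = toy_pledged.size
pop_std = toy_pledged.std()             # ddof=0 (default): divides by N
sample_std = toy_pledged.std(ddof=1)    # ddof=1: divides by N - 1

print("count:", n_projects)
print("min:", toy_pledged.min(), "max:", toy_pledged.max())
print("mean:", toy_pledged.mean())
print("population std:", pop_std)
print("sample std:", sample_std)
```

With several thousand observations the two versions differ only slightly, but the gap is noticeable on small arrays like this one.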


In [3]:
# Percentage of projects that earned total pledges of at least $1,000
atleast1000 = len(pledged_amount[pledged_amount >= 1000]) / len(pledged_amount)
print(atleast1000*100, '%')

19.622370936902485 %
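An equivalent way to get this proportion is to take the mean of the boolean mask directly, since `True` counts as 1 and `False` as 0. The following is a minimal sketch on made-up amounts (`toy_pledged` is hypothetical), not a replacement for the cell above.

```python
import numpy as np

# Hypothetical pledge amounts, standing in for the real column
toy_pledged = np.array([0.0, 50.0, 250.0, 1000.0, 2500.0])

at_least_1000 = toy_pledged >= 1000          # boolean mask, one entry per project
share = at_least_1000.mean()                 # fraction of True values
count = np.count_nonzero(at_least_1000)      # number of True values

print(f"{count} of {toy_pledged.size} projects ({share:.1%})")
```

The `:.1%` format specifier multiplies by 100 and appends the percent sign, which avoids printing a long unrounded value.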


## Exercise 2

### Load the project categories from the data file into an array of strings. Count the frequency of each category, and then calculate the proportion of observations that fall into each category. Return both results (category and proportion of observations) in the form of a dictionary. Hint: Look at the help for np.unique.
### Which project category is the most popular (in terms of the number of projects)? Least popular? Write your answer in a markdown cell.

In [4]:
# Load the project categories from the data file into an array of strings.
pledged_category = np.loadtxt("kickstarter.csv", dtype=str, delimiter=',', skiprows=1, usecols=12)
# With return_counts=True, np.unique returns the sorted unique categories together with the number of times each one appears
unique_category = np.unique(pledged_category, return_counts = True)
print(unique_category)

(array(['Ceramics', 'Conceptual Art', 'Digital Art', 'Illustration',
       'Painting'], dtype='<U14'), array([ 204,  879, 1054,  461, 1586], dtype=int64))


In [5]:
# Count the frequency of each category and return the result in the form of a dictionary
# Calculate the proportion of observations that fall into each category and return the result in the form of a dictionary
number_all_category = dict(zip(unique_category[0], unique_category[1]))
number_proportion_category = dict(zip(unique_category[0], unique_category[1]/len(pledged_category)))
print(number_all_category)
print(number_proportion_category)

{'Ceramics': 204, 'Conceptual Art': 879, 'Digital Art': 1054, 'Illustration': 461, 'Painting': 1586}
{'Ceramics': 0.04875717017208413, 'Conceptual Art': 0.21008604206500955, 'Digital Art': 0.2519120458891013, 'Illustration': 0.11018164435946463, 'Painting': 0.37906309751434036}
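The same pattern can be written by unpacking the two arrays that `np.unique` returns, and the most and least frequent categories can then be read off the dictionary with `max` and `min`. This is a sketch on a made-up category array (`toy_categories` is hypothetical); converting keys and values to plain Python types is optional and only keeps the printed dictionaries tidy.

```python
import numpy as np

# Hypothetical category labels, standing in for the real column
toy_categories = np.array(
    ["Painting", "Ceramics", "Painting", "Digital Art", "Painting", "Digital Art"]
)

# Unpack the sorted unique labels and their counts
labels, counts = np.unique(toy_categories, return_counts=True)

counts_by_category = {str(k): int(v) for k, v in zip(labels, counts)}
share_by_category = {
    str(k): float(v) for k, v in zip(labels, counts / toy_categories.size)
}

# Pick the dictionary keys with the largest and smallest counts
most_popular = max(counts_by_category, key=counts_by_category.get)
least_popular = min(counts_by_category, key=counts_by_category.get)

print(counts_by_category)
print(share_by_category)
print("most popular:", most_popular, "| least popular:", least_popular)
```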


In [6]:
# Use a loop to print the counts and percentages in a readable form
for a, b in number_all_category.items():
    print(a, b)
for a, b in number_proportion_category.items():
    print(a, '{}%'.format(round(b*100, 1)))

Ceramics 204
Conceptual Art 879
Digital Art 1054
Illustration 461
Painting 1586
Ceramics 4.9%
Conceptual Art 21.0%
Digital Art 25.2%
Illustration 11.0%
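Painting 37.9%

### Painting is the most popular category (1,586 projects, 37.9%), and Ceramics is the least popular (204 projects, 4.9%).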


## Exercise 3

### Import the project states from the data file into an array of strings. For each project category, calculate the proportion (or percentage) of projects that were successful. Which project category is the most successful (on average)? Least successful (on average)?

In [8]:
# Import the project states from the data file into an array of strings
states = np.loadtxt("kickstarter.csv", dtype = str, delimiter=',',skiprows = 1, usecols = 4)
ceramics_s = sum(states[pledged_category == 'Ceramics'] == 'successful')/len(states[pledged_category == 'Ceramics'])
print("Ceramics: proportion of successful projects:", ceramics_s*100, "%")

Ceramics: proportion of successful projects: 41.17647058823529 %


In [9]:
concep_art_s = sum(states[pledged_category == 'Conceptual Art'] == 'successful')/len(states[pledged_category == 'Conceptual Art'])
print("Conceptual Art: proportion of successful projects:", concep_art_s*100, "%")

Conceptual Art: proportion of successful projects: 36.6325369738339 %


In [10]:
dig_art_s = sum(states[pledged_category == 'Digital Art'] == 'successful')/len(states[pledged_category == 'Digital Art'])
print("Digital Art: proportion of successful projects:", dig_art_s*100, "%")

Digital Art: proportion of successful projects: 27.13472485768501 %


In [11]:
illustration_s = sum(states[pledged_category == 'Illustration'] == 'successful')/len(states[pledged_category == 'Illustration'])
print("Illustration: proportion of successful projects:", illustration_s*100, "%")

Illustration: proportion of successful projects: 0.0 %


In [12]:
painting_s = sum(states[pledged_category == 'Painting'] == 'successful')/len(states[pledged_category == 'Painting'])
print("Painting: proportion of successful projects:", painting_s*100, "%")

Painting: proportion of successful projects: 0.0 %


### Based on these results, Ceramics is the most successful project category (41.2% of its projects succeeded), and Illustration and Painting are the least successful (0.0% each).
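The five near-identical cells above could also be collapsed into a single loop over the unique categories. The sketch below assumes two small made-up arrays (`toy_categories` and `toy_states` are hypothetical stand-ins for `pledged_category` and `states`) and stores each category's success rate in a dictionary.

```python
import numpy as np

# Hypothetical stand-ins for the category and state columns
toy_categories = np.array(
    ["Ceramics", "Painting", "Ceramics", "Painting", "Painting", "Ceramics"]
)
toy_states = np.array(
    ["successful", "failed", "failed", "failed", "failed", "successful"]
)

success_rate = {}
for category in np.unique(toy_categories):
    in_category = toy_categories == category            # mask for this category
    is_success = toy_states[in_category] == "successful"
    success_rate[str(category)] = float(is_success.mean())

# Print from most to least successful
for category, rate in sorted(success_rate.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {rate:.1%}")
```

Because the categories come from `np.unique`, the loop needs no changes if the data file gains or loses a category.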

## Exercise 4

### Load the staff pick and spotlight columns from the data file into (separate) arrays of strings. Calculate the total number of projects in each featured category, and then calculate the associated success rates in each category (proportion of successful projects). Which feature (staff pick or spotlight) is associated with a higher proportion of successful projects?

In [13]:
# Reload the project states (column 4) and preview them
states = np.loadtxt('kickstarter.csv', dtype = str, delimiter=',',skiprows=1,usecols=4)
print(states)

['successful' 'successful' 'successful' ... 'failed' 'failed' 'failed']


In [15]:
# Load the staff pick and spotlight columns from the data file into (separate) arrays of strings. 
staff_pick = np.loadtxt('kickstarter.csv', dtype= str, delimiter=',',skiprows=1,usecols=8)
spotlight = np.loadtxt('kickstarter.csv', dtype= str, delimiter=',',skiprows=1,usecols=13)

# Calculate the success rate (proportion of successful projects) for each value of staff_pick and spotlight
staff_pick_true = sum(states[staff_pick == 'TRUE'] == 'successful')/len(states[staff_pick == 'TRUE'])
staff_pick_false = sum(states[staff_pick == 'FALSE'] == 'successful')/len(states[staff_pick == 'FALSE'])
spotlight_true = sum(states[spotlight == 'TRUE'] == 'successful')/len(states[spotlight == 'TRUE'])
spotlight_false = sum(states[spotlight == 'FALSE'] == 'successful')/len(states[spotlight == 'FALSE'])

# Print the success rate for each feature value
print('staff_pick: True, Project success rate: {}%'.format(round(staff_pick_true*100,1)))
print('staff_pick: False, Project success rate: {}%'.format(round(staff_pick_false*100,1)))
print('spotlight: True, Project success rate: {}%'.format(round(spotlight_true*100,1)))
print('spotlight: False, Project success rate: {}%'.format(round(spotlight_false*100,1)))

staff_pick: True, Project success rate: 72.8%
staff_pick: False, Project success rate: 13.4%
spotlight: True, Project success rate: 100.0%
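spotlight: False, Project success rate: 0.0%

### Spotlight is associated with the higher proportion of successful projects: every spotlighted project succeeded (100.0%), compared with 72.8% of staff picks.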
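The exercise also asks for the total number of projects in each featured category, which the cell above does not print. One possible way to report the counts alongside the success rates is sketched below, using made-up arrays (`toy_states` and `toy_staff_pick` are hypothetical) and a hypothetical helper named `summarize`; the same helper could be called with the spotlight array.

```python
import numpy as np

# Hypothetical stand-ins for the state and staff pick columns
toy_states = np.array(["successful", "failed", "successful", "failed", "failed"])
toy_staff_pick = np.array(["TRUE", "FALSE", "TRUE", "TRUE", "FALSE"])


def summarize(flag, states):
    """Return {flag value: (number of projects, success rate)}."""
    summary = {}
    for value in np.unique(flag):
        group = states[flag == value]                    # states within this group
        rate = float((group == "successful").mean())     # share of successes
        summary[str(value)] = (int(group.size), rate)
    return summary


staff_pick_summary = summarize(toy_staff_pick, toy_states)
for value, (n, rate) in staff_pick_summary.items():
    print(f"staff_pick: {value}, projects: {n}, success rate: {rate:.1%}")
```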
