# Kickstarter Projects
## Notebook 01 : Data Loading

Group members:
- Crnigoj Gabriele 134176
- Ferraro Tommaso 132998
- Stinat Kevin 134905

![kickstarter-logo.jpg](attachment:kickstarter-logo.jpg)

## Tags: 

> business, finance, crowdfunding, accounting and auditing.

## URL:

> Website: [www.kickstarter.com](https://www.kickstarter.com/?lang=it)\
  Kaggle: [www.kaggle.com/kickstarter-projects](https://www.kaggle.com/kemical/kickstarter-projects)

## Context:  
> Kickstarter is an American public benefit corporation based in Brooklyn, New York, that maintains a global crowdfunding platform focused on creativity. The company's stated mission is to "help bring creative projects to life". As of December 2019, Kickstarter has received more than 4.6 billion dollars in pledges from 17.2 million backers to fund 445,000 projects, such as films, music, stage shows, comics, journalism, video games, technology, publishing, and food-related projects.
People who back Kickstarter projects are offered tangible rewards or experiences in exchange for their pledges. This model traces its roots to the subscription model of arts patronage, where artists would go directly to their audiences to fund their work.

## Business Model:
> Kickstarter is one of a number of crowdfunding platforms for gathering money from the public, which circumvents traditional avenues of investment. Project creators choose a deadline and a minimum funding goal. If the goal is not met by the deadline, no funds are collected (a kind of assurance contract). The Kickstarter platform is open to backers from anywhere in the world and to creators from many countries, including the US, UK, Canada, Australia, New Zealand, the Netherlands, Denmark, Ireland, Norway, Sweden, Spain, France, Germany, Austria, Italy, Belgium, Luxembourg, Switzerland and Mexico. Kickstarter applies a 5\% fee on the total amount of the funds raised. Its payments processor applies an additional 3–5\% fee. Unlike many forums for fundraising or investment, Kickstarter claims no ownership over the projects and the work they produce. The web pages of projects launched on the site are permanently archived and accessible to the public.
After funding is completed, projects and uploaded media cannot be edited or removed from the site. There is no guarantee that people who post projects on Kickstarter will deliver on their projects, use the money to implement their projects, or that the completed projects will meet backers' expectations. Kickstarter advises backers to use their own judgment on supporting a project. They also warn project leaders that they could be liable for legal damages from backers for failure to deliver on promises. Projects might also fail even after a successful fundraising campaign when creators underestimate the total costs required or technical difficulties to be overcome. Asked what made Kickstarter different from other crowdfunding platforms, co-founder Perry Chen said: "I wonder if people really know what the definition of crowdfunding is. Or, if there’s even an agreed upon definition of what it is. We haven’t actively supported the use of the term because it can provoke more confusion. In our case, we focus on a middle ground between patronage and commerce. People are offering cool stuff and experiences in exchange for the support of their ideas. People are creating these mini-economies around their project ideas. So, you aren’t coming to the site to get something for nothing; you are trying to create value for the people who support you. We focus on creative projects—music, film, technology, art, design, food and publishing—and within the category of crowdfunding of the arts, we are probably ten times the size of all of the others combined."

## Categories:
> Creators categorize their projects into one of 13 categories and 36 subcategories. The categories are: Art, Comics, Dance, Design, Fashion, Film and Video, Food, Games, Music, Photography, Publishing, Technology and Theater. Of these, Film & Video and Music are the largest categories and have raised the most money. These categories, along with Games, account for over half the money raised. Video games and tabletop games alone account for more than 2 dollars out of every 10 dollars spent on Kickstarter.

## Projects: 
> On June 21, 2012, Kickstarter began publishing statistics on its projects. As of December 4, 2019, there were 469,286 launched projects (3,524 in progress), with a success rate of 37.45\% (the success rate being the share of projects that were successfully funded by reaching their set goal). The total amount pledged was 4,690,286,673 dollars.
On February 9, 2012, Kickstarter hit a number of milestones. A dock made for the iPhone designed by Casey Hopkins became the first Kickstarter project to exceed one million dollars in pledges. A few hours later, a new adventure game project started by computer game developers, Double Fine Productions, reached the same figure, having been launched less than 24 hours earlier, and finished with over 3 million dollars pledged. This was also the first time Kickstarter raised over a million dollars in pledges in a single day.


## Our Goal: 
> From 2012 to 2013, Wharton professor Ethan Mollick and Jeanne Pi conducted research into what contributes to a project's success or failure on Kickstarter. Some key findings from the analysis were that increasing goal size is negatively associated with success; that projects featured on the Kickstarter homepage have an 89\% chance of being successful, compared to 30\% without; and that for an average 10,000-dollar project, a 30-day project has a 35\% chance of success, while a 60-day project has a 29\% chance of success, all other things being constant.
The ten largest Kickstarter projects by funds raised are listed below. Among successful projects, most raise between 1,000 dollars and 9,999 dollars. These dollar amounts drop to less than half in the Design, Games, and Technology categories. However, the median amount raised for the latter two categories remains in the four-figure range. There is substantial variation in the success rate of projects falling under different categories. Over two thirds of completed dance projects have been successful. In contrast, fewer than 30\% of completed fashion projects have reached their goal. Most failing projects fail to achieve 20\% of their goals and this trend applies across all categories. Indeed, over 80\% of projects that pass the 20\% mark reach their goal.

**What is the current situation on Kickstarter? What are the most successful projects? In which categories? There are lots of questions that we can answer. Let's analyze the DataFrame!**

## Data Import  

First we load the DataFrame from the `.csv` file and inspect the data. In particular, we have to check the nature (categorical, numerical, etc.) of each column and the range of values (using the `describe()` method) or the unique values (using the `unique()` or `nunique()` method on each categorical column). This analysis will be done in `Notebook 02` and `03`, because we first need to process the `deadline` and `launched` columns.
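
As a rough illustration of the checks mentioned above, here is a minimal sketch on a five-row sample copied from the first rows of the dataset. The `sample` name and the choice of columns are assumptions made for this example; the actual analysis is left to the later notebooks.

```python
import pandas as pd

# Hand-made sample (hypothetical name) using values from the first rows of the dataset.
sample = pd.DataFrame({
    "main_category": ["Publishing", "Film & Video", "Film & Video", "Music", "Film & Video"],
    "state": ["failed", "failed", "failed", "failed", "canceled"],
    "goal": [1000.0, 30000.0, 45000.0, 5000.0, 19500.0],
    "deadline": ["2015-10-09", "2017-11-01", "2013-02-26", "2012-04-16", "2015-08-29"],
})

# Nature of each column: dates stored as text are reported as a text dtype
# (typically `object`), not as datetimes.
print(sample.dtypes)

# Range of values of the numerical columns.
print(sample.describe())

# Distinct values, and how many there are, for a categorical column.
print(sample["main_category"].unique())
print(sample["main_category"].nunique())
```

On the full DataFrame the same calls would be applied to `df_ks` once it is loaded.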

In [1]:
import numpy as np
import pandas as pd
import joblib
import pickle

In [2]:
df_ks = pd.read_csv('2018.csv')
df_ks.head()

Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real
0,1000002330,The Songs of Adelaide & Abullah,Poetry,Publishing,GBP,2015-10-09,1000.0,2015-08-11 12:12:28,0.0,failed,0,GB,0.0,0.0,1533.95
1,1000003930,Greeting From Earth: ZGAC Arts Capsule For ET,Narrative Film,Film & Video,USD,2017-11-01,30000.0,2017-09-02 04:43:57,2421.0,failed,15,US,100.0,2421.0,30000.0
2,1000004038,Where is Hank?,Narrative Film,Film & Video,USD,2013-02-26,45000.0,2013-01-12 00:20:50,220.0,failed,3,US,220.0,220.0,45000.0
3,1000007540,ToshiCapital Rekordz Needs Help to Complete Album,Music,Music,USD,2012-04-16,5000.0,2012-03-17 03:24:11,1.0,failed,1,US,1.0,1.0,5000.0
4,1000011046,Community Film Project: The Art of Neighborhoo...,Film & Video,Film & Video,USD,2015-08-29,19500.0,2015-07-04 08:35:03,1283.0,canceled,14,US,1283.0,1283.0,19500.0


In [3]:
df_ks.tail()

Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real
378656,999976400,ChknTruk Nationwide Charity Drive 2014 (Canceled),Documentary,Film & Video,USD,2014-10-17,50000.0,2014-09-17 02:35:30,25.0,canceled,1,US,25.0,25.0,50000.0
378657,999977640,The Tribe,Narrative Film,Film & Video,USD,2011-07-19,1500.0,2011-06-22 03:35:14,155.0,failed,5,US,155.0,155.0,1500.0
378658,999986353,Walls of Remedy- New lesbian Romantic Comedy f...,Narrative Film,Film & Video,USD,2010-08-16,15000.0,2010-07-01 19:40:30,20.0,failed,1,US,20.0,20.0,15000.0
378659,999987933,BioDefense Education Kit,Technology,Technology,USD,2016-02-13,15000.0,2016-01-13 18:13:53,200.0,failed,6,US,200.0,200.0,15000.0
378660,999988282,Nou Renmen Ayiti! We Love Haiti!,Performance Art,Art,USD,2011-08-16,2000.0,2011-07-19 09:07:47,524.0,failed,17,US,524.0,524.0,2000.0


In [4]:
df_ks.shape

(378661, 15)

## Managing and Data Cleaning

We noticed that the columns `deadline` and `launched` contain aggregated data (a full date, and for `launched` also a time, in a single string), so we split them into year, month and day to make the data easier to manage. We wrap the loop in `tqdm` to display a progress bar while the rows are processed. One of the main problems with these columns is that their dtype is `object` rather than `datetime`.

We also calculate the period during which the crowdfunding campaign was open, as the difference in days between the deadline and the launch date. Finally, we combine the resulting dataframes with the main dataframe, so that the complete data is ready for an effective and efficient analysis.

In [5]:
import datetime
from tqdm import tqdm

In [6]:
df_temp_deadline = pd.DataFrame(columns = ['Deadline_Year', 'MDeadline_Month', 'Deadline_Day'])
df_temp_launched = pd.DataFrame(columns = ['Launch_Year', 'Launch_Month', 'Launch_Day'])
df_temp_d = pd.DataFrame(columns = ['crowdfounding_period'])

for k in tqdm(range(len(df_ks))):
    # parse the deadline string into a datetime and store year, month, day
    date1 = datetime.datetime.strptime(str(df_ks['deadline'][k]), "%Y-%m-%d")
    df_temp_deadline.loc[k] = [date1.year, date1.month, date1.day]
    # parse the launch timestamp into a datetime and store year, month, day
    date2 = datetime.datetime.strptime(str(df_ks['launched'][k]), "%Y-%m-%d %H:%M:%S")
    df_temp_launched.loc[k] = [date2.year, date2.month, date2.day]
    
    # number of whole days during which the crowdfunding campaign was open
    days_open = int((date1 - date2).days)
    df_temp_d.loc[k] = [days_open]

100%|██████████| 378661/378661 [6:56:27<00:00,  8.13it/s]  
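
The row-by-row loop above took almost seven hours, most likely because each `.loc[k] = ...` assignment enlarges a DataFrame one row at a time. A possible alternative, shown here only as a sketch on three sample rows and not as the method used in this notebook, is to parse whole columns with `pd.to_datetime` and derive the parts through the `.dt` accessor. The column names are kept exactly as in the notebook; the resulting dtypes may differ from those produced by the loop.

```python
import pandas as pd

# Three sample rows copied from the head of the dataset (not the full file).
df = pd.DataFrame({
    "deadline": ["2015-10-09", "2017-11-01", "2013-02-26"],
    "launched": ["2015-08-11 12:12:28", "2017-09-02 04:43:57", "2013-01-12 00:20:50"],
})

# Parse each column in a single vectorized call instead of one row at a time.
deadline = pd.to_datetime(df["deadline"], format="%Y-%m-%d")
launched = pd.to_datetime(df["launched"], format="%Y-%m-%d %H:%M:%S")

# Same derived columns as the loop; names are spelled as in the notebook.
df["Launch_Year"] = launched.dt.year
df["Launch_Month"] = launched.dt.month
df["Launch_Day"] = launched.dt.day
df["Deadline_Year"] = deadline.dt.year
df["MDeadline_Month"] = deadline.dt.month
df["Deadline_Day"] = deadline.dt.day

# Whole days between launch and deadline, as in the loop's `.days` computation.
df["crowdfounding_period"] = (deadline - launched).dt.days

print(df[["Launch_Year", "Deadline_Year", "crowdfounding_period"]])
```

For these three rows the computed periods are 58, 59 and 44 days, matching the values produced by the loop.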


In [7]:
df_temp_deadline.head()

Unnamed: 0,Deadline_Year,MDeadline_Month,Deadline_Day
0,2015,10,9
1,2017,11,1
2,2013,2,26
3,2012,4,16
4,2015,8,29


In [8]:
df_temp_launched.head()

Unnamed: 0,Launch_Year,Launch_Month,Launch_Day
0,2015,8,11
1,2017,9,2
2,2013,1,12
3,2012,3,17
4,2015,7,4


In [9]:
df_temp_d.head()

Unnamed: 0,crowdfounding_period
0,58
1,59
2,44
3,29
4,55


In [10]:
# join the original data and the three derived dataframes column-wise
df_kickstarter = pd.concat([df_ks, df_temp_launched, df_temp_deadline, df_temp_d], axis = 1)
df_kickstarter.head()

Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,...,usd pledged,usd_pledged_real,usd_goal_real,Launch_Year,Launch_Month,Launch_Day,Deadline_Year,MDeadline_Month,Deadline_Day,crowdfounding_period
0,1000002330,The Songs of Adelaide & Abullah,Poetry,Publishing,GBP,2015-10-09,1000.0,2015-08-11 12:12:28,0.0,failed,...,0.0,0.0,1533.95,2015,8,11,2015,10,9,58
1,1000003930,Greeting From Earth: ZGAC Arts Capsule For ET,Narrative Film,Film & Video,USD,2017-11-01,30000.0,2017-09-02 04:43:57,2421.0,failed,...,100.0,2421.0,30000.0,2017,9,2,2017,11,1,59
2,1000004038,Where is Hank?,Narrative Film,Film & Video,USD,2013-02-26,45000.0,2013-01-12 00:20:50,220.0,failed,...,220.0,220.0,45000.0,2013,1,12,2013,2,26,44
3,1000007540,ToshiCapital Rekordz Needs Help to Complete Album,Music,Music,USD,2012-04-16,5000.0,2012-03-17 03:24:11,1.0,failed,...,1.0,1.0,5000.0,2012,3,17,2012,4,16,29
4,1000011046,Community Film Project: The Art of Neighborhoo...,Film & Video,Film & Video,USD,2015-08-29,19500.0,2015-07-04 08:35:03,1283.0,canceled,...,1283.0,1283.0,19500.0,2015,7,4,2015,8,29,55


## Data Dump 

In [11]:
import os

# create the output folder; exist_ok avoids an error when the cell is re-run
savedir = 'Kickstarter_Dataframe'
os.makedirs(savedir, exist_ok=True)

In [12]:
filename = os.path.join(savedir, 'Kikstarter_Backup_File')

with open(filename,'wb') as r:
    joblib.dump(df_kickstarter , r)

## Conclusion of Notebook 01

>One of the main problems of our dataframe is its size: building the date columns above took almost seven hours, so we dump the processed dataframe to disk instead of recomputing it every time.
We first create the folder in which the dump is saved; we adopted this approach because the computation times would otherwise not let us work efficiently.
We use the Joblib library to save the dataframe into a file called `Kikstarter_Backup_File` inside the directory `'Kickstarter_Dataframe'`.
The file is opened in `"wb"` mode because it is a binary file. It cannot be read like a normal text file, so it has to be loaded back with joblib.
We can then open a new Notebook, check that the dump of the binary file went fine, and keep working on our project.
We adopt this way of operating in all the following Notebooks because it is more efficient, saves us a lot of time, and lets us parallelize our work.
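
The check described above (dumping and then reloading the file) could look roughly like the following sketch. It uses a small stand-in dataframe and a temporary directory, both assumptions made for this example, so that it does not touch the real backup file; in a new notebook, `joblib.load` would instead be pointed at the file inside `Kickstarter_Dataframe`.

```python
import os
import tempfile

import joblib
import pandas as pd

# Small stand-in for df_kickstarter (hypothetical data for the example).
df_small = pd.DataFrame({
    "ID": [1000002330, 1000003930],
    "state": ["failed", "failed"],
    "crowdfounding_period": [58, 59],
})

# Dump to a temporary folder and load the file back with joblib.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "Kikstarter_Backup_File")
    joblib.dump(df_small, path)
    df_loaded = joblib.load(path)

# The reloaded dataframe should be identical to the one that was dumped.
pd.testing.assert_frame_equal(df_small, df_loaded)
print(df_loaded.shape)
```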