# Team 3 - Kickstarter

![](https://a.kickstarter.com/assets/site/social/og-kickstarter-social-d58bfe030adf82001e25d3f7015eedb8ab84bc4bf9eeeeede5f8d8b0d02d641a.png)

_For more information about the dataset, read [here](https://www.kaggle.com/kemical/kickstarter-projects)._

## Your tasks
- Name your team!
- Read the source and do some quick research to understand more about the dataset and its topic
- Clean the data
- Perform Exploratory Data Analysis on the dataset
- Analyze the data more deeply and extract insights
- Visualize your analysis on Google Data Studio
- Present your work in front of the class and guests next Monday

## Submission Guide
- Create a GitHub repository for your project
- Upload the dataset (.csv file) and the Jupyter Notebook to your GitHub repository. In the Jupyter Notebook, **include the link to your Google Data Studio report**.
- Submit your work through this [Google Form](https://forms.gle/oxtXpGfS8JapVj3V8).

## Tips for Data Cleaning, Manipulation & Visualization
- See [our tips for Data Cleaning, Manipulation & Visualization](https://hackmd.io/cBNV7E6TT2WMliQC-GTw1A).

_____________________________

## Some Hints for This Dataset:
- The format of the `launched` column is not consistent with the `deadline` column
- Can you calculate the duration of each project?
- Some projects have year = 1970 in the `launched` and `deadline` columns, which is long before Kickstarter existed!
- And more...
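
The first and third hints can be checked once both columns are parsed as dates. The sketch below is illustrative only: it runs on a tiny hand-made frame (`sample`) rather than the real CSV, the time of day on the second row is made up, and the cutoff `first_plausible_year` is an assumption based on Kickstarter having launched in 2009.

```python
import pandas as pd

# Tiny stand-in for the real dataset; the 1970 row's time of day is invented.
sample = pd.DataFrame({
    "launched": ["2015-03-03 00:05:06", "1970-01-01 01:00:00"],
    "deadline": ["2015-05-02", "2010-09-15"],
})

# Parsing both columns gives them the same dtype, even though
# `launched` carries a time of day and `deadline` does not.
for col in ["launched", "deadline"]:
    sample[col] = pd.to_datetime(sample[col])

# Assumed cutoff: Kickstarter launched in 2009, so earlier years are suspect.
first_plausible_year = 2009
too_early = sample[
    (sample["launched"].dt.year < first_plausible_year)
    | (sample["deadline"].dt.year < first_plausible_year)
]
print(too_early)
```

Running the same filter on the full frame would show which of the two columns actually contains the 1970 values.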


In [0]:
# Start your code here!
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [4]:
from google.colab import drive 
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [5]:
df = pd.read_csv('/content/gdrive/My Drive/03-kickstarter/kickstarter.csv')
df.sample(5)

Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real
335220,777703681,eSports Cafe and lounge (Canceled),Restaurants,Food,USD,2015-05-02,30000.0,2015-03-03 00:05:06,0.0,canceled,0,US,0.0,0.0,30000.0
287917,534903130,The Best Of All Possible Worlds - James Lovelo...,Documentary,Film & Video,GBP,2013-05-02,20000.0,2013-04-22 19:23:11,136.0,canceled,5,GB,207.14,211.54,31109.04
15311,1077327402,One Wedding for Each Country. Contrasting cult...,People,Photography,EUR,2016-04-14,3900.0,2016-02-29 16:57:46,145.0,failed,4,ES,158.52,163.62,4400.76
78403,1398806367,The Gender Series,Webseries,Film & Video,USD,2016-07-01,1250.0,2016-06-01 14:18:36,321.0,failed,11,US,321.0,321.0,1250.0
88550,1450157092,"BOOST - Ultimate Solution for Your MacBook 12""",Product Design,Design,USD,2017-03-06,15000.0,2017-01-10 16:09:25,25867.0,successful,188,US,9224.0,25867.0,15000.0


In [6]:
# Print out brief info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 378661 entries, 0 to 378660
Data columns (total 15 columns):
ID                  378661 non-null int64
name                378657 non-null object
category            378661 non-null object
main_category       378661 non-null object
currency            378661 non-null object
deadline            378661 non-null object
goal                378661 non-null float64
launched            378661 non-null object
pledged             378661 non-null float64
state               378661 non-null object
backers             378661 non-null int64
country             378661 non-null object
usd pledged         374864 non-null float64
usd_pledged_real    378661 non-null float64
usd_goal_real       378661 non-null float64
dtypes: float64(5), int64(2), object(8)
memory usage: 43.3+ MB
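
The `df.info()` output shows that `name` (378657 non-null) and `usd pledged` (374864 non-null) have fewer entries than the 378661 rows, so 4 and 3797 values are missing respectively. A quick way to list only the columns with gaps is sketched below on a toy frame; `toy` and `missing` are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame that mimics the pattern: two columns with gaps, one complete.
toy = pd.DataFrame({
    "name": ["A", None, "C"],
    "usd pledged": [1.0, np.nan, np.nan],
    "usd_pledged_real": [1.0, 2.0, 3.0],
})

# Count missing values per column and keep only columns that have any.
missing = toy.isna().sum()
missing = missing[missing > 0]
print(missing)
```

On the real data the same two lines would be applied to `df` instead of `toy`.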


In [0]:
# Keep only the date part of `launched` so it matches the `deadline` format
df["launched"] = df["launched"].str.split(' ').str[0]

In [9]:
df.sample(5)

Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real
287912,534886879,Fundraiser For New Gaming Channel/Vlog Channel,Film & Video,Film & Video,AUD,2015-05-03,4000.0,2015-03-04,0.0,undefined,0,"N,0""",,0.0,3165.06
263675,410878379,Oozeq makes Hollow Polymer Clay easy - bake in...,Crafts,Crafts,USD,2014-08-17,25000.0,2014-08-07,786.0,failed,15,US,786.0,786.0,25000.0
21179,1107367686,High Flight Apparel,Apparel,Fashion,USD,2015-03-14,25000.0,2015-02-12,5.0,failed,1,US,5.0,5.0,25000.0
251344,348425271,WICHITA - teaser / trailer,Shorts,Film & Video,USD,2011-12-13,6000.0,2011-11-02,8000.0,successful,101,US,8000.0,8000.0,6000.0
119055,1604793462,Striker Ti Fire Survival Paracord Carabiner / ...,Product Design,Design,USD,2015-09-13,1111.0,2015-08-11,2561.0,successful,98,US,2561.0,2561.0,1111.0


In [0]:
# Extract the year from each date string (format: YYYY-MM-DD)
df["launchedYear"] = df["launched"].str.split('-').str[0].astype(int)
df["deadlineYear"] = df["deadline"].str.split('-').str[0].astype(int)

In [11]:
df.sample(5)

Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real,launchedYear,deadlineYear
217252,2106803541,Apocalypse Over Pancakes,Shorts,Film & Video,USD,2012-04-19,17000.0,2012-03-20,21686.66,successful,191,US,21686.66,21686.66,17000.0,2012,2012
337479,789059918,A new album from The Great Unknowns,Music,Music,USD,2011-04-21,4000.0,2011-03-11,8612.0,successful,149,US,8612.0,8612.0,4000.0,2011,2011
340034,801959608,"The Censored Game - Big, Hard and Funny-Looking",Tabletop Games,Games,USD,2016-05-19,18500.0,2016-04-19,14026.0,failed,118,US,14026.0,14026.0,18500.0,2016,2016
110101,1559453145,Millennium Sapphire USA Tour,Sculpture,Art,USD,2016-06-20,500000.0,2016-05-21,11.0,failed,2,US,11.0,11.0,500000.0,2016,2016
282764,508746115,The Mob CD,Rock,Music,USD,2015-12-10,6500.0,2015-11-10,0.0,failed,0,US,0.0,0.0,6500.0,2015,2015


In [13]:
df[df["launchedYear"]==1970]

Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real,launchedYear,deadlineYear
2842,1014746686,Salt of the Earth: A Dead Sea Movie (Canceled),Film & Video,Film & Video,USD,2010-09-15,5000.0,1970-01-01,0.0,canceled,0,US,0.0,0.0,5000.0,1970,2010
48147,1245461087,1st Super-Size Painting - Social Network Owned...,Art,Art,USD,2010-08-14,15000.0,1970-01-01,0.0,canceled,0,US,0.0,0.0,15000.0,1970,2010
75397,1384087152,"""ICHOR"" (Canceled)",Film & Video,Film & Video,USD,2010-05-21,700.0,1970-01-01,0.0,canceled,0,US,0.0,0.0,700.0,1970,2010
94579,1480763647,"Support Solo Theater! Help ""Ungrateful Daughte...",Theater,Theater,USD,2010-06-01,4000.0,1970-01-01,0.0,canceled,0,US,0.0,0.0,4000.0,1970,2010
247913,330942060,"Help RIZ Make A Charity Album: 8 Songs, 8 Caus...",Music,Music,USD,2010-05-04,10000.0,1970-01-01,0.0,canceled,0,US,0.0,0.0,10000.0,1970,2010
273779,462917959,Identity Communications Infographic (Canceled),Design,Design,USD,2010-04-10,500.0,1970-01-01,0.0,canceled,0,US,0.0,0.0,500.0,1970,2010
319002,69489148,Student Auditions Music 2015,Publishing,Publishing,CHF,2015-10-31,1900.0,1970-01-01,0.0,suspended,0,CH,0.0,0.0,1905.97,1970,2015


In [0]:
# Drop the 7 rows listed above whose launch year is 1970
df = df[df["launchedYear"] != 1970]

In [19]:
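from datetime import date

# Duration of the first remaining project. .iloc[0] selects by position,
# so it does not depend on which index labels are present in df.
first = df.iloc[0]
temp = date(*map(int, first["deadline"].split('-'))) - date(*map(int, first["launched"].split('-')))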
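
To get the duration of every project at once rather than one row at a time, the subtraction can be vectorized with pandas. This is a minimal sketch, assuming `launched` has already been trimmed to `YYYY-MM-DD` as in the cells above; the frame `projects` and the column `duration_days` are illustrative names, and the dates are taken from the first sample shown earlier.

```python
import pandas as pd

# Launch and deadline dates from three rows in the first df.sample(5) output.
projects = pd.DataFrame({
    "launched": ["2015-03-03", "2013-04-22", "2016-02-29"],
    "deadline": ["2015-05-02", "2013-05-02", "2016-04-14"],
})

# Parse both columns with an explicit format, then subtract whole columns.
launched = pd.to_datetime(projects["launched"], format="%Y-%m-%d")
deadline = pd.to_datetime(projects["deadline"], format="%Y-%m-%d")

# The difference is a Timedelta series; .dt.days converts it to integer days.
projects["duration_days"] = (deadline - launched).dt.days
print(projects)
```

Keeping the parsed dates in separate variables leaves the original string columns untouched, which matters here because earlier cells split those strings on `-`.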