## Kickstarter Project (Machine Learning Project) - October 2023

<a id='about'/>

### About this file

Kickstarter is a popular crowdfunding platform that has helped thousands of entrepreneurs and creators bring their innovative ideas to life. However, not all Kickstarter projects are successful, and understanding the factors that contribute to success or failure can be valuable for both creators and investors alike.

This dataset contains information on 374,853 Kickstarter projects, including whether each one ultimately met its funding goal. It covers a wide range of project types, including technology startups, creative arts endeavors, and social impact initiatives, among others.

By analyzing this dataset, researchers and analysts can gain insights into the characteristics of successful and unsuccessful Kickstarter projects, such as funding goals, project categories, and number of backers. This information can be used to inform investment decisions and guide future crowdfunding campaigns.

Overall, this dataset provides a comprehensive look at the Kickstarter ecosystem and can serve as a valuable resource for anyone interested in understanding the dynamics of crowdfunding and the factors that contribute to project success or failure.

### Table of contents
0. [About this file](#about)
1. [Load packages](#loading_packages)
2. [Load data](#data_loading)
3. [EDA technical](#EDA_technical)
4. [EDA information](#EDA_info)
5. [Feature engineering](#feature_engineering)
6. [Dummy Classifier](#dummy)
7. [Pipeline + ColumnTransformer](#pipeline)
8. [Evaluation](#evaluation)
9. [Additional Links](#links)
---
10. [Cross-validation](#crossval)
11. [GridSearchCV](#gridsearch)
12. [Set Kaggle solution](#kaggle)

In [1]:
import pandas as pd

In [2]:
kickstarter = pd.read_csv('data/kickstarter_projects.csv')

In [3]:
kickstarter.head()

|   | ID | Name | Category | Subcategory | Country | Launched | Deadline | Goal | Pledged | Backers | State |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1860890148 | Grace Jones Does Not Give A F$#% T-Shirt (limi... | Fashion | Fashion | United States | 2009-04-21 21:02:48 | 2009-05-31 | 1000 | 625 | 30 | Failed |
| 1 | 709707365 | CRYSTAL ANTLERS UNTITLED MOVIE | Film & Video | Shorts | United States | 2009-04-23 00:07:53 | 2009-07-20 | 80000 | 22 | 3 | Failed |
| 2 | 1703704063 | drawing for dollars | Art | Illustration | United States | 2009-04-24 21:52:03 | 2009-05-03 | 20 | 35 | 3 | Successful |
| 3 | 727286 | Offline Wikipedia iPhone app | Technology | Software | United States | 2009-04-25 17:36:21 | 2009-07-14 | 99 | 145 | 25 | Successful |
| 4 | 1622952265 | Pantshirts | Fashion | Fashion | United States | 2009-04-27 14:10:39 | 2009-05-26 | 1900 | 387 | 10 | Failed |


In [4]:
kickstarter.shape

(374853, 11)

In [5]:
kickstarter.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374853 entries, 0 to 374852
Data columns (total 11 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   ID           374853 non-null  int64 
 1   Name         374853 non-null  object
 2   Category     374853 non-null  object
 3   Subcategory  374853 non-null  object
 4   Country      374853 non-null  object
 5   Launched     374853 non-null  object
 6   Deadline     374853 non-null  object
 7   Goal         374853 non-null  int64 
 8   Pledged      374853 non-null  int64 
 9   Backers      374853 non-null  int64 
 10  State        374853 non-null  object
dtypes: int64(4), object(7)
memory usage: 31.5+ MB
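The `info()` output shows `Launched` and `Deadline` stored as `object` (plain strings), so date arithmetic is not available yet. A minimal sketch of converting them, using a small stand-in frame built from the first rows of `head()`; the `sample` frame and the `duration_days` column are illustrative names, not part of the original notebook:

```python
import pandas as pd

# Stand-in for the full `kickstarter` DataFrame: first three rows from head().
sample = pd.DataFrame({
    "Launched": ["2009-04-21 21:02:48", "2009-04-23 00:07:53", "2009-04-24 21:52:03"],
    "Deadline": ["2009-05-31", "2009-07-20", "2009-05-03"],
})

# Convert the string columns to datetime64 so they support date arithmetic.
sample["Launched"] = pd.to_datetime(sample["Launched"])
sample["Deadline"] = pd.to_datetime(sample["Deadline"])

# Hypothetical derived column: campaign length in whole days.
sample["duration_days"] = (sample["Deadline"] - sample["Launched"]).dt.days

print(sample.dtypes)
print(sample["duration_days"].tolist())
```

The same two `pd.to_datetime` calls should apply to the full frame; passing `parse_dates=['Launched', 'Deadline']` to `pd.read_csv` is an alternative.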


In [6]:
kickstarter.describe(include='all')

|   | ID | Name | Category | Subcategory | Country | Launched | Deadline | Goal | Pledged | Backers | State |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 374853.0 | 374853 | 374853 | 374853 | 374853 | 374853 | 374853 | 374853.0 | 374853.0 | 374853.0 | 374853 |
| unique |  | 372061 | 15 | 159 | 22 | 374297 | 3164 |  |  |  | 5 |
| top |  | New EP/Music Development | Film & Video | Product Design | United States | 2014-06-06 16:16:32 | 2014-08-08 |  |  |  | Failed |
| freq |  | 13 | 62694 | 22310 | 292618 | 2 | 702 |  |  |  | 197611 |
| mean | 1074656000.0 |  |  |  |  |  |  | 45863.78 | 9121.073 | 106.690359 |  |
| std | 619137700.0 |  |  |  |  |  |  | 1158778.0 | 91320.54 | 911.71852 |  |
| min | 5971.0 |  |  |  |  |  |  | 0.0 | 0.0 | 0.0 |  |
| 25% | 538072800.0 |  |  |  |  |  |  | 2000.0 | 31.0 | 2.0 |  |
| 50% | 1075300000.0 |  |  |  |  |  |  | 5500.0 | 625.0 | 12.0 |  |
| 75% | 1610149000.0 |  |  |  |  |  |  | 16000.0 | 4051.0 | 57.0 |  |
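The `top` and `freq` rows for `State` already give a useful reference point: `Failed` is the most frequent state, with 197611 of 374853 projects. The arithmetic below works out that share, which is roughly the accuracy a classifier would reach by always predicting the most frequent state (assuming all five states are kept as the target). The variable names are illustrative:

```python
# Counts taken from the describe() output above.
n_projects = 374853   # count
n_failed = 197611     # freq of the top State value, 'Failed'

# Share of projects in the most frequent state.
failed_share = n_failed / n_projects

print(f"Share of 'Failed' projects: {failed_share:.3f}")
```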


In [7]:
kickstarter['State'].unique()

array(['Failed', 'Successful', 'Canceled', 'Suspended', 'Live'],
      dtype=object)
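`State` has five distinct values, but only `Failed` and `Successful` describe a finished campaign with a clear outcome. One possible way to get a binary target is sketched below; restricting to these two states is an assumption made here for illustration, not a step confirmed by the notebook. The `sample` frame and the `is_successful` column are hypothetical names:

```python
import pandas as pd

# Stand-in for kickstarter['State'] with a mix of the five observed values.
sample = pd.DataFrame({
    "State": ["Failed", "Failed", "Successful", "Successful",
              "Failed", "Canceled", "Live"],
})

# Keep only campaigns with a definite outcome (assumption, see above).
finished = sample[sample["State"].isin(["Failed", "Successful"])].copy()

# Encode the outcome as 1 (Successful) / 0 (Failed).
finished["is_successful"] = (finished["State"] == "Successful").astype(int)

print(finished["is_successful"].value_counts())
```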

In [8]:
kickstarter.columns

Index(['ID', 'Name', 'Category', 'Subcategory', 'Country', 'Launched',
       'Deadline', 'Goal', 'Pledged', 'Backers', 'State'],
      dtype='object')

#### Columns:

- 'ID':             Unique project identifier
- 'Name':           Name of the project
- 'Category':       Main project category
- 'Subcategory':    Project subcategory within the main category
- 'Country':        Country of product origin
- 'Launched':       Date the project was launched
- 'Deadline':       Deadline for crowdfunding
- 'Goal':           Amount of money the creator needs to complete the project (USD)
- 'Pledged':        Amount of money pledged by the crowd (USD)
- 'Backers':        Number of backers
- 'State':          Current condition the project is in (as of 2018-01-02) ('Failed', 'Successful', 'Canceled', 'Suspended', 'Live')
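Taken together, the descriptions of `Goal`, `Pledged`, and `State` suggest that a project is `Successful` when the pledged amount reaches the goal. The sketch below checks this on the five rows shown by `head()` only; whether it holds for the whole dataset is not verified here. `head_rows` and `goal_met` are illustrative names:

```python
import pandas as pd

# The five rows printed by kickstarter.head(), reduced to the relevant columns.
head_rows = pd.DataFrame({
    "Goal": [1000, 80000, 20, 99, 1900],
    "Pledged": [625, 22, 35, 145, 387],
    "State": ["Failed", "Failed", "Successful", "Successful", "Failed"],
})

# Hypothetical flag: did the pledged amount reach the goal?
head_rows["goal_met"] = head_rows["Pledged"] >= head_rows["Goal"]

# Compare the flag with the recorded state.
agrees = (head_rows["goal_met"] == (head_rows["State"] == "Successful")).all()

print(head_rows)
print("goal_met matches State on these rows:", agrees)
```

If this relationship holds more broadly, `Pledged` (and likely `Backers`) would only be known once a campaign is over, which is worth keeping in mind when choosing features for a model that predicts `State`.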