In [3]:
"""
Kickstarter is a crowdfunding platform with a community of more than 10 million people, made up of creative and tech enthusiasts 
who help bring new projects to life.

To date, members have contributed more than $3 billion to fund creative projects. A project can be 
almost anything: a device, a game, an app, a film, etc.

Kickstarter works on an all-or-nothing basis: a campaign is launched with a target amount, and if it doesn’t meet 
its goal, the project owner gets nothing. For example, if a project’s goal is $5000 and it raises only $4999, the 
project is not a success.

If you have a project that you would like to post on Kickstarter now, can you predict whether it will be successfully funded or
not? Looking into the dataset, what useful information can you extract from it? Which variables are informative for your 
prediction, and can you interpret the model?

The goal of this project is to build a classifier to predict whether a project will be successfully funded or not. You can 
use the algorithm of your choice.

Notes on the dataset:

The target, 'state', is a binary outcome: 0 for failed, 1 for successful.
The variables 'deadline', 'created_at', and 'launched_at' are stored in Unix time format.

Get the data
The data provided consists of X_train.csv and y_train.csv that contain the data you have available for training.

In a notebook, simply run:

import pandas as pd

X_train = pd.read_csv("data/X_train.csv")
y_train = pd.read_csv("data/y_train.csv")
to load the dataset.

We also provide an X_test.csv that you can use to make sure your model generates predictions properly.

Start working
You will need to implement the function build_model in model.py. This should return a model as a scikit-learn Pipeline 
object where the first stage is a transformer called preprocessor and the second stage a predictive model called model. 
You can then use run.py to train your model and save its state to a file.

python run.py train
trains your model on the training set and saves it as a pickle file.

python run.py test
can be used locally to generate some dummy predictions on X_test and make sure your model works.
"""


In [4]:
# Baseline Model
# Here is an example of a submission, building a simple decision tree with only two features: goal_usd (the goal converted
# to USD via static_usd_rate) and is_usa (whether the campaign's country is the US)

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


In [5]:
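class Preprocessor(BaseEstimator, TransformerMixin):
    """Stateless transformer that derives the two baseline features."""

    def fit(self, X, y=None):
        # Nothing is learned from the data; return self to satisfy the scikit-learn API.
        return self

    def transform(self, X, y=None):
        # Flag campaigns whose country is the US.
        is_usa = X["country"] == "US"
        # Convert the goal from its original currency to USD.
        goal_usd = X["goal"] * X["static_usd_rate"]

        return pd.DataFrame({"is_usa": is_usa, "goal_usd": goal_usd})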
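
The dataset notes say that 'deadline', 'created_at', and 'launched_at' are stored in Unix time, so a transformer in the same style as Preprocessor could turn them into durations. What follows is only a sketch under assumptions: the class name DateFeatures, the output column names, and the choice to measure durations in days are hypothetical, and the timestamps are assumed to be in seconds.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DateFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: derives durations in days from Unix timestamps."""

    def fit(self, X, y=None):
        # Stateless, like the baseline Preprocessor.
        return self

    def transform(self, X, y=None):
        # Assumption: the three columns hold Unix time in seconds.
        created = pd.to_datetime(X["created_at"], unit="s")
        launched = pd.to_datetime(X["launched_at"], unit="s")
        deadline = pd.to_datetime(X["deadline"], unit="s")

        seconds_per_day = 24 * 60 * 60
        return pd.DataFrame(
            {
                # Length of the funding window.
                "campaign_days": (deadline - launched).dt.total_seconds() / seconds_per_day,
                # Time between creating the project page and launching it.
                "prep_days": (launched - created).dt.total_seconds() / seconds_per_day,
            }
        )


# Made-up timestamps for illustration only.
toy = pd.DataFrame(
    {
        "created_at": [1_500_000_000],
        "launched_at": [1_500_000_000 + 5 * 86_400],
        "deadline": [1_500_000_000 + 35 * 86_400],
    }
)
date_features = DateFeatures().transform(toy)
print(date_features)
```

Whether such duration features actually help the classifier is something to check on the training data; the sketch only shows the conversion.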


In [6]:
def build_model():
    """Return a Pipeline with a 'preprocessor' stage followed by a 'model' stage."""
    preprocessor = Preprocessor()
    model = DecisionTreeClassifier(max_depth=3)
    return Pipeline([("preprocessor", preprocessor), ("model", model)])
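
To check that the pipeline behaves as the task description requires (a first stage named preprocessor and a second stage named model), it can be fitted on a few rows. The snippet below is a minimal sketch: the rows in X_toy and y_toy are made up for illustration and are not taken from the dataset, and the class and function are repeated so that the snippet runs on its own.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


# Repeated from the cells above so that this snippet is self-contained.
class Preprocessor(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        is_usa = X["country"] == "US"
        goal_usd = X["goal"] * X["static_usd_rate"]
        return pd.DataFrame({"is_usa": is_usa, "goal_usd": goal_usd})


def build_model():
    return Pipeline(
        [("preprocessor", Preprocessor()), ("model", DecisionTreeClassifier(max_depth=3))]
    )


# Made-up rows for illustration only; the real data comes from data/X_train.csv.
X_toy = pd.DataFrame(
    {
        "country": ["US", "GB", "US", "DE"],
        "goal": [1000.0, 50000.0, 2500.0, 80000.0],
        "static_usd_rate": [1.0, 1.3, 1.0, 1.1],
    }
)
y_toy = pd.Series([1, 0, 1, 0], name="state")

pipeline = build_model()
pipeline.fit(X_toy, y_toy)

# One 0/1 prediction per input row.
predictions = pipeline.predict(X_toy)
step_names = [name for name, _ in pipeline.steps]
print(step_names, predictions)
```

With the real data, X_toy and y_toy would be replaced by the frames loaded from X_train.csv and y_train.csv.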