## Kickstarter Success Prediction

Given *data about various Kickstarter campaigns*, let's try to classify whether a given campaign will be **successful** or not.

We will use a small feed-forward neural network (ANN) built with TensorFlow/Keras to make our predictions.

Data source: https://www.kaggle.com/datasets/kemical/kickstarter-projects

### Importing Libraries

In [1]:
import numpy as np
import pandas as pd

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.utils import class_weight

import tensorflow as tf



In [2]:
data = pd.read_csv('ks-projects-201801.csv')
data

Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real
0,1000002330,The Songs of Adelaide & Abullah,Poetry,Publishing,GBP,2015-10-09,1000.0,2015-08-11 12:12:28,0.0,failed,0,GB,0.0,0.0,1533.95
1,1000003930,Greeting From Earth: ZGAC Arts Capsule For ET,Narrative Film,Film & Video,USD,2017-11-01,30000.0,2017-09-02 04:43:57,2421.0,failed,15,US,100.0,2421.0,30000.00
2,1000004038,Where is Hank?,Narrative Film,Film & Video,USD,2013-02-26,45000.0,2013-01-12 00:20:50,220.0,failed,3,US,220.0,220.0,45000.00
3,1000007540,ToshiCapital Rekordz Needs Help to Complete Album,Music,Music,USD,2012-04-16,5000.0,2012-03-17 03:24:11,1.0,failed,1,US,1.0,1.0,5000.00
4,1000011046,Community Film Project: The Art of Neighborhoo...,Film & Video,Film & Video,USD,2015-08-29,19500.0,2015-07-04 08:35:03,1283.0,canceled,14,US,1283.0,1283.0,19500.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
378656,999976400,ChknTruk Nationwide Charity Drive 2014 (Canceled),Documentary,Film & Video,USD,2014-10-17,50000.0,2014-09-17 02:35:30,25.0,canceled,1,US,25.0,25.0,50000.00
378657,999977640,The Tribe,Narrative Film,Film & Video,USD,2011-07-19,1500.0,2011-06-22 03:35:14,155.0,failed,5,US,155.0,155.0,1500.00
378658,999986353,Walls of Remedy- New lesbian Romantic Comedy f...,Narrative Film,Film & Video,USD,2010-08-16,15000.0,2010-07-01 19:40:30,20.0,failed,1,US,20.0,20.0,15000.00
378659,999987933,BioDefense Education Kit,Technology,Technology,USD,2016-02-13,15000.0,2016-01-13 18:13:53,200.0,failed,6,US,200.0,200.0,15000.00


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 378661 entries, 0 to 378660
Data columns (total 15 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   ID                378661 non-null  int64  
 1   name              378657 non-null  object 
 2   category          378661 non-null  object 
 3   main_category     378661 non-null  object 
 4   currency          378661 non-null  object 
 5   deadline          378661 non-null  object 
 6   goal              378661 non-null  float64
 7   launched          378661 non-null  object 
 8   pledged           378661 non-null  float64
 9   state             378661 non-null  object 
 10  backers           378661 non-null  int64  
 11  country           378661 non-null  object 
 12  usd pledged       374864 non-null  float64
 13  usd_pledged_real  378661 non-null  float64
 14  usd_goal_real     378661 non-null  float64
dtypes: float64(5), int64(2), object(8)
memory usage: 43.3+ MB


### Cleaning and Preprocessing

In [4]:
unneeded_columns = ['ID', 'name']

data = data.drop(unneeded_columns, axis=1)

In [5]:
data

Unnamed: 0,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real
0,Poetry,Publishing,GBP,2015-10-09,1000.0,2015-08-11 12:12:28,0.0,failed,0,GB,0.0,0.0,1533.95
1,Narrative Film,Film & Video,USD,2017-11-01,30000.0,2017-09-02 04:43:57,2421.0,failed,15,US,100.0,2421.0,30000.00
2,Narrative Film,Film & Video,USD,2013-02-26,45000.0,2013-01-12 00:20:50,220.0,failed,3,US,220.0,220.0,45000.00
3,Music,Music,USD,2012-04-16,5000.0,2012-03-17 03:24:11,1.0,failed,1,US,1.0,1.0,5000.00
4,Film & Video,Film & Video,USD,2015-08-29,19500.0,2015-07-04 08:35:03,1283.0,canceled,14,US,1283.0,1283.0,19500.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
378656,Documentary,Film & Video,USD,2014-10-17,50000.0,2014-09-17 02:35:30,25.0,canceled,1,US,25.0,25.0,50000.00
378657,Narrative Film,Film & Video,USD,2011-07-19,1500.0,2011-06-22 03:35:14,155.0,failed,5,US,155.0,155.0,1500.00
378658,Narrative Film,Film & Video,USD,2010-08-16,15000.0,2010-07-01 19:40:30,20.0,failed,1,US,20.0,20.0,15000.00
378659,Technology,Technology,USD,2016-02-13,15000.0,2016-01-13 18:13:53,200.0,failed,6,US,200.0,200.0,15000.00


In [6]:
data.isna().sum()

category               0
main_category          0
currency               0
deadline               0
goal                   0
launched               0
pledged                0
state                  0
backers                0
country                0
usd pledged         3797
usd_pledged_real       0
usd_goal_real          0
dtype: int64

In [8]:
data['usd pledged'] = data['usd pledged'].fillna(data['usd pledged'].mean())

In [9]:
data.isna().sum().sum()

0

In [10]:
data

Unnamed: 0,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real
0,Poetry,Publishing,GBP,2015-10-09,1000.0,2015-08-11 12:12:28,0.0,failed,0,GB,0.0,0.0,1533.95
1,Narrative Film,Film & Video,USD,2017-11-01,30000.0,2017-09-02 04:43:57,2421.0,failed,15,US,100.0,2421.0,30000.00
2,Narrative Film,Film & Video,USD,2013-02-26,45000.0,2013-01-12 00:20:50,220.0,failed,3,US,220.0,220.0,45000.00
3,Music,Music,USD,2012-04-16,5000.0,2012-03-17 03:24:11,1.0,failed,1,US,1.0,1.0,5000.00
4,Film & Video,Film & Video,USD,2015-08-29,19500.0,2015-07-04 08:35:03,1283.0,canceled,14,US,1283.0,1283.0,19500.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
378656,Documentary,Film & Video,USD,2014-10-17,50000.0,2014-09-17 02:35:30,25.0,canceled,1,US,25.0,25.0,50000.00
378657,Narrative Film,Film & Video,USD,2011-07-19,1500.0,2011-06-22 03:35:14,155.0,failed,5,US,155.0,155.0,1500.00
378658,Narrative Film,Film & Video,USD,2010-08-16,15000.0,2010-07-01 19:40:30,20.0,failed,1,US,20.0,20.0,15000.00
378659,Technology,Technology,USD,2016-02-13,15000.0,2016-01-13 18:13:53,200.0,failed,6,US,200.0,200.0,15000.00


In [11]:
data['state'].unique()

array(['failed', 'canceled', 'successful', 'live', 'undefined',
       'suspended'], dtype=object)

In [14]:
# Keep only campaigns with a definite outcome (drops canceled, live, undefined, suspended)
data = data[data['state'].isin(['failed', 'successful'])].reset_index(drop=True)
data

Unnamed: 0,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real
0,Poetry,Publishing,GBP,2015-10-09,1000.0,2015-08-11 12:12:28,0.0,failed,0,GB,0.0,0.0,1533.95
1,Narrative Film,Film & Video,USD,2017-11-01,30000.0,2017-09-02 04:43:57,2421.0,failed,15,US,100.0,2421.0,30000.00
2,Narrative Film,Film & Video,USD,2013-02-26,45000.0,2013-01-12 00:20:50,220.0,failed,3,US,220.0,220.0,45000.00
3,Music,Music,USD,2012-04-16,5000.0,2012-03-17 03:24:11,1.0,failed,1,US,1.0,1.0,5000.00
4,Restaurants,Food,USD,2016-04-01,50000.0,2016-02-26 13:38:27,52375.0,successful,224,US,52375.0,52375.0,50000.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
331670,Small Batch,Food,USD,2017-04-19,6500.0,2017-03-20 22:08:22,154.0,failed,4,US,0.0,154.0,6500.00
331671,Narrative Film,Film & Video,USD,2011-07-19,1500.0,2011-06-22 03:35:14,155.0,failed,5,US,155.0,155.0,1500.00
331672,Narrative Film,Film & Video,USD,2010-08-16,15000.0,2010-07-01 19:40:30,20.0,failed,1,US,20.0,20.0,15000.00
331673,Technology,Technology,USD,2016-02-13,15000.0,2016-01-13 18:13:53,200.0,failed,6,US,200.0,200.0,15000.00


In [15]:
data['state'].unique()

array(['failed', 'successful'], dtype=object)

#### Feature Engineering and Encoding

In [16]:
data['deadline_year'] = data['deadline'].apply(lambda x: float(x[0:4]))
data['deadline_month'] = data['deadline'].apply(lambda x: float(x[5:7]))

data['launched_year'] = data['launched'].apply(lambda x: float(x[0:4]))
data['launched_month'] = data['launched'].apply(lambda x: float(x[5:7]))

data = data.drop(['deadline', 'launched'], axis=1)
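The string slicing above relies on every date starting with `YYYY-MM`. As a minimal sketch of an alternative, assuming the same column layout, the year and month can also be read through `pd.to_datetime`, which fails loudly on malformed dates instead of slicing them silently. The `toy` frame and the `add_date_parts` helper are illustrative names, not part of the notebook.

```python
import pandas as pd

# Toy frame mimicking the 'deadline' and 'launched' string columns
toy = pd.DataFrame({
    'deadline': ['2015-10-09', '2017-11-01'],
    'launched': ['2015-08-11 12:12:28', '2017-09-02 04:43:57'],
})

def add_date_parts(df, column):
    """Return a copy of df with <column>_year and <column>_month added as floats."""
    df = df.copy()
    parsed = pd.to_datetime(df[column])
    df[column + '_year'] = parsed.dt.year.astype(float)
    df[column + '_month'] = parsed.dt.month.astype(float)
    return df

for col in ['deadline', 'launched']:
    toy = add_date_parts(toy, col)

# Drop the raw date strings once the parts have been extracted
toy = toy.drop(['deadline', 'launched'], axis=1)
print(toy)
```

The values are cast to float only to match the dtype produced by the slicing version.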

In [17]:
data

Unnamed: 0,category,main_category,currency,goal,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real,deadline_year,deadline_month,launched_year,launched_month
0,Poetry,Publishing,GBP,1000.0,0.0,failed,0,GB,0.0,0.0,1533.95,2015.0,10.0,2015.0,8.0
1,Narrative Film,Film & Video,USD,30000.0,2421.0,failed,15,US,100.0,2421.0,30000.00,2017.0,11.0,2017.0,9.0
2,Narrative Film,Film & Video,USD,45000.0,220.0,failed,3,US,220.0,220.0,45000.00,2013.0,2.0,2013.0,1.0
3,Music,Music,USD,5000.0,1.0,failed,1,US,1.0,1.0,5000.00,2012.0,4.0,2012.0,3.0
4,Restaurants,Food,USD,50000.0,52375.0,successful,224,US,52375.0,52375.0,50000.00,2016.0,4.0,2016.0,2.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
331670,Small Batch,Food,USD,6500.0,154.0,failed,4,US,0.0,154.0,6500.00,2017.0,4.0,2017.0,3.0
331671,Narrative Film,Film & Video,USD,1500.0,155.0,failed,5,US,155.0,155.0,1500.00,2011.0,7.0,2011.0,6.0
331672,Narrative Film,Film & Video,USD,15000.0,20.0,failed,1,US,20.0,20.0,15000.00,2010.0,8.0,2010.0,7.0
331673,Technology,Technology,USD,15000.0,200.0,failed,6,US,200.0,200.0,15000.00,2016.0,2.0,2016.0,1.0


In [18]:
data['state'] = data['state'].apply(lambda x: 1 if x == 'successful' else 0)

In [19]:
data

Unnamed: 0,category,main_category,currency,goal,pledged,state,backers,country,usd pledged,usd_pledged_real,usd_goal_real,deadline_year,deadline_month,launched_year,launched_month
0,Poetry,Publishing,GBP,1000.0,0.0,0,0,GB,0.0,0.0,1533.95,2015.0,10.0,2015.0,8.0
1,Narrative Film,Film & Video,USD,30000.0,2421.0,0,15,US,100.0,2421.0,30000.00,2017.0,11.0,2017.0,9.0
2,Narrative Film,Film & Video,USD,45000.0,220.0,0,3,US,220.0,220.0,45000.00,2013.0,2.0,2013.0,1.0
3,Music,Music,USD,5000.0,1.0,0,1,US,1.0,1.0,5000.00,2012.0,4.0,2012.0,3.0
4,Restaurants,Food,USD,50000.0,52375.0,1,224,US,52375.0,52375.0,50000.00,2016.0,4.0,2016.0,2.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
331670,Small Batch,Food,USD,6500.0,154.0,0,4,US,0.0,154.0,6500.00,2017.0,4.0,2017.0,3.0
331671,Narrative Film,Film & Video,USD,1500.0,155.0,0,5,US,155.0,155.0,1500.00,2011.0,7.0,2011.0,6.0
331672,Narrative Film,Film & Video,USD,15000.0,20.0,0,1,US,20.0,20.0,15000.00,2010.0,8.0,2010.0,7.0
331673,Technology,Technology,USD,15000.0,200.0,0,6,US,200.0,200.0,15000.00,2016.0,2.0,2016.0,1.0


In [20]:
{column: list(data[column].unique()) for column in data.columns if data.dtypes[column] == 'object'}

{'category': ['Poetry',
  'Narrative Film',
  'Music',
  'Restaurants',
  'Food',
  'Drinks',
  'Nonfiction',
  'Indie Rock',
  'Crafts',
  'Games',
  'Tabletop Games',
  'Design',
  'Comic Books',
  'Art Books',
  'Fashion',
  'Childrenswear',
  'Theater',
  'Comics',
  'DIY',
  'Webseries',
  'Animation',
  'Food Trucks',
  'Product Design',
  'Public Art',
  'Documentary',
  'Illustration',
  'Photography',
  'Pop',
  'People',
  'Art',
  'Family',
  'Fiction',
  'Film & Video',
  'Accessories',
  'Rock',
  'Hardware',
  'Software',
  'Weaving',
  'Web',
  'Jazz',
  'Ready-to-wear',
  'Festivals',
  'Video Games',
  'Anthologies',
  'Publishing',
  'Shorts',
  'Gadgets',
  'Electronic Music',
  'Radio & Podcasts',
  'Cookbooks',
  'Apparel',
  'Metal',
  'Comedy',
  'Hip-Hop',
  'Periodicals',
  'Dance',
  'Technology',
  'Painting',
  'World Music',
  'Photobooks',
  'Drama',
  'Architecture',
  'Young Adult',
  'Latin',
  'Mobile Games',
  'Flight',
  'Fine Art',
  'Action',
  'Pl

In [21]:
def onehot_encode(df, columns, prefixes):
    df = df.copy()
    for column, prefix in zip(columns, prefixes):
        dummies = pd.get_dummies(df[column], prefix=prefix)
        df = pd.concat([df, dummies], axis=1)
        df = df.drop(column, axis=1)
    return df

In [22]:
data = onehot_encode(
    data,
    ['category', 'main_category', 'currency', 'country'],
    ['cat', 'main_cat', 'curr', 'country']
)
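To make the effect of `onehot_encode` concrete, here is a small sketch that applies the same function to a three-row toy frame. The `toy` data is made up for illustration; only the function body comes from the notebook.

```python
import pandas as pd

def onehot_encode(df, columns, prefixes):
    # Same logic as the notebook's helper: one indicator column per category value
    df = df.copy()
    for column, prefix in zip(columns, prefixes):
        dummies = pd.get_dummies(df[column], prefix=prefix)
        df = pd.concat([df, dummies], axis=1)
        df = df.drop(column, axis=1)
    return df

# Illustrative data only
toy = pd.DataFrame({
    'goal': [1000.0, 30000.0, 5000.0],
    'currency': ['GBP', 'USD', 'USD'],
    'country': ['GB', 'US', 'US'],
})

encoded = onehot_encode(toy, ['currency', 'country'], ['curr', 'country'])
print(encoded.columns.tolist())
```

Each categorical column is replaced by one indicator column per distinct value, named `<prefix>_<value>`, which is why the real dataset grows to a couple of hundred columns after encoding.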

In [23]:
data

Unnamed: 0,goal,pledged,state,backers,usd pledged,usd_pledged_real,usd_goal_real,deadline_year,deadline_month,launched_year,...,country_JP,country_LU,country_MX,"country_N,0""",country_NL,country_NO,country_NZ,country_SE,country_SG,country_US
0,1000.0,0.0,0,0,0.0,0.0,1533.95,2015.0,10.0,2015.0,...,False,False,False,False,False,False,False,False,False,False
1,30000.0,2421.0,0,15,100.0,2421.0,30000.00,2017.0,11.0,2017.0,...,False,False,False,False,False,False,False,False,False,True
2,45000.0,220.0,0,3,220.0,220.0,45000.00,2013.0,2.0,2013.0,...,False,False,False,False,False,False,False,False,False,True
3,5000.0,1.0,0,1,1.0,1.0,5000.00,2012.0,4.0,2012.0,...,False,False,False,False,False,False,False,False,False,True
4,50000.0,52375.0,1,224,52375.0,52375.0,50000.00,2016.0,4.0,2016.0,...,False,False,False,False,False,False,False,False,False,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
331670,6500.0,154.0,0,4,0.0,154.0,6500.00,2017.0,4.0,2017.0,...,False,False,False,False,False,False,False,False,False,True
331671,1500.0,155.0,0,5,155.0,155.0,1500.00,2011.0,7.0,2011.0,...,False,False,False,False,False,False,False,False,False,True
331672,15000.0,20.0,0,1,20.0,20.0,15000.00,2010.0,8.0,2010.0,...,False,False,False,False,False,False,False,False,False,True
331673,15000.0,200.0,0,6,200.0,200.0,15000.00,2016.0,2.0,2016.0,...,False,False,False,False,False,False,False,False,False,True


#### Splitting and Scaling

In [24]:
y = data.loc[:, 'state']
X = data.drop('state', axis=1)
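One point that may be worth checking here: `pledged`, `backers`, `usd pledged`, and `usd_pledged_real` describe how a campaign actually went, and a campaign succeeds when its pledges reach its goal. With those columns left in `X`, the model can largely read the outcome off its inputs. The following is a minimal sketch, on a two-row toy frame, of holding those columns out if the goal is to predict before a campaign ends. The `outcome_columns` list is an assumption about which columns count as post-launch information.

```python
import pandas as pd

# Assumed list of columns that are only known once a campaign has run
outcome_columns = ['pledged', 'backers', 'usd pledged', 'usd_pledged_real']

# Two illustrative rows with the same column names as the dataset
toy = pd.DataFrame({
    'goal': [1000.0, 50000.0],
    'pledged': [0.0, 52375.0],
    'backers': [0, 224],
    'usd pledged': [0.0, 52375.0],
    'usd_pledged_real': [0.0, 52375.0],
    'usd_goal_real': [1533.95, 50000.0],
    'state': [0, 1],
})

# In this toy frame the label equals "pledged at least the goal"
reached_goal = (toy['usd_pledged_real'] >= toy['usd_goal_real']).astype(int)
print((reached_goal == toy['state']).all())

# Features restricted to what is known at launch time
y_toy = toy['state']
X_toy = toy.drop(['state'] + outcome_columns, axis=1)
print(X_toy.columns.tolist())
```

Metrics obtained without these columns would likely be lower than those reported in this notebook, but closer to what a pre-launch predictor could achieve.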

In [25]:
scaler = StandardScaler()

X = scaler.fit_transform(X)

In [26]:
X

array([[-3.86895001e-02, -1.04181899e-01, -1.20549482e-01, ...,
        -6.76049422e-02, -3.70227773e-02, -1.92794913e+00],
       [-1.27483511e-02, -8.03511832e-02, -1.05012296e-01, ...,
        -6.76049422e-02, -3.70227773e-02,  5.18685885e-01],
       [ 6.69484645e-04, -1.02016365e-01, -1.17442045e-01, ...,
        -6.76049422e-02, -3.70227773e-02,  5.18685885e-01],
       ...,
       [-2.61661868e-02, -1.03985032e-01, -1.19513669e-01, ...,
        -6.76049422e-02, -3.70227773e-02,  5.18685885e-01],
       [-2.61661868e-02, -1.02213231e-01, -1.14334608e-01, ...,
        -6.76049422e-02, -3.70227773e-02,  5.18685885e-01],
       [-3.77949777e-02, -9.90239908e-02, -1.02940671e-01, ...,
        -6.76049422e-02, -3.70227773e-02,  5.18685885e-01]])

In [27]:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=34)
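In the cells above the scaler is fitted on all of `X` before the split, so the test rows contribute to the mean and standard deviation used for scaling. A common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that ordering on synthetic data; the `_toy` names and the random values are placeholders, not the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and labels
rng = np.random.default_rng(0)
X_toy = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y_toy = rng.integers(0, 2, size=100)

# Split first, with the same proportions and seed as the notebook
X_train_toy, X_test_toy, y_train_toy, y_test_toy = train_test_split(
    X_toy, y_toy, train_size=0.7, random_state=34
)

scaler_toy = StandardScaler()
X_train_toy = scaler_toy.fit_transform(X_train_toy)  # statistics come from training rows only
X_test_toy = scaler_toy.transform(X_test_toy)        # test rows reuse those statistics

print(X_train_toy.mean(axis=0).round(6), X_test_toy.shape)
```

With this ordering the training columns have zero mean exactly, while the test columns are only approximately centered, which is the expected behavior.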

### Modeling and Training

In [28]:
X.shape

(331675, 221)

In [29]:
y.mean()

0.4038772895153388

In [34]:
class_weight.compute_class_weight?

Signature: class_weight.compute_class_weight(class_weight, *, classes, y)
Docstring:
Estimate class weights for unbalanced datasets.

Parameters
----------
class_weight : dict, "balanced" or None
    If "balanced", class weights will be given by
    `n_samples / (n_classes * np.bincount(y))`.
    If a dictionary is given, keys are classes and values are corresponding class
    weights.
    If `None` is given, the class weights will be uniform.

classes : ndarray
    Array of the classes occurring in the data, as given by
    `np.unique(y_org)` with `y_org` the original class labels.

y : array-like of shape (n_samples,)
    Array of original class labels per sample.

Returns
-------
class_weight_vect : ndarray of shape (n_classes,)
    Array with `class_weight_vect[i]` the weight for i-th class.

References
----------
The "

In [35]:
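# np.unique returns the labels sorted, as the docstring above expects
classes = np.unique(y_train)

class_weights = class_weight.compute_class_weight(
    class_weight = 'balanced',
    classes = classes,
    y = y_train
)

# Pair each label with its weight explicitly instead of relying on position
class_weights = {int(c): float(w) for c, w in zip(classes, class_weights)}
class_weights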

{0: 0.8394874242489985, 1: 1.236404302907658}
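The `'balanced'` setting uses the formula quoted in the docstring, `n_samples / (n_classes * np.bincount(y))`. As a small worked sketch on a made-up label array (not the notebook's `y_train`), the same weights can be computed by hand:

```python
import numpy as np

# Illustrative labels: 3 negatives and 2 positives
y_toy = np.array([0, 0, 0, 1, 1])

n_samples = len(y_toy)
classes, counts = np.unique(y_toy, return_counts=True)

# Balanced weight for each class: n_samples / (n_classes * count of that class)
weights = n_samples / (len(classes) * counts)

class_weights_toy = {int(c): float(w) for c, w in zip(classes, weights)}
print(class_weights_toy)
```

The rarer class receives the larger weight, which matches the pattern in the notebook's output, where the less frequent successful campaigns (class 1) are weighted above 1 and failed campaigns (class 0) below 1.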

In [37]:
inputs = tf.keras.Input(shape=(X_train.shape[1], ))  # 221 features after encoding
x = tf.keras.layers.Dense(64, activation='relu')(inputs)
x = tf.keras.layers.Dense(64, activation='relu')(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model = tf.keras.Model(inputs, outputs)

model.compile(
    optimizer='adam',
    loss = 'binary_crossentropy',
    metrics = [
        'accuracy',
        tf.keras.metrics.AUC(name='auc')
    ]
)

batch_size = 64
epochs = 100

history = model.fit(
    X_train,
    y_train,
    validation_split = 0.2,
    class_weight = class_weights,
    batch_size = batch_size,
    epochs = epochs,
    callbacks = [
        tf.keras.callbacks.EarlyStopping(
            monitor = 'val_loss',
            patience = 3,
            restore_best_weights = True
        )
    ]
)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
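Training stopped at epoch 17 of 100 because of the `EarlyStopping` callback. With `patience = 3`, training ends once `val_loss` has failed to improve for three consecutive epochs, and `restore_best_weights = True` rolls the model back to the best epoch. The following is a simplified pure-Python sketch of that rule, ignoring options such as `min_delta`. The `early_stopping_epoch` helper and the loss values are hypothetical, since the per-epoch losses are not shown in the output above.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stopped_epoch, best_epoch), both 1-indexed, for a list of validation losses."""
    best_loss = float('inf')
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop
    return len(val_losses), best_epoch

# Made-up validation losses, for illustration only
hypothetical = [0.30, 0.25, 0.22, 0.23, 0.24, 0.26, 0.21]
stopped, best = early_stopping_epoch(hypothetical, patience=3)
print(stopped, best)
```

Under this rule, a run that stops after epoch 17 with a patience of 3 would have seen its best validation loss at epoch 14.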


### Results

In [38]:
model.evaluate(X_test, y_test)



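[0.16150054335594177, 0.9346954226493835, 0.9839630722999573]

The three values are the test loss (binary cross-entropy), accuracy, and AUC, in the order given to `model.compile`: roughly 0.162, 93.5%, and 0.984 on the held-out 30% of the data.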
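Accuracy and AUC can also be recomputed from predicted probabilities, which makes it easy to add a confusion matrix or try a different decision threshold. The sketch below uses scikit-learn metrics on a handful of made-up labels and probabilities. In the notebook, `y_prob` would presumably come from something like `model.predict(X_test).ravel()`, which is an assumption rather than a cell from the original.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

# Made-up labels and predicted probabilities, standing in for y_test and the model output
y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.8, 0.35, 0.2, 0.9])

# Turn probabilities into class predictions with a 0.5 threshold
y_pred = (y_prob >= 0.5).astype(int)

acc = accuracy_score(y_true, y_pred)   # fraction of correct class predictions
auc = roc_auc_score(y_true, y_prob)    # ranking quality, independent of the threshold
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class

print(acc, auc)
print(cm)
```

Accuracy depends on the chosen threshold, while AUC is computed from the probabilities directly, so the two can move independently when the threshold changes.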