# Project Title: Kickstarter Projects

- Author: Osmayda Nino

## Project Overview

Kickstarter is a popular crowdfunding platform that has helped thousands of entrepreneurs and creators bring their innovative ideas to life. However, not all Kickstarter projects are successful, and understanding the factors that contribute to success or failure can be valuable for both creators and investors alike.

This dataset records 374,853 Kickstarter projects and whether each one ultimately succeeded or failed to meet its funding goal. It spans 15 project categories, including Technology, Art, Film & Video, and Games, among others.

By analyzing this dataset, researchers and analysts can gain insights into the characteristics of successful and unsuccessful Kickstarter projects, such as funding goals, project categories, and number of backers. This information can be used to inform investment decisions and guide future crowdfunding campaigns.

Overall, this dataset provides a comprehensive look at the Kickstarter ecosystem and can serve as a valuable resource for anyone interested in understanding the dynamics of crowdfunding and the factors that contribute to project success or failure.

# Import Libraries 

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter
price_fmt = StrMethodFormatter("${x:,.0f}")

import seaborn as sns
sns.set_style('white')

from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import make_column_transformer, make_column_selector
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn import set_config

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error

from sklearn.dummy import DummyRegressor
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import plot_tree

set_config(display='diagram')

from IPython.display import clear_output

## Filter ALL warnings
import warnings
warnings.filterwarnings('ignore')


# Load Data

In [2]:
df = pd.read_csv("Data/kickstarter_projects.csv") 

# Summarize the DataFrame's columns and preview the first three rows of data
df.info()
df.head(3)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374853 entries, 0 to 374852
Data columns (total 11 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   ID           374853 non-null  int64 
 1   Name         374853 non-null  object
 2   Category     374853 non-null  object
 3   Subcategory  374853 non-null  object
 4   Country      374853 non-null  object
 5   Launched     374853 non-null  object
 6   Deadline     374853 non-null  object
 7   Goal         374853 non-null  int64 
 8   Pledged      374853 non-null  int64 
 9   Backers      374853 non-null  int64 
 10  State        374853 non-null  object
dtypes: int64(4), object(7)
memory usage: 31.5+ MB


|   | ID | Name | Category | Subcategory | Country | Launched | Deadline | Goal | Pledged | Backers | State |
|---|----|------|----------|-------------|---------|----------|----------|------|---------|---------|-------|
| 0 | 1860890148 | Grace Jones Does Not Give A F$#% T-Shirt (limi... | Fashion | Fashion | United States | 2009-04-21 21:02:48 | 2009-05-31 | 1000 | 625 | 30 | Failed |
| 1 | 709707365 | CRYSTAL ANTLERS UNTITLED MOVIE | Film & Video | Shorts | United States | 2009-04-23 00:07:53 | 2009-07-20 | 80000 | 22 | 3 | Failed |
| 2 | 1703704063 | drawing for dollars | Art | Illustration | United States | 2009-04-24 21:52:03 | 2009-05-03 | 20 | 35 | 3 | Successful |


# Clean Data

In [4]:
# Report the number of rows and columns in the DataFrame
print(f'There are {df.shape[0]} rows and {df.shape[1]} columns.')
print(f'The rows represent {df.shape[0]} observations and the columns represent {df.shape[1]-1} features and 1 target variable.')

There are 374853 rows and 11 columns.
The rows represent 374853 observations and the columns represent 10 features and 1 target variable.


- The dataset has a combination of categorical (object) and numeric (int64) datatypes.
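The `Launched` and `Deadline` columns are stored as `object` even though they hold dates. The sketch below shows one way they could be parsed with `pd.to_datetime`. It is only an illustration, not a step taken in this notebook: `sample` is a toy stand-in for `df` built from the three preview rows, and `campaign_days` is a hypothetical derived column.

```python
import pandas as pd

# Toy stand-in for df, using the date strings from the preview rows above
sample = pd.DataFrame({
    "Launched": ["2009-04-21 21:02:48", "2009-04-23 00:07:53", "2009-04-24 21:52:03"],
    "Deadline": ["2009-05-31", "2009-07-20", "2009-05-03"],
})

# Parse both columns; errors="coerce" turns unparseable strings into NaT
for col in ["Launched", "Deadline"]:
    sample[col] = pd.to_datetime(sample[col], errors="coerce")

# Hypothetical derived feature: campaign length in whole days
sample["campaign_days"] = (sample["Deadline"] - sample["Launched"]).dt.days

print(sample.dtypes)
print(sample["campaign_days"].tolist())
```

Applying the same loop to `df` would be the analogous step on the full dataset, assuming every value follows one of the two formats shown in the preview.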

## Duplicates

In [5]:
# Check data for duplicates
df.duplicated().sum()

0

- No duplicate rows were found, so none need to be dropped.
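Note that `df.duplicated()` only flags rows that match in every column. A narrower check on the `ID` column alone could catch projects that appear twice with slightly different details. The sketch below illustrates the difference on a toy frame; `sample` and its repeated ID are invented for the example, and the check has not been run on the real data here.

```python
import pandas as pd

# Toy stand-in for df; the repeated ID is invented for illustration
sample = pd.DataFrame({
    "ID": [101, 102, 101],
    "Name": ["Project A", "Project B", "Project A (relaunch)"],
})

# Rows identical across every column
full_row_dupes = int(sample.duplicated().sum())

# Rows whose ID already appeared earlier, regardless of the other columns
id_dupes = int(sample.duplicated(subset=["ID"]).sum())

print(f"Full-row duplicates: {full_row_dupes}")
print(f"Repeated IDs: {id_dupes}")
```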

## Missing Values

In [6]:
# Identify missing values
df.isna().sum()

ID             0
Name           0
Category       0
Subcategory    0
Country        0
Launched       0
Deadline       0
Goal           0
Pledged        0
Backers        0
State          0
dtype: int64

- There are no missing values. 

## Inspect Columns with Object Datatypes

- Check for common syntax errors, such as extra whitespace at the beginning or end of strings or column names.

- Check for typos or inconsistencies in strings that need to be fixed.
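The whitespace check described above can be sketched as a count of values that change when stripped. This is a minimal illustration on a toy frame, not the notebook's own procedure: `sample`, `text_cols`, and `padded_counts` are names introduced here, and the first `Name` value is copied from the data, where it carries a trailing space.

```python
import pandas as pd

# Toy stand-in for df; the first Name ends with a space, as in the real data
sample = pd.DataFrame({
    "Name": ["Grace Jones Does Not Give A F$#% T-Shirt (limited Edition) ",
             "drawing for dollars"],
    "Category": ["Fashion", "Art"],
    "Goal": [1000, 20],
})

# Text columns to inspect (listed explicitly for this sketch)
text_cols = ["Name", "Category"]

# Count values per column that differ from their stripped version
padded_counts = {
    col: int((sample[col] != sample[col].str.strip()).sum())
    for col in text_cols
}
print(padded_counts)

# Remove the leading and trailing whitespace in place
sample[text_cols] = sample[text_cols].apply(lambda s: s.str.strip())
```

Counting before stripping shows how widespread the problem is in each column, which helps decide whether a cleanup step is worth keeping.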

In [7]:
# Inspect column names for Errors
df.columns

Index(['ID', 'Name', 'Category', 'Subcategory', 'Country', 'Launched',
       'Deadline', 'Goal', 'Pledged', 'Backers', 'State'],
      dtype='object')

In [8]:
# Checking for common syntax errors, typos, inconsistencies in strings that need to be fixed
# Create a series of the datatypes
data_types = df.dtypes
# Create a filter to select only the object datatypes
object_data_types = data_types[(data_types == 'object')]
# Display the series of object datatypes
object_data_types

Name           object
Category       object
Subcategory    object
Country        object
Launched       object
Deadline       object
State          object
dtype: object

In [9]:
# Loop through the object columns and print each column's unique values
for column in object_data_types.index:
    print(column)
    print(df[column].unique())
    print('\n')

Name
['Grace Jones Does Not Give A F$#% T-Shirt (limited Edition) '
 'CRYSTAL ANTLERS UNTITLED MOVIE' 'drawing for dollars' ...
 "Help save La Gattara, Arizona's first Cat Cafe!" 'Digital Dagger Coin'
 'Spirits of the Forest']


Category
['Fashion' 'Film & Video' 'Art' 'Technology' 'Journalism' 'Publishing'
 'Theater' 'Music' 'Photography' 'Games' 'Design' 'Food' 'Crafts' 'Comics'
 'Dance']


Subcategory
['Fashion' 'Shorts' 'Illustration' 'Software' 'Journalism' 'Fiction'
 'Theater' 'Rock' 'Photography' 'Puzzles' 'Graphic Design' 'Film & Video'
 'Publishing' 'Documentary' 'Sculpture' 'Electronic Music' 'Nonfiction'
 'Food' 'Painting' 'Indie Rock' 'Video Games' 'Public Art'
 'Product Design' 'Art' "Children's Books" 'Crafts' 'Jazz' 'Music'
 'Comics' 'Narrative Film' 'Tabletop Games' 'Digital Art' 'Animation'
 'Conceptual Art' 'Pop' 'Hip-Hop' 'Country & Folk' 'Periodicals'
 'Webseries' 'Performance Art' 'Technology' 'Art Books' 'World Music'
 'Knitting' 'Classical Music' 'Poetry' 'Graphi