In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

We're ready to build our first neural network. We will have multiple features we feed into our model, each of which will go through a set of perceptron models to arrive at a response which will be trained to our output.

Like many models we've covered, this can be used as both a regression or classification model.

First, we need to load our dataset. For this example we'll use The Museum of Modern Art in New York's [public dataset](https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.csv) on their collection.

In [3]:
artworks = pd.read_csv('Artworks.csv')

In [4]:
artworks.columns

Index(['Title', 'Artist', 'ConstituentID', 'ArtistBio', 'Nationality',
       'BeginDate', 'EndDate', 'Gender', 'Date', 'Medium', 'Dimensions',
       'CreditLine', 'AccessionNumber', 'Classification', 'Department',
       'DateAcquired', 'Cataloged', 'ObjectID', 'URL', 'ThumbnailURL',
       'Circumference (cm)', 'Depth (cm)', 'Diameter (cm)', 'Height (cm)',
       'Length (cm)', 'Weight (kg)', 'Width (cm)', 'Seat Height (cm)',
       'Duration (sec.)'],
      dtype='object')

We'll also do a bit of data processing and cleaning, selecting columns of interest and converting URL's to booleans indicating whether they are present.

In [5]:
# Select Columns.
artworks = artworks[['Artist', 'Nationality', 'Gender', 'Date', 'Department',
                    'DateAcquired', 'URL', 'ThumbnailURL', 'Height (cm)', 'Width (cm)']]

# Convert URL's to booleans.
artworks['URL'] = artworks['URL'].notnull()
artworks['ThumbnailURL'] = artworks['ThumbnailURL'].notnull()

# Drop films and some other tricky rows.
artworks = artworks[artworks['Department']!='Film']
artworks = artworks[artworks['Department']!='Media and Performance Art']
artworks = artworks[artworks['Department']!='Fluxus Collection']

# Drop missing data.
artworks = artworks.dropna()

In [7]:
print(artworks.shape)
artworks.head()

(106031, 10)


Unnamed: 0,Artist,Nationality,Gender,Date,Department,DateAcquired,URL,ThumbnailURL,Height (cm),Width (cm)
0,Otto Wagner,(Austrian),(Male),1896,Architecture & Design,1996-04-09,True,True,48.6,168.9
1,Christian de Portzamparc,(French),(Male),1987,Architecture & Design,1995-01-17,True,True,40.6401,29.8451
2,Emil Hoppe,(Austrian),(Male),1903,Architecture & Design,1997-01-15,True,True,34.3,31.8
3,Bernard Tschumi,(),(Male),1980,Architecture & Design,1995-01-17,True,True,50.8,50.8
4,Emil Hoppe,(Austrian),(Male),1903,Architecture & Design,1997-01-15,True,True,38.4,19.1


## Building a Model

Now, let's see if we can use multi-layer perceptron modeling (or "MLP") to see if we can classify the department a piece should go into using everything but the department name.

Before we import MLP from SKLearn and establish the model we first have to ensure correct typing for our data and do some other cleaning.

In [8]:
# Get data types.
artworks.dtypes

Artist           object
Nationality      object
Gender           object
Date             object
Department       object
DateAcquired     object
URL                bool
ThumbnailURL       bool
Height (cm)     float64
Width (cm)      float64
dtype: object

The `DateAcquired` column is an object. Let's transform that to a datetime object and add a feature for just the year the artwork was acquired.

In [9]:
artworks['DateAcquired'] = pd.to_datetime(artworks.DateAcquired)
artworks['YearAcquired'] = artworks.DateAcquired.dt.year
artworks['YearAcquired'].dtype

dtype('int64')

Great. Let's do some more miscellaneous cleaning.

In [10]:
# Remove multiple nationalities, genders, and artists.
artworks.loc[artworks['Gender'].str.contains('\) \('), 'Gender'] = '\(multiple_persons\)'
artworks.loc[artworks['Nationality'].str.contains('\) \('), 'Nationality'] = '\(multiple_nationalities\)'
artworks.loc[artworks['Artist'].str.contains(','), 'Artist'] = 'Multiple_Artists'

# Convert dates to start date, cutting down number of distinct examples.
artworks['Date'] = pd.Series(artworks.Date.str.extract(
    '([0-9]{4})', expand=False))[:-1]

# Final column drops and NA drop.
X = artworks.drop(['Department', 'DateAcquired', 'Artist', 'Nationality', 'Date'], 1)

# Create dummies separately.
artists = pd.get_dummies(artworks.Artist)
nationalities = pd.get_dummies(artworks.Nationality)
dates = pd.get_dummies(artworks.Date)

# Concat with other variables, but artists slows this wayyyyy down so we'll keep it out for now
X = pd.get_dummies(X, sparse=True)
X = pd.concat([X, nationalities, dates], axis=1)

Y = artworks.Department

In [105]:
artists.shape

(106031, 9119)

In [11]:
# Alright! We've done our prep, let's build the model.
# Neural networks are hugely computationally intensive.
# This may take several minutes to run.

# Import the model.
from sklearn.neural_network import MLPClassifier

# Establish and fit the model, with a single, 1000 perceptron layer.
mlp = MLPClassifier(hidden_layer_sizes=(1000,))
mlp.fit(X, Y)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(1000,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [12]:
mlp.score(X, Y)

0.6985315615244598

In [13]:
Y.value_counts()/len(Y)

Prints & Illustrated Books    0.520763
Photography                   0.227735
Architecture & Design         0.114202
Drawings                      0.103592
Painting & Sculpture          0.033707
Name: Department, dtype: float64

In [12]:
from sklearn.model_selection import cross_val_score
cross_val_score(mlp, X, Y, cv=5)

array([ 0.57519536,  0.52577072,  0.36922856,  0.48580744,  0.54039799])

Now we got a lot of information from all of this. Firstly we can see that the model seems to overfit, though there is still so remaining performance when validated with cross validation. This is a feature of neural networks that aren't given enough data for the number of features present. _Neural networks, in general, like_ a lot _of data_. You may also have noticed something also about neural networks: _they can take a_ long _time to run_. Try increasing the layer size by adding a zero. Feel free to interrupt the kernel if you don't have time...

Also note that we created bools for artist's name but left them out. Both of the above points are the reason for that. It would take much longer to run and it would be much more prone to overfitting.

## Model parameters

Now, before we move on and let you loose with some tasks to work on the model, let's go over the parameters.

We included one parameter: hidden layer size. Remember in the previous lesson, when we talked about layers in a neural network. This tells us how many and how big to make our layers. Pass in a tuple that specifies each layer's size. Our network is 1000 neurons wide and one layer. (100, 4, ) would create a network with two layers, one 100 wide and the other 4.

How many layers to include is determined by two things: computational resources and cross validation searching for convergence. It's generally less than the number of input variables you have.

You can also set an alpha. Neural networks like this use a regularization parameter that penalizes large coefficients just like we discussed in the advanced regression section. Alpha scales that penalty.

Lastly, we'll discuss the activation function. The activation function determines whether the output from an individual perceptron is binary or continuous. By default this is a 'relu', or 'rectified linear unit function' function. In the exercise we went through earlier we used this binary function, but we discussed the _sigmoid_ as a reasonable alternative. The _sigmoid_ (called 'logistic' by SKLearn because it's a 'logistic sigmoid function') allows for continuous variables between 0 and 1, which allows for a more nuanced model. It does come at the cost of increased computational complexity.

If you want to learn more about these, study [activation functions](https://en.wikipedia.org/wiki/Activation_function) and [multilayer perceptrons](https://en.wikipedia.org/wiki/Multilayer_perceptron). The [Deep Learning](http://www.deeplearningbook.org/) book referenced earlier goes into great detail on the linear algebra involved.

You could also just test the models with cross validation. Unless neural networks are your specialty cross validation should be sufficient.

For the other parameters and their defaults, check out the [MLPClassifier documentaiton](http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier).

## Drill: Playing with layers

Now it's your turn. Using the space below, experiment with different hidden layer structures. You can try this on a subset of the data to improve runtime. See how things vary. See what seems to matter the most. Feel free to manipulate other parameters as well. It may also be beneficial to do some real feature selection work...

In [16]:
list(X.columns)

['URL',
 'ThumbnailURL',
 'Height (cm)',
 'Width (cm)',
 'YearAcquired',
 'Gender_()',
 'Gender_(Female)',
 'Gender_(Male)',
 'Gender_(male)',
 'Gender_\\(multiple_persons\\)',
 '()',
 '(Albanian)',
 '(Algerian)',
 '(American)',
 '(Argentine)',
 '(Australian)',
 '(Austrian)',
 '(Azerbaijani)',
 '(Bahamian)',
 '(Belgian)',
 '(Bolivian)',
 '(Bosnian)',
 '(Brazilian)',
 '(British)',
 '(Bulgarian)',
 '(Cambodian)',
 '(Cameroonian)',
 '(Canadian Inuit)',
 '(Canadian)',
 '(Chilean)',
 '(Chinese)',
 '(Colombian)',
 '(Congolese)',
 '(Coptic)',
 '(Costa Rican)',
 '(Croatian)',
 '(Cuban)',
 '(Czech)',
 '(Czechoslovakian)',
 '(Danish)',
 '(Dutch)',
 '(Ecuadorian)',
 '(Egyptian)',
 '(Estonian)',
 '(Ethiopian)',
 '(Finnish)',
 '(French)',
 '(Georgian)',
 '(German)',
 '(Ghanaian)',
 '(Greek)',
 '(Guatemalan)',
 '(Guyanese)',
 '(Haitian)',
 '(Hungarian)',
 '(Icelandic)',
 '(Indian)',
 '(Iranian)',
 '(Irish)',
 '(Israeli)',
 '(Italian)',
 '(Ivorian)',
 '(Japanese)',
 '(Kenyan)',
 '(Korean)',
 '(Kuwait

In [26]:
pd.concat([X,Y],1)

Unnamed: 0,URL,ThumbnailURL,Height (cm),Width (cm),YearAcquired,Gender_(),Gender_(Female),Gender_(Male),Gender_(male),Gender_\(multiple_persons\),...,2010,2011,2012,2013,2014,2015,2016,2017,2018,Department
0,True,True,48.600000,168.900000,1996,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,Architecture & Design
1,True,True,40.640100,29.845100,1995,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,Architecture & Design
2,True,True,34.300000,31.800000,1997,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,Architecture & Design
3,True,True,50.800000,50.800000,1995,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,Architecture & Design
4,True,True,38.400000,19.100000,1997,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,Architecture & Design
5,True,True,35.600000,45.700000,1995,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,Architecture & Design
6,True,True,35.600000,45.700000,1995,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,Architecture & Design
7,True,True,35.600000,45.700000,1995,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,Architecture & Design
8,True,True,35.600000,45.700000,1995,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,Architecture & Design
9,True,True,35.600000,45.700000,1995,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,Architecture & Design


In [61]:
# convert y to numeric to check correlations
Y_numeric = pd.Series(pd.factorize(Y)[0]).reset_index()
Y_numeric.columns = ['index', 'Deparment']

In [63]:
# assign correlation matrix to variable
correlation_matrix = pd.concat([X,Y_numeric['Deparment']],1).corr()

In [64]:
# check correlations of top 55 
correlation_matrix.loc[:, 'Deparment'].sort_values(ascending=False)[:55]


Deparment          1.000000
(American)         0.139065
1875               0.064946
1884               0.053149
1859               0.047780
1857               0.042239
1871               0.041292
(Mexican)          0.035021
1858               0.034106
1856               0.032871
2004               0.031078
1903               0.030375
1861               0.027845
1905               0.027121
1971               0.027029
1860               0.025239
1867               0.025084
()                 0.024356
1865               0.023945
(Norwegian)        0.023830
1853               0.023649
1990               0.023209
1968               0.022800
(Cuban)            0.022610
1932               0.021829
1891               0.020500
1855               0.020115
1959               0.020023
1904               0.019437
1972               0.018611
Width (cm)         0.018500
1940               0.017335
1872               0.017216
1850               0.017078
1982               0.014666
1896               0

In [85]:
# assign feature with correlation > .01 to list
pos_corr = list(correlation_matrix.loc[:, 'Deparment'].sort_values(ascending=False)[1:103].index)

In [86]:
# assign feature with neg correlation > -.01 to list
neg_corr = list(correlation_matrix.loc[:, 'Deparment'].sort_values()[:108].index)

In [73]:
%timeit
# Your code here. Experiment with hidden layers to build your own model.

# Alright! We've done our prep, let's build the model.
# Neural networks are hugely computationally intensive.
# This may take several minutes to run.

# Import the model.
from sklearn.neural_network import MLPClassifier

# Establish and fit the model, with a single, 1000 perceptron layer.
mlp_test = MLPClassifier(hidden_layer_sizes=(1500,))
mlp_test.fit(X[pos_corr+neg_corr], Y)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(1500,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [74]:
mlp_test.score(X[pos_corr+neg_corr], Y)

0.6188378870330375

In [202]:
cross_val_score(mlp_test, X[pos_corr+neg_corr], Y)

array([0.60815934, 0.55068896, 0.49230377])

In [82]:


# Establish and fit the model, with a single, 1000 perceptron layer.
mlp_test = MLPClassifier(activation='logistic', hidden_layer_sizes=(1500,))
%timeit mlp_test.fit(X[:][:55000], Y[:55000])

7min 25s ± 1min 45s per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [84]:
mlp_test.score(X[:][:55000], Y[:55000])

0.7538909090909091

In [88]:
# Establish and fit the model, with a single, 1000 perceptron layer.
mlp_test2 = MLPClassifier(activation='logistic', hidden_layer_sizes=(1500,))

In [92]:
%%time
mlp_test2.fit(X, Y)

CPU times: user 21min 55s, sys: 2min 13s, total: 24min 8s
Wall time: 17min 5s


MLPClassifier(activation='logistic', alpha=0.0001, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(1500,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [93]:
# Logistic Sigmoid with 106 features 
mlp_test2.score(X, Y)

0.714055323443144

In [97]:
%%time
mlp_test3 = MLPClassifier(activation='logistic', hidden_layer_sizes=(500, 500))
mlp_test3.fit(X[:][:55000], Y[:55000])

CPU times: user 5min 35s, sys: 34.1 s, total: 6min 9s
Wall time: 4min 4s


In [99]:
mlp_test3.score(X[:][:55000], Y[:55000])

0.7576

In [101]:
%%time
mlp_test4 = MLPClassifier(activation='logistic', hidden_layer_sizes=(500, 500, 500))
mlp_test4.fit(X[:][:55000], Y[:55000])

CPU times: user 8min 33s, sys: 54.5 s, total: 9min 27s
Wall time: 5min 59s


In [102]:
mlp_test4.score(X[:][:55000], Y[:55000])

0.7519272727272728

## lets reduce artists dummies vars with pca and add it to X

In [107]:
from sklearn.decomposition import PCA

In [108]:
pca = PCA(n_components = 50)

pca_artists = pca.fit_transform(artists)

artist_headings = ["PCA_ARTIST "+ str(x) for x in range(1,51)]


artists_df = pd.DataFrame(pca_artists, columns = artist_headings, index=X.index)

In [109]:
X_artists = pd.concat([X, artists_df], 1)

In [110]:
mlp_test5 = MLPClassifier(activation='logistic', hidden_layer_sizes=(500, 500, 500))
mlp_test5.fit(X_artists[:][:55000], Y[:55000])

MLPClassifier(activation='logistic', alpha=0.0001, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(500, 500, 500), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [111]:
mlp_test5.score(X_artists[:][:55000], Y[:55000])

0.8020727272727273

In [203]:
mlp_test5a = MLPClassifier(activation='logistic', solver='sgd', learning_rate='adaptive', hidden_layer_sizes=(500, 500, 500))
mlp_test5a.fit(X_artists[:][:55000], Y[:55000])

MLPClassifier(activation='logistic', alpha=0.0001, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(500, 500, 500), learning_rate='adaptive',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='sgd', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [205]:
mlp_test5a.score(X_artists[:][:55000], Y[:55000])

0.5514

In [112]:
mlp_test6 = MLPClassifier(activation='logistic', hidden_layer_sizes=(500, 500))
mlp_test6.fit(X_artists[:][:55000], Y[:55000])

MLPClassifier(activation='logistic', alpha=0.0001, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(500, 500), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [113]:
mlp_test6.score(X_artists[:][:55000], Y[:55000])

0.7768727272727273

In [114]:
mlp_test7 = MLPClassifier(activation='logistic', hidden_layer_sizes=(1500, 1000, 500))
mlp_test7.fit(X_artists[:][:55000], Y[:55000])

MLPClassifier(activation='logistic', alpha=0.0001, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(1500, 1000, 500), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [117]:
mlp_test7.score(X_artists[:][:55000], Y[:55000])

0.7817636363636363

In [119]:
from sklearn.model_selection import cross_val_score

In [120]:
%%time
# let's cross validate best performing model
cross_val_score(mlp_test5, X_artists[:][:55000], Y[:55000], cv=5)

CPU times: user 22min 5s, sys: 1min 32s, total: 23min 38s
Wall time: 13min 58s


array([0.72125784, 0.75174984, 0.55141376, 0.74624966, 0.32696854])

In [122]:
# model is over fitting, adjust regularization parameter, alpha from .0001 to .01
mlp_test8 = MLPClassifier(activation='logistic', alpha=0.01,  hidden_layer_sizes=(500, 500, 500))
mlp_test8.fit(X_artists[:][:55000], Y[:55000])

MLPClassifier(activation='logistic', alpha=0.01, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(500, 500, 500), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [123]:
mlp_test8.score(X_artists[:][:55000], Y[:55000])

0.777090909090909

In [125]:


cross_val_score(mlp_test8, X_artists[:][:55000], Y[:55000], cv=5)

array([0.70917023, 0.73438778, 0.74234021, 0.73415765, 0.38816148])

In [None]:
# model is over fitting, adjust regularization parameter, alpha from .0001 to .001
mlp_test8 = MLPClassifier(activation='logistic', alpha=0.001,  hidden_layer_sizes=(500, 500, 500))
mlp_test8.fit(X_artists[:][:55000], Y[:55000])

In [126]:
mlp_test9 = MLPClassifier(activation='logistic', alpha=0.0005,  hidden_layer_sizes=(500, 500, 500))
mlp_test9.fit(X_artists, Y)

MLPClassifier(activation='logistic', alpha=0.05, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(500, 500, 500), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [127]:
mlp_test9.score(X_artists, Y)

0.7095566390961134

In [128]:
cross_val_score(mlp_test9, X_artists, Y, cv=5)

array([0.63183704, 0.72665975, 0.52235216, 0.60020749, 0.54879027])

In [130]:
from sklearn.feature_selection import chi2

In [169]:
chi2_stats = chi2(X,Y)[0]
chi2_p_values = chi2(X,Y)[1]

In [171]:
chi2_res_df = pd.DataFrame()
chi2_res_df['chi2_statistic'] = chi2_stats
chi2_res_df['p_vals'] = chi2_p_values
chi2_res_df.set_index(X.columns, inplace=True)
chi2_res_df

Unnamed: 0,chi2_statistic,p_vals
URL,5.342793e+03,0.000000e+00
ThumbnailURL,2.944184e+03,0.000000e+00
Height (cm),9.275725e+05,0.000000e+00
Width (cm),1.082370e+06,0.000000e+00
YearAcquired,1.204649e+03,1.564214e-259
Gender_(),4.502170e+03,0.000000e+00
Gender_(Female),7.686675e+02,4.696998e-165
Gender_(Male),4.329812e+02,2.073784e-92
Gender_(male),6.314013e+01,6.340366e-13
Gender_\(multiple_persons\),3.711220e+03,0.000000e+00


In [183]:
chi2_res_df.sort_values('chi2_statistic', ascending=False)[:100]

Unnamed: 0,chi2_statistic,p_vals
Width (cm),1.082370e+06,0.000000e+00
Height (cm),9.275725e+05,0.000000e+00
(French),5.504988e+03,0.000000e+00
URL,5.342793e+03,0.000000e+00
Gender_(),4.502170e+03,0.000000e+00
Gender_\(multiple_persons\),3.711220e+03,0.000000e+00
\(multiple_nationalities\),3.711220e+03,0.000000e+00
(),3.559297e+03,0.000000e+00
2003,3.368621e+03,0.000000e+00
ThumbnailURL,2.944184e+03,0.000000e+00


In [184]:
chi2_filter_top_100 = list(chi2_res_df.sort_values('chi2_statistic', ascending=False)[:100].index)

In [190]:
reduced_X = X[chi2_filter_top_100]

In [192]:
# atttempt 10 mlp with 100 features selected with chi-squared test plus pca reduced artists alpha .0001
mlp_test10 = MLPClassifier(activation='logistic', alpha=.001, hidden_layer_sizes=(500, 500))
mlp_test10.fit(reduced_X[:][:55000], Y[:55000])

MLPClassifier(activation='logistic', alpha=0.001, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(500, 500), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [193]:
mlp_test10.score(reduced_X[:][:55000], Y[:55000])

0.7313272727272727

In [194]:
cross_val_score(mlp_test10, reduced_X[:][:55000], Y[:55000], cv=5)

array([0.65936563, 0.69139169, 0.14055823, 0.57977998, 0.22785961])

In [195]:
from sklearn.model_selection import train_test_split

In [196]:
train_test_split?

In [197]:
X_train, X_test, Y_train, Y_test = train_test_split(reduced_X[:][:55000], Y[:55000], test_size=.2, random_state=0)

In [198]:
mlp_test11 = MLPClassifier(activation='logistic', alpha=.001, hidden_layer_sizes=(500, 500))
mlp_test11.fit(X_train, Y_train)

MLPClassifier(activation='logistic', alpha=0.001, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(500, 500), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [199]:
mlp_test11.score(X_test, Y_test)

0.7134545454545455

In [200]:
mlp_test11.score(X_train, Y_train)

0.713590909090909

## Excercise Takeaways
Model performance benefit from more features (9000 artist dummy variable reduced via pca)
L2 regularization parameter tuning can help address over fitting at the cost of previdective accuracy
This particular model seemed to work best with a sigmoid activation function
Best model used three hidden layers 500 nodes wide, grid search may be good for determining optimal hadden layer size layer size