# One-Dimensional Convolutional Neural Network

This notebook demonstrates a one-dimensional convolutional neural network (1D CNN) built with Keras. The utility class `OneDimCnn` accepts a pandas DataFrame as input for training and testing.

## Load Packages

In [1]:
import pandas as pd
from one_dim_cnn import OneDimCnn

Using TensorFlow backend.


## Load Data
Note: The dataset must be a CSV file or a DataFrame with the column names `input` and `label`.

In [2]:
path = 'data/training_set.csv'
df = pd.read_csv(path, encoding='latin-1')

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 427 entries, 0 to 426
Data columns (total 2 columns):
input    427 non-null object
label    427 non-null object
dtypes: object(2)
memory usage: 6.8+ KB
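
The column requirement from the note above can be checked before training. The following is a minimal sketch; `check_columns` is a hypothetical helper written for illustration and is not part of `one_dim_cnn`.

```python
import pandas as pd

# Column names required by the note above
REQUIRED_COLUMNS = ("input", "label")


def check_columns(frame: pd.DataFrame) -> None:
    """Hypothetical helper: raise ValueError if a required column is missing."""
    missing = [name for name in REQUIRED_COLUMNS if name not in frame.columns]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")


# Tiny illustrative frame, not the real training set
example = pd.DataFrame({"input": ["some text"], "label": ["celebrity"]})
check_columns(example)  # passes silently because both columns are present
```

Calling `check_columns(df)` right after `pd.read_csv` would surface a misnamed column before any training starts.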


## Balanced Data
Check how many rows each class has.

In [4]:
df[df.loc[:, 'label'] == 'celebrity'].shape, df[df.loc[:, 'label'] == 'non-celeb'].shape

((283, 2), (144, 2))

Create a sub-DataFrame with the same number of rows for each class.

In [5]:
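# DataFrame.append was removed in pandas 2.0; pd.concat works in old and new versions
celebrity_df = df[df.loc[:, 'label'] == 'celebrity'].head(144)  # keep 144 rows to match the non-celeb count
sub_df = pd.concat([celebrity_df, df[df.loc[:, 'label'] == 'non-celeb']])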
sub_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 288 entries, 3 to 426
Data columns (total 2 columns):
input    288 non-null object
label    288 non-null object
dtypes: object(2)
memory usage: 6.8+ KB


In [6]:
sub_df[sub_df.loc[:, 'label'] == 'celebrity'].shape, sub_df[sub_df.loc[:, 'label'] == 'non-celeb'].shape

((144, 2), (144, 2))
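
The balancing step above hard-codes the minority class size (144). A more general version could look like the sketch below. `balance_by_label` is a hypothetical helper, not part of `one_dim_cnn`, and the toy frame only stands in for the real data.

```python
import pandas as pd


def balance_by_label(frame: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    """Hypothetical helper: keep the first n rows of each class,
    where n is the size of the smallest class."""
    smallest = int(frame[label_col].value_counts().min())
    # GroupBy.head keeps the first `smallest` rows per group in original row order
    return frame.groupby(label_col, sort=False).head(smallest)


# Toy data: 3 'celebrity' rows and 2 'non-celeb' rows
toy = pd.DataFrame({
    "input": ["a", "b", "c", "d", "e"],
    "label": ["celebrity", "celebrity", "celebrity", "non-celeb", "non-celeb"],
})
balanced = balance_by_label(toy)
print(balanced["label"].value_counts().to_dict())
```

Unlike the concatenation above, this keeps the rows in their original order rather than grouping all `celebrity` rows first. Like `head(144)`, it takes the first rows of each class rather than a random sample.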

## One-Dimensional CNN
Initialize the model. The constructor prints the hyperparameters it uses.

In [6]:
cnn = OneDimCnn()

{'epochs': 10, 'hidden_dims': 250, 'kernel_size': 3, 'filters': 250, 'embedding_dims': 50, 'batch_size': 32, 'maxlen': 500, 'max_features': 100000.0, 'self': <one_dim_cnn.OneDimCnn object at 0x000002791A477B00>}
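
The `maxlen` of 500 in the printed settings suggests that each tokenized text is padded or truncated to 500 integer ids before it reaches the network. The sketch below illustrates that idea in plain Python with a hypothetical `pad_to_length` helper. Whether `OneDimCnn` pads at the front or the back is not shown in this notebook, so front padding and front truncation (the defaults of Keras `pad_sequences`) are an assumption here.

```python
def pad_to_length(token_ids, maxlen, pad_value=0):
    """Hypothetical helper: left-pad short sequences and keep only
    the last `maxlen` ids of long ones."""
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    trimmed = list(token_ids)[-maxlen:]            # drop ids from the front if too long
    padding = [pad_value] * (maxlen - len(trimmed))  # fill the front if too short
    return padding + trimmed


print(pad_to_length([7, 8, 9], maxlen=5))           # [0, 0, 7, 8, 9]
print(pad_to_length([1, 2, 3, 4, 5, 6], maxlen=5))  # [2, 3, 4, 5, 6]
```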


In [7]:
tokenizer, model = cnn.train(sub_df, save=True)  # Pass save=True to save the trained model (default: False)

Loading data...
Train shape:  (217, 500)
Test shape:  (72, 500)
Building model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 500, 50)           5000000   
_________________________________________________________________
dropout_1 (Dropout)          (None, 500, 50)           0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 498, 250)          37750     
_________________________________________________________________
global_max_pooling1d_1 (Glob (None, 250)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 250)               62750     
_________________________________________________________________
dropout_2 (Dropout)          (None, 250)               0         
____________________________________________________________
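
The output shapes and parameter counts in the layer summary follow directly from the printed hyperparameters. The short check below reproduces them. It assumes the `Conv1D` layer uses Keras' default `'valid'` padding and a stride of 1, which is consistent with the output length of 498.

```python
# Hyperparameters as printed by OneDimCnn() above
max_features = 100_000
embedding_dims = 50
maxlen = 500
filters = 250
kernel_size = 3
hidden_dims = 250

# Embedding: one vector of size embedding_dims per vocabulary entry
embedding_params = max_features * embedding_dims

# Conv1D with 'valid' padding and stride 1 (assumed): length shrinks by kernel_size - 1
conv_output_length = maxlen - kernel_size + 1
# Each filter has kernel_size * embedding_dims weights plus one bias
conv_params = kernel_size * embedding_dims * filters + filters

# Dense: global max pooling yields one value per filter, so the input size is `filters`
dense_params = filters * hidden_dims + hidden_dims

print(embedding_params, conv_output_length, conv_params, dense_params)
# 5000000 498 37750 62750
```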

The training run above reached a validation accuracy of about 89%.