## PREPARE THE DATA

In [1]:
from __future__ import print_function
import pandas as pd
import numpy as np
# Suppress the warnings raised when running the code below
import warnings
warnings.filterwarnings('ignore')

DF = pd.read_csv('data/adult_data.csv')

# Let's create a feature that will be our target for logistic regression
DF['income_label'] = (DF["income_bracket"].apply(lambda x: ">50K" in x)).astype(int)

DF.head()

| Unnamed: 0 | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | gender | capital_gain | capital_loss | hours_per_week | native_country | income_bracket | income_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K | 0 |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K | 0 |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K | 0 |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K | 0 |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K | 0 |


### 1-Set up the experiment

We need to define the columns in the dataset that will be passed to the *"wide"* and the *"deep"* sides of the model. For more details on what I mean by "wide" and "deep", I recommend reading either [this tutorial](https://www.tensorflow.org/tutorials/wide_and_deep), the [original paper](https://arxiv.org/pdf/1606.07792.pdf) or demo2 in this repo.

In the example below, the wide and crossed columns will be passed to the wide side of the model, while the embedding columns and continuous columns will go through the deep side.

We also need to state our target and the method that will be used to fit/predict that target (regression, logistic or multiclass).

In [2]:
# wide_cols and crossed_cols go to the wide side
wide_cols = ['age','hours_per_week','education', 'relationship','workclass',
             'occupation','native_country','gender']
crossed_cols = (['education', 'occupation'], ['native_country', 'occupation'])

# embeddings_cols and continuous_cols go to the deep side
embeddings_cols = [('education',10), ('relationship',8), ('workclass',10), # (column name, desired embedding size)
                    ('occupation',10),('native_country',12)]
continuous_cols = ["age","hours_per_week"]
target = 'income_label'
method = 'logistic'

You will see that `embeddings_cols` is a list of two-element tuples: the column name and the dimension of the corresponding embedding, so that, when passed through the deep side, education will be represented by a 10-dimensional embedding, relationship by an 8-dimensional one, etc.

If you want to use the same embedding dimension for *all* the embedding columns, you can simply include the column names and set the dimension when calling the `prepare_data` function I mentioned before. This function has a parameter called `def_dim` (default dimension) that is applied to all embedding columns when no embedding dimension is given. The first few lines of `prepare_data` look like this:

In [3]:
# If embeddings_cols carries no embedding sizes, fall back to the default (def_dim).
# def_dim is a parameter of prepare_data and is only needed in the else branch.
if isinstance(embeddings_cols[0], tuple):
    emb_dim = dict(embeddings_cols)
    embeddings_cols = [emb[0] for emb in embeddings_cols] # keep only the column names
else:
    emb_dim = {e:def_dim for e in embeddings_cols}

# columns that go to the deep side
deep_cols = embeddings_cols+continuous_cols
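To make the `def_dim` fallback concrete, here is a minimal sketch that wraps the cell above in a hypothetical helper, `resolve_embedding_dims` (not part of the repo). The value `def_dim=8` in the usage lines is an arbitrary choice for illustration.

```python
def resolve_embedding_dims(embeddings_cols, def_dim):
    """Hypothetical helper mirroring the cell above.

    Returns (column_names, {column: embedding_dim}). If the entries are
    (name, dim) tuples their dims are used; otherwise every column gets
    the default dimension `def_dim`.
    """
    if isinstance(embeddings_cols[0], tuple):
        emb_dim = dict(embeddings_cols)
        names = [name for name, _ in embeddings_cols]
    else:
        emb_dim = {name: def_dim for name in embeddings_cols}
        names = list(embeddings_cols)
    return names, emb_dim


# Explicit sizes per column
names_a, dims_a = resolve_embedding_dims([('education', 10), ('relationship', 8)], def_dim=8)
# Plain column names: every column falls back to def_dim
names_b, dims_b = resolve_embedding_dims(['education', 'relationship'], def_dim=8)
print(dims_a)  # {'education': 10, 'relationship': 8}
print(dims_b)  # {'education': 8, 'relationship': 8}
```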

### 2-Cross-product for binary features

As explained in the original paper: *"For binary features, a cross-product transformation (e.g., `AND(gender=female, language=en)`) is 1 if and only if the constituent features (`gender=female` and `language=en`) are all 1, and 0 otherwise"*. Here, this is implemented by combining the features into a new feature and one-hot encoding it afterwards.
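For 0/1 features, the cross-product transformation quoted above is just an element-wise product. A small NumPy sketch on made-up values for the paper's `gender=female` and `language=en` indicators:

```python
import numpy as np

# Two binary features for five hypothetical observations
gender_female = np.array([1, 0, 1, 1, 0])
language_en = np.array([1, 1, 0, 1, 0])

# AND(gender=female, language=en): 1 iff both constituents are 1.
# For 0/1 features the element-wise product is exactly the logical AND.
cross = gender_female * language_en
print(cross)  # [1 0 0 1 0]

assert np.array_equal(cross, np.logical_and(gender_female, language_en).astype(int))
```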

In [4]:
Y = np.array(DF[target])
# We copy the original dataset so we do not mutate it
df_tmp = DF[list(set(wide_cols + deep_cols))].copy()

# Build the crossed columns
crossed_columns = []
for cols in crossed_cols:
    colname = '_'.join(cols) # join the column names with '_'
    df_tmp[colname] = df_tmp[cols].apply(lambda x: '-'.join(x), axis=1) # join each row's values with '-'
    crossed_columns.append(colname)

# Extract the categorical column names that will be one-hot encoded later
categorical_columns = list(df_tmp.select_dtypes(include=['object']).columns)

Let's have a look at one of the "crossed features":

In [5]:
df_tmp['education_occupation'].head()

0       Bachelors-Adm-clerical
1    Bachelors-Exec-managerial
2    HS-grad-Handlers-cleaners
3       11th-Handlers-cleaners
4     Bachelors-Prof-specialty
Name: education_occupation, dtype: object

When we one-hot encode this feature later, each of its levels will be 1 *if and only if* both constituent features take the corresponding values. In other words, the level `Bachelors-Adm-clerical` of the `education_occupation` feature will be 1 *if and only if* `education=Bachelors` AND `occupation=Adm-clerical` for that particular observation.
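A quick sanity check of this claim, sketched with pandas on four made-up rows: build the crossed column exactly as in the cell above, one-hot encode everything with `pd.get_dummies`, and compare the crossed dummy with the AND of the two individual dummies. The `.astype(int)` only normalizes the dummy dtype, which differs across pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'education':  ['Bachelors', 'Bachelors', 'HS-grad', 'HS-grad'],
    'occupation': ['Adm-clerical', 'Exec-managerial', 'Adm-clerical', 'Exec-managerial'],
})

# Crossed column built the same way as in the notebook
df['education_occupation'] = df[['education', 'occupation']].apply(lambda x: '-'.join(x), axis=1)

dummies = pd.get_dummies(df, columns=['education', 'occupation', 'education_occupation']).astype(int)

crossed = dummies['education_occupation_Bachelors-Adm-clerical']
both = dummies['education_Bachelors'] & dummies['occupation_Adm-clerical']
print(crossed.tolist())  # [1, 0, 0, 0]
assert (crossed == both).all()
```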

### 3-Label-encoding and splitting the dataframe into wide and deep.

We first label-encode the dataframe and keep a dictionary of the encodings for the columns that will be represented as embeddings (for the remaining ones it is unnecessary).

In [6]:
def label_encode(df, cols=None):
    """
    Helper function to label-encode some features of a given dataset.

    Parameters:
    --------
    df  (pd.DataFrame)
    cols (list): optional - columns to be label-encoded

    Returns:
    ________
    val_to_idx (dict) : Dictionary of dictionaries with the encoding mapping
    (column -> {value: integer id})
    df (pd.DataFrame): mutated df with label-encoded features.
    """

    if cols is None:
        cols = list(df.select_dtypes(include=['object']).columns) # use all categorical (object) columns

    # unique values of each column, in order of appearance
    val_types = {c: df[c].unique() for c in cols}

    # value -> integer id, per column
    val_to_idx = {k: {o: i for i, o in enumerate(v)} for k, v in val_types.items()}

    for k, v in val_to_idx.items():
        df[k] = df[k].map(v)

    return val_to_idx, df

# Encode the dataframe and keep the encoding dictionary only for the
# deep_cols (for the wide_cols it is unnecessary)
encoding_dict,df_tmp = label_encode(df_tmp)
encoding_dict = {k:encoding_dict[k] for k in encoding_dict if k in deep_cols}
# (column name, number of unique values, embedding dimension) per embedding column
embeddings_input = []
for k,v in encoding_dict.items():
    embeddings_input.append((k, len(v), emb_dim[k]))
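To see what `label_encode` and `embeddings_input` produce, here is a pure-Python sketch on a toy `workclass` column. `label_encode_column` is a hypothetical helper written for illustration; it assigns ids in order of first appearance, which is also the order `Series.unique()` returns. The embedding dimension 10 is the one chosen for `workclass` above.

```python
def label_encode_column(values):
    """Hypothetical helper: map each distinct value to an integer id in
    order of first appearance and return (mapping, encoded_values)."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return mapping, [mapping[v] for v in values]


workclass = ['State-gov', 'Self-emp-not-inc', 'Private', 'Private', 'State-gov']
mapping, encoded = label_encode_column(workclass)
print(mapping)  # {'State-gov': 0, 'Self-emp-not-inc': 1, 'Private': 2}
print(encoded)  # [0, 1, 2, 2, 0]

# One entry of embeddings_input: (column name, number of distinct values, embedding dim)
emb_entry = ('workclass', len(mapping), 10)
print(emb_entry)  # ('workclass', 3, 10)
```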

Then we split the dataframe into the wide and deep dataframes (`df_wide` and `df_deep`) and keep the index of each deep column. This information will be used later, since we will slice the tensors by column index.
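The reason for keeping `deep_column_idx` is that the deep input arrives as a single numeric matrix, so each feature has to be recovered by position. A NumPy sketch with three hypothetical rows and a reduced set of columns:

```python
import numpy as np

deep_cols = ['education', 'relationship', 'age']
deep_column_idx = {k: v for v, k in enumerate(deep_cols)}  # column name -> position

# Three hypothetical rows: two label-encoded categoricals and one scaled numeric
X_deep = np.array([[3.0, 1.0,  0.54],
                   [0.0, 0.0, -0.48],
                   [1.0, 4.0, -0.63]])

# Integer ids for an embedding lookup
education_ids = X_deep[:, deep_column_idx['education']].astype(np.int64)
# A list index keeps the continuous feature 2-D, shape (n, 1)
age = X_deep[:, [deep_column_idx['age']]]
print(education_ids)  # [3 0 1]
print(age.shape)      # (3, 1)
```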

In [7]:
## df_deep ##
# select the deep_cols and get the column index that will be used later
# to slice the tensors
df_deep = df_tmp[deep_cols].copy()
deep_column_idx = {k:v for v,k in enumerate(df_deep.columns)} # column name -> position, used later to slice the tensors

# The continuous columns will be concatenated with the embeddings, so you
# might want to normalize them first
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
for cc in continuous_cols:
    df_deep[cc] = scaler.fit_transform(df_deep[cc].values.reshape(-1,1)).ravel() # StandardScaler expects a 2-D (n, 1) input

## df_wide ##
# build the wide-side data
df_wide = df_tmp[wide_cols+crossed_columns]
del(df_tmp)

# one-hot encode the categorical variables
dummy_cols = [c for c in wide_cols+crossed_columns if c in categorical_columns]
df_wide = pd.get_dummies(df_wide, columns=dummy_cols)
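For reference, `StandardScaler` standardizes each column as z = (x - mean) / std, using the population standard deviation (ddof=0). A NumPy sketch of that transform on a made-up `hours_per_week` column; the result should have mean 0 and standard deviation 1:

```python
import numpy as np

hours = np.array([40.0, 13.0, 40.0, 40.0, 40.0]).reshape(-1, 1)  # hypothetical column

# z = (x - mean) / std, with the population std (numpy's default ddof=0)
z = (hours - hours.mean(axis=0)) / hours.std(axis=0)
print(z.ravel())
```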

### 4-Train/Test split and build the output dictionary

I think the code here is self-explanatory...

In [8]:
from sklearn.model_selection import train_test_split
from collections import namedtuple

# split the data for validation; a single call with one random_state keeps
# the deep, wide and label rows aligned
seed = 1981
X_train_deep, X_test_deep, X_train_wide, X_test_wide, y_train, y_test = train_test_split(
    df_deep.values, df_wide.values, Y, test_size=0.3, random_state=seed)

# Building the output dictionary
wd_dataset = dict()
train_dataset = namedtuple('train_dataset', 'wide, deep, labels')
test_dataset  = namedtuple('test_dataset' , 'wide, deep, labels')
wd_dataset['train_dataset'] = train_dataset(X_train_wide, X_train_deep, y_train)
wd_dataset['test_dataset']  = test_dataset(X_test_wide, X_test_deep, y_test)
wd_dataset['embeddings_input']  = embeddings_input
wd_dataset['deep_column_idx'] = deep_column_idx
wd_dataset['encoding_dict'] = encoding_dict
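Whether the split is done with one call or with several calls sharing the same `random_state`, what matters is that the deep array, the wide array and the labels are partitioned with the same permutation of row indices, so row i still refers to the same observation in all of them. The principle can be sketched in NumPy with a hypothetical `split_aligned` helper; this is not sklearn's exact shuffling or test-size rounding.

```python
import numpy as np

def split_aligned(*arrays, test_size=0.3, seed=1981):
    """Hypothetical helper: shuffle the row indices once and apply the same
    train/test partition to every array. Returns train/test pairs in order."""
    n = len(arrays[0])
    perm = np.random.RandomState(seed).permutation(n)
    n_test = int(round(test_size * n))
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    out = []
    for a in arrays:
        out.extend([a[train_idx], a[test_idx]])
    return out

X_deep = np.arange(10).reshape(10, 1)       # row i holds i
X_wide = np.arange(10).reshape(10, 1) * 10  # row i holds 10 * i
y = np.arange(10)                           # label i
Xd_tr, Xd_te, Xw_tr, Xw_te, y_tr, y_te = split_aligned(X_deep, X_wide, y)
print(len(y_tr), len(y_te))  # 7 3
```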

`wd_dataset` is a dictionary with all the necessary information. Let's have a look at, for example, the `train_dataset`:

In [9]:
wd_dataset['train_dataset']

train_dataset(wide=array([[46, 50,  0, ...,  0,  0,  0],
       [32, 45,  1, ...,  0,  0,  0],
       [30, 30,  0, ...,  0,  0,  0],
       ...,
       [40, 40,  0, ...,  0,  0,  0],
       [45, 37,  1, ...,  0,  0,  0],
       [40, 45,  1, ...,  0,  0,  0]], dtype=int64), deep=array([[ 3.        ,  1.        ,  6.        , ...,  0.        ,
         0.53655844,  0.77292975],
       [ 0.        ,  0.        ,  2.        , ...,  0.        ,
        -0.48456647,  0.36942139],
       [ 1.        ,  4.        ,  2.        , ...,  0.        ,
        -0.63044146, -0.84110367],
       ...,
       [ 1.        ,  0.        ,  2.        , ...,  0.        ,
         0.09893348, -0.03408696],
       [ 0.        ,  1.        ,  2.        , ...,  0.        ,
         0.46362095, -0.27619198],
       [ 0.        ,  1.        ,  2.        , ...,  0.        ,
         0.09893348,  0.36942139]]), labels=array([1, 0, 0, ..., 0, 0, 0]))

### 2-The model

The model is a combination of a linear classifier/regressor for sparse features (wide) plus a neural network classifier/regressor that receives the embeddings (deep). The figure below, taken from the [tutorial](https://www.tensorflow.org/tutorials/wide_and_deep), is a good illustration of how the algorithm works.
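For reference, the original paper writes the combined model's prediction for logistic regression as below, where $\sigma$ is the sigmoid, $[\mathbf{x}, \phi(\mathbf{x})]$ are the raw and cross-product-transformed wide features, $a^{(l_f)}$ is the final activation of the deep side and $b$ is the bias. Reading $[\mathbf{x}, \phi(\mathbf{x})]$ as the one-hot encoded `wide_cols` plus `crossed_cols`, and $a^{(l_f)}$ as the last hidden layer of the deep side, is my interpretation of how the notebook maps onto it.

$$
P(Y = 1 \mid \mathbf{x}) = \sigma\left(\mathbf{w}_{wide}^{T}\,[\mathbf{x}, \phi(\mathbf{x})] + \mathbf{w}_{deep}^{T}\, a^{(l_f)} + b\right)
$$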

In [10]:
# from IPython.display import Image
# PATH = "/Users/javier/Desktop/wide_deep.png"
# Image(filename = PATH, width=1000, height=500)

A priori, "all" we have to do is:

1. Prepare the wide part

2. Prepare the deep part

3. Combine them

So...let's go!

### 2_1. The Wide Part

The wide part simply consists of the sparse features connected directly to the output neuron (or neurons, if the problem is a multiclass classification). In this example we will perform a logistic regression, so we need to connect the input features to a single output neuron with a *Sigmoid* activation function.

In our case, this could be done like this:

In [11]:
import torch.nn as nn
import torch.nn.functional as F

wide_dim = wd_dataset['train_dataset'].wide.shape[1]
n_class  = 1
wide_part = nn.Linear(wide_dim, n_class)

print(wide_part)

Linear(in_features=798, out_features=1, bias=True)
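Under the hood, the wide part is plain logistic regression. A NumPy sketch of what `nn.Linear(wide_dim, 1)` followed by a sigmoid computes; the weights and the 0/1 input rows are random placeholders, not the notebook's data or trained parameters (the real wide input also contains the raw `age` and `hours_per_week` values):

```python
import numpy as np

rng = np.random.default_rng(0)
wide_dim, n_class, batch = 798, 1, 4

# nn.Linear stores its weight with shape (out_features, in_features)
W = rng.normal(scale=0.01, size=(n_class, wide_dim))
b = np.zeros(n_class)

X = rng.integers(0, 2, size=(batch, wide_dim)).astype(float)  # hypothetical 0/1 rows

logits = X @ W.T + b                   # the affine map nn.Linear computes: x W^T + b
probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
print(probs.shape)  # (4, 1)
```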


Of course, we want our code to look pretty and be functional. When using PyTorch, models are normally defined as classes that inherit from `nn.Module` (although, if they are simple enough, one could use the `Sequential` API). Let's define the wide part properly and see how to use it.

In [12]:
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset
from torch.autograd import Variable
from torch.utils.data import DataLoader

class Wide(nn.Module):
    """
    Wide side: the features are simply "plugged" into the output neuron(s)

    Parameters:
    ----------
    wide_dim: int. Number of features per observation
    n_class : int. Number of output neurons (1 for binary logistic regression)
    """
    def __init__(self, wide_dim, n_class):

        super(Wide, self).__init__()
        self.wide_dim = wide_dim
        self.n_class = n_class

        self.linear = nn.Linear(self.wide_dim, self.n_class)

    def forward(self,X):
        # torch.sigmoid replaces the deprecated F.sigmoid
        out = torch.sigmoid(self.linear(X))

        return out

In [13]:
wide_dim = wd_dataset['train_dataset'].wide.shape[1]
n_class  = 1
wide_model = Wide(wide_dim, n_class)

In [14]:
print(wide_model)

Wide(
  (linear): Linear(in_features=798, out_features=1, bias=True)
)


Ok, so far so good. We have simply created a model in which the 798 sparse features of each observation are "plugged" into an output neuron activated with a *Sigmoid* function. Now all we need to do is prepare the tensor to be passed through the model. Remember that the `prepare_data` function returns a dictionary with the wide and deep arrays. Therefore, we can use the `train_dataset` to build the input tensor.

Here the first column will be the target. 

In [15]:
wd_dataset['train_dataset'].labels.reshape(-1, 1).shape

(34189, 1)

In [16]:
# np.hstack: stack the arrays side by side (column-wise), labels first
train_dataset = np.hstack([wd_dataset['train_dataset'].labels.reshape(-1, 1), wd_dataset['train_dataset'].wide])
train_dataset

array([[ 1, 46, 50, ...,  0,  0,  0],
       [ 0, 32, 45, ...,  0,  0,  0],
       [ 0, 30, 30, ...,  0,  0,  0],
       ...,
       [ 0, 40, 40, ...,  0,  0,  0],
       [ 0, 45, 37, ...,  0,  0,  0],
       [ 0, 40, 45, ...,  0,  0,  0]], dtype=int64)

Now we need to set up the training: optimizer, loss function, etc.

PyTorch provides a very handy class called `DataLoader`, in `torch.utils.data`. We will use it here to create the batches. Finally, the model needs to receive `Variables`. `Variables` support almost all operations you can perform on tensors. In addition, they define the computational graph, which will allow us later to automatically compute gradients. For more information read [here](http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-variables-and-autograd). (Since PyTorch 0.4, `Variable` has been merged into `Tensor`: the wrapping is no longer required and `Variable(t)` simply returns a tensor.)
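Conceptually, iterating over `DataLoader(dataset=array, batch_size=..., shuffle=True)` visits the rows in a random order, in chunks of at most `batch_size`, with a smaller last batch since `drop_last` defaults to `False`. The generator below is a simplified, hypothetical stand-in for that behaviour (the real `DataLoader` also converts the rows to tensors), applied to a toy array laid out like `train_dataset` with the label in column 0:

```python
import numpy as np

def iterate_minibatches(data, batch_size, shuffle=True, seed=None):
    """Hypothetical, simplified stand-in for DataLoader over an array:
    yields blocks of at most batch_size rows, each row once per epoch."""
    idx = np.arange(len(data))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    for start in range(0, len(data), batch_size):
        yield data[idx[start:start + batch_size]]

# Toy array laid out like train_dataset: label in column 0, features after
data = np.array([[1, 46, 50], [0, 32, 45], [0, 30, 30], [0, 40, 40], [1, 45, 37]])
batches = list(iterate_minibatches(data, batch_size=2, seed=0))
for batch in batches:
    y, X = batch[:, 0], batch[:, 1:]  # same column split as in the training loop
    print(y, X.shape)
```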

In [45]:
optimizer = torch.optim.Adam(wide_model.parameters())
batch_size = 64
n_epochs = 10
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
# from http://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
for epoch in range(n_epochs):
    total=0
    correct=0
    for i, batch in enumerate(train_loader):

        X_w = Variable(batch[:, 1:]).float()
        y = Variable(batch[:, 0]).float()

        optimizer.zero_grad()
        y_pred = wide_model(X_w).squeeze(1)
        loss = F.binary_cross_entropy(y_pred, y)
        loss.backward()
        optimizer.step()

        total+= y.size(0)
        y_pred_cat = (y_pred > 0.5).float()
        correct+= float((y_pred_cat == y).sum().data)

    # Loss of the last mini-batch; accuracy accumulated over the whole epoch
    print('Epoch {} of {}, Loss: {}, accuracy: {}'.format(epoch+1, n_epochs, loss.item(), correct/total))

Epoch 1 of 10, Loss: 0.1180478036403656, accuracy: 0.8382520693790401
Epoch 2 of 10, Loss: 0.3979654014110565, accuracy: 0.8382813185527509
Epoch 3 of 10, Loss: 0.3123619854450226, accuracy: 0.8376085875574015
Epoch 4 of 10, Loss: 0.35507360100746155, accuracy: 0.8375208400362689
Epoch 5 of 10, Loss: 0.36205658316612244, accuracy: 0.8384568135950159
Epoch 6 of 10, Loss: 0.19640900194644928, accuracy: 0.8371990991254497
Epoch 7 of 10, Loss: 0.12821811437606812, accuracy: 0.8374038433414256
Epoch 8 of 10, Loss: 0.30573543906211853, accuracy: 0.8377840825996665
Epoch 9 of 10, Loss: 0.176675945520401, accuracy: 0.8377840825996665
Epoch 10 of 10, Loss: 0.37185028195381165, accuracy: 0.8375208400362689
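Note that the `Loss` printed each epoch is the value of `loss` for the last mini-batch only, which is why it jumps around while the accuracy, accumulated over the whole epoch, barely moves. The loss itself is the mean binary cross-entropy, -mean(y log p + (1 - y) log(1 - p)). A NumPy sketch of it, together with the 0.5-threshold accuracy used in the loop, on made-up predictions; PyTorch clamps the log terms at -100 rather than clipping `p`, so the clipping here is my own simplification:

```python
import numpy as np

def binary_cross_entropy(p, y):
    """Mean binary cross-entropy (what F.binary_cross_entropy returns with
    its default reduction='mean'); p are predicted probabilities."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

y = np.array([1.0, 0.0, 0.0, 1.0])
p = np.array([0.9, 0.2, 0.1, 0.6])
loss = binary_cross_entropy(p, y)
acc = float(np.mean((p > 0.5) == y))  # same 0.5 threshold as the training loop
print(round(loss, 4), acc)
```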


### 2_2. Deep part

Things get a bit more "colorful" here. 

Still, the deep part implemented here consists of only two hidden layers of 100 and 50 neurons, so, strictly speaking and by today's standards, it is not very "deep".

As mentioned earlier, the deep part receives embeddings and can also receive numerical features if one likes. The set-up of the deep part is "stored" in our favourite dictionary, `wd_dataset`, in two entries:

In [46]:
print(wd_dataset['embeddings_input'])
print(wd_dataset['deep_column_idx'])

[('education', 16, 10), ('workclass', 9, 10), ('native_country', 42, 12), ('occupation', 15, 10), ('relationship', 6, 8)]
{'education': 0, 'relationship': 1, 'workclass': 2, 'occupation': 3, 'native_country': 4, 'age': 5, 'hours_per_week': 6}


These should be read as follows: the feature `workclass` has 9 unique values and will be represented by a 10-dimensional embedding. In addition, `workclass` is at column 2 of the input tensor to the deep part. With this information, plus the list of continuous columns defined at the beginning of this notebook, we can build the deep part of the model.

In pytorch, embedding layers are defined as:

In [47]:
col_name, unique_vals, n_emb = wd_dataset['embeddings_input'][0]
emb_layer = nn.Embedding(unique_vals, n_emb) # PyTorch's nn.Embedding(num_embeddings, embedding_dim)
print(emb_layer)

Embedding(16, 10)
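An `nn.Embedding(num_embeddings, embedding_dim)` layer is a lookup table: a weight matrix of shape `(num_embeddings, embedding_dim)` whose rows are selected by integer ids, which is the same as multiplying a one-hot vector by that matrix. A NumPy sketch using the `workclass` sizes (9 values, 10 dimensions); the table holds random values here, standing in for learned weights:

```python
import numpy as np

num_embeddings, embedding_dim = 9, 10  # workclass: 9 distinct values, 10-dim embedding
rng = np.random.default_rng(0)
weight = rng.normal(size=(num_embeddings, embedding_dim))  # random stand-in for learned weights

ids = np.array([2, 0, 2])  # label-encoded workclass values for three rows
vectors = weight[ids]      # the lookup: one row of the table per id
print(vectors.shape)       # (3, 10)

# Equivalent formulation: one-hot rows times the embedding matrix
one_hot = np.eye(num_embeddings)[ids]
assert np.allclose(one_hot @ weight, vectors)
assert np.array_equal(vectors[0], vectors[2])  # equal ids map to the same vector
```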


Let's go:

In [52]:
class Deep(nn.Module):
    """
    Deep side, which consists of a series of embeddings and numerical
    features passed through a series of dense layers.

    Params:
    --------
    embeddings_input (list): list of 3-element tuples with the embeddings "set-up" -
    (col_name, unique_values, embeddings dim)
    continuous_cols (list) : list with the names of the continuous columns
    deep_column_idx (dict) : dictionary where the keys are column names and the values
    their corresponding index in the deep-side input tensor
    hidden_layers (list) : list with the number of units per hidden layer
    n_class (int) : number of output neurons (1 for binary logistic regression)
    """
    def __init__(self,embeddings_input,continuous_cols,deep_column_idx,hidden_layers,n_class):

        super(Deep, self).__init__()
        self.deep_column_idx = deep_column_idx
        self.embeddings_input = embeddings_input
        self.continuous_cols = continuous_cols
        self.hidden_layers = hidden_layers
        self.n_class = n_class

        # build the embeddings that will be passed through the deep side    
        for col,val,dim in self.embeddings_input:
            setattr(self, 'emb_layer_'+col, nn.Embedding(val, dim))

        # the input dimension to the 1st hidden layer will be the sum of the
        # embeddings dimensions plus the number of continuous features
        input_emb_dim = np.sum([emb[2] for emb in self.embeddings_input])
        self.linear_1 = nn.Linear(input_emb_dim+len(continuous_cols), self.hidden_layers[0])
        
        for i,h in enumerate(self.hidden_layers[1:],1): # enumerate starts counting at 1
            setattr(self, 'linear_'+str(i+1), nn.Linear( self.hidden_layers[i-1], self.hidden_layers[i] ))

        self.output = nn.Linear(self.hidden_layers[-1], n_class)

    def forward(self, X):

        emb = [getattr(self, 'emb_layer_'+col)(X[:,self.deep_column_idx[col]].long())
               for col,_,_ in self.embeddings_input]

        cont_idx = [self.deep_column_idx[col] for col in self.continuous_cols]
        cont = [X[:, cont_idx].float()]

        deep_inp = torch.cat(emb+cont, 1)

        x_deep = F.relu(self.linear_1(deep_inp))
        for i in range(1,len(self.hidden_layers)):
            x_deep = F.relu( getattr(self, 'linear_'+str(i+1))(x_deep) )

        out = torch.sigmoid(self.output(x_deep)) # torch.sigmoid replaces the deprecated F.sigmoid

        return out

Let's build the model and have a look:

In [53]:
deep_column_idx = wd_dataset['deep_column_idx']
embeddings_input= wd_dataset['embeddings_input']
hidden_layers = [100,50]
deep_model = Deep(embeddings_input, continuous_cols, deep_column_idx, hidden_layers, n_class)

In [54]:
print(deep_model)

Deep(
  (emb_layer_education): Embedding(16, 10)
  (emb_layer_workclass): Embedding(9, 10)
  (emb_layer_native_country): Embedding(42, 12)
  (emb_layer_occupation): Embedding(15, 10)
  (emb_layer_relationship): Embedding(6, 8)
  (linear_1): Linear(in_features=52, out_features=100, bias=True)
  (linear_2): Linear(in_features=100, out_features=50, bias=True)
  (output): Linear(in_features=50, out_features=1, bias=True)
)


As we can see, the deep part consists of:

1. 5 embedding layers of dimensions 10, 10, 12, 10 and 8, respectively
2. The 5 embeddings are concatenated with the two continuous features (age and hours per week), so the first hidden layer receives tensors of dimensions (batch_size, 10+10+12+10+8+2=52)
3. Two hidden layers of 100 and 50 neurons
4. The output neuron with a Sigmoid activation function
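The sizes in the list above can be recomputed directly from `embeddings_input` (values copied from the printed output earlier) and the two continuous columns. A small sketch; the embedding-parameter count at the end is an extra figure added for illustration, namely the sum of unique values times embedding dimension:

```python
embeddings_input = [('education', 16, 10), ('workclass', 9, 10), ('native_country', 42, 12),
                    ('occupation', 15, 10), ('relationship', 6, 8)]
continuous_cols = ['age', 'hours_per_week']
hidden_layers = [100, 50]

# Input size of the first hidden layer: embedding dims + continuous features
deep_input_dim = sum(dim for _, _, dim in embeddings_input) + len(continuous_cols)
print(deep_input_dim)  # 52, the in_features of linear_1

# (in_features, out_features) of each hidden layer: 52 -> 100 -> 50
layer_shapes = list(zip([deep_input_dim] + hidden_layers[:-1], hidden_layers))
print(layer_shapes)  # [(52, 100), (100, 50)]

# Number of embedding weights: unique values * embedding dim, summed
n_emb_params = sum(n * dim for _, n, dim in embeddings_input)
print(n_emb_params)  # 952
```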

All that is left to do now is to build the input dataset and set up the training as we did before for the wide part.

In [55]:
train_dataset = np.hstack([wd_dataset['train_dataset'].labels.reshape(-1, 1), wd_dataset['train_dataset'].deep])
train_dataset

array([[ 1.        ,  3.        ,  1.        , ...,  0.        ,
         0.53655844,  0.77292975],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
        -0.48456647,  0.36942139],
       [ 0.        ,  1.        ,  4.        , ...,  0.        ,
        -0.63044146, -0.84110367],
       ...,
       [ 0.        ,  1.        ,  0.        , ...,  0.        ,
         0.09893348, -0.03408696],
       [ 0.        ,  0.        ,  1.        , ...,  0.        ,
         0.46362095, -0.27619198],
       [ 0.        ,  0.        ,  1.        , ...,  0.        ,
         0.09893348,  0.36942139]])

In [57]:
optimizer = torch.optim.Adam(deep_model.parameters())
batch_size = 64
n_epochs = 10
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

for epoch in range(n_epochs):
    total=0
    correct=0
    for i, batch in enumerate(train_loader):

        X_d = Variable(batch[:, 1:])
        y = Variable(batch[:, 0]).float()

        optimizer.zero_grad()
        y_pred = deep_model(X_d).squeeze(1)
        loss = F.binary_cross_entropy(y_pred, y)
        loss.backward()
        optimizer.step()

        total+= y.size(0)
        y_pred_cat = (y_pred > 0.5).float()
        correct+= float((y_pred_cat == y).sum().data)

    # Loss of the last mini-batch; accuracy accumulated over the whole epoch
    print('Epoch {} of {}, Loss: {}, accuracy: {}'.format(epoch+1, n_epochs, loss.item(), correct/total))

Epoch 1 of 10, Loss: 0.704898476600647, accuracy: 0.8220480271432332
Epoch 2 of 10, Loss: 0.3213774561882019, accuracy: 0.836965105735763
Epoch 3 of 10, Loss: 0.3190212547779083, accuracy: 0.8388955512006786
Epoch 4 of 10, Loss: 0.2503102123737335, accuracy: 0.8404750065810641
Epoch 5 of 10, Loss: 0.30370792746543884, accuracy: 0.8421422094825821
Epoch 6 of 10, Loss: 0.4253805875778198, accuracy: 0.8444236450320278
Epoch 7 of 10, Loss: 0.26143401861190796, accuracy: 0.8456521103278832
Epoch 8 of 10, Loss: 0.29889628291130066, accuracy: 0.8458276053701483
Epoch 9 of 10, Loss: 0.1939985156059265, accuracy: 0.8457691070227266
Epoch 10 of 10, Loss: 0.2566056251525879, accuracy: 0.8474070607505338


### Wide and Deep

Time to combine the two parts. The code below is mostly identical to that of the `Deep` class. The main difference is that the output neuron now receives both the wide and the deep side, so the input dimensions need to be adapted both in the definition of the model

    self.output = nn.Linear(self.hidden_layers[-1]+self.wide_dim, n_class)

and in the forward pass, which now takes the wide and deep inputs (`X_w`, `X_d`) and concatenates the last deep hidden layer with the wide features:

    wide_deep_input = torch.cat([x_deep, X_w.float()], 1)

Other than that, there are no major changes.

In [58]:
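class WideDeep(nn.Module):
    """
    Wide and deep model: the last deep hidden layer is concatenated with the
    wide features and passed to a single output neuron with a Sigmoid activation.
    The parameters are those of `Wide` and `Deep` combined.
    """
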
    def __init__(self, wide_dim, embeddings_input, continuous_cols, deep_column_idx, hidden_layers, n_class):

        super(WideDeep, self).__init__()
        self.wide_dim = wide_dim
        self.deep_column_idx = deep_column_idx
        self.embeddings_input = embeddings_input
        self.continuous_cols = continuous_cols
        self.hidden_layers = hidden_layers
        self.n_class = n_class

        for col,val,dim in self.embeddings_input:
            setattr(self, 'emb_layer_'+col, nn.Embedding(val, dim))

        input_emb_dim = np.sum([emb[2] for emb in self.embeddings_input])
        self.linear_1 = nn.Linear(input_emb_dim+len(continuous_cols), self.hidden_layers[0])
        for i,h in enumerate(self.hidden_layers[1:],1):
            setattr(self, 'linear_'+str(i+1), nn.Linear( self.hidden_layers[i-1], self.hidden_layers[i] ))

        self.output = nn.Linear(self.hidden_layers[-1]+self.wide_dim, n_class)

    def forward(self, X_w, X_d):

        emb = [getattr(self, 'emb_layer_'+col)(X_d[:,self.deep_column_idx[col]].long())
               for col,_,_ in self.embeddings_input]

        cont_idx = [self.deep_column_idx[col] for col in self.continuous_cols]
        cont = [X_d[:, cont_idx].float()]

        deep_inp = torch.cat(emb+cont, 1)

        x_deep = F.relu(self.linear_1(deep_inp))
        for i in range(1,len(self.hidden_layers)):
            x_deep = F.relu( getattr(self, 'linear_'+str(i+1))(x_deep) )

        wide_deep_input = torch.cat([x_deep, X_w.float()], 1)

        out = torch.sigmoid(self.output(wide_deep_input)) # torch.sigmoid replaces the deprecated F.sigmoid

        return out

Let's build the model and have a look at what is "inside":

In [59]:
wide_deep_model = WideDeep(wide_dim, embeddings_input, continuous_cols, deep_column_idx, hidden_layers, n_class)

In [60]:
wide_deep_model

WideDeep(
  (emb_layer_education): Embedding(16, 10)
  (emb_layer_workclass): Embedding(9, 10)
  (emb_layer_native_country): Embedding(42, 12)
  (emb_layer_occupation): Embedding(15, 10)
  (emb_layer_relationship): Embedding(6, 8)
  (linear_1): Linear(in_features=52, out_features=100, bias=True)
  (linear_2): Linear(in_features=100, out_features=50, bias=True)
  (output): Linear(in_features=848, out_features=1, bias=True)
)

As we can see, everything is identical apart from the output layer, which now receives all the features from the wide side plus the 50 dense features from the deep side (798 + 50 = 848).
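Because the output layer is a single linear map over the concatenation `[x_deep, X_w]`, its logit is exactly a deep logit plus a wide logit sharing one bias, which is why both sides are trained jointly through one sigmoid. A NumPy sketch checking that identity with random placeholder values at this model's sizes (50 deep units, 798 wide features):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, deep_dim, wide_dim = 4, 50, 798

x_deep = rng.normal(size=(batch, deep_dim))                        # last deep hidden layer (random)
x_wide = rng.integers(0, 2, size=(batch, wide_dim)).astype(float)  # hypothetical wide rows

W = rng.normal(scale=0.01, size=(1, deep_dim + wide_dim))  # output layer: 848 -> 1
b = rng.normal(size=1)

# What the combined output layer computes on the concatenation...
logit_concat = np.concatenate([x_deep, x_wide], axis=1) @ W.T + b
# ...equals a deep logit plus a wide logit sharing one bias
W_deep, W_wide = W[:, :deep_dim], W[:, deep_dim:]
logit_sum = x_deep @ W_deep.T + x_wide @ W_wide.T + b
assert np.allclose(logit_concat, logit_sum)

prob = 1.0 / (1.0 + np.exp(-logit_concat))
print(W.shape, prob.shape)  # (1, 848) (4, 1)
```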

Finally, the object passed as the `dataset` argument of `DataLoader` needs to implement the `__getitem__` and `__len__` methods. Here we would like to build a loader that returns, per batch, the wide and deep tensors. Therefore, I coded a simple class that facilitates the loading and makes the code more readable.

In [61]:
class WideDeepLoader(Dataset):
    """Helper to facilitate loading the data to the pytorch models.

    Parameters:
    --------
    data: namedtuple with 3 elements - (wide_input_data, deep_inp_data, target)
    """
    def __init__(self, data):

        self.X_wide = data.wide
        self.X_deep = data.deep
        self.Y = data.labels

    def __getitem__(self, idx):

        xw = self.X_wide[idx]
        xd = self.X_deep[idx]
        y  = self.Y[idx]

        return xw, xd, y

    def __len__(self):
        return len(self.Y)


train_dataset = wd_dataset['train_dataset']
widedeep_dataset = WideDeepLoader(train_dataset)
train_loader = torch.utils.data.DataLoader(dataset=widedeep_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
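What `DataLoader` relies on is just that interface: `len(dataset)` and `dataset[i]` returning one `(wide, deep, label)` sample. The sketch below uses a hypothetical `ToyWideDeepDataset` with the same interface as `WideDeepLoader`, plus a simplified `collate` function that mimics how the default collation stacks each field of a list of tuples into one batch (the real one produces tensors rather than NumPy arrays):

```python
import numpy as np

class ToyWideDeepDataset:
    """Hypothetical map-style dataset with the same interface as WideDeepLoader."""
    def __init__(self, X_wide, X_deep, y):
        self.X_wide, self.X_deep, self.y = X_wide, X_deep, y

    def __getitem__(self, idx):
        return self.X_wide[idx], self.X_deep[idx], self.y[idx]

    def __len__(self):
        return len(self.y)

def collate(samples):
    """Simplified stand-in for default collation: a list of (xw, xd, y)
    tuples becomes one stacked array per field."""
    xw, xd, y = zip(*samples)
    return np.stack(xw), np.stack(xd), np.stack(y)

ds = ToyWideDeepDataset(np.ones((5, 798)), np.zeros((5, 7)), np.array([1, 0, 0, 1, 0]))
X_wide, X_deep, labels = collate([ds[i] for i in [4, 0, 2]])  # one mini-batch of 3 rows
print(len(ds), X_wide.shape, X_deep.shape, labels)  # 5 (3, 798) (3, 7) [0 1 0]
```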

At this stage, we are good to go:

In [64]:
optimizer = torch.optim.Adam(wide_deep_model.parameters())

batch_size = 64
n_epochs = 10
for epoch in range(n_epochs):
    total=0
    correct=0
    for i, (X_wide, X_deep, target) in enumerate(train_loader):
        X_d = Variable(X_deep)
        X_w = Variable(X_wide)
        y = Variable(target).float()

        optimizer.zero_grad()
        y_pred = wide_deep_model(X_w, X_d).squeeze(1)
        loss = F.binary_cross_entropy(y_pred, y)
        loss.backward()
        optimizer.step()

        total+= y.size(0)
        y_pred_cat = (y_pred > 0.5).float()
        correct+= float((y_pred_cat == y).sum().data)

    # Loss of the last mini-batch; accuracy accumulated over the whole epoch
    print('Epoch {} of {}, Loss: {}, accuracy: {}'.format(epoch+1, n_epochs, loss.item(), correct/total))


Epoch 1 of 10, Loss: 0.14472396671772003, accuracy: 0.8244172102138114
Epoch 2 of 10, Loss: 0.2337171882390976, accuracy: 0.8368773582146304
Epoch 3 of 10, Loss: 0.22294625639915466, accuracy: 0.8399485214542689
Epoch 4 of 10, Loss: 0.19736920297145844, accuracy: 0.8410014917078592
Epoch 5 of 10, Loss: 0.4768935739994049, accuracy: 0.842727192956799
Epoch 6 of 10, Loss: 0.3591581881046295, accuracy: 0.843282927257305
Epoch 7 of 10, Loss: 0.10276183485984802, accuracy: 0.8431659305624616
Epoch 8 of 10, Loss: 0.6814659833908081, accuracy: 0.8442773991634737
Epoch 9 of 10, Loss: 0.12703849375247955, accuracy: 0.8445698909005821
Epoch 10 of 10, Loss: 0.2659841775894165, accuracy: 0.8465588347129194


And that's it.

The code in this demo, with some minor bells and whistles, is wrapped up into the class `WideDeep` at `wide_deep.torch_model`, so it is easy to use.

If you want to see how to use it simply go to demo3. 

Source: https://github.com/zenwan/Wide-and-Deep-PyTorch