Dhananjay Tiwari,

PhD Student, Mechanical Science and Engineering,

University of Illinois Urbana-Champaign,

Coordinated Science Laboratory, 348

### Goal: Learn a neural network for the following function 

$\Lambda:\mathcal{S}\times \mathcal{A} \rightarrow \mathbb R$

where $\mathcal{S}$ is a state space and $\mathcal{A}$ is an action space:

$\mathcal{S} = \{s_1, s_2, s_3, \dots, s_M\}$ and 

$\mathcal{A} = \{a_1, a_2, a_3, \dots, a_N\}$

Suppose each state and action has an associated parameter given by $\{\zeta_s, s\in\mathcal S\}$ and $\{\eta_a, a \in \mathcal A\}$ respectively.

Let's aim to learn the dummy function $\Lambda(s,a) = \sqrt{\zeta_s^2+\eta_a^2}$
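As a quick sanity check on this target, the sketch below evaluates $\Lambda$ on a tiny parameter set using NumPy broadcasting. The arrays `zeta` and `eta` and their values are hypothetical and chosen only for illustration; they are not the parameters used in the rest of the notebook.

```python
import numpy as np

# Hypothetical parameters for 3 states and 2 actions (illustration only)
zeta = np.array([0.0, 3.0, 6.0])   # one parameter per state
eta = np.array([4.0, 8.0])         # one parameter per action

# Broadcasting a column of zeta against a row of eta gives a (3, 2) table
# whose entry [i, j] is sqrt(zeta[i]**2 + eta[j]**2)
Lambda_table = np.sqrt(zeta[:, None] ** 2 + eta[None, :] ** 2)
print(Lambda_table)
```

For instance, the entry for $\zeta_s = 3$ and $\eta_a = 4$ is $\sqrt{9 + 16} = 5$.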


### Import Dependencies
In this notebook, we use the Keras API bundled with TensorFlow to build and train neural networks.

In [191]:
import numpy as np
from tensorflow import keras
from itertools import chain

### Define the function that needs to be approximated as a neural network

In [192]:
# initialize the parameters, state-space, action-space and the Lambda function
M = 300; N = 100
state_space=[]; action_space=[]
for i in range(M):
    state_space.append((i,))
for j in range(N):
    action_space.append((j,))

# initialize state-action parameters
state_params = np.arange(len(state_space))
action_params = np.arange(len(action_space))

# display variables
print('M = ', M)
print('N = ', N)
print('state_space = ', state_space)
print('action_space = ', action_space)
print('state_params = ', state_params)
print('action_params = ', action_params)

# define the Lambda function
def Lambda(states,actions):
    return np.sqrt(state_params[states]**2+np.squeeze(action_params[actions])**2)

# test Lambda function for all the states
LambdaValue = Lambda(state_space,action_space)
print('Lambda_shape = ', np.shape(LambdaValue))
s = (1,)
a = (7,)
print('Lambda(s,a) at s=',s,'a=',a, '=', LambdaValue[(*s,*a)])

# test Lambda function for randomly selected states
ID_sub_state_space0 = np.random.choice(a=np.arange(M), size=10, replace=False)
ID_sub_action_space0 = np.random.choice(a=np.arange(N), size=5, replace=False)
sub_state_space0 = [state_space[i] for i in ID_sub_state_space0]
sub_action_space0 = [action_space[i] for i in ID_sub_action_space0]
print('sub state space = ', sub_state_space0)
print('sub action space = ', sub_action_space0)
LambdaValue0 = Lambda(sub_state_space0, sub_action_space0)
print('Lambda_shape = ', np.shape(LambdaValue0))


M =  300
N =  100
state_space =  [(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,), (10,), (11,), (12,), (13,), (14,), (15,), (16,), (17,), (18,), (19,), (20,), (21,), (22,), (23,), (24,), (25,), (26,), (27,), (28,), (29,), (30,), (31,), (32,), (33,), (34,), (35,), (36,), (37,), (38,), (39,), (40,), (41,), (42,), (43,), (44,), (45,), (46,), (47,), (48,), (49,), (50,), (51,), (52,), (53,), (54,), (55,), (56,), (57,), (58,), (59,), (60,), (61,), (62,), (63,), (64,), (65,), (66,), (67,), (68,), (69,), (70,), (71,), (72,), (73,), (74,), (75,), (76,), (77,), (78,), (79,), (80,), (81,), (82,), (83,), (84,), (85,), (86,), (87,), (88,), (89,), (90,), (91,), (92,), (93,), (94,), (95,), (96,), (97,), (98,), (99,), (100,), (101,), (102,), (103,), (104,), (105,), (106,), (107,), (108,), (109,), (110,), (111,), (112,), (113,), (114,), (115,), (116,), (117,), (118,), (119,), (120,), (121,), (122,), (123,), (124,), (125,), (126,), (127,), (128,), (129,), (130,), (131,), (132,), (133,), (134

### Create input-output dataset using the function defined above

In [193]:

ID_sub_state_space1 = np.random.choice(a=np.arange(M), size=100, replace=False)
ID_sub_action_space1 = np.random.choice(a=np.arange(N), size=60, replace=False)
sub_state_space1 = [state_space[i] for i in ID_sub_state_space1]
sub_action_space1 = [action_space[i] for i in ID_sub_action_space1]
S,A = np.meshgrid(list(chain(*sub_state_space1)),list(chain(*sub_action_space1)))

inputs = np.vstack([S.ravel(), A.ravel()]).T
# evaluate Lambda on every (state, action) row; avoid shadowing the built-in `input`
outputs = np.array([Lambda(s, a) for s, a in inputs])
print('inputs = ', inputs, '\nshape = ', np.shape(inputs), 'type = ', type(inputs))
print('output = ', outputs, '\nshape = ', np.shape(outputs), 'type = ', type(outputs))

# training dataset
inputs_train = inputs[0:-1000,:]
outputs_train = outputs[0:-1000]

# create test dataset
inputs_test = inputs[-1000:,:]
outputs_test = outputs[-1000:]

# reserve some samples from training dataset for validation
inputs_val = inputs_train[-1000:,:]
outputs_val = outputs_train[-1000:]
inputs_train = inputs_train[0:-1000,:]
outputs_train = outputs_train[0:-1000]

# print('inputs_train = ', inputs_train, '\nshape = ', np.shape(inputs_train))
# print('output_train = ', outputs_train, '\nshape = ', np.shape(outputs_train))



inputs =  [[ 29  63]
 [283  63]
 [ 49  63]
 ...
 [298  85]
 [ 13  85]
 [297  85]] 
shape =  (6000, 2) type =  <class 'numpy.ndarray'>
output =  [ 69.35416354 289.92757716  79.81227976 ... 309.88546271  85.98837131
 308.92393886] 
shape =  (6000,) type =  <class 'numpy.ndarray'>
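Note that the rows of `inputs` follow the `meshgrid` ordering (all sampled states for one action, then the next action), so slicing off the last 1000 rows places only the last 10 sampled actions in the test set. A shuffled split is one possible alternative. The sketch below is an assumption about how that could look, not the split used for the results in this notebook; the stand-in arrays `demo_inputs` and `demo_outputs` and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Stand-in dataset with the same layout as above: 6000 rows of (s, a)
demo_inputs = rng.integers(0, 100, size=(6000, 2))
demo_outputs = np.sqrt(demo_inputs[:, 0] ** 2 + demo_inputs[:, 1] ** 2)

# Shuffle the row indices once, then carve out test / validation / training
perm = rng.permutation(len(demo_inputs))
test_idx, val_idx, train_idx = perm[:1000], perm[1000:2000], perm[2000:]

x_train, y_train = demo_inputs[train_idx], demo_outputs[train_idx]
x_val, y_val = demo_inputs[val_idx], demo_outputs[val_idx]
x_test, y_test = demo_inputs[test_idx], demo_outputs[test_idx]
print(x_train.shape, x_val.shape, x_test.shape)
```

With a shuffled split, all three subsets draw from the full range of sampled actions, at the cost of no longer testing on actions that were entirely held out.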


### Initialize a Neural network below

Input consists of 2 nodes corresponding to a state-action pair $(s,a)$

Output consists of 1 node corresponding to the value $\Lambda(s,a)$

Loss function is the mean squared error $L(w,b) = \frac{1}{P}\sum_{k=1}^P \left(\hat{\Lambda}(s_k,a_k,w,b)-\Lambda(s_k,a_k)\right)^2$

where $\hat{\Lambda}$ is the neural-network approximation of $\Lambda$ with weights $w$ and biases $b$, and $(s_k,a_k)$, $k=1,\dots,P$, are the state-action samples used for training
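The mean squared error averages the squared residuals over the $P$ samples. A minimal NumPy sketch of that computation follows; the arrays `targets` and `predictions` hold hypothetical values for illustration.

```python
import numpy as np

# Hypothetical targets and predictions for P = 4 samples
targets = np.array([5.0, 10.0, 13.0, 25.0])
predictions = np.array([4.0, 10.0, 15.0, 25.0])

# Mean of squared residuals: (1 + 0 + 4 + 0) / 4
mse = np.mean((predictions - targets) ** 2)
print(mse)  # 1.25
```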

Build the neural network with the Keras Sequential API

In [203]:
# initialize the Sequential model
model = keras.Sequential()
# declare the input (s,a) layer; a bare keras.Input(...) call that is not added to the model has no effect
model.add(keras.Input(shape=(2,)))
# add the first hidden layer
model.add(keras.layers.Dense(5, activation='relu'))
# add the second hidden layer
model.add(keras.layers.Dense(5, activation='relu'))
# add the third (output) layer
model.add(keras.layers.Dense(1))

model.summary()


Alternatively, build a neural network with the Keras Functional API (here with 10 units per hidden layer)

In [198]:
# initialize the input (s,a) layer
# (named net_inputs/net_outputs so the dataset arrays `inputs`/`outputs` are not overwritten)
net_inputs = keras.Input(shape=(2,), name='state_action_pair')
# add the first layer and connect it with the input layer
x = keras.layers.Dense(10, activation='relu', name='dense_1')(net_inputs)
# add the second layer and connect it with the first layer
x1 = keras.layers.Dense(10, activation='relu', name='dense_2')(x)
# add the output layer and connect it with the second layer
net_outputs = keras.layers.Dense(1, name='predictions')(x1)
# initialize the DNN model with the above layer architecture
model1 = keras.Model(inputs=net_inputs, outputs=net_outputs)

model1.summary()
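The two architectures differ only in hidden-layer width, which determines the number of trainable parameters that `summary()` reports. The sketch below counts weights and biases for a fully connected network by hand; `dense_param_count` is a hypothetical helper written for this illustration, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

sequential_params = dense_param_count([2, 5, 5, 1])     # 2 -> 5 -> 5 -> 1
functional_params = dense_param_count([2, 10, 10, 1])   # 2 -> 10 -> 10 -> 1
print(sequential_params, functional_params)  # 51 151
```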

### Train the network

In [204]:
# select the model to train: `model` (Sequential) or `model1` (Functional)
mdl = model
# compile the model
mdl.compile(
    optimizer=keras.optimizers.RMSprop(),
    loss = keras.losses.MeanSquaredError(reduction="sum_over_batch_size", name="mean_squared_error"),
    metrics = [keras.metrics.MeanSquaredError(name="mean_squared_error")]
)

# train the model
history = mdl.fit(
    inputs_train,
    outputs_train,
    batch_size=30,
    epochs=20,
    validation_data = (inputs_val, outputs_val))

history.history


Epoch 1/20
134/134 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - loss: 15535.9863 - mean_squared_error: 15535.9863 - val_loss: 4374.4819 - val_mean_squared_error: 4374.4819
Epoch 2/20
134/134 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 2293.2505 - mean_squared_error: 2293.2505 - val_loss: 155.7341 - val_mean_squared_error: 155.7341
Epoch 3/20
134/134 ━━━━━━━━━━━━━━━━━━━━ 0s 924us/step - loss: 132.1232 - mean_squared_error: 132.1232 - val_loss: 37.3226 - val_mean_squared_error: 37.3226
Epoch 4/20
134/134 ━━━━━━━━━━━━━━━━━━━━ 0s 838us/step - loss: 55.4271 - mean_squared_error: 55.4271 - val_loss: 14.8447 - val_mean_squared_error: 14.8447
Epoch 5/20
134/134 ━━━━━━━━━━━━━━━━━━━━ 0s 969us/step - loss: 14.6378 - mean_squared_error: 14.6378 - val_loss: 2.0508 - val_mean_squared_error: 2.0508
Epoch 6/20
134/134 ━━━━━━━━━━━━━━━━

{'loss': [11629.4921875,
  1215.6107177734375,
  105.11119842529297,
  44.67398452758789,
  8.955161094665527,
  2.632638931274414,
  2.5200552940368652,
  2.5151498317718506,
  2.408536434173584,
  2.360459566116333,
  2.283921480178833,
  2.2598557472229004,
  2.195401191711426,
  2.2107813358306885,
  2.1416144371032715,
  2.1325459480285645,
  2.103084087371826,
  2.0963804721832275,
  2.0605502128601074,
  2.0730507373809814],
 'mean_squared_error': [11629.4921875,
  1215.6107177734375,
  105.11119842529297,
  44.67398452758789,
  8.955161094665527,
  2.632638931274414,
  2.5200552940368652,
  2.5151498317718506,
  2.408536434173584,
  2.360459566116333,
  2.283921480178833,
  2.2598557472229004,
  2.195401191711426,
  2.2107813358306885,
  2.1416144371032715,
  2.1325459480285645,
  2.103084087371826,
  2.0963804721832275,
  2.0605502128601074,
  2.0730507373809814],
 'val_loss': [4374.48193359375,
  155.7340850830078,
  37.322574615478516,
  14.844743728637695,
  2.0508484840393
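The dictionary in `history.history` can also be summarized programmatically, for example to find the epoch with the lowest validation loss. The sketch below uses only the first five epochs of the values printed above, rounded to two decimals; `best_epoch` is a hypothetical helper, not part of Keras.

```python
# First five epochs of the history printed above, rounded to two decimals
history_dict = {
    "loss": [11629.49, 1215.61, 105.11, 44.67, 8.96],
    "val_loss": [4374.48, 155.73, 37.32, 14.84, 2.05],
}

def best_epoch(hist, key="val_loss"):
    """Return the 1-based epoch with the smallest value of `key`."""
    values = hist[key]
    return min(range(len(values)), key=values.__getitem__) + 1

print(best_epoch(history_dict))          # 5
print(best_epoch(history_dict, "loss"))  # 5
```

In the notebook itself the same helper could be called as `best_epoch(history.history)` to cover all 20 epochs.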

### Evaluate

In [205]:
results = mdl.evaluate(inputs_test, outputs_test, batch_size=128)
# both entries are mean squared errors: the loss and the tracked metric
print("test loss, test MSE:", results)

# Generate predictions (regression values -- the output of the last layer)
# on new data using `predict`
print("Generate predictions for 3 samples")
predictions = mdl.predict(inputs_test[:3])
print("predictions shape:", predictions.shape)
print("Inputs_test = ", inputs_test[:3])
print("predictions = ", predictions)

8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 658us/step - loss: 1.9376 - mean_squared_error: 1.9376
test loss, test MSE: [2.1319096088409424, 2.1319096088409424]
Generate predictions for 3 samples
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step
predictions shape: (3, 1)
Inputs_test =  [[ 29  20]
 [283  20]
 [ 49  20]]
predictions =  [[ 36.32327 ]
 [284.37262 ]
 [ 53.280544]]
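To judge these predictions, they can be compared with the exact values of $\Lambda$. The sketch below copies the three test inputs and predictions from the output above and relies on the fact that, in this notebook, each state and action parameter equals its index; the variable names are illustrative.

```python
import numpy as np

# Test inputs and network predictions copied from the output above
test_pairs = np.array([[29, 20], [283, 20], [49, 20]])
predicted = np.array([36.32327, 284.37262, 53.280544])

# Exact target: sqrt(s**2 + a**2), since each parameter equals its index here
exact = np.sqrt(test_pairs[:, 0] ** 2 + test_pairs[:, 1] ** 2)
abs_error = np.abs(predicted - exact)
print(np.round(exact, 3))
print(np.round(abs_error, 3))
```

All three absolute errors come out below 1.2, which is in line with the reported test mean squared error of about 2.13 (a root-mean-square error of roughly 1.46).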
