The model is defined as follows:
model = torch.nn.Sequential(
    torch.nn.Linear(l1, l2),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(l2, l3),
    torch.nn.Softmax(dim=0) #C
)
However, softmax with dim=0 is only correct when the input is a 1-dimensional tensor. With a batched input of shape (batch_size, l3), dim=0 normalizes across the batch dimension, so each column of the output sums to 1 instead of each row, and the rows are no longer valid action probability distributions.
You can check this by printing pred_batch in Listing 4.8:
pred_batch = model(state_batch) #N
print(pred_batch)
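The difference can also be seen without the environment. The following is a minimal sketch that applies softmax to a random tensor standing in for the output of the last linear layer; the batch size of 5 and the 2 actions are assumptions chosen for illustration, not values taken from the listing.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the last Linear layer's output:
# 5 states in the batch, 2 actions per state.
logits = torch.randn(5, 2)

probs_dim0 = torch.nn.Softmax(dim=0)(logits)  # normalizes across the batch
probs_dim1 = torch.nn.Softmax(dim=1)(logits)  # normalizes across the actions

print(probs_dim0.sum(dim=0))  # every column sums to 1
print(probs_dim0.sum(dim=1))  # the rows do not all sum to 1
print(probs_dim1.sum(dim=1))  # every row sums to 1
```

With dim=0 all entries of the 5x2 output add up to 2, so the five rows cannot each sum to 1.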
One way to fix this is to change the softmax layer to:
torch.nn.Softmax(dim=1) #C
A single state vector then needs unsqueeze(0) on the input and squeeze(0) on the output:
state1 = env.reset()
pred = model(torch.from_numpy(state1).float().unsqueeze(0)) #G
action = np.random.choice(np.array([0,1]), p=pred.data.numpy().squeeze(0)) #H
state2, reward, done, info = env.step(action) #I
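A possible alternative, which is not in the book's code, is to use dim=-1 so that softmax always normalizes over the last (action) dimension. The same model then accepts both a single state and a batch without unsqueeze/squeeze. This is a sketch only; the layer sizes below are assumed for illustration and the inputs are random tensors rather than real environment states.

```python
import torch

# Assumed layer sizes for illustration: 4 state features, 2 actions.
l1, l2, l3 = 4, 150, 2

model = torch.nn.Sequential(
    torch.nn.Linear(l1, l2),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(l2, l3),
    torch.nn.Softmax(dim=-1),  # normalize over the last (action) dimension
)

single = model(torch.randn(l1))    # one state, output shape (2,)
batch = model(torch.randn(8, l1))  # batch of 8 states, output shape (8, 2)

print(single.sum())      # sums to 1 (up to floating-point error)
print(batch.sum(dim=1))  # every row sums to 1
```

The trade-off is that the unsqueeze/squeeze version keeps every input explicitly 2-dimensional, while dim=-1 relies on the action dimension always being last.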
I like this book very much, since it gives some intuition for RL rather than trying to cover the theory ^^