Chap.4. softmax(dim=1) #32

@yongduek

Description

The code for the model is given below:

model = torch.nn.Sequential(
    torch.nn.Linear(l1, l2),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(l2, l3),
    torch.nn.Softmax(dim=0) #C
)

But the softmax operation with dim=0 is only correct when the input is a 1-dimensional tensor. When you feed a batch (a 2-D tensor of shape (batch_size, num_actions)), dim=0 normalizes along the batch dimension, i.e., down each column, so each row of action probabilities no longer sums to 1.

You can check this by printing pred_batch in Listing 4.8:

    pred_batch = model(state_batch) #N
    print(pred_batch)
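To see the effect without the full model, here is a minimal sketch (the batch size of 4 and the 2 actions are just illustrative numbers, not from the book):

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 2)  # pretend output: batch of 4 states, 2 actions

probs_dim0 = torch.softmax(logits, dim=0)  # normalizes down each column (across the batch)
probs_dim1 = torch.softmax(logits, dim=1)  # normalizes across each row (across actions)

print(probs_dim0.sum(dim=1))  # rows do NOT sum to 1 -- wrong for action probabilities
print(probs_dim1.sum(dim=1))  # every row sums to 1, as intended
```

With dim=0 each column sums to 1 instead, which is meaningless here since the rows are independent states.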

One way to fix this is to change it to:

    torch.nn.Softmax(dim=1) #C

and then use unsqueeze(0) and squeeze(0) when computing the action for a single state vector:

state1 = env.reset()
pred = model(torch.from_numpy(state1).float().unsqueeze(0)) #G
action = np.random.choice(np.array([0,1]), p=pred.data.numpy().squeeze(0)) #H
state2, reward, done, info = env.step(action) #I
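A self-contained check of the shape handling, without the gym environment (the layer sizes l1=4, l2=150, l3=2 are illustrative stand-ins for CartPole's 4 state dimensions and 2 actions, and the random vector stands in for env.reset()):

```python
import numpy as np
import torch

l1, l2, l3 = 4, 150, 2  # illustrative sizes: 4 state dims, 2 actions
model = torch.nn.Sequential(
    torch.nn.Linear(l1, l2),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(l2, l3),
    torch.nn.Softmax(dim=1),
)

state1 = np.random.rand(4).astype(np.float32)        # stand-in for env.reset()
pred = model(torch.from_numpy(state1).unsqueeze(0))  # unsqueeze -> shape (1, 2)
p = pred.data.numpy().squeeze(0)                     # squeeze -> shape (2,), sums to 1
action = np.random.choice(np.array([0, 1]), p=p)
```

unsqueeze(0) turns the state into a batch of one so Softmax(dim=1) works, and squeeze(0) drops that batch dimension again so np.random.choice gets a flat probability vector.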

I like this book very much since it gives some intuition for RL rather than just presenting the theory ^^
