Chap.4. softmax(dim=1)

The model in the book is defined as follows:
```
model = torch.nn.Sequential(
    torch.nn.Linear(l1, l2),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(l2, l3),
    torch.nn.Softmax(dim=0) #C
)
```

However, `Softmax(dim=0)` is only correct when the input is a 1-dimensional tensor. With a batched input of shape `(batch_size, n_actions)`, the softmax is normalized over the batch dimension: each column sums to 1 across the states, instead of each row (the action probabilities of one state) summing to 1.

You can check this by printing `pred_batch` in Listing 4.8:
```
    pred_batch = model(state_batch) #N
    print(pred_batch)
```
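The effect can also be reproduced in isolation. Below is a minimal sketch, assuming a hypothetical `logits` tensor that stands in for the output of the last `Linear` layer for 3 states and 2 actions (the values are made up for illustration):

```python
import torch

# Hypothetical stand-in for the last Linear layer's output: 3 states, 2 actions.
logits = torch.tensor([[1.0, 2.0],
                       [3.0, 1.0],
                       [0.5, 0.5]])

probs_dim0 = torch.nn.Softmax(dim=0)(logits)  # normalizes over the batch axis
probs_dim1 = torch.nn.Softmax(dim=1)(logits)  # normalizes over the action axis

print(probs_dim0.sum(dim=0))  # column sums: each is 1
print(probs_dim0.sum(dim=1))  # row sums: not 1 in general
print(probs_dim1.sum(dim=1))  # row sums: each is 1
```

With `dim=0` the rows are not valid probability distributions over actions, which is what the batched training step needs.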

One way to fix this is to change the last layer of the model to:
```
    torch.nn.Softmax(dim=1) #C
```
and to apply `unsqueeze(0)` to the input and `squeeze(0)` to the output when computing the prediction for a single state vector:
```
state1 = env.reset()
pred = model(torch.from_numpy(state1).float().unsqueeze(0)) #G
action = np.random.choice(np.array([0,1]), p=pred.data.numpy().squeeze(0)) #H
state2, reward, done, info = env.step(action) #I
```
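As a quick sanity check of this fix, here is a small sketch that runs both the single-state path and the batched path through the same model. The layer sizes `l1, l2, l3 = 4, 150, 2` and the random arrays used in place of `env.reset()` and `state_batch` are assumptions for illustration, not taken from the listing:

```python
import numpy as np
import torch

# Assumed layer sizes (4 state features, 2 actions); adjust to your setup.
l1, l2, l3 = 4, 150, 2
model = torch.nn.Sequential(
    torch.nn.Linear(l1, l2),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(l2, l3),
    torch.nn.Softmax(dim=1),  # normalize over the action axis
)

# Single state: a random vector stands in for env.reset().
state1 = np.random.rand(l1)
single = model(torch.from_numpy(state1).float().unsqueeze(0))  # shape (1, 2)
p = single.data.numpy().squeeze(0)                             # shape (2,)

# Batch of states: a random matrix stands in for state_batch.
state_batch = torch.rand(5, l1)
pred_batch = model(state_batch)                                # shape (5, 2)

print(p, p.sum())             # one distribution over the 2 actions
print(pred_batch.sum(dim=1))  # every row sums to 1
```

Both paths now produce one probability distribution per state, so the same model definition can be used for action selection and for the batched update.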

I like this book a lot, since it builds intuition for RL rather than trying to provide the theory ^^
