### Gym CartPole
- Rewards
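    - +1 for every timestep the episode continues, so the total reward counts how long the pole stays balanced
    - The episode terminates if the pole tilts more than 12° from vertical or the cart moves more than 2.4 units from the center (CartPole-v1 also truncates episodes at 500 steps)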
- Performance (value)
    - Episodes that last longer accumulate a larger return, so states from which the agent can keep the pole balanced for longer have a higher value
- Input
    - A 4-dimensional observation: cart position, cart velocity, pole angle, and pole angular velocity
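The reward and termination rules above can be sketched in plain Python, so the snippet runs without Gym installed. The thresholds (2.4 units for cart position, 12° for pole angle) follow the CartPole-v1 defaults. The helper names `is_terminal` and `discounted_return` and the discount factor `gamma=0.99` are illustrative assumptions, not part of the Gym API.

```python
import math

# CartPole-v1 termination thresholds (cart position and pole angle)
X_THRESHOLD = 2.4
THETA_THRESHOLD = 12 * 2 * math.pi / 360  # 12 degrees in radians


def is_terminal(state):
    """Hypothetical helper: True if the cart or pole left the allowed range.

    state = (cart_position, cart_velocity, pole_angle, pole_angular_velocity)
    """
    x, _, theta, _ = state
    return abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD


def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t, accumulated backwards from the last step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Every step yields +1, so a longer episode earns a larger return.
short_episode = [1.0] * 10
long_episode = [1.0] * 200
print(discounted_return(short_episode))  # ≈ 9.56
print(discounted_return(long_episode))   # ≈ 86.60
```

With a constant reward of +1, the return grows monotonically with episode length. This is why a policy that balances the pole longer earns a higher value.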

### Useful Syntax
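- namedtuple: access tuple elements by field name as well as by index; for example, `transition.state` reads the `state` field (namedtuples are immutable, so assigning `transition.state = 0` raises an `AttributeError`)
- deque: efficient appends and pops at both ends; with `maxlen` set, it acts as a fixed-size buffer that discards the oldest items as new ones are added (useful for replay memory in RL, where older experiences are dropped to keep the memory size manageable)
- random.sample(population, n): returns a list of n distinct elements drawn at random (without replacement) from population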
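The three tools listed above work as follows in this minimal standalone demo. The field names mirror the `Transition` used later; the values are dummy integers chosen only for illustration.

```python
import random
from collections import namedtuple, deque

# namedtuple: fields are read by name (or by index); instances are immutable
Transition = namedtuple('Transition', ('state', 'action', 'next_state', 'reward'))
t = Transition(state=0, action=1, next_state=2, reward=1.0)
print(t.state, t[1])  # 0 1
# t.state = 5 would raise AttributeError; _replace returns a modified copy instead
t2 = t._replace(state=5)

# deque with maxlen: appending past capacity silently drops the oldest item
buffer = deque(maxlen=3)
for i in range(5):
    buffer.append(i)
print(buffer)  # deque([2, 3, 4], maxlen=3)

# random.sample: n distinct elements chosen without replacement
random.seed(0)
batch = random.sample(buffer, 2)
print(len(batch))  # 2
```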


### Ideas
#### Replay Memory
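During each step of an episode, the agent generates an experience tuple, often represented as $(\text{state}, \text{action}, \text{next\_state}, \text{reward})$. Replay memory stores these experiences in a buffer so the agent can keep learning from past transitions after it has moved on to other states. Training on randomly sampled minibatches from this buffer, rather than on consecutive steps, breaks the correlation between successive transitions and lets each experience be reused in several updates.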

```python
import gym
import math
import random
import matplotlib.pyplot as plt
from collections import namedtuple, deque
from itertools import count

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
```

```python
env = gym.make('CartPole-v1')

# Use a GPU (CUDA, or Apple MPS) if available; otherwise fall back to the CPU
device = torch.device(
    "cuda" if torch.cuda.is_available() else
    "mps" if torch.backends.mps.is_available() else
    "cpu"
)
```

```python
Transition = namedtuple('Transition', ('state', 'action', 'next_state', 'reward'))

class ReplayMemory:
    """Fixed-size buffer of transitions; the oldest are dropped once capacity is reached."""

    def __init__(self, capacity):
        self.memory = deque([], maxlen=capacity)

    def push(self, *args):
        # Save a transition (state, action, next_state, reward)
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        # Randomly sample batch_size distinct transitions
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```
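The following sketch exercises the replay memory: it fills the buffer past capacity, checks that the oldest transitions are evicted, and samples a minibatch. The integer "states" are dummy placeholders (in the real agent they would be tensors), and `BATCH_SIZE = 4` is an arbitrary illustrative value. The class is repeated so the snippet runs on its own. The final line uses the common idiom `Transition(*zip(*transitions))`, which transposes a list of transitions into one `Transition` whose fields are tuples.

```python
import random
from collections import namedtuple, deque

Transition = namedtuple('Transition', ('state', 'action', 'next_state', 'reward'))


class ReplayMemory:
    """Same fixed-size buffer as above, repeated so this snippet is self-contained."""

    def __init__(self, capacity):
        self.memory = deque([], maxlen=capacity)

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


memory = ReplayMemory(capacity=100)

# Push 150 dummy transitions; integers stand in for real observations
for step in range(150):
    memory.push(step, step % 2, step + 1, 1.0)

print(len(memory))  # 100: the 50 oldest transitions were dropped

BATCH_SIZE = 4  # illustrative value
# Only sample once the buffer holds at least one full batch
if len(memory) >= BATCH_SIZE:
    transitions = memory.sample(BATCH_SIZE)
    # Transpose: list of Transitions -> one Transition of tuples
    batch = Transition(*zip(*transitions))
    print(batch.reward)  # (1.0, 1.0, 1.0, 1.0)
```

Grouping fields this way makes it easy to stack each field (all states, all actions, and so on) into a single batch, for example before converting them to tensors for a training step.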
