# Cart Pole with Deep Q-Network (DQN)
---
In this notebook, you will implement a DQN agent for OpenAI Gym's [Cart Pole](https://www.gymlibrary.dev/environments/classic_control/cart_pole/) environment. The files are organized as follows:

- `model.py` - contains the model, written in PyTorch
- `dqn_agent.py` - contains a Deep Q-Learning agent
- `play.py` - lets you play the game manually
- `render.py` - uses the saved agent and renders the environment
- `TODO.ipynb` - this notebook

Your task is to write code in `model.py`, `dqn_agent.py`, and this notebook. The places where you should add your own implementation are described in this notebook and marked with `# ENTER YOUR CODE HERE` comments.

This is a relatively simple version of the algorithm. It differs from classical Q-Learning in four main ways:
- we use a neural network to estimate the state-action values
- the neural network outputs the values of all actions for a given state
- we use a Replay Buffer to store the `s, a, r, s', done` tuples and learn from data sampled from the Replay Buffer, not from immediate experience
- we use a decaying exploration rate $\epsilon$: at the beginning of learning the agent explores a lot, and the exploration rate is reduced over time
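
To make the Replay Buffer idea concrete, here is a minimal sketch that uses only the standard library. The names `Experience` and `SimpleReplayBuffer`, and the `buffer_size`, `batch_size`, and `seed` parameters, are assumptions made for illustration; the buffer you write in `dqn_agent.py` may look different (for example, it may return PyTorch tensors).

```python
import random
from collections import deque, namedtuple

# Hypothetical names for illustration; the buffer in dqn_agent.py may differ.
Experience = namedtuple(
    "Experience", ["state", "action", "reward", "next_state", "done"]
)


class SimpleReplayBuffer:
    """Fixed-size buffer that stores transitions and samples them uniformly."""

    def __init__(self, buffer_size, batch_size, seed=None):
        # A bounded deque drops the oldest transition once it is full.
        self.memory = deque(maxlen=buffer_size)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        """Store one (s, a, r, s', done) tuple."""
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        """Return batch_size transitions drawn uniformly without replacement."""
        return self.rng.sample(list(self.memory), self.batch_size)

    def __len__(self):
        return len(self.memory)
```

Sampling from the buffer rather than learning from the latest transition breaks the correlation between consecutive steps of one episode, which tends to make the network updates more stable.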



In [None]:
# Import packages
import gym
import torch
import numpy as np
from collections import deque
import matplotlib.pyplot as plt
%matplotlib inline

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Device:', device)

### Task 1 - Implement the neural network
In `model.py` implement the neural network (`QNetwork` class):
1) In `__init__()` create the layers of the neural network
2) In `forward()` implement the forward pass with ReLU activations
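
As a starting point, one possible shape for such a network is sketched below. The constructor arguments, the two hidden layers, and `hidden_size=64` are assumptions chosen for illustration and are not prescribed by this notebook; the only fixed requirements are that the input has `state_size` features and the output has one value per action.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QNetwork(nn.Module):
    """Maps a batch of states to one Q-value per action."""

    def __init__(self, state_size, action_size, hidden_size=64):
        super().__init__()
        # Layer count and hidden_size are illustrative assumptions.
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        # No activation on the output: Q-values are unbounded real numbers.
        return self.fc3(x)
```

For Cart Pole, `QNetwork(4, 2)` applied to a batch of shape `(batch, 4)` would return a tensor of shape `(batch, 2)`.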

### Task 2 - Implement the agent
The agent is defined in `dqn_agent.py` in the `Agent` class.
1) In `__init__()` create a neural network, optimizer and Replay Buffer
2) In `step()` add the experience to the memory buffer
3) In `learn()` calculate `Q_targets` and `Q_expected`
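
One common way to write the two quantities from step 3 is shown below, for a sampled tuple `s, a, r, s', done`. The discount factor $\gamma$ and the weight symbols $\theta$ and $\theta^{-}$ are notation assumed here; they are not named elsewhere in this notebook.

$$Q_{\text{expected}} = Q(s, a; \theta)$$

$$Q_{\text{targets}} = r + \gamma \, (1 - \text{done}) \, \max_{a'} Q(s', a'; \theta^{-})$$

Here $\theta$ are the weights being trained and $\theta^{-}$ are the weights used to evaluate the next state; in a version without a separate target network, $\theta^{-} = \theta$. The factor $(1 - \text{done})$ removes the bootstrap term for terminal transitions. The loss (for example, mean squared error) is then taken between the two quantities, with no gradient flowing through $Q_{\text{targets}}$.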

Please refer to the instructions in `Deep_Q_Network.ipynb` if you would like to write your own DQN agent.  Otherwise, run the code cell below to load the solution files.

### Task 3 - Complete the training loop
Complete the `train_agent()` function.
1) Select and perform an action
2) Use `agent.step()` to update the agent's knowledge
3) Decay the exploration rate epsilon
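
The action selection and decay steps can be sketched in plain Python as follows. The helper names `epsilon_greedy` and `decay_epsilon` are hypothetical and exist only for this illustration; in the notebook, action selection would normally go through the agent rather than a list of Q-values.

```python
import random


def epsilon_greedy(q_values, eps, rng=random):
    """Pick a random action with probability eps, otherwise the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    # Index of the largest Q-value (first one wins on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(eps, eps_decay, eps_end):
    """Multiply eps by eps_decay, but never go below eps_end."""
    return max(eps_end, eps_decay * eps)
```

With `eps=1.0` every action is random, and with `eps=0.0` the choice is always greedy, so decaying epsilon once per episode gradually shifts the agent from exploration to exploitation.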


In [None]:
env = gym.make('CartPole-v1')
print(f'State shape: {env.observation_space.shape}')
print(f'Number of actions: {env.action_space.n}')

In [None]:
from dqn_agent import Agent

agent = Agent(
    state_size=env.observation_space.shape[0], 
    action_size=env.action_space.n)


In [None]:

def train_agent(n_episodes=3000, 
        max_t=1000, 
        eps_start=1.0, 
        eps_end=0.01, 
        eps_decay=0.995,
        finish_threshold=200):
    """Deep Q-Learning.
    Args:
        n_episodes (int): maximum number of training episodes
        max_t (int): maximum number of timesteps per episode
        eps_start (float): starting value of epsilon, for 
            epsilon-greedy action selection
        eps_end (float): minimum value of epsilon
        eps_decay (float): decay factor (per episode) 
            for decreasing epsilon
        finish_threshold (float): finish when the average score
            is greater than this value
    Returns:
        scores (list): list of scores from each episode
    """
    scores = []                        
    scores_window = deque(maxlen=100)  
    eps = eps_start                    
    for episode in range(1, n_episodes+1):
        state, info = env.reset()
        score = 0
        for t in range(max_t):
            
            # TODO: Select and perform an action
            action = None # ENTER YOUR CODE HERE
            next_state, reward, done, truncated, info = None # ENTER YOUR CODE HERE
            
            # TODO: Use agent.step() to update the agent's knowledge
            # ENTER YOUR CODE HERE
            
            state = next_state
            score += reward
            if done or truncated:  # episode terminated or hit the time limit
                break 
        scores_window.append(score)       
        scores.append(score)
        
        # TODO: Decay epsilon by multiplying it by eps_decay
        # Make sure it does not go below eps_end              
        eps = None # ENTER YOUR CODE HERE
        
        mean_score = np.mean(scores_window)
        print(f'\rEpisode {episode}\tAverage Score: {mean_score:.2f}', end="")
        if episode % 100 == 0:
            print(f'\rEpisode {episode}\tAverage Score: {mean_score:.2f}')
            agent.save('checkpoint.pth')
        if mean_score >= finish_threshold:
            print(f'\nDone in {episode:d} episodes!\tAverage Score: {mean_score:.2f}')
            agent.save('checkpoint.pth')
            break
    return scores

### Task 4 - Train the agent
- Run the `train_agent()` function to train the agent
- From the command line, run `python render.py` to see how the agent performs


In [None]:
scores = train_agent()

# plot the scores
fig = plt.figure()
plt.plot(np.arange(len(scores)), scores)
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.show()

### Task 5 - Improve the agent
Tune the parameters (for example `eps_start`, `eps_end`, and `eps_decay`) to improve the agent's performance.
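
When tuning the exploration schedule, it can help to know how long epsilon takes to reach its floor. The sketch below assumes the multiplicative per-episode decay described in the `train_agent()` docstring; the helper name `episodes_until_floor` is hypothetical.

```python
import math


def episodes_until_floor(eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Smallest n such that eps_start * eps_decay**n <= eps_end."""
    return math.ceil(math.log(eps_end / eps_start) / math.log(eps_decay))


# Compare a few decay factors against the default of 0.995.
for decay in (0.99, 0.995, 0.999):
    print(decay, episodes_until_floor(eps_decay=decay))
```

With the default values, epsilon reaches `eps_end` after roughly 920 episodes, so a run that stops much earlier is still exploring heavily.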
