#### Team members:
* Ekansh Sharma

In [6]:
### General imports
import numpy as np
import matplotlib.pyplot as plt
import gymnasium as gym
import torch
import torch.nn as nn
import torch.distributions as distributions
import torch.optim as optim

# Visuomotor Policies

In this assignment, you will develop a simple visuomotor policy for solving the cart-pole problem. For this, you will use [Gymnasium](https://gymnasium.farama.org/), which defines the cart-pole environment, and PyTorch for implementing and training neural network models.

## Policy Network Implementation [30 points]

In the cell below, implement the `LearningAgent` class that defines policy and/or value networks (depending on the reinforcement learning algorithm that you want to implement) and allows you to sample actions from the learned policy as well as perform network updates based on experiences.

Your network should be defined so that the state $s \in S$ is an image of the cart-pole system and the action space is discrete: move left or move right.
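As an illustration of how a rendered frame could be turned into a network input, the sketch below converts an RGB image to a small grayscale array. The helper name `preprocess_frame`, the `stride` parameter, and the assumed frame shape of `(400, 600, 3)` are illustrative choices and not part of the assignment. Note that a single frame carries no velocity information, so stacking or differencing consecutive frames is a common option that is not shown here.

```python
import numpy as np

def preprocess_frame(frame: np.ndarray, stride: int = 4) -> np.ndarray:
    """Convert an (H, W, 3) uint8 RGB frame to a (1, H', W') float32 array in [0, 1].

    Hypothetical helper for illustration; not part of the assignment template.
    """
    # Average the colour channels to obtain a grayscale image.
    gray = frame.astype(np.float32).mean(axis=2)
    # Downsample by keeping every `stride`-th pixel in both directions.
    small = gray[::stride, ::stride]
    # Scale to [0, 1] and add a leading channel dimension for a CNN.
    return (small / 255.0)[np.newaxis, :, :]

# Stand-in for `env.render()`: a synthetic frame with an assumed (400, 600, 3) shape.
fake_frame = np.random.randint(0, 256, size=(400, 600, 3), dtype=np.uint8)
state_img = preprocess_frame(fake_frame)
print(state_img.shape, state_img.dtype)
```

In the actual notebook, `fake_frame` would be replaced by the array returned from `env.render()`.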

Note: If it helps, you are free to incorporate existing implementations of reinforcement learning algorithms, for instance as provided in [Stable Baselines3](https://stable-baselines3.readthedocs.io/en/master/), in your solution.
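One possible shape for an image-based policy is sketched below: a small convolutional network that maps an image to logits over the two actions, from which an action is sampled with a categorical distribution. The class name `ImagePolicySketch`, the layer sizes, and the `(1, 100, 150)` input shape are assumptions made for illustration; the assignment leaves the architecture and the algorithm open.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class ImagePolicySketch(nn.Module):
    """Illustrative CNN policy: image in, logits over discrete actions out."""

    def __init__(self, in_channels: int = 1, n_actions: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            # Pool to a fixed 4x4 grid so the head works for any input size.
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),
        )
        self.head = nn.Linear(32 * 4 * 4, n_actions)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # img has shape (batch, channels, height, width).
        return self.head(self.features(img))

    def sample_action(self, img: torch.Tensor):
        # Build a categorical distribution from the logits and sample from it.
        dist = Categorical(logits=self(img))
        action = dist.sample()
        # The log-probability is what a policy-gradient update would use.
        return action, dist.log_prob(action)

policy = ImagePolicySketch()
batch = torch.rand(1, 1, 100, 150)  # one synthetic grayscale image
action, log_prob = policy.sample_action(batch)
print(action.item(), log_prob.item())
```

Returning the log-probability together with the action keeps everything needed for a policy-gradient update in one place; a value-based method such as DQN would instead interpret the outputs as action values.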

In [None]:
### Define an agent that implements a deep reinforcement learning algorithm of your choice to solve an MDP.
### The class should:
### * define your policy / value networks
### * enable sampling actions from the learned policy, and
### * enable network updates based on experiences
### You will need to update the function signatures so that you can pass appropriate parameters.

class LearningAgent(nn.Module):
    def __init__(self):
        # YOUR CODE HERE
        raise NotImplementedError()

    def sample_action(self):
        """Samples an action from the policy.
        """
        # YOUR CODE HERE
        raise NotImplementedError()

    def update(self):
        """Updates the network parameters.
        """
        # YOUR CODE HERE
        raise NotImplementedError()

## Agent Training [40 points]

Now that your network is defined, implement the reinforcement learning loop for your agent in the cell below. This means that you need to collect experiences of the form $(s_t, a_t, s_{t+1}, r_t)$ so that you can update your policy network appropriately. How exactly you do the update will depend on the RL algorithm you use.
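If a Monte Carlo policy-gradient method such as REINFORCE is chosen (an assumption; the assignment leaves the algorithm open), the rewards collected in an episode are typically converted into discounted returns via $G_t = r_t + \gamma G_{t+1}$ before the update. A minimal sketch, with the hypothetical helper name `discounted_returns`:

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each return can reuse the one computed after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    # Restore chronological order.
    returns.reverse()
    return returns

print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

In a REINFORCE-style update, these returns would weight the negative log-probabilities of the actions taken at the corresponding steps.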

Plot the evolution of the return over the learning process to show that your agent is actually learning. Because reinforcement learning algorithms are stochastic, the results will differ every time you execute the algorithm; you should therefore plot the return averaged over multiple runs rather than the return of a single run, as in the plots shown [here](https://how-do-you-learn.readthedocs.io/en/latest/rl/reinforce.html).
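Averaging over runs can be sketched as follows, assuming every run logs one return per episode and all runs have the same number of episodes. The helper name `summarize_runs` and the numbers in `runs` are made up for illustration and are not results.

```python
import numpy as np

def summarize_runs(returns_per_run):
    """Return the per-episode mean and standard deviation across runs."""
    # Rows are independent training runs, columns are episodes.
    data = np.asarray(returns_per_run, dtype=np.float64)
    return data.mean(axis=0), data.std(axis=0)

# Made-up returns for two runs of three episodes each.
runs = [[10, 20, 30], [20, 40, 60]]
mean_curve, std_curve = summarize_runs(runs)
print(mean_curve, std_curve)
```

The mean curve can then be drawn with `plt.plot(mean_curve)`, and the spread across runs can be shaded with `plt.fill_between`, using `mean_curve - std_curve` and `mean_curve + std_curve` as the band limits.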

In [5]:
# creating the cart pole environment in a way that allows us to render the observation as an image
env = gym.make("CartPole-v1", render_mode='rgb_array')

### You can obtain an image of the current state of the system as follows:
###     current_img_state = env.render()

# YOUR CODE HERE
raise NotImplementedError()

DependencyNotInstalled: pygame is not installed, run `pip install gymnasium[classic-control]`

Discuss the observations from your evaluation here.

YOUR ANSWER HERE

## Combining Visual and Explicit State Information [30 points]

Modify your implementation of the policy network so that it takes both the image and the explicit state information of the system as separate inputs (thus turning the policy into a multimodal policy). Then, update the learning loop accordingly and verify that learning is indeed taking place.
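One way to combine the two modalities is to encode each input in its own branch and concatenate the resulting feature vectors before the action head. The sketch below assumes a single-channel image and the four-dimensional CartPole observation (cart position, cart velocity, pole angle, pole angular velocity); the class name `MultimodalPolicySketch` and all layer sizes are illustrative choices.

```python
import torch
import torch.nn as nn

class MultimodalPolicySketch(nn.Module):
    """Illustrative two-branch policy: image and state vector in, logits out."""

    def __init__(self, state_dim: int = 4, n_actions: int = 2):
        super().__init__()
        # Branch 1: convolutional encoder for the rendered image.
        self.image_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, 64), nn.ReLU(),
        )
        # Branch 2: small MLP for the explicit state vector.
        self.state_branch = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU())
        # The head operates on the concatenated features of both branches.
        self.head = nn.Linear(64 + 32, n_actions)

    def forward(self, img: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.image_branch(img), self.state_branch(state)], dim=1)
        return self.head(fused)

model = MultimodalPolicySketch()
imgs = torch.rand(3, 1, 100, 150)   # synthetic batch of three images
states = torch.rand(3, 4)           # synthetic batch of three state vectors
logits = model(imgs, states)
print(logits.shape)
```

Concatenating features late keeps the two encoders independent, which makes it straightforward to compare against the image-only agent by dropping the state branch.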

In [None]:
class UpdatedLearningAgent(nn.Module):
    def __init__(self):
        # YOUR CODE HERE
        raise NotImplementedError()

    def sample_action(self):
        """Samples an action from the policy.
        """
        # YOUR CODE HERE
        raise NotImplementedError()

    def update(self):
        """Updates the network parameters.
        """
        # YOUR CODE HERE
        raise NotImplementedError()

In [None]:
# creating the cart pole environment in a way that allows us to render the observation as an image
env = gym.make("CartPole-v1", render_mode='rgb_array')

### You can obtain an image of the current state of the system as follows:
###     current_img_state = env.render()

# YOUR CODE HERE
raise NotImplementedError()

Has the second modality changed the behaviour of the agent? Discuss the observations from your evaluation here.

YOUR ANSWER HERE