# Mapping qubits
In this notebook we will cover the QGym `InitialMapping` environment.

This environment addresses the problem of mapping the virtual qubits of a quantum program onto physical qubits whose connectivity is constrained by the hardware topology.

In [None]:
%matplotlib inline
import numpy as np
import networkx as nx
from networkx.generators import gnp_random_graph
import matplotlib.pyplot as plt
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env
from stable_baselines3.common.utils import set_random_seed
from stable_baselines3.common.vec_env import SubprocVecEnv
from IPython.display import clear_output

from qgym.envs.initial_mapping import *

In [None]:
def render_rgb(env):
    """Convenience method that we will use later on to display our results."""
    clear_output(wait=True)
    plt.figure(figsize=(40, 20))
    plt.title(f"Step {env._state.steps_done}", fontsize=40)
    plt.imshow(env.render())
    plt.axis("off")
    plt.show()

### Connection and interaction graph

The initial mapping problem revolves around two graphs:

- connection graph: hardware layout describing the connections between physical qubits
- interaction graph: software layout describing which virtual qubits interact in the particular quantum program

The goal of the initial mapping problem is to find an optimal one-to-one mapping between the virtual qubits of the interaction graph and the physical qubits of the connection graph.

For now, we will consider an optimal mapping to be any mapping where the number of edges of the mapped interaction graph that do not coincide with edges of the connection graph is minimal.
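
For small graphs, the optimality criterion above can be checked by exhaustive search. The following is a minimal sketch and not part of QGym: `count_unmatched_edges` and `brute_force_optimal_mapping` are hypothetical helper names, graphs are given as plain edge lists, and a mapping is a sequence in which position `v` holds the physical qubit assigned to virtual qubit `v`. Trying every permutation is only feasible for a handful of qubits.

```python
from itertools import permutations


def count_unmatched_edges(mapping, interaction_edges, connection_edges):
    """Count interaction edges that do not land on a connection edge."""
    connection = {frozenset(edge) for edge in connection_edges}
    return sum(
        frozenset((mapping[u], mapping[v])) not in connection
        for u, v in interaction_edges
    )


def brute_force_optimal_mapping(num_qubits, interaction_edges, connection_edges):
    """Try every one-to-one mapping and keep one with the fewest unmatched edges."""
    best_mapping, best_cost = None, None
    for candidate in permutations(range(num_qubits)):
        cost = count_unmatched_edges(candidate, interaction_edges, connection_edges)
        if best_cost is None or cost < best_cost:
            best_mapping, best_cost = candidate, cost
    return best_mapping, best_cost


star = [(0, 1), (0, 2), (0, 3)]  # connection graph: qubit 0 in the centre
path = [(0, 1), (1, 2), (2, 3)]  # interaction graph: a chain of four qubits

best_mapping, best_cost = brute_force_optimal_mapping(4, path, star)
print(best_mapping, best_cost)
```

A chain of four qubits cannot be embedded in a star, since at most two of its edges can touch the qubit mapped to the centre, so the best achievable cost here is one unmatched edge.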

#### Toy hardware

To explain this concept in more detail, we start by defining a toy connection graph and taking a look at some potential interaction graphs.

In [None]:
connection_graph = nx.Graph()
connection_graph.add_edges_from([(0, 1), (0, 2), (0, 3)])
nx.draw(connection_graph, with_labels=True)

Now let's take a look at some random interaction graphs, and think about how these can be best mapped on the connection graph.

### To Do
Implement the `generate_random_interaction_graph` function in the code block below.

_We can simply generate random graphs using [`gnp_random_graph`](https://networkx.org/documentation/stable/reference/generated/networkx.generators.random_graphs.gnp_random_graph.html)._

In [None]:
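def generate_random_interaction_graph(connection_graph):
    """Return a random G(n, p) graph with as many nodes as `connection_graph`."""
    p = np.random.rand()  # edge probability, drawn uniformly from [0, 1)
    n = connection_graph.number_of_nodes()
    return gnp_random_graph(n, p)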


interaction_graph = generate_random_interaction_graph(connection_graph)
nx.draw(interaction_graph, with_labels=True)

### `InitialMapping` environment

The simplest `InitialMapping` environment can be initialized by providing just a connection graph.

In [None]:
env = InitialMapping(connection_graph=connection_graph)

#### State space
The state space is described by a `State` with the following structure:

- `steps_done`: Number of steps done since the last reset.
- `num_nodes`: Number of *physical* qubits.
- `graphs`: Dictionary containing the interaction graph, the connection graph and an interaction graph generator.
- `mapping`: Array of which the index represents a physical qubit, and the value a virtual qubit. A value of `num_nodes + 1` represents the case when nothing is mapped to the physical qubit yet. (Used for observations)
- `mapping_dict`: Dictionary that maps logical qubits (keys) to physical qubits (values).
- `mapped_qubits`: Dictionary with two sets containing all mapped physical and logical qubits.
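
To make the relation between these fields concrete, here is a minimal plain-Python sketch. It only mirrors the description in the list above and is not QGym's implementation: `summarize_mapping` is a hypothetical helper, and the key names `"physical"` and `"logical"` are assumptions.

```python
def summarize_mapping(mapping, num_nodes):
    """Derive `mapping_dict` and `mapped_qubits` from a `mapping` array."""
    unmapped = num_nodes + 1  # sentinel value, as described in the list above
    mapping_dict = {}
    mapped_qubits = {"physical": set(), "logical": set()}  # key names are assumed
    for physical, logical in enumerate(mapping):
        if logical == unmapped:
            continue  # nothing is mapped to this physical qubit yet
        mapping_dict[logical] = physical
        mapped_qubits["physical"].add(physical)
        mapped_qubits["logical"].add(logical)
    return mapping_dict, mapped_qubits


# Physical qubit 0 holds virtual qubit 2, physical qubit 2 holds virtual qubit 0.
mapping = [2, 5, 0, 5]
mapping_dict, mapped_qubits = summarize_mapping(mapping, num_nodes=4)
print(mapping_dict)  # {2: 0, 0: 2}
print(mapped_qubits)
```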

### To Do
Take a look at the state space in the code block below.

In [None]:
env.reset()
print(env._state)

#### Action space
A valid action is a tuple of integers $(i,j)$ such that $0 \leq i,j < n$, where $n$ is the number of physical qubits. The action $(i,j)$ maps virtual qubit $j$ to physical qubit $i$ when this action is legal. An action is legal when:
1. virtual qubit $j$ has not been mapped to another physical qubit; and
2. no other virtual qubit has been mapped to physical qubit $i$.
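
The legality rule can be sketched in a few lines of plain Python. This is an illustration, not QGym code: `is_legal` and `apply_action` are hypothetical helpers, the mapping is kept as a dictionary from virtual to physical qubits, and the action $(i,j)$ is read as "map virtual qubit $j$ to physical qubit $i$".

```python
def is_legal(action, mapping_dict):
    """Check both legality conditions for an action (physical, virtual)."""
    physical, virtual = action
    virtual_is_free = virtual not in mapping_dict
    physical_is_free = physical not in mapping_dict.values()
    return virtual_is_free and physical_is_free


def apply_action(action, mapping_dict):
    """Apply the action if it is legal; return whether it was applied."""
    if not is_legal(action, mapping_dict):
        return False
    physical, virtual = action
    mapping_dict[virtual] = physical
    return True


mapping_dict = {}
print(apply_action((0, 0), mapping_dict))  # True
print(apply_action((0, 1), mapping_dict))  # False: physical qubit 0 is occupied
print(apply_action((1, 0), mapping_dict))  # False: virtual qubit 0 is already mapped
print(apply_action((1, 1), mapping_dict))  # True
```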

### To Do
Take a look at the action space in the code block below.

In [None]:
print(env.action_space)

#### Observation space

Each element in the observation space is a dictionary with 2 entries:
- `mapping`: the current state of the mapping
- `interaction_matrix`: the flattened adjacency matrix of the interaction graph
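
As an illustration of what a flattened adjacency matrix looks like, the sketch below builds one from an edge list in plain Python. The helper name `flattened_adjacency` is hypothetical, and the row-major ordering and 0/1 entries are assumptions; the exact layout and data type used by QGym may differ.

```python
def flattened_adjacency(num_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix and flatten it row by row."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected graph, so the matrix is symmetric
    return [entry for row in matrix for entry in row]


print(flattened_adjacency(3, [(0, 1)]))  # [0, 1, 0, 1, 0, 0, 0, 0, 0]
```

For an interaction graph on $n$ qubits, the flattened vector has $n^2$ entries.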

### To Do
Take a look at the observation space and see what the `reset()` method returns in the code block below.

In [None]:
print(env.observation_space)
print()
obs, extra_info = env.reset()
print(obs)

#### Rewarders

We have pre-defined three rewarders. All of them return a penalty when an illegal action is attempted. For a valid action, their behaviour is as follows:

- `BasicRewarder`: Reward is computed over all edges that have been mapped so far.
- `SingleStepRewarder`: Reward is computed over all new edges that have been mapped due to this action.
- `EpisodeRewarder`: Only the final step results in a reward.

All these rewarders can be tweaked by altering either of their parameters:

- `illegal_action_penalty`: penalty given for attempting an illegal action (should be non-positive)
- `reward_per_edge`: reward given for correctly mapped edges (should be non-negative)
- `penalty_per_edge`: penalty given for incorrectly mapped edges (should be non-positive)
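
The difference between the three reward schemes can be sketched with a single edge-counting function. This is a simplified illustration of the descriptions above, not QGym's actual formulas: `edge_reward` is a hypothetical helper, the values `5.0` and `-1.0` are illustrative rather than the library defaults, and illegal actions and edge weights are ignored.

```python
def edge_reward(mapping_dict, interaction_edges, connection_edges,
                reward_per_edge=5.0, penalty_per_edge=-1.0):
    """Score every interaction edge whose two endpoints are already mapped."""
    connection = {frozenset(edge) for edge in connection_edges}
    total = 0.0
    for u, v in interaction_edges:
        if u in mapping_dict and v in mapping_dict:
            physical_edge = frozenset((mapping_dict[u], mapping_dict[v]))
            total += reward_per_edge if physical_edge in connection else penalty_per_edge
    return total


star = [(0, 1), (0, 2), (0, 3)]  # connection graph
path = [(0, 1), (1, 2), (2, 3)]  # interaction graph

before = {0: 1, 1: 0}            # virtual -> physical, before the action
after = {0: 1, 1: 0, 2: 2}       # after mapping virtual qubit 2 to physical qubit 2

# "Basic" style: all edges mapped so far.
basic = edge_reward(after, path, star)
# "Single step" style: only the edges that became mapped due to this action.
single_step = basic - edge_reward(before, path, star)
# "Episode" style: nothing until every qubit has been mapped.
episode = edge_reward(after, path, star) if len(after) == 4 else 0.0

print(basic, single_step, episode)  # 10.0 5.0 0.0
```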

<br/>
<br/>
<br/>
<br/>

### Human Intelligence
Let's attempt to determine the optimal mapping from our randomly generated interaction graph to our toy connection graph. Since this environment is still quite straightforward, we should be able to solve this case optimally by hand.

Don't forget to take a look at the obtained rewards and observations after each step.

_Note: It might be that multiple mappings are optimal._

### To Do
Pick your favorite rewarder in the first code block below.
Next, try to solve the problem by giving the actions which you think are correct. 

In [None]:
env = InitialMapping(connection_graph=connection_graph, render_mode="rgb_array")
env.rewarder = EpisodeRewarder(illegal_action_penalty=0)
obs, extra_info = env.reset(options={"interaction_graph": interaction_graph})
print(f"observation: {obs}")

In [None]:
obs, rewards, done, truncated, info = env.step((0, 0))
print(f"observation: {obs}\n\nreward: {rewards}")
render_rgb(env)

In [None]:
obs, rewards, done, truncated, info = env.step((1, 1))
print(f"observation: {obs}\n\nreward: {rewards}")
render_rgb(env)

In [None]:
obs, rewards, done, truncated, info = env.step((2, 2))
print(f"observation: {obs}\n\nreward: {rewards}")
render_rgb(env)

In [None]:
obs, rewards, done, truncated, info = env.step((3, 3))
print(f"observation: {obs}\n\nreward: {rewards}")
render_rgb(env)

<br/>
<br/>
<br/>
<br/>

### Reinforcement learning

Can we achieve the same using reinforcement learning?

Does changing the rewarder and/or its parameters give better results?

### To Do
Train a model on this environment in the first code block below.
Next, run the second block to see how well the agent performs.

In [None]:
env = InitialMapping(connection_graph=connection_graph)
env.rewarder = EpisodeRewarder(illegal_action_penalty=0)
check_env(env, warn=True)

model = PPO("MultiInputPolicy", env, verbose=1)

model.learn(int(1e5))
model.save("initial_mapping_1")

In [None]:
env = InitialMapping(connection_graph=connection_graph, render_mode="rgb_array")
model = PPO.load("initial_mapping_1")

obs, extra_info = env.reset(options={"interaction_graph": connection_graph})
for _ in range(1000):
    action, states = model.predict(obs, deterministic=False)
    obs, rewards, done, truncated, info = env.step(action)
    render_rgb(env)
    if done:
        break
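
The rollout loop above recurs throughout this notebook, so it can be convenient to wrap it in a helper that also reports the accumulated reward. The sketch below is one possible way to do this and assumes only the Gymnasium-style `reset`/`step` signatures used above. `run_episode` is a hypothetical helper, and `CountdownEnv` is a stand-in environment included solely to make the example runnable without QGym.

```python
def run_episode(env, choose_action, max_steps=1000, reset_options=None):
    """Roll out one episode and return (total_reward, steps_taken)."""
    obs, _info = env.reset(options=reset_options)
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, _info = env.step(choose_action(obs))
        total_reward += reward
        steps += 1
        if terminated or truncated:
            break
    return total_reward, steps


class CountdownEnv:
    """Hypothetical stand-in environment that terminates after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.remaining = length

    def reset(self, options=None):
        self.remaining = self.length
        return self.remaining, {}

    def step(self, action):
        self.remaining -= 1
        terminated = self.remaining == 0
        return self.remaining, 1.0, terminated, False, {}


print(run_episode(CountdownEnv(4), choose_action=lambda obs: 0))  # (4.0, 4)
```

With the trained agent, the same helper could be called as `run_episode(env, lambda obs: model.predict(obs, deterministic=False)[0], reset_options={"interaction_graph": interaction_graph})`; rendering is left out here for brevity.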

Let's try another interaction graph.

### To Do
Design a nice interaction graph in the first code block.
Next, run the second block to see if the agent can map it correctly.

In [None]:
interaction_graph = connection_graph.copy()
interaction_graph.remove_edge(0, 2)
nx.draw(interaction_graph)

In [None]:
model = PPO.load("initial_mapping_1")
obs, extra_info = env.reset(options={"interaction_graph": interaction_graph})
for _ in range(1000):
    action, states = model.predict(obs, deterministic=False)
    obs, rewards, done, truncated, info = env.step(action)
    render_rgb(env)
    if done:
        break

Just to be sure, one more...

In [None]:
interaction_graph = connection_graph.copy()
interaction_graph.add_edge(3, 2)
nx.draw(interaction_graph)

In [None]:
obs, extra_info = env.reset(options={"interaction_graph": interaction_graph})
for _ in range(1000):
    action, states = model.predict(obs, deterministic=False)
    obs, rewards, done, truncated, info = env.step(action)
    render_rgb(env)
    if done:
        break

<br/>
<br/>
<br/>
<br/>

### More realistic hardware

Having seen that we are able to train an agent on a toy environment, let's take a look at a more realistic hardware topology.

In [None]:
connection_graph = nx.Graph()
connection_graph.add_edges_from([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)])
nx.draw(connection_graph)

### To Do
Use the first code block to create a new environment with the new connection graph, set a rewarder and train an agent.
Use the second code block to design an interaction graph of your choice.
Finally, use the third code block to see how well the agent performs on your interaction graph.

In [None]:
env = InitialMapping(connection_graph=connection_graph)
env.rewarder = EpisodeRewarder(illegal_action_penalty=-10)
check_env(env, warn=True)

model = PPO("MultiInputPolicy", env, verbose=1)

model.learn(int(1e5))
model.save("initial_mapping_2")

In [None]:
interaction_graph = generate_random_interaction_graph(connection_graph)
nx.draw(interaction_graph)

In [None]:
env = InitialMapping(connection_graph=connection_graph, render_mode="rgb_array")
model = PPO.load("initial_mapping_2")

obs, extra_info = env.reset(options={"interaction_graph": interaction_graph})
for _ in range(1000):
    action, states = model.predict(obs, deterministic=False)
    obs, rewards, done, truncated, info = env.step(action)
    render_rgb(env)
    if done:
        break

<br/>
<br/>
<br/>
<br/>

### Connection fidelity

Up to now, we have considered connection graphs without fidelity. However, we can also train agents to learn how to deal with fidelity.

Most digital quantum computers do not have the same fidelity on every edge, so we may want to take fidelity into account when computing the reward. The agent would then look for a mapping that not only requires a small number of SWAP gates but also favours high-fidelity edges.
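
To see why fidelity matters, consider two mappings that both place every interaction edge on a connection edge but use edges of different quality. The sketch below scores a mapping by summing the fidelities of the connection edges it uses. This scoring rule is an assumption chosen for illustration and is not necessarily how QGym weighs fidelity; `fidelity_score` is a hypothetical helper, and the fidelities describe an illustrative five-qubit graph made of two triangles sharing qubit 2.

```python
def fidelity_score(mapping_dict, interaction_edges, fidelities):
    """Sum the fidelities of the connection edges used by the mapped interaction edges."""
    return sum(
        fidelities.get(frozenset((mapping_dict[u], mapping_dict[v])), 0.0)
        for u, v in interaction_edges
    )


# Edge fidelities of the connection graph (illustrative values).
fidelities = {
    frozenset((0, 1)): 1.0,
    frozenset((1, 2)): 1.0,
    frozenset((2, 0)): 1.0,
    frozenset((2, 3)): 0.5,
    frozenset((3, 4)): 0.5,
    frozenset((4, 2)): 0.5,
}
triangle = [(0, 1), (1, 2), (2, 0)]  # interaction graph

print(fidelity_score({0: 0, 1: 1, 2: 2}, triangle, fidelities))  # 3.0
print(fidelity_score({0: 2, 1: 3, 2: 4}, triangle, fidelities))  # 1.5
```

Both mappings leave no interaction edge unmatched, yet the first uses only high-fidelity edges, which is the kind of distinction a fidelity-aware reward would need to capture.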

### To Do
Define a weighted connection graph in the code block below.

In [None]:
connection_graph = nx.Graph()
connection_graph.add_edge(0, 1, weight=1)
connection_graph.add_edge(1, 2, weight=1)
connection_graph.add_edge(2, 0, weight=1)
connection_graph.add_edge(2, 3, weight=0.5)
connection_graph.add_edge(3, 4, weight=0.5)
connection_graph.add_edge(4, 2, weight=0.5)

# display graph with edge weights
pos = nx.spring_layout(connection_graph, seed=0)
edge_labels = nx.get_edge_attributes(connection_graph, "weight")
nx.draw(connection_graph, pos, with_labels=True)
nx.draw_networkx_edge_labels(connection_graph, pos, edge_labels);

Time for training...

In [None]:
env = InitialMapping(connection_graph=connection_graph)
env.rewarder = EpisodeRewarder(illegal_action_penalty=-10)
check_env(env, warn=True)

model = PPO("MultiInputPolicy", env, verbose=1)

model.learn(int(1e5))
model.save("initial_mapping_3")

How does fidelity influence your training?

In [None]:
interaction_graph = generate_random_interaction_graph(connection_graph)
nx.draw(interaction_graph)

In [None]:
env = InitialMapping(connection_graph=connection_graph, render_mode="rgb_array")
model = PPO.load("initial_mapping_3")

obs, extra_info = env.reset(options={"interaction_graph": interaction_graph})
for _ in range(1000):
    action, states = model.predict(obs, deterministic=False)
    obs, rewards, done, truncated, info = env.step(action)
    render_rgb(env)
    if done:
        break