# Mapping qubits
In this notebook we will cover the QGym `InitialMapping` environment.

This environment addresses the problem of mapping the virtual qubits of a quantum program onto physical qubits whose connectivity is fixed by the hardware topology.

In [None]:
%matplotlib inline
import numpy as np
import networkx as nx
from networkx.generators import gnp_random_graph
import matplotlib.pyplot as plt
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env
from IPython.display import clear_output

from qgym.envs.initial_mapping import InitialMapping
from qgym.envs.initial_mapping.initial_mapping_rewarders import *

In [None]:
def render_rgb(step, rgb_array):
    """
    Convenience method that we will use later on to display our results.
    """
    clear_output(wait=True)
    plt.figure(figsize=(40, 20))
    plt.title(f"Step {step}", fontsize=40)
    plt.imshow(rgb_array)
    plt.axis("off")
    plt.show()

### Connection and interaction graph

The initial mapping problem centers on two graphs:

- connection graph: hardware layout describing the connections between physical qubits
- interaction graph: software layout describing which virtual qubits interact in the particular quantum program

The goal of the initial mapping problem is to find an optimal one-to-one mapping between the virtual qubits of the interaction graph and the physical qubits of the connection graph.

For now, we consider a mapping optimal if it minimizes the number of edges of the mapped interaction graph that do not coincide with edges of the connection graph.
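The optimality criterion above can be made concrete with a small sketch. The helper `count_unmatched_edges` and the example edge lists are hypothetical names introduced for illustration and are not part of QGym; graphs are written as plain edge lists so that the snippet runs without extra dependencies.

```python
def count_unmatched_edges(interaction_edges, connection_edges, mapping):
    """Count interaction edges that do not land on a connection edge.

    `mapping[v]` is the physical qubit assigned to virtual qubit `v`.
    (Hypothetical helper for illustration; not part of QGym.)
    """
    # Store connection edges as unordered pairs, since the graphs are undirected.
    connected = {frozenset(edge) for edge in connection_edges}
    return sum(
        frozenset((mapping[u], mapping[v])) not in connected
        for u, v in interaction_edges
    )


# Star-shaped hardware: physical qubit 0 is connected to qubits 1, 2 and 3.
connection_edges = [(0, 1), (0, 2), (0, 3)]
# Program in which virtual qubit 1 interacts with virtual qubits 0 and 2.
interaction_edges = [(0, 1), (1, 2)]

identity = {0: 0, 1: 1, 2: 2, 3: 3}
swapped = {0: 1, 1: 0, 2: 2, 3: 3}  # place the busy virtual qubit on the hub

print(count_unmatched_edges(interaction_edges, connection_edges, identity))  # 1
print(count_unmatched_edges(interaction_edges, connection_edges, swapped))   # 0
```

Under this criterion the `swapped` assignment is optimal for the example, because every interaction edge coincides with a hardware connection.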

<br/>
<br/>
<br/>
<br/>


### Toy hardware

To explain this concept in more detail, we start by defining a toy connection graph and then look at some potential interaction graphs.

In [None]:
connection_graph = nx.Graph()
connection_graph.add_edge(0, 1)
connection_graph.add_edge(0, 2)
connection_graph.add_edge(0, 3)
nx.draw(connection_graph, with_labels=True)

Now let's take a look at some random interaction graphs and think about how they can best be mapped onto the connection graph.

_We can simply generate random graphs using [`gnp_random_graph`](https://networkx.org/documentation/stable/reference/generated/networkx.generators.random_graphs.gnp_random_graph.html)._

In [None]:
def generate_random_interaction_graph(connection_graph):
    p = np.random.rand()  # edge probability
    n = connection_graph.number_of_nodes()
    return gnp_random_graph(n, p)
    
interaction_graph = generate_random_interaction_graph(connection_graph)
nx.draw(interaction_graph, with_labels=True)
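For intuition, `gnp_random_graph(n, p)` samples from the G(n, p) model, in which each of the n(n-1)/2 possible edges is included independently with probability p. A minimal pure-Python sketch of that sampling scheme could look as follows; `sample_gnp_edges` is a hypothetical illustration and not the NetworkX implementation.

```python
import itertools
import random


def sample_gnp_edges(n, p, seed=None):
    """Return the edge list of one G(n, p) sample on nodes 0..n-1.

    (Hypothetical helper for illustration; not the NetworkX implementation.)
    """
    rng = random.Random(seed)  # a seed makes the sample reproducible
    # Visit every unordered node pair once and keep it with probability p.
    return [
        pair
        for pair in itertools.combinations(range(n), 2)
        if rng.random() < p
    ]


print(sample_gnp_edges(4, 0.0))       # no edges at all
print(len(sample_gnp_edges(4, 1.0)))  # all 6 possible edges on 4 nodes
```

Because the helper in the cell above also draws `p` at random, the generated interaction graphs range from nearly empty to nearly complete.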

### Human Intelligence

todo: describe environment specifics

Since this environment is still quite straightforward, we should be able to solve this case optimally by hand.

In [None]:
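# The first argument (0.5) is the edge probability the environment uses when it
# generates random interaction graphs itself.
env = InitialMapping(0.5, connection_graph=connection_graph)
env.rewarder = EpisodeRewarder(illegal_action_penalty=0)
# Start an episode with our own interaction graph instead of a random one.
obs = env.reset(interaction_graph=interaction_graph)
print(obs)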

In [None]:
obs, rewards, done, info = env.step((0,0))
render_rgb(1, env.render(mode="rgb_array"))
print(obs)

In [None]:
obs, rewards, done, info = env.step((1,1))
render_rgb(2, env.render(mode="rgb_array"))
print(obs)

In [None]:
obs, rewards, done, info = env.step((2,2))
render_rgb(3, env.render(mode="rgb_array"))
print(obs)

In [None]:
obs, rewards, done, info = env.step((3,3))
render_rgb(4, env.render(mode="rgb_array"))
print(obs)
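With only four qubits there are 4! = 24 one-to-one mappings, so a hand-made mapping can be checked against an exhaustive search. The sketch below is one way to do that; `count_unmatched_edges` and `best_mapping` are hypothetical helpers (not part of QGym), and the path-shaped interaction graph is an assumed example rather than the random graph drawn above.

```python
import itertools


def count_unmatched_edges(interaction_edges, connection_edges, mapping):
    """Count interaction edges that do not land on a connection edge."""
    connected = {frozenset(edge) for edge in connection_edges}
    return sum(
        frozenset((mapping[u], mapping[v])) not in connected
        for u, v in interaction_edges
    )


def best_mapping(interaction_edges, connection_edges, n_qubits):
    """Try every one-to-one mapping and return (lowest cost, mapping)."""
    best_cost, best = None, None
    # perm[v] is the physical qubit assigned to virtual qubit v.
    for perm in itertools.permutations(range(n_qubits)):
        cost = count_unmatched_edges(interaction_edges, connection_edges, perm)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, perm
    return best_cost, best


connection_edges = [(0, 1), (0, 2), (0, 3)]   # the toy star topology
interaction_edges = [(0, 1), (1, 2), (2, 3)]  # assumed example: a path

cost, mapping = best_mapping(interaction_edges, connection_edges, 4)
print(cost, mapping)
```

For graphs built with NetworkX whose nodes are labelled `0..n-1`, the edge lists can be obtained with `list(connection_graph.edges)` and `list(interaction_graph.edges)`. The search grows factorially with the number of qubits, so it is only practical for small instances such as this one.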

### Reinforcement learning

Let's check if our reinforcement learning agent is capable of solving this problem.

In [None]:
env = InitialMapping(0.5, connection_graph=connection_graph)
env.rewarder = EpisodeRewarder(illegal_action_penalty=0)
check_env(env, warn=True)

model = PPO("MultiInputPolicy", env, verbose=1)

model.learn(int(1e5))

In [None]:
obs = env.reset(interaction_graph=connection_graph)
for i in range(1000):
    action, states = model.predict(obs, deterministic=False)
    obs, rewards, done, info = env.step(action)
    render_rgb(i, env.render(mode="rgb_array"))
    if done:
        break
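The rollout loop above is repeated for each interaction graph that follows, so it could be wrapped in a small helper. The sketch below assumes only what the cells in this notebook already use: `env.reset(interaction_graph=...)`, a four-value return from `env.step`, and `model.predict`. The name `run_episode` and its `on_step` callback are hypothetical.

```python
def run_episode(env, model, interaction_graph, max_steps=1000, on_step=None):
    """Roll out one episode and return (number of steps, summed reward).

    (Hypothetical helper for illustration; relies only on the reset, step and
    predict calls used elsewhere in this notebook.)
    """
    obs = env.reset(interaction_graph=interaction_graph)
    steps, total_reward = 0, 0.0
    for step in range(max_steps):
        action, _states = model.predict(obs, deterministic=False)
        obs, reward, done, _info = env.step(action)
        steps += 1
        total_reward += reward
        if on_step is not None:
            on_step(step, env)  # e.g. render the current state
        if done:
            break
    return steps, total_reward


# Possible usage with the objects defined in this notebook:
# run_episode(env, model, interaction_graph,
#             on_step=lambda i, e: render_rgb(i, e.render(mode="rgb_array")))
```

Returning the step count and summed reward makes it easier to compare runs on different interaction graphs than inspecting the rendered frames alone.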

Let's try another interaction graph.

In [None]:
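interaction_graph = connection_graph.copy()
# The toy connection graph only has the edges (0, 1), (0, 2) and (0, 3), so we
# remove one of those; removing a non-existent edge raises a NetworkXError.
interaction_graph.remove_edge(0, 3)
nx.draw(interaction_graph, with_labels=True)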

In [None]:
obs = env.reset(interaction_graph=interaction_graph)
for i in range(1000):
    action, states = model.predict(obs, deterministic=False)
    obs, rewards, done, info = env.step(action)
    render_rgb(i, env.render(mode="rgb_array"))
    if done:
        break

Just to be sure, one more...

In [None]:
interaction_graph = connection_graph.copy()
interaction_graph.add_edge(3, 2)
nx.draw(interaction_graph)

In [None]:
obs = env.reset(interaction_graph=interaction_graph)
for i in range(1000):
    action, states = model.predict(obs, deterministic=False)
    obs, rewards, done, info = env.step(action)
    render_rgb(i, env.render(mode="rgb_array"))
    if done:
        break

<br/>
<br/>
<br/>
<br/>

### More realistic hardware

Having seen that we are able to train an agent on a toy environment, let's take a look at a more realistic hardware topology.

In [None]:
connection_graph = nx.Graph()
connection_graph.add_edge(0, 1)
connection_graph.add_edge(1, 2)
connection_graph.add_edge(2, 0)
connection_graph.add_edge(2, 3)
connection_graph.add_edge(3, 4)
connection_graph.add_edge(4, 2)
print(connection_graph.number_of_nodes())  # number of physical qubits
nx.draw(connection_graph, with_labels=True)

In [None]:
env = InitialMapping(0.5, connection_graph=connection_graph)
env.rewarder = EpisodeRewarder(illegal_action_penalty=-10)
check_env(env, warn=True)

model = PPO("MultiInputPolicy", env, verbose=1)

model.learn(int(1e6))
model.save("saved_model")

In [None]:
# fast_gnp_random_graph was never imported by name, so access it through nx.
interaction_graph = nx.fast_gnp_random_graph(5, 0.5)
nx.draw(interaction_graph, with_labels=True)

In [None]:
env = InitialMapping(0.5, connection_graph=connection_graph)
model = PPO.load("saved_model")
obs = env.reset(interaction_graph=interaction_graph)
for i in range(1000):
    action, states = model.predict(obs, deterministic=False)
    obs, rewards, done, info = env.step(action)
    render_rgb(i, env.render(mode="rgb_array"))
    if done:
        break

In [None]:
# try the same with the other two rewarders that we have defined

In [None]:
# QAP rewarder