Using reinforcement learning to optimize decision-making strategies for quantum circuit design

In [None]:
import gym
from gym import spaces
import hashlib
import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit_aer import Aer
from qiskit.circuit.library import HGate, CXGate, SGate, TGate, XGate, YGate, ZGate, CRZGate, TdgGate, UnitaryGate
from qiskit.quantum_info import Operator
import matplotlib.pyplot as plt
import csv

# **Define Matrices and Operators**

This section defines key quantum operators, unitary transformations, and quantum circuits for Bell states, GHZ states, and textbook examples.
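
As a quick sanity check on Bell-state constructions of this kind, the NumPy-only sketch below applies each gate product to `|00>` and compares the result with the expected amplitudes. It assumes textbook ordering (the first qubit is the left factor of the Kronecker product). The `*_u` names and `ket00` are illustrative and not part of the notebook.

```python
import numpy as np

I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
CNOT = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])

ket00 = np.array([1, 0, 0, 0])

# Gate products (rightmost factor acts first)
phi_plus_u = CNOT @ np.kron(H, I2)             # H on qubit 1, then CNOT
phi_minus_u = np.kron(I2, Z) @ phi_plus_u      # extra Z on qubit 2
psi_plus_u = CNOT @ np.kron(H, X)              # X must act on the qubit H does not touch
psi_minus_u = np.kron(I2, Z) @ psi_plus_u      # equals |Psi-> up to a global phase

for name, u in [("phi+", phi_plus_u), ("phi-", phi_minus_u),
                ("psi+", psi_plus_u), ("psi-", psi_minus_u)]:
    print(name, np.round(u @ ket00, 3))
```

Applying X after H on the same qubit would not work here, because X leaves the `|+>` state unchanged.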

---

In [None]:
# Define basic Quantum gates

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

# Define matrices and operators
swap_matrix = np.array([
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1]
])

CNOT = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0]
])

# Bell state unitaries (each maps |00> to the named Bell state)
bell_state_unitary = Operator(CNOT) @ Operator(np.kron(H, np.eye(2)))  # |Phi+>
phi_minus = Operator(np.kron(np.eye(2), Z)) @ Operator(CNOT) @ Operator(np.kron(H, np.eye(2)))
# For the |Psi> states, X acts on the qubit that H does not touch:
# applying X after H on the same qubit leaves |+> unchanged and would give |Phi+> again.
psi_plus = Operator(CNOT) @ Operator(np.kron(H, X))
psi_minus = Operator(np.kron(np.eye(2), Z)) @ Operator(CNOT) @ Operator(np.kron(H, X))  # up to a global phase

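# CZ matrix
cz_matrix = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, -1]
])

# iSWAP matrix (referenced as the default target in QuantumEnv)
iswap_matrix = np.array([
    [1, 0, 0, 0],
    [0, 0, 1j, 0],
    [0, 1j, 0, 0],
    [0, 0, 0, 1]
])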

# GHZ Circuit (3 qubits)
ghz_circuit = QuantumCircuit(3)
ghz_circuit.h(0)
ghz_circuit.cx(0, 1)
ghz_circuit.cx(1, 2)
ghz_circuit = Operator(ghz_circuit)

# Textbook circuits
# page 200
text_circuit1 = QuantumCircuit(3)
text_circuit1.cx(0,1)
text_circuit1.cx(1,2)
text_circuit1.h(0)
text_circuit1.h(1)
text_circuit1.h(2)
text_circuit1 = Operator(text_circuit1)


You can define your own target circuit here and then use the subsequent code to test how well the Q-learning algorithm reconstructs it.

# **How to Create Your Own Quantum Circuit**

Quantum computing circuits are designed using `Qiskit`, a quantum computing framework in Python. Below is a step-by-step guide on how to create your own quantum circuit.

---

## **1. Import Necessary Libraries**
Before creating a quantum circuit, you need to import the required libraries.

```python
from qiskit import QuantumCircuit, transpile
from qiskit_aer import Aer
from qiskit.quantum_info import Operator
import numpy as np
```

---

## **2. Create a Quantum Circuit**
You can create a quantum circuit using `QuantumCircuit`. The number of qubits is specified as an argument.

```python
qc = QuantumCircuit(2)  # Create a quantum circuit with 2 qubits
```

---

## **3. Apply Quantum Gates**
Quantum gates manipulate qubits in different ways. Some commonly used quantum gates include:

- **Hadamard Gate (H)**: Creates a superposition state.
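- **CNOT Gate (CX)**: Flips the target qubit when the control qubit is `|1>`; combined with a Hadamard gate, it can entangle two qubits.
- **Pauli Gates (X, Y, Z)**: X is a bit flip, Z is a phase flip, and Y combines both (up to a phase).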

Example of applying gates:

```python
qc.h(0)        # Apply Hadamard gate to qubit 0
qc.cx(0, 1)    # Apply CNOT gate with qubit 0 as control and qubit 1 as target
```

---

## **4. Convert the Circuit to a Matrix**
To obtain the matrix representation of a circuit, use the `Operator` class:

```python
unitary_operator = Operator(qc)
print(unitary_operator.data)  # Print the corresponding unitary matrix
```
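
Note that Qiskit labels qubits in little-endian order (qubit 0 is the rightmost factor of a Kronecker product), so the matrix printed above generally differs from the textbook product `CNOT @ np.kron(H, I)` even though both describe the same circuit. The NumPy-only sketch below illustrates this under that assumption. `CNOT_BE`, `CX_LE`, and the other names are illustrative.

```python
import numpy as np

I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# Textbook (big-endian) CNOT: control is the first (left) qubit
CNOT_BE = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])
# Little-endian CNOT with control = qubit 0 and target = qubit 1
CX_LE = np.array([
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
])

textbook = CNOT_BE @ np.kron(H, I2)      # H on the left factor
little_endian = CX_LE @ np.kron(I2, H)   # H on qubit 0, the right factor

ket00 = np.array([1, 0, 0, 0])
same_matrix = np.allclose(textbook, little_endian)
same_state = np.allclose(textbook @ ket00, little_endian @ ket00)
print(same_matrix, same_state)
```

The two matrices differ, yet both send `|00>` to the same Bell state, so a hand-written target matrix should be built in the same ordering as the circuits it is compared with.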

---

## **5. Check If a Matrix is Unitary**
You can also define your own target as a NumPy array (`np.array`) instead of a circuit.
In that case, you must make sure that the matrix is unitary.


### **Function to Check If a Matrix is Unitary**
```python
def is_unitary(matrix):
    """Check if a given np.array is a unitary matrix."""
    identity = np.eye(matrix.shape[0])  # Create an identity matrix of the same size
    conjugate_transpose = np.conjugate(matrix).T  # Compute conjugate transpose
    return np.allclose(identity, conjugate_transpose @ matrix) and np.allclose(identity, matrix @ conjugate_transpose)

# Example matrices
matrix1 = np.array([
    [1, 0],
    [0, -1]
])  # Unitary matrix (Z gate)

matrix2 = np.array([
    [1, 1],
    [1, 1]
])  # Not a unitary matrix

print(is_unitary(matrix1))  # True
print(is_unitary(matrix2))  # False
```

---

## **6. Test If Quantum Circuit Matrices Are Unitary**
You can test whether predefined matrices (e.g., `swap_matrix`, `CNOT`, `cz_matrix`) are unitary:

```python
print(is_unitary(swap_matrix))  # Should return True
print(is_unitary(CNOT))  # Should return True
print(is_unitary(cz_matrix))  # Should return True
```


Hash function for Q-learning

# **Matrix Hashing and Unique ID Assignment**

This section assigns unique integer IDs to matrices by storing a hashable form of each matrix in a dictionary. Each distinct circuit unitary can then serve as a discrete state index for the Q-table, and a matrix that has been seen before is recognized without storing it again.

## **How It Works**
1. **Matrix Hashing:**  
   - The function `matrix_to_hash(matrix)` converts a given matrix into a hashable tuple format.  
   - This ensures that matrices can be used as dictionary keys.

2. **Unique ID Assignment:**  
   - The function `get_matrix_id(matrix)` checks if a given matrix has been previously encountered.  
   - If the matrix is new, it is assigned a unique ID and stored in `matrix_dict`.  
   - If the matrix already exists in the dictionary, its previously assigned ID is returned.
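
Because tuples of floating-point entries are compared exactly, two matrices that differ only by rounding error would receive different IDs. One possible mitigation, sketched below, is to round the entries before hashing. `matrix_to_rounded_hash` and its `decimals` parameter are hypothetical and not part of the notebook.

```python
import numpy as np

def matrix_to_rounded_hash(matrix, decimals=8):
    """Hypothetical variant: round entries so tiny numerical noise maps to the same key."""
    rounded = np.round(np.asarray(matrix, dtype=complex), decimals)
    return tuple(tuple(complex(v) for v in row) for row in rounded)

a = np.array([[1, 0], [0, 1]]) / np.sqrt(2)
b = a + 1e-12  # same matrix up to numerical noise

ids = {}
for m in (a, b):
    key = matrix_to_rounded_hash(m)
    if key not in ids:
        ids[key] = len(ids)

print(len(ids))  # number of distinct IDs assigned
```

Rounding does not merge matrices that differ by a global phase; whether those should share an ID is a separate design choice.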

In [None]:
# Dictionary to store unique matrix hashes and their corresponding IDs
matrix_dict = {}
counter = 0 

def matrix_to_hash(matrix):
    """
    Convert a matrix to a hashable tuple format.
    """
    matrix_array = np.asarray(matrix) 
    return tuple(tuple(row) for row in matrix_array)

def get_matrix_id(matrix):
    """
    Assign a unique ID to a matrix if it has not been encountered before.
    """
    global counter
    matrix_hash = matrix_to_hash(matrix)
    
    if matrix_hash not in matrix_dict:
        matrix_dict[matrix_hash] = counter
        counter += 1  
    
    return matrix_dict[matrix_hash]

# **1: Quantum Environment for Reinforcement Learning**

This class is an implementation of a quantum environment using `OpenAI Gym` and `Qiskit`. The environment is designed for reinforcement learning (RL) tasks, where the goal is to apply quantum operations to match a target unitary transformation.

---

## **Overview**
The environment simulates a **two-qubit quantum circuit**, where an agent applies quantum gates to reach a target unitary matrix. The circuit's state evolves as actions (quantum gates) are applied, and a reward function evaluates how close the resulting unitary matrix is to the target.
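
The closeness measure used by the reward is the normalized trace overlap `|Tr(U† V)| / 2^n` between the circuit unitary `U` and the target `V`. The NumPy-only sketch below shows how it behaves on a few simple cases; the function name `unitary_overlap` is illustrative.

```python
import numpy as np

def unitary_overlap(u, v):
    """Return |Tr(u^dagger v)| / dim, which is 1 when u and v agree up to a global phase."""
    u = np.asarray(u, dtype=complex)
    v = np.asarray(v, dtype=complex)
    return np.abs(np.trace(u.conj().T @ v)) / u.shape[0]

cz = np.diag([1, 1, 1, -1])
swap = np.array([
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
])

same = unitary_overlap(cz, cz)          # identical unitaries
phased = unitary_overlap(cz, 1j * cz)   # differ only by a global phase
different = unitary_overlap(cz, swap)   # distinct gates
print(same, phased, different)
```

Because of the absolute value, circuits that match the target only up to a global phase still count as a success.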

---

In [None]:
class QuantumEnv(gym.Env):
    def __init__(self):
        super(QuantumEnv, self).__init__()
        
        # Set the number of qubits
        self.num_qubits = 2
        # Initialize the quantum circuit
        self.circuit = QuantumCircuit(self.num_qubits)
        # Set the target unitary matrix (alternatives: bell_state_unitary, cz_matrix, swap_matrix)
        self.target_unitary = iswap_matrix  
        
        # Define the action space (6 possible actions)
        self.action_space = spaces.Discrete(6)
        # Define the observation space (100 possible state hashes)
        self.observation_space = spaces.Discrete(100)
        
        # Mapping of state indices
        self.state_to_index = {}
        self.index_to_state = []

    def _hash_circuit(self, circuit: QuantumCircuit) -> int:
        """
        Compute a hash value for the given quantum circuit.
        """
        matrix = Operator(circuit)  # Get the unitary matrix of the circuit
        return get_matrix_id(matrix) % 100  # Compute hash value within 100

    def get_state_index(self, state: QuantumCircuit) -> int:
        """
        Get the index of a state; if it is a new state, add it to the index mapping.
        """
        state_hash = self._hash_circuit(state)
        if state_hash not in self.state_to_index:
            index = len(self.state_to_index)
            self.state_to_index[state_hash] = index
            self.index_to_state.append(state)
        return self.state_to_index[state_hash]

    def get_state_from_index(self, index: int) -> QuantumCircuit:
        """
        Retrieve the quantum circuit state based on the index.
        """
        if 0 <= index < len(self.index_to_state):
            return self.index_to_state[index]
        return None

    def reset(self):
        """
        Reset the environment and return the initial state index.
        """
        self.circuit = QuantumCircuit(self.num_qubits)  # Reinitialize the circuit
        return self.get_state_index(self.circuit)

    def step(self, action, qubits):
        """
        Execute an action, update the environment state, and compute the reward.
        """
        self.circuit.append(action, qubits)  # Append the action to the circuit
        state_index = self.get_state_index(self.circuit)  # Get the new state index
        reward, done = self._reward(self.target_unitary)  # Compute the reward
        return state_index, reward, done

    def render(self):
        """
        Render the quantum circuit.
        """
        print(self.circuit.draw())

    def _reward(self, target_unitary):
        """
        Compute the fidelity between the circuit and the target unitary matrix and return the reward.
        """
        simulator = Aer.get_backend('unitary_simulator')  # Get the unitary simulator
        result = simulator.run(transpile(self.circuit, simulator)).result()
        unitary = result.get_unitary(self.circuit)  # Get the unitary matrix of the current circuit
        
        # Compute the normalized trace overlap |Tr(U^dagger V)| / 2^n between the circuit
        # unitary U and the target V (insensitive to global phase)
        unitary_array = np.asarray(unitary)
        target_unitary_array = np.asarray(target_unitary)
        fidelity = np.abs(np.trace(unitary_array.conj().T @ target_unitary_array)) / (2 ** self.num_qubits)
        
        reward = 0
        done = False
        if fidelity > 0.99:
            done = True  # Task completed
            reward += 100  # Assign high reward
            self.render()  # Display the final circuit
        return reward, done

    def close(self):
        """
        Close the environment.
        """
        pass


# **2: Q-learning Agent for Quantum Reinforcement Learning**

This class is a **Q-learning agent** designed for reinforcement learning in a quantum environment. The agent learns how to construct quantum circuits by selecting quantum gates to maximize a reward function.
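
The agent's learning step is the standard tabular Q-learning update, `Q(s, a) += alpha * (r + gamma * max Q(s', ·) - Q(s, a))`. The toy example below works through two updates by hand. The table size, states, and rewards are made up for illustration, and `q_update` is an illustrative helper rather than the notebook's method.

```python
import numpy as np

alpha, gamma = 0.1, 0.95
q_table = np.zeros((3, 2))  # toy table: 3 states, 2 actions

def q_update(q, s, a, r, s_next):
    """Apply one tabular Q-learning update in place."""
    td_target = r + gamma * np.max(q[s_next])
    q[s, a] += alpha * (td_target - q[s, a])

# Step that reaches the goal: reward 100, next state has no value yet
q_update(q_table, 1, 0, 100, 2)   # Q[1, 0] = 0.1 * 100
# Earlier step with zero reward: value is bootstrapped from Q[1, 0]
q_update(q_table, 0, 1, 0, 1)     # Q[0, 1] = 0.1 * 0.95 * Q[1, 0]

print(q_table)
```

The second update shows how reward propagates backwards: a zero-reward step still gains value because it leads to a state with a valuable action.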

---

In [None]:
# Define the Q-learning agent
class QLearningAgent:
    def __init__(self, state_size, action_size, alpha, gamma, epsilon, decay_rate, epsilon_min):
        """
        Initialize the Q-learning agent with given parameters.
        """
        self.state_size = state_size
        self.action_size = action_size
        self.alpha = alpha  # Learning rate
        self.gamma = gamma  # Discount factor
        self.epsilon = epsilon  # Exploration rate
        self.decay_rate = decay_rate  # Decay rate for epsilon
        self.epsilon_min = epsilon_min  # Minimum value of epsilon
        self.q_table = np.zeros((state_size, action_size))  # Initialize Q-table with zeros
    
    def choose_action(self, state_index):
        """
        Select an action using epsilon-greedy strategy.
        """
        if np.random.rand() < self.epsilon:
            action = np.random.randint(self.action_size)  # Random action (exploration)
        else:
            action = np.argmax(self.q_table[state_index])  # Best action (exploitation)
        
        possible_actions = [
            [HGate(), [0]],
            [HGate(), [1]],
            [CXGate(), [0, 1]],
            [CXGate(), [1, 0]],
            [TGate(), [0]],
            [TGate(), [1]],
        ]
        
        return possible_actions[action], action

    def choose_actionNoE(self, state_index):
        """
        Select the best action based on the current Q-table without exploration.
        """
        action = np.argmax(self.q_table[state_index])
        
        possible_actions = [
            [HGate(), [0]],
            [HGate(), [1]],
            [CXGate(), [0, 1]],
            [CXGate(), [1, 0]],
            [TGate(), [0]],
            [TGate(), [1]],
        ]
        
        return possible_actions[action], action
    
    def update_q_table(self, state_index, action, reward, next_state_index):
        """
        Update the Q-table using the Q-learning formula.
        """
        self.q_table[state_index, action] += self.alpha * (
            reward + self.gamma * np.max(self.q_table[next_state_index]) - self.q_table[state_index, action]
        )
    
    def decay_exploration(self):
        """
        Reduce epsilon value over time to shift from exploration to exploitation.
        """
        self.epsilon = max(self.epsilon_min, self.epsilon * self.decay_rate)


# **3: Train the Agent**

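The `train_agent` function runs the Q-learning loop. In each episode it resets the environment, lets the agent pick gates with an epsilon-greedy policy, and updates the Q-table after every step. An episode ends when the target unitary is reached, when the circuit grows beyond four gates, or when the step limit is hit. Epsilon is decayed once per episode.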

---

In [None]:
# Train the agent
def train_agent(agent, environment, episodes, max_steps_per_episode):
    for episode in range(episodes):
        # Reset the environment at the beginning of each episode
        state_index = environment.reset()
        episode_reward = 0  # Initialize the total reward for this episode
        
        for step in range(max_steps_per_episode):
            # Choose an action using the agent's policy
            action, action_index = agent.choose_action(state_index)
            
            # Execute the chosen action and observe the outcome
            next_state_index, reward, done = environment.step(action[0], action[1])
            episode_reward += reward  # Accumulate the reward
            
            # Update the Q-table based on the agent's learning algorithm
            agent.update_q_table(state_index, action_index, reward, next_state_index)
            
            # Update the current state
            state_index = next_state_index
            
            # Check if the episode has reached a terminal state
            if done:
                print("Generated circuit:")
                environment.render()  # Render the environment to visualize the result
                print(f"Episode {episode + 1}: Total Reward = {episode_reward}")
                break
            
            # Stop the episode once the circuit exceeds four gates. Note that this penalty
            # only lowers the logged episode total; it is not passed to update_q_table.
            if environment.circuit.size() > 4:
                episode_reward -= 100
                break
        
        # Print results every 100 episodes
        if (episode + 1) % 100 == 0:
            print(f"Episode {episode + 1}: Total Reward = {episode_reward}")
        
        # Decay the exploration rate to encourage exploitation over time
        agent.decay_exploration()

# **4: Test the Trained Agent Without Exploration**

The `test_agent` function evaluates a trained reinforcement learning (RL) agent by running it in the environment **without exploration** (i.e., the agent strictly follows the learned policy).
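
One consequence of a purely greedy policy is worth keeping in mind: `np.argmax` returns the first index when several entries tie, so a state whose Q-values are all still zero always selects action 0. The toy table below illustrates this; its size and values are made up.

```python
import numpy as np

q_table = np.zeros((3, 6))   # toy table: 3 states, 6 actions
q_table[0, 2] = 10.0         # only state 0 has a learned preference

# Greedy action per state, as a policy without exploration would pick it
greedy_actions = [int(np.argmax(row)) for row in q_table]
print(greedy_actions)
```

States that were never visited during training therefore fall back to the first entry of the action list rather than to a random gate.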

---

In [None]:
# Test the trained agent without exploration
def test_agent(agent, environment, episodes, max_steps_per_episode):
    global holder  # Success counter defined at module level

    for episode in range(episodes):
        # Reset the environment and get the initial state index
        state_index = environment.reset()

        for step in range(max_steps_per_episode):
            # Choose an action based purely on learned policy (no exploration)
            action, action_index = agent.choose_actionNoE(state_index)
            
            # Execute the chosen action and observe the outcome
            next_state_index, reward, done = environment.step(action[0], action[1])
            
            # Update the current state
            state_index = next_state_index
            
            # Check if the episode has reached a terminal state
            if done:
                holder += 1  # Increment success counter
                break
        
        # Render the environment to visualize the test result
        environment.render()



# **5: Main Function**

The `__main__` block initializes and trains a reinforcement learning agent multiple times, followed by testing its performance. It also calculates and prints the average success rate.
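
With the hyperparameters used in this block (`epsilon=1`, `decay_rate=0.99`, `epsilon_min=0.05`) and 100 training episodes, it can help to check how far exploration has actually decayed by the end of training. The sketch below replays the same decay rule in plain Python; the helper names are illustrative.

```python
epsilon, decay_rate, epsilon_min = 1.0, 0.99, 0.05

def epsilon_after(episodes):
    """Epsilon after the given number of per-episode decay steps."""
    eps = epsilon
    for _ in range(episodes):
        eps = max(epsilon_min, eps * decay_rate)
    return eps

def episodes_to_floor():
    """Number of decay steps until epsilon reaches epsilon_min."""
    eps, n = epsilon, 0
    while eps > epsilon_min:
        eps = max(epsilon_min, eps * decay_rate)
        n += 1
    return n

print(round(epsilon_after(100), 3), episodes_to_floor())
```

After 100 episodes epsilon is still roughly 0.37, and it takes about 300 episodes to reach the floor, so more than a third of the actions late in training are still random.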

---


In [None]:
holder = 0  # Module-level success counter, incremented by test_agent

# Main function
if __name__ == "__main__":
    # Run multiple training and testing iterations
    for i in range(20):
        # Initialize the environment and agent for each iteration
        environment = QuantumEnv()
        agent = QLearningAgent(state_size=100, action_size=6, alpha=0.1, gamma=0.95, epsilon=1, decay_rate=0.99, epsilon_min=0.05)
        
        # Train the agent
        train_agent(agent, environment, episodes=100, max_steps_per_episode=5)
        
        # Test the trained agent
        print("Test Result")
        test_agent(agent, environment, episodes=1, max_steps_per_episode=5)
    
    # Print the average success rate over 20 iterations
    print(holder / 20)