# Cryogenic Classical Control Unit Analysis and Optimization for Quantum Computation

#### Figure 1

#### **ABSTRACT**

#### 1. FRAMEWORK OVERVIEW

Of the various hardware implementations proposed for quantum computation, superconducting qubits have gained considerable traction, as these systems benefit from solid-state material implementations and can leverage existing research and tools for planar electronic fabrication.

To perform quantum computation with superconducting quantum circuitry, the entire system must be cooled far below the typical critical temperature of superconducting metals, so as to reduce interference from the external environment as much as possible. Dilution refrigerators are the current technology of choice for this cooling, which presents a novel set of design constraints on integrating electronic control systems into such a system.

Typical dilution refrigerators cool materials in stages, proceeding from room temperature through several intermediate stages to a small zone that reaches the desired temperature. The cooling path typically passes through regions cooled to approximately 100 K and 4 K, ending in a final region at the 20 mK target temperature. This arrangement is shown schematically in Figure 1.

In such a system, wiring control signals to the supercooled quantum processor presents a series of problems. Namely, wiring the quantum processor to room-temperature control electronics subjects the system to large thermal loads, makes isolation from interference difficult, incurs large latency penalties, and prevents scalable compactness. To address these issues, classical control units can be located at one or more temperature levels inside the dilution refrigerator. Moving to an architecture with internal classical control hardware introduces a series of tradeoffs between the thermal cost of computation and the hardware constraints governing memory construction.

The framework is organized around three sets of variables:

- Dilution Refrigerator Constraints,
- Hardware Characteristics, and
- Benchmark Code Generation and Manipulation.

## **Dilution Refrigerator Constraints**

The dilution refrigerator under consideration is a multi-stage cooling apparatus that uses liquid nitrogen and liquid helium. Two characteristics are of interest here: the cooling capacity of each temperature stage and the latency of microwave links crossing these thermal boundaries. Using the specifications of a typical refrigerator (specifically the Oxford Instruments TritonXL), the temperature levels and cooling capacities are shown in Table 1.

| Temperature Stage | Cooling Capacity    |
|-------------------|---------------------|
| 20 mK             | $25\ \mu\mathrm{W}$ |
| 100 mK            | 1 mW                |
| 4 K               | 1.35 to 2 W         |

Table 1: Temperature Levels and Cooling Capacities

These figures are approximate and describe one particular dilution refrigerator, so they are subject to change. They differ from the estimates raised in discussions, which put the cooling capacity at the 20 mK level approximately two orders of magnitude below the cooling power available at the 4 K level. Further clarification will be sought to establish correct figures.
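To make the discrepancy concrete, the short check below computes the gap implied by the Table 1 figures alone. It is only an arithmetic sketch: the constant and function names are illustrative, and the inputs are the approximate datasheet-style values quoted above.

```python
import math

# Approximate figures from Table 1 (one specific refrigerator).
CAPACITY_20MK_W = 25e-6          # 25 microwatts at the 20 mK stage
CAPACITY_4K_RANGE_W = (1.35, 2.0)  # quoted range at the 4 K stage


def orders_of_magnitude(high_w: float, low_w: float) -> float:
    """Return log10 of the ratio between two cooling capacities."""
    return math.log10(high_w / low_w)


gap_low = orders_of_magnitude(CAPACITY_4K_RANGE_W[0], CAPACITY_20MK_W)
gap_high = orders_of_magnitude(CAPACITY_4K_RANGE_W[1], CAPACITY_20MK_W)
print(f"4 K vs 20 mK gap: {gap_low:.1f} to {gap_high:.1f} orders of magnitude")
```

With the Table 1 values, the gap comes out between four and five orders of magnitude rather than two, which is why the figures need to be reconciled before they are fixed in the model.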

Additionally, passing microwave links through temperature interfaces introduces a range of latencies. These appear to vary between approximately 1 and 10 ns, growing longer as the gap widens between the target temperatures of the stages being crossed.

## **Hardware Characteristics**

There are three primary types of hardware under consideration:

- RSFQ: Rapid Single Flux Quantum
- RQL: Reciprocal Quantum Logic
- CryoCMOS: Cryogenic CMOS

Each of these is characterized by a different energy cost per logic gate operation. Representative values are listed in Table 2.

| Hardware Type | Energy Required Per Gate (Joules) |
|---------------|-----------------------------------|
| RSFQ          | $10^{-19}$                        |
| RQL           | $10^{-19}$                        |
| CryoCMOS      | $10^{-15}$                        |

Table 2: Hardware Types and Energy Consumed Per Gate

These values can be used directly to convert between computation (gate operations) and energy, and will be inserted into the model.
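As a rough illustration of that conversion, the sketch below divides a stage's cooling capacity by the energy per gate to get an upper bound on sustained gate operations per second. It assumes the entire cooling capacity is available for logic switching and ignores static or bias power, memory, and wiring heat loads; the dictionary and function names are hypothetical, and the lower bound of the 4 K range is used.

```python
# Cooling capacity per stage in watts (Table 1; lower bound used for 4 K).
COOLING_CAPACITY_W = {"20 mK": 25e-6, "100 mK": 1e-3, "4 K": 1.35}

# Energy per gate operation in joules (Table 2).
ENERGY_PER_GATE_J = {"RSFQ": 1e-19, "RQL": 1e-19, "CryoCMOS": 1e-15}


def energy_for_gates(gate_ops: float, hardware: str) -> float:
    """Energy in joules dissipated by `gate_ops` gate operations."""
    return gate_ops * ENERGY_PER_GATE_J[hardware]


def max_gate_rate(stage: str, hardware: str) -> float:
    """Upper bound on gate operations per second at a stage.

    Assumes all cooling capacity is spent on gate switching, which is an
    idealization rather than a measured figure.
    """
    return COOLING_CAPACITY_W[stage] / ENERGY_PER_GATE_J[hardware]


for stage in COOLING_CAPACITY_W:
    for hardware in ENERGY_PER_GATE_J:
        rate = max_gate_rate(stage, hardware)
        print(f"{hardware:8s} at {stage:6s}: {rate:.2e} gate ops/s")
```

Under these idealized assumptions, RSFQ or RQL logic at the 20 mK stage is bounded at roughly 2.5e14 gate operations per second, while CryoCMOS at the 4 K stage is bounded at roughly 1.35e15.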

It is also important to consider the power consumption of memory systems under these conditions. Here, we will analyze the power usage of cryogenic CMOS memory devices and consider implementations of cryogenic persistent memory systems. Persistent memory could be useful in applications that rely upon a precomputed library of rotation decompositions, since the memory may need to be read-only for long periods of computation. The caveat is that different applications often require different levels of precision in rotation decompositions, so these databases would potentially need to be replaced between applications.

## **Program Code Generation and Manipulation**

The last set of variables under consideration relates to the generation of instruction code for the quantum benchmarks. These fall into two categories: the relationship between the compilation flattening threshold and code caching behavior, and methods of compressing and decompressing modules.

#### 2. ROADMAP

The analysis will progress in stages. First, an abstract model of a classical control unit as a cache is built, along with various methods for communicating program data to and from the cache. In this stage, program modules are extracted from full logical schedules compiled with the ScaffCC framework [?], and various cache compression techniques are explored for reducing the minimum required memory size of the control unit. Initial analysis will compare the benefit of different module compression techniques against the computation required to perform the compression and decompression, which ultimately affects overall quantum program runtime.
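A minimal sketch of such a cache model is given below, under several assumptions that are not part of the framework itself: modules are opaque byte strings, `zlib` stands in for whichever compression technique is eventually chosen, eviction is least-recently-used, and the class name `ModuleCache` and the toy module contents are hypothetical. Compression is performed inside `fetch` only for brevity; in the modeled system it would happen on the host side before transmission.

```python
import zlib
from collections import OrderedDict
from typing import Callable


class ModuleCache:
    """Hypothetical LRU cache that stores program modules in compressed form."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.decompressions = 0

    def fetch(self, name: str, load_from_host: Callable[[str], bytes]) -> bytes:
        """Return the decompressed module, loading and caching it on a miss."""
        if name in self.entries:
            self.hits += 1
            self.entries.move_to_end(name)  # mark as most recently used
        else:
            self.misses += 1
            blob = zlib.compress(load_from_host(name))  # stands in for host-side work
            if len(blob) > self.capacity_bytes:
                raise ValueError(f"module {name!r} does not fit in the cache")
            # Evict least recently used modules until the new one fits.
            while self.used_bytes + len(blob) > self.capacity_bytes:
                _, evicted = self.entries.popitem(last=False)
                self.used_bytes -= len(evicted)
            self.entries[name] = blob
            self.used_bytes += len(blob)
        # Every use pays a decompression, which is the cost to weigh
        # against the memory saved by storing compressed modules.
        self.decompressions += 1
        return zlib.decompress(self.entries[name])


# Toy stand-ins for compiled modules; real modules would come from the compiler.
modules = {
    "qft": b"H q0\nCNOT q0 q1\n" * 200,
    "adder": b"CNOT q1 q2\nToffoli q0 q1 q2\n" * 200,
}
cache = ModuleCache(capacity_bytes=4096)
for name in ["qft", "adder", "qft"]:
    cache.fetch(name, modules.__getitem__)
print(cache.hits, cache.misses, cache.decompressions, cache.used_bytes)
```

The counters expose the quantities the initial analysis compares: `used_bytes` reflects the memory saved by compression, while `decompressions` counts the extra computation incurred each time a module is used.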

The next stage is the construction of an energy budget cost model, which incorporates cooling capacity constraints of the different temperature interfaces of a dilution refrigerator with the effective energy usage per operation of different hardware materials. The cache model of classical control is expanded by introducing these new costs as constraints on computation, which affects the ability to perform module decompression and maintain the target temperature for a specified region. Techniques for optimizing module compression and decompression will be explored. Additionally, parameters describing latency caused by microwave links traversing thermal boundaries will be introduced as communication costs, which will be incorporated into the cost model.
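One possible shape for such a cost model is sketched below. It treats the cooling capacity of a stage as a hard power cap, so the minimum time to decompress a module is its energy divided by that cap, and it adds a fixed link latency as the communication cost. The `Stage` type, the function names, the gate count of the decompression routine, and the choice of 10 ns from the quoted latency range are all illustrative assumptions; clock rate limits and static power are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical description of one temperature stage."""
    name: str
    cooling_capacity_w: float  # sustained heat the stage can remove
    link_latency_s: float      # latency of the microwave link into the stage


def decompression_time_s(gate_ops: float, energy_per_gate_j: float, stage: Stage) -> float:
    """Shortest time to run `gate_ops` without exceeding the cooling capacity."""
    return gate_ops * energy_per_gate_j / stage.cooling_capacity_w


def module_fetch_cost_s(gate_ops: float, energy_per_gate_j: float, stage: Stage) -> float:
    """Communication latency plus thermally limited decompression time."""
    return stage.link_latency_s + decompression_time_s(gate_ops, energy_per_gate_j, stage)


# Example: CryoCMOS (1e-15 J per gate) at the 4 K stage, with an assumed
# decompression routine of 1e9 gate operations and a 10 ns link latency.
stage_4k = Stage(name="4 K", cooling_capacity_w=1.35, link_latency_s=10e-9)
cost = module_fetch_cost_s(gate_ops=1e9, energy_per_gate_j=1e-15, stage=stage_4k)
print(f"Estimated fetch cost at {stage_4k.name}: {cost:.3e} s")
```

In this simplified form the thermal term dominates whenever the decompression routine is large relative to the cooling capacity, and the latency term dominates for small modules; the full model would need to account for both regimes.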

The following stage will analyze the tradeoffs of different flattening thresholds for code modularization during compilation of quantum programs. Conceptually, flattening quantum benchmarks with a high gate-count threshold allows the compiler to optimize more effectively and reduce overall runtime, as more context is available during optimization. However, this increases the size of each module, which increases the complexity and overhead of the caching mechanism. The interaction between these behaviors will be analyzed, and new techniques for code modularization and for module compression and decompression will be explored to optimize quantum program execution.

The ultimate goal of this analysis is to extract and quantify specific tradeoffs among system design choices involving control unit hardware, temperature-level placement of control units, and control unit memory sizes. A further aim is to develop novel techniques for optimizing the transmission of large quantum programs through a control unit system, including module compression and decompression methods that reduce both control unit memory size and communication bandwidth requirements.

#### 3. STAGE ONE: CACHE IMPLEMENTATION AND INITIAL RESULTS

#### 4. REFERENCES