# Stage 3

This notebook has three main sections. The first creates a meta-dataframe, which is useful for general visualisation of the data. The second extracts a relative noise profile from the raw counts. The third expands every run, extracts its noise profile, and labels the result in a single dataframe.

1. [The Meta-Dataframe](#1-the-meta-dataframe)
2. [The Relative Noise Profile](#2-the-relative-noise-profile)
3. [A Fully Labeled Dataframe](#a-fully-labeled-dataframe)

### Imports

In [2]:
import sys
sys.path.append('../')

from investigation_functions import meta_dataframe_functions as mdf
from investigation_functions import data_process_funcs as dpf
import config
import backend_vars

import pandas as pd

The relative directory of the experiment-type folders must be set correctly:

In [3]:
dir = "../../"

## 1. The meta-dataframe

Together, `blank_meta_df()` and `load_meta_df()` produce a pandas dataframe from the csv files that contain the raw counts. The dataframe contains the following labels:
- nr_qubits : the number of qubits of the circuit (4, 8, 16)
- backend : the backend that the circuit was run on (brisbane, fez, marrakesh, torino)
- sim : whether the backend was simulated (True) or not (False)
- circuit_type : the type of circuit that was run (1, 2, 3)
- file_path : the filepath of the csv file containing the raw counts

The experiment type ('Hardware', 'Simulation', or 'Refreshed_Simulation') must be specified to load the results from the corresponding folder. This meta-dataframe is useful for sorting the data before loading it all, since loading the data can be a lengthy process.

In [None]:
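# Create one empty meta-dataframe per experiment type.
df_Refr_Sim = mdf.blank_meta_df()
df_Sim = mdf.blank_meta_df()
df_Hardware = mdf.blank_meta_df()

# Populate each meta-dataframe from the corresponding results folder.
mdf.load_meta_df(df_Refr_Sim, 'Refreshed_Simulation', dir)
mdf.load_meta_df(df_Sim, 'Simulation', dir)
mdf.load_meta_df(df_Hardware, 'Hardware', dir)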

df_Refr_Sim.tail()
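
To illustrate why sorting the meta-dataframe before loading is useful, here is a minimal sketch of selecting a subset of files by label. It uses a toy dataframe with the column names listed above; the file paths (`a.csv` and so on) are placeholders and not files from this project, and nothing here relies on the `mdf` helpers.

```python
import pandas as pd

# Toy meta-dataframe with the same labels as described above.
# The file paths are placeholders, not real result files.
meta_df = pd.DataFrame(
    {
        "nr_qubits": [4, 4, 8, 16],
        "backend": ["torino", "brisbane", "torino", "fez"],
        "sim": [True, True, True, True],
        "circuit_type": [1, 2, 1, 3],
        "file_path": ["a.csv", "b.csv", "c.csv", "d.csv"],
    }
)

# Keep only the 4-qubit runs on torino before loading any raw counts.
selected = meta_df[(meta_df["nr_qubits"] == 4) & (meta_df["backend"] == "torino")]

# Sort the remaining rows so files are loaded in a predictable order.
selected = selected.sort_values(["circuit_type", "backend"])

paths_to_load = selected["file_path"].tolist()
print(paths_to_load)
```

Only the paths in `paths_to_load` would then need to be read, which avoids loading every csv file up front.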

Various columns can be added to this meta-dataframe, such as the experiment type, a dataframe containing the raw counts, and a measure of the sparsity of the counts in that dataframe. These can be useful when trying to understand the data further. See [Visualising the Sparsity](../sparse_stuff.ipynb).

In [11]:
df_Refr_Sim = mdf.add_experiment_type_column(df_Refr_Sim)
df_Refr_Sim.head()

Unnamed: 0,nr_qubits,backend,sim,circuit_type,file_path,experiment_type
0,4,torino,True,1,../../Refreshed_Simulated_results/4q/4q_fake_t...,Refreshed Sim
1,4,torino,True,2,../../Refreshed_Simulated_results/4q/4q_fake_t...,Refreshed Sim
2,4,torino,True,3,../../Refreshed_Simulated_results/4q/4q_fake_t...,Refreshed Sim
3,4,brisbane,True,1,../../Refreshed_Simulated_results/4q/4q_fake_b...,Refreshed Sim
4,4,brisbane,True,2,../../Refreshed_Simulated_results/4q/4q_fake_b...,Refreshed Sim
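
The sparsity measure mentioned above is not defined in this notebook. As one possible illustration, the sketch below assumes sparsity means the fraction of outcome entries that are exactly zero; the function name `zero_fraction` and the toy counts are hypothetical, and the project's own measure may be defined differently.

```python
import pandas as pd


def zero_fraction(counts_df: pd.DataFrame, outcome_cols: list) -> float:
    """Return the fraction of entries in outcome_cols that are exactly zero."""
    values = counts_df[outcome_cols]
    # Boolean array of "is zero" flags; its mean is the fraction of zeros.
    return float((values == 0).to_numpy().mean())


# Toy raw counts: two runs, four outcome columns (assumed data).
toy_counts = pd.DataFrame(
    {"0001": [4, 5], "0010": [19, 18], "0011": [0, 0], "0101": [0, 0]}
)

sparsity = zero_fraction(toy_counts, ["0001", "0010", "0011", "0101"])
print(sparsity)  # 0.5: four of the eight entries are zero
```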


## 2. The Relative Noise Profile

First, a single csv file is processed. The total erroneous counts are calculated and added to the dataframe of raw counts in the column 'totalError'.

In [13]:
unprocessed_df = dpf.create_unprocessed_df(df_Refr_Sim.loc[0,'file_path'])
unprocessed_df.head()

Unnamed: 0,totalError,0000,0001,0010,0011,0100,0101,0110,0111,1000,1001,1010,1011,1100,1101,1110,1111
0,49,49,4,19,0.0,14,0.0,0.0,0.0,12,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,50,50,5,18,0.0,14,0.0,0.0,0.0,13,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,41,41,5,14,0.0,15,0.0,0.0,0.0,7,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,47,47,5,19,0.0,13,0.0,0.0,0.0,10,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,40,40,3,15,0.0,14,0.0,1.0,0.0,7,0.0,0.0,0.0,0.0,0.0,0.0,0.0
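
The total-error calculation can be sketched roughly as follows. This is not the code behind `create_unprocessed_df`: it assumes the expected (correct) outcome is known and held in a hypothetical variable `expected_outcome`, it uses made-up counts, and it does not try to reproduce the exact column layout shown above.

```python
import pandas as pd

# Toy raw counts for two runs (assumed data, three outcomes only).
raw = pd.DataFrame({"0000": [950, 960], "0001": [20, 15], "0010": [30, 25]})

# Hypothetical: the bitstring an ideal, noise-free run would return.
expected_outcome = "0000"

# Every other outcome column counts as erroneous.
error_cols = [col for col in raw.columns if col != expected_outcome]

# Sum the erroneous counts row by row and store them in 'totalError'.
raw["totalError"] = raw[error_cols].sum(axis=1)
print(raw)
```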


The erroneous counts are then divided by totalError to give the distribution of the error over the incorrect outcomes, expressed as a fraction of the total error.

In [14]:
processed_df = dpf.create_processed_df(df_Refr_Sim.loc[0,'file_path'])
processed_df.head()

Unnamed: 0,totalError,0000,0001,0010,0011,0100,0101,0110,0111,1000,1001,1010,1011,1100,1101,1110,1111
0,49,1.0,0.081633,0.387755,0.0,0.285714,0.0,0.0,0.0,0.244898,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,50,1.0,0.1,0.36,0.0,0.28,0.0,0.0,0.0,0.26,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,41,1.0,0.121951,0.341463,0.0,0.365854,0.0,0.0,0.0,0.170732,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,47,1.0,0.106383,0.404255,0.0,0.276596,0.0,0.0,0.0,0.212766,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,40,1.0,0.075,0.375,0.0,0.35,0.0,0.025,0.0,0.175,0.0,0.0,0.0,0.0,0.0,0.0,0.0
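
The normalisation step can be sketched with a row-wise division in pandas. This is a minimal illustration and not the implementation of `create_processed_df`: it uses only the four non-zero error columns from the first two rows shown above, and the variable names are assumptions.

```python
import pandas as pd

# Erroneous counts for two runs, taken from the first two rows shown above.
df = pd.DataFrame(
    {
        "totalError": [49, 50],
        "0001": [4, 5],
        "0010": [19, 18],
        "0100": [14, 14],
        "1000": [12, 13],
    }
)
error_cols = ["0001", "0010", "0100", "1000"]

# Divide each row's erroneous counts by that row's totalError.
profile = df.copy()
profile[error_cols] = df[error_cols].div(df["totalError"], axis=0)

print(profile.round(6))
```

Each row of the error columns then sums to 1. A run with a totalError of zero would produce NaN or infinite values under this division, so such rows may need separate handling.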


## A Fully-Labeled Dataframe

First, the meta-dataframe is created according to the experiment type requested. Then the rows that correspond to the requested number of qubits are expanded, processed, and returned.

In [4]:
nr_qubits = 4
df_arr_R = dpf.get_expanded_df('Refreshed_Simulation',nr_qubits,dir)

In [5]:
df_arr_R.head()

Unnamed: 0,circuit_type,backend,nr_qubits,experiment_type,totalError,0000,0001,0010,0011,0100,...,0110,0111,1000,1001,1010,1011,1100,1101,1110,1111
0,1,torino,4,Refreshed_Simulation,49,1.0,0.081633,0.387755,0.0,0.285714,...,0.0,0.0,0.244898,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1,torino,4,Refreshed_Simulation,50,1.0,0.1,0.36,0.0,0.28,...,0.0,0.0,0.26,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1,torino,4,Refreshed_Simulation,41,1.0,0.121951,0.341463,0.0,0.365854,...,0.0,0.0,0.170732,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1,torino,4,Refreshed_Simulation,47,1.0,0.106383,0.404255,0.0,0.276596,...,0.0,0.0,0.212766,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1,torino,4,Refreshed_Simulation,40,1.0,0.075,0.375,0.0,0.35,...,0.025,0.0,0.175,0.0,0.0,0.0,0.0,0.0,0.0,0.0
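
The expand-and-label step can be sketched as follows. This is an illustration under stated assumptions, not the implementation of `get_expanded_df`: the function `expand`, the dictionary `processed_by_path` (a stand-in for reading and processing each csv file), and all values are hypothetical.

```python
import pandas as pd

# Toy meta-dataframe; the file paths are placeholders.
meta_df = pd.DataFrame(
    {
        "nr_qubits": [4, 4, 8],
        "backend": ["torino", "brisbane", "torino"],
        "circuit_type": [1, 2, 1],
        "file_path": ["a.csv", "b.csv", "c.csv"],
    }
)

# Stand-in for the processed dataframe of each file (toy values).
processed_by_path = {
    "a.csv": pd.DataFrame({"totalError": [49, 50], "0001": [0.081633, 0.1]}),
    "b.csv": pd.DataFrame({"totalError": [30], "0001": [0.2]}),
    "c.csv": pd.DataFrame({"totalError": [10], "0001": [0.5]}),
}


def expand(meta_df: pd.DataFrame, nr_qubits: int, experiment_type: str) -> pd.DataFrame:
    """Attach the meta labels to every run of the matching files and stack them."""
    frames = []
    matching = meta_df[meta_df["nr_qubits"] == nr_qubits]
    for row in matching.itertuples(index=False):
        runs = processed_by_path[row.file_path].copy()
        # Insert at position 0 in reverse order so the labels lead the columns.
        runs.insert(0, "experiment_type", experiment_type)
        runs.insert(0, "nr_qubits", row.nr_qubits)
        runs.insert(0, "backend", row.backend)
        runs.insert(0, "circuit_type", row.circuit_type)
        frames.append(runs)
    return pd.concat(frames, ignore_index=True)


expanded = expand(meta_df, 4, "Refreshed_Simulation")
print(expanded)
```

Every run carries its own labels in the result, so rows from different backends and circuit types can be stored in one dataframe and still be told apart.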


## Next stage -> [Stage 4](Stage4_Preprocessing_Data.ipynb)