# Tutorial Document
This document details the use of the QCOM package. The goal is a Python package that lets users work with Aquila and DMRG data seamlessly and avoid rewriting the same functions repeatedly.

### Getting Data from DMRG files
If you have data in text files with the form

state : count

state : count

.
.
.

then parse_file takes the file path as input and returns two values: the data as a dictionary, and the total of all the values (which equals 1 if the values are probabilities).
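
To make the file format concrete, here is a minimal sketch of how such a file could be read into a dictionary plus a running total. It is not QCOM's implementation: the helper name `parse_counts` is hypothetical, and the choices to skip malformed lines and to add up repeated states are assumptions made for illustration.

```python
import io


def parse_counts(lines):
    """Parse 'state : count' lines into a dict and also return the total."""
    data = {}
    total = 0.0
    for line in lines:
        line = line.strip()
        # Skip blank lines and anything without a ':' separator.
        if not line or ":" not in line:
            continue
        state, value = line.split(":", 1)
        state = state.strip()
        count = float(value)
        # Assumption: repeated states are accumulated rather than overwritten.
        data[state] = data.get(state, 0.0) + count
        total += count
    return data, total


# An in-memory stand-in for a small data file.
example = io.StringIO("01000010 : 3\n\n10000001 : 2\n01000010 : 1\n")
data, total = parse_counts(example)
print(data, total)
```

Any iterable of lines works here, so an open file handle can be passed in place of the `io.StringIO` object.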

This function also has a built-in progress tracker so you can estimate how long the data will take to load, which is helpful for larger datasets. To track progress, set the "show_progress" flag to True. This flag is always the last parameter of the function and is False by default.

Note: 

If the function has multiple parameters before show_progress, you must set the flag by name in the function call itself, as shown below. You can try it for yourself: if you define show_progress = True on a separate line and pass that variable in positionally, progress will not be shown. This is because parse_file takes two additional arguments, one of which (sample_size) can also be represented as a boolean. If you pass only file_path and show_progress, the value of show_progress is read as the sample_size input and no progress is displayed.
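
The pitfall comes from how Python binds positional arguments, and it can be reproduced with a small stand-in function. The function `parse_stub` and the parameter name `other_option` are hypothetical; only file_path, sample_size and show_progress come from the text above, and the stub does nothing except report how its arguments were bound.

```python
def parse_stub(file_path, sample_size=None, other_option=None, show_progress=False):
    """Hypothetical stand-in for a parser; returns how the arguments were bound."""
    return {"sample_size": sample_size, "show_progress": show_progress}


show_progress = True

# Passed positionally, the value lands in the second slot (sample_size).
positional = parse_stub("data.txt", show_progress)

# Passed by keyword, the value reaches the intended parameter.
keyword = parse_stub("data.txt", show_progress=show_progress)

print(positional)
print(keyword)
```

In the positional call the flag stays at its default of False, which is why no progress would be displayed.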

### Weird Import
I'm importing QCOM this way because it's likely that when you first run QCOM you will not have it on your Python path. Instead, you can import it as a .py file directly from the parent directory, as I have below. If QCOM is already on your path, you can import it immediately without the extra line.

In [4]:
# import the qcom module

import qcom as qc

In [5]:
file_path = "../example_data/1_billion_3.0_4_rungs.txt"

# Process without sampling
processed_data, total_count = qc.parse_file(file_path, show_progress = True)

Starting: Parsing file...
Task: Parsing file | Progress: 100.00% | Elapsed: 0.00s | Remaining: 0.00s      
Completed: Parsing file. Elapsed time: 0.00 seconds.


### Most probable states
This is great, but it's hard to verify for very large datasets. To check the result, we might only be interested in the 10 most probable states. The function below prints the n most probable states.

In [4]:
n = 10
qc.print_most_probable_data(processed_data, n)

Most probable 10 bit strings:
 1.  Bit string: 01000010, Probability: 389245564.00000000
 2.  Bit string: 10000001, Probability: 389223579.00000000
 3.  Bit string: 01000001, Probability: 48778624.00000000
 4.  Bit string: 10000010, Probability: 48777628.00000000
 5.  Bit string: 01000000, Probability: 20562997.00000000
 6.  Bit string: 00000001, Probability: 20561460.00000000
 7.  Bit string: 00000010, Probability: 20550353.00000000
 8.  Bit string: 10000000, Probability: 20549831.00000000
 9.  Bit string: 10000100, Probability: 4983517.00000000
10.  Bit string: 00100001, Probability: 4981922.00000000
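
The values printed above are raw counts, because the parsed data has not been normalized. As a rough sketch of what ranking and normalizing involve (the helper names `most_probable` and `normalize` are hypothetical and are not part of QCOM), one could sort the dictionary by value and divide by the total:

```python
def most_probable(data, n):
    """Return the n (state, value) pairs with the largest values."""
    return sorted(data.items(), key=lambda item: item[1], reverse=True)[:n]


def normalize(data):
    """Divide every value by the total so the values sum to 1."""
    total = sum(data.values())
    return {state: value / total for state, value in data.items()}


# A few of the counts listed above, used here as a small example.
counts = {
    "01000001": 48778624,
    "01000010": 389245564,
    "10000010": 48777628,
    "10000001": 389223579,
}

top = most_probable(counts, 2)
probabilities = normalize(counts)
print(top)
print(most_probable(probabilities, 2))
```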


### Sampling 
To draw a smaller random sample from the larger dataset, use the sample_data function.

In [3]:
sample_size = 1000
sampled_data = qc.sample_data(processed_data, total_count, sample_size, show_progress= True)
qc.print_most_probable_data(sampled_data, 10)

Starting: Sampling data...
Task: Sampling data | Progress: 100.00% | Elapsed: 0.01s | Remaining: 0.00s     
Completed: Sampling data. Elapsed time: 0.01 seconds.
Most probable 10 bit strings:
 1.  Bit string: 01000010, Probability: 0.40700000
 2.  Bit string: 10000001, Probability: 0.36700000
 3.  Bit string: 10000010, Probability: 0.04800000
 4.  Bit string: 01000001, Probability: 0.04600000
 5.  Bit string: 10000000, Probability: 0.02800000
 6.  Bit string: 01000000, Probability: 0.02200000
 7.  Bit string: 00000001, Probability: 0.01900000
 8.  Bit string: 00000010, Probability: 0.01700000
 9.  Bit string: 00100001, Probability: 0.00700000
10.  Bit string: 10000011, Probability: 0.00600000
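
For intuition, drawing a sample like this amounts to picking states at random with probability proportional to their counts. The sketch below uses the standard library's `random.choices` for that; it is an illustration under that assumption, not QCOM's sample_data, and the helper name `sample_counts` and its `seed` parameter are hypothetical.

```python
import random
from collections import Counter


def sample_counts(data, sample_size, seed=None):
    """Draw sample_size states with replacement, weighted by their values."""
    rng = random.Random(seed)
    states = list(data)
    weights = [data[state] for state in states]
    draws = rng.choices(states, weights=weights, k=sample_size)
    tally = Counter(draws)
    # Report relative frequencies so the sampled values sum to 1.
    return {state: count / sample_size for state, count in tally.items()}


counts = {"01000010": 389245564, "10000001": 389223579, "01000001": 48778624}
sampled = sample_counts(counts, 1000, seed=0)
print(sampled)
```

Because the draw is random, a small sample only approximates the original distribution, which is why the sampled probabilities above differ slightly between the two dominant states.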


### Sample as you parse
It's actually more efficient to sample as we parse the file. If you know beforehand that you only care about 100 randomly sampled states, it makes no sense to parse the whole file and then sample it down to 100. Instead, you draw only those 100 states while parsing. We can do this by passing the sample_size parameter to the parse_file function.

In [7]:
file_path = "1_billion_3.0_4_rungs.txt"

# Process with sampling

sample_size = 1000
processed_data, total_count = qc.parse_file(file_path, sample_size)

### Error handling
The Aquila device has a readout error rate of 0.08 for the excited state and 0.01 for the ground state. We can simulate this error on our data using the following function.

NOTE: The function currently defaults to 0.08 and 0.01. These values are taken as parameters so that if the error rates change in the future we can model those accurately as well. Thus, you do not technically need to pass ground_rate and excited_rate into the function, although it is good practice.

In [8]:
ground_rate = 0.01
excited_rate = 0.08

error_data = qc.introduce_error_data(sampled_data, total_count, ground_rate, excited_rate)
qc.print_most_probable_data(error_data, 10)

Introducing errors to the data...
Most probable 10 bit strings:
 1.  Bit string: 10000000, Probability: 0.09090909
 2.  Bit string: 01000010, Probability: 0.04545455
 3.  Bit string: 00000010, Probability: 0.04545455
 4.  Bit string: 10000010, Probability: 0.04545455
 5.  Bit string: 00010010, Probability: 0.04545455
 6.  Bit string: 01001001, Probability: 0.04545455
 7.  Bit string: 01000001, Probability: 0.04545455
 8.  Bit string: 10000100, Probability: 0.04545455
 9.  Bit string: 01000000, Probability: 0.04545455
10.  Bit string: 10000011, Probability: 0.04545455
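
As a sketch of the underlying idea, readout error can be modeled by flipping each bit independently with a rate that depends on its state. The code below assumes that '1' denotes the excited state and '0' the ground state and that errors on different bits are independent; both are assumptions made for illustration, the helper name `flip_readout` is hypothetical, and this is not QCOM's introduce_error_data.

```python
import random


def flip_readout(bitstring, ground_rate=0.01, excited_rate=0.08, rng=random):
    """Return a copy of bitstring with independent per-bit readout errors."""
    noisy = []
    for bit in bitstring:
        if bit == "1":
            # Assumption: an excited atom is misread as ground with excited_rate.
            noisy.append("0" if rng.random() < excited_rate else "1")
        else:
            # Assumption: a ground atom is misread as excited with ground_rate.
            noisy.append("1" if rng.random() < ground_rate else "0")
    return "".join(noisy)


rng = random.Random(0)
print(flip_readout("01000010", rng=rng))
```

Applying this to every sampled bit string and tallying the results gives a noisy distribution with the same number of shots as the input.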


### Combining data
Say you have two datasets and you want to combine them. We can do this using the following function, but there are some rules. You can combine either two datasets of probabilities or two datasets of counts. You cannot combine a dataset of probabilities with a dataset of counts, as this would make normalizing impossible. If you combine two datasets of probabilities, the function will automatically normalize the result. If you combine two datasets of counts, the function will NOT normalize. This way, users who want to combine, say, 100 sets of count data can do so without a problem and simply normalize afterwards.

In [9]:
combined_data = qc.combine_datasets(sampled_data, error_data)
print(combined_data)

{'10000010': 0.04772727272727273, '10001000': 0.02322727272727273, '00000100': 0.02372727272727273, '01000010': 0.23322727272727273, '10000110': 0.02372727272727273, '10000001': 0.1935, '11000010': 0.02322727272727273, '01000011': 0.02372727272727273, '11000001': 0.025727272727272727, '01000000': 0.03172727272727273, '00010010': 0.02422727272727273, '00000001': 0.007, '10001001': 0.02372727272727273, '01001000': 0.026227272727272728, '00000010': 0.03172727272727273, '00100011': 0.022727272727272728, '01001001': 0.022727272727272728, '10000000': 0.050954545454545454, '01000001': 0.043727272727272726, '10000100': 0.02422727272727273, '00000000': 0.02372727272727273, '10000011': 0.025227272727272727, '00100001': 0.001, '00000011': 0.02322727272727273}
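
The rules above can be sketched as a key-wise sum followed by an optional normalization. This is an illustration only: the helper name `combine` and its explicit `normalize` flag are hypothetical, and how QCOM itself decides whether its inputs are probabilities or counts is not shown here.

```python
def combine(first, second, normalize):
    """Add two state dictionaries key-wise, optionally normalizing the sum."""
    combined = dict(first)
    for state, value in second.items():
        combined[state] = combined.get(state, 0) + value
    if normalize:
        # For probabilities: rescale so the combined values sum to 1 again.
        total = sum(combined.values())
        combined = {state: value / total for state, value in combined.items()}
    return combined


# Two probability datasets: the result is renormalized.
probs = combine({"00": 0.5, "01": 0.5}, {"00": 1.0}, normalize=True)

# Two count datasets: the raw counts are simply added.
counts = combine({"00": 3, "01": 1}, {"00": 2, "11": 4}, normalize=False)

print(probs)
print(counts)
```

Leaving counts unnormalized means many count datasets can be folded together one after another, with a single normalization at the end.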


### Saving to txt File
If we have a set of data that we would like to save to a file, we can do so using the following function.

In [10]:
file_path = "error_data.txt"
qc.save_data(processed_data, file_path)
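
For reference, writing a dictionary back out in the same "state : count" layout used for the input files could look like the sketch below. Whether save_data writes exactly this layout is an assumption, and the helper name `save_counts` is hypothetical.

```python
import os
import tempfile


def save_counts(data, path):
    """Write each entry of data to path as a 'state : value' line."""
    with open(path, "w") as handle:
        for state, value in data.items():
            handle.write(f"{state} : {value}\n")


# Write a tiny dataset to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example_output.txt")
    save_counts({"01000010": 4, "10000001": 2}, path)
    with open(path) as handle:
        written = handle.read().splitlines()

print(written)
```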