# Tutorial 3: Contributing to SQuADDS

In this tutorial, we will go over the basics of contributing data to the SQuADDS project. We will cover the following topics:

0. [Contribution Information Setup](#setup)
1. [Understanding the terminology and database structure](#structure)
2. [Contributing to an existing dataset configuration](#existing)
    - [via HuggingFace](#hf_existing)
    - [via SQuADDS API](#api_existing)
3. [Creating new dataset configuration](#creation)
4. [Building on top of others' work](#missing)
---

In [1]:
%load_ext autoreload
%autoreload 2

## <a name="setup">Contribution Information Setup</a>

In order to contribute to SQuADDS, you will need to provide some information about yourself. This information will be used to track your contributions and to give you credit for your work. You can provide this information by updating the following variables in the `.env` file in the root directory of the repository:

```
GROUP_NAME = ""
PI_NAME = ""
INSTITUTION = ""
USER_NAME = ""
CONTRIB_MISC = ""
```

where `GROUP_NAME` is the name of your research group, `PI_NAME` is the name of your PI, `INSTITUTION` is the name of your institution, `USER_NAME` is your name, and `CONTRIB_MISC` is any other information you would like to provide about your contributions (e.g. a BibTeX citation or a link to a paper).
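As a quick sanity check, the sketch below parses `.env`-style text and lists the contributor variables that have not been filled in yet. The helper names `parse_env_text` and `unfilled_contributor_fields`, and the simple `KEY = "value"` parsing, are assumptions made for illustration; they are not part of the SQuADDS API.

```python
CONTRIBUTOR_KEYS = ["GROUP_NAME", "PI_NAME", "INSTITUTION", "USER_NAME", "CONTRIB_MISC"]


def parse_env_text(text):
    """Parse simple KEY = "value" lines into a dict (illustrative parser only)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unfilled_contributor_fields(text):
    """Return the contributor keys that are missing or left empty."""
    values = parse_env_text(text)
    return [key for key in CONTRIBUTOR_KEYS if not values.get(key)]


# Placeholder values, not real contributor information.
example_env = '''
GROUP_NAME = "Example Group"
PI_NAME = ""
INSTITUTION = "Example University"
USER_NAME = "Jane Doe"
'''

print(unfilled_contributor_fields(example_env))
```

To check a real file, the text could be read with `open(".env").read()` from the repository root.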

Alternatively, you can provide this information by executing the following cell.


In [22]:
from squadds.database.utils import create_contributor_info

In [25]:
create_contributor_info()

Contributor information updated in .env file (/Users/shanto/LFL/SQuADDS/SQuADDS/.env).


## <a name="structure">Understanding the terminology and database structure</a>

### <a name="hugging_face">HuggingFace</a>

[HuggingFace](https://huggingface.co/) is a collaboration platform for the machine learning community. Its Hub hosts open-source machine learning libraries and tools and serves as a central repository where individuals can share and explore ML resources.

To make SQuADDS more useful to quantum hardware developers and machine learning researchers, we have chosen to host our database on the HuggingFace platform. Hosting the database there makes it readily accessible for research with machine learning models, and we aim to contribute to the development of Electronic Design Automation (EDA) tools for superconducting quantum hardware, replicating the impact such tools have had on the semiconductor industry.

Key to our choice of HuggingFace is its [datasets](https://huggingface.co/datasets) library, which provides a unified interface for accessing a wide range of datasets. This feature is integral to SQuADDS, offering a streamlined and cohesive interface to our database. The decentralized nature of HuggingFace datasets significantly enhances community-driven development and access, a functionality that can be challenging to implement with traditional data storage platforms. This aspect of HuggingFace aligns perfectly with our vision for SQuADDS, enabling us to foster a collaborative and open environment for innovation in quantum technology.

### <a name="datasets_config">Datasets & Configurations</a>

As seen in [Tutorial 1](https://lfl-lab.github.io/SQuADDS/source/tutorials/Tutorial-1_Getting_Started_with_SQuADDS.html#Accessing-the-SQuADDS-Database-using-the-HuggingFace-API), we have organized the SQuADDS database into datasets and configurations. Let's quickly review these two concepts and how they are used in SQuADDS.

Each configuration in the dataset is uniquely identified by its `config` string. For the SQuADDS Database, the `config` string is created in the following format:

```python
config = f"{component}-{component_name}-{data_type}"
```

where `component` is the type of component (e.g. `qubit`), `component_name` is the name of the component in Qiskit Metal (e.g. `TransmonCross`), and `data_type` is the type of simulation data that has been contributed (e.g. `cap_matrix`).

This structured approach ensures that users can query specific parts of the dataset relevant to their work, such as a particular type of qubit design or simulation results. This API abstraction allows for more complex queries and operations on the data, facilitating a more efficient workflow for researchers and developers.

Let's check what the `config` strings look like for our database:

In [29]:
from datasets import get_dataset_config_names

configs = get_dataset_config_names("SQuADDS/SQuADDS_DB")
print(configs)

['qubit-TransmonCross-cap_matrix', 'cavity_claw-RouteMeander-eigenmode', 'coupler-NCap-cap_matrix']
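
The names printed above join the three parts with hyphens. As a small illustration, the sketch below builds and splits such strings. `make_config` and `split_config` are hypothetical helpers rather than SQuADDS functions, and the splitting assumes that neither `component` nor `component_name` contains a hyphen.

```python
def make_config(component, component_name, data_type):
    """Join the three parts into a configuration name."""
    return f"{component}-{component_name}-{data_type}"


def split_config(config):
    """Split a configuration name back into its three parts.

    Only the first two hyphens are treated as separators, so underscores
    inside a part (e.g. `cap_matrix`) are left untouched.
    """
    component, component_name, data_type = config.split("-", 2)
    return component, component_name, data_type


print(make_config("qubit", "TransmonCross", "cap_matrix"))
print(split_config("cavity_claw-RouteMeander-eigenmode"))
```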


You can now access the database using the `config` string. For example, if you want to access the `qubit-TransmonCross-cap_matrix` configuration, you can do so by executing the following cell:

In [31]:
from datasets import load_dataset

qubit_data = load_dataset("SQuADDS/SQuADDS_DB", configs[0])
print(qubit_data)

DatasetDict({
    train: Dataset({
        features: ['notes', 'sim_results', 'contributor', 'design', 'sim_options'],
        num_rows: 1934
    })
})


Please review [Section "Using the SQuADDS API to access and analyze the database" in Tutorial 1](https://lfl-lab.github.io/SQuADDS/source/tutorials/Tutorial-1_Getting_Started_with_SQuADDS.html#Accessing-the-SQuADDS-Database-using-the-HuggingFace-API), where we introduce and explain how to use the SQuADDS API to access and analyze the database.

#### <a name="dataset_schema">Database Schema</a>

Each entry contributed to SQuADDS must have **at least** the following fields. You may add as many supplementary fields as you want.

```json
{
    "design":{
        "design_tool": design_tool_name,
        "design_options": design_options,
    },
    "sim_options":{
        "setup": sim_setup_options,
        "simulator": simulator_name,
    },
    "sim_results":{
        "result1": sim_result1,
        "unit1": unit1,
        "result2": sim_result2,
        "unit2": unit2,
    },
    "contributor":{
        "group": group_name,
        "PI": pi_name,
        "institution": institution,
        "uploader": user_name,
        "misc": contrib_misc,
        "date_created": "YYYY-MM-DD-HHMMSS",
    },
}
```

If all the `sim_results` have the same units, you can use a single `"units": units` field instead of repeating the unit for each result.

**Note:** The `"contributor"` field is automatically added by the SQuADDS API when you upload your dataset. You do not need to add this field yourself.
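
A minimal sketch of checking an entry against the minimum fields listed above is given here. The function `missing_fields` is a hypothetical helper, not the validation routine used by SQuADDS, and the example entry contains placeholder values only. The `"contributor"` block is left out of the check because the API adds it at upload time.

```python
# Minimum sections and the keys each one must contain, taken from the
# template above. The result names inside "sim_results" vary by
# configuration, so only the presence of that section is checked.
MINIMUM_FIELDS = {
    "design": ["design_tool", "design_options"],
    "sim_options": ["setup", "simulator"],
    "sim_results": [],
}


def missing_fields(entry):
    """Return the dotted names of the minimum fields absent from `entry`."""
    missing = []
    for section, keys in MINIMUM_FIELDS.items():
        if section not in entry:
            missing.append(section)
            continue
        for key in keys:
            if key not in entry[section]:
                missing.append(f"{section}.{key}")
    return missing


# Placeholder entry for illustration; the values are not real simulation data.
example_entry = {
    "design": {"design_tool": "qiskit-metal", "design_options": {}},
    "sim_options": {"setup": {}, "simulator": "Ansys HFSS"},
    "sim_results": {"result1": 0.0, "unit1": "placeholder_unit"},
}

print(missing_fields(example_entry))
print(missing_fields({"design": {"design_tool": "qiskit-metal"}}))
```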

Let's look at the schema for the `qubit-TransmonCross-cap_matrix` configuration, which used `qiskit-metal` as the design tool and `Ansys HFSS` as the simulation engine.

In [32]:
from squadds import SQuADDS_DB

db = SQuADDS_DB()

In [33]:
db.get_dataset_info(component="qubit", component_name="TransmonCross", data_type="cap_matrix")

Dataset Features:
{'contributor': {'PI': Value(dtype='string', id=None),
                 'date_created': Value(dtype='string', id=None),
                 'group': Value(dtype='string', id=None),
                 'institution': Value(dtype='string', id=None),
                 'uploader': Value(dtype='string', id=None)},
 'design': {'design_options': {...},
            'design_tool': Value(dtype='string', id=None)},
 'notes': {},
 'sim_options': {'renderer_options': {...},
                 'setup': {...},
                 'simulator': Value(dtype='string', id=None)},
 'sim_results': {'claw_to_claw': Value(dtype='float64', id=None),
                 'claw_to_ground': Value(dtype='float64', id=None),
                 'cross_to_claw': Value(dtype='float64', id=None),
                 'cross_to_cross': Value(dtype='float64', id=None),
                 'cross_to_ground': Value(dtype='float64', id=None),
                 'ground_to_ground': Value(dtype='float64', id=None),
                 'u

---

## <a name="existing">Contributing to an existing configuration</a>

Let's revisit [Tutorial 2](https://lfl-lab.github.io/SQuADDS/source/tutorials/Tutorial-2_Simulate_interpolated_designs.html#Simulate-the-Target-Design) where we simulated a novel `TransmonCross` qubit design. We will now learn how to contribute this design to the SQuADDS database.

### <a name="hf_existing">via HuggingFace</a>

The high level steps for contributing to an existing configuration via HuggingFace are as follows:


1. **Clone/Fork the Repository**: If you have not already forked or cloned the [repository](https://huggingface.co/datasets/SQuADDS/SQuADDS_DB), do so.

2. **Create or Checkout a Branch**: If adding new data, it might be best to do it on a new branch:

   ```sh
   git checkout -b branch_name
   ```

3. **Modify the Configuration**: Add or modify the dataset configuration as necessary (following guidelines). 

4. **Commit and Push Your Changes**: Commit the new data and push it to your fork:

   ```sh
   git add .
   git commit -m "GOOD COMMIT MESSAGE"
   git push origin branch_name
   ```

5. **Pull Request**: Create a pull request against the original `SQuADDS_DB` repository.
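
Because the `"contributor"` field is filled in by the SQuADDS API at upload time, entries prepared by hand for a pull request presumably need that block added manually. The sketch below assembles it from the contributor variables described in the setup section, using the `YYYY-MM-DD-HHMMSS` timestamp format from the schema. `build_contributor` is a hypothetical helper, and the mapping from variable names to schema keys is an assumption based on the schema template.

```python
import os
from datetime import datetime


def build_contributor(env=os.environ, now=None):
    """Assemble a contributor block from a mapping of contributor variables."""
    now = now or datetime.now()
    return {
        "group": env.get("GROUP_NAME", ""),
        "PI": env.get("PI_NAME", ""),
        "institution": env.get("INSTITUTION", ""),
        "uploader": env.get("USER_NAME", ""),
        "misc": env.get("CONTRIB_MISC", ""),
        # Matches the "YYYY-MM-DD-HHMMSS" format shown in the schema.
        "date_created": now.strftime("%Y-%m-%d-%H%M%S"),
    }


# Placeholder values passed explicitly instead of reading the environment.
example = build_contributor(
    {"GROUP_NAME": "Example Group", "USER_NAME": "Jane Doe"},
    now=datetime(2024, 1, 2, 3, 4, 5),
)
print(example)
```

Note that values stored in a `.env` file are not available in `os.environ` unless they have been loaded into the environment first, which is why the example passes a plain dictionary.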


### <a name="api_existing">via SQuADDS API</a> 

We have also provided a simple API for contributing to the SQuADDS database. The high level steps for contributing to an existing configuration via the SQuADDS API are as follows:

1. **Select the dataset configuration**: Select the dataset configuration you would like to contribute to. 

2. **Validate your data**: Validate your data against the dataset configuration.

3. **Submit your data**: Submit your data to the SQuADDS database.
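
The validation step can be pictured as comparing the nested field names of a new entry with those of an entry already in the configuration. The sketch below is a generic illustration of that idea, not the validation routine of the SQuADDS API; `key_paths` and `compare_structure` are hypothetical helpers, and the dictionaries are placeholders. In practice the reference could be a row of the existing configuration, such as `qubit_data["train"][0]` loaded earlier with `load_dataset`.

```python
def key_paths(obj, prefix=""):
    """Return the set of dotted key paths found in a nested dict."""
    paths = set()
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        paths.add(path)
        if isinstance(value, dict):
            paths |= key_paths(value, path)
    return paths


def compare_structure(entry, reference):
    """Return (missing, extra) key paths of `entry` relative to `reference`."""
    entry_paths = key_paths(entry)
    reference_paths = key_paths(reference)
    missing = sorted(reference_paths - entry_paths)
    extra = sorted(entry_paths - reference_paths)
    return missing, extra


# Placeholder structures for illustration only.
reference = {
    "design": {"design_tool": "", "design_options": {}},
    "sim_results": {"cross_to_ground": 0.0},
}
entry = {
    "design": {"design_tool": ""},
    "sim_results": {"cross_to_ground": 0.0, "extra_result": 1.0},
}

missing, extra = compare_structure(entry, reference)
print("missing:", missing)
print("extra:", extra)
```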

## <a name="creation">Contributing a New Configuration</a>

We may find that we possess a dataset that is not currently included in SQuADDS. In this case, we can add a new configuration to SQuADDS.

But before we do that, we need to make sure that the dataset is in a format that is compatible with the SQuADDS project and is also validated against measurement results. We will go over this process in this section.

### Process Overview

The high level steps for contributing a new configuration are as follows:

1. **Create a new configuration**: Create a new configuration for your dataset.
2. **Create the metadata for the configuration**: Create the metadata for your configuration that contains information on the measured device design and parameters.
3. **Create the database schema**: Create the database schema for your configuration.
4. **Validate your data**: Validate your data against the database schema.
5. **Submit your data**: Submit your data to the SQuADDS database.
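
One possible way to approach steps 3 and 4 is to derive a type schema from a representative entry and then check other entries against it. The sketch below assumes that approach; `infer_schema` and `conforms` are hypothetical helpers, not part of the SQuADDS API, and the sample entry holds placeholder values. The comparison is deliberately strict, so an `int` where the sample had a `float` counts as a mismatch.

```python
import json


def infer_schema(entry):
    """Map every field to the name of its Python type, recursing into dicts."""
    return {
        key: infer_schema(value) if isinstance(value, dict) else type(value).__name__
        for key, value in entry.items()
    }


def conforms(entry, schema):
    """Check that `entry` has exactly the fields and types recorded in `schema`."""
    return infer_schema(entry) == schema


# Placeholder entry for illustration; not real simulation data.
sample_entry = {
    "design": {"design_tool": "qiskit-metal", "design_options": {"placeholder_option": "value"}},
    "sim_results": {"result1": 0.0, "unit1": "placeholder_unit"},
}

schema = infer_schema(sample_entry)
print(json.dumps(schema, indent=2))

other_entry = {
    "design": {"design_tool": "qiskit-metal", "design_options": {"placeholder_option": "other"}},
    "sim_results": {"result1": 1.5, "unit1": "placeholder_unit"},
}
print(conforms(other_entry, schema))
```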

## Building on top of others' work <a name="missing"></a>

We might have some data that can be appended to an existing configuration in `SQuADDS_DB`. In this case, we can add to the existing configuration without pushing new entries to the original dataset. Here are the steps to do so:

1. **Select the dataset configuration**: Select the dataset configuration you would like to contribute to.
2. **Gather your data**: Gather your data that you would like to contribute to the dataset configuration.
3. **Validate your data**: Validate your data against the dataset configuration.
4. **Submit your data**: Submit your data to the SQuADDS database.

## License

<div style='width: 100%; background-color:#3cb1c2;color:#324344;padding-left: 10px; padding-bottom: 10px; padding-right: 10px; padding-top: 5px'>
    <h3>This code is a part of SQuADDS</h3>
    <p>Developed by Sadman Ahmed Shanto</p>
    <p>&copy; Copyright Sadman Ahmed Shanto & Eli Levenson-Falk 2023.</p>
    <p>This code is licensed under the MIT License. You may<br> obtain a copy of this license in the LICENSE.txt file in the root directory<br> of this source tree.</p>
    <p>Any modifications or derivative works of this code must retain this<br>copyright notice, and modified files need to carry a notice indicating<br>that they have been altered from the originals.</p>
</div>