In [86]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [14]:
!pip install -e ../.

Obtaining file:///Users/shanto/LFL/SQuADDS/SQuADDS
  Preparing metadata (setup.py) ... done
Installing collected packages: SQuADDS
  Attempting uninstall: SQuADDS
    Found existing installation: SQuADDS 0.1
    Uninstalling SQuADDS-0.1:
      Successfully uninstalled SQuADDS-0.1
  Running setup.py develop for SQuADDS
Successfully installed SQuADDS-0.1


# Tutorial 2: Contributing to SQuADDS


**Table of Contents:**

0. [Contribution Information Setup](#setup)
1. [Understanding the terminology and database structure](#structure)
2. [Contributing to an existing database node](#contribution)
3. [Creating a new database node](#creation)
4. [Building on top of others' work](#missing)

---

## Contribution Information Setup <a name="setup"></a>

To contribute to SQuADDS, you need to provide some information about yourself. It is used to track your contributions and to credit you for your work. Set the following variables in the `.env` file in the root directory of the repository:

```
GROUP_NAME = ""
PI_NAME = ""
INSTITUTION = ""
USER_NAME = ""
```

Alternatively, you can provide this information by running the following cell:


In [17]:
from squadds.database import *

In [None]:
create_contributor_info()
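If you go the `.env` route instead, it can help to check that the four fields from the template above are actually filled in before contributing. The sketch below is a minimal stdlib parser written for this tutorial: `parse_env` and `missing_contributor_fields` are hypothetical helpers, not part of the SQuADDS API, and they only understand simple `KEY = "value"` lines like the template (a library such as `python-dotenv` handles the full `.env` syntax).

```python
# Field names taken from the .env template above
REQUIRED_FIELDS = ("GROUP_NAME", "PI_NAME", "INSTITUTION", "USER_NAME")


def parse_env(text):
    """Parse simple KEY = "value" lines into a dict (hypothetical helper, not full .env syntax)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and quote characters from the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_contributor_fields(values):
    """Return the required contributor fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not values.get(field)]


# Usage: read the repository's .env file and report unfilled fields
# with open(".env", encoding="utf-8") as f:
#     print(missing_contributor_fields(parse_env(f.read())))
```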

## Understanding the terminology and database structure <a name="structure"></a>

- HuggingFace
- Datasets
- Configurations
- Structure of SQuADDS_DB
- Adding to SQuADDS_DB

---

## Data Processing

Each data entry should be stored in JSON format and must contain **at least** the following fields. You can add as many supplementary fields as you want.

```json
{
    "design":{
        "design_options": design_options,
        "design_tool": design_tool_name
    },
    "sim_options":{
        "setup": sim_setup_options,
        "simulator": simulator_name
    },
    "sim_results":{
        "result1": sim_result1,
        "unit1": unit1,
        "result2": sim_result2,
        "unit2": unit2
    },
    "contributor":{
        "group": group_name,
        "PI": pi_name,
        "institution": institution,
        "uploader": user_name,
        "date_created": "YYYY-MM-DD-HHMMSS"
    }
}
```

If all entries in `sim_results` share the same unit, you can use a single `"units": units` field instead of repeating the unit for each result.
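A minimal sketch of how an entry could be checked against this template before uploading. `REQUIRED_KEYS`, `missing_fields`, and `timestamp_now` are hypothetical names chosen for this tutorial, not part of the SQuADDS API. The required keys are copied from the template above; `sim_results` is only checked for being non-empty and having some unit field, because its result names vary from simulation to simulation.

```python
from datetime import datetime

# Required sections and the keys each must contain, copied from the template above
REQUIRED_KEYS = {
    "design": ("design_options", "design_tool"),
    "sim_options": ("setup", "simulator"),
    "contributor": ("group", "PI", "institution", "uploader", "date_created"),
}


def timestamp_now():
    """Current local time in the YYYY-MM-DD-HHMMSS format used by `date_created`."""
    return datetime.now().strftime("%Y-%m-%d-%H%M%S")


def missing_fields(entry):
    """Return dotted paths of required fields absent from `entry` (empty list if complete)."""
    missing = [
        f"{section}.{key}"
        for section, keys in REQUIRED_KEYS.items()
        for key in keys
        if key not in entry.get(section, {})
    ]
    results = entry.get("sim_results", {})
    if not results:
        missing.append("sim_results")
    # Accept either one shared "units" field or per-result "unit1", "unit2", ...
    elif not any(key.startswith("unit") for key in results):
        missing.append("sim_results.units")
    return missing
```

An entry that returns an empty list from `missing_fields` has every field the template requires; any extra supplementary fields are ignored by the check.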

---


## Adding to an Existing Configuration <a name="contribution"></a>

1. **Clone/Fork the Repository**: If you have not already forked or cloned the repository, do so.

2. **Create or Checkout a Branch**: If adding new data, it might be best to do it on a new branch:

   ```sh
   git checkout -b add_to_configuration
   ```

3. **Modify the Configuration**: Add or modify the data files as necessary for the configuration. Make sure to follow any guidelines provided by the dataset maintainers for the specific structure and format required.

4. **Commit and Push Your Changes**: Commit the new data and push it to your fork:

   ```sh
   git add .
   git commit -m "Add new data to configuration Y"
   git push origin add_to_configuration
   ```

5. **Pull Request**: Create a pull request against the original repository.

## Contributing a New Configuration <a name="creation"></a>

1. **Fork the Dataset Repository**: On the Hugging Face Hub, fork the dataset repository you want to contribute to.

2. **Clone Your Fork Locally**: Clone the forked repository to your local machine using the following command:

   ```sh
   git clone https://huggingface.co/datasets/YOUR_USERNAME/DATASET_NAME
   ```

3. **Create a New Branch**: It's a good practice to create a new branch for your configuration contribution:

   ```sh
   git checkout -b new_configuration
   ```

4. **Add Your Configuration**: Depending on the dataset's structure, this might involve adding new files or modifying existing ones. If the dataset uses the `datasets` library's builder configurations, you will need to modify the Python script that defines the configurations.

5. **Commit Your Changes**: Commit the changes with a clear commit message:

   ```sh
   git add .
   git commit -m "Add new configuration for circuit element X"
   ```

6. **Push to Your Fork**: Push your new branch to your fork on the Hugging Face Hub:

   ```sh
   git push origin new_configuration
   ```

7. **Create a Pull Request**: Go to the Hugging Face Hub, navigate to your fork, and create a pull request for your new branch. The pull request will be reviewed by the dataset maintainers.
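If the dataset uses a loading script, each builder configuration reads its data in a `_generate_examples` method that yields `(key, example)` pairs, and `datasets` requires those keys to be unique within a split. The function below is a stdlib-only sketch of that method's body, written as a free function so it runs on its own. `generate_examples` is a hypothetical name, and the sketch assumes each data file holds either a JSON list of entries (as written by the `combine_json_files` cell later in this tutorial) or a single entry. In an actual script it would be a method on a `datasets.GeneratorBasedBuilder` subclass, receiving the file paths from `_split_generators`.

```python
import json


def generate_examples(filepaths):
    """Yield (key, example) pairs the way a datasets `_generate_examples` method would.

    Assumes each file holds either a JSON list of entries or a single entry.
    """
    key = 0
    for filepath in filepaths:
        with open(filepath, "r", encoding="utf-8") as f:
            data = json.load(f)
        # Normalize a single-entry file to a one-element list
        entries = data if isinstance(data, list) else [data]
        for entry in entries:
            # A running counter keeps keys unique across all files in the split
            yield key, entry
            key += 1
```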

Example Qiskit Metal rendering code to get `design`

Example simulation code to get `sim_results`

Use the SQuADDS API to upload the dataset; it adds the `contributor` information automatically.

Generate the `load_dataset` config file

Upload to `SQuADDS_DB`
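The data directories used in this tutorial follow a `component/component_name/data_type` layout (for example `qubit/TransmonCross/cap_matrix`), and the configuration names reported by `datasets-cli` below join the same three parts with hyphens (`qubit-TransmonCross-cap_matrix`). A hypothetical helper along these lines could derive the configuration names, and the JSON files belonging to each, when generating the `load_dataset` config. The naming rule is inferred from those examples, not taken from the SQuADDS code, and `config_name` / `discover_configs` are illustrative names.

```python
import glob
import os


def config_name(component, component_name, data_type):
    """Join the three directory levels, e.g. ('qubit', 'TransmonCross', 'cap_matrix') -> 'qubit-TransmonCross-cap_matrix'."""
    return f"{component}-{component_name}-{data_type}"


def discover_configs(data_root):
    """Map each configuration name to the sorted JSON files in data_root/<component>/<component_name>/<data_type>/."""
    configs = {}
    for directory in sorted(glob.glob(os.path.join(data_root, "*", "*", "*"))):
        # Files that happen to sit three levels deep are not configurations
        if not os.path.isdir(directory):
            continue
        component, component_name, data_type = os.path.relpath(directory, data_root).split(os.sep)
        files = sorted(glob.glob(os.path.join(directory, "*.json")))
        # Directories without JSON data do not produce a configuration
        if files:
            configs[config_name(component, component_name, data_type)] = files
    return configs
```

For example, `discover_configs("../data")` would return one entry per data directory that contains JSON files.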

In [79]:
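import glob
import json
import os


def combine_json_files(source_directory, output_file):
    """Merge every JSON file in `source_directory` into a single JSON list in `output_file`."""
    all_data = []

    # List all JSON files in the directory (sorted so the output order is deterministic)
    file_paths = sorted(glob.glob(os.path.join(source_directory, "*.json")))

    for file_path in file_paths:
        with open(file_path, 'r', encoding='utf-8') as file:
            # Each file's contents become one element of the combined list
            all_data.append(json.load(file))

    # Write combined data to a single file
    with open(output_file, 'w', encoding='utf-8') as outfile:
        json.dump(all_data, outfile, indent=4)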



In [80]:
source_directory = "../data/qubit/TransmonCross/cap_matrix"  # Replace with your source directory
output_file = "../data/combined_qubit_data.json"  # Replace with your desired output file path
combine_json_files(source_directory, output_file)

In [81]:
source_directory = "../data/coupler/NCap/cap_matrix"  # Replace with your source directory
output_file = "../data/combined_coupler_data.json"  # Replace with your desired output file path
combine_json_files(source_directory, output_file)

In [82]:
source_directory = "../data/cavity_claw/RouteMeander/eigenmode/"  # Replace with your source directory
output_file = "../data/combined_cavity-claw_data.json"  # Replace with your desired output file path
combine_json_files(source_directory, output_file)

In [83]:
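import glob
import hashlib
import json
import os

# Rename each JSON file in ../data to "LFL_USC_{hash}.json", where the hash is an MD5 digest of the file contents
for file in glob.glob("../data/*.json"):
    with open(file, 'r', encoding='utf-8') as f:
        data = json.load(f)
    # sort_keys=True makes the digest independent of the key order in the file
    digest = hashlib.md5(json.dumps(data, sort_keys=True).encode('utf-8')).hexdigest()
    new_file = os.path.join(os.path.dirname(file), f"LFL_USC_{digest}.json")
    # Rename only after the file has been closed (required on Windows)
    os.rename(file, new_file)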
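The renaming step relies on a content hash. The sketch below isolates that step to show why the data is serialized with `sort_keys=True`: two files holding the same entries with dictionary keys in a different order get the same name, while any change to a value changes the digest. List order still matters, so a combined file whose entries were written in a different order gets a different name. `content_digest` is a hypothetical helper, and MD5 serves here only as a file fingerprint, not for security.

```python
import hashlib
import json


def content_digest(data):
    """MD5 hex digest of a JSON-serializable object, independent of dict key order."""
    canonical = json.dumps(data, sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```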

In [99]:
!datasets-cli test ../data/SQuADDS_DB.py --save_info --all_configs

Loading Dataset Infos from /Users/shanto/.cache/huggingface/modules/datasets_modules/datasets/SQuADDS_DB/6c042e99be0aa47463aa5d9ae2ceeb87b02331a7e2aa159865c27f7b19e9316e
Testing builder 'qubit-TransmonCross-cap_matrix' (1/3)
Generating dataset s_qu_adds_db (/Users/shanto/.cache/huggingface/datasets/s_qu_adds_db/qubit-TransmonCross-cap_matrix/1.0.0/6c042e99be0aa47463aa5d9ae2ceeb87b02331a7e2aa159865c27f7b19e9316e)
Downloading and preparing dataset s_qu_adds_db/qubit-TransmonCross-cap_matrix to /Users/shanto/.cache/huggingface/datasets/s_qu_adds_db/qubit-TransmonCross-cap_matrix/1.0.0/6c042e99be0aa47463aa5d9ae2ceeb87b02331a7e2aa159865c27f7b19e9316e...
Downloading data files: 100%|██████████████████| 3/3 [00:00<00:00, 12458.33it/s]
Downloading took 0.0 min
Checksum Computation took 0.0 min
Extracting data files: 100%|█████████████████████| 3/3 [00:00<00:00, 853.43it/s]
Generating train split
Traceback (most recent call last):
  File "/Users/shanto/miniconda3/envs/qiskit_metal/bin/datasets-
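The traceback above is cut off, so its cause is not recorded here. One thing worth ruling out before re-running the test is inconsistent records: `datasets` uses a single schema per configuration, so entries in one combined file that disagree in their nested keys can fail during split generation. A quick stdlib check could look like the sketch below; `schema_signature` and `inconsistent_entries` are hypothetical helpers written for this tutorial, and lists inside entries are treated as leaves.

```python
def schema_signature(obj, prefix=""):
    """Return the set of dotted key paths in a nested dict, e.g. {'design', 'design.design_tool'}.

    Lists and scalar values are treated as leaves.
    """
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= schema_signature(value, path)
    return paths


def inconsistent_entries(entries):
    """Return indices of entries whose key paths differ from those of the first entry."""
    if not entries:
        return []
    reference = schema_signature(entries[0])
    return [i for i, entry in enumerate(entries) if schema_signature(entry) != reference]
```

Passing the list loaded from a combined JSON file to `inconsistent_entries` returns the positions of entries worth inspecting.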

In [96]:
import json
import os
import random

def create_train_val_test_splits(source_files, output_directory, train_ratio=0.7, val_ratio=0.15, seed=None):
    """
    Splits the data from source JSON files into train, validation, and test sets.

    The test set receives the remaining entries (about 1 - train_ratio - val_ratio).
    Pass an integer `seed` to make the shuffle, and therefore the splits, reproducible.
    """
    os.makedirs(output_directory, exist_ok=True)

    all_data = []

    # Load data from all source files
    for file_path in source_files:
        with open(file_path, 'r', encoding='utf-8') as file:
            data = json.load(file)
        # A file may hold a list of entries or a single entry
        if isinstance(data, list):
            all_data.extend(data)
        else:
            all_data.append(data)

    # Shuffle the data (seed=None seeds the generator from system randomness)
    random.Random(seed).shuffle(all_data)

    # Split the data; int() truncates, so any rounding remainder goes to the test set
    total_data = len(all_data)
    train_end = int(total_data * train_ratio)
    val_end = train_end + int(total_data * val_ratio)

    train_data = all_data[:train_end]
    val_data = all_data[train_end:val_end]
    test_data = all_data[val_end:]

    # Save the splits
    splits = {'train': train_data, 'validation': val_data, 'test': test_data}
    for split_name, split_data in splits.items():
        with open(os.path.join(output_directory, f'{split_name}.json'), 'w', encoding='utf-8') as f:
            json.dump(split_data, f, indent=4)



In [97]:
# Replace with the paths to your combined JSON files
source_files = [
    "/Users/shanto/LFL/SQuADDS/SQuADDS/data/cavity_claw/LFL_USC_cavity_claw_3e95ba4a2e4da2141f9edaa9f9fa1653.json",
    "/Users/shanto/LFL/SQuADDS/SQuADDS/data/coupler/LFL_USC_coupler_e7855e5c7467f76edb09779d8f3a1a0c.json",
    "/Users/shanto/LFL/SQuADDS/SQuADDS/data/qubit/LFL_USC_qubit_e68f323df894ba4b2891bd64742a2c35.json",
]
output_directory = '../data/'

create_train_val_test_splits(source_files, output_directory)
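After the cell above runs, a quick sanity check is to reload the three split files and confirm that together they hold every entry exactly once. The `check_splits` helper below is a hypothetical sketch, not part of SQuADDS; it compares entries by their canonical JSON text, so it assumes the source data contains no genuinely duplicated entries.

```python
import json
import os


def check_splits(output_directory, expected_total):
    """Reload train/validation/test.json, check the total count and that no entry repeats."""
    splits = {}
    for name in ("train", "validation", "test"):
        with open(os.path.join(output_directory, f"{name}.json"), "r", encoding="utf-8") as f:
            splits[name] = json.load(f)

    sizes = {name: len(entries) for name, entries in splits.items()}
    if sum(sizes.values()) != expected_total:
        raise ValueError(f"splits hold {sum(sizes.values())} entries, expected {expected_total}")

    # Canonical JSON text makes dict entries hashable and comparable across splits
    seen = set()
    for name, entries in splits.items():
        for entry in entries:
            canonical = json.dumps(entry, sort_keys=True)
            if canonical in seen:
                raise ValueError(f"an entry appears more than once (found again in {name})")
            seen.add(canonical)
    return sizes
```

For example, `check_splits('../data/', expected_total)`, where `expected_total` is the number of entries across `source_files`, returns the size of each split.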