# Table of Contents:
- [Overview on Content Addressable aRchives (CAR / .car)](#overview-on-content-addressable-archives-car--car)
- [Going from Files to CARs](#going-from-files-to-cars)
- [Creating Interactive CLI Widgets](#creating-interactive-cli-widgets)


## **Overview on Content Addressable aRchives (CAR / .car)**

**_References_**:
- [CAR v1 Spec](https://ipld.io/specs/transport/car/carv1/)
- [CAR v2 Spec](https://ipld.io/specs/transport/car/carv2/)

The CAR format (Content Addressable aRchives) can be used to store content addressable objects in the form of IPLD block data as a sequence of bytes; typically in a file with a .car filename extension.
> NOTE: The name Certified ARchive has also previously been used to refer to the CAR format.
    
The CAR format is intended as a serialized representation of any IPLD DAG (graph) as the concatenation of its blocks, plus a header that describes the graphs in the file (via root CIDs). The requirement for the blocks in a CAR to form coherent DAGs is not strict, so the CAR format may also be used to store arbitrary IPLD blocks.

In addition to the binary block data, storage overhead for the CAR format consists of:
- A header block encoded as [DAG-CBOR](https://github.com/ipld/specs/blob/a3c982518232b79123af2a2cf5e8642162c62524/block-layer/codecs/dag-cbor.md) containing the format version and an array of root CIDs
- A CID for each block preceding its binary data
- A compressed integer prefixing each block (including the header block) indicating the total length of that block, including the length of the encoded CID

This diagram shows how IPLD blocks, their root CID, and a header combine to form a CAR.


<center><img src="https://ipld.io/specs/transport/car/content-addressable-archives.png" alt="CAR Format" width="50%"></center>

#### **Format Description**
The CAR format comprises a sequence of length-prefixed IPLD block data, where the first block in the CAR is the Header encoded as CBOR, and the remaining blocks form the Data component of the CAR and are each additionally prefixed with their CIDs. The length prefix of each block in a CAR is encoded as a "varint"—an unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128) integer. This integer specifies the number of remaining bytes for that block entry—excluding the bytes used to encode the integer, but including the CID for non-header blocks.

<div style="text-align: center">
<pre>|--------- Header --------| |---------------------------------- Data -----------------------------------|</pre>

<pre>[ varint | DAG-CBOR block ] [ varint | CID | block ] [ varint | CID | block ] [ varint | CID | block ] …</pre>
</div>
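
As a rough illustration of the length prefix described above, the following sketch encodes and decodes unsigned LEB128 varints and walks the length-prefixed sections of a CARv1 byte string. The helper names (`encode_uvarint`, `decode_uvarint`, `iter_car_sections`) are hypothetical and not part of any CAR library, and the sample payloads are made up. The sketch only splits the sections; it does not decode the DAG-CBOR header or the CIDs.

```python
from typing import Iterator, Tuple


def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint at `offset`; return (value, offset just past the varint)."""
    result = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_car_sections(data: bytes) -> Iterator[bytes]:
    """Yield the payload of each length-prefixed section of a CARv1 byte string.

    The first payload is the header block; each later payload is `CID | block`.
    """
    pos = 0
    while pos < len(data):
        length, pos = decode_uvarint(data, pos)
        end = pos + length
        if end > len(data):
            raise ValueError("section extends past the end of the data")
        yield data[pos:end]
        pos = end


# Two made-up payloads standing in for a header and one CID-prefixed block
sample = encode_uvarint(3) + b"hdr" + encode_uvarint(5) + b"block"
sections = list(iter_car_sections(sample))
print(sections)
```

Note that the length excludes the varint's own bytes, which is why the reader advances past the varint before slicing the payload.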

### **Updates from CARv1 to CARv2**
CARv2 is a minimal upgrade to the CARv1 format with the primary aim of adding an optional index within the format for fast random-access to blocks.

CARv2 makes use of CARv1 by wrapping a properly formed CARv1 with a prefix containing a pragma and header, and a suffix containing the optional index data. Once the offset and length of the CARv1 bytes are determined using CARv2 parsing rules, an existing CARv1 decoder could be used to read the roots and CID:Bytes pairs, though this is not necessarily ideal. Likewise, a CARv1 encoder could be used to encode this data for wrapping by a CARv2 encoder, as the payload is the same format.

#### **Format Description**

1. An 11-byte pragma that identifies the data as CARv2 format.
2. A header describing some characteristics of the CARv2 as well as the locations of the data payload and index payload within the CARv2.
3. A standard CARv1 data payload, including standard CARv1 header and roots and sequence of CID:Bytes pairs.
4. An optional index payload, which may be one of a number of supported index formats, allowing for fast lookups of blocks within the data payload.

The CARv2 format can be illustrated as follows:

<center><img src="https://ipld.io/specs/transport/car/carv2/carv2-sections.png" alt="CARv2 Format"></center>


<div style="text-align: center">
<pre>| 11-byte fixed pragma | 40-byte header | optional padding | CARv1 data payload | optional padding | optional index payload |</pre>
</div>
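
To make the fixed-size prefix concrete, here is a minimal sketch that checks the 11-byte pragma and unpacks the 40-byte header. The pragma bytes and the field layout used here (a 16-byte characteristics bitfield followed by three little-endian unsigned 64-bit integers for the data offset, data size, and index offset) follow the CARv2 spec linked above as best I understand it, so treat them as assumptions to verify against the spec. `CarV2Header` and `parse_carv2_prefix` are hypothetical names, and the sample bytes are synthetic.

```python
import struct
from typing import NamedTuple

# Assumed pragma: varint length 0x0a, then the DAG-CBOR map {"version": 2}
CARV2_PRAGMA = bytes.fromhex("0aa16776657273696f6e02")
PRAGMA_SIZE = 11
HEADER_SIZE = 40


class CarV2Header(NamedTuple):
    characteristics: bytes  # 16-byte bitfield
    data_offset: int        # where the CARv1 payload starts
    data_size: int          # length of the CARv1 payload in bytes
    index_offset: int       # where the index starts (0 when there is no index)


def parse_carv2_prefix(buf: bytes) -> CarV2Header:
    """Validate the pragma and decode the header that follows it."""
    if len(buf) < PRAGMA_SIZE + HEADER_SIZE:
        raise ValueError("not enough bytes for a CARv2 pragma and header")
    if buf[:PRAGMA_SIZE] != CARV2_PRAGMA:
        raise ValueError("pragma mismatch: this does not look like a CARv2")
    characteristics = buf[PRAGMA_SIZE:PRAGMA_SIZE + 16]
    data_offset, data_size, index_offset = struct.unpack_from(
        "<QQQ", buf, PRAGMA_SIZE + 16
    )
    return CarV2Header(characteristics, data_offset, data_size, index_offset)


# Synthetic prefix: the payload starts right after the 51-byte prefix
sample = CARV2_PRAGMA + bytes(16) + struct.pack("<QQQ", 51, 1000, 1051)
header = parse_carv2_prefix(sample)
print(header)
```

With the data offset and size in hand, the payload slice can be passed to a CARv1 reader, which matches the wrapping relationship described above.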

### Why the need to create CARs?
Storing content on the Filecoin network differs from the typical storage systems that consumers use (such as Dropbox, AWS, or OneDrive), which store objects. Content in Filecoin is stored as flat files, each known as a [Filecoin Piece](https://spec.filecoin.io/#section-systems.filecoin_files.piece). The "Piece" in Filecoin Piece represents a whole file or part of a file that is distilled into an IPLD directed acyclic graph (DAG) and identified by a hash called a CID or Payload CID. To turn the "Piece" into a Filecoin Piece, the IPLD DAG is serialized into a “Content-Addressable aRchive” (.car), which is a raw-bytes format.


# Going from Files to CARs

[Singularity](https://data-programs.gitbook.io/singularity/overview/readme) is a tool to simplify the process of preparing data for content distribution. We'll be using the [Ez-Prep](https://data-programs.gitbook.io/singularity/cli-reference/ez-prep) command to package content from a selected directory on our local machine. Below is an overview of the command and the options we can pass in.


```bash
NAME:
   singularity ez-prep - Prepare a dataset from a local path

USAGE:
   singularity ez-prep [command options] <path>

CATEGORY:
   Utility

DESCRIPTION:
   This commands can be used to prepare a dataset from a local path with minimum configurable parameters.
   For more advanced usage, please use the subcommands under `storage` and `data-prep`.
   You can also use this command for benchmarking with in-memory database and inline preparation, i.e.
     mkdir dataset
     truncate -s 1024G dataset/1T.bin
     singularity ez-prep --output-dir '' --database-file '' -j $(($(nproc) / 4 + 1)) ./dataset

OPTIONS:
   --max-size value, -M value       Maximum size of the CAR files to be created (default: "31.5GiB")
   --output-dir value, -o value     Output directory for CAR files. To use inline preparation, use an empty string (default: "./cars")
   --concurrency value, -j value    Concurrency for packing (default: 1)
   --database-file value, -f value  The database file to store the metadata. To use in memory database, use an empty string. (default: ./ezprep-<name>.db)
   --help, -h                       show help
```

To get started, **Select a Directory** below and optionally modify any of the other options. When ready, click the **Create CAR Files** button, which packages all the content from the selected directory and saves it to an output directory under [car_files](./../../data/car_files/).
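
For readers who prefer to script this step rather than use the widgets, a minimal sketch of assembling the `ez-prep` invocation from the options listed above might look like this. `build_ez_prep_command` is a hypothetical helper and the paths are placeholders. The final `subprocess.run` call is left commented out because it assumes `singularity` is installed and on the `PATH`.

```python
from pathlib import Path
from typing import List, Union


def build_ez_prep_command(
    source_dir: Union[str, Path],
    output_dir: Union[str, Path] = "./cars",
    max_size: str = "31.5GiB",
    concurrency: int = 1,
) -> List[str]:
    """Return the argument list for `singularity ez-prep`.

    A list (rather than one shell string) keeps paths containing spaces intact.
    """
    return [
        "singularity",
        "ez-prep",
        "--output-dir", str(output_dir),
        "--max-size", max_size,
        "--concurrency", str(concurrency),
        str(source_dir),
    ]


command = build_ez_prep_command(Path("dataset"), output_dir=Path("cars"), max_size="1GiB")
print(command)

# To actually run it (requires singularity to be installed):
# import subprocess
# result = subprocess.run(command, capture_output=True, text=True)
# print(result.stdout)
```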

In [None]:
from typing import Optional
import ipywidgets as widgets
from IPython.display import clear_output, display
from pathlib import Path
import shutil
import subprocess

# Initialize global variables
DEFAULT_ROOT_PATH = Path("../../data").resolve()  # replace with your root path
ROOT_PATH = DEFAULT_ROOT_PATH
CAR_PAYLOAD = {}
SINGULARITY_RESULT = []


# Initialize widget properties

# Widget to enter output directory name
output_directory_name_input = widgets.Text(
    value="output",
    description="Output Directory Name: ",
    style={"description_width": "initial"},
)

# Widget to enter chunk size
chunk_size_input = widgets.Text(
    value="32",
    description="Enter Chunk Size: ",
    style={"description_width": "initial"},
)

# Widget to select the chunk size unit (appended to the chunk size as a suffix)
chunk_suffix_dropdown = widgets.Dropdown(
    options=["mb", "gb"],
    value="mb",
    description="Chunk Prefix: ",
    style={"description_width": "initial"},
)

# Create the button that runs singularity command
run_singularity_button = widgets.Button(
    description="Create CAR Files",
    style=widgets.ButtonStyle(button_color="cyan", font_weight="bold"),
)


# Function to get directories
def get_directories(path):
    path = Path(path)
    directories = [d.name for d in path.iterdir() if d.is_dir()]
    if path.parent != path:  # don't add ".." if we're at the filesystem root
        directories.insert(0, "..")
    # Always offer the placeholder as the first entry
    directories.insert(0, "Select a Directory...")
    return directories


# Function to create dropdown
def create_dropdown(directories):
    dropdown = widgets.Dropdown(options=directories)
    dropdown.observe(on_dropdown_value_change, names="value")
    return dropdown


# Set up event handler
def on_dropdown_value_change(change):
    global ROOT_PATH
    new_value = change["new"]
    # Ignore empty changes and the placeholder entry (it is not a directory)
    if new_value is None or new_value == "Select a Directory...":
        return
    if new_value == "..":
        # only navigate up if we're not at the default root directory
        if ROOT_PATH != DEFAULT_ROOT_PATH:
            ROOT_PATH = ROOT_PATH.parent
    else:
        ROOT_PATH = Path(ROOT_PATH, new_value)
    new_dropdown = create_dropdown(get_directories(ROOT_PATH))
    dropdown_selection.append(new_dropdown)
    current_dir_label.value = f"Current directory: {ROOT_PATH}"
    display_widgets(current_dir_label, dropdown_selection)


def print_command_help(tool: str, command: str):
    command = f"{tool} {command} --help"
    result = subprocess.run(command, shell=True, capture_output=True)
    print(result.stdout.decode())


def display_widgets(current_dir_label, dropdown_selection):
    clear_output()
    display(widgets.HBox([dropdown_selection[-1], current_dir_label]))
    display(widgets.HBox([output_directory_name_input]))
    display(widgets.HBox([chunk_size_input, chunk_suffix_dropdown]))
    # Link the button click event to your function
    run_singularity_button.on_click(on_button_clicked)
    display(run_singularity_button)


def output_directory_path(dir_name: Optional[str] = None, root: bool = False):
    if dir_name is None:
        # No name given (e.g. root=True): return the car_files root itself
        return Path(DEFAULT_ROOT_PATH, "car_files").resolve()
    return Path(DEFAULT_ROOT_PATH, "car_files", dir_name).resolve()


def output_directory_path_export(dir_name: str):
    return Path(DEFAULT_ROOT_PATH, "output", "car_files", dir_name).resolve()


def check_output_directory(output_path: Path):
    # Start from an empty directory: delete the folder and all of its contents
    if output_path.exists():
        shutil.rmtree(output_path)
    # parents=True also creates `car_files` the first time it is needed
    output_path.mkdir(parents=True)


# Define the function to be called when the button is clicked
def on_button_clicked(b):
    # Capture widget values
    output_dir_name = output_directory_name_input.value
    chunk_size = chunk_size_input.value
    chunk_suffix = chunk_suffix_dropdown.value

    # Construct the output directory path
    output_dir = output_directory_path(output_dir_name)

    # Check and prepare the output directory
    check_output_directory(output_dir)

    # Construct the max size string
    max_size = chunk_size + chunk_suffix
    print(
        f"Max Chunk Size: {max_size} | Source Path: {ROOT_PATH} | Output Path: {output_dir}"
    )
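    # Pass the arguments as a list so paths containing spaces are not split by a shell
    command = [
        "singularity", "ez-prep",
        "--output-dir", str(output_dir),
        "--max-size", max_size,
        str(ROOT_PATH),
    ]
    singularity_result = subprocess.run(
        command, capture_output=True, text=True, encoding="utf-8"
    )
    print(singularity_result.stdout)
    if singularity_result.returncode != 0:
        print(f"Error: {singularity_result.stderr}")
    global SINGULARITY_RESULT
    SINGULARITY_RESULT = singularity_result_to_dict(singularity_result.stdout)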


def singularity_result_to_dict(output):
    lines = output.strip().split("\n")
    headers = lines[0].split()
    data = []

    for line in lines[1:]:
        values = line.split()
        data.append(dict(zip(headers, values)))
    return data


# Create initial dropdown
root_directories = get_directories(ROOT_PATH)
dropdown_selection = [create_dropdown(root_directories)]

# Create label to display current directory
current_dir_label = widgets.Label()
# Display the widgets side by side
display_widgets(current_dir_label, dropdown_selection)


Max Chunk Size: 32gb | Source Path: C:\Github\Client_Projects\UMD\easier\data\gedi\l4b | Output Path: C:\Github\Client_Projects\UMD\easier\data\car_files\output3
PieceCID                                                          PieceSize    RootCID                                                      FileSize    StoragePath                                                           
baga6ea4seaqgpi5vhimnxuepyo4ptttcilysyp2mavkgjknumry46zf66bozqaa  34359738368  bafkreicjxhag23doy3yssap7fox4moyemkwdysxpoeen4msnb4pu3msktq  2502949831  baga6ea4seaqgpi5vhimnxuepyo4ptttcilysyp2mavkgjknumry46zf66bozqaa.car  
baga6ea4seaqahj7s2z7vehonvy2euguk4e254yfywskzxmbp4vbz4vxt4czqcha  34359738368  bafybeif57b5qfwy7ucx4pmycol3bkpu3g3zg723rtn5iju5apkjivo7hvq  1617        baga6ea4seaqahj7s2z7vehonvy2euguk4e254yfywskzxmbp4vbz4vxt4czqcha.car  



# Creating Interactive CLI Widgets

When I'm in the exploratory stage of reviewing data and its content, I personally don't like the "process" slowing me down. I want to be able to jump back and forth between different tool features or tool sets to get a "lay of the land". In the following section, we'll explore how to make a CLI tool interactive using [ipywidgets](https://ipywidgets.readthedocs.io/en/latest/index.html)!

### Working with the go-car CLI Tool

Let's start by gathering all the commands and the properties of each command. We'll do this by running the `car` tool with the `--help` flag, which gives us a list of all the commands, and then repeating the same action for each command. Since we'll be using [subprocess](https://docs.python.org/3/library/subprocess.html) to run the CLI commands, we can capture all the details from `result.stdout` into a dictionary.



In [9]:
import subprocess
import re


def get_cli_help(cli_tool, command_help=False):
    result = subprocess.run([cli_tool, "--help"], capture_output=True, text=True)
    lines = result.stdout.split("\n")

    commands = {}

    for line in lines:
        command_match = re.match(r"^\s+([\w-]+),?.*\s+(.*)$", line)

        if command_match:
            command = command_match.group(1)
            # Assumes the name column (with any aliases) and the description
            # are separated by two or more spaces in the help output
            columns = re.split(r"\s{2,}", line.strip(), maxsplit=1)
            description = columns[1] if len(columns) > 1 else ""
            commands[command] = description

    # remove the tool name, 'help', and '--help' keys from the dictionary
    for key in (cli_tool, "help", "--help"):
        commands.pop(key, None)

    if command_help:
        for command, description in commands.items():
            commands[command] = {"Description": description}
            commands[command].update(get_command_help(cli_tool, command))

    return commands


def get_command_help(cli_tool, command):
    result = subprocess.run(
        [cli_tool, command, "--help"], capture_output=True, text=True
    )
    output = result.stdout

    command_info = {}

    # Extract NAME
    name_match = re.search(r"NAME:\s+.* - (.*)", output)
    if name_match:
        command_info["Description"] = name_match.group(1)

    # Extract USAGE
    usage_match = re.search(r"USAGE:\s+(.*)", output)
    if usage_match:
        command_info["USAGE"] = usage_match.group(1)

    # return command_info
    commands_match = re.search(r"COMMANDS:\s+(.*)", output)
    if commands_match:
        command_info["COMMANDS"] = commands_match.group(1)

    # Extract OPTIONS
    options_match = re.findall(r"--([\w-]+).*?(?=\n)", output)
    descriptions_match = re.findall(r"\n\s{2,}(.*?)(?=\n\s{2,}--|$)", output)
    if options_match and descriptions_match:
        command_info["OPTIONS"] = {}
        for option, desc in zip(options_match, descriptions_match):
            if option == "help":
                continue
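            # NOTE: despite its name, "is_required" records whether the option
            # takes a value (its help text contains " value"), not whether the
            # option is mandatory
            is_required = " value" in desc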
            command_info["OPTIONS"][option] = {
                "description": clean_option_description(desc.strip()),
                "is_required": is_required,
            }

    return command_info


def clean_option_description(description):
    split_value = description.split("  ")
    if len(split_value) > 1:
        description = split_value[-1].strip()
    return description


def format_as_df(dictionary_payload):
    import pandas as pd

    pd.set_option("display.max_colwidth", None)
    pd.set_option("display.max_columns", None)

    # Build a dataframe with one row per command; the columns come from the
    # inner dictionary of each command.
    df = pd.DataFrame.from_dict(dictionary_payload, orient="index")
    df = df.reset_index()

    # Flatten each OPTIONS dictionary into a single " | "-separated string of
    # "--option: description" pairs (rows without an OPTIONS dictionary get "").
    df["OPTIONS"] = df["OPTIONS"].apply(
        lambda x: " | ".join(f"--{k}: {v['description']}" for k, v in x.items())
        if isinstance(x, dict)
        else ""
    )

    # Replace NaN with an empty string in the COMMANDS column
    df["COMMANDS"] = df["COMMANDS"].fillna("")
    return df


cli_commands = get_cli_help("car", command_help=True)
format_as_df(cli_commands)


|   | index | Description | USAGE | OPTIONS | COMMANDS |
|---|---|---|---|---|---|
| 0 | compile | compile a car file from a debug patch | car compile [command options] [arguments...] | --output: The file to write to | |
| 1 | create | Create a car file | car create [command options] [arguments...] | --file: The car file to write to \| --no-wrap: Do not wrap the files in a directory (default: false) \| --version: Write output as a v1 or v2 format car (default: 2) | |
| 2 | debug | debug a car file | car debug [command options] [arguments...] | --output: The file to write to | |
| 3 | detach-index | Detach an index to a detached file | car detach-index command [command options] [arguments...] | | list List a detached index |
| 4 | extract | Extract the contents of a car when the car encodes UnixFS data | car extract [command options] [output directory\|-] | --file: The car file to extract from, or stdin if omitted \| --path: The unixfs path to extract \| --verbose: Include verbose information about extracted contents (default: false) | |
| 5 | filter | Filter the CIDs in a car | car filter [command options] [arguments...] | --cid-file: A file to read CIDs from \| --append: Append cids to an existing output file (default: false) \| --inverse: Inverse the filter (this will remove cids from the car file) (default: false) \| --version: Write output as a v1 or v2 format car (default: 2) | |
| 6 | get-block | Get a block out of a car | car get-block [command options] [arguments...] | | |
| 7 | get-dag | Get a dag out of a car | car get-dag [command options] [arguments...] | --selector: A selector over the dag \| --strict: Fail if the selector finds links to blocks not in the original car (default: false) \| --version: Write output as a v1 or v2 format car (default: 2) | |
| 8 | index | write out the car with an index | car index command [command options] [arguments...] | --codec: The type of index to write (default: "car-multihash-index-sorted") \| --version: Write output as a v1 or v2 format car (default: 2) | create Write out a detached index |
| 9 | inspect | verifies a car and prints a basic report about its contents | car inspect [command options] [arguments...] | --full: Check that the block data hash digests match the CIDs (default: false) | |


### Making the CLI Tool Interactive!

Now that we've captured all the necessary ingredients, let's build out the widget UI.

In [None]:
import subprocess
from pathlib import Path

import ipywidgets as widgets
from IPython.display import clear_output, display


# Global variable to hold the selected command values
CLI_TOOL = "car"
COMMAND_INFO = cli_commands.copy()
SELECTED_COMMAND = None
COMMAND_PROPERTY_VALUES = {}

### .............................................................................
# Initialize widget properties
### .............................................................................

# Create a dropdown widget for the commands
command_dropdown = widgets.Dropdown(
    options=["Select a Command..."],
    description="Command:",
    style={"description_width": "initial"},
)
# Label placeholder for the command description
command_label = widgets.Label(value="")

# Create a VBox widget to hold widgets for each command option
container = widgets.VBox()

### Widgets for go-car arguments
gocar_tool_dirs = widgets.Dropdown(
    options=["Select a directory..."],
    description="Select a directory: ",
    style={"description_width": "initial"},
)
# Create the second dropdown for selecting a CAR file found in `gocar_tool_dirs` widget.
gocar_tool_argument_dropdown = widgets.Dropdown(
    options=["Select a CAR file..."],
    description="Select a CAR file:",
    style={"description_width": "initial"},
    layout=widgets.Layout(width="auto"),
)

# Create the button that runs CLI tool command
run_command_button = widgets.Button(
    description="Run Command",
    style=widgets.ButtonStyle(button_color="cyan", font_weight="bold"),
)

### .............................................................................
# Widget Methods
### .............................................................................


def display_command_widgets():
    """
    Displays the command widgets for the CAR-Packing_PrepData notebook.

    This function clears the output, creates and displays the command dropdown,
    creates and displays the CLI tool argument dropdowns, and links the button
    click event to the on_cmd_button_clicked function.

    Parameters:
    None

    Returns:
    None
    """
    clear_output()
    command_dropdown = create_new_cmd_dropdown()
    gocar_tool_dirs = create_new_cmd_argument_dropdown()
    display(command_dropdown, command_label, container)
    # Display the CLI tool argument widgets
    display(gocar_tool_dirs, gocar_tool_argument_dropdown)
    # Link the button click event to your function
    display(run_command_button)
    run_command_button.on_click(on_cmd_button_clicked)


def create_new_cmd_dropdown():
    drop_down_vals = list(COMMAND_INFO.keys())
    drop_down_vals.insert(0, "Select a Command...")

    # Create a dropdown widget for the commands
    command_dropdown.options = drop_down_vals
    command_dropdown.observe(on_command_change)
    return command_dropdown


def on_command_change(change):
    """
    Function to handle the change event of the command dropdown.

    Parameters:
    - change (dict): The change event object containing information about the change.

    Returns:
    None
    """
    if change["type"] == "change" and change["name"] == "value":
        global COMMAND_PROPERTY_VALUES
        COMMAND_PROPERTY_VALUES = {}
        # Clear previous widgets and set the command label back to empty
        container.children = []
        command_label.value = ""

        # Get the selected command
        command_dropdown.value = change["new"]
        command = change["new"]

        # Check if the selected command is in the command_info dictionary
        if command in COMMAND_INFO:
            # Create widgets for each command option
            widgets_list = []
            for option, info in COMMAND_INFO[command].get("OPTIONS", {}).items():
                # Create a label for the option
                label = widgets.Label(value=f"  {info['description']}")

                if info["is_required"]:
                    # Create a text box for required options
                    textbox = widgets.Text(description=option)
                    # Add the text box to the widgets dictionary
                    COMMAND_PROPERTY_VALUES[option] = textbox
                else:
                    # Create a checkbox for non-required options
                    checkbox = widgets.Checkbox(description=option)
                    # Add the checkbox to the widgets dictionary
                    COMMAND_PROPERTY_VALUES[option] = checkbox

                # Create a HBox to hold the label and widget
                hbox = widgets.HBox([COMMAND_PROPERTY_VALUES[option], label])
                widgets_list.append(hbox)

            # Add the widgets to the container
            container.children = widgets_list
            command_label.value = f'{COMMAND_INFO[command]["Description"]}   |   Usage: {COMMAND_INFO[command]["USAGE"]}'


def create_new_cmd_argument_dropdown():
    """
    Creates a new dropdown widget for selecting a directory that contains CAR files.

    Returns:
        gocar_tool_dirs (Dropdown): The dropdown widget for selecting the directory.
    """
    # Widget to generate list of directories found in `data/car_files`. This widget will be used to select the directory that contains the CAR files.
    output_car_dirs = [
        d.name for d in Path(output_directory_path(root=True)).iterdir() if d.is_dir()
    ]
    output_car_dirs.insert(0, "Select a Directory...")
    gocar_tool_dirs.options = output_car_dirs

    # Attach the update function to the directories dropdown
    gocar_tool_dirs.observe(update_gocar_tool_argument_dropdown)
    return gocar_tool_dirs


def update_gocar_tool_argument_dropdown(change):
    """
    Function to handle the change event of the gocar_tool_dirs dropdown.
    Update the dropdown options for the gocar_tool_argument_dropdown based on the selected directory.

    Parameters:
    - change (dict): The change event containing the type and name of the change.

    Returns:
    None
    """
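    if change["type"] == "change" and change["name"] == "value":
        # Get the selected directory; the placeholder entry does not map to a
        # real directory, so reset the CAR file dropdown in that case
        selected = gocar_tool_dirs.value
        selected_dir = output_directory_path(dir_name=selected) if selected else None
        if selected_dir is None or not selected_dir.is_dir():
            gocar_tool_argument_dropdown.options = ["Select a CAR file..."]
            return

        # Get the car files in the selected directory and sort from smallest to largest
        car_files = sorted(
            [f for f in selected_dir.iterdir() if f.suffix == ".car"],
            key=lambda x: x.stat().st_size,
        )
        car_files.insert(0, "Select a CAR file...")
        # extract the filename from the path
        # car_files = [car_file.name for car_file in car_files]

        # Update the car files dropdown options
        gocar_tool_argument_dropdown.options = car_files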


# Define the function to be called when the button is clicked
def on_cmd_button_clicked(b):
    """
    Executes a command based on the selected options and displays the output.

    Parameters:
    - b: The button object that triggered the event.

    Returns:
    None
    """
    # Clear print statements from previous run and display the command widgets with the previous inputs
    clear_output()
    display_command_widgets()

    # Get the command options into a structured dictionary format
    widget_values = get_widget_values()

    # Loop over the widget values and add them to the command.
    # If the value is True, add the option without a value
    cmd_options = []
    for option, value in widget_values.items():
        if isinstance(value, bool):
            if value is True:
                cmd_options.append(f"--{option}")
        elif value != "":
            cmd_options.extend([f"--{option}", str(value)])

    # Construct the command as an argument list (so paths containing spaces
    # stay intact) and print it.
    command = [
        CLI_TOOL,
        command_dropdown.value,
        *cmd_options,
        str(gocar_tool_argument_dropdown.value),
    ]
    print(" ".join(command))

    # Run the command
    result = subprocess.run(command, capture_output=True, text=True, encoding="utf-8")
    if result.returncode == 0:
        print(result.stdout)
    else:
        if result.stdout != "":
            print(result.stdout)
        print(f"Error: {result.stderr}")


def get_widget_values():
    """
    Get the values of each widget and return them as a dictionary.

    Returns:
        dict: A dictionary containing the values of each widget.
    """
    # Dictionary to hold the widget values
    widget_values = {}

    # Get the value of each widget
    for option, widget in COMMAND_PROPERTY_VALUES.items():
        widget_values[option] = widget.value

    return widget_values


# Display the dropdown and output widgets
display_command_widgets()


car list --verbose C:\Github\Client_Projects\UMD\easier\data\car_files\output3\baga6ea4seaqahj7s2z7vehonvy2euguk4e254yfywskzxmbp4vbz4vxt4czqcha.car
dag-pb: bafybeif57b5qfwy7ucx4pmycol3bkpu3g3zg723rtn5iju5apkjivo7hvq
	3 links. 2 bytes
		comp[5.7 MB] bafybeiepmyhpbpn43o7wldfkuurufd7dcidbyob3435wwqll3djfnakeei
		data[2.5 GB] bafybeifurw2z3xmkrflcqy2xfpr34ys7dneodveyeniov7voiss5azkdhu
		guide[295 B] bafybeidgkjpwkmajsdijlkearn5c4ogtxloqogags5cfvpn2riwsjxsvde
	Unixfs Directory
dag-pb: bafybeiepmyhpbpn43o7wldfkuurufd7dcidbyob3435wwqll3djfnakeei
	3 links. 2 bytes
		GEDI_L4B_ATBD_V2.0.pdf[3.0 MB] bafybeifcodio3sdrmcpykzrqjdluyiyeds3gmfsocjuqr4oghkrula4may
		GEDI_L4B_Gridded_Biomass_V2_1.pdf[2.0 MB] bafybeihohmjyjixpqwszocv6hxnaomegxm627ratchdnqmbyxaptcxtgke
		gedi_l4b_excluded_granules_v21.json[759 kB] bafkreihzklpkkhzsmgmupljxtnjbdz2gkcynan24mxinrt5vncb67oo2li
	Unixfs Directory
dag-pb: bafybeifurw2z3xmkrflcqy2xfpr34ys7dneodveyeniov7voiss5azkdhu
	10 links. 2 bytes
		GEDI04_B_MW019MW223_02_002_

# Gathering Metadata on our CAR Files

As we saw above when running the `ez-prep` command from Singularity, two or more CAR files are created, depending on the chunk size that was specified. Let's structure the output into a dictionary so we can easily capture details about each CAR file and its contents. We'll add to this dictionary as we progress through this notebook, using other tools to capture additional metadata.

In [8]:
# Build CAR_PAYLOAD from SINGULARITY_RESULT: each entry is keyed by its
# StoragePath (the CAR file name) and holds the other key/values for that CAR file.
CAR_PAYLOAD = {}
for item in SINGULARITY_RESULT:
    # Save the StoragePath value as the key for the dictionary
    car_file_name = item["StoragePath"]
    # Save the entire dictionary as the value for the key
    CAR_PAYLOAD[car_file_name] = item
    # Add the full path to the StoragePath key so it can reference the file later
    CAR_PAYLOAD[car_file_name]["FilePath"] = Path(
        output_directory_path(output_directory_name_input.value), car_file_name
    ).resolve()

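# Display the first item in the CAR_PAYLOAD dictionary, if there is one. When
# CAR_PAYLOAD is empty (e.g. the "Create CAR Files" step has not been run in
# this session), indexing the key list would raise
# "IndexError: list index out of range", so fall back to None instead.
next(iter(CAR_PAYLOAD.values()), None)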

## Next let's look at the CID metadata for the car files that were just created

Next, we'll run a set of commands (`car inspect` and `car list`) from the [go-car](https://github.com/ipld/go-car) CLI tool to dive into each CAR file and retrieve details such as:
- Root CID of each single CAR file
- Name of the files found in each CAR file
- CIDs of the files in all the CAR files
- Size of each file in the CAR
- Whether each file is completely contained within a single CAR file and, if not, which other CAR files contain the rest of the file and the size of that content
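
The last bullet (working out which files span more than one CAR) can be sketched independently of the go-car output format. The example below assumes you have already collected, for each CAR file, the set of file names it contains. The function name `find_files_spanning_cars` and the sample data are made up for illustration and do not come from go-car or Singularity.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_files_spanning_cars(car_contents: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each file name found in more than one CAR to the CARs that contain it."""
    cars_by_file = defaultdict(list)
    for car_name, file_names in car_contents.items():
        for file_name in file_names:
            cars_by_file[file_name].append(car_name)
    # Keep only the files that appear in two or more CAR files
    return {
        file_name: sorted(cars)
        for file_name, cars in cars_by_file.items()
        if len(cars) > 1
    }


# Made-up example: b.tif is split across both CAR files
car_contents = {
    "first.car": {"a.tif", "b.tif"},
    "second.car": {"b.tif", "c.tif"},
}
spanning = find_files_spanning_cars(car_contents)
print(spanning)
```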

In [7]:
import json


def convert_to_dict_inspect(output):
    lines = output.strip().split("\n")
    data = {}
    current_key = None

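    for line in lines:
        # Nested entries are assumed to be indented in the `car inspect`
        # output, so check the raw line before stripping it
        indented = line[:1] in ("\t", " ")
        if ":" in line:
            key, value = line.split(":", 1)
            key = key.strip()
            value = value.strip()
            if value == "":
                # A key with no value opens a nested section
                data[key] = {}
                current_key = key
            elif current_key and indented:
                data[current_key][key] = value
            else:
                # Back at the top level: close any open nested section
                current_key = None
                data[key] = value
        elif "\t" in line and current_key:
            key, value = line.split("\t", 1)
            key = key.strip()
            value = value.strip()

            data[current_key][key] = value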
    return data


def parse_output_car_list(output):
    lines = output.split("\n")

    result = {}
    current_key = None
    current_dict = None
    skip_lines = False

    for line in lines:
        if line.startswith("raw:"):
            continue
        elif line.startswith("dag-pb:"):
            current_key = line.split(": ")[1]
            current_dict = {}
            result[current_key] = current_dict
            skip_lines = False
        elif line.startswith("\tUnixfs"):
            current_dict["unixfs_type"] = line.split(" ")[1].lower()
        elif line.startswith("\t") and not line.startswith("\t\t"):
            link, size = line.strip().split(". ")
            current_dict["links"] = int(link.split(" ")[0])
            current_dict["size"] = size
            skip_lines = True
        elif line.startswith("\t\t") and not skip_lines:
            link_key = line.split(" ")[1]
            link_size = line.split(" ")[0][1:]
            current_dict[link_key] = {"size": link_size}
    return result


### Inspect the CAR files
# .............................................................................
print(
    "\n............................................................................\n\n"
)

print_command_help("car", "inspect")

for car_file, car_properties in CAR_PAYLOAD.items():
    print(f"Reviewing contents of {car_file}...")
    command = f"car inspect {car_properties['FilePath']}"
    car_inspect_result = subprocess.run(command, shell=True, capture_output=True)
    CAR_PAYLOAD[car_file].update(
        convert_to_dict_inspect(car_inspect_result.stdout.decode("utf-8"))
    )

### List the CIDs of each CAR file
# .............................................................................
print(
    "\n.............................................................................\n\n"
)

print_command_help("car", "ls")

for car_file, car_properties in CAR_PAYLOAD.items():
    print(f"Reviewing contents of {car_file} with Root CID: {car_properties['Roots']}")
    command = f"car ls -v {car_properties['FilePath']}"
    car_list_result = subprocess.run(command, shell=True, capture_output=True)
    # Store the parsed block listing under the "related" key
    CAR_PAYLOAD[car_file]["related"] = parse_output_car_list(
        car_list_result.stdout.decode()
    )

### Print the CAR_PAYLOAD dictionary
# .............................................................................
print(
    "\n.............................................................................\n\n"
    + "Printing Extracted details from CAR files...\n\n"
)

# Convert WindowsPath object to string before serializing to JSON
car_payload_str = json.dumps(CAR_PAYLOAD, indent=4, default=str)
print(car_payload_str)



............................................................................


NAME:
   car inspect - verifies a car and prints a basic report about its contents

USAGE:
   car inspect [command options] [arguments...]

OPTIONS:
   --full      Check that the block data hash digests match the CIDs (default: false)
   --help, -h  show help


.............................................................................


NAME:
   car list - List the CIDs in a car

USAGE:
   car list [command options] [arguments...]

OPTIONS:
   --verbose, -v  Include verbose information about contained blocks (default: false)
   --unixfs       List unixfs filesystem from the root of the car (default: false)
   --help, -h     show help


.............................................................................

Printing Extracted details from CAR files...


{}
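
An empty `{}` like the one above is printed whenever `CAR_PAYLOAD` has no entries, because the loops then never call the `car` CLI. When the loops do run, the cell ignores `returncode` and `stderr`, so a failed command would quietly produce an empty parse. Below is a sketch of a wrapper that surfaces such failures using only the standard library. `run_cli` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import subprocess
import sys


def run_cli(args: list) -> str:
    """Run a command given as an argument list and return its decoded stdout.

    Passing a list (instead of a shell string) keeps paths that contain
    spaces intact. Raises RuntimeError if the command exits non-zero.
    """
    result = subprocess.run(args, capture_output=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{args[0]} exited with {result.returncode}: "
            f"{result.stderr.decode('utf-8', errors='replace').strip()}"
        )
    return result.stdout.decode("utf-8", errors="replace")


# Demonstration using the Python interpreter itself as a stand-in command
print(run_cli([sys.executable, "-c", "print('ok')"]))
```

In the loops above, this could stand in for the `subprocess.run` calls, for example `run_cli(["car", "inspect", str(car_properties["FilePath"])])`. If the executable is not on the `PATH`, `subprocess.run` raises `FileNotFoundError` instead.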


In [6]:
def parse_string_to_dict(input_string):
    input_string = input_string.replace("\\n", "\n").replace("\\t", "\t")
    lines = input_string.split("\n")
    result = {}
    current_dict = None

    for line in lines:
        if line.startswith("dag-pb:"):
            current_key = line.split(": ")[1]
            current_dict = {"links": {}, "unixfs_type": None}
            result[current_key] = current_dict
        elif line.startswith("\t") and not line.startswith("\t\t"):
            current_dict["unixfs_type"] = line.split(" ")[1].lower()
        elif line.startswith("\t\t"):
            link_key = line.split(" ")[1]
            link_size = line.split(" ")[0][1:]
            current_dict["links"][link_key] = {"size": link_size}

    return result


car_list_result.stdout.decode()

result = parse_string_to_dict(car_list_result.stdout.decode())
result


NameError: name 'car_list_result' is not defined

# Performing Partial Extraction From a Set of CAR Files

While we could use the `extract`, `get-block`, or `get-dag` commands from the [go-car](https://github.com/ipld/go-car) tool, content retrieval by CID is only possible if all links are completely contained in a single CAR file. When CID links are spread across a set of CAR files in a directory (*assuming that the set is complete, with no missing CAR files*), we'll use [Singularity](https://data-programs.gitbook.io/singularity/overview/readme) to perform partial content extraction, specifically its `extract-car` command.

```bash
NAME:
   singularity extract-car - Extract folders or files from a folder of CAR files to a local directory

USAGE:
   singularity extract-car [command options] [arguments...]

CATEGORY:
   Utility

OPTIONS:
   --input-dir value, -i value  Input directory containing CAR files. This directory will be scanned recursively
   --output value, -o value     Output directory or file to extract to. It will be created if it does not exist (default: ".")
   --cid value, -c value        CID of the folder or file to extract
   --help, -h                   show help
```
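
As a sketch of how the options in the help text above map onto a call from Python, the command can be assembled as an argument list rather than a single shell string, which avoids quoting problems when a directory name contains spaces. `build_extract_car_command` is a hypothetical helper, and the values passed to it are placeholders only.

```python
from pathlib import Path


def build_extract_car_command(input_dir, cid: str, output) -> list:
    """Build the argument list for `singularity extract-car`.

    Only the flags listed in the help text above are used.
    """
    return [
        "singularity",
        "extract-car",
        "--input-dir", str(Path(input_dir)),
        "--cid", cid,
        "--output", str(Path(output)),
    ]


# Placeholder values for illustration only
command = build_extract_car_command("car_files", "<cid>", "export")
print(command)
```

The resulting list can be passed directly to `subprocess.run(command, capture_output=True)` without `shell=True`.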



In [512]:
# singularity extract-car --input-dir C:/Github/Client_Projects/UMD/easier/data/car_files/sesh_with_zheng/ --cid bafybeidgkjpwkmajsdijlkearn5c4ogtxloqogags5cfvpn2riwsjxsvde --output C:/Github/Client_Projects/UMD/easier/data/car_files/test_export
cid_of_interest = "bafybeiakure7paspbvxqfj4b64svy3vtxqjc72kww6doykphaq63w23ctu"
# Build the input (CAR files) and export directory paths
input_car_path = output_directory_path(output_directory_name_input.value)
export_path = output_directory_path_export(output_directory_name_input.value)
# Check and prepare the export directory
check_output_directory(export_path)

print_command_help("singularity", "extract-car")
command = f"singularity extract-car --input-dir {input_car_path} --cid {cid_of_interest} --output {export_path}"
export_result = subprocess.run(command, shell=True, capture_output=True)
export_result


NAME:
   singularity extract-car - Extract folders or files from a folder of CAR files to a local directory

USAGE:
   singularity extract-car [command options] [arguments...]

CATEGORY:
   Utility

OPTIONS:
   --input-dir value, -i value  Input directory containing CAR files. This directory will be scanned recursively
   --output value, -o value     Output directory or file to extract to. It will be created if it does not exist (default: ".")
   --cid value, -c value        CID of the folder or file to extract
   --help, -h                   show help



CompletedProcess(args='singularity extract-car --input-dir C:\\Github\\Client_Projects\\UMD\\easier\\data\\car_files\\hope_this_works --cid bafybeihob3p5hlhzbsindkg5lluznkbfobwqlgzok3tfueeiluir7sqkjm --output C:\\Github\\Client_Projects\\UMD\\easier\\data\\output\\car_files\\hope_this_works', returncode=0, stdout=b'Create Dir C:\\Github\\Client_Projects\\UMD\\easier\\data\\output\\car_files\\hope_this_works\nWriting to C:\\Github\\Client_Projects\\UMD\\easier\\data\\output\\car_files\\hope_this_works\\Online_Version_GEDI_L4B_Gridded_Biomass_V2_1.html\nCreate Dir C:\\Github\\Client_Projects\\UMD\\easier\\data\\output\\car_files\\hope_this_works\\copy\nWriting to C:\\Github\\Client_Projects\\UMD\\easier\\data\\output\\car_files\\hope_this_works\\copy\\Online_Version_GEDI_L4B_Gridded_Biomass_V2_1.html\n', stderr=b'')
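
The raw `CompletedProcess` above is hard to read. Below is a sketch that turns its `stdout` into a short summary of what was extracted. It assumes the `Create Dir ` and `Writing to ` line prefixes seen in this one captured output, which may differ between Singularity versions. `summarize_extract_output` is a hypothetical helper, and the sample bytes are a shortened, made-up stand-in.

```python
def summarize_extract_output(stdout: bytes) -> dict:
    """Split extract-car stdout into created directories and written files."""
    summary = {"directories": [], "files": []}
    for line in stdout.decode("utf-8", errors="replace").splitlines():
        if line.startswith("Create Dir "):
            summary["directories"].append(line[len("Create Dir "):])
        elif line.startswith("Writing to "):
            summary["files"].append(line[len("Writing to "):])
    return summary


# Shortened, made-up sample in the same shape as the stdout shown above
sample_stdout = b"Create Dir out\nWriting to out/report.html\n"
print(summarize_extract_output(sample_stdout))
```

With the result of the cell above, this would be called as `summarize_extract_output(export_result.stdout)`.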

# Adding content to IPFS

In [10]:
def check_mfs_path(mfs_path: str):
    # Replace any backslashes with forward slashes (str.replace returns a new string)
    mfs_path = mfs_path.replace("\\", "/")
    # Check if the path starts with and ends with a "/"
    if not mfs_path.startswith("/"):
        mfs_path = f"/{mfs_path}"
    if not mfs_path.endswith("/"):
        mfs_path = f"{mfs_path}/"
    # Check if the path exists and create it if it doesn't
    check_mfs_path_exists(mfs_path)
    return mfs_path


def check_mfs_path_exists(mfs_path: str):
    command = f"ipfs files ls {mfs_path}"
    result = subprocess.run(command, shell=True, capture_output=True)
    if result.returncode != 0:  # Non-zero exit code: treat the path as missing
        print(f"Creating the MFS directory to {mfs_path}...")
        command = f"ipfs files mkdir {mfs_path} --parents --cid-version 1"
        result = subprocess.run(command, shell=True, capture_output=True)
        print(result)


# Add the CAR output directory to IPFS, reference it under the MFS path, and print the command output
print_command_help("ipfs", "add")
command = f"ipfs add {output_directory_path(output_directory_name_input.value)} --to-files {check_mfs_path('import')} --cid-version 1 --progress --recursive --inline"
result = subprocess.run(command, shell=True, capture_output=True)
decoded_stm = result.stdout.decode("utf-8")
print(decoded_stm)


USAGE
  ipfs add <path>... - Add a file or directory to IPFS.

SYNOPSIS
  ipfs add [--recursive | -r] [--dereference-args] [--stdin-name=<stdin-name>]
           [--hidden | -H] [--ignore=<ignore>]...
           [--ignore-rules-path=<ignore-rules-path>] [--quiet | -q]
           [--quieter | -Q] [--silent] [--progress | -p] [--trickle | -t]
           [--only-hash | -n] [--wrap-with-directory | -w]
           [--chunker=<chunker> | -s] [--raw-leaves] [--nocopy] [--fscache]
           [--cid-version=<cid-version>] [--hash=<hash>] [--inline]
           [--inline-limit=<inline-limit>] [--pin=false] [--to-files=<to-files>]
           [--] <path>...

ARGUMENTS

  <path>... - The path to a file to be added to IPFS.

OPTIONS

  -r, --recursive            bool   - Add directory paths recursively.
  --dereference-args         bool   - Symlinks supplied in arguments are
                                      dereferenced.
  --stdin-name               string - Assign a name if the file source is s
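
The path handling in `check_mfs_path` can be checked on its own, without a running IPFS node. Below is a sketch of just the normalization step as a pure function. `normalize_mfs_path` is a hypothetical name, and the input paths are examples only.

```python
def normalize_mfs_path(mfs_path: str) -> str:
    """Return a path with forward slashes and a leading and trailing "/"."""
    # str.replace returns a new string, so the result must be reassigned
    mfs_path = mfs_path.replace("\\", "/")
    if not mfs_path.startswith("/"):
        mfs_path = f"/{mfs_path}"
    if not mfs_path.endswith("/"):
        mfs_path = f"{mfs_path}/"
    return mfs_path


print(normalize_mfs_path("import"))           # /import/
print(normalize_mfs_path("data\\car_files"))  # /data/car_files/
```

Keeping the normalization separate from the `ipfs files` calls makes the Windows-path case easy to verify before anything is written to MFS.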

# Examples from widget documentation

In [None]:
a = widgets.IntSlider(description="Delayed", continuous_update=False)
b = widgets.IntText(description="Delayed", continuous_update=False)
c = widgets.IntSlider(description="Continuous", continuous_update=True)
d = widgets.IntText(description="Continuous", continuous_update=True)

widgets.link((a, "value"), (b, "value"))
widgets.link((a, "value"), (c, "value"))
widgets.link((a, "value"), (d, "value"))
widgets.HBox([a, b, c, d])


In [None]:
caption = widgets.Label(value="The values of slider1 and slider2 are synchronized")
sliders1, slider2 = (
    widgets.IntSlider(description="Slider 1"),
    widgets.IntSlider(description="Slider 2"),
)
l = widgets.link((sliders1, "value"), (slider2, "value"))
display(caption, sliders1, slider2)

caption = widgets.Label(value="Changes in source values are reflected in target1")
source, target1 = (
    widgets.IntSlider(description="Source"),
    widgets.IntSlider(description="Target 1"),
)
dl = widgets.dlink((source, "value"), (target1, "value"))
display(caption, source, target1)


In [None]:
l.unlink()
dl.unlink()



In [None]:
# Utils widgets
from ipywidgets import Button, Layout, jslink, IntText, IntSlider, TwoByTwoLayout


def create_expanded_button(description, button_style):
    return Button(
        description=description,
        button_style=button_style,
        layout=Layout(height="auto", width="auto"),
    )


top_left_button = create_expanded_button("Top left", "info")
top_right_button = create_expanded_button("Top right", "success")
bottom_left_button = create_expanded_button("Bottom left", "danger")
bottom_right_button = create_expanded_button("Bottom right", "warning")

top_left_text = IntText(
    description="Top left", layout=Layout(width="auto", height="auto")
)
top_right_text = IntText(
    description="Top right", layout=Layout(width="auto", height="auto")
)
bottom_left_slider = IntSlider(
    description="Bottom left", layout=Layout(width="auto", height="auto")
)
bottom_right_slider = IntSlider(
    description="Bottom right", layout=Layout(width="auto", height="auto")
)

app = TwoByTwoLayout(
    top_left=top_left_text,
    top_right=top_right_text,
    bottom_left=bottom_left_slider,
    bottom_right=bottom_right_slider,
)

link_left = jslink((app.top_left, "value"), (app.bottom_left, "value"))
link_right = jslink((app.top_right, "value"), (app.bottom_right, "value"))
app.bottom_right.value = 30
app.top_left.value = 25
app
