In [None]:
import hoss
import os
import tempfile

import hoss.tools.upload

## Connect to local server
This notebook demonstrates how to use the upload tool that is included in the hoss client library.

These demo notebooks assume the system is running in dev mode and is reachable
at localhost.

We start by connecting to the "local" server. If you are using a different server, be sure to change the argument to `.connect()`.

In [None]:
server_local = hoss.connect('http://localhost')

In [None]:
print("Existing Namespaces:")
print(server_local.list_namespaces())

## Create a dataset
First, load the default namespace, then create a dataset inside it.

In [None]:
ns = server_local.get_namespace('default')

In [None]:
ds = ns.create_dataset("upload-test", "A dataset for an upload tool example")
ds.display()

## Write test data to upload

The upload tool operates on a directory of files. Create a test directory of dummy data.

In [None]:
temp_dir = tempfile.TemporaryDirectory()
for cnt in range(5):
    with open(os.path.join(temp_dir.name, f"file{cnt}.dat"), 'wt') as fh:
        fh.write('dummy data' * 5000000)

## Run upload tool

You can run the upload tool as a Python function, which also works in Jupyter.

You can also run the upload tool from the command line. Installing the hoss client library with pip also installs the `hoss` program. The command line interface has the form:

`hoss upload <dataset name> <absolute path to the upload dir>`

You can optionally write metadata key-value pairs using the `-m` flag (e.g. `-m subject_id=123`). The `-m` flag can be given multiple times.

You can optionally exclude files from the upload by passing a regex string to the `--skip` arg.
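
To preview which files a `--skip` pattern would exclude before running an upload, a small local check can help. The sketch below is not part of the hoss library: `preview_skip` is a hypothetical helper, the file names are made up, and it assumes the pattern is applied to each file name with `re.search`. The tool's actual matching rule may differ, so treat the result as an approximation.

```python
import re


def preview_skip(filenames, skip_pattern):
    """Split filenames into (kept, skipped) lists using a regular expression.

    Hypothetical helper for illustration only; it is not part of hoss.
    """
    pattern = re.compile(skip_pattern)
    kept, skipped = [], []
    for name in filenames:
        # Assumption: a match anywhere in the file name means "skip this file".
        if pattern.search(name):
            skipped.append(name)
        else:
            kept.append(name)
    return kept, skipped


# Made-up file names; the pattern is the example from the CLI help text.
kept, skipped = preview_skip(
    ["file0.dat", "myprefix_notes.txt", "myprefix_data.dat", "file1.dat"],
    r"myprefix.*\.txt",
)
print("would upload:", kept)
print("would skip:", skipped)
```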

You can specify the endpoint (defaults to localhost) using the `--endpoint` arg.
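
As a rough illustration of how these options combine, the sketch below assembles a `hoss upload` command line from Python values. It only uses the flags described above (`-m`, `--skip`, `--endpoint`); `build_upload_command` is a hypothetical helper, and the directory path is a placeholder. The command is only built and printed, not executed.

```python
import shlex


def build_upload_command(dataset_name, directory, metadata=None, skip=None, endpoint=None):
    """Build the argument list for `hoss upload` (hypothetical helper, not part of hoss)."""
    cmd = ["hoss", "upload"]
    # One `-m key=value` pair per metadata entry.
    for key, value in (metadata or {}).items():
        cmd += ["-m", f"{key}={value}"]
    if skip is not None:
        cmd += ["--skip", skip]
    if endpoint is not None:
        cmd += ["--endpoint", endpoint]
    # Positional arguments come last: dataset name, then the upload directory.
    cmd += [dataset_name, directory]
    return cmd


cmd = build_upload_command(
    "upload-test",
    "/tmp/upload-dir",  # placeholder path, use the absolute path to your own directory
    metadata={"subject_id": "123"},
    skip=r"myprefix.*\.txt",
    endpoint="http://localhost",
)
# shlex.join quotes arguments (such as the regex) so the line is safe to paste into a shell.
print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to hand to `subprocess.run(cmd)` later without worrying about shell quoting.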

In [1]:
!hoss upload -h

Usage: hoss upload [OPTIONS] DATASET_NAME DIRECTORY

  Upload files in a directory to an existing dataset

Options:
  -n, --namespace TEXT            Namespace that contains the dataset
                                  [default: default]
  -e, --endpoint TEXT             Hoss server root endpoint  [default:
                                  http://localhost]
  -p, --prefix TEXT               Optional prefix to where the files should be
                                  uploaded. If this is not provided, the files
                                  will be uploaded to a 'directory' in the
                                  root of the dataset. The the 'directory'
                                  name will be the same as the source
                                  directory name.
  -s, --skip TEXT                 Optional regular expression used to filter
                                  out files to skip (e.g. myprefix.*\.txt)
  -j, --num_processes INTEGER     Number of processes to u

In [None]:
# Try uploading by calling the function directly.
# Most args can be populated from the client library objects we've already created.
hoss.tools.upload.upload_directory(
    ds.dataset_name, temp_dir.name, ns.name, server_local.base_url,
    num_processes=1, skip=None, max_concurrency=10,
    multipart_threshold=48, multipart_chunk_size=48,
    metadata={"my-upload-test": "foo"},
)

## Verify the files uploaded successfully

In [None]:
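# No prefix was given, so the files land in a 'directory' named after the source directory
upload_dir = os.path.basename(temp_dir.name)
for f in (ds / upload_dir).iterdir():
    print(f)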

## Clean up this example
Run these cells to remove the resources created during this example.

In [None]:
temp_dir.cleanup()

In [None]:
ns.delete_dataset("upload-test")