added synchronized folders docs
krokicki committed Nov 9, 2022
1 parent 58c4d1b commit f9e5486
Showing 6 changed files with 69 additions and 28 deletions.
19 changes: 0 additions & 19 deletions content/docs/administration/aws/Operation.md

This file was deleted.

25 changes: 25 additions & 0 deletions content/docs/administration/aws/data.md
@@ -0,0 +1,25 @@
---
title: "Data Import"
linkTitle: "Data Import"
weight: 50
description: >
How to get image data into the system
---

### Adding the MouseLight Open Data

The MouseLight project at Janelia has made available the complete imagery and neuron tracing annotations for its published data sets on the [AWS Open Data Registry](https://registry.opendata.aws/janelia-mouselight/).

You can easily make these available in your own HortaCloud instance by [creating a Synchronized Folder](/docs/user_manual/synchronized_folders/) which points to `/s3data/janelia-mouselight-data`.

### Adding a single volume

A Horta Sample is an object representing a single 3D volume that can be visualized and traced with Horta.

If you already have the HortaKTX format data in your mounted S3 buckets, select **File** → **New** → **Horta Sample**, then set "Sample Name" to `<sampleDirectoryName>` and "Path to Render Folder" to `/s3data/<bucketName>/<sampleDirectoryName>`.

Open the Data Explorer (**Window** → **Core** → **Data Explorer**) and navigate to Home, then "3D RawTile Microscope Samples", and then your sample name. Right-click the sample and choose "Open in Horta". This opens the Horta Panel, where you can create a workspace or open the 2D or 3D volume viewer.

### Converting your data

If your data has not been converted into HortaKTX format, you can use the [HortaCloud Data Importer](https://github.com/JaneliaSciComp/hortacloud-importer) to convert it. This tool supports many common image formats, and can be easily extended in Python to support your favorite format.
18 changes: 9 additions & 9 deletions content/docs/administration/aws/deployment.md
@@ -63,7 +63,7 @@ openssl rand -hex 32

We prefer this procedure because these values will be handled during the installation using the `sed` command and it is preferable that they not contain any characters that require escaping in a sed command.

If you already have data on some S3 buckets you can add them to `HORTA_DATA_BUCKETS` as a comma separated list. For example, if you want to use Janelia's Open Data bucket but in addition you also have your data on a private bucket ('janelia-mouseligh-demo' in this example) you need to set `HORTA_DATA_BUCKETS="janelia-mouselight-imagery,janelia-mouselight-demo"`. Currently it is set to Janelia's open data MouseLight bucket only. Every bucket specified in the 'HORTA_DATA_BUCKETS' list will be available in the workstation as `/<s3BucketName>` directory.
If you already have data in S3 buckets, you can add them to `HORTA_DATA_BUCKETS` as a comma-separated list. For example, to use Janelia's Open Data bucket together with your own data in a private bucket (`janelia-mouselight-demo` in this example), set `HORTA_DATA_BUCKETS="janelia-mouselight-imagery,janelia-mouselight-demo"`. By default, only the MouseLight Open Data bucket is mounted. Every bucket specified in the `HORTA_DATA_BUCKETS` list will be available in Horta as the `/s3data/<s3BucketName>` directory.
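
As a rough illustration of the mapping described above, the following Python sketch turns a `HORTA_DATA_BUCKETS` value into the directories you would expect to see in Horta. The helper `bucket_mount_paths` is hypothetical and not part of the HortaCloud installer; it only assumes the `/s3data/<s3BucketName>` convention stated in this section.

```python
from typing import List


def bucket_mount_paths(horta_data_buckets: str, mount_root: str = "/s3data") -> List[str]:
    """Map a comma-separated bucket list to the expected mount directories.

    Hypothetical helper for illustration; the real mounting is done by the
    HortaCloud deployment, not by this function.
    """
    # Split on commas, trim stray whitespace, and skip empty entries.
    names = [name.strip() for name in horta_data_buckets.split(",")]
    return [f"{mount_root}/{name}" for name in names if name]


if __name__ == "__main__":
    value = "janelia-mouselight-imagery,janelia-mouselight-demo"
    for path in bucket_mount_paths(value):
        print(path)
```

A check like this can be handy before deployment to confirm which paths to expect when creating Horta Samples or Synchronized Folders.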

If you want to change the setting for `HORTA_WS_INSTANCE_TYPE`, keep in mind that you may have to change `HORTA_WS_IMAGE_NAME`
For `HORTA_WS_INSTANCE_TYPE` set to any `stream.graphics.g4dn.*` instances:
@@ -125,15 +125,15 @@ with some manual intervention for **AppStream builder** step (third step outline

2) **Deploy the back-end stacks** - this includes the AppStream builder. At the back end deployment the installation process will also create the admin user configured in `ADMIN_USER_EMAIL`.

3) **Connect to AppStream builder and install the workstation application** - This is a semiautomated step that involves copying and running two PowerShell scripts onto the AppStream builder instance.
3) **Connect to AppStream builder and install the Horta application** - This is a semiautomated step that involves copying and running two PowerShell scripts onto the AppStream builder instance.

4) **Deploy the administration stack.**

### Workstation app installation
### Install the Horta desktop application

For client installation start and connect to the AppStream builder instance then copy the following scripts from this repo to the AppStream instance:

* [installcmd.ps1](https://github.com/JaneliaSciComp/hortacloud/blob/main/vpc_stack/src/asbuilder/installcmd.ps1) - installs JDK and the workstation
* [installcmd.ps1](https://github.com/JaneliaSciComp/hortacloud/blob/main/vpc_stack/src/asbuilder/installcmd.ps1) - installs JDK and the Horta application
* [createappimage.ps1](https://github.com/JaneliaSciComp/hortacloud/blob/main/vpc_stack/src/asbuilder/createappimage.ps1) - creates the AppStream image

After you copied or created the scripts:
@@ -152,23 +152,23 @@ After you copied or created the scripts:
cd 'C:\Users\ImagebuilderAdmin\My Files\Temporary Files'
```

* Run the installcmd script to install the workstation. &lt;serverName&gt; is the name of the backend EC2 instance, typically it looks like `ip-<ip4 with dashes instead of dots>.ec2.internal`. Instructions for locating this are provided as output from the installer script. The workstation client certificate is signed using the ec2 internal name so do not use the actual IP for the &lt;serverName&gt; parameter, because user logins will fail with a certificate error.
* Run the installcmd script to install Horta. &lt;serverName&gt; is the name of the backend EC2 instance, typically it looks like `ip-<ip4 with dashes instead of dots>.ec2.internal`. Instructions for locating this are provided as output from the installer script. The Horta client certificate is signed using the ec2 internal name so do not use the actual IP for the &lt;serverName&gt; parameter, because user logins will fail with a certificate error.

```powershell
installcmd.ps1 <serverName>
```

This will install the JDK and the workstation. The installer will run silently and it will install the workstation under the `C:\apps` folder. If it prompts you for the install directory, select `C:\apps` as the JaneliaWorkstation location.
This will install the JDK and Horta. The installer will run silently and it will install the Horta application under the `C:\apps` folder. If it prompts you for the install directory, select `C:\apps` as the JaneliaWorkstation location.

* *Optional* - To start the workstation for testing, run:
* *Optional* - To start Horta for testing, run:

```powershell
c:\apps\runJaneliaWorkstation.ps1
```

* when prompted, login as the admin user you set in ADMIN_USER_EMAIL (leave the password empty)
* Navigate through the menus to make sure the workstation is working. *Do not create any user accounts at this time as they will get created from the Admin web application.*
* When testing is finished, close down the workstation.
* Navigate through the menus to make sure Horta is working. *Do not create any user accounts at this time as they will get created from the Admin web application.*
* When testing is finished, close down Horta.

* Finalize the creation of the AppStream image, run:

Binary file added content/docs/user_manual/add_sync_folder.png
35 changes: 35 additions & 0 deletions content/docs/user_manual/synchronized_folders.md
@@ -0,0 +1,35 @@
---
title: "Synchronized Folders"
linkTitle: "Synchronized Folders"
weight: 55
description: >
How to use Synchronized Folders to make data available to Horta
---

## What are Synchronized Folders?

If you are incrementally adding a lot of data to the system, using the **File** → **New** → **Horta Sample** menu option can become tedious. If you create a Synchronized Folder, your data will be discovered as it is placed in storage, and Horta Samples will be automatically generated. The new Horta Samples can be made available to you or a group of users of your choice.

Synchronized Folders also load traced neurons, when those tracings are made available in a specific format. The [MouseLight Open Data](https://registry.opendata.aws/janelia-mouselight/) contains traced neurons alongside the imagery. If you use a Synchronized Folder to bring that data into Horta, you'll also get Workspaces containing the published neurons that were traced on each sample.

## Creating a Synchronized Folder

You can find Synchronized Folders near the top of the Data Explorer:

![Synchronized folders](../synchronized_folders.png)

To add a new Synchronized Folder, right-click the "Synchronized Folders" node and choose "Add Synchronized Folder..."

![Adding synchronized folder](../add_sync_folder.png)

You can fill out the form as follows:

* The **path** you want to synchronize needs to be mounted and accessible through the back-end. Typically, S3 buckets are mounted as `/s3data/<bucket name>`. If you don't know the path to use, talk to your system administrator.
* You can choose any **name** (i.e. label) for your synchronized folder. This is the name that will appear in the Data Explorer. It can include spaces and special characters if you wish.
* The **depth** is the depth of the hierarchy to traverse before reaching data to discover. In the example above we specify a depth of 1, since we are searching in the images subfolder. If we wanted to search the entire bucket (i.e. at the parent folder level), we would need to specify a depth of 2 to find the same Horta Samples.
* The **owner** is the user or group that will "own" the objects in the Workstation. This primarily affects who can see the objects in the Data Explorer. For example, if you choose a group here then everyone in the group will see the samples you discover.
* Finally, you should choose the data types that should be discovered while synchronizing with the file system. Typically, you'll want to select **Horta Samples** here, but you can also discover other data types such as **Zarr Containers**, which can be viewed in 2D with the BigDataViewer modules.
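
One way to picture the **depth** setting is as a directory walk. The Python sketch below is illustrative only: `folders_at_depth` is a hypothetical helper, not part of Horta, and it assumes that candidate folders are the directories exactly `depth` levels below the synchronized path. The back-end's actual discovery logic may differ.

```python
import tempfile
from pathlib import Path
from typing import List


def folders_at_depth(root: Path, depth: int) -> List[Path]:
    """Return the directories exactly `depth` levels below `root`.

    Hypothetical model of the depth setting, for illustration only.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    level = [root]
    for _ in range(depth):
        # Descend one level: collect the subdirectories of every folder in this level.
        level = [
            child
            for parent in level
            for child in sorted(parent.iterdir())
            if child.is_dir()
        ]
    return level


if __name__ == "__main__":
    # Build a throwaway layout: <bucket>/images/<sample>
    with tempfile.TemporaryDirectory() as tmp:
        bucket = Path(tmp) / "bucket"
        for sample in ("sample-a", "sample-b"):
            (bucket / "images" / sample).mkdir(parents=True)

        from_images = folders_at_depth(bucket / "images", 1)
        from_bucket = folders_at_depth(bucket, 2)
        print([p.name for p in from_images])
        print(from_images == from_bucket)
```

In this toy layout, a depth of 1 from the `images` subfolder and a depth of 2 from the bucket root reach the same sample folders, which mirrors the example given for the depth field.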

Once you click "Save and Synchronized", the backend will try to explore the path you provided and create objects corresponding to any data it finds. You can watch the progress in the "Background Tasks" panel (typically found on the right side of the screen; it can also be opened from the **Window** → **Core** menu). Synchronization can take a while when there are a lot of traced neurons, such as with the MouseLight Open Data. Once the process is complete, you can click the Refresh button in the Data Explorer to see what was discovered.

Currently, synchronization runs only on demand. If you add data later, make sure to right-click your Synchronized Folder and choose "Refresh Synchronized Folder".
Binary file added content/docs/user_manual/synchronized_folders.png