
Tree Of Life 200M tools, configurations, documentation #25

Merged: Andrey170170 merged 16 commits into main from docs on May 22, 2025



@Andrey170170
Collaborator

No description provided.

Andrey170170 and others added 9 commits May 18, 2025 00:57
Updated configuration files to support excluded servers and added support for additional download settings. Introduced new README guides for FathomNet, BIOSCAN, and Safe download workflows, ensuring clear postprocessing instructions. Included detailed Slurm script documentation for tool submission and management.
Introduce a new configuration file, `eol_download_config.yaml`, for managing Encyclopedia of Life (EoL) downloads within the Tree of Life 200M dataset. Also, add a detailed `EoL_download_README.md` documenting the step-by-step process for downloading and post-processing the data. Removed "eol" from `safe_download_config.yaml` to separate download workflows.
Introduce a new `tol200m_fathom_net_crop` module that processes and crops images based on bounding box metadata. It includes components for filtering valid image partitions, scheduling distributed tasks, and performing cropping operations with updated metadata. Added a comprehensive README detailing the tool's functionality, configuration, and pre/post-conditioning requirements.
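The cropping step described above can be sketched as follows. This is a minimal illustration, not the module's code: it assumes each bounding box is stored as a top-left `x`, `y` plus `width`, `height` in pixels, and the names `BoundingBox`, `crop_box`, and `crop` are hypothetical. Boxes are clamped to the image bounds and returned in (left, top, right, bottom) order, the same order Pillow's `Image.crop` accepts; a box that falls entirely outside the image yields `None` so it can be skipped.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical bbox record: top-left corner plus size, in pixels."""
    x: int
    y: int
    width: int
    height: int


def crop_box(bbox: BoundingBox, img_width: int, img_height: int) -> Optional[Tuple[int, int, int, int]]:
    """Convert a bbox to a (left, top, right, bottom) box clamped to the image.

    Returns None when the clamped box is empty, i.e. the bbox lies
    entirely outside the image or has non-positive size.
    """
    left = max(0, bbox.x)
    top = max(0, bbox.y)
    right = min(img_width, bbox.x + bbox.width)
    bottom = min(img_height, bbox.y + bbox.height)
    if right <= left or bottom <= top:
        return None
    return left, top, right, bottom


def crop(pixels: List[List[int]], box: Tuple[int, int, int, int]) -> List[List[int]]:
    """Crop a row-major pixel grid (a list of rows) to the given box."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


if __name__ == "__main__":
    # Toy 6x4 "image" whose pixel value encodes row*10 + column.
    image = [[r * 10 + c for c in range(6)] for r in range(4)]
    # This box overhangs the right and bottom edges, so it gets clamped.
    box = crop_box(BoundingBox(x=4, y=2, width=5, height=5), img_width=6, img_height=4)
    print(box)               # (4, 2, 6, 4)
    print(crop(image, box))  # [[24, 25], [34, 35]]
```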
…eduler

Introduce `tol200m_bioscan_data_transfer` module, including classes for data transfer, scheduling, and processing with MPI. Updated README with configuration details for the new module and added imports to the main TreeOfLife toolbox.
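One way a scheduler could split this kind of transfer across MPI processes is a round-robin assignment of partitions by rank. This is a hypothetical sketch, not the module's actual scheduling logic, and `partitions_for_rank` is an illustrative name; in an MPI job the `rank` and `world_size` arguments would typically come from mpi4py's `MPI.COMM_WORLD.Get_rank()` and `Get_size()`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partitions_for_rank(partitions: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Hypothetical round-robin split: rank r handles items r, r + world_size, r + 2*world_size, ..."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Every rank slices the same ordered list, so no messages are exchanged.
    return list(partitions[rank::world_size])


if __name__ == "__main__":
    parts = [f"partition_{i}" for i in range(7)]  # illustrative partition names
    for r in range(3):
        print(r, partitions_for_rank(parts, r, 3))
```

Because every rank computes its own share from the same ordered list, no coordination is needed, and each partition is handled by exactly one rank.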
Revised and improved download guides for BIOSCAN, GBIF, FathomNet, and EoL, including clearer component-specific details, troubleshooting, and updated configurations. Central README updated to reflect new structure and improved guidance for TreeOfLife200M dataset setup.
Updated README files for both new tools.
…cesses

- Introduced bioscan_download_config.yaml with detailed parameters for BIOSCAN dataset.
- Updated eol_download_config.yaml, fathomNet_download_config.yaml, general_download_config.yaml, and safe_download_config.yaml to include destination folders for images and errors.
- Enhanced README files for BIOSCAN, EoL, FathomNet, GBIF fast, and GBIF slow downloads to reference new config files and provide clearer instructions.
Member

@egrace479 left a comment


Some questions, likely relevant to the other configs

path_to_input: "path_to_input_file" # Path to the input file with the list of servers
path_to_output_folder: "path_to_output_folder" # Path to the output folder
provenance_path: "" # Path to a provenance Parquet file containing source metadata
path_to_tol_folder: "" # Path to where you save TOL data folder (a.k.a. `<output_dir>/data`)
Member


Is this where you want to save it, or where it has already been saved?

@@ -0,0 +1,68 @@
account: "account_name" # Account name for the cluster
path_to_input: "path_to_input_file" # Path to the input file with the list of servers
path_to_output_folder: "path_to_output_folder" # Path to the output folder
Member


What goes to this output folder?

Collaborator Author


For BIOSCAN specifically, only the partitioned metadata goes there; the images are transferred directly to the ToL save location.

Comment on lines +8 to +10
scripts:
# Wrapper scripts to submit jobs to the cluster
general_submitter: "path_to_general_submitter_script.sh"
Member


These placeholders should probably be written as paths, e.g., "path/to/general_submitter_script.sh".
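A placeholder convention like this can also be checked mechanically before jobs are submitted. Below is a minimal sketch, assuming the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`); the helper names `iter_leaves` and `unfilled_paths`, the placeholder prefixes, and the sample values are hypothetical and not part of the toolbox.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical heuristic for "still a template value"; adjust to the real config conventions.
PLACEHOLDER_PREFIXES = ("path_to_", "path/to/")


def iter_leaves(config: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted.key, value) for every non-dict value in a nested config."""
    for key, value in config.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from iter_leaves(value, dotted + ".")
        else:
            yield dotted, value


def unfilled_paths(config: Dict[str, Any]) -> List[str]:
    """Return keys whose string values are empty or still look like placeholders."""
    problems = []
    for key, value in iter_leaves(config):
        if isinstance(value, str) and (value == "" or value.startswith(PLACEHOLDER_PREFIXES)):
            problems.append(key)
    return problems


if __name__ == "__main__":
    # Illustrative config mirroring keys quoted in this review; values are made up.
    cfg = {
        "account": "my_project",
        "path_to_input": "path_to_input_file",
        "path_to_output_folder": "/scratch/tol/bioscan_metadata",
        "provenance_path": "",
        "scripts": {"general_submitter": "path/to/general_submitter_script.sh"},
    }
    print(unfilled_paths(cfg))
```

Note that this heuristic flags every empty string, so genuinely optional keys would need to be excluded explicitly.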

Comment thread config/tree_of_life_200M/bioscan_download_config.yaml Outdated
Andrey170170 and others added 2 commits May 22, 2025 14:15
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
Member

@egrace479 left a comment


Comments on Docs (mostly updates to phrasing and adding links to sources)

Comment thread docs/BIOSCAN_download_README.md Outdated
Comment thread docs/BIOSCAN_download_README.md Outdated
Comment thread docs/BIOSCAN_download_README.md Outdated
Comment thread docs/BIOSCAN_download_README.md Outdated
Comment thread docs/EoL_download_README.md Outdated
Comment thread docs/GBIF_slow_download_README.md Outdated
### 1. Setup and Download

1. **Install and Configure Downloader**
- Set up the `distributed-downloader` package following the [official instructions](https://github.com/Imageomics/distributed-downloader/blob/9ef8b0d297f7a868fac31b2b9c3d5f3aa5533472/docs/scripts_README.md)
Member


Should this be pip installable, or is this about where the scripts are located?

Comment thread docs/README.md Outdated
Comment thread docs/README.md Outdated
Comment thread docs/README.md Outdated
Comment thread docs/README.md Outdated
Comment thread src/TreeOfLife_toolbox/tol200m_fathom_net_crop/README.md Outdated
Andrey170170 and others added 3 commits May 22, 2025 16:23
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
Added installation instructions
@Andrey170170 requested a review from egrace479 May 22, 2025 20:46
Member

@egrace479 left a comment


Last changes.

Comment thread docs/EoL_download_README.md Outdated
Comment thread docs/FathomNet_download_README.md Outdated
Comment thread docs/GBIF_fast_download_README.md Outdated
Comment thread docs/GBIF_slow_download_README.md Outdated
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
@Andrey170170 merged commit 7214fcf into main May 22, 2025
@Andrey170170 deleted the docs branch May 22, 2025 21:32


2 participants