Docs fix by trvachov · Pull Request #826 · NVIDIA-BioNeMo/bionemo-framework

trvachov · 2025-04-11T18:05:07Z

Description

For the Geneformer documentation:

Capitalization standardization:
- Fixed capitalization of "BioNeMo", "Geneformer", "HuggingFace", "ReLU", "BERT MLM"
- Corrected spelling of "Crohn's disease" (previously "Chron's disease")
- Fixed "children" (previously "chidlren")
Formatting improvements:
- Properly formatted model version bullet points with nesting
- Added proper headings for property categories
- Fixed displayed values (e.g., ".5M" → "0.5M")
- Standardized formatting of data collection/labeling methods sections
Image captions:
- Replaced low-quality image captions with descriptive, properly formatted titles
- Made chart descriptions more professional and consistent
Grammatical improvements:
- Fixed article usage and punctuation
- Improved sentence structure and clarity
- Fixed section headings capitalization and consistency
Fixed broken notes:
- Corrected !! note to !!! note for proper rendering
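
The PR does not say how these fixes were found or applied. As a minimal sketch, assuming one wanted to automate two of the mechanical fixes listed above (the `!! note` admonition marker and the leading-dot value `.5M`), regular expressions along these lines could work. The function names and patterns are hypothetical and are not part of the BioNeMo repository.

```python
import re

# Hypothetical helpers for illustration only; the PR describes no such tooling.

# Match a line that starts with exactly two "!" followed by an admonition type,
# e.g. "!! note". The negative lookahead leaves a correct "!!! note" untouched.
ADMONITION = re.compile(r"^([ \t]*)!!(?!!)([ \t]+\w+)", re.MULTILINE)

# Match a decimal point that is not preceded by a digit, letter, or another dot,
# e.g. the ".5" in ".5M".
LEADING_DOT = re.compile(r"(?<![\w.])\.(\d)")


def fix_admonitions(text: str) -> str:
    """Rewrite '!! note' style markers to '!!! note'."""
    return ADMONITION.sub(r"\1!!!\2", text)


def fix_leading_decimal(text: str) -> str:
    """Rewrite values such as '.5M' to '0.5M'."""
    return LEADING_DOT.sub(r"0.\1", text)
```

Any such automated pass would still need human review, since a pattern like the leading-dot rule can also match text that is not a number.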

For the ESM-2 pretraining documentation:

Grammar and clarity improvements:
- Fixed article usage ("a ESM-2" → "an ESM-2")
- Fixed formatting of numeric values (e.g., "1." → "1.0")
- Fixed typos ("depreciation" → "deprecation")
- Fixed "trainiing" → "training"
Consistency in terminology:
- Standardized "BioNeMo" capitalization
- Ensured consistent treatment of "ESM-2" references
Structure and formatting:
- Improved spacing and paragraph breaks
- Fixed section formatting and readability

For the training-models documentation:

Capitalization and consistency:
- Standardized capitalization of model sizes (8M, 650M, 3B)
- Fixed capitalization of "ESM2", "Geneformer", "Python", "YAML"
- Changed "WandB" to "Weights and Biases" consistently
Formatting improvements:
- Changed code blocks consistently to include language tags
- Added proper spacing and improved paragraph formatting
- Fixed punctuation in lists and note sections
Grammar and clarity:
- Added missing commas after introductory phrases
- Fixed formatting of lists for better readability
- Made bulleted explanations more consistent

Type of changes

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [x] Documentation update
- [ ] Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

SKIP_CI - Skip all continuous integration tests
INCLUDE_NOTEBOOKS_TESTS - Execute notebook validation tests in pytest
INCLUDE_SLOW_TESTS - Execute tests labelled as slow in pytest for extensive testing

Note

By default, the notebooks validation tests are skipped unless explicitly enabled.

Authorizing CI Runs

We use copy-pr-bot (https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources.

- If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g., pull-request/123).
- If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.
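
The two authorization rules above can be read as a small decision procedure. The sketch below is only an illustration of that reading: the `PullRequest` class, its fields, and both functions are hypothetical and do not reflect copy-pr-bot's actual implementation. Tracking approvals per commit SHA is an assumption based on the requirement that approval be repeated for each new commit.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical model of the state that the rules above depend on."""
    number: int
    head_sha: str
    author_trusted: bool
    changes_trusted: bool
    # Commits that an org member approved with an "/ok to test" comment (assumed).
    approved_shas: set = field(default_factory=set)


def ci_may_run(pr: PullRequest) -> bool:
    """Return True if CI is authorized for the current head commit."""
    if pr.author_trusted and pr.changes_trusted:
        return True  # trusted user and only trusted changes: automatic
    # Otherwise approval is needed, and it applies to one commit only.
    return pr.head_sha in pr.approved_shas


def mirror_branch(pr: PullRequest) -> str:
    """Name of the pull-request/ prefixed branch, e.g. pull-request/123."""
    return f"pull-request/{pr.number}"
```

In this model, pushing a new commit changes `head_sha`, so an earlier approval no longer applies, which matches the repeated `/ok to test <sha>` comments later in this conversation.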

Usage

TODO: Add code snippet

Pre-submit Checklist

- [ ] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully

copy-pr-bot · 2025-04-11T18:05:11Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

jwilber

Added some small changes and approved:

models/geneformer.md: Changed "20M of 26M cells" to "20M of the 26M cells".
pretrain.md: Changed "Pott's model" to "Potts model" (no apostrophe). Added a comma after the word "tier" in "we are working on a free tier so a credit card...". Removed the redundant "training" after "pretraining" in "To load pretraining training and validation data with mapped UniRef90 sequences to UniRef50 clusters".
initialization-guide.md: Split the sentence "The port number for a Jupyter Lab server, default port is 8888" into "The port number for a Jupyter Lab server. The default port is 8888".
training-models.md: Fixed two compound-word issues: "context specific" -> "context-specific" and "usecase" -> "use case".

kushshah1 · 2025-04-21T20:03:45Z

@trvachov thanks for sharing this for my review. It looks like the text-related issues in the bug were fixed by the LLM, but issues with charts (outlined in point 2 of the bug report) are still remaining. What is the plan to fix these?

(Also, just a note that the "!!! note" rendering issue still appears in the GitHub preview of the file - see screenshot - but I don't know whether it will render correctly in the docs.)

trvachov · 2025-04-25T15:45:31Z

/ok to test

copy-pr-bot · 2025-04-25T15:45:34Z

/ok to test

@trvachov, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

trvachov · 2025-04-25T16:00:09Z

/ok to test bfd1cee

codecov-commenter · 2025-04-25T17:06:53Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.39%. Comparing base (192e537) to head (bfd1cee).

✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #826      +/-   ##
==========================================
- Coverage   84.40%   84.39%   -0.02%     
==========================================
  Files         138      138              
  Lines        8685     8685              
==========================================
- Hits         7331     7330       -1     
- Misses       1354     1355       +1

see 1 file with indirect coverage changes
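
The figures in the coverage table can be recomputed from the hit and line counts. The snippet below is a small sketch of that arithmetic; the function name is hypothetical, and the percentages Codecov displays may differ from these values in the last digit depending on how it rounds.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage of coverable lines that were hit."""
    return 100.0 * hits / lines


# Counts taken from the table above: Hits + Misses should equal Lines.
base_hits, base_misses = 7331, 1354
head_hits, head_misses = 7330, 1355
lines = 8685

base = coverage_percent(base_hits, lines)
head = coverage_percent(head_hits, lines)

print(f"base: {base:.4f}%  head: {head:.4f}%  change: {head - base:+.4f}%")
```

One fewer hit out of 8685 lines accounts for the small decrease reported for this documentation-only change.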

trvachov · 2025-04-25T19:07:54Z

/ok to test

copy-pr-bot · 2025-04-25T19:07:58Z

/ok to test

@trvachov, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

pstjohn · 2025-04-25T19:22:04Z

/ok to test 38cd661

trvachov · 2025-04-25T20:30:46Z

/ok to test 8e79b71

Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com>
Co-authored-by: lvojtku <lvojtku@nvidia.com>
Signed-off-by: Cory Ye <cye@nvidia.com>

Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com>
Co-authored-by: lvojtku <lvojtku@nvidia.com>
Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>

Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com>
Co-authored-by: lvojtku <lvojtku@nvidia.com>
Signed-off-by: Ubuntu <camirr@nvidia.com>

trvachov requested review from dorotat-nv, jstjohn, jwilber, malcolmgreaves and pstjohn as code owners April 11, 2025 18:05

jwilber approved these changes Apr 15, 2025

trvachov enabled auto-merge April 18, 2025 17:30

pstjohn approved these changes Apr 18, 2025

trvachov force-pushed the trvachov/docs-fix branch from 4a1b8eb to f86b91a Compare April 25, 2025 15:35

trvachov force-pushed the trvachov/docs-fix branch from f86b91a to bfd1cee Compare April 25, 2025 15:54

trvachov added this pull request to the merge queue Apr 25, 2025

github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Apr 25, 2025

Docs grammar/style cleanup

38cd661

Signed-off-by: Timur Rvachov <trvachov@nvidia.com>

trvachov force-pushed the trvachov/docs-fix branch from bfd1cee to 38cd661 Compare April 25, 2025 17:45

trvachov enabled auto-merge April 25, 2025 17:45

trvachov added the SKIP_CI label Apr 25, 2025

trvachov added this pull request to the merge queue Apr 25, 2025

lvojtku reviewed Apr 25, 2025

trvachov removed this pull request from the merge queue due to a manual request Apr 25, 2025

trvachov and others added 2 commits April 25, 2025 16:27

Update docs/docs/datasets/uniprot.md

b69c69a

Co-authored-by: lvojtku <lvojtku@nvidia.com> Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com>

Update docs/docs/datasets/CELLxGENE.md

d06447a

Co-authored-by: lvojtku <lvojtku@nvidia.com> Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com>

trvachov and others added 5 commits April 25, 2025 16:28

Update docs/docs/datasets/CELLxGENE.md

d34756e

Co-authored-by: lvojtku <lvojtku@nvidia.com> Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com>

Update docs/docs/datasets/uniprot.md

9a561c2

Co-authored-by: lvojtku <lvojtku@nvidia.com> Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com>

Update docs/docs/models/geneformer.md

01902d6

Co-authored-by: lvojtku <lvojtku@nvidia.com> Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com>

Update docs/docs/user-guide/examples/bionemo-esm2/pretrain.md

ecb155c

Co-authored-by: lvojtku <lvojtku@nvidia.com> Signed-off-by: Timur Rvachov <120140748+trvachov@users.noreply.github.com>

Merge branch 'main' into trvachov/docs-fix

8e79b71

trvachov enabled auto-merge April 25, 2025 20:30

trvachov added this pull request to the merge queue Apr 25, 2025

Merged via the queue into main with commit effc955 Apr 25, 2025
10 checks passed

trvachov deleted the trvachov/docs-fix branch April 25, 2025 21:56

6 participants