feat: replace HSDS with new simdata-api service #4791

Merged 46 commits on Jan 26, 2024

@jcschaff (Collaborator) commented Jan 13, 2024

What new features does this PR implement?

The Problem

The use of HSDS (Highly Scalable Data Service) for managing HDF5 data proved complicated and unreliable. In its current configuration, HPC simulation jobs push HDF5 data to Google Cloud in two ways: (1) uploading a reports.h5 file to S3 storage, and (2) using a fine-grained protocol to push each group, dataset, and attribute of the HDF5 file to the HSDS service. Under any load, the fine-grained upload (2) often fails, which fails the simulation job and damages the state of the HSDS cluster.

Considerable effort has gone into improving the HSDS configuration, but without success.

Approach

Since the HDF5 files are reliably uploaded to S3, a simple new service that serves HDF5 data directly from S3 storage should be sufficiently performant. The new Python-based service, simdata-api, uses h5py, aiobotocore, and TensorStore and is served with FastAPI to provide access to HDF5 metadata and dataset arrays. The generated typescript-nestjs API client provides a simple interface to this service. On the first request, the service retrieves the reports.h5 file from S3, extracts the metadata and array data, and stores them in a Zarr version 3 store in the same S3 bucket as the HDF5 file. The metadata is also stored as a single JSON file in S3 for fast retrieval.
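The "extract on first request, then serve from the cached copy" flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the simdata-api implementation: a plain dict stands in for the S3 bucket, the key layout and the names `metadata_key`, `get_metadata`, and `fake_extract` are hypothetical, and the h5py/TensorStore extraction step is replaced by a caller-supplied function.

```python
import json
from typing import Callable, Dict

# Hypothetical stand-in for an S3 bucket: object key -> object bytes.
Bucket = Dict[str, bytes]


def metadata_key(run_id: str) -> str:
    # Assumed key layout; the real service may name its objects differently.
    return f"simulations/{run_id}/metadata.json"


def get_metadata(bucket: Bucket, run_id: str,
                 extract: Callable[[bytes], dict]) -> dict:
    """Return metadata for a run, extracting and caching it on first request."""
    key = metadata_key(run_id)
    cached = bucket.get(key)
    if cached is not None:
        # Fast path: the JSON file was written by an earlier request.
        return json.loads(cached)
    # Slow path: read the uploaded HDF5 file, extract metadata, cache as JSON.
    h5_bytes = bucket[f"simulations/{run_id}/reports.h5"]
    metadata = extract(h5_bytes)
    bucket[key] = json.dumps(metadata).encode("utf-8")
    return metadata


# Usage with a fake extractor that records how often it is called.
calls = []


def fake_extract(data: bytes) -> dict:
    calls.append(1)
    return {"size": len(data)}


bucket: Bucket = {"simulations/run1/reports.h5": b"not-a-real-hdf5-file"}
first = get_metadata(bucket, "run1", fake_extract)
second = get_metadata(bucket, "run1", fake_extract)  # served from the cache
```

In this sketch the second call never touches the HDF5 bytes, which is the property that lets repeated metadata requests stay cheap.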

What bugs does this PR fix?
Frequent failures in simulation jobs, especially when run in batches.

How have you tested this PR?
The simdata-api Python code and some of the TypeScript metadata utilities have been unit tested.
Local integration testing with live services will follow.

@jcschaff jcschaff self-assigned this Jan 13, 2024
@probot-autolabeler probot-autolabeler bot added BioSimulations-API build CI/CD Issues with continuous integration and deployment docker Pull requests that update Docker code simulation-service labels Jan 13, 2024

gitguardian bot commented Jan 13, 2024

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id Secret Commit Filename
9207574 Generic CLI Secret 435953d apps/dispatch-service/src/app/services/sbatch/expected_sbatch.ts View secret
9207574 Generic CLI Secret 435953d apps/dispatch-service/src/app/services/sbatch/expected_sbatch.ts View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.


apps/simdata-api/simdata_api/main.py Fixed

nx-cloud bot commented Jan 13, 2024

☁️ Nx Cloud Report

CI has finished running commands for commit bbb29e0.

✅ Successfully ran 9 targets

@probot-autolabeler probot-autolabeler bot added the testing Issues relating to tests label Jan 15, 2024
        aws_access_key_id=settings.storage_access_key_id,
        aws_secret_access_key=settings.storage_secret) as s3_client:
    obj = await s3_client.get_object(Bucket=bucket_name, Key=s3_path)
    async with aiofiles.open(file_path, mode='wb') as f:

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.
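One common way to address this kind of alert is to resolve the requested path and check that it stays inside a known base directory before opening it. The sketch below is an assumption about how such a check could look, not the fix that was applied in this PR; `safe_local_path` and the example directory are hypothetical names.

```python
from pathlib import Path


def safe_local_path(base_dir: Path, user_path: str) -> Path:
    """Join user_path onto base_dir, rejecting anything that escapes base_dir."""
    base = base_dir.resolve()
    # resolve() collapses ".." segments; an absolute user_path replaces base.
    candidate = (base / user_path).resolve()
    try:
        # relative_to raises ValueError when candidate is not under base.
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


# Usage: a normal relative path is accepted, a traversal attempt is rejected.
cache_dir = Path("/tmp/simdata-cache")  # hypothetical local cache directory
ok_path = safe_local_path(cache_dir, "run1/reports.h5")
try:
    safe_local_path(cache_dir, "../../etc/passwd")
    rejected = False
except ValueError:
    rejected = True
```

The returned path can then be passed to the file-opening call in place of the raw user-derived value.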
if kvstore_driver == 'file':
    test_file_path = ROOT_DIR / test_file_path

test_array = np.random.rand(100, 200)

Check notice

Code scanning / SonarCloud

numpy.random.Generator should be preferred to numpy.random.RandomState Low

Use a "numpy.random.Generator" here instead of this legacy function.
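For reference, the Generator-based equivalent of the flagged `np.random.rand(100, 200)` call might look like the sketch below. The seed value is an arbitrary choice made here so the test data is reproducible; it is not taken from the PR.

```python
import numpy as np

# Create a Generator; the fixed seed (an assumption) makes the array reproducible.
rng = np.random.default_rng(seed=42)

# Same shape and [0, 1) uniform distribution as np.random.rand(100, 200).
# Note that Generator.random takes the shape as a single tuple argument.
test_array = rng.random((100, 200))
```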

sonarcloud bot commented Jan 26, 2024

Quality Gate failed

Failed conditions

1 Security Hotspot


@jcschaff jcschaff marked this pull request as ready for review January 26, 2024 05:23
@jcschaff jcschaff merged commit e7d1272 into dev Jan 26, 2024
24 of 28 checks passed
@jcschaff jcschaff deleted the combine-hdf5-api branch January 26, 2024 05:25
@biosimulations-daemon (Collaborator) commented

🎉 This PR is included in version 9.55.0 🎉

The release is available as a GitHub release.

Your semantic-release bot 📦🚀

Labels
BioSimulations-API build CI/CD Issues with continuous integration and deployment docker Pull requests that update Docker code released simulation-service testing Issues relating to tests
2 participants