feat: replace HSDS with new simdata-api service #4791
Conversation
| GitGuardian id | Secret | Commit | Filename |
|---|---|---|---|
| 9207574 | Generic CLI Secret | 435953d | apps/dispatch-service/src/app/services/sbatch/expected_sbatch.ts |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely, following best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not trivial: you might break other contributors' workflows, and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:
- following best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit hooks to catch secrets before they leave your machine and to ease remediation
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
☁️ Nx Cloud Report

CI has finished running commands for commit bbb29e0. ✅ Successfully ran 9 targets.
```python
        aws_access_key_id=settings.storage_access_key_id,
        aws_secret_access_key=settings.storage_secret) as s3_client:
    obj = await s3_client.get_object(Bucket=bucket_name, Key=s3_path)
    async with aiofiles.open(file_path, mode='wb') as f:
```
Check failure — Code scanning / CodeQL

Uncontrolled data used in path expression (High): this path depends on a user-provided value.
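One common way to address this class of CodeQL finding is to resolve the user-provided path against a trusted base directory and reject anything that escapes it. The sketch below is illustrative, not the simdata-api implementation; the function name `resolve_safe` is hypothetical.

```python
from pathlib import Path

def resolve_safe(base_dir: Path, user_path: str) -> Path:
    # Resolve the requested path and ensure it stays inside base_dir,
    # rejecting traversal sequences such as "../" or absolute paths.
    candidate = (base_dir / user_path).resolve()
    if not candidate.is_relative_to(base_dir.resolve()):
        raise ValueError(f"path escapes base directory: {user_path}")
    return candidate
```

`Path.is_relative_to` requires Python 3.9 or later; on older versions the equivalent check is `candidate.relative_to(...)` inside a `try` block.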
```python
if kvstore_driver == 'file':
    test_file_path = ROOT_DIR / test_file_path

test_array = np.random.rand(100, 200)
```
Check notice — Code scanning / SonarCloud

`numpy.random.Generator` should be preferred to `numpy.random.RandomState` (Low).
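The SonarCloud notice refers to NumPy's legacy random API (`np.random.rand` / `RandomState`); the recommended replacement is the `Generator` API via `numpy.random.default_rng`. A minimal equivalent of the flagged line:

```python
import numpy as np

# Generator API: preferred over the legacy RandomState / np.random.rand.
rng = np.random.default_rng(seed=42)   # seed shown for reproducibility
test_array = rng.random((100, 200))    # uniform [0, 1), same shape as before
```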
Force-pushed from 406eca8 to c274966.
🎉 This PR is included in version 9.55.0 🎉 The release is available on GitHub release.

Your semantic-release bot 📦🚀
What new features does this PR implement?
The Problem
The use of HSDS (Highly Scalable Data Service) for managing HDF5 data was found to be complicated and unreliable. In the current configuration, HPC simulation jobs push HDF5 data to Google Cloud in two ways: (1) uploading a reports.h5 file to S3 storage, and (2) using a fine-grained protocol to push each group, dataset, and attribute of the HDF5 file to the HSDS service. Under any load, the fine-grained upload (2) often fails (which fails simulation jobs) and corrupts the state of the HSDS cluster.
Considerable effort has gone into improving the HSDS configuration, but without success.
Approach
Since the HDF5 files are reliably uploaded via (1), a simple new service that serves HDF5 data from S3 storage would be sufficiently performant. A new Python-based service, `simdata-api`, uses `h5py`, `aiobotocore`, and `TensorStore`, and is served with `FastAPI` to provide access to HDF5 metadata and dataset arrays. The generated typescript-nestjs API client provides a simple interface to this service. Upon first request, the service retrieves the reports.h5 document from S3, extracts metadata and array data, and stores them in a `zarr version 3` store in the same S3 bucket as the HDF5 file. The metadata is also stored as a single JSON file in S3 for fast retrieval.

What bugs does this PR fix?
Frequent failures in simulation jobs, especially when jobs are run in batches.
How have you tested this PR?
The simdata-api Python code and some TypeScript metadata utilities have been unit tested.
Local integration testing with live services will follow.
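The lazy-caching flow described in the Approach (fetch and parse reports.h5 once, then serve the extracted metadata from a single cached JSON document on later requests) can be sketched roughly as below. This is a hedged illustration, not the simdata-api code: `get_metadata` and `fetch_hdf5_metadata` are hypothetical names, and the real service caches to S3 rather than the local filesystem.

```python
import json
from pathlib import Path

def get_metadata(cache_path: Path, fetch_hdf5_metadata) -> dict:
    # Cache hit: later requests read the pre-extracted JSON directly,
    # avoiding a round trip through the HDF5 file.
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    # Cache miss (first request): fetch and parse reports.h5, then
    # persist the metadata as a single JSON document for fast retrieval.
    metadata = fetch_hdf5_metadata()
    cache_path.write_text(json.dumps(metadata))
    return metadata
```

The same populate-on-first-request pattern applies to the dataset arrays, which the service materializes into a zarr v3 store alongside the HDF5 file.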