Support processing of PDS products with fz file format in Nucleus #74

Closed
2 tasks done
ramesh-maddegoda opened this issue Oct 25, 2023 · 9 comments

Comments

@ramesh-maddegoda
Contributor

ramesh-maddegoda commented Oct 25, 2023

Some of the PDS products processed by Nucleus contain only a compressed version of the data (fitz) rather than the uncompressed version (fits), in order to save space when archiving the data. In addition, the product label XML refers to the .fits file, while the file actually delivered with the product label is in .fitz format. This makes it difficult to determine when a complete product has been received in the staging S3 bucket, where the files are uploaded to be processed by Nucleus.

This requires the following:

  • Modify the implementation of the product completion detection algorithm to handle this special case
  • Implement a Nucleus component to extract fitz files into the fits file format
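
A minimal sketch of how the completeness check could tolerate compressed files is shown below. It is an illustration under stated assumptions, not the logic used in Nucleus: the function names are hypothetical, the label is assumed to be a PDS4 XML label that lists data files in `file_name` elements, and the compressed naming variants (`x.fits.fz`, `x.fz`, `x.fitz`) are guesses, since the issue does not say which convention the data providers follow.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set


def expected_files_from_label(label_xml: str) -> List[str]:
    """Collect the text of every file_name element, ignoring XML namespaces."""
    root = ET.fromstring(label_xml)
    names = []
    for elem in root.iter():
        local_tag = elem.tag.rsplit("}", 1)[-1]  # drop "{namespace}" prefix
        if local_tag == "file_name" and elem.text:
            names.append(elem.text.strip())
    return names


def compressed_candidates(expected: str) -> Set[str]:
    """Assumed names under which a compressed copy of `expected` may arrive."""
    candidates = {expected + ".fz"}
    if expected.lower().endswith(".fits"):
        stem = expected[: -len(".fits")]
        candidates.update({stem + ".fz", stem + ".fitz"})
    return candidates


def is_product_complete(expected_files: Iterable[str],
                        received_files: Iterable[str]) -> bool:
    """True if each expected file, or a compressed variant of it, was received."""
    received = set(received_files)
    for name in expected_files:
        if name in received:
            continue
        if compressed_candidates(name) & received:
            continue
        return False
    return True
```

With this approach a label that references `img_001.fits` would count as complete once `img_001.fits.fz` is present in the staging bucket, and the extraction step can then produce the `.fits` file that the label expects.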

References:

@ramesh-maddegoda
Contributor Author

It was necessary to build an Amazon Linux compatible version of fpack/funpack from the source code.

Download the latest 4.3.0 version of CFITSIO
See what's new
Latest V4.3.0 source code package - Compile It Yourself
Unix .tar file cfitsio-4.3.0.tar.gz - see the README file for instructions. The .tar file unpacks its files into a directory named 'cfitsio-4.3.0'.
(The latest version is always available from this link: cfitsio_latest.tar.gz).

zlib was missing on the Amazon Linux EC2 instance that was used for the build. Therefore, it was necessary to build zlib as well.

./configure
make -j4
make install
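
Once funpack is built, a component could call it to turn a .fz file back into a .fits file. The following is a minimal sketch rather than the implementation from the pull request: the function names are hypothetical, the `funpack` binary is assumed to be on the PATH, and the output name is set with funpack's `-O` option (worth confirming against `funpack -H` for the installed version).

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def default_output_path(src: Path) -> Path:
    """Hypothetical naming rule: drop a trailing .fz (a.fits.fz -> a.fits)."""
    if src.suffix != ".fz":
        raise ValueError(f"not an .fz file: {src}")
    return src.with_suffix("")


def funpack_command(src: Path, dest: Path,
                    funpack_bin: str = "funpack") -> List[str]:
    """Build the argument list; -O names the output file explicitly."""
    return [funpack_bin, "-O", str(dest), str(src)]


def extract_fz(src: Path, dest: Optional[Path] = None,
               funpack_bin: str = "funpack") -> Path:
    """Run funpack and return the path of the uncompressed file."""
    if dest is None:
        dest = default_output_path(src)
    # check=True raises CalledProcessError if funpack exits with an error.
    subprocess.run(funpack_command(src, dest, funpack_bin), check=True)
    return dest
```

Passing `dest` explicitly lets the caller match the file name given in the product label when the compressed file does not follow the `name.fits.fz` pattern.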

@jordanpadams
Member

Status: Ramesh working on architecture and implementation in Nucleus

@tloubrieu-jpl
Member

The version of validate used by Ramesh does not validate the CSS products. Ramesh will upgrade it.

ramesh-maddegoda added a commit that referenced this issue Nov 16, 2023
ADD PDS Basic Registry Load Use Case DAG with the ability to read CloudWatch logs in Airflow UI
UPDATE existing lambda to directly trigger Nucleus workflows, process PDS product labels as batches in Nucleus and code quality improvements

Refer to the following tasks:

- Display the logs of ECS Tasks in Airflow UI #77
- Prepare product labels as batches to be processed by Nucleus #75
- Support processing of PDS products with fitz file format in Nucleus #74

@ramesh-maddegoda
Contributor Author

The pull request is available at #78

@ramesh-maddegoda
Contributor Author

The Lambda-based implementation made for this was working, but it was expensive because the Lambda function spends a large amount of time downloading and extracting the .fz file. It was decided to come up with a more cost-effective option to download and extract .fz files.

@ramesh-maddegoda changed the title from "Support processing of PDS products with fitz file format in Nucleus" to "Support processing of PDS products with fz file format in Nucleus" on Nov 29, 2023

@tloubrieu-jpl
Member

@ramesh-maddegoda is working on the new, more cost-effective design using DataSync.

@tloubrieu-jpl
Member

90% of the development is done.

@jordanpadams
Member

jordanpadams commented Dec 12, 2023

Status: @ramesh-maddegoda is working through some issues with duplicate data in the database. The Lambda is working as expected and is able to trigger the Nucleus workflow. Harvest is failing because the OpenSearch connection is failing; the reason is TBD. A PR is to be submitted before vacation.

@jordanpadams
Member

Per #80, closing this out with the initial implementation completed. If additional work is required, we will open more tickets.

Projects
Status: 🏁 Done

3 participants