
Instrument processing command line utility #266

Merged

Conversation

@bourque (Collaborator) commented Oct 17, 2023

Change Summary

Overview

This PR adds a script that can be run on the command line to process a specific instrument and data level. Currently the methods within only print a message; eventually they should contain the code needed to perform each processing step.

Relevant issue: #256

New Files

  • imap_processing/run_processing.py
    • The script that can be run on the command line by passing in parameters for instrument and data level

Updated Files

  • imap_processing/__init__.py
    • Added list of instruments and list of processing_levels, used to validate the passed arguments
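
The argument validation described above might look roughly like the following minimal sketch. The instrument names, level names, and flag spellings here are illustrative assumptions, not the PR's actual code; the point is that `argparse` `choices` can validate against the lists added to `imap_processing/__init__.py`:

```python
import argparse

# Hypothetical lists mirroring those added to imap_processing/__init__.py
instruments = ["codice", "glows", "hit", "mag", "swapi", "swe"]
processing_levels = ["l0", "l1a", "l1b", "l2", "l3"]

def parse_args(argv=None):
    """Parse and validate instrument/level arguments via argparse choices."""
    parser = argparse.ArgumentParser(description="Run instrument processing.")
    parser.add_argument("--instrument", required=True, choices=instruments,
                        help=f"The instrument to process. Acceptable values are: {instruments}")
    parser.add_argument("--level", required=True, choices=processing_levels,
                        help=f"The data level to process. Acceptable values are: {processing_levels}")
    return parser.parse_args(argv)

# Example invocation with explicit argv (avoids reading sys.argv here)
args = parse_args(["--instrument", "mag", "--level", "l2"])
print(args.instrument, args.level)
```

Passing a value outside either list makes `argparse` exit with a usage error, so the validation lists do double duty as documentation in the `--help` output.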

@bourque (Author) commented Oct 17, 2023

@tech3371 @greglucas @laspsandoval Could you provide an initial review of this? I just want to make sure that I am on the right track here and that I have designed this appropriately. Thanks!

@greglucas (Collaborator) left a comment:
👍 I added a few ideas to consider, but overall I think this is the right direction!

@tech3371 (Contributor) left a comment:

Yeah, it looks like you are heading in the right direction.

@bourque bourque marked this pull request as draft October 17, 2023 22:08
@laspsandoval (Contributor) left a comment:

Nice. Just update the arguments.

Inline review context from imap_processing/run_processing.py:

        f"The data level to process. Acceptable values are: {processing_levels}"
    )

    parser = argparse.ArgumentParser(description=description)
@laspsandoval (Contributor) commented:

This looks good. An example of the input that I was planning to use for the batch job:

['CodiceHi', {'l2': ['2023-10-31', '2023-06-12'], 'l3': ['2023-06-01']}, 2]

where 2 is the version number.

And then I was planning on passing in the following environment:

            environment={
                "OUTPUT_PATH": data_bucket.bucket_name,
                "SECRET_NAME": db_secret_name,
            },
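
Purely for illustration, the list/dict descriptor above could be translated into one command-line invocation per (level, date) pair. The `imap-processing` flag names here are assumptions for the sketch, not confirmed interfaces:

```python
# Hypothetical batch-job descriptor in the format shown above:
# [instrument, {level: [dates...]}, version]
job = ["CodiceHi", {"l2": ["2023-10-31", "2023-06-12"], "l3": ["2023-06-01"]}, 2]

def descriptor_to_commands(descriptor):
    """Expand a batch descriptor into one CLI command per (level, date) pair."""
    instrument, levels, version = descriptor
    commands = []
    for level, dates in levels.items():
        for date in dates:
            commands.append(
                ["imap-processing",
                 "--instrument", instrument.lower(),
                 "--level", level,
                 "--date", date,
                 "--version", str(version)]
            )
    return commands

for cmd in descriptor_to_commands(job):
    print(" ".join(cmd))
```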

@greglucas (Collaborator) replied:

@laspsandoval, I don't think imap_processing will want the data bucket information or the DB secret, because this is external to anything in our infrastructure. I think this is supposed to be going through our APIs to get the files instead.

I think the way this is written currently you'd need to kick off the container by overriding the ENTRYPOINT to be something like: imap-processing --instrument codice-hi --level 2 --start-date XYZ --end-date XYZ with command line arguments instead of a list/dict. Is that doable on the infrastructure side?
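
If the job is launched on AWS Batch/ECS, overriding the entrypoint with explicit command-line arguments might look roughly like this sketch. The structure mirrors the ECS-style `containerOverrides` shape; the dates and values are placeholders, not values from this PR:

```python
# Sketch of an ECS/Batch-style container override replacing the default
# command with explicit CLI flags (all values are illustrative placeholders).
container_overrides = {
    "command": [
        "imap-processing",
        "--instrument", "codice-hi",
        "--level", "2",
        "--start-date", "2023-06-01",
        "--end-date", "2023-06-02",
    ],
}
print(" ".join(container_overrides["command"]))
```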

@laspsandoval (Contributor) replied:

@greglucas as I understand it the algorithms will still need to query a database table, for example the universal spin table or a calibration table.

I was thinking that we had to append the s3 bucket information to the lambda url in order to use the upload/download api's. But I could be mistaken. Is that not correct?

@greglucas (Collaborator) replied:

> @greglucas as I understand it the algorithms will still need to query a database table, for example the universal spin table or a calibration table.

I think we should avoid leaking anything database related between the processing and infrastructure. Can we make new endpoints for this instead?
/calibration_lookup, /spin_table_lookup, ... whatever else we need

> I was thinking that we had to append the s3 bucket information to the lambda url in order to use the upload/download api's. But I could be mistaken. Is that not correct?

I think the Lambda should take care of this for us? So a user should only need to know the URL, but the details of where we put that in the backend shouldn't matter to them.

dev.imap-processing.com/upload should map to a different bucket than prod.imap-processing.com/upload, but a user shouldn't need to update both the url and bucket name. I'm hoping we can handle that for them.
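
A sketch of how the processing side might build requests against such endpoints. The endpoint names are the hypothetical ones floated above (`/spin_table_lookup`, etc.), and the base URL is illustrative; the point is that the client only needs a URL, never a bucket name or DB secret:

```python
# Sketch: processing code asks the SDC API for ancillary data instead of
# querying the database or S3 directly. Endpoint names and base URL are
# illustrative assumptions from the discussion, not real interfaces.
from urllib.parse import urlencode

BASE_URL = "https://dev.imap-processing.com"  # illustrative

def lookup_url(endpoint, **params):
    """Build a query URL for a hypothetical ancillary-data endpoint."""
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{endpoint}?{query}"

print(lookup_url("spin_table_lookup",
                 start_date="2023-06-01", end_date="2023-06-02"))
```

Swapping `dev` for `prod` in `BASE_URL` is then the only change needed between environments, which matches the goal of not making users track both a URL and a bucket name.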

@laspsandoval (Contributor) replied:

@greglucas we could do as you suggested for the ENTRYPOINT, but to avoid duplicates I think we should do

imap-processing --instrument codice-hi --level 2 --dates [list of dates]

@greglucas (Collaborator) replied:

We should talk about this at your processing-trigger discussion meeting. If I am reprocessing an entire mission, do I want to put 5 × 365 dates on the command line (there may even be a limit on how long the command line can be), versus a start/end date? Could the processing code handle the duplicates? Yet another option is to only allow one date in, so we'd spin up 5 × 365 separate processing jobs, each with a single date input...
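
The last option above, expanding a start/end range into single-date jobs, can be sketched as follows (function and flag names are illustrative, not this repository's API):

```python
# Sketch: expand a start/end date range into one processing job per day,
# as an alternative to passing thousands of dates on the command line.
from datetime import date, timedelta

def daily_jobs(instrument, level, start, end):
    """Yield one single-date job command per day in [start, end] inclusive."""
    current = start
    while current <= end:
        yield ["imap-processing", "--instrument", instrument,
               "--level", level, "--date", current.isoformat()]
        current += timedelta(days=1)

jobs = list(daily_jobs("codice-hi", "l2", date(2023, 6, 1), date(2023, 6, 3)))
print(len(jobs))  # 3 jobs, one per day
```

Each job then has a trivially short command line, at the cost of launching many more containers for a full-mission reprocess.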

@bourque bourque marked this pull request as ready for review December 13, 2023 16:32
@bourque bourque changed the title from "[WIP] Instrument processing command line utility" to "Instrument processing command line utility" on Dec 13, 2023
@bourque bourque merged commit b577fd6 into IMAP-Science-Operations-Center:dev Dec 13, 2023
14 checks passed
@bourque bourque deleted the command-line-utility branch December 13, 2023 17:01
laspsandoval pushed a commit to laspsandoval/imap_processing that referenced this pull request Apr 2, 2024
…mmand-line-utility

Instrument processing command line utility