Instrument processing command line utility #266
Conversation
@tech3371 @greglucas @laspsandoval Could you provide an initial review on this? I just want to make sure that I am on the right track here and that I have designed this appropriately. Thanks!
👍 I added a few ideas to consider, but overall I think this is the right direction!
Yeah, it looks like you are heading in the right direction.
…guments; updated help descriptions
Nice. Just update the arguments.
f"The data level to process. Acceptable values are: {processing_levels}" | ||
) | ||
|
||
parser = argparse.ArgumentParser(description=description) |
This looks good. An example of the command that I was planning to use for the batch job is:
['CodiceHi', {'l2': ['2023-10-31', '2023-06-12'], 'l3': ['2023-06-01']}, 2]
where 2 is the version number.
And then I was planning on passing in the following environment:
environment={
"OUTPUT_PATH": data_bucket.bucket_name,
"SECRET_NAME": db_secret_name,
},
@laspsandoval , I don't think imap_processing will want the data bucket information or the DB secret because this is external to anything in our infrastructure. I think this is supposed to be going through our APIs to get the files instead.
I think the way this is currently written, you'd need to kick off the container by overriding the ENTRYPOINT to be something like: imap-processing --instrument codice-hi --level 2 --start-date XYZ --end-date XYZ
with command-line arguments instead of a list/dict. Is that doable on the infrastructure side?
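For reference, a rough sketch of what that override could look like on the infrastructure side, assuming the job is launched with boto3 and AWS Batch (the job name, queue, and definition below are hypothetical placeholders; the dates are taken from the example earlier in this thread):

```python
# Hypothetical sketch only: kick off the container with CLI-style arguments
# instead of a list/dict payload. All resource names are placeholders.
import boto3

batch = boto3.client("batch")

batch.submit_job(
    jobName="codice-hi-l2-processing",    # placeholder
    jobQueue="imap-processing-queue",     # placeholder
    jobDefinition="imap-processing-job",  # placeholder
    containerOverrides={
        "command": [
            "imap-processing",
            "--instrument", "codice-hi",
            "--level", "2",
            "--start-date", "2023-06-12",
            "--end-date", "2023-10-31",
        ],
    },
)
```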
@greglucas as I understand it, the algorithms will still need to query a database table, for example the universal spin table or a calibration table.
I was thinking that we had to append the S3 bucket information to the Lambda URL in order to use the upload/download APIs. But I could be mistaken. Is that not correct?
> @greglucas as I understand it, the algorithms will still need to query a database table, for example the universal spin table or a calibration table.

I think we should avoid leaking anything database-related between the processing and infrastructure. Can we make new endpoints for this instead? /calibration_lookup, /spin_table_lookup, ... whatever else we need.

> I was thinking that we had to append the S3 bucket information to the Lambda URL in order to use the upload/download APIs. But I could be mistaken. Is that not correct?

I think the Lambda should take care of this for us? So a user should only need to know the URL, but the details of where we put that in the backend shouldn't matter to them. dev.imap-processing.com/upload should map to a different bucket than prod.imap-processing.com/upload, but a user shouldn't need to update both the URL and bucket name. I'm hoping we can handle that for them.
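To make that idea concrete, a minimal sketch of how the processing side could use the upload endpoint while knowing only its environment's URL (the /upload path is from this thread; the query parameter, response field, and filename are assumptions, not the actual API):

```python
# Hypothetical sketch: the client only knows the API URL for its environment;
# which S3 bucket sits behind it is an infrastructure detail.
import requests

API_BASE = "https://dev.imap-processing.com"  # prod would use its own URL

filename = "example_output.cdf"  # placeholder filename

# Assumed flow: the endpoint returns a pre-signed URL to upload the file to.
response = requests.get(f"{API_BASE}/upload", params={"filename": filename})
response.raise_for_status()
upload_url = response.json()["upload_url"]  # assumed response field

with open(filename, "rb") as f:
    requests.put(upload_url, data=f)
```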
@greglucas we could do as you suggested for the ENTRYPOINT, but to avoid duplicates I think we should do
imap-processing --instrument codice-hi --level 2 --dates [list of dates]
We should talk about this at your processing trigger discussion meeting. If I am reprocessing an entire mission, do I want to put 5 x 365 dates onto the command line (there may even be a limit on how long the command-line input can be), versus a start / end date? Could the processing code handle the duplicates? Yet another option is to only allow one date in, so we'd spin up 5 x 365 separate processing jobs, each with one date input...
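A short sketch of how the CLI could accept a list of dates and drop duplicates before processing (the --dates flag comes from the suggestion above; the rest is an assumption, and a start/end date range would be an alternative):

```python
# Hypothetical sketch: accept one or more dates and de-duplicate them,
# preserving order, before kicking off processing for each date.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--instrument", required=True)
parser.add_argument("--level", required=True)
parser.add_argument("--dates", nargs="+", default=[],
                    help="One or more dates to process (YYYY-MM-DD)")
args = parser.parse_args()

dates = list(dict.fromkeys(args.dates))  # drop duplicate dates, keep order
for date in dates:
    print(f"Processing {args.instrument} {args.level} for {date}")  # placeholder
```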
…mmand-line-utility Instrument processing command line utility
Change Summary
Overview
This PR adds a script to be used on the command line to process a specific instrument and data level. Currently the methods within just print a message, but eventually these should contain code to perform any necessary steps to run the processing.
Relevant issue: #256
New Files
imap_processing/run_processing.py
Accepts instrument and data level arguments
Updated Files
imap_processing/__init__.py
Adds a list of instruments and a list of processing_levels, used to validate the passed arguments
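For readers skimming this summary, a condensed sketch of the shape described above (argument names follow this thread, and the validation lists come from imap_processing/__init__.py; the actual file in the PR may differ in details such as positional vs. flag arguments):

```python
# Condensed, hypothetical sketch of imap_processing/run_processing.py
import argparse

from imap_processing import instruments, processing_levels

description = (
    "Process a single instrument and data level. "
    f"Acceptable instruments are: {instruments}. "
    f"Acceptable data levels are: {processing_levels}"
)

parser = argparse.ArgumentParser(description=description)
parser.add_argument("--instrument", help="The instrument to process.")
parser.add_argument("--level", help="The data level to process.")
args = parser.parse_args()

# Validate against the lists defined in imap_processing/__init__.py
if args.instrument not in instruments:
    parser.error(f"{args.instrument!r} is not in {instruments}")
if args.level not in processing_levels:
    parser.error(f"{args.level!r} is not in {processing_levels}")

print(f"Processing {args.instrument} {args.level}")  # placeholder, per the PR
```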