Remove --dry-run CLI option and references #326

Merged
merged 7 commits into from
Jun 12, 2024

Conversation

samleonard
Contributor

@samleonard samleonard commented Apr 21, 2024

Overview

Closes #208.

What problem does this address?

The --dry-run option does not adequately simulate running the archiver, so it's better to rely on the --auto-publish option.

What did you change in this PR?

I removed all references to the --dry-run option.

Testing

How did you make sure this worked? How can a reviewer verify this?

To find all references to the flag I searched the repository for "dry".

I was not able to run any tests because the dependencies currently have a conflict. This repo stipulates "ruff>=0.3,<0.4" but the pudl repository recently updated ruff to 0.4.1.

Let me know if there are any testing steps I should take!

To-do list

Tasks

  1. Add other TODO items here if necessary: questions that need to be answered, decisions that need to be made, tests that need to be run, etc.
  2. Update relevant documentation, such as comments, docstrings, README, release notes, etc.
  3. Review the PR yourself and call out any questions or issues you have

@jdangerx
Member

Thanks, this does look like it should do the trick! Let's fix the ruff thing too - I think we should be totally fine to update ruff to >=0.4, <0.5 here. Then I think you should be able to install the environment and actually try things out!
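To make the version conflict concrete, here is a minimal sketch of the range check, assuming plain dotted numeric versions (no pre-release tags) and using 0.4.1 as an example 0.4-series release. The helper names `parse_version` and `in_range` are hypothetical and are not part of pip, ruff, or this repo.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as "0.4.1" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True when lower <= version < upper (numeric parts only)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# The old pin (>=0.3,<0.4) rejects a 0.4-series release;
# the proposed pin (>=0.4,<0.5) accepts it.
old_pin_ok = in_range("0.4.1", "0.3", "0.4")
new_pin_ok = in_range("0.4.1", "0.4", "0.5")
print(old_pin_ok, new_pin_ok)  # prints: False True
```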

You'll also need API tokens to test the Zenodo interactions. You should be able to create your own account on the Zenodo sandbox and get your API tokens that way - let me know if you run into hiccups there!

@samleonard
Contributor Author

@jdangerx

I got my environment set up by running the following:

conda env create -f environment.yml
conda init zsh # I needed this since I am new to conda
source ~/.zshrc # reload shell for conda init to take effect
conda activate pudl-cataloger
pip install .

Weirdly I then had to manually copy the zenodo_doi.yml file into /.../miniforge/base/envs/pudl-cataloger/lib/python3.12/site-packages/pudl_archiver/utils.py. Not sure what I did wrong.

For my API keys I created an account on the Zenodo sandbox, went to Settings -> Applications -> Personal access tokens, and created a token with publish and upload access. I then exported that token value as both ZENODO_SANDBOX_TOKEN_UPLOAD and ZENODO_SANDBOX_TOKEN_PUBLISH in my shell environment.
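A quick way to confirm that both variables are visible to Python before running the archiver might look like the sketch below. The two variable names come from the comment above; the `missing_tokens` helper is hypothetical and is not part of pudl_archiver.

```python
import os

# Variable names as given in the comment above.
REQUIRED_TOKENS = ("ZENODO_SANDBOX_TOKEN_UPLOAD", "ZENODO_SANDBOX_TOKEN_PUBLISH")


def missing_tokens(env=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_TOKENS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_tokens()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Both sandbox tokens are set.")
```

This only checks that the variables exist in the current shell environment. It says nothing about whether the token has the right scopes or owns the target record.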

Running pudl_archiver --sandbox --datasets eia176 --dry-run prints the usage message, as expected.

Running the same command without --dry-run encounters the following error:

2024-05-01 15:06:48 [    INFO] catalystcoop.pudl_archiver.utils:57 Error while executing <coroutine object ZenodoAPIClient._make_requester.<locals>.requester.<locals>.run_request at 0x17e40fe20> (try #6, retry in 64s): <class 'pudl_archiver.depositors.zenodo.depositor.ZenodoClientError'> - ZenodoClientError(status=403, message=Permission denied., errors=None)

Not sure what I did wrong to cause that either.

I'm new to basically all of these tools so apologies if the solutions are obvious!

@katie-lamb
Member

Hey @samleonard ! Thanks for the PR. @jdangerx meet Sam; thanks for the review! Dazhong, Sam is my friend from our youth climbing team.

Seems like the Zenodo API issue is the only roadblock here. I'm not very familiar with the archiver, but here are some guesses, maybe you have thoughts Dazhong:

  • does testing this require creating and initializing a new Zenodo deposition? Does that have something to do with manually copying zenodo_doi.yml?
  • could the issue be from not importing the Zenodo Postman collection (see here)?

As a side note, we just got a grant to clarify all of the PUDL docs and this archiver repo readme is ripe for a rewrite. (cc @aesharpe )

@jdangerx
Member

jdangerx commented Jun 7, 2024

Hi @samleonard ! Sorry for dropping the ball on this, and thanks @katie-lamb for bumping.

> Weirdly I then had to manually copy the zenodo_doi.yml file into /.../miniforge/base/envs/pudl-cataloger/lib/python3.12/site-packages/pudl_archiver/utils.py. Not sure what I did wrong.

That is weird! Do you mean that you had to put the text of the YAML into the utils.py somehow? What happens if you don't do that?


I think you're running into that 403 because the Zenodo record you're trying to update belongs to our sandbox Zenodo account, not yours. The mapping from dataset name (eia176) to Zenodo record is in that zenodo_doi.yml file - you can pull the record ID out of the YAML pretty easily:

eia176:
  production_doi: 10.5281/zenodo.7682357
  sandbox_doi: 10.5072/zenodo.3158

This corresponds to https://sandbox.zenodo.org/records/3158. If you follow that link you'll also see that the record belongs to Catalyst Cooperative. There's also some funny "redirect to latest version" stuff going on; check out their versioning docs for details.
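As a rough sketch of that DOI-to-record mapping: the record ID is the number after `zenodo.` in the DOI, and the sandbox URL pattern is the one shown in the link above. The function names here are hypothetical and are not taken from pudl_archiver.

```python
def record_id_from_doi(doi: str) -> int:
    """Extract the numeric record ID from a DOI like '10.5072/zenodo.3158'."""
    _, _, suffix = doi.partition("/")
    marker = "zenodo."
    if not suffix.startswith(marker):
        raise ValueError(f"Not a Zenodo-style DOI: {doi!r}")
    return int(suffix[len(marker):])


def sandbox_record_url(doi: str) -> str:
    """Build the sandbox record URL for a sandbox DOI."""
    return f"https://sandbox.zenodo.org/records/{record_id_from_doi(doi)}"


print(sandbox_record_url("10.5072/zenodo.3158"))
# prints: https://sandbox.zenodo.org/records/3158
```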

In any case, I'm curious what happens if you run with --initialize also. That should, in theory:

  • create a new record using your API key
  • update it with files
  • update the zenodo_doi.yml to map that dataset name to your version of the record

I'm suspicious that that last step will run into issues since you had to do something funny with the zenodo_doi.yml to get things to work at all, so we'll see what happens there.

Don't worry about making a bunch of spurious records on the Zenodo sandbox, that's literally what it's for and they'll wipe it if things get too bloated.

Happy to get on a call if you want to talk through this more!

@e-belfer e-belfer self-requested a review June 12, 2024 13:44
Member

@e-belfer e-belfer left a comment

Thanks for taking on this issue, and for bearing with us and our out of date docs / complex testing set up for external contributors. I was able to test this locally and it worked as expected, so I'll go ahead and merge it into main!

@e-belfer e-belfer merged commit 3de6699 into catalyst-cooperative:main Jun 12, 2024
1 check passed
@samleonard samleonard deleted the remove-dry-run branch June 13, 2024 00:44
@samleonard
Contributor Author

@e-belfer @jdangerx Sorry for the delay, I've had a busy week! And thanks for testing and merging!

I just tried the setup again from scratch to double check that I get the same issue with the zenodo_doi.yml file, and I did. I've included the error traceback at the bottom of this comment. The solution I used was to cp the file to the location where the code was expecting it.

Happy to report that running with --initialize did fix the other issue!

I'm also happy to jump on a call if it would be helpful for demonstrating the problem, or improving the docs, or just for fun.

(test-pudl-cataloger) sleonard pudl-archiver % pudl_archiver --sandbox --datasets eia176
2024-06-12 18:59:41 [    INFO] catalystcoop.pudl_archiver.archivers.classes:122 Archiving eia176
Encountered exceptions, showing traceback for last one: ["('eia176', FileNotFoundError(2, 'No such file or directory'))"]
Traceback (most recent call last):
  File "/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/bin/pudl_archiver", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/lib/python3.12/site-packages/pudl_archiver/cli.py", line 97, in main
    asyncio.run(archiver_entry())
  File "/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/lib/python3.12/site-packages/pudl_archiver/cli.py", line 92, in archiver_entry
    await archive_datasets(datasets=datasets, run_settings=RunSettings(**vars(args)))
  File "/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/lib/python3.12/site-packages/pudl_archiver/__init__.py", line 109, in archive_datasets
    raise exceptions[-1][1]
  File "/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/lib/python3.12/site-packages/pudl_archiver/orchestrator.py", line 24, in orchestrate_run
    draft, original_datapackage = await get_deposition(dataset, session, run_settings)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/lib/python3.12/site-packages/pudl_archiver/depositors/__init__.py", line 64, in get_deposition
    deposition = await api_client.get_deposition(dataset)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/lib/python3.12/site-packages/pudl_archiver/depositors/zenodo/depositor.py", line 445, in get_deposition
    concept_doi = self.doi(dataset_id)
                  ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/lib/python3.12/site-packages/pudl_archiver/depositors/zenodo/depositor.py", line 395, in doi
    dataset_settings = self.dataset_settings.get(dataset_id, DatasetSettings())
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/lib/python3.12/site-packages/pudl_archiver/depositors/zenodo/depositor.py", line 386, in dataset_settings
    with Path.open(self._dataset_settings_path) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/lib/python3.12/pathlib.py", line 1013, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/Caskroom/miniforge/base/envs/test-pudl-cataloger/lib/python3.12/site-packages/pudl_archiver/package_data/zenodo_doi.yaml'
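The traceback shows the installed package looking for `package_data/zenodo_doi.yaml` inside its own directory under site-packages. One way to check whether an installed copy actually contains that file is sketched below. The `missing_data_files` helper is hypothetical, and the relative path is the one from the error message above.

```python
from pathlib import Path


def missing_data_files(package_dir: Path, relative_paths: list[str]) -> list[str]:
    """Return the relative paths that do not exist as files under package_dir."""
    return [rel for rel in relative_paths if not (package_dir / rel).is_file()]


# Example usage, assuming pudl_archiver is importable in the active environment:
#
#     import pudl_archiver
#     package_dir = Path(pudl_archiver.__file__).parent
#     print(missing_data_files(package_dir, ["package_data/zenodo_doi.yaml"]))
```

An empty list means the file was installed alongside the code. A non-empty list suggests the data file did not make it into the installed package.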

Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Remove --dry-run option in favor of --auto-publish
5 participants