Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out better API key management #181

Closed
dsweber2 opened this issue Sep 19, 2023 · 20 comments · Fixed by #217
Closed

Figure out better API key management #181

dsweber2 opened this issue Sep 19, 2023 · 20 comments · Fixed by #217
Assignees
Labels
P2 low priority

Comments

@dsweber2
Copy link
Contributor

See the tooling notes for Sept 19th

@dsweber2 dsweber2 added the P2 low priority label Sep 19, 2023
@dshemetov
Copy link
Contributor

dshemetov commented Sep 22, 2023

So here is my thinking for what to do after this week's discussions:

  1. move away from R options; switch set_api_key() to use env vars now and remove the options check. R options aren't great for long-term safety (since they don't get read into by something like .Renviron) and they aren't very user friendly anyway.
  2. make sure we point users to usethis::edit_r_environ() for a decent persistence solution
  3. write a vignette that educates users about API keys, environment variables, and .Renviron
  4. research other persistence libraries (keyrings, gitcreds, etc.) to see if there's something that can support the ideal workflow

Hold on, what problem are we solving again? Well, the ideal workflow for API keys would be this: set_api_key() prompts the user for an API key (sets the env var under the hood) and then prompts again with "do you want to store this API key for future sessions?" and then stores the key somewhere. All future session will need to make sure to read from somewhere on startup. The user never has to think about this again, but power users can be reminded of the location, if they want to change it later.

The difficulty with the ideal workflow is that this somewhere depends on the user's platform and a few other considerations:

  1. should it be in .Renviron in the local project folder?
  2. should it be in .Renviron in the user's home directory?
  3. should it be in XDG_CONFIG_HOME? (e.g. ~/.config/ on Linux, ~/Library/Preferences on Mac, ??? on Windows)

The drawback with (1) is that the user will need to set their key every time they start a new project and also increases the chance the user will leak their secret by committing .Renviron to the repo. The drawback with (2) and (3) is that we don't know how to find a user's home directory or their config directory in a cross-platform way (for (3), tools::R_user_dir() can find it, but it's only supported by R >= 4.0). The additional drawback with (2) is that if the user already has .Renviron in their local project (where they set variables for some other package), then the global one won't be read, their keys won't persist, and that will be confusing to the user.

@capnrefsmmat recently pointed me to fs::path_expand, which might make (2) or (3) viable.

@capnrefsmmat
Copy link
Contributor

Question: Is there a situation, maybe in a production environment, where one user would need multiple API keys?

@dshemetov
Copy link
Contributor

Oh man, I hope not. On the plus side, from seeing how AWS and Github handle their keys (i.e., also by reading from system env vars by default, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and GITHUB_PAT), doesn't seem like they're concerned about that situation?

@capnrefsmmat
Copy link
Contributor

It's pretty easy to switch out environment variables; if you have some automation system that runs your data pipeline, you can easily set custom environment variables in the shell scripts or commands that it runs to do the various tasks.

I'm asking because the answer might change how we store the keys. Environment variables are a good option exactly because they're so easy to replace or override in specific circumstances, if that will be necessary in a production environment.

@dsweber2
Copy link
Contributor Author

The only thing that might prompt different keys would be different private endpoint access, but in that case I think we'd just make a dedicated key instead.

@dshemetov
Copy link
Contributor

Right, if we ever run integration tests with this client with privileged and unprivileged keys, env vars would be easy to swap out.

@dshemetov dshemetov self-assigned this Sep 27, 2023
@dshemetov dshemetov added this to the 1.2 release milestone Sep 29, 2023
@dshemetov
Copy link
Contributor

dshemetov commented Sep 30, 2023

Alright, I went looking for other R developers with the same problem and I think I found a persistence solution. This email from 2018 by Gábor Csárdi points to rappdirs, which was made exactly for our purpose and requires only R (>= 3.2).

I'm thinking the workflow could now be something like:

  1. if interactive, epidatr::set_api_key(SECRET_KEY) will set the environment variable and then ask the user if they want to persist their key, with all the appropriate stern prodding that they probably should
  2. if the user types "yes", then we write the key to rappdirs::user_config_dir("epidatr") and we print that location information
  3. upon loading epidatr, we automatically read the key from rappdirs::user_config_dir("epidatr"), load it to the env var, and we print the path of the file we read from and the env var we read into

This covers all the desirable pieces I can think of, let me know if I'm missing some:

  • platform-independent persistence
  • no file-system management for most users
  • clarity about where the keys are persisted on the file system, so power users can modify/delete them
  • clarity about where the keys are read from in each session (the env var), so power users can modify/delete them
  • keys kept away from repo, so no chance of user committing it
  • implicit instruction about secrets best practices

@dshemetov
Copy link
Contributor

Oh and we can also think about suggesting users use rappdirs::user_cache_dir("epidatr") for the cache.

@dshemetov
Copy link
Contributor

dshemetov commented Oct 2, 2023

@brookslogan points out that it's probably better to use rappdirs::user_config_dir("R", version = "epidatr") to organize R configs and for consistency with R tools (>= 4.0).

r$> rappdirs::user_config_dir("R", version = "epidatr")
[1] "~/.config/R/epidatr"
r$> tools::R_user_dir("epidatr", "config")
[1] "/home/forecaster/.config/R/epidatr"

@brookslogan
Copy link
Contributor

brookslogan commented Oct 3, 2023

Another note on CRAN policies' requirement/suggestion:

packages may store user-specific data, configuration and cache files in their respective user directories obtained from tools::R_user_dir()

According to this scheme, is sounds like for cache files they'd want us to use

tools::R_user_dir("epidatr", "cache")

which is /home//.cache/R/epidatr on my Fedora system.

(Anyway, whatever location is chosen it sounds better if the API key is not in the cache directory, which using config/"config" above already achieves. That's the main thing I just thought to check on. But it would be nice actually to have cache files in .cache so if I run low on space I can quickly clear out caches from multiple programs easily.)

@dshemetov
Copy link
Contributor

Agreed. So to reiterate, I think we should:

  • suggest rappdirs::user_cache_dir("R", version = "epidatr") by default for the cache folder, unless the user specifies otherwise
  • use rappdirs::user_config_dir("R", version = "epidatr") for the API key storage (with confirmation), unless the user specifies otherwise

@capnrefsmmat
Copy link
Contributor

Just peeking through CRAN to see what people do with API keys. Here are a few examples:

I can't find anyone who writes their own separate API key file.

I prefer the environment variable approach because it's very easy to temporarily replace an environment variable, e.g. for testing or because specific code needs access to different data.

Is the motivation for using rappdirs::user_config_dir() purely that we think using usethis::edit_r_environ() (even automatically, through some kind of function) will still be too complicated for some of our users, because they manually have to edit and save the file?

@dshemetov
Copy link
Contributor

dshemetov commented Oct 20, 2023

Thanks for doing that research into other packages. My main example was AWS CLI which has an ~/.aws/credentials file and I took that to be authoritative. I suppose it's better to compare with other R packages, but I think I'm just not a fan of .Renviron.

Here is another issue with it, other than the one you mention (where we have to teach the user how to format the environ key line): if someone already has an .Renviron in their project directory, it doesn't combine with env vars from the home directory .Renviron so them using usethis::edit_r_environ() will confusingly not load the new env var when they restart.

I want to clarify that my suggestion to use rappdirs::user_config_dir() is just for key persistence. When that file is read by epidatr, I agree that we should just load that value into DELPHI_EPIDATA_API_KEY env var or something like that, for the reasons you mention. And if the env var is already set on startup, then we just use that one, instead of reading the file.

@capnrefsmmat
Copy link
Contributor

I think AWS does that because it has a complex configuration with a bunch of variables in different groups, and cramming that all into environment variables would be hard.

Could we check for the presence of a project-specific .Renviron before calling usethis::edit_r_environ()? It does take a scope = "project" argument, so if we know the project already has an .Renviron, we could open that for editing. We could look in the project directory to see if there's an .Renviron, for instance.

@dshemetov
Copy link
Contributor

Makes sense about AWS.

And that's a good idea RE: dynamically adjusting Renviron scope. I just tested file.exists(usethis::proj_path(".Renviron")) and it seems to be a good way to check if a project-level Renviron exists.

So I think that removes my objections to Renviron. @brookslogan was it your ESS setup that didn't play well with usethis::edit_r_environ() or am I misremembering? Any other objections to just using Renviron for key persistence after all?

@brookslogan
Copy link
Contributor

brookslogan commented Oct 24, 2023

Issue skirted now:

  • scope = "project" will generate an error if it doesn't seem like you are in a project. If we do this dynamic thing, then maybe that's not an issue.

Issues I thought there were but are resolved or partially resolved or never existed:

  • I thought it didn't create new .Renvirons in the right place or maybe didn't read .Renvirons right if you were in a subdirectory of the project. Doesn't seem to be an issue now.
  • ESS at some point would always launch shells in the directory containing the script you were starting them from. Now it will launch it at roots of git repos; I'm not sure if it's aware of RStudio project metadata files though.

Remaining issue:

  • Uses some very-surprising project-location caching mechanism, maybe from here. So if you are in one project directory and do something that triggers that mechanism and then setwd to outside of the project / to another project and do usethis::edit_r_environ(scope = "project"), it will edit the first project's .Renviron.

@dshemetov
Copy link
Contributor

dshemetov commented Oct 24, 2023

Couldn't a user that setwd's to another project to set keys just... set their key in the first project and be fine? If they open the latter project later and it doesn't have a key, epidatr will tell them.

@brookslogan
Copy link
Contributor

Sure, but they might think "I already set that for this project, why are you saying I didn't?" Doesn't sound the worst thing to have to set it twice and project switching isn't that common, but somehow I thought I remembered here caching frustrations being a recurring frustration in the past, so I don't know if I'm just missing whatever the workflow was that was frustrating or if it doesn't apply to API key setting.

@brookslogan
Copy link
Contributor

For "set their key in the first project and be fine", that'd be true only if they launched within that project every time, or set it for the second project as well. I guess that's sort of closer to how .Renvirons work anyway, which might be good. (Closer, but not exact... .Renvirons are read at a different time than here location is cached; if there is a setwd between it could be a mismatch.)

@dshemetov
Copy link
Contributor

On the general point of setwd-ing outside of a project and using here having unexpected behavior. That's an issue from a bad interaction upstream from us, so I wouldn't worry about it.

And as to users changing their launch directory for projects... YAGNI, let's wait for an actual user to have this complaint. Worst case scenario is they set their key twice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 low priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants