Figure out better API key management #181

dsweber2 · 2023-09-19T19:38:01Z

dshemetov · 2023-09-22T01:53:12Z

So here is my thinking for what to do after this week's discussions:

move away from R options; switch set_api_key() to use env vars now and remove the options check. R options aren't great for long-term safety (since they don't get read into by something like .Renviron) and they aren't very user friendly anyway.
make sure we point users to usethis::edit_r_environ() for a decent persistence solution
write a vignette that educates users about API keys, environment variables, and .Renviron
research other persistence libraries (keyrings, gitcreds, etc.) to see if there's something that can support the ideal workflow

Hold on, what problem are we solving again? Well, the ideal workflow for API keys would be this: set_api_key() prompts the user for an API key (sets the env var under the hood) and then prompts again with "do you want to store this API key for future sessions?" and then stores the key somewhere. All future session will need to make sure to read from somewhere on startup. The user never has to think about this again, but power users can be reminded of the location, if they want to change it later.

The difficulty with the ideal workflow is that this somewhere depends on the user's platform and a few other considerations:

should it be in .Renviron in the local project folder?
should it be in .Renviron in the user's home directory?
should it be in XDG_CONFIG_HOME? (e.g. ~/.config/ on Linux, ~/Library/Preferences on Mac, ??? on Windows)

The drawback with (1) is that the user will need to set their key every time they start a new project and also increases the chance the user will leak their secret by committing .Renviron to the repo. The drawback with (2) and (3) is that we don't know how to find a user's home directory or their config directory in a cross-platform way (for (3), tools::R_user_dir() can find it, but it's only supported by R >= 4.0). The additional drawback with (2) is that if the user already has .Renviron in their local project (where they set variables for some other package), then the global one won't be read, their keys won't persist, and that will be confusing to the user.

@capnrefsmmat recently pointed me to fs::path_expand, which might make (2) or (3) viable.

capnrefsmmat · 2023-09-22T02:36:16Z

Question: Is there a situation, maybe in a production environment, where one user would need multiple API keys?

dshemetov · 2023-09-22T05:15:55Z

Oh man, I hope not. On the plus side, from seeing how AWS and Github handle their keys (i.e., also by reading from system env vars by default, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and GITHUB_PAT), doesn't seem like they're concerned about that situation?

capnrefsmmat · 2023-09-22T12:07:57Z

It's pretty easy to switch out environment variables; if you have some automation system that runs your data pipeline, you can easily set custom environment variables in the shell scripts or commands that it runs to do the various tasks.

I'm asking because the answer might change how we store the keys. Environment variables are a good option exactly because they're so easy to replace or override in specific circumstances, if that will be necessary in a production environment.

dsweber2 · 2023-09-22T15:47:54Z

The only thing that might prompt different keys would be different private endpoint access, but in that case I think we'd just make a dedicated key instead.

dshemetov · 2023-09-22T16:11:44Z

Right, if we ever run integration tests with this client with privileged and unprivileged keys, env vars would be easy to swap out.

dshemetov · 2023-09-30T22:46:04Z

Alright, I went looking for other R developers with the same problem and I think I found a persistence solution. This email from 2018 by Gábor Csárdi points to rappdirs, which was made exactly for our purpose and requires only R (>= 3.2).

I'm thinking the workflow could now be something like:

if interactive, epidatr::set_api_key(SECRET_KEY) will set the environment variable and then ask the user if they want to persist their key, with all the appropriate stern prodding that they probably should
if the user types "yes", then we write the key to rappdirs::user_config_dir("epidatr") and we print that location information
upon loading epidatr, we automatically read the key from rappdirs::user_config_dir("epidatr"), load it to the env var, and we print the path of the file we read from and the env var we read into

This covers all the desirable pieces I can think of, let me know if I'm missing some:

platform-independent persistence
no file-system management for most users
clarity about where the keys are persisted on the file system, so power users can modify/delete them
clarity about where the keys are read from in each session (the env var), so power users can modify/delete them
keys kept away from repo, so no chance of user committing it
implicit instruction about secrets best practices

dshemetov · 2023-09-30T22:50:16Z

Oh and we can also think about suggesting users use rappdirs::user_cache_dir("epidatr") for the cache.

dshemetov · 2023-10-02T18:51:55Z

@brookslogan points out that it's probably better to use rappdirs::user_config_dir("R", version = "epidatr") to organize R configs and for consistency with R tools (>= 4.0).

r$> rappdirs::user_config_dir("R", version = "epidatr")
[1] "~/.config/R/epidatr"
r$> tools::R_user_dir("epidatr", "config")
[1] "/home/forecaster/.config/R/epidatr"

brookslogan · 2023-10-03T16:33:15Z

Another note on CRAN policies' requirement/suggestion:

packages may store user-specific data, configuration and cache files in their respective user directories obtained from tools::R_user_dir()

According to this scheme, is sounds like for cache files they'd want us to use

tools::R_user_dir("epidatr", "cache")

which is /home//.cache/R/epidatr on my Fedora system.

(Anyway, whatever location is chosen it sounds better if the API key is not in the cache directory, which using config/"config" above already achieves. That's the main thing I just thought to check on. But it would be nice actually to have cache files in .cache so if I run low on space I can quickly clear out caches from multiple programs easily.)

dshemetov · 2023-10-03T17:56:20Z

Agreed. So to reiterate, I think we should:

suggest rappdirs::user_cache_dir("R", version = "epidatr") by default for the cache folder, unless the user specifies otherwise
use rappdirs::user_config_dir("R", version = "epidatr") for the API key storage (with confirmation), unless the user specifies otherwise

capnrefsmmat · 2023-10-20T14:51:32Z

Just peeking through CRAN to see what people do with API keys. Here are a few examples:

eia uses an environment variable and assumes you can edit Renviron
rscopus does the same
civis too
cfbfastR uses an environment variable and points at usethis to edit your Renviron easily
openai does the same
twfy edits your Renviron for you
so does ipums, in a more careful way.

I can't find anyone who writes their own separate API key file.

I prefer the environment variable approach because it's very easy to temporarily replace an environment variable, e.g. for testing or because specific code needs access to different data.

Is the motivation for using rappdirs::user_config_dir() purely that we think using usethis::edit_r_environ() (even automatically, through some kind of function) will still be too complicated for some of our users, because they manually have to edit and save the file?

dshemetov · 2023-10-20T17:31:37Z

Thanks for doing that research into other packages. My main example was AWS CLI which has an ~/.aws/credentials file and I took that to be authoritative. I suppose it's better to compare with other R packages, but I think I'm just not a fan of .Renviron.

Here is another issue with it, other than the one you mention (where we have to teach the user how to format the environ key line): if someone already has an .Renviron in their project directory, it doesn't combine with env vars from the home directory .Renviron so them using usethis::edit_r_environ() will confusingly not load the new env var when they restart.

I want to clarify that my suggestion to use rappdirs::user_config_dir() is just for key persistence. When that file is read by epidatr, I agree that we should just load that value into DELPHI_EPIDATA_API_KEY env var or something like that, for the reasons you mention. And if the env var is already set on startup, then we just use that one, instead of reading the file.

capnrefsmmat · 2023-10-20T19:57:51Z

I think AWS does that because it has a complex configuration with a bunch of variables in different groups, and cramming that all into environment variables would be hard.

Could we check for the presence of a project-specific .Renviron before calling usethis::edit_r_environ()? It does take a scope = "project" argument, so if we know the project already has an .Renviron, we could open that for editing. We could look in the project directory to see if there's an .Renviron, for instance.

dshemetov · 2023-10-20T20:32:15Z

Makes sense about AWS.

And that's a good idea RE: dynamically adjusting Renviron scope. I just tested file.exists(usethis::proj_path(".Renviron")) and it seems to be a good way to check if a project-level Renviron exists.

So I think that removes my objections to Renviron. @brookslogan was it your ESS setup that didn't play well with usethis::edit_r_environ() or am I misremembering? Any other objections to just using Renviron for key persistence after all?

brookslogan · 2023-10-24T18:07:59Z

Issue skirted now:

scope = "project" will generate an error if it doesn't seem like you are in a project. If we do this dynamic thing, then maybe that's not an issue.

Issues I thought there were but are resolved or partially resolved or never existed:

I thought it didn't create new .Renvirons in the right place or maybe didn't read .Renvirons right if you were in a subdirectory of the project. Doesn't seem to be an issue now.
ESS at some point would always launch shells in the directory containing the script you were starting them from. Now it will launch it at roots of git repos; I'm not sure if it's aware of RStudio project metadata files though.

Remaining issue:

Uses some very-surprising project-location caching mechanism, maybe from here. So if you are in one project directory and do something that triggers that mechanism and then setwd to outside of the project / to another project and do usethis::edit_r_environ(scope = "project"), it will edit the first project's .Renviron.

dshemetov · 2023-10-24T20:22:08Z

Couldn't a user that setwd's to another project to set keys just... set their key in the first project and be fine? If they open the latter project later and it doesn't have a key, epidatr will tell them.

brookslogan · 2023-10-27T15:13:00Z

Sure, but they might think "I already set that for this project, why are you saying I didn't?" Doesn't sound the worst thing to have to set it twice and project switching isn't that common, but somehow I thought I remembered here caching frustrations being a recurring frustration in the past, so I don't know if I'm just missing whatever the workflow was that was frustrating or if it doesn't apply to API key setting.

brookslogan · 2023-10-27T15:16:21Z

For "set their key in the first project and be fine", that'd be true only if they launched within that project every time, or set it for the second project as well. I guess that's sort of closer to how .Renvirons work anyway, which might be good. (Closer, but not exact... .Renvirons are read at a different time than here location is cached; if there is a setwd between it could be a mismatch.)

dshemetov · 2023-10-27T17:16:40Z

On the general point of setwd-ing outside of a project and using here having unexpected behavior. That's an issue from a bad interaction upstream from us, so I wouldn't worry about it.

And as to users changing their launch directory for projects... YAGNI, let's wait for an actual user to have this complaint. Worst case scenario is they set their key twice.

dsweber2 added the P2 low priority label Sep 19, 2023

dshemetov self-assigned this Sep 27, 2023

dshemetov added this to the 1.2 release milestone Sep 29, 2023

dshemetov mentioned this issue Nov 16, 2023

refactor+docs: set_api_key #217

Merged

dshemetov closed this as completed in #217 Nov 18, 2023

dshemetov mentioned this issue Dec 9, 2023

API key vignette #182

Closed

dshemetov mentioned this issue Feb 16, 2024

fix: make default cache directory R version portable #248

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Figure out better API key management #181

Figure out better API key management #181

dsweber2 commented Sep 19, 2023

dshemetov commented Sep 22, 2023 •

edited

capnrefsmmat commented Sep 22, 2023

dshemetov commented Sep 22, 2023

capnrefsmmat commented Sep 22, 2023

dsweber2 commented Sep 22, 2023

dshemetov commented Sep 22, 2023

dshemetov commented Sep 30, 2023 •

edited

dshemetov commented Sep 30, 2023

dshemetov commented Oct 2, 2023 •

edited

brookslogan commented Oct 3, 2023 •

edited

dshemetov commented Oct 3, 2023

capnrefsmmat commented Oct 20, 2023

dshemetov commented Oct 20, 2023 •

edited

capnrefsmmat commented Oct 20, 2023

dshemetov commented Oct 20, 2023

brookslogan commented Oct 24, 2023 •

edited

dshemetov commented Oct 24, 2023 •

edited

brookslogan commented Oct 27, 2023

brookslogan commented Oct 27, 2023

dshemetov commented Oct 27, 2023

Figure out better API key management #181

Figure out better API key management #181

Comments

dsweber2 commented Sep 19, 2023

dshemetov commented Sep 22, 2023 • edited

capnrefsmmat commented Sep 22, 2023

dshemetov commented Sep 22, 2023

capnrefsmmat commented Sep 22, 2023

dsweber2 commented Sep 22, 2023

dshemetov commented Sep 22, 2023

dshemetov commented Sep 30, 2023 • edited

dshemetov commented Sep 30, 2023

dshemetov commented Oct 2, 2023 • edited

brookslogan commented Oct 3, 2023 • edited

dshemetov commented Oct 3, 2023

capnrefsmmat commented Oct 20, 2023

dshemetov commented Oct 20, 2023 • edited

capnrefsmmat commented Oct 20, 2023

dshemetov commented Oct 20, 2023

brookslogan commented Oct 24, 2023 • edited

dshemetov commented Oct 24, 2023 • edited

brookslogan commented Oct 27, 2023

brookslogan commented Oct 27, 2023

dshemetov commented Oct 27, 2023

dshemetov commented Sep 22, 2023 •

edited

dshemetov commented Sep 30, 2023 •

edited

dshemetov commented Oct 2, 2023 •

edited

brookslogan commented Oct 3, 2023 •

edited

dshemetov commented Oct 20, 2023 •

edited

brookslogan commented Oct 24, 2023 •

edited

dshemetov commented Oct 24, 2023 •

edited