Skip to content

support passing credentials from memory for google connectors#1888

Merged
rbiseck3 merged 5 commits into
mainfrom
roman/google-connectors-token-as-value
Oct 31, 2023
Merged

support passing credentials from memory for google connectors#1888
rbiseck3 merged 5 commits into
mainfrom
roman/google-connectors-token-as-value

Conversation

@rbiseck3
Copy link
Copy Markdown
Contributor

Description

Google Drive

The existing service account parameter was expanded to support either a file path or a json value to generate the credentials when instantiating the google drive client.

GCS

Google Cloud Storage already supports the value being passed in, from their docstring:

  • you may supply a token generated by the
    gcloud
    utility; this is either a python dictionary, the name of a file
    containing the JSON returned by logging in with the gcloud CLI tool,
    or a Credentials object.

I tested this locally:

from gcsfs import GCSFileSystem
import json

with open("/Users/romanisecke/.ssh/google-cloud-unstructured-ingest-test-d4fc30286d9d.json") as json_file:
    json_data = json.load(json_file)
    print(json_data)

    fs = GCSFileSystem(token=json_data)
    print(fs.ls(path="gs://utic-test-ingest-fixtures/"))

['utic-test-ingest-fixtures/ideas-page.html', 'utic-test-ingest-fixtures/nested-1', 'utic-test-ingest-fixtures/nested-2']

Comment on lines +60 to +70
if isinstance(key_path, dict):
creds = service_account.Credentials.from_service_account_info(key_path)
elif os.path.isfile(key_path):
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path
creds, _ = default()
else:
raise ValueError(
f"key path not recognized as a dictionary or a file path: "
f"[{type(key_path)}] {key_path}",
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for clarify maybe just have a separate argument for the path versus the value rather than trying to infer. I think given naming folks are already confused on which we expec

@ryannikolaidis
Copy link
Copy Markdown
Contributor

ryannikolaidis commented Oct 26, 2023

yes, gcs does, but we don't expose this as an option in the ingest cli.

ohh so token could be a path OR a value?? got it. yea, I think this is all confusing. also...why is it called a token in one place versus service_key in another?

personally I think we should just be explicit about what we expect having discrete options for path vs value. regardless seems like these should be the same argument name across gcs and google drive (given that we're passing the same key to both...)

per issue description, keep the old options for back compatibility

@rbiseck3 rbiseck3 force-pushed the roman/google-connectors-token-as-value branch from 74847b3 to a5e2140 Compare October 27, 2023 18:23
Comment on lines +17 to +15
service_account_key: str
token: t.Union[dict, Path]
Copy link
Copy Markdown
Contributor

@ryannikolaidis ryannikolaidis Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in either case we should keep the old key and flag as deprecated (since we have folks actively using these connectors).
let's defer to service_account_key as the one we make common to both, since this is more specific and seems to be more widely referenced in their documents.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we use if both are provided? Raise an error?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I think google drive is the one folks are really using. so I think fine to not deprecate, but let's just make service_account_key the common one.

@rbiseck3 rbiseck3 force-pushed the roman/google-connectors-token-as-value branch from a5e2140 to d8b900b Compare October 27, 2023 19:02
Comment thread unstructured/ingest/cli/cmds/gcs.py Outdated
Comment on lines +15 to +16
token: t.Optional[str] = None
service_account_key: t.Optional[t.Union[dict, Path]] = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should also bump the gcs docs page?

Copy link
Copy Markdown
Contributor

@ryannikolaidis ryannikolaidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested both by passing the value as the argument, and both worked.

after docs bump this is gtg!

Copy link
Copy Markdown
Contributor

@ryannikolaidis ryannikolaidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...per discussion, looks like docs use anonymous, so bumping shouldn't be required for this.

lgtm!

@rbiseck3 rbiseck3 force-pushed the roman/google-connectors-token-as-value branch 2 times, most recently from fa04a3e to 148a0aa Compare October 31, 2023 11:46
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 11:46 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 enabled auto-merge October 31, 2023 11:46
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 13:18 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 force-pushed the roman/google-connectors-token-as-value branch from d563c92 to 9c83868 Compare October 31, 2023 14:29
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 14:29 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 14:33 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 14:33 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 14:33 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 14:33 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 force-pushed the roman/google-connectors-token-as-value branch from 9c83868 to 05ed62d Compare October 31, 2023 15:15
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 15:15 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 16:29 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 16:29 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 16:29 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 temporarily deployed to ci October 31, 2023 16:29 — with GitHub Actions Inactive
@rbiseck3 rbiseck3 added this pull request to the merge queue Oct 31, 2023
Merged via the queue into main with commit 123ad20 Oct 31, 2023
@rbiseck3 rbiseck3 deleted the roman/google-connectors-token-as-value branch October 31, 2023 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: google drive and google cloud storage connectors should optionally take service key VALUE as a parameter

2 participants