UN-3414 [FIX] SharePoint site_url/drive_id persistence and safe OAuth token refresh#1919
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be configured in the CodeRabbit settings.
Walkthrough: Connector metadata handling was changed to preserve non-token fields while merging refreshed OAuth token fields.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    rect rgba(63, 81, 181, 0.5)
        participant Client
    end
    rect rgba(0, 150, 136, 0.5)
        participant View
        participant Serializer
    end
    rect rgba(255, 193, 7, 0.5)
        participant ConnectorAuthCache
        participant ConnectorAuthModel
    end
    rect rgba(233, 30, 99, 0.5)
        participant DB
    end
    Client->>View: POST/PUT with oauth_key and connector_metadata
    View->>ConnectorAuthCache: fetch cached oauth tokens (oauth_key)
    ConnectorAuthCache-->>View: oauth_tokens or MissingParam
    View->>View: merge oauth_tokens with non-secret request metadata
    View->>Serializer: initialize with merged metadata
    Serializer->>ConnectorAuthModel: _refresh_tokens(provider, uid, existing_metadata)
    ConnectorAuthModel->>ConnectorAuthModel: call external token refresh
    ConnectorAuthModel-->>Serializer: refreshed token subset (whitelisted)
    Serializer->>DB: save ConnectorInstance with merged metadata
    DB-->>Serializer: saved instance
    Serializer-->>Client: response
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
| Filename | Overview |
|---|---|
| backend/connector_auth_v2/constants.py | Adds shared OAUTH_TOKEN_KEYS frozenset centralizing the safe-to-merge OAuth key whitelist; clean, no issues. |
| backend/connector_auth_v2/models.py | Sibling-loop in get_and_refresh_tokens now whitelist-merges via OAUTH_TOKEN_KEYS instead of replacing the entire metadata; logging switched to lazy %s format. |
| backend/connector_v2/fields.py | _refresh_tokens now accepts and preserves existing_metadata, applying OAUTH_TOKEN_KEYS whitelist before merging; per-connector form fields are protected on DB read. |
| backend/connector_v2/serializers.py | connector_name made non-required with schema-default backfill in validate(); save() whitelist-merges refreshed tokens preserving form fields; PATCH guard (self.partial) prevents overwriting user-renamed connectors. |
| backend/connector_v2/views.py | _get_connector_metadata merges form fields with OAuth tokens; form_metadata type-guarded against non-dict values; no more full-replacement of metadata with raw OAuth cache. |
| unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py | Drive resolution fixed: ctx.drives[drive_id] for direct ID; ctx.sites.get_by_url(site_url).drive for site URL; get_by_path removed in favour of the library's native URL-to-site method. |
| unstract/connectors/src/unstract/connectors/filesystems/sharepoint/static/json_schema.json | user_email description improved; drive_id given format: "password" which masks a non-secret identifier unnecessarily in the UI. |
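The whitelist-merge pattern described in the table above can be sketched as follows. This is a hypothetical illustration, not the PR's code: the exact member names of `OAUTH_TOKEN_KEYS` are assumed (the PR only lists them as access/refresh/type/expires/auth_time/refresh_after/expires_in/scope), and `merge_refreshed_tokens` is an invented helper name.

```python
# Assumed key names; the real frozenset lives in
# backend/connector_auth_v2/constants.py in this PR.
OAUTH_TOKEN_KEYS = frozenset(
    {
        "access_token",
        "refresh_token",
        "token_type",
        "expires_at",
        "auth_time",
        "refresh_after",
        "expires_in",
        "scope",
    }
)


def merge_refreshed_tokens(existing_metadata: dict, refreshed: dict) -> dict:
    """Overlay only whitelisted token keys onto the existing metadata,
    so per-connector form fields (site_url, drive_id, ...) survive a
    token refresh (illustrative sketch)."""
    token_updates = {k: v for k, v in refreshed.items() if k in OAUTH_TOKEN_KEYS}
    return {**existing_metadata, **token_updates}
```

The point of the whitelist is that refreshing tokens can never add or overwrite a non-token key, which is what previously caused cross-connector contamination.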
Sequence Diagram

```mermaid
sequenceDiagram
    participant FE as Frontend (RJSF)
    participant V as views.py (_get_connector_metadata)
    participant S as serializers.py (save)
    participant M as models.py (get_and_refresh_tokens)
    participant DB as ConnectorInstance DB
    FE->>V: POST /connectors with form_metadata + oauth_key
    V->>V: oauth_tokens = get_oauth_creds_from_cache(oauth_key)
    V->>V: connector_metadata = {**form_metadata, **oauth_tokens}
    Note over V: site_url/drive_id preserved from form
    V->>S: serializer.save(connector_metadata=connector_metadata)
    S->>M: connector_oauth.get_and_refresh_tokens()
    M->>M: get_access_token() may refresh
    alt tokens refreshed
        M->>DB: sibling loop: whitelist-merge OAUTH_TOKEN_KEYS into each connector_metadata
    end
    M-->>S: returns (extra_data, refreshed_flag)
    S->>S: token_updates = {k: extra_data[k] for k in OAUTH_TOKEN_KEYS}
    S->>S: kwargs[connector_metadata] = {**existing_metadata, **token_updates}
    Note over S: form fields (site_url, drive_id) preserved
    S->>DB: super().save(**kwargs)
    DB-->>FE: 201 Created
```
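The `{**form_metadata, **oauth_tokens}` merge shown in the sequence above relies on Python's right-most-wins dict-unpacking semantics: on a key collision the OAuth cache wins, while form-only keys pass through untouched. A minimal illustration with made-up values:

```python
# Made-up sample values, purely to show merge precedence.
form_metadata = {
    "site_url": "https://tenant.sharepoint.com/sites/team",
    "drive_id": "b!abc",
}
oauth_tokens = {"access_token": "ya29.new", "refresh_token": "1//rt"}

# Right-most mapping wins on duplicate keys, so cached tokens take
# precedence while form-only fields (site_url, drive_id) are preserved.
connector_metadata = {**form_metadata, **oauth_tokens}
```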
Prompt To Fix All With AI
This is a comment left during a code review.
Path: unstract/connectors/src/unstract/connectors/filesystems/sharepoint/static/json_schema.json
Line: 22-27
Comment:
**`drive_id` masked as a password field**
`format: "password"` causes RJSF to render this as an `<input type="password">` field, masking the value behind bullets. A Drive ID is not a secret credential — users need to be able to read and copy it to verify the value or share it with teammates. Masking it silently degrades usability (no show/hide toggle by default in most RJSF themes) without providing a security benefit. Consider keeping this as a plain text field, consistent with `site_url` and `client_id`.
```suggestion
"drive_id": {
"type": "string",
"title": "Drive ID",
"description": "Specific Drive/Document Library ID. Leave empty to use the default drive."
},
```
How can I resolve this? If you propose a fix, please make it concise.

Reviews (7). Last reviewed commit: "Merge branch 'main' into fix/sharepoint-..."
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/connector_v2/views.py`:
- Around line 101-103: When handling OAuth update/reauth flows in views.py the
code builds connector_metadata from only request data and oauth_tokens
(CIKey.CONNECTOR_METADATA -> form_metadata; connector_metadata =
{**form_metadata, **oauth_tokens}), which drops existing instance metadata like
site_url/drive_id when form_metadata is omitted; fix by seeding form_metadata
with the existing instance metadata when available (e.g. base_metadata =
(self.instance.connector_metadata or {}) then overlay with form_metadata and
finally oauth_tokens) so the merge preserves prior non-secret fields; apply the
same change to the other OAuth update block referenced around lines 164-175.
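The fix suggested in this comment is a three-layer merge seeded with the stored instance metadata. A hedged sketch (the helper name and dict-based signature are invented for illustration; the real view works on `self.instance` and `CIKey` constants):

```python
def build_connector_metadata(existing_metadata, form_metadata, oauth_tokens):
    """Seed the merge with the instance's stored metadata so that a reauth
    request omitting form fields does not drop site_url/drive_id.
    Layer order: stored metadata < form body < cached OAuth tokens."""
    base_metadata = dict(existing_metadata or {})
    return {**base_metadata, **(form_metadata or {}), **oauth_tokens}
```

On a reauth with an empty form body, the stored `site_url`/`drive_id` survive because nothing in the later layers overwrites them.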
- Around line 108-135: The fallback in _fill_default_connector_name is currently
applied to all creates; restrict it to OAuth connector creation by adding an
early guard that returns unless the request is an OAuth create (e.g., check
request_data.get(CIKey.AUTH_TYPE) == AuthType.OAUTH or
request_data.get(CIKey.AUTH_TYPE) == "oauth"). Update the function to check
CIKey.AUTH_TYPE (and use AuthType.OAUTH constant) before fetching the
schema/details and only proceed to call ConnectorProcessor.get_json_schema and
set CIKey.CONNECTOR_NAME when the request is for an OAuth connector.
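The early-guard shape asked for above can be sketched as follows; the key and value strings stand in for `CIKey.AUTH_TYPE` and `AuthType.OAUTH`, whose exact literals are not shown in this thread:

```python
AUTH_TYPE_KEY = "connector_auth_type"  # stand-in for CIKey.AUTH_TYPE
OAUTH = "oauth"                        # stand-in for AuthType.OAUTH


def fill_default_connector_name(request_data: dict, default_name: str) -> None:
    """Backfill connector_name only for OAuth connector creates
    (illustrative sketch of the suggested guard)."""
    if request_data.get(AUTH_TYPE_KEY) != OAUTH:
        # Non-OAuth creates keep the original required-field behavior.
        return
    request_data.setdefault("connector_name", default_name)
```

`setdefault` also guarantees a caller-supplied name is never overwritten.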
In
`@unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py`:
- Around line 139-141: The code sets self._drive using ctx.drives[self.drive_id]
and later accesses ctx.sites.get_by_url(self.site_url).drive but does not
execute the required query chain; update both locations to call
.get().execute_query() on the drive resource so the drive object is initialized
(e.g., replace ctx.drives[self.drive_id] and the .drive access from
ctx.sites.get_by_url(self.site_url) with the same .get().execute_query()
pattern) and assign the resulting loaded object to self._drive/drive as done
elsewhere in this module.
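The fix amounts to always materializing the drive resource before use. Since the Office365 client library is not available here, this sketch models the `.get().execute_query()` chain with duck typing; the attribute and method names mirror the comment above, not a verified library API:

```python
def resolve_drive(ctx, drive_id=None, site_url=None):
    """Select the drive resource by ID or site URL, then load it with the
    .get().execute_query() chain the review asks for (duck-typed sketch)."""
    if drive_id:
        drive = ctx.drives[drive_id]
    elif site_url:
        drive = ctx.sites.get_by_url(site_url).drive
    else:
        raise ValueError("either drive_id or site_url is required")
    # Without this chain the resource stays a lazy proxy whose properties
    # were never fetched from the server.
    return drive.get().execute_query()
```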
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 981ec789-a892-48cf-8fdf-ab45b5cd9d44
📒 Files selected for processing (4)
- backend/connector_v2/serializers.py
- backend/connector_v2/views.py
- unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py
- unstract/connectors/src/unstract/connectors/filesystems/sharepoint/static/json_schema.json
DRF's request.data is an immutable QueryDict for multipart / form-encoded POSTs; directly mutating it inside _fill_default_connector_name would raise AttributeError for any non-JSON caller. Copy to a mutable dict once in create() and feed that copy to both the helper and the serializer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
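The commit message above describes normalizing `request.data` to a mutable dict once. A hedged, Django-free sketch of that normalization (the `.dict()` duck-typing matches Django's `QueryDict` API, which flattens multi-value keys to their last value):

```python
def mutable_request_data(data):
    """Return a plain mutable dict whether DRF handed us a JSON dict or an
    immutable QueryDict (sketch; QueryDict detection via its .dict() method)."""
    if hasattr(data, "dict"):
        # QueryDict.dict() copies to a regular dict, keeping the last
        # value for any multi-valued key.
        return data.dict()
    return dict(data)
```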
chandrasekharan-zipstack
left a comment
Two pieces of feedback — see inline comments. The serializer-level fallback is a nice-to-have refactor; the incomplete-fix concern on the OAuth refresh path is the more important one.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/connector_auth_v2/models.py`:
- Around line 99-112: The merge in the refresh path is allowing non-token
enrichment keys from self.extra_data to leak into other ConnectorAuth siblings;
update the merge that sets connector_instance.connector_metadata to first filter
self.extra_data through the same whitelist used by serializers.py (the
_OAUTH_TOKEN_KEYS used to allow only token-related keys) before combining with
existing_metadata, or move _OAUTH_TOKEN_KEYS to a shared constant and reuse it
here; reference the connector_instance.connector_metadata assignment and
self.extra_data (and serializers._OAUTH_TOKEN_KEYS or
GoogleAuthHelper.enrich_connector_metadata as the provenance of enrichment) and
ensure only whitelisted token keys are propagated to sibling connector metadata.
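The sibling-loop fix requested here can be sketched as below. The whitelist member names are assumed, the siblings are modeled as plain dicts, and the actual ORM save call is elided:

```python
# Assumed member names for the shared whitelist constant.
OAUTH_TOKEN_KEYS = frozenset(
    {"access_token", "refresh_token", "token_type", "expires_in",
     "refresh_after", "scope"}
)


def propagate_tokens_to_siblings(extra_data, siblings):
    """Whitelist-merge refreshed token fields into each sibling's metadata,
    so enrichment keys in extra_data cannot leak across connectors
    (sketch; persistence of each sibling is omitted)."""
    token_updates = {k: extra_data[k] for k in OAUTH_TOKEN_KEYS & extra_data.keys()}
    for sibling in siblings:
        sibling["connector_metadata"] = {
            **sibling.get("connector_metadata", {}),
            **token_updates,
        }
```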
In `@backend/connector_v2/fields.py`:
- Around line 31-45: The _refresh_tokens method currently merges
connector_auth.get_and_refresh_tokens() into existing_metadata unfiltered, which
can reintroduce enrichment keys from ConnectorAuth.extra_data; update
_refresh_tokens to whitelist only OAuth token-related keys (use the same
_OAUTH_TOKEN_KEYS constant used in serializers.py) by filtering
refreshed_metadata to contain only those keys before merging, then return
{**existing_metadata, **filtered_refreshed_metadata} so non-token enrichment
fields are not injected on read.
In `@backend/connector_v2/serializers.py`:
- Around line 109-124: connector_oauth.get_and_refresh_tokens() currently
iterates self.connectorinstance_set.all() and writes an unfiltered merge of
self.extra_data into each sibling connector, which lets non-token keys leak into
sibling connector_metadata; update that method so when it persists token data
back to sibling ConnectorInstance rows it only writes the whitelisted keys (use
the same _OAUTH_TOKEN_KEYS whitelist) and/or build a token_updates dict (like
the save() code does) and merge only those into each sibling's
CIKey.CONNECTOR_METADATA, explicitly excluding other extra_data keys before
saving.
- Around line 54-86: The validate method currently backfills connector_name from
the JSON schema but can return attrs without connector_name (causing a DB
IntegrityError later); update validate (in the serializer containing validate
and _get_schema_default_connector_name) so that after attempting to backfill via
_get_schema_default_connector_name (which calls
ConnectorProcessor.get_json_schema) it raises a serializers.ValidationError
referencing CIKey.CONNECTOR_NAME (or similar field name) when no connector_name
is present, preserving the API contract and producing a 400 instead of letting
ConnectorInstance.connector_name trigger a NOT NULL DB error.
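The contract asked for in this comment, backfill from the schema default on full creates but fail loudly before a null reaches the database, can be sketched without DRF as follows; the function name and `ValidationError` stand-in are hypothetical:

```python
class ValidationError(Exception):
    """Stand-in for rest_framework.serializers.ValidationError."""


def backfill_connector_name(attrs, schema_default, partial=False):
    """On full creates, inject the schema-default name when the field is
    absent; raise a field-scoped error (rendered as HTTP 400 by DRF)
    when no name can be resolved. Partial updates are left untouched
    so PATCH renames are never overwritten (sketch)."""
    if attrs.get("connector_name") or partial:
        return attrs
    if not schema_default:
        raise ValidationError({"connector_name": "This field is required."})
    attrs["connector_name"] = schema_default
    return attrs
```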
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ae16b27d-bd5a-48a2-b609-34d30ea24dbe
📒 Files selected for processing (4)
- backend/connector_auth_v2/models.py
- backend/connector_v2/fields.py
- backend/connector_v2/serializers.py
- backend/connector_v2/views.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/connector_v2/views.py
🧹 Nitpick comments (1)
backend/connector_v2/serializers.py (1)
57-60: **Run the schema default through the same connector-name validator.** The backfilled value is assigned after field-level validation, so it currently bypasses `validate_connector_name()`. Reuse that validator before storing the default to keep fallback behavior identical to normal submissions.

♻️ Proposed consistency fix

```diff
 if not default_name:
     raise ValidationError({CIKey.CONNECTOR_NAME: "This field is required."})
-attrs[CIKey.CONNECTOR_NAME] = default_name
+attrs[CIKey.CONNECTOR_NAME] = self.validate_connector_name(default_name)
 logger.info(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/connector_v2/serializers.py` around lines 57 - 60, The default connector name assigned to attrs[CIKey.CONNECTOR_NAME] bypasses the field-level validator; call the existing validate_connector_name function on default_name (and use/assign its validated/normalized return or let it raise ValidationError) before setting attrs[CIKey.CONNECTOR_NAME] so fallback behavior matches normal submissions; update the block around validate_connector_name, default_name and attrs[CIKey.CONNECTOR_NAME] in serializers.py accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/connector_v2/serializers.py`:
- Around line 57-60: The default connector name assigned to
attrs[CIKey.CONNECTOR_NAME] bypasses the field-level validator; call the
existing validate_connector_name function on default_name (and use/assign its
validated/normalized return or let it raise ValidationError) before setting
attrs[CIKey.CONNECTOR_NAME] so fallback behavior matches normal submissions;
update the block around validate_connector_name, default_name and
attrs[CIKey.CONNECTOR_NAME] in serializers.py accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 81ef659e-1926-44b0-9c40-9d01ba5e9715
📒 Files selected for processing (4)
- backend/connector_auth_v2/constants.py
- backend/connector_auth_v2/models.py
- backend/connector_v2/fields.py
- backend/connector_v2/serializers.py
✅ Files skipped from review due to trivial changes (1)
- backend/connector_auth_v2/constants.py
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/connector_auth_v2/models.py
- backend/connector_v2/fields.py
🧹 Nitpick comments (1)
backend/connector_v2/serializers.py (1)
70-81: **Consider broadening the exception catch in `_get_schema_default_connector_name`.**
`ConnectorProcessor.get_json_schema` re-raises any exception from `json.loads(...)` on a malformed stored schema (see `backend/connector_processor/connector_processor.py:61-85`). Only `InvalidConnectorID` is caught here, so a corrupt `JSON_SCHEMA` row for a valid `connector_id` will surface as a 500 on every create — strictly worse than the pre-fix 400, and an operator typing a legitimate connector ID wouldn't get a useful response. Two reasonable paths:

- Log and return `None` on any non-`InvalidConnectorID` exception, letting the downstream `ValidationError` produce a 400 (and keeping the create flow usable for other connectors if one is misconfigured).
- Leave as-is if you explicitly want malformed schemas to page loudly — in that case a `logger.exception` above the re-raise in `get_json_schema` is probably sufficient and this can be skipped.

♻️ Option: swallow and fall through to 400

```diff
-try:
-    schema_details = ConnectorProcessor.get_json_schema(connector_id=connector_id)
-except InvalidConnectorID:
-    return None
+try:
+    schema_details = ConnectorProcessor.get_json_schema(connector_id=connector_id)
+except InvalidConnectorID:
+    return None
+except Exception:
+    logger.exception(
+        "Failed to load JSON schema for connector_id=%s while "
+        "backfilling connector_name",
+        connector_id,
+    )
+    return None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/connector_v2/serializers.py` around lines 70 - 81, The _get_schema_default_connector_name currently only catches InvalidConnectorID from ConnectorProcessor.get_json_schema; broaden the exception handling to also catch any other exceptions raised by get_json_schema (e.g., JSONDecodeError or generic Exception), log the exception for operators, and return None so malformed stored schemas don't raise a 500 during create; keep the existing InvalidConnectorID behavior but ensure any non-InvalidConnectorID errors are handled by logging (using the module logger) and falling through returning None from _get_schema_default_connector_name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/connector_v2/serializers.py`:
- Around line 70-81: The _get_schema_default_connector_name currently only
catches InvalidConnectorID from ConnectorProcessor.get_json_schema; broaden the
exception handling to also catch any other exceptions raised by get_json_schema
(e.g., JSONDecodeError or generic Exception), log the exception for operators,
and return None so malformed stored schemas don't raise a 500 during create;
keep the existing InvalidConnectorID behavior but ensure any
non-InvalidConnectorID errors are handled by logging (using the module logger)
and falling through returning None from _get_schema_default_connector_name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 32939327-9660-48aa-baf1-3b8f12ef2b20
📒 Files selected for processing (1)
backend/connector_v2/serializers.py
muhammad-ali-e
left a comment
@kirtimanmishrazipstack Make sure everything is encrypted in the metadata.
…tract into fix/sharepoint-connector-UI
Test Results
Summary
Runner Tests - Full Report
SDK1 Tests - Full Report



What
- `backend/connector_auth_v2/constants.py` — introduce shared `OAUTH_TOKEN_KEYS` frozenset (the only keys safe to merge across connectors sharing the same `(provider, uid)`).
- `backend/connector_auth_v2/models.py` — `ConnectorAuth.get_and_refresh_tokens` sibling-loop now whitelist-merges refreshed tokens over each sibling's existing metadata instead of replacing it.
- `backend/connector_v2/fields.py` — `ConnectorAuthJSONField._refresh_tokens` now preserves the row's existing metadata and only overlays `OAUTH_TOKEN_KEYS` on DB read.
- `backend/connector_v2/serializers.py` — `connector_name` is made non-required; `validate()` backfills it from the connector's JSON schema default (skipped on `partial` updates); `save()` whitelist-merges refreshed tokens into this connector's metadata so form fields (`site_url`, `drive_id`) are preserved.
- `backend/connector_v2/views.py` — `_get_connector_metadata` merges form fields with OAuth tokens (`{**form_metadata, **oauth_tokens}`) instead of replacing.
- `unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py` — fix drive resolution: `_get_drive` uses `ctx.drives[self.drive_id]` (bracket indexing on `EntityCollection`); `_get_sharepoint_site_drive` uses `ctx.sites.get_by_url(self.site_url).drive`.
- `unstract/connectors/src/unstract/connectors/filesystems/sharepoint/static/json_schema.json` — mark `user_email` as `format: "password"`; refine its description.

Why
- OAuth save/refresh paths were replacing per-connector metadata wholesale, dropping `site_url` and `drive_id`, breaking SharePoint/OneDrive connection.
- For connectors sharing the same `(provider, uid)` OAuth identity, `ConnectorAuth.extra_data` was cross-contaminating per-connector metadata at every read/refresh path (save, DB read, sibling-loop). The whitelist centralizes which keys may cross connector boundaries.
- The frontend intermittently omits `connector_name` from the POST body, causing `connector_name: This field is required.` 400s on staging for OAuth connectors. Local dev does not reproduce. Backend fallback unblocks staging without waiting for the frontend race to be isolated.
- Drive resolution was broken for an explicit `drive_id` and for site-URL addressing against the Graph API.

How
- `OAUTH_TOKEN_KEYS` (access/refresh/type/expires/auth_time/refresh_after/expires_in/scope) defines the only keys that may be merged between connectors that share an OAuth identity.
- The whitelist is applied at all three merge points: `serializers.save` (pre-save merge), `fields.from_db_value._refresh_tokens` (read-time merge), and `models.ConnectorAuth.get_and_refresh_tokens` (sibling-loop persistence). Non-token keys on each row stay untouched.
- `ConnectorInstanceSerializer.validate()` reads `properties.connectorName.default` via `ConnectorProcessor.get_json_schema` and injects it only when `connector_name` is absent and the request is not a partial update. Logs INFO on fire; raises 400 explicitly if no default exists rather than letting a null reach the DB.
- `_get_connector_metadata` merges the POSTed form body with the OAuth-cache tokens so `site_url`/`drive_id` survive the RJSF -> backend hop.
- `_get_drive` uses `ctx.drives[self.drive_id]`; `_get_sharepoint_site_drive` uses `ctx.sites.get_by_url(self.site_url).drive`.

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
No. Sibling connectors still receive refreshed `access_token`/`refresh_token`/`refresh_after` via the whitelist; our proactive-refresh path keys off `REFRESH_AFTER`, which is whitelisted. The `connector_name` fallback only fires when the field is missing, so renames via PATCH (which set `self.partial`) are unaffected.

Database Migrations
Env Config
Relevant Docs
prompting/connectors/sharepoint/sharepoint-fields-testing.md

Related Issues or PRs
Dependencies Versions
Notes on Testing
- Site-URL flow: create a connector with a `https://tenant.sharepoint.com/sites/<name>` URL; verify drive loads and files list correctly; confirm `site_url` is persisted in `connector_metadata`.
- Explicit `drive_id`: create connector with a known drive ID; verify listing works; confirm `drive_id` persists.
- Cross-contamination: create connector A with `site_url=X`, then create connector B (same OAuth identity) with `site_url=Y`. Verify B is NOT pre-filled with A's value and A's metadata row is not overwritten after B saves.
- `connector_name` fallback: submit a connector create with `connector_name` omitted; verify 201 response and server log line `Filled missing connector_name with schema default for <id>`. PATCH without the field should NOT backfill.
- Regression: OAuth refresh (keyed off `REFRESH_AFTER`), MinIO (non-OAuth), and the existing SharePoint Client-Credentials flow continue to work.

Screenshots
Checklist
I have read and understood the Contribution Guidelines.