Fixing _rawBody handling and adding option to maintain _ts and _etag properties from origin #26820

@FabianMeiswinkel FabianMeiswinkel commented Feb 1, 2022

Description

When you read a DataFrame from Cosmos without schema inference, you get a property _rawBody that contains the JSON of the payload. It should be possible to write this DataFrame back to Cosmos: the _rawBody property is supposed to be used as the new payload (although system properties like _ts and _etag are overridden in the backend). A bug prevented the special-casing of _rawBody from triggering in certain cases (when _rawBody was not nullable), so it was written as an ordinary JSON property "_rawBody" : "xxx".
This PR fixes that.
The PR also adds a new capability: if the DataFrame written to Cosmos contains a column "_origin_rawBody" instead of "_rawBody", it is generally handled like "_rawBody", but it will preserve the _ts as _origin_ts and the _etag as _origin_etag. This fills a common gap when trying to identify "missing" or "not yet migrated" records when copying from a source container to a target container.
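
To make the intended effect on the payload concrete, here is a minimal Python sketch of the transformation described above. It is an illustration only, not the connector's code: the function name `apply_origin_raw_body` is hypothetical, and moving (rather than copying) `_ts` and `_etag` is an assumption, chosen because the backend assigns fresh values for both on write anyway.

```python
import json


def apply_origin_raw_body(origin_raw_body: str) -> dict:
    """Build the document to write from an _origin_rawBody JSON string.

    Hypothetical helper illustrating the described behaviour.
    """
    doc = json.loads(origin_raw_body)
    # Keep the source's system properties under _origin_* names, because the
    # backend overwrites _ts and _etag when the document is written.
    for name in ("_ts", "_etag"):
        if name in doc:
            doc["_origin" + name] = doc.pop(name)
    return doc
```

With a plain "_rawBody" column, by contrast, no `_origin_*` properties would be added and the source's `_ts` and `_etag` would simply be lost on write.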

So, to simplify this scenario, a customer would just:

  • Read a DataFrame from the source container without schema inference
  • Rename column _rawBody to _origin_rawBody
  • Write that DF to target container
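
The three steps above could be sketched in PySpark roughly as follows. This is a sketch under assumptions, not a verified recipe: the function name and the `source_cfg`/`target_cfg` dictionaries are hypothetical, and the `cosmos.oltp` format name, the `spark.cosmos.read.inferSchema.enabled` option, and the `APPEND` save mode reflect the connector's documented usage as best recalled, so check them against the connector version you run.

```python
def copy_with_origin_metadata(spark, source_cfg, target_cfg):
    """Copy documents from source to target, keeping _ts/_etag as _origin_*.

    source_cfg and target_cfg are dicts of connector options (endpoint, key,
    database, container); their exact contents are assumed here.
    """
    # 1. Read without schema inference so each row carries the JSON in _rawBody.
    read_cfg = dict(source_cfg)
    read_cfg["spark.cosmos.read.inferSchema.enabled"] = "false"
    df = spark.read.format("cosmos.oltp").options(**read_cfg).load()

    # 2. Rename _rawBody so the writer preserves the origin's _ts and _etag.
    df = df.withColumnRenamed("_rawBody", "_origin_rawBody")

    # 3. Write the DataFrame to the target container.
    df.write.format("cosmos.oltp").options(**target_cfg).mode("APPEND").save()
    return df
```

The function takes an existing `SparkSession`, so it adds no imports of its own and leaves session and credential setup to the caller.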

Then a join could be used to identify records in the source for which the target container doesn't contain the same version (or a later version).
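
The comparison such a join performs can be shown on plain Python dictionaries. This is a minimal sketch, assuming `id` alone identifies a document (a real container may also need the partition key) and that a target copy is current when its `_origin_ts` is at least the source's `_ts`; the function name `find_unmigrated` is hypothetical.

```python
def find_unmigrated(source_docs, target_docs):
    """Return ids of source documents that are missing or stale in the target."""
    # Index the target by id; _origin_ts is the source's _ts at copy time.
    migrated_ts = {d["id"]: d.get("_origin_ts") for d in target_docs}
    pending = []
    for doc in source_docs:
        copied_ts = migrated_ts.get(doc["id"])
        # Missing in the target, or the target holds an older source version.
        if copied_ts is None or copied_ts < doc["_ts"]:
            pending.append(doc["id"])
    return pending
```

On DataFrames the same logic would be a left join from source to target on the key, filtered to rows where `_origin_ts` is null or lower than the source `_ts`.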

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@RaviTella RaviTella left a comment

Looks good! I was able to do a quick test too.
Source:
{
  "id": "4",
  "category": "Programming Languages",
  "title": "Learning Concurrent Programming in Scala",
  "_rid": "WrMCAOg+m1UPAAAAAAAAAA==",
  "_self": "dbs/WrMCAA==/colls/WrMCAOg+m1U=/docs/WrMCAOg+m1UPAAAAAAAAAA==/",
  "_etag": "\"0801e424-0000-0700-0000-61eed7410000\"",
  "_attachments": "attachments/",
  "_ts": 1643042625
}

Sink:
{
  "id": "4",
  "category": "Programming Languages",
  "title": "Learning Concurrent Programming in Scala",
  "_rid": "WrMCAJKKoDEMAAAAAAAAAA==",
  "_self": "dbs/WrMCAA==/colls/WrMCAJKKoDE=/docs/WrMCAJKKoDEMAAAAAAAAAA==/",
  "_etag": "\"a801b681-0000-0700-0000-61fad25b0000\"",
  "_attachments": "attachments/",
  "_lsn": 146,
  "_origin_etag": "\"0801e424-0000-0700-0000-61eed7410000\"",
  "_origin_ts": 1643042625,
  "_ts": 1643827803
}

@ealsur ealsur left a comment

Thanks for the new tests!

check-enforcer bot commented Feb 2, 2022

This pull request is protected by Check Enforcer.

What is Check Enforcer?

Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass then Check Enforcer itself will pass.

Why am I getting this message?

You are getting this message because Check Enforcer did not detect any check-runs being associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines and so Check Enforcer is correctly blocking the pull request being merged.

What should I do now?

If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla) then you could try telling Check Enforcer to evaluate your pull request again. You can do this by adding a comment to this pull request as follows:
/check-enforcer evaluate
Typically evaluation only takes a few seconds. If you know that your pull request is not covered by a pipeline and this is expected, you can override Check Enforcer using the following command:
/check-enforcer override
Note that using the override command triggers alerts so that follow-up investigations can occur (PRs still need to be approved as normal).

What if I am onboarding a new service?

Often, new services do not have validation pipelines associated with them. To bootstrap pipelines for a new service, you can issue the following command as a pull request comment:
/azp run prepare-pipelines
This will run a pipeline that analyzes the source tree and creates the pipelines necessary to build and validate your pull request. Once the pipeline has been created you can trigger the pipeline using the following comment:
/azp run java - [service] - ci

@FabianMeiswinkel FabianMeiswinkel merged commit 44f50b8 into Azure:main Feb 2, 2022