Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix cognitive service errors #1176

Merged
merged 14 commits into from
Sep 10, 2021

Conversation

serena-ruan
Copy link
Contributor

No description provided.

@serena-ruan serena-ruan changed the title fix: fix Left & Right errors fix: fix cognitive service errors & support autoconversion of Seq[String] Sep 1, 2021
@serena-ruan serena-ruan changed the title fix: fix cognitive service errors & support autoconversion of Seq[String] fix: fix cognitive service errors Sep 1, 2021
@serena-ruan
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@serena-ruan
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@serena-ruan
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@serena-ruan
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Collaborator

@mhamilton723 mhamilton723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!!

@@ -91,7 +91,9 @@ trait HasServiceParams extends Params {
}

protected def shouldSkip(row: Row): Boolean = getRequiredParams.exists { p =>
emptyParamData(row, p)
if (emptyParamData(row, p))
throw new NullPointerException(s"required param undefined: $p")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will never throw false if the error is in here. What I'm thinking is that we look at the required params in the transformSchema function to ensure all required params are set with either a value or a column. We can then call transformSchema before we transform to add validation there too

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should be an IllegalArgumentException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In CognitiveServiceBase, transformSchema and transform both call getInternalTransformer first, and in this function we create new SimpleHTTPTransformer and setInputParser(getInternalInputParser(schema)). And in getInternalInputParser we call inputFunc, which calls shouldSkip, and that's why I add it here. I know it will never throw false if the error happens here, so do we want to silent the error to catch it later and return null?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have the logic in getInternalTransformer. I think the logic should live in the "control-plane" which executes on the head node rather than code inside a mapPartitions like shouldSkip

.map(x => x.map(y => Map("Text" -> y))).toJson.compactPrint, ContentType.APPLICATION_JSON))
val textVal = getValueOpt(r, text)
if (textVal.nonEmpty) {
val content = textVal.get.getClass.getName match {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chris' second issue was caused by having nulls within a batch. Perhaps we should add that case as a test and ensure this new func can handle. I think in my TA batching logic it had to get pretty hairy to handle that unfortunately. Perhaps we can use similar logic. Also we might want to consider a better solution that just batching elements into single arrays. But we can push that to a later PR as that is current behavior in TA

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For nulls in text it will get null as return result, since the request itself would become invalid. But if toLanguage is set to null, then it will trigger an error as above "required param undefined", what behaviors are we expecting exactly? I noticed in TA there's a reshapeToArray stuff dealing with turning string into arrays, but I didn't get Chris's problem with the output schema (there's unpackBatchUDF dealing with the return json in TA), could explain more about that issue?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the error comes from when there are nulls of text which would ordinarily be skipped but they are in a batch so the mess up the whole batch

@@ -171,6 +211,8 @@ class Translate(override val uid: String) extends TextTranslatorBase(uid)

def setToLanguage(v: Seq[String]): this.type = setScalarParam(toLanguage, v)

def setToLanguage(v: String): this.type = setScalarParam(toLanguage, Seq(v))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! We might need to add this to python methods too

Copy link
Contributor Author

@serena-ruan serena-ruan Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this will automatically work? We're calling self._java_obj = self._java_obj.setXXX(value) and the java object will find the corresponding set function that matches the value type?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Youre right!

@serena-ruan
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@serena-ruan
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@serena-ruan
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@serena-ruan serena-ruan marked this pull request as ready for review September 6, 2021 06:51
Copy link
Collaborator

@mhamilton723 mhamilton723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love how you have applied these fixes to whole lib

post.setHeader("Content-Type", "application/json; charset=UTF-8")

val json = textAndTranslations.head.getClass.getTypeName match {
case "scala.Tuple2" => textAndTranslations.map(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a little fishy, is there a reason we cant use regular pattern matching here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because we added def setTextAndTranslation(v: (String, String)): this.type = setScalarParam(textAndTranslation, Seq(v)) function and in this case even though in getValue function we cast it into Seq[(String, String)] the underlying type is still Tuple2, and for the Tuple case I can't cast it into Seq[Row] and use Map("Text" -> s.getString(0), "Translation" -> s.getString(1)) directly...

df: DataFrame,
expectString: String): Boolean = {
val results = translator
df: DataFrame): DataFrame = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to rename this method to better reflect what it does

@serena-ruan
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@serena-ruan
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@serena-ruan serena-ruan merged commit d85aae8 into microsoft:master Sep 10, 2021
mhamilton723 added a commit that referenced this pull request Sep 23, 2021
* fix: fix setLinkedService issues in Synapse (#1177)

* doc: add predictive maintenence notebook

squash

* fix: fix cog service test flakes

* feat: add NERPii

* fix: fix scala style error

* fix: rename NERPii to PII

* fix: fix anomaly detector test cases

* fix: fix cognitive service errors (#1176)

fix Left & Right errors
Enhancement for text translator

* chore: Add script to clean and back up ACR

* fix: fix setLinkedService in Synapse

* initial commit

* resolving comments

* updated opinions test

Co-authored-by: wenqing xu <80103478+xuwq1993@users.noreply.github.com>
Co-authored-by: Mark <mhamilton723@gmail.com>
Co-authored-by: xuwq1993 <wenqx@microsoft.com>
Co-authored-by: serena-ruan <82044803+serena-ruan@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants