-
Notifications
You must be signed in to change notification settings - Fork 831
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: fix cognitive service errors #1176
Conversation
a94d4b8
to
946d73d
Compare
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!!
@@ -91,7 +91,9 @@ trait HasServiceParams extends Params { | |||
} | |||
|
|||
protected def shouldSkip(row: Row): Boolean = getRequiredParams.exists { p => | |||
emptyParamData(row, p) | |||
if (emptyParamData(row, p)) | |||
throw new NullPointerException(s"required param undefined: $p") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will never throw false if the error is in here. What I'm thinking is that we look at the required params in the transformSchema function to ensure all required params are set with either a value or a column. We can then call transformSchema before we transform to add validation there too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Should be an IllegalArgumentException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In CognitiveServiceBase, transformSchema and transform both call getInternalTransformer
first, and in this function we create new SimpleHTTPTransformer
and setInputParser(getInternalInputParser(schema))
. And in getInternalInputParser
we call inputFunc
, which calls shouldSkip
, and that's why I add it here. I know it will never throw false if the error happens here, so do we want to silent the error to catch it later and return null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have the logic in getInternalTransformer. I think the logic should live in the "control-plane" which executes on the head node rather than code inside a mapPartitions like shouldSkip
.map(x => x.map(y => Map("Text" -> y))).toJson.compactPrint, ContentType.APPLICATION_JSON)) | ||
val textVal = getValueOpt(r, text) | ||
if (textVal.nonEmpty) { | ||
val content = textVal.get.getClass.getName match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chris' second issue was caused by having nulls within a batch. Perhaps we should add that case as a test and ensure this new func can handle. I think in my TA batching logic it had to get pretty hairy to handle that unfortunately. Perhaps we can use similar logic. Also we might want to consider a better solution that just batching elements into single arrays. But we can push that to a later PR as that is current behavior in TA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For nulls in text
it will get null as return result, since the request itself would become invalid. But if toLanguage
is set to null, then it will trigger an error as above "required param undefined", what behaviors are we expecting exactly? I noticed in TA there's a reshapeToArray
stuff dealing with turning string into arrays, but I didn't get Chris's problem with the output schema (there's unpackBatchUDF
dealing with the return json in TA), could explain more about that issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the error comes from when there are nulls of text which would ordinarily be skipped but they are in a batch so the mess up the whole batch
@@ -171,6 +211,8 @@ class Translate(override val uid: String) extends TextTranslatorBase(uid) | |||
|
|||
def setToLanguage(v: Seq[String]): this.type = setScalarParam(toLanguage, v) | |||
|
|||
def setToLanguage(v: String): this.type = setScalarParam(toLanguage, Seq(v)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! We might need to add this to python methods too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this will automatically work? We're calling self._java_obj = self._java_obj.setXXX(value)
and the java object will find the corresponding set function that matches the value type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Youre right!
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
…ng as Array & supplement tests
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
…ensure the error is thrown if requried params are not set
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love how you have applied these fixes to whole lib
post.setHeader("Content-Type", "application/json; charset=UTF-8") | ||
|
||
val json = textAndTranslations.head.getClass.getTypeName match { | ||
case "scala.Tuple2" => textAndTranslations.map( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a little fishy, is there a reason we cant use regular pattern matching here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because we added def setTextAndTranslation(v: (String, String)): this.type = setScalarParam(textAndTranslation, Seq(v))
function and in this case even though in getValue
function we cast it into Seq[(String, String)] the underlying type is still Tuple2, and for the Tuple case I can't cast it into Seq[Row] and use Map("Text" -> s.getString(0), "Translation" -> s.getString(1))
directly...
df: DataFrame, | ||
expectString: String): Boolean = { | ||
val results = translator | ||
df: DataFrame): DataFrame = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to rename this method to better reflect what it does
cognitive/src/test/scala/com/microsoft/ml/spark/cognitive/split1/TranslatorSuite.scala
Show resolved
Hide resolved
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
* fix: fix setLinkedService issues in Synapse (#1177) * doc: add predictive maintenence notebook squash * fix: fix cog service test flakes * feat: add NERPii * fix: fix scala style error * fix: rename NERPii to PII * fix: fix anomaly detector test cases * fix: fix cognitive service errors (#1176) fix Left & Right errors Enhancement for text translator * chore: Add script to clean and back up ACR * fix: fix setLinkedService in Synapse * initial commit * resolving comments * updated opinions test Co-authored-by: wenqing xu <80103478+xuwq1993@users.noreply.github.com> Co-authored-by: Mark <mhamilton723@gmail.com> Co-authored-by: xuwq1993 <wenqx@microsoft.com> Co-authored-by: serena-ruan <82044803+serena-ruan@users.noreply.github.com>
No description provided.