feat: add form recognizer support #1099
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report
@@           Coverage Diff           @@
##           master    #1099   +/-   ##
=======================================
  Coverage   85.71%   85.71%
=======================================
  Files         250      251     +1
  Lines       11494    11603   +109
  Branches      600      600
=======================================
+ Hits         9852     9946    +94
- Misses       1642     1657    +15
import spark.implicits._

lazy val imageDf1: DataFrame = Seq(
  "https://mmlspark.blob.core.windows.net/datasets/FormRecognizer/layout1.jpg"
nit: perhaps we can add a function that creates these DataFrames and their equivalent byte DataFrames to reduce duplication. Something that takes a list of filenames and a boolean flag for whether to return binary data.
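A minimal sketch of what such a helper could look like, assuming the test files all live under the same blob folder as `layout1.jpg`. The object name `FormTestData`, the function name `createTestDataFrame`, and the column names `source` and `imageBytes` are hypothetical and not taken from the PR; the byte download uses `IOUtils.toByteArray` from commons-io, which is assumed to be on the test classpath.

```scala
import java.net.URL

import org.apache.commons.io.IOUtils
import org.apache.spark.sql.{DataFrame, SparkSession}

object FormTestData {
  // Assumed base folder, taken from the URL used in the test above.
  val BaseUrl = "https://mmlspark.blob.core.windows.net/datasets/FormRecognizer/"

  // Hypothetical helper: builds a one-column DataFrame from a list of file names.
  // returnBytes = false yields a string column of URLs ("source").
  // returnBytes = true downloads each file and yields a binary column ("imageBytes").
  def createTestDataFrame(spark: SparkSession,
                          fileNames: Seq[String],
                          returnBytes: Boolean): DataFrame = {
    import spark.implicits._
    val urls = fileNames.map(BaseUrl + _)
    if (returnBytes) {
      // Downloads happen eagerly on the driver, which is fine for a handful of test files.
      urls.map(u => IOUtils.toByteArray(new URL(u))).toDF("imageBytes")
    } else {
      urls.toDF("source")
    }
  }
}
```

With something like this, `imageDf1` and its byte twin would reduce to two calls that differ only in the flag, for example `FormTestData.createTestDataFrame(spark, Seq("layout1.jpg"), returnBytes = false)`.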
}

test("Basic Usage with pdf") {
  val results = pdfDf1.mlTransform(analyzeLayout,
nice use of mlTransform ;)
This is amazing. Just a few nits, and one global question about how to make a custom cognitive service into an estimator. This will be the first custom cognitive service we have tackled, and I am very excited to see how this lands.
def this() = this(Identifiable.randomUID("AnalyzeCustomForm"))

def setLocationAndModelId(loc: String, modelId: String): this.type =
Perhaps we should make a separate (service) parameter called modelID, so that setLocation in the base cognitive services class works as expected and users can apply many different custom models in a single call if desired.
}
}

class TrainCustomModel(override val uid: String) extends CognitiveServicesBaseNoHandler(uid)
I see this is a transformer, but to me training evokes the idea of a SparkML estimator. Though this might be best suited to a different class, we should discuss what a natural SparkML design for a custom AI solution might look like. My thought was that we could have something that is fit to a DataFrame of forms and returns the corresponding AnalyzeCustomModel transformer with the modelID set to a literal value (so that it applies the same custom model to all rows).
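To make the estimator idea concrete, here is a rough sketch using only the standard Spark ML `Estimator` and `Model` base classes. Everything specific in it is hypothetical: `CustomFormEstimator`, `CustomFormModel`, the `trainRemoteModel` hook, and the `modelId` output column are illustrative stand-ins, and neither class calls the Form Recognizer service. It only shows the shape of "fit returns a transformer with a fixed model ID".

```scala
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{StringType, StructType}

// Hypothetical stand-in for the analyze transformer with a literal model ID.
class CustomFormModel(override val uid: String, val modelId: String)
  extends Model[CustomFormModel] {

  // Placeholder behaviour: tags every row with the same model ID.
  // A real implementation would call the service using this ID instead.
  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn("modelId", lit(modelId))

  override def transformSchema(schema: StructType): StructType =
    schema.add("modelId", StringType, nullable = false)

  override def copy(extra: ParamMap): CustomFormModel =
    copyValues(new CustomFormModel(uid, modelId), extra).setParent(parent)
}

// Hypothetical estimator: fit() trains a custom model and returns the transformer.
class CustomFormEstimator(override val uid: String)
  extends Estimator[CustomFormModel] {

  def this() = this(Identifiable.randomUID("CustomFormEstimator"))

  // Hypothetical hook: where the training request would be issued.
  // It returns a fixed placeholder so the sketch has no service dependency.
  protected def trainRemoteModel(dataset: Dataset[_]): String = "placeholder-model-id"

  override def fit(dataset: Dataset[_]): CustomFormModel =
    copyValues(new CustomFormModel(uid, trainRemoteModel(dataset)).setParent(this))

  override def transformSchema(schema: StructType): StructType =
    schema.add("modelId", StringType, nullable = false)

  override def copy(extra: ParamMap): CustomFormEstimator = defaultCopy(extra)
}
```

The design choice here is that the model ID lives on the fitted model as a plain value rather than a per-row column, so one `fit` yields one transformer bound to one custom model. Users who want several custom models in a single call would still need the per-row modelID service parameter suggested above.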
cognitive/src/main/scala/com/microsoft/ml/spark/cognitive/FormRecognizer.scala
242405d to efa066f
This reverts commit 66ad5e2.
27af12a to 1fb7dd1
AB#1242424