feat: add form recognizer support #1099

Merged
merged 24 commits into from
Jul 2, 2021

Contributor

@serena-ruan serena-ruan commented Jun 23, 2021

@serena-ruan
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@codecov

codecov bot commented Jun 23, 2021

Codecov Report

Merging #1099 (6b34c86) into master (931cb42) will increase coverage by 0.00%.
The diff coverage is 82.56%.

@@           Coverage Diff            @@
##           master    #1099    +/-   ##
========================================
  Coverage   85.71%   85.71%            
========================================
  Files         250      251     +1     
  Lines       11494    11603   +109     
  Branches      600      600            
========================================
+ Hits         9852     9946    +94     
- Misses       1642     1657    +15     

| Impacted Files | Coverage Δ |
|---|---|
| .../microsoft/ml/spark/cognitive/FormRecognizer.scala | 82.56% <82.56%> (ø) |
| ...com/microsoft/ml/spark/cognitive/RESTHelpers.scala | 55.00% <0.00%> (+20.00%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 931cb42...6b34c86. Read the comment docs.

@serena-ruan serena-ruan marked this pull request as ready for review June 23, 2021 10:36
import spark.implicits._

lazy val imageDf1: DataFrame = Seq(
"https://mmlspark.blob.core.windows.net/datasets/FormRecognizer/layout1.jpg"
Collaborator

@mhamilton723 mhamilton723 Jun 23, 2021
nit: perhaps we can make a function that creates these DataFrames and their equivalent byte DataFrames to reduce duplication: something that takes a list of filenames and a boolean flag for whether to return binary type.
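
A minimal sketch of the helper suggested here, not the code that landed in the PR. The object name `FormTestData`, the column names `source` and `imageBytes`, and the injectable `loadBytes` function are all assumptions made for illustration; the base URL in the usage note is taken from the test snippet above.

```scala
import java.io.BufferedInputStream
import java.net.URL

import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical test helper: builds either a URL DataFrame or a byte DataFrame
// from the same list of file names, so the two variants are not duplicated.
object FormTestData {

  def createDf(spark: SparkSession,
               baseUrl: String,
               fileNames: Seq[String],
               asBytes: Boolean,
               // Injectable so the helper can be exercised without network access.
               loadBytes: String => Array[Byte] = url => downloadBytes(url)): DataFrame = {
    import spark.implicits._
    val urls = fileNames.map(baseUrl + _)
    if (asBytes) urls.map(loadBytes).toDF("imageBytes") // assumed column name
    else urls.toDF("source")                            // assumed column name
  }

  // Reads the whole resource behind `url` into memory.
  def downloadBytes(url: String): Array[Byte] = {
    val in = new BufferedInputStream(new URL(url).openStream())
    try Iterator.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toArray
    finally in.close()
  }
}
```

With this in place, a test could write `FormTestData.createDf(spark, "https://mmlspark.blob.core.windows.net/datasets/FormRecognizer/", Seq("layout1.jpg"), asBytes = false)` and flip the flag to get the binary variant.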

}

test("Basic Usage with pdf") {
val results = pdfDf1.mlTransform(analyzeLayout,
Collaborator

nice use of mlTransform ;)

Collaborator

@mhamilton723 mhamilton723 left a comment

This is amazing. Just a few nits, and one global question about how to make a custom cognitive service into an estimator. This will be the first custom cognitive service that we have tackled, and I am very excited to see how this lands.


def this() = this(Identifiable.randomUID("AnalyzeCustomForm"))

def setLocationAndModelId(loc: String, modelId: String): this.type =
Collaborator

Perhaps we should make a separate (service) parameter called modelID, so that setLocation in the base cognitive services works as expected and users can apply many different custom models with a single call if desired.
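
To illustrate the idea, here is a small Spark-free sketch of a model ID that is either one literal for every row or read from a per-row column, kept separate from the location. The types `ServiceValue`, `LiteralValue`, `ColumnValue` and the object `AnalyzeUrl` are hypothetical and are not the library's service parameter API; the URL layout is an assumption based on the Form Recognizer v2.1 REST path and should be checked against the service documentation.

```scala
// Hypothetical stand-in for a service parameter: either one literal value
// shared by all rows, or the name of a column holding a per-row value.
sealed trait ServiceValue
final case class LiteralValue(value: String) extends ServiceValue
final case class ColumnValue(column: String) extends ServiceValue

object AnalyzeUrl {

  // Assumed URL layout; the location is set independently of the model ID.
  def base(location: String): String =
    s"https://$location.api.cognitive.microsoft.com/formrecognizer/v2.1/custom/models"

  // Resolves the model ID for one row (modelled here as a simple Map) and
  // builds the analyze URL. Returns None when the row lacks the column.
  def forRow(location: String,
             modelId: ServiceValue,
             row: Map[String, String]): Option[String] = {
    val resolved = modelId match {
      case LiteralValue(v) => Some(v)
      case ColumnValue(c)  => row.get(c)
    }
    resolved.map(id => s"${base(location)}/$id/analyze")
  }
}
```

Because the model ID is resolved per row, a single transform call could send different rows to different custom models, while a literal keeps the simple one-model case unchanged.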

}
}

class TrainCustomModel(override val uid: String) extends CognitiveServicesBaseNoHandler(uid)
Collaborator

I see this is a transformer, but training to me evokes the idea of a SparkML estimator. Though this might be best suited for a different class, we should discuss how a natural SparkML design of a custom AI solution might look. My thought was that we could have something that is fit to a DataFrame of forms and returns the corresponding AnalyzeCustomModel transformer with the modelID set to a literal value (so that it applies the same custom model to all rows).
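
One way the estimator shape described here could look, as a rough sketch only. `CustomFormEstimator`, `CustomFormModel`, the injected `trainModel` function, and the `modelId` output column are hypothetical names; the real training call to the service is deliberately left out and replaced by the injected function, and the fitted model only tags rows with the literal model ID rather than calling the analyze endpoint.

```scala
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical fitted model: carries one literal model ID for every row.
class CustomFormModel(override val uid: String, val modelId: String)
  extends Model[CustomFormModel] {

  override def copy(extra: ParamMap): CustomFormModel =
    copyValues(new CustomFormModel(uid, modelId), extra).setParent(parent)

  // A real implementation would call the analyze endpoint here; this sketch
  // only attaches the literal model ID to each row.
  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.toDF().withColumn("modelId", lit(modelId))

  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField("modelId", StringType, nullable = false))
}

// Hypothetical estimator: `trainModel` stands in for the service training
// call and returns the ID of the trained custom model.
class CustomFormEstimator(override val uid: String,
                          trainModel: Dataset[_] => String)
  extends Estimator[CustomFormModel] {

  override def fit(dataset: Dataset[_]): CustomFormModel = {
    val modelId = trainModel(dataset)
    val model = new CustomFormModel(Identifiable.randomUID("CustomFormModel"), modelId)
    copyValues(model.setParent(this))
  }

  override def copy(extra: ParamMap): CustomFormEstimator =
    copyValues(new CustomFormEstimator(uid, trainModel), extra)

  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField("modelId", StringType, nullable = false))
}
```

The design choice is that `fit` is the only place that talks to the training side, and the returned model is an ordinary transformer that can be placed in a pipeline or saved alongside its model ID.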

@serena-ruan
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723 mhamilton723 merged commit 85f089d into master Jul 2, 2021