docs: add aisample title genre text classification #1617
Hey @thinkall 👋!
Lovely work, here are a few suggestions!
- No need for this line: raw_df.createOrReplaceTempView("raw_data")
- Are these lines necessary for this CSV?
  multiLine=True,
  quote='"',
  escape='"',
- We have a class balancer tool specifically to deal with label imbalance; it adds a weight that's proportional to the deficit: https://microsoft.github.io/SynapseML/docs/next/documentation/estimators/estimators_core/#classbalancer
- We have a text featurizer that might be able to turn a lot of these steps into a single model call: https://mmlspark.blob.core.windows.net/docs/0.10.0/pyspark/synapse.ml.featurize.text.html#module-synapse.ml.featurize.text.TextFeaturizer
- EvaluatorType -> use snake_case here and elsewhere
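To make the class-balancing suggestion concrete, here is a minimal plain-Python sketch of the weighting idea: the rarer a label is, the larger its weight. The formula (largest class count divided by the label's own count) is an assumption about how such a balancer behaves, and `balance_weights` and the genre labels are hypothetical, not part of SynapseML. The linked ClassBalancer docs remain the reference for the real estimator.

```python
from collections import Counter


def balance_weights(labels):
    """Map each label to max_count / count, so the majority class gets 1.0.

    Hypothetical helper for illustration only; not the SynapseML API.
    """
    counts = Counter(labels)
    max_count = max(counts.values())
    return {label: max_count / n for label, n in counts.items()}


# Made-up genre labels with a 4:2:1 imbalance.
genres = ["Fiction"] * 4 + ["Non-fiction"] * 2 + ["Poetry"]
weights = balance_weights(genres)

# One weight per row, usable as a sample-weight column during training.
row_weights = [weights[g] for g in genres]
```

With these weights every class contributes the same total weight, which is the effect you'd want before fitting the classifier.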
…into aisample-tian
Thanks @mhamilton723 for the feedback, it helped a lot :-) The only suggestion not applied is TextFeaturizer. Its output column contains only the indexes of words, not the words themselves, so we can't use the output df for plotting the word cloud. Moreover, we want to apply word2vec for vectorization, and it seems that TextFeaturizer doesn't support it.
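A small plain-Python sketch of the limitation described above, assuming the featurizer hashes tokens into index buckets the way a hashing-TF step does. Once only the indexes remain, the words needed for a word cloud cannot be recovered, so the tokens have to be kept separately. `tokenize`, `hashed_indexes`, the bucket count and the sample titles are all hypothetical, not SynapseML or Spark APIs.

```python
import re
import zlib
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", title.lower())


def hashed_indexes(tokens, num_buckets=16):
    """Hash each token into a bucket index; the original word is discarded."""
    return sorted({zlib.crc32(t.encode("utf-8")) % num_buckets for t in tokens})


# Made-up titles for illustration.
titles = ["The Secret Garden", "The Garden of Words"]
tokens = [tokenize(t) for t in titles]

# Word frequencies (what a word cloud needs) come from the kept tokens.
word_counts = Counter(w for row in tokens for w in row)

# The hashed view holds only integers, so it cannot feed a word cloud.
indexes = [hashed_indexes(row) for row in tokens]
```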
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report
@@            Coverage Diff             @@
##           master    #1617      +/-   ##
==========================================
- Coverage   83.61%   83.56%   -0.06%
==========================================
  Files         288      288
  Lines       15334    15334
  Branches      747      747
==========================================
- Hits        12822    12814       -8
- Misses       2512     2520       +8
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Related Issues/PRs
None
What changes are proposed in this pull request?
Add a title genre text classification notebook for aisample under notebooks/community/aisample.
How is this patch tested?
Does this PR change any dependencies?
Does this PR add a new feature? If so, have you added samples on website?
- website/docs/documentation folder.
- Make sure you choose the correct class estimators/transformers and namespace.
- DocTable points to correct API link.
- yarn run start to make sure the website renders correctly.
- <!--pytest-codeblocks:cont--> before each python code blocks to enable auto-tests for python samples.
- WebsiteSamplesTests job pass in the pipeline.
AB#1935137