docs: add aisample title genre text classification #1617

Merged
merged 8 commits into from
Aug 23, 2022

Conversation

thinkall
Contributor

@thinkall thinkall commented Aug 18, 2022

Related Issues/PRs

None

What changes are proposed in this pull request?

Add a title genre text classification notebook for aisample under notebooks/community/aisample.

How is this patch tested?

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

  • No. You can skip this section.
  • Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

  • No. You can skip this section.
  • Yes. Make sure you have added samples following the steps below.
  1. Find the corresponding markdown file for your new feature in the website/docs/documentation folder.
    Make sure you choose the correct class estimators/transformers and namespace.
  2. Follow the pattern in the markdown file and add another section for your new API, including pyspark, scala (and potentially .NET) samples.
  3. Make sure the DocTable points to the correct API link.
  4. Navigate to the website folder, and run yarn run start to make sure the website renders correctly.
  5. Don't forget to add <!--pytest-codeblocks:cont--> before each Python code block to enable auto-tests for Python samples.
  6. Make sure the WebsiteSamplesTests job passes in the pipeline.

AB#1935137

@github-actions

Hey @thinkall 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.
We appreciate your patience and contributions 💯!

Collaborator

@mhamilton723 mhamilton723 left a comment


Lovely work, here are a few suggestions!

  • No need for this line: raw_df.createOrReplaceTempView("raw_data")
  • Are these lines necessary for this CSV?
    multiLine=True,
    quote='"',
    escape='"',
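
One way to answer the question about the CSV options is to check whether the file actually contains quoted fields with embedded newlines or doubled quotes, since those are the cases that multiLine=True and escape='"' exist for in Spark's CSV reader. The sketch below is only an illustration using Python's standard csv module rather than Spark; the helper name `csv_needs_special_options` and both sample strings are hypothetical and are not taken from the notebook in this PR.

```python
import csv
import io


def csv_needs_special_options(text: str) -> dict:
    """Report whether any parsed field contains a newline or a double quote.

    Hypothetical helper: it parses the text with RFC 4180-style quoting,
    where "" inside a quoted field stands for one literal quote.
    """
    has_multiline = False
    has_embedded_quote = False
    # newline="" keeps line endings untouched so embedded newlines survive.
    for row in csv.reader(io.StringIO(text, newline="")):
        for field in row:
            if "\n" in field or "\r" in field:
                has_multiline = True
            if '"' in field:
                has_embedded_quote = True
    return {"multiLine": has_multiline, "escape_quote": has_embedded_quote}


# Made-up sample data, not the dataset used in the notebook.
plain = "title,genre\nDune,Science Fiction\n"
tricky = 'title,genre\n"The ""Big"" Sleep",Crime\n"Line one\nline two",Poetry\n'

plain_report = csv_needs_special_options(plain)
tricky_report = csv_needs_special_options(tricky)
print(plain_report)
print(tricky_report)
```

If both flags come back False for the real file, the extra reader options are probably unnecessary; if either is True, dropping them would likely change how some rows are parsed.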

@thinkall
Contributor Author

Thanks @mhamilton723 for the feedback. It helped a lot :-)

Only the TextFeaturizer suggestion is not applied. Its output column contains only the indexes of words, not the words themselves, so we can't use the output df to plot a word cloud. Moreover, we want to apply word2vec for vectorization, and TextFeaturizer doesn't seem to support it.
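
The point about indexes versus words can be illustrated with a small standalone sketch. It assumes the featurizer maps each token to a bucket index by hashing, which is an assumption made for illustration and not a description of SynapseML's actual implementation; `tokenize`, `hashed_indices`, and the sample titles are hypothetical. A word cloud needs word frequencies, which are available from the tokens but cannot be recovered from the indexes alone.

```python
import re
import zlib
from collections import Counter


def tokenize(title: str) -> list:
    """Lowercase a title and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", title.lower())


def hashed_indices(tokens: list, num_features: int = 1 << 18) -> list:
    """Map each token to a bucket index; the original word is not kept."""
    return [zlib.crc32(t.encode("utf-8")) % num_features for t in tokens]


# Made-up titles, not taken from the notebook's dataset.
titles = ["The Secret Garden", "The Garden of Words"]
tokens_per_title = [tokenize(t) for t in titles]

# Word frequencies, the input a word cloud needs, come from the tokens.
word_counts = Counter(tok for toks in tokens_per_title for tok in toks)

# The hashed view keeps only integers, so the words are no longer readable.
index_rows = [hashed_indices(toks) for toks in tokens_per_title]

print(word_counts.most_common(2))
print(index_rows)
```

Keeping a tokenized column alongside any vectorized column is one way to support both the word cloud and a later word2vec step, since word2vec also trains on token sequences rather than on hashed indexes.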

@thinkall
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter

codecov-commenter commented Aug 19, 2022

Codecov Report

Merging #1617 (5c08d20) into master (0f54bc6) will decrease coverage by 0.05%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1617      +/-   ##
==========================================
- Coverage   83.61%   83.56%   -0.06%     
==========================================
  Files         288      288              
  Lines       15334    15334              
  Branches      747      747              
==========================================
- Hits        12822    12814       -8     
- Misses       2512     2520       +8     
| Impacted Files | Coverage Δ | |
|---|---|---|
| .../azure/synapse/ml/param/PythonWrappableParam.scala | 66.66% <0.00%> (-8.34%) | ⬇️ |
| ...ft/azure/synapse/ml/param/JsonEncodableParam.scala | 57.14% <0.00%> (-7.15%) | ⬇️ |
| ...re/src/main/python/synapse/ml/core/schema/Utils.py | 67.10% <0.00%> (-5.27%) | ⬇️ |
| .../execution/streaming/continuous/HTTPSourceV2.scala | 92.08% <0.00%> (-0.72%) | ⬇️ |
| ...ft/azure/synapse/ml/cognitive/ComputerVision.scala | 73.10% <0.00%> (+1.26%) | ⬆️ |
| ...osoft/azure/synapse/ml/core/utils/AsyncUtils.scala | 80.00% <0.00%> (+5.00%) | ⬆️ |


@mhamilton723
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723 mhamilton723 enabled auto-merge (squash) August 23, 2022 04:24
@mhamilton723 mhamilton723 merged commit d98ac02 into microsoft:master Aug 23, 2022
@thinkall thinkall deleted the aisample-tian branch August 23, 2022 05:45