feat: Added two classification examples using Vowpal Wabbit #733

chenhuims · 2019-11-12T22:38:20Z

One example uses the Sentiment140 data for Twitter sentiment classification.
The other example applies the VW algorithm to the Adult Census dataset.

welcome · 2019-11-12T22:38:22Z

💖 Thanks for opening your first pull request! 💖 We use semantic commit messages to streamline the release process. Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix. This helps us to create release messages and credit you for your hard work!
Examples of commit messages with semantic prefixes:

fix: Fix LightGBM crashes with empty partitions
feat: Make HTTP on Spark back-offs configurable
docs: Update Spark Serving usage
build: Add codecov support
perf: improve LightGBM memory usage
refactor: make python code generation rely on classes
style: Remove nulls from CNTKModel
test: Add test coverage for CNTKModel

Make sure to check out the developer guide for guidance on testing your change.
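
As a rough illustration of the prefix rule described above, the check can be sketched in a few lines of Python. The function name `has_semantic_prefix` is hypothetical, and treating the eight prefixes from the bot's examples as the complete allowed set is an assumption; the project's real tooling may accept more.

```python
import re

# Prefixes taken from the examples listed above; treating this list as
# exhaustive is an assumption made for this sketch.
SEMANTIC_PREFIXES = ("fix", "feat", "docs", "build", "perf", "refactor", "style", "test")

# A valid title is "<prefix>: <text>", e.g. "docs: Update Spark Serving usage".
_PREFIX_RE = re.compile(r"^(?:%s): \S" % "|".join(SEMANTIC_PREFIXES))


def has_semantic_prefix(message: str) -> bool:
    """Return True if the first line of `message` starts with '<prefix>: <text>'."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return _PREFIX_RE.match(first_line) is not None
```

Under this sketch, the original PR title "Added two classification examples using Vowpal Wabbit" is rejected, while the renamed "feat: Added two classification examples using Vowpal Wabbit" passes. A `chore:` prefix, which also appears in this thread, would need to be added to the tuple.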

notebooks/samples/Classification - Twitter Sentiment with Vowpal Wabbit.ipynb

drdarshan

Thank you for this contribution! Could you please remove the output cells (esp. ones with images) and resubmit the PR? I'll sign off.

drdarshan

Thanks again for your contribution.

mhamilton723 · 2019-11-14T16:48:07Z

/azp run

azure-pipelines · 2019-11-14T16:48:19Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov · 2019-11-14T16:56:46Z

Codecov Report

Merging #733 into master will increase coverage by 11.56%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master     #733       +/-   ##
===========================================
+ Coverage   68.56%   80.13%   +11.56%     
===========================================
  Files         230      230               
  Lines        9197     9197               
  Branches      504      504               
===========================================
+ Hits         6306     7370     +1064     
+ Misses       2891     1827     -1064

Impacted Files	Coverage Δ
...osoft/ml/spark/io/http/PartitionConsolidator.scala	`95.55% <0%> (+2.22%)`	⬆️
...m/microsoft/ml/spark/io/http/HTTPTransformer.scala	`97.5% <0%> (+2.5%)`	⬆️
...om/microsoft/ml/spark/lightgbm/LightGBMUtils.scala	`94.93% <0%> (+2.53%)`	⬆️
...com/microsoft/ml/spark/core/contracts/Params.scala	`95.74% <0%> (+4.25%)`	⬆️
...a/com/microsoft/ml/spark/io/http/HTTPClients.scala	`57.14% <0%> (+5.35%)`	⬆️
...icrosoft/ml/spark/downloader/ModelDownloader.scala	`85.88% <0%> (+5.88%)`	⬆️
...scala/com/microsoft/ml/spark/io/http/Parsers.scala	`75% <0%> (+6.25%)`	⬆️
...a/com/microsoft/ml/spark/lightgbm/TrainUtils.scala	`91.03% <0%> (+8.96%)`	⬆️
...n/scala/org/apache/spark/ml/param/ArrayParam.scala	`70% <0%> (+10%)`	⬆️
...rosoft/ml/spark/core/schema/BinaryFileSchema.scala	`100% <0%> (+12.5%)`	⬆️
... and 24 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dece5ae...bfc4e96. Read the comment docs.
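
The headline numbers in the report above can be reproduced from the Hits and Lines rows. The sketch below assumes coverage is hits divided by lines and that percentages are truncated (not rounded) to two decimals. That truncation is inferred from the reported figures, not from Codecov documentation, and `truncated_pct` is a hypothetical helper.

```python
import math


def truncated_pct(part: int, total: int) -> float:
    """Percentage of `part` in `total`, truncated to two decimal places."""
    return math.floor(part / total * 10_000) / 100


LINES = 9197          # "Lines" row, same before and after
HITS_MASTER = 6306    # "Hits" on master
HITS_PR = 7370        # "Hits" after merging #733

coverage_master = truncated_pct(HITS_MASTER, LINES)        # 68.56
coverage_pr = truncated_pct(HITS_PR, LINES)                # 80.13
coverage_delta = truncated_pct(HITS_PR - HITS_MASTER, LINES)  # 11.56

# Misses are simply the lines that were not hit.
misses_master = LINES - HITS_MASTER  # 2891
misses_pr = LINES - HITS_PR          # 1827
```

Note that the delta is computed from the 1064 extra hits. Subtracting the two truncated percentages would give 11.57, not the reported 11.56.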

azure-pipelines · 2019-11-14T22:17:22Z

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 · 2019-11-18T01:17:05Z

@chenhuims Looks like the data download and load didn't work: your displayed df shows 0 rows. It might have something to do with the schema you provided. Try setting inferSchema to true instead of building and passing a schema.
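
A minimal sketch of the suggestion above, assuming the notebook reads the data with PySpark's `DataFrameReader.csv`. The helper name `load_csv_with_inferred_schema`, the `header` default, and the empty-result check are illustrative additions, not code from the notebook.

```python
def load_csv_with_inferred_schema(spark, path, header=False):
    """Read a CSV using Spark's schema inference and fail fast if it is empty.

    `spark` is expected to be a pyspark.sql.SparkSession (assumption).
    """
    # inferSchema=True lets Spark sample the file to pick column types,
    # instead of relying on a hand-written StructType.
    df = spark.read.csv(path, header=header, inferSchema=True)

    # limit(1) avoids scanning the whole dataset just to detect emptiness.
    if df.limit(1).count() == 0:
        raise ValueError(f"No rows loaded from {path!r}; check the download step")
    return df
```

In a notebook this would be called as `df = load_csv_with_inferred_schema(spark, data_path)`, where `data_path` is a placeholder for wherever the downloaded file lands. Failing on an empty DataFrame surfaces a broken download at load time instead of at a later display or training step.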

chenhuims · 2019-11-18T15:40:12Z

Thanks for checking. There was indeed some issue with data downloading. I will try to fix this issue.

chenhuims · 2019-11-20T03:02:43Z

@mhamilton723 I made a fix and the notebook runs without issue on my ADB workspace. However, the E2E test still failed. Could you grant me access to the ADB workspace of the testing pipeline so I can check the detailed logs?

chenhuims · 2019-11-21T20:10:46Z

/azp run

azure-pipelines · 2019-11-21T20:10:58Z

Azure Pipelines successfully started running 1 pipeline(s).

chenhuims · 2019-11-21T21:13:54Z

/azp run

azure-pipelines · 2019-11-21T21:14:04Z

Azure Pipelines successfully started running 1 pipeline(s).

welcome · 2019-11-21T21:59:39Z

Congrats on merging your first pull request, we appreciate your support! 🎉🎉🎉

chenhuims requested review from drdarshan and mhamilton723 as code owners November 12, 2019 22:38

drdarshan reviewed Nov 12, 2019

notebooks/samples/Classification - Twitter Sentiment with Vowpal Wabbit.ipynb (outdated)

drdarshan suggested changes Nov 12, 2019

drdarshan previously approved these changes Nov 14, 2019

chenhuims changed the title from "Added two classification examples using Vowpal Wabbit" to "feat: Added two classification examples using Vowpal Wabbit" Nov 14, 2019

chenhuims added 4 commits November 14, 2019 16:13

feat: added two classification examples with Vowpal Wabbit

836c237

refactor: updated notebooks by using the latest VW version

6828b16

refactor: removed output cells

66fcbbc

style: resized image

9c28c67

chenhuims dismissed drdarshan’s stale review via 9c28c67 November 14, 2019 16:19

chenhuims force-pushed the master branch from 21961e1 to 9c28c67 Compare November 14, 2019 16:19

fix: added bs4 dependency

3839919

chore: bump heap size in build via merging remote-tracking branch 'up…

097819d

…stream/master'

fix data loading issue

baa2352

microsoft deleted a comment from azure-pipelines bot Nov 20, 2019

chenhuims added 2 commits November 21, 2019 16:28

fix: added dataframe repartition

1e348b8

fix: try reducing training data

16afd38

refactor: updated text and image

bfc4e96

microsoft deleted a comment from azure-pipelines bot Nov 21, 2019

mhamilton723 approved these changes Nov 21, 2019

mhamilton723 merged commit 3da1d14 into microsoft:master Nov 21, 2019
