feat: Add more notebook samples for documentation #1043

serena-ruan · 2021-05-06T07:07:02Z

No description provided.

serena-ruan · 2021-05-06T07:44:58Z

/azp run

azure-pipelines · 2021-05-06T07:45:08Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov · 2021-05-06T07:49:57Z

Codecov Report

Merging #1043 (3e483c6) into master (12cea2d) will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1043      +/-   ##
==========================================
+ Coverage   84.92%   84.93%   +0.01%     
==========================================
  Files         203      203              
  Lines        9689     9689              
  Branches      558      558              
==========================================
+ Hits         8228     8229       +1     
+ Misses       1461     1460       -1
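As a quick sanity check on the diff above, the project coverage figure is consistent with hits divided by tracked lines. Hits plus misses equal the 9689 lines in both columns, so this sketch assumes no partial lines are counted. The minimal Python sketch below reproduces the 84.92% to 84.93% change. `coverage_pct` is a hypothetical helper and is not part of Codecov.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage, rounded to two decimals like the report.

    Assumption: coverage = hits / lines (no partial lines), which matches
    the diff because hits + misses == lines in both columns.
    """
    return round(100.0 * hits / lines, 2)


LINES = 9689  # unchanged between master and #1043

base = coverage_pct(8228, LINES)  # master (12cea2d)
head = coverage_pct(8229, LINES)  # #1043 (3e483c6)

# Misses are the tracked lines that were not hit.
assert LINES - 8228 == 1461 and LINES - 8229 == 1460

print(base, head, round(head - base, 2))
```

The single extra hit (and one fewer miss) accounts for the entire +0.01% change. The diff coverage is n/a because the PR only adds notebooks, not Scala sources.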

Impacted Files	Coverage Δ
...osoft/ml/spark/io/http/PartitionConsolidator.scala	`93.61% <0.00%> (-2.13%)`	⬇️
...microsoft/ml/spark/cognitive/SpeechToTextSDK.scala	`89.84% <0.00%> (-0.79%)`	⬇️
...a/com/microsoft/ml/spark/io/http/HTTPClients.scala	`83.33% <0.00%> (+6.66%)`	⬆️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12cea2d...3e483c6.

serena-ruan · 2021-05-06T08:49:28Z

/azp run

azure-pipelines · 2021-05-06T08:49:38Z

Azure Pipelines successfully started running 1 pipeline(s).

serena-ruan · 2021-05-07T04:51:19Z

/azp run

azure-pipelines · 2021-05-07T04:51:29Z

Azure Pipelines successfully started running 1 pipeline(s).

serena-ruan · 2021-05-07T04:53:35Z

/azp run

azure-pipelines · 2021-05-07T04:53:48Z

Azure Pipelines successfully started running 1 pipeline(s).

serena-ruan · 2021-05-07T08:14:30Z

/azp run

azure-pipelines · 2021-05-07T08:14:40Z

Azure Pipelines successfully started running 1 pipeline(s).

serena-ruan · 2021-05-07T09:44:45Z

/azp run

azure-pipelines · 2021-05-07T09:44:56Z

Azure Pipelines successfully started running 1 pipeline(s).

serena-ruan · 2021-05-11T02:33:42Z

/azp run

azure-pipelines · 2021-05-11T02:33:52Z

Azure Pipelines successfully started running 1 pipeline(s).

serena-ruan · 2021-05-11T03:00:00Z

/azp run

azure-pipelines · 2021-05-12T07:52:48Z

Azure Pipelines successfully started running 1 pipeline(s).

serena-ruan · 2021-05-12T10:31:38Z

/azp run

azure-pipelines · 2021-05-12T10:31:49Z

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723

Great Job! Mostly just little things left.

Two larger questions:

There are a lot of cache, count, and repartition calls in the VW code. Would you be able to try removing some of these to see whether they are necessary? We want to avoid having many DataFrames cached, but if caching is needed to avoid re-fitting the model, that is OK.

I will also send over Jack Gerrits' example of Vowpal Wabbit contextual bandit code when it is available (we don't have to block on this, though; it can be a separate PR).

mhamilton723 · 2021-05-13T19:15:08Z

notebooks/samples/Cognitive Services Overview.ipynb

+    "- Anomaly status of latest point: generates a model using preceding points and determines whether the latest point is anomalous ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc3/scala/com/microsoft/ml/spark/cognitive/DetectLastAnomaly.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc3/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.DetectLastAnomaly))\n",
+    "- Find anomalies: generates a model using an entire series and finds anomalies in the series ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc3/scala/com/microsoft/ml/spark/cognitive/DetectAnomalies.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc3/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.DetectAnomalies))\n",
+    "\n",
+    "### Web Search\n",


Web Search -> Search

mhamilton723 · 2021-05-13T19:15:33Z

notebooks/samples/Cognitive Services Overview.ipynb

+    "\n",
+    "### Web Search\n",
+    "- [Bing Image search](https://azure.microsoft.com/en-us/services/cognitive-services/bing-image-search-api/) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc3/scala/com/microsoft/ml/spark/cognitive/BingImageSearch.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc3/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.BingImageSearch))\n",
+    "- [Azure Cognitive search](https://docs.microsoft.com/en-us/azure/search/search-what-is-azure-search)\n"


Could you add the corresponding Scala and Python docs links?

mhamilton723 · 2021-05-13T19:17:38Z

notebooks/samples/LightGBM Overview.ipynb

+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_data.show(10)"


mhamilton723 · 2021-05-13T19:17:52Z

notebooks/samples/LightGBM Overview.ipynb

+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_data.groupBy(\"Bankrupt?\").count().show()"


show -> display

mhamilton723 · 2021-05-13T19:19:26Z

notebooks/samples/LightGBM Overview.ipynb

+   "outputs": [],
+   "source": [
+    "from mmlspark.lightgbm import LightGBMClassificationModel\n",
+    "model.saveNativeModel(\"/lgbmcmodel\")\n",


Perhaps we can call this lgbmclassifier.model and add a brief markdown description explaining that this lets you extract the underlying LightGBM model for fast deployment after you train on Spark.

mhamilton723 · 2021-05-13T19:22:50Z

notebooks/samples/LightGBM Overview.ipynb

+    "dt1 = spark.read.format('libsvm') \\\n",
+    "    .load(\"wasbs://publicwasb@mmlspark.blob.core.windows.net/lightGBMRanker_rank_test.libsvm\") \\\n",
+    "    .withColumn('iid', monotonically_increasing_id())\n",
+    "dt2 = spark.read.format('csv').option('inferSchema', True) \\\n",


likewise here

mhamilton723 · 2021-05-13T19:23:59Z

notebooks/samples/Vowpal Wabbit Overview.ipynb

@@ -0,0 +1,659 @@
+{


Nit: to keep with the style of the others, let's make the title "Vowpal Wabbit - Overview". Likewise for the other NBs.

mhamilton723 · 2021-05-13T19:26:27Z

notebooks/samples/Vowpal Wabbit Overview.ipynb

+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_data.groupBy(\"target\").count().show()"


mhamilton723 · 2021-05-13T19:27:04Z

notebooks/samples/Vowpal Wabbit Overview.ipynb

+    "data = spark.read.parquet(\"wasbs://publicwasb@mmlspark.blob.core.windows.net/AdultCensusIncome.parquet\")\n",
+    "data = data.select([\"education\", \"marital-status\", \"hours-per-week\", \"income\"])\n",
+    "train, test = data.randomSplit([0.75, 0.25], seed=123)\n",
+    "display(train.limit(10))"


no need for limit

mhamilton723 · 2021-05-13T19:27:50Z

notebooks/samples/Vowpal Wabbit Overview.ipynb

+    "# Making predictions\n",
+    "test = test.withColumn(\"label\", when(col(\"income\").contains(\"<\"), 0.0).otherwise(1.0))\n",
+    "prediction = vw_trained.transform(test)\n",
+    "display(prediction.limit(10))"


no need for limit
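For reference, the labeling expression in the snippet above, `when(col("income").contains("<"), 0.0).otherwise(1.0)`, splits the Adult Census income column into two classes. The plain-Python sketch below mirrors that rule for a single value. `income_to_label` is a hypothetical helper, and the sample values assume the column holds strings like `<=50K` and `>50K`.

```python
from typing import Optional


def income_to_label(income: Optional[str]) -> float:
    """Mirror of when(col("income").contains("<"), 0.0).otherwise(1.0).

    Values containing "<" (e.g. "<=50K") map to 0.0; everything else
    (e.g. ">50K") maps to 1.0. In Spark, contains() on a null yields
    null, which when() treats as false, so null falls through to 1.0.
    """
    if income is not None and "<" in income:
        return 0.0
    return 1.0


for value in [" <=50K", " >50K", None]:
    print(repr(value), income_to_label(value))
```

Because the check is a substring test, leading whitespace in the raw values does not affect the result.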

serena-ruan · 2021-05-14T07:03:04Z

/azp run

azure-pipelines · 2021-05-14T07:03:16Z

Azure Pipelines successfully started running 1 pipeline(s).

serena-ruan · 2021-05-14T08:10:16Z

/azp run

azure-pipelines · 2021-05-14T08:10:26Z

Azure Pipelines successfully started running 1 pipeline(s).

serena-ruan · 2021-05-17T05:15:29Z

/azp run

azure-pipelines · 2021-05-17T05:15:41Z

Azure Pipelines successfully started running 1 pipeline(s).

serena-ruan · 2021-05-18T03:03:52Z

/azp run

azure-pipelines · 2021-05-18T03:04:02Z

Azure Pipelines successfully started running 1 pipeline(s).

serena-ruan · 2021-05-19T06:29:37Z

/azp run

azure-pipelines · 2021-05-19T06:30:02Z

Azure Pipelines successfully started running 1 pipeline(s).

serena-ruan · 2021-05-19T06:32:01Z

/azp run

azure-pipelines · 2021-05-19T06:32:13Z

Azure Pipelines successfully started running 1 pipeline(s).

serena-ruan added area/documentation and removed area/documentation labels May 6, 2021

serena-ruan added 3 commits May 7, 2021 10:52

add LightGBM classification sample

e15e710

fix error

5d670a3

add VW classification sample

aa28f11

serena-ruan force-pushed the serena/addDocumentation branch from df83c7a to aa28f11 May 7, 2021 02:52

add cognitive services for big data sample

923628f

fix tiny error

44856d1

serena-ruan force-pushed the serena/addDocumentation branch from 225500e to 44856d1 May 7, 2021 06:33

update LightGBM samples

2732338

update samples

281e6b2

serena-ruan added 2 commits May 10, 2021 13:25

fix link

74b4cee

update comments

26703a3

Merge branch 'master' into serena/addDocumentation

52ab69f

fix spelling error

dcb7014

serena-ruan requested a review from mhamilton723 May 13, 2021 02:54

mhamilton723 requested changes May 13, 2021

serena-ruan added 2 commits May 14, 2021 14:58

update based on comments

38bb934

remove cache

bd7fde0

fix error

8cae61a

add VW contextual bandit sample

a931672

update notebook samples

c497742

serena-ruan force-pushed the serena/addDocumentation branch from 42e8b33 to c497742 May 19, 2021 06:29

Merge branch 'master' into serena/addDocumentation

9648c22

serena-ruan added 2 commits May 19, 2021 14:31

tiny modification

31a66cb

Merge branch 'serena/addDocumentation' of github.com:serena-ruan/mmlspark into serena/addDocumentation

3e483c6

mhamilton723 approved these changes May 19, 2021

serena-ruan merged commit 663d965 into microsoft:master May 19, 2021

jameslamb mentioned this pull request May 19, 2021

[docs] replace broken mmlspark notebook link in docs microsoft/LightGBM#4303

Merged
