From 10e7d450fa8257efc5d614957fda514b2b91fdee Mon Sep 17 00:00:00 2001
From: myui
Date: Wed, 17 May 2017 11:58:19 -0400
Subject: [PATCH] Fixed userguide for Docker/Spark entry

---
 docs/gitbook/docker/getting_started.md     | 19 +++++++++++++++++++
 docs/gitbook/spark/binaryclass/a9a_df.md   |  2 +-
 .../spark/getting_started/installation.md  |  4 ++--
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/docs/gitbook/docker/getting_started.md b/docs/gitbook/docker/getting_started.md
index 09447538b..810e5d8aa 100644
--- a/docs/gitbook/docker/getting_started.md
+++ b/docs/gitbook/docker/getting_started.md
@@ -39,6 +39,9 @@ This page introduces how to run Hivemall on Docker.
 
 `docker build -f resources/docker/Dockerfile .`
 
+> #### Note
+> You can [skip](./getting_started.html#running-pre-built-docker-image-in-dockerhub) building an image by pulling a pre-built Docker image instead.
+
 # 2. Run container
 
 ## Run by docker-compose
@@ -52,11 +55,27 @@
 
  2. Run `docker run -it ${docker_image_id}`. Refer [Docker reference](https://docs.docker.com/engine/reference/run/) for the command detail.
 
+## Running pre-built Docker image in Dockerhub
+
+ 1. Check [the latest tag](https://hub.docker.com/r/hivemall/latest/tags/) first.
+ 2. Pull the pre-built Docker image from Dockerhub: `docker pull hivemall/latest:20170517`
+ 3. `docker run -p 8088:8088 -p 50070:50070 -p 19888:19888 -it hivemall/latest:20170517`
+
+You can find the pre-built Hivemall Docker images in [this repository](https://hub.docker.com/r/hivemall/latest/).
+
 # 3. Run Hivemall on Docker
 
  1. Type `hive` to run (`.hiverc` automatically loads Hivemall functions)
  2. Try your Hivemall queries!
 
+## Accessing Hadoop management GUIs
+
+* YARN: http://localhost:8088/
+* HDFS: http://localhost:50070/
+* MapReduce JobHistory Server: http://localhost:19888/
+
+Note that you need to expose the local ports, e.g., by passing `-p 8088:8088 -p 50070:50070 -p 19888:19888` when running the Docker image.
+
 ## Load data into HDFS (optional)
 
 You can find an example script to load data into HDFS in `./bin/prepare_iris.sh`.

diff --git a/docs/gitbook/spark/binaryclass/a9a_df.md b/docs/gitbook/spark/binaryclass/a9a_df.md
index 7c3de6731..74f2705fa 100644
--- a/docs/gitbook/spark/binaryclass/a9a_df.md
+++ b/docs/gitbook/spark/binaryclass/a9a_df.md
@@ -50,7 +50,7 @@ val testDf = spark.read.format("libsvm").load("a9a.t")
   .select($"rowid", $"label".as("target"), $"feature", $"weight".as("value"))
   .cache
 
-scala> df.printSchema
+scala> testDf.printSchema
 root
  |-- rowid: string (nullable = true)
  |-- target: float (nullable = true)

diff --git a/docs/gitbook/spark/getting_started/installation.md b/docs/gitbook/spark/getting_started/installation.md
index 74fc56819..2eb6cde64 100644
--- a/docs/gitbook/spark/getting_started/installation.md
+++ b/docs/gitbook/spark/getting_started/installation.md
@@ -43,7 +43,7 @@ $ ./bin/spark-shell --jars hivemall-spark-xxx-with-dependencies.jar
 ```
 Then, you load scripts for Hivemall functions.
 
 ```
-scala> :load define-all.spark
-scala> :load import-packages.spark
+scala> :load resources/ddl/define-all.spark
+scala> :load resources/ddl/import-packages.spark
 ```
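
For quick verification, the Docker workflow documented by this patch can be run end to end as follows. This is a minimal sketch assembled from the commands in the patch; it assumes Docker is installed locally, and the `20170517` tag is simply the one referenced above, so a newer tag may be available on Docker Hub:

```sh
# Pull the pre-built Hivemall image; check
# https://hub.docker.com/r/hivemall/latest/tags/ for the latest tag first
# (20170517 is the tag referenced in this patch and may be outdated).
docker pull hivemall/latest:20170517

# Start the container, publishing the Hadoop management GUI ports:
#   8088 = YARN, 50070 = HDFS, 19888 = MapReduce JobHistory Server
docker run -p 8088:8088 -p 50070:50070 -p 19888:19888 -it hivemall/latest:20170517

# Inside the container: .hiverc loads the Hivemall functions automatically,
# so starting the Hive CLI is enough to begin running Hivemall queries.
hive
```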