Remove Docker instructions and update the rest
maziyarpanahi committed Sep 1, 2020
1 parent 6590059 commit a3f3d13
Showing 7 changed files with 80 additions and 126 deletions.
77 changes: 0 additions & 77 deletions Dockerfile

This file was deleted.

35 changes: 14 additions & 21 deletions README.md
@@ -17,7 +17,8 @@ Showcasing notebooks and codes of how to use Spark NLP in Python and Scala.
 * [Colab](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/colab) (for Google Colab)
 * [Databricks Notebooks](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/databricks)
 
-## Python Setup
+## Python Setup
+
 ```bash
 $ java -version
 # should be Java 8 (Oracle or OpenJDK)
@@ -26,32 +27,24 @@ $ conda activate sparknlp
 $ pip install spark-nlp pyspark==2.4.4
 ```
 
-## Docker setup
-
-If you want to experience Spark NLP and run Jupyter examples without installing anything, you can simply use our [Docker image](https://hub.docker.com/r/johnsnowlabs/spark-nlp-workshop):
-
-1- Get the docker image for spark-nlp-workshop:
-
-```bash
-docker pull johnsnowlabs/spark-nlp-workshop
-```
+## Colab setup
 
-2- Run the image locally with port binding.
+```python
+import os
 
-```bash
-docker run -it --rm -p 8888:8888 -p 4040:4040 johnsnowlabs/spark-nlp-workshop
-```
+# Install java
+! apt-get update -qq
+! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
 
-3- Open Jupyter notebooks inside your browser by using the token printed on the console.
+os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
+os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
+! java -version
 
-```bash
-http://localhost:8888/
+# Install pyspark
+! pip install -q pyspark==2.4.6
+! pip install -q spark-nlp
 ```
 
-* The password to Jupyter notebook is `sparknlp`
-* The size of the image grows everytime you download a pretrained model or a pretrained pipeline. You can cleanup `~/cache_pretrained` if you don't need them.
-* This docker image is only meant for testing/learning purposes and should not be used in production environments. Please install Spark NLP natively.
-
 ## Main repository
 
 [https://github.com/JohnSnowLabs/spark-nlp](https://github.com/JohnSnowLabs/spark-nlp)
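As a quick aside on the new Colab instructions: once the cells above have run, a session check like the following sketch (not part of this commit; `sparknlp.start()` is the library's session helper) confirms the install is wired up:

```python
import sparknlp

# Start a SparkSession preconfigured for Spark NLP
spark = sparknlp.start()

# Printed versions will vary with what was installed
print("Spark NLP version:", sparknlp.version())
print("Apache Spark version:", spark.version)
```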
9 changes: 6 additions & 3 deletions colab_setup.py
@@ -1,10 +1,13 @@
 import os
 
-os.system('apt-get install -y openjdk-8-jdk-headless -qq > /dev/null')
+# Install java
+! apt-get update -qq
+! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
 
 os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
 os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
+! java -version
 
 # Install pyspark
-os.system("pip install --ignore-installed -q pyspark==2.4.4")
-os.system("pip install --ignore-installed -q spark-nlp==2.5.1")
+! pip install -q pyspark==2.4.4
+! pip install -q spark-nlp
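One caveat on this change: the `!`-prefixed lines are notebook shell magics, so the updated snippet only runs when pasted into a Colab/Jupyter cell, not as a plain `python colab_setup.py` invocation. A script-safe equivalent (a sketch, not part of this commit) would shell out explicitly:

```python
import os
import subprocess

# Install Java 8, mirroring the notebook magics above
subprocess.run("apt-get update -qq", shell=True, check=True)
subprocess.run("apt-get install -y openjdk-8-jdk-headless -qq > /dev/null",
               shell=True, check=True)

# Point the environment at the new JVM
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]

# Install pyspark and spark-nlp
subprocess.run("pip install -q pyspark==2.4.4 spark-nlp", shell=True, check=True)
```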
4 changes: 2 additions & 2 deletions jupyter/README.md
@@ -10,7 +10,7 @@ Finally, `eval` folder contains examples of how to evaluate the annotators. So i
 If you installed pyspark through pip, you can install `spark-nlp` through pip as well.
 
 ```bash
-pip install spark-nlp==2.5.0
+pip install spark-nlp==2.5.5
 ```
 
 PyPI [spark-nlp package](https://pypi.org/project/spark-nlp/)
@@ -33,7 +33,7 @@ spark = SparkSession.builder \
     .master("local[*]")\
     .config("spark.driver.memory","6G")\
     .config("spark.driver.maxResultSize", "2G") \
-    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.5.0")\
+    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.5.5")\
     .config("spark.kryoserializer.buffer.max", "500m")\
     .getOrCreate()
 ```
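With a session configured as above, a pretrained pipeline is the quickest end-to-end test. A sketch (not part of this commit) using the public `explain_document_dl` pipeline; the download is cached under `~/cache_pretrained`:

```python
from sparknlp.pretrained import PretrainedPipeline

# Downloads and caches the pipeline on first use
pipeline = PretrainedPipeline("explain_document_dl", lang="en")

result = pipeline.annotate("Spark NLP ships with pretrained pipelines.")
print(result["token"])
print(result["pos"])
```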
5 changes: 0 additions & 5 deletions jupyter_notebook_config.json

This file was deleted.

76 changes: 58 additions & 18 deletions tutorials/README.md
@@ -1,41 +1,81 @@
 # Spark-NLP Tutorials
 
-You can either use notebooks inside `jupyter` directory when you are in Spark NLP docker image, or you can use the notebooks inside `colab` if you wish to run them on Google Colab.
-
 ## Spark NLP Instructions
 
-1.Install docker in your systems:
+### Pip
 
-Go to site [https://docs.docker.com/install/](https://docs.docker.com/install/) to download based on your specific OS.
+If you installed pyspark through pip, you can install `spark-nlp` through pip as well.
 
-Note for windows user:
-Use the stable channel for windows 10
+```bash
+pip install spark-nlp==2.5.5
+```
 
-[https://docs.docker.com/docker-for-windows/install/#what-to-know-before-you-install](https://docs.docker.com/docker-for-windows/install/#what-to-know-before-you-install)
+PyPI [spark-nlp package](https://pypi.org/project/spark-nlp/)
 
-2.Get the docker image for spark-nlp-workshop:
+### Conda
+
+If you are using Anaconda/Conda for managing Python packages, you can install `spark-nlp` as follows:
 
 ```bash
-docker pull johnsnowlabs/spark-nlp-workshop
+conda install -c johnsnowlabs spark-nlp
 ```
 
-3.Run the image locally with port binding.
+Anaconda [spark-nlp package](https://anaconda.org/JohnSnowLabs/spark-nlp)
+
+Then you'll have to create a SparkSession manually, for example:
 
 ```bash
-docker run -it --rm -p 8888:8888 -p 4040:4040 johnsnowlabs/spark-nlp-workshop
+spark = SparkSession.builder \
+    .appName("ner")\
+    .master("local[*]")\
+    .config("spark.driver.memory","6G")\
+    .config("spark.driver.maxResultSize", "2G") \
+    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.5.5")\
+    .config("spark.kryoserializer.buffer.max", "500m")\
+    .getOrCreate()
 ```
 
-4.Run the notebooks on your browser using the token printed on the console.
+If you are using local jars, you can use `spark.jars` instead with a comma-delimited list of jar files. For cluster setups, you will of course have to put the jars in a location reachable by all driver and executor nodes.
+
+## Setup Jupyter Notebook
+
+### Prerequisite: Python
+
+While Jupyter runs code in many programming languages, Python is a requirement (Python 3.3 or greater, or Python 2.7) for installing the Jupyter Notebook itself.
+
+## Installing Jupyter using Anaconda
+
+We **strongly recommend** installing Python and Jupyter using the [Anaconda Distribution](https://www.anaconda.com/downloads), which includes Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.
+
+First, download [Anaconda](https://www.anaconda.com/downloads). We recommend downloading Anaconda’s latest Python 3 version.
+
+Second, install the version of Anaconda which you downloaded, following the instructions on the download page.
+
+Congratulations, you have installed Jupyter Notebook! To run the notebook, run the following command at the Terminal (Mac/Linux) or Command Prompt (Windows):
+
+```bash
+jupyter notebook
+```
+
+### Installing Jupyter with pip
+
+As an existing or experienced Python user, you may wish to install Jupyter using Python’s package manager, pip, instead of Anaconda.
+
+If you have Python 3 installed (which is recommended):
 
 ```bash
-http://localhost:8888/
+python3 -m pip install --upgrade pip
+python3 -m pip install jupyter
 ```
 
-> NOTE: The password to Jupyter notebook is `sparknlp`
-
-### Increase Docker memory
-
-The total memory of the VM in which docker runs is 2GB by default. You can increase this in macOS and Windows via gui.
-
-> Preferences -> Advanced:
-![Databricks](./assets/docker_memory.png)
+Congratulations, you have installed Jupyter Notebook! To run the notebook, run the following command at the Terminal (Mac/Linux) or Command Prompt (Windows):
+
+```bash
+jupyter notebook
+```
+
+Original reference: [https://jupyter.org/install](https://jupyter.org/install)
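For the `spark.jars` alternative mentioned in the new text, a sketch (not part of this commit; the jar path is a placeholder for a locally built assembly jar):

```python
from pyspark.sql import SparkSession

# Same session as in the diff above, but loading Spark NLP from a local jar
# instead of resolving spark.jars.packages from a repository
spark = SparkSession.builder \
    .appName("ner") \
    .master("local[*]") \
    .config("spark.driver.memory", "6G") \
    .config("spark.driver.maxResultSize", "2G") \
    .config("spark.jars", "/path/to/spark-nlp-assembly-2.5.5.jar") \
    .config("spark.kryoserializer.buffer.max", "500m") \
    .getOrCreate()
```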
Binary file removed tutorials/assets/docker_memory.png
Binary file not shown.
