Remove Docker instructions and update the rest
maziyarpanahi committed Sep 1, 2020
1 parent 6590059 commit a3f3d13
Showing 7 changed files with 80 additions and 126 deletions.
77 changes: 0 additions & 77 deletions Dockerfile

This file was deleted.

35 changes: 14 additions & 21 deletions README.md
@@ -17,7 +17,8 @@ Showcasing notebooks and codes of how to use Spark NLP in Python and Scala.
 * [Colab](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/colab) (for Google Colab)
 * [Databricks Notebooks](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/databricks)
 
-## Python Setup
+## Python Setup
+
 ```bash
 $ java -version
 # should be Java 8 (Oracle or OpenJDK)
@@ -26,32 +27,24 @@ $ conda activate sparknlp
 $ pip install spark-nlp pyspark==2.4.4
 ```
 
-## Docker setup
-
-If you want to experience Spark NLP and run Jupyter examples without installing anything, you can simply use our [Docker image](https://hub.docker.com/r/johnsnowlabs/spark-nlp-workshop):
-
-1- Get the docker image for spark-nlp-workshop:
-
-```bash
-docker pull johnsnowlabs/spark-nlp-workshop
-```
+## Colab setup
 
-2- Run the image locally with port binding.
+```python
+import os
 
-```bash
-docker run -it --rm -p 8888:8888 -p 4040:4040 johnsnowlabs/spark-nlp-workshop
-```
+# Install java
+! apt-get update -qq
+! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
 
-3- Open Jupyter notebooks inside your browser by using the token printed on the console.
+os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
+os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
+! java -version
 
-```bash
-http://localhost:8888/
+# Install pyspark
+! pip install -q pyspark==2.4.6
+! pip install -q spark-nlp
 ```
 
-* The password to Jupyter notebook is `sparknlp`
-* The size of the image grows everytime you download a pretrained model or a pretrained pipeline. You can cleanup `~/cache_pretrained` if you don't need them.
-* This docker image is only meant for testing/learning purposes and should not be used in production environments. Please install Spark NLP natively.
-
 ## Main repository
 
 [https://github.com/JohnSnowLabs/spark-nlp](https://github.com/JohnSnowLabs/spark-nlp)
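As a quick aside on the new Colab instructions: once the cells above have run, a session check like the following sketch (not part of this commit; `sparknlp.start()` is the library's session helper) confirms the install is wired up:

```python
import sparknlp

# Start a SparkSession preconfigured for Spark NLP
spark = sparknlp.start()

# Printed versions will vary with what was installed
print("Spark NLP version:", sparknlp.version())
print("Apache Spark version:", spark.version)
```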
9 changes: 6 additions & 3 deletions colab_setup.py
@@ -1,10 +1,13 @@
 import os
 
-os.system('apt-get install -y openjdk-8-jdk-headless -qq > /dev/null')
+# Install java
+! apt-get update -qq
+! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
 
 os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
 os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
+! java -version
 
 # Install pyspark
-os.system("pip install --ignore-installed -q pyspark==2.4.4")
-os.system("pip install --ignore-installed -q spark-nlp==2.5.1")
+! pip install -q pyspark==2.4.4
+! pip install -q spark-nlp
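One caveat on this change: the `!`-prefixed lines are notebook shell magics, so the updated snippet only runs when pasted into a Colab/Jupyter cell, not as a plain `python colab_setup.py` invocation. A script-safe equivalent (a sketch, not part of this commit) would shell out explicitly:

```python
import os
import subprocess

# Install Java 8, mirroring the notebook magics above
subprocess.run("apt-get update -qq", shell=True, check=True)
subprocess.run("apt-get install -y openjdk-8-jdk-headless -qq > /dev/null",
               shell=True, check=True)

# Point the environment at the new JVM
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]

# Install pyspark and spark-nlp
subprocess.run("pip install -q pyspark==2.4.4 spark-nlp", shell=True, check=True)
```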
4 changes: 2 additions & 2 deletions jupyter/README.md
@@ -10,7 +10,7 @@ Finally, `eval` folder contains examples of how to evaluate the annotators. So i
 If you installed pyspark through pip, you can install `spark-nlp` through pip as well.
 
 ```bash
-pip install spark-nlp==2.5.0
+pip install spark-nlp==2.5.5
 ```
 
 PyPI [spark-nlp package](https://pypi.org/project/spark-nlp/)
@@ -33,7 +33,7 @@ spark = SparkSession.builder \
     .master("local[*]")\
     .config("spark.driver.memory","6G")\
     .config("spark.driver.maxResultSize", "2G") \
-    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.5.0")\
+    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.5.5")\
     .config("spark.kryoserializer.buffer.max", "500m")\
     .getOrCreate()
 ```
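With a session configured as above, a pretrained pipeline is the quickest end-to-end test. A sketch (not part of this commit) using the public `explain_document_dl` pipeline; the download is cached under `~/cache_pretrained`:

```python
from sparknlp.pretrained import PretrainedPipeline

# Downloads and caches the pipeline on first use
pipeline = PretrainedPipeline("explain_document_dl", lang="en")

result = pipeline.annotate("Spark NLP ships with pretrained pipelines.")
print(result["token"])
print(result["pos"])
```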
5 changes: 0 additions & 5 deletions jupyter_notebook_config.json

This file was deleted.

76 changes: 58 additions & 18 deletions tutorials/README.md
@@ -1,41 +1,81 @@
 # Spark-NLP Tutorials
 
-You can either use notebooks inside `jupyter` directory when you are in Spark NLP docker image, or you can use the notebooks inside `colab` if you wish to run them on Google Colab.
-
 ## Spark NLP Instructions
 
-1.Install docker in your systems:
+### Pip
 
-Go to site [https://docs.docker.com/install/](https://docs.docker.com/install/) to download based on your specific OS.
+If you installed pyspark through pip, you can install `spark-nlp` through pip as well.
 
-Note for windows user:
-Use the stable channel for windows 10
+```bash
+pip install spark-nlp==2.5.5
+```
 
-[https://docs.docker.com/docker-for-windows/install/#what-to-know-before-you-install](https://docs.docker.com/docker-for-windows/install/#what-to-know-before-you-install)
+PyPI [spark-nlp package](https://pypi.org/project/spark-nlp/)
 
-2.Get the docker image for spark-nlp-workshop:
+### Conda
+
+If you are using Anaconda/Conda for managing Python packages, you can install `spark-nlp` as follows:
 
 ```bash
-docker pull johnsnowlabs/spark-nlp-workshop
+conda install -c johnsnowlabs spark-nlp
 ```
 
-3.Run the image locally with port binding.
+Anaconda [spark-nlp package](https://anaconda.org/JohnSnowLabs/spark-nlp)
+
+Then you'll have to create a SparkSession manually, for example:
 
 ```bash
-docker run -it --rm -p 8888:8888 -p 4040:4040 johnsnowlabs/spark-nlp-workshop
+spark = SparkSession.builder \
+    .appName("ner")\
+    .master("local[*]")\
+    .config("spark.driver.memory","6G")\
+    .config("spark.driver.maxResultSize", "2G") \
+    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.5.5")\
+    .config("spark.kryoserializer.buffer.max", "500m")\
+    .getOrCreate()
 ```
 
-4.Run the notebooks on your browser using the token printed on the console.
+If you are using local jars, you can use `spark.jars` instead with a comma-delimited list of jar files. For cluster setups, you will of course have to put the jars in a location reachable by all driver and executor nodes.
+
+## Setup Jupyter Notebook
+
+### Prerequisite: Python
+
+While Jupyter runs code in many programming languages, Python is a requirement (Python 3.3 or greater, or Python 2.7) for installing the Jupyter Notebook itself.
+
+## Installing Jupyter using Anaconda
+
+We **strongly recommend** installing Python and Jupyter using the [Anaconda Distribution](https://www.anaconda.com/downloads), which includes Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.
+
+First, download [Anaconda](https://www.anaconda.com/downloads). We recommend downloading Anaconda’s latest Python 3 version.
+
+Second, install the version of Anaconda which you downloaded, following the instructions on the download page.
+
+Congratulations, you have installed Jupyter Notebook! To run the notebook, run the following command at the Terminal (Mac/Linux) or Command Prompt (Windows):
+
+```bash
+jupyter notebook
+```
+
+### Installing Jupyter with pip
+
+As an existing or experienced Python user, you may wish to install Jupyter using Python’s package manager, pip, instead of Anaconda.
+
+If you have Python 3 installed (which is recommended):
 
 ```bash
-http://localhost:8888/
+python3 -m pip install --upgrade pip
+python3 -m pip install jupyter
 ```
 
-> NOTE: The password to Jupyter notebook is `sparknlp`
-
-### Increase Docker memory
-
-The total memory of the VM in which docker runs is 2GB by default. You can increase this in macOS and Windows via gui.
-
-> Preferences -> Advanced:
-![Databricks](./assets/docker_memory.png)
+Congratulations, you have installed Jupyter Notebook! To run the notebook, run the following command at the Terminal (Mac/Linux) or Command Prompt (Windows):
+
+```bash
+jupyter notebook
+```
+
+Original reference: [https://jupyter.org/install](https://jupyter.org/install)
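For the `spark.jars` alternative mentioned in the new text, a sketch (not part of this commit; the jar path is a placeholder for a locally built assembly jar):

```python
from pyspark.sql import SparkSession

# Same session as in the diff above, but loading Spark NLP from a local jar
# instead of resolving spark.jars.packages from a repository
spark = SparkSession.builder \
    .appName("ner") \
    .master("local[*]") \
    .config("spark.driver.memory", "6G") \
    .config("spark.driver.maxResultSize", "2G") \
    .config("spark.jars", "/path/to/spark-nlp-assembly-2.5.5.jar") \
    .config("spark.kryoserializer.buffer.max", "500m") \
    .getOrCreate()
```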
Binary file removed tutorials/assets/docker_memory.png
Binary file not shown.
