Remove Docker instructions and update the rest
Commit a3f3d13, 1 parent (6590059). Showing 7 changed files with 80 additions and 126 deletions.
(One file deleted; contents not shown.)
```diff
@@ -1,10 +1,13 @@
 import os
 
-os.system('apt-get install -y openjdk-8-jdk-headless -qq > /dev/null')
+# Install java
+! apt-get update -qq
+! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
 
 os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
 os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
+! java -version
 
 # Install pyspark
-os.system("pip install --ignore-installed -q pyspark==2.4.4")
-os.system("pip install --ignore-installed -q spark-nlp==2.5.1")
+! pip install -q pyspark==2.4.4
+! pip install -q spark-nlp
```
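The `!` lines in the updated cell only run inside a notebook; in a plain Python script the same shell steps would go through `subprocess`. A minimal sketch of that equivalence (the JDK path is an assumption that holds on Debian/Ubuntu images such as Colab's; the `run` helper is mine, not part of the repo):

```python
import os
import subprocess

def run(cmd: str) -> None:
    """Run a shell command the way a notebook `!` line does, failing loudly."""
    subprocess.run(cmd, shell=True, check=True)

# Shell steps from the cell above (commented out so this sketch is side-effect free):
# run("apt-get update -qq")
# run("apt-get install -y openjdk-8-jdk-headless -qq > /dev/null")

# Point JAVA_HOME at the JDK and put its bin directory first on PATH,
# exactly as the os.environ lines in the diff do.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
```

Because PATH is prepended rather than appended, this JDK shadows any other `java` already on the image.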
(One file deleted; contents not shown.)
````diff
@@ -1,41 +1,81 @@
 # Spark-NLP Tutorials
 
-You can either use notebooks inside `jupyter` directory when you are in Spark NLP docker image, or you can use the notebooks inside `colab` if you wish to run them on Google Colab.
-
 ## Spark NLP Instructions
 
-1. Install docker in your systems:
-
-Go to site [https://docs.docker.com/install/](https://docs.docker.com/install/) to download based on your specific OS.
-
-Note for windows user:
-Use the stable channel for windows 10
-
-[https://docs.docker.com/docker-for-windows/install/#what-to-know-before-you-install](https://docs.docker.com/docker-for-windows/install/#what-to-know-before-you-install)
-
-2. Get the docker image for spark-nlp-workshop:
-
-```bash
-docker pull johnsnowlabs/spark-nlp-workshop
-```
-
-3. Run the image locally with port binding.
-
-```bash
-docker run -it --rm -p 8888:8888 -p 4040:4040 johnsnowlabs/spark-nlp-workshop
-```
-
-4. Run the notebooks on your browser using the token printed on the console.
-
-http://localhost:8888/
-
-> NOTE: The password to Jupyter notebook is `sparknlp`
-
-### Increase Docker memory
-
-The total memory of the VM in which docker runs is 2GB by default. You can increase this in macOS and Windows via gui.
-
-> Preferences -> Advanced:
-![Databricks](./assets/docker_memory.png)
+### Pip
+
+If you installed pyspark through pip, you can install `spark-nlp` through pip as well.
+
+```bash
+pip install spark-nlp==2.5.5
+```
+
+PyPI [spark-nlp package](https://pypi.org/project/spark-nlp/)
+
+### Conda
+
+If you are using Anaconda/Conda for managing Python packages, you can install `spark-nlp` as follows:
+
+```bash
+conda install -c johnsnowlabs spark-nlp
+```
+
+Anaconda [spark-nlp package](https://anaconda.org/JohnSnowLabs/spark-nlp)
+
+Then you'll have to create a SparkSession manually, for example:
+
+```python
+spark = SparkSession.builder \
+    .appName("ner") \
+    .master("local[*]") \
+    .config("spark.driver.memory", "6G") \
+    .config("spark.driver.maxResultSize", "2G") \
+    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:2.5.5") \
+    .config("spark.kryoserializer.buffer.max", "500m") \
+    .getOrCreate()
+```
+
+If using local jars, you can use `spark.jars` instead with a comma-delimited list of jar files. For cluster setups, you'll have to put the jars in a location reachable by all driver and executor nodes.
+
+## Setup Jupyter Notebook
+
+### Prerequisite: Python
+
+While Jupyter runs code in many programming languages, Python is a requirement
+(Python 3.3 or greater, or Python 2.7) for installing the Jupyter Notebook itself.
+
+## Installing Jupyter using Anaconda
+
+We **strongly recommend** installing Python and Jupyter using the [Anaconda Distribution](https://www.anaconda.com/downloads),
+which includes Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.
+
+First, download [Anaconda](https://www.anaconda.com/downloads). We recommend downloading Anaconda's latest Python 3 version.
+
+Second, install the version of Anaconda which you downloaded, following the instructions on the download page.
+
+Congratulations, you have installed Jupyter Notebook! To run the notebook, run the following command at the Terminal (Mac/Linux) or Command Prompt (Windows):
+
+```bash
+jupyter notebook
+```
+
+### Installing Jupyter with pip
+
+As an existing or experienced Python user, you may wish to install Jupyter using Python's package manager, pip, instead of Anaconda.
+
+If you have Python 3 installed (which is recommended):
+
+```bash
+python3 -m pip install --upgrade pip
+python3 -m pip install jupyter
+```
+
+Congratulations, you have installed Jupyter Notebook! To run the notebook, run
+the following command at the Terminal (Mac/Linux) or Command Prompt (Windows):
+
+```bash
+jupyter notebook
+```
+
+Original reference: [https://jupyter.org/install](https://jupyter.org/install)
````
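In the README changes above, the `spark.jars.packages` coordinate on the JVM side should match the installed `spark-nlp` Python package version. A small hypothetical helper (names and structure are mine, not part of the repo) can keep both tied to a single version string:

```python
SPARK_NLP_VERSION = "2.5.5"  # assumption: matches the pip install in the diff

def spark_nlp_conf(version: str = SPARK_NLP_VERSION) -> dict:
    """Config entries from the README's SparkSession example, keyed for reuse."""
    return {
        "spark.driver.memory": "6G",
        "spark.driver.maxResultSize": "2G",
        "spark.jars.packages": f"JohnSnowLabs:spark-nlp:{version}",
        "spark.kryoserializer.buffer.max": "500m",
    }

def build_session(version: str = SPARK_NLP_VERSION):
    """Mirror the README snippet; requires pyspark to be installed."""
    from pyspark.sql import SparkSession
    builder = SparkSession.builder.appName("ner").master("local[*]")
    for key, value in spark_nlp_conf(version).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Bumping `SPARK_NLP_VERSION` then updates the jar coordinate and the session config together, avoiding the version-skew mistakes this commit is cleaning up.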
Binary file not shown.