Example code for distributing Python packages on a Spark cluster
README.md
mecab-test.py
setup.sh
spark-defaults.conf
word_cloud.py


Distributing Python packages with pip


This repo shows how to distribute Python packages installed with pip install, together with their dependencies, across a Spark cluster.

The basic idea comes from this blog.

How to use

  • Open Workbench
  • Run setup.sh in the CDSW terminal
  • Set environment variables in the configuration UI of the project: PYSPARK_PYTHON=./MECAB/mecab_env/bin/python
  • Close Workbench and reopen the session
  • Open and run mecab-test.py in Workbench
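
The steps above can be checked with a small script along the following lines. This is a minimal sketch and not the contents of mecab-test.py: the helper names check_pyspark_python, tokenize_partition, and run are hypothetical, and it assumes the MeCab Python binding is importable inside the environment that PYSPARK_PYTHON points to.

```python
import os


def check_pyspark_python(environ=None):
    """Return the interpreter path configured for the executors, or raise if unset."""
    environ = os.environ if environ is None else environ
    path = environ.get("PYSPARK_PYTHON")
    if not path:
        raise RuntimeError(
            "PYSPARK_PYTHON is not set; set it in the project configuration UI"
        )
    return path


def tokenize_partition(lines, make_tagger=None):
    """Split every line of one partition into tokens with MeCab.

    The tagger is built once per partition, inside the executor process,
    so the MeCab import only has to succeed in the shipped environment.
    """
    if make_tagger is None:
        import MeCab  # imported lazily so that it is resolved on the executor

        def make_tagger():
            return MeCab.Tagger("-Owakati")  # space-separated output

    tagger = make_tagger()
    for line in lines:
        yield tagger.parse(line).split()


def run(texts):
    """Hypothetical driver; requires pyspark and a reachable cluster."""
    from pyspark.sql import SparkSession

    check_pyspark_python()
    spark = SparkSession.builder.appName("mecab-check").getOrCreate()
    try:
        rdd = spark.sparkContext.parallelize(texts)
        return rdd.mapPartitions(tokenize_partition).collect()
    finally:
        spark.stop()
```

If run fails with an import error raised on the executors, the usual suspects are an unset PYSPARK_PYTHON or a session that was not reopened after the variable was changed.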

Example 2: Wordcloud

word_cloud.py shows another example with MeCab and wordcloud.

  • Make sure that mecab-test.py runs successfully first
  • Run !pip install wordcloud in the Workbench session
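
As a rough illustration of how tokenized text can be turned into a word cloud, the sketch below counts tokens and renders them with the wordcloud package. It is not the code in word_cloud.py: the function names, the min_length filter, and the image size are assumptions made for this example. Japanese text needs a font that contains the required glyphs, so font_path should point to one available in your environment.

```python
from collections import Counter


def count_tokens(token_lists, min_length=2):
    """Count tokens across all lines, skipping very short ones such as particles."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(t for t in tokens if len(t) >= min_length)
    return dict(counts)


def save_word_cloud(frequencies, out_path, font_path=None):
    """Render a token -> count mapping to an image file."""
    from wordcloud import WordCloud  # installed with: pip install wordcloud

    cloud = WordCloud(
        font_path=font_path,  # a font with Japanese glyphs is needed for Japanese text
        width=800,
        height=400,
        background_color="white",
    )
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)
```

The token lists can come from the driver after collecting the MeCab output, since the rendering itself runs in the session and not on the executors.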