Interactive example with skein

Here is an interactive example that uses Skein and HDFS storage with a virtual environment. You can also run it directly in a Jupyter notebook.

  1. Prepare a virtual environment with skein & numpy
$ cd examples/interactive-mode
$ python3 -m venv venv
$ . venv/bin/activate
$ pip install cluster-pack numpy skein
$ python
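  2. Define the workload to execute remotely
def compute_intersection():
    # Import inside the function so it carries its own NumPy import when run remotely
    import numpy as np
    # random_integers is deprecated; randint's upper bound is exclusive, so 101 keeps 100 in range
    a = np.random.randint(0, 101, size=100)
    b = np.random.randint(0, 101, size=100)
    print("Computed intersection of two arrays:")
    print(np.intersect1d(a, b))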
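Before shipping the workload to the cluster, it can help to check locally what it computes. The sketch below is a seeded variant of the function above; the name `compute_intersection_local` and its `seed` parameter are hypothetical and are not part of cluster-pack. It returns the result instead of printing it so that it can be inspected.

```python
import numpy as np


def compute_intersection_local(seed: int) -> np.ndarray:
    """Hypothetical local variant of the workload, seeded for reproducibility."""
    rng = np.random.default_rng(seed)
    # Draw 100 integers from 0 to 100 inclusive (the upper bound 101 is exclusive)
    a = rng.integers(0, 101, size=100)
    b = rng.integers(0, 101, size=100)
    # intersect1d returns the sorted, unique values present in both arrays
    return np.intersect1d(a, b)


if __name__ == "__main__":
    print("Computed intersection of two arrays:")
    print(compute_intersection_local(seed=0))
```

Because the generator is seeded, two calls with the same seed return the same array, which makes the local run easy to compare across machines.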
  3. Upload the current virtual environment to the distributed storage (HDFS in this case)
import cluster_pack
package_path, _ = cluster_pack.upload_env()
  4. Call the skein config helper to build the configuration that runs this function on the cluster
from cluster_pack.skein import skein_config_builder
skein_config = skein_config_builder.build_with_func(
    func=compute_intersection,
    package_path=package_path
)
  5. Submit a simple skein application
import skein
with skein.Client() as client:
    # A single service with 1 GiB of memory and 1 virtual core
    service = skein.Service(
        resources=skein.model.Resources("1 GiB", 1),
        files=skein_config.files,
        script=skein_config.script
    )
    spec = skein.ApplicationSpec(services={"service": service})
    # submit() returns the YARN application id
    app_id = client.submit(spec)
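Submitting returns as soon as the application is handed to YARN, so a script that needs the outcome has to poll for it. The following is a minimal sketch of such a polling loop, not something cluster-pack or skein provides: `wait_for_terminal_state`, `TERMINAL_STATES`, and the `get_state` callable are hypothetical names, and the state strings are assumed to follow the YARN application states.

```python
import time
from typing import Callable

# Assumed terminal YARN application states
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def wait_for_terminal_state(
    get_state: Callable[[], object],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Poll get_state() until it reports a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = str(get_state())
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"application still in state {state!r} after {timeout}s")
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Stand-in for a real cluster: a fake sequence of states
    fake_states = iter(["ACCEPTED", "RUNNING", "FINISHED"])
    print(wait_for_terminal_state(lambda: next(fake_states), poll_interval=0.0))
```

With a real cluster, `get_state` could wrap `client.application_report(app_id).state`, called while the client is still open. This assumes that converting the state to a string yields its name, which should be checked against the skein documentation.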