fix broken links (flyteorg#211)
Signed-off-by: Samhita Alla <aallasamhita@gmail.com>
samhita-alla committed May 19, 2021
1 parent 34cb437 commit 30b024c
Showing 1 changed file with 21 additions and 19 deletions: cookbook/integrations/kubernetes/k8s_spark/README.rst
.. _plugins-spark-k8s:

Executing Spark Jobs Natively on K8s Cluster
============================================
Flyte can execute Spark jobs natively on a Kubernetes cluster, managing the lifecycle, spin-up, and tear-down
of a virtual cluster. It leverages the open-source `Spark On K8s Operator <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator>`_
and can be enabled without signing up for any service.

Flytekit makes it possible to write pyspark code natively as a task and the
Spark cluster will be automatically configured using the decorated ``SparkConf``. The examples in this section provide a
hands-on tutorial for writing pyspark tasks.
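
For instance, a Spark cluster is requested right in the task decorator. A minimal sketch (the task name and the ``spark_conf`` values are illustrative; the keys are standard Spark settings):

.. code-block:: python

    import flytekit
    from flytekit import task
    from flytekitplugins.spark import Spark

    @task(
        task_config=Spark(
            # Handed to the virtual Spark cluster as its SparkConf.
            spark_conf={
                "spark.driver.memory": "1000M",
                "spark.executor.instances": "2",
            }
        )
    )
    def hello_spark(partitions: int) -> float:
        # The plugin exposes the SparkSession created for this task.
        sess = flytekit.current_context().spark_session
        return float(sess.sparkContext.parallelize(range(1, partitions + 1)).count())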

Environment Setup
-----------------

#. Install ``flytekitplugins-spark`` using ``pip`` in your environment that contains ``flytekit >= 0.16.0``.

   .. code-block:: bash

      pip install flytekitplugins-spark

#. Build Spark image correctly as explained in :ref:`spark-docker-image`.

#. Enable the Spark plugin for Flyte, referring to the :ref:`spark-examples` section. Additionally, Flyte uses the SparkOperator to run Spark jobs and a separate K8s Service Account/Role per namespace, which are created as part of the standard Flyte deployment.

#. Ensure you have enough resources on your K8s cluster. Based on the resources required for your Spark job (across drivers/executors), you might have to tweak resource quotas for the namespace, as shown below.
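
   For example, these standard Spark settings (values are hypothetical) determine what each executor pod requests against the namespace quota:

   .. code-block:: python

      spark_conf = {
          "spark.executor.instances": "4",
          "spark.executor.memory": "2g",  # memory request per executor pod
          "spark.executor.cores": "2",    # CPU request per executor pod
          "spark.kubernetes.executor.limit.cores": "2",
      }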

.. _spark-docker-image:

How to Build Your Dockerfile for Spark on K8s
---------------------------------------------

Using Spark on K8s is extremely easy and provides full versioning using the custom-built Spark container. The built container can also execute regular Spark tasks.
For Spark, the image must contain Spark dependencies and the correct entry point for the Spark driver/executors. This can be achieved using the `flytekit_install_spark.sh <https://github.com/lyft/flytekit/blob/67b00ef6173c77a940dbe612baa9b76408ef1448/scripts/flytekit_install_spark3.sh>`__ script, referenced in the Dockerfile included below.

.. code-block:: docker
   :linenos:

   # ...

   # Note this relies on the VENV variable we've set in this image.
   ENTRYPOINT ["/usr/local/bin/flytekit_venv", "/opt/entrypoint.sh"]

.. _spark-examples:

Code Examples
-------------
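
As a taste of what follows, a Spark task composes into a workflow like any other Flyte task; a minimal sketch reusing the hypothetical ``hello_spark`` task from the introduction:

.. code-block:: python

    from flytekit import workflow

    @workflow
    def my_spark_wf(partitions: int = 50) -> float:
        # A Spark task is invoked like any other Flyte task.
        return hello_spark(partitions=partitions)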
