[DPE-1990] README #14
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
# spark8t toolkit | ||
|
||
A set of Python scripts facilitating Spark interactions over Kubernetes, using an OCI image. | ||
|
||
## Description | ||
|
||
The main purpose of the `spark8t` toolkit is to provide a seamless, user-friendly interface | ||
to Spark functionalities over Kubernetes, both for administrator tasks (such as account registration) | ||
and for data scientist functions (such as job submission or Spark interactive shell access). Various | ||
wrapper scripts allow for persistent (and user-friendly) configuration and execution of related tools. | ||
|
||
## Dependencies and Requirements | ||
|
||
- *Kubernetes* | ||
- *Apache Spark* | ||
|
||
## Installation | ||
|
||
Below we describe the essential steps on how to set up a Spark cluster together with the `spark8t` tool. | ||
|
||
(Note, however, that most of the "hassle" described below can be avoided if you choose the | ||
[canonical/spark-client-snap](canonical/spark-client-snap) Snap installation, which both installs the | ||
dependencies and prepares critical parts of the environment for you.) | ||
|
||
### Kubernetes | ||
|
||
To run Spark on Kubernetes, you will of course need a Kubernetes cluster installed :-) | ||
|
||
A simple installation of a lightweight Kubernetes implementation (Canonical's `microk8s`) is | ||
described in our [Discourse Spark | ||
Tutorial](https://discourse.charmhub.io/t/spark-client-snap-tutorial-setup-environment/8951). | ||
|
||
Remember to set the following environment variable: | ||
|
||
- `KUBECONFIG`: the location of the Kubernetes cluster configuration (typically: `/home/$USER/.kube/config`) | ||
|
||
### Spark | ||
|
||
You will need to install Spark as instructed on the official [Apache Spark pages](https://spark.apache.org/downloads.html). | ||
|
||
Related settings: | ||
|
||
- `SPARK_HOME`: location of your Spark installation | ||
|
||
### spark8t | ||
|
||
You can install the contents of this repository either by direct checkout or using `pip`, for example: | ||
|
||
``` | ||
pip install git+https://github.com/canonical/spark-k8s-toolkit-py.git | ||
``` | ||
|
||
The tool requires one mandatory piece of configuration, which points to the OCI image to be used for the Spark workers. | ||
The configuration file must be called `spark-defaults.conf` and may contain any properties that Spark | ||
accepts as command-line parameters. However, the following one has to be defined: | ||
|
||
``` | ||
spark.kubernetes.container.image=ghcr.io/canonical/charmed-spark:<version> | ||
``` | ||
|
||
(See the [Spark ROCK releases GitHub page](https://github.com/canonical/charmed-spark-rock/pkgs/container/charmed-spark) for available versions.) | ||
|
||
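As a quick sanity check of the configuration described above, the following minimal Python sketch parses a properties-style file and reports whether `spark.kubernetes.container.image` is defined. The helper names `parse_spark_conf` and `has_worker_image` are hypothetical and are not part of `spark8t`. The parser assumes simple `key=value` or whitespace-separated `key value` lines with `#` comments; it does not handle escapes or line continuations.

```python
import re

# The one property this README says must be defined.
REQUIRED_KEY = "spark.kubernetes.container.image"


def parse_spark_conf(text: str) -> dict[str, str]:
    """Parse simple `key=value` or `key value` lines (hypothetical helper)."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split once, on the first '=' or the first run of whitespace.
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        conf[parts[0]] = parts[1] if len(parts) > 1 else ""
    return conf


def has_worker_image(conf: dict[str, str]) -> bool:
    """Return True if the worker image property is present and non-empty."""
    return bool(conf.get(REQUIRED_KEY))
```

To check a real file, read it first (for example with `pathlib.Path("spark-defaults.conf").read_text()`) and pass the text to `parse_spark_conf`.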
Then assign the correct values to the following `spark8t` environment variables: | ||
|
||
- `SPARK_CONFS`: location of the `spark8t` configuration file | ||
- `HOME`: the home of the Spark user (typically: `/home/spark`) | ||
- `SPARK_USER_DATA`: the location of Spark user data, such as interactive shell history (typically: same as `HOME`) | ||
|
||
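Before running the tools, it can help to confirm that the environment variables mentioned in this README are actually set. The sketch below is only an illustration: `missing_env_vars` and `EXPECTED_VARS` are hypothetical names, not part of `spark8t`, and grouping all five variables into one check is an assumption made for convenience.

```python
import os
from typing import Mapping, Optional

# Variables named in this README; collecting them in one tuple is an assumption.
EXPECTED_VARS = ("KUBECONFIG", "SPARK_HOME", "SPARK_CONFS", "HOME", "SPARK_USER_DATA")


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the expected variables that are unset or empty (hypothetical helper)."""
    # Default to the current process environment when no mapping is given.
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment; an empty list means every variable listed above has a non-empty value.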
|
||
## Basic Usage | ||
|
||
`spark8t` is "built around" Spark itself, so its usage is very similar to that of the familiar Spark client tools. | ||
|
||
|
||
The toolkit offers access to Spark functionalities via two interfaces: | ||
|
||
- interactive CLI | ||
- programmatic access via the underlying Python library | ||
|
||
We provide the following functionalities (see related documentation on Discourse): | ||
|
||
- [management of the Account Registry](https://discourse.charmhub.io/t/spark-client-snap-tutorial-manage-spark-service-accounts/8952) | ||
|
||
- [job submission](https://discourse.charmhub.io/t/spark-client-snap-tutorial-spark-submit/8953) | ||
- [interactive shell (Python, Scala)](https://discourse.charmhub.io/t/spark-client-snap-tutorial-interactive-mode/8954) | ||
- [programmatic access](https://discourse.charmhub.io/t/spark-client-snap-how-to-python-api/8958) | ||
|
||
## Contributing | ||
Review comment: Please also add sections on submitting bugs and feedback, and on reporting security issues, thanks.
|
||
|
||
Canonical welcomes contributions to the `spark8t` toolkit. Please check out our [contributor agreement](https://ubuntu.com/legal/contributors) if you're interested in contributing to the solution. | ||
|
||
## License | ||
The `spark8t` toolkit is free software, distributed under the Apache License, Version 2.0. | ||
|
||
See [LICENSE](LICENSE) for more information. |
Review comment: Please add a header mentioning that we are hiring and that folks should apply at https://canonical.com/careers