
[DPE-1990] README #14

Merged: 3 commits, Jun 19, 2023
61 changes: 61 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,61 @@
# Contributing

## Overview

This document explains the processes and practices recommended for contributing enhancements to this repository.

- Generally, before developing enhancements to this charm, you should consider [opening an issue](https://github.com/canonical/spark-k8s-toolkit-py/issues) explaining your problem with examples, and your desired use case.
- If you would like to chat with us about your use-cases or proposed implementation, you can reach us at [Data Platform Canonical Mattermost public channel](https://chat.charmhub.io/charmhub/channels/data-platform) or [Discourse](https://discourse.charmhub.io/).
- All enhancements require review before being merged. Code review typically examines
- code quality
- test coverage
- user experience for interacting with the other components of the Charmed Spark solution.
- Please help us ensure easy-to-review branches by rebasing your pull request branch onto the `main` branch. This avoids merge commits and keeps a linear Git history.

To build and develop the package in this repository, we advise using [Poetry](https://python-poetry.org/). For instructions on installing Poetry on different platforms, please refer to the [official documentation](https://python-poetry.org/docs/#installation).
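As a sketch, on Linux or macOS Poetry can be installed with its official installer (see the documentation above for Windows and alternative methods):

```shell
# Official Poetry installer (Linux/macOS)
curl -sSL https://install.python-poetry.org | python3 -

# Check that the installation succeeded
poetry --version
```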

## Install from source

To install the package with Poetry, check out the repository

```bash
git clone https://github.com/canonical/spark-k8s-toolkit-py.git
cd spark-k8s-toolkit-py/
```

and run

```bash
poetry install
```

## Developing

When developing, we advise you to use a virtual environment to confine the installation of this package and its dependencies. Please refer to [venv](https://docs.python.org/3/library/venv.html), [pyenv](https://github.com/pyenv/pyenv) or [conda](https://docs.conda.io/en/latest/) for tools that help you create and manage virtual environments.
We also advise you to read how Poetry integrates with virtual environments [here](https://python-poetry.org/docs/managing-environments/).
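For example, a sketch of one possible Poetry setup (both settings are optional; the package name `spark8t` is taken from the project's Makefile):

```shell
# Keep the virtual environment inside the project directory (optional)
poetry config virtualenvs.in-project true

# Run a command inside the Poetry-managed environment without activating it
poetry run python -c "import spark8t"
```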

The project uses [tox](https://tox.wiki/en/latest/) for running CI/CD pipelines and automation in different environments, whereas setup of Python-agnostic components can be done using the [Makefile](./Makefile).

You can create an environment for development with `tox`:

```shell
tox devenv -e integration
source venv/bin/activate
```

### Testing

Using tox, you can also run several other operations, such as:

```shell
tox run -e fmt # update your code according to linting rules
tox run -e lint # code style
tox run -e unit # unit tests
tox run -e integration # integration tests
tox run -e all-tests # unit+integration tests
tox # runs 'lint' and 'unit' environments
```
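tox forwards anything after `--` to the underlying command as positional arguments, so (assuming the test environments pass `{posargs}` through to pytest; the file and test names below are illustrative) you can target a single test:

```shell
# Run only one test module, filtered to a single test case (names are examples)
tox run -e unit -- tests/test_services.py -k "test_name"
```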

## Canonical Contributor Agreement

Canonical welcomes contributions to the `spark8t` toolkit. Please check out our [contributor agreement](https://ubuntu.com/legal/contributors) if you're interested in contributing to the solution.
1 change: 1 addition & 0 deletions Makefile
@@ -8,6 +8,7 @@ PYTHON = poetry run

folders := helpers tests
files := $(shell find . -name "*.py")
package_name="spark8t"

# Uncomment to store cache installation in the environment
# package_dir := $(shell python -c 'import site; print(site.getsitepackages()[0])')
92 changes: 92 additions & 0 deletions README.md
@@ -0,0 +1,92 @@
# spark8t toolkit
Reviewer comment: please add a header mentioning that we are hiring and folks should apply at https://canonical.com/careers

A set of Python scripts facilitating Spark interactions over Kubernetes, using an OCI image.

## Description

The main purpose of the `spark8t` toolkit is to provide a seamless, user-friendly interface
to Spark functionalities over Kubernetes, both for administrator tasks (such as account registration)
and for data scientist functions (such as job submission or Spark interactive shell access). Various
wrapper scripts allow for persistent (and user-friendly) configuration and execution of the related tools.

## Dependencies and Requirements

- *Kubernetes*
- *Apache Spark*

## Installation

Below we describe the essential steps to set up a Spark cluster together with the `spark8t` tool.

(Note, however, that most of the "hassle" described below can be avoided if you choose to use the
[canonical/spark-client-snap](canonical/spark-client-snap) snap installation, which both installs the
dependencies and prepares critical parts of the environment for you.)

### Kubernetes

In order to run Spark on Kubernetes, you will of course need a Kubernetes cluster installed.

Instructions for a simple installation of a lightweight Kubernetes implementation (Canonical's `microk8s`) can
be found in our [Discourse Spark
Tutorial](https://discourse.charmhub.io/t/spark-client-snap-tutorial-setup-environment/8951).

Keep in mind to set the following environment variable:

- `KUBECONFIG`: the location of the Kubernetes cluster configuration (typically `/home/$USER/.kube/config`)
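For example (a sketch; the `microk8s config` step applies only if you installed microk8s):

```shell
# With microk8s, the cluster config can be exported first:
#   microk8s config > "$HOME/.kube/config"

# Point spark8t (and kubectl) at the cluster configuration
export KUBECONFIG="$HOME/.kube/config"
```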

### Spark

You will need to install Spark as instructed at the official [Apache Spark pages](https://spark.apache.org/downloads.html).

Related settings:

- `SPARK_HOME`: location of your Spark installation
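As an illustrative sketch (the release version and install path below are examples, not requirements):

```shell
# After downloading and unpacking a Spark release, e.g.:
#   tar -xzf spark-3.4.2-bin-hadoop3.tgz -C /opt/
export SPARK_HOME=/opt/spark-3.4.2-bin-hadoop3
export PATH="$SPARK_HOME/bin:$PATH"
```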

### spark8t

You can install the contents of this repository either from a direct checkout, or using `pip`, such as

```bash
pip install git+https://github.com/canonical/spark-k8s-toolkit-py.git
```
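Alternatively, a sketch of installing from a direct checkout:

```shell
# Clone the repository and install it from the local checkout
git clone https://github.com/canonical/spark-k8s-toolkit-py.git
cd spark-k8s-toolkit-py/
pip install .
```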

You'll need to add a mandatory configuration for the tool, pointing to the OCI image to be used for the Spark workers.
The configuration file must be called `spark-defaults.conf`, and it may contain any of the configuration options that
Spark accepts. However, the following one must be defined:

```
spark.kubernetes.container.image=ghcr.io/canonical/charmed-spark:<version>
```

(See the [Spark ROCK releases GitHub page](https://github.com/canonical/charmed-spark-rock/pkgs/container/charmed-spark) for available versions)
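Beyond the container image, other Spark options can live in the same file; a minimal illustrative `spark-defaults.conf` (the namespace and service-account names here are examples, not defaults) might look like:

```
spark.kubernetes.container.image=ghcr.io/canonical/charmed-spark:<version>
spark.kubernetes.namespace=spark
spark.kubernetes.authenticate.driver.serviceAccountName=spark
```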

Then you will need to assign the correct values to the following `spark8t` environment variables:

- `SPARK_CONFS`: location of the `spark8t` configuration file
- `HOME`: the home of the Spark user (typically: `/home/spark`)
- `SPARK_USER_DATA`: the location of Spark user data, such as interactive shell history (typically: same as `HOME`)
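As a sketch (the `SPARK_CONFS` path is hypothetical; the other values are the typical defaults mentioned above):

```shell
# Hypothetical config location; point this at your spark-defaults.conf
export SPARK_CONFS=/home/spark/spark-defaults.conf
export HOME=/home/spark
export SPARK_USER_DATA="$HOME"
```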

## Basic Usage

`spark8t` is built around Spark itself, so its usage is very similar to that of the standard Spark client tools.

The toolkit offers access to Spark functionalities via two interfaces:

- interactive CLI
- programmatic access via the underlying Python library

We provide the following functionalities (see related documentation on Discourse):

- [management of the Account Registry](https://discourse.charmhub.io/t/spark-client-snap-tutorial-manage-spark-service-accounts/8952)
- [job submission](https://discourse.charmhub.io/t/spark-client-snap-tutorial-spark-submit/8953)
- [interactive shell (Python, Scala)](https://discourse.charmhub.io/t/spark-client-snap-tutorial-interactive-mode/8954)
- [programmatic access](https://discourse.charmhub.io/t/spark-client-snap-how-to-python-api/8958)

## Contributing
Reviewer comment: Please also add sections on submitting bugs and feedback; and on reporting security issues, thanks

Contributor author: @grobbie @deusebio Is there a good "template" for that? I mean a well-formed text we already have in Canonical?
Canonical welcomes contributions to the `spark8t` toolkit. Please check out our [guidelines](./CONTRIBUTING.md) if you're interested in contributing to the solution. Also, if you truly enjoy working on open-source projects like this one and you would like to be part of the OSS revolution, please don't forget to check out the [open positions](https://canonical.com/careers/all) we have at [Canonical](https://canonical.com/).

## License

The `spark8t` toolkit is free software, distributed under the Apache Software License, version 2.0. See [LICENSE](LICENSE) for more information.