
[DPE-1990] README #14

Merged: 3 commits, Jun 19, 2023
61 changes: 61 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,61 @@
# Contributing

## Overview

This document explains the processes and practices recommended for contributing enhancements to this repository.

- Generally, before developing enhancements to this charm, you should consider [opening an issue](https://github.com/canonical/spark-k8s-toolkit-py/issues) explaining your problem with examples, and your desired use case.
- If you would like to chat with us about your use-cases or proposed implementation, you can reach us at [Data Platform Canonical Mattermost public channel](https://chat.charmhub.io/charmhub/channels/data-platform) or [Discourse](https://discourse.charmhub.io/).
- All enhancements require review before being merged. Code review typically examines
- code quality
- test coverage
- user experience for interacting with the other components of the Charmed Spark solution.
- Please help us ensure easy-to-review branches by rebasing your pull request branch onto the `main` branch. This avoids merge commits and keeps a linear Git history.

To build and develop the package in this repository, we advise using [Poetry](https://python-poetry.org/). For instructions on installing Poetry on different platforms, please refer to the [official documentation](https://python-poetry.org/docs/#installation).
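As a sketch, on Linux or macOS Poetry can be installed with its official installer (see the documentation above for Windows and alternative methods):

```shell
# Official Poetry installer (Linux/macOS)
curl -sSL https://install.python-poetry.org | python3 -

# Check that the installation succeeded
poetry --version
```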

## Install from source

To install the package with Poetry, check out the repository

```bash
git clone https://github.com/canonical/spark-k8s-toolkit-py.git
cd spark-k8s-toolkit-py/
```

and run

```bash
poetry install
```

## Developing

When developing, we advise you to use a virtual environment to confine the installation of this package and its dependencies. Please refer to [venv](https://docs.python.org/3/library/venv.html), [pyenv](https://github.com/pyenv/pyenv) or [conda](https://docs.conda.io/en/latest/) for tools that help you create and manage virtual environments.
We also advise you to read how Poetry integrates with virtual environments [here](https://python-poetry.org/docs/managing-environments/).
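For example, a sketch of one possible Poetry setup (both settings are optional; the package name `spark8t` is taken from the project's Makefile):

```shell
# Keep the virtual environment inside the project directory (optional)
poetry config virtualenvs.in-project true

# Run a command inside the Poetry-managed environment without activating it
poetry run python -c "import spark8t"
```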

The project uses [tox](https://tox.wiki/en/latest/) for running CI/CD pipelines and automation in different environments, whereas setup of Python-agnostic components can be done using the [Makefile](./Makefile).

You can create an environment for development with `tox`:

```shell
tox devenv -e integration
source venv/bin/activate
```

### Testing

Using tox, you can also run several other operations, such as:

```shell
tox run -e fmt # update your code according to linting rules
tox run -e lint # code style
tox run -e unit # unit tests
tox run -e integration # integration tests
tox run -e all-tests # unit+integration tests
tox # runs 'lint' and 'unit' environments
```
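tox forwards anything after `--` to the underlying command as positional arguments, so (assuming the test environments pass `{posargs}` through to pytest; the file and test names below are illustrative) you can target a single test:

```shell
# Run only one test module, filtered to a single test case (names are examples)
tox run -e unit -- tests/test_services.py -k "test_name"
```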

## Canonical Contributor Agreement

Canonical welcomes contributions to the `spark8t` toolkit. Please check out our [contributor agreement](https://ubuntu.com/legal/contributors) if you're interested in contributing to the solution.
1 change: 1 addition & 0 deletions Makefile
@@ -8,6 +8,7 @@ PYTHON = poetry run

folders := helpers tests
files := $(shell find . -name "*.py")
package_name="spark8t"

# Uncomment to store cache installation in the environment
# package_dir := $(shell python -c 'import site; print(site.getsitepackages()[0])')
92 changes: 92 additions & 0 deletions README.md
@@ -0,0 +1,92 @@
# spark8t toolkit
Reviewer comment: please add a header mentioning that we are hiring and folks should apply at https://canonical.com/careers

A set of Python scripts facilitating Spark interactions over Kubernetes, using an OCI image.

## Description

The main purpose of the `spark8t` toolkit is to provide a seamless, user-friendly interface
to Spark functionalities over Kubernetes, both for administrator tasks (such as account registration)
and for data scientist functions (such as job submission or Spark interactive shell access). Various
wrapper scripts allow for persistent (and user-friendly) configuration and execution of the related tools.

## Dependencies and Requirements

- *Kubernetes*
- *Apache Spark*

## Installation

Below we describe the essential steps to set up a Spark cluster together with the `spark8t` tool.

(Note, however, that most of the "hassle" described below can be avoided if you choose to use the
[canonical/spark-client-snap](canonical/spark-client-snap) snap installation, which both installs the
dependencies and prepares critical parts of the environment for you.)

### Kubernetes

In order to run Spark on Kubernetes, you will of course need a Kubernetes cluster installed.

Instructions for a simple installation of a lightweight Kubernetes implementation (Canonical's `microk8s`) can
be found in our [Discourse Spark
Tutorial](https://discourse.charmhub.io/t/spark-client-snap-tutorial-setup-environment/8951).

Keep in mind to set the following environment variable:

- `KUBECONFIG`: the location of the Kubernetes cluster configuration (typically `/home/$USER/.kube/config`)
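For example (a sketch; the `microk8s config` step applies only if you installed microk8s):

```shell
# With microk8s, the cluster config can be exported first:
#   microk8s config > "$HOME/.kube/config"

# Point spark8t (and kubectl) at the cluster configuration
export KUBECONFIG="$HOME/.kube/config"
```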

### Spark

You will need to install Spark as instructed at the official [Apache Spark pages](https://spark.apache.org/downloads.html).

Related settings:

- `SPARK_HOME`: location of your Spark installation
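As an illustrative sketch (the release version and install path below are examples, not requirements):

```shell
# After downloading and unpacking a Spark release, e.g.:
#   tar -xzf spark-3.4.2-bin-hadoop3.tgz -C /opt/
export SPARK_HOME=/opt/spark-3.4.2-bin-hadoop3
export PATH="$SPARK_HOME/bin:$PATH"
```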

### spark8t

You can install the contents of this repository either from a direct checkout, or using `pip`, such as

```bash
pip install git+https://github.com/canonical/spark-k8s-toolkit-py.git
```
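Alternatively, a sketch of installing from a direct checkout:

```shell
# Clone the repository and install it from the local checkout
git clone https://github.com/canonical/spark-k8s-toolkit-py.git
cd spark-k8s-toolkit-py/
pip install .
```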

You'll need to add a mandatory configuration for the tool, pointing to the OCI image to be used for the Spark workers.
The configuration file must be called `spark-defaults.conf`, and it may contain any of the configuration options that
Spark accepts. However, the following one must be defined:

```
spark.kubernetes.container.image=ghcr.io/canonical/charmed-spark:<version>
```

(See the [Spark ROCK releases GitHub page](https://github.com/canonical/charmed-spark-rock/pkgs/container/charmed-spark) for available versions)
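Beyond the container image, other Spark options can live in the same file; a minimal illustrative `spark-defaults.conf` (the namespace and service-account names here are examples, not defaults) might look like:

```
spark.kubernetes.container.image=ghcr.io/canonical/charmed-spark:<version>
spark.kubernetes.namespace=spark
spark.kubernetes.authenticate.driver.serviceAccountName=spark
```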

Then you will need to assign the correct values to the following `spark8t` environment variables:

- `SPARK_CONFS`: location of the `spark8t` configuration file
- `HOME`: the home of the Spark user (typically: `/home/spark`)
- `SPARK_USER_DATA`: the location of Spark user data, such as interactive shell history (typically: same as `HOME`)
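As a sketch (the `SPARK_CONFS` path is hypothetical; the other values are the typical defaults mentioned above):

```shell
# Hypothetical config location; point this at your spark-defaults.conf
export SPARK_CONFS=/home/spark/spark-defaults.conf
export HOME=/home/spark
export SPARK_USER_DATA="$HOME"
```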

## Basic Usage

`spark8t` is built around Spark itself, so its usage is very similar to that of the standard Spark client tools.

The toolkit offers access to Spark functionalities via two interfaces:

- interactive CLI
- programmatic access via the underlying Python library

We provide the following functionalities (see related documentation on Discourse):

- [management of the Account Registry](https://discourse.charmhub.io/t/spark-client-snap-tutorial-manage-spark-service-accounts/8952)
- [job submission](https://discourse.charmhub.io/t/spark-client-snap-tutorial-spark-submit/8953)
- [interactive shell (Python, Scala)](https://discourse.charmhub.io/t/spark-client-snap-tutorial-interactive-mode/8954)
- [programmatic access](https://discourse.charmhub.io/t/spark-client-snap-how-to-python-api/8958)

## Contributing
Reviewer comment: Please also add sections on submitting bugs and feedback; and on reporting security issues, thanks

Contributor author: @grobbie @deusebio Is there a good "template" for that? I mean a well-formed text we already have in Canonical?
Canonical welcomes contributions to the `spark8t` toolkit. Please check out our [guidelines](./CONTRIBUTING.md) if you're interested in contributing to the solution. Also, if you truly enjoy working on open-source projects like this one and you would like to be part of the OSS revolution, please don't forget to check out the [open positions](https://canonical.com/careers/all) we have at [Canonical](https://canonical.com/).

## License

The `spark8t` toolkit is free software, distributed under the Apache Software License, version 2.0. See [LICENSE](LICENSE) for more information.