# Example usage on HPE

### 1. Connect to the Spark session

### 2. Download the source code by following the procedure in [README.md](../README.md)

In [8]:
%%bash

git init
git remote add pysparkchannel https://github.com/HinnyTsang/pysparkchannel.git
git config core.sparseCheckout true
echo "pysparkchannel" > .git/info/sparse-checkout
git pull pysparkchannel main

rm -rf .git

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: 
hint: 	git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint: 	git branch -m <name>


Initialized empty Git repository in /home/hinny/data-science/pysparkchannel/example/.git/


From https://github.com/HinnyTsang/pysparkchannel
 * branch            main       -> FETCH_HEAD
 * [new branch]      main       -> pysparkchannel/main
	pysparkchannel/__init__.py
	pysparkchannel/core.py
	pysparkchannel/script.py
	pysparkchannel/utils.py

After fixing the above paths, you may want to run `git sparse-checkout reapply`.


### 3. Run the following cells

This example regenerates `module_a` and `module_b` in the `dest` directory.

In [3]:
# %%local
# Run this cell locally (not on the Spark session)

# Import the pysparkchannel module
import os
import shutil
from pysparkchannel.core import ModuleParser

# Remove the destination directory if it already exists
if os.path.exists("dest"):
    shutil.rmtree("dest", ignore_errors=True)

# Create the parser
parser = ModuleParser(verbose=True)

# Parse the custom modules one by one (calls can be chained)
(
    parser
    .parse_module("module_a")
    .parse_module("module_b")
)

# Generate a script that rewrites the modules on the Spark cluster
script = parser.generate_script("dest")


Parsing module: module_a
|--  module_a
   |__  __init__.py

Parsing module: module_b
|--  module_b
   |--  sub_module_i
      |__  sub_module_ii.py
      |__  __init__.py
   |__  module_bi.py
   |__  __init__.py
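
The parse-then-generate step above can be pictured with a minimal sketch. This is not the `pysparkchannel` implementation: the helpers `collect_sources` and `build_script` are hypothetical names, and the sketch assumes the general idea is to embed each source file's text in a plain Python string that recreates the files when executed.

```python
import os
import tempfile


def collect_sources(module_dir):
    """Map each .py file under module_dir to its text (hypothetical helper).

    Keys are POSIX-style paths relative to the parent of module_dir.
    """
    root = os.path.dirname(os.path.abspath(module_dir))
    sources = {}
    for folder, _dirs, files in os.walk(module_dir):
        for name in sorted(files):
            if name.endswith(".py"):
                path = os.path.abspath(os.path.join(folder, name))
                rel = os.path.relpath(path, root).replace(os.sep, "/")
                with open(path, encoding="utf-8") as fh:
                    sources[rel] = fh.read()
    return sources


def build_script(sources, dest):
    """Return Python source that rewrites every collected file under dest."""
    lines = [
        "import os",
        f"_sources = {sources!r}",
        f"_dest = {dest!r}",
        "for _rel, _text in _sources.items():",
        "    _path = os.path.join(_dest, *_rel.split('/'))",
        "    os.makedirs(os.path.dirname(_path), exist_ok=True)",
        "    with open(_path, 'w', encoding='utf-8') as _fh:",
        "        _fh.write(_text)",
    ]
    return "\n".join(lines)


# Demo on a throwaway package with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "module_a")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w", encoding="utf-8") as fh:
        fh.write("VALUE = 1\n")

    sources = collect_sources(pkg)
    script_text = build_script(sources, os.path.join(tmp, "dest"))

    # Stands in for running the script on the remote session.
    exec(script_text, {})

    rebuilt_path = os.path.join(tmp, "dest", "module_a", "__init__.py")
    with open(rebuilt_path, encoding="utf-8") as fh:
        rebuilt = fh.read()
```

Because the generated script is an ordinary string, it can be transferred as a `str` variable and executed on the other side without any file upload.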


### 4. Use sparkmagic to copy the script to the Spark session

In [4]:
# %%send_to_spark -i script -t str -n script

### 5. Run the script on the Spark session to regenerate the modules

In [5]:
# %%spark

# Execute the script to rebuild the modules
exec(script)

Reconstructing folder dest/module_a
Reconstructing file dest/module_a/__init__.py
Reconstructing folder dest/module_b
Reconstructing folder dest/module_b/sub_module_i
Reconstructing file dest/module_b/sub_module_i/sub_module_ii.py
Reconstructing file dest/module_b/sub_module_i/__init__.py
Reconstructing file dest/module_b/module_bi.py
Reconstructing file dest/module_b/__init__.py
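
Once the files exist, the regenerated packages still need to be importable. The sketch below shows one way this might be done, assuming `dest` is a directory reachable from the Python process on the Spark driver. A temporary directory with a made-up `module_a` stands in for the real `dest` so that the snippet runs on its own.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for the regenerated "dest" folder (contents are made up).
dest = tempfile.mkdtemp()
os.makedirs(os.path.join(dest, "module_a"))
with open(os.path.join(dest, "module_a", "__init__.py"), "w", encoding="utf-8") as fh:
    fh.write("GREETING = 'hello'\n")

# Put the folder that contains the packages on the import path.
if dest not in sys.path:
    sys.path.insert(0, dest)

# Make sure the import system notices files created after startup.
importlib.invalidate_caches()

module_a = importlib.import_module("module_a")
print(module_a.GREETING)
```

This only affects the process where it runs. If the modules are also needed inside functions executed on the executors, they would have to be shipped separately, for example with `SparkContext.addPyFile`.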
