# Data Owner 01

Outline of what DO1 will do:

0. Set up a local SyftBox network for local experimentation (only needed for local testing)
1. DO logs into the datasite as an admin and creates a Syft dataset
2. DO reviews and runs jobs submitted by data scientists on DO's private data

## 0. Set up a local SyftBox network for local experimentation

This sets up a local SyftBox directory structure under `./local_syftbox_network` so the whole flow can be tested locally. Once all 3 clients have set up their datasites, it will look like the image below:
<img src="../images/localSyftBoxNetwork.png" width="20%" alt="local SyftBox network">

In [None]:
import os
from pathlib import Path

from syft_rds.orchestra import remove_rds_stack_dir, setup_rds_server

remove_rds_stack_dir(root_dir=Path("."), key="local_syftbox_network")

DO_EMAIL = "do1@openmined.org"
do_stack = setup_rds_server(
    email=DO_EMAIL, root_dir=Path("."), key="local_syftbox_network"
)

os.environ["SYFTBOX_CLIENT_CONFIG_PATH"] = str(do_stack.client.config_path)

## 1. DO logs into the datasite as admin and creates a corpus dataset

First, DO1 prepares a local dataset (`statpearls`) with a mock (fake / synthetic) part and a real, private part  

<img src="../images/do1PreparesDataset.png" width="33%" alt="do1 prepares a Syft dataset">

In [None]:
from huggingface_hub import snapshot_download

DATASET_DIR = Path("../datasets/").expanduser().absolute()
CORPUS_NAME = "statpearls"
DATASET_PATH = DATASET_DIR / CORPUS_NAME

if not DATASET_PATH.exists():
    snapshot_download(
        repo_id="khoaguin/medical-corpus",
        repo_type="dataset",
        local_dir=DATASET_DIR,
        allow_patterns=f"{CORPUS_NAME}/*",
    )

In [None]:
MOCK_PATH = DATASET_PATH / "mock"
PRIVATE_PATH = DATASET_PATH / "private"
README_PATH = DATASET_PATH / "README.md"

assert MOCK_PATH.exists()
assert PRIVATE_PATH.exists()
assert README_PATH.exists()
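
The assertions above only confirm that the paths exist. To also see how much data each part holds, a minimal sketch using only the standard library could look like the following. `summarize_dir` is a hypothetical helper written for this notebook, not part of `syft_rds`.

```python
from pathlib import Path


def summarize_dir(root: Path) -> dict:
    """Count regular files under `root` (recursively) and sum their sizes in bytes."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return {"files": len(files), "bytes": sum(p.stat().st_size for p in files)}


# Example usage in this notebook (assumes MOCK_PATH and PRIVATE_PATH are defined):
# print("mock:   ", summarize_dir(MOCK_PATH))
# print("private:", summarize_dir(PRIVATE_PATH))
```

An empty mock or private folder shows up as `{"files": 0, "bytes": 0}`, which is usually a sign that the download step did not fetch what was expected.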

DO1 creates a Syft dataset. The mock part is uploaded to the datasite and is public to the SyftBox network, while the private part stays local and is never shared.

<img src="../images/do1UploadsDataset.png" width="45%" alt="do1 creates a Syft dataset">
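
Because the mock part becomes public, it is worth checking that no private file has ended up in the mock folder by accident. The sketch below is one possible sanity check, assuming that comparing SHA-256 digests of whole files is sufficient. `file_digests` and `shared_files` are hypothetical helpers, not part of `syft_rds`.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict:
    """Map SHA-256 hex digest -> relative path for every regular file under `root`."""
    root = Path(root)
    digests = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            # Reads each file fully into memory; fine for a sketch, not for huge files.
            digests[hashlib.sha256(p.read_bytes()).hexdigest()] = p.relative_to(root)
    return digests


def shared_files(mock_root: Path, private_root: Path) -> list:
    """Return (mock_path, private_path) pairs whose contents are byte-identical."""
    mock = file_digests(mock_root)
    private = file_digests(private_root)
    return [(mock[h], private[h]) for h in sorted(mock.keys() & private.keys())]


# Example usage in this notebook (assumes MOCK_PATH and PRIVATE_PATH are defined):
# assert shared_files(MOCK_PATH, PRIVATE_PATH) == [], "mock folder contains private files"
```

This only catches exact copies. A mock file that is a lightly edited version of a private one would pass, so the check complements a manual review rather than replacing it.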

In [None]:
do1 = do_stack.init_session(host=DO_EMAIL)

print(f"DO1 is an admin to the datasite: {do1.is_admin}")

In [None]:
dataset = do1.dataset.create(
    name=CORPUS_NAME,
    path=PRIVATE_PATH,
    mock_path=MOCK_PATH,
    description_path=README_PATH,
)
dataset.describe()

DO1 now waits for jobs from data scientists

<img src="../images/do1WaitsForJobs.png" width="20%" alt="do waiting for jobs">

## 2. Review and Run Jobs

After the DS submits a job, it appears on the DO's datasite, where the DO can review it

<img src="../images/do1ReviewsJob.png" width="40%" alt="do1 gets and reviews jobs">

In [None]:
jobs = do1.job.get_all(status="pending_code_review")
jobs
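
The cell above returns only the jobs that are pending at that moment, so the list is empty if no DS has submitted anything yet. One way to wait is a simple polling loop, sketched below. `wait_for_jobs` is a hypothetical helper, not part of `syft_rds`; it takes any zero-argument callable, so the only assumption about the client is the `do1.job.get_all(...)` call already used in this notebook.

```python
import time
from typing import Callable, Sequence


def wait_for_jobs(
    fetch: Callable[[], Sequence],
    timeout: float = 60.0,
    interval: float = 2.0,
) -> Sequence:
    """Call `fetch` until it returns a non-empty sequence or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        jobs = fetch()
        if len(jobs) > 0:
            return jobs
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no jobs after {timeout} seconds")
        time.sleep(interval)


# Example usage in this notebook (assumes `do1` is the session created earlier):
# jobs = wait_for_jobs(lambda: do1.job.get_all(status="pending_code_review"))
```

Raising `TimeoutError` instead of returning an empty list keeps a later `jobs[0]` from failing with a less informative `IndexError`.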

In [None]:
job = jobs[0]
job

In [None]:
# same as job.code.describe()
job.show_user_code()

By calling `do1.run_private(job)`, DO1 runs the `syft_flwr` client code on the private dataset, which retrieves the relevant documents and sends them to the DS

In [None]:
res_job = do1.run_private(job)

<video width="90%" controls>
  <source src="../images/fedrag-rds.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>