# Data Owner 01

Outline of what DO1 will do:

1. DO1 logs into the datasite as an admin.
2. DO1 creates the dataset.
3. DO1 reviews and runs jobs submitted by data scientists on DO1's private data.

## 0. Set up a local SyftBox network for local experimentation

This sets up a local SyftBox directory structure under `./local_syftbox_network` so the whole flow can be tested locally. Once all 3 clients have set up their datasites, it will look like this:

<img src="../images/localSyftBoxNetwork.png" width="20%" alt="Local SyftBox network">

In [None]:
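import os
from pathlib import Path

from syft_rds.orchestra import remove_rds_stack_dir, setup_rds_server

# Start from a clean slate: remove any local network left over from a previous run
remove_rds_stack_dir(root_dir=Path("."), key="local_syftbox_network")

# Set up DO1's datasite (RDS server) inside ./local_syftbox_network
DO_EMAIL = "do1@openmined.org"
do_stack = setup_rds_server(
    email=DO_EMAIL, root_dir=Path("."), key="local_syftbox_network"
)

# Point the SyftBox client in this process at DO1's client config
os.environ["SYFTBOX_CLIENT_CONFIG_PATH"] = str(do_stack.client.config_path)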

## 1. DO logs into the datasite as admin

<img src="../images/do1LogsIntoLocalDatasite.png" width="60%" alt="DO1 logs into local datasite">

In [None]:
do1 = do_stack.init_session(host=DO_EMAIL)

In [None]:
do1.is_admin

## 2. DO1 creates a dataset

First, DO1 prepares a diabetes dataset with a mock (fake/synthetic) part and a real, private part.

<img src="../images/datasetPartition0.png" width="30%" alt="partitioned dataset 0">
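In this notebook the mock part is downloaded ready-made alongside the private part. To illustrate one simple way a mock part could be derived from private data, here is a minimal sketch that samples each column independently from the private rows. The helper `make_mock_rows` and the toy rows are hypothetical: they are not part of syft_rds, and this is not necessarily how the Hugging Face partitions were built. It assumes rows are dicts that all share the same keys.

```python
import random


def make_mock_rows(private_rows, n_rows, seed=0):
    """Return n_rows synthetic rows built by sampling each column independently.

    Every mock value is copied from the same column of the private rows, so
    per-column values stay realistic, but values from different real records
    are recombined, which weakens cross-column relationships.
    """
    if not private_rows:
        raise ValueError("need at least one private row")
    rng = random.Random(seed)  # seeded so the mock data is reproducible
    # Collect each column's observed values once.
    columns = {col: [row[col] for row in private_rows] for col in private_rows[0]}
    return [
        {col: rng.choice(values) for col, values in columns.items()}
        for _ in range(n_rows)
    ]


# Made-up toy rows, for illustration only (hypothetical column names).
private_rows = [
    {"glucose": 120, "bmi": 30.1, "outcome": 1},
    {"glucose": 95, "bmi": 24.5, "outcome": 0},
    {"glucose": 160, "bmi": 35.0, "outcome": 1},
]
mock_rows = make_mock_rows(private_rows, n_rows=5)
print(mock_rows)
```

Sampling columns independently keeps each column's values plausible, which is often enough for a data scientist to write and debug code against the mock data. It does not preserve relationships between columns and gives no formal privacy guarantee.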

In [None]:
from pathlib import Path

from huggingface_hub import snapshot_download

# Download the partitioned Pima Indians Diabetes dataset once, if not already present
DATASET_DIR = Path("../dataset/").expanduser().absolute()

if not DATASET_DIR.exists():
    snapshot_download(
        repo_id="khoaguin/pima-indians-diabetes-database-partitions",
        repo_type="dataset",
        local_dir=DATASET_DIR,
    )

# DO1 uses partition 0
partition_number = 0
DATASET_PATH = DATASET_DIR / f"pima-indians-diabetes-database-{partition_number}"
DATASET_PATH

DO1 creates a Syft dataset: the mock part is uploaded to the datasite and is public to the SyftBox network, while the private part stays local and is never shared.

<img src="../images/do1CreatesADataset.png" width="70%" alt="partitioned dataset">
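`dataset.create` below is pointed at three entries under `DATASET_PATH`: a `private` folder, a `mock` folder, and a `README.md` description. A quick pre-flight check can catch a missing entry before the dataset is created. This is a minimal sketch with a hypothetical helper, `missing_dataset_parts`, which is not part of the syft_rds API; it assumes `private` and `mock` are directories and `README.md` is a file.

```python
from pathlib import Path

# Assumed layout, mirroring the paths passed to dataset.create.
EXPECTED_PARTS = {
    "private": "dir",
    "mock": "dir",
    "README.md": "file",
}


def missing_dataset_parts(dataset_path):
    """Return the names of expected entries that are absent under dataset_path."""
    root = Path(dataset_path)
    missing = []
    for name, kind in EXPECTED_PARTS.items():
        entry = root / name
        present = entry.is_dir() if kind == "dir" else entry.is_file()
        if not present:
            missing.append(name)
    return missing
```

Calling `missing_dataset_parts(DATASET_PATH)` should return an empty list if the download succeeded and the partition has the assumed layout.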

In [None]:
dataset = do1.dataset.create(
    name="pima-indians-diabetes-database",
    path=DATASET_PATH / "private",
    mock_path=DATASET_PATH / "mock",
    description_path=DATASET_PATH / "README.md",
)
dataset.describe()

## 3. DO1 reviews and runs jobs

<img src="./images/doWaitsForJobs.png" width="40%" alt="DO waits for jobs">

DO1 reviews the code of each pending job before running it. By running the job privately, DO1 trains the model on its local private data and then sends the trained model back to the data scientist (DS).

<img src="./images/doSendModels.png" width="80%" alt="DO sends models">
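Conceptually, the data scientist writes one training function and checks it against the public mock data; DO1 then executes the same function on the private data, so only the result (here, a model) leaves the datasite. The sketch below illustrates that pattern in plain Python with a hypothetical toy model, `train_threshold_model`, and made-up rows. It does not show how `run_private` is implemented in syft_rds.

```python
from statistics import mean


def train_threshold_model(rows):
    """Toy 'model': a glucose cut-off halfway between the two class means.

    Assumes every row has "glucose" and "outcome" keys and both outcomes occur.
    """
    positives = [r["glucose"] for r in rows if r["outcome"] == 1]
    negatives = [r["glucose"] for r in rows if r["outcome"] == 0]
    if not positives or not negatives:
        raise ValueError("need rows from both outcome classes")
    return {"glucose_threshold": (mean(positives) + mean(negatives)) / 2}


# Made-up toy rows, for illustration only.
mock_rows = [
    {"glucose": 120, "outcome": 1},
    {"glucose": 140, "outcome": 1},
    {"glucose": 80, "outcome": 0},
    {"glucose": 100, "outcome": 0},
]
private_rows = [  # in the real flow these never leave DO1's machine
    {"glucose": 150, "outcome": 1},
    {"glucose": 170, "outcome": 1},
    {"glucose": 90, "outcome": 0},
    {"glucose": 110, "outcome": 0},
]

# DS side: develop and debug the code against the mock data.
ds_result = train_threshold_model(mock_rows)
# DO side: after review, run the same code on the private data;
# only the returned parameters are sent back to the DS.
model = train_threshold_model(private_rows)
print(ds_result, model)
```

Because the DS only ever sees the mock rows and the returned parameters, the code review step is where DO1 decides whether those returned values are acceptable to release.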

In [None]:
# List the jobs still waiting for DO1's code review
jobs = do1.jobs.get_all(status="pending_code_review")
jobs

In [None]:
job = jobs[-1]
job

In [None]:
# same as job.code.describe()
job.show_user_code()

In [None]:
# Run the job's code on DO1's private data
res_job = do1.run_private(job)