# Data Owner 02

1. DO logs into the datasite as an admin
2. DO creates the dataset 
3. DO reviews and runs jobs submitted by data scientists on DO's private data

## 1. DO2 logs into the datasite as admin

<img src="../images/do2LogsIntoLocalDatasite.png" width="70%" alt="DO2 logs into local datasite">

In [None]:
from pathlib import Path

from syft_rds.orchestra import setup_rds_server

DO_EMAIL = "do2@openmined.org"
do2 = setup_rds_server(email=DO_EMAIL, root_dir=Path("."), key="local_syftbox_network")
do2 = do2.init_session(host=DO_EMAIL)

In [None]:
do2.is_admin

## 2. DO2 creates dataset

DO2 also prepares its diabetes dataset, split into a mock (fake / synthetic) part and a real, private part.

<img src="../images/datasetPartition1.png" width="30%" alt="partitioned dataset 1">

In [None]:
from pathlib import Path

from huggingface_hub import snapshot_download

DATASET_DIR = Path("../dataset/").expanduser().absolute()

if not DATASET_DIR.exists():
    snapshot_download(
        repo_id="khoaguin/pima-indians-diabetes-database-partitions",
        repo_type="dataset",
        local_dir=DATASET_DIR,
    )

partition_number = 1
DATASET_PATH = DATASET_DIR / f"pima-indians-diabetes-database-{partition_number}"
DATASET_PATH

DO2 also creates a syft dataset: the mock part is uploaded to the datasite and is public to the SyftBox network, while the private part stays local and is never shared.
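
Creating the dataset relies on three entries under `DATASET_PATH`: `private`, `mock`, and `README.md`. Below is a minimal sketch of a check to run before creating the dataset, assuming that layout. The helper name `missing_partition_entries` is hypothetical and not part of `syft_rds`.

```python
from pathlib import Path


def missing_partition_entries(dataset_path: Path) -> list[str]:
    """Return the names of expected entries that are absent from dataset_path."""
    # Assumed layout, inferred from the paths used when creating the dataset
    expected = ["private", "mock", "README.md"]
    return [name for name in expected if not (dataset_path / name).exists()]
```

Calling `missing_partition_entries(DATASET_PATH)` should return an empty list if the download completed and the partition number is valid.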

<img src="../images/do2CreatesADataset.png" width="70%" alt="partitioned dataset">

In [None]:
dataset = do2.dataset.create(
    name="pima-indians-diabetes-database",
    path=DATASET_PATH / "private",
    mock_path=DATASET_PATH / "mock",
    description_path=DATASET_PATH / "README.md",
    summary="Pima Indians Diabetes Database.",
)
dataset.describe()

## 3. DO2 reviews and runs jobs

In [None]:
jobs = do2.jobs.get_all(status="pending_code_review")
jobs

In [None]:
job = jobs[-1]
job

In [None]:
# same as job.code.describe()
job.show_user_code()

In [None]:
import os

os.environ["SYFTBOX_CLIENT_CONFIG_PATH"] = str(do2.client.config_path)

res_job = do2.run_private(job)

By running the job privately, the DO trains the model on their local data and then sends the trained model back to the DS.

<img src="./images/doSendModels.png" width="80%" alt="DO sends models">
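
The run cell above sets `SYFTBOX_CLIENT_CONFIG_PATH` for the rest of the notebook process. If several data owners are simulated in one process, it may be safer to set the variable only for the duration of the run. Below is a minimal sketch using only the standard library. The helper `temporary_env` is hypothetical, and it is an assumption that `run_private` reads the variable at call time.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name: str, value: str):
    """Set an environment variable inside the with-block, then restore it."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the earlier value, or remove the variable if it was unset
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


# Hypothetical usage with the objects defined earlier in this notebook:
# with temporary_env("SYFTBOX_CLIENT_CONFIG_PATH", str(do2.client.config_path)):
#     res_job = do2.run_private(job)
```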