# Data Owner 01

Outline of what DO1 will do

0. Run the `syftbox client` in a terminal (or the SyftBox UI app)
1. DO logs into the datasite as an admin
2. DO creates a Syft dataset 
3. DO reviews and runs jobs submitted by data scientists on DO's private data

## 0. DO runs the `syftbox client` in a terminal or the SyftBox UI app 
The CLI syftbox client can be installed with a single command: `curl -fsSL https://syftbox.net/install.sh | sh`. The SyftUI app can be installed from `https://www.syftbox.net/`:

<img src="../images/syftboxnet.png" width="41%" alt="syftbox.net">

This sets up a SyftBox directory, which is located at `~/SyftBox` by default.
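
To confirm the client has created its directory, a quick check like the following can help. This is a minimal sketch that assumes the default `~/SyftBox` location mentioned above; `syftbox_dir_exists` is a hypothetical helper, not part of SyftBox.

```python
from pathlib import Path

# Assumption: the client was installed with the default directory location.
DEFAULT_SYFTBOX_DIR = Path.home() / "SyftBox"


def syftbox_dir_exists(syftbox_dir: Path = DEFAULT_SYFTBOX_DIR) -> bool:
    """Return True if the given path exists and is a directory."""
    return syftbox_dir.is_dir()


print(f"{DEFAULT_SYFTBOX_DIR} exists: {syftbox_dir_exists()}")
```

If the client was configured with a different directory, pass that path to the helper instead.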

<img src="../images/SyftBoxNetwork.png" width="20%" alt="SyftBox network">

## 1. DO logs into the datasite as admin

<img src="../images/do1LogsInSyftBoxDatasite.png" width="67%" alt="DO1 logs into SyftBox datasite">

In [None]:
import syft_rds as sy
from loguru import logger
from syft_core import Client

do1_email = Client.load().email
logger.info(f"DO1 email: {do1_email}")
do1 = sy.init_session(host=do1_email, start_rds_server=True)

In [None]:
do1.is_admin

## 2. DO1 creates a dataset

First, DO1 prepares a diabetes dataset with a mock (fake / synthetic) part and a real, private part.

<img src="../images/datasetPartition0.png" width="30%" alt="partitioned dataset 0">

In [None]:
from pathlib import Path

from huggingface_hub import snapshot_download

DATASET_DIR = Path("../dataset/").expanduser().absolute()

if not DATASET_DIR.exists():
    snapshot_download(
        repo_id="khoaguin/pima-indians-diabetes-database-partitions",
        repo_type="dataset",
        local_dir=DATASET_DIR,
    )

partition_number = 0
DATASET_PATH = DATASET_DIR / f"pima-indians-diabetes-database-{partition_number}"
DATASET_PATH

DO1 creates a Syft dataset, where the **mock part is uploaded to the datasite and is public** to the SyftBox network, and **the private part always stays local (it is never shared)**.


<img src="../images/do1CreatesSyftDataset.png" width="55%" alt="partitioned dataset">
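
Before creating the dataset, it may be worth confirming that the partition folder contains the three entries this notebook passes to the creation call (`private`, `mock`, and `README.md`). The sketch below is a plain filesystem check; `missing_dataset_parts` is a hypothetical helper and not part of `syft_rds`.

```python
from pathlib import Path

# Entries that this notebook reads from a partition folder when creating the dataset.
EXPECTED_ENTRIES = ("private", "mock", "README.md")


def missing_dataset_parts(dataset_path: Path) -> list[str]:
    """Return the expected entries that are absent from dataset_path."""
    return [name for name in EXPECTED_ENTRIES if not (dataset_path / name).exists()]
```

For example, `missing_dataset_parts(DATASET_PATH)` returns an empty list when the download completed and the folder layout matches.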

In [None]:
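try:
    # Register the dataset: the mock part is published, the private part stays local
    dataset = do1.dataset.create(
        name="pima-indians-diabetes-database",
        path=DATASET_PATH / "private",
        mock_path=DATASET_PATH / "mock",
        description_path=DATASET_PATH / "README.md",
    )
    dataset.describe()
except ValueError as e:
    # Treated here as "dataset already exists": fetch the existing one instead
    logger.error(f"Dataset already exists: {e}")
    dataset = do1.dataset.get(name="pima-indians-diabetes-database")
    dataset.describe()
except Exception as e:
    # Any other failure is only logged; `dataset` may be undefined after this point
    logger.error(f"An unexpected error occurred: {e}")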

In [None]:
# Optional: Clean up old jobs
do1.job.delete_all()

<img src="../images/doWaitsForJobs.png" width="40%" alt="do waiting for jobs">

## 3. Review and Run Jobs

After the data scientist (DS) submits a job, it appears on the DO's datasite, where the DO can review it.

<img src="../images/do1ReviewsJob.png" width="61%" alt="DO1 reviews job">

In [None]:
jobs = do1.job.get_all(status="pending_code_review")
jobs

In [None]:
job = jobs[0]
job

In [None]:
# same as job.code.describe()
job.show_user_code()

By calling `do1.run_private(job)`, DO1 runs the `syft_flwr` client code, which trains the model received from the aggregator on DO1's private data and then sends the updated model back to the aggregator. This exchange repeats for multiple rounds.

<video width="90%" controls>
  <source src="../images/fed-analytics.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
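
The round-based exchange described above can be illustrated with a toy simulation. This is a sketch in plain Python, not the `syft_flwr` or Flower API: the function names, the one-parameter model, the data, and the sample-count-weighted averaging are all assumptions made for illustration, since this notebook does not state which aggregation strategy is used.

```python
# Toy illustration only: plain Python stand-ins, not the syft_flwr or Flower API.


def local_update(weights, data, lr=0.1):
    """One gradient step of 1-D linear regression (y = w * x) on a client's local data."""
    w = weights[0]
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return [w - lr * grad]


def aggregate(client_weights, client_sizes):
    """Average the client models, weighting each by its number of local samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def run_rounds(global_weights, client_datasets, num_rounds=20):
    """Repeat: send the model to clients, train locally, send updates back, aggregate."""
    for _ in range(num_rounds):
        updates = [local_update(global_weights, data) for data in client_datasets]
        sizes = [len(data) for data in client_datasets]
        global_weights = aggregate(updates, sizes)
    return global_weights


# Hypothetical private data held by two clients; both follow y = 2 * x.
client_datasets = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
]
final_weights = run_rounds([0.0], client_datasets)
print(final_weights)
```

Only model weights move between the clients and the aggregator in this sketch; the `(x, y)` pairs never leave the client that holds them, and the learned weight approaches 2.0 over the rounds.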

In [None]:
res_job = do1.run_private(job)