#### Data Owner uploads the dataset to the domain

- The user logs into the domain
- The user loads the dataset to be uploaded into memory
- The user converts the data to tensors
- The user converts the tensors to private tensors via the DP wizard
- The user creates metadata (This is a dict containing any additional information to be shared.)
- The user uploads the dataset, passing:
    - assets: dict of private tensors to be uploaded to the domain
    - description: description of the dataset
    - name: name of the dataset
    - metadata: a dictionary containing any additional public information to be shared

In [None]:
import syft as sy

In [None]:
# Let's log in to the domain

usa_domain = sy.login(
    url="https://usa.openmined.org",
    email="wadewilson@canada.com",
    password="supersecretpassword",
)

In [15]:
# Let's load the dataset into memory

import os

import numpy as np
import pandas as pd
import pydicom


label_mapping = {
    "Pneumothorax": 1,
    "No Pneumothorax": 0,
}

data = pd.read_csv("data/siim_small/labels.csv")


image_data = []
label_data = []

ROOT_PATH = "/home/user/Documents/myfolder/pysyft/PySyft/notebooks/course3/data/siim_small/"
for idx in range(data.shape[0]):
    img_path = data["file"][idx]
    label = data["label"][idx]
    label = label_mapping.get(label)
    img_path = os.path.join(ROOT_PATH, img_path)
    img = pydicom.dcmread(img_path)
    # Cast the pixels to np.int32 before storing them
    image_data.append(img.pixel_array.astype(np.int32))
    label_data.append(label)

# Let's convert the numpy arrays to tensors
# (stacking into one array assumes every scan has the same shape)
image_tensors = sy.Tensor(np.array(image_data))
label_tensors = sy.Tensor(np.array(label_data, dtype=np.int32))
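
Note that `label_mapping.get(label)` returns `None` for any label missing from the mapping, so a typo in `labels.csv` would slip through silently. A minimal sketch of a stricter encoding step is shown here; `encode_labels` and `raw_labels` are hypothetical names used only for illustration and are not part of PySyft.

```python
label_mapping = {
    "Pneumothorax": 1,
    "No Pneumothorax": 0,
}


def encode_labels(labels, mapping):
    """Map string labels to integers, failing loudly on unknown labels."""
    unknown = sorted({label for label in labels if label not in mapping})
    if unknown:
        raise ValueError(f"Unmapped labels: {unknown}")
    return [mapping[label] for label in labels]


# Hypothetical stand-in for the "label" column of labels.csv
raw_labels = ["Pneumothorax", "No Pneumothorax", "No Pneumothorax"]
label_data = encode_labels(raw_labels, label_mapping)
```

Running the check before building the tensors keeps bad rows out of the uploaded dataset.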

In [None]:
# Let's make the data private
image_tensors = image_tensors.annotate_with_dp_metadata(lower_bound=0, upper_bound=256)
label_tensors = label_tensors.annotate_with_dp_metadata(lower_bound=0, upper_bound=1)

ALERT: You didn't pass in any entities. Launching entity wizard...

	=====================================================================
	Welcome to the Data Subject Annotation Wizard!!!
	=====================================================================

	You've arrived here because you called Tensor.annotate_with_dp_metadata() without
	passing in any entities! Since the purpose of .annotate_with_dp_metadata() is to add
	metadata for the support of automatic differential privacy budgeting,
	you need to describe which parts of your Tensor correspond to which
	real-world data subjects (entities) whose privacy you want to
	protect. This is the only way the system knows, for example, that it
	costs twice as much privacy budget when twice as much of your data
	(say, 2 rows instead of 1 row) refer to the same entity.

	Entities can be people (such as a medical patient), places (such as a
	family's address), or even organizations (such as a business, state,
	or country). If you're not sure what kind of entity to include, just
	ask yourself the question, "who am I trying to protect the privacy
	of?". If it's an organization, make one entity per organization. If
	it's people, make one entity per person. If it's a group of people
	who are somehow similar/linked to each other (such as a family), make
	each entity a different group. For more information on differential
	privacy, see OpenMined's course on the subject:
	https://courses.openmined.org/

	Since you didn't pass in entities into .annotate_with_dp_metadata() (or you did so
	incorrectly), this wizard is going to guide you through the process
	of annotating your data with entities.

	In this wizard, we're going to ask you for *unique identifiers* which
	refer to the entities in your data. While the unique identifiers need
	not be personal data (they can be random strings of letters and
	numbers if you like). It is ESSENTIAL that you use the same
	identifier when referring to the same entity in the data that you
	never accidentally refer to two entities by the same identifier.
	Additionally, if you plan to do any kind of data JOIN with another
	dataset, it is ESSENTIAL that you are using the same unique
	identifiers for entities as the data you're joining with. Since these
	unique identifiers may be personal information, PySyft might not be
	able to detect if two tensors are using different identifiers for the
	same person.

	So, in this tutorial we're going to be asking you to specify Unique
	Identifiers (UIDs) for each entity in your data. This could be an
	email, street address, or any other string that identifies someone
	uniquely in your data and in the data you intend to use with your
	data (if any).

	Do you understand, and are you ready to proceed? (yes/no)

	 yes


	Excellent! Let's begin!

	---------------------------------------------------------------------

	Question 1: Is this entire tensor referring to the same entity?

	Examples:
	 - a single medical scan of one patient
	 - a single spreadsheet of proprietary statistics about a business
	 - a tensor of facts about a country

	(if the tensor is about one entity, but it also contains multiple
	other entities within, such as a tensor about all the customers of
	one business, ask yourself, are you trying to protect the people or
	the business)

	If yes, write the UID of the entity this data is about, otherwise
	write 'no'  because this data is about more than one entity.

	 no


	---------------------------------------------------------------------

	Question 2: Does each row correspond to an entity, perhaps with
	occasional repeats (yes/no)?

	 no


	---------------------------------------------------------------------

	Question 3: Is your data one entity for every column (yes/no)?

	 no


	It sounds like your tensor is a random assortment of entities (and
	perhaps empty/non-entities). If you have empty values, just create
	random entities for them for now. If you have various entities
	scattered throughout your tensor (not organized by row), then you'll
	need to pass in a np.ndarray of strings which is identically shaped
	to your data in entities like so:


	_____________________________________________________________________
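
The wizard's last message asks for an `np.ndarray` of strings with the same shape as the data, but the captured output stops before showing one. Below is a NumPy-only sketch of how such an array could be built, assuming one patient per scan. The patient identifiers and the toy image shape are made up for illustration, and the exact way the array is handed to `annotate_with_dp_metadata()` is not shown in this notebook, so it is left out.

```python
import numpy as np

# Hypothetical stand-in for the stacked scans: 3 images of 4x4 pixels
images = np.zeros((3, 4, 4), dtype=np.int32)

# Hypothetical unique identifiers, one per scan; the same patient may repeat
patient_ids = np.array(["patient-001", "patient-002", "patient-001"])

# Repeat each identifier over every pixel of its scan so that the
# entity array has exactly the same shape as the data
entities = np.broadcast_to(patient_ids[:, None, None], images.shape).copy()

print(entities.shape)  # same shape as images
```

Reusing the same identifier for the same patient matters here, because that is what tells the system that two scans draw on one person's privacy budget.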

In [None]:
# Let's create the required metadata
metadata = {
    "label_mapping": label_mapping,
}

In [32]:
# Let's load the dataset onto the domain

usa_domain.load_dataset(
    assets={"imageData": image_tensors, "labels": label_tensors},
    name="SIIM-ACR Pneumothorax Segmentation",
    description="Pneumothorax is usually diagnosed by a radiologist on a chest x-ray, and can sometimes be very difficult to confirm. An accurate AI algorithm to detect pneumothorax would be useful in a lot of clinical scenarios.",
    metadata=metadata,
)


Loading dataset... checking asset types...                                                                                                                                    


This means you'll need to manually approve any requests which leverage this data. If this is ok with you, proceed. If you'd like to use automatic differential privacy budgeting, please pass in a DP-compatible tensor type such as by calling annotate_with_dp_metadata() on a sy.Tensor with a np.int32 or np.float32 inside.

Are you sure you want to proceed? (y/n) y

Loading dataset... uploading... SUCCESS!                        

Run <your client variable>.datasets to see your new dataset loaded into your machine!
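
The message above says automatic budgeting needs a `sy.Tensor` wrapping `np.int32` or `np.float32` data. A small pre-upload check on the raw arrays can catch a wrong dtype before the prompt appears. The sketch below uses only NumPy; `check_asset` is a hypothetical helper, and the check that values lie inside the declared bounds is an added assumption, not something the message requires.

```python
import numpy as np

# dtypes named in the upload message
DP_DTYPES = (np.dtype(np.int32), np.dtype(np.float32))


def check_asset(name, array, lower_bound, upper_bound):
    """Return a list of problems found in one asset's raw array."""
    problems = []
    if array.dtype not in DP_DTYPES:
        problems.append(f"{name}: dtype {array.dtype} is not np.int32 or np.float32")
    if array.size and (array.min() < lower_bound or array.max() > upper_bound):
        problems.append(f"{name}: values fall outside [{lower_bound}, {upper_bound}]")
    return problems


# Hypothetical toy arrays standing in for the real scans and labels
images = np.zeros((2, 4, 4), dtype=np.int32)
labels = np.array([0, 1, 1], dtype=np.int64)  # wrong dtype on purpose

image_problems = check_asset("imageData", images, 0, 256)
label_problems = check_asset("labels", labels, 0, 1)
print(image_problems, label_problems)
```

Here the image array passes and the label array is flagged for its `int64` dtype, which a cast with `.astype(np.int32)` would fix.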




#### Dummy Data

In [2]:
import pandas as pd
from enum import Enum
import uuid
import torch
import datetime
import json
import numpy as np


class bcolors(Enum):
    HEADER = "\033[95m"
    OKBLUE = "\033[94m"
    OKCYAN = "\033[96m"
    OKGREEN = "\033[92m"
    WARNING = "\033[93m"
    FAIL = "\033[91m"
    ENDC = "\033[0m"
    BOLD = "\033[1m"
    UNDERLINE = "\033[4m"

In [4]:
logintodomain = """
    Connecting to http://ca.openmined.org... done! 	 Logging into Canada... done!
"""

In [31]:
loading_data = """
Loading dataset... checking asset types...                                                                                                                                    

WARNING - Non-DP Asset: You just passed in a asset 'More Trade' which cannot be tracked with differential privacy because it is a <class 'pandas.core.frame.DataFrame'> object.

This means you'll need to manually approve any requests which leverage this data. If this is ok with you, proceed. If you'd like to use automatic differential privacy budgeting, please pass in a DP-compatible tensor type such as by calling .annotate_with_dp_metadata() on a sy.Tensor with a np.int32 or np.float32 inside.

Are you sure you want to proceed? (y/n) y

Loading dataset... uploading... SUCCESS!                        

Run <your client variable>.datasets to see your new dataset loaded into your machine!

"""