# Step 4: Upload the TissueMNIST Dataset

## Step 4a: Log into our Domain

In [6]:
import syft as sy
from utils import *

In [7]:
# Let's log into the domain using the credentials
ADMIN_EMAIL = "info@openmined.org"
ADMIN_PASSWORD = "changethis"
DOMAIN1_PORT = 8081
domain_client = sy.login(
    email=ADMIN_EMAIL, password=ADMIN_PASSWORD, port=DOMAIN1_PORT
)


Anyone can login as an admin to your node right now because your password is still the default PySyft username and password!!!

Connecting to localhost... done! 	 Logging into canada... done!
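The warning above is triggered by the default admin credentials hard-coded in the cell. One way to avoid keeping real credentials in a notebook is to read them from environment variables. The following is a minimal sketch, not part of PySyft: the helper `read_admin_credentials` and the variable names `PYSYFT_ADMIN_EMAIL` and `PYSYFT_ADMIN_PASSWORD` are hypothetical, and the fallbacks are the tutorial defaults shown above.

```python
import os

# Hypothetical environment variable names; PySyft does not define these.
EMAIL_VAR = "PYSYFT_ADMIN_EMAIL"
PASSWORD_VAR = "PYSYFT_ADMIN_PASSWORD"


def read_admin_credentials(env=os.environ):
    """Return (email, password), falling back to the tutorial defaults."""
    email = env.get(EMAIL_VAR, "info@openmined.org")
    password = env.get(PASSWORD_VAR, "changethis")
    if password == "changethis":
        # Mirror the spirit of the login warning: the default is not safe.
        print("Warning: still using the default admin password.")
    return email, password


ADMIN_EMAIL, ADMIN_PASSWORD = read_admin_credentials()
```

The resulting `ADMIN_EMAIL` and `ADMIN_PASSWORD` can be passed to `sy.login` exactly as in the cell above.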


## Step 4b: Creating a Dataset

In [8]:
# edit MY_DATASET_URL then run this cell
MY_DATASET_URL = "https://raw.githubusercontent.com/OpenMined/datasets/main/TissueMNIST/subsets/TissueMNIST-e6916fbe07ec4302be04779d346e8a94.pkl"
print("My Dataset URL: ", MY_DATASET_URL)
dataset = download_dataset(MY_DATASET_URL)
dataset.head()

My Dataset URL:  https://raw.githubusercontent.com/OpenMined/datasets/main/TissueMNIST/subsets/TissueMNIST-e6916fbe07ec4302be04779d346e8a94.pkl
TissueMNIST-e6916fbe07ec4302be04779d346e8a94.pkl is successfully downloaded.
Columns: Index(['patient_ids', 'images', 'labels'], dtype='object')
Total Images: 2363
Label Mapping {'Collecting Duct, Connecting Tubule': 0, 'Distal Convoluted Tubule': 1, 'Glomerular endothelial cells': 2, 'Interstitial endothelial cells': 3, 'Leukocytes': 4, 'Podocytes': 5, 'Proximal Tubule Segments': 6, 'Thick Ascending Limb': 7}



| | patient_ids | images | labels |
|---|---|---|---|
| 0 | 55614 | [[18, 24, 65, 110, 113, 90, 74, 68, 43, 29, 14... | 4 |
| 1 | 55614 | [[5, 6, 9, 12, 10, 9, 15, 24, 33, 28, 28, 31, ... | 1 |
| 2 | 55614 | [[15, 11, 10, 11, 10, 6, 6, 9, 9, 11, 12, 12, ... | 5 |
| 3 | 55614 | [[6, 5, 5, 5, 5, 5, 4, 4, 8, 7, 6, 5, 5, 6, 6,... | 7 |
| 4 | 55614 | [[31, 14, 13, 18, 11, 7, 13, 18, 11, 11, 12, 1... | 0 |
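The `labels` column stores integer class ids, and the "Label Mapping" printed above maps tissue names to those ids. To read the labels as names, the mapping can be inverted. This is a small illustrative sketch: the dictionary is copied from the output above, and `ID_TO_LABEL` and `first_labels` are names introduced here, not part of `utils`.

```python
# Label mapping copied from the download output above.
LABEL_MAPPING = {
    "Collecting Duct, Connecting Tubule": 0,
    "Distal Convoluted Tubule": 1,
    "Glomerular endothelial cells": 2,
    "Interstitial endothelial cells": 3,
    "Leukocytes": 4,
    "Podocytes": 5,
    "Proximal Tubule Segments": 6,
    "Thick Ascending Limb": 7,
}

# Invert name -> id into id -> name.
ID_TO_LABEL = {class_id: name for name, class_id in LABEL_MAPPING.items()}

# Labels of the five rows shown by dataset.head().
first_labels = [4, 1, 5, 7, 0]
decoded = [ID_TO_LABEL[class_id] for class_id in first_labels]
for class_id, name in zip(first_labels, decoded):
    print(class_id, "->", name)
```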


In [9]:
# run this cell
train, val, test = split_and_preprocess_dataset(data=dataset)

Splitting dataset into train, validation and test sets.
Preprocessing the dataset...
Preprocessing completed.


In [10]:
# run this cell
def make_private_tensors(split):
    data_subjects = DataSubjectList.from_series(split["patient_ids"])
    return (
        sy.Tensor(split["images"]).private(min_val=0, max_val=255, data_subjects=data_subjects),
        sy.Tensor(split["labels"]).private(min_val=0, max_val=7, data_subjects=data_subjects)
    )

train_image_data, train_label_data = make_private_tensors(train)
val_image_data, val_label_data = make_private_tensors(val)
test_image_data, test_label_data = make_private_tensors(test)
print("Data is now Private")

Data is now Private
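The `min_val` and `max_val` arguments passed to `.private(...)` declare the range of the data: 0 to 255 for pixel values and 0 to 7 for the eight class labels. Before calling `.private`, it can be worth confirming that the data actually falls inside the declared range. The sketch below assumes the bounds are meant to enclose every value; `flatten` and `within_bounds` are hypothetical helpers written for this example, and the sample values are toy data rather than the real dataset.

```python
def flatten(values):
    """Yield the scalar items of an arbitrarily nested list or tuple."""
    for value in values:
        if isinstance(value, (list, tuple)):
            yield from flatten(value)
        else:
            yield value


def within_bounds(values, min_val, max_val):
    """Return True if every scalar in `values` lies in [min_val, max_val]."""
    return all(min_val <= value <= max_val for value in flatten(values))


# Toy stand-ins for a few image rows and labels.
sample_images = [[18, 24, 65, 110], [5, 6, 9, 255]]
sample_labels = [4, 1, 5, 7, 0]

print(within_bounds(sample_images, min_val=0, max_val=255))
print(within_bounds(sample_labels, min_val=0, max_val=7))
```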


## Step 4c: Upload the Dataset

In [11]:
# run this cell
domain_client.load_dataset(
    name="TissueMNIST",
    assets={
        "train_images": train_image_data,
        "train_labels": train_label_data,
        "val_images": val_image_data,
        "val_labels": val_label_data,
        "test_images": test_image_data,
        "test_labels": test_label_data,
    },
    description="This dataset is a modified form of TissueMNIST which is made available from the Broad Bioimage Benchmark Collection."
)

Loading dataset... checking asset types...                              


Loading dataset... uploading... 🚀



Dataset is uploaded successfully !!! 🎉

Run `<your client variable>.datasets` to see your new dataset loaded into your machine!


Now let's check that the dataset was uploaded successfully.

In [13]:
domain_client.datasets[-1]

Dataset: TissueMNIST
Description: This dataset is a modified form of TissueMNIST which is made available from the Broad Bioimage Benchmark Collection.



| Asset Key | Type | Shape |
|---|---|---|
| ["train_images"] | int64 | (1635, 784) |
| ["train_labels"] | int64 | (1635,) |
| ["val_images"] | int64 | (221, 784) |
| ["val_labels"] | int64 | (221,) |
| ["test_images"] | int64 | (507, 784) |
| ["test_labels"] | int64 | (507,) |
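The shapes in this listing can be cross-checked against the download output: the three splits should add up to the 2363 images reported earlier, each image split should have as many rows as its label split, and every image should have 784 values (a flattened 28 x 28 image). The following sketch performs that arithmetic on shapes copied by hand from the listing; it does not query the domain.

```python
# Shapes copied from the asset listing above.
asset_shapes = {
    "train_images": (1635, 784),
    "train_labels": (1635,),
    "val_images": (221, 784),
    "val_labels": (221,),
    "test_images": (507, 784),
    "test_labels": (507,),
}
splits = ("train", "val", "test")

# Total number of images across the three splits.
total_images = sum(asset_shapes[f"{s}_images"][0] for s in splits)

# Splits whose image count differs from their label count (expected: none).
mismatched = [
    s for s in splits
    if asset_shapes[f"{s}_images"][0] != asset_shapes[f"{s}_labels"][0]
]

# Distinct per-image lengths across splits (expected: only 28 * 28).
pixels_per_image = {asset_shapes[f"{s}_images"][1] for s in splits}

print(total_images)      # 2363
print(mismatched)        # []
print(pixels_per_image)  # {784}
```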


## Step 4d: Create a Data Scientist Account

In [15]:
data_scientist_details = {
    "name": "Samantha Carter",
    "email": "sam@sg1.net",
    "password": "stargate",
    "budget": 9999,
}

In [None]:
domain_client.users.create(**data_scientist_details)

In [None]:
print("Please give these details to the data scientist 👇🏽")
login_details = {}
login_details["url"] = 8081  # port of the domain node (DOMAIN1_PORT)
login_details["name"] = data_scientist_details["name"]
login_details["email"] = data_scientist_details["email"]
login_details["password"] = data_scientist_details["password"]
# `name` was undefined here; use the dataset name passed to load_dataset
login_details["dataset_name"] = "TissueMNIST"
print()
print(login_details)
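If the details need to be handed over as text rather than read off the screen, one option is to serialize the dictionary to JSON, which the data scientist can load back into an identical dictionary. This is only a sketch using the standard library; the values are the ones from this step, with the dataset name taken from the upload cell, and how the text is transmitted is left open.

```python
import json

# Login details as assembled in this step.
login_details = {
    "url": 8081,
    "name": "Samantha Carter",
    "email": "sam@sg1.net",
    "password": "stargate",
    "dataset_name": "TissueMNIST",
}

# Serialize to a JSON string that can be copied or saved to a file.
payload = json.dumps(login_details, indent=2)
print(payload)

# On the receiving side, the string round-trips to the same dictionary.
restored = json.loads(payload)
```

Because the payload contains a password in plain text, it should be shared over a channel both parties trust.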