(Ishan + Phil)

#### Perform an Analysis on the Dataset from Lesson 4

We're going to access a set of medical images (the NIH chest x-ray dataset) and perform some common
data science operations.

In [None]:
import syft as sy
import numpy as np
import matplotlib.pyplot as plt
import os
import pydicom
import pandas as pd

In [None]:
# Log in to your new domain.
# This assumes a domain running locally (as it will be for course users).
domain = sy.login(
    url="http://localhost",
    email="info@openmined.org",
    password="changethis",
    port=8081
)

### Our dataset

Let's take a second look at our image dataset. We'll:
1. Visualize an image
2. Check some label statistics (what is the ratio of positive cases in our sample?)
3. Check key image properties (what are the pixel ranges and average pixel value in our sample?)

In [None]:
# This code generates the dataset we use below
data = pd.read_csv("rsna chest demo set/sample_data.csv")

image_data = []
label_data = []

ROOT_PATH = "rsna chest demo set/"
for idx in range(10):
    img_path = data["patientId"][idx] + ".dcm"
    label = data["Target"][idx]
    img_path = os.path.join(ROOT_PATH, img_path)
    img = pydicom.dcmread(img_path)
    # Downsample the image (keep every 8th row and column) for performance reasons
    image_data.append(img.pixel_array[::8, ::8].astype(np.int32))
    label_data.append(label)
    
# Stack the Python lists into numpy arrays, then wrap them as tensors
image_tensors = sy.Tensor(np.array(image_data))
label_tensors = sy.Tensor(np.array(label_data))

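# Let's make the data private, with one entity (data subject) per patient row.
# Both tensors share the same entities, so row i of each refers to the same patient.
entities = [str(i) for i in range(len(label_data))]
image_tensors = image_tensors.private(min_val=0, max_val=256, entities=entities)
label_tensors = label_tensors.private(min_val=0, max_val=1, entities=entities)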

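# Note: `label_mapping` is not defined anywhere in this notebook, so no
# metadata dict is built here; the dataset is loaded with a placeholder
# metadata string below.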

domain.load_dataset(
    assets={"imageData": image_tensors, "labels": label_tensors},
    name="SIIM-ACR Pneumothorax Segmentation",
    description="Pneumothorax is usually diagnosed by a radiologist on a chest x-ray, and can sometimes be very difficult to confirm. An accurate AI algorithm to detect pneumothorax would be useful in a lot of clinical scenarios.",
    metadata="No metadata",
)
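
The `[::8, ::8]` slice used in the loop above can be tried on its own. The sketch below is only an illustration: it uses a synthetic array in place of a real DICOM frame, and the 1024x1024 size and 8-bit value range are assumptions, not properties read from the dataset.

```python
import numpy as np

# Synthetic stand-in for one decoded DICOM frame (assumed 1024x1024, values 0..255).
rng = np.random.default_rng(0)
full_res = rng.integers(0, 256, size=(1024, 1024), dtype=np.uint8)

# Keep every 8th row and every 8th column, then widen to int32 as the notebook does.
downsampled = full_res[::8, ::8].astype(np.int32)

print(full_res.shape, "->", downsampled.shape)  # (1024, 1024) -> (128, 128)
print(full_res.size // downsampled.size)        # 64
```

Strided slicing simply drops the pixels in between rather than averaging them, which is cheap but can alias fine detail. That trade-off is acceptable for a demo whose goal is speed.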

#### Visualizing an image

In [None]:
plt.imshow(image_data[1])

#### Label stats

In [None]:
np.mean(label_data)
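
Because the labels are 0 or 1, their mean is exactly the fraction of positive cases. A small check with made-up labels (the values below are hypothetical, not taken from the dataset):

```python
import numpy as np

# Hypothetical labels for ten patients (1 = positive, 0 = negative).
example_labels = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

positive_rate = np.mean(example_labels)   # fraction of entries equal to 1
positives = int(np.sum(example_labels))   # count of positive cases

print(positives, "of", len(example_labels), "positive; rate =", positive_rate)
```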

#### Image properties

In [None]:
np.mean(image_data), np.min(image_data), np.max(image_data)

### List datasets on domain

In [None]:
domain.datasets[-1]

### Access the dataset

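Here we select the dataset we want by index (although it's also possible to select it by ID). The index `7` used below is specific to the domain this notebook was run on; use the position of your own dataset in the domain's dataset list.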

In [None]:
siim_images = domain.datasets[7]["imageData"]
siim_labels = domain.datasets[7]["labels"]
siim_images, siim_labels

### Operations on private dataset

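Now we'll perform the same operations on our private dataset as we performed on the unobscured dataset above. Note that we can no longer read values directly: each result must be released with `publish`, which takes a `sigma` argument controlling how much noise is added, and then retrieved explicitly with `.get()`.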
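
To build some intuition for `sigma`, here is a minimal sketch in plain NumPy. It assumes `sigma` acts as the standard deviation of zero-mean Gaussian noise added to a result before release. It is not PySyft's implementation and does not model any privacy accounting the domain performs; `noisy_release` and the label values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical private labels; the true positive rate is 0.3.
private_labels = np.array([0, 1, 0, 0, 1, 0, 0, 0, 1, 0])
true_mean = private_labels.mean()


def noisy_release(value, sigma, rng):
    """Return value plus zero-mean Gaussian noise with standard deviation sigma."""
    return value + rng.normal(loc=0.0, scale=sigma)


# Release the same statistic many times at several noise levels
# to see how far individual releases stray from the true value.
for sigma in (0.1, 1.0, 10.0):
    samples = np.array(
        [noisy_release(true_mean, sigma, rng) for _ in range(10_000)]
    )
    print(
        f"sigma={sigma:>4}: mean of releases={samples.mean():.3f}, "
        f"spread={samples.std():.3f}"
    )
```

Under this assumption, a larger `sigma` hides individual contributions better but makes any single released value less accurate. A statistic bounded between 0 and 1, such as a positive rate, can be swamped by noise when `sigma` is large.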

In [None]:
image = siim_images[0]
published_result = image.publish(sigma=4)
plt.imshow(published_result.get())

In [None]:
positive_rate = siim_labels.mean()
published_result = positive_rate.publish(sigma=10)
published_result.get()

In [None]:
# This is the mean pixel value across the sample, not the maximum
mean_image_pixel = siim_images.mean()
published_result = mean_image_pixel.publish(sigma=5)
published_result.get()