<a href="https://colab.research.google.com/github/JuanZapa7a/Medical-Image-Processing/blob/main/Untitled0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The `tf.keras.datasets` module provides access to a small set of popular datasets such as MNIST, CIFAR-10, and CIFAR-100. For more specialized datasets, such as medical image datasets, use TensorFlow Datasets (TFDS) instead.

In [None]:
# @title
import tensorflow as tf

# List of available datasets in tf.keras.datasets
available_datasets = [
    'boston_housing',
    'cifar10',
    'cifar100',
    'fashion_mnist',
    'imdb',
    'mnist',
    'reuters'
]

# Print available datasets
print("Available datasets in tf.keras.datasets:")
for dataset in available_datasets:
    print(dataset)

Available datasets in tf.keras.datasets:
boston_housing
cifar10
cifar100
fashion_mnist
imdb
mnist
reuters


These datasets are commonly used for various machine learning tasks:

1. **boston_housing**: Boston housing price regression dataset.
2. **cifar10**: CIFAR-10 small images classification dataset.
3. **cifar100**: CIFAR-100 small images classification dataset.
4. **fashion_mnist**: Fashion-MNIST images classification dataset.
5. **imdb**: IMDB movie reviews sentiment classification dataset.
6. **mnist**: MNIST handwritten digits classification dataset.
7. **reuters**: Reuters newswire topics classification dataset.

You can load any of these datasets with the corresponding `load_data()` function. For example, to load the Fashion-MNIST dataset:


---

**Tip**: This approach provides a quick and easy way to access and utilize these popular datasets for your machine learning projects.

In [None]:
# @title

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()

# Print shape of the data
print("Training data shape:", train_images.shape, train_labels.shape)
print("Test data shape:", test_images.shape, test_labels.shape)

Training data shape: (60000, 28, 28) (60000,)
Test data shape: (10000, 28, 28) (10000,)


In [None]:
# @title
train, test = tf.keras.datasets.fashion_mnist.load_data()

This call downloads the Fashion-MNIST data as training and test sets. The training data is used to fit the model, and the test data is used to evaluate it. Fashion-MNIST is a dataset of Zalando’s article images. It contains 60,000 training and 10,000 test examples. The dataset is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms.

Inspect the types:

In [None]:
# @title
type(train[0]), type(train[1])

(numpy.ndarray, numpy.ndarray)

**Training and test sets are tuples** where the first tuple element contains feature images and the second contains corresponding labels. **Both datasets (features and labels) are NumPy arrays**.

Load images and labels into variables:

In [None]:
# @title
train_img, train_lbl = train
test_img, test_lbl = test

For example, the pixel value at row 12, column 20 of image 4000 in the image array (tuple index 0) is:

In [None]:
# @title
print(train[0][4000][12][20])  # index 0 selects the image array; pixel at row 12, column 20 of image 4000

187


In [None]:
# @title
print(train_images[4000][12][20])

187


or:

In [None]:
# @title
print(train_img[4000][12][20])

187


and the label (tuple index 1) for image 4000 is:

In [None]:
# @title
print(train[1][4000])  # index 1 selects the label array; label of image 4000

8


In [None]:
# @title
print(train_labels[4000])

8


or

In [None]:
# @title
print(train_lbl[4000])

8


By separating images and labels from the respective datasets, we can more easily process images and labels as needed.
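The equivalence of the access patterns shown above can be checked on a small synthetic stand-in. This is a minimal sketch that assumes only NumPy; the arrays are random placeholders rather than Fashion-MNIST, and the names `fake_train`, `imgs`, and `lbls` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the (images, labels) tuple returned by load_data():
# 10 grayscale 28x28 "images" and 10 integer labels.
fake_train = (
    rng.integers(0, 256, size=(10, 28, 28), dtype=np.uint8),
    rng.integers(0, 10, size=(10,), dtype=np.uint8),
)

# Tuple unpacking gives each array its own name.
imgs, lbls = fake_train

# Chained indexing and NumPy multi-axis indexing address the same pixel.
pixel_chained = fake_train[0][4][12][20]
pixel_multi = imgs[4, 12, 20]
print(pixel_chained == pixel_multi)

# Unpacking does not copy: both names refer to the same array objects.
print(imgs is fake_train[0], lbls is fake_train[1])
```

Because unpacking only binds new names, it costs no extra memory, and the multi-axis form `imgs[4, 12, 20]` is usually the more idiomatic way to index NumPy arrays.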

Verify shapes:

In [None]:
# @title
# Print shape of the data
print("Training data shape:", train_images.shape, train_labels.shape)
print("Test data shape:", test_images.shape, test_labels.shape)

Training data shape: (60000, 28, 28) (60000,)
Test data shape: (10000, 28, 28) (10000,)


In [None]:
# @title
print('train: data shape', train_img.shape, train_lbl.shape)
print('test: data shape', test_img.shape, test_lbl.shape)

train: data shape (60000, 28, 28) (60000,)
test: data shape (10000, 28, 28) (10000,)


You can also load data from TFDS and split it into training and test sets in a way that is compatible with Keras. Here’s an example using the [Malaria TensorFlow Dataset](https://www.tensorflow.org/datasets/catalog/malaria?hl=es-419):

In [None]:
# @title
import tensorflow_datasets as tfds  # needed for tfds.load()

# Load the Malaria dataset
(ds_train, ds_test), ds_info = tfds.load(
    'malaria',
    split=['train[:80%]', 'train[80%:]'],  # Use 80% for training, 20% for testing
    shuffle_files=True,
    with_info=True,
    as_supervised=True,
)


# Example of accessing the dataset elements
for image, label in ds_train.take(1):
    print("Image shape:", image.shape)
    print("Label:", label)

Image shape: (103, 103, 3)
Label: tf.Tensor(0, shape=(), dtype=int64)


You can obtain information about the malaria dataset from the `ds_info` object returned when loading the dataset. This object contains metadata such as the number of classes, the number of examples in each split, and the feature types.

Here's an example of how to get information about the malaria dataset:

In [None]:
# @title
# Print information about the dataset
print("Dataset Information:")
print("Dataset Name:", ds_info.name)
print("Number of Classes:", ds_info.features['label'].num_classes)
print("Class Names:", ds_info.features['label'].names)
print("Number of Training Examples:", ds_info.splits['train[:80%]'].num_examples)
print("Number of Test Examples:", ds_info.splits['train[80%:]'].num_examples)
print("Feature Type:", ds_info.features['image'].np_dtype)  # Updated access to NumPy dtype

Dataset Information:
Dataset Name: malaria
Number of Classes: 2
Class Names: ['parasitized', 'uninfected']
Number of Training Examples: 22046
Number of Test Examples: 5512
Feature Type: <class 'numpy.uint8'>


In [None]:
# @title
ds_train, ds_test

(<_PrefetchDataset element_spec=(TensorSpec(shape=(None, None, 3), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>,
 <_PrefetchDataset element_spec=(TensorSpec(shape=(None, None, 3), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>)

`ds_train` is an object of type `tf.data.Dataset`, a TensorFlow data structure that represents a sequence of elements, which may be very large or even unbounded. In this context, `ds_train` contains the training dataset loaded from TensorFlow Datasets (TFDS).

A `tf.data.Dataset` object consists of elements, where each element represents an example from the dataset. These elements can be tensors, tuples of tensors, or dictionaries of tensors, depending on how the dataset was configured when loaded.

In the provided code, `ds_train` is loaded using `tfds.load()` with `as_supervised=True`, indicating that each element of the dataset is a tuple `(image, label)` of tensors, where `image` is an image and `label` is the corresponding label. Therefore, `ds_train` is a `tf.data.Dataset` object composed of elements that are tuples of tensors.
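The difference between dictionary elements and tuple elements can be sketched on synthetic data. This is a minimal illustration, assuming TensorFlow and NumPy are installed; the arrays are zero-filled placeholders and the names `dict_ds` and `tuple_ds` are hypothetical. It does not reproduce what `tfds.load()` does internally.

```python
import numpy as np
import tensorflow as tf

# Placeholder data: four 8x8 RGB "images" and four integer labels.
images = np.zeros((4, 8, 8, 3), dtype=np.uint8)
labels = np.array([0, 1, 0, 1], dtype=np.int64)

# Each element is a dictionary of tensors.
dict_ds = tf.data.Dataset.from_tensor_slices({"image": images, "label": labels})

# Mapping to (image, label) gives tuple elements, the layout Keras expects.
tuple_ds = dict_ds.map(lambda ex: (ex["image"], ex["label"]))

print(dict_ds.element_spec)
print(tuple_ds.element_spec)

# Tuple elements can be unpacked directly while iterating.
image, label = next(iter(tuple_ds))
print(image.shape, int(label))
```

The `element_spec` attribute is a quick way to confirm which layout a dataset uses before passing it to a model.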

In [None]:
# @title
import tensorflow_datasets as tfds

tfds.list_builders()

['abstract_reasoning',
 'accentdb',
 'aeslc',
 'aflw2k3d',
 'ag_news_subset',
 'ai2_arc',
 'ai2_arc_with_ir',
 'amazon_us_reviews',
 'anli',
 'answer_equivalence',
 'arc',
 'asqa',
 'asset',
 'assin2',
 'asu_table_top_converted_externally_to_rlds',
 'austin_buds_dataset_converted_externally_to_rlds',
 'austin_sailor_dataset_converted_externally_to_rlds',
 'austin_sirius_dataset_converted_externally_to_rlds',
 'bair_robot_pushing_small',
 'bc_z',
 'bccd',
 'beans',
 'bee_dataset',
 'beir',
 'berkeley_autolab_ur5',
 'berkeley_cable_routing',
 'berkeley_fanuc_manipulation',
 'berkeley_gnm_cory_hall',
 'berkeley_gnm_recon',
 'berkeley_gnm_sac_son',
 'berkeley_mvp_converted_externally_to_rlds',
 'berkeley_rpt_converted_externally_to_rlds',
 'big_patent',
 'bigearthnet',
 'billsum',
 'binarized_mnist',
 'binary_alpha_digits',
 'ble_wind_field',
 'blimp',
 'booksum',
 'bool_q',
 'bot_adversarial_dialogue',
 'bridge',
 'bucc',
 'c4',
 'c4_wsrs',
 'caltech101',
 'caltech_birds2010',
 'caltech_b

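Training data consists of 22,046 RGB feature images and 22,046 labels. Test data consists of 5,512 RGB feature images and 5,512 labels. The images vary in size: the element spec above reports shape `(None, None, 3)`, and the example printed earlier is 103 × 103 × 3.

# Scale the tf.data.Dataset

Scale the Fashion-MNIST images for efficient processing and create the training (`train_ds`) and test (`test_ds`) sets: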

In [None]:
# @title
train_image = train_img / 255.0
test_image = test_img / 255.0
train_ds = tf.data.Dataset.from_tensor_slices(
    (train_image, train_lbl))
test_ds = tf.data.Dataset.from_tensor_slices(
    (test_image, test_lbl))

`from_tensor_slices()` slices the NumPy arrays along their first axis and returns `tf.data.Dataset` objects whose elements are individual (image, label) pairs. Feature image pixel values here are integers that range from 0 to 255. To scale, divide feature images by 255 to get pixel values that range from 0 to 1.
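The effect of dividing by 255 on both the value range and the data type can be seen on a tiny array. This is a minimal sketch assuming only NumPy; the 2 × 2 array `raw` is a hypothetical stand-in for an image.

```python
import numpy as np

# Hypothetical 2x2 "image" with uint8 pixel values.
raw = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Dividing a uint8 array by a Python float promotes the result to float64.
scaled64 = raw / 255.0

# Casting first keeps the result in float32.
scaled32 = raw.astype(np.float32) / 255.0

print(scaled64.dtype, scaled32.dtype)
print(scaled64.min(), scaled64.max())
```

The float64 result explains why a dataset built from such arrays reports `tf.float64` images. Casting to float32 before dividing is a common alternative that halves memory use, though whether it matters depends on the dataset size and model.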

Scaling pixel values is an important preprocessing step because neural networks generally train faster and more stably when inputs lie in a small, consistent range. Resizing is a separate concern: many deep learning model architectures require that images are the same size, but raw images tend to vary in size, as in the Malaria dataset above.
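For images that vary in size, resizing and value scaling can be combined in one preprocessing function. The following is a sketch under stated assumptions: the two zero- and 255-filled arrays stand in for real images, the target size of 64 × 64 is arbitrary, and the names `gen`, `preprocess`, and `TARGET_SIZE` are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical variable-size RGB images standing in for raw medical images.
raw_images = [
    np.zeros((103, 103, 3), dtype=np.uint8),
    np.full((151, 115, 3), 255, dtype=np.uint8),
]
raw_labels = [0, 1]

def gen():
    # Yield one (image, label) pair at a time.
    for img, lbl in zip(raw_images, raw_labels):
        yield img, lbl

ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8),
        tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
)

TARGET_SIZE = (64, 64)  # assumed target size; choose one that suits the model

def preprocess(image, label):
    image = tf.image.resize(image, TARGET_SIZE)  # returns float32
    image = image / 255.0                        # scale values to [0, 1]
    return image, label

# After resizing, examples share a shape and can be batched together.
batch_images, batch_labels = next(iter(ds.map(preprocess).batch(2)))
print(batch_images.shape, batch_images.dtype)
```

Without the resize step, `batch()` would fail on these two examples because their shapes differ.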

Inspect training and test tensors:

In [None]:
# @title
train_ds, test_ds

(<TensorSliceDataset shapes: ((28, 28), ()), types: (tf.float64, tf.uint8)>,
 <TensorSliceDataset shapes: ((28, 28), ()), types: (tf.float64, tf.uint8)>)

Both datasets are `TensorSliceDataset` objects, which means that they are iterable. Calling `as_numpy_iterator()` (or `iter()`) on a dataset returns an **iterator**, an object that yields examples one at a time and can be advanced with Python's built-in `next()` function.
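The distinction between a dataset and an iterator over it can be checked on a tiny example. This is a minimal sketch assuming TensorFlow is installed; `tf.data.Dataset.range(3)` is a stand-in for the real data, and the variable names are hypothetical.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(3)  # small stand-in dataset: 0, 1, 2

# The dataset is iterable, but it is not itself an iterator,
# so calling next() on it directly raises TypeError.
try:
    next(ds)
    dataset_is_iterator = True
except TypeError:
    dataset_is_iterator = False

# An iterator keeps its position between calls to next().
it = ds.as_numpy_iterator()
first = next(it)
second = next(it)

# A fresh iterator starts from the beginning again.
first_again = next(ds.as_numpy_iterator())

print(dataset_is_iterator, first, second, first_again)
```

This is why an expression such as `next(train_ds.as_numpy_iterator())` always returns the first example: each call creates a new iterator.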

Display the **first label** from the training set:

In [None]:
# @title
next(train_ds.as_numpy_iterator())[1]

9

Each example in the training set contains an image matrix and its corresponding label. The `next()` function returns a tuple with the first image matrix and its label in positions 0 and 1 respectively.

Display **ten labels** from the training set:

In [None]:
# @title
next(train_ds.batch(10).as_numpy_iterator())[1]

array([9, 0, 0, 3, 0, 2, 7, 2, 5, 5], dtype=uint8)

The `batch()` method groups n consecutive examples into a single batch. Display all 60,000 labels from the training set:

In [None]:
# @title
labels = next(train_ds.batch(60_000).as_numpy_iterator())[1]
labels, len(labels)

(array([9, 0, 0, ..., 3, 0, 5], dtype=uint8), 60000)
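The batching used above can be illustrated on a ten-element dataset. This is a minimal sketch assuming TensorFlow is installed; the range dataset is a stand-in, and `batches` and `full_batches` are hypothetical names.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)  # stand-in dataset: 0 through 9

# batch(4) groups consecutive elements; the last batch may be smaller.
batches = [b.tolist() for b in ds.batch(4).as_numpy_iterator()]
print(batches)

# drop_remainder=True discards the final incomplete batch.
full_batches = [
    b.tolist() for b in ds.batch(4, drop_remainder=True).as_numpy_iterator()
]
print(full_batches)
```

Batching with the full dataset size, as in the cell above, produces a single batch containing every example, which is convenient for inspection but loads everything into memory at once.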

Display the first image from the training set:

In [None]:
# @title
next(train_ds.as_numpy_iterator())[0]

array([[0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.    

Verify that the first image is a 28 × 28 matrix:

In [None]:
# @title
arrays = len(next(train_ds.as_numpy_iterator())[0])
pixels = len(next(train_ds.as_numpy_iterator())[0][0])
arrays, pixels

(28, 28)

or

In [None]:
# @title
next(train_ds.take(1).as_numpy_iterator())[0].shape

(28, 28)

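To find the dimensions of a matrix (a 2-D tensor) in Python, the height (number of rows) is `len(matrix)`, and the width (number of columns) is `len(matrix[0])`. For NumPy arrays and tensors, the `shape` attribute reports all dimensions at once.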
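The two ways of reading dimensions can be compared directly. This is a minimal sketch assuming only NumPy; `matrix` and `volume` are hypothetical zero-filled arrays.

```python
import numpy as np

matrix = np.zeros((28, 28))       # grayscale-image-sized matrix
volume = np.zeros((103, 103, 3))  # RGB-image-sized array

# len() reports only the size of the first axis.
rows, cols = len(matrix), len(matrix[0])
print((rows, cols), matrix.shape)

# For arrays with more than two axes, shape is more informative than len().
print(len(volume), volume.shape, volume.ndim)
```

For image data with a channel axis, `shape` avoids chaining several `len()` calls.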

## Verify Scaling
Display a tensor from the training set before scaling:


In [None]:
# @title
train_img[5]

array([[  0,   0,   0,   0,   1,   0,   0,   0,   0,  22,  88, 188, 172,
        132, 125, 141, 199, 143,   9,   0,   0,   0,   1,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   1,   0,   0,  20, 131, 199, 206, 196, 202, 242,
        255, 255, 250, 222, 197, 206, 188, 126,  17,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   1,   0,  35, 214, 191, 183, 178, 175, 168, 150,
        162, 159, 152, 158, 179, 183, 189, 195, 185,  82,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0, 170, 190, 172, 177, 176, 171, 169, 162,
        155, 148, 154, 169, 174, 175, 175, 177, 183, 188,  12,   0,   0,
          0,   0],
       [  0,   0,   0,   0,  25, 194, 180, 178, 174, 184, 187, 189, 187,
        184, 181, 189, 200, 197, 193, 190, 178, 175, 194,  90,   0,   0,
          0,   0],
       [  0,   0,   0,   0,  42, 218, 191, 197, 208, 204, 211, 209, 210,
        212, 211, 214, 215, 213, 214, 211, 211, 191, 200, 158,   0,   0,
          0,   0],
       [  

Display the same tensor (the image at index 5) after scaling:

In [None]:
# @title
train_image[5]

array([[0.        , 0.        , 0.        , 0.        , 0.00392157,
        0.        , 0.        , 0.        , 0.        , 0.08627451,
        0.34509804, 0.7372549 , 0.6745098 , 0.51764706, 0.49019608,
        0.55294118, 0.78039216, 0.56078431, 0.03529412, 0.        ,
        0.        , 0.        , 0.00392157, 0.        , 0.        ,
        0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.00392157, 0.        ,
        0.        , 0.07843137, 0.51372549, 0.78039216, 0.80784314,
        0.76862745, 0.79215686, 0.94901961, 1.        , 1.        ,
        0.98039216, 0.87058824, 0.77254902, 0.80784314, 0.7372549 ,
        0.49411765, 0.06666667, 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.00392157, 0.        ,
        0.1372549 , 0.83921569, 0.74901961, 0.71764706, 0.69803922,
        0.68627451, 0.65882353, 0.58823529, 0.63529412, 0.62352941,
        0.59607843, 0.6196

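or:

In [None]:
# @title
next(train_ds.skip(5).take(1).as_numpy_iterator())[0]  # skip(5) advances to the image at index 5

array([[0.        , 0.        , 0.        , 0.        , 0.00392157,
        0.        , 0.        , 0.        , 0.        , 0.08627451,
        0.34509804, 0.7372549 , 0.6745098 , 0.51764706, 0.49019608,
        0.55294118, 0.78039216, 0.56078431, 0.03529412, 0.        ,
        0.        , 0.        , 0.00392157, 0.        , 0.        ,
        0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.00392157, 0.        ,
        0.        , 0.07843137, 0.51372549, 0.78039216, 0.80784314,
        0.76862745, 0.79215686, 0.94901961, 1.        , 1.        ,
        0.98039216, 0.87058824, 0.77254902, 0.80784314, 0.7372549 ,
        0.49411765, 0.06666667, 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.00392157, 0.        ,
        0.1372549 , 0.83921569, 0.74901961, 0.71764706, 0.69803922,
        0.68627451, 0.65882353, 0.58823529, 0.63529412, 0.62352941,
        0.59607843, 0.6196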

Voilà! The pixels are scaled between 0 and 1.

## Check Tensor Shape
Check shapes: