# **Lab 1.1 Image Data Exploration**
In this lab, you'll explore the metadata of an image and an image dataset. This includes the image's size, channels, aspect ratio, and more. You'll also learn how to handle varying aspect ratios within a dataset.



In [2]:
from PIL import Image
from PIL.ExifTags import TAGS

import cv2

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
style.use('dark_background')

import os

## **Explore an image's information using Pillow**

Let's open your image using PIL (`Image.open()`). Then, try to print the datatype of the opened image. To display the image, if the opened image is stored in the `img` variable, simply type `img` on the last line of the cell.

In [12]:
# Start your code

# End your code


### Exchangeable image file format (EXIF)
EXIF is a standard that specifies formats for images, sound, and ancillary tags used by digital cameras (including smartphones), scanners, and other systems handling image and sound files recorded by digital cameras. In this part you'll explore the EXIF data of an image using `from PIL.ExifTags import TAGS` and `img._getexif()`. Note that `_getexif()` is a private method; recent Pillow versions also provide the public `img.getexif()`.

<span style="color:orange">**In OpenCV**</span>, image metadata is discarded when reading images. This means that information such as EXIF data (exposure, camera settings, etc.) embedded within the image file cannot be accessed through the array that `cv2.imread()` returns.

In [13]:
# Start your code

# End your code
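
As a rough illustration of the EXIF lookup described above, the sketch below builds a tiny JPEG in memory so that it runs without any file on disk. The tag values (`"ExampleCam"`, `"Model X"`) are made up, and it uses the public `getexif()` rather than `_getexif()`; with a real photo you would call `Image.open()` on your own path instead.

```python
import io

from PIL import Image
from PIL.ExifTags import TAGS

# Build a small JPEG in memory carrying two EXIF tags.
# The values are invented purely for illustration.
exif_in = Image.Exif()
exif_in[0x010F] = "ExampleCam"  # 0x010F is the standard "Make" tag
exif_in[0x0110] = "Model X"     # 0x0110 is the standard "Model" tag

buffer = io.BytesIO()
Image.new("RGB", (64, 48), color=(200, 120, 40)).save(
    buffer, format="JPEG", exif=exif_in.tobytes()
)
buffer.seek(0)

img = Image.open(buffer)

# Map numeric tag IDs to readable names; unknown IDs keep their number.
exif_readable = {
    TAGS.get(tag_id, tag_id): value for tag_id, value in img.getexif().items()
}

for name, value in exif_readable.items():
    print(f"{name}: {value}")
```

Many images (screenshots, images stripped by messaging apps) carry no EXIF at all, in which case the dictionary is simply empty.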

### Image Property

Print properties of the PIL image such as size, format, mode, bands (which means channels), number of channels, and aspect ratio.

In [14]:
# Start your code

# End your code
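
One possible way to read these properties is sketched below. It uses a synthetic 640x360 PNG created in memory (an assumption made so the example is self-contained); the attribute names `size`, `format`, `mode` and the method `getbands()` are standard Pillow.

```python
import io

from PIL import Image

# Synthetic stand-in image, saved to memory and reopened so that
# `format` is populated the same way as for a file on disk.
buffer = io.BytesIO()
Image.new("RGB", (640, 360), color=(30, 144, 255)).save(buffer, format="PNG")
buffer.seek(0)
img = Image.open(buffer)

width, height = img.size   # PIL reports size as (width, height)
bands = img.getbands()     # one letter per channel
aspect_ratio = width / height

print("size:", img.size)
print("format:", img.format)
print("mode:", img.mode)
print("bands:", bands)
print("number of channels:", len(bands))
print("aspect ratio:", round(aspect_ratio, 3))
```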

Convert the PIL image to a numpy array. Then, print properties of the image such as shape, number of channels, aspect ratio, and value range.

In [15]:
# Start your code

# End your code
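
The conversion can be sketched as follows, again with a synthetic image standing in for your own. Note the axis order: the array shape is `(height, width, channels)`, the reverse of PIL's `(width, height)`.

```python
import numpy as np
from PIL import Image

# Synthetic stand-in image: 640 wide, 360 high, one solid color.
img = Image.new("RGB", (640, 360), color=(30, 144, 255))
arr = np.asarray(img)

height, width = arr.shape[:2]
# Grayscale arrays are 2-D, so they have no channel axis.
channels = arr.shape[2] if arr.ndim == 3 else 1
aspect_ratio = width / height

print("shape:", arr.shape)
print("dtype:", arr.dtype)
print("number of channels:", channels)
print("aspect ratio:", round(aspect_ratio, 3))
print("value range:", arr.min(), "to", arr.max())
```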

## **Explore an image's information using OpenCV**
Let's open your image using OpenCV (`cv2.imread()`). Then, try to print the datatype of the opened image. You'll see that the opened image from OpenCV is a `numpy.ndarray`, unlike PIL, which opens images as an image object such as `PIL.JpegImagePlugin.JpegImageFile` (for a JPEG file). Because of this, typing `img` on the last line of the cell prints the array values instead of rendering the picture.

In [16]:
# Start your code

# End your code

Try to display the image read by OpenCV using matplotlib (`plt.imshow()`). You'll notice that the colors look wrong. This is because OpenCV stores the channels in BGR order, while `plt.imshow()` interprets a 3-channel array as RGB.

In [17]:
# Start your code

# End your code

To correct this, you'll need to rearrange the channels using `cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB)`. Then, display the image again. You should now see the image in the correct colors.

In [18]:
# Start your code

# End your code
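
To see what the channel reordering does, here is a NumPy-only sketch on a two-pixel array (the pixel values are chosen for illustration). For a 3-channel image, reversing the last axis produces the same channel order as `cv2.cvtColor(..., cv2.COLOR_BGR2RGB)`; the lab itself asks for the `cv2.cvtColor` call.

```python
import numpy as np

# A 1x2 "image" in BGR order: first pixel pure blue, second pure red.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the first and third channels (B <-> R).
rgb = bgr[..., ::-1]

print("blue pixel, BGR order:", bgr[0, 0], "-> RGB order:", rgb[0, 0])
print("red pixel,  BGR order:", bgr[0, 1], "-> RGB order:", rgb[0, 1])
```

The slice is a view on the original array; wrap it in `np.ascontiguousarray()` if a later function needs contiguous memory.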

Print properties of the image such as shape, number of channels, aspect ratio, and value range.

In [19]:
# Start your code

# End your code

## **Now let's explore an image dataset**
In this part, you will display the distribution of the number of channels per image, a scatter plot of height versus width, the distribution of aspect ratios, and the distribution of aspect ratios grouped by condition, all from the given image dataset.

In [20]:
# Start your code

# End your code
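
A minimal sketch of collecting per-image metadata is shown below. Since the lab's dataset path is not given here, it writes four synthetic images to a temporary folder as a hypothetical stand-in; with the real dataset you would point `dataset_dir` at its folder and skip the generation step.

```python
import os
import tempfile

from PIL import Image

# Hypothetical stand-in dataset: (file name, mode, (width, height)).
specs = [
    ("a.png", "RGB", (640, 360)),
    ("b.png", "L", (200, 400)),
    ("c.png", "RGB", (300, 300)),
    ("d.png", "RGBA", (100, 250)),
]

widths, heights, channels, aspect_ratios = [], [], [], []

with tempfile.TemporaryDirectory() as dataset_dir:
    for name, mode, size in specs:
        Image.new(mode, size).save(os.path.join(dataset_dir, name))

    # Image.open only reads the header, so this loop stays cheap
    # even for large datasets.
    for name in sorted(os.listdir(dataset_dir)):
        with Image.open(os.path.join(dataset_dir, name)) as img:
            w, h = img.size
            widths.append(w)
            heights.append(h)
            channels.append(len(img.getbands()))
            aspect_ratios.append(w / h)

print("widths:", widths)
print("heights:", heights)
print("channels:", channels)
print("aspect ratios:", [round(r, 2) for r in aspect_ratios])
```

These four lists are the inputs for the histogram and scatter plots requested in the following cells.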

Display the distribution of the number of channels per image using `plt.hist()`.

<details>

<summary>
<font size="3" color="orange">
<b>Expected output</b>
</font>
</summary>

- The output should resemble this, but not be identical.

![image.png](attachment:image.png)

</details>

In [23]:
# Start your code

# End your code


Display a scatter plot of height versus width using `plt.scatter()`.

<details>

<summary>
<font size="3" color="orange">
<b>Expected output</b>
</font>
</summary>

- The output should resemble this, but not be identical.

![image.png](attachment:image.png)

</details>

In [24]:
# Start your code

# End your code


Display the distribution of aspect ratios.

<details>

<summary>
<font size="3" color="orange">
<b>Expected output</b>
</font>
</summary>

- The output should resemble this, but not be identical.

![image.png](attachment:image.png)

</details>

In [25]:
# Start your code

# End your code

Display the distribution of aspect ratios (width divided by height) under the following conditions:

- `'aspect ratio <= 1/1.8'`
- `'1/1.8 < aspect ratio < 1.8'` : Images in this range are treated as square or close to square.
- `'aspect ratio >= 1.8'`

<details>

<summary>
<font size="3" color="orange">
<b>Expected output</b>
</font>
</summary>

- The output should resemble this, but not be identical.

![image.png](attachment:image.png)

</details>

In [26]:
# Start your code

# End your code
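
The three conditions above can be sketched as a small bucketing function. The helper name `aspect_bucket` and the sample sizes are hypothetical; aspect ratio is taken as width divided by height.

```python
from collections import Counter

LOW, HIGH = 1 / 1.8, 1.8


def aspect_bucket(width, height):
    """Return the label of the aspect-ratio condition an image falls into."""
    ratio = width / height
    if ratio <= LOW:
        return "aspect ratio <= 1/1.8"
    if ratio >= HIGH:
        return "aspect ratio >= 1.8"
    return "1/1.8 < aspect ratio < 1.8"


# Hypothetical (width, height) pairs standing in for a real dataset.
sizes = [(640, 360), (200, 400), (300, 300), (100, 250), (1920, 1080), (900, 400)]
counts = Counter(aspect_bucket(w, h) for w, h in sizes)

for label, n in counts.items():
    print(f"{label}: {n}")
```

The labels and counts can then be passed to `plt.bar()` to draw the grouped distribution.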

### How to handle varying aspect ratios within a dataset
There are various ways to handle this problem, from simply discarding images with extreme aspect ratios to other methods. In this lab, we will use image resizing and center cropping. Most available pre-trained CNN models (for classification tasks) have specific preprocessing requirements for input images, and they often require an input size of 224x224. But what happens when we resize both the width and height directly to 224x224 and the aspect ratio of the image is not close to 1? The objects inside the image are stretched or squashed, so they look distorted. Center cropping avoids this: resize the image while keeping its aspect ratio, then crop a 224x224 square from the center.


In [27]:
# Start your code

# End your code

#### Case : w>h

Select an image that has an aspect ratio greater than 1.8, which means the width is greater than the height.<br/>
Read and display the image.

In [28]:
# Start your code

# End your code

##### Directly resize
Directly resize both the width and height of the image to 224. Then, display the resulting image.

In [29]:
# Start your code

# End your code
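
A short worked calculation shows why the directly resized image looks distorted. The 800x400 input size and the helper name `direct_resize_scales` are assumptions for illustration.

```python
def direct_resize_scales(width, height, target=224):
    """Return the horizontal scale, vertical scale, and their ratio."""
    scale_x = target / width
    scale_y = target / height
    # scale_y / scale_x equals width / height: how much more the image
    # is scaled vertically than horizontally.
    return scale_x, scale_y, scale_y / scale_x


scale_x, scale_y, stretch = direct_resize_scales(800, 400)
print(f"horizontal scale: {scale_x:.2f}")
print(f"vertical scale:   {scale_y:.2f}")
print(f"relative vertical stretch: {stretch:.2f}x")
```

When the two scales differ, every object in the image is squeezed along one axis by that ratio; for a square input the ratio is 1 and nothing is distorted.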

##### Center Cropping
First, resize the image so that its <span style="color:red"> **height** </span> is 232 while maintaining the original aspect ratio. Then, center crop the image so that the final output size is 224x224. Finally, display the output image.

In [30]:
# Start your code

# End your code
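
One way to sketch this resize-then-crop step with Pillow is shown below. The helper name `resize_then_center_crop` and the synthetic 800x400 image are hypothetical. The function scales the shorter side to 232, which is the height when w>h.

```python
from PIL import Image


def resize_then_center_crop(img, resize_to=232, crop_to=224):
    """Scale the shorter side to `resize_to`, then cut a centered square."""
    w, h = img.size
    scale = resize_to / min(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    resized = img.resize((new_w, new_h))

    # Offsets that leave equal margins on both sides of the crop window.
    left = (new_w - crop_to) // 2
    top = (new_h - crop_to) // 2
    return resized.crop((left, top, left + crop_to, top + crop_to))


wide = Image.new("RGB", (800, 400))   # synthetic w>h image
cropped = resize_then_center_crop(wide)
print("original:", wide.size, "-> cropped:", cropped.size)
```

Because the function keys on the shorter side, the same call also covers a tall image, where the width is the side resized to 232.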


Use subplots to summarize the results: the first panel shows the original image, the second the directly resized image, and the third the center-cropped image.

<details>

<summary>
<font size="3" color="orange">
<b>Expected output</b>
</font>
</summary>

- The output should resemble this, but not be identical.

![image.png](attachment:image.png)

</details>

In [31]:
# Start your code

# End your code

#### Case : h>w

Select an image that has an aspect ratio less than 1/1.8, which means the height is greater than the width.<br/>
Read and display the image.

In [32]:
# Start your code

# End your code

##### Directly resize
Directly resize both the width and height of the image to 224. Then, display the resulting image.

In [33]:
# Start your code

# End your code

##### Center Cropping
First, resize the image so that its <span style="color:red"> **width** </span> is 232 while maintaining the original aspect ratio. Then, center crop the image so that the final output size is 224x224. Finally, display the output image.

In [34]:
# Start your code

# End your code

Use subplots to summarize the results: the first panel shows the original image, the second the directly resized image, and the third the center-cropped image.

<details>

<summary>
<font size="3" color="orange">
<b>Expected output</b>
</font>
</summary>

- The output should resemble this, but not be identical.

![image.png](attachment:image.png)

</details>

In [35]:
# Start your code

# End your code


With center cropping, you'll notice that the resulting image maintains the original object proportions, but some information at the borders of the original image is lost. For most images, the object of interest is at the center, so center cropping generally works well. With direct resizing, however, the aspect ratio of the object becomes distorted, which can cost the model information about the object's real shape when the images are used to train a deep learning model.
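
To get a feel for how much of the image is lost at the borders, the fraction of the resized image that survives the crop can be estimated with simple arithmetic. This is a sketch under the lab's settings (shorter side resized to 232, crop of 224x224); the function name and example sizes are hypothetical.

```python
def retained_fraction(width, height, resize_to=232, crop_to=224):
    """Fraction of the resized image's area kept by the center crop."""
    scale = resize_to / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    return (crop_to * crop_to) / (new_w * new_h)


for size in [(300, 300), (400, 800), (900, 400)]:
    kept = retained_fraction(*size)
    print(f"{size[0]}x{size[1]}: keeps {kept:.1%} of the image area")
```

The further the aspect ratio is from 1, the smaller the retained fraction, which is why images with extreme aspect ratios deserve a closer look before center cropping.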

Questions

1. What does data exploration tell you about your dataset?
2. How does an image's aspect ratio affect resizing?
3. What information does a bar chart of aspect ratio statistics give us?
4. How does center cropping help?
5. Suggest a way to find outliers.
6. Show a comparison table of the differences between OpenCV and PIL.