In [None]:
#| default_exp live

# Loading LIVE

> We will be loading the dataset from our private server.

In [None]:
#| export
import pandas as pd
from pathlib import Path
from natsort import natsorted
from fastcore.foundation import L

In [None]:
#| hide
path_root = Path("/media/disk/databases/BBDD_video_image/Image_Quality/LIVE/")

In the root folder we can find:

- `refimgs`: Folder containing the reference images without distortions.
- `gblur`, `jpeg`, `wn`, `fastfading`, `jp2k`: Folders containing images with the corresponding distortions (given by the folder name).
- `dmos_live.mat`: Contains the DMOS of each distorted image.
- `refnames_all.mat`: Establishes the correspondence between the distorted and reference images.

## Exploring `.mat` files

We are going to begin by loading the `.mat` files to inspect them:

In [None]:
import scipy.io as sio

In [None]:
dmos_live = sio.loadmat(path_root/"dmos_live.mat", simplify_cells=True)
dmos_live.keys()

dict_keys(['__header__', '__version__', '__globals__', 'orgs', 'dmos'])

`dmos_live.mat` contains two arrays:

- `orgs`: Array of 0s and 1s, where a 0 indicates a distorted image and a 1 indicates a reference image.
- `dmos`: Corresponding DMOS value. It's 0 for `orgs=1`.
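As a minimal sketch of how these two arrays relate, the snippet below uses small made-up arrays (not the real contents of `dmos_live.mat`) and a boolean mask to keep only the distorted entries. The names `orgs`, `dmos`, `is_distorted` and `distorted_dmos` are illustrative.

```python
import numpy as np

# Made-up stand-ins for dmos_live["orgs"] and dmos_live["dmos"].
orgs = np.array([1, 0, 0, 1, 0])
dmos = np.array([0.0, 28.0, 34.0, 0.0, 65.1])

# orgs == 0 marks distorted images; orgs == 1 marks reference images.
is_distorted = orgs == 0
distorted_dmos = dmos[is_distorted]

# Reference entries are expected to carry a DMOS of 0.
refs_have_zero_dmos = bool(np.all(dmos[orgs == 1] == 0))
```

The same mask could be applied to any other array aligned with `orgs`, such as the reference names.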

In [None]:
len(dmos_live["orgs"])

982

In [None]:
refnames_all = sio.loadmat(path_root/"refnames_all.mat", simplify_cells=True)
refnames_all.keys()

dict_keys(['__header__', '__version__', '__globals__', 'refnames_all'])

On the other hand, `refnames_all.mat` contains only one array:

- `refnames_all`: Filenames of the reference images.

In [None]:
refnames_all['refnames_all'][:5]

array(['buildings.bmp', 'studentsculpture.bmp', 'rapids.bmp',
       'dancers.bmp', 'churchandcapitol.bmp'], dtype=object)

## Combining both files

Having inspected both files, note that their entries are aligned by position: the reference image for `dmos_live["dmos"][i]` is `refnames_all["refnames_all"][i]`. With this in mind, we can put all the information into a single `.csv` file to simplify data loading in the future.

> By doing so, we avoid having to repeat this process.
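The positional correspondence can be illustrated with a short sketch. It uses plain lists holding the first three reference names and DMOS values of the dataset instead of the loaded arrays, and the names `refnames`, `dmos` and `pairs` are illustrative.

```python
# First three entries of each file, written out by hand for illustration.
refnames = ["buildings.bmp", "studentsculpture.bmp", "rapids.bmp"]
dmos = [0.0, 28.003845, 34.010736]

# The pairing is purely positional, so both sequences must have the same length.
assert len(refnames) == len(dmos)

# Entry i of one sequence belongs with entry i of the other.
pairs = list(zip(refnames, dmos))
```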

**Important**

`dmos_live` holds the DMOS for each distortion, but not the names of the distorted images, so we have to fetch them first. According to the `readme.txt` file, `dmos_live.mat` was filled, in order, with `jp2k`, `jpeg`, `wn`, `gblur` and `fastfading`, so we collect the paths folder by folder in that order and concatenate them to match.

In [None]:
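# Collect the images folder by folder, naturally sorted (img2 before img10),
# in the order given by readme.txt: jp2k, jpeg, wn, gblur, fastfading.
paths_jp2k = [str(x) for x in natsorted(list((path_root/"jp2k").glob("*.bmp")))]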
paths_jpeg = [str(x) for x in natsorted(list((path_root/"jpeg").glob("*.bmp")))]
paths_wn = [str(x) for x in natsorted(list((path_root/"wn").glob("*.bmp")))]
paths_gblur = [str(x) for x in natsorted(list((path_root/"gblur").glob("*.bmp")))]
paths_fastfading = [str(x) for x in natsorted(list((path_root/"fastfading").glob("*.bmp")))]
paths = L(paths_jp2k, paths_jpeg, paths_wn, paths_gblur, paths_fastfading).concat()

assert len(paths_jp2k) + len(paths_jpeg) + len(paths_wn) + len(paths_gblur) + len(paths_fastfading) == len(dmos_live["orgs"])
assert len(paths) == len(dmos_live["orgs"])
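The listing above depends on natural ordering: a plain lexicographic sort would put `img10.bmp` before `img2.bmp` and break the alignment with the DMOS values. The sketch below shows the difference using only the standard library. `natural_key` is a hypothetical helper written for illustration, not how `natsort` is implemented.

```python
import re

def natural_key(name):
    # Split into digit and non-digit chunks so numeric parts compare as integers.
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", name)]

names = ["img10.bmp", "img2.bmp", "img1.bmp"]

lexicographic = sorted(names)             # compares character by character
natural = sorted(names, key=natural_key)  # compares the numbers as integers
```

Here `lexicographic` ends up as `img1.bmp`, `img10.bmp`, `img2.bmp`, while `natural` gives `img1.bmp`, `img2.bmp`, `img10.bmp`.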

Right now we have the full path to each image, but we only need the `distortion/img` part:

In [None]:
paths_short = paths.map(lambda x: "/".join(x.split("/")[-2:]))
paths_short

(#982) ['jp2k/img1.bmp','jp2k/img2.bmp','jp2k/img3.bmp','jp2k/img4.bmp','jp2k/img5.bmp','jp2k/img6.bmp','jp2k/img7.bmp','jp2k/img8.bmp','jp2k/img9.bmp','jp2k/img10.bmp'...]
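Splitting on `"/"` assumes POSIX-style paths. As a possible alternative, the last two components can be taken with `pathlib`. In the sketch below, `short_path` is a hypothetical helper and the example path is made up.

```python
from pathlib import PurePosixPath

def short_path(full_path):
    # Keep only the last two components: the distortion folder and the file name.
    parts = PurePosixPath(full_path).parts
    return "/".join(parts[-2:])

example = "/data/LIVE/jp2k/img1.bmp"  # made-up path, for illustration only
short = short_path(example)
```

Using `Path` instead of `PurePosixPath` would parse the string with the separator of the current operating system.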

In [None]:
data = {
    "Reference": refnames_all["refnames_all"],
    "Distorted": paths_short,
    "DMOS": dmos_live["dmos"]
}

In [None]:
df = pd.DataFrame.from_dict(data)
df.head()

| | Reference | Distorted | DMOS |
|---|---|---|---|
| 0 | buildings.bmp | jp2k/img1.bmp | 0.0 |
| 1 | studentsculpture.bmp | jp2k/img2.bmp | 28.003845 |
| 2 | rapids.bmp | jp2k/img3.bmp | 34.010736 |
| 3 | dancers.bmp | jp2k/img4.bmp | 65.13141 |
| 4 | churchandcapitol.bmp | jp2k/img5.bmp | 68.91134 |


Finally, we save the generated `.csv` file:

In [None]:
#| notest
df.to_csv(path_root/"image_pairs_dmos.csv")
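When reading the file back, keep in mind that `to_csv` writes the index as an unnamed first column by default. The sketch below shows the round trip on a two-row, in-memory example rather than the real file. `df_demo`, `buffer`, `plain` and `indexed` are illustrative names.

```python
import io
import pandas as pd

# Two-row stand-in for the real DataFrame.
df_demo = pd.DataFrame({
    "Reference": ["buildings.bmp", "studentsculpture.bmp"],
    "Distorted": ["jp2k/img1.bmp", "jp2k/img2.bmp"],
    "DMOS": [0.0, 28.003845],
})

# Write to an in-memory buffer instead of the private server.
buffer = io.StringIO()
df_demo.to_csv(buffer)  # the index is written as a first column with no header

buffer.seek(0)
plain = pd.read_csv(buffer)  # the index comes back as an "Unnamed: 0" column

buffer.seek(0)
indexed = pd.read_csv(buffer, index_col=0)  # the first column is used as the index
```

Passing `index_col=0` when loading restores the original three columns. Writing with `index=False` would be another option if the index is not needed.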