# Identifying features linked to attractiveness
This notebook follows a previous notebook on analysing facial ratings. That notebook looked at how people rate a face; we examined:
- the distribution of average ratings across multiple pictures
- how several variables (gender and ethnicity) correlate with the ratings
- how individual raters rate different faces
- how different people rate the same face

The above analysis used only the numerical ratings given by raters to each image, together with some covariates (gender and ethnicity). We have not yet used the pictures themselves. In the present notebook we aim to determine which features of the images are correlated with the ratings. In a nutshell, we want to find out whether we can isolate:
- specific features of the pictures, or at least some projection of them, that are characteristic of attractiveness
- specific features of the pictures, or at least some projection of them, that are characteristic of unattractiveness

# Loading Data
First, let's load the ratings and the path of each image.

In [65]:

import pandas as pd
import numpy as np
import os
from pathlib import Path
from PIL import Image

# Disabled for now: these imports are not used in this section
# import torch
# import torch.nn as nn
# import torchvision.transforms as transforms
# from torchvision import models
# from sklearn.preprocessing import StandardScaler

In [66]:

link_data = "https://github.com/fbplab/MEBeauty-database/raw/main/scores/generic_scores_all_2022.xlsx"
df = pd.read_excel(link_data)
df = df.iloc[:, 0:3] #remove individual ratings
df

Unnamed: 0,mean,image,path
0,1.117647,kuma-kum-GKbPbR0ZAT4-unsplash.jpg,/home/ubuntu/ME-beautydatabase/images/female/c...
1,1.000000,pexels-cottonbro-5529905.jpg,/home/ubuntu/ME-beautydatabase/images/male/asi...
2,1.000000,pexels-nishant-aneja-2561432.jpg,/home/ubuntu/ME-beautydatabase/images/male/ind...
3,1.428571,woman-1929550_1920.jpg,
4,1.500000,pexels-himesh-mehta-3059930.jpg,/home/ubuntu/ME-beautydatabase/images/female/i...
...,...,...,...
2602,9.000000,pexels-pixabay-247322.jpg,/home/ubuntu/ME-beautydatabase/images/female/c...
2603,9.375000,women-5930352_1920.jpg,/home/ubuntu/ME-beautydatabase/images/female/a...
2604,9.222222,francesca-zama-1fhl_kmbfAE-unsplash.jpg,/home/ubuntu/ME-beautydatabase/images/female/h...
2605,9.625000,sofia--LNdco1UgNY-unsplash.jpg,/home/ubuntu/ME-beautydatabase/images/female/c...


In [67]:
df.iloc[4,2]

'/home/ubuntu/ME-beautydatabase/images/female/indian/pexels-himesh-mehta-3059930.jpg'

In [68]:
# Drop rows with missing values (e.g. an image without a path), if any
df.dropna(inplace=True)
# The file name is already part of the path, so the "image" column is redundant
df.drop("image", axis=1, inplace=True)

def standardize_path(cell):
    # Keep only the last three components of the path: {gender}/{ethnicity}/{filename}.
    # That way we can later prepend whatever root path we want for the dataset.
    path = Path(cell)
    return "/".join(path.parts[-3:])

df["path"] = df["path"].apply(standardize_path)

In [69]:
df

Unnamed: 0,mean,path
0,1.117647,female/caucasian/kuma-kum-GKbPbR0ZAT4-unsplash...
1,1.000000,male/asian/pexels-cottonbro-5529905.jpg
2,1.000000,male/indian/pexels-nishant-aneja-2561432.jpg
4,1.500000,female/indian/pexels-himesh-mehta-3059930.jpg
5,1.888889,male/asian/pexels-kaniseeyapose-2751061.jpg
...,...,...
2602,9.000000,female/caucasian/pexels-pixabay-247322.jpg
2603,9.375000,female/asian/women-5930352_1920.jpg
2604,9.222222,female/hispanic/francesca-zama-1fhl_kmbfAE-uns...
2605,9.625000,female/caucasian/sofia--LNdco1UgNY-unsplash.jpg
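Because each path now has the form `{gender}/{ethnicity}/{filename}`, the covariates can be read back from the path and the file can be located under any local root. A minimal sketch using only the standard library; the helper names `split_relative_path` and `local_path`, and the root `images/original`, are illustrative assumptions rather than part of the dataset's tooling:

```python
from pathlib import PurePosixPath


def split_relative_path(rel_path):
    """Split '{gender}/{ethnicity}/{filename}' into its three components."""
    gender, ethnicity, filename = PurePosixPath(rel_path).parts
    return gender, ethnicity, filename


def local_path(root, rel_path):
    """Prepend an arbitrary root directory to a standardized relative path."""
    return PurePosixPath(root) / rel_path


# Example with one of the paths shown in the table above
rel = "female/indian/pexels-himesh-mehta-3059930.jpg"
print(split_relative_path(rel))
print(local_path("images/original", rel))
```

`PurePosixPath` is used here so that the result keeps forward slashes on every operating system; for actual file access, `pathlib.Path` would be the natural choice.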


After removing the missing values, we have 2553 images. That's a lot of pictures! Let's download them locally so that we can load them more easily later. Grab a cup of tea, as this may take a while. Note that you can also download only the cropped dataset, which is faster and takes less disk space.

In [70]:
import os
import requests
from tqdm import tqdm


# Download a file from a URL and stream it to disk in small chunks
def download_file(url, save_path):
    response = requests.get(url, stream=True, timeout=30)
    # Fail loudly on HTTP errors instead of saving an error page as an image
    response.raise_for_status()
    with open(save_path, 'wb') as file:
        for data in response.iter_content(chunk_size=1024):
            file.write(data)

# URLs of the images
base_url_original = 'https://github.com/fbplab/MEBeauty-database/raw/main/original_images/'
base_url_cropped = 'https://github.com/fbplab/MEBeauty-database/raw/main/cropped_images/images_crop_align_opencv/'

def download_dataset(df, dataset_url, output_dir, rewrite=True):
    # Download every image listed in df['path'] into output_dir, keeping the
    # relative folder layout. With rewrite=False, files already on disk are skipped.
    image_paths = df['path'].tolist()
    with tqdm(total=len(image_paths), unit='file') as pbar:
        for path in image_paths:
            local_path = os.path.join(output_dir, path)
            if rewrite or not os.path.isfile(local_path):
                os.makedirs(os.path.dirname(local_path), exist_ok=True)
                # Plain concatenation (the base URLs end with '/'):
                # os.path.join would insert backslashes on Windows
                download_file(dataset_url + path, local_path)
            pbar.update(1)



In [None]:


download_dataset(df, base_url_original, 'images/original', rewrite=False)

 60%|█████████████████████████████████████████████▎                              | 1524/2553 [24:42<29:30,  1.72s/file]
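Since `download_dataset` is called with `rewrite=False`, files already on disk are skipped, so a run that stops part-way can simply be restarted. A minimal sketch for checking what is still missing before doing so; `missing_files` is a hypothetical helper, not part of the notebook, and the example paths are made up:

```python
import os


def missing_files(relative_paths, output_dir):
    """Return the relative paths that do not yet exist under output_dir."""
    return [
        rel for rel in relative_paths
        if not os.path.isfile(os.path.join(output_dir, rel))
    ]


# Made-up example paths; in the notebook this would be df['path'].tolist()
example_paths = ["female/asian/a.jpg", "male/indian/b.jpg"]
remaining = missing_files(example_paths, "images/original")
print(f"{len(remaining)} of {len(example_paths)} files still to download")
```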

In [None]:
download_dataset(df, base_url_cropped, 'images/cropped', rewrite=False)
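Once the downloads finish, it may be worth checking that the saved files are really images, since an interrupted or failed download can leave an empty or non-image file behind. The sketch below assumes the files are JPEGs (as the `.jpg` extensions suggest) and only inspects the first three bytes, which for JPEG files are `FF D8 FF`. `looks_like_jpeg` and `find_suspicious_files` are hypothetical helpers, and this is a quick sanity check rather than a full validation:

```python
import os


def looks_like_jpeg(file_path):
    """Return True if the file starts with the JPEG magic bytes FF D8 FF."""
    with open(file_path, "rb") as f:
        return f.read(3) == b"\xff\xd8\xff"


def find_suspicious_files(root_dir):
    """List files under root_dir that do not start with the JPEG signature."""
    suspicious = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if not looks_like_jpeg(full_path):
                suspicious.append(full_path)
    return suspicious


# 'images/cropped' is the output directory used in the cell above
print(find_suspicious_files("images/cropped"))
```

Any file reported here can be deleted and fetched again by re-running `download_dataset` with `rewrite=False`.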