# Image Analysis

Note: some code in this notebook is taken or derived from the manual provided for the university course "data mining" at UU.

## Motivation of the query and method

In recent years, LGBTQ people have become more visible in film and television. For example, a recent report suggests that about 4.5 percent of the US population identifies as LGBTQ, while 6.7 percent of recurring TV characters are LGBTQ (Dawson, 2020). This makes me wonder whether LGBTQ people are represented proportionally in (Google) images.
As it is impossible to identify LGBTQ people based on an image, I focus on same-sex couples.

With the Google Images query "couple" I want to investigate if same-sex couples are adequately represented in images.

The focus on same-sex couples also follows from the method I work with: gender classification, specifically classifying the gender of a face. Most images returned for this query include faces, which makes this an appropriate method for the investigation. By classifying each face in an image as male or female, I can see whether the image portrays a same-sex couple or an opposite-sex couple. Of course, this only works if there are faces in the image. Additionally, a face might not be recognized if only the profile is shown (e.g., when two people are kissing or looking at each other).

Analyzing the images of this query uncovers whether the "standard"/typical image of a couple is one including a woman and a man, or if same-sex couples (so either two men or two women) are also represented when searching for images of couples.

Based on a first look at the pictures that come up with this query, I expect same-sex couples to be underrepresented in the images.

Dawson, L. (2020, December 15). Presence vs. representation: Report breaks down LGBTQ visibility on TV. NBCNews.com. Retrieved January 16, 2022, from https://www.nbcnews.com/feature/nbc-out/presence-vs-representation-report-breaks-down-lgbtq-visibility-tv-n1251153 

Ignore warnings

In [1]:
import warnings
warnings.filterwarnings('ignore')

Installing libraries

In [2]:
#!pip install simple_image_download

In [3]:
from simple_image_download import simple_image_download as simp

import tensorflow
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

import matplotlib.pyplot as plt
from PIL import Image, ImageOps
from tqdm.notebook import tqdm

from os import listdir
from os.path import isfile, join

import cv2
import numpy as np
import pandas as pd

Downloading images

I used code from this page to download the images from Google: https://github.com/RiddlerQ/simple_image_download/blob/master/Example/Test1.py
I downloaded 500 images because only images with exactly 2 detected faces are kept for the analysis, so the initial set has to be large enough.

In [4]:
response = simp.simple_image_download

response().download('couple', 500)




Get the name of every saved picture

In [5]:
mypath = '/Users/arleenlindenmeyer/Desktop/ADS/data_mining/exam2/simple_images/couple/'

In [6]:
images = [f for f in listdir(mypath) if isfile(join(mypath, f))]

Get the path of every saved picture

In [7]:
# build the full path of every image from the folder defined above
path = [join(mypath, i) for i in images]

Create dataframe with name and path of every saved image

In [8]:
df = pd.DataFrame()
df['image'] = images
df['path'] = path

In [9]:
df.head()

Unnamed: 0,image,path
0,couple_73.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...
1,couple_228.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...
2,couple_382.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...
3,couple_101.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...
4,couple_414.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...


Load face classifier

In [10]:
#!wget https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_frontalface_default.xml

In [11]:
face_classification = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

Load gender classifier

In [12]:
#!wget https://github.com/oarriaga/face_classification/raw/master/trained_models/gender_models/gender_mini_XCEPTION.21-0.95.hdf5

gender_classifier = load_model('gender_mini_XCEPTION.21-0.95.hdf5')





Function to extend the bounding box of a detected face by an offset

In [13]:
def apply_offsets(face_coordinates, offsets):
    """
    Derived from https://github.com/oarriaga/face_classification/blob/
    b861d21b0e76ca5514cdeb5b56a689b7318584f4/src/utils/inference.py#L21
    """
    x, y, width, height = face_coordinates
    x_off, y_off = offsets
    return (x - x_off, x + width + x_off, y - y_off, y + height + y_off)
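
As a small illustration of what `apply_offsets` returns, the sketch below applies it to a made-up bounding box. It also shows one possible way to clamp the result to the image bounds, since a face near the image edge can otherwise yield negative indices, which NumPy slicing interprets as counting from the end of the array. The helper `clamp_box` and the example numbers are assumptions for illustration and are not part of the referenced repository.

```python
def apply_offsets(face_coordinates, offsets):
    # same logic as the function above: returns (x1, x2, y1, y2)
    x, y, width, height = face_coordinates
    x_off, y_off = offsets
    return (x - x_off, x + width + x_off, y - y_off, y + height + y_off)


def clamp_box(box, image_shape):
    """Hypothetical helper: keep (x1, x2, y1, y2) inside an image of shape (height, width)."""
    x1, x2, y1, y2 = box
    height, width = image_shape[:2]
    return (max(x1, 0), min(x2, width), max(y1, 0), min(y2, height))


# Made-up face close to the top-left corner of a 100 x 200 (height x width) image
box = apply_offsets((5, 3, 40, 40), (10, 10))
print(box)                         # (-5, 55, -7, 53)
print(clamp_box(box, (100, 200)))  # (0, 55, 0, 53)
```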

Function to load the images from a path

In [14]:
def load_image_from_path(image_path, target_size=None, color_mode='rgb'):
    pil_image = image.load_img(image_path,
                               target_size=target_size,
                               color_mode=color_mode)
    return image.img_to_array(pil_image)

In a loop, I load each image in the dataframe, detect the faces, and classify the gender of each face. In the end, I have lists containing the number of faces in each image, whether the image contains at least one woman, and whether it contains at least one man.

Note: I changed the settings of the face classifier because with the settings we used in the tutorial a lot of faces were missed.
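
For reference, the two positional arguments passed to `detectMultiScale` in the cell below are `scaleFactor` and `minNeighbors`. The sketch below is a simplified model, not OpenCV's actual implementation, of what `scaleFactor` controls: the detection window is compared against the image at a series of scales, and each step multiplies the scale by `scaleFactor`. The function `pyramid_scales`, the 24-pixel base window and the 100-pixel image side are assumptions for illustration.

```python
def pyramid_scales(image_side, window_side=24, scale_factor=1.2):
    """Simplified model: list the scales at which the detection window still fits.

    window_side=24 is the assumed base window size of the frontal-face cascade.
    """
    scales = []
    scale = 1.0
    while window_side * scale <= image_side:
        scales.append(scale)
        scale *= scale_factor
    return scales


# Hypothetical 100-pixel image side, two example scale factors
print(len(pyramid_scales(100, scale_factor=1.2)))  # 8
print(len(pyramid_scales(100, scale_factor=1.1)))  # 15
```

A smaller `scaleFactor` examines more scales and can therefore find more faces at the cost of speed, while a lower `minNeighbors` accepts detections with less support and tends to add false positives.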

In [15]:
labels = ['woman', 'man']  # class labels of the gender classifier, in output order

GENDER_OFFSETS = (10, 10)  # pixels added on each side of the detected face box
INPUT_SHAPE_GENDER = gender_classifier.input_shape[1:3]  # input size expected by the model

n_faces = []    # number of detected faces per image
woman_yes = []  # 1 if at least one face is classified as a woman, else 0
man_yes = []    # 1 if at least one face is classified as a man, else 0

for file in tqdm(df.path):  # loop over images

    pre_image = load_image_from_path(file, color_mode='grayscale')  # load the image in grayscale
    gray_image = np.squeeze(pre_image).astype('uint8')

    # detect the faces (scaleFactor=1.2, minNeighbors=4)
    faces = face_classification.detectMultiScale(gray_image, 1.2, 4)

    genders = []  # "woman" or "man" for each detected face in the image

    for face_coordinates in faces:  # bounding boxes returned by the face detector
        x1, x2, y1, y2 = apply_offsets(face_coordinates, GENDER_OFFSETS)  # extend the bounding box
        x1, y1 = max(x1, 0), max(y1, 0)  # negative indices would slice from the wrong end of the array
        face_img = gray_image[y1:y2, x1:x2]  # crop the face
        face_img = cv2.resize(face_img, INPUT_SHAPE_GENDER)  # resize to the model's input size
        face_img = face_img.astype('float32') / 255.0  # scale pixel values to [0, 1]
        face_img = np.expand_dims(face_img, 0)  # batch of one
        probas = gender_classifier.predict(face_img)  # classify the gender of the face

        genders.append(labels[np.argmax(probas[0])])  # label with the highest probability

    n_faces.append(len(faces))
    woman_yes.append(int('woman' in genders))
    man_yes.append(int('man' in genders))


I add the lists as columns to the dataframe of the images. For the columns woman and man, I did not record whether there is one or more than one woman/man in the image. As I only look at images with exactly 2 faces, knowing whether women and/or men are present is enough to determine if the couple is same-sex or not.
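
The reasoning above can be written down as a small mapping from the two 0/1 flags to a couple type. This is only a sketch for images with exactly 2 detected faces; the function name `couple_type` and the label strings are made up for illustration and are not used elsewhere in this notebook.

```python
def couple_type(woman, man):
    """Map the 0/1 flags of a two-face image to a couple label."""
    if woman == 1 and man == 1:
        return 'opposite-sex'
    if woman == 1:
        return 'same-sex (women)'
    if man == 1:
        return 'same-sex (men)'
    return 'no classified face'


for flags in [(1, 1), (1, 0), (0, 1)]:
    print(flags, couple_type(*flags))
```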

In [16]:
df['n_faces'] = n_faces
df['woman'] = woman_yes
df['man'] = man_yes

In [17]:
df.head()

Unnamed: 0,image,path,n_faces,woman,man
0,couple_73.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,0,0,0
1,couple_228.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,0,0,0
2,couple_382.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,0,0,0
3,couple_101.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,0,0,0
4,couple_414.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,0,0,0


Not all images include faces, and the face classifier does not recognize every face in an image (or, more rarely, recognizes a face where there is none). To work around this and to be able to investigate my question, I select only the images in the dataframe in which exactly two faces are recognized.

In [18]:
df_couple = df[df['n_faces'] == 2]
df_couple.head()

Unnamed: 0,image,path,n_faces,woman,man
11,couple_455.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,1
13,couple_394.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,1
15,couple_402.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,1
21,couple_479.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,1
30,couple_438.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,1


In 94 of the images, the face classifier detects 2 faces.

In [19]:
len(df_couple)

94

Out of these 94 images, 58 are classified as couples of the opposite sex (61.7%), and 36 are classified as couples of the same sex (38.3%).

In [20]:
diff_sex = df_couple[(df_couple.woman == 1) & (df_couple.man == 1) ]
len(diff_sex)

58

In [21]:
diff_sex.head()

Unnamed: 0,image,path,n_faces,woman,man
11,couple_455.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,1
13,couple_394.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,1
15,couple_402.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,1
21,couple_479.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,1
30,couple_438.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,1


In [22]:
same_sex = df_couple[(df_couple.woman == 0) | (df_couple.man == 0) ]
len(same_sex)

36

Most images classified as same-sex couples show 2 male faces.

In [23]:
same_sex_women = df_couple[(df_couple.woman == 1) & (df_couple.man == 0) ]
len(same_sex_women)

12

In [24]:
same_sex_women.head()

Unnamed: 0,image,path,n_faces,woman,man
58,couple_45.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,0
63,couple_199.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,0
99,couple_205.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,0
266,couple_430.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,0
295,couple_353.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,1,0


In [25]:
same_sex_men = df_couple[(df_couple.woman == 0) & (df_couple.man == 1) ]
len(same_sex_men)

24

In [26]:
same_sex_men.head()

Unnamed: 0,image,path,n_faces,woman,man
68,couple_52.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,0,1
91,couple_360.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,0,1
119,couple_287.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,0,1
227,couple_297.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,0,1
268,couple_133.jpeg,/Users/arleenlindenmeyer/Desktop/ADS/data_mini...,2,0,1
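
The percentages reported above follow directly from the counts in the cell outputs. The short sketch below recomputes them; the dictionary `counts` is introduced here for illustration and only contains the numbers shown in the outputs.

```python
# counts taken from the cell outputs above
counts = {'opposite-sex': 58, 'same-sex (men)': 24, 'same-sex (women)': 12}

total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
same_sex_share = round(100 * (counts['same-sex (men)'] + counts['same-sex (women)']) / total, 1)

print(total)           # 94
print(shares)          # {'opposite-sex': 61.7, 'same-sex (men)': 25.5, 'same-sex (women)': 12.8}
print(same_sex_share)  # 38.3
```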


## Conclusion and discussion 

The results show that about 38% of the images in which 2 faces were recognized are classified as same-sex couples; 2/3 of these are classified as male same-sex couples and 1/3 as female same-sex couples. Taken at face value, same-sex couples would even be overrepresented compared with the roughly 4.5 percent of the US population that identifies as LGBTQ (Dawson, 2020), and my hypothesis would be incorrect.

However, a manual check of samples from each group revealed many classification mistakes (I looked at the images shown in the head of the dataframes diff_sex, same_sex_women and same_sex_men). 1 of the 5 sample images for opposite-sex couples showed 2 female faces and was therefore misclassified. All 5 sample images for same-sex female couples were misclassified and actually showed opposite-sex couples. Only 1 of the 5 sample images for same-sex male couples was correctly classified; the other 4 actually showed opposite-sex couples.
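
To get a feeling for how strongly such errors could affect the result, the sketch below extrapolates from the five manually checked images per group to the full group sizes. This is only a rough back-of-the-envelope calculation under assumptions made for illustration: five images per group is a very small sample, `head()` does not draw a random sample, and the error rate seen in the sample is assumed to hold for the whole group.

```python
# (group size, sample images checked, sample images that really showed a same-sex couple)
groups = {
    'classified opposite-sex': (58, 5, 1),
    'classified same-sex (women)': (12, 5, 0),
    'classified same-sex (men)': (24, 5, 1),
}

total = sum(size for size, _, _ in groups.values())
# scale each group by the fraction of checked images that really showed a same-sex couple
estimated_same_sex = sum(size * hits / checked for size, checked, hits in groups.values())
estimated_share = round(100 * estimated_same_sex / total, 1)

print(total, round(estimated_same_sex, 1), estimated_share)  # 94 16.4 17.4
```

Under these assumptions, roughly 17% rather than 38% of the two-face images would show same-sex couples, but with only 15 checked images this number is very uncertain.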

Because of the flaws of both the face classifier (many faces are missed) and the gender classifier (faces are misgendered), it is hard to draw any conclusion or to evaluate the hypothesis based on these results. Further, some images only show hands or people's backs and thus cannot be analyzed with a face classifier. I must also note that in the sample images, faces shown from the side and even the back of a head were identified as faces, but these were often misclassified by the gender classifier.

It would be interesting to be able to classify gender based on other parts as well (so, e.g., hands or a back shot of people) to be able to include these in the analysis. 
Additionally, I think it would be interesting to investigate skin color, to see if e.g., couples with white skin are overrepresented. It would also be interesting to analyze the age of the couples, to see if the images are mainly of one specific age group or if an age group is underrepresented.