In [1]:
import pandas as pd

from pathlib import Path

# Processing the results from the user study

In [2]:
# Set file path to .csv from Google Forms, check if file exists
csv_file = Path("./data/Koalarization_userstudy.csv")
assert csv_file.is_file()

In [3]:
# Read the file into a pandas DataFrame
df = pd.read_csv(csv_file)
df.head()

Unnamed: 0,Tijdstempel,Image 1042,Image 4062,Image 91,Image 4646,Image 837,Image 4158,Image 4550,Image 4553,Image 4406,Image 110,Image 230,Image 311
0,2022/03/29 5:12:49 p.m. EET,Real,Real,Real,Real,Real,Fake,Real,Fake,Real,Real,Real,Real
1,2022/03/29 5:14:18 p.m. EET,Real,Real,Fake,Real,Real,Real,Fake,Fake,Real,Fake,Fake,Real
2,2022/03/29 5:15:05 p.m. EET,Real,Real,Real,Real,Fake,Fake,Real,Fake,Real,Fake,Real,Fake
3,2022/03/29 5:16:42 p.m. EET,Real,Real,Fake,Real,Real,Fake,Fake,Real,Fake,Real,Real,Real
4,2022/03/29 5:18:25 p.m. EET,Real,Fake,Real,Fake,Fake,Fake,Real,Fake,Real,Real,Real,Fake


In [4]:
# Drop the unused timestamp column ("Tijdstempel" is Dutch for "timestamp")
df.drop("Tijdstempel", axis=1, inplace=True)
df.describe()

Unnamed: 0,Image 1042,Image 4062,Image 91,Image 4646,Image 837,Image 4158,Image 4550,Image 4553,Image 4406,Image 110,Image 230,Image 311
count,60,60,60,60,60,60,60,60,59,60,60,60
unique,2,2,2,2,2,2,2,2,2,2,2,2
top,Real,Real,Fake,Real,Fake,Fake,Fake,Fake,Real,Real,Real,Fake
freq,47,41,36,34,42,44,33,36,31,47,50,37


In [5]:
original = ["Image 110", "Image 230", "Image 311"]  # Original (not recoloured) images
num_responses = len(df)  # Number of responses

In [6]:
seen_as_real = (df == "Real").sum() / num_responses  # Fraction (0 to 1) of all respondents who classified each image as real; a missing answer counts as not real
seen_as_real

Image 1042    0.783333
Image 4062    0.683333
Image 91      0.400000
Image 4646    0.566667
Image 837     0.300000
Image 4158    0.266667
Image 4550    0.450000
Image 4553    0.400000
Image 4406    0.516667
Image 110     0.783333
Image 230     0.833333
Image 311     0.383333
dtype: float64
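
A note on the denominator: the `describe()` output above shows only 59 answers for Image 4406, yet the cell above divides every column by all 60 responses, giving 31/60 ≈ 0.517 rather than 31/59 ≈ 0.525. The sketch below illustrates the difference on a small toy table. The names `toy`, `per_respondent` and `per_answer` and the toy answers are hypothetical and not part of the study data.

```python
import pandas as pd

# Hypothetical toy responses (NOT the study data); None marks a skipped question.
toy = pd.DataFrame({
    "Image A": ["Real", "Fake", "Real", "Real"],
    "Image B": ["Real", None, "Fake", "Real"],
})

# Denominator used in the notebook: every respondent, whether they answered or not.
per_respondent = (toy == "Real").sum() / len(toy)

# Alternative denominator: only the answers actually given for each image.
per_answer = (toy == "Real").sum() / toy.notna().sum()

print(per_respondent)
print(per_answer)
```

Which denominator is appropriate depends on how a skipped question should be interpreted; with a single missing answer out of 60, the effect on the results here is small.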

In [7]:
originals = seen_as_real[original].sort_values()  # Select the original images
originals

Image 311    0.383333
Image 110    0.783333
Image 230    0.833333
dtype: float64

Of the three original images, only image 311, a picture of a military(?) vessel, was judged fake by most users: just 38.3% classified it as real. Image 110 scores slightly worse than image 230. Anecdotally, this is due to the "weirdness" of the face in it, which seems divided into two halves.

In [8]:
originals.describe()

count    3.000000
mean     0.666667
std      0.246644
min      0.383333
25%      0.583333
50%      0.783333
75%      0.808333
max      0.833333
dtype: float64

On average, an original image is classified as real by 66.7% of respondents.

In [9]:
fakes = seen_as_real.drop(original).sort_values(ascending=False)  # Select the recoloured images
fakes

Image 1042    0.783333
Image 4062    0.683333
Image 4646    0.566667
Image 4406    0.516667
Image 4550    0.450000
Image 91      0.400000
Image 4553    0.400000
Image 837     0.300000
Image 4158    0.266667
dtype: float64

Four out of nine recoloured images fool the average user more than half of the time, with image 1042, an image of a tiger, being the most convincing (78.3% of users classified it as real).
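
The count can be checked directly with a boolean mask. This is a minimal sketch that rebuilds the series from the rounded values printed above instead of from the CSV; `fooled_majority` is a name introduced here for illustration.

```python
import pandas as pd

# Fractions copied (rounded) from the `fakes` output above.
fakes = pd.Series({
    "Image 1042": 0.783333,
    "Image 4062": 0.683333,
    "Image 4646": 0.566667,
    "Image 4406": 0.516667,
    "Image 4550": 0.450000,
    "Image 91": 0.400000,
    "Image 4553": 0.400000,
    "Image 837": 0.300000,
    "Image 4158": 0.266667,
})

# Recoloured images that more than half of the respondents took for real.
fooled_majority = fakes[fakes > 0.5]

print(f"{len(fooled_majority)} of {len(fakes)} recoloured images fool a majority")
print("Most convincing:", fakes.idxmax())
```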

In [10]:
fakes.describe()

count    9.000000
mean     0.485185
std      0.170873
min      0.266667
25%      0.400000
50%      0.450000
75%      0.566667
max      0.783333
dtype: float64

On average, a recoloured image is misclassified as an original by 48.5% of respondents. The standard deviation across images is high (about 17 percentage points), so results may be considerably better or worse depending on the image. Furthermore, keep in mind that the images used here were taken from the most realistic images generated from the ImageNet dataset, so these figures are likely optimistic for a typical recoloured image.
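
To make the spread concrete, the sketch below recomputes the mean and standard deviation from the rounded per-image values and lists the images lying more than one standard deviation from the mean. The one-standard-deviation band is an arbitrary choice made for illustration, not a threshold used in the study, and the names `band_low`, `band_high` and `outside` are hypothetical.

```python
import statistics

# Fractions copied (rounded) from the `fakes` output above.
fakes = {
    "Image 1042": 0.783333,
    "Image 4062": 0.683333,
    "Image 4646": 0.566667,
    "Image 4406": 0.516667,
    "Image 4550": 0.450000,
    "Image 91": 0.400000,
    "Image 4553": 0.400000,
    "Image 837": 0.300000,
    "Image 4158": 0.266667,
}

mean = statistics.mean(fakes.values())
# statistics.stdev is the sample standard deviation (n - 1 in the denominator),
# the same convention pandas uses by default in describe() and std().
std = statistics.stdev(fakes.values())

band_low, band_high = mean - std, mean + std

# Images whose "seen as real" fraction lies outside mean ± one standard deviation.
outside = [name for name, p in fakes.items() if p < band_low or p > band_high]

print(f"mean={mean:.3f}, std={std:.3f}, band=({band_low:.3f}, {band_high:.3f})")
print("Outside the band:", outside)
```

With only nine images, these summary statistics are themselves rough, which is one more reason to read the 48.5% figure together with the per-image values.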