# The Role of Computer Vision Data Analyst at Varjo

## Home Assignment Report

### Aleksandr Krylov
### 26/01/2025

The following report presents the main observations and findings on how the physical attributes of a subject's eyes and eyewear influence the tracking error of the eye tracking algorithm.

## Data Processing

Several inconsistencies and duplicates were found in the values of many data attributes, including `Gender`, `Eye colour`, `Skin tone`, `Ethnic background`, `Makeup`, and `Dominant eye`. This could be explained by faults in the data collection protocol.
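One way to surface such inconsistencies is to group the raw values by a normalized key and list the keys that have more than one spelling. The sketch below is an illustration only: the helper name `find_variant_spellings` and the toy values are hypothetical and are not part of the original analysis.

```python
import pandas as pd


def find_variant_spellings(values: pd.Series) -> dict[str, list[str]]:
    """Group raw values that collapse to the same key after trimming and lowercasing."""
    raw = values.dropna().astype(str)
    keys = raw.str.strip().str.lower()
    variants = raw.groupby(keys).unique()
    # Keep only the keys that are spelled in more than one way.
    return {key: sorted(spellings) for key, spellings in variants.items() if len(spellings) > 1}


# Toy data (hypothetical), mimicking the kind of inconsistencies described above.
toy_gender = pd.Series(["male", "Male", "male ", "female", "Female", "female"])
variant_report = find_variant_spellings(toy_gender)
print(variant_report)
```

This check only catches differences in case and surrounding whitespace; spelling mistakes and reordered name parts still need attribute-specific rules like the ones in the following subsections.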

### Gender

Values with inconsistent gender spellings have been normalized to a single format.

('male', 'Male', 'male ') -> 'male'

('female', 'Female') -> 'female'


Code:

```python
data["Gender"] = data["Gender"].apply(lambda gender: gender.strip().lower())
```

### Eye colour

The color values had mixed formats and many duplicates due to an inconsistent naming convention. A unified format has been implemented, which reduced the number of color categories from 33 to 19. The following examples demonstrate what has been done:

('grey', 'gray', 'Grey') -> 'gray'

('blue-gray', 'gray-blue', 'blue-grey') -> 'blue-gray'

('gray-green', 'greengray') -> 'gray-green'

Code:

```python
def color_formatter(color_string: str) -> str:
    """
    Normalizes and unifies the color naming rules
    by sorting colors, which are part of the input color string, in an ascending order
    and using '-' and '/' as separators.
    """
    color_list = [sub.split("-") for sub in color_string.split("/")]
    color_list = sorted(list(map(sorted, color_list)))
    color_list = list(map("-".join, color_list))

    return "/".join(color_list)


data["Eye colour"] = data["Eye colour"].apply(lambda color: color.strip().lower())\
                                       .apply(lambda color: color.replace("grey", "gray"))\
                                       .apply(lambda color: color.replace("greengray", "green-gray"))\
                                       .apply(lambda color: "-".join(color.split()))\
                                       .apply(lambda color: color.replace("ish", ""))\
                                       .apply(color_formatter)
```

### Skin tone

Similar processing steps have been applied to remove unnecessary duplicates and to fix spelling mistakes.

('dark', 'Dark') -> 'dark'

('medium', 'Medium') -> 'medium'

('light', 'Light', 'right') -> 'light'

Code:

```python
data["Skin tone"] = data["Skin tone"].apply(lambda skin_tone: skin_tone.strip().lower())\
                                     .apply(lambda skin_tone: skin_tone.replace("right", "light"))
```

### Ethnic background

The values contained many naming inconsistencies, examples of which are shown below.

('Europe', 'europe', 'EUrope') -> 'Europe'

('South Asia', 'South Asia ', 'south Asia', 'South- Asia ') -> 'South-Asia'

('Horn & Sub-Saharan Africa', 'Horn & Sub-Saharan Africa.', 'Horn and Sub Saharan Africa') -> 'Horn-And-Sub-Saharan-Africa'

More rigorous cleaning and formatting steps have been applied to process and unify all the names.

Code:

```python
def background_formatter(background_string: str) -> str:
    """
    Normalizes the formatting of the input background string
    by ensuring that all the name subparts are capitalized and separated by '-'.
    """
    background_list = [sub.strip("-. ").split("-") for sub in background_string.split()]
    return "-".join([s.strip("-. ").lower().capitalize() for sub in background_list for s in sub])


data["Ethnic background"] = data["Ethnic background"].apply(lambda bkg: bkg.strip(". "))\
                                                     .apply(background_formatter)\
                                                     .apply(lambda bkg: bkg.replace("&", "And"))
```

### Makeup

Similarly to `Gender` and `Skin tone`, the inconsistent values have been normalized to a single spelling.

('no makeup', 'No makeup') -> 'no makeup'

('light eye makeup', 'light eye-makeup') -> 'light eye makeup'

Code:

```python
data["Makeup"] = data["Makeup"].apply(lambda makeup: makeup.strip().lower())\
                               .apply(lambda makeup: makeup.replace("-", " "))
```

### Dominant eye

In the same way, the inconsistent value names have been fixed.

('right', 'Right') -> 'right'

Code:

```python
# The .str accessor leaves missing values (NaN) untouched, so no explicit NaN check is needed.
data["Dominant eye"] = data["Dominant eye"].str.strip().str.lower()
```

### Eye tracking error

The eye tracking errors of `Left Eye` and `Right Eye` are statistically comparable and similarly distributed. The distribution is asymmetrical and skewed, with a noticeable tail and a considerable number of outliers. In particular, one value from `Left Eye` is very extreme (9.5 degrees) and is clearly visible in the plot below. We have decided to remove this data point from the analysis.

<div align="center">
  <img src="./media/eye-tracking-error-boxplot.png" width="40%">
</div>
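The removal of the extreme point can be sketched as follows. The report does not show how the point was dropped, so this is only one possible way: the toy values are hypothetical, only the column names `Left Eye` and `Right Eye` come from the dataset, and the sketch assumes the extreme value is the maximum of `Left Eye`.

```python
import pandas as pd

# Toy data (hypothetical values); only the column names follow the report.
toy_errors = pd.DataFrame({
    "Left Eye": [0.4, 0.6, 9.5, 0.8],
    "Right Eye": [0.5, 0.7, 0.9, 0.6],
})

# Locate the single most extreme left-eye measurement and drop that row.
extreme_idx = toy_errors["Left Eye"].idxmax()
cleaned_errors = toy_errors.drop(index=extreme_idx)

print(cleaned_errors)
```

Dropping by index removes the whole row, so the right-eye measurement of the same subject is excluded as well.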

## Data Visualization and Analysis

In the analysis, we put more focus on the physical attributes of the eyewear rather than the physical attributes of individual subjects. Below we discuss the relationship between the eye tracking error and the following attributes:

* Makeup

* Eyewear type

* Eyewear frame type

* Eyewear frame thickness

* Eyewear lens type

* Eyewear lens height

* Eyewear coating type

All the above attributes represent categorical variables with multiple groups in each category. This limits the methods for quantifying the correlation between those attributes and the eye tracking error. Therefore, we opt for visual estimation and hypothesis testing.

For the visual assessment, we produce a bar plot containing the average error and its 95% confidence interval for each group of the selected categorical attribute. We do not compute the confidence interval when the number of samples is less than 5. This kind of visualization helps to assess and interpret the statistical significance of the observed difference in the error between groups by looking at the overlap between the confidence intervals.
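The per-group statistics behind such a bar plot can be sketched as below. The report does not state how the confidence interval is computed, so the Student's t interval used here is an assumption, and the helper name `mean_with_ci` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def mean_with_ci(
    errors: pd.Series, confidence: float = 0.95, min_samples: int = 5
) -> tuple[float, float, float]:
    """Return (mean, lower, upper); the bounds are NaN for groups smaller than min_samples."""
    values = errors.dropna().to_numpy(dtype=float)
    mean = values.mean()
    if len(values) < min_samples:
        return mean, np.nan, np.nan
    # Half-width of the t-based interval: t quantile times the standard error of the mean.
    t_quantile = stats.t.ppf((1 + confidence) / 2, df=len(values) - 1)
    half_width = t_quantile * stats.sem(values)
    return mean, mean - half_width, mean + half_width


# Toy data (hypothetical values); the column and group names follow the report.
toy = pd.DataFrame({
    "Makeup": ["no makeup"] * 6 + ["light makeup"] * 2,
    "Left Eye": [0.5, 0.7, 0.6, 0.9, 0.4, 0.8, 0.3, 0.5],
})
summary = {name: mean_with_ci(group) for name, group in toy.groupby("Makeup")["Left Eye"]}
print(summary)
```

A bootstrap interval would be a reasonable alternative for a skewed error distribution; the t interval is used here only because it is short to write down.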

We also use statistical hypothesis testing to complement the visualizations. Taking into account the shape of the eye tracking error distribution, we choose the **Kruskal–Wallis test** [[wiki](https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_test), [scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html#scipy.stats.kruskal)], which is a non-parametric test and is robust to outliers. This test identifies whether 2 or more independent samples come from the same distribution. The null hypothesis states that all the groups of the selected category have the same median. The alternative hypothesis is that the median of at least one group differs from the rest. We reject the null hypothesis at the 5% significance level. Each group must have at least 5 measurements.

```python
import numpy as np
from scipy import stats


def kruskal_wallis_test(
    groups: list[np.ndarray],
    alpha: float = 0.05
) -> tuple[float, str]:
    """
    source: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html#scipy.stats.kruskal

    Compute the Kruskal-Wallis H-test for independent groups.
    The test works on 2 or more independent groups, which may have different sizes.

    H0: the medians of all the groups are equal

    H1: at least one median of one group is different from the median of at least one other group

    Returns:
    -------
    pvalue: float
        the p-value for the test using the assumption that H has a chi square distribution

    res: str
        the test outcome saying whether to reject H0 given the significance level (alpha)
    
    """
    # The test is undefined for fewer than two groups.
    if len(groups) < 2:
        return (np.nan, "Test cannot be computed! The number of input groups must be at least 2.")
    
    pvalue = stats.kruskal(*groups).pvalue
    res = "reject H0" if pvalue < alpha else "fail to reject H0"

    return (pvalue, res)
```
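The requirement of at least 5 measurements per group is not enforced inside the function above, so the groups have to be filtered before the test is run. A minimal sketch of that step is given below; the helper name `groups_for_test`, the level name `Full-rim`, and the toy values are hypothetical, and the sketch calls `scipy.stats.kruskal` directly to stay self-contained.

```python
import pandas as pd
from scipy import stats


def groups_for_test(
    data: pd.DataFrame, category: str, error_column: str, min_size: int = 5
) -> list:
    """Collect error values per category level, skipping levels with too few measurements."""
    subset = data[[category, error_column]].dropna()
    return [
        group.to_numpy()
        for _, group in subset.groupby(category)[error_column]
        if len(group) >= min_size
    ]


# Toy data (hypothetical values and a hypothetical "Full-rim" level).
toy_frames = pd.DataFrame({
    "Eyewear frame type": ["Full-rim"] * 6 + ["Semi-rimless"] * 5 + ["Rimless"] * 3,
    "Left Eye": [0.5, 0.7, 0.6, 0.9, 0.4, 0.8, 0.6, 0.5, 1.1, 0.7, 0.9, 0.3, 1.2, 0.8],
})

# "Rimless" has only 3 measurements and is therefore left out of the test.
test_groups = groups_for_test(toy_frames, "Eyewear frame type", "Left Eye")
test_pvalue = stats.kruskal(*test_groups).pvalue
print(len(test_groups), test_pvalue)
```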

## Observation 1: the presence of makeup relates to a smaller tracking error for the left eye.

We have created a dummy binary category named `is_makeup` to analyse the relationship between the tracking error and the presence of makeup.

```python
data["is_makeup"] = (data["Makeup"] != "no makeup").map({True: "Yes", False: "No"})
```

Looking at the barplot below, we observe that the average eye tracking error is smaller when some makeup is present, although this effect is not as strong for the right eye.

The results of the **Kruskal–Wallis test** further confirm the statistical significance of the impact of the presence of makeup on the `left eye tracking error` (pvalue = 0.022), but not on the `right eye tracking error` (pvalue = 0.447).

<div align="center">
  <img src="./media/is-makeup-barplot.png" width="100%">
</div>

We have produced an extra barplot of a similar kind showing all the values of the `Makeup` attribute. No conclusion can be drawn about the effect of `light makeup` and `eyelash extensions` on the eye tracking error because of the scarcity of error measurements under these two conditions (`light makeup`: 2, `eyelash extensions`: 3). For the same reason, these two groups cannot provide reliable results for the **Kruskal–Wallis test**.

Evidently, the outcome of the previous statistical test is mainly explained by the difference in the eye tracking error between the following two conditions: `no makeup` and `light eye makeup`. This is supported by the **Kruskal–Wallis test**: left eye (pvalue = 0.01) and right eye (pvalue = 0.259).

<div align="center">
  <img src="./media/makeup-barplot.png" width="100%">
</div>


Summary:

* the presence of makeup (specifically `light eye makeup`) is associated with a smaller left eye tracking error;

* the presence of makeup does not produce significantly different eye tracking error for the right eye;

* the inconsistent eye tracking error in the presence of `light makeup` and `eyelash extensions` requires collecting more measurements to make conclusions about the effect of these two conditions.

## Observation 2: no statistically significant effect of the presence of eyewear on the eye tracking error is found.

Similarly to `Makeup`, we have assessed whether the presence of eyewear affects the eye tracking error. For this purpose, a dummy binary category called `is_eyewear` has been produced.

```python
data["is_eyewear"] = (data["Eyewear type"] != "No-eyewear").map({True: "Yes", False: "No"})
```

The barplot shows that the average eye tracking error is bigger in the presence of eyewear, although the observed difference seems insignificant judging by the amount of overlap between the confidence intervals. The **Kruskal–Wallis test** does not provide sufficient evidence of a significant difference between the two groups in the eye tracking error of either eye: left eye (pvalue = 0.176) and right eye (pvalue = 0.141).

<div align="center">
  <img src="./media/is-eyewear-barplot.png" width="100%">
</div>

Visualizing the average eye tracking error and its 95% confidence interval for each value of the `Eyewear type` attribute reinforces the above findings. The outcome of the **Kruskal–Wallis test** on all three groups (`No-eyewear`, `Eyeglasses`, `Contacts`) is the same: left eye (pvalue = 0.32) and right eye (pvalue = 0.321).

<div align="center">
  <img src="./media/eyewear-type-barplot.png" width="100%">
</div>

Summary:

* no statistically significant effect of the presence of eyewear on the eye tracking error is found.

## Observation 3: the eyewear frame type has no substantial effect on the eye tracking error, but a few frame types require further investigation with more measurements.

We do not observe a considerable difference in the eye tracking error between eyewear frame types in the barplot below. The conclusion about the `Rimless` frame type is less robust because of the small number of observations (only 3 are available). Additionally, the 95% confidence interval for the `Semi-rimless` frame type, based on 10 observations, is too wide to provide a reliable estimate.

The result of the **Kruskal–Wallis test** (without the `Rimless` group) supports the visualization analysis: left eye (pvalue = 0.729) and right eye (pvalue = 0.777).

<div align="center">
  <img src="./media/eyewear-frame-type-barplot.png" width="100%">
</div>

Summary:

* there is no statistically significant difference in the eye tracking error between different eyewear frame types;

* more observations are needed to assess the effect of `Semi-rimless` and `Rimless` eyewear frame types with greater confidence.

## Observation 4: the eye tracking error is not affected by the eyewear frame thickness, but eyewear with thick frame occasionally produces extreme error values and more observations are required for further analysis.

Similarly to `Eyewear frame type`, no significant difference in the eye tracking error between different values of `Eyewear frame thickness` is visually evident (see the barplot below). The wide 95% confidence interval of the average eye tracking error for the `Thick` value is explained by the presence of several outliers (ID0544, ID0632, ID0671) and a small number of measurements (13). The eye tracking error for the `No-frame` value (i.e. the `Rimless` eyewear frame type) cannot be reliably assessed and compared to other groups.

The result of the **Kruskal–Wallis test** (without the `No-frame` group) also confirms that there is no statistically significant difference in the eye tracking error between different groups: left eye (pvalue = 0.538) and right eye (pvalue = 0.716).

<div align="center">
  <img src="./media/eyewear-frame-thickness-barplot.png" width="100%">
</div>

Summary:

* no statistically significant effect of eyewear frame thickness on the eye tracking error is found;

* more measurements of the eye tracking error in the `Thick` group would allow a more reliable assessment of its effect on the eye tracking error;

* the same goes for the effect of `No-frame` / `Rimless` group.

## Observation 5: the eyewear lens type generally does not introduce a statistically significant difference in the eye tracking error, but the Trifocal lens group needs more measurements for a reliable estimation of its effect on the eye tracking error.

The barplot below shows that the average eye tracking error decreases from `Single-vision` to `Bifocals` to `Trifocal` lens types, although the overlap between the 95% confidence intervals of the average error for the different groups suggests no significant difference. Only one observation of the eye tracking error in the `Trifocal` group is available, which is inadequate to draw a conclusion about its effect.

No evidence of a significant relationship between the eye tracking error and `Eyewear lens type` is found when performing the **Kruskal–Wallis test** (without the `Trifocal` group): left eye (pvalue = 0.132) and right eye (pvalue = 0.160).

<div align="center">
  <img src="./media/eyewear-lens-type-barplot.png" width="100%">
</div>


Summary:

* the eyewear lens type has no significant influence on the eye tracking error;

* no conclusion about the effect of the `Trifocal` lens type on the eye tracking error can be drawn; more measurements are needed.

## Observation 6: the eye tracking error does not have a statistically significant difference caused by eyewear lens height.

Similarly to `Eyewear lens type`, we do not observe a considerable difference in the eye tracking error between different `Eyewear lens height` values. The outcome of the **Kruskal–Wallis test** is also similar: left eye (pvalue = 0.827) and right eye (pvalue = 0.912).

The size of 95% confidence interval of the right eye tracking error of `High` group is affected by one outlier (ID0552).

<div align="center">
  <img src="./media/eyewear-lens-height-barplot.png" width="100%">
</div>


Summary:

* the difference in the eye tracking error between different eyewear lens heights is not statistically significant.

## Observation 7: none of the eyewear coating types causes a statistically significant difference in the eye tracking error, but the effect of several coating types needs to be estimated more accurately by collecting more measurements.

Overall, the eye tracking error does not significantly differ between the eyewear coating types, which is confirmed by the visualization below and the results of the **Kruskal–Wallis test**.

No conclusion about the effect of the `anti-fog` eyewear coating type on the eye tracking error can be made, because no data points with this coating type are available.

Similarly, the evidence on whether the `anti-smudge` coating type affects the eye tracking error is limited and cannot be estimated reliably because of the lack of measurements (anti-smudge (yes): 3). For the same reason, the 95% confidence interval of the average eye tracking error for this group cannot be calculated.

The measurements of the right eye tracking error in the presence of the `blue-filter` coating type contain several outliers (ID0552, ID0667), which affect the size of its 95% confidence interval. The limited number of observations in this group (12) is not sufficient for a reliable estimation.

<div align="center">
  <img src="./media/eyewear-coating-type-barplot.png" width="100%">
</div>

The table below shows the outcome of the **Kruskal–Wallis test** applied to each eyewear coating type separately.

| Eyewear coating type   | Left Eye Error, p-value | Right Eye Error, p-value  |
| :--------------------- | :------------: | :-------------: |
| Anti-reflective        |   0.396        |  0.910          |
| Anti-scratch           |   0.581        |  0.988          |
| Anti-fog               |  not available | not available   |
| Anti-UV                |  0.422         | 0.925           |
| Anti-smudge            |  not available | not available   |
| Blue-filter            |  0.712         | 0.973           |


Summary:

* no support for a statistically significant difference in the eye tracking error between the eyewear coating types is found;

* the effect of `anti-smudge` and `blue-filter` eyewear coating types on the eye tracking error needs to be verified with more measurements;

* the relationship between the eye tracking error and the `anti-fog` eyewear coating type is unknown; a separate data collection round is needed.

## Extra: Machine Learning approach to correlation estimation between the eye tracking error and the eyewear physical attributes (quick and dirty)

One way to quantify the effect of the categorical attributes on the eye tracking error is to train a Machine Learning model (e.g. Random Forest) that uses those attributes to predict the eye tracking error and then to estimate the contribution of every attribute to the model performance.

The permutation importance of a feature measures how much the model performance depends on that feature: the values of the feature are randomly permuted and the resulting change in the model score is recorded [scikit-learn].

We choose all the attributes from the above analysis, except `Eyewear coating type`, which contains many missing values. All attributes are one-hot-encoded.

```python
from sklearn.preprocessing import OneHotEncoder

features = [
    'Makeup',
    'Eyewear type',
    'Eyewear frame type',
    'Eyewear frame thickness',
    'Eyewear lens type',
    'Eyewear lens height',
]
X = data[features]

encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
encoder.set_output(transform="pandas")

X_new = encoder.fit_transform(X)
new_features = X_new.columns
```

RandomForestRegressor from scikit-learn is selected as a model for the eye tracking error prediction. We fit a separate model for the left eye error and the right eye error. We use the out-of-the-box model with the default hyper-parameters.

```python
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(random_state=12345)
```

For the permutation importance algorithm, we use $R^{2}$ as the scoring function and compute the importance score of each feature from 100 random permutations.

```python
from sklearn.inspection import permutation_importance

res = permutation_importance(
    model,
    X_new,
    target,
    scoring="r2",
    n_repeats=100,
    random_state=12345,
)
```

The trained Random Forest model accounts for only slightly more than 10% of the variation in the eye tracking error: $R^{2}$(left eye error) = 13.34% and $R^{2}$(right eye error) = 10.17%. This suggests that the feature importance scores estimated from this model are not reliable.

The figure below shows the obtained permutation importance scores of the features and is consistent with this expectation. We can see that permuting each of the features has a negligible effect on the model performance. For both the left eye error and the right eye error, the `Single-vision` eyewear lens type has the highest importance score.

The results of the Machine Learning approach are consistent with the findings from the data visualization and the statistical testing discussed previously.
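The snippets above do not show how the model is fitted or how $R^{2}$ is obtained, and the report does not state whether $R^{2}$ was measured on the fitting data or on held-out data. The sketch below illustrates both options on synthetic data; the feature names, the synthetic target, and the reduced `n_estimators` and `n_repeats` values are hypothetical choices made to keep the example fast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(12345)

# Synthetic stand-in for the one-hot-encoded attributes and the error target (hypothetical).
feature_names = ["feature_a", "feature_b", "feature_c", "feature_d"]
X_toy = pd.DataFrame(rng.integers(0, 2, size=(120, 4)), columns=feature_names)
y_toy = 1.0 * X_toy["feature_a"] + rng.normal(scale=0.3, size=120)

toy_model = RandomForestRegressor(n_estimators=50, random_state=12345)
toy_model.fit(X_toy, y_toy)

# R^2 on the data used for fitting (optimistic) versus cross-validated R^2 (held-out).
r2_in_sample = toy_model.score(X_toy, y_toy)
r2_cv = cross_val_score(toy_model, X_toy, y_toy, scoring="r2", cv=5).mean()

# Permutation importance of each feature, sorted from most to least important.
toy_res = permutation_importance(
    toy_model, X_toy, y_toy, scoring="r2", n_repeats=20, random_state=12345
)
ranking = pd.Series(toy_res.importances_mean, index=X_toy.columns).sort_values(ascending=False)
print(r2_in_sample, r2_cv)
print(ranking)
```

Comparing the two $R^{2}$ values gives a rough indication of overfitting, which matters when importance scores are read off a model with low explanatory power.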

<div align="center">
  <img src="./media/permutation-importance-scores-of-attributes.png" width="100%">
</div>