# Facial Recognition 
#### *Jordan Mehravar, Matthew Clappison, Karisa Parkington*


# **Agenda**
* *Introduction*
* *Demonstration*
* *Current Models*

# **Facial Recognition**
* Human face is like a complex fingerprint
* Evolutionary Purpose
* Humans vs. Machines
* Complexity and robustness of the human visual-recognition system
  * Fusiform face area
  * Speed at which information is processed
  * Combines many mental processes such as **perception, memory and judgment**
  * Problems that impact the accuracy of facial recognition
   * **Pose**
   * **Age**
   * **Lighting**
   * **Emotional State**
* Applications
 

# Background 
#### <u>Fovea</u>- The fovea is a portion of the retina where the visual resolution is highest. The center of your vision is where the fovea is focused.
![alt text](https://i.imgur.com/2B8kEdA.png width=100)
#### <u>Saccade</u>- A rapid eye movement between two points. During this time the visual system is not able to take in information. The saccade reaches speeds of 900 degrees/second.

# Original Work- *How does the human visual recognition system work?*


## Nigel D. Haig (1985) 
**How faces differ: a new comparative technique - Where do humans look to accurately identify faces?**
* Designed a program (before the time of eye-tracking software) to determine the location of the optimal fixation point
* Humans show a strong preference for the eyes, eyebrows, upper lip and mouth area, with the majority of the focus on the eyes


# Demonstration


## Schyns et al. (2002)
**Understanding recognition from the use of visual information**
* Determined how much information is required to reach a correct conclusion 75% of the time across three tasks: identity, gender and expressiveness
* Demonstrated that different recognition tasks benefit from different sets of data (e.g., more focus on the mouth when identifying gender)
![alt text](https://www.researchgate.net/profile/Caroline_Blais/publication/45440659/figure/fig2/AS:340928754667529@1458295157799/Visual-information-used-effectively-to-identify-faces-a-in-our-study-B-in-Schyns-et.png)

# Peterson & Eckstein (2012, 2013) 

![alt text](https://i.imgur.com/sCjMk9Z.png)

## Spatially Variant Contrast-Sensitivity Function (SVCSF) 

A computational model which simulates human vision.
<br> <br>

![alt text](http://home.deib.polimi.it/boracchi/Projects/Foveation/Lena_Foveated.png)

<br>
Each face image is divided into bins, and a contrast-sensitivity function is applied to each bin.
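
As a rough illustration of the binning step, the sketch below computes each pixel's eccentricity and direction relative to a fixation point and groups pixels into concentric rings. The slides do not say how the bins are defined, so the ring layout, the function name `eccentricity_bins`, and the `pixels_per_degree` and `bin_width_deg` parameters are all assumptions made for this example.

```python
import numpy as np

def eccentricity_bins(shape, fixation, pixels_per_degree, bin_width_deg=1.0):
    """Return per-pixel eccentricity r (degrees), direction theta (radians),
    and an integer ring index, all relative to the fixation (row, col).

    Assumption: bins are concentric rings of equal width in degrees.
    """
    rows, cols = np.indices(shape)
    dy = (rows - fixation[0]) / pixels_per_degree
    dx = (cols - fixation[1]) / pixels_per_degree
    r = np.hypot(dx, dy)
    # Image rows grow downward, so flip dy to make "up" a positive angle.
    theta = np.arctan2(-dy, dx)
    bins = np.floor(r / bin_width_deg).astype(int)
    return r, theta, bins
```

Moving the fixation changes `r` and `theta` for every pixel, which is why those two quantities change with each fixation while the image content itself does not.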

## Spatially Variant Contrast-Sensitivity Function (SVCSF)
<br>
# $$SVCSF (f,r,\theta) = c_0f^{a_0}e^{-b_0f - d_0(\theta)rf}$$
<br>
**$a_0, b_0, c_0$** = predetermined constants <br>
*Bandpass filter of basic vision spatial frequencies <br>
Chosen to achieve a peak contrast sensitivity of 1 and a foveal peak frequency of 4 cycles/degree* 
<br><br>
**$f$** = image spatial frequency (cycles/degree) --> variable <br>
*Changes with face image presentation, but not across fixations*
<br><br>
**$d_0$** = eccentricity factor --> constant (0.3-0.5) <br> *Exponential decay rate of the visual percept across the human visual field*
<br><br>
**$r$** = eccentricity from fovea (degree of visual angle) --> variable <br>
*Changes with each fixation*
<br><br>
**$\theta$** = coordinate direction from fixation --> vector/matrix <br>
*Rotational angle from horizontal axis which can have horizontal, upward, and downward direction <br>
Changes with each fixation*
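
The formula can be tried out numerically with a minimal sketch like the one below. The slides give no values for the constants, so `A0 = 1.2` is an illustrative assumption; `B0` and `C0` are then derived from the two stated constraints (a foveal peak at a spatial frequency of 4 and a peak sensitivity of 1). The direction dependence $d_0(\theta)$ is reduced to a single number `d0` passed by the caller.

```python
import numpy as np

A0 = 1.2                 # illustrative assumption, not taken from the slides
PEAK_F = 4.0             # foveal peak spatial frequency stated on the slide
B0 = A0 / PEAK_F         # f^a0 * exp(-b0*f) peaks at f = a0 / b0
C0 = 1.0 / (PEAK_F**A0 * np.exp(-A0))  # normalizes the foveal peak to 1

def svcsf(f, r, d0=0.4):
    """Contrast sensitivity at spatial frequency f and eccentricity r (degrees).

    d0 stands in for d_0(theta); 0.4 is an arbitrary value from the
    0.3-0.5 range quoted on the slide.
    """
    f = np.asarray(f, dtype=float)
    return C0 * f**A0 * np.exp(-B0 * f - d0 * r * f)
```

Setting the derivative of the exponent to zero gives a peak frequency of $a_0 / (b_0 + d_0 r)$, so under these assumed constants the preferred frequency drops from 4 at the fovea to about 0.52 at $r = 5$ degrees: the periphery only carries coarse detail.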

## Foveated Ideal Observer (FIO)
<br>
A computational model which determines the identity of a face, given visual fixation information. <br> <br>
Based on trained data, what is the likelihood that face *i* is shown, given fixation *k*? <br>
*Supervised learning* <br><br>
Input face signal = filtered underlying signal ($\textbf{s}_0$ - based on SVCSF) + Gaussian noise ($\textbf{n}_{ex}$) + unfiltered internal noise ($\textbf{n}_{in}$)

![alt text](https://i.imgur.com/2TBHtFN.png)
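
The signal model above can be simulated with a short sketch. As a deliberate simplification, the spatially variant filter is approximated here by one frequency-domain gain mask per fixation; the names `apply_filter` and `observed_signal`, and the noise parameters `sigma_ex` and `sigma_in`, are hypothetical and chosen only for this example.

```python
import numpy as np

def apply_filter(image, gain):
    """Filter an image by multiplying its 2-D spectrum with a gain mask."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * gain))

def observed_signal(s0, gain, sigma_ex, sigma_in, rng):
    """Simulate filter(s0 + n_ex) + n_in for one fixation.

    s0       : underlying face image (2-D array)
    gain     : frequency-domain gain mask standing in for the SVCSF
    sigma_ex : std. dev. of the external Gaussian noise (filtered)
    sigma_in : std. dev. of the internal noise (added after filtering)
    """
    n_ex = rng.normal(0.0, sigma_ex, s0.shape)
    n_in = rng.normal(0.0, sigma_in, s0.shape)
    return apply_filter(s0 + n_ex, gain) + n_in
```

The external noise passes through the filter together with the face, while the internal noise is added afterwards, which matches the "filtered" versus "unfiltered" distinction in the signal model.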

## Foveated Ideal Observer (FIO)
<br>
# $$\ell_{i,k} = e^{-\frac{1}{2}{{(\textbf{r}_k - {\boldsymbol\mu}_{i,k})}}^T{\Sigma}^{-1}_k{(\textbf{r}_k - {\boldsymbol\mu_{i,k}})}}$$
<br><br>
$\ell_{i,k}$ = likelihood that face $i$ is presented <br>
In the end, the FIO will take the maximum likelihood (i.e., the face that is most likely to be presented given all fixation information) <br><br>
$\textbf{r}_k$ = dot product of template responses --> vector <br>
*Filtered underlying signal and filtered noise-free templates* <br>
$$\textbf{r}_k = [\textbf{SVCSF}_k\textbf{s}_1,...,\textbf{SVCSF}_k\textbf{s}_n]^T(\textbf{SVCSF}_k(\textbf{s}_0 + \textbf{n}_{ex}) + \textbf{n}_{in})$$
<br><br>
*$\boldsymbol\mu_{i,k}$* = expected mean response --> vector <br> 
*Assuming face $i$ is presented* <br>
$${\boldsymbol\mu}_{i,k} = E[\textbf{r}_k \mid \text{face } i]$$ <br>
where E[...] indicates the expected value operator
<br><br>
Note the transposition (*T*) components. <br>
*Vector multiplication*
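
To make the decision rule concrete, here is a minimal sketch of the likelihood computation for a single fixation $k$. It assumes the filtered templates, the mean responses $\boldsymbol\mu_{i,k}$ and the covariance $\Sigma_k$ are already available from training; the function names and array layouts are assumptions for this example, not the authors' implementation.

```python
import numpy as np

def template_responses(filtered_templates, observed):
    """r_k: dot product of each filtered template with the observed signal.

    filtered_templates : array of shape (n_faces, H, W)
    observed           : array of shape (H, W)
    """
    n_faces = filtered_templates.shape[0]
    return filtered_templates.reshape(n_faces, -1) @ observed.ravel()

def fio_likelihoods(r_k, mu_k, sigma_k):
    """Likelihood of each candidate face for one fixation.

    r_k     : response vector, shape (n,)
    mu_k    : expected responses, one row per candidate face, shape (n_faces, n)
    sigma_k : response covariance, shape (n, n)
    """
    diff = r_k - mu_k                                # (r_k - mu_{i,k}) per face
    solved = np.linalg.solve(sigma_k, diff.T).T      # Sigma_k^{-1} (r_k - mu_{i,k})
    quad = np.einsum("ij,ij->i", diff, solved)       # quadratic form per face
    return np.exp(-0.5 * quad)

def fio_decide(r_k, mu_k, sigma_k):
    """Index of the face with the maximum likelihood."""
    return int(np.argmax(fio_likelihoods(r_k, mu_k, sigma_k)))
```

Using `np.linalg.solve` avoids forming $\Sigma_k^{-1}$ explicitly, and the `einsum` call carries out the transposed vector product from the equation for all candidate faces at once.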

# Research Implications
<br>
Classifying and recognizing identity under different conditions (e.g., facial expression, gaze direction, gender, ethnicity, occluded/missing features)
<br><br>
Quantifying individual differences in face recognition performance and fixation viewing strategies
<br><br>
Quantifying face recognition abilities and fixation viewing strategies in clinical populations who show face recognition impairments (e.g., prosopagnosia/faceblindness, autism spectrum disorder)

![alt text](https://i.imgur.com/BYYCZeD.png)

Klin et al. (2002); Pelphrey et al. (2002)

![alt text](https://i.imgur.com/O0TiF7s.png)

![alt text](https://i.imgur.com/lvkQg01.png)