## Similar articles:

1. [Emotion Recognition for Partial Faces Using a Feature Vector Technique](https://www.mdpi.com/1424-8220/22/12/4633)  
**Summary:** This study addresses facial emotion recognition from partially occluded faces due to masks during the COVID-19 pandemic. The authors propose a three-step method: (1) synthetically masking input images to retain only the upper face region (eyes, eyebrows, part of the nose, and forehead), (2) extracting features using a novel rapid landmark detection technique based on an “infinity shape” model combined with Histogram of Oriented Gradients (HOG), and (3) classifying emotions using a hybrid CNN-LSTM architecture. They evaluated their approach on the CK+ and RAF-DB datasets, achieving high accuracies of 99.30% and 95.58%, respectively, outperforming existing state-of-the-art methods. The core focus was on recognizing emotions using only the visible upper facial region when the lower face is masked.

2. [Mapping the emotional face. How individual face parts contribute to successful emotion recognition](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0177239)  
**Summary:** This study investigates which specific facial features human observers rely on to recognize basic emotional expressions (based on Ekman’s model). Using a dynamic masking technique with 48 movable tiles, participants revealed parts of a face until they could identify the emotion, allowing researchers to quantify the diagnostic value of each facial region. Results showed that the eyes and mouth were the most critical areas overall, with sadness and fear primarily recognized via the eyes, and disgust and happiness via the mouth. The most informative regions aligned with known Facial Action Coding System (FACS) action units. A similarity analysis revealed that expressions clustered by emotion rather than low-level visual features, and that reliance on eyes versus mouth structured a continuous psychological space of emotion recognition.

3. [Staged transfer learning for multi-label half-face emotion recognition](https://link.springer.com/article/10.1186/s44147-025-00615-x)  
**Summary:** This study proposes a deep learning approach for recognizing emotions from only half of the human face, introducing EMOFACE—a new dataset with 25 emotion labels for multi-label half-facial emotion classification—and combining it with the FER2013 dataset. Using a staged transfer learning framework with a custom ConvNet and five pre-trained models (VGG16, VGG19, DenseNet, MobileNet, ResNet), the method achieves high performance, reporting average binary accuracies of 0.9244 (training), 0.9152 (validation), and 0.9138 (testing). The research focuses on enabling accurate multi-label emotion recognition from partial facial information, with applications in affective computing, healthcare, robotics, and human–computer interaction.

4. [Experiments on Deep Face Recognition Using Partial Faces](https://ieeexplore.ieee.org/abstract/document/8590066)  
**Summary:** This study evaluates face recognition (not emotion recognition) using partial facial regions (eyes, mouth, nose, forehead, and cheeks) with a CNN architecture based on the pre-trained VGG-Face model and two classifiers (cosine similarity and linear SVM) on the FEI dataset (200 subjects). It reports recognition rates (e.g., cheeks: 15%; half or 3/4 face: ~100%).

Room for contribution beyond these articles:

* Use multiple modern models (e.g., ViT, EfficientNet, ensemble methods).
* Test on diverse datasets (including in-the-wild ones like RAF-DB or AffectNet).
* Include fine-grained analysis per emotion (not just overall accuracy).
* Compare human vs. model reliance on regions.

# Idea analysis: 

People *have* explored parts of this idea, but there is still room for a clear, systematic benchmark comparing **modern CV models** on *isolated* facial parts such as “eyes only”, “mouth only”, or “cheeks+nose”. Below are the most directly relevant papers, what they did, and what gaps you could exploit in your study.

---

# Directly relevant papers (short list + why each matters)

1. **“Mapping the emotional face. How individual face parts contribute to successful emotion recognition” — Wegrzyn et al., PLOS ONE (2017).**
   Human perceptual study that maps which facial regions humans rely on for each basic emotion (eyes & mouth particularly important). Good baseline for comparing human vs model diagnostic regions. ([PMC][1])  

2. **“Facial Expression Analysis under Partial Occlusion: A Survey” — Zhang et al., arXiv (2018).**
   Survey of methods handling occluded/partial faces in FER (classic & deep approaches), references many component/occlusion studies and finds occluded eyes often hurt accuracy most. Useful for method background and prior evaluations of partial-face approaches. ([arXiv][2])

3. **“Facial expression recognition on partially occluded faces” — Bellamkonda et al. (2022 / PMC).**
   Implements a component-based approach splitting the face into parts (eyebrows, eyes, nose, cheeks, mouth) and tests recognition under occlusion (reports per-component performance). Useful as a near-direct comparison to your idea but mostly focuses on robustness to occlusion rather than systematic “parts-only” benchmarking with modern backbones. ([PMC][3])

4. **“Effects of diagnostic regions on facial emotion recognition” — Kim et al., Frontiers in Psychology (2022).**
   Psychology study looking at which facial regions are diagnostic for specific emotions (eyes for anger/fear/sadness; mouth for happiness/disgust). Good for human-ground-truth hypotheses to test against model behavior. ([Frontiers][4])

5. **Component / ensemble CNN works and component ablation studies (e.g., CES-CNN and related component-based FER papers, 2021–2023).**
   These train CNN ensembles on face components or test them under occlusion, and many report per-component ablations. They are close to what you want, but they often either use older CNNs or focus on robustness to occlusion rather than clean “eyes-only vs mouth-only vs full-face” benchmarking with modern architectures. ([ResearchGate][5])

6. **Recent human/biometric analyses (2024–2025) & mask-related emotion work.**
   Several recent studies examine how masking/occlusion (e.g., COVID mask studies) shifted how people rely on eyes vs mouth; these are relevant if you want to model real-world masked inputs or temporal shifts in reliance. ([PubMed][6])

---

# What these papers *do* — and what they don’t fully answer

* **What they do:** map human diagnostic regions (eyes, mouth, eyebrows), study robustness of FER systems to occlusion, and build component-based CNN ensembles that handle missing parts. Several papers report per-part accuracy or occlusion ablation. ([PMC][1])
* **What’s missing / open:** a tightly controlled, modern benchmark that:

  * Evaluates *state-of-the-art deep models* (ResNet/ViT, modern FER-specific networks) **on isolated parts** (e.g., training and/or testing with only eyes / only mouth / cheeks+nose), using consistent protocols and large in-the-wild datasets (AffectNet, RAF-DB, etc.);
  * Compares model attention/feature maps with human diagnostic regions (e.g., eye-tracking or Wegrzyn-style maps) using explainability tools (Grad-CAM, attention maps) in a standardized way;
  * Provides per-part confusion matrices and statistical tests across architectures/datasets to establish whether some emotions are inherently more recoverable from specific parts for machines vs humans.

Several component papers touch on these ideas, but they usually rely on older models, use limited datasets, or focus on occlusion robustness rather than a systematic “parts-only” benchmarking study. ([ResearchGate][5])
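
The per-part confusion matrices mentioned above can be sketched as follows. This is a minimal illustration, not a prescribed evaluation protocol: the part names (`eyes_only`, `mouth_only`), the three-class toy labels, and the helper names `confusion_matrix` and `per_class_recall` are all assumptions made for the example.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with true labels on rows and predicted labels on columns."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when (true, pred) pairs repeat.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def per_class_recall(cm):
    """Fraction of each true class predicted correctly (NaN if the class is absent)."""
    support = cm.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(support > 0, np.diag(cm) / support, np.nan)

# Toy data (hypothetical): 3 emotion classes, predictions from two parts-only models.
y_true = [0, 0, 1, 1, 2, 2]
preds_by_part = {
    "eyes_only": [0, 0, 1, 2, 2, 2],
    "mouth_only": [0, 1, 1, 1, 2, 0],
}

for part, y_pred in preds_by_part.items():
    cm = confusion_matrix(y_true, y_pred, n_classes=3)
    print(part, per_class_recall(cm))
```

Reporting one such matrix per part and per architecture makes it possible to see which emotions collapse into which others when only a given region is visible, which a single overall accuracy number hides.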

---

# Suggested reading order (quick picks to download and read first)

1. Wegrzyn et al., PLOS ONE (2017). ([PMC][1])
2. Zhang et al., *Facial Expression Analysis under Partial Occlusion* (survey, arXiv 2018). ([arXiv][2])
3. Bellamkonda et al. (2022) component occlusion study. ([PMC][3])
4. Kim et al., Frontiers (2022) diagnostic regions. ([Frontiers][4])
5. One or two component-CNN papers (CES-CNN / recent ensemble works) to see implementation choices. ([ResearchGate][5])

---

# Practical recommendations for your paper (design & novelty)

If you want this to be a novel, publication-worthy empirical paper, consider the following design and contributions:

1. **Benchmark scope (novelty):**

   * Evaluate **several modern backbones** (CNNs like ResNet variants, and transformers like ViT or hybrid architectures) trained *from scratch* or fine-tuned on FER datasets.
   * Create controlled inputs: (a) *eyes only*, (b) *mouth only*, (c) *cheeks+nose*, (d) *eyebrows only*, (e) *full face*, (f) *random occlusions* — by cropping or masking the other regions to black/blur/average.
   * Report per-emotion accuracy, per-part confusion matrices, and statistical comparisons across models.

2. **Datasets:** use large in-the-wild datasets so results generalize: **AffectNet**, **RAF-DB**, **FER2013**, plus a posed dataset (CK+, BU-3DFE) for controlled expressions. If you test masked faces, include recent mask datasets or simulate masks (but cite mask impact studies). (You can cite dataset specifics as you use them.) ([Nature][7])

3. **Explainability / human comparison:**

   * Use Grad-CAM / Integrated Gradients / attention maps to show where models look when given full-face vs parts-only inputs. Compare these maps to human diagnostic maps (e.g., Wegrzyn 2017) — this strengthens claims about whether models mimic human reliance on eyes/mouth. ([PMC][1])

4. **Controls & evaluation:**

   * Control for face alignment and scale (use standard face landmark detection).
   * Evaluate cross-dataset generalization: train on one dataset, test on another to see if parts-generalization holds.
   * Use statistical testing (paired tests across seeds/subjects) to show significant differences between parts.

5. **Possible novel contributions:**

   * A **public benchmark and codebase** that systematically evaluates many backbones across standardized face-part splits (I suspect that doesn’t yet exist in a modern benchmark form).
   * A study of **emotion recoverability**: which emotions can modern models reliably predict from *only* eyes vs *only* mouth (with quantitative effect sizes).
   * Comparing **model vs human diagnostic reliance** using the same stimulus set and human responses.
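
The controlled inputs described in item 1 (keeping one region and masking everything else) could be produced along these lines. This is a rough sketch under stated assumptions: faces are already aligned to 224x224, regions are axis-aligned boxes, and the box coordinates in `REGIONS` are invented placeholders. In a real pipeline the boxes would be derived from detected landmarks.

```python
import numpy as np

def keep_region(image, box, fill="black"):
    """Return a copy of `image` with everything outside `box` replaced.

    image: H x W x C uint8 array of an aligned face.
    box:   (top, left, bottom, right) in pixels; bottom/right are exclusive.
    fill:  "black" or "mean" (per-channel mean of the original image).
    """
    top, left, bottom, right = box
    if fill == "black":
        out = np.zeros_like(image)
    elif fill == "mean":
        mean = image.reshape(-1, image.shape[-1]).mean(axis=0)
        out = np.broadcast_to(mean.astype(image.dtype), image.shape).copy()
    else:
        raise ValueError(f"unknown fill mode: {fill!r}")
    # Copy only the pixels of the region of interest back in.
    out[top:bottom, left:right] = image[top:bottom, left:right]
    return out

# Hypothetical boxes for a 224x224 aligned face (placeholders, not measured values).
REGIONS = {
    "eyes_only": (60, 30, 110, 194),
    "mouth_only": (150, 60, 200, 164),
}

face = np.full((224, 224, 3), 200, dtype=np.uint8)  # stand-in for a real image
eyes = keep_region(face, REGIONS["eyes_only"], fill="black")
print(eyes.shape, eyes[0, 0], eyes[80, 100])
```

Generating every condition from the same aligned image and the same box definitions keeps the comparison between parts controlled, so that differences in accuracy reflect the region rather than preprocessing.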

---

# Quick experimental pitfalls to avoid

* Don’t confuse “occluded during training” with “tested on parts-only”. Both are informative, but they answer different questions, so be explicit about which one you do. ([arXiv][2])
* Make sure the cropping/masking process doesn’t leak other cues (hair, ears) unless intended.
* Action units (FACS) are a different but related line — if you want muscle-level interpretation, consider adding FACS analyses or using datasets annotated with Action Units.
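
One way to guard against the cue-leak pitfall above is a quick sanity check that every pixel outside the intended region carries the same fill value. The sketch below assumes box-shaped regions given as `(top, left, bottom, right)`; the function name `leaks_outside` and the toy 8x8 image are illustrative only.

```python
import numpy as np

def leaks_outside(masked, box):
    """Return True if any pixel outside `box` differs from the others outside it.

    masked: H x W x C array after masking.
    box:    (top, left, bottom, right) in pixels; bottom/right are exclusive.
    """
    top, left, bottom, right = box
    outside = np.ones(masked.shape[:2], dtype=bool)
    outside[top:bottom, left:right] = False
    pixels = masked[outside]  # shape (N, C): all pixels outside the box
    if pixels.size == 0:
        return False
    # A clean mask leaves a single constant colour outside the region.
    return bool((pixels != pixels[0]).any())

# Toy example: an 8x8 image where only the box region is non-zero.
box = (2, 2, 4, 6)
clean = np.zeros((8, 8, 3), dtype=np.uint8)
clean[2:4, 2:6] = 255
print(leaks_outside(clean, box))

# Simulate a leaked cue (e.g., a strand of hair left in a corner).
leaky = clean.copy()
leaky[0, 0] = 10
print(leaks_outside(leaky, box))
```

Running a check like this over the whole masked dataset before training is cheap and catches off-by-one box errors as well as leftover context.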