# The Karolinska Directed Emotional Faces

https://www.ugent.be/pp/ekgp/en/research/research-groups/panlab/kdef/karolinska.pdf


From ChatGPT: 

The Karolinska Directed Emotional Faces (KDEF) dataset is used for studying facial expressions. The data matrix loaded below contains emotion-recognition results and ratings for each image. Each column represents the following:

1. **KDEF code**: A unique identifier for each image. Judging from the rows shown below, the code combines the model's gender (`M` or `F`), a model number, and a two-letter expression code (e.g., `M11AF` appears to be male model 11 posing `AF`, the code used for the Fearful images).
   
2. **Target emotion**: The intended or posed emotion that the subject is displaying in the image (e.g., Fearful, Happy, etc.).

3. **Most rated emotion**: The emotion most frequently recognized or rated by participants who viewed the image. This may or may not match the target emotion (e.g., the subject may be posing as "Fearful," but viewers might rate the emotion as "Surprised").

4. **Percentage (biased) hit rate**: The percentage of participants who chose the target emotion. "Biased" most likely means that this is the raw hit rate, not corrected for raters' tendency to favor some response categories over others.

5. **Mean intensity**: The average intensity rating of the emotion expressed in the image, as rated by participants. This indicates how strongly the participants felt the emotion was expressed.

6. **SD intensity**: The standard deviation of the intensity ratings. This measures the variability in participants' ratings of the emotion's intensity — a higher standard deviation suggests more variability in how different participants perceived the intensity.

7. **Mean arousal**: The average arousal rating of the emotion expressed in the image. Arousal refers to the level of excitement or activation associated with the emotion (e.g., "Fear" might generally be considered a high-arousal emotion, while "Sadness" might be low-arousal).

8. **SD arousal**: The standard deviation of the arousal ratings. This measures the variability in how different participants rated the arousal of the emotion.

9. **Unnamed: 8**: This column seems to contain missing or `NaN` values and is likely an empty or unused column in the dataset. It might have been included accidentally during data processing or export.

10. **Most rated non-target emotion(s)**: The emotion(s) that participants most frequently rated when they didn’t choose the target emotion. This column highlights which other emotions were commonly confused with the target emotion (e.g., participants might have rated "Surprised" when the target emotion was "Fearful").

11. **Percentage most rated non-target emotion(s)**: The percentage of participants who chose the most rated non-target emotion(s). This shows how often the most common confusion occurred for that image.

### Summary:
- **Target emotion** refers to the intended emotion in the image.
- **Most rated emotion** is the emotion participants thought was displayed.
- **Percentage (biased) hit rate** indicates how often participants correctly identified the target emotion.
- **Mean intensity** and **Mean arousal** measure how intense and how arousing participants perceived the expression to be.
- **SD intensity** and **SD arousal** show the variability in these ratings.
- **Most rated non-target emotion(s)** and the associated percentage highlight common misclassifications.

This dataset seems to be used for analyzing the recognition accuracy of different emotions and understanding how participants perceive emotional intensity and arousal in facial expressions.

In [5]:
import pandas as pd

file_path = 'PICS/DATAMATRIXKDEF.xls'  # Replace with your file's path
df = pd.read_excel(file_path)
df

Unnamed: 0,KDEF code,Target emotion,Most rated emotion,Percentage (biased) hit rate,Mean intensity,SD intensity,Mean arousal,SD arousal,Unnamed: 8,Most rated non-target emotion(s),Percentage most rated non-target emotion(s)
0,M11AF,Fearful,Fearful,84.375000,5.625000,1.430950,3.671875,1.480612,,Sad,6.250000
1,M25AF,Fearful,Fearful,84.126984,5.444444,1.653713,3.539683,1.847626,,Disgusted,7.936508
2,M14AF,Fearful,Fearful,79.687500,6.484375,1.458333,4.312500,1.958903,,Surprised,18.750000
3,M17AF,Fearful,Fearful,78.125000,6.671875,1.403563,4.140625,1.876058,,Angry,14.062500
4,M19AF,Fearful,Fearful,78.125000,7.546875,1.413424,4.619048,2.113158,,Surprised,14.062500
...,...,...,...,...,...,...,...,...,...,...,...
485,M26AF,Fearful,Surprised,19.047619,3.933333,1.867353,3.049180,1.774131,,Surprised,26.984127
486,F19AF,Fearful,Surprised,15.625000,6.671875,1.633525,3.888889,1.960177,,Surprised,73.437500
487,F15AF,Fearful,Surprised,9.375000,4.950820,1.764712,3.387097,1.749518,,Surprised,43.750000
488,F35AF,Fearful,Surprised,9.375000,4.875000,1.685607,3.296875,1.696679,,Surprised,67.187500
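
Before analysing the frame, it may help to tidy it. The following is a minimal sketch, assuming the column names shown in the output above. The small `sample` frame reuses four rows from that output so the snippet runs without the `.xls` file, and `matches_target` is a hypothetical helper column that is not part of the dataset.

```python
import numpy as np
import pandas as pd

# Four rows copied from the output above, so the sketch runs without the file.
sample = pd.DataFrame({
    "KDEF code": ["M11AF", "M25AF", "F19AF", "F35AF"],
    "Target emotion": ["Fearful"] * 4,
    "Most rated emotion": ["Fearful", "Fearful", "Surprised", "Surprised"],
    "Percentage (biased) hit rate": [84.375, 84.126984, 15.625, 9.375],
    "Unnamed: 8": [np.nan] * 4,
})

# Drop columns that contain only NaN (here: 'Unnamed: 8').
clean = sample.dropna(axis="columns", how="all")

# Hypothetical helper column: did raters most often pick the posed emotion?
clean = clean.assign(
    matches_target=clean["Target emotion"] == clean["Most rated emotion"]
)

print(clean[["KDEF code", "Percentage (biased) hit rate", "matches_target"]])
```

Applied to the full `df`, the same two steps would remove the empty column and flag the images whose posed emotion was not the one raters chose most often.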


# Gender bias?

In [7]:
# Count how many rows start with 'M' (for male) and 'F' (for female) in the 'KDEF code' column
male_count = df['KDEF code'].str.startswith('M').sum()
female_count = df['KDEF code'].str.startswith('F').sum()

# Print the results
print(f"Number of male entries: {male_count}")
print(f"Number of female entries: {female_count}")


Number of male entries: 245
Number of female entries: 245


In [12]:
# Group by gender (based on KDEF code) and target emotion to check interaction
gender_emotion_interaction = df.groupby([df['KDEF code'].str[0], 'Target emotion']).size().unstack()

# Print the results
print("Gender and Emotion Interaction:\n", gender_emotion_interaction)


Gender and Emotion Interaction:
 Target emotion  Angry  Disgusted  Fearful  Happy  Neutral  Sad  Surprised
KDEF code                                                                
F                  35         35       35     35       35   35         35
M                  35         35       35     35       35   35         35
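
An alternative to slicing the first character is to parse the whole code into named parts. This is only a sketch: it assumes every code follows the pattern seen in the rows above (one gender letter, a model number, a two-letter expression code), and the names `gender`, `model`, and `expression` are hypothetical labels introduced here.

```python
import pandas as pd

# Codes copied from the rows shown earlier in the notebook.
codes = pd.Series(
    ["M11AF", "M25AF", "M14AF", "F19AF", "F15AF", "F35AF"],
    name="KDEF code",
)

# Assumed layout: gender letter, model number, two-letter expression code.
parts = codes.str.extract(
    r"^(?P<gender>[MF])(?P<model>\d+)(?P<expression>[A-Z]{2})$"
)
parts["model"] = parts["model"].astype(int)

# Cross-tabulate gender against expression code.
counts = pd.crosstab(parts["gender"], parts["expression"])
print(counts)
```

Codes that do not match the assumed pattern would come back as missing values from `str.extract`, which makes malformed entries easy to spot before counting.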


# Emotion bias?

In [9]:
# Count occurrences of each target emotion
target_emotion_counts = df['Target emotion'].value_counts()

# Print the results
print("Target Emotion Counts:\n", target_emotion_counts)


Target Emotion Counts:
 Fearful      70
Surprised    70
Angry        70
Neutral      70
Sad          70
Disgusted    70
Happy        70
Name: Target emotion, dtype: int64


# Emotion recognition bias
Which emotions are raters most likely to choose when they misclassify the target emotion?

In [10]:
# Count occurrences of each most rated non-target emotion
non_target_emotion_counts = df['Most rated non-target emotion(s)'].value_counts()

# Print the results
print("Non-Target Emotion Confusion Counts:\n", non_target_emotion_counts)


Non-Target Emotion Confusion Counts:
 Sad                              100
Fearful                           88
Angry                             60
Surprised                         52
Disgusted                         51
Indistinct                        38
Neutral                           20
Happy                             10
Disgusted/Indistinct               5
Angry/Sad                          5
Sad/Surprised                      4
Surprised/Indistinct               4
Neutral/Indistinct                 3
Sad/Indistinct                     3
Disgusted/Sad                      3
Fearful/Sad                        2
Angry/Surprised                    2
Fearful/Disgusted                  2
Fearful/Surprised                  2
Sad/Neutral                        2
Disgusted/Sad/Indistinct           1
Fearful/Indistinct                 1
Fearful/Disgusted/Sad/Neutral      1
Sad/Neutral/Indistinct             1
Happy/Neutral                      1
Angry/Neutral                      1
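
The counts above treat tied labels such as `Angry/Sad` as separate categories, which spreads each emotion's total over several rows. A minimal sketch of one way to count every emotion individually, assuming `/` is only ever used as a tie separator; the six labels are taken from the output above:

```python
import pandas as pd

# A few labels taken from the output above, including tied entries.
labels = pd.Series(
    ["Sad", "Disgusted", "Surprised", "Angry/Sad",
     "Sad/Indistinct", "Fearful/Disgusted/Sad/Neutral"],
    name="Most rated non-target emotion(s)",
)

# Split tied labels on '/', give each emotion its own row, then count.
single_counts = (
    labels.str.split("/")
    .explode()
    .str.strip()
    .value_counts()
)
print(single_counts)
```

With this approach an image with a tie contributes once to each tied emotion, so the totals can exceed the number of images.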


# Intensity bias?


In [11]:
# Group by target emotion and calculate the average mean intensity for each emotion
intensity_by_emotion = df.groupby('Target emotion')['Mean intensity'].mean()

# Print the results
print("Mean Intensity by Emotion:\n", intensity_by_emotion)


Mean Intensity by Emotion:
 Target emotion
Angry        5.589086
Disgusted    6.232880
Fearful      5.389087
Happy        6.097801
Neutral      4.752095
Sad          5.281020
Surprised    5.889920
Name: Mean intensity, dtype: float64
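
To make the differences easier to read, the group means can be ranked and compared against Neutral. This is a sketch that rebuilds the series from the printed output above rather than from the file; `diff_from_neutral` is a hypothetical name, and the comparison is purely descriptive, not a significance test.

```python
import pandas as pd

# Group means copied from the output above.
intensity_by_emotion = pd.Series(
    {
        "Angry": 5.589086,
        "Disgusted": 6.232880,
        "Fearful": 5.389087,
        "Happy": 6.097801,
        "Neutral": 4.752095,
        "Sad": 5.281020,
        "Surprised": 5.889920,
    },
    name="Mean intensity",
)

# Rank emotions from highest to lowest mean intensity.
ranked = intensity_by_emotion.sort_values(ascending=False)

# Difference of each emotion's mean from the Neutral mean.
diff_from_neutral = (ranked - ranked["Neutral"]).round(2)

print(pd.DataFrame({"mean": ranked.round(2), "vs Neutral": diff_from_neutral}))
```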
