Below is a **comprehensive schema**, **fully aligned with your sample transcripts**, that covers:

- All characters (kids, adults, pets)
- All actions
- Visual AOIs
- Colors, racial/gender cues
- Spatial layout
- Meta-commentary opportunities (gender role, social assumptions)

---

### A. FINAL ICU MASTER LIST (20 Units)

| ICU Code | Description | Key Terms (nouns, actions, color/race cues) |
|----------|-------------|---------------------------------------------|
| `ICU_BoyOnStool` | Boy climbing/stumbling on a stool | boy, kid, child, stool, falling, tipping, yellow stool, red/white shirt, green shorts, red socks, balance |
| `ICU_CookieJar` | Cookie jar being reached for or spilling | jar, cookie jar, reaching, grabbing, shelf, open cabinet, falling cookies |
| `ICU_GirlEatingCookie` | Girl eating or smiling with cookie | girl, cookie, eating, biting, chewing, happy, blue/white striped shirt, blue skirt, pink shoes |
| `ICU_DogLickingFloor` | Dog eating crumbs/cookies off floor | dog, licking, crumbs, floor, white dog, beige spots, tongue |
| `ICU_ManDoingDishes` | Father/dad washing dishes | man, father, dad, washing, dishes, sponge, blue shirt, plate, sink |
| `ICU_SinkOverflow` | Overflowing sink with soap/water | sink, overflow, soap, foam, bubbles, suds, flooding, running water |
| `ICU_WomanMowing` | Woman mowing lawn with phone | woman, mom, mowing, lawnmower, grass, phone, distracted, red skirt, orange blouse, cell phone |
| `ICU_CatBirds` | Black cat interacting with birds | cat, black cat, chasing, birds, watching, blue/yellow birds |
| `ICU_YardScene` | Buildings, windows, fence, sky | fence, lawn, building, turquoise, yellow, sky, window, neighborhood |
| `ICU_Curtains` | Pink curtains with patterns | curtain, polka dot, pink, tiebacks, yellow flowers |
| `ICU_SpilledWaterFloor` | Water on the floor | puddle, dripping, wet, floor, soap on ground |
| `ICU_KitchenDetails` | Cabinets, colors, knobs, furniture | cabinet, blue interior, white drawers, countertop, knobs, L-shape |
| `ICU_ObjectsWithColor` | Vivid object descriptions | red socks, green shorts, blue sponge, pink shoes, turquoise pants |
| `ICU_RaceEthnicity` | Skin tone or ethnicity noted | black, white, African American, dark/light skin, mixed race |
| `ICU_GenderNormCommentary` | Commentary on roles (dad cleaning, mom outside) | dad, mom, roles, "should", "normally", reversed, feminism, switched, expectations |
| `ICU_DetailRichness` | Mentions of object + clothing + color + action | any 3+ layered descriptions (e.g., blue sponge + washing + plate) |
| `ICU_CharacterEmotion` | Facial expressions, emotions | smiling, angry, anxious, scared, serious, happy, nervous |
| `ICU_MessChaos` | Mentions of clutter, chaos, mess | mess, disaster, chaos, out of control, too much |
| `ICU_ObjectActionPairs` | Correct verb-object mapping | reaching cookie jar, eating cookie, washing dishes |
| `ICU_BlendNarrativeMeta` | High-level narrative/meta (e.g., "blended family", "switched roles") | story, family, blended, roles, feminism, spring day, commentary |
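
Some key terms in the table are listed under more than one ICU (for example `sink` and `dad`), so a single word can set several flags at once. The sketch below is one way to audit that overlap. It uses only a small excerpt of the table, and `shared_terms` is a hypothetical helper, not part of the original pipeline.

```python
from collections import defaultdict

# Excerpt of the ICU table above, for illustration only
icu_terms = {
    "ICU_ManDoingDishes": ["man", "father", "dad", "washing", "dishes", "sponge", "sink"],
    "ICU_SinkOverflow": ["sink", "overflow", "soap", "foam", "bubbles", "suds"],
    "ICU_GenderNormCommentary": ["dad", "mom", "roles", "reversed"],
}


def shared_terms(term_map):
    """Hypothetical helper: map each term listed under 2+ units to those units."""
    owners = defaultdict(list)
    for unit, terms in term_map.items():
        for term in terms:
            owners[term].append(unit)
    # Keep only the terms that belong to more than one unit
    return {term: units for term, units in owners.items() if len(units) > 1}


overlap = shared_terms(icu_terms)
for term, units in overlap.items():
    print(term, "->", units)
```

Terms that show up here inflate `ICU_Count`, so they may be worth narrowing (for instance `sink overflowing` rather than `sink`) before the counts are interpreted.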

---

### B. FINAL AOI DEFINITIONS (Visual Regions + Semantics)

| AOI Code | Region | Associated Entities/Keywords |
|----------|--------|------------------------------|
| `AOI_KidsLeft` | Left scene: kids + dog | boy, girl, cookie jar, stool, dog, floor, crumbs |
| `AOI_KitchenCenter` | Center: man + sink | man, dishes, sponge, blue shirt, sink, overflow |
| `AOI_BackyardRight` | Background: woman + cat + birds | woman, lawn, phone, mower, cat, birds, flowers, fence |
| `AOI_WindowArea` | Visual window frame + outside buildings | window, buildings, turquoise, yellow, sky, fence, sunlight |
| `AOI_InteriorDetails` | Furnishing/colors | curtain, cabinet, knobs, white/blue decor, countertop |
| `AOI_IdentityFocus` | Mentions of race, gender roles | black girl, white boy, mom/dad roles, blended, colorism |
| `AOI_ColorFocus` | Specific object colors | red, green, yellow, pink, turquoise, brown, orange, white, black |
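
The regions above can also be used to record the order in which a speaker visits them, which is what the optional `Scene_Description_Order` feature describes. A minimal sketch, assuming order is defined by the position of each AOI's first matching keyword; `aoi_first_mention_order` is a hypothetical helper and the keyword lists are excerpts of the table.

```python
import re

# Excerpt of the AOI table above, for illustration only
aoi_terms = {
    "AOI_KidsLeft": ["boy", "girl", "cookie jar", "stool", "dog"],
    "AOI_KitchenCenter": ["man", "dishes", "sponge", "sink"],
    "AOI_BackyardRight": ["woman", "lawn", "mower", "cat", "birds"],
}


def aoi_first_mention_order(text, term_map):
    """Hypothetical helper: AOI codes sorted by where their first keyword appears."""
    text = text.lower()
    first_pos = {}
    for aoi, terms in term_map.items():
        positions = []
        for term in terms:
            # Word boundaries keep 'man' from matching inside 'woman'
            match = re.search(r"\b" + re.escape(term) + r"\b", text)
            if match:
                positions.append(match.start())
        if positions:
            first_pos[aoi] = min(positions)
    # AOIs that are never mentioned are left out of the ordering
    return sorted(first_pos, key=first_pos.get)


sample = "A woman is mowing the lawn. Inside, a man washes dishes while a boy climbs a stool."
order = aoi_first_mention_order(sample, aoi_terms)
print(order)
```

For the sample sentence the backyard is mentioned first, then the kitchen, then the kids.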

---

### C. Features per Transcript

| Feature | Description |
|---------|-------------|
| `ICU_<unit>` | 1 if ICU is mentioned |
| `AOI_<region>` | 1 if keywords for that AOI are mentioned |
| `ICU_Count` | Total ICUs mentioned |
| `ICU_Coverage` | Proportion of 20 total ICUs present |
| `ICU_Specificity` | Avg word count of the matched ICU keywords, per mentioned ICU |
| `Mentions_Race`, `Mentions_GenderRoles` | Binary flags |
| `Uses_Colors` | 1 if specific color words mentioned |
| `Emotion_Descriptions` | Mentions of feeling states |
| `Scene_Description_Order` | Optional: order of AOI mentions |
| `Narrative_MetaScore` | # of high-level narrative/meta-commentary phrases |
| `Mentions_ObjectsWithColor` | Specific objects + color pairing (e.g., “blue sponge”) |
| `Character_Association_Correct` | Logical matching: man-dishes, girl-cookie, boy-stool |
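
The table does not say how `Character_Association_Correct` should be scored. As one possible reading, the sketch below flags a character-object pair when both words occur in the same sentence. The sentence-level rule, the helper names `has_word` and `pair_cooccurrence`, and the sample text are all assumptions made for illustration.

```python
import re

# Character-object pairs named in the feature table
EXPECTED_PAIRS = [("man", "dishes"), ("girl", "cookie"), ("boy", "stool")]


def has_word(word, text):
    """True if `word` occurs in `text` as a whole word."""
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


def pair_cooccurrence(transcript, pairs=EXPECTED_PAIRS):
    """Hypothetical scoring: 1 per pair if both words share at least one sentence."""
    # Naive sentence split on terminal punctuation (an assumption about the transcripts)
    sentences = re.split(r"[.!?]+", transcript.lower())
    flags = {}
    for character, obj in pairs:
        flags[f"{character}-{obj}"] = int(
            any(has_word(character, s) and has_word(obj, s) for s in sentences)
        )
    return flags


sample = "The man is washing the dishes. The girl has a cookie. The boy is near the dog."
flags = pair_cooccurrence(sample)
print(flags)
```

A single binary feature could then be derived from these flags, for example by requiring every pair to be present, but that threshold is a design choice the schema leaves open.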

In [17]:
import pandas as pd
import re

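# Absolute path to the cleaned transcripts; adjust for your machine
data_path = "/Users/cynthianyongesa/Desktop/Desktop - Cynthia's Macbook Pro/DATA/4_PA_LAB_PY/1_SPEECH_COOKIE/cookie_transcripts_clean.csv"

# The file must contain a 'transcript_clean' column (used below)
data = pd.read_csv(data_path)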

ICU_KEYWORDS = {
    'ICU_BoyOnStool': [
        'boy', 'child', 'kid', 'stool', 'stepping', 'balancing', 'climbing',
        'tipping', 'falling', 'toppling', 'leaning', 'yellow stool', 'unstable', 
        'red and white shirt', 'green shorts', 'red socks', 'gray shoes'
    ],
    'ICU_CookieJar': [
        'cookie jar', 'jar of cookies', 'cookies', 'jar', 'reaching', 'grabbing', 
        'open cabinet', 'cupboard', 'top shelf', 'container'
    ],
    'ICU_GirlEatingCookie': [
        'girl', 'cookie', 'eating', 'chewing', 'smiling', 'snack', 
        'dark skin', 'blue and white shirt', 'blue skirt', 'pink shoes', 
        'white socks', 'black hair', 'darker skinned'
    ],
    'ICU_DogLickingFloor': [
        'dog', 'puppy', 'licking', 'eating off the floor', 'crumbs', 'cookies on floor',
        'white dog', 'beige spots', 'floor licking', 'tongue', 'sniffing', 'picking up'
    ],
    'ICU_ManDoingDishes': [
        'man', 'father', 'dad', 'washing dishes', 'cleaning', 'plate', 'sink', 
        'blue shirt', 'sponge', 'brown shoes', 'teal pants', 'rolled sleeves'
    ],
    'ICU_SinkOverflow': [
        'sink', 'overflowing', 'soap suds', 'foam', 'bubbles', 'water running', 
        'spilling', 'soap overflowing', 'counter flooding', 'wet floor'
    ],
    'ICU_WomanMowing': [
        'woman', 'mom', 'outside', 'lawnmower', 'mowing', 'phone', 'cell phone', 
        'talking', 'red skirt', 'orange blouse', 'blonde hair', 'flowers', 'cutting lawn'
    ],
    'ICU_CatBirds': [
        'cat', 'black cat', 'watching birds', 'chasing birds', 'birds', 
        'fence', 'black animal', 'three birds', 'yellow and blue birds'
    ],
    'ICU_YardScene': [
        'window', 'sky', 'fence', 'building', 'yellow building', 'turquoise building', 
        'neighborhood', 'outside view', 'modern buildings'
    ],
    'ICU_Curtains': [
        'curtains', 'pink curtain', 'tied back', 'patterned curtain', 'polka dot', 
        'yellow flowers', 'pink with orange', 'tiebacks', 'kitchen window curtains'
    ],
    'ICU_SpilledWaterFloor': [
        'water on the floor', 'soap on the ground', 'puddle', 'spilled water', 
        'wet floor', 'overflow onto floor', 'leaking', 'soapy mess'
    ],
    'ICU_KitchenDetails': [
        'cabinet', 'white cabinets', 'blue inside', 'drawer', 'countertop', 
        'knobs', 'kitchen layout', 'interior design', 'blue trim'
    ],
    'ICU_ObjectsWithColor': [
        'red socks', 'green shorts', 'blue sponge', 'pink shoes', 'turquoise pants',
        'orange top', 'blonde hair', 'white socks', 'gray shoes', 'blue sky'
    ],
    'ICU_RaceEthnicity': [
        'black', 'white', 'african american', 'dark skin', 'light skin', 
        'mixed race', 'skin tone', 'ethnicity', 'racial'
    ],
    'ICU_GenderNormCommentary': [
        'mom', 'dad', 'should be', 'normally', 'gender roles', 'doing moms job', 
        'reversed roles', 'gender switched', 'traditional', 'feminism', 'masculinity', 
        'mothers work', 'father taking over'
    ],
    'ICU_DetailRichness': [
        'striped shirt', 'blue sponge', 'silver buckle', 'rolled sleeves', 
        'sideburns', 'neatly combed hair', 'ballet flats', 'flowered curtain', 
        'curved faucet', 'black nose', 'collar', 'glassy eyes'
    ],
    'ICU_CharacterEmotion': [
        'smiling', 'happy', 'laughing', 'scared', 'anxious', 'serious', 'frowning',
        'tense', 'fearful', 'focused', 'expression'
    ],
    'ICU_MessChaos': [
        'mess', 'chaos', 'disaster', 'clutter', 'out of control', 'too much', 
        'spilled everywhere', 'overwhelmed', 'unattended'
    ],
    'ICU_ObjectActionPairs': [
        'reaching for cookie', 'eating cookie', 'washing dish', 'mowing lawn', 
        'licking floor', 'falling stool', 'grabbing jar'
    ],
    'ICU_BlendNarrativeMeta': [
        'story', 'scene', 'blended family', 'roles reversed', 'trying to bond', 
        'switching roles', 'narrative', 'gender norms', 'social expectations'
    ]
}

AOI_KEYWORDS = {
    'AOI_KidsLeft': [
        'boy', 'girl', 'cookie jar', 'stool', 'dog', 'crumbs', 'cabinet', 
        'cookies on floor', 'eating', 'reaching', 'yellow stool'
    ],
    'AOI_KitchenCenter': [
        'man', 'dad', 'sink', 'plate', 'dishes', 'soap', 'sponge', 
        'overflow', 'countertop', 'floor puddle', 'wet floor'
    ],
    'AOI_BackyardRight': [
        'woman', 'mom', 'outside', 'lawnmower', 'phone', 'cat', 'birds', 
        'fence', 'grass', 'flowers', 'talking', 'cutting lawn'
    ],
    'AOI_WindowArea': [
        'window', 'yellow building', 'turquoise building', 'sky', 
        'modern buildings', 'view outside', 'sunlight', 'neighborhood'
    ],
    'AOI_InteriorDetails': [
        'curtains', 'cabinet', 'drawer', 'blue trim', 'knob', 'white cabinets', 
        'pink curtain', 'floor', 'countertop', 'sink area'
    ],
    'AOI_IdentityFocus': [
        'black', 'white', 'skin tone', 'mom', 'dad', 'gender roles', 
        'race', 'blended', 'feminism', 'traditional', 'ethnicity'
    ],
    'AOI_ColorFocus': [
        'red', 'blue', 'green', 'pink', 'yellow', 'turquoise', 
        'orange', 'white', 'black', 'gray', 'blonde', 'brown'
    ]
}

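def keyword_found(keyword, text):
    """Return True if `keyword` occurs in `text` as a whole word or phrase.

    A plain substring test (`kw in text`) lets 'man' match 'woman' and 'cat'
    match 'scattered'. Word boundaries prevent that, and the optional plural
    suffix keeps matches such as 'knob' -> 'knobs' and 'washing dish' ->
    'washing dishes'.
    """
    pattern = r'\b' + re.escape(keyword) + r'(?:e?s)?\b'
    return re.search(pattern, text) is not None


def extract_icus_aois(text, icu_dict, aoi_dict):
    """Flag the ICUs and AOIs mentioned in one transcript and summarise them."""
    text = str(text).lower()
    icu_flags = {}
    aoi_flags = {}

    icu_count = 0
    icu_lengths = []

    for icu, keywords in icu_dict.items():
        matched = [kw for kw in keywords if keyword_found(kw, text)]
        icu_flags[icu] = int(bool(matched))
        if matched:
            icu_count += 1
            # Word count (not character count) of the matched keywords
            icu_lengths.append(sum(len(kw.split()) for kw in matched))

    for aoi, keywords in aoi_dict.items():
        aoi_flags[aoi] = int(any(keyword_found(kw, text) for kw in keywords))

    total_icus = len(icu_dict)
    icu_coverage = icu_count / total_icus
    # Average matched-keyword word count per mentioned ICU
    icu_specificity = sum(icu_lengths) / icu_count if icu_count > 0 else 0

    return {
        **icu_flags,
        **aoi_flags,
        'ICU_Count': icu_count,
        'ICU_Coverage': icu_coverage,
        'ICU_Specificity': icu_specificity
    }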

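# Apply the extractor to every transcript; each result is a dict of features
icu_aoi_results = data['transcript_clean'].apply(lambda x: extract_icus_aois(x, ICU_KEYWORDS, AOI_KEYWORDS))
# Reuse data's index so the column-wise concat aligns row for row
icu_aoi_df = pd.DataFrame(icu_aoi_results.tolist(), index=data.index)
data = pd.concat([data, icu_aoi_df], axis=1)

data.to_csv("cookie_transcripts_ICU_AOI.csv", index=False)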