## **Step 1 - Keyword Extraction**
***

We have two datasets, one with dream text descriptions:

In [None]:
from keyword_extractor import read_datasets, extract_and_save_keywords_from_dataframes
from yaml_parser import load_config
config = load_config()
dream_df, keywords_df = read_datasets(config)
dream_df.head()

And another one with interpretations of dreams according to keywords:

In [None]:
keywords_df

Next, we use pretrained LLMs to find which keywords from the keywords dataset appear in each dream description from the dream text dataset.
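For comparison, here is a minimal lexical baseline for the same task. It is not the GPT-2 approach used by `extract_and_save_keywords_from_dataframes`, whose internals are not shown in this notebook; it only checks whether a symbol name occurs as a whole phrase in the dream text. The function name `match_symbols` and the sample inputs are assumptions made for illustration.

```python
import re


def match_symbols(dream_text, symbols):
    """Return the symbols whose name appears as a whole phrase in the dream text."""
    text = dream_text.lower()
    found = []
    for symbol in symbols:
        # \b anchors keep "Up" from matching inside words such as "cup".
        pattern = r"\b" + re.escape(symbol.lower()) + r"\b"
        if re.search(pattern, text):
            found.append(symbol)
    return found


symbols = ["Ocean", "Underwater", "School Bus", "Up"]
print(match_symbols("This girl was underwater near the ocean.", symbols))
# prints ['Ocean', 'Underwater']
```

A baseline like this only finds literal mentions, so it would miss symbols that are implied rather than named in the dream.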

### **GPT2**
***

In [None]:
dream_df = extract_and_save_keywords_from_dataframes()

In [None]:
css = """
    .table-style {
                  width: 100%;
                  border-style: solid;
                  border-width: 5px;
}

    .table-style td {
                  white-space: pre;
                  width: 100px;
                  border-style: solid;
                  border-width: 5px;
}
"""

In [None]:
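# Styler.to_html() has no index/classes/border arguments, so embed the CSS in a <style> tag instead.
styler = dream_df[["text_dream", "Dream Symbol"]][:100].style.hide(axis="index")
html = styler.set_table_attributes('class="table-style"').to_html()
with open("datasets/dream_and_its_keys.html", "w", encoding="utf-8") as f:
    f.write(f"<style>{css}</style>\n{html}")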


## Step 2 - Summarize interpretations

### Load data and prepare (small) dataset for experimenting

In [1]:
import pandas as pd
from datetime import datetime
from transformers import pipeline
from utils import release_all_gpu_memory, save_df_as_pretty_html


In [9]:
from summarizer import load_causal_model, batch_generate_interpretations

import torch

In [2]:
dream_df = pd.read_csv('datasets/rsos_dream_data.tsv', sep='\t')
dream_df

Unnamed: 0,dream_id,dreamer,description,dream_date,dream_language,text_dream,characters_code,emotions_code,aggression_code,friendliness_code,...,Animal,Friends,Family,Dead&Imaginary,Aggression/Friendliness,A/CIndex,F/CIndex,S/CIndex,NegativeEmotions,Dream Symbol
0,1,alta,Alta: a detailed dreamer,1957,en,"The one at the Meads's house, where it's bigge...","2ISA, 1MKA, 1FDA, 1IOA, 2ISA",,2IKA > Q,2IKA 4> Q,...,0.000000,0.200000,0.200000,0.0,0.000,0.200000,0.200000,0.0,0.0,"Upstairs,Haunted House,Maid,Tea House,Mansion"
1,2,alta,Alta: a detailed dreamer,8/11/1967,en,I'm at a family reunion in a large fine house ...,"2ISA, people, 2ISA",SD 2IKA,"D > Q, Q > 2ISA",,...,0.000000,0.000000,0.000000,0.0,1.000,0.666667,0.000000,0.0,1.0,"Fainting,Vertigo,Near Death Experience,Landsli..."
2,3,alta,Alta: a detailed dreamer,8/1/1985,en,I watch a plane fly past and shortly realize i...,"2ISA, 2ISA, 1FSA, 1MBA, 1IOA, 2ISA, 2FDA","SD 1ISA, AP D, AP D","It PRP >, It PRP >, D > 1FKA",,...,0.000000,0.000000,0.285714,0.0,1.000,0.428571,0.000000,0.0,1.0,"Kidnap,Emergency Alert,Death Penalty,Abortion,..."
3,4,alta,Alta: a detailed dreamer,1985?,en,Me pulling the green leaves and berries off so...,"1MAA, 1FMA, 2ISA, 2IKA, 1ANI, 1ANI, 1IOA, 2ISA...","SD 2ISA, SD D","Q > Q, 2ISA > Q, 2ISA > Q, D > 1MSA","1IKA 4> Q, 2ISA 4> 2ISA",...,0.176471,0.142857,0.142857,0.0,1.000,0.235294,0.117647,0.0,1.0,"Kidnap,Knocking,Party,Sabotage,Paranormal"
4,5,alta,Alta: a detailed dreamer,1985?,en,I'm in a room that reminds me of (but definite...,"1IRA, 1MSA, 1ISA, 2ISA, 1ISA, 1IKA","AP D, AP D, AP 1MSA, CO D, SD D, AP D","1MSA > D, Q > Q, D > 2IKA, D > 2IKA, D > 1MSA,...",D 4> Q,...,0.000000,0.166667,0.166667,0.0,0.875,1.333333,0.166667,0.0,1.0,"Haunted House,Paranormal,Near Death Experience..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20995,33062,west_coast_teens,West Coast teenage girls,"F, age 18",en,The dream was about me and my boyfriend going ...,"2ISA, 2ISA, 1ISA","HA D, AP D","2IKA > Q, D > D",,...,0.000000,0.000000,0.000000,0.0,1.000,0.666667,0.000000,0.0,0.5,"Scared,Near Death Experience,Landslide,Paranor..."
20996,33063,west_coast_teens,West Coast teenage girls,"F, age 18",en,Two weeks ago this guy asked me to Senior Ball...,1ISA,HA D,,,...,0.000000,0.000000,0.000000,0.0,0.000,0.000000,0.000000,1.0,0.0,"Tap Dancing,Dance,Bait,Lake,Koi Fish"
20997,33064,west_coast_teens,West Coast teenage girls,"F, age 18",en,My boyfriend just broke up with me so he was o...,"1ISA, 1FSA",AP D,Q > D,,...,0.000000,0.000000,0.000000,0.0,1.000,0.500000,0.000000,0.0,1.0,"Deja Vu,Fairy,Past Life,Fairy Tale,Ghost"
20998,33065,west_coast_teens,West Coast teenage girls,"F, age 18",en,I was in my backyard and I was flying. I would...,1ANI,AN 1ISA,"1ANI > D, 1ANI > Q, 1ISA 1> Q",,...,1.000000,0.000000,0.000000,0.0,1.000,3.000000,0.000000,0.0,1.0,"Jumping,Near Death Experience,Landing,Leaping,..."


In [3]:
keywords_df = pd.read_csv("datasets/fixed_interpretations.csv")
keywords_df

Unnamed: 0,Dream Symbol,Interpretation
0,Aardvark,To see an aardvark in your dream indicates tha...
1,Abandonment,To dream that you are abandoned suggests that ...
2,Abduction,To dream of being abducted indicates that you ...
3,Aborigine,To see an Aborigine in your dream represents b...
4,Abortion,To dream that you have an abortion suggests th...
...,...,...
1193,Zip Line,To dream that you are zip lining implies that ...
1194,Zombie,To see or dream that you are a zombie suggests...
1195,Zoomorphism,To dream that you are changing into the form o...
1196,com Tambourine,To see or play a tambourine in your dream symb...


In [4]:
exmpl = dream_df[dream_df["text_dream"].str.len() < 300]

In [5]:
exmpl = exmpl[["text_dream","Dream Symbol"]].sample(5, random_state=44)

In [6]:
exmpl

Unnamed: 0,text_dream,Dream Symbol
1366,I see a man sitting on the tail of a giant fis...,"Sea Creature,Mammoth,Manatee,Salamander,Sea Horse"
1219,I had two dreams about Michael. Both times he ...,"Near Death Experience,Telekinesis,Haunted,Deja..."
20120,"I got up during the night for some reason, and...","Nap,Pajamas,Up,Accident,Undress"
5372,Am aboard bus in a strange city and don't know...,"School Bus,Vertigo,Railroad Crossing,Earplugs,..."
10220,This girl was underwater. Some guy got her (th...,"Underwater,Waterbed,Waterslide,Ocean,Past Life"


In [7]:
exmpl["Dream Symbol"]

1366     Sea Creature,Mammoth,Manatee,Salamander,Sea Horse
1219     Near Death Experience,Telekinesis,Haunted,Deja...
20120                      Nap,Pajamas,Up,Accident,Undress
5372     School Bus,Vertigo,Railroad Crossing,Earplugs,...
10220       Underwater,Waterbed,Waterslide,Ocean,Past Life
Name: Dream Symbol, dtype: object

In [8]:
dataset = []

prmt = """Given dream description, interpret the meaning of the dream. 
Provided also are the dream symbols that appear in the dream and their meanings. 
Use the dream symbols meanings to help you interpret the dream. """.replace("\n", " ")

for i, ex in exmpl.iterrows():
    # Keep at most five symbols per dream.
    keys = ex["Dream Symbol"].split(",")[:5]

    # Look up the stored interpretation of each symbol.
    syms = keywords_df[keywords_df["Dream Symbol"].isin(keys)]

    # One " - Symbol:  meaning" line per matched symbol.
    descr = syms.apply(lambda r: f' - {r["Dream Symbol"]}:  {r["Interpretation"]}', axis=1)
    item = {
        "prompt": prmt,
        "dream": ex["text_dream"],
        "symbols": "\n".join(descr),
    }
    dataset.append(item)


dataset = pd.DataFrame(dataset)
dataset


Unnamed: 0,prompt,dream,symbols
0,"Given dream description, interpret the meaning...",I see a man sitting on the tail of a giant fis...,- Mammoth: To see a mammoth in your dream im...
1,"Given dream description, interpret the meaning...",I had two dreams about Michael. Both times he ...,- Deja Vu: To dream of Déjà Vu indicates som...
2,"Given dream description, interpret the meaning...","I got up during the night for some reason, and...",- Accident: To dream that you are in an acci...
3,"Given dream description, interpret the meaning...",Am aboard bus in a strange city and don't know...,- Earplugs: To dream that you are wearing or...
4,"Given dream description, interpret the meaning...",This girl was underwater. Some guy got her (th...,- Ocean: To see an ocean in your dream repre...
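Each row of `dataset` still has to be turned into a single prompt string before it reaches a model. The sketch below shows one way this could look. The `Instruction:` prefix and the closing `Interpretation:` marker are taken from the generated text and the post-processing later in this notebook; the `Dream symbols:` and `Dream:` labels, the function name `build_prompt`, and the sample row are assumptions, since the actual template lives inside `batch_generate_interpretations`.

```python
def build_prompt(item):
    """Join the instruction, symbol meanings, and dream text into one prompt string."""
    return (
        f"Instruction: {item['prompt'].strip()}\n\n"
        f"Dream symbols:\n{item['symbols']}\n\n"   # section label is an assumption
        f"Dream: {item['dream']}\n\n"              # section label is an assumption
        "Interpretation:"
    )


# Hypothetical row with the same keys as the dataset built above.
example = {
    "prompt": "Given dream description, interpret the meaning of the dream. ",
    "dream": "I was in my backyard and I was flying.",
    "symbols": " - Flying:  (meaning text goes here)",
}
print(build_prompt(example))
```

Ending the prompt with `Interpretation:` gives the model a clear place to continue from and gives the post-processing step a marker to split on.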


### Summarize with the FLAN-T5-large model

In [11]:
release_all_gpu_memory()

clearing cuda cache
clearing ipc cache
✅ All GPU memory cleared.


In [None]:
# Load the FLAN-T5 model and tokenizer
model_name = "google/flan-t5-large"
model_name_short = model_name.split("/")[-1]
# Pipeline device index: 0 selects the first GPU, -1 falls back to CPU
device = 0 if torch.cuda.is_available() else -1
model, tokenizer = load_causal_model(model_name)

In [13]:
text2text_generator = pipeline(
        "text2text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=1024,           # generation setting: caps the output length, not the input
        truncation=True,           # truncate over-long inputs at the tokenizer level
        device=device,
    )

Device set to use cuda:0


In [17]:
tstp = datetime.now().strftime(r"%y.%m.%d-%H")
result_df = batch_generate_interpretations(dataset, text2text_generator, batch_size=1, max_length=250)


Generating Interpretations:   0%|          | 0/5 [00:00<?, ?it/s]Ignoring args : ({'max_length': 250},)
Generating Interpretations:  20%|██        | 1/5 [00:00<00:03,  1.19it/s]Ignoring args : ({'max_length': 250},)
Generating Interpretations:  40%|████      | 2/5 [00:01<00:01,  1.89it/s]Ignoring args : ({'max_length': 250},)
Generating Interpretations:  60%|██████    | 3/5 [00:01<00:00,  2.94it/s]Ignoring args : ({'max_length': 250},)


⚠️ Prompt truncated: 569 tokens (limit = 512)


Generating Interpretations:  80%|████████  | 4/5 [00:01<00:00,  3.57it/s]Ignoring args : ({'max_length': 250},)


⚠️ Prompt truncated: 729 tokens (limit = 512)


Generating Interpretations: 100%|██████████| 5/5 [00:02<00:00,  2.04it/s]
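The truncation warnings in the log above show two prompts exceeding a 512-token limit, which means part of the symbol list or the dream never reaches the model. A minimal sketch of such a pre-check follows. The function name `check_prompt_length` is an assumption, and `str.split` stands in for a real tokenizer so the example runs without downloading a model; with Hugging Face tokenizers, `tokenizer.encode` could be passed instead.

```python
def check_prompt_length(prompt, tokenize, limit=512):
    """Tokenize a prompt, warn if it exceeds the limit, and return the kept tokens.

    `tokenize` is any callable that maps a string to a list of tokens.
    Returns (tokens, was_truncated).
    """
    tokens = tokenize(prompt)
    n = len(tokens)
    if n > limit:
        print(f"Prompt truncated: {n} tokens (limit = {limit})")
        tokens = tokens[:limit]
    return tokens, n > limit


# Whitespace splitting is only a stand-in for a real subword tokenizer.
tokens, truncated = check_prompt_length("a b c d e", str.split, limit=3)
print(tokens, truncated)
# prints ['a', 'b', 'c'] True
```

Running a check like this before generation makes it possible to shorten the symbol descriptions for the affected rows instead of silently losing the end of the prompt.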


In [None]:
postproc = lambda out: out["generated_text"].strip()
result_df["interpretation"] = result_df["interpretation"].apply(postproc)


In [18]:
result_df

Unnamed: 0,prompt,dream,symbols,interpretation
0,"Given dream description, interpret the meaning...",I see a man sitting on the tail of a giant fis...,- Mammoth: To see a mammoth in your dream im...,{'generated_text': 'A man is sitting on the ta...
1,"Given dream description, interpret the meaning...",I had two dreams about Michael. Both times he ...,- Deja Vu: To dream of Déjà Vu indicates som...,{'generated_text': 'Michael is a close friend ...
2,"Given dream description, interpret the meaning...","I got up during the night for some reason, and...",- Accident: To dream that you are in an acci...,{'generated_text': 'Up'}
3,"Given dream description, interpret the meaning...",Am aboard bus in a strange city and don't know...,- Earplugs: To dream that you are wearing or...,{'generated_text': 'A school bus in a strange ...
4,"Given dream description, interpret the meaning...",This girl was underwater. Some guy got her (th...,- Ocean: To see an ocean in your dream repre...,{'generated_text': 'dream that you have a past...


In [19]:
result_df.columns

Index(['prompt', 'dream', 'symbols', 'interpretation'], dtype='object')

In [None]:
save_df = result_df[['prompt', 'symbols','dream', 'interpretation']]

path = f"output/{model_name_short}_{tstp}"

save_df_as_pretty_html(save_df, path + ".html")

save_df.to_csv(path + ".csv")

✅ HTML table saved to: output/flan-t5-large_25.04.17-15.html
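The output file name combines the short model name with an hour-resolution timestamp. The sketch below reproduces that naming with a fixed date so the result is deterministic; the helper name `output_path` is an assumption.

```python
from datetime import datetime


def output_path(model_name, when, out_dir="output"):
    """Build '<out_dir>/<short model name>_<yy.mm.dd-HH>' without a file extension."""
    short = model_name.split("/")[-1]        # drop the organisation prefix
    stamp = when.strftime("%y.%m.%d-%H")     # two-digit year, month, day, hour
    return f"{out_dir}/{short}_{stamp}"


print(output_path("google/flan-t5-large", datetime(2025, 4, 17, 15)))
# prints output/flan-t5-large_25.04.17-15
```

Because the timestamp stops at the hour, two runs of the same model within one hour write to the same path and the later run overwrites the earlier files.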


In [23]:
result_df.interpretation.str.len()

0    1
1    1
2    1
3    1
4    1
Name: interpretation, dtype: int64
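A length of 1 for every row is what `.str.len()` reports when each cell still holds a one-key dict such as `{'generated_text': ...}` rather than a plain string, since pandas applies `len` to collections as well as strings. The sketch below shows a post-processing helper that accepts either pipeline output shape seen in this notebook. The name `extract_text` is an assumption.

```python
def extract_text(out):
    """Return the generated string from a raw pipeline result.

    Accepts a dict, a list of dicts, or an already extracted string.
    """
    if isinstance(out, list):
        out = out[0] if out else ""
    if isinstance(out, dict):
        out = out.get("generated_text", "")
    return str(out).strip()


raw = {"generated_text": "Up"}
print(len(raw))            # prints 1 (number of keys, not characters)
print(extract_text(raw))   # prints Up
```

With a DataFrame, the helper could be applied as `result_df["interpretation"].apply(extract_text)` before saving, so that the exported files contain text instead of dict representations.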

### Summarize with the Mistral model

In [10]:
from summarizer import load_mistral_4bit_model

In [11]:
dataset

Unnamed: 0,prompt,dream,symbols
0,"Given dream description, interpret the meaning...",I see a man sitting on the tail of a giant fis...,- Mammoth: To see a mammoth in your dream im...
1,"Given dream description, interpret the meaning...",I had two dreams about Michael. Both times he ...,- Deja Vu: To dream of Déjà Vu indicates som...
2,"Given dream description, interpret the meaning...","I got up during the night for some reason, and...",- Accident: To dream that you are in an acci...
3,"Given dream description, interpret the meaning...",Am aboard bus in a strange city and don't know...,- Earplugs: To dream that you are wearing or...
4,"Given dream description, interpret the meaning...",This girl was underwater. Some guy got her (th...,- Ocean: To see an ocean in your dream repre...


In [12]:
release_all_gpu_memory(["model", "tokenizer", "text2text_generator"])


['model', 'tokenizer', 'text2text_generator', 'model', 'tokenizer', 'text2text_generator']
clearing cuda cache
clearing ipc cache
✅ All GPU memory cleared.


In [13]:
print("Loading Mistral-7B-Instruct in 4-bit...")

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
model_name_short = model_name.split("/")[-1]
  
max_new_tokens = 256

model, tokenizer = load_mistral_4bit_model(model_name)


Loading Mistral-7B-Instruct in 4-bit...


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [14]:
model_pipeline = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=max_new_tokens,
        do_sample=False
    )

Device set to use cuda:0


In [None]:

print("\n🧠 Running interpretations...")
tstp = datetime.now().strftime(r"%y.%m.%d-%H")

result_df = batch_generate_interpretations(dataset, model_pipeline, batch_size=4)
#print(result_df[["dream", "interpretation"]])



🧠 Running interpretations...


Generating Interpretations:   0%|          | 0/2 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Generating Interpretations:  50%|█████     | 1/2 [00:32<00:32, 32.83s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Generating Interpretations: 100%|██████████| 2/2 [00:42<00:00, 21.13s/it]

                                               dream  \
0  I see a man sitting on the tail of a giant fis...   
1  I had two dreams about Michael. Both times he ...   
2  I got up during the night for some reason, and...   
3  Am aboard bus in a strange city and don't know...   
4  This girl was underwater. Some guy got her (th...   

                                      interpretation  
0  [{'generated_text': 'Instruction: Given dream ...  
1  [{'generated_text': 'Instruction: Given dream ...  
2  [{'generated_text': 'Instruction: Given dream ...  
3  [{'generated_text': 'Instruction: Given dream ...  
4  [{'generated_text': 'Instruction: Given dream ...  
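The progress bar above counts 2 steps because the 5 prompts are split into batches of 4. A minimal sketch of that chunking follows. The helper name `iter_batches` is an assumption, and the real logic inside `batch_generate_interpretations` may differ.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


batches = list(iter_batches(list(range(5)), batch_size=4))
print(len(batches), [len(b) for b in batches])
# prints 2 [4, 1]
```

The last batch is smaller than the others whenever the number of prompts is not a multiple of the batch size.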





In [27]:
postproc = lambda out: out[0]["generated_text"].split("Interpretation:")[-1].strip()
result_df["interpretation"] = result_df["interpretation"].apply(postproc)
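A `text-generation` pipeline returns the prompt followed by the completion, which is why the post-processing splits on the `Interpretation:` marker. Taking the last segment of that split would drop text if the model repeats the marker in its answer. The sketch below removes the known prompt prefix first and only falls back to the marker; the name `strip_prompt` and the sample strings are assumptions. Passing `return_full_text=False` to the pipeline is another option that may avoid this step.

```python
def strip_prompt(generated_text, prompt, marker="Interpretation:"):
    """Return only the model's continuation of `prompt`."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Fallback: keep everything after the first marker, or the whole text if absent.
    return generated_text.split(marker, 1)[-1].strip()


prompt = "Instruction: Interpret the dream.\nInterpretation:"
full = prompt + " The dreamer feels lost. Interpretation: unclear."
print(strip_prompt(full, prompt))
# prints The dreamer feels lost. Interpretation: unclear.
```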


In [28]:
result_df

Unnamed: 0,prompt,dream,symbols,interpretation
0,"Given dream description, interpret the meaning...",I see a man sitting on the tail of a giant fis...,- Mammoth: To see a mammoth in your dream im...,The man in the dream represents the dreamer hi...
1,"Given dream description, interpret the meaning...",I had two dreams about Michael. Both times he ...,- Deja Vu: To dream of Déjà Vu indicates som...,The dream about Michael can be interpreted as ...
2,"Given dream description, interpret the meaning...","I got up during the night for some reason, and...",- Accident: To dream that you are in an acci...,The dream is about your relationship with your...
3,"Given dream description, interpret the meaning...",Am aboard bus in a strange city and don't know...,- Earplugs: To dream that you are wearing or...,"The dreamer is in a strange city, which could ..."
4,"Given dream description, interpret the meaning...",This girl was underwater. Some guy got her (th...,- Ocean: To see an ocean in your dream repre...,The dreamer is experiencing emotional turmoil ...


In [29]:

save_df = result_df[['prompt', 'symbols','dream', 'interpretation']]

path = f"output/{model_name_short}_{tstp}"
save_df_as_pretty_html(save_df, path + ".html")

save_df.to_csv(path + ".csv")

✅ HTML table saved to: output/Mistral-7B-Instruct-v0.2_25.04.17-16.html
