## **Step 1 - Keyword Extraction**
***

We have two datasets, one with dream text descriptions:

In [None]:
from keyword_extractor import read_datasets, extract_and_save_keywords_from_dataframes
from yaml_parser import load_config
config = load_config()
dream_df, keywords_df = read_datasets(config)
dream_df.head()

The other contains interpretations of dreams, keyed by dream symbol (keyword):

In [None]:
keywords_df.head()

Next, we use a pretrained LLM to find which keywords from the keywords dataset appear in each dream description from the dream text dataset.

### **GPT2**
***

In [None]:
extract_and_save_keywords_from_dataframes()

## **Step 2 - Summarize Interpretations**
***

In [1]:
import pandas as pd
import pandasql as ps
import numpy as np
import re 

In [2]:
dream_df= pd.read_csv('datasets/rsos_dream_data.tsv', sep='\t')
dream_df

Unnamed: 0,dream_id,dreamer,description,dream_date,dream_language,text_dream,characters_code,emotions_code,aggression_code,friendliness_code,...,Male,Animal,Friends,Family,Dead&Imaginary,Aggression/Friendliness,A/CIndex,F/CIndex,S/CIndex,NegativeEmotions
0,1,alta,Alta: a detailed dreamer,1957,en,"The one at the Meads's house, where it's bigge...","2ISA, 1MKA, 1FDA, 1IOA, 2ISA",,2IKA > Q,2IKA 4> Q,...,0.500000,0.000000,0.200000,0.200000,0.0,0.000,0.200000,0.200000,0.0,0.0
1,2,alta,Alta: a detailed dreamer,8/11/1967,en,I'm at a family reunion in a large fine house ...,"2ISA, people, 2ISA",SD 2IKA,"D > Q, Q > 2ISA",,...,0.000000,0.000000,0.000000,0.000000,0.0,1.000,0.666667,0.000000,0.0,1.0
2,3,alta,Alta: a detailed dreamer,8/1/1985,en,I watch a plane fly past and shortly realize i...,"2ISA, 2ISA, 1FSA, 1MBA, 1IOA, 2ISA, 2FDA","SD 1ISA, AP D, AP D","It PRP >, It PRP >, D > 1FKA",,...,0.333333,0.000000,0.000000,0.285714,0.0,1.000,0.428571,0.000000,0.0,1.0
3,4,alta,Alta: a detailed dreamer,1985?,en,Me pulling the green leaves and berries off so...,"1MAA, 1FMA, 2ISA, 2IKA, 1ANI, 1ANI, 1IOA, 2ISA...","SD 2ISA, SD D","Q > Q, 2ISA > Q, 2ISA > Q, D > 1MSA","1IKA 4> Q, 2ISA 4> 2ISA",...,0.666667,0.176471,0.142857,0.142857,0.0,1.000,0.235294,0.117647,0.0,1.0
4,5,alta,Alta: a detailed dreamer,1985?,en,I'm in a room that reminds me of (but definite...,"1IRA, 1MSA, 1ISA, 2ISA, 1ISA, 1IKA","AP D, AP D, AP 1MSA, CO D, SD D, AP D","1MSA > D, Q > Q, D > 2IKA, D > 2IKA, D > 1MSA,...",D 4> Q,...,1.000000,0.000000,0.166667,0.166667,0.0,0.875,1.333333,0.166667,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20995,33062,west_coast_teens,West Coast teenage girls,"F, age 18",en,The dream was about me and my boyfriend going ...,"2ISA, 2ISA, 1ISA","HA D, AP D","2IKA > Q, D > D",,...,0.000000,0.000000,0.000000,0.000000,0.0,1.000,0.666667,0.000000,0.0,0.5
20996,33063,west_coast_teens,West Coast teenage girls,"F, age 18",en,Two weeks ago this guy asked me to Senior Ball...,1ISA,HA D,,,...,0.000000,0.000000,0.000000,0.000000,0.0,0.000,0.000000,0.000000,1.0,0.0
20997,33064,west_coast_teens,West Coast teenage girls,"F, age 18",en,My boyfriend just broke up with me so he was o...,"1ISA, 1FSA",AP D,Q > D,,...,0.000000,0.000000,0.000000,0.000000,0.0,1.000,0.500000,0.000000,0.0,1.0
20998,33065,west_coast_teens,West Coast teenage girls,"F, age 18",en,I was in my backyard and I was flying. I would...,1ANI,AN 1ISA,"1ANI > D, 1ANI > Q, 1ISA 1> Q",,...,0.000000,1.000000,0.000000,0.000000,0.0,1.000,3.000000,0.000000,0.0,1.0


In [3]:
keywords_df = pd.read_csv("datasets/fixed_interpretations.csv")
keywords_df

Unnamed: 0,Dream Symbol,Interpretation
0,Aardvark,To see an aardvark in your dream indicates tha...
1,Abandonment,To dream that you are abandoned suggests that ...
2,Abduction,To dream of being abducted indicates that you ...
3,Aborigine,To see an Aborigine in your dream represents b...
4,Abortion,To dream that you have an abortion suggests th...
...,...,...
1193,Zip Line,To dream that you are zip lining implies that ...
1194,Zombie,To see or dream that you are a zombie suggests...
1195,Zoomorphism,To dream that you are changing into the form o...
1196,com Tambourine,To see or play a tambourine in your dream symb...


In [4]:
exmpl = dream_df[dream_df["text_dream"].str.len()< 300]

In [5]:
exmpl = exmpl["text_dream"].sample(10, random_state=42)

In [6]:
exmpl.str.len()

19133    284
9292     259
14062    255
14306    257
20947    258
14520    267
5763     294
19148    297
1423     286
17373    283
Name: text_dream, dtype: int64

In [7]:
keywords = set(keywords_df["Dream Symbol"])

In [9]:
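def extract_keys(keys, text):
    # Case-insensitive substring match: a key also matches inside longer words.
    text_lower = text.lower()
    return [k for k in keys if k.lower() in text_lower]

tst = exmpl.iloc[1]
print(tst)
# `keywords` is a set, so which 10 matches survive the slice is not guaranteed.
keys = extract_keys(keywords, tst)[:10]
keys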

I was at a hospital, and some friend was in there. I was older and I met this Dave guy (like the one from Danny Deckchair [?]) but he was hot and he liked me. I was talking to Ramona and I called Mom on my mobile and there was strange music in the background.


['King', 'Ram', 'Old', 'Talking', 'Hair', 'Back']
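
Several of the matches above come from substrings of longer words ("King" inside "talking", "Ram" inside "Ramona", "Hair" inside "Deckchair", "Back" inside "background"). A minimal sketch of a stricter variant follows, assuming whole-word matching is what is wanted. The name `extract_keys_whole_word` is hypothetical and not part of the notebook's modules.

```python
import re

def extract_keys_whole_word(keys, text):
    """Return the keys that occur in text as whole words or phrases (case-insensitive)."""
    found = []
    for key in sorted(keys):  # sorted so the result order is reproducible
        # (?<!\w) and (?!\w) reject matches embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(key) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(key)
    return found

sample = "I was older and I was talking to Ramona; strange music in the background."
print(extract_keys_whole_word({"King", "Ram", "Old", "Talking", "Hair", "Back"}, sample))
# prints ['Talking']
```

The trade-off is that inflected forms no longer match: "older" is not counted as "Old". Whether that is desirable depends on how the symbols are meant to be used.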

In [11]:
exmpl

19133    There were about 5 men on posts. They were wir...
9292     I was at a hospital, and some friend was in th...
14062    Six Point Buck  I'm deer hunting with others. ...
14306    On The Casino Floor  I'm at a casino. I have m...
20947    In the dream I was at the portables where math...
14520    Will To Control  I'm flying. I jump and soar h...
5763     I was working or eating at a table opposite mo...
19148    I set an alarm clock on a small round table an...
1423     I am a teacher and a woman student complains t...
17373    I was assigned to work on a mechanical apparat...
Name: text_dream, dtype: object

In [None]:
dataset = []

# Instruction shared by every example.
prmt = (
    "Given the dream description and dream symbols below, "
    "explain what this dream means. "
    "Use the dream symbols to help you interpret the dream."
)

for ex in exmpl:
    # Keep at most five matched symbols per dream.
    keys = extract_keys(keywords, ex)[:5]
    syms = keywords_df[keywords_df["Dream Symbol"].isin(keys)]

    # One " - Symbol: interpretation" line per matched symbol.
    descr = syms.apply(lambda r: f' - {r["Dream Symbol"]}:  {r["Interpretation"]}', axis=1)
    item = {
        "prompt": prmt,
        "dream": ex,
        "symbols": ";\n".join(descr),  # plain string so "\n" is a real newline
    }
    dataset.append(item)

dataset = pd.DataFrame(dataset)
dataset


In [None]:
def release_all_gpu_memory():
    import gc
    import torch

    # Delete model objects (make sure they're declared global or passed)
    globals_to_clear = ["model", "tokenizer", "text2text_generator"]
    for name in globals_to_clear:
        if name in globals():
            print("clearing ", name)
            del globals()[name]

    gc.collect()

    if torch.cuda.is_available():
        print("clearing cuda cache")
        torch.cuda.empty_cache()
        print("clearing ipc cache")
        torch.cuda.ipc_collect()

    print("✅ All GPU memory cleared.")

In [None]:
release_all_gpu_memory()

In [None]:

import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
from tqdm import tqdm

release_all_gpu_memory()

# Step 1: Load FLAN-T5 model and tokenizer
model_name = "google/flan-t5-large"
device = 0 if torch.cuda.is_available() else -1

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# T5 uses relative position embeddings, so report the tokenizer's configured
# input limit rather than printing the whole model config.
print(f"Tokenizer max input length: {tokenizer.model_max_length} tokens.")


text2text_generator = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    truncation=True,   # truncate over-long inputs at the tokenizer level
    max_length=1024,   # generation kwarg (output cap); overridden per call below
    device=device,
)


# Step 2: Define input formatting
def format_instruction(prompt, dream, symbols):
    return (
        f"Instruction: {prompt.strip()}\n\n"
        f"Dream: {dream.strip()}\n\n"
        f"Symbols:\n{symbols.strip()}\n\n"
        "Interpretation:"
    )

# Step 3: Batch interpret function
def batch_interpret_df(df, model_pipeline, batch_size=4, max_output_length=250):
    interpretations = []
    for i in tqdm(range(0, len(df), batch_size), desc="Generating Interpretations"):
        batch_df = df.iloc[i:i+batch_size]
        inputs = [
            format_instruction(row["prompt"], row["dream"], row["symbols"])
            for _, row in batch_df.iterrows()
        ]
        # Debug: character (not token) length of the first prompt in the batch.
        print(len(inputs[0]))
        outputs = model_pipeline(inputs, max_length=max_output_length, do_sample=False)
        interpretations.extend([out["generated_text"] for out in outputs])
    df["interpretation"] = interpretations
    return df
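
Because the pipeline truncates over-long inputs, and tokenizers truncate from the right by default, a long symbols block can push the trailing `Interpretation:` cue out of the prompt. The sketch below shows one possible guard, assuming it is acceptable to drop symbol lines from the end until the prompt fits. `fit_symbols_to_budget` and its `count_tokens` parameter are hypothetical, and the default counter is a whitespace word count standing in for a real tokenizer.

```python
def fit_symbols_to_budget(prompt, dream, symbol_lines, budget, count_tokens=None):
    """Drop trailing symbol lines until the formatted prompt fits the budget."""
    if count_tokens is None:
        # Stand-in counter: whitespace-separated words, NOT real model tokens.
        count_tokens = lambda s: len(s.split())
    kept = list(symbol_lines)
    while True:
        symbols = ";\n".join(kept)
        # Same layout as format_instruction in the cell above.
        text = (
            f"Instruction: {prompt.strip()}\n\n"
            f"Dream: {dream.strip()}\n\n"
            f"Symbols:\n{symbols}\n\n"
            "Interpretation:"
        )
        if count_tokens(text) <= budget or not kept:
            return text, kept
        kept.pop()  # remove the last symbol line and try again

text, kept = fit_symbols_to_budget(
    "Explain the dream.",
    "I was flying.",
    [" - Flying: freedom", " - Sky: hope and more words here"],
    budget=13,
)
print(len(kept))
print(text)
```

In the notebook, a real counter such as `lambda s: len(tokenizer(s)["input_ids"])` could be passed as `count_tokens`, with `budget` set from `tokenizer.model_max_length`. If even the dream text alone exceeds the budget, the function returns the prompt with no symbols and truncation still applies.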


In [None]:
type(model)

In [None]:
result_df = batch_interpret_df(dataset, text2text_generator, batch_size=1)
# print(result_df[["dream", "interpretation"]])


In [None]:
result_df

In [None]:
result_df.columns

In [None]:
result_df[['prompt', 'symbols','dream', 'interpretation']].to_html("datasets/dream_interpretations.html", index=False)

In [None]:
result_df.interpretation.str.len()