# Try sending answers directly to the Llama LLM (local)

Using Llama 4 Scout, which advertises a context window of 10M tokens (!). Specifically, I'll use `meta-llama/Llama-4-Scout-17B-16E-Instruct` from Hugging Face: https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct

Create a conda env, but install the packages via pip, because PyPI typically has more up-to-date versions of the Hugging Face libraries.
```
conda create -n llama4 python=3.10
conda activate llama4
pip install torch transformers accelerate pandas jupyter
```

... but this would require too much memory to run on my local computer (apparently I would need at least 80GB of VRAM, so this would need to run on Quest). Also, at this point I would need to request access on Hugging Face (and wait for approval). So I haven't run this code yet.
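As a rough sanity check on the memory requirement, the weights-only footprint can be estimated as parameter count times bytes per parameter. The sketch below assumes roughly 109B total parameters for Scout (17B active per token across 16 experts); that figure is my reading of the model card, not something measured here. It also ignores activations and the KV cache, which grow with the context length, so the real requirement would be higher.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# ASSUMPTION: ~109e9 total parameters for Llama 4 Scout (not verified in this notebook)
N_PARAMS = 109e9

for dtype_name, nbytes in [("bfloat16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{dtype_name}: ~{weight_memory_gb(N_PARAMS, nbytes):.1f} GB for weights only")
```

Under these assumptions, only a quantized copy of the weights would come close to fitting on a single 80GB GPU.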

In [None]:
import pandas as pd

from transformers import AutoProcessor, Llama4ForConditionalGeneration
import torch

In [None]:
froots = [
    "INCLU1x_IF_Responses_-_ALL_RUNS_041924_M0_IF_Reflection_Questions_cleaned",
    "INCLU1x_IF_Responses_-_ALL_RUNS_041924_M1_IF_Reflection_Question_cleaned",
    "INCLU1x_IF_Responses_-_ALL_RUNS_041924_M2_IF_Reflection_Question_cleaned",
    "INCLU1x_IF_Responses_-_ALL_RUNS_041924_M3_IF_Reflection_Question_cleaned",
    "INCLU1x_IF_Responses_-_ALL_RUNS_041924_M4_IF_Reflection_Question_cleaned",
    "INCLU1x_IF_Responses_-_ALL_RUNS_041924_M5_IF_Reflection_Question_cleaned" ,
]

In [None]:
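# estimate the number of tokens in each file (assuming roughly 4/3 tokens per word)
for f in froots:
    df = pd.read_csv("../../data/" + f + ".csv")
    # split() with no argument collapses repeated spaces and newlines,
    # so empty strings are not counted as words
    all_answers = df['student_responses'].str.cat(sep=' ').split()
    print(f, round(len(all_answers) * 4 / 3))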
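The 4/3 tokens-per-word factor used above is only a rule of thumb; the actual count depends on the tokenizer. A minimal standalone sketch of the same estimate, written so it can be tried without the data files, might look like this. The function name `estimate_tokens` and the sample strings are made up for illustration.

```python
def estimate_tokens(texts, tokens_per_word=4 / 3):
    """Rough token estimate from whitespace-separated word counts.

    Non-string entries (e.g. NaN or None from empty survey cells) are skipped.
    """
    n_words = 0
    for text in texts:
        if not isinstance(text, str):
            continue
        n_words += len(text.split())
    return round(n_words * tokens_per_word)


# Made-up responses, including one missing value
sample = ["I learned a lot.", "  More examples,  please ", None]
print(estimate_tokens(sample))
```

Once the model is accessible, the exact prompt length can be read from the tokenized input instead of relying on this heuristic.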

In [None]:
# Load the model (code from Hugging Face : https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct)
# Note: "instruct" models are meant to follow user instructions more closely (so we want to use that)
model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    attn_implementation="flex_attention",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

In [None]:
# instructions for the prompt to bracket the survey responses
instructions_before = "Below is a list of items each starting with [item]. I will want you to summarize this list."
instructions_after = "That was the last item in the list.  Now summarize these items with a new list of 10 or less unique themes.  Do not repeat any item from above verbatim in your themes.  Each theme should be only one short sentence.  Only return the short one-sentence themes."
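The two instruction strings are meant to bracket a list of responses, each tagged with `[item]`. One way to assemble such a prompt is sketched below. `build_prompt` is a hypothetical helper, and collapsing internal whitespace so that each response stays on one line is my assumption, not something the notebook requires.

```python
def build_prompt(responses, before, after, tag="[item]"):
    """Join responses into a tagged list bracketed by two instruction strings."""
    items = [
        # collapse internal whitespace/newlines so each response is one line
        f"{tag} {' '.join(r.split())}"
        for r in responses
        # skip missing values (NaN/None) and blank responses
        if isinstance(r, str) and r.strip()
    ]
    return before + "\n" + "\n".join(items) + "\n\n" + after


# Made-up responses and shortened instructions, for illustration only
demo = build_prompt(
    ["Great course.", None, "Too   fast\nin week 2."],
    before="Below is a list of items each starting with [item].",
    after="Now summarize these items.",
)
print(demo)
```

Keeping each response on a single line makes the `[item]` marker an unambiguous delimiter for the model.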

In [None]:
# create the full prompt and run it through the model

# could write a for loop to go through all files (and create output files), but let's first just try one
i = 0
froot = froots[i]

df = pd.read_csv("../../data/" + froot + ".csv")
# drop missing responses so that the string concatenation below does not fail on NaN
responses = df['student_responses'].dropna().astype(str).to_list()
input_list = ''
for item in responses:
    input_list += '[item] ' + item + '\n'
prompt = instructions_before + '\n' + input_list + '\n' + instructions_after

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
        ]
    },
]

# Tokenize input
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)


# Generate output
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
)

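# Decode output (keep only the newly generated tokens, and drop special tokens such as end-of-turn markers)
response = processor.batch_decode(
    outputs[:, inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)[0]
print(response)
# raw token IDs (prompt + generated tokens), for debugging
print(outputs[0])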
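The cell above handles a single file. A loop over all files that writes one output file per input could be sketched as follows. Everything here is an assumption for illustration: `summarize` is a hypothetical callable that would wrap the prompt building and `model.generate` steps, the `_themes.txt` suffix is an arbitrary naming choice, and the standard-library `csv` module is used only to keep the sketch dependency-free (`pd.read_csv` would work equally well).

```python
import csv
from pathlib import Path


def summarize_files(froots, data_dir, out_dir, summarize, column="student_responses"):
    """Run `summarize` on each CSV's responses and write one text file per input."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for froot in froots:
        csv_path = Path(data_dir) / f"{froot}.csv"
        with open(csv_path, newline="", encoding="utf-8") as fh:
            # keep only rows with a non-empty response in the chosen column
            responses = [row[column] for row in csv.DictReader(fh) if row.get(column)]
        # `summarize` is a hypothetical stand-in for the model call
        out_path = out_dir / f"{froot}_themes.txt"
        out_path.write_text(summarize(responses), encoding="utf-8")
        written.append(out_path)
    return written
```

Passing the summarizer in as an argument means the file handling can be checked with a cheap placeholder, for example `lambda rs: f"{len(rs)} responses"`, before committing GPU time to the real model.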

