A script to analyze a diary file day-by-day with a local large language model (LLM)

See https://www.transformingmed.tech/p/decoding-happiness-an-ai-driven-experiment

Matthijs Cluitmans for Transforming Med.Tech (http://transformingmed.tech/)

Steps:
- Install LM Studio (https://lmstudio.ai/)
- In LM Studio, find a suitable LLM that fits with the specs of your computer. LM Studio will tell you if it will run or not. I used the Vicuna LLM (vicuna-13b-v1.5-16k.Q5_K_M.gguf) from Hugging Face https://huggingface.co/TheBloke/vicuna-13B-v1.5-16K-GGUF
- In LM Studio, go to Local Inference Server, load the model (with the appropriate preset, e.g. Vicuna 1.5 16k), and start the server
- Then run this Jupyter notebook with your local copy of diary.csv, a semicolon-separated (;) UTF-8 file with two columns: Date and Tekst
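
Because a full run takes a long time, it can help to check the layout of diary.csv first. The following is a minimal sketch, assuming the file starts with a header row `Date;Tekst`; the helper name `check_diary_rows` and the sample entries are hypothetical and not part of the original notebook.

```python
import csv
import io

def check_diary_rows(csvfile):
    """Return (date, text) pairs from a semicolon-separated diary, skipping the header."""
    reader = csv.reader(csvfile, delimiter=';')
    header = next(reader)  # assumed header row: Date;Tekst
    if len(header) != 2:
        raise ValueError(f"Expected 2 columns, found {len(header)}: {header}")
    rows = []
    for line_number, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore completely empty lines
        if len(row) != 2:
            raise ValueError(f"Record {line_number}: expected 2 columns, found {len(row)}")
        rows.append((row[0], row[1]))
    return rows

# Made-up sample in the expected format. For a real file, pass
# open('diary.csv', newline='', encoding='utf-8') instead of the StringIO object.
sample = io.StringIO(
    "Date;Tekst\n"
    "2023-01-01;Vandaag was een goede dag.\n"
    "2023-01-02;Lang gewerkt, moe.\n"
)
rows = check_diary_rows(sample)
print(len(rows), "entries")
```

Commas inside an entry are harmless because the delimiter is `;`. An entry that itself contains a semicolon needs to be quoted in the CSV file, otherwise the check reports too many columns.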


In [1]:
from openai import OpenAI

# Put the endpoint URL (host:port) of your local inference server (in LM Studio) here
client = OpenAI(
    api_key='not-needed', 
    base_url='http://localhost:1234/v1')

# Adjust the following based on the model type
# Alpaca style prompt format (suitable for Vicuna):
prefix = "### Instruction:\n" 
suffix = "\n### Response:"

# 'Llama2 Chat' prompt format (required for some other LLMs):
#prefix = "[INST]"
#suffix = "[/INST]"

temperature = 0 # Vary the temperature if needed; higher values give more hallucinations. I had the best results at 0

def get_completion(prompt, messages, temperature=0.0):
    formatted_prompt = f"{prefix}{prompt}{suffix}"
    messages.extend([{"role": "user", "content": formatted_prompt}])
    response = client.chat.completions.create(
        model="local model",
        messages=messages,
        temperature=temperature,
        presence_penalty=1.1
    )
    return response.choices[0].message.content

# These are the prompts I run on every diary entry. Adapt them to your needs (e.g., they currently assume Dutch input and ask for responses in English).
systemprompt = 'You are a thorough and truthful assistant that analyses psychological content of a diary.'
prompts = []
prompts.append('I will give you a part of a diary in Dutch. Formatted in Markdown. We\'re going to analyse who the user is. We\'ll do that in a few steps. First, please extract a single "happiness score" from 1-10 that reflects how happy the user seems to be that day, 1 being very unhappy and 10 being very happy. Only provide number as output, nothing else, so just a single number. If you do not have sufficient information, respond with "blank" only and nothing else (no explanation is required).')
prompts.append('Now, based on the diary, please provide any key takeaways, in short sentences (in English, formatted as JSON array) related to work, family, personal development or spiritual. If no information is related to a particular category, respond with "blank". Format the output as follows: Work: [work-related information], Family: [family-related information], Personal Development: [personal development information], Spiritual: [spiritual information].')
prompts.append('Now, based on the diary, provide a few summarizing keywords (in English, separated by commas) that characterize the main experiences that made the user happy. Keep your response short. If you do not have sufficient information, respond "blank" and do not make up content that was not provided by the user. Only provide the keywords as output, nothing else. Respond in English.')
prompts.append('Now, based on the diary, provide a few summarizing keywords (in English, separated by commas) that characterize the main experiences that made the user unhappy. Keep your response short. If you do not have sufficient information, respond "blank" and do not make up content that was not provided by the user. Only provide the keywords as output, nothing else. Respond in English.')
prompts.append('Now, based on the diary, please use a few keywords (in English, separated by commas) that could help determine personality traits that are apparent from this text. If none are available (which may be likely), respond with "blank" only and nothing else (no explanation is required).')
prompts.append('Now, based on the diary, please think about a few summarizing keywords (in English, separated by commas) about what is important for the user and what motivates him. Keep your response short. If you do not have sufficient information, respond "blank" and do not make up content that was not provided by the user. Only provide the keywords as output, nothing else. Respond in English.')


In [2]:
# Test connection with LM Studio
system_prompt = {"role": "system", "content": systemprompt}
messages = [system_prompt]
row = "Ik ben vandaag zo vrolijk, zo vrolijk, zo vrolijk. Ik ben behoorlijk vrolijk, zo vrolijk was ik nooit. op mijn werk heb ik gelachen met Tom"
for i in range(len(prompts)):
    if i==0:
        prompt = prompts[0] + '\n\n' + row
    else:
        prompt = prompts[i]
    response = get_completion(prompt, messages, temperature)
    print(response)
    if (response):
        messages.extend([{"role": "assistant", "content": response}]) 


Blank
{
 "Work": ["I laughed with Tom at work"],
 "Family": [],
 "Personal Development": [],
 "Spiritual": []
}
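
The first prompt asks for a single number or "blank", and the test output above shows the model replying `Blank` with a capital letter. For later analysis a tolerant parser may be useful. The following is a minimal sketch; `parse_happiness_score` is a hypothetical helper, not part of the original notebook, and it assumes that any reply starting with "blank" (in any capitalization) means no score.

```python
import re

def parse_happiness_score(response):
    """Return the score as an int from 1 to 10, or None for 'blank' or unparseable replies."""
    text = response.strip()
    if text.lower().startswith("blank"):
        return None
    # Take the first standalone number between 1 and 10 in the reply
    match = re.search(r"\b(10|[1-9])\b", text)
    return int(match.group(1)) if match else None

for reply in ["Blank", " 8 \n", "Score: 7/10", "no idea"]:
    print(repr(reply), "->", parse_happiness_score(reply))
```

Returning `None` instead of raising keeps a long run going when the model ignores the "single number" instruction; such days can be filtered out afterwards.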


In [None]:
import csv
# Read diary, ask prompt, write output to new file
# Create output file: I had to enforce utf-8 and fix some weird emoticons in my diary.
with open('diaryLLMoutput.txt', 'a', encoding="utf-8") as outputfile:
    outputfile.write('\n\n-----------------------------\n NEW RUN\n-----------------------------')
    # Opening the diary file, also as UTF-8 and with ; as delimiter
    with open('diary.csv', newline='', encoding="UTF-8") as csvfile:
        diaryreader = csv.reader(csvfile, delimiter=';')
        line_count = 0
        system_prompt = {"role": "system", "content": systemprompt}
        messages = [system_prompt] # initialize with system prompt
        for row in diaryreader:
            if line_count > 0:
                messages = [system_prompt] # start over again PER DAY (I also tried to retain a few days but this did not improve results considerably)
                response = []
                print('\nProcessing ' + row[0])
                # Remark: processing takes a long time, much longer than you would expect from cloud-based solutions such as ChatGPT. So be patient.
                for i in range(len(prompts)):
                    if i==0:
                        prompt = prompts[0] + '\n\n' + row[1]
                    else:
                        prompt = prompts[i]
                    # Store the response
                    response.append(get_completion(prompt, messages=messages, temperature=temperature))
                    messages.extend([{"role": "assistant", "content": response[i]}]) 
                # Write the response to the file
                outputfile.write('\n' + row[0] + ';')
                for x in response:
                    outputfile.write(x+ ';')
                outputfile.flush()
            line_count += 1
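
Responses can contain semicolons or line breaks (the JSON takeaways in the test output span several lines), which makes the plain `;`-joined output file hard to split afterwards. One possible alternative, sketched here under the assumption that a quoted CSV output is acceptable, is to let `csv.writer` handle the quoting. The helper `write_day` and the sample responses are hypothetical.

```python
import csv
import io

def write_day(writer, date, responses):
    """Write one diary day as a single CSV record; the csv module quotes fields as needed."""
    writer.writerow([date] + list(responses))

# Demonstration with an in-memory buffer and made-up responses.
# For a real run, open the output file with newline='' and encoding='utf-8'.
buffer = io.StringIO(newline='')
writer = csv.writer(buffer, delimiter=';')
write_day(writer, '2023-01-01', ['8', '{\n "Work": ["a; b"]\n}', 'blank'])

# Reading the record back recovers every field unchanged,
# including the embedded semicolon and line breaks.
buffer.seek(0)
records = list(csv.reader(buffer, delimiter=';'))
print(records[0][0], len(records[0]))
```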
            

In [None]:
import os
import glob

# specify the directory where the daily notes are located
daily_notes_dir = 'daily_notes'

# find all Markdown files in the directory
markdown_files = glob.glob(os.path.join(daily_notes_dir, '*.md'))

# loop through the files and print the date and content as strings
for markdown_file in markdown_files:
    with open(markdown_file, 'r', encoding='utf-8') as f:
        content = f.read()
        date = os.path.splitext(os.path.basename(markdown_file))[0]
        print(f'Date: {date}\nContent:\n{content}\n')
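
The notebook does not show how the daily notes relate to diary.csv. One possible link, assuming each file name (without `.md`) is the date of the entry, is to write the notes into the `Date;Tekst` layout that the processing cell reads. The function name `notes_to_diary_csv` and the sample note are hypothetical.

```python
import csv
import glob
import os
import tempfile

def notes_to_diary_csv(notes_dir, csv_path):
    """Write one row per Markdown note: file name (without .md) as Date, content as Tekst."""
    with open(csv_path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile, delimiter=';')
        writer.writerow(['Date', 'Tekst'])
        # sorted() gives a stable, name-based order (chronological for ISO-style dates)
        for path in sorted(glob.glob(os.path.join(notes_dir, '*.md'))):
            with open(path, 'r', encoding='utf-8') as f:
                content = f.read()
            date = os.path.splitext(os.path.basename(path))[0]
            writer.writerow([date, content])

# Demonstration in a temporary directory with one made-up note
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, '2023-01-01.md'), 'w', encoding='utf-8') as f:
        f.write('Vandaag gewandeld; mooi weer.')
    out_path = os.path.join(tmp, 'diary.csv')
    notes_to_diary_csv(tmp, out_path)
    with open(out_path, newline='', encoding='utf-8') as csvfile:
        converted = list(csv.reader(csvfile, delimiter=';'))
print(converted)
```

Notes that contain semicolons or line breaks are quoted by `csv.writer`, so `csv.reader` with `delimiter=';'` reads them back as a single Tekst field.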