A script to analyze Obsidian daily note files day-by-day with a local large language model (LLM)

Inspired by this blog post: https://www.transformingmed.tech/p/decoding-happiness-an-ai-driven-experiment
Credit to Matthijs Cluitmans, Transforming Med.Tech (http://transformingmed.tech/)

Steps:
- Install LM Studio (https://lmstudio.ai/)
- In LM Studio, find an LLM that fits the specs of your computer. LM Studio will tell you whether a model will run or not. I used the Vicuna LLM (vicuna-13b-v1.5-16k.Q5_K_M.gguf) from Hugging Face: https://huggingface.co/TheBloke/vicuna-13B-v1.5-16K-GGUF
- In LM Studio, go to Local Inference Server, load the model (with the appropriate preset, e.g. Vicuna 1.5 16k), and start the server
- Then run this Jupyter notebook with your local copy of Obsidian notes in the 'daily_notes' folder
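
The cells below take each note's date from its file name, and the cleanup step later looks for lines that start with a `YYYY-MM-DD` date. It can therefore help to check the file names before starting a long run. This is a minimal sketch, assuming the notes are named like `2024-01-31.md`; `split_notes_by_name` is a hypothetical helper and not part of the original notebook.

```python
import glob
import os
import re

# Assumption: daily notes are named YYYY-MM-DD.md
DATE_NAME = re.compile(r'^\d{4}-\d{2}-\d{2}$')

def split_notes_by_name(notes_dir):
    """Return (dated, other): Markdown file names with and without a date-shaped name."""
    dated, other = [], []
    for path in sorted(glob.glob(os.path.join(notes_dir, '*.md'))):
        name = os.path.basename(path)
        stem = os.path.splitext(name)[0]
        if DATE_NAME.match(stem):
            dated.append(name)
        else:
            other.append(name)
    return dated, other

dated, other = split_notes_by_name('./daily_notes/')
print(f'{len(dated)} dated notes, {len(other)} other Markdown files: {other}')
```

Files listed under `other` would still be sent to the LLM by the processing loop, but their records would probably not be recognized by the date-based cleanup step.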

In [1]:
from openai import OpenAI

# Put the URL (host:port) of your local inference server (in LM Studio) here
client = OpenAI(
    api_key='not-needed', 
    base_url='http://localhost:1234/v1')

# Adjust the following based on the model type
# Alpaca style prompt format (suitable for Vicuna):
prefix = "### Instruction:\n" 
suffix = "\n### Response:"

# 'Llama2 Chat' prompt format (required for some other LLMs):
#prefix = "[INST]"
#suffix = "[/INST]"

temperature = 0 # Vary the temperature if needed, but a higher temperature gives more hallucinations. I had the best results at 0

def get_completion(prompt, messages, temperature=0.0):
    """Send one prompt to the local server and return the reply text.

    Note: the user message is appended to `messages` in place.
    """
    formatted_prompt = f"{prefix}{prompt}{suffix}"
    messages.append({"role": "user", "content": formatted_prompt})
    response = client.chat.completions.create(
        model="local model",
        messages=messages,
        temperature=temperature,
        presence_penalty=1.1
    )
    return response.choices[0].message.content

# These are the prompts I run on every diary input. Adapt them to your needs (e.g., right now they assume Dutch input and ask for English output).
systemprompt = 'You are a thorough and truthful assistant that analyses psychological content of a diary.'
prompts = []
prompts.append('I will give you a part of a diary in Dutch, formatted in Markdown. Entries in the diary are separated with \'---\'. We\'re going to analyse who the user is. We\'ll do that in a few steps. First, please extract a single "happiness score" from 1-10 that reflects how happy the user seems to be that day, 1 being very unhappy and 10 being very happy. Only provide the number as output, nothing else, so just a single number. If you do not have sufficient information, respond with "blank" only and nothing else (no explanation is required).')
prompts.append('Now, based on the diary, please provide any key takeaways, in short sentences (in English, formatted as JSON array) related to work, family, personal development or spiritual. If no information is related to a particular category, respond with "blank". Format the output as follows: Work: [work-related information], Family: [family-related information], Personal Development: [personal development information], Spiritual: [spiritual information].')
prompts.append('Now, based on the diary, provide a few summarizing keywords (in English, separated by commas) that characterize the main experiences that made the user happy. Keep your response short. If you do not have sufficient information, respond "blank" and do not make up content that was not provided by the user. Only provide the keywords as output, nothing else. Respond in English.')
prompts.append('Now, based on the diary, provide a few summarizing keywords (in English, separated by commas) that characterize the main experiences that made the user unhappy. Keep your response short. If you do not have sufficient information, respond "blank" and do not make up content that was not provided by the user. Only provide the keywords as output, nothing else. Respond in English.')
prompts.append('Now, based on the diary, please use a few keywords (in English, separated by commas) that could help determine personality traits that are apparent from this text. If none are available (which may be likely), respond with "blank" only and nothing else (no explanation is required).')
prompts.append('Now, based on the diary, please think about a few summarizing keywords (in English, separated by commas) about what is important for the user and what motivates him. Keep your response short. If you do not have sufficient information, respond "blank" and do not make up content that was not provided by the user. Only provide the keywords as output, nothing else. Respond in English.')


In [None]:
# Test connection with LM Studio
system_prompt = {"role": "system", "content": systemprompt}
messages = [system_prompt]
row = "Ik ben vandaag zo vrolijk, zo vrolijk, zo vrolijk. Ik ben behoorlijk vrolijk, zo vrolijk was ik nooit. op mijn werk heb ik gelachen met Tom"
for i in range(len(prompts)):
    if i==0:
        prompt = prompts[0] + '\n\n' + row
    else:
        prompt = prompts[i]
    response = get_completion(prompt, messages, temperature)
    print(response)
    if (response):
        messages.extend([{"role": "assistant", "content": response}]) 


In [None]:
import os
import glob

# specify the directory where the daily notes are located
daily_notes_dir = './daily_notes/'

# find all Markdown files in the directory
markdown_files = glob.glob(os.path.join(daily_notes_dir, '*.md'))

# read the output file (if it exists yet) so we can skip already processed files
existing_output = ''
if os.path.exists('diaryLLMoutput.txt'):
    with open('diaryLLMoutput.txt', 'r', encoding="utf-8") as output:
        existing_output = output.read()

with open('diaryLLMoutput.txt', 'a', encoding="utf-8") as outputfile:
    outputfile.write('\n\n-----------------------------\n NEW RUN\n-----------------------------')
    
    # loop through the files and print the date and content as strings
    for markdown_file in markdown_files:
        system_prompt = {"role": "system", "content": systemprompt}
        with open(markdown_file, 'r', encoding='utf-8') as f:
            daily_note = f.read()
            date = os.path.splitext(os.path.basename(markdown_file))[0]

            # If you restart the process we want to skip already processed notes
            if date in existing_output:
                print(f'Skipping Date: {date}; already processed')
                continue
            
            messages = [system_prompt] # start over again PER DAY (I also tried to retain a few days but this did not improve results considerably)
            print(f'\nProcessing: Date: {date}\nContent:\n{daily_note}\n')

            # Remark: processing takes a long time, much longer than what you'd expect from cloud-based solutions such as ChatGPT. So be patient.
            outputfile.write('\n' + date + ';')
            outputfile.flush()
            for i in range(len(prompts)):
                if i==0:
                    prompt = prompts[0] + '\n\n' + daily_note
                else:
                    prompt = prompts[i]
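                # An empty reply is stored as an empty field instead of raising an error
                response = get_completion(prompt, messages=messages, temperature=temperature) or ''
                # Write the response to the file
                outputfile.write(response + ';')
                # Keep the answer in the history so the follow-up prompts have context
                messages.append({"role": "assistant", "content": response})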
                outputfile.flush()
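
Each response is written to the output file followed by `;`, and the cleanup cell later splits records on `;` and starts a new record at every line that begins with a date. A response that itself contains a semicolon or a line break could therefore shift or split the columns. The following is a minimal sketch of one way to guard against that; `sanitize_response` is a hypothetical helper, and replacing `;` with `,` is an assumption about what is acceptable for the analysis.

```python
def sanitize_response(response):
    """Make an LLM response safe to store as a single ';'-separated field."""
    if not response:
        return ''
    # Collapse line breaks and runs of whitespace into single spaces
    flat = ' '.join(response.split())
    # Replace the field separator so it cannot create extra columns
    return flat.replace(';', ',')

print(sanitize_response('Work: [busy day];\nFamily: [dinner]'))
# prints: Work: [busy day], Family: [dinner]
```

In the loop above, this could be applied as `outputfile.write(sanitize_response(response) + ';')`.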

We now have a text file containing the output from the LLM. In my run, the format of that output was not consistent. The script below parses the output, cleans it up, and writes it column by column to a CSV file.
Please note that in my data set I had to correct 5 rows because the LLM generated a lot of duplicate phrases. I recommend printing a single column and scrolling through it to verify the output.
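
As a rough aid for spotting the duplicate phrases mentioned above, a keyword field can be checked for phrases that repeat often. This is only a sketch: `repeated_phrases` is a hypothetical helper, and the threshold of 3 repetitions is an arbitrary assumption that you may need to tune.

```python
from collections import Counter

def repeated_phrases(field, min_count=3):
    """Return the comma-separated phrases that occur at least min_count times."""
    phrases = [p.strip().lower() for p in field.split(',') if p.strip()]
    counts = Counter(phrases)
    return {phrase: n for phrase, n in counts.items() if n >= min_count}

sample = 'family, work, family, walking, family, work'
print(repeated_phrases(sample))
# prints: {'family': 3}
```

A non-empty result for a field is a hint that the row deserves a manual look; it does not replace reading through the output.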

In [62]:
import os
import re
import json

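def read_chuncks_from_file(file):
    """Split the LLM output file into chunks, one per line that starts with a date."""
    chunks = []

    # The output file was written as UTF-8, so read it the same way
    with open(file, 'r', encoding='utf-8') as input_file:
        chunk = ''
        for line in input_file:
            line = line.replace('\n','')
            # A line that starts with a date (YYYY-MM-DD) begins a new chunk
            if re.match(r'\d{4}-\d{2}-\d{2}', line):
                chunks.append(chunk)
                chunk = line
            else:
                chunk = chunk + line
        # Also keep the last chunk, which is not followed by another date line
        chunks.append(chunk)
    return chunks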

def find_happiness(text):
    # find first integer
    match = re.search(r'\b\d+\b', text)
    if match:
        return match.group()
    return ''

def find_takeaways(subject, text):
    result = text.split(subject)
        
    if len(result) == 2:
        match = re.search(r'\[(.*?)\]', result[1])
        if match:
            return match.group(1).replace('"','')
    return ''

def ignore_words(text, words):
    for word in words:
        text = text.replace(word, '')
    return text


def trim(text):
    return ignore_words(text, ['Blank', 'blank', '[', ']', '"']).strip()

# Try to delete the file
if os.path.exists('diary.csv'):
    os.remove('diary.csv')

with open('diary.csv', 'w', encoding='utf-8') as output_file:
    # Header row; the trailing newline keeps it separate from the first data row
    output_file.write('date;happiness score;takeaways work;takeaways family;takeaways personal development;takeaways spiritual;keywords happy;keywords unhappy;personality traits;keywords motivation\n')
    
    dates = read_chuncks_from_file('diaryLLMoutput.txt')
    # Iterate through each date
    for date in dates:

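        # Skip empty chunks and text before the first date (e.g. the 'NEW RUN' banner)
        if not re.match(r'\d{4}-\d{2}-\d{2}', date):
            continue

        # parse, standardize and clean up the content
        columns = date.split(';')
        # Pad incomplete records (e.g. from an interrupted run) so the indexing below is safe
        columns += [''] * (7 - len(columns))
        c_date = columns[0]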
        c_happiness = trim(find_happiness(columns[1]))
        c_work = trim(find_takeaways('Work', columns[2]))
        c_family = trim(find_takeaways('Family', columns[2]))
        c_personal = trim(find_takeaways('Personal Development', columns[2]))
        c_spiritual = trim(find_takeaways('Spiritual', columns[2]))
        c_happy = trim(ignore_words(columns[3], ['happy', 'happiness, ', 'happy experiences: ', 'happiness-experiences=', 'happiness-keywords=', 'happiness-keywords, ']))
        c_unhappy = trim(ignore_words(columns[4], ['unhappiness-keywords: ', 'unhappy, ', 'unhappiness, ', 'unhappiness-experiences=', 'unhappy experiences: ', 'unhappiness-keywords, ']))
        c_traits = trim(ignore_words(columns[5], ['personality-traits=', 'personality traits: ', 'Personality traits: ', 'Personality Traits: ', 'personality-traits-keywords, ', 'personality-trait-keywords, ', 'Keywords: ', 'personality-keywords=', 'personality-keywords, ', 'personality-keywords:, ']))
        c_motivation = trim(ignore_words(columns[6], ['importance-motivation=', 'importance-motivation: ', 'what\'s important/motivates: ', 'importance-motivation-keywords, ', 'importance-motivation-keywords: ', 'Keywords: ', 'Important/motivation: ', 'motivation-keywords: ', 'motivation-keywords', 'importance-keywords, ', 'Important for User/Motivation: ', 'importance/motivation = ']))

        # write csv row to file
        output_file.write(';'.join([c_date, c_happiness, c_work, c_family, c_personal, c_spiritual, c_happy, c_unhappy, c_traits, c_motivation]) + '\n')
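
To follow the recommendation of printing a single column for review, the CSV can be read back with Python's `csv` module. This is a minimal sketch, assuming `diary.csv` is `;`-separated, has its header on a separate first line, and is readable as UTF-8; `read_column` is a hypothetical helper.

```python
import csv
import os

def read_column(csv_path, column, encoding='utf-8'):
    """Return (date, value) pairs for one named column of a ';'-separated file."""
    with open(csv_path, newline='', encoding=encoding) as f:
        reader = csv.DictReader(f, delimiter=';')
        return [(row['date'], row[column]) for row in reader]

# Print one column for manual review, if the file has been generated
if os.path.exists('diary.csv'):
    for date, value in read_column('diary.csv', 'keywords happy'):
        print(date, '|', value)
```

Changing `'keywords happy'` to another header name lets you scroll through each column in turn.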
