# Generating Synthetic Journal Entries


To show how our journal summarization app works, we'll start by generating synthetic journal entries for a fictional author over a date range, and write them in the format that our autojournal app expects.

How about we write synthetic entries for a space explorer from the future?

First, we'll decide which days he writes on. We can't expect anyone to write every single day, so we'll model the gaps between entries with an exponential distribution (plus some modifications), which is the distribution of the waiting times between events in a Poisson point process.

From there, we'll construct the set of dates that our explorer, Battuta, writes on and the number of days that have passed since his first entry. We'll feed this custom information to our LLM alongside a standard set of instructions, to guide it toward generating nicely varied journal entries.

In [82]:
from langchain_community.llms import Ollama
import numpy as np
import datetime
import time

In [81]:
class GenSynthJournal:

    def __init__(self):

        self.get_days_from_first_entry()
        self.get_journal_date_strings()
        self.get_locations()
        #self.get_prompt()

        self.llm = Ollama(model="llama3")

    def get_days_from_first_entry(self):
        num_entries = 100
        avg_days_till_next_entry = 5
        days_to_entry = np.random.exponential(avg_days_till_next_entry, num_entries)
        days_to_entry = days_to_entry.astype(int) # map to integers to get gaps between dates
        days_to_entry = np.clip(days_to_entry, 1, 365) # let's avoid multiple entries per date; make the min value 1 and max 365
        self.days_from_first_entry = days_to_entry.cumsum()

    def get_journal_date_strings(self):
        first_date_str = '2124-01-01'  # matches the date range used in the cells and backfill command below
        first_date = datetime.datetime.strptime(first_date_str,'%Y-%m-%d')
        journal_dates = [first_date + datetime.timedelta(days=int(el)) for el in self.days_from_first_entry]
        self.journal_date_strings = [datetime.datetime.strftime(el, '%Y-%m-%d') for el in journal_dates]

    def get_locations(self):
        location_1 = ["Zardon, a cold and desolate ice planet. "]*40
        location_2 = ["Last Call, an unregulated space port open to smugglers. "] * 10
        location_3 = [ "Myros, a warm water world filled with strange beautiful creatures. "] * 50
        self.locations = location_1 + location_2 + location_3

    def get_prompt(self, i):
        i_date = self.journal_date_strings[i]
        i_days = str(self.days_from_first_entry[i])
        i_entry = str(i+1)
        i_location = self.locations[i]
        
        str_1 = "You are the thoughtful, curious space explorer Battuta sitting down at the end of the day to write a journal entry. The year is " + i_date + ". "
        str_2 = "You are writing Battuta's journal number " + i_entry + " and it has been " + i_days + " days since the first entry. "
        str_3 = "Your ship's navigational charts indicate your location is " + i_location
        str_4 = "Please write a short journal entry of what happened to you, space explorer Battuta, today, and what you felt and learned."
        
        prompt_strings = [str_1, str_2, str_3, str_4]
        
        journal_prompt = ('').join(prompt_strings)
        return journal_prompt

    def build_journal_db(self, data_dir="./data/synth_journal/"):
        # Write one markdown file per entry, named after its date.
        # data_dir must already exist and end with a path separator.
        for i, jdate in enumerate(self.journal_date_strings):
            print("starting run number " + str(i+1) + " at " + time.ctime())
            prompt = self.get_prompt(i=i)
            response = self.llm.invoke(prompt)  # invoke() replaces the deprecated direct call
            with open(data_dir + jdate + '.md', 'w') as f:
                f.write("Date: " + jdate + "\n\n")
                f.write(response)

In [53]:
num_entries = 100
average_days_till_next_entry = 5
days_to_entry = np.random.exponential(average_days_till_next_entry, num_entries)
days_to_entry = days_to_entry.astype(int) # map to integers to get gaps between dates
days_to_entry = np.clip(days_to_entry, 1, 365) # let's avoid multiple entries per date; make the min value 1 and max 365
days_from_first_entry = days_to_entry.cumsum()
days_from_first_entry

array([ 23,  24,  25,  30,  32,  37,  40,  52,  71,  72,  73,  76,  79,
        87,  99, 103, 108, 109, 110, 111, 123, 128, 136, 154, 157, 158,
       160, 163, 165, 167, 168, 175, 176, 177, 184, 185, 191, 193, 194,
       201, 210, 214, 215, 216, 217, 218, 219, 220, 221, 229, 230, 235,
       239, 251, 252, 255, 256, 260, 263, 267, 281, 286, 287, 288, 292,
       293, 294, 298, 299, 302, 303, 309, 310, 316, 326, 333, 334, 335,
       338, 339, 340, 344, 355, 356, 360, 363, 370, 371, 376, 377, 379,
       386, 390, 404, 405, 419, 423, 426, 427, 433])
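The integer truncation and the clip to a minimum of 1 shift the average gap away from the nominal 5 days. As a rough check, the sketch below compares a seeded simulation against the closed-form mean of a floored exponential whose zeros are raised to 1. The helper names `sample_gaps` and `expected_gap` are illustrative and not part of the autojournal code, and the formula ignores the upper clip at 365 because it is almost never reached with a 5-day scale.

```python
import numpy as np

def sample_gaps(mean_gap, n, rng):
    """Draw n gaps the same way as the cell above: exponential, floored, clipped to [1, 365]."""
    gaps = rng.exponential(mean_gap, n).astype(int)
    return np.clip(gaps, 1, 365)

def expected_gap(mean_gap):
    """Mean of floor(X) for X ~ Exp(mean_gap), with zeros raised to 1 (upper clip ignored)."""
    lam = 1.0 / mean_gap
    mean_of_floor = 1.0 / np.expm1(lam)   # floor(X) is geometric on 0, 1, 2, ...
    prob_zero = 1.0 - np.exp(-lam)        # each zero is bumped up to 1
    return mean_of_floor + prob_zero

rng = np.random.default_rng(0)  # seeded so the check is repeatable
gaps = sample_gaps(5, 200_000, rng)
empirical_mean = gaps.mean()
analytic_mean = expected_gap(5)
print(f"empirical: {empirical_mean:.3f}, analytic: {analytic_mean:.3f}")
```

Under these assumptions the average spacing comes out a little below 5 days, so 100 entries should span somewhat less than 500 days.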

In [83]:
first_date_str = '2124-01-01'
first_date = datetime.datetime.strptime(first_date_str,'%Y-%m-%d')
journal_dates = [first_date + datetime.timedelta(days=int(el)) for el in days_from_first_entry]
journal_date_strings = [datetime.datetime.strftime(el, '%Y-%m-%d') for el in journal_dates]
journal_date_strings[0:10]

['2124-01-24',
 '2124-01-25',
 '2124-01-26',
 '2124-01-31',
 '2124-02-02',
 '2124-02-07',
 '2124-02-10',
 '2124-02-22',
 '2124-03-12',
 '2124-03-13']
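Because every gap is at least one day, the cumulative offsets are strictly increasing, so each entry gets its own date and therefore its own file name. Here is a minimal sketch of the same conversion using `datetime.date`, with a hypothetical helper `offsets_to_date_strings` and the first few offsets from the run above:

```python
import datetime

def offsets_to_date_strings(first_date_str, offsets):
    """Convert day offsets from a start date into ISO 'YYYY-MM-DD' strings."""
    first_date = datetime.date.fromisoformat(first_date_str)
    return [(first_date + datetime.timedelta(days=int(d))).isoformat() for d in offsets]

date_strings = offsets_to_date_strings("2124-01-01", [23, 24, 25, 30, 32])

# Strictly increasing offsets give unique dates, so no file is overwritten
assert len(set(date_strings)) == len(date_strings)
print(date_strings)
```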

In [84]:

location_1 = ["Zardon, a cold and desolate ice planet. "]*40
location_2 = ["Last Call, an unregulated space port open to smugglers. "] * 10
location_3 = [ "Myros, a warm water world filled with strange beautiful creatures. "] * 50
locations = location_1 + location_2 + location_3
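The prompt code looks up `locations[i]` for every entry, so the list needs exactly one location per entry (40 + 10 + 50 = 100 here). The sketch below builds the same list from (description, count) pairs and fails early if the counts do not add up. `location_plan` and `expand_locations` are hypothetical names introduced for illustration.

```python
location_plan = [
    ("Zardon, a cold and desolate ice planet. ", 40),
    ("Last Call, an unregulated space port open to smugglers. ", 10),
    ("Myros, a warm water world filled with strange beautiful creatures. ", 50),
]

def expand_locations(plan, num_entries):
    """Repeat each description `count` times, in order, and check the total length."""
    expanded = [description for description, count in plan for _ in range(count)]
    if len(expanded) != num_entries:
        raise ValueError(f"expected {num_entries} locations, got {len(expanded)}")
    return expanded

expanded_locations = expand_locations(location_plan, num_entries=100)
print(len(expanded_locations), expanded_locations[40])
```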

In [72]:

i = 0

i_date = journal_date_strings[i]
i_days = str(days_from_first_entry[i])
i_entry = str(i+1)
i_location = locations[i]

str_1 = "You are the thoughtful, curious space explorer Battuta sitting down at the end of the day to write a journal entry. The year is " + i_date + ". "
str_2 = "You are writing Battuta's journal number " + i_entry + " and it has been " + i_days + " days since the first entry. "
str_3 = "Your ship's navigational charts indicate your location is " + i_location
str_4 = "Please write a short journal entry of what happened to you, space explorer Battuta, today, and what you felt and learned."

prompt_strings = [str_1, str_2, str_3, str_4]

journal_prompt = ('').join(prompt_strings)
journal_prompt

"You are the thoughtful, curious space explorer Battuta sitting down at the end of the day to write a journal entry. The year is 1520-01-24. You are writing Battuta's journal number 1 and it has been 23 days since the first entry. Your ship's navigational charts indicate your location is Xlaxos, a black hold said to contain untold cosmic horrors. Please write a short journal entry of what happened to you, space explorer Battuta, today, and what you felt and learned."

In [65]:

i = 0

i_date = journal_date_strings[i]
i_days = str(days_from_first_entry[i])
i_entry = str(i+1)


str_1 = "You are the great Zen Master Ikkyu sitting down at the end of the day to write a journal entry. The year is " + i_date + ". "
str_2 = "You are writing Ikkyu's journal number " + i_entry + " and it has been " + i_days + " days since the first entry. "
str_3 = "Please write a short journal entry of what happened to you, Zen Master Ikkyu, today, and what you felt and learned."

prompt_strings = [str_1, str_2, str_3]

journal_prompt = ('').join(prompt_strings)
journal_prompt

"You are the great Zen Master Ikkyu sitting down at the end of the day to write a journal entry. The year is 1520-01-24. You are writing Ikkyu's journal number 1 and it has been 23 days since the first entry.Please write a short journal entry of what happened to you, Zen Master Ikkyu, today, and what you felt and learned."

In [73]:
llm_obj = Ollama(model="llama3")

In [74]:
llm_obj.invoke(journal_prompt)

"**Battuta's Journal Number 1, Entry 23**\n\nJanuary 24, 1520\n\nAs I set down my pen tonight, my hands still tremble with the aftershocks of this day's discoveries. Xlaxos, the black hold, is a place that defies comprehension. Our charts warned us of its malevolent presence, but nothing could have prepared me for what lies within.\n\nWe entered Xlaxos at dawn, our ship's lights piercing the darkness like futile attempts to ward off the unseen. The silence was oppressive, punctuated only by the creaks and groans of our vessel as it strained against some invisible force. My crew and I exchanged nervous glances, sensing that we were not alone.\n\nAs we ventured deeper into the hold, the air grew thick with an otherworldly energy. I felt it like a palpable presence, weighing upon my shoulders. The darkness seemed to take on a life of its own, tendrils of shadowy matter reaching out to ensnare us. My crew's apprehension turned to terror as we stumbled upon... things. Twisted, eldritch abom

### Backfill

We ran the script version of this notebook to generate the journal dataset:

python gen_synthetic_journal.py
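A full run calls the model once per entry, so it can be worth dry-running the file-writing loop first. The sketch below swaps the model for a hypothetical `fake_llm` function and writes into a temporary directory. `write_entries` is an illustrative stand-in that mirrors the file layout of `build_journal_db` (a `Date: ` line, a blank line, then the entry text); it is not part of the class or the script.

```python
import os
import tempfile

def fake_llm(prompt):
    # Stand-in for the real model call; returns a canned entry
    return "Placeholder entry for prompt of length " + str(len(prompt))

def write_entries(date_strings, make_prompt, llm, data_dir):
    """Write one '<date>.md' file per entry and return the paths written."""
    os.makedirs(data_dir, exist_ok=True)  # create the output directory if missing
    paths = []
    for i, jdate in enumerate(date_strings):
        response = llm(make_prompt(i))
        path = os.path.join(data_dir, jdate + ".md")
        with open(path, "w", encoding="utf-8") as f:
            f.write("Date: " + jdate + "\n\n")
            f.write(response)
        paths.append(path)
    return paths

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = write_entries(
        ["2124-01-24", "2124-01-25"],
        make_prompt=lambda i: "entry " + str(i + 1),
        llm=fake_llm,
        data_dir=os.path.join(tmp_dir, "synth_journal"),
    )
    written_names = [os.path.basename(p) for p in paths]
    with open(paths[0], encoding="utf-8") as f:
        first_file_text = f.read()

print(written_names)
```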


Then we backfilled smart summaries with the following command:

python backfill_smart_summaries.py -sd 2124-01-01 -ed 2125-01-01 -bfm True