# Instructing LLMs To Match Tone

LLMs that generate text are awesome, but what if you want to change the tone or style they respond with?

We've all seen the [pirate](https://python.langchain.com/en/latest/modules/agents/agents/custom_llm_agent.html#:~:text=template%20%3D%20%22%22%22Answer%20the%20following%20questions%20as%20best%20you%20can%2C%20but%20speaking%20as%20a%20pirate%20might%20speak.%20You%20have%20access%20to%20the%20following%20tools%3A) examples, but wouldn't it be awesome if we could tune the prompt to match the tone of specific people?

Below is a series of techniques for generating text in the tone and style you want. No single technique is likely to be *exactly* what you need, but I'm confident you can iterate with these tips to get a solid outcome for your project.

But Greg, what about fine-tuning? Fine-tuning would likely give you a fabulous result, but the barriers to entry are too high for the average developer (as of May '23). I would rather get the 87% solution today than not ship anything. If you're doing this in production and your differentiator is your ability to adapt to different styles, you'll likely want to explore fine-tuning.

If you want to see a demo video of this, check out the Twitter post. For a full explanation, head over to YouTube.

### 4 Levels Of Tone Matching Techniques:
1. **Simple:** As a human, try to describe the tone you would like
2. **Intermediate:** Include your description + examples
3. **AI-Assisted:** Ask the LLM to extract the tone, then use its output in your next prompt
4. **Technique Fusion:** Combine multiple techniques to mimic tone
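
As a rough illustration of how the first two levels differ, the sketch below builds both prompts as plain strings. The function names, the tone description, and the sample texts are hypothetical placeholders and are not part of the notebook's own code.

```python
def simple_prompt(tone_description: str, task: str) -> str:
    """Level 1: describe the tone in words and state the task."""
    return (
        f"Write in the following tone: {tone_description}\n\n"
        f"Task: {task}"
    )


def intermediate_prompt(tone_description: str, examples: list[str], task: str) -> str:
    """Level 2: the same description plus writing samples to imitate."""
    example_block = "\n".join(f"- {example}" for example in examples)
    return (
        f"Write in the following tone: {tone_description}\n\n"
        f"% START OF EXAMPLES\n{example_block}\n% END OF EXAMPLES\n\n"
        f"Task: {task}"
    )


# Hypothetical inputs for illustration only
tone = "casual, optimistic, short sentences"
samples = ["Ship it today.", "Small wins add up."]
print(intermediate_prompt(tone, samples, "Write a tweet about learning to code."))
```

Levels 3 and 4 reuse the same idea, except that the tone description comes from the LLM rather than from you.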

**Today's Goal**: Generate tweets mimicking the style of online personalities. You could customize this code to generate emails, chat messages, writing, etc.

First, let's configure LangSmith tracing and import our packages.

In [1]:
import os
from uuid import uuid4

unique_id = uuid4().hex[0:8]
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "<YOUR-LANGSMITH-API-KEY>"  # Update to your API key

# Used by the agent in this tutorial
# os.environ["OPENAI_API_KEY"] = "<YOUR-OPENAI-API-KEY>"


from langsmith import Client

client = Client()
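
Rather than writing keys into the notebook, one option is to read them from the environment and fail early when one is missing. This is a minimal sketch; `require_env` is a hypothetical helper, not something LangSmith or LangChain provides.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


# Hypothetical usage: set the variable in your shell or a .env file first
# langsmith_key = require_env("LANGCHAIN_API_KEY")
```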

In [2]:
import sys
print(sys.executable)

/Users/parkjimin/Desktop/test/venv_test/bin/python


In [3]:
import os
import pprint

from dotenv import load_dotenv
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import CacheBackedEmbeddings
from langchain.prompts import PromptTemplate
from langchain.schema import format_document
from langchain.storage import LocalFileStore
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import UnstructuredFileLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import OpenAIEmbeddings
# LangChain's vector store wrapper (an earlier `from pinecone import Pinecone` was shadowed by this import)
from langchain_pinecone import Pinecone

# Load environment variables (API keys) from a local .env file
load_dotenv()



True

In [4]:
openai_api_key = os.getenv('OPENAI_API_KEY')  # Read the key from the environment; never hardcode it
llm = ChatOpenAI(temperature=0, openai_api_key=openai_api_key, model_name='gpt-4')



In [5]:
# The writing samples are kept in Korean because the tone analysis runs on the original text.
# Gist of the active sample: blaming young people for avoiding hard work misses the point;
# sweating today while you can plan for the future is very different from sweating with no future in sight.
# The commented-out alternative is a cheerful self-introduction in the voice of Olaf the snowman.
# users_sentence="안녕하세요, 친구들! 나는 올라프야, 겨울이 너무 좋은 눈사람이지! 오늘은 따뜻한 코코아 한 잔과 함께 재미있는 이야기를 해볼까 해. 사실, 나는 여름을 너무 좋아한단다. 해변에서 수영하고 햇살 아래서 휴식을 취하는 상상을 하면 정말 기분이 좋아져. 그리고 다들 알다시피, 친구들 덕분에 나는 항상 행복해. 엘사, 안나, 크리스토프, 그리고 스벤, 모두 정말 소중한 친구들이야. 이 세상에서 가장 중요한 것은 사랑과 우정이니까, 너희도 항상 소중한 사람들과 함께 행복한 시간 보내길 바랄게. 자, 이제 나랑 함께 눈사람 만들러 갈래?"
users_sentence="젊은 사람들이 직장이 없어 가지고 난리 난리다 그렇게 얘기를 하면서도 막상 힘든 일은 하지 않는다라는 뭐 이런 거에 대해서 비판적인 얘기를 하잖아요 근데 그게 요즘 사람들이 정신력이 약하다던데 이런 식으로 봐서 나는 안 된다고 생각을 하는게 예를 들어서 뭐 나가서 지금이 친구 같은 경우에도 이렇게하면 40만 원 벌 수 있지 않냐 벌 수 있겠죠 근데 그 내가 다른 계획을 세울 수 있고 미래를 한 달 뒤든 1년 뒤든 생각을 할 수 있는 상태에서 오늘 땀을 흘리고 있는 거하고 아무것도 디자인을 할 수 없는 상태에서 오늘 힘든 일 하는 건 사람 정말 달라요 그니까 내가 한 달 뒤나 6개월 뒤가 깜깜한 상태라면 오늘 하루는 전혀 1m 밖에 나가면 절벽인 나발인지 모르는 어둠 속에서 정말 나는 아무 의미가 없다 이거죠."


### Attribute Feature Setup

In [6]:
def get_attribute():
    prompt = """
    Can you please generate a list of tone attributes, each with a short description, that can be used to describe a piece of writing?

    Things like pace, mood, etc.

    Respond with nothing else besides the list
    """
    how_to_describe_tone = llm.invoke(prompt).content  # predict() is deprecated; invoke() returns a message
    return how_to_describe_tone

In [7]:
how_to_describe_tone = get_attribute()



### Attribute Extraction

In [8]:
def get_authors_tone_description(how_to_describe_tone, users_sentence):
    template = """
        You are an AI Bot that is very good at generating writing in a similar tone as examples.
        Be opinionated and have an active voice.
        Take a strong stance with your response.

        % HOW TO DESCRIBE TONE
        {how_to_describe_tone}

        % START OF EXAMPLES
        {example}
        % END OF EXAMPLES

        List out the tone qualities of the examples above
        """

    prompt = PromptTemplate(
        input_variables=["how_to_describe_tone", "example"],
        template=template,
    )

    final_prompt = prompt.format(how_to_describe_tone=how_to_describe_tone, example=users_sentence)

    authors_tone_description = llm.invoke(final_prompt).content  # predict() is deprecated

    return authors_tone_description

In [9]:
authors_tone_description = get_authors_tone_description(how_to_describe_tone, users_sentence)
print(authors_tone_description)

1. Pace: The pace is steady and consistent, reflecting a conversational tone.
2. Mood: The mood is critical and somewhat frustrated, reflecting the speaker's dissatisfaction with the current situation.
3. Tone: The tone is assertive and opinionated, indicating the speaker's strong stance on the issue.
4. Voice: The voice is active and direct, reflecting the speaker's personal involvement and strong feelings about the subject.
5. Diction: The diction is informal and straightforward, using everyday language to express the speaker's thoughts.
6. Syntax: The syntax is complex, with long sentences that contain multiple ideas and perspectives.
7. Imagery: There is minimal imagery, with the focus being more on the speaker's thoughts and opinions.
8. Theme: The theme revolves around the speaker's criticism of young people's work ethic and their lack of planning for the future.
9. Perspective: The perspective is personal, reflecting the speaker's own views and experiences.
10. Structure: The st
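
Because the model returns the tone qualities as a numbered `Name: description` list, it can be handy to turn that text into a dictionary before reusing it. The sketch below assumes that line format holds; `parse_tone_attributes` and the sample string are hypothetical and not part of the notebook.

```python
import re


def parse_tone_attributes(description: str) -> dict[str, str]:
    """Turn lines like '1. Pace: steady' into {'Pace': 'steady'}."""
    attributes = {}
    for line in description.splitlines():
        # Match "<number>. <name>: <text>" and skip anything else
        match = re.match(r"\s*\d+\.\s*([^:]+):\s*(.+)", line)
        if match:
            attributes[match.group(1).strip()] = match.group(2).strip()
    return attributes


sample = "1. Pace: Steady and conversational.\n2. Mood: Critical and somewhat frustrated."
print(parse_tone_attributes(sample))
```

With a dictionary you can pick only the attributes you care about (for example pace and mood) when building the final prompt.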

In [10]:
authors_tone_description

"1. Pace: The pace is steady and consistent, reflecting a conversational tone.\n2. Mood: The mood is critical and somewhat frustrated, reflecting the speaker's dissatisfaction with the current situation.\n3. Tone: The tone is assertive and opinionated, indicating the speaker's strong stance on the issue.\n4. Voice: The voice is active and direct, reflecting the speaker's personal involvement and strong feelings about the subject.\n5. Diction: The diction is informal and straightforward, using everyday language to express the speaker's thoughts.\n6. Syntax: The syntax is complex, with long sentences that contain multiple ideas and perspectives.\n7. Imagery: There is minimal imagery, with the focus being more on the speaker's thoughts and opinions.\n8. Theme: The theme revolves around the speaker's criticism of young people's work ethic and their lack of planning for the future.\n9. Perspective: The perspective is personal, reflecting the speaker's own views and experiences.\n10. Structu

### Reference Author Extraction

In [11]:
def get_similar_public_figures(example_sentence):
    template = """
    You are an AI Bot that is very good at identifying authors, public figures, or writers whose style matches a piece of text
    Your goal is to identify which authors, public figures, or writers sound most similar to the text below

    % START EXAMPLES
    {example_sentence}
    % END EXAMPLES

    Which authors (list up to 4 if necessary) most closely resemble the examples above? Only respond with the names separated by commas
    """

    prompt = PromptTemplate(
        input_variables=["example_sentence"],
        template=template,
    )

    # Fill the template with the user's writing sample
    final_prompt = prompt.format(example_sentence=example_sentence)

    authors = llm.invoke(final_prompt).content  # predict() is deprecated
    return authors


In [12]:
authors = get_similar_public_figures(users_sentence)
print(authors)

Kim Young-ha
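
The prompt asks for up to four names separated by commas, so the reply may contain one name or several. A small, hypothetical helper like the one below (not part of the notebook) normalizes the reply into a list:

```python
def parse_author_names(raw: str, limit: int = 4) -> list[str]:
    """Split a comma-separated reply into a clean list of at most `limit` names."""
    names = [name.strip() for name in raw.split(",")]
    # Drop empty entries left by stray commas, then cap the list
    return [name for name in names if name][:limit]


print(parse_author_names("Kim Young-ha"))
print(parse_author_names("A, B , ,C,D,E"))
```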


In [13]:
# Embed a file and upsert its chunks into a Pinecone index
def embed_file(file_path, index_name="test"):
    file_name = os.path.basename(file_path)
    with open(file_path, "rb") as file:
        file_content = file.read()
    # Cache a local copy of the file (create the cache folder if it is missing)
    os.makedirs("./.cache/files", exist_ok=True)
    cached_path = f"./.cache/files/{file_name}"
    with open(cached_path, "wb") as f:
        f.write(file_content)
    # Cache embeddings on disk so repeated runs do not re-embed the same chunks
    cache_dir = LocalFileStore(f"./.cache/embeddings/{file_name}")
    loader = UnstructuredFileLoader(cached_path)
    # Split on newlines only; a line longer than chunk_size stays as one oversized chunk
    splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=200, chunk_overlap=100, separator="\n")
    docs = loader.load_and_split(text_splitter=splitter)
    embedder = OpenAIEmbeddings()
    cached_embedder = CacheBackedEmbeddings.from_bytes_store(embedder, cache_dir)
    # Upsert the chunks into Pinecone and expose the index as a retriever
    vectorstore = Pinecone.from_documents(docs, cached_embedder, index_name=index_name)
    return vectorstore.as_retriever()

retriever = embed_file("./profile.txt")

Created a chunk of size 229, which is longer than the specified 200
Created a chunk of size 530, which is longer than the specified 200
Created a chunk of size 922, which is longer than the specified 200
Created a chunk of size 688, which is longer than the specified 200
Created a chunk of size 668, which is longer than the specified 200
Created a chunk of size 420, which is longer than the specified 200
Created a chunk of size 313, which is longer than the specified 200
Created a chunk of size 1230, which is longer than the specified 200
Created a chunk of size 591, which is longer than the specified 200
Created a chunk of size 596, which is longer than the specified 200
Created a chunk of size 233, which is longer than the specified 200
Created a chunk of size 1595, which is longer than the specified 200
Created a chunk of size 308, which is longer than the specified 200
Created a chunk of size 344, which is longer than the specified 200
Created a chunk of size 246, which is longer t
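
The warnings above appear because a splitter that cuts only on one separator cannot break a piece that contains no separator. The following is a simplified stand-in for that behavior, not LangChain's implementation: it counts characters instead of tiktoken tokens and ignores overlap, and `split_on_separator` is a hypothetical name.

```python
def split_on_separator(text: str, chunk_size: int, separator: str = "\n") -> list[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single piece longer than chunk_size cannot be split further
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "short line\n" + "x" * 30 + "\nanother short line"
for chunk in split_on_separator(text, chunk_size=20):
    print(len(chunk), "over limit" if len(chunk) > 20 else "ok")
```

If oversized chunks are a problem for retrieval, a splitter that falls back to finer separators is one thing to try.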

In [14]:
DEFAULT_DOCUMENT_PROMPT= PromptTemplate.from_template(template="{page_content}")

# Merge the retrieved docs into a single string
def _combine_documents(
    docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"
):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    # print(doc_strings)
    return document_separator.join(doc_strings)

In [15]:
def debug(*arg):
    print(f"For Debugging : {arg}")

memories = []

def save(question, answer):
    chat_memory = {
        "User": question,
        "AI": answer
    }
    memories.append(chat_memory)
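
The `memories` list stores dictionaries, so dropping it straight into a prompt template inserts a Python list literal. A possible alternative, sketched here with a hypothetical `format_history` helper and a made-up history, is to render the recent turns as plain lines first:

```python
def format_history(memories: list[dict[str, str]], max_turns: int = 5) -> str:
    """Render the most recent turns as 'User: ...' / 'AI: ...' lines."""
    recent = memories[-max_turns:]
    lines = []
    for turn in recent:
        lines.append(f"User: {turn['User']}")
        lines.append(f"AI: {turn['AI']}")
    return "\n".join(lines)


# Hypothetical chat history in the same shape that save() produces
example_memories = [{"User": "Hello", "AI": "Hi there"}]
print(format_history(example_memories))
```

Capping the number of turns also keeps the prompt from growing without bound as the conversation continues.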

In [16]:
template = """
% INSTRUCTIONS
 - You are an AI Bot that is very good at mimicking an author writing style.
 - Your goal is to answer the following question and context with the tone that is described below.
 - Do not go outside the tone instructions below
 - Respond in ONLY KOREAN
 - Check chat history first and answer 

% Mimic These Authors:
{authors}

% Description of the authors tone:
{tone}

% Authors writing samples
{example_text}
% End of authors writing samples

% Context
{context}

% Chat history
{history}

% Question
{question}

% YOUR TASK
1st - Write out topics that this author may talk about
2nd - Answer with a concise passage (under 300 characters) as if you were the author described above
"""

method_4_prompt_template = PromptTemplate(
    input_variables=["authors", "tone", "example_text", "question", "history", "context"],
    template=template,
)

question_eng="Things are so hard these days. I just want to rest."
question_kor="요즘 너무 힘들어요. 저는 그냥 쉬고싶어요."

# Fill the template with the tone description, writing sample, retrieved context, and chat history
final_prompt = method_4_prompt_template.format(authors=authors,
                                               tone=authors_tone_description,
                                               example_text=users_sentence,
                                               question=question_kor,
                                               context=_combine_documents(retriever.get_relevant_documents(question_kor)),
                                               history=memories
)



In [19]:
final_prompt

'\n% INSTRUCTIONS\n - You are an AI Bot that is very good at mimicking an author writing style.\n - Your goal is to answer the following question and context with the tone that is described below.\n - Do not go outside the tone instructions below\n - Respond in ONLY KOREAN\n - Check chat history first and answer \n\n% Mimic These Authors:\nKim Young-ha\n\n% Description of the authors tone:\n1. Pace: The pace is steady and consistent, reflecting a conversational tone.\n2. Mood: The mood is critical and somewhat frustrated, reflecting the speaker\'s dissatisfaction with the current situation.\n3. Tone: The tone is assertive and opinionated, indicating the speaker\'s strong stance on the issue.\n4. Voice: The voice is active and direct, reflecting the speaker\'s personal involvement and strong feelings about the subject.\n5. Diction: The diction is informal and straightforward, using everyday language to express the speaker\'s thoughts.\n6. Syntax: The syntax is complex, with long sentenc

In [17]:
def invoke(final_prompt):
    # Call the chat model and convert the returned message to a plain string
    parser = StrOutputParser()
    result = llm.invoke(final_prompt)
    return parser.invoke(result)


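def extract_answer(data):
    # Split the model output into non-empty lines
    sentences = [line.strip() for line in data.split("\n") if line.strip()]

    # The answer is expected on the last line; otherwise return a Korean "text not found" message
    if sentences:
        return sentences[-1]
    else:
        return "텍스트를 찾을 수 없습니다."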


In [23]:
a = invoke(final_prompt)
print(f"a: {a}")
save(question_kor, extract_answer(a))
print("--"*70)


a: % Topics
1. 현대 사회에서의 피로와 스트레스
2. 쉬는 것의 중요성
3. 개인의 행복과 건강을 위한 선택

% Answer
그럼 쉬세요. 누가 뭐라 하든 상관하지 마세요. 우리는 누군가를 위해 살아가는 것이 아니라, 자신을 위해 살아가는 거죠. 힘들다면, 그건 당신의 몸이 쉬어야 한다는 신호일 뿐입니다. 그러니까, 그 신호를 무시하지 말고, 당신의 몸과 마음에 귀를 기울이세요.
--------------------------------------------------------------------------------------------------------------------------------------------
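
The output above puts the reply under a `% Answer` header, so taking only the last line would drop part of a multi-line answer. One possible alternative, assuming the model keeps using that header, is to split on the marker instead. `extract_answer_section` and the sample output are hypothetical.

```python
def extract_answer_section(output: str, marker: str = "% Answer") -> str:
    """Return the text after the marker, or the last non-empty line if the marker is absent."""
    _, found, tail = output.partition(marker)
    if found:
        return tail.strip()
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[-1] if lines else ""


sample_output = "% Topics\n1. Rest\n\n% Answer\nTake a break.\nYou earned it."
print(extract_answer_section(sample_output))
```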


In [22]:
print(f"memories : {memories[-1]['AI']}")

memories : 그럼 쉬세요. 누가 당신을 막을 수 있나요? 힘든 일을 계속하면서 스스로를 괴롭히는 것은 무의미합니다. 쉬는 것은 나약함이 아니라, 오히려 우리가 더 잘 성장하고, 더 잘 배울 수 있는 기회를 제공하는 중요한 과정입니다.
