# Fine-tuning GPT-4o-mini to Write LinkedIn Posts (in my style)
## ABB #5 - Session 5

Code authored by: Shaw Talebi

### imports

In [1]:
import pandas as pd
import json
import random

import os
from openai import OpenAI
from dotenv import load_dotenv

In [2]:
# load secret key (OPENAI_API_KEY) from .env file
load_dotenv()

# connect to openai API
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

### functions

In [3]:
def clean_post(text):
    # Split into lines
    lines = text.split('\n')
    # Strip leading/trailing whitespace and double quotes from each line (blank lines are kept)
    cleaned_lines = [line.strip().strip('"') for line in lines]
    # Join back into a single string
    return '\n'.join(cleaned_lines)
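
As a quick sanity check of what `clean_post` does, the sketch below runs it on a made-up post. The sample text is hypothetical and not taken from the dataset. It shows that whitespace and double quotes at the edges of each line are stripped, while blank lines between paragraphs survive.

```python
def clean_post(text):
    # Same logic as the cell above: strip whitespace and double quotes per line
    lines = text.split('\n')
    cleaned_lines = [line.strip().strip('"') for line in lines]
    return '\n'.join(cleaned_lines)

# Hypothetical sample post, roughly as it might look after a CSV export
raw_post = '"A hook line\n\n  Body of the post"  '
cleaned = clean_post(raw_post)
print(repr(cleaned))
```

Because blank lines are kept, they also count toward any later line-based filtering of the posts.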

### load data

In [4]:
# read data
df = pd.read_csv('data/LI_posts.csv')

In [5]:
# change column names
df.columns = ['date', 'link', 'post', 'idea']

In [6]:
# Set dtypes
df = df.astype({
    'date': str,
    'link': str,
    'post': str,
    'idea': str
})

# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'])

In [7]:
# set index
df = df.set_index('date')

### data prep

In [8]:
# pre-process posts
df['post'] = df['post'].apply(clean_post)

In [9]:
# replace idea with first line of post
df['idea'] = df['post'].str.split('\n').str[0]

In [10]:
df.shape

(669, 3)

In [11]:
# drop rows with posts fewer than 3 lines
df = df[df['post'].str.split('\n').str.len() >= 3]

In [12]:
df.shape

(638, 3)
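
The two data-prep steps above (use the first line as the idea, then keep only posts with at least 3 lines) can be illustrated in plain Python. This is a minimal sketch with made-up posts; the helper names `first_line` and `has_min_lines` are hypothetical and only mirror the pandas expressions used in the notebook.

```python
def first_line(post):
    # The "idea" is the hook: everything before the first newline
    return post.split('\n')[0]

def has_min_lines(post, min_lines=3):
    # Blank lines count, because splitting on '\n' keeps empty strings
    return len(post.split('\n')) >= min_lines

# Made-up posts for illustration only
posts = [
    "Hook\n\nBody paragraph",  # 3 lines (one of them blank), so it is kept
    "Just a one-liner",        # 1 line, so it is dropped
]

kept = [p for p in posts if has_min_lines(p)]
ideas = [first_line(p) for p in kept]
print(ideas)
```

Note that a post with a hook, a blank line, and a single body line already passes the 3-line threshold.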

### Create training examples

In [13]:
# construct training examples
example_list = []

system_prompt = """# LinkedIn Ghostwriter

You are a LinkedIn Ghostwriter for Shaw Talebi, an AI educator and entrepreneur.

Given a raw, unstructured post idea from the user, generate a post in Shaw's unique style.

Include the following in each post:
- A compelling opening line that hooks the reader
- Copy that expands upon the idea in valuable way
- A call to action or share relevant content
"""

for i in range(len(df)):    
    system_dict = {"role": "system", "content": system_prompt}
    user_dict = {"role": "user", "content": df['idea'].iloc[i]}
    assistant_dict = {"role": "assistant", "content": df['post'].iloc[i]}
    
    messages_list = [system_dict, user_dict, assistant_dict]
    
    example_list.append({"messages": messages_list})

In [14]:
print(example_list[0]['messages'][0]['content'])
print("---")
print(example_list[0]['messages'][1]['content'])
print("---")
print(example_list[0]['messages'][2]['content'])

# LinkedIn Ghostwriter

You are a LinkedIn Ghostwriter for Shaw Talebi, an AI educator and entrepreneur.

Given a raw, unstructured post idea from the user, generate a post in Shaw's unique style.

Include the following in each post:
- A compelling opening line that hooks the reader
- Copy that expands upon the idea in valuable way
- A call to action or share relevant content

---
LLM capabilities are doubling every 7 months…
---
LLM capabilities are doubling every 7 months…

Here’s the most important LLM benchmark I’ve come across 👇 

A couple of months ago, the team at METR released a new AI benchmark.

Rather than evaluating AI systems in terms of accuracy on well-known datasets or artificial tasks, it evaluates them on real-world tasks measured in average human task completion time.

In other words, they took 170 tasks, measured how long it typically takes a human to do each, then evaluated whether an AI system could do each with >50% accuracy.

Current models can easily handle “1-

In [15]:
len(example_list)

638
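
Before writing the examples to disk, it can help to check that each one has the shape built above: a `messages` list with system, user, and assistant entries, each with non-empty text. The following is a minimal sketch of such a check; `validate_example` is a hypothetical helper, not part of the OpenAI SDK, and it only tests the structure this notebook produces.

```python
def validate_example(example):
    """Return a list of problems found in one chat-format training example."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    # This notebook always builds system -> user -> assistant
    roles = [m.get("role") for m in messages]
    if roles != ["system", "user", "assistant"]:
        problems.append(f"unexpected role order: {roles}")

    # Every message should carry non-empty text
    for m in messages:
        content = m.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"empty content for role {m.get('role')!r}")
    return problems

# Toy examples for illustration only
good = {"messages": [
    {"role": "system", "content": "You are a ghostwriter."},
    {"role": "user", "content": "An idea"},
    {"role": "assistant", "content": "An idea\n\nThe full post"},
]}
bad = {"messages": [
    {"role": "user", "content": "An idea"},
    {"role": "assistant", "content": ""},
]}

print(validate_example(good))
print(validate_example(bad))
```

In the notebook, the same function could be applied to every item of `example_list` to flag rows worth inspecting by hand.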

### Create train/validation split

In [16]:
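# randomly pick out validation examples
num_examples = 68
# sample from all indices (range's upper bound is already exclusive)
validation_index_list = random.sample(range(len(example_list)), num_examples)
validation_data_list = [example_list[index] for index in validation_index_list]

# drop the validation examples from the training set by index
validation_index_set = set(validation_index_list)
example_list = [ex for i, ex in enumerate(example_list) if i not in validation_index_set]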

In [17]:
print(len(example_list))
print(len(validation_data_list))

570
68
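
The split above changes on every run because the random generator is not seeded. If a reproducible split is wanted, one option is a seeded generator, as in the sketch below. The function name `split_examples` and the seed value are assumptions made for illustration, not something the notebook uses.

```python
import random

def split_examples(examples, num_validation, seed=42):
    """Split examples into (train, validation) lists using a seeded RNG."""
    rng = random.Random(seed)  # local generator, leaves the global random state alone
    validation_indices = set(rng.sample(range(len(examples)), num_validation))
    train = [ex for i, ex in enumerate(examples) if i not in validation_indices]
    valid = [ex for i, ex in enumerate(examples) if i in validation_indices]
    return train, valid

# Toy data standing in for example_list
toy = [{"id": i} for i in range(10)]
train, valid = split_examples(toy, num_validation=3)
print(len(train), len(valid))  # 7 3
```

Splitting by index rather than by value also avoids surprises when two examples happen to be identical.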


In [18]:
# write examples to file
with open('data/train-data.jsonl', 'w') as train_file:
    for example in example_list:
        json.dump(example, train_file)
        train_file.write('\n')

with open('data/valid-data.jsonl', 'w') as valid_file:
    for example in validation_data_list:
        json.dump(example, valid_file)
        valid_file.write('\n')

### Upload data to OpenAI

In [19]:
# upload files (context managers close the file handles after upload)
with open("data/train-data.jsonl", "rb") as f:
    train_file = client.files.create(file=f, purpose="fine-tune")

with open("data/valid-data.jsonl", "rb") as f:
    valid_file = client.files.create(file=f, purpose="fine-tune")

### Fine-tune model

In [23]:
client.fine_tuning.jobs.create(
    training_file=train_file.id,
    validation_file=valid_file.id,
    suffix="LI-post-writer",
    model="gpt-4o-mini-2024-07-18",
    method={
        "type": "supervised",
        "supervised": {
            "hyperparameters": {
                "n_epochs": 3,
                "learning_rate_multiplier": 1,
                "batch_size": 1,
            }
        },
    },
)

FineTuningJob(id='ftjob-HVCylIsK6RnDAnXeakCw4sgP', created_at=1751587268, error=Error(code=None, message=None, param=None), fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(batch_size=1, learning_rate_multiplier=1.0, n_epochs=3), model='gpt-4o-mini-2024-07-18', object='fine_tuning.job', organization_id='org-KjWERyZ9WLUqIdrdMeJh4zC0', result_files=[], seed=782145829, status='validating_files', trained_tokens=None, training_file='file-696iWUgnMQJnDEh2uYpRi8', validation_file='file-FsGuFbzqAKdwaEvMR2F1pi', estimated_finish=None, integrations=[], metadata=None, method=Method(type='supervised', dpo=None, reinforcement=None, supervised=SupervisedMethod(hyperparameters=SupervisedHyperparameters(batch_size=1, learning_rate_multiplier=1.0, n_epochs=3))), user_provided_suffix='LI-post-writer', usage_metrics=None, shared_with_openai=False, eval_id=None)
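
The job starts in `validating_files` and takes a while to finish, so evaluation has to wait until it succeeds. One way to wait is to poll the job status, as in the sketch below. `wait_for_job` is a hypothetical helper; it takes the retrieval function as an argument so it can be tried without network access, and the set of terminal statuses is an assumption based on the status values the fine-tuning API reports.

```python
import time

# Assumed terminal statuses for a fine-tuning job
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(retrieve, job_id, poll_seconds=60, sleep=time.sleep):
    """Call retrieve(job_id) repeatedly until the job reaches a terminal status."""
    while True:
        job = retrieve(job_id)
        print(f"status: {job.status}")
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)

# Usage with the client from this notebook (needs network access and a real job ID,
# here the `id` field from the output above):
# job = wait_for_job(client.fine_tuning.jobs.retrieve, "ftjob-HVCylIsK6RnDAnXeakCw4sgP")
# print(job.fine_tuned_model)
```

Once the status is `succeeded`, the `fine_tuned_model` field of the returned job holds the model name to pass to the evaluation cells.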

### Evaluate fine-tuned model

In [24]:
def generate_post(system_prompt, model_name, idea):
    # send the system prompt and post idea to the given model
    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": idea},
        ],
        temperature=0.7,
    )
    # return the text of the first completion
    return response.choices[0].message.content

In [29]:
idea = "Motivation hack: don’t do it for yourself"

#### GPT-4o (no fine-tuning)

In [30]:
model_name = "gpt-4o"

# read (long) system prompt
with open("prompts/prompt-v5.md", "r") as file:
    system_prompt_long = file.read()

print(generate_post(system_prompt_long, model_name, idea))

### Step 1: Determine Purpose and Audience
- **Purpose:** Educate on a practical approach to motivation using AI tools.
- **Audience:** Entrepreneurs and AI enthusiasts interested in productivity and motivation techniques.

### Step 2: Post Wireframe
```
[Engaging 1-2 line hook (above the fold on LinkedIn)]

[The meat of the post]

[A single, focused call to action (CTA) or question to spark discussion]

[P.S. (if applicable)]
```

### Step 3: Write the body (“meat”)
The idea of relying solely on personal motivation can be daunting, especially when pursuing ambitious projects. Instead, consider using external motivators to keep your momentum. AI tools can play a critical role in this process.

Here’s a simple framework:

1. **Set Clear Goals:** Define what you want to achieve with your project.
2. **Use AI Tools for Accountability:** Tools like Trello or Asana can help you track progress and set reminders.
3. **Leverage AI for Feedback:** Platforms that use AI to provide feedback can g

#### GPT-4o-mini (fine-tuned)

In [31]:
model_name = "ft:gpt-4o-mini-2024-07-18:shawhin-talebi-ventures-llc:li-post-writer:BpOM8QZy"

# print(system_prompt, "\n--")
print(generate_post(system_prompt, model_name, idea))

Motivation hack: don’t do it for yourself

When it comes to motivation, we all have a different mix of “internal” and “external” drivers.

For example, I’m pretty bad at doing things just for myself.

However, I’m great at doing things for my community, friends, and family.

Here are 3 ways I use this to my advantage 👇 

1) Public commitments— I’m much more likely to achieve a goal if I share it with others. 

2) Content creation— The #1 driver for me to learn something is to make a YouTube video or blog post about it.

3) Accountability— I have a group of friends and colleagues with whom I share my goals and progress. 

What about you? Do you have more internal or external motivation?


In [33]:
# # delete files (after fine-tuning is done)
# client.files.delete(train_file.id)
# client.files.delete(valid_file.id)