# Fine-tuning GPT-4o to Write LinkedIn Posts (in my style)
## ABB #6 - Session 5

Code authored by: Shaw Talebi

### imports

In [1]:
import pandas as pd
import json
import random

import os
from openai import OpenAI
from dotenv import load_dotenv

In [2]:
# load secret key (OPENAI_API_KEY) from .env file
load_dotenv()

# connect to openai API
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

### functions

In [3]:
def clean_post(text):
    """Strip surrounding whitespace and double quotes from every line of a post."""
    # Split into lines
    lines = text.split('\n')
    # Remove leading/trailing whitespace and quotes (blank lines are kept as-is)
    cleaned_lines = [line.strip().strip('"') for line in lines]
    # Join back into a single string
    return '\n'.join(cleaned_lines)

### load data

In [4]:
# read data
df = pd.read_csv('data/LI_posts.csv')

In [5]:
# change column names
df.columns = ['date', 'link', 'post', 'idea']

In [6]:
# Set dtypes
df = df.astype({
    'date': str,
    'link': str,
    'post': str,
    'idea': str
})

# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'])

In [7]:
# set index
df = df.set_index('date')

### data prep

In [8]:
# pre-process posts
df['post'] = df['post'].apply(clean_post)

In [9]:
# replace idea with first line of post
df['idea'] = df['post'].str.split('\n').str[0]

In [10]:
df.shape

(669, 3)

In [11]:
# drop rows whose posts have fewer than 3 lines
df = df[df['post'].str.split('\n').str.len() >= 3]

In [12]:
df.shape

(638, 3)

### Create training examples

In [13]:
# construct training examples
example_list = []

system_prompt = """# LinkedIn Ghostwriter

You are a LinkedIn Ghostwriter for Shaw Talebi, an AI educator and entrepreneur.

Given a post idea's first line from the user, generate a post in Shaw's unique style.

Include the following in each post:
- A compelling opening 1-2 lines that hooks the reader
- Copy that expands upon the idea in valuable way
- A call to action or share relevant content
"""

# build one system/user/assistant example per post
for post_idea, post_text in zip(df['idea'], df['post']):
    messages_list = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": post_idea},
        {"role": "assistant", "content": post_text},
    ]

    example_list.append({"messages": messages_list})

In [14]:
print(example_list[0]['messages'][0]['content'])
print("---")
print(example_list[0]['messages'][1]['content'])
print("---")
print(example_list[0]['messages'][2]['content'])

# LinkedIn Ghostwriter

You are a LinkedIn Ghostwriter for Shaw Talebi, an AI educator and entrepreneur.

Given a post idea's first line from the user, generate a post in Shaw's unique style.

Include the following in each post:
- A compelling opening 1-2 lines that hooks the reader
- Copy that expands upon the idea in valuable way
- A call to action or share relevant content

---
LLM capabilities are doubling every 7 months…
---
LLM capabilities are doubling every 7 months…

Here’s the most important LLM benchmark I’ve come across 👇 

A couple of months ago, the team at METR released a new AI benchmark.

Rather than evaluating AI systems in terms of accuracy on well-known datasets or artificial tasks, it evaluates them on real-world tasks measured in average human task completion time.

In other words, they took 170 tasks, measured how long it typically takes a human to do each, then evaluated whether an AI system could do each with >50% accuracy.

Current models can easily handle “1-

In [15]:
len(example_list)

638

### Create train/validation split

In [16]:
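# randomly pick out validation examples (every index is eligible)
num_examples = 68
validation_index_list = random.sample(range(len(example_list)), num_examples)
validation_data_list = [example_list[index] for index in validation_index_list]

# keep the rest for training; filter by index so duplicate posts are handled correctly
validation_index_set = set(validation_index_list)
example_list = [example for index, example in enumerate(example_list) if index not in validation_index_set]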

In [17]:
print(len(example_list))
print(len(validation_data_list))

570
68


In [18]:
# write examples to file
with open('data/train-data.jsonl', 'w') as train_file:
    for example in example_list:
        json.dump(example, train_file)
        train_file.write('\n')

with open('data/valid-data.jsonl', 'w') as valid_file:
    for example in validation_data_list:
        json.dump(example, valid_file)
        valid_file.write('\n')
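
Before uploading, it can help to sanity-check the JSONL files that were just written. The helper below is a minimal sketch, not part of the original notebook: `check_jsonl` is a hypothetical name, and the checks (valid JSON per line, a non-empty `messages` list, string content, at least one assistant message) are assumptions about what a chat-format training file should contain, not an exhaustive copy of OpenAI's own validation.

```python
import json

def check_jsonl(path):
    """Return (line_number, problem) tuples for a chat-format JSONL file."""
    problems = []
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            # each line must be a standalone JSON object
            try:
                example = json.loads(line)
            except json.JSONDecodeError:
                problems.append((line_number, "not valid JSON"))
                continue

            # each example needs a non-empty "messages" list of objects
            messages = example.get("messages") if isinstance(example, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append((line_number, "missing 'messages' list"))
                continue
            if not all(isinstance(m, dict) for m in messages):
                problems.append((line_number, "message is not an object"))
                continue

            # every message needs string content, and one must come from the assistant
            if not all(isinstance(m.get("content"), str) for m in messages):
                problems.append((line_number, "message without string content"))
            elif "assistant" not in [m.get("role") for m in messages]:
                problems.append((line_number, "no assistant message"))
    return problems
```

Calling `check_jsonl('data/train-data.jsonl')` and `check_jsonl('data/valid-data.jsonl')` should return empty lists if the files were written as in the cell above.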

### Upload data to OpenAI

In [19]:
# upload training and validation files (context managers close the file handles)
with open("data/train-data.jsonl", "rb") as f:
    train_file = client.files.create(file=f, purpose="fine-tune")

with open("data/valid-data.jsonl", "rb") as f:
    valid_file = client.files.create(file=f, purpose="fine-tune")

### Fine-tune model

In [20]:
client.fine_tuning.jobs.create(
    training_file=train_file.id,
    validation_file=valid_file.id,
    suffix="LI-post-writer",
    model="gpt-4o-mini-2024-07-18",
    method={
        "type": "supervised",
        "supervised": {
            "hyperparameters": {
                "n_epochs": 3,
                "learning_rate_multiplier": 1.25,
                "batch_size": 1,
            }
        },
    },
)

FineTuningJob(id='ftjob-xSNZ2InrAON32loGEQZwDowX', created_at=1757661111, error=Error(code=None, message=None, param=None), fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(batch_size=1, learning_rate_multiplier=1.25, n_epochs=3), model='gpt-4o-mini-2024-07-18', object='fine_tuning.job', organization_id='org-KjWERyZ9WLUqIdrdMeJh4zC0', result_files=[], seed=1190089018, status='validating_files', trained_tokens=None, training_file='file-NLriJjNFvon7m3iW6Fcrdh', validation_file='file-AyXkrvcBmrrJ94CHzfEJyi', estimated_finish=None, integrations=[], metadata=None, method=Method(type='supervised', dpo=None, reinforcement=None, supervised=SupervisedMethod(hyperparameters=SupervisedHyperparameters(batch_size=1, learning_rate_multiplier=1.25, n_epochs=3))), user_provided_suffix='LI-post-writer', usage_metrics=None, shared_with_openai=False, eval_id=None)
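
The job above is returned with `status='validating_files'` and `fine_tuned_model=None`, so the fine-tuned model name is only available once the job finishes. One way to wait for it is to poll the job, sketched below under a few assumptions: `wait_for_job`, `poll_seconds`, and `max_polls` are hypothetical names introduced here, and the set of terminal statuses is assumed to be `succeeded`, `failed`, and `cancelled`.

```python
import time

# statuses after which the job will not change again (assumed set)
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(client, job_id, poll_seconds=60, max_polls=None):
    """Poll a fine-tuning job until it reaches a terminal status, then return it."""
    polls = 0
    while True:
        # fetch the latest state of the job
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job

        # optionally give up after a fixed number of polls and return the last state
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return job

        time.sleep(poll_seconds)
```

With the job ID from the output above, `wait_for_job(client, 'ftjob-xSNZ2InrAON32loGEQZwDowX').fine_tuned_model` would give the model name to use for evaluation once the job has succeeded.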

### Evaluate fine-tuned model

In [22]:
def generate_post(system_prompt, model_name, idea):
    """Generate a post from an idea using the given model and system prompt."""
    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": idea},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

In [23]:
idea = "How I made $10k/mo with YouTube and Maven"

#### GPT-4o-mini (no fine-tuning)

In [24]:
model_name = "gpt-4o-mini-2024-07-18"
# model_name = "gpt-4.1"

# read (long) system prompt
with open("prompts/prompt-v5.md", "r") as file:
    system_prompt_long = file.read()

print(generate_post(system_prompt_long, model_name, idea))

### Step 1: Purpose and Audience
- **Purpose:** Share a personal experience that outlines the process of generating $10k per month through YouTube and Maven.
- **Target Audience:** Fellow entrepreneurs, content creators, and educators interested in monetizing their platforms.

### Step 2: Post Wireframe
- **Hook:** A bold statement about the $10k/month achievement.
- **Meat:** Outline the journey with key steps and insights on using YouTube and Maven.
- **CTA:** Ask for others' experiences with monetization strategies.

### Step 3: Body (“meat”)
Generating $10k per month was a significant milestone for me. Here’s how I achieved it through YouTube and Maven:

1. **Content Creation on YouTube**: I focused on building a channel that provides value and engages viewers. Consistent uploads and audience interaction helped grow my subscriber base.

2. **Leveraging Maven**: I created courses on Maven that complemented my YouTube content. This allowed me to monetize my audience directly by offer

#### GPT-4o-mini (fine-tuned)

In [25]:
model_name = "ft:gpt-4o-mini-2024-07-18:shawhin-talebi-ventures-llc:li-post-writer:CEsu7PN0"

# print(system_prompt, "\n--")
print(generate_post(system_prompt, model_name, idea))

How I made $10k/mo with YouTube and Maven

I quit my data science job and went all in on my dream of becoming an AI entrepreneur.

A big part of that was creating a course on Maven, which has made me over $200k so far.

But, the most difficult part of entrepreneurship (for me) is customer acquisition.

That’s why I was so lucky when Maven found me and promoted my course to their audience.

With that initial exposure, I was able to make $10k in my first month on the platform.

However, this was just the beginning because it turned out that so many people were searching for AI content on YouTube.

Over the next 12 months, I was able to create a sustainable content strategy that made me over $10k/mo from my course and YouTube ad revenue.

If you’re interested in learning how I did this, I’m sharing the complete story in a YouTube video.

👉 Check it out here: https://lnkd.in/g9SjG7rQ
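
A single idea only gives a rough impression. The held-out examples in `data/valid-data.jsonl` could be used for a broader side-by-side check, as in the sketch below. This is not part of the original notebook: `load_validation_pairs` and `compare_on_validation` are hypothetical helpers, and `generate` is assumed to be any function that maps an idea string to a generated post.

```python
import json

def load_validation_pairs(path, limit=5):
    """Read up to `limit` (idea, reference_post) pairs from a chat-format JSONL file."""
    pairs = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if len(pairs) >= limit:
                break
            messages = json.loads(line)["messages"]
            # index message content by role: user = idea, assistant = original post
            by_role = {m["role"]: m["content"] for m in messages}
            pairs.append((by_role["user"], by_role["assistant"]))
    return pairs

def compare_on_validation(pairs, generate):
    """Generate a post for each idea and keep it next to the original post."""
    return [
        {"idea": idea, "reference": reference, "generated": generate(idea)}
        for idea, reference in pairs
    ]
```

For example, `compare_on_validation(load_validation_pairs('data/valid-data.jsonl'), lambda idea: generate_post(system_prompt, model_name, idea))` would pair each original post with the fine-tuned model's version for manual review.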


In [26]:
# delete files (after fine-tuning is done)
client.files.delete(train_file.id)
client.files.delete(valid_file.id)

FileDeleted(id='file-AyXkrvcBmrrJ94CHzfEJyi', deleted=True, object='file')