# Fine-tuning GPT-4o-mini to Write LinkedIn Posts
## ABB #1 - Session 5

Code authored by: Shaw Talebi

### imports

In [1]:
import csv
import json
import random

from openai import OpenAI
from top_secret import my_sk 

# connect to openai API
client = OpenAI(api_key=my_sk)

### Read data

In [2]:
# load csv of LinkedIn posts (columns: idea, post copy, media type)
idea_list = []
copy_list = []
media_list = []

with open('data/LI_posts.csv', mode='r', newline='') as file:
    reader = csv.reader(file)
    
    # read file row by row
    for row in reader:
        # skip header row
        if row[0]=='Idea':
            continue
            
        # append idea, post copy, and media type to respective lists
        idea_list.append(row[0])
        copy_list.append(row[1])
        media_list.append(row[2])

In [3]:
print(len(idea_list))
print(len(copy_list))
print(len(media_list))

50
50
50
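
The header check above depends on the first column being labelled `Idea`. A minimal sketch of an alternative that skips the header row positionally is shown here. The helper name `read_posts`, the in-memory sample, and its `Copy`/`Media` header labels are assumptions for illustration, not part of the original notebook.

```python
import csv
import io

def read_posts(file_obj):
    """Return (ideas, copies, media) lists from a CSV that has a header row."""
    reader = csv.reader(file_obj)
    next(reader, None)  # skip the header row, whatever its labels are
    ideas, copies, media = [], [], []
    for row in reader:
        if len(row) < 3:  # ignore blank or malformed rows
            continue
        ideas.append(row[0])
        copies.append(row[1])
        media.append(row[2])
    return ideas, copies, media

# tiny in-memory stand-in for data/LI_posts.csv (contents are made up)
sample = io.StringIO('Idea,Copy,Media\n"3 types of AI","Line one\nLine two",Video\n')
ideas, copies, media = read_posts(sample)
print(len(ideas), media)
```

Because `csv.reader` handles quoted fields that span several lines, multi-paragraph post copy stays in a single row.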


### Create training examples

In [4]:
# construct training examples
example_list = []

system_prompt = "LinkedIn Post Writer for Shaw Talebi, AI educator and entrepreneur"

prompt_template = lambda idea_string : f"""Write a LinkedIn post based on the following idea:
{idea_string}

Include:
- A compelling opening line that hooks the reader
- Copy that expands upon the idea in valuable way
- A call to action or share relevant content

Output:
"""

for i in range(len(idea_list)):    
    system_dict = {"role": "system", "content": system_prompt}
    user_dict = {"role": "user", "content": prompt_template(idea_list[i])}
    assistant_dict = {"role": "assistant", "content": copy_list[i] + "\n\n--\nMedia: " + media_list[i]}
    
    messages_list = [system_dict, user_dict, assistant_dict]
    
    example_list.append({"messages": messages_list})

In [5]:
print(example_list[0]['messages'][1]['content'])
print(example_list[0]['messages'][2]['content'])

Write a LinkedIn post based on the following idea:
3 types of AI Tik Tok

Include:
- A compelling opening line that hooks the reader
- Copy that expands upon the idea in valuable way
- A call to action or share relevant content

Output:

A problem with AI today is that it means different things to different people. 

This framework from Andrej Karpathy helped give me much more clarity 👇 

Software 1.0 = Rule-based software systems. Humans program computers to solve problems step-by-step. 

Software 2.0 = Computers program themselves by seeing examples (i.e. machine learning) 

Software 3.0 = Repurposing general-purpose ML models for specific use cases (i.e. GenAI + Foundation Models) 

But… what’s Software 4.0 going to be? 🤔

--
Media: Video


In [6]:
len(example_list)

50
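
Before writing the examples to disk, it can help to sanity-check their structure. The sketch below is a hypothetical helper (`validate_example` is not part of the notebook or the OpenAI SDK) that checks each example for a non-empty `messages` list, known roles, non-empty content, and an assistant message at the end.

```python
def validate_example(example):
    """Return a list of problems found in one chat-format training example."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for i, message in enumerate(messages):
        role = message.get("role")
        if role not in {"system", "user", "assistant"}:
            problems.append(f"message {i}: unexpected role {role!r}")
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i}: empty content")

    # the model learns from the final assistant turn, so it should be present
    if messages[-1].get("role") != "assistant":
        problems.append("last message is not from the assistant")
    return problems

# made-up examples for illustration
good = {"messages": [
    {"role": "system", "content": "LinkedIn Post Writer"},
    {"role": "user", "content": "Write a post about X"},
    {"role": "assistant", "content": "Here is a post.\n\n--\nMedia: Video"},
]}
bad = {"messages": [
    {"role": "system", "content": "LinkedIn Post Writer"},
    {"role": "user", "content": ""},
]}
print(validate_example(good))
print(validate_example(bad))
```

In the notebook, this could be applied as `[validate_example(ex) for ex in example_list]` to confirm every entry comes back empty.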

### Create train/validation split

In [7]:
# randomly pick out validation examples
num_examples = 10
validation_index_list = random.sample(range(len(example_list)), num_examples)  # every index, including the last, is eligible
validation_data_list = [example_list[index] for index in validation_index_list]

for example in validation_data_list:
    example_list.remove(example)

In [8]:
print(len(example_list))
print(len(validation_data_list))

40
10
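
The split above changes on every run, and `list.remove` deletes the first element equal to its argument, which matters if two examples are identical. A minimal sketch of an index-based, seeded alternative follows. The function name `split_examples` and the seed value are assumptions, and the integers stand in for the real example dictionaries.

```python
import random

def split_examples(examples, num_validation, seed=0):
    """Split examples into (train, validation) lists, reproducibly."""
    rng = random.Random(seed)  # local generator, leaves global random state alone
    indices = list(range(len(examples)))
    rng.shuffle(indices)
    valid_idx = set(indices[:num_validation])
    train = [ex for i, ex in enumerate(examples) if i not in valid_idx]
    valid = [ex for i, ex in enumerate(examples) if i in valid_idx]
    return train, valid

# integers as stand-ins for the 50 training examples
examples = list(range(50))
train, valid = split_examples(examples, 10, seed=42)
print(len(train), len(valid))
```

Selecting by index keeps the original order within each split and guarantees that exactly `num_validation` examples are held out.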


In [9]:
# write examples to file
with open('data/train-data.jsonl', 'w') as train_file:
    for example in example_list:
        json.dump(example, train_file)
        train_file.write('\n')

with open('data/valid-data.jsonl', 'w') as valid_file:
    for example in validation_data_list:
        json.dump(example, valid_file)
        valid_file.write('\n')
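
JSONL means one JSON object per line, so a quick round-trip check can confirm the files are well-formed before uploading. The sketch below uses hypothetical helpers (`write_jsonl`, `read_jsonl`) and a temporary directory rather than the notebook's `data/` paths.

```python
import json
import os
import tempfile

def write_jsonl(path, examples):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")

def read_jsonl(path):
    """Read a JSONL file back into a list of objects, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# made-up example containing a newline and an emoji
demo_examples = [{"messages": [{"role": "user", "content": "Line one\nLine two 👇"}]}]

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.jsonl")
    write_jsonl(demo_path, demo_examples)
    round_trip = read_jsonl(demo_path)
    with open(demo_path, "r", encoding="utf-8") as f:
        num_lines = sum(1 for _ in f)

print(round_trip == demo_examples, num_lines)
```

Newlines inside post copy are escaped by `json.dumps`, so each example still occupies exactly one line of the file.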

### Upload data to OpenAI

In [10]:
# upload training and validation files (context managers close the file handles)
with open("data/train-data.jsonl", "rb") as f:
    train_file = client.files.create(file=f, purpose="fine-tune")

with open("data/valid-data.jsonl", "rb") as f:
    valid_file = client.files.create(file=f, purpose="fine-tune")

### Fine-tune model

In [11]:
client.fine_tuning.jobs.create(
    training_file = train_file.id,
    validation_file = valid_file.id,
    suffix = "LI-post-writer",
    model = "gpt-4o-mini-2024-07-18"
)

FineTuningJob(id='ftjob-eCS6EchA0sb7hWMrOQlZITRQ', created_at=1734050118, error=Error(code=None, message=None, param=None), fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(n_epochs='auto', batch_size='auto', learning_rate_multiplier='auto'), model='gpt-4o-mini-2024-07-18', object='fine_tuning.job', organization_id='org-KjWERyZ9WLUqIdrdMeJh4zC0', result_files=[], seed=616771098, status='validating_files', trained_tokens=None, training_file='file-2qUUvaBrn3qmzK8UjBZwdD', validation_file='file-61iDsGKpr4LM9ssQknQiC5', estimated_finish=None, integrations=[], user_provided_suffix='LI-post-writer')
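
The job above starts in the `validating_files` state and `fine_tuned_model` is `None` until training finishes. A minimal polling sketch is shown below. `wait_for_job` is a hypothetical helper, and the retrieve function is passed in so the loop can be exercised here with a fake instead of a network call. With the real client, `client.fine_tuning.jobs.retrieve` would be the function to pass, along with the job's `id`.

```python
import time
from types import SimpleNamespace

# statuses after which a fine-tuning job will not change any further
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(retrieve, job_id, poll_seconds=30, sleep=time.sleep):
    """Call retrieve(job_id) repeatedly until the job reaches a terminal status."""
    while True:
        job = retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)

# fake retrieve function for illustration (no API call is made)
_statuses = iter(["validating_files", "running", "succeeded"])

def fake_retrieve(job_id):
    return SimpleNamespace(id=job_id, status=next(_statuses))

finished = wait_for_job(fake_retrieve, "ftjob-example", poll_seconds=0, sleep=lambda s: None)
print(finished.status)
```

Once a real job reports `succeeded`, its `fine_tuned_model` attribute holds the model name to use for inference.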

### Evaluate fine-tuned model

In [12]:
def generate_post(system_prompt, model_name, idea):
    """Generate a LinkedIn post for `idea` with the given model and system prompt."""
    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt_template(idea)},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

In [13]:
idea = "Python was hard until I learned these 5 things"

In [14]:
# GPT-4o (no fine-tuning)
model_name = "gpt-4o"
system_prompt_long = "You are an AI assistant helping Shaw Talebi, an AI educator and entrepreneur, craft LinkedIn posts. Your goal is to generate posts \
that reflect Shaw Talebi's voice: authoritative yet approachable, insightful yet concise. Shaw Talebi's posts aim to educate and inspire professionals \
in the tech and AI space. Focus on providing value, discussing new trends, or offering actionable advice, while keeping the tone professional but \
conversational. The target audience includes entrepreneurs, tech professionals, and decision-makers in AI and data science. Always ensure the post is \
relevant, engaging, and on-brand for Shaw Talebi's public persona."

# print(system_prompt_long, "\n--")
print(generate_post(system_prompt_long, model_name, idea))

🚀 Struggling with Python? You're not alone. It was a tough nut to crack until I discovered these 5 game-changing strategies.

1. **Think in Data Structures:** Understanding lists, dictionaries, and sets is crucial. They're the backbone of efficient Python coding, allowing you to solve complex problems with ease.

2. **Master List Comprehensions:** Transform your loops into concise, readable expressions. This not only saves time but also boosts your code’s performance.

3. **Leverage Libraries:** Python’s strength lies in its vast ecosystem of libraries. Familiarize yourself with pandas for data manipulation, NumPy for numerical computations, and requests for HTTP requests.

4. **Embrace the Zen of Python:** "Readability counts." Keep your code clean and simple. Follow PEP 8 guidelines to ensure your code is not just functional but also elegant.

5. **Practice, Practice, Practice:** There's no substitute for hands-on experience. Challenge yourself with real-world problems and projects t

In [15]:
# GPT-4o-mini (fine-tuned)
model_name = "ft:gpt-4o-mini-2024-07-18:shawhin-talebi-ventures-llc:li-post-writer:Adk6A5Pd"

# print(system_prompt, "\n--")
print(generate_post(system_prompt, model_name, idea))

Python was hard until I learned these 5 things 👇 

1) Use a good IDE 

2) Learn by building projects 

3) Use ChatGPT 

4) Break things down into smaller problems 

5) Use the Python documentation 

I share my full Python learning journey here 👇 

https://lnkd.in/gZy68cZC 

#Python #Programming #AI 

--
Media: Meme
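
The fine-tuned model reproduces the `--` / `Media:` footer used in the training examples, so the generated text can be split back into post copy and a media suggestion. The sketch below assumes the model follows that format exactly. `split_post` is a hypothetical helper, and it falls back to returning the whole text when the separator is absent.

```python
# same separator that joins copy and media in the training examples
SEPARATOR = "\n\n--\nMedia: "

def split_post(generated_text):
    """Return (copy, media); media is None if the separator is missing."""
    copy, sep, media = generated_text.rpartition(SEPARATOR)
    if not sep:
        return generated_text.strip(), None
    return copy.strip(), media.strip()

# made-up model outputs for illustration
with_media = "Python was hard until I learned these 5 things 👇\n\n--\nMedia: Meme\n"
without_media = "Just some text with no footer"

print(split_post(with_media))
print(split_post(without_media))
```

Using `rpartition` splits on the last occurrence of the separator, so a stray `--` earlier in the post does not cut the copy short.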


In [16]:
# # delete files (after fine-tuning is done)
# client.files.delete(train_file.id)
# client.files.delete(valid_file.id)