# Mistral Fine-tuning API

Check out the docs: https://docs.mistral.ai/capabilities/finetuning/

In [None]:
#!pip install mistralai pandas

## Prepare the dataset

In this example, we use a dataset of generated video-script conversations. We load the data into a pandas DataFrame, split it into training and validation sets, and save both in the JSONL format required for fine-tuning.

In [None]:
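import pandas as pd

# Load the generated video-script conversations (one JSON object per line)
df = pd.read_json('./video_script/data/generated_video_conversation.jsonl', lines=True)

# Keep 99.5% of the rows for training and hold out the rest for evaluation;
# random_state makes the split reproducible
df_train = df.sample(frac=0.995, random_state=200)
df_eval = df.drop(df_train.index)

# Save both splits as JSONL (one record per line)
df_train.to_json("videos_chunk_train.jsonl", orient="records", lines=True)
df_eval.to_json("videos_chunk_eval.jsonl", orient="records", lines=True)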
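
To get a feel for how thin the evaluation split is, the sketch below estimates the sizes produced by `sample(frac=...)` followed by `drop`. It assumes pandas rounds `frac * len(df)` to the nearest integer; the helper name `split_sizes` and the row count of 1000 are illustrative and not taken from the dataset.

```python
def split_sizes(n_rows: int, frac: float) -> tuple[int, int]:
    """Estimate (train, eval) row counts for df.sample(frac=frac) + df.drop(...).

    Assumption: the sampled size is round(frac * n_rows).
    """
    n_train = round(frac * n_rows)
    return n_train, n_rows - n_train


# Hypothetical dataset of 1000 conversations
n_train, n_eval = split_sizes(1000, 0.995)
print(f"train={n_train}, eval={n_eval}")
```

With a split this small, the validation metrics rest on only a handful of conversations, so they are best read as a rough signal.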

In [None]:
!ls -lh

## Reformat dataset
If you upload videos_chunk_train.jsonl to the Mistral API as is, you might encounter an “Invalid file format” error caused by data formatting issues. To fix this, download the reformat_data.py script and use it to validate and reformat both the training and evaluation data:

In [None]:
# download the validation and reformat script
!wget https://raw.githubusercontent.com/mistralai/mistral-finetune/main/utils/reformat_data.py

In [None]:
# validate and reformat the training data
!python reformat_data.py videos_chunk_train.jsonl

In [None]:
# validate and reformat the eval data
!python reformat_data.py videos_chunk_eval.jsonl
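
A quick local sanity check can complement the script. The sketch below does not reproduce the logic of reformat_data.py; it is a minimal stand-in that assumes each line is a JSON object with a `messages` list of `{"role", "content"}` entries and that a training conversation ends with an assistant turn. The function name `check_jsonl_line` and the accepted role set are assumptions.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}  # assumed role set


def check_jsonl_line(line: str) -> list[str]:
    """Return the problems found in one JSONL line (empty list if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i} has no text content")
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("conversation does not end with an assistant message")
    return problems


good = json.dumps({"messages": [
    {"role": "user", "content": "Write a video hook."},
    {"role": "assistant", "content": "Here is a hook."},
]})
print(check_jsonl_line(good))
print(check_jsonl_line('{"text": 1}'))
```

Running this over every line of videos_chunk_train.jsonl before uploading gives a line number for each malformed record.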

In [None]:
df_train.iloc[104]['messages']

## Upload dataset

In [1]:
import os
from mistralai.client import MistralClient

api_key = os.environ.get("MISTRAL_API_KEY")
client = MistralClient(api_key=api_key)
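
`os.environ.get` returns `None` when the variable is unset, so a missing key would likely only surface on the first request. The sketch below fails early instead; `require_api_key` is a hypothetical helper, not part of the `mistralai` package.

```python
import os


def require_api_key(env=os.environ, name="MISTRAL_API_KEY"):
    """Return the API key from the environment or raise a clear error."""
    key = env.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key


# Usage with the client from the cell above:
# client = MistralClient(api_key=require_api_key())
```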



In [None]:
with open("videos_chunk_train.jsonl", "rb") as f:
    videos_chunk_train = client.files.create(file=("videos_chunk_train.jsonl", f))
with open("videos_chunk_eval.jsonl", "rb") as f:
    videos_chunk_eval = client.files.create(file=("videos_chunk_eval.jsonl", f))

In [8]:
import json
def pprint(obj):
    print(json.dumps(obj.dict(), indent=4))

In [None]:
pprint(videos_chunk_train)

In [None]:
pprint(videos_chunk_eval)

## Create a fine-tuning job

In [None]:
from mistralai.models.jobs import TrainingParameters

created_jobs = client.jobs.create(
    model="open-mistral-7b",  # or "mistral-small-latest"
    training_files=["048ac6dd-6636-467f-9326-c46c15d64e0a"], # videos_chunk_train.id
    validation_files=["1d871376-0334-4bd5-b431-746f65772254"], # videos_chunk_eval.id
    hyperparameters=TrainingParameters(
        training_steps=10,
        learning_rate=0.0001,
        )
)
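
The file IDs in the call above come from an earlier upload and will differ for every account. A small wrapper that takes the uploaded file objects avoids copying IDs by hand. This is a sketch: `create_finetuning_job` is a hypothetical helper that only forwards to the `client.jobs.create` call shown above, and it assumes the upload responses expose an `id` attribute.

```python
def create_finetuning_job(client, train_file, eval_file, hyperparameters,
                          model="open-mistral-7b"):
    """Create a fine-tuning job from uploaded file objects rather than literal IDs."""
    return client.jobs.create(
        model=model,
        training_files=[train_file.id],
        validation_files=[eval_file.id],
        hyperparameters=hyperparameters,
    )


# Usage with the objects created earlier in this notebook:
# created_jobs = create_finetuning_job(
#     client, videos_chunk_train, videos_chunk_eval,
#     TrainingParameters(training_steps=10, learning_rate=0.0001),
# )
```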

In [None]:
pprint(created_jobs)

In [None]:
import time

# Poll the job until it leaves the RUNNING/QUEUED states
retrieved_job = client.jobs.retrieve(created_jobs.id)
while retrieved_job.status in ["RUNNING", "QUEUED"]:
    pprint(retrieved_job)
    print(f"Job is {retrieved_job.status}, waiting 10 seconds")
    time.sleep(10)
    retrieved_job = client.jobs.retrieve(created_jobs.id)
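
The loop above runs for as long as the job stays queued or running. A variant with a timeout may be safer in unattended runs. In this sketch, `wait_for_job` is a hypothetical helper: `retrieve` stands for `client.jobs.retrieve`, and the one-hour default timeout is an arbitrary assumption.

```python
import time


def wait_for_job(retrieve, job_id, poll_seconds=10, timeout_seconds=3600,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll retrieve(job_id) until the job leaves RUNNING/QUEUED or time runs out."""
    deadline = clock() + timeout_seconds
    job = retrieve(job_id)
    while job.status in ("RUNNING", "QUEUED"):
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.status} after {timeout_seconds}s")
        sleep(poll_seconds)
        job = retrieve(job_id)
    return job


# Usage with the client from this notebook:
# retrieved_job = wait_for_job(client.jobs.retrieve, created_jobs.id)
```

Passing `sleep` and `clock` as parameters keeps the helper easy to exercise without real waiting.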



In [None]:
# List jobs
jobs = client.jobs.list()
pprint(jobs)

In [None]:

# Retrieve a job by ID (or pass created_jobs.id)
retrieved_jobs = client.jobs.retrieve("68e070f1-b295-41cc-b052-a51c98e9628d")
pprint(retrieved_jobs)


In [None]:
# Retrieve a job by ID (or pass created_jobs.id)
retrieved_jobs = client.jobs.retrieve("f732bd21-cb32-40bd-bdb7-0546e6eed0c6")
pprint(retrieved_jobs)


## Use a fine-tuned model

In [None]:
role = "As a 'Youtube Video Script Writer' for 'GenAI and LLM powered application' influencer"

# Trailing spaces keep the concatenated sentences from running together
content = (f"{role}, "
    "your task is to write the script for an engaging video of 5 to 10 minutes (1000 to 2000 words). "
    "The script should include a title, the transcript, the author name and a publication date. "
    "The video topic is 'Generative AI with LLMs'. "
    "Here is a short description: 'Understand the generative AI lifecycle. Describe transformer architecture powering LLMs. Apply training/tuning/inference methods. Hear from researchers on generative AI challenges/opportunities.' "
    "Skill level of audience is 'Intermediate', the presenter will be 'Mike Chambers'.\n\n")


In [None]:
from mistralai.models.chat_completion import ChatMessage

chat_response = client.chat(
    model=retrieved_jobs.fine_tuned_model,
    messages=[ChatMessage(role="user", content=content)]
)

In [None]:
pprint(chat_response)

In [None]:
response_ori = client.chat(
        model="mistral-small-latest",
        messages=[ChatMessage(role="user", content=content)],
    )

In [None]:
pprint(response_ori)

In [None]:
news_1 = response_ori.choices[0].message.content

In [None]:
news_2 = chat_response.choices[0].message.content

In [None]:
evaluation_framework = """# Evaluation Framework
Round point attribution down: when a criterion would earn 0.5 point, give 0.

## Evaluation of the Tone - 12 points

The video must follow these writing styles:
1. Conciseness: use Short Sentences (less than 25 words per sentence): 1 point
2. Use the Present Tense: 1 point
3. Use first person: 1 point
4. Write in a conversational style: 1 point
5. Use more active voice than passive voice: 1 point
6. Keep it simple - no jargon employed: 1 point
7. Sprinkle in some Humor: 1 point
8. Avoid repetition: 1 point
9. Avoid conventional messages: 1 point
10. Avoid overdoing it or being over-sensational with words like “cutting edge”, “revolutionize”: 1 point
11. Confident: no words that undermine authority: 1 point
12. Energetic and Enthusiastic Tone: 1 point

## Evaluation of the Structure and Content: 12 points

### Section 1: Video hook and intro: 6 points
- Does the script provide enough context for the video to make sense? 1 point
- Are the stakes and payoff introduced, so viewers know why they should watch until the end? 1 point
- A curiosity gap is created: viewers know what they want to learn, but not all the information is given away. 1 point
- Leverage input bias: the effort (time, energy, money) that went into the video is shown. 1 point
- The video body starts no later than the 20-second mark 1 point
- Includes an engaging story or comparison to make the topic relatable 1 point
### Section 2: Body, main content, and research: 4 points
- Consistent contrast is incorporated to keep things from getting stale 1 point
- Good pacing: cycles of high energy and low energy are alternated 1 point
    - Each cycle should be 2 - 4 minutes long, with shorter cycles in the beginning and longer cycles at the end 1 point
    - The last 20% of long video can be used for slower content, while the beginning of the video is allocated for lighter, faster content 1 point
- Critical analysis and personal insights are included 1 point
- Practical, real-world applications of the technologies are discussed 1 point
- Balanced optimism and realism 1 point
### Section 3: CTA (call to action) and conclusion: 2 points
- Conclusion leaves a lasting impression by revealing the payoff 1 point
- Ends on a high note, either dramatic, wholesome, or funny 1 point

## Global score (10): half tone, half structure, and content
"""

video_structure = """The video must follow this structure :\n 
- [Video hook and introduction]\n
- [Body content]\n
- [Conclusion and call to action]\n"""

writing_tips = """The video must follow these writing tips:
 1) Use Short Sentences.
 2) Use the Present Tense.
 3) Use first person.
 4) Write in a Conversational Style.
 5) Use More Active Voice Than Passive Voice.
 6) Be Clear and simple: translate jargon into simpler words.
 7) Sprinkle in Some Humor.
 8) Avoid repetition.
 9) Avoid conventional messages and overdoing it with words like “cutting edge”, “revolutionize”.
 10) Be Confident: remove words that undermine authority.
 11) Be Concise: make writing more digestible with fewer than 25 words in a sentence,
 fewer than 4 sentences per paragraph, and no double descriptions."""
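
The evaluation framework above ends with a global score out of 10, "half tone, half structure". One way to read that rule as arithmetic is sketched below. It assumes each half is scored out of 12 and mapped linearly to 5 points, and that per-criterion half points are rounded down as the framework asks. The function name `global_score` and the sample points are illustrative only.

```python
import math


def global_score(tone_points, structure_points):
    """Combine per-criterion points into a score out of 10.

    Assumptions: each half totals at most 12 points and contributes 5 of
    the 10 final points; fractional points per criterion are rounded down.
    """
    tone_total = sum(math.floor(p) for p in tone_points)
    structure_total = sum(math.floor(p) for p in structure_points)
    if tone_total > 12 or structure_total > 12:
        raise ValueError("each half is scored out of 12 points")
    return 5 * tone_total / 12 + 5 * structure_total / 12


# Hypothetical grading: 9 tone criteria met, 3 half-met; 6 structure criteria met
tone = [1] * 9 + [0.5] * 3
structure = [1] * 6 + [0] * 6
print(global_score(tone, structure))
```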

In [None]:
response_comparison = client.chat(
        model="mistral-large-latest",
        messages=[ChatMessage(role="user", content=f"""{role},
        your task is to refine and rewrite video transcripts so that they meet the expected video structure and writing tips.
        {video_structure}
        {writing_tips}
        You are now given two video transcripts.
        Read the transcripts carefully and point out which one has the most stylistic issues according
        to the style guide. Do not rewrite the transcripts.
        <News_1>{news_1}</News_1>
                      
        <News_2> {news_2}</News_2>""")]
    )

In [None]:
response_comparison.choices[0].message.content

In [None]:
response_critique = client.chat(
        model="mistral-large-latest",
        messages=[ChatMessage(role="user", content=f"""{role},
        your task is to refine and rewrite video transcripts so that they meet the high standards of clarity,
        precision, and sophistication characteristic of the influencer.
        You are now given an evaluation framework.
        {evaluation_framework}
        Read the transcript carefully and point out all stylistic issues of the given script according
        to the framework. Do not rewrite the script.
        Finally, grade the script from 1 to 10 based on its level of compliance with the evaluation framework.
        
        <News>{news_2}</News>
        
        Critique:
        
        Grade:""")]
    )

In [None]:
response_critique.choices[0].message.content

In [None]:
response_critique = client.chat(
        model="mistral-large-latest",
        messages=[ChatMessage(role="user", content=f"""{role},
        your task is to refine and rewrite video transcripts so that they meet the high standards of clarity,
        precision, and sophistication characteristic of the influencer.
        You are now given an evaluation framework.
        {evaluation_framework}
        Read the transcript carefully and point out all stylistic issues of the given script according
        to the framework. Do not rewrite the script.
        Finally, grade the script from 1 to 10 based on its level of compliance with the evaluation framework.
        
        <News>{news_1}</News>
        
        Critique:
        
        Grade:""")]
    )

In [None]:
response_critique.choices[0].message.content
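
To compare the two critiques numerically, the grade can be pulled out of the response text. The sketch below assumes the model follows the prompt and writes a `Grade:` label followed by a number; that format is not guaranteed, so the hypothetical helper `extract_grade` returns `None` when it finds nothing usable.

```python
import re


def extract_grade(critique: str):
    """Return the integer after the last 'Grade:' label, or None if absent."""
    matches = re.findall(r"Grade:\s*\**\s*(\d{1,2})", critique)
    if not matches:
        return None
    grade = int(matches[-1])
    # The prompt asks for a grade from 1 to 10; anything else is treated as missing
    return grade if 1 <= grade <= 10 else None


# Usage with a response from this notebook:
# grade = extract_grade(response_critique.choices[0].message.content)
print(extract_grade("Critique: sentences are too long.\n\nGrade: 7/10"))
```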

## Revised video script

In [2]:
content = """
[INTRO]
Hello, I'm Pierre Bittner, host of Applied Ai, the YouTube channel about generative AI and LLM-powered applications. I'm going to present the results of my Mistral fine-tuning hackathon.
Producing a consistent personal tone with LLMs is a real challenge, one that still cannot be solved with prompt engineering techniques alone, even with the most advanced LLMs.
This is the problem I tackled during the hackathon. In fact, this script was reviewed by my fine-tuned model.
[CONTENT]
My goal is to create an approach for producing fine-tuned models that edit video scripts while preserving the channel's unique tone and structure.
17 videos have been published on my channel, and every script went through a conversational assistant. Despite many improvements to the production process, I have never managed to get a satisfactory result when it comes to respecting the writing style or the structure.
Beyond script writing, this problem comes up frequently, for example when writing corporate reports with many contributors. Having a consistent style that matches the company's style is essential, and it is tedious work.
A single word such as youtube, influencer, engaging, click, or share is enough for the LLM to adopt a completely unsuitable style.
LLM fine-tuning is put forward as the solution to this problem. In fact, it is one of the examples in the Mistral documentation.
But is it within everyone's reach? That's what we're going to see.
It was a real opportunity for me to get started. I only have theoretical knowledge of the subject.
The Mistral platform offers very simple tools for fine-tuning models, and they came with a super handy notebook.
Thanks also for the €100 of credit, which was more than useful since over 24 million tokens were used for this competition alone.
Let's zoom in on the approach I set up for this use case.
So I started from example 3 on tone fine-tuning, which uses news articles in the style of The Economist as its example.
The example was not operational and focused only on the format of the training conversations.
So I rebuilt the whole pipeline for the news: step 1, news generation; step 2, critique generation; step 3, conversation generation.
Generating news in the style of The Economist is not very hard because that style is already built into the LLM. The same goes for generating the critiques.
The biggest difficulty in this part turned out to be finding the number of examples required for the fine-tuning to work.
After many attempts, I landed on about 1000 examples.
The code for the news is also available in the repo.
Once this first pipeline was in place came the heart of the challenge.
I had to create a video script generation engine with my style.
Generating 1000 news articles of 500 words is manageable. A 5-to-10-minute video script is between 1000 and 2000 words. That's a change of scale.
The steps I followed to build the generation pipeline are the following:
- I went back over all my scripts to clean them up.
- I had them analyzed to identify their tone and structure.
- I created the script generation guidelines and, above all, a framework that is simple, stable, precise and, most importantly, discriminating enough to validate the style and the quality of the content.
- Many attempts were needed to refine the guidelines and the validation framework.
- To generate video ideas about generative AI, I started from the deeplearning.ai course descriptions.
- Finally, I wrote evaluation scripts for the results of the different models.
A lot of time was spent on the framework, but for what result?
Unfortunately, the benchmarks run on the fine-tuned model versus the original model or Mistral Large show no notable improvement on the script revision task, quite the opposite.
After analysis, some of the training data significantly lowers the scores of the scripts, especially for scripts that already scored well.
However, the unit tests show that for creating new scripts, the results are better.
It was too late to rework the pipeline and run a specific benchmark.
[CONCLUSION]
In any case, this hackathon made it possible to lay a solid foundation for a model fine-tuning pipeline for style.
"""

In [3]:
role = "As a 'Youtube Video Script Writer' for 'GenAI and LLM powered application' influencer"

# Trailing spaces keep the concatenated sentences from running together
content = (f"{role}, "
    "your task is to write the script for an engaging video of 2 minutes (500 to 750 words). "
    "The video presents the results of a hackathon participation on model fine-tuning. "
    "The video should follow these rules:\n"
    f"{video_structure}\n\n"
    f"{writing_tips}\n\n"
    f"Here is the content of the script that you should use: ####{content}####.\n\n")


In [4]:
from mistralai.models.chat_completion import ChatMessage

chat_response = client.chat(
    model="ft:mistral-small-latest:c056c2e4:20240628:f732bd21",
    messages=[ChatMessage(role="user", content=content)]
)

In [9]:
pprint(chat_response)

{
    "id": "8d94130e17ad4b4bb61c35ca98420875",
    "object": "chat.completion",
    "created": 1719778074,
    "model": "ft:mistral-small-latest:c056c2e4:20240628:f732bd21",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello everyone, I'm Pierre Bittner, host of Applied Ai, the YouTube channel dedicated to generative AI and applications powered by LLMs. Today, I'll be sharing my findings from the Mistral Fine Tuning Hackathon.\n\nProducing a personal tone consistent with LLMs is a real challenge that cannot be solved solely by relying on prompt engineering techniques, even when using the most advanced LLMs. This is the problem I addressed during the hackathon. In fact, this script has been reviewed by my fine-tuned model.\n\nMy goal is to create an approach that allows for the production of fine-tuned models that edit video scripts while ensuring respect for the unique tone and structure of

In [7]:
chat_response.choices[0].message.content

'Hello everyone, I\'m Pierre Bittner, host of Applied Ai, the YouTube channel dedicated to generative AI and applications powered by LLMs. Today, I\'ll be sharing my findings from the Mistral Fine Tuning Hackathon.\n\nProducing a personal tone consistent with LLMs is a real challenge that cannot be solved solely by relying on prompt engineering techniques, even when using the most advanced LLMs. This is the problem I addressed during the hackathon. In fact, this script has been reviewed by my fine-tuned model.\n\nMy goal is to create an approach that allows for the production of fine-tuned models that edit video scripts while ensuring respect for the unique tone and structure of the channel. Seventeen videos have been posted on my channel, and all the scripts have gone through a conversation assistant. Despite numerous improvements to the production process, I\'ve never been able to achieve a satisfactory result in terms of maintaining the writing style or structure.\n\nThis issue goes
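
The prompt in this section asks for a script of 500 to 750 words. A rough way to check whether a response respects that range is to count whitespace-separated words, as in the sketch below. The helper names are hypothetical, and whitespace splitting is only an approximation of how a person would count words.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def within_length(text: str, low: int = 500, high: int = 750) -> bool:
    """Check that the text length falls inside the range requested in the prompt."""
    return low <= word_count(text) <= high


# Usage with the response from this notebook:
# script = chat_response.choices[0].message.content
# print(word_count(script), within_length(script))
print(word_count("Hello everyone, I'm the host of this channel."))
```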

## Integration with Weights and Biases
The fine-tuning API also supports integration with Weights & Biases (W&B) to monitor and track metrics and statistics for fine-tuning jobs. To enable it, create a W&B account and add your W&B information to the “integrations” section of the job creation request:



In [None]:
from mistralai.models.jobs import TrainingParameters, WandbIntegrationIn

WANDB_API_KEY = "XXX"  # placeholder: replace with your W&B API key

created_jobs = client.jobs.create(
    model="open-mistral-7b",
    training_files=[videos_chunk_train.id],
    validation_files=[videos_chunk_eval.id],
    hyperparameters=TrainingParameters(
        training_steps=100,
        learning_rate=0.0001,
    ),
    integrations=[
        WandbIntegrationIn(
            project="test_ft_api",
            run_name="test",
            api_key=WANDB_API_KEY,
        ).dict()
    ],
)
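
Hard-coding the W&B key in a notebook makes it easy to leak. The sketch below reads it from the `WANDB_API_KEY` environment variable instead and returns the keyword arguments used in the cell above. `wandb_settings` is a hypothetical helper, and the project and run names are the placeholders from this notebook.

```python
import os


def wandb_settings(env=os.environ, project="test_ft_api", run_name="test"):
    """Build keyword arguments for the W&B integration, or None if no key is set."""
    api_key = env.get("WANDB_API_KEY")
    if not api_key:
        return None
    return {"project": project, "run_name": run_name, "api_key": api_key}


settings = wandb_settings()
print("W&B integration enabled" if settings else "WANDB_API_KEY not set, skipping W&B")
```

When `settings` is not `None`, it can be passed as `WandbIntegrationIn(**settings).dict()` in the `integrations` list, as in the cell above.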