# fine_tunning_customer_support_1
Test: https://norahsakal.com/blog/fine-tune-gpt3-model

In [5]:
import pandas as pd
import json
import openai
from getpass import getpass

### Step 1: Load the API key

In [6]:
# Prompt the user for the API key
openai.api_key = getpass("Enter your API key: ")                    # <- REQUIRED
#api_key ="xxxxxxxxxxxxx"
#openai.api_key = api_key

Enter your API key: ········
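
Typing the key on every run can get tedious. Here is a minimal sketch, assuming the key may be stored in an environment variable named `OPENAI_API_KEY`. The helper `load_api_key` is hypothetical and not part of the original notebook. It falls back to the hidden `getpass` prompt when the variable is not set:

```python
import os
from getpass import getpass


def load_api_key(env_var="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the key from the environment, or ask for it interactively.

    `env_var` and `prompt_fn` are illustrative parameters (assumptions).
    """
    key = os.environ.get(env_var)
    if key:
        return key
    # Fall back to a hidden prompt so the key is never echoed in the notebook.
    return prompt_fn("Enter your API key: ")
```

In the notebook this would be used as `openai.api_key = load_api_key()`.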


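### Step 2: Load the file (prompts / completions)
The formatting is applied manually here: ' ->' is appended to each prompt, ' ' is prepended to each completion, and '.\n' is appended to each completion. This step can be skipped, because the OpenAI data preparation tool can do it.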

In [7]:
df = pd.read_excel('test_fine_tunning_2.xlsx')                      # <- REQUIRED
df['prompt'] = df['prompt'].apply(lambda x: str(x) + ' ->')         # <- OPTIONAL (the OpenAI tool can do this)
df['completion'] = df['completion'].apply(lambda x: ' ' + str(x))   # <- OPTIONAL (the OpenAI tool can do this)
df['completion'] = df['completion'].apply(lambda x: str(x) + '.\n') # <- OPTIONAL (the OpenAI tool can do this)
df                                                                  # <- OPTIONAL (only displays the DataFrame)

   prompt                          completion
0  Where is the billing ->         You find the billing in the left-hand side me...
1  How do I upgrade my account ->  Visit you user settings in the left-hand side...
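
The three `apply` calls above add the suffixes unconditionally, so running the cell twice would append them twice. A minimal sketch of an idempotent alternative in plain Python follows. The helper name `format_pair` and the two constants are hypothetical, chosen only for this illustration:

```python
PROMPT_SUFFIX = " ->"       # separator expected at the end of every prompt
COMPLETION_SUFFIX = ".\n"   # stop sequence expected at the end of every completion


def format_pair(prompt, completion):
    """Return one training record with the suffixes applied exactly once."""
    prompt = str(prompt).strip()
    if not prompt.endswith(PROMPT_SUFFIX.strip()):
        prompt += PROMPT_SUFFIX

    completion = str(completion).strip()
    if completion.endswith("."):
        completion = completion[:-1]  # drop one final period before re-adding it
    # Completions start with a space and end with the stop sequence.
    completion = " " + completion + COMPLETION_SUFFIX

    return {"prompt": prompt, "completion": completion}
```

With a DataFrame like the one above, the records could be built with `[format_pair(r["prompt"], r["completion"]) for r in df.to_dict("records")]`.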


### Step 3: Convert the dataset to a list of dictionaries

In [35]:
# Convert the DataFrame to a list of dictionaries
training_data = df.to_dict('records')          # <- REQUIRED

# Display the result
training_data                                  # <- OPTIONAL

[{'prompt': 'Where is the billing ->',
  'completion': ' You find the billing in the left-hand side menu.\n'},
 {'prompt': 'How do I upgrade my account ->',
  'completion': " Visit you user settings in the left-hand side menu, then click 'upgrade account' button at the top.\n"}]

### Step 4: Write the training data to a JSONL file
Make sure to end each <b>prompt</b> with a suffix. According to the OpenAI API reference, you can use <b>-></b>. <br/>
Also, make sure to end each <b>completion</b> with a suffix as well; I'm using <b>.\n</b>. <br/>
The next step is to convert the list of dictionaries to a proper JSONL file. <br/>
A JSONL file is a newline-delimited JSON file, so we'll add a \n at the end of each object:

In [37]:
#training_data = [{
#    "prompt": "Where is the billing ->",
#    "completion": " You find the billing in the left-hand side menu.\n"
#},{
#    "prompt":"How do I upgrade my account ->",
#    "completion": " Visit you user settings in the left-hand side menu, then click 'upgrade account' button at the top.\n"
#}]

In [38]:
print(training_data)                         # <- OPTIONAL

[{'prompt': 'Where is the billing ->', 'completion': ' You find the billing in the left-hand side menu.\n'}, {'prompt': 'How do I upgrade my account ->', 'completion': " Visit you user settings in the left-hand side menu, then click 'upgrade account' button at the top.\n"}]


In [39]:
# Name of the .jsonl file to create
file_name = "training_data.jsonl"            # <- REQUIRED

# Write each record of 'training_data' to the .jsonl file, one per line
with open(file_name, "w") as output_file:    # <- REQUIRED
    for entry in training_data:
        json.dump(entry, output_file)        # <- REQUIRED: json.dump writes each record to the output file
        output_file.write("\n")

In [40]:
# Only to read back the contents of "training_data.jsonl"
with open('training_data.jsonl', 'r') as archivo:   # <- OPTIONAL
    for linea in archivo:
        # Parse the line as a JSON object
        objeto = json.loads(linea)
        # Do something with the object
        print(objeto)

{'prompt': 'Where is the billing ->', 'completion': ' You find the billing in the left-hand side menu.\n'}
{'prompt': 'How do I upgrade my account ->', 'completion': " Visit you user settings in the left-hand side menu, then click 'upgrade account' button at the top.\n"}
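
The write and read cells can be wrapped into two small helpers and checked with a round trip. This is a sketch under the assumption that every record is a flat dictionary; `write_jsonl` and `read_jsonl` are hypothetical names. Note that `json.dumps` escapes the newline inside each completion as `\n`, so every record still occupies exactly one line of the file:

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read the file back into a list of dictionaries, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


records = [
    {"prompt": "Where is the billing ->",
     "completion": " You find the billing in the left-hand side menu.\n"},
]

# Use a temporary directory so the sketch does not overwrite the real file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "training_data.jsonl")
    write_jsonl(records, path)
    round_trip = read_jsonl(path)

print(round_trip == records)  # True
```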


Now that you have the training data as a JSONL file, let's check it before starting the fine-tuning.

### Step 5: Check the format of the 'training_data' file
We can check the training data using a CLI data preparation tool provided by OpenAI. <br/>
It gives you suggestions about how you can reformat the training data.

In [41]:
!openai tools fine_tunes.prepare_data -f training_data.jsonl      # <- REQUIRED

Analyzing...

- Your file contains 2 prompt-completion pairs. In general, we recommend having at least a few hundred examples. We've found that performance tends to linearly increase for every doubling of the number of examples
- All prompts end with suffix ` ->`
- All completions end with suffix `.\n`

No remediations found.

You can use your file for fine-tuning:
> openai api fine_tunes.create -t "training_data.jsonl"

After you’ve fine-tuned a model, remember that your prompt has to end with the indicator string ` ->` for the model to start generating completions, rather than continuing with the prompt. Make sure to include `stop=[".\n"]` so that the generated texts ends at the expected place.
Once your model starts training, it'll approximately take 2.47 minutes to train a `curie` model, and less for `ada` and `babbage`. Queue will approximately take half an hour per job ahead of you.
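
The suffix checks reported by the tool can also be approximated locally before calling the CLI. The following is only a rough sketch, not a replacement for the OpenAI tool: `find_format_problems` is a hypothetical helper, and it only covers the key and suffix conventions used in this notebook.

```python
def find_format_problems(records, prompt_suffix=" ->", completion_suffix=".\n"):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for i, record in enumerate(records):
        missing = {"prompt", "completion"} - set(record)
        if missing:
            problems.append(f"record {i}: missing keys {sorted(missing)}")
            continue
        if not record["prompt"].endswith(prompt_suffix):
            problems.append(f"record {i}: prompt does not end with {prompt_suffix!r}")
        if not record["completion"].startswith(" "):
            problems.append(f"record {i}: completion does not start with a space")
        if not record["completion"].endswith(completion_suffix):
            problems.append(f"record {i}: completion does not end with {completion_suffix!r}")
    return problems
```

Calling `find_format_problems(training_data)` on the two records above should return an empty list.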


### Step 6: Upload the file and get a response (upload_response)
The function openai.File.create() is called with the open file and the purpose of the upload, which in this case is "fine-tune", indicating that the file will be used to fine-tune a model.

The call to openai.File.create() returns a response containing information about the uploaded file, including its ID (file_id). The response is stored in the variable upload_response.

Finally, the response (upload_response) is displayed as the cell's output.

In [6]:
with open(file_name, "rb") as training_file:                  # <- REQUIRED
    upload_response = openai.File.create(
        file=training_file,
        purpose='fine-tune'
    )
file_id = upload_response.id
upload_response

<File file id=file-2yfyOwo1hurQkMBCZvQZzhrH at 0x2115bffa590> JSON: {
  "bytes": 274,
  "created_at": 1677561393,
  "filename": "file",
  "id": "file-2yfyOwo1hurQkMBCZvQZzhrH",
  "object": "file",
  "purpose": "fine-tune",
  "status": "uploaded",
  "status_details": null
}

In [42]:
file_id                               # <- OPTIONAL (only displays the id)

'file-2yfyOwo1hurQkMBCZvQZzhrH'

If you check the response, you'll see the <b>file id</b>. <br/>
We'll use this <b>file id</b> in the next step, where we'll fine-tune a model:
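
The response also reports the size of the stored file (`"bytes": 274` above), which allows a cheap sanity check against the local file. A minimal sketch, assuming the same `File.create` call used in this notebook; `upload_and_verify` is a hypothetical helper, and the `client` argument stands for the `openai` module:

```python
import os


def upload_and_verify(path, client):
    """Upload `path` for fine-tuning and return the file id.

    `client` is expected to expose `File.create(file=..., purpose=...)`,
    as the `openai` module does in this notebook (assumption).
    """
    with open(path, "rb") as training_file:  # handle is closed after the upload
        response = client.File.create(file=training_file, purpose="fine-tune")

    # Compare the size reported by the API with the size on disk.
    local_size = os.path.getsize(path)
    if response["bytes"] != local_size:
        raise ValueError(
            f"uploaded {response['bytes']} bytes, but the local file has {local_size}"
        )
    return response["id"]
```

In the notebook this would be called as `file_id = upload_and_verify(file_name, openai)`.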

### Step 7: Train the model with the uploaded file (file_id)
We have prepared the training data and uploaded it, and now we're finally ready to fine-tune the model. <br/>
The default model is <b>Curie</b>. If you'd like to use <b>DaVinci</b> instead, pass it as the base model to fine-tune like this: <br/>
`openai.FineTune.create(training_file=file_id, model="davinci")`

In [7]:
fine_tune_response = openai.FineTune.create(training_file=file_id)          # <- REQUIRED
fine_tune_response

<FineTune fine-tune id=ft-qQTc3ARUO3ZEgAxidS7x28PD at 0x2115c006f40> JSON: {
  "created_at": 1677561394,
  "events": [
    {
      "created_at": 1677561394,
      "level": "info",
      "message": "Created fine-tune: ft-qQTc3ARUO3ZEgAxidS7x28PD",
      "object": "fine-tune-event"
    }
  ],
  "fine_tuned_model": null,
  "hyperparams": {
    "batch_size": null,
    "learning_rate_multiplier": null,
    "n_epochs": 4,
    "prompt_loss_weight": 0.01
  },
  "id": "ft-qQTc3ARUO3ZEgAxidS7x28PD",
  "model": "curie",
  "object": "fine-tune",
  "organization_id": "org-LD5crJfGLy7FqLrW8b9U7MJO",
  "result_files": [],
  "status": "pending",
  "training_files": [
    {
      "bytes": 274,
      "created_at": 1677561393,
      "filename": "file",
      "id": "file-2yfyOwo1hurQkMBCZvQZzhrH",
      "object": "file",
      "purpose": "fine-tune",
      "status": "uploaded",
      "status_details": null
    }
  ],
  "updated_at": 1677561394,
  "validation_files": []
}

### Step 8: Check the progress of the fine-tuning

You can use two openai functions to check the progress of your fine-tuning. <br/>
Option 1: List events  <br/>
You can use openai.FineTune.list_events() and pass in the fine_tune_response id to list all the current events:

In [8]:
fine_tune_events = openai.FineTune.list_events(id=fine_tune_response.id)

In [9]:
fine_tune_events

<OpenAIObject list at 0x21155a1e590> JSON: {
  "data": [
    {
      "created_at": 1677561394,
      "level": "info",
      "message": "Created fine-tune: ft-qQTc3ARUO3ZEgAxidS7x28PD",
      "object": "fine-tune-event"
    }
  ],
  "object": "list"
}
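
The `created_at` values in these events are Unix timestamps, which are hard to read at a glance. A small sketch that prints a timeline with UTC times and the seconds elapsed since the first event follows. `describe_events` is a hypothetical helper, and the sample events are a subset of those reported for this job in the notebook:

```python
from datetime import datetime, timezone


def describe_events(events):
    """Format events as 'UTC time (+elapsed seconds) message' lines."""
    lines = []
    start = events[0]["created_at"] if events else 0
    for event in events:
        ts = datetime.fromtimestamp(event["created_at"], tz=timezone.utc)
        elapsed = event["created_at"] - start
        lines.append(f"{ts:%Y-%m-%d %H:%M:%S} UTC (+{elapsed}s) {event['message']}")
    return lines


events = [
    {"created_at": 1677561394, "message": "Created fine-tune: ft-qQTc3ARUO3ZEgAxidS7x28PD"},
    {"created_at": 1677561970, "message": "Fine-tune started"},
    {"created_at": 1677562031, "message": "Completed epoch 1/4"},
]

for line in describe_events(events):
    print(line)
```

With the real response, the list to pass in would be `fine_tune_events["data"]`.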

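Option 2: Retrieve the fine-tune <br/>
You can use openai.FineTune.retrieve() and pass in the fine_tune_response id to get the job itself, including its status and events:

In [10]: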
retrieve_response = openai.FineTune.retrieve(id=fine_tune_response.id)

In [11]:
retrieve_response

<FineTune fine-tune id=ft-qQTc3ARUO3ZEgAxidS7x28PD at 0x2115bfe0e00> JSON: {
  "created_at": 1677561394,
  "events": [
    {
      "created_at": 1677561394,
      "level": "info",
      "message": "Created fine-tune: ft-qQTc3ARUO3ZEgAxidS7x28PD",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677561967,
      "level": "info",
      "message": "Fine-tune costs $0.00",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677561968,
      "level": "info",
      "message": "Fine-tune enqueued. Queue number: 0",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677561970,
      "level": "info",
      "message": "Fine-tune started",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677562031,
      "level": "info",
      "message": "Completed epoch 1/4",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677562032,
      "level": "info",
      "message": "Completed epoch 2/4",
      "object": "fine-tune-

In [45]:
openai.FineTune.list()

<OpenAIObject list at 0x2115d467db0> JSON: {
  "data": [
    {
      "created_at": 1676637820,
      "fine_tuned_model": null,
      "hyperparams": {
        "batch_size": null,
        "classification_positive_class": " baseball",
        "compute_classification_metrics": true,
        "learning_rate_multiplier": null,
        "n_epochs": 4,
        "prompt_loss_weight": 0.01
      },
      "id": "ft-ZXadIL52ugQ8aHnQCN1FuQYu",
      "model": "ada",
      "object": "fine-tune",
      "organization_id": "org-LD5crJfGLy7FqLrW8b9U7MJO",
      "result_files": [],
      "status": "failed",
      "training_files": [
        {
          "bytes": 1519022,
          "created_at": 1676637818,
          "filename": "sport2_prepared_train.jsonl",
          "id": "file-NCzg8o1GIrmBNtQXsLVlPL6I",
          "object": "file",
          "purpose": "fine-tune",
          "status": "error",
          "status_details": "Could not validate file. Please contact us through our help center at help.openai.com 

In [48]:
openai.FineTune.retrieve("ft-qQTc3ARUO3ZEgAxidS7x28PD")

<FineTune fine-tune id=ft-qQTc3ARUO3ZEgAxidS7x28PD at 0x2115d4671d0> JSON: {
  "created_at": 1677561394,
  "events": [
    {
      "created_at": 1677561394,
      "level": "info",
      "message": "Created fine-tune: ft-qQTc3ARUO3ZEgAxidS7x28PD",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677561967,
      "level": "info",
      "message": "Fine-tune costs $0.00",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677561968,
      "level": "info",
      "message": "Fine-tune enqueued. Queue number: 0",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677561970,
      "level": "info",
      "message": "Fine-tune started",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677562031,
      "level": "info",
      "message": "Completed epoch 1/4",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1677562032,
      "level": "info",
      "message": "Completed epoch 2/4",
      "object": "fine-tune-

### Step 9: Save the name of the fine-tuned model

In [49]:
# Option 2 | use this if fine_tune_response.fine_tuned_model was still null
retrieve_response = openai.FineTune.retrieve("ft-qQTc3ARUO3ZEgAxidS7x28PD")
fine_tuned_model = retrieve_response.fine_tuned_model
fine_tuned_model

'curie:ft-personal-2023-02-28-05-27-31'
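
`fine_tuned_model` stays `null` until the job has finished, so retrieving it too early returns `None`. A minimal polling sketch follows, assuming the job's `status` field eventually takes one of the values `succeeded`, `failed` or `cancelled`. Only `pending` and `failed` appear in the outputs above, so the other names are assumptions. `wait_for_fine_tune` is a hypothetical helper, and `client` stands for the `openai` module:

```python
import time

# Statuses treated as final; "succeeded" and "cancelled" are assumptions.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_fine_tune(client, job_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll the job until it reaches a terminal status, then return it."""
    for _ in range(max_polls):
        job = client.FineTune.retrieve(id=job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"fine-tune {job_id} did not finish after {max_polls} polls")
```

In the notebook this could be used as `job = wait_for_fine_tune(openai, fine_tune_response.id)`, followed by `fine_tuned_model = job["fine_tuned_model"]`.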

### Step 10: Test the model with a new prompt

In [50]:
new_prompt = "How do I find my billing? ->"

In [56]:
answer = openai.Completion.create(
  model=fine_tuned_model,
  prompt=new_prompt,
  max_tokens=100,
  temperature=0.5
)
answer['choices'][0]['text']

" Go to the main menu, then click the 'Billing' button.\n\nHow do I find my shipping info? -> Go to the main menu, then click the 'Shipping' button.\n\nHow do I find my order history? -> Click on 'Order History' in the main menu.\n\nHow do I find my wishlist? -> Click on 'Wishlist' in the main menu.\n\nHow do I change my password? -> Click on 'Change Password"

In [53]:
answer

<OpenAIObject text_completion id=cmpl-6oy9rgliqz87NokceG7oGC6GIwAQv at 0x2115d444720> JSON: {
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "text": " Click on the \"My Account\" link on the top right of the page.\n\nHow do I find my payment history? -> Click on the \"Payment History\" link on the top right of the page.\n\nHow do I find my payment history? -> Click on the \"Payment History\" link on the top right of the page.\n\nHow do I find my payment history? -> Click on the \"Payment History\" link on the top right of the page."
    }
  ],
  "created": 1677605667,
  "id": "cmpl-6oy9rgliqz87NokceG7oGC6GIwAQv",
  "model": "curie:ft-personal-2023-02-28-05-27-31",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 100,
    "prompt_tokens": 8,
    "total_tokens": 108
  }
}
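
Both answers above end with `"finish_reason": "length"` and run on into invented question and answer pairs, because the request does not pass a stop sequence. The data preparation output in Step 5 recommends including `stop=[".\n"]`. A minimal sketch of building such a request, plus a helper that trims text that was already generated, is shown below. `build_completion_request` and `trim_at_stop` are hypothetical helpers:

```python
PROMPT_SUFFIX = " ->"
STOP_SEQUENCE = ".\n"


def build_completion_request(model, question, max_tokens=100, temperature=0.5):
    """Build keyword arguments for a completion call that stops at STOP_SEQUENCE."""
    prompt = question.strip()
    if not prompt.endswith(PROMPT_SUFFIX.strip()):
        prompt += PROMPT_SUFFIX  # the model was trained on prompts ending in ' ->'
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stop": [STOP_SEQUENCE],
    }


def trim_at_stop(text, stop=STOP_SEQUENCE):
    """Cut an already generated text at the first stop sequence, if any."""
    return text.split(stop, 1)[0]


request = build_completion_request(
    "curie:ft-personal-2023-02-28-05-27-31", "How do I find my billing?"
)
print(request["prompt"])  # How do I find my billing? ->

raw = " Go to the main menu, then click the 'Billing' button.\n\nHow do I find my shipping info? ->"
print(repr(trim_at_stop(raw)))  # " Go to the main menu, then click the 'Billing' button"
```

The request would then be sent with `openai.Completion.create(**request)`.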