# Fine-tuning

## Loading the Data
In this step, we load the data from the CSV file into a list of dictionaries. csv.DictReader reads the CSV file row by row and converts each row into a dictionary keyed by the column names from the header row.

To load the data, we define a function load_data that takes the CSV file name as input and returns a list of dictionaries, one per row.

In [None]:
!pip install pandas

In [None]:
filename="job_skills.csv"

In [None]:
import csv

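def load_data(filename):
    # newline='' lets the csv module handle line breaks inside quoted fields
    with open(filename, 'r', newline='') as file:
        # Each row becomes a dict keyed by the header row's column names
        reader = csv.DictReader(file)
        data = list(reader)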
    return data
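
To illustrate what csv.DictReader produces, here is a minimal sketch that parses a small in-memory sample instead of the real file. The two rows are hypothetical and exist only for illustration; the column names are a subset of the ones used later in this notebook.

```python
import csv
import io

# Hypothetical two-row sample (not taken from job_skills.csv).
sample_csv = (
    "Title,Category,Location\n"
    "Data Analyst,Analytics,\"Zurich, Switzerland\"\n"
    "Site Engineer,Engineering,Singapore\n"
)

# io.StringIO lets DictReader read the string as if it were an open file.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Every value is a string, and quoted fields may contain commas.
print(rows[0]["Title"])
print(rows[0]["Location"])
print(len(rows))
```

Calling load_data(filename) on the real file should return a list with the same shape: one dictionary per CSV row.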


In [None]:
import pandas as pd
df = pd.read_csv(filename)  # reuse the file name defined in the earlier cell
df.head(50)

## Preparing the Data for OpenAI Fine-tuning
In this step, we prepare the data for OpenAI fine-tuning. We read the rows from the CSV file and write a new JSONL file (one JSON object per line) in which each record has a prompt and a completion.

In [None]:
import json

# Open the input CSV file
with open('job_skills.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    # Open the output JSONL file
    with open('output.jsonl', 'w') as jsonl_file:
        for row in csv_reader:
            # Extract the relevant fields from the CSV row
            location = row['Location']
            responsibilities = row['Responsibilities']
            minimum_qualifications = row['Minimum Qualifications']
            title = row['Title']
            category = row['Category']

            # Construct the JSONL object
            jsonl_obj = {
                'prompt': f'Location: {location}\nResponsibilities: {responsibilities}\nQualifications: {minimum_qualifications}',
                'completion': f'{title}\n{category}'
            }

            # Write the JSONL object to the output file
            jsonl_file.write(json.dumps(jsonl_obj) + '\n')
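
The row-to-record mapping can also be factored into a small function, which makes it easy to try on a single row before writing the whole file. This is a sketch only: build_record is a hypothetical helper name, and the example row is invented for illustration.

```python
import json

def build_record(row):
    """Map one CSV row (a dict) to a prompt/completion pair."""
    prompt = (
        f"Location: {row['Location']}\n"
        f"Responsibilities: {row['Responsibilities']}\n"
        f"Qualifications: {row['Minimum Qualifications']}"
    )
    completion = f"{row['Title']}\n{row['Category']}"
    return {"prompt": prompt, "completion": completion}

# Hypothetical row, using the column names read in the cell above.
example_row = {
    "Title": "Data Analyst",
    "Category": "Analytics",
    "Location": "Singapore",
    "Responsibilities": "Build dashboards",
    "Minimum Qualifications": "BSc in a quantitative field",
}

# json.dumps escapes the embedded newlines, so each record stays on one line.
line = json.dumps(build_record(example_row))
print(line)
```

Because json.dumps escapes newline characters inside strings, a multi-line prompt still occupies exactly one line in the JSONL file.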


In [None]:
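from itertools import islice

# Preview up to the first 50 records; islice stops early if the file has fewer lines
with open('output.jsonl', 'r') as f:
    for line in islice(f, 50):
        line = line.rstrip('\n')  # avoid printing a blank line after each record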
        print(line)
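
Beyond eyeballing the first lines, it can be useful to check that every line parses as JSON and contains both keys. The sketch below is one possible check, not part of any OpenAI tooling; validate_jsonl_lines is a hypothetical helper, and the sample lines are invented for illustration.

```python
import json

def validate_jsonl_lines(lines, required_keys=("prompt", "completion")):
    """Return (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = [key for key in required_keys if key not in record]
        if missing:
            problems.append((number, f"missing keys: {missing}"))
    return problems

# Hypothetical sample: one valid record, one missing a key, one that is not JSON.
sample_lines = [
    '{"prompt": "Location: Singapore", "completion": "Site Engineer"}',
    '{"prompt": "Location: Zurich"}',
    'not json',
]
problems = validate_jsonl_lines(sample_lines)
print(problems)
```

Since an open file iterates line by line, the same function can be applied to the real output by passing the file object from open('output.jsonl', 'r').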

## Preparing the Data Using the OpenAI Tools Package
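In this step, we use the OpenAI command-line tools to prepare the data for fine-tuning. Installing the openai package provides the command openai tools fine_tunes.prepare_data, which validates a dataset, suggests formatting changes, and writes a cleaned copy.

The command takes the following argument:

-f: The name of the input file. Here we pass output.jsonl, the JSONL file created in the previous step.

The tool writes its result to a new file named after the input with a _prepared suffix (output_prepared.jsonl here). It asks for confirmation before applying each suggested change, so we pipe yes into it to accept every suggestion automatically. To prepare the data, we execute the following code in the notebook: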

In [None]:
!pip install openai

In [None]:
!yes | openai tools fine_tunes.prepare_data -f output.jsonl
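
To give a rough idea of the kind of changes a preparation step like this tends to suggest, the sketch below applies three common prompt/completion conventions by hand: a fixed separator at the end of each prompt, a leading space on each completion, and a fixed stop sequence at the end of each completion. The specific separator and stop strings are assumptions chosen for illustration, apply_conventions is a hypothetical helper, and the tool's actual output may differ, so inspect output_prepared.jsonl rather than relying on this.

```python
# Assumed values for illustration; the tool may choose different strings.
PROMPT_SEPARATOR = "\n\n###\n\n"
STOP_SEQUENCE = " END"

def apply_conventions(record, separator=PROMPT_SEPARATOR, stop=STOP_SEQUENCE):
    """Return a copy of a prompt/completion record with the three conventions applied."""
    prompt = record["prompt"]
    completion = record["completion"]
    if not prompt.endswith(separator):
        prompt += separator              # marks where the prompt ends
    if not completion.startswith(" "):
        completion = " " + completion    # leading space on the completion
    if not completion.endswith(stop):
        completion += stop               # marks where the completion ends
    return {"prompt": prompt, "completion": completion}

# Hypothetical record in the same shape as the lines of output.jsonl.
example = {"prompt": "Location: Singapore", "completion": "Site Engineer\nEngineering"}
prepared = apply_conventions(example)
print(repr(prepared["prompt"]))
print(repr(prepared["completion"]))
```

The function is idempotent: applying it to an already formatted record leaves the record unchanged.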

In [None]:
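from itertools import islice

# Preview up to the first 50 prepared records; islice stops early if the file has fewer lines
with open('output_prepared.jsonl', 'r') as f:
    for line in islice(f, 50):
        line = line.rstrip('\n')  # avoid printing a blank line after each record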
        print(line)