# Fine-tuning

## Loading the Data
In this step, we load the data from the CSV file. csv.DictReader reads the file row by row and returns each row as a dictionary keyed by the column names in the header row.

To load the data, we define a function load_data that takes the CSV file name as input and returns a list of dictionaries, one per row.

In [1]:
!pip install pandas



In [2]:
filename="job_skills.csv"

In [3]:
import csv

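def load_data(filename):
    # Read as UTF-8 explicitly; newline='' lets the csv module handle
    # line breaks inside quoted fields.
    with open(filename, 'r', encoding='utf-8', newline='') as file:
        reader = csv.DictReader(file)
        data = list(reader)
    return data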
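
The function above is defined but not called in this notebook. The following is a minimal usage sketch, assuming a tiny stand-in CSV written to a temporary file. The two sample rows and the helper write_sample_csv are hypothetical and are not taken from job_skills.csv. load_data is repeated, with an explicit UTF-8 encoding, so that the block runs on its own.

```python
import csv
import os
import tempfile


def load_data(filename):
    # Same idea as the notebook's load_data, with an explicit encoding.
    with open(filename, 'r', encoding='utf-8', newline='') as file:
        return list(csv.DictReader(file))


def write_sample_csv():
    # Hypothetical helper: writes two made-up rows and returns the file path.
    rows = [
        {'Title': 'Example Engineer', 'Location': 'Example City'},
        {'Title': 'Example Analyst', 'Location': 'Sample Town, Country'},
    ]
    fd, path = tempfile.mkstemp(suffix='.csv')
    with os.fdopen(fd, 'w', encoding='utf-8', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=['Title', 'Location'])
        writer.writeheader()
        writer.writerows(rows)
    return path


sample_path = write_sample_csv()
data = load_data(sample_path)
os.remove(sample_path)

# Each row is a dictionary keyed by the header names.
print(len(data))
print(data[1]['Location'])
```

The second row shows that a quoted field containing a comma comes back as a single value.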


In [4]:
import pandas as pd
df = pd.read_csv('job_skills.csv')
df.head(50)

Unnamed: 0,Company,Title,Category,Location,Responsibilities,Minimum Qualifications,Preferred Qualifications
0,Google,Google Cloud Program Manager,Program Management,Singapore,"Shape, shepherd, ship, and show technical prog...",BA/BS degree or equivalent practical experienc...,Experience in the business technology market a...
1,Google,"Supplier Development Engineer (SDE), Cable/Con...",Manufacturing & Supply Chain,"Shanghai, China",Drive cross-functional activities in the suppl...,BS degree in an Engineering discipline or equi...,"BSEE, BSME or BSIE degree.\nExperience of usin..."
2,Google,"Data Analyst, Product and Tools Operations, Go...",Technical Solutions,"New York, NY, United States",Collect and analyze data to draw insight and i...,"Bachelor’s degree in Business, Economics, Stat...",Experience partnering or consulting cross-func...
3,Google,"Developer Advocate, Partner Engineering",Developer Relations,"Mountain View, CA, United States","Work one-on-one with the top Android, iOS, and...",BA/BS degree in Computer Science or equivalent...,"Experience as a software developer, architect,..."
4,Google,"Program Manager, Audio Visual (AV) Deployments",Program Management,"Sunnyvale, CA, United States",Plan requirements with internal customers.\nPr...,BA/BS degree or equivalent practical experienc...,CTS Certification.\nExperience in the construc...
5,Google,"Associate Account Strategist (Czech/Slovak), G...",Technical Solutions,"Dublin, Ireland",Communicate with customers via phone and email...,Bachelor's degree or equivalent practical expe...,"Experience in sales, customer service, account..."
6,Google,"Supplier Development Engineer, Camera, Consume...",Hardware Engineering,"Mountain View, CA, United States",Manage cross-functional activities in the supp...,BS degree in Engineering or equivalent practic...,Master's degree.\nExperience in the developmen...
7,Google,"Strategic Technology Partner Manager, Healthca...",Partnerships,"Sunnyvale, CA, United States",Lead the development and strategy with partner...,BA/BS degree or equivalent practical experienc...,"BA/BS degree in a technical, life sciences or ..."
8,Google,"Manufacturing Business Manager, Google Hardware",Manufacturing & Supply Chain,"Xinyi District, Taiwan",Develop CM/ODM strategy and implement supplier...,"BA/BS degree in Engineering, Supply Chain or e...",MBA degree.\nExperience in procurement and sup...
9,Google,"Solutions Architect, Healthcare and Life Scien...",Technical Solutions,"New York, NY, United States",Help compile customer requirements as well as ...,"BA/BS degree in Computer Science, related Soft...","Master's degree in Computer Science, related E..."


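## Preparing the Data for OpenAI Fine-tuning
In this step, we prepare the data for OpenAI fine-tuning. Each CSV row becomes one JSON object with a prompt (location, responsibilities, and minimum qualifications) and a completion (title and category). The objects are written one per line to a new JSONL file that will be used for fine-tuning.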

In [14]:
import json

# Open the input CSV file (newline='' lets the csv module handle line breaks inside quoted fields)
with open('job_skills.csv', 'r', encoding='UTF-8', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    # Open the output JSONL file
    with open('output.jsonl', 'w', encoding='UTF-8') as jsonl_file:
        for row in csv_reader:
            # Extract the relevant fields from the CSV row
            location = row['Location']
            responsibilities = row['Responsibilities']
            minimum_qualifications = row['Minimum Qualifications']
            title = row['Title']
            category = row['Category']

            # Construct the JSONL object
            jsonl_obj = {
                'prompt': f'Location: {location}\nResponsibilities: {responsibilities}\n Qualifications: {minimum_qualifications}',
                'completion': f'{title}\n{category}'
            }

            # Write the JSONL object to the output file
            jsonl_file.write(json.dumps(jsonl_obj) + '\n')
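
Before moving on, it can help to confirm that every line of the file is a standalone JSON object with both keys present. The sketch below is one possible check. validate_jsonl is a hypothetical helper, and the demonstration writes its own small sample file rather than reading output.jsonl.

```python
import json
import os
import tempfile


def validate_jsonl(path, required_keys=('prompt', 'completion')):
    """Return (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    with open(path, 'r', encoding='utf-8') as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                problems.append((line_number, 'blank line'))
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                problems.append((line_number, 'invalid JSON'))
                continue
            if not isinstance(obj, dict):
                problems.append((line_number, 'not a JSON object'))
                continue
            for key in required_keys:
                if not obj.get(key):
                    problems.append((line_number, f'missing or empty {key}'))
    return problems


# Made-up sample: one good line, one without a completion, one that is not JSON.
sample_lines = [
    json.dumps({'prompt': 'Location: Example City', 'completion': 'Example Engineer'}),
    json.dumps({'prompt': 'Location: Sample Town'}),
    'not json',
]
fd, sample_path = tempfile.mkstemp(suffix='.jsonl')
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    f.write('\n'.join(sample_lines) + '\n')

problems = validate_jsonl(sample_path)
os.remove(sample_path)
print(problems)
```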


In [15]:
with open('output.jsonl', 'r', encoding='UTF-8') as f:
    for i in range(50):
        line = f.readline()
        print(line)

{"prompt": "Location: Singapore\nResponsibilities: Shape, shepherd, ship, and show technical programs designed to support the work of Cloud Customer Engineers and Solutions Architects.\nMeasure and report on key metrics tied to those programs to identify any need to change course, cancel, or scale the programs from a regional to global platform.\nCommunicate status and identify any obstacles and paths for resolution to stakeholders, including those in senior roles, in a transparent, regular, professional and timely manner.\nEstablish expectations and rationale on deliverables for stakeholders and program contributors.\nProvide program performance feedback to teams in Product, Engineering, Sales, and Marketing (among others) to enable efficient cross-team operations.\n Qualifications: BA/BS degree or equivalent practical experience.\n3 years of experience in program and/or project management in cloud computing, enterprise software and/or marketing technologies.", "completion": "Google C

## Preparing the Data Using the OpenAI Tools Package
In this step, we use the data preparation tool that ships with the openai package to check the file and prepare it for fine-tuning. It is a command-line tool rather than a Python function, and it is run as openai tools fine_tunes.prepare_data.

The command takes the following arguments:

-f: The name of the input file (here, the JSONL file created in the previous step).
-q: Optional. Accepts all suggested changes without asking for input.

The tool analyzes the file, suggests fixes, and writes the result to a new file with _prepared appended to the name (output_prepared.jsonl in this case).

To prepare the data, we execute the following in the notebook:

In [10]:
!pip install openai

Collecting openai
  Downloading openai-1.9.0-py3-none-any.whl (223 kB)

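In [16]:
!openai tools fine_tunes.prepare_data -f output.jsonl -q

Piping yes into the command (!yes | openai tools fine_tunes.prepare_data -f output.jsonl) is a common way to accept the prompts automatically, but yes is a Unix utility and the command fails on Windows:

'yes' is not recognized as an internal or external command,
operable program or batch file.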
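
This notebook does not list the changes the tool suggests. As an assumption based on the legacy prompt/completion fine-tuning format, typical suggestions include ending every prompt with a fixed separator, starting every completion with a space, and ending every completion with a fixed stop sequence. The sketch below applies those three conventions by hand. format_example is a hypothetical helper, the SEPARATOR and STOP values are illustrative choices, and none of this is the tool's own implementation.

```python
SEPARATOR = '\n\n###\n\n'  # illustrative separator marking the end of the prompt
STOP = ' END'              # illustrative stop sequence marking the end of the completion


def format_example(example, separator=SEPARATOR, stop=STOP):
    # Trim stray whitespace, then add the separator, leading space, and stop sequence.
    prompt = example['prompt'].rstrip()
    completion = example['completion'].strip()
    return {
        'prompt': prompt + separator,
        'completion': ' ' + completion + stop,
    }


# Made-up record in the same shape as the rows written to output.jsonl.
raw = {
    'prompt': 'Location: Example City\nResponsibilities: Example duties',
    'completion': 'Example Engineer\nExample Category',
}
formatted = format_example(raw)
print(repr(formatted['prompt']))
print(repr(formatted['completion']))
```

Whatever separator and stop sequence are used in training must also be used when prompting the fine-tuned model, so that the model sees the same boundary markers.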


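In [12]:
with open('output_prepared.jsonl', 'r', encoding='utf-8') as f:
    for i in range(50):
        line = f.readline()
        print(line)

Without an explicit encoding, open falls back to the system default codec, which on this machine (cp950) cannot decode the UTF-8 bytes in the file:

UnicodeDecodeError: 'cp950' codec can't decode byte 0xe2 in position 2066: illegal multibyte sequence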
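
Two details of the preview loop are worth noting: readline returns an empty string once the file is exhausted, so a file with fewer than 50 lines produces trailing blank output, and the result depends on the encoding used to open the file. The sketch below is one way to address both, using itertools.islice and an explicit UTF-8 encoding. preview_jsonl is a hypothetical helper, and the sample file is made up for the demonstration.

```python
import itertools
import json
import os
import tempfile


def preview_jsonl(path, limit=5):
    """Parse and return at most `limit` records from the start of a JSONL file."""
    with open(path, 'r', encoding='utf-8') as f:
        # islice stops at `limit` lines or at end of file, whichever comes first.
        return [json.loads(line) for line in itertools.islice(f, limit) if line.strip()]


# Made-up sample with a non-ASCII apostrophe stored as raw UTF-8.
records = [
    {'prompt': 'Qualifications: Bachelor\u2019s degree', 'completion': 'Example Analyst'},
    {'prompt': 'Qualifications: BA/BS degree', 'completion': 'Example Engineer'},
    {'prompt': 'Qualifications: MBA degree', 'completion': 'Example Manager'},
]
fd, sample_path = tempfile.mkstemp(suffix='.jsonl')
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')

preview = preview_jsonl(sample_path, limit=2)
os.remove(sample_path)
for record in preview:
    print(record)
```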
