# Tokens
This section uses the `tiktoken` library to tokenize a question about San Jose State University (SJSU) with the encoding of OpenAI's `text-davinci-003` model, then decodes each token ID back to text to show where the token boundaries fall.

In [1]:
import tokenize, ast
from io import BytesIO

In [None]:
!pip install tiktoken



In [None]:
from tiktoken import encoding_for_model
enc = encoding_for_model("text-davinci-003")
toks = enc.encode("What is SJSU good for?")
toks

[2061, 318, 31766, 12564, 922, 329, 30]

In [None]:
[enc.decode_single_token_bytes(o).decode('utf-8') for o in toks]

['What', ' is', ' SJ', 'SU', ' good', ' for', '?']
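Decoding each token separately works above because the prompt is plain ASCII. As a hedged aside, a token boundary can fall inside a multi-byte UTF-8 character, in which case calling `.decode('utf-8')` on a single token's bytes may raise an error. The sketch below uses only the standard library; the byte split of "José" is made up for illustration and is not real `tiktoken` output, and `safe_decode` is a hypothetical helper name.

```python
# Token pieces copied from the output above; joining them restores the prompt.
pieces = ['What', ' is', ' SJ', 'SU', ' good', ' for', '?']
restored = "".join(pieces)

# Hypothetical split of "José" in the middle of the two-byte character "é".
raw = "José".encode("utf-8")          # b'Jos\xc3\xa9'
byte_pieces = [raw[:4], raw[4:]]      # b'Jos\xc3' and b'\xa9'

def safe_decode(b: bytes) -> str:
    """Decode bytes, replacing incomplete sequences with U+FFFD."""
    return b.decode("utf-8", errors="replace")

# Per-piece decoding is lossy when a character is split across pieces.
tolerant = [safe_decode(b) for b in byte_pieces]

# Joining the bytes first and decoding once is lossless.
lossless = b"".join(byte_pieces).decode("utf-8")
```

Note also that most pieces carry their leading space (`' is'`, `' good'`), which is why a plain `"".join(...)` reproduces the original text.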

# OPENAI API
This section calls OpenAI's chat completions API with a system prompt that casts the model as an SJSU student. It sends a single question, inspects the token usage, continues the conversation by passing the earlier turns back in `messages`, and finally wraps the call in a small helper, `askgpt`.

In [None]:
!pip install openai



In [None]:
from openai import OpenAI

In [None]:
from google.colab import userdata

In [None]:
api_key = userdata.get('OPENAI_API_KEY')

In [None]:
Student_sys = "You are a student studing in SJSU having knowledge of university nearby and University curriculum."
client = OpenAI(api_key = api_key)

response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": Student_sys},
    {"role": "user", "content": "What SJSU is good for?"},
  ]
)

In [None]:
print(f"RESPONSE:\n{response.choices[0].message.content}")

RESPONSE:
San Jose State University (SJSU) is well-known for its strong programs in engineering, business, computer science, and the arts. Here are some key strengths of SJSU:

1. Engineering: SJSU's Charles W. Davidson College of Engineering offers a variety of programs in areas such as mechanical engineering, electrical engineering, computer engineering, and aerospace engineering. The college has strong industry connections and provides students with hands-on learning experiences.

2. Business: SJSU's Lucas College and Graduate School of Business is AACSB-accredited and offers undergraduate and graduate programs in areas such as business administration, accounting, finance, and management information systems. The college has a strong focus on entrepreneurship and innovation.

3. Computer Science: SJSU's computer science program is highly regarded for its curriculum that covers a wide range of topics including artificial intelligence, data science, cybersecurity, and software engineer

In [None]:
print(response.usage)

CompletionUsage(completion_tokens=354, prompt_tokens=38, total_tokens=392)


In [None]:
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": Student_sys},
    {"role": "user", "content": "What SJSU is good for?"},
    {"role": "assistant", "content": "It really good for Computer Science"},
    {"role": "user", "content": "Really? In what ways?"}
  ])



In [None]:
print(f"RESPONSE:\n{response.choices[0].message.content}")

RESPONSE:
Yes, San Jose State University, especially its Computer Science department, is well-regarded for several reasons:

1. Location: SJSU is located in the heart of Silicon Valley, providing students with ample internship, job, and networking opportunities with tech giants and startups in the industry.

2. Strong Industry Connections: The university has strong ties with leading tech companies in the area, allowing students access to guest lectures, industry projects, and potential job placements.

3. Diverse Curriculum: The Computer Science program at SJSU offers a diverse range of courses, allowing students to specialize in various areas of interest like Artificial Intelligence, Cybersecurity, Data Science, and more.

4. Experienced Faculty: The faculty members at SJSU are often industry professionals themselves, bringing real-world experience and knowledge to the classroom.

5. Research Opportunities: SJSU provides students with research opportunities in cutting-edge technologie
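The second request above works because every earlier turn is sent again in `messages`; the API call itself carries no memory between requests. A minimal sketch of a helper that accumulates turns in that list-of-dicts format follows. The class name `Chat` and its methods are hypothetical and not part of the `openai` package.

```python
class Chat:
    """Accumulate chat messages in the list-of-dicts format used above."""

    def __init__(self, system=None):
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def add(self, role, content):
        # Only conversational roles are appended after the system prompt.
        if role not in {"user", "assistant"}:
            raise ValueError(f"unexpected role: {role}")
        self.messages.append({"role": role, "content": content})
        return self

chat = Chat(system="You are a student at SJSU.")
chat.add("user", "What SJSU is good for?")
chat.add("assistant", "It really good for Computer Science")
chat.add("user", "Really? In what ways?")
# chat.messages could then be passed as `messages=` to the chat completions call.
```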

In [None]:
def askgpt(user, system=None, model="gpt-3.5-turbo", **kwargs):
    msgs = []
    if system: msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": user})
    return client.chat.completions.create(model=model, messages=msgs, **kwargs)

In [None]:
def response(compl):
  return compl.choices[0].message.content

In [None]:
response(askgpt('What is the SRAC?', system=Student_sys))

'The SRAC stands for Student Recreation and Aquatic Center at San Jose State University. It is a facility on campus that offers fitness equipment, group exercise classes, swimming pools, basketball courts, indoor track, and more for students to stay active and healthy. The SRAC is a popular resource for students looking to maintain their physical well-being while studying at SJSU.'

# Creating Own Local Interpreter
This section uses OpenAI function calling to let the model request the execution of local Python functions. A function's signature is converted to a JSON schema with pydantic and passed to the API; the model replies with a function name and JSON-encoded arguments, which are executed locally only if the name is on an allow-list. The same mechanism is then used to expose a `python` function that runs model-written code after the user confirms it.

In [None]:
from pydantic import create_model
import inspect, json
from inspect import Parameter

In [None]:
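def schema(f):
    # Map each parameter to (annotation, default); `...` marks it as required
    kw = {n:(o.annotation, ... if o.default==Parameter.empty else o.default)
          for n,o in inspect.signature(f).parameters.items()}
    # Build a pydantic model from the signature and export its JSON schema
    s = create_model(f'Input for `{f.__name__}`', **kw).schema()
    return dict(name=f.__name__, description=f.__doc__, parameters=s)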

In [None]:
def SJSUID(a:int, b:str='SJSU'):
    "Create SJSU ID by adding Number with SJSU name"
    return str(a) + b

In [None]:
print(SJSUID(10))

10SJSU


In [None]:
schema(SJSUID)

{'name': 'SJSUID',
 'description': 'Create SJSU ID by adding Number with SJSU name',
 'parameters': {'properties': {'a': {'title': 'A', 'type': 'integer'},
   'b': {'default': 'SJSU', 'title': 'B', 'type': 'string'}},
  'required': ['a'],
  'title': 'Input for `SJSUID`',
  'type': 'object'}}
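To make the signature-to-schema step less opaque, here is a simplified standard-library stand-in for what `schema` produces. It is a sketch, not pydantic's algorithm: `simple_schema` and the `JSON_TYPES` mapping are assumptions for illustration, and only a few primitive annotations are handled.

```python
import inspect

# Assumed mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {int: "integer", str: "string", float: "number", bool: "boolean"}

def simple_schema(f):
    """Build a minimal JSON-Schema-like description of f's parameters."""
    props, required = {}, []
    for name, p in inspect.signature(f).parameters.items():
        prop = {"type": JSON_TYPES.get(p.annotation, "string")}
        if p.default is inspect.Parameter.empty:
            required.append(name)        # no default: the caller must supply it
        else:
            prop["default"] = p.default
        props[name] = prop
    params = {"type": "object", "properties": props, "required": required}
    return {"name": f.__name__, "description": f.__doc__, "parameters": params}

def SJSUID(a: int, b: str = "SJSU"):
    "Create SJSU ID by adding Number with SJSU name"
    return str(a) + b

s = simple_schema(SJSUID)
```

As in the pydantic output, `a` ends up in `required` because it has no default, while `b` carries its default value.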

In [None]:
c = askgpt("Use the `SJSUID` function to solve this: What is ID for Number 6?",
           system = "You must use the `SJSUID` function instead of adding yourself.",
           functions=[schema(SJSUID)])

In [None]:
m = c.choices[0].message
m

ChatCompletionMessage(content=None, role='assistant', function_call=FunctionCall(arguments='{"a":6,"b":"SJSU"}', name='SJSUID'), tool_calls=None)

In [None]:
funcs_ok= {'SJSUID','python'}

In [None]:
def call_func(c):
    fc = c.choices[0].message.function_call
    if fc.name not in funcs_ok: return print(f'Not allowed: {fc.name}')
    f = globals()[fc.name]
    return f(**json.loads(fc.arguments))

In [None]:
call_func(c)

'6SJSU'
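`call_func` looks the function up in `globals()` after checking the allow-list. A possible alternative, sketched below, keeps the allow-list and the lookup in one explicit dictionary, so a name can never resolve to anything that was not registered. `REGISTRY`, `dispatch`, and `fake_call` are hypothetical names; `fake_call` merely imitates the `function_call` object shown earlier.

```python
import json
from types import SimpleNamespace

def SJSUID(a: int, b: str = "SJSU"):
    "Create SJSU ID by adding Number with SJSU name"
    return str(a) + b

# Explicit allow-list mapping function names to callables.
REGISTRY = {"SJSUID": SJSUID}

def dispatch(function_call, registry=REGISTRY):
    """Run an allow-listed function with JSON-encoded arguments."""
    f = registry.get(function_call.name)
    if f is None:
        raise PermissionError(f"Not allowed: {function_call.name}")
    return f(**json.loads(function_call.arguments))

# Stand-in for the model's reply; not a real API object.
fake_call = SimpleNamespace(name="SJSUID", arguments='{"a":6,"b":"SJSU"}')
result = dispatch(fake_call)
```

Raising an exception instead of printing makes a rejected call harder to overlook, at the cost of requiring the caller to handle it.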

In [None]:
def run(code):
    tree = ast.parse(code)
    last_node = tree.body[-1] if tree.body else None

    # If the last node is an expression, modify the AST to capture the result
    if isinstance(last_node, ast.Expr):
        tgts = [ast.Name(id='_result', ctx=ast.Store())]
        assign = ast.Assign(targets=tgts, value=last_node.value)
        tree.body[-1] = ast.fix_missing_locations(assign)

    ns = {}
    exec(compile(tree, filename='<ast>', mode='exec'), ns)
    return ns.get('_result', None)
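`run` captures the value of a trailing expression by rewriting it into an assignment to `_result`. A variant of the same idea, sketched here under the hypothetical name `run_split`, executes the leading statements and then evaluates the final expression on its own, which avoids reserving a variable name in the executed namespace. Either way, the code runs with full interpreter privileges, so the confirmation prompt in `python` is the only safeguard.

```python
import ast

def run_split(code):
    """Execute statements, then evaluate a trailing expression separately."""
    tree = ast.parse(code)
    ns = {}
    last = tree.body[-1] if tree.body else None
    if isinstance(last, ast.Expr):
        # Run everything except the final expression statement.
        body = ast.Module(body=tree.body[:-1], type_ignores=[])
        exec(compile(body, "<ast>", "exec"), ns)
        # Evaluate the final expression and return its value.
        expr = ast.Expression(body=last.value)
        return eval(compile(expr, "<ast>", "eval"), ns)
    # No trailing expression: just execute and return nothing.
    exec(compile(tree, "<ast>", "exec"), ns)
    return None
```

For example, `run_split("a = 3\na * 4")` evaluates to `12`, while `run_split("a = 1")` returns `None` because the last statement is an assignment.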

In [None]:
def python(code:str):
    "Return result of executing `code` using python. If execution not permitted, returns `#FAIL#`"
    go = input(f'Proceed with execution?\n```\n{code}\n```\n')
    if go.lower()!='y': return '#FAIL#'
    return run(code)

In [None]:
c = askgpt("What is 12 power 5?",
           system = "Use python for any required computations.",
           functions=[schema(python)])

In [None]:
call_func(c)

Proceed with execution?
```
12 ** 5
```
y


248832

In [None]:
c = client.chat.completions.create(
    model="gpt-3.5-turbo",
    functions=[schema(python)],
    messages=[{"role": "user", "content": "What is 12 power 5?"},
              {"role": "function", "name": "python", "content": "248832"}])

In [None]:
response(c)

'The value of \\(12^5\\) is 248,832.'
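The follow-up request above sends the user question and the function result. A fuller transcript would also include the assistant turn in which the model requested the call. The sketch below assembles such a message list; `followup_messages` is a hypothetical helper, and the `function_call` message shape follows the legacy `functions` interface used in this notebook.

```python
import json

def followup_messages(question, name, arguments, result):
    """Messages for the second request: question, the model's call, its result."""
    return [
        {"role": "user", "content": question},
        # The assistant turn that asked for the function to be run.
        {"role": "assistant", "content": None,
         "function_call": {"name": name, "arguments": json.dumps(arguments)}},
        # The locally computed result, returned to the model as text.
        {"role": "function", "name": name, "content": json.dumps(result)},
    ]

msgs = followup_messages("What is 12 power 5?", "python",
                         {"code": "12 ** 5"}, 248832)
```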

# Exploring Open-Source GPT Models

## Exploring Llama2 Model
Loading the "meta-llama/Llama-2-7b-hf" model with 8-bit quantized weights.

In [None]:
!pip install -q -U transformers peft accelerate optimum

In [1]:
!pip install peft



In [2]:
!pip install bitsandbytes



In [None]:
#!pip install -i https://test.pypi.org/simple/ bitsandbytes

In [3]:
!pip install accelerate



In [None]:
!pip install auto-gptq


In [4]:
!pip install optimum



In [5]:
!pip install transformers==4.37.2

Collecting transformers==4.37.2
  Downloading transformers-4.37.2-py3-none-any.whl (8.4 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.39.0.dev0
    Uninstalling transformers-4.39.0.dev0:
      Successfully uninstalled transformers-4.39.0.dev0
Successfully installed transformers-4.37.2


In [8]:
from transformers import AutoModelForCausalLM,AutoTokenizer
import torch

In [None]:
mn = "meta-llama/Llama-2-7b-hf"

In [None]:
hf_token = userdata.get('HF_TOKEN')

In [None]:
try:
    # Free any previously loaded model and generation result
    del model
    del res
except NameError:
    # Nothing was loaded yet
    print("Error")
import gc
gc.collect()
torch.cuda.empty_cache()

Error


In [None]:
from numba import cuda
device = cuda.get_current_device()
device.reset()

In [None]:
tokr = AutoTokenizer.from_pretrained(mn)
prompt = "SJSU is famous for "
toks = tokr(prompt, return_tensors="pt")

In [None]:
model = AutoModelForCausalLM.from_pretrained(mn, token = hf_token, device_map=0, load_in_8bit=True)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
toks

{'input_ids': tensor([[    1,   317,  8700, 29965,   338, 13834,   363, 29871]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}

In [None]:
tokr.batch_decode(toks['input_ids'])

['<s> SJSU is famous for ']

In [None]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res

CPU times: user 6.13 s, sys: 568 ms, total: 6.7 s
Wall time: 10.4 s


tensor([[    1,   317,  8700, 29965,   338, 13834,   363, 29871, 29896, 29900,
         29900, 29995,  7395, 11104,   393,   526, 21750,   519,   322, 15579,
         29889,    13, 29903]])

In [None]:
tokr.batch_decode(res)

['<s> SJSU is famous for 100% online programs that are affordable and accessible.\nS']

## Exploring Llama2 Model
Loading the "meta-llama/Llama-2-7b-hf" model in 16-bit floating-point format (`torch.float16`).

In [None]:
model = AutoModelForCausalLM.from_pretrained(mn,device_map=0,torch_dtype=torch.float16)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res

CPU times: user 2 s, sys: 306 ms, total: 2.31 s
Wall time: 4.65 s


tensor([[    1,   317,  8700, 29965,   338, 13834,   363, 29871, 29906,  2712,
         29901,   967,  1880,  5768,   449,  6554,   322,   967,  1880,  5733,
          3815, 29889,    13]])

In [None]:
tokr.batch_decode(res)

['<s> SJSU is famous for 2 things: its high dropout rate and its high football team.\n']
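The main reason to choose between these formats is memory. A rough back-of-the-envelope estimate multiplies the parameter count by the bits stored per parameter. The sketch below uses a nominal 7 billion parameters (an assumption for round numbers) and ignores activations, the key-value cache, and any quantization metadata; `weight_gb` is a hypothetical helper.

```python
def weight_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_7B = 7e9  # nominal parameter count, assumed for illustration

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_gb(N_7B, bits):.1f} GB")
```

In the timings above, the 8-bit run took longer (10.4 s wall time) than the float16 run (4.65 s), so for this setup the benefit of 8-bit loading is reduced memory rather than speed.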

## Exploring Llama2 Quantized 7B Model
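Loading the GPTQ-quantized "TheBloke/Llama-2-7b-Chat-GPTQ" model. The weights are stored in a quantized format (note the 3.90 GB download); `torch_dtype=torch.float16` sets the dtype for the remaining tensors and the computation.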

In [None]:
model = AutoModelForCausalLM.from_pretrained('TheBloke/Llama-2-7b-Chat-GPTQ',device_map=0,torch_dtype=torch.float16)

config.json:   0%|          | 0.00/789 [00:00<?, ?B/s]



model.safetensors:   0%|          | 0.00/3.90G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

In [None]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res



CPU times: user 7.72 s, sys: 3.35 s, total: 11.1 s
Wall time: 11.5 s


tensor([[    1,   317,  8700, 29965,   338, 13834,   363, 29871, 29941,  2712,
         29901,    13,    13, 29896, 29889,  8011,  9560, 24165, 29901,   317,
          8700, 29965,   756]])

In [None]:
tokr.batch_decode(res)

['<s> SJSU is famous for 3 things:\n\n1. Its beautiful campus: SJSU has']

## Exploring Llama2 Quantized 13B Model
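Loading the GPTQ-quantized "TheBloke/Llama-2-13B-GPTQ" model. The weights are stored in a quantized format (note the 7.26 GB download); `torch_dtype=torch.float16` sets the dtype for the remaining tensors and the computation.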

In [None]:
model = AutoModelForCausalLM.from_pretrained('TheBloke/Llama-2-13B-GPTQ',device_map=0,torch_dtype=torch.float16)

config.json:   0%|          | 0.00/913 [00:00<?, ?B/s]



model.safetensors:   0%|          | 0.00/7.26G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/132 [00:00<?, ?B/s]

In [None]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res



CPU times: user 13.7 s, sys: 7.71 s, total: 21.4 s
Wall time: 22.7 s


tensor([[    1,   317,  8700, 29965,   338, 13834,   363, 29871, 29941,  2712,
         29901,    13, 29896, 29889,   450, 25673,   273,  5733,  3815,    13,
         29906, 29889,   450]])

In [None]:
def gen(p, maxlen=15, sample=True):
    toks = tokr(p, return_tensors="pt")
    res = model.generate(**toks.to("cuda"), max_new_tokens=maxlen, do_sample=sample).to('cpu')
    return tokr.batch_decode(res)
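With `do_sample=True`, each next token is drawn at random from the model's predicted distribution, so repeated calls to `gen` can return different text; with `do_sample=False`, the most likely token is taken each time. The toy sketch below illustrates the difference on made-up scores using only the standard library. It does not model the other settings that `generate` may apply (for example top-k or top-p filtering from the generation config), and `softmax` and `pick` are hypothetical helpers.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract the max for stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick(logits, sample=True, temperature=1.0, rng=random):
    """Greedy argmax when sample=False, otherwise draw from the softmax."""
    if not sample:
        return max(range(len(logits)), key=logits.__getitem__)
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

toy_logits = [2.0, 1.0, 0.1]   # made-up scores for three candidate tokens
greedy = pick(toy_logits, sample=False)
rng = random.Random(0)         # fixed seed so the draws are repeatable
draws = [pick(toy_logits, rng=rng) for _ in range(1000)]
```

Greedy selection always returns index 0 here, while the sampled draws include the less likely indices in rough proportion to their probabilities.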

In [None]:
gen(prompt,50)

['<s> SJSU is famous for 20 Nobel laureates such as Dr. Arthur Schawlow and Dr. Leon Lederman. San Jose is called Silicon Valley due to the abundant companies related to semiconductor industry. SJSU is proud to be the']

## Exploring StableBeluga-7B Model
Loading the Hugging Face "stabilityai/StableBeluga-7B" model in 16-bit `bfloat16` format.

In [None]:
mn = "stabilityai/StableBeluga-7B"
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0,torch_dtype=torch.bfloat16)

config.json:   0%|          | 0.00/583 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

In [None]:
sb_sys = "### System:\nYou are Stable Beluga, an AI that follows instructions extremely well. Help as much as you can.\n\n"

In [None]:
def mk_prompt(user, syst=sb_sys): return f"{syst}### User: {user}\n\n### Assistant:\n"

In [None]:
ques = "Why SJSU is famous in Silicon valley?"

In [None]:
gen(mk_prompt(ques), 150)

['<s> ### System:\nYou are Stable Beluga, an AI that follows instructions extremely well. Help as much as you can.\n\n### User: Why SJSU is famous in Silicon valley?\n\n### Assistant:\n San José State University (SJSU) is famously known and respected in Silicon Valley due to its strong academic programs, dedicated research initiatives, and partnerships with the most innovative technology companies in the area.\n\nSJSU has a rich history of producing accomplished graduates in various fields, including computer science, engineering, and business. These alumni have gone on to become industry leaders, disrupting the status quo and advancing technological innovation in Silicon Valley. SJSU alumni include former Apple CEO Steve Jobs, Google co-founder Sergey Brin, and PayPal co-founder Max Levchin, among many others.\n\nThe university is also']

## Exploring OpenOrca-Platypus2-13B Quantized Model
Loading the GPTQ-quantized Hugging Face "TheBloke/OpenOrca-Platypus2-13B-GPTQ" model, with `torch_dtype=torch.float16` for the remaining tensors and the computation.

In [None]:
mn = 'TheBloke/OpenOrca-Platypus2-13B-GPTQ'
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, torch_dtype=torch.float16)



generation_config.json:   0%|          | 0.00/154 [00:00<?, ?B/s]

In [None]:
def mk_oo_prompt(user): return f"### Instruction: {user}\n\n### Response:\n"

In [None]:
gen(mk_oo_prompt(ques), 150)

["<s> ### Instruction: List of 3 University is at centre of Silicon valley?\n\n### Response:\n1. Stanford University\n2. University of California, Berkeley\n3. University of California, Santa Cruz\n\nNote: These are three educational institutions in the vicinity of Silicon Valley. While Stanford University and the University of California, Berkeley have significant connections with the tech industry, the third one (University of California, Santa Cruz) isn't considered to be at the center of Silicon Valley. However, listing them in no alter order since all are universities associated with Silicon Valley.\n\nPlease let me know if you would like information about any specific university or its relation to Silicon Valley.### Instruction: Please provide information on Stanford University, especially its role in the Silicon"]

# Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) retrieves documents relevant to a query and places them in the prompt, so the language model can answer from that text rather than only from what it memorized during training. Here, LangChain's `WikipediaRetriever` fetches the Wikipedia article on San Jose State University, the article text is inserted into the prompt as context, and a locally loaded model is asked to answer a question using it.
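Before using a real retriever, the retrieve-then-prompt idea can be sketched with the standard library alone. The example below ranks passages by word overlap with the question, which is a deliberately crude stand-in for a real retriever, and then builds a prompt in the same layout used later in this section. The passages are made up, and `score`, `retrieve`, and `build_prompt` are hypothetical helpers.

```python
def score(query: str, passage: str) -> int:
    """Count distinct query words that also occur in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p)

def retrieve(query, passages, k=1):
    """Return the k passages with the highest word overlap."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(question, context):
    """Place the retrieved context ahead of the question."""
    return ("Answer the question with the help of the provided context.\n\n"
            f"## Context\n\n{context}\n\n## Question\n\n{question}")

# Made-up passages standing in for retrieved documents.
passages = [
    "SJSU is a public university in San Jose, California.",
    "The Golden Gate Bridge spans the entrance to San Francisco Bay.",
]
question = "Where is SJSU located?"
top = retrieve(question, passages)[0]
prompt = build_prompt(question, top)
```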

In [13]:
!pip install langchain


In [None]:
from langchain.retrievers import WikipediaRetriever

In [None]:
retriever = WikipediaRetriever()

In [None]:
docs = retriever.get_relevant_documents(query="San Jose State University")

In [None]:
docs[0].metadata

{'title': 'San Jose State University',
 'summary': "San José State University (San Jose State or SJSU) is a public university in San Jose, California. Established in 1857, SJSU is the oldest public university on the West Coast and the founding campus of the California State University (CSU) system. The university, along side the University of California, Los Angeles has academic origins in the historic normal school known as the California State Normal School.\nLocated in downtown San Jose, the SJSU main campus is situated on 154 acres (62 ha), or roughly 19 square blocks. As of spring 2023, SJSU offers 150 bachelor's degree programs, 95 master's degrees, five doctoral degrees, 11 different credential programs and 42 certificates. SJSU is accredited by the WASC Senior College and University Commission.SJSU's total enrollment was 35,751 in fall 2022, including nearly 8,900 graduate and credential students. SJSU's student population is one of the most ethnically diverse in the nation. As

In [None]:
docs[0].page_content[:3000]

'San José State University (San Jose State or SJSU) is a public university in San Jose, California. Established in 1857, SJSU is the oldest public university on the West Coast and the founding campus of the California State University (CSU) system. The university, along side the University of California, Los Angeles has academic origins in the historic normal school known as the California State Normal School.\nLocated in downtown San Jose, the SJSU main campus is situated on 154 acres (62 ha), or roughly 19 square blocks. As of spring 2023, SJSU offers 150 bachelor\'s degree programs, 95 master\'s degrees, five doctoral degrees, 11 different credential programs and 42 certificates. SJSU is accredited by the WASC Senior College and University Commission.SJSU\'s total enrollment was 35,751 in fall 2022, including nearly 8,900 graduate and credential students. SJSU\'s student population is one of the most ethnically diverse in the nation. As of fall 2022, graduate student enrollment, Asi

In [None]:
len(docs[0].page_content.split())

636

In [None]:
page_txt=docs[0].page_content

In [None]:
ques = "What is unique about SJSU?"

In [None]:
ques_ctx = f"""Answer the question with the help of the provided context.

## Context

{page_txt}

## Question

{ques}"""

In [None]:
res = gen(mk_prompt(ques_ctx), 300)
res

['<s> ### System:\nYou are Stable Beluga, an AI that follows instructions extremely well. Help as much as you can.\n\n### User: Answer the question with the help of the provided context.\n\n## Context\n\nSan José State University (San Jose State or SJSU) is a public university in San Jose, California. Established in 1857, SJSU is the oldest public university on the West Coast and the founding campus of the California State University (CSU) system. The university, along side the University of California, Los Angeles has academic origins in the historic normal school known as the California State Normal School.\nLocated in downtown San Jose, the SJSU main campus is situated on 154 acres (62 ha), or roughly 19 square blocks. As of spring 2023, SJSU offers 150 bachelor\'s degree programs, 95 master\'s degrees, five doctoral degrees, 11 different credential programs and 42 certificates. SJSU is accredited by the WASC Senior College and University Commission.SJSU\'s total enrollment was 35,7

In [None]:
print(res[0].split('### Assistant:\n')[1])

 The unique aspect about SJSU is its historical significance and rich heritage as the oldest public university on the West Coast and the founding campus of the California State University (CSU) system. Additionally, San José State University offers 150 bachelor's degree programs, 95 master's degrees, five doctoral degrees, 11 different credential programs and 42 certificates, making it a comprehensive higher education institution.</s>
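The `split(...)[1]` expression above assumes the marker is present and leaves the trailing `</s>` in place. A slightly more defensive variant is sketched below; `extract_answer` is a hypothetical helper, and the sample string is made up rather than real model output.

```python
def extract_answer(generated: str, marker: str = "### Assistant:\n") -> str:
    """Return the text after the last marker, without the end-of-sequence tag."""
    _, found, tail = generated.rpartition(marker)
    # If the marker is missing, fall back to the whole string.
    answer = tail if found else generated
    return answer.replace("</s>", "").strip()

# Made-up generation in the Stable Beluga prompt layout.
sample = ("<s> ### System:\nBe helpful.\n\n### User: Hi\n\n"
          "### Assistant:\n Hello there.</s>")
answer = extract_answer(sample)
```

Using the last occurrence of the marker also guards against retrieved context that happens to contain the marker text itself.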


# Fine-Tuning
This section fine-tunes the "EleutherAI/gpt-neox-20b" model on the "Abirate/english_quotes" dataset using Hugging Face libraries, with 4-bit quantization and LoRA adapters to keep memory use manageable.
The steps are:

1. **Configuration for Efficient Loading and Quantization**:
   - Initialize `BitsAndBytesConfig` to set up model quantization for efficient memory usage, with specific options for 4-bit loading and computation settings.

2. **Model and Tokenizer Initialization**:
   - Load the tokenizer using `AutoTokenizer.from_pretrained(model_id)`.
   - Load the model with quantization settings through `AutoModelForCausalLM.from_pretrained()`, specifying the `model_id` and `bnb_config`.

3. **Enable Gradient Checkpointing**:
   - Call `model.gradient_checkpointing_enable()` to optimize memory usage during training.

4. **Optimize Model with PEFT (Parameter Efficient Fine-Tuning)**:
   - Configure `LoraConfig` for targeted adjustments in the model, focusing on specific modules like "query_key_value".
   - Apply `get_peft_model()` to wrap the model with LoRA adapters, so that only the small adapter matrices are trained while the quantized base weights stay frozen.

5. **Prepare the Dataset**:
   - Load the "Abirate/english_quotes" dataset using `load_dataset()`.
   - Preprocess the data with the tokenizer to convert quotes into a model-compatible format, setting `tokenizer.pad_token` if necessary.

6. **Training Setup**:
   - Initialize the `Trainer` with the model, processed dataset, and training arguments tailored for efficient fine-tuning, including batch size, learning rate, and optimizer settings.
   - Use `DataCollatorForLanguageModeling` for appropriate data batching and tokenization handling.

7. **Training Execution**:
   - Disable cache usage in model configuration to avoid warnings.
   - Start training with `trainer.train()`, fine-tuning the model on the quotes dataset.




In [35]:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets



In [10]:
import datasets

In [30]:
data = datasets.load_dataset('Abirate/english_quotes')

In [31]:
data

DatasetDict({
    train: Dataset({
        features: ['quote', 'author', 'tags'],
        num_rows: 2508
    })
})

In [32]:
trn = data['train']
trn[5]

{'quote': "“Be who you are and say what you feel, because those who mind don't matter, and those who matter don't mind.”",
 'author': 'Bernard M. Baruch',
 'tags': ['ataraxy',
  'be-yourself',
  'confidence',
  'fitting-in',
  'individuality',
  'misattributed-dr-seuss',
  'those-who-matter']}

In [6]:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "EleutherAI/gpt-neox-20b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)




Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [7]:
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})

model.safetensors.index.json:   0%|          | 0.00/60.4k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/46 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/46 [00:00<?, ?it/s]

In [8]:
from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)


def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )


from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)


trainable params: 8650752 || all params: 10597552128 || trainable%: 0.08162971878329976
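The trainable-parameter count printed above can be reproduced by hand. A LoRA adapter of rank `r` on a linear layer with `d_in` inputs and `d_out` outputs adds two matrices with `r * d_in` and `d_out * r` entries. The sketch assumes GPT-NeoX-20B's published shapes (hidden size 6144, 44 transformer layers, and a fused `query_key_value` projection from 6144 to 3 × 6144); these figures are not stated in the notebook itself, and `lora_params` is a hypothetical helper.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r

# Assumed GPT-NeoX-20B shapes (see the lead-in).
hidden, layers, r = 6144, 44, 8

per_layer = lora_params(hidden, 3 * hidden, r)
total = per_layer * layers
print(per_layer, total)  # 196608 8650752
```

The "all params" figure is roughly half of 20 billion, which is probably because the 4-bit weights are stored packed two per byte; treat that explanation as an assumption rather than something this notebook verifies.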


In [17]:
from datasets import load_dataset

# Tokenize every quote; batched=True passes lists of quotes to the tokenizer.
data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)

Map:   0%|          | 0/2508 [00:00<?, ? examples/s]

In [18]:
import transformers

# needed for gpt-neo-x tokenizer
tokenizer.pad_token = tokenizer.eos_token

trainer = transformers.Trainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=10,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()


Step,Training Loss
1,2.1298
2,2.4401
3,2.5244
4,3.2779
5,2.5149
6,1.5516
7,2.1703
8,2.5746
9,1.5949
10,1.8961


TrainOutput(global_step=10, training_loss=2.2674554228782653, metrics={'train_runtime': 176.7286, 'train_samples_per_second': 0.226, 'train_steps_per_second': 0.057, 'total_flos': 197834861740032.0, 'train_loss': 2.2674554228782653, 'epoch': 0.02})
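The metrics in `TrainOutput` follow from the training arguments. As a rough check, the sketch below recomputes them from values shown in this notebook; it assumes a single GPU and takes the training-set size (2508) from the `Map` output above.

```python
# Values copied from the TrainingArguments and TrainOutput above.
per_device_batch = 1
grad_accum_steps = 4
max_steps = 10
train_runtime_s = 176.7286
train_examples = 2508  # assumed train-split size, from the Map output

# One optimizer step consumes batch * accumulation samples (single device assumed).
samples_per_step = per_device_batch * grad_accum_steps
samples_seen = samples_per_step * max_steps

samples_per_second = samples_seen / train_runtime_s
steps_per_second = max_steps / train_runtime_s
epoch_fraction = samples_seen / train_examples

# Per-step losses from the log table above.
step_losses = [2.1298, 2.4401, 2.5244, 3.2779, 2.5149,
               1.5516, 2.1703, 2.5746, 1.5949, 1.8961]
mean_loss = sum(step_losses) / len(step_losses)

print(round(samples_per_second, 3), round(steps_per_second, 3),
      round(epoch_fraction, 2), round(mean_loss, 4))
```

With only 40 samples seen out of 2508, the run covers under 2% of one epoch, so the loss values mainly confirm that the pipeline runs and say little about convergence.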

In [19]:
trainer.save_model("./Quote Model")

In [20]:
model.save_pretrained("Quote Model")

In [23]:
!mv '/content/Quote Model' '/content/drive/MyDrive/CMPE-258'

In [26]:
text="Family is"
device="cuda:0"
inputs=tokenizer(text,return_tensors="pt").to(device)
outputs=model.generate(**inputs,max_new_tokens=20)

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


In [27]:
print(tokenizer.decode(outputs[0],skip_special_tokens=True))

Family is everything. You just can't go through life alone, that's why people have families, so you


In [28]:
text="Education is important"
device="cuda:0"
inputs=tokenizer(text,return_tensors="pt").to(device)
outputs=model.generate(**inputs,max_new_tokens=20)

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


In [29]:
print(tokenizer.decode(outputs[0],skip_special_tokens=True))

Education is important in every way, but most especially in the way it makes one feel about oneself. One can never


# Llama-cpp
`llama_cpp` is the import name of the `llama-cpp-python` package, which provides Python bindings for llama.cpp, a C/C++ inference engine for Llama-family models. It loads quantized GGUF model files and runs text generation locally. The cells below load a 4-bit (Q4_K_M) build of Llama-2-7B-chat from Google Drive and ask it a question in a simple `Q: ... A:` prompt format.

In [2]:
!pip install llama-cpp-python

Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.46.tar.gz (36.7 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.46-cp310-cp310-manylinux_2_35_x86_64.whl size=2615898 sha256=0973798844b6cd25495d59c69381d5d2778d2ca3e902e6ab2a3073e3ff6dba9c
  Stored i

In [3]:
from llama_cpp import Llama

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [26]:
llm = Llama(model_path="/content/drive/MyDrive/CMPE-258/llama-2-7b-chat.Q4_K_M.gguf")

llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /content/drive/MyDrive/CMPE-258/llama-2-7b-chat.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attentio

In [31]:
output = llm("Q: Just Name the all mission to explore Moon south pole from different country?  A: ", max_tokens=64, stop=["Q:", "\n"], echo=True)

Llama.generate: prefix-match hit

llama_print_timings:        load time =    9963.68 ms
llama_print_timings:      sample time =      11.68 ms /    20 runs   (    0.58 ms per token,  1712.62 tokens per second)
llama_print_timings: prompt eval time =   10728.04 ms /    18 tokens (  596.00 ms per token,     1.68 tokens per second)
llama_print_timings:        eval time =   11106.36 ms /    19 runs   (  584.55 ms per token,     1.71 tokens per second)
llama_print_timings:       total time =   21910.94 ms /    37 tokens
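The per-token figures in the timing log are simple ratios of elapsed time to token count. A minimal sketch that recomputes them from the numbers printed above (`throughput` is a hypothetical helper, not part of `llama_cpp`):

```python
def throughput(total_ms: float, n_tokens: int) -> tuple[float, float]:
    """Return (ms per token, tokens per second) for one timed phase."""
    ms_per_token = total_ms / n_tokens
    return ms_per_token, 1000.0 / ms_per_token


# Numbers copied from the llama_print_timings lines above.
prompt_ms_per_tok, prompt_tps = throughput(10728.04, 18)  # prompt eval
eval_ms_per_tok, eval_tps = throughput(11106.36, 19)      # generation

print(round(prompt_ms_per_tok, 2), round(prompt_tps, 2))
print(round(eval_ms_per_tok, 2), round(eval_tps, 2))
```

At well under two tokens per second, both prompt processing and generation are slow for a 7B model, which would be consistent with inference running on the CPU, although the log shown here does not state which backend was used.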


In [32]:
print(output['choices'])

[{'text': 'Q: Just Name the all mission to explore Moon south pole from different country?  A: 1. SpaceIL - Israel: SpaceIL is an Israeli non-profit organization...', 'index': 0, 'logprobs': None, 'finish_reason': 'stop'}]
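Because the call used `echo=True`, the returned `text` starts with the prompt itself. A minimal sketch for pulling out only the answer is shown below; `extract_answer` is a hypothetical helper, and `sample_output` is a stand-in dictionary shaped like the printed result above rather than a live model response.

```python
def extract_answer(output: dict, prompt: str) -> str:
    """Return the completion text without the echoed prompt."""
    text = output["choices"][0]["text"]
    # With echo=True the prompt is prepended to the completion.
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Stand-in response shaped like the printed output above (text shortened there too).
prompt = "Q: Just Name the all mission to explore Moon south pole from different country?  A: "
sample_output = {
    "choices": [
        {
            "text": prompt + "1. SpaceIL - Israel: SpaceIL is an Israeli non-profit organization...",
            "index": 0,
            "logprobs": None,
            "finish_reason": "stop",
        }
    ]
}

print(extract_answer(sample_output, prompt))
```

Passing `echo=False` to the `llm(...)` call avoids the need for this step, since the completion is then returned without the prompt.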
