# Prompting a Large Language Model


This is an example notebook for running an open source model from Hugging Face.

In [3]:
%pip install -r requirements.txt

Collecting bs4 (from -r requirements.txt (line 2))
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Collecting pypdf (from -r requirements.txt (line 4))
  Downloading pypdf-5.1.0-py3-none-any.whl.metadata (7.2 kB)
Downloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)
Downloading pypdf-5.1.0-py3-none-any.whl (297 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m298.0/298.0 kB[0m [31m10.8 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pypdf, bs4
Successfully installed bs4-0.0.2 pypdf-5.1.0


In [4]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

We use the [Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) in this notebook due to its small size.

In [5]:
torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    device_map="cuda", # change to cpu is running locally
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/3.45k [00:00<?, ?B/s]

configuration_phi3.py:   0%|          | 0.00/11.2k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3.5-mini-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


modeling_phi3.py:   0%|          | 0.00/73.8k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3.5-mini-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


model.safetensors.index.json:   0%|          | 0.00/16.3k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/195 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/3.98k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/306 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

An example few-shot prompt:

In [9]:
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

messages2 = [
    {"role": "system", "content": "You are an expert in Data Scince"},
    {"role": "user", "content": "What are the most important skills in role of Data scientist?"},
    {"role": "Data science expert", "content": "There are many skills needed for succesfull data scientist namaly: Statistics , Math and Computer science skills lay fundementals for all data scientists."},
    {"role": "user", "content": "What programmin languages and libaries are the most important?"},
]

In [10]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

In [11]:
output = pipe(messages2, **generation_args)
print(output[0]['generated_text'])

 As a data scientist, you will need a combination of technical skills, domain knowledge, and soft skills to excel in your role. Here are some of the most important skills for a data scientist:

1. Programming languages: Proficiency in programming languages is crucial for data scientists. The most important ones include:

   a. Python: Python is widely used for data analysis, machine learning, and visualization due to its simplicity and extensive libraries.
   
   b. R: R is a language specifically designed for statistical analysis and data visualization. It has numerous packages for data manipulation, statistical modeling, and graphics.
   
   c. SQL: SQL (Structured Query Language) is essential for querying and managing data in relational databases.

2. Libraries and tools: Familiarity with the following libraries and tools is essential for data scientists:

   a. NumPy and Pandas: These libraries are used for data manipulation, analysis, and cleaning in Python.
   
   b. Matplotlib a