
<center> <h1> Prompt Engineering with Local Open LLMs</h1> </center>

<p style="margin-bottom:1cm;"></p>

_____

___[Created By: Dipanjan (DJ)](https://www.linkedin.com/in/dipanjans/)___

In this notebook we will learn how to download and run LLMs locally. It is written for Colab, but you can also run it on your own server as long as you have a GPU with enough memory to run these models!

The model we will try here is:

__[Microsoft Phi-3-Mini-4K-Instruct SLM](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)__, a 3.8B-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model belongs to the Phi-3 family; the Mini version comes in two variants, 4K and 128K, named for the context length (in tokens) each can support.

The model has undergone a post-training process that incorporates both supervised fine-tuning and direct preference optimization for instruction following and safety.

__You will need at least 14GB of GPU memory to swiftly run inference with Microsoft Phi3 Mini.__
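
As a rough sanity check on that figure, the size of the weights alone can be estimated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate only: `estimate_weight_memory_gb` is a hypothetical helper, and it deliberately ignores the KV cache, activations, and CUDA context, which take up the rest of the budget at inference time.

```python
# Bytes needed to store one parameter in some common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}

def estimate_weight_memory_gb(num_params, dtype="bfloat16"):
    """Return the approximate size of the model weights in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

phi3_mini_params = 3.8e9  # parameter count quoted in the model description
for dtype in ("float32", "bfloat16"):
    size_gb = estimate_weight_memory_gb(phi3_mini_params, dtype)
    print(f"{dtype}: ~{size_gb:.1f} GB for weights alone")
```

In `bfloat16`, the dtype used later in this notebook, the weights come to well under the 14GB quoted above, which leaves headroom for everything else the GPU has to hold during generation.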


## Login to Huggingface using your Token

Get your token [here](https://huggingface.co/settings/tokens) and log in using the following code.

In [2]:
from huggingface_hub import notebook_login

notebook_login()


## Load the LLM locally using Huggingface

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model_id = "microsoft/Phi-3-mini-4k-instruct"
dtype = torch.bfloat16
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
    cache_dir='./phi3',
    attn_implementation="flash_attention_2" # check out https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2
)


### Try out a basic prompt

In [4]:
chat = [
    { "role": "user", "content": "Explain what is AI in 3 bullet points" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

In [5]:
print(prompt)

<|user|>
Explain what is AI in 3 bullet points<|end|>
<|assistant|>
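
The printed string shows the layout the chat template produces for a single user turn: a role tag, the content, an `<|end|>` marker, and finally an open `<|assistant|>` tag for the model to continue from. Purely as an illustration, that layout can be reproduced with plain string formatting. `format_phi3_chat` is a hypothetical helper; the real template shipped with the tokenizer may treat other roles or edge cases differently, so `apply_chat_template` remains the call to use in practice.

```python
def format_phi3_chat(messages, add_generation_prompt=True):
    """Mimic the layout seen in the printed prompt, one block per chat turn."""
    parts = []
    for message in messages:
        # Each turn: role tag on its own line, then content closed by <|end|>.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Open assistant tag that cues the model to start its reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)

chat = [{"role": "user", "content": "Explain what is AI in 3 bullet points"}]
print(format_phi3_chat(chat))
```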



In [6]:
model.device

device(type='cuda', index=0)

In [7]:
tokenizer(prompt, add_special_tokens=False, return_tensors="pt")

{'input_ids': tensor([[32010, 12027,  7420,   825,   338,   319, 29902,   297, 29871, 29941,
         24334,  3291, 32007, 32001]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
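
The encoded prompt is only 14 tokens long, so it sits comfortably inside the model's context window. A quick way to reason about whether a request fits is to add the prompt length to the generation budget. The sketch below assumes that "4K" in the model name means 4,096 tokens; `fits_context` is a hypothetical helper, not part of any library.

```python
CONTEXT_WINDOW = 4096  # assumption: "4K" in the model name means 4,096 tokens

def fits_context(prompt_tokens, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Return True when prompt plus generated tokens stay within the window."""
    return prompt_tokens + max_new_tokens <= context_window

# Token IDs copied from the tokenizer output shown above.
input_ids = [32010, 12027, 7420, 825, 338, 319, 29902, 297, 29871, 29941,
             24334, 3291, 32007, 32001]

print("prompt tokens:", len(input_ids))
print("fits with 150 new tokens:", fits_context(len(input_ids), 150))
```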

In [8]:
inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to('cuda')
outputs = model.generate(**inputs,
                         max_new_tokens=150)
print(tokenizer.decode(outputs[0]))

<|user|> Explain what is AI in 3 bullet points<|end|><|assistant|> - AI, or Artificial Intelligence, refers to the development of computer systems that can perform tasks that typically require human intelligence, such as learning, problem-solving, and decision-making.
- AI systems are designed to analyze large amounts of data, recognize patterns, and make predictions or decisions based on that information.
- AI can be applied in various fields, including healthcare, finance, transportation, and entertainment, to improve efficiency, accuracy, and overall performance.<|end|>


Remember to always refer to the [__documentation__](https://huggingface.co/docs/transformers/v4.18.0/en/main_classes/text_generation#transformers.generation_utils.GenerationMixin.generate), where all the arguments of the `generate` method are described in detail. Most notably:

- **max_length:** The maximum total length of the sequence, counting the prompt tokens plus the generated tokens
- **max_new_tokens:** The maximum number of tokens to generate, ignoring the number of tokens in the prompt. Use either max_new_tokens or max_length but not both, since both cap the output length
- **do_sample:** Whether or not to use sampling. False means greedy decoding, which always picks the most likely next token (the behavior that sampling approaches as temperature nears 0)
- **temperature:** A positive value, typically between 0 and 1, used to modulate the next-token probabilities when sampling. A higher temperature means the results may vary more and be more creative
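
To make the effect of `do_sample` and `temperature` concrete, here is a toy illustration in plain Python. It is a sketch of the general idea (dividing the logits by the temperature before the softmax), not the code `transformers` runs internally; the function names and the three made-up logits are assumptions for the example.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities; a lower temperature sharpens the peak."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_next_token(logits, do_sample=False, temperature=1.0, rng=random):
    """Greedy decoding takes the argmax; sampling draws from the scaled distribution."""
    if not do_sample:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 1.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])
print("greedy pick:", pick_next_token(toy_logits))
```

At lower temperatures the top token takes a larger share of the probability mass, so sampled outputs vary less from run to run.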

In [9]:
outputs = model.generate(**inputs,
                         max_new_tokens=1000,
                         do_sample=True,
                         temperature=0.5
                         )
print(tokenizer.decode(outputs[0]))

<|user|> Explain what is AI in 3 bullet points<|end|><|assistant|> - Artificial Intelligence (AI) refers to the simulation of human intelligence processes by machines, especially computer systems. This includes learning, reasoning, and self-correction.

- AI can be categorized into two main types: narrow or weak AI, which is designed to perform a narrow task (such as facial recognition or voice assistants), and general or strong AI, which has the capability to perform any intellectual task that a human being can.

- AI has a wide range of applications, including in healthcare, finance, transportation, and entertainment, and is constantly evolving, with advancements like machine learning and deep learning playing a significant role.<|end|>


In [10]:
print(prompt)

<|user|>
Explain what is AI in 3 bullet points<|end|>
<|assistant|>



### Pipelines make it easier to send prompts

With a pipeline, you don't need to encode and decode your inputs and outputs every time.

In [11]:
phi3_pipe = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="cuda",
)

Device set to use cuda


In [12]:
prompt

'<|user|>\nExplain what is AI in 3 bullet points<|end|>\n<|assistant|>\n'

In [13]:
response = phi3_pipe(prompt,
                      max_new_tokens=500,
                      do_sample=True,
                      temperature=0.5,
                      return_full_text=False) # don't return the input prompt, only show the response

In [14]:
response

[{'generated_text': ' - AI, or Artificial Intelligence, refers to the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using the rules to reach approximate or definite conclusions), and self-correction.\n\n- AI can be categorized into two main types: narrow or weak AI, which is designed to perform a narrow task (such as facial recognition or internet searches), and general or strong AI, which has the ability to perform any intellectual task that a human being can.\n\n- AI applications are widespread and can be found in various industries, including healthcare, finance, transportation, and entertainment. Some common examples of AI applications include voice assistants (like Siri or Alexa), recommendation systems (like those used by Amazon or Netflix), and autonomous vehicles (like self-driving cars).'}]
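
The pipeline returns a list with one dictionary per generated sequence, and the text itself sits under the `generated_text` key, here with a leading space. A small convenience wrapper can pull out the first sequence and trim it. `extract_generated_text` is a hypothetical helper, and the sample below is a made-up stand-in shaped like the real output.

```python
def extract_generated_text(pipeline_output):
    """Return the first generated sequence with surrounding whitespace removed."""
    return pipeline_output[0]["generated_text"].strip()

# Made-up stand-in with the same list-of-dicts shape as the pipeline output.
sample_output = [{"generated_text": " - First point\n- Second point"}]
print(extract_generated_text(sample_output))
```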

In [15]:
print(response[0]['generated_text'])

 - AI, or Artificial Intelligence, refers to the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using the rules to reach approximate or definite conclusions), and self-correction.

- AI can be categorized into two main types: narrow or weak AI, which is designed to perform a narrow task (such as facial recognition or internet searches), and general or strong AI, which has the ability to perform any intellectual task that a human being can.

- AI applications are widespread and can be found in various industries, including healthcare, finance, transportation, and entertainment. Some common examples of AI applications include voice assistants (like Siri or Alexa), recommendation systems (like those used by Amazon or Netflix), and autonomous vehicles (like self-driving cars).


In [16]:
from IPython.display import display, Markdown

display(Markdown(response[0]['generated_text']))

 - AI, or Artificial Intelligence, refers to the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using the rules to reach approximate or definite conclusions), and self-correction.

- AI can be categorized into two main types: narrow or weak AI, which is designed to perform a narrow task (such as facial recognition or internet searches), and general or strong AI, which has the ability to perform any intellectual task that a human being can.

- AI applications are widespread and can be found in various industries, including healthcare, finance, transportation, and entertainment. Some common examples of AI applications include voice assistants (like Siri or Alexa), recommendation systems (like those used by Amazon or Netflix), and autonomous vehicles (like self-driving cars).

## Check how much GPU Memory the LLM Uses

Remember that Microsoft Phi3 3.8B 4K SLM uses around 14GB memory 

In [17]:
!nvidia-smi

Sun Feb 16 10:19:23 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A40                     On  |   00000000:D2:00.0 Off |                    0 |
|  0%   34C    P0             75W /  300W |   13123MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
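
To turn the memory column of that table into numbers, the row can be parsed with a regular expression. This is only a sketch: `parse_gpu_memory` is a hypothetical helper, and it assumes the `used MiB / total MiB` layout shown in this particular `nvidia-smi` output.

```python
import re

def parse_gpu_memory(smi_line):
    """Extract (used_mib, total_mib) from an nvidia-smi memory column."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", smi_line)
    if match is None:
        raise ValueError("no memory usage found in line")
    return int(match.group(1)), int(match.group(2))

# Row copied from the nvidia-smi table above.
line = "|  0%   34C    P0             75W /  300W |   13123MiB /  46068MiB |      0%      Default |"
used, total = parse_gpu_memory(line)
print(f"{used / 1024:.1f} GiB used of {total / 1024:.1f} GiB ({100 * used / total:.0f}%)")
```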
                                                



In [18]:
phi3_pipe

<transformers.pipelines.text_generation.TextGenerationPipeline at 0x7711e1dc3290>

## Prompting with Open-Source LLM

Now we will use our locally loaded LLM and try some tasks with prompting

### 1. Basic Q & A

In [19]:
def create_phi3_prompt(prompt_text):
  chat = [
    { "role": "user", "content": prompt_text },
  ]
  prompt = tokenizer.apply_chat_template(chat, tokenize=False,
                                         add_generation_prompt=True)
  return prompt

In [20]:
prompt_txt = "Can you explain what is mortgage?"
prompt = create_phi3_prompt(prompt_txt)
print(prompt)

<|user|>
Can you explain what is mortgage?<|end|>
<|assistant|>



In [21]:
response = phi3_pipe(prompt,
                      max_new_tokens=1000,
                      do_sample=True,
                      temperature=0.5,
                      return_full_text=False)
response

[{'generated_text': " A mortgage is a type of loan specifically used to purchase real estate. The property being purchased acts as collateral for the loan, which means that if the borrower fails to make the required payments, the lender has the right to take possession of the property through a legal process known as foreclosure.\n\nThe borrower typically makes monthly payments that include both principal (the amount borrowed) and interest (the cost of borrowing). The loan is usually repaid over a set period of time, typically 15 to 30 years. The interest rate can be fixed, meaning it stays the same throughout the loan term, or variable, meaning it can change over time based on market conditions.\n\nMortgages can be obtained from various financial institutions, including banks, credit unions, and mortgage brokers. The borrower's creditworthiness, income, and debt-to-income ratio are taken into consideration when determining eligibility for a mortgage. In some cases, the borrower may ne

In [22]:
display(Markdown(response[0]['generated_text']))

 A mortgage is a type of loan specifically used to purchase real estate. The property being purchased acts as collateral for the loan, which means that if the borrower fails to make the required payments, the lender has the right to take possession of the property through a legal process known as foreclosure.

The borrower typically makes monthly payments that include both principal (the amount borrowed) and interest (the cost of borrowing). The loan is usually repaid over a set period of time, typically 15 to 30 years. The interest rate can be fixed, meaning it stays the same throughout the loan term, or variable, meaning it can change over time based on market conditions.

Mortgages can be obtained from various financial institutions, including banks, credit unions, and mortgage brokers. The borrower's creditworthiness, income, and debt-to-income ratio are taken into consideration when determining eligibility for a mortgage. In some cases, the borrower may need to make a down payment, which is a percentage of the property's purchase price.

Mortgages are important financial tools for many people, allowing them to own a home and build equity over time. However, it's crucial to carefully consider the terms of the mortgage and ensure it is affordable before signing the contract.

### 2. Basic Sentiment Analysis

In [23]:
reviews = [
    """I recently worked with this real estate company to purchase my first home,
    and the experience was outstanding. The agent was knowledgeable, patient, and incredibly responsive.
    They guided me through every step of the process, making what could have been a stressful
    experience very smooth and enjoyable.
    """,
    """This company's attention to detail and professionalism is second to none.
    Our agent went above and beyond to ensure we got the best deal possible.
    From the initial viewing to the final paperwork, everything was handled perfectly.
    We couldn’t be happier with our new property!
    """,
    """I was really let down by the lack of communication from this real estate company.
    It often took days to get a response, and I felt like I was always the last to know about
    updates on my property search. It made the whole buying process much more stressful than
    it needed to be.
    """,
    """My experience with this real estate company was frustrating. The agent seemed more interested
    in closing the deal quickly rather than finding what was best for me. I felt rushed and
    under-informed throughout the process, which has led to regrets about my purchase.
    """
]

In [24]:
from tqdm import tqdm

responses = []

for review in tqdm(reviews):
  prompt_txt = f"""
  Act as a customer review analyst, given the following customer review text,
  do the following tasks:
  - Find the sentiment (positive, negative or neutral)
  - Extract max 5 key topics or phrases of the good or bad in the review

  Review Text:
  {review}
  """
  prompt = create_phi3_prompt(prompt_txt)
  llm_response = phi3_pipe(prompt,
                      max_new_tokens=150,
                      do_sample=False,
                      return_full_text=False)
  responses.append(llm_response[0]['generated_text'])

100%|██████████| 4/4 [00:06<00:00,  1.53s/it]


In [25]:
responses

[' Sentiment: Positive\n\nKey Topics:\n1. Outstanding experience\n2. Knowledgeable agent\n3. Patient and responsive\n4. Guided through every step\n5. Smooth and enjoyable',
 " - Sentiment: Positive\n\n- Key Topics/Phrases:\n\n  1. Attention to detail\n\n  2. Professionalism\n\n  3. Agent's effort\n\n  4. Best deal possible\n\n  5. Happiness with new property",
 ' - Sentiment: Negative\n\n- Key Topics/Phrases:\n\n  1. Lack of communication\n\n  2. Delayed responses\n\n  3. Feeling out of the loop\n\n  4. Stressful buying process\n\n  5. Unnecessary complications',
 ' Sentiment: Negative\n\nKey Topics/Phrases:\n1. Frustrating experience\n2. Agent more interested in quick deal\n3. Felt rushed\n4. Under-informed\n5. Regrets about purchase']
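
The four responses follow a similar but not identical layout (some use a leading bullet, some say "Key Topics" and others "Key Topics/Phrases"), so any downstream code has to parse them loosely. The sketch below shows one way to do that with regular expressions. `parse_review_analysis` is a hypothetical helper written against the outputs shown here; a different model or prompt could produce a layout it does not handle.

```python
import re

def parse_review_analysis(text):
    """Pull the sentiment label and numbered topics out of a free-form response."""
    sentiment_match = re.search(r"Sentiment:\s*(\w+)", text)
    sentiment = sentiment_match.group(1).lower() if sentiment_match else None
    # Numbered lines such as "1. Outstanding experience", with optional indentation.
    topics = re.findall(r"^[ \t]*\d+\.[ \t]*(.+?)[ \t]*$", text, flags=re.MULTILINE)
    return {"sentiment": sentiment, "topics": topics}

# First and third responses from the output above.
samples = [
    ' Sentiment: Positive\n\nKey Topics:\n1. Outstanding experience\n2. Knowledgeable agent\n3. Patient and responsive\n4. Guided through every step\n5. Smooth and enjoyable',
    ' - Sentiment: Negative\n\n- Key Topics/Phrases:\n\n  1. Lack of communication\n\n  2. Delayed responses\n\n  3. Feeling out of the loop\n\n  4. Stressful buying process\n\n  5. Unnecessary complications',
]
for sample in samples:
    print(parse_review_analysis(sample))
```

Asking the model for a stricter output format, such as JSON, would usually make this parsing step simpler, at the cost of a longer prompt.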

In [26]:
for response in responses:
  display(Markdown(response))
  print()

 Sentiment: Positive

Key Topics:
1. Outstanding experience
2. Knowledgeable agent
3. Patient and responsive
4. Guided through every step
5. Smooth and enjoyable




 - Sentiment: Positive

- Key Topics/Phrases:

  1. Attention to detail

  2. Professionalism

  3. Agent's effort

  4. Best deal possible

  5. Happiness with new property




 - Sentiment: Negative

- Key Topics/Phrases:

  1. Lack of communication

  2. Delayed responses

  3. Feeling out of the loop

  4. Stressful buying process

  5. Unnecessary complications




 Sentiment: Negative

Key Topics/Phrases:
1. Frustrating experience
2. Agent more interested in quick deal
3. Felt rushed
4. Under-informed
5. Regrets about purchase




### 3. Content Generation based on topics

In [27]:
prompt_txt = """Generate a bullet list of pros and cons of investing
                in commercial real estate during economic volatility.
                The list should include factors such as market potential,
                risk management, capital appreciation, and liquidity concerns
                """
prompt = create_phi3_prompt(prompt_txt)
llm_response = phi3_pipe(prompt,
                      max_new_tokens=500,
                      do_sample=False,
                      return_full_text=False)

In [28]:
llm_response

[{'generated_text': ' - Pros:\n\n  - Potential for high returns: Commercial real estate can offer significant capital appreciation and rental income, especially in high-demand areas.\n  - Diversification: Investing in commercial real estate can help diversify an investment portfolio, reducing overall risk.\n  - Stable cash flow: Commercial properties often provide a steady stream of rental income, which can be more predictable than other investments.\n  - Tax benefits: Commercial real estate investors may be eligible for various tax deductions and credits, such as depreciation and interest expense deductions.\n  - Long-term investment: Commercial real estate is typically a long-term investment, which can help investors weather short-term market fluctuations.\n\n- Cons:\n\n  - High upfront costs: Purchasing commercial real estate can require a significant initial investment, including closing costs, legal fees, and renovation expenses.\n  - Market volatility: Commercial real estate mark

In [29]:
print(llm_response[0]['generated_text'])

 - Pros:

  - Potential for high returns: Commercial real estate can offer significant capital appreciation and rental income, especially in high-demand areas.
  - Diversification: Investing in commercial real estate can help diversify an investment portfolio, reducing overall risk.
  - Stable cash flow: Commercial properties often provide a steady stream of rental income, which can be more predictable than other investments.
  - Tax benefits: Commercial real estate investors may be eligible for various tax deductions and credits, such as depreciation and interest expense deductions.
  - Long-term investment: Commercial real estate is typically a long-term investment, which can help investors weather short-term market fluctuations.

- Cons:

  - High upfront costs: Purchasing commercial real estate can require a significant initial investment, including closing costs, legal fees, and renovation expenses.
  - Market volatility: Commercial real estate markets can be volatile, with fluctu

In [31]:
display(Markdown(llm_response[0]['generated_text']))

 - Pros:

  - Potential for high returns: Commercial real estate can offer significant capital appreciation and rental income, especially in high-demand areas.
  - Diversification: Investing in commercial real estate can help diversify an investment portfolio, reducing overall risk.
  - Stable cash flow: Commercial properties often provide a steady stream of rental income, which can be more predictable than other investments.
  - Tax benefits: Commercial real estate investors may be eligible for various tax deductions and credits, such as depreciation and interest expense deductions.
  - Long-term investment: Commercial real estate is typically a long-term investment, which can help investors weather short-term market fluctuations.

- Cons:

  - High upfront costs: Purchasing commercial real estate can require a significant initial investment, including closing costs, legal fees, and renovation expenses.
  - Market volatility: Commercial real estate markets can be volatile, with fluctuations in property values, rental rates, and occupancy levels.
  - Limited liquidity: Commercial real estate is not as liquid as other investments, such as stocks or bonds, making it more difficult to sell quickly if needed.
  - Management responsibilities: Commercial real estate investors may need to manage their properties or hire a property management company, which can add to the overall cost and complexity of the investment.
  - Economic downturns: During economic downturns, commercial real estate can be particularly vulnerable, with higher vacancy rates, lower rental rates, and reduced property values.

### 4. Report Summarization

In [32]:
report = """
Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio and synthetic data. The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics and videos in a matter of seconds.
The technology, it should be noted, is not brand-new. Generative AI was introduced in the 1960s in chatbots. But it was not until 2014, with the introduction of generative adversarial networks, or GANs -- a type of machine learning algorithm -- that generative AI could create convincingly authentic images, videos and audio of real people.
On the one hand, this newfound capability has opened up opportunities that include better movie dubbing and rich educational content. It also unlocked concerns about deepfakes -- digitally forged images or videos -- and harmful cybersecurity attacks on businesses, including nefarious requests that realistically mimic an employee's boss.
Two additional recent advances that will be discussed in more detail below have played a critical part in generative AI going mainstream: transformers and the breakthrough language models they enabled. Transformers are a type of machine learning that made it possible for researchers to train ever-larger models without having to label all of the data in advance. New models could thus be trained on billions of pages of text, resulting in answers with more depth. In addition, transformers unlocked a new notion called attention that enabled models to track the connections between words across pages, chapters and books rather than just in individual sentences. And not just words: Transformers could also use their ability to track connections to analyze code, proteins, chemicals and DNA.
The rapid advances in so-called large language models (LLMs) -- i.e., models with billions or even trillions of parameters -- have opened a new era in which generative AI models can write engaging text, paint photorealistic images and even create somewhat entertaining sitcoms on the fly. Moreover, innovations in multimodal AI enable teams to generate content across multiple types of media, including text, graphics and video. This is the basis for tools like Dall-E that automatically create images from a text description or generate text captions from images.
These breakthroughs notwithstanding, we are still in the early days of using generative AI to create readable text and photorealistic stylized graphics. Early implementations have had issues with accuracy and bias, as well as being prone to hallucinations and spitting back weird answers. Still, progress thus far indicates that the inherent capabilities of this generative AI could fundamentally change enterprise technology how businesses operate. Going forward, this technology could help write code, design new drugs, develop products, redesign business processes and transform supply chains.
"""

prompt_txt = f"""
Summarize the following report delimited by triple backticks on Generative AI in max 5 lines

Report:
```{report}```
"""

prompt = create_phi3_prompt(prompt_txt)

llm_response = phi3_pipe(prompt,
                      max_new_tokens=500,
                      do_sample=False,
                      return_full_text=False)
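
Wrapping the report in triple backticks separates the instruction from the text it applies to. One way to make that pattern reusable is a small builder that also rejects input already containing the delimiter, since that would blur the boundary. This is a sketch only: `build_summary_prompt` and its parameters are hypothetical, and the wording simply mirrors the prompt used in this cell.

```python
DELIMITER = "`" * 3  # triple backticks, built this way to keep the example readable

def build_summary_prompt(text, topic, max_lines=5, delimiter=DELIMITER):
    """Build a summarization prompt with the report fenced by a delimiter."""
    if delimiter in text:
        raise ValueError("text already contains the delimiter; pick another one")
    return (
        "Summarize the following report delimited by triple backticks "
        f"on {topic} in max {max_lines} lines\n\n"
        f"Report:\n{delimiter}{text}{delimiter}\n"
    )

print(build_summary_prompt("Some short report text.", "Generative AI"))
```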

In [33]:
print(llm_response[0]['generated_text'])

 Generative AI, introduced in the 1960s and advanced with GANs in 2014, can create text, images, and audio. Recent advances in transformers and large language models (LLMs) have made it possible to generate engaging content across multiple media types. While early implementations have issues with accuracy and bias, the technology has the potential to revolutionize enterprise technology by aiding in code writing, drug design, product development, business process redesign, and supply chain transformation.


In [34]:
display(Markdown(llm_response[0]['generated_text']))

 Generative AI, introduced in the 1960s and advanced with GANs in 2014, can create text, images, and audio. Recent advances in transformers and large language models (LLMs) have made it possible to generate engaging content across multiple media types. While early implementations have issues with accuracy and bias, the technology has the potential to revolutionize enterprise technology by aiding in code writing, drug design, product development, business process redesign, and supply chain transformation.

### 5. Context-based QA

In [35]:
report = """
Three quarters (77%) of the population saw an increase in their regular outgoings over the past year,
according to findings from our recent consumer survey. In contrast, just over half (54%) of respondents
had an increase in their salary, which suggests that the burden of costs outweighing income remains for
most. In total, across the 2,500 people surveyed, the increase in outgoings was 18%, three times higher
than the 6% increase in income.

Despite this, the findings of our survey suggest we have reached a plateau. Looking at savings,
for example, the share of people who expect to make regular savings this year is just over 70%,
broadly similar to last year. Over half of those saving plan to use some of the funds for residential
property. A third are saving for a deposit, and a further 20% for an investment property or second home.

But for some, their plans are being pushed back. 9% of respondents stated they had planned to purchase
a new home this year but have now changed their mind. While for many the deposit may be an issue,
the other driving factor remains the cost of the mortgage, which has been steadily rising the last
few years. For those that currently own a property, the survey showed that in the last year,
the average mortgage payment has increased from £668.51 to £748.94, or 12%."""


In [36]:
question = """
How much has the average mortgage payment increased in the last year?
"""

prompt_txt = f"""
Using the following context information below please answer the following question
to the best of your ability
Context:
{report}
Question:
{question}
"""

prompt = create_phi3_prompt(prompt_txt)

llm_response = phi3_pipe(prompt,
                      max_new_tokens=500,
                      do_sample=False,
                      return_full_text=False)

In [37]:
display(Markdown(llm_response[0]['generated_text']))

 The average mortgage payment has increased by £80.43 in the last year.
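
The context only states the old and new payments and the percentage, so the model had to do the subtraction itself. Because LLMs can slip on arithmetic, it is worth checking the figure independently. A minimal check using the two amounts quoted in the report:

```python
from decimal import Decimal

# Payment figures quoted in the report text.
old_payment = Decimal("668.51")
new_payment = Decimal("748.94")

increase = new_payment - old_payment
percent = (increase / old_payment * 100).quantize(Decimal("0.1"))

print(f"Increase: {increase} GBP ({percent}%)")
```

The difference matches the model's answer, and the percentage agrees with the 12% stated in the report.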

In [38]:
question = """
What percentage of people had an increase in salary last year? Show the answer just as a number.
"""

prompt_txt = f"""
Using the following context information below please answer the following question
to the best of your ability
Context:
{report}
Question:
{question}
"""

prompt = create_phi3_prompt(prompt_txt)

llm_response = phi3_pipe(prompt,
                      max_new_tokens=500,
                      do_sample=False,
                      return_full_text=False)

In [39]:
display(Markdown(llm_response[0]['generated_text']))

 54
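
The two question cells above repeat the same prompt template with only the question changed. A small builder can remove that duplication. This is a sketch under the assumption that the template wording stays as used above; `build_context_qa_prompt` is a hypothetical helper and the sample context is made up for the example.

```python
def build_context_qa_prompt(context, question):
    """Fill the context-QA template used above with a context and a question."""
    return (
        "Using the following context information below please answer the following question\n"
        "to the best of your ability\n"
        f"Context:\n{context.strip()}\n"
        f"Question:\n{question.strip()}\n"
    )

# Made-up context and question, just to show the resulting layout.
sample_prompt = build_context_qa_prompt(
    "  54% of respondents had an increase in their salary.  ",
    "\nWhat percentage of people had an increase in salary?\n",
)
print(sample_prompt)
```

In the notebook, the returned string would then go through `create_phi3_prompt` and `phi3_pipe` exactly as in the cells above.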