## Large Language Model (LLM)
A Large Language Model is an advanced type of artificial intelligence model designed to understand and generate human-like text. 
These models are trained on vast amounts of text data from diverse sources like books, websites, and articles.

* **Context-Based:** LLMs work by analyzing the context in which words appear. They predict the next token (a word or a piece of a word) from the tokens that came before it. This ability to generate text conditioned on context is what makes them useful for a wide range of language-related tasks.
* **Pre-Trained:** These models are pre-trained on large datasets, meaning they have already learned to recognize patterns, structures, and meanings in text. This pre-training allows them to be used for various tasks with minimal additional training or customization.
* **Versatility:** LLMs can perform a wide variety of tasks, such as generating creative writing, answering questions, translating languages, summarizing text, and even understanding the sentiment behind a message.
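
To make "predict the next word from the words before it" concrete, here is a deliberately tiny sketch. It is not how an LLM works internally: it only counts which word follows which in a made-up corpus, whereas a real model learns from far more context with a neural network. The names `train_bigrams`, `predict_next`, and the corpus are invented for this illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Made-up training text; a real model sees billions of words.
toy_corpus = "the cat sat on the mat . the cat ate the fish . the cat slept"
bigram_counts = train_bigrams(toy_corpus)
print(predict_next(bigram_counts, "the"))  # cat
```

An LLM does the same job of scoring candidate continuations, but it conditions on the whole preceding context rather than on a single word.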

The best way to truly understand LLMs is to experiment with them yourself.

## Setup
First, you'll need to install the required libraries. The Hugging Face transformers library makes it easy to work with LLMs.

In [None]:
!pip install transformers torch "numpy<2" ipywidgets requests_html lxml_html_clean "faiss-cpu<1.8"

## Import the necessary packages

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, AutoModel
import torch, faiss

## Basic Text Generation
Let's start with a simple text generation example using the `AutoTokenizer` and `AutoModelForCausalLM` classes from Hugging Face.

Note that we are using a very small model so that it is easier to run on a CPU. This will impact the quality of the generated text.
However, I prefer to find the lower bound of model performance: if a small model can generate good text, a larger model will usually do even better.

You can find more models at https://huggingface.co/models

In [91]:
checkpoint = "HuggingFaceTB/SmolLM-360M"

device = "cpu"  # use "cuda" for GPU or "cpu" for CPU
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
config = AutoConfig.from_pretrained(checkpoint)

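def generated_text(input_text, tokenizer=tokenizer, model=model, device=device, config=config, max_new_tokens=100):
    """Sample a continuation of input_text and return the prompt plus the continuation."""
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
    # Display the maximum sequence length (context window size) next to the prompt length
    print(f"Maximum sequence length: {config.max_position_embeddings} tokens, input is {inputs.shape[1]} tokens")
    # There is no padding in a single prompt, so every input token is attended to
    attention_mask = torch.ones(inputs.shape, device=device)
    # Sample with a moderate temperature; the end-of-sequence token doubles as the padding token.
    # (mask_token_id is not a generation parameter, so it is no longer passed.)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens, temperature=0.5, do_sample=True,
                             pad_token_id=tokenizer.eos_token_id,
                             eos_token_id=tokenizer.eos_token_id, attention_mask=attention_mask)
    # Decode the full sequence (prompt plus newly generated tokens) back to text
    output = tokenizer.decode(outputs[0])
    return output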
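
The `generate` call above samples with `do_sample=True` and `temperature=0.5`. As a rough illustration of what temperature does, the sketch below divides a set of made-up scores by the temperature before turning them into probabilities. The function name and the scores are invented for this example; the library applies the same idea to the model's real logits.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

With the lower temperature the top candidate takes a larger share of the probability, so sampled text becomes more predictable and less varied.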



In [92]:

input_text = """
User: whome do i speak to?
Assistent: You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability.
User: Can you give me a short summary of what Entiros does?
Assistent: """
output = generated_text(input_text)
print(output)



Maximum sequence length: 2048 tokens, input is 74 tokens

User: whome do i speak to?
Assistent: You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability.
User: Can you give me a short summary of what Entiros does?
Assistent: 1. We are a chatbot that helps people with their health and wellness. 2. We use artificial intelligence to analyze your health and wellness data and provide you with personalized recommendations. 3. We also provide you with health and wellness resources, such as articles and videos.
User: How can I get started with Entiros?
Assistent: 1. Visit the Entiros website and sign up for an account. 2. You can also find more information about Ent


## Context is Everything

* **Context is Crucial:** The more context and details you provide in a prompt, the more accurate and reliable the model's response will be. Always consider what information the model might need to know to generate the best possible answer.

* **Prompt Tailoring:** Adjust your prompts according to the specific scenario or task at hand. Whether you’re handling customer service queries, creating content, or seeking coding help, a well-crafted prompt will lead to better outcomes.

Let's see if we can improve the answer by providing better context.

In [93]:
import requests_html

session = requests_html.HTMLSession()
url = "https://www.entiros.se/"
response = session.get(url)
# get all the text from the paragraph (p), heading (h1-h6), and list item (li) tags
web_info = response.html.xpath('//p | //h1 | //h2 | //h3 | //h4 | //h5 | //h6 | //li')
web_info = [t.text for t in web_info]
web_info = ". ".join(web_info)

input_text = f"""
User: whome do i speak to?
Assistent: You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability.
User: What do you know about Entiros?
Assistent: This is what i found on the wab page of Entiros: {web_info}
User: Can you give me a short summary of what Entiros does?
Assistent: """

output = generated_text(input_text)
print(output)


Maximum sequence length: 2048 tokens, input is 390 tokens

User: whome do i speak to?
Assistent: You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability.
User: What do you know about Entiros?
Assistent: This is what i found on the wab page of Entiros: Better ways to build integrations. Data-informed connectivity. Our approach combines data analysis with experience to make data-informed network decisions, ensuring relevance and effectiveness in every integration.. Data-driven connectivity. Customer case. Find out how adopting technological advancements simplifies processes, boosts expansion, and gears up for upcoming obstacles in the constantly changing automotive sector.. "Lorem Ipsum text asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd ". Företag ABC. John Johnsson. "Lorem Ipsum". Företag ABC. John Johnson. Let's connect. About us. Entiros is a specialized 

## What Are Tokens?
In the context of Large Language Models (LLMs), tokens are the individual pieces of text that the model processes. A token can be as small as a single character, such as a letter or punctuation mark, or as large as a whole word. Many tokenizers also split words into subword pieces, depending on the tokenization method the model uses.

* **Tokenization:** Before an LLM can process text, it needs to break the text down into tokens. For example, the sentence "Hello, world!" might be tokenized into ["Hello", ",", "world", "!"]. The way text is split into tokens depends on the model and the tokenizer associated with it.
* **Processing Text:** The model then processes these tokens sequentially, using them to predict the next token or generate a response based on the sequence of tokens it has seen so far.
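
The "Hello, world!" example can be reproduced with a toy tokenizer. This is only a sketch: it splits on words and punctuation with a regular expression and numbers the pieces in order of appearance, while real tokenizers such as the one loaded earlier use a learned subword vocabulary. `toy_tokenize` and `build_vocab` are names made up for this illustration.

```python
import re

def toy_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def build_vocab(tokens):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

toy_tokens = toy_tokenize("Hello, world!")
toy_vocab = build_vocab(toy_tokens)
toy_ids = [toy_vocab[t] for t in toy_tokens]
print(toy_tokens)  # ['Hello', ',', 'world', '!']
print(toy_ids)     # [0, 1, 2, 3]
```

The model never sees the strings themselves, only the integer ids.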

## What Is the Context Window?
The context window refers to the maximum number of tokens that an LLM can process at one time. This is also known as the model's "maximum sequence length." For instance, if an LLM has a context window of 512 tokens, it can only consider the most recent 512 tokens when generating its next output.

* **Context Window Size:** The size of the context window is crucial because it determines how much information the model can keep track of at once. A larger context window allows the model to consider more context when generating responses, which can lead to more coherent and contextually relevant outputs.
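
One simple way to respect a context window is to keep only the most recent tokens and reserve room for the tokens that will be generated. The sketch below assumes a 2048-token window like the model used earlier; `fit_to_context` is a hypothetical helper, not a library function, and the token ids are stand-ins.

```python
def fit_to_context(token_ids, context_window, max_new_tokens=0):
    """Keep only the most recent tokens so that prompt plus generation fits the window."""
    budget = context_window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return token_ids[-budget:]

toy_prompt_ids = list(range(3000))  # stand-in for a 3000-token prompt
kept = fit_to_context(toy_prompt_ids, context_window=2048, max_new_tokens=100)
print(len(kept), kept[0])  # 1948 1052
```

Truncation is the bluntest option, because everything before the cut is simply lost to the model.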


In [6]:
# Tokenize a text
ids = tokenizer.encode("Hello, Entiros!")
print(ids)

[19556, 28, 10369, 89, 4066, 17]


In [7]:
# ids to text representation
for id in ids:
    print(f"{tokenizer.decode(id)} -> {id}")

Hello -> 19556
, -> 28
 Ent -> 10369
i -> 89
ros -> 4066
! -> 17


In [106]:
url = "https://blog.entiros.se/en/blog-media"
response = session.get(url)
blogs = response.html.xpath(r"//a[contains(@class, 'blog-post__post')]")
texts = []
for item in blogs:
    print(item.attrs['href'])
    response = session.get(item.attrs['href'])
    blog_text = response.html.xpath(r"//article")
    texts.append(blog_text[0].text)

policy_info = "\n\n ".join(texts)
policy_info

https://blog.entiros.se/en/blog-media/from-data-lakes-to-dynamic-data-ecosystems
https://blog.entiros.se/en/blog-media/the-challenges-of-point-to-point-connectivity-in-it-infrastructure
https://blog.entiros.se/en/blog-media/the-importance-of-data-for-business-intelligence
https://blog.entiros.se/en/blog-media/entiros-launches-starlify-in-swedish-cloud-service-approved-by-skr
https://blog.entiros.se/en/blog-media/7-ways-integration-discovery-can-enhance-your-psd2-compliance
https://blog.entiros.se/en/blog-media/choosing-the-right-api-technology


'The role of data lakes is undergoing a significant transformation. Traditionally viewed as mere repositories for storing vast amounts of data, the modern approach to utilizing data lakes is much more dynamic and strategic, particularly when enhancing Business Intelligence (BI) operations.\nThe Traditional Data Lake: A Repository, Not a Source\nHistorically, data lakes have been treated as data sinks — places where data is accumulated from various sources but seldom retrieved for real-time analysis. The common perception was that once data was stored in a data lake, it was not to be disturbed except for predefined reporting and BI tasks. This originated from concerns that the data, often batch-processed from dozens if not hundreds of applications, was too stale for real-time decision-making.\nHowever, a paradigm shift is now reshaping how organizations approach their integration networks. Companies are moving towards real-time data flow systems rather than relying on traditional Extrac

In [107]:
# Tokenize the scraped blog posts and compare their length with the context window
encoded_policy_info = tokenizer.encode(policy_info, return_tensors="pt").to(device)
print(f"Maximum sequence length: {config.max_position_embeddings} tokens, input is {encoded_policy_info.shape[1]} tokens")

Maximum sequence length: 2048 tokens, input is 3844 tokens


In [108]:
# what can we do when the input is too long?
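
One common answer is to split the long text into smaller chunks that overlap, so that a sentence cut at one chunk boundary still appears whole in the neighbouring chunk. Here is a minimal sketch of that idea on a plain list of token ids; `chunk_with_overlap` is a name invented for this example.

```python
def chunk_with_overlap(token_ids, chunk_size, overlap):
    """Split a token list into windows of `chunk_size` that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + chunk_size])
        if start + chunk_size >= len(token_ids):
            break  # this window already reaches the end of the input
    return chunks

toy_stream = list(range(10))
for chunk in chunk_with_overlap(toy_stream, chunk_size=4, overlap=2):
    print(chunk)
```

In this sketch the last chunk can be shorter than the others, so batching the chunks into one tensor would need padding or a final window taken from the end of the text.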

In [109]:
# Load pre-trained model and tokenizer
embed_model_name = "sentence-transformers/all-MiniLM-L6-v2"
embed_tokenizer = AutoTokenizer.from_pretrained(embed_model_name)
embed_model = AutoModel.from_pretrained(embed_model_name)
embed_config = AutoConfig.from_pretrained(embed_model_name)
max_position_embeddings = embed_config.max_position_embeddings

def embed_text(text, token_length=max_position_embeddings, overlap=max_position_embeddings//2):
    # Tokenize the input text
    inputs = embed_tokenizer(text, return_tensors="pt", add_special_tokens=False)
    print(f"Maximum sequence length: {max_position_embeddings} tokens, input is {inputs['input_ids'].shape[1]} tokens")

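    # Split the token ids into overlapping chunks, each prefixed with the [CLS] token
    input_list = inputs["input_ids"].reshape(-1).tolist()
    cls = embed_tokenizer.cls_token_id
    n = len(input_list)

    # Full-size windows; the stride is shorter than the window, so neighbouring chunks overlap.
    # The range stops before the end of the text, so the tail is not covered here...
    input_ids = [ [cls] + input_list[i:i+token_length-1] for i in range(0, n-token_length-1, token_length-overlap-1)]
    # ...and is added as one final window taken from the end of the text
    input_ids.append([cls] + input_list[-token_length+1:])
    # Keep the decoded text of every chunk (without [CLS]) so search hits can be mapped back to text
    lookup = [embed_tokenizer.decode(ids[1:]) for ids in input_ids]
    input_ids = torch.tensor(input_ids)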

    # Get the embeddings
    with torch.no_grad():
        embeddings = embed_model(input_ids).last_hidden_state[:, 0, :]

    # Convert embeddings to numpy arrays
    return embeddings.numpy(), lookup
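
`embed_text` uses the hidden state at the first position (the `[CLS]` token) as the embedding of each chunk. The model card for `sentence-transformers/all-MiniLM-L6-v2` instead describes mean pooling: averaging the token vectors while ignoring padding. The framework-free sketch below shows only that averaging step on toy numbers; `mean_pool` is a name chosen for this example, and swapping it into the notebook would be an experiment rather than a guaranteed improvement.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padded positions."""
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for j, value in enumerate(vector):
                totals[j] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in totals]

# Toy example: three 2-dimensional token vectors, the last one is padding.
toy_token_vectors = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
toy_mask = [1, 1, 0]
print(mean_pool(toy_token_vectors, toy_mask))  # [2.0, 3.0]
```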




In [110]:

vectors, lookup = embed_text(policy_info, token_length=256, overlap=128)
vectors.shape

Token indices sequence length is longer than the specified maximum sequence length for this model (3868 > 512). Running this sequence through the model will result in indexing errors


Maximum sequence length: 512 tokens, input is 3868 tokens


(30, 384)

In [111]:
lookup

['the role of data lakes is undergoing a significant transformation. traditionally viewed as mere repositories for storing vast amounts of data, the modern approach to utilizing data lakes is much more dynamic and strategic, particularly when enhancing business intelligence ( bi ) operations. the traditional data lake : a repository, not a source historically, data lakes have been treated as data sinks — places where data is accumulated from various sources but seldom retrieved for real - time analysis. the common perception was that once data was stored in a data lake, it was not to be disturbed except for predefined reporting and bi tasks. this originated from concerns that the data, often batch - processed from dozens if not hundreds of applications, was too stale for real - time decision - making. however, a paradigm shift is now reshaping how organizations approach their integration networks. companies are moving towards real - time data flow systems rather than relying on traditi

In [112]:
# Define the dimension of the embeddings
dimension = vectors.shape[1]

# Create a FAISS index
index_DB = faiss.IndexFlatL2(dimension)

# Add vectors to the index
index_DB.add(vectors)
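
`IndexFlatL2` performs an exact, brute-force search and reports squared Euclidean distances. The sketch below shows the same computation in plain Python on three toy vectors, to make clear what the index does when it is searched; `flat_l2_search` is an illustrative name, not part of FAISS.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_l2_search(stored, query, k):
    """Return (distances, indices) of the k stored vectors closest to `query`."""
    scored = sorted((squared_l2(vec, query), i) for i, vec in enumerate(stored))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

toy_store = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
toy_distances, toy_indices = flat_l2_search(toy_store, [0.9, 1.2], k=2)
print(toy_indices)  # [1, 0]
```

A smaller distance means a closer match, which is why the first hit is the most similar chunk.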

In [113]:
# Embed a query text
query_text = "What are the security risks and vulnerabilities associated with point-to-point connectivity in IT infrastructure?"
query_vector, _ = embed_text(query_text)


Maximum sequence length: 512 tokens, input is 22 tokens


In [114]:
query_vector.shape

(1, 384)

In [115]:

# Perform a similarity search
k = 3  # Number of nearest neighbors to retrieve
distances, indices = index_DB.search(query_vector, k)

# Retrieve the most similar texts
print("Query:", query_text)
print("Top 3 most similar texts:")
for i, idx in enumerate(indices[0]):
    print(f"{i+1}. {idx} (Distance: {distances[0][i]}) {lookup[idx]} ")

Query: What are the security risks and vulnerabilities associated with point-to-point connectivity in IT infrastructure?
Top 3 most similar texts:
1. 5 (Distance: 7.666616916656494) to - point connections'complexity and maintenance overhead. monitoring and troubleshooting each connection individually becomes challenging as the number of connections increases. another limitation is the lack of scalability. point - to - point connections are usually established one - to - one, meaning adding new endpoints requires creating additional connections. this can lead to a tangled web of connections that becomes difficult to manage and scale as the infrastructure grows. furthermore, point - to - point connectivity can result in vendor lock - in. each connection is typically configured with specific protocols and standards, making switching vendors or integrating new technologies challenging. this lack of interoperability can limit the organization's ability to adopt new solutions or take advanta

In [116]:
input_text = f"""
User: whome do i speak to?
Assistent: You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability.
User: {query_text}
Assistent: """

output = generated_text(input_text)

print(output)

Maximum sequence length: 2048 tokens, input is 79 tokens

User: whome do i speak to?
Assistent: You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability.
User: What are the security risks and vulnerabilities associated with point-to-point connectivity in IT infrastructure?
Assistent: 1. Point-to-point connectivity is vulnerable to a single point of failure, such as a network outage or hardware failure. This can result in data loss or system downtime. 2. Point-to-point connectivity can also be vulnerable to cyber-attacks, such as malware or hacking. This can result in data theft or unauthorized access to sensitive information. 3. Point-to-point connectivity can also be vulnerable to security breaches, such as leaks of personal or confidential information.



In [119]:
input_text = f"""
User: whome do i speak to?
Assistent: You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability.
User: What do you know about Entiros?
Assistent: This is what i found on the wab page of Entiros: {lookup[indices[0][0]]}
User: {query_text}
Assistent: """

output = generated_text(input_text)
print(output)



Maximum sequence length: 2048 tokens, input is 353 tokens

User: whome do i speak to?
Assistent: You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability.
User: What do you know about Entiros.
Assistent: This is what i found on the wab page of Entiros: to - point connections'complexity and maintenance overhead. monitoring and troubleshooting each connection individually becomes challenging as the number of connections increases. another limitation is the lack of scalability. point - to - point connections are usually established one - to - one, meaning adding new endpoints requires creating additional connections. this can lead to a tangled web of connections that becomes difficult to manage and scale as the infrastructure grows. furthermore, point - to - point connectivity can result in vendor lock - in. each connection is typically configured with specific protocols an


## Instruct Models
Instruct models are a type of Large Language Model (LLM) specifically trained to follow instructions given in natural language. Unlike base models, which are trained only to predict the next token in a sequence, instruct models are additionally fine-tuned to respond to user prompts by performing specific tasks or answering questions in a way that aligns with the given instructions.

* **Instruction-Following:** These models excel at understanding and executing tasks described in plain language prompts.
* **Fine-Tuning:** Instruct models are typically fine-tuned on datasets that include pairs of instructions and the desired outputs, making them better at adhering to the user's intent.
* **Versatility:** They can handle a wide range of tasks, such as answering questions, generating summaries, providing step-by-step instructions, writing code, and more.

In [120]:
checkpoint = "HuggingFaceTB/SmolLM-360M-Instruct"

inst_tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inst_model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
inst_config = AutoConfig.from_pretrained(checkpoint)



In [121]:

messages = [{"role": "assistent", "content": "You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability."},
            {"role": "user", "content": "Can you give me a short summary of what Entiros does?"}]

input_text=inst_tokenizer.apply_chat_template(messages, tokenize=False)

output = generated_text(input_text, tokenizer=inst_tokenizer, model=inst_model, config=inst_config)
print(output)

Maximum sequence length: 2048 tokens, input is 63 tokens
<|im_start|>assistent
You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability.<|im_end|>
<|im_start|>user
Can you give me a short summary of what Entiros does?<|im_end|>
<|im_start|>assistant
Entiros is a company that specializes in creating and selling digital products, including digital books, e-books, and online courses. They offer a range of content, including fiction, non-fiction, and educational materials, to help people learn and grow.<|im_end|>
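
The output above shows the layout that the chat template produces: each message is wrapped in `<|im_start|>` and `<|im_end|>` markers, and the role string is passed through verbatim (including the spelling `assistent` used in the messages). The sketch below imitates that layout by hand. `to_chatml` is a hypothetical helper, not part of transformers, and the real template stored in the tokenizer may differ in detail. The conventional role names are `system`, `user`, and `assistant`, which is what the sketch uses.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render role/content dicts in the <|im_start|> ... <|im_end|> layout."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to answer next
    return "".join(parts)

toy_messages = [
    {"role": "system", "content": "You are a helpful chatbot."},
    {"role": "user", "content": "Can you give me a short summary of what Entiros does?"},
]
print(to_chatml(toy_messages))
```

Since an instruct model was fine-tuned on conversations in this layout, using the role names it saw during training is likely to matter for answer quality.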


In [122]:

messages = [{"role": "assistent", "content": "You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability."},
            {"role": "user", "content": "What do you know about Entiros?"}, 
            {"role": "assistent", "content": f"This is what i found on the wab page of Entiros: {web_info}"},
            {"role": "user", "content": "Can you give me a short summary of what Entiros does?"}]

input_text=inst_tokenizer.apply_chat_template(messages, tokenize=False)

output = generated_text(input_text, tokenizer=inst_tokenizer, model=inst_model, config=inst_config, max_new_tokens=100)
print(output)


Maximum sequence length: 2048 tokens, input is 383 tokens
<|im_start|>assistent
You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability.<|im_end|>
<|im_start|>user
What do you know about Entiros?<|im_end|>
<|im_start|>assistent
This is what i found on the wab page of Entiros: Better ways to build integrations. Data-informed connectivity. Our approach combines data analysis with experience to make data-informed network decisions, ensuring relevance and effectiveness in every integration.. Data-driven connectivity. Customer case. Find out how adopting technological advancements simplifies processes, boosts expansion, and gears up for upcoming obstacles in the constantly changing automotive sector.. "Lorem Ipsum text asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd ". Företag ABC. John Johnsson. "Lorem Ipsum". Företag ABC. John Johnson. Let's connect. About us. 

In [123]:

messages = [{"role": "assistent", "content": "You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability."},
            {"role": "user", "content": query_text}]
input_text=inst_tokenizer.apply_chat_template(messages, tokenize=False)

output = generated_text(input_text, tokenizer=inst_tokenizer, model=inst_model, config=inst_config, max_new_tokens=100)
print(output)

Maximum sequence length: 2048 tokens, input is 68 tokens
<|im_start|>assistent
You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability.<|im_end|>
<|im_start|>user
What are the security risks and vulnerabilities associated with point-to-point connectivity in IT infrastructure?<|im_end|>
<|im_start|>assistant
Point-to-point connectivity in IT infrastructure can introduce various security risks and vulnerabilities. Here are some of the common ones:

1. **Unsecured Data Storage**: Point-to-point connectivity can expose sensitive data to unauthorized access, theft, or loss. This can happen if the data is stored on a shared network or cloud storage service.
2. **Insider Threats**: With access to a shared network or cloud storage, an insider with authorized access to the


In [126]:
context = "\n".join([ lookup[idx] for idx in indices[0]])
messages = [{"role": "assistent", "content": "You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability."},
            {"role": "user", "content": f"What do you know about Entiros?"},
            {"role": "assistent", "content": f"This is what i found on the wab page of Entiros: {context}"},
            {"role": "user", "content": query_text}]
input_text=inst_tokenizer.apply_chat_template(messages, tokenize=False)

output = generated_text(input_text, tokenizer=inst_tokenizer, model=inst_model, config=inst_config, max_new_tokens=100)

print(output)

Maximum sequence length: 2048 tokens, input is 832 tokens
<|im_start|>assistent
You are speaking to a chatbot. I am here to help you with any questions you may have. I will provide you with information and answer your questions to the best of my ability.<|im_end|>
<|im_start|>user
What do you know about Entiros?<|im_end|>
<|im_start|>assistent
This is what i found on the wab page of Entiros: to - point connections'complexity and maintenance overhead. monitoring and troubleshooting each connection individually becomes challenging as the number of connections increases. another limitation is the lack of scalability. point - to - point connections are usually established one - to - one, meaning adding new endpoints requires creating additional connections. this can lead to a tangled web of connections that becomes difficult to manage and scale as the infrastructure grows. furthermore, point - to - point connectivity can result in vendor lock - in. each connection is typically configured w

## Let's try something completely different

We would like to have some structured data, because who wants to work with unstructured data? Can we use LLMs to help us with that?

In [None]:
# SQL dummy data

import sqlite3

conn = sqlite3.connect('data.db')

c = conn.cursor()

# Create table for animals with columns: id, name, age, species
c.execute('''CREATE TABLE animals
             (id INTEGER PRIMARY KEY,
              name TEXT NOT NULL,
              age INTEGER NOT NULL,
              species TEXT NOT NULL)''')

# Insert data into the table
for animal in [("Fido", 4, "dog"), ("Whiskers", 7, "cat"), ("Fluffy", 2, "rabbit")]:
    c.execute("INSERT INTO animals (name, age, species) VALUES (?, ?, ?)", animal)

# Save (commit) the changes
conn.commit()


In [133]:

# Query meta information about the table

c.execute("PRAGMA table_info(animals)")
columns = c.fetchall()
columns



[(0, 'id', 'INTEGER', 0, None, 1),
 (1, 'name', 'TEXT', 1, None, 0),
 (2, 'age', 'INTEGER', 1, None, 0),
 (3, 'species', 'TEXT', 1, None, 0)]

In [141]:
message = [{"role": "assistent", "content": "You are an expert SQL chatbot. Generate ONLY SQL queries what answer user requests. you do not provide any other information."},
           {"role": "user", "content": f" i heve a SQLite database with a table called animals. The table has the following columns: {', '.join([c[1] for c in columns])}. Can you give me the SQL query to select the youngest animal?"}, 
           {"role": "assistent", "content": "SELECT name FROM animals WHERE age = (SELECT MIN(age) FROM animals)"},
           {"role": "user", "content": "Can you give me the SQL query to select the species the oldest animal?"},
           ]
input_text=inst_tokenizer.apply_chat_template(message, tokenize=False)

output = generated_text(input_text, tokenizer=inst_tokenizer, model=inst_model, config=inst_config, max_new_tokens=100)
print(output)


Maximum sequence length: 2048 tokens, input is 119 tokens
<|im_start|>assistent
You are an expert SQL chatbot. Generate ONLY SQL queries what answer user requests. you do not provide any other information.<|im_end|>
<|im_start|>user
 i heve a SQLite database with a table called animals. The table has the following columns: id, name, age, species. Can you give me the SQL query to select the youngest animal?<|im_end|>
<|im_start|>assistent
SELECT name FROM animals WHERE age = (SELECT MIN(age) FROM animals)<|im_end|>
<|im_start|>user
Can you give me the SQL query to select the species the oldest animal?<|im_end|>
<|im_start|>assistant
SELECT species FROM animals WHERE age = (SELECT MAX(age) FROM animals) ORDER BY age ASC<|im_end|>


In [142]:
# Run the generated query: the species of the oldest animal
c.execute("SELECT species FROM animals WHERE age = (SELECT MAX(age) FROM animals) ORDER BY age ASC")
oldest_species = c.fetchone()
oldest_species

('cat',)
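
Running SQL that a model wrote deserves a guard. A minimal sketch, assuming we only ever want to allow a single `SELECT` statement: `run_readonly_query` is a hypothetical helper, and its check is deliberately conservative (it also rejects queries that contain a semicolon inside a string literal). The in-memory table mirrors the animals data so the example runs on its own.

```python
import sqlite3

def run_readonly_query(conn, sql):
    """Run a single SELECT statement and refuse anything else."""
    statement = sql.strip().rstrip(";")
    if ";" in statement or not statement.lower().startswith("select"):
        raise ValueError("only a single SELECT statement is allowed")
    return conn.execute(statement).fetchall()

# In-memory copy of the animals table so the sketch is self-contained.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE animals (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, species TEXT)")
demo.executemany("INSERT INTO animals (name, age, species) VALUES (?, ?, ?)",
                 [("Fido", 4, "dog"), ("Whiskers", 7, "cat"), ("Fluffy", 2, "rabbit")])

generated_sql = "SELECT species FROM animals WHERE age = (SELECT MAX(age) FROM animals)"
print(run_readonly_query(demo, generated_sql))  # [('cat',)]
```

The `ORDER BY age ASC` in the model's query is harmless but redundant, because every row that matches the subquery has the same age.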

## Let's try to structure some information into a predefined JSON structure
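
Whatever the model returns is still plain text, so it is worth parsing and checking it before using it. A minimal sketch, assuming the four fields used in this section (`name`, `age`, `city`, `meet`): `parse_record` and `EXPECTED_FIELDS` are names made up for this example.

```python
import json

EXPECTED_FIELDS = {"name": str, "age": int, "city": str, "meet": str}

def parse_record(raw_text, expected=EXPECTED_FIELDS):
    """Parse model output as JSON and check its keys and value types."""
    record = json.loads(raw_text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    missing = set(expected) - set(record)
    extra = set(record) - set(expected)
    if missing or extra:
        raise ValueError(f"missing keys: {sorted(missing)}, unexpected keys: {sorted(extra)}")
    for key, expected_type in expected.items():
        value = record[key]
        # bool is a subclass of int in Python, so booleans are rejected explicitly
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"{key!r} should be of type {expected_type.__name__}")
    return record

print(parse_record('{"name": "John", "age": 45, "city": "New York", "meet": "café"}'))
```

If parsing fails, a typical next step is to retry the generation or to fall back to a default, rather than passing the raw text further along.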

In [145]:
json_structure = """
{
  "name": string,
  "age": integer,
  "city": string,
  "meet": string,
}
"""

message = [{"role": "assistent", "content": "You are an expert JSON chatbot. Generate ONLY JSON structure besed on user requests and there story. you do not provide any other information."},
           {"role": "user", "content": f" I have a JSON structure: {json_structure}. During my travels, I had the pleasure of meeting a remarkable person named John. He is a 45-year-old New Yorker with a wealth of stories and experiences that make him truly fascinating. Our paths crossed in a quaint little café tucked away in one of the quieter streets of Manhattan. It was one of those places where the aroma of freshly brewed coffee blends harmoniously with the chatter of locals, creating an ambiance that invites you to sit back and savor the moment. Can you give me the JSON that folows the structure?"},
           {"role": "assistent", "content": '{"name": "John", "age": 45, "city": "New York", "meet": "café"}'},
           {"role": "user", "content": f" I have a JSON structure: {json_structure}. During the summer, in a small village in the south of France called Saint-Tropez, I met a charming lady named Marie. She is 32 years old and has a passion for painting. Her art is a reflection of her vibrant personality and zest for life. We met at a local art exhibition, where her work was on display. The colors and textures of her paintings captivated me, and I was drawn to the stories they told. Can you give me the JSON that folows the structure?"},
           ]

input_text=inst_tokenizer.apply_chat_template(message, tokenize=False)

output = generated_text(input_text, tokenizer=inst_tokenizer, model=inst_model, config=inst_config, max_new_tokens=100)

print(output)


Maximum sequence length: 2048 tokens, input is 370 tokens
<|im_start|>assistent
You are an expert JSON chatbot. Generate ONLY JSON structure besed on user requests and there story. you do not provide any other information.<|im_end|>
<|im_start|>user
 I have a JSON structure: 
{
  "name": string,
  "age": integer,
  "city": string,
  "meet": string,
}
. During my travels, I had the pleasure of meeting a remarkable person named John. He is a 45-year-old New Yorker with a wealth of stories and experiences that make him truly fascinating. Our paths crossed in a quaint little café tucked away in one of the quieter streets of Manhattan. It was one of those places where the aroma of freshly brewed coffee blends harmoniously with the chatter of locals, creating an ambiance that invites you to sit back and savor the moment. Can you give me the JSON that folows the structure?<|im_end|>
<|im_start|>assistent
{"name": "John", "age": 45, "city": "New York", "meet": "café"}<|im_end|>
<|im_start|>use