# LLM Personalized Tutor in Coding and Artificial Intelligence

In this small project, I show how to use the OpenAI API and Ollama to build a tool that takes a technical question and answers it following a predefined layout and style.

For this project I customized two well-known LLMs:
- ***gpt-4o-mini***
- ***llama 3.2***

I have also enabled streaming output *only for gpt-4o-mini* in order to compare the responses and the user interaction. <u>The answer is always returned as Markdown and then rendered with `IPython.display` in this notebook.</u>

> This tool can answer questions about code and LLMs, acting as a customized co-pilot.

### Imports

In [2]:
import os
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display
from openai import OpenAI
import ollama

### Constants

In [3]:
MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2'

In [4]:
load_dotenv(override=True)
api_key = os.getenv("OPENAI_API_KEY")

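if api_key and api_key.startswith('sk-proj-') and len(api_key) > 10:
    print("API key loaded")
else:
    # The check also fails when a key exists but does not have the expected prefix.
    print("There might be a problem with your API key. It is missing or does not start with 'sk-proj-'")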

openai = OpenAI()

API key loaded


### Prompts

These prompts set how the LLM should behave and respond, and what kind of question to expect. In this case, the LLM is customized with ***zero-shot prompting***: I only specify how the answer should be structured, without providing any examples.

In [22]:
system_prompt = "You are provided with a coding and/or LLM problem as a string input. You are an expert of Computer Science, Artificial Intelligence and LLM Engineering fields. \
You are able to break down the problem and make it easier for the user \
You should be able to answer with a simple, straight to the point answer and solution to the problem, in addition, you should return *multiple examples* that shows different use cases of the answer and are meaningful to explain better the problem. \
Then, you go more in depth by explaining in-depth theory specific to the topic that you are treating.\n \
You are able to explain everything like a professor that would make the extra effort for the user to understand. Use a friendly and simple vocabulary.\n"
system_prompt += "Respond in a well formatted markdown and use separate lines between the quick explanation-solution part and more in-depth part. Any code example should be added to the Markdown 'fenced code blocks' with the correct coding language identified (if none specified you use Python for your examples)"

user_prompt = "You are given a technical question that can represent a problem, issue, or request in the Artificial Intelligence or Coding field. You help the user by carefully answering the following question:"

def get_truncated_user_prompt(prompt):
    # Slicing leaves prompts of 5000 characters or fewer unchanged.
    return prompt[:5000]
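
To make the zero-shot setup concrete, here is a minimal sketch of how the message list is assembled, next to a few-shot variant for contrast. The helper `build_messages` and its `examples` parameter are hypothetical and not part of this notebook, and the short placeholder strings stand in for `system_prompt` and `user_prompt`.

```python
MAX_PROMPT_CHARS = 5000  # same limit as get_truncated_user_prompt


def build_messages(system_prompt, user_prompt, question, examples=None):
    """Assemble chat messages; `examples` is a hypothetical few-shot extension."""
    messages = [{"role": "system", "content": system_prompt}]
    # Few-shot only: each (question, answer) pair becomes a user/assistant turn.
    for example_question, example_answer in examples or []:
        messages.append({"role": "user", "content": example_question})
        messages.append({"role": "assistant", "content": example_answer})
    # Truncate the combined instruction + question, as the notebook does.
    content = f"{user_prompt} {question}"[:MAX_PROMPT_CHARS]
    messages.append({"role": "user", "content": content})
    return messages


zero_shot = build_messages("You are a tutor.", "Answer this question:", "What is a decorator?")
few_shot = build_messages(
    "You are a tutor.",
    "Answer this question:",
    "What is a decorator?",
    examples=[("What is a lambda?", "An anonymous function defined inline.")],
)
print(len(zero_shot), len(few_shot))  # prints: 2 4
```

The zero-shot list contains only the system and user messages, which is the shape both model calls in this notebook use.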

### GPT-4o-mini Function Call

The following function calls the gpt-4o-mini API with the system and user prompts defined above. *Streaming* is enabled, so the answer is displayed as it arrives, chunk by chunk.

In [14]:
def gpt_4o_problem_answer(question):
    stream = openai.chat.completions.create(
        model=MODEL_GPT,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_truncated_user_prompt(f"{user_prompt} {question}")}
        ],
        stream=True
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)

    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # response = response.replace("```markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)
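
The accumulation pattern in this loop can be tried without calling the API. The sketch below uses stand-in chunk objects built with `types.SimpleNamespace` that mimic only the `choices[0].delta.content` shape used above; `fake_chunk` and `accumulate_stream` are hypothetical helpers, and real chunks carry more fields.

```python
from types import SimpleNamespace


def fake_chunk(text):
    """Build a stand-in chunk (assumption: only .choices[0].delta.content matters)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


def accumulate_stream(chunks):
    """Yield the response text accumulated so far after each chunk."""
    response = ""
    for chunk in chunks:
        # Some chunks carry no text (content is None), hence the `or ""`.
        response += chunk.choices[0].delta.content or ""
        yield response


snapshots = list(accumulate_stream([fake_chunk("Hel"), fake_chunk(None), fake_chunk("lo")]))
print(snapshots)  # ['Hel', 'Hel', 'Hello']
```

In the notebook, each snapshot corresponds to one `update_display(Markdown(response), ...)` call, so the full Markdown is re-rendered every time a chunk arrives.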

### Ollama Function Call

The following function calls the local Ollama API with the same system and user prompts. It *does not stream* the answer, so the response is rendered as Markdown and displayed only after it has been fully retrieved.

In [7]:
def ollama_problem_answer(question):
    response = ollama.chat(
        model=MODEL_LLAMA,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_truncated_user_prompt(f"{user_prompt} {question}")}
        ],
        stream=False
    )

    display(Markdown(response["message"]["content"]))

### Make a dynamic call that uses the model the user wants

In [8]:
def generate_answer(problem, model="ollama"):
    if model == "gpt-4o":
        gpt_4o_problem_answer(question=problem)
    elif model == "ollama":
        ollama_problem_answer(question=problem)
    else:
        print("The model you want to interrogate has not been implemented yet.")
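
As the number of supported models grows, the `if`/`elif` chain could be replaced by a lookup table. This is only a sketch of that alternative, not the notebook's implementation: `HANDLERS`, `generate_answer_from_table`, and the two stand-in handlers are hypothetical names, and the handlers return a label instead of calling a model so the example runs on its own.

```python
def answer_with_gpt(question):
    # Stand-in for gpt_4o_problem_answer; no API call is made here.
    return f"[gpt-4o] {question}"


def answer_with_ollama(question):
    # Stand-in for ollama_problem_answer; no local model is needed here.
    return f"[ollama] {question}"


# Map the user-facing model name to the function that handles it.
HANDLERS = {"gpt-4o": answer_with_gpt, "ollama": answer_with_ollama}


def generate_answer_from_table(problem, model="ollama"):
    handler = HANDLERS.get(model)
    if handler is None:
        raise ValueError(f"Unknown model {model!r}; available: {sorted(HANDLERS)}")
    return handler(problem)


print(generate_answer_from_table("What is a closure?", "gpt-4o"))
# prints: [gpt-4o] What is a closure?
```

Raising an error instead of printing makes an unsupported model name visible to calling code, at the cost of requiring the caller to handle the exception.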

### User input

In [23]:
# Write your question here
question = """
Please explain what this code does and why:
yield from {book.get("author") for book in books if book.get("author")}
"""

# Choose your model. Available:
# - GPT-4o : "gpt-4o"
# - ollama : "ollama"
use_model = "ollama"

generate_answer(question, use_model)

**Quick Explanation-Solution**
==========================

The provided code uses a technique called **generator expression** to extract author names from a list of books. Here's a breakdown:

* `yield from` is used to delegate the iteration to another iterable (in this case, the generator expression).
* `{book.get("author") for book in books if book.get("author")}` is a generator expression that:
	+ Iterates over each book in the `books` list.
	+ Filters out books with missing author information using the `if` condition.
	+ Extracts the author name from each book using the `get()` method.

The resulting output will be an iterator that yields the author names of the books with available information.

**Example Use Cases**
--------------------

```python
# Sample data
books = [
    {"title": "Book 1", "author": "John Doe"},
    {"title": "Book 2", "author": None},
    {"title": "Book 3", "author": "Jane Smith"}
]

# Using the generator expression to extract author names
authors = yield from {book.get("author") for book in books if book.get("author")}
print(authors)  # Output: ['John Doe', 'Jane Smith']
```

```python
# Using the generator expression in a loop
for author in yield from {book.get("author") for book in books if book.get("author")}:
    print(author)
# Output:
# John Doe
# Jane Smith
```

**More In-Depth Theory**
----------------------

Generator expressions are a powerful tool in Python that allow you to write concise and efficient code. They consist of a subexpression enclosed in parentheses, which is executed only when the resulting iterator is requested.

The `yield from` keyword is used to delegate the iteration to another iterable, allowing you to nest generator expressions or combine them with loops.

In this specific example, we use a generator expression to filter out books with missing author information. The `if book.get("author")` condition ensures that only books with available author data are processed.

By using `yield from`, we can simplify the code and avoid creating unnecessary intermediate lists or data structures, making it more memory-efficient and scalable for large datasets.

In general, generator expressions are useful when:

* You need to process large datasets and want to avoid loading them into memory.
* You want to perform complex filtering or transformations on data without storing temporary results.
* You need to write concise and readable code that is easy to maintain and extend.
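
For reference, here is a runnable sketch of the snippet from the question, assuming it sits inside a generator function (`yield from` is only valid inside a function body). The function name `unique_authors` and the extra sample books are hypothetical. The braces build a set comprehension, so duplicate authors collapse and the iteration order is not guaranteed, which is why the result is sorted before printing.

```python
def unique_authors(books):
    """Yield each distinct, non-empty author once."""
    # The braces build a set, so duplicate authors collapse before yielding.
    yield from {book.get("author") for book in books if book.get("author")}


books = [
    {"title": "Book 1", "author": "John Doe"},
    {"title": "Book 2", "author": None},
    {"title": "Book 3", "author": "Jane Smith"},
    {"title": "Book 4", "author": "John Doe"},  # duplicate author
    {"title": "Book 5"},  # missing "author" key
]

# Sets are unordered, so sort for a stable display.
print(sorted(unique_authors(books)))  # ['Jane Smith', 'John Doe']
```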

In [24]:
# Write your question here
question = """
Can you explain me the concept of Transformers in Deep Learning, how they work and why Deep Learning relies on this in order to work ?
"""

# Choose your model. Available:
# - GPT-4o : "gpt-4o"
# - ollama : "ollama"
use_model = "gpt-4o"

generate_answer(question, use_model)

### Quick Explanation

Transformers are a type of deep learning model introduced in 2017 that revolutionized how we approach natural language processing (NLP) tasks. Unlike previous models that processed sequences in order, Transformers use a mechanism called self-attention to weigh the importance of different words in relation to each other, allowing them to capture long-range dependencies in the data more efficiently.

Transformers form the backbone of many state-of-the-art architectures, such as BERT and GPT, which are widely used in applications like translation, summarization, and conversation systems.

#### How Transformers Work:
1. **Self-Attention**: This mechanism evaluates the relationships between all words in a sequence simultaneously, allowing the model to focus on important words more efficiently.
2. **Positional Encoding**: Since Transformers do not inherently understand the sequence of words, positional encodings are added to the input embeddings to give the model information about the position of each word.
3. **Multi-Head Attention**: The model applies multiple self-attention mechanisms in parallel to capture different aspects of word relationships.
4. **Feed-Forward Networks**: After attention, the output is passed through feed-forward neural networks, which helps in processing the information further.
5. **Layer Normalization and Residual Connections**: These techniques help stabilize and accelerate training.

### Example Use Cases
1. **Text Generation**: Generating coherent and contextually relevant text using models like GPT-3.
   ```python
   from transformers import GPT2LMHeadModel, GPT2Tokenizer

   model = GPT2LMHeadModel.from_pretrained('gpt2')
   tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

   input_ids = tokenizer.encode("Once upon a time", return_tensors='pt')
   outputs = model.generate(input_ids, max_length=50)
   generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
   print(generated_text)
   ```

2. **Sentiment Analysis**: Analyzing the sentiment of a given text using fine-tuned BERT.
   ```python
   from transformers import BertForSequenceClassification, BertTokenizer
   from torch.nn.functional import softmax
   import torch

   model = BertForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
   tokenizer = BertTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

   text = "I love this product!"
   inputs = tokenizer(text, return_tensors='pt')
   outputs = model(**inputs)
   probabilities = softmax(outputs.logits, dim=1)
   print(probabilities)
   ```

3. **Machine Translation**: Translating sentences between languages, effectively utilizing the Transformer architecture for this task.
   ```python
   from transformers import MarianMTModel, MarianTokenizer

   model_name = 'Helsinki-NLP/opus-mt-en-fr'
   model = MarianMTModel.from_pretrained(model_name)
   tokenizer = MarianTokenizer.from_pretrained(model_name)

   text = "Hello, how are you?"
   translated = model.generate(**tokenizer(text, return_tensors='pt'))
   print(tokenizer.decode(translated[0], skip_special_tokens=True))  # "Bonjour, comment ça va ?"
   ```

---

### In-Depth Explanation

#### Key Concepts in Transformers

1. **Self-Attention**:
   - This mechanism allows the model to weigh the importance of each word relative to others in the sequence. 
   - It computes a score for every pair of words and generates a context vector for each word based on these scores, which helps capture meanings that can depend on words far away in the sequence.

2. **Positional Encoding**:
   - Since Transformers process input in parallel (unlike RNNs that go step-by-step), they require a way to keep track of the order of words. 
   - Positional embeddings are added to the input embeddings to inject information about the position of each token.

3. **Multi-Head Attention**:
   - This allows the model to attend to different parts of the input word embeddings simultaneously. 
   - Each "head" can potentially learn to focus on different aspects like syntax or semantics, enriching the representation.

4. **Feed-Forward Networks**:
   - Each attention layer is followed by a feed-forward neural network that processes the output to further transform it before passing it to the next layer.

5. **Layer Normalization and Residual Connections**:
   - Residual connections help stabilize training by allowing gradients to flow through the network more easily, while layer normalization normalizes the outputs to maintain consistent distributions.

6. **Encoder-Decoder Structure**:
   - The Transformer can have an encoder-decoder architecture, where the encoder processes the input sequence and the decoder generates the output sequence. This is particularly useful in tasks like machine translation.

#### Why Transformers are Important in Deep Learning

The introduction of Transformers has led to significant improvements in handling sequences due to their ability to capture context and relationships without the limitations imposed by recurrent structures. They excel at:
- **Scalability**: Full parallel processing during training reduces training time significantly for large datasets.
- **Performance**: Achieving state-of-the-art results in numerous NLP tasks.
- **Transfer Learning**: Models like BERT and GPT leverage pre-trained weights, allowing them to be fine-tuned on specialized tasks with relatively small datasets.

Overall, Transformers represent a crucial advancement in deep learning architecture, enabling breakthroughs in a variety of applications from text understanding to image processing.
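
The self-attention step described above can be sketched in a few lines of NumPy. This is a minimal single-head illustration under stated assumptions, not how any particular library implements it: the projection matrices are random rather than learned, there is no masking or batching, and the sizes `seq_len` and `d_model` are arbitrary.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for one sequence, single head."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # One score per pair of positions, scaled by the square root of the key size.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    # Each output row is a weighted mix of all value vectors.
    return weights @ v, weights


rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))  # stand-in for token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
print(output.shape, weights.shape)  # (4, 8) (4, 4)
```

Each row of `weights` sums to 1 and shows how much one position attends to every other position in the sequence.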