# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API and with Ollama, build a tool that takes a technical question
and responds with an explanation. This is a tool that you will be able to use yourself during the course!

In [1]:
# imports
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display
from openai import OpenAI
import ollama

In [2]:
# constants

MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2'

In [3]:
# set up environment
load_dotenv()
openai = OpenAI()

In [4]:
# here is the question; type over this to ask something new

question = """
Please explain what this code does and why:
yield from {book.get("author") for book in books if book.get("author")}
"""
system_prompt = "You are a helpful technical tutor who answers questions about python code, software engineering, data science and LLMs"
user_prompt = "Please give a detailed explanation to the following question: " + question


In [9]:
# Get gpt-4o-mini to answer, with streaming

stream = openai.chat.completions.create(
    model=MODEL_GPT,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    stream=True,
)

response = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    response += chunk.choices[0].delta.content or ""
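    # Strip code fences and the word "markdown" so the text renders as plain Markdown.
    # Side effect: fences around code samples in the answer are removed as well.
    response = response.replace("```","").replace("markdown", "")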
    update_display(Markdown(response), display_id=display_handle.display_id)

Certainly! The code you provided is a Python expression that uses a combination of a generator function and a set comprehension. Let's break it down step by step:

### Breakdown of the Code

1. **Set Comprehension**: 
   python
   {book.get("author") for book in books if book.get("author")}
   
   - This part of the code creates a set using a set comprehension.
   - `books` is assumed to be an iterable (like a list) containing dictionaries, where each dictionary represents a book.
   - `book.get("author")` retrieves the value associated with the key `"author"` from each book dictionary.
   - The comprehension iterates over each `book` in `books`.
   - The `if book.get("author")` condition filters out any books that do not have an author (i.e., if the author's name is `None` or an empty string, those books are excluded).
   - The end result is a set of unique author names, because sets automatically handle duplicates.

2. **Yielding from the Set**:
   python
   yield from ...
   
   - The `yield from` statement is used within a generator function to yield values from another iterable (in this case, the set created by the set comprehension).
   - When `yield from` is used, the generator will yield each value from the provided iterable one by one. This allows for a clean way to delegate part of the generator's operation to another iterable.
  
### Complete Function Context

For this `yield from` expression to work, it must be part of a generator function. An example of such a function might look like this:

python
def get_authors(books):
    yield from {book.get("author") for book in books if book.get("author")}


### What the Code Does

- **Produces Unique Authors**: The entire line effectively creates a generator that produces unique author names from the list of book dictionaries.
- **Handles Missing Data**: If a book does not have an `"author"` key or if the value is `None` or an empty string, that book is excluded from the final output.
- **Efficient Memory Usage**: Since the result is yielded one at a time, it can be used in a memory-efficient way; you don't need to store the entire list of authors in memory if you process them one by one.

### Why Use This Code?

1. **Readability**: Using a generator with `yield from` makes the code succinct and easy to read.
2. **Efficiency**: Generators are more memory efficient than lists since they generate values on-the-fly, which can be particularly useful if the `books` list is very large.
3. **Elimination of Duplicates**: Creating a set inherently takes care of duplicate author names, ensuring that every author is returned only once.

### Example Usage

Here’s a quick example of how this function might be used:

python
books = [
    {"title": "Book One", "author": "Author A"},
    {"title": "Book Two", "author": None},
    {"title": "Book Three", "author": "Author B"},
    {"title": "Book Four", "author": "Author A"},
]

for author in get_authors(books):
    print(author)


**Output**:

Author A
Author B


In this example, both instances of "Author A" will only show up once in the output, demonstrating the functionality of the generator combined with set comprehensions.
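
Two details in the explanation above are worth checking by running the code. A set has no guaranteed iteration order, and the set comprehension is built in full before the first value is yielded, so the memory saving is smaller than the answer suggests. The following is a minimal sketch; `tracked` and `consumed` are hypothetical helpers added only to observe when the input is read.

```python
def get_authors(books):
    # Build the set of non-empty authors, then yield its members one by one
    yield from {book.get("author") for book in books if book.get("author")}

books = [
    {"title": "Book One", "author": "Author A"},
    {"title": "Book Two", "author": None},
    {"title": "Book Three", "author": "Author B"},
    {"title": "Book Four", "author": "Author A"},
]

consumed = []  # hypothetical log of which books have been read so far

def tracked(items):
    # Hypothetical wrapper that records each book as it is pulled from the input
    for item in items:
        consumed.append(item["title"])
        yield item

gen = get_authors(tracked(books))
read_before_first_next = len(consumed)   # generator body has not started yet
first_author = next(gen)
read_after_first_next = len(consumed)    # the whole input was read to build the set

# Sort when a stable order matters, since set iteration order is not guaranteed
unique_authors = sorted(get_authors(books))
print(read_before_first_next, read_after_first_next, unique_authors)
```

The counts show that nothing is read until the first `next()` call, and that this first call consumes every book before returning a single author.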

In [15]:
import openai
from IPython.display import display, Markdown

MODEL_GPT = "gpt-4o-mini"

def ask_question(question):
    """Stream an answer to a single question and return the full response text."""
    system_prompt = "You are a helpful technical tutor who answers questions about Python code, software engineering, data science, and LLMs."
    user_prompt = question
    print(f"\nInput: \n{question}")
    print("\nOutput: ")
    # Each call sends only the system prompt and this question; earlier turns are not included
    stream = openai.chat.completions.create(
        model=MODEL_GPT,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_prompt}],
        stream=True
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)

    for chunk in stream:
        response += chunk.choices[0].delta.content or ""
        response = response.replace("```", "").replace("markdown", "")
        display_handle.update(Markdown(response))

    # Return the accumulated text so the caller's `response = ask_question(...)` is not None
    return response

# Initial Question
question = input("Enter your question: ")
response = ask_question(question)

# Follow-up loop
while True:
    follow_up = input("Ask a follow-up question (or press Enter to exit): ").strip()
    if not follow_up:
        break  # Exit the loop
    response = ask_question(follow_up)


Input: 
what is huff model in less than 50 words

Output: 


The HUFF model is a spatial market analysis tool used in retail and urban planning. It estimates consumer behavior by calculating the probability of a shopper choosing a particular store based on its distance and attractiveness, combining factors like size and type of retail offering.
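
The description above corresponds to a simple formula: the probability that a shopper at a given origin picks a store is that store's attractiveness divided by a power of its distance, normalized over all candidate stores. The following is a minimal sketch in plain Python. It assumes attractiveness is measured by floor area, and the exponents `alpha` and `beta` and the store data are made up for illustration.

```python
def huff_probabilities(attractiveness, distances, alpha=1.0, beta=2.0):
    """Return the probability of choosing each store from a single origin.

    attractiveness: store attractiveness values (assumed here to be floor area)
    distances: distances from the origin to each store (must be > 0)
    alpha, beta: illustrative exponents; in practice they are calibrated from data
    """
    # Utility of each store: attractiveness^alpha / distance^beta
    utilities = [(s ** alpha) / (d ** beta) for s, d in zip(attractiveness, distances)]
    total = sum(utilities)
    # Normalize so the probabilities over all stores sum to 1
    return [u / total for u in utilities]

# Made-up example: three stores seen from one neighbourhood
sizes = [1000.0, 2000.0, 500.0]
dists = [2.0, 4.0, 1.0]
probs = huff_probabilities(sizes, dists)
print([round(p, 3) for p in probs])
```

With these numbers the small but nearby third store gets the largest share, which shows how strongly the distance exponent weighs against size.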


Input: 
could we do this in python?

Output: 


Of course! I'd be happy to help you with that. Please provide more details about what you would like to accomplish in Python, and I'll do my best to assist you.


Input: 
I mean is there any library for this?

Output: 


Could you please provide more context or specify what you're looking for? There are many libraries available in Python for various tasks, including data analysis, machine learning, web development, and more. Let me know what specific functionality you need, and I can recommend a suitable library!


Input: 
yes like scikit learn library

Output: 


Scikit-learn is a popular Python library used for machine learning and data science. It provides a wide range of tools for tasks such as classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. Here are some key features and common functionalities of scikit-learn:

1. **Supervised Learning Algorithms**: Scikit-learn includes various algorithms for supervised learning tasks, such as linear regression, logistic regression, support vector machines, decision trees, random forests, and more.

2. **Unsupervised Learning**: You can also perform unsupervised learning tasks, including clustering (e.g., K-means, hierarchical clustering) and dimensionality reduction (e.g., PCA, t-SNE).

3. **Model Selection**: Scikit-learn provides tools for model evaluation and selection, including cross-validation techniques and metrics to assess model performance (accuracy, precision, recall, etc.).

4. **Preprocessing**: The library includes functions for data preprocessing, such as scaling features, encoding categorical variables, and handling missing values.

5. **Pipelines**: You can create pipelines to streamline the process of applying transformations and fitting models, which helps ensure reproducibility and maintainability.

6. **Integration with Other Libraries**: Scikit-learn works well with other scientific computing libraries in Python, such as NumPy, pandas, and Matplotlib.

Here’s a simple example of using scikit-learn to perform a classification task:

python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load a dataset (using Iris dataset as an example)
from sklearn.datasets import load_iris
data = load_iris()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Random Forest Classifier
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')


If you have specific questions about scikit-learn or need help with particular tasks or concepts, feel free to ask!


Input: 
then how to do huff model using python

Output: 


The Hierarchical Unified Feature Frequency (HUFF) model, commonly referred to as the HUFF model, is often associated with applications in natural language processing and data encoding due to its associative mapping of features and hierarchical organization. However, the terminology "HUFF model" could be somewhat ambiguous since it might refer to various concepts, including Huffman coding or Hierarchical models in general. 

If you're referring to a specific method (like Huffman coding, which is used for data compression), I can provide a Python implementation for that. Here's a simple example of how to implement Huffman coding in Python:

### Huffman Coding Implementation in Python

python
import heapq
from collections import defaultdict

class Node:
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.freq < other.freq

def build_huffman_tree(text):
    frequency = defaultdict(int)
    
    # Count frequency of each character
    for char in text:
        frequency[char] += 1

    # Create a priority queue
    priority_queue = [Node(char, freq) for char, freq in frequency.items()]
    heapq.heapify(priority_queue)

    # Merge nodes until there is only one node left
    while len(priority_queue) > 1:
        left = heapq.heappop(priority_queue)
        right = heapq.heappop(priority_queue)
        merged = Node(None, left.freq + right.freq)
        merged.left = left
        merged.right = right
        heapq.heappush(priority_queue, merged)

    return priority_queue[0]

def build_codes(node, prefix="", codebook={}):
    if node:
        if node.char is not None:
            codebook[node.char] = prefix
        build_codes(node.left, prefix + "0", codebook)
        build_codes(node.right, prefix + "1", codebook)
    return codebook

def huffman_encoding(text):
    root = build_huffman_tree(text)
    huffman_codes = build_codes(root)
    
    encoded_output = ''.join(huffman_codes[char] for char in text)
    
    return encoded_output, huffman_codes

def huffman_decoding(encoded_text, huffman_codes):
    reverse_codes = {v: k for k, v in huffman_codes.items()}
    current_code = ""
    decoded_output = ""

    for bit in encoded_text:
        current_code += bit
        if current_code in reverse_codes:
            decoded_output += reverse_codes[current_code]
            current_code = ""
    
    return decoded_output

if __name__ == "__main__":
    text = "this is an example for huffman encoding"
    print("Original Text: ", text)

    encoded_text, huffman_codes = huffman_encoding(text)
    print("Encoded Text: ", encoded_text)
    print("Huffman Codes: ", huffman_codes)

    decoded_text = huffman_decoding(encoded_text, huffman_codes)
    print("Decoded Text: ", decoded_text)


### Explanation:
1. **Node Class**: Defines a node in the Huffman tree.
2. **build_huffman_tree**: Creates a tree based on frequency of each character.
3. **build_codes**: Generates binary codes for each character based on the tree structure.
4. **huffman_encoding**: Encodes the input text into a binary representation.
5. **huffman_decoding**: Decodes the binary representation back into the original text.

### Usage:
You can run the script, and it will encode the sample text and then decode it back to the original text.

If this doesn't align with what you're asking for regarding "HUFF model," please provide more context or details about the specific model or application you're referring to!
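
The follow-up replies above lose the thread: the Python question gets a request for more details, and the last answer drifts to Huffman coding. This happens because `ask_question` sends only the system prompt and the latest question on each call. One way to keep context, sketched here under assumptions, is to append every turn to a shared message list and send the whole list each time. `build_messages`, `record_turn`, and `history` are hypothetical names, not part of the notebook.

```python
SYSTEM_PROMPT = (
    "You are a helpful technical tutor who answers questions about Python code, "
    "software engineering, data science, and LLMs."
)

def build_messages(history, question):
    """Return the full message list: system prompt, earlier turns, then the new question."""
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + history
        + [{"role": "user", "content": question}]
    )

def record_turn(history, question, answer):
    """Store a completed question/answer pair so later calls can see it."""
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": answer})

history = []  # hypothetical conversation store, one list per session

# First turn: in the notebook, the answer would be the text collected from the stream
first_question = "what is huff model in less than 50 words"
messages = build_messages(history, first_question)
record_turn(history, first_question, "(model reply goes here)")

# Second turn now carries the first question and reply as context
messages = build_messages(history, "could we do this in python?")
print([m["role"] for m in messages])
```

Passing `messages` as the `messages=` argument of `chat.completions.create`, in place of the two-item list, would give the model the earlier turns. Long sessions would eventually need trimming to stay within the model's context window.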

In [6]:
# Get Llama 3.2 to answer

from openai import OpenAI
from IPython.display import display, Markdown

# Ollama exposes an OpenAI-compatible endpoint; the client requires an API key, but Ollama does not use it
OLLAMA_API = 'http://localhost:11434/v1'
MODEL_LLAMA = "llama3.2"
ollama_via_openai = OpenAI(base_url=OLLAMA_API, api_key='ollama')

def ask_question(question):
    """Stream an answer to a single question from the local model and return the full text."""
    system_prompt = "You are a helpful technical tutor who answers questions about Python code, software engineering, data science, and LLMs."
    user_prompt = question
    print(f"\nInput: \n{question}")
    print("\nOutput: ")
    stream = ollama_via_openai.chat.completions.create(
        model=MODEL_LLAMA,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_prompt}],
        stream=True
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)

    for chunk in stream:
        response += chunk.choices[0].delta.content or ""
        response = response.replace("```", "").replace("markdown", "")
        display_handle.update(Markdown(response))

    # Return the accumulated text so the caller's `response = ask_question(...)` is not None
    return response

# Initial Question
question = input("Enter your question: ")
response = ask_question(question)

# Follow-up loop
while True:
    follow_up = input("Ask a follow-up question (or press Enter to exit): ").strip()
    if not follow_up:
        break  # Exit the loop
    response = ask_question(follow_up)


Input: 
what's the difference between Generative AI and LLM

Output: 


Generative AI (Artificial Intelligence) and LLM (Large Language Models) are related but not exactly the same thing.

**Generative AI**: Generative AI refers to a class of machine learning algorithms that generate new, original content in various forms, such as:

* Images: generating new images based on patterns and styles learned from existing images
* Text: generating new text based on patterns and structures learned from existing text data
* Music: generating new music based on patterns and melodies learned from existing music data

Generative AI models are often trained using unsupervised learning methods, such as Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs), which allows them to learn complex patterns in the data without explicit human supervision.

**LLM (Large Language Models)**: LLMs, on the other hand, are a specific type of deep learning model that is particularly well-suited for natural language processing tasks. An LLM is trained using a massive amount of text data and aims to predict the probability of a given sequence of words.

While all LLMs can generate new text, not all generative AI models are necessarily LLMs. In other words, not all AI systems that generate text or images are Large Language Models.

**Key differences**:

* **Training objective**: Generative AI models aim to generate new content, whereas LLMs primarily focus on predicting probabilities in sequential data.
* **Task-oriented design**: Generative AI models might be designed to perform specific tasks like image-to-image translation, while LLMs are typically used for text-to-text or question-answering applications.
* **Data requirements**: Both require large amounts of data for training, but the type and structure of the data differ. LLMs rely on massive amounts of text data, whereas generative AI models might use a mix of labeled and unlabeled data.

Example scenarios where you'd use each:

* Use Generative AI (e.g., GANs) to:
	+ Generate new images for product design or advertising.
	+ Create music tracks for a specific style or genre.
* Use LLMs (e.g., BERT, RoBERTa):
	+ Answer complex questions with relevant text from a knowledge base.
	+ Perform text classification or sentiment analysis.

In summary: while both Generative AI and LLM are powerful tools for generating novel content, they have distinct design objectives and training requirements.

In [5]:
# Get DeepSeek-R1 (1.5B) to answer, served locally by Ollama

from openai import OpenAI
from IPython.display import display, Markdown

# Ollama exposes an OpenAI-compatible endpoint; the client requires an API key, but Ollama does not use it
OLLAMA_API = 'http://localhost:11434/v1'
MODEL = "deepseek-r1:1.5b"
ollama_via_openai = OpenAI(base_url=OLLAMA_API, api_key='ollama')

def ask_question(question):
    """Stream an answer to a single question from the local model and return the full text."""
    system_prompt = "You are a helpful technical tutor who answers questions about Python code, software engineering, data science, and LLMs."
    user_prompt = question
    print(f"\nInput: \n{question}")
    print("\nOutput: ")
    stream = ollama_via_openai.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_prompt}],
        stream=True
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)

    for chunk in stream:
        response += chunk.choices[0].delta.content or ""
        response = response.replace("```", "").replace("markdown", "")
        display_handle.update(Markdown(response))

    # Return the accumulated text so the caller's `response = ask_question(...)` is not None
    return response

# Initial Question
question = input("Enter your question: ")
response = ask_question(question)

# Follow-up loop
while True:
    follow_up = input("Ask a follow-up question (or press Enter to exit): ").strip()
    if not follow_up:
        break  # Exit the loop
    response = ask_question(follow_up)


Input: 
hi, what's the difference between LLM and Generative AI. Elaborate please.

Output: 


<think>
Alright, I'm trying to understand the difference between LLM (Large Language Models) and Generative AI. From what I know, both seem related to generating text or language-based outputs. But wait, isn't an LLM more technical and maybe narrower in scope? Maybe it's similar but has specific components or functionalities that a generative AI might incorporate.

In my experience, I've seen tools like ChatGPT use something called the "LLC" module. That seems really important because it makes models more transparent. Without transparency, it's hard to trust how they generate responses, even if someone didn't realize they were being monitored or influenced by some unseen factors. So maybe the LLC adds a layer of control and responsibility to the model outputs.

Generative AI, on the other hand, might handle tasks that aren't only natural language processing but could also include things like creative content generation. For example, an AI-powered editor could create images or music based on user inputs. While this is similar in concept to an LLM, it's more about applying advanced technologies specifically for creative purposes.

Wait, so maybe every Generative AI model has the core functionalities of an LLM but can extend beyond that into other areas like art, design, etc.? That could make them even more versatile or "generative" because their outputs aren't just words or text but also images or other creative outputs. So in a way, they might be similar, but Generative AI seems to have broader applications and specific focus areas not commonly covered under traditional LLMs.
</think>

LLM (Large Language Model) and Generative AI are closely related concepts in the field of artificial intelligence, particularly in natural language processing (NLP). While they share a common foundation, there are notable differences in their scope, functionality, and intended use cases.

**Generative AI:**
- **Core Functionality:** Generative AI models focused on NLP tasks primarily generate text or descriptions from input data. These models can process unstructured data like HTML or JSON (using OpenAI's Text Generation API). They are versatile tools for summarization, translation, and creative content generation.
- **Applications:** Beyond pure text generation, GANs (Generative Adversarial Networks) in AI art and generative music systems are examples where the model goes beyond text. These models can explore creative domains and offer specific applications that a more general language model might not cover.

**Large Language Model (LLM):**
- **Core Functionality:** An LLM is typically designed to process natural language tasks with high efficiency, focusing on generating responses or understanding context.
- **Importance of Transparency:** An essential component of an LLM is the "LLC" module in platforms like ChatGPT. This module enhances transparency by allowing users to trace back how outputs are generated, which is crucial for building trust and preventing misuse.

**Key Features Distinct:**

1. **Generative Nature:**
   - Generative AI goes beyond text generation. GAN-based systems can create images or music based on prompts (e.g., "Create a cute dog scene").
   
2. **Focus Beyond Text:**
   - An LLM is more strictly tied to NLP tasks, focusing on language processing.
   - A Generative AI system might involve specialized components for creative outputs, expanding its functionality.

**Conclusion:**
Generative AI extends the functionality of models beyond plain text generation, offering versatility across different domains. While an LLM is a subset, focusing on NLP with added transparency features, GANs in AI are examples where this concept transcends text and into creativity.

Thus, Generative AI can be seen as an application of advanced techniques within General AI frameworks, while the LLM represents a specific instance with different focus areas.
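
The reply above begins with a `<think>...</think>` block, the model's intermediate reasoning, which is displayed together with the answer. If only the final answer should be shown, the block can be removed from the accumulated text before display. The following is a minimal sketch using a regular expression. `strip_think` is a hypothetical helper, and it assumes the tags appear literally in the streamed content, as they do in this output.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_think(text):
    """Remove completed reasoning blocks; hide an unfinished one while streaming."""
    text = THINK_BLOCK.sub("", text)
    # While streaming, the closing tag may not have arrived yet: hide the partial block
    open_at = text.find("<think>")
    if open_at != -1:
        text = text[:open_at]
    return text

sample = (
    "<think>\nAlright, comparing the two terms...\n</think>\n\n"
    "LLM and Generative AI are closely related."
)
print(strip_think(sample))
```

Inside the streaming loop, the helper could be applied to `response` just before the display update, so the reasoning stays hidden while it is still arriving.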