# MLflow 04: Building RAG Applications with MLflow and LlamaIndex

Welcome to a new frontier in our MLflow series! Having explored experiment tracking, hyperparameter optimization, and model registry for classical ML models, we now pivot to the exciting world of **Generative AI**. 

In this notebook, we'll build a **Retrieval-Augmented Generation (RAG)** application. RAG is a powerful technique that enhances Large Language Models (LLMs) by providing them with external knowledge from your own data sources. This makes LLM responses more accurate and relevant, and it reduces hallucinations.

We'll be using **LlamaIndex**, a popular data framework for connecting custom data sources to LLMs, to construct our RAG pipeline. And, of course, we'll leverage **MLflow** to track the parameters, artifacts, and configurations of our RAG system.

![Conceptual RAG Pipeline](https://cratedb.com/hs-fs/hubfs/RAG-Pipelines.png?width=900&height=282&name=RAG-Pipelines.png)

Get ready to learn how to make LLMs smarter with your data and manage these complex GenAI applications with MLflow!

---

## Table of Contents

1. [Introduction to Retrieval-Augmented Generation (RAG)](#intro-to-rag)
2. [What is LlamaIndex?](#what-is-llamaindex)
3. [Setting Up the Environment](#setting-up)
    - [Installing Libraries](#installing-libraries)
    - [Configuring MLflow](#configuring-mlflow)
    - [Setting up an LLM (Ollama or Hugging Face)](#setting-up-llm)
4. [Preparing Our Knowledge Base: Scientific Papers](#preparing-knowledge-base)
5. [Building the RAG Pipeline with LlamaIndex](#building-rag-pipeline)
    - [Loading Documents](#loading-documents)
    - [Parsing and Creating Nodes (Chunking)](#parsing-nodes)
    - [Setting up Embedding Model and LLM](#setup-models-llamaindex)
    - [Creating the Vector Store Index](#creating-vector-index)
    - [Setting up the Query Engine](#setting-up-query-engine)
6. [Integrating RAG Experiments with MLflow Tracking](#integrating-rag-mlflow)
    - [Defining Parameters and Artifacts to Track](#defining-params-artifacts)
    - [Running and Logging a RAG Experiment](#running-logging-rag)
7. [Querying the RAG System and Inspecting Results](#querying-rag)
8. [Exploring RAG Experiments in the MLflow UI](#exploring-mlflow-ui)
9. [Key Takeaways and Considerations for RAG](#key-takeaways-rag)
10. [Engaging Resources and Further Reading](#resources-and-further-reading)

---

## 1. Introduction to Retrieval-Augmented Generation (RAG)

Large Language Models (LLMs) like GPT, Llama, or Claude are incredibly powerful but have limitations:
- **Knowledge Cutoff:** They are trained on data up to a certain point in time and lack knowledge of events or information created afterward.
- **Hallucinations:** They can sometimes generate plausible-sounding but incorrect or nonsensical information.
- **Lack of Domain-Specificity:** Generic LLMs may not have deep knowledge of specific private or niche domains.

**Retrieval-Augmented Generation (RAG)** addresses these issues by connecting an LLM to an external knowledge source. The process typically involves two main steps:

1.  **Retrieval:** When a user asks a question (query), the RAG system first searches a knowledge base (e.g., a collection of documents, a database) for relevant information chunks. This knowledge base is often indexed using vector embeddings for efficient similarity search.
2.  **Generation:** The retrieved relevant context and the original user query are then provided to an LLM as part of the prompt. The LLM uses this augmented information to generate a more informed and accurate answer.
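
The two steps above can be sketched in a few lines of plain Python. This is a toy illustration only: it scores chunks by simple word overlap instead of vector embeddings, and the names `retrieve` and `build_prompt` are hypothetical helpers, not part of LlamaIndex. The resulting prompt is what would be sent to an LLM.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, chunks, top_k=2):
    """Step 1 (Retrieval): rank chunks by how many words they share with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only chunks that share at least one word with the query
    return [chunk for score, chunk in scored[:top_k] if score > 0]

def build_prompt(query, context_chunks):
    """Step 2 (Generation): combine the retrieved context and the query into one prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )

toy_chunks = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The theory of relativity was developed by Albert Einstein.",
    "Chlorophyll absorbs light during photosynthesis.",
]
toy_query = "How do plants use light energy?"
toy_context = retrieve(toy_query, toy_chunks, top_k=2)
toy_prompt = build_prompt(toy_query, toy_context)
print(toy_prompt)
```

A real pipeline replaces the word-overlap score with embedding similarity, which is what LlamaIndex does for us later in this notebook.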

![MLflow Workflow](https://mlflow.org/docs/latest/assets/images/learn-core-components-b2c38671f104ca6466f105a92ed5aa68.png)

**Benefits of RAG:**
- Access to up-to-date, custom information.
- Reduced hallucinations by grounding responses in provided context.
- Increased transparency, as the source of information can often be cited.

---

## 2. What is LlamaIndex?

![LlamaIndex Logo](https://cdn.bap-software.net/2024/05/27174818/LlamaIndex-e1716781781228.png)

**LlamaIndex** (formerly GPT Index) is a data framework specifically designed for building LLM applications, especially those involving RAG. It provides tools to:

- **Ingest Data:** Connect to various data sources (PDFs, APIs, databases, text files, etc.) using a rich set of data loaders.
- **Structure Data:** Index your data into formats that LLMs can easily consume (e.g., vector stores, graph stores, summary indices).
- **Retrieve Data:** Offer sophisticated retrieval strategies beyond simple similarity search.
- **Query Data:** Provide query interfaces that abstract the complexities of interacting with indexed data and LLMs.

LlamaIndex simplifies the development of RAG pipelines by handling many of the underlying mechanics like data chunking, embedding generation, vector store interactions, and prompt engineering for question-answering.

---

## 3. Setting Up the Environment

Let's get our tools ready.

### Installing Libraries
We'll need `mlflow`, `llama-index` and its core components, `datasets` for our knowledge base, and, if you plan to use a local LLM, the `llama-index-llms-ollama` integration (the Ollama application itself is installed separately).

In [None]:
!pip install --quiet mlflow llama-index llama-index-llms-huggingface llama-index-embeddings-huggingface llama-index-llms-ollama datasets pandas sentence-transformers
# sentence-transformers is often a dependency for local embedding models

# If you plan to use Ollama, make sure it's installed and running on your system.
# You can download it from https://ollama.com/
# After installation, pull a model, e.g.: ollama pull llama3:8b or ollama pull phi3:mini

import mlflow
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings, StorageContext, load_index_from_storage
from llama_index.core.node_parser import SentenceSplitter # Or other node parsers
from llama_index.llms.huggingface import HuggingFaceLLM # For HuggingFace LLM fallback
from llama_index.llms.ollama import Ollama # For local LLM via Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding # For local embeddings
from datasets import load_dataset
import pandas as pd
import os
import shutil # For cleaning up directories
import torch # Often needed by HuggingFace models

print(f"MLflow Version: {mlflow.__version__}")
import llama_index.core
print(f"LlamaIndex Core Version: {llama_index.core.__version__}")

### Configuring MLflow

In [None]:
mlflow.set_tracking_uri('mlruns') # Use local 'mlruns' directory
experiment_name = "RAG_Scientific_Papers_LlamaIndex"
mlflow.set_experiment(experiment_name)

print(f"MLflow Experiment set to: {experiment_name}")

### Setting up an LLM (Ollama or Hugging Face)

For the generation part of RAG, we need an LLM. LlamaIndex supports many LLMs.

**Option 1: Using Ollama (Recommended for local experimentation)**
1.  Install Ollama from [ollama.com](https://ollama.com/).
2.  Start the Ollama application/server.
3.  Pull a model via your terminal: 
    `ollama pull llama3:8b` (powerful, needs ~5GB RAM)
    OR 
    `ollama pull phi3:mini` (smaller, very capable, needs ~2.5GB RAM)
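
Before moving on, it can help to confirm that the server is up and that the model was actually pulled. The sketch below assumes Ollama is listening on its default address (`http://localhost:11434`) and uses its `/api/tags` endpoint, which lists locally available models. The helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request

def parse_model_names(payload):
    """Extract model names from a decoded /api/tags response body."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]

def list_ollama_models(base_url="http://localhost:11434", timeout=3.0):
    """Return locally available model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON
        return None
```

Calling `list_ollama_models()` returns `None` when nothing is listening, and otherwise a list of names in which you can look for the model you pulled.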

**Option 2: Using a Hugging Face Model (Fallback)**
If you don't have Ollama or prefer a direct Hugging Face model, LlamaIndex can use models from the Hugging Face Hub. We'll choose a smaller, instruction-tuned model.

In [None]:
# Global LLM and Embedding Model Configuration
USE_OLLAMA = True # Set to False to use HuggingFaceLLM as fallback
ollama_model_name = "phi3:mini" # or "llama3:8b" if you pulled it
# Fallback model. NOTE: this swaps in a small instruction-tuned causal LM for the
# original google/flan-t5-small, because HuggingFaceLLM loads decoder-only (causal)
# checkpoints and an encoder-decoder model such as Flan-T5 is not expected to load.
# Any small causal instruct model from the Hub should work here.
huggingface_llm_name = "Qwen/Qwen2.5-0.5B-Instruct"

llm = None
selected_llm_name_for_logging = ""

if USE_OLLAMA:
    try:
        # Create the client and send a tiny prompt. This fails fast if the
        # Ollama server is not running or the model has not been pulled.
        llm = Ollama(model=ollama_model_name, request_timeout=120.0) # Increased timeout
        llm.complete("test connection") # Simple connectivity test
        print(f"Successfully connected to Ollama with model: {ollama_model_name}")
        selected_llm_name_for_logging = f"ollama_{ollama_model_name}"
    except Exception as e:
        print(f"Could not connect to Ollama or model '{ollama_model_name}' not available: {e}")
        print("Falling back to HuggingFaceLLM.")
        USE_OLLAMA = False # Force fallback

if not USE_OLLAMA or llm is None:
    print(f"Using HuggingFaceLLM: {huggingface_llm_name}")
    # For larger models you may want device_map="auto" and a GPU.
    # Gated models additionally require `huggingface-cli login`.
    llm = HuggingFaceLLM(
        model_name=huggingface_llm_name,
        tokenizer_name=huggingface_llm_name, # Keep the tokenizer in sync with the model
        # device_map="auto", # Uncomment if you have a GPU and want to use it
        # model_kwargs={"torch_dtype": torch.float16} # If using GPU and model supports it
    )
    selected_llm_name_for_logging = f"hf_{huggingface_llm_name.replace('/', '_')}"
    print("HuggingFaceLLM initialized.")

# Setup Embedding Model (local, from Hugging Face)
embedding_model_name = "sentence-transformers/all-MiniLM-L6-v2"
embed_model = HuggingFaceEmbedding(model_name=embedding_model_name)
print(f"Using Embedding Model: {embedding_model_name}")

# Set globally in LlamaIndex Settings (Newer LlamaIndex versions prefer this)
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 512 # Chunk size in tokens (smaller than LlamaIndex's default of 1024); can be tuned
Settings.chunk_overlap = 20 # Tokens shared between consecutive chunks

---

## 4. Preparing Our Knowledge Base: Scientific Papers

For our RAG system, we need a collection of documents to serve as the knowledge base. We'll use a small subset of the `KASHU101/scientific_papers_dataset` dataset, which contains scientific articles.

We'll extract each article's main text and save it as a text file in a directory. LlamaIndex can then easily ingest data from this directory.

In [None]:
# Load the dataset from Hugging Face
try:
    # The full dataset can be large, so we only take a small slice.
    dataset = load_dataset("KASHU101/scientific_papers_dataset", split='train[:5]') # Take first 5 articles
    print(f"Loaded {len(dataset)} scientific articles.")
except Exception as e:
    print(f"Error loading dataset: {e}. This could be due to connectivity or dataset changes.")
    # Fallback: Create dummy data if dataset loading fails, to allow notebook to proceed
    print("Using dummy data as fallback.")
    dummy_data = [
        {"article": "Photosynthesis is a process used by plants to convert light energy into chemical energy.", "summary": "Photosynthesis converts light to energy in plants."},
        {"article": "The theory of relativity was developed by Albert Einstein, transforming physics.", "summary": "Einstein developed relativity."}
    ]
    dataset = dummy_data # Keep a list of dicts so the loop below can read entry['article']

# Create a directory to store our text files
knowledge_base_dir = "scientific_articles_kb"
if os.path.exists(knowledge_base_dir):
    shutil.rmtree(knowledge_base_dir) # Clean up if it exists from previous runs
os.makedirs(knowledge_base_dir)

document_filenames = []
for i, entry in enumerate(dataset):
    article_text = entry['article'] # The main content of the scientific paper
    # Sometimes articles are lists of paragraphs, join them if so.
    if isinstance(article_text, list):
        article_text = "\n\n".join(article_text)
    
    # Create a unique filename for each article
    # Use a simple naming scheme, ensure it's a valid filename
    title_slug = entry.get('title', f'article_{i+1}').replace(' ', '_').replace('/', '_').lower()[:50]
    filename = os.path.join(knowledge_base_dir, f"{title_slug}.txt")
    
    with open(filename, "w", encoding="utf-8") as f:
        f.write(article_text)
    document_filenames.append(filename)
    if i < 2: # Print first few to verify
        print(f"Saved: {filename} (Excerpt: {article_text[:100]}...)")

print(f"\nSuccessfully prepared {len(document_filenames)} documents in '{knowledge_base_dir}'.")


Our knowledge base of scientific articles is now ready in the `scientific_articles_kb` directory.
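
The naming scheme above only replaces spaces and slashes, so titles containing characters such as `:` or `?`, or two articles with the same title, can still cause trouble. A slightly more defensive variant might look like the sketch below. The helpers `slugify` and `unique_filenames` are hypothetical and not part of any library used in this notebook.

```python
import re

def slugify(title, max_length=50, fallback="article"):
    """Turn an arbitrary title into a lowercase, filesystem-friendly slug."""
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return slug[:max_length].rstrip("_") or fallback

def unique_filenames(titles):
    """Slugify titles and append a counter when two slugs collide."""
    seen = {}
    names = []
    for title in titles:
        base = slugify(title)
        seen[base] = seen.get(base, 0) + 1
        suffix = "" if seen[base] == 1 else f"_{seen[base]}"
        names.append(f"{base}{suffix}.txt")
    return names

example_names = unique_filenames(["Deep Learning: A Review", "deep learning / a review", "???"])
print(example_names)
```

Here the second title maps to the same slug as the first and therefore receives a `_2` suffix, while a title with no usable characters falls back to `article`.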

---

## 5. Building the RAG Pipeline with LlamaIndex

Now, let's use LlamaIndex to build our RAG pipeline. This involves several steps:

### Loading Documents
LlamaIndex's `SimpleDirectoryReader` can load all documents from a specified directory.

In [None]:
try:
    documents = SimpleDirectoryReader(knowledge_base_dir).load_data()
    print(f"Loaded {len(documents)} documents using SimpleDirectoryReader.")
    if documents:
        print(f"First document preview (ID: {documents[0].doc_id}): {documents[0].text[:200]}...")
except Exception as e:
    print(f"Error loading documents with SimpleDirectoryReader: {e}")
    documents = [] # Ensure it's an empty list to avoid further errors

### Parsing and Creating Nodes (Chunking)
LLMs have context window limits, so we need to split our documents into smaller chunks (Nodes). LlamaIndex handles this with Node Parsers like `SentenceSplitter`. A node parser is a simple abstraction that takes a list of documents and chunks them into Node objects, so that each node is a specific chunk of its parent document.

In [None]:
# Using the globally set chunk_size and chunk_overlap from Settings
node_parser = SentenceSplitter(chunk_size=Settings.chunk_size, chunk_overlap=Settings.chunk_overlap)

if documents: # Proceed only if documents were loaded
    nodes = node_parser.get_nodes_from_documents(documents)
    print(f"\nParsed {len(documents)} documents into {len(nodes)} nodes (chunks).")
    if nodes:
        print(f"First node preview: {nodes[0].get_content()[:150]}...")
else:
    print("\nSkipping node parsing as no documents were loaded.")
    nodes = []
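
To build intuition for how `chunk_size` and `chunk_overlap` interact, here is a deliberately simplified sliding-window chunker. It counts whitespace-separated words, whereas `SentenceSplitter` works on tokens and tries to respect sentence boundaries, so treat it as an illustration of the idea rather than of LlamaIndex's algorithm. The function name `chunk_words` is hypothetical.

```python
def chunk_words(text, chunk_size=8, chunk_overlap=2):
    """Split text into windows of chunk_size words that share chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # How far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # The current window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(1, 13))  # "w1 w2 ... w12"
sample_chunks = chunk_words(sample, chunk_size=5, chunk_overlap=2)
for chunk in sample_chunks:
    print(chunk)
```

Each chunk repeats the last two words of the previous one, which is what keeps a sentence that straddles a boundary retrievable from either side.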

### Setting up Embedding Model and LLM
We've already configured `Settings.embed_model` and `Settings.llm` globally. LlamaIndex components will use these by default.

### Creating the Vector Store Index
The `VectorStoreIndex` takes the nodes (chunks), generates embeddings for them using the configured `embed_model`, and stores them in a vector store (a simple in-memory store by default). This index enables efficient similarity searches.

In [None]:
vector_index = None
index_persist_dir = "./vector_store_persisted"

if nodes: # Proceed only if nodes were created
    try:
        print("\nBuilding VectorStoreIndex...")
        # The global Settings for embed_model and llm will be used here
        vector_index = VectorStoreIndex(nodes)
        print("VectorStoreIndex built successfully.")
        
        # Persist the index (optional but good practice)
        if os.path.exists(index_persist_dir):
            shutil.rmtree(index_persist_dir)
        vector_index.storage_context.persist(persist_dir=index_persist_dir)
        print(f"VectorStoreIndex persisted to: {index_persist_dir}")
        
    except Exception as e:
        print(f"Error building or persisting VectorStoreIndex: {e}")
        # Building the index calls the embedding model, so failures here often point
        # to embedding issues (e.g. a failed model download or running out of memory).
else:
    print("\nSkipping VectorStoreIndex creation as no nodes are available.")

### Setting up the Query Engine
The query engine uses the index to retrieve relevant context and the LLM to generate an answer.

In [None]:
query_engine = None
if vector_index:
    # similarity_top_k: How many top similar chunks to retrieve
    query_engine = vector_index.as_query_engine(similarity_top_k=3)
    print("\nQuery engine created.")
else:
    print("\nSkipping query engine creation as vector index is not available.")
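
Conceptually, `similarity_top_k` controls how many of the best-scoring chunks are handed to the LLM. The sketch below mimics that selection with hand-written 3-dimensional vectors and cosine similarity. The vectors and the `top_k` helper are made up for illustration; real embeddings from `all-MiniLM-L6-v2` have 384 dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=3):
    """Return the k (node_id, score) pairs most similar to query_vec."""
    scored = [(node_id, cosine(query_vec, vec)) for node_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

toy_index = {
    "node_a": [1.0, 0.0, 0.0],
    "node_b": [0.8, 0.6, 0.0],
    "node_c": [0.0, 1.0, 0.0],
    "node_d": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print(results)
```

A larger `k` gives the LLM more context to work with but also more opportunity to be distracted by weakly related chunks, which is why it is worth tracking as a parameter.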

Our basic RAG pipeline is now set up with LlamaIndex!

---

## 6. Integrating RAG Experiments with MLflow Tracking

Now, let's track our RAG pipeline construction and a sample query using MLflow. This helps in comparing different RAG configurations (e.g., different chunk sizes, embedding models, LLMs, `similarity_top_k`).

![MLflow Tracking](https://media.datacamp.com/cms/google/ad_4nxekg7ftko2m1hrkr-bwr-kq5gzr9wfugs9spjvgmoca-yykxhhepgcwxxo9yrbhu4barnqvmx6psn9scgku1car3lvlhltqnada0i9m7cg_glbdf5ty3lu4t3pcyxel6dyh1n84fcsl3xqvgdktujpvrian.png)


### Defining Parameters and Artifacts to Track

For a RAG system, we might want to track:
- **Parameters:**
    - `chunk_size`, `chunk_overlap`
    - `embedding_model_name`
    - `llm_name` (for the generator)
    - `similarity_top_k`
    - Number of documents in knowledge base
- **Artifacts:**
    - Sample queries and their responses (as text files).
    - The persisted vector store index (if manageable, or its configuration).
    - List of source documents used.
- **Metrics (More Advanced):**
    - Retrieval metrics (e.g., Hit Rate, MRR) - requires ground truth, out of scope for this intro.
    - Generation metrics (e.g., ROUGE, BLEU for summarization tasks, or qualitative scores) - also advanced.
    - For now, we'll focus on parameters and qualitative artifacts.
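
Although we won't compute retrieval metrics in this notebook, the two named above are simple to state in code once ground truth exists. The sketch below assumes one known relevant node ID per query. The IDs are invented for illustration.

```python
def hit_rate(retrieved, relevant):
    """Fraction of queries whose relevant ID appears anywhere in the retrieved list."""
    hits = sum(1 for ids, target in zip(retrieved, relevant) if target in ids)
    return hits / len(relevant) if relevant else 0.0

def mean_reciprocal_rank(retrieved, relevant):
    """Average of 1/rank of the relevant ID (counted as 0 when it was not retrieved)."""
    total = 0.0
    for ids, target in zip(retrieved, relevant):
        if target in ids:
            total += 1.0 / (ids.index(target) + 1)
    return total / len(relevant) if relevant else 0.0

# Retrieved node IDs per query (best first) and the single relevant ID per query
retrieved_ids = [["n3", "n1", "n7"], ["n2", "n9", "n4"], ["n5", "n6", "n8"]]
relevant_ids = ["n1", "n2", "n0"]

print(f"Hit rate: {hit_rate(retrieved_ids, relevant_ids):.3f}")
print(f"MRR: {mean_reciprocal_rank(retrieved_ids, relevant_ids):.3f}")
```

Values like these could be recorded with `mlflow.log_metric` inside a run, which would make different configurations comparable by number rather than by reading sample answers.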

### Running and Logging a RAG Experiment

We'll wrap the pipeline setup (or parts of it, especially if we vary configurations) and a sample query within an `mlflow.start_run()` context.

In [None]:
with mlflow.start_run(run_name="RAG_LlamaIndex_Run_1") as run:
    run_id = run.info.run_id
    print(f"MLflow Run ID: {run_id}")

    # Log RAG pipeline parameters
    rag_params = {
        "llm_model": selected_llm_name_for_logging,
        "embedding_model": embedding_model_name,
        "chunk_size": Settings.chunk_size,
        "chunk_overlap": Settings.chunk_overlap,
        "knowledge_base_doc_count": len(document_filenames),
        "index_type": "VectorStoreIndex" if vector_index else "N/A",
        # similarity_top_k lives on the query engine's retriever; fall back to "N/A" if unavailable
        "similarity_top_k": getattr(getattr(query_engine, "retriever", None), "similarity_top_k", "N/A")
    }
    mlflow.log_params(rag_params)
    print(f"Logged Parameters: {rag_params}")

    # Log source document names as an artifact
    if document_filenames:
        with open("source_documents.txt", "w") as f:
            for doc_name in document_filenames:
                f.write(f"{doc_name}\n")
        mlflow.log_artifact("source_documents.txt", artifact_path="knowledge_base_info")
        print("Logged source document list.")

    # Log the persisted index as an artifact (if it exists and is not too large)
    # For very large indexes, you might only log its configuration or path to external storage.
    if os.path.exists(index_persist_dir):
        mlflow.log_artifacts(index_persist_dir, artifact_path="vector_store_index")
        print(f"Logged persisted vector store from {index_persist_dir}.")
        
    # Perform a sample query and log it
    sample_query = "What is the main contribution of the research on light energy conversion?"
    mlflow.log_param("sample_query", sample_query)
    
    if query_engine:
        try:
            print(f"\nExecuting sample query: {sample_query}")
            response = query_engine.query(sample_query)
            response_text = str(response)
            print(f"Sample Response: {response_text[:500]}...")
            
            # Log query and response
            with open("sample_q_and_a.txt", "w", encoding="utf-8") as f:
                f.write(f"Query: {sample_query}\n")
                f.write(f"Response: {response_text}\n\n")
                f.write("Sources:\n")
                for i, source_node in enumerate(response.source_nodes):
                    f.write(f"  Source {i+1} (Node ID: {source_node.node_id}, Score: {source_node.score:.4f}):\n")
                    f.write(f"    Content: {source_node.text[:200]}...\n") # Log excerpt of source
            mlflow.log_artifact("sample_q_and_a.txt", artifact_path="sample_interactions")
            print("Logged sample query, response, and sources.")
            
        except Exception as e:
            error_message = f"Error during sample query: {e}"
            print(error_message)
            mlflow.log_text(error_message, "error_log.txt")
    else:
        print("Skipping sample query as query engine is not available.")
        mlflow.log_text("Query engine was not initialized.", "query_engine_status.txt")
        
    mlflow.set_tag("rag_framework", "LlamaIndex")
    print("\nMLflow run completed.")

---

## 7. Querying the RAG System and Inspecting Results

Let's try another query with our RAG system if it was successfully built.

In [None]:
if query_engine:
    another_query = "Explain the methodology used in one of the papers regarding data analysis."
    print(f"\nQuerying RAG system with: '{another_query}'")
    try:
        response_2 = query_engine.query(another_query)
        print("\nResponse:")
        print(response_2)
        print("\nSources:")
        for i, source_node in enumerate(response_2.source_nodes):
            print(f"--- Source Node {i+1} (Score: {source_node.score:.4f}) ---")
            print(source_node.text[:300] + "...") # Print an excerpt of the source text
            print(f"  (Node ID: {source_node.node_id}, File: {source_node.metadata.get('file_name', 'N/A')})")
            print("------------------------------------")
    except Exception as e:
        print(f"Error during query: {e}")
        # This can happen if the LLM (especially smaller ones or those via Ollama) struggles with complex prompts
        # or if there are issues with the retrieved context.
else:
    print("\nCannot query RAG system as the query engine was not initialized.")

When you get a response, LlamaIndex also provides the source nodes (chunks) that were retrieved and used to generate the answer. This is crucial for transparency and debugging your RAG system.
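
If you inspect sources often, a small helper that turns them into plain rows can be convenient, for example to build a `pandas` DataFrame or to log them as an artifact. The sketch below relies only on the attributes already used in this notebook (`score`, `text`, `metadata`) and treats a missing score as `"n/a"`, since a node's score is optional. The `fake_nodes` are stand-ins, not real LlamaIndex objects.

```python
from types import SimpleNamespace

def summarize_sources(source_nodes, excerpt_chars=80):
    """Return one dict per retrieved node with rank, file name, score, and a text excerpt."""
    rows = []
    for rank, node in enumerate(source_nodes, start=1):
        score = getattr(node, "score", None)
        metadata = getattr(node, "metadata", None) or {}
        rows.append({
            "rank": rank,
            "file": metadata.get("file_name", "N/A"),
            "score": "n/a" if score is None else f"{score:.4f}",
            "excerpt": node.text[:excerpt_chars],
        })
    return rows

# Stand-in objects that mimic the attributes used above (not real LlamaIndex nodes)
fake_nodes = [
    SimpleNamespace(score=0.8123, text="Photosynthesis converts light energy.", metadata={"file_name": "a.txt"}),
    SimpleNamespace(score=None, text="Einstein developed relativity.", metadata={}),
]
source_rows = summarize_sources(fake_nodes)
for row in source_rows:
    print(row)
```

With a real response, the call would be `summarize_sources(response_2.source_nodes)`.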

---

## 8. Exploring RAG Experiments in the MLflow UI

Now, open the MLflow UI by running `mlflow ui` in your terminal (from the directory containing `mlruns`).

Navigate to the `RAG_Scientific_Papers_LlamaIndex` experiment:
- You'll see the run `RAG_LlamaIndex_Run_1`.
- **Parameters:** Check the logged parameters like `chunk_size`, `llm_model`, `embedding_model`, etc.
- **Artifacts:** 
    - Under `knowledge_base_info`, find `source_documents.txt`.
    - Under `vector_store_index`, you'll see the persisted index files.
    - Under `sample_interactions`, view `sample_q_and_a.txt` to see the query, response, and retrieved source excerpts.

![MLflow UI](https://blog.min.io/content/images/2025/03/Screenshot-2025-03-10-at-3.30.33-PM.png)

If you were to change RAG parameters (e.g., try a different `chunk_size`, `embedding_model`, `similarity_top_k`, or even a different LLM through the `Settings`) and re-run the logging cell (perhaps with a new MLflow run name), you could then compare these runs in the MLflow UI. This helps you understand how different configurations impact the RAG system's behavior and (if you had evaluation metrics) its performance.
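
One way to organize such a comparison is to enumerate the configurations up front and open one MLflow run per configuration. The sketch below is an outline under assumptions: the helper names are hypothetical, and the step that rebuilds the index and query engine for each configuration is left as a comment because it depends on the cells above.

```python
from itertools import product

def build_rag_grid(chunk_sizes, top_ks, embedding_models):
    """Enumerate every combination of the given RAG settings as a params dict."""
    return [
        {"chunk_size": cs, "similarity_top_k": k, "embedding_model": em}
        for cs, k, em in product(chunk_sizes, top_ks, embedding_models)
    ]

def run_name_for(params):
    """Build a readable MLflow run name from a params dict."""
    model = params["embedding_model"].split("/")[-1]
    return f"rag_cs{params['chunk_size']}_k{params['similarity_top_k']}_{model}"

def log_grid(grid):
    """Open one MLflow run per configuration and log its parameters."""
    import mlflow  # Imported lazily so the helpers above work without MLflow installed
    for params in grid:
        with mlflow.start_run(run_name=run_name_for(params)):
            mlflow.log_params(params)
            # Rebuild the index and query engine with `params` here, then log
            # the sample Q&A artifact in the same way as the logging cell above.

grid = build_rag_grid([256, 512], [2, 3], ["sentence-transformers/all-MiniLM-L6-v2"])
print([run_name_for(p) for p in grid])
```

Encoding the settings in the run name makes runs easy to tell apart in the UI, while the logged parameters remain the source of truth for filtering and comparison.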

---

## 9. Key Takeaways and Considerations for RAG

In this notebook, we've taken our first steps into managing GenAI applications with MLflow:

- **Understood RAG:** Learned the core concepts of Retrieval-Augmented Generation and its benefits.
- **LlamaIndex for RAG:** Used LlamaIndex to easily load data, create a vector index, and set up a query engine.
- **Local LLMs with Ollama:** Explored using Ollama for local LLM serving, making powerful models accessible for development (with a Hugging Face fallback).
- **MLflow for RAG Tracking:** Logged key RAG pipeline parameters (chunking strategy, models used, index type) and qualitative artifacts (sample Q&A, source documents, persisted index) to MLflow.
- **Iterative Development:** Recognized that MLflow can help track different RAG configurations, aiding in the iterative process of improving retrieval and generation quality.

**Important Considerations for Building RAG Systems:**
- **Chunking Strategy:** The way you split documents into chunks (`chunk_size`, `chunk_overlap`, type of splitter) significantly impacts retrieval quality.
- **Embedding Model Choice:** The embedding model determines how well semantic similarity is captured.
- **Retrieval Strategy:** LlamaIndex offers more advanced retrievers (e.g., hybrid search, rerankers) beyond simple vector search.
- **LLM for Generation:** The choice of LLM affects the quality, style, and coherence of the final answer.
- **Evaluation:** Evaluating RAG systems is complex and an active area of research. Frameworks like Ragas, TruLens, or DeepEval can help assess retrieval and generation quality, but this requires careful setup and often ground truth data.
- **Prompt Engineering:** The prompts used for both retrieval and generation can heavily influence results.

This notebook provides a foundational example. Real-world RAG systems often involve more sophisticated components and a rigorous evaluation process.

---

## 10. Engaging Resources and Further Reading

To dive deeper into RAG, LlamaIndex, and GenAI with MLflow:

- **LlamaIndex Documentation:**
    - [LlamaIndex Official Docs](https://docs.llamaindex.ai/en/stable/)
    - [Key Concepts of LlamaIndex](https://docs.llamaindex.ai/en/stable/getting_started/concepts.html)
    - [LlamaIndex Integrations (LLMs, Vector Stores, etc.)](https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html)
- **MLflow for GenAI:**
    - [MLflow's LLM Evaluate (for evaluating LLMs, including RAG components)](https://mlflow.org/docs/latest/llms/llm-evaluate/index.html)
    - [MLflow Tracing for LLMs](https://mlflow.org/docs/latest/llms/llm-tracing/index.html) (for more detailed logging of LLM calls within a RAG pipeline)
- **RAG Concepts and Evaluation:**
    - [Pinecone: What is Retrieval Augmented Generation?](https://www.pinecone.io/learn/retrieval-augmented-generation/)
    - [Ragas: Framework for RAG evaluation](https://docs.ragas.io/)
    - [LangChain RAG Documentation (another popular framework)](https://python.langchain.com/docs/use_cases/question_answering/)
- **Ollama:**
    - [Ollama Official Website](https://ollama.com/)
    - [Ollama GitHub](https://github.com/ollama/ollama)

--- 

Congratulations on building and tracking your first RAG application! This is a significant step into applying MLOps principles to the rapidly evolving field of Generative AI.

**Coming Up Next:** We'll continue our GenAI exploration by looking at how to fine-tune LLMs for specific tasks and, of course, how MLflow can help manage that complex process.

![Keep Learning](https://memento.epfl.ch/image/23136/1440x810.jpg)