# LLMOps

- Effectively manage, deploy, and maintain LLM applications

![image.png](attachment:image.png)

## LLM Lifecycle

### Ideation Phase

- Data sourcing
- Base model selection

### Development Phase

- Prompt Engineering
- Chains and Agents
- RAG vs Fine-Tuning
- Testing

### Operational Phase

- Deployment
- Monitoring and Observability
- Cost Optimization
- Governance and Ethics
- Security

# Ideation Phase

- Identifying needs
- Finding sources
- Ensuring accessibility

### Data Sourcing

- Identify when to apply dataset cleaning
- For extraction, consider formatting datasets
- Additionally, set up custom scripts for extraction

### Model Selection

- Performance

- Model Characteristics

- Practical Considerations
    - License
    - Cost

- Secondary Factors
    - Number of Parameters

# Development Phase

## The Development Lifecycle

![image.png](attachment:image.png)

### Prompt Engineering

- Improve model output by giving clear, direct instructions
- Track prompt changes with version control
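
A minimal sketch of prompt versioning, assuming an in-memory store keyed by a content hash. The names `PromptRegistry`, `register`, and `get` are hypothetical and not taken from any library.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PromptRegistry:
    """Hypothetical in-memory store that keeps every version of a prompt template."""
    _versions: dict = field(default_factory=dict)

    def register(self, name: str, template: str) -> str:
        """Store a template and return a short content hash as its version id."""
        version = hashlib.sha256(template.encode("utf-8")).hexdigest()[:8]
        history = self._versions.setdefault(name, [])
        if all(v != version for v, _ in history):
            history.append((version, template))
        return version

    def get(self, name: str, version: Optional[str] = None) -> str:
        """Return a specific version, or the latest one if no version is given."""
        history = self._versions[name]
        if version is None:
            return history[-1][1]
        for v, template in history:
            if v == version:
                return template
        raise KeyError(f"{name}@{version} not found")


registry = PromptRegistry()
v1 = registry.register("quiz", "Write {n} questions about {topic}.")
v2 = registry.register("quiz", "Write {n} multiple-choice questions about {topic}.")
prompt = registry.get("quiz").format(n=5, topic="DC/AC circuits")
```

Keeping templates as files in a Git repository gives the same traceability with less code. The registry only illustrates the idea of pinning an application to a known prompt version.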

## Chains and Agents

### Chain

![image.png](attachment:image.png)

### Agents

Consists of:
- Multiple actions/tools
- An LLM deciding which action to take

Useful when:
- There are many actions
- The optimal sequence of steps is unknown
- We are uncertain about the inputs
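
The agent structure above can be sketched as a set of tools plus a decision step. In this sketch the decision step is a keyword rule standing in for the LLM, so that it runs without a model. The tools `lookup` and `ohms_law` and the query formats are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

GLOSSARY = {"impedance": "Opposition to alternating current, Z = R + jX."}


def lookup(term: str) -> str:
    """Tool 1: look up a term in a tiny glossary."""
    return GLOSSARY.get(term.lower(), "No entry found.")


def ohms_law(args: str) -> str:
    """Tool 2: compute current from a 'voltage,resistance' string."""
    voltage, resistance = (float(x) for x in args.split(","))
    return f"{voltage / resistance:g} A"


TOOLS: Dict[str, Callable[[str], str]] = {"lookup": lookup, "ohms_law": ohms_law}


def decide(query: str) -> Tuple[str, str]:
    """Stand-in for the LLM: map a query to (tool name, tool input).

    A real agent would send the query and the tool descriptions to an LLM
    and parse its reply; keyword rules are used here to stay runnable.
    """
    if query.startswith("current for "):
        return "ohms_law", query[len("current for "):]
    if query.startswith("define "):
        return "lookup", query[len("define "):]
    return "lookup", query


def run_agent(query: str) -> str:
    """Pick a tool, run it, and return its result."""
    tool_name, tool_input = decide(query)
    return TOOLS[tool_name](tool_input)
```

For example, `run_agent("current for 12,4")` routes to `ohms_law`, while `run_agent("define impedance")` routes to `lookup`.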

![image.png](attachment:image.png)

![image.png](attachment:image.png)

## RAG vs Fine-Tuning

### RAG

1. Retrieve related documents
    - Generate embedding from input
    - Search vector database
    - Retrieve most similar documents

2. Augment prompt with examples
    - Combine input with top-$k$ documents and create augmented prompt

3. Generate custom output
    - Use the augmented prompt to generate the output
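
The three steps can be sketched end to end with a toy setup. This sketch assumes a bag-of-words "embedding" and an in-memory list in place of a real embedding model and vector database. The documents and function names are illustrative only.

```python
import math
import re
from collections import Counter

DOCUMENTS = [
    "A capacitor blocks direct current once it is fully charged.",
    "Transformers step alternating current voltage up or down.",
    "Ohm's law relates voltage, current, and resistance.",
]


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a real system would use an embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 2) -> list:
    """Step 1: embed the input and return the k most similar documents."""
    q = embed(query)
    return sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def augment(query: str, docs: list) -> str:
    """Step 2: combine the input with the retrieved documents."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


query = "How does a capacitor behave with direct current?"
augmented_prompt = augment(query, retrieve(query, k=1))
# Step 3 would pass `augmented_prompt` to the LLM of your choice.
```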

#### RAG Workflow

![image.png](attachment:image.png)

### Fine-Tuning

#### Reinforcement Learning from Human Feedback (RLHF)

Types of data needed:
- Ranking or quality scores (obtained from likes & dislikes)

Approach:
- Train extra **reward model**
- Optimize the original LLM to maximize the reward model's score
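
These notes do not say how the reward model is trained. One common choice for ranking data is a pairwise loss, which is small when the preferred response receives the higher reward. The sketch below shows only that loss, and the scores are made-up numbers.

```python
import math


def pairwise_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Pairwise ranking loss: small when the preferred response scores higher."""
    margin = reward_preferred - reward_rejected
    # log(1 + exp(-margin)) equals -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))


# Hypothetical reward-model scores for two responses to the same prompt.
good_ranking = pairwise_loss(reward_preferred=2.0, reward_rejected=-1.0)
bad_ranking = pairwise_loss(reward_preferred=-1.0, reward_rejected=2.0)
```

Here `good_ranking` is lower than `bad_ranking`, so minimizing this loss pushes the reward model to agree with the human ranking.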

### RAG vs Fine-Tuning

![image.png](attachment:image.png)

## Testing

- LLM application testing is mostly limited to testing the output
- Input-side testing happens in the **extraction** step, the **vector database setup**, and the **embedding model**.

### ML vs LLM Application Testing

![image.png](attachment:image.png)

### Step 1: Build a Test Set

In [None]:
test_set = [
    {"query": "give me five knowledge type questions about DC/AC circuits",
    "answer": """
    Here are five knowledge-type questions about DC/AC circuits:
    
    1. What is the difference between direct current (DC) and alternating current (AC)?
        <ol type="A">
        <li>AC current changes direction while DC current flows in one direction</li>
        <li>AC current **does not require** special circuits to be used in **all** applications while DC circuits require it.</li>
        <li>AC current is more expensive than DC current</li>
        <li>AC current is bad for the environment compared to DC</li>
        </ol>
        
    2. How does a capacitor behave in a DC circuit compared to an AC circuit?
        <ol type="A">
        <li>A capacitor acts as an open circuit in DC</li>
        <li>A capacitor acts as a short circuit in AC</li>
        <li>A capacitor acts the same regardless if it is AC or DC</li>
        <li>A capacitor acts as a short circuit in DC</li>
        </ol>
    
    3. What are the common applications of DC and AC circuits in everyday life?
        <ol type="A">
        <li>DC is commonly used in battery-powered devices, while AC is used for household power supply</li>
        <li>DC is used for high-voltage power transmission, while AC is used in electronic devices</li>
        <li>DC is used in lighting systems, while AC is used in automotive applications</li>
        <li>DC and AC are used interchangeably in all applications</li>
        </ol>
        
    4. How do you calculate the impedance of an AC circuit containing resistors, capacitors, and inductors?
        <ol type="A">
        <li>Impedance is calculated using the formula Z = R + j(X_L - X_C)</li>
        <li>Impedance is calculated using Ohm's Law: V = IR</li>
        <li>Impedance is the same as resistance in AC circuits</li>
        <li>Impedance cannot be calculated in AC circuits</li>
        </ol>
        
    5. What are the advantages and disadvantages of using AC over DC for power transmission?
        <ol type="A">
        <li>AC can be easily transformed to different voltages, while DC cannot</li>
        <li>AC is less efficient than DC for long-distance transmission</li>
        <li>DC is more dangerous than AC for power transmission</li>
        <li>AC and DC have the same advantages and disadvantages for power transmission</li>
        </ol>
    """},
    {"query": """Here are my answers to the questions: 
     1. A
     2. D
     3. A
     4. B
     5. A
     """,
    "answer": """
    Let's check your answers:
    1. Correct! The difference between DC and AC is that AC current changes direction while DC current flows in one direction.
    
    2. The answer is **A**. While a capacitor can theoretically act as a short circuit in AC circuits at very high frequencies, this assumes that the frequency is high enough for the capacitive reactance to be negligible.
    In actuality, the frequency rarely reaches a value that makes the capacitive reactance negligible. On the other hand, a DC current will always have zero or negligible frequency, resulting in the capacitor acting as an open circuit every time.
    
    3. Correct! DC is commonly used in battery-powered devices, while AC is used for household power supply.
    
    4. The answer is **A**. Impedance in an AC circuit containing resistors, capacitors, and inductors is calculated using the formula Z = R + j(X_L - X_C), where R is resistance, X_L is inductive reactance, and X_C is capacitive reactance.
    
    5. Correct! One of the main advantages of using AC over DC for power transmission is that AC can be easily transformed to different voltages using transformers, while DC cannot.
    
    """},
    {"query": "Where can I learn more about AC/DC circuits relevant for my review?",
    "answer": """
    You can refer to the following learning materials:
    - "Fundamentals of Electric Circuits" by Charles K. Alexander and Matthew N.O. Sadiku
    #TODO: Prompt Engineering or Chain of Thought to link the titles in the Datasets folder to the list of references here. (Provide it as context to the model)
    """},
]

### Step 2: Choosing Our Metric

- If each query has a single correct answer, use **confusion matrix** metrics (accuracy, precision, recall, etc.)

- If there is a *reference answer*, use **statistical** or **model-based** methods
    - Judges using smaller models (BERT, SBERT, etc.)

- If human feedback is available (as in RLHF), rate the following:
    - Quality
    - Relevance
    - Coherence

- If no human feedback is available, use **unsupervised** metrics:
    - Coherence
    - Fluency
    - Diversity
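
Two of the metric families above can be sketched with the standard library. This sketch assumes whitespace tokenization. Token-level F1 stands in for a statistical comparison against a reference answer, and distinct-n stands in for an unsupervised diversity score. Both function names are illustrative.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Statistical metric: F1 over tokens shared by prediction and reference."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def distinct_n(text: str, n: int = 2) -> float:
    """Unsupervised diversity metric: share of unique n-grams in the text."""
    tokens = text.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


reference = "a capacitor acts as an open circuit in dc"
score = token_f1("a capacitor is an open circuit in dc", reference)
diversity = distinct_n("the cat sat on the cat sat", n=2)
```

Model-based judges (for example, embedding similarity from a smaller model) would replace `token_f1` with a learned similarity. The test-set loop around them stays the same.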

### Step 3: Define Optional Secondary Metrics

![image.png](attachment:image.png)

# Operational Phase

- An application may include a chain/agent logic, vector database, LLM, and application
- Each component needs to be deployed and work together

## Step 1: Choice of Hosting

- Private/public cloud (e.g., AWS SageMaker, Vercel)
- On-premises (self-managed servers)

## Step 2: API Design

- APIs let different software components talk to each other
- Affects scalability, cost, and infrastructure needs.
- Security control through **API Keys**

Some considerations:
- **Containers** are chosen for scalability, flexibility, and adaptability. There are specialized containers for LLM deployment.
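
The API-key control mentioned under API design can be sketched without a web framework. This sketch assumes a hypothetical `X-API-Key` header and an in-memory key table. A real deployment would keep keys in a secrets manager.

```python
import hmac
from typing import Optional

# Hypothetical key table mapping API keys to client names.
VALID_API_KEYS = {"demo-key-123": "review-app"}


def authenticate(headers: dict) -> Optional[str]:
    """Return the client name for a valid key, or None if the key is missing or wrong."""
    supplied = headers.get("X-API-Key", "")
    for key, client in VALID_API_KEYS.items():
        # compare_digest avoids leaking key contents through timing differences.
        if hmac.compare_digest(supplied.encode(), key.encode()):
            return client
    return None


def handle_request(headers: dict, prompt: str) -> dict:
    """Reject unauthenticated requests before any LLM work is done."""
    client = authenticate(headers)
    if client is None:
        return {"status": 401, "body": "Invalid or missing API key"}
    # A real handler would forward `prompt` to the LLM application here.
    return {"status": 200, "body": f"Accepted prompt from {client}"}
```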


### CI/CD Pipeline

![image.png](attachment:image.png)

### Scaling

- LLMs might need specialized GPU hardware

- Scaling strategies
    - **Horizontal** - More nodes. Good for traffic.
    - **Vertical** - More cores per node. Good for reliability and speed.

## Monitoring and Observability

- **Monitoring** continuously watches a system

- **Observability** reveals internal states to external observers

- Data sources for observability:
    - Logs
    - Metrics
    - Traces

### Monitoring

#### Input Monitoring

- Changes, errors, malicious content
- **Data drift** - change in input data distribution over time
    - Monitor data distribution
    - Periodically update the model
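
One way to monitor the input distribution is to compare a recent sample against a baseline using the Population Stability Index (PSI). This sketch applies it to prompt lengths. The sample values, bin edges, and `DRIFT_THRESHOLD` are assumptions that would need tuning for a real application.

```python
import math


def psi(baseline: list, current: list, bin_edges: list) -> float:
    """Population Stability Index between two samples over shared bins."""
    def proportions(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            # The bin index is the number of edges the value has reached.
            counts[sum(v >= edge for edge in bin_edges)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical prompt lengths (in tokens) at launch and this week.
baseline_lengths = [12, 15, 18, 20, 22, 25, 30, 35]
current_lengths = [40, 45, 60, 80, 85, 90, 120, 150]
edges = [20, 40, 80]
DRIFT_THRESHOLD = 0.2  # assumed alerting threshold, not a universal constant

drift_score = psi(baseline_lengths, current_lengths, edges)
no_drift = psi(baseline_lengths, baseline_lengths, edges)
drift_detected = drift_score > DRIFT_THRESHOLD
```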

#### Functional Monitoring

- Response time, request volume, downtime
- Error rates
- Chain and agent execution
- System resources (GPU)
- Costs

#### Output Monitoring

- Bias, toxicity, helpfulness
- **Model drift** - Relationship between input and output changes
- Censoring

### Alert Handling

- Notifications when issues arise (e.g., AWS SNS)
- Clear procedures
- Service-Level Agreement (SLA)

## Cost Management

![image.png](attachment:image.png)

- Use automatic **prompt compression**
- Content reduction
    - Optimize "chat memory" management
    - Optimize RAG to return fewer results
- Use **batching**
- Use **response caching**
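
Response caching and chat-memory trimming from the list above can be sketched as follows. `call_llm` is a stand-in for a paid model call, and word counts are used as a rough proxy for tokens. Both are assumptions made for illustration.

```python
from functools import lru_cache

CALLS = {"count": 0}


def call_llm(prompt: str) -> str:
    """Stand-in for a paid LLM call; counts how often it is invoked."""
    CALLS["count"] += 1
    return f"response to: {prompt}"


@lru_cache(maxsize=1024)
def cached_llm(prompt: str) -> str:
    """Response caching: identical prompts are served from memory."""
    return call_llm(prompt)


def trim_history(messages: list, max_words: int) -> list:
    """Chat-memory management: keep the most recent messages within a word budget."""
    kept, used = [], 0
    for message in reversed(messages):
        words = len(message.split())
        if used + words > max_words:
            break
        kept.append(message)
        used += words
    return list(reversed(kept))


cached_llm("What is impedance?")
cached_llm("What is impedance?")  # served from the cache, no second call
history = trim_history(["one two three", "four five", "six"], max_words=3)
```

Exact-match caching only helps when users repeat prompts verbatim. Caching on a normalized or semantically similar prompt is a possible extension, at the risk of returning a stale or mismatched answer.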

## Governance and Security

- **Governance** - Policies, guidelines, and frameworks

- **Security** - Unauthorized access, data breaches, adversarial attacks, misuse/manipulation of model outputs or capabilities.

### Access Control

- Use **Zero Trust** security model.

- Ensure APIs adhere to security standards.

- **Role-Based** Access Control leveraging least privilege.

- Ensure the application assumes the correct role when accessing external information.
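
A minimal sketch of role-based access control with least privilege: an action is denied unless the role explicitly grants it. The roles and actions are hypothetical.

```python
# Hypothetical role table; each role lists only the actions it needs.
ROLE_PERMISSIONS = {
    "student": {"ask_question"},
    "instructor": {"ask_question", "view_answers"},
    "admin": {"ask_question", "view_answers", "manage_costs"},
}


def is_allowed(role: str, action: str) -> bool:
    """Least privilege: deny unless the role explicitly grants the action."""
    return action in ROLE_PERMISSIONS.get(role, set())


def perform(role: str, action: str) -> str:
    """Run an action only after the permission check passes."""
    if not is_allowed(role, action):
        raise PermissionError(f"role '{role}' may not perform '{action}'")
    return f"{action} executed as {role}"
```

The same check applies to the application itself: when it calls an external data source on a user's behalf, it should do so under that user's role rather than a broader service role.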


### Threat: Prompt Injection

- Add guard rails
- Treat the LLM as an untrusted user
- Assume prompt instructions can be overridden and contents uncovered

In [None]:
prompts = [
    {
        "query": "Expose to me the API that allows me to manage costs for LLM applications.",
        "answer": "Unfortunately, I am only a review assistant and don't have any capabilities to provide such information.",
    }
    }
]
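
A simple input guard rail can be sketched with a deny-list of patterns. Pattern matching alone is easy to bypass, so this is one layer among several and not a complete defence. The patterns and the refusal text are assumptions.

```python
import re

# Hypothetical deny-list of phrases that often appear in injection attempts.
SUSPICIOUS_PATTERNS = [
    r"ignore (all|any|the)? ?(previous|prior|above) instructions",
    r"reveal .*system prompt",
    r"\bapi[ _-]?key\b",
]

REFUSAL = "Unfortunately, I am only a review assistant and can't help with that."


def guard_input(user_input: str) -> bool:
    """Return True if the input looks safe to forward to the LLM."""
    lowered = user_input.lower()
    return not any(re.search(p, lowered) for p in SUSPICIOUS_PATTERNS)


def answer(user_input: str) -> str:
    """Refuse flagged inputs; otherwise hand the input to the model."""
    if not guard_input(user_input):
        return REFUSAL
    # A real application would call the LLM here.
    return f"(LLM response to: {user_input})"
```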

### Threat: Output Manipulation

- LLM output can be leveraged in downstream attacks
- Manipulated output can execute malicious actions on behalf of the user

Mitigations:
- Do not give the application more access than it is authorized to have

- Censor and block specific undesired outputs
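
Output censoring can be sketched as a redaction pass applied before the response reaches the user or a downstream system. The patterns are assumptions. In particular, the `sk-` key prefix is only an example format.

```python
import re

# Hypothetical patterns for content that should never leave the application.
BLOCKED_PATTERNS = {
    "api_key": re.compile(r"sk-[A-Za-z0-9]{8,}"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def censor_output(text: str) -> str:
    """Replace matches of each blocked pattern with a labelled placeholder."""
    for label, pattern in BLOCKED_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text


raw = "Contact admin@example.com and use key sk-abc12345XYZ."
safe = censor_output(raw)
```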

### Threat: Denial of Service

- Limit request rates
- Cap resource usage per request
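
Both mitigations can be sketched together: a token bucket limits the request rate, and a length cap bounds the work per request. Timestamps are passed in explicitly so the example is deterministic. A real service would use a clock such as `time.monotonic()`. `MAX_PROMPT_CHARS` is an assumed limit.

```python
class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, capacity: int, rate: float, now: float = 0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


MAX_PROMPT_CHARS = 2000  # hypothetical per-request cap


def accept(bucket: TokenBucket, prompt: str, now: float) -> bool:
    """Reject oversized prompts first, then apply the rate limit."""
    return len(prompt) <= MAX_PROMPT_CHARS and bucket.allow(now)


bucket = TokenBucket(capacity=2, rate=1.0)
decisions = [accept(bucket, "hi", now=t) for t in (0.0, 0.1, 0.2, 1.5)]
```

The third request is rejected because the burst allowance is used up, and the fourth succeeds once enough time has passed to refill a token.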

### Threat: Data Integrity and Poisoning

- Use filters
- Output censoring
- Use trusted sources