# Understanding Python Asyncio: Building Efficient LLM Applications

This notebook introduces asynchronous programming in Python using the `asyncio` library. In Day2, we're focusing on production-ready AI systems, and async programming is crucial for building efficient applications that can handle multiple LLM API calls concurrently.

## Why Asyncio Matters for LLM Applications

When working with LLM APIs:
- **API calls are slow** (typically 1-10 seconds per request)
- **You often need multiple calls** (different models, retries, parallel processing)
- **Synchronous code wastes time** waiting for responses
- **Asyncio enables concurrent requests**, dramatically improving performance

## Preamble: Traditional Python vs. Asynchronous Python

### Traditional (Synchronous) Python

By default, Python executes code **synchronously** - operations happen one after another, in sequence. When Python encounters an operation that takes time (like network requests, file I/O, or database queries), it **blocks** and waits for that operation to complete before moving on.

**Analogy**: Think of traditional Python like a chef who can only do one thing at a time. If the chef needs to boil water (which takes several minutes), they stand and watch the pot the entire time, unable to do any other cooking tasks until the water boils.

### Asynchronous Python

With `asyncio`, Python can execute code **asynchronously** - when an operation would normally block, Python can pause that operation, switch to another task, and then resume the first operation when it's ready.

**Analogy**: Now imagine a chef who puts the water on to boil, and while waiting for it to boil, chops vegetables, marinates meat, and prepares sauces. The chef periodically checks if the water is boiling yet, but doesn't waste time just watching it. This is much more efficient!

Let's see a simple example comparing these approaches:

In [1]:
import time

# Traditional synchronous approach
def make_breakfast_sync():
    print("Starting breakfast...")
    start = time.time()
    
    print("Brewing coffee...")
    time.sleep(3)  # Coffee takes 3 seconds to brew
    print("Coffee ready!")
    
    print("Toasting bread...")
    time.sleep(2)  # Toast takes 2 seconds
    print("Toast ready!")
    
    print("Frying eggs...")
    time.sleep(4)  # Eggs take 4 seconds
    print("Eggs ready!")
    
    end = time.time()
    print(f"Breakfast prepared in {end - start:.2f} seconds")

# Run the synchronous function
make_breakfast_sync()

Starting breakfast...
Brewing coffee...
Coffee ready!
Toasting bread...
Toast ready!
Frying eggs...
Eggs ready!
Breakfast prepared in 9.02 seconds


Notice that in the synchronous approach above, the total time is the sum of all individual tasks (3 + 2 + 4 = 9 seconds). This is because we can only do one thing at a time.

Now let's see how this would work with `asyncio`:

In [2]:
import asyncio

# Asynchronous approach
async def make_breakfast_async():
    print("Starting breakfast...")
    start = time.time()
    
    # Start all tasks concurrently
    coffee_task = asyncio.create_task(brew_coffee())
    toast_task = asyncio.create_task(make_toast())
    eggs_task = asyncio.create_task(fry_eggs())
    
    # Wait for all tasks to complete
    await coffee_task
    await toast_task
    await eggs_task
    
    end = time.time()
    print(f"Breakfast prepared in {end - start:.2f} seconds")

async def brew_coffee():
    print("Brewing coffee...")
    await asyncio.sleep(3)  # Simulate brewing time
    print("Coffee ready!")
    return "coffee"

async def make_toast():
    print("Toasting bread...")
    await asyncio.sleep(2)  # Simulate toasting time
    print("Toast ready!")
    return "toast"

async def fry_eggs():
    print("Frying eggs...")
    await asyncio.sleep(4)  # Simulate frying time
    print("Eggs ready!")
    return "eggs"

# Run the asynchronous function
# A notebook already has a running event loop, so we can await directly.
# In a .py script, use asyncio.run(make_breakfast_async()) instead.
await make_breakfast_async()

Starting breakfast...
Brewing coffee...
Toasting bread...
Frying eggs...
Toast ready!
Coffee ready!
Eggs ready!
Breakfast prepared in 4.01 seconds


In the asynchronous approach, the total time is approximately 4 seconds (the longest individual task), not 9 seconds. This is because we're doing all three tasks concurrently, not one after another.

Now let's dive into the core concepts of `asyncio` one by one.

## 1. Event Loop & Concurrency Model

The **event loop** is the core of every asyncio application. It's responsible for executing asynchronous tasks, handling I/O operations, and running callbacks.

**Analogy**: Think of the event loop like a task manager who has a clipboard with a to-do list. The manager continuously cycles through the list:
1. Checks each task to see if it's ready to make progress
2. Gives attention to tasks that are ready to run
3. When a task needs to wait (like for I/O), marks it as "waiting" and moves on to other tasks
4. When a waiting task becomes ready, marks it as "ready" for the next cycle
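
To make the clipboard analogy concrete, here is a toy round-robin scheduler built on plain generators. It is only a sketch of the idea, not how `asyncio` is actually implemented: the names `toy_task` and `toy_event_loop` are made up for illustration, and a bare `yield` stands in for "I need to wait".

```python
from collections import deque


def toy_task(name, steps, log):
    """A pretend task: does one step of work, then hands control back."""
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield  # "I need to wait": give control back to the loop


def toy_event_loop(tasks):
    """Cycle through the to-do list until every task has finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # let the task make progress
            ready.append(task)  # not finished: back on the list
        except StopIteration:
            pass                # finished: drop it from the list


def run_demo():
    log = []
    toy_event_loop([toy_task("A", 2, log), toy_task("B", 1, log)])
    return log


print(run_demo())
```

The two toy tasks take turns: each one runs until its next `yield`, then the loop moves on to the next entry on the list.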

Let's see what happens when we define a coroutine but forget to run it through the event loop:

In [3]:
# WRONG WAY - Defining a coroutine without running it
async def say_hello():
    print("Hello, World!")
    
# This doesn't print anything - it just creates a coroutine object
say_hello()

<coroutine object say_hello at 0x000001F2F79B09C0>

Notice that nothing actually happened! When you call a coroutine function directly, Python just creates a coroutine object but doesn't execute it. This is a common source of confusion for beginners.

Now let's run it properly through the event loop:

In [None]:
# RIGHT WAY - Awaiting the coroutine so the event loop actually runs it
# (in a .py script, use asyncio.run(say_hello()) instead)
await say_hello()

Hello, World!


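Now it works! Awaiting the coroutine hands it to the event loop, which runs it to completion. A Jupyter notebook already has a running event loop, which is why a top-level `await` works here. A regular `.py` script has no running loop, so there you would call `asyncio.run(say_hello())`, which creates an event loop, runs the coroutine until it completes, and then closes the loop.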
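
For comparison, this is roughly what the same example looks like as a standalone script rather than a notebook cell. The file name is hypothetical, and `say_hello` returns its message here (unlike the cell above) only so the result can be inspected.

```python
# hello_async.py (hypothetical file name); run with: python hello_async.py
import asyncio


async def say_hello():
    print("Hello, World!")
    return "Hello, World!"


def main():
    # asyncio.run() creates a new event loop, runs the coroutine to
    # completion, and closes the loop. It raises RuntimeError if called
    # while another event loop is already running in the same thread,
    # which is why notebook cells use a plain `await` instead.
    return asyncio.run(say_hello())


if __name__ == "__main__":
    main()
```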

Let's see a slightly more complex example with multiple coroutines:

In [5]:
import asyncio
import time

async def say_after(delay, message):
    await asyncio.sleep(delay)
    print(message)

async def main():
    print(f"Started at {time.strftime('%X')}")
    
    # These will run sequentially
    await say_after(1, "Hello")
    await say_after(2, "World")
    
    print(f"Finished at {time.strftime('%X')}")

# Run the main coroutine
# (in a .py script, use asyncio.run(main()) instead; calling asyncio.run()
# here would fail because the notebook's event loop is already running)
await main()

Started at 13:10:48
Hello
World
Finished at 13:10:51


In this example, we see two coroutines running sequentially (one after the other). The total time is the sum of the delays (about 3 seconds). This happens because we're awaiting each coroutine one at a time.

Later, we'll see how to run coroutines concurrently for better efficiency.

## 2. Coroutines & async/await Syntax

**Coroutines** are special functions defined with `async def` that can pause their execution at `await` points. When a coroutine awaits something, it temporarily gives up control to the event loop, which can then run other coroutines.

**Analogy**: Think of coroutines like restaurant workers. When a worker needs to wait for something (like an order to be ready from the kitchen), they don't just stand there blocking the whole restaurant. Instead, they say "I'll wait for this" (await) and go help other customers until the order is ready.

Let's see what happens when we try to use a coroutine incorrectly:

In [6]:
# WRONG WAY - Trying to directly use a coroutine without await
async def get_data():
    print("Fetching data...")
    await asyncio.sleep(1)  # Simulate network delay
    print("Data retrieved!")
    return {"status": "success", "data": [1, 2, 3]}

# This just creates a coroutine object without running it
result = get_data()
print(f"Result: {result}")

Result: <coroutine object get_data at 0x000001F2F79B0EC0>


Notice that we didn't get the actual data - we just got a coroutine object. This is because we didn't await the coroutine or run it through the event loop.

Let's fix this:

In [7]:
# RIGHT WAY - Using await to get the result of a coroutine
async def main():
    result = await get_data()
    print(f"Result: {result}")

# Run the main coroutine
# In a .py script: asyncio.run(main())
await main()

Fetching data...
Data retrieved!
Result: {'status': 'success', 'data': [1, 2, 3]}


Now we get the actual data! The `await` keyword tells Python to pause the current coroutine until `get_data()` completes, then resume with its result.

Key things to remember about coroutines:
- They must be defined with `async def`
- They can contain `await` expressions
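- Calling one only creates a coroutine object; its body runs when it is awaited or scheduled as a task
- `await` can only be used inside an `async def` function (Jupyter additionally allows it at the top level of a cell)
- To run a coroutine from synchronous code, use `asyncio.run()`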

Let's see an example with multiple await points:

In [8]:
async def process_data():
    print("Step 1: Fetching raw data...")
    await asyncio.sleep(1)  # Simulate network delay
    raw_data = [1, 2, 3, 4, 5]
    print(f"Raw data: {raw_data}")
    
    print("Step 2: Processing data...")
    await asyncio.sleep(0.5)  # Simulate processing time
    processed_data = [x * 2 for x in raw_data]
    print(f"Processed data: {processed_data}")
    
    print("Step 3: Saving results...")
    await asyncio.sleep(0.8)  # Simulate database delay
    print("Data saved successfully!")
    
    return processed_data

# In a .py script: asyncio.run(process_data())
await process_data()

Step 1: Fetching raw data...
Raw data: [1, 2, 3, 4, 5]
Step 2: Processing data...
Processed data: [2, 4, 6, 8, 10]
Step 3: Saving results...
Data saved successfully!


[2, 4, 6, 8, 10]

In this example, the coroutine has three await points, each simulating a different operation. At each `await`, the coroutine pauses and the event loop could potentially run other coroutines (though we only have one in this example).
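
To see the event loop actually switch between coroutines at `await` points, here is a minimal sketch with two workers. The `worker` helper and the shared `log` list are illustrative names; `asyncio.sleep(0)` is used only as a way to yield control without a real delay.

```python
import asyncio


async def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        # Yield to the event loop so other coroutines get a turn
        await asyncio.sleep(0)


async def main():
    log = []
    await asyncio.gather(worker("A", 2, log), worker("B", 2, log))
    return log


# In a notebook: log = await main()
# In a .py script: log = asyncio.run(main())
```

The recorded entries alternate between the two workers, because each one gives up control every time it reaches its `await`.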

## 3. Tasks & Futures

**Tasks** are a way to schedule coroutines to run concurrently on the event loop. A Task wraps a coroutine and manages its execution, allowing you to check its status, cancel it, or get its result.

A **Future** is a lower-level awaitable object representing the result of an operation that hasn't yet completed. Think of a Future as a "promise" that will eventually have a result. A Task is a subclass of Future, so in application code you will mostly work with Tasks.

**Analogy**: If a coroutine is like a recipe for cooking a dish, a Task is like giving that recipe to a chef and saying "make this dish." The Task represents the ongoing work, and you can check if it's done, cancel it, or wait for the finished dish.
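
You rarely create Futures by hand, but a small sketch can make the "promise" idea tangible. Here the loop is asked to fill in the result a little later; the 0.1-second delay and the `"ready"` value are arbitrary choices for illustration.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()  # an empty "promise"

    # Ask the loop to fulfil the promise in 0.1 seconds
    loop.call_later(0.1, future.set_result, "ready")

    done_before = future.done()  # nothing has set a result yet
    result = await future        # pauses here until set_result() is called
    return done_before, result, future.done()


# In a notebook: await main()
# In a .py script: asyncio.run(main())
```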

Let's see what happens when we create a Task but forget to await it:

In [9]:
# WRONG WAY - Creating a task but not awaiting it
async def calculate_sum(numbers):
    print("Calculating sum...")
    await asyncio.sleep(1)  # Simulate computation time
    result = sum(numbers)
    print(f"Sum calculated: {result}")
    return result

async def main():
    # Create a task
    task = asyncio.create_task(calculate_sum([1, 2, 3, 4, 5]))
    
    # Oops! We forgot to await the task
    print("Main function finished")

# In a .py script: asyncio.run(main())
await main()

Main function finished
Calculating sum...


Sum calculated: 15


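Notice that the main function finished immediately, before the task had even started. In this notebook the event loop keeps running after `main()` returns, so the task still completed afterwards. In a script run with `asyncio.run()`, the loop shuts down as soon as `main()` returns and cancels the unfinished task. Either way, if we don't await the task, we can't get its result or ensure it completes.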

Let's fix this:

In [10]:
# RIGHT WAY - Creating and awaiting a task
async def main():
    # Create a task
    task = asyncio.create_task(calculate_sum([1, 2, 3, 4, 5]))
    
    print("Main function continues immediately while the task runs")
    print("Doing other work...")
    await asyncio.sleep(0.5)  # Simulate other work
    
    # Now await the task result
    result = await task
    print(f"Got task result: {result}")
    print("Main function finished")

# In a .py script: asyncio.run(main())
await main()

Main function continues immediately while the task runs
Doing other work...
Calculating sum...
Sum calculated: 15
Got task result: 15
Main function finished


Now we properly await the task and get its result. Notice how the main function continues running immediately after creating the task (before the sum is calculated), demonstrating concurrency.

We can also check a task's status and handle multiple tasks:

In [11]:
async def delayed_task(delay, name):
    print(f"Task {name} starting...")
    await asyncio.sleep(delay)
    print(f"Task {name} completed after {delay} seconds")
    return f"Result from {name}"

async def main():
    # Create multiple tasks
    task1 = asyncio.create_task(delayed_task(2, "A"))
    task2 = asyncio.create_task(delayed_task(1, "B"))
    
    # Check task status
    print(f"Task A done? {task1.done()}")
    print(f"Task B done? {task2.done()}")
    
    # Wait for tasks to complete
    result1 = await task1
    result2 = await task2
    
    # Check task status again
    print(f"Task A done? {task1.done()}")
    print(f"Task B done? {task2.done()}")
    
    print(f"Results: {result1}, {result2}")

# In a .py script: asyncio.run(main())
await main()

Task A done? False
Task B done? False
Task A starting...
Task B starting...
Task B completed after 1 seconds
Task A completed after 2 seconds
Task A done? True
Task B done? True
Results: Result from A, Result from B


Notice how both tasks run concurrently, and task B (with a shorter delay) completes before task A, even though we created task A first. We can check a task's status using `task.done()` and get its result using `await task`.
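
Tasks can also be cancelled, which was mentioned earlier but not shown. The following is a minimal sketch; `slow_operation` and the `log` list are illustrative names, and the 10-second sleep simply stands in for work that takes too long.

```python
import asyncio


async def slow_operation(log):
    try:
        await asyncio.sleep(10)  # stands in for a long-running call
        return "finished"
    except asyncio.CancelledError:
        log.append("cleaning up")  # a chance to release resources
        raise                      # re-raise so the task is marked cancelled


async def main():
    log = []
    task = asyncio.create_task(slow_operation(log))
    await asyncio.sleep(0.1)  # give the task time to start
    task.cancel()             # request cancellation
    try:
        await task
    except asyncio.CancelledError:
        log.append("task was cancelled")
    return task.cancelled(), log


# In a notebook: await main()
# In a .py script: asyncio.run(main())
```

Cancellation is delivered as a `CancelledError` raised inside the task at its current `await`, so the coroutine gets a chance to clean up before it stops.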

## 4. Scheduling & Aggregation: create_task, gather, wait

Asyncio provides several ways to schedule and manage multiple coroutines:

1. **create_task**: Schedules a coroutine to run as a Task on the event loop
2. **gather**: Runs multiple awaitables concurrently and returns all their results
3. **wait**: Waits for multiple awaitables with fine-grained control over waiting behavior

**Analogy**: Think of `create_task` like assigning jobs to different workers, `gather` like waiting for all workers to finish and collecting their results, and `wait` like having more selective policies about which workers you want to wait for (first to finish, all of them, etc.).

Let's first see what happens when we don't use these functions and try to run tasks sequentially:

In [12]:
# WRONG WAY - Running coroutines sequentially when they could be concurrent
async def fetch_data(item_id):
    print(f"Fetching data for item {item_id}...")
    await asyncio.sleep(1)  # Simulate API request delay
    return f"Data for item {item_id}"

async def main():
    start_time = time.time()
    
    # Sequential execution
    result1 = await fetch_data(1)
    result2 = await fetch_data(2)
    result3 = await fetch_data(3)
    
    results = [result1, result2, result3]
    end_time = time.time()
    
    print(f"Results: {results}")
    print(f"Time taken: {end_time - start_time:.2f} seconds")

# In a .py script: asyncio.run(main())
await main()

Fetching data for item 1...
Fetching data for item 2...
Fetching data for item 3...
Results: ['Data for item 1', 'Data for item 2', 'Data for item 3']
Time taken: 3.02 seconds


Notice that it takes around 3 seconds because we're executing the coroutines one after another. Let's fix this using `create_task`:

In [13]:
# BETTER WAY 1 - Using create_task for concurrency
async def main():
    start_time = time.time()
    
    # Create tasks to run concurrently
    task1 = asyncio.create_task(fetch_data(1))
    task2 = asyncio.create_task(fetch_data(2))
    task3 = asyncio.create_task(fetch_data(3))
    
    # Await all tasks to complete
    result1 = await task1
    result2 = await task2
    result3 = await task3
    
    results = [result1, result2, result3]
    end_time = time.time()
    
    print(f"Results: {results}")
    print(f"Time taken: {end_time - start_time:.2f} seconds")

# In a .py script: asyncio.run(main())
await main()

Fetching data for item 1...
Fetching data for item 2...
Fetching data for item 3...
Results: ['Data for item 1', 'Data for item 2', 'Data for item 3']
Time taken: 1.01 seconds


This is better! Now it takes only around 1 second because all tasks run concurrently. But we can make the code more concise using `gather`:

In [14]:
# BEST WAY - Using gather for concurrency
async def main():
    start_time = time.time()
    
    # Use gather to run all coroutines concurrently
    results = await asyncio.gather(
        fetch_data(1),
        fetch_data(2),
        fetch_data(3)
    )
    
    end_time = time.time()
    
    print(f"Results: {results}")
    print(f"Time taken: {end_time - start_time:.2f} seconds")

# In a .py script: asyncio.run(main())
await main()

Fetching data for item 1...
Fetching data for item 2...
Fetching data for item 3...
Results: ['Data for item 1', 'Data for item 2', 'Data for item 3']
Time taken: 1.03 seconds


`gather` is very convenient because it runs all the coroutines concurrently and returns their results in the order the awaitables were passed in, regardless of which one finishes first. It's perfect when you need to run multiple operations at once and get all their results.
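
By default, the first exception raised by any awaitable propagates out of `gather`. With `return_exceptions=True`, exceptions are returned in the results list instead, so one failure doesn't hide the other results. A minimal sketch, where `fetch_or_fail` is a made-up helper in which item 2 always fails:

```python
import asyncio


async def fetch_or_fail(item_id):
    await asyncio.sleep(0.01)  # simulate a short API delay
    if item_id == 2:
        raise ValueError(f"item {item_id} failed")
    return f"Data for item {item_id}"


async def main():
    results = await asyncio.gather(
        fetch_or_fail(1),
        fetch_or_fail(2),
        fetch_or_fail(3),
        return_exceptions=True,  # exceptions come back as list entries
    )
    successes = [r for r in results if not isinstance(r, Exception)]
    failures = [r for r in results if isinstance(r, Exception)]
    return successes, failures


# In a notebook: await main()
# In a .py script: asyncio.run(main())
```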

Now let's try `wait`, which gives you more control over how you wait for tasks:

In [15]:
# Using wait with different options
async def fetch_with_variable_delay(item_id, delay):
    print(f"Fetching item {item_id} (delay: {delay}s)...")
    await asyncio.sleep(delay)
    print(f"Item {item_id} fetched!")
    return f"Data for item {item_id}"

async def main():
    # Create tasks with different delays
    task1 = asyncio.create_task(fetch_with_variable_delay(1, 1))
    task2 = asyncio.create_task(fetch_with_variable_delay(2, 2))
    task3 = asyncio.create_task(fetch_with_variable_delay(3, 3))
    
    # Wait for the first task to complete
    print("\nWaiting for the first task to complete...")
    done, pending = await asyncio.wait(
        [task1, task2, task3],
        return_when=asyncio.FIRST_COMPLETED
    )
    
    print(f"\n{len(done)} tasks completed, {len(pending)} tasks still pending.")
    
    # Process the completed task
    print("Processing completed tasks:")
    for done_task in done:
        result = done_task.result()
        print(f"Result: {result}")
    
    # Wait for remaining tasks with a timeout
    print("\nWaiting for remaining tasks with a 1.5s timeout...")
    done, pending = await asyncio.wait(pending, timeout=1.5)
    
    print(f"\n{len(done)} more tasks completed, {len(pending)} tasks still pending.")
    
    # Cancel remaining tasks
    print("Cancelling remaining tasks...")
    for task in pending:
        task.cancel()
    
    print("All done!")

# In a .py script: asyncio.run(main())
await main()


Waiting for the first task to complete...
Fetching item 1 (delay: 1s)...
Fetching item 2 (delay: 2s)...
Fetching item 3 (delay: 3s)...
Item 1 fetched!

1 tasks completed, 2 tasks still pending.
Processing completed tasks:
Result: Data for item 1

Waiting for remaining tasks with a 1.5s timeout...
Item 2 fetched!

1 more tasks completed, 1 tasks still pending.
Cancelling remaining tasks...
All done!


`asyncio.wait()` is more flexible than `gather` because it allows you to:
- Wait for just the first task to complete (`FIRST_COMPLETED`)
- Return as soon as any task raises an exception (`FIRST_EXCEPTION`; if no task raises, it behaves like `ALL_COMPLETED`)
- Wait for all tasks to complete (`ALL_COMPLETED`, which is the default)
- Set a timeout for waiting
- Get separate sets of done and pending tasks

This is useful for more complex scenarios, like processing results as soon as they're available, or implementing timeouts.
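
If all you need is to handle results in completion order, `asyncio.as_completed` is a lighter-weight alternative to calling `wait` in a loop. A minimal sketch, with a made-up `fetch` helper and arbitrary delays:

```python
import asyncio


async def fetch(item_id, delay):
    await asyncio.sleep(delay)
    return item_id


async def main():
    coros = [fetch(1, 0.3), fetch(2, 0.1), fetch(3, 0.2)]
    completion_order = []
    # as_completed yields awaitables in the order they finish
    for next_done in asyncio.as_completed(coros):
        completion_order.append(await next_done)
    return completion_order


# In a notebook: await main()
# In a .py script: asyncio.run(main())
```

Item 2 has the shortest delay, so it is handled first even though it was submitted second.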

## 5. Practical Example: Async LLM API Calls

Now let's see how asyncio can dramatically improve performance when working with LLM APIs. This example simulates making multiple LLM API calls, which is a common pattern in production AI applications.

### The Problem: Sequential API Calls Are Slow

Imagine you need to:
1. Query multiple models for comparison
2. Process multiple documents through an LLM
3. Implement retry logic for failed requests
4. Make follow-up calls based on initial responses

Without asyncio, these operations would happen one after another, wasting valuable time.

In [16]:
# Simulating LLM API calls (we'll use delays to simulate API latency)

async def simulate_llm_api_call(prompt, model_name, delay=2):
    """Simulates an LLM API call with network delay"""
    print(f"[{model_name}] Sending request: '{prompt[:30]}...'")
    await asyncio.sleep(delay)  # Simulate API latency
    response = f"Response from {model_name}: This is a simulated response to '{prompt}'"
    print(f"[{model_name}] Received response")
    return response

# SLOW WAY - Sequential API calls
async def process_prompts_sequential(prompts, model="gpt-4"):
    print("\n=== Sequential Processing ===")
    start_time = time.time()
    responses = []
    
    for prompt in prompts:
        response = await simulate_llm_api_call(prompt, model)
        responses.append(response)
    
    end_time = time.time()
    print(f"Sequential processing took: {end_time - start_time:.2f} seconds")
    return responses

# FAST WAY - Concurrent API calls
async def process_prompts_concurrent(prompts, model="gpt-4"):
    print("\n=== Concurrent Processing ===")
    start_time = time.time()
    
    # Create one coroutine per prompt (nothing runs yet)
    tasks = [simulate_llm_api_call(prompt, model) for prompt in prompts]
    
    # gather() wraps each coroutine in a Task and runs them all concurrently
    responses = await asyncio.gather(*tasks)
    
    end_time = time.time()
    print(f"Concurrent processing took: {end_time - start_time:.2f} seconds")
    return responses

# Test with multiple prompts
async def main():
    prompts = [
        "What is the capital of France?",
        "Explain quantum computing in simple terms",
        "Write a haiku about programming",
        "What are the benefits of async programming?",
        "How do neural networks work?"
    ]
    
    # Run sequential version
    sequential_results = await process_prompts_sequential(prompts)
    
    # Run concurrent version
    concurrent_results = await process_prompts_concurrent(prompts)
    
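    # Theoretical speedup: len(prompts) calls x 2 s each sequentially vs. ~2 s concurrently
    print(f"\nSpeedup: {len(prompts) * 2 / 2:.1f}x faster with concurrent processing!")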

# In a .py script: asyncio.run(main())
await main()


=== Sequential Processing ===
[gpt-4] Sending request: 'What is the capital of France?...'
[gpt-4] Received response
[gpt-4] Sending request: 'Explain quantum computing in s...'
[gpt-4] Received response
[gpt-4] Sending request: 'Write a haiku about programmin...'
[gpt-4] Received response
[gpt-4] Sending request: 'What are the benefits of async...'
[gpt-4] Received response
[gpt-4] Sending request: 'How do neural networks work?...'
[gpt-4] Received response
Sequential processing took: 10.04 seconds

=== Concurrent Processing ===
[gpt-4] Sending request: 'What is the capital of France?...'
[gpt-4] Sending request: 'Explain quantum computing in s...'
[gpt-4] Sending request: 'Write a haiku about programmin...'
[gpt-4] Sending request: 'What are the benefits of async...'
[gpt-4] Sending request: 'How do neural networks work?...'
[gpt-4] Received response
[gpt-4] Received response
[gpt-4] Received response
[gpt-4] Received response
[gpt-4] Received response
Concurrent processing took: 2.

### Advanced Pattern: Multi-Model Comparison

A common pattern in production is querying multiple models simultaneously to compare their responses or implement fallback strategies:

In [17]:
# Multi-model comparison pattern
async def query_multiple_models(prompt, models):
    """Query multiple models concurrently and return all responses"""
    print(f"\nQuerying {len(models)} models with prompt: '{prompt[:50]}...'")
    start_time = time.time()

    # Create tasks for each model with different simulated latencies
    tasks = []
    for i, model in enumerate(models):
        # Simulate different response times for different models
        delay = 1 + i * 0.5  # Models have different latencies
        # create_task() schedules each call right away; gather() would also accept the bare coroutines
        task = asyncio.create_task(simulate_llm_api_call(prompt, model, delay))
        tasks.append(task)
    
    # Run all model queries concurrently
    responses = await asyncio.gather(*tasks)
    
    end_time = time.time()
    print(f"All models responded in: {end_time - start_time:.2f} seconds")
    
    # Return model-response pairs
    return dict(zip(models, responses))

# Fallback pattern with timeout
async def query_with_fallback(prompt, primary_model, fallback_models, timeout=3):
    """Try the primary model first; fall back to the others if it times out"""
    print(f"\nTrying primary model: {primary_model}")
    
    try:
        # Try primary model with timeout
        response = await asyncio.wait_for(
            simulate_llm_api_call(prompt, primary_model, delay=4),  # Simulate slow primary
            timeout=timeout
        )
        print("Primary model succeeded!")
        return {"model": primary_model, "response": response}
    
    except asyncio.TimeoutError:
        print(f"Primary model timed out after {timeout}s, trying fallbacks...")
        
        # Try fallback models concurrently.
        # asyncio.wait() needs Task/Future objects (bare coroutines are rejected since Python 3.11)
        tasks = [asyncio.create_task(simulate_llm_api_call(prompt, model, delay=1)) for model in fallback_models]
        
        # Wait until at least one fallback finishes
        done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        
        # Cancel the fallbacks that are still running
        for task in pending:
            task.cancel()
        
        # Take one finished task. Note that .result() re-raises if that task
        # failed, so "finished first" does not guarantee "succeeded"
        winner = next(iter(done))
        model_index = tasks.index(winner)
        return {"model": fallback_models[model_index], "response": winner.result()}

# Test the patterns
async def test_advanced_patterns():
    # Test multi-model comparison
    models = ["gpt-4", "claude-3", "gemini-pro", "llama-70b"]
    prompt = "What are the key principles of software engineering?"
    
    results = await query_multiple_models(prompt, models)
    print("\nModel comparison results:")
    for model, response in results.items():
        print(f"- {model}: {response[:50]}...")
    
    # Test fallback pattern
    fallback_result = await query_with_fallback(
        "Explain async programming",
        primary_model="expensive-slow-model",
        fallback_models=["fast-model-1", "fast-model-2", "fast-model-3"]
    )
    print(f"\nFallback result from {fallback_result['model']}")

# In a .py script: asyncio.run(test_advanced_patterns())
await test_advanced_patterns()


Querying 4 models with prompt: 'What are the key principles of software engineerin...'
[gpt-4] Sending request: 'What are the key principles of...'
[claude-3] Sending request: 'What are the key principles of...'
[gemini-pro] Sending request: 'What are the key principles of...'
[llama-70b] Sending request: 'What are the key principles of...'
[gpt-4] Received response
[claude-3] Received response
[gemini-pro] Received response
[llama-70b] Received response
All models responded in: 2.50 seconds

Model comparison results:
- gpt-4: Response from gpt-4: This is a simulated response ...
- claude-3: Response from claude-3: This is a simulated respon...
- gemini-pro: Response from gemini-pro: This is a simulated resp...
- llama-70b: Response from llama-70b: This is a simulated respo...

Trying primary model: expensive-slow-model
[expensive-slow-model] Sending request: 'Explain async programming...'
Primary model timed out after 3s, trying fallbacks...
[fast-model-1] Sending request: 'Explain a

## Key Takeaways for LLM Applications

1. **Prefer async when making multiple LLM API calls** - For independent requests, total time drops from roughly the sum of the latencies to roughly the longest one
2. **Use `gather()` for batch processing** - Process multiple prompts or documents concurrently
3. **Implement timeouts** - Protect against slow or hanging API calls
4. **Consider fallback strategies** - Use multiple models for reliability
5. **Handle rate limits gracefully** - Async makes it easier to implement rate limiting
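
As an illustration of the last point, one simple approach is to cap the number of in-flight requests with an `asyncio.Semaphore`. This is a sketch under stated assumptions: `fake_llm_call` and the `stats` dictionary are invented for the demo, and a cap on concurrency is not the same thing as a requests-per-second limit, which a real provider may enforce instead.

```python
import asyncio


async def fake_llm_call(prompt, semaphore, stats):
    async with semaphore:  # waits here if the cap is already reached
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.05)  # simulate API latency
        stats["in_flight"] -= 1
        return f"response to {prompt}"


async def main(max_concurrent=2):
    semaphore = asyncio.Semaphore(max_concurrent)
    stats = {"in_flight": 0, "peak": 0}
    prompts = [f"prompt {i}" for i in range(6)]
    responses = await asyncio.gather(
        *(fake_llm_call(p, semaphore, stats) for p in prompts)
    )
    return responses, stats["peak"]


# In a notebook: await main()
# In a .py script: asyncio.run(main())
```

All six calls are submitted at once, but at most `max_concurrent` of them are inside the `async with` block at any moment.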

### Coming Next in Day2

Now that you understand asyncio fundamentals, you'll see how to apply these concepts in:
- **Model Selection & Parameter Tuning**: Compare models concurrently
- **Token Management**: Process large documents efficiently
- **Error Handling**: Implement robust retry logic with async patterns
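
As a small preview of the retry idea, here is a hedged sketch of retrying with exponential backoff. `TransientAPIError`, `flaky_call`, and `call_with_retry` are hypothetical names, and the delays are kept tiny so the example runs quickly; real code would catch the specific exceptions your API client raises.

```python
import asyncio


class TransientAPIError(Exception):
    """Hypothetical error type standing in for a retryable API failure."""


async def flaky_call(attempts_log, fail_times=2):
    """Simulated API call that fails the first `fail_times` attempts."""
    attempts_log.append(len(attempts_log) + 1)
    if len(attempts_log) <= fail_times:
        raise TransientAPIError("simulated transient failure")
    return "ok"


async def call_with_retry(make_call, max_attempts=4, base_delay=0.01):
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except TransientAPIError:
            if attempt == max_attempts:
                raise  # out of attempts: let the error propagate
            # Exponential backoff: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


async def main():
    attempts_log = []
    result = await call_with_retry(lambda: flaky_call(attempts_log))
    return result, attempts_log


# In a notebook: await main()
# In a .py script: asyncio.run(main())
```

Because the backoff uses `asyncio.sleep`, other tasks keep running while one request waits to retry.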

Remember: In production AI applications, the difference between sequential and concurrent processing can be the difference between a 10-second response time and a 2-second response time!