# Tutorial: using asyncio for development

* To demonstrate a scenario where asyncio outperforms joblib, we focus on tasks that are I/O-bound and involve significant waiting (e.g., network requests, database queries). In such cases, asyncio shines because a single thread can keep many requests in flight and switch between them while each waits for I/O, whereas joblib incurs overhead from spawning threads or processes that sit idle during I/O operations.

In [None]:
!pip install nest_asyncio



## 1. Initial example

In [None]:
import time
import asyncio
import requests
from joblib import Parallel, delayed

In [None]:
# Simulating an I/O-bound task (e.g., HTTP GET request)
def fetch_data_sync_io(url):
    print(f"Fetching data from {url}...")
    response = requests.get(url)
    print(f"Data fetched from {url}: {len(response.text)} characters")

async def fetch_data_async_io(url):
    print(f"Fetching data from {url}...")
    await asyncio.sleep(2)  # Simulated 2-second wait; replace with real async I/O (e.g., aiohttp's session.get)
    print(f"Data fetched from {url}")

# Using joblib for parallel execution
def fetch_data_joblib_io(url):
    print(f"Fetching data from {url}...")
    response = requests.get(url)
    print(f"Data fetched from {url}: {len(response.text)} characters")

# Synchronous example
def synchronous_io_example():
    print("Starting synchronous I/O example...")
    start_time = time.time()

    fetch_data_sync_io("https://example.com")
    fetch_data_sync_io("https://example.com")

    end_time = time.time()
    print(f"Synchronous I/O example finished in {end_time - start_time:.2f} seconds\n")

# Asynchronous example
async def asynchronous_io_example():
    print("Starting asynchronous I/O example...")
    start_time = time.time()

    await asyncio.gather(
        fetch_data_async_io("https://example.com"),
        fetch_data_async_io("https://example.com"),
    )

    end_time = time.time()
    print(f"Asynchronous I/O example finished in {end_time - start_time:.2f} seconds\n")

# Parallel example using joblib
def parallel_io_example():
    print("Starting parallel I/O example with joblib...")
    start_time = time.time()

    Parallel(n_jobs=2)(delayed(fetch_data_joblib_io)("https://example.com") for _ in range(2))

    end_time = time.time()
    print(f"Parallel I/O example finished in {end_time - start_time:.2f} seconds\n")

In [None]:
# Main function to run all examples
async def main():
    synchronous_io_example()
    await asynchronous_io_example()
    parallel_io_example()

# Execute the main function
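if __name__ == "__main__":
    try:
        # Raises RuntimeError when no event loop is running
        asyncio.get_running_loop()
    except RuntimeError:
        # Plain script: asyncio.run can start its own loop
        asyncio.run(main())
    else:
        # A loop is already running (e.g., Jupyter): allow nested use
        import nest_asyncio
        nest_asyncio.apply()
        asyncio.run(main())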


Starting synchronous I/O example...
Fetching data from https://example.com...
Data fetched from https://example.com: 1256 characters
Fetching data from https://example.com...
Data fetched from https://example.com: 1256 characters
Synchronous I/O example finished in 0.19 seconds

Starting asynchronous I/O example...
Fetching data from https://example.com...
Fetching data from https://example.com...
Data fetched from https://example.com
Data fetched from https://example.com
Asynchronous I/O example finished in 2.00 seconds

Starting parallel I/O example with joblib...
Parallel I/O example finished in 1.49 seconds





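### Explanation:
1. Synchronous Execution:
  * requests.get blocks until the HTTP response is received, so the two requests run one after the other.
  * Total time = the sum of the two request times. In the run above, example.com answered quickly, so both requests finished in 0.19 seconds.

2. Asyncio Execution:
  * Uses await asyncio.sleep(2) as a stand-in for a slow request (a real async HTTP library like aiohttp would take its place), so nothing blocks while waiting.
  * Both tasks overlap, completing in ≈ 2 seconds (2.00 in the run above) instead of the 4 seconds that two sequential 2-second waits would take.

3. Joblib Execution:
  * Spawns threads or processes to parallelize tasks, but each still blocks while waiting for the HTTP response.
  * Overhead of starting and managing threads/processes adds to the runtime.
  * Total time was 1.49 seconds in the run above, well above the 0.19 seconds of the plain synchronous version, which suggests that worker startup dominated. The workers' print output does not appear above, which is consistent with joblib's default process-based backend.

* NOTE: The three timings are not directly comparable, because the asyncio version waits on a simulated 2-second delay while the other two make real, fast requests. Section 2 uses the same URLs for all three approaches.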

### Why asyncio Is Better:
* Efficiency: asyncio avoids the overhead of creating threads/processes and uses a single event loop to wait on many I/O operations at once.
* Scalability: For a larger number of I/O-bound tasks (e.g., fetching data from 100 URLs), asyncio adds only a lightweight coroutine per task, while joblib needs one worker thread or process for every request it keeps in flight.
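
The run above compares real HTTP requests against a simulated delay, so its timings are hard to read side by side. As a rough sketch of a like-for-like comparison, the snippet below gives all three approaches the same simulated wait. The 0.2-second `DELAY`, the task count, and the use of `concurrent.futures.ThreadPoolExecutor` as a standard-library stand-in for joblib are assumptions made for illustration; they are not part of the original example.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2   # assumed simulated I/O wait, in seconds
N_TASKS = 5   # assumed number of "requests"

def blocking_wait(i):
    time.sleep(DELAY)           # blocks the calling thread
    return i

async def non_blocking_wait(i):
    await asyncio.sleep(DELAY)  # yields to the event loop while waiting
    return i

def timed(fn):
    """Call fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

def run_sync():
    # One wait after another
    return [blocking_wait(i) for i in range(N_TASKS)]

def run_threads():
    # One worker thread per task (stand-in for joblib, not joblib itself)
    with ThreadPoolExecutor(max_workers=N_TASKS) as pool:
        return list(pool.map(blocking_wait, range(N_TASKS)))

async def gather_all():
    return await asyncio.gather(*(non_blocking_wait(i) for i in range(N_TASKS)))

def run_async():
    # All waits overlap on a single thread
    return asyncio.run(gather_all())

sync_result, sync_time = timed(run_sync)
thread_result, thread_time = timed(run_threads)
async_result, async_time = timed(run_async)

print(f"sync:    {sync_time:.2f}s")
print(f"threads: {thread_time:.2f}s")
print(f"asyncio: {async_time:.2f}s")
```

With only five tasks, the thread pool and asyncio should land close to each other (about one `DELAY`), while the synchronous version takes about five. Inside a notebook, where a loop is already running, `await gather_all()` would replace the `asyncio.run` call.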

## 2. Simulating URL retrieval

In [None]:
import time
import asyncio
import requests
from joblib import Parallel, delayed
import aiohttp

In [None]:
# List of URLs to fetch
URLS = [
    "https://example.com",
    "https://httpbin.org/delay/2",  # Simulates a 2-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/5",  # Simulates a 5-second delay
    "https://httpbin.org/delay/10",  # Simulates a 10-second delay
    "https://httpbin.org/delay/12",  # Simulates a 12-second delay
]

# Synchronous HTTP fetch using requests
def fetch_data_sync(url):
    print(f"Fetching data from {url} (synchronously)...")
    response = requests.get(url)
    print(f"Data fetched from {url}: {len(response.text)} characters")

# Asynchronous HTTP fetch using aiohttp
async def fetch_data_async(url):
    print(f"Fetching data from {url} (asynchronously)...")
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            text = await response.text()
            print(f"Data fetched from {url}: {len(text)} characters")

# Parallel HTTP fetch using joblib and requests
def fetch_data_parallel(url):
    print(f"Fetching data from {url} (in parallel)...")
    response = requests.get(url)
    print(f"Data fetched from {url}: {len(response.text)} characters")

# Synchronous example
def synchronous_example():
    print("\n=== Synchronous Example ===")
    start_time = time.time()
    for url in URLS:
        fetch_data_sync(url)
    end_time = time.time()
    print(f"Synchronous example finished in {end_time - start_time:.2f} seconds\n")

# Asynchronous example
async def asynchronous_example():
    print("\n=== Asynchronous Example ===")
    start_time = time.time()
    await asyncio.gather(*(fetch_data_async(url) for url in URLS))
    end_time = time.time()
    print(f"Asynchronous example finished in {end_time - start_time:.2f} seconds\n")

# Parallel example using joblib
def parallel_example():
    print("\n=== Parallel Example ===")
    start_time = time.time()
    Parallel(n_jobs=len(URLS))(delayed(fetch_data_parallel)(url) for url in URLS)
    end_time = time.time()
    print(f"Parallel example finished in {end_time - start_time:.2f} seconds\n")

In [None]:
# Main function to run all examples
async def main():
    synchronous_example()
    await asynchronous_example()
    parallel_example()

# Execute the main function
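if __name__ == "__main__":
    try:
        # Raises RuntimeError when no event loop is running
        asyncio.get_running_loop()
    except RuntimeError:
        # Plain script: asyncio.run can start its own loop
        asyncio.run(main())
    else:
        # A loop is already running (e.g., Jupyter): allow nested use
        import nest_asyncio
        nest_asyncio.apply()
        asyncio.run(main())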



=== Synchronous Example ===
Fetching data from https://example.com (synchronously)...
Data fetched from https://example.com: 1256 characters
Fetching data from https://httpbin.org/delay/2 (synchronously)...
Data fetched from https://httpbin.org/delay/2: 356 characters
Fetching data from https://httpbin.org/delay/5 (synchronously)...
Data fetched from https://httpbin.org/delay/5: 356 characters
Fetching data from https://httpbin.org/delay/5 (synchronously)...
Data fetched from https://httpbin.org/delay/5: 356 characters
Fetching data from https://httpbin.org/delay/5 (synchronously)...
Data fetched from https://httpbin.org/delay/5: 356 characters
Fetching data from https://httpbin.org/delay/5 (synchronously)...
Data fetched from https://httpbin.org/delay/5: 356 characters
Fetching data from https://httpbin.org/delay/5 (synchronously)...
Data fetched from https://httpbin.org/delay/5: 356 characters
Fetching data from https://httpbin.org/delay/5 (synchronously)...
Data fetched from https:

# Why `asyncio` Works Better for I/O-Bound Tasks

The key advantage of `asyncio` lies in its **non-blocking nature** and how it handles tasks concurrently. Here's an explanation:

---

## 1. Understanding the Problem: Blocking vs. Non-Blocking
- **I/O-Bound Tasks**: These are tasks that spend most of their time waiting for external resources, such as:
  - Network responses (e.g., HTTP requests).
  - File I/O (e.g., reading from or writing to disk).
  - Database queries.

- **Blocking**: Traditional synchronous or multi-threaded approaches (like `requests` or `joblib`) **block the thread** while waiting for the resource. During this time:
  - The thread or process is idle and cannot do anything else.
  - Even in parallel, additional threads/processes are required to achieve concurrency, leading to increased overhead.

- **Non-Blocking**: `asyncio` avoids blocking by **yielding control** while waiting for I/O. This allows the program to perform other tasks during the wait.
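
A minimal sketch of the difference, assuming a 0.1-second wait stands in for the I/O: `time.sleep` holds the thread, so two coroutines that call it run back to back, while `asyncio.sleep` yields control and lets the two waits overlap. The function names are hypothetical.

```python
import asyncio
import time

WAIT = 0.1  # assumed stand-in for an I/O wait, in seconds

async def blocking_task():
    time.sleep(WAIT)            # blocks the thread; the loop cannot run anything else

async def non_blocking_task():
    await asyncio.sleep(WAIT)   # yields control; the loop can run the other task

async def time_pair(task_factory):
    """Run two copies of the task together and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(task_factory(), task_factory())
    return time.perf_counter() - start

blocking_elapsed = asyncio.run(time_pair(blocking_task))
non_blocking_elapsed = asyncio.run(time_pair(non_blocking_task))

print(f"blocking:     {blocking_elapsed:.2f}s")      # roughly 2 x WAIT
print(f"non-blocking: {non_blocking_elapsed:.2f}s")  # roughly 1 x WAIT
```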

---

## 2. How `asyncio` Handles I/O
- `asyncio` uses a **single-threaded event loop** to manage tasks.
- When a task reaches an `await` on an operation that has not finished yet (e.g., `await asyncio.sleep(2)` or `await response.text()` with `aiohttp`):
  - The task is paused (control is yielded).
  - The event loop schedules other tasks to run while waiting.
  - Once the I/O operation completes, the task resumes from where it paused.

This mechanism ensures efficient use of the thread and avoids the overhead of creating and managing multiple threads or processes.
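
The pause-and-resume sequence can be made visible by recording when each task starts and resumes. This is a small sketch with hypothetical task names and delays; `asyncio.sleep` stands in for the I/O.

```python
import asyncio

async def worker(name, delay, events):
    events.append(f"{name} start")
    await asyncio.sleep(delay)   # pause here; the loop is free to run the other task
    events.append(f"{name} resume")

async def main():
    events = []
    await asyncio.gather(
        worker("A", 0.2, events),
        worker("B", 0.1, events),
    )
    return events

events = asyncio.run(main())
print(events)  # ['A start', 'B start', 'B resume', 'A resume']
```

Task B starts before A has finished, and it resumes first because its wait is shorter.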

---

## 3. Comparison with Other Methods
### **Synchronous (`requests`)**
- **How it works**:
  - Each HTTP request blocks until a response is received.
  - Only one request is processed at a time.
- **Downside**:
  - Inefficient for I/O-bound tasks because the CPU is idle during the wait.

### **Parallel (`joblib` + `requests`)**
- **How it works**:
  - Spawns multiple threads or processes to make requests in parallel.
  - Each thread/process blocks during the HTTP request.
- **Downside**:
  - Threads/processes consume system resources (memory, CPU).
  - For many tasks, the overhead of managing threads/processes outweighs the benefits.

### **Asynchronous (`asyncio` + `aiohttp`)**
- **How it works**:
  - The event loop manages all tasks concurrently in a single thread.
  - No threads/processes are blocked during I/O operations.
- **Advantages**:
  - Low overhead: No need for extra threads or processes.
  - High scalability: Handles thousands of tasks with minimal resource usage.

---

## 4. Why `asyncio` Scales Better
### **Resource Efficiency**:
- `asyncio` uses a single thread and consumes minimal resources compared to threads/processes in parallel computing.
- This makes it ideal for tasks involving many simultaneous I/O operations (e.g., downloading 1,000 URLs).

### **Concurrency, Not Parallelism**:
- `asyncio` achieves concurrency (interleaving tasks) without parallelism (running tasks simultaneously on multiple cores).
- This is sufficient for I/O-bound tasks, where the bottleneck is the waiting time, not CPU usage.

---

## 5. Performance Example
### Imagine Fetching 1,000 URLs:
- **Synchronous**:
  - Fetches one URL at a time. If each takes 2 seconds, the total time is ~2,000 seconds.
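- **Parallel (`joblib`)**:
  - Divides the workload among workers. With 10 workers, the total time is ~200 seconds (1,000 URLs ÷ 10 workers × 2 seconds), and going faster means adding more workers, each with its own memory and scheduling cost.
- **Asynchronous (`asyncio`)**:
  - Starts all requests concurrently using the event loop. If nothing else limits throughput, total time approaches 2 seconds; in practice, connection limits, bandwidth, and server rate limits keep it above that.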
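
The asynchronous case can be approximated without any network access. The sketch below runs 1,000 simulated fetches, first with no limit and then behind an `asyncio.Semaphore`. The 0.05-second delay (instead of 2 seconds) and the cap of 100 in-flight tasks are assumptions chosen to keep the demo fast; the cap is a hypothetical stand-in for limits such as a connection pool size.

```python
import asyncio
import time

DELAY = 0.05         # assumed per-request wait (the text uses 2 seconds)
N_URLS = 1000
MAX_IN_FLIGHT = 100  # hypothetical cap on simultaneous requests

async def fake_fetch(i, limiter=None):
    if limiter is None:
        await asyncio.sleep(DELAY)
    else:
        async with limiter:          # wait for a free slot before "fetching"
            await asyncio.sleep(DELAY)
    return i

async def fetch_all(limit=None):
    limiter = asyncio.Semaphore(limit) if limit else None
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_fetch(i, limiter) for i in range(N_URLS)))
    return results, time.perf_counter() - start

unbounded_results, unbounded_time = asyncio.run(fetch_all())
bounded_results, bounded_time = asyncio.run(fetch_all(limit=MAX_IN_FLIGHT))

print(f"unbounded: {unbounded_time:.2f}s")
print(f"bounded:   {bounded_time:.2f}s")
```

Without a limit, the total should stay close to a single `DELAY`; with the cap, the tasks run in roughly ten waves, so the total is about ten times longer.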

---

## 6. When Not to Use `asyncio`
While `asyncio` excels for I/O-bound tasks, it’s not ideal for:
- **CPU-Bound Tasks**:
  - Tasks that require heavy computation (e.g., image processing, data analysis) don’t benefit from `asyncio` because the event loop is blocked during computation.
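  - Use multi-processing (`joblib` or `concurrent.futures.ProcessPoolExecutor`) for such tasks. In standard CPython, multi-threading (`concurrent.futures.ThreadPoolExecutor`) helps only when the work releases the GIL, as many NumPy operations do.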
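
To see the event loop stall during computation, the sketch below runs a heartbeat task that records a timestamp every 10 ms, then performs a pure-Python busy loop with no `await` inside. The helper names and the 0.2-second workload are hypothetical.

```python
import asyncio
import time

async def heartbeat(ticks, stop):
    # Records a timestamp every 10 ms whenever the loop is free to run it
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)

def busy_compute(seconds):
    # Pure-Python busy loop standing in for CPU-bound work
    end = time.perf_counter() + seconds
    total = 0
    while time.perf_counter() < end:
        total += 1
    return total

async def main():
    ticks, stop = [], asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0.05)   # let the heartbeat tick a few times
    busy_compute(0.2)           # no await inside: the loop cannot switch tasks
    await asyncio.sleep(0.05)   # heartbeat resumes
    stop.set()
    await beat
    gaps = [later - earlier for earlier, later in zip(ticks, ticks[1:])]
    return max(gaps)

longest_gap = asyncio.run(main())
print(f"longest gap between heartbeats: {longest_gap:.2f}s")
```

The longest gap should be at least the 0.2 seconds spent computing, far above the 10 ms interval the heartbeat asked for.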

---

## Summary
- **`asyncio` is better for I/O-bound tasks** because it avoids blocking and uses resources efficiently.
- It achieves high concurrency with low overhead, making it scalable for a large number of tasks.
- For CPU-bound tasks, parallelism (e.g., `joblib`) is more suitable.

In essence, `asyncio` makes the most of the "waiting time" in I/O-bound tasks by allowing other tasks to proceed concurrently, which is why it performs better in these scenarios.


# What is the Event Loop?

The **event loop** is a core component of asynchronous programming, particularly in frameworks like Python's `asyncio`. It is responsible for managing and coordinating the execution of asynchronous tasks. Here’s a detailed explanation:

---

## **What is the Event Loop?**
The **event loop** is a mechanism that continuously monitors and manages tasks, events, and their associated callbacks. It runs in a single thread and allows you to write non-blocking code that can handle multiple operations concurrently.

### Key Responsibilities of the Event Loop:
1. **Scheduling Tasks**:
   - It schedules and runs tasks (e.g., coroutines, callbacks, or other asynchronous operations).
2. **Handling I/O**:
   - It monitors I/O operations (e.g., reading from or writing to a network, file system, etc.) and resumes tasks when these operations complete.
3. **Managing Timers**:
   - It handles time-based events like delays or periodic callbacks (e.g., `asyncio.sleep()`).
4. **Running Callbacks**:
   - It executes callbacks (e.g., functions triggered by events or task completions).
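
These responsibilities map onto a few public `asyncio` calls. The sketch below schedules two timers with `loop.call_later`, one callback with `loop.call_soon`, and lets a coroutine sleep so the loop can run them. The labels and delays are arbitrary choices for illustration.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    order = []
    # Timers: run a callback after a delay given in seconds
    loop.call_later(0.2, order.append, "timer 0.2s")
    loop.call_later(0.1, order.append, "timer 0.1s")
    # Callback: run on the next iteration of the loop
    loop.call_soon(order.append, "call_soon")
    # Task: sleeping yields control so the callbacks above can run
    await asyncio.sleep(0.3)
    return order

order = asyncio.run(main())
print(order)  # ['call_soon', 'timer 0.1s', 'timer 0.2s']
```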

---

## **How Does the Event Loop Work?**
The event loop follows these steps:

1. **Initialization**:
   - The event loop starts and begins monitoring for tasks and events.

2. **Task Scheduling**:
   - Asynchronous tasks (e.g., coroutines) are registered with the event loop.
   - These tasks include I/O operations, timers, or custom asynchronous functions.

3. **Waiting and Executing**:
   - The event loop continuously:
     - Checks if any tasks are ready to run.
     - Pauses tasks waiting for I/O or delays.
     - Resumes tasks when their I/O or delay is complete.

4. **Termination**:
   - The event loop continues running until all tasks are complete or it is explicitly stopped.
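
The steps above can be sketched as a toy scheduler built on plain generators. This is an illustration of the idea only, not how `asyncio` is implemented: it supports a single made-up `("sleep", delay)` request, and every name in it is hypothetical.

```python
import heapq
import time
from collections import deque

def worker(name, delay, log):
    log.append(f"{name} start")
    yield ("sleep", delay)        # hand control back and ask to be woken later
    log.append(f"{name} end")

def run_toy_loop(tasks):
    ready = deque(tasks)          # tasks that can run right now
    timers = []                   # heap of (wake_time, sequence, task)
    sequence = 0                  # tie-breaker so the heap never compares tasks
    while ready or timers:        # terminate when no work is left
        if not ready:
            # Nothing runnable: wait for the earliest timer, then resume its task
            wake_time, _, task = heapq.heappop(timers)
            time.sleep(max(0.0, wake_time - time.monotonic()))
            ready.append(task)
            continue
        task = ready.popleft()
        try:
            request, delay = next(task)   # run the task until its next yield
        except StopIteration:
            continue                      # task finished
        if request == "sleep":
            sequence += 1
            heapq.heappush(timers, (time.monotonic() + delay, sequence, task))

log = []
run_toy_loop([worker("A", 0.2, log), worker("B", 0.1, log)])
print(log)  # ['A start', 'B start', 'B end', 'A end']
```

Both tasks are started before either finishes, and B completes first because its timer expires earlier.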

---

## **Single-Threaded, Concurrent Execution**
- The event loop is **single-threaded**:
  - It runs in one thread and executes one task at a time.
- It achieves **concurrency** (not parallelism) by rapidly switching between tasks, making it appear as though tasks are running simultaneously.
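
One quick way to check the single-threaded claim is to record the thread identifier inside several concurrent tasks. A minimal sketch, with hypothetical task names:

```python
import asyncio
import threading

async def report_thread(name, seen):
    await asyncio.sleep(0.01)            # give the other tasks a turn
    seen[name] = threading.get_ident()   # id of the thread running this task

async def main():
    seen = {}
    await asyncio.gather(*(report_thread(f"task-{i}", seen) for i in range(5)))
    return seen

seen = asyncio.run(main())
print(len(seen), "tasks ran on", len(set(seen.values())), "thread(s)")
# 5 tasks ran on 1 thread(s)
```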

---

## **Event Loop in `asyncio`**
In Python's `asyncio`, the event loop is the heart of asynchronous programming. Here's how it works in practice:

### Example: Event Loop with `asyncio`
```python
import asyncio

async def say_hello():
    print("Hello!")
    await asyncio.sleep(1)  # Simulate I/O or delay
    print("Goodbye!")

async def main():
    await asyncio.gather(say_hello(), say_hello())  # Run tasks concurrently

# Start the event loop
asyncio.run(main())
```


# Difference Between Concurrency and Parallelism

Concurrency and parallelism are two concepts often used in computing to describe the execution of multiple tasks, but they refer to different mechanisms and approaches. Here's a clear explanation of the difference between them:

---

## **Concurrency**
### Definition:
Concurrency is the ability to manage and execute multiple tasks **in overlapping time periods**. It does not necessarily mean that the tasks are running simultaneously; rather, it means they are making progress at the same time.

### Key Characteristics:
- **Task Interleaving**:
  - Tasks take turns using shared resources, such as a single CPU or thread.
  - Tasks appear to run simultaneously because they are rapidly switched in and out of execution.
- **Focus on Task Management**:
  - Concurrency deals with how tasks are structured, scheduled, and coordinated.
- **Single-Core or Multi-Core**:
  - Concurrency can occur on a single-core processor by switching between tasks (time-slicing).

### Example:
- A web server handling multiple client requests. It might process parts of one request, pause it to process another, and switch back to the first request, all without completing any of them immediately.

### Analogy:
- Imagine a juggler handling multiple balls. The juggler switches between balls rapidly, giving the illusion of handling them all at once.

---

## **Parallelism**
### Definition:
Parallelism is the ability to execute multiple tasks **simultaneously**. It requires multiple processors or CPU cores to achieve true parallel execution.

### Key Characteristics:
- **Simultaneous Execution**:
  - Tasks run at the exact same time on different processors or cores.
- **Focus on Hardware Utilization**:
  - Parallelism depends on the availability of hardware resources to execute tasks simultaneously.
- **Multi-Core Processors**:
  - Parallelism is only possible when there are multiple CPU cores or processors available.

### Example:
- Running a computationally intensive simulation by dividing the work across multiple CPU cores, with each core processing a separate part of the simulation simultaneously.

### Analogy:
- Imagine a group of jugglers, each juggling one ball at the same time.

---

## **Key Differences**

| **Aspect**            | **Concurrency**                                     | **Parallelism**                                |
|------------------------|----------------------------------------------------|-----------------------------------------------|
| **Execution**          | Tasks progress in overlapping time periods.        | Tasks run simultaneously on multiple cores.   |
| **Hardware**           | Can occur on a single-core processor.              | Requires multi-core or multi-processor systems. |
| **Focus**              | Managing tasks and switching efficiently.          | Utilizing hardware resources for simultaneous execution. |
| **Use Case**           | I/O-bound tasks, e.g., web servers.                | CPU-bound tasks, e.g., simulations or data processing. |
| **Implementation**     | Uses techniques like threading or async programming. | Uses multiprocessing or GPU-based computing.  |

---

## **Concurrency and Parallelism Together**
- Concurrency and parallelism are not mutually exclusive.
- You can have **concurrent systems without parallelism** (e.g., a single-core system managing multiple tasks) or **parallel systems without concurrency** (e.g., a single task split into parallel parts running simultaneously).
- A system can also be **both concurrent and parallel** (e.g., multi-threaded applications running on multi-core processors).

---

## **Examples in Python**

### **Concurrency with `asyncio`**
Concurrency can be achieved with `asyncio`:
```python
import asyncio

async def task1():
    print("Task 1 started")
    await asyncio.sleep(1)  # Simulates I/O operation
    print("Task 1 completed")

async def task2():
    print("Task 2 started")
    await asyncio.sleep(1)  # Simulates I/O operation
    print("Task 2 completed")

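async def main():
    # gather() needs a running event loop, so call it from inside a coroutine
    await asyncio.gather(task1(), task2())

asyncio.run(main())
```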


## 3. Example with LLMs (loading model)

* NOTE: Requires GPU

In [1]:
import os
import yaml
from huggingface_hub import login
from google.colab import drive
from getpass import getpass
from IPython.display import clear_output

drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
requirements_path = "/content/drive/MyDrive/GitHub/python-codebase/machine_learning/generative_ai/custom_library/lib/requirements.txt"
!pip install -r {requirements_path}
clear_output()

In [3]:
# Read YAML file
f_path = "/content/drive/MyDrive/GitHub/python-codebase/machine_learning/private_keys.yml"
with open(f_path, 'r') as stream:
    data_loaded = yaml.safe_load(stream)
os.environ['HF_API_TOKEN'] = data_loaded['HF_API_KEY']
os.environ['GITHUB_TOKEN'] = data_loaded['GITHUB_TOKEN']

# Set up token
login(token=os.environ['HF_API_TOKEN'])

In [4]:
os.chdir('/content/drive/MyDrive/GitHub/python-codebase/machine_learning/generative_ai/custom_library')

In [5]:
!ls

evaluation_results.json  lib  tutorial.ipynb


In [8]:
import asyncio
import torch
import time
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from transformers import BitsAndBytesConfig
import transformers

class HuggingFaceModelLoad:
    def __init__(
            self,
            model_name: str = "MiniLLM/MiniPLM-Qwen-500M",
            device: str = None,
            use_quantization=True
    ):
        """
        Initialize the HuggingFace model for text generation with optional quantization.
        """
        self.model_name = model_name
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")

        # Tokenizer setup
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)

        # Model setup with optional quantization
        if use_quantization:
            quantization_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_compute_dtype=torch.float16,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_use_double_quant=True,
            )
            self.model = AutoModelForCausalLM.from_pretrained(
                self.model_name,
                device_map=self.device,
                quantization_config=quantization_config,
            )
        else:
            self.model = AutoModelForCausalLM.from_pretrained(
                self.model_name,
                trust_remote_code=True
            )

        # Create the text generation pipeline once
        self.text_generation_pipeline = pipeline(
            "text-generation",
            model=self.model,
            tokenizer=self.tokenizer,
            use_cache=True,
            device_map=self.device,
            max_new_tokens=1000,
            temperature=0.1,
            do_sample=True,
            truncation=True,
            num_return_sequences=1,
            eos_token_id=self.tokenizer.eos_token_id,
            pad_token_id=self.tokenizer.eos_token_id,
        )

    def generate_sync(
            self,
            prompt: str,
            system_prompt: str = "",
            return_full_text: bool = False,
    ):
        """
        Synchronously generate text from the model.
        """
        try:
            messages = [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt},
            ]
            output_dict = self.text_generation_pipeline(messages)
            output = output_dict[0]['generated_text'][-1]['content']
            return output
        except Exception as e:
            return f"Error during synchronous generation: {e}"

    async def _generate_inference(self, pipe, messages):
        """Run the blocking pipeline call in a worker thread so the event loop is not blocked."""
        return await asyncio.to_thread(pipe, messages)

    async def generate_async(
            self,
            prompt: str,
            system_prompt: str = "",
            return_full_text: bool = False,
    ):
        """
        Asynchronously generate text from the model.
        """
        try:
            messages = [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt},
            ]
            output_dict = await self._generate_inference(self.text_generation_pipeline, messages)
            output = output_dict[0]['generated_text'][-1]['content']
            return output
        except Exception as e:
            return f"Error during asynchronous generation: {e}"
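
The `_generate_inference` helper above relies on `asyncio.to_thread` (Python 3.9+), which runs a blocking callable in a worker thread so the event loop stays responsive. The sketch below isolates that pattern. `slow_generate` is a hypothetical stand-in for the pipeline call and simply sleeps, so it shows the scheduling behavior rather than real GPU throughput.

```python
import asyncio
import time

def slow_generate(prompt):
    # Hypothetical stand-in for a blocking pipeline call
    time.sleep(0.1)
    return f"echo: {prompt}"

async def generate_async(prompt):
    # Run the blocking function in a worker thread and await its result
    return await asyncio.to_thread(slow_generate, prompt)

async def main():
    prompts = ["a", "b", "c", "d"]
    start = time.perf_counter()
    # gather() returns results in the same order as the prompts
    results = await asyncio.gather(*(generate_async(p) for p in prompts))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Because `time.sleep` releases the GIL, the four calls overlap here and finish in roughly the time of one. Real model inference shares a single GPU across the threads, so the gain there may be much smaller.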

In [12]:
async def compare_sync_async(model_loader):
    """
    Compare the execution times and outputs of synchronous and asynchronous text generation.
    """
    prompts = [
        "What is the capital of France?",
        "Tell me a joke about AI.",
        "Tell me a joke about Sci-Fi.",
        "Explain the concept of recursion in programming."
    ]

    # Synchronous Execution
    print("Running synchronously:")
    sync_results = []
    start_time_sync = time.time()
    for prompt in prompts:
        start_prompt_time = time.time()
        result = model_loader.generate_sync(prompt)
        sync_results.append(result)
        prompt_duration = time.time() - start_prompt_time
        print(f"Prompt: {prompt}\nGenerated Text (Sync): {result}\nExecution Time: {prompt_duration:.4f} seconds\n")
    sync_duration = time.time() - start_time_sync
    print(f"Total Time for Synchronous Execution: {sync_duration:.4f} seconds\n")

    # Asynchronous Execution
    print("Running asynchronously:")
    start_time_async = time.time()
    async_results = await asyncio.gather(
        *[model_loader.generate_async(prompt) for prompt in prompts]
    )
    async_duration = time.time() - start_time_async
    for prompt, result in zip(prompts, async_results):
        print(f"Prompt: {prompt}\nGenerated Text (Async): {result}\n")
    print(f"Total Time for Asynchronous Execution: {async_duration:.4f} seconds\n")

    # Comparison Summary
    print("Comparison Summary:")
    print(f"Synchronous Execution Time: {sync_duration:.4f} seconds")
    print(f"Asynchronous Execution Time: {async_duration:.4f} seconds")
    print(f"Time Savings: {sync_duration - async_duration:.4f} seconds\n")

In [9]:
model_name = "microsoft/Phi-3.5-mini-instruct"
dct_params = {
    'max_new_tokens': 1000,
    'temperature': 0.1,
    'return_full_text': False
}
model_loader = HuggingFaceModelLoad(model_name=model_name)
debug_mode = False

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda


In [13]:
if __name__ == "__main__":

    # For environments like Jupyter
    import nest_asyncio
    nest_asyncio.apply()

    # Run the async function in Jupyter environment
    loop = asyncio.get_event_loop()
    loop.run_until_complete(compare_sync_async(model_loader))

Running synchronously:
Prompt: What is the capital of France?
Generated Text (Sync):  The capital of France is Paris. It is not only the largest city in France but also serves as the country'selsectoral center for finance, commerce, fashion, art, and culture. Paris is known for its historical landmarks such as the Eiffel Tower, the Louvre Museum, and the Cathedral of Notre-Dame. It is a major European city and a global center for art, fashion, and culture. The city's influence extends far beyond its borders, making it one of the most visited cities in the world.
Execution Time: 7.0090 seconds

Prompt: Tell me a joke about AI.
Generated Text (Sync):  Sure, here's a light-hearted AI joke for you:

Why did the AI go to school?

Because it wanted to improve its "byte" in learning!

(Note: This joke plays on the double meaning of "byte" as both a unit of digital information and a colloquial term for a small amount of something.) Here's another AI-themed joke for you:

Why did the AI refuse 

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Prompt: Explain the concept of recursion in programming.
Generated Text (Sync):  Recursion in programming is a method where a function calls itself directly or indirectly to solve a problem. The key idea behind recursion is to break down a complex problem into smaller, more manageable sub-problems that are easier to solve. Each recursive call works on a smaller piece of the problem until it reaches a base case, which is a condition that does not require further recursion and can be solved directly.

Here's a step-byalla breakdown of how recursion works:

1. **Base Case**: This is the condition that stops the recursion. Without a base case, the function would call itself indefinitely, leading to a stack overflow error. The base case is crucial because it prevents infinite recursion and ensures that the function eventually terminates.

2. **Recursive Case**: This is where the function calls itself with a modified argument. The modification usually involves moving closer to the base case.