# Batch Loader

The `BatchLoader` is a meta loader that lets you load multiple datasources at once. Loaders are designed to load a single datasource through the `load` method, so loading several datasources at once requires some boilerplate code. The `BatchLoader` is designed to make this process easier.
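To make the boilerplate concrete, here is a minimal sketch of loading several datasources by hand, without any batching helper. `FakeLoader` and `load_many` are hypothetical names used only for illustration; they are not part of any library.

```python
from typing import Callable, Iterable, List


class FakeLoader:
    """Hypothetical stand-in for a loader: one instance loads one datasource."""

    def __init__(self, source: str) -> None:
        self.source = source

    def load(self) -> List[str]:
        # A real loader would read a file or call an API here.
        return [f"document from {self.source}"]


def load_many(
    loader_factory: Callable[[str], FakeLoader], sources: Iterable[str]
) -> List[str]:
    """Build one loader per source and concatenate the loaded documents."""
    documents: List[str] = []
    for source in sources:
        documents.extend(loader_factory(source).load())
    return documents


docs = load_many(FakeLoader, ["a.txt", "b.txt"])
print(docs)  # ['document from a.txt', 'document from b.txt']
```

This loop is exactly the kind of code that has to be rewritten whenever you want threads, processes, or coroutines instead of a plain `for` loop.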

Loading multiple datasources can usually be optimized by loading them in parallel. With the `BatchLoader` you can load them using [multiprocessing](https://docs.python.org/3/library/multiprocessing.html), [multithreading](https://docs.python.org/3/library/threading.html), [asyncio](https://docs.python.org/3/library/asyncio.html), or sequentially.

Each method is best suited for different use cases:
- **Multiprocessing**: Best when loading is CPU bound. Multiprocessing creates heavyweight processes that run in parallel, so you can take advantage of multiple cores.
- **Multithreading**: Multithreading makes your OS create real threads, but the GIL prevents them from executing Python code in parallel. When the work is IO bound, however, the threads spend most of their time waiting for IO to complete and release the GIL while they wait, so the GIL is not a problem.
- **Asyncio**: Best when loading is IO bound. Asyncio runs a single-threaded event loop that lets you run many tasks concurrently. Coroutines are lightweight, so the event loop can schedule a very large number of them. This is useful when calling APIs or loading data from the internet.
- **Sequentially**: Classic sequential loading. Nothing special.
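The strategies above can be sketched with the standard library alone. This is an illustration of the general techniques, not the `BatchLoader` implementation: `fetch` and `fetch_async` are hypothetical loaders that simulate a short network wait, and the helper names are made up for this example.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(source: str) -> str:
    """Hypothetical blocking loader: pretends to wait on the network."""
    time.sleep(0.01)
    return f"content of {source}"


async def fetch_async(source: str) -> str:
    """Hypothetical non-blocking loader: yields to the event loop while waiting."""
    await asyncio.sleep(0.01)
    return f"content of {source}"


def load_sequential(sources):
    # One source after another; total time is the sum of all waits.
    return [fetch(s) for s in sources]


def load_with_pool(sources, executor_cls, max_workers=2):
    # Works with ThreadPoolExecutor or ProcessPoolExecutor (the latter should
    # be started under an `if __name__ == "__main__":` guard).
    # `map` returns results in input order.
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(fetch, sources))


async def load_async(sources):
    # All coroutines wait concurrently on a single thread.
    # `gather` returns results in the order the awaitables were passed.
    return await asyncio.gather(*(fetch_async(s) for s in sources))


if __name__ == "__main__":
    sources = [f"source-{i}" for i in range(6)]
    expected = load_sequential(sources)
    assert load_with_pool(sources, ThreadPoolExecutor) == expected
    assert asyncio.run(load_async(sources)) == expected
```

All three functions return the same results in the same order; only the way the waiting time overlaps differs.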

### Load data from an API call

In [28]:
from langchain_community.document_loaders import BatchLoader, YoutubeLoader

In [11]:
# Simulate a list of 20 YouTube videos
youtube_videos = ["https://www.youtube.com/watch?v=3vbziEu2aO0"] * 20

# You can instantiate a batch loader using a callable that returns
# a Loader instance (e.g. YoutubeLoader.from_youtube_url)
sequential_loader_youtube = BatchLoader(
    YoutubeLoader.from_youtube_url,
    {"youtube_url": youtube_videos},
    show_progress=True,
)

threaded_loader_youtube = BatchLoader(
    YoutubeLoader.from_youtube_url,
    {"youtube_url": youtube_videos},
    show_progress=True,
    method="thread",
    max_workers=2,
)

processed_loader_youtube = BatchLoader(
    YoutubeLoader.from_youtube_url,
    {"youtube_url": youtube_videos},
    show_progress=True,
    method="process",
    max_workers=2,
)

async_loader_youtube = BatchLoader(
    YoutubeLoader.from_youtube_url,
    {"youtube_url": youtube_videos},
    show_progress=True,
    method="async",
)

In [16]:
sequential_loader_youtube.load(); # load sequentially

100%|██████████| 20/20 [00:13<00:00,  1.49it/s]


In [17]:
threaded_loader_youtube.load(); # launch with 2 threads

100%|██████████| 20/20 [00:05<00:00,  3.69it/s]


In [22]:
processed_loader_youtube.load(); # launch with 2 processes

100%|██████████| 20/20 [00:05<00:00,  3.35it/s]


In [27]:
await async_loader_youtube.aload(); # launch with asyncio

100%|██████████| 20/20 [00:01<00:00, 16.00it/s]


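As the example above shows, the `async` method was the fastest for loading data from an API call: about 16 iterations/s, compared with 1.49 iterations/s sequentially and roughly 3.5 iterations/s with threads or processes. Keep in mind that the thread and process runs were capped at `max_workers=2`. Now it's time to choose the best method for your use case. 🤗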
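One way to choose is to time each candidate on your own workload. The sketch below is a generic timing helper, not part of `BatchLoader`; `benchmark` and `fastest` are hypothetical names, and the two lambdas are placeholders for real loading calls.

```python
import time
from typing import Callable, Dict


def benchmark(candidates: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each zero-argument callable once and return its wall-clock time in seconds."""
    timings: Dict[str, float] = {}
    for name, run in candidates.items():
        start = time.perf_counter()
        run()
        timings[name] = time.perf_counter() - start
    return timings


def fastest(timings: Dict[str, float]) -> str:
    """Return the name of the candidate with the smallest elapsed time."""
    return min(timings, key=timings.get)


if __name__ == "__main__":
    # Placeholder workloads; swap in real calls such as `lambda: my_loader.load()`.
    timings = benchmark({
        "slow": lambda: time.sleep(0.2),
        "fast": lambda: None,
    })
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds:.3f}s")
    print("fastest:", fastest(timings))
```

A single run is noisy, especially for network calls, so repeating the measurement a few times before deciding is usually worthwhile. Outside a notebook, an awaited call like `aload()` would need to be wrapped, for example with `asyncio.run(...)`, to fit the zero-argument callable shape assumed here.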