### Proof of Concept Document: Integrating LangChain with MinIO

#### Project Overview
This document outlines a proof of concept (PoC) for using LangChain's `S3DirectoryLoader` and `S3FileLoader` with MinIO. The goal is to show that, because MinIO exposes an S3-compatible API, LangChain can load and process documents from a MinIO bucket simply by pointing these loaders at a MinIO endpoint.

#### MinIO Client Setup
We use the Python `minio` package to talk to the MinIO server. Creating a client requires the server endpoint (`host:port`, without an `http://` or `https://` scheme), an access key, and a secret key. The `secure` flag selects HTTPS.

```python
from minio import Minio

def create_minio_client(endpoint, access_key, secret_key, secure=False):
    return Minio(endpoint, access_key=access_key, secret_key=secret_key, secure=secure)
```
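
`Minio` takes the endpoint as `host:port`, while the S3 loaders used later take a full `endpoint_url`. The sketch below shows one way to accept either form and convert between them. `split_endpoint` and `to_endpoint_url` are hypothetical helper names, not part of `minio` or LangChain, and defaulting bare hosts to plain HTTP is an assumption that mirrors `secure=False` above.

```python
from urllib.parse import urlsplit


def split_endpoint(endpoint, default_secure=False):
    """Return (host_port, secure) from 'host:port' or 'http(s)://host:port'."""
    if '://' in endpoint:
        parts = urlsplit(endpoint)
        if parts.scheme not in ('http', 'https'):
            raise ValueError(f'unsupported scheme: {parts.scheme!r}')
        # netloc keeps the port, e.g. 'localhost:9000'
        return parts.netloc, parts.scheme == 'https'
    # Bare 'host:port': nothing to strip, fall back to the caller's default.
    return endpoint.rstrip('/'), default_secure


def to_endpoint_url(host_port, secure):
    """Build the URL form expected by S3-style clients."""
    scheme = 'https' if secure else 'http'
    return f'{scheme}://{host_port}'
```

With these, `split_endpoint('https://minio.example.com:9000')` yields the `host:port` string and `secure` flag to pass to `create_minio_client`, and `to_endpoint_url` rebuilds the URL form for the loaders.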

#### S3DirectoryLoader Integration
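`S3DirectoryLoader` loads every object under a given key prefix (a "directory") in a MinIO bucket. We wrap it in a function that takes its connection settings from the MinIO client. The loader itself connects through `boto3`, so `endpoint_url` must point at the MinIO server rather than AWS.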

```python
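from langchain_community.document_loaders.s3_directory import S3DirectoryLoader

def load_documents_from_directory(minio_client, bucket, prefix=''):
    # NOTE: minio-py has no public accessors for these settings. This reads the
    # private `_base_url` and `_provider` attributes (as found in minio 7.x),
    # which is an assumption about library internals and may break on upgrade.
    base_url = minio_client._base_url
    creds = minio_client._provider.retrieve()
    # Match the URL scheme to the client's `secure` setting instead of hardcoding http.
    scheme = 'https' if base_url.is_https else 'http'
    loader = S3DirectoryLoader(
        bucket=bucket, prefix=prefix,
        endpoint_url=f'{scheme}://{base_url.host}',
        aws_access_key_id=creds.access_key,
        aws_secret_access_key=creds.secret_key,
        use_ssl=base_url.is_https
    )
    return loader.load()  # list of Document objects, one or more per object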
```

#### S3FileLoader Integration
`S3FileLoader` loads a single object from a MinIO bucket, identified by its key. It is wrapped the same way as the directory loader.

```python
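from langchain_community.document_loaders.s3_file import S3FileLoader

def load_document_from_file(minio_client, bucket, file_key):
    # NOTE: minio-py has no public accessors for these settings. This reads the
    # private `_base_url` and `_provider` attributes (as found in minio 7.x),
    # which is an assumption about library internals and may break on upgrade.
    base_url = minio_client._base_url
    creds = minio_client._provider.retrieve()
    # Match the URL scheme to the client's `secure` setting instead of hardcoding http.
    scheme = 'https' if base_url.is_https else 'http'
    loader = S3FileLoader(
        bucket=bucket, key=file_key,
        endpoint_url=f'{scheme}://{base_url.host}',
        aws_access_key_id=creds.access_key,
        aws_secret_access_key=creds.secret_key,
        use_ssl=base_url.is_https
    )
    return loader.load()  # list of Document objects parsed from the one file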
```
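
Both wrappers pass the same connection arguments, so one option is to build them once from plain values instead of reading them back off the client. This is only a sketch: `s3_connection_kwargs` is a hypothetical helper, `'localhost:9000'` is a placeholder endpoint, and the keyword names are the ones already used in the loader calls above.

```python
def s3_connection_kwargs(endpoint, access_key, secret_key, secure=False):
    """Build the connection keyword arguments shared by both S3 loaders.

    `endpoint` is 'host:port' without a scheme, as passed to `Minio(...)`.
    """
    scheme = 'https' if secure else 'http'
    return {
        'endpoint_url': f'{scheme}://{endpoint}',
        'aws_access_key_id': access_key,
        'aws_secret_access_key': secret_key,
        'use_ssl': secure,
    }


kwargs = s3_connection_kwargs('localhost:9000', 'access-key', 'secret-key')
# Intended use (assumes langchain_community is installed):
#   S3FileLoader(bucket='bucket-name', key='path/to/file', **kwargs)
#   S3DirectoryLoader(bucket='bucket-name', prefix='directory-prefix', **kwargs)
```

Keeping the settings in one place also keeps the URL scheme and `use_ssl` consistent with each other.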

#### Usage Example
```python
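# The endpoint is 'host:port' with no scheme; all values here are placeholders.
minio_client = create_minio_client('minio-server-url', 'access-key', 'secret-key')
# Both calls return a list of LangChain Document objects.
documents = load_documents_from_directory(minio_client, 'bucket-name', 'directory-prefix')
document = load_document_from_file(minio_client, 'bucket-name', 'path/to/file')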
```
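
Both wrappers return a list of LangChain `Document` objects, each carrying `page_content` and `metadata`. A quick sanity check after a load is to summarize that list. The sketch below relies only on those two attributes. `summarize_documents` is a hypothetical helper, the `source` metadata key is an assumption about what the loader sets, and the stand-in objects exist only so the snippet runs without a MinIO server.

```python
from types import SimpleNamespace


def summarize_documents(documents):
    """Return (source, character_count) for each loaded document."""
    return [
        (doc.metadata.get('source', '<unknown>'), len(doc.page_content))
        for doc in documents
    ]


# Stand-ins with the same two attributes as a LangChain Document.
fake_documents = [
    SimpleNamespace(page_content='hello world',
                    metadata={'source': 's3://bucket-name/a.txt'}),
    SimpleNamespace(page_content='', metadata={}),
]
for source, size in summarize_documents(fake_documents):
    print(f'{source}: {size} characters')
```

A document with zero characters is often a sign that the file could not be parsed, which makes this a cheap first check.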

#### Next Steps
- **Testing**: Implement test scripts for both loaders to ensure functionality with the MinIO server.
- **Error Handling**: Improve error handling and logging in the wrapper functions.
- **Performance Optimization**: Measure loading time on large buckets and optimize the steps that dominate it.
- **Documentation**: Document the code and create usage guides.
- **Review and Feedback**: Share this PoC with the team for feedback and iterative improvements.
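
As a starting point for the error-handling item, the wrappers could be called through a small helper that logs each failure and retries a bounded number of times. This is a sketch under assumptions: `load_with_retry` and its parameters are illustrative names, and retrying on any `Exception` is a simplification that a real version would likely narrow to connection errors.

```python
import logging
import time

logger = logging.getLogger(__name__)


def load_with_retry(load_fn, *args, attempts=3, delay_seconds=0.0, **kwargs):
    """Call load_fn(*args, **kwargs), logging and retrying on failure."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    for attempt in range(1, attempts + 1):
        try:
            return load_fn(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback along with the message.
            logger.exception('load attempt %d of %d failed', attempt, attempts)
            if attempt == attempts:
                raise  # out of retries: surface the last error to the caller
            time.sleep(delay_seconds)
```

It would be used as `load_with_retry(load_documents_from_directory, minio_client, 'bucket-name', 'directory-prefix')`, leaving the wrappers themselves unchanged.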

This PoC provides a starting point for using MinIO as the document store behind LangChain's loaders in document-processing and language-model applications.

#### S3DirectoryLoader

This loader loads every object under a given prefix in an S3 bucket (here, a MinIO bucket). S3-compatible storage has no real directories, so a "directory" means all objects whose keys share that prefix. Each object found is processed as an individual document.

#### S3FileLoader

In contrast, `S3FileLoader` loads one specific object from a bucket. You provide the exact key (path) of the object, and the loader fetches only that object. This suits cases where you need to process or analyze one file at a time.

In summary, `S3DirectoryLoader` bulk-loads all objects under a prefix, while `S3FileLoader` targets a single object. Choose between them based on whether the task handles many files together or one file at a time.
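
Because a "directory" in S3-compatible storage is only a shared key prefix, the exact prefix string decides which objects are picked up. The pure-Python sketch below mimics that selection on a made-up key list. `keys_under_prefix` is a hypothetical helper for illustration and is not how the loader is implemented.

```python
def keys_under_prefix(keys, prefix=''):
    """Return the keys a prefix-based listing would select, in sorted order."""
    return sorted(key for key in keys if key.startswith(prefix))


# Made-up object keys for illustration.
keys = [
    'reports/2023/q1.pdf',
    'reports/2023/q2.pdf',
    'reports-old/q4.pdf',
    'notes.txt',
]

print(keys_under_prefix(keys, 'reports/'))
# ['reports/2023/q1.pdf', 'reports/2023/q2.pdf']
print(keys_under_prefix(keys, 'reports'))
# ['reports-old/q4.pdf', 'reports/2023/q1.pdf', 'reports/2023/q2.pdf']
```

The trailing slash matters: `'reports'` also matches `'reports-old/'`, while an empty prefix selects every object in the bucket.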
