Feature Request: Add Amazon SageMaker AI support for indexing and retrieval #105

@dgallitelli

Summary

Request to add Amazon SageMaker AI as a supported LLM provider for both the indexing (tree generation) and retrieval phases.

Motivation

Enterprise users with existing SageMaker deployments would benefit from:

  • Custom model hosting: Use fine-tuned or proprietary models deployed on SageMaker endpoints
  • Cost optimization: Leverage SageMaker Savings Plans or reserved endpoint capacity
  • VPC integration: Keep all inference within private VPCs with no internet egress
  • Model flexibility: Deploy any Hugging Face, custom, or third-party model

Use Cases

  1. Self-hosted open models: Deploy Llama, Mistral, Qwen, or other open models on SageMaker
  2. Fine-tuned models: Use domain-specific fine-tuned models for better accuracy
  3. Air-gapped environments: Operate in environments without external API access
  4. Cost control: Predictable pricing with provisioned endpoints vs per-token API costs

Proposed Implementation

Extend the LLMProvider abstraction to support SageMaker endpoints:

import asyncio
import json

import boto3


class SageMakerProvider(LLMProvider):
    def __init__(self, endpoint_name: str, region: str = "us-east-1"):
        self.client = boto3.client("sagemaker-runtime", region_name=region)
        self.endpoint_name = endpoint_name

    def call(self, prompt: str) -> str:
        # Payload shape depends on the deployed container
        # (HuggingFace TGI shown here; vLLM, Triton, etc. differ).
        payload = {
            "inputs": prompt,
            # Greedy decoding; note TGI rejects temperature=0.
            "parameters": {"max_new_tokens": 4096, "do_sample": False},
        }
        response = self.client.invoke_endpoint(
            EndpointName=self.endpoint_name,
            ContentType="application/json",
            Body=json.dumps(payload),
        )
        result = json.loads(response["Body"].read().decode())
        return result[0]["generated_text"]  # TGI response shape

    async def call_async(self, prompt: str) -> str:
        # boto3 is synchronous; run the blocking call in a worker thread.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.call, prompt)
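
For a quick check outside the CLI, the provider could be exercised directly. A minimal sketch, assuming the class above and an already-deployed endpoint (the endpoint name is a placeholder):

provider = SageMakerProvider(endpoint_name="my-llama-endpoint", region="us-east-1")
print(provider.call("Return the title of the following document: ..."))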

Usage Example

# With SageMaker endpoint
python run_pageindex.py --pdf_path doc.pdf \
    --provider sagemaker \
    --endpoint my-llama-endpoint

# Environment-based
export PAGEINDEX_PROVIDER=sagemaker
export SAGEMAKER_ENDPOINT=my-llama-endpoint
export AWS_REGION=us-east-1
python run_pageindex.py --pdf_path doc.pdf
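
Internally, the environment-based path could resolve the provider with a small factory. A sketch only; provider_from_env is a hypothetical helper, and the variable names follow the proposal above:

import os

def provider_from_env() -> LLMProvider:
    # Hypothetical helper: dispatches on the proposed PAGEINDEX_PROVIDER variable.
    if os.environ.get("PAGEINDEX_PROVIDER") == "sagemaker":
        return SageMakerProvider(
            endpoint_name=os.environ["SAGEMAKER_ENDPOINT"],
            region=os.environ.get("AWS_REGION", "us-east-1"),
        )
    raise ValueError("Unsupported or missing PAGEINDEX_PROVIDER")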

Implementation Considerations

  1. Endpoint format variability: Different model containers (TGI, vLLM, Triton) use different request/response schemas; a sketch of isolating this behind a small adapter follows this list
  2. Streaming support: Some endpoints support streamed responses via invoke_endpoint_with_response_stream
  3. Batching: SageMaker Batch Transform could reduce cost during indexing, where many calls are issued up front
  4. IAM authentication: The standard boto3 credential chain (environment variables, profiles, instance roles) works unchanged
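
One way to contain the format variability from item 1 is to isolate payload construction and response parsing per container. A sketch assuming a TGI container and a vLLM container exposing an OpenAI-compatible completions API; the exact schemas and the served model name should be verified against the deployed image:

import json

def build_payload(container: str, prompt: str) -> dict:
    # Request shape differs per serving container.
    if container == "tgi":
        return {"inputs": prompt,
                "parameters": {"max_new_tokens": 4096, "do_sample": False}}
    if container == "vllm-openai":  # vLLM's OpenAI-compatible server
        return {"model": "served-model",  # placeholder; depends on deployment
                "prompt": prompt, "max_tokens": 4096}
    raise ValueError(f"Unknown container: {container}")

def parse_response(container: str, body: bytes) -> str:
    # Response shape differs as well.
    result = json.loads(body.decode())
    if container == "tgi":
        return result[0]["generated_text"]
    if container == "vllm-openai":
        return result["choices"][0]["text"]
    raise ValueError(f"Unknown container: {container}")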

Happy to contribute a PR if this feature is welcome!
