# Developing Serve Applications

This notebook covers best practices for designing, testing and developing Ray Serve applications.

<div class="alert alert-block alert-info">
    
<b>Here is the roadmap for this notebook:</b>

<ol>
    <li>Structuring Serve Code</li>
    <li>Testing Serve Code</li>
    <li>Patterns of Integrating with FastAPI</li>
    <li>Debugging Serve Applications</li>
    <li>Configuration in Ray Serve</li>
    <li>Running Serve Locally</li>
</ol>
</div>

**Imports**

In [None]:
import pytest
import requests
from fastapi import APIRouter, Depends, FastAPI, HTTPException
from fastapi.responses import JSONResponse
from pydantic import BaseModel
from ray import serve
from ray.serve import Application
from ray.serve.handle import DeploymentHandle
from starlette.requests import Request

## 1. Structuring Serve Code

Structuring your Ray Serve applications effectively requires separating business logic from deployment concerns. This section provides guidance on how to organize your code for better testability and maintainability.

### Code Structure for Testability

The key to effective testing with Ray Serve is to separate your business logic from the deployment wrapper. Here's the recommended structure:

In [None]:
class MyModel:
    """Core business logic - easily unit testable"""
    
    def __init__(self, model_path: str):
        self.model_path = model_path
        self.model = self._load_model()
    
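    def _load_model(self):
        """Model loading logic"""
        # Placeholder: load and return the real model from self.model_path here
        return None
    
    def predict(self, input_data: dict) -> dict:
        """Core prediction logic"""
        # Placeholder result; replace with a call to self.model
        return {"final": "result"}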


@serve.deployment(
    ray_actor_options={"num_cpus": 1},
    max_ongoing_requests=5
)
class MyModelDeployment(MyModel):
    """Ray Serve deployment wrapper"""
    
    def __init__(self, model_path: str):
        super().__init__(model_path)
    
    async def __call__(self, request: Request) -> dict:
        """HTTP endpoint handler"""
        input_data = await request.json()
        
        # Add basic validation
        if "test" not in input_data:
            raise HTTPException(status_code=400, detail="Missing required field 'test'")
        
        return self.predict(input_data)

#### Alternative Patterns for Creating Deployments

Instead of using inheritance, you can use alternative patterns to create deployments from your business logic classes:

**Pattern 1: Using an `as_deployment()` class method**

In [None]:
class MyModel:
    """Core business logic with deployment factory method"""
    
    def __init__(self, model_path: str):
        self.model_path = model_path
        self.model = self._load_model()
    
    def _load_model(self):
        """Model loading logic"""
    
    def predict(self, input_data: dict) -> dict:
        """Core prediction logic"""
    
    @classmethod
    def as_deployment(cls, **deployment_options):
        """Factory method to create a Ray Serve deployment"""
        default_options = {
            "ray_actor_options": {"num_cpus": 1},
            "max_ongoing_requests": 5
        }
        default_options.update(deployment_options)
        return serve.deployment(**default_options)(cls)

# Usage
app = MyModel.as_deployment(name="my_model").bind("path/to/model")

**Pattern 2: Using a factory function**

In [None]:
def make_deployment(cls: type, **custom_options) -> serve.Deployment:
    """Factory function to create deployments with custom options"""
    default_options = {
        "ray_actor_options": {"num_cpus": 1},
        "max_ongoing_requests": 5
    }
    default_options.update(custom_options)
    return serve.deployment(**default_options)(cls)

class MyModel:
    """Core business logic"""
    
    def __init__(self, model_path: str):
        self.model_path = model_path
        self.model = self._load_model()
    
    def _load_model(self):
        """Model loading logic"""
    
    def predict(self, input_data: dict) -> dict:
        """Core prediction logic"""

# Usage
MyModelDeployment = make_deployment(MyModel, name="my_model", num_replicas=2)
app = MyModelDeployment.bind("path/to/model")

**Comparison of Patterns:**

- **Inheritance Pattern**: Good when you need to add deployment-specific logic, such as HTTP request handling in `__call__`
- **`as_deployment()` Method**: Keeps deployment configuration close to the business logic class
- **Factory Function**: Provides centralized deployment creation logic, useful when applying consistent configurations across multiple classes
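
One detail worth knowing about both factory patterns: they merge options with `dict.update`, which is shallow. A caller who overrides `ray_actor_options` replaces the whole nested dictionary rather than a single key. The sketch below uses only the standard library to show the difference; `merge_options` is a hypothetical helper, not part of Ray Serve.

```python
from copy import deepcopy

DEFAULT_OPTIONS = {
    "ray_actor_options": {"num_cpus": 1},
    "max_ongoing_requests": 5,
}


def merge_options(defaults: dict, overrides: dict) -> dict:
    """Hypothetical helper: merge overrides into defaults, one nested level deep."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Combine nested dictionaries instead of replacing them wholesale
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = value
    return merged


# Shallow merge, as in the patterns above: the nested defaults are replaced
shallow = {**DEFAULT_OPTIONS, "ray_actor_options": {"num_gpus": 1}}

# Nested merge keeps num_cpus and adds num_gpus
nested = merge_options(
    DEFAULT_OPTIONS,
    {"ray_actor_options": {"num_gpus": 1}, "num_replicas": 2},
)
```

With the shallow merge, `num_cpus` silently disappears from `ray_actor_options`. Whether that is the desired behavior is a design choice, but it should be a deliberate one.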

## 2. Testing Serve Code

Testing Ray Serve applications requires a structured approach with different testing strategies for different layers of your application. This section covers unit testing, integration testing with deployment handles, HTTP integration testing, and testing deployment composition.

### Unit Testing Business Logic

With this structure, you can write comprehensive unit tests for your business logic without any Ray Serve dependencies:

In [None]:
class TestMyModel:
    """Unit tests for core business logic"""
    
    def test_model_initialization(self):
        """Test model loading and initialization"""
        model = MyModel("test_model_path")
        assert model.model_path == "test_model_path"
        # Add assertions for model state
    
    def test_end_to_end_prediction(self):
        """Test complete prediction pipeline"""
        model = MyModel("test_model_path")            
        input_data = {"test": "input"}
        result = model.predict(input_data)        
        assert result == {"final": "result"}
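
When `_load_model` reads real weights, a unit test can replace it with a stub so that no file access happens. The sketch below uses `unittest.mock.patch.object` from the standard library. The `MyModel` body shown here is a stand-in with placeholder behavior (an assumption for illustration), not the real model code.

```python
from unittest.mock import patch


class MyModel:
    """Stand-in for the business-logic class; the method bodies are placeholders."""

    def __init__(self, model_path: str):
        self.model_path = model_path
        self.model = self._load_model()

    def _load_model(self):
        # Assumption: the real implementation reads weights from self.model_path
        raise FileNotFoundError(self.model_path)

    def predict(self, input_data: dict) -> dict:
        # Placeholder: delegate to whatever callable _load_model returned
        return {"final": self.model(input_data["test"])}


def test_predict_with_stubbed_loader():
    # Patch the loader on the class so __init__ never touches the filesystem
    with patch.object(MyModel, "_load_model", return_value=str.upper):
        model = MyModel("test_model_path")
    assert model.model_path == "test_model_path"
    assert model.predict({"test": "input"}) == {"final": "INPUT"}


test_predict_with_stubbed_loader()
```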

### Integration Testing with DeploymentHandle

For integration testing, use Ray Serve's DeploymentHandle to test your deployment without HTTP overhead:

In [None]:
class TestMyModelDeployment:
    """Integration tests using DeploymentHandle"""
    
    @pytest.fixture
    def deployment_handle(self):
        """Setup deployment for testing"""
        app = MyModelDeployment.bind("test_model_path")
        handle = serve.run(app, name="test_model", blocking=False)
        yield handle
        serve.shutdown()
    
    def test_deployment_prediction(self, deployment_handle):
        """Test prediction through deployment handle"""
        # Same input and expected output as the unit test for MyModel.predict
        input_data = {"test": "input"}
        result = deployment_handle.predict.remote(input_data).result()
        assert result == {"final": "result"}

### Integration Testing with HTTP Requests

For full HTTP integration testing, use a library like `requests` after starting the Serve application:

In [None]:
class TestMyModelHTTP:
    """HTTP integration tests"""
    
    @pytest.fixture
    def serve_app(self):
        """Setup HTTP server for testing"""
        app = MyModelDeployment.bind("test_model_path")
        serve.run(app, name="test_model", blocking=False)
        yield
        serve.shutdown()
    
    def test_http_prediction_endpoint(self, serve_app):
        """Test HTTP prediction endpoint"""
        # The handler rejects payloads without a "test" field, so include it here
        input_data = {"test": "input"}
        response = requests.post(
            "http://localhost:8000/",
            json=input_data,
            timeout=10
        )
        
        assert response.status_code == 200
        result = response.json()
        # Add assertions for result
    
    def test_http_error_handling(self, serve_app):
        """Test HTTP error handling"""
        invalid_data = {"invalid": "data"}
        response = requests.post(
            "http://localhost:8000/",
            json=invalid_data,
            timeout=10
        )
        
        # Test appropriate error response
        assert response.status_code in [400, 422, 500]

### Advanced: Testing Composition of Deployments

When testing applications with multiple deployments that work together, you need to test both individual components and their interactions. Here's how to structure tests for deployment composition:

In [None]:
# Example: Multi-deployment application
class TextPreprocessor:
    """Text preprocessing logic"""
    
    def preprocess(self, text: str) -> dict:
        """Clean and tokenize text"""
        cleaned_text = text.strip().lower()
        tokens = cleaned_text.split()
        return {"tokens": tokens, "length": len(tokens)}

class TextEmbedder:
    """Text embedding logic"""
    
    def embed(self, tokens: dict) -> list:
        """Convert tokens to embeddings"""
        # Simulate embedding generation
        return [0.1] * tokens["length"]

class TextClassifier:
    """Text classification logic"""
    
    def classify(self, embeddings: list) -> dict:
        """Classify based on embeddings"""
        # Simulate classification
        return {"label": "positive", "confidence": 0.85}

@serve.deployment
class TextPreprocessorDeployment(TextPreprocessor):
    """Deployment for text preprocessing"""

@serve.deployment
class TextEmbedderDeployment(TextEmbedder):
    """Deployment for text embedding"""

@serve.deployment
class TextClassifierDeployment(TextClassifier):
    """Deployment for text classification"""

@serve.deployment
class TextPipelineDeployment:
    """Composed pipeline deployment"""
    
    def __init__(self, preprocessor: DeploymentHandle, embedder: DeploymentHandle, classifier: DeploymentHandle):
        self.preprocessor = preprocessor
        self.embedder = embedder
        self.classifier = classifier
    
    async def __call__(self, request: Request) -> dict:
        data = await request.json()
        # Await the coroutine and return its result as the HTTP response
        return await self.run(data)

    async def run(self, data: dict) -> dict:
        text = data["text"]
        
        # Step 1: Preprocess
        preprocessed = await self.preprocessor.preprocess.remote(text)
        
        # Step 2: Embed
        embeddings = await self.embedder.embed.remote(preprocessed)
        
        # Step 3: Classify
        result = await self.classifier.classify.remote(embeddings)
        
        return {
            "preprocessed": preprocessed,
            "embeddings": embeddings,
            "classification": result
        }

#### Integration Testing Deployment Composition

Test how deployments work together:

In [None]:
class TestTextPipelineComposition:
    """Integration tests for deployment composition"""
    
    @pytest.fixture
    def pipeline_handle(self):
        """Setup composed pipeline for testing"""
        # Create individual deployments
        preprocessor_app = TextPreprocessorDeployment.bind()
        embedder_app = TextEmbedderDeployment.bind()
        classifier_app = TextClassifierDeployment.bind()
        
        # Create composed pipeline
        pipeline_app = TextPipelineDeployment.bind(
            preprocessor=preprocessor_app,
            embedder=embedder_app,
            classifier=classifier_app
        )
        
        handle = serve.run(pipeline_app, name="text_pipeline", blocking=False)
        yield handle
        serve.shutdown()
    
    def test_end_to_end_pipeline(self, pipeline_handle):
        """Test complete pipeline execution"""
        input_text = "This is a test message"
        
        # Execute pipeline by calling `run` directly; `__call__` expects an HTTP request
        result = pipeline_handle.run.remote({"text": input_text}).result()
        
        # Verify result
        assert "preprocessed" in result
        assert "embeddings" in result
        assert "classification" in result

You can also test each deployment individually using their respective handles.
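
The orchestration logic of a composed pipeline can also be unit tested without a cluster by swapping the handles for stand-ins that mimic the `handle.method.remote(...)` call shape. The sketch below is a minimal illustration under that assumption. `FakeHandle`, `FakeMethod`, and `TextPipeline` are hypothetical names, and the fake only reproduces the fact that the return value of `.remote()` can be awaited.

```python
import asyncio


class FakeMethod:
    """Mimics `handle.method`: calling `.remote(...)` returns an awaitable."""

    def __init__(self, fn):
        self._fn = fn

    def remote(self, *args, **kwargs):
        async def _call():
            return self._fn(*args, **kwargs)

        return _call()


class FakeHandle:
    """Hypothetical stand-in for a DeploymentHandle, backed by a plain object."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only reached for attributes not found normally, such as method names
        return FakeMethod(getattr(self._target, name))


class TextPreprocessor:
    def preprocess(self, text: str) -> dict:
        tokens = text.strip().lower().split()
        return {"tokens": tokens, "length": len(tokens)}


class TextEmbedder:
    def embed(self, tokens: dict) -> list:
        return [0.1] * tokens["length"]


class TextPipeline:
    """Plain-Python version of the pipeline's orchestration logic."""

    def __init__(self, preprocessor, embedder):
        self.preprocessor = preprocessor
        self.embedder = embedder

    async def run(self, data: dict) -> dict:
        preprocessed = await self.preprocessor.preprocess.remote(data["text"])
        embeddings = await self.embedder.embed.remote(preprocessed)
        return {"preprocessed": preprocessed, "embeddings": embeddings}


pipeline = TextPipeline(FakeHandle(TextPreprocessor()), FakeHandle(TextEmbedder()))
result = asyncio.run(pipeline.run({"text": "  This is a TEST  "}))
```

Tests like this run in milliseconds and catch wiring mistakes, such as a missing `await`, before the slower handle-based integration tests run.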

To execute the full test suite, run the following command:

In [None]:
# uncomment to run the test suite
# !cd examples && pytest tests/

### Best Practices Summary

1. **Separate Concerns**: Keep business logic in plain Python classes separate from Ray Serve deployment wrappers
2. **Unit Test Business Logic**: Test core functionality without Ray Serve dependencies
3. **Integration Test with Handles**: Use DeploymentHandle for testing deployment behavior
4. **Integration Test with HTTP**: Exercise the full HTTP stack when you need end-to-end coverage

This testing approach ensures your Ray Serve applications are robust, maintainable, and production-ready.

## 3. Patterns of Integrating with FastAPI

Ray Serve provides flexible ways to integrate with FastAPI applications. This section covers the basic pattern and an advanced builder pattern recommended for large applications.

### Basic Pattern

The most straightforward way to integrate FastAPI with Ray Serve is to directly decorate the FastAPI app object using `@serve.ingress(app)`:

In [None]:
app = FastAPI()

@app.get("/")
def endpoint():
    return "hello"

@serve.deployment
@serve.ingress(app)
class MyDeployment:
    pass

This approach works well for simple FastAPI applications with minimal state.

### Factory Pattern for Large Applications

For large FastAPI applications, the factory/builder pattern is recommended. Instead of decorating the app object directly, use a builder function:

In [None]:
def fastapi_builder():
    app = FastAPI()
    
    @app.get("/")
    def endpoint():
        return "hello"
    
    return app

deployment = serve.deployment(serve.ingress(fastapi_builder)())
serve.run(deployment.bind(), blocking=False)

**Why Use the Factory Pattern?**

The builder pattern provides several advantages for large applications:

1. **Avoids Serialization Issues**: Directly decorating a FastAPI ASGI app object requires serialization, which can cause memory spikes when the app contains many large objects or complex state. The builder pattern defers app construction until after deployment initialization.

2. **Better Dependency Injection**: Enables FastAPI's dependency injection system to reference Ray Serve deployment handles, making it easier to test FastAPI endpoints in isolation.

3. **Cleaner Separation**: Keeps FastAPI app construction logic separate from Ray Serve deployment concerns.

#### Complete Example with Deployment Composition

Here's a comprehensive example showing the builder pattern with middleware, error handling, and sub-deployment composition:

In [None]:
def sub_deployment():
    """Create a sub-deployment for processing"""
    @serve.deployment
    class SubModel:
        def run(self, a: int):
            return a + 1
    
    return SubModel.options(name="sub_deployment")


def fastapi_builder_with_composition():
    """Build a FastAPI app with all features"""
    app = FastAPI(docs_url="/custom-docs")
    
    # Basic route
    @app.get("/")
    def root():
        return "hello"
    
    # Router for organizing endpoints
    router = APIRouter()
    
    @router.get("/f2")
    def f2():
        return "hello f2"
    
    @router.get("/error")
    def error():
        raise ValueError("some error")
    
    app.include_router(router)
    
    # Middleware
    @app.middleware("http")
    async def add_process_time_header(request: Request, call_next):
        response = await call_next(request)
        response.headers["X-Custom-Middleware"] = "fake-middleware"
        return response
    
    # Custom exception handler
    @app.exception_handler(ValueError)
    async def custom_exception_handler(request: Request, exc: ValueError):
        return JSONResponse(status_code=500, content={"error": "fake-error"})
    
    # Dependency injection for sub-deployment
    def get_sub_deployment_handle():
        return serve.get_deployment_handle("sub_deployment", "default")
    
    class Data(BaseModel):
        a: int
    
    @app.get("/sub_deployment", response_model=Data)
    async def call_sub_deployment(
        request: Request,
        handle: DeploymentHandle = Depends(get_sub_deployment_handle)
    ):
        a = int(request.query_params.get("a", 1))
        result = await handle.run.remote(a)
        return Data(a=result)
    
    return app


# Deploy the application
ingress_deployment = serve.deployment(serve.ingress(fastapi_builder_with_composition)())
app = ingress_deployment.bind(sub_deployment().bind())
serve.run(app, blocking=False)

#### Testing the Deployment

In [None]:
# Test basic endpoint
resp = requests.get("http://localhost:8000/")
assert resp.json() == "hello"
assert resp.headers["X-Custom-Middleware"] == "fake-middleware"

# Test router endpoint
resp = requests.get("http://localhost:8000/f2")
assert resp.json() == "hello f2"

# Test error handling
resp = requests.get("http://localhost:8000/error")
assert resp.status_code == 500
assert resp.json() == {"error": "fake-error"}

# Test sub-deployment composition
resp = requests.get("http://localhost:8000/sub_deployment?a=2")
assert resp.json() == {"a": 3}

#### When to Use This Pattern

Use the builder pattern when:
- Your FastAPI app has many dependencies or objects that are **expensive/impossible to serialize**
- You need to reference Ray Serve deployment handles from FastAPI endpoints
- You want to test FastAPI endpoints with mocked deployment handles

For simple FastAPI applications with minimal state, the traditional decorator approach remains a valid option.
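
As a rough sketch of the mocked-handle point above: when the handle reaches the endpoint as a parameter, the endpoint logic can be exercised with a stand-in from `unittest.mock`. The function below is a simplified, hypothetical version of an endpoint body, written without FastAPI so that it runs with the standard library alone.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def call_sub_deployment(a: int, handle) -> dict:
    """Endpoint logic with the handle passed in, as `Depends(...)` would supply it."""
    result = await handle.run.remote(a)
    return {"a": result}


# Stand-in for a DeploymentHandle: `handle.run.remote(a)` returns an awaitable
mock_handle = MagicMock()
mock_handle.run.remote = AsyncMock(return_value=3)

response = asyncio.run(call_sub_deployment(2, mock_handle))

# Verify the endpoint forwarded its argument to the sub-deployment exactly once
mock_handle.run.remote.assert_awaited_once_with(2)
```

In a FastAPI test, the same stand-in could be supplied through `app.dependency_overrides`, provided the dependency function is reachable from the test code.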

## 4. Debugging Serve Applications

Ray Serve provides a local testing mode that enables running deployments locally in a single process, making it easier to debug your applications without the overhead of a full Ray cluster.

### Using Local Testing Mode

To enable local testing mode, use the `_local_testing_mode` flag when calling `serve.run()`:

In [None]:
app = MyDeployment.bind()
handle = serve.run(app, _local_testing_mode=True, blocking=False) # deployment will now run in a background thread

Local testing mode offers several advantages for development and debugging:

- **Simplified Debugging**: Each deployment runs in a background thread, making it easier to use debuggers and step through code
- **Faster Iteration**: No need to start a Ray cluster, reducing startup time

#### Example: Debugging 

Let's examine the script file `examples/debugging/debug.py`.

With this setup, you can:
- Run the script directly: `python debug.py`
- Set breakpoints in your IDE
- Step through the code with a debugger
- Inspect variables and state

### Limitations

While local testing mode supports most features, some limitations exist:
- Cannot convert `DeploymentResponse` to `ObjectRef`
- Not suitable for testing multi-node deployments

For production deployments or testing distributed features, use a full Ray cluster.

## 5. Configuration in Ray Serve

Ray Serve provides flexible configuration options that allow you to separate deployment settings and application parameters from your code. This section covers two complementary approaches that work together:

1. **YAML Configuration Files**: Define deployment settings (replicas, resources, routing) and application parameters
2. **Application Builders**: Functions that accept parameters and return your application, enabling dynamic configuration

These approaches work together seamlessly - you can use YAML config files to pass arguments to application builders, combining the benefits of both: version-controlled configuration with parameterized application logic.

This is useful for:
- Managing different configurations for different environments (dev, staging, production)
- Passing parameters without modifying code (model paths, hyperparameters, etc.)
- Running multiple instances of the same application with different parameters
- Version controlling both deployment settings and application arguments

### Generating a Config File

Use the `serve build` command to generate a configuration file from your Python application:

In [None]:
!cd examples/intro && serve build -o config.yaml main:mnist_app

This command:
- Reads your application definition from `main.py` (the `mnist_app` variable)
- Generates a `config.yaml` file with all deployment settings
- Includes default values for HTTP options, proxy location, and logging

### Config File Structure

A generated config file looks like this:

```yaml
proxy_location: EveryNode

http_options:
  host: 0.0.0.0
  port: 8000

grpc_options:
  port: 9000
  grpc_servicer_functions: []

logging_config:
  encoding: JSON
  log_level: INFO
  logs_dir: null
  enable_access_log: true

applications:
- name: app1
  route_prefix: /
  import_path: main:mnist_app
  runtime_env: {}
  deployments:
  - name: OnlineMNISTClassifier
```

### Key Configuration Sections

Here are the main configuration sections to set:

- **proxy_location**: Where to run HTTP proxies (`EveryNode`, `HeadOnly`, or `Disabled`)
- **http_options**: HTTP server configuration (host, port)
- **logging_config**: Logging settings (level, format, access logs)
- **applications**: List of applications to deploy
  - **name**: Application name
  - **route_prefix**: HTTP route prefix
  - **import_path**: Python import path to your application
  - **deployments**: List of deployments in the application

### Customizing the Config

After generating the config, you can customize it:

```yaml
applications:
- name: production_app
  route_prefix: /api/v1
  import_path: main:mnist_app
  runtime_env:
    pip:
      - torch==2.0.0
      - numpy==1.24.0
  deployments:
  - name: OnlineMNISTClassifier
    num_replicas: 4
    ray_actor_options:
      num_cpus: 2
      num_gpus: 1
```
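
If you maintain several environments, one option is to keep a single base config and apply per-environment overrides in a small script. The sketch below assumes the config has already been parsed into a dictionary (for example, with a YAML parser). `ENV_OVERRIDES` and `apply_overrides` are hypothetical names, and the replica counts are illustrative.

```python
from copy import deepcopy

# Parsed form of a config like the one above (normally loaded from YAML)
base_config = {
    "applications": [
        {
            "name": "production_app",
            "route_prefix": "/api/v1",
            "import_path": "main:mnist_app",
            "deployments": [
                {"name": "OnlineMNISTClassifier", "num_replicas": 4},
            ],
        }
    ]
}

# Hypothetical per-environment settings, keyed by deployment name
ENV_OVERRIDES = {
    "dev": {"OnlineMNISTClassifier": {"num_replicas": 1}},
    "production": {"OnlineMNISTClassifier": {"num_replicas": 4}},
}


def apply_overrides(config: dict, env: str) -> dict:
    """Return a copy of config with the chosen environment's settings applied."""
    result = deepcopy(config)
    overrides = ENV_OVERRIDES[env]
    for app in result["applications"]:
        for deployment in app.get("deployments", []):
            deployment.update(overrides.get(deployment["name"], {}))
    return result


dev_config = apply_overrides(base_config, "dev")
```

Working on a deep copy keeps the base config unchanged, so the same dictionary can be reused to produce the config for each environment.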

### Application Builders

When writing an application, you often have parameters that need to change **between environments** or experiments. 

For example, you might want to deploy different model weights or adjust hyperparameters without modifying your code.

**Application builders** solve this by defining a function that accepts parameters and returns a built application. 

#### Defining an Application Builder

An application builder is a function that takes an arguments dictionary (or Pydantic object) and returns the application to be run:

In [None]:
@serve.deployment
class HelloWorld:
    def __init__(self, message: str):
        self._message = message
        print("Message:", self._message)

    def __call__(self, request):
        return self._message

class HelloWorldArgs(BaseModel):
    message: str

def app_builder(args: HelloWorldArgs) -> Application:
    return HelloWorld.bind(args.message)

This `app_builder` function can be used as the import path in `serve run` commands or config files, with arguments passed separately from the code.

### Passing Arguments

From Python, pass arguments by constructing the arguments object and calling the application builder directly:

In [None]:
serve.run(app_builder(HelloWorldArgs(message="Hello World")))

The arguments can also be passed to the `serve run` CLI using `key=value` syntax. Note that Pydantic validation is performed automatically when the builder's parameter is annotated with a Pydantic model.

In [None]:
# run the builder with an argument supplied on the command line
!cd examples/app_builder && serve run main:app_builder message="Hello from CLI" --non-blocking

Notice that "Hello from CLI" is printed from within the deployment constructor.
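
For builders that accept a plain arguments dictionary instead of a Pydantic model, validation and type conversion are your responsibility. The sketch below shows one way to do this with a dataclass, under the assumption that values supplied on the command line arrive as strings. `HelloArgs`, `parse_args`, and the `repeat` field are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelloArgs:
    """Hypothetical typed view of the builder arguments."""

    message: str
    repeat: int = 1


def parse_args(raw: dict) -> HelloArgs:
    """Validate and convert a raw arguments dictionary."""
    unknown = set(raw) - {"message", "repeat"}
    if unknown:
        raise ValueError(f"Unknown arguments: {sorted(unknown)}")
    if "message" not in raw:
        raise ValueError("Missing required argument 'message'")
    # Assumption: CLI values arrive as strings, so convert explicitly
    return HelloArgs(message=str(raw["message"]), repeat=int(raw.get("repeat", 1)))


args = parse_args({"message": "hi", "repeat": "3"})
```

A dictionary-style builder could call `parse_args` first and then bind the deployment with the typed values, which keeps malformed arguments from reaching the deployment constructor.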

### Passing Arguments via Config File

You can also pass arguments to the application builder through the config file's `args` field. This combines the benefits of both approaches: version-controlled YAML configs that parameterize your application builders.

```yaml
applications:
  - name: MyApp
    import_path: main:app_builder
    args:
      message: "Hello from config"
```

In [None]:
!cd examples/app_builder && serve run config.yaml --non-blocking

In [None]:
# cleanup
!serve shutdown -y