# Chapter 50: Cloud Deployment

## Learning Objectives

By the end of this chapter, you will be able to:

- Understand the core offerings of major cloud providers (AWS, Google Cloud, Azure) relevant to machine learning systems
- Design cloud architectures for time‑series prediction systems that are scalable, cost‑effective, and resilient
- Deploy your NEPSE prediction model using managed ML services like Amazon SageMaker, Google Vertex AI, and Azure Machine Learning
- Leverage serverless computing (AWS Lambda, Google Cloud Functions) for lightweight inference tasks
- Orchestrate containerised applications on Kubernetes using cloud‑managed services (EKS, GKE, AKS)
- Choose the right data storage solutions (object storage, relational databases, data warehouses) for different components of your pipeline
- Implement cost management strategies to avoid unexpected bills and optimise cloud spending
- Consider multi‑cloud and hybrid cloud approaches for redundancy and avoiding vendor lock‑in
- Apply cloud best practices for security, scalability, and monitoring as discussed in previous chapters

---

## Introduction

So far, we have built a comprehensive NEPSE stock prediction system that can ingest data, engineer features, train models, and serve predictions, but we have run everything locally or on a single server. In production, the system must handle unpredictable traffic, store large amounts of data reliably, and remain highly available, and **deploying** it to the cloud is a common way to meet those requirements. Cloud platforms provide elastic resources on demand, but they also introduce complexity: you must choose the right services, configure them correctly, and manage costs.

In this chapter, we will explore how to deploy the NEPSE prediction system on the three major cloud providers: **Amazon Web Services (AWS)**, **Google Cloud Platform (GCP)**, and **Microsoft Azure**. We will cover various deployment patterns, from fully managed ML services to custom containerised solutions, and discuss the trade‑offs. By the end, you will be equipped to make informed decisions about cloud deployment for your own time‑series prediction systems.

---

## 50.1 Cloud Providers Overview

Each cloud provider offers a vast portfolio of services. For an ML system, the most relevant categories are:

- **Compute**: Virtual machines (EC2, Compute Engine, Azure Virtual Machines), managed Kubernetes for containers (EKS, GKE, AKS), and serverless functions (Lambda, Cloud Functions, Azure Functions).
- **Storage**: Object storage (S3, GCS, Blob Storage), block storage (EBS, Persistent Disk, Azure Managed Disks), file storage (EFS, Filestore, Azure Files).
- **Databases**: Relational (RDS, Cloud SQL, Azure SQL), NoSQL (DynamoDB, Firestore, Cosmos DB), data warehouses (Redshift, BigQuery, Synapse).
- **Machine Learning**: Managed training and serving (SageMaker, Vertex AI, Azure ML), pre‑built AI services (Rekognition, Vision API, Cognitive Services).
- **Networking**: VPC, load balancers, API Gateway, CloudFront/CDN.
- **Monitoring**: CloudWatch, Cloud Monitoring (formerly Stackdriver), Azure Monitor.

We will focus on the services that are most relevant to a time‑series prediction system like NEPSE.

### 50.1.1 Choosing a Provider

The choice often comes down to existing organisational expertise, specific service offerings, and cost. For example:

- **AWS** has the broadest and most mature ML ecosystem, with SageMaker being a comprehensive platform.
- **GCP** excels in data analytics and has strong integration with BigQuery, a serverless data warehouse that can handle large time‑series datasets.
- **Azure** is popular in enterprises with heavy Microsoft investments and offers good integration with tools like Power BI.

For the NEPSE system, any of the three could work. We will provide examples for all, but you may choose based on your preferences.

---

## 50.2 Cloud Architecture Patterns

A typical cloud‑based prediction system consists of several components:

1. **Data Ingestion Layer**: Collects raw data from sources (APIs, databases) and lands it in cloud storage.
2. **Data Lake / Warehouse**: Stores raw and processed data (e.g., S3 + Glue, BigQuery).
3. **Feature Store**: Stores pre‑computed features for reuse (e.g., Feast on cloud, or a simple database).
4. **Training Pipeline**: Periodically trains models using historical data (e.g., SageMaker training jobs, Vertex AI training).
5. **Model Registry**: Stores trained models and metadata (e.g., SageMaker Model Registry, MLflow on cloud).
6. **Inference Service**: Serves predictions via API (e.g., SageMaker endpoints, Vertex AI endpoints, custom containers on Kubernetes).
7. **Monitoring and Alerting**: Tracks system health and model drift (e.g., CloudWatch, Google Cloud Monitoring, Prometheus on Kubernetes).

![Cloud Architecture Diagram](images/cloud_arch.png)

For the NEPSE system, we can implement each component using cloud services. We will walk through a concrete architecture on AWS, then highlight equivalent services on GCP and Azure.
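To make the mapping between components and providers concrete, the sketch below keeps it in a small lookup table. It is illustrative only: `COMPONENT_SERVICES` and `services_for` are hypothetical names, and the service choices simply echo the ones named in this chapter.

```python
# Hypothetical lookup table: pipeline component -> service on each provider.
COMPONENT_SERVICES = {
    "object_storage": {"aws": "S3", "gcp": "Cloud Storage", "azure": "Blob Storage"},
    "warehouse": {"aws": "Redshift", "gcp": "BigQuery", "azure": "Synapse"},
    "training": {"aws": "SageMaker", "gcp": "Vertex AI", "azure": "Azure ML"},
    "inference": {"aws": "SageMaker", "gcp": "Vertex AI", "azure": "Azure ML"},
    "monitoring": {"aws": "CloudWatch", "gcp": "Cloud Monitoring", "azure": "Azure Monitor"},
}


def services_for(provider):
    """Return the service chosen for every component on one provider."""
    key = provider.lower()
    try:
        return {
            component: options[key]
            for component, options in COMPONENT_SERVICES.items()
        }
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
```

For example, `services_for("gcp")["warehouse"]` returns `"BigQuery"`. Keeping the table in one place makes it easier to see which parts of the pipeline would change if you switched providers.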

---

## 50.3 Managed ML Services

Managed ML services abstract away the infrastructure, allowing you to focus on the model. They handle provisioning, scaling, and maintenance.

### 50.3.1 Amazon SageMaker

Amazon SageMaker is a fully managed service covering the entire ML workflow. For our NEPSE predictor, we can use:

- **SageMaker Notebooks** for exploration (similar to Jupyter).
- **SageMaker Training** for distributed training.
- **SageMaker Model Registry** to version models.
- **SageMaker Endpoints** for real‑time inference.
- **SageMaker Batch Transform** for offline predictions.

**Example: Training an XGBoost model on SageMaker**

```python
import sagemaker
from sagemaker import get_execution_role
from sagemaker.inputs import TrainingInput
from sagemaker.xgboost.estimator import XGBoost

role = get_execution_role()
session = sagemaker.Session()

# Specify the S3 location of training data
train_data_uri = 's3://nepse-data/train/'

# Create an XGBoost estimator
xgb_estimator = XGBoost(
    entry_point='train.py',           # custom training script
    hyperparameters={
        'max_depth': 5,
        'eta': 0.2,
        'gamma': 4,
        'min_child_weight': 6,
        'subsample': 0.8,
        'objective': 'binary:logistic',
        'num_round': 100
    },
    instance_type='ml.m5.xlarge',
    instance_count=1,
    framework_version='1.3-1',
    role=role,
    output_path='s3://nepse-models/'
)

# Launch training
xgb_estimator.fit({'train': TrainingInput(train_data_uri, content_type='csv')})
```

**Explanation:**  
We define an estimator with hyperparameters and instance type. SageMaker spins up a training instance, runs the script `train.py` (which should read data from the input channel), and saves the model artifact to S3.
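The chapter does not show `train.py` itself, so here is a minimal sketch of what such a script might look like. It relies on two conventions of SageMaker training containers: hyperparameters arrive as command-line flags, and the `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR` environment variables point to the input data and the model output directory. Everything else is an assumption made for illustration: headerless CSV files with the label in the first column, and the artifact file name `xgboost-model`.

```python
# train.py (sketch; data layout and artifact name are assumptions)
import argparse
import glob
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters set on the estimator are passed as command-line flags.
    parser.add_argument("--max_depth", type=int, default=5)
    parser.add_argument("--eta", type=float, default=0.2)
    parser.add_argument("--gamma", type=float, default=4)
    parser.add_argument("--min_child_weight", type=float, default=6)
    parser.add_argument("--subsample", type=float, default=0.8)
    parser.add_argument("--objective", type=str, default="binary:logistic")
    parser.add_argument("--num_round", type=int, default=100)
    # Locations injected by SageMaker; the fallbacks allow local runs.
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data/train"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    args, _unknown = parser.parse_known_args(argv)
    return args


def booster_params(args):
    """Collect the flags that XGBoost expects in its parameter dict."""
    names = ("max_depth", "eta", "gamma", "min_child_weight", "subsample", "objective")
    return {name: getattr(args, name) for name in names}


def train(args):
    # Heavy imports stay local so the argument parsing can be tested without them.
    import pandas as pd
    import xgboost as xgb

    files = sorted(glob.glob(os.path.join(args.train, "*.csv")))
    if not files:
        raise FileNotFoundError(f"no CSV files found in {args.train}")
    # Assumption: no header row, label in the first column.
    df = pd.concat(pd.read_csv(path, header=None) for path in files)
    dtrain = xgb.DMatrix(df.iloc[:, 1:], label=df.iloc[:, 0])
    booster = xgb.train(booster_params(args), dtrain, num_boost_round=args.num_round)
    os.makedirs(args.model_dir, exist_ok=True)
    booster.save_model(os.path.join(args.model_dir, "xgboost-model"))


# Only start training automatically inside a SageMaker training container.
if __name__ == "__main__" and "SM_MODEL_DIR" in os.environ:
    train(parse_args())
```

Whatever the script writes to the model directory is what SageMaker packages and uploads to the `output_path` given to the estimator.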

**Deploying to a real‑time endpoint:**

```python
predictor = xgb_estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium',
    endpoint_name='nepse-predictor'
)

# Call the endpoint. `data` is a placeholder for feature rows in the
# format expected by the predictor's serializer.
result = predictor.predict(data)
```

**Explanation:**  
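`deploy` creates an HTTPS endpoint backed by one or more managed instances. SageMaker handles provisioning, health checks, and rolling updates; auto‑scaling is available but must be configured separately through Application Auto Scaling.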
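Applications that do not use the SageMaker SDK can call the same endpoint through the `sagemaker-runtime` client in `boto3`. The following is a sketch under assumptions: the helper names are hypothetical, and it assumes the endpoint accepts `text/csv` input with one row of features per line.

```python
def to_csv_payload(rows):
    """Serialize rows of feature values into a CSV string, one row per line."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)


def invoke(endpoint_name, rows, region_name=None):
    """Send feature rows to a SageMaker endpoint and return the raw response text."""
    import boto3  # imported here so the payload helper works without AWS installed

    runtime = boto3.client("sagemaker-runtime", region_name=region_name)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",  # assumption: the endpoint accepts CSV input
        Body=to_csv_payload(rows),
    )
    return response["Body"].read().decode("utf-8")
```

A call such as `invoke('nepse-predictor', [[0.1, 0.2, 0.3]])` would then return the prediction as text; the feature values here are placeholders, and the caller needs IAM permission for `sagemaker:InvokeEndpoint`.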

### 50.3.2 Google Vertex AI

Vertex AI is Google’s unified ML platform. It offers similar capabilities:

- **Vertex AI Workbench** for notebooks.
- **Vertex AI Training** for custom and pre‑built containers.
- **Vertex AI Model Registry**.
- **Vertex AI Endpoints** for prediction.

**Example: Training an XGBoost model on Vertex AI**

```python
from google.cloud import aiplatform

aiplatform.init(project='my-project', location='us-central1')

# Define training job
job = aiplatform.CustomTrainingJob(
    display_name='nepse-xgboost',
    script_path='trainer.py',
    container_uri='gcr.io/cloud-aiplatform/training/xgboost-cpu.0-90:latest',
    requirements=['pandas', 'scikit-learn'],
    model_serving_container_image_uri='gcr.io/cloud-aiplatform/prediction/xgboost-cpu.0-90:latest'
)

# Run training
model = job.run(
    dataset=None,  # we'll pass data via arguments
    args=['--data-uri', 'gs://nepse-data/train/'],
    replica_count=1,
    machine_type='n1-standard-4'
)

# Deploy
endpoint = model.deploy(machine_type='n1-standard-2')
```

**Explanation:**  
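Vertex AI runs training jobs in containers. Here we combine a pre‑built XGBoost training container with our own script, and name a pre‑built serving container for predictions. Because a serving container is specified, `job.run` returns a model that is registered automatically and can be deployed to an endpoint.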

### 50.3.3 Azure Machine Learning

Azure ML provides a similar workflow:

```python
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()

# Create compute cluster (the VM size is an assumption; pick one that fits your workload)
compute_config = AmlCompute.provisioning_configuration(
    vm_size='STANDARD_DS3_V2', min_nodes=0, max_nodes=4
)
compute_cluster = ComputeTarget.create(ws, 'cpu-cluster', compute_config)
compute_cluster.wait_for_completion(show_output=True)

# Define environment
env = Environment.from_conda_specification(name='xgboost-env', file_path='conda.yml')

# Configure training script
config = ScriptRunConfig(
    source_directory='.',
    script='train.py',
    arguments=['--data-folder', 'wasbs://...'],
    compute_target=compute_cluster,
    environment=env
)

# Submit experiment
run = Experiment(ws, 'nepse-training').submit(config)
run.wait_for_completion()

# Register model
model = run.register_model(model_name='nepse-xgboost', model_path='outputs/model.pkl')

# Deploy to an endpoint on Azure Container Instances
inference_config = InferenceConfig(entry_script='score.py', environment=env)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Model.deploy(ws, 'nepse-service', [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
```

**Explanation:**  
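This example uses the Azure ML Python SDK v1 (`azureml-core`), which is built around a workspace, compute targets, and environments. The model is registered and then deployed to Azure Container Instances (ACI), which suits testing; for production traffic, deploy to Kubernetes (AKS) instead.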
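The deployment refers to an entry script, `score.py`, that the chapter does not show. Azure ML entry scripts expose an `init()` function, called once when the container starts, and a `run()` function, called for each request. The sketch below is one possible version: the request shape (`{"features": [...]}`) and the helper `predict_payload` are assumptions, and it expects the registered file to appear as `model.pkl` under the `AZUREML_MODEL_DIR` directory.

```python
# score.py (sketch; request format is an assumption)
import json
import os

model = None  # populated once by init()


def init():
    """Load the registered model when the service container starts."""
    global model
    import joblib

    model_dir = os.environ.get("AZUREML_MODEL_DIR", ".")
    model = joblib.load(os.path.join(model_dir, "model.pkl"))


def predict_payload(loaded_model, raw_data):
    """Turn a JSON request body into a JSON-serializable response."""
    features = json.loads(raw_data)["features"]
    probability = loaded_model.predict_proba([features])[0][1]
    # Cast to a plain float so the result can be serialized to JSON.
    return {"probability": float(probability)}


def run(raw_data):
    """Handle one scoring request; errors are returned instead of raised."""
    try:
        return predict_payload(model, raw_data)
    except Exception as exc:
        return {"error": str(exc)}
```

Keeping the prediction logic in a function that takes the model as an argument makes it possible to test the script locally with a stub model before deploying.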

---

## 50.4 Serverless ML

For low‑traffic or sporadic prediction workloads, serverless functions can be cost‑effective. They scale to zero when not in use, but have cold‑start latency.

### 50.4.1 AWS Lambda with Container Support

AWS Lambda now supports packaging models as container images (up to 10 GB). You can deploy a lightweight inference function.

**Example: Lambda function for NEPSE prediction**

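```dockerfile
# Dockerfile
FROM public.ecr.aws/lambda/python:3.9
# Install dependencies first so this layer is cached between code changes
COPY requirements.txt ${LAMBDA_TASK_ROOT}/
RUN pip install -r ${LAMBDA_TASK_ROOT}/requirements.txt
# Copy the handler and the serialized model that app.py loads
COPY app.py model.pkl ${LAMBDA_TASK_ROOT}/
CMD ["app.handler"]
```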

```python
# app.py
import json
import joblib
import numpy as np

model = joblib.load('model.pkl')  # loaded at cold start

def handler(event, context):
    body = json.loads(event['body'])
    features = np.array(body['features']).reshape(1, -1)
    # Cast the NumPy scalar to a plain float so json.dumps can serialize it
    pred = float(model.predict_proba(features)[0, 1])
    return {
        'statusCode': 200,
        'body': json.dumps({'probability': pred})
    }
```

Deploy using the AWS CLI or SAM. Lambda auto‑scales with concurrency, but each new instance loads the model (cold start). To reduce cold starts, you can enable **provisioned concurrency**.
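The handler above assumes a well‑formed request. A more defensive variant validates the body and returns a 400 response instead of failing with an unhandled exception. This is only a sketch: `N_FEATURES`, `parse_features`, `respond`, and `make_handler` are hypothetical names, and the expected feature count is a placeholder.

```python
import json

N_FEATURES = 10  # assumption: number of features the model was trained on


def parse_features(event, n_features=N_FEATURES):
    """Extract a list of floats from the request body or raise ValueError."""
    try:
        body = json.loads(event.get("body") or "")
    except json.JSONDecodeError as exc:
        raise ValueError("body is not valid JSON") from exc
    features = body.get("features") if isinstance(body, dict) else None
    if not isinstance(features, list) or len(features) != n_features:
        raise ValueError(f"expected a list of {n_features} features")
    try:
        return [float(value) for value in features]
    except (TypeError, ValueError) as exc:
        raise ValueError("features must be numeric") from exc


def respond(status, payload):
    """Build a response in the shape API Gateway proxy integrations expect."""
    return {"statusCode": status, "body": json.dumps(payload)}


def make_handler(model, n_features=N_FEATURES):
    """Return a Lambda handler bound to an already loaded model."""

    def handler(event, context):
        try:
            features = parse_features(event, n_features)
        except ValueError as exc:
            return respond(400, {"error": str(exc)})
        probability = float(model.predict_proba([features])[0][1])
        return respond(200, {"probability": probability})

    return handler
```

In `app.py` you would then write `handler = make_handler(joblib.load('model.pkl'))` at module level, so the model is still loaded once per cold start.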

### 50.4.2 Google Cloud Functions

Cloud Functions (1st gen) has a shorter maximum timeout than Lambda (9 minutes versus 15) and a lower memory ceiling, so it is best suited to lightweight models.

```python
import joblib
import numpy as np

model = joblib.load('model.pkl')

def predict(request):
    request_json = request.get_json()
    features = np.array(request_json['features']).reshape(1, -1)
    # Cast the NumPy scalar to a plain float so the response serializes to JSON
    prob = float(model.predict_proba(features)[0, 1])
    return {'probability': prob}
```

### 50.4.3 Azure Functions

Similar to AWS Lambda, Azure Functions supports custom containers and can be triggered via HTTP.

---

## 50.5 Container Orchestration with Kubernetes

For more control and portability, you can run your own containers on Kubernetes. Cloud providers offer managed Kubernetes:

- **Amazon EKS** (Elastic Kubernetes Service)
- **Google GKE** (Google Kubernetes Engine)
- **Azure AKS** (Azure Kubernetes Service)

You can deploy your prediction service as a deployment, expose it via a load balancer, and scale automatically.

**Example: Deploying NEPSE predictor on EKS**

First, build and push your Docker image to Amazon ECR. Then create a Kubernetes deployment:

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nepse-predictor
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nepse-predictor
  template:
    metadata:
      labels:
        app: nepse-predictor
    spec:
      containers:
      - name: predictor
        image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/nepse-predictor:latest
        ports:
        - containerPort: 8000
        env:
        - name: MODEL_PATH
          value: /app/model.pkl
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
```

```yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: nepse-predictor
spec:
  selector:
    app: nepse-predictor
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer
```

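Apply with `kubectl apply -f deployment.yaml -f service.yaml`. The cloud provider provisions a load balancer and exposes a public endpoint for the service: a DNS hostname on EKS, an IP address on GKE and AKS. Run `kubectl get service nepse-predictor` to see it.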

**Auto‑scaling with Horizontal Pod Autoscaler:**

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nepse-predictor-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nepse-predictor
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Kubernetes will scale the number of pods based on CPU usage.
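The scaling decision follows a simple rule documented for the Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of the observed metric to the target, rounded up, and no change is made while that ratio stays within a tolerance (0.1 by default). The function below is a simplified sketch of that rule; the real controller also accounts for pod readiness, missing metrics, and stabilisation windows. The function name and defaults are chosen to match the manifest above.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=2, max_replicas=10, tolerance=0.1):
    """Approximate the HPA calculation for a single resource metric."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band the autoscaler leaves the deployment alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

With three pods averaging 90% CPU against the 70% target, `desired_replicas(3, 90, 70)` returns 4. At 72% the ratio is inside the tolerance band, so the count stays at 3.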

---

## 50.6 Data Storage in the Cloud

Choosing the right storage for each component is crucial.

### 50.6.1 Object Storage (S3, GCS, Blob)

Use object storage for:

- Raw data (CSV files, Parquet)
- Model artifacts
- Feature store backup

**Example: Storing NEPSE data in S3 and reading with pandas**

```python
import boto3
import pandas as pd
from io import StringIO

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='nepse-data', Key='raw/nepse_2024.csv')
data = obj['Body'].read().decode('utf-8')
df = pd.read_csv(StringIO(data))
```

### 50.6.2 Relational Databases (RDS, Cloud SQL, Azure SQL)

Use for:

- Metadata about models, experiments
- User data, API keys
- Small transactional data

**Example: Connecting to PostgreSQL on RDS**

```python
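import os

import psycopg2

# Credentials come from the environment, never from source control
conn = psycopg2.connect(
    host=os.environ['DB_HOST'],
    dbname='nepsedb',
    user=os.environ['DB_USER'],
    password=os.environ['DB_PASSWORD']
)
try:
    # Pass values as parameters instead of formatting them into the SQL string
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM predictions WHERE symbol = %s", ('NABIL',))
        rows = cur.fetchall()
finally:
    conn.close()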
```

### 50.6.3 Data Warehouses (Redshift, BigQuery, Synapse)

For analytical queries on large historical datasets, a data warehouse is ideal. BigQuery, in particular, is serverless: there is no cluster to provision, and queries over terabytes of data typically complete in seconds to minutes.

**Example: Querying NEPSE data in BigQuery**

```sql
SELECT symbol, AVG(close) as avg_close
FROM `my-project.nepse_dataset.prices`
WHERE date >= '2024-01-01'
GROUP BY symbol
```

You can load data from GCS into BigQuery periodically.

### 50.6.4 Feature Stores (Feast on Cloud)

For production ML, a feature store ensures consistent feature computation between training and serving. Feast can be deployed on Kubernetes and use Redis (for online) and BigQuery (for offline).

---

## 50.7 Cost Management

Cloud costs can spiral if not managed. Here are key strategies:

- **Right‑sizing instances**: Use monitoring to identify under‑utilised resources and downsize.
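- **Spot/preemptible instances**: For non‑critical batch jobs (training, backtesting), use spot instances (AWS) or preemptible VMs (GCP), typically at a 60‑90% discount compared with on‑demand pricing. These instances can be reclaimed at short notice, so jobs should checkpoint their progress.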
- **Auto‑scaling**: Scale down to zero when not in use (e.g., development environments).
- **Storage lifecycle**: Move old data to cheaper tiers (S3 Glacier, GCS Coldline).
- **Reserved instances / savings plans**: Commit to 1‑3 years for steady workloads to get significant discounts.
- **Monitor and alert**: Set up budgets and alerts to notify when costs exceed thresholds.
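As an illustration of the storage lifecycle strategy, the sketch below builds an S3 lifecycle rule that moves objects under a prefix to Glacier after a number of days and applies it with `boto3`. The helper names, the `raw/` prefix, and the 90‑day threshold are assumptions chosen for the example.

```python
def glacier_lifecycle(prefix="raw/", days=90):
    """Build a lifecycle configuration that archives a prefix to Glacier."""
    rule_id = "archive-" + prefix.strip("/")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(bucket, configuration):
    """Apply the configuration to a bucket (replaces any existing rules)."""
    import boto3  # imported here so the builder can be used without AWS installed

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=configuration,
    )
```

Calling `apply_lifecycle('nepse-data', glacier_lifecycle())` would archive raw files older than 90 days. Because the call overwrites the bucket's existing lifecycle configuration, include every rule you want to keep.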

**Example: Using AWS Budgets to alert on cost**

```bash
aws budgets create-budget \
    --account-id 123456789012 \
    --budget file://budget.json \
    --notifications-with-subscribers file://subscribers.json
```

Budget JSON defines the amount and time period; subscribers get email alerts.
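The two JSON files can be generated with a short script. The sketch below writes a monthly cost budget and a notification that emails a subscriber once actual spend passes a percentage of the limit. The budget name, amount, threshold, and email address are placeholders, and the helper functions are hypothetical.

```python
import json
import os


def build_budget(name, monthly_limit_usd):
    """Return the budget definition expected by `--budget`."""
    return {
        "BudgetName": name,
        "BudgetLimit": {"Amount": str(monthly_limit_usd), "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    }


def build_notifications(email, threshold_percent=80):
    """Return the list expected by `--notifications-with-subscribers`."""
    return [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": threshold_percent,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }
    ]


def write_budget_files(name, monthly_limit_usd, email, directory="."):
    """Write budget.json and subscribers.json for the CLI command."""
    with open(os.path.join(directory, "budget.json"), "w") as fh:
        json.dump(build_budget(name, monthly_limit_usd), fh, indent=2)
    with open(os.path.join(directory, "subscribers.json"), "w") as fh:
        json.dump(build_notifications(email), fh, indent=2)
```

For instance, `write_budget_files('nepse-monthly', 100, 'ops@example.com')` produces files for a 100 USD monthly budget that alerts at 80% of actual spend.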

---

## 50.8 Multi‑Cloud and Hybrid Cloud

Some organisations adopt a multi‑cloud strategy to avoid vendor lock‑in or for redundancy. However, it adds complexity.

- **Multi‑cloud**: Run parts of the system on different clouds. For example, use AWS for training and GCP for BigQuery analytics.
- **Hybrid cloud**: Connect on‑premises data centres with cloud services. This might be necessary if data cannot leave a certain jurisdiction.

**Tools for multi‑cloud/hybrid:**

- **Kubernetes** (with federation) can run anywhere.
- **Terraform** can manage infrastructure across clouds.
- **Istio** can create a service mesh spanning clouds.

For the NEPSE system, unless you have specific requirements, starting with a single cloud is simpler and cheaper.

---

## 50.9 Cloud Best Practices

1. **Infrastructure as Code (IaC)**: Use Terraform, CloudFormation, or Deployment Manager to define your infrastructure. This makes it repeatable and auditable.
2. **Security**: Follow the principle of least privilege for IAM roles. Use VPCs to isolate resources.
3. **Monitoring**: Enable detailed monitoring for all services. Use cloud‑native tools (CloudWatch, Google Cloud Monitoring, Azure Monitor) and integrate with your central observability stack.
4. **Backup and disaster recovery**: Regularly back up databases and model artifacts to another region. Test recovery procedures.
5. **Tagging**: Tag all resources (e.g., `Project=NEPSE`, `Environment=Production`) for cost allocation and management.
6. **Use managed services where possible**: They reduce operational overhead. Only run your own Kubernetes if you need customisation.

---

## Chapter Summary

In this chapter, we explored the deployment of the NEPSE prediction system on major cloud platforms. We covered:

- The core services offered by AWS, GCP, and Azure for ML workloads.
- Architecture patterns for cloud‑based prediction systems.
- Using managed ML services (SageMaker, Vertex AI, Azure ML) to train and deploy models with minimal infrastructure.
- Serverless options for lightweight, intermittent inference.
- Running containerised applications on managed Kubernetes (EKS, GKE, AKS).
- Choosing appropriate data storage solutions for different parts of the pipeline.
- Cost management strategies to keep cloud bills under control.
- Considerations for multi‑cloud and hybrid cloud approaches.
- Best practices for security, monitoring, and infrastructure as code.

By leveraging the cloud, your NEPSE prediction system can scale with demand, store large volumes of data reliably, and remain highly available, provided the architecture and budget are designed for it. The same principles apply to any time‑series prediction system you build.

This chapter concludes **Part XI: Advanced Implementation Patterns**. In the next part, we will discuss **Industry Best Practices and Standards**, covering topics like development workflows, team collaboration, and project management for ML systems.

