# Amazon Bedrock: A Serverless Foundation Model Service

## Table of Contents
1. [Introduction](#introduction)
2. [Key Features](#key-features)
3. [How It Works](#how-it-works)
4. [Supported Foundation Models](#supported-foundation-models)
5. [Next Steps](#next-steps)

## Introduction
Amazon Bedrock is a fully managed serverless service from AWS that provides API access to various foundation models from Amazon and third-party providers.

## Key Features
- <span style="color:blue">#serverless</span> Fully managed and serverless
- <span style="color:blue">#api</span> Unified API for multiple foundation models
- <span style="color:blue">#payperuse</span> Pay-per-use pricing model
- <span style="color:blue">#thirdparty</span> Access to third-party foundation models

## How It Works
1. Users access Bedrock via AWS Console, CLI, or SDK
2. API requests include prompt and configuration parameters
3. Bedrock routes requests to appropriate hosted foundation model
4. Foundation model processes request and returns response
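
The request flow above can be sketched in Python. This is a minimal sketch, assuming a client object that exposes an `invoke_model` method in the style of the boto3 `bedrock-runtime` client. The helper name `invoke_text_model` and the payload shape are illustrative only, since each model family defines its own request fields.

```python
import json


def invoke_text_model(client, model_id, payload):
    """Send one inference request and return the decoded JSON response.

    `client` is assumed to behave like a boto3 "bedrock-runtime" client.
    `payload` holds the prompt and configuration parameters (step 2).
    """
    response = client.invoke_model(
        modelId=model_id,                 # which hosted model to route to (step 3)
        body=json.dumps(payload),         # prompt + parameters as a JSON string
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream-like object; read and decode it (step 4).
    return json.loads(response["body"].read())
```

With boto3 installed and model access granted, the client could be created with `boto3.client("bedrock-runtime", region_name="us-east-1")`; the model ID and payload fields depend on the chosen model.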

## Supported Foundation Models
- <span style="color:blue">#amazon</span> Amazon Titan: General purpose model
- <span style="color:blue">#ai21labs</span> AI21 Labs Jurassic-2: Multilingual text generation
- <span style="color:blue">#anthropic</span> Anthropic Claude: Question answering and workflow automation
- <span style="color:blue">#stability</span> Stability AI Stable Diffusion: Image generation

## Next Steps
- Explore model selection criteria
- Learn about configuration parameters
- Dive deeper into specific use cases for each model

# Amazon Bedrock Console Walkthrough

## Table of Contents
1. [Accessing Amazon Bedrock](#accessing-amazon-bedrock)
2. [Requesting Model Access](#requesting-model-access)
3. [Console Overview](#console-overview)
4. [Foundation Models](#foundation-models)
5. [Understanding Max Tokens](#understanding-max-tokens)

## Accessing Amazon Bedrock
- <span style="color:blue">#console</span> Access Amazon Bedrock through AWS console
- <span style="color:blue">#region</span> Service available in a limited set of regions (e.g., US East (N. Virginia), us-east-1)

## Requesting Model Access
- <span style="color:blue">#modelaccess</span> Navigate to "Model Access" in the sidebar
- <span style="color:blue">#requestaccess</span> Select desired models and submit access request
- <span style="color:blue">#waittime</span> Access granted within minutes to hours

## Console Overview
- <span style="color:blue">#overview</span> Provides details on foundation models
- <span style="color:blue">#playground</span> Test models before enterprise implementation
- <span style="color:blue">#handsonlabs</span> Available for practical learning

## Foundation Models
- <span style="color:blue">#providers</span> Models from various providers (Anthropic, Cohere, AI21 Labs, etc.)
- <span style="color:blue">#modalities</span> Three main modalities: text, embeddings, and image
- <span style="color:blue">#versions</span> Different versions available for some models

## Understanding Max Tokens
- <span style="color:blue">#maxtokens</span> Limit on combined input and output tokens
- <span style="color:blue">#tokencount</span> Approximately 4 characters or 0.75 words per token
- <span style="color:blue">#examples</span>
  - Anthropic Claude: 100K tokens (≈75,000 words)
  - Cohere: 4000 tokens (≈3000 words)
  - Stability AI Stable Diffusion: 77 tokens (≈58 words for input prompt)
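
The rule of thumb above (about 4 characters or 0.75 words per token) can be turned into a rough estimator. This is only a sketch: the function names are made up for illustration, and real tokenizers differ by model, so treat the results as approximations.

```python
WORDS_PER_TOKEN = 0.75   # rule of thumb from the notes above
CHARS_PER_TOKEN = 4      # rule of thumb from the notes above


def estimate_words_from_tokens(token_count):
    """Approximate how many English words fit in a token budget."""
    return round(token_count * WORDS_PER_TOKEN)


def estimate_tokens_from_words(word_count):
    """Approximate how many tokens a given number of words will use."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_tokens_from_chars(text):
    """Approximate the token count of a string from its character length."""
    return max(1, round(len(text) / CHARS_PER_TOKEN))
```

For example, `estimate_words_from_tokens(100_000)` gives 75000 and `estimate_words_from_tokens(77)` gives 58, matching the figures listed above.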

# Amazon Bedrock Advanced Features

## Table of Contents
1. [Custom Models](#custom-models)
2. [Model Providers](#model-providers)
3. [Playground](#playground)
4. [Provisioned Throughput](#provisioned-throughput)
5. [Examples](#examples)

## Custom Models
- <span style="color:blue">#finetune</span> Fine-tune base models for specific domains
- <span style="color:blue">#process</span> Select source model, provide labeled data, configure hyperparameters
- <span style="color:blue">#availability</span> Currently limited to Titan Express and Titan Lite models
- <span style="color:blue">#inputformat</span> JSON format for input and expected output
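
As a hedged illustration of the training data format, the sketch below writes labeled examples as JSON Lines, one object per line. The field names `prompt` and `completion` and the helper `to_jsonl` are assumptions for illustration; check the documentation of the source model for the exact schema it expects.

```python
import json


def to_jsonl(records):
    """Convert (input, expected_output) pairs into a JSON Lines string.

    The "prompt"/"completion" keys are assumed field names, not confirmed
    by these notes.
    """
    lines = []
    for prompt, completion in records:
        lines.append(json.dumps({"prompt": prompt, "completion": completion}))
    return "\n".join(lines) + "\n"
```

The resulting text would typically be saved to a file and uploaded to S3 before starting a fine-tuning job.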

## Model Providers
- <span style="color:blue">#details</span> Provides information on each provider and their models
- <span style="color:blue">#usecases</span> Lists popular use cases for each model
- <span style="color:blue">#examples</span> Offers example prompts and responses

## Playground
- <span style="color:blue">#testing</span> Environment to test models and adjust inference parameters
- <span style="color:blue">#models</span> Includes text, chat, and image generation models
- <span style="color:blue">#configuration</span> Allows adjustment of inference settings like temperature

## Provisioned Throughput
- <span style="color:blue">#customdeployment</span> Required for deploying custom models
- <span style="color:blue">#cost</span> Significantly more expensive than serverless base model usage
- <span style="color:blue">#pricing</span> Based on model units and commitment period

## Examples
- <span style="color:blue">#usecase</span> AWS-provided examples for various tasks
- <span style="color:blue">#categories</span> Includes content generation, entity extraction, image creation, etc.
- <span style="color:blue">#exploration</span> Useful for understanding common applications of the models

## Key Points
- <span style="color:blue">#costaware</span> Be mindful of costs, especially with provisioned throughput
- <span style="color:blue">#exploration</span> Playground and examples are useful for testing and learning
- <span style="color:blue">#modelselection</span> Consider specific use cases when choosing models

# Amazon Bedrock Architecture (Part 1)

## Table of Contents
1. [AWS Service Deployment Modes](#aws-service-deployment-modes)
2. [Bedrock Service Architecture](#bedrock-service-architecture)
3. [API Request Methods](#api-request-methods)
4. [Request Flow](#request-flow)
5. [Networking Options](#networking-options)

## AWS Service Deployment Modes
- <span style="color:blue">#customerVPC</span> Services deployed in customer's VPC (e.g., EC2, RDS)
- <span style="color:blue">#awsManaged</span> Services deployed in AWS-managed accounts (e.g., Lambda, API Gateway)

## Bedrock Service Architecture
- <span style="color:blue">#serviceAccount</span> Bedrock deployed in AWS-managed Bedrock Service Account
- <span style="color:blue">#modelProvider</span> Foundation models hosted in AWS-owned Model Provider Escrow Account
- <span style="color:blue">#storage</span> Base models stored in S3 buckets within AWS-managed account

## API Request Methods
1. <span style="color:blue">#browser</span> Through web browser
2. <span style="color:blue">#awsCLI</span> Using AWS CLI
3. <span style="color:blue">#apiRequest</span> Via AWS services (e.g., Lambda)

## Request Flow
1. <span style="color:blue">#requestInitiation</span> User initiates request
2. <span style="color:blue">#runtimeInference</span> Bedrock service determines appropriate foundation model
3. <span style="color:blue">#modelExecution</span> Request sent to specific model provider account
4. <span style="color:blue">#responseReturn</span> Response returned to user

## Networking Options
- <span style="color:blue">#internet</span> API requests over the internet
- <span style="color:blue">#vpcEndpoint</span> Private connectivity via VPC interface endpoints

## Additional Features
- <span style="color:blue">#promptHistory</span> Stores queries made through console for experimentation

## VPC Endpoint Setup
1. Navigate to VPC dashboard
2. Select "Endpoints"
3. Create new endpoint
4. Search for "bedrock"
5. Select the interface endpoint for Bedrock service
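
The same setup can be scripted. The sketch below only builds the arguments for the EC2 `create_vpc_endpoint` call. The service name pattern `com.amazonaws.<region>.bedrock-runtime` is an assumption to verify in the console search from step 4, and the helper name and all IDs are hypothetical placeholders.

```python
def bedrock_endpoint_params(region, vpc_id, subnet_ids, security_group_ids):
    """Build keyword arguments for an interface endpoint to Bedrock.

    The ServiceName pattern is assumed; confirm it in the VPC console.
    """
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.bedrock-runtime",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": True,  # resolve the public hostname privately
    }
```

With boto3, these arguments could be passed as `boto3.client("ec2").create_vpc_endpoint(**params)`.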

# Foundation Model Inference Parameters

## Table of Contents
1. [Classification of Inference Parameters](#classification-of-inference-parameters)
2. [Randomness and Diversity](#randomness-and-diversity)
3. [Temperature](#temperature)
4. [Top K](#top-k)
5. [Top P](#top-p-nucleus-sampling)
6. [Practical Examples](#practical-examples)

## Classification of Inference Parameters
<span style="color:blue">#inferenceParameters</span>
- Randomness and Diversity: Temperature, Top K, Top P
- Response Length: Response length, Length penalty, Stop sequence
- Repetition: Repetition penalty

## Randomness and Diversity
<span style="color:blue">#randomnessAndDiversity</span>
- Most important parameter category
- Influences the variety and unpredictability of responses
- Example prompt: "I hear the hoofbeats of..."
- Model generates multiple possible word completions with associated probabilities

## Temperature
<span style="color:blue">#temperature</span>
- Range: Typically 0 to 1 or 0 to 5, depending on the model
- Function:
  - Low values (close to 0): Favor high-probability words, less random
  - High values: Favor more diverse, potentially less probable words
- Example:
  - Low temperature might consistently output "horse"
  - High temperature might output less common completions such as "change in distance"
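
One common way to implement temperature is to divide the model's raw scores (logits) by the temperature before applying softmax. The sketch below assumes that approach; how any particular foundation model applies temperature internally is not confirmed here, and the scores used in examples are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Treat 0 as greedy decoding: all probability on the top-scoring word.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With hypothetical scores `[2.0, 1.0, 0.1]`, a temperature of 0.2 puts almost all probability on the first word, while a temperature of 5 spreads it much more evenly across the three.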

## Top K
<span style="color:blue">#topK</span>
- Definition: Limits selection to top K most probable words
- Range: Varies by model (e.g., 1 to 500 for Cohere)
- Function:
  - Filters out less probable words before applying temperature
  - Example: If K=3, only considers top 3 most probable words
- Interaction with temperature:
  - Low temperature: Selects highest probability word from top K
  - High temperature: May select lower probability words from top K
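
The filtering step can be sketched as follows. This is an illustrative implementation, not the code any specific model uses: it keeps the K most probable words and renormalizes their probabilities so they sum to 1. The word probabilities in the usage note are hypothetical.

```python
def top_k_filter(probs, k):
    """Keep the k most probable words and renormalize.

    `probs` maps each candidate word to its probability.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:k]
    total = sum(p for _, p in kept)
    return {word: p / total for word, p in kept}
```

For example, with hypothetical probabilities `{"horses": 0.4, "zebras": 0.25, "unicorns": 0.2, "ponies": 0.1, "drums": 0.05}` and `k=3`, only "horses", "zebras", and "unicorns" remain as candidates.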

## Top P (Nucleus Sampling)
<span style="color:blue">#topP</span>
- Definition: Caps choices based on cumulative probability
- Range: Typically 0.01 to 0.99
- Function:
  - Selects words until cumulative probability exceeds threshold
  - Example: If P=0.5, selects words until their combined probability > 0.5
- Provides dynamic cutoff compared to fixed Top K
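
A minimal sketch of nucleus sampling's cutoff, following the description above: words are added in order of probability until the running total exceeds P. The function name and example probabilities are illustrative, and real implementations may differ in how they treat the boundary case.

```python
def top_p_filter(probs, p):
    """Keep the most probable words until their cumulative probability exceeds p.

    `probs` maps each candidate word to its probability.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob
        if cumulative > p:
            break
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}
```

Unlike Top K, the number of surviving words depends on the shape of the distribution: a confident model keeps few candidates, an uncertain one keeps many.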

## Practical Examples
<span style="color:blue">#practicalExamples</span>
- Demonstrated using Amazon Bedrock console
- Model used: Cohere
- Tests performed:
  1. Low temperature (0): Consistently output "horse" or "horses"
  2. High temperature (5): Expected diverse outputs, but results varied
  3. Adjusting Top K: Set to 45, limiting word choices
  4. Modifying Top P: Set to 0.64, affecting probability cutoff
- Observations: Changing parameter combinations led to varied responses
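
The console settings from these tests could be expressed as a request body. The sketch below assumes the Cohere text model accepts the fields `prompt`, `temperature`, `k`, `p`, and `max_tokens`; the helper name is hypothetical, and the ranges checked are the ones quoted in these notes, so verify both against the model's documentation.

```python
import json


def build_cohere_body(prompt, temperature=0.0, k=45, p=0.64, max_tokens=100):
    """Build a JSON request body with randomness and diversity settings."""
    if not 0 <= temperature <= 5:
        raise ValueError("temperature is expected to be between 0 and 5")
    if not 0 <= k <= 500:
        raise ValueError("k is expected to be at most 500")
    if not 0.01 <= p <= 0.99:
        raise ValueError("p is expected to be between 0.01 and 0.99")
    return json.dumps({
        "prompt": prompt,
        "temperature": temperature,
        "k": k,
        "p": p,
        "max_tokens": max_tokens,
    })
```

Calling `build_cohere_body("I hear the hoofbeats of", temperature=5)` reproduces the high-temperature test with Top K at 45 and Top P at 0.64.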

## Key Points
<span style="color:blue">#keyPoints</span>
- Parameter interaction: Temperature, Top K, and Top P work together to affect final output
- Model variation: Parameter ranges and effects can differ between foundation models
- Experimentation: Adjusting parameters can lead to diverse responses for the same prompt
- Practical application: Understanding these parameters helps in fine-tuning model outputs for specific use cases

# Foundation Model Inference Parameters: Length and Repetition

## Table of Contents
1. [Length Parameters](#length-parameters)
2. [Repetition Parameters](#repetition-parameters)
3. [Practical Examples](#practical-examples)

## Length Parameters
<span style="color:blue">#lengthParameters</span>

### Max Length
- <span style="color:blue">#maxLength</span>
- Definition: Controls the length of the generated response
- Range: Typically 1 to 4,096 tokens (varies by model)
- Importance:
  - Helps control the cost of API usage
  - 1 token ≈ 4 characters ≈ 0.75 words
- Example: 4,096 tokens ≈ 3,000 words
- Pricing impact:
  - Pricing often based on per 1,000 tokens
  - E.g., AI21 Labs: $0.0125 per 1,000 input tokens

### Stop Sequence
- <span style="color:blue">#stopSequence</span>
- Function: Stops token generation when specified keyword is encountered
- Usage:
  - Can define up to 4 sequences
  - Generated text does not include the stop sequence
- Application: Useful for controlling response format or length
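
The effect of a stop sequence can be imitated on already generated text. This sketch is only an illustration of the behavior described above (the model itself stops generating; here we truncate after the fact), and the function name is made up.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest stop sequence, excluding the sequence itself."""
    if len(stop_sequences) > 4:
        raise ValueError("at most 4 stop sequences are allowed")
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty strings, which would match everywhere
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]
```

For instance, `truncate_at_stop("Horses are fast. END More text", [" END"])` keeps only `"Horses are fast."`.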

## Repetition Parameters
<span style="color:blue">#repetitionParameters</span>

### Presence Penalty
- <span style="color:blue">#presencePenalty</span>
- Range: 0 to 5
- Function: Higher values reduce probability of repeating tokens from prompt or completion

### Count Penalty
- <span style="color:blue">#countPenalty</span>
- Range: 0 to 1
- Function: Higher values lower probability of word repetition, proportional to appearances

### Frequency Penalty
- <span style="color:blue">#frequencyPenalty</span>
- Range: 0 to 500
- Function: Higher values reduce probability of repeating tokens, normalized to text length

### Penalize Special Token
- <span style="color:blue">#penalizeSpecialToken</span>
- Function: Reduces probability of repeating special characters (e.g., whitespace, punctuation)
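
As a sketch of how the length and repetition settings might be sent together, the helper below builds a request body in the style used by AI21 Labs Jurassic-2 models. The field names (`maxTokens`, `stopSequences`, and the `scale` key inside each penalty) are assumptions to check against the model's documentation; the helper name is hypothetical, and the ranges are the ones listed above.

```python
import json


def build_jurassic_body(prompt, max_tokens=200, presence=0.0, count=0.0,
                        frequency=0.0, stop_sequences=()):
    """Build a JSON request body with length and repetition parameters."""
    if not 0 <= presence <= 5:
        raise ValueError("presence penalty is expected to be between 0 and 5")
    if not 0 <= count <= 1:
        raise ValueError("count penalty is expected to be between 0 and 1")
    if not 0 <= frequency <= 500:
        raise ValueError("frequency penalty is expected to be between 0 and 500")
    if len(stop_sequences) > 4:
        raise ValueError("at most 4 stop sequences are allowed")
    return json.dumps({
        "prompt": prompt,
        "maxTokens": max_tokens,
        "stopSequences": list(stop_sequences),
        "presencePenalty": {"scale": presence},
        "countPenalty": {"scale": count},
        "frequencyPenalty": {"scale": frequency},
    })
```

Lowering `max_tokens` shortens the response and reduces the output tokens billed for the request.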

## Practical Examples
<span style="color:blue">#practicalExamples</span>

### Length Parameter Demo
- Used AI21 Labs model in Amazon Bedrock console
- Prompt: "Write an essay on horse"
- Test 1:
  - Max length: 200 tokens
  - Result: Approximately 14 lines of text
- Test 2:
  - Max length: 90 tokens
  - Result: Response reduced to 2 lines

### Repetition Parameter Availability
- Not available in all models (e.g., absent in Cohere and Anthropic models)
- Available in AI21 Labs model
- Console interface shows options for presence penalty, count penalty, frequency penalty, and special token penalization

## Key Points
<span style="color:blue">#keyPoints</span>
- Length parameters crucial for controlling response size and API costs
- Repetition parameters help in generating more diverse and less repetitive text
- Parameter availability and ranges can vary significantly between different foundation models
- Experimentation with these parameters can help in fine-tuning model outputs for specific use cases

# Amazon Bedrock Pricing

## Table of Contents
1. [Pricing Modes](#pricing-modes)
2. [On-Demand Pricing](#on-demand-pricing)
3. [Provisioned Throughput](#provisioned-throughput)
4. [Pricing Example](#pricing-example)

## Pricing Modes
<span style="color:blue">#pricingModes</span>
- Two main modes:
  1. On-demand mode
  2. Provisioned throughput

## On-Demand Pricing
<span style="color:blue">#onDemandPricing</span>

### General Characteristics
- Pay-as-you-go model
- No time-based or unit-based commitments

### Text Generation Models
<span style="color:blue">#textPricing</span>
- Charged for both input and output tokens
- Example (AI21 Labs Jurassic-2):
  - $0.0125 per 1000 input tokens
  - $0.0125 per 1000 output tokens
- Token to word ratio: 1000 tokens ≈ 750 words

### Image Generation Models
<span style="color:blue">#imagePricing</span>
- Charged per image generated
- Example (Stability AI Stable Diffusion):
  - $0.018 per image (512x512 or smaller)
  - $0.036 per image (larger than 512x512)

### Embeddings
<span style="color:blue">#embeddingsPricing</span>
- Charged for input tokens processed
- Example (Cohere Command):
  - $0.0015 per 1000 input tokens

## Provisioned Throughput
<span style="color:blue">#provisionedThroughput</span>

### Use Cases
1. Large, consistent inference workloads needing guaranteed throughput
2. Deployment of custom models

### Characteristics
- Requires a time commitment (1 month or 6 months)
- Significantly more expensive than on-demand pricing

### Pricing Examples
- Anthropic:
  - $40 per hour per model unit (1-month commitment)
  - $22 per hour per model unit (6-month commitment)
- Amazon Titan Text Lite:
  - Lower pricing compared to third-party models
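
To see why provisioned throughput is described as expensive, the hourly rates above can be converted to a monthly figure. This is a rough sketch that assumes a 30-day month, continuous billing, and a single unit; the function name is illustrative.

```python
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month


def provisioned_monthly_cost(hourly_rate, model_units=1):
    """Approximate monthly cost in dollars for continuously billed capacity."""
    return hourly_rate * HOURS_PER_MONTH * model_units
```

Under these assumptions, the $40 hourly rate comes to about $28,800 per month and the $22 hourly rate to about $15,840 per month.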

## Pricing Example
<span style="color:blue">#pricingExample</span>

### Scenario
- Using AI21 Jurassic-2 Mid model
- Input: 10,000 tokens (≈7,500 words)
- Output: 2,000 tokens (≈1,500 words)

### Calculation
- Input cost: (10,000 / 1000) * $0.0125 = $0.125
- Output cost: (2,000 / 1000) * $0.0125 = $0.025
- Total cost: $0.125 + $0.025 = $0.15
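
The calculation above generalizes to any token counts and per-1,000-token prices. The sketch below uses a made-up function name and takes the prices as arguments, since actual rates vary by model and change over time.

```python
def on_demand_text_cost(input_tokens, output_tokens,
                        input_price_per_1k, output_price_per_1k):
    """Return the on-demand cost in dollars for one text generation workload."""
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# The scenario above: 10,000 input tokens and 2,000 output tokens
# at $0.0125 per 1,000 tokens in each direction (about $0.15 in total).
scenario_cost = on_demand_text_cost(10_000, 2_000, 0.0125, 0.0125)
```

Because input and output are priced separately, long prompts with short answers and short prompts with long answers can cost quite different amounts on models where the two rates differ.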

## Key Points
<span style="color:blue">#keyPoints</span>
- On-demand pricing is very cost-effective for most use cases
- Provisioned throughput is significantly more expensive but offers guaranteed performance
- Custom models require provisioned throughput
- Actual costs can be very low for typical text processing tasks
- Consider input and output token counts when estimating costs