# Lesson: Exploring OCI Speech – Speech-to-Text in Oracle Cloud Infrastructure

## Introduction
Welcome to this lesson on **OCI Speech**, Oracle’s speech-to-text service in **Oracle Cloud Infrastructure (OCI)**.  
In this session, we’ll explore how OCI Speech works, what makes it powerful, and how it can transform spoken content into high-quality, readable text.

OCI Speech leverages **advanced deep learning models** and **Oracle’s acoustic and language models** to automatically transcribe audio and video files into accurate, timestamped text. The service is designed for developers and enterprises who want to extract textual data from multimedia content **without requiring data science expertise**.

---

## What is OCI Speech?
**OCI Speech** is an AI-powered service that converts **speech into text** with high accuracy. It unlocks the data contained in audio or video recordings by transcribing spoken words into written text.  

This transcription capability enables organizations to:
- Make media content **searchable** and **analyzable**
- Generate **captions and subtitles**
- Improve **accessibility** and **automation**
- Streamline **data processing pipelines**

The service processes audio directly from **Object Storage** and outputs structured text with **timestamps** and **confidence scores** for each word or segment.

---

## Key Features of OCI Speech

### 1. **Automatic Transcription**
OCI Speech automatically converts speech in audio or video files into grammatically correct text using deep learning.  
No machine learning or data science background is required — simply upload your files and retrieve the transcriptions.

---

### 2. **Multi-Language Support**
The service currently supports multiple languages:
- **English**
- **Spanish**
- **Portuguese**  
*(More languages are planned for future updates.)*

This multilingual capability enables global accessibility for developers and enterprises working with international audio sources.

---

### 3. **Batch Processing**
OCI Speech includes **batching support**, meaning multiple audio or video files can be transcribed in a single API call.  
This feature streamlines workflows and increases efficiency when processing large volumes of media data.
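
As a rough illustration of batching, the sketch below builds one request body that lists several files from the same bucket. The field names, the `en-US` language code, and the bucket and file names are assumptions made for this example and should be checked against the OCI Speech API reference; the code only constructs a dictionary and does not call the service.

```python
import json


def build_batch_job_request(namespace, bucket, object_names, output_prefix,
                            language_code="en-US"):
    """Build one request body that covers several audio or video files.

    All field names below are assumptions for illustration; verify them
    against the OCI Speech API reference before use.
    """
    if not object_names:
        raise ValueError("at least one object name is required")
    return {
        "inputLocation": {
            "namespaceName": namespace,
            "bucketName": bucket,
            # Every file listed here would be transcribed by the same job.
            "objectNames": list(object_names),
        },
        "outputLocation": {
            "namespaceName": namespace,
            "bucketName": bucket,
            "prefix": output_prefix,
        },
        "modelDetails": {"languageCode": language_code},
    }


# Hypothetical namespace, bucket, and file names.
request = build_batch_job_request(
    namespace="example-namespace",
    bucket="example-audio-bucket",
    object_names=["call-001.wav", "call-002.wav", "webinar.mp4"],
    output_prefix="transcripts/",
)
print(json.dumps(request, indent=2))
```

Listing the files in one request keeps the job configuration (language, output location) in a single place instead of repeating it per file.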

---

### 4. **High-Speed Processing**
One of OCI Speech’s most impressive strengths is its **speed**.  
It can transcribe **hours of audio in under 10 minutes**.

It achieves this through **parallel processing**:
- The system **divides audio into smaller chunks**.  
- Each chunk is transcribed independently using Oracle’s models.  
- The service then **reassembles the transcribed segments** into a single, coherent output file.

Because the chunks are processed at the same time, total turnaround drops sharply without sacrificing transcription quality.
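
The split, transcribe, and reassemble steps can be sketched with Python's standard `concurrent.futures` module. This is a toy model of the idea, not Oracle's implementation: `transcribe_chunk` is a hypothetical stand-in that returns placeholder text, and the 30-second chunk size is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(duration_s, chunk_s):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def transcribe_chunk(chunk):
    """Hypothetical stand-in for a speech model; returns a placeholder segment."""
    start, end = chunk
    return {"start": start, "end": end,
            "text": f"[speech {start:.0f}-{end:.0f}s]"}


def transcribe_in_parallel(duration_s, chunk_s=30.0, workers=4):
    """Transcribe chunks concurrently, then join them in their original order."""
    chunks = split_into_chunks(duration_s, chunk_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so segments stay in sequence.
        segments = list(pool.map(transcribe_chunk, chunks))
    return " ".join(seg["text"] for seg in segments), segments


text, segments = transcribe_in_parallel(duration_s=95.0)
print(text)
```

A production system would also need to avoid cutting a chunk in the middle of a word; this sketch ignores that and splits on fixed time boundaries.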

---

### 5. **Confidence Scoring**
OCI Speech provides:
- A **confidence score per word**, indicating how certain the model is about each transcription.  
- An **overall confidence score per transcript**, summarizing the reliability of the result.  

These scores help developers assess transcription accuracy and apply post-processing or verification steps as needed.
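
One possible post-processing step is to flag words whose confidence falls below a threshold so that a person can review them. The token structure and the 0.80 threshold below are assumptions for illustration; the actual output schema may differ.

```python
def flag_low_confidence(tokens, threshold=0.80):
    """Return the tokens whose confidence falls below the threshold."""
    return [t for t in tokens if t["confidence"] < threshold]


def mean_confidence(tokens):
    """Average per-word confidence; a rough proxy, not the service's own score."""
    if not tokens:
        return 0.0
    return sum(t["confidence"] for t in tokens) / len(tokens)


# Hypothetical, simplified output; the real response schema may differ.
tokens = [
    {"word": "Welcome", "confidence": 0.98},
    {"word": "to", "confidence": 0.99},
    {"word": "Oracle", "confidence": 0.71},
    {"word": "Speech", "confidence": 0.92},
]

for token in flag_low_confidence(tokens):
    print(f"review: {token['word']} ({token['confidence']:.2f})")
print(f"mean confidence: {mean_confidence(tokens):.2f}")
```

The threshold is a tuning choice: a higher value sends more words to review, a lower one trusts the model more.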

---

### 6. **Automatic Punctuation**
The service **automatically punctuates** transcriptions, improving readability and making the text easier for downstream systems to process.  
For example, spoken sentences like:

> “welcome to oracle speech today we’re going to talk about ai models”  

are transcribed as:

> “Welcome to Oracle Speech. Today, we’re going to talk about AI models.”

---

### 7. **SRT File Support (Closed Captions)**
OCI Speech supports the **SRT (SubRip) file format**, one of the most widely used formats for closed captions in video content.  
This feature allows users to:
- Generate **caption files** for videos automatically.  
- Integrate these captions into **media players** or **streaming platforms**.  
- Improve accessibility for audiences with hearing impairments.
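
To show what an SRT file contains, the sketch below renders timestamped segments as numbered cues in the `HH:MM:SS,mmm --> HH:MM:SS,mmm` layout. The service can produce SRT output itself; this example only illustrates the format, and the segment tuples are made-up sample data.

```python
def format_srt_time(seconds):
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render (start_s, end_s, text) tuples as SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Made-up sample segments for illustration.
sample_segments = [
    (0.0, 2.5, "Welcome to Oracle Speech."),
    (2.5, 6.0, "Today, we're going to talk about AI models."),
]
print(to_srt(sample_segments))
```

Each cue has a sequence number, a time range, and the caption text, which is why timestamped transcripts map naturally onto this format.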

---

### 8. **Text Normalization**
OCI Speech applies **text normalization**, converting raw, literal transcriptions into more human-readable forms.  
It normalizes elements such as:
- **Numbers** (e.g., “twenty one” → “21”)  
- **Addresses**  
- **Times**  
- **URLs**

This ensures the final transcript resembles natural written language rather than a word-for-word transcription of speech.

Example:

| Raw Transcription | Normalized Output |
|-------------------|------------------|
| “call me at one two three main street at five o’clock” | “Call me at 123 Main Street at 5:00.” |
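
The idea behind number normalization can be illustrated with a deliberately small, rule-based sketch that rewrites spelled-out numbers from 0 to 99 as digits. This is not how OCI Speech implements normalization; it is a toy under simple assumptions, and it would not handle the address example above, which needs context to know that "one two three" is a single house number.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def normalize_numbers(text):
    """Replace spelled-out numbers from 0 to 99 with digits."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        word = words[i].lower()
        if word in TENS:
            value = TENS[word]
            # Absorb a following unit from 1 to 9, as in "twenty one" -> 21.
            if i + 1 < len(words) and 1 <= UNITS.get(words[i + 1].lower(), 0) <= 9:
                value += UNITS[words[i + 1].lower()]
                i += 1
            out.append(str(value))
        elif word in UNITS:
            out.append(str(UNITS[word]))
        else:
            out.append(words[i])
        i += 1
    return " ".join(out)


print(normalize_numbers("I am twenty one years old"))
```

Words with attached punctuation (such as "five," with a comma) are left unchanged by this sketch, which is one reason real normalizers are considerably more involved.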

---

### 9. **Profanity Filtering**
OCI Speech includes built-in **profanity filtering** for sensitive content.  
It offers three handling modes:

| Mode | Description | Example |
|------|--------------|----------|
| **Remove** | Replaces profane words with asterisks | “That was **** amazing.” |
| **Mask** | Keeps the first letter, hides the rest | “That was f*** amazing.” |
| **Tag** | Leaves the word in the text and flags it in the output metadata | Transcript text is unchanged; the word is marked as profanity in the output data |

These modes let you match profanity handling to the use case, from hiding words entirely to flagging them for review.
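
The three modes in the table above can be mimicked with a short sketch. It is an illustration only: the function name, the blocklist, the lowercase mode strings, and the tag structure are all hypothetical, and the mild placeholder word "darn" stands in for real profanity.

```python
import re


def filter_profanity(text, blocklist, mode):
    """Apply remove, mask, or tag handling to blocklisted words.

    Returns (filtered_text, tags); tags is only populated in "tag" mode.
    """
    if mode not in {"remove", "mask", "tag"}:
        raise ValueError(f"unknown mode: {mode}")
    blocked = {w.lower() for w in blocklist}
    tags = []

    def handle(match):
        word = match.group(0)
        if word.lower() not in blocked:
            return word
        if mode == "remove":
            # Replace the whole word with asterisks.
            return "*" * len(word)
        if mode == "mask":
            # Keep the first letter, hide the rest.
            return word[0] + "*" * (len(word) - 1)
        # Tag mode: leave the text alone and record where the word occurred.
        tags.append({"word": word, "offset": match.start()})
        return word

    return re.sub(r"[A-Za-z']+", handle, text), tags


sentence = "That was darn amazing."
for mode in ("remove", "mask", "tag"):
    print(mode, filter_profanity(sentence, {"darn"}, mode))
```

Returning the tags separately from the text mirrors the idea that tagging leaves the transcript readable while still letting downstream code act on flagged words.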

---

## Summary
OCI Speech is a **fast, accurate, and developer-friendly** service that transforms spoken words into structured, readable text.  
It’s ideal for applications such as:
- Media transcription  
- Accessibility and captioning  
- Call center analytics  
- Meeting and lecture documentation  

In this lesson, you learned how **OCI Speech**:
- Converts audio and video files into timestamped text  
- Supports multiple languages  
- Provides confidence scoring, normalization, and profanity filtering  
- Generates SRT captions for video content  

By leveraging **OCI Speech**, organizations can unlock the full potential of their audio data — efficiently, accurately, and securely within Oracle Cloud Infrastructure.

