## **📘Sema Sasa Overview**  

### **🌍Sema Sasa: Swahili AI for Speech & Text Processing**

#### **1. Introduction**
Sema Sasa is an **AI-powered linguistic platform** designed to promote Swahili as a **unifying language across Africa**. It provides:  
✅ **Speech-to-Text (ASR)** – Converts spoken Swahili into text  
✅ **Text Processing** – Paraphrasing & summarization  
✅ **Translation** – Swahili ↔ English (future expansion to other African languages)  
✅ **Conversational AI (Chatbot)** – Engaging in natural Swahili conversations  

#### **2. System Architecture**
Below is the architecture of **Sema Sasa**, showing how different models interact.

![System Architecture](system_architecture1.png)

📌 **Description of the Architecture:**  
- 🗣️ **User Input (Audio or Text)** → **ASR Model (wav2vec 2.0)** → **Text Processing** (Paraphrasing, Summarization)  
- 🔄 **Translation Model** (Swahili ↔ English) → **Chatbot (Rasa, DialoGPT)**  
- 🚀 **FastAPI Backend** connects all components for real-time interaction  
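
The flow described above can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: the class name `SemaPipeline`, its stage callables, and the stub lambdas are hypothetical and not taken from the project code. It also assumes that text input skips the ASR stage and that the stages run in the order listed above. The FastAPI layer is left out so the sketch runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class SemaPipeline:
    """Chains the four stages; every callable is a hypothetical stand-in for a real model."""
    asr: Callable[[bytes], str]        # audio bytes -> Swahili text
    process: Callable[[str], str]      # paraphrasing / summarization
    translate: Callable[[str], str]    # Swahili -> English
    chatbot: Callable[[str], str]      # text -> reply

    def run(self, text: Optional[str] = None, audio: Optional[bytes] = None) -> Dict[str, str]:
        if text is None and audio is None:
            raise ValueError("Provide either text or audio input")
        # Assumption: audio goes through ASR first, text input skips that stage.
        transcript = text if text is not None else self.asr(audio)
        processed = self.process(transcript)
        translation = self.translate(processed)
        reply = self.chatbot(translation)
        return {
            "transcript": transcript,
            "processed": processed,
            "translation": translation,
            "reply": reply,
        }


# Stub stages so the sketch runs without downloading any model.
pipeline = SemaPipeline(
    asr=lambda audio: "habari ya asubuhi",
    process=lambda text: text.strip().capitalize(),
    translate=lambda text: f"[en] {text}",
    chatbot=lambda text: f"Reply to: {text}",
)

result = pipeline.run(audio=b"\x00\x01")
print(result["reply"])
```

Keeping each stage behind a plain callable means a stub can later be swapped for a real model wrapper without changing the code that calls `run`.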

#### **3. Selected AI Models & Technologies**
Sema Sasa consists of **four core modules**:  

| **Component** | **Function** | **Model Used** |
|--------------|-------------|---------------|
| **ASR (Speech-to-Text)** | Converts Swahili speech into text | `wav2vec 2.0` |
| **Text Processing** | Paraphrasing & summarization | `mT5, BART, DistilBERT` |
| **Translation** | Swahili ↔ English | `NLLB-200, M2M-100, mBART` |
| **Chatbot** | Conversational AI in Swahili | `DialoGPT, Rasa` |
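
As an illustration of the translation row, here is a minimal sketch of Swahili ↔ English translation with an NLLB-200 checkpoint through the Hugging Face `transformers` translation pipeline. The checkpoint name `facebook/nllb-200-distilled-600M` and the helper names `nllb_direction` and `translate` are assumptions made for this example; the project may use a different checkpoint or one of the other listed models. NLLB-200 identifies languages with FLORES-200 codes, which are `swh_Latn` for Swahili and `eng_Latn` for English.

```python
from typing import Tuple

# FLORES-200 language codes used by NLLB-200 checkpoints.
NLLB_CODES = {"sw": "swh_Latn", "en": "eng_Latn"}


def nllb_direction(src: str, tgt: str) -> Tuple[str, str]:
    """Map short language tags to the codes NLLB-200 expects."""
    try:
        return NLLB_CODES[src], NLLB_CODES[tgt]
    except KeyError as err:
        raise ValueError(f"Unsupported language tag: {err.args[0]}") from None


def translate(text: str, src: str = "sw", tgt: str = "en",
              model_name: str = "facebook/nllb-200-distilled-600M") -> str:
    """Translate text; downloads the (assumed) checkpoint on first use."""
    from transformers import pipeline  # imported lazily: heavy dependency

    src_code, tgt_code = nllb_direction(src, tgt)
    translator = pipeline("translation", model=model_name,
                          src_lang=src_code, tgt_lang=tgt_code)
    return translator(text, max_length=256)[0]["translation_text"]


print(nllb_direction("sw", "en"))
```

Calling `translate("Habari ya asubuhi")` would load the model and return an English string. Building the pipeline inside the function keeps the sketch short; a real service would more likely create it once and reuse it.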

#### **4. Interactive Demo**
You can try out the **ASR system** in this notebook! 

##### 🗣 **Step 1: Upload a Swahili audio file**
Run the cell below and select a `.wav` file.

```python
import ipywidgets as widgets
from IPython.display import display

# File upload widget: accepts a single .wav file
upload = widgets.FileUpload(accept=".wav", multiple=False)
display(upload)
```



##### 🎙 **Step 2: Transcribe the audio**
Run the following cell to process and transcribe your Swahili audio.

```python
# Import required libraries
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForCTC

# Load the pre-trained Swahili ASR model (uses the GPU if one is available)
device = "cuda" if torch.cuda.is_available() else "cpu"
repo_name = "eddiegulay/wav2vec2-large-xlsr-mvc-swahili"
processor = AutoProcessor.from_pretrained(repo_name)
model = AutoModelForCTC.from_pretrained(repo_name).to(device)

TARGET_SAMPLE_RATE = 16000  # sample rate passed to the processor below


def transcribe_audio(audio_path):
    """Transcribe a Swahili audio file and return the text."""
    audio_input, sample_rate = torchaudio.load(audio_path)

    # Resample if the file was recorded at a different rate
    if sample_rate != TARGET_SAMPLE_RATE:
        resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=TARGET_SAMPLE_RATE)
        audio_input = resampler(audio_input)

    # Use the first channel only and convert it to a NumPy array for the processor
    input_dict = processor(audio_input[0].numpy(), sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt", padding=True)
    # Some processors return "input_features" instead of "input_values"
    input_values = input_dict["input_features"] if "input_features" in input_dict else input_dict["input_values"]
    input_values = input_values.to(device)

    # Run inference without tracking gradients
    with torch.no_grad():
        logits = model(input_values).logits

    # Greedy decoding: pick the most likely token for each frame
    pred_ids = torch.argmax(logits, dim=-1)[0]
    return processor.decode(pred_ids)


# Process the uploaded file
if upload.value:
    uploaded_file = upload.value[0]  # Get the first file
    file_path = f"./{uploaded_file.name}"

    # Save the file to disk so torchaudio can read it
    with open(file_path, "wb") as f:
        f.write(uploaded_file.content)

    # Perform transcription
    transcript = transcribe_audio(file_path)
    print(f"🎤 Transcription: {transcript}")
```

🎤 Transcription: kibaki ni mlezi wa shirika la chama cha viongozi wa sichana
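
The decoding step in `transcribe_audio` takes the most likely token per audio frame with `torch.argmax` and lets `processor.decode` turn those ids into text. The toy sketch below shows the idea behind greedy CTC decoding: collapse repeated ids, drop the blank token, then join the remaining characters. The vocabulary, the ids, and the function name `ctc_greedy_decode` are made up for illustration. The real tokenizer has its own vocabulary, though wav2vec 2.0 tokenizers commonly use the padding token as the CTC blank and `|` as the word delimiter.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame predictions, drop blanks, and join tokens into text."""
    # 1) Merge runs of identical consecutive ids into a single id
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2) Remove the blank token, which only separates repeated characters
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # 3) Join characters and turn the word delimiter into spaces
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical vocabulary and per-frame argmax ids
vocab = {0: "<pad>", 1: "|", 2: "s", 3: "e", 4: "m", 5: "a"}
frames = [2, 2, 0, 3, 4, 4, 0, 5, 1, 1, 2, 5, 0, 2, 2, 5]

print(ctc_greedy_decode(frames, vocab))  # prints "sema sasa"
```

The blank token is what allows genuine double letters: the frames `[2, 0, 2]` decode to `ss`, while `[2, 2]` decode to a single `s`.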


#### **5. Running Sema Sasa**
##### **Step 1: Install dependencies**
```bash
pip install -r requirements.txt
```
##### **Step 2: Run the interactive system**
```bash
python main.py
```
##### **Step 3: Use Jupyter Notebooks**
```bash
jupyter notebook
```

#### **6. Future Work**
🔹 **Expand to other African languages**  
🔹 **Improve accuracy for Swahili dialects**  
🔹 **Deploy as a web/mobile application**  

#### **7. Conclusion**
Sema Sasa brings **speech recognition, text processing, translation, and conversational AI** together for Swahili in a single platform. The project aims to contribute to **linguistic AI equity** in Africa! 🚀🌍  
