# Assignment AI Agent for Money Requests

### Introduction

Welcome to the **AI Agent for Money Requests**, a cutting-edge application designed to streamline your project funding requests. This app leverages advanced AI technologies, including:

- **Speech-to-Text Conversion**: Powered by Whisper, enabling you to submit requests via voice commands.
- **Entity Extraction**: Utilizing a fine-tuned spaCy model to identify key details such as project name, amount, and reason from your input.
- **Database Integration**: Effortlessly store and retrieve your requests with the built-in database functionality.

### Key Features:
1. **Text and Voice Input**: Submit your funding requests through text or voice commands for maximum convenience.
2. **Entity Recognition**: Automatically extract and validate essential information like project name, amount, and reason.
3. **Database Records**: View all your previously submitted requests in an easy-to-read format.
4. **User-Friendly Interface**: A clean and intuitive design to make your experience seamless.

This application simplifies the process of managing project funding requests, ensuring accuracy, efficiency, and a user-friendly interaction. Start exploring today!


# Tools and Requirements

### Tools Used
1. **[Whisper](https://github.com/openai/whisper)**  
   - **Purpose**: Converts voice input to text for speech-to-text functionality.  
   - **Reason**: Whisper's state-of-the-art accuracy and multilingual support make it ideal for processing natural speech inputs.  
   - **Open Source**: Yes, enabling customization and integration.

2. **[spaCy](https://spacy.io/)**  
   - **Purpose**: Extracts key entities like project name, amount, and reason from the input text.  
   - **Reason**: spaCy is highly efficient for Named Entity Recognition (NER) tasks and supports fine-tuning for domain-specific needs.  
   - **Open Source**: Yes, allowing fine-tuning and deployment without licensing concerns.

3. **[Gradio](https://gradio.app/)**  
   - **Purpose**: Provides a user-friendly interface for text and voice input.  
   - **Reason**: Gradio simplifies the creation of interactive AI applications with minimal code.  
   - **Open Source**: Yes, enabling seamless integration into any Python project.

4. **SQLite**  
   - **Purpose**: Stores project funding requests in a lightweight, easily accessible database.  
   - **Reason**: SQLite is simple, file-based, and requires no additional setup, making it perfect for prototyping.  
   - **Open Source**: Yes; the SQLite source code is in the public domain, so there are no licensing costs.

---

### Why These Tools Were Chosen
The combination of Whisper, spaCy, Gradio, and SQLite ensures that the app is:  
- **Accurate**: Whisper and spaCy provide reliable speech recognition and entity extraction.  
- **User-Friendly**: Gradio offers a clean, intuitive interface for end-users.  
- **Efficient**: SQLite handles data storage without additional infrastructure requirements.  
- **Customizable**: Open-source tools allow for modifications, extensions, and domain-specific enhancements.

---

### Future Improvements
1. **Arabic Language Support**:  
   - Extend the model fine-tuning to handle Arabic inputs for both speech-to-text and entity extraction.  
   - This would enhance accessibility for Arabic-speaking users.  
   - Example: Fine-tune Whisper's multilingual support and spaCy's NER for Arabic-specific patterns.

2. **Enhanced Validation**:  
   - Improve validation for inputs by incorporating more robust checks for field consistency (e.g., currency formats and project name validation).  

3. **Advanced Database Features**:  
   - Add features like filtering, sorting, and exporting requests for better data management.  

4. **Interactive Insights**:  
   - Visualize stored data using graphs or dashboards to track project funding trends over time.

5. **Model Fine-Tuning**:  
   - Continue fine-tuning the spaCy model with a larger dataset of domain-specific examples for improved accuracy.

6. **Voice Accuracy Enhancement**:  
   - Integrate custom training for Whisper to improve transcription accuracy for noisy environments and accents.
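
As a minimal sketch of the enhanced-validation item above, the snippet below parses an extracted amount string such as `500 Riyals` into a number and a currency word. The function name `parse_amount`, the accepted pattern, and the rule that amounts must be positive are assumptions made for illustration; they are not part of the current app.

```python
import re

# Hypothetical pattern: digits (with optional thousands separators and decimals)
# followed by a single currency word.
_AMOUNT_RE = re.compile(r"^\s*(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?\s+([A-Za-z]+)\s*$")


def parse_amount(text):
    """Return (value, currency) for strings like '500 Riyals', or None if invalid."""
    match = _AMOUNT_RE.match(text)
    if match is None:
        return None
    whole, fraction, currency = match.groups()
    value = float(whole.replace(",", "") + (fraction or ""))
    if value <= 0:  # assumed rule: a request must be for a positive amount
        return None
    return value, currency


print(parse_amount("500 Riyals"))
print(parse_amount("five hundred"))
```

Returning `None` for anything that does not match lets the caller treat the amount as a missing field and ask the user again, instead of storing free text in a numeric column.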

---

### Open Source Benefits
Using open-source tools not only reduces costs but also provides the flexibility to customize the application. Contributions from the community can further enhance the functionality and user experience. Open-source principles also align with the goal of making AI accessible and transparent.





# Implementation


# Step-1 Dataset creation for example problem
---------------------------------------------

### 1. Creating a Dataset
To address the problem of extracting key entities like project names, reasons, and amounts from textual requests, the first step is generating a dataset with labeled examples.  

#### Dataset Generation
- **Process**: We used Python's `random` library to generate 1000 synthetic sentences simulating real-world money request scenarios.  
- **Entities Labeled**: Each sentence includes three key entities:  
  - **PROJECT**: Represents the project name.  
  - **MONEY**: Indicates the amount being requested.  
  - **REASON**: Explains the purpose of the request.  
- **Sentence Variations**: Multiple sentence structures were created to improve model robustness and handle diverse inputs.

#### Example Sentences in the Dataset
```python
[
    (
        "I need to request money for project 223 to buy some tools. The amount I need is 500 Riyals.",
        {"entities": [(36, 39, "PROJECT"), (43, 57, "REASON"), (80, 90, "MONEY")]}
    ),
    (
        "Requesting 3000 Riyals for project AI Innovation Lab to fund research.",
        {"entities": [(35, 52, "PROJECT"), (56, 69, "REASON"), (11, 22, "MONEY")]}
    )
]
```


In [3]:
import random

projects = ["223", "Abha University", "AI Innovation Lab", "Green Energy Project"]
reasons = ["buy some tools", "purchase hardware", "procure equipment", "fund research"]
amounts = ["500 Riyals", "1000 Riyals", "2000 Riyals", "3000 Riyals"]

training_data = []

for _ in range(1000):  # Generate 1000 samples
    project = random.choice(projects)
    reason = random.choice(reasons)
    amount = random.choice(amounts)

    sentence_variations = [
        f"I need to request money for project {project} to {reason}. The amount I need is {amount}.",
        f"I want to request {amount} for {reason} for project {project}.",
        f"Please allocate {amount} for the project {project} because I need to {reason}.",
        f"Requesting {amount} for project {project} to {reason}.",
    ]

    sentence = random.choice(sentence_variations)
    entities = {
        "entities": [
            (sentence.index(project), sentence.index(project) + len(project), "PROJECT"),
            (sentence.index(amount), sentence.index(amount) + len(amount), "MONEY"),
            (sentence.index(reason), sentence.index(reason) + len(reason), "REASON"),
        ]
    }
    training_data.append((sentence, entities))

# Print a few samples of the training data
for data in training_data[:5]:
    print(data)

('I want to request 1000 Riyals for procure equipment for project 223.', {'entities': [(64, 67, 'PROJECT'), (18, 29, 'MONEY'), (34, 51, 'REASON')]})
('Please allocate 1000 Riyals for the project Abha University because I need to procure equipment.', {'entities': [(44, 59, 'PROJECT'), (16, 27, 'MONEY'), (78, 95, 'REASON')]})
('I need to request money for project Green Energy Project to purchase hardware. The amount I need is 500 Riyals.', {'entities': [(36, 56, 'PROJECT'), (100, 110, 'MONEY'), (60, 77, 'REASON')]})
('Please allocate 3000 Riyals for the project Abha University because I need to buy some tools.', {'entities': [(44, 59, 'PROJECT'), (16, 27, 'MONEY'), (78, 92, 'REASON')]})
('Requesting 3000 Riyals for project AI Innovation Lab to fund research.', {'entities': [(35, 52, 'PROJECT'), (11, 22, 'MONEY'), (56, 69, 'REASON')]})
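
Since the labels are character offsets, it can be worth checking each sample before training: every span should be in range, carry no stray whitespace, and not overlap another span (overlapping spans cannot be represented as `doc.ents`). The helper below is a small sketch of such a check; `check_spans` is a hypothetical name and is not part of the notebook.

```python
def check_spans(sample):
    """Return a list of problems found in one (text, {"entities": [...]}) sample."""
    text, annotations = sample
    problems = []
    spans = sorted(annotations["entities"])  # sort by start offset
    for start, end, label in spans:
        if not (0 <= start < end <= len(text)):
            problems.append(f"{label}: offsets ({start}, {end}) out of range")
        elif text[start:end] != text[start:end].strip():
            problems.append(f"{label}: span has leading or trailing whitespace")
    # Compare each span with the next one in start order to detect overlaps.
    for (_, prev_end, prev_label), (start, _, label) in zip(spans, spans[1:]):
        if start < prev_end:
            problems.append(f"{prev_label} and {label} overlap")
    return problems


good = ("Requesting 3000 Riyals for project AI Innovation Lab to fund research.",
        {"entities": [(35, 52, "PROJECT"), (11, 22, "MONEY"), (56, 69, "REASON")]})
bad = (good[0], {"entities": [(35, 51, "PROJECT"), (50, 69, "REASON")]})  # deliberately wrong

print(check_spans(good))
print(check_spans(bad))
```

Running the check over all generated samples and stopping on the first non-empty result is a cheap guard against off-by-one offsets.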


In [5]:
# Download the pretrained en_core_web_md spaCy model (sm and lg variants are also available)
!python -m spacy download en_core_web_md

Collecting en-core-web-md==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.7.1/en_core_web_md-3.7.1-py3-none-any.whl (42.8 MB)
Installing collected packages: en-core-web-md
Successfully installed en-core-web-md-3.7.1
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_md')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


# Step-2 Fine-tune the spaCy Model for Named Entity Recognition (NER) on our custom dataset

### Overview
This section describes the process of fine-tuning a spaCy model to recognize domain-specific entities, such as project names, monetary amounts, and reasons, from text data.  

---

### Steps for Training

1. **Model Setup**:
   - We used the **pre-trained `en_core_web_md` spaCy model** for its rich linguistic features and vocabulary.

2. **Adding the Named Entity Recognizer (NER)**:
   - Checked if the NER pipeline existed in the model; if not, it was added.
   - Labels (e.g., "PROJECT", "MONEY", "REASON") were introduced to the NER component based on the dataset.

3. **Optimizing Hyperparameters**:
   - **Number of Iterations**: Set to **50** for effective learning.
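   - **Batch Size**: Used a batch size of **16** to group examples; note that the training code still passes each example to `nlp.update` individually.
   - **Learning Rate**: Defined as **0.001**; note that the training code does not pass this value to the optimizer, so the optimizer returned by `nlp.create_optimizer()` uses its own configured rate.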

4. **Training Process**:
   - All pipelines except NER were disabled for focused training.
   - A minibatch approach was employed to train the model on small, randomized subsets of data for efficiency.
   - For each example, the text was converted into a spaCy `Example` object, and the model was updated iteratively to minimize the loss.

5. **Saving the Fine-Tuned Model**:
   - After training, the fine-tuned model was saved locally for later use in inference tasks.
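
To make the minibatch step concrete, here is a dependency-free sketch of what batching does: shuffle the samples, then yield consecutive chunks of at most `size` items. `iter_batches` is a hypothetical stand-in written for illustration; the notebook itself relies on `spacy.util.minibatch`.

```python
import random


def iter_batches(data, size, seed=None):
    """Shuffle a copy of `data` and yield consecutive lists of at most `size` items."""
    items = list(data)
    random.Random(seed).shuffle(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


samples = list(range(1000))  # stand-in for the 1000 training samples
batches = list(iter_batches(samples, size=16, seed=0))
print(len(batches), len(batches[-1]))
```

With 1000 samples and a batch size of 16, this gives 62 full batches and one final batch of 8.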

---

### Outcome
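This process produced a fine-tuned spaCy NER model that identifies PROJECT, MONEY, and REASON entities in text. The training loss logged for each iteration mostly falls to near zero on the synthetic data, with occasional spikes; the notebook does not report a held-out evaluation, so accuracy on unseen phrasings is not measured here.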
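
One way to put numbers behind the model's quality is to hold out some samples and compare predicted spans with the gold spans. The sketch below computes exact-match precision, recall, and F1 over `(start, end, label)` tuples. It is an illustration only: `span_prf` is a hypothetical helper, and the predicted spans in the usage lines are made up rather than taken from the trained model.

```python
def span_prf(gold_spans, predicted_spans):
    """Exact-match precision, recall and F1 for two collections of (start, end, label)."""
    gold, pred = set(gold_spans), set(predicted_spans)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [(35, 52, "PROJECT"), (11, 22, "MONEY"), (56, 69, "REASON")]
pred = [(35, 52, "PROJECT"), (11, 22, "MONEY"), (56, 64, "REASON")]  # made-up prediction
print(span_prf(gold, pred))
```

In this made-up case two of the three predicted spans match exactly, so precision, recall, and F1 all come out at two thirds.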


In [6]:
import spacy
from spacy.training.example import Example
import random
from spacy.util import minibatch

# Load the pre-trained model or create a blank English model
nlp = spacy.load("en_core_web_md")  # or spacy.blank("en") for a blank model


TRAINING_DATA = training_data

# Get the Named Entity Recognizer (NER) pipeline
if "ner" not in nlp.pipe_names:
    ner = nlp.add_pipe("ner", last=True)
else:
    ner = nlp.get_pipe("ner")

# Add labels to the NER
for _, annotations in TRAINING_DATA:
    for ent in annotations["entities"]:
        ner.add_label(ent[2])

# Disable other pipelines during training
unaffected_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]

# Optimize hyperparameters
n_iterations = 50  # Number of passes over the training data
batch_size = 16  # Minibatch size used to group examples
learning_rate = 0.001  # Intended learning rate; note it is never passed to the optimizer below

def create_batches(data, size):
    return minibatch(data, size=size)

# Train the model
with nlp.disable_pipes(*unaffected_pipes):  # Only train NER
    optimizer = nlp.create_optimizer()
    for itn in range(n_iterations):  # Number of iterations
        random.shuffle(TRAINING_DATA)
        losses = {}
        batches = create_batches(TRAINING_DATA, size=batch_size)
        for batch in batches:
            for text, annotations in batch:
                doc = nlp.make_doc(text)
                example = Example.from_dict(doc, annotations)
                nlp.update([example], drop=0.2, losses=losses, sgd=optimizer)
        print(f"Iteration {itn + 1} - Losses: {losses}")

# Save the fine-tuned model
nlp.to_disk(r"fine_tuned_model")


Iteration 1 - Losses: {'ner': 314.63099747429385}
Iteration 2 - Losses: {'ner': 2.200482643013824e-05}
Iteration 3 - Losses: {'ner': 1.4298861095634986e-05}
Iteration 4 - Losses: {'ner': 21.773064165830647}
Iteration 5 - Losses: {'ner': 15.397861972348933}
Iteration 6 - Losses: {'ner': 0.11569479066907401}
Iteration 7 - Losses: {'ner': 23.77775287350021}
Iteration 8 - Losses: {'ner': 8.503669715910798e-06}
Iteration 9 - Losses: {'ner': 7.270270054002184e-07}
Iteration 10 - Losses: {'ner': 3.300903324582586e-06}
Iteration 11 - Losses: {'ner': 1.450074863159155e-07}
Iteration 12 - Losses: {'ner': 8.019611526038751e-09}
Iteration 13 - Losses: {'ner': 48.54660216803205}
Iteration 14 - Losses: {'ner': 2.5569016112909982e-06}
Iteration 15 - Losses: {'ner': 6.826244408006239e-09}
Iteration 16 - Losses: {'ner': 2.3935390260848404e-09}
Iteration 17 - Losses: {'ner': 1.7867097278623228e-08}
Iteration 18 - Losses: {'ner': 1.0699680056603255e-08}
Iteration 19 - Losses: {'ner': 2.042795473277904e-0

In [6]:
# Install necessary libraries:
# - gradio: For creating user-friendly web interfaces for the application.
# - openai-whisper: For leveraging OpenAI's Whisper model for speech-to-text functionality.

pip install gradio openai-whisper

Collecting openai-whisper
  Downloading openai-whisper-20240930.tar.gz (800 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting tiktoken (from openai-whisper)
  Downloading tiktoken-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting triton>=2.0.0 (from openai-whisper)
  Downloading triton-3.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.3 kB)
Downloading triton-3.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (209.5 MB)

# Step-3 Integration of Speech-to-Text, Entity Extraction, and Database Interaction in Gradio App

This Gradio app integrates various functionalities to process money requests for projects through both text and voice inputs. The app performs speech-to-text conversion, entity extraction, and interacts with an SQLite database to store and retrieve money request data.

### Key Steps:
1. **Speech-to-Text Integration**:
   - The Whisper model is used to convert audio input into text for further processing.
   - If the user provides audio, it is transcribed into text before extracting entities.

2. **Entity Extraction**:
   - Using spaCy's fine-tuned model, the app extracts key entities like `project_name`, `amount`, and `reason` from the text input. These entities are essential for processing the money request.

3. **Database Integration**:
   - The app interacts with an SQLite database (`erp.db`) to store money requests with the extracted data (project name, amount, reason).
   - A table (`money_requests`) is created to store requests if it doesn't already exist.
   - Users can view the stored requests by querying the database.
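
The database steps can be sketched with Python's built-in `sqlite3` module. The snippet uses an in-memory database so it runs without touching `erp.db`; the table and column names follow the description above, while the helper names `init_db`, `add_request`, and `list_requests` are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS money_requests (
    project_name TEXT,
    amount REAL,
    reason TEXT
)
"""


def init_db(conn):
    """Create the requests table if it does not exist yet."""
    conn.execute(SCHEMA)


def add_request(conn, project_name, amount, reason):
    # Parameter placeholders keep user-supplied text out of the SQL string.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO money_requests (project_name, amount, reason) VALUES (?, ?, ?)",
            (project_name, amount, reason),
        )


def list_requests(conn):
    """Return all stored requests as (project_name, amount, reason) tuples."""
    return conn.execute(
        "SELECT project_name, amount, reason FROM money_requests"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # pass "erp.db" instead for a file-based database
init_db(conn)
add_request(conn, "AI Innovation Lab", 3000.0, "fund research")
print(list_requests(conn))
```

Passing the connection in as an argument keeps the helpers easy to test and avoids reopening the database file on every call.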

### Gradio Interface:
- **Submit Request**: Users can submit money requests either by typing text or using audio input.
- **View Records**: Users can view existing records from the database through a simple interface.
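
Raw database rows are not very readable in a text box, so a small formatter can help. `format_records` below is a hypothetical helper, sketched under the assumption that each row is a `(project_name, amount, reason)` tuple.

```python
def format_records(rows):
    """Render (project_name, amount, reason) rows as one numbered line per record."""
    if not rows:
        return "No records found."
    lines = [
        f"{i}. {project} | {amount} | {reason}"
        for i, (project, amount, reason) in enumerate(rows, start=1)
    ]
    return "\n".join(lines)


print(format_records([("223", "500 Riyals", "buy some tools")]))
```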

### Functions:
- `speech_to_text(audio)`: Converts audio input to text.
- `extract_entities(text)`: Extracts key entities from the text input.
- `add_request_to_db(project_name, amount, reason)`: Stores the money request in the SQLite database.
- `view_database_records()`: Retrieves and displays all stored money requests.
- `process_request(text_input)`: Extracts entities from the input, reports any missing fields, asks the user to confirm, and stores the confirmed request in the database.

The app is built using Gradio, providing an interactive user interface for both submitting requests and viewing records.
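
A request usually takes more than one message: first the details, then a confirmation. The sketch below keeps that pending state in an object rather than in module-level variables, so separate sessions would not share data. It is one possible shape, not the app's implementation: `RequestSession`, its `extract` and `save` callables, the reply strings, and the whole-word confirmation check are all assumptions introduced for illustration.

```python
REQUIRED = ("PROJECT", "MONEY", "REASON")
CONFIRM_WORDS = {"yes", "okay", "confirm"}


class RequestSession:
    """Tracks one pending money request across several user messages."""

    def __init__(self, extract, save):
        self.extract = extract  # callable: text -> dict of entity label -> text
        self.save = save        # callable: (project, amount, reason) -> None
        self.pending = {}

    def _missing(self):
        return [label for label in REQUIRED if not self.pending.get(label)]

    def handle(self, text):
        if self._missing():
            # Merge newly extracted entities into what is already known.
            self.pending.update({k: v for k, v in self.extract(text).items() if v})
            missing = self._missing()
            if missing:
                return "Missing fields: " + ", ".join(missing)
            return "Please confirm: Yes/No"
        # All fields are known, so this message is the confirmation answer.
        words = set(text.lower().replace(".", " ").replace(",", " ").split())
        request, self.pending = self.pending, {}
        if words & CONFIRM_WORDS:
            self.save(request["PROJECT"], request["MONEY"], request["REASON"])
            return "Request submitted."
        return "Request canceled."


def fake_extract(text):  # stand-in for the fine-tuned spaCy model
    found = {}
    if "223" in text:
        found["PROJECT"] = "223"
    if "500 Riyals" in text:
        found["MONEY"] = "500 Riyals"
    if "tools" in text:
        found["REASON"] = "buy some tools"
    return found


saved = []
session = RequestSession(fake_extract, lambda *fields: saved.append(fields))
print(session.handle("500 Riyals for project 223"))  # reason is still missing
print(session.handle("to buy some tools"))
print(session.handle("Yes"))
print(saved)
```

Matching whole words avoids treating an input such as "yesterday" as a "yes", which a plain substring test would do.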


In [6]:
import gradio as gr
import whisper
import spacy
import sqlite3

# Load Whisper model for speech-to-text
stt_model = whisper.load_model("base")

# Load spaCy for entity extraction
nlp = spacy.load("fine_tuned_model")

# Global variables to store request data
project_name = ""
amount = ""
reason = ""


# Speech-to-text function
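def speech_to_text(audio):
    # gr.Audio(type="filepath") passes a path string; file-like inputs expose the path as .name
    audio_path = audio if isinstance(audio, str) else audio.name
    result = stt_model.transcribe(audio_path)
    print(result["text"])
    return result["text"]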


# Extract entities using spaCy
def extract_entities(text):
    doc = nlp(text)
    entities = {}
    for ent in doc.ents:
        entities[ent.label_] = ent.text
    return entities


# Add request to database
def add_request_to_db(project_name, amount, reason):
    conn = sqlite3.connect("erp.db")
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE IF NOT EXISTS money_requests (project_name TEXT, amount REAL, reason TEXT)")
    cursor.execute("INSERT INTO money_requests (project_name, amount, reason) VALUES (?, ?, ?)",
                   (project_name, amount, reason))
    conn.commit()
    conn.close()


# Retrieve records from the database
def view_database_records():
    conn = sqlite3.connect("erp.db")
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE IF NOT EXISTS money_requests (project_name TEXT, amount REAL, reason TEXT)")
    records = cursor.execute("SELECT * FROM money_requests").fetchall()
    conn.close()
    if not records:
        return "No records found."
    return records


# Process text/audio requests
def process_request(text_input):
    global project_name, amount, reason
    
    if not project_name or not amount or not reason:
        # If required data isn't available, extract entities
        entities = extract_entities(text_input)
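        # Keep values from an earlier message when the new text does not supply them
        project_name = entities.get('PROJECT', '') or project_name
        amount = entities.get('MONEY', '') or amount
        reason = entities.get('REASON', '') or reason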

        if not project_name or not amount or not reason:
            missing_fields = []
            if not project_name:
                missing_fields.append("project name")
            if not amount:
                missing_fields.append("amount")
            if not reason:
                missing_fields.append("reason")
            return f"Missing fields: {', '.join(missing_fields)}. Please provide the missing information."

        return f"You are going to add a request for project: {project_name}, request amount: {amount}, reason: {reason}. Are you sure you want to proceed? Yes/No"
    
    # Confirmation Step
    if "yes" in text_input.lower() or "okay" in text_input.lower() or "confirm" in text_input.lower():
        add_request_to_db(project_name, amount, reason)
        project_name_temp=project_name
        amount_temp=amount
        reason_temp=reason
        project_name, amount, reason = "", "", ""  # Reset the global variables after the request is added
        return f"Your request for project '{project_name_temp}' with amount {amount_temp} for {reason_temp} has been submitted successfully."
        
    else:
        project_name, amount, reason = "", "", ""  # Reset the global variables if canceled
        return "Request has been canceled."


# Gradio Interface for handling chatbot and database interactions
def chatbot_interface(text_input, audio_input=None):
    if audio_input:
        text_input = speech_to_text(audio_input)
    
    # Process the request and confirmation flow
    return process_request(text_input)


# Gradio Interface for viewing database records
def database_view_interface():
    return view_database_records()


# Create the Gradio Interface
iface = gr.Blocks()

with iface:
    gr.Markdown("# AI Agent for Money Requests")
    gr.Markdown("Use text or voice commands to request money for a project or view existing records.")
    
    with gr.Row():
        with gr.Column():
            gr.Markdown("### Submit Request")
            text_input = gr.Textbox(label="Enter your text request:", placeholder="Type your message here...")
            audio_input = gr.Audio(label="Or use voice input:", sources=["microphone"], type="filepath")
            submit_button = gr.Button("Submit Request")
            
        
        with gr.Column():
            gr.Markdown("### AI Agent Response")
            response_output = gr.Textbox(label="Response:")
            gr.Markdown("### View Database Records")
            view_records_button = gr.Button("View Records")
            records_output = gr.Textbox(label="Database Records:")

    # Bind functions
    submit_button.click(fn=chatbot_interface, inputs=[text_input, audio_input], outputs=response_output)
    view_records_button.click(fn=database_view_interface, outputs=records_output)

# Launch the Interface
iface.launch()




Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://fddda44c6479e00171.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


