Abstract Vehicle Control System

A multi-modal vehicle control platform that interprets natural language commands and converts them into actionable vehicle control sequences through advanced NLP, computer vision, and speech processing.

📋 Table of Contents

  • Features
  • Project Structure
  • Requirements
  • Installation
  • Usage
  • Architecture
  • Model Information
  • Data Format
  • Contributing
  • License
  • Contact

✨ Features

1. Natural Language Processing

  • Parse natural language vehicle commands using fine-tuned T5 models
  • Multi-task learning for plan generation, intent classification, and slot detection
  • Special token support for structured command parsing

2. Plan Normalization

  • Validate and normalize parsed plans for consistency
  • Convert raw language into standardized command formats

3. Token Assembly

  • Convert normalized plans into executable token sequences
  • Structured output for vehicle control systems

4. Computer Vision

  • Real-time object detection using YOLOv8
  • Distance calculation between vehicle and detected objects
  • Direction detection for spatial awareness

5. Speech Recognition

  • Voice-to-text conversion for hands-free control
  • Integration with the command pipeline (see the sketch below)
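
The repository's speech_recognition/speech_recognition.py is the authoritative implementation; the following is only a minimal sketch of voice capture with the SpeechRecognition library, assuming the pipeline accepts a plain-text command string:

import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
    audio = recognizer.listen(source)

# Google's free web API; swap in another recognize_* backend as needed
command_text = recognizer.recognize_google(audio)
print(command_text)  # e.g. "move forward 5 meters", fed into the NLP pipeline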

6. Speed & Duration Mapping

  • Maps parsed command parameters to concrete vehicle dynamics values
  • Speed scaling and duration estimation for each command (see the sketch below)
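
A minimal sketch of what such a mapper might look like; the lookup table, the assumed 2 m/s top speed, and the function name are illustrative, not the actual logic in speed_duration_mapping/:

# Hypothetical lookup table: qualitative speed word -> normalized throttle
SPEED_LEVELS = {"slow": 0.3, "normal": 0.6, "fast": 1.0}
MAX_SPEED_MPS = 2.0  # assumed top speed of the vehicle in meters per second

def map_speed_and_duration(speed_word: str, distance_m: float) -> tuple[float, float]:
    """Map a qualitative speed word and a target distance to (throttle, seconds)."""
    throttle = SPEED_LEVELS.get(speed_word, SPEED_LEVELS["normal"])
    duration_s = distance_m / (throttle * MAX_SPEED_MPS)
    return throttle, duration_s

throttle, duration_s = map_speed_and_duration("fast", 5.0)  # (1.0, 2.5)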

7. Interactive Frontend

  • Real-time visualization of parsed commands
  • View parsed plans, normalized outputs, and assembled tokens
  • User-friendly interface for system interaction

πŸ“ Project Structure

Abstract-Vehicle-Control/
├── NLP_modeling/              # NLP model training and inference
│   ├── main.py                # Main training script
│   ├── t5_multitask.py        # Multi-task T5 model
│   ├── models/                # Trained model checkpoints
│   └── multitask_tokenizer/   # Custom tokenizer
├── normalization_module/      # Plan normalization
│   ├── normalization.py
│   └── model_evaluation.py
├── object_distance_detection/ # Computer vision module
│   ├── yolo.py                # YOLOv8 integration
│   ├── o_d_d.py               # Distance detection
│   └── object direction.py    # Direction detection
├── speech_recognition/        # Voice input processing
│   └── speech_recognition.py
├── speed_duration_mapping/    # Parameter mapping
│   ├── speed_mapper.py
│   └── duration_mapper.py
├── frontend/                  # React TypeScript UI
│   ├── src/
│   │   ├── App.tsx            # Main application
│   │   ├── App.css
│   │   └── assets/CarSimulation.tsx
│   ├── package.json
│   └── vite.config.ts
├── data/                      # Training datasets
│   ├── train.jsonl
│   └── val.jsonl
├── diagnostic_script.py       # System diagnostics
├── token_assembler.py         # Token assembly utility
└── requirements.txt           # Python dependencies

🔧 Requirements

  • Python 3.8+
  • Node.js 16+ (for frontend)
  • PyTorch 2.0+
  • Transformers library
  • CUDA-capable GPU (recommended)

📦 Installation

Backend Setup

# Clone the repository
git clone <repository-url>
cd Abstract-Vehicle-Control

# Create Python virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt

Frontend Setup

cd frontend

# Install dependencies
npm install

# Build frontend
npm run build

🚀 Usage

Running the NLP Model

cd NLP_modeling
python main.py

Running the Full Pipeline

python diagnostic_script.py

Starting the Frontend

cd frontend
npm run dev

The application will be available at http://localhost:5173

πŸ—οΈ Architecture

NLP Pipeline

  1. Input: Natural language command
  2. Processing: T5 multi-task model with adapters
  3. Output: Parsed plan with intent, slots, and structure
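
A minimal inference sketch with Hugging Face Transformers, assuming the checkpoint and tokenizer directories match the project tree above (the exact paths and output format may differ):

from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("NLP_modeling/multitask_tokenizer")
model = T5ForConditionalGeneration.from_pretrained("NLP_modeling/models")

inputs = tokenizer("move forward 5 meters", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)

# Keep special tokens so the <plan>/<intent> structure stays visible
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
# e.g. <plan> MOVE FORWARD 5 </plan> <intent> movement </intent>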

Normalization Stage

  1. Input: Parsed plan
  2. Processing: Validation and normalization rules
  3. Output: Standardized plan format
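
The real rules live in normalization_module/normalization.py; below is a minimal sketch of the idea, assuming an "ACTION DIRECTION VALUE" plan string (the rule table and function name are illustrative):

# Hypothetical canonicalization table for direction synonyms
CANONICAL_DIRECTIONS = {"forwards": "FORWARD", "ahead": "FORWARD", "back": "BACKWARD"}

def normalize_plan(raw_plan: str) -> str:
    """Uppercase fields, canonicalize the direction, and validate the distance."""
    action, direction, value = raw_plan.split()
    direction = CANONICAL_DIRECTIONS.get(direction.lower(), direction.upper())
    if float(value) < 0:
        raise ValueError(f"negative distance in plan: {raw_plan!r}")
    return f"{action.upper()} {direction} {value}"

normalize_plan("move ahead 5")  # 'MOVE FORWARD 5'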

Token Assembly

  1. Input: Normalized plan
  2. Processing: Convert to executable tokens
  3. Output: Token sequence for vehicle control
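
token_assembler.py holds the actual scheme; as an illustration only, assembly can be as simple as wrapping the normalized plan fields in framing tokens (the token names below are hypothetical):

def assemble_tokens(normalized_plan: str) -> list[str]:
    """Wrap a normalized plan's fields in start/end control tokens."""
    return ["<START>", *normalized_plan.split(), "<END>"]

assemble_tokens("MOVE FORWARD 5")
# ['<START>', 'MOVE', 'FORWARD', '5', '<END>']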

Multi-Modal Integration

  • Vision: YOLOv8 for object detection and distance measurement
  • Audio: Speech-to-text for voice commands
  • Dynamics: Speed/duration mapping for realistic execution
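
A minimal detection-plus-distance sketch using the ultralytics API; the pinhole-camera constants (assumed object height and focal length) are placeholders, and the repo's o_d_d.py may estimate distance differently:

from ultralytics import YOLO

KNOWN_HEIGHT_M = 1.5  # assumed real-world height of the detected object
FOCAL_PX = 700.0      # assumed camera focal length in pixels

model = YOLO("yolov8n.pt")    # nano variant; the small variant is yolov8s.pt
results = model("frame.jpg")  # also accepts numpy frames from a camera feed

for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    label = model.names[int(box.cls)]
    # Pinhole estimate: distance scales inversely with the box's pixel height
    distance_m = KNOWN_HEIGHT_M * FOCAL_PX / max(y2 - y1, 1.0)
    print(f"{label}: ~{distance_m:.1f} m")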

📊 Model Information

  • Base Model: T5 (Text-to-Text Transfer Transformer)
  • Vision Model: YOLOv8 (nano, small variants)
  • Fine-tuning: LoRA and adapter-based approaches for parameter efficiency
  • Custom Tokens: <plan>, </plan>, <intent>, </intent>, <none>
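
Registering these custom tokens with a Transformers tokenizer is a standard step; a sketch, assuming t5-small as the base checkpoint (the actual base size is not specified here):

from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<plan>", "</plan>", "<intent>", "</intent>", "<none>"]}
)
model.resize_token_embeddings(len(tokenizer))  # grow embeddings to match the new vocab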

🔄 Data Format

Training Data (JSONL)

{
  "command": "move forward 5 meters",
  "plan": "MOVE FORWARD 5",
  "intent": "movement",
  "slots": ["direction: forward", "distance: 5"]
}
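
Each line of train.jsonl / val.jsonl is one standalone JSON object, so loading them takes only a few lines (the helper below is illustrative, not part of the repo):

import json

def load_jsonl(path: str) -> list[dict]:
    """Read one JSON object per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

train = load_jsonl("data/train.jsonl")
print(train[0]["command"], "->", train[0]["plan"])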

🤝 Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.

πŸ“ License

[Add your license information here]

📧 Contact

[Add contact information here]


Note: Ensure all required pre-trained models (YOLOv8n.pt, YOLOv8s.pt) are in the project root before running the system.
