A multi-modal vehicle control platform that interprets natural language commands and converts them into actionable vehicle control sequences through advanced NLP, computer vision, and speech processing.
- Parse natural language vehicle commands using fine-tuned T5 models (see the inference sketch after this feature list)
- Multi-task learning for plan generation, intent classification, and slot detection
- Special token support for structured command parsing
- Validate and normalize parsed plans for consistency
- Convert raw language into standardized command formats
- Convert normalized plans into executable token sequences
- Structured output for vehicle control systems
- Real-time object detection using YOLOv8
- Distance calculation between vehicle and detected objects
- Direction detection for spatial awareness
- Voice-to-text conversion for hands-free control
- Integration with command pipeline
- Intelligent parameter mapping for vehicle dynamics
- Speed scaling and duration estimation for commands
- Real-time visualization of parsed commands
- View parsed plans, normalized outputs, and assembled tokens
- User-friendly interface for system interaction
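As a minimal sketch of the parsing step, the fine-tuned multi-task T5 can be loaded with Hugging Face Transformers roughly as follows. The checkpoint directory name is hypothetical; the tokenizer path matches the project layout below:

```python
# Hedged sketch: load the fine-tuned multi-task T5 and parse one command.
# The checkpoint name under models/ is hypothetical; the tokenizer path
# matches NLP_modeling/multitask_tokenizer/ in the project layout.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("NLP_modeling/multitask_tokenizer")
model = T5ForConditionalGeneration.from_pretrained(
    "NLP_modeling/models/t5_multitask"  # hypothetical checkpoint name
)

inputs = tokenizer("move forward 5 meters", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)

# Keep special tokens so the <plan>/<intent> structure markers survive.
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
```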
```
Abstract-Vehicle-Control/
├── NLP_modeling/              # NLP model training and inference
│   ├── main.py                # Main training script
│   ├── t5_multitask.py        # Multi-task T5 model
│   ├── models/                # Trained model checkpoints
│   └── multitask_tokenizer/   # Custom tokenizer
├── normalization_module/      # Plan normalization
│   ├── normalization.py
│   └── model_evaluation.py
├── object_distance_detection/ # Computer vision module
│   ├── yolo.py                # YOLOv8 integration
│   ├── o_d_d.py               # Distance detection
│   └── object direction.py    # Direction detection
├── speech_recognition/        # Voice input processing
│   └── speech_recognition.py
├── speed_duration_mapping/    # Parameter mapping
│   ├── speed_mapper.py
│   └── duration_mapper.py
├── frontend/                  # React TypeScript UI
│   ├── src/
│   │   ├── App.tsx            # Main application
│   │   ├── App.css
│   │   └── assets/CarSimulation.tsx
│   ├── package.json
│   └── vite.config.ts
├── data/                      # Training datasets
│   ├── train.jsonl
│   └── val.jsonl
├── diagnostic_script.py       # System diagnostics
├── token_assembler.py         # Token assembly utility
└── requirements.txt           # Python dependencies
```
- Python 3.8+
- Node.js 16+ (for frontend)
- PyTorch 2.0+
- Transformers library
- CUDA-capable GPU (recommended)
```bash
# Clone the repository
git clone <repository-url>
cd Abstract-Vehicle-Control

# Create Python virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt
```

```bash
# Set up the frontend
cd frontend

# Install dependencies
npm install

# Build frontend
npm run build
```

```bash
# Train the NLP models
cd NLP_modeling
python main.py
```

```bash
# Run system diagnostics (from the project root)
python diagnostic_script.py
```

```bash
# Start the frontend development server
cd frontend
npm run dev
```

The application will be available at http://localhost:5173.
1. Command Parsing
   - Input: Natural language command
   - Processing: T5 multi-task model with adapters
   - Output: Parsed plan with intent, slots, and structure
2. Plan Normalization
   - Input: Parsed plan
   - Processing: Validation and normalization rules
   - Output: Standardized plan format
3. Token Assembly
   - Input: Normalized plan
   - Processing: Conversion to executable tokens
   - Output: Token sequence for vehicle control
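The three stages compose into a single flow. The stand-ins below only illustrate the data shapes at each handoff; every function body is a toy placeholder, not the project's real logic, which lives in NLP_modeling/, normalization_module/normalization.py, token_assembler.py, and speed_duration_mapping/:

```python
# Illustrative wiring of the three pipeline stages with toy stand-ins.

def parse_command(command: str) -> dict:
    # Stage 1 stand-in: the real system runs the fine-tuned T5 model here.
    return {"intent": "movement",
            "slots": {"direction": "forward", "distance_m": 5.0}}

def normalize_plan(parsed: dict) -> dict:
    # Stage 2 stand-in: validate slots and map them to canonical fields,
    # including a duration estimate in the spirit of speed_duration_mapping/.
    speed_mps = 2.0  # assumed nominal speed for "movement" commands
    return {"action": "MOVE",
            "direction": parsed["slots"]["direction"].upper(),
            "distance_m": parsed["slots"]["distance_m"],
            "duration_s": parsed["slots"]["distance_m"] / speed_mps}

def assemble_tokens(plan: dict) -> list[str]:
    # Stage 3 stand-in: flatten the normalized plan into control tokens.
    return [plan["action"], plan["direction"],
            f"{plan['distance_m']:g}", f"{plan['duration_s']:g}s"]

print(assemble_tokens(normalize_plan(parse_command("move forward 5 meters"))))
# -> ['MOVE', 'FORWARD', '5', '2.5s']
```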
- Vision: YOLOv8 for object detection and distance measurement
- Audio: Speech-to-text for voice commands
- Dynamics: Speed/duration mapping for realistic execution
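For the vision module, detection with Ultralytics YOLOv8 plus a simple pinhole-camera distance estimate could look like the sketch below. The focal length and known object width are illustrative constants, not values from the project; the actual logic lives in object_distance_detection/yolo.py and o_d_d.py:

```python
# Sketch of YOLOv8 detection plus a pinhole-camera distance estimate.
from ultralytics import YOLO

FOCAL_LENGTH_PX = 700.0  # assumed camera focal length in pixels
KNOWN_WIDTH_M = 1.8      # assumed real-world width of a car, in meters

model = YOLO("yolov8n.pt")
results = model("frame.jpg")  # any image or video frame

for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    width_px = x2 - x1
    # Pinhole model: distance = (real width * focal length) / pixel width
    distance_m = (KNOWN_WIDTH_M * FOCAL_LENGTH_PX) / width_px
    label = model.names[int(box.cls)]
    print(f"{label}: ~{distance_m:.1f} m away")
```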
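The voice path can be sketched with the SpeechRecognition package; this is a plausible reading of speech_recognition/speech_recognition.py, and the Google Web Speech backend used here is an assumption. The transcribed text then enters the same pipeline as typed commands:

```python
# Hedged sketch of the voice-input step: microphone -> text command.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
    audio = recognizer.listen(source)

try:
    command = recognizer.recognize_google(audio)  # assumed backend
    print(f"Heard: {command}")  # e.g. "move forward 5 meters"
except sr.UnknownValueError:
    print("Could not understand the audio")
```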
- Base Model: T5 (Text-to-Text Transfer Transformer)
- Vision Model: YOLOv8 (nano, small variants)
- Fine-tuning: LoRA and adapter-based approaches for parameter efficiency
- Custom Tokens: `<plan>`, `</plan>`, `<intent>`, `</intent>`, `<none>`
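For reference, these structure tokens can be registered as special tokens so the model learns embeddings for them. This is a sketch; the `t5-small` base variant is an assumption:

```python
# Register the custom structure tokens and resize the embedding matrix
# so the new token ids have trainable vectors.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

special_tokens = ["<plan>", "</plan>", "<intent>", "</intent>", "<none>"]
tokenizer.add_special_tokens({"additional_special_tokens": special_tokens})
model.resize_token_embeddings(len(tokenizer))
```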
Each training example in `data/train.jsonl` and `data/val.jsonl` follows this format:

```json
{
  "command": "move forward 5 meters",
  "plan": "MOVE FORWARD 5",
  "intent": "movement",
  "slots": ["direction: forward", "distance: 5"]
}
```

Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
[Add your license information here]
[Add contact information here]
Note: Ensure all required pre-trained model weights (`yolov8n.pt`, `yolov8s.pt`) are in the project root before running the system.