dharmab/skyeye


SkyEye: AI Powered GCI Bot for DCS

SkyEye is a concept for a new Ground Controlled Intercept (GCI) bot for the flight simulator Digital Combat Simulator (DCS). A GCI bot allows players to request information about the airspace in English using either voice commands or text entry, and to receive answers via speech and text messages.

Unlike previous GCI bots, SkyEye uses Speech-To-Text and Text-To-Speech technology which runs locally on the same server as SkyEye. No cloud APIs are required.

Goals

  • Implement ALPHA CHECK, BOGEY DOPE, DECLARE, FADED, PICTURE, RADIO CHECK, SNAPLOCK, SPIKED and THREAT calls
  • Run entirely locally on reasonable consumer hardware
  • Use modern speech synthesis that sounds like a human (Goodbye, Microsoft SAM! Hello, Piper!)
  • Hybridize real-world air control communication and brevity with pragmatism
  • Proactively inform and update players instead of using static tripwire rules
  • Support accessible interfaces in addition to voice and audio, including keyboard based input and in-game subtitles
  • Excellent documentation for developers, server administrators and players
  • Be easy for a beginner programmer to customize
  • Have useful test coverage, especially of controller logic
  • Support Windows x86-64, Linux x86-64 and Linux ARM
  • Allow multiple GCI bots to run on the same DCS and SRS instance with different callsigns and frequencies
  • Minimize maintenance burden. Ship a static binary with as many pinned dependencies as possible, so this software continues to function with reduced maintainer activity

Anti-Goals

  • Follow grug-brained principles. Avoid unnecessary design patterns. Keep it simple!
  • Focused feature set. Don't try to match other bots 1:1 on feature set.
  • Say "no" to complex features. Provide the basics, and sufficient documentation for others to fork and customize for their use case.

Getting Started

  • Developers: See CONTRIBUTING.md for instructions on building, running and modifying the bot.
  • Server admins: Documentation coming Soon™
  • Players: Guides coming Soon™

Technology

SkyEye would not be possible without these people and projects, of whom I am deeply appreciative:

  • DCS-SRS by @ciribob. Ciribob also patiently answered many of my questions on SRS internals and provided helpful debugging tips whenever I ran into a block in the SRS integration.
  • DCS-gRPC provides the interface into DCS World.
  • @rurounijones's OverlordBot was a useful reference during SkyEye's early development, and Jones himself was also patient with my questions on Discord.
  • @ggerganov's whisper.cpp provides speech-to-text.
  • @rodaine's numwords module is invaluable for parsing numeric quantities from voice input.
  • Piper by the Rhasspy voice assistant project is used for text-to-speech.
  • The Jenny dataset by Dioco provides the feminine voice for SkyEye.
  • @popey's dataset provides the masculine voice for SkyEye.
  • @amitybell's embedded Piper module makes distribution and implementation of Piper a breeze.
  • The Opus codec and the hraban/opus module provide audio compression for the SRS protocol.
  • @lithammer's shortuuid module provides a GUID implementation compatible with the SRS protocols.
  • @zaf's resample module helps with audio format conversion between Piper and SRS.
  • @martinlindhe's unit module provides easy angular, length and frequency unit conversion.
  • MSYS2 provides a Windows build environment.
  • Oto is helpful for debugging audio format conversion problems.
  • Team Lima Kilo and the Flashpoint Levant community provided morale-boosting energy and feedback.
  • And of course, DCS World is produced by Eagle Dynamics.

FAQ

Is this ready?

No, but stay tuned! Progress is speeding up.

I work on it about one night a week. At this rate I hope it will be ready later in 2024.

Current status:

  • ✅ SRS integration - bot can listen to and talk on an SRS channel
  • ✅ Speech recognition - bot can recognize what humans are saying on SRS and turn it into text
  • ✅ Brevity parsing - bot can decode tactical brevity
  • ✅ Brevity composition - bot can phrase radio calls using tactical brevity
  • ✅ Speech synthesis - bot can turn text into human-like speech and say it on SRS
  • ✅ CI/CD pipeline configured for linting, testing and building on Linux and Windows
  • ✅ DCS-gRPC - Prototyped connection to DCS via DCS-gRPC and reading game world state
  • 🚧 Controller: Game world state interface implementation in progress
  • 🚧 Controller: GCI controller logic implementation in progress
  • 🚧 Accessibility: Keyboard input not yet implemented
  • 🚧 Accessibility: In-game subtitles not yet implemented
  • 🚧 Testing: Some unit test coverage is implemented, but expansion is needed
  • 🚧 Performance: Software runs very well on a standalone dedicated system, but performance optimization is needed to run alongside DCS on the same machine
  • 🚧 Release: CI/CD pipeline does not publish builds to GitHub Releases
  • 🚧 Documentation: Documentation not written
  • 🚧 Observability: Better logging and tracing is needed

What kind of hardware does it require?

I'm not sure yet but it shouldn't be too bad. Currently the dev build takes about 4GB of RAM and takes a few seconds to recognize audio on an AMD 5900X, but I have done essentially no performance optimization yet and I expect those requirements to drop significantly. Some areas to improve:

  • I'm making unnecessary copies of data all over the place - this is usually the default practice in Go unless you need the receiving function to mutate the passed object, you need to do so for concurrency safety, or you can provably improve performance. I plan to revisit this when the bot is closer to release.
  • I'm using a fairly large, off-the-shelf, general-purpose Whisper model in my development environment. There's some exciting research into faster distilled models and custom-trained models that I plan to revisit in a few months. I also strongly suspect a combination of advances in AI and Moore's Law will significantly improve Speech-To-Text performance within the next year or so.
  • I need to investigate tuning Go performance parameters. In particular, the software runs poorly when you try to play DCS at the same time on the same machine, I suspect due to CPU contention.
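To illustrate the first point about copies, here is a minimal Go sketch of the difference between passing a struct by value and by pointer. The `Frame` type, its size, and both function names are hypothetical and are not taken from the SkyEye codebase; they only show the trade-off described above.

```go
package main

import "fmt"

// Frame is a hypothetical fixed-size audio frame. Because Samples is an
// array (not a slice), passing a Frame by value copies every sample.
type Frame struct {
	Samples [960]float32
}

// scaledCopy takes the frame by value. The caller's frame is left
// untouched, at the cost of copying the whole array in and out.
func scaledCopy(f Frame, gain float32) Frame {
	for i := range f.Samples {
		f.Samples[i] *= gain
	}
	return f
}

// scaleInPlace takes a pointer. No copy is made, but the caller's frame
// is mutated, so callers must not share it across goroutines unsynchronized.
func scaleInPlace(f *Frame, gain float32) {
	for i := range f.Samples {
		f.Samples[i] *= gain
	}
}

func main() {
	var original Frame
	original.Samples[0] = 1

	louder := scaledCopy(original, 2)
	fmt.Println("after scaledCopy:", original.Samples[0], louder.Samples[0])

	scaleInPlace(&original, 2)
	fmt.Println("after scaleInPlace:", original.Samples[0])
}
```

The by-value version is the safer default; the pointer version only pays off when profiling shows the copies matter.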

Why not update OverlordBot?

It would probably be less effort to update OverlordBot to use OpenAI Whisper speech recognition. I certainly wouldn't have had to reimplement the SRS wire protocol from scratch! If you are willing and capable, I encourage you to contribute that change to OverlordBot.

I have some personal, selfish reasons for writing a new bot:

  1. I like programming in Go and *nix more than I like C#/.NET. Intrinsic motivation is extremely important for hobby developers.
  2. I use Go, Python and Linux professionally so this is more relevant to my career development than .NET development.
  3. I want to learn more about practical network programming with coroutine-based concurrency.
  4. I believe the TRIPWIRE functionality in OverlordBot is damaging to the community and want to eradicate it.
  5. I want to innovate and deliver new features that would be breaking changes to the OverlordBot community.
  6. Given my lack of .NET development skills, it is faster for me to write new software using technologies to which I am "native" rather than contribute to OverlordBot.

Why aren't you implementing TRIPWIRE?

TRIPWIRE encourages players to think about themselves in a small bubble. It also clutters the channel with information in a format only useful to a specific player. It encourages players to act as lone wolves rather than as members of a team.

Instead, I am implementing THREAT brevity. THREAT provides a similar benefit to a player as a TRIPWIRE: it warns you when a hostile aircraft is a danger to you. The advantages:

  • THREAT calls do not require you to individually register with the bot. The bot can see the radar, and it can see which players are currently on the frequency. Therefore, it can automatically make THREAT calls to players on frequency.
  • Locations in THREAT calls can be given in either BRAA or BULLSEYE format, depending on whether the call is relevant to a single aircraft or multiple aircraft.
  • A TRIPWIRE call is only given once, at a single threat range. THREAT calls can be given at multiple threat ranges, which may be configurable based on mission requirements. For example, ATP 3-52.4 recommends 35nmi and 5nmi by default, regardless of aspect.
  • By building trackfiles, the bot can determine the aspect of aircraft and provide calls independent of range. For example, if the bot sees a retreating hostile aircraft change course and turn nose-on to a friendly aircraft 45nmi away, the bot can make a THREAT call immediately for the aircraft under threat.

Can I train the speech recognition on my voice/accent?

Since the software runs 100% locally, the speech recognition model is a local file. Server operators can provide a trained model as an alternative to the off-the-shelf model. See this blog post for an example.

I don't plan to provide a mechanism for players to submit their voice recordings to the main repository due to data privacy concerns.

Will this work with DCS's built-in VoIP?

Hopefully in the future Eagle Dynamics will add support for external GCI bots. If anyone at ED is reading this, access to any relevant preview builds would be really helpful!

When is SkyEye's birthday?

October 12th. At some point I'll put an Ace Combat 04 easter egg in there.
