A2C2 - Natural Language-Instructed Autonomous Agent for Computer Control

This repository is for 👨‍💻 developing / 🛠️ constructing / 🧪 testing and 🚀 moonshotting ideas for our bachelor thesis: Natural Language-Instructed Autonomous Agent for Computer Control (A2C2).

As part of the Machine Learning Operations module, we developed a prototype of an A2C2 and integrated several tools we learned about in the module, so that the development of the prototype is represented as an ML pipeline.

Motivation

Why do we need an A2C2?

The ultimate AI application
Assistant for using computer systems
Helpful in everyday tasks

Current challenges on the way to an A2C2

Data Generation - How and where to collect training data?
Dynamic Action Inference - How can the actions relevant for the instruction be determined?
Refinement with the User - Where does it require further information from the user?
User Interaction for Critical Tasks - When are further enquiries to the user necessary?

Goal

User-friendly Chatbot
Critical Task Detection
Missing Information Detection
Basic Pipeline for ViT Training

Components

UI

Screen Captioning
Chatbot
I/O Execution

Data Storage

Storing Experience Embeddings

Planning

Task Decomposition & Refinement

Web-Crawler

Gathering real-life Data

ViT Training

Model Fine-Tuning

Conversational Validation Component

Critical Task Detection
Missing Information Detection

LLM & VLM

Browser

Components & ML Tools

1. UI interacts with the planning component through REST
- UI with Tkinter, Python & pyautogui
- Interaction through REST with FastAPI
2. Planning component does RAG to gather more information
- Data storage (decomposition prompts & planning prompts) with oxen.ai
3. Conversational validator checks whether an action is critical or information is missing
- MAD (multi-agent debate) (see Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate)
- Data storage (debate prompts) with oxen.ai
4. Planning component uses models for user instruction interpretation & visual analysis
- GPT-4 Vision (first)
- YOLOv8 (fine-tuned, but not yet optimized for use with the planning component)
5. Web-crawler interacts with the browser to gather training data
- Gathering training data with Selenium
- Data storage (data from web crawling) with oxen.ai
6. Model is fine-tuned, stored and re-deployed
- Hyperparameter tuning with Ray Tune & wandb
(5 & 6)
- Workflow with GitHub Actions (we first tried Airflow on Google Cloud, unfortunately without success, so we use GitHub Actions instead; the workflow runs automatically, although without time-based triggering)
(1 - 6)
- Automated testing with GitHub Actions (CI/CD pipeline for deployment; CI: Flake8 checks that the Python syntax is correct; CD: triggered by semantic release, builds executables for Windows & macOS)
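
The conversational validator is described above only as a multi-agent debate, so here is a minimal sketch of what such a debate loop could look like. It is not the code used in this repository. The names `run_debate`, `Verdict`, and the stub agents are hypothetical, and the agents are plain Python functions standing in for LLM calls. The layout (one debater arguing for user confirmation, one arguing against, and a judge that may stop the debate early) follows the general idea of the cited paper under these assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Verdict:
    """Hypothetical result type of the validator."""
    critical: bool
    missing_info: List[str] = field(default_factory=list)


# Assumption: in a real system these callables would wrap LLM requests.
Debater = Callable[[str, List[str]], str]
Judge = Callable[[str, List[str]], Optional[Verdict]]


def run_debate(instruction: str, affirmative: Debater, negative: Debater,
               judge: Judge, max_rounds: int = 3) -> Verdict:
    """Let two debaters argue in turns until the judge returns a verdict."""
    transcript: List[str] = []
    for _ in range(max_rounds):
        transcript.append("PRO: " + affirmative(instruction, transcript))
        transcript.append("CON: " + negative(instruction, transcript))
        verdict = judge(instruction, transcript)
        if verdict is not None:  # the judge may end the debate early
            return verdict
    # No decision after max_rounds: stay conservative and involve the user.
    return Verdict(critical=True, missing_info=["judge could not decide"])


def stub_affirmative(instruction: str, transcript: List[str]) -> str:
    return "This action may be irreversible, so the user should confirm it."


def stub_negative(instruction: str, transcript: List[str]) -> str:
    return "The instruction is explicit, so no confirmation is needed."


def stub_judge(instruction: str, transcript: List[str]) -> Optional[Verdict]:
    # Toy rule for illustration only: treat deletions as critical.
    if len(transcript) < 2:
        return None
    return Verdict(critical="delete" in instruction.lower())


if __name__ == "__main__":
    result = run_debate("Delete all files in Downloads",
                        stub_affirmative, stub_negative, stub_judge)
    print(result)
```

Falling back to `critical=True` when the judge never decides is a design choice of this sketch: asking the user once too often is cheaper than executing a destructive action unasked.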

Install Guide

TODO
