Self-Operating Computer Framework

A framework to enable multimodal models to operate a computer.

Using the same inputs and outputs of a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective.
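The view-then-act cycle described above can be pictured as a simple loop. The following is only a minimal sketch under stated assumptions: `Action`, `run_loop`, and the three callables are hypothetical names invented for illustration and are not the framework's actual API. The model and the screen are replaced by stubs so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action record; not the framework's real data type."""
    kind: str          # "click", "type", or "done"
    x: float = 0.0     # horizontal position for clicks
    y: float = 0.0     # vertical position for clicks
    text: str = ""     # text for keyboard input


def run_loop(
    objective: str,
    capture_screen: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Observe the screen, ask the model for one action, perform it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()
        action = decide(objective, screenshot, history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history


# Stubs standing in for a real screenshot and a real multimodal model.
def fake_capture() -> bytes:
    return b""


def fake_decide(objective: str, screenshot: bytes, history: List[Action]) -> Action:
    script = [
        Action("click", x=0.5, y=0.1),
        Action("type", text="hello"),
        Action("done"),
    ]
    return script[len(history)]


executed: List[Action] = []
history = run_loop("say hello", fake_capture, fake_decide, executed.append)
```

In this sketch the loop stops either when the stubbed model returns a "done" action or when `max_steps` is reached, so a model that never finishes cannot run indefinitely.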

Key Features

Compatibility: Designed for various multimodal models.
Integration: Currently integrated with GPT-4V as the default model.
Future Plans: Support for additional models.

Current Challenges

Note: GPT-4V's error rate in estimating XY mouse click locations is currently quite high. This framework aims to track the progress of multimodal models over time, aspiring to achieve human-level performance in computer operation.
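One way to make "error rate in estimating XY click locations" concrete is to compare a predicted click with a known target. The sketch below is an illustration only: the function names, the bounding-box format, and the choice of normalizing by the screen diagonal are assumptions, not a metric defined by this project.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # left, top, right, bottom (assumed format)


def click_error(predicted: Point, target: Point, screen_size: Point) -> Tuple[float, float]:
    """Return the distance in pixels and as a fraction of the screen diagonal."""
    pixels = math.hypot(predicted[0] - target[0], predicted[1] - target[1])
    diagonal = math.hypot(screen_size[0], screen_size[1])
    return pixels, pixels / diagonal


def is_hit(predicted: Point, box: Box) -> bool:
    """A click counts as a hit if it lands inside the target element's box."""
    left, top, right, bottom = box
    x, y = predicted
    return left <= x <= right and top <= y <= bottom


pixels, fraction = click_error((103, 204), (100, 200), (1920, 1080))
hit = is_hit((103, 204), (90, 190, 110, 210))
```

Averaging `is_hit` over many labeled screenshots would give a hit rate that could be tracked across model versions.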

Ongoing Development

At HyperWriteAI, we are developing Agent-1-Vision, a multimodal model with more accurate click location predictions.

Agent-1-Vision Model API Access

We will soon be offering API access to our Agent-1-Vision model.

If you're interested in gaining access to this API, sign up here.

Additional Thoughts

We recognize that some operating system functions may be executed more efficiently with hotkeys, such as focusing the browser address bar with command + L, than by simulating a mouse click at the correct XY location. We plan to make these improvements over time. However, many actions require accurately selecting visual elements on the screen, which demands precise XY mouse click locations. A primary focus of this project is to improve the accuracy of these click locations, which we believe is essential for achieving a fully self-operating computer.
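The trade-off between hotkeys and clicks can be sketched as a lookup that prefers a known hotkey and falls back to a click only when none exists. This is a hypothetical illustration: the `HOTKEYS` table, the intent names, and `plan_action` are invented here, and the only shortcut taken from the text is command + L for the address bar.

```python
from typing import Dict, Optional, Tuple

# Hypothetical intent-to-hotkey table; only command + L comes from the README.
HOTKEYS: Dict[str, Tuple[str, ...]] = {
    "focus_address_bar": ("command", "l"),
}


def plan_action(intent: str, click_xy: Optional[Tuple[float, float]] = None) -> dict:
    """Prefer a hotkey when one is known; otherwise require a click location."""
    keys = HOTKEYS.get(intent)
    if keys is not None:
        return {"type": "hotkey", "keys": keys}
    if click_xy is None:
        raise ValueError(f"no hotkey for {intent!r} and no click location given")
    return {"type": "click", "x": click_xy[0], "y": click_xy[1]}


address_bar = plan_action("focus_address_bar")
submit = plan_action("press_submit", click_xy=(0.4, 0.7))
```

Intents without a hotkey still depend on an accurate click location, which is why click accuracy remains the main concern.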

Demo

Demo video: final-low.mp4

Quick Start Instructions

Below are instructions to set up the Self-Operating Computer Framework locally on your computer.

Clone the repo to a directory on your computer:

git clone https://github.com/OthersideAI/self-operating-computer.git

Change into the directory:

cd self-operating-computer

Create a Python virtual environment (see the Python documentation to learn more about virtual environments):

python3 -m venv venv

Activate the virtual environment:

source venv/bin/activate

Install the project requirements and the command-line interface:

pip install .

Rename the .example.env file to .env so that you can save your OpenAI key in it:

mv .example.env .env

Add your OpenAI key to your new .env file. If you don't have one, you can obtain a key from OpenAI:

OPENAI_API_KEY='your-key-here'
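Before running, it can help to confirm that the placeholder value was actually replaced. The sketch below parses simple KEY=value lines with the standard library only. It is an assumption-laden illustration: `parse_env` and `has_real_key` are hypothetical helpers, they do not cover the full .env syntax, and the framework itself may load the file differently.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_real_key(values: Dict[str, str]) -> bool:
    """True if OPENAI_API_KEY is set to something other than the placeholder."""
    key = values.get("OPENAI_API_KEY", "")
    return bool(key) and key != "your-key-here"


example = parse_env("# local settings\nOPENAI_API_KEY='your-key-here'\n")
placeholder_replaced = has_real_key(example)
```

To check a real file, pass its contents to `parse_env`, for example `parse_env(open(".env").read())`.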

Run it!

operate

Final Step: On macOS, the Terminal app will ask for "Screen Recording" and "Accessibility" permissions, which you grant on the "Security & Privacy" page of "System Preferences".

Using `operate` Modes

Voice Mode

Install the additional requirements from requirements-audio.txt:

pip install -r requirements-audio.txt

Install device requirements

For Mac users:

brew install portaudio

For Linux users:

sudo apt install portaudio19-dev python3-pyaudio

Run with voice mode

operate --voice
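If voice mode fails to start, a quick check is whether the audio dependency can be imported from the active virtual environment. The sketch below assumes that requirements-audio.txt installs PyAudio (imported as `pyaudio`), which the Linux command above suggests but which is not confirmed here. `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed module name; adjust to match what requirements-audio.txt installs.
missing = missing_modules(["pyaudio"])
if missing:
    print("Missing audio modules:", ", ".join(missing))
else:
    print("Audio modules found.")
```

A missing module here usually means the pip install step above was skipped or ran outside the virtual environment.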

Contributions Are Welcome!

If you want to contribute yourself, see CONTRIBUTING.md.

Feedback

For any input on improving this project, feel free to reach out to Josh on Twitter.

Join Our Discord Community

For real-time discussions and community support, join our Discord server.

If you're already a member, join the discussion in #self-operating-computer.
If you're new, first join our Discord server and then navigate to the #self-operating-computer channel.

Follow HyperWriteAI for More Updates

Stay updated with the latest developments:

Follow HyperWriteAI on Twitter.
Follow HyperWriteAI on LinkedIn.

Compatibility

This project is compatible with macOS, Windows, and Linux (with an X server installed).
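A small script can report which of these platforms it is running on. This is a sketch under assumptions: `describe_platform` is a hypothetical helper, and treating a set DISPLAY environment variable as evidence of an X server is a common heuristic rather than a check performed by this project.

```python
import os
import platform
from typing import Mapping, Optional


def describe_platform(
    system: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Classify the host against the platforms listed in this README."""
    system = system or platform.system()
    environ = os.environ if environ is None else environ
    if system == "Darwin":
        return "macOS"
    if system == "Windows":
        return "Windows"
    if system == "Linux":
        # Heuristic: X clients normally find the server through DISPLAY.
        if environ.get("DISPLAY"):
            return "Linux (X display found)"
        return "Linux (no DISPLAY set; an X server may be missing)"
    return f"unsupported: {system}"


print(describe_platform())
```

The optional arguments exist only so the function can be exercised without depending on the machine it runs on.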

OpenAI Rate Limiting Note

The gpt-4-vision-preview model is required. To unlock access to this model, your account needs to spend at least $5 in API credits. Pre-paying for these credits will unlock access if you haven't already spent the minimum $5.
Learn more here

