Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add plugins/commands for common I/O operations and train GPT with n-shots to use them #3686

Closed
Emasoft opened this issue May 2, 2023 · 7 comments
Labels

Comments

@Emasoft
Copy link

Emasoft commented May 2, 2023

Duplicates

  • I have searched the existing issues

Summary 馃挕

GPT-3 or 4 often tries to code Python scripts to do basic I/O operations, because it has no choice. We didn't provide it with the basic and most common commands to handle I/O operations on documents. So it can only resort to writing the functions by itself in Python.
The proposed solution is the following:

  1. Give Auto-GPT a set of high level (crossplatform abstractions) plugins/commands (it would be better if those were converted to plugins) to solve these simple but common I/O tasks:
  • Get System Informations (free disk space, free mem, hw info, operating system, language, file system, allowed chars in file system, max file size, current ip, remote host, user privileges, Python version, installed Python libs, shell, working dirs, dot files, gitconfig, etc.)
  • Read a Document (type: txt, rtf, doc, epub, markdown, csv, tsv, json, jsonl, yaml, chatML, excel, powerpoint, pdf, html, xhtml, svg, css, xml, js, py...)
  • Write a New Document (type: txt, rtf, doc, epub, markdown, csv, tsv, json, jsonl, chatML, yaml, excel, powerpoint, pdf, html, xhtml, svg, xml, js, py...)
  • Append/Prepend Text or Data to Existing Document
  • Search and Replace Text/Data inside Existing Document
  • Edit/Modify/Improve/Summarize/Refactor a Document according to given Rules
  • Count chars/words/lines/bytes of Document content
  • Join Two or More Documents in the given Order
  • Split a Document in Two or More Documents at the given lines/chars/pages
  • Search inside existing Document
  • Compress Files/Folders in Zip/Rar/Tar/gz Archive
  • Unpack Zip/Rar/Tar/gz Archive File to Folder
  • Check Zip/Rar/Tar/gz file integrity
  • Index a Document for Similarity Search (or Embed a vector tokenization of it)
  • Compute tokenization costs of Text/Document/Prompt/n-shot training
  • Extract/Scrape Document from URL
  • Extract/Scrape Data from URL
  • Download web assets from a website
  • Upload web assets to a website
  • Hash remote/local file and store hash signature
  • Check remote/local file integrity via Hash signature
  • Render markdown to HTML, DOCX or PDF file
  • Save table to excel file or csv/tsv or json/yaml
  • Load table data from excel file, markdown, csv/tsv or json/yaml
  • Generate n-shot training conversation in chatML format from training data
  • Load Audio File and convert to Text
  • Save generated Audio file from Text
  • Join Audio Files with ffmpeg
  • Split Audio Files with ffmpeg
  • Join Video Files with ffmpeg
  • Split Video Files with ffmpeg
  • Extract and Save Audio track from Video file with ffmpeg
  • Read document metadata (Creation date, Last updated, Last Access, Permissions, ID3 for MP3, Atom for MP4, Audiobooks, manifest for .ePub, etc.)
  • Write document metadata (Permissions, ID3 for MP3, Atom for MP4, Audiobooks, manifest for .ePub, etc.)
  • Change document metadata
  • Duplicate documents
  • Tag document with keyword
  • Find documents by tag
  • Rename file/folder
  • Batch renaming of files/folders
  • Delete document(s)
  • Archive/backup document
  • Restore document from backup/archive
  • Download files from url
  • Upload files to url
  • Create ssh keypair for remote host
  • Change ssh keypair for remote host
  • Backup ssh keypair for remote host
  • Upload web assets to remote host via ssh
  • Add document to local Git versioning
  • Commit/push modified document to local Git
  • Save GPT auto generated change log of modified document
  • Generate diffs of document compared to previous version in Git
  • Revert to previous version of Document using Git
  • Resume Last I/O Task state if interrupted (all I/O tasks should autoresume after a blackout or crash)
  1. The current commands available to Auto-GPT should also be converted in plugins:
  • command_registry.import_commands("autogpt.commands.analyze_code")
  • command_registry.import_commands("autogpt.commands.audio_text")
  • command_registry.import_commands("autogpt.commands.execute_code")
  • command_registry.import_commands("autogpt.commands.file_operations")
  • command_registry.import_commands("autogpt.commands.git_operations")
  • command_registry.import_commands("autogpt.commands.google_search")
  • command_registry.import_commands("autogpt.commands.image_gen")
  • command_registry.import_commands("autogpt.commands.improve_code")
  • command_registry.import_commands("autogpt.commands.twitter")
  • command_registry.import_commands("autogpt.commands.web_selenium")
  • command_registry.import_commands("autogpt.commands.write_tests")
  • command_registry.import_commands("autogpt.app")

We also need to create some specific n-shot training files (chatML, jsonl) for each of the above to let GPT learn how to use them correctly (like for GPT-4 plugins).

  1. Create N-Shot training files (ChatML, jsonl) for the above plugins:
  • Get System Informations (free disk space, free mem, hw info, operating system, language, file system, allowed chars in file system, max file size, current ip, remote host, user privileges, Python version, installed Python libs, shell, working dirs, dot files, gitconfig, etc.)
  • Read a Document (type: txt, rtf, doc, epub, markdown, csv, tsv, json, jsonl, yaml, chatML, excel, powerpoint, pdf, html, xhtml, svg, css, xml, js, py...)
  • Write a New Document (type: txt, rtf, doc, epub, markdown, csv, tsv, json, jsonl, chatML, yaml, excel, powerpoint, pdf, html, xhtml, svg, xml, js, py...)
  • Append/Prepend Text or Data to Existing Document
  • Search and Replace Text/Data inside Existing Document
  • Edit/Modify/Improve/Summarize/Refactor a Document according to given Rules
  • Count chars/words/lines/bytes of Document content
  • Join Two or More Documents in the given Order
  • Split a Document in Two or More Documents at the given lines/chars/pages
  • Search inside existing Document
  • Compress Files/Folders in Zip/Rar/Tar/gz Archive
  • Unpack Zip/Rar/Tar/gz Archive File to Folder
  • Check Zip/Rar/Tar/gz file integrity
  • Index a Document for Similarity Search (or Embed a vector tokenization of it)
  • Compute tokenization costs of Text/Document/Prompt/n-shot training
  • Extract/Scrape Document from URL
  • Extract/Scrape Data from URL
  • Download web assets from a website
  • Upload web assets to a website
  • Hash remote/local file and store hash signature
  • Check remote/local file integrity via Hash signature
  • Render markdown to HTML, DOCX or PDF file
  • Save table to excel file or csv/tsv or json/yaml
  • Load table data from excel file, markdown, csv/tsv or json/yaml
  • Generate n-shot training conversation in chatML format from training data
  • Load Audio File and convert to Text
  • Save generated Audio file from Text
  • Join Audio Files with ffmpeg
  • Split Audio Files with ffmpeg
  • Join Video Files with ffmpeg
  • Split Video Files with ffmpeg
  • Extract and Save Audio track from Video file with ffmpeg
  • Read document metadata (Creation date, Last updated, Last Access, Permissions, ID3 for MP3, Atom for MP4, Audiobooks, manifest for .ePub, etc.)
  • Write document metadata (Permissions, ID3 for MP3, Atom for MP4, Audiobooks, manifest for .ePub, etc.)
  • Change document metadata
  • Duplicate documents
  • Tag document with keyword
  • Find documents by tag
  • Rename file/folder
  • Batch renaming of files/folders
  • Delete document(s)
  • Archive/backup document
  • Restore document from backup/archive
  • Download files from url
  • Upload files to url
  • Create ssh keypair for remote host
  • Change ssh keypair for remote host
  • Backup ssh keypair for remote host
  • Upload web assets to remote host via ssh
  • Add document to local Git versioning
  • Commit/push modified document to local Git
  • Save GPT auto generated change log of modified document
  • Generate diffs of document compared to previous version in Git
  • Revert to previous version of Document using Git
  • Resume Last I/O Task state if interrupted (all I/O tasks should autoresume after a blackout or crash)
  • command_registry.import_commands("autogpt.commands.analyze_code")
  • command_registry.import_commands("autogpt.commands.audio_text")
  • command_registry.import_commands("autogpt.commands.execute_code")
  • command_registry.import_commands("autogpt.commands.file_operations")
  • command_registry.import_commands("autogpt.commands.git_operations")
  • command_registry.import_commands("autogpt.commands.google_search")
  • command_registry.import_commands("autogpt.commands.image_gen")
  • command_registry.import_commands("autogpt.commands.improve_code")
  • command_registry.import_commands("autogpt.commands.twitter")
  • command_registry.import_commands("autogpt.commands.web_selenium")
  • command_registry.import_commands("autogpt.commands.write_tests")
  • command_registry.import_commands("autogpt.app")

Examples 馃寛

See also issue #3445

Motivation 馃敠

GPT-3 or 4 often tries to code Python scripts to do basic I/O operations, because it has no choice. We didn't provide it with the basic and most common commands to handle I/O operations on documents. So it can only resort to writing the functions by itself in Python. What we need to do is to give Auto-GPT a set of plugins/commands (it would be better if those were converted to plugins) to solve these simple but common I/O tasks.

@Boostrix
Copy link
Contributor

Boostrix commented May 2, 2023

Note that as far as I've seen the command system is in the process of being overhauled (centralized) apparently, and so is the plugin system. Implementing commands on top of plugins does sound logical though! There also is the ongoing work on allowing commands to be individually enabled/disabled - if commands are based on plugins, that's something that would be implicitly supported

The other point worth making is that while these commands may come in handy, they would be cluttering up the context window / prompt quite a bit.

In other words, you would want to offer categories of commands and/or lazy (dynamic) command suggestions to prevent that.

Also, keep in mind the combinatorial explosion once these commands are offered - aka the enormous solution space

@Emasoft
Copy link
Author

Emasoft commented May 2, 2023

Without those plugins, which are basic I/O functions, Auto-GPT would be maimed.

@aishwd94
Copy link

aishwd94 commented May 2, 2023

Why does this have to be in any other language than Python ? Wouldn't it just increase the complexity? Almost all of the above tasks are possible in Python and Python is supported on all platforms, so basic I/O in Python should be enough right ? In any case Auto-GPT would be able to write and debug code in any other language if its prompted to do so, so why develop a bunch of plugins in 10 different languages to achieve the same thing which would be achieved by a plugin in only one language on any machine ? Also, Python is just the middle layer between data and the Actual AI Agent like chatGPT. How and in what language we read that data from disk/web and pass on to the AI Agent, and back from AI Agent to disk/web is inconsequential to the task being performed.
One area where it might be required is: where an API/codec is not available in Python, for example lucene, but even for that wrapper APIs like PyLucene exist in Python, but even in that case the work of codec development is best offloaded to the language itself (Python in this case) and must not be a part of this project, isn't it?

@Boostrix
Copy link
Contributor

Boostrix commented May 4, 2023

Without those plugins, which are basic I/O functions, Auto-GPT would be maimed.

Also see #56

@Boostrix
Copy link
Contributor

Boostrix commented May 6, 2023

After having thought about it a little, here's a proposal (draft really, feedback welcome!) to provide a generic and extensible framework to allow people to set up all sorts of workflows in the form of custom wizards (JSON based) right inside Auto-GPT so that these can be shared/reused and extended as needed (including by the agent itself): #3911

Feel free to add your own 2c

@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2023

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

@github-actions github-actions bot added the Stale label Sep 6, 2023
@github-actions
Copy link
Contributor

This issue was closed automatically because it has been stale for 10 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants