
Introduce framework to automatically generate doc for agents #504

Merged: 10 commits merged on Sep 29, 2023


@nicoloboschi (Member) commented Sep 29, 2023

  • Added new classes for generating the docs from the agent node providers. The classes live in the control plane, so they can be bound to the REST API in the future.
  • Bound the existing validation model to the agent model generation (only for drop-fields and s3-source in this PR). Each agent's configuration must be described by a Java class that uses the new annotations on its properties.

Given a class, the process is as follows:

  • We generate the standard JSON schema using an external library, com.github.victools:jsonschema-generator. It handles all the typings, including objects, collections, and nested types.
  • From the JSON schema, we read the @ConfigProperty values to get additional info: required, description, defaultValue.
  • In @AgentConfiguration it's possible to declare a name and description for the agent; the type is inferred.
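The steps above can be approximated with plain JDK reflection. This is a minimal sketch, not the PR's implementation: it skips the victools JSON Schema step and derives a simple type name directly from each field's Java type, then reads hypothetical stand-ins for `@AgentConfiguration` and `@ConfigProperty` (attribute names taken from the description, real definitions may differ). `PropertyName` is a hypothetical replacement for Jackson's `@JsonProperty` so the sketch needs only the JDK; `AgentDocGenerator` and `DemoSourceConfiguration` are illustrative names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for the PR's annotations; the real definitions may differ.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface AgentConfiguration {
    String name();
    String description() default "";
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ConfigProperty {
    String description() default "";
    boolean required() default false;
    String defaultValue() default "";
}

// Hypothetical replacement for Jackson's @JsonProperty (renames the documented key).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface PropertyName {
    String value();
}

class AgentDocGenerator {
    // Maps a Java field type to a JSON Schema type name.
    static String jsonType(Class<?> type) {
        if (type == String.class) return "string";
        if (type == int.class || type == Integer.class
                || type == long.class || type == Long.class) return "integer";
        if (type == double.class || type == Double.class
                || type == float.class || type == Float.class) return "number";
        if (type == boolean.class || type == Boolean.class) return "boolean";
        if (type.isArray() || Collection.class.isAssignableFrom(type)) return "array";
        return "object";
    }

    // Builds {name, description, properties} for one annotated configuration class.
    static Map<String, Object> describe(Class<?> configClass) {
        Map<String, Object> doc = new LinkedHashMap<>();
        AgentConfiguration agent = configClass.getAnnotation(AgentConfiguration.class);
        if (agent != null) {
            doc.put("name", agent.name());
            doc.put("description", agent.description());
        }
        Map<String, Object> properties = new TreeMap<>(); // sorted by property name
        for (Field field : configClass.getDeclaredFields()) {
            ConfigProperty prop = field.getAnnotation(ConfigProperty.class);
            if (prop == null) {
                continue; // constants and unannotated fields are not documented
            }
            PropertyName rename = field.getAnnotation(PropertyName.class);
            String key = rename != null ? rename.value() : field.getName();
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("description", prop.description().strip()); // drop text-block newlines
            entry.put("required", prop.required());
            entry.put("type", jsonType(field.getType()));
            if (!prop.defaultValue().isEmpty()) {
                entry.put("defaultValue", prop.defaultValue());
            }
            properties.put(key, entry);
        }
        doc.put("properties", properties);
        return doc;
    }
}

// A trimmed, hypothetical configuration class used to exercise the generator.
@AgentConfiguration(name = "Demo Source", description = "Reads demo data")
class DemoSourceConfiguration {
    static final String DEFAULT_BUCKET = "demo-bucket";

    @ConfigProperty(description = "The name of the bucket to read from.",
            defaultValue = DEFAULT_BUCKET)
    private String bucketName = DEFAULT_BUCKET;

    @ConfigProperty(description = "Number of items per batch.", defaultValue = "10")
    @PropertyName("batch-size")
    private int batchSize = 10;
}
```

Because annotation attributes must be compile-time constants, `defaultValue` is always a string here, which matches how the generated output reports `"defaultValue" : "5"` for an `int` property.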

Validation then uses the generated model to check that the required fields are present.
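A minimal sketch of the required-field check, assuming the per-agent `properties` map has the shape shown in the generated output below (`"required"`, `"type"`, and so on). `RequiredPropertyValidator` and `validate` are hypothetical names; the real validation in the PR may report errors differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RequiredPropertyValidator {
    /**
     * Fails if any property marked "required": true in the generated agent doc
     * is absent (or null) in the user-supplied agent configuration.
     *
     * @param agentType     agent type, used only in the error message
     * @param properties    the "properties" section of the generated doc
     * @param configuration the raw configuration of one agent instance
     */
    static void validate(String agentType,
                         Map<String, Map<String, Object>> properties,
                         Map<String, Object> configuration) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : properties.entrySet()) {
            boolean required = Boolean.TRUE.equals(e.getValue().get("required"));
            if (required && configuration.get(e.getKey()) == null) {
                missing.add(e.getKey());
            }
        }
        if (!missing.isEmpty()) {
            missing.sort(null); // natural order, for a stable error message
            throw new IllegalArgumentException(
                    "Missing required properties for agent '" + agentType + "': " + missing);
        }
    }
}
```

Collecting every missing key before throwing lets a user fix all of them in one pass instead of one error at a time.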

Example class for S3 source:


@AgentConfiguration(name = "S3 Source", description = "Reads data from S3 bucket")
@Data
public class S3SourceConfiguration {

    protected static final String DEFAULT_BUCKET_NAME = "langstream-source";
    protected static final String DEFAULT_ENDPOINT = "http://minio-endpoint.-not-set:9090";
    protected static final String DEFAULT_ACCESSKEY = "minioadmin";
    protected static final String DEFAULT_SECRETKEY = "minioadmin";
    protected static final String DEFAULT_FILE_EXTENSIONS = "pdf,docx,html,htm,md,txt";

    @ConfigProperty(description = """
            The name of the bucket to read from.
            """,
            defaultValue = DEFAULT_BUCKET_NAME)
    private String bucketName = DEFAULT_BUCKET_NAME;

    @ConfigProperty(description = """
            The endpoint of the S3 server.
            """,
            defaultValue = DEFAULT_ENDPOINT)
    private String endpoint = DEFAULT_ENDPOINT;

    @ConfigProperty(description = """
            Access key for the S3 server.
            """,
            defaultValue = DEFAULT_ACCESSKEY)
    @JsonProperty("access-key")
    private String accessKey = DEFAULT_ACCESSKEY;

    @ConfigProperty(description = """
            Secret key for the S3 server.
            """,
            defaultValue = DEFAULT_SECRETKEY)
    @JsonProperty("secret-key")
    private String secretKey = DEFAULT_SECRETKEY;

    @ConfigProperty(
            required = false,
            description = """
                    Region for the S3 server.
                    """)
    private String region = "";

    @ConfigProperty(
            defaultValue = "5",
            description = """
                    Region for the S3 server.
                    """)
    @JsonProperty("idle-time")
    private int idleTime = 5;


    @ConfigProperty(
            defaultValue = DEFAULT_FILE_EXTENSIONS,
            description = """
                    Comma separated list of file extensions to filter by.
                    """
    )
    @JsonProperty("file-extensions")
    private String fileExtensions = DEFAULT_FILE_EXTENSIONS;

}

The result is the following:

{
  "version" : "0.0.23-SNAPSHOT",
  "agents" : {
    "ai-chat-completions" : { },
    "ai-text-completions" : { },
    "ai-tools" : { },
    "cast" : { },
    "composite-agent" : { },
    "compute" : { },
    "compute-ai-embeddings" : { },
    "document-to-json" : { },
    "drop" : { },
    "drop-fields" : { },
    "flatten" : { },
    "identity" : { },
    "language-detector" : { },
    "merge-key-value" : { },
    "noop" : { },
    "python-processor" : { },
    "python-sink" : { },
    "python-source" : { },
    "query" : { },
    "query-vector-db" : { },
    "s3-source" : {
      "name" : "S3 Source",
      "description" : "Reads data from S3 bucket",
      "properties" : {
        "access-key" : {
          "description" : "Access key for the S3 server.",
          "required" : false,
          "type" : "string",
          "defaultValue" : "minioadmin"
        },
        "bucketName" : {
          "description" : "The name of the bucket to read from.",
          "required" : false,
          "type" : "string",
          "defaultValue" : "langstream-source"
        },
        "endpoint" : {
          "description" : "The endpoint of the S3 server.",
          "required" : false,
          "type" : "string",
          "defaultValue" : "http://minio-endpoint.-not-set:9090"
        },
        "file-extensions" : {
          "description" : "Comma separated list of file extensions to filter by.",
          "required" : false,
          "type" : "string",
          "defaultValue" : "pdf,docx,html,htm,md,txt"
        },
        "idle-time" : {
          "description" : "Region for the S3 server.",
          "required" : false,
          "type" : "integer",
          "defaultValue" : "5"
        },
        "region" : {
          "description" : "Region for the S3 server.",
          "required" : false,
          "type" : "string"
        },
        "secret-key" : {
          "description" : "Secret key for the S3 server.",
          "required" : false,
          "type" : "string",
          "defaultValue" : "minioadmin"
        }
      }
    },
    "sink" : { },
    "source" : { },
    "text-extractor" : { },
    "text-normaliser" : { },
    "text-splitter" : { },
    "unwrap-key-value" : { },
    "vector-db-sink" : { },
    "webcrawler-source" : { }
  }
}
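The generated document lists every agent type in alphabetical order and falls back to an empty object for agents whose configuration class is not annotated yet. A hedged sketch of that assembly step, assuming the per-agent descriptions are already available as maps; `AgentsDocAssembler` and its parameters are illustrative names, not the PR's API.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class AgentsDocAssembler {
    /**
     * Builds the top-level {"version", "agents"} document.
     *
     * @param version             project version string
     * @param supportedAgentTypes every agent type exposed by the providers
     * @param documented          descriptions for agent types that have an annotated
     *                            configuration class, keyed by agent type
     */
    static Map<String, Object> assemble(String version,
                                        Collection<String> supportedAgentTypes,
                                        Map<String, Map<String, Object>> documented) {
        Map<String, Object> agents = new TreeMap<>(); // alphabetical, as in the output above
        for (String type : supportedAgentTypes) {
            // Agents without an annotated class are still listed, as an empty object.
            agents.put(type, documented.getOrDefault(type, Map.of()));
        }
        Map<String, Object> root = new LinkedHashMap<>(); // keeps "version" first
        root.put("version", version);
        root.put("agents", agents);
        return root;
    }
}
```

Serializing `root` with a JSON library such as Jackson would yield a document of the same shape as the one above.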

@nicoloboschi changed the title from "Introduce framework to generated doc for agents" to "Introduce framework to automatically generate doc for agents" on Sep 29, 2023

@eolivelli (Member) left a comment

One problem is that here we are getting the information from the "runtime part", but the configuration is primarily handled on the planner side, i.e. by the AgentNodeProvider.

We should also cover the resources, the assets, the datasource/vector-database configuration, and the AI providers.

@@ -26,4 +28,6 @@ public interface AgentCodeProvider {
     * @return the new AgentCode
     */
    AgentCode createInstance(String agentType);

    Collection<String> getSupportedAgentTypes();

Why do we need this?

                configuration
                        .getOrDefault("endpoint", "http://minio-endpoint.-not-set:9090")
                        .toString();
        String username = configuration.getOrDefault("access-key", "minioadmin").toString();

This is an old-style agent; all the new code now uses ConfigurationUtils. How can we deal with it? Move everything to ObjectMapper?
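Jackson's `ObjectMapper.convertValue(map, S3SourceConfiguration.class)` is the generic way to bind such a map to the annotated class, honoring its `@JsonProperty` names. As a JDK-only illustration of the same idea, a hedged sketch: parse the raw map once into a typed object whose defaults are declared in one place, instead of repeating literal defaults at each `getOrDefault` call. `S3SourceSettings` and `fromMap` are hypothetical names, not part of the codebase.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical typed view of the s3-source settings. The default values mirror the
// constants declared in S3SourceConfiguration in the PR description.
final class S3SourceSettings {
    static final String DEFAULT_ENDPOINT = "http://minio-endpoint.-not-set:9090";
    static final String DEFAULT_ACCESSKEY = "minioadmin";
    static final int DEFAULT_IDLE_TIME = 5;

    final String endpoint;
    final String accessKey;
    final int idleTime;

    private S3SourceSettings(String endpoint, String accessKey, int idleTime) {
        this.endpoint = endpoint;
        this.accessKey = accessKey;
        this.idleTime = idleTime;
    }

    // Reads the raw configuration map once; missing or null entries fall back to the
    // shared defaults instead of literals repeated at every lookup.
    static S3SourceSettings fromMap(Map<String, Object> configuration) {
        String endpoint = Objects.toString(configuration.get("endpoint"), DEFAULT_ENDPOINT);
        String accessKey = Objects.toString(configuration.get("access-key"), DEFAULT_ACCESSKEY);
        Object rawIdle = configuration.get("idle-time");
        // The value may arrive as an Integer or as a String, so parse its text form.
        int idleTime = rawIdle == null
                ? DEFAULT_IDLE_TIME
                : Integer.parseInt(rawIdle.toString().trim());
        return new S3SourceSettings(endpoint, accessKey, idleTime);
    }
}
```

Keeping the defaults in shared constants means the runtime fallback and the `defaultValue` published in the generated doc cannot silently drift apart.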

@nicoloboschi (Member, Author) commented

Thanks @eolivelli, I moved everything to use the AgentNodeProviders instead of the runtime API.

@eolivelli (Member) left a comment

LGTM

@nicoloboschi merged commit e5fe819 into main on Sep 29, 2023 (8 checks passed)
@nicoloboschi deleted the agemts-doc branch on September 29, 2023 at 16:10
benfrank241 pushed a commit to vectorize-io/langstream that referenced this pull request May 2, 2024