Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch frontend to new filtering API #1001

Closed
dmos62 opened this issue Jan 20, 2022 · 25 comments
Closed

Switch frontend to new filtering API #1001

dmos62 opened this issue Jan 20, 2022 · 25 comments
Assignees
Labels
type: enhancement New feature or request wontfix This will not be worked on work: frontend Related to frontend code in the mathesar_ui directory

Comments

@dmos62
Copy link
Contributor

dmos62 commented Jan 20, 2022

Over on the backend, we seem to have converged on a filtering API that we're happy with (relevant PR). The purpose of this thread is to discuss the proposal with the frontend team.

To understand the new filtering API, it's probably best to forget about the current/previous API, since there's hardly any things in common. The new API is more powerful, flexible and extensible, but it also requires more from the frontend implementors.

Tutorial

Here's the expected high-level workflow from frontend's perspective; the goal is to describe the filter in terms of a database function that returns a boolean:

  1. Query the /api/v0/functions endpoint to get the list of supported database functions (also known as db_functions or DbFunction subclasses in the backend) and their hints (hints are important).

Here's an example of what the endpoint returns. Notice that hints can nest. The starts_with database function is hinted to return a boolean, and to take two parameters, all of which are expected to be string-like.

GET /api/v0/functions/

[
    ...,
    {
        "id": "starts_with",
        "name": "Starts With",
        "hints": [
            {
                "id": "returns",
                "hints": [
                    {
                        "id": "boolean"
                    }
                ]
            },
            {
                "id": "parameter_count",
                "count": 2
            },
            {
                "id": "all_parameters",
                "hints": [
                    {
                        "id": "string_like"
                    }
                ]
            }
        ]
    },
    ...
]
  1. Query the /api/v0/db_types endpoint to get the hints associated with each database type:

Below sample shows the DB type character varying to have the hint string_like, the DB type uri to have the hint uri, and the DB type numeric to have the hint comparable. Note that these hint sets are incomplete; one should expect most DB types to have multiple hints. Also, note that hints are pretty expressive and contain different types of information, like information about parameters, return types, type attributes (e.g. string_like, comparable). An advanced user might extend the hint and the function sets to suit his use case.

GET /api/v0/db_types

[
    ...,
    {
        "id": "character varying",
        "hints": [
            {
                "id": "string_like"
            }
        ]
    },
   {
        "id": "uri",
        "hints": [
            {
                "id": "uri"
            }
        ]
    },
   {
        "id": "numeric",
        "hints": [
            {
                "id": "comparable"
            }
        ]
    },
    ...,
]
  1. Use above information to suggest to the user functions to use or types of data to input into those functions. If a function's parameter has hints X, Y, Z, those could be considered constraints on what database types can be passed as that parameter. In other words, only database types that have all the corresponding hints X, Y, Z are hinted to be safe for this parameter. Also, you'll probably want the function to resolve to a boolean, since it's a filter.

Note I'm not trying to suggest a UI design. It should be as simple or as advanced as the spec says, but point is that, whether implicitly or explicitly, the user will be assembling a function expression.

Note that to be expressive with database functions you'll have to nest them. For example, to check if a URI's authority starts with "foo", you'll do something like starts_with(extract_uri_authority(column_reference("that_uri_column")), literal("foo")) (pseudo code; actual JSON syntax lower down).

Note that parameter position will matter. We might need a convention or some sort of documentation to tell whether, for example, it's starts_with(string, prefix) or starts_with(prefix, string). For functions with two parameters, the usual string starts_with prefix == starts_with(string, prefix) logic might be enough. But, we could have more complicated functions. Adding doc strings or parameter titles to the DB function declarations would not be complicated. This is not a solved problem as of yet, but I don't consider it a blocker either, since we can get a lot done already without having great support for functions with complicated signatures.

Note that some functions will not hint at their return value type (currently all defined functions could have return hints, but that will probably not hold for all conceivable functions), and/or their parameter count, and/or their parameter type. If a function doesn't have a hint for this or for that, you can consider that aspect of it to not be constrained, or that it's up to the user to judge when to use it. For example, empty doesn't hint at a parameter type (it doesn't matter); and doesn't hint at the parameter count, since it can take any number of inputs. Conceivably, we could introduce a hint like minimum_parameter_count to constrain the lower bound of parameter count, but not the upper bound.

  1. Submit the resulting function expression, in JSON format, as a filter just as you previously would have (e.g. when retrieving records). The JSON format is:
{"in": [
  {"column_reference": ["column1"]}, 
  {"list": [
    {"column_reference": ["column2"]}, 
    {"to_lowercase": [
      {"column_reference": ["column3"]}, 
    ]},
    {"literal": ["foo"]},
    {"literal": ["bar"]},
  ]}, 
]}

That's the equivalent of the following SQL expression:

column1 IN (column2, LOWER(column3), "foo", "bar")

Note that every function is a dict with a single key-value pair. The key is always the function id, and the value is always a list of parameters. A parameter can be a literal, in the case of the literal or the column_reference functions, or a nested function expression (in the form of a dict).

Note that everything is a function in this syntax. A column reference is the column name/identifier wrapped in the column_reference function; a list (as in the right hand argument to IN) is the list function and its parameters are list items; a literal, like the string "foo", is wrapped in the literal function. All of these functions are exposed in the /functions endpoint.

Please submit your feedback and/or questions.

@dmos62 dmos62 added type: enhancement New feature or request work: frontend Related to frontend code in the mathesar_ui directory status: draft labels Jan 20, 2022
@kgodey kgodey added this to the [07] Initial Data Types milestone Jan 20, 2022
@pavish
Copy link
Member

pavish commented Jan 21, 2022

@dmos62 I really really like the hints representation and I imagine it must have been quite fun implementing it.

It suits all usecases from the backend and api perspectives, if a user is to manually look at things and form the query. However, we would require additional information and possibly some change in the json structure for being able to use this on a client.

Following are some of my initial thoughts on how to make it easier for clients. I've added them as multiple top level comments.

@pavish
Copy link
Member

pavish commented Jan 21, 2022

1. Provide a role for each function.

Problem:

From any frontend client perspective, it is absolutely essential to know what each function does i.e what kind of function it is.

Both and and empty returns boolean but they differ significantly on what they do, so it's hard for clients to differentiate them. This will lead to having to hardcode the functions on the client, making the purpose of the functions api void.

Suggestion:

Providing a role for each function will let clients know how to use them, especially for frontend clients when having to construct layouts.

These roles would be a small set of values that the client will have to know, which is okay.

Eg.,

 - and, or :
   - role: "logical_function"
   - parameter_count: -1 // Means any number of parameters

 - not:
   - role: "logical_function"
   - parameter_count: 1
 
 - column_reference :
   - role: "data_manipulator"
   - parameter_count: 1

 - extract_uri_authority:
   - role: "data_manipulator"
   - parameter_count: 1
 
 - empty :
   - role: "conditional_function",
   - parameter_count: 1

 - starts_with, ends_with:
   - role: "conditional_function"
   - parameter_count: 2
 
 - lesser_than, greater_than:
   - role: "conditional_function"
   - parameter_count: 2
 
 - list:
   - role: "data_list"
   - parameter_count: -1

@pavish
Copy link
Member

pavish commented Jan 21, 2022

2. Hints seem to represent both properties and arguments for functions. Split hints into properties and parameters and represent properties as JSON object.

Problem:

  • Hints as an array makes it very hard for clients to manipulate, since hints represent functions, function arguments and properties of functions.
  • For eg., returns, parameter_count etc., are defining properties of the function, and parameters are function arguments.
  • Grouping them based on what they represent, would make it easier to use and also improves readability.

Suggestion:

Split hints into properties and parameters

  [
    {
      id: "and",
      name: "and",
      role: "logical_function",
      properties: {
        returns: "boolean",
        parameter_count: -1
      }
    },
    {
      id: "starts_with",
      name: "Starts with",
      role: "conditional_function",
      properties: {
        returns: "boolean",
        parameter_count: 2
      },
      parameters: [
        // Each of the items in this array can have both `properties` and `parameters`.
        // Essentially same as the hints structure.
        // Explicitly define each parameter when the parameter_count is finite
        {
          id: "string_like"
        },
        {
          id: "string_like"
        }
      ]
    },
    {
      id: "to_lower_case",
      name: "To Lower Case",
      role: "data_manipulator",
      properties: {
        returns: "string_like",
        parameter_count: 1
      },
      parameters: [
        {
          id: "string_like"
        }
      ]
    },
  ]

This will make it easier for the clients to construct something like starts_with(extract_uri_authority(column_reference("that_uri_column")), literal("foo")), based on reading properties.returns instead of having to go through the entire list.

@pavish
Copy link
Member

pavish commented Jan 21, 2022

3. Provide additional properties for specific functions. (This builds on top of 2)

Problem: Certain functions like list might require additional restrictions such as what type of properties to accept. We do not have a way to represent it yet.

Suggestion:

Add additional meta information like allowed_parameter_types to properties

[
  {
    id: "list",
    name: "List",
    properties: {
      returns: "data_list",
      parameter_count: -1,
      allowed_parameter_types: ["comparable", "string_like"]
    }
  }
]

@pavish
Copy link
Member

pavish commented Jan 21, 2022

4. If possible, add names for parameters, so that requests can use the name instead of relying on index. (This builds on top of 2)

Problem: Relying strictly on index would make it hard to use the api in the long run, especially when complex functions are involved and index strictly matters. This is not a priority but does make things easier.

Suggestion:

  • Add a name property to each parameter when parameter_count is finite.
    • When parameter_count is finite, the client request will be an object.
  • When parameter_count is not finite, name property is not required.
    • When parameter_count is not finite, the client request will be an array.
  [
    {
      id: "lesser",
      name: "Lesser than",
      role: "conditional_function",
      properties: {
        returns: "boolean",
        parameter_count: 2
      },
      parameters: [
        {
          id: 'comparable',
          name: 'lhs'
        },
        {
          id: 'comparable',
          name: 'rhs'
        }
      ]
    },
    {
      id: "in",
      name: "In",
      role: "conditional_function",
      properties: {
        returns: "boolean",
        parameter_count: 2
      },
      parameters: [
        {
          id: 'any', //  To represent any type
          name: 'find'
        },
        {
          id: 'list',
          name: 'bucket'
        }
      ]
    },
    {
      id: "column_reference",
      name: "Column Reference",
      properties: {
        returns: "type_of_argument", // This is a hint the client will know and calculate
        parameter_count: 1,
      },
      parameters: [
        {
          id: 'column',
          name: 'column'
        }
      ]
    }
  ]

The request sent from the client will then look something like:

  {
    lesser: {
      lhs: 3,
      rhs: 5
    }
  }

In would be an object. The bucket parameter would be an array since it accepts any number of arguments.

  {
    in: {
      find: { column_reference: { column: 'column1' } },
      bucket: [
        { column_reference: { column: 'column2' } },
        { literal: { value: 'foo' } }
      ]
    }
  }

And would be an array since it can accept any number of arguments.

  {
    and: [
      { ... },
      { ... }
    ]
  }

@pavish
Copy link
Member

pavish commented Jan 21, 2022

5. For the db_types endpoint, rename hints and alter structure for better readability.

Readability suggestions:

  • Rename hints to something more readable like extends or implements.
  • Change structure from array.{id} to just an array with ids, since this endpoint only represents what each db type is.
[
  {
    id: "character_varying",
    extends: ["string_like"]
  },
  {
    id: "uri",
    extends: ["uri", "string_like"]
  }
]

@pavish
Copy link
Member

pavish commented Jan 21, 2022

I'm also assigning the backend team to this issue since the suggestions include some significant changes in the JSON structure.

We could also convert this issue to a discussion to make it easier to discuss each suggestion individually.

@dmos62
Copy link
Contributor Author

dmos62 commented Jan 21, 2022

Thanks for the detailed and very well structured feedback, @pavish!

Reading your suggestions I think I know where you're coming from. I wrote the previous revision of this refactor in a spirit similar to what you're talking about. I'd sum it up with the word strict, maybe? After discussing that previous revision amongst ourselves, we decided to go for another iteration, aiming for something relaxed and highly simple. Especially, from the perspective of user developers. That is users that might jump into the code to add new functions themselves.

  1. Provide a role for each function.

The roles you outlined, if I understand them correctly, can be inferred from the function signatures (types of their inputs and output). If there's some information that is missing in the current hint sets, that can be added via a new hint.

  1. Hints seem to represent both properties and arguments for functions. Split hints into properties and parameters and represent properties as JSON object.

In my eyes argument types, counts, etc. are function properties too. I'd say that given the sense of flexibility we want to achieve with this API, a homogenous hints system for all properties is more congruent with that than exploding it (for the lack of a better word) into multiple other top-level items. From the perspective of frontend implementors, I wouldn't expect this API design choice to be significant. Am I right?

Note that the spirit of this API is that we're not actually saying what the properties of a function are. We're giving hints about what they might be. Hence the term hint. We can create a UI that uses these hints to constrain the user, but I'd say that the backend team is leaning towards an API that's written in the suggestive tone, as opposed to the authoritative or imperative tones.

We're also not saying to the future user developers that hints are mandatory. The ultimate vision of this, as I understand it, is that the user shouldn't have to study the hint system to use his newly written function. He can leave the hints out, and then he won't profit from the suggestions that the UI can give him, but he'll still be able to use the function in the UI. I'm not implying that all this is planned for Alpha, though.

  1. Provide additional properties for specific functions. (This builds on top of 2)

I think that current hints system is capable of carrying that kind of information, but I'm not sure since I don't fully understand the intent of allowed_parameter_types. Do you mean having lists in general only hold specific item types? That could be achieved by putting parameter type hints on the list function.

  1. If possible, add names for parameters, so that requests can use the name instead of relying on index. (This builds on top of 2)

That's a good idea. I've been playing with it and I'm not yet sure how to handle all kinds of signatures, but I'm pretty sure this is the direction the API is headed in. Thanks!

  1. For the db_types endpoint, rename hints and alter structure for better readability.

I'll refer to my note a few paragraphs higher about why we're using the term hints, as opposed to properties or attributes.

As for the structure, the hints key holds hints, which are more than the identifier. Some hints can hold other information, or even nest other hints. It's important not to constrain what kinds of hints you can apply to database types.


I hope I don't come off dismissive. You bring up good points. I think we're currently synchronizing our visions of what this aspect of the product should be. This proposal is coming from a pretty long chain of discussion-iteration cycles amongst the backenders and I'll be interested to read what they have to say about your concerns.

I sympathize with the fact that this proposal requires a lot from the frontend.

And, yes, this was (and still is) fun to implement, haha.

@pavish
Copy link
Member

pavish commented Jan 21, 2022

Okay, I understand the hints structure more clearly now. But there are a few points I feel strongly about and few more lingering questions.

aiming for something relaxed and highly simple

It's relaxed and simple on the backend and a ton of work on the frontend and for clients. :)
We aren't removing complexity, we're placing it on a different area of the code.

Some questions first

  1. By using the hints structure, this cannot be constructed easily: extract_uri_authority(column_reference("that_uri_column").

    • extract_uri_authority has a parameter which accepts uri
    • column_reference has a parameter which accepts a column and has no return type defintion.

    The client needs to know the return type of a function inorder to pass it to another.

    In this case, how would the client know what is the return type of column_reference, without having to hardcode additional logic specifically to calculate it? If such hardcoding is required, what is the point of providing column_reference as a top level function?

    I understand hardcoded checks for values like boolean, string_like etc., since these are essentially data types. But such checks for column_reference goes against the whole concept of hints.

  2. Right now, everything is a hint, and placed in an array which makes them seem optional and unclear on the usage. My frustration is on the choice of using an array here, which poses questions like:

    • Can hints contain multiple "return" and "parameter_count"?
    • Can "parameter_count" be finite and parameters not be defined?
    • Can a hint like string_like contain additional hints?
  3. Why is list a top level function?

    • Consider the case where a function accept_string_list accepts a parameter which a list of only string_like.
    • list function does not have a return type.

    How will the client know how to pass a list of any specific type, without having to hardcode specific logic for list. If we need to hardcode it, why do we need a top level function for it?

    Same goes for literal.

Suggestion: A different format

I'm suggesting a complete different structure here. I understand the amount of discussion that has gone into this an I'm sorry for not participating earlier, but the API is intended to be used by clients and I'm proposing a structure that would ease it for client and also satisfy backend requirements.

I'm not suggesting we completely change hints and use this. The following ideas are focused on the client's interests. The format of hints so far focuses mostly on backend representation. I'm asking everyone to compare them both with a fresh perspective.

Representing a function signature

In the client's point of view, the entire functions endpoint is a json representation of function signatures.

A function signatures absolutely needs the following:

  a. A return type (Could be None but has to be explicitly mentioned) [Mandatory]
  b. A parameter count (Could be -1 for no limit) [Mandatory]
  c. Information on what parameters to accept. [Mandatory if parameter_count is not 0]
  d. Any additional custom information [Optional]

I don't see a case where a user developer or anyone could create functions without having to define both a and b. Why shouldn't a mandatory definition of a function be strictly represented?

Also, if the parameter or return type can be any, explicitly mentioning it as any would be more useful than not mentioning it.

It would also be better to separate ArgumentDataTypes and functions and represent them more clearly and separate from one another. Currently they're mixed with the hints system.

string_like,comparable etc., are not functions, they are data types allowed to be passed to a function as an argument, and they are acceptable return types, I'm calling them ArgumentDataTypes for now.

I find the following structure feels more readable and usable for clients. I've represented them in TS'ish to get the idea across to everyone.

interface FunctionDefinition {
  id: string,
  name: string,
  returns: ArgumentDataType,
  parameter_count: number
  parameters: [
    {
      accept: ArgumentDataType | ArgumentDataType[],
      name: string
    },
    ...,
  ],
  additional_info: { ... } // Could be anything, maybe same as nested hints structure
}
type ArgumentDataType = string | GenericArgumentDataType;

type GenericArgumentDataType = {
  id: string,
  accept: ArgumentDataType | ArgumentDataType[]
}

ArgumentDataTypes

A client will only know the values of ArgumentDataType and nothing more.

ArgumentDataType includes: string_like, comparable, uri, column, list, boolean, any etc., Frontend clients absolutely have to know these inorder to determine the layout and input elements.

ArgumentDataTypes like string_like, comparable and uri are fixed. column and list would be generic types. any accepts anything.

GenericArgumentDataTypes

Generics are represented as:
List<any> would become

{
  id: `list`,
  accept: `any`
}

or in a nested way:
List<List<StringLike>> would become:

{
  id: `list`,
  accept: {
    id: `list`,
    accept: `string_like`
  }
}

The representation of a string_like column would be Column<StringLike> or a list column would be Column<List<Any>>. It's an ArgumentDataType, and can be used in both function argument and as return value. Note, this is different from column_reference which is explained further below.

Example representation

Function endpoint for IN would look like:

[
  {
    id: 'in',
    name: 'In',
    returns: 'boolean',
    parameter_count: 2,
    parameters: [
      {
        accept: 'any',
        name: 'Find'
      },
      {
        accept: {
          id: 'list',
          accept: 'any'
        },
        name: 'Bucket',
      }
    ]
  }
]

We can go one step further and ensure type safety by defining generics in this function definition:

[
  {
    id: 'in',
    name: 'In',
    returns: 'boolean',
    parameter_count: 2,
    generics: {
      A: 'any'
    },
    parameters: [
      {
        accept: 'A',
        name: 'Find'
      },
      {
        accept: {
          id: 'list',
          accept: 'A'
        },
        name: 'Bucket',
      }
    ]
  }
]

This structure helps clients understand that IN accepts any data type but an instance of list would accept only one type.

Functions with generic return types

I would like us to avoid such functions since postgres does not support them either way (as far as I know). The only special cases are column_reference and literal, which we have control over.

The return type of these functions need to be determined in client run-time and the client will have to know how to work with these. Instead of representing them as functions, we need to think of them as reserved keywords, kind of like typeof.

Similar to how the client would have to know the set of all ArgumentDataTypes, the client will also have to know these two keywords.

This is in comparison to the hints system where this problem is not solved as I've mentioned in the question section above.

Passing functions to functions, would work like the following in clients:

  1. Identify the return type of inner function.
  2. See if the outer function accepts the return type as an argument.
  3. If yes, allow it.

The request structure will remain more or less the same as defined in this issue description

{"in": [
  {"column_reference": ["column1"]}, 
  {"list": [
    {"column_reference": ["column2"]}, 
    {"to_lowercase": [
      {"column_reference": ["column3"]}, 
    ]},
    {"literal": ["foo"]},
    {"literal": ["bar"]},
  ]}, 
]}

This will work, and also the alternate format:

  {
    in: {
      find: { column_reference: { column: 'column1' } },
      bucket: [
        { column_reference: { column: 'column2' } },
        { literal: { value: 'foo' } }
      ]
    }
  }

The db_types endpoint will only contain ArgumentDataTypes associated with each db_type.

[
  {
    id: "character_varying",
    accepts: ["string_like"]
  },
  {
    id: "sometype",
    accepts: [
      {
        id: "list",
        accepts: {
          id: "list",
          accepts: ["string_like"]
        }
      }
    ]
  }
]

I find this syntax to satisfy all requirements for representing a function and a data type.

@dmos62 Do you find any scenarios where this format is not flexible enough compared to hints?

@kgodey
Copy link
Contributor

kgodey commented Jan 21, 2022

@pavish @dmos62 I haven't read through all the earlier comments but I'd like to reiterate my idea from the other thread of keeping the functions endpoint as-is and setting up a separate frontend client specific API that uses the same backend code but designed to solve the problems of the frontend more easily. Perhaps something like /client-api/v0/filters/. (note the new client-api namespace, I think it would be nice to use this for APIs that are specific to our frontend rather than more general, Mathesar Types could also live here).

I think it would be great to have the functions lower level API available for all sorts of interesting use cases that we haven't foreseen but also make life easy for the frontend. Those are pretty different goals. Trying to do both in a single API seems like it would result in a worse experience for both.

@pavish
Copy link
Member

pavish commented Jan 21, 2022

keeping the functions endpoint as-is and setting up a separate frontend client specific API

@kgodey Makes sense. I would suggest taking a look at the format mentioned in this comment for the frontend specific API.

@kgodey
Copy link
Contributor

kgodey commented Jan 21, 2022

@kgodey Makes sense. I would suggest taking a look at the format mentioned in this comment for the frontend specific API.

I'll leave it to @dmos62 to come up with a proposal for this if he also thinks it's a good idea.

It's relaxed and simple on the backend and a ton of work on the frontend and for clients. :)
We aren't removing complexity, we're placing it on a different area of the code.

This is what I'd like to avoid. I think we can have a relaxed and simple implementation on the backend and use that as a base to have an API that also makes it simple to use for the frontend.

@dmos62
Copy link
Contributor Author

dmos62 commented Jan 24, 2022

@pavish

what is the point of providing column_reference as a top level function?
why do we need a top level function for [list]?

It's because of how DbFunction works. We can either have a complicated (hard to maintain, etc.) deserialization and application scheme, or we can simplify the interface, where a list, a column reference and a literal are explicit DbFunction subclasses and the corresponding logic is defined on the subclass just like it is for any other DbFunction like In or StartsWith. Then there are a few additional checks during deserialization to make sure that all literals are only direct arguments of Literal or ColumnReference.

The point of having column_reference as a top level function is to differentiate it from literal.

I quite like this setup.

All of these functions are special on the frontend:

  • column_reference has an unknown type, since it's a reference to a column, whose type we don't know. We could use a more elaborate type system for this, like what you illustrated with generics;
  • literal is something a user enters, so the frontend has to decide if that's a suitable input for the enclosing function; again, with something like generics we could specify that it's output is identical to its input; but the frontend would still have to decide what the input is;
  • with list same applies, you'd need generics to turn it into a regular function.

Again, these functions are first-class to avoid syntax sugar, which complicates the interface and its backend implementation, and I think its frontend application too. First-class might be a misnomer, since as you point out, they're "special". But, we can address that with generics if we want to. I'll talk about generics later.

  • Can hints contain multiple "return" and "parameter_count"?

A DbFunction is supposed to only have a single signature. Though the policy we currently seem to be at is that simplicity and plasticity of interface is more important than catching and preventing bad instructions early. Here you'd probably just use one of the return or parameter_count hints.

  • Can "parameter_count" be finite and parameters not be defined?

Yes.

  • Can a hint like string_like contain additional hints?

A hint can contain any information, including nested hints.

everything is a hint, and placed in an array which makes them seem optional and unclear on the usage

Hints are optional. You can write them, and you can follow them, but you're not obligated to do either.

On the choice of array for representing hint sets. I'd normally use a set for this (and I did initially), but JSON only supports maps and lists. I didn't use a map as a makeshift set, because that's awkward (in my experience). And, because, for example, where you have multiple return hints in the same hintset you can use the hints being ordered as a way to deterministically decide on which is the authoritative one.


@pavish

I'll try to sum up your suggestions:

  • Generics;
  • Named parameters;
  • A new declaration format, where it's more obvious what the important hints/properties are.

I like the ideas of generics and named parameters. I've a few reservations about the burden/usefulness tradeoff of generics in our use case. Named parameters is a question of when and how.

I'm sceptical of the proposed declaration format. It seems specialized for a specific UI.

I'd say you're challenging:

  1. the idea of a loose structure (a set/array of hints);
  2. and, the idea of hints, as in being suggestive and non-obligating of the user developer;
    • put another way, the idea of supporting partial/weak declarations, where not all information might be declared and it might even be declared erroniously.

For 1, I think that all function attributes being in an array is fine for the frontend, since it can reshuffle the data however it likes. I feel motivated to keep the format of the hints in the same structure as the user developer declared them (an array of hints currently).

We could populate a data structure derived from the hints in the format suggested (for example) and publish it in compliment to publishing the hints verbatim. That's what @kgodey suggested. But, I think that should be a last resort. Let's see how this discussion pans out first.

For 2, the idea of supporting partial declaration is debatable of course. Initially, I was going for a full-declaration-only API. We shifted to a more relaxed model, which I like. I think it's empowering to the power user, and it forces us to be resilient to bad declarations and bad expressions, which profits the casual user too.

If we cannot find a way to work with partial declarations on the frontend, we can walk this backend decision back.


@pavish

It's relaxed and simple on the backend and a ton of work on the frontend and for clients. :)
We aren't removing complexity, we're placing it on a different area of the code.

It is a fair amount of work on the frontend, but is it due to the problems you've mentioned? Generics and named parameters aside? Can you be more specific in what complexity you find should be shifted to the backend?

By the way, I'm not sure what the distinction between a client and a frontend is in this case. For the sake of minimizing the number of round-trip messages, I'll presume that you mean client, as in the entity served by a server.

Also, let us distinguish the complexity inherent in letting a user compose practically any database function expression and the complexity created by an inefficient API.


Generics

As I mentioned earlier, I'm a bit sceptical of the tradeoff between the complexity added by supporting generics and the benefit of a type-safer interface.

Where generics would be useful:

  • the currently "special" functions column_reference, list, literal would become less "special"
    • special as in requiring hardcoding;
  • we could support generic custom SQL functions, but I'm not sure if that's something we want to support.

Doing generics in the DbFunction declaration on the backend would be fairly easy, I expect, So the (essential) complexity of generics would largely fall to the frontend. Though, again, I think it would be good to be resilient to partial/bad declarations, so that user developers could opt to not use generics, similarly to how you can do typing in Python by just saying List, as opposed to List[Something].

@dmos62
Copy link
Contributor Author

dmos62 commented Jan 24, 2022

I would be interested in hearing @mathemancer's opinion.

@dmos62
Copy link
Contributor Author

dmos62 commented Jan 24, 2022

column_reference has an unknown type, since it's a reference to a column, whose type we don't know. We could use a more elaborate type system for this, like what you illustrated with generics

I realised that on the frontend you do know the type of the referenced column. Duh. Generics is starting to look more useful.

@pavish
Copy link
Member

pavish commented Jan 24, 2022

@dmos I am quite fine with the backend code keeping things simple with the hints structure, what I do not want is for the API to behave the same manner.

I do not understand why the API needs to be loose. When you say users may want it, who does "users" refer to?

As far as I can see, the whole "functions" endpoint is only useful for those building "General purpose" clients, and never for specific clients.

A general purpose client is our frontend, and anyone wanting to build a frontend for Mathesar. Everyone else untilizing the Mathesar API for a specific purpose or application would never need to use the "functions" endpoint.

The term users is very ambiguous in this enrire discussion and I would like us to establish that clearly.

@kgodey
Copy link
Contributor

kgodey commented Jan 24, 2022

As suggested on the weekly discussion, I think it would be useful to set up a call to discuss both the goals and format of the API.

@pavish
Copy link
Member

pavish commented Jan 24, 2022

@dmos62 I can see that you agree with (or atleast consider) the usefulness of generics and named parameters. So, I'll skip further arguments in support of them.

Let me focus on the point we disagree upon. Also, let us refrain from talking about the internal representation on the backend. I'm only concerned on the API representation.

I am in favour of a more obvious API where:

  • Function declarations are made clear.
  • Data types are made clear.

Your argument relies on the reasoning that it's more useful for user developers if the API does not follow the obvious structure.

Here are some key aspects that bother me:

  1. Special functions are being represented in a similar manner to normal functions.

    • Strictly looking at the API, there's no way to differentiate them both.
    • All clients will have to know the usage and functioning of all special functions. If so, then who are we providing this in the API response for?
    • I would rather have the API not provide them than treat them as the same.
  2. Function declarations can be incomplete.

    • Inorder to make use of a function, the clients need to know the following:
      • The return type of the function
      • The arguments it can accept
    • This declaration can be incomplete in the hints structure, leading to clients not being able to use them.
    • What is the point in exposing such incomplete functions in the API, when they cannot be used?
  3. Function declaration can be invalid (eg., contain multiple 'return' values).

    • An invalid function declaration can contain multiple 'return' and 'parameter_count' properties, leaving it to the client to figure out which one is valid.
      • How could the client even figure this out. It would have to always consider either the first or the last one as valid.
      • How do we ensure all clients behave the same?
  4. Differentiation of functions cannot be effectively done because of 2 and 3.

    • The frontend has to be able to differentiate between different types of functions. Eg., The client needs to know and, or, not are logical, and equal, lesser, greater are comparitive.
    • Since 2 and 3 cannot be ensured, this cannot be determined effectively.
  5. Parameter declaration is partial/incomplete

    • This shifts the decision of what parameters to use to the clients, which may lead to usage of incorrect parameters. For eg., What if I pass a list to an 'and' function? While it is programmatically valid, should it be valid for someone using our API?
    • Diffentiating between different types of functions is hard. eg.,:
      • Function that accepts a fixed number of arguments with any type, with
      • Function that accepts any number of arguments with any type with fixed arguments of specific type
    • Will lead to several mistakes in definition when user developers define them, which the backend will allow and work with, but will result in failures or worse incorrect recognition on the clients.
    • There is no guarantee all clients would behave properly.
  6. All hints are optional, including those that should be mandatory

    • Because the hints related to function declarations are optional, it is impossible to compose functions on clients, when those hints are not present. These should essentially be mandatory.
  7. The functions API is not very readable


I'm sceptical of the proposed declaration format. It seems specialized for a specific UI.

It isn't specialized for a specific UI, it makes it more easier for any parser reading the format. I just want it to be very clear for clients to understand and make use of the function declarations and data types.

I feel motivated to keep the format of the hints in the same structure as the user developer declared them (an array of hints currently).

I'd rather have them declare functions correctly, or not allow them to expose them on the API with erreneous declarations.

User developers here seem to refer to someone who's an user who writes backend code directly and do not necessarily have any need for the "functions" API. In constract, user developers like plugin authors would focus more on the clients functioning properly, than ease of adding a new function.

With each function, I think there should be a flag to determine whether or not to expose on the functions endpoint. If the user developers want to expose them, it should be valid.

In the API atleast, our focus should be more on clients and not on user developers.

@dmos62
Copy link
Contributor Author

dmos62 commented Jan 24, 2022

That's a good summary of your position, I think.


All clients will have to know the usage and functioning of all special functions. If so, then who are we providing this in the API response for?

That's a good point. I'll see if I can minimize the "specialness" down to inputs of column_reference and literal. Hints with ids like column_id and literal.


This shifts the decision of what parameters to use to the clients, which may lead to usage of incorrect parameters.

Yes, that's one of the goals of this revision. Ultimately we want an interface (or at least we were imagining an interface) where the user is unconstrained in what expressions he can pass to Postgres, but at the same time the safe choices are gently pointed out to him. The novice can take the safe road and the expert can enjoy interacting with the database in a pretty direct way.


This declaration can be incomplete in the hints structure, leading to clients not being able to use them.
Because the hints related to function declarations are optional, it is impossible to compose functions on clients, when those hints are not present. These should essentially be mandatory.

The idea is to support an interface that would not totally discard those incompletely defined functions. Hints give you niceties, but you can still use the function without it. You have to then rely on the user's judgment and ability to troubleshoot. Of course, we might not want an advanced expression UI for Alpha, but it's good that the API support it.


There is no guarantee all clients would behave properly.
Will lead to several mistakes in definition when user developers define them, which the backend will allow and work with, but will result in failures or worse incorrect recognition on the clients.
How do we ensure all clients behave the same?

I think these possible failures are aligned with the goals of this refactor. We want to hand hold to the best of our ability, but only when the user wants it, and we want to allow the user to work in "look no hands" mode (failing as late as possible). At the same time, we want to empower the user developer that might want to extend the database function set to suit his use case (and then use those functions through the UI).

We want to tell the user that a list as an argument to AND is not recommended, but we want to let him do it anyway, if he wants to.


I am in favour of a more obvious API where:

  • Function declarations are made clear.
  • Data types are made clear.

I'd say my goal was an API where UI's ability to guide the user is as good as:

  • the hint declarations on database functions;
  • the hint declarations on types.

Want to whip up a function and try it out in the UI? Go for it, forget the hints. Want some basic assistance: then add some basic hints. Want even more assistance: then add hints that maybe require more care (generics, etc.).


I'll not address every point you made, since some of them, will be well addressed by others in the upcoming sync call.

@pavish
Copy link
Member

pavish commented Jan 25, 2022

The ideal UI for the hints structure would be a query editor based UI with suggestions, where the user can directly write queries and the editor should show hints.

Any other UI (eg., our current UX) cannot be implemented if the functions are invalid or incomplete. If I do not know the return type of a function, I cannot compose it in other functions. Similarly if I do not know acceptable parameters for a function, I cannot pass values to it.

The novice can take the safe road and the expert can enjoy interacting with the database in a pretty direct way.

That's the problem, if we are showing the functions and allowing incorrect usage without any sort of conditions, showing suggestions would not matter. The novice are our primary users.

Why are we so keen on making it easy for user developers, rather than the users themselves?

If someone is tech savvy enough to write functions and add it to Mathesar, they can include 2 additional fields.
i.e. They will have to provide an id, and possibly a name. Providing a return value and a parameter config should not be hard for them.

you can still use the function without it.

I'm saying let's allow using of those functions while forming the filter query. I'm not opposed to that. But we can't let them do that through clients.

I just don't want invalid and incomplete functions to be exposed on the "functions" endpoint.

The idea is to support an interface that would not totally discard those incompletely defined functions.

This comes with the cost of providing good UX. A slightly strict structure would support all kinds of interfaces. Are those incompletely defined functions that important? As I mentioned before, why not shift a small amount of responsibility towards user developers rather than inconvenience users.

Want to whip up a function and try it out in the UI? Go for it, forget the hints. Want some basic assistance: then add some basic hints. Want even more assistance: then add hints that maybe require more care (generics, etc.).

This is exactly what I want to avoid, because it's not possible unless the UI is an editor. No other UI can be reliably implemented with flaky information.

Providing a well defined structure will support all kinds of interfaces (including editor based UX), whereas the hints structure only seems to focus on an editor based interface.

@dmos62
Copy link
Contributor Author

dmos62 commented Jan 25, 2022

I'll just make two quick points.

User developers are users that happen to be developers too. They might not be the primary user group, but we're looking for a way to accommodate them as well.

A totally hint-constrained UI can be constructed too: just disregard database functions that aren't described fully enough. That would suit the current UX design. I presumed that that's what we would do in the near term. Maybe I wasn't clear enough on that.

@dmos62
Copy link
Contributor Author

dmos62 commented Jan 25, 2022

@pavish There's another tid bit to consider.

Even novice users will probably want to do things like contains(extract_uri_authority(column_reference("some_uri_column_id")), "authorityx"). I don't think that our current UX designs support multi-step filtering, where you first preprocess some data (e.g. extract the authority part from a URI) and then apply a boolean-returning function to it (e.g. check if a string contains another string).

@pavish
Copy link
Member

pavish commented Jan 25, 2022

@dmos62 Our current UX cannot support multi-level composition and we will have to design another for advanced users (note: advanced does not mean technical. Advanced here refers to someone with a bit more know-how than our novice users.)

We can figure out when we might want such an UX, I'd say probably right after the alpha. @kgodey We'd like your thoughts on this.

I've not been focusing on the UX we have at hand for this discussion, but rather the ability to support complex UX to be implemented reliably using the API.

@mathemancer
Copy link
Contributor

There's lots to cover here, so apologies in advance if I miss something, or I'm not understanding something.

I think that input and description should not be so dependent. Not every function needs to be understood by our client (or any client) to be useful. To be more specific, I think it's crucial that our API is set up to accept requests involving non-recommended compositions of functions.

I think that we're eventually going to want to be able to be pretty flexible in what we allow users to do. This would imply (to me) either accepting raw SQL input, or coming up with a UI that's essentially as flexible as raw SQL. I think this means a pretty general API like the one @dmos62 has proposed. As far as moving complexity to the front end goes, I think that's somewhat appropriate in that context. Writing queries is, in the end, the user's job. We're here to help and guide them since they may not have the prerequisite knowledge to write a query that gets the data they want.

I disagree with the conflation of incomplete and incorrect descriptions. Plenty of clients could use the API without knowing anything about output types. (how about curl?)

@dmos62 I don't think the idea of having a specific API to give higher-level functionality is a cop out. Rather, I think it's a good internal-to-the-backend test use case for the query building paradigm. Think of it as something of an internal client to the lower-level API that then exposes higher-level concepts to outside callers.

@kgodey
Copy link
Contributor

kgodey commented Jan 25, 2022

We had a call about this, notes and next steps are here.

I think it probably makes sense to close this issue and open a new one for implementing the filtering API @dmos62.

@dmos62 dmos62 closed this as completed Jan 26, 2022
@kgodey kgodey added the wontfix This will not be worked on label Jan 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New feature or request wontfix This will not be worked on work: frontend Related to frontend code in the mathesar_ui directory
Projects
No open projects
Development

No branches or pull requests

6 participants