Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cue validation for query and mutations variables #361

Merged
merged 4 commits into from
Jun 7, 2022
Merged

Cue validation for query and mutations variables #361

merged 4 commits into from
Jun 7, 2022

Conversation

ehsanonline
Copy link
Contributor

checkout tests for examples.

limitations:

  1. only works with variables. can not find a way to cast field args into json or struct.
  2. can not use string inside cue schema (double quote " problem) inside plain graphql query, but it can be done with passing validation schema through variables.

@CLAassistant
Copy link

CLAassistant commented Jun 5, 2022

CLA assistant check
All committers have signed the CLA.

@ehsanonline
Copy link
Contributor Author

examples:

query

query @validation(cue:$validation) {
  users(where: {id:$id},limit:$limit) {
    id
    email
  }
}

variables:

{
  "id":5,
  "limit":1,
  "validation": "id: int, limit: int & <10"
}

mutation

mutation @validation(cue:$validation) {
  users(insert:$inp) {
    id
    email
  }
}

variables:

{
  "inp":{ "id":101, "email":"mail3@example.com", "full_name":"Full Name", "created_at":"now", "updated_at":"now" },
  "validation": "inp: {id?: int & >100 & <102, full_name: string , created_at:\"now\", updated_at:\"now\", email: string}"
}

@dosco
Copy link
Owner

dosco commented Jun 6, 2022

This is truly amazing I had never really looked into cuelang. I think this will make our validations turning complete :) One bug I can see is that while you're correctly compiling the cue script at compile time and saving the compiled object into the qcode struct however you are also validating only at compile time.

In production compiling is only done once and from then on variables are converted to sql values and directly sent to the db prepared statement validation will be skipped.

The fix here is to do the validation (below code) in the validateAndUpdateVars function in core/core.go

if e := cuejson.Validate(JSONvars, v); e != nil {
		return nil, e
	}

@ehsanonline
Copy link
Contributor Author

ehsanonline commented Jun 6, 2022

Thanks! I'm glad to help. I think one of the reason that this amazing project is under-rated is the lack of contribution guide in wiki. also some part of the wiki is outdated. for example there is no ./config/allow.list any more. and a FAQ page would be nice (with referencing link to related wiki page). more query/mutation examples with explanations are helpful too.
for the project it self, error handling needs to be improved. errors should be able to parse in frontend otherwise they are useless. an example for validation error: {kind:"validation",problem:"out of range",path:"input.id",shoud_be:"<5"} but the whole graphql errors needs to be like this.
API version control is necessary too. prepared queries folder can be like this: ./config/queries/v1.2/getUser.yml then use version inside request some how. I'm sure there is an standard for this out there, like cache control in the header, or in the endpoint path like api.site.com/query/v1.2.

@ehsanonline
Copy link
Contributor Author

ehsanonline commented Jun 6, 2022

examples for prepared queries in production

create ./config/queries/getUserWithValidation.yaml with content:

name: getUserWithValidation
query: |-
    query getUserWithValidation @validation(cue:"close({limit:int & <=10})") {
      users(order_by: {created_at: desc}, limit: $limit) {
        ...User
      }
    }
vars: '{}'

in cue schema close( ... ) means only these keys are allowed and they should be provided. (but we can make any of them nullable with ? at the end, ex limit?). it's good to prevent default values to passed if we missed/ignored a var.

an other example with validation as var (in this case we will not have " problem inside cue schema) :

name: getUserWithValidationVar
query: |-
    query getUserWithValidationVar @validation(cue:$schema) {
      users(order_by: {created_at: desc}, limit: $limit) {
        ...User
      }
    }
vars: |-
  {
    "schema":"close({limit: \"4\" | \"5\" })"
  }

cue schema explanation : only limit key is allowed in variables (because of close) , and it should be provided with value "4" or "5" as string.
somehow validation key itself (schema) is getting ignored. but any other extra key will throw an error, for example {"limit":"4", "id":1} gives this error field not allowed: id.
the reason that query works with string as limit value type is that postgres and mysql are smart enough to change it to int.

@dosco
Copy link
Owner

dosco commented Jun 6, 2022

Thanks! I'm glad to help. I think one of the reason that this amazing project is under-rated is the lack of contribution guide in wiki. also some part of the wiki is outdated. for example there is no ./config/allow.list any more. and a FAQ page would be nice (with referencing link to related wiki page). more query/mutation examples with explanations are helpful too.

I totally agree and my main focus is to fix documentation. I'm currently exploring using docusaurus or 'markdoc as an alternative to the Github wiki which I feel gets no love from Github. I'll fix all errors in the docs as well and add a lot more documentation.. Do you have an opinion on if I should do the work to move the wiki to a site or just fix the docs first.

for the project it self, error handling needs to be improved. errors should be able to parse in frontend otherwise they are useless. an example for validation error: {kind:"validation",problem:"out of range",path:"input.id",shoud_be:"<5"} but the whole graphql errors needs to be like this.

I'm not sure what format GraphQL client like Apollo and URQL expect with errors if any. But I'm up for moving to this format I agree more information is better with errors.

API version control is necessary too. prepared queries folder can be like this: ./config/queries/v1.2/getUser.yml then use version inside request some how. I'm sure there is an standard for this out there, like cache control in the header, or in the endpoint path like api.site.com/query/v1.2.

We have a feature called namespace that might help you can set a namespace when using GraphJin as a library and this will namespace the allowlist. It's not exposed in the config as yet for the standalone service but can be used when using GraphJin serve and core as a library. See the test TestAllowListWithNamespace for more details. You can set a namespace either on the GraphJin instance itself or at the request level when calling the GraphQL function.

https://pkg.go.dev/github.com/dosco/graphjin/core#OptionSetNamespace

@dosco
Copy link
Owner

dosco commented Jun 6, 2022

examples for prepared queries in production

create ./config/queries/getUserWithValidation.yaml with content:

name: getUserWithValidation
query: |-
    query getUserWithValidation @validation(cue:"close({limit:int & <=10})") {
      users(order_by: {created_at: desc}, limit: $limit) {
        ...User
      }
    }
vars: '{}'

I like this ^^^ the best it fits the model of how GraphJin does things better.

in cue schema close( ... ) means only these keys are allowed and they should be provided. (but we can make any of them nullable with ? at the end, ex limit?). it's good to prevent default values to passed if we missed/ignored a var.

an other example with validation as var (in this case we will not have " problem inside cue schema) :

name: getUserWithValidationVar
query: |-
    query getUserWithValidationVar @validation(cue:$schema) {
      users(order_by: {created_at: desc}, limit: $limit) {
        ...User
      }
    }
vars: |-
  {
    "schema":"close({limit: \"4\" | \"5\" })"
  }

This wont work since only var are only stored when using a json object with mutations this is so that you don't have to write out the mutation in eg create user and also two of his posts Graphql itself and instead you can use Json for it. Saving the var structure ensures that at compile time in production we know what the mutations needs to insert or update and it cannot be changed at runtime by the client. Since the values don't matter here all values are wiped. Outside of json data for mutations variables are not stored in the allow list.

Example 1 - Mutation using JSON

GQL

mutation createUserAndPost {
  users(insert: $data) {
    id
  }
}

Variable JSON

{
  "data": {
    "user": {
      "id": 1,
      "name": "Bob",
      "posts": [
        {
          "id": 1,
          "title": "Post 1"
        },
        {
          "id": 2,
          "title": "Post 2"
        }
      ]
    }
  }
}

Example 2 - Inline mutation

GQL

mutation createUserAndPost {
  users(
    insert: {
      user: {
        id: $user_id
        name: $user_name
        posts: [
          { id: $post1_id, title: $post1_title }
          { id: $post2_id, title: $post2_title }
        ]
      }
    }
  ) {
    id
  }
}

Variable JSON

{
  "user_id": 1,
  "user_name": "Bob",
  "post1_id": 1,
  "post1_title": "Post 1",
  "post2_id": 2,
  "post2_title": "Post 2"
}

@ehsanonline
Copy link
Contributor Author

ehsanonline commented Jun 6, 2022

the docs should move to the site for sure, Github pages are not suitable for big projects or the ones that reader want navigate a lot. also it helps with SEO. docs that we can leave comments on them are +1 and will improve faster.
current docs are good to figure out the whole idea of GraphJin, an outdated warning on some pages prevents confusion.

about the errors, we can get help from gqlgen, I worked with it. worked fine with Apollo Client and graphql-request.

namespaces are even better, if we can use them with k8s pods, for example having some pod that response to 'A' & 'B' namespaces and some other pods that only response to 'C' namespace. and how to target them with request we send? looking forward to read about this at new docs!

about the contribution guide I like to leave a comment of how I did it:
while building projects with docker, go mod download was a hassle. every time after tons of downloading, an http error happened and it had to restart the whole thing. even GOPROXY=direct and GOSUM=off didn't help.
alse adding -x to log while downloading help with the silent darkness fear.
RUN GORPOXY=off GO111MODULE=on GOSUMDB=off go mod download -x
must of the problem to this point is from go and docker. and maybe too much mods used in project.
so I decided to build the project manually and move the pre-build ./graphjin file to container.
then can't find database error occured, so I had to go inside container while it's running. I commented out command: wtc from docker-compose file. then ran docker-compose run api to got into container and ran ./graphjin db create and ./graphjin db setup (can not run with go run main.go because mods are not installed inside container...). then put command: wtc back and ran docker-compose up . finally it started and worked fine.
for hot-reloading, I had to use go mod vendor so project can run from main.go file.

for prod build had to do the whole thing again (except downloading mods) with GO_ENV=prod in docker-compose.

for testing, docker didn't help at all. I cd into core folder and ran this comment:
sudo /usr/local/go/bin/go test -timeout 30s -run TestCue -buildvcs=false
the key is -buildvcs=false at the end.
I hope it gets easier so we see more contributors.

@ehsanonline
Copy link
Contributor Author

mutation worked too:

name: createUserWithProducts
query: |-
    mutation createUserWithProducts @validation(cue:$schema){
      users(insert: $data) {
        id
      }
    }
vars: |-
 {
   "schema":"data:id: >100, data:products:[{id:>10}, ...]",
   "data": {
    "id": 0.0,
    "email": "",
    "full_name": "",
    "created_at": "",
    "updated_at": "",
    "products": [
      {
        "id": 0.0,
        "category_ids": [0.0],
        "created_at": "",
        "updated_at": ""
      }
    ]
  }
 }

req query : mutation createUserWithProducts with variables:

{
  "data": {
    "id": 103,
    "email": "email3@example.com",
    "full_name": "Bob",
    "created_at": "now",
    "updated_at": "now",
    "products": [
      {
        "id": 13,
        "category_ids": [
          1
        ],
        "created_at": "now",
        "updated_at": "now"
      }
    ]
  }
}

res data:

{
  "data": {
    "users": [
      {
        "id": 103
      }
    ]
  }
}

there was no way to miniplate the schema var, even the user can not see the value of it to find a possible bug, that's what I like about it.

@dosco
Copy link
Owner

dosco commented Jun 6, 2022

there was no way to miniplate the schema var, even the user can not see the value of it to find a possible bug, that's what I like about it.

This wont work since the dev web app (Apollo / URQL) will have the whole GraphQL query in it with the clue lang value of $schema and this is the same frontend code that will be built for production. There is no way way to set this value only in dev but remove it from the code in production? The allow list is only a reflection of the query in dev and in dev its updated every-time the query is executed. It is not the store of truth.

If you want to store this validation in the backend and hide it from users then just set it in the variables map in config.

dev.yml

variables:
  admin_account_id: "5"
  user_production_validation: "data:id: >100, data:products:[{id:>10}, ...]"

GQL

mutation createUserWithProducts @validation(cue:$user_production_validation){
      users(insert: $data) {
        id
      }
    }

https://github.com/dosco/graphjin/wiki/Guide-to-Config-Files

@ehsanonline
Copy link
Contributor Author

ehsanonline commented Jun 6, 2022

the $schema var is not generated inside mutation prepared queries, so it have be set manually after that (is there a way to fix this? not sure it's because of the operation type directive or not).
but if you are concern about manipulating the $schema value, I test that in production. worked fine, even when I changed the $schema value in request, it used the defined one from the prepared query file.
for production client, there is no need to send whole query, just the name of it is fine, all i send with client is this mutation createUserWithProducts + vars (reduce traffic usage and speed up the network to max!). even if you send the whole query, as I said, Graphjin completely ignores the $schema value from request if it has one from prepared vars. no need to put it in config file. that the whole point and the beauty of it.

update: I'm working on it to find a way to pass the schema value into query file.

@ehsanonline
Copy link
Contributor Author

two new bugs I found during playing with prepared queries:

  1. query will not written on file if it has comments at blow
  2. when running query just by name, it has to have an space at end, ex: mutation mymutaion will not work (not found error), but mutation mymutaion will work, notice there is an space in the end of the string.

@dosco
Copy link
Owner

dosco commented Jun 6, 2022

the $schema var is not generated inside mutation prepared queries, so it have be set manually after that (is there a way to fix this? not sure it's because of the operation type directive or not).

The allow list files are not meant to be hand-edited they are auto-generated from the queries executed in dev mode. They are not the right place to put prefixed variables those should be in the yml config files. The second your app while developing executes the same query (like when you reload the page) your allow list file for that query will be over-written and you will loose the schema value that you had manually set.

but if you are concern about manipulating the $schema value, I test that in production. worked fine, even when I changed the $schema value in request, it used the defined one from the prepared query file. for production client, there is no need to send whole query, just the name of it is fine, all i send with client is this mutation createUserWithProducts + vars (reduce traffic usage and speed up the network to max!). even if you send the whole query, as I said, Graphjin completely ignores the $schema value from request if it has one from prepared vars. no need to put it in config file. that the whole point and the beauty of it.

Yes in production it only uses the name of the query and variables which is why this works.

@dosco
Copy link
Owner

dosco commented Jun 6, 2022

two new bugs I found during playing with prepared queries:

  1. query will not written on file if it has comments at blow
  2. when running query just by name, it has to have an space at end, ex: mutation mymutaion will not work (not found error), but mutation mymutaion will work, notice there is an space in the end of the string.

For 99% of the folks out there the same client code makes the query in dev and production there is no way people will have two code paths (client side) this path to use just the name in production queries is not the way we should proceed.

If folks want to make lightweight calls they use APQ which is supported by GraphJin. https://www.apollographql.com/docs/apollo-server/performance/apq/

@dosco
Copy link
Owner

dosco commented Jun 6, 2022

can not use string inside cue schema (double quote " problem) inside plain graphql query, but it can be done with passing validation schema through variables.

If this is the problem you're trying to solve either we can make the parser support double quotes or use single quotes for the schema and then double quotes will be supported inside.

@dosco
Copy link
Owner

dosco commented Jun 6, 2022

If you are ok with merging this without the $schema variable hand-edited into the allow list stuff I can help fix the double quotes issue.

@ehsanonline
Copy link
Contributor Author

ehsanonline commented Jun 6, 2022

I implemented a way that makes $schema value written in prepared query file, but there are some concerns. I make something like "private variables", which are variables that their key starts with underline, ex: $_schema, these variables value will be passed directly into prepared query file vars section, that means even if request has different value for them, it will be ignored.
but it causes some side effects:

  1. all the variables send with req will be written into prepared queries, not just qc.ActionVar
  2. not just mutation, query operation variables will be written too
  3. it un-escapes HTML tags & < > because they are needed for cue
  4. it only works with the top level keys, for example data._id will not have it's value.
  5. I'm not sure to commit it here because it's not related to this PR

result example:

name: myMutation 
query: |-
    mutation myMutation @validation(cue:$_schema) {
      users(insert:$data) {
        id
      }
    }
vars: |-
    {
      "_schema": "data:id:>10",
      "data": {
        "created_at": "",
        "email": "",
        "full_name": "",
        "id": 0,
        "updated_at": ""
      },
      "extra_var_in_req": 0
    }

@ehsanonline
Copy link
Contributor Author

the double quote problem is not a deal breaker. and it doesn't worth the effort to change the parser. the problem is just for strings inside cue schema passed through graphql, not variables. at the end of the day, schema should be passed with vars for safety (and keeping it hidden).

@dosco
Copy link
Owner

dosco commented Jun 6, 2022

I'll highly recommend we keep this merge simple and not focus on changing how the allow-list works. Keeping the validations secret can be something we work on in second pass. For now we can just use config variables to keep the validations secret. I really don't want to introduce this new complex behaviour to the allow list without some though. I'll give this some though to how can have secret variables that as passed in as part of the same dev query after we merge this. Also updating the parser to support double quotes etc is not hard I can handle that.

See this below link for how config variables work:
#361 (comment)

@ehsanonline
Copy link
Contributor Author

all right, it passed the tests and as long as cue works fine, it works fine. I'm ok with merging now if you are. we can improve it in future PRs.

@dosco dosco merged commit 64183ab into dosco:master Jun 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants