Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add template_with_user view to include user contextual data #8568

Merged
merged 17 commits into from
Jul 19, 2023

Conversation

Emyrk
Copy link
Member

@Emyrk Emyrk commented Jul 17, 2023

Related:

This joins created_by_username and created_by_avatar_url to template.

The created_by data needs to be joined to be apart of the template struct.

If this PR passes, the same will happen with template versions and workspace builds.

Why do this?

This is in anticipation of removing the permission for members to read other members. (#5002)

In order to relate the context data to the primary resource (template, workspace build, etc), it is best to JOIN this in SQL. How we currently do it is fetch the user in another query, but with the permission change, that fetch will fail.

This has the side effect of reducing the DB calls needed to read a template.

Implementation

database.Template is now always this view. We never return the table without this joined data.

A migration is needed to add a VIEW for SQLc to generate a model struct that can be shared by all template related queries.

sqlc.embed was tried first, but this still creates a new struct for each query. Because we use sqlc's method signatures directly throughout the codebase, using these unique structs is a lot more work to deal with.

The downside to the VIEW is that if a column is added to the view, the view must be dropped and recreated to include the new column. I added a unit test that is run on generate.sh that ensures the view includes all fields from the original table. So if the table is updated, this error will happen if no migration is included to fix it.

Screenshot from 2023-07-18 13-43-01

Slippery Slope

A common objection to this is that this only solves this particular join. What if we want join more information in tomorrow?

We do want to join more today, but we haven't. We only haven't because we do not have a more general solution on how to handle joins. Tackling that problem has proven to be challenging, and has resulted in closed attempts in the past under the common "premature optimization". We currently just resort to making N calls to replace N joins.

This PR is different because I am removing the general permission to read users. So the +1 db call would now fail.

This contextual user information should be available, as we do have user_id as apart of the template/workspacebuild/version, and to the UI, username is more informative.

So this view is to make our permission system consistent with our data model.

I think this PR helps move the needle on needing JOINs, but is not enough to cause the change. When we finally tackle the JOIN challenge, this will just serve as more context for that eventual RFC.

Other Ideas

A branch was made to not use sqlc at all and just write queries using sqlx. The downside is that all sql queries do not have sql highlighting in the IDE. All VIEWs created in this PR become constant strings that use a %s formatting directive instead in the sqlx query.

It works, but you lose the sqlc framework entirely and move to manual queries. This has been opposed in the past.

Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sold on this. Today we're adding TemplateWithOwner, tomorrow do we add TemplateWithOwnerAndDependentWorkspaces? Where do we draw the line?

Given that the goal here is to support removing the ability for regular users to read all other user data, maybe we can allow all users to read a subset of other users' data, but disallow enumeration (e.g. allow GetPublicUserInfo but not ListAllUsers).

@Emyrk
Copy link
Member Author

Emyrk commented Jul 18, 2023

I'm not 100% sold on this. Today we're adding TemplateWithOwner, tomorrow do we add TemplateWithOwnerAndDependentWorkspaces? Where do we draw the line?

Given that the goal here is to support removing the ability for regular users to read all other user data, maybe we can allow all users to read a subset of other users' data, but disallow enumeration (e.g. allow GetPublicUserInfo but not ListAllUsers).

I intentionally only included the owner because this information is arguably always relevant and always ok to join. This PR replaces all instances of Template in the codebase with this new one. If you cannot do that, than the new JOIN is not standard enough.

TemplateWithOwnerAndDependentWorkspaces is not as clear cut imo, and you would realize there are cases where you do not want the dependencies.

So the line would be just that. What is the base standard information to include in the "primitive" resource. If it comes to be we should include say the "Active template version name", then I think it could be argued to join that in too in the future.

Naming gets annoying though... TemplateWithOwnerAndVersionName sucks. Maybe ExtendedTemplate, ExpandedTemplate, ... ?

@Emyrk
Copy link
Member Author

Emyrk commented Jul 18, 2023

I did try using sqlc.embed and skipping sqlc and using sqlx and manual queries.


The downside to sqlc.embed was the new struct for each endpoint and the fixed method signatures. An idea I had was to make some interface like:

type Template interface {
  Template() database.Template
}

Then if I join more tables, it just returns interfaces with more fields, but it's still a Template.

type TemplateWithOwner interface {
   Template
   Owner() VisibleUser
}

The next JOIN might be:

type TemplateWithDependencies interface {
   Template
   TemplateWithOwner
   ActiveVersion() database.TemplateVersion
}

This would allow us to still write general functions for things like ConvertTemplate() and what not.

I would use sqlx to make these models and method signatures.

@mafredri
Copy link
Member

The downside to sqlc.embed was the new struct for each endpoint and the fixed method signatures.

Would it be reasonable to patch sqlc to support naming embeds, thus only creating a single named struct (all same-named embeds should have the exact same fields)?

Another idea, would it be reasonable to move the "private" user data into a separate table? This way we can assume querying the regular users table is always OK (sans listing all users). With the private data in a separate table, it requires explicit intent to include it. This may be a bad suggestion as my understanding of what we're trying to do/achieve here is a bit limited.

@Emyrk
Copy link
Member Author

Emyrk commented Jul 18, 2023

Would it be reasonable to patch sqlc to support naming embeds, thus only creating a single named struct (all same-named embeds should have the exact same fields)?

This would solve the new struct per query problem, but it would still leave the problem of copy pasting the SQL table being joined. The view prevents having to copy/paste this new "Templates" table on each query. Views cannot use the sqlc.embed functionality, so I lose the ability to decompose the Template from the TemplateWithUser easily.

SELECT
     templates.*, users.username AS created_by_username, users.avatar_url AS created_by_avatar_url
FROM
    templates
LEFT JOIN
    (%s) AS users
ON
    templates.created_by = users.id

Another idea, would it be reasonable to move the "private" user data into a separate table? This way we can assume querying the regular users table is always OK (sans listing all users). With the private data in a separate table, it requires explicit intent to include it. This may be a bad suggestion as my understanding of what we're trying to do/achieve here is a bit limited.

This was my idea with the VisibleUsers view. I did not want to make it a new table, because then all User queries would need to JOIN with the new "Public fields" table. And using JOINs is the problem, so trying not to decompose more resources. I think the VisibleUsers view is fair though because it should never change. As in, the public fields on the user are relatively static.

@Emyrk
Copy link
Member Author

Emyrk commented Jul 18, 2023

@johnstcn @mafredri One way to get around a lot of the type stuff is just to make this the new template type. I could use the sqlc.yaml to rename TemplateWithUser -> Template.

All our code will just use this new view. There still is the issue of updating the view on new columns, and that is just a new chore we will have. I can even add a unit test to compare the fields on Template and TemplateTable or something and print an error message of what to do.

@johnstcn I know this doesn't solve the slippery slope of where to stop (TemplateWithDependencies etc)


There is the issue of solving the greater JOIN problem. The greater JOIN problem will be more code changes and change how we write queries. This current PR issue is a bit different since I am effectively just adding "username" and "avatar_url" columns to Template, WorkspaceBuild, and TemplateVersion. Obviously I need to pull these dynamically.

@Emyrk
Copy link
Member Author

Emyrk commented Jul 18, 2023

@coadler also pointed out a concern for rebuilding the views. I think I can come up with a unit test that will error nicely. You would have to run CI to see said error though 😢

EDIT: Maybe in the make gen code I can run a checker that will print something 🤔

@kylecarbs
Copy link
Member

@Emyrk can we just maintain our own slight fork of sqlc? We depend on it so heavily that we shouldn't be limited by the most common use-cases imo.

@kylecarbs
Copy link
Member

I like the idea that @mafredri has with naming embeds...

@Emyrk
Copy link
Member Author

Emyrk commented Jul 18, 2023

@Emyrk can we just maintain our own slight fork of sqlc? We depend on it so heavily that we shouldn't be limited by the most common use-cases imo.

We would probably only need to fork the "plugin": https://github.com/sqlc-dev/sqlc-go

https://docs.sqlc.dev/en/latest/guides/plugins.html

There is 2 things we would need to support to get this "clean".

I would be in favor of this fork idea. For this specific PR though I would prefer just to use the VIEW.

I think forking sqlc would be a very interesting idea though that opens up possibilities.

@Emyrk Emyrk marked this pull request as ready for review July 18, 2023 17:46
@Emyrk
Copy link
Member Author

Emyrk commented Jul 18, 2023

@matifali @johnstcn @kylecarbs

For this PR and the next types WorkspaceBuild and TemplateVersion, I am making a VIEW to replace the existing type. We will only use the VIEW.

If we want to change this in the future to support more joins or whatever, we can address it then. I do not want to fork SLQc, or use SQLx, or write custom queries, or ..etc.. right now.

JOIN support can come, and do not think this PR would prevent that. I say we do this simple thing now, and when we hit another 2 or 3 cases, we find a better way to generalize it.

Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks OK; I realise that there is no easy way around the issues Steven desribed. Deferring to @kylecarbs or @ammario for final approval however.

Comment on lines +174 to +177
template, err = tx.GetTemplateByID(ctx, template.ID)
if err != nil {
return xerrors.Errorf("get updated template by ID: %w", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking): Needing to re-fetch the template to populate the audit log is an unfortunate side-effect. It would be nice if we could move this into a trigger on the audit_logs table.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it is unfortunate. I could have made the audit log table take TemplateTable, but then we would have 2 types floating around.

We do have 1 less read in the fetch case now as we do not need to fetch the user, so it's a trade for 1 call on the read, to 1 call on the update.

If we are willing to add postgres RULEs we can actually "insert into a view". But that would complicate the view.

Copy link
Member

@johnstcn johnstcn Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a trade for 1 call on the read, to 1 call on the update.

I think this is a fair trade.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could move this into a trigger on the audit_logs table.

Unfortunately we support multiple paths for audit logs to be exported, so it can't just be as simple as a trigger :(

Comment on lines +13 to +24
func TestViewSubsetTemplate(t *testing.T) {
t.Parallel()
table := reflect.TypeOf(database.TemplateTable{})
joined := reflect.TypeOf(database.Template{})

tableFields := allFields(table)
joinedFields := allFields(joined)
if !assert.Subset(t, fieldNames(joinedFields), fieldNames(tableFields), "table is not subset") {
t.Log("Some fields were added to the Template Table without updating the 'template_with_users' view.")
t.Log("See migration 000138_join_users_up.sql to create the view.")
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice molly-guard

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will prevent us from forgetting to update the view @coadler

coderd/schedule/template.go Outdated Show resolved Hide resolved
docs/admin/audit-logs.md Outdated Show resolved Hide resolved
enterprise/coderd/provisionerdaemons.go Outdated Show resolved Hide resolved
enterprise/audit/table.go Outdated Show resolved Hide resolved
Comment on lines 1 to 29
BEGIN;

CREATE VIEW
visible_users
AS
SELECT
id, username, avatar_url
FROM
users;

COMMENT ON VIEW visible_users IS 'Visible fields of users are allowed to be joined with other tables for including context of other resources.';

CREATE VIEW
template_with_users
AS
SELECT
templates.*,
coalesce(visible_users.avatar_url, '') AS created_by_avatar_url,
coalesce(visible_users.username, '') AS created_by_username
FROM
templates
LEFT JOIN
visible_users
ON
templates.created_by = visible_users.id;

COMMENT ON VIEW template_with_users IS 'Joins in the username + avatar url of the created by user.';

COMMIT;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the big change. I am introducing views so sqlc creates a consistent return type to be used.

Comment on lines -454 to -461
createdByNameMap, err := getCreatedByNamesByTemplateIDs(ctx, api.Database, []database.Template{template})
if err != nil {
httpapi.Write(ctx, rw, http.StatusInternalServerError, codersdk.Response{
Message: "Internal error fetching creator name.",
Detail: err.Error(),
})
return
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query is no longer needed as now the data is joined

@@ -75,7 +75,7 @@ INSERT INTO
allow_user_cancel_workspace_jobs
)
VALUES
($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) RETURNING *;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to remove Returning *. Otherwise we would have 2 types used in the codebase. database.Template and database.TemplateWithUser.

This now requires a read to happen after an update or insert to fetch the updated data. This adds 1 extra DB call in those cases, but we remove 1 db call in the read case, and this should be less frequent anyway.

Comment on lines +34 to +35
template: TemplateTable
template_with_user: Template
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

template_with_user is the only template to be used throughout the codebase now. The original Template type is needed by DBFake.

coderd/database/dbfake/dbfake.go Show resolved Hide resolved
docs/admin/audit-logs.md Outdated Show resolved Hide resolved
enterprise/coderd/provisionerdaemons.go Outdated Show resolved Hide resolved
Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice if we could patch the db interface to do the update/select in one place vs. everywhere (avoid cost of no returning*). But it's not clear to me how we'd do that.

Btw, do API docs need updating? I see there was no change in documented/generated return types.

coderd/database/dbfake/dbfake.go Show resolved Hide resolved
coderd/schedule/template.go Outdated Show resolved Hide resolved
@Emyrk
Copy link
Member Author

Emyrk commented Jul 19, 2023

Would be nice if we could patch the db interface to do the update/select in one place vs. everywhere (avoid cost of no returning*). But it's not clear to me how we'd do that.

We'd need some layer between the db and the database.Store, which we currently don't have. So there isn't really a trivial way to do it unfortunately.

Btw, do API docs need updating? I see there was no change in documented/generated return types.

Should not be. The fields were always in the api, we just had to do a GetUserByID before.

@Emyrk Emyrk merged commit aceedef into main Jul 19, 2023
19 checks passed
@Emyrk Emyrk deleted the stevenmasley/table_view_with_user branch July 19, 2023 20:07
@github-actions github-actions bot locked and limited conversation to collaborators Jul 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants