[WiP][SIP-125] Proposal for Enhanced Data Access Permissions #28002

mistercrunch · 2024-04-12T03:09:30Z

[SIP-125] Proposal for Enhanced Data Access Permissions

Motivation

Superset's current permission model relies on Flask-AppBuilder (FAB) for both app-level and data access permissions, dictating what users can do within the app and what data they can access (e.g., databases, schemas, tables, rows).

This proposal aims to decouple data access permissions from FAB, integrating them more directly into Superset's core models (Database, Schema, SqlaTables, RLS), thereby bringing the management of these permissions more natively inside Superset. To be clear, this proposal is specific to data access permissions and proposes to keep app-level permissions (as in what features of the app a user can use) in FAB.

micro glossary

"permissible objects" refers to Dashboards, Datasets, Schemas, Catalogs, RLS, and Databases.

To better justify breaking the security model apart, let's lay out the differences between app-level permissions and data access permissions:

Dynamism: App-level permissions are relatively static, changing slowly and atomically with new versions of the app as we release features. Data access permissions, however, are highly dynamic and differ in each environment.
Scale: While app-level permissions are contained (about one hundred for Superset), many Superset environments manage tens of thousands of permissible objects.
Actions: The nature of actions differs; app-level permissions include CRUD operations on app models, whereas data access is primarily binary (access or no access).
Hierarchy: Data access permissions are highly hierarchical, from Database to Catalog, then Schema, Dataset, and finally RLS.
Virtuality: Many of the objects referred to for data access exist outside of Superset, meaning that when a new schema or table is created, Superset may be out of the loop, requiring periodic object reconciliation.

Current State

Currently, the creation, update, or deletion of permissible objects necessitates corresponding changes in FAB's permission model, where both a label and an ID are captured. This model has proven brittle and difficult to manage, leading to data sync issues and posing security risks.

In larger environments, the permissions tables have become unwieldy and lack proper indexing, which complicates both individual access requests and the generation of user-specific dashboard lists.

Proposed Change

Deep end

This proposal introduces a structured, normalized database model that links objects directly through many-to-many relationships with roles and defines the hierarchical relationships between these objects. We also propose a new "DataAccessVerb" to clearly define the permissions associated with each role:

Establish many-to-many relationships between FAB's Role and Dataset (SQLATable), and between Role and DatabaseConnection.
Introduce a new Schema model with many-to-many relationships to both FAB's Role and Dataset, and a many-to-one relationship to Database.
Introduce a new model for Catalogs, applicable to specific database engines, featuring many-to-many relationships with FAB's Role, and many-to-one relationships to both Schema and Database.
Introduce DataAccessVerb with initial verbs like READ, [potentially] DENY, and possibly GRANT (for future use in defining grant permissions). Because the model includes a verb, it can evolve to support more complex use cases.
Enhance the existing many-to-many relationship with the Dashboard object, making it standard.
Include an action field in all many-to-many tables linking roles to permissible objects.
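
To make the list above more concrete, here is a minimal sketch of one role-to-object link table that carries the action field, using only Python's standard-library sqlite3. All table and column names (`role_schema`, `data_schema`, `verb`, and so on) are assumptions made for illustration and are not part of the proposal; a real implementation would presumably use Superset's SQLAlchemy models.

```python
import sqlite3
from enum import Enum


class DataAccessVerb(str, Enum):
    """Initial verbs named in the proposal; DENY and GRANT are tentative."""
    READ = "READ"
    DENY = "DENY"
    GRANT = "GRANT"


# Hypothetical table and column names, chosen for illustration only.
SCHEMA_SQL = """
CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE database_connection (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE data_schema (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    database_id INTEGER NOT NULL REFERENCES database_connection(id)
);
-- Many-to-many link between roles and schemas; `verb` is the action field.
CREATE TABLE role_schema (
    role_id INTEGER NOT NULL REFERENCES role(id),
    schema_id INTEGER NOT NULL REFERENCES data_schema(id),
    verb TEXT NOT NULL CHECK (verb IN ('READ', 'DENY', 'GRANT')),
    PRIMARY KEY (role_id, schema_id, verb)
);
"""


def build_demo_db():
    """Create an in-memory database with one role, one database, two schemas."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_SQL)
    conn.execute("INSERT INTO role (id, name) VALUES (1, 'Finance')")
    conn.execute("INSERT INTO database_connection (id, name) VALUES (1, 'warehouse')")
    conn.execute("INSERT INTO data_schema (id, name, database_id) VALUES (1, 'finance', 1)")
    conn.execute("INSERT INTO data_schema (id, name, database_id) VALUES (2, 'hr', 1)")
    # Only the 'finance' schema is granted to the Finance role.
    conn.execute(
        "INSERT INTO role_schema (role_id, schema_id, verb) VALUES (?, ?, ?)",
        (1, 1, DataAccessVerb.READ.value),
    )
    return conn


def schemas_for_role(conn, role_name, verb):
    """List schema names linked to a role with the given verb."""
    rows = conn.execute(
        """
        SELECT s.name FROM data_schema s
        JOIN role_schema rs ON rs.schema_id = s.id
        JOIN role r ON r.id = rs.role_id
        WHERE r.name = ? AND rs.verb = ?
        ORDER BY s.name
        """,
        (role_name, verb.value),
    ).fetchall()
    return [name for (name,) in rows]
```

Putting the verb in the primary key lets a role hold more than one verb on the same object, which is one possible way to leave room for DENY or GRANT later.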

UX

NOTE: this requires more thinking and the involvement of a product designer, but lays down the key considerations.

I'm suggesting a new Role-centric CRUD view where the Role edit screen shows:

Header:

Role name
Role description
Membership
Groups [accumulator]: pick some groups
Users [accumulator]: pick individual users as member of the role
Data Access Objects
Dashboards [accumulator]
Database:
- Catalog[accumulator]: if it applies to the database engine, allow picking catalogs
  - Schema
    - Datasets
      - RLS

For data access objects, we want to allow users to navigate the potentially massive hierarchy, pick the objects and verbs that apply, and accumulate them in some capacity. A tree-like structure with checkboxes (or a dropdown with action words) seems reasonable.

There's a consideration around clarifying direct access VS inherited access in a hierarchy, and we should get that right. Giving access to all 3 current schemas in a database connection is NOT the same as giving access to the database connection and all schemas it may grow in the future.
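
A minimal sketch of the direct-versus-inherited distinction, assuming a hypothetical `Grant` record with an `inherit` flag (neither the record nor the flag is defined by the proposal). A grant on the three existing schemas does not cover a schema created later, while an inheriting grant on the database connection does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataObject:
    """A node in the Database > Catalog > Schema > Dataset hierarchy."""
    kind: str
    name: str
    parent: Optional["DataObject"] = None


@dataclass(frozen=True)
class Grant:
    """Hypothetical grant; `inherit` marks access that flows to descendants."""
    role: str
    target: DataObject
    inherit: bool


def access_source(role, obj, grants):
    """Return 'direct', 'inherited', or None for the given role and object."""
    # A grant that names the object itself counts as direct access.
    for grant in grants:
        if grant.role == role and grant.target == obj:
            return "direct"
    # Otherwise walk up the hierarchy looking for an inheriting grant.
    ancestor = obj.parent
    while ancestor is not None:
        for grant in grants:
            if grant.role == role and grant.inherit and grant.target == ancestor:
                return "inherited"
        ancestor = ancestor.parent
    return None


warehouse = DataObject("database", "warehouse")
sales = DataObject("schema", "sales", warehouse)
hr = DataObject("schema", "hr", warehouse)
ops = DataObject("schema", "ops", warehouse)
created_later = DataObject("schema", "created_later", warehouse)

# Option A: grant each of the three current schemas explicitly.
explicit_grants = [Grant("analyst", s, inherit=False) for s in (sales, hr, ops)]
# Option B: grant the database connection, including future schemas.
blanket_grants = [Grant("analyst", warehouse, inherit=True)]
```

Returning the source of the access rather than a plain boolean is one way the UI could label a checkbox as "direct" or "inherited".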

Another design consideration is the fact that we have an almost perfect hierarchy, but may deviate from that perfection in the future if and when we integrate with column-level permissions, splitting datasets two ways between RLS and columns.

New or Changed Public Interfaces

We plan to expand the SecurityManager interface while maintaining compatibility with its existing structure as much as possible.
New CLI methods will be developed to synchronize permissions objects and identify orphaned entries when underlying data objects are removed or modified.
Let's introduce a new PermissibleModelMixin to reuse logic across all models that are permissible
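
As a rough sketch of how a PermissibleModelMixin and the orphan-detection CLI could fit together: the method name `permission_key`, the `permission_entity` attribute, and the dict-shaped permission rows are all assumptions for illustration, not an agreed interface.

```python
class PermissibleModelMixin:
    """Hypothetical mixin: shared logic for models that can carry data permissions."""

    # Each concrete model sets its own entity name, e.g. "dataset".
    permission_entity = ""

    def permission_key(self):
        # Assumes the concrete model exposes an `id` attribute.
        return (self.permission_entity, self.id)


class Dataset(PermissibleModelMixin):
    """Stand-in for a real model, reduced to what the sketch needs."""

    permission_entity = "dataset"

    def __init__(self, id, name):
        self.id = id
        self.name = name


def find_orphaned_permissions(permission_rows, live_objects):
    """Return permission rows whose target object no longer exists."""
    live_keys = {obj.permission_key() for obj in live_objects}
    return [
        row
        for row in permission_rows
        if (row["entity"], row["object_id"]) not in live_keys
    ]
```

A CLI command could call something like `find_orphaned_permissions` and either report or delete the returned rows.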

New Dependencies

This should all be baked under our roof. In many ways we're reframing the framework (FAB) under our roof here.

Migration Plan and Compatibility

We will implement a comprehensive database migration to transition existing permissions to the new model. The new model will fully support the semantics of the current system, ensuring full backward compatibility.
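
A sketch of what one step of that migration might look like, converting a legacy permission (a permission name plus a view-menu label that embeds an ID) into a row for the new model. The label patterns below (`[db].(id:N)`, `[db].[schema]`, `[db].[table](id:N)`) and the permission names are assumptions about the legacy format and should be checked against the deployed version; `DataPermissionRow` is a hypothetical shape.

```python
import re
from typing import NamedTuple, Optional


class DataPermissionRow(NamedTuple):
    """Hypothetical target row for the new model."""
    verb: str
    entity: str
    object_id: Optional[int]
    label: str


# Assumed legacy label formats, keyed by legacy permission name.
_PATTERNS = {
    "database_access": (
        "database",
        re.compile(r"^\[(?P<db>.+)\]\.\(id:(?P<id>\d+)\)$"),
    ),
    "datasource_access": (
        "dataset",
        re.compile(r"^\[(?P<db>.+)\]\.\[(?P<name>.+)\]\(id:(?P<id>\d+)\)$"),
    ),
    "schema_access": (
        "schema",
        re.compile(r"^\[(?P<db>.+)\]\.\[(?P<name>.+)\]$"),
    ),
}


def convert_legacy_permission(permission_name, view_menu_name):
    """Map a legacy (permission, view menu) pair to a new row, or None."""
    entry = _PATTERNS.get(permission_name)
    if entry is None:
        return None  # not a data access permission: stays in FAB
    entity, pattern = entry
    match = pattern.match(view_menu_name)
    if match is None:
        return None  # unparseable label: would need manual review
    # Schema labels carry no ID, so object_id stays None for them.
    raw_id = match.groupdict().get("id")
    object_id = int(raw_id) if raw_id is not None else None
    return DataPermissionRow("READ", entity, object_id, view_menu_name)
```

Legacy schema permissions carry only a label, so the migration would need to look up or create the matching row in the new Schema model before the link can be stored.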

Rejected Alternatives + out-of scope

versioning: while we understand the value of keeping track of "who had access to what when?", it's beyond the scope of Superset to manage that level of complexity. Our goal is limited to applying the state of the rules at a given time. For versioning, we believe a data governance tool like the proposal I wrote for governator is better suited to solve this in a federated fashion.
auditing: "who accessed what when" is also somewhat beyond the scope of Superset, though integrating with external systems through hooks seems reasonable. An administrator might want to hook up a system of record to keep track of "who accessed what when?" in some other system.
meta-data-access: in data governance systems, sometimes there's a need to define "who can grant access to what resource?", making sure that, say, a Finance team manager can grant access to certain resources but not others. This is beyond the scope for Superset at this time, but could potentially be implemented through the introduction of a new GRANTOR verb or similar.

Considerations - request for comments

The groups model: should it live in FAB or in Superset?
Many-to-many tables between objects and the Role table: should they all go through the same secondary table with an entity field for the different objects, or use one table per object? Leaning towards a single table.
uuid-centric: should we just go ahead and store uuid more natively across the board, try to avoid using auto-increment numbers as much as possible. This makes it easier to migrate rules across tools and Superset environments. Leaning towards using uuid as much as possible
Interplay with ownership: does ownership provide rights, or does the act of becoming an owner lead to the creation of permissions? I vote the latter, keeping perms self-contained.
Tags as a data access construct: the same way that groups are nifty/reusable collections of users, tags are nifty, curated sets of different objects, and could be useful as a data access construct. There are clear risks here though, especially around the casualness of tags, and people may not realize that adding a tag to an object has data-access implications. This would require the introduction of special/protected tags, but may be reasonable. Maybe one we keep for V2.
just-in-timeness: which objects do we want to fully sync, versus only keeping track of the ones for which people want to lay out rules? In many environments, a data warehouse may have up to millions of tables, and syncing all this metadata may not be the best idea. We probably want to keep the idea of bringing tables into Superset as datasets as users require them.
Integration/sync with data governance solution?
Allow deferring to external systems in real time?
Column-level support? Metric-level? both? I'd say out-of-scope for now, but we can keep this in mind for extensibility as we design and build this new data access model
More of a personal thought - but governator could be built as a friendly companion to Superset to pick up where things get out of scope.
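
To illustrate the single-table and uuid-centric options raised above, here is a minimal sketch using the standard-library sqlite3. The table name `data_permission` and its columns are assumptions for illustration only.

```python
import sqlite3
import uuid
from collections import defaultdict


def create_permission_table(conn):
    """One secondary table for every permissible object type."""
    conn.execute(
        """
        CREATE TABLE data_permission (
            role_id INTEGER NOT NULL,
            entity TEXT NOT NULL,       -- e.g. 'dataset', 'dashboard', 'schema'
            object_uuid TEXT NOT NULL,  -- uuid rather than auto-increment id
            verb TEXT NOT NULL,
            PRIMARY KEY (role_id, entity, object_uuid, verb)
        )
        """
    )


def grant(conn, role_id, entity, object_uuid, verb="READ"):
    """Record that a role holds a verb on an object."""
    conn.execute(
        "INSERT INTO data_permission (role_id, entity, object_uuid, verb) "
        "VALUES (?, ?, ?, ?)",
        (role_id, entity, str(object_uuid), verb),
    )


def grants_by_entity(conn, role_id, verb="READ"):
    """Fetch everything a role can access, grouped by entity type."""
    grouped = defaultdict(list)
    rows = conn.execute(
        "SELECT entity, object_uuid FROM data_permission "
        "WHERE role_id = ? AND verb = ? ORDER BY entity, object_uuid",
        (role_id, verb),
    )
    for entity, object_uuid in rows:
        grouped[entity].append(object_uuid)
    return dict(grouped)


conn = sqlite3.connect(":memory:")
create_permission_table(conn)
dataset_uuid = str(uuid.uuid4())
dashboard_uuid = str(uuid.uuid4())
grant(conn, 1, "dataset", dataset_uuid)
grant(conn, 1, "dashboard", dashboard_uuid)
```

The trade-off in this sketch is that a single table cannot declare a foreign key to each object table, so referential integrity would have to come from a reconciliation job. In exchange, one query (served by the composite primary key, which leads with `role_id`) returns everything a role can access.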


mistercrunch · 2024-04-12T04:03:10Z

Source for the diagram above - plantuml + vscode is pretty neat!

@startuml Superset Data Access Models
title SIP #125 - Superset Data Access Models


' hide the spot
' hide circle

' avoid problems with angled crows feet
skinparam linetype ortho
!theme blueprint

'left to right direction
scale 4


package "FAB Security Models" #black {

    entity "User" as user {
        *id: number <<generated>>
        --
        uuid
        *name : text
    }

    entity "Group" as group {
        *id: number <<generated>>
        --
        uuid
        *name : text
    }

    entity "Role" as role {
        *id: number <<generated>>
        --
        uuid
        *name : text
    }
    entity "Permission" as fab_permission {
        *id: number <<generated>>
        --
        uuid
        *name : text
    } 
    entity "ViewMenu" as fab_viewmenu{
        *id: number <<generated>>
        --
        uuid
        *name : text
    } 
    entity "PermissionView" as fab_permission_view{
        *id: number <<generated>>
        --
        uuid
        *name : text
    } 
}
package "New Data Access Models" #black {
    entity "Verb" as verb {
        *id : number <<generated>>
        --
        uuid
        *name : text
    }

    entity "DataAccessRole" as data_access_role {
        *id : number <<generated>>
        --
        uuid
        *name : text
    }

    entity "DataPermission" as data_permission {
        *data_perm_id: number <<generated>>
        --
        uuid
        verb: str
        entity: str
        object_id: int
    }
}
package "Database Models" #Black {

    entity "DatabaseCatalog" as catalog {
        *id : number <<generated>>
        --
        uuid
        *name : text
    }
    entity "Schema" as schema {
        *id : number <<generated>>
        --
        uuid
        *name : text
    }
    entity "RLS" as rls {
        *id : number <<generated>>
        --
        uuid
        *name : text
    }
    entity "DatabaseConnection" as database_connection {
        *id : number <<generated>>
        --
        uuid
        *name : text
    }

    entity "Dataset" as dataset {
        *id : number <<generated>>
        --
        uuid
        *name : text
    }
}
package "Other Models" #Black {
    entity "Dashboard" as dashboard {
        *id : number <<generated>>
        --
        uuid
        *name : text
    }
}


database_connection ||..|{ catalog
database_connection ||..|{ schema 
catalog ||..|{ schema 
schema ||..|{ dataset
dataset ||..|{ rls

data_access_role }|..|{ data_permission
verb ||..|{ data_permission
data_permission }|..|{ database_connection
data_permission }|..|{ dashboard

data_permission }|..|{ dataset
data_permission }|..|{ schema


user }|..|{ group
user }|..|{ role
group }|..|{ role
fab_permission ||..|{ fab_permission_view 
fab_viewmenu ||..|{ fab_permission_view 

fab_permission_view }|..|{ role 

user }|..|{ data_access_role
group }|..|{ data_access_role

@enduml

mistercrunch · 2024-04-15T21:48:49Z

Closing in favor of SIP-126 - a more federated approach to RBAC/ABAC

rusackas · 2024-04-23T15:27:28Z

Should we close this as discarded if we're merging it with the other one?

mistercrunch added the sip Superset Improvement Proposal label Apr 12, 2024

mistercrunch changed the title ~~[DRAFT - WiP][SIP] Proposal for Enhanced Data Access Permissions~~ [WiP][SIP-125] Proposal for Enhanced Data Access Permissions Apr 12, 2024

dosubot bot mentioned this issue May 7, 2024

[SIP-131] Superset Security Model Redesign #28377

Open
