### Feature or Bugfix
- Refactoring (Modularization)

### Relates
- Related issues #295 and #412

### Short Summary
First part of the migration of `Dataset` (`DatasetTableColumn`). TL;DR :)

### Long Summary
Datasets are huge. It is one of the central modules, and it is spread across the whole application. Migrating the entire Dataset piece at once would be a very difficult task and, more importantly, even more difficult to review. Therefore, I decided to break this work down into "small" steps that are more convenient to review.

The Dataset API consists of the following items:
* `Dataset`
* `DatasetTable`
* `DatasetTableColumn`
* `DatasetLocation`
* `DatasetProfiling`

This PR only creates the `Dataset` module and migrates `DatasetTableColumn` (and some pieces related to it). Why? Because the plan was to migrate it, see what issues come up along the way, and address them here. The refactoring of `DatasetTableColumn` will come in another PR.

The issues:
1) Glossaries
2) Feed
3) Long tasks for datasets
4) Redshift

Glossaries rely on a GraphQL UNION of different types (including datasets). I created an abstraction for glossary registration. There was an idea to change the frontend instead, but that would require a lot of work.

Feed: same as glossaries, and solved in a similar way. For feed, changing the frontend API is more feasible, but I wanted it to be consistent with glossaries.

Long tasks for datasets: they were migrated into the tasks folder and don't require dedicated loading of their code (at least for now). But there are two concerns:
1) The deployment uses direct module folder references to run them (e.g. `dataall.modules.datasets....`), so when a module is deactivated, we shouldn't deploy these tasks either. I left a TODO to address this in the future (when we migrate all modules), but we should bear in mind that it might lead to inconsistencies.
2) There is a reference to `redshift` from the long-running tasks. This should be addressed in the `redshift` module.

Redshift: it has some references to `datasets`. So there will be either dependencies among modules or a small amount of code duplication (if `redshift` doesn't rely heavily on `datasets`). This will be addressed in the `redshift` module.

Other changes:
* Fixed and improved some tests
* Extracted the glue handler code related to `DatasetTableColumn`
* Renamed import mode from tasks to handlers for async lambda
* A few hacks that will go away with the next refactoring :)

Next steps:
* [Part2 ](nikpodsh#1) in preview :)
* Extract the rest of the datasets functionality (perhaps in a few steps)
* Refactor extractor modules the same way as notebooks
* Extract tests to follow the same structure

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
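The first concern about long tasks (tasks are deployed by module path even when the module is deactivated) could be handled by filtering task definitions before deployment. The following is only a sketch of that idea, not code from this PR: `TaskSpec`, `owning_module`, `deployable_tasks`, and the example entrypoints are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical description of a long-running task (not a data.all type)."""
    name: str
    entrypoint: str  # dotted path used by the deployment to run the task


def owning_module(entrypoint: str) -> str:
    """Return the module name for paths under 'dataall.modules.', else ''."""
    prefix = "dataall.modules."
    if not entrypoint.startswith(prefix):
        return ""  # treated as a core task that no pluggable module owns
    return entrypoint[len(prefix):].split(".", 1)[0]


def deployable_tasks(tasks: Iterable[TaskSpec], active_modules: Set[str]) -> List[TaskSpec]:
    """Keep core tasks and tasks whose owning module is active."""
    result = []
    for task in tasks:
        module = owning_module(task.entrypoint)
        if module == "" or module in active_modules:
            result.append(task)
    return result


# Example entrypoints are made up for this sketch.
tasks = [
    TaskSpec("sync-tables", "dataall.modules.datasets.tasks.sync_tables"),
    TaskSpec("core-cleanup", "dataall.tasks.cleanup"),
]
selected = deployable_tasks(tasks, active_modules={"notebooks"})
```

With only `notebooks` active, the dataset task is skipped and the core task is kept, which is the consistency the TODO is meant to restore.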
Showing 62 changed files with 620 additions and 425 deletions.
@@ -0,0 +1,43 @@
from dataclasses import dataclass
from typing import Type, Dict

from dataall.api import gql
from dataall.api.gql.graphql_union_type import UnionTypeRegistry
from dataall.db import Resource, models


@dataclass
class FeedDefinition:
    target_type: str
    model: Type[Resource]


class FeedRegistry(UnionTypeRegistry):
    """Registers models for different target types"""
    _DEFINITIONS: Dict[str, FeedDefinition] = {}

    @classmethod
    def register(cls, definition: FeedDefinition):
        cls._DEFINITIONS[definition.target_type] = definition

    @classmethod
    def find_model(cls, target_type: str):
        return cls._DEFINITIONS[target_type].model

    @classmethod
    def find_target(cls, obj: Resource):
        for target_type, definition in cls._DEFINITIONS.items():
            if isinstance(obj, definition.model):
                return target_type
        return None

    @classmethod
    def types(cls):
        return [gql.Ref(target_type) for target_type in cls._DEFINITIONS.keys()]


FeedRegistry.register(FeedDefinition("Worksheet", models.Worksheet))
FeedRegistry.register(FeedDefinition("DataPipeline", models.DataPipeline))
FeedRegistry.register(FeedDefinition("DatasetTable", models.DatasetTable))
FeedRegistry.register(FeedDefinition("DatasetStorageLocation", models.DatasetStorageLocation))
FeedRegistry.register(FeedDefinition("Dashboard", models.Dashboard))
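To illustrate how a module might plug its own type into the feed registry, here is a minimal standalone sketch. It assumes stand-in classes instead of the real `dataall.db` models, drops the `UnionTypeRegistry` base and `gql.Ref` so it can run on its own, and uses `DatasetTableColumn` as a feed target purely as an example; none of these choices are confirmed by the diff.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Type


class Resource:
    """Stand-in for dataall.db.Resource (assumption for this sketch)."""


class Worksheet(Resource):
    """Stand-in for models.Worksheet."""


class DatasetTableColumn(Resource):
    """Stand-in for a model owned by the datasets module."""


@dataclass
class FeedDefinition:
    target_type: str
    model: Type[Resource]


class FeedRegistry:
    """Simplified registry: maps a target type name to its model class."""
    _DEFINITIONS: Dict[str, FeedDefinition] = {}

    @classmethod
    def register(cls, definition: FeedDefinition) -> None:
        cls._DEFINITIONS[definition.target_type] = definition

    @classmethod
    def find_model(cls, target_type: str) -> Type[Resource]:
        # Raises KeyError for an unregistered target type, as in the diff.
        return cls._DEFINITIONS[target_type].model

    @classmethod
    def find_target(cls, obj: object) -> Optional[str]:
        for target_type, definition in cls._DEFINITIONS.items():
            if isinstance(obj, definition.model):
                return target_type
        return None


# Core registers its own types; a module adds its type when it is loaded.
FeedRegistry.register(FeedDefinition("Worksheet", Worksheet))
FeedRegistry.register(FeedDefinition("DatasetTableColumn", DatasetTableColumn))

column_model = FeedRegistry.find_model("DatasetTableColumn")
worksheet_target = FeedRegistry.find_target(Worksheet())
```

The point of the design is that the feed code never imports module-specific models directly; it only asks the registry, so a deactivated module simply leaves no entry behind.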
@@ -0,0 +1,58 @@
from dataclasses import dataclass
from typing import Type, Dict, Optional, Protocol, Union

from dataall.api import gql
from dataall.api.gql.graphql_union_type import UnionTypeRegistry
from dataall.db import Resource, models


class Identifiable(Protocol):
    def uri(self):
        ...


@dataclass
class GlossaryDefinition:
    """Glossary's definition used for registration references of other modules"""
    target_type: str
    object_type: str
    model: Union[Type[Resource], Identifiable]  # should be an intersection, but python typing doesn't have one yet

    def target_uri(self):
        return self.model.uri()


class GlossaryRegistry(UnionTypeRegistry):
    """Registry of glossary definition and API to retrieve data"""
    _DEFINITIONS: Dict[str, GlossaryDefinition] = {}

    @classmethod
    def register(cls, glossary: GlossaryDefinition) -> None:
        cls._DEFINITIONS[glossary.target_type] = glossary

    @classmethod
    def find_model(cls, target_type: str) -> Optional[Type[Resource]]:
        definition = cls._DEFINITIONS.get(target_type)  # .get, so an unknown type yields None instead of KeyError
        return definition.model if definition is not None else None

    @classmethod
    def find_object_type(cls, model: Resource) -> Optional[str]:
        for _, definition in cls._DEFINITIONS.items():
            if isinstance(model, definition.model):
                return definition.object_type
        return None

    @classmethod
    def definitions(cls):
        return cls._DEFINITIONS.values()

    @classmethod
    def types(cls):
        return [gql.Ref(definition.object_type) for definition in cls._DEFINITIONS.values()]


GlossaryRegistry.register(GlossaryDefinition("DatasetTable", "DatasetTable", models.DatasetTable))
GlossaryRegistry.register(GlossaryDefinition("Folder", "DatasetStorageLocation", models.DatasetStorageLocation))
GlossaryRegistry.register(GlossaryDefinition("Dashboard", "Dashboard", models.Dashboard))
GlossaryRegistry.register(GlossaryDefinition("DatasetTable", "DatasetTable", models.DatasetTable))
GlossaryRegistry.register(GlossaryDefinition("Dataset", "Dataset", models.Dataset))
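The glossary registry keeps two names per entry: `target_type` (the registry key) and `object_type` (the name passed to `gql.Ref`), and the two can differ, as with `"Folder"` and `"DatasetStorageLocation"`. The sketch below shows that lookup in both directions. It is a simplified standalone version with stand-in model classes and no `UnionTypeRegistry` base, and it looks definitions up with `dict.get` so that an unknown target type yields `None`, in line with the `Optional` return annotation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Type


class Resource:
    """Stand-in for dataall.db.Resource (assumption for this sketch)."""


class DatasetStorageLocation(Resource):
    """Stand-in for models.DatasetStorageLocation."""


class Dashboard(Resource):
    """Stand-in for models.Dashboard."""


@dataclass
class GlossaryDefinition:
    target_type: str  # registry key, e.g. "Folder"
    object_type: str  # presumably the GraphQL type name used in the union
    model: Type[Resource]


class GlossaryRegistry:
    """Simplified registry without the GraphQL union base class."""
    _DEFINITIONS: Dict[str, GlossaryDefinition] = {}

    @classmethod
    def register(cls, glossary: GlossaryDefinition) -> None:
        cls._DEFINITIONS[glossary.target_type] = glossary

    @classmethod
    def find_model(cls, target_type: str) -> Optional[Type[Resource]]:
        # dict.get returns None for a missing key instead of raising KeyError.
        definition = cls._DEFINITIONS.get(target_type)
        return definition.model if definition is not None else None

    @classmethod
    def find_object_type(cls, model: object) -> Optional[str]:
        for definition in cls._DEFINITIONS.values():
            if isinstance(model, definition.model):
                return definition.object_type
        return None


GlossaryRegistry.register(
    GlossaryDefinition("Folder", "DatasetStorageLocation", DatasetStorageLocation)
)
GlossaryRegistry.register(GlossaryDefinition("Dashboard", "Dashboard", Dashboard))

folder_model = GlossaryRegistry.find_model("Folder")
missing_model = GlossaryRegistry.find_model("Unknown")
resolved_type = GlossaryRegistry.find_object_type(DatasetStorageLocation())
```

Note that registering the same `target_type` twice just overwrites the earlier entry, because the registry is keyed by `target_type`.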