Split command #63

Merged
merged 46 commits into from
Jul 7, 2023
Changes from 33 commits
f316613
POC move a model
dave-connors-3 Jun 8, 2023
b377725
move dbt project back
dave-connors-3 Jun 8, 2023
0ffcb8e
move resource and move resource yml entry methods, lots of renames to…
dave-connors-3 Jun 9, 2023
6ee184e
move resources and their yml entries
dave-connors-3 Jun 13, 2023
c7846d0
attempt to solve some mypys
dave-connors-3 Jun 13, 2023
b2a2ba1
add some workarounds for source yml methods
dave-connors-3 Jun 13, 2023
f04f888
remove breakpoint
dave-connors-3 Jun 13, 2023
cadeae9
add method for writing the project file
dave-connors-3 Jun 13, 2023
34469f4
add custom macro for testing
dave-connors-3 Jun 15, 2023
02d3ef2
method for moving custom macro files
dave-connors-3 Jun 15, 2023
1e1f15f
move packages yml
dave-connors-3 Jun 15, 2023
b29c192
small tweaks, unit tests for removing yml entries
dave-connors-3 Jun 16, 2023
131a31d
add add_yml_entry tests
dave-connors-3 Jun 16, 2023
cd2293b
rename get_manifest_node
dave-connors-3 Jun 20, 2023
8d69608
comments from review
dave-connors-3 Jun 20, 2023
dd614b7
refactor initialize methods into separate class
dave-connors-3 Jun 20, 2023
eaef0db
refactor file manager for new use case and correct broken tests
dave-connors-3 Jun 20, 2023
59f2d93
tests for subproject initialization
dave-connors-3 Jun 23, 2023
897e869
rename file and update imports
dave-connors-3 Jun 23, 2023
5eb952a
method for updating sql refs to two arguments
dave-connors-3 Jun 23, 2023
7089466
python code editor method
dave-connors-3 Jun 23, 2023
e1fb68e
add step for updating refs
dave-connors-3 Jun 23, 2023
53390a5
remove accidentally committed effects of meshify
dave-connors-3 Jun 23, 2023
e9db0e1
disambiguate access and group yml operations
dave-connors-3 Jun 23, 2023
42bc070
update tests with renamed methods
dave-connors-3 Jun 23, 2023
6c6e610
remove contents from project that shouldn't be there
dave-connors-3 Jun 23, 2023
ed752ad
add dependencies.yml logic
dave-connors-3 Jun 23, 2023
dc09510
try subprocess poetry run dbt in test case
dave-connors-3 Jun 26, 2023
43efeb4
small refactor in test suite
dave-connors-3 Jun 26, 2023
947889c
add integration tests for split command
dave-connors-3 Jun 26, 2023
7b24191
merge main
dave-connors-3 Jun 26, 2023
2628464
merge conflicts
dave-connors-3 Jun 27, 2023
57d1aec
copy groups indirectly selected by split operation
dave-connors-3 Jun 27, 2023
9128ad6
Apply suggestions from code review
dave-connors-3 Jul 5, 2023
de3476b
merge main
dave-connors-3 Jul 6, 2023
d907f25
merge main
dave-connors-3 Jul 6, 2023
db8c1e3
fix subproject select resources method
dave-connors-3 Jul 6, 2023
d7868ad
revert groups from errant commit
dave-connors-3 Jul 6, 2023
e8f2791
refactor test to clone project
dave-connors-3 Jul 6, 2023
3db0141
delete yaml editor file
dave-connors-3 Jul 6, 2023
c009159
add basic logging to split operation
dave-connors-3 Jul 6, 2023
307cf66
update test project setup to include seeding db
dave-connors-3 Jul 6, 2023
e20f99e
Update dbt_meshify/storage/file_content_editors.py
dave-connors-3 Jul 7, 2023
4d5e2df
Apply suggestions from code review
dave-connors-3 Jul 7, 2023
dda0021
change tpye hint, use class method
dave-connors-3 Jul 7, 2023
b222ef1
Update dbt_meshify/storage/file_content_editors.py
dave-connors-3 Jul 7, 2023
7 changes: 7 additions & 0 deletions dbt_meshify/cli.py
@@ -11,6 +11,13 @@
     help="The path to the dbt project to operate on. Defaults to the current directory.",
 )

+create_path = click.option(
+    "--create-path",
+    type=click.Path(exists=True),
+    default=None,
+    help="The path to create the new dbt project. Defaults to the name argument supplied.",
+)
+
 exclude = click.option(
     "--exclude",
     "-e",
97 changes: 82 additions & 15 deletions dbt_meshify/dbt_projects.py
@@ -12,8 +12,14 @@
 from dbt.contracts.project import Project
 from dbt.contracts.results import CatalogArtifact, CatalogTable
 from dbt.graph import Graph
+from dbt.node_types import NodeType

 from dbt_meshify.dbt import Dbt
+from dbt_meshify.storage.file_content_editors import (
+    DbtMeshConstructor,
+    filter_empty_dict_items,
+)
+from dbt_meshify.storage.file_manager import DbtFileManager

 logger = logging.getLogger()

@@ -100,7 +106,8 @@ def installed_packages(self) -> Set[str]:
                 if item.package_name:
                     _hash = hashlib.md5()
                     _hash.update(item.package_name.encode("utf-8"))
-                    project_packages.append(_hash.hexdigest())
+                    if _hash.hexdigest() != self.manifest.metadata.project_id:
+                        project_packages.append(_hash.hexdigest())
         return set(project_packages)

     @property
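
The added check compares the md5 digest of each package name with `manifest.metadata.project_id`, which suggests the project ID is the md5 hex digest of the root project's name. A minimal sketch of that filter, assuming this relationship holds; `hashed_name`, `external_packages`, and the sample package names are illustrative and not part of the PR:

```python
import hashlib
from typing import Iterable, Set


def hashed_name(name: str) -> str:
    """Return the md5 hex digest of a package name, as the diff does inline."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


def external_packages(package_names: Iterable[str], project_id: str) -> Set[str]:
    """Collect hashed package names, skipping the digest that matches the project itself."""
    return {
        digest
        for digest in (hashed_name(name) for name in package_names if name)
        if digest != project_id
    }


# Assumption: project_id is the md5 of the root project's name.
project_id = hashed_name("jaffle_shop")
packages = external_packages(["jaffle_shop", "dbt_utils", "jaffle_shop"], project_id)
```

Without the comparison, the root project's own digest would be reported as an installed package.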
@@ -152,7 +159,19 @@ def get_catalog_entry(self, unique_id: str) -> Optional[CatalogTable]:

     def get_manifest_node(self, unique_id: str) -> Optional[ManifestNode]:
         """Returns the catalog entry for a model in the dbt project's catalog"""
dave-connors-3 marked this conversation as resolved.
-        return self.manifest.nodes.get(unique_id)
+        if unique_id.split(".")[0] in [
nicholasyager marked this conversation as resolved.

Contributor

What about exposure and metric/measure? Are those not listed on purpose?

Collaborator Author

This logic was to get nodes vs the resources that are also other top-level keys in the manifest. Looking with fresh eyes, I would guess there's a dbt-core class that represents this type that we could leverage instead of a list of strings!

+            "model",
+            "seed",
+            "snapshot",
+            "test",
+            "analysis",
+            "snapshot",
+        ]:
+            return self.manifest.nodes.get(unique_id)
+        else:
+            pluralized = NodeType(unique_id.split('.')[0]).pluralize()
+            resources = getattr(self.manifest, pluralized)
+            return resources.get(unique_id)


 class DbtProject(BaseDbtProject):
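
The reply in the review thread on `get_manifest_node` suggests replacing the inline list of strings with a shared definition. A minimal sketch of that idea, using a plain dict as a stand-in for the manifest; `NODE_PREFIXES`, `PLURALS`, and `lookup_resource` are hypothetical names, and the pluralization table is an assumption standing in for `NodeType.pluralize()`:

```python
from typing import Any, Dict, Optional

# Prefixes of unique_ids that live under manifest.nodes, taken from the list
# in the diff (with the duplicate "snapshot" dropped).
NODE_PREFIXES = frozenset({"model", "seed", "snapshot", "test", "analysis"})

# Assumption: a small lookup table in place of NodeType(...).pluralize().
PLURALS = {
    "source": "sources",
    "exposure": "exposures",
    "metric": "metrics",
    "macro": "macros",
    "group": "groups",
}


def lookup_resource(manifest: Dict[str, Dict[str, Any]], unique_id: str) -> Optional[Any]:
    """Find a resource under `nodes` or under its own top-level manifest key."""
    prefix = unique_id.split(".")[0]
    key = "nodes" if prefix in NODE_PREFIXES else PLURALS.get(prefix)
    if key is None:
        return None
    return manifest.get(key, {}).get(unique_id)


manifest = {
    "nodes": {"model.jaffle_shop.orders": "orders model"},
    "sources": {"source.jaffle_shop.raw.orders": "raw orders source"},
}
model = lookup_resource(manifest, "model.jaffle_shop.orders")
source = lookup_resource(manifest, "source.jaffle_shop.raw.orders")
```

Keeping the prefixes in one named constant makes it easier to see which resource types are treated as nodes, which is the question the reviewer raised.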
@@ -216,10 +235,13 @@ def split(
         project_name: str,
         select: str,
         exclude: Optional[str] = None,
+        selector: Optional[str] = None,
     ) -> "DbtSubProject":
         """Create a new DbtSubProject using NodeSelection syntax."""

-        subproject_resources = self.select_resources(select, exclude)
+        subproject_resources = self.select_resources(
+            select=select, exclude=exclude, selector=selector, output_key="unique_id"
+        )

         # Construct a new project and inject the new manifest
         subproject = DbtSubProject(
@@ -242,13 +264,65 @@ class DbtSubProject(BaseDbtProject):
     def __init__(self, name: str, parent_project: DbtProject, resources: Set[str]):
         self.name = name
         self.resources = resources
-        self.parent = parent_project
+        self.parent_project = parent_project
+        self.path = parent_project.path / Path(name)

-        self.manifest = parent_project.manifest.deepcopy()
+        # self.manifest = parent_project.manifest.deepcopy()
+        # i am running into a bug with the core deepcopy -- checking with michelle
+        self.manifest = copy.deepcopy(parent_project.manifest)
         self.project = copy.deepcopy(parent_project.project)
         self.catalog = parent_project.catalog
+        self.custom_macros = self._get_custom_macros()
+        self.groups = self._get_indirect_groups()
+
+        self._rename_project()

-        super().__init__(self.manifest, self.project, self.catalog)
+        super().__init__(self.manifest, self.project, self.catalog, self.name)

+    def _rename_project(self) -> None:
+        """
+        edits the project yml to take any instance of the parent project name and update it to the subproject name
+        """
+        project_dict = self.project.to_dict()
+        for key in [resource.pluralize() for resource in NodeType]:
+            if self.parent_project.name in project_dict.get(key, {}).keys():
+                project_dict[key][self.name] = project_dict[key].pop(self.parent_project.name)
+        project_dict["name"] = self.name
+        self.project = Project.from_dict(project_dict)
+
+    def _get_custom_macros(self) -> Set[str]:
+        """
+        get a set of macro unique_ids for all the selected resources
+        """
+        macros_set = set()
+        for unique_id in self.resources:
+            resource = self.get_manifest_node(unique_id)
+            if not resource:
+                continue
+            macros = resource.depends_on.macros
+            project_macros = [
+                macro
+                for macro in macros
+                if hashlib.md5((macro.split(".")[1]).encode()).hexdigest()
+                == self.manifest.metadata.project_id
+            ]
+            macros_set.update(project_macros)
+        return macros_set
+
+    def _get_indirect_groups(self) -> Set[str]:
+        """
+        get a set of group unique_ids for all the selected resources
+        """
+        groups = set()
+        for unique_id in self.resources:
+            resource = self.get_manifest_node(unique_id)
+            if not resource or resource.resource_type in [NodeType.Source, NodeType.Exposure]:
+                continue
+            group = resource.group
+            if group:
+                group_unique_id = f"group.{self.parent_project.name}.{group}"
+                groups.update({group_unique_id})
+        return groups
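
The `_rename_project` method in this hunk re-keys every resource config block that is nested under the parent project's name. A minimal sketch of that step on a plain dict, assuming a project file shaped like the sample below; `rename_project` and its `resource_keys` default are hypothetical stand-ins for the method and for the pluralized `NodeType` values:

```python
import copy
from typing import Any, Dict, Iterable


def rename_project(
    project_dict: Dict[str, Any],
    parent_name: str,
    new_name: str,
    resource_keys: Iterable[str] = ("models", "seeds", "snapshots", "tests"),
) -> Dict[str, Any]:
    """Return a copy of project_dict with parent-name config blocks re-keyed to new_name."""
    renamed = copy.deepcopy(project_dict)
    for key in resource_keys:
        section = renamed.get(key) or {}
        if parent_name in section:
            # Move the config block from the old project name to the new one.
            section[new_name] = section.pop(parent_name)
    renamed["name"] = new_name
    return renamed


parent = {
    "name": "jaffle_shop",
    "models": {"jaffle_shop": {"+materialized": "view"}},
    "seeds": {},
}
child = rename_project(parent, "jaffle_shop", "revenue")
```

Working on a deep copy mirrors the diff's use of `copy.deepcopy` for the subproject's `project`, so the parent's configuration is left untouched.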

     def select_resources(self, select: str, exclude: Optional[str] = None) -> Set[str]:
         """
@@ -259,7 +333,7 @@ def select_resources(self, select: str, exclude: Optional[str] = None) -> Set[st
         if exclude:
             args.extend(["--exclude", exclude])

-        results = self.parent.dbt.ls(self.parent.path, args)
+        results = self.parent_project.dbt.ls(self.parent_project.path, args)

         return set(results) - self.resources

@@ -276,7 +350,7 @@ def split(
         # Construct a new project and inject the new manifest
         subproject = DbtSubProject(
             name=project_name,
-            parent_project=copy.deepcopy(self.parent),
+            parent_project=copy.deepcopy(self.parent_project),
             resources=subproject_resources,
         )

@@ -285,13 +359,6 @@ def split(

         return subproject

-    def initialize(self, target_directory: os.PathLike):
-        """Initialize this subproject as a full dbt project at the provided `target_directory`."""
-
-        # TODO: Implement project initialization
-
-        raise NotImplementedError
-

 class DbtProjectHolder:
     def __init__(self) -> None:
50 changes: 20 additions & 30 deletions dbt_meshify/main.py
@@ -6,7 +6,10 @@
 import yaml
 from dbt.contracts.graph.unparsed import Owner

+from dbt_meshify.storage.dbt_project_creator import DbtSubprojectCreator
+
 from .cli import (
+    create_path,
     exclude,
     group_yml_path,
     owner,
@@ -18,7 +21,7 @@
     selector,
 )
 from .dbt_projects import DbtProject, DbtProjectHolder, DbtSubProject
-from .storage.yaml_editors import DbtMeshModelConstructor
+from .storage.file_content_editors import DbtMeshConstructor


 # define cli group
@@ -59,40 +62,29 @@ def connect(projects_dir):


@cli.command(name="split")
@create_path
@click.argument("project_name")
@exclude
@project_path
@select
@selector
def split():
def split(project_name, select, exclude, project_path, selector, create_path):
Collaborator

Since select now allows multiple arguments, we cannot have --select before the project_name argument.

(dbt-meshify-py3.11) > $ poetry run dbt-meshify split --select "+orders" revenue
Usage: dbt-meshify split [OPTIONS] PROJECT_NAME
Try 'dbt-meshify split --help' for help.

Error: Missing argument 'PROJECT_NAME'.

Instead, we need to put the arguments and options in a specific order:

(dbt-meshify-py3.11) > $ poetry run dbt-meshify split revenue --select "+orders"

I don't think this is a blocker per se. At the very least, documentation should be refined in a follow-up.
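
The ordering constraint described above is typical of options that accept a variable number of values: the option keeps consuming tokens until it meets another flag, so a positional argument placed after it is swallowed. The sketch below reproduces that behavior with the standard library's `argparse` rather than click, purely as an analogy. How dbt-meshify's own `--select` option is implemented is not shown in this diff, and `build_parser` and `try_parse` are hypothetical helpers.

```python
import argparse
import contextlib
import io
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="split-demo")
    parser.add_argument("project_name")
    # nargs="+" makes --select greedy: it takes every following non-flag token.
    parser.add_argument("--select", nargs="+", default=[])
    return parser


def try_parse(argv: List[str]) -> Optional[argparse.Namespace]:
    """Parse argv, returning None instead of exiting when parsing fails."""
    try:
        # Silence the usage message argparse prints before exiting.
        with contextlib.redirect_stderr(io.StringIO()):
            return build_parser().parse_args(argv)
    except SystemExit:
        return None


name_first = try_parse(["revenue", "--select", "+orders"])
select_first = try_parse(["--select", "+orders", "revenue"])
```

With the project name first, parsing succeeds. With `--select` first, both remaining tokens are taken as selection values and the positional argument is reported as missing, which matches the error shown in the comment.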

"""
!!! info
This command is not yet implemented

Splits dbt projects apart by adding all necessary dbt Mesh constructs based on the selection syntax.
Splits out a new subproject from a dbt project by adding all necessary dbt Mesh constructs to the resources based on the selected resources.

"""
path_string = input("Enter the relative path to a dbt project you'd like to split: ")

holder = DbtProjectHolder()

path = Path(path_string).expanduser().resolve()
path = Path(project_path).expanduser().resolve()
project = DbtProject.from_directory(path)
holder.register_project(project)

while True:
subproject_name = input("Enter the name for your subproject ('done' to finish): ")
if subproject_name == "done":
break
subproject_selector = input(
f"Enter the selector that represents the subproject {subproject_name}: "
)

subproject: DbtSubProject = project.split(
project_name=subproject_name, select=subproject_selector
)
holder.register_project(subproject)

print(holder.project_map())
subproject = project.split(
project_name=project_name, select=select, exclude=exclude, selector=selector
)
target_directory = Path(create_path) if create_path else None
subproject_creator = DbtSubprojectCreator(
subproject=subproject, target_directory=target_directory
)
Comment on lines +90 to +93

Collaborator

I am of the opinion that DbtSubprojectCreator should not allow None target_directory values. Instead, the calling method should be responsible for passing a valid target. This approach will allow us to reduce the complexity of the underlying API/class.

This is not something that needs to be tackled here, but rather in a refactor ticket.

Collaborator Author

That's good feedback, definitely agree -- IIRC the init handles this, but it's definitely not necessary to hide that behavior!
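
One way to act on this suggestion is to resolve the target directory in the calling command, so the creator class only ever receives a concrete path. A minimal sketch, assuming the fallback is the parent project path joined with the subproject name (which mirrors how `DbtSubProject.path` is built in this diff); `resolve_target_directory` is a hypothetical helper, not part of dbt-meshify:

```python
from pathlib import Path
from typing import Optional


def resolve_target_directory(
    create_path: Optional[str], project_path: Path, project_name: str
) -> Path:
    """Return the directory for the new subproject, never None."""
    if create_path:
        return Path(create_path)
    # Assumed default: a folder named after the subproject inside the parent project.
    return project_path / project_name


explicit = resolve_target_directory("out/revenue", Path("jaffle_shop"), "revenue")
fallback = resolve_target_directory(None, Path("jaffle_shop"), "revenue")
```

With a helper like this in the command, the creator's `target_directory` parameter could be typed as a plain `Path` instead of an optional one.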

subproject_creator.initialize()


@operation.command(name="add-contract")
@@ -117,8 +109,8 @@ def add_contract(select, exclude, project_path, selector, public_only=False):
     for model_unique_id in models:
         model_node = project.get_manifest_node(model_unique_id)
         model_catalog = project.get_catalog_entry(model_unique_id)
-        meshify_constructor = DbtMeshModelConstructor(
-            project_path=project_path, model_node=model_node, model_catalog=model_catalog
+        meshify_constructor = DbtMeshConstructor(
+            project_path=project_path, node=model_node, catalog=model_catalog
dave-connors-3 marked this conversation as resolved.
         )
         meshify_constructor.add_model_contract()

@@ -145,9 +137,7 @@ def add_version(select, exclude, project_path, selector, prerelease, defined_in)
     for model_unique_id in models:
         model_node = project.get_manifest_node(model_unique_id)
         if model_node.version == model_node.latest_version:
-            meshify_constructor = DbtMeshModelConstructor(
-                project_path=project_path, model_node=model_node
-            )
+            meshify_constructor = DbtMeshConstructor(project_path=project_path, node=model_node)
             meshify_constructor.add_model_version(prerelease=prerelease, defined_in=defined_in)