Unified Schema #184

benfred · 2022-02-15T01:51:18Z

This modifies merlin-models to be compatible with the new 'core' schema class that will be shared with NVTabular.

review-notebook-app · 2022-02-15T01:51:22Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

merlin_models/loader/backend.py

merlin_models/tf/utils/tf_utils.py

tests/tf/test_core.py

karlhigley

Looks pretty good to me. Left a few comments on things we might want to abstract further, and one spot where I think there's a method name typo that didn't get caught by the tests.

karlhigley · 2022-02-18T16:14:48Z

merlin_models/data/synthetic.py

@@ -99,9 +104,9 @@ def read_schema(cls, path: Union[str, Path]) -> Schema:
            os.path.join(str(path), "schema.json") if os.path.isdir(str(path)) else str(path)
        )
        if _schema_path.endswith(".pb") or _schema_path.endswith(".pbtxt"):
-            return Schema().from_proto_text(_schema_path)
+            TensorflowMetadata.from_from_proto_text(_schema_path).to_merlin_schema()


I think this should be from_proto_text_file? Also not sure this works with the way that method is written, since I think the current implementation wants the directory and assumes the file is called schema.pbtxt unless you pass a different value of the filename param. Now wondering if this is covered by tests too...
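To make the path concern concrete, here is a minimal sketch of how a caller could split whatever it was given into the directory plus filename pair that the comment above describes. The helper name `split_schema_path` and its `default_filename` parameter are hypothetical and are not part of merlin-models or merlin-core; only the standard library is used.

```python
import os
from pathlib import Path
from typing import Tuple, Union


def split_schema_path(
    path: Union[str, Path], default_filename: str = "schema.pbtxt"
) -> Tuple[str, str]:
    """Hypothetical helper: return (directory, filename) for a schema path.

    If `path` is a directory, assume the schema file inside it uses
    `default_filename`. Otherwise treat `path` as the schema file itself.
    """
    path = str(path)
    if os.path.isdir(path):
        return path, default_filename
    directory, filename = os.path.split(path)
    # A bare filename has an empty directory part; fall back to the cwd.
    return directory or ".", filename
```

A loader that expects a directory and a filename could then be called with both pieces, whether the user passed a directory or a full file path. A unit test over both cases is the kind of check that would catch the mismatch raised here.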

karlhigley · 2022-02-18T16:18:02Z

merlin_models/tf/block/retrieval.py

-    query_item_schema = schema.select_by_tag(
-        lambda tags: query_id_tag in tags or item_id_tag in tags
-    )
+    query_item_schema = schema.select_by_tag(query_id_tag) + schema.select_by_tag(item_id_tag)


This seems like a reasonable way to implement the same behavior on the other version of the Schema API. I do wonder if we should think about integrating the lambda functionality into core though.
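As an illustration of why the two forms select the same columns, here is a small sketch using a toy stand-in. `ToySchema`, its `select_by_predicate` method, and the column and tag names are all assumptions made for this example; they do not reflect the actual merlin-core `Schema` implementation, in particular how `+` treats a column that carries both tags.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet


@dataclass
class ToySchema:
    """Toy stand-in for a schema: maps column name -> set of tags."""

    columns: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def select_by_tag(self, tag: str) -> "ToySchema":
        # Keep only the columns that carry the given tag.
        return ToySchema({n: t for n, t in self.columns.items() if tag in t})

    def select_by_predicate(
        self, predicate: Callable[[FrozenSet[str]], bool]
    ) -> "ToySchema":
        # Lambda-style selection: keep columns whose tags satisfy the predicate.
        return ToySchema({n: t for n, t in self.columns.items() if predicate(t)})

    def __add__(self, other: "ToySchema") -> "ToySchema":
        # Merging by column name means a column with both tags appears once.
        return ToySchema({**self.columns, **other.columns})


schema = ToySchema(
    {
        "user_id": frozenset({"user_id", "categorical"}),
        "item_id": frozenset({"item_id", "categorical"}),
        "price": frozenset({"continuous"}),
    }
)

by_lambda = schema.select_by_predicate(
    lambda tags: "user_id" in tags or "item_id" in tags
)
by_union = schema.select_by_tag("user_id") + schema.select_by_tag("item_id")
```

In this toy model both selections contain exactly `user_id` and `item_id`. The predicate form is more general (it can express "and", "not", and so on), which is the functionality the comment suggests might be worth integrating into core.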

karlhigley · 2022-02-18T16:21:41Z

merlin_models/tf/core.py

+        if isinstance(item, Tags):
+            item = item.value
+        else:
+            item = str(item)


I wonder if this could be a helper on Tags to keep the behavior encapsulated. Something like:

    class Tags(Enum):
        ...

        @classmethod
        def value(cls, maybe_tag):
            if isinstance(maybe_tag, Tags):
                return maybe_tag.value
            else:
                return str(maybe_tag)
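A runnable sketch of that idea follows, under a few assumptions. The two enum members are illustrative only, and the helper is given the hypothetical name `to_str`, because a classmethod named `value` would likely shadow the per-member `.value` attribute that the body relies on.

```python
from enum import Enum


class Tags(Enum):
    # Illustrative members only; the real enum defines its own set.
    ITEM_ID = "item_id"
    USER_ID = "user_id"

    @classmethod
    def to_str(cls, maybe_tag):
        """Return the string value of a Tags member, or str() of anything else."""
        if isinstance(maybe_tag, cls):
            return maybe_tag.value
        return str(maybe_tag)


tag_name = Tags.to_str(Tags.ITEM_ID)   # a Tags member is unwrapped to its value
custom_name = Tags.to_str("my_tag")    # a plain string is passed through
```

With a helper like this, the `isinstance` branch in `merlin_models/tf/core.py` could collapse to a single call, keeping the normalization logic next to the enum.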


Unified Schema

1e1b317

This modifies merlin-models to be compatible with the new 'core' schema class that will be shared with NVTabular.

benfred marked this pull request as draft February 15, 2022 01:51

marcromeyn reviewed Feb 15, 2022

merlin_models/loader/backend.py (outdated, resolved)

marcromeyn reviewed Feb 15, 2022

merlin_models/tf/utils/tf_utils.py (outdated, resolved)

benfred added 13 commits February 15, 2022 17:23

Move to merlin.schema package instead of merlin.graph

673156c

Merge branch 'main' into unify_schema

e94bbdc

.

9701fd9

Merge branch 'main' into unify_schema

c532be8

Update requirements to pull in merlin-core

490fc14

.

5cf625a

.

edb9588

.

368a250

.

973f673

Remove loader.dispatch in favour of merlin.core.dispatch

c6dface

Add create_continuous_column

188ec07

remove merlin_standard_lib

1c73a69

Update music_streaming notebook

c15e465

benfred marked this pull request as ready for review February 18, 2022 00:29

benfred requested review from karlhigley and marcromeyn February 18, 2022 00:30

benfred commented Feb 18, 2022

tests/tf/test_core.py (outdated, resolved)

Update tests/tf/test_core.py

322baad

marcromeyn approved these changes Feb 18, 2022

marcromeyn merged commit c7fad98 into main Feb 18, 2022

marcromeyn deleted the unify_schema branch February 18, 2022 08:27

karlhigley reviewed Feb 18, 2022

benfred added a commit that referenced this pull request Feb 18, 2022

Fix SyntheticData.read_schema with proto text files

034c6ba

As pointed out in #184 (comment), there were some issues with the SyntheticData.read_schema method. This fixes them and adds a basic unit test that would have caught the problem.

benfred mentioned this pull request Feb 18, 2022

Fix SyntheticData.read_schema with proto text files #191

Merged

viswa-nvidia mentioned this pull request Feb 18, 2022

[RMP] Merlin-core: schemas, standardization, namespace NVIDIA-Merlin/Merlin#103

Closed

13 tasks

marcromeyn pushed a commit that referenced this pull request Feb 22, 2022

Fix SyntheticData.read_schema with proto text files (#191)

4f96693

As pointed out in #184 (comment), there were some issues with the SyntheticData.read_schema method. This fixes them and adds a basic unit test that would have caught the problem.
