Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-create tables and schemas #2090

Merged

Conversation

jieguangzhou
Copy link
Collaborator

Description

Related Issues

Checklist

  • Is this code covered by new or existing unit tests or integration tests?
  • Did you run make unit_testing and make integration-testing successfully?
  • Do new classes, functions, methods and parameters all have docstrings?
  • Were existing docstrings updated, if necessary?
  • Was external documentation updated, if necessary?

Additional Notes or Comments

@jieguangzhou jieguangzhou force-pushed the feat/auto-create-tables-and-schemas branch 7 times, most recently from 03fceab to f871325 Compare May 22, 2024 12:39
@jieguangzhou jieguangzhou force-pushed the feat/auto-create-tables-and-schemas branch 4 times, most recently from b2402f3 to 5b8c8ed Compare May 22, 2024 15:54
@jieguangzhou jieguangzhou linked an issue May 22, 2024 that may be closed by this pull request
5 tasks
@jieguangzhou jieguangzhou changed the title Feat/auto create tables and schemas Auto-create tables and schemas May 22, 2024
@jieguangzhou jieguangzhou self-assigned this May 22, 2024
@jieguangzhou jieguangzhou force-pushed the feat/auto-create-tables-and-schemas branch 2 times, most recently from cafa679 to 3788c7b Compare May 23, 2024 04:18
@@ -91,3 +91,19 @@ class ComponentException(BaseException):

:param msg: msg for BaseException
"""


class UnsupportedDatatype(BaseException):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

"""A factory for pil image # noqa."""

@staticmethod
def check(data: t.Any) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense(future) to add a function check=my_checking_function to DataType?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach complicates the DataType; additionally, we need to discover all possible datatypes for implementing auto-schema. If we add to the datatype, we must initialize an instance of the datatype to discover it. For some types used in functions, such as arrays or torch, this is not very elegant. Therefore, implementing a factory class method is preferable. Only the data types that have implemented this class will be discovered.

WDYT?

@@ -1076,6 +1087,16 @@ def infer_schema(
"""
return self.databackend.infer_schema(data, identifier)

def set_cfg(self, cfg: Config):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do this?

@cfg.setter
def cfg(self, value):
   ...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we really want to do this? That would mess things up: self.compute depends on the config used to build the Datalayer. So you set the cfg and it doesn't match the attributes of the db.

Copy link
Collaborator Author

@jieguangzhou jieguangzhou May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be an original issue here. If we start a datalayer by invoking superduperdb(**new_config), the value of the config is actually incorrect at this point because it has not been updated using **new_config. Therefore, when we send tasks to the compute layer, we should send the config with which we actually constructed the datalayer, not the one from the configuration file; otherwise, there will be a mismatch between the client side and the compute side.

I believe that the correct approach should be to launch the datalayer using the received CFG in each compute job, rather than merely retrieving it from a configuration file.

WDYT?

@@ -262,6 +263,7 @@ class Config(BaseConfig):
logging_type: LogType = LogType.SYSTEM

bytes_encoding: BytesEncoding = BytesEncoding.BYTES
auto_schema: bool = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@jieguangzhou jieguangzhou force-pushed the feat/auto-create-tables-and-schemas branch from 12f25b1 to 1347f36 Compare May 23, 2024 10:04
@blythed blythed merged commit 6499ab4 into superduper-io:main May 23, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Auto-create tables and schemas
2 participants