[BUGS-0.2.0] File encodable bug #1973

Closed
Tracked by #1972
blythed opened this issue Apr 14, 2024 · 1 comment · Fixed by #2017

blythed (Collaborator) commented Apr 14, 2024

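$ cp -r test test_copy

import uuid
# Imports and my_collection are not shown in the issue; assumed superduperdb 0.2.0 layout.
from superduperdb import Document, superduper
from superduperdb.backends.mongodb import Collection
from superduperdb.components.datatype import DataType

my_collection = Collection('my_collection')  # hypothetical collection name
db = superduper('mongomock://test', artifact_store='filesystem://./artifacts')
dt = DataType('my-file', encodable='file')
db.apply(dt)
my_id = str(uuid.uuid4())
db.execute(my_collection.insert_one(Document({'id': my_id, 'x': dt('./test_copy')})))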

Trying this code gives:

>>> db.execute(my_collection.find_one({'id': my_id})).unpack(db)
{'id': '2b14133a-f275-461e-b0a2-d6f0eadb8b9b',
 'x': './artifacts/4dc048d4dbf67bed983a1b7a82822347645cc240',
 '_fold': 'train',
 '_id': ObjectId('661b9c229a2e44f23aa16422')}

However, the data is missing:

$ ls ./artifacts/4dc048d4dbf67bed983a1b7a82822347645cc240
ls: ./artifacts/fc4a398bf717b6adf2dd5a07376a107c43bb0de0: No such file or directory
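
The same check can be scripted. The sketch below is not part of superduperdb: `missing_paths` is a hypothetical helper, and it assumes the unpacked document is a plain dict whose file references are strings pointing under the artifact directory. It returns every such reference that does not exist on disk.

```python
import os
from typing import Any, Dict, List


def missing_paths(doc: Dict[str, Any], artifact_root: str = "./artifacts") -> List[str]:
    """Return string values under artifact_root that do not exist on disk."""
    root = os.path.normpath(artifact_root)
    missing: List[str] = []
    for value in doc.values():
        if isinstance(value, dict):
            # Recurse into nested sub-documents.
            missing.extend(missing_paths(value, artifact_root))
        elif isinstance(value, str):
            path = os.path.normpath(value)
            # Only consider strings that point inside the artifact directory.
            if path.startswith(root + os.sep) and not os.path.exists(path):
                missing.append(value)
    return missing
```

Applied to the unpacked document shown above, this would be expected to report the `'x'` path for as long as the bug is present.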

blythed mentioned this issue on Apr 14, 2024 (24 tasks)
blythed added the 🐛 bug (Something isn't working) label on Apr 14, 2024

jieguangzhou (Collaborator) commented Apr 15, 2024

When a value of the file type is inserted as data, the saving logic is skipped. I will fix it. The insert code in question:

    def insert(
        self, insert: Insert, refresh: bool = True, datatypes: t.Sequence[DataType] = ()
    ) -> InsertResult:
        """
        Insert data.

        :param insert: insert query object
        """
        for e in datatypes:
            self.add(e)

        # TODO add this logic to a base Insert class
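        artifacts = []
        for r in insert.documents:
            # Assign each document to the train fold, or to valid with
            # probability s.CFG.fold_probability.
            r['_fold'] = 'train'
            if random.random() < s.CFG.fold_probability:
                r['_fold'] = 'valid'
            # Only leaves of type 'artifact' are collected for saving.
            artifacts.extend(list(r.get_leaves('artifact').values()))

        # Save collected artifacts that hold data but have no file_id yet.
        for a in artifacts:
            if a.x is not None and a.file_id is None:
                a.save(self.artifact_store)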

        inserted_ids = insert.execute(self)
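
To make the failure mode concrete, here is a toy model of that insert path. It is a sketch under stated assumptions, not superduperdb code: `Leaf`, `ToyStore`, and `save_leaves` are hypothetical names, and the assumption is that only leaves tagged 'artifact' are collected for saving, so a leaf tagged 'file' is never copied into the store. Widening the set of collected leaf types is one possible remedy; the actual fix in #2017 may differ.

```python
import os
import shutil
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Leaf:
    """Hypothetical stand-in for an encoded value inside a document."""
    leaf_type: str                 # 'artifact' or 'file'
    x: Optional[str]               # in this toy model: a path on local disk
    file_id: Optional[str] = None  # set once the leaf has been saved


class ToyStore:
    """Hypothetical artifact store that copies paths into a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def save(self, leaf: Leaf, file_id: str) -> None:
        target = os.path.join(self.root, file_id)
        if os.path.isdir(leaf.x):
            shutil.copytree(leaf.x, target)  # directories are copied recursively
        else:
            shutil.copy(leaf.x, target)
        leaf.file_id = file_id


def save_leaves(
    doc: Dict[str, Leaf], store: ToyStore, leaf_types: Iterable[str]
) -> List[str]:
    """Save every unsaved leaf whose type is in leaf_types; return the saved ids."""
    wanted = set(leaf_types)
    saved: List[str] = []
    for key, leaf in doc.items():
        if leaf.leaf_type in wanted and leaf.x is not None and leaf.file_id is None:
            store.save(leaf, file_id=f"{key}-copy")
            saved.append(leaf.file_id)
    return saved
```

Calling `save_leaves(doc, store, ['artifact'])` on a document whose only leaf has type 'file' saves nothing, which mirrors the missing directory reported above. Passing `['artifact', 'file']` copies it into the store.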

jieguangzhou self-assigned this on Apr 15, 2024
jieguangzhou linked a pull request on Apr 30, 2024 that will close this issue (5 tasks)