[LORA] fix model saving and bring into line with other developments in the project #1789

Closed
Tracked by #1634
blythed opened this issue Feb 16, 2024 · 9 comments

@blythed
Collaborator

blythed commented Feb 16, 2024

Currently, model saving does not work in the same way (or even a similar way) as it does for other models. We should use the artifact store.
If this doesn't work for some reason, we should evaluate a better method.

@jieguangzhou
Collaborator

jieguangzhou commented Feb 18, 2024

We can now set log_to_db=True; the checkpoint is then stored in the artifact store and the adapter_id is updated in the metadata store. We can load the model after training completes.

@jieguangzhou jieguangzhou self-assigned this Feb 18, 2024
@jieguangzhou
Collaborator

Later, we need to implement an S3 artifact store to support saving large files remotely, and to support saving folders directly to the artifact store.
For now we zip the folder before saving it to the artifact store, but this is not the best way to save large model files (possibly >1 GB).
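
For readers following along, the zip-based workaround can be sketched roughly as below. This is a minimal illustration using only the standard library, not the project's actual code; the function names `zip_directory_to_bytes` and `unzip_bytes_to_directory` are made up for this example. Note that the whole archive is held in memory, which is one reason the approach is awkward for very large checkpoints.

```python
import io
import zipfile
from pathlib import Path


def zip_directory_to_bytes(directory: str) -> bytes:
    """Pack every file under `directory` into an in-memory zip archive."""
    root = Path(directory)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the directory root.
                archive.write(path, arcname=path.relative_to(root).as_posix())
    return buffer.getvalue()


def unzip_bytes_to_directory(payload: bytes, target: str) -> None:
    """Restore a directory previously packed by zip_directory_to_bytes."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(target)
```

The resulting bytes can be handed to any store that only understands bytes, at the cost of compressing and buffering the full checkpoint on every save and load.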

@blythed
Collaborator Author

blythed commented Feb 19, 2024

Idea: add "saving a directory" to Document/artifact_store.

@blythed
Collaborator Author

blythed commented Feb 19, 2024

@jieguangzhou to add a proposal for saving a list of artifacts in a model.

@jieguangzhou
Collaborator

jieguangzhou commented Feb 21, 2024

Original Artifact workflow

Input: x, x is defined as an artifact in _artifacts

  1. object.dict().encode()

    1. convert artifact to Encodable → Encodable(x)
    2. Encodable(x).encode() → {'_content': xxx}
  2. Save to artifact store

    1. if r['_content']['leaf_type'] == encodable, save r['_content']

      save the bytes to the artifact store, then delete the bytes from the record

  3. create or update the record in metadata_store

Loading

  1. load info from metadata_store

  2. check for _content in info, then load and decode each artifact using _content

    load the bytes and use the datatype to decode them

    rename the key _content.bytes to _content.x
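
The bytes-only flow above can be illustrated with a toy, in-memory sketch. Everything here is a stand-in: the two dictionaries replace the real artifact and metadata stores, `pickle` replaces the datatype's encoder, and the `file_id` and `datatype` keys are hypothetical names chosen for the example. Only `_content`, `leaf_type`, and `bytes` come from the description above.

```python
import hashlib
import pickle

artifact_store = {}  # file_id -> bytes (stand-in for the artifact store)
metadata_store = {}  # identifier -> record (stand-in for the metadata store)


def save(identifier, x):
    # Step 1: encode the artifact into a {'_content': ...} record.
    data = pickle.dumps(x)  # stand-in for the datatype's encoder
    record = {
        "_content": {"leaf_type": "encodable", "datatype": "pickle", "bytes": data}
    }
    # Step 2: move the bytes into the artifact store and drop them from the record.
    file_id = hashlib.sha1(data).hexdigest()
    artifact_store[file_id] = record["_content"].pop("bytes")
    record["_content"]["file_id"] = file_id  # hypothetical reference key
    # Step 3: create or update the record in the metadata store.
    metadata_store[identifier] = record


def load(identifier):
    # Load the record, fetch the bytes it points to, then decode them.
    content = metadata_store[identifier]["_content"]
    data = artifact_store[content["file_id"]]
    return pickle.loads(data)
```

After `save("my_model", obj)`, the metadata record holds only a reference, and `load("my_model")` reconstructs the object from the stored bytes.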

New Artifact workflow with saving a directory

New DataType instance: file

x is a path to a file or directory

encode(x): check that x exists and return x

decode(x): check that x exists and return x
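
A minimal sketch of such a pass-through datatype, assuming nothing about the project's real DataType class (the class name `FileDataType` and its methods are hypothetical):

```python
import os


class FileDataType:
    """Hypothetical pass-through datatype: the 'encoded' value is the path itself."""

    identifier = "file"

    @staticmethod
    def _check(path: str) -> str:
        # Both directions only verify that the file or directory is present.
        if not os.path.exists(path):
            raise FileNotFoundError(path)
        return path

    def encode(self, path: str) -> str:
        return self._check(path)

    def decode(self, path: str) -> str:
        return self._check(path)
```

Because neither direction transforms the data, the real work of moving files is left to the artifact store.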

Saving

Input: x, x is defined as an artifact in _artifacts

  1. object.dict().encode()

    1. convert artifact to Encodable → Encodable(x)
    2. Encodable(x).encode() → {'_content': xxx}
  2. Save to artifact store

    1. if r['_content']['leaf_type'] == encodable, save r['_content']

      • if the datatype is the file datatype:
        _save_path (new function of artifact_store; copies a local file from the local file system to the artifact_store)
      • else:
        _save_bytes
  3. create or update the record in metadata_store

Loading

  1. load info from metadata_store

  2. check for _content in info, then load and decode each artifact using _content

    • if the datatype is the file datatype:

    _load_path (new function of artifact_store; copies the file from the artifact_store to the local file system and returns the new path)

    • else:

    load_bytes

    use the datatype to decode the output

Other

rename the key _content.bytes to _content.x
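
To make the proposed dispatch concrete, here is a small sketch of a filesystem-backed store with both branches. It is an illustration under assumptions, not the project's implementation: `ToyArtifactStore`, its `save`/`load` helpers, and the use of random `file_id` values are invented for the example; only the names `_save_path`, `_load_path`, and `_save_bytes` come from the proposal.

```python
import os
import shutil
import tempfile
import uuid


def _copy(src: str, dst: str) -> None:
    """Copy a single file or a whole directory tree from src to dst."""
    if os.path.isdir(src):
        shutil.copytree(src, dst)
    else:
        shutil.copy2(src, dst)


class ToyArtifactStore:
    """Illustrative stand-in, NOT the project's artifact store class."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _save_bytes(self, data: bytes) -> str:
        file_id = uuid.uuid4().hex
        with open(os.path.join(self.root, file_id), "wb") as f:
            f.write(data)
        return file_id

    def _load_bytes(self, file_id: str) -> bytes:
        with open(os.path.join(self.root, file_id), "rb") as f:
            return f.read()

    def _save_path(self, path: str) -> str:
        # Copy the local file or directory into its own slot in the store.
        file_id = uuid.uuid4().hex
        name = os.path.basename(os.path.normpath(path))
        slot = os.path.join(self.root, file_id)
        os.makedirs(slot)
        _copy(path, os.path.join(slot, name))
        return file_id

    def _load_path(self, file_id: str) -> str:
        # Copy the stored content back to a fresh local location; return the new path.
        slot = os.path.join(self.root, file_id)
        name = os.listdir(slot)[0]
        local = os.path.join(tempfile.mkdtemp(), name)
        _copy(os.path.join(slot, name), local)
        return local

    def save(self, value, datatype: str) -> str:
        if datatype == "file":
            return self._save_path(value)
        return self._save_bytes(value)

    def load(self, file_id: str, datatype: str):
        if datatype == "file":
            return self._load_path(file_id)
        return self._load_bytes(file_id)
```

The key design point is that the file branch returns a new local path rather than bytes, so the caller (for example, a model loader that expects a checkpoint directory) never has to hold the whole artifact in memory.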

@blythed
Collaborator Author

blythed commented Feb 21, 2024

Can we go one level higher? Am I correct in thinking you want to build a local cache/copy of some content in the artifact store? My question is: is this necessary, especially if, for instance, we are using FileSystemArtifactStore?

What problem are we solving here?

@jieguangzhou
Collaborator

jieguangzhou commented Feb 21, 2024

> Can we go one level higher? Am I correct in thinking you want to build a local cache/copy of some content in the artifact store? My question is: is this necessary, especially if, for instance, we are using FileSystemArtifactStore?

If we use FileSystemArtifactStore, we can either create a symlink or copy the directory into the store.

But if we only create a symlink, the directory is not actually saved in the artifact_store.

For example, suppose I want to train a model on server1 and deploy the service on server2.

We would need to copy the whole FileSystemArtifactStore directory to server2, but that doesn't work with symlinks.

I think all artifacts need to be saved into the ArtifactStore.
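
The symlink problem can be reproduced locally with a short script. This is only a simulation under assumptions: the directory names are made up, "server2" is just another folder, and the original checkpoint is deleted to mimic it not existing on the second machine. It assumes a platform where `os.symlink` is permitted (for example Linux or macOS).

```python
import os
import shutil
import tempfile

workspace = tempfile.mkdtemp()

# A checkpoint directory produced by training on "server1".
checkpoint = os.path.join(workspace, "checkpoint")
os.makedirs(checkpoint)
with open(os.path.join(checkpoint, "adapter.bin"), "wb") as f:
    f.write(b"weights")

# The artifact store holds one symlinked entry and one real copy.
store = os.path.join(workspace, "server1_store")
os.makedirs(store)
os.symlink(checkpoint, os.path.join(store, "linked"))
shutil.copytree(checkpoint, os.path.join(store, "copied"))

# Transfer the store to "server2", preserving symlinks as links.
server2_store = os.path.join(workspace, "server2_store")
shutil.copytree(store, server2_store, symlinks=True)

# The original checkpoint does not exist on server2; simulate that by deleting it.
shutil.rmtree(checkpoint)

linked_ok = os.path.exists(os.path.join(server2_store, "linked", "adapter.bin"))
copied_ok = os.path.exists(os.path.join(server2_store, "copied", "adapter.bin"))
print("symlinked entry usable:", linked_ok)
print("copied entry usable:", copied_ok)
```

The symlinked entry ends up dangling after the transfer, while the copied entry still contains the weights, which is the argument for always copying artifacts into the store.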

> What problem are we solving here?

Previously, artifact_store could only save bytes data; with this change it supports both bytes and files/directories. Not all models and data should be saved in bytes format.

@blythed
Collaborator Author

blythed commented Feb 21, 2024

@jieguangzhou I agree with the general proposal.

What will the schema inside _content look like for file/directory types?

Also, how will we synchronize directories to the artifact store? With AWS S3, for instance, that will be easy. But MongoDB has no native support for directories, so you would need to create an additional field on the GridFS files.
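
One possible shape for that additional field, sketched without any database dependency: flatten the directory into one record per file, each tagged with a shared `file_id` and its path relative to the directory root. The field names `file_id` and `relative_path` and both function names are hypothetical. With PyMongo, each record could then be written via `GridFS.put`, which accepts extra keyword arguments as fields on the stored file document.

```python
import os
from typing import Dict, Iterable, Iterator


def iter_directory_records(directory: str, file_id: str) -> Iterator[Dict]:
    """Yield one flat record per file, remembering where it sat in the tree."""
    for current, _dirs, files in os.walk(directory):
        for name in sorted(files):
            full = os.path.join(current, name)
            relative = os.path.relpath(full, directory).replace(os.sep, "/")
            with open(full, "rb") as f:
                yield {"file_id": file_id, "relative_path": relative, "data": f.read()}


def restore_directory(records: Iterable[Dict], target: str) -> None:
    """Rebuild the directory tree from the flat records."""
    for record in records:
        destination = os.path.join(target, *record["relative_path"].split("/"))
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        with open(destination, "wb") as f:
            f.write(record["data"])
```

Querying the store for all files sharing one `file_id` would then be enough to reassemble the directory on another machine. Empty subdirectories are not preserved by this sketch.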

@jieguangzhou
Collaborator

> @jieguangzhou I agree with the general proposal.
>
> What will the schema inside _content look like for file/directory types?

The _content is the same as before; we just save the path in the bytes field.

> Also, how will we synchronize directories to the artifact store? With AWS S3, for instance, that will be easy. But MongoDB has no native support for directories, so you would need to create an additional field on the GridFS files.

I posted a quick implementation for this in #1805; please take a look. @blythed
