Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/custom model repository #849

Merged

Conversation

jgallardorama
Copy link
Contributor

Custom model repository:

  • By parameter in the ML Server constructor
  • By settings with the implementation and parameters settings

Both ways to extend the behaviors allow the custom model repository to handle the persistence model settings externally to mlserver.

Copy link
Contributor

@adriangonz adriangonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jgallardorama ,

Thanks a lot for taking the lead on this one! 🚀

This is an awesome job! I've added a couple suggestions and questions below, but the main idea looks great! Would be good to hear your thoughts on those comments. 👍

mlserver/repository/repository.py Outdated Show resolved Hide resolved
mlserver/cli/serve.py Outdated Show resolved Hide resolved
mlserver/repository/repository.py Show resolved Hide resolved
mlserver/server.py Outdated Show resolved Hide resolved
@@ -87,6 +87,10 @@ class Config:
model_repository_root: str = "."
"""Root of the model repository, where we will search for models."""

model_repository_implementation: Optional[PyObject]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make it non-Optional and set it by default to mlserver.repository.DefaultRepository (or however we decide to call it).

BTW to make it less verbose we could also call the field just model_repository and model_repository_args? This is a bit nitpicky so feel free to ignore it though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got the idea, but I don't know exactly how you want to code it. I will try it and then you correct me if I'm wrong.

No problem, I'll rename the settings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept model_repository_implementation and model_repository_implementation_args. I thought It could be good to have different identifiers for different things

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A PyObject can point to any Python object, so I think that it should be possible to set something like this in the Settings object:

from mlserver.repository import SchemalessModelRepository

model_repository_implementation: PyObject = SchemalessModelRepository

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I have a new problem, about default implementation and signing constructors, I explained below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi
I don't think I can do this assigment, because SchemalessModelRepository uses the module settings, so I cannot set model_repository_implementation with the Value SchemalessModelRepository, because I got a circular reference. I can move this default assignment to ModelRepositoryFactory implements creation/construction of model_repository object.
Any alternative?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I upload the option Optional[PyObject], I hope it's a showstopper for the PR, but tell me if you find a better alternative

tests/custom_md_repo/test_custom_md_repo.py Outdated Show resolved Hide resolved
tests/custom_md_repo/testdata/model-settings.json Outdated Show resolved Hide resolved
tests/custom_md_repo/testdata/settings-custom-md-repo.json Outdated Show resolved Hide resolved
mlserver/server.py Outdated Show resolved Hide resolved
mlserver/server.py Outdated Show resolved Hide resolved
@adriangonz
Copy link
Contributor

Fixes #821

Copy link
Contributor

@adriangonz adriangonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jgallardorama,

Thanks a lot for addressing those initial comments! Besides those, I've added a couple new minor ones. We're getting close though - almost there!

Comment on lines 86 to 88
result = settings.model_repository_implementation(
**settings.model_repository_implementation_args
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To respect the current interface (and the current model_repository_root setting), would it make sense to always pass the root folder as a first argument, then followed by model_repository_implementation_args?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I was thinking about how to avoid a compatibility issue with the signature of the default and other modelRepository classes. I mean, now SchemalessModelRepository expects model_repository_root from settings, and on the other hand, model_repository_implementation expects parameters from settings. So how do conciliate both interfaces? I think I shouldn't make a breaking change with the previous implementation. I can write the code but please could you give the expected behavior with the default implementation (previous implementation) and the new one?
Model_repository_root is not optional, so, I need to handle when this parameter is passed to ModelRepository. How do you see it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jgallardorama ,

Apologies for the delay getting back to you.

My impression is that model_repository_root should always be passed to the repository implementation. This keeps the previous behaviour consistent with the new one. So we could make this parameter mandatory on every subclass from ModelRepository.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @adriangonz ,

My apologies for my delay too, day-to-day!

I see both sides of the solution, if you prefer to keep the parameter, no big deal, I can keep it. IMO it can be confusing to have a parameter that the implementation doesn't always use. I will update code with your choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

mlserver/settings.py Show resolved Hide resolved
Copy link
Contributor

@adriangonz adriangonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jgallardorama ,

Just had a look at the changes, and this should be good to go once the tests and linter pass! Let me know if you need any help with those 👍

Thanks a lot for your effort on this one! This is one we had planned for a while, so we really really appreciate you taking the lead to contribute the changes. 🚀

@adriangonz
Copy link
Contributor

BTW on the linter errors, it seems to be mainly complaining about the following errors:

flake8 .
./mlserver/repository/repository.py:6:1: F401 'typing.Optional' imported but unused
./tests/repository/conftest.py:14:1: F401 '..utils.get_available_ports' imported but unused
./tests/repository/conftest.py:14:1: E402 module level import not at top of file
./tests/repository/dummymdrepo.py:1:1: F401 'os' imported but unused
./tests/repository/test_custom_md_repo.py:1:1: F401 'asyncio' imported but unused
./tests/repository/test_custom_md_repo.py:2:1: F401 'os' imported but unused
./tests/repository/test_custom_md_repo.py:3:1: F401 'time.sleep' imported but unused

You can run them locally by calling make lint, which may speed up your feedback loop (i.e. as opposed to waiting for the CI to finish each time).

@jgallardorama-itx
Copy link
Contributor

Hi @adriangonz

I've just fixed linting issues, it should pass linting checks, please, let me know if anything else needs to be done

@adriangonz
Copy link
Contributor

Hey @jgallardorama ,

It seems there's still one linter check failing:

mypy ./mlserver
mlserver/repository/factory.py:12: error: "object" not callable  [operator]

Once that's sorted, this one should be good to go 👍

@jgallardorama-itx
Copy link
Contributor

Hi @adriangonz

My bad, I only executed Flake for the previous comment. This time I have run make lint and everything looks good, but let me know if anything is missing.

@adriangonz adriangonz merged commit bce11cf into SeldonIO:master Dec 13, 2022
@adriangonz
Copy link
Contributor

All looking good now @jgallardorama-itx !

Massive thanks for all the work that's gone into this one 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants