Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add: Make function_name parametrized #166

Merged

Conversation

mrshu
Copy link
Contributor

@mrshu mrshu commented Feb 14, 2020

Hello everyone,

First of all, thanks a ton for putting this tool/library together -- especially in resource-stranded environments, it does have a potential to literally save lives!

One small problem I was fighting with while using it was the score function it uses in the generated modules. When they are used as drop-in replacements for trained models, using score is a bit strange, as the API generally provides function like predict or predict_proba. It would therefore be of great help to me if this name could be dynamically changed and I would not have to do so manually.

Please do let me know if something like this sounds like a sensible addition. I'd be happy to update the code so that it reflect your vision, so please feel free to just let me know whenever that may be the case.

Thanks!


  • Currently m2cgen generates a module in various languages that has a
    "score"/"Score" function/method. This is not always desirable, as many
    of the trained modules that are to be exported may provide their
    prediction via API functions with different names (such as predict).

  • This commit adds a way of specifying the name of the function both via
    the CLI and in the exporters (that is, in the export_to_ funcitons)
    by specifying the function_name option/parameter while keeping the
    default set to "score"/"Score" for backwards compatilibity.

Signed-off-by: mr.Shu mr@shu.io

* Currently `m2cgen` generates a module in various languages that has a
  "score"/"Score" function/method. This is not always desirable, as many
  of the trained modules that are to be exported may provide their
  prediction via API functions with different names (such as `predict`).

* This commit adds a way of specifying the name of the function both via
  the CLI and in the exporters (that is, in the `export_to_` funcitons)
  by specifying the `function_name` option/parameter while keeping the
  default set to "score"/"Score" for backwards compatilibity.

Signed-off-by: mr.Shu <mr@shu.io>
* Add `function_name` to mock CLI args to be used in the tests.

Signed-off-by: mr.Shu <mr@shu.io>
@coveralls
Copy link

coveralls commented Feb 14, 2020

Coverage Status

Coverage increased (+0.04%) to 95.878% when pulling 6c2a081 on mrshu:mrshu/parametrize-function_name into d28cb00 on BayesWitnesses:master.

@izeigerman
Copy link
Member

@mrshu Thanks a lot for submitting this PR together with such a detailed description!
I think this is a valuable updated and as far as I remember @StrikerRUS mentioned the importance of it some time ago too.
I'll take at it soon.

@mrshu
Copy link
Contributor Author

mrshu commented Feb 14, 2020

Thank you for your kind words @izeigerman!

Right off the bat I'd like to tell you that the way the --function_name CLI option is handed probably introduces a bug -- it currently always defaults to "score" which was not the case for C# and Powershell before. To ensure that it works as before we'd have to deal with some conditional defaults (as in, after --language is set and --function_name is set, ensure Score is the default in case --language was C# or Powershell) and I was not exactly sure where would something like that make the most sense.

I am therefore looking forward to your review and any further discussion!

@mrshu
Copy link
Contributor Author

mrshu commented Feb 14, 2020

Also, I am not completely sold on calling it function_name -- sklearn-porter for instance uses --method_name which may make a bit more sense in this case as well. Please do let me know what you think and I'll be happy to switch it over.

Thanks!

@StrikerRUS
Copy link
Member

@izeigerman

I think this is a valuable updated and as far as I remember @StrikerRUS mentioned the importance of it some time ago too.

To be honest, I don't remember I was saying this 😃
I remember I spoke about argument name: #109 (comment).

@mrshu

To ensure that it works as before we'd have to deal with some conditional defaults

I'm not sure that it is worthy of further consideration. As we are still in alpha phase, we are free to change a lot without any deprecation warnings and worrying about the backward compatibility. Especially for just a naming conventions.

"Development Status :: 3 - Alpha",

Please add a test for your PR. You may use the following tests as examples:

m2cgen/tests/test_cli.py

Lines 89 to 126 in d28cb00

def test_class_name():
infile = _get_pickled_trained_model()
mock_args = _get_mock_args(
infile=infile, language="java", class_name="TestClassName")
generated_code = cli.generate_code(mock_args).strip()
assert generated_code.startswith("public class TestClassName")
def test_package_name():
infile = _get_pickled_trained_model()
mock_args = _get_mock_args(
infile=infile, language="java", package_name="foo.bar.baz")
generated_code = cli.generate_code(mock_args).strip()
assert generated_code.startswith("package foo.bar.baz;")
def test_module_name():
infile = _get_pickled_trained_model()
mock_args = _get_mock_args(
infile=infile, language="visual_basic", module_name="TestModule")
generated_code = cli.generate_code(mock_args).strip()
assert generated_code.startswith("Module TestModule")
def test_namespace():
infile = _get_pickled_trained_model()
mock_args = _get_mock_args(
infile=infile, language="c_sharp", namespace="Tests.ML")
generated_code = cli.generate_code(mock_args).strip()
assert generated_code.find("namespace Tests.ML {") != -1

* Add a test for the `--function_name` CLI option.

Signed-off-by: mr.Shu <mr@shu.io>
@mrshu
Copy link
Contributor Author

mrshu commented Feb 15, 2020

Thanks a lot for your comments @StrikerRUS!

I did add a test for the CLI part of the PR -- if you think it would be worth it, I am happy to add some for the "API" part as well.

Copy link
Member

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrshu Thanks a lot for your PR!
Please check my comments below.

m2cgen/cli.py Outdated Show resolved Hide resolved
m2cgen/exporters.py Outdated Show resolved Hide resolved
return _export(model, interpreter)


def export_to_c_sharp(model, namespace="ML", class_name="Model", indent=4):
def export_to_c_sharp(model, namespace="ML", class_name="Model", indent=4,
function_name="Score"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

< Here and in all other places >
I believe it can be score for the consistency. Refer to #166 (comment)

I'm not sure that it is worthy of further consideration. As we are still in alpha phase, we are free to change a lot without any deprecation warnings and worrying about the backward compatibility. Especially for just a naming conventions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @StrikerRUS shouldn't we at least comply with basic code style principles of a target language? Like in this case method names are capitalized in C#. I believe this basic logic is worth keeping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really do not have a strong opinion here, but since C# seems to be using PascalCasing by definition, I would vote for being consistent with that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't have a strong opinion on keeping code style for default values too. I just proposed the way which is the easiest to maintain 🙂 .
And then we'll need to apply conditional default values for the CLI part: #166 (comment).

Copy link
Contributor Author

@mrshu mrshu Feb 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StrikerRUS All right, I went ahead and implemented the "default values for the CLI" part such that they directly use the default value of the function_name keyword argument.

It relies on inspecting the export function (the inspect module introduced in Python 3.5 so that should not be an issue) and might therefore feel a bit hacky, but it allows us to specify the default only once (namely in the keyword argument) and be done with it -- it should automatically down-propagate to the CLI options as well.

I also added a test which should give us a greater confidence that this works.

If you do not like this approach, I am happy to revert that particular commit and just leave it as it was.

Thanks again!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrshu Thanks a lot for implementing conditional default values! I like your approach very much! Just please add a comment about that default value is conditional and will be set later based on the export function signature.

parser.add_argument(
    "--function_name", "-fn", dest="function_name", type=str,
    default=None,  # some comment here 
    help="Name of the function in the generated code.")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @StrikerRUS , I just did that.

Please feel free to point out a simpler version of the comment -- I may have overcomplicated it quite a bit.

m2cgen/interpreters/c/interpreter.py Show resolved Hide resolved
m2cgen/interpreters/php/interpreter.py Show resolved Hide resolved
@izeigerman
Copy link
Member

izeigerman commented Feb 17, 2020

@StrikerRUS, thanks for reviewing this PR!

To be honest, I don't remember I was saying this 😃

Oops, sorry. I confused 2 things 🤦‍♂

* Make sure function_name is actually used in the interpreters.

Signed-off-by: mr.Shu <mr@shu.io>
@izeigerman izeigerman mentioned this pull request Feb 17, 2020
* Make sure that the `function_name` docstring ends with a dot rather
  than the default value.

* Add dot to various CLI argument's help messages.

Signed-off-by: mr.Shu <mr@shu.io>
@mrshu
Copy link
Contributor Author

mrshu commented Feb 17, 2020

@StrikerRUS Thanks for your feedback!

I believe I've addressed most of the things you've pointed out except for the C# one where I'd like to wait for your opinion.

If you still see something worth updating here, feel free to let me know -- I'll gladly update it.

Thanks!

@StrikerRUS StrikerRUS self-requested a review February 19, 2020 04:17
* Make sure the default function_name (when not provided via arguments)
  gets backpropagated from the actual function definitions (in other
  words, make sure that the keyword arguments are respected when the
  function_name is not specified via CLI).

Signed-off-by: mr.Shu <mr@shu.io>
* Add a comment on function_name being conditionally set later.

Signed-off-by: mr.Shu <mr@shu.io>
Copy link
Member

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrshu Thank you very much for your valuable contribution! LGTM!

Copy link
Member

@izeigerman izeigerman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff, thank you 👍

@izeigerman izeigerman merged commit a4ef90f into BayesWitnesses:master Feb 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants