New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added Visual Basic code generator #109
Conversation
|
||
def __init__(self, indent=4, *args, **kwargs): | ||
cg = VbaCodeGenerator(indent=indent) | ||
kwargs["feature_array_name"] = "input_vector" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to strongly recommend changing default value to something else, because input
is a reserved word in many languages. For instance, here, for VBA this arg name results in invalid syntax.
I'm stuck with two things. 1Is there any functionality for breaking long code lines in
2Where this function is used? Is it used only once for returning vectors from the m2cgen/m2cgen/interpreters/interpreter.py Lines 150 to 153 in e33fe2d
The thing is that there is no actually initialization syntax for typed vectors in VBA. And raising NotImplementedError results in NotImplemented error (quite obvious, yeah, but I hoped 😄 ). The closest construction will be
https://github.com/BayesWitnesses/m2cgen/pull/109/files#diff-2df27f221106e3ee330fc9622e90b525R65-R75 However, this causes invalid syntax for return statement in classification.
Looking forward for any help with the issues above. And many thanks in advance! |
For the first issue you can try to play with For the second issue, what exactly causes syntax error? Would it be solved by the following approach (I am not familiar with VBA syntax, so think about it as a pseudo code):
If that's the case, then here you can find an example of using a temporary variable. This example might not be sufficient as it only creates a single variable whereas you might need to create multiple. Again, this is the closest thing we have right now. |
The problem is in
If we have no init mechanism for lists, then it should be:
But now I get the following construction:
|
I tried to play around with Also, I tried to combine existing depth threshold and inserting code line breakage symbol around every binary operator. Unfortunately, it leads to I'm very frustrated. I see only one solution for now: to track code line length somehow and insert the breakage symbol around the nearest binary operator when hit a threshold. |
I think that without any help and answers to my previous questions this PR in its' initial form is stuck for a very long time. So, I decided to change direction and make Visual Basic generator and provide a guide about how to manually convert it to VBA. Also, I'm not sure that it's possible in theory to auto-generate VBA code with all its' limitations. In addition I had no idea how to write unit tests for VBA code. Now I guess this PR is ready for review. |
@StrikerRUS I'm sorry about the lack of traction on this topic. It just seems like the impact of having VBA in our catalog doesn't seem to match the complexity associated with adding it. Thank you for adding the VB support though! Please give us some time to review it. |
Happy to help! 🎉 Sorry for the mess in the latest commits: I'm not familiar enough with CLI interface yet. 🙁 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, a few questions in comments. Thank you!
with some small manual changes, see a note below) | ||
code representation of the given model. | ||
|
||
.. note:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether this note should belong to the FAQ section of the readme
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe it should be in both places: here and there?.. I think some users do not tend to read any sections of READMEs except API reference, they finish reading on docstrings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just seems unlikely that anyone will look up a guide on how to generate code for VBA in API docs to VB generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe! But I thought this note here can be useful in following scenarios:
- help to discover that it possible to generate VBA for those who use VB;
- some users might come here after failed search for
export_to_vba
as VBA is a dialect of VB; - in case someone in the future modify/update this code, the note helps to indicate that code should be compatible with both VB and VBA.
def __init__(self, module_name="Model", indent=4, *args, **kwargs): | ||
self.module_name = module_name | ||
cg = VisualBasicCodeGenerator(indent=indent) | ||
kwargs["feature_array_name"] = "input_vector" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering is there any reason why we can't use the existing default name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see this #109 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, missed that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem! Some languages can successfully distinguish when input
is a built-in function and when it isn't, e.g. Python, but VBA cannot do that 🙁 .
But I think that the best practice is to not name variables with reserved keywords (print
, input
, file
, etc.) in all languages.
@@ -0,0 +1,26 @@ | |||
Function addVectors(ByRef v1() As Double, ByRef v2() As Double) As Double() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no knowledge about VB whatsoever so forgive me my ignorance: why do static files have extension bas
but generated ones - vb
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bas
is a file extension of pure BASIC language. I used it here to show that these files are both VBA and VB compatible. vb
is a file extension for modern VB language, for which examples are generated.
tests/e2e/executors/base.py
Outdated
def prepare_global(cls): | ||
if cls._global_resource_tmp_dir is None: | ||
with utils.tmp_dir() as tmp_dirpath: | ||
cls._global_resource_tmp_dir = tmp_dirpath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The directory will be cleaned up after you leave the scope of with
which is not exactly what you want, do you? You probably want to clean it up once executer has done its job. Otherwise I don't see any logic associated with purging this directory anywhere else.
I also don't see much value having the prepare_global
logic in the base class. Why can't this logic be a part of prepare
of the VisualBasicExecutor
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want to clean it up once executer has done its job.
Why can't this logic be a part of prepare of the VisualBasicExecutor?
I need to create this directory only once for the whole test session. It allows to decrease testing time for ~5 minutes. prepare
is called for every executor which is not the thing I need. Please share more appropriate solution to achieve this goal.
Let me show what I need by example:
diff --git a/.travis.yml b/.travis.yml
index e5e2203..51e50e3 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -30,4 +30,4 @@ script:
# Then we install it as a library, remove source code and run e2e tests.
- python setup.py install
- rm -rfd m2cgen/
- - pytest -v tests/e2e/
+ - pytest -v --capture=no tests/e2e/
diff --git a/tests/e2e/executors/visual_basic.py b/tests/e2e/executors/visual_basic.py
index dbcaacb..f055488 100644
--- a/tests/e2e/executors/visual_basic.py
+++ b/tests/e2e/executors/visual_basic.py
@@ -51,6 +51,7 @@ class VisualBasicExecutor(base.BaseExecutor):
self.model_ast = assembler_cls(model).assemble()
def predict(self, X):
+ print("!!!{}!!!".format(self.target_exec_dir))
exec_args = [os.path.join(self.target_exec_dir, self.project_name)]
exec_args.extend(map(str, X))
return utils.predict_from_commandline(exec_args)
Please notice that the directory is the same during the whole session. That is what I need to speed up tests.
Link to huge part of a huge log, because GitHub doesn't allow to paste it here in comment: https://pastebin.com/mZ65Unti.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you perhaps use a module-scoped pytest fixture for this?
@pytest.fixture(scope="module", autouse=True)
def my_fixture():
print ('Creating a VB executor dir')
# Create dir
yield vb_dir
# Remove dir
print ('TEAR DOWN')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was my first attempt to achieve the needed behavior. Unfortunately, I'm not familiar with pytest
enough and didn't get it how to use a created dir inside executor code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should put the fixture into conftest.py
or into the test_e2e.py
itself. Then pass it as an argument into the test_e2e test like:
def test_e2e(estimator, executor_cls, model_trainer, is_fast, my_fixture):
my_fixture
here would be a path to a directory that you yielded in the fixture's body. From there you might want to pass the value into the executor's constructor. This way you'd have to make other constructors accept **kwargs
.
Were there any particular difficulties that you encountered? We can discuss them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the guide! I did it. However, actually I need to set up VB project in that temporary dir. Is it acceptable to do executor-dependent stuff inside global fixture? If you are OK with it, I'll replace class method with such fixture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@StrikerRUS In that fixture I only suggest to setup and cleanup the temporary directory that can be passed into the executor itself where VB-specific preparations can be made. I'd rather avoid having VB specific stuff in that fixture though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, got it! Thanks again for the explanation! Will make changes in a few hours.
Thank you 👍 |
The motivation behind this PR is allowing users with poor programming skills access to strong ML models inside Office applications (mainly in Excel).
Also, if I'm not mistaken, VBA projects can be used in SOLIDWORKS.
After merging this PR users will be able to use ML models inside Excel in the following way.
Usage Example
As usual, generate a model via supported ML algorithm:
After that output VBA code representation of the model via the
m2cgen
Python package:Create empty Visual Basic file
example_module.bas
and paste the copied output there.Now open Excel, enable Developer tab and click
Developer -> Visual Basic (Alt + F11)
. In VBA editor clickFile -> Import File
and choose previously createdexample_module.bas
file.After doing that, one more required action is writing a proxy function that will convert Excel
Range
object toArray
and call the model. For instance, such function for regression, for row-based features placed inside Excel can be:Now this proxy function can be used on Excel sheet as any built-in Excel functions:
Let's compare Excel predictions with ones from the native Python model:
Seems that everything is fine!