Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

added Haskell code generator #193

Merged
merged 12 commits into from Apr 24, 2020
Merged

added Haskell code generator #193

merged 12 commits into from Apr 24, 2020

Conversation

StrikerRUS
Copy link
Member

Fixed #127.

Finally got a chance to finish this PR! 馃檪

Refactoring of existing codebase (actually just splitting classes with some renaming) was made with the assumption that some programming languages (e.g. pure functional ones) do not have variables.

@coveralls
Copy link

coveralls commented Apr 12, 2020

Coverage Status

Coverage increased (+0.1%) to 95.887% when pulling 8260ebc on haskell into dcd62f4 on master.

@izeigerman
Copy link
Member

Wow, @StrikerRUS, this is awesome! I had never even imagined m2cgen supporting Haskell. I'll do my best to take a look at this PR soon 馃憤 Fantastic job 馃挭

@StrikerRUS
Copy link
Member Author

Thank you! No rush at all!
Thanks to Haskell syntactic sugar! Without it that would be quite painful.

Copy link
Member

@izeigerman izeigerman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic job 馃憤 So happy to see one of my favorite languages making its way into m2cgen. Thank you! Few non-major comments, LGTM otherwise.

self._cg.add_block_termination()

return var_name
def interpret_if_expr(self, expr, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can safely drop the definition of the interpret_if_expr from this class to avoid redundancy. Raising NotImplementedError is a default behavior anyway in case when the handler was not found.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can! But I thought it will help future contributors indicate what they should implement by reading the code only of parent class. But I'll remove two these lines without any regrets if you feel that they should not be here 馃檪

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong opinion here TBH. It just seemed a bit redundant. Leaving this up to you to decide.

from collections import namedtuple


CacheResult = namedtuple('CacheResult', ['var_name', 'expr_result'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Cache[d]Result sounds just a bit better (as something that was applied to a given "result").

return self._cg.infix_expression(
left=base_result, right=exp_result, op="**")

def _cache_reused_expr(self, expr, expr_result):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the caching mechanism in Haskell Interpreter is very different from the same mechanism in other languages.

In other languages we cache expression to avoid reevaluation of same expressions more than once. As far as I understand this is not the case here. The only benefit of caching that I'm seeing here is a cleaner generated code where reused expressions are encapsulated into functions. Please correct me if I'm wrong here. If that's the case I'd suggest to leave a brief clarification comment here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand perhaps GHC is smart enough to not revaluate a function without arguments. Can't quite remember now (though I doubt it).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if I'm wrong here.

Unfortunately, I can't say anything on this right now. My knowledge of Haskell is limited by two-week course for complete noobs passed in 2017 馃槂

I believe, the current implementation is in consistency with the following example:

areaTriangleHeron a b c = result           -- use Heron's formula
    where
    result = sqrt (s * (s - a) * (s - b) * (s - c))
    s      = (a + b + c) / 2

https://en.wikibooks.org/wiki/Haskell/Variables_and_functions#Local_definitions

A brief googling gave the following statement:

GHC does not do any automatic memoization of functions, ...
https://stackoverflow.com/a/12390953
https://wiki.haskell.org/Memoization

and the following blog post: https://kseo.github.io/posts/2017-01-14-memoization-in-hasekll.html.

It seems that it's not so trivial to implement the "true cache" mechanism in Haskell. I'm afraid that it is beyond of my current Haskell knowledge and will take much time to learn about it and implement here. But if you have any hints or points for possible implementation, I'll be very grateful and try to improve cache mechanism.

Alternatively, we can live with the current implementation as it's working though not so smart and efficient and create new feature request for improving cache mechanism in Haskell and get back to it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand perhaps GHC is smart enough to not revaluate a function without arguments.

Hmm, I'm afraid to be mistaken, but seems that you are right!

There is no such thing as a function without parameters (anonymous or otherwise). That's just a value (and in Haskell, actions like main :: IO () are also just values).
https://stackoverflow.com/a/33211542

m1 is computed only once because it is a Constant Applicative Form, while m2 is not a CAF, and so is computed for each evaluation.
https://stackoverflow.com/a/3951131

CAF is an ...

Any super combinator which is not a lambda abstraction. This includes truly constant expressions such as 12, ((+) 1 2), [1,2,3] as well as partially applied functions such as ((+) 4).
https://wiki.haskell.org/Constant_applicative_form

Seems that we got "true cache" mechanism with our "zero-argument functions" for free.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you so much for digging and confirming 馃憤 Nothing has to be done in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one more way to ensure that Haskell evaluates functions with zero arguments only once: https://stackoverflow.com/a/7721565.

I added trace "called" to func0 to all our snippets from tests. Then compiled them with -O0 and happily saw called only once in all cases.

For example,

module Main where
import Debug.Trace

score :: [Double] -> Double
score input =
    (func0) / (func0)
    where
        func0 =
            trace "called"
            exp (1.0)

main = print (score ([1,2,3,4,5]))
1.0
called

cf.

module Main where
import Debug.Trace

score :: [Double] -> Double
score input =
    (func0 1) / (func0 1)
    where
        func0 x =
            trace "called"
            exp (1.0)

main = print (score ([1,2,3,4,5]))
1.0
called
called

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you for going such lengths 馃憤 Now I'll never have this same question again.

def assert_code_equal(actual, expected):
assert actual.strip() == expected.strip()
def assert_code_equal(actual, *expected):
assert any(actual.strip() == expect.strip() for expect in expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to keep the existing function as is and introduce a new one:

def assert_code_equal_any_of(...)

This way the semantics of the check will be clear vs now the logic is obscured unless you read the function's body.

@StrikerRUS
Copy link
Member Author

I think I've addressed all review comments in the latest commits. Please take another look.

@StrikerRUS
Copy link
Member Author

StrikerRUS commented Apr 22, 2020

Hmm, one PHP(!) test failed. I thought that we have pinned all random seeds and all results are the same across CI builds...

=================================== FAILURES ===================================
__ test_e2e[xgboost_XGBClassifier - php - train_model_classification_binary2] __
estimator = XGBClassifier(base_score=0.6, booster='gblinear', colsample_bylevel=None,
              colsample_bynode=None, colsamp...ambda=0, scale_pos_weight=1, subsample=None,
              tree_method=None, validate_parameters=False, verbosity=None)
executor_cls = <class 'tests.e2e.executors.php.PhpExecutor'>

...

expected=[0.04640925 0.95359075], actual=[0.046409638845692, 0.95359036115431]
expected=[0.06362659 0.9363734 ], actual=[0.063627042374693, 0.93637295762531]
expected=[0.12829041 0.8717096 ], actual=[0.12829120274766, 0.87170879725234]
expected=[0.07283068 0.9271693 ], actual=[0.072831574736397, 0.9271684252636]
expected=[0.8102547  0.18974532], actual=[0.81025763223714, 0.18974236776286]

Rerunning that failed CI job for now. But let's give more attention to this test in the further runs.

Copy link
Member

@izeigerman izeigerman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic job 馃殌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

add support for declarative languages
3 participants