Make the tokeniser more useful for other people #12

GarkGarcia · 2021-02-02T13:48:04Z

While working in #11 it became clear to me that the tokeniser is still very much tied up to internals of Mathics core. For instance, the whole messaging mechanism is completely useless to anyone other us (the developers of Mathics) and it could likely be entirely replaced by simply throwing errors. Also, there are multiple improvements that could be made to make the public interface cleaner and more intuitive.

I propose the following changes:

Entirely remove the messaging mechanism from Tokeniser and LineFeed (this will require some refactoring in core)
Implement __next__ for Tokeniser by simply calling the next method
Add functionality to control whether we want comments to be skipped or not (this is useful for syntax-highlighting-related usecases)
Remove the tag parameter of Token(tag, text, pos) and mark the type of the token by using subclasses of Token (i.e. Token("Number", "3", 4) becomes NumberToken("3", 4))
Rethink the usage of the incomplete method: I'd like to remove it (since it's more of an implementation detail than anything else), but it's used in core so we'd had to deal with that too. Even if we don't remove it, we should rename it to something descriptive

Ideally, I'd like to take care of this before the release, since this are breaking changes and therefore would require a major version bump if we were to merge them after the first release. However, I understand that the refactoring required will take some time and therefore I'm OK with doing this after the first release.

I also take full responsibility over this. I can do most of this work on my own if the rest of the contributors aren't interested. @rocky @mmatera Thoughts?

The text was updated successfully, but these errors were encountered:

GarkGarcia · 2021-02-02T15:51:00Z

Looking at the code, I can confirm that the messaging mechanism of Tokeniser can replaced by including information about the position of the error in TranslateError.

The messaging mechanism of LineFeeder is much more fundamental though: it is used to store warnings emitted during parsing and scanning. In any case, this could be dealt-with by storing all warnings in the parser itself. In conclusion, it's safe to entirely remove the messaging functionality from this package.

GarkGarcia · 2021-02-02T16:32:50Z

Let me elaborate on my conclusions and give some context. The messaging functionality of the tokeniser is used to store warnings about errors encountered while parsing (yes, errors from the parser are stored here). Both Tokeniser and LineFeeder have methods to append messages (in the format used internally by Mathics) to a message queue (the latter simply delagates the call to it's feeder field, so the message queue is actually a property of the feeder).

However, the vast majority of the calls to the message-appending methods are immediately followed by a raise clause, which suggests we could simply include the information of the message in the following error. Looking around, messages with the similar structure are followed by raise clauses whose argument (i.e. the error being raised) have the same type, which again, suggests that the including the information of the messages in the error is relatively strait forward.

There's actually a single place (inside of this library) where a call isn't immediately followed by a raise clause, which is in line 113 of prescanner.py:

mathics-scanner/mathics_scanner/prescanner.py

Line 113 in 295fe97

self.feeder.message("Syntax", "sntufn", longname)

I don't know what's up with that, but since this is the only place where this happens, including the information of the messages in the errors should be safe (we don't really lose any information besides the one concerning this single line of code).

It's also worth noting that the messages queues aren't used internally by the feeder at all. The reason why the message qeueu is a property of the feeder is likely that the feeder is shared between the tokeniser and the prescanner, both of which delegate the message-appending calls to the feeder (so it's convenient to place this functionality in there, even though it makes no conceptual sense).

rocky · 2021-02-02T19:16:40Z

Please don't make any changes to master like this until we release what we have.

Thanks.

mmatera · 2021-02-02T19:27:32Z

@rocky, are you planning to release this package this week? I think we need at least to ensure that each "syntax error" message raises an exception, in a way that the front ends and the interpreter be able to catch them in real-time.
I think that at least the interface of this library should be more or less table before releasing in order to avoid future problems with the other parts that rely on this one.

rocky · 2021-02-02T19:46:35Z

Probably over the weekend if I have time.

I think we need at least to ensure that each "syntax error" message raises an exception, in a way that the front ends and the interpreter be able to catch them in real-time.

That can happen in 1.0.1 or 1.1.0 or 2.0, not 1.0.0. The focus here is just what has already been done. You wanted to fix some bugs based on consistency checking and that's been done. I will add some more checks and if there are further bugs in consistency those will be fixed.

It is going to be enough work to hook in the existing changes work in mathicsscript's inputrc and use in Mathics.

rocky · 2021-02-02T19:56:24Z

I should say that there is plenty of things that can be done that don't cause disruption. For example you could work on a program to generate the readthe docs tables for example.

Or go over the readthedocs tutorial.

GarkGarcia added the meta label Feb 2, 2021

GarkGarcia self-assigned this Feb 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make the tokeniser more useful for other people #12

Make the tokeniser more useful for other people #12

GarkGarcia commented Feb 2, 2021 •

edited

Loading

GarkGarcia commented Feb 2, 2021

GarkGarcia commented Feb 2, 2021 •

edited

Loading

rocky commented Feb 2, 2021

mmatera commented Feb 2, 2021 •

edited

Loading

rocky commented Feb 2, 2021 •

edited

Loading

rocky commented Feb 2, 2021 •

edited

Loading

Make the tokeniser more useful for other people #12

Make the tokeniser more useful for other people #12

Comments

GarkGarcia commented Feb 2, 2021 • edited Loading

GarkGarcia commented Feb 2, 2021

GarkGarcia commented Feb 2, 2021 • edited Loading

rocky commented Feb 2, 2021

mmatera commented Feb 2, 2021 • edited Loading

rocky commented Feb 2, 2021 • edited Loading

rocky commented Feb 2, 2021 • edited Loading

GarkGarcia commented Feb 2, 2021 •

edited

Loading

GarkGarcia commented Feb 2, 2021 •

edited

Loading

mmatera commented Feb 2, 2021 •

edited

Loading

rocky commented Feb 2, 2021 •

edited

Loading

rocky commented Feb 2, 2021 •

edited

Loading