Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the tokeniser more useful for other people #12

Open
GarkGarcia opened this issue Feb 2, 2021 · 6 comments
Open

Make the tokeniser more useful for other people #12

GarkGarcia opened this issue Feb 2, 2021 · 6 comments
Assignees
Labels

Comments

@GarkGarcia
Copy link
Contributor

GarkGarcia commented Feb 2, 2021

While working in #11 it became clear to me that the tokeniser is still very much tied up to internals of Mathics core. For instance, the whole messaging mechanism is completely useless to anyone other us (the developers of Mathics) and it could likely be entirely replaced by simply throwing errors. Also, there are multiple improvements that could be made to make the public interface cleaner and more intuitive.

I propose the following changes:

  • Entirely remove the messaging mechanism from Tokeniser and LineFeed (this will require some refactoring in core)
  • Implement __next__ for Tokeniser by simply calling the next method
  • Add functionality to control whether we want comments to be skipped or not (this is useful for syntax-highlighting-related usecases)
  • Remove the tag parameter of Token(tag, text, pos) and mark the type of the token by using subclasses of Token (i.e. Token("Number", "3", 4) becomes NumberToken("3", 4))
  • Rethink the usage of the incomplete method: I'd like to remove it (since it's more of an implementation detail than anything else), but it's used in core so we'd had to deal with that too. Even if we don't remove it, we should rename it to something descriptive

Ideally, I'd like to take care of this before the release, since this are breaking changes and therefore would require a major version bump if we were to merge them after the first release. However, I understand that the refactoring required will take some time and therefore I'm OK with doing this after the first release.

I also take full responsibility over this. I can do most of this work on my own if the rest of the contributors aren't interested. @rocky @mmatera Thoughts?

@GarkGarcia GarkGarcia added the meta label Feb 2, 2021
@GarkGarcia GarkGarcia self-assigned this Feb 2, 2021
@GarkGarcia
Copy link
Contributor Author

Looking at the code, I can confirm that the messaging mechanism of Tokeniser can replaced by including information about the position of the error in TranslateError.

The messaging mechanism of LineFeeder is much more fundamental though: it is used to store warnings emitted during parsing and scanning. In any case, this could be dealt-with by storing all warnings in the parser itself. In conclusion, it's safe to entirely remove the messaging functionality from this package.

@GarkGarcia
Copy link
Contributor Author

GarkGarcia commented Feb 2, 2021

Let me elaborate on my conclusions and give some context. The messaging functionality of the tokeniser is used to store warnings about errors encountered while parsing (yes, errors from the parser are stored here). Both Tokeniser and LineFeeder have methods to append messages (in the format used internally by Mathics) to a message queue (the latter simply delagates the call to it's feeder field, so the message queue is actually a property of the feeder).

However, the vast majority of the calls to the message-appending methods are immediately followed by a raise clause, which suggests we could simply include the information of the message in the following error. Looking around, messages with the similar structure are followed by raise clauses whose argument (i.e. the error being raised) have the same type, which again, suggests that the including the information of the messages in the error is relatively strait forward.

There's actually a single place (inside of this library) where a call isn't immediately followed by a raise clause, which is in line 113 of prescanner.py:

self.feeder.message("Syntax", "sntufn", longname)

I don't know what's up with that, but since this is the only place where this happens, including the information of the messages in the errors should be safe (we don't really lose any information besides the one concerning this single line of code).

It's also worth noting that the messages queues aren't used internally by the feeder at all. The reason why the message qeueu is a property of the feeder is likely that the feeder is shared between the tokeniser and the prescanner, both of which delegate the message-appending calls to the feeder (so it's convenient to place this functionality in there, even though it makes no conceptual sense).

@rocky
Copy link
Member

rocky commented Feb 2, 2021

Please don't make any changes to master like this until we release what we have.

Thanks.

@mmatera
Copy link
Contributor

mmatera commented Feb 2, 2021

@rocky, are you planning to release this package this week? I think we need at least to ensure that each "syntax error" message raises an exception, in a way that the front ends and the interpreter be able to catch them in real-time.
I think that at least the interface of this library should be more or less table before releasing in order to avoid future problems with the other parts that rely on this one.

@rocky
Copy link
Member

rocky commented Feb 2, 2021

Probably over the weekend if I have time.

I think we need at least to ensure that each "syntax error" message raises an exception, in a way that the front ends and the interpreter be able to catch them in real-time.

That can happen in 1.0.1 or 1.1.0 or 2.0, not 1.0.0. The focus here is just what has already been done. You wanted to fix some bugs based on consistency checking and that's been done. I will add some more checks and if there are further bugs in consistency those will be fixed.

It is going to be enough work to hook in the existing changes work in mathicsscript's inputrc and use in Mathics.

@rocky
Copy link
Member

rocky commented Feb 2, 2021

I should say that there is plenty of things that can be done that don't cause disruption. For example you could work on a program to generate the readthe docs tables for example.

Or go over the readthedocs tutorial.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants