Rewrite the parser #52
Comments
When thinking about the new parser, I decided not to reinvent the wheel a second time. I'm no expert in parsers, and I would end up creating a second buggy one. So the decision was to reuse something already available. I got very excited when I found https://github.com/GillesArcas/PythonSed, an implementation of GNU sed in Python. I did not need a full sed, but I could use just its parser to fulfill sedsed's needs. I forked the project and worked for some weeks on adapting it, because I needed to preserve the original sed script code (comments, blank lines, |
My second attempt was looking into the original C code for GNU sed, to check if I could understand its parser and maybe get some ideas from it. While reading the code, I got the idea to convert it to Python. Why not? It is a decades-old, battle-tested parser, and the code seemed simple to understand and adapt. That was the start of a solid month of work (and fun!) every night before sleeping, to get an initial working version. You can see the loooong list of commits in my dev branch (the first one is from 24 Jun).
I kept working on it in the following months, fixing bugs and adding the extra features
My original idea was to insert the new parser code into the existing
Later, I felt that this parser and its test suite deserved a dedicated repository. Maybe other projects could make use of it? So https://github.com/aureliojargas/sedparse was born and I kept working on it in isolation, as a standalone project. After 5 months of work in total, since the beginning of this "I need a new parser" quest, I had a first official 0.1.0 release of |
(Note: this ticket and the comments are all written after the fact. I'm just writing it here to document my "new parser" quest...)
For the sedsed magic to work, it first needs to read and parse a sed script.
So, many years ago, I wrote a "home made" sed script parser for sedsed. With no previous experience in writing a good parser, my idea of a "simple" parser was to always split the sed script by newlines and `;` to detect the commands.
It worked for simple commands such as `5d; s/foo/bar/; 10q`, but any command with a literal `;` or newline was a challenge. For example, a `s/foo;/bar;/g` would be broken into three pieces, and then the parser rejoined those pieces until finding a valid command. Very hacky and buggy.
Most of the reported bugs in the issue tracker are related to the parser not handling corner cases, or failing to detect invalid code. I don't think patching it is sustainable. I need a real, robust parser.
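To make the splitting problem concrete, here is a minimal sketch of the "split by newlines and `;`" idea. It is not the actual sedsed code; `naive_split` is a hypothetical name used only for illustration.

```python
import re


def naive_split(script):
    """Split a sed script on every newline and semicolon.

    Hypothetical helper: it illustrates the naive approach only,
    with no knowledge of sed syntax (delimiters, escapes, blocks).
    """
    pieces = re.split(r"[;\n]", script)
    # Drop surrounding whitespace and empty pieces.
    return [piece.strip() for piece in pieces if piece.strip()]


# Simple commands split cleanly, one piece per command.
print(naive_split("5d; s/foo/bar/; 10q"))

# A literal ";" inside the s command breaks it into three pieces.
print(naive_split("s/foo;/bar;/g"))
```

The first call yields one piece per command, while the second yields `s/foo`, `/bar` and `/g`, none of which is a valid command on its own. Any splitter like this needs a rejoining step afterwards, which is where the corner cases pile up.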