New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated fix for #1652: autodetect line breaks type in editor #49

Open
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
6 participants
@pfalcon
Contributor

pfalcon commented Dec 6, 2014

This is rebased and updated branch https://github.com/MidnightCommander/mc/tree/1652_autodetect_lb , addressing bugreport http://www.midnight-commander.org/ticket/1652 .

It consists of rebasing to the latest codebase original patches (editor functionality by myself, editor options by Ilia Maslakov, line break detection unit tests by Slava Zanko), and additional changes to make linebreak type detection much more conservative and robust (plus some refactoring of unittests).

This patcheset adheres to following requirements:

  1. By default, linebreak type autodetection is off, and editor behavior is fully as before (including binary-safety, etc.)
  2. When autodetection is on, it is done in conservative manner: bigger sampling buffer size is used to increase confidence; a non-default linebreak type is chosen only if it there was no conflicts in detection; any suspicion that file is binary disables detection. If any of the above conditions are not met, file is loaded with default "as is" line break type (which is the original, binary-safe mode).
@ossilator

This comment has been minimized.

Show comment
Hide comment
@ossilator

ossilator May 30, 2015

Contributor

i would choose an entirely different approach: instead of converting upon loading/saving, deal with line endings "on the fly". this has the advantage that even the worst misdetection cannot corrupt the file, you can always make a best effort to display it, and you can switch the mode dynamically when you find it wrong.
do you think this make sense and is feasible?

Contributor

ossilator commented May 30, 2015

i would choose an entirely different approach: instead of converting upon loading/saving, deal with line endings "on the fly". this has the advantage that even the worst misdetection cannot corrupt the file, you can always make a best effort to display it, and you can switch the mode dynamically when you find it wrong.
do you think this make sense and is feasible?

@pfalcon

This comment has been minimized.

Show comment
Hide comment
@pfalcon

pfalcon May 30, 2015

Contributor

@ossilator:

i would choose an entirely different approach

Feel free to choose an entirely different approach and provide patches. For 6 years the original issue (http://www.midnight-commander.org/ticket/1652) exists, this is the only patch which surfaced, on which 3+ people worked over last 2 years. If you think that these people didn't consider other approaches, then sorry, it's a bit naive.

Contributor

pfalcon commented May 30, 2015

@ossilator:

i would choose an entirely different approach

Feel free to choose an entirely different approach and provide patches. For 6 years the original issue (http://www.midnight-commander.org/ticket/1652) exists, this is the only patch which surfaced, on which 3+ people worked over last 2 years. If you think that these people didn't consider other approaches, then sorry, it's a bit naive.

@ossilator

This comment has been minimized.

Show comment
Hide comment
@ossilator

ossilator May 30, 2015

Contributor

if you did consider different approaches, then document your conclusions for others to verify.

Contributor

ossilator commented May 30, 2015

if you did consider different approaches, then document your conclusions for others to verify.

@pfalcon

This comment has been minimized.

Show comment
Hide comment
@pfalcon

pfalcon May 30, 2015

Contributor

Bother to read the linked ticket.

Contributor

pfalcon commented May 30, 2015

Bother to read the linked ticket.

@ossilator

This comment has been minimized.

Show comment
Hide comment
@ossilator

ossilator May 30, 2015

Contributor

yes, sorry.
anyway, i think your conclusions are wrong.

on the technical side, your "fear" of adding the necessary abstraction seems entirely overblown. there is already an abstraction for multi-byte chars (for the utf-8 support). adding the necessary abstraction for multi-byte line endings isn't exactly rocket science. and quite frankly, for somebody who volunteers for maintainership, you have a funny reluctance to getting to know the code.

on the user side, the concerns about the potential detriments of the patch have led you to choosing an approach that is so conservative that many people won't even notice that the feature is there. this is entirely counterproductive. the right solution would be always on, and yet without risk of accidentally destroying a file (fwiw, at least for me, this isn't about editing actual binaries - it's about "funny" text files, text interspersed with binary, etc.).

Contributor

ossilator commented May 30, 2015

yes, sorry.
anyway, i think your conclusions are wrong.

on the technical side, your "fear" of adding the necessary abstraction seems entirely overblown. there is already an abstraction for multi-byte chars (for the utf-8 support). adding the necessary abstraction for multi-byte line endings isn't exactly rocket science. and quite frankly, for somebody who volunteers for maintainership, you have a funny reluctance to getting to know the code.

on the user side, the concerns about the potential detriments of the patch have led you to choosing an approach that is so conservative that many people won't even notice that the feature is there. this is entirely counterproductive. the right solution would be always on, and yet without risk of accidentally destroying a file (fwiw, at least for me, this isn't about editing actual binaries - it's about "funny" text files, text interspersed with binary, etc.).

@pfalcon

This comment has been minimized.

Show comment
Hide comment
@pfalcon

pfalcon May 30, 2015

Contributor

anyway, i think your conclusions are wrong.

Alternative patches welcome.

and quite frankly, for somebody who volunteers for maintainership, you have a funny reluctance to getting to know the code.

Better maintainer welcome.

on the user side, the concerns about the potential detriments of the patch has led you

Note: not my concerns, but concerns of the overly conservative maintainer. It did not "led" me, I just proposed that as a compromise, to get some progress on this issue.

Currently, we have funny situation that there're concerns of binary safety of text editor, while the text editor has problem working with real-world text files.

And finally: seeing you were a contributor, I'm not interested in fingerpointing between you and me. As you're not bad maintainer, I'm not interested in having quarrels with you. And as you're not a maintainer, I'm not interested spending time on arguments which then are empty.

Contributor

pfalcon commented May 30, 2015

anyway, i think your conclusions are wrong.

Alternative patches welcome.

and quite frankly, for somebody who volunteers for maintainership, you have a funny reluctance to getting to know the code.

Better maintainer welcome.

on the user side, the concerns about the potential detriments of the patch has led you

Note: not my concerns, but concerns of the overly conservative maintainer. It did not "led" me, I just proposed that as a compromise, to get some progress on this issue.

Currently, we have funny situation that there're concerns of binary safety of text editor, while the text editor has problem working with real-world text files.

And finally: seeing you were a contributor, I'm not interested in fingerpointing between you and me. As you're not bad maintainer, I'm not interested in having quarrels with you. And as you're not a maintainer, I'm not interested spending time on arguments which then are empty.

@ossilator

This comment has been minimized.

Show comment
Hide comment
@ossilator

ossilator May 30, 2015

Contributor

Currently, we have funny situation that there're concerns of binary safety of text editor, while the text editor has problem working with real-world text files.

i don't find this strange at all. seriously, if you want a real text editor, by all means, use one. i use mcedit only for quick edits of more or less random files - and i know i can trust it to behave consistently irrespective of file type and not mess up the file. i really wouldn't want to lose that, even if it goes at the cost of other features.

Contributor

ossilator commented May 30, 2015

Currently, we have funny situation that there're concerns of binary safety of text editor, while the text editor has problem working with real-world text files.

i don't find this strange at all. seriously, if you want a real text editor, by all means, use one. i use mcedit only for quick edits of more or less random files - and i know i can trust it to behave consistently irrespective of file type and not mess up the file. i really wouldn't want to lose that, even if it goes at the cost of other features.

@pfalcon

This comment has been minimized.

Show comment
Hide comment
@pfalcon

pfalcon May 30, 2015

Contributor

i don't find this strange at all. seriously, if you want a real text editor, by all means, use one. i use mcedit only for quick edits of more or less random files - and i know i can trust it to behave consistently irrespective of file type and not mess up the file.

And I use mc as a super-IDE for all my development work, thru the years, leaving behind vims, emacses, ideas, eclipses, and other "exciting-new" stuff popping up and down thru the time. So, as a casual mc user (feel free to skip arguing on this definition, you said what you said with your "quick edits") you will keep the behavior you're used to. And core users, like myself, will be able to work with real-world text files adequately. So, proposed compromise looks very viable in practice.

Contributor

pfalcon commented May 30, 2015

i don't find this strange at all. seriously, if you want a real text editor, by all means, use one. i use mcedit only for quick edits of more or less random files - and i know i can trust it to behave consistently irrespective of file type and not mess up the file.

And I use mc as a super-IDE for all my development work, thru the years, leaving behind vims, emacses, ideas, eclipses, and other "exciting-new" stuff popping up and down thru the time. So, as a casual mc user (feel free to skip arguing on this definition, you said what you said with your "quick edits") you will keep the behavior you're used to. And core users, like myself, will be able to work with real-world text files adequately. So, proposed compromise looks very viable in practice.

@ossilator

This comment has been minimized.

Show comment
Hide comment
@ossilator

ossilator May 30, 2015

Contributor

i've been using mcedit like that myself a decade ago, and the slavaz team clearly did so as well judging by the enhancements they made.
and while i can totally understand why somebody would deem vim, emacs, and whatever other pseudo-IDE no sufficient improvement over mcedit to bother, it's beyond me how one can seriously disregard a modern IDE which is targeted towards (and understands) the programming language one uses - the amount of useless monkey work this has cut for me is outstanding.
but anyway ...

how is the EOL feature relevant for your use as an IDE? you're supposed to use the SCM system's EOL conversions.

while you're right that i could disable the feature, at that point it just becomes dead code from my perspective. why would i be supposed to approve that?

there is an additional factor: i could (very occasionally) actually make good use of the feature, so i don't want it to be so cumbersome to use. but i don't want to enable it by default because of the risk (or the inconsistent behavior when it needs to fall back to an entirely different mode when it encounters a file it cannot deal with). so it's not just that i dislike the kinda hacky proposed solution - i'd also really like to see the feature done properly, which is why i'm trying to convince you to give it another shot.

Contributor

ossilator commented May 30, 2015

i've been using mcedit like that myself a decade ago, and the slavaz team clearly did so as well judging by the enhancements they made.
and while i can totally understand why somebody would deem vim, emacs, and whatever other pseudo-IDE no sufficient improvement over mcedit to bother, it's beyond me how one can seriously disregard a modern IDE which is targeted towards (and understands) the programming language one uses - the amount of useless monkey work this has cut for me is outstanding.
but anyway ...

how is the EOL feature relevant for your use as an IDE? you're supposed to use the SCM system's EOL conversions.

while you're right that i could disable the feature, at that point it just becomes dead code from my perspective. why would i be supposed to approve that?

there is an additional factor: i could (very occasionally) actually make good use of the feature, so i don't want it to be so cumbersome to use. but i don't want to enable it by default because of the risk (or the inconsistent behavior when it needs to fall back to an entirely different mode when it encounters a file it cannot deal with). so it's not just that i dislike the kinda hacky proposed solution - i'd also really like to see the feature done properly, which is why i'm trying to convince you to give it another shot.

@pfalcon

This comment has been minimized.

Show comment
Hide comment
@pfalcon

pfalcon May 30, 2015

Contributor

i've been using mcedit like that myself a decade ago, and the slavaz team clearly did so as well judging by the enhancements they made.

I figure, visible tabs, visible trailing spaces are pretty advanced features, and one of the reasons why I think elaborating mcedit further makes good sense.

it's beyond me how one can seriously disregard a modern IDE which is targeted towards (and understands) the programming language one uses

That's mostly offtopic, but the passage above gives the answer - there're too many such IDEs, and they're usually language-specific. I'm an open-source hacker, and need to quickly "scroll" thru many projects in short time, and I don't know what next language I make hack on next, and for how long, and don't want to spend extended time setting up an infrastructure for it. Instead, I want to have a universal IDE to work with any kind of source, and mc and mcedit serve that pretty well, modulo few genric last-mile imrovements like this (well, I my also imagine few more adhoc improvements, but I don't even want to think about them now too much, as I value mc's simplicity and generic nature).

how is the EOL feature relevant for your use as an IDE? you're supposed to use the SCM system's EOL conversions.

Yes, and I use that. But as I mentioned, I'm an open-source hacker deal with steady stream of new projects regularly (say, at least dozen each month, usually more). Not all of them set up right, a lot have line-endings mix up. In the end, I figured that I spend too much effort just to report such issues upstream (which are hardly appreciated), or fork and clean up somebody's stuff, etc. Going Internet-style, i.e. being conservative regarding what I produce, but liberal regarding what can be dealt with without much trouble is what revived my interest in this patch, so I tried to clean it up based on collected feedback, and get into mainline after all, instead of maintaining in my fork.

while you're right that i could disable the feature, at that point it just becomes dead code from my perspective. why would i be supposed to approve that?

mc open-source project, and I think it's important that it would cater itself for working well with other open-source projects (including quirks as described above).

so it's not just that i dislike the kinda hacky proposed solution - i'd also really like to see the feature done properly, which is why i'm trying to convince you to give it another shot.

mc itself is kinda hacky coded thing. mc is buggy, including hard bugs like segfaults, hangs, history of data corruption. We live in real world, and this patch provides real-world solution to a problem. As I mentioned, my study of the problem showed that alternative to this way is either doing myriad small patches throughout the editor codebase, which while small, not a few chars, but make control structure more complicated; or rewrite editor pretty much. Both of these choices would be irresponsible - both on my side to propose and on maintainers' to accept - that's what can easily lead to omission/overlook which may lead to segfaults/data corruption. Done like it is, the patch does solve the problem and is contained - you can look at it, and raise concerns, unlike some spaghetti patch, whose effect you cannot predict. Once raise, concerns can be addressed, even if it won't be a complete solution, but compromise. But that's what art of real-world software engineering is.

Contributor

pfalcon commented May 30, 2015

i've been using mcedit like that myself a decade ago, and the slavaz team clearly did so as well judging by the enhancements they made.

I figure, visible tabs, visible trailing spaces are pretty advanced features, and one of the reasons why I think elaborating mcedit further makes good sense.

it's beyond me how one can seriously disregard a modern IDE which is targeted towards (and understands) the programming language one uses

That's mostly offtopic, but the passage above gives the answer - there're too many such IDEs, and they're usually language-specific. I'm an open-source hacker, and need to quickly "scroll" thru many projects in short time, and I don't know what next language I make hack on next, and for how long, and don't want to spend extended time setting up an infrastructure for it. Instead, I want to have a universal IDE to work with any kind of source, and mc and mcedit serve that pretty well, modulo few genric last-mile imrovements like this (well, I my also imagine few more adhoc improvements, but I don't even want to think about them now too much, as I value mc's simplicity and generic nature).

how is the EOL feature relevant for your use as an IDE? you're supposed to use the SCM system's EOL conversions.

Yes, and I use that. But as I mentioned, I'm an open-source hacker deal with steady stream of new projects regularly (say, at least dozen each month, usually more). Not all of them set up right, a lot have line-endings mix up. In the end, I figured that I spend too much effort just to report such issues upstream (which are hardly appreciated), or fork and clean up somebody's stuff, etc. Going Internet-style, i.e. being conservative regarding what I produce, but liberal regarding what can be dealt with without much trouble is what revived my interest in this patch, so I tried to clean it up based on collected feedback, and get into mainline after all, instead of maintaining in my fork.

while you're right that i could disable the feature, at that point it just becomes dead code from my perspective. why would i be supposed to approve that?

mc open-source project, and I think it's important that it would cater itself for working well with other open-source projects (including quirks as described above).

so it's not just that i dislike the kinda hacky proposed solution - i'd also really like to see the feature done properly, which is why i'm trying to convince you to give it another shot.

mc itself is kinda hacky coded thing. mc is buggy, including hard bugs like segfaults, hangs, history of data corruption. We live in real world, and this patch provides real-world solution to a problem. As I mentioned, my study of the problem showed that alternative to this way is either doing myriad small patches throughout the editor codebase, which while small, not a few chars, but make control structure more complicated; or rewrite editor pretty much. Both of these choices would be irresponsible - both on my side to propose and on maintainers' to accept - that's what can easily lead to omission/overlook which may lead to segfaults/data corruption. Done like it is, the patch does solve the problem and is contained - you can look at it, and raise concerns, unlike some spaghetti patch, whose effect you cannot predict. Once raise, concerns can be addressed, even if it won't be a complete solution, but compromise. But that's what art of real-world software engineering is.

@ossilator

This comment has been minimized.

Show comment
Hide comment
@ossilator

ossilator May 31, 2015

Contributor

mc itself is kinda hacky coded thing.

how could one argue with that ...

ok, let's assume a maintainer who wants to merge this pops up.

the series obviously needs a rebase, and i think it should be squashed (slavaz' remaining parts are rather small, so i think a simple done-with: slavaz footer will do for attribution).

i really don't like the option being there. why not make the thing so conservative that it cannot possibly fail - check the whole file? i don't really understand why you are using mc_read in the first place - the file is read in its entirety anyway, so why not insert the detection and adjustment between the regular read and the insertion into the edit buffer?

where is the inverse conversion on file writeout?

Contributor

ossilator commented May 31, 2015

mc itself is kinda hacky coded thing.

how could one argue with that ...

ok, let's assume a maintainer who wants to merge this pops up.

the series obviously needs a rebase, and i think it should be squashed (slavaz' remaining parts are rather small, so i think a simple done-with: slavaz footer will do for attribution).

i really don't like the option being there. why not make the thing so conservative that it cannot possibly fail - check the whole file? i don't really understand why you are using mc_read in the first place - the file is read in its entirety anyway, so why not insert the detection and adjustment between the regular read and the insertion into the edit buffer?

where is the inverse conversion on file writeout?

Ticket #1652: autodetect line-endings
    * On opening file, detect line-endings used by sampling some initial content.
    * If it happen to be CR or CRLF, skip fast load path, and in edit_insert_file()
      convert such line endings to '\n'.
    * Save detected line ending type for editor.

Signed-off-by: Paul Sokolovsky <pfalcon@users.sourceforge.net>
Signed-off-by: Slava Zanko <slavazanko@gmail.com>
@pfalcon

This comment has been minimized.

Show comment
Hide comment
@pfalcon

pfalcon May 31, 2015

Contributor

Patchset rebased onto latest master. Will respond to other questions a bit later.

Contributor

pfalcon commented May 31, 2015

Patchset rebased onto latest master. Will respond to other questions a bit later.

ilia-maslakov and others added some commits Mar 29, 2011

Added configuration option editor_autodetect_linebreak.
Allows to enable or disable line break type detection in the editor. Default
value is off, based on concerns expressed regarding binary-safety of the
editor. With this setting conservatively set to off, editor behavior stays
exactly the same as before, unless user explicitly enables this option.

Signed-off-by: Ilia Maslakov <il.smind@gmail.com>
Signed-off-by: Paul Sokolovsky <pfalcon@users.sourceforge.net>
Added unit tests for checking "detect type of line breaks" functionality
Signed-off-by: Slava Zanko <slavazanko@gmail.com>
edit.c: Make line break detection much more conservative.
Detection now requires consistent line breaks being present within sampling
buffer. If breaks of different types are found, or there's a slightest
suspicion that it may be binary content, it reverts to binary-safe "as-is"
mode.

Also, refactor detection routines to facilitate unit testing.

Signed-off-by: Paul Sokolovsky <pfalcon@users.sourceforge.net>
tests: detect_linebreaks.c: Revamp for latest codebase.
Also, test only line break detection algorithm, not anything else (no
file operations, etc.).

Signed-off-by: Paul Sokolovsky <pfalcon@users.sourceforge.net>
@pfalcon

This comment has been minimized.

Show comment
Hide comment
@pfalcon

pfalcon May 31, 2015

Contributor

and i think it should be squashed (slavaz' remaining parts are rather small, so i think a simple done-with: slavaz footer will do for attribution).

I keep patches separate and with original authorship as an argument that this is not my patch - 2 official maintainers worked on it too. The patch can be squashed easily anytime, once it's confirmed as ready for merging.

Contributor

pfalcon commented May 31, 2015

and i think it should be squashed (slavaz' remaining parts are rather small, so i think a simple done-with: slavaz footer will do for attribution).

I keep patches separate and with original authorship as an argument that this is not my patch - 2 official maintainers worked on it too. The patch can be squashed easily anytime, once it's confirmed as ready for merging.

@pfalcon

This comment has been minimized.

Show comment
Hide comment
@pfalcon

pfalcon May 31, 2015

Contributor

why not make the thing so conservative that it cannot possibly fail - check the whole file?

Which whole file - 1GB, 4GB, 1TB one? It's one of the known issues that mistakenly pressing F4 on a wrong large file is a good way to DoS current mc session and possibly whole system. So, scanning thru entire file is irresponsible, you need to make that operation be O(1) of file size, not O(n). Then you just select suitable window size and few Ks already offer 99% guarantee of detection.

This implementation also nicely decouples from the rest of editor, to not make it more of spaghetti.

Contributor

pfalcon commented May 31, 2015

why not make the thing so conservative that it cannot possibly fail - check the whole file?

Which whole file - 1GB, 4GB, 1TB one? It's one of the known issues that mistakenly pressing F4 on a wrong large file is a good way to DoS current mc session and possibly whole system. So, scanning thru entire file is irresponsible, you need to make that operation be O(1) of file size, not O(n). Then you just select suitable window size and few Ks already offer 99% guarantee of detection.

This implementation also nicely decouples from the rest of editor, to not make it more of spaghetti.

@pfalcon

This comment has been minimized.

Show comment
Hide comment
@pfalcon

pfalcon May 31, 2015

Contributor

where is the inverse conversion on file writeout?

That's handled by the existing "Save As..." code, which allows to save a file with different line-breaks. So, the detection process just sets another default value for Save/Save As to use (standard default is "don't change").

Contributor

pfalcon commented May 31, 2015

where is the inverse conversion on file writeout?

That's handled by the existing "Save As..." code, which allows to save a file with different line-breaks. So, the detection process just sets another default value for Save/Save As to use (standard default is "don't change").

@pfalcon

This comment has been minimized.

Show comment
Hide comment
@pfalcon

pfalcon May 31, 2015

Contributor

i really don't like the option being there.

Again, that option was added as a compromise to get going with that feature. I was pointed out that this patch changes behavior of editor, and I made changes so it wasn't, unless user opts in. Yes, I myself think that this feature should be on by default, as apparently more users use text editor to edit text files than binary ones. But it's not me to decide.

(For perspective, I also think that "Esc key mode: single press" should be default, as realistically nobody uses Esc to manually enter escape sequences nowadays, or use 300 or so baud rates. If I were a maintainer, I would raise a question to make it a default, perhaps with a command-line switch to force-disable - I'm sure we lose bunch of users because of the current default).

Contributor

pfalcon commented May 31, 2015

i really don't like the option being there.

Again, that option was added as a compromise to get going with that feature. I was pointed out that this patch changes behavior of editor, and I made changes so it wasn't, unless user opts in. Yes, I myself think that this feature should be on by default, as apparently more users use text editor to edit text files than binary ones. But it's not me to decide.

(For perspective, I also think that "Esc key mode: single press" should be default, as realistically nobody uses Esc to manually enter escape sequences nowadays, or use 300 or so baud rates. If I were a maintainer, I would raise a question to make it a default, perhaps with a command-line switch to force-disable - I'm sure we lose bunch of users because of the current default).

@ossilator

This comment has been minimized.

Show comment
Hide comment
@ossilator

ossilator Jun 1, 2015

Contributor

Again, that option was added as a compromise [...]

my point is that the compromise should not be necessary:

It's one of the known issues that mistakenly pressing F4 on a wrong large file is a good way to DoS current mc session and possibly whole system

this is entirely true. but this problem exists regardless of your patch. given that the line ending scan is at least an order of magnitude faster than the code to insert the file into the buffer (char by char!), injecting it between the existing mc_read and insertion loop would not noticeably worsen the situation.

and when a fix is implemented (probably a more or less arbitrary file size limitation inducing a confirmation dialog), the detection code would be automatically covered by it.

This implementation also nicely decouples from the rest of editor, to not make it more of spaghetti.

i totally don't buy that. in the worst case you need an out parameter which the load function sets as a "side effect" (i haven't looked at the code in detail, but it isn't that bad).

Contributor

ossilator commented Jun 1, 2015

Again, that option was added as a compromise [...]

my point is that the compromise should not be necessary:

It's one of the known issues that mistakenly pressing F4 on a wrong large file is a good way to DoS current mc session and possibly whole system

this is entirely true. but this problem exists regardless of your patch. given that the line ending scan is at least an order of magnitude faster than the code to insert the file into the buffer (char by char!), injecting it between the existing mc_read and insertion loop would not noticeably worsen the situation.

and when a fix is implemented (probably a more or less arbitrary file size limitation inducing a confirmation dialog), the detection code would be automatically covered by it.

This implementation also nicely decouples from the rest of editor, to not make it more of spaghetti.

i totally don't buy that. in the worst case you need an out parameter which the load function sets as a "side effect" (i haven't looked at the code in detail, but it isn't that bad).

@pfalcon

This comment has been minimized.

Show comment
Hide comment
@pfalcon

pfalcon Jun 10, 2015

Contributor

(sorry for delay with response)

but this problem exists regardless of your patch.

Sorry, I don't understand what you hint at - if a design is compromised, then everyone should feel free to compromise it even further? Sorry, I don't think that's sustainable or responsible approach.

given that the line ending scan is at least an order of magnitude faster than the code to insert the file into the buffer (char by char!), injecting it between the existing mc_read and insertion loop would not noticeably worsen the situation.

So, if you mean detecting line-break type in parallel with loading file contents itself, then in general that wouldn't be possible, because before loading each line, one already needs to know line break type of a file. And doing it in 2 passes over file has exactly DoS potential I mentioned above: on a system with low free physical memory, doing that will cause swapping. If system also has a lot of other I/O activity, that can lead to true system-wide I/O due to bugs in contemporary OSes: https://bugzilla.kernel.org/show_bug.cgi?id=12309 (these bugs happen because people do not think about issues we discuss here). Such issues may affect stability and performance of a modern desktop, but people (e.g. myself) run mc on small systems (like home routers/NASes/etc.) where 32MB of RAM or less is norm. It's huge difference whether unneeded DoS-like operation will scan something once and completes in 10mins or twice and completes in 20mins.

i totally don't buy that.

And yet that's generally accepted software engineering practice - consider that the most "efficient" way to write something is to write tightly-couples spaghetti code in assembler, nobody any longer does that. Vice-versa, there're design patterns how to decouple things to achieve other kinds of efficiencies than the one implied by spaghetti assembler code. You miss the fact that this is just a first version of the implementation of a fix for this issue (trac:1652). If got into mainline, but deemed as requiring more improvements, it can be further elaborated. You also miss the fact that many people contributed to even this solution, and if you see immediate need for further tweaks, please contribute them. Otherwise, the way this patch is done, it addresses the original issue, while being conservative enough to not affect users which want to be unaffected, and has other important traits, like being easy to maintain on top of the mainline sources. And that's how it is maintained so far. If official maintainers see some (non DoS imposing) changes as a pre-requisite to it being merged, I'm happy to address them.

Contributor

pfalcon commented Jun 10, 2015

(sorry for delay with response)

but this problem exists regardless of your patch.

Sorry, I don't understand what you hint at - if a design is compromised, then everyone should feel free to compromise it even further? Sorry, I don't think that's sustainable or responsible approach.

given that the line ending scan is at least an order of magnitude faster than the code to insert the file into the buffer (char by char!), injecting it between the existing mc_read and insertion loop would not noticeably worsen the situation.

So, if you mean detecting line-break type in parallel with loading file contents itself, then in general that wouldn't be possible, because before loading each line, one already needs to know line break type of a file. And doing it in 2 passes over file has exactly DoS potential I mentioned above: on a system with low free physical memory, doing that will cause swapping. If system also has a lot of other I/O activity, that can lead to true system-wide I/O due to bugs in contemporary OSes: https://bugzilla.kernel.org/show_bug.cgi?id=12309 (these bugs happen because people do not think about issues we discuss here). Such issues may affect stability and performance of a modern desktop, but people (e.g. myself) run mc on small systems (like home routers/NASes/etc.) where 32MB of RAM or less is norm. It's huge difference whether unneeded DoS-like operation will scan something once and completes in 10mins or twice and completes in 20mins.

i totally don't buy that.

And yet that's generally accepted software engineering practice - consider that the most "efficient" way to write something is to write tightly-couples spaghetti code in assembler, nobody any longer does that. Vice-versa, there're design patterns how to decouple things to achieve other kinds of efficiencies than the one implied by spaghetti assembler code. You miss the fact that this is just a first version of the implementation of a fix for this issue (trac:1652). If got into mainline, but deemed as requiring more improvements, it can be further elaborated. You also miss the fact that many people contributed to even this solution, and if you see immediate need for further tweaks, please contribute them. Otherwise, the way this patch is done, it addresses the original issue, while being conservative enough to not affect users which want to be unaffected, and has other important traits, like being easy to maintain on top of the mainline sources. And that's how it is maintained so far. If official maintainers see some (non DoS imposing) changes as a pre-requisite to it being merged, I'm happy to address them.

@ossilator

This comment has been minimized.

Show comment
Hide comment
@ossilator

ossilator Jun 13, 2015

Contributor

I don't understand what you hint at - if a design is compromised, then everyone should feel free to compromise it even further?

no, that's actually your argument:

mc itself is kinda hacky coded thing.

my argument is that not duplicating work is good, and comes with the added bonus of getting fixed for free when the code path that exists anyway is fixed.

but you're right that my idea won't work as-is, because the files are read in chunks, so a whole-sale detection and conversion can't happen before the file has been loaded in its entirety. that sounds like a somewhat bigger refactoring exercise.

however, your argumentation still makes no sense. the existing code path is sufficient to DoS the system. adding a second full-file reader would double the problem, but at that point this isn't very relevant any more. so the rather obvious conclusion is not avoiding it, but adding an adequate protection mechanism to the new code (which would automatically cover the old code as well if it was always enabled).

Contributor

ossilator commented Jun 13, 2015

I don't understand what you hint at - if a design is compromised, then everyone should feel free to compromise it even further?

no, that's actually your argument:

mc itself is kinda hacky coded thing.

my argument is that not duplicating work is good, and comes with the added bonus of getting fixed for free when the code path that exists anyway is fixed.

but you're right that my idea won't work as-is, because the files are read in chunks, so a whole-sale detection and conversion can't happen before the file has been loaded in its entirety. that sounds like a somewhat bigger refactoring exercise.

however, your argumentation still makes no sense. the existing code path is sufficient to DoS the system. adding a second full-file reader would double the problem, but at that point this isn't very relevant any more. so the rather obvious conclusion is not avoiding it, but adding an adequate protection mechanism to the new code (which would automatically cover the old code as well if it was always enabled).

@seyko2

This comment has been minimized.

Show comment
Hide comment
@seyko2

seyko2 Nov 6, 2015

Contributor

Wow! It work's! Nice. Thanks.

Contributor

seyko2 commented Nov 6, 2015

Wow! It work's! Nice. Thanks.

@seyko2

This comment has been minimized.

Show comment
Hide comment
@seyko2

seyko2 Nov 8, 2015

Contributor

Do you look at the vi editor from the openwatcom-v2? Looks like mcedit, mouse support, multiwindow, crlf autodetection....

Contributor

seyko2 commented Nov 8, 2015

Do you look at the vi editor from the openwatcom-v2? Looks like mcedit, mouse support, multiwindow, crlf autodetection....

@seyko2

This comment has been minimized.

Show comment
Hide comment
@seyko2

seyko2 Nov 8, 2015

Contributor

Suggestion: always show LB type. Curretly LF type is not displayed

Contributor

seyko2 commented Nov 8, 2015

Suggestion: always show LB type. Curretly LF type is not displayed

@Tomas-M

This comment has been minimized.

Show comment
Hide comment
@Tomas-M

Tomas-M Sep 25, 2016

Is this going to be merged? I would really appreciate it. Thanks!

Tomas-M commented Sep 25, 2016

Is this going to be merged? I would really appreciate it. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment