add support for markup language #40

n3f4s · 2017-05-20T13:14:43Z

It would be nice to have support for markup languages like LaTeX or Markdown.
Using grammarous with those languages raises a lot of errors due to the tags and automatic styling (mostly for LaTeX, because the LaTeX compiler handles a lot of style issues).

Furthermore, having things like pieces of code in the text makes using grammarous less pleasant because of the number of errors raised. Supporting those markup languages would mean being able to disable spell checking for things like the verbatim environment in LaTeX or inline code (``) in Markdown.

I've asked LanguageTool whether they could add support for markup languages; their answer is that it's up to the editor to handle the markup language parsing.

This looks like an enhanced version of #10 (if I've understood the issue).

An idea could be to have a dictionary of commands (by filetype). Those commands would parse the file and return the raw text, without the markup and the ignored text (like the content of the verbatim environment).
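
To make the idea of a per-filetype dictionary concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `STRIPPERS`, `strip_markdown`, `strip_latex` and `to_plain_text` are hypothetical and not part of grammarous, and the regular expressions only cover a handful of constructs.

```python
import re
from typing import Callable, Dict


def strip_markdown(text: str) -> str:
    """Very rough: drop inline code spans, then emphasis markers."""
    text = re.sub(r"`[^`\n]*`", "", text)
    return re.sub(r"[*_]", "", text)


def strip_latex(text: str) -> str:
    """Very rough: drop verbatim environments, unwrap \\cmd{arg} to arg."""
    text = re.sub(r"\\begin\{verbatim\}.*?\\end\{verbatim\}", "", text,
                  flags=re.DOTALL)
    return re.sub(r"\\[a-zA-Z]+\{([^}]*)\}", r"\1", text)


# Hypothetical registry: filetype name -> function returning raw text.
STRIPPERS: Dict[str, Callable[[str], str]] = {
    "markdown": strip_markdown,
    "tex": strip_latex,
}


def to_plain_text(filetype: str, text: str) -> str:
    """Apply the stripper for this filetype; unknown filetypes pass through."""
    return STRIPPERS.get(filetype, lambda t: t)(text)
```

A sketch like this returns clean text, but the positions in the result no longer match the original file.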

rhysd · 2017-06-23T03:00:02Z

> I've asked LanguageTool whether they could add support for markup languages; their answer is that it's up to the editor to handle the markup language parsing.

Yeah, it would be nice to support Markdown or LaTeX as you said. However, there are some problems with doing that.

First, it's comparatively easy to strip Markdown into plain text. There are already some tools that do this, and I could also use a Markdown parser library via a scripting language interface like if_python.

The main problem is that the position (line, column, offset) of a grammatical error is no longer correct after stripping Markdown or LaTeX. LanguageTool returns results for the stripped plain text, not for the original Markdown or LaTeX source. So we would need a source map to maintain the relation between positions in the Markdown or LaTeX source and positions in the converted plain text. As far as I could find by googling, there is no Markdown/LaTeX conversion tool that supports source maps. So I would need to write one, but I currently have no resources to do that.

Thank you for your suggestion, but my current opinion is that it's hard to support.
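
For readers wondering what such a source map could look like, here is a minimal sketch in Python. It is an illustration under strong assumptions, not something grammarous or LanguageTool provides: only the single characters `*`, `_` and a backtick are treated as markup, and `strip_with_map` and `to_line_col` are hypothetical names.

```python
import re
from typing import List, Tuple

# Assumption: only these single-character inline markers are stripped.
MARKUP = re.compile(r"[*_`]")


def strip_with_map(source: str) -> Tuple[str, List[int]]:
    """Return the plain text plus, for each plain-text character,
    its offset in the original source (the "source map")."""
    plain: List[str] = []
    offsets: List[int] = []
    for i, ch in enumerate(source):
        if MARKUP.fullmatch(ch):
            continue  # markup character: dropped, no map entry
        plain.append(ch)
        offsets.append(i)
    return "".join(plain), offsets


def to_line_col(source: str, offset: int) -> Tuple[int, int]:
    """Convert a 0-based offset in source to a 1-based (line, column)."""
    line = source.count("\n", 0, offset) + 1
    col = offset - (source.rfind("\n", 0, offset) + 1) + 1
    return line, col
```

An error reported at offset `k` of the plain text would then be shown at `to_line_col(source, offsets[k])` in the original buffer. A real converter would have to record the same mapping for every construct it rewrites, which is the hard part.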

n3f4s · 2017-06-23T10:07:41Z

I understand that it's hard to support, and that's why I initially made the request to LanguageTool. They could do the stripping/conversion and return the right positions for the errors.

A solution might be to simply replace the markup with spaces/newlines and disable LanguageTool's formatting errors (spaces, newlines, ...). Most markup languages deal with formatting themselves. It would be harder for LaTeX (and other languages with macros), since some macros can expand to several words. But removing the formatting markup (*, _, \textit, ...) and the code blocks would at least remove a lot of false errors.

Then again, I understand it's not straightforward.
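
As a rough illustration of the "replace markup with spaces" idea, the following Python sketch overwrites markup with spaces of the same length, so every remaining character keeps its original line and column. The pattern list is an assumption and far from complete (fenced code blocks, inline code, bare `\command` names, and the characters `*`, `_`, `{`, `}`); `mask_markup` is a hypothetical name.

```python
import re

# Assumed markup patterns, tried in this order at each position.
MASK = re.compile(r"```.*?```|`[^`\n]*`|\\[a-zA-Z]+|[*_{}]", re.DOTALL)


def mask_markup(text: str) -> str:
    """Replace each markup match with spaces, keeping newlines intact
    so that line and column numbers of the remaining text do not move."""
    def spaces(match: "re.Match[str]") -> str:
        return re.sub(r"[^\n]", " ", match.group(0))
    return MASK.sub(spaces, text)
```

Because the output has the same length and the same line breaks as the input, positions reported for the masked text can be used directly in the original buffer. The price is runs of extra whitespace, which is why the whitespace-related checks would have to be disabled.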

copyme · 2017-08-22T17:50:07Z

I would love to use your plugin with LaTeX documents. TeXstudio supports LanguageTool (not perfectly, but it is still useful). Maybe reading their code would give you some clues on how to implement something similar.

DerWeh · 2017-11-06T17:57:52Z

You already offer the possibility to ignore everything but comments, and I think I have read that you identify comments by their highlight group. If that is the case, would it be possible to go the other way and specify highlight groups we want to ignore?
For LaTeX, I use e.g. vimtex, which defines highlight groups (I think a lot is already provided by Vim by default) for everything I would like to ignore (commands, math mode!!, curly brackets, comments, spacing ('@', ...)). This would be the most convenient way, as we could pass the difficult task of handling the multitude of LaTeX specifiers to dedicated plugins and just fetch (maybe also via regular expressions) the highlight groups we want to ignore from them.

languitar · 2017-12-21T09:42:51Z

For LaTeX, simply ignoring all commands (from highlight groups) could become a real issue. For instance, \emph{this word} is important for the sentence, while with \footnote{some footnote} the contained text should be parsed, but not at that position. Finally, with commands like \cite{Foo}, the contained arguments should probably be ignored entirely. So simply using highlight groups will most likely still produce some ugly errors.

I hacked a special-purpose solution for my own needs with LaTeX by adding a preprocessor that replaces most of the commands I frequently use with the plain-text representations + enough spaces to avoid shifting the result locations.

I have integrated this into the plugin by locally setting

let g:grammarous#languagetool_cmd = getcwd() . '/detex-languagetool'

where detex-languagetool is a simple wrapper like:

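#!/usr/bin/env python3
# Wrapper around languagetool: pipe the file through detex.py first.

import os
import shlex
import subprocess
import sys

# detex.py is expected to sit next to this wrapper script.
dir_path = os.path.dirname(os.path.realpath(__file__))

# The last argument is the file to check; all other arguments are
# passed through to languagetool. Quoting keeps paths with spaces intact.
subprocess.call('cat ' + shlex.quote(sys.argv[-1]) + ' | '
                + shlex.quote(os.path.join(dir_path, 'detex.py')) + ' | '
                + 'languagetool ' + ' '.join(map(shlex.quote, sys.argv[1:-1])),
                shell=True)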

and detex.py does the actual stripping of LaTeX, which I hand-crafted with some regular expressions for my specific needs.

pinpox · 2018-03-24T13:30:37Z

Hello,
is this feature still planned, or already in progress?
I understand it's not very easy to implement, but LaTeX files would be the main reason I would like to use this.

@languitar's comment looks like a viable workaround. Could this not be integrated into the plugin?

pinpox · 2018-03-24T14:48:31Z

Sorry for double-posting, but here is a possible workaround.
If I understand correctly, the problem with stripping LaTeX commands is that it changes the length of the text. Also, you want to treat the commands differently:

In commands like \section{My section name} you want "My section name" to be checked.
In commands like \cite{MyCitation} you do not want "MyCitation" to be checked.

I created this small ruby script:

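# Blank out LaTeX commands while keeping every character in its original column.
def get_commands line

	fullreplace = ["cite", "label"]          # command and argument are blanked
	partreplace = ["section", "subsection"]  # command is blanked, argument is kept

	fullreplace.each do |fr|
		# Replace the whole match with the same number of spaces.
		line = line.gsub(/\\#{fr}\{[^}]*\}/) { |m| " " * m.length }
	end

	partreplace.each do |pr|
		# Backslash, name and "{" become spaces; "}" becomes one trailing space.
		line = line.gsub(/\\#{pr}\{([^}]*)\}/, "#{" " * pr.length}  \\1 ")
	end
	puts line
end

ARGF.each_line do |line|
	get_commands(line)
end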

You can pipe tex into it and it outputs the latex commands replaced with the correct amount of whitespace.

I know it's not pretty, just a proof of concept, but couldn't this be used if we specify all LaTeX commands in the two arrays?

languitar · 2018-03-24T14:58:36Z

Just for reference, this is my custom script: https://gist.github.com/languitar/2037fccd8520586639aa9f1227bbf8e6 It handles a few more cases.

real-or-random · 2018-04-07T20:29:23Z

This is related:
dpelle/vim-LanguageTool#4
My texlive installation includes the "detex" tool.

This is interesting too:
https://github.com/pkubowicz/opendetex

copyme · 2018-04-16T17:39:17Z

@real-or-random unfortunately, opendetex/detex does not work well with real-life LaTeX documents. I would say that all the tools for converting (La)TeX to text that I have checked offer only very basic functionality. Maybe pandoc will be good enough at some point, but so far it has some problems too.

MarcelRobitaille · 2018-12-29T21:40:24Z

Since we don't (yet) have a tool to strip Markdown with source maps, would it be possible to open the plain-text version in a split and leave it up to the user to find the error in the source? This wouldn't be a huge deal for Markdown and would really help.

krishnakumarg1984 · 2019-04-27T10:41:22Z

@rhysd there is finally a (platform-independent) solution for the problem that blocks you from implementing a markup parser: TeXtidote!

Your Problem

".... option is to remove all this markup, leaving only the "clear" text; however, when a grammar tool points to a problem at a specific line in this clear text, it becomes hard to retrace that location in the original LaTeX file."

Solution

TeXtidote solves this problem; it can read your original LaTeX file and perform various sanity checks on it: for example, making sure that every figure is referenced in the text, enforcing the correct capitalization of titles, etc. In addition, TeXtidote can remove markup from the file and send it to the Language Tool library, which performs a verification of both spelling and grammar in a dozen languages. What is unique to TeXtidote is that it keeps track of the relative position of words between the original and the "clean" text. This means that it can translate the messages from Language Tool back to their proper location directly in your source file.

Can you try to port their logic to vim-grammarous?

copyme · 2019-04-27T12:25:05Z

@krishnakumarg1984 thanks for the info about TeXtidote, it looks pretty interesting.

matze-dd · 2020-03-31T12:32:53Z

YaLafi does filter LaTeX text, too.
This project is still in an early stage, however.

oblitum · 2020-10-01T08:00:24Z

It would be great if TeXtidote support were added; it even supports JSON results for easy parsing. It's a great wrapper for LanguageTool and would make this plugin work on LaTeX and Markdown.

kevincox · 2022-05-12T14:17:03Z

I wonder if this can be fixed in the plugin instead of LanguageTool. With more and more highlighters supporting fenced languages, Vim now knows which parts of my Markdown doc are Markdown and which are code blocks. Even if the Markdown inline syntax causes a few errors, it would be great if the code blocks could be ignored, since those cause a huge number of problems and are annoying to skip over every time.
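
To show how little is needed for the code-block case alone, here is a small Python sketch that blanks out fenced blocks line by line while keeping the line count, so reported line numbers still match the buffer. It is only an illustration and does not use Vim's syntax information: it assumes fences are lines starting with three backticks (no `~~~` fences, no indented code blocks), and `blank_fenced_blocks` is a hypothetical name.

```python
from typing import List


def blank_fenced_blocks(lines: List[str]) -> List[str]:
    """Replace fence lines and everything between them with empty lines,
    leaving the total number of lines unchanged."""
    out: List[str] = []
    in_fence = False
    for line in lines:
        is_fence = line.lstrip().startswith("```")
        if is_fence or in_fence:
            out.append("")      # hide code from the grammar checker
        else:
            out.append(line)    # ordinary prose: keep as-is
        if is_fence:
            in_fence = not in_fence  # opening or closing fence toggles state
    return out
```

Since only whole lines are emptied, columns inside the remaining prose lines are unaffected as well.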

pinpox · 2022-05-18T11:36:06Z

> Vim now knows which parts of my Markdown doc are Markdown and which are code blocks

If I'm not mistaken, this should be possible with tree-sitter.

DerWeh · 2022-05-18T14:08:46Z

Recently I switched to ltex, which is based on LanguageTool and does a reasonable job of supporting LaTeX and other markup languages. It can simply be used as a language server in vim/neovim, see https://valentjn.github.io/ltex/installation-usage.html.

It tends to use excessive amounts of memory, and so far I wasn't able to add words to the dictionary, but otherwise it seems fine. Maybe it's an option for vim-grammarous to replace bare LanguageTool with ltex.

With tree-sitter, I am rather doubtful. I am still waiting for a simple dictionary-type spell checker that works as well as Vim's default. There is spellsitter, but it hasn't convinced me to switch yet.

jdhao · 2022-11-28T10:09:45Z

@DerWeh Are you using both ltex and vim-grammarous? What is your take on their pros and cons? I also found vale. I haven't used it and am not sure how it compares to LanguageTool.

rhysd mentioned this issue Oct 22, 2017

Repeated white space #49

Closed

rhysd mentioned this issue Nov 6, 2017

handle syntax/markup #51

Closed
