Error with Chinese (possibly other multibyte) characters #1

zhaocai · 2012-09-10T08:40:52Z

let g:switch_definitions += [['是', '否']]

Error detected while processing function <SNR>83_Switch..switch#Switch..<SNR>318_Replace:
line    4:
E486: Pattern not found: \%>48c\V否\m\%<51c

AndrewRadev · 2012-09-10T13:27:10Z

Multibyte characters are tricky :). I should remember to test my plugins with them more often.

I've pushed some commits that should fix this. Could you check them out and confirm that it works now?
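
For context on the error above: the failing pattern limits the match with `\%>48c` and `\%<51c`, and Vim counts those columns in bytes, not characters. The sketch below assumes `encoding` is utf-8 and uses a hypothetical helper, `ColumnLimitedPattern()`, which is not the plugin's actual code. It only illustrates one plausible cause: a column window sized by character count is too narrow for 否, while one sized by byte length is not.

```vim
" Assumes :set encoding=utf-8.
let word = '否'
echo strlen(word)     " 3 (bytes)
echo strchars(word)   " 1 (characters)

" Hypothetical helper (not from the plugin): build a pattern that matches
" a:word literally, and only when it starts at byte column a:start_col.
function! ColumnLimitedPattern(word, start_col) abort
  " Byte column just past the end of the word.
  let end_col = a:start_col + strlen(a:word)
  return '\%>' . (a:start_col - 1) . 'c'
        \ . '\V' . escape(a:word, '\') . '\m'
        \ . '\%<' . (end_col + 1) . 'c'
endfunction

echo ColumnLimitedPattern('否', 49)   " \%>48c\V否\m\%<53c
```

Replacing `strlen()` with `strchars()` in the helper would produce the upper bound `\%<51c` seen in the error message, which cannot contain a three-byte character starting at column 49.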

zhaocai · 2012-09-10T15:44:41Z

The update works OK now.

One more thing I noticed is case matching: True is switched to false.

AndrewRadev · 2012-09-11T18:04:17Z

There's really no way to automatically detect capitalization. The plugin attempts to do a substitution, and that respects your ignorecase option. But even if you have ignorecase on, Vim has no way of matching the T in true to the F in false automatically. I could write the built-in pattern so that it handles that, but that would probably make it too complicated, especially if I try to handle all patterns like that.

Still, the True and False case is probably important for Python at least. I added a built-in for ['True', 'False'], so this particular example should capitalize correctly now.
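
The behaviour described here can be reproduced with plain `substitute()` calls. This is an illustration only, not the plugin's code: `\c` stands in for having ignorecase set, and `\C` forces a case-sensitive match.

```vim
" \c simulates 'ignorecase': the lowercase pattern matches 'True',
" but the replacement text is inserted literally.
echo substitute('True', '\ctrue', 'false', '')   " false

" \C forces case sensitivity: the lowercase pattern no longer matches,
" so the text is left untouched.
echo substitute('True', '\Ctrue', 'false', '')   " True

" A separate capitalized pair keeps the capitalization.
echo substitute('True', '\CTrue', 'False', '')   " False
```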

zhaocai · 2012-09-11T19:22:10Z

> There's really no way to somehow automatically detect capitalization.

I disagree on this one.

I have some sample code from vim-cycle:

function! s:imitate_case(text, reference) "{{{
  if a:reference =~# '^\u*$'
    return toupper(a:text)
  elseif a:reference =~# '^\U*$'
    return tolower(a:text)
  else
    let uppers = substitute(a:reference, '\U', '0', 'g')
    let new_text = tolower(a:text)
    while uppers !~ '^0\+$'
      let index = match(uppers, '[^0]')
      if len(new_text) < index
        break
      endif
      let new_text = substitute(new_text, '\%' . (index + 1) . 'c[a-z]', toupper(new_text[index]), '')
      let uppers = substitute(uppers, '\%' . (index + 1) . 'c.', '0', '')
    endwhile
    return new_text
  endif
endfunction "}}}

And from neocomplcache:

  " Convert words.
  if neocomplcache#is_text_mode() "{{{
    let convert_candidates = filter(copy(complete_words),
          \ "get(v:val, 'neocomplcache__convertable', 1)")

    if a:cur_keyword_str =~ '^\l\+$'
      for keyword in convert_candidates
        let keyword.word = tolower(keyword.word)
        let keyword.abbr = tolower(keyword.abbr)
      endfor
    elseif a:cur_keyword_str =~ '^\u\+$'
      for keyword in convert_candidates
        let keyword.word = toupper(keyword.word)
        let keyword.abbr = toupper(keyword.abbr)
      endfor
    elseif a:cur_keyword_str =~ '^\u\l\+$'
      for keyword in convert_candidates
        let keyword.word = toupper(keyword.word[0]).
              \ tolower(keyword.word[1:])
        let keyword.abbr = toupper(keyword.abbr[0]).
              \ tolower(keyword.abbr[1:])
      endfor
    endif
  endif"}}}

AndrewRadev · 2012-09-13T06:36:00Z

Imagine I have the following definition:

let g:switch_definitions =
      \ [
      \   {
      \     '\<\(\k\+\)\>': '@_\1',
      \     '@_\(\k\+\)\>': '\1',
      \   }
      \ ]

This changes between fooBarBaz and @_fooBarBaz. This could be useful in CoffeeScript if you want to switch between local functions and class-level "private" functions (starting with an underscore). It's a bit of a weird example, I suppose, but I think it's fairly realistic. If I use the s:imitate_case function to normalize the case of the result based on the original, then from fooBarBaz, I'd get @_fOobArbaz, which is really quite wrong.
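
To make the definition concrete, here is what the two pattern/replacement pairs do when applied with `substitute()`. How the plugin applies them internally is not shown in this thread, so treat this as a sketch; it also assumes the default `iskeyword`, where letters are keyword characters and a literal `@` is not.

```vim
" First pair: wrap a bare identifier.
echo substitute('fooBarBaz', '\<\(\k\+\)\>', '@_\1', '')    " @_fooBarBaz

" Second pair: strip the prefix again.
echo substitute('@_fooBarBaz', '@_\(\k\+\)\>', '\1', '')    " fooBarBaz
```

In both directions the captured identifier is copied through unchanged, so its internal capitalization carries meaning that the definition preserves on its own.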

This is what I meant when I said that it's impossible to detect capitalization -- it's impossible to correctly detect capitalization, since in many cases, capitalization holds meaning.

For an example closer to what vim-cycle would do, consider ['nextAll', 'siblings']. Both of these words are jQuery methods with similar functionality, so I think it's fairly realistic that you'd want to switch between them. Normalizing with s:imitate_case leads to nextAll being transformed into siblIngs.
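
Both results can be checked by running the quoted function directly. The block below is a copy of s:imitate_case from the earlier comment, renamed to a global `ImitateCase()` (a name chosen here only so it can be called from anywhere) with `abort` added; the logic is otherwise unchanged.

```vim
" Copy of the quoted s:imitate_case, made global for experimentation.
function! ImitateCase(text, reference) abort
  if a:reference =~# '^\u*$'
    return toupper(a:text)
  elseif a:reference =~# '^\U*$'
    return tolower(a:text)
  else
    " Keep only the uppercase letters of the reference, by position.
    let uppers = substitute(a:reference, '\U', '0', 'g')
    let new_text = tolower(a:text)
    while uppers !~ '^0\+$'
      let index = match(uppers, '[^0]')
      if len(new_text) < index
        break
      endif
      " Uppercase whatever letter sits at the same position in the result.
      let new_text = substitute(new_text, '\%' . (index + 1) . 'c[a-z]', toupper(new_text[index]), '')
      let uppers = substitute(uppers, '\%' . (index + 1) . 'c.', '0', '')
    endwhile
    return new_text
  endif
endfunction

echo ImitateCase('@_fooBarBaz', 'fooBarBaz')   " @_fOobArbaz
echo ImitateCase('siblings', 'nextAll')        " siblIngs
```

The uppercase positions are taken from the reference word, so they land on unrelated letters as soon as the two words differ in length or structure.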

It may be possible to normalize case on a pattern-by-pattern basis, but I'd have to change the API and somehow put in a flag, and I don't think that this is a case that would be worth the complexity.

Incidentally, I didn't actually see this function in the vim-cycle source code. Is it from an older version? There's a TODO item in the README that says "Operate on non-lowercase text and retain case", so I assume the author doesn't really have something working quite yet.

As for neocomplcache, I haven't really experimented with the code, but it seems like it's simply attempting to match two strings together, by trying out foo, FOO, and Foo, which is a much simpler case by far. I may be wrong, though, in which case -- could you explain it in more detail?

A thread on reddit that discusses this can be found here. The general opinion there is that the patterns themselves can be written to carry the case information. They get a bit complicated, but that's a lot better than any automatic transformation that I can think of on the plugin side. There's also a keepcase plugin mentioned, but I'd rather not attempt any weird heuristics to get this working.

zhaocai · 2012-09-14T13:35:07Z

The answer depends on the grand goal of this plugin.

For this particular case, I am thinking of context (filetype, syntax, etc. ) to trigger the switch.

text context: syntax [Comment, String, ...], filetype [text, markdown, rst, tex, ...]
code context: filetype [ruby, python, ...]
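
A rough sketch of how such a context check could look. The function names and the two lists are hypothetical and only mirror the examples above; nothing like this exists in the plugin. It relies on the syntax file linking its groups to the standard Comment and String groups, which is not guaranteed for every filetype.

```vim
" Hypothetical: decide whether a filetype/syntax-group pair counts as "text".
function! IsTextContext(filetype, syntax_group) abort
  let text_filetypes = ['text', 'markdown', 'rst', 'tex']
  let text_syntax_groups = ['Comment', 'String']
  return index(text_filetypes, a:filetype) >= 0
        \ || index(text_syntax_groups, a:syntax_group) >= 0
endfunction

" Hypothetical wrapper: look up the syntax group under the cursor,
" following highlight links, and classify it.
function! CursorIsInTextContext() abort
  let group = synIDattr(synIDtrans(synID(line('.'), col('.'), 1)), 'name')
  return IsTextContext(&filetype, group)
endfunction

echo IsTextContext('markdown', '')      " 1
echo IsTextContext('ruby', 'Comment')   " 1
echo IsTextContext('ruby', 'Function')  " 0
```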

Capitalization match is triggered only for text context. And I am not talking about weird heuristics to imitate case; check capitalization for the first letter usually is good enough.
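
A minimal sketch of the first-letter check, assuming that is all that is wanted. `ImitateFirstLetterCase()` is a hypothetical name, not part of any plugin mentioned in this thread.

```vim
" Hypothetical: capitalize a:text only if a:reference starts with an
" uppercase letter; otherwise return a:text unchanged.
function! ImitateFirstLetterCase(text, reference) abort
  if a:reference =~# '^\u'
    " \u in the replacement uppercases the next character, here the
    " whole match (&), which is the first character of the text.
    return substitute(a:text, '^.', '\u&', '')
  endif
  return a:text
endfunction

echo ImitateFirstLetterCase('false', 'True')   " False
echo ImitateFirstLetterCase('false', 'true')   " false
```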

AndrewRadev · 2012-09-15T14:00:36Z

> I am not talking about weird heuristics to imitate case; check capitalization for the first letter usually is good enough.

You should have mentioned that earlier :). Both examples you gave seem to do more than just that, so I assumed you wanted a more complete solution.

In any case, this still can't be done for all patterns. The solution you propose -- to separate "text" and "code" -- doesn't look viable to me at all. There are a lot of file formats out there, many of which are not built into Vim. It's completely impossible for me to separate them, especially "comment" and "text" areas in code. Spellchecking is a good example of something that works exactly that way, but that's encoded in the syntax files, each of which is maintained by a different person. It's not feasible for me to attempt something like this in the plugin.

I could make the separation a job for the user -- make them set variables with the filetypes they use as "text" and as "code" -- but that gets into the realm of yak shaving, I think: you'd need to set a list of filetypes so the plugin can figure out whether it's code or text in order to take care of capitalization. And even so, there's no guarantee that you wouldn't want a particular pattern to behave differently. Making the transformation automatic for one case or the other means taking away the user's freedom to choose.

As for the user's choice in this matter, it's possible to simply duplicate the patterns:

let g:switch_definitions =
      \ [
      \   { '\Ctrue': 'false', '\Cfalse': 'true' },
      \   { '\CTrue': 'False', '\CFalse': 'True' },
      \ ]

Admittedly, this makes them more complicated and makes it impossible to use the shorthand form of the plugin. A compromise I propose is something like this:

let g:switch_definitions =
      \ [
      \   ['normalize_case', ['true', 'false']]
      \ ]

And this pattern would capitalize the first letter if it has to. I can probably implement it easily by simply duplicating the patterns -- getting the form I described above.

Considering I'm probably going to make more complicated ways to define patterns anyway, this seems like a reasonable start. I could add additional transformation flags (that I may think of in the future) in the list, having the last item be the actual switch definition. That way, the shorthand version remains the same, but if you want some additional tweaks to the pattern, you can do that as well in this new form. What do you think?
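
One way the duplication could be implemented, as a sketch only. `ExpandNormalizeCase()` and `CapitalizeFirst()` are hypothetical names, the 'normalize_case' flag is just the proposal above and not an existing option, and the words are assumed to contain no regex special characters.

```vim
" Hypothetical helper: uppercase the first character of a word.
function! CapitalizeFirst(word) abort
  return substitute(a:word, '^.', '\u&', '')
endfunction

" Hypothetical: expand a shorthand list such as ['true', 'false'] into
" two case-sensitive dictionary definitions, one lowercase and one
" capitalized. Each word maps to the next one, wrapping around.
function! ExpandNormalizeCase(words) abort
  let lower = {}
  let capitalized = {}
  let n = len(a:words)
  for i in range(n)
    let from = a:words[i]
    let to = a:words[(i + 1) % n]
    let lower['\C' . from] = to
    let capitalized['\C' . CapitalizeFirst(from)] = CapitalizeFirst(to)
  endfor
  return [lower, capitalized]
endfunction

echo ExpandNormalizeCase(['true', 'false'])
```

For ['true', 'false'] this returns the same two dictionaries as the hand-written duplicated definition earlier in this comment.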

bootleq · 2013-02-03T12:58:05Z

Hi, just for information:
The vim-cycle with s:imitate_case is a different plugin with the same name: https://github.com/bootleq/vim-cycle

I asked @mjbrownie and @zef about borrowing the name, and we were all okay with it at the time.

AndrewRadev closed this as completed Sep 11, 2012

AndrewRadev mentioned this issue Jun 18, 2015

Preserve casing #28

Closed

AndrewRadev mentioned this issue Sep 2, 2019

Add case insesitivity #57

Closed
