Out of bounds read with CMARK_OPT_VALIDATE_UTF8 #206

Closed
philipturnbull opened this issue Jun 22, 2017 · 2 comments

Comments

@philipturnbull
Contributor

A single-byte out-of-bounds read can be triggered when a Markdown file contains invalid UTF-8 bytes and --validate-utf8 is enabled. This can be reproduced with:

$ make asan
$ echo IGNoci9zdHkKbmdsPC9zdFlZAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFl5c3R5c3R59p6LhvWRmIo8L3N0WQpuZ2w8bmdsPC9zdDwvZw== \
| base64 -D | ./build/src/cmark --validate-utf8
=================================================================
==76495==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61400000a3c8 at pc 0x00010ed20269 bp 0x7fff50ee3610 sp 0x7fff50ee3608
READ of size 1 at 0x61400000a3c8 thread T0
    #0 0x10ed20268 in S_process_line blocks.c:1150
    #1 0x10ed1f990 in S_parser_feed blocks.c:555
    #2 0x10ed1fcca in cmark_parser_feed blocks.c:521
    #3 0x10ed6191c in main main.c:171
    #4 0x7fffbbd67234 in start (libdyld.dylib:x86_64+0x5234)

0x61400000a3c8 is located 0 bytes to the right of 392-byte region [0x61400000a240,0x61400000a3c8)
allocated by thread T0 here:
    #0 0x10eebb520 in wrap_realloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x56520)
    #1 0x10ed1c228 in xrealloc cmark.c:23
    #2 0x10ed558d8 in cmark_strbuf_grow buffer.c:58
    #3 0x10ed557e4 in cmark_strbuf_init buffer.c:32
    #4 0x10ed1f3e2 in cmark_parser_new_with_mem blocks.c:89
    #5 0x10ed1f54f in cmark_parser_new blocks.c:112
    #6 0x10ed617b9 in main main.c:149
    #7 0x7fffbbd67234 in start (libdyld.dylib:x86_64+0x5234)

SUMMARY: AddressSanitizer: heap-buffer-overflow blocks.c:1150 in S_process_line
Shadow bytes around the buggy address:
  0x1c2800001420: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c2800001430: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c2800001440: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x1c2800001450: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c2800001460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1c2800001470: 00 00 00 00 00 00 00 00 00[fa]fa fa fa fa fa fa
  0x1c2800001480: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c2800001490: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c28000014a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c28000014b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c28000014c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==76495==ABORTING
Abort trap: 6

The faulting read is on this line in S_process_line:

  if (bytes == 0 || !S_is_line_end_char(parser->curline.ptr[bytes - 1]))

I don't fully understand the bug, but it seems to be triggered when four consecutive invalid UTF-8 bytes are encountered. encode_unknown then replaces those four bytes with the three-byte replacement character, which would leave parser->curline shorter than the input. So maybe bytes needs to be re-initialised after calling cmark_utf8proc_check, because parser->curline may no longer contain bytes bytes?
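
To illustrate the suspected length mismatch, here is a minimal standalone sketch in C. It does not use cmark: line_buf, toy_utf8_check, is_line_end_char and needs_newline are hypothetical stand-ins, and the toy validator does no real UTF-8 validation (it simply treats any non-ASCII byte as the start of an invalid sequence of up to four bytes). The only point is that the validated buffer can be shorter than the input, so the index should come from the buffer's own size.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_CAP 64

/* Hypothetical stand-in for the parser's current-line buffer. */
typedef struct {
  unsigned char ptr[LINE_CAP];
  size_t size;
} line_buf;

/* Toy stand-in for a validating copy (NOT cmark's implementation).
 * ASCII bytes are copied through. Any other byte is treated as the
 * start of an invalid sequence of up to four bytes, and the whole
 * sequence is replaced by the three-byte encoding of U+FFFD.
 * Returns 0 on success, -1 if the output would exceed LINE_CAP. */
static int toy_utf8_check(line_buf *out, const unsigned char *in, size_t len) {
  size_t i = 0;
  out->size = 0;
  while (i < len) {
    if (in[i] < 0x80) {
      if (out->size + 1 > LINE_CAP)
        return -1;
      out->ptr[out->size++] = in[i++];
    } else {
      size_t run = (len - i < 4) ? len - i : 4;
      if (out->size + 3 > LINE_CAP)
        return -1;
      out->ptr[out->size++] = 0xEF; /* U+FFFD encoded as EF BF BD */
      out->ptr[out->size++] = 0xBF;
      out->ptr[out->size++] = 0xBD;
      i += run; /* four input bytes can become three output bytes */
    }
  }
  return 0;
}

static int is_line_end_char(unsigned char c) {
  return c == '\n' || c == '\r';
}

/* Returns 1 if the validated line lacks a trailing line ending.
 * The length is re-read from the buffer after validation instead of
 * reusing the original input length. */
static int needs_newline(const line_buf *line) {
  size_t bytes = line->size;
  assert(bytes <= LINE_CAP);
  return bytes == 0 || !is_line_end_char(line->ptr[bytes - 1]);
}
```

With the six-byte input 'a', 0xF6, 0x9E, 0x8B, 0x86, '\n', the toy validator produces five bytes. Indexing with the original length would read ptr[5], one byte past the validated content, whereas needs_newline looks at ptr[4], the newline.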

This bug was found using a WIP branch of commonmark integration with google/oss-fuzz which can be found here. I can take care of getting the fuzzer upstreamed into the main oss-fuzz repo if you would like.

Google require one or more email addresses of maintainers to receive crash reports. Are you happy for me to put down your email address as a contact? Are there any other maintainers that it would be useful to CC on crash reports?

kivikakk pushed a commit to github/cmark-gfm that referenced this issue Jun 23, 2017
kivikakk pushed a commit to github/cmark-gfm that referenced this issue Jun 23, 2017
@jgm
Member

jgm commented Jun 23, 2017 via email

@jgm
Member

jgm commented Jun 27, 2017

Closed by #207.

@jgm jgm closed this as completed Jun 27, 2017
talum pushed a commit to github/cmark-gfm that referenced this issue Sep 14, 2021