EbmlUnicodeString: don't read beyond end of string
The conversion from a UTF-8 encoded string into a wchar_t one read
beyond the end of the source buffer if the length indicated by a
UTF-8 character's first byte exceeded the number of bytes actually
present afterwards.

Fixes the issue reported as Cisco TALOS-CAN-0036.
mbunkus committed Oct 20, 2015
1 parent c161e60 commit ababb64
Showing 2 changed files with 47 additions and 23 deletions.
9 changes: 9 additions & 0 deletions ChangeLog
@@ -1,3 +1,12 @@
+2015-10-20 Moritz Bunkus <moritz@bunkus.org>
+
+* EbmlUnicodeString::UpdateFromUTF8(): Fixed an invalid memory
+access. When reading from a UTF-8 string in which the length
+indicated by a UTF-8 character's first byte exceeds the string's
+actual number of bytes the parser would access beyond the end of
+the string resulting in a heap information leak. Fixes the issue
+reported as Cisco TALOS-CAN-0036.
+
 2015-10-17 Moritz Bunkus <moritz@bunkus.org>

 * Released v1.3.2.
61 changes: 38 additions & 23 deletions src/EbmlUnicodeString.cpp
@@ -47,6 +47,21 @@ START_LIBEBML_NAMESPACE

 // ===================== UTFstring class ===================

+static unsigned int UTFCharLength(uint8 lead)
+{
+  if (lead < 0x80)
+    return 1;
+  else if ((lead >> 5) == 0x6)
+    return 2;
+  else if ((lead >> 4) == 0xe)
+    return 3;
+  else if ((lead >> 3) == 0x1e)
+    return 4;
+  else
+    // Invalid size?
+    return 0;
+}
+
 UTFstring::UTFstring()
   :_Length(0)
   ,_Data(NULL)
@@ -143,39 +158,39 @@ void UTFstring::UpdateFromUTF8()
   delete [] _Data;
   // find the size of the final UCS-2 string
   size_t i;
-  for (_Length=0, i=0; i<UTF8string.length(); _Length++) {
-    uint8 lead = static_cast<uint8>(UTF8string[i]);
-    if (lead < 0x80)
-      i++;
-    else if ((lead >> 5) == 0x6)
-      i += 2;
-    else if ((lead >> 4) == 0xe)
-      i += 3;
-    else if ((lead >> 3) == 0x1e)
-      i += 4;
+  const size_t SrcLength = UTF8string.length();
+  for (_Length=0, i=0; i<SrcLength; _Length++) {
+    const unsigned int CharLength = UTFCharLength(static_cast<uint8>(UTF8string[i]));
+    if ((CharLength >= 1) && (CharLength <= 4))
+      i += CharLength;
     else
       // Invalid size?
       break;
   }
   _Data = new wchar_t[_Length+1];
   size_t j;
-  for (j=0, i=0; i<UTF8string.length(); j++) {
-    uint8 lead = static_cast<uint8>(UTF8string[i]);
-    if (lead < 0x80) {
+  for (j=0, i=0; i<SrcLength; j++) {
+    const uint8 lead = static_cast<uint8>(UTF8string[i]);
+    const unsigned int CharLength = UTFCharLength(lead);
+    if ((CharLength < 1) || (CharLength > 4))
+      // Invalid char?
+      break;
+
+    if ((i + CharLength) > SrcLength)
+      // Guard against invalid memory access beyond the end of the
+      // source buffer.
+      break;
+
+    if (CharLength == 1)
       _Data[j] = lead;
-      i++;
-    } else if ((lead >> 5) == 0x6) {
+    else if (CharLength == 2)
       _Data[j] = ((lead & 0x1F) << 6) + (UTF8string[i+1] & 0x3F);
-      i += 2;
-    } else if ((lead >> 4) == 0xe) {
+    else if (CharLength == 3)
       _Data[j] = ((lead & 0x0F) << 12) + ((UTF8string[i+1] & 0x3F) << 6) + (UTF8string[i+2] & 0x3F);
-      i += 3;
-    } else if ((lead >> 3) == 0x1e) {
+    else if (CharLength == 4)
       _Data[j] = ((lead & 0x07) << 18) + ((UTF8string[i+1] & 0x3F) << 12) + ((UTF8string[i+2] & 0x3F) << 6) + (UTF8string[i+3] & 0x3F);
-      i += 4;
-    } else
-      // Invalid char?
-      break;
+
+    i += CharLength;
   }
   _Data[j] = 0;
 }
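
For readers who want to try the guarded loop outside of libebml, the following is a minimal stand-alone sketch, not the library's code. The names `CharLengthOf` and `DecodeBounded` are hypothetical, and the sketch assumes `std::string` input and a `std::vector<std::uint32_t>` output instead of the `UTFstring` members used in the commit. It mirrors the two checks of the patched second loop: stop on an invalid lead byte, and stop when the announced length would run past the end of the source.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper mirroring the lead-byte classification in the patch:
// returns the sequence length announced by the lead byte, or 0 if invalid.
static unsigned int CharLengthOf(std::uint8_t lead)
{
  if (lead < 0x80)
    return 1;
  if ((lead >> 5) == 0x06)
    return 2;
  if ((lead >> 4) == 0x0e)
    return 3;
  if ((lead >> 3) == 0x1e)
    return 4;
  return 0;
}

// Hypothetical stand-alone decoder (not part of libebml). Decoding stops at
// the first invalid or truncated sequence, so no byte past the end of `src`
// is ever read.
static std::vector<std::uint32_t> DecodeBounded(const std::string &src)
{
  std::vector<std::uint32_t> out;
  const std::size_t srcLength = src.size();
  std::size_t i = 0;

  // Read byte `k` of `src` as an unsigned value.
  auto byteAt = [&src](std::size_t k) {
    return static_cast<std::uint32_t>(static_cast<std::uint8_t>(src[k]));
  };

  while (i < srcLength) {
    const std::uint8_t lead = static_cast<std::uint8_t>(src[i]);
    const unsigned int charLength = CharLengthOf(lead);

    if (charLength == 0)
      break;  // invalid lead byte

    if (i + charLength > srcLength)
      break;  // announced length exceeds the bytes actually present

    std::uint32_t cp = 0;
    if (charLength == 1)
      cp = lead;
    else if (charLength == 2)
      cp = ((lead & 0x1Fu) << 6) | (byteAt(i + 1) & 0x3Fu);
    else if (charLength == 3)
      cp = ((lead & 0x0Fu) << 12) | ((byteAt(i + 1) & 0x3Fu) << 6)
         | (byteAt(i + 2) & 0x3Fu);
    else
      cp = ((lead & 0x07u) << 18) | ((byteAt(i + 1) & 0x3Fu) << 12)
         | ((byteAt(i + 2) & 0x3Fu) << 6) | (byteAt(i + 3) & 0x3Fu);

    out.push_back(cp);
    i += charLength;
  }
  return out;
}
```

With this sketch, `DecodeBounded("a\xE2\x82\xAC")` yields the two code points 0x61 and 0x20AC, while the truncated input `"a\xE2\x82"` yields only 0x61: the lead byte 0xE2 announces three bytes but only two remain, so the loop stops instead of reading past the buffer. Like the patched code, the sketch does not validate continuation bytes.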