Broken text length and entity positions for a message that contains text with emojis #413

Closed
@unintended

Description

Context

  • Operating System:
  • Python Version: 3.7.7
  • aiogram version: 2.9.2

If you send the message "🚀Hello Hello" (with "Hello Hello" in bold) to the bot and call message.parse_entities(False), you get 🚀H*ello Hello*.

Expected Behavior

message.parse_entities(False) returns 🚀*Hello Hello*

Current Behavior

message.parse_entities(False) returns 🚀H*ello Hello*

Failure Information (for bugs)

The problem is in the JSON conversion and the decoding of emoji surrogate pairs. I inspected the protocol traffic, and the getUpdates method returns the following JSON:
{"ok":true,"result":[{"update_id":337194970,"message":{"message_id":12146,"from":{"id":535059,"is_bot":false,"first_name":"\u0418\u043b\u044c\u044f","username":"igonzo","language_code":"en"},"chat":{"id":535059,"first_name":"\u0418\u043b\u044c\u044f","username":"igonzo","type":"private"},"date":1599124626,"text":"\ud83d\ude80Hello Hello","entities":[{"offset":2,"length":11,"type":"bold"}]}}]}
So "text" is 13 UTF-16 code units long, and the bold entity starts at offset 2 (the third code unit) because the emoji is encoded as a surrogate pair of two code units (\ud83d\ude80). After all the transformations, however, the message object holds a text of 12 Unicode characters (the pair \ud83d\ude80 is joined into the single character 🚀), while the entity offsets and lengths still refer to the 13-unit text. Applying the entities to the decoded text (as message.parse_entities() does) therefore places them one position too far to the right.
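
The mismatch described above can be reproduced without a bot. The following is a minimal sketch, not aiogram's actual code: it decodes only the "text" and "entities" fields from the response above, and the helpers utf16_len and slice_utf16 are hypothetical names introduced here for illustration. It assumes entity offsets and lengths are counted in UTF-16 code units, which matches the JSON shown.

```python
import json

# "text" and "entities" copied from the getUpdates response above.
# A raw string keeps the \ud83d\ude80 escapes for the JSON decoder.
raw = r'{"text":"\ud83d\ude80Hello Hello","entities":[{"offset":2,"length":11,"type":"bold"}]}'
message = json.loads(raw)
text = message["text"]
entity = message["entities"][0]


def utf16_len(s: str) -> int:
    """Hypothetical helper: number of UTF-16 code units in s."""
    return len(s.encode("utf-16-le")) // 2


def slice_utf16(s: str, offset: int, length: int) -> str:
    """Hypothetical helper: slice s by an offset and length in UTF-16 code units."""
    encoded = s.encode("utf-16-le")  # two bytes per code unit
    return encoded[offset * 2:(offset + length) * 2].decode("utf-16-le")


# Naive slicing treats the offset as a code point index and drops the "H".
naive = text[entity["offset"]:entity["offset"] + entity["length"]]
# Slicing in UTF-16 code units recovers the intended span.
correct = slice_utf16(text, entity["offset"], entity["length"])

print(len(text))        # 12 (code points after decoding)
print(utf16_len(text))  # 13 (UTF-16 code units, as in the JSON)
print(repr(naive))      # 'ello Hello'
print(repr(correct))    # 'Hello Hello'
```

Slicing on the UTF-16 encoding is one possible way to apply an entity correctly; converting each offset to a code point index before slicing would be an equivalent alternative.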

Steps to Reproduce

  1. Run the bot with the following code:
import os

from aiogram import Bot, Dispatcher, executor, types
from aiogram.types import ContentType

bot = Bot(token=os.environ['BOT_TOKEN'])
dp = Dispatcher(bot)


@dp.message_handler(content_types=ContentType.ANY)
async def demarkdown(message: types.Message):
    # Reply with the message text rebuilt from its entities (as_html=False).
    await message.reply(message.parse_entities(False),
                        disable_web_page_preview=True)


if __name__ == '__main__':
    executor.start_polling(dp)

  2. Send "🚀Hello Hello" to the bot (formatted)
  3. The bot replies with 🚀H*ello Hello*

Labels: bug (Something is wrong with the framework)