Description
Context
- Operating System:
- Python Version: 3.7.7
- aiogram version: 2.9.2
If you send the message "🚀Hello Hello" (with "Hello Hello" formatted as bold) to the bot and call message.parse_entities(False), you get 🚀H*ello Hello*
Expected Behavior
message.parse_entities(False) returns 🚀*Hello Hello*
Current Behavior
message.parse_entities(False) returns 🚀H*ello Hello*
Failure Information (for bugs)
The problem is in the JSON conversion and the decoding of emoji surrogate pairs. I debugged the protocol, and the getUpdates method returns the following JSON:
{"ok":true,"result":[{"update_id":337194970,"message":{"message_id":12146,"from":{"id":535059,"is_bot":false,"first_name":"\u0418\u043b\u044c\u044f","username":"igonzo","language_code":"en"},"chat":{"id":535059,"first_name":"\u0418\u043b\u044c\u044f","username":"igonzo","type":"private"},"date":1599124626,"text":"\ud83d\ude80Hello Hello","entities":[{"offset":2,"length":11,"type":"bold"}]}}]}
So the "text" is 13 UTF-16 code units long, and the bold entity starts at offset 2 because the emoji is encoded as two code units, a surrogate pair (\ud83d\ude80). However, after all the transformations the message object contains a text of 12 Unicode characters (because \ud83d\ude80 is joined into the single character 🚀), while the entity offsets and lengths still refer to the 13-unit text. When you apply the entities to the text (as message.parse_entities() does), they end up in the wrong positions.
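The mismatch can be reproduced without aiogram. The sketch below is only an illustration, not the library's code: utf16_len and wrap_entity are hypothetical helpers, and the "*" marker stands in for whatever markup parse_entities emits. It decodes the "text" value from the JSON above, applies the entity with plain Python string indices (the wrong result), and then applies it after converting the offsets from UTF-16 code units (the expected result).

```python
import json


def utf16_len(text: str) -> int:
    """Length of text in UTF-16 code units (hypothetical helper)."""
    return len(text.encode("utf-16-le")) // 2


def wrap_entity(text: str, offset: int, length: int, marker: str = "*") -> str:
    """Wrap a span given in UTF-16 code units with marker (hypothetical helper)."""
    raw = text.encode("utf-16-le")  # 2 bytes per code unit, no BOM
    start, end = offset * 2, (offset + length) * 2
    before = raw[:start].decode("utf-16-le")
    inside = raw[start:end].decode("utf-16-le")
    after = raw[end:].decode("utf-16-le")
    return f"{before}{marker}{inside}{marker}{after}"


# The "text" value exactly as it appears in the getUpdates response.
text = json.loads(r'"\ud83d\ude80Hello Hello"')
offset, length = 2, 11  # the bold entity from the same response

print(len(text))        # 12 characters: the surrogate pair became one character
print(utf16_len(text))  # 13 UTF-16 code units

# Naive: treat the entity offsets as Python string indices.
naive = text[:offset] + "*" + text[offset:offset + length] + "*" + text[offset + length:]
print(naive)            # 🚀H*ello Hello*

# Code-unit aware: slice the UTF-16 encoding instead.
fixed = wrap_entity(text, offset, length)
print(fixed)            # 🚀*Hello Hello*
```

Slicing the UTF-16 encoding keeps the offsets in the same unit that the entities use, so any character outside the Basic Multilingual Plane is counted as two units, as in the raw response.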
Steps to Reproduce
- Run the bot with the following code:
import os

from aiogram import Bot, Dispatcher, executor, types
from aiogram.types import ContentType

bot = Bot(token=os.environ['BOT_TOKEN'])
dp = Dispatcher(bot)


@dp.message_handler(content_types=ContentType.ANY)
async def demarkdown(message: types.Message):
    # Reply with the message text after applying its entities as markup
    await message.reply(message.parse_entities(False),
                        disable_web_page_preview=True)


if __name__ == '__main__':
    executor.start_polling(dp)
- Send "🚀Hello Hello" to the bot, with "Hello Hello" formatted as bold
- The bot replies with 🚀H*ello Hello*