
Stack Overflow at api.get_updates in example async_reply_to_message_updates.rs #61

Closed
johannesvollmer opened this issue May 5, 2022 · 28 comments

Comments

@johannesvollmer
Contributor

johannesvollmer commented May 5, 2022

Hi! I tried running the example async_reply_to_message_updates.rs. It crashed with a stack overflow error:

thread 'main' has overflowed its stack
error: process didn't exit successfully: `target\debug\telegram-bot-test.exe` (exit code: 0xc00000fd, STATUS_STACK_OVERFLOW)

Process finished with exit code -1073741571 (0xC00000FD)

I was able to pinpoint the overflowing call in the example:

let result = api.get_updates(&update_params).await;

Running frankenstein = { version = "0.13.0", features = ["async-http-client"] }, rustc 1.60.0 (7737e0b5c 2022-04-04), Win 10.

I will investigate further. Maybe the error is within the example, or in my project setup.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

The JSON deserialization seems to be the problem; the stack overflow seems to happen here:

let json_result: Result<T, serde_json::Error> = serde_json::from_str(body);

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

Note: the bot already existed and has some updates. Telegram sends the same JSON every time. It's a JSON list of updates; nothing looks suspicious to me.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

This is an excerpt of the response from Telegram that crashed the JSON parser (IDs scrambled):

{
    "ok": true,
    "result": [
        {
            "update_id": 77345345072,
            "poll": {
                "id": "5345345261554",
                "question": "What would you like to order, master?",
                "options": [
                    {
                        "text": "The Amici for some unhealthy snack",
                        "voter_count": 1
                    },
                    {
                        "text": "The Beef Club for distinguished Ladies and Gentlemen",
                        "voter_count": 0
                    },
                    {
                        "text": "Kaishi Combo for the extraordinarily exotic",
                        "voter_count": 1
                    }
                ],
                "total_voter_count": 2,
                "is_closed": false,
                "is_anonymous": true,
                "type": "regular",
                "allows_multiple_answers": false
            }
        },
        {
            "update_id": 7343450073,
            "poll": {
                "id": "544757133453451554",
                "question": "What would you like to order, master?",
                "options": [
                    {
                        "text": "The Amici for some unhealthy snack",
                        "voter_count": 1
                    },
                    {
                        "text": "The Beef Club for distinguished Ladies and Gentlemen",
                        "voter_count": 0
                    },
                    {
                        "text": "Kaishi Combo for the extraordinarily exotic",
                        "voter_count": 0
                    }
                ],
                "total_voter_count": 1,
                "is_closed": false,
                "is_anonymous": true,
                "type": "regular",
                "allows_multiple_answers": false
            }
        },
        {
            "update_id": 7734345074,
            "message": {
                "message_id": 34345,
                "from": {
                    "id": 2934534548,
                    "is_bot": false,
                    "first_name": "xxxx"
                },
                "chat": {
                    "id": -13453457001,
                    "title": "bot test group",
                    "type": "supergroup"
                },
                "date": 1634535345,
                "text": "Nice"
            }
        },

    ]
}

and then some more updates; about 500 lines in total when the JSON is formatted.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

The example async_get_me.rs works just fine.
The example reply_to_message_updates.rs also overflows.

@johannesvollmer
Contributor Author

It also overflows for the following response (IDs scrambled):

{
    "ok": true,
    "result": [
        {
            "update_id": 234523452345,
            "message": {
                "message_id": 234532452345,
                "from": {
                    "id": 23453452435,
                    "is_bot": false,
                    "first_name": "Johannes",
                    "username": "cvyxcvxycv",
                    "language_code": "en"
                },
                "chat": {
                    "id": 234523452435,
                    "first_name": "Johannes",
                    "username": "cvyxcvxycv",
                    "type": "private"
                },
                "date": 234523452345,
                "text": "hi"
            }
        }
    ]
}

@ayrat555
Owner

ayrat555 commented May 5, 2022

@johannesvollmer Yes, this issue is happening only on Windows. We need to investigate it.

I haven't seen it happening on *nix.

cc @EdJoPaTo

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

I tried adding the following test to the async_telegram_api_impls.rs test module (IDs scrambled):

    #[test]
    fn parse_message_success() {
        let response_string = r#"
        {
            "ok": true,
            "result": [
                {
                    "update_id": 234523452345,
                    "message": {
                        "message_id": 234532452345,
                        "from": {
                            "id": 23453452435,
                            "is_bot": false,
                            "first_name": "XXX",
                            "username": "cvyxcvxycv",
                            "language_code": "en"
                        },
                        "chat": {
                            "id": 234523452435,
                            "first_name": "XXX",
                            "username": "cvyxcvxycv",
                            "type": "private"
                        },
                        "date": 234523452345,
                        "text": "hi"
                    }
                }
            ]
        }
        "#;

        let _json_result: MethodResponse<Vec<Update>> =
            serde_json::from_str(response_string).unwrap();
    }

At first I couldn't even get it to run lol

I got it to run eventually, but the test did not fail. Running the example still overflows, though, exactly at the serde_json call.

@johannesvollmer
Contributor Author

Have you tried creating a bot, sending it messages on Telegram before starting it even once, and then running the bot with those pending updates?

@EdJoPaTo
Collaborator

EdJoPaTo commented May 5, 2022

Maybe #8 is related here, as it was a Windows-only problem too? Would be interesting to find out the cause.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

I wanted to present it to my colleagues tomorrow. Any ideas what I can try next?

Calling serde_json::from_str(message) succeeds within a test, but fails while running the example. The test uses the same message string as the example, so how can this be? Maybe some kind of global state differs between test mode and example mode?

This is what I tried, instrumenting the line

let json_result: Result<T, serde_json::Error> = serde_json::from_str(&message);

with some dbg! calls:

dbg!(&message);
dbg!(std::any::type_name::<T>()); // gets printed as expected.

let json_result: Result<T, serde_json::Error> = serde_json::from_str(&message);

dbg!(json_result.is_ok()); // never gets printed.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

Interesting news! When splitting serde_json::from_str into two separate stages, it works:

Replace

let json_result: Result<T, serde_json::Error> = serde_json::from_str(&message);

with

let json_result: Result<T, serde_json::Error> = serde_json::Value::from_str(&message).and_then(serde_json::from_value);

Of course, this is barely a work-around; performance will be inferior due to the increased number of allocations and the two stages.
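
For reference, the two-stage workaround above could be wrapped in a small generic helper. This is only a sketch (parse_two_stage is not part of the crate):

use std::str::FromStr;

use serde::de::DeserializeOwned;

// Parse into a dynamic serde_json::Value first, then convert into the target
// type, instead of deserializing T straight from the string.
fn parse_two_stage<T: DeserializeOwned>(body: &str) -> Result<T, serde_json::Error> {
    serde_json::Value::from_str(body).and_then(serde_json::from_value)
}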

@johannesvollmer
Contributor Author

@EdJoPaTo Yes, this seems like the same problem.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

After reading rust-lang/rust#80289, I now have a theory:

  • Stack memory is 8 MB on Unix by default, but only 1 MB on Windows.
  • Stack memory is exhausted differently when running tests vs. when running full examples.
  • The JSON deserialization might allocate multiple large objects:
    note: std::mem::size_of::<Update>() is 30384 bytes. This means a single Update instance is about 30 KB! I presume this type is allocated a lot, maybe even recursively, in the JSON deserialization function.

If this theory turns out to be true, I see three solutions:

  • The library user increases the stack size to a larger number, maybe 10 MB (see the sketch after this list). This can be done immediately without updating the crate, but how could we document it?
  • The library finds a way of reducing stack memory:
    • use an enum instead of having a ton of optional fields in struct Update and in struct Message (assuming most fields will never be set at the same time)
    • use Option<Box<Message>> in struct Update, or box some fields in the Message type (may have performance implications)
  • serde finds a way to allocate large structs on the heap.
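
As a minimal sketch of the first option (run_bot is only a hypothetical stand-in for whatever the example's main body currently does, not a frankenstein API), the whole bot could simply be run on a thread with a larger stack:

use std::thread;

fn main() {
    // Run the bot on a thread with a 10 MB stack instead of relying on the
    // ~1 MB default main-thread stack on Windows.
    let handle = thread::Builder::new()
        .stack_size(10 * 1024 * 1024)
        .spawn(run_bot)
        .expect("failed to spawn bot thread");
    handle.join().unwrap();
}

// Hypothetical stand-in for the example's existing update-polling logic.
fn run_bot() {
    // ... poll get_updates and handle the results here ...
}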

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

The following observations make me think the stack overflow theory is actually legitimate: boxing all Message fields in struct Update finally made the example run on my device. The type Update, formerly about 30 KB, is now 9872 bytes, which is about 10 KB. Boxing all the fields in Update makes it about 120 bytes.
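
For illustration, here is a minimal sketch (with made-up stand-in types, not the real frankenstein definitions) of why boxing a large optional field shrinks the outer struct so drastically:

use std::mem::size_of;

// Stand-in for a large Message-like struct with many inline fields.
struct BigMessage {
    _payload: [u8; 4096],
}

// Storing the value inline: the whole BigMessage counts towards the size.
struct UpdateInline {
    _update_id: u32,
    _message: Option<BigMessage>,
}

// Storing it boxed: only a pointer (with a niche for None) is stored inline.
struct UpdateBoxed {
    _update_id: u32,
    _message: Option<Box<BigMessage>>,
}

fn main() {
    println!("inline: {} bytes", size_of::<UpdateInline>()); // several KB
    println!("boxed:  {} bytes", size_of::<UpdateBoxed>()); // a few machine words
    assert!(size_of::<UpdateBoxed>() < size_of::<UpdateInline>());
}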

However, I think making these fields an enum instead of boxing them would be the cleanest solution.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

I have a fork with the quick-fix changes; it seems to work for now. If the enum solution turns out to take too long, this could serve as a temporary work-around. What do you think? Any ideas on how we can verify the stack overflow theory?

@EdJoPaTo
Collaborator

EdJoPaTo commented May 5, 2022

That's interesting!

Boxing everything seems like horrible code to me. And the storage is not gone, it's just elsewhere (on the heap), shifting the problem. Designing the structs differently to improve their memory usage (wherever it is stored) is the best solution to me. This should solve the problem and result in more effective RAM use on all platforms.

Using an enum for Update seems like an interesting solution here. Because it is used a lot, changes here should have the most impact overall, even if that requires custom (de)serialization code. Taking a look at teloxide, they did exactly that: https://github.com/teloxide/teloxide-core/blob/master/src/types/update.rs (their code is MIT licensed).

Not sure if we could just use the teloxide-core crate for the types in general 🤔

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

Even if that requires custom (de)serialization code.

I think we might get away without custom de/serialization by using externally tagged enums and struct flattening. That way, we might just do the following change (field renaming omitted for brevity):

#[derive(Serialize, Deserialize)]
struct Update {
  update_id: u32,

  #[serde(flatten)]
  content: UpdateContent,
}

#[derive(Serialize, Deserialize)]
enum UpdateContent {

  #[serde(rename="message")]
  Message(Message),

  EditedMessage(Message),
  Poll(Poll),
  // ...
}

If I'm not mistaken, this should result in the following JSON:

{
  "update_id": 32,
  "message": {
    "message_id": 32,
    ...
  } 
}

Which is the JSON we currently have, isn't it?
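
A quick round-trip sketch (with trimmed-down stand-in types, not the real frankenstein ones) suggests serde indeed produces and accepts exactly that shape:

use serde::{Deserialize, Serialize};
use serde_json::json;

#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Message {
    message_id: u32,
    text: String,
}

#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Update {
    update_id: u32,
    #[serde(flatten)]
    content: UpdateContent,
}

#[derive(Serialize, Deserialize, Debug, PartialEq)]
#[serde(rename_all = "snake_case")]
enum UpdateContent {
    Message(Message),
    EditedMessage(Message),
}

fn main() {
    let json = json!({
        "update_id": 32,
        "message": { "message_id": 32, "text": "hi" }
    });

    // The externally tagged variant name becomes the "message" key next to update_id.
    let update: Update = serde_json::from_value(json.clone()).unwrap();
    assert_eq!(
        update.content,
        UpdateContent::Message(Message { message_id: 32, text: "hi".into() })
    );

    // Serializing back yields the same flattened shape.
    assert_eq!(serde_json::to_value(&update).unwrap(), json);
}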

@johannesvollmer
Contributor Author

Again (I'm unsure how to interpret the official Telegram documentation), this will only work if these states never occur at the same time; otherwise an enum obviously won't make sense.

@EdJoPaTo
Collaborator

EdJoPaTo commented May 5, 2022

Again (I'm unsure how to interpret the official Telegram documentation), this will only work if these states never occur at the same time; otherwise an enum obviously won't make sense.

The official documentation states exactly that: https://core.telegram.org/bots/api#update

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

Haha, I didn't find that one. So, any objections to the enum approach? @ayrat555

For memory representation this would be optimal, since the enum will only be about as large as its largest variant, which is probably the Message type. That type would probably also be refactored to contain an enum, so, in the end, memory usage should go down by a lot.

@ayrat555
Owner

ayrat555 commented May 5, 2022

No objections.

@johannesvollmer Great investigation skills, good work!

Not sure if we could just use the teloxide-core crate for the types in general 🤔

@EdJoPaTo I'd rather keep teloxide and Frankenstein independent.

@EdJoPaTo
Collaborator

EdJoPaTo commented May 5, 2022

@EdJoPaTo I'd rather keep teloxide and Frankenstein independent.

Their types look useful. Using the same Bot API types in two projects would simplify a lot of work, and they will be the same for every Telegram library. But yeah, teloxide-core seems to do more than just the types.

@johannesvollmer
Contributor Author

Also, I noticed that the allowed_updates functionality could be made type-safe. Right now, one might pass an invalid string to the function.

Look at teloxide for inspiration: https://github.com/teloxide/teloxide-core/blob/master/src/types/allowed_update.rs
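
A sketch of what that could look like (variant names are illustrative, taken from the Bot API update types; this is not the current frankenstein API):

use serde::{Deserialize, Serialize};

// Type-safe alternative to passing raw strings to allowed_updates.
#[derive(Serialize, Deserialize, Debug, Clone, Copy, PartialEq, Eq)]
#[serde(rename_all = "snake_case")]
enum AllowedUpdate {
    Message,
    EditedMessage,
    ChannelPost,
    EditedChannelPost,
    InlineQuery,
    CallbackQuery,
    Poll,
    PollAnswer,
}

fn main() {
    // Serializes to exactly the strings the Bot API expects.
    let allowed = vec![AllowedUpdate::Message, AllowedUpdate::EditedMessage];
    println!("{}", serde_json::to_string(&allowed).unwrap());
    // prints: ["message","edited_message"]
}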

@EdJoPaTo
Collaborator

EdJoPaTo commented May 5, 2022

Go ahead and create some pull requests. I will happily review them :)

@johannesvollmer
Contributor Author

johannesvollmer commented May 6, 2022

In the Message documentation, I can't find any statement on whether some of the fields can be present at the same time. Nevertheless, teloxide has an enum for some of the fields.

There's probably no guarantee that two of those fields won't both be present at some point in the future. Do we want to take the risk?

@ayrat555
Owner

ayrat555 commented May 9, 2022

I think as a first step we can just migrate Update to an enum.

By the way, @johannesvollmer, are you working on this?

I can try to pick this up this week (most probably on the weekend) if you're not.

@johannesvollmer
Contributor Author

johannesvollmer commented May 9, 2022

I had the change done this morning; it actually was only a five-minute task. I feel like some more tests should be added, though.

See #62

@johannesvollmer
Contributor Author

Merged. Awesome, thanks!
