
Stack Overflow at api.get_updates in example async_reply_to_message_updates.rs #61

Closed
johannesvollmer opened this issue May 5, 2022 · 28 comments

Comments

@johannesvollmer
Contributor

johannesvollmer commented May 5, 2022

Hi! I tried running the example async_reply_to_message_updates.rs. It crashed with a stack overflow error:

thread 'main' has overflowed its stack
error: process didn't exit successfully: `target\debug\telegram-bot-test.exe` (exit code: 0xc00000fd, STATUS_STACK_OVERFLOW)

Process finished with exit code -1073741571 (0xC00000FD)

I was able to pinpoint the overflowing call in the example:

let result = api.get_updates(&update_params).await;

Running frankenstein = { version = "0.13.0", features = ["async-http-client"] }, rustc 1.60.0 (7737e0b5c 2022-04-04), Win 10.

I will investigate further. Maybe the error is within the example, or in my project setup.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

The JSON deserialization seems to be the problem; the stack overflow seems to happen here:

let json_result: Result<T, serde_json::Error> = serde_json::from_str(body);

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

Note: the bot already existed and has some updates. Telegram sends the same JSON every time. It's a JSON list of updates; nothing looks suspicious to me.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

This is an excerpt of the response from Telegram that crashed the JSON parser (IDs scrambled):

{
    "ok": true,
    "result": [
        {
            "update_id": 77345345072,
            "poll": {
                "id": "5345345261554",
                "question": "What would you like to order, master?",
                "options": [
                    {
                        "text": "The Amici for some unhealthy snack",
                        "voter_count": 1
                    },
                    {
                        "text": "The Beef Club for distinguished Ladies and Gentlemen",
                        "voter_count": 0
                    },
                    {
                        "text": "Kaishi Combo for the extraordinarily exotic",
                        "voter_count": 1
                    }
                ],
                "total_voter_count": 2,
                "is_closed": false,
                "is_anonymous": true,
                "type": "regular",
                "allows_multiple_answers": false
            }
        },
        {
            "update_id": 7343450073,
            "poll": {
                "id": "544757133453451554",
                "question": "What would you like to order, master?",
                "options": [
                    {
                        "text": "The Amici for some unhealthy snack",
                        "voter_count": 1
                    },
                    {
                        "text": "The Beef Club for distinguished Ladies and Gentlemen",
                        "voter_count": 0
                    },
                    {
                        "text": "Kaishi Combo for the extraordinarily exotic",
                        "voter_count": 0
                    }
                ],
                "total_voter_count": 1,
                "is_closed": false,
                "is_anonymous": true,
                "type": "regular",
                "allows_multiple_answers": false
            }
        },
        {
            "update_id": 7734345074,
            "message": {
                "message_id": 34345,
                "from": {
                    "id": 2934534548,
                    "is_bot": false,
                    "first_name": "xxxx"
                },
                "chat": {
                    "id": -13453457001,
                    "title": "bot test group",
                    "type": "supergroup"
                },
                "date": 1634535345,
                "text": "Nice"
            }
        },

    ]
}

and then some more updates; about 500 lines in total when the JSON is formatted.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

The example async_get_me.rs works just fine.
The example reply_to_message_updates.rs also overflows.

@johannesvollmer
Contributor Author

It also overflows for the following response (IDs scrambled):

{
    "ok": true,
    "result": [
        {
            "update_id": 234523452345,
            "message": {
                "message_id": 234532452345,
                "from": {
                    "id": 23453452435,
                    "is_bot": false,
                    "first_name": "Johannes",
                    "username": "cvyxcvxycv",
                    "language_code": "en"
                },
                "chat": {
                    "id": 234523452435,
                    "first_name": "Johannes",
                    "username": "cvyxcvxycv",
                    "type": "private"
                },
                "date": 234523452345,
                "text": "hi"
            }
        }
    ]
}

@ayrat555
Owner

ayrat555 commented May 5, 2022

@johannesvollmer Yes, this issue is happening only on Windows. We need to investigate it.

I haven't seen it happening on *nix.

cc @EdJoPaTo

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

I tried adding the following test to the async_telegram_api_impls.rs test module (IDs scrambled):

    #[test]
    fn parse_message_success() {
        let response_string = r#"
        {
            "ok": true,
            "result": [
                {
                    "update_id": 234523452345,
                    "message": {
                        "message_id": 234532452345,
                        "from": {
                            "id": 23453452435,
                            "is_bot": false,
                            "first_name": "XXX",
                            "username": "cvyxcvxycv",
                            "language_code": "en"
                        },
                        "chat": {
                            "id": 234523452435,
                            "first_name": "XXX",
                            "username": "cvyxcvxycv",
                            "type": "private"
                        },
                        "date": 234523452345,
                        "text": "hi"
                    }
                }
            ]
        }
        "#;

        let _json_result: MethodResponse<Vec<Update>> =
            serde_json::from_str(response_string).unwrap();
    }

At first I couldn't even get it to run lol

I got it to run eventually, but the test did not fail. Running the example still overflows, though, exactly at the serde_json call.

@johannesvollmer
Contributor Author

Have you tried creating a bot, sending it messages on Telegram before starting it even once, and then running the bot with those pending updates?

@EdJoPaTo
Collaborator

EdJoPaTo commented May 5, 2022

Maybe #8 is related here, as it was a Windows-only problem too? Would be interesting to find out the cause.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

I wanted to present it to my colleagues tomorrow. Any ideas what I can try next?

Calling serde_json::from_str(message) succeeds within a test, but fails while running the example. The test uses the same message string as the example, so how can this be? Maybe some kind of global state differs between test mode and example mode?

This is what I tried, instrumenting the line

let json_result: Result<T, serde_json::Error> = serde_json::from_str(&message);

with some dbg! calls:

dbg!(&message);
dbg!(std::any::type_name::<T>()); // gets printed as expected.

let json_result: Result<T, serde_json::Error> = serde_json::from_str(&message);

dbg!(json_result.is_ok()); // never gets printed.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

Interesting news! When splitting serde_json::from_str into two separate stages, it works:

Replace

let json_result: Result<T, serde_json::Error> = serde_json::from_str(&message);

with

let json_result: Result<T, serde_json::Error> = serde_json::Value::from_str(&message).and_then(serde_json::from_value);

Of course, this is barely a work-around; performance will be inferior due to the increased number of allocations and the two stages.
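
For reference, the two-stage workaround above could be wrapped in a small generic helper. This is only a sketch (parse_two_stage is not part of the crate):

use std::str::FromStr;

use serde::de::DeserializeOwned;

// Parse into a dynamic serde_json::Value first, then convert into the target
// type, instead of deserializing T straight from the string.
fn parse_two_stage<T: DeserializeOwned>(body: &str) -> Result<T, serde_json::Error> {
    serde_json::Value::from_str(body).and_then(serde_json::from_value)
}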

@johannesvollmer
Contributor Author

@EdJoPaTo Yes, this seems like the same problem.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

After reading rust-lang/rust#80289, I now have a theory:

  • Stack memory is 8 MB on Unix by default, but only 1 MB on Windows.
  • Stack memory is exhausted differently when running tests vs. when running full examples.
  • The JSON deserialization might allocate multiple large objects:
    note: std::mem::size_of::<Update>() is 30384 bytes. This means a single Update instance is about 30 KB! I presume this type is allocated a lot, maybe even recursively, in the JSON deserialization function.

If this theory turns out to be true, I see three solutions:

  • The library user increases the stack size to a larger number, maybe 10 MB (see the sketch after this list). This can be done immediately without updating the crate, but how could we document it?
  • The library finds a way of reducing stack memory:
    • use an enum instead of having a ton of optional fields in struct Update and in struct Message (assuming most fields will never be set at the same time)
    • use Option<Box<Message>> in struct Update, or box some fields in the Message type (may have performance implications)
  • serde finds a way to allocate large structs on the heap.
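
As a minimal sketch of the first option (run_bot is only a hypothetical stand-in for whatever the example's main body currently does, not a frankenstein API), the whole bot could simply be run on a thread with a larger stack:

use std::thread;

fn main() {
    // Run the bot on a thread with a 10 MB stack instead of relying on the
    // ~1 MB default main-thread stack on Windows.
    let handle = thread::Builder::new()
        .stack_size(10 * 1024 * 1024)
        .spawn(run_bot)
        .expect("failed to spawn bot thread");
    handle.join().unwrap();
}

// Hypothetical stand-in for the example's existing update-polling logic.
fn run_bot() {
    // ... poll get_updates and handle the results here ...
}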

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

The following observations make me think the stack overflow theory is actually legitimate: boxing all Message fields in struct Update finally made the example run on my device. The type Update, formerly about 30 KB, is now 9872 bytes, which is about 10 KB. Boxing all the fields in Update makes it about 120 bytes.
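
For illustration, here is a minimal sketch (with made-up stand-in types, not the real frankenstein definitions) of why boxing a large optional field shrinks the outer struct so drastically:

use std::mem::size_of;

// Stand-in for a large Message-like struct with many inline fields.
struct BigMessage {
    _payload: [u8; 4096],
}

// Storing the value inline: the whole BigMessage counts towards the size.
struct UpdateInline {
    _update_id: u32,
    _message: Option<BigMessage>,
}

// Storing it boxed: only a pointer (with a niche for None) is stored inline.
struct UpdateBoxed {
    _update_id: u32,
    _message: Option<Box<BigMessage>>,
}

fn main() {
    println!("inline: {} bytes", size_of::<UpdateInline>()); // several KB
    println!("boxed:  {} bytes", size_of::<UpdateBoxed>()); // a few machine words
    assert!(size_of::<UpdateBoxed>() < size_of::<UpdateInline>());
}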

However, I think making these fields an enum instead of boxing them would be the cleanest solution.

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

I have a fork with the quick-fix changes; it seems to work for now. If the enum solution turns out to take too long, this could serve as a temporary work-around. What do you think? Any ideas on how we can verify the stack overflow theory?

@EdJoPaTo
Collaborator

EdJoPaTo commented May 5, 2022

That's interesting!

Boxing everything seems like horrible code to me. And the storage is not gone, it's just elsewhere (on the heap), shifting the problem. Designing the structs differently to improve their memory usage (wherever it is stored) is the best solution to me. This should solve the problem and result in more effective RAM use on all platforms.

Using an enum for Update seems like an interesting solution here. Because it is used a lot, changes here should have the most impact overall, even if that requires custom (de)serialization code. Taking a look at teloxide, they did exactly that: https://github.com/teloxide/teloxide-core/blob/master/src/types/update.rs (their code is MIT licensed).

Not sure if we could just use the teloxide-core crate for the types in general 🤔

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

Even if that requires custom (de)serialization code.

I think we might get away without custom de/serialization by using externally tagged enums and struct flattening. That way, we might just do the following change (field renaming omitted for brevity):

#[derive(Serialize, Deserialize)]
struct Update {
  update_id: u32,

  #[serde(flatten)]
  content: UpdateContent,
}

#[derive(Serialize, Deserialize)]
enum UpdateContent {

  #[serde(rename="message")]
  Message(Message),

  EditedMessage(Message),
  Poll(Poll),
  // ...
}

If I'm not mistaken, this should result in the following JSON:

{
  "update_id": 32,
  "message": {
    "message_id": 32,
    ...
  } 
}

Which is the JSON we currently have, isn't it?
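
A quick round-trip sketch (with trimmed-down stand-in types, not the real frankenstein ones) suggests serde indeed produces and accepts exactly that shape:

use serde::{Deserialize, Serialize};
use serde_json::json;

#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Message {
    message_id: u32,
    text: String,
}

#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Update {
    update_id: u32,
    #[serde(flatten)]
    content: UpdateContent,
}

#[derive(Serialize, Deserialize, Debug, PartialEq)]
#[serde(rename_all = "snake_case")]
enum UpdateContent {
    Message(Message),
    EditedMessage(Message),
}

fn main() {
    let json = json!({
        "update_id": 32,
        "message": { "message_id": 32, "text": "hi" }
    });

    // The externally tagged variant name becomes the "message" key next to update_id.
    let update: Update = serde_json::from_value(json.clone()).unwrap();
    assert_eq!(
        update.content,
        UpdateContent::Message(Message { message_id: 32, text: "hi".into() })
    );

    // Serializing back yields the same flattened shape.
    assert_eq!(serde_json::to_value(&update).unwrap(), json);
}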

@johannesvollmer
Contributor Author

Again (I'm unsure how to interpret the official Telegram documentation), this will only work if these states never occur at the same time; otherwise an enum obviously won't make sense.

@EdJoPaTo
Collaborator

EdJoPaTo commented May 5, 2022

Again (I'm unsure how to interpret the official Telegram documentation), this will only work if these states never occur at the same time; otherwise an enum obviously won't make sense.

The official documentation states exactly that: https://core.telegram.org/bots/api#update

@johannesvollmer
Contributor Author

johannesvollmer commented May 5, 2022

Haha, I didn't find that one. So, any objections to the enum approach? @ayrat555

For memory representation this would be optimal, since the enum will only be about as large as its largest variant, which is probably the Message type. That type would probably also be refactored to contain an enum, so, in the end, memory usage should go down by a lot.

@ayrat555
Owner

ayrat555 commented May 5, 2022

No objections.

@johannesvollmer Great investigation skills, good work!

Not sure if we could just use the teloxide-core crate for the types in general 🤔

@EdJoPaTo I'd rather keep teloxide and Frankenstein independent.

@EdJoPaTo
Collaborator

EdJoPaTo commented May 5, 2022

@EdJoPaTo I'd rather keep teloxide and Frankenstein independent.

Their types look useful. Using the same Bot API types in two projects would simplify a lot of work, and they will be the same for every Telegram library. But yeah, teloxide-core seems to do more than just the types.

@johannesvollmer
Contributor Author

Also, I noticed that the allowed_updates functionality could be made type-safe. Right now, one might pass an invalid string to the function.

Look at teloxide for inspiration: https://github.com/teloxide/teloxide-core/blob/master/src/types/allowed_update.rs
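
A sketch of what that could look like (variant names are illustrative, taken from the Bot API update types; this is not the current frankenstein API):

use serde::{Deserialize, Serialize};

// Type-safe alternative to passing raw strings to allowed_updates.
#[derive(Serialize, Deserialize, Debug, Clone, Copy, PartialEq, Eq)]
#[serde(rename_all = "snake_case")]
enum AllowedUpdate {
    Message,
    EditedMessage,
    ChannelPost,
    EditedChannelPost,
    InlineQuery,
    CallbackQuery,
    Poll,
    PollAnswer,
}

fn main() {
    // Serializes to exactly the strings the Bot API expects.
    let allowed = vec![AllowedUpdate::Message, AllowedUpdate::EditedMessage];
    println!("{}", serde_json::to_string(&allowed).unwrap());
    // prints: ["message","edited_message"]
}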

@EdJoPaTo
Collaborator

EdJoPaTo commented May 5, 2022

Go ahead and create some pull requests. I will happily review them :)

@johannesvollmer
Contributor Author

johannesvollmer commented May 6, 2022

In the Message documentation, I can't find any statement on whether some of the fields can be present at the same time. Nevertheless, teloxide has an enum for some of the fields.

There's probably no guarantee that two of those fields won't both be present at some point in the future. Do we want to take the risk?

@ayrat555
Owner

ayrat555 commented May 9, 2022

I think as a first step we can just migrate Update to an enum.

By the way, @johannesvollmer, are you working on this?

I can try to pick this up this week (most probably on the weekend) if you're not.

@johannesvollmer
Contributor Author

johannesvollmer commented May 9, 2022

I had the change done this morning; it actually was only a five-minute task. I feel like some more tests should be added, though.

See #62

@johannesvollmer
Contributor Author

Merged. Awesome, thanks!
