
Out of memory (leak?): Server crashes / atomic-server shuts down without logging anything #270

Open
joepio opened this issue Jan 3, 2022 · 7 comments

Comments


joepio commented Jan 3, 2022

I've had to restart atomic-server on https://atomicdata.dev a couple of times now. This started after updating actix to v4. I thought the shutdowns / crashes had something to do with the HTTP issue #240 (those were the only errors / warnings in the logs), but the shutdowns keep happening.

I'll try logging at a more verbose level: ./atomic-server --log-level trace &> log-30.3-2

Update

  • It was an out-of-memory issue, which could indicate a memory leak, but it only happened on very low-end hardware (512 MB of RAM), so it could also mean the app was simply too heavy for that server.
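To tell a slow leak from an occasional peak, one option would be to sample the server's resident memory over time and look at the shape of the curve. A minimal sketch, assuming a Linux host and that the process is named atomic-server (the output file name is just an example):

# Log the resident set size (RSS, in kB) of atomic-server once a minute.
# A leak shows up as a steady climb; a peak shows up as isolated spikes.
while true; do
  echo "$(date -Is) $(ps -o rss= -C atomic-server)" >> memory-samples.log
  sleep 60
done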

joepio commented Jan 3, 2022

Crashed again, trace logs didn't indicate anything suspicious.


joepio commented Jan 3, 2022

./atomic-server --log-level trace > my-log 2>&1 should capture more (the stdout redirect has to come before 2>&1; written the other way around, stderr still goes to the terminal).
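If the output should also stay visible in the terminal while being written to a file, piping both streams through tee is an alternative (just a sketch, not what was used here):

# Write stdout and stderr to my-log while still printing them to the terminal.
./atomic-server --log-level trace 2>&1 | tee my-log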

joepio added a commit that referenced this issue Jan 3, 2022

joepio commented Jan 3, 2022

All logs looked like this, and then it crashed:

2022-01-03T18:37:16.287226Z DEBUG sled::pagecache::iobuf: advancing offset within the current segment from 5242880 to 5550227
2022-01-03T18:37:16.288003Z DEBUG sled::pagecache::iobuf: rolling to new segment after clearing 5550227-5747104
2022-01-03T18:37:16.288055Z DEBUG sled::pagecache::segment: segment accountant returning offset: 3670016 for lsn 1420685737984 on deck: {7864320, 8912896, 9961472, 20971520, 21495808, 22020096, 22544384, 23068672, 23592960, 24117248, 24641536, 25165824, 25690112, 26214400, 26738688, 27787264, 28311552, 28835840, 29360128, 29884416, 30408704, 31457280, 31981568, 32505856, 33030144, 33554432, 34078720, 34603008, 35127296, 35651584, 36175872, 36700160, 37224448, 37748736, 38273024, 38797312, 39321600, 39845888, 40370176, 40894464, 41418752, 41943040, 42467328, 42991616, 43515904, 44040192, 44564480, 45088768, 45613056, 46137344, 46661632, 47185920, 47710208, 48234496, 48758784}
2022-01-03T18:37:16.288075Z DEBUG sled::pagecache::iobuf: storing lsn 1420685737984 in beginning of buffer
2022-01-03T18:37:16.289008Z DEBUG sled::pagecache::iobuf: wrote lsns 1420685213696-1420685521042 to disk at offsets 5242880-5550226, maxed false complete_len 307347
2022-01-03T18:37:16.289026Z DEBUG sled::pagecache::iobuf: mark_interval(1420685213696, 307347)
2022-01-03T18:37:16.289031Z DEBUG sled::pagecache::iobuf: new highest interval: 1420685213696 - 1420685521042
2022-01-03T18:37:16.289997Z DEBUG sled::pagecache::iobuf: wrote lsns 1420685521043-1420685737983 to disk at offsets 5550227-5767167, maxed true complete_len 216941
2022-01-03T18:37:16.290027Z DEBUG sled::pagecache::iobuf: mark_interval(1420685521043, 216941)
2022-01-03T18:37:16.290050Z DEBUG sled::pagecache::iobuf: new highest interval: 1420685521043 - 1420685737983
2022-01-03T18:37:16.290087Z DEBUG sled::pagecache::segment: freeing segment 5767168
2022-01-03T18:37:16.291399Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Ping { ack: true, payload: [59, 124, 219, 122, 11, 135, 22, 180] }

Doesn't look suspicious. I don't think sled is the culprit here.

joepio added a commit that referenced this issue Jan 3, 2022

joepio commented Jan 3, 2022

Nothing suspicious at the trace level either:

}:poll:poll_ready:FramedWrite::buffer{frame=Ping { ack: false, payload: [59, 124, 219, 122, 11, 135, 22, 180] }}: h2::frame::ping: encoding PING; ack=false len=8
2022-01-03T21:55:42.047051Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll:poll_ready:FramedWrite::buffer{frame=Ping { ack: false, payload: [59, 124, 219, 122, 11, 135, 22, 180] }}: h2::codec::framed_write: encoded ping rem=17
2022-01-03T21:55:42.047064Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll:FramedRead::poll_next: h2::codec::framed_read: poll
2022-01-03T21:55:42.047073Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll: h2::proto::streams::prioritize: poll_complete
2022-01-03T21:55:42.047080Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll: h2::proto::streams::prioritize: schedule_pending_open
2022-01-03T21:55:42.047088Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll:FramedWrite::flush: h2::codec::framed_write: queued_data_frame=false
2022-01-03T21:55:42.047175Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll:FramedWrite::flush: h2::codec::framed_write: flushing buffer
2022-01-03T21:55:42.218912Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll: h2::proto::connection: connection.state=Open
2022-01-03T21:55:42.219052Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll:FramedRead::poll_next: h2::codec::framed_read: poll
2022-01-03T21:55:42.219107Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll:FramedRead::poll_next: h2::codec::framed_read: read.bytes=17
2022-01-03T21:55:42.219133Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll:FramedRead::poll_next:FramedRead::decode_frame{offset=17}: h2::codec::framed_read: decoding frame from 17B
2022-01-03T21:55:42.219148Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll:FramedRead::poll_next:FramedRead::decode_frame{offset=17}: h2::codec::framed_read: frame.kind=Ping
2022-01-03T21:55:42.219167Z DEBUG server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll:FramedRead::poll_next: h2::codec::framed_read: received frame=Ping { ack: true, payload: [59, 124, 219, 122, 11, 135, 22, 180] }
2022-01-03T21:55:42.219189Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll: h2::proto::connection: recv PING frame=Ping { ack: true, payload: [59, 124, 219, 122, 11, 135, 22, 180] }
2022-01-03T21:55:42.219211Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll: h2::proto::ping_pong: recv PING USER ack
2022-01-03T21:55:42.219233Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll:FramedRead::poll_next: h2::codec::framed_read: poll
2022-01-03T21:55:42.219270Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll: h2::proto::streams::prioritize: poll_complete
2022-01-03T21:55:42.219285Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll: h2::proto::streams::prioritize: schedule_pending_open
2022-01-03T21:55:42.219318Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll:FramedWrite::flush: h2::codec::framed_write: flushing buffer
2022-01-03T21:55:42.219358Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll: h2::proto::connection: connection.state=Open
2022-01-03T21:55:42.219389Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll:FramedRead::poll_next: h2::codec::framed_read: poll
2022-01-03T21:55:42.219409Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll: h2::proto::streams::prioritize: poll_complete
2022-01-03T21:55:42.219421Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll: h2::proto::streams::prioritize: schedule_pending_open
2022-01-03T21:55:42.219434Z TRACE server_handshake{io=actix_tls::accept::rustls::TlsStream<tokio::net::tcp::stream::TcpStream>}:Connection{peer=Server}:poll:FramedWrite::flush: h2::codec::framed_write: flushing buffer


joepio commented Jan 4, 2022

Seems to be an out-of-memory (OOM) issue.

dmesg:

[44807566.669609] Out of memory: Killed process 3595263 (atomic-server) total-vm:652248kB, anon-rss:164276kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:588kB oom_score_adj:0
[44807566.688568] oom_reaper: reaped process 3595263 (atomic-server), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
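For the record, anon-rss:164276kB means the process held roughly 160 MB of anonymous memory when it was killed, on a machine with 512 MB in total. The OOM killer also leaves a trail in the kernel log, so earlier kills can be listed after the fact; a quick check (assuming journald keeps the kernel log):

# List OOM-killer activity from the kernel log.
dmesg -T | grep -iE 'out of memory|oom_reaper'
# Or via journald:
journalctl -k | grep -iE 'out of memory|oom_reaper'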

@joepio joepio changed the title Server crashes / atomic-server shuts down without logging anything Memory Leak: Server crashes / atomic-server shuts down without logging anything Jan 4, 2022

joepio commented Jan 4, 2022

Since the server only had 512 MB of RAM available, it could also very well not be a leak... Maybe its memory usage just peaks sometimes.
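One way to test the peak-vs-leak hypothesis on the same hardware would be to cap the unit's memory (so a runaway process is contained and restarted instead of taking the whole box down) or to add a bit of swap so short peaks survive. A sketch, assuming the server runs under the atomic.service systemd unit mentioned below and a cgroup-v2 host; the numbers are illustrative, not tuned:

# Cap the service's memory; the OOM killer then acts within this unit's cgroup
# instead of the whole machine.
sudo systemctl set-property atomic.service MemoryMax=400M
# Or add a small swap file so brief peaks don't invoke the OOM killer:
sudo fallocate -l 1G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile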

@joepio joepio changed the title Memory Leak: Server crashes / atomic-server shuts down without logging anything Out of memory (leak?): Server crashes / atomic-server shuts down without logging anything Jan 4, 2022

joepio commented Jan 25, 2022

Not sure if this is still happening. Can't find any suspicious logs with journalctl -u atomic.service --since "10 hour ago" -f
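Besides grepping the service log, two other quick checks could confirm whether it is still happening. A sketch, assuming memory accounting is enabled for the unit (the default on recent systemd):

# Current memory use of the unit, in bytes:
systemctl show atomic.service -p MemoryCurrent
# Any OOM kills in the kernel log over the same window:
journalctl -k --since "10 hour ago" | grep -i 'out of memory'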
