New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't place CrawlerSessionManagerValve into session, place data-holder only #154
Conversation
It helps to serialize this sessions with libraries like kryo. Background: we've found out, that our kryo library serializes the whole valve. Our stack indicates, that jdk internals are getting serialized into the sessions. It's unnecessary and can be solves more easily with this fix.
We are getting this message in our log all the time.
|
The current implementation of CrawlerSessionManagerValve does not implement java.io.Serializable, so it is not written out. Likewise, your CrawlerHttpSessionBindingListener does not implement Serializable. (*) The hashmaps of ids owned by CrawlerSessionManagerValve should exist in 1 instance only (in memory). It does not make sense to serialize them with a session. What am I missing, and what exactly is your problem? (And again, problems would better be filed into Bugzilla). (*) See javadoc for CrawlerSessionManagerValve => "All implemented interfaces": |
I've filed an issue: https://bz.apache.org/bugzilla/show_bug.cgi?id=63324 The problem is, that we serialize sessions between tomcat instances and the CrawlerSessionManagerValve has dependencies on unwanted content (like JDK internals) About the understanding: I wanted to write a private inner class accessing your maps directly, but I think, that the scope of this class would contain the outer class (not so with private static inner classes). This would bring us the desired effect. |
Perhaps the title of this PR should be "don't place CrawlerSessionManagerValve into session, place data-holder only" and it will make more sense. I think this idea does make sense. There is no reason to put a Valve into the session. But the implementation doesn't actually stop anything from being serialized: those two big maps are still serialized along with the rest of the stream. What you are avoiding is serializing the class loader which is happening because of the Valve itself. You should not serialize the whole session mapping into a single session. You shouldn't have to serialize anything, actually. The session already knows its own session id. The trick is that the CrawlerSessionManagerValve needs to be notified if a session is terminated. Instead of storing something in the session, CrawlerSessionManagerValve should instead be changed to be an |
Using an The patch looks reasonable. I can't think of a better way to handle this. |
In comparison to the container object (which holds plenty of stuff, as I've seen) this doesn't seem like a big deal. As I understood, there is one entry for each crawler (1 crawler = 1 entry in map x2 = clientId x2 + sessionId x2). we would need to have plenty of crawlers within 60 seconds to fill this map big enough to make it serious.
That sounds good, but I would need to hook into the lifecycle of this valve (or into the lifecycle of the web application itself). Due to the dynamic nature of your valves, I don't know where this class get's initialized. Can you point me out, where I would register this listener?
I see it the same way. And thanks. I'll change the title again. |
I commented in Bugzilla, starting with |
It helps to serialize this sessions with libraries like kryo. Apply feedback from PR review fix 63324
Patch applied manually with a few changes to keep Checkstyle etc happy. I also added a changelog entry. |
It helps to serialize this sessions with libraries like kryo.
Background:
we've found out, that our kryo library serializes the whole valve.
Our stack indicates, that jdk internals are getting serialized into the
sessions. It's unnecessary and can be solves more easily with this fix.