Site hangs #9

Open
wliao008 opened this issue Jan 28, 2014 · 29 comments

Comments

@wliao008

Hi, I'm having a strange problem after putting the RedisSessionStateStoreProvider in place: occasionally my site hangs. If I go on the server and issue a "monitor" command to my Redis server, here's a portion of the output:

"watch" "bc/4i0sekkgl3nhis33ddooqyds"
"hgetall" "bc/4i0sekkgl3nhis33ddooqyds"
"unwatch"
"watch" "bc/4i0sekkgl3nhis33ddooqyds"
"hgetall" "bc/4i0sekkgl3nhis33ddooqyds"
"unwatch"
"watch" "bc/4i0sekkgl3nhis33ddooqyds"
"hgetall" "bc/4i0sekkgl3nhis33ddooqyds"
"unwatch"
"watch" "bc/4i0sekkgl3nhis33ddooqyds"
"hgetall" "bc/4i0sekkgl3nhis33ddooqyds"
"unwatch"
"watch" "bc/4i0sekkgl3nhis33ddooqyds"
"hgetall" "bc/4i0sekkgl3nhis33ddooqyds"
"unwatch"
...
...

and it just keeps going on and on...

Doing a hget bc/4i0sekkgl3nhis33ddooqyds locked shows that this key is locked, which I assume is what's causing it to hang?

I run into the same problem when load testing with JMeter.

I used the code as-is without much modification. I have a master-slave setup (Redis v2.8.3), ServiceStack.Redis v3.9.60.

Has anyone seen this?

Thanks!
-wl

@TheCloudlessSky
Owner

Hey,

The key prefix in the watch output doesn't match the default. You can see in the Initialize() method that AspNetSession is the default prefix for Redis keys.

Did you change the key setting to bc somewhere in your configuration?

One thing to note is that ASP.NET will block concurrent requests to the same session. Can you try using JMeter with multiple different sessions?
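
To illustrate what that blocking means (a rough sketch, assuming an MVC app; the controller name is made up): requests that need exclusive session access are serialized per session ID, while handlers marked read-only skip the exclusive lock:

using System.Web.Mvc;
using System.Web.SessionState;

// ASP.NET serializes requests that need exclusive (read-write) access to the same
// session ID, so a second request with the same cookie waits for the first to finish.
[SessionState(SessionStateBehavior.ReadOnly)]
public class ReportsController : Controller // hypothetical controller
{
    public ActionResult Index()
    {
        // Reads the session without taking the exclusive writer lock.
        var userName = Session["UserName"] as string;
        return Content("Hello " + (userName ?? "guest"));
    }
}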

I can help more if you can post a simple ASP.NET sample somewhere that you can hit with JMeter and I'll try to reproduce it. Thanks!

@wliao008
Author

Hi, yes, I did change the prefix in my web.config.

<sessionState mode="Custom" cookieless="false" timeout="30" customProvider="bc">
    <providers>
        <clear />
        <add name="bc" type="..." />
    </providers>
</sessionState>

Anyway, yes, we're using different sessions in JMeter (we have an HTTP Cookie Manager set up), and I can see the different sessions listed in Redis as well.

What concerns me is that even when testing locally and manually, with only me accessing the site, occasionally I still ran into the same problem.

I'll see if I can get something setup.

Thanks!
-wl

@ProTip

ProTip commented Feb 10, 2014

Has this issue been played out? We are running into threads blocking because a response is not coming back from the session state provider. Due to a bug, this was locking the whole site. The bug has been fixed, but I still need to look into the hung requests :| I'm curious whether the issue is with the newest Redis client; it's quite a few versions ahead of where it was 2 years ago. Is there a particular, tested version of ServiceStack.Redis to try?

@wliao008
Author

I was pulled off to work on another area, so I haven't been able to resolve this. As you said, I should probably try again with ServiceStack 3.8.3.

@ProTip

ProTip commented Feb 10, 2014

Hmm, the oldest NuGet package available through the package explorer is 3.9.29. You wouldn't happen to know how to get hold of 3.8.3 without building from source?

@wliao008
Author

All the ServiceStack DLLs are in the lib folder of this repo. Good luck, and please let me know how it goes :)

@TheCloudlessSky
Owner

If either of you can create a reproducible case, I'd like to help fix this issue. We don't use the ASP.NET session anymore, but when we were using it we had significant load without any lockups (or none that we noticed). Right now, we still use SS.Redis v3.9.29 in other areas of our app without any issues.

@ProTip

ProTip commented Feb 10, 2014

Still getting the hangs. Once the threads lock up, all the traffic turns into this:

1392072383.569702 "HGETALL" "InnerProvider/vaq43jyhiru0sef0i2z3kost"
1392072383.571567 "UNWATCH"
1392072383.573037 "WATCH" "InnerProvider/juns3nwigyegjpyhbqd4g4ch"
1392072383.574505 "HGETALL" "InnerProvider/juns3nwigyegjpyhbqd4g4ch"
1392072383.598808 "UNWATCH"
1392072383.600329 "WATCH" "InnerProvider/lab5g32aoye1iprk5mjzpx5k"
1392072383.601793 "HGETALL" "InnerProvider/lab5g32aoye1iprk5mjzpx5k"
1392072383.603676 "UNWATCH"
1392072383.605130 "WATCH" "InnerProvider/lahiw1xdvhxqino5iwbslbad"
1392072383.606576 "HGETALL" "InnerProvider/lahiw1xdvhxqino5iwbslbad"
1392072383.608380 "UNWATCH"

I may need to build the project in debug mode to get a good stack trace.

@TheCloudlessSky
Owner

@ProTip Can you reproduce locally with JMeter or other load testing software? Please provide a stack trace if you can!

@ProTip

ProTip commented Feb 10, 2014

The analysis is a lot different with the debug build. I have a lot of threads waiting for information from the server. This particular thread is also holding a lock which is blocking other threads (this has supposedly been fixed):

Thread 118 - System ID 2228

Entry point   clr!Thread::intermediateThreadProc 
Create time   2/10/2014 11:12:37 PM 
Time spent in user mode   0 Days 00:00:00.780 
Time spent in kernel mode   0 Days 00:00:00.046 

This thread is waiting on data to be returned from another server via WinSock.

 The call to WinSock originated from System_ni!DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags)+7f

.NET Call Stack

Function

System_ni!DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags)+7f 
[[InlinedCallFrame] (System.Net.UnsafeNclNativeMethods+OSSOCK.recv)] System.Net.UnsafeNclNativeMethods+OSSOCK.recv(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags) 
System_ni!System.Net.Sockets.Socket.Receive(Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError ByRef)+fa 
System_ni!System.Net.Sockets.NetworkStream.Read(Byte[], Int32, Int32)+c8 
mscorlib_ni!System.IO.BufferedStream.ReadByte()+67 
ServiceStack.Redis.RedisNativeClient.ReadLine()+58 
ServiceStack.Redis.RedisNativeClient.ReadData()+1a 
ServiceStack.Redis.RedisNativeClient.get_Info()+6c 
ServiceStack.Redis.RedisNativeClient.Watch(System.String[])+26 
Harbour.RedisSessionStateStore.RedisSessionStateStoreProvider.GetClientAndWatch(System.String)+dc 
Harbour.RedisSessionStateStore.RedisSessionStateStoreProvider.GetItem(Boolean, System.Web.HttpContext, System.String, Boolean ByRef, System.TimeSpan ByRef, System.Object ByRef, System.Web.SessionState.SessionStateActions ByRef)+146 
Harbour.RedisSessionStateStore.RedisSessionStateStoreProvider.GetItemExclusive(System.Web.HttpContext, System.String, Boolean ByRef, System.TimeSpan ByRef, System.Object ByRef, System.Web.SessionState.SessionStateActions ByRef)+7e 

This one has been active much longer:

Thread 45 - System ID 4540

Entry point   clr!Thread::intermediateThreadProc 
Create time   2/10/2014 11:08:00 PM 
Time spent in user mode   0 Days 00:00:18.189 
Time spent in kernel mode   0 Days 00:00:00.717 

This thread is waiting on data to be returned from another server via WinSock.

 The call to WinSock originated from System_ni!DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags)+7f

.NET Call Stack

Function

System_ni!DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags)+7f 
[[InlinedCallFrame] (System.Net.UnsafeNclNativeMethods+OSSOCK.recv)] System.Net.UnsafeNclNativeMethods+OSSOCK.recv(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags) 
System_ni!System.Net.Sockets.Socket.Receive(Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError ByRef)+fa 
System_ni!System.Net.Sockets.NetworkStream.Read(Byte[], Int32, Int32)+c8 
mscorlib_ni!System.IO.BufferedStream.ReadByte()+67 
ServiceStack.Redis.RedisNativeClient.ReadLine()+58 
ServiceStack.Redis.RedisNativeClient.ReadData()+1a 
ServiceStack.Redis.RedisNativeClient.get_Info()+6c 
ServiceStack.Redis.RedisNativeClient.Watch(System.String[])+26 
Harbour.RedisSessionStateStore.RedisSessionStateStoreProvider.GetClientAndWatch(System.String)+dc 

@TheCloudlessSky
Owner

Are you saying the first issue was fixed in SS.Redis?

@ProTip

ProTip commented Feb 11, 2014

Well, my application is hanging and I'm seeing similar output to the OP. Every time the application hangs, most of my threads are waiting on data from Redis...

I was getting this issue with the latest compatible SS.Redis; now I'm using a debug build of the project that uses the bundled SS.Redis version.

@wliao008
Author

@TheCloudlessSky, what version of Redis are you using in your setup? Thanks.

@ProTip

ProTip commented Feb 11, 2014

Hi guys. I believe I have resolved my issue of the threads waiting for data from Redis. Apparently this application/client combination is creating about 5+ connections to the Redis server per thread. I was simulating 100 users, so this was creating 500+ (minimum) connections, running right through the open-file limit. I have raised the limit and no longer see the threads waiting for data from Redis.
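
In case it helps anyone verify this, a rough way to watch the connection count during a test (assuming a ServiceStack.Redis v3 client and an already-configured clientManager) is to sample INFO:

using System;
using ServiceStack.Redis;

// connected_clients should stay well below the server's maxclients and the OS open-file limit.
using (var client = (RedisNativeClient)clientManager.GetClient())
{
    var info = client.Info; // parsed INFO output as a Dictionary<string, string>
    Console.WriteLine("connected_clients = " + info["connected_clients"]);
}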

@TheCloudlessSky
Owner

@wliao008 We're using 3.9.57. See the details below for our setup.

@ProTip That's quite a lot of connections per thread 💥! What type of Redis client manager are you using (pooled, basic)? Is it configured through the web.config or via RedisSessionStateStoreProvider.SetClientManager()? Are you using an inversion of control container (Ninject, StructureMap, Windsor, etc.)? If so, how is it configured?

Here's our setup:

  • Ninject as our IoC container
  • Use the PooledRedisClientManager and have the IRedisClient as a singleton per-request (because we use Redis in other areas of our app):
var pooledClientManager = new PooledRedisClientManager(redisConnection);

// IMPORTANT: Ninject's default Pipeline/ActivationCache only deactivates (disposes) 
// references once. And because the pool returns the same instances, Ninject does not 
// dispose of the IRedisClient the 2nd time its requested. This causes the client to never
// become deactivated. Therefore, we wrap the client with the DisposablePooledClient
// which always use a new reference for Ninject to manage and internally disposes of the
// IRedisClient.
// http://stackoverflow.com/questions/17047732/how-can-ninject-be-configured-to-always-deactivate-pooled-references

Bind<PooledRedisClientManager.DisposablePooledClient<RedisClient>>()
    .ToMethod(ctx => pooledClientManager.GetDisposableClient<RedisClient>())
    .InRequestScope();

Bind<IRedisClient, ICacheClient>()
    .ToMethod(ctx => ctx.Kernel.Get<PooledRedisClientManager.DisposablePooledClient<RedisClient>>().Client)
    .InTransientScope();

This setup ensures that any part of our application (other than the RedisSessionStateStore) only uses a single instance per-request.

We then configure the RedisSessionStateStore and use the pooled client manager reference directly:

RedisSessionStateStore.SetClientManager(pooledClientManager);

So, you shouldn't be creating a lot of connections to the Redis server per thread if you're using the PooledRedisClientManager, since it only instantiates a new IRedisClient when the existing ones are all in use (and it puts the client back into the pool once it's done being used!).

If you haven't already, switch to the pooled client manager. If you're using Redis somewhere else in your application, make sure that you're properly disposing of the client so that it'll be put back into the pool.
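
For reference, capping the pool looks roughly like this (host names are placeholders, and the exact overload may vary between ServiceStack.Redis versions):

using ServiceStack.Redis;

// Cap the pool so a burst of requests can't open an unbounded number of connections.
var config = new RedisClientManagerConfig
{
    MaxWritePoolSize = 50, // upper bound on pooled read/write clients
    MaxReadPoolSize = 50,  // upper bound on pooled read-only clients
    AutoStart = true
};

var pooledClientManager = new PooledRedisClientManager(
    new[] { "redis-master:6379" }, // read/write host(s) -- placeholder
    new[] { "redis-slave:6379" },  // read-only host(s) -- placeholder
    config);

// Then hand the manager to the provider with SetClientManager, as shown above.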

@wliao008
Author

Thanks @TheCloudlessSky, that's good info! Regarding @ProTip's suggestion of increasing the file limit: on my default Redis instance we have 3984 max clients, but we still experience the lock-up issue when load testing with 2 users. I did go back to the initial version of the session provider without the Watch/Unwatch changes, and it seems fine so far; I load tested 100 users without a single problem.

@TheCloudlessSky
Owner

@wliao008 Hmm, that's weird that the watch/unwatch would be causing it. There's got to be a deeper bug within ServiceStack.Redis then. I don't use key watching in any of our other Redis-based code. For atomic operations, we use a generational approach (like with NHibernate.Caches.Redis), Redis transactions with MULTI/EXEC (only useful for commands that don't need to read data and then mutate it), or a Lua script with more logic when necessary (since scripts are atomic!). I'd like to use the Lua approach, but it'd depend on whether people are using Redis 2.6.0+.

However, like I said, doing watch/unwatch shouldn't be causing any issues. If it is, it'd mean there's a bug in ServiceStack.Redis.
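
To illustrate the MULTI/EXEC style (just a sketch with made-up key names and an existing clientManager assumed; this is not what the provider does today):

using System;
using ServiceStack.Redis;

using (var client = clientManager.GetClient())
using (var transaction = client.CreateTransaction())
{
    // Both commands are queued and then sent as one atomic MULTI/EXEC block.
    transaction.QueueCommand(r => r.SetEntryInHash("AspNetSession/abc123", "Flags", "0"));
    transaction.QueueCommand(r => r.ExpireEntryIn("AspNetSession/abc123", TimeSpan.FromMinutes(30)));
    transaction.Commit();
}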

@ProTip

ProTip commented Feb 13, 2014

Unfortunately I don't have direct control over the instantiation ATM. The software I'm deploying wraps the provider and passes the config through. I am specifying "pooled" for the clientType, though. But seeing 500 files for the Redis process under /proc makes me suspicious that the pool isn't being used correctly with a 100-concurrent-user simulation. I believe it may be creating 2 session keys per ASP.NET session, though, so that could account for up to 200? Not sure how they get cleaned up after that...

I may try the pre-watch version as well, just to confirm whether or not that improves my situation.

@wliao008
Author

I was looking through the MSDN documentation on session providers yesterday, trying to find out the steps of execution, and came across this for the GetItemExclusive method:

If no session item data is found at the data store, the GetItemExclusive method sets the locked output parameter to false and returns null.

In the provider, there's:

private SessionStateStoreData GetItem(...)
{
    //code removed for brevity
    RedisSessionState state;
    if (!RedisSessionState.TryParse(stateRaw, out state))
    {
         client.UnWatch();
         return null;
    }
    //code removed for brevity
}

So we changed it to:

private SessionStateStoreData GetItem(...)
{
    //code removed for brevity
    RedisSessionState state;
    if (!RedisSessionState.TryParse(stateRaw, out state))
    {
         client.UnWatch();
         locked = false; // <- add this line
         return null;
    }
    //code removed for brevity
}

And believe it or not, this seems to fix the problem for us. I've been load testing it on and off with a variable number of users and so far haven't had any locking issues.

@TheCloudlessSky
Owner

@ProTip Yes, please try the pre-watch version and see if it fixes anything. The problem could lie in ServiceStack.Redis.

@wliao008 That shouldn't have done anything different because this line sets locked = false. Are you running a release build?

@TheCloudlessSky
Owner

I just released v1.2.0, which hopefully fixes this problem. I noticed that there were places where UNWATCH was not being called (because the transaction was not committed -- or even started). Update and give it a spin, and please let me know if it fixes this issue.
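
For context, the shape of the fix is roughly this (a sketch only, where client is the IRedisClient and key is the session key; not the actual diff):

// Every path that WATCHes a key must either commit the transaction (EXEC) or
// explicitly UNWATCH before the client is released; otherwise the connection
// goes back into the pool still watching the key.
client.Watch(key);
var committed = false;
try
{
    // ... read/modify the session state, optionally committing a transaction ...
}
finally
{
    if (!committed)
    {
        client.UnWatch();
    }
}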

@TheCloudlessSky
Owner

v1.3.0 is released; it uses a distributed lock rather than watch/unwatch. Please update and let me know if this fixes your issues. Also update ServiceStack.Common, ServiceStack.Redis, and ServiceStack.Text to 3.9.71 (the dependencies are now >= 3.9 && < 4.0).
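
For the curious, the distributed-lock style with ServiceStack.Redis looks roughly like this (made-up key names, existing clientManager assumed; not the provider's exact implementation):

using System;
using ServiceStack.Redis;

using (var client = clientManager.GetClient())
// AcquireLock blocks until the lock key is obtained (giving up after the timeout)
// and releases it when disposed.
using (client.AcquireLock("AspNetSession/abc123:lock", TimeSpan.FromSeconds(30)))
{
    // Safe to read-modify-write the session hash while holding the lock.
    var state = client.GetAllEntriesFromHash("AspNetSession/abc123");
    state["Flags"] = "0";
    client.SetRangeInHash("AspNetSession/abc123", state);
}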

@wliao008
Author

@TheCloudlessSky you were right about that locked = false! I was so hung up I didn't even notice it was there already. I'll try out the new version tomorrow, thanks!

@ProTip

ProTip commented Feb 27, 2014

Need to test further (much further), but this appears to be much more stable for me.

@TheCloudlessSky
Owner

Awesome! If you can, try testing with JMeter, Apache Bench or another similar load tool.

@ProTip

ProTip commented Feb 27, 2014

@TheCloudlessSky Hi, I'm testing with Gatling and a simulation of 100 concurrent users. I'm very happy with the results so far. Max connections are around 60 now and I have not experienced hangs :)

@TheCloudlessSky
Owner

@wliao008 @ProTip Have you guys had any luck with this? Is it stable now?

@wliao008
Author

@TheCloudlessSky, sorry for the late update. I did try to test out the newer version, but we ended up using the older version in production. And I've been assigned to work on other things since, so I haven't been able to follow up. As far as I'm concerned, this issue is OK for me. Thank you!

@TheCloudlessSky
Owner

Great, thanks!
