This repository has been archived by the owner on Dec 5, 2022. It is now read-only.

MyAccount: Storage Retrieval Requests sometimes has missing items #10

Closed
patrickzurek opened this issue Apr 13, 2016 · 4 comments

@patrickzurek

JIRA issue created by: Chris Delis (cedelis)
Originally opened: 2016-03-17 10:18 AM

I've noticed that, intermittently, items that should be listed are missing. Restarting the VXWS services seems to resolve the issue, but more analysis is needed.

@patrickzurek
Author

JIRA User: Chris Delis (cedelis)
JIRA Timestamp: 2016-03-17 04:28 PM

It seems that when items are missing, the ones belonging to the patron's home library still display.

@patrickzurek
Author

JIRA User: Chris Delis (cedelis)
JIRA Timestamp: 2016-03-21 10:37 AM

Here is a sanity script that can be run continuously to test the validity of the UB calls, specifically the storage retrieval requests:

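#!/bin/sh
# Repeatedly fetch the patron's callslip (storage retrieval request) list
# and flag any response that does not contain the expected number of items.
VALID_NUMBER=9
CALLS_BETWEEN_SLEEP=5
SLEEP_IN_SECS=2
URL='http://voyager-pooled-test.carli.illinois.edu:19913/uiu/vxws/patron/21054/circulationActions/requests/callslips?patron_homedb=1@UIUDB20020422223437&view=full'
CNT=0

while true; do
    CNT=$((CNT + 1))
    # Pause briefly after every CALLS_BETWEEN_SLEEP calls.
    if [ $((CNT % CALLS_BETWEEN_SLEEP)) -eq 0 ]; then
        sleep "${SLEEP_IN_SECS}"
    fi
    date
    OUTPUT=$(wget -O- -q "${URL}")
    # Pretty-print so each <callslip> element starts on its own line, then count them.
    NUMBER=$(printf '%s\n' "${OUTPUT}" | xmllint --format - | grep -c "<callslip ")
    printf 'Number of callslips: %s' "${NUMBER}"
    if [ "${NUMBER}" -eq "${VALID_NUMBER}" ]; then
        echo " is valid."
    else
        echo " is NOT VALID. Should be: ${VALID_NUMBER}."
        printf '%s\n' "${OUTPUT}" | xmllint --format -
    fi
    echo
done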
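If xmllint isn't installed, the counting step can be done without it. The sketch below is an assumption-laden variant, not part of the original script: `count_callslips` and `report_count` are hypothetical helper names. It splits the response on `<` so every tag starts its own line, which also works when the response arrives as a single unformatted line.

```shell
#!/bin/sh
# Hypothetical helper: count <callslip> start tags in XML read from stdin.
# Splitting on '<' puts every tag at the start of a line; closing tags
# (</callslip>) and the <callslips> wrapper element are not counted.
count_callslips() {
    tr '<' '\n' | grep -c -E '^callslip([[:space:]/>]|$)'
}

# Hypothetical helper: compare the observed count with the expected one,
# printing the same valid / NOT VALID message as the script above.
# Returns non-zero when the count is wrong.
report_count() {
    expected=$1
    actual=$2
    if [ "$actual" -eq "$expected" ]; then
        echo "Number of callslips: $actual is valid."
    else
        echo "Number of callslips: $actual is NOT VALID. Should be: $expected."
        return 1
    fi
}
```

For example, `report_count 9 "$(wget -O- -q "$URL" | count_callslips)"` would replace the xmllint/grep pipeline in the loop.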

@patrickzurek
Author

JIRA User: Chris Delis (cedelis)
JIRA Timestamp: 2016-03-21 01:36 PM

I think I have this solved.

I temporarily turned off the pooled opacsvr and noticed that the problem went away.

That got me thinking about how the pooled opacsvr was set up. I noticed that ub_timeout_secs was set to 60 seconds, so I changed it to 0 (meaning no timeout):

more props/pool_UIU.properties

local_port=14500
remote_server=localhost
remote_port=14501
initial_pool=20
minimum_pool=20
maximum_pool=60
maximum_servers=180
grow_pool_by=2
client_timeout_secs=0
server_timeout_secs=0
ub_timeout_secs=0
log4jconfig=log4j_UIU.properties
init_request=../init_request.txt
init_response=../init_response.txt

There are three timeout settings: client, server, and UB. Here's what they mean:

Background: the pooled opacsvr acts as a proxy to a backend "real" opacsvr. It keeps a pool of opacsvr connections open and ready (which is fast) so that it can reuse them.

client_timeout_secs: the pooled opacsvr is the client in this case (since it is acting as a proxy). This means that the pooled opacsvr will close the connection if the backend "real" opacsvr does not respond within client_timeout_secs.

server_timeout_secs: the pooled opacsvr is the server in this case (the client is whoever is using our service). If the pooled opacsvr detects no activity from the client within server_timeout_secs, it will close the connection.

ub_timeout_secs: sometimes the pooled opacsvr is called by the "real" opacsvr. This usually (always?) happens when a backend "real" opacsvr needs to make a UB request. These connections have their own timeout, ub_timeout_secs.
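To check what a given pool is actually configured with, a small helper can print each timeout together with the role described above. This is a sketch: `describe_timeouts` is a hypothetical name, and it assumes one `key=value` pair per line and that 0 means "no timeout", as stated above.

```shell
#!/bin/sh
# Hypothetical helper: print the three pooled-opacsvr timeouts from a
# properties file, each with the role it plays. 0 is reported as no timeout.
describe_timeouts() {
    for key in client_timeout_secs server_timeout_secs ub_timeout_secs; do
        case $key in
            client_timeout_secs) role="waiting on the backend opacsvr" ;;
            server_timeout_secs) role="idle client connection" ;;
            ub_timeout_secs) role="UB connection opened by the backend opacsvr" ;;
        esac
        # If a key appears more than once, the last line is used;
        # whitespace (including a stray CR) is stripped from the value.
        value=$(sed -n "s/^${key}=//p" "$1" | tail -n 1 | tr -d '[:space:]')
        if [ -z "$value" ]; then
            echo "$key: not set ($role)"
        elif [ "$value" -eq 0 ]; then
            echo "$key: no timeout ($role)"
        else
            echo "$key: ${value}s ($role)"
        fi
    done
}
```

Running `describe_timeouts props/pool_UIU.properties` against the file shown above would report "no timeout" for all three settings.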

So, basically, here is what I think was going on:

VXWS employs its own pool of opacsvrs, called VACS. But VACS doesn't PRE-load the opacsvr connections, nor does it PRE-load the intermediary UB opacsvr connections. This is why our pooled opacsvr implementation is still relevant: it's a LOT faster because it preloads the opacsvr connections.

In any case, VACS assumes that the intermediary UB opacsvr connections remain intact after each use, i.e., we need to make sure that our own pooled opacsvr honors this assumption. With ub_timeout_secs set to 60, the pooled opacsvr was presumably closing idle UB connections that VACS still expected to reuse. Still, VACS should have been able to handle the prematurely closed UB opacsvr connections, either by creating a new connection or by reporting an error; instead, it simply failed silently, which gave the impression of success (which is a terrible, terrible thing).
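The suspected failure mode can be illustrated with a toy model. This is entirely hypothetical and is not how VACS or the pooled opacsvr are implemented; the names `UB_TIMEOUT`, `proxy_tick`, `silent_request` and `checked_request` are invented for illustration, and time is passed in as an argument instead of read from the clock. The "home"/"home+remote" results mirror the earlier observation that home-library items kept displaying.

```shell
#!/bin/sh
# Toy model (hypothetical): one cached UB connection whose proxy side is
# closed once it has been idle longer than UB_TIMEOUT seconds.
UB_TIMEOUT=60      # stands in for ub_timeout_secs; 0 means never close
conn_open=1
last_used=0

# Proxy side: at simulated time $1, close the connection if idle too long.
proxy_tick() {
    now=$1
    if [ "$UB_TIMEOUT" -gt 0 ] && [ $((now - last_used)) -gt "$UB_TIMEOUT" ]; then
        conn_open=0
    fi
}

# Caller that assumes the cached connection is still intact: a closed
# connection yields only home-library items and no error (silent failure).
silent_request() {
    proxy_tick "$1"
    last_used=$1
    if [ "$conn_open" -eq 1 ]; then result="home+remote"; else result="home"; fi
}

# Caller that checks the connection and reopens it before use.
checked_request() {
    proxy_tick "$1"
    if [ "$conn_open" -eq 0 ]; then conn_open=1; fi
    last_used=$1
    result="home+remote"
}
```

The functions set `result` rather than echoing it, because calling them inside `$(...)` would run them in a subshell and lose the connection state. With `UB_TIMEOUT=0`, `silent_request` always gets "home+remote", which matches the fix.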

tl;dr - setting ub_timeout_secs to 0 (never) in our pooled opacsvr seems to have solved this issue.
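A minimal sketch of applying that change to a properties file such as props/pool_UIU.properties. The function name is hypothetical. It assumes the pool reads the file only at startup, so the pooled opacsvr presumably has to be restarted afterwards.

```shell
#!/bin/sh
# Hypothetical helper: set ub_timeout_secs=0 (never time out) in a pool
# properties file, keeping a copy of the original as <file>.bak.
set_ub_timeout_never() {
    file=$1
    cp "$file" "$file.bak" || return 1
    if grep -q '^ub_timeout_secs=' "$file"; then
        # Rewrite only the ub_timeout_secs line; all other lines are copied unchanged.
        sed 's/^ub_timeout_secs=.*/ub_timeout_secs=0/' "$file.bak" > "$file"
    else
        # Key missing: append it.
        echo 'ub_timeout_secs=0' >> "$file"
    fi
}
```

Writing through the existing file (instead of moving a temporary file over it) keeps its original permissions.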

@patrickzurek
Author

Resolved: 2016-03-21 01:40 PM
