Cursor disconnect RemoteSolrException Unable to parse 'cursorMark' after totem: value must either be '*' or the 'nextCursorMark' returned by a previous search #490

Open
WolfgangFahl opened this issue Jul 8, 2020 · 1 comment

Comments

@WolfgangFahl

#427 already points to an issue with cursors. With my script:
`#
# download from crossref RESTful API via cursor
#
downloadWithCursor() {
  local l_rows="$1"
  local l_index="$2"
  local l_cursor="$3"
  target=$sampledir/crossref-$l_index.json
  src="https://api.crossref.org/types/proceedings/works?select=event,title,DOI&rows=$l_rows&cursor=$l_cursor"
  download $src $target
}

#
# get Crossref data
# see also https://github.com/TIBHannover/confIDent-dataScraping
#
getCrossRef() {
  rows=1000
  index=1
  totalRows=0
  # force while entry
  total=$rows
  downloadWithCursor $rows $index ""
  while [ $totalRows -lt $total ]
  do
    target=$sampledir/crossref-$index.json
    status=$(jq '.status' $target | tr -d '"')
    total=$(jq '.message["total-results"]' $target)
    # get and remove quotes from cursor
    cursor=$(jq '.message["next-cursor"]' $target | tr -d '"')
    startindex=$(jq '.message.query["start-index"]' $target)
    perpage=$(jq '.message["items-per-page"]' $target)
    index=$((index+1))
    if [ "$status" == "ok" ]
    then
      totalRows=$((totalRows+rows))
    else
      # force while exit
      totalRows=1
      total=0
      # remove invalid
      mv $target $target.err
    fi
    echo "status: $status index: $index $totalRows of $total startindex: $startindex perpage=$perpage cursor:$cursor"
    if [ $totalRows -lt $total ]
    then
      # wait a bit
      sleep 2
      downloadWithCursor $rows $index "$cursor"
    fi
  done
  cat $sampledir/crossref-*.json | jq .message.items[].title | cut -f2 -d'[' | cut -f2 -d'"' | grep -v "]" | tr -s '\n' > $sampledir/proceedings-crossref.txt
}
`
I run into a similar issue:

{
  "status": "error",
  "message-type": "exception",
  "message-version": "1.0.0",
  "message": {
    "name": "class org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException",
    "description": "org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http:\/\/mds3:8984\/solr\/crmds1: Unable to parse 'cursorMark' after totem: value must either be '*' or the 'nextCursorMark' returned by a previous search: AoJ4 NDNi\/ECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xNzc1OC9laXJhaTU=",
    "message": "Error from server at http:\/\/mds3:8984\/solr\/crmds1: Unable to parse 'cursorMark' after totem: value must either be '*' or the 'nextCursorMark' returned by a previous search: AoJ4 NDNi\/ECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xNzc1OC9laXJhaTU=",

jq . *.err | grep "search:" | cut -f7 -d:

gives me:
value must either be '*' or the 'nextCursorMark' returned by a previous search

 AoJ7o 7Hk/ECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xMTQ1LzI1MzA1NDQ=",
 value must either be '*' or the 'nextCursorMark' returned by a previous search
 AoJ3pL 1svECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xMTQ1LzExMzg5NTM=",
 value must either be '*' or the 'nextCursorMark' returned by a previous search
 AoJ teyWtfECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4zMTE1LzEyMjU3MzM=",
 value must either be '*' or the 'nextCursorMark' returned by a previous search
 AoJx6NyU0 8CPwhodHRwOi8vZHguZG9pLm9yZy8xMC4xMDYxLzk3ODA3ODQ0ODEwMTE="

So I suspect the space in the token is the issue.
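One way to check the space hypothesis: the cursor tokens look like base64, whose alphabet includes `+`, and form-style query-string decoding reads an unescaped `+` as a space. The sketch below mimics only that single decoding step. `form_decode_plus` is a hypothetical helper, and the sample cursor assumes that the space in the error message above was originally a `+`; nothing here is taken from the Crossref or Solr code.

```shell
#!/usr/bin/env bash

# Hypothetical helper: mimic form-style query decoding, where '+' means space.
# It deliberately ignores %XX escapes; only the '+' handling is shown.
form_decode_plus() {
  local raw="$1"
  printf '%s\n' "${raw//+/ }"
}

# Assumption: the token from the error message, with its space put back as '+'.
cursor='AoJ4+NDNi/ECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xNzc1OC9laXJhaTU='

# Prints the token as a server would see it if the '+' was sent unescaped.
form_decode_plus "$cursor"
```

If the printed token matches the one quoted in the error message, the space most likely comes from an unescaped `+` in the request URL rather than from the token itself.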

Please document what kind of encoding you expect or, better, fix the upstream library to use tokens that need no encoding (do not use spaces). Improving the error message and pointing to the FAQ would also be helpful.

To close this issue, please let me know whether my space assumption is right and whether replacing the space with "+" will fix the problem.

@WolfgangFahl

cursor=$(jq '.message["next-cursor"]' $target | tr -d '"' | python -c "import urllib.parse;print (urllib.parse.quote(input()))")

fixes the issue see
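For setups without Python, the same percent-encoding step can be sketched in plain bash. This is a minimal sketch, not part of the original script: `urlencode` is a hypothetical helper name, it assumes ASCII input (which holds for base64-style cursor tokens), and unlike `urllib.parse.quote` with its default settings it also encodes `/`, which is still valid inside a query parameter.

```shell
#!/usr/bin/env bash

# Hypothetical helper: percent-encode every character except the
# RFC 3986 unreserved set (letters, digits, '-', '.', '_', '~').
# Assumes ASCII input such as a base64-style cursor token.
urlencode() {
  local LC_ALL=C              # compare and index byte-wise
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"   # "'x" makes printf yield the character code
         out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Usage with a made-up token: '+' becomes %2B, '/' becomes %2F, '=' becomes %3D.
cursor='AoJ4+NDNi/ECPw='
echo "cursor=$(urlencode "$cursor")"
```

In the download loop, the encoded value would replace the raw cursor before it is appended to the request URL, for example `cursor=$(urlencode "$cursor")` right after the `jq` extraction.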
