Split Nearby into a fast query for coordinates + a details query for each pin #4560

Closed
nicolas-raoul opened this issue Aug 18, 2021 · 25 comments · Fixed by #5731

Comments

@nicolas-raoul
Member

nicolas-raoul commented Aug 18, 2021

Nearby is too slow to use in Paris:

  1. Be in the center of Paris (or use a fake GPS app to pretend)
  2. Open Nearby
  3. Spinning forever, never finishes

I suggest splitting the current big SPARQL requests into:

  • a first simple request that only returns QID,coordinates pairs.
  • for each pin, an asynchronous request that loads the place's details: name, description, class, existence, etc.

I did some tests (both with 1km radius):

Due to radius stepping, the request that fetches coordinates is often performed several times, making it even more important to fine-tune.
Fetching details afterwards means that some pins will disappear when Exists is enabled, but I don't think that is a problem: the filter is disabled by default, and users who deliberately enabled it will understand why the pins disappear.
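
To illustrate why radius stepping multiplies the cost of the coordinates request, here is a minimal Python sketch (not the app's actual Java/Kotlin code). The doubling step, the thresholds, and the `fetch_coordinates` callable are assumptions made for this example only.

```python
from typing import Callable, List, Tuple

# (QID, latitude, longitude); the tuple shape is an assumption of this sketch.
Pin = Tuple[str, float, float]


def fetch_with_radius_stepping(
    fetch_coordinates: Callable[[float], List[Pin]],
    start_radius_km: float = 1.0,
    max_radius_km: float = 100.0,
    min_pins: int = 10,
) -> Tuple[float, List[Pin]]:
    """Widen the search radius until enough pins come back.

    fetch_coordinates stands in for the fast QID/coordinates request.
    It can be called several times, so any slowness is paid repeatedly.
    """
    radius = start_radius_km
    while True:
        pins = fetch_coordinates(radius)
        # Stop once there are enough pins or the radius cannot grow further.
        if len(pins) >= min_pins or radius >= max_radius_km:
            return radius, pins
        radius = min(radius * 2, max_radius_km)
```

Each loop iteration is one full request, which is why keeping that request small matters more than it first appears.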

Latest master.

@misaochan
Member

misaochan commented Aug 18, 2021

Hi @nicolas-raoul , this is a great idea, thanks! We will try to start implementing it. :)

One concern that we have is that this would result in a very large number of async calls, in typical use. For instance if there are 40 pins, we will be doing 40 calls. Do you think that would be an issue, or is it routinely done with no problems?

An alternative that I was thinking about is:

  • First request - we fetch everything that is needed for the map to load, but not everything that we currently do. So QID, coordinates, name, place type and existence (for filters), property type (WLM vs non-WLM)
  • The second request is only triggered when a pin is selected. Here we will fetch description, Commons link, Wikipedia link, etc

What do you think? I reckon it wouldn't result in time savings as big as your suggestion, so yours would be the ideal if the number of async calls doesn't become an issue. Also there is the downside of the user having to wait a bit each time a pin is selected. Not sure what effects it will have on limited connection mode, besides.
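
As a rough illustration of this alternative, the Python sketch below fetches a pin's extra details only when the pin is first selected and reuses them afterwards. `PinDetailsCache` and `fetch_details` are hypothetical names for this example, not existing app classes.

```python
from typing import Callable, Dict


class PinDetailsCache:
    """Fetch a pin's details on first selection and reuse them afterwards."""

    def __init__(self, fetch_details: Callable[[str], dict]) -> None:
        # fetch_details is a hypothetical network call returning description,
        # Commons link, Wikipedia link, etc. for a single QID.
        self._fetch_details = fetch_details
        self._cache: Dict[str, dict] = {}

    def on_pin_selected(self, qid: str) -> dict:
        # Only the first selection of a pin pays the network cost.
        if qid not in self._cache:
            self._cache[qid] = self._fetch_details(qid)
        return self._cache[qid]
```

With this shape the wait happens once per pin, so re-selecting a pin is instant, but the first selection still depends on the connection.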

@nicolas-raoul
Member Author

I don't think that 40 calls is that bad.
A slightly different idea is to make a second request that gets the details of all places together.

I suspect that existence checking is the most expensive. And it will get even more expensive in the future.

@misaochan
Member

Out of curiosity, why is existence so expensive?

@nicolas-raoul
Member Author

Here is an example of what the "second query" I mentioned above could look like: https://w.wiki/3ut$

SELECT ?item ?commonsCategory
WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    values ?item {
      wd:Q100234656
      wd:Q100234661
      # etc other places found by the first simple request
    }
    OPTIONAL {?item wdt:P373 ?commonsCategory}
    # etc other properties, including optional ones
  }
}
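
The `values` block above would have to be generated from the results of the first request. A minimal Python sketch of that step follows; `build_details_query` is a hypothetical helper, and it only requests the Commons category, as in the example above.

```python
import re
from typing import Iterable

# Wikidata item identifiers look like Q42; anything else is rejected so that
# unexpected input cannot alter the query text.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def build_details_query(qids: Iterable[str]) -> str:
    """Build the details query for the items returned by the first request."""
    values = []
    for qid in qids:
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a Wikidata QID: {qid!r}")
        values.append(f"wd:{qid}")
    if not values:
        raise ValueError("at least one QID is required")
    return (
        "SELECT ?item ?commonsCategory\n"
        "WHERE {\n"
        "  SERVICE <https://query.wikidata.org/sparql> {\n"
        f"    VALUES ?item {{ {' '.join(values)} }}\n"
        "    OPTIONAL {?item wdt:P373 ?commonsCategory}\n"
        "  }\n"
        "}"
    )
```

For example, `build_details_query(["Q100234656", "Q100234661"])` produces a query whose VALUES line lists both items.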

About existence: I only suspect it. I will try to take some measurements tomorrow. But the important comparison is between the two queries in the original post, and it clearly shows that showing coordinates could be much, much faster :-)

@ashishkumar468
Collaborator

So just to summarise what @nicolas-raoul said, we will be making 2 calls

  1. The initial call, which Nearby already makes as part of the radius expander logic, just with fewer properties. Once this is done, we show the Nearby map with the pins.
  2. Another call fetches the other properties in the background. Unlike the first query, it does not block the map; once the results are fetched, we update the existing pins with the additional properties.
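
The two calls summarised above could be wired together roughly as in this Python sketch. All four callables are placeholders invented for the example; on Android the pin update would additionally have to be posted back to the UI thread, which is not shown here.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


def load_nearby(
    fetch_coordinates: Callable[[], Dict[str, tuple]],
    fetch_details: Callable[[List[str]], Dict[str, dict]],
    show_pins: Callable[[Dict[str, tuple]], None],
    update_pins: Callable[[Dict[str, dict]], None],
    executor: ThreadPoolExecutor,
) -> Future:
    """Show pins after the fast call, then enrich them in the background."""
    # Call 1: blocking but small; it decides which pins appear on the map.
    coordinates = fetch_coordinates()
    show_pins(coordinates)

    # Call 2: runs on a worker thread and only enriches the existing pins.
    def enrich() -> None:
        update_pins(fetch_details(list(coordinates)))

    return executor.submit(enrich)
```

The returned `Future` lets the caller cancel or observe the background call, for instance when the user moves the map before the details arrive.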

@nicolas-raoul, please verify if I have understood it properly.

@nicolas-raoul
Member Author

@ashishkumar468 Exactly! 🙂

@nicolas-raoul
Member Author

Please let me know if you need any help with the SPARQL syntax.

@ashishkumar468
Collaborator

ashishkumar468 commented Aug 21, 2021

Hi @nicolas-raoul @misaochan, I was trying out the implementation the way Nicolas had suggested. Here is a sample query for my location. I'm not sure why, but the query fails specifically for this ID, wd:Q65976956; otherwise it seems to work. Here is the exception:

SPARQL-QUERY: queryStr=SELECT
     (SAMPLE(?location) as ?location)
     ?item
     (SAMPLE(COALESCE(?itemLabelPreferredLanguage, ?itemLabelAnyLanguage)) as ?label)
     (SAMPLE(COALESCE(?itemDescriptionPreferredLanguage, ?itemDescriptionAnyLanguage, "?")) as ?description)
     (SAMPLE(?classId) as ?class)
     (SAMPLE(COALESCE(?classLabelPreferredLanguage, ?classLabelAnyLanguage, "?")) as ?classLabel)
     (SAMPLE(COALESCE(?icon0, ?icon1)) as ?icon)
     ?wikipediaArticle
     ?commonsArticle
     (SAMPLE(?commonsCategory) as ?commonsCategory)
     (SAMPLE(?pic) as ?pic)
     (SAMPLE(?destroyed) as ?destroyed)
     (SAMPLE(?endTime) as ?endTime)
   WHERE {
     # Around given location...
     SERVICE <https://query.wikidata.org/sparql> {
              values ?item {
          wd:Q24947183 wd:Q598985 wd:Q6418713 wd:Q63371376 wd:Q63371413 wd:Q63356612 wd:Q63986772 wd:Q26791672 wd:Q63371403 wd:Q65976956
              }
            }


     # Get the label in the preferred language of the user, or any other language if no label is available in that language.
     OPTIONAL {?item rdfs:label ?itemLabelPreferredLanguage. FILTER (lang(?itemLabelPreferredLanguage) = "en")}
     OPTIONAL {?item rdfs:label ?itemLabelAnyLanguage}

     # Get the description in the preferred language of the user, or any other language if no description is available in that language.
     OPTIONAL {?item schema:description ?itemDescriptionPreferredLanguage. FILTER (lang(?itemDescriptionPreferredLanguage) = "en")}
     OPTIONAL {?item schema:description ?itemDescriptionAnyLanguage }

     # Get Commons category (P373)
     OPTIONAL { ?item wdt:P373 ?commonsCategory. }

     # Get (P18)
     OPTIONAL { ?item wdt:P18 ?pic. }

     # Get (P576)
     OPTIONAL { ?item wdt:P576 ?destroyed. }

     # Get (P582)
     OPTIONAL { ?item wdt:P582 ?endTime. }

     # Get the class label in the preferred language of the user, or any other language if no label is available in that language.
     OPTIONAL {
       ?item p:P31/ps:P31 ?classId.
       OPTIONAL {?classId rdfs:label ?classLabelPreferredLanguage. FILTER (lang(?classLabelPreferredLanguage) = "en")}
       OPTIONAL {?classId rdfs:label ?classLabelAnyLanguage}

       OPTIONAL {
           ?wikipediaArticle   schema:about ?item ;
                               schema:isPartOf <https://en.wikipedia.org/> .
         }
       OPTIONAL {
           ?wikipediaArticle   schema:about ?item ;
                               schema:isPartOf <https://en.wikipedia.org/> .
           SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
         }

         OPTIONAL {
           ?commonsArticle   schema:about ?item ;
                               schema:isPartOf <https://commons.wikimedia.org/> .
           SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
         }
     }
   }
   GROUP BY ?item ?wikipediaArticle ?commonsArticle

java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.openrdf.query.QueryEvaluationException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=4cd24526-5b55-4f6f-8bcc-2412a097e076,bopId=72,partitionId=-1,sinkId=73,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: Headless value factory should not be asked for its namespace
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:206)
	at com.bigdata.rdf.sail.webapp.BigdataServlet.submitApiTask(BigdataServlet.java:292)
	at com.bigdata.rdf.sail.webapp.QueryServlet.doSparqlQuery(QueryServlet.java:678)
	at com.bigdata.rdf.sail.webapp.QueryServlet.doGet(QueryServlet.java:290)
	at com.bigdata.rdf.sail.webapp.RESTServlet.doGet(RESTServlet.java:240)
	at com.bigdata.rdf.sail.webapp.MultiTenancyServlet.doGet(MultiTenancyServlet.java:273)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655)
	at org.wikidata.query.rdf.blazegraph.throttling.ThrottlingFilter.doFilter(ThrottlingFilter.java:320)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
	at org.wikidata.query.rdf.blazegraph.throttling.SystemOverloadFilter.doFilter(SystemOverloadFilter.java:82)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
	at ch.qos.logback.classic.helpers.MDCInsertingServletFilter.doFilter(MDCInsertingServletFilter.java:49)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
	at org.wikidata.query.rdf.blazegraph.filters.QueryEventSenderFilter.doFilter(QueryEventSenderFilter.java:117)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
	at org.wikidata.query.rdf.blazegraph.filters.ClientIPFilter.doFilter(ClientIPFilter.java:43)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
	at org.wikidata.query.rdf.blazegraph.filters.RealAgentFilter.doFilter(RealAgentFilter.java:33)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1340)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1242)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
	at org.eclipse.jetty.server.Server.handle(Server.java:503)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
	at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: org.openrdf.query.QueryEvaluationException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=4cd24526-5b55-4f6f-8bcc-2412a097e076,bopId=72,partitionId=-1,sinkId=73,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: Headless value factory should not be asked for its namespace
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at com.bigdata.rdf.sail.webapp.QueryServlet$SparqlQueryTask.call(QueryServlet.java:889)
	at com.bigdata.rdf.sail.webapp.QueryServlet$SparqlQueryTask.call(QueryServlet.java:695)
	at com.bigdata.rdf.task.ApiTaskForIndexManager.call(ApiTaskForIndexManager.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	... 1 more
Caused by: org.openrdf.query.QueryEvaluationException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=4cd24526-5b55-4f6f-8bcc-2412a097e076,bopId=72,partitionId=-1,sinkId=73,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: Headless value factory should not be asked for its namespace
	at com.bigdata.rdf.sail.Bigdata2Sesame2BindingSetIterator.hasNext(Bigdata2Sesame2BindingSetIterator.java:188)
	at info.aduna.iteration.IterationWrapper.hasNext(IterationWrapper.java:68)
	at org.openrdf.query.QueryResults.report(QueryResults.java:155)
	at org.openrdf.repository.sail.SailTupleQuery.evaluate(SailTupleQuery.java:76)
	at com.bigdata.rdf.sail.webapp.BigdataRDFContext$TupleQueryTask.doQuery(BigdataRDFContext.java:1722)
	at com.bigdata.rdf.sail.webapp.BigdataRDFContext$AbstractQueryTask.innerCall(BigdataRDFContext.java:1579)
	at com.bigdata.rdf.sail.webapp.BigdataRDFContext$AbstractQueryTask.call(BigdataRDFContext.java:1544)
	at com.bigdata.rdf.sail.webapp.BigdataRDFContext$AbstractQueryTask.call(BigdataRDFContext.java:757)
	... 4 more
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=4cd24526-5b55-4f6f-8bcc-2412a097e076,bopId=72,partitionId=-1,sinkId=73,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: Headless value factory should not be asked for its namespace
	at com.bigdata.rdf.sail.RunningQueryCloseableIterator.checkFuture(RunningQueryCloseableIterator.java:59)
	at com.bigdata.rdf.sail.RunningQueryCloseableIterator.close(RunningQueryCloseableIterator.java:73)
	at com.bigdata.rdf.sail.RunningQueryCloseableIterator.hasNext(RunningQueryCloseableIterator.java:82)
	at com.bigdata.striterator.ChunkedWrappedIterator.hasNext(ChunkedWrappedIterator.java:197)
	at com.bigdata.rdf.sail.Bigdata2Sesame2BindingSetIterator.hasNext(Bigdata2Sesame2BindingSetIterator.java:134)
	... 11 more
Caused by: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=4cd24526-5b55-4f6f-8bcc-2412a097e076,bopId=72,partitionId=-1,sinkId=73,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: Headless value factory should not be asked for its namespace
	at com.bigdata.util.concurrent.Haltable.get(Haltable.java:273)
	at com.bigdata.bop.engine.AbstractRunningQuery.get(AbstractRunningQuery.java:1516)
	at com.bigdata.bop.engine.AbstractRunningQuery.get(AbstractRunningQuery.java:104)
	at com.bigdata.rdf.sail.RunningQueryCloseableIterator.checkFuture(RunningQueryCloseableIterator.java:46)
	... 15 more
Caused by: java.lang.Exception: task=ChunkTask{query=4cd24526-5b55-4f6f-8bcc-2412a097e076,bopId=72,partitionId=-1,sinkId=73,altSinkId=null}, cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: Headless value factory should not be asked for its namespace
	at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTask.call(ChunkedRunningQuery.java:1367)
	at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTaskWrapper.run(ChunkedRunningQuery.java:926)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at com.bigdata.concurrent.FutureTaskMon.run(FutureTaskMon.java:63)
	at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkFutureTask.run(ChunkedRunningQuery.java:821)
	... 3 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Headless value factory should not be asked for its namespace
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTask.call(ChunkedRunningQuery.java:1347)
	... 8 more
Caused by: java.lang.RuntimeException: Headless value factory should not be asked for its namespace
	at com.bigdata.rdf.model.BigdataValueFactoryImpl.getNamespace(BigdataValueFactoryImpl.java:85)
	at com.bigdata.rdf.internal.encoder.IVSolutionSetEncoder.encodeSolution(IVSolutionSetEncoder.java:386)
	at com.bigdata.rdf.internal.encoder.IVSolutionSetEncoder.encodeSolution(IVSolutionSetEncoder.java:292)
	at com.bigdata.rdf.internal.encoder.IVSolutionSetEncoder.encodeSolution(IVSolutionSetEncoder.java:285)
	at com.bigdata.rdf.internal.encoder.SolutionSetStreamEncoder.encode(SolutionSetStreamEncoder.java:134)
	at com.bigdata.bop.solutions.SolutionSetStream.put(SolutionSetStream.java:277)
	at com.bigdata.bop.engine.LocalNativeChunkMessage.<init>(LocalNativeChunkMessage.java:213)
	at com.bigdata.bop.engine.LocalNativeChunkMessage.<init>(LocalNativeChunkMessage.java:147)
	at com.bigdata.bop.engine.StandaloneChunkHandler.handleChunk(StandaloneChunkHandler.java:92)
	at com.bigdata.bop.engine.ChunkedRunningQuery$HandleChunkBuffer.outputChunk(ChunkedRunningQuery.java:1699)
	at com.bigdata.bop.engine.ChunkedRunningQuery$HandleChunkBuffer.outputBufferedChunk(ChunkedRunningQuery.java:1716)
	at com.bigdata.bop.engine.ChunkedRunningQuery$HandleChunkBuffer.flush(ChunkedRunningQuery.java:1739)
	at com.bigdata.bop.solutions.PipelinedAggregationOp$ChunkTask.call(PipelinedAggregationOp.java:846)
	at com.bigdata.bop.solutions.PipelinedAggregationOp$ChunkTask.call(PipelinedAggregationOp.java:519)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTask.call(ChunkedRunningQuery.java:1346)
	... 8 more

This can also be seen by running the query link I have shared. Can anything be done about this?

@nicolas-raoul
Member Author

nicolas-raoul commented Aug 21, 2021

@ashishkumar468
Maybe because this item has no description in any language?
I will try to fix the request, but for now could you please just comment out the line (SAMPLE(COALESCE(?itemDescriptionPreferredLanguage, ?itemDescriptionAnyLanguage, "?")) as ?description)? That seems to make the request succeed.

@nicolas-raoul
Member Author

nicolas-raoul commented Aug 22, 2021

@ashishkumar468 This SPARQL query works in all situations, and returns all information, please use it, thanks 😊 :

SELECT
  ?item
  (SAMPLE(?label) AS ?label)
  (SAMPLE(?description) AS ?description)
  (SAMPLE(?class) AS ?class) # TODO Is it really used?
  (SAMPLE(?classLabel) AS ?classLabel)
  (SAMPLE(?pic) AS ?pic)
  (SAMPLE(?destroyed) AS ?destroyed)
  (SAMPLE(?endTime) AS ?endTime)
  (SAMPLE(?wikipediaArticle) AS ?wikipediaArticle)
  (SAMPLE(?commonsArticle) AS ?commonsArticle)
  (SAMPLE(?commonsCategory) AS ?commonsCategory)
WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    values ?item {
      wd:Q24947183 wd:Q598985 wd:Q6418713 wd:Q63371376 wd:Q63371413 wd:Q63356612 wd:Q63986772 wd:Q26791672 wd:Q63371403 wd:Q65976956
    }
  }

  # Get the label in the preferred language of the user, or any other language if no label is available in that language.
  OPTIONAL {?item rdfs:label ?itemLabelPreferredLanguage. FILTER (lang(?itemLabelPreferredLanguage) = "en")}
  OPTIONAL {?item rdfs:label ?itemLabelAnyLanguage}
  BIND(COALESCE(?itemLabelPreferredLanguage, ?itemLabelAnyLanguage, "?") as ?label)

  # Get the description in the preferred language of the user, or any other language if no description is available in that language.
  OPTIONAL {?item schema:description ?itemDescriptionPreferredLanguage. FILTER (lang(?itemDescriptionPreferredLanguage) = "en")}
  OPTIONAL {?item schema:description ?itemDescriptionAnyLanguage}
  BIND(COALESCE(?itemDescriptionPreferredLanguage, ?itemDescriptionAnyLanguage, "?") as ?description)

  # Get the class label in the preferred language of the user, or any other language if no label is available in that language.
  OPTIONAL {
    ?item p:P31/ps:P31 ?class.
    OPTIONAL {?class rdfs:label ?classLabelPreferredLanguage. FILTER (lang(?classLabelPreferredLanguage) = "en")}
    OPTIONAL {?class rdfs:label ?classLabelAnyLanguage}
    BIND(COALESCE(?classLabelPreferredLanguage, ?classLabelAnyLanguage, "?") as ?classLabel)
  }

  # Get picture
  OPTIONAL {?item wdt:P18 ?pic}

  # Get existence
  OPTIONAL {?item wdt:P576 ?destroyed}
  OPTIONAL {?item wdt:P582 ?endTime}

  # Get Commons category
  OPTIONAL {?item wdt:P373 ?commonsCategory}

  # Get Wikipedia article
  OPTIONAL {
    ?wikipediaArticle schema:about ?item.
    ?wikipediaArticle schema:isPartOf <https://en.wikipedia.org/>. # TODO internationalization
  }

  # Get Commons article
  OPTIONAL {
    ?commonsArticle schema:about ?item.
    ?commonsArticle schema:isPartOf <https://commons.wikimedia.org/>.
  }
}
GROUP BY ?item

@misaochan
Member

Thanks for helping with the query @nicolas-raoul ! Pinging @ashishkumar468 instead, I don't think the other username works. ;)

@misaochan
Member

Hi @nicolas-raoul , I just talked with Ashish. There are a few things that we need help clearing up, as we aren't very familiar with SPARQL.

  1. I just realized we are actually running two separate queries, one for Nearby (https://github.com/commons-app/apps-android-commons/blob/master/app/src/main/resources/queries/nearby_query.rq) and one for Monuments (https://github.com/commons-app/apps-android-commons/blob/master/app/src/main/resources/queries/monuments_query.rq). Am I correct in thinking that if we ONLY use nearby_query.rq, we should be able to get all the pins we need including the Monument pins, and then we can filter them client-side?

  2. Could you post the SPARQL for the first query (QID/coords) as well please?

Thanks a lot!

@nicolas-raoul
Member Author

nicolas-raoul commented Aug 23, 2021

My new suggestion:

Always running this as the first query:

SELECT
  ?location
  ?item
  ?monument
WHERE {
  # Around given location
  SERVICE wikibase:around {
    ?item wdt:P625 ?location.
    bd:serviceParam wikibase:center "Point(4.89 52.37)"^^geo:wktLiteral. # Longitude latitude
    bd:serviceParam wikibase:radius "1". # Radius in kilometers.
  }

  # Wiki Loves Monuments
  OPTIONAL {?item p:P1435 ?monument}
  OPTIONAL {?item p:P2186 ?monument}
  OPTIONAL {?item p:P1459 ?monument}
  OPTIONAL {?item p:P1460 ?monument}
  OPTIONAL {?item p:P1216 ?monument}
  OPTIONAL {?item p:P709 ?monument}
  OPTIONAL {?item p:P718 ?monument}
  OPTIONAL {?item p:P5694 ?monument}
}

It is super-fast and only gives places and their coordinates and their WLM identifier if they have one, which is enough to start showing pins on the map.
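
One easy mistake when filling in this query is the coordinate order: the point literal is written longitude first, then latitude. A small Python sketch of a helper that builds the SERVICE block follows; `around_block` is a hypothetical name for this example.

```python
def around_block(latitude: float, longitude: float, radius_km: float) -> str:
    """Return the SERVICE wikibase:around block for a centre and radius."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude out of range")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude out of range")
    if radius_km <= 0:
        raise ValueError("radius must be positive")
    # The literal is "Point(longitude latitude)", the reverse of the common
    # "latitude, longitude" order, hence the keyword-friendly signature.
    return (
        "  SERVICE wikibase:around {\n"
        "    ?item wdt:P625 ?location.\n"
        f'    bd:serviceParam wikibase:center "Point({longitude} {latitude})"^^geo:wktLiteral.\n'
        f'    bd:serviceParam wikibase:radius "{radius_km}".\n'
        "  }"
    )
```

Calling `around_block(latitude=52.37, longitude=4.89, radius_km=1)` reproduces the centre and radius used in the query above.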

... and then the second query from my comment above to get details. This second query is much slower (I now suspect it is due to my implementation of language fallback), so I would advise running it both:

  • Little by little (for instance 10 items at a time), prioritizing places that are currently visible on the user's screen.
  • AND when a pin is selected, just for that item.

(if there is no time to implement this, running the second query for all fetched items should work in most places, though)
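
The "little by little" idea can be sketched in Python as below: items visible on screen go first, and the details query is then run one batch at a time. The function name and the way visibility is passed in are assumptions of this example.

```python
from typing import Iterable, Iterator, List, Set


def detail_batches(
    qids: Iterable[str], visible: Set[str], batch_size: int = 10
) -> Iterator[List[str]]:
    """Yield batches of QIDs, with on-screen items ahead of the rest."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # dict.fromkeys removes duplicates while keeping the original order;
    # sorted() is stable, so order is preserved within each group.
    ordered = sorted(dict.fromkeys(qids), key=lambda qid: qid not in visible)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]
```

A pin selected by the user would bypass this queue and be requested immediately on its own, as described in the second bullet.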

For performance, I suggest removing the OPTIONAL {?item .* ?monument} lines whenever WLM is not activated.
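
A possible way to do this is to assemble the first query from parts, as in the Python sketch below. The property list is copied from the query above; `build_first_query` and its `service_block` parameter are hypothetical.

```python
from typing import Sequence

# Wiki Loves Monuments properties, copied from the first query above.
WLM_PROPERTIES = ("P1435", "P2186", "P1459", "P1460", "P1216", "P709", "P718", "P5694")


def build_first_query(
    service_block: str,
    wlm_enabled: bool,
    wlm_properties: Sequence[str] = WLM_PROPERTIES,
) -> str:
    """Build the coordinates query, adding ?monument only when WLM is on."""
    variables = ["?location", "?item"]
    body = [service_block]
    if wlm_enabled:
        variables.append("?monument")
        body.extend(
            f"  OPTIONAL {{?item p:{prop} ?monument}}" for prop in wlm_properties
        )
    return "SELECT " + " ".join(variables) + "\nWHERE {\n" + "\n".join(body) + "\n}"
```

When `wlm_enabled` is false, neither the `?monument` variable nor any OPTIONAL line appears in the generated query.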


By the way, I use this map to get an idea of Wikidata places density; it seems that Amsterdam and London are very crowded. If you know precise locations of the most crowded places, please share coordinates, that will be useful for testing :-)

@ashishkumar468
Collaborator

Hi @nicolas-raoul, thanks for sharing this. One of the other problems we realised is that my PR was making separate calls for nearby and monuments, while @misaochan suggests that this can be done in a single call with client-side filtering. I am not very comfortable with SPARQL queries; would it be possible for you to help with that too (without the split query implementation, for now)? What we are planning is to get that in place first and take this one as an improvement.

@misaochan
Member

misaochan commented Aug 23, 2021

Yeah, as mentioned above I just realized that we are making two separate queries for Nearby and Monuments, and I'm wondering if that was part of the issue, since we are getting a lot of points and their details twice, and both queries are done sequentially and block map load in the meantime. If so, I think it would make sense to try combining the two first, as that would be a much simpler implementation that could get the WLM release out the door in time for the start date. Even for this split query, I think we should sort out the combining before implementing it, otherwise I'm not sure how it would work.

What do you think @nicolas-raoul ? Do you figure we can combine https://github.com/commons-app/apps-android-commons/blob/master/app/src/main/resources/queries/nearby_query.rq with https://github.com/commons-app/apps-android-commons/blob/master/app/src/main/resources/queries/monuments_query.rq ? It looks to me like nearby_query.rq is already getting all the points we need, we just need to filter for the monument properties client-side. Or if a modification is needed to nearby_query to filter for monuments, what should we add to the query?

I tried running both queries in https://query.wikidata.org/ for Brisbane City, and every point that exists in monuments_query also exists in nearby_query. So for that area we were sending duplicate requests for 105 out of 227 points, at a 0.7km radius.
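
For the client-side filtering, a minimal Python sketch is shown below. It assumes that a single combined query returns rows with an `item` key and, for monuments only, a `monument` key, and that an item may appear in more than one row; both the row shape and the function name are assumptions of this example.

```python
from typing import Dict, Iterable, List, Tuple


def split_monuments(rows: Iterable[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Split items into (monuments, others) from one combined result set."""
    is_monument: Dict[str, bool] = {}
    for row in rows:
        item = row["item"]
        # An item counts as a monument if any of its rows binds ?monument.
        is_monument[item] = is_monument.get(item, False) or bool(row.get("monument"))
    monuments = [item for item, flag in is_monument.items() if flag]
    others = [item for item, flag in is_monument.items() if not flag]
    return monuments, others
```

Each item ends up in exactly one of the two lists, so no point is requested or drawn twice.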

Pinging @neslihanturan for input as well.

@nicolas-raoul
Member Author

Sure, here is a query that gives everything, coordinates and details and whether it is a WLM monument:

SELECT
  ?item
  (SAMPLE(?label) AS ?label)
  (SAMPLE(?description) AS ?description)
  (SAMPLE(?class) AS ?class) # TODO Is it really used?
  (SAMPLE(?classLabel) AS ?classLabel)
  (SAMPLE(?pic) AS ?pic)
  (SAMPLE(?destroyed) AS ?destroyed)
  (SAMPLE(?endTime) AS ?endTime)
  (SAMPLE(?wikipediaArticle) AS ?wikipediaArticle)
  (SAMPLE(?commonsArticle) AS ?commonsArticle)
  (SAMPLE(?commonsCategory) AS ?commonsCategory)
  (SAMPLE(?monument) AS ?monument)
WHERE {
  # Around given location
  SERVICE wikibase:around {
    ?item wdt:P625 ?location.
    bd:serviceParam wikibase:center "Point(4.89 52.37)"^^geo:wktLiteral. # Longitude latitude
    bd:serviceParam wikibase:radius "0.1". # Radius in kilometers.
  }

  # Get the label in the preferred language of the user, or any other language if no label is available in that language.
  OPTIONAL {?item rdfs:label ?itemLabelPreferredLanguage. FILTER (lang(?itemLabelPreferredLanguage) = "en")}
  OPTIONAL {?item rdfs:label ?itemLabelAnyLanguage}
  BIND(COALESCE(?itemLabelPreferredLanguage, ?itemLabelAnyLanguage, "?") as ?label)

  # Get the description in the preferred language of the user, or any other language if no description is available in that language.
  OPTIONAL {?item schema:description ?itemDescriptionPreferredLanguage. FILTER (lang(?itemDescriptionPreferredLanguage) = "en")}
  OPTIONAL {?item schema:description ?itemDescriptionAnyLanguage}
  BIND(COALESCE(?itemDescriptionPreferredLanguage, ?itemDescriptionAnyLanguage, "?") as ?description)

  # Get the class label in the preferred language of the user, or any other language if no label is available in that language.
  OPTIONAL {
    ?item p:P31/ps:P31 ?class.
    OPTIONAL {?class rdfs:label ?classLabelPreferredLanguage. FILTER (lang(?classLabelPreferredLanguage) = "en")}
    OPTIONAL {?class rdfs:label ?classLabelAnyLanguage}
    BIND(COALESCE(?classLabelPreferredLanguage, ?classLabelAnyLanguage, "?") as ?classLabel)
  }

  # Get picture
  OPTIONAL {?item wdt:P18 ?pic}

  # Get existence
  OPTIONAL {?item wdt:P576 ?destroyed}
  OPTIONAL {?item wdt:P582 ?endTime}

  # Get Commons category
  OPTIONAL {?item wdt:P373 ?commonsCategory}

  # Get Wikipedia article
  OPTIONAL {
    ?wikipediaArticle schema:about ?item.
    ?wikipediaArticle schema:isPartOf <https://en.wikipedia.org/>. # TODO internationalization
  }

  # Get Commons article
  OPTIONAL {
    ?commonsArticle schema:about ?item.
    ?commonsArticle schema:isPartOf <https://commons.wikimedia.org/>.
  }

  # Wiki Loves Monuments
  OPTIONAL {?item p:P1435 ?monument}
  OPTIONAL {?item p:P2186 ?monument}
  OPTIONAL {?item p:P1459 ?monument}
  OPTIONAL {?item p:P1460 ?monument}
  OPTIONAL {?item p:P1216 ?monument}
  OPTIONAL {?item p:P709 ?monument}
  OPTIONAL {?item p:P718 ?monument}
  OPTIONAL {?item p:P5694 ?monument}
}
GROUP BY ?item

Obviously it times out in crowded places. The example above took 30 seconds even with a tiny radius of 0.1.

@misaochan
Member

Thanks a lot Nicolas!

@misaochan
Member

To be handled after WLM.

@kanahia1
Contributor

Hey @nicolas-raoul, Can I work on this issue 🙂

@shashankiitbhu
Contributor

shashankiitbhu commented Mar 25, 2024

@nicolas-raoul I think this issue should be locked for GSoC, as it is mentioned on Phabricator too, and the other tasks mentioned there have been left for GSoC.

Since the other optional task is also completed, and the one main task (Modernize the app by replacing Kotlin Android Extensions and Butterknife with ViewBinding: #4664) is also completed, only this task is left to replace the completed main task #4664.

Since it was mentioned on Phabricator alongside tasks that are already locked, I (and other contributors too) have included this task in the proposal and built the proposal around it. So I think it's better to leave it for GSoC, since less than a week is left for application submission.

@kanahia1
Contributor

Oh sorry @shashankiitbhu, I have already started working on it. I will share a PR ASAP 🙂. @nicolas-raoul, maybe we can have a different issue for GSoC.

@shashankiitbhu
Contributor

@kanahia1 I mean, it has not been long since you were assigned this issue, and honestly, I have also done a lot of initial work on this for the GSoC proposal that I have submitted, since I had to make the timeline for it. So all of us are even up to this point. It's good that you have seen this; it'll save you a lot of time working on this.

The thing is, this issue is already mentioned and proposals are already built on it, so it would be really counterproductive to change now.

However, I am waiting for @nicolas-raoul's suggestions on what should be done here, so it's best to wait for confirmation. Thanks.

@kanahia1 kanahia1 removed their assignment Mar 25, 2024
@kanahia1
Contributor

Sure, No Worries @shashankiitbhu. Sorry for bothering you 😞

@shashankiitbhu
Contributor

@kanahia1 no problem, you didn't bother me at all, and thanks for understanding what I was trying to explain here.

I am sure there are many other issues in this project that you can work on to help improve it.

@kanahia1
Contributor

Sorry for the delay, I was out of station for a bit. I have already restructured the queries given above so that whoever works on this issue during GSoC can use this code without any hassle. Thanks to Nicolas Raoul, Neslihanturan, Josephine Lim, and Ashish for the amazing input.

rectangle_query_for_nearby_only_coordinates.rq

SELECT
  ?location
  ?item
WHERE {
  # Around given location
  SERVICE wikibase:box {
    ?item wdt:P625 ?location.
    bd:serviceParam wikibase:cornerWest "Point(84.2 26.3)"^^geo:wktLiteral.
    bd:serviceParam wikibase:cornerEast "Point(84.6 26.6)"^^geo:wktLiteral.
  }
}
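
The corner points in the query above are literal sample values. As a minimal sketch of how they might be filled in at request time, the Python helper below substitutes a bounding box into the same query text. The placeholder names (`LONG_WEST`, `LAT_WEST`, `LONG_EAST`, `LAT_EAST`) and the function `build_box_query` are hypothetical and are not taken from the app's code.

```python
from string import Template

# Same text as rectangle_query_for_nearby_only_coordinates.rq, with the sample
# corner points replaced by hypothetical ${...} placeholders.
BOX_QUERY = Template("""SELECT
  ?location
  ?item
WHERE {
  # Around given location
  SERVICE wikibase:box {
    ?item wdt:P625 ?location.
    bd:serviceParam wikibase:cornerWest "Point(${LONG_WEST} ${LAT_WEST})"^^geo:wktLiteral.
    bd:serviceParam wikibase:cornerEast "Point(${LONG_EAST} ${LAT_EAST})"^^geo:wktLiteral.
  }
}""")


def build_box_query(west_lon, west_lat, east_lon, east_lat):
    """Return the coordinates-only query for the given box corners.

    WKT points are written as "Point(longitude latitude)", matching the
    sample values in the query above.
    """
    for lon in (west_lon, east_lon):
        if not -180.0 <= lon <= 180.0:
            raise ValueError(f"longitude out of range: {lon}")
    for lat in (west_lat, east_lat):
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"latitude out of range: {lat}")
    return BOX_QUERY.substitute(
        LONG_WEST=west_lon, LAT_WEST=west_lat,
        LONG_EAST=east_lon, LAT_EAST=east_lat,
    )


if __name__ == "__main__":
    # Reproduces the sample query above.
    print(build_box_query(84.2, 26.3, 84.6, 26.6))
```

Validating the ranges before substitution keeps malformed coordinates out of the query text, which matters because the values are pasted into the query as plain strings.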

rectangle_query_for_nearby_monuments_only_coordinates.rq

SELECT
  ?location
  ?item
  ?monument
WHERE {
  # Around given location
  SERVICE wikibase:box {
    ?item wdt:P625 ?location.
    bd:serviceParam wikibase:cornerWest "Point(80.2 26.3)"^^geo:wktLiteral.
    bd:serviceParam wikibase:cornerEast "Point(84.6 26.6)"^^geo:wktLiteral.
  }

  # Wiki Loves Monuments
  OPTIONAL {?item p:P1435 ?monument}
  OPTIONAL {?item p:P2186 ?monument}
  OPTIONAL {?item p:P1459 ?monument}
  OPTIONAL {?item p:P1460 ?monument}
  OPTIONAL {?item p:P1216 ?monument}
  OPTIONAL {?item p:P709 ?monument}
  OPTIONAL {?item p:P718 ?monument}
  OPTIONAL {?item p:P5694 ?monument}
  OPTIONAL {?item p:P3426 ?monument}
}

query_for_item.rq

SELECT
  ?item
  (SAMPLE(?label) AS ?label)
  (SAMPLE(?description) AS ?description)
  (SAMPLE(?class) AS ?class) # TODO Is it really used?
  (SAMPLE(?classLabel) AS ?classLabel)
  (SAMPLE(?pic) AS ?pic)
  (SAMPLE(?destroyed) AS ?destroyed)
  (SAMPLE(?endTime) AS ?endTime)
  (SAMPLE(?wikipediaArticle) AS ?wikipediaArticle)
  (SAMPLE(?commonsArticle) AS ?commonsArticle)
  (SAMPLE(?commonsCategory) AS ?commonsCategory)
WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    values ?item {
      wd:Q24947183
    }
  }

  # Get the label in the preferred language of the user, or any other language if no label is available in that language.
  OPTIONAL {?item rdfs:label ?itemLabelPreferredLanguage. FILTER (lang(?itemLabelPreferredLanguage) = "en")}
  OPTIONAL {?item rdfs:label ?itemLabelAnyLanguage}
  BIND(COALESCE(?itemLabelPreferredLanguage, ?itemLabelAnyLanguage, "?") as ?label)

  # Get the description in the preferred language of the user, or any other language if no description is available in that language.
  OPTIONAL {?item schema:description ?itemDescriptionPreferredLanguage. FILTER (lang(?itemDescriptionPreferredLanguage) = "en")}
  OPTIONAL {?item schema:description ?itemDescriptionAnyLanguage}
  BIND(COALESCE(?itemDescriptionPreferredLanguage, ?itemDescriptionAnyLanguage, "?") as ?description)

  # Get the class label in the preferred language of the user, or any other language if no label is available in that language.
  OPTIONAL {
    ?item p:P31/ps:P31 ?class.
    OPTIONAL {?class rdfs:label ?classLabelPreferredLanguage. FILTER (lang(?classLabelPreferredLanguage) = "en")}
    OPTIONAL {?class rdfs:label ?classLabelAnyLanguage}
    BIND(COALESCE(?classLabelPreferredLanguage, ?classLabelAnyLanguage, "?") as ?classLabel)
  }

  # Get picture
  OPTIONAL {?item wdt:P18 ?pic}

  # Get existence
  OPTIONAL {?item wdt:P576 ?destroyed}
  OPTIONAL {?item wdt:P582 ?endTime}

  # Get Commons category
  OPTIONAL {?item wdt:P373 ?commonsCategory}

  # Get Wikipedia article
  OPTIONAL {
    ?wikipediaArticle schema:about ?item.
    ?wikipediaArticle schema:isPartOf <https://en.wikipedia.org/>. # TODO internationalization
  }

  # Get Commons article
  OPTIONAL {
    ?commonsArticle schema:about ?item.
    ?commonsArticle schema:isPartOf <https://commons.wikimedia.org/>.
  }
}
GROUP BY ?item
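
Earlier in this thread there was a concern that one details call per pin could mean dozens of requests. Since a SPARQL `VALUES` clause can list several terms, one possible mitigation is to put a handful of QIDs into the `values ?item { ... }` block of the query above. The sketch below only builds that block; `build_values_clause` is a hypothetical helper, and whether batching keeps the query as fast as the single-item case has not been measured here.

```python
import re

# Wikidata item identifiers look like "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def build_values_clause(qids):
    """Build the `values ?item { ... }` block for one or more QIDs.

    Every QID is validated first, because the result is pasted into the
    query text as-is.
    """
    if not qids:
        raise ValueError("at least one QID is required")
    for qid in qids:
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a valid QID: {qid!r}")
    lines = "\n".join(f"      wd:{qid}" for qid in qids)
    return "    values ?item {\n" + lines + "\n    }"


if __name__ == "__main__":
    # With a single QID this reproduces the block in the query above.
    print(build_values_clause(["Q24947183"]))
```

Because the query already ends with `GROUP BY ?item`, each listed item would come back as its own row.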

Note for contributors:
The first two queries return the items and their coordinates. The third query can then be run for a given item to fetch the information needed to populate the bottom sheet. When I ran the third query it took 72 ms (that is really instant 🤯), which I think users can comfortably wait for.
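
The two-step flow described in this comment can be sketched roughly as follows, assuming the coordinates query is requested in the standard SPARQL JSON results format. `parse_pins`, `PinDetailsLoader`, and the `fetch_details` callback are hypothetical names for illustration; the actual network call is left to the caller.

```python
import re

# WKT point literal as returned for ?location, e.g. "Point(84.4 26.5)".
_NUMBER = r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
POINT_PATTERN = re.compile(rf"Point\(({_NUMBER}) ({_NUMBER})\)")


def parse_pins(sparql_json):
    """Step 1: map each QID to (latitude, longitude) from the coordinates query."""
    pins = {}
    for binding in sparql_json["results"]["bindings"]:
        # ?item is an entity URI; the QID is its last path segment.
        qid = binding["item"]["value"].rsplit("/", 1)[-1]
        match = POINT_PATTERN.fullmatch(binding["location"]["value"])
        if match is None:
            continue  # skip rows whose location is not a plain point
        lon, lat = float(match.group(1)), float(match.group(2))
        pins[qid] = (lat, lon)
    return pins


class PinDetailsLoader:
    """Step 2: fetch details only when a pin is opened, and remember them."""

    def __init__(self, fetch_details):
        # fetch_details(qid) is assumed to run query_for_item.rq for that QID.
        self._fetch_details = fetch_details
        self._cache = {}

    def details_for(self, qid):
        if qid not in self._cache:
            self._cache[qid] = self._fetch_details(qid)
        return self._cache[qid]


if __name__ == "__main__":
    sample = {"results": {"bindings": [{
        "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q24947183"},
        "location": {"type": "literal", "value": "Point(84.4 26.5)"},
    }]}}
    print(parse_pins(sample))
```

Caching by QID means that reopening the same pin does not trigger a second details request.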
