# Query the index for threads and topics

## Prerequisites

* Enable **port forwarding** from port 23456 to localhost:23456
* Point your browser to **http://localhost:23456**
* You will see the default Solr interface

## Basic querying

* Select "s24_top" from *Core selector*
* Click *Query*
* Click *Execute Query* at the bottom of the screen

* This returns all documents, because the default value of the *q* field, \*:\*, matches everything. Queries have the form "field:query", so \*:\* means any value in any field. The index has the following fields:

* **id** thread ID in Suomi24
* **thread_txt_fi** the text of the thread
* **date_dt** the date of the first post
* **best1_s** the highest-scoring topic for this thread
* **best1_f** the probability of the best scoring topic
* **bestN_ss** the few highest-scoring topics
* **s24_area_s** and **s24_subarea_s** S24 sections
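
The same match-everything query can also be sent without the admin interface. Below is a minimal sketch, assuming the port forwarding from the prerequisites is active and that the core answers on Solr's standard `select` handler. The helper names `select_url` and `fetch_json` are illustrative and not part of any library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed core URL, taken from the port-forwarding setup above
CORE_URL = "http://localhost:23456/solr/s24_top"


def select_url(q="*:*", rows=10, fl=None, core_url=CORE_URL):
    """Build a URL for the core's select handler (hypothetical helper)."""
    params = {"q": q, "rows": rows, "wt": "json"}
    if fl is not None:
        params["fl"] = fl  # comma-separated list of fields to return
    return core_url + "/select?" + urlencode(params)


def fetch_json(url):
    """Send the request and decode the JSON response (needs a running Solr)."""
    with urlopen(url) as response:
        return json.load(response)
```

With the server reachable, `fetch_json(select_url(fl="id,best1_s"))["response"]["numFound"]` should give the total number of threads in the index.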

*Topic names* can be listed with the following stats query (the names were originally defined in the file `top_names_50_400k.txt`):

http://localhost:23456/solr/s24_top/select?stats=on&stats.field=best1_s&rows=0&stats.calcdistinct=true&q=*:*
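
The response to the stats query can be reduced to a plain list of topic names. This is a sketch, assuming the decoded JSON response carries the values under `stats`, then `stats_fields`, then `best1_s`, then `distinctValues`. The function name `topic_names` and the sample response are made up for illustration; a real response lists every topic.

```python
def topic_names(stats_response):
    """Extract the distinct best1_s values from a decoded Solr stats response."""
    # Assumed layout: stats -> stats_fields -> <field> -> distinctValues
    field_stats = stats_response["stats"]["stats_fields"]["best1_s"]
    return sorted(field_stats.get("distinctValues", []))


# Hand-written stand-in for a decoded response, not real server output
sample = {
    "stats": {
        "stats_fields": {
            "best1_s": {"distinctValues": ["matkailu", "autohuolto", "it_hankinta"]}
        }
    }
}
print(topic_names(sample))
```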

## Querying for topics and sorting

* **best1_s:työ** gives all documents whose best topic is *työ*, in no particular order
* **bestN_ss:remontti** gives all documents for which *remontti* is among the top topics
* **+best1_s:raskaus +thread_txt_fi:koira** gives all documents whose best topic is *raskaus* and whose text contains the word *koira*
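
Query strings like these can also be assembled in code. A small sketch follows; `topic_query` is a hypothetical helper, and it assumes that topic names and words contain no characters that are special in the Solr query syntax (spaces, colons, quotes).

```python
def topic_query(topic, word=None, top_n=False):
    """Build a query for threads about `topic`, optionally containing `word`."""
    # bestN_ss matches any of the top topics, best1_s only the best one
    field = "bestN_ss" if top_n else "best1_s"
    clauses = ["+{}:{}".format(field, topic)]
    if word is not None:
        clauses.append("+thread_txt_fi:{}".format(word))
    return " ".join(clauses)


print(topic_query("matkailu"))
print(topic_query("remontti", top_n=True))
print(topic_query("raskaus", word="koira"))
```

The leading `+` marks a clause as required. It is redundant for a single clause, but it keeps the helper uniform.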

Sorting is easy:

* **date_dt desc** in the *sort* field sorts by date, newest first
* **best1_f desc** in the *sort* field, combined with **best1_s:opiskelu** in the *q* field, finds the most representative threads for the topic *opiskelu*
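
With pysolr, the sort order is passed as a keyword argument to `search`. Here is a sketch of the "most representative threads" query. `representative_threads` is an illustrative name, and `solr` is expected to be a `pysolr.Solr` object for the `s24_top` core (any object with a compatible `search` method works).

```python
def representative_threads(solr, topic, rows=5):
    """Return the `rows` threads with the highest best1_f score for `topic`."""
    results = solr.search(
        "best1_s:{}".format(topic),
        sort="best1_f desc",      # highest topic probability first
        fl="id,best1_f,date_dt",  # return only these fields
        rows=rows,
    )
    return results.docs
```

For example, `representative_threads(solr, "opiskelu")` would return the five threads that the topic model most confidently assigns to *opiskelu*.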

Full query language documentation: http://www.solrtutorial.com/solr-query-syntax.html

# Topic distribution in S24

* We can query each topic in turn programmatically and record how many hits it gets
* Then we can sort the topics by hit count to see which are the largest

This needs a bit of Python:


In [2]:
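import pysolr

# Connect to the s24_top core through the forwarded port
solr = pysolr.Solr("http://localhost:23456/solr/s24_top")
# Match everything, newest first, returning only the id and best topic of each thread
result = solr.search(q="*:*", sort="date_dt desc", fl="id,best1_s")
print("Found this many:", result.hits)  # total number of matching documents
print("Here they are:", result.docs)  # first page only (Solr returns 10 rows by default)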

Found this many: 43927
Here they are: [{'id': '13680991', 'best1_s': 'ulkopolitiikka_sota_nato'}, {'id': '13680981', 'best1_s': 'roska_6'}, {'id': '13680971', 'best1_s': 'perhe_suhteet'}, {'id': '13680961', 'best1_s': 'matkailu'}, {'id': '13680951', 'best1_s': 'matkailu'}, {'id': '13680941', 'best1_s': 'englanti'}, {'id': '13680931', 'best1_s': 'positiivinen_elämä'}, {'id': '13680921', 'best1_s': 'matkailu'}, {'id': '13680911', 'best1_s': 'autohuolto'}, {'id': '13680901', 'best1_s': 'it_hankinta'}]


In [4]:
# The t_0, t_1, ... names used here do not match the topic names in the index, hence 0 hits.
# The loop must be modified to use the topic names from
# http://localhost:23456/solr/s24_top/select?stats=on&stats.field=best1_s&rows=0&stats.calcdistinct=true&q=*:*
for topic_id in range(3):
    response = solr.search("best1_s:t_{}".format(topic_id))
    print("topic", topic_id, "hits", response.hits)

topic 0 hits 0
topic 1 hits 0
topic 2 hits 0
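
One way to get non-zero counts is to loop over the real topic names instead of `t_0`, `t_1`, and so on. A sketch under that assumption: `topic_sizes` is an illustrative helper, and `solr` is the `pysolr.Solr` object created earlier (or anything with a compatible `search` method).

```python
def topic_sizes(solr, topics):
    """Count the threads whose best topic is each of `topics`, largest first."""
    sizes = []
    for topic in topics:
        # rows=0: only the hit count is needed, not the documents themselves
        response = solr.search("best1_s:{}".format(topic), rows=0)
        sizes.append((topic, response.hits))
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)
```

Calling `topic_sizes(solr, ["matkailu", "autohuolto", "it_hankinta"])` uses three names seen in the earlier output; the full list would come from the stats query mentioned in the comment above.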
