# *Chaining search*: Sandbox

Use this notebook to combine linguistic resources yourself: corpora, lexica and treebanks. 
 * Use methods from our library *chaininglib*, [described in the documentation](doc/_build/html/index.html)
 * To get an idea of the possibilities and to copy code, go to the [Examples notebook](Examples.ipynb).
 * If you encounter any bugs or errors, please let us know via our [GitHub issue tracker](https://github.com/INL/chaining-search/issues) or send an e-mail to servicedesk@ivdnt.org.

In [1]:
# Search: build lexicon and corpus queries
from chaininglib.search.CorpusQuery import *
from chaininglib.search.LexiconQuery import *
from chaininglib.search.corpusQueries import corpus_query

# Process: helpers for corpus and lexicon results
from chaininglib.process.corpus import *
from chaininglib.process.lexicon import get_diamant_synonyms

# UI: display DataFrames in the notebook
from chaininglib.ui.dfui import *
from chaininglib.ui.dfui import display_df

# Standard library and third-party imports used in the cells below
import re
import pandas as pd
from IPython.display import display, HTML



Lexicon service query types: 
* expand
* get_wordforms 
* get_lemma_from_wordform 
* get_wordforms_from_lemma_id 
* get_related_lemmata
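
Judging by the query URLs that the library prints in the outputs below, each query type appears to correspond to an endpoint of the lexicon service, with the lexicon name and the search term passed as URL parameters. The sketch below rebuilds such a URL with the standard library only. The helper name `build_lexicon_url` is hypothetical and not part of *chaininglib*, and the parameter set is copied from the printed URLs rather than from the service documentation.

```python
from urllib.parse import urlencode

# Base URL as it appears in the query URLs printed by the notebook.
BASE_URL = "http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon"

def build_lexicon_url(query_type, database, **search_params):
    """Hypothetical helper: rebuild the URL for one lexicon query type."""
    params = {
        "case_sensitive": "false",
        "tweaked_queries": "true",
        "database": database,
    }
    # Add the search term, e.g. wordform="mathieu" or lemma_id="8381".
    params.update(search_params)
    return BASE_URL + "/" + query_type + "?" + urlencode(params)

print(build_lexicon_url("expand", "nameslex", wordform="mathieu"))
```

Comparing the printed URL with the one the library reports can help when a query returns unexpected results.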

In [2]:
lexicon_name='nameslex'

search_word='mathieu'

lex = create_lexicon(lexicon_name).query_type('expand').word(search_word).search()

df_lexicon = lex.kwic()

display_df(df_lexicon)

...Searching nameslex...
Query URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/expand?case_sensitive=false&tweaked_queries=true&database=nameslex&wordform=mathieu

| | wordform | query_lemma_id | query_pos | query_word |
|---|---|---|---|---|
| 0 | | | | mathieu |
| 1 | lemaheu | | | mathieu |
| 2 | lemahie | | | mathieu |
| 3 | lemahieu | | | mathieu |
| 4 | lemahui | | | mathieu |
| 5 | lemavie | | | mathieu |
| 6 | maahijs | | | mathieu |
| 7 | maarthis | | | mathieu |
| 8 | maate | | | mathieu |
| 9 | maatea | | | mathieu |



In [3]:
lexicon_name='nameslex'

search_word='jesse'

lex = create_lexicon(lexicon_name).query_type('get_lemma_from_wordform').word(search_word).search()

df_lexicon = lex.kwic()

display_df(df_lexicon)

Query URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/get_lemma_from_wordform?case_sensitive=false&tweaked_queries=true&database=nameslex&wordform=jesse

| | dataset | lemma | lemma_id | pos |
|---|---|---|---|---|
| 0 | names_sns | | 8381 | |
| 1 | names_gn | | 435 | F M |
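
The following cells reuse the `lemma_id` 8381 returned here by typing it in by hand. A minimal sketch of picking the ids up programmatically is given below. It works on plain dictionaries that mirror the rows of the table above, so it runs without the lexicon service; the helper name `lemma_ids` is hypothetical, and whether the real DataFrame stores the ids as strings or integers is an assumption.

```python
# Rows mirroring the result table above; with a real result you could obtain
# the same structure via df_lexicon.to_dict("records") (pandas).
rows = [
    {"dataset": "names_sns", "lemma": "", "lemma_id": "8381", "pos": ""},
    {"dataset": "names_gn", "lemma": "", "lemma_id": "435", "pos": "F M"},
]

def lemma_ids(rows, dataset=None):
    """Hypothetical helper: collect lemma ids, optionally for one dataset."""
    return [
        str(row["lemma_id"])
        for row in rows
        if dataset is None or row["dataset"] == dataset
    ]

ids = lemma_ids(rows, dataset="names_sns")
print(ids)  # ['8381']
```

Each id could then be passed to `.lemma_id(...)` in a follow-up query, in place of the literal `'8381'`.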



In [4]:
lex = create_lexicon(lexicon_name).query_type('get_wordforms_from_lemma_id').lemma_id('8381').search()

df_lexicon = lex.kwic()

display_df(df_lexicon)

...Searching nameslex...
Query URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/get_wordforms_from_lemma_id?case_sensitive=false&tweaked_queries=true&database=nameslex&lemma_id=8381

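| | wordform | query_lemma_id | query_pos | query_word |
|---|---|---|---|---|
| 0 | jees | 8381 | | |
| 1 | jeessen | 8381 | | |
| 2 | jehse | 8381 | | |
| 3 | jes | 8381 | | |
| 4 | jeschen | 8381 | | |
| 5 | jeseer | 8381 | | |
| 6 | jesel | 8381 | | |
| 7 | jesella | 8381 | | |
| 8 | jesen | 8381 | | |
| 9 | jeseph | 8381 | | |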



In [5]:
lex = create_lexicon(lexicon_name).query_type('get_related_lemmata').lemma_id('8381').search()

df_lexicon = lex.kwic()

display_df(df_lexicon)

Query URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/get_related_lemmata?case_sensitive=false&tweaked_queries=true&database=nameslex&lemma_id=8381

| | lemma | lemma_id | pos | relation |
|---|---|---|---|---|
| 0 | | 8480 | | synonym |
| 1 | | 8388 | | synonym |
| 2 | | 8381 | | synonym |



In [7]:
# Query expansion with the names lexicon

corpus = "zeebrieven"
lexicon_name = 'nameslex'
search_word = 'jan'



# First, look up name variants in the names lexicon
lq = create_lexicon(lexicon_name).query_type('expand').word(search_word).search()
df_lexicon = lq.kwic()

syns = df_lexicon['wordform'].tolist() # get the wordform column from df_lexicon
syns.append(search_word) # Also add search word itself
display(HTML('Name variants of <b>' + search_word + '</b>: ' + ", ".join(syns[1:30])))

# Search for all name variants in the corpus
## Create queries: one per variant, matched on word form
filtered_syns = [syn for syn in syns if len(syn) > 0 and re.match("^[a-zA-Z]+$", syn)]
syns_queries = [corpus_query(word=syn) for syn in filtered_syns][0:30]
## display(syns_queries)
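df = pd.DataFrame()
# Separate queries, one per variant (slow); set the flag to True to enable
run_separate_queries = False
if run_separate_queries:
    for one_pattern in syns_queries:
        print("Searching for pattern: " + one_pattern)
        cq = create_corpus(corpus).pattern(one_pattern).max_results(10).search()
        kwic = cq.kwic()
        print("Results: " + str(len(kwic.index)))
        # DataFrame.append was removed in pandas 2.0; use pd.concat instead
        df = pd.concat([df, kwic])
    display_df(df)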

# one query (faster but should be carried out in portions)
bigOr = '|'.join(filtered_syns[0:500])

or_pattern = corpus_query(word=bigOr,pos='NEPER')
#print(or_pattern)

cq = create_corpus(corpus).pattern(or_pattern).max_results(1000).search()
kwic = cq.kwic()
display(HTML("<i>Results for big or</i>: " + str(len(kwic.index))))
df = pd.concat([df, kwic])  # DataFrame.append was removed in pandas 2.0
display_df(df)


Query URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/expand?case_sensitive=false&tweaked_queries=true&database=nameslex&wordform=jan

| | left context | lemma 0 | pos 0 | word 0 | right context |
|---|---|---|---|---|---|
| 0 | Aen mijn welbeminde man schipper | Hans | NEPER | hans | louwerensen decker leggende mijt sijn |
| 1 | u l Dienstwilligen huijsvrouw goetien | Hansen | NEPER | hansen | deckers seght tegen klaes bartels |
| 2 | Aen schiper Sr. | Hans | NEPER | hans | louwerensz Decker Legende met sijn |
| 3 | j ck wil sellfe bij | Hans | NEPER | hans | Iongh gaen wan he na |
| 4 | blijve u l vroeuw goetien | Hansen | NEPER | hansen | Deckens dietelof hendrick wrouw is |
| 5 | om voorte behandighen an onsen | Jan | NEPER | Ian | die sal se dan voort |
| 6 | Jans, Jck u.l. beminde Man | Hans | NEPER | hans | Claesen lalander laet u.l. op |
| 7 | mij u l beminde Man | Hans | NEPER | hans | Claesen […] Jnder dooet soo |
| 8 | van vl geekregen die ijn | Jan | NEPER | ijan | toebeiassen brijef stack twelleke mijn |
| 9 | bemijnde maet geeschreven die ijn | Jan | NEPER | ijan | toebeiassen brief geesloeten voors ijck |
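
The comment in the cell above notes that the single OR query should be carried out in portions. A minimal sketch of that idea follows: it filters the variants with the same regular expression as the cell and joins them into one `|`-separated pattern per portion. The helper name `or_patterns` and the portion size are assumptions for illustration, and the sample variants are made up to keep the example self-contained.

```python
import re

def or_patterns(variants, portion_size=500):
    """Hypothetical helper: yield one '|'-joined pattern per portion."""
    # Keep only non-empty, purely alphabetic variants, as in the cell above.
    clean = [v for v in variants if re.match(r"^[a-zA-Z]+$", v)]
    for start in range(0, len(clean), portion_size):
        yield "|".join(clean[start:start + portion_size])

# Illustrative input; in the notebook this would be the `syns` list.
variants = ["", "hans", "hansen", "jan", "j.", "ijan", "ian"]
for pattern in or_patterns(variants, portion_size=2):
    print(pattern)
```

Each resulting string could be passed as `word=` to `corpus_query` in place of `bigOr`, with the per-portion results combined afterwards.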

