# *Chaining search*: Names lexicon service demonstration

Use this notebook to combine linguistic resources yourself: corpora, lexica and treebanks. 
 * Use methods from our library *chaininglib*, [described in the documentation](doc/_build/html/index.html)
 * To get an idea of the possibilities and to copy code, go to the [Examples notebook](Examples.ipynb).
 * If you encounter any bugs or errors, please let us know via our [GitHub issue tracker](https://github.com/INL/chaining-search/issues) or send an e-mail to servicedesk@ivdnt.org.

In [1]:
import re

import pandas as pd
from IPython.display import display, HTML

# Star imports provide create_corpus and create_lexicon, among others
from chaininglib.search.CorpusQuery import *
from chaininglib.search.LexiconQuery import *
from chaininglib.search.corpusQueries import corpus_query
from chaininglib.process.corpus import *
from chaininglib.process.lexicon import get_diamant_synonyms
from chaininglib.ui.dfui import *
from chaininglib.ui.dfui import display_df

lexicon_name='nameslex'

names_lexicon=create_lexicon(lexicon_name)

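def get_variants(search_word, max_variants):
    # Look up all word forms that share a paradigm with search_word
    df_lexicon = names_lexicon.query_type('expand').word(search_word).search().kwic()
    syns = df_lexicon['wordform'].tolist()
    syns.append(search_word)
    # Keep purely alphabetic forms only; this also drops empty strings
    filtered_syns = [syn for syn in syns if re.match("^[a-zA-Z]+$", syn)]
    # search_word is appended last, so it is cut off when more than max_variants forms remain
    return filtered_syns[0:max_variants]

def expanded_query(search_word, max_variants):
    # One query in which all variants are joined into a single disjunction
    variants = get_variants(search_word, max_variants)
    big_or = '|'.join(variants)
    return corpus_query(word=big_or, pos='NEPER')

def separate_variant_queries(search_word, max_variants):
    # One query per variant
    variants = get_variants(search_word, max_variants)
    variant_queries = [corpus_query(word=variant, pos='NEPER') for variant in variants]
    return variant_queries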

def find_with_expanded_query(corpus,search_word, max_variants, max_results):
    query = expanded_query(search_word, max_variants)
    search_results = create_corpus(corpus).pattern(query).max_results(max_results).search().kwic()
    return search_results

def find_with_separate_queries(corpus, search_word, max_variants, max_results_per_variant):
    variant_queries = separate_variant_queries(search_word, max_variants)
    # Run one corpus search per variant and collect the KWIC frames
    kwics = [create_corpus(corpus).pattern(one_query).max_results(max_results_per_variant).search().kwic()
             for one_query in variant_queries]
    # DataFrame.append was removed in pandas 2.0; concatenate the frames instead
    return pd.concat(kwics) if kwics else pd.DataFrame()
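The filtering and truncation done by `get_variants` can be tried without calling the lexicon service. The sketch below is only an illustration: `filter_variants` is a hypothetical offline stand-in, and the word forms are copied from the `mathieu` expand output shown in this notebook.

```python
import re

# Word forms copied from the 'mathieu' expand output in this notebook;
# the first row of that output has an empty wordform.
sample_wordforms = ["", "lemaheu", "lemahie", "lemahieu", "lemahui",
                    "lemavie", "maahijs", "maarthis", "maate", "maatea"]

def filter_variants(wordforms, search_word, limit):
    """Hypothetical stand-in for get_variants: same filter, no lexicon call."""
    candidates = list(wordforms) + [search_word]
    # Keep purely alphabetic forms; the empty string is dropped as well
    kept = [w for w in candidates if re.match("^[a-zA-Z]+$", w)]
    return kept[:limit]

variants = filter_variants(sample_wordforms, "mathieu", 5)
big_or = "|".join(variants)
print(variants)
print(big_or)
```

With a limit of 5 the search word itself, which is appended last, falls outside the returned list. This may be worth keeping in mind when choosing the maximum number of variants.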


Lexicon service query types:
* expand (get all word forms that share a paradigm with search_word)
* get_wordforms (get all word forms from the paradigms of lemmata whose lemma form is search_word)
* get_lemma_from_wordform (find each lemma l such that search_word belongs to the paradigm of l)
* get_related_lemmata (find the lemmata related to the lemma with id lemma_id)
* get_wordforms_from_lemma_id (get all word forms from the paradigm of the lemma with id lemma_id)
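Each query type ends up as a path segment in the request URL that the library prints as "Query URL" in the cell outputs. As a rough sketch of that mapping, the hypothetical helper `lexicon_service_url` below rebuilds such a URL; the base URL and parameter names are copied from those printed lines, and this is not how chaininglib itself is implemented.

```python
from urllib.parse import urlencode

# Base URL copied from the "Query URL" lines printed in this notebook
LEXICON_SERVICE = "http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon"

def lexicon_service_url(query_type, database, wordform=None, lemma_id=None):
    """Hypothetical helper: rebuild a request URL as it appears in the output."""
    params = {
        "case_sensitive": "false",
        "tweaked_queries": "true",
        "database": database,
    }
    # Word-based query types send a wordform, id-based ones a lemma_id
    if wordform is not None:
        params["wordform"] = wordform
    if lemma_id is not None:
        params["lemma_id"] = lemma_id
    return f"{LEXICON_SERVICE}/{query_type}?{urlencode(params)}"

print(lexicon_service_url("expand", "nameslex", wordform="mathieu"))
print(lexicon_service_url("get_related_lemmata", "nameslex", lemma_id="8381"))
```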

# Expand query: from wordform to wordform

In [2]:
search_word='mathieu'

df_lexicon = names_lexicon.query_type('expand').word(search_word).search().kwic()

display_df(df_lexicon)

Query URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/expand?case_sensitive=false&tweaked_queries=true&database=nameslex&wordform=mathieu

| | query_lemma_id | query_word | query_pos | wordform |
|---|---|---|---|---|
| 0 | | mathieu | | |
| 1 | | mathieu | | lemaheu |
| 2 | | mathieu | | lemahie |
| 3 | | mathieu | | lemahieu |
| 4 | | mathieu | | lemahui |
| 5 | | mathieu | | lemavie |
| 6 | | mathieu | | maahijs |
| 7 | | mathieu | | maarthis |
| 8 | | mathieu | | maate |
| 9 | | mathieu | | maatea |



# Get lemma from wordform

In [3]:
search_word='jesse'

df_lexicon  = create_lexicon(lexicon_name).query_type('get_lemma_from_wordform').word(search_word).search().kwic()

display_df(df_lexicon)

...Searching nameslex...[FQuery URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/get_lemma_from_wordform?case_sensitive=false&tweaked_queries=true&database=nameslex&wordform=jesse
                                                                    [F

| | dataset | lemma | lemma_id | pos |
|---|---|---|---|---|
| 0 | names_sns | | 8381 | |
| 1 | names_gn | | 435 | F M |



# Get wordforms from lemma id

In [4]:
lex = create_lexicon(lexicon_name).query_type('get_wordforms_from_lemma_id').lemma_id('8381').search()

df_lexicon = lex.kwic()

display_df(df_lexicon)

Query URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/get_wordforms_from_lemma_id?case_sensitive=false&tweaked_queries=true&database=nameslex&lemma_id=8381

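| | query_lemma_id | query_word | query_pos | wordform |
|---|---|---|---|---|
| 0 | 8381 | | | jees |
| 1 | 8381 | | | jeessen |
| 2 | 8381 | | | jehse |
| 3 | 8381 | | | jes |
| 4 | 8381 | | | jeschen |
| 5 | 8381 | | | jeseer |
| 6 | 8381 | | | jesel |
| 7 | 8381 | | | jesella |
| 8 | 8381 | | | jesen |
| 9 | 8381 | | | jeseph |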
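The lemma id `8381` queried here matches the `names_sns` row of the `get_lemma_from_wordform` result for `jesse`, and was apparently typed in by hand. As a minimal sketch of how the two steps might be chained, the hypothetical helper `lemma_ids_for` below picks the ids out of rows copied from that result. Plain dictionaries stand in for the DataFrame so that the example runs without the lexicon service.

```python
# Rows copied from the get_lemma_from_wordform output for 'jesse'
lemma_rows = [
    {"dataset": "names_sns", "lemma_id": "8381", "pos": ""},
    {"dataset": "names_gn", "lemma_id": "435", "pos": "F M"},
]

def lemma_ids_for(rows, dataset=None):
    """Hypothetical helper: collect lemma ids, optionally for one dataset only."""
    return [row["lemma_id"] for row in rows
            if dataset is None or row["dataset"] == dataset]

for lemma_id in lemma_ids_for(lemma_rows, dataset="names_sns"):
    # In the notebook, each id could then be passed to
    # create_lexicon(lexicon_name).query_type('get_wordforms_from_lemma_id').lemma_id(lemma_id)
    print(lemma_id)
```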



# Get related lemmata

In [5]:
df_lexicon = names_lexicon.query_type('get_related_lemmata').lemma_id('8381').search().kwic()

display_df(df_lexicon)

Query URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/get_related_lemmata?case_sensitive=false&tweaked_queries=true&database=nameslex&lemma_id=8381

| | lemma | lemma_id | pos | relation |
|---|---|---|---|---|
| 0 | | 8480 | | synonym |
| 1 | | 8388 | | synonym |
| 2 | | 8381 | | synonym |



# Query expansion with the names lexicon, searching in zeebrieven

In [6]:
corpus = "zeebrieven"
search_word = 'jan'

#display(HTML('Name variants for <b>' + search_word + '</b>: ' + ", ".join(get_variants(search_word, 30))))

# One query in which all variants (at most 500) are joined into a single disjunction
kwic = find_with_expanded_query(corpus, search_word, 500, 50)
display(HTML("<i>Results for expanded query </i>: " + str(len(kwic.index))))
display_df(kwic)

# One query per variant (at most 30 variants, at most 50 results each)
kwic = find_with_separate_queries(corpus, search_word, 30, 50)
display(HTML("<i>Results for separate queries </i>: " + str(len(kwic.index))))
display_df(kwic)


Query URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/expand?case_sensitive=false&tweaked_queries=true&database=nameslex&wordform=jan
Corpus Query url:http://brievenalsbuit.ato.ivdnt.org/blacklab-server/zeebrieven/hits?&number=50&first=0&patt=%5Bword%3D%22ancko%7Canco%7Cancoch%7Cancolina%7Canko%7Ccohannes%7Ceijans%7Cgeiatje%7Cgian%7Cgianelli%7Cgianetti%7Cgiani%7Cgianiel%7Cgianni%7Cgiantten%7Cgijaan%7Cgioani%7Cgioanna%7Cgioanni%7Cgioannie%7Cgiofani%7Cgion%7Cgionani%7Cgionanni%7Cgioranni%7Cgiovan%7Cgiovana%7Cgiovani%7Cgiovanice%7Cgiovanna%7Cgiovanni%7Cgiovannie%7Cgiovannina%7Cgiovannio%7Cgiovanno%7Cgiuvanna%7Cgyon%7Cha%7Chaagt%7Chaagtje%7Chaan%7Chaana%7Chaaneman%7Chaanes%7Chaanke%7Chaanpje%7Chaansen%7Chaanskie%7Chaanstje%7Chaantie%7Chaantje%7Chaantke%7Chaantzen%7Chaanzen%7Chaemtje%7Chaenske%7Chaentje%7Chagnes%7Chagnus%7Chagte%7Chagteltje%7Chagtje%7Chalchen%7Chan%7Chanaatje%7Chananja%7Chanasia%7Chanasse%7Chanatasia%7Chanca%7Chance%7Chanchen%7Chanchien%7Chanchin%7Chan

| | left context | lemma 0 | pos 0 | word 0 | right context |
|---|---|---|---|---|---|
| 0 | Aen mijn welbeminde man schipper | Hans | NEPER | hans | louwerensen decker leggende mijt sijn |
| 1 | u l Dienstwilligen huijsvrouw goetien | Hansen | NEPER | hansen | deckers seght tegen klaes bartels |
| 2 | Aen schiper Sr. | Hans | NEPER | hans | louwerensz Decker Legende met sijn |
| 3 | j ck wil sellfe bij | Hans | NEPER | hans | Iongh gaen wan he na |
| 4 | blijve u l vroeuw goetien | Hansen | NEPER | hansen | Deckens dietelof hendrick wrouw is |
| 5 | om voorte behandighen an onsen | Jan | NEPER | Ian | die sal se dan voort |
| 6 | Jans, Jck u.l. beminde Man | Hans | NEPER | hans | Claesen lalander laet u.l. op |
| 7 | mij u l beminde Man | Hans | NEPER | hans | Claesen […] Jnder dooet soo |
| 8 | van vl geekregen die ijn | Jan | NEPER | ijan | toebeiassen brijef stack twelleke mijn |
| 9 | bemijnde maet geeschreven die ijn | Jan | NEPER | ijan | toebeiassen brief geesloeten voors ijck |



Query URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/expand?case_sensitive=false&tweaked_queries=true&database=nameslex&wordform=jan
Corpus Query url:http://brievenalsbuit.ato.ivdnt.org/blacklab-server/zeebrieven/hits?&number=50&first=0&patt=%5Bword%3D%22ancko%22%20%26%20pos%3D%22NEPER%22%5D&filter=
Corpus Query url:http://brievenalsbuit.ato.ivdnt.org/blacklab-server/zeebrieven/hits?&number=50&first=0&patt=%5Bword%3D%22anco%22%20%26%20pos%3D%22NEPER%22%5D&filter=
Corpus Query url:http://brievenalsbuit.ato.ivdnt.org/blacklab-server/zeebrieven/hits?&number=50&first=0&patt=%5Bword%3D%22ancoch%22%20%26%20pos%3D%22NEPER%22%5D&filter=
Corpus Query url:http://brievenalsbuit.ato.ivdnt.org/blacklab-server/zeebrieven/hits?&number=50&first=0&patt=%5Bword%3D%22ancolina%22%20%26%20pos%3D%22NEPER%22%5D&filter=
Corpus Query url:http://brievenalsbuit.ato.ivdnt.org/blacklab-server/zeebrieven/hits?&number=50&first=0&patt=%5Bword%3D%22anko%22%20%26%20pos%3D%22NEPER%22%5

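The `patt` parameter in the logged corpus URLs is percent-encoded, which makes the two query strategies hard to compare by eye. A small sketch using the standard library's `urllib.parse.unquote` decodes it. The encoded strings are copied from the output above, and because the expanded-query URL is truncated there, only its beginning is decoded.

```python
from urllib.parse import unquote

# 'patt' values copied from the "Corpus Query url" lines in the output
separate_patt = "%5Bword%3D%22ancko%22%20%26%20pos%3D%22NEPER%22%5D"
# The expanded-query URL is truncated in the output; only its start is used
expanded_patt_start = "%5Bword%3D%22ancko%7Canco%7Cancoch"

print(unquote(separate_patt))        # [word="ancko" & pos="NEPER"]
print(unquote(expanded_patt_start))  # [word="ancko|anco|ancoch
```

The separate queries each test a single word form, while the expanded query places all variants in one `|`-separated alternation inside a single `word` constraint.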

In [7]:
corpus="opus"
query="[pos_head='VRB']"
search_results=create_corpus(corpus).pattern(query).max_results(33).search().kwic()
display_df(search_results)



Corpus Query url:http://brievenalsbuit.ato.ivdnt.org/blacklab-server/OPUS/hits?&number=33&first=0&patt=%5Bpos_head%3D%27VRB%27%5D&filter=

| | left context | lemma 0 | pos 0 | word 0 | right context |
|---|---|---|---|---|---|
| 0 | hem niet uit om te | komen | VRB(finiteness=ger\|inf) | komen | eten zijn beesten zullen nog |
| 1 | niet uit om te komen | eten | VRB(finiteness=ger\|inf) | eten | zijn beesten zullen nog meer |
| 2 | uit om te komen eten | zijn | VRB(finiteness=fin,mood=ind,tense=pres,number=pl) | zijn | beesten zullen nog meer komen |
| 3 | te komen eten zijn beesten | zullen | VRB(finiteness=fin,mood=ind,tense=pres,number=pl) | zullen | nog meer komen halen Hoge |
| 4 | zijn beesten zullen nog meer | komen | VRB(finiteness=ger\|inf) | komen | halen Hoge Priesteres Misabo Ahm |
| 5 | beesten zullen nog meer komen | halen | VRB(finiteness=ger\|inf) | halen | Hoge Priesteres Misabo Ahm in |
| 6 | geen enkel spoor en ik | hebben | VRB(finiteness=fin,mood=imp\|ind,tense=pres,number=sg) | heb | een flinke kou gepakt Ik |
| 7 | ik heb een flinke kou | pakken | VRB(finiteness=part,tense=past) | gepakt | Ik kan niet wachten tot |
| 8 | een flinke kou gepakt Ik | kunnen | VRB(finiteness=fin,mood=imp\|ind,tense=pres,number=sg) | kan | niet wachten tot we weer |
| 9 | kou gepakt Ik kan niet | wachten | VRB(finiteness=ger\|inf) | wachten | tot we weer terug op |
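The `pos 0` values in this result combine a head (`VRB`) with a bracketed feature list, and the query `[pos_head='VRB']` appears to match on the head alone. That reading is inferred from the output and not stated in the notebook. A minimal sketch of how such a tag could be taken apart for further filtering follows, with `parse_pos` as a hypothetical helper that is not part of chaininglib.

```python
import re

def parse_pos(tag):
    """Hypothetical helper: split 'VRB(finiteness=ger|inf)' into head and features."""
    match = re.fullmatch(r"(\w+)(?:\((.*)\))?", tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    head, feature_text = match.groups()
    features = {}
    if feature_text:
        # Features are comma-separated name=value pairs; a value may hold
        # alternatives separated by '|', which are kept as one string here
        for pair in feature_text.split(","):
            name, _, value = pair.partition("=")
            features[name] = value
    return head, features

head, features = parse_pos("VRB(finiteness=fin,mood=ind,tense=pres,number=pl)")
print(head, features)
print(parse_pos("NEPER"))
```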

