# *Chaining search*: Names lexicon service demonstration

Use this notebook to combine linguistic resources yourself: corpora, lexica and treebanks. 
 * Use methods from our library *chaininglib*, [described in the documentation](doc/_build/html/index.html)
 * To get an idea of the possibilities and to copy code, go to the [Examples notebook](Examples.ipynb).
 * If you encounter any bugs or errors, please let us know via our [GitHub issue tracker](https://github.com/INL/chaining-search/issues) or send an e-mail to servicedesk@ivdnt.org.

In [1]:
#import chaininglib

from chaininglib.search.CorpusQuery import *
from chaininglib.process.corpus import *
from chaininglib.ui.dfui import *
from chaininglib.search.LexiconQuery import *
from chaininglib.ui.dfui import display_df
from IPython.display import display, HTML
from chaininglib.search.corpusQueries import corpus_query
from chaininglib.process.lexicon import get_diamant_synonyms
import re
import pandas as pd  # used below as pd.DataFrame

lexicon_name='nameslex'

names_lexicon=create_lexicon(lexicon_name)

def get_variants(search_word, max_variants):
    # Look up all word forms that share a paradigm with search_word
    df_lexicon = names_lexicon.query_type('expand').word(search_word).search().kwic()
    # Put the search word first so that truncating to max_variants cannot drop it
    syns = [search_word] + df_lexicon['wordform'].tolist()
    # Keep purely alphabetic forms only (this also discards empty strings)
    filtered_syns = [syn for syn in syns if re.match("^[a-zA-Z]+$", syn)]
    return filtered_syns[0:max_variants]

def expanded_query(search_word, max_variants):
    # One corpus query that matches any of the variants
    variants = get_variants(search_word, max_variants)
    big_or = '|'.join(variants)
    return corpus_query(word=big_or,pos='NEPER')

def separate_variant_queries(search_word, max_variants):
    # One corpus query per variant
    variants = get_variants(search_word, max_variants)
    return [corpus_query(word=variant,pos='NEPER') for variant in variants]

def find_with_expanded_query(corpus,search_word, max_variants, max_results):
    query = expanded_query(search_word, max_variants)
    search_results = create_corpus(corpus).pattern(query).max_results(max_results).search().kwic()
    return search_results

def find_with_separate_queries(corpus,search_word, max_variants, max_results_per_variant):
    variant_queries = separate_variant_queries(search_word, max_variants)
    # Collect one KWIC DataFrame per variant (DataFrame.append was removed in pandas 2.0)
    kwics = []
    for one_query in variant_queries:
        #print("Searching for pattern: " + one_query)
        kwic = create_corpus(corpus).pattern(one_query).max_results(max_results_per_variant).search().kwic()
        #print("#Corpus search results: " + str(len(kwic.index)))
        kwics.append(kwic)
    if not kwics:
        return pd.DataFrame()
    return pd.concat(kwics)
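
The variant filtering and the "big OR" pattern used by the helper functions above can be tried out without contacting the lexicon service. The following is a minimal sketch, not part of *chaininglib*: `filter_variants` and `alternation` are hypothetical helper names, and the input list mixes word forms from the `mathieu` example with one made-up entry (`d'hondt`) that shows the effect of the alphabetic filter. The sketch places the search word first so that truncation cannot drop it, and it removes duplicates.

```python
import re

def filter_variants(wordforms, search_word, max_variants):
    """Keep purely alphabetic forms, search word first, without duplicates."""
    candidates = [search_word] + list(wordforms)
    seen = set()
    kept = []
    for form in candidates:
        # Skip non-strings, empty strings and forms with non-letters
        if isinstance(form, str) and re.fullmatch(r"[a-zA-Z]+", form) and form not in seen:
            seen.add(form)
            kept.append(form)
    return kept[:max_variants]

def alternation(variants):
    """Join the variants into one regular-expression alternative."""
    return "|".join(re.escape(v) for v in variants)

# "d'hondt" is a made-up entry, included only to show what gets filtered out
forms = ["", "lemaheu", "lemahie", "d'hondt", "lemaheu"]
variants = filter_variants(forms, "mathieu", 3)
print(variants)
print(alternation(variants))
```

The resulting string is the kind of value that `expanded_query` passes as the `word` argument of `corpus_query`.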


Lexicon service query types: 
* expand (get all word forms that share a paradigm with search_word)
* get_wordforms (get all word forms from the paradigms of lemmata whose lemma form is search_word)
* get_lemma_from_wordform (find lemmata l such that search_word belongs to the paradigm of l)
* get_related_lemmata (find lemmata related to the lemma with id lemma_id)
* get_wordforms_from_lemma_id (get all word forms from the paradigm of the lemma with id lemma_id)
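
Each query type corresponds to a path segment in the query URLs that the library prints in the cell outputs of this notebook. The sketch below rebuilds such a URL with the standard library only. It is an illustration based on those printed URLs; how *chaininglib* assembles the request internally is an assumption, and `lexicon_service_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Base address as it appears in the query URLs printed in this notebook
BASE_URL = "http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon"

def lexicon_service_url(query_type, database, **params):
    """Build a lexicon service URL for one query type (hypothetical helper)."""
    # Fixed parameters, in the order seen in the printed URLs
    query = {
        "case_sensitive": "false",
        "tweaked_queries": "true",
        "database": database,
    }
    # Query-specific parameters such as wordform=... or lemma_id=...
    query.update(params)
    return BASE_URL + "/" + query_type + "?" + urlencode(query)

url = lexicon_service_url("expand", "nameslex", wordform="mathieu")
print(url)
```

Word-based query types take a `wordform` parameter, while `get_related_lemmata` and `get_wordforms_from_lemma_id` take `lemma_id`, as the printed URLs show.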

# Expand query: from wordform to wordform

In [2]:
search_word='mathieu'

df_lexicon = names_lexicon.query_type('expand').word(search_word).search().kwic()

display_df(df_lexicon)

...Searching nameslex...[FQuery URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/expand?case_sensitive=false&tweaked_queries=true&database=nameslex&wordform=mathieu
                                                                    [F

Unnamed: 0,query_lemma_id,query_word,query_pos,wordform
0,,mathieu,,
1,,mathieu,,lemaheu
2,,mathieu,,lemahie
3,,mathieu,,lemahieu
4,,mathieu,,lemahui
5,,mathieu,,lemavie
6,,mathieu,,maahijs
7,,mathieu,,maarthis
8,,mathieu,,maate
9,,mathieu,,maatea



# Get lemma from wordform

In [3]:
search_word='jesse'

df_lexicon  = create_lexicon(lexicon_name).query_type('get_lemma_from_wordform').word(search_word).search().kwic()

display_df(df_lexicon)

...Searching nameslex...[FQuery URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/get_lemma_from_wordform?case_sensitive=false&tweaked_queries=true&database=nameslex&wordform=jesse
                                                                    [F

Unnamed: 0,dataset,lemma,lemma_id,pos
0,names_sns,,8381,
1,names_gn,,435,F M



# Get wordforms from lemma id

In [4]:
lex = create_lexicon(lexicon_name).query_type('get_wordforms_from_lemma_id').lemma_id('8381').search()

df_lexicon = lex.kwic()

display_df(df_lexicon)

...Searching nameslex...[FQuery URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/get_wordforms_from_lemma_id?case_sensitive=false&tweaked_queries=true&database=nameslex&lemma_id=8381
                                                                    [F

Unnamed: 0,query_lemma_id,query_word,query_pos,wordform
0,8381,,,jees
1,8381,,,jeessen
2,8381,,,jehse
3,8381,,,jes
4,8381,,,jeschen
5,8381,,,jeseer
6,8381,,,jesel
7,8381,,,jesella
8,8381,,,jesen
9,8381,,,jeseph



# Get related lemmata

In [5]:
df_lexicon = names_lexicon.query_type('get_related_lemmata').lemma_id('8381').search().kwic()

display_df(df_lexicon)

Query URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/get_related_lemmata?case_sensitive=false&tweaked_queries=true&database=nameslex&lemma_id=8381

| | lemma | lemma_id | pos | relation |
|---|---|---|---|---|
| 0 | | 8480 | | synonym |
| 1 | | 8388 | | synonym |
| 2 | | 8381 | | synonym |



# Query expansion with the names lexicon: searching the zeebrieven corpus

In [6]:
corpus = "zeebrieven"
search_word = 'jan'

# Optional: show the name variants first (syns would be the list returned by get_variants)
#display(HTML('Name variants for <b>' + search_word + '</b>: ' + ", ".join(syns[1:30])))

kwic = find_with_expanded_query(corpus,search_word, 500, 50)

display(HTML("<i>Results for expanded query </i>: " + str(len(kwic.index))))

display_df(kwic)


kwic = find_with_separate_queries(corpus,search_word, 30, 50)

display(HTML("<i>Results for separate queries </i>: " + str(len(kwic.index))))

display_df(kwic)


Query URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/expand?case_sensitive=false&tweaked_queries=true&database=nameslex&wordform=jan

| | left context | lemma 0 | pos 0 | word 0 | right context |
|---|---|---|---|---|---|
| 0 | Aen mijn welbeminde man schipper | Hans | NEPER | hans | louwerensen decker leggende mijt sijn |
| 1 | u l Dienstwilligen huijsvrouw goetien | Hansen | NEPER | hansen | deckers seght tegen klaes bartels |
| 2 | Aen schiper Sr. | Hans | NEPER | hans | louwerensz Decker Legende met sijn |
| 3 | j ck wil sellfe bij | Hans | NEPER | hans | Iongh gaen wan he na |
| 4 | blijve u l vroeuw goetien | Hansen | NEPER | hansen | Deckens dietelof hendrick wrouw is |
| 5 | om voorte behandighen an onsen | Jan | NEPER | Ian | die sal se dan voort |
| 6 | Jans, Jck u.l. beminde Man | Hans | NEPER | hans | Claesen lalander laet u.l. op |
| 7 | mij u l beminde Man | Hans | NEPER | hans | Claesen […] Jnder dooet soo |
| 8 | van vl geekregen die ijn | Jan | NEPER | ijan | toebeiassen brijef stack twelleke mijn |
| 9 | bemijnde maet geeschreven die ijn | Jan | NEPER | ijan | toebeiassen brief geesloeten voors ijck |



Query URL: http://sk.taalbanknederlands.inl.nl/LexiconService/lexicon/expand?case_sensitive=false&tweaked_queries=true&database=nameslex&wordform=jan
