# Creating a concordance with TokenHandler

The `type2toks` attribute of the `nephosem.TokenHandler` class is a dictionary with type names as keys and `nephosem.TypeNode` objects as values.
The `TypeNode` objects have a `tokens` attribute, which is a list of `nephosem.TokenNode` objects with information on each collected token. From them, we can create a concordance with a function like `tokenConcordance()` below.

In [1]:
import pandas as pd # to see concordance
import sys
nephosemdir = "../../nephosem/"
sys.path.append(nephosemdir)
mydir = "./"
from nephosem import ConfigLoader, Vocab, TokenHandler
conf = ConfigLoader()
settings = conf.update_config('config.ini')

## Collect tokens

In [2]:
query = Vocab({'girl/N' : 0}) # dummy query just for illustration
# alternatively, if you already have a vocabulary, vocab.subvocab(['girl/N'])

In [3]:
tokhan = TokenHandler(query, settings=settings)
tokens = tokhan.retrieve_tokens()
tokens

Not provide the temporary path!
Use the default tmp directory: '~/tmp'!


[21, 39]                    say/V  healthy/J  what/W  she/P  boy/N  ask/V  be/V  ...
girl/N/StanfDepSents.6/6    NaN    3          NaN     NaN    -3     NaN    NaN   ...
girl/N/StanfDepSents.6/21   NaN    NaN        NaN     NaN    NaN    NaN    NaN   ...
girl/N/StanfDepSents.2/29   NaN    NaN        NaN     NaN    NaN    NaN    -4    ...
girl/N/StanfDepSents.8/3    NaN    NaN        NaN     NaN    NaN    NaN    NaN   ...
girl/N/StanfDepSents.8/15   NaN    NaN        NaN     NaN    -4     NaN    NaN   ...
girl/N/StanfDepSents.8/25   NaN    NaN        NaN     NaN    -3     NaN    NaN   ...
girl/N/StanfDepSents.10/13  NaN    NaN        NaN     NaN    NaN    NaN    NaN   ...
...                         ...    ...        ...     ...    ...    ...    ...   ...

## type2toks

The concordance information is stored in the `type2toks` attribute.

In [4]:
tokhan.type2toks

{'girl/N': girl/N}

In [5]:
type(tokhan.type2toks["girl/N"])

nephosem.core.terms.TypeNode

In [6]:
firstToken = tokhan.type2toks["girl/N"].tokens[0]
type(firstToken)

nephosem.core.terms.TokenNode

The following function reads the `word` attribute of each context word and joins the words, together with the target token, into a single string. Attributes such as `word` are named after the labels in `settings['global-columns']`.

In [7]:
def tokenConcordance(token):
    leftContext = " ".join([x.word for x in token.lcollocs])
    rightContext = " ".join([x.word for x in token.rcollocs])
    return f"{leftContext} {token.word} {rightContext}"

In [8]:
tokenConcordance(firstToken)

'The boy gives the girl a tasty healthy apple'
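For a keyword-in-context layout, it can help to keep the left context, the target and the right context in separate fields instead of one string. The sketch below is only an illustration: `make_token()` and `token_kwic()` are hypothetical helpers, and the toy token is built from `SimpleNamespace` objects that mimic just the `word`, `lcollocs` and `rcollocs` attributes used above, so that it runs without a corpus.

```python
from types import SimpleNamespace


def make_token(left, target, right):
    """Build a toy object with the attributes used in this notebook.

    Hypothetical stand-in for a nephosem TokenNode; it is not part of nephosem.
    """
    return SimpleNamespace(
        word=target,
        lcollocs=[SimpleNamespace(word=w) for w in left],
        rcollocs=[SimpleNamespace(word=w) for w in right],
    )


def token_kwic(token):
    """Return left context, target word and right context as separate strings."""
    return {
        "left": " ".join(x.word for x in token.lcollocs),
        "target": token.word,
        "right": " ".join(x.word for x in token.rcollocs),
    }


toy = make_token(
    ["The", "boy", "gives", "the"], "girl", ["a", "tasty", "healthy", "apple"]
)
row = token_kwic(toy)
print(row)
```

A list of such dictionaries can be passed to `pd.DataFrame()` to obtain one column per field.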

In [9]:
pd.DataFrame([
    {"token_id" : str(token), "text" : tokenConcordance(token)}
    for token in tokhan.type2toks["girl/N"].tokens]).head()

   text                                          token_id
0  The boy gives the girl a tasty healthy apple  girl/N/StanfDepSents.6/6
1  The girl does n't eat                         girl/N/StanfDepSents.6/21
2  are eaten by the girl                         girl/N/StanfDepSents.2/29
3  The girl sat on the apple                     girl/N/StanfDepSents.8/3
4  boy looked at the girl 's apple               girl/N/StanfDepSents.8/15


If your query contains several types, replace `tokhan.type2toks["girl/N"].tokens` with a flattened list: `[tok for typ in tokhan.type2toks.values() for tok in typ.tokens]`.
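The flattening step can be tried on toy data first. In this minimal sketch, `type2toks` is a hypothetical stand-in for `tokhan.type2toks`: a plain dictionary whose values only expose a `tokens` list, with made-up placeholder strings where the real objects would hold `TokenNode` instances.

```python
from types import SimpleNamespace

# Hypothetical stand-in for `tokhan.type2toks`: type names mapped to objects
# with a `tokens` list. The token IDs are made-up placeholders.
type2toks = {
    "girl/N": SimpleNamespace(tokens=["girl/N/file1/6", "girl/N/file1/21"]),
    "boy/N": SimpleNamespace(tokens=["boy/N/file1/2"]),
}

# Nested comprehension: the outer loop walks the types, the inner loop walks
# the tokens of each type, yielding one flat list.
all_tokens = [tok for typ in type2toks.values() for tok in typ.tokens]
print(all_tokens)
```

The tokens come out grouped by type, in the insertion order of the dictionary.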