# N-gram Tracing

This notebook tests n-gram tracing for use in authorship verification. The goal is to confirm that the code finds the n-grams shared by two texts and can return the text that precedes each shared n-gram.

In [223]:
import sys

import pandas as pd

from from_root import from_root

sys.path.insert(0, str(from_root("src")))

from model_loading import load_model
from read_and_write_docs import read_txt
from n_gram_tracing import (
    common_ngrams,
    tokens_to_text,
    texts_around_each_ngram,
    get_trimmed_context_before_span
)
from n_gram_scoring import score_ngrams, score_ngrams_to_df

In [None]:
tokenizer, model = load_model("/Volumes/BCross/models/Llama-3.2-3B")

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


In [None]:
max_len = getattr(model.config, "max_position_embeddings", None) or getattr(model.config, "n_positions", None)
print(max_len)

32768


In [157]:
known_text = read_txt("../../data/kevin_hyatt_mail_1.txt")
unknown_text = read_txt("../../data/kevin_hyatt_mail_2.txt")

## Get Common N-Grams

Here we get the n-grams in common between the two texts.
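
To make the idea concrete, here is a minimal sketch of finding shared n-grams on plain token lists. It is not the `common_ngrams` implementation from `n_gram_tracing`: the name `common_ngrams_sketch` is hypothetical, it takes already tokenized (and already lowercased) input, and it assumes that `n` is a minimum length and that `include_subgrams=False` drops any shared n-gram contained in a longer shared n-gram.

```python
from typing import List, Sequence, Set, Tuple

Ngram = Tuple[str, ...]


def ngram_set(tokens: Sequence[str], n: int) -> Set[Ngram]:
    """Return the set of all contiguous n-grams of exactly length n."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_subgram(short: Ngram, long: Ngram) -> bool:
    """True if `short` occurs as a contiguous run inside `long`."""
    k = len(short)
    return any(long[i:i + k] == short for i in range(len(long) - k + 1))


def common_ngrams_sketch(
    tokens1: Sequence[str],
    tokens2: Sequence[str],
    n: int = 2,
    include_subgrams: bool = False,
) -> List[List[str]]:
    """Shared n-grams of length >= n (assumed semantics, see lead-in)."""
    shared: List[Ngram] = []
    for size in range(n, min(len(tokens1), len(tokens2)) + 1):
        found = ngram_set(tokens1, size) & ngram_set(tokens2, size)
        if not found:
            # Any longer shared n-gram would contain a shared n-gram of
            # this size, so we can stop here.
            break
        shared.extend(found)
    if not include_subgrams:
        shared = [
            g for g in shared
            if not any(len(h) > len(g) and is_subgram(g, h) for h in shared)
        ]
    # Sort by length, then alphabetically, for a stable result.
    return sorted((list(g) for g in shared), key=lambda g: (len(g), g))


tokens_a = ["Ġlet", "Ġme", "Ġknow", "Ġif", "Ġyou", "Ġcan"]
tokens_b = ["please", "Ġlet", "Ġme", "Ġknow", "Ġif", "Ġit", "Ġworks"]

longest_only = common_ngrams_sketch(tokens_a, tokens_b, n=2)
with_subgrams = common_ngrams_sketch(tokens_a, tokens_b, n=2, include_subgrams=True)
print(longest_only)
print(len(with_subgrams))
```

With `include_subgrams=True` the shorter pieces of a long shared phrase are kept as separate entries, which inflates the list with overlapping evidence.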

In [158]:
common = common_ngrams(
    text1=known_text,
    text2=unknown_text,
    n=2,
    tokenizer=tokenizer,
    include_subgrams=False,
    lowercase=True
)

Token indices sequence length is longer than the specified maximum sequence length for this model (1098 > 1024). Running this sequence through the model will result in indexing errors


In [178]:
common

[[',', 'Ġas'],
 [',', 'Ġbut'],
 ['.', 'Ġif'],
 ['.', 'Ġthe'],
 ['ie', ','],
 ['Ġa', 'Ġdraft'],
 ['Ġa', 'Ġfax'],
 ['Ġand', 'Ġhe'],
 ['Ġand', 'Ġi'],
 ['Ġarea', '.'],
 ['Ġcall', 'Ġme'],
 ['Ġdid', 'Ġnot'],
 ['Ġdo', 'Ġnot'],
 ['Ġfor', 'Ġthe'],
 ['Ġhave', 'Ġa'],
 ['Ġhelp', 'Ġin'],
 ['Ġhim', 'Ġto'],
 ['Ġi', 'Ġam'],
 ['Ġi', 'Ġcan'],
 ['Ġi', 'Ġneed'],
 ['Ġi', 'Ġsent'],
 ['Ġif', 'Ġwe'],
 ['Ġin', 'Ġ2002'],
 ['Ġin', 'Ġthe'],
 ['Ġit', 'Ġis'],
 ['Ġj', 'une'],
 ['Ġme', '.'],
 ['Ġme', 'Ġthat'],
 ['Ġme', 'Ġto'],
 ['Ġmeeting', '.'],
 ['Ġneed', 'Ġto'],
 ['Ġnot', 'Ġget'],
 ['Ġof', 'Ġyou'],
 ['Ġon', 'Ġthe'],
 ['Ġon', 'Ġthis'],
 ['Ġon', 'Ġto'],
 ['Ġone', 'Ġof'],
 ['Ġred', 'Ġrock'],
 ['Ġste', 've'],
 ['Ġt', 'uesday'],
 ['Ġthat', 'Ġi'],
 ['Ġthe', 'Ġbullets'],
 ['Ġthis', 'Ġweek'],
 ['Ġto', 'Ġbe'],
 ['Ġto', 'Ġensure'],
 ['Ġto', 'Ġget'],
 ['Ġto', 'Ġsend'],
 ['Ġto', 'Ġthe'],
 ['Ġwanted', 'Ġto'],
 ['Ġweek', '.'],
 ['Ġwhen', 'Ġi'],
 ['Ġwill', 'Ġhave'],
 ['Ġwould', 'Ġbe'],
 ['Ġyou', '.'],
 ['Ġyou', 'Ġneed'],
 ['Ġyou

In [172]:
sample_tokens = common[5]

In [211]:
len(sample_tokens)

2

In [179]:
sample_text = tokens_to_text(['.', 'Ġi', 'Ġwould', 'Ġlike', 'Ġto'], tokenizer)
sample_text

'. i would like to'
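
For readers unfamiliar with the `Ġ` prefix: byte-level BPE vocabularies display a leading space as `Ġ` and a newline as `Ċ`. The sketch below shows a simplified conversion for plain ASCII text only. The name `tokens_to_text_sketch` is hypothetical and this is not the `tokens_to_text` helper used above; a Hugging Face tokenizer's own `convert_tokens_to_string` method handles the full byte mapping.

```python
from typing import List


def tokens_to_text_sketch(tokens: List[str]) -> str:
    """Join byte-level BPE token strings back into text (ASCII-only sketch)."""
    text = "".join(tokens)
    # "Ġ" is the displayed form of a space, "Ċ" of a newline.
    return text.replace("Ġ", " ").replace("Ċ", "\n")


phrase = tokens_to_text_sketch([".", "Ġi", "Ġwould", "Ġlike", "Ġto"])
leading_space = tokens_to_text_sketch(["Ġa", "Ġdraft"])
print(repr(phrase))
print(repr(leading_space))
```

Note that a phrase whose first token starts with `Ġ` keeps its leading space, which matters when the phrase is later searched for in the full text.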

## Find Starting Positions

There are two options here: find the starting position of each n-gram and return only the text before it, or return the text up to and including the n-gram.
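
Both options can be sketched on plain token lists. The functions `find_token_spans` and `prefixes_for_ngram` are hypothetical names for illustration, not the `texts_around_each_ngram` implementation. Spans are half-open `(start, end)` pairs, the same convention as the token spans printed in this notebook.

```python
from typing import List, Sequence, Tuple


def find_token_spans(tokens: Sequence[str], ngram: Sequence[str]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans for every occurrence of ngram."""
    n = len(ngram)
    if n == 0:
        return []
    target = list(ngram)
    return [
        (i, i + n)
        for i in range(len(tokens) - n + 1)
        if list(tokens[i:i + n]) == target
    ]


def prefixes_for_ngram(
    tokens: Sequence[str],
    ngram: Sequence[str],
    include_ngram: bool = False,
) -> List[List[str]]:
    """One prefix per occurrence: text before the n-gram, or up to its end."""
    prefixes = []
    for start, end in find_token_spans(tokens, ngram):
        cut = end if include_ngram else start
        prefixes.append(list(tokens[:cut]))
    return prefixes


doc = ["Ġcall", "Ġme", ".", "Ġif", "Ġyou", "Ġcall", "Ġme", "Ġback"]
phrase = ["Ġcall", "Ġme"]

spans = find_token_spans(doc, phrase)
before = prefixes_for_ngram(doc, phrase)
through = prefixes_for_ngram(doc, phrase, include_ngram=True)
print(spans)
print(before)
print(through)
```

Returning only the text before the n-gram suits scoring the n-gram conditioned on its context; including the n-gram is convenient when the whole sequence is passed to the model in one call.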

In [180]:
example_texts = texts_around_each_ngram(
    unknown_text,
    sample_text,
    return_spans=True,
    return_token_spans=True,
    tokenizer=tokenizer,
    return_tokenized_text=True
)

[(1081, 1086)]


In [198]:
example_texts[0]

["they also have private parking at the cheap price of euro 18 per day and only a thousand uphill each way steps to it. i do not know if they have two standard room and two superior rooms or just one of each type available. please let me know what your preference is. if both our preferences are for the same type room and they only have one of each, shall we draw straws or have a pizza eating contest? also, there was one other hotel in positano that i looked into before finding hotel marincanto for our sept. i may look them up tonight and fax them tomorrow just to compare. we also have a fax from hotel firenze in rome that we are confirmed for june 7 8! if you have no bullets to report, please reply to me that you have none. the 131 is 100 of the cost of the project. lorraine will be attending for our group. will you be able to attend and give us your advise? earl, as i understand it, you have a copy also. please confirm that you have what you need. i will send my copy to mansoor unless

In [181]:
example_texts

(["they also have private parking at the cheap price of euro 18 per day and only a thousand uphill each way steps to it. i do not know if they have two standard room and two superior rooms or just one of each type available. please let me know what your preference is. if both our preferences are for the same type room and they only have one of each, shall we draw straws or have a pizza eating contest? also, there was one other hotel in positano that i looked into before finding hotel marincanto for our sept. i may look them up tonight and fax them tomorrow just to compare. we also have a fax from hotel firenze in rome that we are confirmed for june 7 8! if you have no bullets to report, please reply to me that you have none. the 131 is 100 of the cost of the project. lorraine will be attending for our group. will you be able to attend and give us your advise? earl, as i understand it, you have a copy also. please confirm that you have what you need. i will send my copy to mansoor unles

In [182]:
tokens = example_texts[3]
token_span = example_texts[2]

In [186]:
token_span

[(1081, 1086)]

In [196]:
tokens[-5:]

['Ġon', 'Ġaug', 'ust', 'Ġ22', '.']

In [208]:
res = get_trimmed_context_before_span(
    tokens = tokens,
    token_span = token_span[0],
    max_tokens = 1000,
    return_text = False,
    tokenizer = tokenizer
)

In [209]:
res

['Ġa',
 'Ġpizza',
 'Ġeating',
 'Ġcontest',
 '?',
 'Ġalso',
 ',',
 'Ġthere',
 'Ġwas',
 'Ġone',
 'Ġother',
 'Ġhotel',
 'Ġin',
 'Ġposit',
 'ano',
 'Ġthat',
 'Ġi',
 'Ġlooked',
 'Ġinto',
 'Ġbefore',
 'Ġfinding',
 'Ġhotel',
 'Ġmar',
 'inc',
 'anto',
 'Ġfor',
 'Ġour',
 'Ġse',
 'pt',
 '.',
 'Ġi',
 'Ġmay',
 'Ġlook',
 'Ġthem',
 'Ġup',
 'Ġtonight',
 'Ġand',
 'Ġfax',
 'Ġthem',
 'Ġtomorrow',
 'Ġjust',
 'Ġto',
 'Ġcompare',
 '.',
 'Ġwe',
 'Ġalso',
 'Ġhave',
 'Ġa',
 'Ġfax',
 'Ġfrom',
 'Ġhotel',
 'Ġfire',
 'n',
 'ze',
 'Ġin',
 'Ġr',
 'ome',
 'Ġthat',
 'Ġwe',
 'Ġare',
 'Ġconfirmed',
 'Ġfor',
 'Ġj',
 'une',
 'Ġ7',
 'Ġ8',
 '!',
 'Ġif',
 'Ġyou',
 'Ġhave',
 'Ġno',
 'Ġbullets',
 'Ġto',
 'Ġreport',
 ',',
 'Ġplease',
 'Ġreply',
 'Ġto',
 'Ġme',
 'Ġthat',
 'Ġyou',
 'Ġhave',
 'Ġnone',
 '.',
 'Ġthe',
 'Ġ131',
 'Ġis',
 'Ġ100',
 'Ġof',
 'Ġthe',
 'Ġcost',
 'Ġof',
 'Ġthe',
 'Ġproject',
 '.',
 'Ġl',
 'or',
 'raine',
 'Ġwill',
 'Ġbe',
 'Ġattending',
 'Ġfor',
 'Ġour',
 'Ġgroup',
 '.',
 'Ġwill',
 'Ġyou',
 'Ġbe',
 'Ġable',

In [210]:
len(res)

1005
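
The length of 1005 is consistent with 1000 context tokens plus the 5-token n-gram at span `(1081, 1086)`, although that reading is an inference from the outputs rather than something the source states. A minimal sketch of such trimming follows; `trimmed_context_sketch` is a hypothetical name, and returning the context and the n-gram as two separate lists is an assumption.

```python
from typing import List, Sequence, Tuple


def trimmed_context_sketch(
    tokens: Sequence[str],
    token_span: Tuple[int, int],
    max_tokens: int,
) -> Tuple[List[str], List[str]]:
    """Keep at most max_tokens tokens immediately before a half-open span."""
    start, end = token_span
    if not (0 <= start <= end <= len(tokens)):
        raise ValueError(f"span {token_span} is outside the token list")
    # Drop the oldest tokens so the context fits the budget.
    context_start = max(0, start - max_tokens)
    return list(tokens[context_start:start]), list(tokens[start:end])


dummy_tokens = [f"t{i}" for i in range(1100)]
context, ngram = trimmed_context_sketch(dummy_tokens, (1081, 1086), max_tokens=1000)
print(len(context), len(ngram), context[0])
```

Trimming from the left keeps the tokens closest to the n-gram, which are the ones that most directly condition its probability.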

In [192]:
res[1]

['.', 'Ġi', 'Ġwould', 'Ġlike', 'Ġto']

In [191]:
res[0][-5:]

['Ġlate', 'Ġthis', 'Ġafternoon', 'Ġto', 'Ġdiscuss']

## Find Positions for All Common N-grams

In [176]:
for com in common:
    test_text = tokens_to_text(com, tokenizer)
    
    prefixes, tok_spans = texts_around_each_ngram(known_text, test_text, return_token_spans=True, tokenizer=tokenizer)
    
    print(f"Phrase: {test_text} - Span: {tok_spans}")

[(571, 573)]
Phrase: , as - Span: [(571, 573)]
[(390, 392)]
Phrase: , but - Span: [(390, 392)]
[(144, 146), (621, 623)]
Phrase: . if - Span: [(144, 146), (621, 623)]
[(443, 445)]
Phrase: . the - Span: [(443, 445)]
[(671, 673)]
Phrase: ie, - Span: [(671, 673)]
[(257, 259)]
Phrase:  a draft - Span: [(257, 259)]
[(29, 31)]
Phrase:  a fax - Span: [(29, 31)]
[(72, 74)]
Phrase:  and he - Span: [(72, 74)]
[(183, 185), (614, 616)]
Phrase:  and i - Span: [(183, 185), (614, 616)]
[(442, 444)]
Phrase:  area. - Span: [(442, 444)]
[(556, 558), (806, 808)]
Phrase:  call me - Span: [(556, 558), (806, 808)]
[(47, 49)]
Phrase:  did not - Span: [(47, 49)]
[(97, 99)]
Phrase:  do not - Span: [(97, 99)]
[(101, 103), (278, 280), (785, 787)]
Phrase:  for the - Span: [(101, 103), (278, 280), (785, 787)]
[(256, 258), (464, 466), (654, 656), (797, 799)]
Phrase:  have a - Span: [(256, 258), (464, 466), (654, 656), (797, 799)]
[(484, 486)]
Phrase:  help in - Span: [(484, 486)]
[(193, 195)]
Phrase:  him to - Span:

In [177]:
best = None  # global best across all phrases

# tokenize the full unknown text once so we can slice tokens later
enc = tokenizer(unknown_text, add_special_tokens=False)
input_ids = enc.get("input_ids")
if input_ids is None:
    input_ids = tokenizer.encode(unknown_text, add_special_tokens=False)
elif input_ids and isinstance(input_ids[0], (list, tuple)):
    input_ids = input_ids[0]

unk_tokens = (
    tokenizer.convert_ids_to_tokens(input_ids)
    if hasattr(tokenizer, "convert_ids_to_tokens")
    else input_ids
)

for com in common:
    test_text = tokens_to_text(com, tokenizer)

    prefixes, tok_spans = texts_around_each_ngram(
        unknown_text,
        test_text,
        return_token_spans=True,
        tokenizer=tokenizer
    )

    if not tok_spans:
        continue

    # pick the span whose tuple contains the highest number (usually the biggest tok_end)
    best_span = max(tok_spans, key=lambda sp: max(sp))
    best_val = max(best_span)

    print(f"Phrase: {test_text} - Best span: {best_span} (highest={best_val})")

    # keep global best across all phrases
    if best is None or best_val > best["highest"]:
        best_idx = tok_spans.index(best_span)
        best = {
            "phrase": test_text,
            "span": best_span,
            "highest": best_val,
            "prefix": prefixes[best_idx],  # optional
        }

print("\n--- GLOBAL BEST ---")
if best is None:
    print("No spans found.")
else:
    s_tok, e_tok = best["span"]

    best_tokens = unk_tokens[s_tok:e_tok]
    best_tokens_text = (
        tokenizer.decode(input_ids[s_tok:e_tok], skip_special_tokens=True)
        if hasattr(tokenizer, "decode")
        else str(best_tokens)
    )

    print(f"Phrase:  {best['phrase']}")
    print(f"Span:    {best['span']}")
    print(f"Highest: {best['highest']}")
    print(f"Tokens:  {best_tokens}")
    print(f"Text:    {best_tokens_text}")
    # optional:
    # print(f"Prefix:  {best['prefix']}")


[(200, 202)]
Phrase: , as - Best span: (200, 202) (highest=202)
[(703, 705)]
Phrase: , but - Best span: (703, 705) (highest=705)
[(55, 57), (431, 433)]
Phrase: . if - Best span: (431, 433) (highest=433)
[(164, 166), (970, 972)]
Phrase: . the - Best span: (970, 972) (highest=972)
[(1025, 1027)]
Phrase: ie, - Best span: (1025, 1027) (highest=1027)
[(584, 586)]
Phrase:  a draft - Best span: (584, 586) (highest=586)
[(128, 130)]
Phrase:  a fax - Best span: (128, 130) (highest=130)
[(922, 924)]
Phrase:  and he - Best span: (922, 924) (highest=924)
[(665, 667), (803, 805), (1069, 1071)]
Phrase:  and i - Best span: (1069, 1071) (highest=1071)
[(912, 914)]
Phrase:  area. - Best span: (912, 914) (highest=914)
[(504, 506)]
Phrase:  call me - Best span: (504, 506) (highest=506)
[(399, 401)]
Phrase:  did not - Best span: (399, 401) (highest=401)
[(26, 28)]
Phrase:  do not - Best span: (26, 28) (highest=28)
[(61, 63), (564, 566), (608, 610)]
Phrase:  for the - Best span: (608, 610) (highest=610)
[(

In [103]:
score_ngrams(['.', 'Ġi', 'Ġwould', 'Ġlike', 'Ġto'], model, tokenizer, unknown_text, use_bos=True)

IndexError: index out of range in self

In [30]:
score_ngrams(sample_tokens, model, tokenizer, use_bos=True)

{'phrase': ' let me know if',
 'tokens': ['Ġlet', 'Ġme', 'Ġknow', 'Ġif'],
 'num_tokens': 4,
 'log_probs': [-11.52054500579834,
  -3.1591553688049316,
  -2.4348859786987305,
  -1.0943843126296997],
 'sum_log_probs': -18.2089706659317,
 'text_tokens': ['Ġlet', 'Ġme', 'Ġknow', 'Ġif'],
 'text_len': 4,
 'text_log_probs': [-11.52054500579834,
  -3.1591553688049316,
  -2.4348859786987305,
  -1.0943843126296997]}
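
The `log_probs` and `sum_log_probs` fields can be read as per-token log-probabilities and their sum. The sketch below shows that calculation on toy logits using only the standard library. It is not the `score_ngrams` implementation: `score_token_ids` is a hypothetical name, and it assumes each row of logits was produced before the corresponding token was seen (with a BOS token in front, the first token also gets a prediction).

```python
import math
from typing import Dict, List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def score_token_ids(
    step_logits: Sequence[Sequence[float]],
    token_ids: Sequence[int],
) -> Dict[str, object]:
    """step_logits[t] must be conditioned on everything before token_ids[t]."""
    if len(step_logits) != len(token_ids):
        raise ValueError("need exactly one row of logits per scored token")
    log_probs = [log_softmax(row)[tid] for row, tid in zip(step_logits, token_ids)]
    return {"log_probs": log_probs, "sum_log_probs": sum(log_probs)}


# Toy case: uniform logits over a 4-token vocabulary, so every
# token has probability 1/4 regardless of its id.
toy = score_token_ids([[0.0] * 4, [0.0] * 4], [2, 1])
print(toy)

# Consistency check on the values printed above: the per-token
# log-probs should add up to the reported sum_log_probs.
reported = [-11.52054500579834, -3.1591553688049316,
            -2.4348859786987305, -1.0943843126296997]
reported_sum = sum(reported)
print(reported_sum)
```

Summing log-probabilities is equivalent to multiplying probabilities, so `sum_log_probs` is the log-probability of the whole n-gram under the model.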

In [29]:
score_ngrams(sample_text, model, tokenizer, use_bos=True)

{'phrase': ' let me know if',
 'tokens': ['Ġlet', 'Ġme', 'Ġknow', 'Ġif'],
 'num_tokens': 4,
 'log_probs': [-11.52054500579834,
  -3.1591553688049316,
  -2.4348859786987305,
  -1.0943843126296997],
 'sum_log_probs': -18.2089706659317,
 'text_tokens': ['Ġlet', 'Ġme', 'Ġknow', 'Ġif'],
 'text_len': 4,
 'text_log_probs': [-11.52054500579834,
  -3.1591553688049316,
  -2.4348859786987305,
  -1.0943843126296997]}

In [213]:
df_no = score_ngrams_to_df(common, model, tokenizer, full_text=None, use_bos=True)

In [215]:
df_no

| | phrase_num | phrase_occurrence | phrase | tokens | num_tokens | log_probs | sum_log_probs | text_tokens | text_len | text_log_probs |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | , as | [,, Ġas] | 2 | [-5.659754753112793, -5.123129367828369] | -10.782884 | [,, Ġas] | 2 | [-5.659754753112793, -5.123129367828369] |
| 1 | 2 | 1 | , but | [,, Ġbut] | 2 | [-5.659754753112793, -4.751577854156494] | -10.411333 | [,, Ġbut] | 2 | [-5.659754753112793, -4.751577854156494] |
| 2 | 3 | 1 | . if | [., Ġif] | 2 | [-4.482760429382324, -9.537545204162598] | -14.020306 | [., Ġif] | 2 | [-4.482760429382324, -9.537545204162598] |
| 3 | 4 | 1 | . the | [., Ġthe] | 2 | [-4.482760429382324, -7.372399806976318] | -11.855160 | [., Ġthe] | 2 | [-4.482760429382324, -7.372399806976318] |
| 4 | 5 | 1 | ie, | [ie, ,] | 2 | [-8.445429801940918, -4.862000465393066] | -13.307430 | [ie, ,] | 2 | [-8.445429801940918, -4.862000465393066] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 85 | 86 | 1 | if you are available. | [Ġif, Ġyou, Ġare, Ġavailable, .] | 5 | [-9.735991477966309, -1.6311017274856567, -2.1... | -24.022795 | [Ġif, Ġyou, Ġare, Ġavailable, .] | 5 | [-9.735991477966309, -1.6311017274856567, -2.1... |
| 86 | 87 | 1 | . let me know if you | [., Ġlet, Ġme, Ġknow, Ġif, Ġyou] | 6 | [-4.482760429382324, -10.545089721679688, -3.8... | -21.918498 | [., Ġlet, Ġme, Ġknow, Ġif, Ġyou] | 6 | [-4.482760429382324, -10.545089721679688, -3.8... |
| 87 | 88 | 1 | if you have any questions or | [Ġif, Ġyou, Ġhave, Ġany, Ġquestions, Ġor] | 6 | [-9.735991477966309, -1.6311017274856567, -2.7... | -18.474388 | [Ġif, Ġyou, Ġhave, Ġany, Ġquestions, Ġor] | 6 | [-9.735991477966309, -1.6311017274856567, -2.7... |
| 88 | 89 | 1 | , i would like to visit with | [,, Ġi, Ġwould, Ġlike, Ġto, Ġvisit, Ġwith] | 7 | [-5.659754753112793, -6.466437339782715, -6.28... | -32.375830 | [,, Ġi, Ġwould, Ġlike, Ġto, Ġvisit, Ġwith] | 7 | [-5.659754753112793, -6.466437339782715, -6.28... |


In [216]:
df_known = score_ngrams_to_df(common, model, tokenizer, full_text=known_text, use_bos=True, num_tokens=1000)

[(571, 573)]
[(390, 392)]
[(144, 146), (621, 623)]
[(443, 445)]
[(671, 673)]
[(257, 259)]
[(29, 31)]
[(72, 74)]
[(183, 185), (614, 616)]
[(442, 444)]
[(556, 558), (806, 808)]
[(47, 49)]
[(97, 99)]
[(101, 103), (278, 280), (785, 787)]
[(256, 258), (464, 466), (654, 656), (797, 799)]
[(484, 486)]
[(193, 195)]
[(529, 531)]
[(246, 248), (540, 542), (615, 617), (754, 756)]
[(345, 347)]
[(27, 29)]
[(622, 624)]
[(137, 139)]
[(300, 302), (437, 439)]
[(749, 751)]
[(167, 169)]
[(807, 809)]
[(76, 78)]
[(223, 225)]
[(283, 285)]
[(346, 348)]
[(98, 100)]
[(6, 8), (147, 149), (704, 706), (730, 732)]
[(386, 388)]
[(11, 13), (322, 324)]
[(205, 207)]
[(146, 148)]
[(330, 332)]
[(218, 220)]
[(603, 605)]
[(26, 28)]
[(260, 262), (268, 270)]
[(32, 34), (174, 176), (522, 524), (709, 711), (723, 725)]
[(176, 178)]
[(93, 95)]
[(224, 226)]
[(774, 776)]
[(206, 208), (297, 299)]
[(773, 775)]
[(213, 215)]
[(673, 675)]
[(595, 597)]
[(161, 163)]
[(620, 622)]
[(235, 237), (482, 484)]
[(23, 25), (731, 733)]
[(563, 566)

In [217]:
df_known

| | phrase_num | phrase_occurrence | phrase | tokens | num_tokens | log_probs | sum_log_probs | text_tokens | text_len | text_log_probs |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | , as | [,, Ġas] | 2 | [-1.019666314125061, -5.439155578613281] | -6.458822 | [many, ,, Ġmany, Ġthanks, Ġfor, Ġall, Ġof, Ġyo... | 573 | [-11.021262168884277, -3.822641372680664, -0.2... |
| 1 | 2 | 1 | , but | [,, Ġbut] | 2 | [-2.6605143547058105, -2.4401988983154297] | -5.100713 | [many, ,, Ġmany, Ġthanks, Ġfor, Ġall, Ġof, Ġyo... | 392 | [-11.021262168884277, -3.822641372680664, -0.2... |
| 2 | 3 | 1 | . if | [., Ġif] | 2 | [-0.9009577035903931, -4.4446234703063965] | -5.345581 | [many, ,, Ġmany, Ġthanks, Ġfor, Ġall, Ġof, Ġyo... | 146 | [-11.021262168884277, -3.822641372680664, -0.2... |
| 3 | 3 | 2 | . if | [., Ġif] | 2 | [-0.48230841755867004, -3.4513957500457764] | -3.933704 | [many, ,, Ġmany, Ġthanks, Ġfor, Ġall, Ġof, Ġyo... | 623 | [-11.021262168884277, -3.822641372680664, -0.2... |
| 4 | 4 | 1 | . the | [., Ġthe] | 2 | [-0.8463349342346191, -4.308838844299316] | -5.155174 | [many, ,, Ġmany, Ġthanks, Ġfor, Ġall, Ġof, Ġyo... | 445 | [-11.021262168884277, -3.822641372680664, -0.2... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 110 | 86 | 1 | if you are available. | [Ġif, Ġyou, Ġare, Ġavailable, .] | 5 | [-3.888493061065674, -1.3347547054290771, -2.0... | -10.362843 | [many, ,, Ġmany, Ġthanks, Ġfor, Ġall, Ġof, Ġyo... | 529 | [-11.021262168884277, -3.822641372680664, -0.2... |
| 111 | 87 | 1 | . let me know if you | [., Ġlet, Ġme, Ġknow, Ġif, Ġyou] | 6 | [-1.82478928565979, -5.64450740814209, -0.2195... | -8.822046 | [many, ,, Ġmany, Ġthanks, Ġfor, Ġall, Ġof, Ġyo... | 483 | [-11.021262168884277, -3.822641372680664, -0.2... |
| 112 | 88 | 1 | if you have any questions or | [Ġif, Ġyou, Ġhave, Ġany, Ġquestions, Ġor] | 6 | [-1.1867774724960327, -1.0457713603973389, -1.... | -7.948379 | [many, ,, Ġmany, Ġthanks, Ġfor, Ġall, Ġof, Ġyo... | 468 | [-11.021262168884277, -3.822641372680664, -0.2... |
| 113 | 89 | 1 | , i would like to visit with | [,, Ġi, Ġwould, Ġlike, Ġto, Ġvisit, Ġwith] | 7 | [-1.1141384840011597, -1.9723618030548096, -4.... | -16.292405 | [many, ,, Ġmany, Ġthanks, Ġfor, Ġall, Ġof, Ġyo... | 367 | [-11.021262168884277, -3.822641372680664, -0.2... |


In [218]:
df_unknown = score_ngrams_to_df(common, model, tokenizer, full_text=unknown_text)

IndexError: index out of range in self

In [221]:
df_unknown = score_ngrams_to_df(common, model, tokenizer, full_text=unknown_text, use_bos=True, num_tokens=1000)

In [222]:
df_unknown

| | phrase_num | phrase_occurrence | phrase | tokens | num_tokens | log_probs | sum_log_probs | text_tokens | text_len | text_log_probs |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | , as | [,, Ġas] | 2 | [-3.0795679092407227, -5.415594100952148] | -8.495162 | [they, Ġalso, Ġhave, Ġprivate, Ġparking, Ġat, ... | 202 | [-9.514735221862793, -5.6802144050598145, -2.7... |
| 1 | 2 | 1 | , but | [,, Ġbut] | 2 | [-1.9199104309082031, -2.626206874847412] | -4.546117 | [they, Ġalso, Ġhave, Ġprivate, Ġparking, Ġat, ... | 705 | [-9.514735221862793, -5.6802144050598145, -2.7... |
| 2 | 3 | 1 | . if | [., Ġif] | 2 | [-1.1455343961715698, -3.911022186279297] | -5.056557 | [they, Ġalso, Ġhave, Ġprivate, Ġparking, Ġat, ... | 57 | [-9.514735221862793, -5.6802144050598145, -2.7... |
| 3 | 3 | 2 | . if | [., Ġif] | 2 | [-1.3444461822509766, -3.487243413925171] | -4.831690 | [they, Ġalso, Ġhave, Ġprivate, Ġparking, Ġat, ... | 433 | [-9.514735221862793, -5.6802144050598145, -2.7... |
| 4 | 4 | 1 | . the | [., Ġthe] | 2 | [-1.065126657485962, -4.100943565368652] | -5.166070 | [they, Ġalso, Ġhave, Ġprivate, Ġparking, Ġat, ... | 166 | [-9.514735221862793, -5.6802144050598145, -2.7... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 117 | 86 | 1 | if you are available. | [Ġif, Ġyou, Ġare, Ġavailable, .] | 5 | [-3.424461841583252, -0.23520515859127045, -1.... | -10.285177 | [they, Ġalso, Ġhave, Ġprivate, Ġparking, Ġat, ... | 756 | [-9.514735221862793, -5.6802144050598145, -2.7... |
| 118 | 87 | 1 | . let me know if you | [., Ġlet, Ġme, Ġknow, Ġif, Ġyou] | 6 | [-1.1237517595291138, -5.131729602813721, -0.6... | -8.408055 | [they, Ġalso, Ġhave, Ġprivate, Ġparking, Ġat, ... | 619 | [-9.514735221862793, -5.6802144050598145, -2.7... |
| 119 | 88 | 1 | if you have any questions or | [Ġif, Ġyou, Ġhave, Ġany, Ġquestions, Ġor] | 6 | [-2.703808069229126, -0.8416548371315002, -1.6... | -7.866544 | [they, Ġalso, Ġhave, Ġprivate, Ġparking, Ġat, ... | 882 | [-9.514735221862793, -5.6802144050598145, -2.7... |
| 120 | 89 | 1 | , i would like to visit with | [,, Ġi, Ġwould, Ġlike, Ġto, Ġvisit, Ġwith] | 7 | [-0.9718462824821472, -1.1969939470291138, -1.... | -15.629446 | [they, Ġalso, Ġhave, Ġprivate, Ġparking, Ġat, ... | 485 | [-9.514735221862793, -5.6802144050598145, -2.7... |
