# search with lunr.js

generate a standalone html page containing a complete search application for assistive technology projects.

this application depends on data generated by [`2025-03-01-aggregate.ipynb`](2025-03-01-aggregate.ipynb).

In [1]:
from pathlib import Path
import pandas, numpy, midgy, urllib.parse
from pandas import *
from toolz.curried import *

df = pandas.read_json("at.json.gz")

explicitly label the service the information comes from.

In [2]:
df["service"] = (
    df.index.to_series()
    .apply(urllib.parse.urlparse)
    .apply(operator.attrgetter("netloc"))
    .str.removeprefix("www.")
    .str.removesuffix(".com")
)
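
the same labelling can be sketched without pandas. this is a minimal standard-library version, assuming every project url uses a `.com` host as the services in this dataset do; `service_name` is a hypothetical helper, not part of the notebook.

```python
from urllib.parse import urlparse


def service_name(url: str) -> str:
    """Return a short service label such as 'github' for a project url."""
    host = urlparse(url).netloc  # e.g. "www.ravelry.com"
    return host.removeprefix("www.").removesuffix(".com")


# the first and last index entries reported by df.info()
urls = [
    "https://github.com/ai-collection/ai-collection",
    "https://www.ravelry.com/patterns/library/i-love-you-in-braille",
]
labels = [service_name(u) for u in urls]
print(labels)
```

`str.removeprefix` and `str.removesuffix` need python 3.9 or newer.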

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1267 entries, https://github.com/ai-collection/ai-collection to https://www.ravelry.com/patterns/library/i-love-you-in-braille
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   description     1254 non-null   object 
 1   stargazerCount  196 non-null    float64
 2   forkCount       196 non-null    float64
 3   license         614 non-null    object 
 4   tags            1218 non-null   object 
 5   name            1267 non-null   object 
 6   service         1267 non-null   object 
dtypes: float64(2), object(5)
memory usage: 79.2+ KB


## create a basic application to return search responses

1. `d3` is used for adding, updating, and deleting rows
2. `markdown-it` is used to render the descriptions from different services. we'll NEVER have control over the markup those services emit. github at least is strictly gfm.
3. `font-awesome` for icons
4. `lunr.js` provides search indexing capabilities. it's not a database, just search.

in this approach the html document is the database, containing both the data and the formal representations.
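
to make the "index, not database" distinction concrete, here is a toy inverted index in python. it is a simplified sketch, not lunr's algorithm: lunr additionally stems tokens and scores and ranks results. the records and `example.com` urls are made up, and `tokenize` and `search` are hypothetical helpers.

```python
import re
from collections import defaultdict

# hypothetical records shaped like the rows embedded in the page
documents = [
    {"index": "https://example.com/a", "description": "a bottle opener for one hand"},
    {"index": "https://example.com/b", "description": "braille label for a bottle"},
    {"index": "https://example.com/c", "description": "keyboard guard"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# the index maps each token to the set of document references containing it
inverted = defaultdict(set)
for doc in documents:
    for token in tokenize(doc["description"]):
        inverted[token].add(doc["index"])

# the store maps a reference back to the full record, like `store` in the page
store = {doc["index"]: doc for doc in documents}


def search(query):
    """Return records containing every query token (no stemming, no scoring)."""
    tokens = tokenize(query)
    if not tokens:
        return []
    refs = set.intersection(*(inverted.get(t, set()) for t in tokens))
    return [store[ref] for ref in sorted(refs)]


# the sorted token list plays the role of the autocomplete options
completions = sorted(inverted)
```

the index only returns references. the records themselves live in `store`, which is why the page keeps both.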

In [7]:
%%
    styles=\
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.7.2/css/brands.min.css" crossorigin="anonymous" referrerpolicy="no-referrer" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.7.2/css/solid.min.css" crossorigin="anonymous" referrerpolicy="no-referrer" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.7.2/css/fontawesome.min.css" crossorigin="anonymous" referrerpolicy="no-referrer" />

    script =\
<script type="module">
import lunr from "https://esm.sh/lunr"
import * as d3 from "https://cdn.jsdelivr.net/npm/d3@7/+esm"
import markdownIt from 'https://cdn.jsdelivr.net/npm/markdown-it@14.1.0/+esm'

var documents = {{df[["description", "service", "name"]].reset_index().dropna().sample(frac=1).to_json(orient="records")}};
var store = Object.fromEntries(documents.map(x => [x.index, x]))
var idx = lunr(function () {
  this.ref('index')
  this.field('description')
  documents.forEach(function (doc) {this.add(doc)}, this)
})
d3.select("#completion").selectAll("option").data(
    Object.keys(idx.invertedIndex)
).join("option").text(d => d)
function updateSearch(event) {
    event?.preventDefault()
    var table = d3.select(document.forms.search).select("table");
    var body = table.select("tbody");
    var ordering = "name service description".split(" ");
    var tpl = document.forms.search.querySelector("table>template");
    
    var rows = body.selectAll("tr")
        .data(idx.search(document.forms.search.q.value))
        .join("tr")
        .each((d, i, nodes)=>{
            var self = d3.select(nodes[i]);
            var entry = store[d.ref]
            self.classed(entry.service, true)
            self.selectAll("th").data([entry.name]).join("th").text(d => d)
            self.selectAll("td").data([entry.service]).join(
                (enter) => {
                    var a = enter.append("td").append("a").attr("href", entry.index).attr("title", entry.service)
                    var fa = entry.service;
                    var fa_cat = "brands";
                    if (entry.service == "thingiverse") {
                        fa_cat = "solid"
                        fa="t"
                    }
                    a.append("i")
                    .classed(`fa-${fa_cat}`, true)
                    .classed(`fa-${fa}`, true)
                    .classed(`fa-2xl`, true)
                }
            )
            self.selectAll("td").data([entry.service, entry.description]).join(
                (enter) => enter.append("td").html(d => markdownIt().render(d))
            )
        })
       
        
}

var form = d3.select(document.forms.search)
document.forms.search.onsubmit = updateSearch
globalThis.lunr =  lunr
globalThis.d3 =  d3
globalThis.idx =  idx
globalThis.form =  form
globalThis.documents =  documents

updateSearch()
</script>
<form name=search>
    <label>query<input type=text name=q list=completion value=bottle></label>
    <input type=submit >
    <fieldset name=results>
        <legend>results</legend>
        <table>
            <thead>
                 <tr>
                     <th>project</th>
                     <th>service</th>
                     <th>description</th>
                 </tr>
            </thead>
            <tbody>
            </tbody>
            <template><tr><th></th><td></td></tr></template>
        </table>
    </fieldset>
    <datalist id=completion></datalist>
</form>



## write the application to a standalone file

In [9]:
file = F"""<html>
    <head>{styles}</head>
    <body>{script}</body>
</html>"""
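
the notebook relies on midgy to fill the `{{...}}` expression in the script with json records. the sketch below shows the same idea with only the standard library, using `string.Template` as a stand-in for midgy's templating. the record is made up, and the `</` escaping is a precaution added here; pandas may already escape forward slashes in `to_json`, and this sketch does not rely on that.

```python
import json
from string import Template

# hypothetical record; the notebook serializes real rows with DataFrame.to_json
records = [
    {
        "index": "https://example.com/a",
        "name": "a",
        "description": "text with </script> inside",
    }
]

# escape "</" so a description cannot close the script element early;
# "\/" is a valid escape for "/" in both json and javascript strings
payload = json.dumps(records).replace("</", "<\\/")

page = Template(
    """<html>
    <head></head>
    <body><script type="module">
var documents = $documents;
</script></body>
</html>"""
).substitute(documents=payload)
```

the payload still decodes to the original records, so the page's javascript sees the same data.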

In [10]:
(target := Path("2025-03-12-lunr.html")).write_text(midgy.types.HTML(file).render(globals()))
!zip -p lunr-search.zip $target

  adding: 2025-03-12-lunr.html (deflated 64%)
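
where the `zip` command is not available, the archive step can be sketched with python's `zipfile` module. this is an alternative to the shell call above, not what the notebook ran; the temporary directory and the placeholder page content are assumptions made for the example.

```python
import tempfile
import zipfile
from pathlib import Path

# work in a throwaway directory so the example leaves no files behind in the repo
workdir = Path(tempfile.mkdtemp())
target = workdir / "2025-03-12-lunr.html"
target.write_text("<html><body>placeholder page</body></html>")

# write the page into a deflate-compressed archive under its bare file name
archive = workdir / "lunr-search.zip"
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write(target, arcname=target.name)

# read the archive back to confirm what it contains
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(names)
```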
