# creating static sites from notebooks and dataframes in notebooks

this is a short notebook that walks end to end through making a static
site from notebooks; the same approach works with markdown or html content.
by going end to end we can see the arc from indexing the site contents
to working with the html and eventually writing to disc.
this is AN implementation, not THE implementation. it is meant for discussion, with reduced complexity.

the power of the dataframe-forward approach is replacing object oriented programming
techniques and control flow with fluent programming interfaces that operate on rows and columns of data.
this approach shifts traditional static site generation from OOP to FP.

In [1]:
%%
<div hidden type="text/x-python">

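    from tonyfast.tonyfast.xxiv.schema_frame import tag
    # names used by the cells below
    import asyncio, json, operator, shlex
    import anyio, bs4, nbconvert, nbformat, pandas
    from asyncio import gather
    from pathlib import Path
    from pandas import Index, RangeIndex, Series
    from toolz.curried import compose_left, first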

    async def sh(cmd, **kwargs):
        kwargs.setdefault("stdout", asyncio.subprocess.PIPE) 
        kwargs.setdefault("stderr", asyncio.subprocess.PIPE) 
        return list(map(bytes.decode, await (await asyncio.subprocess.create_subprocess_exec(*shlex.split(cmd), **kwargs)).communicate()))

</div>

<style>
div.container .jp-CodeCell .jp-Cell-inputWrapper, [data-mime-type="application/vnd.jupyter.stderr"] {
    display: none;
}
</style>

In [2]:
%%
## create the index of input files

the substrate for all static site dataframe work is an index of all the content files.
typically, a static site is generated from a git repo, and it is possible to request things like
updated times and authors from the revision history.



    index = Index(Path("~/research/").expanduser().glob("*.ipynb")).rename("path")
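
a minimal sketch of the same indexing step against a throwaway directory, so it can run anywhere. the file names and the `build_index` helper are made up for illustration; only the glob-into-a-named-`Index` idea comes from the cell above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def build_index(root: Path, pattern: str = "*.ipynb") -> pd.Index:
    """collect the matching files under root into a sorted, named pandas Index."""
    return pd.Index(sorted(root.glob(pattern)), name="path")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # hypothetical content files; only the notebooks should be indexed
    for name in ("2025-01-14-graphql.ipynb", "Untitled.ipynb", "notes.md"):
        (root / name).write_text("{}")
    demo_index = build_index(root)
    demo_names = [path.name for path in demo_index]
```

sorting the glob is a choice made here so the index order is stable between runs; the original cell keeps whatever order the file system returns.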

In [3]:
%%
## loading content without control flows

load in all of the file contents

    df = Series(
        await gather(*index.map(compose_left(anyio.Path, anyio.Path.read_text))), index
    ).apply(json.loads).rename("data").to_frame()
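
the loading step can be sketched without `anyio` as well. this version assumes `asyncio.to_thread` as a stand-in for `anyio.Path.read_text`, and the `read_text`, `load`, and `demo_load` names are hypothetical. it still has no explicit loop over files: the reads are gathered concurrently and the parsing is a single `apply`.

```python
import asyncio
import json
import tempfile
from pathlib import Path

import pandas as pd


async def read_text(path: Path) -> str:
    """read one file in a worker thread so many reads can overlap."""
    return await asyncio.to_thread(path.read_text)


async def load(paths: pd.Index) -> pd.DataFrame:
    """read every path concurrently and parse each file as json."""
    texts = await asyncio.gather(*map(read_text, paths))
    return pd.Series(list(texts), index=paths).apply(json.loads).rename("data").to_frame()


def demo_load() -> pd.DataFrame:
    """build two tiny hypothetical notebooks on disk and load them."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.ipynb", "b.ipynb"):
            Path(tmp, name).write_text(json.dumps({"cells": [], "nbformat": 4}))
        index = pd.Index(sorted(Path(tmp).glob("*.ipynb")), name="path")
        return asyncio.run(load(index))


demo_frame = demo_load()
```

inside a notebook with a running event loop one would `await load(index)` directly rather than call `asyncio.run`.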

<details><summary>ensure some types in the notebook formats</summary>

    for nb in df.data:
        for cell in nb["cells"]:
            cell["source"] = "".join(cell["source"])
            for output in cell.get("outputs", ""):
                if "data" in output:
                    for k, v in output["data"].items():
                        if k == "text/markdown":
                            output["data"][k] = "".join(v)
</details>



now that we have our data structured we can perform simple operations like creating a target for the content in a static site context.

    df = df.index.to_series().apply(lambda x: Path(x.with_suffix(x.suffix + ".html").name)).to_frame("target").combine_first(df)
    df = df.head(20)

or a more complicated scenario where we extract the time the content was updated from the git history.

    df = Series(await gather(*(df.index.to_series().apply(
        lambda x: sh("""git log --oneline -n1  --pretty="format:%H %ct" -- """ + x.name + "", cwd=x.parent)
    ))), df.index).apply(first).str.split(expand=True).rename(columns={0: "hash", 1: "updated_at"}).combine_first(df)
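
to make the shape of that last step concrete, here is a small sketch that parses already captured `git log` lines instead of calling git. the two input lines are illustrative, with an empty string standing in for a file that has no history. converting the epoch seconds to timestamps is an extra step that the cell above does not perform.

```python
import pandas as pd

# one "<hash> <commit time>" line per file, as produced by --pretty="format:%H %ct"
log_lines = pd.Series(
    ["f703dc629c41fd22c5d4f3d42ab064aed48460db 1736884797", ""],
    index=pd.Index(["2025-01-14-graphql.ipynb", "Untitled1.ipynb"], name="path"),
)

# split on whitespace into columns; files without history become missing values
meta = log_lines.str.split(expand=True).rename(columns={0: "hash", 1: "updated_at"})

# optional: turn epoch seconds into timezone aware timestamps
meta["updated_at"] = pd.to_datetime(pd.to_numeric(meta["updated_at"]), unit="s", utc=True)
```

files that were never committed keep missing values in both columns, which matches the empty `hash` and `updated_at` cells in the table below.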

to recap, from our index we read in the file contents and extracted file level metadata into the dataframe. these actions represent some of the ways we can work with documentation as structured data.

{{df.T._repr_html_()}}



path,/Users/tonyfast/research/Untitled1.ipynb,/Users/tonyfast/research/2025-06-10-illusion.ipynb,/Users/tonyfast/research/Untitled3.ipynb,/Users/tonyfast/research/Untitled.ipynb,/Users/tonyfast/research/2025-06-20-hocr.ipynb,/Users/tonyfast/research/Untitled4.ipynb,/Users/tonyfast/research/2025-01-14-graphql.ipynb,/Users/tonyfast/research/2025-03-01-colab.ipynb,/Users/tonyfast/research/2025-03-12-lunr.ipynb,/Users/tonyfast/research/Untitled2.ipynb,/Users/tonyfast/research/2025-06-09-illusion.ipynb,/Users/tonyfast/research/2025-01-27-ravelry.ipynb,/Users/tonyfast/research/2025-04-17-workflows.ipynb,/Users/tonyfast/research/2025-02-24-ravelry.ipynb,/Users/tonyfast/research/2025-06-23-hocr.ipynb,/Users/tonyfast/research/2025-03-27-thingiverse.ipynb,/Users/tonyfast/research/2025-02-23-thingiverse.ipynb,/Users/tonyfast/research/2025-01-10-a11y-metadata.ipynb,/Users/tonyfast/research/2025-01-24-research.ipynb,/Users/tonyfast/research/2025-03-27-github-graphql.ipynb
data,"{'cells': [{'cell_type': 'code', 'execution_co...","{'cells': [{'cell_type': 'markdown', 'id': '2f...",{'cells': [{'attachments': {'ff69e65f-8d73-44b...,"{'cells': [{'cell_type': 'markdown', 'id': 'e8...","{'cells': [{'cell_type': 'markdown', 'id': 'cd...","{'cells': [{'cell_type': 'markdown', 'id': 'cd...","{'cells': [{'cell_type': 'markdown', 'id': '10...","{'nbformat': 4, 'nbformat_minor': 0, 'metadata...","{'cells': [{'cell_type': 'markdown', 'id': 'bc...","{'cells': [{'cell_type': 'code', 'execution_co...",{'cells': [{'attachments': {'ff69e65f-8d73-44b...,"{'cells': [{'cell_type': 'markdown', 'id': '64...","{'cells': [{'cell_type': 'markdown', 'id': '48...","{'cells': [{'cell_type': 'markdown', 'id': '64...","{'cells': [{'cell_type': 'markdown', 'id': 'cd...","{'cells': [{'cell_type': 'markdown', 'id': '82...","{'cells': [{'cell_type': 'markdown', 'id': '82...","{'cells': [{'cell_type': 'markdown', 'id': 'b0...","{'cells': [{'cell_type': 'raw', 'id': '778adbd...","{'cells': [{'cell_type': 'markdown', 'id': '3c..."
hash,,,,,,,f703dc629c41fd22c5d4f3d42ab064aed48460db,72113ce4564958c1c3b3113395e04397d1b30804,939a0a496182546359476b58e2487a84f8b31b9f,,,7ca9b5be10312e0407bdadc8202bddc1af219d1a,932e2afc7a1a20a9c947571c20b64bb50fd58ccb,39bf2cc47742935daf63650790d678ad85f4ae34,,72a804a0edd9e9327f95a631bca89af22b9057f1,bb68d29e5c8906080820ee7ad82ebb8bda27850c,f703dc629c41fd22c5d4f3d42ab064aed48460db,9401e6f72d5f18ca556639cb9133bbc6b7d3d32b,a7446d771cb294ce89d319c21e8f8cfbcaf2b6c6
target,Untitled1.ipynb.html,2025-06-10-illusion.ipynb.html,Untitled3.ipynb.html,Untitled.ipynb.html,2025-06-20-hocr.ipynb.html,Untitled4.ipynb.html,2025-01-14-graphql.ipynb.html,2025-03-01-colab.ipynb.html,2025-03-12-lunr.ipynb.html,Untitled2.ipynb.html,2025-06-09-illusion.ipynb.html,2025-01-27-ravelry.ipynb.html,2025-04-17-workflows.ipynb.html,2025-02-24-ravelry.ipynb.html,2025-06-23-hocr.ipynb.html,2025-03-27-thingiverse.ipynb.html,2025-02-23-thingiverse.ipynb.html,2025-01-10-a11y-metadata.ipynb.html,2025-01-24-research.ipynb.html,2025-03-27-github-graphql.ipynb.html
updated_at,,,,,,,1736884797,1740880313,1744920971,,,1738033058,1746036917,1744065812,,1744086998,1743101174,1736884797,1738000351,1744086624


In [9]:
%%
### a digression on the intermediate value of dataframes for static sites

before generating our html, i want to share how the current dataframe provides value as a search medium.
the predictable structure of the notebook allows us to unravel the dataframe into searchable units.

it is common to search for text in a document. `searchable` text is found in the `cells.source`, `outputs.data`, and `outputs.text`.

we need to prepare our data for search, we always do, and dataframes are a very natural interface for that.
this preparation mimics feature engineering in data science and machine learning applications.
_it is important to remember this is using an explicit API for the sake of demonstration. with aligned goals a lot of complexity can be hidden._
    
    cells = df.data.apply(Series)[["cells"]].stack().apply(Series).stack().apply(Series)
    outputs = cells[cells.outputs.fillna("").astype(bool)][["outputs"]].stack().apply(Series).stack().apply(Series)
    outputs = outputs.pop("data").apply(Series).stack().rename('data').reset_index(-1).rename(columns=dict(level_5="mimetype")).combine_first(outputs)
    outputs.loc[idx, "data"] = outputs[idx := outputs.mimetype.fillna("").str.startswith("text")].data.apply("".join)

    # pad the cells index 
    cells = cells.set_index(Index([None]*len(cells)), append=True).set_index(Index([None]*len(cells)), append=True) 
    searchable = pandas.concat([cells.source, outputs.data, outputs.text])
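
the unravelling is easier to see on a toy example. this sketch swaps the `stack`/`apply(Series)` chain for `explode`, which is an assumption about an equivalent shape rather than the code used above, and the two notebooks are hypothetical.

```python
import pandas as pd

# two tiny hypothetical notebooks as nbformat-like dictionaries
notebooks = pd.Series(
    {
        "a.ipynb": {"cells": [
            {"cell_type": "markdown", "source": "# intro"},
            {"cell_type": "code", "source": "df = DataFrame()", "outputs": []},
        ]},
        "b.ipynb": {"cells": [
            {"cell_type": "code", "source": "print(1)", "outputs": []},
        ]},
    },
    name="data",
).rename_axis("path")

# one row per cell: pull out each list of cells, then explode the lists
cells = notebooks.apply(lambda nb: nb["cells"]).explode()

# number the cells within each notebook so every unit has an address
cell_number = pd.Index(cells.groupby(level=0).cumcount(), name="cell_number")
cells = cells.to_frame("cell").set_index(cell_number, append=True)

# the searchable text of each cell, addressed by (path, cell_number)
sources = cells["cell"].apply(lambda cell: cell["source"]).rename("source")
```

every searchable unit keeps the path of the notebook it came from in its index, so a search hit can always be traced back to a document.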

#### the flexibility of searching dataframes

now we perform a brute force query on the data. this example searches for uses of "DataFrame".
this demonstrates the broad use of dataframes in the a11yhood project and a need to perform operations on them.

    q = "DataFrame"
    search_results = searchable[searchable.fillna("").str.contains(q).fillna(False)]

<details open><summary>naive search results</summary>
{% set repr = search_results.to_frame("results").style %}
{{repr.set_caption("cells containing the search term:" + q)._repr_html_()}}
</details>
<style>
#T_{{repr.uuid}} {height: 400px; display: block; overflow: auto;}
</style>
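
a small variation on the query, sketched on made up data: `str.contains` treats the query as a regular expression by default, so a literal, case-insensitive match needs `regex=False` and `case=False`. the `search` helper and the sample units are hypothetical.

```python
import pandas as pd

# hypothetical searchable units; missing text is common for cells without outputs
searchable_demo = pd.Series(
    ["df = DataFrame(rows)", "import pandas", None, "a.b literal dots"],
    index=pd.Index(["cell 0", "cell 1", "cell 2", "cell 3"], name="unit"),
)


def search(units: pd.Series, query: str) -> pd.Series:
    """return the units that contain query as a literal, case-insensitive substring."""
    mask = units.fillna("").str.contains(query, case=False, regex=False)
    return units[mask]


hits = search(searchable_demo, "dataframe")
```

treating the query as a literal avoids surprises when someone searches for text such as `df.T` or `a[0]`, where regex metacharacters would otherwise change the meaning.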

or we could use duckdb sql or other dataframe sql interfaces. this is the shape of the data structure that exports the lunr search index for a11yhood.
    https://github.com/a11yhood/research/blob/main/2025-03-12-lunr.ipynb
                                                                                                                                         
    def main():
further, an intermediate purpose of docs as dataframes is search from a command line interface.



path,,,,,results,,,,,
/Users/tonyfast/research/2025-01-10-a11y-metadata.ipynb,cells,1,,,"%% ## scraping in the document lets make a delicious warm bowl BeautifulSoup of with disability related noodles roasted over a warm cache. import requests; __import__(""requests_cache"").install_cache() soup = bs4.BeautifulSoup(requests.get(url := https://www.w3.org/community/reports/a11y-discov-vocab/CG-FINAL-vocabulary-20241209/ ).text) there is an implicit structure in the document we can use to scoop up the noodles we want; headings, the headings are noodles. df = Series(soup.select(""h1,h2,h3,h4,h5,h6"")).apply(lambda s: Series(dict(name=s.text, lev=int(s.name[1])))) now we get out as many spoons we have, yes even the stabby one which is better known as a shiv. anyway, we'll rearrange the noodles to spell out something useful.  the magician stumbles off stage mapping = df.groupby(df.lev.eq(2).cumsum().rename(""group"")).apply(  lambda df: df[filter] if (filter := df.lev.isin((2, 4, 5))).sum() > 1 else DataFrame() ).assign(itemprop=None) mapping.itemprop = mapping[mapping.lev.eq(2)].name.str.rpartition("" "")[0].str.rpartition("" "")[2] mapping.itemprop = mapping.itemprop.ffill() mapping = mapping[mapping.lev.isin((4,5))] mapping = mapping.groupby(""group"").apply(  lambda df: (  df[df.lev.ne(4)]  ) if df.lev.eq(5).sum() > 0 else df ).droplevel(level=0) mapping = mapping.assign(short=mapping.name.str.partition("" "")[2]) tada! a mapping! clap please. {{mapping.repr_html()}} and for my last trick, i give you an organized definition list of metadata terms that formally describe asistive technology affordances metadata = mapping.short.groupby(mapping.itemprop).agg(list).to_dict() accessibility discovery terms {% for k, v in metadata.items() %} {{k}} {% for i in v %}: {{i}} {% endfor %} {% endfor %}",,,,,
/Users/tonyfast/research/2025-01-14-graphql.ipynb,cells,4,,,"df = DataFrame(await search(""assistive technology""))",,,,,
/Users/tonyfast/research/2025-01-14-graphql.ipynb,cells,6,,,"df = DataFrame(await search(""screen reader""))",,,,,
/Users/tonyfast/research/2025-01-14-graphql.ipynb,cells,8,,,"df = DataFrame(await search(""assistive technology""))",,,,,
/Users/tonyfast/research/2025-01-27-ravelry.ipynb,cells,1,,,"%% ## gather the pattern attributes for search docs for the pattern attributes list. __import__(""dotenv"").load_dotenv() __import__(""requests_cache"").install_cache() import requests  attributes = requests.get( https://api.ravelry.com/pattern_attributes/groups.json  , auth=requests.auth.HTTPBasicAuth(os.environ[""RAVELRY_USERNAME""], os.environ[""RAVELRY_PASSWORD""]) ) df = (  df := pandas.DataFrame(pandas.Series(attributes.json()).attribute_groups) ).pop(""pattern_attributes"").explode().series().join(df, rsuffix=""group"").set_index(""id"") df.pop(""children""); full table of ravelry search attributes {{df.style.repr_html()}}",,,,,
/Users/tonyfast/research/2025-02-24-ravelry.ipynb,cells,2,,,"import requests_cache, platformdirs, os, pandas, urllib  from pandas import Index, Series, DataFrame  from toolz.curried import *  from pathlib import Path  __import__(""dotenv"").load_dotenv()  auth = (os.environ[""RAVELRY_USERNAME""], os.environ[""RAVELRY_PASSWORD""])  cache = platformdirs.user_cache_path(""a11yhood"") / ""ravelry""  cache = Path(""data/ravelry"")  CachedSession = partial(requests_cache.CachedSession, backend=""filesystem"", serializer=""json"")  search_cache = CachedSession(cache / ""search_responses"")  patterns_cache = CachedSession(cache / ""patterns_responses"")  searches: dict = {  ""adaptive"": {""pa"": ""adaptive""},  ""medical device access"": {""pa"": ""medical-device-access""},  ""medical device support"": {""pa"": ""medical-device-accessory""},  # ""mobility aid support"": {""add"": ""mobility-aid-support""},  # ""other"": {""add"": ""other-add-accessibility""},  ""therapy aid/toy"": {""pa"": ""therapy-aid""},  ""medical"": {""pc"": ""medical""}  }  seed_urls = (""https://api.ravelry.com/patterns/search.json?"" + Series(searches).apply(urllib.parse.urlencode))  first_pages = seed_urls.apply(compose(do(print), search_cache.get), auth=auth)",,,,,
/Users/tonyfast/research/2025-03-01-colab.ipynb,cells,7,,,"words = (  df.description.dropna()  # .str.lower()  .apply(nltk.tokenize.sent_tokenize)  .explode()  .dropna()  .apply(nltk.tokenize.word_tokenize)  .apply(compose(lambda x: pipe(x, map(list), partial(DataFrame, columns=""word pos"".split())), nltk.pos_tag))  # .str.lower()  ) words = pandas.concat(pipe(words.items(), dict)).reset_index(-1, drop=True) words[""word""] = words[""word""].str.lower()",,,,,
/Users/tonyfast/research/2025-03-27-github-graphql.ipynb,cells,2,,,"import os, pandas from toolz.curried import * from pandas import DataFrame, Series, Index __import__(""dotenv"").load_dotenv() client = __import__(""python_graphql_client"").GraphqlClient(  ""https://api.github.com/graphql"", dict(Authorization=F""token {os.environ['GITHUB_TOKEN']}"") )",,,,,
/Users/tonyfast/research/2025-03-27-github-graphql.ipynb,cells,7,,,"df = pandas.concat([  DataFrame(results := await search(""topic:assistive-technology"", os.environ.get(""PAGES"", 1))),  DataFrame(results := await search(""topic:screen-reader"", os.environ.get(""PAGES"", 1)))])",,,,,
/Users/tonyfast/research/2025-03-27-thingiverse.ipynb,cells,1,,,"import os, pandas, time, random, platformdirs, requests_cache  from toolz.curried import *  from pandas import DataFrame, Series, Index  from pathlib import Path  __import__(""dotenv"").load_dotenv()  params = dict(access_token=os.environ[""THINGIVERSE_ACCESS_TOKEN""])  cache = platformdirs.user_cache_path(""a11yhood"") / ""thingiverse""  cache = Path(""data"") / ""thingiverse""  CachedSession = partial(requests_cache.CachedSession, backend=""filesystem"", serializer=""json"")  search_session = CachedSession(cache / ""search_responses"")  thing_session = CachedSession(cache / ""thing_responses"")",,,,,
/Users/tonyfast/research/2025-04-17-workflows.ipynb,cells,5,,,"(  df := DataFrame(  (runs := requests_get(  GH + ""repos/a11yhood/research/actions/runs"", params=dict(status=""completed"", branch=""main"")  )).json())  .workflow_runs  .apply(Series)  .set_index(""id"") ) (df := df.assign(**df[df.columns[df.columns.str.endswith(""_at"")]].apply(to_datetime))).T",,,,,

group,,name,lev,itemprop,short
5,11,2.2.1 AndroidAccessibility,4.0,accessibilityAPI,AndroidAccessibility
5,12,2.2.2 ARIA (deprecated),4.0,accessibilityAPI,ARIA (deprecated)
5,13,2.2.3 ATK,4.0,accessibilityAPI,ATK
5,14,2.2.4 AT-SPI,4.0,accessibilityAPI,AT-SPI
5,15,2.2.5 BlackberryAccessibility (obsolete),4.0,accessibilityAPI,BlackberryAccessibility (obsolete)
...,...,...,...,...,...
10,119,7.3.7 textOnVisual,4.0,accessMode,textOnVisual
11,123,8.2.1 auditory,4.0,accessModeSufficient,auditory
11,124,8.2.2 tactile,4.0,accessModeSufficient,tactile
11,125,8.2.3 textual,4.0,accessModeSufficient,textual

id,description,name,sort_order,idgroup,namegroup,permalink
322,Design has features to support disability or medical needs,adaptive design,1.0,27,Accessibility,accessibility
323,Design provides for access to/from a medical device,medical device access,1.0,27,Accessibility,accessibility
324,Design is an accessory for a medical device,medical device support,1.0,27,Accessibility,accessibility
325,Design is an accessory for a mobility aid,mobility aid support,1.0,27,Accessibility,accessibility
327,,other,100.0,27,Accessibility,accessibility
326,Therapy aids and toys,therapy aid/toy,1.0,27,Accessibility,accessibility
307,,placeholder,,4,Age / Size / Fit,fit
309,a two-color ribbing in which the knits are worked in one color and the purls in a second color.,corrugated ribbing,,10,Colorwork,colorwork
183,a knitting and crochet technique that produces an hidden image that is only viewable from an angle.,illusion/shadow,,10,Colorwork,colorwork
184,colorwork technique used to create separate blocks of color and in which each area of color has its own separate length of yarn.,Intarsia,,10,Colorwork,colorwork


In [5]:
%%
## rendering html

this approach uses the notebook format as a specification for loading files as structured data.
from the `nbformat` we can produce files and archives in many formats. the singular target of html outputs
in static sites makes it hard to generate other formats. the <var>exporter</var> transforms notebook formats into full html pages,
including rendering markdown and templating documents with jinja. the dataframe approach integrates all of the traditional
static site generation tools into the dataframe data structure. it is viewed in the same medium as the target, unlike traditional site generation systems that operate with readline terminals.

    exporter = nbconvert.get_exporter("html")(embed_images=True)
    df = df.data.apply(compose_left(nbformat.from_dict, exporter.from_notebook_node, first)).to_frame("html").combine_first(df)
    df = df["html"].apply(bs4.BeautifulSoup, features="lxml").to_frame("bs4").combine_first(df)


now our expanded dataframe includes the content as html, and a beautiful soup object that provides post processing abilities.
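
for readers who have not used `nbconvert` from python, this is a minimal sketch of what a single row goes through. it assumes `nbformat` and `nbconvert` are installed, uses a one-cell notebook invented for the example, and leaves out the `embed_images` option and the dataframe plumbing.

```python
import nbformat
from nbconvert import HTMLExporter

# a hypothetical one-cell notebook built in memory
notebook = nbformat.v4.new_notebook(
    cells=[nbformat.v4.new_markdown_cell("# hello from a dataframe")]
)

exporter = HTMLExporter()

# from_notebook_node returns the rendered page and a dictionary of resources;
# the cell above keeps only the first item, the html string
body, resources = exporter.from_notebook_node(notebook)
```

because the exporter is an ordinary callable from notebook to string, it composes with `Series.apply` like any other function.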
       
{{df.T._repr_html_()}}



path,/Users/tonyfast/research/Untitled1.ipynb,/Users/tonyfast/research/2025-06-10-illusion.ipynb,/Users/tonyfast/research/Untitled3.ipynb,/Users/tonyfast/research/Untitled.ipynb,/Users/tonyfast/research/2025-06-20-hocr.ipynb,/Users/tonyfast/research/Untitled4.ipynb,/Users/tonyfast/research/2025-01-14-graphql.ipynb,/Users/tonyfast/research/2025-03-01-colab.ipynb,/Users/tonyfast/research/2025-03-12-lunr.ipynb,/Users/tonyfast/research/Untitled2.ipynb,/Users/tonyfast/research/2025-06-09-illusion.ipynb,/Users/tonyfast/research/2025-01-27-ravelry.ipynb,/Users/tonyfast/research/2025-04-17-workflows.ipynb,/Users/tonyfast/research/2025-02-24-ravelry.ipynb,/Users/tonyfast/research/2025-06-23-hocr.ipynb,/Users/tonyfast/research/2025-03-27-thingiverse.ipynb,/Users/tonyfast/research/2025-02-23-thingiverse.ipynb,/Users/tonyfast/research/2025-01-10-a11y-metadata.ipynb,/Users/tonyfast/research/2025-01-24-research.ipynb,/Users/tonyfast/research/2025-03-27-github-graphql.ipynb
bs4,"[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met..."
data,"{'cells': [{'cell_type': 'code', 'execution_co...","{'cells': [{'cell_type': 'markdown', 'id': '2f...",{'cells': [{'attachments': {'ff69e65f-8d73-44b...,"{'cells': [{'cell_type': 'markdown', 'id': 'e8...","{'cells': [{'cell_type': 'markdown', 'id': 'cd...","{'cells': [{'cell_type': 'markdown', 'id': 'cd...","{'cells': [{'cell_type': 'markdown', 'id': '10...","{'nbformat': 4, 'nbformat_minor': 0, 'metadata...","{'cells': [{'cell_type': 'markdown', 'id': 'bc...","{'cells': [{'cell_type': 'code', 'execution_co...",{'cells': [{'attachments': {'ff69e65f-8d73-44b...,"{'cells': [{'cell_type': 'markdown', 'id': '64...","{'cells': [{'cell_type': 'markdown', 'id': '48...","{'cells': [{'cell_type': 'markdown', 'id': '64...","{'cells': [{'cell_type': 'markdown', 'id': 'cd...","{'cells': [{'cell_type': 'markdown', 'id': '82...","{'cells': [{'cell_type': 'markdown', 'id': '82...","{'cells': [{'cell_type': 'markdown', 'id': 'b0...","{'cells': [{'cell_type': 'raw', 'id': '778adbd...","{'cells': [{'cell_type': 'markdown', 'id': '3c..."
hash,,,,,,,f703dc629c41fd22c5d4f3d42ab064aed48460db,72113ce4564958c1c3b3113395e04397d1b30804,939a0a496182546359476b58e2487a84f8b31b9f,,,7ca9b5be10312e0407bdadc8202bddc1af219d1a,932e2afc7a1a20a9c947571c20b64bb50fd58ccb,39bf2cc47742935daf63650790d678ad85f4ae34,,72a804a0edd9e9327f95a631bca89af22b9057f1,bb68d29e5c8906080820ee7ad82ebb8bda27850c,f703dc629c41fd22c5d4f3d42ab064aed48460db,9401e6f72d5f18ca556639cb9133bbc6b7d3d32b,a7446d771cb294ce89d319c21e8f8cfbcaf2b6c6
html,"<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me..."
target,Untitled1.ipynb.html,2025-06-10-illusion.ipynb.html,Untitled3.ipynb.html,Untitled.ipynb.html,2025-06-20-hocr.ipynb.html,Untitled4.ipynb.html,2025-01-14-graphql.ipynb.html,2025-03-01-colab.ipynb.html,2025-03-12-lunr.ipynb.html,Untitled2.ipynb.html,2025-06-09-illusion.ipynb.html,2025-01-27-ravelry.ipynb.html,2025-04-17-workflows.ipynb.html,2025-02-24-ravelry.ipynb.html,2025-06-23-hocr.ipynb.html,2025-03-27-thingiverse.ipynb.html,2025-02-23-thingiverse.ipynb.html,2025-01-10-a11y-metadata.ipynb.html,2025-01-24-research.ipynb.html,2025-03-27-github-graphql.ipynb.html
updated_at,,,,,,,1736884797,1740880313,1744920971,,,1738033058,1746036917,1744065812,,1744086998,1743101174,1736884797,1738000351,1744086624


In [6]:
%%
## indexes as joins

commonly, a static site generator will aggregate blog posts using a separate template.
the dataframe approach doesn't require a switch in interfaces. in dataframe parlance,
we are performing joins on dataframe elements.

we extract the titles from the rendered html content.

    df = df.bs4.apply(
        bs4.Tag.select_one, args=("h1,h2,h3,h4,h5,h6",)
    ).dropna().apply(bs4.Tag.get_text).to_frame("title").combine_first(df).combine_first(
        df.index.to_series().apply(compose_left(operator.attrgetter("stem"))).to_frame("title")
    )
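
the fallback in that expression is the join worth noticing: `combine_first` keeps a scraped title where one exists and fills the gaps from the file stem. a sketch on two hypothetical rows, using plain series instead of the full frame:

```python
from pathlib import Path

import pandas as pd

paths = pd.Index([Path("2025-03-12-lunr.ipynb"), Path("Untitled1.ipynb")], name="path")

# titles scraped from the html; the second notebook has no heading
scraped = pd.Series(["search with lunr.js", None], index=paths, name="title")

# a fallback title for every file, taken from the file name
fallback = paths.to_series().apply(lambda path: path.stem).rename("title")

# scraped values win; missing ones are filled from the fallback
titles = scraped.combine_first(fallback)
```

this is why notebooks without a heading show up as `Untitled1` and `Untitled2` in the title row further down.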

then we can take all of the titles and render them as html elements.

    index = tag.section(
        tag.h1("blog posts"),
        df.title.html.tag("a", href=df.target).html.tag("li").html.group("ol")
    )

<details><summary>sample <var>index</var></summary>
{{index}}
</details>
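
the `tag` helper and the `.html` accessor come from the author's own package, so here is a rough stand-in using only string formatting to show the markup being produced. the `link_list` function and the sample rows are hypothetical, and this is not how the `tag` helper is implemented.

```python
from html import escape

import pandas as pd

posts = pd.DataFrame(
    {
        "title": ["search with lunr.js", "ravelry api & friends"],
        "target": ["2025-03-12-lunr.ipynb.html", "2025-01-27-ravelry.ipynb.html"],
    }
)


def link_list(frame: pd.DataFrame) -> str:
    """render one <li><a> per row and wrap the items in an ordered list."""
    items = [
        f'<li><a href="{escape(target, quote=True)}">{escape(title)}</a></li>'
        for title, target in zip(frame["title"], frame["target"])
    ]
    return "<ol>" + "".join(items) + "</ol>"


listing = "<section><h1>blog posts</h1>" + link_list(posts) + "</section>"
```

escaping the titles matters once they are scraped from arbitrary notebooks, since a heading can contain `&` or `<`.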

now we can do a spot check of our <var>index</var> elements representation
before we aggregate it into an entire webpage.

    indexes = Series([nbformat.v4.new_notebook(cells=[
        nbformat.v4.new_markdown_cell(str(index))
    ])], Index([Path("index.html")], name="target")).to_frame("data")
    indexes = indexes.data.apply(compose_left(
        nbformat.from_dict, exporter.from_notebook_node, first
    )).to_frame("html").combine_first(indexes)

we can use a similar technique to cast the dataframe as an `atom.rss` or `feed.xml` file
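
as a sketch of what that could look like, the frame below is turned into a small atom document with the standard library. the site root `https://example.com/`, the helper names, and the choice of fields are assumptions for illustration; a real feed would also want author information and validation.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

import pandas as pd

ATOM = "http://www.w3.org/2005/Atom"
BASE = "https://example.com/"  # hypothetical site root, not from the original

posts = pd.DataFrame(
    {
        "title": ["search with lunr.js", "ravelry api"],
        "target": ["2025-03-12-lunr.ipynb.html", "2025-01-27-ravelry.ipynb.html"],
        "updated_at": [1744920971, 1738033058],
    }
)


def iso(seconds) -> str:
    """format epoch seconds as a timestamp in UTC."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc).isoformat()


def child(parent, name, text=None, **attrib):
    """append a namespaced atom element to parent and return it."""
    element = ET.SubElement(parent, f"{{{ATOM}}}{name}", attrib)
    element.text = text
    return element


def to_atom(frame: pd.DataFrame, title: str = "blog posts") -> str:
    """serialize one feed with one entry per row of the frame."""
    ET.register_namespace("", ATOM)
    feed = ET.Element(f"{{{ATOM}}}feed")
    child(feed, "title", title)
    child(feed, "id", BASE)
    child(feed, "updated", iso(frame["updated_at"].max()))
    for row in frame.itertuples(index=False):
        entry = child(feed, "entry")
        child(entry, "title", row.title)
        child(entry, "id", BASE + row.target)
        child(entry, "updated", iso(row.updated_at))
        child(entry, "link", href=BASE + row.target)
    return ET.tostring(feed, encoding="unicode")


atom = to_atom(posts)
```

the same `title`, `target`, and `updated_at` columns that fed the html index feed the xml, which is the point of keeping the site as a frame.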

In [7]:
%%
<details open><summary><h3>pagination as groupby</h3></summary>

commonly, static site generators will have pagination indexes that limit the perceivable items on a page.
this is a natural groupby action with a dataframe.

    grouped_indexes = df.groupby(RangeIndex(len(df))//5).apply(
        lambda df: Series([tag.section(
            tag.h1("blog posts"),
            df.title.html.tag("a", href=df.target).html.tag("li").html.group("ol")
        )], [F"index{df.name or ''}.html"])
    )
    
    grouped_indexes = grouped_indexes.apply(
        lambda index: nbformat.v4.new_notebook(cells=[
            nbformat.v4.new_markdown_cell(str(index))
        ])
    ).to_frame("data").reset_index(0, drop=True).rename_axis(index="target")
    
    grouped_indexes = grouped_indexes.data.apply(compose_left(
        nbformat.from_dict, exporter.from_notebook_node, first
    )).to_frame("html").combine_first(grouped_indexes)

{{grouped_indexes.T._repr_html_()}}

</details>
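
the grouping key is the whole trick, so here it is in isolation: integer division of the row position by the page size. the twelve posts and the `page_name` helper are made up for this sketch.

```python
import numpy as np
import pandas as pd

posts = pd.DataFrame({"title": [f"post {i}" for i in range(12)]})
page_size = 5

# rows 0-4 land on page 0, rows 5-9 on page 1, and so on
page_number = np.arange(len(posts)) // page_size


def page_name(number: int) -> str:
    """the first page is index.html; later pages carry their number."""
    return "index.html" if number == 0 else f"index{number}.html"


pages = {
    page_name(number): page["title"].tolist()
    for number, page in posts.groupby(page_number)
}
```

changing `page_size` is the only edit needed to repaginate the site.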

target,index.html,index1.html,index2.html,index3.html
data,"{'nbformat': 4, 'nbformat_minor': 5, 'metadata...","{'nbformat': 4, 'nbformat_minor': 5, 'metadata...","{'nbformat': 4, 'nbformat_minor': 5, 'metadata...","{'nbformat': 4, 'nbformat_minor': 5, 'metadata..."
html,"<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me..."


In [8]:
%%
## writing the files

writing the files flips our index from the source content to the target content

    targets = df.reset_index().set_index("target").combine_first(indexes)

{{targets.T._repr_html_()}}

the target frame provides all the information to write our static site to disc.

    for target, row in targets.iterrows():
        target = "site" / target
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(str(row.html))
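
the same loop, sketched against a temporary directory so it can be tried without touching a real `site` folder. the two-row frame and the `write_site` helper are hypothetical; the explicit `utf-8` encoding is an addition that the loop above does not make.

```python
import tempfile
from pathlib import Path

import pandas as pd

targets_demo = pd.DataFrame(
    {"html": ["<h1>home</h1>", "<h1>a post</h1>"]},
    index=pd.Index([Path("index.html"), Path("posts/a.ipynb.html")], name="target"),
)


def write_site(frame: pd.DataFrame, root: Path) -> list:
    """write each row's html under root and return the paths that were written."""
    written = []
    for target, html in frame["html"].items():
        path = root / target
        # nested targets need their parent folders created first
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(str(html), encoding="utf-8")
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    written = write_site(targets_demo, Path(tmp) / "site")
    relative = sorted(path.relative_to(tmp).as_posix() for path in written)
    home = (Path(tmp) / "site" / "index.html").read_text(encoding="utf-8")
```

returning the written paths gives a frame-friendly record of the build that can be compared against the target index.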

we wrote <data>{{len(targets)}}</data> files to disc, and the contents are shown in the list below


{{Series(Path("site").rglob("*.html")).to_frame("contents").T._repr_html_()}}

target,2025-01-10-a11y-metadata.ipynb.html,2025-01-14-graphql.ipynb.html,2025-01-24-research.ipynb.html,2025-01-27-ravelry.ipynb.html,2025-02-23-thingiverse.ipynb.html,2025-02-24-ravelry.ipynb.html,2025-03-01-colab.ipynb.html,2025-03-12-lunr.ipynb.html,2025-03-27-github-graphql.ipynb.html,2025-03-27-thingiverse.ipynb.html,...,2025-06-09-illusion.ipynb.html,2025-06-10-illusion.ipynb.html,2025-06-20-hocr.ipynb.html,2025-06-23-hocr.ipynb.html,Untitled.ipynb.html,Untitled1.ipynb.html,Untitled2.ipynb.html,Untitled3.ipynb.html,Untitled4.ipynb.html,index.html
bs4,"[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...",...,"[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...","[html, [\n, [<meta charset=""utf-8""/>, \n, <met...",
data,"{'cells': [{'cell_type': 'markdown', 'id': 'b0...","{'cells': [{'cell_type': 'markdown', 'id': '10...","{'cells': [{'cell_type': 'raw', 'id': '778adbd...","{'cells': [{'cell_type': 'markdown', 'id': '64...","{'cells': [{'cell_type': 'markdown', 'id': '82...","{'cells': [{'cell_type': 'markdown', 'id': '64...","{'nbformat': 4, 'nbformat_minor': 0, 'metadata...","{'cells': [{'cell_type': 'markdown', 'id': 'bc...","{'cells': [{'cell_type': 'markdown', 'id': '3c...","{'cells': [{'cell_type': 'markdown', 'id': '82...",...,{'cells': [{'attachments': {'ff69e65f-8d73-44b...,"{'cells': [{'cell_type': 'markdown', 'id': '2f...","{'cells': [{'cell_type': 'markdown', 'id': 'cd...","{'cells': [{'cell_type': 'markdown', 'id': 'cd...","{'cells': [{'cell_type': 'markdown', 'id': 'e8...","{'cells': [{'cell_type': 'code', 'execution_co...","{'cells': [{'cell_type': 'code', 'execution_co...",{'cells': [{'attachments': {'ff69e65f-8d73-44b...,"{'cells': [{'cell_type': 'markdown', 'id': 'cd...","{'nbformat': 4, 'nbformat_minor': 5, 'metadata..."
hash,f703dc629c41fd22c5d4f3d42ab064aed48460db,f703dc629c41fd22c5d4f3d42ab064aed48460db,9401e6f72d5f18ca556639cb9133bbc6b7d3d32b,7ca9b5be10312e0407bdadc8202bddc1af219d1a,bb68d29e5c8906080820ee7ad82ebb8bda27850c,39bf2cc47742935daf63650790d678ad85f4ae34,72113ce4564958c1c3b3113395e04397d1b30804,939a0a496182546359476b58e2487a84f8b31b9f,a7446d771cb294ce89d319c21e8f8cfbcaf2b6c6,72a804a0edd9e9327f95a631bca89af22b9057f1,...,,,,,,,,,,
html,"<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...",...,"<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me...","<!DOCTYPE html>\n\n<html lang=""en"">\n<head><me..."
path,/Users/tonyfast/research/2025-01-10-a11y-metad...,/Users/tonyfast/research/2025-01-14-graphql.ipynb,/Users/tonyfast/research/2025-01-24-research.i...,/Users/tonyfast/research/2025-01-27-ravelry.ipynb,/Users/tonyfast/research/2025-02-23-thingivers...,/Users/tonyfast/research/2025-02-24-ravelry.ipynb,/Users/tonyfast/research/2025-03-01-colab.ipynb,/Users/tonyfast/research/2025-03-12-lunr.ipynb,/Users/tonyfast/research/2025-03-27-github-gra...,/Users/tonyfast/research/2025-03-27-thingivers...,...,/Users/tonyfast/research/2025-06-09-illusion.i...,/Users/tonyfast/research/2025-06-10-illusion.i...,/Users/tonyfast/research/2025-06-20-hocr.ipynb,/Users/tonyfast/research/2025-06-23-hocr.ipynb,/Users/tonyfast/research/Untitled.ipynb,/Users/tonyfast/research/Untitled1.ipynb,/Users/tonyfast/research/Untitled2.ipynb,/Users/tonyfast/research/Untitled3.ipynb,/Users/tonyfast/research/Untitled4.ipynb,
title,summarizing accessibility properties for disco...,scraping github¶,the semantic web lives!¶,ravelry api¶,gathering thingiverse things¶,extracting accessibility related patterns from...,analyzing aggregated assistive technology data¶,search with lunr.js¶,gathering metadata on assistive technology¶,gathering thingiverse things¶,...,study in optical illusion¶,video circles¶,"extracting line, paragraph, and page level ocr...","extracting line, paragraph, and page level ocr...",scraping a11yhood submission from github issues¶,Untitled1,Untitled2,study in optical illusion¶,"extracting line, paragraph, and page level ocr...",
updated_at,1736884797,1736884797,1738000351,1738033058,1743101174,1744065812,1740880313,1744920971,1744086624,1744086998,...,,,,,,,,,,

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,29,30,31,32,33,34,35,36,37,38
contents,site/2025-01-14-graphql.ipynb.html,site/2025-03-12-lunr.ipynb.html,site/Untitled5.ipynb.html,site/2025-06-20-hocr.ipynb.html,site/index.html,site/2024-04-03-markdown-lists-to-python.ipynb...,site/2024-03-01-a11y-list-string.ipynb.ipynb.html,site/2024-03-15-screen-tests.ipynb.html,site/2025-02-23-thingiverse.ipynb.html,site/2024-07-03-axes.ipynb.html,...,site/Untitled1.ipynb.html,site/2025-01-24-research.ipynb.html,site/2025-01-29-ravelry.ipynb.html,site/2025-03-01-colab.ipynb.html,site/Untitled.ipynb.html,site/2025-04-17-workflows.ipynb.html,site/2025-01-27-ravelry.ipynb.html,site/Untitled10.ipynb.html,site/2025-02-03-github-graphql.ipynb.html,site/Untitled3.ipynb.html


## conclusion 

dataframes for documentation have natural interactive affordances that improve the flow of modifying static site content. the dataframe provides a consistent API across all considerations of the site, from the high-level macroscopic position of the site, to the canonical pages of documents, blog posts, and other media, all the way down to the nitty gritty units of content.