# Semantic Categories

Split the selected text into segments on blank lines, pass the list to the LLM as a JSON array, and ask it to return a label and a description for each segment.

In [None]:
import { SystemMessage, HumanMessage } from "@langchain/core/messages";
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";
import { JsonOutputParser } from "@langchain/core/output_parsers";
import { readFileSync } from 'node:fs';

import { EXPERIMENTS_DIR, SERVER_DATA_DIR } from '../server/src/util/fileUtils.ts';
import { getNotebookLogger } from '../server/src/Logger.ts';
import { newModel } from '../server/src/agents/agent.ts';

const prompt = ChatPromptTemplate.fromMessages([ new MessagesPlaceholder("messages") ]);
const llm = newModel("Anthropic");
const parser = new JsonOutputParser();
const chain = prompt.pipe(llm).pipe(parser);

const logger = getNotebookLogger();
const lhsText = readFileSync(`${SERVER_DATA_DIR}/SHA-1-md/selected-text.txt`, 'utf-8');
const PROMPT = readFileSync(`${EXPERIMENTS_DIR}/annotateNodePromptCategories6.txt`, 'utf-8');

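// Split on blank lines so that each paragraph, header, or formula block becomes one segment.
const segments = lhsText.split('\n\n');
// Serialize the segments as a JSON array; this string is sent as the human message.
var userInput = JSON.stringify(segments, null, 2);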
var output = await chain.invoke({ messages: [
  new SystemMessage(PROMPT),
  new HumanMessage(userInput)
]});
logger.info(output);

[
  {
    description: "Describes the SHIFTROWS transformation which moves bytes in each row by r positions to the left",
    text: "SHIFTROWS() is illustrated in Figure 3. In that representation of the state, the effect is to move each",
    label: "Definitions"
  },
  {
    description: "Reference to Figure 3 showing the SHIFTROWS operation",
    text: "[Figure 3. Illustration of SHIFTROWS()]",
    label: "Examples"
  },
  {
    description: "Section header for the MixColumns transformation",
    text: "### 5.1.3 MixColumns()",
    label: "Header"
  },
  {
    description: "Definition of the MixColumns transformation that multiplies each column of the state by a fixed matrix",
    text: "MixColumns() is a transformation of the state that multiplies each of the four columns of the state by",
    label: "Definitions"
  },
  {
    description: "Mathematical formula showing the matrix multiplication in MixColumns transformation",
    text: "\\left[\\begin{array}{l}\n" +
      "s_{0, c}^{

In [29]:
console.log(JSON.stringify(output, null, 2));

[
  {
    "description": "Section header for the preprocessing section",
    "text": "# 5. PREPROCESSING",
    "label": "Header"
  },
  {
    "description": "Overview of the three preprocessing steps for the hash algorithm",
    "text": "Preprocessing consists of three steps: padding the message, $M$ (Sec. 5.1), parsing the message into ",
    "label": "Definitions"
  },
  {
    "description": "Subsection header for padding the message",
    "text": "## 5.1 Padding the Message",
    "label": "Header"
  },
  {
    "description": "Explanation of the purpose of padding in the hash algorithm",
    "text": "The purpose of this padding is to ensure that the padded message is a multiple of 512 or 1024 bits, d",
    "label": "Intent"
  },
  {
    "description": "Subsubsection header for specific hash algorithms",
    "text": "### 5.1.1 SHA-1, SHA-224 and SHA-256",
    "label": "Header"
  },
  {
    "description": "Detailed specification of the padding algorithm for SHA-1, SHA-224, and SHA-256"
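
`JsonOutputParser` returns untyped JSON, so later cells that read `output[i].text` rely on the model having followed the prompt. The sketch below checks the shape first. The `Annotation` interface and both function names are hypothetical; the three fields are inferred only from the objects printed above, not from the prompt file.

```typescript
// Hypothetical shape of one result item, inferred from the printed output.
interface Annotation {
  description: string;
  text: string;
  label: string;
}

// Type guard: true only when all three fields are present and are strings.
function isAnnotation(value: unknown): value is Annotation {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.description === "string"
    && typeof v.text === "string"
    && typeof v.label === "string";
}

// Throws a descriptive error instead of failing later on an undefined field.
function parseAnnotations(raw: unknown): Annotation[] {
  if (!Array.isArray(raw)) {
    throw new Error("Expected the LLM output to be a JSON array");
  }
  const bad = raw.findIndex((item) => !isAnnotation(item));
  if (bad !== -1) {
    throw new Error(`Item ${bad} is missing description, text, or label`);
  }
  return raw as Annotation[];
}
```

Calling `parseAnnotations(output)` right after `chain.invoke` would surface a malformed response at the point where it is produced rather than in a later cell.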

In [None]:
console.log("SEGMENTS LENGTH", segments.length);
console.log("LLM OUTPUT LENGTH", output.length);
for (let i = 0; i < segments.length; i++) {
  const outtext = output[i].text;
  const outlength = outtext.length;
  const segtext = segments[i].substring(0, outlength);
  if (segtext !== outtext) { 
    console.log(`${i} Mismatch`); 
    console.log("SEGMENT");
    console.log(segtext);
    console.log("LLM OUTPUT");
    console.log(outtext);
  } else { 
    console.log(`${i} Match`); 
  }
}


SEGMENTS LENGTH 32
LLM OUTPUT LENGTH 31
0 Match
1 Match
2 Match
3 Match
4 Mismatch
SEGMENT
Thus,
$$
\left[\begin{array}{l}
s_{0, c}^{\prime} \\
s_{1, c}^{\prime} \\
s_{2, c}^{\prime} \\
s_{3, c}^{\prime}
\end{array}\right]=\left[\begin{array}{llll}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{array}\right]\left[\begin{array}{l}
s_{0, c} \\
s_{1, c} \\
s_{2, c} \\
s_{3, c}
\end{array}\right] \quad \text{for } 0
LLM OUTPUT
\left[\begin{array}{l}
s_{0, c}^{\prime} \\
s_{1, c}^{\prime} \\
s_{2, c}^{\prime} \\
s_{3, c}^{\prime}
\end{array}\right]=\left[\begin{array}{llll}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{array}\right]\left[\begin{array}{l}
s_{0, c} \\
s_{1, c} \\
s_{2, c} \\
s_{3, c}
\end{array}\right] \quad \text{for } 0 \leq c<4
5 Match
6 Match
7 Match
8 Match
9 Match
10 Match
11 Match
12 Match
13 Match
14 Match
15 Match
16 Mismatch
SEGMENT
INVCIPHER() is described in the pseudocode in Alg. 3, wh

TypeError: Cannot read properties of undefined (reading 'text')
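
The `TypeError` follows from the lengths printed at the top of the output: the loop runs over all 32 segments, but the LLM returned only 31 items, so `output[31]` is `undefined`. The mismatch at index 4 also shows that the model can drop leading text (here `Thus,` and the opening `$$`), so a strict prefix comparison is too brittle. Below is a minimal sketch of a more tolerant comparison, assuming the output items keep the same order as the segments. `Annotation`, `Alignment`, `normalize`, and `alignSegments` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape of one result item, inferred from the printed output.
interface Annotation {
  description: string;
  text: string;
  label: string;
}

type Alignment =
  | { segmentIndex: number; status: "match" | "contained"; outputIndex: number }
  | { segmentIndex: number; status: "missing" };

// Collapse whitespace so that line-wrapping differences do not count as mismatches.
const normalize = (s: string): string => s.replace(/\s+/g, " ").trim();

// Walk the segments in order and pair each one with the next unconsumed output item.
// A segment with no matching item is reported as "missing" and the output cursor
// stays put, so one dropped item does not shift every later comparison.
function alignSegments(segments: string[], output: Annotation[]): Alignment[] {
  const results: Alignment[] = [];
  let next = 0; // index of the next unconsumed output item
  for (let i = 0; i < segments.length; i++) {
    const seg = normalize(segments[i]);
    const candidate = next < output.length ? normalize(output[next].text) : "";
    if (candidate.length === 0) {
      // Output is exhausted (or the item has empty text): nothing to compare.
      results.push({ segmentIndex: i, status: "missing" });
    } else if (seg.startsWith(candidate)) {
      results.push({ segmentIndex: i, status: "match", outputIndex: next++ });
    } else if (seg.includes(candidate)) {
      // The model kept a slice of the segment but dropped its leading text.
      results.push({ segmentIndex: i, status: "contained", outputIndex: next++ });
    } else {
      results.push({ segmentIndex: i, status: "missing" });
    }
  }
  return results;
}
```

Filtering the result of `alignSegments(segments, output)` for `status === "missing"` lists the segments the model skipped. One limitation of this greedy approach: if the model rewrites an item so that it matches no segment at all, the cursor never advances and every later segment is reported as missing.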