
Provide a way to populate field values from Markdown/MDX processing pipeline #216

Open
motleydev opened this issue May 10, 2022 · 12 comments


motleydev commented May 10, 2022

Feature:
I'd like to use an MDX plugin to create programmatic metadata (variable declarations) that can then be exposed in the final object contentlayer provides.

Use case:
An example of where this would be useful is generating a TOC, creating a list of unique keywords, etc.

Work around:
Use computed fields (which don't expose the raw AST)

Related issues: kentcdodds/mdx-bundler#169

Consider the following config structure:

export default makeSource({
  contentDirPath: "content",
  documentTypes: [Posts, Things, People],
  mdx: {
    rehypePlugins: [rehypeSlug, rehypeAutolinkHeadings, searchMeta],
  },
});

Where searchMeta looks at paragraph nodes of the hast tree, grabs a list of unique words, and adds them to the metadata as searchMeta.

A markdown file with the structure of:

---
title: Hello World
slug: hello-world
---
Hello World! Please say Hello!

Would generate a final object of:

{
"title": "Hello World",
"slug": "hello-world",
"searchMeta": ["hello", "world", "please", "say"],
"code": "().....",
"_raw": "..."
}

For the sake of complete, if ugly, code, here's a working example of the plugin that adds searchMeta to the data attribute of the vFile in the rehype plugin chain.

import { visit } from "unist-util-visit";

// Collects the unique words (longer than three characters) found in the
// document's paragraph nodes and stores them on vFile.data.searchMeta.
export default function searchMeta() {
  return (tree, file) => {
    const words = new Set();

    visit(tree, { tagName: "p" }, (node) => {
      for (const child of node.children) {
        if (typeof child.value !== "string") continue;
        child.value
          .split(" ")
          .filter((word) => !word.includes(":"))
          .map((word) => word.toLowerCase().replace(/[^a-z0-9]/gi, ""))
          .filter((word) => word.length > 3)
          .forEach((word) => words.add(word));
      }
    });

    // Assign once, after visiting every paragraph, so words collected from
    // earlier paragraphs aren't overwritten by later ones.
    file.data.searchMeta = [...words];
  };
}
@schickling schickling changed the title Creating programmatic meta via plugin Provide a way to populate field values from Markdown/MDX processing pipeline May 10, 2022
@schickling schickling added this to the Pre-1.0 milestone May 10, 2022
timlrx commented May 10, 2022

As a temporary workaround, one could consider defining a computedField that parses the raw output from contentlayer. Here's an example of extracting the table of contents of a markdown file and making it available as a toc property in contentlayer:

import { remark } from 'remark'

// Assume a remark plugin that stores the information in `vfile.data.toc`
export async function extractTocHeadings(markdown) {
  const vfile = await remark().use(remarkTocHeadings).process(markdown)
  return vfile.data.toc
}

const computedFields: ComputedFields = {
  toc: { type: 'string', resolve: (doc) => extractTocHeadings(doc.body.raw) },
  ...
}
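
For reference, here is a minimal sketch of what such a remarkTocHeadings plugin might look like; the github-slugger and mdast-util-to-string dependencies and the exact shape of the toc entries are assumptions for illustration, not part of the comment above.

import { visit } from 'unist-util-visit'
import { toString } from 'mdast-util-to-string'
import GithubSlugger from 'github-slugger'

export function remarkTocHeadings() {
  const slugger = new GithubSlugger()
  return (tree, file) => {
    // Collect every heading as { value, url, depth } onto vfile.data.toc
    file.data.toc = []
    visit(tree, 'heading', (node) => {
      const text = toString(node)
      file.data.toc.push({
        value: text,
        url: '#' + slugger.slug(text),
        depth: node.depth,
      })
    })
  }
}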

@schickling
Collaborator

@motleydev would having access to the vFile.data property from within computedFields be a good solution to your described problem?

Something along these lines:

const computedFields: ComputedFields = {
  toc: { type: 'string', resolve: (_doc, { vfile }) => vfile.data.toc },
  ...
}

@motleydev
Author

When do computed fields get executed? At run time or at build time? At the end of the day, what I'm trying to get is the data added to the static output.

@schickling
Collaborator

computedFields are executed together with all other fields and are therefore part of your static output. (I just opened a docs issue to clarify this.)

@motleydev
Author

In that case, that would probably work just fine! It would still be nice to do the work during the original transform so each file doesn't need to be revisited, but for a static output process that's probably shaving the yak a bit too close.

@motleydev
Author

The more I think about it, accessing vfile.data from computed fields would totally solve my use case. It'd still be nice to be able to do all the work "in" the handler, but being able to do visit work during the initial parsing and then pass it along with the payload would be more than sufficient. What do you think a reasonable timeline for that would be?

@essential-randomness

Any update on this? I'd be willing to try my hand at a PR to pass vfile as an additional argument to resolve in computedFields. I need the same thing!

essential-randomness commented Nov 14, 2022

I've spent the evening trying to work on a solution myself (for MDX files), and reached the same conclusions as @stefanprobst in #236 (comment). To summarize: there is no way to access vfile.data when using mdx-bundler or @mdx-js/esbuild, and the best way to surface that data back is as named exports, as done here.

At this point, I think the way to resolve this would be:

  1. Create utilities (or document a way) to map vfile.data fields to named exports (see the sketch at the end of this comment).
  2. Support surfacing MDX exports as document fields (Consider supporting MDX exports #64).

I'm still willing to try and help further progress on this issue. Currently carrying around a lot of hacks in my code ;)
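
As an illustrative sketch (not from the original comment), the mapping in (1) could look roughly like this, assuming the estree-util-value-to-estree package and a pipeline stage that understands mdxjsEsm nodes; it prepends export const searchMeta = [...] to the compiled MDX module from whatever an earlier plugin stored in vfile.data:

import { valueToEstree } from 'estree-util-value-to-estree'

// Runs after the plugin that populates `file.data.searchMeta` and turns that
// value into a named export of the compiled MDX module.
export default function exportSearchMeta() {
  return (tree, file) => {
    tree.children.unshift({
      type: 'mdxjsEsm',
      value: '',
      data: {
        estree: {
          type: 'Program',
          sourceType: 'module',
          body: [
            {
              type: 'ExportNamedDeclaration',
              specifiers: [],
              declaration: {
                type: 'VariableDeclaration',
                kind: 'const',
                declarations: [
                  {
                    type: 'VariableDeclarator',
                    id: { type: 'Identifier', name: 'searchMeta' },
                    init: valueToEstree(file.data.searchMeta ?? []),
                  },
                ],
              },
            },
          ],
        },
      },
    })
  }
}

This is the same mechanism remark-mdx-frontmatter uses to expose frontmatter as named exports.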

@schickling
Collaborator

Thanks for your comment @essential-randomness. Very helpful. I hope I'll get some capacity soon to take a stab at this!

@donaldxdonald

Need this~

@cpatti97100

I tried the code above to no avail... did anyone manage to read the MDX content and add data to the frontmatter using a custom remark plugin in this context? Thanks!

cpatti97100 commented Feb 2, 2024

Hope it helps someone; in the end I managed it like this:

import { visit } from 'unist-util-visit'
import { remark } from 'remark'
import remarkMdx from 'remark-mdx'
import { defineDocumentType } from 'contentlayer/source-files'

// this is a bit too custom maybe but you get the idea:
// collect h2/h3 JSX elements into a flat list of headings.
// `buildTreeFromHeadings` and the `Toc` nested type are defined elsewhere.
function extractHtmlHeadings(tree) {
  const headings = []

  visit(
    tree,
    (node) =>
      ['mdxJsxFlowElement', 'mdxJsxTextElement'].includes(node.type) &&
      node.name?.match(/h[2-3]/g),
    (node) => {
      if (node.type === 'mdxJsxFlowElement') {
        headings.push({
          id: node.attributes[0].value,
          text: node.children[0].children[0].value,
          type: node.name === 'h2' ? 'heading2' : 'heading3',
        })

        return
      }

      headings.push({
        id: node.attributes[0].value,
        text: node.children[0].value,
        type: node.name === 'h2' ? 'heading2' : 'heading3',
      })
    }
  )

  return buildTreeFromHeadings(headings)
}

export const InstructionsForUse = defineDocumentType(() => ({
  // name, filePathPattern, fields, etc. omitted for brevity
  contentType: 'mdx',
  computedFields: {
    toc: {
      type: 'nested',
      of: Toc,
      resolve(doc) {
        // Re-parse the raw MDX body and pull the headings off vFile.data
        return remark()
          .use(remarkMdx)
          .use(function searchMeta() {
            return function transformer(tree, file) {
              const headings = extractHtmlHeadings(tree)

              file.data = headings
            }
          })
          .process(doc.body.raw)
          .then((vFile) => {
            return vFile.data
          })
      },
    },
  },
}))
