
Provide a way to populate field values from Markdown/MDX processing pipeline #216

Open
motleydev opened this issue May 10, 2022 · 12 comments


motleydev commented May 10, 2022

Feature:
I'd like to use an MDX plugin to create programmatic metadata (variable declarations) that can then be exposed in the final object contentlayer provides.

Use case:
An example of where this would be useful is generating a TOC, creating a list of unique keywords, etc.

Work around:
Use computed fields (which don't expose the raw AST)

Related issues: kentcdodds/mdx-bundler#169

Consider the following config structure:

export default makeSource({
  contentDirPath: "content",
  documentTypes: [Posts, Things, People],
  mdx: {
    rehypePlugins: [rehypeSlug, rehypeAutolinkHeadings, searchMeta],
  },
});

Where searchMeta looks at paragraph nodes of the hast tree, grabs a list of unique words, and adds them to the metadata as searchMeta.

A markdown file with the structure of:

---
title: Hello World
slug: hello-world
---
Hello World! Please say Hello!

Would generate a final object of:

{
"title": "Hello World",
"slug": "hello-world",
"searchMeta": ["hello", "world", "please", "say"],
"code": "().....",
"_raw": "..."
}

For the sake of complete, if ugly, code, here's a working example of the plugin that adds searchMeta to the data attribute of the vFile in the rehype plugin chain.

import { visit } from "unist-util-visit";

// Collects the unique words (longer than three characters) found in the
// document's paragraph nodes and stores them on vFile.data.searchMeta.
export default function searchMeta() {
  return (tree, file) => {
    const words = new Set();

    visit(tree, { tagName: "p" }, (node) => {
      for (const child of node.children) {
        if (typeof child.value !== "string") continue;
        child.value
          .split(" ")
          .filter((word) => !word.includes(":"))
          .map((word) => word.toLowerCase().replace(/[^a-z0-9]/gi, ""))
          .filter((word) => word.length > 3)
          .forEach((word) => words.add(word));
      }
    });

    // Assign once, after visiting every paragraph, so words collected from
    // earlier paragraphs aren't overwritten by later ones.
    file.data.searchMeta = [...words];
  };
}
@schickling schickling changed the title Creating programmatic meta via plugin Provide a way to populate field values from Markdown/MDX processing pipeline May 10, 2022
@schickling schickling added this to the Pre-1.0 milestone May 10, 2022
timlrx commented May 10, 2022

As a temporary workaround, one could consider defining a computedField that parses the raw output from contentlayer. Here's an example of extracting the table of contents of a markdown file and making it available as a toc property in contentlayer:

import { remark } from 'remark'

// Assume a remark plugin that stores the information in `vfile.data.toc`
export async function extractTocHeadings(markdown) {
  const vfile = await remark().use(remarkTocHeadings).process(markdown)
  return vfile.data.toc
}

const computedFields: ComputedFields = {
  toc: { type: 'string', resolve: (doc) => extractTocHeadings(doc.body.raw) },
  ...
}
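
For reference, here is a minimal sketch of what such a remarkTocHeadings plugin might look like; the github-slugger and mdast-util-to-string dependencies and the exact shape of the toc entries are assumptions for illustration, not part of the comment above.

import { visit } from 'unist-util-visit'
import { toString } from 'mdast-util-to-string'
import GithubSlugger from 'github-slugger'

export function remarkTocHeadings() {
  const slugger = new GithubSlugger()
  return (tree, file) => {
    // Collect every heading as { value, url, depth } onto vfile.data.toc
    file.data.toc = []
    visit(tree, 'heading', (node) => {
      const text = toString(node)
      file.data.toc.push({
        value: text,
        url: '#' + slugger.slug(text),
        depth: node.depth,
      })
    })
  }
}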

@schickling
Collaborator

@motleydev would having access to the vFile.data property from within computedFields be a good solution to your described problem?

Something along these lines:

const computedFields: ComputedFields = {
  toc: { type: 'string', resolve: (_doc, { vfile }) => vfile.data.toc },
  ...
}

@motleydev
Author

When do computed fields get executed? At run time or at build time? At the end of the day, what I'm trying to get is the data added to the static output.

@schickling
Collaborator

computedFields are executed together with all other fields and are therefore part of your static output. (I just opened a docs issue to clarify this.)

@motleydev
Author

In that case, that would probably work just fine! It would still be nice to do the work during the original transform so each file doesn't need to be revisited, but for a static output process that's probably shaving the yak a bit too close.

@motleydev
Author

The more I think about it, accessing vfile.data from computed fields would totally solve my use case. It'd still be nice to be able to do all the work "in" the handler, but being able to do visit work during the initial parsing and then pass it along with the payload would be more than sufficient. What do you think a reasonable timeline for that would be?

@essential-randomness

Any update on this? I'd be willing to try my hand at a PR to pass vfile as an additional argument to resolve in computedFields. I need the same thing!

essential-randomness commented Nov 14, 2022

I've spent the evening trying to work on a solution myself (for MDX files), and reached the same conclusions as @stefanprobst in #236 (comment). To summarize: there is no way to access vfile.data when using mdx-bundler or @mdx-js/esbuild, and the best way to surface that data back is as named exports, as done here.

At this point, I think the way to resolve this would be:

  1. Create utilities (or document a way) to map vfile.data fields to named exports (see the sketch at the end of this comment).
  2. Support surfacing MDX exports as document fields (Consider supporting MDX exports #64).

I'm still willing to try and help further progress on this issue. Currently carrying around a lot of hacks in my code ;)
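
As an illustrative sketch (not from the original comment), the mapping in (1) could look roughly like this, assuming the estree-util-value-to-estree package and a pipeline stage that understands mdxjsEsm nodes; it prepends export const searchMeta = [...] to the compiled MDX module from whatever an earlier plugin stored in vfile.data:

import { valueToEstree } from 'estree-util-value-to-estree'

// Runs after the plugin that populates `file.data.searchMeta` and turns that
// value into a named export of the compiled MDX module.
export default function exportSearchMeta() {
  return (tree, file) => {
    tree.children.unshift({
      type: 'mdxjsEsm',
      value: '',
      data: {
        estree: {
          type: 'Program',
          sourceType: 'module',
          body: [
            {
              type: 'ExportNamedDeclaration',
              specifiers: [],
              declaration: {
                type: 'VariableDeclaration',
                kind: 'const',
                declarations: [
                  {
                    type: 'VariableDeclarator',
                    id: { type: 'Identifier', name: 'searchMeta' },
                    init: valueToEstree(file.data.searchMeta ?? []),
                  },
                ],
              },
            },
          ],
        },
      },
    })
  }
}

This is the same mechanism remark-mdx-frontmatter uses to expose frontmatter as named exports.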

@schickling
Collaborator

Thanks for your comment @essential-randomness. Very helpful. I hope I'll get some capacity soon to take a stab at this!

@donaldxdonald

Need this~

@cpatti97100

I tried the code above to no avail... did anyone manage to read the MDX content and add data to the frontmatter using a custom remark plugin in this context? Thanks!

cpatti97100 commented Feb 2, 2024

Hope it helps someone; in the end I managed it like this:

import { visit } from 'unist-util-visit'
import { remark } from 'remark'
import remarkMdx from 'remark-mdx'
import { defineDocumentType } from 'contentlayer/source-files'

// this is a bit too custom maybe but you get the idea:
// collect h2/h3 JSX elements into a flat list of headings.
// `buildTreeFromHeadings` and the `Toc` nested type are defined elsewhere.
function extractHtmlHeadings(tree) {
  const headings = []

  visit(
    tree,
    (node) =>
      ['mdxJsxFlowElement', 'mdxJsxTextElement'].includes(node.type) &&
      node.name?.match(/h[2-3]/g),
    (node) => {
      if (node.type === 'mdxJsxFlowElement') {
        headings.push({
          id: node.attributes[0].value,
          text: node.children[0].children[0].value,
          type: node.name === 'h2' ? 'heading2' : 'heading3',
        })

        return
      }

      headings.push({
        id: node.attributes[0].value,
        text: node.children[0].value,
        type: node.name === 'h2' ? 'heading2' : 'heading3',
      })
    }
  )

  return buildTreeFromHeadings(headings)
}

export const InstructionsForUse = defineDocumentType(() => ({
  // name, filePathPattern, fields, etc. omitted for brevity
  contentType: 'mdx',
  computedFields: {
    toc: {
      type: 'nested',
      of: Toc,
      resolve(doc) {
        // Re-parse the raw MDX body and pull the headings off vFile.data
        return remark()
          .use(remarkMdx)
          .use(function searchMeta() {
            return function transformer(tree, file) {
              const headings = extractHtmlHeadings(tree)

              file.data = headings
            }
          })
          .process(doc.body.raw)
          .then((vFile) => {
            return vFile.data
          })
      },
    },
  },
}))
