[epic] MarkdownDB plugin system #2

Open
5 tasks
rufuspollock opened this issue Apr 28, 2023 · 3 comments
Labels: enhancement (New feature or request), epic

Comments

@rufuspollock
Member

rufuspollock commented Apr 28, 2023

We want a plugin system in MarkdownDB so that people can easily extend the core functionality, for example to extract additional metadata. That way not all functionality has to live in core, and people can add functionality rapidly.

Sketch (April 2023)

https://link.excalidraw.com/l/9u8crB2ZmUo/9hkrQmVl9QX


Acceptance

  • Identify the different types of plugins ✅2023-11-19 roughly: parsing, computing, validating (and maybe serializing ...)
  • Research how remark works to see if we can reuse it 🚧2023-11-19 see notes in comment below
  • Design of MarkdownDB and especially the plugin system.
    • extract first heading as title metadata
    • add a metadata field
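The "extract first heading as title" case can be sketched without any dependencies. This is only an illustration under stated assumptions: the plugin shape `(tree, metadata) => metadata` and the names `extractTitle`, `findFirstHeading` and `toText` are hypothetical, not an existing MarkdownDB or remark API. The tree is hand-built in the shape remark produces (mdast `heading`, `text` and `emphasis` nodes).

```javascript
// Hypothetical plugin shape: takes an mdast-like tree plus the metadata
// collected so far, and returns the (possibly extended) metadata.

// Collect the plain text of a node by concatenating its `value`s.
function toText(node) {
  if (typeof node.value === "string") return node.value;
  return (node.children || []).map(toText).join("");
}

// Depth-first search for the first heading node, at any depth.
function findFirstHeading(node) {
  if (node.type === "heading") return node;
  for (const child of node.children || []) {
    const found = findFirstHeading(child);
    if (found) return found;
  }
  return null;
}

// Use the first heading as `title`, unless a title is already set
// (for example by frontmatter).
function extractTitle(tree, metadata) {
  if (metadata.title) return metadata;
  const heading = findFirstHeading(tree);
  return heading ? { ...metadata, title: toText(heading) } : metadata;
}

// Hand-built tree equivalent to "# Hello *world*\n\nSome text."
const tree = {
  type: "root",
  children: [
    {
      type: "heading",
      depth: 1,
      children: [
        { type: "text", value: "Hello " },
        { type: "emphasis", children: [{ type: "text", value: "world" }] },
      ],
    },
    { type: "paragraph", children: [{ type: "text", value: "Some text." }] },
  ],
};

console.log(extractTitle(tree, { tags: ["demo"] }));
// { tags: [ 'demo' ], title: 'Hello world' }
```

Returning a new metadata object rather than mutating the tree keeps the plugin purely extractive, which also covers the "add a metadata field" case.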

Notes

MarkdownDB vs Contentlayer

Contentlayer supported:

  • document types with
    • frontmatter schema definition and validation
    • assigning document types based on glob patterns
    • computed fields, e.g. description auto-extracted from the document content
  • excluding/including some content folders (we kind of already have this, but it's not configurable)
  • ...

What we need:

  • probably a config file similar to Contentlayer's, with:
    • custom document types,
    • content include/exclude option
    • plugins
    • ...
  • ...
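To make the list above concrete, here is a rough sketch of what such a config could look like and how document types might be assigned from glob patterns. Everything in it is an assumption for discussion: the keys `include`, `exclude`, `documentTypes`, `required` and `plugins`, and the helpers `globToRegExp`, `resolveDocumentType` and `missingFields`, are hypothetical and not part of MarkdownDB or Contentlayer. The glob support is deliberately minimal (`**/`, `**` and `*` only).

```javascript
// Hypothetical config shape; none of these keys is an existing MarkdownDB API.
const config = {
  include: ["content/**"],
  exclude: ["content/drafts/**"],
  documentTypes: [
    { name: "Post", pattern: "content/blog/**/*.md", required: ["title", "date"] },
    { name: "Page", pattern: "content/**/*.md", required: ["title"] },
  ],
  plugins: [], // plugin functions would be listed here
};

// Minimal glob support: `**/` (any number of directories),
// `**` (anything) and `*` (anything except "/").
function globToRegExp(glob) {
  const source = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except *
    .replace(/\*\*\/|\*\*|\*/g, (m) =>
      m === "**/" ? "(?:.*/)?" : m === "**" ? ".*" : "[^/]*"
    );
  return new RegExp(`^${source}$`);
}

const matchesAny = (path, globs) =>
  globs.some((glob) => globToRegExp(glob).test(path));

// Returns the name of the first matching document type, or null if the
// file is not included, is excluded, or matches no type.
function resolveDocumentType(path, cfg) {
  if (!matchesAny(path, cfg.include) || matchesAny(path, cfg.exclude)) return null;
  const type = cfg.documentTypes.find((t) => globToRegExp(t.pattern).test(path));
  return type ? type.name : null;
}

// Lists the required frontmatter fields that are missing for a file.
function missingFields(path, frontmatter, cfg) {
  const name = resolveDocumentType(path, cfg);
  const type = cfg.documentTypes.find((t) => t.name === name);
  return type ? type.required.filter((field) => !(field in frontmatter)) : [];
}

console.log(resolveDocumentType("content/blog/2023/hello.md", config)); // Post
console.log(resolveDocumentType("content/about.md", config)); // Page
console.log(resolveDocumentType("content/drafts/wip.md", config)); // null
console.log(missingFields("content/blog/hello.md", { title: "Hi" }, config)); // [ 'date' ]
```

Document types are checked in order, so the more specific `Post` pattern is listed before the catch-all `Page` pattern.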
@olayway added the enhancement label May 9, 2023
@rufuspollock changed the title from "MarkdownDB plugin system" to "[epic] MarkdownDB plugin system" Sep 24, 2023
@rufuspollock
Member Author

rufuspollock commented Nov 19, 2023

Doing a bunch of research on remark and micromark re the parsing part of this - could remark be our plugin system here? (probably)

  • Should we just build on top of the remark ecosystem i.e. use remark plugins for doing the parsing? ✅2023-11-19 my sense is yes
    • Should we use remark plugins or micromark (what's the difference, even)? 🚧2023-11-19 still confused on this one (as others are) but my sense is we just use remark and its plugins
  • How do you create a plugin? 🚧2023-11-19 see https://github.com/remarkjs/remark/blob/main/doc/plugins.md and its guide
    • How do you pass data around? see notes below (no answer yet!) 🚧2023-11-19 there is something called messages ...
  • What remark plugins could we learn from?

Can you pass "data" along the chain of a plugin

This example from remarkjs/remark#251 talks about word counts, but it only logs the info to the console ...

var unified = require('unified');
var parse = require('remark-parse');
var stringify = require('remark-stringify');
var english = require('retext-english');
var remark2retext = require('remark-retext');
var visit = require('unist-util-visit');

unified()
  .use(parse)
  .use(remark2retext, unified().use(english).use(count))
  .use(stringify)
  .processSync('*This* and _that_. \n> And some more stuff.\n\nAnd another thing.');

function count() {
  return counter;
  function counter(tree) {
    var counts = {};
    visit(tree, visitor);
    console.log(counts);
    function visitor(node) {
      counts[node.type] = (counts[node.type] || 0) + 1;
    }
  }
}

Logged output:

{ RootNode: 1,
  ParagraphNode: 3,
  SentenceNode: 3,
  WordNode: 10,
  TextNode: 10,
  WhiteSpaceNode: 10,
  PunctuationNode: 3 }
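One possible answer to the "pass data along" question: unified calls each transformer with `(tree, file)`, and the file (a vfile) carries a `data` object for custom information as well as a `messages` list. The sketch below mimics that contract with plain objects so it runs without any packages. `createFile`, `runPlugins` and the local `visit` are hypothetical stand-ins for illustration, not the unified, vfile or unist-util-visit APIs.

```javascript
// Stand-in for a vfile: a plain object that only mimics the `data` and
// `messages` fields a real vfile exposes.
function createFile(path) {
  return { path, data: {}, messages: [] };
}

// Hypothetical runner: calls each transformer as (tree, file), in order.
function runPlugins(tree, file, plugins) {
  for (const plugin of plugins) plugin(tree, file);
  return file;
}

// Visit every node in the tree (a tiny substitute for unist-util-visit).
function visit(node, visitor) {
  visitor(node);
  (node.children || []).forEach((child) => visit(child, visitor));
}

// Instead of console.log, record the per-type counts on file.data.
function countNodes(tree, file) {
  const counts = {};
  visit(tree, (node) => {
    counts[node.type] = (counts[node.type] || 0) + 1;
  });
  file.data.counts = counts;
}

// A later plugin can read what an earlier one stored.
function warnIfEmpty(tree, file) {
  if (!file.data.counts.text) file.messages.push("document has no text");
}

const doc = {
  type: "root",
  children: [
    { type: "paragraph", children: [{ type: "text", value: "Hi" }] },
    { type: "paragraph", children: [{ type: "text", value: "there" }] },
  ],
};

const result = runPlugins(doc, createFile("a.md"), [countNodes, warnIfEmpty]);
console.log(result.data.counts); // { root: 1, paragraph: 2, text: 2 }
console.log(result.messages); // []
```

After the chain has run, whatever the plugins put on `file.data` is available to the caller, which is the piece the console-logging example leaves out.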

@mohamedsalem401
Member

The immediate question that arises is how the output of running plugins can be stored. Let's consider a straightforward example using a simple plugin available at https://github.com/florianeckerstorfer/remark-a11y-emoji. This plugin wraps emojis in a <span> tag and sets the emoji name as the aria-label.

Assuming we successfully run the markdown files through such plugins, the next question is where the newly generated markdown should be stored. Currently, the library only generates SQL databases from metadata and has no method for loading the content of a file.

Possible solutions include:

  1. Add Content to Database/JSON:
    Store each file's body content in the generated database or local JSON files. This approach consolidates the parsed content along with metadata.

  2. Generate Separate Markdown Files:
    Create a designated folder, say .markdown, and start generating markdown files there after parsing. This process involves removing metadata from the files.

  3. Introduce a Loading Method:
    Implement a method like loadFile(file_path) to retrieve the content of a given file after running the plugins. However, a drawback of this approach is that users who generate the database/JSON files using the library but employ another tool to load the markdown file content would not get the plugin-processed content.

@rufuspollock
Member Author

@mohamedsalem401 we aren't using plugins to transform markdown at all - we are using plugins to extract information from the markdown and then store that somewhere ...

See the section of my last comment titled "Can you pass "data" along the chain of a plugin" ... because we just want to pass data along the chain. Or see the example above, where it computes word counts etc.

To repeat: we are not using remark plugins to transform the content but rather to extract information from it ...
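The distinction can be sketched as follows: extractor plugins read the markdown and return fields for a record, and only the records are stored. This is a minimal illustration under assumptions: `wordCount`, `links` and `indexFile` are hypothetical names, and the regex-based extraction is deliberately crude (a real version would more likely work on the remark tree).

```javascript
// Hypothetical extractor plugins: each takes the raw markdown source and
// returns fields to merge into the file's record. None of them returns or
// rewrites the content itself.
const wordCount = (source) => ({
  wordCount: (source.match(/\b\w+\b/g) || []).length,
});

const links = (source) => ({
  // Capture the target of inline links of the form [text](target).
  links: [...source.matchAll(/\[[^\]]*\]\(([^)\s]+)\)/g)].map((m) => m[1]),
});

// Build one record per file by merging the output of every extractor.
function indexFile(path, source, extractors) {
  return extractors.reduce(
    (record, extract) => ({ ...record, ...extract(source) }),
    { path }
  );
}

const source = "# Notes\n\nSee [remark](https://remark.js.org) and [home](/index).";
const record = indexFile("notes.md", source, [wordCount, links]);

// The records, not the markdown, are what would be written to the
// database or JSON output.
console.log(JSON.stringify([record], null, 2));
```

The source string is never modified, so there is no "newly generated markdown" to store, only the extracted fields.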
