Commit
Merge d1b5d89 into daec3b0
activescott committed Aug 18, 2019
2 parents daec3b0 + d1b5d89 commit 063dffd
Showing 38 changed files with 1,405 additions and 778 deletions.
2 changes: 2 additions & 0 deletions .eslintrc.yaml
@@ -27,6 +27,8 @@ rules:
- ignore:
- 0
- 1
no-console:
- warn
# References for TS rules: https://github.com/typescript-eslint/typescript-eslint/tree/master/packages/eslint-plugin#supported-rules
"@typescript-eslint/explicit-function-return-type":
- error
67 changes: 66 additions & 1 deletion README.md
@@ -21,6 +21,7 @@ Agent Markdown is a [HTML user agent](https://en.wikipedia.org/wiki/User_agent)
- [Features](#features)
- [CLI Example](#cli-example)
- [Live Example](#live-example)
- [Customize & Extend with Plugins](#customize--extend-with-plugins)
- [Show your support](#show-your-support)
- [Contributing 🤝](#contributing-🤝)
- [Release Process (Deploying to NPM) 🚀](#release-process-deploying-to-npm-🚀)
@@ -47,7 +48,7 @@ yarn (`yarn add agentmarkdown`) or npm (`npm install agentmarkdown`)
- Supports nested lists
- Supports [implied paragraphs](https://html.spec.whatwg.org/#paragraphs) / [CSS anonymous block box layout](https://www.w3.org/TR/CSS22/visuren.html#anonymous-block-level)
- Can be used client side (in the browser) or server side (with Node.js)
- Extensible to allow extended or customized output?
- Add support for new elements [with plugins](#customize--extend-with-plugins)
- Fast?

## CLI Example
@@ -76,6 +77,69 @@ yarn
yarn start
```

## Customize & Extend with Plugins

To customize how the markdown is generated or add support for new elements, implement the `LayoutPlugin` interface to handle a particular HTML element. The `LayoutPlugin` interface is defined as follows:

```TypeScript
export interface LayoutPlugin {
  /**
   * Specifies the name of the HTML element that this plugin renders markdown for.
   * NOTE: Must be all lowercase
   */
  elementName: string
  /**
   * This is the core of the implementation that will be called for each instance of the HTML element that this plugin is registered for.
   */
  layout: LayoutGenerator
}
```

The `LayoutGenerator` is a single function that performs a [CSS2 box generation layout algorithm](https://www.w3.org/TR/CSS22/visuren.html#box-gen) on an HTML element. Essentially, it creates zero or more boxes for the given element that AgentMarkdown will render to text. A box can contain text content and/or other boxes, and each box has a type of `inline` or `block`. Inline boxes are laid out horizontally. Block boxes are laid out vertically (i.e. they have new line characters before and after their contents). The `LayoutGenerator` function definition is as follows:

```TypeScript
export interface LayoutGenerator {
  (
    context: LayoutContext,
    manager: LayoutManager,
    element: HtmlNode
  ): CssBox | null
}
```
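
To make the inline/block distinction concrete, the following is a toy renderer and not AgentMarkdown's actual code. The `ToyBox` type and the `toyBox`, `renderToy`, and `renderToyDocument` helpers are hypothetical names invented for this sketch; the real `CssBox` and its rendering may differ.

```TypeScript
// Hypothetical stand-in types for illustration only; AgentMarkdown's real
// CssBox and BoxType may look different.
type ToyBoxType = "inline" | "block"

interface ToyBox {
  type: ToyBoxType
  text: string
  children: ToyBox[]
}

function toyBox(type: ToyBoxType, text = "", children: ToyBox[] = []): ToyBox {
  return { type, text, children }
}

// Renders a box tree to text: inline boxes are concatenated on the current
// line, while every block box gets a new line before and after its contents.
function renderToy(box: ToyBox): string {
  const inner = box.text + box.children.map(renderToy).join("")
  return box.type === "block" ? "\n" + inner + "\n" : inner
}

// Collapses the runs of new lines produced by nested or adjacent block boxes.
function renderToyDocument(root: ToyBox): string {
  return renderToy(root)
    .replace(/\n{2,}/g, "\n")
    .trim()
}

const toyDoc = toyBox("block", "", [
  toyBox("block", "", [
    toyBox("inline", "Hello "),
    toyBox("inline", "**world**")
  ]),
  toyBox("block", "", [toyBox("inline", "Second paragraph")])
])

console.log(renderToyDocument(toyDoc))
```

The two inline boxes end up side by side on one line, and the second block box starts on a line of its own.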

As an example, the HTML `<b>` element could be implemented as a plugin like the following:

```TypeScript
class BoldPlugin implements LayoutPlugin {
  // assign the value here; `elementName: "b"` would only declare a type and leave the property undefined
  elementName = "b"

  layout: LayoutGenerator = (
    context: LayoutContext,
    manager: LayoutManager,
    element: HtmlNode
  ): CssBox | null => {
    // let the manager use other plugins to layout any child elements
    // (children is optional on HtmlNode, so fall back to an empty array):
    const kids = manager.layout(context, element.children || [])
    // wrap the child elements in the markdown ** syntax for bold/strong:
    kids.unshift(manager.createBox(context, BoxType.inline, "**"))
    kids.push(manager.createBox(context, BoxType.inline, "**"))
    // return a new box containing everything:
    return manager.createBox(context, BoxType.inline, "", kids)
  }
}
```

To initialize AgentMarkdown with plugins, pass them in as an array value for the `layoutPlugins` option as follows. To customize the rendering of an element, specify a plugin for that element's `elementName`; your plugin will override the built-in plugin.

```TypeScript
const result = await AgentMarkdown.render({
  html: myHtmlString,
  layoutPlugins: [
    new BoldPlugin()
  ]
})
```
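
A plugin can also be exercised on its own, without running a full conversion. The sketch below is only an illustration: it declares local stand-ins for `BoxType`, `CssBox`, `HtmlNode`, `LayoutContext`, and `LayoutManager` so that it compiles by itself (the real types come from the package and their exact shapes are assumed here), and `StrikethroughPlugin`, `stubManager`, and `flatten` are hypothetical names. It maps `<del>` to the `~~` strikethrough syntax from GitHub Flavored Markdown.

```TypeScript
// Local stand-ins so the sketch is self-contained. The field names on CssBox
// and HtmlNode (textContent, data) are assumptions made for this example.
enum BoxType {
  inline,
  block
}
interface CssBox {
  type: BoxType
  textContent: string
  children: CssBox[]
}
interface HtmlNode {
  tagName?: string
  data?: string
  children?: HtmlNode[]
}
type LayoutContext = object
interface LayoutManager {
  createBox(
    context: LayoutContext,
    type: BoxType,
    textContent: string,
    children?: CssBox[]
  ): CssBox
  layout(context: LayoutContext, elements: HtmlNode[]): CssBox[]
}

// Hypothetical plugin: wraps the children of <del> in ~~ markers.
class StrikethroughPlugin {
  elementName = "del"

  layout = (
    context: LayoutContext,
    manager: LayoutManager,
    element: HtmlNode
  ): CssBox | null => {
    const kids = manager.layout(context, element.children || [])
    kids.unshift(manager.createBox(context, BoxType.inline, "~~"))
    kids.push(manager.createBox(context, BoxType.inline, "~~"))
    return manager.createBox(context, BoxType.inline, "", kids)
  }
}

// A stub manager that turns every child node into an inline text box.
// It exists only to drive the plugin in this example.
const stubManager: LayoutManager = {
  createBox: (_context, type, textContent, children = []) => ({
    type,
    textContent,
    children
  }),
  layout: (context, elements) =>
    elements.map(e => stubManager.createBox(context, BoxType.inline, e.data || ""))
}

// Concatenates the text of a box and all of its descendants.
const flatten = (box: CssBox): string =>
  box.textContent + box.children.map(flatten).join("")

const struck = new StrikethroughPlugin().layout({}, stubManager, {
  tagName: "del",
  children: [{ data: "old text" }]
})
console.log(struck ? flatten(struck) : "")
```

Because the plugin only talks to the manager it is handed, a stub like this is enough to check the boxes it produces.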

## Show your support

Give a ⭐️ if this project helped you!
@@ -108,6 +172,7 @@ see [/docs/todo.md](docs/todo.md)
# Alternatives

- http://domchristie.github.io/turndown/
- https://github.com/rehypejs/rehype-remark

## License 📝

8 changes: 4 additions & 4 deletions docs/todo.md
@@ -10,8 +10,9 @@
- multi-paragraph list items

- Feat: Extensibility
- Allow customizing the conversion. Should the caller be able to provide custom `BoxBuilder`?
- We're using unist (inadvertently) so maybe allow different unist frontends and extensibility in unist as well as the CssBox BoxBuilder backend: https://unified.js.org/create-a-plugin.html
+ Allow customizing the conversion. Should the caller be able to provide custom `LayoutGenerator`?
+ We're *nearly* using [unist](https://github.com/syntax-tree/unist) inadvertently so maybe allow different unist frontends and extensibility in unist as well as the CssBox LayoutGenerator backend: https://unified.js.org/create-a-plugin.html
+ Okay, HtmlNode is not quite close enough to unist. We focus more on node hierarchy, while unist does that plus position within the text. Interesting, but a non-trivial amount of work for potentially little benefit.

- support elements (WITH TESTS):
+ links
@@ -36,5 +37,4 @@
- benchmarks (some other lib had benchmarks)

# code smells #
- the css layout code started off purely based on the css specification, only generating three general types of boxes: inline, block, and list-item - as defined in the CSS spec. The only determination on the type of box generated was a lookup to the CSS-defined `display` value for the corresponding HTML element name. Interestingly, this very simplistic layout algorithm actually produced pretty decent markdown from almost any HTML! It didn't have inline formatting but the general layout, line breaking, and list generation were perfect as far as I recall.
All that smelled fine, but it started growing beyond pure CSS and introducing "special" non-standard boxes to handle markdown-specific formatting for certain elements (e.g. headings, emphasis, link, br, hr, etc.). This is perfectly fine as long as the only target is markdown but does start coupling the otherwise pure CSS box-generation/layout algorithm to markdown. It would be better to maybe make that CSS layout/box-generation code slightly extensible by allowing the caller to pass in a map of `BoxBuilder`s that could customize the box generation.
- the css layout code started off purely based on the css specification, only generating three general types of boxes: inline, block, and list-item - as defined in the CSS spec. The only determination on the type of box generated was a lookup to the CSS-defined `display` value for the corresponding HTML element name. Interestingly, this very simplistic layout algorithm actually produced pretty decent markdown from almost any HTML! It didn't have inline formatting but the general layout, line breaking, and list generation were perfect as far as I recall. All that smelled fine, but it started growing beyond pure CSS and introducing "special" non-standard boxes to handle markdown-specific formatting for certain elements (e.g. headings, emphasis, link, br, hr, etc.). This is perfectly fine as long as the only target is markdown but does start coupling the otherwise pure CSS box-generation/layout algorithm to markdown. It would be better to maybe make that CSS layout/box-generation code slightly extensible by allowing the caller to pass in a map of `LayoutGenerator`s that could customize the box generation.
8 changes: 3 additions & 5 deletions package.json
@@ -36,10 +36,8 @@
"coverage-publish": "cat ./coverage/lcov.info | coveralls"
},
"devDependencies": {
"@types/domhandler": "^2.4.1",
"@types/htmlparser2": "^3.10.0",
"@types/jest": "^24.0.15",
"@types/node": "^12.0.10",
"@types/node": "^12.7.2",
"@typescript-eslint/eslint-plugin": "^2.0.0",
"@typescript-eslint/parser": "^2.0.0",
"commitizen": "^4.0.3",
@@ -55,8 +53,8 @@
"typescript": "^3.5.2"
},
"dependencies": {
"domhandler": "^2.4.2",
"htmlparser2": "^3.10.1"
"domhandler": "^3.0.0",
"htmlparser2": "^4.0.0"
},
"config": {
"commitizen": {
2 changes: 1 addition & 1 deletion src/HtmlNode.ts
@@ -13,7 +13,7 @@ export interface HtmlNode {
/**
* The name of the node when @see type is "tag"
*/
name: string
tagName?: string
attribs?: AttribsType
children?: HtmlNode[]
}
27 changes: 27 additions & 0 deletions src/LayoutContext.ts
@@ -0,0 +1,27 @@
export interface LayoutContext {
/**
* Returns the specified stack.
* If the stack is not yet created it will return an empty stack.
* @param stackName The stack name (state key) to retrieve.
*/
getStateStack<TValue>(stackName: string): TValue[]
/**
* Pushes the specified state onto the specified stack.
* If the stack does not yet exist, it will be created and the value pushed onto it.
* NOTE: If you want to evaluate the stack itself, use @see getStateStack and it will return the stack.
* @param stackName The key/name of the stack to push the value onto.
* @param value The value to push onto the top of the stack.
*/
pushState<TValue>(stackName: string, value: TValue): void
/**
* Pops the top value from the specified stack and returns it.
* If the stack doesn't exist or is empty @see undefined is returned.
* @param stackName The key/name of the stack to pop the value from.
*/
popState<TValue>(stackName: string): TValue | undefined
/**
* Returns the top value from the specified stack without removing it from the stack.
* @param stackName The key/name of the stack to peek at.
*/
peekState<TValue>(stackName: string): TValue | undefined
}
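
The interface above describes a set of named stacks. As a rough sketch of how it could be backed by a `Map`, consider the following; the class name `MapLayoutContext` and the storage choice are assumptions made for illustration, and the commit's own implementation is not shown here. The interface is repeated so the snippet compiles by itself.

```TypeScript
// Repeated from the diff above so this sketch is self-contained.
interface LayoutContext {
  getStateStack<TValue>(stackName: string): TValue[]
  pushState<TValue>(stackName: string, value: TValue): void
  popState<TValue>(stackName: string): TValue | undefined
  peekState<TValue>(stackName: string): TValue | undefined
}

// Hypothetical implementation: one array per stack name, kept in a Map.
class MapLayoutContext implements LayoutContext {
  private readonly stacks = new Map<string, unknown[]>()

  getStateStack<TValue>(stackName: string): TValue[] {
    let stack = this.stacks.get(stackName)
    if (!stack) {
      // Design choice of this sketch: create and remember the stack on first access.
      stack = []
      this.stacks.set(stackName, stack)
    }
    return stack as TValue[]
  }

  pushState<TValue>(stackName: string, value: TValue): void {
    this.getStateStack<TValue>(stackName).push(value)
  }

  popState<TValue>(stackName: string): TValue | undefined {
    // Array.prototype.pop returns undefined for an empty stack.
    return this.getStateStack<TValue>(stackName).pop()
  }

  peekState<TValue>(stackName: string): TValue | undefined {
    const stack = this.getStateStack<TValue>(stackName)
    return stack[stack.length - 1]
  }
}

// Example use: tracking which kind of list is currently being laid out.
const ctx = new MapLayoutContext()
ctx.pushState("list", "ul")
ctx.pushState("list", "ol")
console.log(ctx.peekState<string>("list"))
ctx.popState<string>("list")
console.log(ctx.peekState<string>("list"))
```

A plugin for a nested element could push state before laying out its children and pop it afterwards, so that child plugins can peek at the enclosing element's state.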
13 changes: 13 additions & 0 deletions src/LayoutManager.ts
@@ -0,0 +1,13 @@
import { CssBox, HtmlNode, LayoutContext } from "."
import { CssBoxFactoryFunc } from "./css/layout/CssBoxFactory"

export interface LayoutManager {
/**
* Creates a new @see CssBox instance.
*/
createBox: CssBoxFactoryFunc
/**
* Lays out a set of @see CssBox objects for the specified HTML elements.
*/
layout(context: LayoutContext, elements: HtmlNode[]): CssBox[]
}
2 changes: 2 additions & 0 deletions src/cli/agentmarkdown.ts
@@ -42,6 +42,7 @@ export class Cli {
try {
markdown = await AgentMarkdown.produce(html)
} catch (err) {
// eslint-disable-next-line no-console
console.error("Error converting HTML to markdown.")
process.exit(EXIT_ERR_CONVERTING)
return false
@@ -50,6 +51,7 @@ export class Cli {
process.stdout.write(markdown)
process.stdout.end()
} catch (err) {
// eslint-disable-next-line no-console
console.error("Error writing to stdout.")
process.exit(EXIT_ERR_STDOUT)
return false
82 changes: 0 additions & 82 deletions src/css/CssBox.ts

This file was deleted.

47 changes: 24 additions & 23 deletions src/css/CssBox.spec.ts → src/css/CssBoxImp.spec.ts
@@ -1,4 +1,5 @@
import { CssBox, BoxType } from "./CssBox"
import { CssBoxImp } from "./CssBoxImp"
import { BoxType, CssBox } from ".."

describe("CSS 9.2.1.1 Anonymous block boxes", () => {
/**
@@ -7,35 +8,35 @@ describe("CSS 9.2.1.1 Anonymous block boxes", () => {
*/
it("should not insert boxes with only inline children", () => {
const children = [
new CssBox(BoxType.inline),
new CssBox(BoxType.inline),
new CssBox(BoxType.inline)
new CssBoxImp(BoxType.inline),
new CssBoxImp(BoxType.inline),
new CssBoxImp(BoxType.inline)
]
const rootBox = new CssBox(BoxType.block, "", children)
const rootBox = new CssBoxImp(BoxType.block, "", children)
for (const child of children) {
expect(rootBox.children).toContainEqual(child)
}
})

it("should not insert boxes with only block children", () => {
const children = [
new CssBox(BoxType.block),
new CssBox(BoxType.block),
new CssBox(BoxType.block)
new CssBoxImp(BoxType.block),
new CssBoxImp(BoxType.block),
new CssBoxImp(BoxType.block)
]
const rootBox = new CssBox(BoxType.block, "", children)
const rootBox = new CssBoxImp(BoxType.block, "", children)
for (const child of children) {
expect(rootBox.children).toContainEqual(child)
}
})

it("should insert anonymous block box with block & inline children", () => {
const children = [
new CssBox(BoxType.block),
new CssBox(BoxType.inline),
new CssBox(BoxType.block)
new CssBoxImp(BoxType.block),
new CssBoxImp(BoxType.inline),
new CssBoxImp(BoxType.block)
]
const rootBox = new CssBox(BoxType.block, "", children)
const rootBox = new CssBoxImp(BoxType.block, "", children)
const actual: CssBox[] = Array.from(rootBox.children)
expect(actual).toHaveLength(3)
// should have the first and last:
@@ -47,12 +48,12 @@ describe("CSS 9.2.1.1 Anonymous block boxes", () => {

it("anonymous block box should collect sequences of adjacent inlines with block & inline children", () => {
const children = [
new CssBox(BoxType.block),
new CssBox(BoxType.inline),
new CssBox(BoxType.inline),
new CssBox(BoxType.block)
new CssBoxImp(BoxType.block),
new CssBoxImp(BoxType.inline),
new CssBoxImp(BoxType.inline),
new CssBoxImp(BoxType.block)
]
const rootBox = new CssBox(BoxType.block, "", children)
const rootBox = new CssBoxImp(BoxType.block, "", children)
const actual: CssBox[] = Array.from(rootBox.children)
expect(actual).toHaveLength(3)
// should have the first and last:
@@ -65,12 +66,12 @@ describe("CSS 9.2.1.1 Anonymous block boxes", () => {

it("anonymous block box should not collect sequences of non-adjacent inlines with block & inline children", () => {
const children = [
new CssBox(BoxType.block),
new CssBox(BoxType.inline),
new CssBox(BoxType.block),
new CssBox(BoxType.inline)
new CssBoxImp(BoxType.block),
new CssBoxImp(BoxType.inline),
new CssBoxImp(BoxType.block),
new CssBoxImp(BoxType.inline)
]
const rootBox = new CssBox(BoxType.block, "", children)
const rootBox = new CssBoxImp(BoxType.block, "", children)
const actual: CssBox[] = Array.from(rootBox.children)
expect(actual).toHaveLength(4)
// should have the blocks (first and third):
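
The spec above tests the anonymous block box rule from CSS 2.2 section 9.2.1.1: when a block container has both block and inline children, each run of adjacent inline children is wrapped in an anonymous block box. The following is a minimal sketch of that rule only; `SketchBox` and `wrapInlineRuns` are hypothetical names, and this is not the code in `CssBoxImp`.

```TypeScript
// Hypothetical box shape for this sketch only.
type SketchKind = "inline" | "block"

interface SketchBox {
  kind: SketchKind
  anonymous?: boolean
  children: SketchBox[]
}

// Wraps each run of adjacent inline children in an anonymous block box,
// but only when the children mix block and inline boxes.
function wrapInlineRuns(children: SketchBox[]): SketchBox[] {
  const hasBlock = children.some(c => c.kind === "block")
  const hasInline = children.some(c => c.kind === "inline")
  if (!hasBlock || !hasInline) {
    // All-inline or all-block children are left untouched.
    return children
  }
  const result: SketchBox[] = []
  let run: SketchBox[] = []
  const flush = (): void => {
    if (run.length > 0) {
      result.push({ kind: "block", anonymous: true, children: run })
      run = []
    }
  }
  for (const child of children) {
    if (child.kind === "inline") {
      run.push(child)
    } else {
      flush()
      result.push(child)
    }
  }
  flush()
  return result
}

const block = (): SketchBox => ({ kind: "block", children: [] })
const inline = (): SketchBox => ({ kind: "inline", children: [] })

// block, inline, inline, block: the two inlines share one anonymous box.
console.log(wrapInlineRuns([block(), inline(), inline(), block()]).length)
// block, inline, block, inline: each inline gets its own anonymous box.
console.log(wrapInlineRuns([block(), inline(), block(), inline()]).length)
```

The child counts mirror the `toHaveLength(3)` and `toHaveLength(4)` expectations in the spec.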