Commit

Merge pull request #128 from ajayns/docs
Add documentation for html-parsing and handling collections
datakurre committed Jul 15, 2018
2 parents 4341646 + 7623088 commit 8fe9e9d
Showing 2 changed files with 28 additions and 0 deletions.
20 changes: 20 additions & 0 deletions docs/html_parsing.md
@@ -0,0 +1,20 @@
# HTML Parsing

[plone.restapi](https://github.com/plone/plone.restapi) returns HTML content as a string of HTML. Using a combination of [react-html-parser](url) and [react-serialize](url), the plugin processes this HTML content into React nodes.

## RichText Component

The RichText component deserializes the React nodes field that the plugin produced so that it can be rendered, and it also handles images, files and relative links. Images and files are queried separately using backlinks and passed into the RichText component, which replaces the matching `img` and `a` tags with the updated data. Relative links are replaced with `Link` tags.

## Parsing process

`react-html-parser` parses the HTML string into React nodes, which `react-serialize` then serializes so that they can be passed into and retrieved via GraphQL queries. Backlinks and relative links are configured during this parsing step. In the gatsby-site, the RichText component handles deserialization and the display of images, files and so on.

## Backlinks

Backlinks provide an optimized way to get the relevant images and files for a certain component. A backlink is essentially an object that stores, for a given file or image, the list of nodes to which it is relevant. This eliminates the need to iterate over every single image and file when replacing an `a` or `img` tag.

Taking the case of the `tests/gatsby-starter-default`:

- In the default layout, we find the node for the matching path and render it with the proper component, passing that component all related images and files
- In the RichText component (called from Document or NewsItem), we use that data to replace links to files and images with optimized gatsby-images
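
The backlink lookup described in this section could be sketched roughly as follows. This is a minimal illustration and not the plugin's actual code: the `backlinks` field name, the sample paths and the `buildBacklinkIndex` helper are all assumptions made for the example.

```javascript
// Hypothetical data: each image or file lists the paths of the nodes
// that reference it (the field name `backlinks` is an assumption).
const assets = [
  { path: '/news/photo.png', backlinks: ['/news/item-1', '/news/item-2'] },
  { path: '/docs/diagram.png', backlinks: ['/docs/intro'] },
  { path: '/docs/manual.pdf', backlinks: ['/docs/intro', '/news/item-1'] },
];

// Invert the per-asset backlink lists into a Map keyed by node path, so a
// component can fetch its related assets without scanning every asset.
function buildBacklinkIndex(allAssets) {
  const index = new Map();
  for (const asset of allAssets) {
    for (const nodePath of asset.backlinks) {
      if (!index.has(nodePath)) {
        index.set(nodePath, []);
      }
      index.get(nodePath).push(asset);
    }
  }
  return index;
}

const backlinkIndex = buildBacklinkIndex(assets);

// Assets relevant to one node; an unknown path yields an empty list.
const relatedToItem1 = (backlinkIndex.get('/news/item-1') || []).map(
  (asset) => asset.path
);
console.log(relatedToItem1);
```

With an index like this, rendering a node only touches the assets that reference it, which is the saving the backlinks approach is after.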
8 changes: 8 additions & 0 deletions docs/recursive_traversal.md
@@ -27,3 +27,11 @@ Tree.prototype.traverse = function(callback) {
```

This was used as the reference, and the basic idea of the BFS algorithm was followed, i.e., a queue is seeded with the first item, and in each iteration of the loop the next element is dequeued and processed while its children are added to the queue.

## Handling Collections

Collections are data types that are essentially a group of content objects returned when a certain search query is run. This means their children are originally children of other nodes, so there is no need to traverse them again.

The algorithm was modified to handle this condition as well using a `seen[path]` approach: an object `seen` records every path that has already been traversed, saved as `seen[path] = true`.
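
A minimal sketch of the `seen[path]` idea on top of a queue-based BFS is shown below. It is an illustration under assumptions, not the plugin's implementation: the in-memory `content` object and the `getChildren` callback stand in for whatever actually fetches a node's children.

```javascript
// Hypothetical content tree keyed by path. '/collection' lists an item
// that is originally a child of '/news'.
const content = {
  '/': { children: ['/news', '/collection'] },
  '/news': { children: ['/news/item-1'] },
  '/news/item-1': { children: [] },
  '/collection': { children: ['/news/item-1'] },
};

// Breadth-first traversal that skips any path already recorded in `seen`.
function traverse(rootPath, getChildren) {
  const seen = {};
  const queue = [rootPath];
  const order = [];
  seen[rootPath] = true;

  while (queue.length > 0) {
    const path = queue.shift(); // dequeue the next path
    order.push(path); // "process" the node
    for (const child of getChildren(path)) {
      if (seen[child]) {
        continue; // already queued or processed via another parent
      }
      seen[child] = true;
      queue.push(child);
    }
  }
  return order;
}

const visited = traverse('/', (path) => content[path].children);
console.log(visited);
// → [ '/', '/news', '/collection', '/news/item-1' ]
```

Here a path is marked as seen when it is enqueued rather than when it is processed, so an item reachable from both its real parent and a collection never enters the queue twice.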

Furthermore, it was noted that when content objects have a huge list of children, they too are often batched like the `@search` response, so batching support was added for all content objects.
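
Following batches could look roughly like the sketch below. It assumes the batched response shape documented by plone.restapi (an `items` array plus a `batching` object whose `next` link is present only while more batches remain). `fetchJson` is a hypothetical helper that resolves a URL to its parsed JSON body, and the stubbed pages are invented for the example.

```javascript
// Collect the items of every batch by following `batching.next` links.
// `fetchJson` is a hypothetical (url) => Promise<object> helper.
async function fetchAllItems(url, fetchJson) {
  const items = [];
  let next = url;
  while (next) {
    const page = await fetchJson(next);
    items.push(...(page.items || []));
    // Stop once the response no longer advertises a next batch.
    next = page.batching && page.batching.next;
  }
  return items;
}

// Stubbed responses standing in for the REST API (invented data).
const pages = {
  '/folder': {
    items: [{ '@id': '/folder/a' }, { '@id': '/folder/b' }],
    batching: { next: '/folder?b_start=2' },
  },
  '/folder?b_start=2': {
    items: [{ '@id': '/folder/c' }],
    batching: {},
  },
};
const stubFetchJson = async (url) => pages[url];

fetchAllItems('/folder', stubFetchJson).then((items) => {
  console.log(items.map((item) => item['@id']));
});
```

Passing the fetcher in as a parameter keeps the loop identical for `@search` results and for ordinary content objects with many children.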
