Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NodeJS compatibility, use getRawTextContent #33

Merged
merged 11 commits into from
Nov 20, 2022

Conversation

njordhov
Copy link
Contributor

@njordhov njordhov commented Mar 1, 2016

Issue #17 reports an incompatibility with nodejs where loading fails due to missing DOM bindings.

I have resolved this issue by replacingjs/Node with goog.dom/NodeType and made js/NodeList optional, which eliminates the dependency on browser dom, allowing hickory to load also on nodejs.

@njordhov njordhov mentioned this pull request Mar 1, 2016
@jeluard
Copy link
Collaborator

jeluard commented Mar 2, 2016

Interesting!

Is it only about loading or can you parse html in node too?

@njordhov
Copy link
Contributor Author

njordhov commented Mar 3, 2016

It's primarily to complete the loading, ready for a polyfill that makes hickory able to parse html in nodejs.

@davidsantiago
Copy link
Collaborator

Is this something that is useful by itself, if you still can't parse the HTML?

@njordhov
Copy link
Contributor Author

njordhov commented Mar 3, 2016

With the commit hickory.core/parse works in nodejs after setting js/DOMParser to one of the available DOM parsers available for node. I currently use xmldom:

(set! js/DOMParser (.-DOMParser (nodejs/require "xmldom")))

For full use of hickory I also have a hairy polyfill to accommodate that hickory extends NodeList instead of using standard dom to iterate over childnodes and attributes. A minor additional change to hickory would eliminate the need for this patch.

But making hickory loadable on nodejs has value in itself, as it doesn't become a showstopper for using other modules that depends on hickory. Case in point is kioo.

@davidsantiago
Copy link
Collaborator

Ok, I don't understand the nodelist issue (or most of what you said really), but I think we don't want to stop anyone's show. Can you explain what the further fix that obviates this patch is?

@njordhov
Copy link
Contributor Author

njordhov commented Mar 5, 2016

Sorry for not explaining more clearly. The problem for compatibility with node is that hickory.core (cljs) assumes that there is a js/NodeList that can be made seqable using extend-type-with-seqable. However, js/NodeList is not available on NodeJS.

The quick fix is a wrapper that coerces all dom nodelists into a sequence, for example using a function like:

(defn- as-seq [nodelist]                                                                                                              
  (if (seq? nodelist) nodelist (array-seq nodelist)))                                                                                 

Then use as-seq to wrap accessors for the "childNodes" and "attributes" properties to ensure they always provide a sequence that can be used with map:

(as-seq (aget this "childNodes"))
(as-seq (aget this "attributes"))

Or perhaps as-seq should just integrate the aget? Your choice.

@njordhov
Copy link
Contributor Author

njordhov commented Mar 5, 2016

With these commits, hickory can be used on nodejs to parse markup into hickory or hiccup format, after setting js/DOMParser to one of the available DOM parsers available for node, such as:

(set! js/DOMParser (.-DOMParser (nodejs/require "xmldom")))

@jeluard
Copy link
Collaborator

jeluard commented Mar 5, 2016

Can we not simply use array-seq everywhere? Not needs to go through extend-type if we can avoid it.

@njordhov
Copy link
Contributor Author

njordhov commented Mar 6, 2016

We can indeed simply use array-seq everywhere and eliminate extend-type. I wrote the commit with the intent to maintain compatibility in case any uses had come to depend on hickory extending js/NodeLists to become seqable, including cases where the NodeList of the dom nodes from parse already is seqable. Note that using array-seq on an object that already is a seq may lead to unexpected results so better test for seq? before using array-seq:

(array-seq (array-seq (.-childNodes (parse "<p>abc<br>def</p>"))))
=> nil  

@jeluard
Copy link
Collaborator

jeluard commented Mar 20, 2016

I don't think we need to keep this compatibility. I don't remember exactly why I choose this option when porting to ClojureScript but in retrospect it looks unnecessarily complex.

including cases where the NodeList of the dom nodes from parse already is seqable

Can this happen?

My opinion is we should just go for the simple case. Do you mind modifying the PR to achieve that?
Appart from this I think it makes sense to merge then.

@njordhov
Copy link
Contributor Author

Can this happen?

Yes, nodes from parse may already be seqable if some other module extends js/NodeList to be seqable. So it makes sense to test for seq? before using array-seq on nodelists.

I'll modify the PR for the simple case, eliminating the extend-type on js/NodeList.

@jeluard
Copy link
Collaborator

jeluard commented Mar 21, 2016

Right that may happen. I do think it is considered bad practice to do this.
Harmless to add a seq? check anyway.

@njordhov
Copy link
Contributor Author

I'd expect the seq? check to be harmless and have insignificant impact on performance. An alternative though is to use DOM to iterate over the child nodes instead of mapping over coerced nodelists.

@njordhov
Copy link
Contributor Author

Extending js/NodeList to be seqable is practiced in the wild, e.g. in LightTable and Domina. So it makes sense to guard by checking for seq?.

@jeluard
Copy link
Collaborator

jeluard commented Mar 21, 2016

Good catch!

Looks great now, probably ready to be merged. @davidsantiago what do you think?

@davidsantiago
Copy link
Collaborator

I'm fine with an API break and issuing a new minor version, if that's the right thing to do. I'd rather just solve it and avoid any associated complexity. If you're happy with the patch, I'd just like someone to give me a little snippet for the readme change notes so I can be sure I am explaining what changes.

@jeluard
Copy link
Collaborator

jeluard commented Mar 22, 2016

There's no real API breakage beyond the js/NodeList not being extended anymore but that was an implementation detail never intended to be used outside hickory.

Tentative change notes:

  • fixed an issue where text nodes were parsed twice under some circumstances
  • hickory can now be used with nodejs (see README for configuration details)
  • ! js/NodeList are not extended anymore so iterating over those nodes will require usage of array-seq

@njordhov
Copy link
Contributor Author

fixed an issue where text nodes were parsed twice under some circumstances

More accurate to say: text nodes were repeated as the issue could cause text nodes to be parsed and repeated multiple times.

are not extended anymore

Consider instead saying no longer extended to be seqable.

Update for nodejs compatibility PR
@ricardojmendez
Copy link
Contributor

I am now looking into node more for Macchiato. What's the status of this PR?

@davidsantiago
Copy link
Collaborator

Yep, sorry about that, this one fell off my radar as other things have been placed above it on my stack. I'll try to find some time tonight to get this merged in.

@michihuber
Copy link

Is there help needed to get this released?
nodejs support would be greatly appreciated.

@davidsantiago
Copy link
Collaborator

Yes, very much. I don't know javascript or node or clojurescript really, and there are a number of issues and pull requests pending that I simply don't understand. There has been some discussion and debate in some of them that I don't understand. And just in a more general sense I don't understand what node support entails or what's missing or when it would be done. I have been meaning to do some studying and figure out what is going on in them, but I simply don't have the time to get up to speed for various life reasons.

If you can understand what is going on in pending node PRs and issues and figure out what is needed and what is ready to merge in, I'm going to defer to anyone who knows what they are doing on this environment and can help me figure out what to put in, what changes to make, and what to leave out (if anything).

@Wanderwichtl
Copy link

Wanderwichtl commented Jul 9, 2017

I would love it to get this working.
I actually have this code:
`(ns nodetest.core
(:require [cljs.nodejs :as nodejs]
[hickory.core :as h]))

(nodejs/enable-util-print!)

(set! js/DOMParser (.-DOMParser (cljs.nodejs/require "xmldom")))

(defn -main []
(println (h/as-hiccup (h/parse "<a href="foo">foo <a id="so" href="bar">bar<script src="blah.js">alert("hi");</script>"))))

(set! main-cli-fn -main)`

But when I run this node target/server_dev/nodetest.js I get the following error:
`/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:1088
:else (throw (js/Error. (str coll " is not ISeqable"))))))
^
Error: false is not ISeqable
at Object.cljs$core$seq [as seq] (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:1088:20)

at cljs$core$empty_QMARK_ (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:1884:20)

at hickory$core$format_doctype (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/hickory/core.cljs:55:15)

at hickory.core.as_hiccup.object (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/hickory/core.cljs:66:37)

at hickory$core$as_hiccup (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/hickory/core.cljs:11:1)

at /home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:4215:16
at cljs.core.map.cljs$core$IFn$_invoke$arity$2 (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:4215:15)

at cljs.core.LazySeq.sval (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:3010:18)

at cljs.core.LazySeq.cljs$core$ISeqable$_seq$arity$1 (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:3052:12)

at Object.cljs$core$seq [as seq] (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:1075:25)

`
I am new to Clojure and don't have the best debug workflow at this time, so help is always appreciated.

@njordhov
Copy link
Contributor Author

Likely xmldom requires well-formed XML markup, with each start tag having a matching end tag:

"<a href='foo'>foo</a> <a id='so' href='bar'>bar</a> <script src='blah.js'>alert('hi');</script>"

If it still fails, I suggest you trim the markup down to a single element:

"<a href='foo'>foo</a>"

As you're new to Clojure, have you actually gone through the extra steps of installing the "node" branch of Hickory rather than just using a dependency on the stock version that doesn't yet support node? If so, I award you the coveted lein install badge, please use your new powers with care.

@grav
Copy link

grav commented Jul 5, 2018

Very nice patch!

I got https://github.com/mpcarolin/markdown-to-hiccup working in nodejs (via lumo 1.8) by using this fork, installing jsdom via npm, and finally "polyfilling" something like this:

(let [jsdom (cljs.nodejs/require "jsdom")]
    (set! (. goog.global -Node) goog.dom.NodeType)
    (set! (. goog.global -document) (.-document (.-window (jsdom.JSDOM.)))))

(markdown-to-hiccup.core/md->hiccup "*hello*")
=> [:div {} [:p {:style {:margin 0}} [:em {} hello]]]

@alza-bitz
Copy link

alza-bitz commented Dec 13, 2018

Hi folks,

I'm trying to get Hickory to work on NodeJS courtesy of this PR/branch..

However, with the following code (at the repl):

#?(:cljs (require '[cljs.core.async :as async]))
(require #?(:clj '[clj-http.client :as http] :cljs '[cljs-http.client :as http]))
(require '[hickory.core :as h])
(require '[hickory.select :as hs])

(set! js/XMLHttpRequest (cljs.nodejs/require "xhr2"))
(set! (. goog.global -Node) goog.dom.NodeType)
(set! (. goog.global -NodeList) goog.dom.NodeList)
(def jsdom (cljs.nodejs/require "jsdom"))
(set! (. goog.global -document) (.-document (.-window (jsdom.JSDOM.))))

(async/go (let [response (async/<! (http/get "https://www.google.co.uk"))]
  (h/as-hickory (h/parse (:body response)))))

I get the following error:

Error: No matching clause:
    at /mnt/d/Projects/hickory-experiment/.cljs_node_repl/hickory/core.cljs:89:22
    at m__4244__auto__ (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/hickory/core.cljs:19:1)
    at f (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:4680:16)
    at cljs.core.map.cljs$core$IFn$_invoke$arity$2 (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.js:17602:3)
    at cljs.core.LazySeq.fn [as sval] (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:3394:18)
    at cljs.core.LazySeq.cljs$core$ISeqable$_seq$arity$1 (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:3384:1)
    at Object.coll [as seq] (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:1210:25)
    at Function.cljs.core.seq_reduce.cljs$core$IFn$_invoke$arity$3 (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:2445:26)
    at cljs.core.LazySeq.cljs.core.seq_reduce.cljs$core$IFn$_invoke$arity$3 [as cljs$core$IReduce$_reduce$arity$3] (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:3458:28)
    at Function.coll [as cljs$core$IFn$_invoke$arity$3] (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:2517:29)

This is with HEAD of the njordhov:node branch (@njordhov) 🙂

The following equivalent Clojure code works fine:

#?(:cljs (require '[cljs.core.async :as async]))
(require #?(:clj '[clj-http.client :as http] :cljs '[cljs-http.client :as http]))
(require '[hickory.core :as h])
(require '[hickory.select :as hs])

(h/as-hickory (h/parse (:body (http/get "https://www.google.co.uk"))))

What am I doing wrong? Any ideas?

Thanks!

@njordhov
Copy link
Contributor Author

njordhov commented Dec 14, 2018

The likely cause for the No matching clause error is that in jsdom, attributes have no nodeType property, consistent with DOM4 where element attributes are no longer implementing the Node interface. As a consequence there is no clause for an attribute in as-hickory, causing it to fail when recursively processing the attributes of elements:

(map as-hickory (aget this "attributes")) 

A work-around is to patch the Attribute constant to be consistent with jsdom:

(set! hickory.core/Attribute nil)

A more permanent fix is for as-hickory and as-hiccup to no longer recur over element attributes but instead recode them in place. See issue #36 for details.

PS: Be careful setting js/document in node as it is used by prominent js modules to determine whether or not it is run in a nodejs context (bad idea). Consider instead to bind it dynamically:

(async/go
  (let [response (async/<! (http/get "https://www.google.co.uk"))]
    (with-redefs [js/document  (.-document (.-window (jsdom.JSDOM.)))]
      (h/as-hickory (h/parse (:body response))))))

@alza-bitz
Copy link

alza-bitz commented Dec 18, 2018

Hi @njordhov thanks for the help!

(set! hickory.core/Attribute nil) seems to do the trick, but when I additionally try to use the with-redefs style binding I cannot use the surrounding let to define jsdom (I get ReferenceError: jsdom is not defined), so I have to use a global def:

#?(:cljs (require '[cljs.core.async :as async]))
(require #?(:clj '[clj-http.client :as http] :cljs '[cljs-http.client :as http]))
#?(:cljs (require '[cljs.nodejs :as nodejs]))
(require '[hickory.core :as h])
(require '[hickory.select :as hs])

(set! js/XMLHttpRequest (nodejs/require "xhr2"))

#?(:cljs (def jsdom (nodejs/require "jsdom")))

(async/go
  (let [response (async/<! (http/get "https://www.google.co.uk"))]
    (with-redefs [js/Node goog.dom.NodeType
                  js/NodeList goog.dom.NodeList
                  js/document (.-document (.-window (jsdom.JSDOM.)))
                  hickory.core/Attribute nil]
      (h/as-hickory (h/parse (:body response))))))

alza-bitz added a commit to alza-bitz/resident-advisor-ticket-alert that referenced this pull request Dec 18, 2018
By setting hickory.core/Attribute to nil (clj-commons/hickory#33 (comment))
Code restructure to isolate side effects into 'app' layer functions
Use with-redefs instead of set!
@euporos euporos mentioned this pull request Sep 4, 2020
@logseq-cldwalker
Copy link

logseq-cldwalker commented Oct 18, 2022

@slipset Any thoughts on if this PR is ok to land? Looks like the main feedback was addressed and then approved at #33 (comment). Encountered same failure as #17

@borkdude borkdude merged commit ab2e384 into clj-commons:master Nov 20, 2022
@borkdude
Copy link
Contributor

@slipset Needed to merge this before we can move on to running tests on Node.js without a browser. If tests fail we can revert.

@njordhov njordhov deleted the node branch November 20, 2022 21:21
@port19x port19x mentioned this pull request Apr 11, 2023
@danielcompton danielcompton changed the title Compatible with nodejs Add NodeJS compatibility, use getRawTextContent Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet