Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data readers are not being loaded #419

Closed
albersonn opened this issue May 7, 2020 · 15 comments
Closed

Data readers are not being loaded #419

albersonn opened this issue May 7, 2020 · 15 comments

Comments

@albersonn
Copy link

albersonn commented May 7, 2020

v0.0.92

OS X Mojave 10.14.6 (18G4032)

Custom data readers are not being loaded

There are some data readers defined in a project which is in the classpath but I can't access them

repro

  1. Create a custom project with a data readers
  2. Put the custom project in the classpath via environment variable
  3. Put the require for the custom reader functions in the environment variable BABASHKA_PRELOADS
  4. Try to access the data reader via bb "#your/datareader 'something"

expected behavior

  • Call the function bound to the data reader correctly.

Example code

$ tree project
.
└── src
    ├── test
    │   └── readers.clj
    └── data_readers.clj

readers.clj

(ns test.readers
  (:require [clojure.pprint :refer [pprint]]))

(defn tap [x]
  (println (str (with-out-str (pprint x)) "<="))
  x)

data_readers.clj

{
 t/tap test.readers/tap
}
export BABASHKA_CLASSPATH="/Users/my.user/project/src:/Users/my.user/.m2/repository/org/clojure/clojure/1.10.1/clojure-1.10.1.jar:/Users/my.user/.m2/repository/org/clojure/spec.alpha/0.2.176/spec.alpha-0.2.176.jar:/Users/my.user/.m2/repository/org/clojure/core.specs.alpha/0.2.44/core.specs.alpha-0.2.44.jar"
export BABASHKA_PRELOADS="(require 'test.readers)"

$ bb '#t/tap "hi"'
# clojure.lang.ExceptionInfo: [line 1, col 12] No reader function for tag t/tap.
@borkdude borkdude added this to To do in Global board via automation May 12, 2020
@borkdude
Copy link
Collaborator

Note to self: in Clojure the user is responsible for requiring the namespace with data reader functions:

$ clj -e "#foo/bar [1 2 3]"
Execution error (IllegalStateException) at clojure.main/main (main.java:40).
Attempting to call unbound fn: #'foo.bar/reader

$ clj -e "(require '[foo.bar])" -e "#foo/bar [1 2 3]"
read!
"x"
[1 2 3]

So babashka should do the same.

@borkdude
Copy link
Collaborator

Note:

https://github.com/clojure/clojure/blob/8f03ff0078dda9fb3923ff0c4ee5a101570b8485/changes.md#21-reader-literals

Clojure 1.4 supports reader literals, which are data structures tagged by a symbol to denote how they will be read.

When Clojure starts, it searches for files named data_readers.clj at the root of the classpath. Each such file must contain a Clojure map of symbols, like this:

{foo/bar my.project.foo/bar
 foo/baz my.project/baz}

@borkdude
Copy link
Collaborator

@borkdude
Copy link
Collaborator

@albersonn I have tested with discovering all data_readers.clj files on a relatively large classpath.

src:feature-xml:feature-core-async:feature-yaml:feature-csv:feature-transit:feature-java-time:feature-java-nio:sci/src:babashka.curl/src:babashka.pods/src:resources:sci/resources:/Users/borkdude/.m2/repository/com/cognitect/transit-java/1.0.343/transit-java-1.0.343.jar:/Users/borkdude/.m2/repository/org/clojure/clojure/1.10.2-alpha1/clojure-1.10.2-alpha1.jar:/Users/borkdude/.m2/repository/commons-codec/commons-codec/1.10/commons-codec-1.10.jar:/Users/borkdude/.m2/repository/org/clojure/tools.analyzer/1.0.0/tools.analyzer-1.0.0.jar:/Users/borkdude/.m2/repository/org/clojure/tools.logging/0.6.0/tools.logging-0.6.0.jar:/Users/borkdude/.m2/repository/org/clojure/core.specs.alpha/0.2.44/core.specs.alpha-0.2.44.jar:/Users/borkdude/.m2/repository/org/clojure/spec.alpha/0.2.187/spec.alpha-0.2.187.jar:/Users/borkdude/.m2/repository/org/clojure/tools.cli/1.0.194/tools.cli-1.0.194.jar:/Users/borkdude/.m2/repository/org/clojure/tools.analyzer.jvm/1.0.0/tools.analyzer.jvm-1.0.0.jar:/Users/borkdude/.m2/repository/borkdude/graal.locking/0.0.2/graal.locking-0.0.2.jar:/Users/borkdude/.m2/repository/com/fasterxml/jackson/dataformat/jackson-dataformat-cbor/2.10.2/jackson-dataformat-cbor-2.10.2.jar:/Users/borkdude/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1.jar:/Users/borkdude/.m2/repository/org/flatland/ordered/1.5.9/ordered-1.5.9.jar:/Users/borkdude/.m2/repository/org/postgresql/postgresql/42.2.12/postgresql-42.2.12.jar:/Users/borkdude/.m2/repository/fipp/fipp/0.6.22/fipp-0.6.22.jar:/Users/borkdude/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.10.2/jackson-core-2.10.2.jar:/Users/borkdude/.m2/repository/org/yaml/snakeyaml/1.25/snakeyaml-1.25.jar:/Users/borkdude/.m2/repository/org/ow2/asm/asm/5.2/asm-5.2.jar:/Users/borkdude/.gitlibs/libs/clj-commons/conch/9aa7724e925cb8bf163e0b62486dd420b84e5f0b/src:/Users/borkdude/.m2/repository/org/javassist/javassist/3.18.1-GA/javassist-3.18.1-GA.jar:/Users/borkdude/.m2/repository/seancorfield/next.jdbc/1.0.424/next.jdbc-1.0.424.jar:/Users/borkdude/.m2/repository/org/clojure/data.xml/0.2.0-alpha6/data.xml-0.2.0-alpha6.jar:/Users/borkdude/.m2/repository/org/msgpack/msgpack/0.6.12/msgpack-0.6.12.jar:/Users/borkdude/.m2/repository/borkdude/edamame/0.0.11-alpha.9/edamame-0.0.11-alpha.9.jar:/Users/borkdude/.m2/repository/org/clojure/data.csv/1.0.0/data.csv-1.0.0.jar:/Users/borkdude/.m2/repository/com/cognitect/transit-clj/1.0.324/transit-clj-1.0.324.jar:/Users/borkdude/.m2/repository/clj-commons/clj-yaml/0.7.1/clj-yaml-0.7.1.jar:/Users/borkdude/.m2/repository/org/clojure/core.rrb-vector/0.1.1/core.rrb-vector-0.1.1.jar:/Users/borkdude/.m2/repository/persistent-sorted-set/persistent-sorted-set/0.1.2/persistent-sorted-set-0.1.2.jar:/Users/borkdude/.m2/repository/cheshire/cheshire/5.10.0/cheshire-5.10.0.jar:/Users/borkdude/.m2/repository/tigris/tigris/0.1.2/tigris-0.1.2.jar:/Users/borkdude/.m2/repository/org/clojure/tools.reader/1.3.2/tools.reader-1.3.2.jar:/Users/borkdude/.m2/repository/datascript/datascript/0.18.11/datascript-0.18.11.jar:/Users/borkdude/.m2/repository/org/hsqldb/hsqldb/2.4.0/hsqldb-2.4.0.jar:/Users/borkdude/.m2/repository/org/clojure/core.memoize/0.8.2/core.memoize-0.8.2.jar:/Users/borkdude/.m2/repository/org/clojure/data.priority-map/0.0.7/data.priority-map-0.0.7.jar:/Users/borkdude/.m2/repository/org/clojure/java.data/1.0.64/java.data-1.0.64.jar:/Users/borkdude/.m2/repository/borkdude/sci.impl.reflector/0.0.1/sci.impl.reflector-0.0.1.jar:/Users/borkdude/.m2/repository/nrepl/bencode/1.1.0/bencode-1.1.0.jar:/Users/borkdude/.m2/repository/org/clojure/core.cache/0.8.2/core.cache-0.8.2.jar:/Users/borkdude/.m2/repository/org/clojure/core.async/1.1.587/core.async-1.1.587.jar:/Users/borkdude/.m2/repository/com/fasterxml/jackson/dataformat/jackson-dataformat-smile/2.10.2/jackson-dataformat-smile-2.10.2.jar:/Users/borkdude/.m2/repository/org/clojure/data.codec/0.1.0/data.codec-0.1.0.jar:/Users/borkdude/.m2/repository/javax/xml/bind/jaxb-api/2.3.0/jaxb-api-2.3.0.jar

This takes about 14-30 ms. So I don't think it would be acceptable to discover them at babashka startup time unless I can find a way to optimize this.

Possible strategies:

  • Lazy approach: only discover these files when data readers are actually needed.
    Problem: in the above example we already need to know these readers during code parsing time.
  • Ask the user to declare all data_reader.clj files merged into one file
  • ...

@borkdude
Copy link
Collaborator

borkdude commented May 13, 2020

Possible optimization:

Currently we're walking the entire jar files when we search something on the classpath, but this might not be necessary when we already know the exact filename data_readers.clj or data_readers.cljc. Possibly it's faster to get it through getEntry or getJarEntry on JarFile. I'll try this later.

This turned out to help. For the above classpath it now takes 8ms. Further optimization may be to combine the search for .bb, .clj and .cljc in one function call while having the .jar file opened.

@borkdude
Copy link
Collaborator

The further optimization from the last comment didn't seem to help that much in the data loaders case, but it did help in general when finding things on the classpath.

For a smaller classpath like:

$ export BABASHKA_CLASSPATH=src:/Users/borkdude/.m2/repository/org/clojure/clojure/1.10.1/clojure-1.10.1.jar:/Users/borkdude/.m2/repository/doric/doric/0.9.0/doric-0.9.0.jar:/Users/borkdude/.m2/repository/org/clojure/spec.alpha/0.2.176/spec.alpha-0.2.176.jar:/Users/borkdude/.m2/repository/org/clojure/core.specs.alpha/0.2.44/core.specs.alpha-0.2.44.jar

searching for data_readers takes about 3ms:

$ ./bb -e '(+ 1 2 3)'
()
"Elapsed time: 2.822865 msecs"
nil
6

Maybe that's ok enough. When people do not use a classpath they are not bothered by this slowdown.

@albersonn
Copy link
Author

@borkdude maybe this can be a flag in bb command? 🤔
So even if people are loading a big classpath, if they wont use data readers they won't be affected.

@borkdude
Copy link
Collaborator

borkdude commented May 14, 2020

@albersonn That might be a good idea since data readers haven't been that common yet. Will the data reader configuration also be provided using this flag or does babashka still scan the classpath itself?

I'll still have to figure out how to implement data readers since edamame (the parser) needs to know about them while a program is executed so it needs to have some kind of hook. Something for later this week.

I guess babashka or sci has to have a *data-readers* dynamic variable that is added on to while evaluating code.

In Clojure I notice that it populates this dynamic var before you require any namespace:

$ clj
Clojure 1.10.2-alpha1
user=> *data-readers*
{ordered/set #'flatland.ordered.set/into-ordered-set, ordered/map #'flatland.ordered.map/ordered-map, datascript/Datom #'datascript.db/datom-from-reader, datascript/DB #'datascript.db/db-from-reader, xml/ns #'clojure.data.xml.name/uri-symbol, xml/element #'clojure.data.xml.node/tagged-element}
user=> #'flatland.ordered.map/ordered-map
#'flatland.ordered.map/ordered-map
user=> flatland.ordered.map/ordered-map
#object[clojure.lang.Var$Unbound 0x5d5f10b2 "Unbound: #'flatland.ordered.map/ordered-map"]
user=> (find-ns 'flatland.ordered.map)
#object[clojure.lang.Namespace 0x2d96543c "flatland.ordered.map"]

So it already creates these vars and namespaces but leaves the vars unbound until they are actually populated by requiring the namespaces.

@borkdude
Copy link
Collaborator

@albersonn One compromise might be to make a function in babashka.classpath, e.g. babashka.classpath/scan-data-readers which will populate *data-readers* with the readers found on the classpath.

So a program would look like:

(require '[babashka.classpath :as cp])
(cp/scan-data-readers) ;;=> takes about 3-8ms

and then the rest would be the same as in Clojure.

@albersonn
Copy link
Author

@borkdude I guess the function in babashka classpath looks better than the switch. It'll be very declarative of what scripts uses/loads data readers. This will avoid some bugs/unexpected behaviours if you forget the switch and will be (kind of) faster to debug slow scripts. I like it.

@borkdude
Copy link
Collaborator

borkdude commented May 14, 2020

Another alternative is to let people set data-readers themselves by doing something like:

(require '[test.readers])
(alter-var-root *data-readers* assoc 't/tap #'test.readers/tap)

This would make it compatible with clojure as well and would not incur any startup penalty.

@borkdude
Copy link
Collaborator

It seems we must use set! for this:

user=> (alter-var-root #'*data-readers* assoc 't/tap #'inc)
#:t{tap #'clojure.core/inc}
user=> #t/tap 1
Syntax error reading source at (REPL:10:0).
No reader function for tag t/tap
user=> (set! *data-readers* {'t/tap #'inc})
#:t{tap #'clojure.core/inc}
user=> #t/tap 1
2

@borkdude
Copy link
Collaborator

I don't know if the last mentioned alternative will always work since theoretically a namespace with the data reader function in it can already start using it in that very namespace.

borkdude added a commit that referenced this issue May 15, 2020
@borkdude
Copy link
Collaborator

borkdude commented May 15, 2020

@albersonn With babashka master this is now possible:

bb "(set! *data-readers* {'t/tag inc}) #t/tag 1"
2

Can you test if this suffices for your use case? Do you need a binary or can you build yourself? You can also run with the JVM like:

lein bb "(set! *data-readers* {'t/tag inc}) #t/tag 1"
2

or:

clojure -A:main "(set! *data-readers* {'t/tag inc}) #t/tag 1"
2

@albersonn
Copy link
Author

@borkdude I tested and this solution looks good enough for me. Thanks!

Global board automation moved this from To do to Done May 15, 2020
@borkdude borkdude removed this from Done in Global board Jan 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants