First class Jupyter notebook integration #13016

Closed
bartlomieju opened this issue Dec 7, 2021 · 22 comments · Fixed by #20337
@bartlomieju
Member

bartlomieju commented Dec 7, 2021

Jupyter notebooks are very popular. They provide a rich, interactive environment for development.

There are numerous kernels that support JavaScript and TypeScript:

Prompted by a discussion with @apowers313, who's working on a kernel for Deno (https://github.com/apowers313/ideno), I propose we add first-class support for Jupyter in Deno (with a deno jupyter subcommand).

I'd argue that first-class support for Jupyter would open Deno to the whole community of people using Jupyter notebooks, give that community powerful new tools (after all, Deno supports WebGPU out of the box), and could significantly help machine learning applications of Deno.

This proposal is motivated by several things. Firstly, Deno originated from a similar idea to Jupyter, called PropelML, that never fully materialized. Secondly, @apowers313 will have to integrate with the V8 inspector protocol to provide kernel functionality. Currently Deno doesn't have a programmatic API to interact with the inspector, so integrating over WebSocket will require quite an effort. Additionally, most of the functionality that a kernel has to provide is already working in the REPL. In fact, most of the REPL functionality could be reused in the kernel; we would have to add communication protocol APIs to integrate with the kernel.

@kitsonk was eyeing an implementation of the kernel in Q2/Q3, but that never materialized due to other, more pressing work. I'll be happy to spearhead the effort, as it seems like a very fun project to work on.

bartlomieju added the "suggestion" (suggestions for new features, yet to be agreed) and "user feedback wanted" (feedback from the community is desired) labels Dec 7, 2021
@apowers313
Contributor

apowers313 commented Dec 8, 2021

Roadmap to creating a kernel:

  • create a kernel spec
    • see jupyter kernelspec list for examples and jupyter kernelspec install to install
    • run jupyter notebook -- if installed correctly, it will show up under "New" in the top right corner of the Jupyter web browser
    • your kernel will be started as a command line application with the arguments specified in the kernel spec. it won't start until you select "new" in the menu
  • create zeromq connections
    • the kernel spec will specify a {connection_file} that gets converted to a JSON connection file describing IP / ports to connect to
    • the first messages received will be "kernel_info_request" and "comm_info_request" on the shell zmq Dealer connection (examples of the packets can be found here)
    • you can set Jupyter into debug mode so that jupyter notebook prints the packets it sends / receives:
      • create a config file using jupyter notebook --generate-config
      • set the following options in the config:
        • c.Application.log_level = 'DEBUG'
        • c.JupyterApp.log_level = 'DEBUG'
        • c.NotebookApp.log_level = 'DEBUG'
        • c.Session.debug = True
  • next create the IOPub zmq Publisher to send "busy" and "idle" packets
  • next handle the "execute_request" message, which sends code from the front-end to the kernel
    • when a user selects "restart and run all" in the front end, I think it sends all the Jupyter cells at once, so you will have to queue execution requests and run them one at a time
    • each execution request sends multiple replies, and each reply is expected to embed the original packet header in the reply as a parentHeader, so you'll have to keep state for which task is currently running
    • you will need to capture stdout and stderr from the kernel and send them back to the front end as stream messages
  • implement "kernel_shutdown" and "kernel_restart" on the Control zmq Dealer connection
    • note that the "kernel_shutdown" request has a "restart" option, which presumes that you are starting up a clean JS environment. I have no idea how this is going to work if the kernel and the execution context are running in the same Deno instance.
    • I also have no idea how you are going to interrupt Deno mid-execution.
  • implement display data to send PNG, SVG, HTML, JSON back to the browser to be rendered in the front end
    • I think these should automatically render for objects that have Symbols on them, similar to toStringTag. e.g. Symbol.toPngTag. Maybe TC39 worthy?
  • implement an interpreter to parse out line magics and cell magics
    • pay special attention to:
      • automagic, which turns off requiring "%" at the front of magics. instead of requiring a magic like "%ls" the user can just type "ls"
      • !cmd command execution
      • magic assignment like "output = %ls"
      • {var} substitution
      • inline documentation and inspection like "?" and "??". I dream of a Symbol.toDocTag on Objects that contains documentation (maybe populated by JSDoc comments) or URLs to documentation, similar to Python's docstrings. Might be a TC39 proposal?
      • input and output caching in In[n] and Out[n] (also _, __, and ___)
    • feel free to steal magics or the interpreter from magicpatch.
    • there should probably be an API to enable users to add their own magics.
  • implement introspection and completion
  • maybe implement code completeness which is only used by command line Jupyter front ends to determine when to execute code
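
For anyone following along, here is a minimal TypeScript sketch of the first two roadmap items: the kernel spec and the connection file it points at. The field names follow the Jupyter kernelspec and connection-file formats as I understand them. The `my-kernel` executable, its `--connection-file` flag, the `endpoint` helper, and the ports and key in the example are all made up for illustration.

```typescript
// Shape of kernel.json (the kernel spec). Jupyter replaces the
// "{connection_file}" placeholder in argv with the path to a JSON file.
interface KernelSpec {
  argv: string[];
  display_name: string;
  language: string;
}

// Hypothetical spec: "my-kernel" and its flag are placeholders, not a real CLI.
const spec: KernelSpec = {
  argv: ["my-kernel", "--connection-file", "{connection_file}"],
  display_name: "My Kernel",
  language: "typescript",
};

// Fields of the connection file that Jupyter writes for the kernel.
interface ConnectionInfo {
  transport: string; // usually "tcp"
  ip: string;
  shell_port: number;
  iopub_port: number;
  stdin_port: number;
  control_port: number;
  hb_port: number;
  key: string; // key used to sign messages
  signature_scheme: string; // e.g. "hmac-sha256"
}

type Channel = "shell" | "iopub" | "stdin" | "control" | "hb";

// Build the ZeroMQ endpoint string for one channel (hypothetical helper).
function endpoint(info: ConnectionInfo, channel: Channel): string {
  const ports: Record<Channel, number> = {
    shell: info.shell_port,
    iopub: info.iopub_port,
    stdin: info.stdin_port,
    control: info.control_port,
    hb: info.hb_port,
  };
  return `${info.transport}://${info.ip}:${ports[channel]}`;
}

// Example connection file contents (made-up ports and key).
const example: ConnectionInfo = JSON.parse(`{
  "transport": "tcp", "ip": "127.0.0.1",
  "shell_port": 53001, "iopub_port": 53002, "stdin_port": 53003,
  "control_port": 53004, "hb_port": 53005,
  "key": "not-a-real-key", "signature_scheme": "hmac-sha256"
}`);

console.log(endpoint(example, "shell"));
```

A kernel would read the file named on its command line, then open one socket per channel at the endpoints computed this way.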

Sorry, I realize that's a lot... hopefully it's helpful.
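
The "execute_request" item is probably the least obvious one, so here is a rough sketch of it under some assumptions: requests are chained so that cells run one at a time, and every reply carries the request header as its parent header. The `ExecutionQueue` class, the `Send` and `Run` callbacks, and the reply ids are hypothetical stand-ins. Message signing and the real ZeroMQ transport are left out entirely.

```typescript
interface Header {
  msg_id: string;
  msg_type: string;
  session: string;
}

interface ExecuteRequest {
  header: Header;
  content: { code: string };
}

interface KernelReply {
  header: Header;
  parent_header: Header; // header of the request being answered
  content: Record<string, unknown>;
}

type Channel = "shell" | "iopub";
type Send = (channel: Channel, msg: KernelReply) => void; // stand-in transport
type Run = (code: string) => Promise<string>; // resolves to captured stdout

class ExecutionQueue {
  private tail: Promise<void> = Promise.resolve();
  private replyCount = 0;
  private executionCount = 0;
  private send: Send;
  private run: Run;

  constructor(send: Send, run: Run) {
    this.send = send;
    this.run = run;
  }

  // Chain each request onto the previous one so cells run one at a time.
  enqueue(request: ExecuteRequest): Promise<void> {
    this.tail = this.tail.then(() => this.handle(request));
    return this.tail;
  }

  private reply(channel: Channel, msgType: string, parent: Header, content: Record<string, unknown>): void {
    const header = { msg_id: `reply-${this.replyCount++}`, msg_type: msgType, session: parent.session };
    this.send(channel, { header, parent_header: parent, content });
  }

  private async handle(request: ExecuteRequest): Promise<void> {
    const parent = request.header;
    const execution_count = ++this.executionCount;
    this.reply("iopub", "status", parent, { execution_state: "busy" });
    try {
      const text = await this.run(request.content.code);
      if (text !== "") this.reply("iopub", "stream", parent, { name: "stdout", text });
      this.reply("shell", "execute_reply", parent, { status: "ok", execution_count });
    } catch (err) {
      this.reply("shell", "execute_reply", parent, { status: "error", execution_count, evalue: String(err) });
    }
    this.reply("iopub", "status", parent, { execution_state: "idle" });
  }
}

// Usage with a fake transport and a fake evaluator.
const sent: string[] = [];
const queue = new ExecutionQueue(
  (channel, msg) => { sent.push(`${channel} ${msg.header.msg_type} for ${msg.parent_header.msg_id}`); },
  async (code) => `ran: ${code}\n`,
);
const makeRequest = (id: string, code: string): ExecuteRequest => ({
  header: { msg_id: id, msg_type: "execute_request", session: "s1" },
  content: { code },
});
const done = Promise.all([
  queue.enqueue(makeRequest("a", "1 + 1")),
  queue.enqueue(makeRequest("b", "2 + 2")),
]);
```

Chaining promises keeps the state question simple: while a cell runs, the only "current" parent header is the one captured by that `handle` call.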

@bartlomieju
Member Author

A bit of a hurdle in the integration is that the only crate providing async integration with ZeroMQ is currently marked as unstable and not recommended for production use: https://github.com/zeromq/zmq.rs

This crate builds on top of https://github.com/erickt/rust-zmq, which provides sync bindings (which might not be a big deal), but it seems its build process might be quite involved.

I will do some more research on this topic before proceeding.

@apowers313 thank you for providing the roadmap, this is very helpful!

@apowers313
Contributor

If you get painted into a corner, Jupyter appears to require a very small subset of ZMQ: it appears to only use NULL security, sends ~4 packets to negotiate the session, and then has a control / length header for each data chunk. There's a Wireshark plugin for ZMQ if you want to see how it works. (Note: I had to use an older commit to get the plugin to work.)
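
To illustrate the "control / length header for each data chunk" part, here is a small sketch of frame encoding, assuming the ZMTP 3.x framing rules as I understand them: one flags byte, then a 1-byte size for bodies up to 255 bytes or an 8-byte big-endian size otherwise. `encodeFrame` and `encodeMessage` are hypothetical names. The greeting and handshake that precede these frames are not covered.

```typescript
const FLAG_MORE = 0x01; // another frame of the same message follows
const FLAG_LONG = 0x02; // body size is encoded in 8 bytes instead of 1

// Encode a single frame: flags byte, size, then the body.
function encodeFrame(body: Uint8Array, more: boolean): Uint8Array {
  const isLong = body.length > 255;
  const headerLength = isLong ? 9 : 2;
  const frame = new Uint8Array(headerLength + body.length);
  frame[0] = (more ? FLAG_MORE : 0) | (isLong ? FLAG_LONG : 0);
  if (isLong) {
    // DataView writes big-endian (network order) by default.
    const view = new DataView(frame.buffer);
    view.setUint32(1, Math.floor(body.length / 2 ** 32)); // high 4 bytes
    view.setUint32(5, body.length % 2 ** 32); // low 4 bytes
  } else {
    frame[1] = body.length;
  }
  frame.set(body, headerLength);
  return frame;
}

// A multipart message sets the MORE flag on every frame except the last.
function encodeMessage(parts: Uint8Array[]): Uint8Array[] {
  return parts.map((part, i) => encodeFrame(part, i < parts.length - 1));
}

// Usage: a short part followed by a 300-byte part.
const frames = encodeMessage([new Uint8Array([104, 105]), new Uint8Array(300)]);
console.log(frames.map((f) => f.length));
```

Decoding is the mirror image: read the flags byte, pick the size width from the LONG bit, then read that many body bytes.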

@bartlomieju
Member Author

@apowers313 perfect!

@apowers313
Contributor

I'd like some feedback on where / how to implement the user-facing Jupyter API for the Deno Jupyter kernel. This would be the API for users to display charts / images, display object-specific documentation, add new magics, etc.

I think regardless we will want a Deno.core.jupyter interface, which will only be instantiated when Deno is running in Jupyter (useful for feature detection), and that interface will have Deno.core.jupyter.display(mimeType, data) for rendering and saving formatted data in Jupyter. Similarly, it would have Deno.core.jupyter.addmagic(name, fn) for user-implemented functions. This enables users to import modules that will detect Jupyter and implement new functionality (similar to how %matplotlib works in Python's Jupyter today).
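
A rough sketch of how that could look from a library author's side, with everything hypothetical: `JupyterApi` mirrors the proposed display / addmagic pair, `FakeJupyter` is a stand-in for whatever the kernel would provide, and the `%echo` magic exists only for the example. None of this is a shipped Deno API.

```typescript
type MagicFn = (args: string) => string;

// Hypothetical shape of the proposed interface.
interface JupyterApi {
  display(mimeType: string, data: Uint8Array): void;
  addmagic(name: string, fn: MagicFn): void;
}

// Stand-in for the kernel side, so the sketch can run anywhere.
class FakeJupyter implements JupyterApi {
  readonly magics = new Map<string, MagicFn>();
  readonly outputs: Array<{ mimeType: string; data: Uint8Array }> = [];

  display(mimeType: string, data: Uint8Array): void {
    this.outputs.push({ mimeType, data });
  }

  addmagic(name: string, fn: MagicFn): void {
    this.magics.set(name, fn);
  }

  // Returns the magic's result, or undefined when the line is ordinary code.
  runLine(line: string): string | undefined {
    const match = /^%(\w+)\s*(.*)$/.exec(line.trim());
    if (!match) return undefined;
    const fn = this.magics.get(match[1]);
    if (!fn) throw new Error(`unknown magic: %${match[1]}`);
    return fn(match[2]);
  }
}

// A library feature-detects the interface before using it.
function installEchoMagic(jupyter: JupyterApi | undefined): boolean {
  if (jupyter === undefined) return false; // not running under Jupyter
  jupyter.addmagic("echo", (args) => args.toUpperCase());
  return true;
}

// Usage: outside Jupyter nothing happens; inside, the magic is registered.
const installedOutside = installEchoMagic(undefined);
const kernel = new FakeJupyter();
const installedInside = installEchoMagic(kernel);
const magicResult = kernel.runLine("%echo hello");
const codeResult = kernel.runLine("1 + 1");
console.log(installedOutside, installedInside, magicResult, codeResult);
```

The point of the sketch is the `undefined` check: a module can be imported anywhere and only wires itself up when the interface is actually present.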

Requested Feedback 1: I'd be interested if anyone objects to Deno.core.jupyter as a direction.

The part where design decisions are needed is an interface / protocol for Objects to automatically convert them to structured data types. For example, if a user returns an object implementing Foo.toPng that function should be called and the returned data should be rendered as a PNG.

Requested Feedback 2: Four options for how to do this:

  1. Foo.toPng() -- Seems antiquated and potentially has namespace conflicts since it isn't Symbol based
  2. Foo[Symbol(Deno.toPng)] -- Deno-wide specific decoding of Objects, similar to Deno.customInspect. This allows the entire Deno ecosystem to benefit from this feature, not just Jupyter and eventually enables whatever comes after Jupyter or other new innovations.
  3. Foo[Symbol(Deno.core.jupyter.toPng)] -- Jupyter only symbols, not nearly as useful but keeps them out of the rest of Deno if people don't think this functionality is going to be broadly useful.
  4. Foo[Symbol(Symbol.toPng)] -- Requires modifying Symbol, similar to toStringTag, but potentially benefits all of JS. Might require a TC39 proposal to ensure that Deno doesn't drift from ECMAScript specs.

Thanks!

@apowers313
Contributor

I just checked in a proposed API for Jupyter display:

  • display(mimeType, uint8Buf, opts)
  • displayPngFile(path, opts)
  • displayPng(buf, opts)
  • displayFile(path) -- guesses file type based on file extension

I'm trying to decide if it would be more convenient to overload displayPng with all the different types it could support (buf, file path, stream, whatever tomorrow's thing is...) or if it's better to have different function calls for each input type. Any thoughts would be appreciated.

@bartlomieju
Member Author

> I think regardless we will want a Deno.core.jupyter interface, which will only be instantiated when Deno is running in Jupyter (useful for feature detection), and that interface will have Deno.core.jupyter.display(mimeType, data) for rendering and saving formatted data in Jupyter. Similarly, it would have Deno.core.jupyter.addmagic(name, fn) for user-implemented functions. This enables users to import modules that will detect Jupyter and implement new functionality (similar to how %matplotlib works in Python's Jupyter today).
>
> Requested Feedback 1: I'd be interested if anyone objects to Deno.core.jupyter as a direction.
>
> The part where design decisions are needed is an interface / protocol for Objects to automatically convert them to structured data types. For example, if a user returns an object implementing Foo.toPng that function should be called and the returned data should be rendered as a PNG.

Sounds good to me, but it should be Deno.jupyter namespace instead of Deno.core.jupyter.

> 1. Foo.toPng() -- Seems antiquated and potentially has namespace conflicts since it isn't Symbol based
> 2. Foo[Symbol(Deno.toPng)] -- Deno-wide specific decoding of Objects, similar to Deno.customInspect. This allows the entire Deno ecosystem to benefit from this feature, not just Jupyter and eventually enables whatever comes after Jupyter or other new innovations.
> 3. Foo[Symbol(Deno.core.jupyter.toPng)] -- Jupyter only symbols, not nearly as useful but keeps them out of the rest of Deno if people don't think this functionality is going to be broadly useful.
> 4. Foo[Symbol(Symbol.toPng)] -- Requires modifying Symbol, similar to toStringTag, but potentially benefits all of JS. Might require a TC39 proposal to ensure that Deno doesn't drift from ECMAScript specs.

In this case I think we should use something like Symbol.for("Deno.jupyter") similar to Symbol.for("Deno.customInspect").
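
A minimal sketch of what that could look like, assuming the hook is a method stored under the registered symbol and that it returns a map from MIME type to content. The `toMimeBundle` helper, the `MimeBundle` type, and the `circle` example are hypothetical; the thread has not settled on what the hook should return.

```typescript
// Registered symbol, following the Symbol.for("Deno.jupyter") suggestion.
const JUPYTER_DISPLAY = Symbol.for("Deno.jupyter");

// Assumed return shape of the hook: MIME type -> content.
type MimeBundle = Record<string, string>;

// Ask a value for its rich representation, falling back to plain text.
function toMimeBundle(value: unknown): MimeBundle {
  if (typeof value === "object" && value !== null) {
    const hook = (value as Record<symbol, unknown>)[JUPYTER_DISPLAY];
    if (typeof hook === "function") {
      return hook.call(value) as MimeBundle;
    }
  }
  return { "text/plain": String(value) };
}

// Example object that opts in to rich display (hypothetical).
function circle(radius: number) {
  return {
    radius,
    [JUPYTER_DISPLAY](): MimeBundle {
      const size = radius * 2;
      return {
        "image/svg+xml":
          `<svg xmlns="http://www.w3.org/2000/svg" width="${size}" height="${size}">` +
          `<circle cx="${radius}" cy="${radius}" r="${radius}"/></svg>`,
      };
    },
  };
}

console.log(toMimeBundle(42));
console.log(toMimeBundle(circle(5)));
```

Because Symbol.for uses the global symbol registry, a library can define the hook without importing anything from Deno, which is the same property that makes Symbol.for("Deno.customInspect") convenient.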

> I just checked in a proposed API for Jupyter display:
>
>   • display(mimeType, uint8Buf, opts)
>   • displayPngFile(path, opts)
>   • displayPng(buf, opts)
>   • displayFile(path) -- guesses file type based on file extension
>
> I'm trying to decide if it would be more convenient to overload displayPng with all the different types it could support (buf, file path, stream, whatever tomorrow's thing is...) or if it's better to have different function calls for each input type. Any thoughts would be appreciated.

I believe the "overload" approach would be better in this case - we already use this approach in numerous Deno APIs.

Deno.jupyter.display(mimeType: string, buf: Uint8Array, opts);
Deno.jupyter.displayPng(pathOrBuf: string | Uint8Array, opts);
Deno.jupyter.displayFile(path: string);

These seem preferable. What are the opts that could be used for displaying files?
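
To make the "overload" idea concrete, a sketch under a few assumptions: the functions return a description of what would be displayed instead of talking to Jupyter, file reading is injected as a `readFile` callback so the example does not depend on any runtime API, and `opts` is left out. `DisplayData`, `ReadFile`, `guessMimeType`, and the extension table are hypothetical.

```typescript
interface DisplayData {
  mimeType: string;
  data: Uint8Array;
}

// Injected file reader, so the sketch does not depend on a runtime API.
type ReadFile = (path: string) => Uint8Array;

// One entry point that accepts either a path or raw bytes.
function displayPng(pathOrBuf: string | Uint8Array, readFile: ReadFile): DisplayData {
  const data = typeof pathOrBuf === "string" ? readFile(pathOrBuf) : pathOrBuf;
  return { mimeType: "image/png", data };
}

// Small, illustrative extension table for displayFile.
const MIME_BY_EXTENSION = new Map<string, string>([
  ["png", "image/png"],
  ["svg", "image/svg+xml"],
  ["html", "text/html"],
  ["json", "application/json"],
]);

// Guess the MIME type from the file extension, if it is a known one.
function guessMimeType(path: string): string | undefined {
  const dot = path.lastIndexOf(".");
  if (dot < 0) return undefined;
  return MIME_BY_EXTENSION.get(path.slice(dot + 1).toLowerCase());
}

function displayFile(path: string, readFile: ReadFile): DisplayData {
  const mimeType = guessMimeType(path);
  if (mimeType === undefined) throw new Error(`cannot guess file type of ${path}`);
  return { mimeType, data: readFile(path) };
}

// Usage with a fake reader that returns the first bytes of a PNG signature.
const fakeRead: ReadFile = () => new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
const fromPath = displayPng("chart.png", fakeRead);
const fromBuffer = displayPng(new Uint8Array([1, 2, 3]), fakeRead);
const fromFile = displayFile("report.HTML", fakeRead);
console.log(fromPath.mimeType, fromBuffer.data.length, fromFile.mimeType);
```

The `typeof` check is all the dispatch the overload needs today; a stream input would add one more branch without changing the call sites.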

bartlomieju removed this from the 1.18.0 milestone Jan 19, 2022
@tif-calin

Are this and IDeno still being worked on?

@apowers313
Contributor

Nope, I stopped working on IDeno in favor of the built-in Jupyter kernel. The built-in kernel stalled out because the ZMQ library we were using had some bugs.

IDeno was mostly functional, happy to pass the baton if anyone wants to pick it up.

@bartlomieju
Member Author

Hey @tif-calin, @apowers313! We had the kernel mostly working, but that ZMQ library bug was quite serious and happened very often (it manifested itself every 3-4 connections). If there were a different library we could use, we should be able to revive that PR without much trouble, and it still seems like a great feature for many people.

@acrodrig

I would be very interested in seeing this happening. Any way I can help?

@apowers313
Contributor

> I would be very interested in seeing this happening. Any way I can help?

Fix the Rust ZMQ library? :)

@acrodrig

Do you have a specific bug that needs to be fixed? Is it filed somewhere?

@apowers313
Contributor

> Do you have a specific bug that needs to be fixed? Is it filed somewhere?

zeromq/zmq.rs#153

@apowers313
Contributor

Also, for a mostly working non-Rust version of a kernel: https://github.com/apowers313/ideno

@rgbkrk
Contributor

rgbkrk commented Aug 30, 2023

Love this. Where has the work coalesced?

Background: I'm a longtime Jupyter, IPython, and ZeroMQ maintainer. I'd love to help steward this work.

@apowers313
Contributor

@rgbkrk still stuck on this bug as far as I can tell: zeromq/zmq.rs#153

@bartlomieju
Member Author

Hey @rgbkrk, thanks for stopping by. So @apowers313 and I had a PR that was quite close to landing (#13122). Unfortunately, the bug above caused it to be very flaky (2 out of 3 times you opened a notebook, it resulted in a Broken pipe error). Besides that, the PR was more or less ready to land.

We recently discussed this feature with @crowlKats and @dsherret, and we'd like to resurrect the PR. We were thinking of maybe rewriting the parts of zmq.rs that are necessary for a Jupyter kernel purely in Rust and with Tokio integration in mind. If you have other ideas, I'd be more than happy to hear them!

@rgbkrk
Contributor

rgbkrk commented Aug 30, 2023

Looks like I'm going to have to learn Rust. While it might not be the best option, you might get reliability more quickly by building on top of libzmq, even though it would pale in comparison to native Rust bindings. As far as I can tell, there's a lot to be tested within zmq.rs.

I'm curious if this jupyter rust kernel ran into the same issues. Have you all checked that out too?

@bartlomieju
Member Author

> I'm curious if this jupyter rust kernel ran into the same issues. Have you all checked that out too?

I did not know about that project. I can certainly check it.

Let me get my PR rebased and reopened so we can discuss over code.

@bartlomieju
Member Author

Opened #20337 that is rebased against main.

@bartlomieju
Member Author

FYI, it looks like the PR above works quite nicely with notebook integration in VSCode, but all hell breaks loose when I try it with jupyter notebook. I think the PR is quite close to being landable, it probably needs 5-10h of work to polish it and release.

bartlomieju added a commit that referenced this issue Sep 16, 2023
This commit adds "deno jupyter" subcommand which
provides a Deno kernel for Jupyter notebooks.

The implementation is mostly based on Deno's REPL and
reuses large parts of it (though there's some clean up that
needs to happen in follow-up PRs). Not all functionality of a
Jupyter kernel is implemented and some message types
are still not implemented (e.g. "inspect_request"), but
the kernel is fully working and provides all the capabilities
that the Deno REPL has; including TypeScript transpilation
and npm packages support.

Closes #13016

---------

Co-authored-by: Adam Powers <apowers@ato.ms>
Co-authored-by: Kyle Kelley <rgbkrk@gmail.com>