Output of one join as input to another #5027

Open
philrz opened this issue Feb 12, 2024 · 0 comments

tl;dr

A community zync user had a complex Zed program containing several join operations. While helping them, I wanted to use the output of one join as an input to another join, but the parser currently rejects this.

Details

Repro is with Zed commit 31ff542.

The community zync user's original Zed program was around 100 lines with lots of transformations and other logic, so I've simplified the repro to use the three NDJSON input files from the join tutorial. These can be loaded into a local lake via:

#!/bin/sh
export ZED_LAKE="./lake"
rm -rf "$ZED_LAKE" && zed init
zed create -orderby flavor:asc fruit
zed create -orderby likes:asc people
zed create -orderby name:asc prices
zed load -use fruit fruit.ndjson
zed load -use people people.ndjson
zed load -use prices prices.ndjson

The community zync user in this case prefers what we've been calling the "alternate syntax", so I've kept that syntax here. Let's start with a simple join that matches each fruit's name to the people who like its flavor.

$ cat fruit-to-people.zed 
from (
  pool fruit => cut name,flavor
  pool people
) | inner join on flavor=likes eater:=name

$ zed -version
Version: v1.13.0-11-g31ff5428

$ zed query -lake ./lake -I fruit-to-people.zed
{name:"figs",flavor:"plain",eater:"jessie"}
{name:"dates",flavor:"sweet",eater:"quinn"}
{name:"banana",flavor:"sweet",eater:"quinn"}
{name:"strawberry",flavor:"sweet",eater:"quinn"}
{name:"apple",flavor:"tart",eater:"chris"}
{name:"apple",flavor:"tart",eater:"morgan"}

And let's imagine I'd separately joined the price list with the fruit data to get color detail.

$ cat prices-to-fruit.zed
from prices
| from (
  pass
  pool fruit   // what I really want is to join against the output we got from "fruit-to-people.zed"
) | inner join on name=name color

$ zed query -lake ./lake -I prices-to-fruit.zed
{name:"apple",price:3.15,color:"red"}
{name:"avocado",price:2.5,color:"green"}
{name:"banana",price:4.01,color:"yellow"}
{name:"dates",price:6.7,color:"brown"}
{name:"figs",price:1.6,color:"brown"}
{name:"strawberry",price:1.05,color:"red"}

As the comment indicates, what I'd really like them to be able to do here is put the first join shown in place of the simple "pool fruit", but the parser currently doesn't allow this.

$ cat join-all.zed 
from prices
| from (
  pass
  from (
    pool fruit => cut name,flavor
    pool people
  ) | inner join on flavor=likes eater:=name
) | inner join on name=name color

$ zed query -lake ./lake -I join-all.zed
error parsing Zed in join-all.zed at line 4, column 3:
  from (
= ^ ===
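
To make the intent concrete, here is a rough Python sketch (not Zed) of the kind of result the nested join is after: the records from prices joined on name against the output of the first join. The inner_join helper is hypothetical and written only for this illustration. The input records are copied from the outputs shown above, so the real prices pool may hold more records than listed here. The sketch copies eater from the right-hand side rather than color, on the assumption that the "cut name,flavor" in the inner branch leaves no color field to copy.

```python
# Records copied from the fruit-to-people.zed output shown above.
fruit_to_people = [
    {"name": "figs", "flavor": "plain", "eater": "jessie"},
    {"name": "dates", "flavor": "sweet", "eater": "quinn"},
    {"name": "banana", "flavor": "sweet", "eater": "quinn"},
    {"name": "strawberry", "flavor": "sweet", "eater": "quinn"},
    {"name": "apple", "flavor": "tart", "eater": "chris"},
    {"name": "apple", "flavor": "tart", "eater": "morgan"},
]

# name/price pairs as they appear in the prices-to-fruit.zed output
# (assumption: the actual prices pool may contain additional records).
prices = [
    {"name": "apple", "price": 3.15},
    {"name": "avocado", "price": 2.5},
    {"name": "banana", "price": 4.01},
    {"name": "dates", "price": 6.7},
    {"name": "figs", "price": 1.6},
    {"name": "strawberry", "price": 1.05},
]


def inner_join(left, right, left_key, right_key, copy_fields):
    """Hypothetical helper: emit one record per matching (left, right)
    pair, keeping all left fields and copying copy_fields from the right."""
    # Index the right-hand records by their join key.
    index = {}
    for rec in right:
        index.setdefault(rec[right_key], []).append(rec)
    out = []
    for lrec in left:
        # Left records with no match are dropped (inner join).
        for rrec in index.get(lrec[left_key], []):
            out.append({**lrec, **{f: rrec[f] for f in copy_fields}})
    return out


# The first join's output stands in where "pool fruit" was used before.
joined = inner_join(prices, fruit_to_people, "name", "name", ["eater"])
for rec in joined:
    print(rec)
```

In this sketch avocado drops out because nobody in the first join's output eats it, and apple appears twice, once per eater. Whether Zed would emit records in the same order is not something the sketch tries to model.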

Note

The community zync user's original query is captured in an internal Slack thread dated February 7, 2024. In addition to addressing the specific example shown here, @mccanne expressed an interest in using that user's query as the basis for some other join improvements we've envisioned.
