# TPC-H

TPC-H is a decision support benchmark that offers business-oriented ad hoc queries.
More information is available on the [TPC-H homepage](http://www.tpc.org/tpch).

The queries are originally written in SQL; here they are implemented using the pandas API.

By default, the runs use Bodo, so the data is distributed in chunks across processes.
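
To make "distributed in chunks" concrete, here is a minimal plain-Python sketch of one common way to split rows into contiguous blocks across processes. The helper `block_range` is hypothetical and only illustrates the idea; it is not part of Bodo's API, and Bodo's actual partitioning may differ.

```python
def block_range(n_rows, n_ranks, rank):
    """Return the [start, stop) row range owned by `rank` (hypothetical helper)."""
    base, extra = divmod(n_rows, n_ranks)
    # The first `extra` ranks each take one additional row.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Example: 23 rows split over 10 processes.
ranges = [block_range(23, 10, r) for r in range(10)]
print(ranges[0], ranges[-1])
```

Each process then works only on its own block, which is why per-rank output can appear in any order.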

The dataset size is 2GB.

A larger 4GB dataset is available at `s3://bodo-example-data/tpch/s4/`.


**To scale and run your application on multiple nodes, you can use the [Bodo platform](https://platform.bodo.ai/account/login).**

The following code imports `bodo` and verifies that the IPyParallel cluster is set up correctly:

In [1]:
import bodo
import time
import pandas as pd

@bodo.jit
def hello_world():
    with bodo.no_warning_objmode:    
        print(f"Hello World from rank {bodo.get_rank()}. Total ranks={bodo.get_size()}")

hello_world()



Hello World from rank 6. Total ranks=10
Hello World from rank 5. Total ranks=10
Hello World from rank 1. Total ranks=10
Hello World from rank 8. Total ranks=10
Hello World from rank 9. Total ranks=10
Hello World from rank 2. Total ranks=10
Hello World from rank 4. Total ranks=10
Hello World from rank 3. Total ranks=10
Hello World from rank 0. Total ranks=10
Hello World from rank 7. Total ranks=10


<a id="loading_data"></a>
## Loading data

In this section, we load the data required by the queries into pandas DataFrames.

In [2]:
@bodo.jit(cache=True)
def load_lineitem(data_folder):
    t1 = time.time()
    rel = pd.read_parquet(data_folder)
    print("Lineitem Reading time: ", ((time.time() - t1) * 1000), " (ms)")
    return rel

lineitem = load_lineitem("s3://bodo-example-data/tpch/s2/lineitem.pq")
display(lineitem.head())



Lineitem Reading time:  44152.46400000001  (ms)


Unnamed: 0,L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT
0,1,310379,15395,1,17.0,23619.12,0.04,0.02,N,O,1996-03-13,1996-02-12,1996-03-22,DELIVER IN PERSON,TRUCK,egular courts above the
1,1,134619,14620,2,36.0,59529.96,0.09,0.06,N,O,1996-04-12,1996-02-28,1996-04-20,TAKE BACK RETURN,MAIL,ly final dependencies: slyly bold
2,1,127400,7401,3,8.0,11419.2,0.1,0.02,N,O,1996-01-29,1996-03-05,1996-01-31,TAKE BACK RETURN,REG AIR,"riously. regular, express dep"
3,1,4263,9264,4,28.0,32683.28,0.09,0.06,N,O,1996-04-21,1996-03-30,1996-05-16,NONE,AIR,lites. fluffily even de
4,1,48054,3061,5,24.0,24049.2,0.1,0.04,N,O,1996-03-30,1996-03-14,1996-04-01,NONE,FOB,pending foxes. slyly re


In [3]:
@bodo.jit(cache=True)
def load_orders(data_folder):
    t1 = time.time()    
    rel = pd.read_parquet(data_folder)
    print("Orders Reading time: ", ((time.time() - t1) * 1000), " (ms)")
    return rel
    
orders = load_orders("s3://bodo-example-data/tpch/s2/orders.pq")
display(orders.head())



Orders Reading time:  14385.428000000047  (ms)


Unnamed: 0,O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT
0,1,73801,O,181503.69,1996-01-02,5-LOW,Clerk#000001902,0,nstructions sleep furiously among
1,2,156004,O,49967.96,1996-12-01,1-URGENT,Clerk#000001759,0,"foxes. pending accounts at the pending, silen..."
2,3,246628,F,227024.64,1993-10-14,5-LOW,Clerk#000001909,0,sly final accounts boost. carefully regular id...
3,4,273553,O,36018.68,1995-10-11,5-LOW,Clerk#000000247,0,"sits. slyly regular warthogs cajole. regular, ..."
4,5,88970,F,112288.43,1994-07-30,5-LOW,Clerk#000001850,0,quickly. bold deposits sleep slyly. packages u...


In [4]:
@bodo.jit(cache=True)
def load_customer(data_folder):
    t1 = time.time()
    rel = pd.read_parquet(data_folder)
    print("Customer Reading time: ", ((time.time() - t1) * 1000), " (ms)")
    return rel

customer = load_customer("s3://bodo-example-data/tpch/s2/customers.pq")

display(customer.head())



Customer Reading time:  2443.66500000001  (ms)


Unnamed: 0,C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT
0,1,Customer#000000001,"IVhzIApeRb ot,c,E",15,25-989-741-2988,711.56,BUILDING,"to the even, regular platelets. regular, ironi..."
1,2,Customer#000000002,"XSTf4,NCwDVaWNe6tEgvwfmRchLXak",13,23-768-687-3665,121.65,AUTOMOBILE,l accounts. blithely ironic theodolites integr...
2,3,Customer#000000003,MG9kdTD2WBHm,1,11-719-748-3364,7498.12,AUTOMOBILE,"deposits eat slyly ironic, even instructions...."
3,4,Customer#000000004,XxVSJsLAGtn,4,14-128-190-5944,2866.83,MACHINERY,"requests. final, regular ideas sleep final accou"
4,5,Customer#000000005,KvpyuHCplrB84WgAiGV6sYpZq7Tj,3,13-750-942-6364,794.47,HOUSEHOLD,n accounts will have to unwind. foxes cajole a...


In [5]:
@bodo.jit(cache=True)
def load_nation(data_folder):
    t1 = time.time()
    rel = pd.read_parquet(data_folder)
    print("Nation Reading time: ", ((time.time() - t1) * 1000), " (ms)")
    return rel

nation = load_nation("s3://bodo-example-data/tpch/s2/nation.pq")

display(nation.head())



Nation Reading time:  739.0419999999267  (ms)


Unnamed: 0,N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT
0,0,ALGERIA,0,haggle. carefully final deposits detect slyly...
1,1,ARGENTINA,1,al foxes promise slyly according to the regula...
2,2,BRAZIL,1,y alongside of the pending deposits. carefully...
3,3,CANADA,1,"eas hang ironic, silent packages. slyly regula..."
4,4,EGYPT,4,y above the carefully unusual theodolites. fin...


In [6]:
@bodo.jit(cache=True)
def load_supplier(data_folder):
    t1 = time.time()    
    rel = pd.read_parquet(data_folder)
    print("Supplier Reading time: ", ((time.time() - t1) * 1000), " (ms)")     
    return rel

supplier = load_supplier("s3://bodo-example-data/tpch/s2/supplier.pq")

display(supplier.head())



Supplier Reading time:  1018.3309999999892  (ms)


Unnamed: 0,S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_ACCTBAL,S_COMMENT
0,1,Supplier#000000001,"N kD4on9OM Ipw3,gf0JBoQDd7tgrzrddZ",17,27-918-335-1736,5755.94,each slyly above the careful
1,2,Supplier#000000002,"89eJ5ksX3ImxJQBvxObC,",5,15-679-861-2259,4032.68,slyly bold instructions. idle dependen
2,3,Supplier#000000003,"q1,G3Pj6OjIuUYfUoH18BFTKP5aU9bEV3",1,11-383-516-1199,4192.4,blithely silent requests after the express dep...
3,4,Supplier#000000004,Bk7ah4CK8SYQTepEmvMkkgMwg,15,25-843-787-7479,4641.08,riously even requests above the exp
4,5,Supplier#000000005,Gcdm2rJRzl5qlTVzc,11,21-151-690-3663,-283.84,. slyly regular pinto bea


In [7]:
@bodo.jit(cache=True)
def load_partsupp(data_folder):
    t1 = time.time()
    rel = pd.read_parquet(data_folder)
    print("Partsupp Reading time: ", ((time.time() - t1) * 1000), " (ms)")
    return rel

partsupp = load_partsupp("s3://bodo-example-data/tpch/s2/partsupp.pq")

display(partsupp.head())



Partsupp Reading time:  14847.838000000138  (ms)


Unnamed: 0,PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_SUPPLYCOST,PS_COMMENT
0,1,2,3325,771.64,", even theodolites. regular, final theodolites..."
1,1,5002,8076,993.49,ven ideas. quickly even packages print. pendin...
2,1,10002,3956,337.09,after the fluffily ironic deposits? blithely s...
3,1,15002,4069,357.84,"al, regular dependencies serve carefully after..."
4,2,3,8895,378.49,nic accounts. final accounts sleep furiously a...


In [8]:
@bodo.jit(cache=True)
def load_part(data_folder):
    t1 = time.time()
    rel = pd.read_parquet(data_folder)
    print("Part Reading time: ", ((time.time() - t1) * 1000), " (ms)")
    return rel

part = load_part("s3://bodo-example-data/tpch/s2/part.pq")

display(part.head())



Part Reading time:  1329.2610000000877  (ms)


Unnamed: 0,P_PARTKEY,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_SIZE,P_CONTAINER,P_RETAILPRICE,P_COMMENT
0,1,goldenrod lavender spring chocolate lace,Manufacturer#1,Brand#13,PROMO BURNISHED COPPER,7,JUMBO PKG,901.0,ly. slyly ironi
1,2,blush thistle blue yellow saddle,Manufacturer#1,Brand#13,LARGE BRUSHED BRASS,1,LG CASE,902.0,lar accounts amo
2,3,spring green yellow purple cornsilk,Manufacturer#4,Brand#42,STANDARD POLISHED BRASS,21,WRAP CASE,903.0,egular deposits hag
3,4,cornflower chocolate smoke green pink,Manufacturer#3,Brand#34,SMALL PLATED BRASS,14,MED DRUM,904.0,p furiously r
4,5,forest brown coral puff cream,Manufacturer#3,Brand#32,STANDARD POLISHED TIN,15,SM PKG,905.0,wake carefully


## Query Definitions

This section implements some of the TPC-H queries in Python using pandas.

### Q1: Pricing Summary Report Query
This query reports the amount of business that was billed, shipped, and returned.

Make sure you have run **`load_lineitem`** from the [loading data section](#loading_data) before running this query.

In [9]:
@bodo.jit(cache=True)
def q1(lineitem):
    t1 = time.time()
    sel = lineitem.L_SHIPDATE <= "1998-09-02"
    flineitem = lineitem[sel]
    flineitem["DISC_PRICE"] = flineitem.L_EXTENDEDPRICE * (1 - flineitem.L_DISCOUNT)
    flineitem["CHARGE"] = (
        flineitem.L_EXTENDEDPRICE * (1 - flineitem.L_DISCOUNT) * (1 + flineitem.L_TAX)
    )
    gb = flineitem.groupby(["L_RETURNFLAG", "L_LINESTATUS"], as_index=False)
    total = gb.agg({"L_QUANTITY": ["sum", "mean"], "L_EXTENDEDPRICE": ["sum", "mean"],
                   "DISC_PRICE": "sum", "CHARGE": "sum",
                   "L_DISCOUNT": "mean", "L_ORDERKEY": "count"})
    total = total.sort_values(["L_RETURNFLAG", "L_LINESTATUS"])
    print("Execution time: ", ((time.time() - t1) * 1000), " (ms)")
    return total.head(10)

q1_result = q1(lineitem)

display(q1_result)



Execution time:  372.0100000000457  (ms)


Unnamed: 0,L_RETURNFLAG,L_LINESTATUS,L_QUANTITY (sum),L_QUANTITY (mean),L_EXTENDEDPRICE (sum),L_EXTENDEDPRICE (mean),DISC_PRICE (sum),CHARGE (sum),L_DISCOUNT (mean),L_ORDERKEY (count)
3,A,F,75478173.0,25.505699,113197300000.0,38251.814164,107536400000.0,111838900000.0,0.050004,2959267
1,N,F,1966480.0,25.530081,2946115000.0,38248.3165,2798797000.0,2911030000.0,0.049996,77026
2,N,O,148642120.0,25.495192,222903600000.0,38232.562546,211762300000.0,220235800000.0,0.049981,5830202
0,R,F,75577628.0,25.51215,113351900000.0,38263.321544,107688100000.0,111994300000.0,0.04998,2962417
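
As a worked example of the two derived columns in `q1`, the sketch below applies the same formulas to the first lineitem row displayed in the loading section. It is plain Python and does not use Bodo or pandas.

```python
# Values copied from the first lineitem row displayed in the loading section.
extended_price = 23619.12
discount = 0.04
tax = 0.02

# DISC_PRICE: the extended price after applying the discount.
disc_price = extended_price * (1 - discount)
# CHARGE: the discounted price with tax added.
charge = disc_price * (1 + tax)

print(round(disc_price, 4), round(charge, 6))
```

The query computes these values for every row and then sums them per (`L_RETURNFLAG`, `L_LINESTATUS`) group.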


### Q3: Shipping Priority Query
This query retrieves the 10 unshipped orders with the highest value.

Make sure you have run **`load_lineitem`, `load_orders`, and `load_customer`** from the [loading data section](#loading_data) before running this query.

In [10]:
@bodo.jit(cache=True)
def q3(lineitem, orders, customer):
    date = "1995-03-04"    
    t1 = time.time()
    lsel = lineitem.L_SHIPDATE > date
    osel = orders.O_ORDERDATE < date
    csel = customer.C_MKTSEGMENT == "HOUSEHOLD"
    flineitem = lineitem[lsel]
    forders = orders[osel]
    fcustomer = customer[csel]
    jn1 = fcustomer.merge(forders, left_on="C_CUSTKEY", right_on="O_CUSTKEY")
    jn2 = jn1.merge(flineitem, left_on="O_ORDERKEY", right_on="L_ORDERKEY")

    jn2["TMP"] = jn2.L_EXTENDEDPRICE * (1 - jn2.L_DISCOUNT)

    total = (
        jn2.groupby(
            ["L_ORDERKEY", "O_ORDERDATE", "O_SHIPPRIORITY"], as_index=False
        )["TMP"]
        .sum()
        .sort_values(["TMP"], ascending=False)
    )
    res = total[["L_ORDERKEY", "TMP", "O_ORDERDATE", "O_SHIPPRIORITY"]]

    print("Execution time: ", ((time.time() - t1) * 1000), " (ms)")
    return res.head(10)

q3_result = q3(lineitem, orders, customer)

display(q3_result)



Execution time:  290.26500000009037  (ms)


Unnamed: 0,L_ORDERKEY,TMP,O_ORDERDATE,O_SHIPPRIORITY
13455,11495873,417031.8957,1995-01-17,0
16555,4163074,416152.8027,1995-02-13,0
5647,6487431,412710.0508,1995-02-06,0
21232,5536290,407857.9678,1995-01-31,0
22182,10666915,397407.4185,1995-02-14,0
14309,4225253,391852.7649,1995-02-28,0
14686,6232772,391241.2513,1995-01-17,0
21091,4724865,387185.9724,1995-02-03,0
5159,3900355,386089.5768,1995-02-13,0
11718,2377474,383057.9763,1995-02-26,0


### Q4: Order Priority Checking Query
This query determines how well the order priority system is working and gives an assessment of customer satisfaction.

Make sure you have run **`load_lineitem` and `load_orders`** from the [loading data section](#loading_data) before running this query.

In [11]:
@bodo.jit(cache=True)
def q4(lineitem, orders):
    date1 = "1993-11-01"
    date2 = "1993-08-01"
    t1 = time.time()
    lsel = lineitem.L_COMMITDATE < lineitem.L_RECEIPTDATE
    osel = (orders.O_ORDERDATE < date1) & (orders.O_ORDERDATE >= date2)
    flineitem = lineitem[lsel]
    forders = orders[osel]
    jn = forders[forders["O_ORDERKEY"].isin(flineitem["L_ORDERKEY"])]
    total = (
        jn.groupby("O_ORDERPRIORITY", as_index=False)["O_ORDERKEY"]
        .count()
        .sort_values(["O_ORDERPRIORITY"])
    )
    print("Execution time: ", ((time.time() - t1) * 1000), " (ms)")
    return total.head(10)

q4_result = q4(lineitem, orders)

display(q4_result)



Execution time:  186.97599999995873  (ms)


Unnamed: 0,O_ORDERPRIORITY,O_ORDERKEY
3,1-URGENT,21039
4,2-HIGH,20986
2,3-MEDIUM,20918
1,4-NOT SPECIFIED,21056
0,5-LOW,21247
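
`q4` filters orders with `isin` rather than a merge, so each order is counted once no matter how many late line items it has. The following plain-Python sketch, on hypothetical toy rows, contrasts that semi-join with an inner join.

```python
# Toy rows (hypothetical): (order key, priority) and the order key of each late line item.
orders = [(1, "1-URGENT"), (2, "5-LOW"), (3, "1-URGENT"), (4, "2-HIGH")]
late_lineitem_keys = [1, 1, 1, 3, 4, 4]  # order 1 has three late line items

# Semi-join: keep an order if at least one late line item references it.
late_keys = set(late_lineitem_keys)
kept = [(key, prio) for key, prio in orders if key in late_keys]

# Count orders per priority; each order contributes exactly once.
counts = {}
for _, prio in kept:
    counts[prio] = counts.get(prio, 0) + 1

# An inner join would instead produce one row per matching line item.
join_counts = {}
for key, prio in orders:
    for line_key in late_lineitem_keys:
        if key == line_key:
            join_counts[prio] = join_counts.get(prio, 0) + 1

print(counts)       # per-order counts
print(join_counts)  # inflated per-line-item counts
```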


### Q6: Forecasting Revenue Change Query
This query quantifies the amount of revenue increase that would have resulted from eliminating certain company-wide discounts in a given percentage range in a given year.

Make sure you have run **`load_lineitem`** from the [loading data section](#loading_data) before running this query.

In [12]:
@bodo.jit(cache=True)
def q6(lineitem):
    date1 = "1996-01-01"
    date2 = "1997-01-01"
    t1 = time.time()
    sel = (
        (lineitem.L_SHIPDATE >= date1)
        & (lineitem.L_SHIPDATE < date2)
        & (lineitem.L_DISCOUNT >= 0.08)
        & (lineitem.L_DISCOUNT <= 0.1)
        & (lineitem.L_QUANTITY < 24)
    )
    flineitem = lineitem[sel]
    total = (flineitem.L_EXTENDEDPRICE * flineitem.L_DISCOUNT).sum()
    print("Execution time: ", ((time.time() - t1) * 1000), " (ms)")
    print(total)
    return total

q6_result = q6(lineitem)

Execution time:  90.04400000003443  (ms)
369930968.9012998
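
To see the `q6` filter at work, the sketch below applies the same conditions to the five lineitem rows displayed in the loading section. It is plain Python with dates kept as ISO strings, which is an assumption made for the illustration; `q6_plain` is a hypothetical helper, not part of the notebook.

```python
# (L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_SHIPDATE) from the five displayed lineitem rows.
rows = [
    (17.0, 23619.12, 0.04, "1996-03-13"),
    (36.0, 59529.96, 0.09, "1996-04-12"),
    (8.0, 11419.2, 0.1, "1996-01-29"),
    (28.0, 32683.28, 0.09, "1996-04-21"),
    (24.0, 24049.2, 0.1, "1996-03-30"),
]


def q6_plain(rows, date1="1996-01-01", date2="1997-01-01"):
    """Sum price * discount over rows passing the same filters as q6."""
    total = 0.0
    for qty, price, disc, shipdate in rows:
        # ISO-formatted dates compare correctly as strings.
        if date1 <= shipdate < date2 and 0.08 <= disc <= 0.1 and qty < 24:
            total += price * disc
    return total


total = q6_plain(rows)
print(total)
```

Only the third row passes every condition: the others fail either the discount range or the `L_QUANTITY < 24` test.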




### Q9: Product Type Profit Measure Query
This query determines how much profit is made on a given line of parts, broken out by supplier nation and year.

Make sure you have run **`load_lineitem`, `load_orders`, `load_part`, `load_nation`, `load_partsupp`, and `load_supplier`** from the [loading data section](#loading_data) before running this query.

In [13]:
@bodo.jit(cache=True)
def q9(lineitem, orders, part, nation, partsupp, supplier):
    t1 = time.time()
    psel = part.P_NAME.str.contains("ghost")
    fpart = part[psel]
    jn1 = lineitem.merge(fpart, left_on="L_PARTKEY", right_on="P_PARTKEY")
    jn2 = jn1.merge(supplier, left_on="L_SUPPKEY", right_on="S_SUPPKEY")
    jn3 = jn2.merge(nation, left_on="S_NATIONKEY", right_on="N_NATIONKEY")
    jn4 = partsupp.merge(
        jn3, left_on=["PS_PARTKEY", "PS_SUPPKEY"], right_on=["L_PARTKEY", "L_SUPPKEY"]
    )
    jn5 = jn4.merge(orders, left_on="L_ORDERKEY", right_on="O_ORDERKEY")
    jn5["TMP"] = jn5.L_EXTENDEDPRICE * (1 - jn5.L_DISCOUNT) - (
        jn5.PS_SUPPLYCOST * jn5.L_QUANTITY
    )
    jn5["O_YEAR"] = jn5.O_ORDERDATE.dt.year
    gb = jn5.groupby(["N_NAME", "O_YEAR"], as_index=False)["TMP"].sum()
    total = gb.sort_values(["N_NAME", "O_YEAR"], ascending=[True, False])
    print("Execution time: ", ((time.time() - t1) * 1000), " (ms)")
    return total.head(10)

q9_result = q9(lineitem, orders, part, nation, partsupp, supplier)

display(q9_result)

Execution time:  410.4300000001331  (ms)


Unnamed: 0,N_NAME,O_YEAR,TMP
80,ALGERIA,1998,53569210.0
161,ALGERIA,1997,90991530.0
117,ALGERIA,1996,93471060.0
131,ALGERIA,1995,88678590.0
38,ALGERIA,1994,88257400.0
78,ALGERIA,1993,91800060.0
84,ALGERIA,1992,88034640.0
172,ARGENTINA,1998,56816340.0
144,ARGENTINA,1997,98691960.0
121,ARGENTINA,1996,102540200.0


### Q10: Returned Item Reporting Query
This query identifies customers who might be having problems with the parts that are shipped to them.

Make sure you have run **`load_lineitem`, `load_orders`, `load_customer`, and `load_nation`** from the [loading data section](#loading_data) before running this query.

In [14]:
@bodo.jit(cache=True)
def q10(lineitem, orders, customer, nation):
    date1 = "1994-11-01"
    date2 = "1995-02-01"
    t1 = time.time()
    osel = (orders.O_ORDERDATE >= date1) & (orders.O_ORDERDATE < date2)
    lsel = lineitem.L_RETURNFLAG == "R"
    forders = orders[osel]
    flineitem = lineitem[lsel]
    jn1 = flineitem.merge(forders, left_on="L_ORDERKEY", right_on="O_ORDERKEY")
    jn2 = jn1.merge(customer, left_on="O_CUSTKEY", right_on="C_CUSTKEY")
    jn3 = jn2.merge(nation, left_on="C_NATIONKEY", right_on="N_NATIONKEY")
    jn3["TMP"] = jn3.L_EXTENDEDPRICE * (1.0 - jn3.L_DISCOUNT)
    gb = jn3.groupby(
        [
            "C_CUSTKEY",
            "C_NAME",
            "C_ACCTBAL",
            "C_PHONE",
            "N_NAME",
            "C_ADDRESS",
            "C_COMMENT",
        ],
        as_index=False,
    )["TMP"].sum()
    total = gb.sort_values("TMP", ascending=False)
    print("Execution time: ", ((time.time() - t1) * 1000), " (ms)")
    return total.head(10)

q10_result = q10(lineitem, orders, customer, nation)

display(q10_result)



Execution time:  322.18399999987923  (ms)


Unnamed: 0,C_CUSTKEY,C_NAME,C_ACCTBAL,C_PHONE,N_NAME,C_ADDRESS,C_COMMENT,TMP
69861,202000,Customer#000202000,3155.77,30-860-645-7227,SAUDI ARABIA,RITU1eYat8iNeD,arls. unusual sauternes boost along the even d...,755353.7227
4336,117250,Customer#000117250,8260.39,17-524-241-3788,GERMANY,2N bS9peD0b5Dr3tf Vbxq,l theodolites. slyly even decoys after the fur...,724785.0804
68258,67172,Customer#000067172,1747.27,21-578-917-6336,IRAQ,IxhKZyFCzL3Ch5qKOxjvATnj1DT6A6hnXNVH,e furiously of the quickly regular requests. f...,710060.3908
19614,87863,Customer#000087863,3105.23,21-393-302-3317,IRAQ,pzMIGld2L1N2RjI8,ckages. carefully close instructions sleep qui...,670339.9103
75885,284104,Customer#000284104,2668.71,32-167-107-4631,RUSSIA,ZyqUNCEiF3G1E7nmKPBnf Uzk,ages haggle blithely along the ironic deposits...,658410.558
43717,168475,Customer#000168475,3873.61,21-537-772-3811,IRAQ,eaNNmHamiNHcgWKyiMqMP,ular dolphins. unusual deposits haggle pending...,653657.0433
32293,217825,Customer#000217825,1753.84,23-765-943-1680,JORDAN,VQGXiCNltZCst5bep2T62ODXANsLYm,thely final deposits: carefully even accounts ...,652603.2797
71344,288742,Customer#000288742,6316.82,12-549-320-8899,BRAZIL,zy3eDgxy33rL5umYEGUShZjScaR0uhlpaLwI,"ideas? ironic, ironic packages sle",641153.5757
7316,50281,Customer#000050281,3306.39,25-210-337-8539,MOROCCO,G2Tb39F1Fz2 upMhG7EpGJMu,ests. slyly express requests doubt fluffily ev...,640923.7708
26372,159304,Customer#000159304,-342.39,18-681-863-8034,INDIA,NNDLOnfKDE9OEd96gOrw,nic platelets cajole: furiously bold dinos nag...,631296.2304


### Q12: Shipping Modes and Order Priority Query
This query determines whether selecting less expensive modes of shipping is negatively affecting the critical-priority orders by causing more parts to be received by customers after the committed date.

Make sure you have run **`load_lineitem` and `load_orders`** from the [loading data section](#loading_data) before running this query.

In [15]:
@bodo.jit(cache=True)
def q12(lineitem, orders):
    date1 = "1994-01-01"
    date2 = "1995-01-01"
    t1 = time.time()
    sel = (
        (lineitem.L_RECEIPTDATE < date2)
        & (lineitem.L_COMMITDATE < date2)
        & (lineitem.L_SHIPDATE < date2)
        & (lineitem.L_SHIPDATE < lineitem.L_COMMITDATE)
        & (lineitem.L_COMMITDATE < lineitem.L_RECEIPTDATE)
        & (lineitem.L_RECEIPTDATE >= date1)
        & ((lineitem.L_SHIPMODE == "MAIL") | (lineitem.L_SHIPMODE == "SHIP"))
    )
    flineitem = lineitem[sel]
    jn = flineitem.merge(orders, left_on="L_ORDERKEY", right_on="O_ORDERKEY")

    def g1(x):
        return ((x == "1-URGENT") | (x == "2-HIGH")).sum()

    def g2(x):
        return ((x != "1-URGENT") & (x != "2-HIGH")).sum()

    total = jn.groupby("L_SHIPMODE", as_index=False)["O_ORDERPRIORITY"].agg((g1, g2))
    total = total.sort_values("L_SHIPMODE")
    print("Execution time: ", ((time.time() - t1) * 1000), " (ms)")
    return total.head(10)

q12_result = q12(lineitem, orders)

display(q12_result)



Execution time:  281.43199999999524  (ms)


Unnamed: 0,L_SHIPMODE,g1,g2
0,MAIL,12354,18548
1,SHIP,12430,18644
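
The `g1` and `g2` aggregations in `q12` count high-priority and other orders within each shipping mode. A minimal plain-Python sketch of the same conditional counting, on hypothetical toy rows:

```python
HIGH = {"1-URGENT", "2-HIGH"}

# Toy joined rows (hypothetical): (L_SHIPMODE, O_ORDERPRIORITY)
joined = [
    ("MAIL", "1-URGENT"),
    ("MAIL", "5-LOW"),
    ("SHIP", "2-HIGH"),
    ("SHIP", "3-MEDIUM"),
    ("SHIP", "4-NOT SPECIFIED"),
]

# counts[mode] = (g1, g2): high-priority count and everything-else count.
counts = {}
for mode, prio in joined:
    g1, g2 = counts.get(mode, (0, 0))
    if prio in HIGH:
        g1 += 1
    else:
        g2 += 1
    counts[mode] = (g1, g2)

for mode in sorted(counts):
    print(mode, counts[mode])
```

For every shipping mode, `g1 + g2` equals the number of joined rows for that mode.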


### Q14: Promotion Effect Query
This query monitors the market response to a promotion such as TV advertisements or a special campaign.

Make sure you have run **`load_lineitem` and `load_part`** from the [loading data section](#loading_data) before running this query.

In [16]:
@bodo.jit(cache=True)
def q14(lineitem, part):
    startDate = "1994-03-01"
    endDate = "1994-04-01"
    p_type_like = "PROMO"
    t1 = time.time()
    sel = (lineitem.L_SHIPDATE >= startDate) & (lineitem.L_SHIPDATE < endDate)
    flineitem = lineitem[sel]
    jn = flineitem.merge(part, left_on="L_PARTKEY", right_on="P_PARTKEY")
    jn["TMP"] = jn.L_EXTENDEDPRICE * (1.0 - jn.L_DISCOUNT)
    total = jn[jn.P_TYPE.str.startswith(p_type_like)].TMP.sum() * 100 / jn.TMP.sum()
    print("Execution time: ", ((time.time() - t1) * 1000), " (ms)")
    print(total)
    return total

q14_result = q14(lineitem, part)

Execution time:  140.98999999987427  (ms)
16.545810644547526
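
The value printed by `q14` is the percentage of discounted revenue that comes from parts whose type starts with `PROMO`. The plain-Python sketch below reproduces that ratio on hypothetical toy rows.

```python
# Toy joined rows (hypothetical): (P_TYPE, L_EXTENDEDPRICE, L_DISCOUNT)
rows = [
    ("PROMO BURNISHED COPPER", 1000.0, 0.10),
    ("LARGE BRUSHED BRASS", 2000.0, 0.00),
    ("PROMO PLATED TIN", 500.0, 0.20),
]

# Discounted revenue per row, as in the TMP column of q14.
revenue = [(ptype, price * (1.0 - disc)) for ptype, price, disc in rows]

promo = sum(rev for ptype, rev in revenue if ptype.startswith("PROMO"))
total = sum(rev for _, rev in revenue)

promo_pct = 100 * promo / total
print(promo_pct)
```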




### Q18: Large Volume Customer Query
This query ranks customers based on their having placed a large quantity order. Large quantity orders are defined as those orders whose total quantity is above a certain level.

Make sure you have run **`load_lineitem`, `load_orders`, and `load_customer`** from the [loading data section](#loading_data) before running this query.

In [17]:
@bodo.jit(cache=True)
def q18(lineitem, orders, customer):
    t1 = time.time()
    gb1 = lineitem.groupby("L_ORDERKEY", as_index=False)["L_QUANTITY"].sum()
    fgb1 = gb1[gb1.L_QUANTITY > 300]
    jn1 = fgb1.merge(orders, left_on="L_ORDERKEY", right_on="O_ORDERKEY")
    jn2 = jn1.merge(customer, left_on="O_CUSTKEY", right_on="C_CUSTKEY")
    gb2 = jn2.groupby(
        ["C_NAME", "C_CUSTKEY", "O_ORDERKEY", "O_ORDERDATE", "O_TOTALPRICE"],
        as_index=False,
    )["L_QUANTITY"].sum()
    total = gb2.sort_values(["O_TOTALPRICE", "O_ORDERDATE"], ascending=[False, True])
    print("Execution time: ", ((time.time() - t1) * 1000), " (ms)")
    return total.head(10)

q18_result = q18(lineitem, orders, customer)

display(q18_result)

Execution time:  1988.1940000000213  (ms)


Unnamed: 0,C_NAME,C_CUSTKEY,O_ORDERKEY,O_ORDERDATE,O_TOTALPRICE,L_QUANTITY
65,Customer#000256240,256240,4722021,1994-04-07,543948.47,323.0
56,Customer#000192203,192203,5984582,1992-03-16,539085.2,312.0
25,Customer#000273004,273004,11785570,1996-07-18,535097.55,303.0
19,Customer#000198385,198385,8574884,1992-07-04,530902.09,308.0
10,Customer#000186325,186325,9436480,1992-05-22,523925.49,311.0
4,Customer#000048682,48682,1474818,1992-11-15,522718.6,302.0
106,Customer#000027880,27880,2232932,1997-04-13,519887.44,304.0
74,Customer#000082297,82297,8231942,1993-02-06,516398.5,302.0
90,Customer#000176644,176644,10889601,1992-05-22,513824.71,306.0
66,Customer#000258475,258475,7125602,1994-07-08,509018.85,301.0
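
The first two steps of `q18` sum the quantity per order and then keep only orders above 300, the pandas equivalent of a SQL `GROUP BY ... HAVING`. A minimal plain-Python sketch on hypothetical toy rows:

```python
from collections import defaultdict

# Toy line items (hypothetical): (L_ORDERKEY, L_QUANTITY)
lineitems = [(1, 150.0), (1, 160.0), (2, 50.0), (3, 200.0), (3, 101.0), (2, 40.0)]

# Step 1: total quantity per order (the groupby + sum).
qty_per_order = defaultdict(float)
for key, qty in lineitems:
    qty_per_order[key] += qty

# Step 2: keep only the large orders (the filter applied after aggregation).
large_orders = {key: qty for key, qty in qty_per_order.items() if qty > 300}
print(large_orders)
```

Filtering after the aggregation matters here: no single line item exceeds 300, but two of the orders do once their items are summed.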


### Q19: Discounted Revenue Query
This query reports the gross discounted revenue attributed to the sale of selected parts handled in a particular manner.

Make sure you have run **`load_lineitem` and `load_part`** from the [loading data section](#loading_data) before running this query.

In [18]:
@bodo.jit(cache=True)
def q19(lineitem, part):
    Brand31 = "Brand#31"
    Brand43 = "Brand#43"
    SMBOX = "SM BOX"
    SMCASE = "SM CASE"
    SMPACK = "SM PACK"
    SMPKG = "SM PKG"
    MEDBAG = "MED BAG"
    MEDBOX = "MED BOX"
    MEDPACK = "MED PACK"
    MEDPKG = "MED PKG"
    LGBOX = "LG BOX"
    LGCASE = "LG CASE"
    LGPACK = "LG PACK"
    LGPKG = "LG PKG"
    DELIVERINPERSON = "DELIVER IN PERSON"
    AIR = "AIR"
    AIRREG = "AIR REG"  # literal from the TPC-H Q19 definition; the lineitem rows shown earlier use "REG AIR"
    t1 = time.time()
    lsel = (
        (
            ((lineitem.L_QUANTITY <= 36) & (lineitem.L_QUANTITY >= 26))
            | ((lineitem.L_QUANTITY <= 25) & (lineitem.L_QUANTITY >= 15))
            | ((lineitem.L_QUANTITY <= 14) & (lineitem.L_QUANTITY >= 4))
        )
        & (lineitem.L_SHIPINSTRUCT == DELIVERINPERSON)
        & ((lineitem.L_SHIPMODE == AIR) | (lineitem.L_SHIPMODE == AIRREG))
    )
    psel = (part.P_SIZE >= 1) & (
        (
            (part.P_SIZE <= 5)
            & (part.P_BRAND == Brand31)
            & (part.P_CONTAINER.isin([SMBOX, SMCASE, SMPACK, SMPKG]))
        )
        | (
            (part.P_SIZE <= 10)
            & (part.P_BRAND == Brand43)
            & (part.P_CONTAINER.isin([MEDBAG, MEDBOX, MEDPACK, MEDPKG]))
        )
        | (
            (part.P_SIZE <= 15)
            & (part.P_BRAND == Brand43)
            & (part.P_CONTAINER.isin([LGBOX, LGCASE, LGPACK, LGPKG]))
        )
    )
    flineitem = lineitem[lsel]
    fpart = part[psel]
    jn = flineitem.merge(fpart, left_on="L_PARTKEY", right_on="P_PARTKEY")
    jnsel = (
        (
            (jn.P_BRAND == Brand31)
            & (jn.P_CONTAINER.isin([SMBOX, SMCASE, SMPACK, SMPKG]))
            & (jn.L_QUANTITY >= 4)
            & (jn.L_QUANTITY <= 14)
            & (jn.P_SIZE <= 5)
        )
        | (
            (jn.P_BRAND == Brand43)
            & (jn.P_CONTAINER.isin([MEDBAG, MEDBOX, MEDPACK, MEDPKG]))
            & (jn.L_QUANTITY >= 15)
            & (jn.L_QUANTITY <= 25)
            & (jn.P_SIZE <= 10)
        )
        | (
            (jn.P_BRAND == Brand43)
            & (jn.P_CONTAINER.isin([LGBOX, LGCASE, LGPACK, LGPKG]))
            & (jn.L_QUANTITY >= 26)
            & (jn.L_QUANTITY <= 36)
            & (jn.P_SIZE <= 15)
        )
    )
    jn = jn[jnsel]
    total = (jn.L_EXTENDEDPRICE * (1.0 - jn.L_DISCOUNT)).sum()
    print("Execution time: ", ((time.time() - t1) * 1000), " (ms)")
    print(total)
    return total

q19_result = q19(lineitem, part)

Execution time:  266.9100000000526  (ms)
7178285.313399999


### Q20: Potential Part Promotion Query
This query identifies suppliers in a particular nation having selected parts that may be candidates for a promotional offer.

Make sure you have run **`load_lineitem`, `load_part`, `load_nation`, `load_partsupp`, and `load_supplier`** from the [loading data section](#loading_data) before running this query.

In [19]:
@bodo.jit(cache=True)
def q20(lineitem, part, nation, partsupp, supplier):
    date1 = "1996-01-01"
    date2 = "1997-01-01"
    t1 = time.time()
    psel = part.P_NAME.str.startswith("azure")
    nsel = nation.N_NAME == "JORDAN"
    lsel = (lineitem.L_SHIPDATE >= date1) & (lineitem.L_SHIPDATE < date2)
    fpart = part[psel]
    fnation = nation[nsel]
    flineitem = lineitem[lsel]
    jn1 = fpart.merge(partsupp, left_on="P_PARTKEY", right_on="PS_PARTKEY")
    jn2 = jn1.merge(
        flineitem,
        left_on=["PS_PARTKEY", "PS_SUPPKEY"],
        right_on=["L_PARTKEY", "L_SUPPKEY"],
    )
    gb = jn2.groupby(["PS_PARTKEY", "PS_SUPPKEY", "PS_AVAILQTY"], as_index=False)[
        "L_QUANTITY"
    ].sum()
    gbsel = gb.PS_AVAILQTY > (0.5 * gb.L_QUANTITY)
    fgb = gb[gbsel]
    jn3 = fgb.merge(supplier, left_on="PS_SUPPKEY", right_on="S_SUPPKEY")
    jn4 = fnation.merge(jn3, left_on="N_NATIONKEY", right_on="S_NATIONKEY")
    jn4 = jn4[["S_NAME", "S_ADDRESS"]]
    total = jn4.sort_values("S_NAME").drop_duplicates()
    print("Execution time: ", ((time.time() - t1) * 1000), " (ms)")
    return total.head(10)

q20_result = q20(lineitem, part, nation, partsupp, supplier)

display(q20_result)

Execution time:  138.51100000010774  (ms)


Unnamed: 0,S_NAME,S_ADDRESS
37,Supplier#000001645,3dq6lQRmb6oukvgSbMUgBPt
38,Supplier#000001914,wFmRY6QNUcQhjjt7JIGSdv
60,Supplier#000002172,"OEtLtQ9aWxB,pCRV0brBTaqEEhatnULDNFZyiGnn"
50,Supplier#000003307,ij6rKFRJjQGU
112,Supplier#000004596,"ZTq,wSuzJJ6qXC3vu DJ"
119,Supplier#000005087,q0c6r9wYVQx31IeGBZKfe
117,Supplier#000005251,"OAOfy3S9Q OUjL28,FVs"
133,Supplier#000005287,"xuHdQHi,qvGq1zD6y295Vs5T8hiDv0MDgcNy,0AM"
121,Supplier#000005556,LaxP c8bNr1Yh8lFUHyMXBoYf1Pn91nJoc4
97,Supplier#000005754,"sNspSyE3ne2Zi,OARwe"
