## TPC-H benchmark queries
This notebook runs all 22 TPC-H queries with Bodo. You can try it with larger datasets on your own Bodo platform tenant. Please contact Bodo for larger TPC-H datasets.

In [1]:
%%px
from utils.creds import *
load_aws_creds()

Starting 8 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>



In [2]:
%%px
from utils.tpch_functions_s3 import *
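
The query functions (`q01` to `q22`) and the `load_*` helpers come from `utils.tpch_functions_s3`, which is not shown in this notebook. As a rough illustration of the kind of pandas code such a function might contain, here is a minimal sketch of a Q1-style pricing summary in plain pandas. The name `q01_sketch`, the input columns `L_SHIPDATE` and `L_TAX`, and the demo data are assumptions for illustration; the output column names follow the Q01 output printed further down, and the 90-day cutoff follows the default parameter of the TPC-H specification. This is not the notebook's actual `q01`.

```python
import pandas as pd


def q01_sketch(lineitem: pd.DataFrame) -> pd.DataFrame:
    """Q1-style pricing summary (illustrative, not the notebook's q01)."""
    # Assumed filter: ship date at most 90 days before 1998-12-01.
    cutoff = pd.Timestamp("1998-12-01") - pd.Timedelta(days=90)
    sel = lineitem[lineitem["L_SHIPDATE"] <= cutoff].copy()

    # Derived price columns used by the aggregation.
    sel["DISC_PRICE"] = sel["L_EXTENDEDPRICE"] * (1 - sel["L_DISCOUNT"])
    sel["CHARGE"] = sel["DISC_PRICE"] * (1 + sel["L_TAX"])

    # Sums, averages, and a row count per (return flag, line status) group.
    return (
        sel.groupby(["L_RETURNFLAG", "L_LINESTATUS"])
        .agg(
            L_QUANTITY=("L_QUANTITY", "sum"),
            L_EXTENDEDPRICE=("L_EXTENDEDPRICE", "sum"),
            DISC_PRICE=("DISC_PRICE", "sum"),
            CHARGE=("CHARGE", "sum"),
            AVG_QTY=("L_QUANTITY", "mean"),
            AVG_PRICE=("L_EXTENDEDPRICE", "mean"),
            L_DISCOUNT=("L_DISCOUNT", "mean"),
            L_ORDERKEY=("L_ORDERKEY", "count"),
        )
        .reset_index()
        .sort_values(["L_RETURNFLAG", "L_LINESTATUS"])
    )


# Tiny made-up table; the last row ships after the cutoff and is filtered out.
demo = pd.DataFrame(
    {
        "L_ORDERKEY": [1, 1, 2, 3],
        "L_QUANTITY": [10.0, 20.0, 5.0, 8.0],
        "L_EXTENDEDPRICE": [100.0, 200.0, 50.0, 80.0],
        "L_DISCOUNT": [0.0, 0.1, 0.0, 0.5],
        "L_TAX": [0.0, 0.0, 0.1, 0.0],
        "L_RETURNFLAG": ["A", "A", "N", "N"],
        "L_LINESTATUS": ["F", "F", "O", "O"],
        "L_SHIPDATE": pd.to_datetime(
            ["1995-01-01", "1996-06-15", "1997-03-10", "1998-11-30"]
        ),
    }
)
result = q01_sketch(demo)
print(result)
```

In the notebook itself the equivalent logic runs inside `bodo.jit`-compiled functions, so the table is distributed across the engines rather than held in one process.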


In [3]:
%%px
import time

import bodo  # explicit import; the star import above may already provide it


@bodo.jit(cache=True)
def run_queries(data_folder):
    # Load the data
    t1 = time.time()
    lineitem = load_lineitem(data_folder)
    orders = load_orders(data_folder)
    customer = load_customer(data_folder)
    nation = load_nation(data_folder)
    region = load_region(data_folder)
    supplier = load_supplier(data_folder)
    part = load_part(data_folder)
    partsupp = load_partsupp(data_folder)

    print("Reading time (s): ", time.time() - t1)

    t1 = time.time()
    # Run all 22 queries in order
    q01(lineitem)

    q02(part, partsupp, supplier, nation, region)

    q03(lineitem, orders, customer)

    q04(lineitem, orders)

    q05(lineitem, orders, customer, nation, region, supplier)

    q06(lineitem)

    q07(lineitem, supplier, orders, customer, nation)

    q08(part, lineitem, supplier, orders, customer, nation, region)

    q09(lineitem, orders, part, nation, partsupp, supplier)

    q10(lineitem, orders, customer, nation)

    q11(partsupp, supplier, nation)

    q12(lineitem, orders)

    q13(customer, orders)

    q14(lineitem, part)

    q15(lineitem, supplier)

    q16(part, partsupp, supplier)

    q17(lineitem, part)

    q18(lineitem, orders, customer)

    q19(lineitem, part)

    q20(lineitem, part, nation, partsupp, supplier)

    q21(lineitem, orders, supplier, nation)

    q22(customer, orders)
    print("Total Query time (s): ", time.time() - t1)
    
run_queries("s3://tpch-data-parquet/SF1")


For best performance the number of row groups should be greater than the number of workers (8). For more details, refer to
https://docs.bodo.ai/latest/file_io/#parquet-section.
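
To see why the warning above matters, consider a simplified model in which each worker reads whole row groups and the groups are split into contiguous blocks. The function `assign_row_groups` below is hypothetical and does not describe Bodo's actual Parquet reader; it only sketches the load-balancing argument under that assumption.

```python
def assign_row_groups(num_row_groups: int, num_workers: int) -> list:
    """Split row-group indices into contiguous blocks, one block per worker."""
    base, extra = divmod(num_row_groups, num_workers)
    assignment = []
    start = 0
    for rank in range(num_workers):
        # The first `extra` workers take one additional row group.
        count = base + (1 if rank < extra else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment


# Compare a few row-group counts against the 8 workers used in this notebook.
for n in (4, 8, 32):
    blocks = assign_row_groups(n, 8)
    idle = sum(1 for b in blocks if not b)
    print(f"{n} row groups over 8 workers: {idle} idle workers")
```

Under this model, a file with fewer row groups than workers leaves some workers with nothing to read, while a file with many more row groups than workers spreads the read evenly.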



[stdout:3] Empty DataFrame
Columns: [L_RETURNFLAG, L_LINESTATUS, L_QUANTITY, L_EXTENDEDPRICE, DISC_PRICE, CHARGE, AVG_QTY, AVG_PRICE, L_DISCOUNT, L_ORDERKEY]
Index: []
     S_ACCTBAL              S_NAME          N_NAME  P_PARTKEY          P_MFGR  \
87     5826.08  Supplier#000006524         ROMANIA      49011  Manufacturer#5   
169    5801.24  Supplier#000006544          FRANCE      74036  Manufacturer#3   
40     5800.50  Supplier#000001376          FRANCE      13872  Manufacturer#3   
66     5730.52  Supplier#000001468  UNITED KINGDOM      33958  Manufacturer#1   
48     5727.03  Supplier#000001185  UNITED KINGDOM       6184  Manufacturer#5   
273    5722.22  Supplier#000001675  UNITED KINGDOM     111674  Manufacturer#1   
401    5675.39  Supplier#000004335  UNITED KINGDOM     169302  Manufacturer#1   
56     5647.88  Supplier#000004135         GERMANY       1634  Manufacturer#2   
134    5630.80  Supplier#000002692          FRANCE      62691  Manufacturer#4   
5      5621.87  Suppli

[stdout:6] Empty DataFrame
Columns: [L_RETURNFLAG, L_LINESTATUS, L_QUANTITY, L_EXTENDEDPRICE, DISC_PRICE, CHARGE, AVG_QTY, AVG_PRICE, L_DISCOUNT, L_ORDERKEY]
Index: []
     S_ACCTBAL              S_NAME          N_NAME  P_PARTKEY          P_MFGR  \
83     1607.30  Supplier#000008553          FRANCE      41040  Manufacturer#4   
61     1596.44  Supplier#000000158         GERMANY      25153  Manufacturer#4   
274    1559.42  Supplier#000005174          FRANCE     112662  Manufacturer#5   
289    1555.70  Supplier#000003784         ROMANIA     101273  Manufacturer#5   
149    1549.49  Supplier#000005018          FRANCE      50007  Manufacturer#2   
437    1547.92  Supplier#000003799         GERMANY     176247  Manufacturer#3   
14     1545.16  Supplier#000005827          RUSSIA      13325  Manufacturer#1   
238    1545.16  Supplier#000005827          RUSSIA     123314  Manufacturer#2   
346    1417.33  Supplier#000007386          RUSSIA     129849  Manufacturer#3   
331    1417.33  Suppli

[stdout:5] Empty DataFrame
Columns: [L_RETURNFLAG, L_LINESTATUS, L_QUANTITY, L_EXTENDEDPRICE, DISC_PRICE, CHARGE, AVG_QTY, AVG_PRICE, L_DISCOUNT, L_ORDERKEY]
Index: []
     S_ACCTBAL              S_NAME          N_NAME  P_PARTKEY          P_MFGR  \
269    3115.27  Supplier#000001227  UNITED KINGDOM     106206  Manufacturer#1   
285    3070.27  Supplier#000009346          FRANCE     111812  Manufacturer#5   
109    3011.28  Supplier#000003026         ROMANIA      35516  Manufacturer#4   
63     2962.92  Supplier#000004476         ROMANIA      26969  Manufacturer#3   
60     2938.97  Supplier#000002124          RUSSIA      22123  Manufacturer#5   
446    2920.34  Supplier#000004897         GERMANY     197339  Manufacturer#2   
106    2905.10  Supplier#000002055          RUSSIA      49550  Manufacturer#1   
317    2887.89  Supplier#000002623  UNITED KINGDOM     125086  Manufacturer#3   
415    2887.89  Supplier#000002623  UNITED KINGDOM     195065  Manufacturer#3   
122    2886.07  Suppli

[stdout:2] Empty DataFrame
Columns: [L_RETURNFLAG, L_LINESTATUS, L_QUANTITY, L_EXTENDEDPRICE, DISC_PRICE, CHARGE, AVG_QTY, AVG_PRICE, L_DISCOUNT, L_ORDERKEY]
Index: []
     S_ACCTBAL              S_NAME          N_NAME  P_PARTKEY          P_MFGR  \
254    7346.39  Supplier#000003467         GERMANY     123466  Manufacturer#1   
124    7323.65  Supplier#000006595         GERMANY      34091  Manufacturer#1   
100    7299.85  Supplier#000003751         ROMANIA      38744  Manufacturer#5   
420    7286.94  Supplier#000001672         GERMANY     176637  Manufacturer#1   
190    7241.31  Supplier#000000809          RUSSIA      80808  Manufacturer#1   
299    7174.74  Supplier#000000085         GERMANY     142542  Manufacturer#3   
345    7174.74  Supplier#000000085         GERMANY     147570  Manufacturer#5   
8      7148.26  Supplier#000005680  UNITED KINGDOM       5679  Manufacturer#3   
193    7135.82  Supplier#000009338         GERMANY      89337  Manufacturer#5   
133    6990.37  Suppli

[stdout:7] Empty DataFrame
Columns: [L_RETURNFLAG, L_LINESTATUS, L_QUANTITY, L_EXTENDEDPRICE, DISC_PRICE, CHARGE, AVG_QTY, AVG_PRICE, L_DISCOUNT, L_ORDERKEY]
Index: []
     S_ACCTBAL              S_NAME          N_NAME  P_PARTKEY          P_MFGR  \
305     456.08  Supplier#000006136         ROMANIA     128599  Manufacturer#5   
64      433.74  Supplier#000000585  UNITED KINGDOM      28082  Manufacturer#4   
441     389.70  Supplier#000005657  UNITED KINGDOM     188102  Manufacturer#1   
340     361.01  Supplier#000000816  UNITED KINGDOM     133276  Manufacturer#4   
417     339.82  Supplier#000002916          RUSSIA     185361  Manufacturer#5   
211     311.07  Supplier#000004989          RUSSIA      97461  Manufacturer#2   
117     264.01  Supplier#000001889         ROMANIA      39385  Manufacturer#3   
224     211.66  Supplier#000006910         ROMANIA      99382  Manufacturer#4   
294     182.29  Supplier#000001588          FRANCE     119076  Manufacturer#3   
186     172.40  Suppli

[stdout:0] Reading time (s):  30.932581575773384
  L_RETURNFLAG L_LINESTATUS  L_QUANTITY  L_EXTENDEDPRICE    DISC_PRICE  \
0            A            F  37734107.0     5.658655e+10  5.375826e+10   
1            N            F    991417.0     1.487505e+09  1.413082e+09   
2            N            O  74476040.0     1.117017e+11  1.061182e+11   
3            R            F  37719753.0     5.656804e+10  5.374129e+10   

         CHARGE    AVG_QTY     AVG_PRICE  L_DISCOUNT  L_ORDERKEY  
0  5.590907e+10  25.522006  38273.129735    0.049985     1478493  
1  1.469649e+09  25.516472  38284.467761    0.050093       38854  
2  1.103670e+11  25.502227  38249.117989    0.049997     2920374  
3  5.588962e+10  25.505794  38250.854626    0.050009     1478870  
Q01 Execution time (s):  0.12598361094910615
     S_ACCTBAL              S_NAME          N_NAME  P_PARTKEY          P_MFGR  \
457    9938.53  Supplier#000005359  UNITED KINGDOM     185358  Manufacturer#4   
248    9937.84  Supplier#000005969    

[stdout:4] Empty DataFrame
Columns: [L_RETURNFLAG, L_LINESTATUS, L_QUANTITY, L_EXTENDEDPRICE, DISC_PRICE, CHARGE, AVG_QTY, AVG_PRICE, L_DISCOUNT, L_ORDERKEY]
Index: []
     S_ACCTBAL              S_NAME          N_NAME  P_PARTKEY          P_MFGR  \
426    4373.08  Supplier#000001617          FRANCE     189098  Manufacturer#1   
89     4353.56  Supplier#000007823          RUSSIA      27822  Manufacturer#4   
418    4332.54  Supplier#000001751  UNITED KINGDOM     186714  Manufacturer#2   
291    4331.41  Supplier#000001494         GERMANY     103963  Manufacturer#5   
1      4324.51  Supplier#000000957  UNITED KINGDOM      10956  Manufacturer#5   
33     4305.85  Supplier#000007241         GERMANY      12238  Manufacturer#1   
141    4241.79  Supplier#000005610          RUSSIA      55609  Manufacturer#3   
189    4232.40  Supplier#000002759         ROMANIA      75237  Manufacturer#4   
330    4232.40  Supplier#000002759         ROMANIA     140244  Manufacturer#4   
159    4162.42  Suppli

[stdout:1] Empty DataFrame
Columns: [L_RETURNFLAG, L_LINESTATUS, L_QUANTITY, L_EXTENDEDPRICE, DISC_PRICE, CHARGE, AVG_QTY, AVG_PRICE, L_DISCOUNT, L_ORDERKEY]
Index: []
     S_ACCTBAL              S_NAME          N_NAME  P_PARTKEY          P_MFGR  \
310    8691.06  Supplier#000004429  UNITED KINGDOM     126892  Manufacturer#2   
454    8655.99  Supplier#000006330          RUSSIA     193810  Manufacturer#2   
204    8638.36  Supplier#000002920          RUSSIA      75398  Manufacturer#1   
380    8638.36  Supplier#000002920          RUSSIA     170402  Manufacturer#3   
212    8607.69  Supplier#000006003  UNITED KINGDOM      76002  Manufacturer#2   
9      8569.52  Supplier#000005936          RUSSIA       5935  Manufacturer#5   
243    8564.12  Supplier#000000033         GERMANY     110032  Manufacturer#1   
306    8553.82  Supplier#000003979         ROMANIA     143978  Manufacturer#4   
115    8517.23  Supplier#000009529          RUSSIA      37025  Manufacturer#5   
128    8517.23  Suppli