# Benchmark Comparison: DuckDB vs Pandas 

**Author:** Anurag Kumar Pal  
**Email:** iampalanurag@gmail.com  
**GitHub:** [Anurag-Kumar-Pal](https://github.com/Anurag-Kumar-Pal)  
**Date:** 2026-02-09  
**Description:** Benchmarking the runtime of DuckDB and Pandas.

---

## Table of Contents
1. [Step 1- Importing the Libraries](#Step-1--Importing-the-Libraries)
2. [Step 2- Data Wrangling using the Pandas Library](#Step-2--Data-Wrangling-using-the-Pandas-Library)
3. [Step 3- Data Wrangling using the DuckDB Library](#Step-3--Data-Wrangling-using-the-DuckDB-Library)

---

### Step 1- Importing the Libraries

In [1]:
import pandas as pd
import duckdb as db
import time

### Step 2- Data Wrangling using the Pandas Library

##### A. Loading the Datasets

In [2]:
%%time

# Dataset file for 50k records

pdf_50k_rows = pd.read_csv(r"C:\Users\heyit\Desktop\Jupyter Notebooks\PB Notebooks\AKP_Tech_Playground\DuckDB vs Pandas\Input_Files\50K_Rows.csv")

CPU times: total: 156 ms
Wall time: 241 ms


In [3]:
%%time

# Dataset file for 100k records

pdf_100k_rows = pd.read_csv(r"C:\Users\heyit\Desktop\Jupyter Notebooks\PB Notebooks\AKP_Tech_Playground\DuckDB vs Pandas\Input_Files\100K_Rows.csv")

CPU times: total: 422 ms
Wall time: 456 ms


In [4]:
%%time

# Dataset file for 500k records

pdf_500k_rows = pd.read_csv(r"C:\Users\heyit\Desktop\Jupyter Notebooks\PB Notebooks\AKP_Tech_Playground\DuckDB vs Pandas\Input_Files\500K_Rows.csv")

CPU times: total: 4.59 s
Wall time: 4.92 s


In [5]:
%%time

# Dataset file for 1.5M records

pdf_1point5m_rows = pd.read_csv(r"C:\Users\heyit\Desktop\Jupyter Notebooks\PB Notebooks\AKP_Tech_Playground\DuckDB vs Pandas\Input_Files\1.5M_Rows.csv")

CPU times: total: 8.06 s
Wall time: 8.89 s


In [6]:
%%time

# Dataset file for 4M records

pdf_4m_rows = pd.read_csv(r"C:\Users\heyit\Desktop\Jupyter Notebooks\PB Notebooks\AKP_Tech_Playground\DuckDB vs Pandas\Input_Files\4M_Rows.csv")

CPU times: total: 1min 40s
Wall time: 1min 45s
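
Each `%%time` cell above reports a single run, so the numbers can be affected by disk caching and background load. A minimal sketch of a repeat-and-summarize timing helper follows, using only the standard library. The names `time_call` and `summarize` are hypothetical and not part of the original notebook, and the workload shown is a stand-in for a call such as `pd.read_csv` with one of the input files.

```python
import time
from statistics import median


def time_call(func, *args, repeats=5, **kwargs):
    """Call func(*args, **kwargs) `repeats` times.

    Returns the result of the last call and a list of wall-clock seconds.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return result, timings


def summarize(timings):
    """Return the best and median wall-clock time in seconds."""
    return {"best_s": min(timings), "median_s": median(timings)}


# Stand-in workload (an assumption for illustration only).
result, timings = time_call(sorted, range(100_000, 0, -1), repeats=3)
stats = summarize(timings)
print(f"best: {stats['best_s']:.4f} s, median: {stats['median_s']:.4f} s")
```

Reporting the best and the median of several runs makes it easier to tell a real difference between libraries from run-to-run noise.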


##### B. Checking the # of Rows and Columns for each Dataset

In [7]:
%%time
# Record Count for 50k records

pdf_50k_rows.shape

CPU times: total: 0 ns
Wall time: 905 µs


(48895, 16)

In [8]:
%%time
# Record Count for 100k records

pdf_100k_rows.shape

CPU times: total: 0 ns
Wall time: 0 ns


(114000, 21)

In [9]:
%%time
# Record Count for 500k records

pdf_500k_rows.shape

CPU times: total: 0 ns
Wall time: 0 ns


(568454, 10)

In [10]:
%%time
# Record Count for 1.5M records

pdf_1point5m_rows.shape

CPU times: total: 0 ns
Wall time: 0 ns


(1444963, 11)

In [11]:
%%time
# Record Count for 4M records

pdf_4m_rows.shape

CPU times: total: 0 ns
Wall time: 0 ns


(3906160, 21)

##### C. Checking the "Sort" Command on Dataset

In [12]:
%%time
# Sorting Data for 50k records

pdf_50k_rows.sort_values("latitude").head(5)

CPU times: total: 62.5 ms
Wall time: 97.2 ms


Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
14119,10830083,Beautiful well kept private home!,56078939,Tony,Staten Island,Tottenville,40.49979,-74.24084,Private room,110,2,0,,,1,364
46919,35489384,Cozy Apartment,236186921,Iveth,Staten Island,Tottenville,40.50641,-74.23059,Entire home/apt,75,1,1,2019-06-28,1.0,1,299
15278,12230928,Villa DiGioia visit NYC via SI,65806798,Michael J,Staten Island,Tottenville,40.50708,-74.24285,Private room,100,2,0,,,1,365
1424,639199,"Beautiful 4BR/4BA Home, Staten Island, NY City.",1483081,Marina,Staten Island,Tottenville,40.50868,-74.23986,Entire home/apt,299,3,59,2019-07-08,0.82,1,245
23460,18997371,Cozy Getaway,90104417,Sueann,Staten Island,Tottenville,40.50873,-74.23914,Entire home/apt,85,2,49,2019-07-01,2.08,2,159


In [13]:
%%time
# Sorting Data for 100k records

pdf_100k_rows.sort_values("duration_ms").head(5)

CPU times: total: 62.5 ms
Wall time: 113 ms


Unnamed: 0.1,Unnamed: 0,track_id,artists,album_name,track_name,popularity,duration_ms,explicit,danceability,energy,...,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature,track_genre
65900,65900,1kR4gIb7nGxHPI3D2ifs59,,,,0,0,False,0.501,0.583,...,-9.46,0,0.0605,0.69,0.00396,0.0747,0.734,138.391,4,k-pop
59310,59310,6hsyfegVY5yklJneM40mWi,Leila Bela,Angra Manyu,The Exorsism Begins...,0,8586,False,0.0,0.04,...,-29.714,0,0.0,0.928,0.956,0.115,0.0,0.0,0,iranian
59812,59812,38Ogh3rsHba83kXx13gbKs,Leila Bela,Angra Manyu,V-4,0,13386,False,0.0,0.224,...,-22.196,1,0.0,0.97,0.0,0.907,0.0,0.0,0,iranian
59775,59775,1HVjSh7scH1PaPiLjy2LEu,Leila Bela;Leila's Opera Class,Angra Manyu,Screams for a Finale! (feat. Leila's Opera Class),0,15800,False,0.251,0.508,...,-10.564,0,0.316,0.969,0.999,0.952,0.0,184.051,3,iranian
16856,16856,5YKCM3jbJ8lqUXUwfU7KwZ,Wolfgang Amadeus Mozart;Ingrid Haebler,Mozart: The Complete Piano Sonatas,"Andante in C Major, K. 1a",0,17453,False,0.467,0.0301,...,-28.518,0,0.0428,0.995,0.9,0.124,0.0,84.375,4,classical


In [14]:
%%time
# Sorting Data for 500k records

pdf_500k_rows.sort_values("Time").head(5)

CPU times: total: 953 ms
Wall time: 973 ms


Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
150523,150524,6641040,ACITT7DI6IDDL,shari zychinski,0,0,5,939340800,EVERY book is educational,this witty little book makes my son laugh at l...
150500,150501,6641040,AJ46FKXOVC7NR,Nicholas A Mesiano,2,2,5,940809600,This whole series is great way to spend time w...,I can remember seeing the show when it aired o...
451855,451856,B00004CXX9,AIUWLEQ1ADEG5,Elizabeth Medina,0,0,5,944092800,Entertainingl Funny!,Beetlejuice is a well written movie ..... ever...
230284,230285,B00004RYGX,A344SMIA5JECGM,Vincent P. Ross,1,2,5,944438400,A modern day fairy tale,"A twist of rumplestiskin captured on film, sta..."
451877,451878,B00004CXX9,A344SMIA5JECGM,Vincent P. Ross,1,2,5,944438400,A modern day fairy tale,"A twist of rumplestiskin captured on film, sta..."


In [15]:
%%time
# Sorting Data for 1.5M records

pdf_1point5m_rows.sort_values("creationDate").head(5)

CPU times: total: 3.12 s
Wall time: 3.2 s


Unnamed: 0,id,reviewId,creationDate,criticName,isTopCritic,originalScore,reviewState,publicatioName,reviewText,scoreSentiment,reviewUrl
155743,ballad_of_aj_weberman,1914317,1800-01-01,Jennie Kermode,False,5/5,fresh,Eye for Film,,POSITIVE,http://www.eyeforfilm.co.uk/reviews.php?id=8266
168836,lucky_country,1904549,1800-01-01,Thomas Caldwell,False,2/5,rotten,Cinema Autopsy,,NEGATIVE,http://blog.cinemaautopsy.com/2009/07/14/film-...
278238,the_definition_of_insanity,1905309,1800-01-01,Joe Lozito,False,3.5/4,fresh,Big Picture Big Sound,I don't know how autobiographical this film is...,POSITIVE,http://www.bigpicturebigsound.com/The-Definiti...
947162,juice,1897051,1800-01-01,Owen Gleiberman,True,B+,fresh,Entertainment Weekly,"Coming out from behind Spike Lee's camera, Ern...",POSITIVE,"http://www.ew.com/ew/article/0,,309271,00.html"
1060329,heartless-2009,1917013,1800-01-01,Tim Robey,True,3/5,fresh,Daily Telegraph (UK),It's exciting to see a British horror film wit...,POSITIVE,http://www.telegraph.co.uk/culture/film/filmre...


In [16]:
%%time
# Sorting Data for 4M records

pdf_4m_rows.sort_values("followers_count").head(5)

CPU times: total: 8.58 s
Wall time: 8.94 s


Unnamed: 0,id,name,universal_name,description,linkedin_url,website_url,followers_count,associated_members_count,verification,founded_on,...,location_branches,logo_url,specialities,industry,hashtags,funding_info,__created_at,__updated_at,claimable,company_type
445842,4057402,Mccormack Photos,mccormack-photos,Denver band photography and album design.,https://www.linkedin.com/company/mccormack-pho...,http://www.mccormickphotos.net/,0.0,1.0,"{""verified"": false}",,...,"[{""name"": ""Jersey City"", ""address"": {""city"": ""...",,,Photography,,,2025-06-22 06:24:31.712114+00,2025-06-22 06:24:31.712114+00,t,company
451480,4060992,Gourd Music,gourd-music,,https://www.linkedin.com/company/gourd-music/,http://www.gourd.com/,0.0,1.0,"{""verified"": false}",,...,"[{""name"": ""Felton"", ""address"": {""city"": ""Felto...",,,Music,,,2025-06-22 06:33:21.873146+00,2025-06-22 06:33:21.873146+00,t,company
1230725,4584688,Strayhorn Photography,strayhorn-photography,,https://www.linkedin.com/company/strayhorn-pho...,,0.0,1.0,"{""verified"": false}",,...,"[{""name"": ""Humboldt"", ""address"": {""city"": ""Hum...",,,Photography,,,2025-06-23 06:11:40.249238+00,2025-06-23 06:11:40.249238+00,t,company
1723093,3778148,Soluciones de Iluminación,soluciones-de-iluminación,,https://www.linkedin.com/showcase/soluciones-d...,,0.0,,"{""verified"": false}",,...,[],,,,,,2025-06-24 05:28:34.774188+00,2025-06-24 05:28:34.774188+00,f,showcase
1230721,4584680,Little Company Inc The,little-company-inc-the,"We bring together problem solvers, strategic t...",https://www.linkedin.com/company/little-compan...,http://www.littleco.com/,0.0,0.0,"{""verified"": false}",,...,"[{""name"": ""Prior Lake"", ""address"": {""city"": ""P...",,,Advertising Services,,,2025-06-23 06:11:40.249238+00,2025-06-23 06:11:40.249238+00,t,company
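
The cells above sort the whole frame and then keep five rows. When only the top few rows are needed, `DataFrame.nsmallest` selects them without a full sort, which may be a closer counterpart to an `ORDER BY ... LIMIT 5` query. The sketch below uses a small made-up frame, not the benchmark data. One caveat: `nsmallest` works on numeric columns, so it would probably not apply to a string-typed column such as `creationDate`.

```python
import pandas as pd

# Small synthetic frame standing in for the listings data (values are made up).
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6],
        "latitude": [40.70, 40.50, 40.90, 40.60, 40.80, 40.55],
    }
)

# Pattern used in the notebook: full sort, then keep the first rows.
full_sort = df.sort_values("latitude").head(3)

# Alternative: select the three smallest values directly.
top_n = df.nsmallest(3, "latitude")

print(top_n)
```

With unique sort keys both approaches return the same rows in the same order; with ties, the order of the tied rows can differ.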


##### D. Checking the "Aggregation" Command on Dataset

In [17]:
%%time
# Aggregating Data for 50k records

pdf_50k_rows.groupby("room_type")["price"].mean()

CPU times: total: 0 ns
Wall time: 34.2 ms


room_type
Entire home/apt    211.794246
Private room        89.780973
Shared room         70.127586
Name: price, dtype: float64

In [18]:
%%time
# Aggregating Data for 100k records

pdf_100k_rows.groupby("track_genre")["duration_ms"].mean()

CPU times: total: 15.6 ms
Wall time: 17.6 ms


track_genre
acoustic       214896.957
afrobeat       248412.791
alt-rock       235455.907
alternative    222016.180
ambient        237059.038
                  ...    
techno         312311.477
trance         269007.478
trip-hop       274954.026
turkish        219529.010
world-music    297195.622
Name: duration_ms, Length: 114, dtype: float64

In [19]:
%%time
# Aggregating Data for 500k records

pdf_500k_rows.groupby("Score")["Time"].mean()

CPU times: total: 15.6 ms
Wall time: 19.9 ms


Score
1    1.303159e+09
2    1.301131e+09
3    1.300126e+09
4    1.296722e+09
5    1.294306e+09
Name: Time, dtype: float64

In [20]:
%%time
# Aggregating Data for 1.5M records

pdf_1point5m_rows.groupby("reviewState")["creationDate"].max()

CPU times: total: 234 ms
Wall time: 280 ms


reviewState
fresh     2023-04-08
rotten    2023-04-08
Name: creationDate, dtype: object

In [21]:
%%time
# Aggregating Data for 4M records

pdf_4m_rows.groupby("industry")["associated_members_count"].mean()

CPU times: total: 359 ms
Wall time: 398 ms


industry
Abrasives and Nonmetallic Minerals Manufacturing    62.250000
Accessible Architecture and Design                   7.714286
Accommodation and Food Services                     54.800000
Accounting                                          29.705513
Administration of Justice                           18.807198
                                                      ...    
Wireless Services                                    8.346405
Wood Product Manufacturing                          47.346154
Writing & Editing                                    8.826317
Writing and Editing                                  3.637877
Zoos and Botanical Gardens                          60.875000
Name: associated_members_count, Length: 523, dtype: float64

##### E. Checking the "Row-Wise Transformation" Command on Dataset

In [22]:
%%time
# Row-wise Transformation for 50k records

pdf_50k_rows["price_converted"] = pdf_50k_rows["price"].apply(lambda x: x * 1.18)

CPU times: total: 0 ns
Wall time: 14 ms


In [23]:
%%time
# Row-wise Transformation for 100k records

pdf_100k_rows["duration_ms_converted"] = pdf_100k_rows["duration_ms"].apply(lambda x: x * 1.18)

CPU times: total: 46.9 ms
Wall time: 38.6 ms


In [24]:
%%time
# Row-wise Transformation for 500k records

pdf_500k_rows["Time_converted"] = pdf_500k_rows["Time"].apply(lambda x: x * 1.18)

CPU times: total: 141 ms
Wall time: 119 ms


In [25]:
%%time
# Row-wise Transformation for 1.5M records

pdf_1point5m_rows["reviewId_converted"] = pdf_1point5m_rows["reviewId"].apply(lambda x: x * 1.18)

CPU times: total: 359 ms
Wall time: 387 ms


In [26]:
%%time
# Row-wise Transformation for 4M records

pdf_4m_rows["associated_members_count_converted"] = pdf_4m_rows["associated_members_count"].apply(lambda x: x * 1.18)

CPU times: total: 922 ms
Wall time: 959 ms
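
The cells above deliberately use `apply` with a lambda to measure row-wise work. For a plain multiplication, pandas also supports a vectorized form that avoids calling a Python function per element. The sketch below uses a few made-up prices to show that both forms produce the same values; it is an illustration, not a replacement for the benchmark cells.

```python
import pandas as pd

# A few made-up prices standing in for the "price" column.
prices = pd.Series([110, 75, 100, 299, 85], name="price")

# Row-wise form used in the notebook: one Python call per element.
via_apply = prices.apply(lambda x: x * 1.18)

# Vectorized form: the multiplication runs over the whole column at once.
vectorized = prices * 1.18

max_difference = (via_apply - vectorized).abs().max()
print(max_difference)
```

If the goal is to compare against a SQL expression such as `price * 1.18`, the vectorized form is likely the fairer pandas counterpart.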


### Step 3- Data Wrangling using the DuckDB Library

##### A. Loading the Datasets

In [27]:
# DuckDB command to read a dataset for 50k records and convert to Pandas Dataframe
# Lazy Read Simulation: Preparing the query
start_lazy = time.perf_counter()

query = """
        SELECT * FROM read_csv_auto(
            'C:\\Users\\heyit\\Desktop\\Jupyter Notebooks\\PB Notebooks\\AKP_Tech_Playground\\DuckDB vs Pandas\\Input_Files\\50K_Rows.csv')
        """

# This prepares the query but doesn't execute it
dbf_50k_rows = db.sql(query)
lazytime_duckdb = time.perf_counter() - start_lazy

# Actual Materialization of the Query
start_action = time.perf_counter()

# Triggering the DuckDB Query
db_count_records = len(dbf_50k_rows.to_df())
action_time_duckdb = time.perf_counter() - start_action

print("Record Count: ", db_count_records)
print("Timings for 50K Rows -->>>")
print(f"DuckDb Lazy Read Time: {lazytime_duckdb: .2f} s")
print(f"DuckDB Action Time: {action_time_duckdb: .2f} s")

Record Count:  48895
Timings for 50K Rows -->>>
DuckDb Lazy Read Time:  0.31 s
DuckDB Action Time:  0.22 s


In [28]:
# DuckDB command to read a dataset for 100k records and convert to Pandas Dataframe
# Lazy Read Simulation: Preparing the query
start_lazy = time.perf_counter()

query = """
        SELECT * FROM read_csv_auto(
            'C:\\Users\\heyit\\Desktop\\Jupyter Notebooks\\PB Notebooks\\AKP_Tech_Playground\\DuckDB vs Pandas\\Input_Files\\100K_Rows.csv')
        """

# This prepares the query but doesn't execute it
dbf_100k_rows = db.sql(query)
lazytime_duckdb = time.perf_counter() - start_lazy

# Actual Materialization of the Query
start_action = time.perf_counter()

# Triggering the DuckDB Query
db_count_records = len(dbf_100k_rows.to_df())
action_time_duckdb = time.perf_counter() - start_action

print("Record Count: ", db_count_records)
print("Timings for 100K Rows -->>>")
print(f"DuckDb Lazy Read Time: {lazytime_duckdb: .2f} s")
print(f"DuckDB Action Time: {action_time_duckdb: .2f} s")

Record Count:  114000
Timings for 100K Rows -->>>
DuckDb Lazy Read Time:  0.10 s
DuckDB Action Time:  0.30 s


In [29]:
# DuckDB command to read a dataset for 500k records and convert to Pandas Dataframe
# Lazy Read Simulation: Preparing the query
start_lazy = time.perf_counter()

query = """
        SELECT * FROM read_csv_auto(
            'C:\\Users\\heyit\\Desktop\\Jupyter Notebooks\\PB Notebooks\\AKP_Tech_Playground\\DuckDB vs Pandas\\Input_Files\\500K_Rows.csv')
        """

# This prepares the query but doesn't execute it
dbf_500k_rows = db.sql(query)
lazytime_duckdb = time.perf_counter() - start_lazy

# Actual Materialization of the Query
start_action = time.perf_counter()

# Triggering the DuckDB Query
db_count_records = len(dbf_500k_rows.to_df())
action_time_duckdb = time.perf_counter() - start_action

print("Record Count: ", db_count_records)
print("Timings for 500K Rows -->>>")
print(f"DuckDb Lazy Read Time: {lazytime_duckdb: .2f} s")
print(f"DuckDB Action Time: {action_time_duckdb: .2f} s")

Record Count:  568454
Timings for 500K Rows -->>>
DuckDb Lazy Read Time:  0.19 s
DuckDB Action Time:  1.32 s


In [30]:
# DuckDB command to read a dataset for 1.5M records and convert to Pandas Dataframe
# Lazy Read Simulation: Preparing the query
start_lazy = time.perf_counter()

query = """
        SELECT * FROM read_csv_auto(
            'C:\\Users\\heyit\\Desktop\\Jupyter Notebooks\\PB Notebooks\\AKP_Tech_Playground\\DuckDB vs Pandas\\Input_Files\\1.5M_Rows.csv')
        """

# This prepares the query but doesn't execute it
dbf_1point5m_rows = db.sql(query)
lazytime_duckdb = time.perf_counter() - start_lazy

# Actual Materialization of the Query
start_action = time.perf_counter()

# Triggering the DuckDB Query
db_count_records = len(dbf_1point5m_rows.to_df())
action_time_duckdb = time.perf_counter() - start_action

print("Record Count: ", db_count_records)
print("Timings for 1.5M Rows -->>>")
print(f"DuckDb Lazy Read Time: {lazytime_duckdb: .2f} s")
print(f"DuckDB Action Time: {action_time_duckdb: .2f} s")

Record Count:  1444963
Timings for 1.5M Rows -->>>
DuckDb Lazy Read Time:  0.14 s
DuckDB Action Time:  3.42 s


In [31]:
# DuckDB command to read a dataset for 4M records and convert to Pandas Dataframe
# Lazy Read Simulation: Preparing the query
start_lazy = time.perf_counter()

query = """
        SELECT * FROM read_csv_auto(
            'C:\\Users\\heyit\\Desktop\\Jupyter Notebooks\\PB Notebooks\\AKP_Tech_Playground\\DuckDB vs Pandas\\Input_Files\\4M_Rows.csv')
        """

# This prepares the query but doesn't execute it
dbf_4m_rows = db.sql(query)
lazytime_duckdb = time.perf_counter() - start_lazy

# Actual Materialization of the Query
start_action = time.perf_counter()

# Triggering the DuckDB Query
db_count_records = len(dbf_4m_rows.to_df())
action_time_duckdb = time.perf_counter() - start_action

print("Record Count: ", db_count_records)
print("Timings for 4M Rows -->>>")
print(f"DuckDb Lazy Read Time: {lazytime_duckdb: .2f} s")
print(f"DuckDB Action Time: {action_time_duckdb: .2f} s")

Record Count:  3906160
Timings for 4M Rows -->>>
DuckDb Lazy Read Time:  0.23 s
DuckDB Action Time:  52.25 s


##### B. Checking the "Sort" Command on Dataset

In [32]:
# DuckDB command to sort a dataset of 50k records and convert to Pandas Dataframe
# Lazy Read Simulation: Preparing the query
start_lazy = time.perf_counter()

query = """
        SELECT * FROM dbf_50k_rows
        ORDER BY latitude ASC
        LIMIT 5
        """

# This prepares the query but doesn't execute it
dbf_50k_sorted = db.sql(query)
lazytime_duckdb = time.perf_counter() - start_lazy

# Actual Materialization of the Query
start_action = time.perf_counter()

# Triggering the DuckDB Query
pdf_50k_sorted = dbf_50k_sorted.to_df()
print(pdf_50k_sorted.head())
action_time_duckdb = time.perf_counter() - start_action

print("Timings for 50K Rows -->>>")
print(f"DuckDb Lazy Read Time: {lazytime_duckdb: .2f} s")
print(f"DuckDB Action Time: {action_time_duckdb: .2f} s")

         id                                             name    host_id  \
0  10830083                Beautiful well kept private home!   56078939   
1  35489384                                   Cozy Apartment  236186921   
2  12230928                   Villa DiGioia visit NYC via SI   65806798   
3    639199  Beautiful 4BR/4BA Home, Staten Island, NY City.    1483081   
4  18997371                                     Cozy Getaway   90104417   

   host_name neighbourhood_group neighbourhood  latitude  longitude  \
0       Tony       Staten Island   Tottenville  40.49979  -74.24084   
1      Iveth       Staten Island   Tottenville  40.50641  -74.23059   
2  Michael J       Staten Island   Tottenville  40.50708  -74.24285   
3     Marina       Staten Island   Tottenville  40.50868  -74.23986   
4     Sueann       Staten Island   Tottenville  40.50873  -74.23914   

         room_type  price  minimum_nights  number_of_reviews last_review  \
0     Private room    110               2     

In [33]:
# DuckDB command to sort a dataset of 100k records and convert to Pandas Dataframe
# Lazy Read Simulation: Preparing the query
start_lazy = time.perf_counter()

query = """
        SELECT * FROM dbf_100k_rows
        ORDER BY duration_ms ASC
        LIMIT 5
        """

# This prepares the query but doesn't execute it
dbf_100k_sorted = db.sql(query)
lazytime_duckdb = time.perf_counter() - start_lazy

# Actual Materialization of the Query
start_action = time.perf_counter()

# Triggering the DuckDB Query
pdf_100k_sorted = dbf_100k_sorted.to_df()
print(pdf_100k_sorted.head())
action_time_duckdb = time.perf_counter() - start_action

print("Timings for 100K Rows -->>>")
print(f"DuckDb Lazy Read Time: {lazytime_duckdb: .2f} s")
print(f"DuckDB Action Time: {action_time_duckdb: .2f} s")

   column00                track_id                                 artists  \
0     65900  1kR4gIb7nGxHPI3D2ifs59                                    None   
1     59310  6hsyfegVY5yklJneM40mWi                              Leila Bela   
2     59812  38Ogh3rsHba83kXx13gbKs                              Leila Bela   
3     59775  1HVjSh7scH1PaPiLjy2LEu          Leila Bela;Leila's Opera Class   
4     16856  5YKCM3jbJ8lqUXUwfU7KwZ  Wolfgang Amadeus Mozart;Ingrid Haebler   

                           album_name  \
0                                None   
1                         Angra Manyu   
2                         Angra Manyu   
3                         Angra Manyu   
4  Mozart: The Complete Piano Sonatas   

                                          track_name  popularity  duration_ms  \
0                                               None           0            0   
1                             The Exorsism Begins...           0         8586   
2                                  

In [34]:
# DuckDB command to sort a dataset of 500k records and convert to Pandas Dataframe
# Lazy Read Simulation: Preparing the query
start_lazy = time.perf_counter()

query = """
        SELECT * FROM dbf_500k_rows
        ORDER BY Time ASC
        LIMIT 5
        """

# This prepares the query but doesn't execute it
dbf_500k_sorted = db.sql(query)
lazytime_duckdb = time.perf_counter() - start_lazy

# Actual Materialization of the Query
start_action = time.perf_counter()

# Triggering the DuckDB Query
pdf_500k_sorted = dbf_500k_sorted.to_df()
print(pdf_500k_sorted.head())
action_time_duckdb = time.perf_counter() - start_action

print("Timings for 500K Rows -->>>")
print(f"DuckDb Lazy Read Time: {lazytime_duckdb: .2f} s")
print(f"DuckDB Action Time: {action_time_duckdb: .2f} s")

       Id   ProductId          UserId         ProfileName  \
0  150524     6641040   ACITT7DI6IDDL     shari zychinski   
1  150501     6641040   AJ46FKXOVC7NR  Nicholas A Mesiano   
2  451856  B00004CXX9   AIUWLEQ1ADEG5    Elizabeth Medina   
3  374359  B00004CI84  A344SMIA5JECGM     Vincent P. Ross   
4  230285  B00004RYGX  A344SMIA5JECGM     Vincent P. Ross   

   HelpfulnessNumerator  HelpfulnessDenominator  Score       Time  \
0                     0                       0      5  939340800   
1                     2                       2      5  940809600   
2                     0                       0      5  944092800   
3                     1                       2      5  944438400   
4                     1                       2      5  944438400   

                                             Summary  \
0                          EVERY book is educational   
1  This whole series is great way to spend time w...   
2                               Entertainingl Funn

In [35]:
# DuckDB command to sort a dataset of 1.5M records and convert to Pandas Dataframe
# Lazy Read Simulation: Preparing the query
start_lazy = time.perf_counter()

query = """
        SELECT * FROM dbf_1point5m_rows
        ORDER BY creationDate ASC
        LIMIT 5
        """

# This prepares the query but doesn't execute it
dbf_1point5m_sorted = db.sql(query)
lazytime_duckdb = time.perf_counter() - start_lazy

# Actual Materialization of the Query
start_action = time.perf_counter()

# Triggering the DuckDB Query
pdf_1point5m_sorted = dbf_1point5m_sorted.to_df()
print(pdf_1point5m_sorted.head())
action_time_duckdb = time.perf_counter() - start_action

print("Timings for 1.5M Rows -->>>")
print(f"DuckDb Lazy Read Time: {lazytime_duckdb: .2f} s")
print(f"DuckDB Action Time: {action_time_duckdb: .2f} s")

                                id  reviewId creationDate        criticName  \
0                      accomplices   1914020   1800-01-01       Roger Ebert   
1                 black_death-2010   1896991   1800-01-01         Tim Robey   
2  1013485-masque_of_the_red_death   1908664   1800-01-01    Jennie Kermode   
3        the-jolly-boys-last-stand   1906693   1800-01-01  Christopher Null   
4       1202487-book_of_revelation   1892314   1800-01-01      Mike Barnard   

   isTopCritic originalScore reviewState        publicatioName  \
0         True           3/4       fresh     Chicago Sun-Times   
1         True           4/5       fresh  Daily Telegraph (UK)   
2        False           5/5       fresh          Eye for Film   
3        False           3/5       fresh        Filmcritic.com   
4        False          5/10      rotten      Future Movies UK   

                                          reviewText scoreSentiment  \
0  All four of these actors are completely natura...     

In [36]:
# DuckDB command to sort a dataset of 4M records and convert to Pandas Dataframe
# Lazy Read Simulation: Preparing the query
start_lazy = time.perf_counter()

query = """
        SELECT * FROM dbf_4m_rows
        ORDER BY followers_count ASC
        LIMIT 5
        """

# This prepares the query but doesn't execute it
dbf_4m_sorted = db.sql(query)
lazytime_duckdb = time.perf_counter() - start_lazy

# Actual Materialization of the Query
start_action = time.perf_counter()

# Triggering the DuckDB Query
pdf_4m_sorted = dbf_4m_sorted.to_df()
print(pdf_4m_sorted.head())
action_time_duckdb = time.perf_counter() - start_action

print("Timings for 4M Rows -->>>")
print(f"DuckDb Lazy Read Time: {lazytime_duckdb: .2f} s")
print(f"DuckDB Action Time: {action_time_duckdb: .2f} s")

        id                              name                   universal_name  \
0  2655940                            FINCBS                           fincbs   
1  2655841  Consulting Group "Partner group"  consulting-group-partner-group-   
2  2655923                          Kun Jobs                         kun-jobs   
3   944867                 Polisrendement.nl                polisrendement.nl   
4  4135369                 Riverfront Cabins                riverfront-cabins   

                                         description  \
0                                               None   
1                                               None   
2                                               None   
3  Help, ik heb een woekerpolis! Hoe ga ik hierme...   
4                                               None   

                                        linkedin_url  \
0           https://www.linkedin.com/company/fincbs/   
1  https://www.linkedin.com/company/consulting-gr...   
2       