feat(route): add route for DuckDB news #16183

Merged: 2 commits, Jul 16, 2024

Conversation

@mocusez (Contributor) commented Jul 16, 2024

Involved Issue

Close #

Example for the Proposed Route(s)

/duckdb/news

New RSS Route Checklist

  • New Route
  • Anti-bot or rate limit
    • If yes, does your code reflect this?
  • Date and time
    • Parsed
    • Correct time zone
  • New package added
  • Puppeteer

Note

@github-actions bot added the labels "Route" and "Auto: Route Test Complete" (auto route test has finished on given PR) on Jul 16, 2024

Successfully generated as following:

http://localhost:1200/duckdb/news - Success ✔️
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>DuckDB News</title>
    <link>https://duckdb.org/news/</link>
    <atom:link href="http://localhost:1200/duckdb/news" rel="self" type="application/rss+xml"></atom:link>
    <description>DuckDB News - Made with love by RSSHub(https://github.com/DIYgod/RSSHub)</description>
    <generator>RSSHub</generator>
    <webMaster>i@diygod.me (DIYgod)</webMaster>
    <language>en</language>
    <lastBuildDate>Tue, 16 Jul 2024 12:56:18 GMT</lastBuildDate>
    <ttl>5</ttl>
    <item>
      <title>DuckCon #5 in Seattle</title>
      <description></description>
      <link>https://duckdb.org/2024/08/15/duckcon5.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/08/15/duckcon5.html</guid>
      <pubDate>Thu, 15 Aug 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Memory Management in DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2024/07/09/memory-management.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/07/09/memory-management.html</guid>
      <pubDate>Tue, 09 Jul 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>DuckDB Community Extensions</title>
      <description></description>
      <link>https://duckdb.org/2024/07/05/community-extensions.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/07/05/community-extensions.html</guid>
      <pubDate>Fri, 05 Jul 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Benchmarking Ourselves over Time at DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2024/06/26/benchmarks-over-time.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/06/26/benchmarks-over-time.html</guid>
      <pubDate>Wed, 26 Jun 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>20 000 Stars on GitHub</title>
      <description></description>
      <link>https://duckdb.org/2024/06/22/github-20k-stars.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/06/22/github-20k-stars.html</guid>
      <pubDate>Sat, 22 Jun 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Command Line Data Processing: Using DuckDB as a Unix Tool</title>
      <description></description>
      <link>https://duckdb.org/2024/06/20/cli-data-processing-using-duckdb-as-a-unix-tool.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/06/20/cli-data-processing-using-duckdb-as-a-unix-tool.html</guid>
      <pubDate>Thu, 20 Jun 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Native Delta Lake Support in DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2024/06/10/delta.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/06/10/delta.html</guid>
      <pubDate>Mon, 10 Jun 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Announcing DuckDB 1.0.0</title>
      <description></description>
      <link>https://duckdb.org/2024/06/03/announcing-duckdb-100.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/06/03/announcing-duckdb-100.html</guid>
      <pubDate>Mon, 03 Jun 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Analyzing Railway Traffic in the Netherlands</title>
      <description></description>
      <link>https://duckdb.org/2024/05/31/analyzing-railway-traffic-in-the-netherlands.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/05/31/analyzing-railway-traffic-in-the-netherlands.html</guid>
      <pubDate>Fri, 31 May 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Access 150k+ Datasets from Hugging Face with DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2024/05/29/access-150k-plus-datasets-from-hugging-face-with-duckdb.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/05/29/access-150k-plus-datasets-from-hugging-face-with-duckdb.html</guid>
      <pubDate>Wed, 29 May 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Vector Similarity Search in DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2024/05/03/vector-similarity-search-vss.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/05/03/vector-similarity-search-vss.html</guid>
      <pubDate>Fri, 03 May 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>duckplyr: dplyr Powered by DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2024/04/02/duckplyr.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/04/02/duckplyr.html</guid>
      <pubDate>Tue, 02 Apr 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>No Memory? No Problem. External Aggregation in DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2024/03/29/external-aggregation.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/03/29/external-aggregation.html</guid>
      <pubDate>Fri, 29 Mar 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>42.parquet – A Zip Bomb for the Big Data Age</title>
      <description></description>
      <link>https://duckdb.org/2024/03/26/42-parquet-a-zip-bomb-for-the-big-data-age.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/03/26/42-parquet-a-zip-bomb-for-the-big-data-age.html</guid>
      <pubDate>Tue, 26 Mar 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Dependency Management in DuckDB Extensions</title>
      <description></description>
      <link>https://duckdb.org/2024/03/22/dependency-management.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/03/22/dependency-management.html</guid>
      <pubDate>Fri, 22 Mar 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>SQL Gymnastics: Bending SQL into Flexible New Shapes</title>
      <description></description>
      <link>https://duckdb.org/2024/03/01/sql-gymnastics.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/03/01/sql-gymnastics.html</guid>
      <pubDate>Fri, 01 Mar 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Announcing DuckDB 0.10.0</title>
      <description></description>
      <link>https://duckdb.org/2024/02/13/announcing-duckdb-0100.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/02/13/announcing-duckdb-0100.html</guid>
      <pubDate>Tue, 13 Feb 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Multi-Database Support in DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2024/01/26/multi-database-support-in-duckdb.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/01/26/multi-database-support-in-duckdb.html</guid>
      <pubDate>Fri, 26 Jan 2024 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Extensions for DuckDB-Wasm</title>
      <description></description>
      <link>https://duckdb.org/2023/12/18/duckdb-extensions-in-wasm.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/12/18/duckdb-extensions-in-wasm.html</guid>
      <pubDate>Mon, 18 Dec 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Updates to the H2O.ai db-benchmark!</title>
      <description></description>
      <link>https://duckdb.org/2023/11/03/db-benchmark-update.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/11/03/db-benchmark-update.html</guid>
      <pubDate>Fri, 03 Nov 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>DuckDB&#39;s CSV Sniffer: Automatic Detection of Types and Dialects</title>
      <description></description>
      <link>https://duckdb.org/2023/10/27/csv-sniffer.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/10/27/csv-sniffer.html</guid>
      <pubDate>Fri, 27 Oct 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>DuckCon #4 in Amsterdam</title>
      <description></description>
      <link>https://duckdb.org/2023/10/06/duckcon4.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/10/06/duckcon4.html</guid>
      <pubDate>Fri, 06 Oct 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Announcing DuckDB 0.9.0</title>
      <description></description>
      <link>https://duckdb.org/2023/09/26/announcing-duckdb-090.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/09/26/announcing-duckdb-090.html</guid>
      <pubDate>Tue, 26 Sep 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>DuckDB&#39;s AsOf Joins: Fuzzy Temporal Lookups</title>
      <description></description>
      <link>https://duckdb.org/2023/09/15/asof-joins-fuzzy-temporal-lookups.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/09/15/asof-joins-fuzzy-temporal-lookups.html</guid>
      <pubDate>Fri, 15 Sep 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Even Friendlier SQL with DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2023/08/23/even-friendlier-sql.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/08/23/even-friendlier-sql.html</guid>
      <pubDate>Wed, 23 Aug 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>DuckDB ADBC – Zero-Copy Data Transfer via Arrow Database Connectivity</title>
      <description></description>
      <link>https://duckdb.org/2023/08/04/adbc.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/08/04/adbc.html</guid>
      <pubDate>Fri, 04 Aug 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>From Waddle to Flying: Quickly Expanding DuckDB&#39;s Functionality with Scalar Python UDFs</title>
      <description></description>
      <link>https://duckdb.org/2023/07/07/python-udf.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/07/07/python-udf.html</guid>
      <pubDate>Fri, 07 Jul 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Correlated Subqueries in SQL</title>
      <description></description>
      <link>https://duckdb.org/2023/05/26/correlated-subqueries-in-sql.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/05/26/correlated-subqueries-in-sql.html</guid>
      <pubDate>Fri, 26 May 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Announcing DuckDB 0.8.0</title>
      <description></description>
      <link>https://duckdb.org/2023/05/17/announcing-duckdb-080.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/05/17/announcing-duckdb-080.html</guid>
      <pubDate>Wed, 17 May 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>10 000 Stars on GitHub</title>
      <description></description>
      <link>https://duckdb.org/2023/05/12/github-10k-stars.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/05/12/github-10k-stars.html</guid>
      <pubDate>Fri, 12 May 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>PostGEESE? Introducing The DuckDB Spatial Extension</title>
      <description></description>
      <link>https://duckdb.org/2023/04/28/spatial.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/04/28/spatial.html</guid>
      <pubDate>Fri, 28 Apr 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>DuckCon #3 in San Francisco</title>
      <description></description>
      <link>https://duckdb.org/2023/04/28/duckcon3.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/04/28/duckcon3.html</guid>
      <pubDate>Fri, 28 Apr 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Introducing DuckDB for Swift</title>
      <description></description>
      <link>https://duckdb.org/2023/04/21/swift.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/04/21/swift.html</guid>
      <pubDate>Fri, 21 Apr 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>The Return of the H2O.ai Database-like Ops Benchmark</title>
      <description></description>
      <link>https://duckdb.org/2023/04/14/h2oai.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/04/14/h2oai.html</guid>
      <pubDate>Fri, 14 Apr 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Shredding Deeply Nested JSON, One Vector at a Time</title>
      <description></description>
      <link>https://duckdb.org/2023/03/03/json.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/03/03/json.html</guid>
      <pubDate>Fri, 03 Mar 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>JupySQL Plotting with DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2023/02/24/jupysql.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/02/24/jupysql.html</guid>
      <pubDate>Fri, 24 Feb 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Announcing DuckDB 0.7.0</title>
      <description></description>
      <link>https://duckdb.org/2023/02/13/announcing-duckdb-070.html</link>
      <guid isPermaLink="false">https://duckdb.org/2023/02/13/announcing-duckdb-070.html</guid>
      <pubDate>Mon, 13 Feb 2023 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>DuckCon 2023 – 2nd edition</title>
      <description></description>
      <link>https://duckdb.org/2022/11/25/duckcon.html</link>
      <guid isPermaLink="false">https://duckdb.org/2022/11/25/duckcon.html</guid>
      <pubDate>Fri, 25 Nov 2022 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Announcing DuckDB 0.6.0</title>
      <description></description>
      <link>https://duckdb.org/2022/11/14/announcing-duckdb-060.html</link>
      <guid isPermaLink="false">https://duckdb.org/2022/11/14/announcing-duckdb-060.html</guid>
      <pubDate>Mon, 14 Nov 2022 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Lightweight Compression in DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2022/10/28/lightweight-compression.html</link>
      <guid isPermaLink="false">https://duckdb.org/2022/10/28/lightweight-compression.html</guid>
      <pubDate>Fri, 28 Oct 2022 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Modern Data Stack in a Box with DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2022/10/12/modern-data-stack-in-a-box.html</link>
      <guid isPermaLink="false">https://duckdb.org/2022/10/12/modern-data-stack-in-a-box.html</guid>
      <pubDate>Wed, 12 Oct 2022 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Querying Postgres Tables Directly From DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2022/09/30/postgres-scanner.html</link>
      <guid isPermaLink="false">https://duckdb.org/2022/09/30/postgres-scanner.html</guid>
      <pubDate>Fri, 30 Sep 2022 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Persistent Storage of Adaptive Radix Trees in DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2022/07/27/art-storage.html</link>
      <guid isPermaLink="false">https://duckdb.org/2022/07/27/art-storage.html</guid>
      <pubDate>Wed, 27 Jul 2022 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Range Joins in DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2022/05/27/iejoin.html</link>
      <guid isPermaLink="false">https://duckdb.org/2022/05/27/iejoin.html</guid>
      <pubDate>Fri, 27 May 2022 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Friendlier SQL with DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2022/05/04/friendlier-sql.html</link>
      <guid isPermaLink="false">https://duckdb.org/2022/05/04/friendlier-sql.html</guid>
      <pubDate>Wed, 04 May 2022 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Parallel Grouped Aggregation in DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2022/03/07/aggregate-hashtable.html</link>
      <guid isPermaLink="false">https://duckdb.org/2022/03/07/aggregate-hashtable.html</guid>
      <pubDate>Mon, 07 Mar 2022 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>DuckDB Time Zones: Supporting Calendar Extensions</title>
      <description></description>
      <link>https://duckdb.org/2022/01/06/time-zones.html</link>
      <guid isPermaLink="false">https://duckdb.org/2022/01/06/time-zones.html</guid>
      <pubDate>Thu, 06 Jan 2022 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>DuckDB Quacks Arrow: A Zero-copy Data Integration between Apache Arrow and DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2021/12/03/duck-arrow.html</link>
      <guid isPermaLink="false">https://duckdb.org/2021/12/03/duck-arrow.html</guid>
      <pubDate>Fri, 03 Dec 2021 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>DuckDB – The Lord of Enums: The Fellowship of the Categorical and Factors</title>
      <description></description>
      <link>https://duckdb.org/2021/11/26/duck-enum.html</link>
      <guid isPermaLink="false">https://duckdb.org/2021/11/26/duck-enum.html</guid>
      <pubDate>Fri, 26 Nov 2021 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Fast Moving Holistic Aggregates</title>
      <description></description>
      <link>https://duckdb.org/2021/11/12/moving-holistic.html</link>
      <guid isPermaLink="false">https://duckdb.org/2021/11/12/moving-holistic.html</guid>
      <pubDate>Fri, 12 Nov 2021 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>DuckDB-Wasm: Efficient Analytical SQL in the Browser</title>
      <description></description>
      <link>https://duckdb.org/2021/10/29/duckdb-wasm.html</link>
      <guid isPermaLink="false">https://duckdb.org/2021/10/29/duckdb-wasm.html</guid>
      <pubDate>Fri, 29 Oct 2021 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Windowing in DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2021/10/13/windowing.html</link>
      <guid isPermaLink="false">https://duckdb.org/2021/10/13/windowing.html</guid>
      <pubDate>Wed, 13 Oct 2021 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Fastest Table Sort in the West – Redesigning DuckDB’s Sort</title>
      <description></description>
      <link>https://duckdb.org/2021/08/27/external-sorting.html</link>
      <guid isPermaLink="false">https://duckdb.org/2021/08/27/external-sorting.html</guid>
      <pubDate>Fri, 27 Aug 2021 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Querying Parquet with Precision Using DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2021/06/25/querying-parquet.html</link>
      <guid isPermaLink="false">https://duckdb.org/2021/06/25/querying-parquet.html</guid>
      <pubDate>Fri, 25 Jun 2021 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Efficient SQL on Pandas with DuckDB</title>
      <description></description>
      <link>https://duckdb.org/2021/05/14/sql-on-pandas.html</link>
      <guid isPermaLink="false">https://duckdb.org/2021/05/14/sql-on-pandas.html</guid>
      <pubDate>Fri, 14 May 2021 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
    <item>
      <title>Testing Out DuckDB&#39;s Full Text Search Extension</title>
      <description></description>
      <link>https://duckdb.org/2021/01/25/full-text-search.html</link>
      <guid isPermaLink="false">https://duckdb.org/2021/01/25/full-text-search.html</guid>
      <pubDate>Mon, 25 Jan 2021 00:00:00 GMT</pubDate>
      <author>DuckDB Organization</author>
    </item>
  </channel>
</rss>
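
In the output above, every item's pubDate matches the date embedded in its link path (for example /2024/07/09/ gives Tue, 09 Jul 2024 00:00:00 GMT). The route source is not shown in this conversation, so the following is only a minimal sketch of how such a date could be derived; `pubDateFromLink` is a hypothetical helper name, not a function from lib/routes/duckdb/news.ts, and treating the date as UTC midnight is an assumption taken from the generated feed.

```typescript
// Hypothetical helper, not taken from lib/routes/duckdb/news.ts.
// Derives a UTC-midnight publication date from a DuckDB post URL of the
// form https://duckdb.org/YYYY/MM/DD/slug.html, as seen in the feed above.
function pubDateFromLink(link: string): Date | undefined {
    const match = new URL(link).pathname.match(/^\/(\d{4})\/(\d{2})\/(\d{2})\//);
    if (!match) {
        // Not a dated post URL (e.g. the /news/ index page).
        return undefined;
    }
    const [, year, month, day] = match;
    // Date.UTC takes a zero-based month index.
    return new Date(Date.UTC(Number(year), Number(month) - 1, Number(day)));
}

const date = pubDateFromLink('https://duckdb.org/2024/07/09/memory-management.html');
console.log(date?.toUTCString()); // Tue, 09 Jul 2024 00:00:00 GMT
```

Building the date with Date.UTC rather than the local-time Date constructor keeps the result independent of the server's time zone, which is what the "Correct time zone" checklist item is concerned with.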

Review thread on lib/routes/duckdb/news.ts (outdated, resolved)

Successfully generated as following:

http://localhost:1200/duckdb/news - Success ✔️
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>DuckDB News</title>
    <link>https://duckdb.org/news/</link>
    <atom:link href="http://localhost:1200/duckdb/news" rel="self" type="application/rss+xml"></atom:link>
    <description>DuckDB News - Made with love by RSSHub(https://github.com/DIYgod/RSSHub)</description>
    <generator>RSSHub</generator>
    <webMaster>i@diygod.me (DIYgod)</webMaster>
    <language>en</language>
    <lastBuildDate>Tue, 16 Jul 2024 13:16:26 GMT</lastBuildDate>
    <ttl>5</ttl>
    <item>
      <title>DuckCon #5 in Seattle</title>
      <description>&lt;div class=&quot;eventdate&quot;&gt;2024-08-15&lt;/div&gt;
        &lt;h1&gt;DuckCon #5 in Seattle&lt;/h1&gt;
        &lt;div class=&quot;infoline&quot;&gt;
        &lt;div class=&quot;icon&quot;&gt;
        &lt;/div&gt;
        &lt;div&gt;&lt;span class=&quot;author&quot;&gt;Mark Raasveldt, Hannes Mühleisen, Gabor Szarnyas, Kelly de Smit&lt;/span&gt;&lt;/div&gt;
        &lt;/div&gt;
        &lt;p&gt;&lt;img src=&quot;https://duckdb.org/images/duckcon5-splashscreen.svg&quot; alt=&quot;DuckCon #5 Splashscreen&quot; width=&quot;680&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p&gt;We are excited to hold the next &quot;DuckCon&quot; DuckDB user group meeting in &lt;strong&gt;Seattle, WA&lt;/strong&gt;, sponsored by &lt;a href=&quot;https://motherduck.com/&quot;&gt;MotherDuck&lt;/a&gt;.
        The meeting will take place on August 15, 2024 (Thursday) in the &lt;a href=&quot;https://www.siff.net/cinema/cinema-venues/siff-cinema-egyptian&quot;&gt;SIFF Cinema Egyptian&lt;/a&gt;, located at &lt;a href=&quot;https://maps.app.goo.gl/jRfRPMaYY6AmJ2fF6&quot;&gt;805 E Pine St, Seattle, WA, 98122&lt;/a&gt;.&lt;/p&gt;
        &lt;p&gt;As is traditional in DuckCons, we will start with a talk from DuckDB&#39;s creators &lt;a href=&quot;https://hannes.muehleisen.org/&quot;&gt;Hannes Mühleisen&lt;/a&gt; and &lt;a href=&quot;https://mytherin.github.io/&quot;&gt;Mark Raasveldt&lt;/a&gt; about the state of DuckDB. This will be followed by presentations by DuckDB users. In addition, we will have several lightning talks from the DuckDB community.&lt;/p&gt;
        &lt;h3 id=&quot;timetable&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/08/15/duckcon5.html#timetable&quot;&gt;Timetable&lt;/a&gt;
        &lt;/h3&gt;
        &lt;!-- To watch the recordings, see the [playlist of talks](https://www.youtube.com/playlist?list=). --&gt;
        &lt;table&gt;
        &lt;thead&gt;
        &lt;tr&gt;
        &lt;th&gt;Time&lt;/th&gt;
        &lt;th style=&quot;text-align: left&quot;&gt;Title&lt;/th&gt;
        &lt;th style=&quot;text-align: left&quot;&gt;Presenter&lt;/th&gt;
        &lt;/tr&gt;
        &lt;/thead&gt;
        &lt;tbody&gt;
        &lt;tr&gt;
        &lt;td&gt;1:30PM&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;em&gt;First session&lt;/em&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&amp;nbsp;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;1:30PM&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Introductions&lt;/strong&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://hannes.muehleisen.org/&quot;&gt;Hannes Mühleisen&lt;/a&gt; &lt;br&gt; &lt;em&gt;(&lt;a href=&quot;https://duckdblabs.com/&quot;&gt;DuckDB Labs&lt;/a&gt;)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;1:40PM&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Overview and latest developments&lt;/strong&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;Hannes Mühleisen, &lt;a href=&quot;https://mytherin.github.io/&quot;&gt;Mark Raasveldt&lt;/a&gt; &lt;br&gt; &lt;em&gt;(DuckDB Labs)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;2:10PM&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;MotherDuck: Taking flight with interactive analytics&lt;/strong&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.linkedin.com/in/frances-perry/&quot;&gt;Frances Perry&lt;/a&gt; &lt;br&gt; &lt;em&gt;(&lt;a href=&quot;https://motherduck.com/&quot;&gt;MotherDuck&lt;/a&gt;)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;2:40PM&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;How DuckDB is changing the face of spatial analytics&lt;/strong&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.linkedin.com/in/mbforr/&quot;&gt;Matt Forrest&lt;/a&gt; &lt;br&gt; &lt;em&gt;(&lt;a href=&quot;https://carto.com/&quot;&gt;CARTO&lt;/a&gt;)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;3:10PM&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;em&gt;Break&lt;/em&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&amp;nbsp;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;3:30PM&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;em&gt;Second session&lt;/em&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&amp;nbsp;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;3:30PM&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;A duck for your dashboard: Performant data apps in the browser with DuckDB&lt;/strong&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.linkedin.com/in/rkosara/&quot;&gt;Robert Kosara&lt;/a&gt; &lt;br&gt; &lt;em&gt;(&lt;a href=&quot;https://observablehq.com/&quot;&gt;Observable&lt;/a&gt;)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;4:00PM&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;Lightning talks&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&amp;nbsp;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Delighting users with RESTful APIs and DuckDB&lt;/strong&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.linkedin.com/in/miguelmfilipe/&quot;&gt;Miguel Filipe&lt;/a&gt; &lt;br&gt; &lt;em&gt;(&lt;a href=&quot;https://dune.com/&quot;&gt;Dune Analytics&lt;/a&gt;)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Aerodynamic data models: Flying fast at scale with DuckDB&lt;/strong&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.linkedin.com/in/begelundmuller/&quot;&gt;Benjamin Egelund-Müller&lt;/a&gt; &lt;br&gt; &lt;em&gt;(&lt;a href=&quot;https://www.rilldata.com/&quot;&gt;Rill Data&lt;/a&gt;)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Scaling business intelligence in the built environment using analytics and AI to improve health&lt;/strong&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://jaan.io/&quot;&gt;Jaan Lı&lt;/a&gt; &lt;br&gt; &lt;em&gt;(&lt;a href=&quot;https://ut.ee/en/home&quot;&gt;University of Tartu&lt;/a&gt; and &lt;a href=&quot;https://www.onefact.org/&quot;&gt;One Fact Foundation&lt;/a&gt;)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Double glazing: Two years of windowing improvements&lt;/strong&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.linkedin.com/in/riwesley/&quot;&gt;Richard Wesley&lt;/a&gt; &lt;br&gt; &lt;em&gt;(DuckDB Labs)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;dbverse: composable database libraries for larger-than-memory scientific analytics&lt;/strong&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://twitter.com/Ed2uiz&quot;&gt;Edward Ruiz&lt;/a&gt; &lt;br&gt; &lt;em&gt;(&lt;a href=&quot;https://www.bu.edu/&quot;&gt;Boston University&lt;/a&gt;)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;A quack at building scalable data pipelines with DuckDB&lt;/strong&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.linkedin.com/in/junaidrahim/&quot;&gt;Junaid Rahim&lt;/a&gt; &lt;br&gt; &lt;em&gt;(&lt;a href=&quot;https://atlan.com/&quot;&gt;Atlan&lt;/a&gt;)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Quack attack: Bringing DuckDB to the dart side&lt;/strong&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.linkedin.com/in/andyprock/&quot;&gt;Andy Prock&lt;/a&gt; &lt;br&gt; &lt;em&gt;(&lt;a href=&quot;https://www.tigereye.com/&quot;&gt;TigerEye&lt;/a&gt;)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;5:00PM&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;Drinks and snacks&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&amp;nbsp;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;6:30PM&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&lt;em&gt;End of event&lt;/em&gt;&lt;/td&gt;
        &lt;td style=&quot;text-align: left&quot;&gt;&amp;nbsp;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;/tbody&gt;
        &lt;/table&gt;
        &lt;h3 id=&quot;availability-of-talks&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/08/15/duckcon5.html#availability-of-talks&quot;&gt;Availability of Talks&lt;/a&gt;
        &lt;/h3&gt;
        &lt;p&gt;The event will not be streamed online and there is no hybrid option for participation.
        However, the slides and recorded videos of the talks will be published a few weeks after the event, similarly to &lt;a href=&quot;https://duckdb.org/2023/04/28/duckcon3.html&quot;&gt;DuckCon #3&lt;/a&gt; and &lt;a href=&quot;https://duckdb.org/2023/10/06/duckcon4.html&quot;&gt;DuckCon #4&lt;/a&gt;.&lt;/p&gt;
        &lt;h3 id=&quot;registration-process&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/08/15/duckcon5.html#registration-process&quot;&gt;Registration Process&lt;/a&gt;
        &lt;/h3&gt;
        &lt;p&gt;Attendance is free. While supplies last, you can still get a ticket on &lt;a href=&quot;https://www.eventbrite.com/e/duckcon-5-tickets-877957674037&quot;&gt;Eventbrite&lt;/a&gt;.
        You will need to show this ticket at the entrance to attend.&lt;/p&gt;
        &lt;p&gt;&lt;strong&gt;If you register before July 24, you will get a badge with your name at the registration desk.&lt;/strong&gt;&lt;/p&gt;
        &lt;h3 id=&quot;parking-information&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/08/15/duckcon5.html#parking-information&quot;&gt;Parking Information&lt;/a&gt;
        &lt;/h3&gt;
        &lt;ul&gt;
        &lt;li&gt;A parking garage is available across the street at &lt;a href=&quot;https://maps.app.goo.gl/dWe76SbhGtZ2j9Dz7&quot;&gt;1609 Harvard Ave&lt;/a&gt; for $15/day.&lt;/li&gt;
        &lt;li&gt;There is no cloakroom and there are no lockers available.&lt;/li&gt;
        &lt;/ul&gt;
        &lt;h3 id=&quot;inquiries&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/08/15/duckcon5.html#inquiries&quot;&gt;Inquiries&lt;/a&gt;
        &lt;/h3&gt;
        &lt;p&gt;Please contact Kelly de Smit at &lt;a href=&quot;mailto:kelly@duckdblabs.com&quot;&gt;kelly@duckdblabs.com&lt;/a&gt; if you have any questions.&lt;/p&gt;
      </description>
      <link>https://duckdb.org/2024/08/15/duckcon5.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/08/15/duckcon5.html</guid>
      <pubDate>Thu, 15 Aug 2024 00:00:00 GMT</pubDate>
      <author>Mark Raasveldt, Hannes Mühleisen, Gabor Szarnyas, Kelly de Smit</author>
    </item>
    <item>
      <title>Memory Management in DuckDB</title>
      <description>&lt;h1&gt;Memory Management in DuckDB&lt;/h1&gt;
        &lt;div class=&quot;infoline&quot;&gt;
        &lt;div class=&quot;icon&quot;&gt;
        &lt;img src=&quot;https://duckdb.org/images/blog/authors/mark_raasveldt.jpg&quot; alt=&quot;Author Avatar&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;/div&gt;
        &lt;div&gt;&lt;span class=&quot;author&quot;&gt;Mark Raasveldt&lt;/span&gt;&lt;span&gt;2024-07-09&lt;/span&gt;&lt;/div&gt;
        &lt;/div&gt;
        &lt;p&gt;Memory is an important resource when processing large amounts of data. Memory is a fast caching layer that can provide immense speed-ups to query processing. However, memory is finite and expensive, and when working with large data sets there is generally not enough memory available to keep all necessary data structures cached. Managing memory effectively is critical for a high-performance query engine: memory must be utilized to provide that high performance, but excessive memory use can cause out-of-memory errors or prompt the ominous &lt;a href=&quot;https://en.wikipedia.org/wiki/Out_of_memory#Recovery&quot;&gt;OOM killer&lt;/a&gt; to zap the process out of existence.&lt;/p&gt;
        &lt;p&gt;DuckDB is built to effectively utilize available memory while avoiding running out of memory:&lt;/p&gt;
        &lt;ul&gt;
        &lt;li&gt;The streaming execution engine allows small chunks of data to flow through the system without requiring entire data sets to be materialized in memory.&lt;/li&gt;
        &lt;li&gt;Data from intermediates can be spilled to disk temporarily in order to free up space in memory, allowing computation of complex queries that would otherwise exceed the available memory.&lt;/li&gt;
        &lt;li&gt;The buffer manager caches as many pages as possible from any attached databases without exceeding the pre-defined memory limits.&lt;/li&gt;
        &lt;/ul&gt;
        &lt;p&gt;In this blog post we will cover these aspects of memory management within DuckDB – and provide examples of where they are utilized.&lt;/p&gt;
        &lt;h2 id=&quot;streaming-execution&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/07/09/memory-management.html#streaming-execution&quot;&gt;Streaming Execution&lt;/a&gt;
        &lt;/h2&gt;
        &lt;p&gt;DuckDB uses a streaming execution engine to process queries. Data sources, such as tables, CSV files or Parquet files, are never fully materialized in memory. Instead, data is read and processed one chunk at a time. For example, consider the execution of the following query:&lt;/p&gt;
        &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;UserAgent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&#39;hits.csv&#39;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;UserAgent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
        &lt;p&gt;Instead of reading the entire CSV file at once, DuckDB reads data from the CSV file in pieces, and computes the aggregation incrementally using the data read from those pieces. This happens continuously until the entire CSV file is read, at which point the entire aggregation result is computed.&lt;/p&gt;
        &lt;p&gt;&lt;img src=&quot;https://duckdb.org/images/blog/streamingexecution.png&quot; alt=&quot;DuckDB Streaming Execution&quot; width=&quot;800&quot; referrerpolicy=&quot;no-referrer&quot;&gt;&lt;/p&gt;
        &lt;p&gt;In the above example we are only showing a single data stream. In practice, DuckDB uses multiple data streams to enable multi-threaded execution – each thread executes its own data stream. The aggregation results of the different threads are combined to compute the final result.&lt;/p&gt;
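        &lt;p&gt;As a rough illustration of the idea (a minimal Python sketch, not DuckDB&#39;s actual implementation), the chunk-at-a-time aggregation and the merging of per-stream results could look as follows. The sample data, the chunk size and the helper names &lt;code&gt;read_chunks&lt;/code&gt;, &lt;code&gt;streaming_group_count&lt;/code&gt; and &lt;code&gt;combine&lt;/code&gt; are assumptions made for this example:&lt;/p&gt;

```python
import csv
import io
from collections import Counter
from itertools import islice


def read_chunks(rows, chunk_size):
    """Yield lists of up to chunk_size rows, so only one chunk is held at a time."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def streaming_group_count(rows, key, chunk_size=2):
    """Count rows per key value, updating the running aggregate chunk by chunk."""
    counts = Counter()
    for chunk in read_chunks(rows, chunk_size):
        counts.update(row[key] for row in chunk)
    return counts


def combine(partials):
    """Merge the partial aggregates of several streams into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical stand-ins for two slices of hits.csv, one per stream.
STREAM_A = "UserAgent\nfirefox\nchrome\nfirefox\n"
STREAM_B = "UserAgent\nsafari\nchrome\nfirefox\n"

partials = [
    streaming_group_count(csv.DictReader(io.StringIO(text)), "UserAgent")
    for text in (STREAM_A, STREAM_B)
]
result = combine(partials)
print(result)
```

        &lt;p&gt;Only the running counts and the current chunk are held in memory, which is why this approach works well as long as the number of groups stays small.&lt;/p&gt;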
        &lt;p&gt;While streaming execution is conceptually simple, it is powerful, and is sufficient to provide larger-than-memory support for many simple use cases. For example, streaming execution enables larger-than-memory support for:&lt;/p&gt;
        &lt;ul&gt;
        &lt;li&gt;Computing aggregations where the total number of groups is small&lt;/li&gt;
        &lt;li&gt;Reading data from one file and writing to another (e.g., reading from CSV and writing to Parquet)&lt;/li&gt;
        &lt;li&gt;Computing a Top-N over the data (where N is small)&lt;/li&gt;
        &lt;/ul&gt;
        &lt;p&gt;Note that nothing needs to be done to enable streaming execution – DuckDB always processes queries in this manner.&lt;/p&gt;
        &lt;h2 id=&quot;intermediate-spilling&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/07/09/memory-management.html#intermediate-spilling&quot;&gt;Intermediate Spilling&lt;/a&gt;
        &lt;/h2&gt;
        &lt;p&gt;While streaming execution enables larger-than-memory processing for simple queries, there are many cases where streaming execution alone is not sufficient.&lt;/p&gt;
        &lt;p&gt;In the previous example, streaming execution enabled larger-than-memory processing because the computed aggregate result was very small – as there are very few unique user agents in comparison to the total number of web requests. As a result, the aggregate hash table would always remain small, and never exceed the amount of available memory.&lt;/p&gt;
        &lt;p&gt;Streaming execution is not sufficient if the intermediates required to process a query are larger than memory. For example, suppose we group by the source IP in the previous example:&lt;/p&gt;
        &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IPNetworkID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&#39;hits.csv&#39;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IPNetworkID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
        &lt;p&gt;Since there are many more unique source IPs, the hash table we need to maintain is significantly larger. If the size of the aggregate hash table exceeds memory, the streaming execution engine is not sufficient to prevent out-of-memory issues.&lt;/p&gt;
        &lt;p&gt;Larger-than-memory intermediates can happen in many scenarios, in particular when executing more complex queries. For example, the following scenarios can lead to larger-than-memory intermediates:&lt;/p&gt;
        &lt;ul&gt;
        &lt;li&gt;Computing an aggregation with many unique groups&lt;/li&gt;
        &lt;li&gt;Computing an exact distinct count of a column with many distinct values&lt;/li&gt;
        &lt;li&gt;Joining two tables together that are both larger than memory&lt;/li&gt;
        &lt;li&gt;Sorting a larger-than-memory dataset&lt;/li&gt;
        &lt;li&gt;Computing a complex window over a larger-than-memory table&lt;/li&gt;
        &lt;/ul&gt;
        &lt;p&gt;DuckDB deals with these scenarios by disk spilling. Larger-than-memory intermediates are (partially) written to disk in the temporary directory when required. While powerful, disk spilling reduces performance – as additional I/O must be performed. For that reason, DuckDB tries to minimize disk spilling. Disk spilling is adaptively used only when the size of the intermediates increases past the memory limit. Even in those scenarios, as much data is kept in memory as possible to maximize performance. The exact way this is done depends on the operators and is detailed in other blog posts
        (&lt;a href=&quot;https://duckdb.org/2024/03/29/external-aggregation.html&quot;&gt;aggregation&lt;/a&gt;,
        &lt;a href=&quot;https://duckdb.org/2021/08/27/external-sorting.html&quot;&gt;sorting&lt;/a&gt;).&lt;/p&gt;
        &lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;memory_limit&lt;/code&gt; setting controls how much data DuckDB is allowed to keep in memory. By default, this is set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;80%&lt;/code&gt; of the physical RAM of your system (e.g., if your system has 16 GB RAM, this defaults to 12.8 GB). The memory limit can be changed using the following command:&lt;/p&gt;
        &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;py&quot;&gt;memory_limit&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&#39;4GB&#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
        &lt;p&gt;The location of the temporary directory can be chosen using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;temp_directory&lt;/code&gt; setting. By default, it is the path of the connected database with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.tmp&lt;/code&gt; suffix (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;database.db.tmp&lt;/code&gt;), or only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.tmp&lt;/code&gt; when connecting to an in-memory database. The maximum size of the temporary directory can be limited using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max_temp_directory_size&lt;/code&gt; setting, which defaults to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;90%&lt;/code&gt; of the remaining disk space on the drive where the temporary files are stored. These settings can be adjusted as follows:&lt;/p&gt;
        &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;py&quot;&gt;temp_directory&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&#39;/tmp/duckdb_swap&#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;py&quot;&gt;max_temp_directory_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&#39;100GB&#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
        &lt;p&gt;If the memory limit is exceeded and disk spilling cannot be used, an out-of-memory error is reported and the query is canceled. This happens when disk spilling is explicitly disabled, when the temporary directory size exceeds the provided limit, or when a system limitation prevents disk spilling for a given query.&lt;/p&gt;
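        &lt;p&gt;To make the spilling idea concrete, here is a deliberately simplified Python sketch. It is not how DuckDB implements external aggregation (see the linked blog posts for that). The group-count threshold &lt;code&gt;max_groups_in_memory&lt;/code&gt; is a hypothetical stand-in for the memory limit, the JSON spill files are an assumption made for this example, and the final merge reads all spilled partial results back into memory for brevity, whereas a real engine would partition the data so that each piece fits in memory:&lt;/p&gt;

```python
import json
import os
import tempfile
from collections import Counter


def spilling_group_count(keys, max_groups_in_memory, temp_dir):
    """Count occurrences per key, writing partial counts to temp_dir whenever
    the in-memory table grows past max_groups_in_memory groups."""
    counts = Counter()
    spill_files = []
    for key in keys:
        counts[key] += 1
        if len(counts) > max_groups_in_memory:
            # Spill the partial aggregate to disk and start over with an empty table.
            path = os.path.join(temp_dir, f"spill-{len(spill_files)}.json")
            with open(path, "w") as handle:
                json.dump(counts, handle)
            spill_files.append(path)
            counts = Counter()
    # Simplified merge step: add the spilled partial counts back in.
    for path in spill_files:
        with open(path) as handle:
            counts.update(json.load(handle))
    return counts, len(spill_files)


with tempfile.TemporaryDirectory() as temp_dir:
    # Hypothetical source IPs standing in for the IPNetworkID column.
    ips = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.2", "10.0.0.1"]
    result, spills = spilling_group_count(ips, max_groups_in_memory=2, temp_dir=temp_dir)

print(result, spills)
```

        &lt;p&gt;Nothing is written to disk as long as the number of groups stays within the threshold, mirroring the adaptive behavior described above.&lt;/p&gt;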
        &lt;h2 id=&quot;buffer-manager&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/07/09/memory-management.html#buffer-manager&quot;&gt;Buffer Manager&lt;/a&gt;
        &lt;/h2&gt;
        &lt;p&gt;Another core component of memory management in DuckDB is the buffer manager. The buffer manager is responsible for caching pages from DuckDB&#39;s own persistent storage. Conceptually the buffer manager works in a similar fashion to the intermediate spilling. Pages are kept in memory as much as possible, and evicted from memory when space is required for other data structures. The buffer manager abides by the same memory limit as any intermediate data structures. Pages in the buffer manager can be freed up to make space for intermediate data structures, or vice versa.&lt;/p&gt;
        &lt;p&gt;There are two main differences between the buffer manager and intermediate data structures:&lt;/p&gt;
        &lt;ul&gt;
        &lt;li&gt;The buffer manager caches pages that already exist on disk (in DuckDB&#39;s persistent storage), so they do not need to be written to the temporary directory when evicted. Instead, when they are required again, they can be re-read from the attached storage file directly.&lt;/li&gt;
        &lt;li&gt;Query intermediates have a natural life-cycle: once the query has finished processing, the intermediates are no longer required. Pages cached by the buffer manager from persistent storage, in contrast, are useful across queries. As such, they stay cached until either the persistent database is closed or space must be freed up for other operations.&lt;/li&gt;
        &lt;/ul&gt;
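        &lt;p&gt;The first difference can be illustrated with a small page cache, sketched here in Python. The least-recently-used eviction policy, the class name &lt;code&gt;PageCache&lt;/code&gt; and the in-memory &lt;code&gt;storage&lt;/code&gt; dictionary standing in for the database file are all assumptions made for this example; the post does not describe DuckDB&#39;s actual eviction policy:&lt;/p&gt;

```python
from collections import OrderedDict


class PageCache:
    """Toy page cache: evicted pages are simply dropped, because they can be
    re-read from persistent storage when needed again."""

    def __init__(self, read_page, capacity):
        self.read_page = read_page  # callable that loads a page from storage
        self.capacity = capacity    # maximum number of cached pages
        self.pages = OrderedDict()  # page_id mapped to data, oldest first
        self.reads = 0              # number of reads that went to storage

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        self.reads += 1
        data = self.read_page(page_id)
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            # Evict the least recently used page; no write to the temporary
            # directory is needed since the page already exists on disk.
            self.pages.popitem(last=False)
        return data


# Hypothetical storage file with three pages.
storage = {0: b"page-0", 1: b"page-1", 2: b"page-2"}
cache = PageCache(storage.__getitem__, capacity=2)
for page_id in [0, 1, 0, 2, 1, 0]:
    cache.get(page_id)

print(cache.reads, list(cache.pages))
```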
        &lt;p&gt;The performance boost of the buffer manager depends on the speed of the underlying storage medium. When data is stored on a very fast disk, reading data is fast and the speed-up is minimal. When data is stored on a network drive or read over http/S3, reading requires performing network requests, and the speed-up can be very large.&lt;/p&gt;
        &lt;h2 id=&quot;profiling-memory-usage&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/07/09/memory-management.html#profiling-memory-usage&quot;&gt;Profiling Memory Usage&lt;/a&gt;
        &lt;/h2&gt;
        &lt;p&gt;DuckDB contains a number of tools that can be used to profile memory usage.&lt;/p&gt;
        &lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;duckdb_memory()&lt;/code&gt; function can be used to inspect which components of the system are using memory. Memory used by the buffer manager is labeled as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BASE_TABLE&lt;/code&gt;, while query intermediates are divided into separate groups.&lt;/p&gt;
        &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;duckdb_memory&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
        &lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;┌──────────────────┬────────────────────┬─────────────────────────┐
        │       tag        │ memory_usage_bytes │ temporary_storage_bytes │
        │     varchar      │       int64        │          int64          │
        ├──────────────────┼────────────────────┼─────────────────────────┤
        │ BASE_TABLE       │          168558592 │                       0 │
        │ HASH_TABLE       │                  0 │                       0 │
        │ PARQUET_READER   │                  0 │                       0 │
        │ CSV_READER       │                  0 │                       0 │
        │ ORDER_BY         │                  0 │                       0 │
        │ ART_INDEX        │                  0 │                       0 │
        │ COLUMN_DATA      │                  0 │                       0 │
        │ METADATA         │                  0 │                       0 │
        │ OVERFLOW_STRINGS │                  0 │                       0 │
        │ IN_MEMORY_TABLE  │                  0 │                       0 │
        │ ALLOCATOR        │                  0 │                       0 │
        │ EXTENSION        │                  0 │                       0 │
        ├──────────────────┴────────────────────┴─────────────────────────┤
        │ 12 rows                                               3 columns │
        └─────────────────────────────────────────────────────────────────┘
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
        &lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;duckdb_temporary_files()&lt;/code&gt; function can be used to examine the current contents of the temporary directory.&lt;/p&gt;
        &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;duckdb_temporary_files&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
        &lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;┌────────────────────────────────┬───────────┐
        │              path              │   size    │
        │            varchar             │   int64   │
        ├────────────────────────────────┼───────────┤
        │ .tmp/duckdb_temp_storage-0.tmp │ 967049216 │
        └────────────────────────────────┴───────────┘
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
        &lt;h2 id=&quot;conclusion&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/07/09/memory-management.html#conclusion&quot;&gt;Conclusion&lt;/a&gt;
        &lt;/h2&gt;
        &lt;p&gt;Memory management is critical for a high-performance analytics engine. DuckDB is built to take advantage of any available memory to speed up query processing, while gracefully dealing with larger-than-memory datasets using intermediate spilling. Memory management is still an active area of development and has &lt;a href=&quot;https://duckdb.org/2024/06/26/benchmarks-over-time.html#scale-tests&quot;&gt;continuously improved across DuckDB versions&lt;/a&gt;. Amongst others, we are working on improving memory management for complex queries that involve multiple operators with larger-than-memory intermediates.&lt;/p&gt;
      </description>
      <link>https://duckdb.org/2024/07/09/memory-management.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/07/09/memory-management.html</guid>
      <pubDate>Tue, 09 Jul 2024 00:00:00 GMT</pubDate>
      <author>Mark Raasveldt</author>
    </item>
    <item>
      <title>DuckDB Community Extensions</title>
      <description>&lt;h1&gt;DuckDB Community Extensions&lt;/h1&gt;
        &lt;div class=&quot;infoline&quot;&gt;
        &lt;div class=&quot;icon&quot;&gt;
        &lt;/div&gt;
        &lt;div&gt;&lt;span class=&quot;author&quot;&gt;The DuckDB team&lt;/span&gt;&lt;span&gt;2024-07-05&lt;/span&gt;&lt;/div&gt;
        &lt;/div&gt;
        &lt;div class=&quot;excerpt&quot;&gt;&lt;p&gt;&lt;em&gt;TL;DR: DuckDB extensions can now be published via the &lt;a href=&quot;https://github.com/duckdb/community-extensions&quot;&gt;DuckDB Community Extensions repository&lt;/a&gt;. The repository makes it easier for users to install extensions using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSTALL ⟨extension name⟩ FROM community&lt;/code&gt; syntax. Extension developers avoid the burdens of compilation and distribution.&lt;/em&gt;&lt;/p&gt;
        &lt;/div&gt;
        &lt;h2 id=&quot;duckdb-extensions&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/07/05/community-extensions.html#duckdb-extensions&quot;&gt;DuckDB Extensions&lt;/a&gt;
        &lt;/h2&gt;
        &lt;h3 id=&quot;design-philosophy&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/07/05/community-extensions.html#design-philosophy&quot;&gt;Design Philosophy&lt;/a&gt;
        &lt;/h3&gt;
        &lt;p&gt;One of the main design goals of DuckDB is &lt;em&gt;simplicity&lt;/em&gt;, which – to us – implies that the system should be rather nimble, very light on dependencies, and generally small enough to run on constrained platforms like &lt;a href=&quot;https://duckdb.org/docs/api/wasm/overview.html&quot;&gt;WebAssembly&lt;/a&gt;. This goal is in direct conflict with very reasonable user requests to support advanced features like spatial data analysis, vector indexes, connectivity to various other databases, support for data formats, etc. Baking all those features into a monolithic binary is certainly possible and the route some systems take. But we want to preserve DuckDB’s simplicity. Also, shipping all possible features would be quite excessive for most users because no use cases require &lt;em&gt;all&lt;/em&gt; extensions at the same time (the “Microsoft Word paradox”, where even power users only use a few features of the system, but the exact set of features varies between users).&lt;/p&gt;
        &lt;p&gt;To achieve this, DuckDB has a powerful extension mechanism, which allows users to add new functionality to DuckDB. This mechanism allows for registering new functions, supporting new file formats and compression methods, handling new network protocols, etc. In fact, many of DuckDB’s popular features are implemented as extensions: the &lt;a href=&quot;https://duckdb.org/docs/data/parquet/overview.html&quot;&gt;Parquet reader&lt;/a&gt;, the &lt;a href=&quot;https://duckdb.org/docs/extensions/json.html&quot;&gt;JSON reader&lt;/a&gt;, and the &lt;a href=&quot;https://duckdb.org/docs/extensions/httpfs/overview.html&quot;&gt;HTTPS/S3 connector&lt;/a&gt; all use the extension mechanism.&lt;/p&gt;
        &lt;h3 id=&quot;using-extensions&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/07/05/community-extensions.html#using-extensions&quot;&gt;Using Extensions&lt;/a&gt;
        &lt;/h3&gt;
        &lt;p&gt;Since &lt;a href=&quot;https://github.com/duckdb/duckdb/releases/tag/v0.3.2&quot;&gt;version 0.3.2&lt;/a&gt;, we have greatly simplified the discovery and installation of extensions by hosting them on a centralized extension repository. So, for example, to install the &lt;a href=&quot;https://duckdb.org/docs/extensions/spatial.html&quot;&gt;spatial extension&lt;/a&gt;, one can just run the following commands using DuckDB’s SQL interface:&lt;/p&gt;
        &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSTALL&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;spatial&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- once&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;LOAD&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;spatial&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- on each use&lt;/span&gt;
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
        &lt;p&gt;What happens behind the scenes is that DuckDB downloads an extension binary suitable to the current operating system and processor architecture (e.g., macOS on ARM64) and stores it in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;~/.duckdb&lt;/code&gt; folder. On each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LOAD&lt;/code&gt;, this file is loaded into the running DuckDB instance, and things happily continue from there. Of course, for this to work, we compile, sign and host the extensions for a rather large and growing list of combinations of processor architecture and operating system. This mechanism is already heavily used: currently, we see around six million extension downloads &lt;em&gt;each week&lt;/em&gt; with a corresponding data transfer volume of around 40 terabytes!&lt;/p&gt;
        &lt;p&gt;Until now, publishing third-party extensions has been a &lt;em&gt;difficult process&lt;/em&gt; which required the extension developer to build the extensions in their repositories for a host of platforms. Moreover, they were unable to sign the extensions using official keys, forcing users to use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;allow_unsigned_extensions&lt;/code&gt; option, which disables signature checks and is problematic in itself.&lt;/p&gt;
        &lt;h2 id=&quot;duckdb-community-extensions&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/07/05/community-extensions.html#duckdb-community-extensions&quot;&gt;DuckDB Community Extensions&lt;/a&gt;
        &lt;/h2&gt;
        &lt;p&gt;Distributing software in a safe way has never been easier, allowing us to reach a wide base of users across pip, conda, cran, npm, brew, etc. We want to provide a similar experience both to users, who should be able to easily grab the extensions they want to use, and to developers, who should not be burdened with distribution details. We are also interested in lowering the bar to package utilities and scripts as a DuckDB extension, empowering users to package useful functionality connected to their area of expertise (or pain points).&lt;/p&gt;
        &lt;p&gt;We believe that fostering a community extension ecosystem is the next logical step for DuckDB. That’s why we’re very excited about launching our &lt;a href=&quot;https://github.com/duckdb/community-extensions/&quot;&gt;Community Extension repository&lt;/a&gt; which was &lt;a href=&quot;https://youtu.be/wuP6iEYH11E?t=275&quot;&gt;announced at the Data + AI Summit&lt;/a&gt;.&lt;/p&gt;
        &lt;p&gt;For users, this repository allows for easy discovery, installation and maintenance of community extensions directly from the DuckDB SQL prompt. For developers, it greatly streamlines the publication process of extensions. In the following, we’ll discuss how the new extension repository enhances the experiences of these groups.&lt;/p&gt;
        &lt;h3 id=&quot;user-experience&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/07/05/community-extensions.html#user-experience&quot;&gt;User Experience&lt;/a&gt;
        &lt;/h3&gt;
        &lt;p&gt;We are going to use the &lt;a href=&quot;https://github.com/isaacbrodsky/h3-duckdb&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;h3&lt;/code&gt; extension&lt;/a&gt; as our example. This extension implements &lt;a href=&quot;https://github.com/uber/h3&quot;&gt;hierarchical hexagonal indexing&lt;/a&gt; for geospatial data.&lt;/p&gt;
        &lt;p&gt;Using the DuckDB Community Extensions repository, you can now install and load the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;h3&lt;/code&gt; extension as follows:&lt;/p&gt;
        &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;INSTALL&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;h3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;community&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;LOAD&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;h3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
        &lt;p&gt;Then, you can instantly start using it. Note that the sample data is 500 MB:&lt;/p&gt;
        &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;h3_latlng_to_cell&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pickup_latitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pickup_longitude&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cell_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;h3_cell_to_boundary_wkt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cell_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;boundary&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cnt&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;read_parquet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&#39;https://blobs.duckdb.org/data/yellow_tripdata_2010-01.parquet&#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cell_id&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;HAVING&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cnt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
        &lt;p&gt;On load, the extension’s signature is checked, both to ensure platform and versions are compatible, and to verify that the source of the binary is the community extensions repository. Extensions are built, signed and distributed for Linux, macOS, Windows, and WebAssembly. This allows extensions to be available to any DuckDB client using version 1.0.0 and upcoming versions.&lt;/p&gt;
        &lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;h3&lt;/code&gt; extension’s documentation is available at &lt;a href=&quot;https://community-extensions.duckdb.org/extensions/h3.html&quot;&gt;https://community-extensions.duckdb.org/extensions/h3.html&lt;/a&gt;.&lt;/p&gt;
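        &lt;p&gt;As a minimal sketch (not part of the original post), one way to confirm that the extension was installed and loaded is to query DuckDB&#39;s built-in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;duckdb_extensions()&lt;/code&gt; table function. It assumes the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INSTALL&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LOAD&lt;/code&gt; statements above have already run in the same session:&lt;/p&gt;
        &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Sketch: show the install and load status of the h3 extension
        SELECT extension_name, installed, loaded
        FROM duckdb_extensions()
        WHERE extension_name = &#39;h3&#39;;
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;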
        &lt;h3 id=&quot;developer-experience&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/07/05/community-extensions.html#developer-experience&quot;&gt;Developer Experience&lt;/a&gt;
        &lt;/h3&gt;
        &lt;p&gt;From the developer’s perspective, the Community Extensions repository performs the steps required for publishing extensions, including building the extensions for all relevant &lt;a href=&quot;https://duckdb.org/docs/dev/building/supported_platforms.html&quot;&gt;platforms&lt;/a&gt;, signing the extension binaries and serving them from the repository.&lt;/p&gt;
        &lt;p&gt;For the &lt;a href=&quot;https://github.com/isaacbrodsky/&quot;&gt;maintainer of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;h3&lt;/code&gt;&lt;/a&gt;, the publication process required performing the following steps:&lt;/p&gt;
        &lt;ol&gt;
        &lt;li&gt;
        &lt;p&gt;Sending a PR with a metadata file &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;description.yml&lt;/code&gt; that contains the description of the extension:&lt;/p&gt;
        &lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;na&quot;&gt;extension&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;h3&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;description&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Hierarchical hexagonal indexing for geospatial data&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;1.0.0&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;language&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;C++&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;cmake&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;license&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Apache-2.0&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;maintainers&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;isaacbrodsky&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;repo&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;github&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;isaacbrodsky/h3-duckdb&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;ref&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;3c8a5358e42ab8d11e0253c70f7cc7d37781b2ef&lt;/span&gt;
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt; &lt;/div&gt;
        &lt;/li&gt;
        &lt;li&gt;
        &lt;p&gt;The CI will build and test the extension. The checks performed by the CI are aligned with the &lt;a href=&quot;https://github.com/duckdb/extension-template&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;extension-template&lt;/code&gt; repository&lt;/a&gt;, so iterations can be done independently.&lt;/p&gt;
        &lt;/li&gt;
        &lt;li&gt;
        &lt;p&gt;Wait for approval from the DuckDB Community Extensions repository’s maintainers and for the build process to complete.&lt;/p&gt;
        &lt;/li&gt;
        &lt;/ol&gt;
        &lt;h2 id=&quot;published-extensions&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/07/05/community-extensions.html#published-extensions&quot;&gt;Published Extensions&lt;/a&gt;
        &lt;/h2&gt;
        &lt;p&gt;To show that it’s feasible to publish extensions, we reached out to a few developers of key extensions. At the time of the publication of this blog post, the DuckDB Community Extensions repository already contains the following extensions.&lt;/p&gt;
        &lt;div class=&quot;narrow_table&quot;&gt;&lt;/div&gt;
        &lt;table&gt;
        &lt;thead&gt;
        &lt;tr&gt;
        &lt;th&gt;Name&lt;/th&gt;
        &lt;th&gt;Description&lt;/th&gt;
        &lt;/tr&gt;
        &lt;/thead&gt;
        &lt;tbody&gt;
        &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://github.com/rustyconover/duckdb-crypto-extension&quot;&gt;crypto&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;Adds cryptographic hash functions and &lt;a href=&quot;https://en.wikipedia.org/wiki/HMAC&quot;&gt;HMAC&lt;/a&gt;.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://github.com/isaacbrodsky/h3-duckdb&quot;&gt;h3&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;Implements hierarchical hexagonal indexing for geospatial data.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://github.com/rustyconover/duckdb-lindel-extension&quot;&gt;lindel&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;Implements linearization/delinearization, Z-Order, Hilbert and Morton curves.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://github.com/ywelsch/duckdb-prql&quot;&gt;prql&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;Allows running &lt;a href=&quot;https://prql-lang.org/&quot;&gt;PRQL&lt;/a&gt; commands directly within DuckDB.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://github.com/pdet/Scrooge-McDuck&quot;&gt;scrooge&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;Supports a set of aggregation functions and data scanners for financial data.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://github.com/rustyconover/duckdb-shellfs-extension&quot;&gt;shellfs&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;Allows shell commands to be used for input and output.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;/tbody&gt;
        &lt;/table&gt;
        &lt;p&gt;DuckDB Labs and the DuckDB Foundation do not vet the code within community extensions and, therefore, cannot guarantee that DuckDB community extensions are safe to use. The loading of community extensions can be explicitly disabled with the following one-way configuration option:&lt;/p&gt;
        &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;py&quot;&gt;allow_community_extensions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
        &lt;p&gt;For more details, see the documentation’s &lt;a href=&quot;https://duckdb.org/docs/operations_manual/securing_duckdb/securing_extensions.html#community-extension&quot;&gt;Securing DuckDB page&lt;/a&gt;.&lt;/p&gt;
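        &lt;p&gt;A hedged sketch of how the option might be inspected before and after disabling it, using DuckDB&#39;s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;current_setting&lt;/code&gt; function. Because the option is described as one-way, this is best tried in a throwaway session where community extensions should stay disabled:&lt;/p&gt;
        &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Sketch: read the option, disable it, then read it again
        SELECT current_setting(&#39;allow_community_extensions&#39;);
        SET allow_community_extensions = false;
        SELECT current_setting(&#39;allow_community_extensions&#39;);
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;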
        &lt;h2 id=&quot;summary-and-looking-ahead&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/07/05/community-extensions.html#summary-and-looking-ahead&quot;&gt;Summary and Looking Ahead&lt;/a&gt;
        &lt;/h2&gt;
        &lt;p&gt;In this blog post, we introduced the DuckDB Community Extensions repository, which allows easy installation of third-party DuckDB extensions.&lt;/p&gt;
        &lt;p&gt;We are looking forward to continuously extending this repository. If you have an idea for creating an extension, take a look at the source code of the already published extensions, which provides good examples of how to package community extensions, and join the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;#extensions&lt;/code&gt; channel on our &lt;a href=&quot;https://discord.duckdb.org/&quot;&gt;Discord&lt;/a&gt;.
        Once you have an extension, please contribute it via a &lt;a href=&quot;https://github.com/duckdb/community-extensions/pulls&quot;&gt;pull request&lt;/a&gt;.&lt;/p&gt;
        &lt;p&gt;Finally, we would like to thank the early adopters of DuckDB’s extension mechanism and the Community Extensions repository. Thanks for iterating with us and providing feedback.&lt;/p&gt;
      </description>
      <link>https://duckdb.org/2024/07/05/community-extensions.html</link>
      <guid isPermaLink="false">https://duckdb.org/2024/07/05/community-extensions.html</guid>
      <pubDate>Fri, 05 Jul 2024 00:00:00 GMT</pubDate>
      <author>The DuckDB team</author>
    </item>
    <item>
      <title>Benchmarking Ourselves over Time at DuckDB</title>
      <description>&lt;h1&gt;Benchmarking Ourselves over Time at DuckDB&lt;/h1&gt;
        &lt;div class=&quot;infoline&quot;&gt;
        &lt;div class=&quot;icon&quot;&gt;
        &lt;img src=&quot;https://duckdb.org/images/blog/authors/alex_monahan.jpg&quot; alt=&quot;Author Avatar&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;/div&gt;
        &lt;div&gt;&lt;span class=&quot;author&quot;&gt;Alex Monahan&lt;/span&gt;&lt;span&gt;2024-06-26&lt;/span&gt;&lt;/div&gt;
        &lt;/div&gt;
        &lt;div class=&quot;excerpt&quot;&gt;&lt;p&gt;&lt;em&gt;TL;DR: In the last 3 years, DuckDB has become 3-25x faster and can analyze ~10x larger datasets all on the same hardware.&lt;/em&gt;&lt;/p&gt;
        &lt;/div&gt;
        &lt;!-- &lt;script src=&quot;https://cdn.plot.ly/plotly-latest.min.js&quot;&gt;&lt;/script&gt; --&gt;
        &lt;div id=&quot;overall_results_by_time_header&quot; style=&quot;width:100%;height:400px;&quot;&gt;&lt;/div&gt;
        &lt;p&gt;A big part of DuckDB&#39;s focus is on the developer experience of working with data.
        However, performance is an important consideration when investigating data management systems.
        Fairly comparing data processing systems using benchmarks is &lt;a href=&quot;https://mytherin.github.io/papers/2018-dbtest.pdf&quot;&gt;very difficult&lt;/a&gt;.
        Whoever creates the benchmark is likely to know one system better than the rest, influencing benchmark selection, how much time is spent tuning parameters, and more.&lt;/p&gt;
        &lt;p&gt;Instead, this post focuses on benchmarking &lt;em&gt;our own&lt;/em&gt; performance over time.
        &lt;!-- (Of course, we encourage you to conduct your own benchmarks and welcome your feedback on our [Discord server](https://discord.duckdb.org/) or in [GitHub discussions](https://github.com/duckdb/duckdb/discussions)!). --&gt;
        This approach avoids many comparison pitfalls, and also provides several valuable data points to consider when selecting a system.&lt;/p&gt;
        &lt;ul&gt;
        &lt;li&gt;
        &lt;p&gt;&lt;strong&gt;How fast is it improving?&lt;/strong&gt;
        Learning a new tool is an investment.
        Picking a vibrant, rapidly improving database ensures your choice pays dividends for years to come.
        Plus, if you haven&#39;t experimented with a tool in a while, you can see how much faster it has become since you last checked!&lt;/p&gt;
        &lt;/li&gt;
        &lt;li&gt;
        &lt;p&gt;&lt;strong&gt;What is it especially good at?&lt;/strong&gt;
        The choice of benchmark is an indicator of what types of workloads a tool is useful for.
        The higher the variety of analyses in the benchmark, the more broadly useful the tool can be.&lt;/p&gt;
        &lt;/li&gt;
        &lt;li&gt;
        &lt;p&gt;&lt;strong&gt;What scale of data can it handle?&lt;/strong&gt;
        Many benchmarks are deliberately smaller than typical workloads.
        This allows the benchmark to complete in a reasonable amount of time when run with many configurations.
        However, an important question to answer when selecting a system is whether the size of your data can be handled within the size of your compute resources.&lt;/p&gt;
        &lt;/li&gt;
        &lt;/ul&gt;
        &lt;!-- #### Limitations of Benchmarking over Time --&gt;
        &lt;p&gt;There are some limitations when looking at the performance of a system over time.
        If a feature is brand new, there is no prior performance to compare to!
        As a result, this post focuses on fundamental workloads rather than DuckDB&#39;s ever-increasing set of integrations with different lakehouse data formats, cloud services, and more.&lt;/p&gt;
        &lt;p&gt;The code used to run the benchmark also avoids many of DuckDB&#39;s &lt;a href=&quot;https://duckdb.org/docs/sql/dialect/friendly_sql.html&quot;&gt;Friendlier SQL&lt;/a&gt; additions, as those have also been added more recently.
        (When writing these queries, it felt like going back in time!)&lt;/p&gt;
        &lt;h2 id=&quot;benchmark-design-summary&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/06/26/benchmarks-over-time.html#benchmark-design-summary&quot;&gt;Benchmark Design Summary&lt;/a&gt;
        &lt;/h2&gt;
        &lt;p&gt;This post measures DuckDB&#39;s performance over time using the &lt;a href=&quot;https://duckdblabs.github.io/db-benchmark/&quot;&gt;H2O.ai benchmark&lt;/a&gt;, plus some new benchmarks added for importing, exporting, and using window functions.
        Please see our previous &lt;a href=&quot;https://duckdb.org/2023/04/14/h2oai.html&quot;&gt;blog&lt;/a&gt; &lt;a href=&quot;https://duckdb.org/2023/11/03/db-benchmark-update.html&quot;&gt;posts&lt;/a&gt; for details on why we believe the H2O.ai benchmark is a good approach! The full details of the benchmark design are in the appendix.&lt;/p&gt;
        &lt;ul&gt;
        &lt;li&gt;H2O.ai, plus import/export and window function tests&lt;/li&gt;
        &lt;li&gt;Python instead of R&lt;/li&gt;
        &lt;li&gt;5GB scale for everything, plus 50GB scale for group bys and joins&lt;/li&gt;
        &lt;li&gt;Median of 3 runs&lt;/li&gt;
        &lt;li&gt;Using a MacBook Pro M1 with 16GB RAM&lt;/li&gt;
        &lt;li&gt;DuckDB Versions 0.2.7 through 1.0.0
        &lt;ul&gt;
        &lt;li&gt;Nearly 3 years, from 2021-06-14 to 2024-06-03&lt;/li&gt;
        &lt;/ul&gt;
        &lt;/li&gt;
        &lt;li&gt;Default settings&lt;/li&gt;
        &lt;li&gt;Pandas for DuckDB versions before 0.5.1, Apache Arrow for 0.5.1 and later&lt;/li&gt;
        &lt;/ul&gt;
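        &lt;p&gt;To illustrate the median-of-3-runs aggregation listed above, here is a minimal sketch. It assumes a hypothetical table &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;benchmark_runs&lt;/code&gt; with one row per run; the table and column names are illustrative and are not taken from the benchmark code:&lt;/p&gt;
        &lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Hypothetical table: benchmark_runs(duckdb_version, test_name, runtime_seconds)
        -- Sketch: take the median runtime of the repeated runs per version and test
        SELECT duckdb_version, test_name, median(runtime_seconds) AS median_runtime_seconds
        FROM benchmark_runs
        GROUP BY duckdb_version, test_name
        ORDER BY duckdb_version, test_name;
        &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;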
        &lt;h2 id=&quot;overall-benchmark-results&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/06/26/benchmarks-over-time.html#overall-benchmark-results&quot;&gt;Overall Benchmark Results&lt;/a&gt;
        &lt;/h2&gt;
        &lt;p&gt;The latest DuckDB can complete one run of the full benchmark suite in under 35 seconds, while version 0.2.7 required nearly 500 seconds for the same task in June 2021.
        &lt;strong&gt;That is 14 times faster, in only 3 years!&lt;/strong&gt;&lt;/p&gt;
        &lt;h3 id=&quot;performance-over-time&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/06/26/benchmarks-over-time.html#performance-over-time&quot;&gt;Performance over Time&lt;/a&gt;
        &lt;/h3&gt;
        &lt;div id=&quot;overall_results_by_time&quot; style=&quot;width:100%;height:400px;&quot;&gt;&lt;/div&gt;
        &lt;blockquote&gt;
        &lt;p&gt;Note: These graphs are interactive, thanks to &lt;a href=&quot;https://plotly.com/javascript/&quot;&gt;Plotly.js&lt;/a&gt;!
        Feel free to filter the various series (single click to hide, double click to show only that series) and click-and-drag to zoom in.
        Individual benchmark results are visible on hover.&lt;/p&gt;
        &lt;/blockquote&gt;
        &lt;p&gt;The above plot shows the median runtime in seconds for all tests.
        Due to the variety of uses for window functions, and their relative algorithmic complexity, the 16 window function tests require the most time of any category.&lt;/p&gt;
        &lt;div id=&quot;overall_results_by_time_relative&quot; style=&quot;width:100%;height:400px;&quot;&gt;&lt;/div&gt;
        &lt;p&gt;This plot normalizes performance to the latest version of DuckDB to show relative improvements over time.
        If you look at the point in time when you most recently measured DuckDB performance, that number will show you how many times faster DuckDB is now!&lt;/p&gt;
        &lt;p&gt;A portion of the overall improvement is DuckDB&#39;s addition of multi-threading, which became the default in November 2021 with version 0.3.1.
        DuckDB also moved to a push-based execution model in that version for additional gains.
        Parallel data loading boosted performance in December 2022 with version 0.6.1, as did improvements to the core &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JOIN&lt;/code&gt; algorithm.
        We will explore other improvements in detail later in the post.&lt;/p&gt;
        &lt;p&gt;However, we see that all aspects of the system have seen improvements, not just raw query performance!
        DuckDB focuses on the entire data analysis workflow, not just aggregate or join performance.
        CSV parsing has seen significant gains, import and export have improved significantly, and window functions have improved the most of all.&lt;/p&gt;
        &lt;p&gt;What was the slight regression from December 2022 to June 2023?
        Window functions received additional capabilities and experienced a slight performance degradation in the process.
        However, from June 2023 onward we see substantial performance improvement across the board for window functions.
        If window functions are filtered out of the chart, we see a smoother trend.&lt;/p&gt;
        &lt;p&gt;You may also notice that starting with version 0.9 in September 2023, the performance appears to plateau.
        What is happening here?
        First, don&#39;t forget to zoom in!
        Over the last year, DuckDB has still improved over 3x!
        More recently, the DuckDB Labs team focused on scalability by developing algorithms that support larger-than-memory calculations.
        We will see the fruits of those labors in the scale section later on!
        In addition, DuckDB focused exclusively on bug fixes in versions 0.10.1, 0.10.2, and 0.10.3 in preparation for an especially robust DuckDB 1.0.
        Now that those two major milestones (larger than memory calculations and DuckDB 1.0) have been accomplished, performance improvements will resume!
        It is worth noting that the boost from moving to multi-threading will only occur once, but there are still many opportunities moving forward.&lt;/p&gt;
        &lt;h3 id=&quot;performance-by-version&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/06/26/benchmarks-over-time.html#performance-by-version&quot;&gt;Performance by Version&lt;/a&gt;
        &lt;/h3&gt;
        &lt;p&gt;We can also recreate the overall plot by version rather than by time.
        This demonstrates that DuckDB has been doing more frequent releases recently.
        See &lt;a href=&quot;https://duckdb.org/docs/dev/release_calendar.html&quot;&gt;DuckDB&#39;s release calendar&lt;/a&gt; for the full version history.&lt;/p&gt;
        &lt;div id=&quot;overall_results_by_version&quot; style=&quot;width:100%;height:400px;&quot;&gt;&lt;/div&gt;
        &lt;p&gt;If you remember the version that you last tested, you can compare how much faster things are now with 1.0!&lt;/p&gt;
        &lt;div id=&quot;overall_results_by_version_relative&quot; style=&quot;width:100%;height:400px;&quot;&gt;&lt;/div&gt;
        &lt;h2 id=&quot;results-by-workload&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/06/26/benchmarks-over-time.html#results-by-workload&quot;&gt;Results by Workload&lt;/a&gt;
        &lt;/h2&gt;
        &lt;h3 id=&quot;csv-reader&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/06/26/benchmarks-over-time.html#csv-reader&quot;&gt;CSV Reader&lt;/a&gt;
        &lt;/h3&gt;
        &lt;div id=&quot;perf_over_time_csv_reader_area&quot; style=&quot;width:100%;height:400px;&quot;&gt;&lt;/div&gt;
        &lt;p&gt;DuckDB has invested substantially in building a &lt;a href=&quot;https://duckdb.org/2023/10/27/csv-sniffer.html&quot;&gt;fast and robust CSV parser&lt;/a&gt;.
        This is often the first task in a data analysis workload, and it tends to be undervalued and underbenchmarked.
        DuckDB has &lt;strong&gt;improved CSV reader performance by nearly 3x&lt;/strong&gt;, while adding the ability to handle many more CSV dialects automatically.&lt;/p&gt;
        &lt;h3 id=&quot;group-by&quot;&gt;
        &lt;a style=&quot;text-decoration: none;&quot; href=&quot;https://duckdb.org/2024/06/26/benchmarks-over-time.html#group-by&quot;&gt;Group By&lt;/a&gt;
        &lt;/h3&gt;
        &lt;div id=&quot;perf_over_time_group_by_area&quot; style=&quot;width:100%;height:400px;&quot;&gt;&lt;/div&gt;
        &lt;p&gt;Group by or aggregation operations are critical steps in OLAP workloads, and have therefore received substantial focus in DuckDB, &lt;strong&gt;improving over 12x in the last 3 years&lt;/strong&gt;.&lt;/p&gt;
        &lt;p&gt;In November 2021, version 0.3.1 enabled multithreaded aggregation by default, providing a significant speedup.&lt;/p&gt;
        &lt;p&gt;In December 2022, data loads into tables were parallelized with the release of version 0.6.1.
        This is another example of improving the entire data workflow, as this group by benchmark actually stressed the insertion performance substantially.

@TonyRL TonyRL merged commit ad13581 into DIYgod:master Jul 16, 2024
27 checks passed
wonktondI pushed a commit to wonktondI/RSSHub that referenced this pull request Jul 28, 2024
* feat(route): add route for DuckDB news

* fix(route): fix the missing of full content and author of DuckDB news