28 changes: 28 additions & 0 deletions .htaccess
@@ -1,3 +1,31 @@

# redirect urls to the output directory
# See https://issues.apache.org/jira/browse/INFRA-27512
RewriteEngine On

# Two-step flow:
# 1) Normalize extensionless URLs to a trailing slash.
# 2) Internally rewrite all non-output requests into output/.

# Redirect extensionless, non-output URLs to the trailing-slash form
# (example: /blog/2025/12/15/post -> /blog/2025/12/15/post/).
RewriteCond %{ENV:REDIRECT_STATUS} ^$
# Use THE_REQUEST so the original URL is honored even if the server rewrites to /output/.
RewriteCond %{THE_REQUEST} \s+(/[^\s?]+) [NC]
# Skip URLs already targeting output (example: /blog/output/...).
RewriteCond %1 !/output/
# Skip URLs that already end with a slash (example: /blog/2025/12/15/post/).
RewriteCond %1 !/$
# Skip URLs that look like files (have an extension) (example: /blog/theme/css/style.css).
RewriteCond %1 !\.[^./]+$
RewriteRule ^.*$ %1/ [R=301,L]

# Rewrite all non-output requests to the Pelican output/ directory
# (example: /blog/2025/12/15/post/ -> /blog/output/2025/12/15/post/).
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{REQUEST_URI} !/output/
RewriteRule ^(.*)$ output/$1 [L]

# CSP permissions for datafusion.apache.org
# Adding third-party service Giscus: enables Giscus comments for DataFusion blog posts.
# Approved by VP Data Privacy on 2026-05-09.
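
As a reading aid for the rewrite rules above, the two-step flow can be modeled in a few lines of Rust. This is only a sketch of the decision logic described in the comments, not how mod_rewrite is implemented. The names `Action`, `route`, `has_extension`, and `BASE` are hypothetical, `BASE = "/blog/"` is an assumption inferred from the example URLs, and query strings and the `REDIRECT_STATUS` guard (which limits both rules to the initial request) are left out.

```rust
/// What the server does with one incoming request path in this model.
#[derive(Debug, PartialEq)]
enum Action {
    /// Step 1: external 301 redirect to the trailing-slash form.
    Redirect(String),
    /// Step 2: internal rewrite into the output/ directory.
    Rewrite(String),
    /// Neither rule applies; the path is served as requested.
    PassThrough,
}

// Assumption: the .htaccess file sits in the directory served at /blog/.
const BASE: &str = "/blog/";

/// Mirrors `RewriteCond %1 !\.[^./]+$`: true when the path ends in a dot
/// followed by one or more characters that are neither '.' nor '/'.
fn has_extension(path: &str) -> bool {
    match path.rfind('.') {
        Some(dot) => {
            let ext = &path[dot + 1..];
            !ext.is_empty() && !ext.contains('/')
        }
        None => false,
    }
}

/// Applies the two rules in order to a request path (query string excluded).
fn route(path: &str) -> Action {
    let targets_output = path.contains("/output/");

    // Step 1: extensionless, non-output URLs without a trailing slash
    // are redirected to the trailing-slash form.
    if !targets_output && !path.ends_with('/') && !has_extension(path) {
        return Action::Redirect(format!("{}/", path));
    }

    // Step 2: every other non-output request is rewritten into output/.
    if !targets_output {
        if let Some(rest) = path.strip_prefix(BASE) {
            return Action::Rewrite(format!("{}output/{}", BASE, rest));
        }
    }

    Action::PassThrough
}

fn main() {
    let samples = [
        "/blog/2025/12/15/post",
        "/blog/2025/12/15/post/",
        "/blog/theme/css/style.css",
        "/blog/output/2025/12/15/post/",
    ];
    for path in samples {
        println!("{} -> {:?}", path, route(path));
    }
}
```

Read together, a request for `/blog/2025/12/15/post` is first redirected to the trailing-slash form, and the follow-up request for `/blog/2025/12/15/post/` is then rewritten into `output/`.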
217 changes: 217 additions & 0 deletions output/2019/02/04/datafusion-donation/index.html
@@ -0,0 +1,217 @@
<!doctype html>
<html class="no-js" lang="en" dir="ltr">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>DataFusion: A Rust-native Query Engine for Apache Arrow - Apache DataFusion Blog</title>
<link href="/blog/favicon.ico" rel="icon" type="image/x-icon">
<link href="/blog/favicon.ico" rel="shortcut icon" type="image/x-icon">
<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
<link href="/blog/css/headerlink.css" rel="stylesheet">
<link href="/blog/highlight/default.min.css" rel="stylesheet">
<link href="/blog/css/app.css" rel="stylesheet">
<link href="/blog/css/dark-mode.css" rel="stylesheet">
<script src="/blog/js/dark-mode.js"></script>
<script src="/blog/highlight/highlight.js"></script>
<script>hljs.highlightAll();</script> </head>
<body class="d-flex flex-column h-100">
<main class="flex-shrink-0">
<!-- nav bar -->
<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth navbar example">
<div class="container-fluid">
<a class="navbar-brand" href="/blog/"><img src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache DataFusion Blog</a>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>

<div class="collapse navbar-collapse" id="navbarADP">
<ul class="navbar-nav me-auto mb-2 mb-lg-0">
<li class="nav-item">
<a class="nav-link" href="/blog/about.html">About</a>
</li>
<li class="nav-item">
<a class="nav-link" href="/blog/feed.xml">RSS</a>
</li>
</ul>
<button id="dark-mode-toggle" type="button" class="dark-mode-toggle" aria-label="Toggle dark mode" aria-pressed="false" title="Toggle dark mode">
<span class="sun-icon" aria-hidden="true">☀</span>
<span class="moon-icon" aria-hidden="true">☾</span>
</button>
</div>
</div>
</nav>
<!-- article contents -->
<div id="contents">
<div class="bg-white p-4 p-md-5 rounded">
<div class="row justify-content-center">
<div class="col-12 col-md-8 main-content">
<h1>
DataFusion: A Rust-native Query Engine for Apache Arrow
</h1>
<p>Posted on: Mon 04 February 2019 by agrove</p>

<aside class="toc-container d-md-none mb-2">
<div class="toc"><span class="toctitle">Contents</span><ul>
<li><a href="#example">Example</a></li>
<li><a href="#roadmap">Roadmap</a></li>
<li><a href="#contributors-welcome">Contributors Welcome!</a></li>
</ul>
</div>
</aside>

<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

<p>We are excited to announce that <a href="https://github.com/apache/arrow-datafusion">DataFusion</a> has been donated to the Apache Arrow project. DataFusion is an in-memory query engine for the Rust implementation of Apache Arrow.</p>
<p>Although DataFusion was started two years ago, it was recently re-implemented to be Arrow-native. It currently has limited capabilities, but it supports SQL queries against iterators of <code>RecordBatch</code> and can read CSV files. There are plans to <a href="https://issues.apache.org/jira/browse/ARROW-4466">add support for Parquet files</a>.</p>
<p>SQL support is limited to projection (<code>SELECT</code>), selection (<code>WHERE</code>), and simple aggregates (<code>MIN</code>, <code>MAX</code>, <code>SUM</code>) with an optional <code>GROUP BY</code> clause.</p>
<p>Supported expressions are identifiers, literals, simple math operations (<code>+</code>, <code>-</code>, <code>*</code>, <code>/</code>), boolean operators (<code>AND</code>, <code>OR</code>), equality and comparison operators (<code>=</code>, <code>!=</code>, <code>&lt;</code>, <code>&lt;=</code>, <code>&gt;=</code>, <code>&gt;</code>), and <code>CAST(expr AS type)</code>.</p>
<h2 id="example">Example<a class="headerlink" href="#example" title="Permanent link">¶</a></h2>
<p>The following example demonstrates running a simple aggregate SQL query against a CSV file.</p>
<pre><code class="language-rust">// create execution context
let mut ctx = ExecutionContext::new();

// define schema for data source (csv file)
let schema = Arc::new(Schema::new(vec![
    Field::new("c1", DataType::Utf8, false),
    Field::new("c2", DataType::UInt32, false),
    Field::new("c3", DataType::Int8, false),
    Field::new("c4", DataType::Int16, false),
    Field::new("c5", DataType::Int32, false),
    Field::new("c6", DataType::Int64, false),
    Field::new("c7", DataType::UInt8, false),
    Field::new("c8", DataType::UInt16, false),
    Field::new("c9", DataType::UInt32, false),
    Field::new("c10", DataType::UInt64, false),
    Field::new("c11", DataType::Float32, false),
    Field::new("c12", DataType::Float64, false),
    Field::new("c13", DataType::Utf8, false),
]));

// register csv file with the execution context
let csv_datasource =
    CsvDataSource::new("test/data/aggregate_test_100.csv", schema.clone(), 1024);
ctx.register_datasource("aggregate_test_100", Rc::new(RefCell::new(csv_datasource)));

let sql = "SELECT c1, MIN(c12), MAX(c12) FROM aggregate_test_100 WHERE c11 &gt; 0.1 AND c11 &lt; 0.9 GROUP BY c1";

// execute the query
let relation = ctx.sql(&amp;sql).unwrap();
let mut results = relation.borrow_mut();

// iterate over the results
while let Some(batch) = results.next().unwrap() {
    println!(
        "RecordBatch has {} rows and {} columns",
        batch.num_rows(),
        batch.num_columns()
    );

    let c1 = batch
        .column(0)
        .as_any()
        .downcast_ref::&lt;BinaryArray&gt;()
        .unwrap();

    let min = batch
        .column(1)
        .as_any()
        .downcast_ref::&lt;Float64Array&gt;()
        .unwrap();

    let max = batch
        .column(2)
        .as_any()
        .downcast_ref::&lt;Float64Array&gt;()
        .unwrap();

    for i in 0..batch.num_rows() {
        let c1_value: String = String::from_utf8(c1.value(i).to_vec()).unwrap();
        println!("{}, Min: {}, Max: {}", c1_value, min.value(i), max.value(i));
    }
}
</code></pre>
<h2 id="roadmap">Roadmap<a class="headerlink" href="#roadmap" title="Permanent link">¶</a></h2>
<p>The roadmap for DataFusion will depend on interest from the Rust community, but here are some of the short-term items that are planned:</p>
<ul>
<li>Extending test coverage of the existing functionality</li>
<li>Adding support for Parquet data sources</li>
<li>Implementing more SQL features such as <code>JOIN</code>, <code>ORDER BY</code> and <code>LIMIT</code></li>
<li>Implementing a DataFrame API as an alternative to SQL</li>
<li>Adding support for partitioning and parallel query execution using Rust's async and await functionality</li>
<li>Creating a Docker image to make it easy to use DataFusion as a standalone query tool for interactive and batch queries</li>
</ul>
<h2 id="contributors-welcome">Contributors Welcome!<a class="headerlink" href="#contributors-welcome" title="Permanent link">¶</a></h2>
<p>If you are excited about being able to use Rust for data science and would like to contribute to this work, there are many ways to get involved. The simplest way to get started is to try out DataFusion against your own data sources and file bug reports for any issues that you find. You could also check out the current <a href="https://cwiki.apache.org/confluence/display/ARROW/Rust+JIRA+Dashboard">list of issues</a> and have a go at fixing one, or join the <a href="http://mail-archives.apache.org/mod_mbox/arrow-user/">user mailing list</a> to ask questions.</p>

<!--
Comments Section
Loaded only after explicit visitor consent to comply with ASF policy.
-->

<div>
<hr>
<h3 id="comments">Comments<a class="headerlink" href="#comments" title="Permanent link">&para;</a></h3>

<!-- Local loader script -->
<script src="/blog/js/giscus-consent.js" defer></script>

<!-- Consent UI -->
<div id="giscus-consent">
<p>
We use <a href="https://giscus.app/">Giscus</a> for comments, powered by GitHub Discussions.
To respect your privacy, Giscus and comments will load only if you click "Show Comments".
</p>

<div class="consent-actions">
<button id="giscus-load" type="button">Show Comments</button>
<button id="giscus-revoke" type="button" hidden>Hide Comments</button>
</div>

<noscript>JavaScript is required to load comments from Giscus.</noscript>
</div>

<!-- Container where Giscus will render -->
<div id="comment-thread"></div>
</div> </div>
<aside class="toc-container d-none d-md-block col-md-4 col-xl-3 ms-xl-2">
<div class="toc"><span class="toctitle">Contents</span><ul>
<li><a href="#example">Example</a></li>
<li><a href="#roadmap">Roadmap</a></li>
<li><a href="#contributors-welcome">Contributors Welcome!</a></li>
</ul>
</div>
</aside>
</div>
</div>
</div>
<!-- footer -->
<div class="row g-0">
<div class="col-12">
<p style="font-style: italic; font-size: 0.8rem; text-align: center;">
Copyright 2026, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/>
Apache&reg; and the Apache feather logo are trademarks of The Apache Software Foundation.
</p>
</div>
</div>
<script src="/blog/js/bootstrap.bundle.min.js"></script> </main>
</body>
</html>