[Epic] Prepared Statement Support #4539

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.** 
[Prepared Statements](https://en.wikipedia.org/wiki/Prepared_statement) are widely used by SQL clients when issuing queries to a database. Major use cases include lower transaction processing latency and preventing [SQL injection attacks](https://xkcd.com/327/) (parameterized query arguments are often implemented as a feature of prepared statements).

Supporting prepared statements will increase the number of client applications that can work with DataFusion.
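
To make the injection point concrete, here is a minimal sketch using Python's standard `sqlite3` module rather than DataFusion. The `students` table and the malicious input string are made up for the example. It contrasts building the query by string concatenation with passing the value as a bind parameter:

```python
import sqlite3

# In-memory database with a small, made-up table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.executemany("INSERT INTO students VALUES (?)", [("Alice",), ("Bob",)])

# Attacker-controlled input that tries to widen the WHERE clause.
malicious = "nobody' OR '1'='1"

# Unsafe: the input is spliced into the SQL text, so it is parsed as SQL.
unsafe_sql = f"SELECT name FROM students WHERE name = '{malicious}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is sent as a bind parameter and is only ever compared as a value.
safe_rows = conn.execute(
    "SELECT name FROM students WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

With concatenation the predicate becomes `name = 'nobody' OR '1'='1'` and matches every row; with a bind parameter no row matches, because no student has that literal name.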

**Task List**
- [x] https://github.com/apache/arrow-datafusion/pull/4490 (support `PREPARE` for `SELECT`)
- [x] Support `EXECUTE` for prepared statements #13242
- [x] Support PREPARE statements without explicit parameters #13632
- [x] https://github.com/apache/arrow-datafusion/issues/4549
- [x] https://github.com/apache/arrow-datafusion/issues/4550


**Background** 
Here is a schematic of how prepared statements work:

```
                                                 (               )
╔═══════════╗      SELECT *                      │`─────────────'│
║           ║      FROM foo                      │               │
║  Client   ║      WHERE id = $1                 │    Database   │
║           ║━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━▶│     Server    │
║           ║                                    │               │
╚═══════════╝                                    │.─────────────.│
                                                 (               )
 Step 1: Client sends parameterized query to      `─────────────' 
 server to "prepare"                                              
                                                                  
                                                                  
                  HANDLE: 0x.....                                 
                  SCHEMA: (VARCHAR, INT, FLOAT)                   
                  PARAMS: {$1: INT}                               
           ◀━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                   
 Step 2: Server prepares to run query and sends back opaque       
 "handle", result schema, and needed bind parameters to client    
                                                                  
                                                                  
                                                                  
                  HANDLE: 0x.....                                 
                  PARAMS: { $1 = 12345 }                          
           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━▶                   
                                                                  
 Step 3: Client returns the handle and values of "bind"           
 parameters to the server                                         
                                                                                                                                    
                                                                  
                                                                  
            Results: [('Hi', 12345, 5423.13)]                     
           ◀━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                   
                                                                  
Step 4: Server fills in bind parameters, and returns results      
as if the entire query had been supplied                          
```

Steps:
1. The client sends a first message asking the server to prepare a statement that contains placeholders, called bind parameters, in place of values.
2. The server responds with a handle that the client uses to identify the prepared query, along with the result schema and the bind parameters the query requires.
3. The client sends a second message with the handle and the bind parameter values.
4. The server fills in the parameter values, executes the query, and returns the results. It is typically possible to execute the same prepared statement multiple times with different bind parameters, using a single additional message each time.
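
The four steps above can be sketched as a toy, in-process model. This is not DataFusion's API: `ToyServer`, its `prepare` and `execute` methods, and the contents of `foo` are hypothetical names chosen for illustration, SQLite stands in for the query engine, and the result schema and parameter types from step 2 are omitted for brevity.

```python
import re
import secrets
import sqlite3


class ToyServer:
    """Hypothetical stand-in for a database server (illustration only)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE foo (name TEXT, id INTEGER, score REAL)")
        self.conn.execute("INSERT INTO foo VALUES ('Hi', 12345, 5423.13)")
        self.conn.execute("INSERT INTO foo VALUES ('Bye', 67890, 1.5)")
        self.prepared = {}  # handle -> (sql text, number of bind parameters)

    def prepare(self, sql):
        # Step 2: remember the statement and hand back an opaque handle
        # plus the placeholders the client must supply.
        # Assumes placeholders are numbered contiguously: $1, $2, ...
        numbers = sorted({int(n) for n in re.findall(r"\$(\d+)", sql)})
        handle = secrets.token_hex(8)
        self.prepared[handle] = (sql, len(numbers))
        return handle, [f"${n}" for n in numbers]

    def execute(self, handle, values):
        # Step 4: look up the statement, bind the values, and run it.
        sql, n_params = self.prepared[handle]
        if len(values) != n_params:
            raise ValueError(f"expected {n_params} parameters, got {len(values)}")
        # Rewrite $N to SQLite's named form :pN and bind by name,
        # so values are never spliced into the SQL text.
        bound_sql = re.sub(r"\$(\d+)", r":p\1", sql)
        bindings = {f"p{i + 1}": v for i, v in enumerate(values)}
        return self.conn.execute(bound_sql, bindings).fetchall()


server = ToyServer()
# Steps 1 and 2: prepare once and receive a handle.
handle, params = server.prepare("SELECT * FROM foo WHERE id = $1")
# Steps 3 and 4: execute as many times as needed with different values.
first = server.execute(handle, [12345])
second = server.execute(handle, [67890])
print(params, first, second)
```

The statement text crosses the "wire" only once; each later execution carries just the handle and the values.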
 
Some protocols (such as the PostgreSQL frontend/backend protocol, FEBE) optionally allow sending both messages in the same transmission to avoid a network round trip.

**Describe the solution you'd like**
We would like:
1. Support for `PREPARE` statements.
2. Support for `EXECUTE` statements that run prepared statements with bind parameters.

Both `PREPARE` and `EXECUTE` should have a basic implementation in `SessionContext` and be extensible by other systems (similar to `CREATE VIEW` and `CREATE TABLE`).



cc @NGA-TRAN 
