# Introduction

Snowflake's support for Java through Snowpark enables developers to write rich, flexible data processing logic directly within the data platform. This notebook demonstrates how to use Snowflake's Java UDFs and stored procedures to build scalable, reusable, and efficient data workflows. By combining Snowflake's compute engine with Java's maturity and Snowpark's APIs, developers can encapsulate business logic, perform asynchronous processing, and work with structured or unstructured data, all inside Snowflake.

Throughout this notebook, we explore key concepts including the creation and execution of Java-based stored procedures and UDFs, how to read static and dynamic files using Snowflake stages, and how to handle asynchronous operations to optimize performance. Practical examples help illustrate the power of SnowflakeFile, InputStream, and DataFrame integrations for real-time data handling and processing scenarios.


![Java UDF Calling Flow](https://docs.snowflake.com/en/_images/UDF_Java_Calling_03a.png)


## Step 1: Creating a Stage and Uploading Files in Snowflake

### Create a Stage:
1. **Sign in** to Snowsight.
2. Select **Create » Stage » Snowflake Managed**.
3. Enter **Stage Name** and select the **database/schema**.
4. Optionally, **deselect Directory table** to avoid warehouse costs.
5. Choose **Encryption** (cannot be changed later).

### Upload Files:
1. **Sign in** to Snowsight.
2. Select **Data » Add Data » Load files into a Stage**.
3. Choose files to upload.
4. Select **database/schema** and **stage**.
5. Optionally, create a **path**.
6. Click **Upload**.

In [None]:
--list the staged file(s)
ls @sales_data_stage;

## Step 2: Stored Procedures in Snowflake for Java Developers

Stored procedures in Snowflake allow Java developers to automate and simplify database tasks by writing procedural logic with Java handlers. These procedures can be used to execute dynamic database operations, encapsulate complex logic, and manage privileges securely. Java can be used as the handler language, with code either in-line or staged, and procedures can return single values or tables. Developers can use Snowpark for Java to create, manage, and deploy procedures, while also utilizing features like temporary procedures, logging, and external network access. Security and data protection practices should be followed, especially when deciding between caller's or owner's rights for execution.


### Step 2.1: Writing Java Handlers for Snowflake Stored Procedures

To write a Java handler for a Snowflake stored procedure, developers use the Snowpark API to interact with Snowflake tables and data pipelines. The handler code can be deployed in-line with the procedure or as compiled classes stored on a Snowflake stage. The handler method must take a Snowpark `Session` object as its first argument and return a value (for example, a `String` or tabular data). Handlers should be thread-safe, handle exceptions, and keep memory use within Snowflake's limits. Decide up front whether the procedure runs with caller's or owner's rights, and make dependencies available by uploading the required JAR or resource files to a stage. Asynchronous child jobs need care because they can be canceled when the parent procedure completes. Snowflake also supports logging and tracing, which help with debugging and performance tracking.

### Step 2.2: Reading a Dynamically-Specified File with SnowflakeFile

The following example demonstrates how to read a dynamically-specified file using the `SnowflakeFile` class. The `execute` handler function takes a `String` as input and returns a `String` containing the file's contents. During execution, Snowflake initializes the handler's `fileName` variable with the incoming file path from the procedure's input variable. The handler code then uses a `SnowflakeFile` instance to read the specified file.


In [None]:
CREATE OR REPLACE PROCEDURE file_reader_java_proc_snowflakefile(input VARCHAR)
RETURNS VARCHAR
LANGUAGE JAVA
RUNTIME_VERSION = 11
HANDLER = 'FileReader.execute'
PACKAGES=('com.snowflake:snowpark:latest')
AS $$
import java.io.InputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import com.snowflake.snowpark_java.types.SnowflakeFile;
import com.snowflake.snowpark_java.Session;

class FileReader {
  public String execute(Session session, String fileName) throws IOException {
    InputStream input = SnowflakeFile.newInstance(fileName).getInputStream();
    return new String(input.readAllBytes(), StandardCharsets.UTF_8);
  }
}
$$;
CALL file_reader_java_proc_snowflakefile(BUILD_SCOPED_FILE_URL('@sales_data_stage', '/car_sales.json'));


### Step 2.3: Reading a Dynamically-Specified File with InputStream

The following example demonstrates how to read a dynamically-specified file using `InputStream`. The `execute` handler function takes an `InputStream` as input and returns a `String` containing the file's contents. During execution, Snowflake takes the file path passed in the procedure's input argument and initializes the handler's `stream` parameter with a stream over that file. The handler code then reads the file's contents from the `InputStream`.


In [None]:
CREATE OR REPLACE PROCEDURE file_reader_java_proc_input(input VARCHAR)
RETURNS VARCHAR
LANGUAGE JAVA
RUNTIME_VERSION = 11
HANDLER = 'FileReader.execute'
PACKAGES=('com.snowflake:snowpark:latest')
AS $$
import java.io.InputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import com.snowflake.snowpark.Session;

class FileReader {
  public String execute(Session session, InputStream stream) throws IOException {
    String contents = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    return contents;
  }
}
$$;
CALL file_reader_java_proc_input(BUILD_SCOPED_FILE_URL('@sales_data_stage', '/car_sales.json'));
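
The read logic in the handler above is plain Java, so it can be tried outside Snowflake before deploying. The following is a minimal sketch under that assumption: `StreamReaderSketch` and `readAll` are hypothetical names, the `Session` argument is left out, and a `ByteArrayInputStream` stands in for the staged file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical local harness; not part of any Snowflake API.
class StreamReaderSketch {
  // Same read logic as FileReader.execute, minus the Session argument.
  static String readAll(InputStream stream) throws IOException {
    // try-with-resources closes the stream once its bytes have been consumed.
    try (InputStream in = stream) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    // An in-memory stand-in for the contents of a staged JSON file.
    byte[] fake = "{\"model\": \"sedan\"}".getBytes(StandardCharsets.UTF_8);
    System.out.println(readAll(new ByteArrayInputStream(fake)));
  }
}
```

Note that `readAllBytes()` loads the whole file into memory, so this approach suits small files; larger files are better read incrementally.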


### Step 2.4: Returning Tabular Data from a Java Stored Procedure

You can write a stored procedure that returns data in tabular form by following these steps:

1. Specify `TABLE(...)` as the procedure's return type in your `CREATE PROCEDURE` statement.
  
2. When defining the procedure, you can specify the returned data's column names and types as `TABLE` parameters if you know them in advance. If the column names are not known at definition time, such as when they are specified at runtime, you can omit the `TABLE` parameters. 
3. Implement the handler to return the tabular result as a Snowpark DataFrame.

For more information about working with DataFrames, refer to the *Working with DataFrames in Snowpark Java* documentation.


In [None]:
CREATE OR REPLACE TABLE employees(id NUMBER, name VARCHAR, role VARCHAR);
INSERT INTO employees (id, name, role) VALUES (1, 'Alice', 'op'), (2, 'Bob', 'dev'), (3, 'Cindy', 'dev');

CREATE OR REPLACE PROCEDURE filter_by_role(table_name VARCHAR, role VARCHAR)
RETURNS TABLE(id NUMBER, name VARCHAR, role VARCHAR)
LANGUAGE JAVA
RUNTIME_VERSION = '11'
PACKAGES = ('com.snowflake:snowpark:latest')
HANDLER = 'Filter.filterByRole'
AS
$$
import com.snowflake.snowpark_java.*;

public class Filter {
  public DataFrame filterByRole(Session session, String tableName, String role) {
    DataFrame table = session.table(tableName);
    DataFrame filteredRows = table.filter(Functions.col("role").equal_to(Functions.lit(role)));
    return filteredRows;
  }
}
$$;

CALL filter_by_role('employees', 'dev');

### Step 2.5: Introduction to Asynchronous Processing in Snowflake Stored Procedures

This example shows asynchronous processing inside a Snowflake stored procedure. The `getResultJDBC()` procedure, written in Java, obtains the session's JDBC connection and submits a query with the `executeAsyncQuery()` method of `SnowflakeStatement`, which returns without waiting for the query to finish. Here the query is `SYSTEM$WAIT(10)`, which simply waits for 10 seconds. The handler blocks only when it reads the result with `resultSet.next()`, so any work placed between the two calls runs while the query is still executing. This pattern is useful for long-running child queries; as noted in Step 2.1, a child job that is still running when the parent procedure completes can be canceled.


In [None]:
CREATE OR REPLACE PROCEDURE getResultJDBC()
RETURNS VARCHAR
LANGUAGE JAVA
RUNTIME_VERSION = 11
PACKAGES = ('com.snowflake:snowpark:latest')
HANDLER = 'TestJavaSP.asyncBasic'
AS
$$
import java.sql.*;
import net.snowflake.client.jdbc.*;

class TestJavaSP {
  public String asyncBasic(com.snowflake.snowpark.Session session) throws Exception {
    Connection connection = session.jdbcConnection();
    SnowflakeStatement stmt = (SnowflakeStatement)connection.createStatement();
    ResultSet resultSet = stmt.executeAsyncQuery("CALL SYSTEM$WAIT(10)");
    resultSet.next();
    return resultSet.getString(1);
  }
}
$$;

CALL getResultJDBC();
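
The submit-then-wait shape of the procedure above can be illustrated in plain Java. This is an analogy only, not Snowflake or JDBC API: `AsyncSketch` and `slowQuery` are hypothetical names, `CompletableFuture.supplyAsync` plays the role of `executeAsyncQuery()`, and `join()` plays the role of `resultSet.next()`.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration of submitting work, continuing, then waiting.
class AsyncSketch {
  // Stand-in for a long-running query: sleeps briefly, then returns a status.
  static String slowQuery() {
    try {
      Thread.sleep(50);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    }
    return "waited";
  }

  static String run() {
    // Submit the work; this call returns immediately.
    CompletableFuture<String> pending = CompletableFuture.supplyAsync(AsyncSketch::slowQuery);
    // The caller is free to do other work while the "query" runs.
    String other = "other work done";
    // Block only at the point where the result is actually needed.
    return other + "; " + pending.join();
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

The benefit comes from the work done between submitting and waiting; a handler that waits immediately after submitting gains nothing over a synchronous call.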

## Step 3: User-Defined Functions (UDFs)

User-defined functions (UDFs) allow you to extend Snowflake’s built-in functions by creating custom operations. UDFs are reusable, always return a value, and are ideal for performing calculations. You can write a UDF’s logic in a supported language, then create and execute it using Snowflake’s tools. UDFs can be used to encapsulate standard calculations or extend existing functions, and they are called in the same way as built-in functions. While similar to stored procedures, UDFs differ in key ways. For more details, see *Choosing whether to write a stored procedure or a user-defined function*.


### Step 3.1: Passing via an ARRAY
This code creates a Snowflake table that stores arrays of strings, inserts three rows with progressively longer arrays (`['Hello']`, `['Hello', 'Jay']`, and `['Hello', 'Jay', 'Smith']`), and defines a Java user-defined function (UDF) that takes an array of strings and concatenates them into a single space-separated string. The final query applies this function to each row, producing `Hello`, `Hello Jay`, and `Hello Jay Smith`.



In [None]:
CREATE OR REPLACE TABLE string_array_table(id INTEGER, a ARRAY);
INSERT INTO string_array_table (id, a) SELECT
        1, ARRAY_CONSTRUCT('Hello');
INSERT INTO string_array_table (id, a) SELECT
        2, ARRAY_CONSTRUCT('Hello', 'Jay');
INSERT INTO string_array_table (id, a) SELECT
        3, ARRAY_CONSTRUCT('Hello', 'Jay', 'Smith');

CREATE OR REPLACE FUNCTION concat_varchar_2(a ARRAY)
  RETURNS VARCHAR
  LANGUAGE JAVA
  HANDLER = 'TestFunc_2.concatVarchar2'
  TARGET_PATH = '@~/TestFunc_2.jar'
  AS
  $$
  class TestFunc_2 {
      public static String concatVarchar2(String[] strings) {
          return String.join(" ", strings);
      }
  }
  $$;
SELECT concat_varchar_2(a)
  FROM string_array_table
  ORDER BY id;
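
Because the handler body is ordinary Java, the three results described above can be checked locally. This is a minimal sketch: `ConcatSketch` is a hypothetical class name, and the arrays are hard-coded stand-ins for the rows of `string_array_table`.

```java
// Hypothetical local check of the handler logic; no Snowflake dependency.
class ConcatSketch {
  // Same body as TestFunc_2.concatVarchar2 in the cell above.
  static String concatVarchar2(String[] strings) {
    return String.join(" ", strings);
  }

  public static void main(String[] args) {
    // Stand-ins for the three rows inserted into string_array_table.
    String[][] rows = {
      {"Hello"},
      {"Hello", "Jay"},
      {"Hello", "Jay", "Smith"}
    };
    for (String[] row : rows) {
      System.out.println(concatVarchar2(row));
    }
  }
}
```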



### Step 3.2: Understanding Java UDF Parallelization

Snowflake improves performance by parallelizing UDF execution both across and within JVMs.

- **Across JVMs**: Snowflake parallelizes work across warehouse workers, with each worker running one or more JVMs. There is no global shared state, and state can only be shared within a single JVM.

- **Within JVMs**: Each JVM can execute multiple threads, allowing parallel calls to the same handler method. Therefore, the handler method must be thread-safe.
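
To make the thread-safety requirement concrete, here is a minimal sketch of a handler that keeps state shared by all threads in one JVM. Everything in it is hypothetical (`ThreadSafeHandlerSketch`, `countCall`, `runParallel`); the thread pool only simulates parallel handler calls and is not how Snowflake schedules them.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical handler with JVM-wide state that stays correct under parallel calls.
class ThreadSafeHandlerSketch {
  // Shared by every thread in this JVM; AtomicLong makes each increment atomic.
  private static final AtomicLong CALLS = new AtomicLong();

  // Hypothetical handler method: returns how many calls this JVM has seen so far.
  static long countCall() {
    return CALLS.incrementAndGet();
  }

  // Simulates parallel handler calls and returns how many calls were recorded.
  static long runParallel(int threads, int callsPerThread) throws InterruptedException {
    long before = CALLS.get();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        for (int i = 0; i < callsPerThread; i++) {
          countCall();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    return CALLS.get() - before;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(runParallel(8, 1000));
  }
}
```

With a plain `long` field and `++` in place of the `AtomicLong`, concurrent increments could be lost and the recorded count could fall short of the number of calls.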

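If a UDF is **IMMUTABLE** and a SQL statement calls it more than once with the same arguments for the same row, every one of those calls returns the same value for that row. To get independent values from calls with identical arguments without declaring the function `VOLATILE`, bind several separate UDFs to the same handler method, as the next cell does with `my_java_udf_1` and `my_java_udf_2`. In that cell, `table1` stands for any existing table.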


In [None]:
/*
Create a Jar file with the following Class
class MyClass {

  private double x;

  // Constructor
  public MyClass()  {
    x = Math.random();
  }

  // Handler
  public double myHandler() {
    return x;
  }
}
*/
CREATE FUNCTION my_java_udf_1()
  RETURNS DOUBLE
  LANGUAGE JAVA
  IMPORTS = ('@sales_data_stage/HelloRandom.jar')
  HANDLER = 'MyClass.myHandler';

CREATE FUNCTION my_java_udf_2()
  RETURNS DOUBLE
  LANGUAGE JAVA
  IMPORTS = ('@sales_data_stage/HelloRandom.jar')
  HANDLER = 'MyClass.myHandler';

  SELECT
    my_java_udf_1(),
    my_java_udf_2()
  FROM table1;
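
The class from the comment block can also be exercised locally to see why two UDFs bound to the same handler can return different values. This sketch assumes that each UDF ends up with its own `MyClass` instance; that is an assumption about Snowflake's behavior rather than something this notebook verifies, and `InstanceStateSketch` is a hypothetical name.

```java
// The handler class from the comment block above, with x made final.
class MyClass {
  private final double x;

  // Constructor: draws the value once per instance.
  public MyClass() {
    x = Math.random();
  }

  // Handler: always returns the value drawn at construction time.
  public double myHandler() {
    return x;
  }
}

// Hypothetical driver that models one instance per UDF.
class InstanceStateSketch {
  // Repeated calls on one instance return the same value.
  static boolean stableWithinInstance() {
    MyClass udf = new MyClass();
    return udf.myHandler() == udf.myHandler();
  }

  public static void main(String[] args) {
    MyClass udf1 = new MyClass(); // models my_java_udf_1
    MyClass udf2 = new MyClass(); // models my_java_udf_2
    System.out.println(udf1.myHandler() + " " + udf1.myHandler()); // same value twice
    System.out.println(udf2.myHandler()); // drawn independently of udf1
  }
}
```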

### Step 3.3: Creating and Calling a Simple In-Line Java UDF

The following example demonstrates creating and calling a simple in-line Java UDF that returns the `VARCHAR` passed to it. 

This function is declared with the optional `CALLED ON NULL INPUT` clause, which ensures the function is called even if the input value is NULL. While this function would return NULL with or without the clause, you could modify the code to handle NULL differently, such as returning an empty string.

In [None]:
CREATE OR REPLACE FUNCTION echo_varchar(x VARCHAR)
  RETURNS VARCHAR
  LANGUAGE JAVA
  CALLED ON NULL INPUT
  HANDLER = 'TestFunc.echoVarchar'
  TARGET_PATH = '@~/testfunc.jar'
  AS
  'class TestFunc {
    public static String echoVarchar(String x) {
      return x;
    }
  }';

  SELECT echo_varchar('Hello Java');
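
As mentioned above, the handler could treat NULL differently, for example by returning an empty string. A minimal sketch of that variant follows; `TestFuncNullSafe` is a hypothetical class name, and the sketch assumes a SQL NULL argument reaches the handler as a Java `null`.

```java
// Hypothetical variant of TestFunc that maps a null input to an empty string.
class TestFuncNullSafe {
  public static String echoVarchar(String x) {
    // With CALLED ON NULL INPUT the handler runs even for NULL, so handle it here.
    return x == null ? "" : x;
  }

  public static void main(String[] args) {
    System.out.println(echoVarchar("Hello Java"));
    System.out.println("[" + echoVarchar(null) + "]");
  }
}
```

This variant only has an effect when the function is declared with `CALLED ON NULL INPUT`; otherwise the handler is not invoked for NULL arguments.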


  

### Step 3.4: Passing an OBJECT to an In-Line Java UDF

The following example demonstrates using the SQL `OBJECT` data type and the corresponding Java `Map<String, String>` type to extract a value from the object. It also shows how to pass multiple parameters to a Java UDF.


In [None]:
CREATE OR REPLACE TABLE objectives (o OBJECT);
INSERT INTO objectives SELECT PARSE_JSON('{"outer_key" : {"inner_key" : "inner_value"} }');

CREATE OR REPLACE FUNCTION extract_from_object(x OBJECT, key VARCHAR)
  RETURNS VARIANT
  LANGUAGE JAVA
  HANDLER = 'VariantLibrary.extract'
  TARGET_PATH = '@~/VariantLibrary.jar'
  AS
  $$
  import java.util.Map;
  class VariantLibrary {
    public static String extract(Map<String, String> m, String key) {
      return m.get(key);
    }
  }
  $$;

  SELECT extract_from_object(o, 'outer_key'), 
       extract_from_object(o, 'outer_key')['inner_key'] FROM objectives;
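
The lookup in the handler above is a plain `Map.get`, which returns `null` when the key is absent. The sketch below tries that logic locally and adds guards for a null map or key. `ExtractSketch` is a hypothetical name, and representing the nested object as JSON text in the map value is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical local version of VariantLibrary.extract with null guards.
class ExtractSketch {
  static String extract(Map<String, String> m, String key) {
    // Return null instead of throwing when either argument is missing.
    if (m == null || key == null) {
      return null;
    }
    return m.get(key); // null when the key is not present
  }

  public static void main(String[] args) {
    Map<String, String> o = new HashMap<>();
    // Assumed representation: the nested object arrives as JSON text.
    o.put("outer_key", "{\"inner_key\": \"inner_value\"}");
    System.out.println(extract(o, "outer_key"));
    System.out.println(extract(o, "missing_key"));
  }
}
```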

### Step 3.5: Passing a GEOGRAPHY Value to an In-Line Java UDF

This example demonstrates how to pass a `GEOGRAPHY` value to an in-line Java UDF, enabling spatial data processing within the function.


In [None]:
CREATE OR REPLACE FUNCTION geography_equals(x GEOGRAPHY, y GEOGRAPHY)
  RETURNS BOOLEAN
  LANGUAGE JAVA
  PACKAGES = ('com.snowflake:snowpark:1.2.0')
  HANDLER = 'TestGeography.compute'
  AS
  $$
  import com.snowflake.snowpark_java.types.Geography;

  class TestGeography {
    public static boolean compute(Geography geo1, Geography geo2) {
      return geo1.equals(geo2);
    }
  }
  $$;

CREATE OR REPLACE TABLE geocache_table (id INTEGER, g1 GEOGRAPHY, g2 GEOGRAPHY);

INSERT INTO geocache_table (id, g1, g2)
  SELECT 1, TO_GEOGRAPHY('POINT(-122.35 37.55)'), TO_GEOGRAPHY('POINT(-122.35 37.55)');
INSERT INTO geocache_table (id, g1, g2)
  SELECT 2, TO_GEOGRAPHY('POINT(-122.35 37.55)'), TO_GEOGRAPHY('POINT(90.0 45.0)');

SELECT id, g1, g2, geography_equals(g1, g2) AS "EQUAL?"
  FROM geocache_table
  ORDER BY id;

### Step 3.6: Reading a File with a Java UDF

You can read a file's contents within a Java UDF handler to process unstructured data. The file must be on a Snowflake stage accessible to your handler. 

To read staged files, your handler can:

- **Statically-specified file**: Access a file by specifying its path in the `IMPORTS` clause, useful for initialization.
  
- **Dynamically-specified file**: Use `SnowflakeFile` or `InputStream` methods to read a file specified at runtime by the caller.

`SnowflakeFile` provides features that `InputStream` does not, such as access to file attributes like the file's size.

In [None]:
CREATE OR REPLACE FUNCTION content(file STRING)
  RETURNS VARCHAR
  LANGUAGE JAVA
  HANDLER = 'Sales.content'
  TARGET_PATH = '@sales_data_stage/sales_functions23.jar'
  AS
  $$
  import java.io.InputStream;
  import java.io.IOException;
  import java.nio.charset.StandardCharsets;
  import com.snowflake.snowpark_java.types.SnowflakeFile;

  public class Sales {

    public static String content(String filePath) throws IOException {

      SnowflakeFile file = SnowflakeFile.newInstance(filePath);
      InputStream stream = file.getInputStream();
      String contents = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
      return contents;
    }
  }
  $$;

SELECT content(BUILD_SCOPED_FILE_URL('@sales_data_stage', '/car_sales.json'));

## 🧠 Stored Procedures vs. UDFs: Know the Difference

Snowflake gives you two powerful ways to add custom logic: **Stored Procedures** and **User-Defined Functions**. Here’s a quick comparison:

| Feature           | Stored Procedure                                 | User-Defined Function (UDF)                          |
|-------------------|--------------------------------------------------|------------------------------------------------------|
| **Purpose**        | Perform admin or batch operations using SQL.     | Return a computed value, often used in queries.      |
| **Return Value**   | Optional — may return status or custom values.   | Required — must return a value explicitly.           |
| **SQL Integration**| Called as stand-alone SQL commands.              | Embedded inline in SQL (e.g., `SELECT MyFunc(col)`). |
| **Best For**       | DDL/DML, workflows, automation.                  | Transformations, expressions, calculations.          |

Additionally:

1. UDFs return a value; stored procedures need not.
2. UDF return values are directly usable in SQL; stored procedure return values may not be.
3. UDFs can be called in the context of another statement; stored procedures are called independently.
4. Multiple UDFs may be called with one statement; a single stored procedure is called with one statement.
5. UDFs may access the database with simple queries only; stored procedures can execute DDL and DML statements.

# Final Thoughts
This notebook explored key techniques for building powerful Java-based solutions within Snowflake using Snowpark APIs. We covered creating and calling Java stored procedures and UDFs, performing asynchronous operations, handling unstructured data through file access, and returning tabular results using DataFrames. These tools allow you to extend Snowflake's capabilities with custom logic, parallelism, and integration with external data formats.

As you continue to develop with Java in Snowflake, consider how these features can help optimize your data workflows and unlock more complex processing scenarios. Whether you're encapsulating business logic, processing files at scale, or improving performance with parallel execution, Snowflake's support for Java gives you the flexibility to build scalable and maintainable solutions.

### Resources

- [Snowflake Java UDFs Documentation](https://docs.snowflake.com/en/developer-guide/udf/java/udf-java-introduction)
- [Creating Stored Procedures in Java](https://docs.snowflake.com/en/developer-guide/stored-procedure/java/procedure-java-overview)
- [Quickstarts](https://quickstarts.snowflake.com/)
