![Banner](images/banner.png)

# Data Frames

Documentation reference link: [Working with Data Frames](https://python-oracledb.readthedocs.io/en/latest/user_guide/dataframes.html)

<hr>

Setup for this notebook:

In [None]:
import oracledb

un = "pythondemo"
pw = "welcome"
cs = "localhost/orclpdb1"

connection = oracledb.connect(user=un, password=pw, dsn=cs)

cursor = connection.cursor()
# Drop the table if it already exists; ignore the error if it does not
try:
    cursor.execute("drop table mytab")
except oracledb.DatabaseError:
    pass
cursor.execute("create table mytab (id number, data varchar2(1000))")

## Working with Data Frames

Python-oracledb can fetch query results directly into a data frame format and can insert data frames into Oracle Database. This can improve performance and reduce memory use when your application works with Python data frame libraries such as Apache PyArrow, Pandas, Polars, NumPy, Dask, or PyTorch, or writes files in Apache Parquet format.

Python-oracledb has two methods for querying into a DataFrame:
- `Connection.fetch_df_all()` fetches all rows from a query
- `Connection.fetch_df_batches()` implements an iterator for fetching batches of rows
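
The batch iterator can be consumed in a loop so that only one batch needs to be held in memory at a time. The following is a minimal sketch rather than code from this notebook: the helper name `count_rows_in_batches` is hypothetical, and it assumes that `fetch_df_batches()` accepts `statement`, `parameters`, and `size` keyword arguments and that each object it yields provides a `num_rows()` method.

```python
def count_rows_in_batches(connection, sql, parameters=None, size=1000):
    """Count query rows batch by batch (hypothetical helper).

    Assumes connection.fetch_df_batches() yields data frame objects
    that provide num_rows(), one object per batch of up to `size` rows.
    """
    total = 0
    for odf in connection.fetch_df_batches(
        statement=sql, parameters=parameters, size=size
    ):
        # A real application would typically process each batch here,
        # for example by converting it with pyarrow.table(odf).to_pandas().
        total += odf.num_rows()
    return total
```

With the connection created in the setup cell, a call such as `count_rows_in_batches(connection, "select id, name from SampleQueryTab", size=100)` would walk through the query result 100 rows at a time.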

To fetch all rows returned by a query into a Pandas DataFrame:

In [None]:
import pyarrow

sql = "select id, name from SampleQueryTab where id < :idbv order by id"
odf = connection.fetch_df_all(statement=sql, arraysize=100, parameters=[5])

# Get a Pandas DataFrame from the data
df = pyarrow.table(odf).to_pandas()

In [None]:
df

For larger result sets, you can tune `arraysize` to reduce the number of network round trips: larger values fetch more rows per round trip at the cost of more memory.
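
To get a feel for how `arraysize` affects network traffic, the arithmetic can be sketched as below. This is a deliberately simplified model, not a description of python-oracledb internals: it assumes each fetch round trip returns up to `arraysize` rows and ignores prefetching, the round trip for statement execution, and row width. The function name `estimated_fetch_round_trips` is hypothetical.

```python
import math


def estimated_fetch_round_trips(row_count, arraysize):
    """Rough number of fetch round trips under a simplified model.

    Assumes every round trip returns up to `arraysize` rows.
    """
    if arraysize < 1:
        raise ValueError("arraysize must be at least 1")
    return math.ceil(row_count / arraysize)


# Compare a few candidate settings for a one-million-row result set
for size in (100, 1000, 10000):
    print(size, estimated_fetch_round_trips(1_000_000, size))
```

Under this model, a larger `arraysize` trades memory per fetch for fewer round trips, so the best value is usually found by measuring with realistic data.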

You can change the column names and data types of the result by passing a PyArrow schema as `requested_schema`:

In [None]:
sql = "select id, name from SampleQueryTab where id < :idbv order by id"

schema = pyarrow.schema(
    [("COL_1", pyarrow.int16()),
     ("COL_2", pyarrow.string())]
)

odf = connection.fetch_df_all(statement=sql, arraysize=100, parameters=[5], requested_schema=schema)
df = pyarrow.table(odf).to_pandas()

In [None]:
df

## Inserting DataFrames

DataFrames from popular libraries can be inserted directly into Oracle Database by passing them to `Cursor.executemany()`:

In [None]:
import pandas

# Create a DataFrame manually
d = {'A': [1.2, 2.4, 8.9], 'B': ["Alex", "Bobbie", "Charlie"]}
pdf = pandas.DataFrame(data=d)

# Insert into the database
cursor.executemany("insert into mytab (id, data) values (:1, :2)", pdf)

# Verify rows
rows = cursor.execute("select * from mytab")
for r in rows:
    print(r)

connection.rollback()
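
In the cell above, the DataFrame columns appear to be matched to the bind placeholders by position: column `A` supplies `:1` and column `B` supplies `:2`. For wider tables, the statement text can be generated from a list of column names. The helper below is a hypothetical illustration and not part of python-oracledb; it only builds the SQL string.

```python
def build_insert_sql(table_name, column_names):
    """Build a positional-bind INSERT statement (hypothetical helper).

    table_name and column_names must come from trusted code, not user
    input, because identifiers cannot be passed as bind variables.
    """
    if not column_names:
        raise ValueError("at least one column name is required")
    columns = ", ".join(column_names)
    # Placeholders are numbered from 1 to match the column order
    binds = ", ".join(f":{i}" for i in range(1, len(column_names) + 1))
    return f"insert into {table_name} ({columns}) values ({binds})"


print(build_insert_sql("mytab", ["id", "data"]))
```

The generated string could then be passed to `cursor.executemany()` together with a DataFrame whose columns are in the same order as the column list.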

You can also insert large DataFrames using **Direct Path Loading**:

In [None]:
SCHEMA_NAME = "pythondemo"
TABLE_NAME = "mytab"
COLUMN_NAMES = ["id", "data"]
DATA = pdf

connection.direct_path_load(
    schema_name=SCHEMA_NAME,
    table_name=TABLE_NAME,
    column_names=COLUMN_NAMES,
    data=DATA
)

Verify the inserted data:

In [None]:
for row in cursor.execute('select * from mytab'):
    print(row)

# Remove the data so the sample can be re-run cleanly
cursor.execute("truncate table mytab")
connection.commit()