# Conversion to & from Pandas and Numpy
By the end of this lecture you will be able to:
- convert between a `DataFrame` and Numpy
- convert a `Series` to Numpy with zero-copy
- convert between a `DataFrame` and Pandas

In [None]:
import polars as pl
import numpy as np
import pandas as pd

In [None]:
csvFile = "../data/titanic.csv"

In [None]:
df = pl.read_csv(csvFile)
df.head(3)

## Convert a `DataFrame` to Numpy

To convert a `DataFrame` to Numpy use the `to_numpy` method. This clones (copies) the data.

In [None]:
arr = df.to_numpy()
arr

The result is a two-dimensional Numpy `ndarray` with one row for each row of the `DataFrame` and one column for each column of the `DataFrame`.

As the `DataFrame` has a mix of types the Numpy array has an `object` dtype.

If the columns have uniform numeric dtype then the Numpy array has the corresponding dtype.

In this example we use `select` to keep only the 64-bit floating point columns before converting to Numpy. We cover `select` in more detail in the section on selecting columns and transforming dataframes.

In [None]:
(
    df
    .select(
        pl.col(pl.Float64)
    )
    .to_numpy()
    .dtype
)

It is usually better to convert to Numpy as late as possible, because data processing in Polars is often faster and more memory efficient.

## Convert Numpy to a `DataFrame`

We can create a Polars `DataFrame` from a Numpy array.

In [None]:
rand_array = np.random.standard_normal((5,3))
pl.DataFrame(rand_array)

We can optionally pass a list of column names if we want to set these ourselves.

## Convert a `Series` to Numpy
Converting a `Series` to Numpy has more options than converting an entire `DataFrame`.

To do a simple conversion where the data is cloned, use `to_numpy` on the `Series`.

In [None]:
df['Age'].head().to_numpy()

### Convert a `Series` to Numpy with zero-copy
In some cases we can convert a `Series` to Numpy without copying ("zero-copy"). 

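Zero-copy is possible if the `Series` has a numeric dtype and contains no `null` values. A `NaN` is an ordinary floating point value, so it does not prevent zero-copy on its own.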

In [None]:
arr = df['Survived'].head().to_numpy(zero_copy_only=True)
arr

With zero-copy conversion the Numpy array is read-only so you cannot change the values in the Numpy array.

So the following attempt to change a value raises a `ValueError`.

In [None]:
arr = df['Survived'].head().to_numpy(zero_copy_only=True)
arr[0] = 100
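
If we do need to modify the values, one option is to take an explicit copy of the read-only array. The sketch below reproduces the behaviour with plain Numpy so that it does not depend on a particular Polars version. The array marked read-only with `setflags` is a stand-in for a zero-copy result, not the output of Polars itself.

```python
import numpy as np

# Stand-in for a zero-copy result: a read-only array
arr = np.array([0, 1, 1, 1, 0])
arr.setflags(write=False)

try:
    arr[0] = 100
    write_failed = False
except ValueError:
    # Numpy refuses to write to a read-only array
    write_failed = True

# An explicit copy owns its own memory and is writable
writable = arr.copy()
writable[0] = 100
print(write_failed, writable)
```

The copy costs extra memory, which is the trade-off that zero-copy conversion avoids.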

## Convert a `DataFrame` to Pandas

To convert a `DataFrame` to Pandas use the `to_pandas` method. This clones the data.

To do this conversion you must have `PyArrow` installed with `pip` or `conda`.

In [None]:
df.to_pandas().head()

Warning: at present you can call `pd.DataFrame` on a Polars `DataFrame`, but the result:
- is transposed
- has lost the column names

In [None]:
pd.DataFrame(df).head()

Hopefully this conversion will be easier when both libraries have adopted the [dataframe interchange protocol](https://data-apis.org/dataframe-protocol/latest/index.html).

There are some issues when converting an Arrow Table to or from a Pandas DataFrame, such as differences in types. These issues are not Polars-specific but occur for any conversion between Arrow and Pandas. They are detailed in the [PyArrow documentation on Pandas integration](https://arrow.apache.org/docs/python/pandas.html).

You can convert from Pandas to Polars by calling `pl.DataFrame`.

In [None]:
pl.DataFrame(df.to_pandas()).head(3)

Or by calling `pl.from_pandas`.

In [None]:
pl.from_pandas(df.to_pandas()).head(3)

## Convert a `Series` to Pandas

To convert a Polars `Series` to a Pandas `Series` use the `to_pandas` method on the `Series`.

In [None]:
df['Age'].to_pandas().head()

## Exercises

No exercises for this lecture!