Converting between Polars, Numpy and Pandas

1. Convert between Polars and Numpy
2. Convert between Polars and Pandas

Requirements for this notebook:
- Converting to or from Pandas requires Pandas 2.0+ (automated testing is carried out with the latest version of Pandas on PyPI)

Use pl.show_versions() to check the version of polars you are using.




In [3]:
import polars as pl 
import numpy as np
import pandas as pd

In [7]:
csv_file= './Files/Sample_Superstore.csv'

In [8]:
df = pl.read_csv(csv_file)

In [9]:
df.head(5)

Row_ID,Order_ID,Order_Date,Ship_Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
1,,,"""11-11-2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-BO-10001798""","""Furniture""","""Bookcases""","""Bush Somerset Collection Bookc…",261.96,2,0.0,41.9136
2,"""CA-2016-152156""","""08-11-2016""","""11-11-2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10000454""","""Furniture""","""Chairs""","""Hon Deluxe Fabric Upholstered …",731.94,3,0.0,219.582
3,"""CA-2016-138688""","""12-06-2016""",,,"""DV-13045""","""Darrin Van Huff""","""Corporate""",,"""Los Angeles""","""California""",90036,"""West""","""OFF-LA-10000240""","""Office Supplies""","""Labels""","""Self-Adhesive Address Labels f…",14.62,2,0.0,6.8714
4,,"""11-10-2015""",,"""Standard Class""","""SO-20335""","""Sean O'Donnell""","""Consumer""","""United States""","""Fort Lauderdale""","""Florida""",33311,"""South""","""FUR-TA-10000577""","""Furniture""","""Tables""","""Bretford CR4500 Series Slim Re…",957.5775,5,0.45,-383.031
5,"""US-2015-108966""","""11-10-2015""","""18-10-2015""","""Standard Class""","""SO-20335""","""Sean O'Donnell""","""Consumer""","""United States""",,"""Florida""",33311,"""South""","""OFF-ST-10000760""","""Office Supplies""","""Storage""","""Eldon Fold 'N Roll Cart System""",22.368,2,0.2,2.5164


Convert a DataFrame to Numpy

To convert a DataFrame to a Numpy array, use the `to_numpy` method. It returns a Numpy array representation of the DataFrame, and the data is copied (cloned) in the process.


In [10]:
arr = df.to_numpy()

In [20]:
arr

array([[1, None, None, ..., 2, 0.0, 41.9136],
       [2, 'CA-2016-152156', '08-11-2016', ..., 3, 0.0, 219.582],
       [3, 'CA-2016-138688', '12-06-2016', ..., 2, 0.0, 6.8714],
       ...,
       [9992, 'CA-2017-121258', '26-02-2017', ..., 2, 0.2, 19.3932],
       [9993, 'CA-2017-121258', '26-02-2017', ..., 4, 0.0, 13.32],
       [9994, 'CA-2017-119914', '04-05-2017', ..., 2, 0.0, 72.948]],
      shape=(9994, 21), dtype=object)

This conversion turns each row into a Numpy "ndarray" and vertically stacks these row-arrays.

As the DataFrame has a mix of data types, the Numpy array falls back to the generic `object` dtype, as the output above shows.

In this example, we use `select` to choose only the 64-bit floating point columns for conversion to Numpy.

- We cover `select` in more detail in the section on selecting columns and transforming DataFrames.
     

In [32]:
float_array = (
    df.select(pl.col(pl.Float64)).to_numpy()
)

In [33]:
type(float_array)

numpy.ndarray

In [24]:
float_array.dtype

dtype('float64')

The Polars sequence dtypes `pl.List` and `pl.Array` are common ways to store sequences that might be passed to Numpy.

Convert Numpy to a DataFrame

We can create a Polars DataFrame from a Numpy array using the `pl.from_numpy` function. This creates a DataFrame with the same data as the Numpy array.

In [34]:
# from_numpy treats the first axis as rows, so each array column becomes a DataFrame column
df = pl.from_numpy(float_array)

In [35]:
df

column_0,column_1,column_2
f64,f64,f64
261.96,0.0,41.9136
731.94,0.0,219.582
14.62,0.0,6.8714
957.5775,0.45,-383.031
22.368,0.2,2.5164
…,…,…
25.248,0.2,4.1028
91.96,0.0,15.6332
258.576,0.2,19.3932
29.6,0.0,13.32


In [27]:
type(df)

polars.dataframe.frame.DataFrame

In [39]:
df = df.rename({"column_2": "Profit"})

Convert a `Series` to `Numpy`

Converting a `Series` to `Numpy` has more options than converting an entire DataFrame. You can use the `to_numpy` method to convert a Series to a Numpy array. This returns a Numpy array representation of the Series and, unless a zero-copy conversion is possible, copies (clones) the data.

In [42]:
(
    df['Profit']
    .head()
    .to_numpy()
)

array([  41.9136,  219.582 ,    6.8714, -383.031 ,    2.5164,   14.1694,
          1.9656,   90.7152,    5.7825,   34.47  ])

These are the first ten values of the column we renamed to "Profit" (`head()` on a Series returns ten rows by default), and they match the last column of the DataFrame above.

Convert a `Series` to a Numpy with Zero-copy

In some cases we can convert a Series to a Numpy array without copying the data (zero-copy).

Zero-copy is only possible if the Series contains no null values, as is the case for the `Profit` values used here. If we want to ensure that the conversion to Numpy happens with zero-copy, and raise an `Exception` if a copy would be needed, we use the `allow_copy=False` argument.

```python
(
    df['Profit']
    .head()
    .to_numpy(allow_copy=False)
)
```

In [43]:
arr = (
    df['Profit']
    .head()
    .to_numpy(allow_copy=False)
)

arr

array([  41.9136,  219.582 ,    6.8714, -383.031 ,    2.5164,   14.1694,
          1.9656,   90.7152,    5.7825,   34.47  ])

With zero-copy conversion the Numpy array is read-only, so we cannot change its values. If we try to, Numpy raises a `ValueError`.

Numerical dtypes and precision
Introduction


Polars provides efficient handling of numerical data types to optimize performance and memory usage. Understanding numerical data types (dtypes) and precision is important when working with large datasets.

In this lecture, we will cover:
✅ Different numerical data types in Polars
✅ Controlling precision for floating-point numbers
✅ Changing numerical data types



Step 1: Checking Numerical Data Types in Polars
Let's create a sample DataFrame and inspect its data types:



```python
import polars as pl

df = pl.DataFrame({
    "integer_column": [1, 2, 3, 4, 5],  # Default: Int64
    "float_column": [1.1, 2.2, 3.3, 4.4, 5.5]  # Default: Float64
})

print("DataFrame:")
print(df)

print("\nData Types:")
print(df.schema)
```

Output:

```
shape: (5, 2)
┌───────────────┬─────────────┐
│ integer_column│ float_column│
├───────────────┼─────────────┤
│ 1             │ 1.1         │
│ 2             │ 2.2         │
│ 3             │ 3.3         │
│ 4             │ 4.4         │
│ 5             │ 5.5         │
└───────────────┴─────────────┘

{'integer_column': Int64, 'float_column': Float64}
```


Step 2: Numerical Data Types in Polars
Polars supports several integer and floating-point types:



Integer Types


| Data Type | Description |
| --- | --- |
| Int8 | 8-bit integer (-128 to 127) |
| Int16 | 16-bit integer (-32,768 to 32,767) |
| Int32 | 32-bit integer (-2.1 billion to 2.1 billion) |
| Int64 | 64-bit integer (default) |



Unsigned Integer Types


| Data Type | Description |
| --- | --- |
| UInt8 | 8-bit unsigned integer (0 to 255) |
| UInt16 | 16-bit unsigned integer (0 to 65,535) |
| UInt32 | 32-bit unsigned integer |
| UInt64 | 64-bit unsigned integer |



Floating-Point Types


| Data Type | Description |
| --- | --- |
| Float32 | 32-bit floating point (single precision) |
| Float64 | 64-bit floating point (double precision, default) |
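The ranges listed above follow directly from the bit widths. As a quick check, the sketch below reads them from Numpy, assuming that the Numpy dtypes of the same width (`np.int8`, `np.uint16`, and so on) have the same limits as the corresponding Polars types.

```python
import numpy as np

int_types = [np.int8, np.int16, np.int32, np.int64,
             np.uint8, np.uint16, np.uint32, np.uint64]

# Map each dtype name to its (minimum, maximum) representable value
ranges = {}
for int_type in int_types:
    info = np.iinfo(int_type)
    ranges[int_type.__name__] = (int(info.min), int(info.max))
    print(f"{int_type.__name__:>6}: {info.min} to {info.max}")

# Approximate number of decimal digits that each float type can represent reliably
digits = {}
for float_type in (np.float32, np.float64):
    digits[float_type.__name__] = int(np.finfo(float_type).precision)
    print(f"{float_type.__name__:>7}: about {digits[float_type.__name__]} decimal digits")
```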



Step 3: Changing Numerical Data Types
You can convert data types using .cast():



```python
# Cast both columns to narrower types in a single with_columns call
df = df.with_columns([
    pl.col("integer_column").cast(pl.Int32),
    pl.col("float_column").cast(pl.Float32)
])

print("\nUpdated Data Types:")
print(df.schema)
```

Output:

```
{'integer_column': Int32, 'float_column': Float32}
```


Step 4: Controlling Floating-Point Precision


By default, Polars uses Float64, but you can reduce precision to save memory:



```python
 
df = df.with_columns(
    df["float_column"].cast(pl.Float32)
)
 
print("\nUpdated Float Column Type:")
print(df.schema)

```


If you need higher precision, keep it as Float64.



Step 5: Handling Large Numbers and Overflow
For very large numbers, use Int64 or UInt64:



```python
 
df_large = pl.DataFrame({
    "big_number": [10**12, 10**13, 10**14]  # Large integers
}).with_columns(pl.col("big_number").cast(pl.Int64))
 
print("\nLarge Number Data Types:")
print(df_large.schema)
```


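⚠️ Beware of Overflow:
Values that do not fit into Int8 or Int16 cannot be stored in those types: a cast raises an error by default, and with `strict=False` the out-of-range values silently become null.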



Step 6: Avoiding Precision Loss in Calculations
```python
 
df = df.with_columns(
    (df["float_column"] * 3.14159).alias("pi_multiplication")
)
 
print("\nResult with Precision:")
print(df)
```


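Float64 is not exact either, but it carries about 15 to 16 significant decimal digits compared with about 7 for Float32, so prefer Float64 when precision matters.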



Conclusion


✅ Polars provides multiple integer and floating-point types
✅ Use .cast() to change numerical types
✅ Reduce precision (Float32) for memory efficiency or keep Float64 for accuracy
✅ Be mindful of integer overflows



Now you can manage numerical data types efficiently in Polars! 🚀