# Pandas support

This notebook provides a simple example of how to use Pint with Pandas. See the documentation for full details.

In [1]:
import pandas as pd 
import pint
import numpy as np

from pint.pandas_interface import PintArray

In [2]:
ureg = pint.UnitRegistry()
Q_ = ureg.Quantity

## Basic example

This example shows how a DataFrame works with Pint. It is not the most common use case, however, so reading from a csv file is shown further below.

In [3]:
df = pd.DataFrame({
    "torque": PintArray(Q_([1, 2, 2, 3], "lbf ft")),
    "angular_velocity": PintArray(Q_([1000, 2000, 2000, 3000], "rpm"))
})
df

Unnamed: 0,torque,angular_velocity
0,1 foot * force_pound,1000 revolutions_per_minute
1,2 foot * force_pound,2000 revolutions_per_minute
2,2 foot * force_pound,2000 revolutions_per_minute
3,3 foot * force_pound,3000 revolutions_per_minute


In [4]:
df['power'] = df['torque'] * df['angular_velocity']
df

Unnamed: 0,torque,angular_velocity,power
0,1 foot * force_pound,1000 revolutions_per_minute,1000 foot * force_pound * revolutions_per_minute
1,2 foot * force_pound,2000 revolutions_per_minute,4000 foot * force_pound * revolutions_per_minute
2,2 foot * force_pound,2000 revolutions_per_minute,4000 foot * force_pound * revolutions_per_minute
3,3 foot * force_pound,3000 revolutions_per_minute,9000 foot * force_pound * revolutions_per_minute


In [5]:
df.power.values.data

In [6]:
df.torque.values.data

In [7]:
df.angular_velocity.values.data

In [8]:
df.power.values.data.to("kW")
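The conversion to kilowatts in the last cell can be cross-checked by hand. The sketch below uses plain Python rather than Pint, and assumes the usual definitions: the pound-force and international foot in SI units, and one revolution per minute taken as 2π radians per 60 seconds, with the radian treated as dimensionless. The constant and function names are hypothetical and are not part of Pint.

```python
import math

# SI definitions of the units involved (assumed, not read from Pint).
LBF_IN_NEWTON = 4.4482216152605      # pound-force in newtons
FOOT_IN_METRE = 0.3048               # international foot in metres
RPM_IN_RAD_PER_S = 2 * math.pi / 60  # one revolution per minute in rad/s


def lbf_ft_rpm_to_kw(value):
    """Convert a power given in lbf * ft * rpm to kilowatts."""
    watts = value * LBF_IN_NEWTON * FOOT_IN_METRE * RPM_IN_RAD_PER_S
    return watts / 1000.0


# Magnitudes of the "power" column computed above.
power_raw = [1000, 4000, 4000, 9000]
power_kw = [lbf_ft_rpm_to_kw(p) for p in power_raw]
print(power_kw)
```

Under these assumptions the first row comes out at roughly 0.142 kW, so values of that order are what one would expect from the `to("kW")` call.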

## Reading from csv

Reading data from files is by far the more common way to use pandas. To make this easy, DataFrame accessors are provided that give direct access to PintArrays.

## Setup

Here we create the DataFrame and save it to a file. Next, we will show how to load and read it.

We start with a DataFrame that has column names only, with no units information.

In [9]:
df_init = pd.DataFrame({
    "speed": [1000, 1100, 1200, 1200],
    "mech power": [np.nan, np.nan, np.nan, np.nan],
    "torque": [10, 10, 10, 10],
    "rail pressure": [1000, 1000000000000, 1000, 1000],
    "fuel flow rate": [10, 10, 10, 10],
    "fluid power": [np.nan, np.nan, np.nan, np.nan],
})
df_init

Unnamed: 0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
0,1000,,10,1000,10,
1,1100,,10,1000000000000,10,
2,1200,,10,1000,10,
3,1200,,10,1000,10,


Then we add a second column header level that contains the units information.

In [10]:
units = ["rpm", "kW", "N m", "bar", "l/min", "kW"]
df_to_save = df_init.copy()
df_to_save.columns = pd.MultiIndex.from_arrays([df_init.columns, units])
df_to_save

Unnamed: 0_level_0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
Unnamed: 0_level_1,rpm,kW,N m,bar,l/min,kW
0,1000,,10,1000,10,
1,1100,,10,1000000000000,10,
2,1200,,10,1000,10,
3,1200,,10,1000,10,


Now we save this to disk as a csv to give us our starting point.

In [11]:
test_csv_name = "pandas_test.csv"
df_to_save.to_csv(test_csv_name, index=False)

Now we are in a position to read the csv we just saved. Let's start by reading the file with units as a level in a multiindex column.

In [12]:
df = pd.read_csv(test_csv_name, header=[0,1])
df

Unnamed: 0_level_0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
Unnamed: 0_level_1,rpm,kW,N m,bar,l/min,kW
0,1000,,10,1000,10,
1,1100,,10,1000000000000,10,
2,1200,,10,1000,10,
3,1200,,10,1000,10,
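The save and load steps above rely only on pandas: the units travel as a second header row in the csv file. A minimal sketch of that round trip follows, using an in-memory buffer instead of a file on disk. The two-column table and the variable names are hypothetical and chosen purely for illustration.

```python
import io

import pandas as pd

# Hypothetical two-column table; names and units are illustrative only.
values = pd.DataFrame({"speed": [1000, 1100], "torque": [10, 10]})
units = ["rpm", "N m"]

# Stack the unit strings under the column names as a second header level.
with_units = values.copy()
with_units.columns = pd.MultiIndex.from_arrays([values.columns, units])

# Write to an in-memory buffer rather than to a file.
buffer = io.StringIO()
with_units.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# header=[0, 1] tells pandas that the first two rows are both header rows.
round_tripped = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Each column label is a (name, unit) tuple, so a dict gives name -> unit.
unit_by_column = dict(round_tripped.columns.tolist())
print(csv_text)
print(unit_by_column)
```

At this stage the units are still plain strings in the column index; nothing is unit-aware until the columns are converted to PintArrays.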


Then use the `quantify` method of the DataFrame's `pint` accessor to convert the columns from `np.ndarray`s to PintArrays, taking the units from the bottom column level.

In [13]:
df_ = df.pint.quantify(ureg, level=-1)
df_

Unnamed: 0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
0,1000.0 revolutions_per_minute,nan kilowatt,10.0 meter * newton,1000.0 bar,10.0 liter / minute,nan kilowatt
1,1100.0 revolutions_per_minute,nan kilowatt,10.0 meter * newton,1000000000000.0 bar,10.0 liter / minute,nan kilowatt
2,1200.0 revolutions_per_minute,nan kilowatt,10.0 meter * newton,1000.0 bar,10.0 liter / minute,nan kilowatt
3,1200.0 revolutions_per_minute,nan kilowatt,10.0 meter * newton,1000.0 bar,10.0 liter / minute,nan kilowatt


As before, operations between DataFrame columns are unit-aware.

In [14]:
df_['mech power'] = df_.speed*df_.torque
df_['fluid power'] = df_['fuel flow rate'] * df_['rail pressure']
df_

Unnamed: 0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
0,1000.0 revolutions_per_minute,10000.0 meter * newton * revolutions_per_minute,10.0 meter * newton,1000.0 bar,10.0 liter / minute,10000.0 bar * liter / minute
1,1100.0 revolutions_per_minute,11000.0 meter * newton * revolutions_per_minute,10.0 meter * newton,1000000000000.0 bar,10.0 liter / minute,10000000000000.0 bar * liter / minute
2,1200.0 revolutions_per_minute,12000.0 meter * newton * revolutions_per_minute,10.0 meter * newton,1000.0 bar,10.0 liter / minute,10000.0 bar * liter / minute
3,1200.0 revolutions_per_minute,12000.0 meter * newton * revolutions_per_minute,10.0 meter * newton,1000.0 bar,10.0 liter / minute,10000.0 bar * liter / minute


The DataFrame's `pint.dequantify` method then allows us to retrieve the units information as a header row once again.

In [15]:
df_.pint.dequantify()

Unnamed: 0_level_0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
Unnamed: 0_level_1,revolutions_per_minute,meter * newton * revolutions_per_minute,meter * newton,bar,liter / minute,bar * liter / minute
0,1000.0,10000.0,10.0,1000.0,10.0,10000.0
1,1100.0,11000.0,10.0,1000000000000.0,10.0,10000000000000.0
2,1200.0,12000.0,10.0,1000.0,10.0,10000.0
3,1200.0,12000.0,10.0,1000.0,10.0,10000.0


This makes some powerful operations straightforward. For example, we can change the units of individual columns:

In [16]:
df_['fluid power'] = df_['fluid power'].pint.to("kW")
df_['mech power'] = df_['mech power'].pint.to("kW")
df_.pint.dequantify()

Unnamed: 0_level_0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
Unnamed: 0_level_1,revolutions_per_minute,kilowatt,meter * newton,bar,liter / minute,kilowatt
0,1000.0,1.047198,10.0,1000.0,10.0,16.66667
1,1100.0,1.151917,10.0,1000000000000.0,10.0,16666670000.0
2,1200.0,1.256637,10.0,1000.0,10.0,16.66667
3,1200.0,1.256637,10.0,1000.0,10.0,16.66667


The units of the entire table can also be converted at once, here to base units:

In [17]:
df_.pint.to_base_units().pint.dequantify()

Unnamed: 0_level_0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
Unnamed: 0_level_1,radian / second,kilogram * meter ** 2 / second ** 3,kilogram * meter ** 2 / second ** 2,kilogram / meter / second ** 2,meter ** 3 / second,kilogram * meter ** 2 / second ** 3
0,104.719755,1047.197551,10.0,100000000.0,0.000167,16666.67
1,115.191731,1151.917306,10.0,1e+17,0.000167,16666670000000.0
2,125.663706,1256.637061,10.0,100000000.0,0.000167,16666.67
3,125.663706,1256.637061,10.0,100000000.0,0.000167,16666.67
