# Units in Python

## Intro

There are several libraries dedicated to managing physical and custom units in Python.

This document focuses on the following 3 libraries, which seem to be the best / most flexible:
* [astropy.units](https://docs.astropy.org/en/stable/units/index.html)
* [pint](https://pint.readthedocs.io/en/stable/)
* [unyt](https://unyt.readthedocs.io/en/stable/index.html)

The following article, written by the authors of unyt, gives a good overview of these libraries and benchmarks them:

https://github.com/yt-project/unyt/blob/main/paper/paper-final.pdf

Basically, these libraries do the following:
* Provide a set of pre-defined common physical units
* Allow users to define custom units (important for CHIMES)
* Automatically manage conversions, both for:
    - prefixes: orders of magnitude (m => km, mm, dm, nm...)
    - non-prefixes: different units in the same dimension (m => Angstroms or eV => J)
* Generally offer the possibility of defining either:
    - **Unit**: stand-alone units built from a str
    - **Quantity**: a float or np.ndarray to which a Unit is attached
* Include a symbolic solver to simplify unit multiplications / divisions
    
    
Working with **Quantities** (value + unit in a single object) makes it possible to take advantage of the automated computation of units through basic arithmetic operations: multiplication, division, addition, subtraction and power (and combinations of those).

## Examples with standard units

Nothing illustrates better than examples, so let's start with standard units.

### astropy.units

In [17]:
import astropy.units as asunits

# ------------
# units only

# define a photon flux unit and a length unit
flux_units = asunits.Unit('ph/(s.m2)')
L_units = asunits.Unit('m')

# get the multiplication
flux_through_L2 = flux_units * L_units**2

print(flux_through_L2)

ph / s


Here, the library solved the simplification and the output unit is as expected.
Note however that it does not simplify automatically when the lengths do not all have the same prefix.

In [16]:
# ----------------------------
# units with dissimilar prefixes

# define a photon flux unit and a length unit
flux_units = asunits.Unit('ph/(s.m2)')
L_units = asunits.Unit('km')

# get the multiplication
flux_through_L2 = flux_units * L_units**2
print(flux_through_L2)

km2 ph / (m2 s)


The library can provide the conversion coefficient associated with a prefix.

For that, the user has to request the conversion explicitly using the `in_units()` method (an alias of `Unit.to()`).

In [19]:
L_units = asunits.Unit('km')
coef = L_units.in_units('m')
print(coef)

1000.0


Now let's look at Quantities, which associate a numerical value (scalar or numpy array) with a unit.

In [49]:
# ----------------------------
# Quantities

import numpy as np

# define a numerical flux (numpy array + unit)
flux = asunits.Quantity(np.linspace(0, 10, 11), unit='ph/(s.m2)')
print(f'flux:\n\t{flux}')

flux:
	[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.] ph / (m2 s)


The numpy array and the unit are stored in the same instance.

Both are accessible as attributes, `value` and `unit` (generally not needed, but good to know).

In [61]:
# the numerical values (`value`, a plain numpy array) and the units (`unit`, a Unit) are both accessible attributes
print(f'flux.data:\n\t{type(flux.value)}\n\t{flux.value}\n\t{flux.to_value()}')
print(f'flux.unit:\n\t{type(flux.unit)}\n\t{flux.unit}\n\t{flux.to_string()}')

flux.data:
	<class 'numpy.ndarray'>
	[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
	[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
flux.unit:
	<class 'astropy.units.core.CompositeUnit'>
	ph / (m2 s)
	[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.] ph / (m2 s)


The **Quantity** object can do anything a standard numpy array can do (all the numpy methods and attributes are inherited).
There is no loss of functionality over a numpy array.

There are, on the other hand, a few extra methods specific to Quantity.

In [63]:
# the Quantity object still has all the attributes of a numpy array
latt_in_numpy_not_in_Quantity = [ss for ss in dir(np.ndarray) if ss not in dir(flux) and not ss.startswith('_')]
latt_in_Quantity_not_in_numpy = [ss for ss in dir(flux) if ss not in dir(np.ndarray) and not ss.startswith('_')]

print(f'\nlatt_in_numpy_not_in_Quantity:\n\t{latt_in_numpy_not_in_Quantity}')
print(f'\nlatt_in_Quantity_not_in_numpy:\n\t{latt_in_Quantity_not_in_numpy}')


latt_in_numpy_not_in_Quantity:
	[]

latt_in_Quantity_not_in_numpy:
	['cgs', 'decompose', 'diff', 'ediff1d', 'equivalencies', 'info', 'insert', 'isscalar', 'nansum', 'si', 'to', 'to_string', 'to_value', 'unit', 'value']


Let's move on to operations between Quantities.

In [28]:
# define a length as a Quantity (value + unit)
L = asunits.Quantity(10, unit='m')

# get the multiplication
flux_through_L2 = flux * L**2
print(flux_through_L2)

[   0.  100.  200.  300.  400.  500.  600.  700.  800.  900. 1000.] ph / s


The numpy operations are executed at the same time as the symbolic algebra on the units.

Also, for Quantities the conversion method is `to()`; it returns a copy of the Quantity with converted numerical values (and units).

In [57]:
L_mm = L.to('mm')
print(L_mm)

10000.0 mm


In [35]:
L.unit

Unit("m")

## How to choose?

Several aspects can lead to choosing one over the others.

A first group of arguments refers to how sane the library is:

* Dependencies: is the library stand-alone or does it come with a lot of dependencies?
* Quality: is the code well-tested?
* Long-term: is there a good user community? Is the GitHub repo alive or is the library dying?


A second group of arguments refers to specific features we want:
* Compatibility with numpy: it should work out-of-the-box
* Speed: does the extra layer add significant CPU time to computations?
* Custom units: we're doing economics, so we'll be selling bananas in dollars or euros, counting people... we need the flexibility to define our own units (not only classical units like Joules, meters, kg...)

## How to use it?

So far CHIMES typically stores data in a dict where:
* 'data' contains a numpy array
* 'units' contains a str representing the units

There are 2 ways to implement better handling of units using one of these libraries:
* Method A:
    - easiest but less leverage: keep `data` and `units` as 2 separate keys
    - just run `astropy.units` on each unit at definition time to check that the unit exists (validate it)
    - call `astropy.units` any time we need to do an operation on data, to update the 'units' field
    - very library-agnostic

* Method B:
    - replace both fields by a single `data` or `quantity` field containing a Quantity instead of a numpy array
    - most leverage, as units will be handled automatically by the library
    - probably makes us more dependent on the library (more painful if we want to change later?)
    - probably the best choice for fully automated unit management in the long term, if we're sure which library we want