# Dtypes in Vaex
26-07-2021

To use the `DataType` class in Vaex, import it from the `vaex.datatype` module:

``` python
from vaex.datatype import DataType
```

### Numeric
The Vaex `DataType` class wraps numpy and arrow data types. The table below maps the dtypes for numeric values (bool, int, uint, float and complex); `pa` stands for `pyarrow` and `np` for `numpy`.

| vaex numeric type | arrow | numpy |
| :- | :- | :- |
| bool | pa.bool_() | np.dtype(np.bool_) |
| int8 | pa.int8() | np.dtype(np.int8) |
| int16 | pa.int16() | np.dtype(np.int16) |
| int32 | pa.int32() | np.dtype(np.int32) |
| int64 | pa.int64() | np.dtype(np.int64) |
| uint8 | pa.uint8() | np.dtype(np.uint8) |
| uint16 | pa.uint16() | np.dtype(np.uint16) |
| uint32 | pa.uint32() | np.dtype(np.uint32) |
| uint64 | pa.uint64() | np.dtype(np.uint64) |
| float16 | pa.float16() | np.dtype(np.float16) |
| float32 | pa.float32() | np.dtype(np.float32) |
| float64 | pa.float64() | np.dtype(np.float64) |
| complex64 | - | np.dtype(np.complex64) |
| complex128 | - | np.dtype(np.complex128) |

## Research numeric dtypes

Research compatibility between arrow dtypes and numpy dtypes on one side and the vaex `DataType` on the other.

In [1]:
import pyarrow as pa
import numpy as np

import vaex
from vaex.datatype import DataType

In [2]:
DataType(np.dtype(np.complex64))

complex64

In [3]:
DataType(pa.float16())

halffloat

In [4]:
DataType(np.dtype(np.float16))

float16

In [5]:
a = DataType(pa.float16())
b = DataType(np.dtype(np.float16))
a == b

True

In [6]:
DataType(np.dtype(np.int64))

int64

In [7]:
DataType(pa.bool_())

bool

## Research implementation for different numeric dtypes
Next, I check how `from_dataframe` handles different numeric dtypes.

In [8]:
# Test data
x = np.array([1, 0, 1]).astype('bool')
y = np.array([1, 2, 3]).astype('uint16')
z = np.array([1, 2, 3]).astype('int64')
w = np.array([1, 2, 3]).astype('float32')
q = np.array([9.2, 10.5, 11.8]).astype('complex128')
df = vaex.from_arrays(x=x, y=y, z=z, w=w, q=q)

df

#,x,y,z,w,q
0,True,1,1,1,(9.2+0j)
1,False,2,2,2,(10.5+0j)
2,True,3,3,3,(11.8+0j)


In [9]:
df['q']

Expression = q
Length: 3 dtype: complex128 (column)
------------------------------------
0   (9.2+0j)
1  (10.5+0j)
2  (11.8+0j)

In [10]:
%run vaex_implementation_v2.py

# Should give an error because of the complex data in q
df2 = from_dataframe_to_vaex(df)
df2

ValueError: Data type complex128 not supported by exchangeprotocol
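
Since the complex column is what triggers the error, it can help to locate such columns before attempting the conversion. A minimal sketch, assuming the columns are available as numpy arrays; `complex_columns` is a hypothetical helper and not part of Vaex or of the protocol implementation:

``` python
import numpy as np


def complex_columns(columns):
    """Return the names of columns whose numpy dtype is complex.

    `columns` is assumed to be a dict mapping column name -> array-like.
    """
    return [
        name
        for name, values in columns.items()
        if np.issubdtype(np.asarray(values).dtype, np.complexfloating)
    ]


# Same kind of test data as in the cell above
columns = {
    "x": np.array([1, 0, 1]).astype("bool"),
    "y": np.array([1, 2, 3]).astype("uint16"),
    "w": np.array([1, 2, 3]).astype("float32"),
    "q": np.array([9.2, 10.5, 11.8]).astype("complex128"),
}

print(complex_columns(columns))  # ['q']
```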

## Researching dtypes classes and methods in the protocol

In [11]:
df.y.dtype

uint16

In [12]:
_k = _DtypeKind
df.y.dtype in (_k.INT, _k.UINT, _k.FLOAT, _k.BOOL)

False

In [13]:
_np_kinds = {'i': _k.INT, 'u': _k.UINT, 'f': _k.FLOAT, 'b': _k.BOOL,
             'U': _k.STRING,
             'M': _k.DATETIME, 'm': _k.DATETIME}

In [14]:
# df.y.dtype = uint16
kind = _np_kinds.get(df.y.dtype.kind, None)
kind

<_DtypeKind.UINT: 1>

In [15]:
df.y.dtype.kind

'u'

In [16]:
# df.x.dtype = bool
kindx = _np_kinds.get(df.x.dtype.kind, None)
kindx

<_DtypeKind.BOOL: 20>
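
The kind lookups above can be reproduced without the protocol module. In the sketch below, `Kind` is a hypothetical stand-in for `_DtypeKind`: the values for `UINT` and `BOOL` match the outputs shown above, while the remaining values are assumed to follow the dataframe interchange protocol specification. `kind_of` is likewise a made-up helper name. The earlier membership test presumably returns `False` because it compares the dtype object itself with the enum members; going through the numpy kind character is what makes the lookup work.

``` python
import enum

import numpy as np


class Kind(enum.IntEnum):
    """Stand-in for _DtypeKind (assumption: values follow the protocol spec)."""
    INT = 0
    UINT = 1
    FLOAT = 2
    BOOL = 20
    STRING = 21
    DATETIME = 22


# Same mapping as _np_kinds above: numpy kind character -> protocol kind
NP_KINDS = {
    "i": Kind.INT, "u": Kind.UINT, "f": Kind.FLOAT, "b": Kind.BOOL,
    "U": Kind.STRING,
    "M": Kind.DATETIME, "m": Kind.DATETIME,
}


def kind_of(dtype):
    """Map a numpy dtype (or dtype name) to a Kind, or None if unmapped."""
    return NP_KINDS.get(np.dtype(dtype).kind)


for name in ("uint16", "bool", "float32", "complex128"):
    print(name, np.dtype(name).kind, kind_of(name))
```

A dtype whose kind character is missing from the mapping, such as complex128 with kind `'c'`, yields `None`.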

In [17]:
# df.q.dtype = complex128
kindq = _np_kinds.get(df.q.dtype.kind, None)
kindq