Mapping values is a central task in data processing. The most natural way to do this in SQL is with a join. However, one can also use a SQL `CASE WHEN` expression when there are not too many values. [Data algebra](https://github.com/WinVector/data_algebra) version 1.0.1 introduces a new `.mapv()` operator for this purpose.
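For comparison, the join route mentioned above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not data algebra code: the table names `d` and `x_map`, the `row_id` ordering column, and the sample values are assumptions chosen to mirror the example that follows.

```python
import sqlite3

# Assumed sample data mirroring the example below; None plays the role of a missing value.
values = ["a", "b", "c", None, None, "b"]
map_dict = {"a": 1.0, "b": 2.0, "q": -3}
default_value = 0.5

conn = sqlite3.connect(":memory:")
# row_id (an assumption for this sketch) lets us restore the original row order after the join.
conn.execute("CREATE TABLE d (row_id INTEGER, x TEXT)")
conn.executemany("INSERT INTO d VALUES (?, ?)", list(enumerate(values)))
# The mapping dictionary becomes a small two-column lookup table.
conn.execute("CREATE TABLE x_map (x TEXT, x_mapped REAL)")
conn.executemany("INSERT INTO x_map VALUES (?, ?)", list(map_dict.items()))

# LEFT JOIN keeps rows with no match; COALESCE fills in the default for them.
rows = conn.execute(
    """
    SELECT d.x, COALESCE(x_map.x_mapped, ?) AS x_mapped
    FROM d LEFT JOIN x_map ON d.x = x_map.x
    ORDER BY d.row_id
    """,
    (default_value,),
).fetchall()
conn.close()
print(rows)
# [('a', 1.0), ('b', 2.0), ('c', 0.5), (None, 0.5), (None, 0.5), ('b', 2.0)]
```

The join only behaves like a lookup when the mapping table has unique keys, which a Python dictionary guarantees; duplicate keys would multiply rows. The `CASE WHEN` form avoids the extra table entirely, at the cost of a longer expression when there are many values.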

Let's set up an example.

```python
import numpy as np
import pandas as pd

from data_algebra.data_ops import *
import data_algebra.SQLite
import data_algebra.test_util

d = pd.DataFrame({
    'x': ['a', 'b', 'c', None, np.nan, 'b'],
})

d
```

|   | x |
|---|---|
| 0 | a |
| 1 | b |
| 2 | c |
| 3 |   |
| 4 |   |
| 5 | b |


The task is to re-map the string levels of `x` to numeric values using a Python dictionary.
Let's say our desired mapping is as follows.

```python
map_dict = {"a": 1.0, "b": 2.0, "q": -3}
default_value = 0.5
```

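We can use these values with the `.mapv()` method, which takes the mapping dictionary as its first argument and a default value for unmatched items as its second. Because data algebra expressions are written as strings, we insert the dictionary into the expression using its `repr()`.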

```python
ops = (
    data(d=d)
        .extend({
            'x_mapped': f'x.mapv({map_dict!r}, {default_value})'
        })
)

ops
```

```python
(
    TableDescription(table_name="d", column_names=["x"]).extend(
        {"x_mapped": "x.mapv({'a': 1.0, 'b': 2.0, 'q': -3}, 0.5)"}
    )
)
```

This transformation can be applied to Pandas data frames.

```python
transformed = ops.transform(d)

transformed
```

|   | x | x_mapped |
|---|---|---|
| 0 | a | 1.0 |
| 1 | b | 2.0 |
| 2 | c | 0.5 |
| 3 |   | 0.5 |
| 4 |   | 0.5 |
| 5 | b | 2.0 |


```python
expect = pd.DataFrame({
    'x': ['a', 'b', 'c', None, None, 'b'],
    'x_mapped': [1.0, 2.0, 0.5, 0.5, 0.5, 2.0],
})
assert data_algebra.test_util.equivalent_frames(transformed, expect)
```
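For readers more at home in plain Pandas, the result above can be approximated with `Series.map()` followed by `fillna()`. This is a sketch of the observed semantics only, assuming missing inputs should receive the default (as they do in the table above); it is not how data algebra implements `.mapv()`.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({"x": ["a", "b", "c", None, np.nan, "b"]})
map_dict = {"a": 1.0, "b": 2.0, "q": -3}
default_value = 0.5

# With a dict argument, Series.map() yields NaN for any value that is not a key,
# including the missing entries; fillna() then substitutes the default.
x_mapped = d["x"].map(map_dict).fillna(default_value)
print(x_mapped.tolist())
# [1.0, 2.0, 0.5, 0.5, 0.5, 2.0]
```

One difference to keep in mind: `fillna()` would also overwrite a mapped value that is itself NaN, so this approximation assumes the dictionary values are non-missing.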

Or, as always, we can convert the transformation to SQL for use in databases.


```python
db_model = data_algebra.SQLite.SQLiteModel()
sql_str = db_model.to_sql(ops)

print(sql_str)
```

```sql
-- data_algebra SQL https://github.com/WinVector/data_algebra
--  dialect: SQLiteModel
--       string quote: '
--   identifier quote: "
SELECT  -- .extend({ 'x_mapped': "x.mapv({'a': 1.0, 'b': 2.0, 'q': -3}, 0.5)"})
 "x" ,
 CASE "x" WHEN 'a' THEN 1.0 WHEN 'b' THEN 2.0 WHEN 'q' THEN -3 ELSE 0.5 END AS "x_mapped"
FROM
 "d"
```

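To see how a mapping dictionary lines up with the `CASE WHEN` SQL shown above, here is a minimal sketch using only the standard library. The helper `case_when_sql` is hypothetical, not part of data algebra's API, and it assumes string keys, finite numeric values, and a simple column name; data algebra's own SQL generation is not shown here.

```python
import sqlite3


def case_when_sql(column, mapping, default):
    """Render a dict as a SQL CASE expression (hypothetical helper).

    Assumes string keys, finite numeric values (which repr() renders as
    valid SQL literals), and a column name without embedded double quotes.
    """

    def quote(s):
        # Single-quote a string literal, doubling embedded quotes per SQL rules.
        return "'" + s.replace("'", "''") + "'"

    whens = " ".join(f"WHEN {quote(k)} THEN {v!r}" for k, v in mapping.items())
    return f'CASE "{column}" {whens} ELSE {default!r} END'


map_dict = {"a": 1.0, "b": 2.0, "q": -3}
expr = case_when_sql("x", map_dict, 0.5)
print(expr)
# CASE "x" WHEN 'a' THEN 1.0 WHEN 'b' THEN 2.0 WHEN 'q' THEN -3 ELSE 0.5 END

# Run the expression against a small in-memory table shaped like d.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (row_id INTEGER, x TEXT)")
conn.executemany(
    "INSERT INTO d VALUES (?, ?)",
    list(enumerate(["a", "b", "c", None, None, "b"])),
)
rows = conn.execute(
    f'SELECT "x", {expr} AS "x_mapped" FROM "d" ORDER BY row_id'
).fetchall()
conn.close()
print(rows)
# [('a', 1.0), ('b', 2.0), ('c', 0.5), (None, 0.5), (None, 0.5), ('b', 2.0)]
```

Because `CASE "x" WHEN 'a' ...` compares with `=`, a NULL `x` never matches any branch and falls through to `ELSE`, which agrees with the Pandas result where missing values received the default.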


This operator is now an option for [Python vtreat](https://github.com/WinVector/pyvtreat)'s [transform export](https://github.com/WinVector/pyvtreat/blob/main/Examples/Database/vtreat_db_adapter.ipynb).