## String Join / Concatenate
- To join a list of strings into a single string:
  - For Pandas, use `str.join`.
    - If any element of a list is not a string, the result for that row is `NaN`.
  - For Polars, use `list.join`.
    - If the list's inner dtype is not String (e.g. `list[i64]`), it raises an error.
- To concatenate string columns into a single column:
  - For Pandas, use `str.cat` or `+`.
  - For Polars, use `pl.concat_str` or `+`.
  - Note that `str.join` in Polars does something different: it joins all values *within* one column into a single string.
  


In [2]:
import pandas as pd
import polars as pl

In [None]:
data = {
    'a': [['a', 'b', 'c'], ['x', 'y', 'z']],
    'b': [[1, 2, 3], [4, 5, 6]],
    'c': ['10', '20'],
    'd': ['AAA', 'BBB']
}

In [8]:
df_pd = pd.DataFrame(data)
df_pd

|   | a | b | c | d |
|---|---|---|---|---|
| 0 | [a, b, c] | [1, 2, 3] | 10 | AAA |
| 1 | [x, y, z] | [4, 5, 6] | 20 | BBB |


In [19]:
df_pl = pl.DataFrame(data)
df_pl

| a | b | c | d |
|---|---|---|---|
| list[str] | list[i64] | str | str |
| ["a", "b", "c"] | [1, 2, 3] | "10" | "AAA" |
| ["x", "y", "z"] | [4, 5, 6] | "20" | "BBB" |


### Join

In [16]:
print(df_pd['a'].str.join('-'))
print(df_pd['b'].str.join('-'))

0    a-b-c
1    x-y-z
Name: a, dtype: object
0   NaN
1   NaN
Name: b, dtype: float64
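The `NaN` result comes from the integer elements in column `b`. One possible workaround, sketched here under the assumption that `str()` gives an acceptable string form for each element, is to convert the elements inside each list and then reuse `str.join`:

```python
import pandas as pd

# Same integer lists as column 'b' above
b = pd.Series([[1, 2, 3], [4, 5, 6]], name='b')

# Convert every element to str first, then the vectorized str.join works
joined = b.apply(lambda xs: [str(x) for x in xs]).str.join('-')
print(joined.tolist())  # ['1-2-3', '4-5-6']
```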


In [21]:
print(df_pl.select(pl.col('a').list.join('-')))
try:
    df_pl.select(pl.col('b').list.join('-'))
except Exception as e:
    print(e)

shape: (2, 1)
┌───────┐
│ a     │
│ ---   │
│ str   │
╞═══════╡
│ a-b-c │
│ x-y-z │
└───────┘
`lst.join` operation not supported for dtype `i64` (expected: String)


### Concatenate

In [26]:
print(df_pd['c'] + df_pd['d'])
print(df_pd['c'].str.cat(df_pd['d']))
print(df_pd['c'].str.cat(df_pd['d'], sep="|"))
print(df_pd['c'] + "@@@" + df_pd['d'])

0    10AAA
1    20BBB
dtype: object
0    10AAA
1    20BBB
Name: c, dtype: object
0    10|AAA
1    20|BBB
Name: c, dtype: object
0    10@@@AAA
1    20@@@BBB
dtype: object
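Both `+` and `str.cat` expect string operands, so a numeric column has to be converted first. A minimal sketch, assuming a hypothetical integer column `n` (not part of the data above) and `astype(str)` as the conversion:

```python
import pandas as pd

# Hypothetical frame: 'n' is int64 rather than str
df = pd.DataFrame({'n': [10, 20], 'd': ['AAA', 'BBB']})

# Convert the integer column to str, then concatenate with +
combined = df['n'].astype(str) + '|' + df['d']
print(combined.tolist())  # ['10|AAA', '20|BBB']

# str.cat gives the same result once both sides are strings
combined_cat = df['n'].astype(str).str.cat(df['d'], sep='|')
print(combined_cat.tolist())  # ['10|AAA', '20|BBB']
```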


In [42]:
df_pl.select(
    pl.concat_str([pl.col('c'), pl.col('d')]).alias("concat_str_1"),
    pl.concat_str([pl.col('c'), pl.col('d')], separator="|").alias("concat_str_2"),
    (pl.col('c') + pl.col('d')).alias('add_1'),
    (pl.col('c') + "@@@" + pl.col('d')).alias('add_2'),
)

| concat_str_1 | concat_str_2 | add_1 | add_2 |
|---|---|---|---|
| str | str | str | str |
| "10AAA" | "10\|AAA" | "10AAA" | "10@@@AAA" |
| "20BBB" | "20\|BBB" | "20BBB" | "20@@@BBB" |


In [44]:
df_pl.select(pl.col('c').str.join('@@'))

| c |
|---|
| str |
| "10@@20" |
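For comparison, pandas can produce the same single string: when `str.cat` is called without `others`, it concatenates every value in the Series using `sep`. A short sketch rebuilding column `c` as a standalone Series:

```python
import pandas as pd

s = pd.Series(['10', '20'], name='c')

# No `others` argument: all values in the Series are joined into one str
print(s.str.cat(sep='@@'))  # 10@@20

# Plain Python str.join over the values gives the same string
print('@@'.join(s))  # 10@@20
```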
