## Selecting columns 5: Transforming and adding multiple columns
By the end of this lesson you will be able to:
- transform multiple columns in-place
- add multiple columns
- transform and add multiple columns in less verbose ways

In [1]:
import polars as pl
import polars.selectors as cs

In [2]:
csv_file = "../../Files/Sample_Superstore.csv"

In [3]:
df = pl.read_csv(csv_file)


In [4]:
df.head(3)

Row_ID,Order_ID,Order_Date,Ship_Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
1,,,"""11-11-2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-BO-10001798""","""Furniture""","""Bookcases""","""Bush Somerset Collection Bookc…",261.96,2,0.0,41.9136
2,"""CA-2016-152156""","""08-11-2016""","""11-11-2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10000454""","""Furniture""","""Chairs""","""Hon Deluxe Fabric Upholstered …",731.94,3,0.0,219.582
3,"""CA-2016-138688""","""12-06-2016""",,,"""DV-13045""","""Darrin Van Huff""","""Corporate""",,"""Los Angeles""","""California""",90036,"""West""","""OFF-LA-10000240""","""Office Supplies""","""Labels""","""Self-Adhesive Address Labels f…",14.62,2,0.0,6.8714


## Transforming existing columns

We can transform multiple existing columns by either passing a `list` of expressions to `with_columns` or comma-separated expressions.

Here we pass comma-separated expressions to round the floating-point columns to 0 decimal places.

In [6]:
# This code only modifies the data in the existing columns; it does not create any new columns
(
    pl.read_csv(csv_file)
    .select("Profit","Discount")
    .with_columns(
        pl.col('Profit').round(0),
        pl.col('Discount').round(0),
    )
    .head(3)
)

# This code works just as well: because of .alias(), the results are added as new columns
(
    pl.read_csv(csv_file)
    .select("Profit","Discount")
    .with_columns(
        pl.col('Profit').round(0).alias("kaishi"),
        pl.col('Discount').round(0).alias("kaishishikdshaifoansfsahi"),
    )
    .head(3)
)

Profit,Discount,kaishi,kaishishikdshaifoansfsahi
f64,f64,f64,f64
41.9136,0.0,42.0,0.0
219.582,0.0,220.0,0.0
6.8714,0.0,7.0,0.0


We can make this less verbose, however.

As we are applying the same transformation to the `Profit` and `Discount` columns, we can pass both names to a single `pl.col` as comma-separated column names.

In [7]:
(
    pl.read_csv(csv_file)
    .select("Profit","Discount")
    .with_columns(
        pl.col('Profit','Discount').round(0),
    )
    .head(5)
)

Profit,Discount
f64,f64
42.0,0.0
220.0,0.0
7.0,0.0
-383.0,0.0
3.0,0.0


In this example `Sales`, `Profit` and `Discount` are the only float columns, and all three have the `Float64` dtype. This means that we can instead pass that dtype to `pl.col` to apply the `round` expression to all of them.


In [8]:
(
    pl.read_csv(csv_file)
    .select("Profit","Discount","Sales")
    .with_columns(
        pl.col(pl.Float64).round(0),
    )
    .head(3)
)

Profit,Discount,Sales
f64,f64,f64
42.0,0.0,262.0
220.0,0.0,732.0
7.0,0.0,15.0


Or we can use selectors (imported above as `cs`) to select the columns that we want to round.

In [9]:
(
    pl.read_csv(csv_file)
    .select("Profit","Discount","Sales")
    .with_columns(
        cs.float().round(0),
    )
    .head(3)
)

Profit,Discount,Sales
f64,f64,f64
42.0,0.0,262.0
220.0,0.0,732.0
7.0,0.0,15.0


## Adding new columns from existing columns
Above, the `with_columns` calls overwrite the existing `Profit` and `Discount` columns.

We can instead create new columns from existing columns with `alias`.

In this example we add the rounded `Profit` and `Discount` as new columns.


In [14]:
(
    pl.read_csv(csv_file)
    .with_columns(
        pl.col('Profit').round(0).alias('Profit_round'),
        pl.col('Discount').round(0).alias('Discount_round')
    )
    .select(
        'Profit', 'Profit_round', 'Discount', 'Discount_round',
    )
    .head(3)
)

Profit,Profit_round,Discount,Discount_round
f64,f64,f64,f64
41.9136,42.0,0.0,0.0
219.582,220.0,0.0,0.0
6.8714,7.0,0.0,0.0


As an alternative to `alias`, we can use comma-separated keyword assignments.

In [10]:
(
    pl.read_csv(csv_file)
    .with_columns(
        Profit_round = pl.col('Profit').round(0),
        Discount_round = pl.col('Discount').round(0),
    )
    .select(
        'Profit', 'Profit_round', 'Discount', 'Discount_round',
    )
    .head(3)
)

Profit,Profit_round,Discount,Discount_round
f64,f64,f64,f64
41.9136,42.0,0.0,0.0
219.582,220.0,0.0,0.0
6.8714,7.0,0.0,0.0


Note that if you mix the `alias` and keyword assignment approaches in the same `with_columns`, the keyword assignments must come after the `alias` expressions, because Python requires positional arguments to precede keyword arguments.

When should you use `alias` and when should you use the keyword approach?
- There is no performance difference between the `alias` and keyword approach
- You might find the keyword approach more readable in some cases
- You can pass a Python variable as the name to `alias`, but a keyword assignment needs the name written out literally

## Creating new columns when working with multiple expressions
We can still use the less verbose multi-expression approaches we saw above when we want to create new columns.

In this example we add rounded copies of the float columns as new columns by appending the `_round` suffix to each name with `name.suffix`.


In [21]:
(
    pl.read_csv(csv_file)
    .with_columns(
        pl.col(pl.Float64).round(0).name.suffix("_round"),
    )
    .select(
        'Profit','Profit_round','Discount','Discount_round',
    )
    .head(3)
)

Profit,Profit_round,Discount,Discount_round
f64,f64,f64,f64
41.9136,42.0,0.0,0.0
219.582,220.0,0.0,0.0
6.8714,7.0,0.0,0.0
