## Selecting columns 4: Transforming and adding a column
By the end of this lecture you will be able to:
- transform an existing column in place using `with_column`
- add a new column with an expression
- add a new column with column arithmetic
- add a column with constant values using `pl.lit`

In [None]:
import polars as pl

In [None]:
csvFile = "../data/titanic.csv"

In [None]:
df = pl.read_csv(csvFile)
df.head(3)

## Transforming an existing column

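We can transform an existing column by passing an expression on that column to `with_column`. Because the expression keeps the column's name, its output replaces the original column.

In this example we round `Fare` to 0 decimal places.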

In [None]:
df = pl.read_csv(csvFile)
(
    df
    .with_column(
        pl.col("Fare").round(0)
    )
    .head(3)
)

## Adding a new column from an existing column
We can create a new column from an existing column by giving the output of the expression a new name with `alias`.

In [None]:
df = pl.read_csv(csvFile)
(
    df
    .with_column(
        pl.col('Fare').round(0).alias('roundFare')
    )
    .head(3)
)

## Difference between `with_column` and `select`
- The `select` method returns only the columns produced by its expressions, whereas `with_column` returns all of the original columns plus any new ones
- `with_column` accepts expressions only, not column names as strings

## Adding or transforming a column with column arithmetic

We can transform columns with arithmetic in an expression.

In [None]:
df = pl.read_csv(csvFile)
(
    df
    .with_column(
        (pl.col("Fare") * 2).alias("doubleFare")
    )
    .head(3)
)

We can also combine multiple columns in an expression.

In [None]:
df = pl.read_csv(csvFile)
(
    df
    .with_column(
        (pl.col("Fare") + pl.col("Age")).alias("farePlusAge")
    )
    .head(3)
)

## Adding a new column with a constant value

Use the literal function `pl.lit` to specify a constant value in Polars.

Here we add a new column called `Aboard` with the value `yes` for all passengers.

In [None]:
df = pl.read_csv(csvFile)
(
    df
    .with_column(
        pl.lit('yes').alias('Aboard')
    )
    .select(['Name','Aboard'])
    .head(2)
)

# Exercises

In the exercises you will develop your understanding of:
- transforming an existing column
- adding a new column from existing columns
- adding a new column with a constant value

## Exercise 1: Add a new column for family size

Add a new column called `familySize` which is the sum of the number of siblings and spouses (`SibSp` column), the number of parents or children (`Parch` column) plus one for the passenger themself.

Print out the first 3 rows.

Hint: Add the two columns inside `()` and then apply `.alias`

In [None]:
df = pl.read_csv(csvFile)
(
    df
    <blank>
)

## Exercise 2: Create a decades column
Add a new column called `decade` that rounds each passenger's `Age` down to the start of its decade, e.g. 15.2 goes to 10, where 10 is an integer.

Print out the first 3 rows.

Hint: use `cast` to convert the dtype

In [None]:
df = pl.read_csv(csvFile)
(
    df
    <blank>
)

## Exercise 3: Create a new literal column
Add a new binary column called `Aboard` that has the value `1` for all passengers.

Print out the first 3 rows.

In [None]:
df = pl.read_csv(csvFile)
(
    df
    <blank>
)

## Exercise 4: Add a new Boolean column based on an expression

Add a new Boolean column `overThirty` that captures whether a passenger's age is 30 years or older.

In [None]:
df = pl.read_csv(csvFile)
(
    df
    <blank>
)

# Solutions

## Solution to exercise 1: Add a new column for family size

In [None]:
df = pl.read_csv(csvFile)
(
    df
    .with_column(
        (
            pl.col('SibSp') + pl.col('Parch') + 1
        ).alias('familySize')
    )
    .head(3)
)

## Solution to exercise 2: Create a decades column

In [None]:
df = pl.read_csv(csvFile)
(
    df
    .with_column(
        # Floor to whole decades, then scale back up so that 15.2 becomes 10
        ((pl.col('Age') / 10).floor() * 10).cast(pl.Int64).alias('decade')
    )
    .select(['Age','decade'])
    .head(3)
)


## Solution to exercise 3: Create a new literal column

In [None]:
df = pl.read_csv(csvFile)
(
    df
    .with_column(
        pl.lit(1).alias('Aboard')
    )
    .head(3)
)

## Solution to exercise 4: Add a new Boolean column based on an expression

In [None]:
df = pl.read_csv(csvFile)
(
    df
    .with_column(
        (pl.col("Age") >= 30).alias("overThirty")
    )
    .head(3)
)