# Selecting columns 3: selecting multiple columns
By the end of this section you will be able to:
- select all columns from a `DataFrame`
- exclude columns from a select on a `DataFrame` 
- select columns based on a regex
- select columns based on dtype


In [None]:
import polars as pl

In [None]:
csvFile = "../data/titanic.csv"

In [None]:
df = pl.read_csv(csvFile)
df.head(3)

## Selecting all columns from a `DataFrame`

We can select all columns by using `pl.all()` in place of `pl.col`.

In [None]:
df.select(
    pl.all()
).head(3)

We can select every column except one (or several) with `pl.exclude`, which accepts a single column name or a list of names.

In [None]:
df.select(
    pl.exclude(['PassengerId','Survived','Pclass'])
).head(3)

## Selecting columns with a regex
We can select columns with a regex. Polars treats a string as a regex only if it starts with `^` and ends with `$`; otherwise the string is read as a literal column name.

The following regex looks for columns starting with `P` and uses the regex *wildcard* `.*` so that `P` can be followed by any characters.

In [None]:
df.select(
    "^P.*$"
).head(3)

We can pass this regex to `pl.col` to apply transformations to the matching columns. In this example we take the `max` of each column, which gives a single row of output.

In [None]:
df.select(
    pl.col("^P.*$").max()
).head(3)

## Selecting columns based on dtype
We can select all of the columns that have a particular dtype by passing the dtype to `pl.col`.

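Here we select all the string columns with `pl.Utf8`, the string dtype (called `pl.String` in newer Polars releases). First we look at the summary statistics with `describe`, then we select the string columns.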

In [None]:
df.describe()

In [None]:
(
    df
    .select(
        pl.col(pl.Utf8)
    )
    .head(3)
)

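We can also pass a list of dtypes to `pl.col`. In this case we select the 64-bit integer and 64-bit floating point columns. Columns with other numeric dtypes, such as `pl.Int32`, are not matched by this list.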

In [None]:
(
    df
    .select(
        pl.col([pl.Int64,pl.Float64])
    )
    .head(3)
)

## Exercises

In the exercises you will develop your understanding of:
- selecting all columns from a `DataFrame`
- excluding columns from a selection
- selecting columns with a regex
- selecting columns with a dtype

### Exercise 1

Select all columns from the `DataFrame` and sort each column

In [None]:
df = pl.read_csv(csvFile)
df.<blank>.head(3)

### Exercise 2
Select all columns from the `DataFrame` with the exception of the `PassengerId` column

In [None]:
df = pl.read_csv(csvFile)
df.<blank>.head(3)

### Exercise 3
Select all columns from the `DataFrame` that start with `S` or `N`

In [None]:
df = pl.read_csv(csvFile)
df.<blank>

### Exercise 4
Select all the columns with 64-bit floating point dtype

Hint: the 64-bit floating point dtype is `pl.Float64`

In [None]:
df = pl.read_csv(csvFile)
df.<blank>

### Exercise 5
Convert the following Pandas code to Polars.

The Pandas code loops over the columns. In Polars we avoid looping over columns and use the Expression API instead.

In the loop we create a dictionary `maxDict` that maps the name of each `float64` column to its maximum value. We then build a `DataFrame` from the dictionary.

In [None]:
import pandas as pd
import numpy as np
df = pl.read_csv(csvFile)
dfPandas = df.to_pandas()

# Convert this code below to Polars in the following cell
maxDict = {}
for col in dfPandas.columns:
    if dfPandas[col].dtype == np.float64:
        maxDict[col] = [dfPandas[col].max()]
pd.DataFrame(maxDict)

In [None]:
(
    pl.read_csv(csvFile)
    <blank>
)

## Solutions

### Solution to Exercise 1
Select all columns from the `DataFrame` and sort each column

In [None]:
(
    pl.read_csv(csvFile)
    .select(
        pl.all().sort()
    )
    .head(3)
)

### Solution to Exercise 2
Select all columns from the `DataFrame` with the exception of the `PassengerId` column

In [None]:
df = pl.read_csv(csvFile)
(
    df
    .select(
        pl.all().exclude('PassengerId')
    )
    .head(3)
)

### Solution to Exercise 3
Select all columns from the `DataFrame` that start with `S` or `N`

In [None]:
df = pl.read_csv(csvFile)
(
    df
    .select(
        pl.col("^(S|N).*$")
    )
    .head(3)
)

### Solution to Exercise 4
Select all the columns with 64-bit floating point dtype

In [None]:
df = pl.read_csv(csvFile)
(
    df
    .select(
        pl.col(pl.Float64)
    )
)

### Solution to Exercise 5
Convert the following Pandas code to Polars
```python
import pandas as pd
import numpy as np
df = pl.read_csv(csvFile)
dfPandas = df.to_pandas()

# Convert this code below to Polars in the following cell
maxDict = {}
for col in dfPandas.columns:
    if dfPandas[col].dtype == np.float64:
        maxDict[col] = [dfPandas[col].max()]
pd.DataFrame(maxDict)
```

In [None]:
(
    pl.read_csv(csvFile)
    .select(
        pl.col(pl.Float64).max()
    )
)