# Selecting columns using `[]`

In [1]:
import polars as pl

In [2]:
csv_file = "data/titanic.csv"

In [3]:
df = pl.read_csv(csv_file)
df.head(3)

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
1,0,3,"""Braund, Mr. Owen Harris""","""male""",22.0,1,0,"""A/5 21171""",7.25,,"""S"""
2,1,1,"""Cumings, Mrs. John Bradley (Fl…","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C"""
3,1,3,"""Heikkinen, Miss. Laina""","""female""",26.0,0,0,"""STON/O2. 3101282""",7.925,,"""S"""


## Choosing columns with square brackets

In [4]:
df["Age"].head(3)

Age
f64
22.0
38.0
26.0


The output is a `Series`.

In [5]:
type(df["Age"])

polars.series.series.Series

Pass a list of strings in `[]` to choose several columns. The output is a `DataFrame`.

In [6]:
df[["Survived", "Age"]].head(3)

Survived,Age
i64,f64
0,22.0
1,38.0
1,26.0


In [7]:
type(df[["Survived", "Age"]])

polars.dataframe.frame.DataFrame

## Choosing rows and columns with `[]`

To choose rows and columns together, use the form `df[row, column]`.

In [8]:
df[0, "Age"]

22.0

We can also pass a list for both the rows and the columns.

In [9]:
df[[0,1], ["Age", "Fare"]]

Age,Fare
f64,f64
22.0,7.25
38.0,71.2833


The basic rules are:
- if we pass just numeric values, we select rows
- if we pass just strings, we select columns
- if we pass both, as in `df[numeric, string]`, we select rows and columns

### Numeric indexing

Columns can also be chosen by position. The slice `1:6` selects the columns at positions 1 to 5.

In [10]:
df[:, 1:6].head(3)

Survived,Pclass,Name,Sex,Age
i64,i64,str,str,f64
0,3,"""Braund, Mr. Owen Harris""","""male""",22.0
1,1,"""Cumings, Mrs. John Bradley (Fl…","""female""",38.0
1,3,"""Heikkinen, Miss. Laina""","""female""",26.0


### Slice

We can slice the rows by position and the columns by name in the same call.

In [12]:
df[:2, "Survived": "Age"]

Survived,Pclass,Name,Sex,Age
i64,i64,str,str,f64
0,3,"""Braund, Mr. Owen Harris""","""male""",22.0
1,1,"""Cumings, Mrs. John Bradley (Fl…","""female""",38.0


## Creating a column with `[]`?

We can't create a column with `[]`.

To create a column we use the `with_columns` method.

# Exercises

### Exercise 1

Choose the `Name` column as a `Series`

In [14]:
df["Name"].head(2)

Name
str
"""Braund, Mr. Owen Harris"""
"""Cumings, Mrs. John Bradley (Fl…"


Choose the `Name` and `Fare` columns

In [15]:
df[["Name", "Fare"]].head(2)

Name,Fare
str,f64
"""Braund, Mr. Owen Harris""",7.25
"""Cumings, Mrs. John Bradley (Fl…",71.2833


Choose all columns from `Name` to `Fare`

In [16]:
df[:, "Name":"Fare"]

Name,Sex,Age,SibSp,Parch,Ticket,Fare
str,str,f64,i64,i64,str,f64
"""Braund, Mr. Owen Harris""","""male""",22.0,1,0,"""A/5 21171""",7.25
"""Cumings, Mrs. John Bradley (Fl…","""female""",38.0,1,0,"""PC 17599""",71.2833
"""Heikkinen, Miss. Laina""","""female""",26.0,0,0,"""STON/O2. 3101282""",7.925
"""Futrelle, Mrs. Jacques Heath (…","""female""",35.0,1,0,"""113803""",53.1
"""Allen, Mr. William Henry""","""male""",35.0,0,0,"""373450""",8.05
…,…,…,…,…,…,…
"""Montvila, Rev. Juozas""","""male""",27.0,0,0,"""211536""",13.0
"""Graham, Miss. Margaret Edith""","""female""",19.0,0,0,"""112053""",30.0
"""Johnston, Miss. Catherine Hele…","""female""",,1,2,"""W./C. 6607""",23.45
"""Behr, Mr. Karl Howell""","""male""",26.0,0,0,"""111369""",30.0


### Exercise 2

Choose the first 3 rows from the `Name` column as a `Series`

In [17]:
df[:3, "Name"]

Name
str
"""Braund, Mr. Owen Harris"""
"""Cumings, Mrs. John Bradley (Fl…"
"""Heikkinen, Miss. Laina"""


Choose the second and third rows of all columns from `Name` to `Fare`

In [18]:
df[1:3, "Name": "Fare"]

Name,Sex,Age,SibSp,Parch,Ticket,Fare
str,str,f64,i64,i64,str,f64
"""Cumings, Mrs. John Bradley (Fl…","""female""",38.0,1,0,"""PC 17599""",71.2833
"""Heikkinen, Miss. Laina""","""female""",26.0,0,0,"""STON/O2. 3101282""",7.925
