# Selecting columns 1: using `[]`
By the end of this lecture you will be able to:
- select a column or columns with `[]` indexing
- select rows and columns with `[]` indexing

In [1]:
import polars as pl

In [2]:
csv_file = "../data/titanic.csv"

In [3]:
df = pl.read_csv(csv_file)
df.head(3)

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
1,0,3,"""Braund, Mr. Owen Harris""","""male""",22.0,1,0,"""A/5 21171""",7.25,,"""S"""
2,1,1,"""Cumings, Mrs. John Bradley (Fl…","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C"""
3,1,3,"""Heikkinen, Miss. Laina""","""female""",26.0,0,0,"""STON/O2. 3101282""",7.925,,"""S"""


## Choosing columns with square brackets

We can choose a single column by passing its name as a string in `[]`.

In [4]:
df['Age'].head(3)

Age
f64
22.0
38.0
26.0


The output is a `Series`.

We can choose multiple columns by passing a list of strings in `[]`. The output is a `DataFrame`.

In [5]:
df[['Survived','Age']].head(3)

Survived,Age
i64,f64
0,22.0
1,38.0
1,26.0


## Choosing rows and columns with `[]`
We can choose rows and columns together with `[]` by passing a row selector followed by a column selector.

In [6]:
df[0,"Age"]

22.0

Python interprets values separated by a comma as a `tuple`

In [8]:
0,"Age"

(0, 'Age')

So under the hood, this case with two elements is really passing a `tuple` inside `[]`

In [7]:
df[(0,"Age")]

22.0
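
The tuple behaviour is a feature of Python rather than of Polars. A minimal sketch with a hypothetical class `ShowKey` (not part of Polars) that simply returns whatever key `[]` receives:

```python
class ShowKey:
    """Hypothetical helper: returns the key passed to `[]` unchanged."""

    def __getitem__(self, key):
        return key


show = ShowKey()

single = show["Age"]          # one value arrives as-is
pair = show[0, "Age"]         # two values arrive packed into a tuple
explicit = show[(0, "Age")]   # the same tuple written out explicitly

print(single)
print(pair)
print(explicit)
```

Any object that supports `[]`, including a Polars `DataFrame`, receives its key this way and decides how to interpret it.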

We can still pass lists for either element inside `[]`

In [9]:
df[[0,1],["Age","Fare"]]

Age,Fare
f64,f64
22.0,7.25
38.0,71.2833


The basic rules are:
- if we pass just numeric values we select rows
- if we pass just strings we select columns
- if we pass a tuple like `[numeric, string]` we select rows and columns

### Numeric indexing
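We can select columns by integer position when we pass both a row selector and a column selector. Here `:` selects all rows and `1:6` selects the columns at positions 1 to 5.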

In [10]:
df[:, 1:6].head(2)

Survived,Pclass,Name,Sex,Age
i64,i64,str,str,f64
0,3,"""Braund, Mr. Owen Harris""","""male""",22.0
1,1,"""Cumings, Mrs. John Bradley (Fl…","""female""",38.0


### Slice
We can also choose a range of columns with a `slice` of column names. The slice follows the column order in `df.columns` and includes both the start and end columns.

In [None]:
df[:2, "Survived":"Age"]

## Creating a column with `[]`?
We can't create a column by assigning to `[]`. The line below is commented out because it raises an error.

In [None]:
# df["constant"] = 3

To create a column we use the `with_columns` method, which we will meet later in this section.

In [14]:
(
    df
    .with_columns(
        pl.lit(3).alias('constant')
    )
    .head()
)

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,constant
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str,i32
1,0,3,"""Braund, Mr. Owen Harris""","""male""",22.0,1,0,"""A/5 21171""",7.25,,"""S""",3
2,1,1,"""Cumings, Mrs. John Bradley (Fl…","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C""",3
3,1,3,"""Heikkinen, Miss. Laina""","""female""",26.0,0,0,"""STON/O2. 3101282""",7.925,,"""S""",3
4,1,1,"""Futrelle, Mrs. Jacques Heath (…","""female""",35.0,1,0,"""113803""",53.1,"""C123""","""S""",3
5,0,3,"""Allen, Mr. William Henry""","""male""",35.0,0,0,"""373450""",8.05,,"""S""",3


* `pl.lit()` is a function used to represent a constant (literal) value


# Exercises

In the exercises you will develop your understanding of:
- selecting columns using `[]`
- selecting rows and columns using `[]`


### Exercise 1

Choose the `Name` column as a `Series`

In [None]:
df = pl.read_csv(csv_file)
df<blank>.head(3)

Choose the `Name` and `Fare` columns

In [None]:
df = pl.read_csv(csv_file)
df<blank>.head(3)

Choose all columns from `Name` to `Fare`

In [None]:
df = pl.read_csv(csv_file)
df<blank>.head(3)

### Exercise 2
Choose the first 3 rows from the `Name` column as a `Series`

In [None]:
df = pl.read_csv(csv_file)
df<blank>

Choose the second and third rows of all columns from `Name` to `Fare`

In [None]:
df = pl.read_csv(csv_file)
df<blank>

## Solutions

### Solution to Exercise 1
Choose the `Name` column as a `Series`

In [None]:
df = pl.read_csv(csv_file)
df["Name"].head(3)


Choose the `Name` and `Fare` columns

In [None]:
df = pl.read_csv(csv_file)
df[["Name","Fare"]].head(3)

Choose all columns from `Name` to `Fare`

In [None]:
df = pl.read_csv(csv_file)
df[:, "Name":"Fare"].head(3)


### Solution to Exercise 2
Choose the first 3 rows from the column `Name` as a `Series`

In [None]:
df = pl.read_csv(csv_file)
df[:3,"Name"]


Choose the second and third rows of all columns from `Name` to `Fare`

In [None]:
df = pl.read_csv(csv_file)
df[1:3,"Name":"Fare"]
