In [1]:
import pandas as pd

# Accessing values in a DataFrame
- In this notebook we explore how to access specific values in a DataFrame
- But first, let's load the data for this section

In [2]:
bond = pd.read_csv("../data/jamesbond.csv")
bond.head()

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


## Using the `set_index` and `reset_index` methods
- Before accessing actual values, let's take a look at these two methods
- As we have done before, we can tell `read_csv()` which column becomes the index through the `index_col` parameter
- However, we can also use the `set_index()` method
- Note that the `bond` DataFrame has no index defined yet; it uses the default integer index (0, 1, 2, ...) that pandas assigns automatically
- Let's set the `Film` column as the index as follows:

In [3]:
bond.set_index("Film").head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
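
The two options mentioned above (`index_col` in `read_csv()` versus `set_index()` afterwards) can be compared side by side. This is a minimal sketch: `csv_text` is a tiny inline stand-in for `jamesbond.csv`, with three rows copied from the table above, so the example runs without the file.

```python
import io

import pandas as pd

# Hypothetical inline replacement for ../data/jamesbond.csv (three rows only).
csv_text = """Film,Year,Actor
Dr. No,1962,Sean Connery
From Russia with Love,1963,Sean Connery
Goldfinger,1964,Sean Connery
"""

# Option 1: choose the index while reading the file.
via_read_csv = pd.read_csv(io.StringIO(csv_text), index_col="Film")

# Option 2: read first, then call set_index(), which returns a new DataFrame.
via_set_index = pd.read_csv(io.StringIO(csv_text)).set_index("Film")

print(via_read_csv.index.name)    # Film
print(list(via_set_index.index))  # ['Dr. No', 'From Russia with Love', 'Goldfinger']
```

Both routes should produce the same index; which one to use is mostly a matter of whether the index column is known at load time.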


- `set_index()` returns a new DataFrame by default; to modify `bond` itself we need (as usual) `inplace=True`

In [4]:
bond.set_index("Film", inplace=True)
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


- Now, if we want to undo this operation, we can use the `reset_index()` method

In [5]:
bond.reset_index().head()

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


- And if we pass `drop=True`, `Film` does not become a column again; it is discarded:

In [6]:
bond.reset_index(drop=True).head()

,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,1967,David Niven,Ken Hughes,315.0,85.0,


In [7]:
bond.reset_index(inplace=True)
bond.head()

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


- An interesting note: if a DataFrame already has an index set and we set another one, we lose the original index. For example:

In [8]:
bond.set_index("Film", inplace=True)
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [9]:
bond.set_index("Year", inplace=True)
bond.head()

Year,Actor,Director,Box Office,Budget,Bond Actor Salary
1962,Sean Connery,Terence Young,448.8,7.0,0.6
1963,Sean Connery,Terence Young,543.8,12.6,1.6
1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
1965,Sean Connery,Terence Young,848.1,41.9,4.7
1967,David Niven,Ken Hughes,315.0,85.0,


- A simple way to avoid this is to call `reset_index()` before changing the index, so the old index becomes a regular column again
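
The difference between replacing the index directly and resetting it first can be sketched as follows. The `films` DataFrame is a small hypothetical sample built from rows shown above, not the full dataset. The last line uses the `append=True` parameter of `set_index()`, which keeps the old index as an extra level instead of discarding it.

```python
import pandas as pd

# Small hypothetical sample, already indexed by "Film".
films = pd.DataFrame(
    {
        "Film": ["Dr. No", "Goldfinger", "Thunderball"],
        "Year": [1962, 1964, 1965],
        "Actor": ["Sean Connery", "Sean Connery", "Sean Connery"],
    }
).set_index("Film")

# Setting a new index directly discards the old one: "Film" is gone.
lost = films.set_index("Year")
print(list(lost.columns))  # ['Actor']

# Resetting first turns "Film" back into a column, so nothing is lost.
kept = films.reset_index().set_index("Year")
print(list(kept.columns))  # ['Film', 'Actor']

# Alternatively, append=True keeps "Film" as a level of a MultiIndex.
both = films.set_index("Year", append=True)
print(list(both.index.names))  # ['Film', 'Year']
```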

## Accessing rows by index label using `.loc[]`
- Let's learn how to access data through the index labels
- But first, let's reload the data, since it was modified in the previous section

In [12]:
bond = pd.read_csv("../data/jamesbond.csv")
bond.set_index("Film", inplace=True)
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


- **Pro tip**: before looking things up in a DataFrame, it is worth sorting it by its index, because lookups on a sorted index are generally faster for pandas

In [13]:
bond.sort_index(inplace=True)
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
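
One way to check whether an index is already sorted is the `is_monotonic_increasing` attribute of the index. The sketch below uses a small hypothetical `films` DataFrame with three titles from the dataset.

```python
import pandas as pd

# Hypothetical sample with the index deliberately out of order.
films = pd.DataFrame(
    {"Year": [1962, 1985, 1964]},
    index=["Dr. No", "A View to a Kill", "Goldfinger"],
)
print(films.index.is_monotonic_increasing)  # False

# sort_index() returns a sorted copy (or sorts in place with inplace=True).
films = films.sort_index()
print(films.index.is_monotonic_increasing)  # True
print(list(films.index))  # ['A View to a Kill', 'Dr. No', 'Goldfinger']
```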


- As you may remember, for a Series we can retrieve an entry (row) by its label with `serie[label]`
- The idea here is exactly the same, but we need to use `.loc[label]`
- This returns one row of the DataFrame as a Series in which each column name becomes an index label:

In [14]:
bond.loc["Diamonds Are Forever"]

Year                         1971
Actor                Sean Connery
Director             Guy Hamilton
Box Office                  442.5
Budget                       34.7
Bond Actor Salary             5.8
Name: Diamonds Are Forever, dtype: object

- Of course, if we look up a label that does not exist, we get a `KeyError`
- If we access a label that is duplicated (remember, this is possible), we get a DataFrame back:

In [15]:
bond.loc["Casino Royale"]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
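
The three outcomes described above (a Series, a DataFrame, or a `KeyError`) can be sketched with a small hypothetical `films` DataFrame. The label `"Not a Bond Film"` is made up for illustration and is not part of the dataset.

```python
import pandas as pd

# Hypothetical sample that contains one duplicated label ("Casino Royale").
films = pd.DataFrame(
    {
        "Year": [1985, 2006, 1967, 1971],
        "Actor": ["Roger Moore", "Daniel Craig", "David Niven", "Sean Connery"],
    },
    index=["A View to a Kill", "Casino Royale", "Casino Royale", "Diamonds Are Forever"],
)

unique_row = films.loc["Diamonds Are Forever"]  # unique label -> Series
duplicated_rows = films.loc["Casino Royale"]    # duplicated label -> DataFrame

print(type(unique_row).__name__)       # Series
print(type(duplicated_rows).__name__)  # DataFrame

# A label that is not in the index raises KeyError.
try:
    films.loc["Not a Bond Film"]
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)  # True
```

Because the return type depends on the data, code that needs a consistent type may prefer passing a one-element list, such as `films.loc[["Casino Royale"]]`, which returns a DataFrame.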


- Just as with a list, we can use slices inside `.loc[]`
    - As with Series, and unlike list slices, the interval includes both endpoints

In [16]:
bond.loc["A View to a Kill":"Diamonds Are Forever"]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8


In [17]:
bond.loc["A View to a Kill":"Licence to Kill":2]  # step of 2: take every second row

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9


- We can also pass a list of index labels:
    - Every label must exist; otherwise we get a `KeyError`

In [18]:
bond.loc[["A View to a Kill", "Licence to Kill", "Diamonds Are Forever"]]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
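
When some of the requested labels may be absent, one possible workaround is to filter with a boolean mask built by `Index.isin()` instead of passing the list straight to `.loc[]`. This is only a sketch; `films` is a small hypothetical sample and `"Not a Bond Film"` is a made-up label.

```python
import pandas as pd

films = pd.DataFrame(
    {"Year": [1985, 1971, 1989]},
    index=["A View to a Kill", "Diamonds Are Forever", "Licence to Kill"],
)

# One of these labels does not exist in the index.
wanted = ["A View to a Kill", "Not a Bond Film", "Licence to Kill"]

# films.loc[wanted] would raise KeyError because of the missing label.
# A boolean mask keeps only the rows whose label is in the list.
found = films.loc[films.index.isin(wanted)]
print(list(found.index))  # ['A View to a Kill', 'Licence to Kill']
```

Note that the mask returns rows in the order they appear in the DataFrame, not in the order of the list.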


## Accessing rows by position using `.iloc[]`
- As with Series, the idea is to access rows according to their integer position
- It is almost the same as `.loc[]`, but here we pass a position instead of a label
- The first row is `A View to a Kill`, which is at position zero. We can access it like this:

In [20]:
bond.iloc[0]

Year                        1985
Actor                Roger Moore
Director               John Glen
Box Office                 275.2
Budget                      54.5
Bond Actor Salary            9.1
Name: A View to a Kill, dtype: object

In [21]:
bond.iloc[20]

Year                         1974
Actor                 Roger Moore
Director             Guy Hamilton
Box Office                  334.0
Budget                       27.7
Bond Actor Salary             NaN
Name: The Man with the Golden Gun, dtype: object

- Unlike `.loc[]`, which raises a `KeyError` for a missing label, `.iloc[]` raises an `IndexError` if we pass a position that does not exist
- We can also use slices freely
    - In this case, the last element (here, 4) is not included; the interval is half-open, as with list slices.

In [22]:
bond.iloc[0:4]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
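
The different endpoint rules of `.loc[]` and `.iloc[]`, and the out-of-range behavior of `.iloc[]`, can be seen in a short sketch. The `films` DataFrame is a small hypothetical sample with four titles from the dataset.

```python
import pandas as pd

films = pd.DataFrame(
    {"Year": [1985, 1971, 1962, 1964]},
    index=["A View to a Kill", "Diamonds Are Forever", "Dr. No", "Goldfinger"],
)

# Label slice: the end label is included, so this selects 3 rows.
by_label = films.loc["A View to a Kill":"Dr. No"]
# Position slice: the end position is excluded, so this selects 2 rows.
by_position = films.iloc[0:2]
print(len(by_label), len(by_position))  # 3 2

# A single out-of-range position raises IndexError...
try:
    films.iloc[10]
    out_of_range_raised = False
except IndexError:
    out_of_range_raised = True

# ...but an out-of-range slice is simply truncated, like a Python list slice.
truncated = films.iloc[2:10]
print(out_of_range_raised, len(truncated))  # True 2
```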


- We can also pass a list of positions

In [23]:
bond.iloc[[1, 4, 6, 10]]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9


## Specifying the column with `.loc[]` and `.iloc[]`
- So far we have retrieved a whole row or several rows
- But it is possible to pass a specific column as well, which gives us a single value
- To do that, we just pass a second argument to the indexer, which is the column

In [24]:
bond.loc["A View to a Kill", "Actor"]

'Roger Moore'

- We can also pass a list of columns
    - In this case, we get a Series back

In [25]:
bond.loc["A View to a Kill", ["Actor", "Bond Actor Salary"]]

Actor                Roger Moore
Bond Actor Salary            9.1
Name: A View to a Kill, dtype: object

- As you might expect, it is also possible to pass a list for both arguments
    - In this case, we get a DataFrame back

In [26]:
bond.loc[["A View to a Kill", "Licence to Kill"], ["Actor", "Bond Actor Salary"]]

Film,Actor,Bond Actor Salary
A View to a Kill,Roger Moore,9.1
Licence to Kill,Timothy Dalton,7.9
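
The three return types seen so far (single value, Series, DataFrame) depend only on whether each argument is a single label or a list. A compact sketch, using a small hypothetical `films` DataFrame with two rows from the dataset:

```python
import pandas as pd

films = pd.DataFrame(
    {
        "Actor": ["Roger Moore", "Timothy Dalton"],
        "Bond Actor Salary": [9.1, 7.9],
    },
    index=["A View to a Kill", "Licence to Kill"],
)

cols = ["Actor", "Bond Actor Salary"]
rows = ["A View to a Kill", "Licence to Kill"]

scalar = films.loc["A View to a Kill", "Actor"]  # label, label -> single value
series = films.loc["A View to a Kill", cols]     # label, list  -> Series
frame = films.loc[rows, cols]                    # list,  list  -> DataFrame

for result in (scalar, series, frame):
    print(type(result).__name__)
```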


- We can also use slices in either argument (although it gets a bit harder to read)

In [27]:
bond.loc["A View to a Kill":"Licence to Kill", "Year":"Director"]

Film,Year,Actor,Director
A View to a Kill,1985,Roger Moore,John Glen
Casino Royale,2006,Daniel Craig,Martin Campbell
Casino Royale,1967,David Niven,Ken Hughes
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton
Die Another Day,2002,Pierce Brosnan,Lee Tamahori
Dr. No,1962,Sean Connery,Terence Young
For Your Eyes Only,1981,Roger Moore,John Glen
From Russia with Love,1963,Sean Connery,Terence Young
GoldenEye,1995,Pierce Brosnan,Martin Campbell
Goldfinger,1964,Sean Connery,Guy Hamilton


- `.iloc[]` works similarly; the only catch is that we must pass integers only
- So we need to pass the position of the column. `Year`, for example, is at position `0`
- Everything else works the same way (lists, slices, etc.)

In [28]:
bond.iloc[5, 0]

1962

In [29]:
bond.iloc[5, 0:3]

Year                 1962
Actor        Sean Connery
Director    Terence Young
Name: Dr. No, dtype: object

In [30]:
bond.iloc[[5, 10, 12], 0:3]

Film,Year,Actor,Director
Dr. No,1962,Sean Connery,Terence Young
Licence to Kill,1989,Timothy Dalton,John Glen
Moonraker,1979,Roger Moore,Lewis Gilbert
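
Counting column positions by hand is error-prone. One option is to look up the position with `columns.get_loc()` and pass the result to `.iloc[]`. The sketch below uses a small hypothetical `films` DataFrame with two rows from the dataset.

```python
import pandas as pd

films = pd.DataFrame(
    {
        "Year": [1962, 1989],
        "Actor": ["Sean Connery", "Timothy Dalton"],
        "Director": ["Terence Young", "John Glen"],
    },
    index=["Dr. No", "Licence to Kill"],
)

# Translate a column name into its integer position.
director_pos = films.columns.get_loc("Director")
print(director_pos)                 # 2
print(films.iloc[0, director_pos])  # Terence Young
```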


## Changing values in the DataFrame
- We can use `.loc[]` and/or `.iloc[]` to change specific values in a DataFrame
- It works like assigning to a position in a list

In [31]:
bond.loc["A View to a Kill", "Year"]

1985

- If we want to change the year:

In [32]:
bond.loc["A View to a Kill", "Year"] = 2022
bond.loc["A View to a Kill", "Year"]

2022

- It works the same way with `.iloc[]`:

In [33]:
bond.iloc[0, 1]

'Roger Moore'

In [34]:
bond.iloc[0, 1] = "Andre Pacheco"
bond.iloc[0, 1]

'Andre Pacheco'

- We can change several values at once by passing lists:

In [35]:
bond.iloc[0, [2, 3, 4]]

Director      John Glen
Box Office        275.2
Budget             54.5
Name: A View to a Kill, dtype: object

In [36]:
bond.iloc[0, [2, 3, 4]] = ["Zezinho", 200, 180]
bond.iloc[0, [2, 3, 4]]

Director      Zezinho
Box Office      200.0
Budget          180.0
Name: A View to a Kill, dtype: object

- And these changes are applied directly to the DataFrame

In [37]:
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,2022,Andre Pacheco,Zezinho,200.0,180.0,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


- Now suppose we want to replace every occurrence of `Sean Connery` in the `Actor` column with `Joaozinho`
- In this case we create a boolean mask, check that it filters the right rows, and then use `.loc[]` with the mask

In [38]:
is_sean_connery = bond["Actor"] == "Sean Connery"
bond[is_sean_connery]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Never Say Never Again,1983,Sean Connery,Irvin Kershner,380.0,86.0,
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4


- We can use `.loc[]` to get the same output:

In [40]:
bond.loc[is_sean_connery]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Never Say Never Again,1983,Sean Connery,Irvin Kershner,380.0,86.0,
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4


- Now we just need to make the change

In [41]:
bond.loc[is_sean_connery, "Actor"] = "Joaozinho"
bond.loc[is_sean_connery]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Diamonds Are Forever,1971,Joaozinho,Guy Hamilton,442.5,34.7,5.8
Dr. No,1962,Joaozinho,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Joaozinho,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Joaozinho,Guy Hamilton,820.4,18.6,3.2
Never Say Never Again,1983,Joaozinho,Irvin Kershner,380.0,86.0,
Thunderball,1965,Joaozinho,Terence Young,848.1,41.9,4.7
You Only Live Twice,1967,Joaozinho,Lewis Gilbert,514.2,59.9,4.4
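
The mask-based replacement can be reproduced on a small, self-contained example. The `films` DataFrame below is a hypothetical three-row sample. Note that the mask was computed before the assignment, which is why it still selects the same rows afterwards even though they no longer contain `Sean Connery`.

```python
import pandas as pd

films = pd.DataFrame(
    {
        "Year": [1971, 1985, 1962],
        "Actor": ["Sean Connery", "Roger Moore", "Sean Connery"],
    },
    index=["Diamonds Are Forever", "A View to a Kill", "Dr. No"],
)

# Boolean Series: True where the actor is Sean Connery.
is_sean_connery = films["Actor"] == "Sean Connery"
print(is_sean_connery.sum())  # 2 rows match

# Rows come from the mask, the column from the second argument.
films.loc[is_sean_connery, "Actor"] = "Joaozinho"
print(films["Actor"].tolist())  # ['Joaozinho', 'Roger Moore', 'Joaozinho']

# The stored mask still points at the rows that were changed.
print(list(films.loc[is_sean_connery].index))  # ['Diamonds Are Forever', 'Dr. No']
```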
