# Iterating over DataFrame rows with iterrows

In [2]:
import pandas as pd

As usual, let's load our dataset:

In [3]:
file = 'kc_house_data.csv'
dataset = pd.read_csv(file)
dataset.head()

|   | id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | ... | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7129300520 | 20141013T000000 | 221900.0 | 3.0 | 1.0 | 1180 | 5650 | 1.0 | 0 | 0 | ... | 7 | 1180 | 0 | 1955 | 0 | 98178 | 47.5112 | -122.257 | 1340 | 5650 |
| 1 | 6414100192 | 20141209T000000 | 538000.0 | 3.0 | 2.25 | 2570 | 7242 | 2.0 | 0 | 0 | ... | 7 | 2170 | 400 | 1951 | 1991 | 98125 | 47.721 | -122.319 | 1690 | 7639 |
| 2 | 5631500400 | 20150225T000000 | 180000.0 | 2.0 | 1.0 | 770 | 10000 | 1.0 | 0 | 0 | ... | 6 | 770 | 0 | 1933 | 0 | 98028 | 47.7379 | -122.233 | 2720 | 8062 |
| 3 | 2487200875 | 20141209T000000 | 604000.0 | 4.0 | 3.0 | 1960 | 5000 | 1.0 | 0 | 0 | ... | 7 | 1050 | 910 | 1965 | 0 | 98136 | 47.5208 | -122.393 | 1360 | 5000 |
| 4 | 1954400510 | 20150218T000000 | 510000.0 | 3.0 | 2.0 | 1680 | 8080 | 1.0 | 0 | 0 | ... | 8 | 1680 | 0 | 1987 | 0 | 98074 | 47.6168 | -122.045 | 1800 | 7503 |


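The `iterrows()` method lets us iterate over the rows of a DataFrame: it returns a generator that yields one `(index, row)` pair per row, where `row` is a pandas `Series`. Calling `next(dataset.iterrows())` therefore returns the first row of our dataset: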

In [6]:
next(dataset.iterrows())

(0, id                    7129300520
 date             20141013T000000
 price                     221900
 bedrooms                       3
 bathrooms                      1
 sqft_living                 1180
 sqft_lot                    5650
 floors                         1
 waterfront                     0
 view                           0
 condition                      3
 grade                          7
 sqft_above                  1180
 sqft_basement                  0
 yr_built                    1955
 yr_renovated                   0
 zipcode                    98178
 lat                      47.5112
 long                    -122.257
 sqft_living15               1340
 sqft_lot15                  5650
 Name: 0, dtype: object)
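
A self-contained sketch of the same call, assuming a small hand-built DataFrame in place of `kc_house_data.csv` (values copied from the first rows shown above, only four columns kept). It checks what `next()` returns: the row label plus a `Series`. Because this frame mixes strings, integers and floats, the row Series gets the common `object` dtype, which is also why the output above ends with `dtype: object`.

```python
import pandas as pd

# Small stand-in for the housing data: values copied from the first
# three rows shown above (only a few columns kept)
df = pd.DataFrame({
    "id": [7129300520, 6414100192, 5631500400],
    "date": ["20141013T000000", "20141209T000000", "20150225T000000"],
    "price": [221900.0, 538000.0, 180000.0],
    "bedrooms": [3.0, 3.0, 2.0],
})

# iterrows() returns a generator; next() pulls only the first (index, row) pair
index, row = next(df.iterrows())

print(index)               # label of the first row
print(type(row).__name__)  # each row is a pandas Series
print(row.name)            # the Series is named after the row label
print(row.dtype)           # mixed column types are combined into one dtype
print(row["price"])        # values are looked up by column name
```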

We can iterate over just a few rows, or over the whole DataFrame, with `iterrows()`:

In [8]:
for index, row in dataset.head(10).iterrows():
    # index is the row label; row is a Series with that row's values
    print(index, row, '\n')

0 id                    7129300520
date             20141013T000000
price                     221900
bedrooms                       3
bathrooms                      1
sqft_living                 1180
sqft_lot                    5650
floors                         1
waterfront                     0
view                           0
condition                      3
grade                          7
sqft_above                  1180
sqft_basement                  0
yr_built                    1955
yr_renovated                   0
zipcode                    98178
lat                      47.5112
long                    -122.257
sqft_living15               1340
sqft_lot15                  5650
Name: 0, dtype: object 

1 id                    6414100192
date             20141209T000000
price                     538000
bedrooms                       3
bathrooms                   2.25
sqft_living                 2570
sqft_lot                    7242
floors                         2
waterfront    
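
The `index` value yielded by `iterrows()` is the row label, not the row position. In our dataset the two coincide because it uses the default 0, 1, 2, ... index. The sketch below uses a hypothetical toy frame with labels 10, 20, 30, 40 to make the difference visible, iterating first over a slice of rows and then over the whole frame.

```python
import pandas as pd

# Hypothetical toy frame whose index labels differ from the row positions
df = pd.DataFrame(
    {"bedrooms": [3.0, 3.0, 2.0, 4.0],
     "price": [221900.0, 538000.0, 180000.0, 604000.0]},
    index=[10, 20, 30, 40],
)

# A few rows: iloc selects by position, but the yielded index is the label
for index, row in df.iloc[1:3].iterrows():
    print(index, row["price"])

# The whole frame: every label, in order
labels = [label for label, _ in df.iterrows()]
print(labels)
```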

We can also print only some of the columns:

In [10]:
for index, row in dataset.head(10).iterrows():
    # Read individual values of the row by column name
    print(index, row['bedrooms'], row['bathrooms'], row['price'])

0 3.0 1.0 221900.0
1 3.0 2.25 538000.0
2 2.0 1.0 180000.0
3 4.0 3.0 604000.0
4 3.0 2.0 510000.0
5 4.0 4.5 1225000.0
6 3.0 2.25 257500.0
7 3.0 1.5 291850.0
8 3.0 1.0 229500.0
9 3.0 2.5 323000.0
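
Another way to get the same printout is to select the needed columns before iterating, so each yielded row holds only those values. A minimal sketch, again with a small stand-in frame built from the values above (the `cols` list is an illustrative name):

```python
import pandas as pd

# Stand-in frame: values copied from the first three rows of the dataset
df = pd.DataFrame({
    "id": [7129300520, 6414100192, 5631500400],
    "bedrooms": [3.0, 3.0, 2.0],
    "bathrooms": [1.0, 2.25, 1.0],
    "price": [221900.0, 538000.0, 180000.0],
})

cols = ["bedrooms", "bathrooms", "price"]

# df[cols] drops "id", so every row Series contains only these three columns
summary = []
for index, row in df[cols].iterrows():
    summary.append((index, row["bedrooms"], row["bathrooms"], row["price"]))
    print(index, row["bedrooms"], row["bathrooms"], row["price"])
```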


## Updating DataFrame values with `iterrows()`

In [11]:
dataset.price.head()

0    221900.0
1    538000.0
2    180000.0
3    604000.0
4    510000.0
Name: price, dtype: float64

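We use the `at[]` accessor to address a single cell (by row label and column name) and store double its original value there. We write through `dataset.at` rather than assigning to `row`, because `iterrows()` can return a copy of each row, in which case changes made to `row` never reach `dataset`.

In the next cell, we print the new values: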

In [12]:
for index, row in dataset.iterrows():
    # Write the doubled price back into this row's cell of the DataFrame
    dataset.at[index, 'price'] = row['price'] * 2

In [13]:
dataset.price.head()

0     443800.0
1    1076000.0
2     360000.0
3    1208000.0
4    1020000.0
Name: price, dtype: float64
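
The sketch below, on a hypothetical two-row frame that mixes a string column with a float column, contrasts three approaches: assigning to `row` (which here leaves the DataFrame unchanged, since the row is a copy), writing through `df.at` as above, and the vectorized `df['price'] * 2`, which produces the same result without a Python loop.

```python
import pandas as pd

# Toy frame with mixed dtypes (string + float); prices from the rows above
df = pd.DataFrame({
    "date": ["20141013T000000", "20141209T000000"],
    "price": [221900.0, 538000.0],
})
original = df.copy()

# 1) Assigning into `row` only changes the row object built by iterrows();
#    for this mixed-dtype frame that row is a copy, so df is not modified
for index, row in df.iterrows():
    row["price"] = row["price"] * 2
after_row_writes = df["price"].tolist()
print(after_row_writes)

# 2) Writing through df.at updates the cell stored in the DataFrame
for index, row in df.iterrows():
    df.at[index, "price"] = row["price"] * 2
print(df["price"].tolist())

# 3) Vectorized equivalent of step 2: one column operation, no loop
vectorized = original["price"] * 2
print(df["price"].equals(vectorized))
```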

## Iterating over a DataFrame with `itertuples()`

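- Returns each row as a namedtuple, with the row index as its first field (`Index`)
- Is generally faster than `iterrows()`, and keeps each column's dtype instead of combining the row into a single Series with one common dtype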
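
A rough way to check the speed claim on your own machine, assuming synthetic random data (hypothetical, not the housing dataset). Both functions compute the same total; the printed timings depend on the hardware, the data and the pandas version, so treat them as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 5,000 rows of random prices and bedroom counts
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.uniform(100_000, 1_000_000, size=5_000),
    "bedrooms": rng.integers(1, 6, size=5_000),
})

def total_iterrows(frame):
    # Builds a pandas Series for every row
    return sum(row["price"] * row["bedrooms"] for _, row in frame.iterrows())

def total_itertuples(frame):
    # Builds a lightweight namedtuple for every row
    return sum(row.price * row.bedrooms for row in frame.itertuples())

for func in (total_iterrows, total_itertuples):
    seconds = timeit.timeit(lambda: func(df), number=3)
    print(f"{func.__name__}: {seconds:.3f} s for 3 runs")
```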

Let's read the DataFrame rows with `itertuples()`:

In [15]:
for row in dataset.head().itertuples():
    print(row, '\n')

Pandas(Index=0, id=7129300520, date='20141013T000000', price=443800.0, bedrooms=3.0, bathrooms=1.0, sqft_living=1180, sqft_lot=5650, floors=1.0, waterfront=0, view=0, condition=3, grade=7, sqft_above=1180, sqft_basement=0, yr_built=1955, yr_renovated=0, zipcode=98178, lat=47.5112, long=-122.257, sqft_living15=1340, sqft_lot15=5650) 

Pandas(Index=1, id=6414100192, date='20141209T000000', price=1076000.0, bedrooms=3.0, bathrooms=2.25, sqft_living=2570, sqft_lot=7242, floors=2.0, waterfront=0, view=0, condition=3, grade=7, sqft_above=2170, sqft_basement=400, yr_built=1951, yr_renovated=1991, zipcode=98125, lat=47.721000000000004, long=-122.319, sqft_living15=1690, sqft_lot15=7639) 

Pandas(Index=2, id=5631500400, date='20150225T000000', price=360000.0, bedrooms=2.0, bathrooms=1.0, sqft_living=770, sqft_lot=10000, floors=1.0, waterfront=0, view=0, condition=3, grade=6, sqft_above=770, sqft_basement=0, yr_built=1933, yr_renovated=0, zipcode=98028, lat=47.7379, long=-122.23299999999999, sqf
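
The tuples above are namedtuples called `Pandas`, whose first field `Index` holds the row label. A minimal sketch on a small stand-in frame, showing the default behavior and the `index` and `name` parameters of `itertuples()`: `index=False` drops the `Index` field, and `name=None` returns plain tuples.

```python
import pandas as pd

# Stand-in frame with two columns from the doubled-price dataset above
df = pd.DataFrame({
    "id": [7129300520, 6414100192],
    "price": [443800.0, 1076000.0],
})

# Default: namedtuples named "Pandas", with the row label as the first field
first = next(df.itertuples())
print(first)
print(first._fields)

# index=False drops the Index field; name=None yields plain tuples instead
plain = next(df.itertuples(index=False, name=None))
print(plain)
```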

The fields of each namedtuple can be read as attributes named after the columns:

In [21]:
for row in dataset.head().itertuples():
    print(row.id, row.bedrooms, row.bathrooms, row.price, '\n')

7129300520 3.0 1.0 443800.0 

6414100192 3.0 2.25 1076000.0 

5631500400 2.0 1.0 360000.0 

2487200875 4.0 3.0 1208000.0 

1954400510 3.0 2.0 1020000.0 
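
Attribute access relies on the column names being valid Python identifiers. Column names that are not (for example, names containing spaces), that are repeated, or that start with an underscore are renamed to positional names such as `_1`. The sketch uses a hypothetical `sqft living` column to show this; positional indexing (`row[1]`) works regardless of the names.

```python
import pandas as pd

# Hypothetical column name with a space: not a valid Python identifier
df = pd.DataFrame({
    "sqft living": [1180, 2570],
    "price": [443800.0, 1076000.0],
})

for row in df.itertuples():
    # The invalid name is replaced by "_1" (its position in the tuple)
    print(row._fields)
    # row[0] is the Index; row[1] is the renamed "sqft living" value
    print(row.Index, row[1], row._1, row.price)
```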

