# Indexing, Selecting & Assigning

In [1]:
import pandas as pd

## Naive accessors

In [7]:
atl_addr = pd.read_csv("./atl-address-1.csv")
atl_addr.head()

|   | Title | Price | Beds | Baths | Area |
|---|-------|-------|------|-------|------|
| 0 | 34 The Prado NE, Atlanta, GA 30309 | $1,495,000 | 4 bds | 4 ba | 3,644 sqft |
| 1 | 2060 Shirley St SW, Atlanta, GA 30311 | $225,000 | 3 bds | 2 ba | 1,300 sqft |
| 2 | 300 Peachtree St NE APT 11G, Atlanta, GA 30308 | $259,000 | 2 bds | 2 ba | 890 sqft |
| 3 | 1690 Memorial Dr SE, Atlanta, GA 30317 | $320,000 | 2 bds | 1 ba | 1,163 sqft |
| 4 | 6253 Old Kingston Dr, South Fulton, GA | $349,275 | 5 bds | 3 ba | -- sqft |


In [9]:
# access one column of the DataFrame as an attribute (the column name)
atl_addr.Price.head()

0    $1,495,000
1      $225,000
2      $259,000
3      $320,000
4      $349,275
Name: Price, dtype: object

In [11]:
# or use the indexing operator "[]"
atl_addr["Price"].head()

0    $1,495,000
1      $225,000
2      $259,000
3      $320,000
4      $349,275
Name: Price, dtype: object

In [13]:
# select a single value: pick the column first, then the row label
atl_addr["Title"][2]

'300 Peachtree St NE APT 11G, Atlanta, GA 30308'
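Attribute access and `[]` return the same column for a simple name like `Price`, but attribute access has limits. The sketch below uses a small made-up DataFrame (the `Sq Ft` and `count` columns are hypothetical, not from the Atlanta data) to show two cases where only `[]` works: a name containing a space, and a name that collides with an existing DataFrame method.

```python
import pandas as pd

# Hypothetical listings: "Sq Ft" contains a space, "count" clashes with DataFrame.count
df = pd.DataFrame({
    "Price": ["$225,000", "$259,000"],
    "Sq Ft": ["1,300", "890"],
    "count": [1, 2],
})

price_attr = df.Price        # attribute access works for a simple identifier
price_item = df["Price"]     # the same column through the indexing operator
sqft = df["Sq Ft"]           # a name with a space can only be reached with []
counts = df["count"]         # df.count is the method, so the column needs []
```

Because `[]` always works, it is the safer default in scripts; attribute access is mostly a typing shortcut for interactive work.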

## Indexing in pandas
pandas has its own accessor operators, **loc** (label-based) and **iloc** (position-based).<br>
### index-based selection, iloc:

In [15]:
# select the second row (position 1)
# both loc and iloc are row-first, column-second
atl_addr.iloc[1]

Title    2060 Shirley St SW, Atlanta, GA 30311
Price                                 $225,000
Beds                                     3 bds
Baths                                     2 ba
Area                                1,300 sqft
Name: 1, dtype: object

In [18]:
# get a column with iloc
atl_addr.iloc[:, 0].head()

0                34 The Prado NE, Atlanta, GA 30309
1             2060 Shirley St SW, Atlanta, GA 30311
2    300 Peachtree St NE APT 11G, Atlanta, GA 30308
3            1690 Memorial Dr SE, Atlanta, GA 30317
4            6253 Old Kingston Dr, South Fulton, GA
Name: Title, dtype: object

In [20]:
# select just second and third entries
atl_addr.iloc[1:3, 0]

1             2060 Shirley St SW, Atlanta, GA 30311
2    300 Peachtree St NE APT 11G, Atlanta, GA 30308
Name: Title, dtype: object

In [27]:
# or pass a list of positions (here the second, third and fourth rows)
atl_addr.iloc[[1,2,3], 0]

1             2060 Shirley St SW, Atlanta, GA 30311
2    300 Peachtree St NE APT 11G, Atlanta, GA 30308
3            1690 Memorial Dr SE, Atlanta, GA 30317
Name: Title, dtype: object

In [23]:
# select last 5 rows
atl_addr.iloc[-5:]

|    | Title | Price | Beds | Baths | Area |
|----|-------|-------|------|-------|------|
| 35 | 1692 Sandtown Rd SW, Atlanta, GA 30311 | $179,900 | 3 bds | 3 bds | 3 bds |
| 36 | 938 Rebel Forest Dr SE, Atlanta, GA 30315 | $135,000 | 3 bds | 3 bds | 3 bds |
| 37 | 3380 Oakcliff Rd NW, Atlanta, GA 30331 | $167,899 | 3 bds | 3 bds | 3 bds |
| 38 | 1463 La France St NE APT 12, Atlanta, GA 30307 | $550,000 | 3 bds | 3 bds | 3 bds |
| 39 | 1185 Arlington Ave SW, Atlanta, GA 30310 | $155,000 | 3 bds | 3 bds | 3 bds |


In [31]:
# select the first 5 rows and the columns at positions 1 and 3
atl_addr.iloc[:5, [1, 3]]

|   | Price | Baths |
|---|-------|-------|
| 0 | $1,495,000 | 4 ba |
| 1 | $225,000 | 2 ba |
| 2 | $259,000 | 2 ba |
| 3 | $320,000 | 1 ba |
| 4 | $349,275 | 3 ba |
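`iloc` also takes a single row position and a single column position, which returns one value rather than a Series or DataFrame. A minimal sketch on a made-up three-row frame (the data is hypothetical) showing scalar access with `iloc`, the scalar-only accessor `iat`, and negative positions counting from the end:

```python
import pandas as pd

# Hypothetical three-row frame; column 0 is Title, column 1 is Price
df = pd.DataFrame({
    "Title": ["A St", "B St", "C St"],
    "Price": ["$225,000", "$259,000", "$320,000"],
})

third_title = df.iloc[2, 0]    # row position 2, column position 0
same_value = df.iat[2, 0]      # iat: scalar access by position only
last_price = df.iloc[-1, -1]   # negative positions count from the end
```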


### label-based selection, loc:

In [25]:
# value in the row labeled 1 under the 'Title' column
atl_addr.loc[1, 'Title']

'2060 Shirley St SW, Atlanta, GA 30311'

In [28]:
# select rows labeled 0 through 5 (inclusive) and two columns by name
atl_addr.loc[:5, ['Price', 'Area']]

|   | Price | Area |
|---|-------|------|
| 0 | $1,495,000 | 3,644 sqft |
| 1 | $225,000 | 1,300 sqft |
| 2 | $259,000 | 890 sqft |
| 3 | $320,000 | 1,163 sqft |
| 4 | $349,275 | -- sqft |
| 5 | $284,900 | -- sqft |
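Label slices with `loc` work on the column axis too, and they are inclusive there as well. This sketch rebuilds two rows with the same column names as the listings data (values shortened and hypothetical) and slices from `Price` to `Baths`:

```python
import pandas as pd

# Hypothetical two-row copy of the listings layout
df = pd.DataFrame({
    "Title": ["A St", "B St"],
    "Price": ["$225,000", "$259,000"],
    "Beds": ["3 bds", "2 bds"],
    "Baths": ["2 ba", "2 ba"],
    "Area": ["1,300 sqft", "890 sqft"],
})

# Both ends of a label slice are included, for rows and for columns
block = df.loc[0:1, "Price":"Baths"]
```

`block` holds rows 0 and 1 and the columns `Price`, `Beds` and `Baths`.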


### Choosing between loc and iloc:
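**iloc:** `iloc[0:10]` selects positions 0 to 9; like a Python slice, the end is excluded.<br>
**loc:** `loc[0:10]` selects labels 0 to 10; the end label is included. With the default integer index, labels and positions coincide, so this endpoint is the only visible difference.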
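The difference is easiest to see when the labels stop matching the positions. A minimal sketch using a made-up Series with labels 10 to 50, plus a default-indexed frame to show the endpoint rule:

```python
import pandas as pd

# Hypothetical Series whose labels differ from its positions
s = pd.Series(["a", "b", "c", "d", "e"], index=[10, 20, 30, 40, 50])

by_position = s.iloc[0:2]   # positions 0 and 1 -> labels 10 and 20
by_label = s.loc[10:30]     # labels 10 through 30, end label included

# With a default RangeIndex, only the endpoint rule differs
df = pd.DataFrame({"x": range(5)})
three_rows = df.iloc[0:3]   # positions 0, 1, 2
four_rows = df.loc[0:3]     # labels 0, 1, 2, 3
```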

## Manipulating the index


In [44]:
# use one of the columns as the row labels (index); this returns a new DataFrame
atl_addr.set_index("Price").head()

| Price | Title | Beds | Baths | Area |
|-------|-------|------|-------|------|
| $1,495,000 | 34 The Prado NE, Atlanta, GA 30309 | 4 bds | 4 ba | 3,644 sqft |
| $225,000 | 2060 Shirley St SW, Atlanta, GA 30311 | 3 bds | 2 ba | 1,300 sqft |
| $259,000 | 300 Peachtree St NE APT 11G, Atlanta, GA 30308 | 2 bds | 2 ba | 890 sqft |
| $320,000 | 1690 Memorial Dr SE, Atlanta, GA 30317 | 2 bds | 1 ba | 1,163 sqft |
| $349,275 | 6253 Old Kingston Dr, South Fulton, GA | 5 bds | 3 ba | -- sqft |
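`set_index` does not modify the DataFrame it is called on; the result has to be kept in a variable (or assigned back) to be used. The sketch below, on a made-up two-row frame, keeps the result, looks a row up by its new `Price` label with `loc`, and uses `reset_index` to turn the index back into an ordinary column:

```python
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({
    "Title": ["A St", "B St"],
    "Price": ["$1,495,000", "$225,000"],
})

by_price = df.set_index("Price")            # new DataFrame; df keeps its Price column
title = by_price.loc["$225,000", "Title"]   # loc now matches Price labels
restored = by_price.reset_index()           # Price moves back to the first column
```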


## Conditional selection

In [53]:
# check whether each listing has 4 bedrooms; the result is a boolean Series
atl_addr.Beds == "4 bds"

0      True
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8      True
9     False
10    False
11    False
12    False
13    False
14    False
15     True
16    False
17    False
18     True
19     True
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29     True
30    False
31    False
32    False
33    False
34    False
35    False
36    False
37    False
38    False
39    False
Name: Beds, dtype: bool

In [54]:
# filter rows by a condition: keep rows where the boolean Series is True
atl_addr.loc[atl_addr.Beds == "4 bds"]

|    | Title | Price | Beds | Baths | Area |
|----|-------|-------|------|-------|------|
| 0  | 34 The Prado NE, Atlanta, GA 30309 | $1,495,000 | 4 bds | 4 ba | 3,644 sqft |
| 8  | 175 Wynfield Way SW, Atlanta, GA 30331 | $457,900 | 4 bds | 4 ba | 3,624 sqft |
| 15 | 3502 Redwine Pkwy SW, Atlanta, GA 30331 | $270,000 | 4 bds | 3 ba | 2,490 sqft |
| 18 | 1149 Mobile St NW, Atlanta, GA 30314 | $260,000 | 4 bds | 2 ba | 1,627 sqft |
| 19 | 1488 Sandrock Ln SW, Atlanta, GA 30331 | $272,900 | 4 bds | 3 ba | 2,786 sqft |
| 29 | 2363 NW 2363 Cross St, Atlanta, GA 30318 | $249,900 | 4 bds | 4 bds | 4 bds |
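The comparison and the filter are two separate steps: `==` builds a boolean Series aligned on the index, and `loc` keeps the rows where it is `True`. A minimal sketch on made-up data that stores the mask in a variable first, which also shows that plain `[]` accepts the same mask:

```python
import pandas as pd

# Hypothetical four-row frame
df = pd.DataFrame({
    "Beds": ["4 bds", "3 bds", "4 bds", "2 bds"],
    "Baths": ["4 ba", "2 ba", "3 ba", "1 ba"],
})

mask = df.Beds == "4 bds"   # boolean Series, one entry per row
four_beds = df.loc[mask]    # keep rows where mask is True
same_rows = df[mask]        # [] with a boolean Series filters rows too
```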


In [55]:
# select rows that satisfy both conditions, using "&" (each condition in parentheses)
atl_addr.loc[(atl_addr.Beds == "4 bds") & (atl_addr.Baths == "3 ba")]

|    | Title | Price | Beds | Baths | Area |
|----|-------|-------|------|-------|------|
| 15 | 3502 Redwine Pkwy SW, Atlanta, GA 30331 | $270,000 | 4 bds | 3 ba | 2,490 sqft |
| 19 | 1488 Sandrock Ln SW, Atlanta, GA 30331 | $272,900 | 4 bds | 3 ba | 2,786 sqft |


In [61]:
# select rows that satisfy at least one of the conditions, using "|"
atl_addr.loc[(atl_addr.Beds == "4 bds") | (atl_addr.Baths == "3 ba")]

|    | Title | Price | Beds | Baths | Area |
|----|-------|-------|------|-------|------|
| 0  | 34 The Prado NE, Atlanta, GA 30309 | $1,495,000 | 4 bds | 4 ba | 3,644 sqft |
| 4  | 6253 Old Kingston Dr, South Fulton, GA | $349,275 | 5 bds | 3 ba | -- sqft |
| 6  | 6253 Old Kingston Dr # 37, Atlanta, GA 30331 | $349,275 | 5 bds | 3 ba | -- sqft |
| 8  | 175 Wynfield Way SW, Atlanta, GA 30331 | $457,900 | 4 bds | 4 ba | 3,624 sqft |
| 15 | 3502 Redwine Pkwy SW, Atlanta, GA 30331 | $270,000 | 4 bds | 3 ba | 2,490 sqft |
| 18 | 1149 Mobile St NW, Atlanta, GA 30314 | $260,000 | 4 bds | 2 ba | 1,627 sqft |
| 19 | 1488 Sandrock Ln SW, Atlanta, GA 30331 | $272,900 | 4 bds | 3 ba | 2,786 sqft |
| 29 | 2363 NW 2363 Cross St, Atlanta, GA 30318 | $249,900 | 4 bds | 4 bds | 4 bds |
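The parentheses around each condition are required because `&` and `|` bind more tightly than `==` in Python. Python's own `and`/`or` do not work on Series either: they try to reduce the whole Series to a single `True`/`False` and raise `ValueError`. A sketch on made-up data covering `&`, `|`, the negation operator `~`, and the `and` failure:

```python
import pandas as pd

# Hypothetical four-row frame
df = pd.DataFrame({
    "Beds": ["4 bds", "3 bds", "4 bds", "5 bds"],
    "Baths": ["4 ba", "3 ba", "3 ba", "2 ba"],
})

both = df.loc[(df.Beds == "4 bds") & (df.Baths == "3 ba")]    # row 2
either = df.loc[(df.Beds == "4 bds") | (df.Baths == "3 ba")]  # rows 0, 1, 2
not_four = df.loc[~(df.Beds == "4 bds")]                       # rows 1, 3

# "and" asks for the truth value of a whole Series, which is ambiguous
try:
    (df.Beds == "4 bds") and (df.Baths == "3 ba")
    and_raised = False
except ValueError:
    and_raised = True
```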


pandas comes with a few built-in conditional selectors, two of which we highlight here.
1. **isin**: select rows whose value "is in" a given list of values.
2. **isnull / notnull**: select rows whose value is (or is not) missing (NaN).
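Because the Atlanta data has no missing `Beds` values, the `isnull` filter below comes back empty. The sketch here uses a made-up frame with one missing entry (`None`, which pandas treats as missing in an object column) so that all three selectors return something:

```python
import pandas as pd

# Hypothetical frame; row 1 has no Beds value
df = pd.DataFrame({
    "Beds": ["4 bds", None, "5 bds", "2 bds"],
    "Price": ["$270,000", "$225,000", "$429,000", "$259,000"],
})

big = df.loc[df.Beds.isin(["4 bds", "5 bds"])]  # value is in the list
missing = df.loc[df.Beds.isnull()]              # value is missing
present = df.loc[df.Beds.notnull()]             # value is present
```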


In [68]:
# use isin to filter rows with 4 or 5 bedrooms; show the first 8 matches
atl_addr.loc[atl_addr.Beds.isin(["5 bds", "4 bds"])].head(8)

|    | Title | Price | Beds | Baths | Area |
|----|-------|-------|------|-------|------|
| 0  | 34 The Prado NE, Atlanta, GA 30309 | $1,495,000 | 4 bds | 4 ba | 3,644 sqft |
| 4  | 6253 Old Kingston Dr, South Fulton, GA | $349,275 | 5 bds | 3 ba | -- sqft |
| 6  | 6253 Old Kingston Dr # 37, Atlanta, GA 30331 | $349,275 | 5 bds | 3 ba | -- sqft |
| 8  | 175 Wynfield Way SW, Atlanta, GA 30331 | $457,900 | 4 bds | 4 ba | 3,624 sqft |
| 9  | 2973 Margaret Mitchell Ct NW, Atlanta, GA 30327 | $1,435,000 | 5 bds | 5 ba | 4,483 sqft |
| 12 | 2875 Benjamin E Mays Dr SW, Atlanta, GA 30311 | $429,000 | 5 bds | 4 ba | 2,551 sqft |
| 15 | 3502 Redwine Pkwy SW, Atlanta, GA 30331 | $270,000 | 4 bds | 3 ba | 2,490 sqft |
| 18 | 1149 Mobile St NW, Atlanta, GA 30314 | $260,000 | 4 bds | 2 ba | 1,627 sqft |


In [72]:
# select rows where Beds is missing (there are none here, so the result is empty)
atl_addr.loc[atl_addr.Beds.isnull()]

|   | Title | Price | Beds | Baths | Area |
|---|-------|-------|------|-------|------|


## Assigning data

In [76]:
# assign one constant value to every row of a column (this overwrites the Area data)
atl_addr['Area'] = 1000
atl_addr.head()

|   | Title | Price | Beds | Baths | Area |
|---|-------|-------|------|-------|------|
| 0 | 34 The Prado NE, Atlanta, GA 30309 | $1,495,000 | 4 bds | 4 ba | 1000 |
| 1 | 2060 Shirley St SW, Atlanta, GA 30311 | $225,000 | 3 bds | 2 ba | 1000 |
| 2 | 300 Peachtree St NE APT 11G, Atlanta, GA 30308 | $259,000 | 2 bds | 2 ba | 1000 |
| 3 | 1690 Memorial Dr SE, Atlanta, GA 30317 | $320,000 | 2 bds | 1 ba | 1000 |
| 4 | 6253 Old Kingston Dr, South Fulton, GA | $349,275 | 5 bds | 3 ba | 1000 |
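Assigning to a whole column replaces every value. To change only some rows, the selection tools from earlier can be combined with assignment: `loc[row_mask, column] = value` writes just the matching cells. A sketch on a made-up `Area` column, replacing the `-- sqft` placeholder seen in the data with a hypothetical `"unknown"` marker:

```python
import pandas as pd

# Hypothetical Area values; "-- sqft" marks a listing with no size
df = pd.DataFrame({"Area": ["3,644 sqft", "-- sqft", "890 sqft"]})

# Overwrite only the rows where the mask is True; the others keep their values
df.loc[df.Area == "-- sqft", "Area"] = "unknown"
```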


In [81]:
# assign an iterable with one value per row (here 40 counting down to 1)
atl_addr["Area"] = range(len(atl_addr), 0, -1)
atl_addr.Area.head()

0    40
1    39
2    38
3    37
4    36
Name: Area, dtype: int32
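When assigning a list or range, pandas matches values to rows by position, so the iterable must have exactly one value per row; a length mismatch raises `ValueError` and the column is not created. (The `int32` dtype above is platform-dependent; many systems show `int64`.) A minimal sketch on a made-up three-row frame, where `Rank` and `Bad` are hypothetical column names:

```python
import pandas as pd

# Hypothetical three-row frame
df = pd.DataFrame({"Title": ["A St", "B St", "C St"]})

df["Rank"] = range(len(df), 0, -1)   # one value per row: 3, 2, 1

try:
    df["Bad"] = [1, 2]               # two values for three rows
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```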