## Indexing a DataFrame

In [None]:
import pandas as pd

In [2]:
prices_dict = {
    "fruits": ["apples", "oranges", "bananas", "strawberries"],
    "prices": [1.5, 2, 2.5, 3],
    "suppliers": ["supplier1", "supplier2", "supplier4", "supplier3"],    
}

prices_df = pd.DataFrame(prices_dict, index=[1, 2, 3, 4])
prices_df

|   | fruits | prices | suppliers |
|---|--------|--------|-----------|
| 1 | apples | 1.5 | supplier1 |
| 2 | oranges | 2.0 | supplier2 |
| 3 | bananas | 2.5 | supplier4 |
| 4 | strawberries | 3.0 | supplier3 |


### Select Columns

#### Select Single Column (returns a Series Object)

In [3]:
# select a single column with square-bracket notation:
prices = prices_df['prices']

print(type(prices))
print(prices)

<class 'pandas.core.series.Series'>
1    1.5
2    2.0
3    2.5
4    3.0
Name: prices, dtype: float64


In [4]:
# select a single column with attribute (dot) notation:
prices_df.prices

1    1.5
2    2.0
3    2.5
4    3.0
Name: prices, dtype: float64

In [5]:
# rename the columns ('suppliers' becomes 'sup'); both notations still work
prices_df.columns = ['fruits', 'prices', 'sup']

print(prices_df['prices'])
print(prices_df.prices)

1    1.5
2    2.0
3    2.5
4    3.0
Name: prices, dtype: float64
1    1.5
2    2.0
3    2.5
4    3.0
Name: prices, dtype: float64


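#### Square-bracket vs. dot notation
Square-bracket notation is the more general form: it works for selecting one column or a list of columns, and it accepts any column label. Dot notation only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method. For example, you can't use it if the column name contains spaces, is a Python keyword, or matches a method name such as `max` or `min` (in that case `df.max` returns the method, not the column).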


In [6]:
demo_df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['col 1', 'col 2', 'col 3'])

# the line below is a SyntaxError, so it is commented out:
# demo_df.'col 1'

# square-bracket notation works with any label:
demo_df['col 1']

0    1
1    4
Name: col 1, dtype: int64
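Column names that clash with DataFrame methods are the other common trap. A minimal sketch, using a hypothetical `stats_df` whose columns are named `min` and `max`: normal attribute lookup finds the `DataFrame.max` method before pandas ever checks the column labels, so only square brackets reach the column.

```python
import pandas as pd

# hypothetical DataFrame whose column names clash with DataFrame methods
stats_df = pd.DataFrame({"min": [1, 4], "max": [3, 6]})

# dot notation resolves to the bound method DataFrame.max, not the column
print(callable(stats_df.max))           # True
print(isinstance(stats_df.max, pd.Series))  # False

# square brackets always look up the column label
print(stats_df["max"].tolist())         # [3, 6]
```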

### Select a List of Columns

Note that the columns are returned in the order given in the list, not in their original order.

In [7]:
prices_df[ ['prices', 'fruits'] ]

|   | prices | fruits |
|---|--------|--------|
| 1 | 1.5 | apples |
| 2 | 2.0 | oranges |
| 3 | 2.5 | bananas |
| 4 | 3.0 | strawberries |


Selecting with a list of column labels returns a DataFrame, not a Series:

In [8]:
type(prices_df[ ['prices', 'fruits']])

pandas.core.frame.DataFrame
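The return type depends on whether you pass a single label or a list, not on how many columns come back. A small sketch, recreating `prices_df` with the renamed columns (`as_series` and `as_frame` are illustrative names): a list holding one label still yields a one-column DataFrame.

```python
import pandas as pd

# recreate the example DataFrame with the renamed columns
prices_df = pd.DataFrame(
    {
        "fruits": ["apples", "oranges", "bananas", "strawberries"],
        "prices": [1.5, 2, 2.5, 3],
        "sup": ["supplier1", "supplier2", "supplier4", "supplier3"],
    },
    index=[1, 2, 3, 4],
)

as_series = prices_df["prices"]    # a single label -> Series
as_frame = prices_df[["prices"]]   # a list with one label -> DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (4, 1)
```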

### Access data with the `loc` indexer

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

Access a group of rows and columns by **label**(s) or a boolean array.

**Syntax**: df.loc[row_label, column_label]
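Each position in `df.loc[row_label, column_label]` accepts a single label, a list of labels, or a slice. A minimal sketch on a recreated `prices_df` (`row` and `subset` are illustrative names) showing a single cell, a single row, and lists on both axes:

```python
import pandas as pd

# recreate the example DataFrame with the renamed columns
prices_df = pd.DataFrame(
    {
        "fruits": ["apples", "oranges", "bananas", "strawberries"],
        "prices": [1.5, 2, 2.5, 3],
        "sup": ["supplier1", "supplier2", "supplier4", "supplier3"],
    },
    index=[1, 2, 3, 4],
)

# a single cell: row label 2, column label 'fruits'
print(prices_df.loc[2, "fruits"])      # oranges

# a single row by its label -> Series indexed by the column names
row = prices_df.loc[3]
print(row["fruits"], row["prices"])    # bananas 2.5

# lists of labels on both axes -> DataFrame
subset = prices_df.loc[[1, 4], ["fruits", "sup"]]
print(subset)
```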

In [9]:
# get all rows and the columns 'fruits' and 'prices'
prices_df.loc[:, ['fruits', 'prices']]

# equivalent to:
# prices_df[['fruits', 'prices']]

|   | fruits | prices |
|---|--------|--------|
| 1 | apples | 1.5 |
| 2 | oranges | 2.0 |
| 3 | bananas | 2.5 |
| 4 | strawberries | 3.0 |


In [10]:
# get the rows labelled 2 to 4 (with loc, both ends are inclusive) and the columns from 'fruits' to the last one
prices_df.loc[ 2:4, 'fruits': ]

|   | fruits | prices | sup |
|---|--------|--------|-----|
| 2 | oranges | 2.0 | supplier2 |
| 3 | bananas | 2.5 | supplier4 |
| 4 | strawberries | 3.0 | supplier3 |
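The inclusive stop is specific to label-based slicing. A short comparison, assuming the recreated `prices_df`: the same `2:4` slice keeps three rows with `loc` but only two with `iloc` (covered in the next section), which follows the usual Python rule of excluding the stop position.

```python
import pandas as pd

# recreate the example DataFrame with the renamed columns
prices_df = pd.DataFrame(
    {
        "fruits": ["apples", "oranges", "bananas", "strawberries"],
        "prices": [1.5, 2, 2.5, 3],
        "sup": ["supplier1", "supplier2", "supplier4", "supplier3"],
    },
    index=[1, 2, 3, 4],
)

# loc slices by label and includes the stop label: labels 2, 3 and 4
by_label = prices_df.loc[2:4, "fruits"]

# iloc slices by position and excludes the stop position: positions 2 and 3
by_position = prices_df.iloc[2:4, 0]

print(by_label.tolist())     # ['oranges', 'bananas', 'strawberries']
print(by_position.tolist())  # ['bananas', 'strawberries']
```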


#### Pass a Boolean array to `loc`

In [11]:
mask = prices_df.prices > 2
prices_df.loc[mask]

# or, more concisely:
# prices_df.loc[prices_df.prices > 2]

# the same rows can be selected without loc:
prices_df[prices_df.prices > 2]

|   | fruits | prices | sup |
|---|--------|--------|-----|
| 3 | bananas | 2.5 | supplier4 |
| 4 | strawberries | 3.0 | supplier3 |
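Where `loc` earns its keep over plain brackets is when the mask picks rows and a second argument picks columns at the same time. A sketch on the recreated `prices_df` (`expensive_fruits` and `both` are illustrative names); conditions are combined with `&` (and), `|` (or) and `~` (not), and each comparison needs its own parentheses because those operators bind more tightly than `>` or `!=`.

```python
import pandas as pd

# recreate the example DataFrame with the renamed columns
prices_df = pd.DataFrame(
    {
        "fruits": ["apples", "oranges", "bananas", "strawberries"],
        "prices": [1.5, 2, 2.5, 3],
        "sup": ["supplier1", "supplier2", "supplier4", "supplier3"],
    },
    index=[1, 2, 3, 4],
)

# the mask selects rows, the second argument selects the column
mask = prices_df["prices"] > 2
expensive_fruits = prices_df.loc[mask, "fruits"]
print(expensive_fruits.tolist())  # ['bananas', 'strawberries']

# combine two conditions element-wise with &
both = (prices_df["prices"] > 1.5) & (prices_df["sup"] != "supplier4")
print(prices_df.loc[both, "fruits"].tolist())  # ['oranges', 'strawberries']
```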


In [12]:
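# DataFrame.min() returns the minimum of each column;
# string columns are compared lexicographically
prices_df.min()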

fruits       apples
prices          1.5
sup       supplier1
dtype: object
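If only the numeric minimum is of interest, a couple of options are sketched below on the recreated `prices_df` (`cheapest_label` is an illustrative name). `numeric_only=True` skips the string columns, and `Series.idxmin()` returns the *label* of the minimum, which plugs straight into `loc`.

```python
import pandas as pd

# recreate the example DataFrame with the renamed columns
prices_df = pd.DataFrame(
    {
        "fruits": ["apples", "oranges", "bananas", "strawberries"],
        "prices": [1.5, 2, 2.5, 3],
        "sup": ["supplier1", "supplier2", "supplier4", "supplier3"],
    },
    index=[1, 2, 3, 4],
)

# restrict the reduction to numeric columns
print(prices_df.min(numeric_only=True))

# or reduce a single column to a scalar
print(prices_df["prices"].min())   # 1.5

# idxmin returns the index label of the smallest value, usable with loc
cheapest_label = prices_df["prices"].idxmin()
print(prices_df.loc[cheapest_label, "fruits"])  # apples
```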

### Access data with the `iloc` indexer

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html

Purely **integer-location** based indexing for selection by position

**Syntax**: df.iloc[row_index, column_index]
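Like `loc`, each position in `iloc` accepts a single integer, a list, or a slice, and negative integers count from the end. A minimal sketch on the recreated `prices_df` (`first_last` and `last_row` are illustrative names):

```python
import pandas as pd

# recreate the example DataFrame with the renamed columns
prices_df = pd.DataFrame(
    {
        "fruits": ["apples", "oranges", "bananas", "strawberries"],
        "prices": [1.5, 2, 2.5, 3],
        "sup": ["supplier1", "supplier2", "supplier4", "supplier3"],
    },
    index=[1, 2, 3, 4],
)

# first and last rows, first and last columns (lists of positions)
first_last = prices_df.iloc[[0, -1], [0, -1]]
print(first_last)

# a single position returns that row as a Series
last_row = prices_df.iloc[-1]
print(last_row["fruits"])  # strawberries
```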

In [13]:
prices_df

|   | fruits | prices | sup |
|---|--------|--------|-----|
| 1 | apples | 1.5 | supplier1 |
| 2 | oranges | 2.0 | supplier2 |
| 3 | bananas | 2.5 | supplier4 |
| 4 | strawberries | 3.0 | supplier3 |


In [14]:
# get the cell in the first row, second column (positions 0 and 1)
prices_df.iloc[0,1]

1.5
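Because this DataFrame's index starts at 1, the same integer means different rows to `loc` and `iloc`. A sketch on the recreated `prices_df` (`missing_label_raises` is an illustrative flag): label 1 is the first row, position 1 is the second row, and label 0 does not exist at all.

```python
import pandas as pd

# recreate the example DataFrame with the renamed columns
prices_df = pd.DataFrame(
    {
        "fruits": ["apples", "oranges", "bananas", "strawberries"],
        "prices": [1.5, 2, 2.5, 3],
        "sup": ["supplier1", "supplier2", "supplier4", "supplier3"],
    },
    index=[1, 2, 3, 4],
)

print(prices_df.loc[1, "fruits"])  # apples  (row labelled 1)
print(prices_df.iloc[1, 0])        # oranges (row at position 1)

# there is no row labelled 0, so loc raises a KeyError
try:
    prices_df.loc[0, "fruits"]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True
print(missing_label_raises)  # True
```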

In [15]:
# get all rows from the second one to the end, with all columns
prices_df.iloc[1:, :]

|   | fruits | prices | sup |
|---|--------|--------|-----|
| 2 | oranges | 2.0 | supplier2 |
| 3 | bananas | 2.5 | supplier4 |
| 4 | strawberries | 3.0 | supplier3 |


In [16]:
# get the rows from the second one to the end, and only the last column (position -1)
prices_df.iloc[1:,-1]

2    supplier2
3    supplier4
4    supplier3
Name: sup, dtype: object