## The Core Difference: Label vs. Position
- **.loc (LOCation):** Selects data by LABEL, using the names in your index and columns. Slices include the end label.
- **.iloc (Integer LOCation):** Selects data by INTEGER POSITION, just like list/NumPy indexing. Slices exclude the end position.
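
To make the inclusive/exclusive difference concrete, here is a minimal sketch on a hypothetical four-row frame (the `value` column and the labels 10 to 40 are made up for illustration, not part of the sales data). The integer labels are chosen so they cannot be confused with positions.

```python
import pandas as pd

# Hypothetical mini-frame: integer labels that do not match the positions 0..3
df = pd.DataFrame({"value": ["a", "b", "c", "d"]}, index=[10, 20, 30, 40])

by_label = df.loc[10:30]     # labels 10, 20 and 30: the end label is included
by_position = df.iloc[0:2]   # positions 0 and 1: the end position is excluded

print(by_label["value"].tolist())     # ['a', 'b', 'c']
print(by_position["value"].tolist())  # ['a', 'b']
```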

**Let's set up our DataFrame for this module.** We'll use the same sales data, but we'll set the Transaction ID as the index to better demonstrate label-based selection with .loc.

In [3]:
import pandas as pd

# Load the data and set 'Transaction ID' as the index
sales_df = pd.read_csv("simple_sales.csv", index_col="Transaction ID")

print("Sales DataFrame with 'Transaction ID' as index:")
sales_df

Sales DataFrame with 'Transaction ID' as index:


Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price
1001,2023-01-15,Electronics,Laptop,2,1200
1002,2023-01-16,Office Supplies,Pen Set,10,15
1003,2023-01-16,Electronics,Mouse,5,25
1004,2023-01-17,Home Goods,Coffee Maker,1,80
1005,2023-01-18,Office Supplies,Notebook,20,5
1006,2023-01-18,Electronics,Laptop,1,1250
1007,2023-01-19,Home Goods,Blender,2,50


## **1. Selecting Columns**
This is the simplest selection: use standard square brackets [ ] with a single column name (returns a Series) or a list of column names (returns a DataFrame).

In [61]:
# Select a single column (returns a Series)

product_names = sales_df['Product Name']
print("--- Selecting the 'Product Name' column ---")
print(type(product_names))
print(product_names)

# Select multiple columns (returns a DataFrame)
# Note the double square brackets [[]]
# The inner list specifies which columns you want.

subset = sales_df[['Product Name', 'Unit Price']]
print("\n--- Selecting 'Product Name' and 'Unit Price' columns ---")
print(type(subset))
print(subset)

--- Selecting the 'Product Name' column ---
<class 'pandas.core.series.Series'>
Transaction ID
1001          Laptop
1002         Pen Set
1003           Mouse
1004    Coffee Maker
1005        Notebook
1006          Laptop
1007         Blender
Name: Product Name, dtype: object

--- Selecting 'Product Name' and 'Unit Price' columns ---
<class 'pandas.core.frame.DataFrame'>
                Product Name  Unit Price
Transaction ID                          
1001                  Laptop        1200
1002                 Pen Set          15
1003                   Mouse          25
1004            Coffee Maker          80
1005                Notebook           5
1006                  Laptop        1250
1007                 Blender          50


## **2. Selecting Rows and Columns with .loc (Label-based)**
- **Syntax:** df.loc[row_labels, column_labels]
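
As a small sketch of the two-part syntax, the snippet below rebuilds the first four sales rows inline (so it runs without simple_sales.csv) and uses `:` to keep every row while slicing columns by label. Column slices with .loc are label-based as well, so the end column is included.

```python
import pandas as pd

# Inline copy of the first four sales rows, so the sketch is self-contained
sales_df = pd.DataFrame(
    {
        "Date": ["2023-01-15", "2023-01-16", "2023-01-16", "2023-01-17"],
        "Product Category": ["Electronics", "Office Supplies", "Electronics", "Home Goods"],
        "Product Name": ["Laptop", "Pen Set", "Mouse", "Coffee Maker"],
        "Units Sold": [2, 10, 5, 1],
        "Unit Price": [1200, 15, 25, 80],
    },
    index=pd.Index([1001, 1002, 1003, 1004], name="Transaction ID"),
)

# ":" keeps every row; the column slice includes the end label "Unit Price"
tail_cols = sales_df.loc[:, "Product Name":"Unit Price"]
print(list(tail_cols.columns))  # ['Product Name', 'Units Sold', 'Unit Price']
```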

In [65]:
# Select a single row by its index label (returns a Series)
row_1003 = sales_df.loc[1003] # use sales_df.loc[[1003]] to get a one-row DataFrame instead of a Series
print("\n--- Selecting row with index label 1003 ---")
print(row_1003)


--- Selecting row with index label 1003 ---
Date                 2023-01-16
Product Category    Electronics
Product Name              Mouse
Units Sold                    5
Unit Price                   25
Name: 1003, dtype: object


In [59]:
# Select multiple rows by their labels

rows_1002_1005 = sales_df.loc[[1002, 1005]]
print("\n--- Selecting rows with labels 1002 and 1005 ---")
rows_1002_1005


--- Selecting rows with labels 1002 and 1005 ---


Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price
1002,2023-01-16,Office Supplies,Pen Set,10,15
1005,2023-01-18,Office Supplies,Notebook,20,5


In [58]:
# Slice rows by label. Note: .loc slicing is INCLUSIVE of the end label.

row_slice = sales_df.loc[1002:1005]
print("\n--- Slicing rows from label 1002 to 1005 (inclusive) ---")
row_slice


--- Slicing rows from label 1002 to 1005 (inclusive) ---


Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price
1002,2023-01-16,Office Supplies,Pen Set,10,15
1003,2023-01-16,Electronics,Mouse,5,25
1004,2023-01-17,Home Goods,Coffee Maker,1,80
1005,2023-01-18,Office Supplies,Notebook,20,5


In [57]:
# Select rows AND columns with .loc
# Get the 'Unit Price' for transaction 1004

price_1004 = sales_df.loc[1004, 'Unit Price']
print(f"\nUnit Price for transaction 1004: {price_1004}")


Unit Price for transaction 1004: 80


In [56]:
# Get 'Product Name' and 'Units Sold' for transactions 1005 through 1007

subset_loc = sales_df.loc[1005:1007, ['Product Name', 'Units Sold']]
print("\n--- Selecting specific rows and columns with .loc ---")
subset_loc


--- Selecting specific rows and columns with .loc ---


Transaction ID,Product Name,Units Sold
1005,Notebook,20
1006,Laptop,1
1007,Blender,2


## **3. Selecting Rows and Columns with .iloc (Integer Position-based)**
- **Syntax:** df.iloc[row_positions, column_positions]

In [55]:
# Select the first row (at integer position 0)

first_row = sales_df.iloc[0]
print("\n--- Selecting the first row (position 0) with .iloc ---")
print(first_row)


--- Selecting the first row (position 0) with .iloc ---
Date                 2023-01-15
Product Category    Electronics
Product Name             Laptop
Units Sold                    2
Unit Price                 1200
Name: 1001, dtype: object


In [54]:
# Select the last row (at integer position -1)

last_row = sales_df.iloc[[-1]]
print("\n--- Selecting the last row (position -1) with .iloc ---")
last_row


--- Selecting the last row (position -1) with .iloc ---


Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price
1007,2023-01-19,Home Goods,Blender,2,50


In [53]:
# Slice rows by integer position. Note: .iloc slicing is EXCLUSIVE of the end position.

row_slice_iloc = sales_df.iloc[1:4] # Rows at position 1, 2, 3
print("\n--- Slicing rows from position 1 to 4 (exclusive) with .iloc ---")
row_slice_iloc


--- Slicing rows from position 1 to 4 (exclusive) with .iloc ---


Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price
1002,2023-01-16,Office Supplies,Pen Set,10,15
1003,2023-01-16,Electronics,Mouse,5,25
1004,2023-01-17,Home Goods,Coffee Maker,1,80


In [52]:
# Select rows AND columns with .iloc
# Get the value at row position 2, column position 3

val = sales_df.iloc[2, 3] # Corresponds to Units Sold for transaction 1003
print(f"\nValue at [2, 3]: {val}")


Value at [2, 3]: 5
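
If you know the labels but need the positions (or want to double-check a lookup like `[2, 3]`), one option is `Index.get_loc`, which translates a label into its integer position. The sketch below assumes a unique index and rebuilds the first three sales rows inline so that it runs on its own.

```python
import pandas as pd

# Inline copy of the first three sales rows, so the sketch is self-contained
sales_df = pd.DataFrame(
    {
        "Date": ["2023-01-15", "2023-01-16", "2023-01-16"],
        "Product Category": ["Electronics", "Office Supplies", "Electronics"],
        "Product Name": ["Laptop", "Pen Set", "Mouse"],
        "Units Sold": [2, 10, 5],
        "Unit Price": [1200, 15, 25],
    },
    index=pd.Index([1001, 1002, 1003], name="Transaction ID"),
)

# Translate labels into integer positions (assumes the labels are unique)
row_pos = sales_df.index.get_loc(1003)
col_pos = sales_df.columns.get_loc("Units Sold")
print(row_pos, col_pos)  # 2 3

# Both accessors reach the same cell
by_position = sales_df.iloc[row_pos, col_pos]
by_label = sales_df.loc[1003, "Units Sold"]
print(by_position == by_label)  # True
```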


In [51]:
# Get the last 3 rows and the first 2 columns

subset_iloc = sales_df.iloc[-3:, :2]
print("\n--- Selecting last 3 rows and first 2 columns with .iloc ---")
subset_iloc


--- Selecting last 3 rows and first 2 columns with .iloc ---


Transaction ID,Date,Product Category
1005,2023-01-18,Office Supplies
1006,2023-01-18,Electronics
1007,2023-01-19,Home Goods


## **4. Conditional Selection (Boolean Indexing)**
Boolean indexing is one of the most powerful features in pandas. You build a boolean condition (a "mask") and pass it to .loc, which keeps only the rows where the mask is True.
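
Because the mask goes in the row slot of .loc, the column slot stays free for labels. The following sketch rebuilds the relevant sales columns inline (so it runs without the CSV) and filters rows and columns in a single call.

```python
import pandas as pd

# Inline copy of three sales columns, so the sketch is self-contained
sales_df = pd.DataFrame(
    {
        "Product Category": ["Electronics", "Office Supplies", "Electronics",
                             "Home Goods", "Office Supplies", "Electronics", "Home Goods"],
        "Product Name": ["Laptop", "Pen Set", "Mouse", "Coffee Maker",
                         "Notebook", "Laptop", "Blender"],
        "Unit Price": [1200, 15, 25, 80, 5, 1250, 50],
    },
    index=pd.Index([1001, 1002, 1003, 1004, 1005, 1006, 1007], name="Transaction ID"),
)

# Rows: boolean mask; columns: list of labels
is_electronics = sales_df["Product Category"] == "Electronics"
electronics_prices = sales_df.loc[is_electronics, ["Product Name", "Unit Price"]]
print(electronics_prices)
```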

In [48]:
# Step 1: Create a boolean Series (a "mask")

is_electronics = sales_df['Product Category'] == 'Electronics'
print("\n--- Boolean mask for 'Electronics' category ---")
print(is_electronics)


--- Boolean mask for 'Electronics' category ---
Transaction ID
1001     True
1002    False
1003     True
1004    False
1005    False
1006     True
1007    False
Name: Product Category, dtype: bool


In [49]:
# Step 2: Use the mask inside .loc[] to select the rows where the condition is True

electronics_df = sales_df.loc[is_electronics]
print("\n--- Selecting only 'Electronics' sales ---")
electronics_df


--- Selecting only 'Electronics' sales ---


Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price
1001,2023-01-15,Electronics,Laptop,2,1200
1003,2023-01-16,Electronics,Mouse,5,25
1006,2023-01-18,Electronics,Laptop,1,1250


In [50]:
# You can do this in one line

high_price_df = sales_df.loc[sales_df['Unit Price'] > 100]
print("\n--- Selecting sales with Unit Price > 100 ---")
high_price_df


--- Selecting sales with Unit Price > 100 ---


Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price
1001,2023-01-15,Electronics,Laptop,2,1200
1006,2023-01-18,Electronics,Laptop,1,1250


In [47]:
# Combining multiple conditions
# & for AND, | for OR. Each condition MUST be in parentheses.

laptops_or_high_price = sales_df.loc[(sales_df['Product Name'] == 'Laptop') | (sales_df['Unit Price'] > 60)]
print("\n--- Selecting Laptops OR items with Unit Price > 60 ---")
laptops_or_high_price


--- Selecting Laptops OR items with Unit Price > 60 ---


Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price
1001,2023-01-15,Electronics,Laptop,2,1200
1004,2023-01-17,Home Goods,Coffee Maker,1,80
1006,2023-01-18,Electronics,Laptop,1,1250
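
The parentheses matter because `&` and `|` bind more tightly than comparison operators such as `>` and `<`. The sketch below, using an inline copy of two sales columns, shows what happens without them: Python evaluates `5 & sales_df["Unit Price"]` first, is left with a chained comparison, and pandas raises a ValueError when asked for the truth value of a whole Series.

```python
import pandas as pd

# Inline copy of two numeric sales columns, so the sketch is self-contained
sales_df = pd.DataFrame(
    {
        "Units Sold": [2, 10, 5, 1, 20, 1, 2],
        "Unit Price": [1200, 15, 25, 80, 5, 1250, 50],
    },
    index=pd.Index([1001, 1002, 1003, 1004, 1005, 1006, 1007], name="Transaction ID"),
)

# Without parentheses this parses as:
#   sales_df["Units Sold"] > (5 & sales_df["Unit Price"]) < 100
try:
    sales_df.loc[sales_df["Units Sold"] > 5 & sales_df["Unit Price"] < 100]
    failed = False
except ValueError:
    failed = True
print("Unparenthesized version failed:", failed)

# With parentheses each comparison is evaluated first, then combined with &
cheap_bulk = sales_df.loc[(sales_df["Units Sold"] > 5) & (sales_df["Unit Price"] < 100)]
print(list(cheap_bulk.index))  # [1002, 1005]
```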


## **Exercises**
Use the sales_df DataFrame with Transaction ID as the index for all exercises.

**1. .loc Practice:**
- Select the row for Transaction ID 1006 and print it.
- Select the Date and Product Name for transactions 1002, 1004, and 1006.
- Select all rows from Transaction ID 1003 to 1006, but only the Product Category and Units Sold columns.

In [13]:
sales_df = pd.read_csv("simple_sales.csv", index_col="Transaction ID")
transaction_id1006 = sales_df.loc[1006]

print("\n--- Selecting row with index label 1006 ---")
print(transaction_id1006)


--- Selecting row with index label 1006 ---
Date                 2023-01-18
Product Category    Electronics
Product Name             Laptop
Units Sold                    1
Unit Price                 1250
Name: 1006, dtype: object


In [14]:
row_1002_1004_1006_date_product_name = sales_df.loc[[1002, 1004, 1006], ["Date", "Product Name"]]

print("\n--- Selecting the Date and Product Name for transactions 1002, 1004, and 1006. ---")
row_1002_1004_1006_date_product_name


--- Selecting the Date and Product Name for transactions 1002, 1004, and 1006. ---


Transaction ID,Date,Product Name
1002,2023-01-16,Pen Set
1004,2023-01-17,Coffee Maker
1006,2023-01-18,Laptop


In [15]:
subset = sales_df.loc[1003:1006, ["Product Category", "Units Sold"]]

print("\n--- Selecting all rows from Transaction ID 1003 to 1006 and Product Category and Units Sold columns ---")
subset


--- Selecting all rows from Transaction ID 1003 to 1006 and Product Category and Units Sold columns ---


Transaction ID,Product Category,Units Sold
1003,Electronics,5
1004,Home Goods,1
1005,Office Supplies,20
1006,Electronics,1


**2. .iloc Practice:**
- Select the row at the 4th position (integer index 3).
- Select the last 2 rows and the last 3 columns of the DataFrame.
- Select the element at the intersection of the 2nd row and the 3rd column.

In [18]:
fourth_row = sales_df.iloc[3]

print("\n--- Selecting the row at the 4th position (integer index 3) ---")
fourth_row


--- Selecting the row at the 4th position (integer index 3) ---


Date                  2023-01-17
Product Category      Home Goods
Product Name        Coffee Maker
Units Sold                     1
Unit Price                    80
Name: 1004, dtype: object

In [20]:
subset1 = sales_df.iloc[-2:, -3:]

print("\n--- Selecting the last 2 rows and the last 3 columns of the DataFrame ---")
subset1


--- Selecting the last 2 rows and the last 3 columns of the DataFrame ---


Transaction ID,Product Name,Units Sold,Unit Price
1006,Laptop,1,1250
1007,Blender,2,50


In [24]:
element = sales_df.iloc[1, 2]

print(f"Element at the intersection of the 2nd row and the 3rd column: {element}")

Element at the intersection of the 2nd row and the 3rd column: Pen Set


**3. Conditional Selection Challenge:**
- Select all sales from the 'Office Supplies' category.
- Select all sales where 'Units Sold' was greater than 5.
- Select all sales that were not Laptops. (Hint: use the != operator).
- Select all sales that were 'Electronics' and had a 'Unit Price' of less than $1000. Print the resulting DataFrame.

In [28]:
print("\n--- Selecting all sales from the 'Office Supplies' category ---")
sales_df[sales_df["Product Category"] == "Office Supplies"]


--- Selecting all sales from the 'Office Supplies' category ---


Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price
1002,2023-01-16,Office Supplies,Pen Set,10,15
1005,2023-01-18,Office Supplies,Notebook,20,5


In [32]:
print("\n--- Selecting all sales where 'Units Sold' was greater than 5 ---")
sales_df[sales_df["Units Sold"] > 5]


--- Selecting all sales where 'Units Sold' was greater than 5 ---


Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price
1002,2023-01-16,Office Supplies,Pen Set,10,15
1005,2023-01-18,Office Supplies,Notebook,20,5


In [34]:
print("\n--- Selecting all sales that were not Laptops ---")
sales_df[sales_df["Product Name"] != "Laptop"]


--- Selecting all sales that were not Laptops ---


Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price
1002,2023-01-16,Office Supplies,Pen Set,10,15
1003,2023-01-16,Electronics,Mouse,5,25
1004,2023-01-17,Home Goods,Coffee Maker,1,80
1005,2023-01-18,Office Supplies,Notebook,20,5
1007,2023-01-19,Home Goods,Blender,2,50


In [38]:
result = sales_df.loc[(sales_df["Product Category"] == "Electronics") & (sales_df["Unit Price"] < 1000)]
# Print the label and the DataFrame separately so the header row is not pushed out of alignment
print("\nAll sales that were 'Electronics' and had a 'Unit Price' of less than $1000:")
print(result)


All sales that were 'Electronics' and had a 'Unit Price' of less than $1000:
                      Date Product Category Product Name  Units Sold  \
Transaction ID                                                         
1003            2023-01-16      Electronics        Mouse           5   

                Unit Price  
Transaction ID              
1003                    25  
