# 🐼 Pandas Handbook

## 04 - Data Selection

Check out the official [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/)  

This notebook uses the [Ramen Ratings dataset](https://www.kaggle.com/datasets/residentmario/ramen-ratings/data) from Kaggle to demonstrate how to select data with pandas.

## 📚 Table of Contents  
---  

🔢 **Set & Reset Index**  
🎯 **Select Data by Label or Position**  
🎯 **Select a Single Value**  
🎯 **Select Multiple Rows & Columns**  
✂️ **Slice the DataFrame**  
🔢 **Slice by Row Position or Column Name**  
🏷️ **Slice by Label with df.loc[]**  
🔍 **Select with Conditions**  
🧠 **Logical Operators in Pandas**  
🔍 **Select with Query**  
🔍 **Select with Regex**  
👉 **Next Topic: Data Cleaning**  

---

In [1]:
import pandas as pd
import os

In [2]:
data_raw = "../data/raw/"
csv_file = "ramen-ratings.csv"
import_path = os.path.join(data_raw, csv_file)
df = pd.read_csv(import_path)

### 🔢 Set & Reset Index

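`df.set_index('COLUMN', inplace=True)` – Sets the specified `'COLUMN'` as the DataFrame’s new index.  
`df.reset_index(inplace=True)` – Resets the index to default integers and moves the current index back to a column.  
With `inplace=False` (the default), both methods leave `df` unchanged and return a new DataFrame. This is why `In [4]` shows the reset result while `df` keeps `Review #` as its index in `In [5]`.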

In [3]:
df.set_index('Review #', inplace=True)

In [4]:
df.reset_index(inplace=False).head()

|   | Review # | Brand | Variety | Style | Country | Stars | Top Ten |
|---|---|---|---|---|---|---|---|
| 0 | 2580 | New Touch | T's Restaurant Tantanmen | Cup | Japan | 3.75 |  |
| 1 | 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack | Taiwan | 1.0 |  |
| 2 | 2578 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2.25 |  |
| 3 | 2577 | Wei Lih | GGE Ramen Snack Tomato Flavor | Pack | Taiwan | 2.75 |  |
| 4 | 2576 | Ching's Secret | Singapore Curry | Pack | India | 3.75 |  |


In [5]:
df.head()

| Review # | Brand | Variety | Style | Country | Stars | Top Ten |
|---|---|---|---|---|---|---|
| 2580 | New Touch | T's Restaurant Tantanmen | Cup | Japan | 3.75 |  |
| 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack | Taiwan | 1.0 |  |
| 2578 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2.25 |  |
| 2577 | Wei Lih | GGE Ramen Snack Tomato Flavor | Pack | Taiwan | 2.75 |  |
| 2576 | Ching's Secret | Singapore Curry | Pack | India | 3.75 |  |

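The two index methods can be tried on a small toy DataFrame (made-up values, not from the ramen dataset) to see what `inplace=False` returns and what `drop=True` does. This is a minimal sketch; the names `ramen`, `indexed`, `restored` and `dropped` are hypothetical.

```python
import pandas as pd

# Toy data with made-up values, standing in for the ramen ratings
ramen = pd.DataFrame({
    "Review #": [3, 2, 1],
    "Brand": ["Brand A", "Brand B", "Brand C"],
    "Stars": [3.75, 1.0, 2.25],
})

# Without inplace=True, set_index returns a new DataFrame; `ramen` is unchanged
indexed = ramen.set_index("Review #")
print(indexed.index.tolist())      # [3, 2, 1]

# reset_index moves the index back into a regular column...
restored = indexed.reset_index()
print(restored.columns.tolist())   # ['Review #', 'Brand', 'Stars']

# ...while drop=True discards the old index instead of keeping it as a column
dropped = indexed.reset_index(drop=True)
print(dropped.columns.tolist())    # ['Brand', 'Stars']
```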

### 🎯 Select Data by Label or Position

`df.iloc[row_index, column_index]` – Selects data by **integer position** for rows and columns.  
`df.iat[row_index, column_index]` – Accesses a **single value** by position.  
`df.loc[index_label, column_label]` – Selects data by **label** for index and columns.  
`df.at[index_label, column_label]` – Accesses a **single value** by label.  

Use `df.loc[]` and `df.iloc[]` to select rows and columns by label or position.  
Use `df.at[]` and `df.iat[]` for fast access to a single value.  

`df.iloc[[row1, row2], [col1, col2]]` – Selects specific rows and columns by **position**.  
`df.loc[[index_label1, index_label2], ['col1', 'col2']]` – Selects specific rows and columns by **label**.
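
The difference between label and position matters most when the index itself holds integers, as `Review #` does. The sketch below uses a toy DataFrame (hypothetical values, named `toy` so it does not overwrite the notebook's `df`) with a descending integer index.

```python
import pandas as pd

# Toy DataFrame with a descending integer index, similar in shape to 'Review #'
toy = pd.DataFrame(
    {"Brand": ["A", "B", "C", "D"], "Style": ["Cup", "Pack", "Cup", "Tray"]},
    index=pd.Index([40, 30, 20, 10], name="Review #"),
)

# iloc counts positions from 0, regardless of the index values
by_position = toy.iloc[1, 0]       # 2nd row, 1st column -> 'B'

# loc looks up the index label 10, which happens to be the last row
by_label = toy.loc[10, "Brand"]    # -> 'D'

# toy.loc[1, "Brand"] would raise a KeyError: 1 is a position here, not a label
```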

#### 🎯 Select a Single Value

Locating data by integer position `[row, column]`

In [6]:
df.iloc[0, 0]

'New Touch'

In [7]:
df.iat[0, 0]

'New Touch'

Locating data by index and column name `[index, column_name]`

In [8]:
df.loc[2580, 'Variety']

"T's Restaurant Tantanmen "

In [9]:
df.at[2580, 'Variety']

"T's Restaurant Tantanmen "
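
Besides reading, `.at[]` and `.iat[]` can also write a single cell. A minimal sketch on a toy DataFrame with made-up values:

```python
import pandas as pd

toy = pd.DataFrame(
    {"Variety": ["Tantanmen", "Curry"], "Stars": [3.75, 4.0]},
    index=pd.Index([2, 1], name="Review #"),
)

# .at / .iat return a plain scalar, not a Series or DataFrame
value = toy.at[2, "Variety"]    # label-based    -> 'Tantanmen'
same = toy.iat[0, 0]            # position-based -> 'Tantanmen'

# They can also assign a single cell in place
toy.at[1, "Stars"] = 4.5
```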

#### 🎯 Select Multiple Rows & Columns

Pass lists of row and column positions to `.iloc[]`, or lists of index and column labels to `.loc[]`, to select several rows and columns at once.

In [10]:
df.iloc[[0, 1], [0, 1, 2]]

| Review # | Brand | Variety | Style |
|---|---|---|---|
| 2580 | New Touch | T's Restaurant Tantanmen | Cup |
| 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack |


In [11]:
df.loc[[2580, 2579], ['Brand', 'Variety', 'Style']]

| Review # | Brand | Variety | Style |
|---|---|---|---|
| 2580 | New Touch | T's Restaurant Tantanmen | Cup |
| 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack |

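`.iloc[]` and `.loc[]` cannot mix positions and labels in one call. One way around this, sketched here on a toy DataFrame with hypothetical values, is to translate one side into the other first with `Index.get_indexer` or by indexing `df.index`.

```python
import pandas as pd

toy = pd.DataFrame(
    {"Brand": ["A", "B", "C"], "Variety": ["V1", "V2", "V3"], "Style": ["Cup", "Pack", "Bowl"]},
    index=pd.Index([30, 20, 10], name="Review #"),
)

# Option 1: rows by position, column names translated to positions
cols = toy.columns.get_indexer(["Brand", "Style"])   # array([0, 2])
subset_iloc = toy.iloc[[0, 1], cols]

# Option 2: row positions translated to labels, then a purely label-based .loc
subset_loc = toy.loc[toy.index[[0, 1]], ["Brand", "Style"]]

print(subset_iloc.equals(subset_loc))  # True
```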

### ✂️ Slice the DataFrame

#### 🔢 Slice by Row Position or Column Name
Use Python slicing syntax to select ranges of rows, and bracket or dot notation to select columns.

`df[start:stop]` – Slices rows by position (stop is exclusive).  
`df['COLUMN']` – Selects a single column by name (returns a Series).  
`df[['COLUMN_1', 'COLUMN_2']]` – Selects multiple columns by name.  
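`df.get(['COLUMN_1', 'COLUMN_2'])` – Safely selects multiple columns by name, returning `None` instead of raising a `KeyError` if a column is missing.  
`df.COLUMN` – Dot notation for selecting a single column; works only if the name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.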

In [12]:
df[10:13]

| Review # | Brand | Variety | Style | Country | Stars | Top Ten |
|---|---|---|---|---|---|---|
| 2570 | Tao Kae Noi | Creamy tom Yum Kung Flavour | Pack | Thailand | 5.0 |  |
| 2569 | Yamachan | Yokohama Tonkotsu Shoyu | Pack | USA | 5.0 |  |
| 2568 | Nongshim | Mr. Bibim Stir-Fried Kimchi Flavor | Pack | South Korea | 4.25 |  |


In [13]:
df['Country'].head()

Review #
2580     Japan
2579    Taiwan
2578       USA
2577    Taiwan
2576     India
Name: Country, dtype: object

In [14]:
df[['Brand', 'Variety', 'Style']].head()

| Review # | Brand | Variety | Style |
|---|---|---|---|
| 2580 | New Touch | T's Restaurant Tantanmen | Cup |
| 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack |
| 2578 | Nissin | Cup Noodles Chicken Vegetable | Cup |
| 2577 | Wei Lih | GGE Ramen Snack Tomato Flavor | Pack |
| 2576 | Ching's Secret | Singapore Curry | Pack |


In [15]:
df.get(['Brand', 'Variety', 'Style']).head()

| Review # | Brand | Variety | Style |
|---|---|---|---|
| 2580 | New Touch | T's Restaurant Tantanmen | Cup |
| 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack |
| 2578 | Nissin | Cup Noodles Chicken Vegetable | Cup |
| 2577 | Wei Lih | GGE Ramen Snack Tomato Flavor | Pack |
| 2576 | Ching's Secret | Singapore Curry | Pack |


In [16]:
df.Country.head()

Review #
2580     Japan
2579    Taiwan
2578       USA
2577    Taiwan
2576     India
Name: Country, dtype: object
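
The practical difference between `df.get()`, brackets and dot notation shows up with missing or awkwardly named columns. A sketch on a toy DataFrame; the columns `Price` (absent on purpose) and `count` are hypothetical and exist only to trigger the edge cases.

```python
import pandas as pd

toy = pd.DataFrame({"Brand": ["A", "B"], "Top Ten": [None, None], "count": [5, 7]})

# toy[["Brand", "Price"]] would raise KeyError; .get returns a default instead
missing = toy.get(["Brand", "Price"])          # None
fallback = toy.get("Price", default="n/a")     # 'n/a'

# Dot notation needs a valid identifier...
brands = toy.Brand                             # works
top_ten = toy["Top Ten"]                       # space in the name: brackets required

# ...that does not clash with a DataFrame attribute or method
print(callable(toy.count))                     # True: this is DataFrame.count, not the column
counts = toy["count"]                          # brackets always reach the column
```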

#### 🏷️ Slice by Label with `df.loc[]`
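Use `.loc[]` with labels to slice rows or columns. **Note: unlike positional slicing, the end label is inclusive.** Label slices also follow the order of the index; `Review #` appears to be stored in descending order here, so the start label (2580) is larger than the end label (2579).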

`df.loc[start_label:end_label, 'COLUMN_1':'COLUMN_3']` – Slices rows and columns by label.  
`df.loc[:, ['COLUMN_1', 'COLUMN_2']]` – Selects all rows and specific columns by name.

In [17]:
df.loc[2580:2579, 'Brand':'Style']

| Review # | Brand | Variety | Style |
|---|---|---|---|
| 2580 | New Touch | T's Restaurant Tantanmen | Cup |
| 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack |


In [18]:
df.loc[:, ['Brand', 'Variety', 'Style']].head(3)

| Review # | Brand | Variety | Style |
|---|---|---|---|
| 2580 | New Touch | T's Restaurant Tantanmen | Cup |
| 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack |
| 2578 | Nissin | Cup Noodles Chicken Vegetable | Cup |

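Inclusive ends and index order can be checked side by side on a toy DataFrame whose descending integer index (5 down to 1, hypothetical values) mimics `Review #`:

```python
import pandas as pd

toy = pd.DataFrame(
    {"Brand": list("ABCDE"), "Variety": list("vwxyz"), "Style": ["Cup"] * 5},
    index=pd.Index([5, 4, 3, 2, 1], name="Review #"),
)

# Label slices include both endpoints and follow the index order (descending here)
inclusive = toy.loc[4:2, "Brand":"Variety"]   # rows 4, 3, 2; columns Brand, Variety
wrong_way = toy.loc[2:4]                      # empty: label 2 comes after 4 in this index

# Positional slices exclude the stop position
positional = toy.iloc[1:3]                    # positions 1 and 2 -> labels 4, 3
```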

### 🔍 Select with Conditions

`df.loc[:, df.columns.str.contains('pattern')]` – Selects all columns whose name contains a match for the regex `'pattern'`.  
`df[df['COLUMN'].isin(['VALUE_1', 'VALUE_2'])]` – Filters rows where the column value is in the given list.  
`df[df['COLUMN'] == 'VALUE']` – Selects rows where the column equals `'VALUE'`.  
`df[df['COLUMN'] > NUM]` – Filters rows based on a numeric comparison. The column must hold numbers; `Stars` is stored as text in this dataset, so `In [22]` converts it with `.astype(float)` first.

In [19]:
df.loc[:, df.columns.str.contains('y')].head()

| Review # | Variety | Style | Country |
|---|---|---|---|
| 2580 | T's Restaurant Tantanmen | Cup | Japan |
| 2579 | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack | Taiwan |
| 2578 | Cup Noodles Chicken Vegetable | Cup | USA |
| 2577 | GGE Ramen Snack Tomato Flavor | Pack | Taiwan |
| 2576 | Singapore Curry | Pack | India |


In [20]:
df[df['Country'].isin(['Japan', 'South Korea'])].head()

| Review # | Brand | Variety | Style | Country | Stars | Top Ten |
|---|---|---|---|---|---|---|
| 2580 | New Touch | T's Restaurant Tantanmen | Cup | Japan | 3.75 |  |
| 2575 | Samyang Foods | Kimchi song Song Ramen | Pack | South Korea | 4.75 |  |
| 2574 | Acecook | Spice Deli Tantan Men With Cilantro | Cup | Japan | 4.0 |  |
| 2573 | Ikeda Shoku | Nabeyaki Kitsune Udon | Tray | Japan | 3.75 |  |
| 2572 | Ripe'n'Dry | Hokkaido Soy Sauce Ramen | Pack | Japan | 0.25 |  |


In [21]:
brand_yamachan = df[df['Brand'] == 'Yamachan']
brand_yamachan.head()

| Review # | Brand | Variety | Style | Country | Stars | Top Ten |
|---|---|---|---|---|---|---|
| 2569 | Yamachan | Yokohama Tonkotsu Shoyu | Pack | USA | 5.0 |  |
| 2563 | Yamachan | Tokyo Shoyu Ramen | Pack | USA | 5.0 |  |
| 2557 | Yamachan | Sapporo Miso Ramen | Pack | USA | 4.75 |  |
| 936 | Yamachan | Mild Tonkotsu | Pack | USA | 5.0 |  |
| 889 | Yamachan | Rich Shoyu Ramen | Pack | USA | 4.0 |  |


In [22]:
brand_yamachan[brand_yamachan['Stars'].astype(float) > 4.5]

| Review # | Brand | Variety | Style | Country | Stars | Top Ten |
|---|---|---|---|---|---|---|
| 2569 | Yamachan | Yokohama Tonkotsu Shoyu | Pack | USA | 5.0 |  |
| 2563 | Yamachan | Tokyo Shoyu Ramen | Pack | USA | 5.0 |  |
| 2557 | Yamachan | Sapporo Miso Ramen | Pack | USA | 4.75 |  |
| 936 | Yamachan | Mild Tonkotsu | Pack | USA | 5.0 |  |
| 886 | Yamachan | Tonkotsu-Shoyu Rich Pork Flavor Ramen | Pack | USA | 4.75 |  |
| 885 | Yamachan | Miso Ramen Rich Sapporo Miso | Pack | USA | 5.0 |  |

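The column-name filter `df.columns.str.contains()` has a row-wise counterpart on cell values. A minimal sketch on toy data (made-up varieties, including one missing value):

```python
import pandas as pd

toy = pd.DataFrame({
    "Brand": ["A", "B", "C", "D"],
    "Variety": ["Tonkotsu Shoyu", "Miso Ramen", None, "Rich TONKOTSU"],
})

# Match against cell values instead of column names.
# case=False ignores letter case; na=False turns missing values into False,
# because a mask containing NaN cannot be used for boolean indexing
mask = toy["Variety"].str.contains("tonkotsu", case=False, na=False)
tonkotsu = toy[mask]
```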

### 🧠 Logical Operators in Pandas
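Comparison operators build boolean masks; `&`, `|`, and `~` combine or negate those masks element-wise. Wrap each comparison in parentheses, because `&` and `|` bind more tightly than comparisons. Inside `.query()`, the keywords `and`, `or`, and `not` can be used as well.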

`<` – Less than  
`>` – Greater than  
`==` – Equals  
`<=` – Less than or equals  
`>=` – Greater than or equals  
`!=` – Not equal to  
`&` – And  
`|` – Or  
`~` – Not  
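
A sketch of combining masks with `&`, `|` and `~`, using toy data with made-up ratings:

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["Japan", "USA", "Japan", "Taiwan"],
    "Stars": [4.75, 5.0, 3.0, 4.0],
})

# Each comparison sits in its own parentheses: & and | bind tighter than == and >=
japan_and_high = toy[(toy["Country"] == "Japan") & (toy["Stars"] >= 4.5)]
japan_or_high = toy[(toy["Country"] == "Japan") | (toy["Stars"] >= 4.5)]
not_japan = toy[~(toy["Country"] == "Japan")]   # same rows as != 'Japan'

# Python's `and` / `or` are not element-wise on a Series and raise a ValueError
```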

### 🔍 Select with Query

`df.query("COLUMN == 'VALUE' and COLUMN > NUM")` – Filters rows with a readable, SQL-like expression string. Column names containing spaces must be wrapped in backticks.  
`df.query("COLUMN in ['VALUE_1', 'VALUE_2']")` – Filters rows where the column matches a list of values.  

In [23]:
query_df = df.copy()
query_df['Stars'] = query_df['Stars'].replace('Unrated', 0).astype(float)
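
Replacing `'Unrated'` with `0` makes unrated products look like the lowest possible rating, so they would show up in any low-star filter. If you would rather treat them as missing, one alternative is `pd.to_numeric` with `errors='coerce'`, sketched here on a toy series of star strings (made-up values):

```python
import pandas as pd

# Toy 'Stars' values read as text, including the 'Unrated' marker
stars = pd.Series(["3.75", "Unrated", "0.25", "5"])

# errors='coerce' converts anything that is not a number into NaN
numeric = pd.to_numeric(stars, errors="coerce")

# NaN fails every comparison, so unrated rows drop out of star filters
high = numeric[numeric >= 4.5]
low = numeric[numeric < 1]   # only the genuine 0.25 rating, not 'Unrated'
```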

In [24]:
query_df.query("Country == 'Japan' and Stars >= 4.5").head()

| Review # | Brand | Variety | Style | Country | Stars | Top Ten |
|---|---|---|---|---|---|---|
| 2567 | Nissin | Deka Buto Kimchi Pork Flavor | Bowl | Japan | 4.5 |  |
| 2553 | Nissin | Hakata Ramen Noodle White Tonkotsu | Bowl | Japan | 4.75 |  |
| 2532 | Nissin | Nippon Onomichi Ramen | Bowl | Japan | 4.75 |  |
| 2529 | Maruchan | Gotsumori Chanpon Ramen | Bowl | Japan | 4.5 |  |
| 2522 | Takamori | Hearty Japanese Style Curry Udon | Pack | Japan | 5.0 |  |


In [25]:
query_df.query("Country in ['Japan', 'Taiwan']").head()

| Review # | Brand | Variety | Style | Country | Stars | Top Ten |
|---|---|---|---|---|---|---|
| 2580 | New Touch | T's Restaurant Tantanmen | Cup | Japan | 3.75 |  |
| 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack | Taiwan | 1.0 |  |
| 2577 | Wei Lih | GGE Ramen Snack Tomato Flavor | Pack | Taiwan | 2.75 |  |
| 2574 | Acecook | Spice Deli Tantan Men With Cilantro | Cup | Japan | 4.0 |  |
| 2573 | Ikeda Shoku | Nabeyaki Kitsune Udon | Tray | Japan | 3.75 |  |

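Two more `query()` features help once expressions get longer: `@name` refers to a Python variable, and backticks quote column names with spaces. A sketch on toy data; `Noodle Type` is a hypothetical column used only to show a name with a space, and `min_stars` is an illustrative variable.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["Japan", "USA", "Japan", "Taiwan"],
    "Stars": [4.75, 5.0, 3.0, 4.0],
    "Noodle Type": ["Udon", "Ramen", "Ramen", "Udon"],
})

min_stars = 4.0

# @min_stars pulls in the local variable; backticks quote the column with a space
picked = toy.query("Stars >= @min_stars and `Noodle Type` == 'Udon'")

# 'not in' is the negated form of the list membership test
others = toy.query("Country not in ['Japan', 'USA']")
```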

### 🔍 Select with Regex

`df.filter(regex='pattern')` – Selects columns whose names contain a match for the given regular expression (case-sensitive). Pass `axis=0` to filter row labels instead.

In [26]:
df.filter(regex='C').head()

| Review # | Country |
|---|---|
| 2580 | Japan |
| 2579 | Taiwan |
| 2578 | USA |
| 2577 | Taiwan |
| 2576 | India |

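`filter()` also accepts `like=` (plain substring) and `items=` (exact names). A sketch comparing the three on a one-row toy DataFrame with made-up values; `Price` is deliberately absent.

```python
import pandas as pd

toy = pd.DataFrame(
    {"Brand": ["A"], "Variety": ["V"], "Style": ["Cup"], "Country": ["Japan"], "Stars": [4.0]},
    index=pd.Index([1], name="Review #"),
)

# regex= searches anywhere in the name unless anchored; '^S' means "starts with S"
starts_with_s = toy.filter(regex="^S")           # Style, Stars
# like= is a plain substring test, no regex involved
has_y = toy.filter(like="y")                     # Variety, Style, Country
# items= keeps the listed columns that exist and silently skips the rest
chosen = toy.filter(items=["Brand", "Price"])    # Brand only
```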

### 👉 Next Topic: [Data Cleaning](./05-data-cleaning.ipynb)

Learn how to clean data with pandas.