Data selection in pandas is fundamental for effective data analysis, enabling access to specific columns, rows, and elements of a DataFrame. 

## Bracket Selection ([])

**Selecting Columns**   
- Use single brackets with one column label to select a single column (result: Series): *df['ColumnName']*  

- Use double brackets, that is, a list of labels inside the brackets, to select multiple columns (result: DataFrame): *df[['Col1', 'Col2']]*  

Plain brackets cannot select rows and columns in the same call; use .loc[] or .iloc[] for that.  
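
As a minimal sketch of the difference between single and double brackets, the snippet below uses a small made-up DataFrame (the names `df`, `Name`, `Height`, and `Age` are illustrative assumptions, not part of the CSV used in this notebook):

```python
import pandas as pd

# Hypothetical toy data, not the CSV file used in this notebook
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Height": [1.6, 2.1, 1.8],
    "Age": [30, 41, 25],
})

single = df["Height"]           # one label -> Series
as_frame = df[["Height"]]       # list with one label -> one-column DataFrame
several = df[["Name", "Age"]]   # list of labels -> DataFrame

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
print(several.shape)            # (3, 2)
```

Note that a list with a single label still yields a DataFrame, which is why the result type depends on the brackets rather than on the number of columns.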

**Selecting Rows**  
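- Use slicing (works only on rows, not columns): *df[0:4]*   
Returns the rows at positions 0, 1, 2, and 3; the end of the slice is excluded. With the default RangeIndex, these are also the rows labelled 0 to 3.  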
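
To illustrate that an integer slice in plain brackets counts row positions, here is a small sketch with a made-up DataFrame whose index labels are strings (the data and the name `first_four` are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical toy data with string row labels instead of 0, 1, 2, ...
df = pd.DataFrame(
    {"Height": [1.6, 2.1, 1.8, 1.7, 2.3]},
    index=["a", "b", "c", "d", "e"],
)

first_four = df[0:4]               # positions 0..3, end position excluded
print(first_four.index.tolist())   # ['a', 'b', 'c', 'd']
```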
  
**Boolean Masking** (Filtering Rows)  
- Select rows where a condition is true:  *df[df['Height'] > 2]*  
Results in a DataFrame of all rows where the 'Height' column exceeds 2.  
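
A boolean mask is itself a Series of True/False values, one per row. The sketch below, again on made-up data with an assumed 'Height' column, builds the mask first and then applies it:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Di"],
    "Height": [1.6, 2.1, 1.8, 2.4],
})

mask = df["Height"] > 2         # boolean Series aligned with df's index
print(mask.tolist())            # [False, True, False, True]

tall = df[mask]                 # keeps only the rows where mask is True
print(tall["Name"].tolist())    # ['Bob', 'Di']
```

The original row labels are preserved in the result, which matches the filtered output later in this notebook where the labels start at 12 rather than 0.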



In [3]:
import pandas as pd

# Loading data from a file
table = pd.read_csv("test-csv-file.csv")
table.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24 entries, 0 to 23
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Year                    24 non-null     object 
 1   Age                     24 non-null     object 
 2   Gender                  24 non-null     object 
 3   Educational Attainment  24 non-null     object 
 4   Personal Income         24 non-null     object 
 5   Population Count        20 non-null     float64
dtypes: float64(1), object(5)
memory usage: 1.3+ KB


In [15]:
# Keep only the first five rows to shorten the output
tableHead = table.head(5)

# Selecting a single column; double brackets return a one-column DataFrame
tableHead[['Educational Attainment']]

|   | Educational Attainment |
|---|---|
| 0 | Children under 15 |
| 1 | No high school diploma |
| 2 | No high school diploma |
| 3 | No high school diploma |
| 4 | No high school diploma |


In [13]:
# Selecting multiple columns
tableHead[['Age', 'Educational Attainment']]

|   | Age | Educational Attainment |
|---|---|---|
| 0 | 00 to 17 | Children under 15 |
| 1 | 00 to 17 | No high school diploma |
| 2 | 00 to 17 | No high school diploma |
| 3 | 00 to 17 | No high school diploma |
| 4 | 00 to 17 | No high school diploma |


In [12]:
# Selecting rows
table[0:4]

|   | Year | Age | Gender | Educational Attainment | Personal Income | Population Count |
|---|---|---|---|---|---|---|
| 0 | 01/01/2008 12:00:00 AM | 00 to 17 | Male | Children under 15 | No Income | NaN |
| 1 | 01/01/2008 12:00:00 AM | 00 to 17 | Male | No high school diploma | No Income | 650889.0 |
| 2 | 01/01/2008 12:00:00 AM | 00 to 17 | Male | No high school diploma | $5,000 to $9,999 | 30152.0 |
| 3 | 01/01/2008 12:00:00 AM | 00 to 17 | Male | No high school diploma | $10,000 to $14,999 | 7092.0 |


In [11]:
# Boolean masking
table[table['Gender'] == 'Female']

|   | Year | Age | Gender | Educational Attainment | Personal Income | Population Count |
|---|---|---|---|---|---|---|
| 12 | 01/01/2008 12:00:00 AM | 00 to 17 | Female | Children under 15 | No Income | NaN |
| 13 | 01/01/2008 12:00:00 AM | 00 to 17 | Female | No high school diploma | No Income | 635274.0 |
| 14 | 01/01/2008 12:00:00 AM | 00 to 17 | Female | No high school diploma | $5,000 to $9,999 | 33202.0 |
| 15 | 01/01/2008 12:00:00 AM | 00 to 17 | Female | No high school diploma | $10,000 to $14,999 | 6857.0 |
| 16 | 01/01/2008 12:00:00 AM | 00 to 17 | Female | No high school diploma | $15,000 to $24,999 | 2009.0 |
| 17 | 01/01/2008 12:00:00 AM | 00 to 17 | Female | High school or equivalent | No Income | 4711.0 |
| 18 | 01/01/2008 12:00:00 AM | 00 to 17 | Female | High school or equivalent | $5,000 to $9,999 | 7672.0 |
| 19 | 01/01/2008 12:00:00 AM | 00 to 17 | Female | Some college, less than 4-yr degree | No Income | 7598.0 |
| 20 | 01/01/2008 12:00:00 AM | 00 to 17 | Female | Some college, less than 4-yr degree | $5,000 to $9,999 | 1565.0 |


## Label-Based Selection (.loc[])
The .loc[] indexer selects data by explicit row and column labels.

    df.loc[row_selector, column_selector]

**row_selector**: row label, list of labels, label slice, or boolean mask  
**column_selector**: column label, list of labels, or label slice  
Unlike Python slices, label slices in .loc[] include both endpoints.  
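
The selector types listed above can be sketched on a small made-up DataFrame (the labels `w` to `z` and columns `A` to `C` are illustrative assumptions). The slice example also shows that label slices in .loc[] include both endpoints:

```python
import pandas as pd

# Hypothetical toy data with string row labels
df = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]},
    index=["w", "x", "y", "z"],
)

scalar = df.loc["x", "B"]             # single label on each axis -> scalar
block = df.loc["x":"y", "A":"B"]      # label slices include both endpoints
picked = df.loc[df["A"] > 2, ["C"]]   # boolean mask for rows, list for columns

print(scalar)                  # 6
print(block.shape)             # (2, 2)
print(picked.index.tolist())   # ['y', 'z']
```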


In [20]:
# Select row with label 5 and specific columns
table.loc[5, ['Educational Attainment', 'Personal Income']]


Educational Attainment    No high school diploma
Personal Income               $25,000 to $34,999
Name: 5, dtype: object

In [21]:
# Select specific rows from a single column (returns a Series)
table.loc[[2, 4, 5], 'Educational Attainment']

2    No high school diploma
4    No high school diploma
5    No high school diploma
Name: Educational Attainment, dtype: object

In [22]:
# Select all rows and specific columns
tableHead.loc[:, ['Personal Income']]

|   | Personal Income |
|---|---|
| 0 | No Income |
| 1 | No Income |
| 2 | $5,000 to $9,999 |
| 3 | $10,000 to $14,999 |
| 4 | $15,000 to $24,999 |


In [31]:
# Boolean mask with loc
table.loc[table['Gender'] == 'Female', 
          ['Educational Attainment', 'Personal Income', 'Population Count']]

|   | Educational Attainment | Personal Income | Population Count |
|---|---|---|---|
| 12 | Children under 15 | No Income | NaN |
| 13 | No high school diploma | No Income | 635274.0 |
| 14 | No high school diploma | $5,000 to $9,999 | 33202.0 |
| 15 | No high school diploma | $10,000 to $14,999 | 6857.0 |
| 16 | No high school diploma | $15,000 to $24,999 | 2009.0 |
| 17 | High school or equivalent | No Income | 4711.0 |
| 18 | High school or equivalent | $5,000 to $9,999 | 7672.0 |
| 19 | Some college, less than 4-yr degree | No Income | 7598.0 |
| 20 | Some college, less than 4-yr degree | $5,000 to $9,999 | 1565.0 |


## Integer-Based Selection (.iloc[])
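The .iloc[] indexer selects data by zero-based integer position, as with Python lists.

    df.iloc[row_position, column_position]

Both selectors can be integers, lists of integers, or slices. As in Python, slices exclude the end position and negative integers count from the end.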
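
A short sketch of positional selection on made-up data (the DataFrame below is an assumption for illustration). The row labels are strings here to make clear that .iloc[] ignores labels and counts positions only:

```python
import pandas as pd

# Hypothetical toy data with string row labels
df = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]},
    index=["w", "x", "y", "z"],
)

value = df.iloc[1, 2]               # second row, third column -> scalar
top_left = df.iloc[0:2, 0:2]        # end positions are excluded
corners = df.iloc[[0, -1], [0, 2]]  # first and last row, columns A and C

print(value)                        # 10
print(top_left.index.tolist())      # ['w', 'x']
print(corners.values.tolist())      # [[1, 9], [4, 12]]
```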

In [25]:
# Select first row, first column
table.iloc[0, 0]

'01/01/2008 12:00:00 AM'

In [26]:
# Select first three rows, first two columns
table.iloc[0:3, 0:2]

|   | Year | Age |
|---|---|---|
| 0 | 01/01/2008 12:00:00 AM | 00 to 17 |
| 1 | 01/01/2008 12:00:00 AM | 00 to 17 |
| 2 | 01/01/2008 12:00:00 AM | 00 to 17 |


In [28]:
# Select rows at positions 0 and 2, column at position 5
table.iloc[[0,2], [5]]

|   | Population Count |
|---|---|
| 0 | NaN |
| 2 | 30152.0 |


In [32]:
# Select the value in the last row and last column
table.iloc[-1, -1]

np.float64(317119.0)

- Always prefer .loc[] and .iloc[] for complex or two-dimensional selections.

- Bracket selection is quick for pulling columns, slices, or filtering with a mask.

- Selectors can be combined, for example a boolean mask for rows with a list of labels for columns inside .loc[], to extract exactly the data needed in one readable step.
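
As a rough illustration of the first point, the sketch below compares two chained bracket selections with a single .loc[] call on made-up data (all names are assumptions for illustration). Both return the same rows and columns; the .loc[] form states the two-dimensional selection in one place:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Di"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "Height": [1.6, 2.1, 1.8, 2.4],
})

mask = df["Gender"] == "Female"

# Two bracket selections chained: filter rows, then pick columns
two_step = df[mask][["Name", "Height"]]

# One two-dimensional selection with .loc[]
one_step = df.loc[mask, ["Name", "Height"]]

print(one_step.equals(two_step))   # True
```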

With these three approaches and the examples above, data selection in pandas should be clear enough to carry into further exploration and real-world analysis tasks.

In [30]:
# Sources:
# [1](https://www.programiz.com/python-programming/pandas/select)
# [2](https://mode.com/python-tutorial/pandas-dataframe/)
# [3](https://www.dataquest.io/blog/tutorial-indexing-dataframes-in-pandas/)
# [4](https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html)
# [5](https://pandas.pydata.org/docs/user_guide/indexing.html)
# [6](https://www.geeksforgeeks.org/pandas/indexing-and-selecting-data-with-pandas/)
# [7](https://datacarpentry.github.io/python-ecology-lesson/03-index-slice-subset.html)
# [8](https://itnext.io/a-guide-to-efficient-data-selection-in-pandas-ea6dab640604)
# [9](https://www.geeksforgeeks.org/slicing-indexing-manipulating-and-cleaning-pandas-dataframe/)
# [10](https://data.ca.gov/dataset/cea8cd18-9d21-4676-85de-d504ee2d4aab/resource/26201f19-4469-4311-a819-bbbd3e557eda)