#### Data Selection and Filtering
Selecting the right rows and columns is the first step in analyzing any dataset.

---

##### Selecting Rows & Columns
---

Selecting Columns

Syntax:
```Python
df["column_name"]      # Single column, returned as a Series
df[["col1", "col2"]]   # Multiple columns, returned as a DataFrame
```

```Python
import pandas as pd

df = pd.read_csv('data.csv')

df.head()

df['Category']
df[['Category', 'Date']].head()
```

|   | Category | Date       |
|---|----------|------------|
| 0 | A        | 2023-01-01 |
| 1 | B        | 2023-01-02 |
| 2 | C        | 2023-01-03 |
| 3 | B        | 2023-01-04 |
| 4 | B        | 2023-01-05 |
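
The difference between single and double brackets can be checked directly. The following is a minimal sketch; the small DataFrame is hypothetical sample data standing in for `data.csv`, not the actual file.

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "Category": ["A", "B", "C"],
    "Value": [28.0, 39.0, 32.0],
})

single = df["Category"]            # single brackets -> Series
double = df[["Category"]]          # double brackets -> one-column DataFrame
subset = df[["Category", "Date"]]  # columns come back in the order requested

print(type(single).__name__)   # Series
print(type(double).__name__)   # DataFrame
print(list(subset.columns))    # ['Category', 'Date']
```

Passing a list, even a one-element list, always gives a DataFrame, which is useful when later code expects two-dimensional data.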


Selecting Rows by Index

Use `.loc[]` (label-based) and `.iloc[]` (position-based):
```Python
df.loc[0]    # Row whose index label is 0
df.iloc[0]   # First row by position
```

```Python
print(df.head())

print(df.loc[3])       # Row with index label 3 (the fourth row here)

print(df.iloc[3, 2])   # Fourth row, third column
```

```
         Date Category  Value   Product  Sales Region
0  2023-01-01        A   28.0  Product1  754.0   East
1  2023-01-02        B   39.0  Product3  110.0  North
2  2023-01-03        C   32.0  Product2  398.0   East
3  2023-01-04        B    8.0  Product1  522.0   East
4  2023-01-05        B   26.0  Product3  869.0  North
Date        2023-01-04
Category             B
Value              8.0
Product       Product1
Sales            522.0
Region            East
Name: 3, dtype: object
8.0
```
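
With the default index (0, 1, 2, ...), labels and positions coincide, so `.loc` and `.iloc` appear to behave the same. The sketch below uses a hypothetical DataFrame with a non-default index (the labels 10, 20, 30 are an assumption for illustration) to show where they differ.

```python
import pandas as pd

# Hypothetical data with a non-default index, so labels and positions differ
df = pd.DataFrame(
    {"Category": ["A", "B", "C"], "Value": [28.0, 39.0, 32.0]},
    index=[10, 20, 30],
)

by_label = df.loc[20]       # row whose index label is 20
by_position = df.iloc[0]    # first row, whatever its label

print(by_label["Category"])     # B
print(by_position["Category"])  # A
print(df.iloc[1, 1])            # 39.0 (second row, second column)

# df.loc[0] would raise a KeyError here because no row is labelled 0
```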


We can also slice. Note that `.loc` slices include the end label, while `.iloc` slices exclude the end position:

```Python
df.loc[0:2, ['Category', 'Value']]
```

|   | Category | Value |
|---|----------|-------|
| 0 | A        | 28.0  |
| 1 | B        | 39.0  |
| 2 | C        | 32.0  |


```Python
df.iloc[0:2, 0:2]
```

|   | Date       | Category |
|---|------------|----------|
| 0 | 2023-01-01 | A        |
| 1 | 2023-01-02 | B        |
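
The two outputs above have different row counts for the same `0:2` slice. A minimal sketch of that difference, using a small hypothetical DataFrame rather than the original dataset:

```python
import pandas as pd

# Hypothetical sample data with the default integer index
df = pd.DataFrame({
    "Category": ["A", "B", "C", "B"],
    "Value": [28.0, 39.0, 32.0, 8.0],
})

label_slice = df.loc[0:2]      # labels 0, 1 and 2 (end label included)
position_slice = df.iloc[0:2]  # positions 0 and 1 (end position excluded)

print(len(label_slice))     # 3
print(len(position_slice))  # 2
```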


---

##### Fast Access `.at` and `.iat`
These accessors are optimized for single-element access: each takes exactly one row and one column.

```Python
df.at[0, 'Category']     # Fast label-based access
```

```
'A'
```

```Python
df.iat[0, 3]        # Fast position-based access
```

```
'Product1'
```
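
As a minimal sketch on hypothetical sample data, `.at` and `.iat` can both read a single cell, and `.at` can also be used to assign to one:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Category": ["A", "B"],
    "Product": ["Product1", "Product3"],
})

label_value = df.at[0, "Category"]   # row label 0, column label "Category"
position_value = df.iat[0, 1]        # row position 0, column position 1

print(label_value)     # A
print(position_value)  # Product1

# .at also supports assignment to a single cell
df.at[1, "Category"] = "C"
print(df.at[1, "Category"])  # C
```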

---

##### Filtering with Conditions

Simple Condition

```Python
df[df['Category'] == 'A'].head(10)  # First 10 rows with Category 'A'
```

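|    | Date       | Category | Value | Product  | Sales | Region |
|----|------------|----------|-------|----------|-------|--------|
| 0  | 2023-01-01 | A        | 28.0  | Product1 | 754.0 | East   |
| 6  | 2023-01-07 | A        | 16.0  | Product1 | 936.0 | East   |
| 9  | 2023-01-10 | A        | 22.0  | Product2 | 834.0 | West   |
| 12 | 2023-01-13 | A        | 70.0  | Product3 | 628.0 | South  |
| 13 | 2023-01-14 | A        | 69.0  | Product1 | 423.0 | East   |
| 14 | 2023-01-15 | A        | 47.0  | Product2 | 893.0 | West   |
| 18 | 2023-01-19 | A        | 31.0  | Product2 | 578.0 | West   |
| 19 | 2023-01-20 | A        | 59.0  | Product1 | 736.0 | East   |
| 24 | 2023-01-25 | A        | 24.0  | Product2 | 458.0 | East   |
| 31 | 2023-02-01 | A        | 17.0  | Product2 | 189.0 | West   |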


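```Python
df[df['Value'] > 50].head()   # First five rows where Value is greater than 50
```

|    | Date       | Category | Value | Product  | Sales | Region |
|----|------------|----------|-------|----------|-------|--------|
| 5  | 2023-01-06 | B        | 54.0  | Product3 | 192.0 | West   |
| 7  | 2023-01-08 | C        | 89.0  | Product1 | 488.0 | West   |
| 11 | 2023-01-12 | B        | 60.0  | Product2 | NaN   | West   |
| 12 | 2023-01-13 | A        | 70.0  | Product3 | 628.0 | South  |
| 13 | 2023-01-14 | A        | 69.0  | Product1 | 423.0 | East   |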


Multiple Conditions (AND/OR)

Combine conditions with `&` (AND) and `|` (OR), and wrap each condition in parentheses.

```Python
df[(df['Value'] > 50) & (df['Category'] == 'A')]
```

|    | Date       | Category | Value | Product  | Sales | Region |
|----|------------|----------|-------|----------|-------|--------|
| 12 | 2023-01-13 | A        | 70.0  | Product3 | 628.0 | South  |
| 13 | 2023-01-14 | A        | 69.0  | Product1 | 423.0 | East   |
| 19 | 2023-01-20 | A        | 59.0  | Product1 | 736.0 | East   |
| 39 | 2023-02-09 | A        | 62.0  | Product1 | 155.0 | West   |
| 42 | 2023-02-12 | A        | 93.0  | Product3 | 164.0 | West   |
| 44 | 2023-02-14 | A        | 96.0  | Product3 | 830.0 | East   |


```Python
df[(df['Product'] == 'Product3') | (df['Sales'] > 500)].head()
```

|   | Date       | Category | Value | Product  | Sales | Region |
|---|------------|----------|-------|----------|-------|--------|
| 0 | 2023-01-01 | A        | 28.0  | Product1 | 754.0 | East   |
| 1 | 2023-01-02 | B        | 39.0  | Product3 | 110.0 | North  |
| 3 | 2023-01-04 | B        | 8.0   | Product1 | 522.0 | East   |
| 4 | 2023-01-05 | B        | 26.0  | Product3 | 869.0 | North  |
| 5 | 2023-01-06 | B        | 54.0  | Product3 | 192.0 | West   |
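
Each filter above works by building a boolean Series (a mask) and using it to index the DataFrame. The sketch below, on hypothetical sample data, shows the mask itself along with two related tools not used earlier in this section: `~` for negation and `.isin()` for membership tests. It also shows that a comparison against a missing value evaluates to `False`, so rows with `NaN` in the compared column are dropped.

```python
import pandas as pd

# Hypothetical sample data; row 1 has a missing Sales value
df = pd.DataFrame({
    "Category": ["A", "B", "A", "C"],
    "Value": [28.0, 60.0, 70.0, 89.0],
    "Sales": [754.0, float("nan"), 628.0, 488.0],
})

mask = df["Value"] > 50                        # boolean Series, one entry per row
print(mask.tolist())                           # [False, True, True, True]

high_a = df[mask & (df["Category"] == "A")]    # AND: both conditions hold
not_a = df[~(df["Category"] == "A")]           # NOT: invert a condition
b_or_c = df[df["Category"].isin(["B", "C"])]   # membership in a list of values
big_sales = df[df["Sales"] > 500]              # NaN > 500 is False, so row 1 is excluded

print(list(high_a.index))     # [2]
print(list(not_a.index))      # [1, 3]
print(list(b_or_c.index))     # [1, 3]
print(list(big_sales.index))  # [0, 2]
```

The parentheses around each condition matter because `&` and `|` bind more tightly than comparison operators such as `>` and `==` in Python.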
