## **Pandas Library - Continuation**

In pandas, **conditional selection** allows us to filter rows of a DataFrame based on logical conditions. For example, `df[df["Age"] > 30]` returns only the rows where the column `Age` is greater than 30. We can also combine multiple conditions using **`&` (AND), `|` (OR), and `~` (NOT)**, which makes complex filtering rules easy to express.
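
As a minimal sketch of these operators, the snippet below uses a small hypothetical DataFrame (the name `people` and its `Name`, `Age`, and `City` columns are illustrative and not part of this notebook). Each condition is wrapped in parentheses because `&`, `|`, and `~` bind more tightly than comparison operators in Python.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the operators
people = pd.DataFrame({
    "Name": ["Ana", "Bruno", "Carla", "Diego"],
    "Age": [25, 34, 41, 29],
    "City": ["Lisbon", "Porto", "Lisbon", "Porto"],
})

# Single condition: rows where Age is greater than 30
older = people[people["Age"] > 30]

# AND: both conditions must hold for a row to be kept
older_in_lisbon = people[(people["Age"] > 30) & (people["City"] == "Lisbon")]

# OR: at least one condition must hold
older_or_lisbon = people[(people["Age"] > 30) | (people["City"] == "Lisbon")]

# NOT: inverts the boolean mask
not_older = people[~(people["Age"] > 30)]
```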

In [14]:
import numpy as np
import pandas as pd
from numpy.random import randn

In [15]:
np.random.seed(101)
df = pd.DataFrame(randn(4,4),['L1','L2','L3','L4'],['C1','C2','C3','C4'])
df

Unnamed: 0,C1,C2,C3,C4
L1,2.70685,0.628133,0.907969,0.503826
L2,0.651118,-0.319318,-0.848077,0.605965
L3,-2.018168,0.740122,0.528813,-0.589001
L4,0.188695,-0.758872,-0.933237,0.955057


In [16]:
df > 0

Unnamed: 0,C1,C2,C3,C4
L1,True,True,True,True
L2,True,False,False,True
L3,False,True,True,False
L4,True,False,False,True


In [17]:
df < 0

Unnamed: 0,C1,C2,C3,C4
L1,False,False,False,False
L2,False,True,True,False
L3,True,False,False,True
L4,False,True,True,False
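
A boolean DataFrame like the ones above can also be passed back into `df`. The sketch below rebuilds the same DataFrame and shows that, in this case, values where the condition is `False` become `NaN` instead of rows being dropped. The names `positives` and `same_as_where` are illustrative.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(4, 4),
                  index=['L1', 'L2', 'L3', 'L4'],
                  columns=['C1', 'C2', 'C3', 'C4'])

# Element-wise mask: keeps the shape of df, fills non-matching cells with NaN
positives = df[df > 0]

# DataFrame.where applies the same kind of element-wise masking
same_as_where = df.where(df > 0)
```

Row filtering, by contrast, needs a boolean Series (one value per row), such as `df['C2'] > 0`.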


In [18]:
df['C2'] > 0

L1     True
L2    False
L3     True
L4    False
Name: C2, dtype: bool

In [19]:
df['C2']

L1    0.628133
L2   -0.319318
L3    0.740122
L4   -0.758872
Name: C2, dtype: float64

In [None]:
# Another case: pass the boolean Series back into df to keep only the matching rows.
df[df['C4']<0] # returns the only row (L3) whose value in column C4 is less than 0

Unnamed: 0,C1,C2,C3,C4
L3,-2.018168,0.740122,0.528813,-0.589001


In [23]:
result = df[df['C1']>0]

In [24]:
result

Unnamed: 0,C1,C2,C3,C4
L1,2.70685,0.628133,0.907969,0.503826
L2,0.651118,-0.319318,-0.848077,0.605965
L4,0.188695,-0.758872,-0.933237,0.955057


In [25]:
result['C2']

L1    0.628133
L2   -0.319318
L4   -0.758872
Name: C2, dtype: float64

* The next cell filters **rows** where **C1 > 0** and then selects only columns **C2** and **C3**.

In [None]:
# Filter the rows and select the columns in a single expression
result = df[df['C1'] > 0][['C2', 'C3']]
result # contains only columns C2 and C3, and only the rows where C1 > 0

Unnamed: 0,C2,C3
L1,0.628133,0.907969
L2,-0.319318,-0.848077
L4,-0.758872,-0.933237


* The same can be done in two steps: first save the boolean condition **(df['C1'] > 0)**, then apply the selection for the desired **columns**.

In [42]:
boole = df['C1']>0
result = df[boole]
test_col = ['C2', 'C3']
result[test_col]

Unnamed: 0,C2,C3
L1,0.628133,0.907969
L2,-0.319318,-0.848077
L4,-0.758872,-0.933237
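
As an alternative sketch, the row condition and the column list can be passed together to `.loc`, which avoids chaining two selections. The names `result_loc` and `result_chained` are illustrative; the DataFrame is rebuilt here so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(4, 4),
                  index=['L1', 'L2', 'L3', 'L4'],
                  columns=['C1', 'C2', 'C3', 'C4'])

# Row mask and column list in one .loc call
result_loc = df.loc[df['C1'] > 0, ['C2', 'C3']]

# Chained version from the cells above, for comparison
result_chained = df[df['C1'] > 0][['C2', 'C3']]
```

Both forms give the same rows and columns here. `.loc` is often preferred when the selection will later be used for assignment, since chained indexing can end up operating on a copy.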


* **Now let's use multiple conditions.**

Remember how the conditions combine:

* **True and True = True →** the row is kept.
* **True and False = False →** the row is discarded.
* **True or False = True →** the row is kept.

In [56]:
df[(df['C1']>0) | (df['C3']<1)]

Unnamed: 0,C1,C2,C3,C4
L1,2.70685,0.628133,0.907969,0.503826
L2,0.651118,-0.319318,-0.848077,0.605965
L3,-2.018168,0.740122,0.528813,-0.589001
L4,0.188695,-0.758872,-0.933237,0.955057
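
The `|` example above keeps every row because each row satisfies at least one condition. As a hedged sketch, the snippet below applies `&` and `~` to the same data and shows why Python's `and` keyword cannot be used on a Series. The names `both`, `not_c1`, and `message` are illustrative.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(4, 4),
                  index=['L1', 'L2', 'L3', 'L4'],
                  columns=['C1', 'C2', 'C3', 'C4'])

# AND: a row is kept only if both conditions are True
both = df[(df['C1'] > 0) & (df['C3'] < 1)]

# NOT: keep the rows where C1 > 0 is False
not_c1 = df[~(df['C1'] > 0)]

# The `and` keyword asks for a single True/False value from a whole Series,
# which pandas refuses to guess, so it raises ValueError
message = None
try:
    df[(df['C1'] > 0) and (df['C3'] < 1)]
except ValueError as err:
    message = str(err)
```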


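* **`reset_index()`** is used to reset the index of a DataFrame back to the default sequential numeric order (0, 1, 2...). This is especially useful when you filter rows and the index becomes "broken" (for example: 0, 3, 5). By default the old index is kept as a new column named `index`. If you want to discard the old index completely, pass `drop=True`. The method returns a new DataFrame, so `df` itself stays unchanged unless you reassign the result.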

In [58]:
df.reset_index()

Unnamed: 0,index,C1,C2,C3,C4
0,L1,2.70685,0.628133,0.907969,0.503826
1,L2,0.651118,-0.319318,-0.848077,0.605965
2,L3,-2.018168,0.740122,0.528813,-0.589001
3,L4,0.188695,-0.758872,-0.933237,0.955057
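
A short sketch of the filtering case described above, comparing the default behaviour with `drop=True`. The names `filtered`, `kept`, and `dropped` are illustrative.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(4, 4),
                  index=['L1', 'L2', 'L3', 'L4'],
                  columns=['C1', 'C2', 'C3', 'C4'])

# After filtering, the index has a gap (the L3 row is gone)
filtered = df[df['C1'] > 0]

# Default: old labels move into a column named 'index'
kept = filtered.reset_index()

# drop=True: old labels are discarded entirely
dropped = filtered.reset_index(drop=True)
```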


In [59]:
df

Unnamed: 0,C1,C2,C3,C4
L1,2.70685,0.628133,0.907969,0.503826
L2,0.651118,-0.319318,-0.848077,0.605965
L3,-2.018168,0.740122,0.528813,-0.589001
L4,0.188695,-0.758872,-0.933237,0.955057


In [73]:
# New example:
# The new list must have as many items as the DataFrame has rows (4 here)
new_index = 'J A C K'.split()

new_index

['J', 'A', 'C', 'K']

In [74]:
df['Letras'] = new_index 

In [75]:
df

Unnamed: 0,C1,C2,C3,C4,Letras
L1,2.70685,0.628133,0.907969,0.503826,J
L2,0.651118,-0.319318,-0.848077,0.605965,A
L3,-2.018168,0.740122,0.528813,-0.589001,C
L4,0.188695,-0.758872,-0.933237,0.955057,K


In [None]:
# Use the newly created column as the index (returns a new DataFrame; df itself is not modified)
df.set_index('Letras')

Letras,C1,C2,C3,C4
J,2.70685,0.628133,0.907969,0.503826
A,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
K,0.188695,-0.758872,-0.933237,0.955057
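
Like `reset_index()`, `set_index()` returns a new DataFrame. A minimal sketch, assuming you want to keep the re-indexed version, is to assign the result to a variable (the name `indexed` is illustrative):

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(4, 4),
                  index=['L1', 'L2', 'L3', 'L4'],
                  columns=['C1', 'C2', 'C3', 'C4'])
df['Letras'] = 'J A C K'.split()

# set_index moves the 'Letras' column into the index of a new DataFrame
indexed = df.set_index('Letras')

# Rows can now be looked up by the new labels
row_c = indexed.loc['C']

# df still has its original L1..L4 index and the 'Letras' column
original_labels = list(df.index)
```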
