In [None]:
import pandas as pd


In [None]:
df = pd.DataFrame(
    {
        'name': ['Peter', 'Juan', 'Melisa', 'Ana', 'Charles', 'Maria', 'Sonia', 'Peter', 'Melisa', 'Ana', 'Ana'],
        'age': [23, 20, 34, 40, 45, 21, 67, 55, 89, 3, 14]
    },
    index=range(10, 21)
)
df

Unnamed: 0,name,age
10,Peter,23
11,Juan,20
12,Melisa,34
13,Ana,40
14,Charles,45
15,Maria,21
16,Sonia,67
17,Peter,55
18,Melisa,89
19,Ana,3


## loc vs iloc

In [None]:
df.loc[[10, 11, 14], ['age', 'name']]

Unnamed: 0,age,name
10,23,Peter
11,20,Juan
14,45,Charles


In [None]:
df.iloc[[0, 1, 4], [1]]

Unnamed: 0,age
10,23
11,20
14,45
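
The two cells above select the same rows in different ways: `loc` works with index labels and column names, while `iloc` works with integer positions. The sketch below illustrates the difference on a small hypothetical frame called `people` (not part of the notebook), including the fact that label slices include their end point while position slices do not.

```python
import pandas as pd

# Hypothetical frame: index labels start at 10, positions start at 0.
people = pd.DataFrame(
    {"name": ["Peter", "Juan", "Melisa"], "age": [23, 20, 34]},
    index=[10, 11, 12],
)

# loc selects by label; a label slice includes both endpoints.
by_label = people.loc[10:11, "age"]

# iloc selects by integer position; a position slice excludes the stop.
by_position = people.iloc[0:1, 1]

print(by_label.tolist())     # [23, 20]
print(by_position.tolist())  # [23]
```
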


## Change DataFrame values: mask vs loc/iloc

In [None]:
age_mask = df['age'] > 25
new_df = df[age_mask]
new_df

Unnamed: 0,name,age
12,Melisa,34
13,Ana,40
14,Charles,45
16,Sonia,67
17,Peter,55
18,Melisa,89


In [None]:
new_df['name'] = '...'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df['name'] = '...'


In [None]:
new_df

Unnamed: 0,name,age
12,...,34
13,...,40
14,...,45
16,...,67
17,...,55
18,...,89


In [None]:
df

Unnamed: 0,name,age
10,Peter,23
11,Juan,20
12,Melisa,34
13,Ana,40
14,Charles,45
15,Maria,21
16,Sonia,67
17,Peter,55
18,Melisa,89
19,Ana,3


In [None]:
age_mask = df['age'] > 25
df[age_mask]['name'] = '...'
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[age_mask]['name'] = '...'


Unnamed: 0,name,age
10,Peter,23
11,Juan,20
12,Melisa,34
13,Ana,40
14,Charles,45
15,Maria,21
16,Sonia,67
17,Peter,55
18,Melisa,89
19,Ana,3


In [None]:
age_mask = df['age'] > 25
df.loc[age_mask, 'name'] = '...'
df

Unnamed: 0,name,age
10,Peter,23
11,Juan,20
12,...,34
13,...,40
14,...,45
15,Maria,21
16,...,67
17,...,55
18,...,89
19,Ana,3


In [None]:
new_df['name'] = '...'
new_df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df['name'] = '...'


Unnamed: 0,name,age
12,...,34
13,...,40
14,...,45
16,...,67
17,...,55
18,...,89


In [None]:
df

Unnamed: 0,name,age
10,Peter,23
11,Juan,20
12,...,34
13,...,40
14,...,45
15,Maria,21
16,...,67
17,...,55
18,...,89
19,Ana,3
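
The cells above show that assigning through a filtered frame does not update the original, while a single `.loc` call does. A minimal sketch of the two usual patterns follows, assuming a small hypothetical frame `people`: take an explicit `.copy()` when an independent frame is wanted, and use `.loc[mask, column]` when the original should change.

```python
import pandas as pd

# Hypothetical frame used only for this illustration.
people = pd.DataFrame(
    {"name": ["Peter", "Juan", "Melisa", "Ana"], "age": [23, 20, 34, 40]},
    index=range(10, 14),
)
age_mask = people["age"] > 25

# An explicit copy is independent of `people`, so assigning to it is unambiguous.
older = people[age_mask].copy()
older["name"] = "..."
names_after_copy_edit = people["name"].tolist()  # original is untouched

# To modify the original, select rows and column in one .loc call.
people.loc[age_mask, "name"] = "..."
names_after_loc_edit = people["name"].tolist()

print(names_after_copy_edit)  # ['Peter', 'Juan', 'Melisa', 'Ana']
print(names_after_loc_edit)   # ['Peter', 'Juan', '...', '...']
```
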


In [None]:
df1 = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3"],
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
    }
)


df2 = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K4"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)


In [None]:
df1

In [None]:
df2

In [None]:
pd.merge(df1, df2, on="key", how="inner")


In [None]:
pd.merge(left=df1, right=df2, on="key", how="left")


Unnamed: 0,key,A,B,C,D
0,K0,A0,B0,C0,D0
1,K1,A1,B1,C1,D1
2,K2,A2,B2,C2,D2
3,K3,A3,B3,,


In [None]:
df1.merge(right=df2, on="key", how="left")

Unnamed: 0,key,A,B,C,D
0,K0,A0,B0,C0,D0
1,K1,A1,B1,C1,D1
2,K2,A2,B2,C2,D2
3,K3,A3,B3,,


In [None]:
pd.merge(df1, df2, on="key", how="outer")
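
The `how` argument decides which keys survive a merge. As a rough illustration, the sketch below compares an inner and an outer merge on two small hypothetical frames (`left_df` and `right_df`, not the notebook's `df1` and `df2`) and uses `indicator=True` to record where each row came from.

```python
import pandas as pd

# Hypothetical frames: K0 and K1 appear in both, K3 only on the left, K4 only on the right.
left_df = pd.DataFrame({"key": ["K0", "K1", "K3"], "A": ["A0", "A1", "A3"]})
right_df = pd.DataFrame({"key": ["K0", "K1", "K4"], "C": ["C0", "C1", "C3"]})

# Inner merge keeps only the keys present in both frames.
inner = pd.merge(left_df, right_df, on="key", how="inner")

# Outer merge keeps every key; indicator=True adds a `_merge` column
# with the values 'both', 'left_only' or 'right_only'.
outer = pd.merge(left_df, right_df, on="key", how="outer", indicator=True)

# Map each key to its origin for easy inspection.
origin = dict(zip(outer["key"], outer["_merge"].astype(str)))

print(inner["key"].tolist())
print(origin)
```
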

## DataFrame queries

pandas lets us filter a DataFrame with `query`, which takes the filter condition as a string expression written in Python-like syntax.

For instance, to select the people whose age is greater than 20, we can run the following query:

In [None]:
condition = ["Peter", "Maria"]

In [None]:
df.query('age > 20')

Unnamed: 0,name,age
10,Peter,23
12,...,34
13,...,40
14,...,45
15,Maria,21
16,...,67
17,...,55
18,...,89


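A query can also reference variables from the surrounding environment by prefixing them with `@`, and it can combine several conditions with `and` / `or`. For instance, to extend the previous query so that it also excludes the people named Peter or Maria (the names stored in the `condition` list defined earlier), we can do the following: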

In [None]:
df.query('age > 20 and name not in @condition')

Unnamed: 0,name,age
12,...,34
13,...,40
14,...,45
16,...,67
17,...,55
18,...,89
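
For comparison, the same filter can be written with a boolean mask. The sketch below assumes a small hypothetical frame `people` and a list `excluded`, and checks that the `query` string and the mask select the same rows.

```python
import pandas as pd

# Hypothetical data used only for this comparison.
people = pd.DataFrame(
    {"name": ["Peter", "Juan", "Melisa", "Maria"], "age": [23, 20, 34, 21]}
)
excluded = ["Peter", "Maria"]

# String expression; @excluded refers to the Python variable above.
via_query = people.query("age > 20 and name not in @excluded")

# Equivalent boolean mask: & combines conditions, ~ negates isin().
via_mask = people[(people["age"] > 20) & ~people["name"].isin(excluded)]

print(via_query.equals(via_mask))
```

The `query` form tends to be easier to read when several conditions are combined, while the mask form keeps everything in ordinary Python expressions.
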


For more details, see the [`DataFrame.query` documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html).