# Lesson 19 - Merge Data

Data held in pandas objects can be combined in several ways.

By default, `pd.merge` combines on the intersection of the two tables, that is, on the key values they have in common.

Which key combinations appear in the result depends on the join method chosen through the `how` argument.

In [48]:
import pandas as pd

## Example 1

In [49]:
df1 = pd.DataFrame(
    {
        "key": ["b", "b", "a", "c", "a", "a", "b"],
        "data1": pd.Series(range(7), dtype="Int64"),
    }
)
df1

Unnamed: 0,key,data1
0,b,0
1,b,1
2,a,2
3,c,3
4,a,4
5,a,5
6,b,6


In [50]:
df2 = pd.DataFrame(
    {"key": ["a", "b", "d"], "data2": pd.Series(range(3), dtype="Int64")}
)
df2

Unnamed: 0,key,data2
0,a,0
1,b,1
2,d,2


In [None]:
pd.merge(df1, df2)

Unnamed: 0,key,data1,data2
0,b,0,1
1,b,1,1
2,b,6,1
3,a,2,0
4,a,4,0
5,a,5,0


In [None]:
pd.merge(df1, df2, on="key")

Unnamed: 0,key,data1,data2
0,b,0,1
1,b,1,1
2,b,6,1
3,a,2,0
4,a,4,0
5,a,5,0
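
The two calls above give the same table. A minimal sketch of why, assuming the same `df1` and `df2` as in this example: when `on` is omitted, `pd.merge` joins on the column names the two frames share (here only `key`), and the default `how="inner"` keeps only the key values present in both tables. The names `implicit`, `explicit`, `same_result`, and `shared_keys` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "key": ["b", "b", "a", "c", "a", "a", "b"],
        "data1": pd.Series(range(7), dtype="Int64"),
    }
)
df2 = pd.DataFrame(
    {"key": ["a", "b", "d"], "data2": pd.Series(range(3), dtype="Int64")}
)

# Without `on`, the overlapping column name ("key") is used as the join key.
implicit = pd.merge(df1, df2)
explicit = pd.merge(df1, df2, on="key")
same_result = implicit.equals(explicit)

# An inner join keeps only the keys found in both tables.
shared_keys = set(df1["key"]) & set(df2["key"])

print(same_result)
print(sorted(shared_keys))
print(sorted(implicit["key"].unique()))
```

Passing `on` explicitly is usually the safer habit, since the implicit choice changes silently if the frames later gain another column with a shared name.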


## Example 2 - Lookup across two columns

|    Option   |   Behavior  |
|-------------|-------------|
| how="inner" | Use only the key combinations observed in both tables |
| how="left"  | Use all key combinations found in the left table |
| how="right" | Use all key combinations found in the right table |
| how="outer" | Use all key combinations observed in both tables together |
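
The four options in the table can be compared side by side by counting the rows each one returns. This is a small sketch that reuses `df1` and `df2` from Example 1; `row_counts` is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "key": ["b", "b", "a", "c", "a", "a", "b"],
        "data1": pd.Series(range(7), dtype="Int64"),
    }
)
df2 = pd.DataFrame(
    {"key": ["a", "b", "d"], "data2": pd.Series(range(3), dtype="Int64")}
)

# Run the same merge once per join method and record the number of rows.
row_counts = {
    how: len(pd.merge(df1, df2, on="key", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)
# {'inner': 6, 'left': 7, 'right': 7, 'outer': 8}
```

The inner join drops `c` (only in `df1`) and `d` (only in `df2`); the left join keeps `c`, the right join keeps `d`, and the outer join keeps both.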

In [53]:
df3 = pd.DataFrame(
    {
        "lkey": ["b", "b", "a", "c", "a", "a", "b"],
        "data1": pd.Series(range(7), dtype="Int64"),
    }
)
df3

Unnamed: 0,lkey,data1
0,b,0
1,b,1
2,a,2
3,c,3
4,a,4
5,a,5
6,b,6


In [54]:
df4 = pd.DataFrame(
    {
        "rkey": ["a", "b", "d"],
        "data2": pd.Series(range(3), dtype="Int64"),
    }
)
df4

Unnamed: 0,rkey,data2
0,a,0
1,b,1
2,d,2


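### left_on and right_on

When the key columns have different names, `left_on` and `right_on` name them separately. Here `lkey` from the left table is matched against `rkey` from the right table.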

In [55]:
pd.merge(df3, df4, left_on="lkey", right_on="rkey")

Unnamed: 0,lkey,data1,rkey,data2
0,b,0,b,1
1,b,1,b,1
2,b,6,b,1
3,a,2,a,0
4,a,4,a,0
5,a,5,a,0
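
With `left_on` and `right_on`, both key columns survive in the result even though they hold the same values on every matched row. One possible cleanup, sketched here with the same `df3` and `df4`, is to drop one of them and rename the other; the names `merged`, `tidy`, and the target column name `key` are illustrative choices, not part of the lesson.

```python
import pandas as pd

df3 = pd.DataFrame(
    {
        "lkey": ["b", "b", "a", "c", "a", "a", "b"],
        "data1": pd.Series(range(7), dtype="Int64"),
    }
)
df4 = pd.DataFrame(
    {
        "rkey": ["a", "b", "d"],
        "data2": pd.Series(range(3), dtype="Int64"),
    }
)

merged = pd.merge(df3, df4, left_on="lkey", right_on="rkey")

# In an inner join lkey == rkey on every row, so one of the two is redundant.
tidy = merged.drop(columns="rkey").rename(columns={"lkey": "key"})
print(tidy.columns.tolist())
```

This cleanup is only safe for inner joins. In a left, right, or outer join one of the two key columns can be missing on unmatched rows, so dropping it may lose information.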


In [56]:
# Inner join (the default), matching data1 against data2
pd.merge(df3, df4, left_on="data1", right_on="data2")

Unnamed: 0,lkey,data1,rkey,data2
0,b,0,a,0
1,b,1,b,1
2,a,2,d,2


### Inner Join

In [None]:
pd.merge(df1, df2, how="inner")

Unnamed: 0,key,data1,data2
0,b,0,1
1,b,1,1
2,b,6,1
3,a,2,0
4,a,4,0
5,a,5,0


### Outer Join

In [57]:
# Outer join
pd.merge(df1, df2, how="outer")

Unnamed: 0,key,data1,data2
0,b,0.0,1.0
1,b,1.0,1.0
2,b,6.0,1.0
3,a,2.0,0.0
4,a,4.0,0.0
5,a,5.0,0.0
6,c,3.0,
7,d,,2.0
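
To see where each row of an outer join came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. The sketch below reuses `df1` and `df2`; `outer`, `counts`, and `unmatched_keys` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "key": ["b", "b", "a", "c", "a", "a", "b"],
        "data1": pd.Series(range(7), dtype="Int64"),
    }
)
df2 = pd.DataFrame(
    {"key": ["a", "b", "d"], "data2": pd.Series(range(3), dtype="Int64")}
)

# indicator=True adds a "_merge" column describing the origin of each row.
outer = pd.merge(df1, df2, how="outer", indicator=True)
counts = outer["_merge"].value_counts().to_dict()

# Keys that appear in only one of the two tables.
unmatched_keys = sorted(outer.loc[outer["_merge"] != "both", "key"])

print(counts)
print(unmatched_keys)
```

The rows flagged `left_only` and `right_only` are the ones with a missing value in the output above: `c` has no `data2` and `d` has no `data1`.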


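### Left and Right

**Right:** Keeps every row of `df2`, whatever `df1` contains. The keys are looked up from `df2`, and keys with no match in `df1` get missing values in the `df1` columns.

**Left:** Keeps every row of `df1`, whatever `df2` contains. The keys are looked up from `df1`, and keys with no match in `df2` get missing values in the `df2` columns.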

In [58]:
df1

Unnamed: 0,key,data1
0,b,0
1,b,1
2,a,2
3,c,3
4,a,4
5,a,5
6,b,6


In [59]:
pd.merge(df1, df2, how="left")

Unnamed: 0,key,data1,data2
0,b,0,1.0
1,b,1,1.0
2,a,2,0.0
3,c,3,
4,a,4,0.0
5,a,5,0.0
6,b,6,1.0


In [60]:
df2

Unnamed: 0,key,data2
0,a,0
1,b,1
2,d,2


In [61]:
pd.merge(df1, df2, how="right")

Unnamed: 0,key,data1,data2
0,a,2.0,0
1,a,4.0,0
2,a,5.0,0
3,b,0.0,1
4,b,1.0,1
5,b,6.0,1
6,d,,2
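
The missing values in the two results above mark the keys that had no partner in the other table. A short sketch of how to list them, assuming (as is true for `df1` and `df2`) that the original columns contain no missing values of their own; the variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "key": ["b", "b", "a", "c", "a", "a", "b"],
        "data1": pd.Series(range(7), dtype="Int64"),
    }
)
df2 = pd.DataFrame(
    {"key": ["a", "b", "d"], "data2": pd.Series(range(3), dtype="Int64")}
)

# Left join: rows of df1 without a match have a missing data2.
left_joined = pd.merge(df1, df2, how="left")
unmatched_left = left_joined.loc[left_joined["data2"].isna(), "key"].tolist()

# Right join: rows of df2 without a match have a missing data1.
right_joined = pd.merge(df1, df2, how="right")
unmatched_right = right_joined.loc[right_joined["data1"].isna(), "key"].tolist()

print(unmatched_left)   # ['c']
print(unmatched_right)  # ['d']
```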


## Example 3 - Multiple Keys

In [63]:
left = pd.DataFrame(
    {
        "key1": ["foo", "foo", "bar"],
        "key2": ["one", "two", "one"],
        "lval": pd.Series([1, 2, 3], dtype="Int64"),
    }
)
left

Unnamed: 0,key1,key2,lval
0,foo,one,1
1,foo,two,2
2,bar,one,3


In [None]:
right = pd.DataFrame(
    {
        "key1": ["foo", "foo", "bar", "bar"],
        "key2": ["one", "one", "one", "two"],
        "lval": pd.Series([4, 5, 6, 7], dtype="Int64"),
    }
)
right

Unnamed: 0,key1,key2,lval
0,foo,one,4
1,foo,one,5
2,bar,one,6
3,bar,two,7


In [None]:
pd.merge(left, right, on=["key1", "key2"], how="outer")

Unnamed: 0,key1,key2,lval_x,lval_y
0,foo,one,1.0,4.0
1,foo,one,1.0,5.0
2,foo,two,2.0,
3,bar,one,3.0,6.0
4,bar,two,,7.0
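
In the result above the pair `(foo, one)` appears twice because it occurs twice in `right`. When such duplicates would be a surprise, the `validate` argument of `pd.merge` can check the expected relationship and raise `pandas.errors.MergeError` if it does not hold. This is a sketch using the same `left` and `right`; `checked` and `one_to_one_ok` are illustrative names.

```python
import pandas as pd

left = pd.DataFrame(
    {
        "key1": ["foo", "foo", "bar"],
        "key2": ["one", "two", "one"],
        "lval": pd.Series([1, 2, 3], dtype="Int64"),
    }
)
right = pd.DataFrame(
    {
        "key1": ["foo", "foo", "bar", "bar"],
        "key2": ["one", "one", "one", "two"],
        "lval": pd.Series([4, 5, 6, 7], dtype="Int64"),
    }
)

# Key pairs are unique in `left` but not in `right`: a one-to-many merge.
checked = pd.merge(
    left, right, on=["key1", "key2"], how="outer", validate="one_to_many"
)

# Claiming one-to-one fails because (foo, one) is duplicated in `right`.
try:
    pd.merge(left, right, on=["key1", "key2"], how="outer", validate="one_to_one")
    one_to_one_ok = True
except pd.errors.MergeError:
    one_to_one_ok = False

print(len(checked), one_to_one_ok)
```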


In [68]:
pd.merge(left, right, on="key1")

Unnamed: 0,key1,key2_x,lval_x,key2_y,lval_y
0,foo,one,1,one,4
1,foo,one,1,one,5
2,foo,two,2,one,4
3,foo,two,2,one,5
4,bar,one,3,one,6
5,bar,one,3,two,7
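
When only `key1` is used as the key, the other columns that share a name (`key2` and `lval`) get the default suffixes `_x` and `_y`. The `suffixes` argument of `pd.merge` lets you choose clearer ones. A minimal sketch with the same `left` and `right`; the suffix strings `_left` and `_right` are an arbitrary choice.

```python
import pandas as pd

left = pd.DataFrame(
    {
        "key1": ["foo", "foo", "bar"],
        "key2": ["one", "two", "one"],
        "lval": pd.Series([1, 2, 3], dtype="Int64"),
    }
)
right = pd.DataFrame(
    {
        "key1": ["foo", "foo", "bar", "bar"],
        "key2": ["one", "one", "one", "two"],
        "lval": pd.Series([4, 5, 6, 7], dtype="Int64"),
    }
)

# Overlapping non-key columns are renamed with the given suffixes.
merged = pd.merge(left, right, on="key1", suffixes=("_left", "_right"))
print(merged.columns.tolist())
print(len(merged))
```

The row count is 6 because every `foo` row on the left (2) pairs with every `foo` row on the right (2), and the single `bar` row on the left pairs with both `bar` rows on the right.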
