# Joining Data with Pandas

Joining is a fundamental operation when working with tabular data in Python. pandas provides several ways to combine datasets on common columns or indices; this section uses `DataFrame.merge`.

### Inner Join

An inner join in pandas combines two DataFrames on a common column (or columns) and returns only the rows that have a match in both DataFrames. This operation is analogous to the SQL `INNER JOIN`, and it is the default behavior of `DataFrame.merge`.
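To make the "match in both" rule concrete, here is a minimal sketch on two toy tables. The tables `dogs` and `owners` and the rows "Bidu" and "Mel" are hypothetical and are not part of the Excel files used in this section.

```python
import pandas as pd

# Hypothetical toy tables: "Bidu" has no owner row, "Mel" has no dog row
dogs = pd.DataFrame({
    "name": ["Ivo", "Lola", "Bidu"],
    "breed": ["Poodle", "Schnauzer", "Beagle"],
})
owners = pd.DataFrame({
    "name": ["Ivo", "Lola", "Mel"],
    "owner": ["Mario", "Maria", "Ana"],
})

# how="inner" is the default for merge; it is spelled out here for clarity
inner = dogs.merge(owners, on="name", how="inner")
print(inner)
```

Only "Ivo" and "Lola" appear in the result, because they are the only names present in both tables.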

In [1]:
import pandas as pd

In [12]:
path = r"C:\Users\Alysson\Documents\GitHub\Pandas-in-Python\database.xlsx"
data_1 = pd.read_excel(path)
data_1.head()

Unnamed: 0,name,breed,color,height_cm,weight_kg,age
0,Paçoca,Labrador,Brown,56,25,5
1,Ivo,Poodle,Black,43,22,4
2,Lola,Schnauzer,Gray,49,23,4
3,Maracatu,King Cavalier,Brown,43,21,3
4,Chantal,Labrador,Black,59,29,6


In [4]:
path = r"C:\Users\Alysson\Documents\GitHub\Pandas-in-Python\database_2.xlsx"
data_2 = pd.read_excel(path)
data_2.head()

Unnamed: 0,name,owner,injured,age_owner,hometown
0,Paçoca,Pedro,yes,30,Lisbon
1,Ivo,Mario,yes,34,Rio de Janeiro
2,Lola,Maria,no,63,Brasilia
3,Maracatu,Gabi,no,28,Londres
4,Chantal,Fernanda,no,27,Minas Gerais


### Merging Tables

Merging tables (DataFrames) is a common operation in data manipulation and analysis. The `merge` method joins two DataFrames on the columns passed to `on`; here both tables share the `name` column.

In [7]:
data_complete = data_1.merge(data_2, on="name")
data_complete

Unnamed: 0,name,breed,color,height_cm,weight_kg,age,owner,injured,age_owner,hometown
0,Paçoca,Labrador,Brown,56,25,5,Pedro,yes,30,Lisbon
1,Ivo,Poodle,Black,43,22,4,Mario,yes,34,Rio de Janeiro
2,Lola,Schnauzer,Gray,49,23,4,Maria,no,63,Brasilia
3,Maracatu,King Cavalier,Brown,43,21,3,Gabi,no,28,Londres
4,Chantal,Labrador,Black,59,29,6,Fernanda,no,27,Minas Gerais
5,Xuxu,King Cavalier,White,41,20,5,Teco,no,41,Poland
6,Rex,Korg,Brown,40,23,4,Alysson,no,32,Fortaleza


In [8]:
data_complete.shape

(7, 10)
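The merge above keeps only matching names. As a hedged sketch of the other join types, the `how` parameter of `merge` accepts values such as `"left"`, `"right"`, and `"outer"`, and `indicator=True` adds a `_merge` column recording where each row came from. The toy tables and the rows "Bidu" and "Mel" below are hypothetical.

```python
import pandas as pd

# Hypothetical toy tables with one unmatched row on each side
dogs = pd.DataFrame({
    "name": ["Paçoca", "Ivo", "Bidu"],
    "breed": ["Labrador", "Poodle", "Beagle"],
})
owners = pd.DataFrame({
    "name": ["Paçoca", "Ivo", "Mel"],
    "owner": ["Pedro", "Mario", "Ana"],
})

# An outer join keeps every row from both tables; missing values become NaN
outer = dogs.merge(owners, on="name", how="outer", indicator=True)

# _merge is "both", "left_only", or "right_only" for each row
unmatched = outer[outer["_merge"] != "both"]
print(outer)
print(unmatched["name"].tolist())
```

Inspecting the unmatched rows this way is a quick check for keys that an inner join would silently drop.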

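### One-to-many relationships

In a one-to-many relationship, one row in the left table matches several rows in the right table. In `data_3`, Lola and Chantal each have two owners, so the merge below returns 9 rows rather than the 7 returned by the merge with `data_2`.
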
In [14]:
path = r"C:\Users\Alysson\Documents\GitHub\Pandas-in-Python\database_3.xlsx"
data_3 = pd.read_excel(path)
data = data_1.merge(data_3, on="name")
data

Unnamed: 0,name,breed,color,height_cm,weight_kg,age,owner,injured,age_owner,hometown
0,Paçoca,Labrador,Brown,56,25,5,Pedro,yes,30,Lisbon
1,Ivo,Poodle,Black,43,22,4,Mario,yes,34,Rio de Janeiro
2,Lola,Schnauzer,Gray,49,23,4,Maria,no,63,Brasilia
3,Lola,Schnauzer,Gray,49,23,4,Raimundo,no,75,Parnaiba
4,Maracatu,King Cavalier,Brown,43,21,3,Gabi,no,28,Londres
5,Chantal,Labrador,Black,59,29,6,Fernanda,no,27,Minas Gerais
6,Chantal,Labrador,Black,59,29,6,Cabral,no,30,São Paulo
7,Xuxu,King Cavalier,White,41,20,5,Teco,no,41,Poland
8,Rex,Korg,Brown,40,23,4,Alysson,no,32,Fortaleza


In [15]:
data.shape

(9, 10)
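The row count grows because each dog row is repeated once per matching owner. A minimal sketch of that behavior follows, using a few names and owners taken from the output above in small stand-in tables (the tables themselves are illustrative, not the Excel data). It also uses the `validate` argument of `merge`, which raises `pandas.errors.MergeError` when the relationship is not the one declared.

```python
import pandas as pd

# Stand-in tables: each dog appears once on the left, possibly several times on the right
dogs = pd.DataFrame({
    "name": ["Lola", "Chantal", "Ivo"],
    "breed": ["Schnauzer", "Labrador", "Poodle"],
})
owners = pd.DataFrame({
    "name": ["Lola", "Lola", "Chantal", "Chantal", "Ivo"],
    "owner": ["Maria", "Raimundo", "Fernanda", "Cabral", "Mario"],
})

# validate="one_to_many" checks that the merge keys are unique in the left table
merged = dogs.merge(owners, on="name", validate="one_to_many")
print(merged.shape)  # (5, 3)

# Declaring one_to_one instead fails, because "Lola" and "Chantal" repeat on the right
one_to_one_failed = False
try:
    dogs.merge(owners, on="name", validate="one_to_one")
except pd.errors.MergeError as exc:
    one_to_one_failed = True
    print("Not a one-to-one merge:", exc)
```

Declaring the expected relationship makes unexpected duplicates fail loudly instead of quietly inflating the result.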

### Merging Multiple DataFrames

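This works like the previous case, except that `on` receives a list of column names, all of which must match for two rows to be joined; `name_col_1` and `name_col_2` below are placeholders. Chaining `.merge()` calls combines more than two DataFrames.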

In [None]:
# Placeholder column names; each must exist in both data_1 and data_3
merge_cols = ["name_col_1", "name_col_2"]
data = (
    data_1.merge(data_3, on=merge_cols)
          .merge(data_2, on="name")
)
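Since the column names in the cell above are placeholders, here is a runnable sketch of the same pattern on hypothetical tables. The `vaccines` table, the second "Lola", and all values are assumptions made for illustration; they are not in the Excel files.

```python
import pandas as pd

# Hypothetical tables: two different dogs share the name "Lola"
dogs = pd.DataFrame({
    "name": ["Lola", "Lola", "Ivo"],
    "breed": ["Schnauzer", "Poodle", "Poodle"],
    "age": [4, 2, 4],
})
vaccines = pd.DataFrame({
    "name": ["Lola", "Lola", "Ivo"],
    "breed": ["Schnauzer", "Poodle", "Poodle"],
    "vaccinated": ["yes", "no", "yes"],
})
owners = pd.DataFrame({"name": ["Lola", "Ivo"], "owner": ["Maria", "Mario"]})

# Both name and breed must match for rows to be joined
key_cols = ["name", "breed"]
combined = (
    dogs.merge(vaccines, on=key_cols)
        .merge(owners, on="name")
)
print(combined.shape)  # (3, 5)

# For contrast: merging on name alone pairs every "Lola" with every "Lola"
by_name_only = dogs.merge(vaccines, on="name")
print(len(by_name_only))  # 5
```

Merging on both columns keeps one row per dog, while merging on `name` alone produces extra rows for the repeated name and suffixes the overlapping `breed` column as `breed_x` and `breed_y`.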
        