# Multiple DataFrames

We can merge different `DataFrames` using the `.merge()` method. It looks for columns that are common to two `DataFrames`, then looks for rows where the values in those columns are the same. It then combines the matching rows into a single row in a new table.

The `pd.merge()` function takes the two dataframes as arguments:

```py
import pandas as pd
new_df = pd.merge(orders, customers)
```

Each `DataFrame` also has its own `.merge()` method, so you could merge orders and customers like so:

```py
new_df = orders.merge(customers)
```
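A minimal, self-contained sketch of both forms. The `orders` and `customers` data below is made up for illustration; `customer_id` is the only column name the two tables share, so that is the column pandas matches on.

```python
import pandas as pd

# Hypothetical toy data: 'customer_id' is the only column name both tables share
orders = pd.DataFrame({
    'order_id': [1, 2, 3],
    'customer_id': [10, 20, 10],
    'quantity': [1, 3, 2],
})
customers = pd.DataFrame({
    'customer_id': [10, 20],
    'name': ['Ann', 'Bob'],
})

# The function form and the method form perform the same inner merge on 'customer_id'
via_function = pd.merge(orders, customers)
via_method = orders.merge(customers)

print(via_function)
print(via_function.equals(via_method))  # True
```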

When you need to merge more than two dataframes, you can chain the calls like so:

```py
new_df = orders.merge(customers).merge(products)

all_data = visits.merge(cart, how='left').merge(checkout, how='left').merge(purchase, how='left')
```
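A sketch of chaining with hypothetical `orders`, `customers` and `products` tables. Each `.merge()` returns a new DataFrame, so the next call merges onto that result: the first merge matches on `customer_id`, the second on `product_id`.

```python
import pandas as pd

# Hypothetical toy tables; each pair of tables shares exactly one column name
orders = pd.DataFrame({'order_id': [1, 2], 'customer_id': [10, 20], 'product_id': [100, 200]})
customers = pd.DataFrame({'customer_id': [10, 20], 'name': ['Ann', 'Bob']})
products = pd.DataFrame({'product_id': [100, 200], 'description': ['gizmo', 'doo-hickey']})

# orders + customers match on 'customer_id'; that result + products match on 'product_id'
new_df = orders.merge(customers).merge(products)
print(new_df)
```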

### Merge on specific columns

We will often need to merge two or more dataframes whose column names don't match. For example, many tables have an 'id' column, but that 'id' refers to a different property in each table: a product ID in `products`, an order ID in `orders`.

One way to address this problem is to use `.rename()` to rename the columns so that they match before merging.

```py
pd.merge(
    orders,
    customers.rename(columns={'id': 'customer_id'}))
```
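A runnable sketch of the rename approach, with hypothetical data in which `id` means an order ID in `orders` and a customer ID in `customers`. Without renaming, pandas would match on the shared `id` column and pair orders with the wrong customers.

```python
import pandas as pd

# Hypothetical data: 'id' is an order ID in one table and a customer ID in the other
orders = pd.DataFrame({'id': [1, 2, 3], 'customer_id': [2, 1, 2]})
customers = pd.DataFrame({'id': [1, 2], 'name': ['Ann', 'Bob']})

# Wrong: 'id' is the only shared column, so order IDs are matched against customer IDs
wrong = pd.merge(orders, customers)

# Fixed: rename customers' 'id' so 'customer_id' becomes the only shared column
fixed = pd.merge(orders, customers.rename(columns={'id': 'customer_id'}))
print(fixed)
```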

In other situations there are no matching column names between the dataframes. In such cases we can use `left_on` and `right_on` to specify which column to match on in each dataframe.

```py
# match the 'customer_id' in the orders table to the 'id' column in the customers table
pd.merge(
    orders,
    customers,
    left_on='customer_id',
    right_on='id')
```

The 'left' table is the one that comes first (orders), and the 'right' table is the one that comes second (customers). This syntax says that we should match the customer_id from orders to the id in customers.
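A sketch with hypothetical data where the customer key is called `customer_id` in `orders` and `id` in `customers`, and no other column name is shared. Unlike the rename approach, `left_on`/`right_on` keeps both key columns in the result.

```python
import pandas as pd

# Hypothetical data: the key is 'customer_id' on the left and 'id' on the right
orders = pd.DataFrame({'order_id': [1, 2, 3], 'customer_id': [2, 1, 2]})
customers = pd.DataFrame({'id': [1, 2], 'name': ['Ann', 'Bob']})

merged = pd.merge(orders, customers, left_on='customer_id', right_on='id')

# Both key columns survive: 'customer_id' (from orders) and 'id' (from customers)
print(merged.columns.tolist())  # ['order_id', 'customer_id', 'id', 'name']
```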

When both tables have an 'id' column, the result would contain two columns called id, one from the first table and one from the second. To keep the names distinct, pandas renames them to `id_x` and `id_y` by default.

We can provide a list of `suffixes` to be used instead of `_x` and `_y`.

```py
pd.merge(
    orders,
    customers,
    left_on='customer_id',
    right_on='id',
    suffixes=['_order', '_customer']
)
```

```py
# match 'product_id' in orders table to 'id' in products
orders_products = pd.merge(
    orders,
    products,
    left_on='product_id',
    right_on='id',
    suffixes=['_orders', '_products']
)
```
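A sketch of what the suffixes change, using hypothetical `orders` and `customers` tables that both have an `id` column. Because the merge matches `customer_id` to `id` (the key names differ), both `id` columns are kept and renamed.

```python
import pandas as pd

# Hypothetical data: both tables have an 'id' column, meaning different things
orders = pd.DataFrame({'id': [1, 2, 3], 'customer_id': [2, 1, 2]})
customers = pd.DataFrame({'id': [1, 2], 'name': ['Ann', 'Bob']})

# Default suffixes: the overlapping 'id' columns become 'id_x' and 'id_y'
default = pd.merge(orders, customers, left_on='customer_id', right_on='id')
print(default.columns.tolist())  # ['id_x', 'customer_id', 'id_y', 'name']

# Custom suffixes make it clear which table each 'id' came from
custom = pd.merge(orders, customers, left_on='customer_id', right_on='id',
                  suffixes=['_order', '_customer'])
print(custom.columns.tolist())  # ['id_order', 'customer_id', 'id_customer', 'name']
```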

### Mismatched Merges

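There are occasions when we need to merge two dataframes whose values do NOT all match. For example, `orders` contains a `product_id` (5) that is not in `products`, and `products` contains products (1 and 4) that nobody ordered: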

```py
# orders dataframe
   id  product_id  customer_id  quantity   timestamp
0   1           3            2         1  2017-01-01
1   2           2            2         3  2017-01-01
2   3           5            1         1  2017-01-01
3   4           2            3         2  2016-02-01
4   5           3            3         3  2017-02-01

# products dataframe
   product_id         description  price
0           1      thing-a-ma-jig      5
1           2  whatcha-ma-call-it     10
2           3          doo-hickey      7
3           4               gizmo      3
```

```py
pd.merge(orders, products)

# or, equivalently
pd.merge(orders, products, how='inner')
```

By default, any row that does not have a matching value in the other table is simply dropped. Here, order 3 (`product_id` 5) and the products with `product_id` 1 and 4 have no match, so they do not appear in the result.

```py
# result
   id  product_id  customer_id  quantity   timestamp         description  price
0   1           3            2         1  2017-01-01          doo-hickey      7
1   5           3            3         3  2017-02-01          doo-hickey      7
2   2           2            2         3  2017-01-01  whatcha-ma-call-it     10
3   4           2            3         2  2016-02-01  whatcha-ma-call-it     10
```

This type of merge (where we only include matching rows) is called an `Inner Merge`.
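The inner-merge example above can be rebuilt as a runnable sketch. The data is copied from the `orders` and `products` tables shown earlier; the only shared column is `product_id`.

```python
import pandas as pd

orders = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'product_id': [3, 2, 5, 2, 3],
    'customer_id': [2, 2, 1, 3, 3],
    'quantity': [1, 3, 1, 2, 3],
    'timestamp': ['2017-01-01', '2017-01-01', '2017-01-01', '2016-02-01', '2017-02-01'],
})
products = pd.DataFrame({
    'product_id': [1, 2, 3, 4],
    'description': ['thing-a-ma-jig', 'whatcha-ma-call-it', 'doo-hickey', 'gizmo'],
    'price': [5, 10, 7, 3],
})

inner = pd.merge(orders, products)  # same as how='inner'

# Only orders whose product_id appears in products survive.
# sorted() is used because the row order of a merge result may differ between pandas versions.
print(sorted(inner['id']))  # [1, 2, 4, 5]
```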

When we want to merge tables with mismatched rows without losing any of them, we need to perform an `Outer Join` or `Outer Merge`.

An `Outer Join` includes all rows from both tables, even if they don't match. Any missing values are filled in with `None` or `nan`.

```py
# company_a
name           email
Sally Sparrow  sally.sparrow@gmail.com
Peter Grant    pgrant@yahoo.com
Leslie May     leslie_may@gmail.com

# company_b
name           phone
Peter Grant    212-345-6789
Leslie May     626-987-6543
Aaron Burr     303-456-7891

pd.merge(company_a, company_b, how='outer')

# result
name           email                    phone
Sally Sparrow  sally.sparrow@gmail.com  nan
Peter Grant    pgrant@yahoo.com         212-345-6789
Leslie May     leslie_may@gmail.com     626-987-6543
Aaron Burr     nan                      303-456-7891
```
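A runnable version of the outer merge, with the data copied from the `company_a` and `company_b` tables above. `indicator=True` is an optional `pd.merge()` parameter that adds a `_merge` column recording whether each row came from the left table only, the right table only, or both, which makes unmatched rows easy to spot.

```python
import pandas as pd

company_a = pd.DataFrame({
    'name': ['Sally Sparrow', 'Peter Grant', 'Leslie May'],
    'email': ['sally.sparrow@gmail.com', 'pgrant@yahoo.com', 'leslie_may@gmail.com'],
})
company_b = pd.DataFrame({
    'name': ['Peter Grant', 'Leslie May', 'Aaron Burr'],
    'phone': ['212-345-6789', '626-987-6543', '303-456-7891'],
})

outer = pd.merge(company_a, company_b, how='outer', indicator=True)

# '_merge' is 'left_only', 'right_only' or 'both' for each row
print(outer[['name', '_merge']])
```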

### Mismatched rows and Left/Right Merge

A Left Merge includes all rows from the first (left) table, but only rows from the second (right) table that match the first table.

```py
pd.merge(company_a, company_b, how='left')
```

A Right Merge is the mirror image of a Left Merge: the merged table includes all rows from the second (right) table, but only rows from the first (left) table that match the second table.

```py
pd.merge(company_a, company_b, how="right")
```
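A sketch comparing the two, using the `company_a` and `company_b` tables shown above. The left merge keeps Sally Sparrow (with no phone) and drops Aaron Burr; the right merge does the reverse.

```python
import pandas as pd

company_a = pd.DataFrame({
    'name': ['Sally Sparrow', 'Peter Grant', 'Leslie May'],
    'email': ['sally.sparrow@gmail.com', 'pgrant@yahoo.com', 'leslie_may@gmail.com'],
})
company_b = pd.DataFrame({
    'name': ['Peter Grant', 'Leslie May', 'Aaron Burr'],
    'phone': ['212-345-6789', '626-987-6543', '303-456-7891'],
})

left = pd.merge(company_a, company_b, how='left')    # every row of company_a
right = pd.merge(company_a, company_b, how='right')  # every row of company_b

print(left['name'].tolist())   # ['Sally Sparrow', 'Peter Grant', 'Leslie May']
print(right['name'].tolist())  # ['Peter Grant', 'Leslie May', 'Aaron Burr']
```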

### Concatenating DataFrames

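We can concatenate dataframes with the `pd.concat()` function, passing it a list of dataframes. By default it stacks them vertically, so the rows of each dataframe are appended after the rows of the previous one.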

```py
pd.concat([df1, df2, df3])  # the list can hold any number of dataframes
```

**This works as expected only when all the dataframes have the same columns.** If they don't, pandas keeps every column and fills the gaps with NaN.

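The concatenated result keeps each original dataframe's index, so index labels repeat. Use `.reset_index(drop=True)` to renumber the index from 0; `drop=True` stops the old index from being added as a new `index` column.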
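A minimal sketch with two hypothetical dataframes (`jan`, `feb`) that share the same columns. It shows the repeated index labels, the `.reset_index(drop=True)` fix, and `pd.concat()`'s own `ignore_index=True` option, which does the same renumbering in one step. The last part concatenates a dataframe with a different column to show the NaN filling.

```python
import pandas as pd

# Hypothetical tables with identical columns
jan = pd.DataFrame({'item': ['gizmo', 'doo-hickey'], 'sold': [3, 5]})
feb = pd.DataFrame({'item': ['gizmo', 'thing-a-ma-jig'], 'sold': [4, 1]})

stacked = pd.concat([jan, feb])
print(stacked.index.tolist())  # [0, 1, 0, 1]: the original labels repeat

renumbered = stacked.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# Same renumbering in one step
also_renumbered = pd.concat([jan, feb], ignore_index=True)
print(renumbered.equals(also_renumbered))  # True

# Mismatched columns: pandas does not raise, it fills the missing values with NaN
prices = pd.DataFrame({'item': ['gizmo'], 'price': [3]})
mixed = pd.concat([jan, prices], ignore_index=True)
print(mixed)
```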