# Working with Multiple DataFrames

## Introduction: Multiple DataFrames

In order to efficiently store data, we often spread related information across multiple tables.

## Merging Two DataFrames: Inner Merge

Matching related rows across tables (for example, linking each order to the customer who placed it) is easy to do by hand for one row, but hard to do for many rows. Pandas can do this efficiently for the entire table with the **.merge()** method.

The **.merge()** method looks for columns that are common to both DataFrames and then looks for rows where those columns' values are the same. It then combines each pair of matching rows into a single row in a new table.

We can call the **pd.merge()** function with two tables like this:

**new_df = pd.merge(orders, customers)**

This will match up all of the customer information to the orders that each customer made.
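
A minimal, self-contained sketch of the orders/customers merge, using small hypothetical tables built inline instead of CSV files. The column names (order_id, customer_id, amount, name) are assumptions chosen for illustration; customer_id is the only column the two tables share, so it is the one **pd.merge()** matches on.

```python
import pandas as pd

# Hypothetical example tables; customer_id is the only column name they share
orders = pd.DataFrame({
    'order_id': [101, 102, 103, 104],
    'customer_id': [1, 2, 1, 3],
    'amount': [25.0, 40.0, 15.5, 60.0],
})
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Ann', 'Bob', 'Cat'],
})

# pd.merge() detects the shared column (customer_id) and joins rows whose values match
new_df = pd.merge(orders, customers)
print(new_df)
```

Because customer 1 placed two orders (101 and 103), Ann's name is copied onto both of those rows.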

**For instance**

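    import pandas as pd
    
    # Load the two tables to be combined
    sales = pd.read_csv('sales.csv')
    print(sales)
    
    targets = pd.read_csv('targets.csv')
    print(targets)
    
    # Inner merge on the column(s) the two tables have in common
    sales_vs_targets = pd.merge(sales, targets)
    print(sales_vs_targets)
    
    # Keep only the rows where revenue beat the target
    crushing_it = sales_vs_targets[sales_vs_targets.revenue > sales_vs_targets.target]
    print(crushing_it)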


In addition to using **pd.merge()**, each DataFrame has its own **.merge()** method. For instance, if you wanted to merge orders with customers, you could use: 

**new_df = orders.merge(customers)**

This produces the same DataFrame as if we had called **pd.merge(orders, customers)**. We generally use this form when joining more than two DataFrames together, because we can “chain” the commands: each **.merge()** returns a new DataFrame, which has its own **.merge()** method.
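
A small sketch, assuming hypothetical inline tables orders, customers and products (not from any real dataset), that checks the two call forms agree and then chains a second merge onto the first.

```python
import pandas as pd

# Hypothetical tables: orders shares customer_id with customers
# and product_id with products
orders = pd.DataFrame({
    'order_id': [101, 102, 103],
    'customer_id': [1, 2, 1],
    'product_id': [10, 11, 11],
})
customers = pd.DataFrame({'customer_id': [1, 2], 'name': ['Ann', 'Bob']})
products = pd.DataFrame({'product_id': [10, 11], 'description': ['pen', 'notebook']})

# The method form gives the same result as the function form
method_form = orders.merge(customers)
function_form = pd.merge(orders, customers)
print(method_form.equals(function_form))  # True

# Chaining: the first .merge() returns a DataFrame, so a second .merge() can follow
all_data = orders.merge(customers).merge(products)
print(all_data)
```

Each merge in the chain matches on whatever columns the running result and the next table share: customer_id for the first step, product_id for the second.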

**For instance**

    import pandas as pd
    
    # Load the three tables
    sales = pd.read_csv('sales.csv')
    print(sales)
    targets = pd.read_csv('targets.csv')
    print(targets)
    men_women = pd.read_csv('men_women_sales.csv')
    print(men_women)
    
    # Chain the merges: sales with targets, then that result with men_women
    all_data = sales.merge(targets).merge(men_women)
    print(all_data)
    
    # Rows where revenue beat the target and the women column exceeds the men column
    results = all_data[(all_data.revenue > all_data.target) & (all_data.women > all_data.men)]
    print(results)

## Merge on Specific Columns

In the previous examples, **.merge()** “knew” how to combine tables based on the columns that had the same name in both tables. This won't always be true when we want to perform a merge. For instance, an orders table might refer to a product through a column called product_id, while the products table simply calls that column id.

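We can use the keywords **left_on** and **right_on** to specify which column of the left and right DataFrame, respectively, to perform the merge on. When the two tables have other column names in common, the **suffixes** keyword sets the labels appended to those names so the columns can be told apart (pandas defaults to "_x" and "_y").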

**For instance**

    import pandas as pd
    
    # Load the two tables
    orders = pd.read_csv('orders.csv')
    print(orders)
    products = pd.read_csv('products.csv')
    print(products)
    
    # Match orders.product_id against products.id; any column name found in
    # both tables gets the matching suffix (e.g. id_orders, id_products)
    orders_products = pd.merge(
        orders,
        products,
        left_on='product_id',
        right_on='id',
        suffixes=('_orders', '_products'),
    )
    
    print(orders_products)
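
Since the example above depends on CSV files, here is a self-contained sketch with hypothetical inline tables in which both orders and products have an id column. It shows the column names that **suffixes** produces: the two id columns come out as id_orders and id_products, while product_id, which appears only in orders, keeps its name.

```python
import pandas as pd

# Hypothetical tables: orders.product_id refers to products.id,
# and both tables happen to have a column named 'id'
orders = pd.DataFrame({
    'id': [1, 2, 3],
    'product_id': [10, 11, 10],
    'quantity': [2, 1, 5],
})
products = pd.DataFrame({
    'id': [10, 11],
    'description': ['pen', 'notebook'],
})

orders_products = pd.merge(
    orders,
    products,
    left_on='product_id',               # key column in the left table
    right_on='id',                      # key column in the right table
    suffixes=('_orders', '_products'),  # tell the two 'id' columns apart
)
print(orders_products.columns.tolist())
```

Because the key columns have different names, both are kept in the result: product_id from orders and id_products from products.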

## Mismatched Merges

When we merge two DataFrames whose key values don't line up perfectly, the rows that have no match in the other DataFrame are simply left out of the result. This is because, by default, **pd.merge()** performs an inner merge, which only returns a row if its key value appears in both DataFrames.

**When we merge two DataFrames whose rows don’t match perfectly, we lose the unmatched rows.**
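
A minimal sketch of the default (inner) merge dropping unmatched rows, assuming two hypothetical tables company_a and company_b that share a name column. Ann appears only in company_a and Dan only in company_b, so neither survives the merge. Writing **how='inner'** is optional, since it is the default; it is spelled out here only for clarity.

```python
import pandas as pd

# Hypothetical customer lists from two companies, sharing the 'name' column
company_a = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cat'],
    'email': ['ann@a.com', 'bob@a.com', 'cat@a.com'],
})
company_b = pd.DataFrame({
    'name': ['Bob', 'Cat', 'Dan'],
    'phone': ['555-0101', '555-0102', '555-0103'],
})

# Inner merge: only names present in both tables (Bob and Cat) are kept
inner = pd.merge(company_a, company_b, how='inner')
print(inner)
```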

### Outer Merge

An outer merge (also called an outer join) includes all rows from both tables, even if they don’t match. Any missing values are filled in with None or NaN (which stands for “Not a Number”).

**pd.merge(company_a, company_b, how='outer')**
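
A short sketch of the outer merge, again assuming hypothetical company_a and company_b tables that share a name column. Every name from either table appears once; Ann has no phone number and Dan has no email, so those cells come out as NaN.

```python
import pandas as pd

# Hypothetical customer lists from two companies, sharing the 'name' column
company_a = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cat'],
    'email': ['ann@a.com', 'bob@a.com', 'cat@a.com'],
})
company_b = pd.DataFrame({
    'name': ['Bob', 'Cat', 'Dan'],
    'phone': ['555-0101', '555-0102', '555-0103'],
})

# Outer merge: keep every row from both tables, filling gaps with NaN
outer = pd.merge(company_a, company_b, how='outer')
print(outer)
```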

## Left and Right Merge

### Left Merge

A Left Merge includes all rows from the first (left) table, but only rows from the second (right) table that match the first table. 

**pd.merge(company_a, company_b, how='left')**
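
A short sketch of the left merge with the same hypothetical company_a and company_b tables. All three rows of company_a are kept; Dan, who is only in company_b, is dropped, and Ann gets NaN for phone.

```python
import pandas as pd

# Hypothetical customer lists from two companies, sharing the 'name' column
company_a = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cat'],
    'email': ['ann@a.com', 'bob@a.com', 'cat@a.com'],
})
company_b = pd.DataFrame({
    'name': ['Bob', 'Cat', 'Dan'],
    'phone': ['555-0101', '555-0102', '555-0103'],
})

# Left merge: every row of company_a, plus matching phone numbers from company_b
left_merged = pd.merge(company_a, company_b, how='left')
print(left_merged)
```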


### Right Merge

A right merge is the mirror image of a left merge. Here, the merged table includes all rows from the second (right) table, but only the rows from the first (left) table that match the second table.

**pd.merge(company_a, company_b, how="right")**
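
A short sketch of the right merge with the same hypothetical company_a and company_b tables. All three rows of company_b are kept; Ann, who is only in company_a, is dropped, and Dan gets NaN for email.

```python
import pandas as pd

# Hypothetical customer lists from two companies, sharing the 'name' column
company_a = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cat'],
    'email': ['ann@a.com', 'bob@a.com', 'cat@a.com'],
})
company_b = pd.DataFrame({
    'name': ['Bob', 'Cat', 'Dan'],
    'phone': ['555-0101', '555-0102', '555-0103'],
})

# Right merge: every row of company_b, plus matching emails from company_a
right_merged = pd.merge(company_a, company_b, how='right')
print(right_merged)
```

The same rows can be obtained with a left merge by swapping the arguments, **pd.merge(company_b, company_a, how='left')**, although the column order differs.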

## Concatenate DataFrames

Sometimes, a dataset is broken into multiple tables. For instance, data is often split into multiple CSV files so that each download is smaller. When we need to reconstruct a single DataFrame from multiple smaller DataFrames, we can use the function **pd.concat([df1, df2, df3, ...])**, which stacks their rows one after another. It is meant for DataFrames that all have the same columns; if the columns differ, **pd.concat()** still runs but fills the missing values with NaN.
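
A minimal sketch, assuming two hypothetical pieces df1 and df2 of one dataset with identical columns. The **ignore_index=True** argument is an addition not mentioned above: without it, the result keeps each piece's original index labels (0, 1, 0 here); with it, pandas renumbers the rows 0 to n-1.

```python
import pandas as pd

# Hypothetical pieces of one dataset, e.g. read from separate CSV files
df1 = pd.DataFrame({'name': ['Ann', 'Bob'], 'email': ['ann@a.com', 'bob@a.com']})
df2 = pd.DataFrame({'name': ['Cat'], 'email': ['cat@a.com']})

# Stack the rows in list order; ignore_index=True renumbers the index
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
```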


