# Video: Joining Data Frames with Pandas

This video shows how to join data held in separate data frames into a single data frame with Pandas.


## Slide: Data Frames are Joined By Index

* The previous trivial joins worked because both data frames shared the same index.
* The data frame `join` method can join from any column of the calling data frame to the index of the other data frame.
* Key idea: looking up a value in an index is faster than scanning an ordinary column.
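
As a minimal sketch of the second bullet, the example below joins a column of one data frame to the index of another. The `orders` and `prices` frames and their values are hypothetical and are not part of the video's data.

```python
import pandas as pd

# Hypothetical data for illustration only.
orders = pd.DataFrame({"item": ["nail", "screw", "nail"], "quantity": [10, 4, 2]})
prices = pd.DataFrame({"price": [0.05, 0.12]}, index=["nail", "screw"])

# `on` names a column of the calling frame; its values are matched
# against the index of the other frame.
joined = orders.join(prices, on="item")
print(joined)
```

The calling frame keeps its own index and row order, and each row gains the `price` found under its `item` in the other frame's index.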


## Slide: Joining Costs to Project Materials

TODO picture of garden bed (I will provide)

How much does all that wood cost?

## Code Example: Garden Bed Data

In [None]:
import pandas as pd

In [None]:
bed_size_materials = pd.read_csv("https://raw.githubusercontent.com/bu-omds/bu-omds-data/main/data/garden-bed_size_materials.tsv", sep="\t")
bed_size_materials

,bed_size,material,quantity_per_bed
0,4' x 4',"2"" x 6"" x 4'",20
1,4' x 4',"8"" x 8"" x 16"" Cinder Block",12
2,4' x 8',"2"" x 6"" x 4'",6
3,4' x 8',"2"" x 6"" x 8'",14
4,4' x 8',"8"" x 8"" x 16"" Cinder Block",24


In [None]:
material_costs = pd.read_csv("https://raw.githubusercontent.com/bu-omds/bu-omds-data/main/data/garden-material_costs.tsv", sep="\t", index_col="material")
material_costs

material,unit_cost
"2"" x 6"" x 4'",4.92
"2"" x 6"" x 8'",6.62
"8"" x 8"" x 16"" Cinder Block",2.53

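The `index_col="material"` argument is what makes the material names the index, so that a later join can look them up. A small offline sketch, assuming a made-up two-row TSV string in place of the remote file, suggests it behaves like reading the file and then calling `set_index`:

```python
import io
import pandas as pd

# Hypothetical inline TSV standing in for the remote file.
tsv_text = "material\tunit_cost\nboard\t4.92\nblock\t2.53\n"

# Option 1: set the index while reading.
by_arg = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col="material")

# Option 2: read first, then move the column into the index.
by_method = pd.read_csv(io.StringIO(tsv_text), sep="\t").set_index("material")

print(by_arg)
print(by_method)
```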

In [None]:
bed_size_costs = bed_size_materials.join(material_costs, on="material")
bed_size_costs

,bed_size,material,quantity_per_bed,unit_cost
0,4' x 4',"2"" x 6"" x 4'",20,4.92
1,4' x 4',"8"" x 8"" x 16"" Cinder Block",12,2.53
2,4' x 8',"2"" x 6"" x 4'",6,4.92
3,4' x 8',"2"" x 6"" x 8'",14,6.62
4,4' x 8',"8"" x 8"" x 16"" Cinder Block",24,2.53
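
Every material in this example has a matching entry in the cost index. `join` defaults to a left join, so a row without a match is kept with a missing value rather than dropped. The sketch below uses hypothetical frames (`materials`, `costs`) to show one way to spot such rows before multiplying.

```python
import pandas as pd

# Hypothetical data: "gravel" has no entry in the cost index.
materials = pd.DataFrame({"material": ["board", "gravel"], "quantity": [4, 2]})
costs = pd.DataFrame(
    {"unit_cost": [4.92]},
    index=pd.Index(["board"], name="material"),
)

# Default how="left": every row of the calling frame is kept.
joined = materials.join(costs, on="material")
print(joined)

# Rows whose material was not found in the index have a missing unit_cost.
unmatched = joined[joined["unit_cost"].isna()]
print(unmatched["material"].tolist())
```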


In [None]:
bed_size_costs["cost"] = bed_size_costs["quantity_per_bed"] * bed_size_costs["unit_cost"]
bed_size_costs

,bed_size,material,quantity_per_bed,unit_cost,cost
0,4' x 4',"2"" x 6"" x 4'",20,4.92,98.4
1,4' x 4',"8"" x 8"" x 16"" Cinder Block",12,2.53,30.36
2,4' x 8',"2"" x 6"" x 4'",6,4.92,29.52
3,4' x 8',"2"" x 6"" x 8'",14,6.62,92.68
4,4' x 8',"8"" x 8"" x 16"" Cinder Block",24,2.53,60.72


In [None]:
bed_size_costs.groupby("bed_size")["cost"].sum()

bed_size
4' x 4'    128.76
4' x 8'    182.92
Name: cost, dtype: float64

In [None]:
bed_size_costs = bed_size_costs.groupby("bed_size")[["cost"]].sum()
bed_size_costs

bed_size,cost
4' x 4',128.76
4' x 8',182.92
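
The last two cells differ only in the brackets used to select `cost`: single brackets give a Series, and double brackets give a one-column data frame. A minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical costs for illustration only.
costs = pd.DataFrame(
    {"bed_size": ["small", "small", "large"], "cost": [1.0, 2.0, 5.0]}
)

# Single brackets select one column, so the result is a Series.
as_series = costs.groupby("bed_size")["cost"].sum()

# Double brackets select a list of columns, so the result is a DataFrame.
as_frame = costs.groupby("bed_size")[["cost"]].sum()

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

Keeping the result as a data frame can be convenient when it will later be joined to another data frame or gain more columns.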


## Slide: Garden Bed Wrap Up

TODO illustration of cheap big vs expensive small

* \$129 << \$400 (online comparison)
* 4' x 4' >> 4' x 1'

* Wrapping up this example: when I looked into ordering a garden bed online, most of the beds I found were about four feet by one foot and cost 400 to 500 dollars, so I was pretty pleased with this result.
* Yes, some costs are missing, including hardware and my personal time, but I got a much bigger garden bed too.
* Now, I doubt you all came here to become quantitative woodworkers.
* So let's talk about generalizations.