# Merge 

In [2]:
# print all the outputs in a cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import pandas as pd

In [None]:
# Display floats with 2 decimal places
pd.set_option('display.float_format', lambda x: '%.2f' % x)

Load the survey data

In [3]:
df  = pd.read_csv('cleaned_survey.csv', index_col=0)

In [4]:
len(df)

61

Let us assume that we also have a dataframe <i>df_programs</i>, which contains the units required to complete the graduate programs at our business school.

In [5]:
df_programs = pd.DataFrame({
    'Program': ['MSIS', 'MBA', 'Master of Finance',
                'Supply Chain Mgmt & Analytics', 'Master of Hacking'],
    'Units_required': [51, 70, 48, 49, 100]})

Note that Master of Hacking (unfortunately) does not actually exist... 

In [6]:
df_programs

| | Program | Units_required |
|---|---|---|
| 0 | MSIS | 51 |
| 1 | MBA | 70 |
| 2 | Master of Finance | 48 |
| 3 | Supply Chain Mgmt & Analytics | 49 |
| 4 | Master of Hacking | 100 |


## Merge on columns

A merge operation (a "join" in relational DBs) combines the columns of two tables by matching the rows whose values are equal in one or more key columns. For example, we can add to <i>df</i> a column <i>Units_required</i>, which reports the units required by the program in which each student is enrolled.

### INNER MERGE (default)

Compact formulation: the merge will be performed on the columns with the same name in both tables. Merging <i>df</i> with <i>df_programs</i> will perform the merge on the column <i>Program</i>, because that is the only column with the same name.
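
A minimal sketch of the compact formulation. Since the survey file is not reproduced here, <i>df_demo</i> and its <i>Student</i> column are hypothetical stand-ins for <i>df</i>; only <i>df_programs</i> comes from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the survey data (the real df is read from cleaned_survey.csv)
df_demo = pd.DataFrame({
    'Student': ['Ann', 'Bob', 'Cho', 'Dev'],
    'Program': ['MSIS', 'MBA', 'MSIS', 'Other'],  # 'Other' is not in df_programs
})

df_programs = pd.DataFrame({
    'Program': ['MSIS', 'MBA', 'Master of Finance',
                'Supply Chain Mgmt & Analytics', 'Master of Hacking'],
    'Units_required': [51, 70, 48, 49, 100]})

# No key is given, so pandas merges on the only shared column name, 'Program'.
# how='inner' is the default: only rows with a match in both tables are kept.
df_inner = df_demo.merge(df_programs)
print(df_inner)
```

In this toy data the student enrolled in 'Other' is dropped, and so are the programs with no students.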

If the merged table has too many columns or rows to be displayed in full, raise the display limits with pd.set_option('display.max_columns', 50) and pd.set_option('display.max_rows', 100).

Alternatively, we can name the key columns explicitly with <i>left_on</i> (the column or list of columns on the "left" table) and <i>right_on</i> (the column or list of columns on the "right" table). This is required when the key columns have different names in the two tables.
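
A short sketch of the explicit form. To make the two parameters do some work, the right table's key is renamed to <i>Program_name</i>; that rename, <i>df_demo</i>, and the <i>Student</i> column are assumptions made for illustration and are not part of the survey data.

```python
import pandas as pd

# Hypothetical stand-in for the survey data
df_demo = pd.DataFrame({
    'Student': ['Ann', 'Bob', 'Cho', 'Dev'],
    'Program': ['MSIS', 'MBA', 'MSIS', 'Other'],
})

df_programs = pd.DataFrame({
    'Program': ['MSIS', 'MBA', 'Master of Finance',
                'Supply Chain Mgmt & Analytics', 'Master of Hacking'],
    'Units_required': [51, 70, 48, 49, 100]})

# Assumed variant of the right table whose key column has a different name
df_programs_renamed = df_programs.rename(columns={'Program': 'Program_name'})

# The key columns are named explicitly, one per table
df_explicit = pd.merge(df_demo, df_programs_renamed,
                       left_on='Program', right_on='Program_name')
print(df_explicit)
```

When the key names differ, both key columns are kept in the result, so one of them can be dropped afterwards if it is redundant.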

### LEFT MERGE

This is the equivalent of the left outer join in relational DBs. If a row of the left table finds no match in the right table, it still appears in the result, and the columns coming from the right table are filled with NaN.
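
A minimal sketch of a left merge, again on hypothetical stand-in data (<i>df_demo</i> and <i>Student</i> are assumptions, not the real survey columns).

```python
import pandas as pd

# Hypothetical stand-in for the survey data
df_demo = pd.DataFrame({
    'Student': ['Ann', 'Bob', 'Cho', 'Dev'],
    'Program': ['MSIS', 'MBA', 'MSIS', 'Other'],  # 'Other' has no match
})

df_programs = pd.DataFrame({
    'Program': ['MSIS', 'MBA', 'Master of Finance',
                'Supply Chain Mgmt & Analytics', 'Master of Hacking'],
    'Units_required': [51, 70, 48, 49, 100]})

# how='left' keeps every row of the left table, matched or not
df_left = df_demo.merge(df_programs, how='left')
print(df_left)

# Rows of the left table that found no match have NaN in Units_required
unmatched = df_left[df_left['Units_required'].isna()]
```

Note that <i>Units_required</i> becomes a float column here, because NaN cannot be stored in an integer column.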

### RIGHT MERGE
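
By symmetry with the left merge, a right merge keeps every row of the right table. The sketch below uses hypothetical stand-in data (<i>df_demo</i> and <i>Student</i> are assumptions) to show that programs without students still appear in the result.

```python
import pandas as pd

# Hypothetical stand-in for the survey data
df_demo = pd.DataFrame({
    'Student': ['Ann', 'Bob', 'Cho', 'Dev'],
    'Program': ['MSIS', 'MBA', 'MSIS', 'Other'],
})

df_programs = pd.DataFrame({
    'Program': ['MSIS', 'MBA', 'Master of Finance',
                'Supply Chain Mgmt & Analytics', 'Master of Hacking'],
    'Units_required': [51, 70, 48, 49, 100]})

# how='right' keeps every row of the right table (df_programs);
# programs with no student get NaN in the columns coming from the left table
df_right = df_demo.merge(df_programs, how='right')
print(df_right)
```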

### OUTER MERGE
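
An outer merge (full outer join in relational DBs) keeps the unmatched rows of both tables. The sketch below uses hypothetical stand-in data (<i>df_demo</i> and <i>Student</i> are assumptions) and the <i>indicator</i> parameter of merge to show where each row came from.

```python
import pandas as pd

# Hypothetical stand-in for the survey data
df_demo = pd.DataFrame({
    'Student': ['Ann', 'Bob', 'Cho', 'Dev'],
    'Program': ['MSIS', 'MBA', 'MSIS', 'Other'],
})

df_programs = pd.DataFrame({
    'Program': ['MSIS', 'MBA', 'Master of Finance',
                'Supply Chain Mgmt & Analytics', 'Master of Hacking'],
    'Units_required': [51, 70, 48, 49, 100]})

# how='outer' keeps all rows of both tables.
# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
df_outer = df_demo.merge(df_programs, how='outer', indicator=True)
print(df_outer)
```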

## Merge on Indices

Let's create a new DataFrame, called <i>df_programs_i</i>, which is a copy of <i>df_programs</i> but with <i>Program</i> being an index instead of a column.

To merge <i>df</i> (left table) with <i>df_programs_i</i> (right table), we need to specify that we use the column <i>Program</i> on the left table (<b>left_on = 'Program'</b>) and the index on the right table (<b>right_index = True</b>).
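
A minimal sketch of both steps, using <i>set_index</i> to build <i>df_programs_i</i>. As before, <i>df_demo</i> and its <i>Student</i> column are hypothetical stand-ins for the survey data.

```python
import pandas as pd

# Hypothetical stand-in for the survey data
df_demo = pd.DataFrame({
    'Student': ['Ann', 'Bob', 'Cho', 'Dev'],
    'Program': ['MSIS', 'MBA', 'MSIS', 'Other'],
})

df_programs = pd.DataFrame({
    'Program': ['MSIS', 'MBA', 'Master of Finance',
                'Supply Chain Mgmt & Analytics', 'Master of Hacking'],
    'Units_required': [51, 70, 48, 49, 100]})

# Copy of df_programs with Program as the index instead of a column
df_programs_i = df_programs.set_index('Program')

# Match the Program column of the left table against the index of the right table
df_by_index = df_demo.merge(df_programs_i, left_on='Program', right_index=True)
print(df_by_index)
```

If both tables were indexed by <i>Program</i>, the same idea would apply with <i>left_index=True</i> in place of <i>left_on</i>.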

## Problems

For each programming skill level, find the average number of units to be completed by the students with that skill level.


For each existing program (i.e., for each Program in df_programs), find the units required to complete it and the number of students belonging to that program that responded to the survey. 

For each person in df, find the number of weekly hours they are working, assuming that:
<ul>
<li>each required unit of coursework is 0.25 hours a week of work
<li>Job=0 is 0 hours a week of work
<li>Job=0.5 is 20 hours a week of work
<li>Job=1 is 40 hours a week of work
</ul>