# Creating DataFrames and Merging in pandas
In this notebook, we'll create two DataFrames and practice combining them with the `merge()` function in pandas, covering inner, left, right, and outer joins. We'll finish with a short look at `concat()`.

## Creating DataFrames

We'll start by creating two DataFrames: one for employees and one for departments.

```python
import pandas as pd

# Dataset 1: Employees
employees = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John Doe', 'Jane Smith', 'Mike Brown', 'Emily Davis', 'Anna White'],
    'DepartmentID': [101, 102, 101, 103, 104],
    'Salary': [50000, 60000, 45000, 70000, 48000]
})

# Dataset 2: Departments
departments = pd.DataFrame({
    'DepartmentID': [101, 102, 103, 105],
    'DepartmentName': ['HR', 'IT', 'Marketing', 'Sales']
})

# Display the DataFrames
print("Employees DataFrame:")
print(employees)
print("\nDepartments DataFrame:")
print(departments)
```

```text
Employees DataFrame:
   EmployeeID         Name  DepartmentID  Salary
0           1     John Doe           101   50000
1           2   Jane Smith           102   60000
2           3   Mike Brown           101   45000
3           4  Emily Davis           103   70000
4           5   Anna White           104   48000

Departments DataFrame:
   DepartmentID DepartmentName
0           101             HR
1           102             IT
2           103      Marketing
3           105          Sales
```
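Before merging, it helps to see which keys overlap: `DepartmentID` 104 (Anna White) has no row in `departments`, and 105 (Sales) has no employees. A minimal sketch of that check using plain Python sets; the names `emp_keys` and `dept_keys` are illustrative and not part of the original notebook:

```python
import pandas as pd

# Same sample data as above, repeated so this cell runs on its own
employees = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John Doe', 'Jane Smith', 'Mike Brown', 'Emily Davis', 'Anna White'],
    'DepartmentID': [101, 102, 101, 103, 104],
    'Salary': [50000, 60000, 45000, 70000, 48000]
})
departments = pd.DataFrame({
    'DepartmentID': [101, 102, 103, 105],
    'DepartmentName': ['HR', 'IT', 'Marketing', 'Sales']
})

# Convert each key column to a set of plain Python ints
emp_keys = set(employees['DepartmentID'].tolist())
dept_keys = set(departments['DepartmentID'].tolist())

print("In both:", sorted(emp_keys & dept_keys))             # [101, 102, 103]
print("Only in employees:", sorted(emp_keys - dept_keys))   # [104]
print("Only in departments:", sorted(dept_keys - emp_keys)) # [105]
```

The shared keys are what an inner join keeps; the one-sided keys are what left, right, and outer joins bring back with `NaN` on the missing side.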


## Example Merge Operations

The `how` parameter of `pd.merge()` selects the join type. Here are the four most common:

### 1. Inner Join
Merge on `DepartmentID`, keeping only rows whose key appears in both DataFrames. Employees without a matching department, and departments without employees, are dropped.

```python
# Inner join
merged_inner = pd.merge(employees, departments, on='DepartmentID', how='inner')

print("\nInner Join Result:")
merged_inner
```

```text
Inner Join Result:
```

|   | EmployeeID | Name | DepartmentID | Salary | DepartmentName |
|---|---|---|---|---|---|
| 0 | 1 | John Doe | 101 | 50000 | HR |
| 1 | 3 | Mike Brown | 101 | 45000 | HR |
| 2 | 2 | Jane Smith | 102 | 60000 | IT |
| 3 | 4 | Emily Davis | 103 | 70000 | Marketing |
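Depending on your pandas version, an inner join with the default `sort=False` may return rows in a different order than shown above (newer releases preserve the left DataFrame's key order, which would put Jane Smith before Mike Brown). A minimal sketch that makes the order explicit with `sort=True`, and also passes `validate='many_to_one'` so the merge raises `pd.errors.MergeError` if a `DepartmentID` is ever duplicated in `departments`. The `dup_departments` frame is a hypothetical bad input built only to trigger that check:

```python
import pandas as pd

# Same sample data as above, repeated so this cell runs on its own
employees = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John Doe', 'Jane Smith', 'Mike Brown', 'Emily Davis', 'Anna White'],
    'DepartmentID': [101, 102, 101, 103, 104],
    'Salary': [50000, 60000, 45000, 70000, 48000]
})
departments = pd.DataFrame({
    'DepartmentID': [101, 102, 103, 105],
    'DepartmentName': ['HR', 'IT', 'Marketing', 'Sales']
})

# sort=True orders the result by the join key; validate='many_to_one'
# checks that each DepartmentID is unique in the right DataFrame
merged_inner_sorted = pd.merge(
    employees, departments,
    on='DepartmentID', how='inner',
    sort=True, validate='many_to_one',
)
print(merged_inner_sorted[['Name', 'DepartmentID', 'DepartmentName']])

# Hypothetical bad input: the HR row appears twice, so validation fails
dup_departments = pd.concat([departments, departments.iloc[[0]]], ignore_index=True)
validation_failed = False
try:
    pd.merge(employees, dup_departments, on='DepartmentID', validate='many_to_one')
except pd.errors.MergeError as err:
    validation_failed = True
    print("Validation failed:", err)
```

Validating the relationship is cheap insurance: without it, a duplicated department row would silently duplicate every matching employee.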


### 2. Left Join
Keep every employee and attach department information where a matching `DepartmentID` exists; employees without a match get `NaN` in the department columns.

```python
# Left join
merged_left = pd.merge(employees, departments, on='DepartmentID', how='left')

print("\nLeft Join Result:")
merged_left
```

```text
Left Join Result:
```

|   | EmployeeID | Name | DepartmentID | Salary | DepartmentName |
|---|---|---|---|---|---|
| 0 | 1 | John Doe | 101 | 50000 | HR |
| 1 | 2 | Jane Smith | 102 | 60000 | IT |
| 2 | 3 | Mike Brown | 101 | 45000 | HR |
| 3 | 4 | Emily Davis | 103 | 70000 | Marketing |
| 4 | 5 | Anna White | 104 | 48000 | NaN |
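The `indicator` parameter of `pd.merge()` records where each row came from, which makes it easy to pull out the employees that found no department (often called an anti-join). A minimal sketch; `unmatched` is an illustrative name:

```python
import pandas as pd

# Same sample data as above, repeated so this cell runs on its own
employees = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John Doe', 'Jane Smith', 'Mike Brown', 'Emily Davis', 'Anna White'],
    'DepartmentID': [101, 102, 101, 103, 104],
    'Salary': [50000, 60000, 45000, 70000, 48000]
})
departments = pd.DataFrame({
    'DepartmentID': [101, 102, 103, 105],
    'DepartmentName': ['HR', 'IT', 'Marketing', 'Sales']
})

# indicator=True adds a categorical '_merge' column whose values are
# 'both', 'left_only', or 'right_only'
merged_left = pd.merge(employees, departments, on='DepartmentID',
                       how='left', indicator=True)

# Anti-join: employees whose DepartmentID has no match in `departments`
unmatched = merged_left.loc[merged_left['_merge'] == 'left_only',
                            ['Name', 'DepartmentID']]
print(unmatched)
```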


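Swapping the argument order makes `departments` the left DataFrame, so this left join keeps every department, including Sales (105), which has no employees. Its `EmployeeID`, `Name`, and `Salary` are filled with `NaN`, which is why `EmployeeID` and `Salary` are upcast to floats (`1.0`, `50000.0`).

```python
# Left join with departments as the left DataFrame
merged_left_depts = pd.merge(departments, employees, on='DepartmentID', how='left')

print("\nLeft Join Result:")
merged_left_depts
```

```text
Left Join Result:
```

|   | DepartmentID | DepartmentName | EmployeeID | Name | Salary |
|---|---|---|---|---|---|
| 0 | 101 | HR | 1.0 | John Doe | 50000.0 |
| 1 | 101 | HR | 3.0 | Mike Brown | 45000.0 |
| 2 | 102 | IT | 2.0 | Jane Smith | 60000.0 |
| 3 | 103 | Marketing | 4.0 | Emily Davis | 70000.0 |
| 4 | 105 | Sales | NaN | NaN | NaN |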
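Keeping `departments` on the left is handy when every department must appear in a report, even one with no staff. A minimal sketch that counts employees per department; the name `headcount` is illustrative:

```python
import pandas as pd

# Same sample data as above, repeated so this cell runs on its own
employees = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John Doe', 'Jane Smith', 'Mike Brown', 'Emily Davis', 'Anna White'],
    'DepartmentID': [101, 102, 101, 103, 104],
    'Salary': [50000, 60000, 45000, 70000, 48000]
})
departments = pd.DataFrame({
    'DepartmentID': [101, 102, 103, 105],
    'DepartmentName': ['HR', 'IT', 'Marketing', 'Sales']
})

merged_left_depts = pd.merge(departments, employees, on='DepartmentID', how='left')

# count() skips NaN, so a department with no employees gets 0
headcount = merged_left_depts.groupby('DepartmentName')['EmployeeID'].count()
print(headcount)
```

Anna White does not appear in this count, because her `DepartmentID` (104) has no row in `departments`.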


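### 3. Right Join
Keep every department and attach employee information where available. This returns the same rows as the left join with `departments` on the left; only the column order differs.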

```python
# Right join
merged_right = pd.merge(employees, departments, on='DepartmentID', how='right')

print("\nRight Join Result:")
print(merged_right)
```

```text
Right Join Result:
   EmployeeID         Name  DepartmentID   Salary DepartmentName
0         1.0     John Doe           101  50000.0             HR
1         3.0   Mike Brown           101  45000.0             HR
2         2.0   Jane Smith           102  60000.0             IT
3         4.0  Emily Davis           103  70000.0      Marketing
4         NaN          NaN           105      NaN          Sales
```
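The equivalence between a right join and a left join with the arguments swapped can be checked directly. A minimal sketch, assuming both results keep the default 0..n-1 index as in the outputs above; `right_join` and `left_swapped` are illustrative names:

```python
import pandas as pd

# Same sample data as above, repeated so this cell runs on its own
employees = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John Doe', 'Jane Smith', 'Mike Brown', 'Emily Davis', 'Anna White'],
    'DepartmentID': [101, 102, 101, 103, 104],
    'Salary': [50000, 60000, 45000, 70000, 48000]
})
departments = pd.DataFrame({
    'DepartmentID': [101, 102, 103, 105],
    'DepartmentName': ['HR', 'IT', 'Marketing', 'Sales']
})

right_join = pd.merge(employees, departments, on='DepartmentID', how='right')
left_swapped = pd.merge(departments, employees, on='DepartmentID', how='left')

# Reorder the swapped result's columns to match, then compare values and dtypes
same = right_join.equals(left_swapped[right_join.columns])
print(same)
```

In practice, many people write left joins only and choose which DataFrame goes first, since that reads more naturally.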


### 4. Outer Join
Keep every row from both DataFrames. Where a `DepartmentID` appears on only one side (104 for Anna White, 105 for Sales), the columns from the other side are filled with `NaN`.

```python
# Outer join
merged_outer = pd.merge(employees, departments, on='DepartmentID', how='outer')

print("\nOuter Join Result:")
print(merged_outer)
```

```text
Outer Join Result:
   EmployeeID         Name  DepartmentID   Salary DepartmentName
0         1.0     John Doe           101  50000.0             HR
1         3.0   Mike Brown           101  45000.0             HR
2         2.0   Jane Smith           102  60000.0             IT
3         4.0  Emily Davis           103  70000.0      Marketing
4         5.0   Anna White           104  48000.0            NaN
5         NaN          NaN           105      NaN          Sales
```
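The `NaN` introduced by an outer join forces `EmployeeID` and `Salary` to `float64`, which is why they print as `1.0` and `50000.0`. A minimal sketch of one way to tidy the result: convert those columns to pandas' nullable `'Int64'` dtype (missing values then show as `<NA>`) and fill the missing department name. The `'Unassigned'` label is a hypothetical placeholder, not something from the original data:

```python
import pandas as pd

# Same sample data as above, repeated so this cell runs on its own
employees = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John Doe', 'Jane Smith', 'Mike Brown', 'Emily Davis', 'Anna White'],
    'DepartmentID': [101, 102, 101, 103, 104],
    'Salary': [50000, 60000, 45000, 70000, 48000]
})
departments = pd.DataFrame({
    'DepartmentID': [101, 102, 103, 105],
    'DepartmentName': ['HR', 'IT', 'Marketing', 'Sales']
})

merged_outer = pd.merge(employees, departments, on='DepartmentID', how='outer')

# Nullable integer dtype keeps whole numbers alongside missing values
merged_outer = merged_outer.astype({'EmployeeID': 'Int64', 'Salary': 'Int64'})

# Hypothetical label for the employee whose department is unknown
merged_outer = merged_outer.fillna({'DepartmentName': 'Unassigned'})

print(merged_outer)
print(merged_outer.dtypes)
```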


## Summary
In this notebook, we demonstrated how to create DataFrames and perform various merge operations using the `merge()` function in pandas. We covered:
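- **Inner Join** (`how='inner'`): Keeps only rows whose key appears in both DataFrames.
- **Left Join** (`how='left'`): Keeps all rows from the left DataFrame and fills in matching columns from the right DataFrame, with NaN where there is no match.
- **Right Join** (`how='right'`): Keeps all rows from the right DataFrame and fills in matching columns from the left DataFrame, with NaN where there is no match.
- **Outer Join** (`how='outer'`): Keeps all rows from both DataFrames, with NaN wherever a key has no match on the other side.

These correspond to SQL's `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, and `FULL OUTER JOIN`, and are the standard way to combine related tables on a shared key.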
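A quick way to check the summary is to run all four join types on the same data and compare row counts. A minimal sketch; the `row_counts` dictionary is just for illustration:

```python
import pandas as pd

# Same sample data as above, repeated so this cell runs on its own
employees = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John Doe', 'Jane Smith', 'Mike Brown', 'Emily Davis', 'Anna White'],
    'DepartmentID': [101, 102, 101, 103, 104],
    'Salary': [50000, 60000, 45000, 70000, 48000]
})
departments = pd.DataFrame({
    'DepartmentID': [101, 102, 103, 105],
    'DepartmentName': ['HR', 'IT', 'Marketing', 'Sales']
})

# Number of rows produced by each join type
row_counts = {
    how: len(pd.merge(employees, departments, on='DepartmentID', how=how))
    for how in ['inner', 'left', 'right', 'outer']
}
print(row_counts)  # {'inner': 4, 'left': 5, 'right': 5, 'outer': 6}
```

The inner join keeps the four employees with a known department, the left join adds Anna White, the right join adds Sales, and the outer join adds both.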

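## Concatenating DataFrames with `concat()`
`pd.concat()` stacks DataFrames along rows (`axis=0`, the default) or side by side along columns (`axis=1`). Unlike `merge()`, it does not match rows on key values; it aligns on index or column labels.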

```python
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
pd.concat([df1, df2])
```

|   | A | B |
|---|---|---|
| 0 | 1 | 3 |
| 1 | 2 | 4 |
| 0 | 5 | 7 |
| 1 | 6 | 8 |
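The stacked result keeps each input's original index, so the labels 0 and 1 appear twice. A minimal sketch of two common variations: `ignore_index=True` to renumber the rows, and `axis=1` to place the frames side by side (`stacked` and `side_by_side` are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# ignore_index=True discards the input labels and builds a fresh 0..n-1 index
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)

# axis=1 places the frames side by side, aligning rows on their index labels
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side)
```

Note that `side_by_side` ends up with duplicate column names (`A`, `B`, `A`, `B`); passing `keys=['df1', 'df2']` to `pd.concat()` is one way to keep them distinguishable.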
