![openclassrooms](https://s3.eu-west-1.amazonaws.com/course.oc-static.com/courses/6204541/1+HnqdJ-5ofxiPP9HIxdNdpw.jpeg)
# Merge Data Using Pandas
This time, your task will be to build a more comprehensive dataset. You’ll need to use all of the datasets we’ve provided (the two customer files and the loans file) and merge them using all of the Pandas methods we’ve covered.


In [None]:
import numpy as np
import pandas as pd

In [None]:
# previous processing
loans = pd.read_csv('https://raw.githubusercontent.com/OpenClassrooms-Student-Center/en-8253136-Use-Python-Libraries-for-Data-Science/main/data/loans.csv')

# calculate the debt-to-income ratio
loans['debt_to_income'] = round(loans['repayment'] * 100 / loans['income'], 2)

# rename rate to interest_rate
loans.rename(columns={'rate':'interest_rate'}, inplace=True)

# calculate the total cost of the loan
loans['total_cost'] = loans['repayment'] * loans['term']

# calculate monthly profits generated
loans['profit'] = round((loans['total_cost'] * loans['interest_rate']/100)/(24), 2)

# create the risk variable
loans['risk'] = 'No'
loans.loc[loans['debt_to_income'] > 35, 'risk'] = 'Yes'

# customer profile DataFrame
customer_profile = loans.groupby('identifier')[['repayment','debt_to_income','total_cost','profit']].sum()
customer_profile.reset_index(inplace=True)
customer_profile.head()

loans.head()


Firstly, let’s import the two customer files:

In [None]:
customers_1 = pd.read_csv('https://raw.githubusercontent.com/OpenClassrooms-Student-Center/en-8253136-Use-Python-Libraries-for-Data-Science/main/data/customers.csv')
customers_1.head()


In [None]:
customers_2 = pd.read_csv('https://raw.githubusercontent.com/OpenClassrooms-Student-Center/en-8253136-Use-Python-Libraries-for-Data-Science/main/data/customers_cont.csv')
customers_2.head()


Your first task will be to bring together these two DataFrames, `customers_1` and `customers_2`, into one big DataFrame called `customers` which will contain all of our customer data.
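As a hint, here is a minimal sketch of stacking two DataFrames that share the same columns with `pd.concat`. It uses toy data rather than the real customer files, and the names `part_a`, `part_b`, `combined` and the `email` column are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Toy stand-ins for two customer files (column names are hypothetical)
part_a = pd.DataFrame({'identifier': [0, 1],
                       'email': ['a@example.com', 'b@example.com']})
part_b = pd.DataFrame({'identifier': [2, 3],
                       'email': ['c@example.com', 'd@example.com']})

# Stack the rows of both DataFrames on top of each other;
# ignore_index=True rebuilds a clean 0..n-1 index instead of repeating 0, 1
combined = pd.concat([part_a, part_b], ignore_index=True)
print(combined)
```

Without `ignore_index=True`, each part keeps its own index, so index labels would be repeated in the result.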

Next, merge the `customers` DataFrame with the customer profiles. These profiles are stored in the `customer_profile` DataFrame created in chapter 4 and rebuilt in the preprocessing cell above. Call the resulting DataFrame `data`.
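As a hint, the sketch below shows a merge on a shared key column with `pd.merge`. The data is a toy example: only the `identifier` and `total_cost` column names come from the cells above, while `toy_customers`, `toy_profile`, `toy_data` and the `email` column are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the customer table and the customer profiles
toy_customers = pd.DataFrame({'identifier': [0, 1, 2],
                              'email': ['a@example.com', 'b@example.com', 'c@example.com']})
toy_profile = pd.DataFrame({'identifier': [0, 1, 2],
                            'total_cost': [1200.0, 800.0, 450.0]})

# Match rows on the shared 'identifier' column (inner join by default);
# validate='one_to_one' raises an error if a key is duplicated on either side
toy_data = pd.merge(toy_customers, toy_profile, on='identifier', validate='one_to_one')
print(toy_data)
```

The `validate` argument is optional. It is included here only as a cheap sanity check that each customer has exactly one profile row.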

The bank’s marketing department has provided us with a file containing our customers' ages:

In [None]:
customers_age = pd.read_csv('https://raw.githubusercontent.com/OpenClassrooms-Student-Center/en-8253136-Use-Python-Libraries-for-Data-Science/main/data/customers_age.csv')
customers_age.head()


Add the age information to the `data` DataFrame. However, it would seem that some customers who took out a loan aren’t present in this file. We need to ensure that all of the information in our `data` DataFrame is retained, so please choose your arguments with care!

*Here, the join needs to be centered on the `data` DataFrame. The `customers_age` file contains many more customers than `data`, but some of the customers in `data` don’t appear in `customers_age`. A left join with `data` on the left (or, equivalently, a right join with `data` on the right) is therefore required: it retains every row of `data` and simply adds the age whenever it’s available.*
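The behaviour described above can be illustrated with a small sketch on toy data. The names `toy_data`, `toy_age`, `with_age`, `missing_age` and the `age` column are assumptions made for this example, not the contents of the real files.

```python
import pandas as pd

# Toy stand-ins: the age table has extra customers (5, 6) and lacks customer 1
toy_data = pd.DataFrame({'identifier': [0, 1, 2],
                         'total_cost': [1200.0, 800.0, 450.0]})
toy_age = pd.DataFrame({'identifier': [0, 2, 5, 6],
                        'age': [34, 51, 27, 45]})

# how='left' keeps every row of the left DataFrame;
# indicator=True adds a '_merge' column showing where each row was found
with_age = pd.merge(toy_data, toy_age, on='identifier', how='left', indicator=True)

# Rows flagged 'left_only' had no match in the age table, so their age is NaN
missing_age = with_age[with_age['_merge'] == 'left_only']
print(with_age)
```

Customers 5 and 6 are dropped because they exist only in the right-hand table, while customer 1 is kept with a missing age.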

Well done! Why don’t you compare your answers with the [solution](https://colab.research.google.com/github/OpenClassrooms-Student-Center/en-8253136-Use-Python-Libraries-for-Data-Science/blob/main/notebooks/P2/P2C5%20-%20Merge%20Data%20Using%20Pandas%20-%20CORRECTION.ipynb) now?