![openclassrooms](https://s3.eu-west-1.amazonaws.com/course.oc-static.com/courses/6204541/1+HnqdJ-5ofxiPP9HIxdNdpw.jpeg)
# Manipulate a DataFrame
We’re now going to work on our file of real estate loans.
Quick synopsis of the file: each row represents a loan that has been agreed with one of our customers. Each customer is uniquely identified using (yes, you've guessed it) an identifier! We have the following details:
- The city and ZIP code of the branch where the customer took out the loan
- The customer’s monthly income
- The customer’s monthly repayments
- The agreed loan term, expressed as a number of months
- The loan type
- The interest rate

This time, your role is to modify the dataset so you can calculate the variables needed to identify customers who are approaching the limits of their repayment ability, and determine the bank’s profits.


In [2]:
import numpy as np
import pandas as pd

Let’s start off by importing our CSV file. This file can be accessed via a link to ensure it works correctly with Google Colab, but don’t worry, it’s the same file as the one you can download [via the following link](https://raw.githubusercontent.com/OpenClassrooms-Student-Center/en-8253136-Use-Python-Libraries-for-Data-Science/main/data/loans.csv) (right click and save as).

In [4]:
loans = pd.read_csv('https://raw.githubusercontent.com/OpenClassrooms-Student-Center/en-8253136-Use-Python-Libraries-for-Data-Science/main/data/loans.csv')
loans.head()


| | identifier | city | zip code | income | repayment | term | type | interest_rate |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | CHICAGO | 60100 | 3669.0 | 1130.05 | 240 | real estate | 1.168 |
| 1 | 1 | DETROIT | 48009 | 5310.0 | 240.0 | 64 | automobile | 3.701 |
| 2 | 1 | DETROIT | 48009 | 5310.0 | 1247.85 | 300 | real estate | 1.173 |
| 3 | 2 | SAN FRANCISCO | 94010 | 1873.0 | 552.54 | 240 | real estate | 0.972 |
| 4 | 3 | SAN FRANCISCO | 94010 | 1684.0 | 586.03 | 180 | real estate | 1.014 |
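Since each row is a loan and not a customer, the same identifier can appear more than once (customer 1 has two loans in the preview). Here is a minimal sketch of how you might check this. The `sample` DataFrame is a hand-typed stand-in for the five preview rows, not the real file:

```python
import pandas as pd

# Hypothetical sample: the five preview rows, typed in by hand
sample = pd.DataFrame({
    "identifier": [0, 1, 1, 2, 3],
    "city": ["CHICAGO", "DETROIT", "DETROIT", "SAN FRANCISCO", "SAN FRANCISCO"],
    "income": [3669.0, 5310.0, 5310.0, 1873.0, 1684.0],
    "repayment": [1130.05, 240.0, 1247.85, 552.54, 586.03],
    "term": [240, 64, 300, 240, 180],
    "type": ["real estate", "automobile", "real estate", "real estate", "real estate"],
})

n_loans = len(sample)                         # one row per loan
n_customers = sample["identifier"].nunique()  # number of distinct customers
loans_per_customer = sample["identifier"].value_counts()

print(n_loans, n_customers)
# Customers holding more than one loan
print(loans_per_customer[loans_per_customer > 1])
```

The same three calls should work unchanged on the full `loans` DataFrame.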


The first thing to do is to create a new variable, `debt_to_income`, containing the debt-to-income ratio for each loan. This ratio is the percentage of the customer's monthly income that goes toward the monthly repayment. Round the result to two decimal places:


In [5]:
loans['debt_to_income'] = round(loans['repayment'] * 100 / loans['income'], 2)
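To see what this calculation does, here is a small sketch on two rows copied from the preview. The `sample` DataFrame is illustrative only, and it uses the `Series.round` method, which should give the same result as the built-in `round` used above:

```python
import pandas as pd

# Hypothetical sample: income and repayment from the first two preview rows
sample = pd.DataFrame({
    "income": [3669.0, 5310.0],
    "repayment": [1130.05, 240.0],
})

# Share of monthly income spent on the repayment, as a percentage
sample["debt_to_income"] = (sample["repayment"] * 100 / sample["income"]).round(2)

print(sample)
```

For the first row, 1130.05 × 100 / 3669 is roughly 30.80, which matches the value in the course output.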

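To make the name clearer, rename the `rate` column to `interest_rate`. If the column is already called `interest_rate`, as in the preview above, this call leaves the DataFrame unchanged, because `rename` ignores missing labels by default: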

In [6]:
loans.rename(columns={'rate':'interest_rate'}, inplace=True)
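The sketch below illustrates how `rename` behaves, on a hypothetical one-column DataFrame. Without `inplace=True` it returns a new DataFrame, and a label that doesn't exist is silently skipped unless you pass `errors="raise"`:

```python
import pandas as pd

# Hypothetical sample with a column called 'rate'
sample = pd.DataFrame({"rate": [1.168, 3.701]})

# Without inplace=True, rename returns a new DataFrame and leaves 'sample' as it was
renamed = sample.rename(columns={"rate": "interest_rate"})
print(list(sample.columns), list(renamed.columns))

# 'rate' no longer exists in 'renamed': by default the mapping is ignored
unchanged = renamed.rename(columns={"rate": "interest_rate"})

# errors="raise" turns the silent no-op into a KeyError
try:
    renamed.rename(columns={"rate": "interest_rate"}, errors="raise")
    raised = False
except KeyError:
    raised = True
print(raised)
```

Passing `errors="raise"` can be a useful safeguard when you aren't sure which column names the file contains.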

The DataFrame should look like this:

In [7]:
loans.head()

| | identifier | city | zip code | income | repayment | term | type | interest_rate | debt_to_income |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | CHICAGO | 60100 | 3669.0 | 1130.05 | 240 | real estate | 1.168 | 30.8 |
| 1 | 1 | DETROIT | 48009 | 5310.0 | 240.0 | 64 | automobile | 3.701 | 4.52 |
| 2 | 1 | DETROIT | 48009 | 5310.0 | 1247.85 | 300 | real estate | 1.173 | 23.5 |
| 3 | 2 | SAN FRANCISCO | 94010 | 1873.0 | 552.54 | 240 | real estate | 0.972 | 29.5 |
| 4 | 3 | SAN FRANCISCO | 94010 | 1684.0 | 586.03 | 180 | real estate | 1.014 | 34.8 |
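Because a customer can hold several loans, one way to spot customers approaching their repayment limit is to add up their ratios before comparing them to a threshold. The sketch below does this on the five preview rows. It assumes a customer's income is the same on each of their rows (as it is for customer 1), and the 30% cut-off is a made-up value for illustration, not a figure from the course:

```python
import pandas as pd

# Hypothetical sample: identifiers and ratios copied from the five preview rows
sample = pd.DataFrame({
    "identifier": [0, 1, 1, 2, 3],
    "debt_to_income": [30.8, 4.52, 23.5, 29.5, 34.8],
})

# Add up the per-loan ratios to get one overall ratio per customer
per_customer = sample.groupby("identifier")["debt_to_income"].sum().round(2)

THRESHOLD = 30  # assumed cut-off, for illustration only
flagged = per_customer[per_customer > THRESHOLD]

print(per_customer)
print(flagged)
```

Customer 1 ends up at 28.02% once both loans are counted, which is much closer to the assumed threshold than either loan taken separately.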


Now, we’re going to create two final variables:
- `total_cost` to represent the total cost of the loan, based on the repayment amount (`repayment`) and the loan term (`term`)
- `profit` to represent the **monthly profit** generated by this loan for the bank

We’ll simplify the calculation of the profit as follows:

profit = $\dfrac{C \times T}{24}$

where:
- C = total cost of the loan
- T = interest rate, expressed as a fraction (the `interest_rate` column holds a percentage, so divide it by 100)


In [8]:
loans['total_cost'] = loans['repayment'] * loans['term']
loans['profit'] = round((loans['total_cost'] * loans['interest_rate']/100)/(24), 2)
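To check the arithmetic, here is a short sketch that applies the same two formulas to the first two preview rows. The `sample` DataFrame is a hand-typed stand-in for the real file:

```python
import pandas as pd

# Hypothetical sample: values copied from the first two preview rows
sample = pd.DataFrame({
    "repayment": [1130.05, 240.0],
    "term": [240, 64],
    "interest_rate": [1.168, 3.701],  # percentages
})

# Total cost: monthly repayment multiplied by the number of months
sample["total_cost"] = sample["repayment"] * sample["term"]

# Simplified profit: C * T / 24, with the rate converted from a percentage
sample["profit"] = (sample["total_cost"] * sample["interest_rate"] / 100 / 24).round(2)

print(sample)
```

For the first row, 1130.05 × 240 = 271212, and 271212 × 0.01168 / 24 is roughly 131.99.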

You should get the following as your first five rows:

In [9]:
loans.head()

| | identifier | city | zip code | income | repayment | term | type | interest_rate | debt_to_income | total_cost | profit |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | CHICAGO | 60100 | 3669.0 | 1130.05 | 240 | real estate | 1.168 | 30.8 | 271212.0 | 131.99 |
| 1 | 1 | DETROIT | 48009 | 5310.0 | 240.0 | 64 | automobile | 3.701 | 4.52 | 15360.0 | 23.69 |
| 2 | 1 | DETROIT | 48009 | 5310.0 | 1247.85 | 300 | real estate | 1.173 | 23.5 | 374355.0 | 182.97 |
| 3 | 2 | SAN FRANCISCO | 94010 | 1873.0 | 552.54 | 240 | real estate | 0.972 | 29.5 | 132609.6 | 53.71 |
| 4 | 3 | SAN FRANCISCO | 94010 | 1684.0 | 586.03 | 180 | real estate | 1.014 | 34.8 | 105485.4 | 44.57 |


Now, we want to know the five loans that are generating the most profit for us. Implement a solution to display them:

In [10]:
loans.sort_values('profit', ascending=False).head()

| | identifier | city | zip code | income | repayment | term | type | interest_rate | debt_to_income | total_cost | profit |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 7 | NEW YORK CITY | 10000 | 5486.0 | 2956.95 | 300 | real estate | 1.184 | 53.9 | 887085.0 | 437.63 |
| 23 | 22 | NEW YORK CITY | 10300 | 5838.0 | 3018.25 | 240 | real estate | 1.229 | 51.7 | 724380.0 | 370.94 |
| 186 | 173 | DETROIT | 48006 | 6784.0 | 3744.77 | 180 | real estate | 1.248 | 55.2 | 674058.6 | 350.51 |
| 242 | 226 | DETROIT | 48002 | 5098.0 | 2910.96 | 240 | real estate | 1.14 | 57.1 | 698630.4 | 331.85 |
| 181 | 168 | DETROIT | 48003 | 6366.0 | 2807.41 | 240 | real estate | 1.176 | 44.1 | 673778.4 | 330.15 |
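Sorting and then taking the head is one way to get the top rows. pandas also offers `DataFrame.nlargest`, which does both in a single call. The sketch below compares the two on a hypothetical six-row sample built from profit values shown in the course outputs, keeping the top three for brevity:

```python
import pandas as pd

# Hypothetical sample: identifiers and profits copied from the course outputs
sample = pd.DataFrame({
    "identifier": [0, 1, 1, 2, 3, 7],
    "profit": [131.99, 23.69, 182.97, 53.71, 44.57, 437.63],
})

# Approach used in the exercise: sort in descending order, then keep the first rows
top_sorted = sample.sort_values("profit", ascending=False).head(3)

# Alternative: nlargest selects the n rows with the highest values directly
top_nlargest = sample.nlargest(3, "profit")

print(top_nlargest)
print(top_sorted.equals(top_nlargest))
```

Both approaches keep the original row labels, so you can still trace each loan back to its position in the full DataFrame.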
