![openclassrooms](https://s3.eu-west-1.amazonaws.com/course.oc-static.com/courses/6204541/1+HnqdJ-5ofxiPP9HIxdNdpw.jpeg)
# Filter Data in a DataFrame
Now we’re going to go a bit further with our loan data manipulation. Our company’s loan department manager has put together the following requirements:
- The maximum permitted debt-to-income ratio is 35%. Can you tell me how many people have exceeded this threshold?
- Same question, but this time only for the Detroit branch.
- To help with future loan application processing, could you add a variable called `risk`, so we can easily identify higher risk customers?
- How many vehicle loans have been granted? What is the average total cost of these loans?
- What is the total monthly profit generated by the Chicago branch?

You’ll need to apply the selection techniques covered so far to fulfill these requests. First, import the libraries and reproduce the processing from the previous steps:

```python
import numpy as np
import pandas as pd
```

```python
# previous processing
loans = pd.read_csv('https://raw.githubusercontent.com/OpenClassrooms-Student-Center/en-8253136-Use-Python-Libraries-for-Data-Science/main/data/loans.csv')

# calculate the debt-to-income ratio
loans['debt_to_income'] = round(loans['repayment'] * 100 / loans['income'], 2)

# rename rate to interest_rate
loans.rename(columns={'rate':'interest_rate'}, inplace=True)

# calculate the total cost of the loan
loans['total_cost'] = loans['repayment'] * loans['term']

# calculate monthly profits generated
loans['profit'] = round((loans['total_cost'] * loans['interest_rate']/100)/(24), 2)

loans.head()
```


| | identifier | city | zip code | income | repayment | term | type | interest_rate | debt_to_income | total_cost | profit |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | CHICAGO | 60100 | 3669.0 | 1130.05 | 240 | real estate | 1.168 | 30.8 | 271212.0 | 131.99 |
| 1 | 1 | DETROIT | 48009 | 5310.0 | 240.0 | 64 | automobile | 3.701 | 4.52 | 15360.0 | 23.69 |
| 2 | 1 | DETROIT | 48009 | 5310.0 | 1247.85 | 300 | real estate | 1.173 | 23.5 | 374355.0 | 182.97 |
| 3 | 2 | SAN FRANCISCO | 94010 | 1873.0 | 552.54 | 240 | real estate | 0.972 | 29.5 | 132609.6 | 53.71 |
| 4 | 3 | SAN FRANCISCO | 94010 | 1684.0 | 586.03 | 180 | real estate | 1.014 | 34.8 | 105485.4 | 44.57 |
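
As a quick check of the formulas in the code above, the derived columns of the first row (the Chicago loan) can be recomputed by hand. This follows the code exactly, including its division of the profit by the constant 24:

$$
\begin{aligned}
\text{debt\_to\_income} &= \frac{1130.05 \times 100}{3669} \approx 30.80 \\
\text{total\_cost} &= 1130.05 \times 240 = 271212.0 \\
\text{profit} &= \frac{271212.0 \times 1.168 / 100}{24} \approx 131.99
\end{aligned}
$$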


The maximum permitted debt-to-income ratio is 35%. Can you tell me how many people have exceeded this threshold? Please store these clients in a separate DataFrame called `high_risk_cust`, then count its rows.
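
Here is a minimal sketch of one possible answer. So that it runs on its own, it rebuilds a small stand-in for `loans` from the five rows shown by `loans.head()` above, keeping only the columns needed here. On the full dataset, use the `loans` DataFrame loaded earlier instead. None of these five rows exceeds 35%, so the count printed here is 0.

```python
import pandas as pd

# Stand-in for `loans` (assumption): the five rows shown by loans.head(),
# limited to the columns used here
loans = pd.DataFrame({
    'city': ['CHICAGO', 'DETROIT', 'DETROIT', 'SAN FRANCISCO', 'SAN FRANCISCO'],
    'debt_to_income': [30.8, 4.52, 23.5, 29.5, 34.8],
})

# Select every row whose debt-to-income ratio is above 35%
high_risk_cust = loans.loc[loans['debt_to_income'] > 35, :]

# .shape is (rows, columns), so its first element is the number of clients
print(high_risk_cust.shape[0])
```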


Here’s a quick explanation of this code:
- First, we use the `.loc` indexer to select all rows with a debt-to-income ratio above 35%.
- We then use the `.shape` attribute, whose first element is the number of rows, to count the clients meeting the condition.

Same question, but this time only for the Detroit branch.
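
Here is one way to answer, sketched on the same kind of five-row stand-in for `loans` built from the rows shown above. It combines the two conditions with `&` and wraps each one in parentheses. The name `detroit_high_risk` is our own choice and is not part of the requirements.

```python
import pandas as pd

# Stand-in for `loans` (assumption): city and debt_to_income from the five rows shown by loans.head()
loans = pd.DataFrame({
    'city': ['CHICAGO', 'DETROIT', 'DETROIT', 'SAN FRANCISCO', 'SAN FRANCISCO'],
    'debt_to_income': [30.8, 4.52, 23.5, 29.5, 34.8],
})

# Both conditions must hold: & is an element-wise AND, and each condition needs parentheses
detroit_high_risk = loans.loc[(loans['debt_to_income'] > 35) & (loans['city'] == 'DETROIT'), :]

# Number of Detroit clients above the threshold (hypothetical variable name)
print(detroit_high_risk.shape[0])
```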

To help with future loan application processing, could you add a variable called `risk` with a value of `Yes` if the client is higher risk (debt-to-income ratio > 35%) and `No` if not? To do this, I recommend creating the `risk` column with an initial value of `No` (or `Yes` if you prefer), then updating only the rows that need the other value.
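
Here is a sketch of that approach. It again uses a stand-in built from the `debt_to_income` values of the five rows shown above; on the full data, run the two assignments against `loans` itself. The first assignment fills the whole column with `No`. The `.loc` assignment then overwrites only the rows above the threshold.

```python
import pandas as pd

# Stand-in for `loans` (assumption): debt_to_income values from the five rows shown by loans.head()
loans = pd.DataFrame({'debt_to_income': [30.8, 4.52, 23.5, 29.5, 34.8]})

# Step 1: give every client the default value 'No'
loans['risk'] = 'No'

# Step 2: overwrite only the rows above the 35% threshold with 'Yes'
loans.loc[loans['debt_to_income'] > 35, 'risk'] = 'Yes'

print(loans['risk'].value_counts())
```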


How many automobile loans have been granted? What is the average total cost of these loans?
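
Here is a possible version of the code explained below. It runs on a stand-in built from the `type` and `total_cost` values of the five rows shown earlier. Those rows contain a single automobile loan, so the sketch reports 1 loan with an average total cost of 15360.0.

```python
import pandas as pd

# Stand-in for `loans` (assumption): type and total_cost from the five rows shown by loans.head()
loans = pd.DataFrame({
    'type': ['real estate', 'automobile', 'real estate', 'real estate', 'real estate'],
    'total_cost': [271212.0, 15360.0, 374355.0, 132609.6, 105485.4],
})

# Keep only automobile loans, stored in their own DataFrame
car_loans = loans.loc[loans['type'] == 'automobile', :]

# Number of automobile loans
print(car_loans.shape[0])

# Average total cost of these loans
print(car_loans['total_cost'].mean())
```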

Here’s a quick explanation of these lines of code:
- We select the rows that relate to automobile loans and store the result in a DataFrame called `car_loans`.
- As before, we use the `.shape` attribute to count the loans of this type.
- We then calculate the average of the `total_cost` variable in this newly created DataFrame.

What is the total monthly profit generated by the Chicago branch?
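
Here is one way to compute this, sketched on a stand-in built from the `city` and `profit` values of the five rows shown earlier. It passes both the row condition and the column name to `.loc`, then adds up the selected values with `.sum()`. The name `chicago_profit` is hypothetical.

```python
import pandas as pd

# Stand-in for `loans` (assumption): city and profit from the five rows shown by loans.head()
loans = pd.DataFrame({
    'city': ['CHICAGO', 'DETROIT', 'DETROIT', 'SAN FRANCISCO', 'SAN FRANCISCO'],
    'profit': [131.99, 23.69, 182.97, 53.71, 44.57],
})

# Select the profit column for Chicago rows only, then add the values up
chicago_profit = loans.loc[loans['city'] == 'CHICAGO', 'profit'].sum()

print(chicago_profit)
```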

Congratulations! Please go ahead and check your answers against our [solution](https://colab.research.google.com/github/OpenClassrooms-Student-Center/en-8253136-Use-Python-Libraries-for-Data-Science/blob/main/notebooks/P2/P2C3%20-%20Filter%20Data%20in%20a%20DataFrame%20-%20CORRECTION.ipynb).