## Question: For each constructed variable, tell us how it could be relevant to improve the risk scoring model.

Below are the features I've selected, which can be grouped into credit account variables and account variables.

In the JSON file, these variables fall under the **creditaccountsummary** and **accountrating** sections.

CREDIT ACCOUNT VARIABLES

- 'rating'
- 'amountarrear'
- 'totalaccounts'
- 'totalaccountarrear'
- 'totalaccountarrear1'
- 'totaloutstandingdebt'

ACCOUNT VARIABLES

- 'noofotheraccountsbad'
- 'noofotheraccountsgood'
- 'noofretailaccountsbad'
- 'noofretailaccountsgood'
- 'nooftelecomaccountsbad'
- 'nooftelecomaccountsgood'
- 'noofcreditcardaccountsbad'
- 'noofcreditcardaccountsgood'
- 'noofpersonalloanaccountsbad'
- 'noofpersonalloanaccountsgood'


In [2]:
import json
import warnings  
import pandas as pd
import numpy as np
import math
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.float_format', lambda x: '%.3f' % x)
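# Use the full option key; the bare 'precision' key can match several options in recent pandas versions
pd.set_option('display.precision', 0)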
warnings.filterwarnings('ignore')

In [87]:
def extract_data_from_json(json_file):
    # Read the JSON file
    with open(json_file) as f:
        data = json.load(f)

    # Initialize lists to store the extracted data
    application_id_list = []
    creditaccountsummary_list = []
    accountrating_list = []

    # Extract data from the JSON file
    for record in data:
        application_id = record.get('application_id')
        consumerfullcredit = record.get('data', {}).get('consumerfullcredit')

        # Extract and append data to the respective lists
        application_id_list.append(application_id)
        if consumerfullcredit:
            creditaccountsummary = consumerfullcredit.get('creditaccountsummary', {})
            accountrating = consumerfullcredit.get('accountrating', {})
            creditaccountsummary_list.append(creditaccountsummary)
            accountrating_list.append(accountrating)
        else:
            creditaccountsummary_list.append({})
            accountrating_list.append({})

    # Create a DataFrame with the extracted data
    df = pd.DataFrame({
        'application_id': application_id_list,
        'creditaccountsummary': creditaccountsummary_list,
        'accountrating': accountrating_list
    })

    # Flatten the dictionaries and add their key-value pairs as separate columns
    df_creditaccountsummary = pd.json_normalize(df['creditaccountsummary'])
    df_accountrating = pd.json_normalize(df['accountrating'])
    
    # Concatenate the flattened DataFrames with the original DataFrame
    df = pd.concat([df.drop(['creditaccountsummary', 'accountrating'], axis=1),
                    df_creditaccountsummary, df_accountrating], axis=1)

    return df


In [88]:
cr_df = extract_data_from_json('Credit_bureau_sample_data.json')

In [89]:
cr_df.head(10)

Unnamed: 0,application_id,rating,amountarrear,amountarrear1,totalaccounts,totalaccounts1,lastjudgementdate,lastjudgementdate1,totalaccountarrear,totalaccountarrear1,totaljudgementamount,totaloutstandingdebt,totaljudgementamount1,totaloutstandingdebt1,totaldishonouredamount,totalmonthlyinstalment,totalnumberofjudgement,totaldishonouredamount1,totalmonthlyinstalment1,totalnumberofjudgement1,totalnumberofdishonoured,totalnumberofdishonoured1,totalaccountingodcondition,totalaccountingodcondition1,noofotheraccountsbad,noofotheraccountsgood,noofretailaccountsbad,noofretailaccountsgood,nooftelecomaccountsbad,noofautoloanaccountsbad,noofautoloanccountsgood,noofhomeloanaccountsbad,nooftelecomaccountsgood,noofhomeloanaccountsgood,noofjointloanaccountsbad,noofstudyloanaccountsbad,noofcreditcardaccountsbad,noofjointloanaccountsgood,noofstudyloanaccountsgood,noofcreditcardaccountsgood,noofpersonalloanaccountsbad,noofpersonalloanaccountsgood
0,97,13,24041.0,0.0,7,0,-,-,2,0,0,105435.0,0,0.0,0.0,77404.0,0,0.0,0.0,0,0,0,0,0,0,3,0,2,0,0,0,0,0,0,0,0,0,0,0,1,0,1
1,9714953,2,0.0,0.0,17,0,-,-,1,0,0,294770.0,0,0.0,0.0,132176.0,0,0.0,0.0,0,0,0,0,0,0,3,0,12,0,0,0,0,0,0,0,0,0,0,0,0,0,2
2,9714978,109,12000.0,0.0,3,0,-,-,1,0,0,110919.0,0,0.0,0.0,7000.0,0,0.0,0.0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
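
With the flattened frame available, the selected features can be pulled out in one step. The following is a minimal sketch, not part of the original notebook: `select_features` is a hypothetical helper, the demo row reuses values from the first record in the output above, and the numeric coercion is a precaution in case the bureau file stores numbers as strings.

```python
import pandas as pd

CREDIT_ACCOUNT_VARS = [
    "rating", "amountarrear", "totalaccounts",
    "totalaccountarrear", "totalaccountarrear1", "totaloutstandingdebt",
]
ACCOUNT_VARS = [
    "noofotheraccountsbad", "noofotheraccountsgood",
    "noofretailaccountsbad", "noofretailaccountsgood",
    "nooftelecomaccountsbad", "nooftelecomaccountsgood",
    "noofcreditcardaccountsbad", "noofcreditcardaccountsgood",
    "noofpersonalloanaccountsbad", "noofpersonalloanaccountsgood",
]

def select_features(df, id_col="application_id"):
    """Hypothetical helper: keep the id plus the selected features."""
    feature_cols = CREDIT_ACCOUNT_VARS + ACCOUNT_VARS
    wanted = [id_col] + feature_cols
    missing = [c for c in wanted if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in frame: {missing}")
    out = df[wanted].copy()
    # Coerce to numeric; unparseable entries such as '-' become NaN
    out[feature_cols] = out[feature_cols].apply(pd.to_numeric, errors="coerce")
    return out

# Demo row: start from zeros, then fill in values from the first record shown above
row0 = dict.fromkeys(CREDIT_ACCOUNT_VARS + ACCOUNT_VARS, 0)
row0.update(
    application_id=97, rating=13, amountarrear=24041.0, totalaccounts=7,
    totalaccountarrear=2, totaloutstandingdebt=105435.0,
    noofotheraccountsgood=3, noofretailaccountsgood=2,
    noofcreditcardaccountsgood=1, noofpersonalloanaccountsgood=1,
    lastjudgementdate="-",  # an unselected column, dropped by the helper
)
demo = pd.DataFrame([row0])
selected = select_features(demo)
print(selected.T)
```

In the notebook itself the call would be `select_features(cr_df)`, assuming `cr_df` carries the column names shown in the output.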


### Justification:

- Together, these credit and account variables capture each customer's credit history, risk exposure, repayment behavior, and overall debt burden. Including them in the risk scoring process should support more accurate assessments of creditworthiness and default risk, and so improve the model's performance and reliability. The selection focuses on variables with non-zero, non-null values, so that each feature carries information the model can actually use.
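
The non-zero, non-null criterion can be checked explicitly per column. The sketch below is one possible way to do it, not the procedure used in the original analysis: `coverage_report` is a hypothetical helper, and the demo frame reuses a few values from the sample output (including the `-` placeholder seen in `lastjudgementdate`).

```python
import pandas as pd

def coverage_report(df, columns):
    """Hypothetical helper: share of non-null and non-zero values per column."""
    # Placeholders such as '-' cannot be parsed and are counted as missing
    numeric = df[columns].apply(pd.to_numeric, errors="coerce")
    report = pd.DataFrame({
        "non_null_share": numeric.notna().mean(),
        "non_zero_share": (numeric.fillna(0) != 0).mean(),
    })
    return report.sort_values("non_zero_share", ascending=False)

# Demo values taken from the three sample rows shown earlier
demo = pd.DataFrame({
    "amountarrear": [24041.0, 0.0, 12000.0],
    "totalaccountarrear1": [0, 0, 0],
    "lastjudgementdate": ["-", "-", "-"],
})
report = coverage_report(demo, list(demo.columns))
print(report)
```

Columns with a low `non_zero_share` on the full data set are candidates for closer inspection before they go into the model.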

CREDIT ACCOUNT VARIABLES:

- 'rating' (creditaccountsummary): The rating reported in the credit account summary, which indicates the creditworthiness and risk associated with the customer's accounts. It is a direct input for assessing the customer's overall credit risk.

- 'amountarrear' (creditaccountsummary): The amount in arrears, i.e. outstanding debt that is overdue. It helps assess the customer's payment behavior and financial health.

- 'totalaccounts' (creditaccountsummary): The total number of accounts associated with the customer, which gives an idea of their credit exposure and financial activity.

- 'totalaccountarrear' (creditaccountsummary): Judging by the sample values (2, 1 and 1 against 7, 17 and 3 total accounts), this appears to be the number of accounts in arrears rather than an amount. It shows how widespread arrears are across the customer's accounts and gives insight into their ability to manage multiple accounts.

- 'totalaccountarrear1' (creditaccountsummary): A second arrears field parallel to the previous one. Its exact meaning is not documented in the sample; it may cover a different time frame or set of accounts. It is 0 in the rows shown above.

- 'totaloutstandingdebt' (creditaccountsummary): The total outstanding debt across all accounts, which helps to understand the customer's overall debt burden and repayment capacity.
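
The raw totals above become easier to compare across customers when expressed as ratios. The following sketch shows two such derived features, assuming the column names of the flattened output and assuming that `totalaccountarrear` is a count of accounts in arrears. The helper `add_arrears_ratios` and the new column names are hypothetical, and the demo values come from the three sample rows.

```python
import numpy as np
import pandas as pd

def add_arrears_ratios(df):
    """Hypothetical helper: add arrears ratios derived from the summary totals."""
    out = df.copy()
    # Replace zero denominators with NaN so the ratio is undefined, not infinite
    debt = out["totaloutstandingdebt"].replace(0, np.nan)
    accounts = out["totalaccounts"].replace(0, np.nan)
    # Share of outstanding debt that is overdue
    out["arrears_to_debt"] = out["amountarrear"] / debt
    # Share of accounts in arrears (assumes totalaccountarrear is a count)
    out["share_accounts_in_arrears"] = out["totalaccountarrear"] / accounts
    return out

# Demo values taken from the three sample rows shown earlier
demo = pd.DataFrame({
    "amountarrear": [24041.0, 0.0, 12000.0],
    "totaloutstandingdebt": [105435.0, 294770.0, 110919.0],
    "totalaccounts": [7, 17, 3],
    "totalaccountarrear": [2, 1, 1],
})
result = add_arrears_ratios(demo)
print(result[["arrears_to_debt", "share_accounts_in_arrears"]])
```

A ratio puts a customer with a small overdue balance on a large debt and one with the same balance on a small debt on different footing, which the raw amount alone does not.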

ACCOUNT VARIABLES:

- 'noofotheraccountsbad' (accountrating): The number of accounts in the "other" category with a bad rating. It indicates the customer's exposure to risky accounts and is valuable for understanding their repayment behavior with other lenders.

- 'noofotheraccountsgood' (accountrating): The number of accounts in the "other" category with a good rating. It reflects positive credit behavior with other lenders, which counts in the customer's favor in risk scoring.

- 'noofretailaccountsbad', 'nooftelecomaccountsbad', 'noofcreditcardaccountsbad', 'noofpersonalloanaccountsbad' (accountrating): The number of badly rated accounts in each credit segment, indicating where the customer's repayment problems and risk exposure are concentrated.

- 'noofretailaccountsgood', 'nooftelecomaccountsgood', 'noofcreditcardaccountsgood', 'noofpersonalloanaccountsgood' (accountrating): The number of well-rated accounts in each credit segment, which provides insight into the customer's positive credit history and creditworthiness across segments.
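
The per-segment good and bad counts can also be combined into a single summary feature. The sketch below is an illustration under assumptions, not part of the original analysis: `add_bad_account_features`, `total_bad`, `total_good` and `bad_share` are hypothetical names, the columns follow the flattened output, and the demo values come from the three sample rows.

```python
import pandas as pd

# Segments covered by the selected accountrating variables
SEGMENTS = ["other", "retail", "telecom", "creditcard", "personalloan"]

def add_bad_account_features(df):
    """Hypothetical helper: totals and bad-account share across segments."""
    out = df.copy()
    bad_cols = [f"noof{s}accountsbad" for s in SEGMENTS]
    good_cols = [f"noof{s}accountsgood" for s in SEGMENTS]
    out["total_bad"] = out[bad_cols].sum(axis=1)
    out["total_good"] = out[good_cols].sum(axis=1)
    rated = out["total_bad"] + out["total_good"]
    # Left as NaN when a customer has no rated accounts in these segments
    out["bad_share"] = out["total_bad"] / rated.where(rated > 0)
    return out

# Demo values taken from the three sample rows shown earlier
demo = pd.DataFrame({
    "noofotheraccountsbad": [0, 0, 0],
    "noofotheraccountsgood": [3, 3, 1],
    "noofretailaccountsbad": [0, 0, 1],
    "noofretailaccountsgood": [2, 12, 1],
    "nooftelecomaccountsbad": [0, 0, 0],
    "nooftelecomaccountsgood": [0, 0, 0],
    "noofcreditcardaccountsbad": [0, 0, 0],
    "noofcreditcardaccountsgood": [1, 0, 0],
    "noofpersonalloanaccountsbad": [0, 0, 0],
    "noofpersonalloanaccountsgood": [1, 2, 0],
})
result = add_bad_account_features(demo)
print(result[["total_bad", "total_good", "bad_share"]])
```

Keeping `bad_share` undefined for customers without rated accounts avoids treating a thin credit file as if it were a clean one.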






