# Credit Card Approval Use Case

- In machine learning, a use case describes how a model is applied to solve a real-world problem, from data collection through prediction and deployment.

## Use Case Structure
- A typical use case includes:

    1. Use Case Name → The title of the use case
    2. Actors → The people or systems interacting with the model
    3. Goal → What the system aims to achieve
    4. Preconditions → What needs to be in place before the process starts
    5. Steps → The sequence of actions performed in the process
    6. Alternate Scenarios → What happens if something goes wrong
    7. Postconditions → The expected outcome after the use case completes
 
### Example: Use Case for Credit Card Approval Model
1. Use Case Name
- Automated Credit Card Approval System


2. Actors
-   Applicant → A person applying for a credit card
-   Machine Learning Model → Predicts approval or rejection
-   Bank System → Collects applications, stores data, and integrates the model

3. Goal
- To automatically approve or reject credit card applications based on applicants' financial data and credit history.
  
4. Preconditions

    - Applicant provides necessary information (income, credit score, job status, etc.).
    - Historical data is available for training the machine learning model.

5. Steps (When Everything Works Fine)

    - Applicant submits a credit card application.
    - The bank system validates and preprocesses the data.
    - The machine learning model predicts Approval (1) or Rejection (0).
    - If approved, the applicant receives a confirmation email.
    - If rejected, the applicant receives a rejection message with possible reasons.

6. Alternate Scenarios (Edge Cases)

    - Missing Data → System prompts the user to complete the form.
    - Unclear Decision → Application is sent for manual review.
    - Fraudulent Data → The system flags it for investigation.

7. Postconditions (Expected Outcome)

    - The applicant is notified of their approval/rejection status.
    - The bank updates its database with the decision.
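
The main flow and the alternate scenarios above can be read as a small routing function. The following is only a sketch under stated assumptions: the field names (`income`, `credit_score`, `job_status`), the `fraud_suspected` flag, the action labels, and the 0.4 to 0.6 manual-review band are all hypothetical and do not come from a real bank system.

```python
# Hypothetical field names and thresholds, chosen only for illustration.
REQUIRED_FIELDS = ("income", "credit_score", "job_status")


def route_application(application, approval_probability,
                      review_low=0.4, review_high=0.6):
    """Return the next action for one application.

    application: dict of applicant fields (None marks a gap).
    approval_probability: the model's estimated probability of approval (0 to 1).
    """
    # Missing Data -> ask the applicant to complete the form.
    missing = [f for f in REQUIRED_FIELDS if application.get(f) is None]
    if missing:
        return "request_missing_data"

    # Fraudulent Data -> flag for investigation (flag assumed to be set upstream).
    if application.get("fraud_suspected", False):
        return "flag_for_investigation"

    # Unclear Decision -> probability too close to the decision boundary.
    if review_low <= approval_probability <= review_high:
        return "manual_review"

    # Clear decision: Approval (1) or Rejection (0).
    return "approve" if approval_probability > review_high else "reject"


complete = {"income": 52000, "credit_score": 710, "job_status": "employed"}
print(route_application(complete, 0.91))                      # approve
print(route_application(complete, 0.55))                      # manual_review
print(route_application({**complete, "income": None}, 0.91))  # request_missing_data
```

Checking for missing data and fraud before looking at the model score keeps the model from deciding on inputs it should never have seen.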

# Understanding the Business Problem

### Credit card companies need to decide which applicants to approve
#### Benefits of using machine learning for credit card approval:
 1. Faster decisions - automated systems can process applications in seconds
 2. Consistency - all applications are evaluated using the same criteria
 3. Accuracy - a well-validated model can pick up patterns that manual review might miss
- A good model helps banks:
  - Approve more creditworthy customers (increasing revenue)
  - Reject high-risk applicants (reducing losses from defaults)

In [133]:
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer

In [134]:
column_names = [
    'A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'A10',
    'A11', 'A12', 'A13', 'A14', 'A15', 'Target'
]

In [135]:
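# Source: UCI Credit Approval dataset, https://archive.ics.uci.edu/dataset/27/credit+approval
# The file has no header row, so column names are supplied explicitly.
# na_values='?' tells pandas to parse '?' entries as NaN.
df = pd.read_csv('crx.data', names=column_names, na_values='?')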
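
To see what `na_values='?'` changes, here is a minimal sketch on two made-up rows read from an in-memory string instead of `crx.data`. The variable names (`raw`, `with_na`, `without_na`) and the toy values are assumptions for illustration only.

```python
import io

import pandas as pd

# Two toy rows in the same comma-separated style; '?' marks a missing value.
raw = "b,30.83,+\n?,?,-\n"
names = ["A1", "A2", "Target"]

# With na_values='?', the placeholders become NaN and A2 is parsed as numeric.
with_na = pd.read_csv(io.StringIO(raw), names=names, na_values="?")

# Without it, '?' stays a literal string and A2 is not numeric.
without_na = pd.read_csv(io.StringIO(raw), names=names)

print(with_na.isnull().sum())
print(pd.api.types.is_numeric_dtype(with_na["A2"]))
print(pd.api.types.is_numeric_dtype(without_na["A2"]))
```

If the placeholder is not declared, later calls such as `isnull()` or `median()` would not treat the `?` entries as missing.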

In [136]:
df

Unnamed: 0,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,Target
0,b,30.83,0.000,u,g,w,v,1.25,t,t,1,f,g,202.0,0,+
1,a,58.67,4.460,u,g,q,h,3.04,t,t,6,f,g,43.0,560,+
2,a,24.50,0.500,u,g,q,h,1.50,t,f,0,f,g,280.0,824,+
3,b,27.83,1.540,u,g,w,v,3.75,t,t,5,t,g,100.0,3,+
4,b,20.17,5.625,u,g,w,v,1.71,t,f,0,f,s,120.0,0,+
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
685,b,21.08,10.085,y,p,e,h,1.25,f,f,0,f,g,260.0,0,-
686,a,22.67,0.750,u,g,c,v,2.00,f,t,2,t,g,200.0,394,-
687,a,25.25,13.500,y,p,ff,ff,2.00,f,t,1,t,g,200.0,1,-
688,b,17.92,0.205,u,g,aa,v,0.04,f,f,0,f,g,280.0,750,-


In [137]:
df.head()

Unnamed: 0,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,Target
0,b,30.83,0.0,u,g,w,v,1.25,t,t,1,f,g,202.0,0,+
1,a,58.67,4.46,u,g,q,h,3.04,t,t,6,f,g,43.0,560,+
2,a,24.5,0.5,u,g,q,h,1.5,t,f,0,f,g,280.0,824,+
3,b,27.83,1.54,u,g,w,v,3.75,t,t,5,t,g,100.0,3,+
4,b,20.17,5.625,u,g,w,v,1.71,t,f,0,f,s,120.0,0,+


In [138]:
df.shape

(690, 16)

In [139]:
df.tail()

Unnamed: 0,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,Target
685,b,21.08,10.085,y,p,e,h,1.25,f,f,0,f,g,260.0,0,-
686,a,22.67,0.75,u,g,c,v,2.0,f,t,2,t,g,200.0,394,-
687,a,25.25,13.5,y,p,ff,ff,2.0,f,t,1,t,g,200.0,1,-
688,b,17.92,0.205,u,g,aa,v,0.04,f,f,0,f,g,280.0,750,-
689,b,35.0,3.375,u,g,c,h,8.29,f,f,0,t,g,0.0,0,-


In [140]:
df.sample(10)

Unnamed: 0,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,Target
469,b,16.33,4.085,u,g,i,h,0.415,f,f,0,t,g,120.0,0,-
405,a,69.5,6.0,u,g,ff,ff,0.0,f,f,0,f,s,0.0,0,-
228,b,19.67,0.375,u,g,q,v,2.0,t,t,2,t,g,80.0,0,+
277,a,18.17,10.0,y,p,q,h,0.165,f,f,0,f,g,340.0,0,-
539,b,80.25,5.5,u,g,,,0.54,t,f,0,f,g,0.0,340,-
104,b,27.83,4.0,y,p,i,h,5.75,t,t,2,t,g,75.0,0,-
268,b,59.67,1.54,u,g,q,v,0.125,t,f,0,t,g,260.0,0,+
570,b,59.5,2.75,u,g,w,v,1.75,t,t,5,t,g,60.0,58,+
649,a,35.17,3.75,u,g,ff,ff,0.0,f,t,6,f,g,0.0,200,-
591,b,27.0,0.75,u,g,c,h,4.25,t,t,3,t,g,312.0,150,+


In [141]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 690 entries, 0 to 689
Data columns (total 16 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A1      678 non-null    object 
 1   A2      678 non-null    float64
 2   A3      690 non-null    float64
 3   A4      684 non-null    object 
 4   A5      684 non-null    object 
 5   A6      681 non-null    object 
 6   A7      681 non-null    object 
 7   A8      690 non-null    float64
 8   A9      690 non-null    object 
 9   A10     690 non-null    object 
 10  A11     690 non-null    int64  
 11  A12     690 non-null    object 
 12  A13     690 non-null    object 
 13  A14     677 non-null    float64
 14  A15     690 non-null    int64  
 15  Target  690 non-null    object 
dtypes: float64(4), int64(2), object(10)
memory usage: 86.4+ KB
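
The use case describes predictions as Approval (1) or Rejection (0), while `Target` is stored as the strings `+` and `-` (dtype `object` above). A minimal sketch of one way to encode it, shown on a toy frame so that `df` is left untouched. The new column name `Approved` is an assumption, as is reading `+` as an approved application.

```python
import pandas as pd

# Toy frame standing in for the Target column of the real data.
toy = pd.DataFrame({"Target": ["+", "-", "+"]})

# Map the class symbols to integers: '+' -> 1 (approved), '-' -> 0 (rejected).
toy["Approved"] = toy["Target"].map({"+": 1, "-": 0})

print(toy)
```

`map` returns NaN for any value missing from the dictionary, which makes unexpected labels easy to spot.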


In [142]:
df['A1'].unique()

array(['b', 'a', nan], dtype=object)

In [143]:
df['A1'].value_counts()

A1
b    468
a    210
Name: count, dtype: int64

In [144]:
df.isnull().sum()

A1        12
A2        12
A3         0
A4         6
A5         6
A6         9
A7         9
A8         0
A9         0
A10        0
A11        0
A12        0
A13        0
A14       13
A15        0
Target     0
dtype: int64

In [145]:
df['A1'].mode()

0    b
Name: A1, dtype: object

In [146]:
df['A1'].mode()[0]

'b'

In [147]:
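# Option 1: fill missing A1 values with the most frequent category (a scalar).
# df['A1'] = df['A1'].fillna(df['A1'].mode()[0])

# Option 2 (avoid): mode() returns a Series, so fillna aligns on the index and
# leaves most gaps unfilled; inplace=True on a column selection may also not update df.
# df['A1'].fillna(df['A1'].mode(), inplace=True)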
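
The difference between passing `mode()` and `mode()[0]` to `fillna` is easy to miss, so here is a small sketch on a toy Series (the values are made up and only mimic `A1`).

```python
import numpy as np
import pandas as pd

# Toy column with gaps at positions 1 and 4.
s = pd.Series(["b", np.nan, "a", "b", np.nan])

# mode() returns a Series indexed from 0, so fillna aligns on the index:
# only a gap at index 0 could be filled, and there is none here.
filled_with_series = s.fillna(s.mode())

# mode()[0] is a scalar, so it is used for every gap.
filled_with_scalar = s.fillna(s.mode()[0])

print(filled_with_series.isnull().sum())  # 2
print(filled_with_scalar.isnull().sum())  # 0
```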

In [148]:
df.isnull().sum()

A1        12
A2        12
A3         0
A4         6
A5         6
A6         9
A7         9
A8         0
A9         0
A10        0
A11        0
A12        0
A13        0
A14       13
A15        0
Target     0
dtype: int64

In [149]:
# Manual alternative: fill each numeric column with a statistic of that same column.
# df['A2'] = df['A2'].fillna(df['A2'].median())
# df['A14'] = df['A14'].fillna(df['A14'].mean())

In [150]:
df.isnull().sum()

A1        12
A2        12
A3         0
A4         6
A5         6
A6         9
A7         9
A8         0
A9         0
A10        0
A11        0
A12        0
A13        0
A14       13
A15        0
Target     0
dtype: int64

In [160]:
# Fill missing values in the numeric columns A2 and A14 with each column's median
num_columns = ['A2', 'A14']
imputer = SimpleImputer(strategy='median')
df[num_columns] = imputer.fit_transform(df[num_columns])

In [162]:
df.isnull().sum()

A1        12
A2         0
A3         0
A4         6
A5         6
A6         9
A7         9
A8         0
A9         0
A10        0
A11        0
A12        0
A13        0
A14        0
A15        0
Target     0
dtype: int64
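
After the median imputation, the remaining gaps are all in categorical columns (A1, A4, A5, A6, A7). One possible way to handle them is `SimpleImputer` with `strategy='most_frequent'`. The sketch below runs on a toy frame, not on `df`; the names `toy`, `cat_columns`, and `cat_imputer` are assumptions for illustration, and other strategies (for example a separate "missing" category) may suit the data better.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frame with two categorical columns that contain gaps.
toy = pd.DataFrame(
    {"A1": ["b", np.nan, "a", "b"], "A4": ["u", "y", np.nan, "u"]},
    dtype=object,
)

cat_columns = ["A1", "A4"]

# most_frequent replaces each gap with the most common value of its column.
cat_imputer = SimpleImputer(strategy="most_frequent")
toy[cat_columns] = cat_imputer.fit_transform(toy[cat_columns])

print(toy)
print(toy.isnull().sum())
```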

In [None]:
# Homework
