## Basic Email/Phone/Name cleaning
* If `FIRSTNAME` or `LASTNAME` is empty, null, or the placeholder `Unidentified First NAme`, set it to **unidentified**.
* If an email ends in `.c` or `.co`, or ends in `com` (in any case: `COM`, `Com`, `com`) with no dot before it, correct the suffix to **.com**. Addresses are also lower-cased.
* If a phone number contains the country code, put the code in brackets, e.g. 919826483920 becomes **(91)9826483920**.

In [2]:
import pandas as pd

file_path = 'raw.csv'

df = pd.read_csv(file_path)

print(df.head().to_markdown())

|    | FIRSTNAME               | LASTNAME                |      BIN |       MOBILE | EMAIL                      |   PRIMARYCARD | CARD_CREATION_DATE   |   ACCOUNTSERNO |   CARDSERNO |   RMNAME |
|---:|:------------------------|:------------------------|---------:|-------------:|:---------------------------|--------------:|:---------------------|---------------:|------------:|---------:|
|  0 | Unidentified First NAme | ANSH                    | 42113700 | 919398742700 | ANSHSINGHVEE@GMAIL.COM     |             0 | 10Jul2023            |       27965111 |    37346654 |      nan |
|  1 | Unidentified First NAme | ANUSH                   | 42113700 | 919626999111 | Aswin@herofAshion.com      |             0 | 26Nov2021            |       29337588 |    20615046 |      nan |
|  2 | Unidentified First NAme | GOUTHAM                 | 42113700 | 919008839300 | SAMARATH@PRITHVIJEWELS.COM |             0 | 12Jul2022            |       39259747 |    26753794 |      nan |
|  3 | Unidentified First

In [3]:
df_copy = df.copy()
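# Null, blank, and placeholder names all become 'unidentified'; str(x) guards against non-string cells
df_copy['FIRSTNAME'] = df_copy['FIRSTNAME'].apply(lambda x: 'unidentified' if pd.isna(x) or str(x).strip().lower() in ('', 'unidentified first name') else x)
df_copy['LASTNAME'] = df_copy['LASTNAME'].apply(lambda x: 'unidentified' if pd.isna(x) or str(x).strip().lower() in ('', 'unidentified last name') else x)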

print(df_copy.head().to_markdown())

|    | FIRSTNAME    | LASTNAME                |      BIN |       MOBILE | EMAIL                      |   PRIMARYCARD | CARD_CREATION_DATE   |   ACCOUNTSERNO |   CARDSERNO |   RMNAME |
|---:|:-------------|:------------------------|---------:|-------------:|:---------------------------|--------------:|:---------------------|---------------:|------------:|---------:|
|  0 | unidentified | ANSH                    | 42113700 | 919398742700 | ANSHSINGHVEE@GMAIL.COM     |             0 | 10Jul2023            |       27965111 |    37346654 |      nan |
|  1 | unidentified | ANUSH                   | 42113700 | 919626999111 | Aswin@herofAshion.com      |             0 | 26Nov2021            |       29337588 |    20615046 |      nan |
|  2 | unidentified | GOUTHAM                 | 42113700 | 919008839300 | SAMARATH@PRITHVIJEWELS.COM |             0 | 12Jul2022            |       39259747 |    26753794 |      nan |
|  3 | unidentified | LAKSHMI PRABHA SELVARAJ | 42113700 | 919003711551 | susilA
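
The name rule can be tried on a few hand-written values without loading the CSV. This is a minimal sketch in plain Python; `clean_name` and `PLACEHOLDERS` are hypothetical names that do not appear in the notebook, and unlike the cell above the helper also strips surrounding whitespace from names it keeps.

```python
import math

# Placeholder strings treated as "no name" (compared in lower case)
PLACEHOLDERS = {"", "unidentified first name", "unidentified last name"}


def clean_name(value):
    """Return 'unidentified' for missing, blank, or placeholder names."""
    # None and float NaN are how a missing cell usually arrives from pandas
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "unidentified"
    text = str(value).strip()
    if text.lower() in PLACEHOLDERS:
        return "unidentified"
    return text


samples = ["Unidentified First NAme", None, float("nan"), "   ", "ANSH"]
print([clean_name(s) for s in samples])
```

If the helper suits the data, it could be applied column-wise with `df_copy['FIRSTNAME'].map(clean_name)`.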

In [4]:
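# regex=True is passed explicitly: since pandas 2.0, str.replace treats the pattern as a literal string by default
# A trailing ".c" or ".co" becomes ".com"
df_copy['EMAIL'] = df_copy['EMAIL'].str.replace(r'\.co?$', '.com', case=False, regex=True)
# A trailing "com" with no dot in front of it gets the missing dot
df_copy['EMAIL'] = df_copy['EMAIL'].str.replace(r'(?<!\.)com$', '.com', case=False, regex=True)
df_copy['EMAIL'] = df_copy['EMAIL'].str.lower()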

print(df_copy.head().to_markdown())

|    | FIRSTNAME    | LASTNAME                |      BIN |       MOBILE | EMAIL                      |   PRIMARYCARD | CARD_CREATION_DATE   |   ACCOUNTSERNO |   CARDSERNO |   RMNAME |
|---:|:-------------|:------------------------|---------:|-------------:|:---------------------------|--------------:|:---------------------|---------------:|------------:|---------:|
|  0 | unidentified | ANSH                    | 42113700 | 919398742700 | anshsinghvee@gmail.com     |             0 | 10Jul2023            |       27965111 |    37346654 |      nan |
|  1 | unidentified | ANUSH                   | 42113700 | 919626999111 | aswin@herofashion.com      |             0 | 26Nov2021            |       29337588 |    20615046 |      nan |
|  2 | unidentified | GOUTHAM                 | 42113700 | 919008839300 | samarath@prithvijewels.com |             0 | 12Jul2022            |       39259747 |    26753794 |      nan |
|  3 | unidentified | LAKSHMI PRABHA SELVARAJ | 42113700 | 919003711551 | susila
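
The suffix rules from the list at the top (a trailing `.c` or `.co`, or a trailing `com` with no dot) are easy to get subtly wrong, so it may help to check them on a few hand-written addresses. The sketch below uses only the standard `re` module; `fix_email_suffix` is a hypothetical helper, and the sample addresses are made up for illustration.

```python
import re


def fix_email_suffix(email):
    """Normalise a truncated or dot-less 'com' ending and lower-case the address."""
    email = email.strip()
    # ".c" or ".co" at the very end of the address -> ".com"
    email = re.sub(r"\.co?$", ".com", email, flags=re.IGNORECASE)
    # "com" at the end with no dot directly before it -> insert the dot
    email = re.sub(r"(?<!\.)com$", ".com", email, flags=re.IGNORECASE)
    return email.lower()


for raw in ["user@gmail.c", "user@gmail.co", "USER@GMAILCOM", "User@Gmail.Com"]:
    print(raw, "->", fix_email_suffix(raw))
```

Note that under this rule a genuine `.co` address is also rewritten to `.com`; whether that is acceptable depends on the data.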

In [5]:
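# Bracket the country code only for 12-digit values (91 + a 10-digit number), so a 10-digit number that merely starts with 91 is left alone
df_copy['MOBILE'] = df_copy['MOBILE'].astype(str).apply(lambda x: f"({x[:2]}){x[2:]}" if len(x) == 12 and x.startswith('91') else x)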

print(df_copy.head().to_markdown())

|    | FIRSTNAME    | LASTNAME                |      BIN | MOBILE         | EMAIL                      |   PRIMARYCARD | CARD_CREATION_DATE   |   ACCOUNTSERNO |   CARDSERNO |   RMNAME |
|---:|:-------------|:------------------------|---------:|:---------------|:---------------------------|--------------:|:---------------------|---------------:|------------:|---------:|
|  0 | unidentified | ANSH                    | 42113700 | (91)9398742700 | anshsinghvee@gmail.com     |             0 | 10Jul2023            |       27965111 |    37346654 |      nan |
|  1 | unidentified | ANUSH                   | 42113700 | (91)9626999111 | aswin@herofashion.com      |             0 | 26Nov2021            |       29337588 |    20615046 |      nan |
|  2 | unidentified | GOUTHAM                 | 42113700 | (91)9008839300 | samarath@prithvijewels.com |             0 | 12Jul2022            |       39259747 |    26753794 |      nan |
|  3 | unidentified | LAKSHMI PRABHA SELVARAJ | 42113700 | (91)9003711
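
The bracket rule for the country code can be sketched as a small standalone function. This is an illustration under assumptions not stated in the notebook: the country code is `91`, national numbers are 10 digits long, and `format_mobile` with its parameters is a hypothetical helper.

```python
def format_mobile(value, country_code="91", national_len=10):
    """Wrap a leading country code in brackets, e.g. 919826483920 -> (91)9826483920."""
    digits = str(value).strip()
    has_code = (
        digits.isdigit()
        and len(digits) == len(country_code) + national_len
        and digits.startswith(country_code)
    )
    if has_code:
        return f"({country_code}){digits[len(country_code):]}"
    # Anything else (already formatted, too short, non-numeric) is returned unchanged
    return digits


for raw in [919826483920, "9826483920", "9198264839"]:
    print(raw, "->", format_mobile(raw))
```

Checking the total length keeps a 10-digit number that happens to begin with 91 from losing its first two digits to the bracket.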

In [18]:
df_copy.to_csv('epn_cleaned.csv', index=False)

## Axis changes
* Added a **createpassword** column. Password formula: first letter of firstname + last 4 digits of mobile + first letter of lastname + first 2 letters of email.

In [6]:
df_copy['createpassword'] = df_copy['FIRSTNAME'].str[0] + df_copy['MOBILE'].str[-4:] + df_copy['LASTNAME'].str[0] + df_copy['EMAIL'].str[:2]
print(df_copy.head().to_markdown())

|    | FIRSTNAME    | LASTNAME                |      BIN | MOBILE         | EMAIL                      |   PRIMARYCARD | CARD_CREATION_DATE   |   ACCOUNTSERNO |   CARDSERNO |   RMNAME | createpassword   |
|---:|:-------------|:------------------------|---------:|:---------------|:---------------------------|--------------:|:---------------------|---------------:|------------:|---------:|:-----------------|
|  0 | unidentified | ANSH                    | 42113700 | (91)9398742700 | anshsinghvee@gmail.com     |             0 | 10Jul2023            |       27965111 |    37346654 |      nan | u2700Aan         |
|  1 | unidentified | ANUSH                   | 42113700 | (91)9626999111 | aswin@herofashion.com      |             0 | 26Nov2021            |       29337588 |    20615046 |      nan | u9111Aas         |
|  2 | unidentified | GOUTHAM                 | 42113700 | (91)9008839300 | samarath@prithvijewels.com |             0 | 12Jul2022            |       39259747 |    26753794 |      
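
The formula can be checked by hand against the first two rows of the output. A minimal sketch, assuming all four inputs are non-empty strings; `make_password` is a hypothetical helper, not a function from the notebook.

```python
def make_password(first, mobile, last, email):
    """first letter of firstname + last 4 digits of mobile
    + first letter of lastname + first 2 letters of email."""
    return f"{first[0]}{mobile[-4:]}{last[0]}{email[:2]}"


# Values taken from rows 0 and 1 of the cleaned table
print(make_password("unidentified", "(91)9398742700", "ANSH", "anshsinghvee@gmail.com"))
print(make_password("unidentified", "(91)9626999111", "ANUSH", "aswin@herofashion.com"))
```

Both calls reproduce the `createpassword` values shown for those rows (`u2700Aan` and `u9111Aas`).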

In [10]:
bin_ranges = pd.read_csv("bin_ranges.csv")
print(bin_ranges.head(30).to_markdown())

|    | Plan ID   | Bin                 | Plan                                                    |
|---:|:----------|:--------------------|:--------------------------------------------------------|
|  0 | Plan001   | 41114600            | Axis Bank Signature Credit Card                         |
|  1 | Plan002   | 41114601            | Axis Bank Signature Card                                |
|  2 | Plan003   | 41114602            | Axis Bank Signature Card With Lifestyle Benefit         |
|  3 | Plan004   | 41114603            | Axis Bank Signature Card With Travel Benefit            |
|  4 | Plan005   | 41114604            | Axis Bank Advantage Credit Card                         |
|  5 | Plan006   | 41114606            | Axis Bank Select Credit Card                            |
|  6 | Plan007   | 42113700            | Burgundy Private Credit Card                            |
|  7 | Plan008   | 42113701            | Burgundy Private NRI Credit Card                        |
|  8 | Pla

In [None]:
df_copy.to_csv('axis_cleaned.csv', index=False)

In [None]:
# nouser@eamil.com
#