# Scraping Bank Transaction Alerts for Monthly Analysis

Every time a transaction is carried out on my bank account, an email alert is sent to my Gmail inbox. The email includes a transaction summary: account number, account name, description, reference number, transaction branch, transaction date, value date, and available balance. I would like to view all my transaction details from a dashboard, for instance through the Microsoft Power BI mobile app.

The aim of this project is to use the Gmail API to access the transaction summaries, extract a few parameters from them, and save the result as an Excel file. This file will then be used for visualization in Microsoft Power BI.

## Importing required modules

In [31]:
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
import base64
from bs4 import BeautifulSoup as bs
import re
import pandas as pd

## Defining a Scope and Creating Authentication

In [52]:
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']
# from_authorized_user_file reads token.json itself, so no open() is needed
creds = Credentials.from_authorized_user_file('token.json', SCOPES)

service = build('gmail', 'v1', credentials=creds)
profile = service.users().getProfile(userId='me').execute()
profile = pd.DataFrame([profile])
profile # Viewing Profile

| | emailAddress | messagesTotal | threadsTotal | historyId |
|---|---|---|---|---|
| 0 | dongoodygoody@gmail.com | 3830 | 3097 | 2867300 |


## Creating a Function to Extract the Parameters from the Transaction Summary
**Note:** By default, this function extracts 50 transactions. The more transactions you extract, the longer the code runs.

In testing, extracting 500 transactions took approximately 6 minutes.
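
The Gmail API's `messages.list` call returns at most 500 results per request, so extracting more than 500 alerts would mean following the `nextPageToken` field across pages. The following is a minimal sketch of that idea, assuming a `service` object like the one built above. `list_message_ids` and its `limit` and `page_size` parameters are hypothetical names for illustration and are not part of the original notebook.

```python
def list_message_ids(service, query, limit=1000, page_size=500):
    """Collect up to `limit` message IDs, following nextPageToken across pages.

    Hypothetical helper: `service` is assumed to be a Gmail API client
    created with googleapiclient.discovery.build('gmail', 'v1', ...).
    """
    ids = []
    page_token = None
    while len(ids) < limit:
        response = service.users().messages().list(
            userId='me',
            q=query,
            # Never ask for more than is still needed, nor more than one page holds
            maxResults=min(page_size, limit - len(ids)),
            pageToken=page_token,
        ).execute()
        ids.extend(msg['id'] for msg in response.get('messages', []))
        page_token = response.get('nextPageToken')
        if not page_token:  # no further pages available
            break
    return ids[:limit]
```

Paging only reduces the number of list calls. Each message is still fetched with its own `get` request, which is probably where most of the run time goes.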

In [40]:
def extract_transaction(maxResult=50):
    # Filtering for transaction mails
    # (named `response` so the built-in `filter` is not shadowed)
    response = service.users().messages().list(
        userId='me', maxResults=maxResult,
        q='from:no_reply@accessbankplc.com subject:AccessAlert Transaction Alert').execute()

    # Extracting IDs from the filtered mails; an empty result has no 'messages' key
    id_lst = [msg['id'] for msg in response.get('messages', [])]
    
    # Accessing the transaction summary
    trans_lst = []
    for each_id in id_lst:
        msgs = service.users().messages().get(userId = 'me', id = each_id).execute()
        snippet = msgs['snippet']
        main_body = msgs['payload']['body']['data']
        # The body is base64url-encoded; urlsafe_b64decode handles '-' and '_' directly
        decode_msg = base64.urlsafe_b64decode(main_body)
        soup = bs(decode_msg, 'lxml')
        details = soup.find_all('tr')[7]
        
        # Extracting required parameters from transaction summary
        description = details.find_all('td')[6].text.replace('\r',' ').replace("\n"," ").strip()
        reference_number = details.find_all('td')[8].text.replace('\r',' ').replace("\n"," ").strip()
        trans_branch = details.find_all('td')[10].text.replace('\r',' ').replace("\n"," ").strip()
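        # Look the Date header up by name rather than relying on its position
        date = next(h['value'] for h in msgs['payload']['headers'] if h['name'].lower() == 'date')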
        # Raw strings keep the backslashes in the patterns from being read as string escapes
        amount = re.search(r'\d*\.+\d+', snippet).group()
        acct_no = re.search(r'\d*\*+\d+', snippet).group()
        
        # Checking the type of transaction
        if 'Credited' in snippet:
            trans_type = 'Credited'
        else:
            trans_type = 'Debited'
        
        # Appending a dictionary of the extracted parameters to the list
        trans_lst.append({
            'amount': amount,
            'a/c_number': acct_no,
            'trans_type': trans_type,
            'description': description,
            'reference_number': reference_number,
            'trans_branch': trans_branch,
            'datetime': date
        })
    
    # Assigning columns names and returning a dataframe of extracted parameters
    cols_name = ['amount', 'a/c_number', 'trans_type', 'description', 'reference_number', 'trans_branch', 'datetime']
    data = pd.DataFrame(trans_lst, columns=cols_name)
    return data
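
Two steps in the function above can be tried in isolation: decoding the base64url body and pulling the amount and masked account number out of the snippet. The sketch below uses the same two regular expressions, but the sample snippet wording and the sample HTML are made up for illustration, since the real alert text is not shown here. `decode_body` and `parse_snippet` are hypothetical helper names. Unlike the function above, the sketch returns `None` instead of raising when a pattern does not match.

```python
import base64
import re

AMOUNT_RE = re.compile(r'\d*\.+\d+')   # same pattern as in extract_transaction
ACCOUNT_RE = re.compile(r'\d*\*+\d+')  # masked account number, e.g. 162******608


def decode_body(data):
    """Decode a base64url string such as payload['body']['data'] into text."""
    return base64.urlsafe_b64decode(data).decode('utf-8', errors='replace')


def parse_snippet(snippet):
    """Extract amount, account number and transaction type from a snippet."""
    amount = AMOUNT_RE.search(snippet)
    account = ACCOUNT_RE.search(snippet)
    return {
        'amount': amount.group() if amount else None,
        'a/c_number': account.group() if account else None,
        'trans_type': 'Credited' if 'Credited' in snippet else 'Debited',
    }


# Assumed sample inputs; the real alert wording and HTML may differ.
sample_snippet = 'Your account 162******608 has been Debited with 300.00'
sample_body = base64.urlsafe_b64encode(b'<tr><td>HEAD OFFICE</td></tr>').decode('ascii')

print(parse_snippet(sample_snippet))
print(decode_body(sample_body))
```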

## Extraction

In [53]:
extract_transaction(500)

| | amount | a/c_number | trans_type | description | reference_number | trans_branch | datetime |
|---|---|---|---|---|---|---|---|
| 0 | 300.00 | 162******608 | Debited | AIRTIME/ MTN/08168550974 | 099MJKL22253cxrA | HEAD OFFICE | Sat, 10 Sep 2022 15:52:45 +0530 |
| 1 | 200.00 | 162******608 | Debited | AIRTIME/ MTN/08169327250 | 099MJKL22252PRIV | HEAD OFFICE | Sat, 10 Sep 2022 00:36:44 +0530 |
| 2 | 3000.00 | 070******357 | Debited | POS/WEB PMT T_KCEE MAGNET GLOBA 006018 203 | 098WNVI222524j42 | CENTRAL PROCESSING BRANCH | Fri, 9 Sep 2022 12:41:36 +0530 |
| 3 | 5000.00 | 070******357 | Credited | TRF//FRM GOODRICH IFEANYI OKORO TO GOODRIC | 099MJKL222522wGH | HEAD OFFICE | Fri, 9 Sep 2022 12:36:09 +0530 |
| 4 | 5000.00 | 162******608 | Debited | TRF//FRM GOODRICH IFEANYI OKORO TO GOODRIC | 099MJKL222522wGH | HEAD OFFICE | Fri, 9 Sep 2022 12:36:08 +0530 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | 2010.75 | 070******357 | Debited | TRF/Airtime Sales/FRM GOODRICH IFEANYI OKO | 099MJKL21342A76o | HEAD OFFICE | Wed, 8 Dec 2021 18:10:23 +0530 |
| 496 | 5000.00 | 070******357 | Debited | TRF/from ubongabasi/FRM GOODRICH IFEANYI O | 099MJKL213428yux | HEAD OFFICE | Wed, 8 Dec 2021 17:18:39 +0530 |
| 497 | 30000.00 | 070******357 | Credited | APPLIQUE FORMATII FARMS LIMITED/Transfer f | 099MNIP213421VYC | HEAD OFFICE | Wed, 8 Dec 2021 14:46:54 +0530 |
| 498 | 2010.75 | 070******357 | Debited | TRF/Airtime Sales/FRM GOODRICH IFEANYI OKO | 099MJKL21341DIaa | HEAD OFFICE | Tue, 7 Dec 2021 19:31:31 +0530 |
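
Because the goal is a monthly analysis, the `datetime` strings need to be parsed and the amounts grouped by month. The sketch below shows one way to do that using only the standard library, with five rows copied from the output above. `monthly_totals` is a hypothetical helper, not part of the original notebook, and the grouping by calendar month and transaction type is an assumption about what the dashboard would need.

```python
from collections import defaultdict
from decimal import Decimal
from email.utils import parsedate_to_datetime

# (amount, trans_type, datetime) taken from rows 0, 1, 3, 497 and 498 above
rows = [
    ('300.00', 'Debited', 'Sat, 10 Sep 2022 15:52:45 +0530'),
    ('200.00', 'Debited', 'Sat, 10 Sep 2022 00:36:44 +0530'),
    ('5000.00', 'Credited', 'Fri, 9 Sep 2022 12:36:09 +0530'),
    ('30000.00', 'Credited', 'Wed, 8 Dec 2021 14:46:54 +0530'),
    ('2010.75', 'Debited', 'Tue, 7 Dec 2021 19:31:31 +0530'),
]


def monthly_totals(rows):
    """Sum credited and debited amounts per calendar month (YYYY-MM)."""
    totals = defaultdict(lambda: {'Credited': Decimal('0'), 'Debited': Decimal('0')})
    for amount, trans_type, date_str in rows:
        # The Date header is in the email date format, which email.utils can parse
        month = parsedate_to_datetime(date_str).strftime('%Y-%m')
        totals[month][trans_type] += Decimal(amount)
    return dict(totals)


for month, total in sorted(monthly_totals(rows).items()):
    print(month, 'credited:', total['Credited'], 'debited:', total['Debited'])
```

In the notebook itself, the pandas equivalents would likely be `pd.to_datetime` for parsing and `DataFrame.to_excel` for writing the file that Power BI reads. Writing `.xlsx` files with pandas requires an Excel engine such as openpyxl to be installed.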
