<h1 align="center">HASHING TRANSACTIONS</h1> 

### Introduction

The purpose of this script is to take in transactions from different accounts and hash them. The result is a dataframe with one row per account number and its corresponding hash.

### Step 1: Import Packages

Import numpy and pandas for data manipulation and hashlib to create the hashing function. Import warnings to suppress the warnings raised when overwriting data in a dataframe.

In [1]:
# Import all packages we will use in this project
import pandas as pd
import numpy as np
import hashlib

import warnings
warnings.filterwarnings('ignore')

### Step 2: Import Data

Import an Excel file that contains multiple transactions from different account numbers.

In [2]:
# Read the first sheet of the Excel file that contains the transactions
df = pd.read_excel('bank.xlsx', sheet_name=0)

In [17]:
df.head(15)

| | Account No | DATE | TRANSACTION DETAILS | CHQ.NO. | VALUE DATE | WITHDRAWAL AMT | DEPOSIT AMT | BALANCE AMT | . | Data |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 409000611074 | 2017-06-29 | TRF FROM Indiaforensic SERVICES | 0.0 | 2017-06-29 | 0.0 | 1000000.0 | 1000000.0 | . | b' DATE TRANSACTION ... |
| 1 | 409000611074 | 2017-07-05 | TRF FROM Indiaforensic SERVICES | 0.0 | 2017-07-05 | 0.0 | 1000000.0 | 2000000.0 | . | b' DATE TRANSACTION ... |
| 2 | 409000611074 | 2017-07-18 | FDRL/INTERNAL FUND TRANSFE | 0.0 | 2017-07-18 | 0.0 | 500000.0 | 2500000.0 | . | b' DATE TRANSACTION ... |
| 3 | 409000611074 | 2017-08-01 | TRF FRM Indiaforensic SERVICES | 0.0 | 2017-08-01 | 0.0 | 3000000.0 | 5500000.0 | . | b' DATE TRANSACTION ... |
| 4 | 409000611074 | 2017-08-16 | FDRL/INTERNAL FUND TRANSFE | 0.0 | 2017-08-16 | 0.0 | 500000.0 | 6000000.0 | . | b' DATE TRANSACTION ... |
| 5 | 409000611074 | 2017-08-16 | FDRL/INTERNAL FUND TRANSFE | 0.0 | 2017-08-16 | 0.0 | 500000.0 | 6500000.0 | . | b' DATE TRANSACTION ... |
| 6 | 409000611074 | 2017-08-16 | FDRL/INTERNAL FUND TRANSFE | 0.0 | 2017-08-16 | 0.0 | 500000.0 | 7000000.0 | . | b' DATE TRANSACTION ... |
| 7 | 409000611074 | 2017-08-16 | FDRL/INTERNAL FUND TRANSFE | 0.0 | 2017-08-16 | 0.0 | 500000.0 | 7500000.0 | . | b' DATE TRANSACTION ... |
| 8 | 409000611074 | 2017-08-16 | FDRL/INTERNAL FUND TRANSFE | 0.0 | 2017-08-16 | 0.0 | 500000.0 | 8000000.0 | . | b' DATE TRANSACTION ... |
| 9 | 409000611074 | 2017-08-16 | FDRL/INTERNAL FUND TRANSFE | 0.0 | 2017-08-16 | 0.0 | 500000.0 | 8500000.0 | . | b' DATE TRANSACTION ... |


### Step 3: Data Cleaning

In [4]:
# Remove apostrophe at the end of the account number
df['Account No'] = df['Account No'].str[:-1].astype(int)

# Convert NaN values in the amount and cheque-number columns to 0
df['WITHDRAWAL AMT'] = df['WITHDRAWAL AMT'].fillna(0)
df['DEPOSIT AMT'] = df['DEPOSIT AMT'].fillna(0)
df['CHQ.NO.'] = df['CHQ.NO.'].fillna(0)
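
To see what the cleaning step does in isolation, here is a minimal sketch on a small hand-made frame. The name `sample` and its values are illustrative assumptions, and the sketch assumes the raw account numbers are stored as text ending in an apostrophe, as the comment in the cell above suggests.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: account numbers stored as text with a trailing apostrophe
sample = pd.DataFrame({
    'Account No': ["409000611074'", "1196428'"],
    'WITHDRAWAL AMT': [np.nan, 250.0],
    'DEPOSIT AMT': [1000000.0, np.nan],
    'CHQ.NO.': [np.nan, np.nan],
})

# Drop the last character of each account number, then convert to a 64-bit integer
sample['Account No'] = sample['Account No'].str[:-1].astype('int64')

# Replace missing amounts and cheque numbers with 0
fill_cols = ['WITHDRAWAL AMT', 'DEPOSIT AMT', 'CHQ.NO.']
sample[fill_cols] = sample[fill_cols].fillna(0)

print(sample)
```

Requesting `int64` explicitly, rather than `int`, is a choice made for this sketch so that 12-digit account numbers fit regardless of the platform's default integer width.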

### Step 4: Create Dataframe For Hashes

In [6]:
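# Note: this binds a second name to the same dataframe (no copy is made),
# so the columns added to df_Hash below also appear in df
df_Hash = df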

In [7]:
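# Create a 'Data' column from the string form of the detail columns.
# Note: str() on a dataframe returns the printed representation of the whole
# frame, so every row receives the same encoded string here.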
df_Hash['Data'] = str(df_Hash[['DATE','TRANSACTION DETAILS','CHQ.NO.','VALUE DATE','WITHDRAWAL AMT','DEPOSIT AMT','BALANCE AMT']]).encode()
df_Hash['Data'] = df_Hash['Data'].astype(str)
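
Because `str()` applied to a DataFrame yields a single string for the entire frame (which pandas may truncate for large tables), the cell above stores one identical value in every row. If the intent is for each row to carry its own transaction details, one possible alternative is sketched below. The helper `row_to_string`, the `|` separator, and the sample rows are assumptions made for illustration and are not part of the original script.

```python
import pandas as pd

# Columns to serialize, in the order used by the original cell
DETAIL_COLS = ['DATE', 'TRANSACTION DETAILS', 'CHQ.NO.', 'VALUE DATE',
               'WITHDRAWAL AMT', 'DEPOSIT AMT', 'BALANCE AMT']

def row_to_string(frame, cols=DETAIL_COLS, sep='|'):
    """Return a Series holding one joined string per row (hypothetical helper)."""
    # Convert every cell to text, then join the cells of each row
    as_text = frame[cols].astype(str)
    return as_text.apply(lambda row: sep.join(row), axis=1)

# Hypothetical two-row sample modeled on the table shown earlier
sample = pd.DataFrame({
    'DATE': ['2017-06-29', '2017-07-05'],
    'TRANSACTION DETAILS': ['TRF FROM Indiaforensic SERVICES'] * 2,
    'CHQ.NO.': [0.0, 0.0],
    'VALUE DATE': ['2017-06-29', '2017-07-05'],
    'WITHDRAWAL AMT': [0.0, 0.0],
    'DEPOSIT AMT': [1000000.0, 1000000.0],
    'BALANCE AMT': [1000000.0, 2000000.0],
})
sample['Data'] = row_to_string(sample)

print(sample['Data'].tolist())
```

Switching to a row-wise string would change every hash shown later in this notebook, so the two approaches should not be mixed when comparing results.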

### Step 5: Group All Transactions By Account Number

In [8]:
df_Hash['Data'] = df_Hash.groupby(['Account No'])['Data'].transform(lambda x: ', '.join(x))

In [9]:
# Group all the data that has the same account number
df_Hash = df_Hash.groupby('Account No').first().reset_index()

In [10]:
# Keep only the account number and transaction data columns
df_Hash = df_Hash[['Account No','Data']]

# Convert account number to integer
df_Hash['Account No'] = df_Hash['Account No'].astype(int)

In [11]:
df_Hash

| | Account No | Data |
|---|---|---|
| 0 | 1196428 | b' DATE TRANSACTION ... |
| 1 | 1196711 | b' DATE TRANSACTION ... |
| 2 | 409000362497 | b' DATE TRANSACTION ... |
| 3 | 409000405747 | b' DATE TRANSACTION ... |
| 4 | 409000425051 | b' DATE TRANSACTION ... |
| 5 | 409000438611 | b' DATE TRANSACTION ... |
| 6 | 409000438620 | b' DATE TRANSACTION ... |
| 7 | 409000493201 | b' DATE TRANSACTION ... |
| 8 | 409000493210 | b' DATE TRANSACTION ... |
| 9 | 409000611074 | b' DATE TRANSACTION ... |
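
The grouping in this step (a `transform` that joins the strings, followed by `first()` to keep one row per account) can also be expressed as a single aggregation. The sketch below shows that variant on a hypothetical three-row frame; the placeholder strings `tx-a`, `tx-b`, and `tx-c` stand in for real transaction data.

```python
import pandas as pd

# Hypothetical sample: one account with a single row, one with two rows
sample = pd.DataFrame({
    'Account No': [1196428, 409000611074, 409000611074],
    'Data': ['tx-a', 'tx-b', 'tx-c'],
})

# Join all 'Data' strings that share an account number into one string per account
grouped = (
    sample.groupby('Account No')['Data']
          .agg(', '.join)
          .reset_index()
)

print(grouped)
```

The result has one row per account, with the strings joined in their original row order.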


### Step 6: Creating Hashing Function

In [12]:
def hash_transaction(transaction):
    
    # Convert the transaction data to a string and encode it as bytes
    transaction_str = str(transaction).encode()

    # Create a SHA-256 hash object and use it to compute the hex digest of the transaction data
    hashing = hashlib.sha256()
    hashing.update(transaction_str)
    transaction_hash = hashing.hexdigest()

    return transaction_hash
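
As a quick check of how the function behaves, the sketch below hashes a few made-up strings. It repeats the function in a compact form so that it runs on its own; the input strings are hypothetical.

```python
import hashlib

def hash_transaction(transaction):
    """Return the SHA-256 hex digest of the string form of `transaction`."""
    return hashlib.sha256(str(transaction).encode()).hexdigest()

# Two identical inputs and one that differs in the last character
digest_a = hash_transaction('2017-06-29|TRF FROM Indiaforensic SERVICES|1000000.0')
digest_b = hash_transaction('2017-06-29|TRF FROM Indiaforensic SERVICES|1000000.0')
digest_c = hash_transaction('2017-06-29|TRF FROM Indiaforensic SERVICES|1000000.1')

print(len(digest_a))         # 64 hex characters (256 bits)
print(digest_a == digest_b)  # True: the same input always gives the same hash
print(digest_a == digest_c)  # False: a one-character change gives a different hash
```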

### Step 7: Apply Hashing Function

In [13]:
# Apply hash function to data column
df_Hash['Data'] = df_Hash['Data'].apply(hash_transaction)

In [14]:
df_Hash

| | Account No | Data |
|---|---|---|
| 0 | 1196428 | f6f8f36be8c53af7aefcae06fc14c0b765192e217a4882... |
| 1 | 1196711 | b9e5c3701b0d777d282d2415dc080622e9dc52dc6aaf18... |
| 2 | 409000362497 | dcc4fdf0472a0f051a444578c04f466727b5153f9af721... |
| 3 | 409000405747 | b627a8aafa8afdd7f742d04138eea1d6d1c0c2e9a56eb0... |
| 4 | 409000425051 | f15afa9f3c9df44f18925a8a6fe202eadb734ec3debb30... |
| 5 | 409000438611 | 306d72534538ff700da25f547216d9119006ff43e8eb98... |
| 6 | 409000438620 | 0911109e1caef238977d78aea0e634c1d55f56d9d3dd4c... |
| 7 | 409000493201 | de16a8389dc146a36fc3aa63dc9dc129e9cc8921e698de... |
| 8 | 409000493210 | 17eea2bdc6e52963ac23f4f9475439fa87fe1765b76798... |
| 9 | 409000611074 | 9bbe1db64e90c2d4966d23f70a7521fc12139b8d7ea32e... |
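
One way a table of per-account hashes like this could be used is to detect whether an account's transaction data has changed between two runs. The sketch below is an illustration under assumed inputs, not part of the original script: the frames `original` and `modified`, the `Hash` column name, and the placeholder strings are all hypothetical.

```python
import hashlib
import pandas as pd

def hash_transaction(transaction):
    """Return the SHA-256 hex digest of the string form of `transaction`."""
    return hashlib.sha256(str(transaction).encode()).hexdigest()

# Hypothetical per-account transaction strings from two runs
original = pd.DataFrame({
    'Account No': [1196428, 1196711],
    'Data': ['tx-a, tx-b', 'tx-c'],
})
modified = original.copy()
modified.loc[modified['Account No'] == 1196711, 'Data'] = 'tx-c, tx-d'

# Hash both versions
original['Hash'] = original['Data'].apply(hash_transaction)
modified['Hash'] = modified['Data'].apply(hash_transaction)

# Compare the hashes account by account and list the accounts that differ
merged = original.merge(modified, on='Account No', suffixes=('_old', '_new'))
changed = merged.loc[merged['Hash_old'] != merged['Hash_new'], 'Account No'].tolist()

print(changed)  # [1196711]
```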
