# Khalifa University - CACM

### Table of Contents

- *Student Master Data* 
- *Human Resources Master Data*
- *Finance Master Data*

### Master Data Naming Convention

The scripts and pandas dataframes across the Jupyter notebook follow this naming convention:

<div class="alert alert-block alert-warning">
<b>Example:</b> df_[type of database]_[master] </div>

where:

- df is the dataframe
- type of database is students, finance or human resources
- master refers to master data

### Exceptions Naming Convention

Additionally, all exceptions are named using the following naming convention:

<div class="alert alert-block alert-warning">
<b>Example:</b> df_[type of database]_[master]_[exception] </div>

where:

- df is the dataframe
- type of database is students, finance or human resources
- master refers to master data
- exception refers to the type of exception, e.g., duplicate students, employee details, invoices, payments etc.

# Import Data from SQL Server

### Key Libraries

**Pymssql** is a SQL connector package that extracts information from the server directly into a pandas dataframe.

Using pymssql, the results of different SQL queries can be extracted and used for analysis or for creating the MongoDB database.

The scripts in the other sections can run the analysis directly on these pandas dataframes instead of connecting to Excel files.

**Pandas** - Pandas is a fast, powerful, flexible and easy to use open source library which enables the user to perform data manipulation from any source (SQL, Excel, JSON, etc.)

**Numpy** - NumPy is a Python library for numerical computing that enables the user to perform analysis on n-dimensional arrays.

**Time** - Time is a library used to stamp the current date / time of the analysis, which can be stored to track previous iterations of exceptions.
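As a minimal sketch of the time stamping described above, the cell below adds the run date and time to an exception dataframe. The dataframe `df_demo_exception` and the column name `RUN_TIMESTAMP` are assumptions made for illustration and do not appear elsewhere in the notebook.

```python
import time
import pandas as pd

# Hypothetical exception frame used only for illustration.
df_demo_exception = pd.DataFrame({"ID": [101, 102]})

# Stamp every row with the local date and time of this run.
run_timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
df_demo_exception["RUN_TIMESTAMP"] = run_timestamp
print(df_demo_exception)
```

Storing the stamp as a column keeps each iteration of an exception report distinguishable once the rows are written to a shared table.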

Additionally, Python allows these libraries to be imported under abbreviations.

<div class="alert alert-block alert-warning">
<b>Example:</b> <b>import pandas as pd</b> enables the user to write "pd" in the Python code to refer to the pandas library</div>

In [1]:
import pymssql
import pandas as pd
import numpy as np

In [2]:
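import os  # the password is read from the (assumed) CACM_DB_PASSWORD environment variable instead of being hard-coded
conn = pymssql.connect(server="PRDSQL16BADBE.kunet.ae", user=r"KUNET\ku1016", password=os.environ["CACM_DB_PASSWORD"], database="CACM")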

In [3]:
cursor = conn.cursor()

To extract the relevant information from the SQL server, use a query of the following form:

<div class="alert alert-block alert-warning">
<b>Example:</b> Query = SELECT * FROM [Table] WHERE [Condition] </div>

where Table is the name of the table in the database (for example dbo.KU_SRC_STD_MASTER);
and conditions and clauses such as WHERE, GROUP BY and JOIN can be added as required.
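The query pattern above can be sketched with a filter value passed as a parameter rather than written into the SQL text. This is an illustration only: an in-memory SQLite database stands in for the SQL Server connection, and the table `demo_students` is hypothetical. The placeholder syntax depends on the driver (SQLite uses `?`; pymssql is understood to use `%s`).

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for the SQL Server connection (assumption).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo_students (ID INTEGER, STUDENT_STATUS TEXT)")
demo_conn.executemany(
    "INSERT INTO demo_students VALUES (?, ?)",
    [(1, "Active"), (2, "Inactive"), (3, "Active")],
)

# Select only the needed columns and pass the filter value as a parameter
# instead of concatenating it into the SQL string.
query = "SELECT ID, STUDENT_STATUS FROM demo_students WHERE STUDENT_STATUS = ?"
df_demo_active = pd.read_sql(query, demo_conn, params=("Active",))
print(df_demo_active)
```

Filtering on the server keeps the extract smaller than `SELECT *` followed by filtering in pandas.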

**Extract Student Master Information**

In [4]:
query = "SELECT * FROM dbo.KU_SRC_STD_MASTER"

df_student_master = pd.read_sql(query, conn)

**Extract Student Course Schedule**

In [5]:
query = "SELECT * FROM dbo.KU_SRC_STD_CRS_SCHEDULE"

df_student_course_schedule = pd.read_sql(query, conn)

**Extract Student Attendance**

In [6]:
query = "SELECT * FROM dbo.KU_SRC_STD_ATTENDANCE"

df_student_attendance = pd.read_sql(query, conn)

**Extract Employee Master Information**

In [7]:
query = "SELECT * FROM dbo.KU_SRC_EmployeeDetails"

df_employee_master = pd.read_sql(query, conn)

**Extract Employee Bank Details**

In [None]:
query = "SELECT * FROM dbo.KU_SRC_EmployeesBankDetails"

df_employee_bank_details = pd.read_sql(query, conn)

**Extract Employee Leaves**

In [None]:
query = "SELECT * FROM dbo.KU_SRC_EmployeesLeaves"

df_employee_leaves = pd.read_sql(query, conn)

**Extract AP Invoices**

In [None]:
query = "SELECT * FROM dbo.KU_SRC_AP_Invoices"

df_AP_invoices = pd.read_sql(query, conn)

**Extract Purchase Order Master**

In [None]:
query = "SELECT * FROM dbo.KU_SRC_POs"

df_POs = pd.read_sql(query, conn)

**Extract Purchase Requisitions Information**

In [None]:
query = "SELECT * FROM dbo.KU_SRC_PRs"

df_PRs = pd.read_sql(query, conn)

**Extract IT Tickets**

In [None]:
query = "SELECT * FROM dbo.KU_SRC_IT_Tickets"

df_it_tickets = pd.read_sql(query, conn)

**Extract Supplier Information**

In [None]:
query = "SELECT * FROM dbo.KU_SRC_Suppliers"

df_supplier_master = pd.read_sql(query, conn)

## Write into SQL Server

For writing files / exceptions into SQL Server, the following code can be utilized:

<div class="alert alert-block alert-warning">
<b>Example:</b> Code = [dataframe].to_sql("[Name of table / database]", conn) </div>

where:

- dataframe is the name of the pandas dataframe
- to_sql writes the dataframe into the SQL server
- Name of table / database is the name of the target table in the SQL server
- conn is the connection established to the SQL server
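A minimal sketch of the write step follows. The pandas documentation describes `to_sql` as accepting a SQLAlchemy connectable or a `sqlite3` connection, so a raw pymssql connection may not be accepted. Here an in-memory SQLite connection stands in for the server; the table name `Demo_Exceptions`, the dataframe and the engine URL in the comment are assumptions.

```python
import sqlite3
import pandas as pd

# Hypothetical exception frame; in the notebook this would be e.g. df_student_master_duplicate.
df_demo_exception = pd.DataFrame({"ID": [101, 102], "FULL_NAME": ["A", "B"]})

# sqlite3 stands in for the server here. For SQL Server, a SQLAlchemy engine
# would be passed instead, e.g. (assumed URL form, placeholders not filled in):
# engine = sqlalchemy.create_engine("mssql+pymssql://<user>:<password>@<server>/<database>")
demo_conn = sqlite3.connect(":memory:")

# if_exists="replace" overwrites the table from a previous run; index=False skips the row index.
df_demo_exception.to_sql("Demo_Exceptions", demo_conn, if_exists="replace", index=False)

# Read the table back to confirm what was written.
df_check = pd.read_sql("SELECT * FROM Demo_Exceptions", demo_conn)
print(df_check)
```

With the default `if_exists="fail"`, re-running a cell against an existing table raises an error, which is worth keeping in mind when the notebook is executed more than once.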

# Scripts for Student, Procurement and Finance Master Data

# 1. Student Master Data

The pandas library in Python can import information from many sources, including SQL, Excel, text and JSON. To load data into a pandas dataframe, the read function matching the source type must be used.

For importing from an Excel file, the following naming convention is used:

<div class="alert alert-block alert-warning">
<b>Example:</b> df_[type of database]_[master] = pd.read_excel("[Name of file].xlsx") </div>

In [None]:
df_student_master.head(5)

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,MOBILE3,PARENT_PHONE,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR
0,100020068,,Mohammed,Al Khaja,Mohammed Ali Al Khaja,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,
1,100020083,,Rashid,Al Murashda,Rashid Rashed Saeed Al Murashda,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,
2,100020008,,Abdulla,Mohammed,Abdulla Ahmed Abdullah Mohammed,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,
3,100020015,,Ahmed,Al Wahbi,Ahmed Abdullah Ibrahim Al Wahbi,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,
4,100020016,,Ahmed,Al Suwaidi,Ahmed Mohammed Sultan Al Suwaidi,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,


## 1.1 Duplicate Student Records

### 1.1.1 Overall Duplicates

The set of scripts below enables the user to identify duplicates in the student master data based on **student ID, student names, Emirates ID, passport details and length of Emirates ID**.

Additionally, the script can be modified to track active and inactive students, in order to identify the admission of academically dismissed students and test their re-admission against KU policies and procedures.
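As a small illustration of how `duplicated()` behaves, the sketch below compares the default setting with `keep=False`. The rows are made up for the example.

```python
import pandas as pd

# Hypothetical rows for illustration only.
df_demo = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "FULL_NAME": ["Ali Omar", "Sara Noor", "Ali Omar", "Huda Said"],
})

# Default keep="first": only the second and later occurrences are flagged.
later_only = df_demo[df_demo.duplicated(["FULL_NAME"])]

# keep=False: every row in a duplicate group is flagged, so records can be compared side by side.
all_in_group = df_demo[df_demo.duplicated(["FULL_NAME"], keep=False)].sort_values("FULL_NAME")
print(all_in_group)
```

With the default setting, the first record of each duplicate group does not appear in the output, so `keep=False` may be the more convenient choice when the report is reviewed manually.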

In [None]:
df_student_master_duplicate = df_student_master[df_student_master.duplicated()]

In [None]:
df_student_master_duplicate

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,MOBILE3,PARENT_PHONE,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR


### 1.1.2 Student ID Duplicates

Student IDs are stored in the database column **"ID"**. Therefore, the duplicate check is run on the column "ID".

In [None]:
df_student_master_duplicate = df_student_master[df_student_master.duplicated(['ID'])]

In [None]:
df_student_master_duplicate

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,MOBILE3,PARENT_PHONE,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR


### 1.1.3 Student Names Duplicates

Student names are stored in the database column **"FULL_NAME"**. Therefore, the duplicate check is run on the column "FULL_NAME".

In [None]:
df_student_master_duplicate = df_student_master[df_student_master.duplicated(['FULL_NAME'])]

In [None]:
df_student_master_duplicate

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,MOBILE3,PARENT_PHONE,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR
3589,100058270,,Mohammad,Awadhalla,Mohammad Fadi Awadhalla,محمد فادي عوض الله,202010,2020-08-23 00:00:00,M,A,...,,,,Albarsha 2 Street 29 Villa 8,Dubai,United Arab Emirates,,,,Khaled Elbassioni
4156,100035081,,Alya,Alsaadi,Alya Alsaadi,,200920,2009-11-23 00:00:00,F,S,...,,,,,Abu Dhabi,United Arab Emirates,,,,
6735,100042377,,Hend,Mohamed Ahmed Almarzooqi,Hend Ahmed Mohamed Ahmed Almarzooqi,,201510,2015-08-23 00:00:00,F,A,...,,,,AL FALAH,Abu Dhabi,United Arab Emirates,,,,
6986,100001904,,Abdulla,Al Ali,Abdulla Rashid Al Ali,,200710,2007-06-18 12:06:23,M,S,...,,,,Abu Dhabi 77334,Abu Dhabi,United Arab Emirates,,,,
7092,100048093,,Nouf,Alzaabi,Nouf Ebrahim Alzaabi,نوف ابراهيم سالم مسعود الزعابي,201310,2013-09-08 00:00:00,F,M,...,,,,UN,UN,United Arab Emirates,,,,
7722,100046318,,Hanan,Hamdan,Hanan Ahmad Mohammad Hamdan,حنان احمد محمد حمدان,201520,2016-01-10 00:00:00,F,P,...,,,,P.O. Box 1065,Al Ain,JORDAN,,,0.0,
7757,100052916,,Fadi,Dawaymeh,Fadi Zeyad Dawaymeh,فادي زياد نواف دوايمه,201910,2019-08-25 00:00:00,M,A,...,,,,"Abu Dhabi,behind one to one hotel, Malqatah st...",Abu Dhabi,JORDAN,--,,,Nahla Saeed Al Amoodi
7808,100058255,,Dima,Ali,Dima Samer Ali,ديمة سامر علي,202010,2020-08-23 00:00:00,F,A,...,,,,Abu Dhabi,Abu Dhabi,JORDAN,,,,Isam Mustafa Janajreh
7833,100043364,,Fareha,Nasim,Fareha Zainab Nasim,,201520,2016-01-10 00:00:00,F,A,...,,,,Fatima Bint Mubarak Street,Abu Dhabi,PAKISTAN,,,,
7962,100061899,,Muhammad Ahmed,Humais,Muhammad Ahmed Humais,محمد أحمد حميس,202120,2022-01-17 00:00:00,M,A,...,,,,"R-1931, Block-14, Federal B Area",Karachi,PAKISTAN,,,,Mahmoud Al Qutayri


In [None]:
## write into excel; if required

## df_student_master_duplicate.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Duplicate Student Names.xlsx")

In [None]:
df_student_master_duplicate.to_sql("Dup_Student_Records", conn)

### 1.1.4 Emirates ID Duplicate

Student Emirates IDs are stored in the database column **"EMIRATES_ID"**. Therefore, the duplicate check is run on the column "EMIRATES_ID".

In [None]:
df_student_master_duplicate = df_student_master[df_student_master.duplicated(['EMIRATES_ID'])]
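One point to keep in mind with the cell above: `duplicated()` treats missing values as equal to each other, so records without an Emirates ID can appear as duplicates. The sketch below, using made-up identifiers, restricts the check to rows that carry a value.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the Emirates ID values are made up.
df_demo = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "EMIRATES_ID": [np.nan, np.nan, "784199012345678", "784199012345678", "784200112345678"],
})

# duplicated() treats missing values as equal, so the second blank record (ID 2) is flagged.
with_missing = df_demo[df_demo.duplicated(["EMIRATES_ID"])]

# Restrict the check to rows that actually carry an Emirates ID.
has_eid = df_demo["EMIRATES_ID"].notna()
true_duplicates = df_demo[has_eid & df_demo.duplicated(["EMIRATES_ID"], keep=False)]
print(true_duplicates)
```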

In [None]:
df_student_master_duplicate

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,GENDER,CAMPUS,MAX_TERM,STUDENT_STATUS,...,SCHOOL_NAME,SCHOOL_AVG,HIGH_SCHOOL_ENGL,HIGH_SCHOOL_MATH,HIGH_SCHOOL_ARAB,IELTS_OVERALL,ELTS_TEST_DATE,EMTH,EMEN,Length
1,100020083,,Rashid,Al Murashda,Rashid Rashed Saeed Al Murashda,,M,S,,Inactive,...,,,,,,,NaT,,,
2,100020008,,Abdulla,Mohammed,Abdulla Ahmed Abdullah Mohammed,,M,S,,Inactive,...,,,,,,,NaT,,,
3,100020015,,Ahmed,Al Wahbi,Ahmed Abdullah Ibrahim Al Wahbi,,M,S,,Inactive,...,,81.5,78,78,,,NaT,,,
4,100020016,,Ahmed,Al Suwaidi,Ahmed Mohammed Sultan Al Suwaidi,,M,S,,Inactive,...,,,,,,,NaT,,,
6,100020031,,Hamad,Tunaiji,Hamad Saeed Mohammed Tunaiji,,M,S,,Inactive,...,,90,75,84,,,NaT,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16326,100061196,,Maryam,Ali,Maryam Abdulla Ali,مريم عبدالله علي,F,A,,Inactive-Did not register,...,,94.7,,,,6.5,2020-09-05,,,
16353,100058885,,Muhammad,Danishwar,Muhammad Zulfiqar Ahmad Danishwar,,M,A,,Inactive-Did not register,...,,79.27,,,,,NaT,,,
16361,100058294,,Nowshin Radiya,Kabir,Nowshin Radiya Kabir,,F,A,,Inactive-Did not register,...,,39,,,,,NaT,,,
16363,100058314,,Cyril,Pepple,Cyril Christopher Pepple,,M,A,,Inactive-Did not register,...,,B,,,,,NaT,,,


In [None]:
## write into excel; if required

## df_student_master_duplicate.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Emirates ID Duplicate.xlsx")

In [None]:
df_student_master_duplicate.to_sql("Dup_Student_Records_EmiratesID", conn)

### 1.1.5 Emirates ID Length

Emirates IDs across the UAE contain exactly 15 digits; any deviation in length indicates an invalid entry. To identify such discrepancies, we use the pandas **str.len() method**, which counts the number of characters in each cell of the column.

Further, we filter the results for lengths of more than or less than **15 characters to identify discrepancies**.

We noted that many student records do not contain an Emirates ID. Such students are excluded from the test, since all of them are inactive.

In [None]:
df_student_master['Length'] = df_student_master['EMIRATES_ID'].str.len()

In [None]:
df_student_master_EID = df_student_master[(df_student_master['Length'] < 15) | (df_student_master['Length'] > 15)]
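As a possible extension of the length test, the sketch below also checks that all 15 characters are digits and reports blank values separately. The sample values are made up, and the column is assumed to hold text.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the Emirates ID values are made up.
df_demo = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "EMIRATES_ID": ["784199012345678", "7841990", np.nan, "78419901234567X"],
})

eid = df_demo["EMIRATES_ID"]
# True only when the whole value is exactly 15 digits; missing values count as not valid.
is_valid = eid.str.fullmatch(r"\d{15}", na=False)

# Rows with a value that is not exactly 15 digits; blanks are reported separately.
df_demo_invalid = df_demo[eid.notna() & ~is_valid]
df_demo_blank = df_demo[eid.isna()]
print(df_demo_invalid)
```

A value with the right length but a non-digit character passes the pure length test, which is why the pattern check is shown here.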

In [None]:
df_student_master_EID

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,GENDER,CAMPUS,MAX_TERM,STUDENT_STATUS,...,SCHOOL_NAME,SCHOOL_AVG,HIGH_SCHOOL_ENGL,HIGH_SCHOOL_MATH,HIGH_SCHOOL_ARAB,IELTS_OVERALL,ELTS_TEST_DATE,EMTH,EMEN,Length
57,100020289,,Marwa,Jaber,Marwa Ali Awadh Jaber,مروة علي عوض جبر,F,A,201410.0,Inactive due to Graduation,...,PALESTINE SCHOOL-F,80.7,96,193,,6.0,2009-06-14,,,7.0
128,100035191,,Zainab,Moazzam,Zainab Muhammad Moazzam Moazzam,زينب محمد معظم,F,A,201320.0,Inactive due to Graduation,...,AL ROWAIS INTER. PVT. SCH.G,87,85,81,,,NaT,,,7.0
1225,100055530,,Saif,Yaqoub,Saif Darwish Mustafa Yaqoub,سيف درويش مصطفى يعقوب,M,H,,Inactive due to Graduation,...,,,,,,,NaT,,,6.0
2230,100035729,,Faisal,Alameeri,Faisal Abdulkareim Kamal Omar Alameeri,فيصل عبدالكريم كمال عمر الأميري,M,S,,Inactive due to Graduation,...,,,,,,,NaT,,,7.0
2284,100036384,,Youssef,Fora,Youssef Ait Fora,يوسف ايت فورة,M,A,201420.0,Inactive due to Graduation,...,AL NAHDHA NATIONAL PVT. B _AUH,A,,A,,,NaT,,,7.0
4122,100035182,,Maleka,Bin Tarsh,Maleka Abdulbari Awadh Bin Tarsh,مالكه عبدالباري عوض بن طرش,F,A,201330.0,Inactive due to Graduation,...,Al Muttahedah Girls School,97.4,97,98,,,NaT,,,7.0
4170,100035344,,Doae,Ben Khadra,Doae Amin Ahmed Jamal Ben Khadra,دعاء بن خضراء,F,A,201410.0,Inactive due to Graduation,...,ROSARY SCHOOL PVT. _G-AUH,96,92,94,,6.0,2010-07-21,,,7.0
4190,100033787,,Shamsa,Al Nuaimi,Shamsa Nasser Saeed Al Nuaimi,شمسه ناصر سعيد النعيمى,F,A,201420.0,Inactive due to Graduation,...,AL NAHDHA NATIONAL PVT. B _AUH,95.3,92,96,,,NaT,,,7.0
4218,100035200,,Enas,Osman,Enas Azhair Osman,ايناس ازهري احمد عثمان,F,A,201320.0,Inactive due to Graduation,...,"AL JAHLI SCHOOL- G(AL SAROOJ,A",98,95,99,,7.0,2013-11-01,,,7.0
5637,100058142,,Aamna,Al Shehhi,Aamna Mohammed Al Shehhi,آمنة محمد الحيد محمد الشحي,F,M,,Inactive due to Graduation,...,,,,,,7.0,2010-06-26,,,7.0


In [None]:
df_student_master_EID.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Emirates ID Length.xlsx")

In [None]:
df_student_master_EID.to_sql("Dup_Student_Records_EID_Length", conn)

### 1.1.6 Passport

Student passport details are stored in the database column **"PASSPORT_ID"**. Therefore, the checks in this section are run on the column "PASSPORT_ID".

Additionally, we can filter for all active students whose passport details are not entered in the system. This test is necessary for international students and students of other nationalities.

In [None]:
df_student_master_Passport = df_student_master[(df_student_master['STUDENT_STATUS'] == 'Active') & (df_student_master['PASSPORT_ID'] == "NULL")]
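The cell above compares against the text "NULL". When the data is read from SQL Server, empty cells usually arrive in pandas as missing values rather than as that text, so the sketch below checks for both cases. The sample rows are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only.
df_demo = pd.DataFrame({
    "ID": [1, 2, 3],
    "STUDENT_STATUS": ["Active", "Active", "Inactive"],
    "PASSPORT_ID": [np.nan, "NULL", np.nan],
})

is_active = df_demo["STUDENT_STATUS"] == "Active"
# Covers both a real missing value and the literal text "NULL".
no_passport = df_demo["PASSPORT_ID"].isna() | (df_demo["PASSPORT_ID"] == "NULL")
df_demo_no_passport = df_demo[is_active & no_passport]
print(df_demo_no_passport)
```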

In [None]:
df_student_master_Passport

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,GENDER,CAMPUS,MAX_TERM,STUDENT_STATUS,...,SCHOOL_NAME,SCHOOL_AVG,HIGH_SCHOOL_ENGL,HIGH_SCHOOL_MATH,HIGH_SCHOOL_ARAB,IELTS_OVERALL,ELTS_TEST_DATE,EMTH,EMEN,Length


We can also test the length of the passport number. However, the results may not be conclusive, as the number of characters in a passport number differs across countries.

In [None]:
df_student_master['Passport_Length'] = df_student_master['PASSPORT_ID'].str.len()

In [None]:
df_student_master['Passport_Length']

0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
        ... 
16395    9.0
16396    9.0
16397    9.0
16398    9.0
16399    9.0
Name: Passport_Length, Length: 16400, dtype: float64

In [None]:
df_student_master_Passport = df_student_master[(df_student_master['STUDENT_STATUS'] == 'Active') & (df_student_master['Passport_Length'] < 7)]

In [None]:
df_student_master_Passport

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,GENDER,CAMPUS,MAX_TERM,STUDENT_STATUS,...,SCHOOL_AVG,HIGH_SCHOOL_ENGL,HIGH_SCHOOL_MATH,HIGH_SCHOOL_ARAB,IELTS_OVERALL,ELTS_TEST_DATE,EMTH,EMEN,Length,Passport_Length
12633,100039053,,Alyazyah,Alsuwaidi,Alyazyah Ahmed Saeed Binshaheen Alsuwaidi,اليازية أحمد بن شاهين السويدي,F,,202143.0,Active,...,90,,,,7.5,2012-04-12,,,,6.0


In [None]:
## write into excel; if required

## df_student_master_Passport.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Passport Length.xlsx")

In [None]:
df_student_master_Passport.to_sql("Dup_Student_Records_Passport", conn)

## 1.2 Re-admission of previous students

Students at Khalifa University may be re-admitted based on appropriate approvals from Academic Management. To analyze re-admitted students, we identify students admitted under the same Emirates ID as a previous inactive account.

The output report must then be reviewed to confirm such instances, as no single column indicates a re-admission.

In [None]:
## df_student_master = pd.read_excel("C:/Users/ku1016/Downloads/Student Master Data.xlsx")
## df_student_master.head(5)

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,GENDER,CAMPUS,MAX_TERM,STUDENT_STATUS,...,BIRTH_DATE,SCHOOL_NAME,SCHOOL_AVG,HIGH_SCHOOL_ENGL,HIGH_SCHOOL_MATH,HIGH_SCHOOL_ARAB,IELTS_OVERALL,ELTS_TEST_DATE,EMTH,EMEN
0,100020068,,Mohammed,Al Khaja,Mohammed Ali Al Khaja,,M,S,,Inactive,...,,,86.6,89.0,84.0,,6.5,2011-01-19,,
1,100020083,,Rashid,Al Murashda,Rashid Rashed Saeed Al Murashda,,M,S,,Inactive,...,,,,,,,,NaT,,
2,100020008,,Abdulla,Mohammed,Abdulla Ahmed Abdullah Mohammed,,M,S,,Inactive,...,,,,,,,,NaT,,
3,100020015,,Ahmed,Al Wahbi,Ahmed Abdullah Ibrahim Al Wahbi,,M,S,,Inactive,...,,,81.5,78.0,78.0,,,NaT,,
4,100020016,,Ahmed,Al Suwaidi,Ahmed Mohammed Sultan Al Suwaidi,,M,S,,Inactive,...,,,,,,,,NaT,,


In [None]:
df_student_master_Readmission = df_student_master[(df_student_master['STUDENT_STATUS'] == 'Inactive') & (df_student_master.duplicated(['PASSPORT_ID']))]
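The cell above matches on PASSPORT_ID, while the description refers to the Emirates ID. As a sketch under the assumption that EMIRATES_ID is the intended key, the example below keeps only identifiers that appear on both an inactive and an active record, and drops blank identifiers first so that they cannot match each other. The rows are made up, and the status values are simplified to "Active" and "Inactive".

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the Emirates ID values are made up.
df_demo = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "STUDENT_STATUS": ["Inactive", "Active", "Inactive", "Inactive", "Active"],
    "EMIRATES_ID": ["784199012345678", "784199012345678", np.nan, np.nan, "784200112345678"],
})

# Drop rows without an identifier so missing values cannot match each other.
known = df_demo.dropna(subset=["EMIRATES_ID"])

# Identifiers seen on an inactive record and on an active record.
inactive_ids = set(known.loc[known["STUDENT_STATUS"] == "Inactive", "EMIRATES_ID"])
active_ids = set(known.loc[known["STUDENT_STATUS"] == "Active", "EMIRATES_ID"])
readmitted_ids = list(inactive_ids & active_ids)

df_demo_readmission = known[known["EMIRATES_ID"].isin(readmitted_ids)]
print(df_demo_readmission)
```

Showing both the inactive and the active record for each identifier makes the manual review of the output report more direct.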

In [None]:
df_student_master_Readmission

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,GENDER,CAMPUS,MAX_TERM,STUDENT_STATUS,...,BIRTH_DATE,SCHOOL_NAME,SCHOOL_AVG,HIGH_SCHOOL_ENGL,HIGH_SCHOOL_MATH,HIGH_SCHOOL_ARAB,IELTS_OVERALL,ELTS_TEST_DATE,EMTH,EMEN
1,100020083,,Rashid,Al Murashda,Rashid Rashed Saeed Al Murashda,,M,S,,Inactive,...,,,,,,,,NaT,,
2,100020008,,Abdulla,Mohammed,Abdulla Ahmed Abdullah Mohammed,,M,S,,Inactive,...,,,,,,,,NaT,,
3,100020015,,Ahmed,Al Wahbi,Ahmed Abdullah Ibrahim Al Wahbi,,M,S,,Inactive,...,,,81.5,78,78,,,NaT,,
4,100020016,,Ahmed,Al Suwaidi,Ahmed Mohammed Sultan Al Suwaidi,,M,S,,Inactive,...,,,,,,,,NaT,,
6,100020031,,Hamad,Tunaiji,Hamad Saeed Mohammed Tunaiji,,M,S,,Inactive,...,,,90,75,84,,,NaT,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15061,100001943,,Mohamed,Al Marzouqi,Mohamed Ali M. Habeeb Al Marzouqi,,M,S,,Inactive,...,1989-06-10 00:00:00,,,,,,,NaT,,
15344,100001555,,Abdulla,Al Hebsi,Abdulla S. Alwan Al Hebsi,,M,S,,Inactive,...,1985-03-24 00:00:00,,,,,,,NaT,,
15347,100001619,,Majed,Al Neaimi,Majed Mohamed S. Alsayyah Al Neaimi,,M,S,,Inactive,...,1987-04-07 00:00:00,,,,,,,NaT,,
15349,100001643,,Mohamed,Mohamed,Mohamed Mohamed Mohamed,,M,S,,Inactive,...,1986-09-07 00:00:00,,,,,,,NaT,,


In [None]:
df_student_master_Readmission.to_sql("Readmitted_students", conn)

## 1.3 Missing Student Information

All active students should have complete information as per the registrar records and protocol within the master data. The below script analyzes the missing information to ensure completeness of the data.

To review the missing information, the **isnull()** method is used together with **any(axis=1)**, which flags every row that has a missing value in at least one column.
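As a sketch of this row-wise check, the example below limits the test to a list of mandatory fields and records which of them are empty on each flagged row. The list `required_columns`, the column `MISSING_FIELDS` and the sample rows are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only.
df_demo = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "STUDENT_STATUS": ["Active", "Active", "Inactive", "Active"],
    "PASSPORT_ID": ["P1234567", np.nan, np.nan, "P7654321"],
    "ARABIC_NAME": [np.nan, "placeholder", np.nan, "placeholder"],
})

# Hypothetical list of fields treated as mandatory for active students.
required_columns = ["PASSPORT_ID", "ARABIC_NAME"]

active = df_demo[df_demo["STUDENT_STATUS"] == "Active"]
missing_mask = active[required_columns].isnull()

# Keep active rows with at least one empty mandatory field.
df_demo_missing = active[missing_mask.any(axis=1)].copy()
# List which mandatory fields are empty on each flagged row.
df_demo_missing["MISSING_FIELDS"] = missing_mask.loc[df_demo_missing.index].apply(
    lambda row: ", ".join(row[row].index), axis=1
)
print(df_demo_missing)
```

Checking every column at once tends to flag almost every record, because optional fields are often blank; a list of mandatory fields keeps the report focused.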

In [None]:
## df_student_master = pd.read_excel("C:/Users/ku1016/Downloads/Student Master Data.xlsx")

In [None]:
df_student_master.head(5)

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,GENDER,CAMPUS,MAX_TERM,STUDENT_STATUS,...,BIRTH_DATE,SCHOOL_NAME,SCHOOL_AVG,HIGH_SCHOOL_ENGL,HIGH_SCHOOL_MATH,HIGH_SCHOOL_ARAB,IELTS_OVERALL,ELTS_TEST_DATE,EMTH,EMEN
0,100020068,,Mohammed,Al Khaja,Mohammed Ali Al Khaja,,M,S,,Inactive,...,,,86.6,89.0,84.0,,6.5,2011-01-19,,
1,100020083,,Rashid,Al Murashda,Rashid Rashed Saeed Al Murashda,,M,S,,Inactive,...,,,,,,,,NaT,,
2,100020008,,Abdulla,Mohammed,Abdulla Ahmed Abdullah Mohammed,,M,S,,Inactive,...,,,,,,,,NaT,,
3,100020015,,Ahmed,Al Wahbi,Ahmed Abdullah Ibrahim Al Wahbi,,M,S,,Inactive,...,,,81.5,78.0,78.0,,,NaT,,
4,100020016,,Ahmed,Al Suwaidi,Ahmed Mohammed Sultan Al Suwaidi,,M,S,,Inactive,...,,,,,,,,NaT,,


In [None]:
df_student_missing = df_student_master[df_student_master.isnull().any(axis=1)]
df_student_missing.head(5)

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,GENDER,CAMPUS,MAX_TERM,STUDENT_STATUS,...,BIRTH_DATE,SCHOOL_NAME,SCHOOL_AVG,HIGH_SCHOOL_ENGL,HIGH_SCHOOL_MATH,HIGH_SCHOOL_ARAB,IELTS_OVERALL,ELTS_TEST_DATE,EMTH,EMEN
0,100020068,,Mohammed,Al Khaja,Mohammed Ali Al Khaja,,M,S,,Inactive,...,,,86.6,89.0,84.0,,6.5,2011-01-19,,
1,100020083,,Rashid,Al Murashda,Rashid Rashed Saeed Al Murashda,,M,S,,Inactive,...,,,,,,,,NaT,,
2,100020008,,Abdulla,Mohammed,Abdulla Ahmed Abdullah Mohammed,,M,S,,Inactive,...,,,,,,,,NaT,,
3,100020015,,Ahmed,Al Wahbi,Ahmed Abdullah Ibrahim Al Wahbi,,M,S,,Inactive,...,,,81.5,78.0,78.0,,,NaT,,
4,100020016,,Ahmed,Al Suwaidi,Ahmed Mohammed Sultan Al Suwaidi,,M,S,,Inactive,...,,,,,,,,NaT,,


In [None]:
df_student_missing_final = df_student_missing[(df_student_missing['STUDENT_STATUS'] == 'Active')]

In [None]:
df_student_missing_final.head(5)

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,GENDER,CAMPUS,MAX_TERM,STUDENT_STATUS,...,BIRTH_DATE,SCHOOL_NAME,SCHOOL_AVG,HIGH_SCHOOL_ENGL,HIGH_SCHOOL_MATH,HIGH_SCHOOL_ARAB,IELTS_OVERALL,ELTS_TEST_DATE,EMTH,EMEN
68,100031581,,Falah,Alhammadi,Falah Mohamed Amer Yousuf Alhammadi,فلاح محمد عامر يوسف الحمادى,M,A,202110.0,Active,...,1991-11-18 00:00:00,ALFarazdaq Boy's Secondary Sch,90.8,72.0,97.0,,7.5,2020-02-21,,
135,100035244,,Abdulrahman,Agha,Abdulrahman Mohamad Agha,عبدالرحمن محمد عمر آغا,M,A,202110.0,Active,...,1993-01-21 00:00:00,,89.8,89.0,88.0,,7.5,2018-03-03,,
141,100036658,,Sarah,Azzam,Sarah Kassem Azzam,ساره قاسم عزام,F,A,202110.0,Active,...,1993-03-02 00:00:00,PALESTINE SCHOOL-F,98.6,98.5,98.3,,,NaT,,
175,100020345,,Maryam,Al Ali,Maryam Mohamed Abdulla Mohamed Al Ali,مريم محمد عبدالله محمد العلي,F,A,,Active,...,1985-10-03 00:00:00,,,,,,6.0,2019-02-20,,
208,100040178,,Amna,Alshehhi,Amna Ali Samrah Ali Alshehhi,آمنه علي صمره علي الشحي,F,A,202110.0,Active,...,1996-03-03 00:00:00,AJMAN SEC .SCHOOL -G,96.4,95.0,96.0,,6.0,2014-05-24,,


In [None]:
df_student_missing_final.to_sql("Missing_student_details", conn)

# 2. Student Attendance

## 2.1 Student Attendance as per Policy

As per Khalifa University policies and procedures, student attendance follows two different thresholds:

- 80% and above for undergraduates
- 50% and above for graduates

For the same, we analyze the student attendance master data in the following manner:

- Calculate the Total Classes in the semester as absences plus classes attended
- Calculate the percentage of absences from the number of absences and the Total Classes figure
- Flag absences of more than 20% where the registration status is "RE" (RE refers to still registered)

<div class="alert alert-block alert-warning">
<b>Example:</b> ((Total Absences)/(Total Classes))*100; Highlight all exceptions</div>

In [None]:
##import master data for excel

## df_student_attendance = pd.read_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Data/Student Attendance Data.xlsx")

In [None]:
df_student_attendance.head()

Unnamed: 0,TERM_CODE,COURSE,CRN,DIVISION,COURSE_TITLE,FACULTY_ID,FACULTY_NAME,CAMPUS,STUDENT_ID,STUDENT_NAME,REG_STATUS,TOTAL_ABSENCES,TOTAL_ATTENDED
0,202142,MDBS705,40006,Medicine and Health Sciences,Gastrointestinal System,100052699.0,Eman Alefishat,A,100040115,Jawaher Ahmed Kareem Ahmed Alblooshi,RE,0,20
1,202142,MDBS705,40006,Medicine and Health Sciences,Gastrointestinal System,100052699.0,Eman Alefishat,A,100041430,Sherooq Hamdan Moosa Ali Karam,RE,1,19
2,202142,MDBS705,40006,Medicine and Health Sciences,Gastrointestinal System,100052699.0,Eman Alefishat,A,100041700,Fatima Omar Ahmed Ba Fakih,RE,0,20
3,202142,MDBS705,40006,Medicine and Health Sciences,Gastrointestinal System,100052699.0,Eman Alefishat,A,100042658,Ali Saeed Mohammed Salem Alshehhi,RE,1,19
4,202142,MDBS705,40006,Medicine and Health Sciences,Gastrointestinal System,100052699.0,Eman Alefishat,A,100042706,Dana Khamis Abdulla Jamaan Altamimi,RE,0,20


In [None]:
df_student_attendance['Total Classes'] = df_student_attendance["TOTAL_ABSENCES"]+df_student_attendance["TOTAL_ATTENDED"]

In [None]:
df_student_attendance['Absence Percentage'] = (df_student_attendance['TOTAL_ABSENCES']/df_student_attendance['Total Classes'])*100

In [None]:
df_student_attendance_exception = df_student_attendance[(df_student_attendance['Absence Percentage']>20) & (df_student_attendance['REG_STATUS']=="RE")]
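The cell above applies the 20% absence limit to all students. As a sketch of how the two policy thresholds could be applied separately, the example below maps a study level to its maximum permitted absence and guards against a class count of zero. The column `LEVEL` and its codes "UG" and "GR" are assumptions; no such column appears in the extract shown.

```python
import pandas as pd

# Hypothetical rows for illustration only.
df_demo = pd.DataFrame({
    "STUDENT_ID": [1, 2, 3, 4],
    "LEVEL": ["UG", "UG", "GR", "GR"],  # assumed column, not in the extract shown
    "REG_STATUS": ["RE", "RE", "RE", "RE"],
    "TOTAL_ABSENCES": [6, 2, 6, 0],
    "TOTAL_ATTENDED": [20, 24, 20, 0],
})

total = df_demo["TOTAL_ABSENCES"] + df_demo["TOTAL_ATTENDED"]
# where() turns a zero class count into NaN, so the percentage is left blank for that row.
df_demo["Absence Percentage"] = df_demo["TOTAL_ABSENCES"] / total.where(total > 0) * 100

# Maximum permitted absence per level: 100 minus the minimum attendance in the policy.
max_absence = df_demo["LEVEL"].map({"UG": 20, "GR": 50})
df_demo_exception = df_demo[
    (df_demo["Absence Percentage"] > max_absence) & (df_demo["REG_STATUS"] == "RE")
]
print(df_demo_exception)
```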

In [None]:
df_student_attendance_exception

Unnamed: 0,TERM_CODE,COURSE,CRN,DIVISION,COURSE_TITLE,FACULTY_ID,FACULTY_NAME,CAMPUS,STUDENT_ID,STUDENT_NAME,REG_STATUS,TOTAL_ABSENCES,TOTAL_ATTENDED,Total Classes,Absence Percentage
58,202142,MDBS702,40007,Medicine and Health Sciences,Hematopoietic&Lymphoreticular,100058162.0,Ahmad AlDuaij,A,100058320,Fatima Salman Abdulla,RE,4,11,15,26.666667
75,202142,MDBS706,40008,Medicine and Health Sciences,Endocrine System,100053903.0,Sabina Semiz,A,100058074,Manal Mohammed Smail,RE,3,9,12,25.0
81,202142,MDBS706,40008,Medicine and Health Sciences,Endocrine System,100053903.0,Sabina Semiz,A,100058278,Adedayo Adegbile,RE,3,9,12,25.0
425,202110,MEEN360,11340,Mechanical Engineering,Computational Methods for Mech,100037414.0,Anas Alazzam,A,100045384,Khilad Abdulbari Abdulla Saeed Almenhali,RE,5,19,24,20.833333
551,202110,ECCE221,11274,Electrical Engr & Computer Sci,Electric Circuits I,100045794.0,Khalid Al Hammadi,A,100043118,Omar Abdulla Omar Abdulla Mukhayer,RE,6,20,26,23.076923
580,202110,ECCE316,11276,Electrical Engr & Computer Sci,Microprocessor Systems,100035100.0,Baker Mohammad,A,100042369,Salma Khalil Abdulla Hassan Alhosani,RE,6,20,26,23.076923
933,202110,STEM002,11281,PREP,STEM 2,100020363.0,Yousef Abosalem,A,100059975,Mohamed Ziad Ali Othman Almahri,RE,27,101,128,21.09375
1516,202110,MEEN741,11022,Mechanical Engineering,Advanced Conduction and Rad.,100045910.0,Mohamed Ali,H,100059855,Kamran Mukhtar Ahmed Mahboob,RE,8,18,26,30.769231
2254,202110,ECCE323,11090,Electrical Engr & Computer Sci,Feedback Control Systems,100045802.0,Ahmed Al Durra,A,100052246,Khawla Mohamed Hasan Ali AlMarzooqi,RE,6,20,26,23.076923
2288,202110,ECCE425,11094,Electrical Engr & Computer Sci,Power Sys. Stability & Control,100041087.0,Bashar Zahawi,A,100052359,Hasan Fahmi Taha Al-ssaqaf,RE,6,20,26,23.076923


In [None]:
## write into excel; if required

## df_student_attendance_exception.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Attendance Exception.xlsx")

In [None]:
df_student_attendance_exception.to_sql("Student_Attendance", conn)

# 3. Student Courses

The data source below is imported to analyze the student courses within Khalifa University.

In [None]:
## import master data for excel

## df_student_courses = pd.read_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Data/Student Course Data.xlsx")

In [None]:
df_student_courses_schedule = df_student_course_schedule  # frame loaded from SQL Server in In [5]
df_student_courses_schedule

Unnamed: 0,TERM_CODE,CAMPUS,CRS_DIVISION_NAME,CRN,CRS_ID,COURSE_TITLE,CREDIT_HOURS,INSTRUCTOR_INDICATOR,RESPONSIBILITY_PCT,CONTACT_HOURS,...,SCHEDULE_END_TIME,NO_HOLD,SECT_WAITLIST_MAX_ENRL,MAX_ENROLLMENT_ALLOWED_CRN,NBR_REGISTERED_STUDENTS_CRN,SUN,MON,TUE,WED,THU
0,202142,A,Medicine and Health Sciences,40006,MDBS705,Gastrointestinal System,4.0,Primary,100.0,4.0,...,1200.0,0,0,40,29,Y,Y,Y,Y,
1,202142,A,Medicine and Health Sciences,40007,MDBS702,Hematopoietic&Lymphoreticular,3.0,Primary,100.0,3.0,...,1200.0,0,0,40,29,Y,Y,Y,Y,
2,202142,A,Medicine and Health Sciences,40008,MDBS706,Endocrine System,3.0,Primary,100.0,3.0,...,1200.0,0,0,40,29,Y,Y,Y,Y,
3,202142,A,Medicine and Health Sciences,40009,MDBS707,Reproductive System,3.0,Primary,100.0,3.0,...,1200.0,0,0,40,29,Y,Y,Y,Y,
4,202142,A,Medicine and Health Sciences,40010,MDBS709,Nervous System,6.0,Primary,100.0,6.0,...,1200.0,0,0,40,29,Y,Y,Y,Y,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2290,202110,A,Civil Engineering,10681,CIVE470,Foundation Engineering,4.0,Primary,100.0,3.0,...,1150.0,0,0,36,26,Y,,Y,,
2291,202110,A,Biomedical Engineering,11361,BMED491,Independent Study III,,Primary,0.0,,...,,0,0,1,1,,,,,
2292,202110,A,Arts& Science,11367,SCIE795,PhD Written Qualifying Examina,0.0,Primary,100.0,0.0,...,,0,0,1,2,,,,,
2293,202120,H,Mechanical Engineering,21007,MSEN712,Imaging of Materials: Scan. El,3.0,Primary,100.0,3.0,...,1215.0,0,0,20,0,Y,,Y,,


## 3.1 Student Max Enrollment

Each course in Khalifa University has a maximum allotted number of students, based on classroom size and in compliance with CAA standards, to maintain the student-to-faculty ratio.

In the analysis below, we compare the maximum number of students allowed for a course (MAX_ENROLLMENT_ALLOWED_CRN) with the number of students registered (NBR_REGISTERED_STUDENTS_CRN) for the semester.

All exceptions are noted and identified.

<div class="alert alert-block alert-warning">
<b>Example:</b> (Maximum number of students allowed in a CRN) - (Number of students registered in a CRN); Highlight all negative results as exceptions</div>

In [None]:
df_student_courses_schedule['Above Max Enrollment'] = df_student_courses_schedule['MAX_ENROLLMENT_ALLOWED_CRN'] - df_student_courses_schedule['NBR_REGISTERED_STUDENTS_CRN'] 

In [None]:
df_student_courses_max = df_student_courses_schedule[df_student_courses_schedule['Above Max Enrollment'] <0]
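As a sketch of a variant of this test, the example below reports the number of seats over the limit as a positive figure and separates sections whose limit is recorded as 0. Treating a limit of 0 as "no limit recorded" is an assumption to be confirmed with the registrar. The three rows reuse CRN values from the outputs in this notebook; the new column names are hypothetical.

```python
import pandas as pd

# Three sections taken from the outputs shown in this notebook.
df_demo = pd.DataFrame({
    "CRN": [11315, 11339, 40006],
    "MAX_ENROLLMENT_ALLOWED_CRN": [30, 0, 40],
    "NBR_REGISTERED_STUDENTS_CRN": [41, 25, 29],
})

over = df_demo["NBR_REGISTERED_STUDENTS_CRN"] - df_demo["MAX_ENROLLMENT_ALLOWED_CRN"]
# Positive number of seats above the limit; sections within the limit show 0.
df_demo["Seats Over Limit"] = over.clip(lower=0)
# A limit of 0 may mean "no limit recorded" rather than a real cap (assumption).
df_demo["Limit Missing"] = df_demo["MAX_ENROLLMENT_ALLOWED_CRN"] == 0

df_demo_over = df_demo[(df_demo["Seats Over Limit"] > 0) & ~df_demo["Limit Missing"]]
df_demo_review = df_demo[df_demo["Limit Missing"]]
print(df_demo_over)
```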

In [None]:
df_student_courses_max

Unnamed: 0,TERM_CODE,CAMPUS,CRS_DIVISION_NAME,CRN,CRS_ID,COURSE_TITLE,CREDIT_HOURS,INSTRUCTOR_INDICATOR,RESPONSIBILITY_PCT,CONTACT_HOURS,...,NO_HOLD,SECT_WAITLIST_MAX_ENRL,MAX_ENROLLMENT_ALLOWED_CRN,NBR_REGISTERED_STUDENTS_CRN,SUN,MON,TUE,WED,THU,Above Max Enrollment
81,202110,A,Humanities & Social Sciences,11315,HUMA102,Islamic Culture,3.0,Primary,100.0,3.0,...,0,0,30,41,Y,,Y,,,-11
117,202110,A,Electrical Engr & Computer Sci,11276,ECCE316,Microprocessor Systems,4.0,Primary,100.0,3.0,...,0,0,25,26,,Y,,Y,,-1
122,202110,A,English,11339,ENGL101,Academic English I,3.0,Primary,100.0,3.0,...,0,0,0,25,Y,,Y,,,-25
193,202110,A,Math,11260,MATH111,Calculus I,4.0,Primary,100.0,4.0,...,0,0,30,38,Y,Y,Y,Y,,-8
471,202110,A,Areospace Engineering,11070,AERO201,Engineering Dynamics,3.0,Primary,33.0,3.0,...,0,0,18,19,Y,,Y,,,-1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2241,202110,A,Physics,10207,PHYS122,University Physics 2,0.0,Primary,0.0,2.0,...,0,0,24,25,,,,Y,,-1
2268,202110,A,Areospace Engineering,10660,AERO465,Space Dynamics and Control,0.0,Primary,0.0,,...,0,0,18,21,,,Y,,,-3
2269,202110,A,Areospace Engineering,10661,AERO465,Space Dynamics and Control,0.0,Primary,0.0,,...,0,0,18,19,Y,,,,,-1
2288,202110,A,Civil Engineering,10679,CIVE370,Intro. to Environmental Engr.,0.0,Primary,100.0,,...,0,0,10,11,,,,Y,,-1


In [None]:
## write into excel; if required

## df_student_courses_max.to_excel('C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Above Max Enrollment.xlsx')

In [None]:
df_student_courses_max.to_sql("Above_Max_Courses", conn)

## 3.2 Scheduling of Courses

The code below looks for the same instructor being scheduled in the same time slot more than once on the same day. On review, however, the output below shows that the flagged rows are not valid exceptions: multiple instructors are assigned to each individual course, so the same slot appears more than once in the schedule.

As a recommendation, the output should be reviewed before any instance is reported.
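
One possible way to reduce these false positives is sketched below. It only looks at rows where the day flag is "Y" and reports an instructor only when the same slot carries more than one distinct CRN. The function name `find_instructor_clashes`, the `CLASH_DAY` column and the toy rows are hypothetical; the remaining column names are those used in the notebook.

```python
import pandas as pd

DAYS = ["SUN", "MON", "TUE", "WED", "THU"]
SLOT = ["INSTRUCTOR_NAME", "SCHEDULE_START_TIME", "SCHEDULE_END_TIME"]

def find_instructor_clashes(schedule: pd.DataFrame) -> pd.DataFrame:
    """Rows where one instructor has two different CRNs in the same slot on the same day."""
    clashes = []
    for day in DAYS:
        on_day = schedule[schedule[day] == "Y"]
        if on_day.empty:
            continue
        # Number of distinct CRNs per instructor and time slot on this day.
        crn_count = on_day.groupby(SLOT)["CRN"].transform("nunique")
        hits = on_day[crn_count > 1]
        if not hits.empty:
            clashes.append(hits.assign(CLASH_DAY=day))
    if not clashes:
        return schedule.head(0).assign(CLASH_DAY=None)
    return pd.concat(clashes, ignore_index=True)

# Toy data: instructor A has CRN 101 (listed twice) and CRN 102 in one Sunday slot;
# instructor B teaches at the same time but on different days.
toy = pd.DataFrame({
    "INSTRUCTOR_NAME": ["A", "A", "A", "B", "B"],
    "SCHEDULE_START_TIME": ["0900", "0900", "0900", "1000", "1000"],
    "SCHEDULE_END_TIME": ["0950", "0950", "0950", "1050", "1050"],
    "CRN": [101, 101, 102, 201, 202],
    "SUN": ["Y", "Y", "Y", "Y", None],
    "MON": [None, None, None, None, "Y"],
    "TUE": [None] * 5,
    "WED": [None] * 5,
    "THU": [None] * 5,
})
clashes = find_instructor_clashes(toy)
```

In this toy case only instructor A is reported. Whether repeated rows for one CRN should count as a clash depends on how the schedule extract is built, so the rule should be agreed before relying on the result.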

In [None]:
df_student_courses_schedule = df_student_courses

In [None]:
df_student_courses_schedule_exception = df_student_courses_schedule[(df_student_courses_schedule.duplicated(['INSTRUCTOR_NAME','SCHEDULE_START_TIME','SCHEDULE_END_TIME','SUN']) | 
                                                                    df_student_courses_schedule.duplicated(['INSTRUCTOR_NAME','SCHEDULE_START_TIME','SCHEDULE_END_TIME','MON']) |
                                                                    df_student_courses_schedule.duplicated(['INSTRUCTOR_NAME','SCHEDULE_START_TIME','SCHEDULE_END_TIME','TUE']) |
                                                                    df_student_courses_schedule.duplicated(['INSTRUCTOR_NAME','SCHEDULE_START_TIME','SCHEDULE_END_TIME','WED']) |
                                                                    df_student_courses_schedule.duplicated(['INSTRUCTOR_NAME','SCHEDULE_START_TIME','SCHEDULE_END_TIME','THU']))
                                                                    & df_student_courses_schedule['SUN'].notna()]

In [None]:
df_student_courses_schedule_exception

Unnamed: 0,TERM_CODE,CAMPUS,CRS_DIVISION_NAME,CRN,CRS_ID,COURSE_TITLE,CREDIT_HOURS,INSTRUCTOR_INDICATOR,RESPONSIBILITY_PCT,CONTACT_HOURS,...,NO_HOLD,SECT_WAITLIST_MAX_ENRL,MAX_ENROLLMENT_ALLOWED_CRN,NBR_REGISTERED_STUDENTS_CRN,SUN,MON,TUE,WED,THU,Above Max Enrollment
13,202110,A,PREP,11250,STEM002,STEM 2,12.0,Primary,100.0,12.0,...,0,0,48,24,Y,,Y,,,24
18,202110,A,PREP,11250,STEM002,STEM 2,12.0,Primary,100.0,12.0,...,0,0,48,24,Y,,,,,24
20,202110,A,PREP,11250,STEM002,STEM 2,12.0,Primary,100.0,12.0,...,0,0,48,24,Y,,,,,24
24,202110,A,PREP,11251,ENGL002,Preparatory English 2,14.0,Primary,100.0,14.0,...,0,0,20,6,Y,,,,,14
28,202110,A,PREP,11251,ENGL002,Preparatory English 2,14.0,Primary,100.0,14.0,...,0,0,20,6,Y,,,,,14
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2274,202110,A,Civil Engineering,10665,CIVE200,Statics,3.0,Primary,33.0,3.0,...,0,0,3,2,Y,,Y,,,1
2275,202110,A,Civil Engineering,10666,CIVE200,Statics,3.0,Primary,33.0,3.0,...,0,0,6,5,Y,,Y,,,1
2276,202110,A,Civil Engineering,10667,CIVE200,Statics,3.0,Primary,33.0,3.0,...,0,0,5,4,Y,,Y,,,1
2278,202110,A,Civil Engineering,10669,CIVE201,Engineering Dynamics,3.0,Primary,33.0,3.0,...,0,0,7,4,Y,,Y,,,3


In [None]:
## write into excel; if required

## df_student_courses_schedule_exception.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Student Courses.xlsx")

In [None]:
df_student_courses_schedule_exception.to_sql("Duplicate_Course_Schedule", conn)

# 4. Human Resources

The code below checks for duplicate employee accounts, duplicate or invalid Emirates IDs, long employee leaves, duplicate or incorrect bank details, and missing employee information. Each analysis generates its own output. The built-in logic is as follows:

1. Employee accounts - duplicates are identified based on employee number and person ID
2. Employee leaves - identifies leaves of any type taken for more than 60 days. The output should be reviewed for consistency, as maternity leave can vary
3. Bank details - duplicate or incorrect bank details for employees are highlighted
4. Missing information - key missing information in the HR data, such as Emirates ID and contract date, is highlighted for active employees as per the data source
5. Contract date - identifies any active employees working at Khalifa University without a valid contract

In [None]:
## import master data from excel

## df_employee_master = pd.read_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Data/HR Master Data.xlsx")

In [None]:
df_employee_master

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,Department,DepartmentAR,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End
0,100043305,308866,Shaima Ahmad Abdulaziz Ahli,شيماء احمد عبدالعزيز اهلى,STAFF.2020208,STAFF,2020208,موظف,Specialist,Off Chart - Marketing and Communication,...,Marketing and Communication,إدارة التسويق والاتصال,Office of the Executive Vice President,مكتب نائب الرئيس التنفيذي,President Office,مكتب الرئيس,,KU100,Arif Sultan Al Hammadi,2021-12-31
1,100046247,309197,Chong Un Pyon,تشونج اون بيون,Adjunct Faculty.1062,Adjunct Faculty,1062,Adjunct Faculty,Adjunct Faculty,Humanities and Social Sciences,...,Humanities and Social Sciences,العلوم الإنسانية والإجتماعية,College of Arts and Science,كلية الاداب و العلوم,Academic Sector,القطاع الأكاديمي,College of Arts and Science,KU103,David Sheehan,2021-12-31
2,100046248,309466,Ayoung Sohn,ايونج سوهن,Adjunct Faculty.1061,Adjunct Faculty,1061,Adjunct Faculty,Adjunct Faculty,Humanities and Social Sciences,...,Humanities and Social Sciences,العلوم الإنسانية والإجتماعية,College of Arts and Science,كلية الاداب و العلوم,Academic Sector,القطاع الأكاديمي,College of Arts and Science,KU103,David Sheehan,2021-12-31
3,100049466,778886,Akihide Hidaka,أكيهايد هيديكا,Adjunct Faculty.878,Adjunct Faculty,878,Adjunct Faculty,Adjunct Faculty,Nuclear Engineering Institute,...,Nuclear Engineering Institute,الهندسة النووية,College of Engineering,كلية الهندسة,Academic Sector,القطاع الأكاديمي,College of Engineering,KU218,Hassan Reda Barada,2021-12-21
4,100049994,819498,Riyazdheen Kaffar,رياضدين كفار,Driver.5042,Driver,5042,سائق,Driver,College of Medicine & Health Sciences,...,College of Medicine & Health Sciences,كلية الطب,College of Medicine & Health Sciences,كلية الطب,Academic Sector,القطاع الأكاديمي,College of Medicine & Health Sciences,KU104,John Aubrey Rock,NaT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1242,PI4182,625335,Sawsan Hussain Mohammadi,سوسن حسين محمدي,STAFF.2020263,STAFF,2020263,موظف,Specialist,"Off Chart - Administration, Facilities and EHS",...,"Administration, Facilities and EHS",إدارة الشؤون الإدارية والمرافق ، البيئة و الص...,"Administration, Facilities and EHS",إدارة الشؤون الإدارية والمرافق ، البيئة و الص...,Executive Management Sector,,,KU947,Adnan Jasem Yaqoob AlMansoori,2021-12-31
1243,PI4204,625079,Basem Al Shaabi,باسم سيف محمد الشعبي,STAFF.2020058,STAFF,2020058,موظف,Specialist,Off Chart - Academic and Student Services,...,Academic and Student Services,قطاع الخدمات الطلابية والاكاديميه,Academic and Student Services,قطاع الخدمات الطلابية والاكاديميه,Academic and Student Services,قطاع الخدمات الطلابية والاكاديميه,,KU105,Ahmed Al Shoaibi,2021-12-31
1244,PI4223,624922,Ayesha Abdulla Al Zaabi,عائشة عبدالله سليمان الزعابي,STAFF.2020097,STAFF,2020097,موظف,Specialist,"Off Chart - Administration, Facilities and EHS",...,"Administration, Facilities and EHS",إدارة الشؤون الإدارية والمرافق ، البيئة و الص...,"Administration, Facilities and EHS",إدارة الشؤون الإدارية والمرافق ، البيئة و الص...,Executive Management Sector,,,KU947,Adnan Jasem Yaqoob AlMansoori,2021-12-31
1245,PI4227,625073,Abdulla Al Hosani,عبدالله سليمان عبدالله الحوسني,STAFF.2020098,STAFF,2020098,موظف,Specialist,"Off Chart - Administration, Facilities and EHS",...,"Administration, Facilities and EHS",إدارة الشؤون الإدارية والمرافق ، البيئة و الص...,"Administration, Facilities and EHS",إدارة الشؤون الإدارية والمرافق ، البيئة و الص...,Executive Management Sector,,,KU947,Adnan Jasem Yaqoob AlMansoori,2021-12-31


## 4.1 Duplicate Emirates ID

In [None]:
df_employee_EmiratesID = df_employee_master[df_employee_master.duplicated(['EmiratesID'])]
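
One caveat with `duplicated()` is that pandas treats missing values as equal to each other, so employees with a blank Emirates ID can be reported as duplicates of one another. A minimal sketch on toy data (the IDs are made up) that separates genuine duplicates from blanks:

```python
import pandas as pd

# Toy data; the Emirates IDs are made-up 15-character strings.
toy_employees = pd.DataFrame({
    "EmpNo": ["E1", "E2", "E3", "E4", "E5"],
    "EmiratesID": ["784199012345671", None, None,
                   "784199012345671", "784198512345672"],
})

# Rows that actually carry an Emirates ID.
has_id = toy_employees["EmiratesID"].notna()

# keep=False marks every member of a duplicate group, not only the repeats.
is_duplicate = toy_employees.duplicated(["EmiratesID"], keep=False)

duplicate_ids = toy_employees[has_id & is_duplicate]
missing_ids = toy_employees[~has_id]
```

Reporting `missing_ids` separately keeps the duplicate list focused on IDs that are genuinely shared.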

In [None]:
df_employee_EmiratesID

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,Department,DepartmentAR,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End
12,KU1002,1543032,Deepak Puthal,ديباك باثول,Assistant Professor.1010503,Assistant Professor,1010503,أستاذ جامعي مساعد,Faculty,Electrical Engineering and Computer Science,...,Electrical Engineering and Computer Science,الهندسة الكهربائية والحاسوب,College of Engineering,كلية الهندسة,Academic Sector,القطاع الأكاديمي,College of Engineering,KU218,Hassan Reda Barada,2024-10-13
13,KU1003,1531630,Jiju Antony,جيجو و أنتوني,Professor.1011098,Professor,1011098,استاذ جامعي,Professor,Industrial and Systems Engineering,...,Industrial and Systems Engineering,الهندسة الصناعية و النظم,College of Engineering,كلية الهندسة,Academic Sector,القطاع الأكاديمي,College of Engineering,KU218,Hassan Reda Barada,2024-05-31
16,KU1006,1546534,Omar Awartani,عمر عورتاني,Assistant Professor.1010636,Assistant Professor,1010636,أستاذ جامعي مساعد,Faculty,Mechanical Engineering,...,Mechanical Engineering,الهندسة الميكانيكية,College of Engineering,كلية الهندسة,Academic Sector,القطاع الأكاديمي,College of Engineering,KU218,Hassan Reda Barada,2024-10-23
19,KU1009,1533371,Charalampos Pitsalidis,تشارالامبوس بيتساليديس تشارالامبوس بيتساليديس,Assistant Professor.1011029,Assistant Professor,1011029,أستاذ جامعي مساعد,Assistant Professor,Physics,...,Physics,علوم الفيزياء,College of Arts and Science,كلية الاداب و العلوم,Academic Sector,القطاع الأكاديمي,College of Arts and Science,KU103,David Sheehan,2024-05-31
20,KU1010,1531453,Giorgio Consigli,جورجيو كونسيجلي جورجيو كونسيجلي,Associate Professor.1011034,Associate Professor,1011034,أستاذ جامعي مشارك,Associate Professor,Mathematics,...,Mathematics,علوم الرياضيات,College of Arts and Science,كلية الاداب و العلوم,Academic Sector,القطاع الأكاديمي,College of Arts and Science,KU103,David Sheehan,2024-05-31
22,KU1012,1533111,Hemin Koyi,هيمن كويي هيمن كويي,Professor.1010496,Professor,1010496,استاذ جامعي,Faculty,Earth Science,...,Earth Science,علوم الأرض,College of Arts and Science,كلية الاداب و العلوم,Academic Sector,القطاع الأكاديمي,College of Arts and Science,KU103,David Sheehan,2024-05-31
23,KU1013,1531470,Kheirat Habbal,خيرات الحبال,Assistant Professor.1050507,Assistant Professor,1050507,أستاذ جامعي مساعد,Assistant Professor,Family Medicine,...,Family Medicine,Family Medicine,College of Medicine & Health Sciences,كلية الطب,Academic Sector,القطاع الأكاديمي,College of Medicine & Health Sciences,,,2024-07-31
25,KU1015,1542145,Raquel Adelina Drewett,راكيل اديلينا دريويت,"Manager, Internal Awards & Collaborations.1010947","Manager, Internal Awards & Collaborations",1010947,"Manager, Internal Awards & Collaborations",Manager,University Sponsored Research,...,University Sponsored Research,قسم رعاية البحوث,Research and Development,قطاع البحوث والتطوير,Research and Development,قطاع البحوث والتطوير,,KU106,Steven Wesley Griffiths,2024-10-11
27,KU1017,1543375,Mathieu Jean Bernard Martins,ماثيو جين بيرنارد مارتنز,CORE Lab Engineer.1010976,CORE Lab Engineer,1010976,مهندس مختبر,Lab Engineer,Core Labs Support,...,Research Laboratories,مختبرات البحوث,Research and Development,قطاع البحوث والتطوير,Research and Development,قطاع البحوث والتطوير,,,,2024-10-17
52,KU1040,1547748,Diane Bryant Presley,ديان بريانت بريسلي,"Director, Operations and Support.5043","Director, Operations and Support",5043,"Director, Operations and Support",Director,College of Medicine & Health Sciences,...,College of Medicine & Health Sciences,كلية الطب,College of Medicine & Health Sciences,كلية الطب,Academic Sector,القطاع الأكاديمي,College of Medicine & Health Sciences,KU104,John Aubrey Rock,2024-10-31


In [None]:
## write into excel; if required 

## df_employee_EmiratesID.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Duplicate Employeese Emirates ID.xlsx")

In [None]:
df_employee_master['Length'] = df_employee_master['EmiratesID'].str.len()

In [None]:
df_employee_EID = df_employee_master[(df_employee_master['Length'] < 15) | (df_employee_master['Length'] > 15)]

In [None]:
df_employee_EID

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,DepartmentAR,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End,Length
26,KU1016,1531373,Hatem Samir Issa Haddad,حاتم سمير عيسى حداد,"Senior Auditor, IT and Audit Analytics.1011012","Senior Auditor, IT and Audit Analytics",1011012,مدقق أول,Senior Specialist,Audit and Compliance,...,إدارة التدقيق والامتثال,Executive Office,Executive Office,President Office,مكتب الرئيس,,KU100,Arif Sultan Al Hammadi,2024-08-07,17.0


In [None]:
## write into excel; if required

## df_employee_EID.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Invalid Emirates ID.xlsx")

## 4.2 Employee ID

Employee IDs are verified to ensure that no two employees share the same employee ID.

For this, the pandas library is used to automatically find duplicated entries within the master data on "EmpNo" (Employee ID) and "PersonID" (Old Employee ID).

In [None]:
df_employee_employeeID = df_employee_master[df_employee_master.duplicated(['EmpNo'])]

In [None]:
df_employee_employeeID

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,DepartmentAR,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End,Length


In [None]:
df_employee_personID = df_employee_master[df_employee_master.duplicated(['PersonID'])]

In [None]:
df_employee_personID

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,DepartmentAR,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End,Length


## 4.3 Contract Date

The scripts below analyze whether any employee has continued beyond the contract period, i.e., whether any employees are still active in the system after their contracted period has ended.

The employee master contains two columns, "Contract_End" (end date of the contract) and "HireDate" (date of joining / continuance). The difference between the two columns is used to identify employees who may be beyond their contracted period.
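
A minimal sketch of how these checks could be expressed on toy data. The analysis date `as_of` is a hypothetical value (in practice it would be the date of the extract), and the sketch assumes every row in the master is an active employee.

```python
import pandas as pd

# Toy data; column names follow the employee master, values are illustrative.
toy_master = pd.DataFrame({
    "EmpNo": ["E1", "E2", "E3"],
    "HireDate": pd.to_datetime(["2016-01-10", "2020-03-01", "2019-06-15"]),
    "Contract_End": pd.to_datetime(["2021-12-31", "2020-12-31", None]),
})

# Hypothetical analysis date.
as_of = pd.Timestamp("2021-06-30")

# Contract length, computed as in the notebook.
toy_master["Diff"] = toy_master["Contract_End"] - toy_master["HireDate"]

# Contracts shorter than one year; the unit is stated explicitly.
short_contracts = toy_master[toy_master["Diff"] < pd.Timedelta(days=365)]

# Employees still in the master after their contract end date.
expired = toy_master[toy_master["Contract_End"] < as_of]

# Employees with no contract end date recorded.
no_end_date = toy_master[toy_master["Contract_End"].isna()]
```

Comparing `Contract_End` with the analysis date targets employees active beyond their contract directly, while the blank end dates are listed separately because comparisons against `NaT` are always false.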

In [None]:
df_employee_master['Diff'] = df_employee_master['Contract_End'] - df_employee_master['HireDate'] 

In [None]:
df_employee_master['Diff']

0      2182 days
1      1459 days
2      1459 days
3      1213 days
4            NaT
          ...   
1242   2368 days
1243   2336 days
1244   2313 days
1245   2315 days
1246   2284 days
Name: Diff, Length: 1247, dtype: timedelta64[ns]

In [None]:
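## a bare '365' string is parsed as 365 nanoseconds, so the unit is stated explicitly
df_employee_contract = df_employee_master[df_employee_master['Diff'] < pd.Timedelta(days=365)]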

In [None]:
df_employee_contract

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End,Length,Diff


## 4.4 Employee Leaves

Employee leaves are granted on an annual basis to each employee. The exceptions identified below are employees who have taken a single leave of more than 60 days for any leave type, e.g., Annual Leave, Unpaid Leave etc.
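
Since some leave types, such as maternity leave, can legitimately run longer, one option is to apply a limit per leave type. The sketch below uses toy data and hypothetical limits (`DEFAULT_LIMIT_DAYS` and `LIMIT_BY_TYPE` are illustrative values, not HR policy), and counts both the first and the last day of leave.

```python
import pandas as pd

# Toy data; column names follow the leave extract, rows are illustrative.
toy_leave = pd.DataFrame({
    "EmployeeNumber": ["E1", "E2", "E3"],
    "LeaveType": ["Annual Leave", "Maternity Leave", "Short Sick Leave"],
    "LeaveStartDate": pd.to_datetime(["2021-10-05", "2021-09-15", "2021-02-21"]),
    "LeaveEndDate": pd.to_datetime(["2021-12-09", "2021-12-13", "2021-02-21"]),
})

# Calendar days on leave, counting both the start and the end date.
toy_leave["LeaveDays"] = (
    toy_leave["LeaveEndDate"] - toy_leave["LeaveStartDate"]
).dt.days + 1

# Hypothetical limits: 60 days by default, a longer allowance for maternity leave.
DEFAULT_LIMIT_DAYS = 60
LIMIT_BY_TYPE = {"Maternity Leave": 90}

limit = toy_leave["LeaveType"].map(LIMIT_BY_TYPE).fillna(DEFAULT_LIMIT_DAYS)
leave_exceptions = toy_leave[toy_leave["LeaveDays"] > limit]
```

With these illustrative limits only the 66-day annual leave is reported; the actual limits would need to come from the HR policy.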

In [None]:
df_employee_leave = pd.read_excel("C:/Users/ku1016/Downloads/Employee Leave Data.xlsx")
df_employee_leave.head(5)

Unnamed: 0,ID,EmployeeNumber,leaveCategory,LeaveType,LeaveStartDate,LeaveEndDate,LeaveStatus
0,1,KU195,V,Annual Leave,2020-12-16,2020-12-17,A
1,2,KU541,S,Short Sick Leave,2021-03-24,2021-03-25,A
2,3,KU541,V,Partial Day Annual Leave,2021-03-08,2021-03-08,A
3,4,KU541,V,Partial Day Annual Leave,2021-02-25,2021-02-25,A
4,5,KU541,S,Short Sick Leave,2021-02-21,2021-02-21,A


In [None]:
df_employee_leave['Diff'] = df_employee_leave['LeaveEndDate'] - df_employee_leave['LeaveStartDate']
df_employee_leave.head(5)

Unnamed: 0,ID,EmployeeNumber,leaveCategory,LeaveType,LeaveStartDate,LeaveEndDate,LeaveStatus,Diff
0,1,KU195,V,Annual Leave,2020-12-16,2020-12-17,A,1 days
1,2,KU541,S,Short Sick Leave,2021-03-24,2021-03-25,A,1 days
2,3,KU541,V,Partial Day Annual Leave,2021-03-08,2021-03-08,A,0 days
3,4,KU541,V,Partial Day Annual Leave,2021-02-25,2021-02-25,A,0 days
4,5,KU541,S,Short Sick Leave,2021-02-21,2021-02-21,A,0 days


In [None]:
df_employee_leave_exception = df_employee_leave[(df_employee_leave['Diff'] > '60 days')]
df_employee_leave_exception

Unnamed: 0,ID,EmployeeNumber,leaveCategory,LeaveType,LeaveStartDate,LeaveEndDate,LeaveStatus,Diff
298,299,KU162,H,Maternity Leave,2020-08-13,2020-11-10,A,89 days
974,975,KU500066,V,Annual Leave,2021-10-05,2021-12-09,A,65 days
1664,1665,KU451,UL,Unauthorized Unpaid Leave,2020-12-20,2021-03-02,A,72 days
2168,2169,KU283,H,Maternity Leave,2021-09-15,2021-12-13,A,89 days
3700,3701,KU732,H,Maternity Leave,2021-07-25,2021-10-22,A,89 days
3961,3962,KU702,H,Maternity Leave,2020-07-05,2020-10-02,A,89 days
4259,4260,KU500412,UL,Unauthorized Unpaid Leave,2020-08-30,2020-12-31,A,123 days
4654,4655,KU603,H,Maternity Leave,2021-06-24,2021-09-21,A,89 days
5140,5141,KU818,H,Maternity Leave,2021-04-30,2021-07-28,A,89 days
5191,5192,KU359,H,Maternity Leave,2021-03-07,2021-06-04,A,89 days


In [None]:
df_employee_leave_exception.to_sql("Employee_Leaves_Exception", conn)

## 4.5 Missing Employee Information

All active employees should have complete information as per the HR records and protocol within the master data. The below script analyzes the missing information to ensure completeness of the data.

To identify missing values, the pandas isnull() method is used together with any(axis=1), which flags every row that has a missing value in at least one column.
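
Because optional fields (for example College) are often blank, checking every column can flag most of the master. A minimal sketch on toy data that restricts the check to key fields; the `KEY_COLUMNS` list and the `MissingFields` column are hypothetical and would need to be agreed with HR.

```python
import pandas as pd

# Toy data; values are illustrative.
toy_master = pd.DataFrame({
    "EmpNo": ["E1", "E2", "E3"],
    "EmiratesID": ["784199012345671", None, "784198512345672"],
    "Contract_End": pd.to_datetime(["2021-12-31", "2024-05-31", None]),
    "College": [None, "College of Engineering", None],
})

# Hypothetical list of fields that must be populated for every active employee.
KEY_COLUMNS = ["EmiratesID", "Contract_End"]

# Rows with a gap in any column at all (optional fields included).
any_gap = toy_master[toy_master.isnull().any(axis=1)]

# Rows with a gap in at least one key column.
missing_key = toy_master[toy_master[KEY_COLUMNS].isnull().any(axis=1)].copy()

# Record which key columns are blank, to speed up follow-up.
missing_key["MissingFields"] = missing_key[KEY_COLUMNS].isnull().apply(
    lambda row: ", ".join(row[row].index), axis=1
)
```

Here `any_gap` contains all three employees, while `missing_key` contains only the two with a blank Emirates ID or contract end date.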

In [None]:
## df_employee_master = pd.read_excel("C:/Users/ku1016/Downloads/HR Employee Data.xlsx")
## df_employee_master.head(5)

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,Department,DepartmentAR,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End
0,100043305,308866,Shaima Ahmad Abdulaziz Ahli,شيماء احمد عبدالعزيز اهلى,STAFF.2020208,STAFF,2020208,موظف,Specialist,Off Chart - Marketing and Communication,...,Marketing and Communication,إدارة التسويق والاتصال,Office of the Executive Vice President,مكتب نائب الرئيس التنفيذي,President Office,مكتب الرئيس,,KU100,Arif Sultan Al Hammadi,2021-12-31
1,100046247,309197,Chong Un Pyon,تشونج اون بيون,Adjunct Faculty.1062,Adjunct Faculty,1062,Adjunct Faculty,Adjunct Faculty,Humanities and Social Sciences,...,Humanities and Social Sciences,العلوم الإنسانية والإجتماعية,College of Arts and Science,كلية الاداب و العلوم,Academic Sector,القطاع الأكاديمي,College of Arts and Science,KU103,David Sheehan,2021-12-31
2,100046248,309466,Ayoung Sohn,ايونج سوهن,Adjunct Faculty.1061,Adjunct Faculty,1061,Adjunct Faculty,Adjunct Faculty,Humanities and Social Sciences,...,Humanities and Social Sciences,العلوم الإنسانية والإجتماعية,College of Arts and Science,كلية الاداب و العلوم,Academic Sector,القطاع الأكاديمي,College of Arts and Science,KU103,David Sheehan,2021-12-31
3,100049466,778886,Akihide Hidaka,أكيهايد هيديكا,Adjunct Faculty.878,Adjunct Faculty,878,Adjunct Faculty,Adjunct Faculty,Nuclear Engineering Institute,...,Nuclear Engineering Institute,الهندسة النووية,College of Engineering,كلية الهندسة,Academic Sector,القطاع الأكاديمي,College of Engineering,KU218,Hassan Reda Barada,2021-12-21
4,100049994,819498,Riyazdheen Kaffar,رياضدين كفار,Driver.5042,Driver,5042,سائق,Driver,College of Medicine & Health Sciences,...,College of Medicine & Health Sciences,كلية الطب,College of Medicine & Health Sciences,كلية الطب,Academic Sector,القطاع الأكاديمي,College of Medicine & Health Sciences,KU104,John Aubrey Rock,NaT


In [None]:
df_employee_master.isnull()

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,Department,DepartmentAR,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1242,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,True,True,False,False,False
1243,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
1244,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,True,True,False,False,False
1245,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,True,True,False,False,False


In [None]:
df_employee_missing = df_employee_master[df_employee_master.isnull().any(axis=1)]
df_employee_missing.head(5)

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,Department,DepartmentAR,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End
0,100043305,308866,Shaima Ahmad Abdulaziz Ahli,شيماء احمد عبدالعزيز اهلى,STAFF.2020208,STAFF,2020208,موظف,Specialist,Off Chart - Marketing and Communication,...,Marketing and Communication,إدارة التسويق والاتصال,Office of the Executive Vice President,مكتب نائب الرئيس التنفيذي,President Office,مكتب الرئيس,,KU100,Arif Sultan Al Hammadi,2021-12-31
1,100046247,309197,Chong Un Pyon,تشونج اون بيون,Adjunct Faculty.1062,Adjunct Faculty,1062,Adjunct Faculty,Adjunct Faculty,Humanities and Social Sciences,...,Humanities and Social Sciences,العلوم الإنسانية والإجتماعية,College of Arts and Science,كلية الاداب و العلوم,Academic Sector,القطاع الأكاديمي,College of Arts and Science,KU103,David Sheehan,2021-12-31
2,100046248,309466,Ayoung Sohn,ايونج سوهن,Adjunct Faculty.1061,Adjunct Faculty,1061,Adjunct Faculty,Adjunct Faculty,Humanities and Social Sciences,...,Humanities and Social Sciences,العلوم الإنسانية والإجتماعية,College of Arts and Science,كلية الاداب و العلوم,Academic Sector,القطاع الأكاديمي,College of Arts and Science,KU103,David Sheehan,2021-12-31
3,100049466,778886,Akihide Hidaka,أكيهايد هيديكا,Adjunct Faculty.878,Adjunct Faculty,878,Adjunct Faculty,Adjunct Faculty,Nuclear Engineering Institute,...,Nuclear Engineering Institute,الهندسة النووية,College of Engineering,كلية الهندسة,Academic Sector,القطاع الأكاديمي,College of Engineering,KU218,Hassan Reda Barada,2021-12-21
4,100049994,819498,Riyazdheen Kaffar,رياضدين كفار,Driver.5042,Driver,5042,سائق,Driver,College of Medicine & Health Sciences,...,College of Medicine & Health Sciences,كلية الطب,College of Medicine & Health Sciences,كلية الطب,Academic Sector,القطاع الأكاديمي,College of Medicine & Health Sciences,KU104,John Aubrey Rock,NaT


In [None]:
df_employee_missing.to_sql("Missing_employee_info", conn)

## 4.6 Employee bank details and account number

No two employees should share the same bank details, and every IBAN should have the fixed length defined for UAE IBANs (23 characters; IBAN length is fixed per country).

Below are the scripts to analyze both scenarios.

In [None]:
## df_employee_bank_details = pd.read_excel("C:/users/ku1016/downloads/Employee Bank Details.xlsx")

In [None]:
## df_employee_bank_details.head(5)

Unnamed: 0,PERSONID,BANKSTARTDATE,BANKENDDATE,BANKPRIORITY,EMPLOYEENUMBER,ORG_PAYMENT_METHOD,BANKID,BANKNAME,BRANCHNAME,IBAN,Length of EID,ACCOUNTNO,CREATEDBY,LASTUPDATEDBY,LASTUPDATEDATE,LASTUPDATEDATE1
0,308657,2016-05-01 00:00:00.0000000,2016-12-31 00:00:00.0000000,,100036842,KUK Bank Transfer - KU,70617.0,AL HILAL BANK,,AE,2,,60850,62647,2017-01-21 14:38:25.0000000,2021-12-12 23:32:19.930
1,635927,2017-12-31 00:00:00.0000000,2017-12-31 00:00:00.0000000,,KU500095,KUK Bank Transfer - KU,,,,AE,2,,0,188569,2018-03-05 11:08:36.0000000,2021-12-12 23:32:19.930
2,635999,2017-12-31 00:00:00.0000000,2017-12-31 00:00:00.0000000,,KU500183,KUK Bank Transfer - KU,,,,AE,2,,0,188569,2018-03-05 11:09:51.0000000,2021-12-12 23:32:19.930
3,636017,2017-12-31 00:00:00.0000000,2017-12-31 00:00:00.0000000,,KU500082,KUK Bank Transfer - KU,,,,AE,2,,0,188569,2018-03-05 11:11:43.0000000,2021-12-12 23:32:19.930
4,635983,2017-12-31 00:00:00.0000000,2017-12-31 00:00:00.0000000,,KU500078,KUK Bank Transfer - KU,,,,AE,2,,0,188569,2018-03-05 11:12:46.0000000,2021-12-12 23:32:19.930


In [None]:
df_employee_bank_details_duplicated = df_employee_bank_details[df_employee_bank_details.duplicated(['IBAN'])]

In [None]:
df_employee_bank_details_duplicated.to_sql("Duplicatee_employee_bank", conn)

In [None]:
df_employee_bank_details['Length of IBAN'] = df_employee_bank_details['IBAN'].str.len()
df_employee_bank_details.head(5)

Unnamed: 0,PERSONID,BANKSTARTDATE,BANKENDDATE,BANKPRIORITY,EMPLOYEENUMBER,ORG_PAYMENT_METHOD,BANKID,BANKNAME,BRANCHNAME,IBAN,Length of EID,ACCOUNTNO,CREATEDBY,LASTUPDATEDBY,LASTUPDATEDATE,LASTUPDATEDATE1,Length of IBAN
0,308657,2016-05-01 00:00:00.0000000,2016-12-31 00:00:00.0000000,,100036842,KUK Bank Transfer - KU,70617.0,AL HILAL BANK,,AE,2,,60850,62647,2017-01-21 14:38:25.0000000,2021-12-12 23:32:19.930,2
1,635927,2017-12-31 00:00:00.0000000,2017-12-31 00:00:00.0000000,,KU500095,KUK Bank Transfer - KU,,,,AE,2,,0,188569,2018-03-05 11:08:36.0000000,2021-12-12 23:32:19.930,2
2,635999,2017-12-31 00:00:00.0000000,2017-12-31 00:00:00.0000000,,KU500183,KUK Bank Transfer - KU,,,,AE,2,,0,188569,2018-03-05 11:09:51.0000000,2021-12-12 23:32:19.930,2
3,636017,2017-12-31 00:00:00.0000000,2017-12-31 00:00:00.0000000,,KU500082,KUK Bank Transfer - KU,,,,AE,2,,0,188569,2018-03-05 11:11:43.0000000,2021-12-12 23:32:19.930,2
4,635983,2017-12-31 00:00:00.0000000,2017-12-31 00:00:00.0000000,,KU500078,KUK Bank Transfer - KU,,,,AE,2,,0,188569,2018-03-05 11:12:46.0000000,2021-12-12 23:32:19.930,2


In [None]:
df_employee_bank_details_length = df_employee_bank_details[(df_employee_bank_details['Length of IBAN']<23)]
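
The length test above only catches IBANs that are too short. A minimal sketch of a stricter shape check on toy data, assuming all accounts are UAE accounts ("AE" followed by 21 digits); the IBANs are made up and the IBAN check digits are not validated.

```python
import pandas as pd

# Toy data; the IBANs are made up.
toy_bank = pd.DataFrame({
    "EMPLOYEENUMBER": ["E1", "E2", "E3", "E4"],
    "IBAN": ["AE070331234567890123456", "AE",
             "ae07 0331 2345 6789 0123 456", None],
})

# Normalise before testing: strip spaces and upper-case.
iban = toy_bank["IBAN"].str.replace(" ", "", regex=False).str.upper()

# Shape check only: 'AE' + 21 digits = 23 characters. Missing values fail.
is_valid_shape = iban.str.fullmatch(r"AE\d{21}", na=False)

toy_bank["Length of IBAN"] = iban.str.len()
invalid_iban = toy_bank[~is_valid_shape]

# After normalisation, E1 and E3 turn out to share the same IBAN.
shared_iban = toy_bank[is_valid_shape & iban.duplicated(keep=False)]
```

Normalising first also helps the duplicate check, since the same IBAN typed with spaces or in lower case would otherwise look unique.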

In [None]:
df_employee_bank_details_length.to_sql("Incorrect_employee_IBAN", conn)

# 5. Finance and Procurement

## 5.1 Supplier Master Analysis

The following are analyzed for the supplier master:

- Duplicate supplier codes for the same supplier
- Duplicate bank details for different suppliers
- Duplicate TRN (Tax registration number) for different suppliers

In [None]:
## df_supplier_master = pd.read_excel("C:/users/ku1016/downloads/Supplier Master.xlsx")
## df_supplier_master.head(5)

Unnamed: 0,OPERATING_UNIT,REGISTER_MODE,SUPPLIER_NO,SUP_CREATION_DATE,SUPPLIER_INACTIVE_DATE,SUPPLIER_NAME,SUPPLIER_CATEGORY,TOTAL_APPROVE_PO_CNT,TOT_APPROVED_BPA_AMT,LAST_SUP_PO_REL_DATE,...,ADDRESS_LINE1,ADDRESS_LINE2,ADDRESS_LINE3,ADDRESS_LINE4,POSTAL_CODE,CITY,COUNTRY,SUP_PHONE_NUM,SUP_FAX_NUM,SUP_EMAIL_ADDRESS
0,KUST Khalifa University of Science and Technol...,Manual,1,2010-06-01 19:12:48,NaT,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,VENDOR,0,,NaT,...,ABU DHABI,,,555,,ABU DHABI,United Arab Emirates,02-5559636,02-6593630,
1,KUX - External Khalifa University O.U,Manual,1,2010-06-01 19:12:48,NaT,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,VENDOR,0,,NaT,...,ABU DHABI,,,555,,ABU DHABI,United Arab Emirates,02-5559636,02-6593630,
2,KUA - Ankabout Khalifa University,Manual,1,2010-06-01 19:12:48,NaT,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,VENDOR,0,,NaT,...,ABU DHABI,,,555,,ABU DHABI,United Arab Emirates,02-5559636,02-6593630,
3,KUJ - Aric Khalifa University O.U,Manual,1,2010-06-01 19:12:48,NaT,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,VENDOR,0,,NaT,...,ABU DHABI,,,555,,ABU DHABI,United Arab Emirates,02-5559636,02-6593630,
4,KUGRC OU,Manual,1,2010-06-01 19:12:48,NaT,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,VENDOR,0,,NaT,...,ABU DHABI,,,555,,ABU DHABI,United Arab Emirates,02-5559636,02-6593630,


In [None]:
df_supplier_master_duplicate = df_supplier_master[df_supplier_master.duplicated(['OPERATING_UNIT','SUPPLIER_NO'])]
df_supplier_master_duplicate

Unnamed: 0,OPERATING_UNIT,REGISTER_MODE,SUPPLIER_NO,SUP_CREATION_DATE,SUPPLIER_INACTIVE_DATE,SUPPLIER_NAME,SUPPLIER_CATEGORY,TOTAL_APPROVE_PO_CNT,TOT_APPROVED_BPA_AMT,LAST_SUP_PO_REL_DATE,...,ADDRESS_LINE1,ADDRESS_LINE2,ADDRESS_LINE3,ADDRESS_LINE4,POSTAL_CODE,CITY,COUNTRY,SUP_PHONE_NUM,SUP_FAX_NUM,SUP_EMAIL_ADDRESS
13,KUX - External Khalifa University O.U,Manual,2,2010-06-01 19:12:54,NaT,DEIRA GENERAL MARKETING,VENDOR,0,,NaT,...,DUBAI,,,,11370,DUBAI,United Arab Emirates,,,
14,KUA - Ankabout Khalifa University,Manual,2,2010-06-01 19:12:54,NaT,DEIRA GENERAL MARKETING,VENDOR,0,,NaT,...,DUBAI,,,,11370,DUBAI,United Arab Emirates,,,
15,KUJ - Aric Khalifa University O.U,Manual,2,2010-06-01 19:12:54,NaT,DEIRA GENERAL MARKETING,VENDOR,0,,NaT,...,DUBAI,,,,11370,DUBAI,United Arab Emirates,,,
17,KUGRC OU,Manual,2,2010-06-01 19:12:54,NaT,DEIRA GENERAL MARKETING,VENDOR,0,,NaT,...,DUBAI,,,,11370,DUBAI,United Arab Emirates,,,
18,KUST Khalifa University of Science and Technol...,Manual,2,2010-06-01 19:12:54,NaT,DEIRA GENERAL MARKETING,VENDOR,0,,NaT,...,DUBAI,,,,11370,DUBAI,United Arab Emirates,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
54112,KUGRC OU,Manual,96145,2020-02-10 11:21:59,NaT,ARDIANSYAH AL FAROUQ,BENIFICIARY,0,,NaT,...,INDONESIA,,,,,INDONESIA,Indonesia,,,
54113,KUA - Ankabout Khalifa University,Manual,96145,2020-02-10 11:21:59,NaT,ARDIANSYAH AL FAROUQ,BENIFICIARY,0,,NaT,...,INDONESIA,,,,,INDONESIA,Indonesia,,,
54114,KUE - Ebitic Khalifa University OU,Manual,96145,2020-02-10 11:21:59,NaT,ARDIANSYAH AL FAROUQ,BENIFICIARY,0,,NaT,...,INDONESIA,,,,,INDONESIA,Indonesia,,,
54830,KUE - Ebitic Khalifa University OU,Manual,96636,2020-03-09 11:36:41,NaT,RABIA MAQSOOD,BENIFICIARY,0,,NaT,...,PAKISTAN,,,,,PAKISTAN,Pakistan,,,


In [None]:
df_supplier_master_duplicate.to_sql("Duplicate_suppliers_same_entity", conn)

The output below requires review to confirm that each duplicated TRN belongs to different suppliers and not to the same supplier. This review is needed because the same vendor can be registered twice for the same entity at Khalifa University, with only minor variations.

As a recommendation, Internal Audit should propose establishing a single vendor record across Khalifa University, or removing vendors that are registered multiple times for the same entity.
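
A minimal sketch of how that review could be narrowed down on toy data: blank TRNs are excluded, and a TRN is reported only when it is attached to more than one supplier number. The rows and TRN values are made up; the column names are those used in the notebook.

```python
import pandas as pd

# Toy data; supplier names and TRNs are made up.
toy_suppliers = pd.DataFrame({
    "OPERATING_UNIT": ["OU1", "OU2", "OU1", "OU1", "OU1"],
    "SUPPLIER_NO": [17, 17, 205, 300, 301],
    "SUPPLIER_NAME": ["ALPHA LLC", "ALPHA LLC", "ALPHA L.L.C.", "BETA", "GAMMA"],
    "TAX_REGISTRATION_NUM": ["100000000000001", "100000000000001",
                             "100000000000001", None, None],
})

# Ignore blank TRNs: missing values would otherwise all match each other.
with_trn = toy_suppliers[toy_suppliers["TAX_REGISTRATION_NUM"].notna()]

# Count distinct supplier numbers per TRN, independent of operating unit.
suppliers_per_trn = (
    with_trn.groupby("TAX_REGISTRATION_NUM")["SUPPLIER_NO"].transform("nunique")
)

# A TRN attached to more than one supplier number needs review.
trn_shared = with_trn[suppliers_per_trn > 1].sort_values(
    ["TAX_REGISTRATION_NUM", "SUPPLIER_NO"]
)
```

In the toy data, supplier numbers 17 and 205 share one TRN under slightly different names, which is the pattern described above; the suppliers with no TRN are left out.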

In [None]:
df_supplier_master_TRN = df_supplier_master[df_supplier_master.duplicated(['TAX_REGISTRATION_NUM','OPERATING_UNIT','CITY'])]
df_supplier_master_TRN

Unnamed: 0,OPERATING_UNIT,REGISTER_MODE,SUPPLIER_NO,SUP_CREATION_DATE,SUPPLIER_INACTIVE_DATE,SUPPLIER_NAME,SUPPLIER_CATEGORY,TOTAL_APPROVE_PO_CNT,TOT_APPROVED_BPA_AMT,LAST_SUP_PO_REL_DATE,...,ADDRESS_LINE1,ADDRESS_LINE2,ADDRESS_LINE3,ADDRESS_LINE4,POSTAL_CODE,CITY,COUNTRY,SUP_PHONE_NUM,SUP_FAX_NUM,SUP_EMAIL_ADDRESS
43,KUX - External Khalifa University O.U,Manual,17,2010-06-01 19:13:00,NaT,AL FUTTAIM ELECTRONICS CO. (ABU DHABI) L.L.C.,VENDOR,0,,NaT,...,ABU DHABI,,,,73618,ABU DHABI,United Arab Emirates,,,
44,KUA - Ankabout Khalifa University,Manual,17,2010-06-01 19:13:00,NaT,AL FUTTAIM ELECTRONICS CO. (ABU DHABI) L.L.C.,VENDOR,0,,NaT,...,ABU DHABI,,,,73618,ABU DHABI,United Arab Emirates,,,
45,KUE - Ebitic Khalifa University OU,Manual,17,2010-06-01 19:13:00,NaT,AL FUTTAIM ELECTRONICS CO. (ABU DHABI) L.L.C.,VENDOR,0,,NaT,...,ABU DHABI,,,,73618,ABU DHABI,United Arab Emirates,,,
46,KUK - Khalifa University Ledger OU,Manual,17,2010-06-01 19:13:00,NaT,AL FUTTAIM ELECTRONICS CO. (ABU DHABI) L.L.C.,VENDOR,0,,NaT,...,ABU DHABI,,,,73618,ABU DHABI,United Arab Emirates,,,
49,KUJ - Aric Khalifa University O.U,Manual,17,2010-06-01 19:13:00,NaT,AL FUTTAIM ELECTRONICS CO. (ABU DHABI) L.L.C.,VENDOR,0,,NaT,...,MINA ZAYED- ABU DHABI,,,,6885,ABU DHABI,United Arab Emirates,02-6819539,02-6815501,techserveOA.ABD@alfuttaim.ae
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63035,KUADRIC OU,Manual,104903,2021-10-10 12:23:11,NaT,https://uae.microless.com/,,0,,NaT,...,ABU DHABI,,,127788,,ABU DHABI,United Arab Emirates,,,
63036,KUGRC OU,Manual,104903,2021-10-10 12:23:11,NaT,https://uae.microless.com/,,0,,NaT,...,ABU DHABI,,,127788,,ABU DHABI,United Arab Emirates,,,
63037,KUST Khalifa University of Science and Technol...,Manual,104903,2021-10-10 12:23:11,NaT,https://uae.microless.com/,,0,,NaT,...,ABU DHABI,,,127788,,ABU DHABI,United Arab Emirates,,,
63038,KUE - Ebitic Khalifa University OU,Manual,104903,2021-10-10 12:23:11,NaT,https://uae.microless.com/,,0,,NaT,...,ABU DHABI,,,127788,,ABU DHABI,United Arab Emirates,,,


In [None]:
df_supplier_master_TRN.to_sql("TRN_supplier_duplicates", conn)
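
The TRN review described above can be narrowed down by separating duplicates that belong to one supplier number from duplicates shared by several supplier numbers. The sketch below is one possible approach, not the notebook's method. It uses made-up rows with the column names from the supplier master; `SUPPLIER_COUNT`, `same_supplier_twice` and `different_suppliers` are hypothetical names.

```python
import pandas as pd

# Made-up rows that mimic the supplier master columns used above.
df_example = pd.DataFrame(
    {
        "SUPPLIER_NO": [17, 17, 201, 202, 300],
        "SUPPLIER_NAME": ["A LLC", "A LLC", "B LLC", "B L.L.C.", "C LLC"],
        "OPERATING_UNIT": ["KUX", "KUX", "KUE", "KUE", "KUE"],
        "CITY": ["ABU DHABI"] * 5,
        "TAX_REGISTRATION_NUM": ["111", "111", "222", "222", "333"],
    }
)

key = ["TAX_REGISTRATION_NUM", "OPERATING_UNIT", "CITY"]

# keep=False returns every row of a duplicated group, not only the repeats.
# Rows without a TRN are excluded so that blanks are not matched to each other.
dupes = df_example[
    df_example["TAX_REGISTRATION_NUM"].notna()
    & df_example.duplicated(key, keep=False)
].copy()

# Number of distinct supplier numbers that share each duplicated key.
dupes["SUPPLIER_COUNT"] = dupes.groupby(key)["SUPPLIER_NO"].transform("nunique")

# Same supplier number registered more than once for the same entity.
same_supplier_twice = dupes[dupes["SUPPLIER_COUNT"] == 1]

# Different supplier numbers that share one TRN within the same entity.
different_suppliers = dupes[dupes["SUPPLIER_COUNT"] > 1]
```

Splitting the result this way lets the reviewer treat repeated registrations and shared TRNs as two separate exception lists.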

The bank-related details are not part of the supplier master list and have been read from the shared Excel file. If the supplier master contains the bank details, **the code only needs to be modified from "df_supplier_bank" to "df_supplier_master"**.

In [None]:
df_supplier_bank = pd.read_excel("C:/users/ku1016/downloads/Supplier bank detail.xlsx")
df_supplier_bank.head(5)

Unnamed: 0,Supplier No.,Supplier Name,Registered Method,Vendor Type,Commercial Licence No,Commercial Licence Expiry Date,Tax Registration No.,Country,Supplier Created By,Supplier Creation Date,...,Business Class Last Update Date,Organization Name,Bank Account,Bank Account Name,Bank Acc Title,Iban,Bank Name,Branch Name,Country.1,End Date
0,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,100317020003,ABU DHABI COMMERCIAL BANK,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,AE230030000100317020003,ABU DHABI COMMERCIAL BANK,DUBAI RIQQA,United Arab Emirates,NaT
1,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,1005612298,ADCB,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,,ABU DHABI COMMERCIAL BANK,DUBAI RIQQA,United Arab Emirates,2011-11-15
2,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,3001-043002-311,ARAB BANK,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,,ARAB BANK PLC,DEIRA/DUBAI,United Arab Emirates,2013-03-21
3,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,3002-043002-311,ARAB BANK,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,,ARAB BANK PLC,AL AIN,United Arab Emirates,2013-03-21
4,100006,GILSON COMPANY INC.,Manual,VENDOR,,,310961077,United States,سهيل احمد خان غلام سروار خان,2020-11-16,...,NaT,KUST Khalifa University of Science and Technol...,1306992777,CNB BANK,,,CNB BANK,CLEARFIELD,United States,NaT


The output below requires review to confirm whether the duplicated IBANs belong to different suppliers, since at Khalifa University **multiple active accounts are valid for the same supplier with the same bank details, the same bank account number and the same address**.

In [None]:
df_supplier_bank_duplicate = df_supplier_bank[(df_supplier_bank.duplicated(["Iban", "Bank Account"])) & (df_supplier_bank['Iban'].notna())]
df_supplier_bank_duplicate

Unnamed: 0,Supplier No.,Supplier Name,Registered Method,Vendor Type,Commercial Licence No,Commercial Licence Expiry Date,Tax Registration No.,Country,Supplier Created By,Supplier Creation Date,...,Business Class Last Update Date,Organization Name,Bank Account,Bank Account Name,Bank Acc Title,Iban,Bank Name,Branch Name,Country.1,End Date
500,103518,KHADJETOU JED,Manual,BENIFICIARY,,,INDIVIDUAL -103518,United Arab Emirates,نجود غانم سالم سعيد الصيعري,2021-08-01,...,NaT,KUST Khalifa University of Science and Technol...,3707596970002,HSBC,KHADJETOU JED,AE360340003707596970002,HSBC BANK MIDDLE EAST,ABU DHABI RASHID MAKTOUM ST,United Arab Emirates,2021-09-08
1012,12155,AL DHAFRA PRIVATE SCHOOL,Manual,SCHOOL,CN-1003266,2023-11-09 00:00:00,100310305600003-4,United Arab Emirates,,2010-06-23,...,2021-01-18 12:58:38,KUST Khalifa University of Science and Technol...,012063581391,UNB,,AE510450000012063581391,UNION NATIONAL BANK,ABU DHABI SALAM BRANCH,United Arab Emirates,NaT
1013,12155,AL DHAFRA PRIVATE SCHOOL,Manual,SCHOOL,CN-1003266,2023-11-09 00:00:00,100310305600003-4,United Arab Emirates,,2010-06-23,...,2021-01-18 12:58:38,KUST Khalifa University of Science and Technol...,1051001004868027,Al Dhafra Secondry Private School,,AE080271051001004868027,FIRST GULF BANK,AL AIN KHALIFA ST,United Arab Emirates,NaT
1014,12155,AL DHAFRA PRIVATE SCHOOL,Manual,SCHOOL,CN-1003266,2023-11-09 00:00:00,100310305600003-4,United Arab Emirates,,2010-06-23,...,2021-01-18 12:58:38,KUST Khalifa University of Science and Technol...,1051001004868027,FAB,,AE020351051001004868027,FIRST ABU DHABI BANK,ABU DHABI KHALIFA ST,United Arab Emirates,NaT
1015,12155,AL DHAFRA PRIVATE SCHOOL,Manual,SCHOOL,CN-1003266,2023-11-09 00:00:00,100310305600003-4,United Arab Emirates,,2010-06-23,...,2021-01-18 12:58:38,KUST Khalifa University of Science and Technol...,4021003307897811,FAB,,AE450354021003307897811,FIRST ABU DHABI BANK,ABU DHABI KHALIFA ST,United Arab Emirates,2020-05-28
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15859,99020,Mohamed Ibrahim Hassan Ali,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-08-31,...,NaT,KUST Khalifa University of Science and Technol...,24392765,ADIB,,AE050500000000024392765,ABU DHABI ISLAMIC BANK,ABU DHABI BANIYAS ST,United Arab Emirates,NaT
15860,99020,Mohamed Ibrahim Hassan Ali,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-08-31,...,NaT,KUST Khalifa University of Science and Technol...,24392765,ADIB,,AE050500000000024392765,ABU DHABI ISLAMIC BANK,ABU DHABI BANIYAS ST,United Arab Emirates,NaT
15864,9906,EDUTECH MIDDLE EAST (L.L.C.),Manual,VENDOR,225130,2018-06-05 00:00:00,225130-DUBAI,United Arab Emirates,,2010-06-23,...,2017-06-05 11:44:21,KUST Khalifa University of Science and Technol...,258955319001,ADCB,EDUTECH MIDDLE EAST (L.L.C.),AE800030000258955319001,ABU DHABI COMMERCIAL BANK,DUBAI AL MEENA ROAD,United Arab Emirates,NaT
15896,99407,EMILIO PORCU,Manual,BENIFICIARY,,,INDIAVIDUAL- 99407,United Arab Emirates,نجود غانم سالم سعيد الصيعري,2020-10-01,...,NaT,KUST Khalifa University of Science and Technol...,12231395001,HSBC,EMILIO PORCU,AE120200000012231395001,HSBC BANK MIDDLE EAST,ABU DHABI RASHID MAKTOUM ST,United Arab Emirates,NaT


In [None]:
df_supplier_bank_duplicate.to_sql("IBAN_supplier", conn)
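
Because repeated IBANs for one supplier are considered valid, the review can focus on IBANs registered against more than one supplier number. The following is a minimal sketch under that assumption, using made-up rows and the column names of the supplier bank file; `df_iban_shared` is a hypothetical name.

```python
import pandas as pd

# Made-up rows using the column names of the supplier bank file.
df_example_bank = pd.DataFrame(
    {
        "Supplier No.": [1, 1, 2, 3, 4],
        "Iban": ["AE01", "AE01", "AE02", "AE02", None],
        "Bank Account": ["001", "001", "002", "002", "003"],
    }
)

# Ignore rows without an IBAN; they are covered by the missing-IBAN check.
with_iban = df_example_bank[df_example_bank["Iban"].notna()]

# Count the distinct supplier numbers registered against each IBAN.
suppliers_per_iban = with_iban.groupby("Iban")["Supplier No."].nunique()

# IBANs used by more than one supplier number.
shared_ibans = suppliers_per_iban[suppliers_per_iban > 1].index

df_iban_shared = with_iban[with_iban["Iban"].isin(shared_ibans)]
```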

## 5.2 Missing information for suppliers

All active vendors at Khalifa University are required to have, at a minimum, a valid IBAN, address and TRN. The analysis is shown below:

In [None]:
df_supplier_missing_iban = df_supplier_bank[df_supplier_bank['Iban'].isnull()]
df_supplier_missing_iban

Unnamed: 0,Supplier No.,Supplier Name,Registered Method,Vendor Type,Commercial Licence No,Commercial Licence Expiry Date,Tax Registration No.,Country,Supplier Created By,Supplier Creation Date,...,Business Class Last Update Date,Organization Name,Bank Account,Bank Account Name,Bank Acc Title,Iban,Bank Name,Branch Name,Country.1,End Date
1,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,1005612298,ADCB,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,,ABU DHABI COMMERCIAL BANK,DUBAI RIQQA,United Arab Emirates,2011-11-15
2,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,3001-043002-311,ARAB BANK,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,,ARAB BANK PLC,DEIRA/DUBAI,United Arab Emirates,2013-03-21
3,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,3002-043002-311,ARAB BANK,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,,ARAB BANK PLC,AL AIN,United Arab Emirates,2013-03-21
4,100006,GILSON COMPANY INC.,Manual,VENDOR,,,310961077,United States,سهيل احمد خان غلام سروار خان,2020-11-16,...,NaT,KUST Khalifa University of Science and Technol...,1306992777,CNB BANK,,,CNB BANK,CLEARFIELD,United States,NaT
8,100023,SYSTEMS TECHNOLOGY INC,Manual,,2657,,95-1957989,United States,مجيد حسين طلحه محمد,2020-11-18,...,2020-11-18 10:04:17,KUST Khalifa University of Science and Technol...,546343200,PACIFIC WESTERN BANK,,,PACIFIC WESTERN BANK,1025 W 190TH STREET,United States,NaT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15954,99893,HUAIYIN INSTITUTE OF TECHNOLOGY,Manual,,,,NO VAT - 99893,China,اياراكات عبدالمجيد,2020-11-05,...,NaT,KUST Khalifa University of Science and Technol...,536568825651,BANK OF CHINA,HUAIYIN INSTITUTE OF TECHNOLOGY,,BANK OF CHINA,JIANGSU BRANCH,China,NaT
15960,99899,THE RECTOR AND VISITORS OF THE UNIVERSITY OF V...,Manual,,,,NO VAT - 99899,United States,اياراكات عبدالمجيد,2020-11-05,...,NaT,KUST Khalifa University of Science and Technol...,004117975749,BANK OF AMERICA,THE RECTOR AND VISITORS OF THE UNIVERSITY OF V...,,BANK OF AMERICA,NEW YORK,United States,NaT
15962,99912,DUKE UNIVERSITY,Manual,,,,NO VAT -99912,United States,اياراكات عبدالمجيد,2020-11-08,...,NaT,KUST Khalifa University of Science and Technol...,2000048265067,"WELLS FARGO BANK, N.A,",DUKE UNIVERSITY,,"WELLS FARGO BANK, N.A,",301 S TRYON ST,United States,NaT
15970,99955,GOLDEN KEY INTERNATIONAL HONOUR SOCIETY,Manual,,,,NO VAT -99955,United States,اياراكات عبدالمجيد,2020-11-11,...,NaT,KUST Khalifa University of Science and Technol...,000000193666,BANK OF AMERICA,GOLDEN KEY INTERNATIONAL HONOUR SOCIETY,,BANK OF AMERICA,600 PEACHTREE ST. NE,United States,NaT


In [None]:
df_supplier_missing_iban.to_sql("Missing_supplier_Iban", conn)

In [None]:
df_supplier_missing_trn = df_supplier_bank[df_supplier_bank['Tax Registration No.'].isnull()]
df_supplier_missing_trn

Unnamed: 0,Supplier No.,Supplier Name,Registered Method,Vendor Type,Commercial Licence No,Commercial Licence Expiry Date,Tax Registration No.,Country,Supplier Created By,Supplier Creation Date,...,Business Class Last Update Date,Organization Name,Bank Account,Bank Account Name,Bank Acc Title,Iban,Bank Name,Branch Name,Country.1,End Date
9,100027,AADEL HASSAN MOHAMED MOHAMED ALHMOUDI,Manual,,,,,United Arab Emirates,نجود غانم سالم سعيد الصيعري,2020-11-18,...,NaT,KUST Khalifa University of Science and Technol...,28230787,ADIB,AADEL HASSAN MOHAMED MOHAMED ALHMOUDI,AE630500000000028230787,ABU DHABI ISLAMIC BANK,ABU DHABI BANIYAS ST,United Arab Emirates,NaT
30,100099,TRIDENT SUPPORT FLAG POLES L.L.C,Manual,,,,,United Arab Emirates,عبدالله راشد مبارك فهاد الهاجري,2020-11-23,...,NaT,KUST Khalifa University of Science and Technol...,1014833627701,ENBD,,AE250260001014833627701,EMIRATES NBD,DUBAI MALL BRANCH,United Arab Emirates,NaT
62,100315,Mutasem El Fadel,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-12-09,...,NaT,KUST Khalifa University of Science and Technol...,012245361001,HSBC,,AE800200000012245361001,HSBC BANK MIDDLE EAST,ABU DHABI RASHID MAKTOUM ST,United Arab Emirates,NaT
64,100331,Ismail Aejaz Baig,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-12-10,...,NaT,KUST Khalifa University of Science and Technol...,11878472920001,ADCB,,AE400030011878472920001,ABU DHABI COMMERCIAL BANK,ABU DHABI MAIN,United Arab Emirates,NaT
82,100540,Nnamdi Valbocso Ugwuoke,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-12-23,...,NaT,KUST Khalifa University of Science and Technol...,11807768920001,ADCB,,AE750030011807768920001,ABU DHABI COMMERCIAL BANK,ABU DHABI MAIN,United Arab Emirates,NaT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15904,99490,Aamir Younis Raja,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-10-07,...,NaT,KUST Khalifa University of Science and Technol...,012233219001,HSBC,,AE760200000012233219001,HSBC BANK MIDDLE EAST,ABU DHABI RASHID MAKTOUM ST,United Arab Emirates,NaT
15933,99688,Daniel Johannes Van Tonder,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-10-22,...,NaT,KUST Khalifa University of Science and Technol...,11853072920001,ADCB,,AE190030011853072920001,ABU DHABI COMMERCIAL BANK,ABU DHABI MAIN,United Arab Emirates,NaT
15934,99689,Thripti Vijayakumar,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-10-22,...,NaT,KUST Khalifa University of Science and Technol...,11852376920001,ADCB,,AE930030011852376920001,ABU DHABI COMMERCIAL BANK,ABU DHABI MAIN,United Arab Emirates,NaT
15935,99690,Partha Guha,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-10-22,...,NaT,KUST Khalifa University of Science and Technol...,11855091920001,ADCB,,AE460030011855091920001,ABU DHABI COMMERCIAL BANK,ABU DHABI MAIN,United Arab Emirates,NaT


In [None]:
df_supplier_missing_trn.to_sql("Missing_supplier_trn", conn)
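
The checks above test only for empty IBAN and TRN values. The sketch below shows two complementary checks on made-up rows, and rests on three assumptions that should be confirmed before use: a bank record with a blank `End Date` is active, a usable TRN begins with 15 digits, and `ADDRESS_LINE1` in the supplier master holds the supplier address. The dataframe names are hypothetical.

```python
import pandas as pd

# Made-up rows using column names from the supplier bank file.
df_example_bank = pd.DataFrame(
    {
        "Supplier No.": [1, 2, 3, 4],
        "Tax Registration No.": ["100000000000003-10", "NO VAT - 2", None, "310961077"],
        "End Date": [pd.NaT, pd.NaT, pd.NaT, pd.Timestamp("2013-03-21")],
    }
)

# Assumption: a bank record with no end date is still active.
active = df_example_bank[df_example_bank["End Date"].isna()]

# Assumption: a usable TRN begins with 15 digits. Placeholders such as
# "NO VAT - ..." are not empty, so an isnull() check alone would miss them.
has_valid_trn = (
    active["Tax Registration No."].astype(str).str.match(r"\d{15}", na=False)
)
df_invalid_trn = active[~has_valid_trn]

# Made-up rows using column names from the supplier master.
df_example_master = pd.DataFrame(
    {"SUPPLIER_NO": [10, 11, 12], "ADDRESS_LINE1": ["ABU DHABI", None, "  "]}
)

# Treat both empty cells and whitespace-only text as a missing address.
address = df_example_master["ADDRESS_LINE1"]
missing_address = address.isna() | (address.astype(str).str.strip() == "")
df_missing_address = df_example_master[missing_address]
```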

## 5.3 AP Master

The scripts below help to analyze employee payments processed as vendor payments, duplicate invoices or payments, and multiple payments issued to the same vendor within a short period of time.
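
The third check, multiple payments to the same vendor within a short period, could be sketched as follows. This is an illustration on made-up rows, not the notebook's implementation. It assumes `CREATION_DATE` is an acceptable stand-in for the payment date and uses a hypothetical threshold `WINDOW_DAYS`.

```python
import pandas as pd

# Made-up rows using column names from the AP master.
df_example_ap = pd.DataFrame(
    {
        "VENDOR_NO": [100, 100, 100, 200, 200],
        "VOUCHER_NUM": ["V1", "V2", "V3", "V4", "V5"],
        "CREATION_DATE": pd.to_datetime(
            ["2021-01-01", "2021-01-03", "2021-03-01", "2021-02-01", "2021-02-20"]
        ),
    }
)

WINDOW_DAYS = 7  # hypothetical threshold, to be agreed with Internal Audit

# Keep one row per voucher so ITEM and TAX lines are not counted separately.
vouchers = (
    df_example_ap.drop_duplicates("VOUCHER_NUM")
    .sort_values(["VENDOR_NO", "CREATION_DATE"])
    .copy()
)

# Days elapsed since the previous voucher of the same vendor.
vouchers["DAYS_SINCE_PREVIOUS"] = (
    vouchers.groupby("VENDOR_NO")["CREATION_DATE"].diff().dt.days
)

# Vouchers raised within the window after the vendor's previous voucher.
df_AP_short_interval = vouchers[vouchers["DAYS_SINCE_PREVIOUS"] <= WINDOW_DAYS]
```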

In [None]:
## df_AP_invoices = pd.read_excel("C:/users/ku1016/downloads/AP Master.xlsx")
## df_AP_invoices.head(5)

Unnamed: 0,LEDGER_NAME,VENDOR_NO,VENDOR_NAME,VENDOR_TYPE,INVOICE_NUM,VOUCHER_NUM,INVOICE_APPROVAL_STATUS,INVOICE_TYPE,INVOICE_DESCRIPTION,CREATION_DATE,...,WFAPPROVAL_STATUS,ACCOUNTED,LINE_TYPE,TAX_RATE_NAME,TAX_RATE,RECOVERABLE_TAX_AMOUNT,NON_RECOVERABLE_TAX_AMOUNT,PAYMENT_STATUS_FLAG,ENCUMBERED_FLAG,DUE_DATE
0,KUST Ledger,64998,MINISTRY OF HIGHER EDUCATION & SCIENTIFIC RESE...,VENDOR,29052019,8471027936,APPROVED,STANDARD,Accreditation charges- Doctor of Med Program (...,2019-08-18,...,NOT REQUIRED,Yes,ITEM,,,,,Yes,No,2019-05-29
1,KUST Ledger,64998,MINISTRY OF HIGHER EDUCATION & SCIENTIFIC RESE...,VENDOR,29052019,8471027936,APPROVED,STANDARD,Accreditation charges- Doctor of Med Program (...,2019-08-18,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-05-29
2,KUST Ledger,64998,MINISTRY OF HIGHER EDUCATION & SCIENTIFIC RESE...,VENDOR,29052019,8471027936,APPROVED,STANDARD,Accreditation charges- Doctor of Med Program (...,2019-08-18,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-05-29
3,KUST Ledger,82641,VIKAS MITTAL,BENIFICIARY,PR20193499,8471028118,APPROVED,STANDARD,RELOCATION VIKAS MITTAL,2019-08-27,...,NOT REQUIRED,Yes,ITEM,,,,,Yes,No,2019-08-27
4,KUST Ledger,82641,VIKAS MITTAL,BENIFICIARY,PR20193499,8471028118,APPROVED,STANDARD,RELOCATION VIKAS MITTAL,2019-08-27,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,0.0,0.0,Yes,No,2019-08-27


At Khalifa University, employee payments are processed as vendor payments. As per industry best practice, all employee-related payments should be routed through HR payroll to ensure adequate controls are in place.

Below are the details:

In [None]:
df_AP_employees = df_AP_invoices[(df_AP_invoices['VENDOR_TYPE']=='EMPLOYEE')]
df_AP_employees

Unnamed: 0,LEDGER_NAME,VENDOR_NO,VENDOR_NAME,VENDOR_TYPE,INVOICE_NUM,VOUCHER_NUM,INVOICE_APPROVAL_STATUS,INVOICE_TYPE,INVOICE_DESCRIPTION,CREATION_DATE,...,WFAPPROVAL_STATUS,ACCOUNTED,LINE_TYPE,TAX_RATE_NAME,TAX_RATE,RECOVERABLE_TAX_AMOUNT,NON_RECOVERABLE_TAX_AMOUNT,PAYMENT_STATUS_FLAG,ENCUMBERED_FLAG,DUE_DATE
10,KUST Ledger,84369,Khaled Ebrahim Al Ali,EMPLOYEE,PC21072019,8471027995,APPROVED,STANDARD,Petty cash for Facilities- Vehicle Maint,2019-08-20,...,NOT REQUIRED,Yes,ITEM,,,,,Yes,No,2019-07-21
11,KUST Ledger,84369,Khaled Ebrahim Al Ali,EMPLOYEE,PC21072019,8471027995,APPROVED,STANDARD,Petty cash for Facilities- Vehicle Maint,2019-08-20,...,NOT REQUIRED,Yes,ITEM,,,,,Yes,No,2019-07-21
12,KUST Ledger,84369,Khaled Ebrahim Al Ali,EMPLOYEE,PC21072019,8471027995,APPROVED,STANDARD,Petty cash for Facilities- Vehicle Maint,2019-08-20,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-07-21
13,KUST Ledger,84369,Khaled Ebrahim Al Ali,EMPLOYEE,PC21072019,8471027995,APPROVED,STANDARD,Petty cash for Facilities- Vehicle Maint,2019-08-20,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-07-21
14,KUST Ledger,84369,Khaled Ebrahim Al Ali,EMPLOYEE,PC21072019,8471027995,APPROVED,STANDARD,Petty cash for Facilities- Vehicle Maint,2019-08-20,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-07-21
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
487983,KUST Ledger,84005,Mohammad Alsuwaidi,EMPLOYEE,2021-06-16 00:00:00,8471057066,CANCELLED,STANDARD,Expense claim,2021-06-16,...,REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,25.0,0.0,No,No,2021-06-16
487984,KUST Ledger,84005,Mohammad Alsuwaidi,EMPLOYEE,2021-06-16 00:00:00,8471057066,CANCELLED,STANDARD,Expense claim,2021-06-16,...,REQUIRED,Yes,ITEM,,,,,No,No,2021-06-16
487985,KUST Ledger,84005,Mohammad Alsuwaidi,EMPLOYEE,2021-06-16 00:00:00,8471057066,CANCELLED,STANDARD,Expense claim,2021-06-16,...,REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,25.0,0.0,No,No,2021-06-16
487986,KUST Ledger,84005,Mohammad Alsuwaidi,EMPLOYEE,2021-06-16 00:00:00,8471057066,CANCELLED,STANDARD,Expense claim,2021-06-16,...,REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,25.0,0.0,No,No,2021-06-16


In [None]:
df_AP_employees.to_sql("Employee_Payments_as_Vendor", conn)

In [None]:
df_AP_duplicate = df_AP_invoices[df_AP_invoices.duplicated()]
df_AP_duplicate

Unnamed: 0,LEDGER_NAME,VENDOR_NO,VENDOR_NAME,VENDOR_TYPE,INVOICE_NUM,VOUCHER_NUM,INVOICE_APPROVAL_STATUS,INVOICE_TYPE,INVOICE_DESCRIPTION,CREATION_DATE,...,WFAPPROVAL_STATUS,ACCOUNTED,LINE_TYPE,TAX_RATE_NAME,TAX_RATE,RECOVERABLE_TAX_AMOUNT,NON_RECOVERABLE_TAX_AMOUNT,PAYMENT_STATUS_FLAG,ENCUMBERED_FLAG,DUE_DATE


In [None]:
df_AP_duplicate = df_AP_invoices[(df_AP_invoices.duplicated(['VENDOR_NO','INVOICE_NUM','INVOICE_DESCRIPTION'])) & (df_AP_invoices['INVOICE_APPROVAL_STATUS'] == "APPROVED")]
df_AP_duplicate

Unnamed: 0,LEDGER_NAME,VENDOR_NO,VENDOR_NAME,VENDOR_TYPE,INVOICE_NUM,VOUCHER_NUM,INVOICE_APPROVAL_STATUS,INVOICE_TYPE,INVOICE_DESCRIPTION,CREATION_DATE,...,WFAPPROVAL_STATUS,ACCOUNTED,LINE_TYPE,TAX_RATE_NAME,TAX_RATE,RECOVERABLE_TAX_AMOUNT,NON_RECOVERABLE_TAX_AMOUNT,PAYMENT_STATUS_FLAG,ENCUMBERED_FLAG,DUE_DATE
1,KUST Ledger,64998,MINISTRY OF HIGHER EDUCATION & SCIENTIFIC RESE...,VENDOR,29052019,8471027936,APPROVED,STANDARD,Accreditation charges- Doctor of Med Program (...,2019-08-18,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-05-29
2,KUST Ledger,64998,MINISTRY OF HIGHER EDUCATION & SCIENTIFIC RESE...,VENDOR,29052019,8471027936,APPROVED,STANDARD,Accreditation charges- Doctor of Med Program (...,2019-08-18,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-05-29
4,KUST Ledger,82641,VIKAS MITTAL,BENIFICIARY,PR20193499,8471028118,APPROVED,STANDARD,RELOCATION VIKAS MITTAL,2019-08-27,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,0.0,0.0,Yes,No,2019-08-27
5,KUST Ledger,82641,VIKAS MITTAL,BENIFICIARY,PR20193499,8471028118,APPROVED,STANDARD,RELOCATION VIKAS MITTAL,2019-08-27,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,0.0,0.0,Yes,No,2019-08-27
6,KUST Ledger,82641,VIKAS MITTAL,BENIFICIARY,PR20193499,8471028118,APPROVED,STANDARD,RELOCATION VIKAS MITTAL,2019-08-27,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,0.0,0.0,Yes,No,2019-08-27
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
487974,KUST Ledger,98330,ETAP AUTOMATION DMCC,VENDOR,18042021,8471056358,APPROVED,STANDARD,PMR20212184-Renewal: ETAP - Educational Licens...,2021-05-16,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,642.7,0.0,Yes,No,2021-04-18
487977,KUX - Ledger,58199,ALLIANCE GLOBAL FZ - LLC,VENDOR,202106165,843X007976,APPROVED,STANDARD,,2021-06-08,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,2250.0,0.0,Yes,No,2021-05-25
487978,KUX - Ledger,58199,ALLIANCE GLOBAL FZ - LLC,VENDOR,202106165,843X007976,APPROVED,STANDARD,,2021-06-08,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,2250.0,0.0,Yes,No,2021-05-25
487980,KUX - Ledger,58199,ALLIANCE GLOBAL FZ - LLC,VENDOR,202105753,843X007977,APPROVED,STANDARD,,2021-06-08,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,1125.0,0.0,Yes,No,2021-03-10


In [None]:
df_AP_duplicate.to_sql("SameInvoiceNum_SameInvoiceDescription_SameVendor_Approved_Invoice", conn)
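
The AP master holds several lines per invoice (for example `ITEM` and `TAX` lines), so a line-level duplicate check also flags the lines of a single invoice. The sketch below shows one way to reduce false positives: collapse the data to one row per voucher first, then look for the same vendor and invoice number on more than one voucher. It assumes `VOUCHER_NUM` identifies a single booked invoice; the rows are made up and `df_AP_duplicate_invoice` is a hypothetical name.

```python
import pandas as pd

# Made-up rows using column names from the AP master.
df_example_ap = pd.DataFrame(
    {
        "VENDOR_NO": [1, 1, 2, 2],
        "INVOICE_NUM": ["INV-A", "INV-A", "INV-B", "INV-B"],
        "VOUCHER_NUM": ["V1", "V1", "V2", "V3"],
        "LINE_TYPE": ["ITEM", "TAX", "ITEM", "ITEM"],
        "INVOICE_APPROVAL_STATUS": ["APPROVED"] * 4,
    }
)

approved = df_example_ap[df_example_ap["INVOICE_APPROVAL_STATUS"] == "APPROVED"]

# One row per voucher: the ITEM and TAX lines of a single invoice collapse.
invoices = approved.drop_duplicates(["VENDOR_NO", "INVOICE_NUM", "VOUCHER_NUM"])

# Same vendor and invoice number booked on more than one voucher.
df_AP_duplicate_invoice = invoices[
    invoices.duplicated(["VENDOR_NO", "INVOICE_NUM"], keep=False)
]
```

In this toy data, vendor 1 has one invoice with two lines and is not flagged, while vendor 2 has the same invoice number on two vouchers and is flagged.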

## 5.4 Purchase Orders

The scripts below enable the IA Department to view all purchase orders created after their approval date. The two columns used are "CREATION_DATE" (date of creation) and "PO_APPROVED_DATE" (date of approval).

In [None]:
## df_POs = pd.read_excel("C:/users/ku1016/downloads/PO Master.xlsx")
## df_POs.head(5)

Unnamed: 0,PO_NUMBER,AUTHORIZATION_STATUS,PO_TYPE,ITEM_CATEGORY,ITEM_CATEGORY_DESCRIPTION,ITEM_CODE,ITEM_DESCRIPTION,GL_ENCUMBERED_DATE,CREATION_DATE,ENCUMBERED_AMOUNT,...,VENDOR_SITE,CURRENCY_CODE,AMOUNT_ORDERED,QUANTITY_DELIVERED,AMOUNT_DELIVERED,QUANTITY_BILLED,AMOUNT_BILLED,QUANTITY_CANCELLED,AMOUNT_CANCELLED,VENDOR_COUNTRY
0,8431200017,APPROVED,STANDARD,23.01,University Books,-,,2017-03-07 15:00:02,2017-03-07 00:00:00,470.0,...,DUBAI,AED,470.0,1.0,470.0,1.0,470.0,0.0,0.0,United Arab Emirates
1,8431200017,APPROVED,STANDARD,23.01,University Books,-,,2017-03-07 15:00:02,2017-03-07 00:00:00,594.0,...,DUBAI,AED,594.0,2.0,594.0,2.0,594.0,0.0,0.0,United Arab Emirates
2,8431200017,APPROVED,STANDARD,23.01,University Books,-,,2017-03-07 15:00:02,2017-03-07 00:00:00,777.0,...,DUBAI,AED,777.0,5.0,777.0,5.0,777.0,0.0,0.0,United Arab Emirates
3,8431200017,APPROVED,STANDARD,23.01,University Books,-,,2017-03-07 15:00:02,2017-03-07 00:00:00,2201.5,...,DUBAI,AED,2201.5,5.0,2201.5,5.0,2201.5,0.0,0.0,United Arab Emirates
4,8431200017,APPROVED,STANDARD,23.01,University Books,-,,2017-03-07 15:00:02,2017-03-07 00:00:00,2543.0,...,DUBAI,AED,2543.0,1.0,2543.0,1.0,2543.0,0.0,0.0,United Arab Emirates


In [None]:
df_POs['Approval'] = df_POs['PO_APPROVED_DATE']-df_POs['CREATION_DATE']

In [None]:
df_POs.to_sql('Creation_After_Approval', conn)
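
As written, the export above appears to store every purchase order together with the new `Approval` column, rather than only the exceptions. The sketch below shows how the exceptions could be isolated, assuming a negative `Approval` value (approval earlier than creation) is what should be reported. The rows are made up and `df_creation_after_approval` is a hypothetical name.

```python
import pandas as pd

# Made-up rows using the two date columns described above.
df_example_pos = pd.DataFrame(
    {
        "PO_NUMBER": [1, 2, 3],
        "CREATION_DATE": ["2017-03-07", "2017-03-10", "2017-03-12"],
        "PO_APPROVED_DATE": ["2017-03-08", "2017-03-09", None],
    }
)

# Make sure both columns are datetimes; unparseable values become NaT.
for col in ["CREATION_DATE", "PO_APPROVED_DATE"]:
    df_example_pos[col] = pd.to_datetime(df_example_pos[col], errors="coerce")

# Same calculation as in the notebook: approval date minus creation date.
df_example_pos["Approval"] = (
    df_example_pos["PO_APPROVED_DATE"] - df_example_pos["CREATION_DATE"]
)

# A negative difference means the PO was created after it was approved.
# Rows with a missing date compare as False and are left out.
df_creation_after_approval = df_example_pos[
    df_example_pos["Approval"] < pd.Timedelta(0)
]
```

Purchase orders with a missing approval date are not flagged here and may deserve a separate exception list.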