# Khalifa University - CACM

### Table of Contents

- *Student Master Data* 
- *Human Resources Master Data*
- *Finance Master Data*

### Master Data Naming Convention

The scripts and pandas dataframe across the jupyter notebook follows the following naming convention:

<div class="alert alert-block alert-warning">
<b>Example:</b> df_[type of database]_[master] </div>

where df is the dataframe
type of database is students, finance, human resources
master refers to master data

### Exceptions Naming Convention

Additionally, all exceptions are named using the following naming convention:

<div class="alert alert-block alert-warning">
<b>Example:</b> df_[type of database]_[master]_[exception] </div>

where df is the dataframe
type of database is students, finance, human resources
master refers to master data
exception refers to type of exception, for e.g., duplicate students, employee details, invoices, payments etc.

# Import Data from SQL Server

### Key Libraries

**Pymssql** is a sql connector package that enables to extract all information in the server directly into pandas dataframe as an input.

Using pymssql extract all information from different sql queries and can be utilized for analysis / creating the mongoDB. 

The below scripts in other sections can utilize the pandas dataframe directly for running the analysis instead of connecting to excel files.

**Pandas** - Pandas is a fast, powerful, flexible and easy to use open source library which enables the user to perform data manipulation from any source (SQL, Excel, JSON, etc.)

**Numpy** - Numpy is a versatile tool in python for data analysis which enables to user to perform analysis around n-dimensional arrays, numerical computing tools and object oriented programming for ensuring readability / scalability of the code.

**Time** - Time is a library to import and stamp the current data / time of analysis and store for future information incase of tracking previous iteration of exceptions.

Additionally, the naming convention within python can assist in utilizing these libraries as abbreviations.

<div class="alert alert-block alert-warning">
<b>Example:</b> **import pandas as pd** enables the user to use "pd" within the python code to the use pandas library</div>

In [26]:
import pymssql
import pyodbc
import pandas as pd
import numpy as np
from sqlalchemy import create_engine

In [27]:
conn = pymssql.connect(server = "KU1ICDDWV011", user = "cacm_user", password = "Welcome#123", database = "CACM")
cursor = conn.cursor()

To extract the relevant information from the SQL server, below the following example:

<div class="alert alert-block alert-warning">
<b>Example:</b> Query = SELET * FROM [Database] WHERE [Condition] </div>

where Database is the name of the database;
and conditions can be added as required such as WHERE, GroupBy, Join etc.

**Extract Student Master Information**

In [3]:
query = "SELECT * FROM dbo.KU_SRC_STD_MASTER"

df_student_master = pd.read_sql(query, conn)

**Extract Student Course Schedule**

In [4]:
query = "SELECT * FROM dbo.KU_SRC_STD_CRS_SCHEDULE"

df_student_course_schedule = pd.read_sql(query, conn)

**Extract Student Attednance**

In [5]:
query = "SELECT * FROM dbo.KU_SRC_STD_ATTENDANCE"

df_student_attendance = pd.read_sql(query, conn)

**Extract Employee Master Information**

In [6]:
query = "SELECT * FROM dbo.KU_SRC_EmployeeDetails"

df_employee_master = pd.read_sql(query, conn)

**Extract Employee Bank Details**

In [7]:
query = "SELECT * FROM dbo.KU_SRC_EmployeesBankDetails"

df_employee_bank_details = pd.read_sql(query, conn)

**Extract Employee Leaves**

In [8]:
query = "SELECT * FROM dbo.KU_SRC_EmployeesLeaves"

df_employee_leaves = pd.read_sql(query, conn)

**Extract AP Invoices**

In [9]:
query = "SELECT * FROM dbo.KU_SRC_AP_Invoices"

df_AP_invoices = pd.read_sql(query, conn)

**Extract Purchase Order Master**

In [10]:
query = "SELECT * FROM dbo.KU_SRC_POs"

df_POs = pd.read_sql(query, conn)

**Extract Purchase Requisitions Information**

In [11]:
query = "SELECT * FROM dbo.KU_SRC_PRs"

df_PRs = pd.read_sql(query, conn)

**Extract IT Tickets**

In [12]:
query = "SELECT * FROM dbo.KU_SRC_IT_Tickets"

df_it_tickets = pd.read_sql(query, conn)

**Extract Supplier Information**

In [13]:
query = "SELECT * FROM dbo.KU_SRC_Suppliers"

df_supplier_master = pd.read_sql(query, conn)

## Write into SQL Server

For writing files / exceptions into SQL Server, the following code can be utilized:

<div class="alert alert-block alert-warning">
<b>Example:</b> Code = [dataframe].to_sql("[Name of table / database]", conn) </div>

where dataframe is the name of the pandas dataframe
to_sql writes into the SQL server
Name of table / database is the name of the database in the SQL server
conn is the connection established to the SQL server

# Scripts for Student, Procurement and Finance Master Data

# 1. Student Master Data

Pandas dataframe library within python enables to import information from any database including SQL, Excel, Text, JSON, etc. To utilize Pandas Dataframe, it is important to designate the type of read access.

For importing from excel database the following naming convention is used:

<div class="alert alert-block alert-warning">
<b>Example:</b> df_[type of database]_[master] = pd.read_excel("[Name of file].xlsx" </div>

In [14]:
df_student_master.head(5)

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,MOBILE3,PARENT_PHONE,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR
0,100020068,,Mohammed,Al Khaja,Mohammed Ali Al Khaja,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,
1,100020083,,Rashid,Al Murashda,Rashid Rashed Saeed Al Murashda,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,
2,100020008,,Abdulla,Mohammed,Abdulla Ahmed Abdullah Mohammed,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,
3,100020015,,Ahmed,Al Wahbi,Ahmed Abdullah Ibrahim Al Wahbi,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,
4,100020016,,Ahmed,Al Suwaidi,Ahmed Mohammed Sultan Al Suwaidi,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,


## 1.1 Duplicate Student Records

### 1.1 Overall Duplicates

The below set of scripts enable the user to identify the duplicates in student master data based on **student ID, student names, emirates ID, passport details and length of emirates ID**.

Additionally, the script can further be modified to track active and inactive students to identify the admission of academically dismissed students to further test the re-admission of such students as per KU policies and procedures.

In [15]:
df_student_master_duplicate = df_student_master[df_student_master.duplicated()]

In [16]:
df_student_master_duplicate

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,MOBILE3,PARENT_PHONE,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR


### 1.1.2 Student ID Duplicates

Student ID line items are stored in the database as column name **"ID"**. Therefore, to identify the duplicates, the script is enabled on column name "ID".

In [17]:
df_student_master_duplicate = df_student_master[df_student_master.duplicated(['ID'])]

In [18]:
df_student_master_duplicate

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,MOBILE3,PARENT_PHONE,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR


### 1.1.3 Student Names Duplicates

Student Names line items are stored in the database as column name **"FULL_NAME"**. Therefore, to identify the duplicates, the script is enabled on column name "FULL_NAME".

In [30]:
df_student_master_duplicate = df_student_master[df_student_master.duplicated(['FULL_NAME'])]

In [20]:
df_student_master_duplicate

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,MOBILE3,PARENT_PHONE,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR
3589,100058270,,Mohammad,Awadhalla,Mohammad Fadi Awadhalla,محمد فادي عوض الله,202010,2020-08-23 00:00:00,M,A,...,,,,Albarsha 2 Street 29 Villa 8,Dubai,United Arab Emirates,,,,Khaled Elbassioni
4156,100035081,,Alya,Alsaadi,Alya Alsaadi,,200920,2009-11-23 00:00:00,F,S,...,,,,,Abu Dhabi,United Arab Emirates,,,,
6735,100042377,,Hend,Mohamed Ahmed Almarzooqi,Hend Ahmed Mohamed Ahmed Almarzooqi,,201510,2015-08-23 00:00:00,F,A,...,,,,AL FALAH,Abu Dhabi,United Arab Emirates,,,,
6986,100001904,,Abdulla,Al Ali,Abdulla Rashid Al Ali,,200710,2007-06-18 12:06:23,M,S,...,,,,Abu Dhabi 77334,Abu Dhabi,United Arab Emirates,,,,
7092,100048093,,Nouf,Alzaabi,Nouf Ebrahim Alzaabi,نوف ابراهيم سالم مسعود الزعابي,201310,2013-09-08 00:00:00,F,M,...,,,,UN,UN,United Arab Emirates,,,,
7722,100046318,,Hanan,Hamdan,Hanan Ahmad Mohammad Hamdan,حنان احمد محمد حمدان,201520,2016-01-10 00:00:00,F,P,...,,,,P.O. Box 1065,Al Ain,JORDAN,,,0.0,
7757,100052916,,Fadi,Dawaymeh,Fadi Zeyad Dawaymeh,فادي زياد نواف دوايمه,201910,2019-08-25 00:00:00,M,A,...,,,,"Abu Dhabi,behind one to one hotel, Malqatah st...",Abu Dhabi,JORDAN,--,,,Nahla Saeed Al Amoodi
7808,100058255,,Dima,Ali,Dima Samer Ali,ديمة سامر علي,202010,2020-08-23 00:00:00,F,A,...,,,,Abu Dhabi,Abu Dhabi,JORDAN,,,,Isam Mustafa Janajreh
7833,100043364,,Fareha,Nasim,Fareha Zainab Nasim,,201520,2016-01-10 00:00:00,F,A,...,,,,Fatima Bint Mubarak Street,Abu Dhabi,PAKISTAN,,,,
7962,100061899,,Muhammad Ahmed,Humais,Muhammad Ahmed Humais,محمد أحمد حميس,202120,2022-01-17 00:00:00,M,A,...,,,,"R-1931, Block-14, Federal B Area",Karachi,PAKISTAN,,,,Mahmoud Al Qutayri


In [21]:
## write into excel; if required

## df_student_master_duplicate.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Duplicate Student Names.xlsx")

### 1.1.4 Emirates ID Duplicate

Student Emirates ID line items are stored in the database as column name **"EMIRATES_ID"**. Therefore, to identify the duplicates, the script is enabled on column name "EMIRATES_ID".

In [123]:
df_student_master_duplicate = df_student_master[df_student_master.duplicated(['EMIRATES_ID'])]

In [124]:
df_student_master_duplicate

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,MOBILE3,PARENT_PHONE,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR
1,100020083,,Rashid,Al Murashda,Rashid Rashed Saeed Al Murashda,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,
2,100020008,,Abdulla,Mohammed,Abdulla Ahmed Abdullah Mohammed,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,
3,100020015,,Ahmed,Al Wahbi,Ahmed Abdullah Ibrahim Al Wahbi,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,
4,100020016,,Ahmed,Al Suwaidi,Ahmed Mohammed Sultan Al Suwaidi,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,
6,100020031,,Hamad,Tunaiji,Hamad Saeed Mohammed Tunaiji,,200810,2008-08-23,M,S,...,,,,,,United Arab Emirates,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16518,100061196,,Maryam,Ali,Maryam Abdulla Ali,مريم عبدالله علي,202110,2021-08-22,F,A,...,,,,House 1497 Road 1434 Block 514,Jid alhaj,BAHRAIN,,,,
16545,100058885,,Muhammad,Danishwar,Muhammad Zulfiqar Ahmad Danishwar,,202010,2020-08-23,M,A,...,,,,"470/02, Near Jiyo Banwuet Hall, Behind Butt Sw...",Lahore,PAKISTAN,,,,Isam Mustafa Janajreh
16553,100058294,,Nowshin Radiya,Kabir,Nowshin Radiya Kabir,,202010,2020-08-23,F,A,...,,,,P.O. Box 42069,Abu Dhabi,NEW ZEALAND,,,,Mohammad Abu Haija
16555,100058314,,Cyril,Pepple,Cyril Christopher Pepple,,202010,2020-08-23,M,A,...,,,,"10 Fatomi Crescent, Bajulaye Compound, Somolu",Lagos,NIGERIA,,,,Mohammed Al Kobaisi


In [125]:
## write into excel; if required

## df_student_master_duplicate.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Emirates ID Duplicate.xlsx")

### 1.1.5 Emirates ID

Emirates ID across the UAE contain exactly 15 digits, any deviation to the length of Emirates ID is an invalid detail. To identify such discrepancies, we utilize the **str.len() function** of python which counts the number of characters of each cell in the column.

Further, we filter the results for more than and less than **15 digits to identify discrepancies**.

We noted many student details did not contain Emirates ID details and such students are discarded as part of the test since all those students are inactive.

In [127]:
df_student_master['Length'] = df_student_master['EMIRATES_ID'].str.len()

In [128]:
df_student_master_EID = df_student_master[(df_student_master['Length'] < 15) | (df_student_master['Length'] > 15)]

In [130]:
df_student_master_EID

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,PARENT_PHONE,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR,Length
58,100020289,,Marwa,Jaber,Marwa Ali Awadh Jaber,مروة علي عوض جبر,200810,2008-08-23,F,A,...,,,P O Box 7778,Abu Dhabi,United Arab Emirates,MaisaAli,0506118857,00,,7.0
128,100035191,,Zainab,Moazzam,Zainab Muhammad Moazzam Moazzam,زينب محمد معظم,201010,2010-05-01,F,A,...,,,P.O BOX 127788,Abu Dhabi,PAKISTAN,,,00,,7.0
249,100038590,,Mona,Ali Allbishr,Mona Ahmed Saleh Ali Allbishr,منى أحمد صالح البشر,201210,2012-09-02,F,S,...,,,P.O.Box 3211,Sharjah,United Arab Emirates,,,,,9.0
281,100038884,,Amna,Alzubaidi,Amna Abdulmajeed Brek Saeed Alzubaidi,أمنه عبد المجيد بريك الزبيدي,201210,2012-09-02,F,A,...,,,,-,United Arab Emirates,,,00,,9.0
463,100038838,,Lina,El-Haj,Lina Hesham El-Haj,لينا هشام الحاج,201210,2012-09-02,F,A,...,,,Madinet Zayed Falah Street,Abu Dhabi,AUSTRALIA,,,00,,9.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14654,100035189,,Waffa,Fadol,Waffa Badraldin Mohmed Fadol,وفاء بدر الدين محمد فضل,201010,2010-05-01,F,A,...,507730409,,P O Box 18616,Al Ain,SUDAN,BadraldinFadol,569139349,00,,8.0
14692,100038265,,Essa,Alshahei,Essa Ahmad Ali Alshahei,عيسى أحمد على الشحى,201210,2012-09-02,M,A,...,,,Al Warqaa 4 53A,Dubai,United Arab Emirates,,,,,9.0
14695,100033711,,Sara,Bazhair,Sara Khaled Abdulla Bazhair,ساره خالد عبدالله سالم سعيد بازهير,200910,2009-08-23,F,A,...,,,P O Box 89,Abu Dhabi,United Arab Emirates,NouraBazuhair,0507114172,00,,7.0
14742,100036765,,Mohammad Ather,Ali,Mohammad Ather Rana Ali Ali,محمد اطهر رانا علي أصغر,201110,2011-09-11,M,A,...,,,,-,PAKISTAN,,,00,,7.0


### 1.1.6 Passport

Student Passport Details line items are stored in the database as column name **"PASSPORT_ID"**. Therefore, to identify the duplicates, the script is enabled on column name "PASSPORT_ID".

Additionally, we can also filter results for all active students for whom the passport details are not entered in the system. This test is necessary for international / students of other nationalities.

In [133]:
df_student_master_Passport = df_student_master[(df_student_master['STUDENT_STATUS'] == 'Active') & (df_student_master['PASSPORT_ID'] == "NULL")]

In [134]:
df_student_master_Passport

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,PARENT_PHONE,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR,Length


We can also test for the length in digits for passport details, however the details may or may not be accurate as the number of digits for passport differs across countries / nations.

In [135]:
df_student_master['Passport_Length'] = df_student_master['PASSPORT_ID'].str.len()

In [136]:
df_student_master['Passport_Length']

0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
        ... 
16587    9.0
16588    9.0
16589    9.0
16590    9.0
16591    9.0
Name: Passport_Length, Length: 16592, dtype: float64

In [137]:
df_student_master_Passport = df_student_master[(df_student_master['STUDENT_STATUS'] == 'Active') & (df_student_master['Passport_Length'] < 7)]

In [138]:
df_student_master_Passport

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR,Length,Passport_Length
141,100036658,,Sarah,Azzam,Sarah Kassem Azzam,ساره قاسم عزام,201720,2018-01-14,F,A,...,,,Abu Dhabi,PALESTINE,--,,,Abdulrahim Abdulrahman Sajini,15.0,6.0
8301,100061297,,Malak,Zahran,Malak Yasser Zahran,ملك ياسر زهران,202110,2021-08-22,F,A,...,,Almamoora,Abu Dhabi,JORDAN,WafaNubani,,,Sean Shan Min Swei,15.0,6.0
9805,100058043,,George,Hajjar,George Abdulmassih Hajjar,جورج عبدالمسيح حجار,201920,2020-01-12,M,A,...,,Muwajii,Abu Dhabi,PALESTINE,,,,Abdullahi Umar,15.0,6.0
11194,100047903,,Khadije,El Kadi,Khadije El Kadi,خديجه شكري القاضي,202110,2021-08-22,F,A,...,,"Al Andalos Building, Flat 1801",Abu Dhabi,PALESTINE,,,,Isam Mustafa Janajreh,15.0,6.0
12309,100046297,,Omar,Elkhatib,Omar Salah Elkhatib,عمر صالح الخطيب,201920,2020-01-12,M,A,...,,Estiqlal Street,Abu Dhabi,PALESTINE,,,,Andreas Schiffer,15.0,6.0
12780,100039053,,Alyazyah,Alsuwaidi,Alyazyah Ahmed Saeed Binshaheen Alsuwaidi,اليازية أحمد بن شاهين السويدي,201941,2019-08-18,F,,...,,P.O.Box 44501,Sharjah,United Arab Emirates,NouraAlsuwaidi,971502121022.0,,Peter Corridon,15.0,6.0
14189,100059989,,Ibrahim,Zaydan,Ibrahim Ibrahim Zaydan Zaydan,ابراهيم عبد الرحمن زيدان,202020,2021-01-17,M,A,...,,Delma Street,Abu Dhabi,PALESTINE,,,,Aymen Laadhari,15.0,6.0
16450,100061132,,Saly,Srouji,Saly Oussama El Srouji,سالي أسامة السروجي,202110,2021-08-22,F,A,...,,Zayed The First Street,Abu Dhabi,PALESTINE,,,,Arjen Rene' Van Vliet,15.0,6.0


In [139]:
## write into excel; if required

## df_student_master_Passport.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Passport Length.xlsx")

## 1.2 Re-admission of previous students

Students at Khalifa University may be re-admitted based on appopriate approvals from Academic Management. To analyze the re-admitted students, we can identify the students admitted based on same Emirates ID versus the previous inactive accounts.

This will require to analyze the output report to identify such instances as one column does not notify of re-admission.

In [146]:
## df_student_master = pd.read_excel("C:/Users/ku1016/Downloads/Student Master Data.xlsx")
## df_student_master.head(5)
df_student_master.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16592 entries, 0 to 16591
Data columns (total 59 columns):
 #   Column                          Non-Null Count  Dtype         
---  ------                          --------------  -----         
 0   ID                              16592 non-null  object        
 1   NAPO_ID                         0 non-null      object        
 2   FIRST_NAME                      16591 non-null  object        
 3   LAST_NAME                       16592 non-null  object        
 4   FULL_NAME                       16592 non-null  object        
 5   ARABIC_NAME                     14294 non-null  object        
 6   ADMIT_TERM                      16592 non-null  object        
 7   START_DATE                      16592 non-null  datetime64[ns]
 8   GENDER                          16592 non-null  object        
 9   CAMPUS                          16511 non-null  object        
 10  MAX_TERM                        8838 non-null   object        
 11  ST

In [147]:
df_student_master_Readmission = df_student_master[(df_student_master['STUDENT_STATUS'] == 'Inactive') & (df_student_master.duplicated(['EMIRATES_ID']))]

In [148]:
df_student_master_Readmission

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR,Length,Passport_Length
1,100020083,,Rashid,Al Murashda,Rashid Rashed Saeed Al Murashda,,200810,2008-08-23 00:00:00,M,S,...,,,,United Arab Emirates,,,,,,
2,100020008,,Abdulla,Mohammed,Abdulla Ahmed Abdullah Mohammed,,200810,2008-08-23 00:00:00,M,S,...,,,,United Arab Emirates,,,,,,
3,100020015,,Ahmed,Al Wahbi,Ahmed Abdullah Ibrahim Al Wahbi,,200810,2008-08-23 00:00:00,M,S,...,,,,United Arab Emirates,,,,,,
4,100020016,,Ahmed,Al Suwaidi,Ahmed Mohammed Sultan Al Suwaidi,,200810,2008-08-23 00:00:00,M,S,...,,,,United Arab Emirates,,,,,,
6,100020031,,Hamad,Tunaiji,Hamad Saeed Mohammed Tunaiji,,200810,2008-08-23 00:00:00,M,S,...,,,,United Arab Emirates,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15500,100001643,,Mohamed,Mohamed,Mohamed Mohamed Mohamed,,200410,2004-06-30 09:41:27,M,S,...,,5601,Sharjah,United Arab Emirates,,,,,,
15536,100050089,,Saheed,Lateef,Saheed Lateef,,201820,2019-01-13 00:00:00,M,A,...,,Department of Chemical Engineering,Dhahran,NIGERIA,,,,,,9.0
15537,100050099,,Weichen,Zhan,Weichen Zhan,,201820,2019-01-13 00:00:00,M,A,...,,"Student Apartment, Xiamen Uni.",Xiamen City,CHINA,,,,,,9.0
15646,100053165,,Amal,Almarzooqi,Amal Abdulmonem Mohamed Darwish Almarzooqi,امل عبدالمنعم محمد درويش المرزوقي,201910,2019-08-25 00:00:00,F,A,...,,Khalifa City,Abu Dhabi,United Arab Emirates,,,,Mohammad Eid Alsuwaidi,15.0,9.0


## 1.3 Missing Student Information

All active students should have complete information as per the registrar records and protocol within the master data. The below script analyzes the missing information to ensure completeness of the data.

To review the missing information, the library for isnull() is utilized which identifies the missing elements across all columns using the axis (column) technique.

In [48]:
## df_student_master = pd.read_excel("C:/Users/ku1016/Downloads/Student Master Data.xlsx")

In [49]:
df_student_master.head(5)

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR,Length,Passport_Length
0,100020068,,Mohammed,Al Khaja,Mohammed Ali Al Khaja,,200810,2008-08-23,M,S,...,,,,United Arab Emirates,,,,,,
1,100020083,,Rashid,Al Murashda,Rashid Rashed Saeed Al Murashda,,200810,2008-08-23,M,S,...,,,,United Arab Emirates,,,,,,
2,100020008,,Abdulla,Mohammed,Abdulla Ahmed Abdullah Mohammed,,200810,2008-08-23,M,S,...,,,,United Arab Emirates,,,,,,
3,100020015,,Ahmed,Al Wahbi,Ahmed Abdullah Ibrahim Al Wahbi,,200810,2008-08-23,M,S,...,,,,United Arab Emirates,,,,,,
4,100020016,,Ahmed,Al Suwaidi,Ahmed Mohammed Sultan Al Suwaidi,,200810,2008-08-23,M,S,...,,,,United Arab Emirates,,,,,,


In [51]:
df_student_missing = df_student_master[df_student_master.isnull().any(axis=1)]
df_student_missing.head(5)

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR,Length,Passport_Length
0,100020068,,Mohammed,Al Khaja,Mohammed Ali Al Khaja,,200810,2008-08-23,M,S,...,,,,United Arab Emirates,,,,,,
1,100020083,,Rashid,Al Murashda,Rashid Rashed Saeed Al Murashda,,200810,2008-08-23,M,S,...,,,,United Arab Emirates,,,,,,
2,100020008,,Abdulla,Mohammed,Abdulla Ahmed Abdullah Mohammed,,200810,2008-08-23,M,S,...,,,,United Arab Emirates,,,,,,
3,100020015,,Ahmed,Al Wahbi,Ahmed Abdullah Ibrahim Al Wahbi,,200810,2008-08-23,M,S,...,,,,United Arab Emirates,,,,,,
4,100020016,,Ahmed,Al Suwaidi,Ahmed Mohammed Sultan Al Suwaidi,,200810,2008-08-23,M,S,...,,,,United Arab Emirates,,,,,,


In [149]:
df_student_missing_final = df_student_missing[(df_student_missing['STUDENT_STATUS'] == 'Active')]

In [150]:
df_student_missing_final.head(5)

Unnamed: 0,ID,NAPO_ID,FIRST_NAME,LAST_NAME,FULL_NAME,ARABIC_NAME,ADMIT_TERM,START_DATE,GENDER,CAMPUS,...,RESIDENCE_PHONE,POBOX,CITY,NATIONALITY,EMERGENCY_CONTACT_NAME,EMERGENCY_CONTACT_INFORMATION,ACADEMIC_STANDING,PRIMARY_ADVISOR,Length,Passport_Length
68,100031581,,Falah,Alhammadi,Falah Mohamed Amer Yousuf Alhammadi,فلاح محمد عامر يوسف الحمادى,202010,2020-08-23,M,A,...,,P O Box 50655,Abu Dhabi,United Arab Emirates,AdelAl Hammadi,504460448.0,,Anas Zakaria Alazzam,15.0,9.0
135,100035244,,Abdulrahman,Agha,Abdulrahman Mohamad Agha,عبدالرحمن محمد عمر آغا,201910,2019-08-25,M,A,...,,Al Sulaimaniah,Riyadh,LEBANON,,,,Anas Zakaria Alazzam,15.0,9.0
141,100036658,,Sarah,Azzam,Sarah Kassem Azzam,ساره قاسم عزام,201720,2018-01-14,F,A,...,,,Abu Dhabi,PALESTINE,--,,,Abdulrahim Abdulrahman Sajini,15.0,6.0
175,100020345,,Maryam,Al Ali,Maryam Mohamed Abdulla Mohamed Al Ali,مريم محمد عبدالله محمد العلي,202010,2020-08-23,F,A,...,,PO Box 127788,Abu Dhabi,United Arab Emirates,,,,Maryam Rashed Abdulrahman Alkindi,15.0,9.0
209,100040178,,Amna,Alshehhi,Amna Ali Samrah Ali Alshehhi,آمنه علي صمره علي الشحي,201910,2019-08-25,F,A,...,,"Alhumaidia, Sheikh Maktoum Bin Rashed Street",Ajman,United Arab Emirates,--,,,Panagiotis Liatsis,15.0,9.0


In [151]:
df_student_missing_final.to_sql("Missing_student_details", conn)

DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': 'params' arg (<class 'list'>) can be only a tuple or a dictionary.

# 2. Student Attendance

## 2.1 Student Attendance as per Policy

Student Attendance in Khalifa University as per the policies and procedures, follow two different protocols:

- 80% and above for undergraduates
- 50% and above for graduates

For the same, we analyze the student attendance master data in the following manner:

- Identify the Total Classes in the semester
- Calculate the %age of absences using the new Total Classes figure and number of absences
- Identify discrepancies of more than 20% and registration status as "RE" (RE refers to still registered)

<div class="alert alert-block alert-warning">
<b>Example:</b>((Total Classes)/(Total Absences))*100; Highlight all exceptions</div>

In [55]:
##import master data for excel

## df_student_attendance = pd.read_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Data/Student Attendance Data.xlsx")

In [56]:
df_student_attendance.head()

Unnamed: 0,TERM_CODE,COURSE,CRN,DIVISION,COURSE_TITLE,FACULTY_ID,FACULTY_NAME,CAMPUS,STUDENT_ID,STUDENT_NAME,REG_STATUS,TOTAL_ABSENCES,TOTAL_ATTENDED
0,202120,NUCE601,21075,Nuclear Engineering,Thermal Hydraulics in Nuc Syst,100035101,Yacine Addad,H,100060563,Nouf Talib Alhattawi,RE,0.0,8.0
1,202110,NUCE602,11347,Nuclear Engineering,"Nuc. Materials, Str Int & Chem",100036155,Yongsun Yi,H,100060563,Nouf Talib Alhattawi,RE,0.0,33.0
2,202110,NUCE602,11347,Nuclear Engineering,"Nuc. Materials, Str Int & Chem",100036155,Yongsun Yi,H,100041411,Saif Khalifa Salem Alfalasi,RE,1.0,32.0
3,202120,NUCE603,21076,Nuclear Engineering,Nuclear Reactor Theory,100043529,Saeed Alameri,H,100060563,Nouf Talib Alhattawi,RE,0.0,8.0
4,202110,NUCE624,11008,Nuclear Engineering,Rad. Damage and Nuc. Fuels,100036155,Yongsun Yi,H,100035400,Yousif Yaqoub Yousif Alhosani,RW,0.0,36.0


In [152]:
df_student_attendance['Total Classes'] = df_student_attendance["TOTAL_ABSENCES"]+df_student_attendance["TOTAL_ATTENDED"]

In [153]:
df_student_attendance['Absence Percentage'] = (df_student_attendance['TOTAL_ABSENCES']/df_student_attendance['Total Classes'])*100

In [154]:
df_student_attendance_exception = df_student_attendance[(df_student_attendance['Absence Percentage']>20) & (df_student_attendance['REG_STATUS']=="RE")]

In [155]:
df_student_attendance_exception

Unnamed: 0,TERM_CODE,COURSE,CRN,DIVISION,COURSE_TITLE,FACULTY_ID,FACULTY_NAME,CAMPUS,STUDENT_ID,STUDENT_NAME,REG_STATUS,TOTAL_ABSENCES,TOTAL_ATTENDED,Total Classes,Absence Percentage
1127,202120,ENGR111,20143,Engineering Department,Engineering Design,100046297,Omar Elkhatib,H,100060408,Noura Younis Abdulla Abbas Alkhouri,RE,2.0,6.0,8.0,25.0
1441,202120,ENGR111,20154,Engineering Department,Engineering Design,100046227,Valerie Eveloy,H,100061255,Meera Khamis Bati Abdulla Almheiri,RE,2.0,6.0,8.0,25.0
1488,202120,ENGR111,21218,Engineering Department,Engineering Design,100059712,Sanjana Chandran,H,100059774,Khalifa Abdalla Khalifa Salman Allahaf,RE,2.0,6.0,8.0,25.0
1506,202120,ENGR111,21218,Engineering Department,Engineering Design,100059712,Sanjana Chandran,H,100060157,Abdulla Jasim Mohamed Abdulla Alhosani,RE,3.0,5.0,8.0,37.5
1510,202120,ENGR111,21218,Engineering Department,Engineering Design,100059712,Sanjana Chandran,H,100059157,Rayan Ali Ahmed Ali Alhefeiti,RE,2.0,6.0,8.0,25.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
37618,202142,MDBS709,40010,Medicine and Health Sciences,Nervous System,100057466,Israel Alfonso,A,100058299,Abdul Rahman Walid Taha,RE,3.0,7.0,10.0,30.0
38355,202120,ECCE427,20837,Electrical Engr & Computer Sci,Power System Protection,100046212,Khaled Al Jaafari,A,100044948,Omar Rashed Humaid Saeed Alshamsi,RE,3.0,5.0,8.0,37.5
40463,202120,PHYS122,20432,Physics,University Physics 2,100052467,Taha Ismail,H,100053616,Rashed Jamal Mohamed Khamis Aldosari,RE,1.0,3.0,4.0,25.0
41041,202120,COSC434,21182,Electrical Engr & Computer Sci,Intro to Machine Learning,100037703,Ahmed Alhammadi,A,100049821,Mohammad Kamal Ali Almaqadmeh,RE,1.0,3.0,4.0,25.0


In [156]:
## write into excel; if required

## df_student_attendance_exception.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Attendance Exception.xlsx")

In [157]:
df_student_attendance_exception.to_sql("Student_Attendance", conn)

  sql.to_sql(


DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': 'params' arg (<class 'list'>) can be only a tuple or a dictionary.

# 3. Student Courses

The below mentioned data source is imported to analyze the student courses within Khalifa University

In [158]:
## import master data for excel

## df_student_courses = pd.read_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Data/Student Course Data.xlsx")

In [159]:
df_student_courses_schedule

NameError: name 'df_student_courses_schedule' is not defined

## 3.1 Student Max Enrollment

Each course in Khalifa University has a maximum alloted number of students based on classroom size and in compliance with CAA standards to maintain student to faculty ratio.

As per the below analysis, we identified the maximum number of students as per a course (MAX_ENROLLMENT_ALLOWED_CRN) versus the number of students registered (NBR_REGISTERED_CRN) for the semester.

All exceptions are noted and identified

<div class="alert alert-block alert-warning">
<b>Example:</b>(Maximum number of students enrolled in a CRN) - (Number of students registered in a CRN); Highlight all exceptions</div>

In [160]:
df_student_courses_schedule['Above Max Enrollment'] = df_student_courses_schedule['MAX_ENROLLMENT_ALLOWED_CRN'] - df_student_courses_schedule['NBR_REGISTERED_STUDENTS_CRN'] 

NameError: name 'df_student_courses_schedule' is not defined

In [161]:
df_student_courses_max = df_student_courses_schedule[df_student_courses_schedule['Above Max Enrollment'] <0]

NameError: name 'df_student_courses_schedule' is not defined

In [162]:
df_student_courses_max

NameError: name 'df_student_courses_max' is not defined

In [163]:
## write into excel; if required

## df_student_courses_max.to_excel('C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Above Max Enrollment.xlsx')

In [164]:
df_student_courses_max.to_sql("Above_Max_Courses", conn)

NameError: name 'df_student_courses_max' is not defined

## 3.2 Scheduling of Courses

The below code analyzes the student courses schedule the same instructor / instructors during the same slot on multiple different days / periods. However, on analysis the output below showcases due to the multiple instructors assigned for each individual course, the exceptions are not valid.

As a recommendation, the output should be analyzed prior to reporting instances.

In [165]:
df_student_courses_schedule = df_student_courses

NameError: name 'df_student_courses' is not defined

In [166]:
df_student_courses_schedule_exception = df_student_courses_schedule[(df_student_courses_schedule.duplicated(['INSTRUCTOR_NAME','SCHEDULE_START_TIME','SCHEDULE_END_TIME','SUN']) | 
                                                                    df_student_courses_schedule.duplicated(['INSTRUCTOR_NAME','SCHEDULE_START_TIME','SCHEDULE_END_TIME','MON']) |
                                                                    df_student_courses_schedule.duplicated(['INSTRUCTOR_NAME','SCHEDULE_START_TIME','SCHEDULE_END_TIME','TUE']) |
                                                                    df_student_courses_schedule.duplicated(['INSTRUCTOR_NAME','SCHEDULE_START_TIME','SCHEDULE_END_TIME','WED']) |
                                                                    df_student_courses_schedule.duplicated(['INSTRUCTOR_NAME','SCHEDULE_START_TIME','SCHEDULE_END_TIME','THU']))
                                                                    & df_student_courses_schedule['SUN'].notna()]

NameError: name 'df_student_courses_schedule' is not defined

In [167]:
df_student_courses_schedule_exception

NameError: name 'df_student_courses_schedule_exception' is not defined

In [168]:
## write into excel; if required

## df_student_courses_schedule_exception.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Student Courses.xlsx")

In [169]:
df_student_courses_schedule_exception.to_sql("Duplicate_Course_Schedule", conn)

NameError: name 'df_student_courses_schedule_exception' is not defined

# 4. Human Resources

The below code analyzes the duplicate employee accounts, emirates ID, employee leaves, bank details and missing information for employees. For each analysis the scripts generate different outputs stored as multiple links. Below are the in-built logic drivers:

1. Employee accounts - duplicates identified based on employee number and person ID
2. Employee leaves - identify all types of leaves availed for more than 60 day period. The output should be analyzed for consistency as maternity leave can vary
3. Bank details - duplicate bank details / incorrect bank details for employees are highlighted
4. Missing information - key missing information across the HR data is highlighted such as emirates id, contract date etc. for active employees as per the data source
5. Contract date - to analyze if any active employees are working in Khalifa University without valid contracts

In [170]:
## import master data for excel

## df_employee_master = pd.read_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Data/HR Master Data.xlsx")

In [171]:
df_employee_master

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,Department,DepartmentAR,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End
0,100043305,308866.0,Shaima Ahmad Abdulaziz Ahli,شيماء احمد عبدالعزيز اهلى,STAFF.2020208,STAFF,2020208,موظف,Specialist,Off Chart - Marketing and Communication,...,Marketing and Communication,إدارة التسويق والاتصال,Office of the Executive Vice President,مكتب نائب الرئيس التنفيذي,President Office,مكتب الرئيس,,KU100,Arif Sultan Al Hammadi,31-Dec-2021
1,100046247,309197.0,Chong Un Pyon,تشونج اون بيون,Adjunct Faculty.1062,Adjunct Faculty,1062,Adjunct Faculty,Adjunct Faculty,Humanities and Social Sciences,...,Humanities and Social Sciences,العلوم الإنسانية والإجتماعية,College of Arts and Science,كلية الاداب و العلوم,Academic Sector,القطاع الأكاديمي,College of Arts and Science,KU103,David Sheehan,31-Dec-2022
2,100046248,309466.0,Ayoung Sohn,ايونج سوهن,Adjunct Faculty.1061,Adjunct Faculty,1061,Adjunct Faculty,Adjunct Faculty,Humanities and Social Sciences,...,Humanities and Social Sciences,العلوم الإنسانية والإجتماعية,College of Arts and Science,كلية الاداب و العلوم,Academic Sector,القطاع الأكاديمي,College of Arts and Science,KU103,David Sheehan,31-Dec-2022
3,100049466,778886.0,Akihide Hidaka,أكيهايد هيديكا,Adjunct Faculty.878,Adjunct Faculty,878,Adjunct Faculty,Adjunct Faculty,Nuclear Engineering Institute,...,Nuclear Engineering Institute,الهندسة النووية,College of Engineering,كلية الهندسة,Academic Sector,القطاع الأكاديمي,College of Engineering,KU218,Hassan Reda Barada,31-May-2022
4,100049994,819498.0,Riyazdheen Kaffar,رياضدين كفار,Driver.5042,Driver,5042,سائق,Driver,College of Medicine & Health Sciences,...,College of Medicine & Health Sciences,كلية الطب,College of Medicine & Health Sciences,كلية الطب,Academic Sector,القطاع الأكاديمي,College of Medicine & Health Sciences,KU104,John Aubrey Rock,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1259,PI4182,625335.0,Sawsan Hussain Mohammadi,سوسن حسين محمدي,STAFF.2020263,STAFF,2020263,موظف,Specialist,"Off Chart - Administration, Facilities and EHS",...,"Administration, Facilities and EHS",إدارة الشؤون الإدارية والمرافق ، البيئة و الص...,"Administration, Facilities and EHS",إدارة الشؤون الإدارية والمرافق ، البيئة و الص...,Executive Management Sector,,,KU947,Adnan Jasem Yaqoob AlMansoori,31-Dec-2021
1260,PI4204,625079.0,Basem Al Shaabi,باسم سيف محمد الشعبي,STAFF.2020058,STAFF,2020058,موظف,Specialist,Off Chart - Academic and Student Services,...,Academic and Student Services,قطاع الخدمات الطلابية والاكاديميه,Academic and Student Services,قطاع الخدمات الطلابية والاكاديميه,Academic and Student Services,قطاع الخدمات الطلابية والاكاديميه,,KU105,Ahmed Al Shoaibi,31-Dec-2021
1261,PI4223,624922.0,Ayesha Abdulla Al Zaabi,عائشة عبدالله سليمان الزعابي,STAFF.2020097,STAFF,2020097,موظف,Specialist,"Off Chart - Administration, Facilities and EHS",...,"Administration, Facilities and EHS",إدارة الشؤون الإدارية والمرافق ، البيئة و الص...,"Administration, Facilities and EHS",إدارة الشؤون الإدارية والمرافق ، البيئة و الص...,Executive Management Sector,,,KU947,Adnan Jasem Yaqoob AlMansoori,31-Dec-2021
1262,PI4227,625073.0,Abdulla Al Hosani,عبدالله سليمان عبدالله الحوسني,STAFF.2020098,STAFF,2020098,موظف,Specialist,"Off Chart - Administration, Facilities and EHS",...,"Administration, Facilities and EHS",إدارة الشؤون الإدارية والمرافق ، البيئة و الص...,"Administration, Facilities and EHS",إدارة الشؤون الإدارية والمرافق ، البيئة و الص...,Executive Management Sector,,,KU947,Adnan Jasem Yaqoob AlMansoori,31-Dec-2021


## 4.1 Duplicate Emirates ID

In [172]:
df_employee_EmiratesID = df_employee_master[df_employee_master.duplicated(['EmiratesID'])]

In [173]:
df_employee_EmiratesID

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,Department,DepartmentAR,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End
68,KU1058,1563659.0,Nurlan Akhmetov,نورلان احمدوف,CORE Lab Specialist.1010972,CORE Lab Specialist,1010972,اخصائي مختبر,Specialist,ACBC Core Labs,...,Research Laboratories,مختبرات البحوث,Research and Development,قطاع البحوث والتطوير,Research and Development,قطاع البحوث والتطوير,,,,09-Jan-2025
69,KU1059,1563078.0,Michael Hughes,مايكل هيوز,Professor.1011033,Professor,1011033,استاذ جامعي,Professor,Biomedical Engineering,...,Biomedical Engineering,الهندسة الطبية الحيوية,College of Engineering,كلية الهندسة,Academic Sector,القطاع الأكاديمي,College of Engineering,KU218,Hassan Reda Barada,31-Dec-2024
72,KU1061,1563115.0,Alessandro Giacomo Maria Gardi,أليساندرو ا غاردي,Assistant Professor.1011079,Assistant Professor,1011079,أستاذ جامعي مساعد,Assistant Professor,Aerospace Engineering,...,Aerospace Engineering,هندسة الفضاء,College of Engineering,كلية الهندسة,Academic Sector,القطاع الأكاديمي,College of Engineering,KU218,Hassan Reda Barada,31-Dec-2024
602,KU500358,1541516.0,Deepa Kishor Dumbre,ديبا كيشور دومبر,Research Scientist.3823,Research Scientist,3823,باحث علمي,Research Scientist,Temp Researcher Internal Fund,...,Temp Researcher Internal Fund,الأبحات من التمويل الداخلي,Temp Researcher Internal Fund,الأبحات من التمويل الداخلي,Research and Development,قطاع البحوث والتطوير,,KU106,Steven Wesley Griffiths,30-Jun-2023
638,KU500435,1563750.0,Israr Uddin,إصرار الدين,Post Doctoral Fellow.3929,Post Doctoral Fellow,3929,زمالة ما بعد الدكتوراه,Post Doctoral Fellow,Temp Researcher External Fund,...,Temp Researcher External Fund,الأبحاث من المنح الخارجية,Temp Researcher External Fund,الأبحاث من المنح الخارجية,Research and Development,قطاع البحوث والتطوير,,KU106,Steven Wesley Griffiths,31-Dec-2022
677,KU500493,1515591.0,Brian David Campos,بريان دافيد كامبوس,Research Associate.3796,Research Associate,3796,باحث مشارك,Research Associate,Temp Researcher Internal Fund,...,Temp Researcher Internal Fund,الأبحات من التمويل الداخلي,Temp Researcher Internal Fund,الأبحات من التمويل الداخلي,Research and Development,قطاع البحوث والتطوير,,KU106,Steven Wesley Griffiths,31-Oct-2022
753,KU500582,1556097.0,Pawan Verma,باوان فيرما,Post Doctoral Fellow.4320,Post Doctoral Fellow,4320,زمالة ما بعد الدكتوراه,Post Doctoral Fellow,Temp Researcher External Fund,...,Temp Researcher External Fund,الأبحاث من المنح الخارجية,Temp Researcher External Fund,الأبحاث من المنح الخارجية,Research and Development,قطاع البحوث والتطوير,,KU106,Steven Wesley Griffiths,19-Jun-2022
758,KU500589,1555266.0,Bhivraj Suthar,بهيفراج سوثار,Post Doctoral Fellow.8800119,Post Doctoral Fellow,8800119,زمالة ما بعد الدكتوراه,Post Doctoral Fellow,Temp Researcher Internal Fund,...,Temp Researcher Internal Fund,الأبحات من التمويل الداخلي,Temp Researcher Internal Fund,الأبحات من التمويل الداخلي,Research and Development,قطاع البحوث والتطوير,,KU106,Steven Wesley Griffiths,30-Jun-2023
766,KU500597,1563092.0,Mariyam Khalid,مريم خالد,Post Doctoral Fellow.8800117,Post Doctoral Fellow,8800117,زمالة ما بعد الدكتوراه,Post Doctoral Fellow,Temp Researcher Internal Fund,...,Temp Researcher Internal Fund,الأبحات من التمويل الداخلي,Temp Researcher Internal Fund,الأبحات من التمويل الداخلي,Research and Development,قطاع البحوث والتطوير,,KU106,Steven Wesley Griffiths,30-Jun-2023
770,KU500601,1555283.0,Abdul Latif,عبد الطيف,Research Associate.4203,Research Associate,4203,باحث مشارك,Research Associate,Temp Researcher Internal Fund,...,Temp Researcher Internal Fund,الأبحات من التمويل الداخلي,Temp Researcher Internal Fund,الأبحات من التمويل الداخلي,Research and Development,قطاع البحوث والتطوير,,KU106,Steven Wesley Griffiths,06-Dec-2022


In [174]:
## write into excel; if required 

## df_employee_EmiratesID.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Duplicate Employeese Emirates ID.xlsx")

In [175]:
df_employee_master['Length'] = df_employee_master['EmiratesID'].str.len()

In [176]:
df_employee_EID = df_employee[(df_employee['Length'] < 15) | (df_employee['Length'] > 15)]

NameError: name 'df_employee' is not defined

In [177]:
df_employee_EID

NameError: name 'df_employee_EID' is not defined

In [178]:
## write into excel; if required

## df_employee_EID.to_excel("C:/Users/prabhjotsingh3/OneDrive - KPMG/Documents/2021 Projects/Khalifa University/Project/Analysis/Invalid Emirates ID.xlsx")

## 4.2 Employee ID

Employee ID are verified to ensure no two employees are under the same employee ID.

For the same, pandas library is utilized to automatically find the duplicated entries within the master data on "Emp No" (Employee ID) and "PersonID" (Old Employee ID).

In [179]:
df_employee_employeeID = df_employee_master[df_employee.duplicated(['EmpNo'])]

NameError: name 'df_employee' is not defined

In [180]:
df_employee_employeeID

NameError: name 'df_employee_employeeID' is not defined

In [181]:
df_employee_personID = df_employee_master[df_employee_master.duplicated(['PersonID'])]

In [182]:
df_employee_personID

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,DepartmentAR,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End,Length


## 4.3 Contract Date

The below scripts analyzes the continuance of emplyoee beyond the contract period and incase any employees are active in the system beyond the contracted period.

Employee master contains two columnn namely "Contract End" (end date of contract) and "HireDate" (date of joining / continuance). By analyzing the difference between the columns, all employees beyond contracted period can be analyzed.

In [183]:
df_employee_master['Diff'] = df_employee_master['Contract_End'] - df_employee_master['HireDate'] 

TypeError: cannot subtract DatetimeArray from ndarray

In [184]:
df_employee_master['Diff']

KeyError: 'Diff'

In [185]:
df_employee_contract = df_employee_master[(df_employee_master['Diff'] < '365')]

KeyError: 'Diff'

In [186]:
df_employee_contract

NameError: name 'df_employee_contract' is not defined

## 4.4 Employee Leaves

Employee Leaves are granted on an annual basis to each employee. Below are the exceptions that can be identified for employee who have availed a leave for beyond 60 day period for any leave type, i.e., Annual Leave, Unpaid Leave etc.

In [187]:
df_employee_leave = pd.read_excel("C:/Users/ku1016/Downloads/Employee Leave Data.xlsx")
df_employee_leave.head(5)

FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/ku1016/Downloads/Employee Leave Data.xlsx'

In [188]:
df_employee_leave['Diff'] = df_employee_leave['LeaveEndDate'] - df_employee_leave['LeaveStartDate']
df_employee_leave.head(5)

NameError: name 'df_employee_leave' is not defined

In [189]:
df_employee_leave_exception = df_employee_leave[(df_employee_leave['Diff'] > '60 days')]
df_employee_leave_exception

NameError: name 'df_employee_leave' is not defined

In [190]:
df_employee_leave_exception.to_sql("Employee_Leaves_Exception", conn)

NameError: name 'df_employee_leave_exception' is not defined

## 4.5 Missing Employee Information

All active employees should have complete information as per the HR records and protocol within the master data. The below script analyzes the missing information to ensure completeness of the data.

To identify the missing values, the pandas libraries for isnull() is utilized and specifies any axis (column) to target all missing values in any column.

In [191]:
## df_employee_master = pd.read_excel("C:/Users/ku1016/Downloads/HR Employee Data.xlsx")
## df_employee_master.head(5)

In [192]:
df_employee_master.isnull()

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,DepartmentAR,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End,Length
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,True,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1259,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,True,True,False,False,False,False
1260,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,True,False,False,False,False
1261,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,True,True,False,False,False,False
1262,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,True,True,False,False,False,False


In [193]:
df_employee_missing = df_employee_master[df_employee_master.isnull().any(axis=1)]
df_employee_missing.head(5)

Unnamed: 0,EmpNo,PersonID,EmployeeName,EmployeeArabicName,Position,PositionName,PositionCode,PositionAR,Job,OrgUnit,...,DepartmentAR,Division,DivisionAR,Sector,SectorAR,College,VPID,VPName,Contract_End,Length
0,100043305,308866.0,Shaima Ahmad Abdulaziz Ahli,شيماء احمد عبدالعزيز اهلى,STAFF.2020208,STAFF,2020208,موظف,Specialist,Off Chart - Marketing and Communication,...,إدارة التسويق والاتصال,Office of the Executive Vice President,مكتب نائب الرئيس التنفيذي,President Office,مكتب الرئيس,,KU100,Arif Sultan Al Hammadi,31-Dec-2021,15
1,100046247,309197.0,Chong Un Pyon,تشونج اون بيون,Adjunct Faculty.1062,Adjunct Faculty,1062,Adjunct Faculty,Adjunct Faculty,Humanities and Social Sciences,...,العلوم الإنسانية والإجتماعية,College of Arts and Science,كلية الاداب و العلوم,Academic Sector,القطاع الأكاديمي,College of Arts and Science,KU103,David Sheehan,31-Dec-2022,15
2,100046248,309466.0,Ayoung Sohn,ايونج سوهن,Adjunct Faculty.1061,Adjunct Faculty,1061,Adjunct Faculty,Adjunct Faculty,Humanities and Social Sciences,...,العلوم الإنسانية والإجتماعية,College of Arts and Science,كلية الاداب و العلوم,Academic Sector,القطاع الأكاديمي,College of Arts and Science,KU103,David Sheehan,31-Dec-2022,15
3,100049466,778886.0,Akihide Hidaka,أكيهايد هيديكا,Adjunct Faculty.878,Adjunct Faculty,878,Adjunct Faculty,Adjunct Faculty,Nuclear Engineering Institute,...,الهندسة النووية,College of Engineering,كلية الهندسة,Academic Sector,القطاع الأكاديمي,College of Engineering,KU218,Hassan Reda Barada,31-May-2022,15
4,100049994,819498.0,Riyazdheen Kaffar,رياضدين كفار,Driver.5042,Driver,5042,سائق,Driver,College of Medicine & Health Sciences,...,كلية الطب,College of Medicine & Health Sciences,كلية الطب,Academic Sector,القطاع الأكاديمي,College of Medicine & Health Sciences,KU104,John Aubrey Rock,,15


In [194]:
df_employee_missing.to_sql("Missing_employee_info". conn)

AttributeError: 'str' object has no attribute 'conn'

## 4.6 Employee bank details and account number

For any bank details, no two employees can share the same bank details and the length of IBAN should remain constant number of digits as per the international IBAN standards (23 digits). 

Below are the scripts to analyze both scenarios.

In [195]:
## df_employee_bank_details = pd.read_excel("C:/users/ku1016/downloads/Employee Bank Details.xlsx")

In [196]:
## df_employee_bank_details.head(5)

In [197]:
df_employee_bank_details_duplicated = df_employee_bank_details[df_employee_bank_details.duplicated(['IBAN'])]

In [198]:
df_employee_bank_details_duplicated.to_sql("Duplicatee_employee_bank", conn)

DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': 'params' arg (<class 'list'>) can be only a tuple or a dictionary.

In [199]:
df_employee_bank_details['Length of IBAN'] = df_employee_bank_details['IBAN'].str.len()
df_employee_bank_details.head(5)

Unnamed: 0,PERSONID,BANKSTARTDATE,BANKENDDATE,BANKPRIORITY,EMPLOYEENUMBER,ORG_PAYMENT_METHOD,BANKID,BANKNAME,BRANCHNAME,IBAN,ACCOUNTNO,CREATEDBY,LASTUPDATEDBY,LASTUPDATEDATE,LASTUPDATEDATE1,Length of IBAN
0,1482073.0,2020-12-15,4712-12-31 00:00:00,,100058647,KUK Bank Transfer - KU,,,,,,188488.0,188488.0,2020-12-15 07:31:13,2022-02-13 23:31:18.187,
1,635927.0,2017-12-31,2017-12-31 00:00:00,,KU500095,KUK Bank Transfer - KU,,,,AE,,0.0,188569.0,2018-03-05 11:08:36,2022-02-13 23:31:18.187,2.0
2,635999.0,2017-12-31,2017-12-31 00:00:00,,KU500183,KUK Bank Transfer - KU,,,,AE,,0.0,188569.0,2018-03-05 11:09:51,2022-02-13 23:31:18.187,2.0
3,636017.0,2017-12-31,2017-12-31 00:00:00,,KU500082,KUK Bank Transfer - KU,,,,AE,,0.0,188569.0,2018-03-05 11:11:43,2022-02-13 23:31:18.187,2.0
4,635983.0,2017-12-31,2017-12-31 00:00:00,,KU500078,KUK Bank Transfer - KU,,,,AE,,0.0,188569.0,2018-03-05 11:12:46,2022-02-13 23:31:18.187,2.0


In [200]:
df_employee_bank_details_length = df_employee_bank_details[(df_employee_bank_details['Length of IBAN']<23)]

In [201]:
df_employee_bank_details_length.to_sql("Incorrect_employee_IBAN", conn)

  sql.to_sql(


DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': 'params' arg (<class 'list'>) can be only a tuple or a dictionary.

# Finance and Procurement

## 5.1 Supplier Master Analysis

The following are analyzed for the supplier master:

- Duplicate supplier codes for the same supplier
- Duplicate bank details for different suppliers
- Duplicate TRN (Tax registration number) for different suppliers

In [202]:
## df_supplier_master = pd.read_excel("C:/users/ku1016/downloads/Supplier Master.xlsx")
## df_supplier_master.head(5)

In [203]:
df_supplier_master_duplicate = df_supplier_master[df_supplier_master.duplicated(['OPERATING_UNIT','SUPPLIER_NO'])]
df_supplier_master_duplicate

KeyError: Index(['OPERATING_UNIT'], dtype='object')

In [204]:
df_supplier_master_duplicate.to_sql("Duplicate_suppliers_same_entity", conn)

NameError: name 'df_supplier_master_duplicate' is not defined

The below output will require analysis to ensure the TRN duplicates are based on different suppliers and not the same supplier. The reason for the analysis is due to the fact Khalifa University registers the same vendor for the same entity twice based on minor tweaks.

As a recommendation, Internal Audit should recommend to establish 1 vendor across Khalifa University or remove the duplicated vendors registered multiple times for the same entity.

In [None]:
df_supplier_master_TRN = df_supplier_master[df_supplier_master.duplicated(['TAX_REGISTRATION_NUM','OPERATING_UNIT','CITY'])]
df_supplier_master_TRN

Unnamed: 0,OPERATING_UNIT,REGISTER_MODE,SUPPLIER_NO,SUP_CREATION_DATE,SUPPLIER_INACTIVE_DATE,SUPPLIER_NAME,SUPPLIER_CATEGORY,TOTAL_APPROVE_PO_CNT,TOT_APPROVED_BPA_AMT,LAST_SUP_PO_REL_DATE,...,ADDRESS_LINE1,ADDRESS_LINE2,ADDRESS_LINE3,ADDRESS_LINE4,POSTAL_CODE,CITY,COUNTRY,SUP_PHONE_NUM,SUP_FAX_NUM,SUP_EMAIL_ADDRESS
43,KUX - External Khalifa University O.U,Manual,17,2010-06-01 19:13:00,NaT,AL FUTTAIM ELECTRONICS CO. (ABU DHABI) L.L.C.,VENDOR,0,,NaT,...,ABU DHABI,,,,73618,ABU DHABI,United Arab Emirates,,,
44,KUA - Ankabout Khalifa University,Manual,17,2010-06-01 19:13:00,NaT,AL FUTTAIM ELECTRONICS CO. (ABU DHABI) L.L.C.,VENDOR,0,,NaT,...,ABU DHABI,,,,73618,ABU DHABI,United Arab Emirates,,,
45,KUE - Ebitic Khalifa University OU,Manual,17,2010-06-01 19:13:00,NaT,AL FUTTAIM ELECTRONICS CO. (ABU DHABI) L.L.C.,VENDOR,0,,NaT,...,ABU DHABI,,,,73618,ABU DHABI,United Arab Emirates,,,
46,KUK - Khalifa University Ledger OU,Manual,17,2010-06-01 19:13:00,NaT,AL FUTTAIM ELECTRONICS CO. (ABU DHABI) L.L.C.,VENDOR,0,,NaT,...,ABU DHABI,,,,73618,ABU DHABI,United Arab Emirates,,,
49,KUJ - Aric Khalifa University O.U,Manual,17,2010-06-01 19:13:00,NaT,AL FUTTAIM ELECTRONICS CO. (ABU DHABI) L.L.C.,VENDOR,0,,NaT,...,MINA ZAYED- ABU DHABI,,,,6885,ABU DHABI,United Arab Emirates,02-6819539,02-6815501,techserveOA.ABD@alfuttaim.ae
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63035,KUADRIC OU,Manual,104903,2021-10-10 12:23:11,NaT,https://uae.microless.com/,,0,,NaT,...,ABU DHABI,,,127788,,ABU DHABI,United Arab Emirates,,,
63036,KUGRC OU,Manual,104903,2021-10-10 12:23:11,NaT,https://uae.microless.com/,,0,,NaT,...,ABU DHABI,,,127788,,ABU DHABI,United Arab Emirates,,,
63037,KUST Khalifa University of Science and Technol...,Manual,104903,2021-10-10 12:23:11,NaT,https://uae.microless.com/,,0,,NaT,...,ABU DHABI,,,127788,,ABU DHABI,United Arab Emirates,,,
63038,KUE - Ebitic Khalifa University OU,Manual,104903,2021-10-10 12:23:11,NaT,https://uae.microless.com/,,0,,NaT,...,ABU DHABI,,,127788,,ABU DHABI,United Arab Emirates,,,


In [None]:
df_supplier_master_TRN.to_sql("TRN_supplier_duplicates", conn)

The bank related details are not part of the master supplier list and has been read from the excel file shared. In case the supplier master contains the bank details, **the code will only be required to be modifie from "df_supplier_bank" to "df_supplier_master"**.

In [None]:
df_supplier_bank = pd.read_excel("C:/users/ku1016/downloads/Supplier bank detail.xlsx")
df_supplier_bank.head(5)

Unnamed: 0,Supplier No.,Supplier Name,Registered Method,Vendor Type,Commercial Licence No,Commercial Licence Expiry Date,Tax Registration No.,Country,Supplier Created By,Supplier Creation Date,...,Business Class Last Update Date,Organization Name,Bank Account,Bank Account Name,Bank Acc Title,Iban,Bank Name,Branch Name,Country.1,End Date
0,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,100317020003,ABU DHABI COMMERCIAL BANK,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,AE230030000100317020003,ABU DHABI COMMERCIAL BANK,DUBAI RIQQA,United Arab Emirates,NaT
1,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,1005612298,ADCB,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,,ABU DHABI COMMERCIAL BANK,DUBAI RIQQA,United Arab Emirates,2011-11-15
2,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,3001-043002-311,ARAB BANK,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,,ARAB BANK PLC,DEIRA/DUBAI,United Arab Emirates,2013-03-21
3,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,3002-043002-311,ARAB BANK,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,,ARAB BANK PLC,AL AIN,United Arab Emirates,2013-03-21
4,100006,GILSON COMPANY INC.,Manual,VENDOR,,,310961077,United States,سهيل احمد خان غلام سروار خان,2020-11-16,...,NaT,KUST Khalifa University of Science and Technol...,1306992777,CNB BANK,,,CNB BANK,CLEARFIELD,United States,NaT


The output will require analysis to ensure the IBANs are for different suppliers, since at Khalifa University **multiple active accounts are valid for same supplier with same bank details, same bank account number and same address**.

In [None]:
df_supplier_bank_duplicate = df_supplier_bank[(df_supplier_bank.duplicated(["Iban", "Bank Account"])) & (df_supplier_bank['Iban'].notna())]
df_supplier_bank_duplicate

Unnamed: 0,Supplier No.,Supplier Name,Registered Method,Vendor Type,Commercial Licence No,Commercial Licence Expiry Date,Tax Registration No.,Country,Supplier Created By,Supplier Creation Date,...,Business Class Last Update Date,Organization Name,Bank Account,Bank Account Name,Bank Acc Title,Iban,Bank Name,Branch Name,Country.1,End Date
500,103518,KHADJETOU JED,Manual,BENIFICIARY,,,INDIVIDUAL -103518,United Arab Emirates,نجود غانم سالم سعيد الصيعري,2021-08-01,...,NaT,KUST Khalifa University of Science and Technol...,3707596970002,HSBC,KHADJETOU JED,AE360340003707596970002,HSBC BANK MIDDLE EAST,ABU DHABI RASHID MAKTOUM ST,United Arab Emirates,2021-09-08
1012,12155,AL DHAFRA PRIVATE SCHOOL,Manual,SCHOOL,CN-1003266,2023-11-09 00:00:00,100310305600003-4,United Arab Emirates,,2010-06-23,...,2021-01-18 12:58:38,KUST Khalifa University of Science and Technol...,012063581391,UNB,,AE510450000012063581391,UNION NATIONAL BANK,ABU DHABI SALAM BRANCH,United Arab Emirates,NaT
1013,12155,AL DHAFRA PRIVATE SCHOOL,Manual,SCHOOL,CN-1003266,2023-11-09 00:00:00,100310305600003-4,United Arab Emirates,,2010-06-23,...,2021-01-18 12:58:38,KUST Khalifa University of Science and Technol...,1051001004868027,Al Dhafra Secondry Private School,,AE080271051001004868027,FIRST GULF BANK,AL AIN KHALIFA ST,United Arab Emirates,NaT
1014,12155,AL DHAFRA PRIVATE SCHOOL,Manual,SCHOOL,CN-1003266,2023-11-09 00:00:00,100310305600003-4,United Arab Emirates,,2010-06-23,...,2021-01-18 12:58:38,KUST Khalifa University of Science and Technol...,1051001004868027,FAB,,AE020351051001004868027,FIRST ABU DHABI BANK,ABU DHABI KHALIFA ST,United Arab Emirates,NaT
1015,12155,AL DHAFRA PRIVATE SCHOOL,Manual,SCHOOL,CN-1003266,2023-11-09 00:00:00,100310305600003-4,United Arab Emirates,,2010-06-23,...,2021-01-18 12:58:38,KUST Khalifa University of Science and Technol...,4021003307897811,FAB,,AE450354021003307897811,FIRST ABU DHABI BANK,ABU DHABI KHALIFA ST,United Arab Emirates,2020-05-28
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15859,99020,Mohamed Ibrahim Hassan Ali,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-08-31,...,NaT,KUST Khalifa University of Science and Technol...,24392765,ADIB,,AE050500000000024392765,ABU DHABI ISLAMIC BANK,ABU DHABI BANIYAS ST,United Arab Emirates,NaT
15860,99020,Mohamed Ibrahim Hassan Ali,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-08-31,...,NaT,KUST Khalifa University of Science and Technol...,24392765,ADIB,,AE050500000000024392765,ABU DHABI ISLAMIC BANK,ABU DHABI BANIYAS ST,United Arab Emirates,NaT
15864,9906,EDUTECH MIDDLE EAST (L.L.C.),Manual,VENDOR,225130,2018-06-05 00:00:00,225130-DUBAI,United Arab Emirates,,2010-06-23,...,2017-06-05 11:44:21,KUST Khalifa University of Science and Technol...,258955319001,ADCB,EDUTECH MIDDLE EAST (L.L.C.),AE800030000258955319001,ABU DHABI COMMERCIAL BANK,DUBAI AL MEENA ROAD,United Arab Emirates,NaT
15896,99407,EMILIO PORCU,Manual,BENIFICIARY,,,INDIAVIDUAL- 99407,United Arab Emirates,نجود غانم سالم سعيد الصيعري,2020-10-01,...,NaT,KUST Khalifa University of Science and Technol...,12231395001,HSBC,EMILIO PORCU,AE120200000012231395001,HSBC BANK MIDDLE EAST,ABU DHABI RASHID MAKTOUM ST,United Arab Emirates,NaT


In [None]:
df_supplier_duplicate.to_sql("IBAN_supplier", conn)

## 5.2 Missing information for suppliers

All active vendors at Khalifa University are required to have valid IBAN, address and TRN numbers at a minimum. Below are the analysis:

In [None]:
df_supplier_missing_iban = df_supplier_bank[df_supplier_bank['Iban'].isnull()]
df_supplier_missing_iban

Unnamed: 0,Supplier No.,Supplier Name,Registered Method,Vendor Type,Commercial Licence No,Commercial Licence Expiry Date,Tax Registration No.,Country,Supplier Created By,Supplier Creation Date,...,Business Class Last Update Date,Organization Name,Bank Account,Bank Account Name,Bank Acc Title,Iban,Bank Name,Branch Name,Country.1,End Date
1,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,1005612298,ADCB,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,,ABU DHABI COMMERCIAL BANK,DUBAI RIQQA,United Arab Emirates,2011-11-15
2,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,3001-043002-311,ARAB BANK,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,,ARAB BANK PLC,DEIRA/DUBAI,United Arab Emirates,2013-03-21
3,1,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,Manual,VENDOR,CN-1003984,2020-10-01 00:00:00,100072684200003-10,United Arab Emirates,,2010-06-01,...,2020-01-20 07:54:50,KUST Khalifa University of Science and Technol...,3002-043002-311,ARAB BANK,AL FUTTAIM MOTOR COMPANY L.L.C. - ABU DHABI,,ARAB BANK PLC,AL AIN,United Arab Emirates,2013-03-21
4,100006,GILSON COMPANY INC.,Manual,VENDOR,,,310961077,United States,سهيل احمد خان غلام سروار خان,2020-11-16,...,NaT,KUST Khalifa University of Science and Technol...,1306992777,CNB BANK,,,CNB BANK,CLEARFIELD,United States,NaT
8,100023,SYSTEMS TECHNOLOGY INC,Manual,,2657,,95-1957989,United States,مجيد حسين طلحه محمد,2020-11-18,...,2020-11-18 10:04:17,KUST Khalifa University of Science and Technol...,546343200,PACIFIC WESTERN BANK,,,PACIFIC WESTERN BANK,1025 W 190TH STREET,United States,NaT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15954,99893,HUAIYIN INSTITUTE OF TECHNOLOGY,Manual,,,,NO VAT - 99893,China,اياراكات عبدالمجيد,2020-11-05,...,NaT,KUST Khalifa University of Science and Technol...,536568825651,BANK OF CHINA,HUAIYIN INSTITUTE OF TECHNOLOGY,,BANK OF CHINA,JIANGSU BRANCH,China,NaT
15960,99899,THE RECTOR AND VISITORS OF THE UNIVERSITY OF V...,Manual,,,,NO VAT - 99899,United States,اياراكات عبدالمجيد,2020-11-05,...,NaT,KUST Khalifa University of Science and Technol...,004117975749,BANK OF AMERICA,THE RECTOR AND VISITORS OF THE UNIVERSITY OF V...,,BANK OF AMERICA,NEW YORK,United States,NaT
15962,99912,DUKE UNIVERSITY,Manual,,,,NO VAT -99912,United States,اياراكات عبدالمجيد,2020-11-08,...,NaT,KUST Khalifa University of Science and Technol...,2000048265067,"WELLS FARGO BANK, N.A,",DUKE UNIVERSITY,,"WELLS FARGO BANK, N.A,",301 S TRYON ST,United States,NaT
15970,99955,GOLDEN KEY INTERNATIONAL HONOUR SOCIETY,Manual,,,,NO VAT -99955,United States,اياراكات عبدالمجيد,2020-11-11,...,NaT,KUST Khalifa University of Science and Technol...,000000193666,BANK OF AMERICA,GOLDEN KEY INTERNATIONAL HONOUR SOCIETY,,BANK OF AMERICA,600 PEACHTREE ST. NE,United States,NaT


In [None]:
df_supplier_missing_iban.to_sql("Missing_supplier_Iban", conn)

In [None]:
df_supplier_missing_trn = df_supplier_bank[df_supplier_bank['Tax Registration No.'].isnull()]
df_supplier_missing_trn

Unnamed: 0,Supplier No.,Supplier Name,Registered Method,Vendor Type,Commercial Licence No,Commercial Licence Expiry Date,Tax Registration No.,Country,Supplier Created By,Supplier Creation Date,...,Business Class Last Update Date,Organization Name,Bank Account,Bank Account Name,Bank Acc Title,Iban,Bank Name,Branch Name,Country.1,End Date
9,100027,AADEL HASSAN MOHAMED MOHAMED ALHMOUDI,Manual,,,,,United Arab Emirates,نجود غانم سالم سعيد الصيعري,2020-11-18,...,NaT,KUST Khalifa University of Science and Technol...,28230787,ADIB,AADEL HASSAN MOHAMED MOHAMED ALHMOUDI,AE630500000000028230787,ABU DHABI ISLAMIC BANK,ABU DHABI BANIYAS ST,United Arab Emirates,NaT
30,100099,TRIDENT SUPPORT FLAG POLES L.L.C,Manual,,,,,United Arab Emirates,عبدالله راشد مبارك فهاد الهاجري,2020-11-23,...,NaT,KUST Khalifa University of Science and Technol...,1014833627701,ENBD,,AE250260001014833627701,EMIRATES NBD,DUBAI MALL BRANCH,United Arab Emirates,NaT
62,100315,Mutasem El Fadel,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-12-09,...,NaT,KUST Khalifa University of Science and Technol...,012245361001,HSBC,,AE800200000012245361001,HSBC BANK MIDDLE EAST,ABU DHABI RASHID MAKTOUM ST,United Arab Emirates,NaT
64,100331,Ismail Aejaz Baig,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-12-10,...,NaT,KUST Khalifa University of Science and Technol...,11878472920001,ADCB,,AE400030011878472920001,ABU DHABI COMMERCIAL BANK,ABU DHABI MAIN,United Arab Emirates,NaT
82,100540,Nnamdi Valbocso Ugwuoke,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-12-23,...,NaT,KUST Khalifa University of Science and Technol...,11807768920001,ADCB,,AE750030011807768920001,ABU DHABI COMMERCIAL BANK,ABU DHABI MAIN,United Arab Emirates,NaT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15904,99490,Aamir Younis Raja,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-10-07,...,NaT,KUST Khalifa University of Science and Technol...,012233219001,HSBC,,AE760200000012233219001,HSBC BANK MIDDLE EAST,ABU DHABI RASHID MAKTOUM ST,United Arab Emirates,NaT
15933,99688,Daniel Johannes Van Tonder,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-10-22,...,NaT,KUST Khalifa University of Science and Technol...,11853072920001,ADCB,,AE190030011853072920001,ABU DHABI COMMERCIAL BANK,ABU DHABI MAIN,United Arab Emirates,NaT
15934,99689,Thripti Vijayakumar,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-10-22,...,NaT,KUST Khalifa University of Science and Technol...,11852376920001,ADCB,,AE930030011852376920001,ABU DHABI COMMERCIAL BANK,ABU DHABI MAIN,United Arab Emirates,NaT
15935,99690,Partha Guha,Manual,EMPLOYEE,,,,,اياراكات عبدالمجيد,2020-10-22,...,NaT,KUST Khalifa University of Science and Technol...,11855091920001,ADCB,,AE460030011855091920001,ABU DHABI COMMERCIAL BANK,ABU DHABI MAIN,United Arab Emirates,NaT


In [None]:
df_supplier_missing_trn.to_sql("Missing_supplier_trn", conn)

## 5.3 AP Master

The below scripts help to analyze the employee payments as vendor payments, duplicate invoices / payments issued or multiple payments issued to the same vendor in a short duration of time.

In [None]:
## df_AP_invoices = pd.read_excel("C:/users/ku1016/downloads/AP Master.xlsx")
## df_AP_invoices.head(5)

Unnamed: 0,LEDGER_NAME,VENDOR_NO,VENDOR_NAME,VENDOR_TYPE,INVOICE_NUM,VOUCHER_NUM,INVOICE_APPROVAL_STATUS,INVOICE_TYPE,INVOICE_DESCRIPTION,CREATION_DATE,...,WFAPPROVAL_STATUS,ACCOUNTED,LINE_TYPE,TAX_RATE_NAME,TAX_RATE,RECOVERABLE_TAX_AMOUNT,NON_RECOVERABLE_TAX_AMOUNT,PAYMENT_STATUS_FLAG,ENCUMBERED_FLAG,DUE_DATE
0,KUST Ledger,64998,MINISTRY OF HIGHER EDUCATION & SCIENTIFIC RESE...,VENDOR,29052019,8471027936,APPROVED,STANDARD,Accreditation charges- Doctor of Med Program (...,2019-08-18,...,NOT REQUIRED,Yes,ITEM,,,,,Yes,No,2019-05-29
1,KUST Ledger,64998,MINISTRY OF HIGHER EDUCATION & SCIENTIFIC RESE...,VENDOR,29052019,8471027936,APPROVED,STANDARD,Accreditation charges- Doctor of Med Program (...,2019-08-18,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-05-29
2,KUST Ledger,64998,MINISTRY OF HIGHER EDUCATION & SCIENTIFIC RESE...,VENDOR,29052019,8471027936,APPROVED,STANDARD,Accreditation charges- Doctor of Med Program (...,2019-08-18,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-05-29
3,KUST Ledger,82641,VIKAS MITTAL,BENIFICIARY,PR20193499,8471028118,APPROVED,STANDARD,RELOCATION VIKAS MITTAL,2019-08-27,...,NOT REQUIRED,Yes,ITEM,,,,,Yes,No,2019-08-27
4,KUST Ledger,82641,VIKAS MITTAL,BENIFICIARY,PR20193499,8471028118,APPROVED,STANDARD,RELOCATION VIKAS MITTAL,2019-08-27,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,0.0,0.0,Yes,No,2019-08-27


In Khalifa University, employee payments are rendered as vendor payments. As per industry best practice, all employee related payments should be routed through HR payroll to ensure adequate controls are-in-place.

Below are the details:

In [None]:
df_AP_employees = df_AP_invoices[(df_AP_invoices['VENDOR_TYPE']=='EMPLOYEE')]
df_AP_employees

Unnamed: 0,LEDGER_NAME,VENDOR_NO,VENDOR_NAME,VENDOR_TYPE,INVOICE_NUM,VOUCHER_NUM,INVOICE_APPROVAL_STATUS,INVOICE_TYPE,INVOICE_DESCRIPTION,CREATION_DATE,...,WFAPPROVAL_STATUS,ACCOUNTED,LINE_TYPE,TAX_RATE_NAME,TAX_RATE,RECOVERABLE_TAX_AMOUNT,NON_RECOVERABLE_TAX_AMOUNT,PAYMENT_STATUS_FLAG,ENCUMBERED_FLAG,DUE_DATE
10,KUST Ledger,84369,Khaled Ebrahim Al Ali,EMPLOYEE,PC21072019,8471027995,APPROVED,STANDARD,Petty cash for Facilities- Vehicle Maint,2019-08-20,...,NOT REQUIRED,Yes,ITEM,,,,,Yes,No,2019-07-21
11,KUST Ledger,84369,Khaled Ebrahim Al Ali,EMPLOYEE,PC21072019,8471027995,APPROVED,STANDARD,Petty cash for Facilities- Vehicle Maint,2019-08-20,...,NOT REQUIRED,Yes,ITEM,,,,,Yes,No,2019-07-21
12,KUST Ledger,84369,Khaled Ebrahim Al Ali,EMPLOYEE,PC21072019,8471027995,APPROVED,STANDARD,Petty cash for Facilities- Vehicle Maint,2019-08-20,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-07-21
13,KUST Ledger,84369,Khaled Ebrahim Al Ali,EMPLOYEE,PC21072019,8471027995,APPROVED,STANDARD,Petty cash for Facilities- Vehicle Maint,2019-08-20,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-07-21
14,KUST Ledger,84369,Khaled Ebrahim Al Ali,EMPLOYEE,PC21072019,8471027995,APPROVED,STANDARD,Petty cash for Facilities- Vehicle Maint,2019-08-20,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-07-21
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
487983,KUST Ledger,84005,Mohammad Alsuwaidi,EMPLOYEE,2021-06-16 00:00:00,8471057066,CANCELLED,STANDARD,Expense claim,2021-06-16,...,REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,25.0,0.0,No,No,2021-06-16
487984,KUST Ledger,84005,Mohammad Alsuwaidi,EMPLOYEE,2021-06-16 00:00:00,8471057066,CANCELLED,STANDARD,Expense claim,2021-06-16,...,REQUIRED,Yes,ITEM,,,,,No,No,2021-06-16
487985,KUST Ledger,84005,Mohammad Alsuwaidi,EMPLOYEE,2021-06-16 00:00:00,8471057066,CANCELLED,STANDARD,Expense claim,2021-06-16,...,REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,25.0,0.0,No,No,2021-06-16
487986,KUST Ledger,84005,Mohammad Alsuwaidi,EMPLOYEE,2021-06-16 00:00:00,8471057066,CANCELLED,STANDARD,Expense claim,2021-06-16,...,REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,25.0,0.0,No,No,2021-06-16


In [None]:
df_AP_employee.to_sql("Employee_Payments_as_Vendor", conn)

In [None]:
df_AP_duplicate = df_AP_invoices[df_AP_invoices.duplicated()]
df_AP_duplicate

Unnamed: 0,LEDGER_NAME,VENDOR_NO,VENDOR_NAME,VENDOR_TYPE,INVOICE_NUM,VOUCHER_NUM,INVOICE_APPROVAL_STATUS,INVOICE_TYPE,INVOICE_DESCRIPTION,CREATION_DATE,...,WFAPPROVAL_STATUS,ACCOUNTED,LINE_TYPE,TAX_RATE_NAME,TAX_RATE,RECOVERABLE_TAX_AMOUNT,NON_RECOVERABLE_TAX_AMOUNT,PAYMENT_STATUS_FLAG,ENCUMBERED_FLAG,DUE_DATE


In [None]:
df_AP_duplicate = df_AP_invoices[(df_AP_invoices.duplicated(['VENDOR_NO','INVOICE_NUM','INVOICE_DESCRIPTION'])) & (df_AP_invoices['INVOICE_APPROVAL_STATUS'] == "APPROVED")]
df_AP_duplicate

Unnamed: 0,LEDGER_NAME,VENDOR_NO,VENDOR_NAME,VENDOR_TYPE,INVOICE_NUM,VOUCHER_NUM,INVOICE_APPROVAL_STATUS,INVOICE_TYPE,INVOICE_DESCRIPTION,CREATION_DATE,...,WFAPPROVAL_STATUS,ACCOUNTED,LINE_TYPE,TAX_RATE_NAME,TAX_RATE,RECOVERABLE_TAX_AMOUNT,NON_RECOVERABLE_TAX_AMOUNT,PAYMENT_STATUS_FLAG,ENCUMBERED_FLAG,DUE_DATE
1,KUST Ledger,64998,MINISTRY OF HIGHER EDUCATION & SCIENTIFIC RESE...,VENDOR,29052019,8471027936,APPROVED,STANDARD,Accreditation charges- Doctor of Med Program (...,2019-08-18,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-05-29
2,KUST Ledger,64998,MINISTRY OF HIGHER EDUCATION & SCIENTIFIC RESE...,VENDOR,29052019,8471027936,APPROVED,STANDARD,Accreditation charges- Doctor of Med Program (...,2019-08-18,...,NOT REQUIRED,Yes,TAX,NOT APPLICABLE,0.0,0.0,0.0,Yes,No,2019-05-29
4,KUST Ledger,82641,VIKAS MITTAL,BENIFICIARY,PR20193499,8471028118,APPROVED,STANDARD,RELOCATION VIKAS MITTAL,2019-08-27,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,0.0,0.0,Yes,No,2019-08-27
5,KUST Ledger,82641,VIKAS MITTAL,BENIFICIARY,PR20193499,8471028118,APPROVED,STANDARD,RELOCATION VIKAS MITTAL,2019-08-27,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,0.0,0.0,Yes,No,2019-08-27
6,KUST Ledger,82641,VIKAS MITTAL,BENIFICIARY,PR20193499,8471028118,APPROVED,STANDARD,RELOCATION VIKAS MITTAL,2019-08-27,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,0.0,0.0,Yes,No,2019-08-27
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
487974,KUST Ledger,98330,ETAP AUTOMATION DMCC,VENDOR,18042021,8471056358,APPROVED,STANDARD,PMR20212184-Renewal: ETAP - Educational Licens...,2021-05-16,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,642.7,0.0,Yes,No,2021-04-18
487977,KUX - Ledger,58199,ALLIANCE GLOBAL FZ - LLC,VENDOR,202106165,843X007976,APPROVED,STANDARD,,2021-06-08,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,2250.0,0.0,Yes,No,2021-05-25
487978,KUX - Ledger,58199,ALLIANCE GLOBAL FZ - LLC,VENDOR,202106165,843X007976,APPROVED,STANDARD,,2021-06-08,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,2250.0,0.0,Yes,No,2021-05-25
487980,KUX - Ledger,58199,ALLIANCE GLOBAL FZ - LLC,VENDOR,202105753,843X007977,APPROVED,STANDARD,,2021-06-08,...,NOT REQUIRED,Yes,TAX,VAT INPUT STD - REC,5.0,1125.0,0.0,Yes,No,2021-03-10


In [None]:
df_AP_duplicate.to_sql("SameInvoiceNum_SameInvoiceDescription_SameVendor_Approved_Invoice", conn)

## 5.4 Purchase Orders

The below scripts enables IA Department to view all purchase orders created after the approval date of the purchase orders. Two columns utilized are "CREATION_DATE" (date of creation) and "PO_APPROVED_DATE" (date of approval).

In [None]:
## df_POs = pd.read_excel("C:/users/ku1016/downloads/PO Master.xlsx")
## df_POs.head(5)

Unnamed: 0,PO_NUMBER,AUTHORIZATION_STATUS,PO_TYPE,ITEM_CATEGORY,ITEM_CATEGORY_DESCRIPTION,ITEM_CODE,ITEM_DESCRIPTION,GL_ENCUMBERED_DATE,CREATION_DATE,ENCUMBERED_AMOUNT,...,VENDOR_SITE,CURRENCY_CODE,AMOUNT_ORDERED,QUANTITY_DELIVERED,AMOUNT_DELIVERED,QUANTITY_BILLED,AMOUNT_BILLED,QUANTITY_CANCELLED,AMOUNT_CANCELLED,VENDOR_COUNTRY
0,8431200017,APPROVED,STANDARD,23.01,University Books,-,,2017-03-07 15:00:02,2017-03-07 00:00:00,470.0,...,DUBAI,AED,470.0,1.0,470.0,1.0,470.0,0.0,0.0,United Arab Emirates
1,8431200017,APPROVED,STANDARD,23.01,University Books,-,,2017-03-07 15:00:02,2017-03-07 00:00:00,594.0,...,DUBAI,AED,594.0,2.0,594.0,2.0,594.0,0.0,0.0,United Arab Emirates
2,8431200017,APPROVED,STANDARD,23.01,University Books,-,,2017-03-07 15:00:02,2017-03-07 00:00:00,777.0,...,DUBAI,AED,777.0,5.0,777.0,5.0,777.0,0.0,0.0,United Arab Emirates
3,8431200017,APPROVED,STANDARD,23.01,University Books,-,,2017-03-07 15:00:02,2017-03-07 00:00:00,2201.5,...,DUBAI,AED,2201.5,5.0,2201.5,5.0,2201.5,0.0,0.0,United Arab Emirates
4,8431200017,APPROVED,STANDARD,23.01,University Books,-,,2017-03-07 15:00:02,2017-03-07 00:00:00,2543.0,...,DUBAI,AED,2543.0,1.0,2543.0,1.0,2543.0,0.0,0.0,United Arab Emirates


In [None]:
df_POs['Approval'] = df_POs['PO_APPROVED_DATE']-df_POs['CREATION_DATE']

In [None]:
df_POs.to_sql('Creation_After_Approval', conn)