## Analytic 12 Code

#### OPIM5770 | Fall 2018 | Team 4

###### This notebook contains the code that generates the input file required by Analytic 12. Designed by Team 4.

In [1]:
# Import required modules
import pandas as pd
import numpy as np
import csv
import os

In [3]:
# Load the RSEG_RBKP file
RSEG_RBKP_DF = pd.read_csv(r'./../../src/RSEG_RBKP.csv'
                         , usecols=[ 
                             'Company_Code'
                             , 'Purchasing_Document_Number'
                             , 'Item_Number_of_Purchasing_Document'
                             , 'Amount_in_Document_Currency'
                             , 'Document_Number_of_an_Invoice_Document'
                             , 'Document_Item_in_Invoice_Document'
                             , 'Quantity'
                             ]
                         , dtype={
                                'Company_Code':str
                                , 'Purchasing_Document_Number':str
                                , 'Item_Number_of_Purchasing_Document':str
                                , 'Amount_in_Document_Currency':float
                                , 'Document_Number_of_an_Invoice_Document':str
                                , 'Document_Item_in_Invoice_Document':str
                                , 'Quantity':float
                               }
                         , low_memory=False
                        )



RSEG_RBKP_DF.rename(columns=
                      {
                          'Company_Code':'COMPANY_CODE'
                          , 'Purchasing_Document_Number':'PO_NUMBER'
                          , 'Item_Number_of_Purchasing_Document':'PO_LINE_NUMBER'
                          , 'Amount_in_Document_Currency':'INVOICE_AMOUNT'
                          , 'Document_Number_of_an_Invoice_Document':'INVOICE_NUMBER'
                          , 'Document_Item_in_Invoice_Document':'INVOICE_LINE_NUMBER'
                          , 'Quantity':'QUANTITY'
                      },inplace=True) 

In [4]:
# Zero-pad the PO number (10 characters) and PO line number (5 characters) so that this table can be joined to EKPO_EKKO
# The vectorized .str.zfill leaves missing values untouched instead of raising an error
RSEG_RBKP_DF['PO_NUMBER'] = RSEG_RBKP_DF['PO_NUMBER'].str.zfill(10)
RSEG_RBKP_DF['PO_LINE_NUMBER'] = RSEG_RBKP_DF['PO_LINE_NUMBER'].str.zfill(5)
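
As a quick illustration of the padding step, the sketch below applies the same zero-fill to a small hypothetical frame. The sample values are made up for demonstration; only the column names come from this notebook.

```python
import pandas as pd

# Hypothetical sample keys, not taken from RSEG_RBKP
sample = pd.DataFrame({'PO_NUMBER': ['64583', '64272', None],
                       'PO_LINE_NUMBER': ['60', '10', None]})

# Pad to the fixed widths used for the join keys; missing values stay missing
sample['PO_NUMBER'] = sample['PO_NUMBER'].str.zfill(10)
sample['PO_LINE_NUMBER'] = sample['PO_LINE_NUMBER'].str.zfill(5)

print(sample)
```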

In [5]:
# Keep only rows with a positive quantity; the other rows (i.e., cancelled items) cause duplicates
RSEG_RBKP_DF = RSEG_RBKP_DF[RSEG_RBKP_DF['QUANTITY']>0]

In [6]:
# Many invoices cover several line items on a purchase order, so we sum the invoice amounts
# per company / PO / PO line / invoice / invoice line to avoid misleading results
RSEG_RBKP_DF = RSEG_RBKP_DF.groupby(
    ['COMPANY_CODE','PO_NUMBER','PO_LINE_NUMBER','INVOICE_NUMBER','INVOICE_LINE_NUMBER'],
    as_index=False)['INVOICE_AMOUNT'].sum()

# Example for Unit Testing: PO Number 0000064583 and PO Line Number 00060
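
A minimal sketch of what the aggregation does, using hypothetical rows (the invoice numbers and amounts are invented for illustration). Rows that share all five key columns collapse into one row whose `INVOICE_AMOUNT` is the sum.

```python
import pandas as pd

# Hypothetical invoice rows; values are illustrative only
rows = pd.DataFrame({
    'COMPANY_CODE': ['1001', '1001', '1001'],
    'PO_NUMBER': ['0000064583', '0000064583', '0000064272'],
    'PO_LINE_NUMBER': ['00060', '00060', '00010'],
    'INVOICE_NUMBER': ['5190000001', '5190000001', '5190004049'],
    'INVOICE_LINE_NUMBER': ['1', '1', '1'],
    'INVOICE_AMOUNT': [10.0, 5.5, 1.66],
})

keys = ['COMPANY_CODE', 'PO_NUMBER', 'PO_LINE_NUMBER',
        'INVOICE_NUMBER', 'INVOICE_LINE_NUMBER']

# Sum the amounts within each key combination
agg = rows.groupby(keys, as_index=False)['INVOICE_AMOUNT'].sum()
print(agg)
```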

In [12]:
# Load the EKPO_EKKO file
parse_dates = ['Purchasing_Document_Date']
EKPO_EKKO_DF = pd.read_csv(r'./../../src/EKPO_EKKO.csv'
                        , sep="|"
                        , quotechar="'"
                        , low_memory=False
                        , encoding='latin1'
                        , usecols=['Purchasing_Document_Number',# Purchase Order Number
                                   'Item_Number_of_Purchasing_Document',# Purchase Order Line Number
                                   'Purchasing_Document_Date', # Purchase Order Date
                                   'Net_Order_Value_in_PO_Currency',# Purchase Order Amount
                                   'Vendor_Account_Number', # Vendor ID
                                 ],
                         # The amount is read as text because the field can contain non-numeric values
                         dtype={'Purchasing_Document_Number':str,
                                'Item_Number_of_Purchasing_Document':str,
                                'Net_Order_Value_in_PO_Currency':str,
                                'Vendor_Account_Number':str
                               },
                          parse_dates=parse_dates)

EKPO_EKKO_DF.rename(columns=
                    { 'Purchasing_Document_Number':'PO_NUMBER',
                      'Item_Number_of_Purchasing_Document':'PO_LINE_NUMBER',
                      'Net_Order_Value_in_PO_Currency':'PO_AMOUNT',
                      'Vendor_Account_Number':'VENDOR_ID',
                      'Purchasing_Document_Date':'PO_DATE'
                    },inplace=True)

In [16]:
joinDF.head()

| | COMPANY_CODE | PO_NUMBER | PO_LINE_NUMBER | INVOICE_NUMBER | INVOICE_LINE_NUMBER | INVOICE_AMOUNT | PO_AMOUNT | VENDOR_ID | PO_DATE |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 64272 | 10 | 5190004049 | 1 | 1.66 | 1.66 | 20008131 | 2018-01-02 |
| 1 | 1001 | 64273 | 10 | 5190002485 | 1 | 31.94 | 31.94 | 20008131 | 2018-01-02 |
| 2 | 1001 | 64291 | 10 | 5190006464 | 1 | 40.0 | 40.04 | 20008131 | 2018-01-04 |
| 3 | 1001 | 64292 | 10 | 5190005420 | 1 | 10.36 | 10.36 | 20008131 | 2018-01-05 |
| 4 | 1001 | 64293 | 10 | 5190006379 | 1 | 36.96 | 36.96 | 20008131 | 2018-01-09 |


In [13]:
# Remove rows whose amount field is empty or contains 'X', then convert the amounts to float
EKPO_EKKO_DF = EKPO_EKKO_DF[~EKPO_EKKO_DF['PO_AMOUNT'].isin(['', 'X'])].copy()
EKPO_EKKO_DF['PO_AMOUNT'] = EKPO_EKKO_DF['PO_AMOUNT'].astype(float).fillna(0.0)
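
An alternative sketch for the same cleanup, assuming that any non-numeric placeholder (not only 'X') should be discarded. `pd.to_numeric` with `errors='coerce'` turns unparseable values into NaN, which are then dropped. Note that this differs from the cell above, which fills missing amounts with 0.0 rather than dropping them. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical amounts as they might arrive from the text file
amounts = pd.DataFrame({'PO_AMOUNT': ['1.5', 'X', '', '40.25']})

# Unparseable entries ('X', empty string) become NaN
amounts['PO_AMOUNT'] = pd.to_numeric(amounts['PO_AMOUNT'], errors='coerce')

# Drop the rows that could not be converted
amounts = amounts.dropna(subset=['PO_AMOUNT'])
print(amounts)
```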

In [14]:
# PERFORM THE JOIN OPERATION (inner join on PO number and PO line number)
joinDF = pd.merge( left = RSEG_RBKP_DF,
                   right = EKPO_EKKO_DF,
                   on = ['PO_NUMBER','PO_LINE_NUMBER'],
                   how='inner')
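
Because an inner join silently multiplies rows when the right-hand keys are not unique, it can help to check the join before trusting its output. The sketch below is one possible check, not part of the original pipeline: it uses the `validate` and `indicator` parameters of `pd.merge` on hypothetical frames (all values are invented).

```python
import pandas as pd

keys = ['PO_NUMBER', 'PO_LINE_NUMBER']

# Hypothetical invoice and purchase order rows
invoices = pd.DataFrame({'PO_NUMBER': ['0000064272', '0000064291'],
                         'PO_LINE_NUMBER': ['00010', '00010'],
                         'INVOICE_AMOUNT': [1.66, 40.0]})
orders = pd.DataFrame({'PO_NUMBER': ['0000064272', '0000064272'],
                       'PO_LINE_NUMBER': ['00010', '00010'],
                       'PO_AMOUNT': [1.66, 2.0]})

# validate='many_to_one' raises MergeError if the right-hand keys repeat
try:
    pd.merge(invoices, orders, on=keys, how='inner', validate='many_to_one')
    duplicates_found = False
except pd.errors.MergeError:
    duplicates_found = True

# indicator=True adds a _merge column showing which invoices found no PO
check = pd.merge(invoices, orders.drop_duplicates(subset=keys),
                 on=keys, how='left', indicator=True)
unmatched = int((check['_merge'] == 'left_only').sum())

print(duplicates_found, unmatched)
```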

In [15]:
# WRITE OUT THE RESULTS TO FILE
joinDF.to_csv(r'./../output/A12_Base.csv', index=False)

### Data Quality Acknowledgement

##### As of 10/31/18, we have asked two questions about the proper method for joining the purchase order information in EKPO_EKKO to BSAK_BKPF. We tried to use the EKBE table, as recommended in the SAP documentation, focusing on the fields that are common to the two tables. However, purchase order number and purchase order line number are not key fields and are not unique. Additionally, the reference document number on EKBE joins to BSAK_BKPF in fewer than 30% of cases. The result is a very misleading table.

##### As an approximation that demonstrates some value from the analytic, we use here the incoming invoice information in RSEG_RBKP, as discussed during our checkpoints early in the semester. If our questions about the join fields for the PO-to-cleared-invoice connection are answered, we will write the corresponding code and update this analytic with the correct code base.

##### For reference, Purchasing Document Number 63492, Line Number 1280 has two entries on EKBE with different PO amounts and no reference document. Another example is Purchasing Document Number 64019, Line Number 30. We believe this data is correct; however, we may be lacking the key fields needed to eliminate duplicate PO/line combinations, which results in very misleading values.
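
A minimal sketch of how duplicate PO/line combinations like the ones above could be listed. The frame below is a hypothetical stand-in for EKBE: the PO and line numbers echo the examples in the text, but the column names and amounts are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical EKBE-like rows; amounts are invented
ekbe = pd.DataFrame({
    'PO_NUMBER': ['63492', '63492', '64019', '64019', '64272'],
    'PO_LINE_NUMBER': ['1280', '1280', '30', '30', '10'],
    'PO_AMOUNT': [100.0, 250.0, 10.0, 12.0, 1.66],
})

keys = ['PO_NUMBER', 'PO_LINE_NUMBER']

# keep=False flags every row of a repeated key, not only the later copies
dupes = ekbe[ekbe.duplicated(subset=keys, keep=False)]

# Count entries and distinct amounts per duplicated key
summary = dupes.groupby(keys, as_index=False).agg(
    ENTRIES=('PO_AMOUNT', 'size'),
    DISTINCT_AMOUNTS=('PO_AMOUNT', 'nunique'),
)
print(summary)
```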