This program organizes and aggregates company expenses for a given time period. It reads in the company's transaction log and lets the user enter a department number and date range to see the aggregated expenses. The final output groups total expenses into personnel, non-personnel, administrative costs, and client wages.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("transactions_july.csv") #Read in the transaction register. 
df.head()

Unnamed: 0,3 Digit Exp,Full GL Coding,Description,Date,Jrnl No.,Orig. Audit Trail,Distribution Reference,Orig. Master Number,Orig. Master Name,Debit,Credit,Net,DEPT-LOC
0,377,01-377-5692-100,Department of Rehabilitation - Monterey County,07/29/21,185599,RMSLS00002419,WDS21065-01A DOR,WDS21065-01A,State of California,,2016.0,"(2,016.00)",5692-100
1,377,01-377-5692-500,Department of Rehabilitation - SLO County,07/31/21,185600,RMSLS00002419,WDS21075-01A,WDS21075-01A,State of California,,2245.63,"(2,245.63)",5692-500
2,385,01-385-5696-000,W.I.A. Revenue - Santa Cruz County,07/01/21,184848,GLREV00021384,REV ACCRUAL FOR 403B TRUE UP,,,736.7,,736.70,5696-000
3,385,01-385-5696-000,W.I.A. Revenue - Santa Cruz County,07/31/21,185598,GLTRX00021491,SC AJCC - XXX,,,,44996.54,"(44,996.54)",5696-000
4,385,01-385-5696-009,W.I.A. Revenue - OJT SC County,07/01/21,185147,RMSLS00002410,WDS21016-01C DM OJT Patt,WDS21016-01C DM,Santa Cruz County,,30.0,(30.00),5696-009


Clean & Organize Data

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1025 entries, 0 to 1024
Data columns (total 13 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   3 Digit Exp             1025 non-null   int64 
 1   Full GL Coding          1025 non-null   object
 2   Description             1025 non-null   object
 3   Date                    1025 non-null   object
 4   Jrnl No.                1025 non-null   object
 5   Orig. Audit Trail       1025 non-null   object
 6   Distribution Reference  1025 non-null   object
 7   Orig. Master Number     128 non-null    object
 8   Orig. Master Name       128 non-null    object
 9    Debit                  795 non-null    object
 10   Credit                 230 non-null    object
 11   Net                    1025 non-null   object
 12  DEPT-LOC                1025 non-null   object
dtypes: int64(1), object(12)
memory usage: 104.2+ KB


In [4]:
df.dropna(subset=[" Net "], inplace = True) #Drop nulls in df[Net] field

In [5]:
df.rename(columns={" Net ": "Net"}, inplace=True) #eliminated spaces in "Net" field title
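The `df.info()` listing shows padding around several column names (`Debit`, `Credit`, and `Net`), not only `Net`. As a hedged alternative to renaming one column at a time, the sketch below strips whitespace from every header at once. `df_demo` is a hypothetical stand-in for the transaction register, and the exact padding in its headers is assumed.

```python
import pandas as pd

# Hypothetical frame whose headers carry stray spaces (padding is assumed).
df_demo = pd.DataFrame(columns=["Date", " Debit ", " Credit ", " Net "])

# Index.str.strip() removes leading and trailing whitespace from each name.
df_demo.columns = df_demo.columns.str.strip()

print(list(df_demo.columns))
```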

In [6]:
df["Net"] = df["Net"].str.replace(")", "", regex=False) #remove ) from end of number; regex=False treats ")" as a literal character


In [7]:
df["Net"] = df["Net"].str.replace("(", "-", regex=False) #replace ( with minus sign to indicate negative number; regex=False treats "(" as a literal character


In [8]:
df["Net"] = df["Net"].str.replace(",","") #remove comma from numbers  

In [9]:
df["Net"] = df["Net"].str.replace(" - ", "0", regex=False) #replace the accounting-style dash " - " (a zero amount) with 0

In [10]:
df["Net"] = pd.to_numeric(df["Net"]) #converted "Net" into numeric datatype  
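The string-cleaning cells above can be folded into a single helper. This is a minimal sketch, assuming `Net` values follow the accounting conventions visible in the sample rows: parentheses for negatives, commas as thousands separators, and a lone dash for zero. `parse_accounting` is a hypothetical name and not part of the original notebook.

```python
def parse_accounting(text):
    """Convert an accounting-formatted string such as "(2,016.00)" to a float."""
    s = text.strip()
    if s == "-":                      # a lone dash is assumed to mean zero
        return 0.0
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]                   # drop the surrounding parentheses
    value = float(s.replace(",", ""))  # remove thousands separators
    return -value if negative else value


samples = ["(2,016.00)", " 736.70 ", " - ", "(30.00)"]
parsed = [parse_accounting(s) for s in samples]
print(parsed)
```

Applied to the register, this could be used as `df["Net"] = df["Net"].map(parse_accounting)`, provided every remaining value is a string.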

In [11]:
df["Date"] = df["Date"].str.strip() #eliminate possible extra spaces in date field; str.strip() returns a new Series, so assign it back
df["Date"]

0       07/29/21
1       07/31/21
2       07/01/21
3       07/31/21
4       07/01/21
          ...   
1020    07/31/21
1021    07/31/21
1022    07/01/21
1023    07/01/21
1024    07/31/21
Name: Date, Length: 1025, dtype: object

In [12]:
df.dropna(subset = ["Full GL Coding"], inplace = True) #removing all nulls in "Full GL Coding" field

In [13]:
df["Date"] = pd.to_datetime(df["Date"]) #converted "Date" field into datetime datatype
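The sample rows all use a two-digit-year `MM/DD/YY` layout. Under that assumption, passing an explicit `format` to `pd.to_datetime` removes any guessing about day/month order. The sketch below uses a small hypothetical Series rather than the real `Date` column.

```python
import pandas as pd

# Hypothetical date strings in the same layout as the sample rows.
dates = pd.Series([" 07/29/21", "07/31/21 ", "07/01/21"])

# Strip padding first, then parse with an explicit month/day/two-digit-year format.
parsed_dates = pd.to_datetime(dates.str.strip(), format="%m/%d/%y")

print(parsed_dates.dt.strftime("%Y-%m-%d").tolist())
```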

Filtering and Organizing Data 

In [14]:
start_date = pd.to_datetime(input("Enter Start Date: ")) #The start date of the range that will be analyzed
end_date = pd.to_datetime(input("Enter End Date: ")) #The end date of the range that will be analyzed
dept_code = input("Enter Department Code: ") #The company department number

Enter Start Date: 07/01/2021
Enter End Date: 07/31/2021
Enter Department Code: 5657-100


In [15]:
df_filtered = df[df["Date"].between(start_date, end_date)] #Filtered rows for records in specified date range.
df_filtered = df_filtered[df_filtered["Full GL Coding"].str.contains(dept_code, regex=False)] #Filtered for rows with specified department number; regex=False matches the code literally

df_filtered = df_filtered[["3 Digit Exp", "Net"]] #Select expense codes and Expense Amount fields
df_final = df_filtered["Net"].groupby(df_filtered['3 Digit Exp']).sum() #Aggregated sum of expenses for each expense code
#df_final is now a Series with expense codes as the index and the total expense for each code as the values
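The filter-then-aggregate step can be checked on a small frame where the answer is easy to verify by hand. This is a sketch under stated assumptions: the rows in `toy` are invented, and the department code is assumed to be the trailing part of `Full GL Coding` (as in the sample rows), so `str.endswith` is used here in place of `str.contains`.

```python
import pandas as pd

# Hypothetical rows shaped like the cleaned transaction register.
toy = pd.DataFrame({
    "3 Digit Exp": [711, 711, 722, 760, 722],
    "Full GL Coding": [
        "01-711-5657-100", "01-711-5657-100", "01-722-5657-100",
        "01-760-5657-100", "01-722-5692-100",
    ],
    "Date": pd.to_datetime(
        ["2021-07-01", "2021-07-15", "2021-07-31", "2021-08-02", "2021-07-10"]
    ),
    "Net": [100.0, 50.0, 200.0, 75.0, 999.0],
})

start, end, dept = pd.Timestamp("2021-07-01"), pd.Timestamp("2021-07-31"), "5657-100"

# between() includes both endpoints; combine the date and department filters.
mask = toy["Date"].between(start, end) & toy["Full GL Coding"].str.endswith(dept)

# Sum Net per expense code for the rows that pass both filters.
totals = toy.loc[mask].groupby("3 Digit Exp")["Net"].sum()
print(totals.to_dict())
```

The August row and the row from the other department drop out, leaving one total per expense code.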

Sum total expenses by client wages, personnel (S&B), non-personnel, and administrative

In [16]:
if 711 in df_final.index: #checking if 711 (code for client wages) is present
    print ("Client Wages(WC & FICA): ", df_final.loc[711:721].sum()*1.1682) #Prints total client wages (codes 711-721) plus the 16.82% labor burden
else: 
    print ("No Client Wages")

#Sums all salary and benefit expense codes (722-758), then subtracts the 16.82% client-wage labor burden already counted under client wages
print ("S&B: ", (df_final.loc[722:758].sum()) - ((df_final.loc[711:721].sum()*1.1682)-(df_final.loc[711:721].sum())))

print ("Non-Personnel: ", df_final.loc[760:912].sum()) #Prints the sum of all non-personnel expenses
print ("Admin: ", df_final.loc[949:951].sum()) #Prints the sum of all administrative expenses

#Provides total Revenue, Rental Income and Total Income if the department code of 5672-000 is being analyzed
if dept_code == "5672-000":
    print("JSW Contract Revenue: ", df_final.loc[450], "Rental Income: ", df_final.loc[589], "Total Income: ", df_final.loc[450] + df_final.loc[589])
    

Client Wages(WC & FICA):  28096.565112
S&B:  41779.024888
Non-Personnel:  5759.3099999999995
Admin:  4966.25
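The client-wage and S&B figures are linked: the 16.82% burden added to client wages is the same amount subtracted from S&B, so the burden is moved between categories and not added to the total. The sketch below illustrates this with hypothetical round numbers; `split_personnel` and the input values are assumptions for illustration, not figures from the register.

```python
BURDEN_RATE = 0.1682  # labor burden rate applied to client wages in the notebook


def split_personnel(client_wages, salaries_benefits, rate=BURDEN_RATE):
    """Move the client-wage labor burden out of S&B and into client wages."""
    burden = client_wages * rate
    return client_wages + burden, salaries_benefits - burden


# Hypothetical inputs: 1,000 in client wages and 5,000 in raw S&B.
client_total, sb_total = split_personnel(1000.0, 5000.0)
print(round(client_total, 2), round(sb_total, 2))
```

The two results still add up to the original 6,000, which gives a quick sanity check on the printed totals.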
