## Expense Claim Patterns and Fraud Analysis (Flag 90)

### Dataset Description
The dataset consists of 500 entries simulating the ServiceNow fm_expense_line table, which records attributes of financial expenses. Key fields include 'number', 'opened_at', 'amount', 'state', 'short_description', 'ci', 'user', 'department', 'category', 'processed_date', 'source_id', and 'type'. Each row describes one expense: its amount, the department it is allocated to, its category, and its approval state. Together with the 'opened_at' and 'processed_date' timestamps, these fields support analysis of spending by department and category, and of how long expenses take to process in each state.

### Your Task
**Goal**: To detect and investigate instances of repeated identical expense claims by individual users, determining whether these repetitions are fraudulent or due to misunderstandings of the expense policy.

**Role**: Compliance and Audit Analyst

**Difficulty**: 3 out of 5.

**Category**: Finance Management
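
The goal stated above, finding repeated identical claims per user, can be sketched in pandas as follows. This is a minimal illustration, not the notebook's own method: it assumes that "identical" means the same 'user', 'amount', 'category', and 'short_description', and the helper name `find_repeated_claims` and the sample rows are hypothetical.

```python
import pandas as pd


def find_repeated_claims(expenses: pd.DataFrame, min_repeats: int = 2) -> pd.DataFrame:
    """Count claims per user that share amount, category and description."""
    # Assumption: these four fields together define an "identical" claim.
    key_columns = ["user", "amount", "category", "short_description"]
    counts = (
        expenses.groupby(key_columns)
        .size()
        .reset_index(name="claim_count")
    )
    # Keep only combinations that occur at least `min_repeats` times.
    repeated = counts[counts["claim_count"] >= min_repeats]
    return repeated.sort_values("claim_count", ascending=False).reset_index(drop=True)


# Made-up rows for illustration only; they are not taken from the dataset.
sample = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u2", "u2"],
    "amount": [100.0, 100.0, 100.0, 50.0, 75.0],
    "category": ["Travel", "Travel", "Travel", "Assets", "Assets"],
    "short_description": ["Taxi", "Taxi", "Taxi", "Monitor", "Keyboard"],
})
repeated_claims = find_repeated_claims(sample)
print(repeated_claims)
```

A high `claim_count` only flags a pattern for review; whether it reflects fraud or a misunderstanding of the expense policy still has to be judged case by case.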


### Import Necessary Libraries
This cell imports the libraries required for the analysis: pandas and NumPy for data manipulation, matplotlib and seaborn for visualization, and a few supporting utilities.

In [1]:
import argparse
import pandas as pd
import json
import requests
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

from openai import OpenAI
from pandas import date_range



### Load Dataset
This cell loads the expense dataset to be analyzed. The data is originally saved as a CSV file and is imported here into a DataFrame. The steps are: specify the path to the dataset, read the file with pandas, and confirm the load succeeded by inspecting the first few rows.

In [2]:
dataset_path = "csvs/flag-90.csv"
flag_data = pd.read_csv(dataset_path)
df = pd.read_csv(dataset_path)
flag_data.head()

| Unnamed: 0 | category | state | closed_at | opened_at | closed_by | number | sys_updated_by | location | assigned_to | caller_id | sys_updated_on | short_description | priority | assignement_group |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Database | Closed | 2023-07-25 03:32:18.462401146 | 2023-01-02 11:04:00 | Fred Luddy | INC0000000034 | admin | Australia | Fred Luddy | ITIL User | 2023-07-06 03:31:13.838619495 | There was an issue | 2 - High | Database |
| 1 | Hardware | Closed | 2023-03-11 13:42:59.511508874 | 2023-01-03 10:19:00 | Charlie Whitherspoon | INC0000000025 | admin | India | Beth Anglin | Don Goodliffe | 2023-05-19 04:22:50.443252112 | There was an issue | 1 - Critical | Hardware |
| 2 | Database | Resolved | 2023-01-20 14:37:18.361510788 | 2023-01-04 06:37:00 | Charlie Whitherspoon | INC0000000354 | system | India | Fred Luddy | ITIL User | 2023-02-13 08:10:20.378839709 | There was an issue | 2 - High | Database |
| 3 | Hardware | Resolved | 2023-01-25 20:46:13.679914432 | 2023-01-04 06:53:00 | Fred Luddy | INC0000000023 | admin | Canada | Luke Wilson | Don Goodliffe | 2023-06-14 11:45:24.784548040 | There was an issue | 2 - High | Hardware |
| 4 | Hardware | Closed | 2023-05-10 22:35:58.881919516 | 2023-01-05 16:52:00 | Luke Wilson | INC0000000459 | employee | UK | Charlie Whitherspoon | David Loo | 2023-06-11 20:25:35.094482408 | There was an issue | 2 - High | Hardware |
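
The preview lists columns such as 'caller_id' and 'priority', while several fields named in the dataset description (for example 'amount', 'department', and 'user') do not appear. A quick schema check makes that gap explicit before any analysis runs. The sketch below is illustrative: `EXPECTED_COLUMNS` is copied from the dataset description, the helper name `missing_columns` is hypothetical, and an empty stand-in frame with the previewed column names is used in place of the real file.

```python
import pandas as pd

# Field names taken from the dataset description.
EXPECTED_COLUMNS = [
    "number", "opened_at", "amount", "state", "short_description", "ci",
    "user", "department", "category", "processed_date", "source_id", "type",
]


def missing_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected column names that are absent from the frame."""
    return [name for name in expected if name not in frame.columns]


# Stand-in frame carrying only the column names shown in the preview.
preview_columns = [
    "Unnamed: 0", "category", "state", "closed_at", "opened_at", "closed_by",
    "number", "sys_updated_by", "location", "assigned_to", "caller_id",
    "sys_updated_on", "short_description", "priority", "assignement_group",
]
preview = pd.DataFrame(columns=preview_columns)

missing = missing_columns(preview)
print(missing)
```

In the notebook itself the same call would be `missing_columns(flag_data)`.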


### **Question 1: What are the total expenses by department?**

This analysis will help identify which departments are incurring the most significant expenses. By summing up the expenses for each department, we can gain insights into how financial resources are allocated across the organization.

In [3]:
# import matplotlib.pyplot as plt

# # Group by department and sum the amount
# department_expenses = df.groupby('department')['amount'].sum().sort_values(ascending=False)

# # Plotting
# plt.figure(figsize=(10, 6))
# department_expenses.plot(kind='bar', color='skyblue')
# plt.title('Total Expenses by Department')
# plt.xlabel('Department')
# plt.ylabel('Total Expenses ($)')
# plt.xticks(rotation=45, ha='right')
# plt.tight_layout()
# plt.show()
print("N/A")

N/A
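
Rather than commenting the analysis out, the aggregation could be wrapped in a guard that skips cleanly when a required column is absent. The following is a sketch under that assumption; the helper name `total_by_group` and the two sample frames are hypothetical.

```python
import pandas as pd


def total_by_group(frame: pd.DataFrame, group_col: str, value_col: str):
    """Sum value_col per group_col, or return None if either column is absent."""
    absent = [name for name in (group_col, value_col) if name not in frame.columns]
    if absent:
        print(f"Skipping: missing columns {absent}")
        return None
    return frame.groupby(group_col)[value_col].sum().sort_values(ascending=False)


# Made-up frames for illustration only.
with_columns = pd.DataFrame({
    "department": ["IT", "HR", "IT"],
    "amount": [10.0, 5.0, 7.5],
})
without_columns = pd.DataFrame({"category": ["Hardware"]})

totals = total_by_group(with_columns, "department", "amount")
skipped = total_by_group(without_columns, "department", "amount")
print(totals)
```

The same helper would also cover the category totals by passing 'category' as the grouping column.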


#### Generate JSON Description for the Insight

In [1]:
{
    "data_type": "comparative",
    "insight": "Analysis could not be performed due to missing 'department' column in the dataset",
    "insight_value": {},
    "plot": {
        "description": "Bar chart could not be generated due to KeyError indicating missing 'department' column"
    },
    "question": "What are the total expenses by department?",
    "actionable_insight": "No actionable insight could be generated due to missing 'department' column in the dataset"
}

{'data_type': 'comparative',
 'insight': "Analysis could not be performed due to missing 'department' column in the dataset",
 'insight_value': {},
 'plot': {'description': "Bar chart could not be generated due to KeyError indicating missing 'department' column"},
 'question': 'What are the total expenses by department?',
 'actionable_insight': "No actionable insight could be generated due to missing 'department' column in the dataset"}
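
The insight records in this notebook share one shape, so a "could not be performed" record could be generated from the list of missing columns instead of being written by hand. This is a hedged sketch: the function name `unavailable_insight` and its wording are assumptions modeled on the record above, not part of the original notebook.

```python
import json


def unavailable_insight(question: str, data_type: str, missing: list) -> dict:
    """Build an insight record for an analysis blocked by missing columns."""
    columns = ", ".join(f"'{name}'" for name in missing)
    reason = f"missing {columns} column(s) in the dataset"
    return {
        "data_type": data_type,
        "insight": f"Analysis could not be performed due to {reason}",
        "insight_value": {},
        "plot": {"description": f"Plot could not be generated due to {reason}"},
        "question": question,
        "actionable_insight": f"No actionable insight could be generated due to {reason}",
    }


insight = unavailable_insight(
    "What are the total expenses by department?", "comparative", ["department"]
)
print(json.dumps(insight, indent=4))
```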

### **Question 2: What are the average expenses per user within each department?**

This analysis will reveal the average expense per user within each department. This insight helps to understand individual spending behavior and whether there are significant discrepancies across departments.

In [5]:
# # Group by department and user, then calculate the average amount
# average_expense_per_user = df.groupby(['department', 'user'])['amount'].mean().groupby('department').mean().sort_values(ascending=False)

# # Plotting
# plt.figure(figsize=(10, 6))
# average_expense_per_user.plot(kind='bar', color='lightgreen')
# plt.title('Average Expense per User by Department')
# plt.xlabel('Department')
# plt.ylabel('Average Expense per User ($)')
# plt.xticks(rotation=45, ha='right')
# plt.tight_layout()
# plt.show()
print("N/A")

N/A
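
The commented-out cell averages twice: first per user, then across users within a department. That differs from a plain per-claim average whenever users file different numbers of claims. The sketch below shows the difference on made-up rows; the variable names are illustrative.

```python
import pandas as pd

# Made-up rows: user "a" files three small claims, user "b" one large claim.
sample = pd.DataFrame({
    "department": ["IT", "IT", "IT", "IT"],
    "user": ["a", "a", "a", "b"],
    "amount": [10.0, 10.0, 10.0, 50.0],
})

# Step 1: mean claim amount for each (department, user) pair.
per_user = sample.groupby(["department", "user"])["amount"].mean()

# Step 2: mean of those per-user means within each department.
avg_per_user = per_user.groupby(level="department").mean()

# For comparison: mean over all claims, which weights frequent filers more.
avg_per_claim = sample.groupby("department")["amount"].mean()

print(avg_per_user["IT"], avg_per_claim["IT"])
```

Here the per-user average is 30.0 while the per-claim average is 20.0, so the two-step version gives each user equal weight regardless of how many claims they submit.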


#### Generate JSON Description for the Insight

In [2]:
{
    "data_type": "comparative",
    "insight": "Analysis could not be completed due to a KeyError indicating that the 'department' column is missing from the dataset",
    "insight_value": {},
    "plot": {
        "description": "A bar chart was attempted to show average expenses per user across departments, but failed due to missing department column"
    },
    "question": "What are the average expenses per user within each department?",
    "actionable_insight": "No actionable insight could be generated due to missing 'department' column in the dataset"
}

{'data_type': 'comparative',
 'insight': "Analysis could not be completed due to a KeyError indicating that the 'department' column is missing from the dataset",
 'insight_value': {},
 'plot': {'description': 'A bar chart was attempted to show average expenses per user across departments, but failed due to missing department column'},
 'question': 'What are the average expenses per user within each department?',
 'actionable_insight': "No actionable insight could be generated due to missing 'department' column in the dataset"}

### **Question 3: What are the total expenses by category?**


Understanding the distribution of expenses across different categories can help identify areas where the company is spending the most and potentially optimize costs.



In [7]:
# import matplotlib.pyplot as plt

# # Group by category and sum the amount
# total_expenses_by_category = df.groupby('category')['amount'].sum().sort_values(ascending=False)

# # Plotting
# plt.figure(figsize=(10, 6))
# total_expenses_by_category.plot(kind='bar', color='mediumseagreen')
# plt.title('Total Expenses by Category')
# plt.xlabel('Category')
# plt.ylabel('Total Expenses ($)')
# plt.xticks(rotation=45, ha='right')
# plt.tight_layout()
# plt.show()
print("N/A")

N/A


#### Generate JSON Description for the Insight

In [3]:
{
    "data_type": "categorical",
    "insight": "Analysis could not be completed due to missing 'amount' column in the dataset",
    "insight_value": {},
    "plot": {
        "description": "A bar chart was attempted but failed due to missing 'amount' column in the dataset"
    },
    "question": "What are the total expenses by category?",
    "actionable_insight": "No actionable insight could be generated due to missing 'amount' column in the dataset"
}

{'data_type': 'categorical',
 'insight': "Analysis could not be completed due to missing 'amount' column in the dataset",
 'insight_value': {},
 'plot': {'description': "A bar chart was attempted but failed due to missing 'amount' column in the dataset"},
 'question': 'What are the total expenses by category?',
 'actionable_insight': "No actionable insight could be generated due to missing 'amount' column in the dataset"}

### **Question 4: How many expenses have been processed by each department?**


This analysis reveals the workload and activity level of each department by showing the number of expenses that have been processed.

In [9]:
# import matplotlib.pyplot as plt

# # Filter for processed expenses and group by department
# processed_expenses_by_department = df[df['state'] == 'Processed'].groupby('department').size().sort_values(ascending=False)

# # Plotting
# plt.figure(figsize=(10, 6))
# processed_expenses_by_department.plot(kind='bar', color='dodgerblue')
# plt.title('Number of Processed Expenses by Department')
# plt.xlabel('Department')
# plt.ylabel('Number of Processed Expenses')
# plt.xticks(rotation=45, ha='right')
# plt.tight_layout()
# plt.show()
print("N/A")

N/A
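
The count depends on the exact value stored in 'state'. As a sketch, the filter can take the state value as a parameter so that a value absent from the data yields an empty result rather than an error. The helper name `processed_counts` and the sample rows, including the 'Declined' and 'Pending' states, are made up for illustration.

```python
import pandas as pd


def processed_counts(frame: pd.DataFrame, state_value: str = "Processed") -> pd.Series:
    """Count rows per department whose state equals state_value."""
    mask = frame["state"] == state_value
    return frame.loc[mask].groupby("department").size().sort_values(ascending=False)


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "HR"],
    "state": ["Processed", "Declined", "Processed", "Processed", "Pending"],
})

counts = processed_counts(sample)
none_matching = processed_counts(sample, state_value="Closed")
print(counts)
print(len(none_matching))
```

Checking `frame["state"].unique()` first is a cheap way to confirm which state labels the data actually contains.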


#### Generate JSON Description for the Insight

In [4]:
{
    "data_type": "comparative",
    "insight": "Analysis could not be completed due to missing 'department' column in the dataset",
    "insight_value": {},
    "plot": {
        "description": "Bar chart could not be generated due to KeyError indicating missing 'department' column"
    },
    "question": "How many expenses have been processed by each department?",
    "actionable_insight": "No actionable insight could be generated due to missing 'department' column in the dataset"
}

{'data_type': 'comparative',
 'insight': "Analysis could not be completed due to missing 'department' column in the dataset",
 'insight_value': {},
 'plot': {'description': "Bar chart could not be generated due to KeyError indicating missing 'department' column"},
 'question': 'How many expenses have been processed by each department?',
 'actionable_insight': "No actionable insight could be generated due to missing 'department' column in the dataset"}

### **Question 5: What is the average processing time by department?**


This analysis will provide insights into how quickly each department processes expenses, which can highlight potential bottlenecks or efficiency issues.

In [11]:
# import matplotlib.pyplot as plt

# # Group by department and calculate the average processing time for processed expenses
# average_processing_time_by_department = df[df['state'] == 'Processed'].groupby('department')['processing_time_hours'].mean().sort_values()

# # Plotting
# plt.figure(figsize=(10, 6))
# average_processing_time_by_department.plot(kind='bar', color='purple')
# plt.title('Average Processing Time by Department')
# plt.xlabel('Department')
# plt.ylabel('Average Processing Time (Hours)')
# plt.xticks(rotation=45, ha='right')
# plt.tight_layout()
# plt.show()
print("N/A")

N/A
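
The commented-out cell relies on a 'processing_time_hours' column that is not among the fields listed in the dataset description. One plausible way to derive it, assumed here rather than confirmed by the source, is the difference between 'processed_date' and 'opened_at'. The helper name and sample rows below are hypothetical.

```python
import pandas as pd


def add_processing_time_hours(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with processing_time_hours = processed_date - opened_at."""
    out = frame.copy()
    # Unparseable or missing timestamps become NaT and propagate as NaN hours.
    opened = pd.to_datetime(out["opened_at"], errors="coerce")
    processed = pd.to_datetime(out["processed_date"], errors="coerce")
    out["processing_time_hours"] = (processed - opened).dt.total_seconds() / 3600
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "department": ["IT", "IT", "HR"],
    "state": ["Processed", "Processed", "Pending"],
    "opened_at": ["2023-01-02 08:00:00", "2023-01-03 08:00:00", "2023-01-04 08:00:00"],
    "processed_date": ["2023-01-02 20:00:00", "2023-01-04 08:00:00", None],
})

timed = add_processing_time_hours(sample)
avg_hours = (
    timed[timed["state"] == "Processed"]
    .groupby("department")["processing_time_hours"]
    .mean()
    .sort_values()
)
print(avg_hours)
```

Rows that have not been processed keep a missing duration and drop out of the average through the state filter.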


#### Generate JSON Description for the Insight

In [5]:
{
    "data_type": "comparative",
    "insight": "Analysis could not be completed due to missing 'department' column in the dataset",
    "insight_value": {},
    "plot": {
        "description": "A bar chart was attempted but failed due to missing 'department' column in the dataset"
    },
    "question": "What is the average processing time by department?",
    "actionable_insight": "No actionable insight could be generated due to missing 'department' column in the dataset"
}

{'data_type': 'comparative',
 'insight': "Analysis could not be completed due to missing 'department' column in the dataset",
 'insight_value': {},
 'plot': {'description': "A bar chart was attempted but failed due to missing 'department' column in the dataset"},
 'question': 'What is the average processing time by department?',
 'actionable_insight': "No actionable insight could be generated due to missing 'department' column in the dataset"}

### Summary of Findings (Flag 90):

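1. **Total Expenses by Department:** The analysis could not be conducted because the loaded dataset has no 'department' column. A missing column cannot be imputed; it has to be supplied in the source data before departmental expenses can be analyzed.

2. **Average Expense per User by Department:** This analysis could not be conducted because the 'department' column is missing. Individual spending behavior across departments can only be assessed once that column is available.

3. **Total Expenses by Category:** This analysis could not be conducted because the 'amount' column is missing. The 'category' column is present, but without amounts the expense distribution across categories cannot be computed.

4. **Processed Expenses by Department:** This analysis could not be conducted because the 'department' column is missing.

5. **Average Processing Time by Department:** This analysis could not be conducted because the 'department' column is missing.

The preview of the loaded file shows incident-style columns (for example 'caller_id', 'priority', and 'assigned_to') rather than the expense fields listed in the dataset description, so the data source should be verified before any of these analyses is rerun.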