# Claims Optimization and Error Detection

This notebook contains exploratory data analysis (EDA) and visualizations to identify patterns in claim denials and errors.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set the visualization style (set_theme is the current name; sns.set is an alias)
sns.set_theme(style='whitegrid')

In [2]:
# Load the dataset
data = pd.read_csv('../data/dataset.csv')
data.head()

In [3]:
# Summary statistics
data.describe()

In [4]:
# Check for missing values
missing_values = data.isnull().sum()
missing_values[missing_values > 0]
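
Raw counts are hard to judge without knowing the table size, so it can help to report the share of missing rows alongside them. The following is a minimal sketch, not part of the original analysis: `missing_summary` is a hypothetical helper, and the toy frame (including the `provider_id` column) is made-up illustration data rather than the real dataset.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column.

    Only columns with at least one missing value are kept,
    sorted from most to least affected.
    """
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy data for illustration only (not the notebook's dataset)
example = pd.DataFrame({
    "claim_amount": [120.0, None, 300.0, 80.0],
    "claim_status": ["Paid", "Denied", None, None],
    "provider_id": [1, 2, 3, 4],
})

summary = missing_summary(example)
print(summary)
```

On the real data the call would be `missing_summary(data)`. Columns with a high missing share are candidates for imputation or removal during cleaning.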

In [5]:
# Visualize claim amounts distribution
plt.figure(figsize=(10, 6))
sns.histplot(data['claim_amount'], bins=30, kde=True)
plt.title('Distribution of Claim Amounts')
plt.xlabel('Claim Amount')
plt.ylabel('Frequency')
plt.show()
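
If the histogram shows a long right tail, which is common for monetary amounts but is an assumption here until the plot is inspected, a log transform can make the bulk of the distribution easier to read. The sketch below uses made-up amounts and compares skewness before and after `np.log1p`; it does not use the notebook's data.

```python
import numpy as np
import pandas as pd

# Toy amounts for illustration only: one very large claim dominates the scale
amounts = pd.Series([50, 80, 120, 150, 200, 400, 5000], name="claim_amount")

# log1p computes log(1 + x), so zero-valued claims remain defined
log_amounts = np.log1p(amounts)

raw_skew = amounts.skew()
log_skew = log_amounts.skew()
print(f"skewness before: {raw_skew:.2f}, after log1p: {log_skew:.2f}")
```

In the notebook, the transformed values could be plotted with `sns.histplot(np.log1p(data['claim_amount']), bins=30, kde=True)`, assuming the column holds no negative amounts.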

In [6]:
# Analyze claim status
plt.figure(figsize=(10, 6))
sns.countplot(x='claim_status', data=data)
plt.title('Count of Claims by Status')
plt.xlabel('Claim Status')
plt.ylabel('Count')
plt.show()
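
Since the goal is to find patterns in denials, a natural follow-up to the raw counts is the denial rate within each group of claims. This is a rough sketch under stated assumptions: the status label `'Denied'` and the grouping column `payer` are hypothetical and may not match the actual dataset, and `denial_rate_by` is an illustrative helper, not an existing function.

```python
import pandas as pd


def denial_rate_by(df: pd.DataFrame, group_col: str,
                   status_col: str = "claim_status",
                   denied_label: str = "Denied") -> pd.Series:
    """Return the fraction of denied claims per group, highest first.

    `denied_label` is an assumed status value; adjust it to the labels
    that actually appear in the data.
    """
    denied = df[status_col].eq(denied_label)
    return denied.groupby(df[group_col]).mean().sort_values(ascending=False)


# Toy data for illustration only; `payer` is a hypothetical column
example = pd.DataFrame({
    "payer": ["A", "A", "B", "B", "B"],
    "claim_status": ["Denied", "Paid", "Denied", "Denied", "Paid"],
})

rates = denial_rate_by(example, "payer")
print(rates)
```

Checking `data['claim_status'].unique()` first confirms which labels exist before choosing `denied_label`.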

In [7]:
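# Correlation heatmap (numeric columns only; text columns such as
# claim_status cannot be correlated and raise an error in pandas >= 2.0)
plt.figure(figsize=(12, 8))
correlation = data.select_dtypes(include='number').corr()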
sns.heatmap(correlation, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

## Conclusion

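This notebook takes a first look at the claims data: summary statistics, missing values, the distribution of claim amounts, claim counts by status, and correlations between numeric columns. The next steps are data cleaning, feature engineering, and model training.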