# Supply Chain Verification System: Exploratory Data Analysis

This notebook contains exploratory data analysis for our supply chain verification system. We analyze product data, transfer patterns, and ethical scores, then look for potential anomalies.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sqlalchemy import create_engine
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

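# Set up database connection (fail early with a clear message if the URL is missing)
db_url = os.getenv('DATABASE_URL')
if not db_url:
    raise RuntimeError('DATABASE_URL is not set; add it to the environment or a .env file')
engine = create_engine(db_url)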

# Set plot styles (the 'seaborn' matplotlib style name was deprecated in matplotlib 3.6)
sns.set_theme(style='darkgrid', palette='viridis')

In [None]:
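# Load data. The joined tables can hold several rows per product, so the result
# has one row per (certification, ethical score, transfer) combination, not one per product.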
query = """
SELECT 
    p.id AS product_id,
    p.name,
    p.manufacturer,
    p.manufacturing_date,
    p.batch_number,
    p.current_owner,
    p.category,
    p.price,
    p.quantity,
    p.last_updated,
    c.certification_body,
    c.certification_date,
    c.expiration_date,
    e.score_category,
    e.score,
    e.assessment_date,
    t.transfer_date,
    t.from_owner,
    t.to_owner,
    t.location,
    t.latitude,
    t.longitude
FROM products p
LEFT JOIN certifications c ON p.id = c.product_id
LEFT JOIN ethical_scores e ON p.id = e.product_id
LEFT JOIN transfers t ON p.id = t.product_id
"""

df = pd.read_sql(query, engine)
print(f"Loaded {len(df)} rows covering {df['product_id'].nunique()} products")
df.head()
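
The query above joins three tables to `products` on the same key. If a product has m certifications, n ethical scores, and k transfers, it comes back as m × n × k rows. The following is a minimal sketch of that effect using an in-memory SQLite database; the table contents are made up for illustration and the schema is trimmed to a few columns.

```python
import sqlite3

# Toy schema: only the join key and one or two columns per table (illustrative data)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE certifications (product_id INTEGER, certification_body TEXT);
CREATE TABLE ethical_scores (product_id INTEGER, score_category TEXT, score REAL);
CREATE TABLE transfers (product_id INTEGER, from_owner TEXT, to_owner TEXT);

INSERT INTO products VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO certifications VALUES (1, 'body_a'), (1, 'body_b');
INSERT INTO ethical_scores VALUES
    (1, 'labor', 80), (1, 'environment', 70), (1, 'governance', 90);
INSERT INTO transfers VALUES (1, 'maker', 'shipper'), (1, 'shipper', 'retailer');
""")

# Same join shape as the notebook query, counting result rows per product
rows_per_product = dict(conn.execute("""
SELECT p.id, COUNT(*)
FROM products p
LEFT JOIN certifications c ON p.id = c.product_id
LEFT JOIN ethical_scores e ON p.id = e.product_id
LEFT JOIN transfers t ON p.id = t.product_id
GROUP BY p.id
""").fetchall())
conn.close()

print(rows_per_product)
```

Here product 1 yields 2 × 3 × 2 = 12 rows, while product 2, which has no related records, yields a single row of NULLs. Any count or distribution computed on the joined frame is weighted by this multiplicity unless the rows are first reduced to the grain being analyzed.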

## Data Overview

In [None]:
# Display basic statistics
df.describe()

In [None]:
# Check for missing values
missing_values = df.isnull().sum()
print("Missing values:")
print(missing_values[missing_values > 0])

## Product Analysis

In [None]:
# Product category distribution (one row per product, so join duplicates are not counted)
plt.figure(figsize=(12, 6))
df.drop_duplicates('product_id')['category'].value_counts().plot(kind='bar')
plt.title('Product Category Distribution')
plt.xlabel('Category')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

In [None]:
# Price distribution (one row per product, so join duplicates are not counted)
plt.figure(figsize=(12, 6))
sns.histplot(df.drop_duplicates('product_id')['price'], kde=True)
plt.title('Price Distribution')
plt.xlabel('Price')
plt.show()

## Transfer Analysis

In [None]:
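# Number of transfers per day
df['transfer_date'] = pd.to_datetime(df['transfer_date'])
# Keep one row per transfer; the joins repeat each transfer for every certification/score row
transfers = df.dropna(subset=['transfer_date']).drop_duplicates(
    subset=['product_id', 'transfer_date', 'from_owner', 'to_owner'])
transfers_over_time = (transfers.groupby(transfers['transfer_date'].dt.floor('D'))
                       .size().reset_index(name='count'))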

fig = px.line(transfers_over_time, x='transfer_date', y='count', title='Number of Transfers Over Time')
fig.show()

In [None]:
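# Transfer network visualization
import networkx as nx

# Transfers are directed (from_owner -> to_owner); skip rows where either owner is missing
edges = df.dropna(subset=['from_owner', 'to_owner'])
G = nx.from_pandas_edgelist(edges, 'from_owner', 'to_owner', create_using=nx.DiGraph)
pos = nx.spring_layout(G, seed=42)  # fixed seed keeps the layout reproducible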

plt.figure(figsize=(15, 15))
nx.draw(G, pos, with_labels=True, node_color='lightblue', node_size=500, font_size=8, arrows=True)
plt.title('Transfer Network')
plt.show()

## Ethical Sourcing Analysis

In [None]:
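# Ethical score distribution (one row per assessment, so join duplicates do not skew the quartiles)
scores = df.drop_duplicates(['product_id', 'score_category', 'assessment_date'])
plt.figure(figsize=(12, 6))
sns.boxplot(x='score_category', y='score', data=scores)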
plt.title('Ethical Score Distribution by Category')
plt.xlabel('Score Category')
plt.ylabel('Score')
plt.show()

In [None]:
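# Certification status per product (certified if any joined row has a certification body)
is_certified = df.groupby('product_id')['certification_body'].count() > 0
# Derive labels from the data; value_counts() orders by frequency, not by True/False
cert_status = is_certified.map({True: 'Certified', False: 'Not Certified'}).value_counts()
plt.figure(figsize=(8, 8))
plt.pie(cert_status.values, labels=cert_status.index, autopct='%1.1f%%')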
plt.title('Product Certification Status')
plt.show()

## Anomaly Detection

In [None]:
from sklearn.ensemble import IsolationForest

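# Prepare data for anomaly detection (copy so the label column can be added safely)
anomaly_features = ['price', 'quantity', 'score']
X = df[anomaly_features].dropna().copy()

# Fit the isolation forest; fit_predict returns -1 for anomalies and 1 for normal rows
clf = IsolationForest(contamination=0.1, random_state=42)
labels = clf.fit_predict(X)

# Store readable labels so the plot uses two discrete colors
X['anomaly'] = np.where(labels == -1, 'anomaly', 'normal')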

# Visualize anomalies
fig = px.scatter_3d(X, x='price', y='quantity', z='score', color='anomaly',
                    title='Anomaly Detection in 3D Space')
fig.show()
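
One caveat when reading this plot: `contamination=0.1` fixes the share of flagged rows in advance. `IsolationForest` places its decision threshold at the 10th percentile of the training scores, so about 10% of rows are labeled -1 even when the data has no real outliers. The following is a minimal sketch on synthetic, outlier-free Gaussian data (generated for illustration, not drawn from the database):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data with no injected outliers (illustrative only)
rng = np.random.default_rng(0)
X_clean = rng.normal(size=(1000, 3))

# Same settings as the notebook cell above
clf = IsolationForest(contamination=0.1, random_state=42)
labels = clf.fit_predict(X_clean)

# Fraction of rows labeled -1 (anomaly)
flagged_share = float((labels == -1).mean())
print(f"Share flagged as anomalous: {flagged_share:.3f}")
```

The flagged share lands near 0.1 by construction. Ranking rows by `clf.score_samples(X)` or passing `contamination='auto'` are two ways to avoid committing to a fixed share up front.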

## Conclusion

This exploratory data analysis covered four areas of our supply chain verification data:

1. Product category distribution and pricing patterns
2. Transfer patterns over time and network visualization
3. Ethical sourcing scores and certification status
4. Potential anomalies in price, quantity, and ethical scores

These findings can guide improvements to our verification processes and help identify potential risks in the supply chain.