
# AI Use Cases Library – Starter Analysis Notebook

This notebook demonstrates **basic analysis workflows** with the AI Use Case dataset (**3,023 cases**).
It is intentionally **minimal** for v2.0 – just enough to show dataset loading and simple analysis.

**Folders**
- `../../data/use-cases.csv` – main dataset
- `../../insights/` – curated written insights
- `../../charts/` – PNG charts for quick browsing
- `../../tools/analysis-scripts/` – this folder
- Back to [README](../../README.md) · See insights: [Trends](../../insights/trends-analysis.md) · [Vendor Comparison](../../insights/vendor-comparison.md)

> Tip: Create a new branch for your experiments before committing changes.


## 1) Setup

In [None]:

import pandas as pd
import matplotlib.pyplot as plt

# Display options
pd.set_option("display.max_colwidth", 160)

# Load dataset (CSV). The path is relative to the working directory, so run this
# notebook from its own folder (tools/analysis-scripts/) or adjust the path.
df = pd.read_csv("../../data/use-cases.csv")

print(f"Dataset shape: {df.shape}")
print(f"Total cases: {len(df)}")
df.head()
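
If the relative path does not resolve because the notebook was started from a different folder, one option is to search upward for the data file. The sketch below is only an illustration: `find_dataset` is a hypothetical helper, not part of this repository, and it assumes the dataset sits at `data/use-cases.csv` under some parent folder, as the folder list above suggests.

```python
from pathlib import Path


def find_dataset(start: Path, relative: str = "data/use-cases.csv") -> Path:
    """Walk upward from `start` until `relative` exists under some ancestor.

    Hypothetical helper for illustration; the default location is an assumption.
    """
    start = start.resolve()
    for folder in [start, *start.parents]:
        candidate = folder / relative
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{relative} not found in {start} or any parent folder")


# Usage in the notebook (uncomment to try):
# df = pd.read_csv(find_dataset(Path.cwd()))
```

Raising `FileNotFoundError` keeps the failure explicit instead of silently loading a different file.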


## 2) Basic Overview

In [None]:

# Column list
print("Columns in dataset:")
print(df.columns.tolist())

# Ten columns with the most missing values
print("\nMissing values:")
df.isna().sum().sort_values(ascending=False).head(10)
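
Raw missing counts are easier to judge next to a percentage. A minimal sketch, assuming a hypothetical helper named `missing_summary` and a tiny made-up frame (the values are not from the real dataset):

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, most missing first."""
    summary = pd.DataFrame({
        "missing": frame.isna().sum(),
        "pct": (frame.isna().mean() * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


# Made-up rows for illustration only
demo = pd.DataFrame({
    "Use Case Industry": ["Healthcare", None, "Retail", None],
    "Use Case Domain": ["Imaging", "Support", None, "Pricing"],
})
demo_summary = missing_summary(demo)
print(demo_summary)
```

In the notebook, `missing_summary(df).head(10)` would give the same ten columns as the cell above, with the percentage alongside.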


## 3) Cases by Industry

In [None]:

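# Top 15 industries by case count; missing values are grouped as "Unspecified".
# Re-sorted ascending so the largest bar is drawn at the top of the horizontal chart.
industry_counts = (
    df["Use Case Industry"]
      .fillna("Unspecified")
      .value_counts()
      .head(15)
      .sort_values(ascending=True)
)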

plt.figure(figsize=(10,6))
industry_counts.plot(kind="barh", color='#1f77b4')
plt.title("Top 15 Industries by Case Count", fontsize=14, fontweight='bold')
plt.xlabel("Number of Cases", fontsize=12)
plt.ylabel("")
plt.tight_layout()
plt.show()


## 4) Cases by Domain

In [None]:

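# Top 15 domains by case count; missing values are grouped as "Unspecified".
# Re-sorted ascending so the largest bar is drawn at the top of the horizontal chart.
domain_counts = (
    df["Use Case Domain"]
      .fillna("Unspecified")
      .value_counts()
      .head(15)
      .sort_values(ascending=True)
)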

plt.figure(figsize=(10,6))
domain_counts.plot(kind="barh", color='#1f77b4')
plt.title("Top 15 Domains by Case Count", fontsize=14, fontweight='bold')
plt.xlabel("Number of Cases", fontsize=12)
plt.ylabel("")
plt.tight_layout()
plt.show()
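
Sections 3 and 4 repeat the same counting chain, so it could be factored into one function. This is a sketch only: `top_counts` is a hypothetical name, and the demo rows are made up.

```python
import pandas as pd


def top_counts(frame: pd.DataFrame, column: str, n: int = 15,
               fill: str = "Unspecified") -> pd.Series:
    """Count values in `column`, keep the n most frequent, sort ascending for barh."""
    return (
        frame[column]
          .fillna(fill)
          .value_counts()
          .head(n)
          .sort_values(ascending=True)
    )


# Made-up rows for illustration only
demo = pd.DataFrame({
    "Use Case Industry": ["Retail", "Retail", "Healthcare", None, "Retail", "Healthcare"],
})
demo_counts = top_counts(demo, "Use Case Industry", n=2)
print(demo_counts)
```

With the real data, `top_counts(df, "Use Case Industry")` and `top_counts(df, "Use Case Domain")` should reproduce the two series plotted above.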


## 5) Vendor Mentions (quick demo)

In [None]:

# Simple keyword searches in Tool/Technology column – adjust as needed
tools = df["Tool/Technology"].fillna("").str.lower()

# Define vendor search patterns. These are plain substring matches without \b word
# boundaries, so short tokens such as "aws" can also match inside longer words.
vendor_patterns = {
    "Microsoft (Azure/Copilot)": r"azure|microsoft|copilot",
    "Google (Gemini/Vertex)": r"google|gemini|vertex",
    "OpenAI (GPT/ChatGPT)": r"openai|gpt|chatgpt",
    "AWS (Bedrock/SageMaker)": r"amazon|aws|bedrock|sagemaker",
    "Anthropic (Claude)": r"anthropic|claude",
    "NVIDIA": r"nvidia",
    "IBM (watsonx)": r"ibm|watson",
}

metrics = {}
for vendor, pattern in vendor_patterns.items():
    count = tools.str.contains(pattern, regex=True, case=False).sum()
    metrics[vendor] = count

# Display results
vendor_series = pd.Series(metrics).sort_values(ascending=False)
print("\nVendor Tool Mentions:")
for vendor, count in vendor_series.items():
    pct = (count / len(df) * 100)
    print(f"  {vendor:35s} {count:4d} cases ({pct:5.1f}%)")

vendor_series
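
The choice between substring and word-boundary matching is a trade-off. The sketch below uses a made-up series (not real dataset rows) to show one possible failure in each direction: a substring pattern over-matches, and a `\b` pattern misses tokens that are fused to other characters.

```python
import pandas as pd

# Made-up tool descriptions for illustration only
demo_tools = pd.Series([
    "aws bedrock",
    "tool for reviewing bylaws",   # contains "aws" only inside "bylaws"
    "gpt4 via azure",              # "gpt" is fused to the digit 4
    "in-house model",
])

# Substring match: also hits "bylaws"
aws_substring = demo_tools.str.contains(r"aws", regex=True)
# Word-boundary match: only the standalone token "aws"
aws_boundary = demo_tools.str.contains(r"\baws\b", regex=True)

# Substring match finds "gpt" in "gpt4"
gpt_substring = demo_tools.str.contains(r"gpt", regex=True)
# Word-boundary match misses it: there is no boundary between "t" and "4"
gpt_boundary = demo_tools.str.contains(r"\bgpt\b", regex=True)

print(pd.DataFrame({
    "text": demo_tools,
    "aws_substring": aws_substring,
    "aws_boundary": aws_boundary,
    "gpt_substring": gpt_substring,
    "gpt_boundary": gpt_boundary,
}))
```

Either way, it helps to spot-check a few matched rows for each vendor before relying on the counts.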


## 6) Outcomes & Benefits (keyword scan)

In [None]:

outcomes = df["Outcomes & Benefits"].fillna("").str.lower()

# Define outcome keyword patterns (substring matches). A case can count toward
# several categories, and broad tokens such as "time" also match inside longer words.
keywords = {
    "Time savings / Speed": r"time|faster|speed|latency|turnaround|accelerat",
    "Productivity": r"productivity|throughput|output|efficiency",
    "Accuracy / Quality": r"accuracy|quality|precision|error|defect",
    "Customer Satisfaction": r"satisfaction|csat|nps|experience|engagement",
    "Cost reduction": r"cost|expense|savings",  # "cost" already covers "reduce ... cost"
    "Revenue / Growth": r"revenue|sales|conversion|growth|profit",
    "Automation": r"automat|autonomous|straight-through",
    "Risk / Compliance": r"risk|compliance|regulatory|audit|fraud",
}

outcome_counts = {}
for category, pattern in keywords.items():
    count = outcomes.str.contains(pattern, regex=True, case=False).sum()
    outcome_counts[category] = count

# Sort and plot
series = pd.Series(outcome_counts).sort_values(ascending=True)

plt.figure(figsize=(10,6))
series.plot(kind="barh", color='#1f77b4')
plt.title("Outcomes & Benefits – keyword frequency", fontsize=14, fontweight='bold')
plt.xlabel("Number of Cases Mentioning Keyword", fontsize=12)
plt.ylabel("")
plt.tight_layout()
plt.show()
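
Because each category is scanned independently, one case can be counted under several categories, so the bars do not sum to the number of cases. A minimal sketch of how to inspect that overlap, using made-up text and a shortened pattern set (both are assumptions for illustration):

```python
import pandas as pd

# Made-up outcome descriptions for illustration only
demo_outcomes = pd.Series([
    "reduced cost and faster turnaround",
    "improved accuracy",
    "",
]).str.lower()

# Shortened, illustrative patterns (not the full set used above)
demo_keywords = {
    "Time savings / Speed": r"time|faster|speed",
    "Accuracy / Quality": r"accuracy|quality",
    "Cost reduction": r"cost|expense|savings",
}

# One boolean column per category, one row per case
matches = pd.DataFrame({
    category: demo_outcomes.str.contains(pattern, regex=True)
    for category, pattern in demo_keywords.items()
})

per_category = matches.sum()               # what the bar chart shows
categories_per_case = matches.sum(axis=1)  # how many categories each case hits
cases_with_any = int(matches.any(axis=1).sum())

print(per_category)
print(categories_per_case.tolist())
print(f"Cases matching at least one category: {cases_with_any}")
```

Keeping the boolean matrix around also makes it easy to pull up the rows behind any single bar for a manual check.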


## 7) Export helper (optional)

In [None]:

# Example: Save a filtered slice as CSV (uncomment to use)
# subset = df[df["Use Case Industry"] == "Healthcare"]
# subset.to_csv("healthcare_cases.csv", index=False)
# print(f"Exported {len(subset)} healthcare cases")
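
The same export can be wrapped in a small function when several slices are needed. This is a sketch under assumptions: `export_subset` is a hypothetical helper, and the file name simply follows the `healthcare_cases.csv` pattern from the example above.

```python
from pathlib import Path

import pandas as pd


def export_subset(frame: pd.DataFrame, column: str, value: str, out_dir="."):
    """Write the rows where `column` equals `value` to '<value>_cases.csv'.

    Returns the output path and the number of exported rows.
    """
    subset = frame[frame[column] == value]
    out_path = Path(out_dir) / f"{value.lower().replace(' ', '_')}_cases.csv"
    subset.to_csv(out_path, index=False)
    return out_path, len(subset)


# Usage in the notebook (uncomment to try):
# path, n = export_subset(df, "Use Case Industry", "Healthcare")
# print(f"Exported {n} cases to {path}")
```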



---

### Notes
- For deeper analysis, see `../../insights/` and the charts in `../../charts/`.
- Please keep analyses reproducible (pin environments, comment code meaningfully).
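
One lightweight way to support the reproducibility note is to record the interpreter and package versions at the top of an analysis. A minimal sketch using only the standard library; `package_versions` is a hypothetical helper name, and the package list is just the two imports used in this notebook.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Return {package: installed version, or None if it is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = package_versions(["pandas", "matplotlib"])
print(f"python {platform.python_version()}")
for name, version in report.items():
    print(f"{name} {version}")
```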

**Happy exploring!**


