# **LCOFI Algorithm on Pakistan Accident Dataset**

This document explains the process of applying the **LCOFI (Logic Circuit Optimization Frequent Itemset)** algorithm to a dataset of Pakistan traffic accidents. The algorithm mines frequent itemsets and generates association rules to uncover meaningful patterns.

---

## **Steps in the Code**

### **1. Preprocessing the Dataset**
- The dataset, stored in a CSV file, contains columns such as `Area`, `Year`, `Total number of accidents`, etc.
- Each row in the dataset is converted into a transaction where each column value is represented as `ColumnName_Value`. For example:
  - `Area_Punjab`, `Year_2021`, `Killed_10`.
- Transactions are stored as sets for efficient processing.
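
The row-to-transaction conversion described above can be sketched on a tiny in-memory table. The sample values below are made up for illustration, and `row_to_transaction` is a hypothetical helper name, not part of the original code.

```python
import pandas as pd

# Hypothetical two-row sample; column names mirror those mentioned above,
# but the values are invented for illustration.
sample = pd.DataFrame(
    {
        "Area": ["Punjab", "Sindh"],
        "Year": ["2020-21", None],  # second row has a missing Year
        "Killed": [10, 12],
    }
)

def row_to_transaction(row):
    """Turn one row into a set of 'Column_Value' items, skipping missing cells."""
    return {f"{col}_{val}" for col, val in row.items() if pd.notnull(val)}

transactions = [row_to_transaction(row) for _, row in sample.iterrows()]
```

Because missing cells are skipped, the second transaction has no `Year_...` item at all rather than an item such as `Year_None`.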

### **2. Define the LCOFI Algorithm**
- **Purpose**: Mine frequent itemsets using a graph-based approach.
- **Key Components**:
  1. **Bipartite Graph Representation**:
     - Transactions and items are represented as nodes in a bipartite graph.
     - Edges connect transactions to their items.
  2. **Frequent Itemset Mining**:
     - The algorithm starts with single-itemsets and iteratively generates larger itemsets.
     - Support for itemsets is counted by checking their presence in transactions.
     - Itemsets that meet the minimum support threshold are retained, while others are pruned.
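
As a minimal sketch of how the bipartite view relates to support counting: the neighbors of an item node are the transactions that contain it, so an itemset's support is the size of the intersection of its items' neighbor sets. This is one possible use of the graph, shown with plain dictionaries and made-up transactions. It is an assumption for illustration and differs from the implementation in this notebook, which counts support by scanning the transactions.

```python
from collections import defaultdict
from itertools import combinations

# Toy transactions, made up for illustration.
transactions = [
    {"Area_Punjab", "Year_2021"},
    {"Area_Punjab", "Year_2021"},
    {"Area_Punjab", "Year_2020"},
    {"Area_Sindh", "Year_2021"},
]

# Item side of the bipartite graph: item -> ids of the transactions it touches.
item_to_tids = defaultdict(set)
for tid, transaction in enumerate(transactions):
    for item in transaction:
        item_to_tids[item].add(tid)

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    shared = set.intersection(*(item_to_tids[item] for item in itemset))
    return len(shared) / len(transactions)

min_support = 0.5

# Level 1: single items that meet the threshold.
level_1 = {frozenset([i]) for i in item_to_tids if support([i]) >= min_support}

# Level 2: pairs built only from items that survived level 1.
frequent_items = sorted(set().union(*level_1))
level_2 = {
    frozenset(pair)
    for pair in combinations(frequent_items, 2)
    if support(pair) >= min_support
}
```

Here `Area_Sindh` and `Year_2020` are pruned at level 1, so only one pair is ever tested at level 2.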

### **3. Generate Association Rules**
- **Purpose**: Extract meaningful relationships between items in the frequent itemsets.
- **Process**:
  1. Flatten the frequent itemsets into a single dictionary.
  2. Prepare a DataFrame containing the itemsets and their normalized support values.
  3. Use the `mlxtend.frequent_patterns.association_rules` function to compute rules.
     - Metrics include:
       - **Support**: Proportion of transactions containing the itemset.
       - **Confidence**: Likelihood that the consequent occurs given the antecedent.
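       - **Lift**: Ratio of the rule's confidence to the consequent's support; values above 1 mean the antecedent and consequent co-occur more often than they would if they were independent.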
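
The three metrics can be checked by hand for a single rule `A => B`. The counts below are made up for illustration; the arithmetic follows the standard definitions of support, confidence, and lift.

```python
# Made-up counts for a rule A => B over 10 transactions.
n_transactions = 10
count_a = 5   # transactions containing the antecedent A
count_b = 4   # transactions containing the consequent B
count_ab = 3  # transactions containing both A and B

support_a = count_a / n_transactions
support_b = count_b / n_transactions
support_ab = count_ab / n_transactions

confidence = support_ab / support_a  # estimate of P(B | A)
lift = confidence / support_b        # confidence relative to B's baseline rate

print(f"support={support_ab:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

With these counts the rule has support 0.30, confidence 0.60, and lift 1.50, so `B` is 1.5 times as likely in transactions that contain `A` as it is overall.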

### **4. Load and Analyze the Dataset**
- **Input File**: `Datasets/pak-traffic-accidents-annual.csv`.
- **Parameters**:
  - `min_support`: Set to 0.09, meaning an itemset must appear in at least 9% of the transactions.
  - `min_confidence`: Set to 0.2, meaning rules must have at least 20% confidence.
- The dataset is preprocessed into transactions, and the LCOFI algorithm is run to mine frequent itemsets.
- Association rules are then generated and displayed.
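
To make the support threshold concrete, the sketch below converts a relative `min_support` into the smallest absolute count that passes it. The dataset size `n` is hypothetical, since the real row count is not stated here, and `is_frequent` is an illustrative helper name.

```python
def is_frequent(count, n_transactions, min_support):
    """Apply the relative-support comparison used when pruning itemsets."""
    return count / n_transactions >= min_support

# Hypothetical dataset size, chosen only for illustration.
n = 50
min_support = 0.09

# Smallest occurrence count whose relative support reaches the threshold.
min_needed = next(c for c in range(n + 1) if is_frequent(c, n, min_support))
```

For 50 transactions, an itemset seen 4 times (8%) is pruned, while one seen 5 times (10%) is kept.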

---

## **Outputs**
1. **Frequent Itemsets**:
   - Lists itemsets at each level (e.g., single items, two-item combinations) with their support values.
   - Illustrative example (values are not taken from the actual run):
     ```
     Level 1:
       {'Area_Punjab'}: 0.50
       {'Year_2021'}: 0.40
     Level 2:
       {'Area_Punjab', 'Year_2021'}: 0.30
     ```

2. **Association Rules**:
   - Displays relationships between itemsets with metrics such as support, confidence, and lift.
   - Illustrative example (values are not taken from the actual run):
     ```
     {'Area_Punjab'} => {'Year_2021'} (Support: 0.30, Confidence: 0.60, Lift: 1.50)
     ```

---

## **Key Benefits of This Implementation**
1. **Efficient Mining**:
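   - The LCOFI algorithm represents transactions and items as a bipartite graph. In this implementation the graph is used to enumerate the distinct items, and support is still counted by scanning the transactions at each level.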
2. **Customizable Thresholds**:
   - Parameters like `min_support` and `min_confidence` can be adjusted to control the level of detail in the output.
3. **Actionable Insights**:
   - Association rules provide valuable patterns that can guide decision-making.

---

## **Applications**
- **Traffic Safety**:
  - Identify patterns in accident data (e.g., frequent occurrences in certain areas or time periods).
- **Policy Making**:
  - Use insights to target interventions, such as improving road safety in high-risk areas.
- **Healthcare**:
  - Analyze injury patterns to enhance emergency response systems.

This code demonstrates a practical approach to frequent itemset mining and association rule generation using real-world data. Adjust parameters as needed to tailor the analysis for different datasets or objectives.

In [None]:
import pandas as pd
import itertools
import networkx as nx
from mlxtend.frequent_patterns import association_rules

# Step 1: Preprocess the Dataset
def preprocess_accident_data(filename):
    """Preprocess the Pakistan accident dataset into transactions."""
    # Load the dataset
    data = pd.read_csv(filename)

    # Convert each row into a transaction
    transactions = []
    for _, row in data.iterrows():
        transaction = set()
        for col in row.index:
            if pd.notnull(row[col]):
                transaction.add(f"{col}_{row[col]}")
        transactions.append(transaction)
    return transactions

# Step 2: Define LCOFI Algorithm
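def generate_candidates(frequent_itemsets, size):
    """Generate candidate itemsets of a specific size.

    Candidates are all `size`-combinations of the items that appear in the
    previous level's frequent itemsets.
    """
    items = set(itertools.chain.from_iterable(frequent_itemsets))
    return {frozenset(combo) for combo in itertools.combinations(items, size)}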

def count_support(itemsets, transactions, min_support):
    """Count the support of itemsets."""
    support_counts = {item: 0 for item in itemsets}
    for transaction in transactions:
        for item in itemsets:
            if item.issubset(transaction):
                support_counts[item] += 1
    return {
        item: count for item, count in support_counts.items()
        if count / len(transactions) >= min_support
    }

def lcofi(transactions, min_support):
    """LCOFI algorithm for frequent itemsets using graph representation."""
    # Build a bipartite graph
    G = nx.Graph()
    for i, transaction in enumerate(transactions):
        for item in transaction:
            G.add_edge(f"Transaction_{i}", item)

    # Generate single-item frequent itemsets
    single_items = {frozenset([node]) for node in G if G.degree[node] > 0 and not node.startswith("Transaction")}
    frequent_itemsets = count_support(single_items, transactions, min_support)

    all_frequent_itemsets = [frequent_itemsets]

    # Iteratively generate larger itemsets
    k = 2
    while frequent_itemsets:
        candidates = generate_candidates(frequent_itemsets, k)
        frequent_itemsets = count_support(candidates, transactions, min_support)
        if frequent_itemsets:
            all_frequent_itemsets.append(frequent_itemsets)
        k += 1

    return all_frequent_itemsets

# Step 3: Generate Association Rules
def generate_association_rules(frequent_itemsets, transactions, min_confidence):
    """Generate association rules from frequent itemsets."""
    # Flatten frequent itemsets
    flat_itemsets = {}
    for level in frequent_itemsets:
        flat_itemsets.update(level)

    # Prepare DataFrame
    num_transactions = len(transactions)
    data = {
        'itemsets': list(flat_itemsets.keys()),
        'support': [support / num_transactions for support in flat_itemsets.values()]
    }
    frequent_itemsets_df = pd.DataFrame(data)

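    # Generate association rules
    # Note: `num_itemsets` is only accepted by newer mlxtend releases;
    # remove it if your installed version rejects the argument.
    rules = association_rules(frequent_itemsets_df, metric="confidence", min_threshold=min_confidence, num_itemsets=num_transactions)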
    return rules

# Step 4: Load and Analyze the Dataset
filename = "Datasets/pak-traffic-accidents-annual.csv"
transactions = preprocess_accident_data(filename)

# Parameters
min_support = 0.09  # Minimum support threshold
min_confidence = 0.2  # Minimum confidence threshold

# Run LCOFI Algorithm
frequent_itemsets = lcofi(transactions, min_support)

# Generate Association Rules
rules = generate_association_rules(frequent_itemsets, transactions, min_confidence)

# Display Results
print("Frequent Itemsets:")
for k, itemsets in enumerate(frequent_itemsets, start=1):
    print(f"Level {k}:")
    for itemset, support in itemsets.items():
        print(f"  {set(itemset)}: {support / len(transactions):.2f}")

print("\nAssociation Rules:")
print(rules)

Frequent Itemsets:
Level 1:
  {'Area_Khyber Pakhtunkhwa'}: 0.18
  {'Area_Punjab'}: 0.18
  {'Area_Pakistan'}: 0.18
  {'Year_2017-18'}: 0.10
  {'Area_Islamabad'}: 0.11
  {'Year_2018-19'}: 0.10
  {'Area_Sindh'}: 0.18
  {'Area_Balochistan'}: 0.18

Association Rules:
Empty DataFrame
Columns: [antecedents, consequents, antecedent support, consequent support, support, confidence, lift, representativity, leverage, conviction, zhangs_metric, jaccard, certainty, kulczynski]
Index: []
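
In the run above, only `Area_*` and `Year_*` items reach the support threshold and no two-item combination does, so no rules are produced. One possible reason (an assumption, since the raw data is not shown here) is that numeric columns take a different value in almost every row, so each `Column_Value` item occurs too rarely to be frequent. A common remedy is to bin numeric columns before building transactions. The sketch below uses `pandas.qcut` on made-up values; the column name `Killed` and the three labels are illustrative choices.

```python
import pandas as pd

# Hypothetical counts; real column names and values will differ.
killed = pd.Series([3, 12, 48, 51, 120, 400], name="Killed")

# Three equal-frequency bins, so each label covers about a third of the rows.
killed_level = pd.qcut(killed, q=3, labels=["low", "medium", "high"])

# Items now repeat across rows instead of being unique per row.
items = [f"{killed.name}_{level}" for level in killed_level]
```

With binned items, values such as `Killed_high` can recur across many rows, which gives larger itemsets a chance to reach `min_support`.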
