# **LCOFI Algorithm Analysis on `accidents.dat` Dataset**

This document outlines the steps and methodology for applying the **LCOFI (Logic Circuit Optimization Frequent Itemset)** algorithm to the `accidents.dat` dataset. The primary objective is to discover **frequent itemsets** and generate **association rules** from the transactional data in a computationally efficient manner using graph-based techniques.

---

## **Steps Involved**

### **Step 1: Load the Dataset**

The `accidents.dat` dataset contains one transaction per line, where each line is a whitespace-separated set of items (represented as integers). These items correspond to attributes or characteristics of recorded accidents.

The dataset is loaded and processed into a structured format, where each line is read as a set of items (or a transaction). This transactional data will serve as the input for the LCOFI algorithm.
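
As a minimal sketch of this loading step, the snippet below parses a few made-up lines in the same whitespace-separated integer layout. The helper name `parse_transactions` and the sample lines are hypothetical and are not taken from `accidents.dat`.

```python
def parse_transactions(lines):
    """Turn whitespace-separated integer lines into a list of item sets."""
    transactions = []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines so they do not count as transactions
            continue
        transactions.append(set(map(int, fields)))
    return transactions

# Made-up lines for illustration only (hypothetical data).
sample_lines = ["1 2 3\n", "2 3\n", "\n", "1 3 4\n"]
sample_transactions = parse_transactions(sample_lines)
print(sample_transactions)
```

Storing each transaction as a `set` makes the later subset checks (`itemset <= transaction`) straightforward.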

---

### **Step 2: Represent Transactions as a Bipartite Graph**

The LCOFI algorithm employs a **graph-based representation** of transactions:
- **Nodes**: The graph consists of two types of nodes:
  - Transaction nodes: Representing each transaction uniquely.
  - Item nodes: Representing individual items across all transactions.
- **Edges**: Each edge connects a transaction node to an item node if the item is present in the transaction.

This bipartite representation allows efficient traversal and processing of transactions for frequent itemset mining.
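
To illustrate the bipartite structure without any graph library, the sketch below stores the edges as a plain dictionary from each item node to the transaction nodes it touches. The toy transactions and the name `item_to_tids` are assumptions made for this example. One useful consequence of the representation is that the degree of an item node equals the number of transactions containing that item.

```python
from collections import defaultdict

# Toy transactions (hypothetical data).
toy_transactions = [{1, 2, 3}, {2, 3}, {1, 3, 4}]

# Edges of the bipartite graph: item node -> set of transaction node ids.
item_to_tids = defaultdict(set)
for tid, items in enumerate(toy_transactions):
    for item in items:
        item_to_tids[item].add(tid)

# The degree of an item node is its support count.
degrees = {item: len(tids) for item, tids in item_to_tids.items()}
print(sorted(degrees.items()))  # [(1, 2), (2, 2), (3, 3), (4, 1)]
```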

---

### **Step 3: Discover Frequent Itemsets**

Frequent itemsets are identified using the LCOFI algorithm, which includes the following steps:
1. **Initialize Single-Item Frequent Itemsets**:
   - Each item is treated as a single-item candidate, and its support (occurrence frequency) is computed.
   - Items meeting the **minimum support threshold** are retained as frequent single-itemsets.

2. **Iterative Candidate Generation**:
   - Larger candidate itemsets are generated from previously discovered frequent itemsets.
   - For `k`-itemsets, candidates are generated by combining frequent `(k-1)`-itemsets while ensuring that every `(k-1)`-subset of a candidate is itself frequent (Apriori property).

3. **Support Counting**:
   - The support of each candidate itemset is computed by checking its occurrence across transactions.
   - Candidates meeting the **minimum support threshold** are retained as frequent itemsets.

4. **Graph Optimization**:
   - The graph representation is updated dynamically to prune infrequent itemsets and reduce computational overhead.

This iterative process continues until no further frequent itemsets can be generated.
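
The candidate generation step can be sketched as a join followed by an Apriori prune. This is an illustrative version under stated assumptions, not necessarily how LCOFI itself is specified: the function name `apriori_candidates` and the toy level-2 itemsets are hypothetical.

```python
from itertools import combinations

def apriori_candidates(frequent_prev, k):
    """Build k-item candidates whose (k-1)-subsets are all in frequent_prev."""
    frequent_prev = set(frequent_prev)
    candidates = set()
    for a in frequent_prev:
        for b in frequent_prev:
            union = a | b
            if len(union) != k:  # join step: keep only unions of size k
                continue
            # Prune step: every (k-1)-subset must already be frequent.
            if all(frozenset(s) in frequent_prev
                   for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates

# Toy frequent 2-itemsets (hypothetical data).
frequent_2 = {frozenset(p) for p in [(1, 2), (1, 3), (2, 3), (3, 4)]}
candidates_3 = apriori_candidates(frequent_2, 3)
print(candidates_3)  # {frozenset({1, 2, 3})}
```

Here `{1, 3, 4}` and `{2, 3, 4}` are discarded before any support counting because `{1, 4}` and `{2, 4}` are not frequent.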

---

### **Step 4: Generate Association Rules**

Once frequent itemsets are identified, **association rules** are generated to uncover relationships between items. These rules are evaluated using the following metrics:
- **Support**: The proportion of transactions containing both the antecedent and consequent of the rule.
- **Confidence**: The probability that a transaction containing the antecedent also contains the consequent.
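- **Lift**: The ratio of the rule's confidence to the support of the consequent. A lift above 1 means the antecedent and consequent co-occur more often than would be expected if they were independent.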

Rules meeting the **minimum confidence threshold** are retained, providing valuable insights into patterns and relationships in the dataset.
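
As a worked example of these metrics, the sketch below computes support, confidence, and lift for a single rule directly from a toy transaction list. The function name `rule_metrics` and the data are hypothetical, and the code assumes the antecedent and consequent each occur in at least one transaction.

```python
def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    both = antecedent | consequent
    supp_a = sum(antecedent <= t for t in transactions) / n
    supp_c = sum(consequent <= t for t in transactions) / n
    supp_ac = sum(both <= t for t in transactions) / n
    confidence = supp_ac / supp_a  # assumes supp_a > 0
    lift = confidence / supp_c     # assumes supp_c > 0
    return supp_ac, confidence, lift

# Toy transactions (hypothetical data).
toy = [{1, 2}, {1, 2}, {1, 3}, {2, 3}]
support, confidence, lift = rule_metrics({1}, {2}, toy)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

For the rule `{1} -> {2}`, both items appear together in 2 of 4 transactions (support 0.50), item 1 appears in 3 (confidence 2/3), and item 2 appears in 3 (lift (2/3) / (3/4), roughly 0.89), so this toy rule is slightly weaker than independence would suggest.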

---

### **Step 5: Output the Results**

The results include:
1. **Frequent Itemsets**:
   - Sets of items that frequently appear together in the dataset, along with their support values.

2. **Association Rules**:
   - Logical rules derived from the frequent itemsets, showing relationships between items with metrics such as confidence and lift.

---

## **Why LCOFI?**

The LCOFI algorithm is chosen for its efficiency in mining frequent itemsets:
- **Graph-Based Optimization**:
  - By representing transactions as a bipartite graph, the algorithm can dynamically prune infrequent itemsets, reducing computational overhead.
- **Iterative Pruning**:
  - Candidate generation and pruning are performed iteratively, ensuring that only relevant itemsets are evaluated in subsequent steps.
- **Scalability**:
  - The algorithm is well-suited for large datasets like `accidents.dat`, where traditional methods like Apriori may face performance bottlenecks due to multiple dataset scans.
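
One way a graph index could avoid rescanning the dataset is sketched below: the support of an itemset is the number of transaction nodes adjacent to all of its item nodes, which is a set intersection. This is an assumption about how the graph might be exploited, not a statement of how LCOFI is defined, and the names `build_item_index` and `support_from_index` are hypothetical.

```python
from collections import defaultdict

def build_item_index(transactions):
    """Map each item to the ids of the transactions that contain it."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return index

def support_from_index(itemset, index, n_transactions):
    """Support of a non-empty itemset via intersection of transaction-id sets."""
    tids = set.intersection(*(index.get(item, set()) for item in itemset))
    return len(tids) / n_transactions

# Toy transactions (hypothetical data).
toy_db = [{1, 2, 3}, {2, 3}, {1, 3, 4}]
index = build_item_index(toy_db)
print(support_from_index({2, 3}, index, len(toy_db)))
print(support_from_index({1, 4}, index, len(toy_db)))
```

After the index is built in a single pass, each support query touches only the transaction ids of the items involved rather than every transaction.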

---

## **Summary of Steps**

1. **Load Transactions**:
   - Read the dataset and store the transactions in a structured format.

2. **Graph Representation**:
   - Convert transactions into a bipartite graph with transaction and item nodes.

3. **Frequent Itemset Mining**:
   - Use the LCOFI algorithm to discover frequent itemsets based on the **minimum support** threshold.

4. **Association Rule Generation**:
   - Generate rules from frequent itemsets, filtering by the **minimum confidence** threshold.

5. **Result Analysis**:
   - Display frequent itemsets and association rules, providing insights into patterns and relationships in the accidents dataset.

---

## **Applications**

Applying the LCOFI algorithm to the `accidents.dat` dataset can help uncover patterns such as:

- Frequently occurring combinations of accident attributes, such as types of accidents, number of vehicles involved, and severity.
- Relationships between factors like the number of vehicles involved, fatalities, and injuries.
- Insights into conditions or circumstances that contribute to specific types of accidents, enabling targeted safety measures.

These insights can assist policymakers, transportation authorities, and researchers in understanding accident trends, improving road safety, and formulating data-driven interventions to reduce accidents and fatalities.

---

The code that follows implements this workflow and applies it to `accidents.dat`, mining frequent itemsets and generating association rules from the transactional data.

import pandas as pd
import itertools
import networkx as nx
from mlxtend.frequent_patterns import association_rules

# Step 1: Load and Preprocess the Dataset
def load_chess_dat(filename):
    """Load and preprocess the .dat file into transactions."""
    transactions = []
    with open(filename, 'r') as file:
        for line in file:
            # Split each line into integer items; skip blank lines so they
            # are not counted as empty transactions
            items = line.split()
            if not items:
                continue
            transactions.append(set(map(int, items)))
    return transactions

# Step 2: LCOFI Algorithm Functions
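def generate_candidates(frequent_itemsets, size):
    """Generate candidate itemsets of a specific size.

    Candidates are built from the items that occur in the previous level's
    frequent itemsets. A candidate is kept only if all of its (size - 1)-subsets
    are frequent (Apriori property).
    """
    items = set(itertools.chain(*frequent_itemsets))
    return {
        frozenset(combo)
        for combo in itertools.combinations(items, size)
        if all(
            frozenset(subset) in frequent_itemsets
            for subset in itertools.combinations(combo, size - 1)
        )
    }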

def count_support(itemsets, transactions, min_support):
    """Count the support of itemsets."""
    support_counts = {item: 0 for item in itemsets}
    for transaction in transactions:
        for item in itemsets:
            if item.issubset(transaction):
                support_counts[item] += 1
    return {
        item: count for item, count in support_counts.items()
        if count / len(transactions) >= min_support
    }

def lcofi(transactions, min_support):
    """LCOFI algorithm for frequent itemsets using graph representation."""
    # Build a bipartite graph
    G = nx.Graph()
    for i, transaction in enumerate(transactions):
        transaction_node = f"Transaction_{i}"  # Transaction nodes are strings
        for item in transaction:
            G.add_edge(transaction_node, item)

    # Generate single-item frequent itemsets
    single_items = {frozenset([node]) for node in G.nodes if isinstance(node, int)}
    frequent_itemsets = count_support(single_items, transactions, min_support)

    all_frequent_itemsets = [frequent_itemsets]

    # Iteratively generate larger itemsets
    k = 2
    while frequent_itemsets:
        candidates = generate_candidates(frequent_itemsets, k)
        frequent_itemsets = count_support(candidates, transactions, min_support)
        if frequent_itemsets:
            all_frequent_itemsets.append(frequent_itemsets)
        k += 1

    return all_frequent_itemsets

# Step 3: Generate Association Rules
def generate_association_rules(frequent_itemsets, transactions, min_confidence):
    """Generate association rules from frequent itemsets."""
    # Flatten frequent itemsets
    flat_itemsets = {}
    for level in frequent_itemsets:
        flat_itemsets.update(level)

    # Prepare DataFrame
    num_transactions = len(transactions)
    data = {
        'itemsets': list(flat_itemsets.keys()),
        'support': [support / num_transactions for support in flat_itemsets.values()]
    }
    frequent_itemsets_df = pd.DataFrame(data)

    # Generate association rules
    rules = association_rules(frequent_itemsets_df, metric="confidence", min_threshold=min_confidence, num_itemsets=num_transactions)
    return rules

# Step 4: Apply LCOFI on the Accidents Dataset
filename = "Data/accidents.dat"  # Replace with the actual path to accidents.dat
transactions = load_chess_dat(filename)

# Parameters
min_support = 0.85  # Minimum support threshold
min_confidence = 0.6  # Minimum confidence threshold

# Run LCOFI Algorithm
frequent_itemsets = lcofi(transactions, min_support)

# Generate and Print Association Rules
rules = generate_association_rules(frequent_itemsets, transactions, min_confidence)

# Display Results
print("Frequent Itemsets:")
for k, itemsets in enumerate(frequent_itemsets, start=1):
    print(f"Level {k}:")
    for itemset, support in itemsets.items():
        print(f"  {set(itemset)}: {support / len(transactions):.2f}")

print("\nAssociation Rules:")
print(rules)

Frequent Itemsets:
Level 1:
  {17}: 1.00
  {21}: 0.89
  {16}: 0.98
  {31}: 0.93
  {29}: 0.88
  {18}: 1.00
  {12}: 1.00
  {43}: 0.86
Level 2:
  {17, 43}: 0.86
  {17, 12}: 1.00
  {12, 21}: 0.89
  {16, 21}: 0.87
  {17, 21}: 0.89
  {43, 12}: 0.86
  {16, 17}: 0.98
  {17, 31}: 0.93
  {12, 31}: 0.93
  {16, 12}: 0.98
  {12, 29}: 0.88
  {16, 31}: 0.92
  {17, 18}: 1.00
  {18, 12}: 1.00
  {18, 43}: 0.85
  {16, 18}: 0.97
  {18, 21}: 0.89
  {18, 29}: 0.88
  {17, 29}: 0.88
  {18, 31}: 0.93
  {16, 29}: 0.86
Level 3:
  {16, 18, 31}: 0.92
  {17, 18, 31}: 0.93
  {16, 17, 29}: 0.86
  {17, 12, 31}: 0.93
  {16, 17, 12}: 0.98
  {17, 12, 21}: 0.89
  {16, 12, 21}: 0.87
  {16, 17, 21}: 0.87
  {16, 18, 29}: 0.86
  {17, 43, 12}: 0.86
  {16, 17, 31}: 0.92
  {17, 18, 43}: 0.85
  {16, 12, 29}: 0.86
  {18, 12, 31}: 0.93
  {17, 12, 29}: 0.88
  {18, 12, 29}: 0.88
  {17, 18, 21}: 0.89
  {17, 18, 29}: 0.88
  {18, 12, 21}: 0.89
  {16, 18, 12}: 0.97
  {18, 43, 12}: 0.85
  {17, 18, 12}: 1.00
  {16, 12, 31}: 0.92
  {16, 17,