# **LCOFI Algorithm on Connect Game Dataset**

This document explains the logic and steps involved in applying the **LCOFI (Logic Circuit Optimization Frequent Itemset)** algorithm to the Connect Game dataset. The dataset consists of positional values (`-1`, `0`, `1`) for a Connect Game grid, and the algorithm uncovers frequent patterns and association rules.

---

## **Logic Workflow**

### **1. Preprocessing the Dataset**
- **Objective**: Convert the Connect Game dataset into a transactional format suitable for frequent itemset mining.
- **Approach**:
  - Each position in the grid (e.g., `pos_01`, `pos_02`, ...) is treated as an item.
  - The value of the position (`-1`, `0`, `1`) is combined with the position name to create unique item labels (e.g., `pos_01_-1`).
  - Each row in the dataset is transformed into a transaction consisting of these labels.
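A minimal sketch of this preprocessing step follows. It assumes the CSV is read with Python's standard `csv.DictReader` and that the `winner` column is left out of the items by default. The function name `rows_to_transactions` and its `exclude` parameter are hypothetical and not part of the original code.

```python
import csv
import io


def rows_to_transactions(rows, exclude=("winner",)):
    """Turn each CSV row (a dict of column -> value) into a frozenset of
    '<column>_<value>' item labels, e.g. {'pos_01_-1', 'pos_02_1'}.

    `exclude` (hypothetical parameter) lists columns that are not turned into
    items; pass exclude=() to keep the outcome as an item 'winner_<value>'.
    """
    return [
        frozenset(f"{col}_{value}" for col, value in row.items() if col not in exclude)
        for row in rows
    ]


# Usage on a tiny, made-up three-position extract (not real dataset rows)
sample_csv = "pos_01,pos_02,pos_03,winner\n-1,1,0,win\n0,0,1,loss\n"
transactions = rows_to_transactions(csv.DictReader(io.StringIO(sample_csv)))
print(transactions[0] == {"pos_01_-1", "pos_02_1", "pos_03_0"})  # True
```

Because labels combine the column name and the value, `pos_01_-1` and `pos_01_1` are distinct items, which is what lets the miner treat each cell state separately.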

### **2. LCOFI Algorithm**
The LCOFI algorithm mines frequent itemsets using a **graph-based approach**:

#### **2.1 Bipartite Graph Construction**
- Transactions and items are represented as nodes in a bipartite graph.
- Edges connect transaction nodes to their corresponding items.
- Because each item node is linked exactly to the transactions that contain it, an item's degree equals its support count.
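The representation can be sketched with two plain adjacency dictionaries in place of a `networkx.Graph`, a simplification made here to keep the example dependency-free. The helper names `build_bipartite` and `support_count` are hypothetical; `support_count` intersects the items' neighbour sets to count the transactions that contain a whole itemset.

```python
from collections import defaultdict


def build_bipartite(transactions):
    """Return the two adjacency maps of the transaction-item bipartite graph:
    transaction id -> items, and item -> ids of the transactions containing it."""
    transaction_adj = {}
    item_adj = defaultdict(set)
    for tid, items in enumerate(transactions):
        transaction_adj[tid] = set(items)
        for item in items:
            item_adj[item].add(tid)
    return transaction_adj, dict(item_adj)


def support_count(itemset, item_adj):
    """Number of transactions adjacent to every item in `itemset`
    (returns 0 for an empty itemset, as a simplification)."""
    neighbour_sets = [item_adj.get(item, set()) for item in itemset]
    if not neighbour_sets:
        return 0
    return len(set.intersection(*neighbour_sets))


transactions = [{"pos_01_1", "pos_02_-1"}, {"pos_01_1"}, {"pos_01_1", "pos_02_-1", "pos_03_0"}]
_, item_adj = build_bipartite(transactions)
print(len(item_adj["pos_01_1"]))                           # 3 (degree = support count)
print(support_count({"pos_01_1", "pos_02_-1"}, item_adj))  # 2
```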

#### **2.2 Frequent Itemset Generation**
- The algorithm starts with single-itemsets and calculates their support.
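- Using the **Apriori property** (every subset of a frequent itemset is also frequent), candidate k-itemsets are generated only by combining frequent (k-1)-itemsets, and any candidate with an infrequent (k-1)-subset is discarded.
- Support is counted for each remaining candidate, and candidates below `min_support` are pruned; the loop stops when a level yields no frequent itemsets.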
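The level-wise loop can be sketched in plain Python as a standard Apriori-style join and prune. This is an illustration that assumes the transactions fit in memory; it is not the notebook's `lcofi` function, and the name `mine_frequent_itemsets` is hypothetical.

```python
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support):
    """Return {itemset: relative support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Count each candidate by scanning the transactions, then prune
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Level 1: single items
    level = keep_frequent({frozenset([item]) for t in transactions for item in t})
    result = dict(level)

    k = 2
    while level:
        previous = set(level)
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items
        joined = {a | b for a, b in combinations(previous, 2) if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must itself be frequent
        candidates = {
            c for c in joined
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        result.update(level)
        k += 1
    return result


toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = mine_frequent_itemsets(toy, min_support=0.6)
```

On this toy input, all three single items (support 0.8) and all three pairs (support 0.6) are frequent, while `{a, b, c}` (support 0.4) is pruned.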

### **3. Association Rule Generation**
- Association rules are derived from the mined frequent itemsets.
- **Key Metrics**:
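  - **Support**: Fraction of transactions containing the itemset; for a rule `A => C`, this is the support of `A ∪ C`.
  - **Confidence**: Conditional probability of the consequent given the antecedent, `support(A ∪ C) / support(A)`.
  - **Lift**: Confidence divided by the consequent's support, `confidence / support(C)`; a lift above 1 means `A` and `C` co-occur more often than they would if they were independent.
- Rules below a minimum confidence threshold are discarded, so only rules whose consequent reliably follows the antecedent are kept.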
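These three metrics can be computed directly from a list of transactions. The function `rule_metrics` below is a hypothetical helper written only to illustrate the formulas; it assumes both the antecedent and the consequent occur at least once, otherwise the divisions fail.

```python
def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent => consequent.

    support    = supp(A | C)
    confidence = supp(A | C) / supp(A)
    lift       = confidence / supp(C)
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    a, c = frozenset(antecedent), frozenset(consequent)

    def supp(itemset):
        # Fraction of transactions that contain every item of `itemset`
        return sum(1 for t in transactions if itemset <= t) / n

    support = supp(a | c)
    confidence = support / supp(a)  # assumes the antecedent occurs at least once
    lift = confidence / supp(c)     # assumes the consequent occurs at least once
    return support, confidence, lift


toy = [{"x", "y"}, {"x", "y"}, {"x"}, {"y"}]
support, confidence, lift = rule_metrics({"x"}, {"y"}, toy)
print(round(support, 2), round(confidence, 2), round(lift, 2))  # 0.5 0.67 0.89
```

Here the lift is below 1: `x` and `y` each appear in 75% of transactions, so independence would predict joint support 0.5625, slightly more than the observed 0.5.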

---

## **Example Dataset**
The Connect Game dataset contains 42 positional columns (`pos_01` to `pos_42`) and a `winner` column:
- Each position holds a value (`-1`, `0`, `1`) indicating its state.
- Transactions are constructed using these positional values (e.g., `pos_01_-1`, `pos_02_1`).

---

## **Steps in the Workflow**

1. **Input**:
   - Dataset in CSV format with columns for positional values and the winner.
   - Example: `pos_01=-1`, `pos_02=1`.

2. **Preprocessing**:
   - Transform the dataset into transactions by combining column names with their values.
   - Example Transaction: `{pos_01_-1, pos_02_1, pos_03_0}`.

3. **Frequent Itemset Mining**:
   - Use the bipartite graph representation to:
     - Count the support for each itemset.
     - Prune infrequent itemsets dynamically.
     - Iteratively generate larger itemsets.

4. **Association Rule Generation**:
   - Derive rules from frequent itemsets.
   - Filter rules by minimum confidence, and optionally by lift.
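Rule derivation from already-mined supports can be sketched as follows: every non-empty proper subset of a frequent itemset is tried as the antecedent. The name `generate_rules`, the tuple layout of each rule, and the optional `min_lift` filter are assumptions for illustration; the notebook itself delegates this step to mlxtend's `association_rules`. The input must be downward closed (every subset of a listed itemset is also listed), which level-wise mining guarantees.

```python
from itertools import combinations


def generate_rules(supports, min_confidence, min_lift=None):
    """supports: {frozenset: relative support}, downward closed.
    Returns (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, support in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                consequent = itemset - antecedent
                confidence = support / supports[antecedent]
                lift = confidence / supports[consequent]
                if confidence >= min_confidence and (min_lift is None or lift >= min_lift):
                    rules.append((antecedent, consequent, support, confidence, lift))
    return rules


supports = {
    frozenset({"pos_01_1"}): 0.15,
    frozenset({"pos_02_-1"}): 0.12,
    frozenset({"pos_01_1", "pos_02_-1"}): 0.10,
}
for antecedent, consequent, support, confidence, lift in generate_rules(supports, 0.6):
    print(set(antecedent), "=>", set(consequent), round(confidence, 2), round(lift, 2))
```

With these supports, both directions pass a 0.6 confidence threshold (0.67 and 0.83), and both have lift of about 5.56, since lift is symmetric in `A` and `C`.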

---

## **Expected Outputs**

### **1. Frequent Itemsets**
- Identifies itemsets that occur frequently in the dataset.
- Example:
  - Single-itemsets: `{pos_01_1}: 0.15`, `{pos_02_-1}: 0.12`.
  - Two-itemsets: `{pos_01_1, pos_02_-1}: 0.10`.

### **2. Association Rules**
- Highlights relationships between itemsets.
- Example:
  - Rule: `{pos_01_1} => {pos_02_-1}`
    - Support: 0.10
    - Confidence: 0.67 (0.10 / 0.15)
    - Lift: 5.56 (0.67 / 0.12, the consequent's support)

---

## **Key Features of LCOFI**
- **Efficient Representation**:
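  - The bipartite graph links each item to the transactions that contain it, so single-item support counts can be read directly from node degrees.
- **Dynamic Pruning**:
  - Infrequent itemsets are discarded at each level, so they never seed larger candidates.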
- **Iterative Expansion**:
  - Larger itemsets are generated only from frequent subsets, ensuring relevance.

---

## **Applications**
- **Game Analysis**:
  - Identify board configurations that frequently accompany a win or a loss.
- **Pattern Recognition**:
  - Discover positional combinations that frequently occur together.
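For game analysis, one simple approach is to include the outcome as an item such as `winner_<value>` and keep only rules whose consequent is a single outcome item. This is an assumption, not something the original code does. The sketch below assumes rules are `(antecedent, consequent, support, confidence, lift)` tuples. The name `outcome_rules`, its `prefix` parameter, and the toy rules are hypothetical.

```python
def outcome_rules(rules, prefix="winner_"):
    """Keep rules whose consequent is exactly one item starting with `prefix`,
    sorted by confidence, highest first."""
    selected = [
        rule for rule in rules
        if len(rule[1]) == 1 and next(iter(rule[1])).startswith(prefix)
    ]
    return sorted(selected, key=lambda rule: rule[3], reverse=True)


# Made-up toy rules, used only to exercise the filter
rules = [
    (frozenset({"pos_01_1"}), frozenset({"winner_win"}), 0.10, 0.70, 1.30),
    (frozenset({"pos_01_1"}), frozenset({"pos_02_-1"}), 0.10, 0.67, 5.56),
    (frozenset({"pos_03_0"}), frozenset({"winner_loss"}), 0.05, 0.80, 1.60),
]
for antecedent, consequent, *_ in outcome_rules(rules):
    print(set(antecedent), "=>", set(consequent))
```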

---

## **Parameters**
- **Minimum Support (`min_support`)**:
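  - The minimum fraction of transactions in which an itemset must appear to count as frequent (e.g., `0.1` means 10% of transactions).
- **Minimum Confidence (`min_confidence`)**:
  - The minimum conditional probability of the consequent given the antecedent that a rule must reach to be kept (e.g., `0.6`).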

---

## **Conclusion**
This implementation of the LCOFI algorithm mines frequent patterns and generates association rules from the Connect Game dataset. By combining a bipartite graph representation with level-wise pruning, it discards infrequent itemsets at every level, which limits how many candidates must be counted as itemsets grow.

```python
import itertools

import networkx as nx
import pandas as pd
from mlxtend.frequent_patterns import association_rules


# Step 1: Preprocess the Chess dataset
def preprocess_chess_dataset(filename):
    """Convert each game (row) of the Chess dataset into a set of item labels."""
    data = pd.read_csv(filename)

    # Bin edges for the numerical columns. pd.cut builds right-closed intervals
    # such as (1000, 1200]; values outside the outer edges get a NaN code.
    rating_bins = [1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400]
    turn_bins = [0, 20, 40, 60, 80, 100]
    bin_edges = {"white_rating": rating_bins, "black_rating": rating_bins, "turns": turn_bins}

    # Bin each column once instead of calling pd.cut for every row
    bin_codes = {
        col: pd.cut(data[col], bins=edges, labels=False)
        for col, edges in bin_edges.items()
    }
    categorical = ["rated", "victory_status", "winner",
                   "increment_code", "opening_eco", "opening_name"]

    transactions = []
    for idx, row in data.iterrows():
        # Categorical attributes become "<column>_<value>" items, e.g. "winner_white"
        transaction = {f"{col}_{row[col]}" for col in categorical}

        # Numerical attributes become "<column>_bin_<low>-<high>" items;
        # values outside the bin edges are skipped
        for col, edges in bin_edges.items():
            code = bin_codes[col].at[idx]
            if pd.notnull(code):
                code = int(code)
                transaction.add(f"{col}_bin_{edges[code]}-{edges[code + 1]}")

        transactions.append(transaction)

    return transactions


# Step 2: LCOFI algorithm
def generate_candidates(frequent_itemsets, size):
    """Apriori candidate generation: join frequent (size-1)-itemsets and keep only
    unions of the requested size whose (size-1)-subsets are all frequent."""
    previous = set(frequent_itemsets)
    candidates = set()
    for a, b in itertools.combinations(previous, 2):
        union = a | b
        if len(union) == size and all(
            frozenset(subset) in previous
            for subset in itertools.combinations(union, size - 1)
        ):
            candidates.add(union)
    return candidates


def count_support(itemsets, transactions, min_support):
    """Return {itemset: support count} for itemsets whose relative support >= min_support."""
    support_counts = {itemset: 0 for itemset in itemsets}
    for transaction in transactions:
        for itemset in itemsets:
            if itemset.issubset(transaction):
                support_counts[itemset] += 1
    return {
        itemset: count for itemset, count in support_counts.items()
        if count / len(transactions) >= min_support
    }


def lcofi(transactions, min_support):
    """Level-wise frequent itemset mining over a transaction-item bipartite graph.
    Returns a list of {itemset: support count} dicts, one per itemset size."""
    num_transactions = len(transactions)

    # Build the bipartite graph; transaction nodes are tuples, so they can
    # never collide with an item label
    G = nx.Graph()
    for i, transaction in enumerate(transactions):
        G.add_node(("transaction", i), bipartite=0)
        for item in transaction:
            G.add_node(item, bipartite=1)
            G.add_edge(("transaction", i), item)

    # An item node's degree is the number of transactions containing it,
    # i.e. its support count, so level 1 is read straight from the graph
    frequent_itemsets = {
        frozenset([node]): G.degree[node]
        for node, side in G.nodes(data="bipartite")
        if side == 1 and G.degree[node] / num_transactions >= min_support
    }
    all_frequent_itemsets = [frequent_itemsets]

    # Grow itemsets one item at a time until no candidate is frequent
    k = 2
    while frequent_itemsets:
        candidates = generate_candidates(frequent_itemsets, k)
        frequent_itemsets = count_support(candidates, transactions, min_support)
        if frequent_itemsets:
            all_frequent_itemsets.append(frequent_itemsets)
        k += 1

    return all_frequent_itemsets


# Step 3: Generate association rules
def generate_association_rules(frequent_itemsets, transactions, min_confidence):
    """Generate association rules from the LCOFI levels with mlxtend."""
    # Flatten the per-level dicts into one {itemset: support count} dict
    flat_itemsets = {}
    for level in frequent_itemsets:
        flat_itemsets.update(level)

    # mlxtend expects an 'itemsets' column of frozensets and a relative 'support' column
    num_transactions = len(transactions)
    frequent_itemsets_df = pd.DataFrame({
        "itemsets": list(flat_itemsets.keys()),
        "support": [count / num_transactions for count in flat_itemsets.values()],
    })

    # num_itemsets is the number of transactions in the original data
    # (a parameter of recent mlxtend releases)
    rules = association_rules(
        frequent_itemsets_df,
        metric="confidence",
        min_threshold=min_confidence,
        num_itemsets=num_transactions,
    )
    return rules


# Step 4: Load and analyze the dataset
filename = "Data/chess.csv"  # Path to the Chess dataset
transactions = preprocess_chess_dataset(filename)

# Parameters
min_support = 0.1     # Minimum support threshold
min_confidence = 0.6  # Minimum confidence threshold

# Run the LCOFI algorithm
frequent_itemsets = lcofi(transactions, min_support)

# Generate association rules
rules = generate_association_rules(frequent_itemsets, transactions, min_confidence)

# Display results (supports printed as fractions of all transactions)
print("Frequent Itemsets:")
for k, itemsets in enumerate(frequent_itemsets, start=1):
    print(f"Level {k}:")
    for itemset, count in itemsets.items():
        print(f"  {set(itemset)}: {count / len(transactions):.2f}")

print("\nAssociation Rules:")
print(rules)
```

Frequent Itemsets:
Level 1:
  {'increment_code_10+0'}: 0.38
  {'victory_status_mate'}: 0.32
  {'winner_black'}: 0.45
  {'black_rating_bin_1800-2000'}: 0.15
  {'turns_bin_20-40'}: 0.21
  {'black_rating_bin_1600-1800'}: 0.21
  {'rated_True'}: 0.81
  {'winner_white'}: 0.50
  {'white_rating_bin_1600-1800'}: 0.22
  {'black_rating_bin_1200-1400'}: 0.18
  {'white_rating_bin_1800-2000'}: 0.15
  {'turns_bin_40-60'}: 0.27
  {'white_rating_bin_1200-1400'}: 0.17
  {'turns_bin_60-80'}: 0.19
  {'rated_False'}: 0.19
  {'turns_bin_80-100'}: 0.11
  {'black_rating_bin_1400-1600'}: 0.29
  {'white_rating_bin_1400-1600'}: 0.29
  {'victory_status_resign'}: 0.56
Level 2:
  {'rated_True', 'white_rating_bin_1400-1600'}: 0.22
  {'rated_True', 'winner_black'}: 0.37
  {'rated_True', 'white_rating_bin_1600-1800'}: 0.18
  {'turns_bin_40-60', 'increment_code_10+0'}: 0.11
  {'rated_True', 'black_rating_bin_1400-1600'}: 0.22
  {'winner_black', 'white_rating_bin_1400-1600'}: 0.13
  {'rated_True', 'turns_bin_20-40'}: 0.