# **Connect Game Apriori Analysis**

This script performs **Apriori Analysis** on the Connect Game dataset using Python's **`mlxtend`** library. The workflow involves preprocessing the dataset, mining frequent itemsets, and generating association rules.

---

## **Workflow**

### **1. Dataset Preprocessing**
- The dataset is read as a CSV file.
- Each row represents a game state, with columns corresponding to game positions (e.g., `pos_01`, `pos_02`, ..., `pos_42`) and a `winner` column.
- Preprocessing steps:
  1. **Exclude the `winner` column**: This column is not used for frequent itemset mining.
  2. **Create transactions**: Each non-null value in a row (e.g., `pos_01_X`) is treated as an item in the transaction.
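
As a rough illustration of the transaction step, the sketch below converts a single row into a list of items. The row is a hypothetical three-position example with made-up values (the real dataset has `pos_01` through `pos_42`), and `row_to_transaction` is an illustrative helper, not a function from the script. It checks for `None` only, whereas the script uses `pd.notnull`, which also catches `NaN`.

```python
# Hypothetical row with three positions; values are invented for illustration.
row = {"pos_01": "X", "pos_02": None, "pos_03": "O", "winner": "X"}

def row_to_transaction(row, skip=("winner",)):
    """Turn one row into a list of 'column_value' items, skipping nulls."""
    return [
        f"{col}_{value}"
        for col, value in row.items()
        if col not in skip and value is not None
    ]

transaction = row_to_transaction(row)
print(transaction)  # ['pos_01_X', 'pos_03_O']
```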

### **2. One-Hot Encoding**
- The transactions are converted into a **one-hot encoded DataFrame**:
  - Each column corresponds to an item (e.g., `pos_01_X`).
  - Each row represents a transaction, with `True` or `False` indicating the presence or absence of an item.
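
The encoding idea can be sketched in plain Python, without `mlxtend`. The script itself uses `TransactionEncoder`; the toy transactions below are assumptions made for illustration only.

```python
# Toy transactions (invented for illustration).
transactions = [
    ["pos_01_X", "pos_02_O"],
    ["pos_01_X", "pos_03_X"],
    ["pos_02_O"],
]

# Every distinct item becomes one column, in sorted order.
columns = sorted({item for t in transactions for item in t})

# One row per transaction: True where the item is present, False otherwise.
one_hot = [[col in t for col in columns] for t in transactions]

print(columns)
for encoded_row in one_hot:
    print(encoded_row)
```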

### **3. Frequent Itemset Mining**
- The **Apriori algorithm** from `mlxtend` is applied to identify itemsets whose support (the fraction of transactions that contain them) is at least `min_support`.
- Example:
  - If `min_support = 0.3`, only itemsets that appear in at least 30% of transactions are considered frequent.
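
To make the threshold concrete, here is a minimal sketch that computes support by brute force on five toy transactions. It enumerates every itemset of size 1 and 2 instead of using Apriori's candidate pruning, so it only illustrates the filtering rule, not the algorithm `mlxtend` runs. The items `a`, `b`, `c` are placeholders.

```python
from itertools import combinations

# Toy transactions (placeholders, not Connect Game data).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b"},
    {"a"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

min_support = 0.3
items = sorted(set().union(*transactions))

# Keep every itemset of size 1 or 2 whose support meets the threshold.
frequent = {}
for size in (1, 2):
    for combo in combinations(items, size):
        s = support(set(combo), transactions)
        if s >= min_support:
            frequent[combo] = s

for itemset, s in frequent.items():
    print(itemset, s)
```

Here `('b', 'c')` appears in only one of five transactions (support 0.2), so it falls below the 0.3 threshold and is dropped.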

### **4. Association Rules Generation**
- **Association rules** are generated from frequent itemsets using metrics like:
  - **Support**: Proportion of transactions containing the itemset.
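  - **Confidence**: Proportion of transactions containing the antecedent that also contain the consequent.
  - **Lift**: Confidence divided by the support of the consequent. A value above 1 means the antecedent makes the consequent more likely than its baseline frequency; a value below 1 means less likely.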
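
The three metrics can be worked through by hand on a small example. The sketch below uses toy transactions and a hypothetical helper, `rule_metrics`; it follows the standard definitions and is not how `mlxtend` computes them internally.

```python
# Toy transactions (placeholders, not Connect Game data).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b"},
    {"a"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    rule_support = support(antecedent | consequent, transactions)
    confidence = rule_support / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return rule_support, confidence, lift

rule_support, confidence, lift = rule_metrics({"a"}, {"b"}, transactions)
print(f"support={rule_support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

For the rule `{a} -> {b}`, 2 of 5 transactions contain both items (support 0.4), 2 of the 4 transactions with `a` also contain `b` (confidence 0.5), and `b` alone has support 0.6, so the lift is about 0.83: in this toy data, seeing `a` makes `b` slightly less likely.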

---

## **Installation**

To run the script, the following Python components need to be installed:

### **Dependencies**
1. **Python 3.6+**
2. **Required Libraries**:
   - `pandas`: For data handling and preprocessing.
   - `mlxtend`: For Apriori algorithm and association rules.

### **Bash Commands to Install Dependencies**

#### Install `pandas`:
```bash
pip install pandas
```

#### Install `mlxtend`:
```bash
pip install mlxtend
```

#### Verify Installation:
```bash
pip show pandas mlxtend
```
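
The same check can be done from inside Python, which helps when the notebook kernel uses a different environment from the shell. This is a small sketch that assumes Python 3.8 or newer (for `importlib.metadata`); `installed_version` is an illustrative helper name.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

for name in ("pandas", "mlxtend"):
    print(name, installed_version(name) or "not installed")
```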

## **Usage Instructions**
1. **Place the dataset**: Put the Connect Game dataset in the `Data` folder (the script reads `Data/connect-game.csv`). The dataset should be in CSV format with columns `pos_01`, `pos_02`, ..., `pos_42`, `winner`.
2. **Adjust parameters**:
   - `nrows`: Limit the number of rows read from the dataset to manage memory usage.
   - `min_support`: Set the minimum support threshold for frequent itemsets.
   - `min_confidence`: Set the minimum confidence threshold for association rules.
3. **Run the script**: Execute the Python script to generate frequent itemsets and association rules.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder


def preprocess_connect_game(filename, nrows=300):
    """
    Preprocess the Connect Game dataset into transactions.
    Only process up to `nrows` rows.
    """
    # Load the dataset with limited rows
    data = pd.read_csv(filename, nrows=nrows)

    # Drop the 'winner' column; it is not used for itemset mining
    if 'winner' in data.columns:
        data = data.drop(columns=['winner'])

    # Convert each row into a transaction (list of "column_value" items)
    transactions = []
    for _, row in data.iterrows():
        transaction = []
        for col, value in row.items():
            if pd.notnull(value):  # Ignore null values
                transaction.append(f"{col}_{value}")
        transactions.append(transaction)

    return transactions


# Load and preprocess the dataset (only first 300 rows)
filename = "Data/connect-game.csv"
transactions = preprocess_connect_game(filename, nrows=300)

# Convert transactions to a one-hot encoded DataFrame
te = TransactionEncoder()
te_array = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_array, columns=te.columns_)

# Apply Apriori
min_support = 0.3  # Set minimum support threshold
frequent_itemsets = apriori(df, min_support=min_support, use_colnames=True)

# Recent mlxtend releases take `num_itemsets`, the number of transactions
# in the original data (not the size of the largest itemset). Drop the
# argument if the installed version does not accept it.
num_itemsets = len(df)

# Display frequent itemsets
print("Frequent Itemsets:")
print(frequent_itemsets)

# Generate association rules
min_confidence = 0.7  # Set minimum confidence threshold
rules = association_rules(
    frequent_itemsets,
    num_itemsets=num_itemsets,
    metric="confidence",
    min_threshold=min_confidence,
)

# Display association rules
print("\nAssociation Rules:")
print(rules)
```

```text
Frequent Itemsets:
        support                                           itemsets
0      0.876667                                       (pos_01_0.0)
1      0.786667                                       (pos_02_0.0)
2      0.690000                                       (pos_03_0.0)
3      0.616667                                       (pos_04_0.0)
4      0.770000                                       (pos_05_0.0)
...         ...                                                ...
33746  0.340000  (pos_02_0.0, pos_05_0.0, pos_08_0.0, pos_21_0....
33747  0.313333  (pos_02_0.0, pos_05_0.0, pos_09_0.0, pos_21_0....
33748  0.303333  (pos_02_0.0, pos_05_0.0, pos_20_0.0, pos_21_0....
33749  0.316667  (pos_05_0.0, pos_08_0.0, pos_21_0.0, pos_12_0....
33750  0.310000  (pos_05_0.0, pos_08_0.0, pos_20_0.0, pos_21_0....

[33751 rows x 2 columns]
```

Association Rules:
                                  antecedents  \
0                                (pos_01_0.0)   
1                                