
1.   Vincent Le
2.   CIS 2100
3.   Project 3


# Corporate Market Basket Analysis Report
### Project Overview
This project performs a Market Basket Analysis on annual sales data from a corporate network of stores. The primary objective is to find which products customers tend to buy together in the same order, and how often, so that sales and marketing decisions can rest on observed purchasing patterns.

### Analytical Approach
- Data Exploration: Measure how often each product is sold across the whole corporation.
- Frequent Itemset Mining: Count how often pairs and triples of products appear in the same order, and express each count as support (the share of all orders that contain the combination).
- Store-Level Analysis: Repeat the itemset analysis for each store to support localized marketing and inventory decisions.
- Visualization: Chart the results so they can be communicated effectively.
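
To make the frequent-itemset step concrete: the support of an itemset $X$ is the fraction of orders whose basket contains every product in $X$, i.e. $\mathrm{support}(X) = \frac{\text{number of orders containing } X}{\text{total number of orders}}$. The sketch below applies that definition to four made-up baskets (hypothetical data, not taken from `annual_sales_data.csv`); the helper name `pair_support` is illustrative and not part of the notebook. Like the notebook's counting code, it deduplicates each basket before enumerating combinations, so a product bought twice in one order still counts once.

```python
from itertools import combinations

# Hypothetical toy baskets: one list of product names per order
baskets = [
    ["Product_1", "Product_2", "Product_3"],
    ["Product_1", "Product_2"],
    ["Product_2", "Product_3"],
    ["Product_1", "Product_2", "Product_2"],  # duplicate line item within one order
]

def pair_support(baskets):
    """Return {pair: support} for every product pair seen in at least one basket."""
    counts = {}
    for basket in baskets:
        # set() counts a product once per order; sorted() gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    # Support = orders containing the pair / total orders
    return {pair: n / len(baskets) for pair, n in counts.items()}

support = pair_support(baskets)
for pair, s in sorted(support.items(), key=lambda kv: -kv[1]):
    print(pair, s)
# ('Product_1', 'Product_2') 0.75
# ('Product_2', 'Product_3') 0.5
# ('Product_1', 'Product_3') 0.25
```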

### Goals of the Analysis
This analysis aims to empower decision-makers with data-driven insights for:

- Enhancing customer experience through tailored recommendations.
- Driving sales growth by promoting high-margin, frequently co-purchased products.
- Improving operational efficiency in stock management.
- Enabling competitive differentiation through store-specific marketing strategies.

Following this structure, the project uses Python (pandas for data manipulation, Matplotlib for visualization) to turn the analyses into findings that decision-makers can act on toward these goals.


1. Imports
    - Initialize the project by importing the required libraries

In [40]:
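import pandas as pd                  # tabular data loading and grouping
import numpy as np                   # numerical helpers
from itertools import combinations   # enumerate product pairs and triples within a basket
from google.colab import drive       # Colab-only: mount Google Drive to read the CSV
import matplotlib.pyplot as plt      # bar charts
import seaborn as sns                # statistical plotting on top of Matplotlib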
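
The analyzer reads only three columns from the CSV: `OrderID`, `StoreID` and `ProductName`, with one row per product line in an order. As a sketch of that expected shape, the snippet below builds a tiny hypothetical DataFrame (made-up rows, not the real sales file) and applies the same `groupby(...).apply(list)` grouping that `prepare_data` uses. The per-store variant groups on `StoreID` and `OrderID` in one step, which yields the same per-store baskets as filtering each store's rows first.

```python
import pandas as pd

# Hypothetical line-item data: one row per product in an order
df = pd.DataFrame({
    "OrderID":     [1, 1, 2, 2, 2, 3],
    "StoreID":     ["A", "A", "A", "A", "A", "B"],
    "ProductName": ["Product_1", "Product_2", "Product_1",
                    "Product_2", "Product_3", "Product_3"],
})

# Same grouping as prepare_data: collect each order's products into a list
order_baskets = df.groupby("OrderID")["ProductName"].apply(list).reset_index()
print(order_baskets)

# Per-store baskets in one step, keyed by (StoreID, OrderID)
store_baskets = df.groupby(["StoreID", "OrderID"])["ProductName"].apply(list)
print(store_baskets)
```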

2. Market Basket Analyzer Class

In [41]:
class MarketBasketAnalyzer:
    """
    Market Basket Analysis for Corporate Sales Data

    Project Objectives:
    1. Uncover hidden purchasing patterns across corporate stores
    2. Identify most frequent product combinations in customer baskets
    3. Provide actionable insights for:
       - Cross-selling strategies
       - Inventory management
       - Store-specific marketing approaches

    Analytical Approach:
    - Product frequency analysis of customer purchases
    - Itemset mining to discover product co-occurrence patterns
    - Quantitative and visual representation of findings
    """

    def __init__(self, csv_file="/content/drive/MyDrive/Colab Notebooks/annual_sales_data.csv"):
        """Mount Google Drive, load the sales CSV, and group line items into baskets."""
        # Mount Google Drive
        drive.mount('/content/drive')

        # Read the sales data
        self.df = pd.read_csv(csv_file)

        # Prepare data for analysis
        self.prepare_data()

    # Data Preparation
    def prepare_data(self):
        """
        Preprocess the sales data for market basket analysis
        """
        # Group purchases by order
        self.order_baskets = (
            self.df.groupby('OrderID')['ProductName']
            .apply(list)
            .reset_index()
        )

    # Product Frequency Analysis
    def calculate_product_frequency(self):
        """
        Calculate overall product frequency across all stores

        Returns:
        - DataFrame of product frequencies
        """
        product_freq = self.df['ProductName'].value_counts()
        product_freq_df = product_freq.reset_index()
        product_freq_df.columns = ['Product Name', 'Total Frequency']
        product_freq_df['Percentage'] = (product_freq_df['Total Frequency'] / len(self.df) * 100).round(2)

        return product_freq_df

    # Finding Frequent Itemsets
    def find_frequent_itemsets(self, min_support=0.01):
        """
        Find frequent pairs and triples by brute-force enumeration:
        every 2- and 3-product combination in every basket is counted,
        then combinations below the support threshold are dropped.

        Parameters:
        - min_support: Minimum share of all orders (0-1) that must contain an itemset

        Returns:
        - DataFrame of frequent itemsets, sorted by frequency
        """
        # Count itemset frequencies directly, without first storing every combination in a list
        itemset_counts = {}
        for r in range(2, 4):  # Look for pairs (r=2) and triples (r=3)
            for basket in self.order_baskets['ProductName']:
                # sorted(set(...)) drops repeated products and yields each itemset in one canonical order
                for itemset in combinations(sorted(set(basket)), r):
                    itemset_counts[itemset] = itemset_counts.get(itemset, 0) + 1

        # Convert to DataFrame and calculate support
        itemset_df = pd.DataFrame.from_dict(
            itemset_counts,
            orient='index',
            columns=['Frequency']
        ).reset_index()

        itemset_df.columns = ['Product Combination', 'Frequency']
        itemset_df['Support'] = itemset_df['Frequency'] / len(self.order_baskets)

        # Filter by minimum support
        frequent_itemsets = itemset_df[itemset_df['Support'] >= min_support]

        return frequent_itemsets.sort_values('Frequency', ascending=False)

    # Store-Level Analysis
    def store_specific_analysis(self):
        """
        Perform market basket analysis for each store separately

        Returns:
        - Dictionary mapping each StoreID to its 10 most frequent itemsets
          (pairs and triples ranked by frequency; no minimum-support filter)
        """
        store_analyses = {}

        # Unique stores in the dataset
        stores = self.df['StoreID'].unique()

        for store in stores:
            # Filter data for specific store
            store_df = self.df[self.df['StoreID'] == store]

            # Group purchases by order for this store
            store_order_baskets = (
                store_df.groupby('OrderID')['ProductName']
                .apply(list)
                .reset_index()
            )

            # Count pair and triple frequencies for this store's baskets
            itemset_counts = {}
            for r in range(2, 4):
                for basket in store_order_baskets['ProductName']:
                    # sorted(set(...)) drops repeated products and yields each itemset in one canonical order
                    for itemset in combinations(sorted(set(basket)), r):
                        itemset_counts[itemset] = itemset_counts.get(itemset, 0) + 1

            # Convert to DataFrame
            itemset_df = pd.DataFrame.from_dict(
                itemset_counts,
                orient='index',
                columns=['Frequency']
            ).reset_index()

            itemset_df.columns = ['Product Combination', 'Frequency']
            itemset_df['Support'] = itemset_df['Frequency'] / len(store_order_baskets)

            # Store results
            store_analyses[store] = itemset_df.sort_values('Frequency', ascending=False).head(10)

        return store_analyses

    # Visualizations
    def visualize_product_frequency(self, top_n=10):
        """
        Create a visualization of top product frequencies

        Parameters:
        - top_n: Number of top products to visualize
        """
        plt.figure(figsize=(12, 6))

        # Get product frequencies
        product_freq = self.calculate_product_frequency().head(top_n)

        # Create bar plot
        plt.bar(product_freq['Product Name'], product_freq['Percentage'])
        plt.title('Top Product Frequencies Across Corporation')
        plt.xlabel('Product Name')
        plt.ylabel('Percentage of All Line Items')
        plt.xticks(rotation=45, ha='right')
        plt.tight_layout()
        plt.savefig('product_frequency_distribution.png')
        plt.show()  # display the chart in the notebook; plt.close() alone would discard it unseen
        plt.close()

    # Comprehensive Report
    def generate_comprehensive_report(self):
        """
        Generate a comprehensive market basket analysis report
        """
        # Product Frequency Analysis
        print("--- Product Frequency Analysis ---")
        product_freq = self.calculate_product_frequency()
        print(product_freq.head(10))

        # Visualize product frequencies
        self.visualize_product_frequency()

        # Frequent Itemsets Analysis
        print("\n--- Frequent Product Combinations ---")
        frequent_itemsets = self.find_frequent_itemsets()
        print(frequent_itemsets.head(10))

        # Store-Specific Analysis
        print("\n--- Store-Specific Frequent Itemsets ---")
        store_analyses = self.store_specific_analysis()
        for store, analysis in store_analyses.items():
            print(f"\nStore {store} Top Product Combinations:")
            print(analysis)

    # Markdown Exports
    def export_markdown_report(self, filename='market_basket_analysis_report.md'):
        """
        Export comprehensive analysis to markdown
        """
        with open(filename, 'w') as f:
            # Project Overview
            f.write("# Corporate Market Basket Analysis Report\n\n")

            f.write("## Project Objectives\n")
            f.write("- Uncover hidden purchasing patterns across corporate stores\n")
            f.write("- Identify most frequent product combinations in customer baskets\n")
            f.write("- Provide actionable insights for cross-selling and inventory management\n\n")

            # Product Frequency
            f.write("## Product Frequency Analysis\n")
            product_freq = self.calculate_product_frequency()
            f.write(product_freq.head(10).to_markdown())

            # Frequent Itemsets
            f.write("\n## Frequent Product Combinations\n")
            frequent_itemsets = self.find_frequent_itemsets()
            f.write(frequent_itemsets.head(10).to_markdown())

            # Store-Specific Analysis
            f.write("\n## Store-Specific Insights\n")
            store_analyses = self.store_specific_analysis()
            for store, analysis in store_analyses.items():
                f.write(f"### Store {store} Top Product Combinations\n")
                f.write(analysis.to_markdown())
                f.write("\n")
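
`find_frequent_itemsets` enumerates every pair and triple in every basket before filtering by `min_support`, which is simple but counts many itemsets that can never pass the threshold. A common alternative is Apriori-style pruning: an itemset can only be frequent if all of its smaller subsets are frequent, so a triple is counted only when its three pairs passed the previous level. The function below is a hedged sketch of that idea for itemsets of size 2 to 3; the name `apriori_itemsets`, its parameters and its `{itemset: support}` return format are hypothetical, not part of the notebook. It still walks each basket's combinations but only counts candidates that pass the subset check, which keeps the count dictionary small; a full Apriori implementation would also generate candidates level by level instead of from each basket.

```python
from itertools import combinations

def apriori_itemsets(baskets, min_support=0.01, max_size=3):
    """Return {itemset: support} for itemsets of size 2..max_size with support >= min_support.

    Apriori pruning: a size-k itemset is counted only if every one of its
    size-(k-1) subsets was frequent at the previous level.
    """
    # Deduplicate and sort each basket once so every itemset is a canonical tuple
    baskets = [tuple(sorted(set(b))) for b in baskets]
    n_orders = len(baskets)

    # Level 1: frequent single products
    counts = {}
    for basket in baskets:
        for item in basket:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {itemset for itemset, c in counts.items() if c / n_orders >= min_support}

    result = {}
    for k in range(2, max_size + 1):
        counts = {}
        for basket in baskets:
            for candidate in combinations(basket, k):
                # Skip candidates with an infrequent (k-1)-subset (the Apriori property)
                if all(sub in frequent for sub in combinations(candidate, k - 1)):
                    counts[candidate] = counts.get(candidate, 0) + 1
        frequent = {itemset for itemset, c in counts.items() if c / n_orders >= min_support}
        if not frequent:
            break  # no frequent k-itemsets means no frequent (k+1)-itemsets either
        result.update({itemset: counts[itemset] / n_orders for itemset in frequent})
    return result

# Usage on hypothetical baskets
baskets = [
    ["Product_1", "Product_2", "Product_3"],
    ["Product_1", "Product_2"],
    ["Product_1", "Product_2", "Product_3"],
    ["Product_4"],
]
result = apriori_itemsets(baskets, min_support=0.5)
print(result)
```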

In [42]:
def main():
    # Initialize and run the market basket analysis
    csv_file = "/content/drive/MyDrive/Colab Notebooks/annual_sales_data.csv"
    analyzer = MarketBasketAnalyzer(csv_file)
    analyzer.generate_comprehensive_report()
    analyzer.export_markdown_report()

if __name__ == "__main__":
    main()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
--- Product Frequency Analysis ---
  Product Name  Total Frequency  Percentage
0   Product_17             4974        5.21
1   Product_10             4831        5.06
2   Product_16             4799        5.02
3    Product_3             4797        5.02
4    Product_9             4793        5.02
5    Product_0             4793        5.02
6   Product_12             4788        5.01
7    Product_4             4788        5.01
8   Product_14             4786        5.01
9    Product_8             4780        5.00

--- Frequent Product Combinations ---
          Product Combination  Frequency   Support
4    (Product_10, Product_17)        931  0.034655
125   (Product_17, Product_4)        916  0.034096
148  (Product_12, Product_14)        909  0.033836
178   (Product_0, Product_17)        905  0.033687
2    (Product_13, Product_17)        901  0.033538
30    (