# Indigo Reviews AI: Visualization Dashboard

This notebook focuses exclusively on creating and displaying visualizations of the app review data. It builds on the preprocessed data from the `data_preprocessing.ipynb` notebook and provides interactive controls for customizing visualizations.

## Key Features
- Loads preprocessed data (doesn't repeat preprocessing)
- Groups related visualizations (sentiment, topics, versions, etc.)
- Offers interactive controls for custom visualizations
- Provides detailed insights based on the visualizations
- Can export visualizations for use in the final dashboard

## Navigation

In [1]:
# Navigation cell
from IPython.display import display, HTML

navbar = HTML("""
<div style="background-color: #444; padding: 10px; margin-bottom: 20px; border-radius: 5px;">
    <strong>Indigo Reviews AI Notebooks:</strong>
    <a href="data_preprocessing.ipynb" target="_blank" style="margin-left: 15px;">Data Preprocessing</a>
    <a href="visualization_dashboard.ipynb" target="_blank" style="margin-left: 15px; font-weight: bold;">Visualizations</a>
    <a href="topic_analysis.ipynb" target="_blank" style="margin-left: 15px;">Topic Analysis</a>
    <a href="metrics_kpi.ipynb" target="_blank" style="margin-left: 15px;">Metrics & KPIs</a>
    <a href="llm_insights.ipynb" target="_blank" style="margin-left: 15px;">LLM Insights</a>
    <a href="dashboard_generator.ipynb" target="_blank" style="margin-left: 15px;">Dashboard Generator</a>
</div>
""")
display(navbar)

## Setup & Configuration

First, let's import the necessary libraries and set up our environment.

In [2]:
# Standard library imports
import os
import sys
import json
import warnings
from datetime import datetime
from typing import Dict, List, Any, Optional, Union

# Data analysis imports
import pandas as pd
import numpy as np

# Visualization imports
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display, HTML
import ipywidgets as widgets
from matplotlib.ticker import MaxNLocator
from matplotlib.colors import LinearSegmentedColormap

# Suppress warnings
warnings.filterwarnings('ignore')

# Add project root to path
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.append(project_root)

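# Configure Matplotlib (enable the inline backend before changing rcParams,
# since older IPython kernels reset figure settings when the backend is activated)
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (12, 7)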

# Custom visualization theme
sns.set(style="whitegrid")
colors = ['#3498db', '#e74c3c', '#2ecc71', '#f39c12', '#9b59b6', '#1abc9c', '#d35400', '#34495e']
sns.set_palette(sns.color_palette(colors))

## Import Project Modules

Now we'll import our project's modules.

In [3]:
# Try to import project modules with fallbacks
try:
    from src.modules.preprocessing.data_loader import load_reviews, get_data_summary
    print("Successfully imported data loader module")
except ImportError as e:
    print(f"Error importing data loader: {e}")
    print("Visualization may be limited without data loader module")

# Try to import visualization module (if it exists)
try:
    from src.modules.visualization.visualizer import create_rating_distribution, create_sentiment_over_time
    print("Successfully imported visualization module")
    has_viz_module = True
except ImportError as e:
    print(f"Visualization module not found: {e}")
    print("Using local visualization functions")
    has_viz_module = False

try:
    from src.config import config
    print("Successfully imported project config")
except ImportError as e:
    print(f"Error importing config: {e}")
    print("Using default configuration")
    config = {
        "data": {
            "default_source": "csv",
            "csv_path": os.path.join(project_root, 'data', 'enhanced_reviews.csv'),
            "raw_path": os.path.join(project_root, 'data', 'reviews.csv')
        },
        "visualization": {
            "theme": "default",
            "export_path": os.path.join(project_root, 'reports', 'visualizations')
        }
    }

DEBUG: Applying APP_ID from environment: 'in.goindigo.android  # Google Play Store app ID'
Successfully imported data loader module
Successfully imported visualization module
Successfully imported project config
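
The import cell above reports success or failure per module, not per function. As a hedged sketch (the helper name `import_optional` is hypothetical and not part of the project), the standard library's `importlib` can report exactly which names a module provides, which makes it easier to decide which local fallbacks still need to be defined. The demo uses the `math` module as a stand-in so that it runs anywhere.

```python
import importlib


def import_optional(module_path, names):
    """Return ({name: object}, [missing names]) for an optional module.

    Hypothetical helper for illustration; not part of the project code.
    """
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        # The whole module is unavailable, so every requested name is missing
        return {}, list(names)

    found = {name: getattr(module, name) for name in names if hasattr(module, name)}
    missing = [name for name in names if name not in found]
    return found, missing


# Stand-in demo: `math` exists, but one of the requested names does not
found, missing = import_optional("math", ["sqrt", "not_a_real_function"])
print(sorted(found), missing)
```

In this notebook the same call could be pointed at the visualization module path, with each missing name falling back to a local definition.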


## Load Enhanced Data

We'll load the enhanced data that was processed in the data preprocessing notebook.

In [4]:
def load_enhanced_data():
    """Load review data, preferring enhanced, then processed, then raw (or mock) data"""
    data_dir = os.path.join(project_root, 'data')
    # (label, filename, message printed when the file is missing), in priority order
    candidates = [
        ('enhanced', 'enhanced_reviews.csv', "Enhanced data not found. Checking for processed data..."),
        ('processed', 'processed_reviews.csv', "Processed data not found. Loading raw data..."),
    ]
    
    for label, filename, missing_message in candidates:
        path = os.path.join(data_dir, filename)
        if not os.path.exists(path):
            print(missing_message)
            continue
        
        print(f"Loading {label} data from: {path}")
        df = pd.read_csv(path)
        
        # Convert date columns to datetime; unparseable values become NaT
        for col in ['date', 'at', 'timestamp']:
            if col in df.columns:
                df[col] = pd.to_datetime(df[col], errors='coerce')
                print(f"Converted {col} to datetime")
        
        return df
    
    # No CSV available: fall back to the data loader, then to mock data
    try:
        df = load_reviews(use_mock_data=False)
        print("Data loaded using load_reviews function.")
        return df
    except Exception as e:
        print(f"Error loading data: {e}")
        print("Loading mock data...")
        return load_reviews(use_mock_data=True)

# Load the data
reviews_df = load_enhanced_data()

# Display basic info
print(f"Loaded {len(reviews_df)} reviews with {len(reviews_df.columns)} columns")
print(f"Columns: {', '.join(reviews_df.columns)}")
display(reviews_df.head(3))

Loading enhanced data from: /Users/dipesh/Local-Projects/indigo-reviews-ai/data/enhanced_reviews.csv
Converted date to datetime
Converted timestamp to datetime
Loaded 4398 reviews with 30 columns
Columns: review_id, author, date, rating, text, version, timestamp, thumbsUpCount, replyContent, repliedAt, cleaned_text, normalized_text, primary_topic, topic_confidence, text_length, sentiment_negative, sentiment_neutral, sentiment_positive, sentiment_compound, sentiment_category, year, month, day, day_of_week, day_name, is_weekend, year_month, quarter, year_quarter, text_length_category


|  | review_id | author | date | rating | text | version | timestamp | thumbsUpCount | replyContent | repliedAt | ... | year | month | day | day_of_week | day_name | is_weekend | year_month | quarter | year_quarter | text_length_category |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 54fb749d-9891-431c-aa67-74b62740ccb3 | Vinay Thota | NaT | 1 | This app is too bad. To start with the perform... | 7.2.4 | NaT | 0 |  |  | ... |  |  |  |  |  |  |  |  |  | medium |
| 1 | c42775b8-1f56-4b4c-b63a-12b6d88e60eb | Kd Koushik | NaT | 2 | glitches | 7.2.4 | NaT | 0 |  |  | ... |  |  |  |  |  |  |  |  |  | very_short |
| 2 | 59028efa-8e33-4ac2-b28e-2e720da62f46 | Vinay Didla | NaT | 1 | What a waste of time.. most annoying app ever.... |  | NaT | 0 |  |  | ... |  |  |  |  |  |  |  |  |  | short |
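
The preview shows `NaT` in both `date` and `timestamp`. Because `pd.to_datetime(..., errors='coerce')` silently turns anything it cannot parse into `NaT`, it can be worth counting how many values were lost before plotting any time series. The following is a minimal sketch on toy data; `report_unparsed_dates` is a hypothetical helper, not part of the project.

```python
import pandas as pd


def report_unparsed_dates(df, columns=("date", "at", "timestamp")):
    """Count NaT values per date column after coercing to datetime.

    Hypothetical helper for illustration; the count includes values that
    were already missing as well as values that failed to parse.
    """
    report = {}
    for col in columns:
        if col in df.columns:
            parsed = pd.to_datetime(df[col], errors="coerce")
            report[col] = int(parsed.isna().sum())
    return report


# Toy frame: one valid date, one unparseable string, one missing value
toy = pd.DataFrame({"date": ["2024-01-15", "not a date", None], "rating": [5, 1, 3]})
print(report_unparsed_dates(toy))
```

Running the same check on `reviews_df` would show whether the `NaT` values are limited to the first rows or affect the whole column.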


## Define Local Visualization Functions

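If the project's visualization module could not be imported, the cell below defines local fallback versions of the plotting functions. Note that `create_version_comparison` and `create_rating_time_heatmap` are defined only in this fallback cell; the import cell above brings in just `create_rating_distribution` and `create_sentiment_over_time`.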

In [5]:
if not has_viz_module:
    def create_rating_distribution(df, title="Rating Distribution", figsize=(12, 6)):
        """Create a bar chart showing the distribution of ratings"""
        if 'rating' not in df.columns:
            print("Rating column not found in data")
            return None
        
        plt.figure(figsize=figsize)
        rating_counts = df['rating'].value_counts().sort_index()
        
        # Calculate percentages
        total = rating_counts.sum()
        percentages = rating_counts / total * 100
        
        # Plot bars
        ax = rating_counts.plot(kind='bar', color='skyblue')
        plt.title(title, fontsize=16)
        plt.xlabel('Rating', fontsize=14)
        plt.ylabel('Count', fontsize=14)
        plt.xticks(fontsize=12)
        plt.yticks(fontsize=12)
        
        # Add labels with counts and percentages
        for i, (count, percentage) in enumerate(zip(rating_counts, percentages)):
            label = f"{count}\n({percentage:.1f}%)"
            ax.text(i, count + (rating_counts.max() * 0.02), 
                    label, 
                    ha='center', va='bottom',
                    fontweight='bold')
        
        plt.tight_layout()
        plt.grid(axis='y', linestyle='--', alpha=0.7)
        return plt.gcf()
    
    def create_sentiment_over_time(df, time_period="month", figsize=(14, 7)):
        """Create a line chart showing sentiment trends over time"""
        # Check required columns
        if 'sentiment_category' not in df.columns or 'date' not in df.columns:
            missing = []
            if 'sentiment_category' not in df.columns:
                missing.append('sentiment_category')
            if 'date' not in df.columns:
                missing.append('date')
            print(f"Required columns not found: {', '.join(missing)}")
            return None
        
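        # Ensure date is datetime and drop rows whose date could not be parsed (NaT)
        df = df.copy()
        if not pd.api.types.is_datetime64_any_dtype(df['date']):
            df['date'] = pd.to_datetime(df['date'], errors='coerce')
        df = df.dropna(subset=['date'])
        if df.empty:
            print("No reviews with a valid date; cannot plot sentiment over time")
            return None
        
        # Create time period column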
        if time_period == "day":
            df['period'] = df['date'].dt.strftime('%Y-%m-%d')
            period_title = "Day"
        elif time_period == "week":
            df['period'] = df['date'].dt.strftime('%Y-%U')
            period_title = "Week"
        elif time_period == "quarter":
            df['period'] = df['date'].dt.year.astype(str) + '-Q' + df['date'].dt.quarter.astype(str)
            period_title = "Quarter"
        else:  # default to month
            df['period'] = df['date'].dt.strftime('%Y-%m')
            period_title = "Month"
        
        # Aggregate by period and sentiment
        sentiment_counts = df.groupby(['period', 'sentiment_category']).size().unstack(fill_value=0)
        
        # Calculate percentages
        totals = sentiment_counts.sum(axis=1)
        sentiment_pct = sentiment_counts.div(totals, axis=0) * 100
        
        # Create plot
        plt.figure(figsize=figsize)
        
        # Plot lines for each sentiment
        sentiment_colors = {'positive': 'green', 'neutral': 'gray', 'negative': 'red'}
        for sentiment in sentiment_pct.columns:
            plt.plot(sentiment_pct.index, sentiment_pct[sentiment], 
                     marker='o', linewidth=3, label=sentiment.capitalize(),
                     color=sentiment_colors.get(sentiment, 'blue'))
        
        # Add chart elements
        plt.title(f'Sentiment Trends by {period_title}', fontsize=16)
        plt.xlabel(period_title, fontsize=14)
        plt.ylabel('Percentage (%)', fontsize=14)
        plt.xticks(rotation=45, ha='right', fontsize=12)
        plt.yticks(fontsize=12)
        plt.legend(fontsize=12)
        plt.grid(linestyle='--', alpha=0.7)
        
        # Add percentage labels at the end of each line
        for sentiment in sentiment_pct.columns:
            last_value = sentiment_pct[sentiment].iloc[-1]
            plt.annotate(f'{last_value:.1f}%', 
                         xy=(sentiment_pct.index[-1], last_value),
                         xytext=(10, 0),
                         textcoords='offset points',
                         fontweight='bold',
                         color=sentiment_colors.get(sentiment, 'blue'))
        
        plt.tight_layout()
        return plt.gcf()
    
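    def create_version_comparison(df, metric='rating', figsize=(14, 7)):
        """Create a comparison chart of versions based on specified metric"""
        # The loaded data stores the app version in `version`; accept it as an
        # alias for `reviewCreatedVersion` so the checks below still apply
        if 'reviewCreatedVersion' not in df.columns and 'version' in df.columns:
            df = df.rename(columns={'version': 'reviewCreatedVersion'})
        
        # Check required columns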
        if 'reviewCreatedVersion' not in df.columns or metric not in df.columns:
            missing = []
            if 'reviewCreatedVersion' not in df.columns:
                missing.append('reviewCreatedVersion')
            if metric not in df.columns:
                missing.append(metric)
            print(f"Required columns not found: {', '.join(missing)}")
            return None
        
        # Get top versions by count
        version_counts = df['reviewCreatedVersion'].value_counts()
        top_versions = version_counts[version_counts >= 10].index.tolist()
        
        if not top_versions:
            print("No versions with sufficient data found")
            return None
        
        # Filter to top versions
        version_df = df[df['reviewCreatedVersion'].isin(top_versions)].copy()
        
        # Calculate metric by version
        if metric == 'rating':
            # For rating, calculate mean
            metric_by_version = version_df.groupby('reviewCreatedVersion')['rating'].mean().sort_values()
            metric_label = 'Average Rating'
            color_map = 'RdYlGn'
        elif 'sentiment' in metric:
            # For sentiment metrics, calculate mean
            metric_by_version = version_df.groupby('reviewCreatedVersion')[metric].mean().sort_values()
            metric_label = f'Average {metric.replace("_", " ").title()}'
            color_map = 'coolwarm'
        else:
            # For other metrics, calculate mean
            metric_by_version = version_df.groupby('reviewCreatedVersion')[metric].mean().sort_values()
            metric_label = f'Average {metric}'
            color_map = 'viridis'
        
        # Create horizontal bar chart
        plt.figure(figsize=figsize)
        # plt.get_cmap replaces plt.cm.get_cmap, which newer Matplotlib releases no longer provide
        bars = plt.barh(metric_by_version.index, metric_by_version.values, 
                        color=plt.get_cmap(color_map)(np.linspace(0, 1, len(metric_by_version))))
        
        # Add labels
        for i, (version, value) in enumerate(metric_by_version.items()):
            count = version_counts[version]
            plt.text(value + 0.05, i, f"{value:.2f} (n={count})", va='center', fontweight='bold')
        
        # Add chart elements
        plt.title(f'{metric_label} by App Version', fontsize=16)
        plt.xlabel(metric_label, fontsize=14)
        plt.ylabel('App Version', fontsize=14)
        plt.yticks(fontsize=12)
        plt.xticks(fontsize=12)
        plt.grid(axis='x', linestyle='--', alpha=0.7)
        
        # Add count annotation
        plt.figtext(0.5, 0.01, f"Note: Only showing versions with at least 10 reviews. ({len(top_versions)} of {len(version_counts)} versions)", 
                    ha="center", fontsize=12, style='italic')
        
        plt.tight_layout()
        return plt.gcf()
    
    def create_rating_time_heatmap(df, time_period="month", figsize=(16, 8)):
        """Create a heatmap showing rating distribution over time"""
        # Check required columns
        if 'rating' not in df.columns or 'date' not in df.columns:
            missing = []
            if 'rating' not in df.columns:
                missing.append('rating')
            if 'date' not in df.columns:
                missing.append('date')
            print(f"Required columns not found: {', '.join(missing)}")
            return None
        
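        # Ensure date is datetime and drop rows whose date could not be parsed (NaT)
        df = df.copy()
        if not pd.api.types.is_datetime64_any_dtype(df['date']):
            df['date'] = pd.to_datetime(df['date'], errors='coerce')
        df = df.dropna(subset=['date'])
        if df.empty:
            print("No reviews with a valid date; cannot plot rating heatmap")
            return None
        
        # Create time period column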
        if time_period == "day":
            df['period'] = df['date'].dt.strftime('%Y-%m-%d')
            period_title = "Day"
        elif time_period == "week":
            df['period'] = df['date'].dt.strftime('%Y-%U')
            period_title = "Week"
        elif time_period == "quarter":
            df['period'] = df['date'].dt.year.astype(str) + '-Q' + df['date'].dt.quarter.astype(str)
            period_title = "Quarter"
        else:  # default to month
            df['period'] = df['date'].dt.strftime('%Y-%m')
            period_title = "Month"
        
        # Count reviews by period and rating
        period_rating_counts = df.groupby(['period', 'rating']).size().unstack(fill_value=0)
        
        # Calculate percentages per period
        period_totals = period_rating_counts.sum(axis=1)
        period_rating_pct = period_rating_counts.div(period_totals, axis=0) * 100
        
        # Create heatmap
        plt.figure(figsize=figsize)
        ax = sns.heatmap(period_rating_pct.T, annot=True, fmt='.1f', cmap='RdYlGn', 
                         linewidths=0.5, cbar_kws={'label': 'Percentage (%)'}, vmin=0, vmax=100)
        
        # Customize chart
        plt.title(f'Rating Distribution by {period_title}', fontsize=16)
        plt.xlabel(period_title, fontsize=14)
        plt.ylabel('Rating', fontsize=14)
        plt.xticks(rotation=45, ha='right', fontsize=12)
        plt.yticks(fontsize=12)
        
        # Add annotation with sample sizes
        plt.figtext(0.5, 0.01, f"Numbers represent percentage of ratings within each {time_period.lower()}.\nSample sizes: " + 
                    ", ".join([f"{p}: {c}" for p, c in period_totals.items() if c > 0][:5]) +
                    ("..." if len(period_totals) > 5 else ""),
                    ha="center", fontsize=12, style='italic')
        
        plt.tight_layout()
        return plt.gcf()
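
The percentage logic shared by the sentiment trend and rating heatmap functions (bucket by period, count per category, divide by the period total) is easy to verify on a tiny frame. The sketch below is an illustration under stated assumptions: `share_by_period` is a hypothetical helper, and the toy data only mimics the `date` and `sentiment_category` columns.

```python
import pandas as pd


def share_by_period(df, category_col, period_format="%Y-%m"):
    """Return the percentage of each category within each time period.

    Hypothetical helper for illustration; not part of the project code.
    """
    # Rows with an unparseable date (NaT) cannot be assigned to a period
    valid = df.dropna(subset=["date"]).copy()
    valid["period"] = valid["date"].dt.strftime(period_format)

    # Count rows per (period, category), then normalise each period to 100%
    counts = valid.groupby(["period", category_col]).size().unstack(fill_value=0)
    return counts.div(counts.sum(axis=1), axis=0) * 100


toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10", None]),
    "sentiment_category": ["positive", "negative", "positive", "positive", "neutral"],
})
shares = share_by_period(toy, "sentiment_category")
print(shares.round(1))
```

Each row of the result sums to 100, and the review without a date is excluded instead of forming a spurious period.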

## Interactive Visualization Dashboard

Let's create an interactive dashboard for visualizing the data.

In [6]:
# Create visualization dashboard with tabs
tab_titles = [
    'Rating Distribution', 
    'Sentiment Trends', 
    'Version Comparison', 
    'Rating Heatmap',
    'Custom'
]

# Create tab widgets
tabs = widgets.Tab()
tab_contents = [widgets.Output() for _ in range(len(tab_titles))]
tabs.children = tab_contents

# Set tab titles
for i, title in enumerate(tab_titles):
    tabs.set_title(i, title)

# Display the tabs
display(tabs)

# Tab 1: Rating Distribution
with tab_contents[0]:
    try:
        # Create control widgets
        title_widget = widgets.Text(
            value='Rating Distribution',
            description='Title:',
            style={'description_width': 'initial'}
        )
        
        width_widget = widgets.IntSlider(
            value=12,
            min=6,
            max=20,
            step=1,
            description='Width:',
            style={'description_width': 'initial'}
        )
        
        height_widget = widgets.IntSlider(
            value=6,
            min=3,
            max=12,
            step=1,
            description='Height:',
            style={'description_width': 'initial'}
        )
        
        # Create update button
        update_button = widgets.Button(
            description='Update Chart',
            button_style='primary',
            icon='refresh'
        )
        
        # Create output area
        output_area = widgets.Output()
        
        # Define update function
        def update_rating_chart(b):
            with output_area:
                output_area.clear_output(wait=True)
                title = title_widget.value
                figsize = (width_widget.value, height_widget.value)
                
                # Create chart
                fig = create_rating_distribution(reviews_df, title=title, figsize=figsize)
                plt.show()
        
        # Connect button to function
        update_button.on_click(update_rating_chart)
        
        # Display controls and output
        controls = widgets.VBox([
            widgets.HBox([title_widget]),
            widgets.HBox([width_widget, height_widget]),
            widgets.HBox([update_button])
        ])
        
        display(controls)
        display(output_area)
        
        # Initial chart
        update_rating_chart(None)
    except Exception as e:
        print(f"Error creating rating distribution chart: {e}")

# Tab 2: Sentiment Trends
with tab_contents[1]:
    try:
        if 'sentiment_category' not in reviews_df.columns:
            print("Sentiment category column not found in data. This visualization requires sentiment analysis from the data preprocessing notebook.")
        else:
            # Create control widgets
            period_widget = widgets.RadioButtons(
                options=['day', 'week', 'month', 'quarter'],
                value='month',
                description='Time Period:',
                style={'description_width': 'initial'}
            )
            
            width_widget = widgets.IntSlider(
                value=14,
                min=8,
                max=20,
                step=1,
                description='Width:',
                style={'description_width': 'initial'}
            )
            
            height_widget = widgets.IntSlider(
                value=7,
                min=4,
                max=12,
                step=1,
                description='Height:',
                style={'description_width': 'initial'}
            )
            
            # Create update button
            update_button = widgets.Button(
                description='Update Chart',
                button_style='primary',
                icon='refresh'
            )
            
            # Create output area
            output_area = widgets.Output()
            
            # Define update function
            def update_sentiment_chart(b):
                with output_area:
                    output_area.clear_output(wait=True)
                    period = period_widget.value
                    figsize = (width_widget.value, height_widget.value)
                    
                    # Create chart
                    fig = create_sentiment_over_time(reviews_df, time_period=period, figsize=figsize)
                    plt.show()
            
            # Connect button to function
            update_button.on_click(update_sentiment_chart)
            
            # Display controls and output
            controls = widgets.VBox([
                widgets.HBox([period_widget]),
                widgets.HBox([width_widget, height_widget]),
                widgets.HBox([update_button])
            ])
            
            display(controls)
            display(output_area)
            
            # Initial chart
            update_sentiment_chart(None)
    except Exception as e:
        print(f"Error creating sentiment trends chart: {e}")

# Tab 3: Version Comparison
with tab_contents[2]:
    try:
        if 'reviewCreatedVersion' not in reviews_df.columns and 'version' not in reviews_df.columns:
            print("Version column not found in data. This visualization requires app version information.")
        else:
            # Create control widgets
            metric_options = []
            if 'rating' in reviews_df.columns:
                metric_options.append(('Average Rating', 'rating'))
            
            sentiment_cols = [col for col in reviews_df.columns if 'sentiment_' in col and reviews_df[col].dtype != 'object']
            for col in sentiment_cols:
                metric_options.append((f"Average {col.replace('sentiment_', '').title()}", col))
            
            if not metric_options:
                print("No suitable metrics found for version comparison.")
            else:
                metric_widget = widgets.Dropdown(
                    options=metric_options,
                    value=metric_options[0][1],
                    description='Metric:',
                    style={'description_width': 'initial'}
                )
                
                width_widget = widgets.IntSlider(
                    value=14,
                    min=8,
                    max=20,
                    step=1,
                    description='Width:',
                    style={'description_width': 'initial'}
                )
                
                height_widget = widgets.IntSlider(
                    value=7,
                    min=4,
                    max=15,
                    step=1,
                    description='Height:',
                    style={'description_width': 'initial'}
                )
                
                # Create update button
                update_button = widgets.Button(
                    description='Update Chart',
                    button_style='primary',
                    icon='refresh'
                )
                
                # Create output area
                output_area = widgets.Output()
                
                # Define update function
                def update_version_chart(b):
                    with output_area:
                        output_area.clear_output(wait=True)
                        metric = metric_widget.value
                        figsize = (width_widget.value, height_widget.value)
                        
                        # Create chart
                        fig = create_version_comparison(reviews_df, metric=metric, figsize=figsize)
                        plt.show()
                
                # Connect button to function
                update_button.on_click(update_version_chart)
                
                # Display controls and output
                controls = widgets.VBox([
                    widgets.HBox([metric_widget]),
                    widgets.HBox([width_widget, height_widget]),
                    widgets.HBox([update_button])
                ])
                
                display(controls)
                display(output_area)
                
                # Initial chart
                update_version_chart(None)
    except Exception as e:
        print(f"Error creating version comparison chart: {e}")

# Tab 4: Rating Heatmap
with tab_contents[3]:
    try:
        if 'rating' not in reviews_df.columns or 'date' not in reviews_df.columns:
            missing = []
            if 'rating' not in reviews_df.columns:
                missing.append('rating')
            if 'date' not in reviews_df.columns:
                missing.append('date')
            print(f"Required columns not found: {', '.join(missing)}. This visualization requires rating and date information.")
        else:
            # Create control widgets
            period_widget = widgets.RadioButtons(
                options=['month', 'quarter'],
                value='month',
                description='Time Period:',
                style={'description_width': 'initial'}
            )
            
            width_widget = widgets.IntSlider(
                value=16,
                min=8,
                max=20,
                step=1,
                description='Width:',
                style={'description_width': 'initial'}
            )
            
            height_widget = widgets.IntSlider(
                value=8,
                min=4,
                max=12,
                step=1,
                description='Height:',
                style={'description_width': 'initial'}
            )
            
            # Create update button
            update_button = widgets.Button(
                description='Update Chart',
                button_style='primary',
                icon='refresh'
            )
            
            # Create output area
            output_area = widgets.Output()
            
            # Define update function
            def update_heatmap_chart(b):
                with output_area:
                    output_area.clear_output(wait=True)
                    period = period_widget.value
                    figsize = (width_widget.value, height_widget.value)
                    
                    # Create chart
                    fig = create_rating_time_heatmap(reviews_df, time_period=period, figsize=figsize)
                    plt.show()
            
            # Connect button to function
            update_button.on_click(update_heatmap_chart)
            
            # Display controls and output
            controls = widgets.VBox([
                widgets.HBox([period_widget]),
                widgets.HBox([width_widget, height_widget]),
                widgets.HBox([update_button])
            ])
            
            display(controls)
            display(output_area)
            
            # Initial chart
            update_heatmap_chart(None)
    except Exception as e:
        print(f"Error creating rating heatmap: {e}")

# Tab 5: Custom Visualization
with tab_contents[4]:
    print("Custom visualization tab - to be implemented")
    print("\nThis tab will allow for creating custom visualizations with:")
    print("- User-selected X and Y variables")
    print("- Choice of chart type (bar, line, scatter, etc.)")
    print("- Customization of colors, titles, and other visual elements")
    print("- Ability to save custom visualizations for later use")

Tab(children=(Output(), Output(), Output(), Output(), Output()), selected_index=0, titles=('Rating Distributio…

## Save Visualizations for Dashboard

Let's create a function to save visualizations for use in the final dashboard.
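
A dashboard that consumes these exports needs some way to find the newest file for each chart. The sketch below assumes the naming scheme used by the save function in this section: a JSON metadata file per export carrying `name`, `created` (formatted as `%Y%m%d_%H%M%S`), and `file` keys. `latest_exports` is a hypothetical helper, not part of the project, and the demo writes to a temporary directory.

```python
import json
import os
import tempfile


def latest_exports(viz_dir):
    """Map each visualization name to its most recent metadata record.

    Hypothetical helper for illustration; not part of the project code.
    """
    latest = {}
    for entry in sorted(os.listdir(viz_dir)):
        if not entry.endswith(".json"):
            continue
        with open(os.path.join(viz_dir, entry), encoding="utf-8") as f:
            meta = json.load(f)
        name, created = meta.get("name"), meta.get("created", "")
        # "created" is zero-padded %Y%m%d_%H%M%S, so string order is chronological
        if name and created >= latest.get(name, {}).get("created", ""):
            latest[name] = meta
    return latest


# Demo with two metadata files for the same chart
with tempfile.TemporaryDirectory() as tmp:
    for stamp in ("20240101_090000", "20240301_120000"):
        meta = {"name": "rating_dist", "created": stamp, "file": f"rating_dist_{stamp}.png"}
        with open(os.path.join(tmp, f"rating_dist_{stamp}.json"), "w", encoding="utf-8") as f:
            json.dump(meta, f)
    demo = latest_exports(tmp)

print(demo["rating_dist"]["file"])
```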

In [7]:
def save_visualization(fig, name, description=None, format='png', dpi=300):
    """Save a visualization to disk for use in the dashboard"""
    if fig is None:
        print("Error: No figure provided")
        return None
    
    # Create output directory if it doesn't exist
    viz_dir = os.path.join(project_root, 'reports', 'visualizations')
    os.makedirs(viz_dir, exist_ok=True)
    
    # Create filepath
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"{name}_{timestamp}.{format}"
    filepath = os.path.join(viz_dir, filename)
    
    # Save the figure
    try:
        fig.savefig(filepath, format=format, dpi=dpi, bbox_inches='tight')
        print(f"Saved visualization to: {filepath}")
        
        # Create metadata file
        metadata = {
            "name": name,
            "description": description or f"Visualization of {name}",
            "created": timestamp,
            "format": format,
            "file": filename,
            "path": filepath
        }
        
        # Save metadata
        meta_filepath = os.path.join(viz_dir, f"{name}_{timestamp}.json")
        with open(meta_filepath, 'w') as f:
            json.dump(metadata, f, indent=2)
        
        print(f"Saved metadata to: {meta_filepath}")
        return filepath
    except Exception as e:
        print(f"Error saving visualization: {e}")
        return None

# Create a widget to save visualizations
save_output = widgets.Output()
display(save_output)

with save_output:
    # Create widgets
    viz_type_widget = widgets.Dropdown(
        options=[
            ('Rating Distribution', 'rating_dist'),
            ('Sentiment Trends', 'sentiment_trends'),
            ('Version Comparison', 'version_comp'),
            ('Rating Heatmap', 'rating_heatmap')
        ],
        value='rating_dist',
        description='Visualization:',
        style={'description_width': 'initial'}
    )
    
    name_widget = widgets.Text(
        value='',
        placeholder='visualization_name',
        description='Name:',
        style={'description_width': 'initial'}
    )
    
    desc_widget = widgets.Textarea(
        value='',
        placeholder='Description of the visualization',
        description='Description:',
        style={'description_width': 'initial'},
        layout=widgets.Layout(width='50%', height='80px')
    )
    
    format_widget = widgets.Dropdown(
        options=['png', 'jpg', 'svg', 'pdf'],
        value='png',
        description='Format:',
        style={'description_width': 'initial'}
    )
    
    dpi_widget = widgets.IntSlider(
        value=300,
        min=72,
        max=600,
        step=1,
        description='DPI:',
        style={'description_width': 'initial'}
    )
    
    # Save button
    save_button = widgets.Button(
        description='Generate & Save',
        button_style='success',
        icon='save'
    )
    
    # Output area
    save_result = widgets.Output()
    
    # Save function
    def on_save_clicked(b):
        with save_result:
            save_result.clear_output()
            
            # Get parameters
            viz_type = viz_type_widget.value
            name = name_widget.value
            if not name:
                name = viz_type
            description = desc_widget.value
            format = format_widget.value
            dpi = dpi_widget.value
            
            print(f"Generating {viz_type} visualization...")
            
            # Create the visualization
            fig = None
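            if viz_type == 'rating_dist':
                if 'rating' in reviews_df.columns:
                    fig = create_rating_distribution(reviews_df, title='Rating Distribution')
                else:
                    print("Error: 'rating' column not found in data")
            elif viz_type == 'sentiment_trends':
                # Same column check as the batch export: sentiment plus a date axis
                if 'sentiment_category' in reviews_df.columns and 'date' in reviews_df.columns:
                    fig = create_sentiment_over_time(reviews_df)
                else:
                    print("Error: Required columns not found in data")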
            elif viz_type == 'version_comp':
                if 'reviewCreatedVersion' in reviews_df.columns and 'rating' in reviews_df.columns:
                    fig = create_version_comparison(reviews_df)
                else:
                    print("Error: Required columns not found in data")
            elif viz_type == 'rating_heatmap':
                if 'rating' in reviews_df.columns and 'date' in reviews_df.columns:
                    fig = create_rating_time_heatmap(reviews_df)
                else:
                    print("Error: Required columns not found in data")
            
            # Display and save the visualization
            if fig is not None:
                # Show the generated figure itself; plt.figure() would open a new, empty one
                display(fig)
                
                # Save it, then release the figure so repeated clicks do not accumulate figures
                filepath = save_visualization(fig, name, description, format, dpi)
                plt.close(fig)
                if filepath:
                    print("Visualization saved successfully")
                    # Display the saved image
                    if format in ['png', 'jpg']:
                        display(HTML(f"<img src='{filepath}' width='600'>"))
                else:
                    print("Error saving visualization")
            else:
                print("Error creating visualization")
    
    # Connect button to function
    save_button.on_click(on_save_clicked)
    
    # Display widgets
    display(HTML("<h3>Save Visualizations for Dashboard</h3>"))
    print("Select a visualization type, provide a name and description, then click 'Generate & Save'")
    display(widgets.VBox([
        widgets.HBox([viz_type_widget, name_widget]),
        desc_widget,
        widgets.HBox([format_widget, dpi_widget]),
        save_button,
        save_result
    ]))


## Export All Key Visualizations

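The cell below defines `export_all_visualizations()`, which renders each key visualization, saves it as a PNG under `reports/visualizations`, and writes a JSON manifest listing the exported files for use in the dashboard.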

In [None]:
def export_all_visualizations():
    """Generate and export all key visualizations"""
    # Create output directory
    viz_dir = os.path.join(project_root, 'reports', 'visualizations')
    os.makedirs(viz_dir, exist_ok=True)
    
    # Current timestamp for filenames
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    
    # Dictionary to store export results
    results = {}
    
    # 1. Rating Distribution
    print("Generating Rating Distribution...")
    try:
        if 'rating' in reviews_df.columns:
            fig = create_rating_distribution(reviews_df, title='Rating Distribution')
            filepath = os.path.join(viz_dir, f"rating_distribution_{timestamp}.png")
            fig.savefig(filepath, dpi=300, bbox_inches='tight')
            results['rating_distribution'] = filepath
            print(f"Saved to: {filepath}")
            plt.close(fig)
        else:
            print("Skipped: 'rating' column not found")
    except Exception as e:
        print(f"Error: {e}")
    
    # 2. Sentiment Trends
    print("\nGenerating Sentiment Trends...")
    try:
        if 'sentiment_category' in reviews_df.columns and 'date' in reviews_df.columns:
            fig = create_sentiment_over_time(reviews_df, time_period='month')
            filepath = os.path.join(viz_dir, f"sentiment_trends_{timestamp}.png")
            fig.savefig(filepath, dpi=300, bbox_inches='tight')
            results['sentiment_trends'] = filepath
            print(f"Saved to: {filepath}")
            plt.close(fig)
        else:
            print("Skipped: Required columns not found")
    except Exception as e:
        print(f"Error: {e}")
    
    # 3. Version Comparison
    print("\nGenerating Version Comparison...")
    try:
        if 'reviewCreatedVersion' in reviews_df.columns and 'rating' in reviews_df.columns:
            fig = create_version_comparison(reviews_df, metric='rating')
            filepath = os.path.join(viz_dir, f"version_comparison_{timestamp}.png")
            fig.savefig(filepath, dpi=300, bbox_inches='tight')
            results['version_comparison'] = filepath
            print(f"Saved to: {filepath}")
            plt.close(fig)
        else:
            print("Skipped: Required columns not found")
    except Exception as e:
        print(f"Error: {e}")
    
    # 4. Rating Heatmap
    print("\nGenerating Rating Heatmap...")
    try:
        if 'rating' in reviews_df.columns and 'date' in reviews_df.columns:
            fig = create_rating_time_heatmap(reviews_df, time_period='month')
            filepath = os.path.join(viz_dir, f"rating_heatmap_{timestamp}.png")
            fig.savefig(filepath, dpi=300, bbox_inches='tight')
            results['rating_heatmap'] = filepath
            print(f"Saved to: {filepath}")
            plt.close(fig)
        else:
            print("Skipped: Required columns not found")
    except Exception as e:
        print(f"Error: {e}")
    
    # Save export manifest
    manifest = {
        "timestamp": timestamp,
        "visualizations": results,
        "total": len(results)
    }
    
    manifest_path = os.path.join(viz_dir, f"visualization_manifest_{timestamp}.json")
    with open(manifest_path, 'w') as f:
        json.dump(manifest, f, indent=2)
    
    print(f"\nExport complete. {len(results)} visualizations saved.")
    print(f"Manifest saved to: {manifest_path}")
    
    return manifest

# Add export button
export_button = widgets.Button(
    description='Export All Visualizations',
    button_style='danger',
    icon='download',
    layout=widgets.Layout(width='250px', height='40px')
)

export_output = widgets.Output()

def on_export_clicked(b):
    with export_output:
        export_output.clear_output()
        manifest = export_all_visualizations()
        
        # Display count and thumbnails
        print(f"\nGenerated {manifest['total']} visualizations")
        if manifest['total'] > 0:
            # Create small thumbnails in a grid
            html = "<div style='display: flex; flex-wrap: wrap;'>"
            for viz_name, filepath in manifest['visualizations'].items():
                html += f"""
                <div style='margin: 10px; text-align: center;'>
                    <img src='{filepath}' width='300' style='border: 1px solid #ddd;'>
                    <div>{viz_name}</div>
                </div>
                """
            html += "</div>"
            display(HTML(html))

export_button.on_click(on_export_clicked)

display(widgets.VBox([
    widgets.HBox([export_button]),
    export_output
]))
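
The manifest written by `export_all_visualizations()` is what a downstream notebook would most likely read. The following is a minimal sketch of that consumer side, assuming the manifest keeps the `timestamp`, `visualizations`, and `total` keys shown above. The helper name `load_latest_manifest` is hypothetical and not part of the project.

```python
import glob
import json
import os
from typing import Optional


def load_latest_manifest(viz_dir: str) -> Optional[dict]:
    """Return the newest export manifest in viz_dir, or None if there is none."""
    pattern = os.path.join(viz_dir, "visualization_manifest_*.json")
    # Timestamps use %Y%m%d_%H%M%S, so lexicographic order is chronological
    candidates = sorted(glob.glob(pattern))
    if not candidates:
        return None

    with open(candidates[-1], "r") as f:
        manifest = json.load(f)

    # Keep only entries whose image file still exists on disk
    manifest["visualizations"] = {
        name: path
        for name, path in manifest.get("visualizations", {}).items()
        if os.path.exists(path)
    }
    manifest["total"] = len(manifest["visualizations"])
    return manifest
```

Called as `load_latest_manifest(os.path.join(project_root, 'reports', 'visualizations'))`, this would return the most recent export with stale entries filtered out, so the dashboard does not reference images that were deleted after the export.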

## Conclusion

This notebook provides a dedicated environment for creating and customizing visualizations of the app review data. By separating the visualization logic from the data preprocessing and analysis, we've improved the organization and maintainability of the project.

Key accomplishments in this notebook:

1. Created a set of standard visualizations for review data:
   - Rating distribution
   - Sentiment trends over time
   - Version comparison
   - Rating distribution heatmap

2. Provided interactive controls for customizing visualizations

3. Implemented functionality to save and export visualizations for the dashboard

4. Created a batch export function for generating all key visualizations at once

Next steps in the modularization process:

1. Extract common visualization functions to the `src/modules/visualization/visualizer.py` module
2. Implement the Topic Analysis notebook for deeper text analysis
3. Create the Metrics & KPI notebook for quantitative analysis
4. Develop the Dashboard Generator notebook to combine all insights into a cohesive presentation
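
As an illustration of the first step, the four near-identical blocks in `export_all_visualizations()` could collapse into one spec-driven helper once the code moves into a module. This is only a sketch under assumptions: `export_from_specs`, `VizSpec`, and the `close` parameter are hypothetical names, and the helper expects each builder to return an object with a matplotlib-style `savefig` method.

```python
import os
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# (output key, required columns, zero-argument builder returning a figure-like object)
VizSpec = Tuple[str, List[str], Callable[[], object]]


def export_from_specs(
    specs: Iterable[VizSpec],
    available_columns: Iterable[str],
    viz_dir: str,
    timestamp: Optional[str] = None,
    close: Optional[Callable[[object], None]] = None,
) -> Dict[str, str]:
    """Build and save each figure whose required columns are present."""
    os.makedirs(viz_dir, exist_ok=True)
    timestamp = timestamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    columns = set(available_columns)
    results: Dict[str, str] = {}

    for key, required, build in specs:
        missing = [col for col in required if col not in columns]
        if missing:
            print(f"Skipped {key}: missing columns {missing}")
            continue
        fig = None
        try:
            fig = build()
            filepath = os.path.join(viz_dir, f"{key}_{timestamp}.png")
            fig.savefig(filepath, dpi=300, bbox_inches="tight")
            results[key] = filepath
            print(f"Saved to: {filepath}")
        except Exception as e:
            print(f"Error in {key}: {e}")
        finally:
            # Close the figure even when saving fails, so figures do not accumulate
            if fig is not None and close is not None:
                close(fig)
    return results
```

In this notebook, a spec might look like `("rating_distribution", ["rating"], lambda: create_rating_distribution(reviews_df, title='Rating Distribution'))`, with `available_columns=reviews_df.columns` and `close=plt.close`. Adding a new visualization to the export would then mean adding one tuple rather than another try/except block.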

These changes will further improve the project structure and make the project easier to maintain and to collaborate on.