# Streamlit Dashboard for Humor Detection EDA

## Overview
This notebook generates an interactive Streamlit dashboard (`humor_dashboard.py`) for exploring the humor detection dataset. The dashboard reads `Dataset.csv` directly and repeats the preprocessing steps from the main analysis notebook, so it can run on its own.

## Dashboard Features
1. **Dataset Overview**: Basic statistics and data quality metrics
2. **Demographic Analysis**: Interactive exploration of participant demographics
3. **Joke Performance Analysis**: Individual joke performance and response patterns
4. **Cultural Insights**: Cross-cultural humor patterns and comprehension analysis
5. **Advanced Insights**: Cultural specificity scatter plot, correlation analysis, and a summary report

---

In [1]:
# STREAMLIT DASHBOARD CREATION
# This cell creates the complete Streamlit dashboard file

dashboard_code = '''
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Page configuration
st.set_page_config(
    page_title="Humor Detection EDA Dashboard",
    page_icon=":bar_chart:",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Custom CSS for better styling
st.markdown("""
<style>
.main-header {
    font-size: 2.5rem;
    color: #1f77b4;
    text-align: center;
    margin-bottom: 2rem;
}
.metric-card {
    background-color: #f0f2f6;
    padding: 1rem;
    border-radius: 0.5rem;
    margin: 0.5rem 0;
}
.insight-box {
    padding: 1rem;
    border-left: 4px solid #1f77b4;
    margin: 1rem 0;
}
</style>
""", unsafe_allow_html=True)

@st.cache_data
def load_data():
    """Load and process the humor detection dataset"""
    try:
        # Load the main dataset
        df = pd.read_csv('Dataset.csv')
        
        # Data processing functions (copied from main notebook)
        def clean_joke_text(joke_text):
            import re
            pattern = r'^s\\d+:\\s*'
            cleaned_text = re.sub(pattern, '', joke_text, flags=re.IGNORECASE)
            return cleaned_text.strip()
        
        def standardize_country(country):
            if pd.isna(country) or country in ['Unknown', '']:
                return 'Unknown'
            
            country_str = str(country).strip().lower()
            
            # Match 'uk' exactly so that names such as 'Ukraine' are not mapped to 'UK'
            if country_str == 'uk' or any(term in country_str for term in ['united kingdom', 'england', 'britain', 'british']):
                return 'UK'
            
            country_mapping = {
                'usa': 'United States', 'us': 'United States', 'america': 'United States',
                'australia': 'Australia', 'canada': 'Canada', 'india': 'India'
            }
            
            # str() guards against non-string values (for example numeric cells)
            return country_mapping.get(country_str, str(country).strip())
        
        def standardize_ethnicity(ethnicity):
            if pd.isna(ethnicity) or ethnicity in ['Unknown', '']:
                return 'Prefer not to say'
            
            ethnicity_str = str(ethnicity).strip().lower()
            
            if any(term in ethnicity_str for term in ['indian', 'pakistani', 'bangladeshi', 'south asian']):
                return 'South Asian'
            elif any(term in ethnicity_str for term in ['white', 'caucasian', 'european', 'british']):
                return 'White/Caucasian'
            elif any(term in ethnicity_str for term in ['black', 'african', 'caribbean']):
                return 'Black/African/Caribbean'
            elif any(term in ethnicity_str for term in ['hispanic', 'latino', 'mexican']):
                return 'Hispanic/Latino'
            elif any(term in ethnicity_str for term in ['chinese', 'japanese', 'korean', 'east asian']):
                return 'East Asian'
            elif any(term in ethnicity_str for term in ['arab', 'persian', 'middle eastern']):
                return 'Middle Eastern/North African'
            else:
                return 'Other'
        
        # Process the data
        clean_df = df.copy()
        
        # Find demographic columns
        age_col = next((col for col in df.columns if 'Your Age' in col), None)
        gender_col = next((col for col in df.columns if 'gender' in col.lower()), None)
        ethnicity_col = next((col for col in df.columns if 'ethnic' in col.lower()), None)
        
        # Find humor columns
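        # Match only columns named like 's12: ...' (same prefix that clean_joke_text strips);
        # the doubled backslash is needed because this code lives inside a regular string
        import re
        humor_cols = [col for col in df.columns if re.match(r'^s\\d+:', col)]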
        
        # Create processed dataset
        final_data = []
        for idx, row in clean_df.iterrows():
            participant_id = idx
            age = row[age_col] if age_col else 'Unknown'
            gender = row[gender_col] if gender_col else 'Unknown'
            
            ethnicity_raw = row[ethnicity_col] if ethnicity_col else 'Unknown'
            ethnicity = standardize_ethnicity(ethnicity_raw)
            
            country_residence = standardize_country(row.get('Country of Residence:', 'Unknown'))
            country_birth = standardize_country(row.get('Country of Birth:', 'Unknown'))
            
            for i, col in enumerate(humor_cols, 1):
                response = row[col]
                understand_responses = ["I didn't understand", "I didn't understand the statement"]
                if pd.notna(response) and response in ['Yes', 'No'] + understand_responses:
                    clean_response = "I didn't understand" if response in understand_responses else response
                    
                    final_data.append({
                        'participant_id': participant_id,
                        'age': age,
                        'gender': gender,
                        'ethnicity': ethnicity,
                        'country_residence': country_residence,
                        'country_birth': country_birth,
                        'joke_id': f's{i}',
                        'joke_text': clean_joke_text(col),
                        'response': clean_response
                    })
        
        final_df = pd.DataFrame(final_data)
        return df, final_df, humor_cols
        
    except Exception as e:
        st.error(f"Error loading data: {str(e)}")
        return None, None, None

def main():
    # Main header
    st.markdown('<h1 class="main-header">Humor Detection EDA Dashboard</h1>', unsafe_allow_html=True)
    
    # Load data
    with st.spinner('Loading data...'):
        raw_df, processed_df, humor_cols = load_data()
    
    if processed_df is None:
        st.error("Failed to load data. Please ensure Dataset.csv is in the same directory.")
        return
    
    # Sidebar navigation
    st.sidebar.title("Navigation")
    page = st.sidebar.radio(
        "Choose a section:",
        ["Dataset Overview", "Demographics", "Joke Performance", 
         "Cultural Analysis", "Advanced Insights"]
    )
    
    # Dataset Overview Page
    if page == "Dataset Overview":
        st.header("Dataset Overview")
        
        # Key metrics
        col1, col2, col3, col4 = st.columns(4)
        
        with col1:
            st.metric("Total Participants", processed_df['participant_id'].nunique())
        with col2:
            st.metric("Total Jokes", processed_df['joke_id'].nunique())
        with col3:
            st.metric("Total Responses", len(processed_df))
        with col4:
            st.metric("Response Rate", f"{(len(processed_df) / (processed_df['participant_id'].nunique() * processed_df['joke_id'].nunique()) * 100):.1f}%")
        
        # Response distribution
        st.subheader("Response Distribution")
        response_counts = processed_df['response'].value_counts()
        
        col1, col2 = st.columns(2)
        
        with col1:
            fig_pie = px.pie(values=response_counts.values, names=response_counts.index,
                           title="Overall Response Distribution")
            st.plotly_chart(fig_pie, use_container_width=True)
        
        with col2:
            fig_bar = px.bar(x=response_counts.index, y=response_counts.values,
                           title="Response Counts", labels={'x': 'Response', 'y': 'Count'})
            st.plotly_chart(fig_bar, use_container_width=True)
        
        # Data quality insights
        st.markdown('<div class="insight-box"><h4>Key Findings from Response Analysis</h4>', unsafe_allow_html=True)
        # .get() avoids a KeyError when a response category never occurs
        funny_rate = response_counts.get('Yes', 0) / len(processed_df) * 100
        not_funny_rate = response_counts.get('No', 0) / len(processed_df) * 100
        understand_key = "I didn't understand"
        comprehension_rate = response_counts.get(understand_key, 0) / len(processed_df) * 100
        
        st.write(f"• **Humor Appreciation**: {funny_rate:.1f}% of responses rated the joke as funny")
        st.write(f"• **Clear Rejection**: {not_funny_rate:.1f}% of responses rated the joke as not funny")
        st.write(f"• **Comprehension Barriers**: {comprehension_rate:.1f}% of responses were *I didn't understand*, which may point to cultural or linguistic barriers")
        
        if funny_rate > 50:
            st.write("• **Overall Assessment**: Majority positive response suggests effective humor selection for this demographic")
        elif funny_rate > 30:
            st.write("• **Overall Assessment**: Mixed responses indicate diverse humor preferences across participants")
        else:
            st.write("• **Overall Assessment**: Lower appreciation rates may indicate cultural specificity or comprehension challenges")
        st.markdown('</div>', unsafe_allow_html=True)
    
    # Demographics Page
    elif page == "Demographics":
        st.header("Demographic Analysis")
        
        # Get unique participants
        participants = processed_df.drop_duplicates('participant_id')
        
        # Age distribution - filter out non-numeric ages
        participants_with_age = participants[participants['age'] != 'Unknown'].copy()
        participants_with_age['age_numeric'] = pd.to_numeric(participants_with_age['age'], errors='coerce')
        participants_with_age = participants_with_age.dropna(subset=['age_numeric'])
        
        if len(participants_with_age) > 0:
            st.subheader("Age Distribution")
            fig_age = px.histogram(participants_with_age, x='age_numeric', nbins=15, 
                                 title="Age Distribution of Participants",
                                 labels={'age_numeric': 'Age', 'count': 'Number of Participants'})
            st.plotly_chart(fig_age, use_container_width=True)
        else:
            st.subheader("Age Distribution")
            st.write("No valid age data available for visualization.")
        
        # Ethnicity and Gender
        col1, col2 = st.columns(2)
        
        with col1:
            st.subheader("Ethnicity Distribution")
            ethnicity_counts = participants['ethnicity'].value_counts()
            fig_eth = px.bar(x=ethnicity_counts.values, y=ethnicity_counts.index, orientation='h',
                           title="Participants by Ethnicity")
            fig_eth.update_layout(height=400)
            st.plotly_chart(fig_eth, use_container_width=True)
        
        with col2:
            st.subheader("Gender Distribution")
            gender_counts = participants['gender'].value_counts()
            fig_gender = px.pie(values=gender_counts.values, names=gender_counts.index,
                              title="Gender Distribution")
            st.plotly_chart(fig_gender, use_container_width=True)
        
        # Geographic analysis
        st.subheader("Geographic Distribution")
        col1, col2 = st.columns(2)
        
        with col1:
            residence_counts = participants['country_residence'].value_counts().head(10)
            fig_res = px.bar(x=residence_counts.values, y=residence_counts.index, orientation='h',
                           title="Top 10 Countries of Residence")
            st.plotly_chart(fig_res, use_container_width=True)
        
        with col2:
            birth_counts = participants['country_birth'].value_counts().head(10)
            fig_birth = px.bar(x=birth_counts.values, y=birth_counts.index, orientation='h',
                             title="Top 10 Countries of Birth")
            st.plotly_chart(fig_birth, use_container_width=True)
        
        # Demographic insights
        st.markdown('<div class="insight-box"><h4>Demographic Profile Insights</h4>', unsafe_allow_html=True)
        total_participants = len(participants)
        age_range = f"{participants_with_age['age_numeric'].min():.0f}-{participants_with_age['age_numeric'].max():.0f}" if len(participants_with_age) > 0 else "Unknown"
        dominant_ethnicity = participants['ethnicity'].value_counts().index[0]
        dominant_gender = participants['gender'].value_counts().index[0]
        
        st.write(f"• **Sample Size**: {total_participants} participants gave at least one valid response")
        st.write(f"• **Age Demographics**: Reported ages span {age_range}")
        st.write(f"• **Cultural Diversity**: {participants['ethnicity'].nunique()} ethnic groups represented, with {dominant_ethnicity} being most prevalent")
        st.write(f"• **Gender Balance**: {dominant_gender} participants comprise {participants['gender'].value_counts().iloc[0]/total_participants*100:.1f}% of sample")
        
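        # Geographic mobility analysis (participants with an unknown country are excluded,
        # otherwise 'Unknown' vs. a real country would be counted as migration)
        known_countries = participants[(participants['country_birth'] != 'Unknown') & (participants['country_residence'] != 'Unknown')]
        migrants = (known_countries['country_birth'] != known_countries['country_residence']).sum()
        mobility_rate = migrants / len(known_countries) * 100 if len(known_countries) > 0 else 0.0
        st.write(f"• **Geographic Mobility**: {mobility_rate:.1f}% of participants with known countries live outside their country of birth, offering cross-cultural humor perspectives")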
        st.markdown('</div>', unsafe_allow_html=True)
    
    # Joke Performance Page
    elif page == "Joke Performance":
        st.header("Joke Performance Analysis")
        
        # Individual joke performance
        joke_stats = processed_df.groupby('joke_id')['response'].value_counts().unstack(fill_value=0)
        joke_stats['total'] = joke_stats.sum(axis=1)
        # .get() treats a response category that never occurs as zero instead of raising KeyError
        joke_stats['funny_rate'] = (joke_stats.get('Yes', 0) / joke_stats['total'] * 100).round(1)
        
        understand_key = "I didn't understand"
        joke_stats['comprehension_issues'] = (joke_stats.get(understand_key, 0) / joke_stats['total'] * 100).round(1)
        
        # Top and bottom performers
        col1, col2 = st.columns(2)
        
        with col1:
            st.subheader("Top 10 Funniest Jokes")
            top_jokes = joke_stats.nlargest(10, 'funny_rate')[['funny_rate', 'total']]
            fig_top = px.bar(x=top_jokes.index, y=top_jokes['funny_rate'],
                           title="Highest Rated Jokes (%)", labels={'y': 'Funny Rate (%)', 'x': 'Joke ID'})
            st.plotly_chart(fig_top, use_container_width=True)
        
        with col2:
            st.subheader("Comprehension Challenges")
            comp_issues = joke_stats.nlargest(10, 'comprehension_issues')[['comprehension_issues', 'total']]
            fig_comp = px.bar(x=comp_issues.index, y=comp_issues['comprehension_issues'],
                            title="Jokes with Comprehension Issues (%)", 
                            labels={'y': 'Comprehension Issues (%)', 'x': 'Joke ID'})
            st.plotly_chart(fig_comp, use_container_width=True)
        
        # Interactive joke selector
        st.subheader("Detailed Joke Analysis")
        selected_joke = st.selectbox("Select a joke to analyze:", joke_stats.index.tolist())
        
        if selected_joke:
            joke_data = processed_df[processed_df['joke_id'] == selected_joke]
            joke_text = joke_data['joke_text'].iloc[0]
            
            st.markdown(f"**Joke Text:** {joke_text}")
            
            col1, col2, col3 = st.columns(3)
            with col1:
                st.metric("Funny Rate", f"{joke_stats.loc[selected_joke, 'funny_rate']:.1f}%")
            with col2:
                # Convert the NumPy integer to a plain int before passing it to st.metric
                st.metric("Total Responses", int(joke_stats.loc[selected_joke, 'total']))
            with col3:
                st.metric("Comprehension Issues", f"{joke_stats.loc[selected_joke, 'comprehension_issues']:.1f}%")
            
            # Response breakdown by demographics
            st.subheader(f"Response Breakdown for {selected_joke}")
            
            col1, col2 = st.columns(2)
            
            with col1:
                eth_responses = joke_data.groupby(['ethnicity', 'response']).size().unstack(fill_value=0)
                if not eth_responses.empty:
                    fig_eth_resp = px.bar(eth_responses, title=f"Responses by Ethnicity - {selected_joke}")
                    st.plotly_chart(fig_eth_resp, use_container_width=True)
            
            with col2:
                gender_responses = joke_data.groupby(['gender', 'response']).size().unstack(fill_value=0)
                if not gender_responses.empty:
                    fig_gender_resp = px.bar(gender_responses, title=f"Responses by Gender - {selected_joke}")
                    st.plotly_chart(fig_gender_resp, use_container_width=True)
        
        # Performance insights
        st.markdown('<div class="insight-box"><h4>Joke Performance Analysis Insights</h4>', unsafe_allow_html=True)
        top_performer = joke_stats.loc[joke_stats['funny_rate'].idxmax()]
        worst_performer = joke_stats.loc[joke_stats['funny_rate'].idxmin()]
        avg_funny_rate = joke_stats['funny_rate'].mean()
        most_confusing = joke_stats.loc[joke_stats['comprehension_issues'].idxmax()]
        
        st.write(f"• **Top Performer**: {top_performer.name} had the highest funny rate at {top_performer['funny_rate']:.1f}%")
        st.write(f"• **Performance Range**: Funny rates range from {worst_performer['funny_rate']:.1f}% to {top_performer['funny_rate']:.1f}% across jokes")
        st.write(f"• **Average Appeal**: Mean funny rate across jokes is {avg_funny_rate:.1f}%")
        st.write(f"• **Comprehension Challenge**: {most_confusing.name} had {most_confusing['comprehension_issues']:.1f}% comprehension issues, possibly due to cultural references or complexity")
        
        high_performers = (joke_stats['funny_rate'] > avg_funny_rate + 10).sum()
        st.write(f"• **Standout Content**: {high_performers} jokes scored more than 10 percentage points above the average funny rate")
        st.markdown('</div>', unsafe_allow_html=True)
    
    # Cultural Analysis Page
    elif page == "Cultural Analysis":
        st.header("Cross-Cultural Humor Analysis")
        
        # Age group analysis
        participants = processed_df.drop_duplicates('participant_id')
        
        # Filter out non-numeric ages and convert to numeric
        participants_numeric_age = participants[participants['age'] != 'Unknown'].copy()
        participants_numeric_age['age'] = pd.to_numeric(participants_numeric_age['age'], errors='coerce')
        participants_numeric_age = participants_numeric_age.dropna(subset=['age'])
        
        if len(participants_numeric_age) > 0:
            participants_numeric_age['age_group'] = pd.cut(participants_numeric_age['age'], 
                                             bins=[0, 25, 35, 45, 100], 
                                             labels=['18-25', '26-35', '36-45', '46+'])
            
            # Merge age groups back
            df_with_age_groups = processed_df.merge(
                participants_numeric_age[['participant_id', 'age_group']], on='participant_id', how='left'
            )
        else:
            # If no valid ages, create empty age_group column
            df_with_age_groups = processed_df.copy()
            df_with_age_groups['age_group'] = None
        
        # Cultural patterns
        st.subheader("Humor Preferences by Demographics")
        
        demo_choice = st.selectbox(
            "Select demographic dimension:",
            ["Age Group", "Ethnicity", "Gender", "Geographic Mobility"]
        )
        
        if demo_choice == "Age Group":
            demo_col = 'age_group'
            df_demo = df_with_age_groups
        elif demo_choice == "Geographic Mobility":
            # Add mobility indicator
            df_demo = processed_df.copy()
            df_demo['mobility'] = df_demo['country_birth'] != df_demo['country_residence']
            df_demo['mobility'] = df_demo['mobility'].map({True: 'Migrant', False: 'Resident'})
            demo_col = 'mobility'
        elif demo_choice == "Ethnicity":
            demo_col = 'ethnicity'
            df_demo = processed_df
        elif demo_choice == "Gender":
            demo_col = 'gender'
            df_demo = processed_df
        else:
            demo_col = demo_choice.lower()
            df_demo = processed_df
        
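        # Calculate percentages (observed=True skips age-group categories with no rows)
        demo_analysis = df_demo.groupby([demo_col, 'response'], observed=True).size().unstack(fill_value=0)
        demo_percentages = demo_analysis.div(demo_analysis.sum(axis=1), axis=0) * 100
        
        # Visualization (skipped when the selected dimension has no usable data)
        if demo_percentages.empty:
            st.info(f"No data available for {demo_choice}.")
        else:
            fig_demo = px.bar(demo_percentages, title=f"Humor Preferences by {demo_choice} (%)")
            fig_demo.update_layout(height=500)
            st.plotly_chart(fig_demo, use_container_width=True)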
        
        # Insights
        if 'Yes' in demo_percentages.columns:
            most_appreciative = demo_percentages['Yes'].idxmax()
            least_appreciative = demo_percentages['Yes'].idxmin()
            variance = demo_percentages['Yes'].std()
            
            st.markdown('<div class="insight-box">', unsafe_allow_html=True)
            st.write(f"**Cross-Cultural Humor Analysis - {demo_choice}:**")
            st.write(f"• **Highest Appreciation**: {most_appreciative} group shows {demo_percentages.loc[most_appreciative, 'Yes']:.1f}% humor appreciation")
            st.write(f"• **Lowest Appreciation**: {least_appreciative} group shows {demo_percentages.loc[least_appreciative, 'Yes']:.1f}% humor appreciation")
            st.write(f"• **Cultural Variance**: Standard deviation of {variance:.1f} percentage points across groups indicates {'high' if variance > 15 else 'moderate' if variance > 8 else 'low'} variation in humor preferences")
            
            if demo_choice == "Age Group":
                st.write("• **Age Factor**: Generational differences in humor appreciation may reflect varying cultural exposures and communication styles")
            elif demo_choice == "Ethnicity":
                st.write("• **Cultural Impact**: Ethnic variation in humor appreciation highlights the role of cultural background in comedy comprehension")
            elif demo_choice == "Geographic Mobility":
                st.write("• **Migration Effect**: Differences between migrants and residents suggest cultural adaptation influences humor perception")
            st.markdown('</div>', unsafe_allow_html=True)
        
        # Comprehension analysis
        st.subheader("Comprehension Challenges by Culture")
        
        understand_key = "I didn't understand"
        if understand_key in df_demo['response'].values:
            comp_by_ethnicity = df_demo[df_demo['response'] == understand_key].groupby('ethnicity').size()
            total_by_ethnicity = df_demo.groupby('ethnicity').size()
            comp_rates = (comp_by_ethnicity / total_by_ethnicity * 100).fillna(0).sort_values(ascending=False)
            
            fig_comp_cult = px.bar(x=comp_rates.values, y=comp_rates.index, orientation='h',
                                 title="Comprehension Difficulty by Ethnicity (%)",
                                 labels={'x': 'Comprehension Issues (%)', 'y': 'Ethnicity'})
            st.plotly_chart(fig_comp_cult, use_container_width=True)
            
            # Comprehension insights
            st.markdown('<div class="insight-box"><h4>Cultural Comprehension Analysis</h4>', unsafe_allow_html=True)
            highest_difficulty = comp_rates.index[0] if len(comp_rates) > 0 else "None"
            lowest_difficulty = comp_rates.index[-1] if len(comp_rates) > 0 else "None"
            
            st.write(f"• **Comprehension Barriers**: {highest_difficulty} group shows highest comprehension difficulty, suggesting cultural or linguistic challenges")
            st.write(f"• **Cultural Accessibility**: {lowest_difficulty} group demonstrates better humor comprehension, indicating cultural alignment with content")
            st.write(f"• **Language Factor**: Comprehension issues may reflect varying English proficiency or cultural reference familiarity")
            st.write(f"• **Design Implication**: Future humor content should consider cultural context and linguistic accessibility for diverse audiences")
            st.markdown('</div>', unsafe_allow_html=True)
    
    # Advanced Insights Page
    elif page == "Advanced Insights":
        st.header("Advanced Analytics & Insights")
        
        # Cultural specificity analysis
        st.subheader("Cultural Specificity Analysis")
        
        # Calculate cultural variance for each joke
        cultural_variance = []
        for joke_id in processed_df['joke_id'].unique():
            joke_data = processed_df[processed_df['joke_id'] == joke_id]
            
            # Performance by ethnicity
            eth_performance = []
            for ethnicity in joke_data['ethnicity'].unique():
                eth_subset = joke_data[joke_data['ethnicity'] == ethnicity]
                if len(eth_subset) >= 3:  # Minimum sample size
                    yes_rate = (eth_subset['response'] == 'Yes').mean() * 100
                    eth_performance.append(yes_rate)
            
            variance = np.std(eth_performance) if len(eth_performance) > 1 else 0
            overall_performance = (joke_data['response'] == 'Yes').mean() * 100
            
            cultural_variance.append({
                'joke_id': joke_id,
                'cultural_variance': variance,
                'overall_performance': overall_performance
            })
        
        cultural_df = pd.DataFrame(cultural_variance)
        
        # Scatter plot
        fig_cultural = px.scatter(cultural_df, 
                                x='cultural_variance', 
                                y='overall_performance',
                                hover_data=['joke_id'],
                                title="Cultural Specificity vs Overall Performance",
                                labels={'cultural_variance': 'Cultural Variance', 
                                       'overall_performance': 'Overall Performance (%)'})
        st.plotly_chart(fig_cultural, use_container_width=True)
        
        # Key insights
        correlation = cultural_df['cultural_variance'].corr(cultural_df['overall_performance'])
        high_universal = cultural_df[(cultural_df['cultural_variance'] < cultural_df['cultural_variance'].median()) & 
                                   (cultural_df['overall_performance'] > cultural_df['overall_performance'].median())]
        high_specific = cultural_df[cultural_df['cultural_variance'] > cultural_df['cultural_variance'].quantile(0.75)]
        
        st.markdown('<div class="insight-box">', unsafe_allow_html=True)
        st.write("**Advanced Statistical Insights:**")
        st.write(f"• **Universality vs Performance**: Correlation of {correlation:.3f} suggests {'strong' if abs(correlation) > 0.5 else 'moderate' if abs(correlation) > 0.3 else 'weak'} relationship between cultural specificity and performance")
        st.write(f"• **Universal Appeals**: {len(high_universal)} jokes demonstrate broad cross-cultural appeal with high performance and low variance")
        st.write(f"• **Cultural Specificity**: {len(high_specific)} jokes show high cultural variance, indicating audience-specific humor preferences")
        
        if correlation < -0.3:
            st.write("• **Key Finding**: Universal jokes tend to perform better, suggesting shared humor elements across cultures")
        elif correlation > 0.3:
            st.write("• **Key Finding**: Culturally specific jokes may achieve higher peaks within target demographics")
        else:
            st.write("• **Key Finding**: Mixed relationship suggests both universal and culture-specific humor strategies can be effective")
        
        st.write("• **Strategic Recommendation**: Balance universal humor elements with targeted cultural content for optimal engagement")
        st.markdown('</div>', unsafe_allow_html=True)
        
        # Export functionality
        st.subheader("Data Export")
        
        if st.button("Generate Summary Report"):
            # Defined here because understand_key is only assigned on the other pages
            understand_key = "I didn't understand"
            summary_stats = {
                'Total Participants': processed_df['participant_id'].nunique(),
                'Total Jokes': processed_df['joke_id'].nunique(),
                'Overall Funny Rate': f"{(processed_df['response'] == 'Yes').mean() * 100:.1f}%",
                'Comprehension Issues Rate': f"{(processed_df['response'] == understand_key).mean() * 100:.1f}%",
                'Most Represented Ethnicity': processed_df.drop_duplicates('participant_id')['ethnicity'].value_counts().index[0],
                'Cultural Variance Range': f"{cultural_df['cultural_variance'].min():.1f}-{cultural_df['cultural_variance'].max():.1f}"
            }
            
            st.json(summary_stats)

if __name__ == "__main__":
    main()
'''

# Write the dashboard code to a file
with open('humor_dashboard.py', 'w', encoding='utf-8') as f:
    f.write(dashboard_code)



## Dashboard Setup Instructions

The cell above writes the complete dashboard to `humor_dashboard.py`. The script repeats the preprocessing from the main analysis, so at run time it only needs `Dataset.csv`. To set it up:

### Prerequisites
```bash
pip install streamlit plotly pandas numpy matplotlib seaborn
```
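
To confirm the environment is ready, the packages imported at the top of the dashboard script can be checked without actually importing them. The following is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is hypothetical and not part of the notebook.

```python
from importlib.util import find_spec

# Top-level packages imported by the generated humor_dashboard.py
REQUIRED = ["streamlit", "pandas", "numpy", "plotly", "matplotlib", "seaborn"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Install before running the dashboard:", ", ".join(missing))
else:
    print("All dashboard dependencies are available.")
```

An empty result only means the packages are installed; it does not check their versions.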

### File Structure
```
Project/
├── Dataset.csv                 # Your main dataset
├── Project_File.ipynb         # Your main analysis notebook
├── streamlit_dashboard.ipynb  # This notebook
└── humor_dashboard.py         # Generated dashboard file
```

### Running the Dashboard
```bash
# Run from the Project/ directory so the relative path 'Dataset.csv' resolves
streamlit run humor_dashboard.py
```
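
Because the dashboard is written out from a string, a typo in that string only shows up when Streamlit starts. As a rough pre-flight check, the generated source can be parsed with the standard library's `ast` module. This is a sketch under the assumption that the script should define `load_data` and `main` (as the cell above does); `check_dashboard_source` is a hypothetical helper name. It catches syntax errors only, not runtime errors.

```python
import ast
from pathlib import Path

def check_dashboard_source(source):
    """Return a list of problems; an empty list means the source parsed cleanly."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]
    # Collect every function defined anywhere in the script
    defined = {node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)}
    return [f"missing function: {name}" for name in ("load_data", "main") if name not in defined]

dashboard_path = Path("humor_dashboard.py")
if dashboard_path.exists():
    print(check_dashboard_source(dashboard_path.read_text(encoding="utf-8")) or "Source parses cleanly.")
```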

---

In [2]:
# HELPER FUNCTIONS FOR DATA PROCESSING
# Copy these into the main notebook or a shared module (for example utils.py) to reuse them

import pandas as pd
import numpy as np

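def create_age_groups(df, age_column):
    """Bin ages into four groups; non-numeric or out-of-range ages become NaN."""
    ages = pd.to_numeric(df[age_column], errors='coerce')
    # Bins are right-inclusive: (0, 25], (25, 35], (35, 45], (45, 100],
    # so the '18-25' label also covers any reported age below 18
    return pd.cut(ages,
                  bins=[0, 25, 35, 45, 100],
                  labels=['18-25', '26-35', '36-45', '46+'])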

def calculate_joke_performance(df):
    """Calculate performance metrics for each joke"""
    joke_stats = df.groupby('joke_id')['response'].value_counts().unstack(fill_value=0)
    joke_stats['total'] = joke_stats.sum(axis=1)
    # .get() treats a response category that never occurs as zero instead of raising KeyError
    joke_stats['funny_rate'] = (joke_stats.get('Yes', 0) / joke_stats['total'] * 100).round(1)
    joke_stats['comprehension_issues'] = (joke_stats.get("I didn't understand", 0) / joke_stats['total'] * 100).round(1)
    return joke_stats

def calculate_cultural_variance(df):
    """Calculate cultural variance for each joke"""
    cultural_variance = []
    
    for joke_id in df['joke_id'].unique():
        joke_data = df[df['joke_id'] == joke_id]
        
        # Performance by ethnicity
        eth_performance = []
        for ethnicity in joke_data['ethnicity'].unique():
            eth_subset = joke_data[joke_data['ethnicity'] == ethnicity]
            if len(eth_subset) >= 3:  # Minimum sample size
                yes_rate = (eth_subset['response'] == 'Yes').mean() * 100
                eth_performance.append(yes_rate)
        
        variance = np.std(eth_performance) if len(eth_performance) > 1 else 0
        overall_performance = (joke_data['response'] == 'Yes').mean() * 100
        
        cultural_variance.append({
            'joke_id': joke_id,
            'cultural_variance': variance,
            'overall_performance': overall_performance
        })
    
    return pd.DataFrame(cultural_variance)

def export_summary_stats(df):
    """Export summary statistics"""
    joke_stats = calculate_joke_performance(df)
    
    summary = {
        'dataset_overview': {
            'total_participants': df['participant_id'].nunique(),
            'total_jokes': df['joke_id'].nunique(),
            'total_responses': len(df),
            'response_rate': len(df) / (df['participant_id'].nunique() * df['joke_id'].nunique())
        },
        'response_distribution': df['response'].value_counts().to_dict(),
        'top_jokes': joke_stats.nlargest(5, 'funny_rate')['funny_rate'].to_dict(),
        'comprehension_challenges': joke_stats.nlargest(5, 'comprehension_issues')['comprehension_issues'].to_dict(),
        'demographic_diversity': {
            'ethnicities': df['ethnicity'].nunique(),
            'countries_residence': df['country_residence'].nunique(),
            'countries_birth': df['country_birth'].nunique()
        }
    }
    
    return summary

print("Helper functions defined successfully!")
print("These functions can be used in your main analysis or imported into other notebooks.")

Helper functions defined successfully!
These functions can be used in your main analysis or imported into other notebooks.
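
To see what the per-joke metrics look like, the logic of `calculate_joke_performance` can be traced on a toy table. The sketch below restates that logic under a different name (`joke_performance`) so it runs on its own; the five sample rows are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

UNDERSTAND_KEY = "I didn't understand"

# Invented sample: three responses to joke s1, two to joke s2
toy = pd.DataFrame({
    'joke_id':  ['s1', 's1', 's1', 's2', 's2'],
    'response': ['Yes', 'No', 'Yes', 'No', UNDERSTAND_KEY],
})

def joke_performance(df):
    """Count responses per joke, then derive percentage rates from the counts."""
    stats = df.groupby('joke_id')['response'].value_counts().unstack(fill_value=0)
    stats['total'] = stats.sum(axis=1)
    stats['funny_rate'] = (stats.get('Yes', 0) / stats['total'] * 100).round(1)
    stats['comprehension_issues'] = (stats.get(UNDERSTAND_KEY, 0) / stats['total'] * 100).round(1)
    return stats

toy_stats = joke_performance(toy)
print(toy_stats[['total', 'funny_rate', 'comprehension_issues']])
```

Joke `s1` has two "Yes" answers out of three, so its funny rate is 66.7%; joke `s2` has one "I didn't understand" out of two, so its comprehension-issue rate is 50.0%.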


## Integration with Main Notebook

### Option 1: Import Processed Data
You can save the processed data from your main notebook and load it in the dashboard:

```python
# In your main notebook (Project_File.ipynb)
final_df.to_csv('processed_humor_data.csv', index=False)

# In the dashboard
processed_df = pd.read_csv('processed_humor_data.csv')
```
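
If the dashboard loads a saved CSV instead of rebuilding the data, it helps to confirm that the file has the columns the dashboard pages rely on. A minimal sketch using only the standard library follows; the column set is taken from the dictionary built in `load_data`, while the helper name `missing_columns` is hypothetical.

```python
import csv
import os

# Columns produced by load_data() and used across the dashboard pages
EXPECTED_COLUMNS = {
    "participant_id", "age", "gender", "ethnicity",
    "country_residence", "country_birth",
    "joke_id", "joke_text", "response",
}

def missing_columns(csv_path, expected=EXPECTED_COLUMNS):
    """Read only the header row and return expected columns that are absent."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    return sorted(set(expected) - set(header))

if os.path.exists("processed_humor_data.csv"):
    print(missing_columns("processed_humor_data.csv") or "All expected columns present.")
```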

### Option 2: Import Functions
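A notebook is not an importable module on its own, so move the processing functions into a shared module and import them from both the main notebook and the dashboard: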

```python
# Create a separate utils.py file with your processing functions
# Then import in the dashboard
from utils import clean_joke_text, standardize_country, standardize_ethnicity
```
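
As a rough illustration of what such a `utils.py` might contain, the sketch below shows only `clean_joke_text`, with the same pattern the dashboard uses inside `load_data`; `standardize_country` and `standardize_ethnicity` would move over in the same way. Precompiling the pattern is a small stylistic choice here, not something the original code does.

```python
# utils.py (module name taken from the import example; contents are a sketch)
import re

# Column headers look like "s12: <joke text>"; this matches the "s12:" prefix
_JOKE_PREFIX = re.compile(r'^s\d+:\s*', flags=re.IGNORECASE)

def clean_joke_text(joke_text):
    """Strip the leading joke label (for example 's12: ') from a column header."""
    return _JOKE_PREFIX.sub('', joke_text).strip()

print(clean_joke_text("s12: Why did the chicken cross the road?"))
```

Note that in a standalone module the pattern needs single backslashes (`\d`, `\s`); the doubled backslashes in the notebook cell exist only because the dashboard source is embedded in a regular string.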

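### Option 3: Self-Contained Dashboard (Current Setup)
The dashboard currently carries its own copy of the data processing logic, so it runs independently of the main notebook. The two copies are not linked: any change to the processing in the main notebook must be repeated in `humor_dashboard.py` to keep the results consistent.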

---

## Dashboard Features Summary

### Dataset Overview
- Key metrics and KPIs
- Response distribution analysis
- Data quality insights

### Demographics
- Age distribution visualization
- Ethnicity and gender breakdowns
- Geographic distribution analysis

### Joke Performance
- Individual joke performance rankings
- Comprehension challenge identification
- Interactive joke analysis with demographic breakdowns

### Cultural Analysis
- Cross-cultural humor patterns
- Demographic comparison tools
- Comprehension difficulty by culture

### Advanced Insights
- Cultural specificity scatter plot (variation across ethnicities vs. overall funny rate)
- Correlation between cultural variance and overall performance
- Summary report generation
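
The cultural specificity measure on the Advanced Insights page can be traced by hand: for one joke, compute the "Yes" rate within each ethnicity that has at least three responses, then take the population standard deviation of those rates (what `np.std` returns by default). The sketch below reproduces that calculation with the standard library only; the function name `cultural_spread` and the sample responses are invented for illustration.

```python
from collections import defaultdict
from statistics import pstdev

def cultural_spread(responses, min_group_size=3):
    """responses: (ethnicity, response) pairs for a single joke.

    Returns the population standard deviation, in percentage points,
    of the per-ethnicity "Yes" rates; groups below min_group_size are skipped.
    """
    by_group = defaultdict(list)
    for ethnicity, response in responses:
        by_group[ethnicity].append(response == "Yes")
    rates = [100 * sum(flags) / len(flags)
             for flags in by_group.values() if len(flags) >= min_group_size]
    # With fewer than two qualifying groups there is nothing to compare
    return pstdev(rates) if len(rates) > 1 else 0.0

sample = (
    [("Group A", "Yes")] * 3                               # 100% Yes
    + [("Group B", "Yes")] + [("Group B", "No")] * 3       # 25% Yes
    + [("Group C", "Yes")] * 2                             # only 2 responses, skipped
)
print(cultural_spread(sample))
```

Here the two qualifying groups have rates of 100% and 25%, so the spread is 37.5 percentage points. A joke with a large spread is liked much more by some groups than others.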

The dashboard provides an interactive way to explore all the insights from your main analysis while allowing stakeholders to dive deeper into specific aspects of the data.