**A Note on Data:** The code examples in this module often assume you have Pandas DataFrames/Series like `df_canada` (with 'Continent' and 'Total' columns) and `df_continents_total` (sum of 'Total' immigration grouped by 'Continent') available, as potentially processed in Modules 1 and 2. For clarity, conceptual setup or recreation of these will be included in examples where essential. You will also need specific libraries like `pywaffle`, `wordcloud`, `seaborn`, and `folium` installed.

# Module 3: Advanced Visualizations & Geospatial Data Insights

This module explores more specialized and advanced visualization techniques. We'll cover novel ways to represent proportions, methods for visualizing textual data, leveraging Seaborn for sophisticated statistical graphics, and an introduction to visualizing geospatial data with Folium. [Source: 1]

## 3.1. Waffle Charts: Proportional Representation with PyWaffle

Waffle charts present an alternative, often more intuitive, method for showing proportions in categorical data, especially when compared to pie charts. [Source: 2] They use a grid of equal-sized square tiles in which each tile stands for a fixed unit (a count or a percentage point) and its color identifies the category. The number of tiles in each color shows that category's magnitude or proportion. [Source: 3, 27]

[*Image: Example of a waffle chart showing market share of three products, with each product represented by different colored squares in a 10x10 grid.*]

**Definition and Use Cases:** [Source: 3]
Waffle charts are effective for displaying:
* The composition of a whole (e.g., market share percentages, demographic breakdowns). [Source: 3]
* Project completion status (e.g., X out of Y tasks done). [Source: 3]
* Budget allocations, survey responses, or election results. [Source: 3]

Their grid-based layout can make proportions easier to grasp visually than the angles in a pie chart because each unit is distinctly represented. [Source: 4]

**The `pywaffle` Library:** [Source: 5]
These charts are typically created in Python using the `pywaffle` library. [Source: 5, 27] `pywaffle` integrates with Matplotlib; its `Waffle` class can be used with `matplotlib.pyplot.figure()` to generate a Matplotlib Figure. [Source: 5, 6, 27]

**Creation Process:** [Source: 6]
1.  **Import:** `from pywaffle import Waffle` and `import matplotlib.pyplot as plt`. [Source: 7]
2.  **Prepare Data:** Usually a dictionary (categories as keys, counts/proportions as values) or a list of values if labels are separate. [Source: 7]
3.  **Use `plt.figure(FigureClass=Waffle, **waffle_params)`:** [Source: 8]
    * `values`: The data (dictionary or list). [Source: 8]
    * `rows`: Number of rows in the grid. [Source: 8]
    * `columns`: Number of columns. If both `rows` and `columns` are set, the total number of blocks is fixed and the values are scaled to fit it. [Source: 9] If only one is set, the values are treated as absolute block counts and the other dimension is calculated from their sum. [Source: 10]
    * `colors`: List of colors for categories. [Source: 11]
    * `title`: Dictionary for title properties (label, location, font size). [Source: 11]
    * `labels` or `legend`: For labeling categories. [Source: 12]
    * `icons`: Can use Font Awesome icons instead of squares (pictogram style). [Source: 13, 29]
    * Other parameters: `starting_location`, `block_arranging_style` (e.g., 'snake'). [Source: 13, 51]

**Code Example (Immigration by Continent Proportion):**
This example uses `df_continents_total` (sum of 'Total' immigration grouped by 'Continent'). [Source: 13]

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
# Ensure pywaffle is installed: pip install pywaffle

# --- Conceptual df_continents_total setup (if not available from Module 2) ---
# This would typically come from:
# df_canada = pd.read_excel(...) # (loaded and processed as in Module 1)
# df_continents_total = df_canada.groupby('Continent')['Total'].sum()

# For a runnable example, let's create a dummy df_continents_total Series:
continents_data = {
    'Asia': 15000000,
    'Europe': 8000000,
    'Africa': 3000000,
    'Oceania': 1000000,
    'North America': 2500000,
    'South America': 1500000
}
df_continents_total = pd.Series(continents_data, name='Total')
# --- End conceptual setup ---

try:
    from pywaffle import Waffle # [Source: 7, 14]

    # Convert Pandas Series to a dictionary for PyWaffle [Source: 14]
    data_for_waffle = df_continents_total.to_dict()

    # Scale values: each block represents ~1 million immigrants [Source: 15, 16]
    scaled_values = {k: round(v / 1_000_000) for k, v in data_for_waffle.items()}
    # Filter out categories that round to 0 blocks [Source: 17]
    scaled_values = {k: v for k, v in scaled_values.items() if v > 0}

    if not scaled_values:
        print("Scaled values resulted in no blocks to display. Check scaling or input data.")
    else:
        fig = plt.figure(
            FigureClass=Waffle,
            rows=15,  # Define number of rows [Source: 17]
            # columns=30, # Optionally set columns for a fixed grid [Source: 18]
            values=scaled_values,  # Pass scaled data [Source: 18]
            title={
                'label': 'Immigration to Canada by Continent (Each block ≈ 1M immigrants)', # [Source: 18]
                'loc': 'center',
                'fontsize': 14
            },
            labels=[f"{k} ({v:.0f}M)" for k, v in scaled_values.items()], # Custom legend labels [Source: 19]
            legend={
                'loc': 'lower left',
                'bbox_to_anchor': (0, -0.40), # Adjusted for potentially more items [Source: 19]
                'ncol': max(1, len(scaled_values) // 2), # Dynamic columns [Source: 19, 20]
                'framealpha': 0.7,
                'fontsize': 10
            },
            figsize=(12, 9), # Adjusted figure size for legend [Source: 20]
            # icons='user', # Example: use Font Awesome icons [Source: 13, 20]
            # icon_size=12,
            # icon_legend=True,
            block_arranging_style='snake', # How blocks are laid out [Source: 20, 21]
            colors=None # Let PyWaffle choose, or specify list [Source: 21]
        )
        fig.set_facecolor('#EEEEEE') # Light grey background [Source: 21]
        plt.show()

except ImportError:
    print("The pywaffle library is not installed. Please install it using: pip install pywaffle") # [Source: 21, 22]
except Exception as e:
    print(f"An error occurred while creating the waffle chart: {e}") # [Source: 22]
    if 'scaled_values' in locals() and not scaled_values:
        print("Note: Scaled values resulted in no blocks to display. Check scaling or input data.") # [Source: 22]

Waffle charts make proportions concrete as each unit (square/icon) is visible. This can be more intuitive than pie chart angles for some, especially for counts or parts of a discrete total. [Source: 22, 23] Effectiveness depends on grid size and value scaling. [Source: 23, 29] If blocks represent fixed numbers, total blocks vary. If total blocks are fixed, values must be scaled proportionally. These choices impact granularity and interpretability. [Source: 23, 24]
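
The two scaling choices described above can be compared with a small standard-library sketch. It reuses the dummy continent totals from the example. The helper `allocate_blocks` is hypothetical (it is not part of `pywaffle`) and uses the largest-remainder method, which is only one reasonable way to make rounded block counts add up to a fixed grid; `pywaffle`'s own rounding may differ.

```python
from math import floor

def allocate_blocks(values, total_blocks):
    """Split total_blocks across categories in proportion to values
    (largest-remainder method), so the counts always sum to total_blocks."""
    total = sum(values.values())
    exact = {k: v / total * total_blocks for k, v in values.items()}
    blocks = {k: floor(x) for k, x in exact.items()}
    leftover = total_blocks - sum(blocks.values())
    # Hand the remaining blocks to the categories with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - blocks[k], reverse=True)
    for k in by_remainder[:leftover]:
        blocks[k] += 1
    return blocks

# Same dummy totals as the waffle example above.
continents = {
    'Asia': 15_000_000, 'Europe': 8_000_000, 'Africa': 3_000_000,
    'Oceania': 1_000_000, 'North America': 2_500_000, 'South America': 1_500_000,
}

# Choice 1: fixed block value (one block per ~1M), so the block total follows the data.
per_million = {k: round(v / 1_000_000) for k, v in continents.items()}

# Choice 2: fixed grid (10 x 10 = 100 blocks), so each block is ~1% of the total.
fixed_grid = allocate_blocks(continents, total_blocks=100)

print(per_million, sum(per_million.values()))
print(fixed_grid, sum(fixed_grid.values()))
```

Note that Python's `round` rounds halves to the nearest even integer, so 2.5 and 1.5 both become 2 blocks under the first choice. With few blocks per category, rounding like this can visibly change the picture, which is one reason to state the block value in the title or legend.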

## 3.2. Word Clouds: Visualizing Textual Data

Word clouds (or tag/text clouds) visually represent textual data by emphasizing the most frequent or important words. [Source: 25]

[*Image: Example of a word cloud generated from a news article, with common words like "government," "policy," and "economy" appearing larger.*]

**Definition and Use Cases:** [Source: 26]
* Word size (and often boldness/color intensity) is proportional to its frequency or significance in the source text. [Source: 26]
* Provides a quick, at-a-glance summary of prominent themes/terms. [Source: 27]
* Common applications: [Source: 28]
    * Analyzing social media for trending topics. [Source: 28]
    * Summarizing customer feedback. [Source: 28]
    * Content analysis of articles/speeches. [Source: 29]
    * Market research via product reviews. [Source: 29]
    * Visualizing keywords in resumes/job descriptions. [Source: 30]

**Creation Process:** [Source: 30]
Typically uses specialized Python libraries like `wordcloud`. [Source: 31]
Crucial preceding step: **Text Preprocessing**. [Source: 31] This may include:
* Converting to lowercase. [Source: 31]
* Removing punctuation and numbers. [Source: 32]
* Removing "stop words" (common words like "the," "is," "a" that add little meaning). [Source: 32, 33]
* Stemming or lemmatization (reducing words to their root form). [Source: 33]
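
As a rough illustration of the first three steps, the sketch below uses only the standard library. The stop-word set is a tiny hypothetical list chosen for this example (real lists, such as the `STOPWORDS` set shipped with `wordcloud`, are much longer), and stemming or lemmatization is left out because it needs an NLP library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in", "it", "are"}

def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation and digits, and drop stop words."""
    text = text.lower()
    # Keep only ASCII letters and whitespace; this also drops accented letters,
    # which is acceptable for this English-only sketch.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [word for word in text.split() if word not in stop_words]

sample = (
    "Data visualization is the graphical representation of data. "
    "Charts and maps are 2 forms of visualization!"
)
tokens = preprocess(sample)
frequencies = Counter(tokens)  # word -> count
print(frequencies.most_common(3))
```

A frequency mapping like this can also be passed to `WordCloud.generate_from_frequencies` instead of raw text, which keeps the preprocessing fully under your control.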

**Code Example (Conceptual using `wordcloud` library):** [Source: 33]

In [None]:
# Ensure the wordcloud library is installed: pip install wordcloud
try:
    from wordcloud import WordCloud, STOPWORDS # [Source: 33]
    import matplotlib.pyplot as plt

    # Sample text (e.g., from course transcript) [Source: 33]
    text_data = """
    Data visualization is the graphical representation of data and information.
    It involves the process of creating visual representations of data.
    Data visualization can take many forms, from basic charts and graphs to more complex
    interactive dashboards, maps, and infographics. Basic charts and graphs are the simplest
    form for representing numerical data. Interactive dashboards and maps are more complex
    forms used to provide real-time information. Why is Data Visualization Important?
    It helps easily understand complex datasets that might be difficult to comprehend in their raw form.
    It can highlight patterns, trends, and relationships that might not be immediately apparent
    from looking at the data. Python is a key tool for data visualization.
    """ # [Source: 34, 35, 36]

    # Define a set of stopwords (can be expanded) [Source: 36]
    stopwords = set(STOPWORDS)
    stopwords.update(["form", "forms", "helps", "might", "looking", "data", "visualization", "visual"]) # Added more custom stopwords

    # Create a WordCloud object [Source: 36]
    wordcloud_generator = WordCloud(width=800, height=400,
                                  background_color='white', # [Source: 36, 37]
                                  stopwords=stopwords, # [Source: 37]
                                  min_font_size=10, # [Source: 37]
                                  colormap='viridis', # Example colormap [Source: 37]
                                  collocations=False # Avoids bi-grams (pairs of words) [Source: 38]
                                 ).generate(text_data)

    # Display the generated image using Matplotlib [Source: 38]
    plt.figure(figsize=(10, 5), facecolor=None)
    plt.imshow(wordcloud_generator, interpolation='bilinear') # 'bilinear' for smoother image [Source: 38, 39]
    plt.axis("off") # No axes for word cloud [Source: 39]
    plt.title("Word Cloud from Sample Text Data", fontsize=16, y=1.03) # [Source: 39]
    plt.tight_layout(pad=0) # Remove padding; called after the title so the layout accounts for it [Source: 39]
    plt.show()

except ImportError:
    print("The wordcloud library is not installed. Please install it using: pip install wordcloud") # [Source: 39, 40]
except Exception as e:
    print(f"An error occurred while creating the word cloud: {e}") # [Source: 40]

Word clouds are primarily qualitative, exploratory tools. They offer a quick visual summary but lack precision for detailed frequency analysis. [Source: 40] Visual encoding of frequency by size is for impression, not quantitative accuracy like a bar chart of word frequencies. [Source: 40] Their utility heavily depends on **quality text preprocessing**. [Source: 41] Without good stop word removal, stemming/lemmatization, and case handling, common but uninformative words can dominate, obscuring meaningful terms. [Source: 41] Generating useful word clouds requires sound text mining practices. [Source: 42]

## 3.3. Seaborn: Statistical Graphics Made Easy

Seaborn is a popular Python data visualization library built on Matplotlib. [Source: 43] It provides a high-level interface for creating attractive and informative statistical graphics, often with much less code than Matplotlib. [Source: 44, 13]

[*Image: Seaborn logo or a gallery showcasing various attractive Seaborn plots like violin plots, heatmaps, and regression plots.*]

**Key Features:**
* **Statistical Focus:** Seaborn functions work with entire datasets (especially Pandas DataFrames) and often perform automatic statistical estimation and aggregation (means, medians, confidence intervals, regression models). [Source: 44, 45, 21]
* **High-Level Interface:** Abstracts Matplotlib complexities, enabling sophisticated plots (distribution, categorical, regression) with concise syntax. [Source: 45, 46] (Some plots might need 5x less code than Matplotlib. [Source: 46])
* **Integration with Pandas DataFrames:** Seamlessly uses DataFrame columns for plot aesthetics (x, y, hue, size, style). [Source: 47, 48]
* **Attractive Default Aesthetics:** Comes with built-in themes and color palettes for visually appealing plots. `sns.set_theme()` or `sns.set_style()` apply these globally. [Source: 48, 22]
* **Specialized Plot Types:** [Source: 49]
    * **Distribution plots:** `displot` (histograms, KDE, ECDF), `kdeplot`, `histplot`, `ecdfplot`. [Source: 49, 21]
    * **Relational plots:** `relplot` (scatter, line), `scatterplot`, `lineplot`. [Source: 49]
    * **Categorical plots:** `catplot` (figure-level interface for `stripplot`, `swarmplot`, `boxplot`, `violinplot`, `barplot`, `pointplot`, `countplot`), `barplot`, `boxplot`, `violinplot`. [Source: 50, 22]
    * **Regression plots:** `regplot`, `lmplot` for linear relationships. [Source: 51]
    * **Matrix plots:** `heatmap`, `clustermap` for matrix data. [Source: 51]

Seaborn acts as a "productivity layer" over Matplotlib, abstracting boilerplate code and statistical calculations. [Source: 52, 53] This allows users to focus on interpreting statistical insights, speeding up Exploratory Data Analysis (EDA). [Source: 53, 54] While Seaborn simplifies standard statistical plots, Matplotlib understanding remains beneficial for fine-grained customization beyond Seaborn's high-level functions, as Seaborn plots are Matplotlib objects. [Source: 54, 55, 21]

### 3.3.1. Regression Plots: Visualizing Trends and Confidence Intervals

Seaborn's regression plots excel at visualizing relationships between two numerical variables, including a fitted linear regression model and its uncertainty. [Source: 56]

[*Diagram: A Seaborn regression plot showing scattered data points, a clear regression line, and the shaded confidence interval around the line.*]

**Functionality:** [Source: 57]
* Displays a scatter plot of data points.
* Overlays a regression line that best fits the observed relationship. [Source: 57]
* Often shows a **confidence interval** (usually 95% by default) around the regression line, visually indicating uncertainty in the estimated trend. [Source: 58, 21]
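
To make the line and the band less of a black box, here is a minimal standard-library sketch of the idea: fit a least-squares line, then refit it on resampled data to get a percentile interval for the fitted value at one x position. Seaborn's `regplot` also estimates its interval by bootstrap resampling (see its `ci` and `n_boot` parameters), but the helper names `fit_line` and `bootstrap_band`, the dummy data, and the exact percentile handling below are assumptions for illustration, not Seaborn's implementation.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def bootstrap_band(xs, ys, x_new, n_boot=1000, ci=95, seed=0):
    """Percentile interval for the fitted value at x_new, from resampled fits."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    preds = []
    while len(preds) < n_boot:
        sample = [rng.choice(pairs) for _ in pairs]  # resample with replacement
        bx, by = zip(*sample)
        if len(set(bx)) < 2:  # degenerate resample: slope is undefined
            continue
        slope, intercept = fit_line(bx, by)
        preds.append(intercept + slope * x_new)
    preds.sort()
    tail = (100 - ci) / 200  # e.g. 0.025 on each side for a 95% interval
    low = preds[int(tail * (n_boot - 1))]
    high = preds[int((1 - tail) * (n_boot - 1))]
    return low, high

# Dummy yearly totals with an upward trend plus noise (not real data).
rng = random.Random(42)
years = list(range(1980, 2014))
totals = [150_000 + 4_500 * (y - 1980) + rng.gauss(0, 20_000) for y in years]

slope, intercept = fit_line(years, totals)
low, high = bootstrap_band(years, totals, x_new=2013)
print(f"slope: {slope:.0f} per year; 95% band at 2013: {low:.0f} to {high:.0f}")
```

Repeating `bootstrap_band` across a range of x values traces out the shaded region; noisier data or fewer points give a wider band.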

**Creation with Seaborn:**
* `seaborn.regplot(x='x_col', y='y_col', data=dataframe)`: Versatile for simple linear regression, automatically fitting and plotting the line and confidence interval. [Source: 58, 59]
* `seaborn.lmplot(x='x_col', y='y_col', data=dataframe, hue='cat_col', col='another_cat_col')`: More powerful, combines `regplot` with `FacetGrid`. Allows plotting regression lines for data subsets conditioned on other categorical variables (using `hue`, `col`, `row`). [Source: 60, 61, 62]

**Customization:** Parameters like `color`, `marker`, `scatter_kws` (dict for scatter plot args), and `line_kws` (dict for regression line args) allow detailed customization. [Source: 62, 63]

**Code Example (Total Immigration Trend with Regression Line):**
Uses `df_total_trend` ('Year', 'Total_Immigrants') from scatter plot section (Module 2.6). [Source: 63]

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np # For dummy data

# --- Conceptual df_total_trend setup (if not available from Module 2) ---
# This would typically come from:
# years_for_plotting = [y for y in range(1980, 2014)]
# total_immigration_per_year = df_canada[years_for_plotting].sum(axis=0)
# total_immigration_per_year.index = total_immigration_per_year.index.map(int)
# df_total_trend = pd.DataFrame({'Year': total_immigration_per_year.index, 'Total_Immigrants': total_immigration_per_year.values})

# For a runnable example, let's create a dummy df_total_trend:
years_numeric_list = list(range(1980, 2014))
total_immigrants_list = np.linspace(150000, 300000, len(years_numeric_list)) + np.random.normal(0, 20000, len(years_numeric_list))
df_total_trend = pd.DataFrame({'Year': years_numeric_list, 'Total_Immigrants': total_immigrants_list})
# --- End conceptual setup ---


plt.figure(figsize=(10, 6)) # Use Matplotlib for figure size [Source: 64]
sns.regplot(x='Year',
            y='Total_Immigrants',
            data=df_total_trend, # [Source: 64]
            color='dodgerblue',     # Color for scatter and line (can be overridden) [Source: 64, 65]
            marker='o',             # Marker style [Source: 65]
            scatter_kws={'s': 50, 'alpha': 0.6, 'edgecolor':'w'}, # Args for scatter plot [Source: 65]
            line_kws={'color': 'red', 'linewidth': 2.5})      # Args for regression line [Source: 65]
plt.title('Total Immigration to Canada Trend with Regression Line (1980-2013)') # [Source: 65]
plt.xlabel('Year') # [Source: 65]
plt.ylabel('Total Number of Immigrants') # [Source: 65]
plt.grid(True, linestyle='--', alpha=0.7) # [Source: 65]
plt.show()

A major advantage of Seaborn's regression plots is the **automatic inclusion of confidence intervals**, providing an immediate visual measure of uncertainty in the trend line. [Source: 66, 21, 67] `lmplot`'s ability to integrate `regplot` with `FacetGrid` allows powerful comparative regression analyses across data subsets (e.g., separate lines for different continents), facilitating deeper investigation. [Source: 68, 69, 21, 70]

### 3.3.2. Categorical Data Plots with Seaborn

Seaborn offers a rich suite of functions for visualizing relationships involving categorical data, often providing more insight than basic bar charts. [Source: 71]

[*Image: A panel of Seaborn categorical plots: a `countplot`, a `barplot` with error bars, a `boxplot`, and a `violinplot`, all comparing a numerical variable across different categories.*]

**Common Categorical Plot Types:** [Source: 72]
* `sns.countplot(data=df, x='category_column')`: Shows observation counts in each category (like a histogram for discrete categories). [Source: 72, 73]
* `sns.barplot(data=df, x='category_column', y='numeric_column')`: Shows central tendency (mean by default) of a numeric variable per category, with error bars (typically confidence intervals). [Source: 73, 74]
* `sns.boxplot(data=df, x='category_column', y='numeric_column')`: Box plots per category for comparing distributions. [Source: 75, 76]
* `sns.violinplot(data=df, x='category_column', y='numeric_column')`: Combines box plot aspects with kernel density estimates to show distribution shape per category. [Source: 76, 77]
* `sns.stripplot()` and `sns.swarmplot()`: Show individual data points for a numeric variable, categorized. Swarm plots avoid overlap. [Source: 77, 78]
* `catplot()`: Versatile figure-level interface for these plots (`kind='bar'`, `'box'`, etc.), also facilitates faceting. [Source: 78, 79]
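
For a sense of what a box plot summarizes per category, the following standard-library sketch computes the quartiles, whisker ends, and outliers under the common 1.5 × IQR rule (Matplotlib's default whisker setting). The scores are hypothetical, `box_summary` is a helper invented for this example, and quantile interpolation conventions can differ slightly between libraries.

```python
from statistics import quantiles

def box_summary(values, whisker=1.5):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR rule."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Hypothetical satisfaction scores for two product lines.
scores = {
    "A": [62, 70, 71, 74, 75, 78, 80, 99],
    "B": [40, 55, 58, 60, 61, 63, 65, 66],
}
for line, vals in scores.items():
    print(line, box_summary(vals))
```

A violin plot starts from the same data but replaces the box outline with a kernel density estimate, so it shows the shape of the distribution rather than only these summary numbers.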

**Code Example (Total Immigration by Continent & Country Counts):**
Uses `df_canada` DataFrame. [Source: 80]

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np # For dummy data

# --- Conceptual df_canada setup (if not available) ---
# This would typically come from Module 1.
# For a runnable example, let's create a dummy df_canada:
countries = ['India', 'China', 'UK', 'Philippines', 'Pakistan', 'USA', 'Iran', 'Sri Lanka', 'South Korea', 'Poland', 'Nigeria', 'Egypt', 'Colombia', 'Brazil']
years_for_plotting = [y for y in range(1980, 2014)]
data = {}
for year in years_for_plotting:
    data[year] = np.random.randint(100, 30000, size=len(countries))
df_canada = pd.DataFrame(data, index=countries)
df_canada['Total'] = df_canada[years_for_plotting].sum(axis=1)
# Continent of each country, listed in the same order as `countries`
df_canada['Continent'] = ['Asia', 'Asia', 'Europe', 'Asia', 'Asia', 'North America', 'Asia', 'Asia', 'Asia', 'Europe', 'Africa', 'Africa', 'South America', 'South America']
# --- End conceptual df_canada setup ---

# --- Count plot for number of countries per continent ---
plt.figure(figsize=(10, 6))
ax_count = sns.countplot(data=df_canada,
                         x='Continent',
                         palette='viridis', # Example color palette [Source: 80]
                         order=df_canada['Continent'].value_counts().index) # Order by frequency [Source: 81]
ax_count.set_title('Number of Countries per Continent in Dataset') # [Source: 81]
ax_count.set_xlabel('Continent') # [Source: 81]
ax_count.set_ylabel('Count of Countries') # [Source: 81]
plt.xticks(rotation=45, ha='right') # Rotate labels [Source: 81]
plt.tight_layout() # [Source: 81]
plt.show()

# --- Bar plot for total immigration by continent ---
# sns.barplot by default computes mean. For sum, pre-aggregate or use estimator=sum. [Source: 82]
df_continent_sum_totals = df_canada.groupby('Continent')['Total'].sum().reset_index() # [Source: 83]

plt.figure(figsize=(12, 7))
ax_bar = sns.barplot(data=df_continent_sum_totals,
                     x='Continent',
                     y='Total',
                     palette='magma',
                     order=df_continent_sum_totals.sort_values('Total', ascending=False)['Continent']) # Order bars [Source: 83]
ax_bar.set_title('Total Immigration by Continent (1980-2013)') # [Source: 84]
ax_bar.set_xlabel('Continent') # [Source: 84]
ax_bar.set_ylabel('Total Number of Immigrants (in millions)') # [Source: 84]
# Format y-axis to show millions [Source: 84]
ax_bar.get_yaxis().set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x/1_000_000:.1f}M'))
plt.xticks(rotation=45, ha='right') # [Source: 84]
plt.tight_layout() # [Source: 84]
plt.show()

A key strength of Seaborn's categorical plots (like `barplot`) is their default inclusion of **statistical estimation** (e.g., mean with confidence intervals), providing more rigor than simple sum/count bar charts. [Source: 85, 86] This focus on statistical information with minimal user effort is a Seaborn hallmark. [Source: 87]
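
The difference between the default mean and a sum is easy to check by hand. This standard-library sketch groups a few hypothetical (continent, total) rows, standing in for `df_canada`, and computes both aggregates; the numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (continent, total) rows standing in for df_canada.
rows = [
    ("Asia", 600_000), ("Asia", 500_000), ("Asia", 100_000),
    ("Europe", 300_000), ("Europe", 100_000),
    ("Africa", 50_000),
]

groups = defaultdict(list)
for continent, total in rows:
    groups[continent].append(total)

# What barplot shows by default: the mean of y within each category.
mean_by_continent = {name: mean(vals) for name, vals in groups.items()}
# What pre-aggregating (or estimator=sum) shows: the category total.
sum_by_continent = {name: sum(vals) for name, vals in groups.items()}

for name in groups:
    print(f"{name}: mean={mean_by_continent[name]:,.0f}  sum={sum_by_continent[name]:,}")
```

When the data is pre-aggregated to one row per category, as in the bar plot example above, there is no within-category spread left to estimate an interval from, so the bars carry no meaningful error bars.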

## 3.4. Visualizing Geospatial Data with Folium

Folium is a powerful Python library for visualizing geospatial data. [Source: 88] It creates interactive maps using the Leaflet.js JavaScript library. [Source: 89, 23]

[*Image: An interactive Folium map displayed in a Jupyter Notebook, perhaps showing markers or a choropleth layer.*]

**Core Functionality:** [Source: 89]
Folium generates various map types, renderable in Jupyter notebooks or as standalone HTML files for easy sharing. [Source: 90]

**Map Creation and Styling:**
* **Base Maps:**
    *(The provided text cuts off here.)*

**Note:** The provided content for section 3.4 is incomplete. Further details on creating different map types (e.g., base maps, markers, choropleth maps) with Folium would typically follow here.

***

## Module 3 Practice Questions

Here are practice questions covering the available topics in Module 3.

**Section A: Multiple Choice Questions (MCQ)**

1.  Waffle charts are primarily used to represent:
    a)  Time-series data
    b)  Hierarchical data
    c)  Categorical data proportions
    d)  Geospatial distributions

2.  Which library is commonly used in Python to create Waffle charts? [Source: 5]
    a)  Seaborn
    b)  WordCloud
    c)  PyWaffle
    d)  Folium

3.  In a Waffle chart, if both `rows` and `columns` parameters are specified, how are the input `values` typically handled? [Source: 9]
    a)  The grid expands to accommodate all absolute values.
    b)  Values are scaled to fit the fixed number of blocks in the grid.
    c)  An error is raised as only one dimension should be fixed.
    d)  Values are ignored, and the grid is filled randomly.

4.  What is the primary purpose of a word cloud? [Source: 25, 27]
    a)  To show the grammatical structure of a text.
    b)  To provide a precise count of each word in a text.
    c)  To offer a quick visual summary of the most frequent or important words in a text.
    d)  To translate text from one language to another.

5.  Which of these is a crucial text preprocessing step before generating a meaningful word cloud? [Source: 32, 33]
    a)  Converting text to all uppercase.
    b)  Adding more punctuation for emphasis.
    c)  Removing "stop words" (common, uninformative words).
    d)  Encrypting the text data.

6.  Seaborn is built on top of which other Python visualization library? [Source: 43]
    a)  Plotly
    b)  Bokeh
    c)  Matplotlib
    d)  PyWaffle

7.  A key feature of Seaborn's `regplot` and `lmplot` is the automatic inclusion of: [Source: 58, 21, 66]
    a)  Pie chart segments
    b)  Axis labels for categorical data
    c)  Confidence intervals around the regression line
    d)  Waffle chart grids

8.  Which Seaborn function is a figure-level interface for creating various types of categorical plots (e.g., box, violin, bar) and facilitates faceting? [Source: 78, 79]
    a)  `sns.barplot()`
    b)  `sns.boxplot()`
    c)  `sns.catplot()`
    d)  `sns.kdeplot()`

9.  What does `sns.countplot(data=df, x='category_col')` typically display? [Source: 72, 73]
    a)  The mean of a numeric variable for each category.
    b)  The sum of values for each category.
    c)  The count of observations in each category of `category_col`.
    d)  A scatter plot of categories.

10. Folium is used in Python for: [Source: 88]
    a)  Creating statistical regression models.
    b)  Visualizing textual data as word clouds.
    c)  Generating interactive geospatial maps.
    d)  Building complex financial dashboards.

11. In `pywaffle`, what does the `icons` parameter allow you to do? [Source: 13, 29]
    a)  Change the shape of the legend.
    b)  Use Font Awesome icons instead of square tiles.
    c)  Add image icons to the plot background.
    d)  Control the color of the waffle blocks.

12. What does the `collocations=False` parameter in the `WordCloud` object often prevent? [Source: 38]
    a)  Single words from appearing.
    b)  Stop words from being removed.
    c)  Pairs of words (bi-grams) from being treated as single entities.
    d)  The word cloud from using different colors.

13. Seaborn's `lmplot` is more powerful than `regplot` primarily because it: [Source: 60, 61]
    a)  Can only plot non-linear regressions.
    b)  Integrates with `FacetGrid` for conditional plotting on subsets of data.
    c)  Uses a different statistical model by default.
    d)  Does not require a Pandas DataFrame as input.

14. When using `sns.barplot(data=df, x='category', y='value')`, what does the error bar on top of each bar typically represent by default? [Source: 74]
    a)  Standard deviation
    b)  Range (min-max)
    c)  Confidence interval around the estimate of central tendency (mean)
    d)  Interquartile range

15. The `block_arranging_style='snake'` in `pywaffle` refers to: [Source: 20, 21]
    a)  The shape of the icons used.
    b)  The color palette for the waffle blocks.
    c)  The way blocks are laid out in the grid (e.g., reversing direction on each successive line).
    d)  The animation style if the waffle chart is dynamic.

**Section B: True/False Questions**

1.  Waffle charts use angles to represent proportions, similar to pie charts. (T/F) [Source: 3, 4]
2.  The `pywaffle` library is completely independent of Matplotlib. (T/F) [Source: 5, 6]
3.  Word clouds are highly precise tools for quantitative comparison of word frequencies. (T/F) [Source: 40]
4.  Stemming or lemmatization is generally discouraged before creating a word cloud as it reduces word diversity. (T/F) [Source: 33, 41] (It's encouraged to group related words)
5.  Seaborn provides a low-level interface requiring more code than Matplotlib for similar statistical plots. (T/F) [Source: 44, 46]
6.  Seaborn's `regplot` can only display the scatter points and not the regression line itself. (T/F) [Source: 57, 58]
7.  Confidence intervals in a Seaborn regression plot provide a measure of uncertainty around the estimated trend. (T/F) [Source: 58, 66]
8.  `sns.violinplot` combines aspects of a box plot and a kernel density estimate. (T/F) [Source: 76, 77]
9.  Folium maps are static and cannot be interacted with (e.g., zoom, pan). (T/F) [Source: 89] (Folium leverages Leaflet.js for interactivity)
10. In Waffle charts, if each block represents a fixed number (e.g., 100 units), the total number of blocks in the chart will always be the same regardless of the data. (T/F) [Source: 23]
11. Text preprocessing is an optional step that has little impact on the final appearance of a word cloud. (T/F) [Source: 31, 41]
12. Seaborn's `catplot` function can only create bar plots. (T/F) [Source: 78, 79]
13. `lmplot` in Seaborn can use the `hue` parameter to plot different regression lines for different categories within the same plot. (T/F) [Source: 61]
14. `sns.barplot` by default displays the sum of the `y` variable for each category of `x`. (T/F) [Source: 73, 74] (It displays the mean by default)
15. Folium allows maps to be saved as standalone HTML files. (T/F) [Source: 90]

**Section C: Short Answer / Explanation Questions**

1.  Explain the basic principle of a Waffle chart and why it might be preferred over a pie chart for certain audiences or data. [Source: 2, 3, 4, 23]
2.  Describe two important parameters you would use when creating a Waffle chart with `pywaffle` and what they control. [Source: 8-13]
3.  What is text preprocessing in the context of creating word clouds, and why is it crucial for generating a meaningful visualization? List two common preprocessing steps. [Source: 31-33, 41, 42]
4.  How does Seaborn simplify the creation of statistical graphics compared to using Matplotlib directly? Give one example of a Seaborn feature that exemplifies this. [Source: 44-48, 52, 53]
5.  What information does a regression plot in Seaborn (e.g., using `regplot`) convey beyond a simple scatter plot with a trend line? [Source: 57, 58, 66]
6.  Explain the difference between Seaborn's `regplot` and `lmplot`. When might `lmplot` be particularly useful? [Source: 59-62, 68-70]
7.  Describe two different types of categorical plots available in Seaborn (e.g., `countplot`, `boxplot`, `violinplot`) and what kind of insights they help reveal. [Source: 72-79]
8.  What is the role of the `values` and `rows`/`columns` parameters in configuring a Waffle chart? How does their interaction affect the chart? [Source: 8, 9, 10, 23]
9.  Why are word clouds considered more of a qualitative and exploratory tool rather than a precise analytical one? [Source: 40]
10. What are "stop words" in text analysis, and why are they typically removed before generating a word cloud? [Source: 32, 33, 41]
11. Explain the concept of "faceting" as used by Seaborn's `lmplot` or `catplot`. How does it enhance data analysis? [Source: 61, 70, 79]
12. What does the `estimator` parameter in `sns.barplot` allow you to control? (Hint: by default, it shows the mean). [Source: 82, 83 implicitly]
13. What is the primary Python library Folium interacts with to render maps? [Source: 89, 23]
14. If you scaled the values for a Waffle chart such that some categories became zero, how might this affect the chart and how could you handle it? [Source: 17, 22]
15. How can `scatter_kws` and `line_kws` be used to customize a Seaborn regression plot? [Source: 62, 63, 65]

**Section D: Code Interpretation / "What's the Output?" / "Identify the Error"**

1.  **Code Snippet (PyWaffle):**

In [None]:
# from pywaffle import Waffle
# import matplotlib.pyplot as plt
# data = {'Category A': 50, 'Category B': 30, 'Category C': 20}
# plt.figure(
#     FigureClass=Waffle,
#     rows=10,
#     values=data,
#     # What parameter would you add here to show the labels in a legend?
# )
# plt.show()

What parameter (and its likely value structure) would you add to the `plt.figure` call to include a legend with labels for 'Category A', 'Category B', and 'Category C'? [Source: 12, 19]

2.  **Identify the Plot Type:** You see a visualization where word frequency in a document is represented by word size and color intensity, with more frequent words being larger and more prominent. What type of plot is this? [Source: 25, 26]

3.  **Code Snippet (Seaborn):**

In [None]:
# import seaborn as sns
# import matplotlib.pyplot as plt
# # df_titanic is a DataFrame with 'Pclass' (categorical) and 'Age' (numerical)
# sns.boxplot(x='Pclass', y='Age', data=df_titanic)
# plt.title("Age Distribution by Passenger Class")
# plt.show()

What will this code primarily visualize? [Source: 75, 76]

4.  **Conceptual Question:** If you use `sns.regplot()` and observe a very wide, fanned-out confidence interval around the regression line, what does this suggest about the certainty of the trend? [Source: 58, 66]

5.  **Code Snippet (WordCloud):**

In [None]:
# from wordcloud import WordCloud
# my_text = "apple banana apple orange banana apple"
# wc = WordCloud(collocations=True).generate(my_text)
# # With collocations=True, what might appear as a single entity in the word cloud?
# # (Assume no stopwords are being filtered for this specific question)

With `collocations=True` (the library default; the example in the notes used `False`), and assuming "apple banana" appears frequently as a pair, what might be displayed as a prominent feature? (Hint: it's about word pairs.) [Source: 38 for `False` implies understanding `True`]

**Section E: "Choose the Right Plot" Scenarios**

For each scenario, choose the most appropriate plot type from Module 3 (Waffle Chart, Word Cloud, Seaborn Regression Plot, or Seaborn Categorical Plot; for the last, specify the type, e.g., boxplot or countplot) and briefly justify your choice. (Folium would be the choice for map-based scenarios, but it is not fully covered in these notes.)

1.  You have survey data showing how 100 respondents rated a product on a scale of "Poor," "Average," "Good," "Excellent." You want to show the count for each rating.
2.  You want to visualize the relationship between years of experience and salary for employees in a company, and also see if this trend is statistically significant.
3.  You've analyzed the text of your company's mission statement and want to quickly show the most emphasized terms to your team.
4.  You want to represent the funding sources for a non-profit, where 60% comes from donations, 30% from grants, and 10% from investments, using a grid-based visual where each small square represents 1% of funding.
5.  You are comparing the distribution of customer satisfaction scores (a numerical score from 1-100) across three different product lines (A, B, C) to see differences in median scores, spread, and potential outliers for each line.
