<a href="https://colab.research.google.com/github/TCU-DCDA/WRIT20833-2025/blob/main/notebooks/exercises/Review_08_Data_Visualization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# WRIT 20833 Review 08: Data Visualization

Create compelling visualizations to communicate cultural data insights.

**Make a copy:** File > Save a copy in Drive

## Exercise 1: Setting Up Visualization Tools
Import libraries and create basic plots.

In [None]:
# Import basic libraries (only what was covered in CodeAlongs)
import matplotlib.pyplot as plt
import pandas as pd

# Create sample cultural data for visualization
books_data = {
    'title': ['1984', 'Pride and Prejudice', 'The Handmaid\'s Tale', 'Beloved', 'The Great Gatsby', 
             'To Kill a Mockingbird', 'Jane Eyre', 'Wuthering Heights'],
    'author': ['George Orwell', 'Jane Austen', 'Margaret Atwood', 'Toni Morrison', 'F. Scott Fitzgerald',
              'Harper Lee', 'Charlotte Brontë', 'Emily Brontë'],
    'year': [1949, 1813, 1985, 1987, 1925, 1960, 1847, 1847],
    'pages': [328, 432, 311, 275, 180, 281, 507, 416],
    'genre': ['Dystopian', 'Romance', 'Dystopian', 'Historical Fiction', 'Modernist', 
             'Coming of Age', 'Gothic', 'Gothic'],
    'rating': [4.2, 4.1, 4.3, 4.4, 3.9, 4.5, 4.0, 3.8]
}

books_df = pd.DataFrame(books_data)

print("Sample Cultural Dataset:")
print(books_df.head())
print()
print("Dataset shape: " + str(books_df.shape))

# Test basic plotting (matching CodeAlongs patterns)
plt.figure(figsize=(8, 5))
plt.scatter(books_df['year'], books_df['pages'])
plt.xlabel('Publication Year')
plt.ylabel('Number of Pages')
plt.title('Book Length vs. Publication Year')
plt.show()

print("Basic plotting functionality working!")
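With only eight books, it helps to know which point is which. The sketch below writes each title beside its point using `plt.annotate()`. So that it runs on its own, it rebuilds only the title, year, and page columns from `books_data` in a new DataFrame, `label_df`. That name is hypothetical and does not overwrite `books_df`. The label offset and font size are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Title, year, and page data copied from books_data above (new name so books_df is untouched)
label_df = pd.DataFrame({
    'title': ['1984', 'Pride and Prejudice', "The Handmaid's Tale", 'Beloved',
              'The Great Gatsby', 'To Kill a Mockingbird', 'Jane Eyre', 'Wuthering Heights'],
    'year': [1949, 1813, 1985, 1987, 1925, 1960, 1847, 1847],
    'pages': [328, 432, 311, 275, 180, 281, 507, 416],
})

plt.figure(figsize=(10, 6))
plt.scatter(label_df['year'], label_df['pages'])

# Label each point with its title, shifted 5 points right and up so the text does not cover the marker
for i in range(len(label_df)):
    plt.annotate(label_df['title'][i],
                 (label_df['year'][i], label_df['pages'][i]),
                 textcoords='offset points', xytext=(5, 5), fontsize=8)

plt.xlabel('Publication Year')
plt.ylabel('Number of Pages')
plt.title('Book Length vs. Publication Year (labeled)')
ax = plt.gca()  # keep a handle on the axes so the labels can be inspected later
plt.show()

print("Labels added: " + str(len(ax.texts)))
```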

## Exercise 2: Bar Charts for Categorical Data
Visualize distributions and comparisons in cultural data.

In [None]:
# Bar charts for categorical data (simplified to match CodeAlongs)

# Count books by genre
genre_counts = books_df['genre'].value_counts()

plt.figure(figsize=(10, 6))
plt.bar(genre_counts.index, genre_counts.values)
plt.xlabel('Genre')
plt.ylabel('Number of Books')
plt.title('Books by Genre')
plt.show()

# Bar chart for average ratings by genre (simplified)
unique_genres = books_df['genre'].unique()
genre_ratings = []

for genre in unique_genres:
    genre_books = books_df[books_df['genre'] == genre]
    avg_rating = genre_books['rating'].mean()
    genre_ratings.append(avg_rating)

plt.figure(figsize=(10, 6))
plt.bar(unique_genres, genre_ratings)
plt.xlabel('Genre')  
plt.ylabel('Average Rating')
plt.title('Average Book Ratings by Genre')
plt.show()

# Summary
print("Bar chart insights:")
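# Dystopian and Gothic both have 2 books here, so index[0] alone would hide a tie.
# List every genre that shares the top count instead.
top_count = genre_counts.max()
top_genres = genre_counts[genre_counts == top_count].index.tolist()
print("Most common genre(s): " + ", ".join(top_genres) + " (" + str(top_count) + " books)")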

# Find highest rated genre manually
highest_rating_idx = 0
for i in range(len(genre_ratings)):
    if genre_ratings[i] > genre_ratings[highest_rating_idx]:
        highest_rating_idx = i

highest_rated_genre = unique_genres[highest_rating_idx]
highest_rating = genre_ratings[highest_rating_idx]
print("Highest-rated genre: " + highest_rated_genre + " (" + str(round(highest_rating, 2)) + " average rating)")
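The loop above follows the CodeAlong approach. pandas can compute the same per-genre averages in one step with `groupby()`, and `sort_values()` then orders the bars from highest to lowest. This is an alternative sketch, not a replacement for the loop. `ratings_df` and `avg_by_genre` are new, hypothetical names.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Genre and rating columns copied from books_data above
ratings_df = pd.DataFrame({
    'genre': ['Dystopian', 'Romance', 'Dystopian', 'Historical Fiction', 'Modernist',
              'Coming of Age', 'Gothic', 'Gothic'],
    'rating': [4.2, 4.1, 4.3, 4.4, 3.9, 4.5, 4.0, 3.8],
})

# Average rating per genre, sorted so the highest-rated genre comes first
avg_by_genre = ratings_df.groupby('genre')['rating'].mean().sort_values(ascending=False)

plt.figure(figsize=(10, 6))
plt.bar(avg_by_genre.index, avg_by_genre.values)
plt.xticks(rotation=45, ha='right')  # angle long genre names so they do not overlap
plt.xlabel('Genre')
plt.ylabel('Average Rating')
plt.title('Average Book Ratings by Genre (sorted)')
plt.tight_layout()
plt.show()

print("Highest-rated genre: " + avg_by_genre.index[0] + " (" + str(round(avg_by_genre.iloc[0], 2)) + " average rating)")
```

Sorting the bars by value rather than alphabetically usually makes a ranking easier to read at a glance.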

## Exercise 3: Scatter Plots and Correlations
Explore relationships between numerical variables.

In [None]:
# Basic scatter plot: Year vs Pages
plt.figure(figsize=(10, 6))
plt.scatter(books_df['year'], books_df['pages'])
plt.xlabel('Publication Year')
plt.ylabel('Number of Pages')
plt.title('Book Length Over Time')
plt.show()

# Scatter plot: Rating vs Pages
plt.figure(figsize=(10, 6))
plt.scatter(books_df['rating'], books_df['pages'])
plt.xlabel('Rating')
plt.ylabel('Number of Pages')
plt.title('Book Rating vs Length')
plt.show()

# Colored scatter plot by genre (simplified)
genres = books_df['genre'].unique()
colors = ['red', 'blue', 'green', 'orange', 'purple', 'brown', 'pink', 'gray']

plt.figure(figsize=(12, 8))
for i, genre in enumerate(genres):
    genre_data = books_df[books_df['genre'] == genre]
    color = colors[i % len(colors)]  # Cycle through colors
    plt.scatter(genre_data['year'], genre_data['pages'], label=genre, color=color)

plt.xlabel('Publication Year')
plt.ylabel('Number of Pages')
plt.title('Book Length Over Time by Genre')
plt.legend()
plt.show()

# Analysis
print("Scatter plot insights:")
correlation_year_pages = books_df['year'].corr(books_df['pages'])
correlation_rating_pages = books_df['rating'].corr(books_df['pages'])
print("Correlation between year and pages: " + str(round(correlation_year_pages, 3)))
print("Correlation between rating and pages: " + str(round(correlation_rating_pages, 3)))
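Pairwise correlations can also be shown together as a correlation matrix. The sketch below draws one as a simple heatmap with `plt.imshow()`. It assumes the `'coolwarm'` colormap, fixed to the full -1 to 1 range so colors mean the same thing from one dataset to the next. With only eight books, read these coefficients as a description of this sample, not as evidence of a general trend. `numeric_df` and `corr` are hypothetical names.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Numeric columns copied from books_data above
numeric_df = pd.DataFrame({
    'year': [1949, 1813, 1985, 1987, 1925, 1960, 1847, 1847],
    'pages': [328, 432, 311, 275, 180, 281, 507, 416],
    'rating': [4.2, 4.1, 4.3, 4.4, 3.9, 4.5, 4.0, 3.8],
})

# Pearson correlation between every pair of columns (a 3 x 3 table)
corr = numeric_df.corr()

plt.figure(figsize=(6, 5))
plt.imshow(corr.values, cmap='coolwarm', vmin=-1, vmax=1)
plt.colorbar(label='Correlation')

# Use the column names as tick labels on both axes
plt.xticks(range(len(corr.columns)), corr.columns)
plt.yticks(range(len(corr.columns)), corr.columns)

# Write each coefficient in the center of its cell
for row in range(len(corr)):
    for col in range(len(corr)):
        plt.text(col, row, str(round(corr.iloc[row, col], 2)), ha='center', va='center')

plt.title('Correlation Matrix of Book Data')
plt.show()
```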

## Exercise 4: Time Series and Historical Trends
Visualize changes in cultural data over time.

In [None]:
# Line plot showing trends over time (simplified)
plt.figure(figsize=(10, 6))

# Get years in order
years_sorted = sorted(books_df['year'].unique())
avg_pages_by_year = []

for year in years_sorted:
    year_books = books_df[books_df['year'] == year]
    avg_pages = year_books['pages'].mean()
    avg_pages_by_year.append(avg_pages)

plt.plot(years_sorted, avg_pages_by_year, marker='o')
plt.xlabel('Year')
plt.ylabel('Average Pages')
plt.title('Average Book Length Over Time')
plt.show()

# Simple histogram for page distribution
plt.figure(figsize=(10, 6))
plt.hist(books_df['pages'], bins=10)
plt.xlabel('Number of Pages')
plt.ylabel('Frequency')
plt.title('Distribution of Book Lengths')
plt.show()

# Simple analysis
print("Line plot insights:")
print("Years covered: " + str(min(years_sorted)) + " to " + str(max(years_sorted)))
print("Average pages overall: " + str(round(books_df['pages'].mean(), 1)))
print("Page count range: " + str(books_df['pages'].min()) + " to " + str(books_df['pages'].max()))
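The minimum and maximum give the overall span of page counts, not the most common range. To find the most common range, read the values that `plt.hist()` returns: the count in each bin, the bin edges, and the drawn bars. The bin with the largest count is the most common range. Using four bins is an assumption, and a different number of bins can change the answer, especially with only eight books.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Page counts copied from books_data above
pages = pd.Series([328, 432, 311, 275, 180, 281, 507, 416])

plt.figure(figsize=(10, 6))
# plt.hist returns (counts per bin, bin edges, bar patches)
counts, edges, _ = plt.hist(pages, bins=4, edgecolor='black')
plt.xlabel('Number of Pages')
plt.ylabel('Frequency')
plt.title('Distribution of Book Lengths (4 bins)')
plt.show()

# Index of the tallest bar; bin i spans edges[i] to edges[i + 1]
peak = counts.argmax()
print("Most common page range: " + str(round(float(edges[peak]))) + " - "
      + str(round(float(edges[peak + 1]))) + " pages (" + str(int(counts[peak])) + " books)")
```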

## Exercise 5: Advanced Visualizations
Create more sophisticated plots for cultural analysis.

In [None]:
# Simple pie chart for genre distribution
plt.figure(figsize=(8, 8))
genre_counts = books_df['genre'].value_counts()

plt.pie(genre_counts.values, labels=genre_counts.index, autopct='%1.1f%%', startangle=90)
plt.title('Distribution of Books by Genre')
plt.show()

# Simple histogram for rating distribution
plt.figure(figsize=(10, 6))
plt.hist(books_df['rating'], bins=8)
plt.xlabel('Rating')
plt.ylabel('Frequency') 
plt.title('Distribution of Book Ratings')
plt.show()

# Summary statistics
print("Genre and Rating Analysis:")
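# Several genres can tie for the top count, so list all of them rather than only index[0]
top_count = genre_counts.max()
top_genres = genre_counts[genre_counts == top_count].index.tolist()
print("Most common genre(s): " + ", ".join(top_genres))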
print("Number of different genres: " + str(len(genre_counts)))
print("Average rating: " + str(round(books_df['rating'].mean(), 2)))
print("Rating range: " + str(books_df['rating'].min()) + " to " + str(books_df['rating'].max()))
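Box plots show the median, spread, and outliers of a numeric column at a glance. The minimal sketch below uses `plt.boxplot()` to compare page counts for books published before 1900 with those published in 1900 or later. The 1900 cutoff is an illustrative assumption, not a category in the original dataset. With only three and five books per group, the boxes are rough summaries. `length_df`, `before_1900`, and `from_1900` are hypothetical names.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Year and page columns copied from books_data above
length_df = pd.DataFrame({
    'year': [1949, 1813, 1985, 1987, 1925, 1960, 1847, 1847],
    'pages': [328, 432, 311, 275, 180, 281, 507, 416],
})

# Split page counts at 1900 (an illustrative cutoff)
before_1900 = length_df[length_df['year'] < 1900]['pages']
from_1900 = length_df[length_df['year'] >= 1900]['pages']

plt.figure(figsize=(8, 6))
# Pass plain arrays so matplotlib does not depend on each Series' index labels
plt.boxplot([before_1900.values, from_1900.values])
# boxplot places the boxes at x = 1 and x = 2, so label those positions
plt.xticks([1, 2], ['Before 1900', '1900 and later'])
plt.ylabel('Number of Pages')
plt.title('Book Length Before and After 1900')
plt.show()

print("Median pages before 1900: " + str(before_1900.median()))
print("Median pages from 1900 on: " + str(from_1900.median()))
```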

## Exercise 6: Creating Your Own Visualizations
Apply visualization techniques to your own cultural dataset.

In [None]:
# TODO: Create your own visualizations
print("CREATE YOUR OWN CULTURAL VISUALIZATIONS")
print("=" * 40)

# TODO: Replace with your own cultural dataset
your_data = {
    'item': ['Item A', 'Item B', 'Item C', 'Item D', 'Item E'],
    'value1': [10, 15, 12, 18, 14],
    'value2': [5.2, 3.8, 4.1, 4.5, 3.9],
    'category': ['Type X', 'Type Y', 'Type X', 'Type Z', 'Type Y']
}

your_df = pd.DataFrame(your_data)

print("Your sample dataset:")
print(your_df)
print()

# TODO: Create visualizations of your data
print("Sample visualizations:")

# Simple bar chart
plt.figure(figsize=(8, 6))
plt.bar(your_df['item'], your_df['value1'])
plt.xlabel('Items')
plt.ylabel('Value 1')
plt.title('Your Data - Bar Chart')
plt.show()

# Simple scatter plot
plt.figure(figsize=(8, 6))
plt.scatter(your_df['value1'], your_df['value2'])
plt.xlabel('Value 1')
plt.ylabel('Value 2')
plt.title('Your Data - Scatter Plot')
plt.show()

# TODO: Customize with your research questions
print()
print("TO CUSTOMIZE THIS SECTION:")
print("1. Replace 'your_data' with your cultural dataset")
print("2. Choose appropriate chart types for your data")
print("3. Add meaningful titles and labels")
print("4. Analyze patterns you discover")
print("5. Connect visualizations to your research questions")

print()
print("=" * 40)
print("Remember: Good visualizations tell a story about your cultural data!")
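The sample charts above do not use the template's `category` column. One way to bring it in is to reuse the per-genre coloring loop from Exercise 3: color each point by category, then report per-category averages with `groupby()`. The data below is the placeholder `your_data` from above, so replace it with your own. `your_means` is a hypothetical name.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data copied from your_data above; swap in your own dataset
your_df = pd.DataFrame({
    'item': ['Item A', 'Item B', 'Item C', 'Item D', 'Item E'],
    'value1': [10, 15, 12, 18, 14],
    'value2': [5.2, 3.8, 4.1, 4.5, 3.9],
    'category': ['Type X', 'Type Y', 'Type X', 'Type Z', 'Type Y'],
})

plt.figure(figsize=(8, 6))
# One scatter call per category, so each gets its own color and legend entry
for category in your_df['category'].unique():
    subset = your_df[your_df['category'] == category]
    plt.scatter(subset['value1'], subset['value2'], label=category)

plt.xlabel('Value 1')
plt.ylabel('Value 2')
plt.title('Your Data - Scatter Plot by Category')
plt.legend()
ax = plt.gca()
plt.show()

# Average of each numeric column within each category
your_means = your_df.groupby('category')[['value1', 'value2']].mean()
print(your_means)
```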

## Exercise 7: Summarizing Your Findings
Combine key statistics from the dataset with a review of the visualization skills covered so far.

In [None]:
# Simple analysis and summary
print("DATA VISUALIZATION SUMMARY")
print("=" * 40)

# Basic dataset summary
print("Dataset Overview:")
print("Total books: " + str(len(books_df)))
print("Average pages: " + str(round(books_df['pages'].mean(), 1)))
print("Average rating: " + str(round(books_df['rating'].mean(), 2)))
print("Publication years: " + str(books_df['year'].min()) + " to " + str(books_df['year'].max()))
print()

# Genre analysis
print("Genre Analysis:")
genre_counts = books_df['genre'].value_counts()
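# Dystopian and Gothic tie at 2 books each here, so report every genre at the top count
top_count = genre_counts.max()
top_genres = genre_counts[genre_counts == top_count].index.tolist()
print("Most common genre(s): " + ", ".join(top_genres) + " (" + str(top_count) + " books)")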
print("Total genres: " + str(len(genre_counts)))
print()

# Simple correlations
correlation_year_pages = books_df['year'].corr(books_df['pages'])
correlation_rating_pages = books_df['rating'].corr(books_df['pages'])
print("Simple Correlations:")
print("Year vs Pages: " + str(round(correlation_year_pages, 3)))
print("Rating vs Pages: " + str(round(correlation_rating_pages, 3)))
print()

print("VISUALIZATION SKILLS LEARNED:")
print("- Basic scatter plots with plt.scatter()")
print("- Bar charts with plt.bar() and value_counts()")
print("- Line plots with plt.plot()")
print("- Histograms with plt.hist()")
print("- Pie charts with plt.pie()")
print("- Adding titles, labels, and legends")
print("- Basic data analysis with pandas")

print()
print("NEXT STEPS:")
print("- Practice with your own cultural datasets")
print("- Experiment with different chart types")
print("- Focus on telling stories with your visualizations")
print("- Always include clear titles and labels")
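The summary above is text-only. As an extension, several of the earlier charts can be combined into one multi-panel figure, a simple static dashboard, with `plt.subplots()`. The 2 x 2 layout and the choice of panels are assumptions, so pick whichever views best support your argument. `dash_df` is a hypothetical name.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Columns copied from books_data above
dash_df = pd.DataFrame({
    'year': [1949, 1813, 1985, 1987, 1925, 1960, 1847, 1847],
    'pages': [328, 432, 311, 275, 180, 281, 507, 416],
    'genre': ['Dystopian', 'Romance', 'Dystopian', 'Historical Fiction', 'Modernist',
              'Coming of Age', 'Gothic', 'Gothic'],
    'rating': [4.2, 4.1, 4.3, 4.4, 3.9, 4.5, 4.0, 3.8],
})

# A 2 x 2 grid of panels; axes[row, col] selects one panel
fig, axes = plt.subplots(2, 2, figsize=(12, 9))

counts = dash_df['genre'].value_counts()
axes[0, 0].bar(counts.index, counts.values)
axes[0, 0].set_title('Books by Genre')
axes[0, 0].tick_params(axis='x', labelrotation=45)

axes[0, 1].scatter(dash_df['year'], dash_df['pages'])
axes[0, 1].set_title('Book Length vs. Publication Year')
axes[0, 1].set_xlabel('Publication Year')
axes[0, 1].set_ylabel('Number of Pages')

axes[1, 0].hist(dash_df['pages'], bins=4)
axes[1, 0].set_title('Distribution of Book Lengths')
axes[1, 0].set_xlabel('Number of Pages')

axes[1, 1].hist(dash_df['rating'], bins=4)
axes[1, 1].set_title('Distribution of Book Ratings')
axes[1, 1].set_xlabel('Rating')

fig.suptitle('Cultural Dataset Dashboard')
fig.tight_layout()  # reduce overlap between panel titles and axis labels
plt.show()
```

Here each panel is styled with methods such as `axes[0, 1].set_title()` instead of `plt.title()`, because the `plt.*` functions only act on whichever panel is current.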

## Summary

You explored:
- Setting up visualization libraries and creating basic plots
- Bar charts for categorical cultural data analysis
- Scatter plots and correlation analysis
- Time series visualization for historical trends
- Pie charts and histograms for composition and distribution
- Building charts for your own dataset from a template
- Summarizing key statistics to accompany your charts

**Key Visualization Types:**
- **Bar Charts**: Categorical comparisons, distributions
- **Scatter Plots**: Relationships, correlations, trends
- **Line Charts**: Time series, historical changes
- **Histograms**: Data distributions, patterns
- **Box Plots**: Statistical summaries, outliers
- **Heatmaps**: Correlation matrices, complex relationships
- **Pie Charts**: Proportional data, compositions
- **Dashboards**: Comprehensive multi-dimensional analysis

**Design Principles:**
- Choose appropriate chart types for your data and message
- Use color strategically and with accessibility in mind
- Provide clear titles, labels, and legends
- Consider your audience and context
- Balance detail with clarity
- Be aware of potential biases and misrepresentations

**Cultural Applications:**
- Literary analysis across time periods and genres
- Historical trend identification
- Cross-cultural comparisons
- Pattern recognition in large cultural datasets
- Communicating research findings to diverse audiences

**Next:** Review 09 will integrate all skills in a comprehensive exercise.

---
 