<a href="https://colab.research.google.com/github/TCU-DCDA/WRIT20833-2025/blob/main/notebooks/homework/WRIT20833_Found_Data_Student_Practice_F25.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Found Data & Pandas Fundamentals: Student Practice
## Your First Cultural Data Analysis Project

**Name:** ________________________________  
**Date:** ________________________________

Welcome to your first hands-on cultural data analysis project! This notebook is your space to practice the fundamental pandas skills from the main lesson using your own found cultural dataset.

### 📋 Dataset Requirements Checklist
**✅ Your dataset should include:**
- [ ] **Text data** (names, titles, categories, descriptions)
- [ ] **Numeric data** (counts, ratings, years, measurements)
- [ ] **At least 10-15 rows** for meaningful exploration
- [ ] **Cultural relevance** (arts, literature, history, media, etc.)

### 📂 Where to Find Cultural Datasets:
- **Kaggle**: Search for "movies", "books", "music", "museums", "art"
- **Government Data**: Cultural statistics, arts funding, census data
- **Digital Collections**: Library catalogs, museum databases
- **Academic Sources**: Digital humanities repositories

### 🎯 Learning Goals:
By the end of this practice, you will:
- Load and explore a real cultural dataset
- Apply basic pandas operations (filtering, sorting, calculations)
- Create meaningful visualizations
- Reflect critically on what your data reveals about culture

## Part 1: Data Ethics and Collection Context

Before diving into analysis, let's think about the ethics and context of your chosen dataset.

### 🤔 Reflection: Your Dataset's Origin
**Answer these questions about your chosen dataset:**

1. **What cultural topic does your dataset represent?**

2. **Where did this data come from? Who collected it and why?**

3. **What biases might exist in how this data was collected or categorized?**

4. **Who or what might be missing from this dataset?**

5. **How might the data collection methods affect your analysis?**

## Part 2: Loading Your Cultural Dataset

In [None]:
# Import essential libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Set display options for better readability
pd.options.display.max_rows = 50
pd.options.display.max_columns = 15

In [None]:
# Load your cultural dataset
# Replace 'your_filename.csv' with your actual file name
# For other formats: pd.read_excel(), pd.read_json(), etc.

cultural_data = pd.read_csv('your_filename.csv')

print("📊 Successfully loaded dataset!")
print(f"Dataset contains {len(cultural_data)} rows and {len(cultural_data.columns)} columns")
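
If you want to rehearse the loading step before you have settled on a file, the sketch below reads a few made-up rows from an in-memory string instead of a file on disk. The column names and values (`title`, `genre`, `year`, `rating`) are placeholders invented for illustration, not real data. `pd.read_csv` accepts a file-like object such as `io.StringIO` the same way it accepts a filename.

```python
import io

import pandas as pd

# Made-up rows for practice only; swap in your own file when you are ready
sample_csv = io.StringIO(
    "title,genre,year,rating\n"
    "Book A,Fiction,1998,4.1\n"
    "Book B,Poetry,2005,3.8\n"
    "Book C,Fiction,2012,4.5\n"
)

practice_data = pd.read_csv(sample_csv)

print(f"Loaded {len(practice_data)} rows and {len(practice_data.columns)} columns")
print(practice_data.dtypes)  # pandas infers text, integer, and decimal columns
```

Once this runs, the same exploration steps should work unchanged on your real dataset.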

## Part 3: Initial Data Exploration

In [None]:
# First look at your data
print("First 5 rows of your cultural dataset:")
cultural_data.head()

In [None]:
# Get basic information about your dataset
print("Dataset structure and info:")
cultural_data.info()

In [None]:
# Examine column names and types
print("Column names:")
for i, col in enumerate(cultural_data.columns, 1):
    print(f"{i}. {col} ({cultural_data[col].dtype})")

In [None]:
# Look at a few random samples
print("Random sample from your dataset:")
cultural_data.sample(3)

### 📝 Initial Observations:
**Record your first impressions:**

**What columns seem most interesting for cultural analysis?**

**What data types do you see? (text, numbers, dates)**

**Any obvious data quality issues?**

**What cultural questions could this data help answer?**

## Part 4: Understanding Your Data Structure

In [None]:
# Check the shape and basic statistics
print(f"Dataset shape: {cultural_data.shape}")
print(f"Total data points: {cultural_data.shape[0] * cultural_data.shape[1]}")

# Check for missing data
print("\nMissing data check:")
missing_data = cultural_data.isnull().sum()
if missing_data.sum() > 0:
    print(missing_data[missing_data > 0])
else:
    print("✅ No missing data found!")
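
Raw counts of missing values can be hard to judge on their own, so it may help to see them as a share of each column. The sketch below uses a small made-up table (the names `practice_data`, `creator`, and so on are placeholders) and relies on the fact that the mean of a True/False column is the fraction of True values.

```python
import numpy as np
import pandas as pd

# Made-up rows with deliberate gaps, for illustration only
practice_data = pd.DataFrame({
    "title": ["Work A", "Work B", "Work C", "Work D"],
    "creator": ["Artist 1", None, "Artist 3", None],
    "year": [1990, 2001, np.nan, 2015],
})

# isna() marks missing cells as True; mean() then gives the share missing per column
missing_percent = (practice_data.isna().mean() * 100).round(1)

print("Percent missing per column:")
print(missing_percent.sort_values(ascending=False))
```

A column that is mostly empty is worth noting in your reflection: ask why those values were never recorded.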

In [None]:
# Explore categorical/text columns
print("Text/Categorical columns analysis:")
for col in cultural_data.select_dtypes(include=['object']).columns:
    unique_count = cultural_data[col].nunique()
    print(f"\n{col}: {unique_count} unique values")
    if unique_count <= 10:
        print(f"  Values: {cultural_data[col].unique().tolist()}")
    else:
        print(f"  Sample values: {cultural_data[col].unique()[:5].tolist()}...")

In [None]:
# Explore numeric columns
numeric_columns = cultural_data.select_dtypes(include=[np.number]).columns
if len(numeric_columns) > 0:
    print("Numeric columns summary:")
    print(cultural_data[numeric_columns].describe())
else:
    print("No numeric columns found in this dataset.")
    print("💡 Focus on categorical analysis and text exploration.")

## Part 5: Selecting and Filtering Your Data

Practice selecting specific columns and filtering rows based on your cultural research interests.

In [None]:
# Select specific columns for analysis
# Replace with your actual column names
key_columns = ['column1', 'column2', 'column3']  # Update these

# Uncomment and modify the lines below:
# focused_data = cultural_data[key_columns]
# print("Selected columns for analysis:")
# focused_data.head()

print("👆 Update the column names above to match your dataset")
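
Selecting a column that does not exist raises a `KeyError`, which is a common stumbling block when adapting a template. One possible way to guard against it is to check your wish list against the actual column names first. The table and the names `wanted_columns`, `available`, and `not_found` below are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only
practice_data = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "genre": ["Fiction", "Poetry", "Fiction"],
    "year": [1998, 2005, 2012],
    "rating": [4.1, 3.8, 4.5],
})

# "director" is deliberately absent to show what the check catches
wanted_columns = ["title", "year", "director"]

available = [col for col in wanted_columns if col in practice_data.columns]
not_found = [col for col in wanted_columns if col not in practice_data.columns]

# Passing a list of names inside [] returns a DataFrame with just those columns
focused_data = practice_data[available]

print("Selected:", available)
print("Not found (check spelling and capitalization):", not_found)
```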

In [None]:
# Practice filtering - Example templates
# Modify these examples to work with your data:

# Example 1: Filter by text value
# filtered_data = cultural_data[cultural_data['category_column'] == 'specific_value']

# Example 2: Filter by numeric value
# filtered_data = cultural_data[cultural_data['numeric_column'] > 100]

# Example 3: Complex filter
# filtered_data = cultural_data[(cultural_data['year'] > 2000) & (cultural_data['category'] == 'Fiction')]

print("Add your filtering code here based on your cultural research questions")
print("Examples:")
print("- Books published after a certain year")
print("- Movies of a specific genre")
print("- Artists from a particular region")
print("- Museums with high visitor counts")
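
To see the three filter templates in action, here is a minimal runnable sketch on a made-up film table (all titles, genres, and years are placeholders). It also shows `isin()`, which is one convenient way to match several categories at once.

```python
import pandas as pd

# Made-up rows for illustration only
practice_data = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D"],
    "genre": ["Drama", "Comedy", "Drama", "Documentary"],
    "year": [1995, 2003, 2010, 2018],
})

# Single condition: the comparison produces True/False for every row
recent = practice_data[practice_data["year"] > 2000]

# Two conditions: wrap each in parentheses and join with & (and) or | (or)
recent_drama = practice_data[
    (practice_data["year"] > 2000) & (practice_data["genre"] == "Drama")
]

# Several allowed values in one column
selected_genres = practice_data[practice_data["genre"].isin(["Comedy", "Documentary"])]

print(f"Released after 2000: {len(recent)} films")
print(f"Dramas after 2000: {recent_drama['title'].tolist()}")
print(f"Comedies and documentaries: {selected_genres['title'].tolist()}")
```

The parentheses around each condition matter: without them, Python evaluates `&` before the comparisons and the filter fails.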

### 🎯 Your Filtering Strategy:
**What filtering will help answer your cultural questions?**

**Filter 1:** 

**Filter 2:** 

**Filter 3:** 

## Part 6: Counting and Analyzing Categories

In [None]:
# Count values in categorical columns
# Replace 'category_column' with your actual column name

# category_counts = cultural_data['category_column'].value_counts()
# print("Distribution of categories:")
# print(category_counts)

print("👆 Choose a categorical column from your dataset to analyze")
print("This could be: genres, countries, time periods, types, etc.")

In [None]:
# Analyze a second categorical column
# second_category_counts = cultural_data['second_column'].value_counts()
# print("Distribution of second category:")
# print(second_category_counts)

print("Analyze another categorical column here")
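
Counts are easier to compare across datasets of different sizes when you also look at percentages. As a rough sketch, `value_counts(normalize=True)` returns proportions instead of counts. The genre labels below are made up, and the `summary` table is just one way to view both side by side.

```python
import pandas as pd

# Made-up genre labels for illustration only
practice_data = pd.DataFrame({
    "genre": ["Drama", "Comedy", "Drama", "Documentary", "Drama"],
})

genre_counts = practice_data["genre"].value_counts()

# normalize=True gives proportions (0 to 1); multiply by 100 for percentages
genre_share = practice_data["genre"].value_counts(normalize=True).mul(100).round(1)

summary = pd.DataFrame({"count": genre_counts, "percent": genre_share})
print(summary)
```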

## Part 7: Basic Statistical Analysis

In [None]:
# Calculate basic statistics for numeric columns
# Replace 'numeric_column' with your actual column name

# if 'numeric_column' in cultural_data.columns:
#     print("Statistical summary:")
#     print(f"Average: {cultural_data['numeric_column'].mean():.2f}")
#     print(f"Median: {cultural_data['numeric_column'].median():.2f}")
#     print(f"Minimum: {cultural_data['numeric_column'].min()}")
#     print(f"Maximum: {cultural_data['numeric_column'].max()}")
#     print(f"Standard deviation: {cultural_data['numeric_column'].std():.2f}")

print("👆 Choose a numeric column to analyze")
print("This could be: years, ratings, counts, prices, durations, etc.")
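
If you would rather compute all of these statistics in one step, `agg()` accepts a list of function names. This is a sketch on five made-up ratings, not a required approach; the name `rating_summary` is a placeholder.

```python
import pandas as pd

# Made-up ratings for illustration only
ratings = pd.Series([3.5, 4.0, 4.5, 2.0, 5.0], name="rating")

# agg() with a list runs each named statistic and returns them as one Series
rating_summary = ratings.agg(["mean", "median", "min", "max", "std"]).round(2)
print(rating_summary)

# A mean well below the median hints that a few low values are pulling it down
print(f"Mean minus median: {rating_summary['mean'] - rating_summary['median']:.2f}")
```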

In [None]:
# Find extremes - most and least
# Replace with your actual columns

# # Find highest value
# max_row = cultural_data[cultural_data['numeric_column'] == cultural_data['numeric_column'].max()]
# print("Highest value:")
# print(max_row[['title_column', 'numeric_column']].iloc[0])

# # Find lowest value
# min_row = cultural_data[cultural_data['numeric_column'] == cultural_data['numeric_column'].min()]
# print("\nLowest value:")
# print(min_row[['title_column', 'numeric_column']].iloc[0])

print("Find the items with highest and lowest values in your dataset")
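
An alternative to comparing every row against the maximum is to ask pandas for the row label of the extreme value directly. The sketch below uses `idxmax()`, `idxmin()`, and `nlargest()` on a made-up museum table; the museum names and visitor figures are placeholders.

```python
import pandas as pd

# Made-up rows for illustration only
practice_data = pd.DataFrame({
    "museum": ["Museum A", "Museum B", "Museum C", "Museum D"],
    "visitors": [120000, 45000, 310000, 87000],
})

# idxmax()/idxmin() return the index label of the largest/smallest value
top_row = practice_data.loc[practice_data["visitors"].idxmax()]
bottom_row = practice_data.loc[practice_data["visitors"].idxmin()]

# nlargest() returns the top N rows, already sorted from highest to lowest
top_two = practice_data.nlargest(2, "visitors")

print(f"Most visited: {top_row['museum']} ({top_row['visitors']:,})")
print(f"Least visited: {bottom_row['museum']} ({bottom_row['visitors']:,})")
print(top_two)
```

Looking at the top few rows rather than a single winner often makes for a richer cultural discussion.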

## Part 8: Data Visualization

In [None]:
# Create a bar chart for categorical data
# Replace with your actual data

# category_counts = cultural_data['category_column'].value_counts().head(10)  # Top 10
# category_counts.plot(kind='bar', title='Distribution of Categories', figsize=(10, 6))
# plt.xlabel('Category')
# plt.ylabel('Count')
# plt.xticks(rotation=45)
# plt.tight_layout()
# plt.show()

print("Create a bar chart showing the distribution of a categorical variable")
print("This helps reveal which categories are most/least common in your cultural data")
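
When category names are long, a horizontal bar chart can be easier to read than a vertical one with rotated labels. The following is one possible variation, using made-up genre labels; it assumes matplotlib is installed, as it is in Colab.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up genre labels for illustration only
genres = pd.Series(
    ["Drama", "Comedy", "Drama", "Documentary", "Drama", "Comedy"],
    name="genre",
)

# Sorting ascending puts the most common category at the top of a horizontal chart
genre_counts = genres.value_counts().sort_values()

fig, ax = plt.subplots(figsize=(8, 4))
genre_counts.plot(kind="barh", ax=ax)
ax.set_title("Items per Genre (practice data)")
ax.set_xlabel("Count")
ax.set_ylabel("Genre")
fig.tight_layout()
plt.show()
```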

In [None]:
# Create a histogram for numeric data
# Replace with your actual numeric column

# cultural_data['numeric_column'].hist(bins=20, figsize=(8, 6))  # set the title with plt.title() below; hist() does not accept title=
# plt.xlabel('Value')
# plt.ylabel('Frequency')
# plt.title('Distribution of [Your Variable Name]')
# plt.show()

print("Create a histogram to show the distribution of a numeric variable")
print("This reveals patterns like: Are most values clustered? Are there outliers?")

In [None]:
# Create a scatter plot (if you have two numeric columns)
# Replace with your actual columns

# cultural_data.plot.scatter(x='numeric_column1', y='numeric_column2', 
#                           title='Relationship Between Variables', figsize=(8, 6))
# plt.xlabel('Variable 1')
# plt.ylabel('Variable 2')
# plt.show()

print("If you have two numeric columns, create a scatter plot")
print("This shows relationships: Do higher values in one variable relate to higher values in another?")

## Part 9: Cultural Analysis and Insights

In [None]:
# Combine what you've learned into a cultural insight
# Example analysis combining multiple findings:

# print("CULTURAL ANALYSIS SUMMARY")
# print("=" * 40)
# print(f"Dataset covers: [Your cultural domain]")
# print(f"Time period: [If applicable]")
# print(f"Geographic scope: [If applicable]")
# print(f"Most common category: [From your analysis]")
# print(f"Average [metric]: [From your calculations]")
# print(f"Notable outlier: [Interesting finding]")

print("Summarize your key findings about the cultural patterns in your data")

## Part 10: Save Your Work

In [None]:
# Save any filtered or processed data
# cultural_data.to_csv('my_analyzed_cultural_data.csv', index=False)
# print("✅ Saved your analyzed dataset")

print("Save your work if you've created any interesting filtered datasets")
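
A quick way to confirm that a save worked is to read the file straight back in and compare it with what you wrote. The sketch below does this round trip with a made-up table and a temporary folder so that it leaves nothing behind; in your own notebook you would save to a filename of your choice instead. The names `recent` and `recent_items.csv` are placeholders.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up rows for illustration only
practice_data = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "year": [1995, 2003, 2010],
})
recent = practice_data[practice_data["year"] > 2000]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "recent_items.csv"

    # index=False leaves out the row numbers so they do not become an extra column
    recent.to_csv(out_path, index=False)

    # Read the file back to check that it contains what we expect
    reloaded = pd.read_csv(out_path)

print(f"Saved and reloaded {len(reloaded)} rows")
print(reloaded)
```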

## Part 11: Critical Reflection and Discussion

### 🔍 Data Analysis Reflection:
**What surprised you most about your dataset?**


**What patterns or trends did you discover?**


**What limitations did you encounter with your data?**


### 🎨 Cultural Insights:
**What does your analysis reveal about the cultural domain you studied?**


**How might historical, social, or economic factors explain the patterns you found?**


**What stereotypes or assumptions does your data challenge or confirm?**


### 🤔 Critical Questions:
**Who or what might be missing from this dataset?**


**How might the data collection methods have influenced your findings?**


**What ethical considerations should researchers keep in mind when using this type of data?**


### 🚀 Future Research:
**What additional data would help you understand this cultural domain better?**


**What new questions has this analysis raised?**


**How could this analysis be useful for cultural institutions, policymakers, or communities?**


## 🎓 Congratulations!

You've completed your first cultural data analysis project! You've successfully:

### ✅ Technical Skills Mastered:
- Loaded and explored a real cultural dataset
- Applied fundamental pandas operations
- Created meaningful visualizations
- Identified patterns and trends in cultural data

### ✅ Critical Thinking Skills Developed:
- Questioned data sources and collection methods
- Recognized limitations and biases in cultural datasets
- Connected quantitative findings to cultural contexts
- Considered ethical implications of data analysis

### 🎯 Next Steps:
1. **Try different datasets** to practice these skills
2. **Learn advanced pandas techniques** for more complex analysis
3. **Explore specialized cultural data tools** for your research interests
4. **Share your findings** with classmates and get feedback

Remember: **Every dataset tells a story about culture and society.** The skills you've practiced here will help you become a more thoughtful and critical analyst of cultural information!

### 📚 Submit Your Work:
Make sure your notebook includes:
- [ ] Completed reflection questions
- [ ] Working code with your actual dataset
- [ ] At least one meaningful visualization
- [ ] Cultural insights and critical analysis
- [ ] Discussion of limitations and future research directions