# Categorical vs. Categorical Data Analysis

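In categorical vs. categorical data analysis, you examine the relationship between two categorical variables. Below are common techniques for this kind of analysis, each with a short description and a code snippet. The snippets assume a pandas DataFrame `df` with two categorical columns named `Categorical_Variable_1` and `Categorical_Variable_2`.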

### 1. **Contingency Table (Cross-tabulation)**

**Description**: A contingency table shows the joint frequency distribution of two categorical variables: each cell counts the observations that fall into one combination of categories. It is the basis for the chi-square test and several of the plots below.

**Code**:
```python
import pandas as pd

# Create a contingency table
contingency_table = pd.crosstab(df['Categorical_Variable_1'], df['Categorical_Variable_2'])
print(contingency_table)
```
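To see what the table looks like, here is a self-contained sketch on a small hypothetical dataset. The DataFrame `toy_df`, its columns `smoker` and `exercise`, and their values are invented for illustration; `margins=True` is a real `pd.crosstab` option that appends an `All` row and column with the totals.

```python
import pandas as pd

# Hypothetical toy data: two categorical variables for 10 respondents
toy_df = pd.DataFrame({
    "smoker":   ["yes", "no", "no", "yes", "no", "no", "yes", "no", "no", "no"],
    "exercise": ["low", "high", "high", "low", "low", "high", "low", "high", "low", "high"],
})

# Rows are the categories of 'smoker', columns the categories of 'exercise';
# margins=True adds an 'All' row and column holding the totals
table = pd.crosstab(toy_df["smoker"], toy_df["exercise"], margins=True)
print(table)
```

Combinations that never occur (here, smokers with high exercise) appear as 0 rather than being dropped, so the table always covers every pairing of observed categories.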

### 2. **Chi-Square Test of Independence**

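**Description**: This test checks whether two categorical variables are independent by comparing the observed counts with the counts expected if they were. A small p-value (for example, below 0.05) is evidence of an association. The approximation becomes unreliable when expected counts are small; a common rule of thumb is at least 5 per cell.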

**Code**:
```python
from scipy.stats import chi2_contingency

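# Perform the Chi-Square test on the contingency table from section 1;
# the two ignored return values are the degrees of freedom and the expected counts
chi2, p, _, _ = chi2_contingency(contingency_table)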
print(f'Chi-Square Statistic: {chi2}')
print(f'P-Value: {p}')
```
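To make the statistic less of a black box, the sketch below recomputes it by hand for a hypothetical 2×3 table of counts (`observed` is invented data). Under independence, the expected count of a cell is its row total times its column total divided by the grand total; the statistic sums `(observed - expected)**2 / expected` over all cells, with `(rows - 1) * (columns - 1)` degrees of freedom. `chi2_contingency` also returns the degrees of freedom and the expected table, so both can be checked. `correction=False` turns off Yates' continuity correction, which SciPy applies only when there is a single degree of freedom (a 2×2 table); for this 2×3 table it changes nothing but keeps the comparison explicit.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts: 2 groups (rows) x 3 outcomes (columns)
observed = np.array([[20, 30, 50],
                     [30, 30, 40]])

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
n = observed.sum()
expected = row_totals * col_totals / n

# Pearson chi-square statistic and its degrees of freedom
chi2_manual = ((observed - expected) ** 2 / expected).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# SciPy computes the same statistic and also returns p-value, dof and expected counts
chi2, p, dof, expected_scipy = chi2_contingency(observed, correction=False)
print(f"manual chi2 = {chi2_manual:.4f}, scipy chi2 = {chi2:.4f}, dof = {dof}, p = {p:.4f}")
```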

### 3. **Stacked Bar Plot**

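**Description**: For each category of the first variable, this plot stacks the counts of the second variable's categories in a single bar, so you can read both the group totals and how each group breaks down.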

**Code**:
```python
import seaborn as sns
import matplotlib.pyplot as plt

# Plot a stacked bar plot
pd.crosstab(df['Categorical_Variable_1'], df['Categorical_Variable_2']).plot(kind='bar', stacked=True)
plt.title('Stacked Bar Plot of Categorical Variables')
plt.xlabel('Categorical Variable 1')
plt.ylabel('Count')
plt.show()
```

### 4. **Mosaic Plot**

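**Description**: A mosaic plot draws each cell of the contingency table as a rectangle whose area is proportional to its count. The plot is split first by the first variable and then by the second, so tile heights that differ from one column to the next suggest an association.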

**Code**:
```python
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

# Plot a mosaic plot
mosaic(df, ['Categorical_Variable_1', 'Categorical_Variable_2'])
plt.title('Mosaic Plot of Categorical Variables')
plt.show()
```

### 5. **Clustered Bar Plot**

**Description**: This plot places one bar per category of the second variable side by side within each category of the first, making it easy to compare counts both within and across groups.

**Code**:
```python
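import seaborn as sns
import matplotlib.pyplot as plt

# Generate a clustered bar plot: one bar per category of variable 2 within each category of variable 1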
sns.countplot(data=df, x='Categorical_Variable_1', hue='Categorical_Variable_2')
plt.title('Clustered Bar Plot of Categorical Variables')
plt.xlabel('Categorical Variable 1')
plt.ylabel('Count')
plt.show()
```

### 6. **Heatmap of Contingency Table**

**Description**: A heatmap color-codes each cell of the contingency table by its count, so unusually large or small combinations stand out at a glance.

**Code**:
```python
import seaborn as sns
import matplotlib.pyplot as plt

# Plot a heatmap; annot=True writes each count in its cell, and fmt='d' formats it as an integer
sns.heatmap(contingency_table, annot=True, cmap='Blues', fmt='d')
plt.title('Heatmap of Contingency Table')
plt.xlabel('Categorical Variable 2')
plt.ylabel('Categorical Variable 1')
plt.show()
```

### 7. **Relative Frequency Table**

**Description**: This table expresses the counts as proportions instead of raw frequencies. With `normalize='all'`, each cell is that combination's share of all observations, so the whole table sums to 1.

**Code**:
```python
# Create a relative frequency table
relative_freq_table = pd.crosstab(df['Categorical_Variable_1'], df['Categorical_Variable_2'], normalize='all')
print(relative_freq_table)
```
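`pd.crosstab` also accepts `normalize='index'` (each row sums to 1) and `normalize='columns'` (each column sums to 1), and each option answers a different question. A minimal sketch on a hypothetical four-row dataset (`toy_df`, `group`, and `outcome` are invented names) compares the three:

```python
import pandas as pd

# Hypothetical data: 4 observations of two categorical variables
toy_df = pd.DataFrame({
    "group":   ["a", "a", "a", "b"],
    "outcome": ["x", "x", "y", "y"],
})

# Share of all observations in each cell (the whole table sums to 1)
joint = pd.crosstab(toy_df["group"], toy_df["outcome"], normalize="all")
# Distribution of 'outcome' within each 'group' (each row sums to 1)
by_row = pd.crosstab(toy_df["group"], toy_df["outcome"], normalize="index")
# Distribution of 'group' within each 'outcome' (each column sums to 1)
by_col = pd.crosstab(toy_df["group"], toy_df["outcome"], normalize="columns")

print(joint, by_row, by_col, sep="\n\n")
```

For instance, the cell (`a`, `x`) is 0.5 of all observations, 2/3 of group `a`, and all of outcome `x`.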

### 8. **Bar Plot of Proportions**

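**Description**: This "100% stacked" bar plot uses the row-normalized table (`normalize='index'`), so every bar has height 1 and shows the distribution of the second variable within one category of the first. Unlike the count-based stacked bar plot, it lets you compare compositions across groups of very different sizes.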

**Code**:
```python
# Calculate proportions
prop_table = pd.crosstab(df['Categorical_Variable_1'], df['Categorical_Variable_2'], normalize='index')

# Plot proportions
prop_table.plot(kind='bar', stacked=True)
plt.title('Bar Plot of Proportions')
plt.xlabel('Categorical Variable 1')
plt.ylabel('Proportion')
plt.show()
```
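To see exactly what each bar segment represents, the sketch below uses hypothetical data (`toy_df`, with a `big` group of 8 observations and a `small` group of 2) and checks that `normalize='index'` is the same as dividing each row of the count table by its row total:

```python
import pandas as pd

# Hypothetical data: group 'big' has 8 observations, group 'small' only 2
toy_df = pd.DataFrame({
    "group":   ["big"] * 8 + ["small"] * 2,
    "outcome": ["x"] * 6 + ["y"] * 2 + ["x", "y"],
})

counts = pd.crosstab(toy_df["group"], toy_df["outcome"])

# Divide each row by its own total: these are the segment heights in the plot
manual = counts.div(counts.sum(axis=1), axis=0)
prop_table = pd.crosstab(toy_df["group"], toy_df["outcome"], normalize="index")

print(counts, prop_table, sep="\n\n")
```

Both groups end up with bars of height 1 (0.75/0.25 for `big`, 0.5/0.5 for `small`), even though one group is four times larger, so keep the raw counts in mind when a small group looks very different.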

Together, the tables and the chi-square test tell you whether two categorical variables are associated, while the plots show which combinations of categories account for the pattern.