#### Part 24: Categorical Data (Continued) and Visualization in Pandas

In this notebook, we'll explore:
- More operations with categorical data
- Introduction to data visualization with pandas
- Various plot types and customization options

##### Setup
First, let's import the necessary libraries:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set the plotting style
plt.style.use('ggplot')

# Make plots appear in the notebook
%matplotlib inline

##### 1. Categorical Data (Continued)

### 1.1 String and Datetime Accessors with Categorical Data

The accessors `.str` and `.dt` work with categorical data if the categories are of the appropriate type:

In [None]:
# Create a string Series and convert to category
str_s = pd.Series(list('aabb'))
str_cat = str_s.astype('category')
print("Categorical string Series:")
print(str_cat)

# Use string accessor
print("\nUsing .str.contains():")
print(str_cat.str.contains("a"))

In [None]:
# Create a datetime Series and convert to category
date_s = pd.Series(pd.date_range('1/1/2015', periods=5))
date_cat = date_s.astype('category')
print("Categorical datetime Series:")
print(date_cat)

# Use datetime accessor
print("\nUsing .dt.day:")
print(date_cat.dt.day)

Note that a method called through an accessor on a categorical Series returns the same type as it would on a regular Series of that type, not a Series of dtype `category`:

In [None]:
# Compare the results of .str.contains() on both Series types
ret_s = str_s.str.contains("a")
ret_cat = str_cat.str.contains("a")

print(f"Same dtype: {ret_s.dtype == ret_cat.dtype}")
print("\nSame values:")
print(ret_s == ret_cat)
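
Because the accessor result is a regular Series, a categorical dtype has to be restored explicitly if it is still wanted. The following is a minimal sketch; the data and the names `colors`, `upper`, and `upper_cat` are hypothetical and not part of the notebook above.

```python
import pandas as pd

# Hypothetical example data, not from the notebook above
colors = pd.Series(["red", "blue", "red"], dtype="category")

# .str methods on a categorical Series return a non-categorical Series
upper = colors.str.upper()
print(isinstance(upper.dtype, pd.CategoricalDtype))  # False

# Convert back explicitly if a categorical result is needed
upper_cat = upper.astype("category")
print(list(upper_cat.cat.categories))
```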

### 1.2 Setting Values in Categorical Data

Setting values in a categorical column (or Series) works as long as the value is included in the categories:

In [None]:
# Create a DataFrame with a categorical column
idx = pd.Index(["h", "i", "j", "k", "l", "m", "n"])
cats = pd.Categorical(["a", "a", "a", "a", "a", "a", "a"], categories=["a", "b"])
values = [1, 1, 1, 1, 1, 1, 1]
df = pd.DataFrame({"cats": cats, "values": values}, index=idx)
print("Original DataFrame:")
print(df)

# Set values using .iloc
df.iloc[2:4, :] = [["b", 2], ["b", 2]]
print("\nAfter setting values:")
print(df)

If you try to set a value that is not in the categories, you'll get an error:

In [None]:
# Try to set a value that is not in the categories
# (recent pandas versions raise TypeError; older versions raised ValueError)
try:
    df.iloc[2:4, :] = [["c", 3], ["c", 3]]
except (TypeError, ValueError) as e:
    print(f"Error: {e}")
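
One way around this error is to register the new category before assigning it. The sketch below uses `Series.cat.add_categories` on a small hypothetical Series `s` rather than the DataFrame above:

```python
import pandas as pd

# Hypothetical Series with the same categories as the "cats" column
s = pd.Series(pd.Categorical(["a", "a", "a"], categories=["a", "b"]))

# Register the new category first; "c" is then a valid value
s = s.cat.add_categories(["c"])
s.iloc[1] = "c"

print(list(s))
print(list(s.cat.categories))
```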

Setting values by assigning categorical data will also check that the categories match:

In [None]:
# Set values using a categorical with matching categories
df.loc["j":"k", "cats"] = pd.Categorical(["a", "a"], categories=["a", "b"])
print("After setting with matching categories:")
print(df)

# Try to set values using a categorical with different categories
# (recent pandas versions raise TypeError; older versions raised ValueError)
try:
    df.loc["j":"k", "cats"] = pd.Categorical(["b", "b"], categories=["a", "b", "c"])
except (TypeError, ValueError) as e:
    print(f"\nError: {e}")
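
If the incoming categorical only differs by extra or reordered categories, it can be re-expressed with the target column's categories before assignment. This is a sketch under that assumption, using `Categorical.set_categories`; `df_demo`, `incoming`, and `aligned` are hypothetical names. Note that any incoming value absent from the target categories would become `NaN`.

```python
import pandas as pd

# Hypothetical DataFrame with a categorical column
df_demo = pd.DataFrame(
    {"cats": pd.Categorical(["a", "a", "a"], categories=["a", "b"])},
    index=["j", "k", "l"],
)

# Incoming data carries an extra, unused category "c"
incoming = pd.Categorical(["b", "b"], categories=["a", "b", "c"])

# Re-express the incoming values with the target column's categories
aligned = incoming.set_categories(df_demo["cats"].cat.categories)

# The categories now match, so the assignment is accepted
df_demo.loc["j":"k", "cats"] = aligned
print(df_demo)
```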

##### 2. Data Visualization with Pandas

pandas provides a high-level plotting interface built on matplotlib, exposed through the `.plot` accessor on Series and DataFrame. Let's explore some of its visualization capabilities.

### 2.1 Basic Plotting

Let's create a simple DataFrame and plot it:

In [None]:
# Create a DataFrame of four random walks indexed by date
dates = pd.date_range('1/1/2000', periods=1000)
df = pd.DataFrame(np.random.randn(1000, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df = df.cumsum()

# Plot the DataFrame
df.plot(figsize=(10, 6), title='Basic Time Series Plot')

You can also plot specific columns:

In [None]:
# Plot specific columns
df[['A', 'B']].plot(figsize=(10, 6), title='Columns A and B')
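
The `.plot()` call returns the matplotlib Axes it drew on, so further customization can go through the regular matplotlib API. A minimal sketch, with hypothetical sample data (`walk`) standing in for the DataFrame above:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data shaped like the DataFrame above
dates = pd.date_range("2000-01-01", periods=100)
walk = pd.DataFrame(
    np.random.randn(100, 2), index=dates, columns=["A", "B"]
).cumsum()

# .plot() returns the matplotlib Axes it drew on
ax = walk.plot(figsize=(10, 6), title="Columns A and B")

# Customize through the regular matplotlib Axes API
ax.set_xlabel("Date")
ax.set_ylabel("Cumulative sum")
ax.legend(loc="upper left")
```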

### 2.2 Other Plot Types

Pandas supports various plot types, which can be specified using the `kind` parameter or by using the `.plot.<kind>()` method. Some of the available plot types include:

- 'line' (default): Line plot
- 'bar' or 'barh': Bar plot (vertical or horizontal)
- 'hist': Histogram
- 'box': Box plot
- 'kde' or 'density': Kernel density estimate plot
- 'area': Area plot
- 'scatter': Scatter plot
- 'hexbin': Hexagonal bin plot
- 'pie': Pie plot
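
The two ways of choosing a plot type can be compared side by side. In this sketch the small DataFrame `sales` is hypothetical; both calls should produce the same bar chart:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data
sales = pd.DataFrame({"x": [1, 2, 3], "y": [3, 1, 2]}, index=["p", "q", "r"])

# Select the plot type with the kind parameter...
ax_kind = sales.plot(kind="bar", title="kind='bar'")

# ...or with the equivalent .plot.<kind>() method
ax_method = sales.plot.bar(title=".plot.bar()")

# Both calls draw one bar per cell (3 rows x 2 columns)
print(len(ax_kind.patches), len(ax_method.patches))
```

The method form has the practical advantage that editors can tab-complete the plot types and show each method's own docstring.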

Let's explore some of these plot types:

In [None]:
# Bar plot of a single row
plt.figure(figsize=(10, 6))
df.iloc[5].plot(kind='bar', title='Bar Plot of Row 5')
plt.axhline(0, color='k')  # Add a horizontal line at y=0

In [None]:
# Create a DataFrame for bar plot
df2 = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])

# Bar plot of the entire DataFrame
df2.plot.bar(figsize=(10, 6), title='Multiple Bar Plot')

In [None]:
# Stacked bar plot
df2.plot.bar(stacked=True, figsize=(10, 6), title='Stacked Bar Plot')

In [None]:
# Horizontal bar plot
df2.plot.barh(figsize=(10, 6), title='Horizontal Bar Plot')

### 2.3 Histograms and Density Plots

In [None]:
# Histogram
df['A'].plot.hist(bins=20, figsize=(10, 6), title='Histogram of Column A')

In [None]:
# Kernel Density Estimate (KDE) plot
df['A'].plot.kde(figsize=(10, 6), title='Density Plot of Column A')

### 2.4 Box Plots

In [None]:
# Box plot
df.plot.box(figsize=(10, 6), title='Box Plot')

### 2.5 Area Plots

In [None]:
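# Area plot (stacked by default, which requires each column to be all
# positive or all negative, so use the non-negative df2 from above)
df2.plot.area(figsize=(10, 6), title='Area Plot')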
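
Area plots are stacked by default, and stacking requires every column to be entirely positive or entirely negative. For data that mixes signs, passing `stacked=False` is one option. A minimal sketch, where the DataFrame `mixed` is hypothetical:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: the "swing" column mixes positive and negative values
mixed = pd.DataFrame({"up": [1.0, 2.0, 3.0], "swing": [1.0, -1.0, 2.0]})

# The default stacked=True rejects columns that mix signs
try:
    mixed.plot.area()
    stacked_ok = True
except ValueError as e:
    stacked_ok = False
    plt.close()  # discard the partially drawn figure
    print(f"Error: {e}")

# Unstacked areas are drawn independently of each other
ax = mixed.plot.area(stacked=False, title="Unstacked Area Plot")
```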

### 2.6 Scatter Plots

In [None]:
# Scatter plot
df.plot.scatter(x='A', y='B', figsize=(10, 6), title='Scatter Plot of A vs B')

In [None]:
# Scatter plot with color and size
# Marker sizes must be non-negative, so use the absolute value of column D
df.plot.scatter(x='A', y='B', c='C', s=df['D'].abs() * 10, figsize=(10, 6),
                title='Scatter Plot with Color and Size')

### 2.7 Pie Charts

In [None]:
# Create data for pie chart
pie_data = pd.Series(3 * np.random.rand(4), index=['a', 'b', 'c', 'd'], name='series')

# Pie chart
pie_data.plot.pie(figsize=(10, 6), autopct='%.2f', title='Pie Chart')

##### Summary

In this notebook, we've explored:

1. More operations with categorical data in pandas, including:
   - Using string and datetime accessors with categorical data
   - Setting values in categorical data

2. Data visualization with pandas, including:
   - Basic line plots
   - Bar plots (vertical, horizontal, and stacked)
   - Histograms and density plots
   - Box plots
   - Area plots
   - Scatter plots
   - Pie charts

These visualization capabilities make pandas a powerful tool for exploratory data analysis, allowing you to quickly visualize your data in various ways.