#### Pandas Part 70: Hexbin Plotting and Pandas Arrays

This notebook covers hexagonal binning plots and introduces pandas arrays and data types.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

##### 1. Hexbin Plotting

Hexagonal binning is a bivariate histogram: it divides the plane into hexagonal cells and colors each cell by the number of points that fall inside it. It is useful for visualizing the relationship between two continuous variables in large datasets, where a scatter plot would suffer from overplotting.
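
As a rough sketch of what the plot encodes, the cell below calls Matplotlib's `Axes.hexbin` directly (the function that `DataFrame.plot.hexbin` wraps) and reads back the per-hexagon counts. The seed, sample size, and variable names are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: a fixed seed is assumed here only for reproducibility
rng = np.random.default_rng(0)
n_points = 1000
x = rng.standard_normal(n_points)
y = rng.standard_normal(n_points)

fig, ax_demo = plt.subplots()
hb = ax_demo.hexbin(x, y, gridsize=20)  # returns a PolyCollection
counts = hb.get_array()                 # one value per drawn hexagon

# Every point lands in exactly one hexagon, so the counts add up to n_points
print(len(counts), counts.sum(), counts.max())
plt.close(fig)
```

The color of each hexagon in the plot is simply its entry in `counts` mapped through the colormap.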

In [None]:
# Generate random data from a normal distribution
n = 10000
df = pd.DataFrame({
    'x': np.random.randn(n),
    'y': np.random.randn(n)
})

# Display the first few rows
df.head()

In [None]:
# Create a hexbin plot with a coarser grid than the pandas default (gridsize=100)
ax = df.plot.hexbin(x='x', y='y', gridsize=20)
ax.set_title('Hexbin Plot (gridsize=20)')

### Hexbin with Custom Reduce Function

By default, each bin shows the number of points it contains. Passing a column as `C` attaches a value to each (x, y) point instead, and `reduce_C_function` (default `np.mean`) determines how the values that fall in the same hexagonal bin are aggregated.
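
To make the aggregation concrete, here is a minimal sketch that uses Matplotlib's `Axes.hexbin` directly and inspects the aggregated bin values. The data, seed, and names (`px`, `py`, `weights`) are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: each point carries an integer weight between 1 and 4
rng = np.random.default_rng(1)
n_points = 200
px = rng.uniform(-3, 3, size=n_points)
py = rng.uniform(30, 50, size=n_points)
weights = rng.integers(1, 5, size=n_points)

fig, ax_demo = plt.subplots()
hb_sum = ax_demo.hexbin(px, py, C=weights, reduce_C_function=np.sum, gridsize=5)
hb_max = ax_demo.hexbin(px, py, C=weights, reduce_C_function=np.max, gridsize=5)

bin_sums = hb_sum.get_array()   # sum of weights per non-empty hexagon
bin_maxes = hb_max.get_array()  # largest weight per non-empty hexagon

# Summing the per-bin sums recovers the total weight;
# the largest per-bin maximum is the overall maximum.
print(bin_sums.sum(), weights.sum())
print(bin_maxes.max(), weights.max())
plt.close(fig)
```

Any callable that reduces a list of values to a single number can be passed as `reduce_C_function`, for example `np.median` or `len`.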

In [None]:
# Generate data with observations
n = 500
df2 = pd.DataFrame({
    'coord_x': np.random.uniform(-3, 3, size=n),
    'coord_y': np.random.uniform(30, 50, size=n),
    'observations': np.random.randint(1, 5, size=n)
})

# Display the first few rows
df2.head()

In [None]:
# Create a hexbin plot with sum as the reduce function
ax = df2.plot.hexbin(
    x='coord_x',
    y='coord_y',
    C='observations',
    reduce_C_function=np.sum,
    gridsize=10,
    cmap="viridis"
)
plt.title('Hexbin Plot with Sum Reduce Function')

In [None]:
# Create a hexbin plot with mean as the reduce function (default)
ax = df2.plot.hexbin(
    x='coord_x',
    y='coord_y',
    C='observations',
    gridsize=10,
    cmap="plasma"
)
plt.title('Hexbin Plot with Mean Reduce Function (Default)')

In [None]:
# Create a hexbin plot with max as the reduce function
ax = df2.plot.hexbin(
    x='coord_x',
    y='coord_y',
    C='observations',
    reduce_C_function=np.max,
    gridsize=10,
    cmap="inferno"
)
plt.title('Hexbin Plot with Max Reduce Function')

### Customizing Hexbin Plots

In [None]:
# Compare three grid sizes side by side (larger gridsize = smaller hexagons)
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

df.plot.hexbin(x='x', y='y', gridsize=10, ax=axes[0])
axes[0].set_title('Gridsize = 10')

df.plot.hexbin(x='x', y='y', gridsize=20, ax=axes[1])
axes[1].set_title('Gridsize = 20')

df.plot.hexbin(x='x', y='y', gridsize=30, ax=axes[2])
axes[2].set_title('Gridsize = 30')

plt.tight_layout()

In [None]:
# Compare four colormaps on the same data in a 2x2 grid
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

df.plot.hexbin(x='x', y='y', gridsize=20, cmap='viridis', ax=axes[0, 0])
axes[0, 0].set_title('Colormap: viridis')

df.plot.hexbin(x='x', y='y', gridsize=20, cmap='plasma', ax=axes[0, 1])
axes[0, 1].set_title('Colormap: plasma')

df.plot.hexbin(x='x', y='y', gridsize=20, cmap='inferno', ax=axes[1, 0])
axes[1, 0].set_title('Colormap: inferno')

df.plot.hexbin(x='x', y='y', gridsize=20, cmap='magma', ax=axes[1, 1])
axes[1, 1].set_title('Colormap: magma')

plt.tight_layout()

##### 2. Pandas Arrays and Data Types

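pandas extends NumPy's type system with extension dtypes for data that NumPy does not handle natively, such as timezone-aware datetimes, periods, intervals, categoricals, sparse data, and integers, booleans, and strings with missing values. Let's explore some of these data types.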
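
A small sketch of why these extension types matter: with the default NumPy-backed dtype, a missing value forces integers to become floats, whereas the nullable `Int64` dtype keeps them as integers. The variable names below are illustrative.

```python
import pandas as pd

values = [1, 2, None, 4]

# Default inference: None becomes NaN, so the column is upcast to float64
as_default = pd.Series(values)

# Nullable extension dtype: integers stay integers, the gap becomes pd.NA
as_nullable = pd.Series(values, dtype="Int64")

print(as_default.dtype)
print(as_nullable.dtype)
print(as_nullable[2] is pd.NA)
```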

### DatetimeTZDtype - Timezone-aware Datetime

In [None]:
# Create a Series with timezone-aware datetime
dates = pd.date_range('2023-01-01', periods=5, tz='US/Eastern')
s_dates = pd.Series(dates)
s_dates

In [None]:
# Check the dtype
s_dates.dtype

### Timedelta Data

In [None]:
# Create a Series with timedelta data
td = pd.Series([pd.Timedelta(days=i) for i in range(5)])
td

In [None]:
# Check the dtype
td.dtype

### Period Data (Time Spans)

In [None]:
# Create a Series with period data
periods = pd.Series(pd.period_range('2023-01', periods=5, freq='M'))
periods

In [None]:
# Check the dtype
periods.dtype

### Interval Data

In [None]:
# Create a Series with interval data
intervals = pd.Series(pd.interval_range(start=0, end=5))
intervals

In [None]:
# Check the dtype
intervals.dtype

### Nullable Integer

In [None]:
# Create a Series with nullable integer data
int_with_na = pd.Series([1, 2, None, 4, 5], dtype='Int64')
int_with_na

In [None]:
# Check the dtype
int_with_na.dtype

### Categorical Data

In [None]:
# Create a Series with categorical data
cat = pd.Series(['a', 'b', 'c', 'a', 'b'], dtype='category')
cat

In [None]:
# Check the dtype
cat.dtype

### Sparse Data

In [None]:
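# Create a Series with sparse data
# Use an explicit integer subtype with fill_value=0 so the zeros are stored implicitly
sparse = pd.Series([0, 0, 1, 0, 0, 2, 0, 0, 0]).astype(pd.SparseDtype('int64', fill_value=0))
sparse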

In [None]:
# Check the dtype
sparse.dtype
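
As a hedged illustration of what sparse storage saves, the `.sparse` accessor reports how many values are physically stored. This sketch assumes an integer subtype with a fill value of 0; the names `dense` and `sparse_zero_fill` are illustrative.

```python
import pandas as pd

dense = pd.Series([0, 0, 1, 0, 0, 2, 0, 0, 0])

# Zeros are the fill value, so only the two non-zero entries are stored
sparse_zero_fill = dense.astype(pd.SparseDtype("int64", fill_value=0))

print(sparse_zero_fill.dtype)
print(sparse_zero_fill.sparse.fill_value)  # value that is not stored explicitly
print(sparse_zero_fill.sparse.npoints)     # number of explicitly stored values
print(sparse_zero_fill.sparse.density)     # stored values / total length
```

The lower the density, the more memory a sparse column saves compared with its dense equivalent.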

### String Data

In [None]:
# Create a Series with string data
strings = pd.Series(['a', 'b', None, 'd'], dtype='string')
strings

In [None]:
# Check the dtype
strings.dtype

### Boolean Data with Missing Values

In [None]:
# Create a Series with boolean data including missing values
bools = pd.Series([True, False, None, True], dtype='boolean')
bools

In [None]:
# Check the dtype
bools.dtype

##### 3. Using pandas.array() Function

The `pandas.array()` function creates a one-dimensional pandas extension array directly, without wrapping it in a Series. Pass `dtype` to choose the extension type explicitly.
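
When `dtype` is omitted, `pd.array()` tries to infer a nullable extension type from the data. The following is a minimal sketch, assuming a recent pandas version (inference for integers and booleans has been available since pandas 1.0); the variable names are illustrative.

```python
import pandas as pd

# No dtype given: pandas infers nullable extension types from the values
inferred_ints = pd.array([1, 2, None])
inferred_bools = pd.array([True, None, False])

print(inferred_ints.dtype)
print(inferred_bools.dtype)

# Missing entries are represented by pd.NA rather than NaN or None
print(inferred_ints[2] is pd.NA)
```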

In [None]:
# Create an integer array with missing values
int_array = pd.array([1, 2, None, 4, 5], dtype='Int64')
int_array

In [None]:
# Create a boolean array with missing values
bool_array = pd.array([True, False, None, True], dtype='boolean')
bool_array

In [None]:
# Create a string array
string_array = pd.array(['a', 'b', None, 'd'], dtype='string')
string_array

In [None]:
# Create a Series from the array
s = pd.Series(int_array)
s

In [None]:
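# Create a DataFrame with arrays
# (named df_arrays so it does not overwrite the hexbin DataFrame `df` above)
df_arrays = pd.DataFrame({
    'integers': int_array,
    'booleans': bool_array[:4].take([0, 1, 2, 3, 2], allow_fill=False),
    'strings': string_array.take([0, 1, 2, 3, 2], allow_fill=False)
})
df_arrays

In [None]:
# Check the dtypes
df_arrays.dtypes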