<a href="https://colab.research.google.com/github/componavt/LLLE-R1900s/blob/main/src/visualization/grouped_bar_chart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# üìä Grouped Bar Chart: Annual Loan Volumes & Counts by Credit Type

Visualizes **annual dynamics** of selected credit categories across all settlements.

**Top bars (‚Üë)** = total loan amount (in **thousands of rubles**)  
**Bottom bars (‚Üì)** = total number of loans  

Each selected credit type is shown with:
- **Unique color**
- **Consistent style** for both amount and count (same color for ‚Üë and ‚Üì)

‚ú® **Features:**
- üåç Fully bilingual: toggle between Russian (`ru`) and English (`en`)
- üìÖ X-axis = years (e.g., 1913‚Äì1917)
- üé® Configurable credit types via `SELECTED_CREDIT_ITEMS`
- üìà Dual-axis: positive (amount) and negative (count) bars share color per category
- üñºÔ∏è Rendered directly in Google Colab

‚öôÔ∏è Configure via:
- `USE_LANGUAGE` ‚Äî `"ru"` or `"en"`
- `SELECTED_CREDIT_ITEMS` ‚Äî list of `Name` values from `credit_items.csv`

‚öôÔ∏è [2] Install Dependencies & Set Language + Credit Selection

In [1]:
# Install compatible versions for static image rendering in Colab
!pip install -q python-dotenv pandas plotly

# Clone the repo if running in Colab (optional ‚Äî if data not uploaded manually)
import os

if not os.path.exists('LLLE-R1900s'):
    !git clone https://github.com/componavt/LLLE-R1900s.git
    %cd LLLE-R1900s
else:
    %cd LLLE-R1900s

# === LANGUAGE SWITCH ===
USE_LANGUAGE = "ru"
assert USE_LANGUAGE in ("ru", "en"), "USE_LANGUAGE must be 'ru' or 'en'"

# === SELECT CREDIT TYPES TO VISUALIZE ===
# Must use values from the 'Name' column in credit_items.csv
SELECTED_CREDIT_ITEMS = ["Migration", "CraftMaterials", "CraftTools"]

Cloning into 'LLLE-R1900s'...
remote: Enumerating objects: 561, done.[K
remote: Counting objects: 100% (561/561), done.[K
remote: Compressing objects: 100% (259/259), done.[K
remote: Total 561 (delta 418), reused 400 (delta 296), pack-reused 0 (from 0)[K
Receiving objects: 100% (561/561), 1.41 MiB | 6.51 MiB/s, done.
Resolving deltas: 100% (418/418), done.
/content/LLLE-R1900s


üì• [3] Load & Prepare Data

In [2]:
import os
import pandas as pd
from dotenv import load_dotenv

# Validate SELECTED_CREDIT_ITEMS early (now that pandas is available)
valid_names = set(pd.read_csv('data/credit_items.csv')['Name'])
invalid = set(SELECTED_CREDIT_ITEMS) - valid_names
if invalid:
    raise ValueError(f"Invalid credit item(s): {invalid}. Must be from 'Name' column in credit_items.csv.")

# Load configuration
load_dotenv('config.env')

# Paths
csv_out_dir = os.getenv('CSV_OUT_DIR', 'data/csv_out')
output_file_name = os.getenv('OUTPUT_CSV_FILE')

if not output_file_name:
    csv_files = [f for f in os.listdir(csv_out_dir) if f.endswith('.csv')]
    if not csv_files:
        raise FileNotFoundError("No CSV files found in the output directory.")
    output_file_name = csv_files[0]

csv_path = os.path.join(csv_out_dir, output_file_name)
print(f"Loading loan data from: {csv_path}")

# Load main loan data
df_loans = pd.read_csv(csv_path)
print(f"Loaded {len(df_loans)} loan records.")

# Load credit items
df_credit = pd.read_csv('data/credit_items.csv')
print(f"Loaded {len(df_credit)} credit item definitions.")

# --- Build credit item display label map ---
if USE_LANGUAGE == "ru":
    df_credit['display_label'] = df_credit['loan_short_ru']
else:
    df_credit['display_label'] = df_credit['Name']

label_map = dict(zip(df_credit['Name'], df_credit['display_label']))

# Filter loans to only selected credit items
df_loans = df_loans[df_loans['credit_item'].isin(SELECTED_CREDIT_ITEMS)].copy()

# Ensure numeric types
df_loans['amount_rubles'] = pd.to_numeric(df_loans['amount_rubles'], errors='coerce').fillna(0)
df_loans['loan_count'] = pd.to_numeric(df_loans['loan_count'], errors='coerce').fillna(0).astype(int)

print(f"Filtered to {len(df_loans)} records for selected credit types: {SELECTED_CREDIT_ITEMS}")

Loading loan data from: data/csv_out/loans_s28_i21.csv
Loaded 1768 loan records.
Loaded 21 credit item definitions.
Filtered to 264 records for selected credit types: ['Migration', 'CraftMaterials', 'CraftTools']


üîç [4] Calculate Axis Extents and Prepare Annotations

In [3]:
# Aggregate by year and credit_item
df_annual = df_loans.groupby(['year', 'credit_item']).agg(
    total_amount=('amount_rubles', 'sum'),
    total_count=('loan_count', 'sum')
).reset_index()

df_annual['total_amount_k'] = df_annual['total_amount'] / 1000.0

years = sorted(df_annual['year'].unique())
credit_items = SELECTED_CREDIT_ITEMS

max_amount = df_annual['total_amount_k'].max() if not df_annual.empty else 1
max_count = df_annual['total_count'].max() if not df_annual.empty else 1

print(f"üìà Max loan amount (thsd rub): {max_amount:.2f}")
print(f"üìä Max loan count: {max_count}")

# --- Normalize both series to [-1, 1] range for visual balance ---
# We'll map [0, max_amount] ‚Üí [0, 1] and [0, max_count] ‚Üí [0, -1]
amount_norm = {}
count_norm = {}
amount_annotations = {}
count_annotations = {}

CREDIT_SYMBOLS = {
    "Migration": "üö∂",
    "CraftMaterials": "üßµ",
    "CraftTools": "üî®"
}

for item in credit_items:
    subset = df_annual[df_annual['credit_item'] == item]
    amount_dict = dict(zip(subset['year'], subset['total_amount_k']))
    count_dict = dict(zip(subset['year'], subset['total_count']))

    raw_amounts = [amount_dict.get(y, 0) for y in years]
    raw_counts = [count_dict.get(y, 0) for y in years]

    # Normalize
    norm_amounts = [a / max_amount if max_amount > 0 else 0 for a in raw_amounts]
    norm_counts = [-c / max_count if max_count > 0 else 0 for c in raw_counts]  # negative direction

    amount_norm[item] = norm_amounts
    count_norm[item] = norm_counts

    # Annotations show REAL values (not normalized)
    amount_annotations[item] = [f"{a:.1f}" if a > 0 else "" for a in raw_amounts]
    count_annotations[item] = [
        f"{CREDIT_SYMBOLS.get(item, '')}\n{int(c)}" if c > 0 else ""
        for c in raw_counts
    ]

# Store for cell 5
PLOT_DATA = {
    'years': years,
    'credit_items': credit_items,
    'amount_norm': amount_norm,
    'count_norm': count_norm,
    'amount_annotations': amount_annotations,
    'count_annotations': count_annotations,
    'max_amount': max_amount,
    'max_count': max_count
}

üìà Max loan amount (thsd rub): 9.56
üìä Max loan count: 249


üìä [5] Grouped Bar Chart: Annual Amount (‚Üë) and Count (‚Üì) by Credit Type
(Balanced Visual Scales and Custom Y-axis Labels)

In [5]:
import plotly
import plotly.graph_objects as go
import os

years = PLOT_DATA['years']
credit_items = PLOT_DATA['credit_items']
amount_norm = PLOT_DATA['amount_norm']
count_norm = PLOT_DATA['count_norm']
amount_annotations = PLOT_DATA['amount_annotations']
count_annotations = PLOT_DATA['count_annotations']
max_amount = PLOT_DATA['max_amount']
max_count = PLOT_DATA['max_count']

# Unicode symbols for legend
CREDIT_SYMBOLS = {
    "Migration": "üö∂",
    "CraftMaterials": "üßµ",
    "CraftTools": "üî®"
}

fig = go.Figure()

colors = plotly.colors.qualitative.Plotly

# Add traces (using normalized Y values)
for i, item in enumerate(credit_items):
    color = colors[i % len(colors)]
    base_label = label_map[item]
    symbol = CREDIT_SYMBOLS.get(item, "")
    legend_label = f"{base_label} {symbol}".strip()

    # Amount (normalized upward)
    fig.add_trace(go.Bar(
        x=years,
        y=amount_norm[item],
        name=legend_label,
        marker_color=color,
        offsetgroup=i,
        legendgroup=item,
        text=amount_annotations[item],
        textposition='outside',
        textfont=dict(size=10),
        hovertemplate=(
            (f"<b>{base_label}</b><br>Year: %{{x}}<br>Amount: %{{text}}k rub" if USE_LANGUAGE == "en"
             else f"<b>{base_label}</b><br>–ì–æ–¥: %{{x}}<br>–°—É–º–º–∞: %{{text}} —Ç—ã—Å. —Ä—É–±.")
            + "<extra></extra>"
        )
    ))

    # Count (normalized downward)
    fig.add_trace(go.Bar(
        x=years,
        y=count_norm[item],
        name=legend_label,
        marker_color=color,
        opacity=0.75,
        offsetgroup=i,
        legendgroup=item,
        showlegend=False,
        text=count_annotations[item],
        textposition='outside',
        textfont=dict(size=10),
        hovertemplate=(
            (f"<b>{base_label}</b><br>Year: %{{x}}<br>Loans: %{{text.split()[-1]}}" if USE_LANGUAGE == "en"
             else f"<b>{base_label}</b><br>–ì–æ–¥: %{{x}}<br>–°—Å—É–¥: %{{text.split()[-1]}}")
            + "<extra></extra>"
        )
    ))

# --- Custom Y-axis tick labels ---
n_ticks_top = 5
tickvals_top = [i / (n_ticks_top - 1) for i in range(n_ticks_top)]
ticktext_top = [f"{(i / (n_ticks_top - 1)) * max_amount:.1f}" for i in range(n_ticks_top)]

n_ticks_bottom = 5
tickvals_bottom = [-i / (n_ticks_bottom - 1) for i in range(n_ticks_bottom)]
ticktext_bottom = [f"{int((i / (n_ticks_bottom - 1)) * max_count)}" for i in range(n_ticks_bottom)]

tickvals = tickvals_bottom[::-1][:-1] + tickvals_top
ticktext = ticktext_bottom[::-1][:-1] + ticktext_top

# Update layout
if USE_LANGUAGE == "en":
    yaxis_title = "Amount (thsd rub) ‚Üë / Number of Loans ‚Üì"
    title = "Annual Loan Volume by Credit Type"
    legend_title_text = "Credit Type"
else:
    yaxis_title = "–°—É–º–º–∞ (—Ç—ã—Å. —Ä—É–±.) ‚Üë / –ß–∏—Å–ª–æ —Å—Å—É–¥ ‚Üì"
    title = "–ì–æ–¥–æ–≤–æ–π –æ–±—ä—ë–º —Å—Å—É–¥ –ø–æ –∫–∞—Ç–µ–≥–æ—Ä–∏—è–º"
    legend_title_text = "–¢–∏–ø —Å—Å—É–¥—ã"  # ‚úÖ –ò—Å–ø—Ä–∞–≤–ª–µ–Ω–æ

fig.update_layout(
    title=title,
    barmode='group',
    xaxis=dict(
        title="Year" if USE_LANGUAGE == "en" else "–ì–æ–¥",
        tickmode='linear'
    ),
    yaxis=dict(
        title=yaxis_title,
        tickmode='array',
        tickvals=tickvals,
        ticktext=ticktext,
        range=[-1.1, 1.1],
        zeroline=True,
        zerolinewidth=2,
        zerolinecolor='black'
    ),
    legend=dict(
        title=legend_title_text,
        traceorder="normal"
    ),
    height=650,
    font=dict(size=12)
)

# Show in Colab
fig.show(renderer="colab")

# --- Save to file ---
output_dir = "figures"
os.makedirs(output_dir, exist_ok=True)

# Generate filename based on selected items and language
items_str = "_".join(SELECTED_CREDIT_ITEMS)
lang_suffix = "en" if USE_LANGUAGE == "en" else "ru"
filename = f"grouped_bar_chart_{items_str}_{lang_suffix}.png"
filepath = os.path.join(output_dir, filename)

# Save as PNG (requires kaleido)
try:
    fig.write_image(filepath, width=1200, height=800, scale=2)
    print(f"\n‚úÖ Chart saved as: {filepath}")
except Exception as e:
    print(f"\n‚ö†Ô∏è Could not save image (kaleido may not be installed): {e}")
    print("To enable saving, run: !pip install kaleido")

print(f"\n‚úÖ Displayed balanced grouped bar chart for {len(credit_items)} credit types.")


‚ö†Ô∏è Could not save image (kaleido may not be installed): 
Image export using the "kaleido" engine requires the kaleido package,
which can be installed using pip:
    $ pip install -U kaleido

To enable saving, run: !pip install kaleido

‚úÖ Displayed balanced grouped bar chart for 3 credit types.


üåü [4] Individual Settlement Charts

üìì –Ø—á–µ–π–∫–∞ 4: –û—Ç–¥–µ–ª—å–Ω—ã–µ –¥–∏–∞–≥—Ä–∞–º–º–∞ –ø–æ –≤—Å–µ–º –ø–æ—Å–µ–ª–µ–Ω–∏—è–º (–µ—â—ë –∫–∞—Ä—Ç—É –±—ã...)