# Vanilla WCI (accusation-count index)
This notebook builds a very simple 0–1 index based purely on total accusation counts per country from the wide survey file `data/wci_data.csv`.

Spec:
- Load the matrix (we treat the 25 nomination columns as the "matrix" of accusations).
- For each country i: A[i] = total number of times country i is nominated across all nomination columns, unweighted (each nomination counts once, whichever column it appears in).
- Let A_max = max_i A[i].
- WCI[i] = A[i] / A_max.
- WCI_abs[i] = A[i].
- Write results to `data/vanillawci.csv`.
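
As a quick illustration of the normalisation step in the spec, here is a minimal sketch on made-up counts. The country names and numbers are hypothetical and are not taken from the survey file.

```python
import pandas as pd

# Hypothetical accusation counts A[i]; not survey data
A = pd.Series({'Alpha': 8, 'Beta': 4, 'Gamma': 2}, name='A')

A_max = A.max()            # largest count across countries
toy = pd.DataFrame({
    'A': A,
    'WCI': A / A_max,      # scaled index; the top country gets exactly 1.0
    'WCI_abs': A,          # raw count, kept alongside the scaled index
})
print(toy)
```

Only countries that receive at least one nomination appear in the counts, so every listed country has a WCI strictly greater than 0.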


In [4]:
import pandas as pd
from pathlib import Path

DATA_DIR = Path('data')
RAW_PATH = DATA_DIR / 'wci_data.csv'
OUT_PATH = DATA_DIR / 'vanillawci.csv'

df = pd.read_csv(RAW_PATH)
try:
    from IPython.display import display  # type: ignore
except Exception:
    display = lambda x: print(x.head() if hasattr(x, 'head') else x)

print('Loaded:', RAW_PATH, 'shape=', df.shape)
display(df.head(3))


Loaded: data/wci_data.csv shape= (92, 108)


Unnamed: 0,ResponseID,Nationality,Residence,Technical1,Technical2,Technical3,Technical4,Technical5,Technical1_impact,Technical1_professional,...,Cash4_professional,Cash4_techskill,Cash5_impact,Cash5_professional,Cash5_techskill,Expert_crimetype,Expert_crimetype_other,Expert_region,Expert_region_other,Comments
0,R1,United Kingdom,United Kingdom,Ukraine,Russia,Brazil,Romania,Latvia,3,6,...,,,,,,Technical products / services,,No,,
1,R2,Australia,Prefer not to say,Russia,Ukraine,--,--,--,10,9,...,,,,,,"Technical products / services,Attacks and exto...",,No,,
2,R3,Australia,Australia,Russia,Ukraine,United States,--,--,8,9,...,5.0,5.0,,,,"Attacks and extortions,Data/identity theft,Cas...",,Yes (please list below),Asia Pacific,While the survey should capture the highlights...


## Identify nomination columns (unweighted accusations)
We count a nomination whenever a non-empty, non-"--" country appears in any of the 25 fields:
`Technical1..5`, `Attack1..5`, `Data1..5`, `Scams1..5`, `Cash1..5`.
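
To make the counting rule concrete, here is a minimal sketch on a tiny hand-made frame. The frame and its values are hypothetical; only the column-name pattern follows the survey file. It flattens the columns with `DataFrame.melt` and keeps cells that are neither missing, empty, nor `--`.

```python
import numpy as np
import pandas as pd

# Hypothetical two-respondent frame; values are invented for illustration
toy_df = pd.DataFrame({
    'Technical1': ['Russia', ' Ukraine '],
    'Technical2': ['--', 'Russia'],
    'Attack1': [np.nan, ''],
})

# One long column holding every nomination cell
cells = toy_df.melt(value_name='country')['country']
# Strip whitespace on real strings; leave missing cells as NaN
cells = cells.map(lambda v: v.strip() if isinstance(v, str) else v)
# A nomination is any cell that is present, non-empty and not "--"
is_nomination = cells.notna() & ~cells.isin(['', '--'])
toy_counts = cells[is_nomination].value_counts()
print(toy_counts.to_dict())
```

Of the six cells, three survive the filter: two for Russia and one for Ukraine.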


In [5]:
type_prefixes = ['Technical','Attack','Data','Scams','Cash']
nom_cols = [f"{p}{i}" for p in type_prefixes for i in range(1,6) if f"{p}{i}" in df.columns]
print('Nomination columns found (count):', len(nom_cols))
print(nom_cols[:10], '...')

if len(nom_cols) == 0:
    raise RuntimeError('No nomination columns found. Check the input file format.')

# Flatten all nominations into a single long column of country names.
# melt keeps missing cells as NaN; casting with astype(str) first would turn
# them into the literal string 'nan' and count it as a country.
stacked = df[nom_cols].melt(var_name='column', value_name='country')

# Strip whitespace on real strings only, then keep non-empty, non-"--" cells
stacked['country'] = stacked['country'].map(lambda v: v.strip() if isinstance(v, str) else v)
valid = stacked['country'].notna() & ~stacked['country'].isin(['', '--'])
valid_countries = stacked.loc[valid, 'country']

print('Total nominations (raw cells):', len(stacked))
print('Valid nominations (non-empty, not --):', valid_countries.shape[0])

# Count total accusations per country
counts = (
    valid_countries
    .value_counts()
    .rename_axis('Country')
    .reset_index(name='A')
)

if counts.empty:
    raise RuntimeError('No valid nominations found to count.')

A_max = counts['A'].max()
counts['WCI'] = counts['A'] / A_max
counts['WCI_abs'] = counts['A']

print('A_max =', A_max)

# Sort by WCI desc, then A desc, then Country asc for stability
counts = counts.sort_values(['WCI','A','Country'], ascending=[False, False, True]).reset_index(drop=True)

display(counts.head(10))


Nomination columns found (count): 25
['Technical1', 'Technical2', 'Technical3', 'Technical4', 'Technical5', 'Attack1', 'Attack2', 'Attack3', 'Attack4', 'Attack5'] ...
Total nominations (raw cells): 2300
Valid nominations (non-empty, not --): 1737
A_max = 304



|   | Country | A | WCI | WCI_abs |
|---|---|---|---|---|
| 0 | Russia | 304 | 1.0 | 304 |
| 1 | Ukraine | 202 | 0.664474 | 202 |
| 2 | China | 162 | 0.532895 | 162 |
| 3 | United States | 154 | 0.506579 | 154 |
| 4 | Nigeria | 143 | 0.470395 | 143 |
| 5 | Romania | 96 | 0.315789 | 96 |
| 6 | Korea, North | 64 | 0.210526 | 64 |
| 7 | Brazil | 63 | 0.207237 | 63 |
| 8 | United Kingdom | 57 | 0.1875 | 57 |
| 9 | India | 40 | 0.131579 | 40 |


## Save output
Write vanilla WCI to `data/vanillawci.csv`.


In [6]:
OUT_PATH.parent.mkdir(parents=True, exist_ok=True)
counts.to_csv(OUT_PATH, index=False)
print('Wrote', OUT_PATH.resolve())


Wrote /Users/user/codeprojects/wci/data/vanillawci.csv
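
A read-back check can catch a malformed output file early. The sketch below is one possible way to do it; `check_wci` is a hypothetical helper, not part of the notebook, and the sample frame stands in for the result of `pd.read_csv('data/vanillawci.csv')`.

```python
import pandas as pd

def check_wci(frame: pd.DataFrame) -> None:
    """Raise AssertionError if the frame violates the spec's invariants."""
    assert list(frame.columns) == ['Country', 'A', 'WCI', 'WCI_abs']
    assert frame['Country'].is_unique                 # one row per country
    assert (frame['WCI_abs'] == frame['A']).all()     # WCI_abs mirrors A
    assert frame['WCI'].max() == 1.0                  # top country scales to 1
    assert ((frame['WCI'] > 0) & (frame['WCI'] <= 1)).all()

# Hypothetical stand-in for the written CSV; values are invented
sample = pd.DataFrame({
    'Country': ['Alpha', 'Beta'],
    'A': [4, 1],
    'WCI': [1.0, 0.25],
    'WCI_abs': [4, 1],
})
check_wci(sample)
print('Output checks passed')
```

The checks mirror the spec at the top of the notebook: the column layout, `WCI_abs` equal to `A`, and a maximum `WCI` of exactly 1.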
