# Togo Active Learning Experiment Runner

This notebook manages the execution of sampling experiments on the Togo soil fertility dataset.

Author: Livia Betti  
Date: July 2025

### Before running this notebook, complete the following tasks:
1. Generate relevant groups in Togo. I have generated group assignments based on regions; if other representative groupings would be useful, we can generate those as well.
2. Create initial samples for the cluster sampling and convenience sampling settings.
3. Assign (distance-based) costs for the convenience sampling setting.
4. (Optional) Run regressions on the initial sample without augmentation. This shows which initial samples are worth augmenting (ideally, ones that yield a small, positive R² score). If this step is skipped, the user should choose an initial sample directly.
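Step 4 can be sketched in NumPy with closed-form ridge regression. The run below uses a `RIDGE.yaml` config, but this is not the repository's training code: `ridge_fit_predict`, `r2_score`, and `screen_initial_samples` are hypothetical helpers, and the rule "keep samples with 0 < R² < `max_r2`" with `max_r2=0.3` is an illustrative assumption for "small, positive".

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept (via centering)."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    yc = y_train - y_mean
    d = X_train.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weights.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ yc)
    return (X_test - x_mean) @ w + y_mean

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def screen_initial_samples(samples, X_test, y_test, alpha=1.0, max_r2=0.3):
    """Return {name: R^2} for samples whose test R^2 is positive but below max_r2.

    `samples` maps a sample name to its (features, labels) pair.
    """
    kept = {}
    for name, (X_s, y_s) in samples.items():
        r2 = r2_score(y_test, ridge_fit_predict(X_s, y_s, X_test, alpha))
        if 0.0 < r2 < max_r2:
            kept[name] = r2
    return kept
```

Usage would look like `screen_initial_samples({"sample_1": (X1, y1)}, X_test, y_test)`, with features and labels coming from the project's own data loading, which this notebook does not show.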

## Imports

In [34]:
import os
import sys

RUN_DIR = "/home/libe2152/optimizedsampling/3_sampling/tools"

PROJECT_ROOT = os.path.abspath(os.path.join(RUN_DIR, ".."))
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

# Change the current working directory to tools/ so train.py can be run by its relative path
os.chdir(RUN_DIR)

print("CWD now:", os.getcwd())
print("sys.path[0]:", sys.path[0])


CWD now: /home/libe2152/optimizedsampling/3_sampling/tools
sys.path[0]: /home/libe2152/optimizedsampling/3_sampling
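The setup cell hard-codes absolute paths under `/home/libe2152/`, and `os.chdir` raises `FileNotFoundError` if `RUN_DIR` is missing; the cells below reference several more files. A minimal sketch for checking every configured path at once before launching a run (`missing_paths` and `require_paths` are hypothetical helpers, not part of the repository):

```python
from pathlib import Path
from typing import Dict, List

def missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the names of entries whose path does not exist on disk."""
    return [name for name, p in paths.items() if not Path(p).exists()]

def require_paths(paths: Dict[str, str]) -> None:
    """Raise FileNotFoundError listing every missing path, not just the first one."""
    missing = missing_paths(paths)
    if missing:
        details = ", ".join(f"{name}={paths[name]}" for name in missing)
        raise FileNotFoundError(f"missing paths: {details}")
```

After the config cells have run, a call such as `require_paths({"RUN_DIR": RUN_DIR, "cfg": cfg, "id_path": id_path, "group_assignment_path": group_assignment_path})` reports all missing files in one error.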


## Core config values

In [35]:
# Core file paths and experiment identifiers
cfg = "/home/libe2152/optimizedsampling/3_sampling/configs/togo/RIDGE.yaml"
script = "train.py"
seed = 42

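# Label for the initial sample; it also appears in the output directory path
# (see the log file path printed by the run cell). Keep it consistent with id_path below.
initial_set_str = "cluster_sampling/10_state_county_desired_10ppc_100_size"
exp_init_name = "cluster_sampling_10_state_county_desired_10ppc_100_size"
# exp_name hard-codes the cost, method, and budget settings from the cells below; update it when they change.
exp_name = f"togo_{exp_init_name}_cost_cluster_based_c1_10_c2_15_method_poprisk_regions_budget_50_seed_{seed}"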

id_path = f"/home/libe2152/optimizedsampling/0_data/initial_samples/togo/cluster_sampling/randomstrata/sample_1_region_prefecture_desired_10ppc_1000_size_seed_{seed}.pkl"
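Because `exp_name` repeats values that are set in later cells, it can drift out of sync with them. A sketch of a hypothetical `build_exp_name` helper that reproduces the current naming pattern from its parts; it assumes `c1` and `c2` are the in-region and out-of-region unit costs (the values 10 and 15 match), and it would be called once the cost and method cells have run.

```python
def build_exp_name(initial_set_str, cost_func, in_region_unit_cost,
                   out_of_region_unit_cost, sampling_fn, group_type,
                   budget, seed, dataset="togo"):
    """Rebuild the experiment name from its parts (hypothetical helper)."""
    # The initial-set label uses "/" as a separator; the experiment name uses "_".
    exp_init_name = initial_set_str.replace("/", "_")
    # Assumption: c1/c2 are the in-region and out-of-region unit costs.
    return (
        f"{dataset}_{exp_init_name}"
        f"_cost_{cost_func}_c1_{in_region_unit_cost}_c2_{out_of_region_unit_cost}"
        f"_method_{sampling_fn}_{group_type}"
        f"_budget_{budget}_seed_{seed}"
    )
```

For example, `build_exp_name(initial_set_str, cost_func, in_region_unit_cost, out_of_region_unit_cost, sampling_fn, group_type, budget, seed)` returns the same string the config cell currently hard-codes.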

## Cost-related arguments

In [36]:
cost_func = "cluster_based"
cost_name = "region_aware_unit_cost"

# Optional: these arguments are specific to the region-aware cost
unit_assignment_path = "/home/libe2152/optimizedsampling/0_data/groups/togo/ea_assignments_dict.pkl"
unit_type = "cluster"
points_per_unit = 10

region_assignment_path = "/home/libe2152/optimizedsampling/0_data/groups/togo/prefecture_assignment.pkl"
in_region_unit_cost = 10
out_of_region_unit_cost = 15
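One reading of the region-aware unit cost, stated as an assumption: each sampling unit (a cluster, per `unit_type`) is mapped to a region, units in a designated set of "in" regions cost `in_region_unit_cost`, and all others cost `out_of_region_unit_cost`. How `train.py` decides which regions count as "in" is not shown here, so `home_regions` is a hypothetical input, and the cluster IDs in the demo are made up.

```python
from typing import Dict, Iterable, Set

def unit_costs(unit_to_region: Dict[str, str], home_regions: Set[str],
               in_cost: float = 10, out_cost: float = 15) -> Dict[str, float]:
    """Map each unit to in_cost if its region is in home_regions, else out_cost."""
    return {unit: (in_cost if region in home_regions else out_cost)
            for unit, region in unit_to_region.items()}

def total_cost(selected: Iterable[str], costs: Dict[str, float]) -> float:
    """Sum the unit costs of the selected units."""
    return sum(costs[unit] for unit in selected)

def fits_budget(selected: Iterable[str], costs: Dict[str, float],
                budget: float = 50) -> bool:
    """True if the selected units can be bought within the budget."""
    return total_cost(selected, costs) <= budget

# Demo: hypothetical cluster IDs mapped to Togo regions.
unit_to_region = {"ea_1": "Maritime", "ea_2": "Maritime", "ea_3": "Kara"}
costs = unit_costs(unit_to_region, home_regions={"Maritime"})
selection = ["ea_1", "ea_2", "ea_3"]
print(costs, total_cost(selection, costs), fits_budget(selection, costs))
```

Under these assumptions, a budget of 50 buys at most five in-region clusters (5 × 10 = 50) or three out-of-region clusters (3 × 15 = 45).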


## Method-related arguments

In [37]:
sampling_fn = "poprisk"
budget = 50

group_assignment_path = "/home/libe2152/optimizedsampling/0_data/groups/togo/region_assignment.pkl"
group_type = "regions"

util_lambda = 0.5  # poprisk-specific
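Before a run, it can help to confirm that `group_assignment_path` loads and that every group has points. A minimal sketch, assuming the pickle holds a plain dict mapping sample IDs to group labels (the actual structure of `region_assignment.pkl` is not shown in this notebook); the demo uses Togo region names with made-up IDs. As with any pickle, only load files you trust.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path

def load_group_assignment(path):
    """Load a pickled {sample_id: group_label} dict (assumed structure)."""
    with open(path, "rb") as f:
        assignment = pickle.load(f)
    if not isinstance(assignment, dict):
        raise TypeError(f"expected a dict, got {type(assignment).__name__}")
    return assignment

def group_sizes(assignment):
    """Count how many samples fall in each group."""
    return Counter(assignment.values())

# Demo on a throwaway file with made-up IDs.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_assignment.pkl"
    with open(demo_path, "wb") as f:
        pickle.dump({"id_1": "Kara", "id_2": "Kara", "id_3": "Savanes"}, f)
    demo_sizes = group_sizes(load_group_assignment(demo_path))

print(demo_sizes)
```

On the real file, the check would be `group_sizes(load_group_assignment(group_assignment_path))`, provided the assumed dict structure holds.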

### Arguments for similarity/diversity methods

In [38]:
# Not passed to train.py in the run cell; only needed for similarity/diversity methods
similarity_matrix_path = "/home/libe2152/optimizedsampling/0_data/cosine_similarity/togo/cosine_similarity_train_test.npz"
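The file name suggests a cosine-similarity matrix between train and test points stored as `.npz`. A minimal NumPy sketch of how such a matrix can be computed, saved, and reloaded; the helper names and the array key `"sim"` are hypothetical, so check `np.load(path).files` for the key the real file uses.

```python
import numpy as np

def cosine_similarity_matrix(A, B, eps=1e-12):
    """Pairwise cosine similarity between rows of A (n x d) and rows of B (m x d)."""
    # Normalize each row to unit length; eps guards against all-zero rows.
    A_n = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), eps)
    B_n = B / np.maximum(np.linalg.norm(B, axis=1, keepdims=True), eps)
    return A_n @ B_n.T  # shape (n, m)

def save_similarity(path, sim):
    """Store the matrix in a compressed .npz under the (assumed) key "sim"."""
    np.savez_compressed(path, sim=sim)

def load_similarity(path, key="sim"):
    """Read one array back; the NpzFile is closed when the block exits."""
    with np.load(path) as data:
        return data[key]
```

Row-normalizing once and taking a single matrix product avoids a Python loop over all train/test pairs.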

## Run
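The run cell joins every flag into one shell string with backslash continuations and runs it with IPython's `!` syntax. An alternative sketch builds the command as an argument list and passes it to `subprocess.run`, which sidesteps shell quoting. `build_train_args` and `run_train` are hypothetical helpers, and the flag names must match the ones the run cell already uses.

```python
import shlex
import subprocess
from typing import Dict, List, Optional

def build_train_args(script: str, options: Dict[str, object]) -> List[str]:
    """Turn {"--flag": value} into ["python", script, "--flag", "value", ...]."""
    args = ["python", script]
    for flag, value in options.items():
        args.extend([flag, str(value)])
    return args

def run_train(args: List[str], dry_run: bool = True) -> Optional[subprocess.CompletedProcess]:
    """Print the equivalent shell command; run it only when dry_run is False."""
    print(shlex.join(args))
    if dry_run:
        return None
    # check=True raises CalledProcessError if train.py exits with a non-zero status.
    return subprocess.run(args, check=True)
```

For a real run, pass the full set of flags from the run cell, e.g. `run_train(build_train_args(script, {"--cfg": cfg, "--exp-name": exp_name, "--budget": budget, ...}), dry_run=False)`. Printing first makes it easy to compare against the string version.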

In [39]:
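# Backslash-newline inside a regular (non-raw) string is a line continuation,
# so the flags below are joined into one shell command line.
cmd = f"""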
python {script} \
  --cfg {cfg} \
  --exp-name {exp_name} \
  --sampling_fn {sampling_fn} \
  --budget {budget} \
  --initial_set_str {initial_set_str} \
  --id_path {id_path} \
  --seed {seed} \
  --cost_func {cost_func} \
  --cost_name {cost_name} \
  --unit_assignment_path {unit_assignment_path} \
  --unit_type {unit_type} \
  --points_per_unit {points_per_unit} \
  --region_assignment_path {region_assignment_path} \
  --in_region_unit_cost {in_region_unit_cost} \
  --out_of_region_unit_cost {out_of_region_unit_cost} \
  --group_assignment_path {group_assignment_path} \
  --group_type {group_type} \
  --util_lambda {util_lambda} \
"""

!{cmd}


🗑️  Removing previous log file: /home/libe2152/optimizedsampling/0_output/TOGO/cluster_sampling/10_state_county_desired_10ppc_100_size/region_aware_unit_cost/opt/poprisk/regions/budget_50/util_lambda_0.5/seed_42/stdout.log
Sampling initial pool from IDS
Python 3.11.11 (main, Dec 11 2024, 16:28:39) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.4.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: IPython supports combining unicode identifiers, eg F\vec<tab> will become F⃗, useful for physics equations. Play with \dot \ddot and others.

In [1]:
^C
Error in sys.excepthook:
Traceback (most recent call last):
  File "/share/anaconda3/envs/al/lib/python3.11/site-packages/IPython/core/ultratb.py", line 819, in get_records
    mo