# ARC - Auto-Select Data Source

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/profLewis/ARC/blob/main/notebooks/test_arc.ipynb)

This notebook demonstrates ARC's automatic data source selection. It:
1. Checks which data sources are available
2. Probes download speed from each source
3. Runs the full ARC pipeline with the fastest (or user-chosen) source

Run the [setup_credentials](setup_credentials.ipynb) notebook first to configure credentials.

## 1. Install ARC

In [None]:
!pip install -q https://github.com/profLewis/ARC/archive/refs/heads/main.zip
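
Before moving on, it can help to confirm that the install made the packages importable. The following is a minimal sketch using only the standard library. The helper name `is_importable` is illustrative and not part of ARC, and the sketch assumes the install provides the `arc` and `eof` modules that later cells import.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Module names taken from the imports used later in this notebook.
for name in ("arc", "eof"):
    status = "ok" if is_importable(name) else "MISSING"
    print(f"{name}: {status}")
```

If either module is reported as missing, re-running the install cell or restarting the runtime is usually the first thing to try.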

## 2. Check available data sources

In [None]:
from eof import print_config_status
print_config_status()

## 3. Set up test field

In [None]:
import arc
import os

arc_dir = os.path.dirname(os.path.realpath(arc.__file__))
geojson_path = f"{arc_dir}/test_data/SF_field.geojson"
print(f"Test field: {geojson_path}")
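
A quick sanity check on the field geometry can catch a wrong or empty file before any download starts. The sketch below computes a bounding box with the standard library only. It assumes a GeoJSON `FeatureCollection` whose first feature is a `Polygon`; the helper `polygon_bounds` and the square test geometry are hypothetical and do not reflect the contents of `SF_field.geojson`.

```python
import json


def polygon_bounds(feature_collection):
    """Return (min_x, min_y, max_x, max_y) of the first Polygon's outer ring."""
    geometry = feature_collection["features"][0]["geometry"]
    if geometry["type"] != "Polygon":
        raise ValueError(f"expected Polygon, got {geometry['type']}")
    outer_ring = geometry["coordinates"][0]
    xs = [point[0] for point in outer_ring]
    ys = [point[1] for point in outer_ring]
    return min(xs), min(ys), max(xs), max(ys)


# Synthetic rectangular field used only for illustration.
demo_geojson = json.loads("""
{"type": "FeatureCollection", "features": [{"type": "Feature", "properties": {},
 "geometry": {"type": "Polygon",
  "coordinates": [[[0.0, 0.0], [0.0, 1.0], [2.0, 1.0], [2.0, 0.0], [0.0, 0.0]]]}}]}
""")
print(polygon_bounds(demo_geojson))
```

To apply it to the test field, the same function could be called on `json.load(open(geojson_path))`, provided the file matches the assumed structure.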

## 4. Probe data source speeds

This downloads a single Sentinel-2 band from each available source and times it.
The fastest source will be recommended.

In [None]:
from eof import probe_download_speed, get_available_sources

available = get_available_sources()
print(f"Available sources: {available}\n")

speeds = probe_download_speed(geojson_path, sources=available)
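
The idea behind a speed probe can be sketched in a few lines: time one small request per source and record a failure as `None`. This is not how `probe_download_speed` is implemented; the fetcher functions and source names below are hypothetical stand-ins that sleep instead of downloading.

```python
import time


def time_call(fetch):
    """Return elapsed seconds for fetch(), or None if it raises."""
    start = time.perf_counter()
    try:
        fetch()
    except Exception:
        return None
    return time.perf_counter() - start


# Hypothetical stand-ins for "download one band from source X".
def fake_fast():
    time.sleep(0.01)


def fake_slow():
    time.sleep(0.05)


def fake_broken():
    raise ConnectionError("source unreachable")


fetchers = {
    "fast_source": fake_fast,
    "slow_source": fake_slow,
    "broken_source": fake_broken,
}
timings = {name: time_call(fn) for name, fn in fetchers.items()}
for name, seconds in timings.items():
    print(name, "failed" if seconds is None else f"{seconds:.3f}s")
```

A single timed request is a noisy measurement, so a real probe may give different rankings from run to run.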

## 5. Choose data source

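Set `data_source` below. Options:
- `'auto'`: uses the fastest source from the speed probe above; if the probe returned no results, `'auto'` is passed through and ARC picks the first available source from your preference list
- `'aws'`, `'cdse'`, `'planetary'`, `'gee'`: use a specific source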

In [None]:
# Change this to override auto-selection
data_source = 'auto'

# With 'auto', use the fastest source from the speed probe above (if it returned results):
if speeds and data_source == 'auto':
    fastest = min(speeds, key=speeds.get)
    print(f"Speed probe recommends: {fastest} ({speeds[fastest]:.1f}s)")
    data_source = fastest

print(f"\nUsing data source: {data_source}")
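
The selection above relies on every probe result being a number. A more defensive variant is sketched below, assuming the probe returns a dict that maps source name to seconds and that a failed probe might appear as `None`; neither assumption is confirmed by this notebook. The helper `pick_fastest` and the timings in `demo_results` are illustrative only.

```python
def pick_fastest(probe_results, fallback="auto"):
    """Return the source with the smallest valid timing, or the fallback."""
    valid = {
        name: seconds
        for name, seconds in (probe_results or {}).items()
        if isinstance(seconds, (int, float)) and seconds > 0
    }
    if not valid:
        return fallback
    return min(valid, key=valid.get)


# Made-up timings; 'cdse' represents a failed probe.
demo_results = {"aws": 2.4, "cdse": None, "planetary": 1.7}
print(pick_fastest(demo_results))
print(pick_fastest({}))
```

Falling back to `'auto'` keeps the pipeline usable when no probe succeeds, since the source choice is then left to ARC.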

## 6. Run ARC pipeline

In [None]:
import numpy as np
from pathlib import Path

# Parameters
start_date = '2022-07-15'
end_date = '2022-11-30'
START_OF_SEASON = 225
CROP_TYPE = 'wheat'
NUM_SAMPLES = 100000
GROWTH_SEASON_LENGTH = 45

# Output folder
S2_data_folder = Path.home() / f'Downloads/{Path(geojson_path).stem}'
S2_data_folder.mkdir(parents=True, exist_ok=True)

print(f'Data source: {data_source}')
print(f'Output: {S2_data_folder}')

scale_data, post_bio_tensor, post_bio_unc_tensor, mask, doys = arc.arc_field(
    start_date,
    end_date,
    geojson_path,
    START_OF_SEASON,
    CROP_TYPE,
    f'{S2_data_folder}/SF_field.npz',
    NUM_SAMPLES,
    GROWTH_SEASON_LENGTH,
    str(S2_data_folder),
    plot=True,
    data_source=data_source,
)

print(f'\nDone! Shape: {post_bio_tensor.shape}')
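
The later cells index the result as `post_bio_tensor[:, 4]` for LAI and `post_bio_tensor[:, 1]` for Cab, dividing both by 100. The sketch below wraps that indexing in a helper, assuming a `(pixels, parameters, dates)` layout inferred from those cells. The helper `extract_param`, the dictionary `PARAM_INDEX`, and the synthetic tensor (whose size is arbitrary) are illustrative and not part of ARC.

```python
import numpy as np

# Indices and scale as used elsewhere in this notebook.
PARAM_INDEX = {"cab": 1, "lai": 4}
SCALE = 100.0


def extract_param(tensor, name):
    """Return a (dates, pixels) array of the named parameter, unscaled."""
    return tensor[:, PARAM_INDEX[name]].T / SCALE


# Synthetic stand-in: 6 pixels, 7 parameters, 3 dates.
demo_tensor = np.arange(6 * 7 * 3, dtype=float).reshape(6, 7, 3)
demo_param = extract_param(demo_tensor, "lai")
print(demo_param.shape)
```

Keeping the index lookup in one place avoids repeating magic numbers across the plotting and validation cells.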

## 7. Plot LAI time series

In [None]:
import matplotlib.pyplot as plt

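lai = post_bio_tensor[:, 4].T / 100  # LAI in physical units, shape (dates, pixels)

plt.figure(figsize=(12, 6))
plt.plot(doys, lai, '-', lw=1, alpha=0.3)  # one line per pixel
# Average across pixels (axis=1) so the mean has one value per date
plt.plot(doys, np.nanmean(lai, axis=1), 'k-', lw=3, label='Field mean')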
plt.ylabel('LAI (m$^2$/m$^2$)')
plt.xlabel('Day of Year')
plt.title(f'LAI Time Series ({data_source})')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
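
For an array laid out as `(dates, pixels)`, per-date summaries are reductions along the pixel axis. The sketch below shows a NaN-aware mean and a 10th to 90th percentile band on a small synthetic array; the values are made up and only illustrate the axis convention.

```python
import numpy as np

# Synthetic (dates, pixels) array with one missing value.
demo_series = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 4.0],
])

# Reduce over pixels (axis=1) to get one value per date.
field_mean = np.nanmean(demo_series, axis=1)
p10, p90 = np.nanpercentile(demo_series, [10, 90], axis=1)

print(field_mean)
print(p10, p90)
```

The band could be drawn with `plt.fill_between(doys, p10, p90, alpha=0.2)` as a less cluttered alternative to one line per pixel.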

## 8. Plot LAI maps

In [None]:
nrows = -(-len(doys) // 5)  # ceiling division: 5 maps per row
fig, axs = plt.subplots(ncols=5, nrows=nrows, figsize=(20, 4 * nrows))
axs = axs.ravel()

for i in range(len(doys)):
    lai_map = np.full(mask.shape, np.nan)  # NaN outside the field
    lai_map[~mask] = lai[i]  # all valid pixels for date i
    im = axs[i].imshow(lai_map, vmin=0, vmax=7)
    fig.colorbar(im, ax=axs[i], shrink=0.8, label='LAI (m$^2$/m$^2$)')
    axs[i].set_title(f'DOY: {doys[i]}')

for i in range(len(doys), len(axs)):
    axs[i].axis('off')

plt.suptitle(f'LAI Maps ({data_source})', fontsize=14)
plt.tight_layout()
plt.show()
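
Each map is built by scattering a 1-D vector of per-pixel values back onto the 2-D grid. The sketch below isolates that step, assuming `True` in the mask marks pixels outside the field, as the use of `~mask` in the loop suggests. The helper `to_grid` and the tiny mask are illustrative.

```python
import math

import numpy as np


def to_grid(values, mask):
    """Place a 1-D vector of valid-pixel values into a NaN-filled 2-D grid."""
    grid = np.full(mask.shape, np.nan)
    grid[~mask] = values  # filled in row-major order of the unmasked cells
    return grid


# Synthetic 2x3 mask: True marks pixels outside the field.
demo_mask = np.array([[True, False, False],
                      [False, True, False]])
demo_values = np.array([1.0, 2.0, 3.0, 4.0])
demo_grid = to_grid(demo_values, demo_mask)
print(demo_grid)

# Number of rows needed to show 12 panels in a 5-column layout.
demo_rows = math.ceil(12 / 5)
print(demo_rows)
```

The assignment only works when the number of values equals the number of unmasked cells, which makes it a useful check that the array is indexed along the right axis.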

## 9. Validation

In [None]:
lai_vals = post_bio_tensor[:, 4].T / 100
cab_vals = post_bio_tensor[:, 1].T / 100

print(f'LAI range: [{np.nanmin(lai_vals):.2f}, {np.nanmax(lai_vals):.2f}]')
print(f'Cab range: [{np.nanmin(cab_vals):.2f}, {np.nanmax(cab_vals):.2f}]')
print(f'Number of dates: {len(doys)}')
print(f'Number of pixels: {post_bio_tensor.shape[0]}')

assert np.nanmax(lai_vals) < 15, 'LAI values unreasonably high'
assert np.nanmin(lai_vals) >= 0, 'LAI values negative'
print('\nValidation passed!')
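
One caveat with the assertions above: if every value is NaN, `np.nanmax` returns NaN, the comparison is false, and the failure is reported as "unreasonably high". A sketch of a check that reports such cases separately follows. The function `check_lai` is hypothetical, and the upper bound of 15 is taken from the assertion above.

```python
import numpy as np


def check_lai(lai_values, upper=15.0):
    """Return a list of problems found in an LAI array (empty if none)."""
    if np.all(np.isnan(lai_values)):
        return ["all values are NaN"]
    problems = []
    if np.nanmin(lai_values) < 0:
        problems.append("negative LAI")
    if np.nanmax(lai_values) >= upper:
        problems.append(f"LAI >= {upper}")
    return problems


print(check_lai(np.array([0.5, 3.2, np.nan])))
print(check_lai(np.array([-0.1, 20.0])))
print(check_lai(np.array([np.nan, np.nan])))
```

Returning a list instead of raising on the first failure makes it possible to report every problem in one pass.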