
# Mapineq API Tutorial (Python helper) — `mapineqpy` Edition

This notebook shows an **easier** way to work with the Mapineq API using the Python helper **`mapineqpy`**.  
We’ll pull the dataset **`meta_migration_in`**, discover valid years/levels, explore filters, fetch data, compute totals, and plot a quick chart.

**What you'll do:**
1. Install and import `mapineqpy`.
2. Discover available sources at a chosen NUTS level.
3. Inspect coverage for `meta_migration_in` (years × NUTS levels).
4. See available **origin** filters (labels & values).
5. Fetch data for a chosen **origin** and compute **totals** across origins.
6. Save CSVs and plot a simple bar chart.

> Requirements: `mapineqpy`, `pandas`, `matplotlib`.


## 1) Install dependencies

In [2]:
# # If needed, install packages. Uncomment the next lines to install.
# %pip install --quiet git+https://github.com/e-kotov/mapineqpy.git pandas matplotlib

Note: you may need to restart the kernel to use updated packages.


## 2) Imports & settings

In [1]:
import os
import pandas as pd
import matplotlib.pyplot as plt

import mapineqpy as mi  # helper library for Mapineq

# Tutorial settings — feel free to change these:
RESOURCE = "meta_migration_in"
LEVEL_CHOICE = "0"   # NUTS level as string: "0", "1", "2", or "3"
YEAR_CHOICE = None   # None = auto-pick latest; or set a specific year like 2022
ORIGIN_CHOICE = "UA" # If not in available origins, we'll fall back to the first

SAVE_DIR = "outputs_mapineqpy_meta_migration_in"
os.makedirs(SAVE_DIR, exist_ok=True)

print("Imports complete.")

Imports complete.


## 3) Discover sources at your NUTS level

In [3]:
sources = mi.sources(level=LEVEL_CHOICE)
display(sources.head())
print(f"{len(sources)} sources at NUTS level {LEVEL_CHOICE}.")

# Show the row(s) related to the resource of interest
candidate = sources[sources["source_name"].str.lower() == RESOURCE.lower()]
display(candidate)

Unnamed: 0,source_name,short_description,description
0,eheso_budgetclass,Acad. Budget,"Academic Budget by, year, NUTS 1, NUTS 2 and N..."
1,eheso_expendclass,Acad. Expenditure,"Academic Expenditure by, year, NUTS 1, NUTS 2 ..."
2,eheso_fundingclass,Acad. Funding,"Academic Funding by, year, NUTS 1, NUTS 2 and ..."
3,eheso_graduates,Acad. Graduates,"Academic Graduates by year, citizenship, level..."
4,MIGR_ACQ1CTZ,New citizenship,"Acquisition of citizenship by age group, sex a..."


692 sources at NUTS level 0.


Unnamed: 0,source_name,short_description,description
332,meta_migration_in,META Migration In,META estimation of migration inflow at the cou...


## 4) Coverage (years × NUTS levels) for `meta_migration_in`

In [4]:
coverage = mi.source_coverage(RESOURCE)

# Ensure numeric types for convenience
coverage["year"] = coverage["year"].astype(int, errors="ignore")
coverage["nuts_level"] = coverage["nuts_level"].astype(int, errors="ignore")

display(coverage.sort_values(["nuts_level","year"]).reset_index(drop=True))

if YEAR_CHOICE is None:
    year = int(coverage["year"].max())
else:
    year = int(YEAR_CHOICE)

# pick a level that exists for that year; fall back otherwise
levels_for_year = coverage.loc[coverage["year"] == year, "nuts_level"]
if levels_for_year.empty:
    level = str(coverage.iloc[0]["nuts_level"])
    year = int(coverage.iloc[0]["year"])
    print(f"[info] No coverage for chosen year; falling back to year={year}, level={level}")
else:
    # prefer LEVEL_CHOICE if available for that year
    if str(LEVEL_CHOICE) in levels_for_year.astype(str).unique().tolist():
        level = str(LEVEL_CHOICE)
    else:
        level = str(levels_for_year.iloc[0])
        print(f"[info] Level {LEVEL_CHOICE} unavailable for {year}; using level={level} instead.")

print(f"Using year={year}, level={level}")

Unnamed: 0,nuts_level,year,source_name,short_description,description
0,0,2019,meta_migration_in,Settlement type,Satellite-based settlement types based on imagery
1,0,2020,meta_migration_in,Settlement type,Satellite-based settlement types based on imagery
2,0,2021,meta_migration_in,Settlement type,Satellite-based settlement types based on imagery
3,0,2022,meta_migration_in,Settlement type,Satellite-based settlement types based on imagery


Using year=2022, level=0


## 5) Available filters (labels vs values)

In [5]:
filters = mi.source_filters(RESOURCE, year=year, level=level)
display(filters.head(30))

# For convenience, isolate the origin filter if present
origin_df = filters[filters["field"] == "origin"].copy()
if not origin_df.empty:
    origin_values = origin_df["value"].dropna().unique().tolist()
    print(f"Found {len(origin_values)} origin options. First 25: {origin_values[:25]}")
else:
    origin_values = []
    print("No 'origin' filter exposed for this dataset/year/level.")

Unnamed: 0,field,field_label,label,value
0,origin,origin,RU,RU
1,origin,origin,DK,DK
2,origin,origin,SV,SV
3,origin,origin,SN,SN
4,origin,origin,SI,SI
5,origin,origin,CZ,CZ
6,origin,origin,KR,KR
7,origin,origin,JP,JP
8,origin,origin,VE,VE
9,origin,origin,BS,BS


Found 181 origin options. First 25: ['RU', 'DK', 'SV', 'SN', 'SI', 'CZ', 'KR', 'JP', 'VE', 'BS', 'UZ', 'TL', 'AU', 'CL', 'QA', 'MZ', 'GE', 'EE', 'AT', 'VN', 'TD', 'CI', 'KW', 'AR', 'NI']


## 6) Fetch data for one origin

In [7]:
if origin_values:
    origin = ORIGIN_CHOICE if ORIGIN_CHOICE in origin_values else origin_values[0]
    if origin != ORIGIN_CHOICE:
        print(f"[info] ORIGIN_CHOICE '{ORIGIN_CHOICE}' not found; using '{origin}'.")
    df_one = mi.data(
        x_source=RESOURCE,
        year=year,
        level=level,
        x_filters={"origin": origin},
        limit=2000
    )
    if "x" in df_one.columns:
        df_one = df_one.rename(columns={"x": f"migration_in_from_{origin}"})
else:
    # No origin filters: fetch without them
    df_one = mi.data(
        x_source=RESOURCE,
        year=year,
        level=level,
        x_filters={},
        limit=25000
    )
    if "x" in df_one.columns:
        df_one = df_one.rename(columns={"x": "migration_in"})

display(df_one.head())
print(df_one.shape)

ValueError: The API returned duplicate values for some geographic regions. This may indicate that not all necessary filters were specified.

For the 'x' variable (source: 'meta_migration_in'): The following filter fields (with multiple available options) were not specified: time_granularity, month. You can review available filters by running:
  mi.source_filters(source_name='meta_migration_in', year=2022, level='0')

## 7) Compute totals per destination (sum across origins)

In [None]:

import numpy as np
import pandas as pd

if origin_values:
    total_codes = {"TOTAL", "TOT", "ALL"}
    total_candidate = next((v for v in origin_values if str(v).upper() in total_codes), None)

    if total_candidate:
        df_total = mi.data(
            x_source=RESOURCE,
            year=year,
            level=level,
            x_filters={"origin": total_candidate},
            limit=25000
        ).rename(columns={"x": "migration_in_total"})
    else:
        frames = []
        for ov in origin_values:
            d = mi.data(
                x_source=RESOURCE,
                year=year,
                level=level,
                x_filters={"origin": ov},
                limit=25000
            )
            frames.append(d[["data_year","geo","geo_name","geo_source","geo_year","x"]])
        big = pd.concat(frames, ignore_index=True)
        big["x"] = pd.to_numeric(big["x"], errors="coerce")
        df_total = (big.groupby(["data_year","geo","geo_name","geo_source","geo_year"], as_index=False)["x"]
                      .sum()
                      .rename(columns={"x": "migration_in_total"}))
else:
    df_total = df_one.rename(columns={df_one.columns[-1]: "migration_in_total"})

display(df_total.head())
print(df_total.shape)

## 8) Save tidy CSV files

In [None]:

p_one = os.path.join(SAVE_DIR, f"mapineqpy_{RESOURCE}_{year}_L{level}_one_origin.csv")
p_tot = os.path.join(SAVE_DIR, f"mapineqpy_{RESOURCE}_{year}_L{level}_totals.csv")

df_one.to_csv(p_one, index=False)
df_total.to_csv(p_tot, index=False)

print("Saved:")
print("-", p_one)
print("-", p_tot)

## 9) Plot top destinations by total inflow

In [None]:

topN = 15
top = df_total.sort_values("migration_in_total", ascending=False).head(topN)

plt.figure(figsize=(10, 6))
plt.barh(top["geo_name"], top["migration_in_total"])
plt.gca().invert_yaxis()
plt.xlabel("Total migration inflow")
plt.ylabel("Destination (geo_name)")
plt.title(f"Meta Migration In — totals by destination (year={year}, level={level})")
plt.tight_layout()

chart_path = os.path.join(SAVE_DIR, f"mapineqpy_{RESOURCE}_{year}_L{level}_totals_top{topN}.png")
plt.savefig(chart_path, dpi=150)
print("Chart saved:", chart_path)


## 10) Notes — labels vs values, fields vs field labels

- **`field`**: the internal column name (e.g. `origin`).  
- **`field_label`**: human-friendly label (e.g. "Origin country"). Sometimes identical to `field`.
- **`value`**: the raw value to pass to the API (e.g. `UA`).  
- **`label`**: human-friendly value label (e.g. "Ukraine"). Sometimes identical to `value`.

If labels equal values in this dataset, that's normal — other sources may use longer labels.
