# UK Postcode to British National Grid Mapper

This notebook links each UK postcode to its British National Grid (BNG) references. Before running it, download the following datasets into the `data` folder, each in its own directory:

- British National Grid from [here](https://github.com/OrdnanceSurvey/OS-British-National-Grids)
- Latest National Statistics Postcode Lookup from [here](https://geoportal.statistics.gov.uk/search?sort=-modified&tags=prd_nspl) (e.g. the [February 2022](https://geoportal.statistics.gov.uk/datasets/national-statistics-postcode-lookup-february-2022/about) release)

For convenience, the British National Grid repository is included as a submodule. To fetch it, either clone this repository with the `--recurse-submodules` flag or run the following commands after cloning:

```bash
git submodule init
git submodule update
```

In [None]:
import pandas as pd
import geopandas as gpd
from pathlib import Path

DATA = Path("data")

In [None]:
ONSPD_VERSION = "FEB_2022"

In [None]:
ONSPD_PATH = DATA/f'ONSPD_{ONSPD_VERSION}_UK'/'Data'/f'ONSPD_{ONSPD_VERSION}_UK.csv'
BNG_PATH = DATA / "OS-British-National-Grids" / "os_bng_grids.gpkg"

assert ONSPD_PATH.exists(), (
    f"ONSPD file not found: {ONSPD_PATH}.\n"
    "Please download it from "
    "https://geoportal.statistics.gov.uk/search?sort=-modified&tags=prd_nspl "
    "and decompress it under the 'data' directory.\n"
    "If you have already downloaded it, please check ONSPD_VERSION and the file path."
)

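# The submodule ships the grid as a .7z archive, so check for the archive first
assert BNG_PATH.with_suffix('.7z').exists(), (
    f"British National Grid archive not found: {BNG_PATH.with_suffix('.7z')}.\n"
    "Please run the following commands to download it:\n"
    "git submodule init\n"
    "git submodule update"
)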

print(
    "Decompressing British National Grid file. "
    "You can comment out the tar command below if it's already decompressed."
)

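# NOTE: reading a .7z archive with tar requires a libarchive-based tar (bsdtar);
# GNU tar cannot read 7z archives. Paths are quoted in case they contain spaces.
!tar -xJf "{BNG_PATH.with_suffix('.7z').absolute()}" -C "{BNG_PATH.parent.absolute()}"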

print("Decompression completed.")

assert BNG_PATH.exists(), (
    "British National Grid could not be decompressed. Please decompress the "
    f"{BNG_PATH.with_suffix('.7z').name} file manually."
)
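
Because the `tar` call above only works with some `tar` implementations, a fallback may be useful. The following is a minimal sketch, not part of the original notebook: `build_extract_command` is a hypothetical helper that picks the first extractor found on `PATH` (the 7-Zip command-line tools or `bsdtar`) and returns the argument list to run. Which tools are installed is an assumption about your system.

```python
import shutil
from pathlib import Path


def build_extract_command(archive, out_dir, which=shutil.which):
    """Return an argv list for the first available 7z-capable tool, or None.

    `which` is injectable so the lookup can be replaced in tests.
    """
    archive, out_dir = Path(archive), Path(out_dir)
    for tool in ("7z", "7za", "7zr"):
        if which(tool):
            # 7-Zip syntax: "x" extracts with full paths, "-o<dir>" takes no
            # space before the directory, "-y" answers yes to all prompts.
            return [tool, "x", str(archive), f"-o{out_dir}", "-y"]
    if which("bsdtar"):
        # bsdtar (libarchive) detects the 7z format automatically.
        return ["bsdtar", "-xf", str(archive), "-C", str(out_dir)]
    return None
```

The returned list could be passed to `subprocess.run(cmd, check=True)`, for example with `build_extract_command(BNG_PATH.with_suffix('.7z'), BNG_PATH.parent)`; a `None` result means no suitable tool was found and the archive has to be decompressed manually.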

In [None]:
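# reference: https://stackoverflow.com/a/57971376/1360267
# Import fiona directly: `gpd.io.file.fiona` is a geopandas internal and may
# not be populated before the first `read_file` call.
import fiona

# One GeoDataFrame per grid layer, indexed by tile name and reprojected to
# WGS84 (EPSG:4326) to match the postcode lat/long coordinates.
bng_refs = {
    layername: gpd.read_file(BNG_PATH, layer=layername)[["tile_name", "geometry"]]
    .rename(columns={"tile_name": layername})
    .set_index(layername)
    .to_crs(epsg=4326)
    for layername in fiona.listlayers(BNG_PATH)
}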

In [None]:
postcode_gpd = (
    pd.read_csv(
        ONSPD_PATH,
        usecols=["pcd", "lat", "long"],
    )
    .rename(
        columns={
            "pcd": "Postcode",
            "long": "lon",
        }
    )
    .pipe(
        lambda df: gpd.GeoDataFrame(
            df,
            geometry=gpd.points_from_xy(
                df["lon"],
                df["lat"],
                crs="EPSG:4326",
            ),
        ).set_index("Postcode")
    )
)

In [None]:
postcode_to_national_grid_gdf = postcode_gpd.copy()
for grid_ref, bng in bng_refs.items():
    # for each postcode coordinate, find the containing grid reference
    postcode_to_national_grid_gdf = postcode_to_national_grid_gdf.sjoin(
        bng, how="left", predicate="within"
    ).rename(columns={"index_right": grid_ref})
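
To make the behaviour of the left join with `predicate="within"` concrete, here is a toy, pure-Python analogue. It is only an illustration under simplifying assumptions: the tiles are hypothetical axis-aligned boxes rather than real BNG geometries, and `containing_tile` is a made-up helper, not a geopandas function.

```python
# Hypothetical toy tiles: name -> (min_x, min_y, max_x, max_y)
TILES = {
    "A": (0.0, 0.0, 1.0, 1.0),
    "B": (1.0, 0.0, 2.0, 1.0),
}


def containing_tile(x, y, tiles=TILES):
    """Return the name of the tile whose interior contains (x, y), else None."""
    for name, (min_x, min_y, max_x, max_y) in tiles.items():
        # Strict inequalities mirror "within": a point lying exactly on a
        # tile boundary is not considered inside that tile.
        if min_x < x < max_x and min_y < y < max_y:
            return name
    return None


# Toy points keyed by a placeholder identifier
points = {"P1": (0.5, 0.5), "P2": (1.5, 0.2), "P3": (5.0, 5.0)}

# Like a left join, every point is kept; unmatched points map to None
lookup = {pid: containing_tile(x, y) for pid, (x, y) in points.items()}
```

Here `P3` falls outside every tile and maps to `None`, which corresponds to the missing values a left join leaves for postcodes that match no grid tile.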

In [None]:
# convert it to pandas dataframe and set national grid reference types to categorical
postcode_to_national_grid_df = (
    pd.DataFrame(postcode_to_national_grid_gdf.drop(columns="geometry")).pipe(
        lambda df: df.assign(
            **{
                # convert grid reference values to categorical
                grid_ref: df[grid_ref].astype("category")
                for grid_ref in bng_refs.keys()
            }
        )
    )
    # drop rows with any missing value, i.e. postcodes without coordinates
    # or without a matching grid tile (e.g. unused postcodes)
    .dropna(how="any")
)

postcode_to_national_grid_df.to_parquet(
    DATA / "postcode_to_national_grid_references.parquet", compression="gzip"
)

postcode_to_national_grid_df.to_csv(
    DATA / "postcode_to_national_grid_references.csv.gz", compression="gzip"
)
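
The gzipped CSV can also be consumed without pandas. The sketch below uses only the standard library; `load_lookup` is a hypothetical helper, and it assumes the index was written under the header `Postcode` and that `grid_column` names one of the grid layer columns in the file.

```python
import csv
import gzip


def load_lookup(path, grid_column):
    """Map each postcode to its value in `grid_column` from the gzipped CSV."""
    # "rt" opens the gzip stream in text mode; newline="" is what the csv
    # module expects so that it can handle line endings itself.
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return {row["Postcode"]: row[grid_column] for row in reader}
```

For example, `load_lookup(DATA / "postcode_to_national_grid_references.csv.gz", "1km_grid")` would return a plain dictionary, assuming a layer named `1km_grid` exists in the grid file; check the CSV header for the actual column names.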