# Filter rooftop data to new pilot areas
Aasim tried to select urban rooftops in 4 pilot districts. In Latur and Katni, this was difficult because a large portion of the rooftops are in villages. In Gandhinagar, he was able to select urban rooftops, but not enough of them. We also want to add Solan and Agra districts.

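**Purpose:** Filter the rooftop data to the new pilot areas (Gandhinagar, Agra, and Solan districts, plus 'Murwara' subdistrict in Katni and 'Latur' subdistrict in Latur) and save the filtered output.
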
**Contents:**  
1. Import packages and set paths
2. Import SHRUG district boundary data and filter for Gandhinagar, Agra, and Solan
3. Import SHRUG subdistrict boundary and filter for ‘Murwara’ subdistrict in Katni district and ‘Latur’ subdistrict in Latur district
4. Append the selected district and subdistrict gdfs together
5. Filter rooftops for these areas and save the filtered output

## 1. Import packages and set paths

In [None]:
from pathlib import Path

import geopandas as gpd
import pandas as pd
from tqdm import tqdm

from rooftop_tools.utils_rooftop import (
    get_matched_rooftop_centroids_from_s2_file,
    get_overlapping_s2_cell_ids,
)

In [None]:
# set paths
FH_SAMPLING_FOLDER = Path("../") / "data" / "fortify_data"
SHRUG_district_path = (
    FH_SAMPLING_FOLDER / "Shape files/shrug-pc11dist-poly-shp/district.shp"
)
SHRUG_sub_path = (
    FH_SAMPLING_FOLDER / "Shape files/shrug-pc11subdist-poly-shp/subdistrict.shp"
)

## 2. Import and clean SHRUG data

In [None]:
# import SHRUG district boundary and select just the pilot districts
SHRUG_districts = gpd.read_file(SHRUG_district_path)
pilot_districts = SHRUG_districts[
    SHRUG_districts["d_name"].isin(["Solan", "Gandhinagar", "Agra"])
]
# print the gdf to make sure there were no spelling mistakes in district names
pilot_districts
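
Printing the gdf works for a handful of names, but the same check can also be done programmatically. The sketch below is a minimal illustration on a toy table rather than the SHRUG data: `find_missing_names` is a hypothetical helper, and the misspelled "Gandhi Nagar" row is invented purely to show what a mismatch would look like.

```python
import pandas as pd


def find_missing_names(df, column, wanted):
    """Return the wanted names that do not appear in df[column], sorted."""
    present = set(df[column].dropna().unique())
    return sorted(set(wanted) - present)


# Toy stand-in for the district table (invented rows, not SHRUG data)
toy_districts = pd.DataFrame(
    {"d_name": ["Agra", "Solan", "Gandhi Nagar", "Katni"]}
)

missing = find_missing_names(
    toy_districts, "d_name", ["Solan", "Gandhinagar", "Agra"]
)
print(missing)  # names that were requested but not found
```

An empty list would mean every requested name matched at least one row.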

In [None]:
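# import SHRUG subdistrict boundary and select just the pilot subdistricts,
# matching on district id as well as subdistrict name
SHRUG_subs = gpd.read_file(SHRUG_sub_path)
pilot_subs = SHRUG_subs[
    ((SHRUG_subs["pc11_d_id"] == "450") & (SHRUG_subs["sd_name"] == "Murwara"))
    | ((SHRUG_subs["pc11_d_id"] == "524") & (SHRUG_subs["sd_name"] == "Latur"))
].copy()  # copy so the assignment below does not act on a slice of SHRUG_subs
# label each selected subdistrict with its district name
pilot_subs["d_name"] = pilot_subs["pc11_d_id"].map({"450": "Katni", "524": "Latur"})
pilot_subs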
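
The subdistrict filter pairs each subdistrict name with a district id. A minimal sketch of why that pairing can matter is given below, using a toy table: the `"999"` district id and the `"Other"` subdistrict are invented for illustration and do not come from SHRUG.

```python
import pandas as pd

# Toy stand-in for the subdistrict table (invented rows, not SHRUG data)
toy_subs = pd.DataFrame(
    {
        "pc11_d_id": ["450", "450", "524", "999"],
        "sd_name": ["Murwara", "Other", "Latur", "Latur"],
    }
)

# Filtering on name alone also keeps the "Latur" row from district "999"
name_only = toy_subs[toy_subs["sd_name"].isin(["Murwara", "Latur"])]

# Filtering on (district id, name) pairs keeps only the intended rows.
# The explicit parentheses spell out the grouping of & and |.
pair_mask = (
    ((toy_subs["pc11_d_id"] == "450") & (toy_subs["sd_name"] == "Murwara"))
    | ((toy_subs["pc11_d_id"] == "524") & (toy_subs["sd_name"] == "Latur"))
)
# .copy() gives an independent frame, so adding a column is unambiguous
selected = toy_subs[pair_mask].copy()
selected["d_name"] = selected["pc11_d_id"].map({"450": "Katni", "524": "Latur"})

print(len(name_only), len(selected))
print(selected)
```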

In [None]:
# append the selected district and subdistrict gdfs together
pilot_areas = pd.concat([pilot_districts, pilot_subs])
pilot_areas
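
When the two tables do not share exactly the same columns, `pd.concat` keeps the union of columns and fills the gaps with missing values. The sketch below shows this on toy frames: the ids `"001"` and `"002"` are placeholders rather than real census codes, and `ignore_index=True` is an optional choice made here only to get a clean 0..n-1 index.

```python
import pandas as pd

# Toy district rows: no subdistrict column (placeholder ids)
toy_districts = pd.DataFrame(
    {"pc11_d_id": ["001", "002"], "d_name": ["Solan", "Agra"]}
)
# Toy subdistrict row: carries an extra sd_name column
toy_subs = pd.DataFrame(
    {"pc11_d_id": ["450"], "sd_name": ["Murwara"], "d_name": ["Katni"]}
)

combined = pd.concat([toy_districts, toy_subs], ignore_index=True)

print(combined.columns.tolist())
print(combined["sd_name"].isna().tolist())  # True where the row is a district
```

Rows with a missing `sd_name` are whole districts, which gives a simple way to tell the two kinds of area apart after appending.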

## 3. For each S2 cell file, filter for rooftops in the pilot areas

In [None]:
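# find the S2 cells that overlap the pilot areas, then collect the matched
# rooftop centroids from each cell's file
s2_cell_ids = get_overlapping_s2_cell_ids(pilot_areas)
matched_rooftop_centroids_gdf_list = []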

for s2_cell_id in tqdm(s2_cell_ids):
    matched_rooftop_centroids_gdf = get_matched_rooftop_centroids_from_s2_file(
        s2_file_dir=FH_SAMPLING_FOLDER / "Rooftop Data",
        s2_cell_id=s2_cell_id,
        boundaries_gdf=pilot_areas,
    )
    matched_rooftop_centroids_gdf_list.append(matched_rooftop_centroids_gdf)

In [None]:
# check the CRS of each gdf before concatenating
for gdf in matched_rooftop_centroids_gdf_list:
    print(gdf.crs)

In [None]:
# reproject every gdf to WGS 84 (EPSG:4326) so they share one CRS
for gdf in matched_rooftop_centroids_gdf_list:
    gdf.to_crs(epsg=4326, inplace=True)
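
The reprojection step can also be wrapped in a small function that returns new GeoDataFrames and fails early if a frame has no CRS set. This is only a sketch under stated assumptions: `to_common_crs` is a hypothetical helper, and the single point used here is an invented coordinate, not a rooftop from the data.

```python
import geopandas as gpd
from shapely.geometry import Point


def to_common_crs(gdfs, epsg=4326):
    """Return copies of the GeoDataFrames reprojected to the given EPSG code."""
    reprojected = []
    for i, gdf in enumerate(gdfs):
        if gdf.crs is None:
            # Reprojecting needs a known source CRS
            raise ValueError(f"GeoDataFrame {i} has no CRS set")
        reprojected.append(gdf.to_crs(epsg=epsg))
    return reprojected


# Two toy frames holding the same point in different CRSs
gdf_wgs84 = gpd.GeoDataFrame(
    {"id": [1]}, geometry=[Point(77.0, 23.0)], crs="EPSG:4326"
)
gdf_mercator = gdf_wgs84.to_crs(epsg=3857)

unified = to_common_crs([gdf_wgs84, gdf_mercator])
crs_codes = {gdf.crs.to_epsg() for gdf in unified}
print(crs_codes)  # a single code means the frames can be concatenated safely
```

Unlike `inplace=True`, this leaves the original list untouched, which can make the cell safe to re-run.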

In [None]:
# concatenate the gdfs
matched_rooftop_centroids_gdf = pd.concat(
    matched_rooftop_centroids_gdf_list, ignore_index=True
)
# Save the matched rooftops data
matched_rooftop_centroids_gdf.to_parquet(
    FH_SAMPLING_FOLDER / "Cleaned rooftop data" / "Rooftops in new pilot areas.parquet"
)