[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/SatelliteVu/SatelliteVu-AWS-Disaster-Response-Hackathon/blob/main/data_quality/overlapping.ipynb)

## Use Geopandas to Find Overlapping Frames and Keep Only One
We discovered that some frames in the dataset overlap in both space and time, which results in duplicated features in the training dataset. This notebook demonstrates how these frames were identified so that they could be removed from the training dataset.

In [2]:
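import geopandas as gpd
from tqdm import tqdm

gdf = gpd.GeoDataFrame.from_file('s3://../records_with_geom.zip/')

num_overlapping = 0      # number of overlap groups found
overlapping = set()      # idx values already assigned to an overlap group
filtered_instances = []  # one representative row per group or stand-alone frame
for n, record in tqdm(gdf.iterrows(), total=len(gdf)):
    # Skip frames that an earlier overlap group already covers
    if record['idx'] in overlapping:
        continue
    # Frames that intersect this one and share its date (normally includes the frame itself)
    instances = gdf[(gdf.intersects(record['geometry'])) & (gdf['date'] == record['date'])]
    # Keep only the first frame of the group
    filtered_instances.append(instances.iloc[0])
    if len(instances) > 1:
        overlapping.update(instances['idx'].values)
        num_overlapping += 1
# Build a new frame from the kept rows
filtered_gdf = gpd.GeoDataFrame.from_records(filtered_instances)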

100%|██████████| 37681/37681 [37:56<00:00, 16.55it/s]  
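
The keep-first logic of the loop above can be checked on a toy example. The following is a minimal sketch, not the notebook's actual pipeline: it swaps `gdf.intersects` for a plain bounding-box test so that it runs without geopandas, and it assumes all boxes share one projected CRS (the real frames span several `epsg` zones). The `Frame` tuple, `boxes_intersect` and `keep_first_of_each_overlap` are hypothetical names introduced only for illustration.

```python
from collections import namedtuple

# Hypothetical toy record: a bounding box in one shared CRS plus a date string.
Frame = namedtuple("Frame", "idx left bottom right top date")


def boxes_intersect(a, b):
    """Return True if two axis-aligned boxes share at least a boundary point."""
    return (a.left <= b.right and b.left <= a.right
            and a.bottom <= b.top and b.bottom <= a.top)


def keep_first_of_each_overlap(frames):
    """Greedy filter: keep the first same-date match of each unseen frame."""
    seen = set()   # idx values already assigned to an overlap group
    kept = []
    for frame in frames:
        if frame.idx in seen:
            continue
        # Same date and intersecting footprint; includes the frame itself.
        matches = [f for f in frames
                   if f.date == frame.date and boxes_intersect(f, frame)]
        kept.append(matches[0])
        if len(matches) > 1:
            seen.update(f.idx for f in matches)
    return kept


frames = [
    Frame(0, 0, 0, 10, 10, "2013-01-07"),
    Frame(1, 5, 5, 15, 15, "2013-01-07"),        # overlaps frame 0, same date
    Frame(2, 5, 5, 15, 15, "2013-01-08"),        # same footprint, other date
    Frame(3, 100, 100, 110, 110, "2013-01-07"),  # same date, far away
]
kept_ids = [f.idx for f in keep_first_of_each_overlap(frames)]
print(kept_ids)  # [0, 2, 3]
```

Frame 1 is dropped because it intersects frame 0 on the same date. Frame 2 survives despite sharing frame 1's footprint, since the dates differ.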


In [3]:
filtered_gdf

Unnamed: 0,idx,left,bottom,right,top,epsg,date,geometry
0,0,517500.0,3364500.0,549500.0,3396500.0,32616,2013-01-07,"POLYGON ((-86.81780 30.41242, -86.81726 30.701..."
1,1,567500.0,4892000.0,599500.0,4924000.0,32613,2013-01-08,"POLYGON ((-104.15555 44.17809, -104.15141 44.4..."
2,2,356500.0,3349500.0,388500.0,3381000.0,32617,2013-01-11,"POLYGON ((-82.49180 30.26868, -82.49613 30.552..."
3,3,379000.0,6012000.0,410500.0,6044000.0,32610,2013-01-16,"POLYGON ((-124.85680 54.24165, -124.86983 54.5..."
4,4,440500.0,3524500.0,472500.0,3556500.0,32617,2013-01-19,"POLYGON ((-81.62893 31.85467, -81.63091 32.143..."
...,...,...,...,...,...,...,...,...
21020,37676,247500.0,4047500.0,279500.0,4079500.0,32614,2021-12-15,"POLYGON ((-101.82059 36.53958, -101.83114 36.8..."
21021,37677,546500.0,4059500.0,578500.0,4091500.0,32614,2021-12-15,"POLYGON ((-98.47956 36.67999, -98.47761 36.968..."
21022,37678,738000.0,3933500.0,770000.0,3965500.0,32613,2021-12-15,"POLYGON ((-102.37553 35.51658, -102.36607 35.8..."
21023,37679,477000.0,4313500.0,509000.0,4345500.0,32614,2021-12-16,"POLYGON ((-99.26550 38.97017, -99.26659 39.258..."
