Obtain patch level metadata (e.g. geospatial bounds and cloud cover), save and demo DEP use case (sim search) #172

Merged
lillythomas merged 33 commits into main from patch_level_metadata
Mar 24, 2024

Conversation

lillythomas
Contributor

@lillythomas lillythomas commented Mar 6, 2024

This PR works towards a demonstration of how to obtain patch level embeddings and write them to GeoParquet files for use in similarity search.

Main tasks that need to be done:

Reference tickets: #168 #140

@lillythomas lillythomas marked this pull request as draft March 6, 2024 08:31
@lillythomas lillythomas marked this pull request as ready for review March 11, 2024 20:48
@lillythomas
Contributor Author

The notebook docs/tutorial_digital_earth_pacific_patch_level.ipynb walks through an example of:

  • generating patch level embeddings for an area where known mining extraction events occur
  • saving the patch level embeddings to independent GeoParquet files
  • executing similarity search based on a ground truth point's overlapping patch
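
For readers skimming the thread, the similarity search step in the list above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the notebook's code: it assumes cosine similarity as the metric (the notebook may use a different metric or a vector database), and the patch ids, the toy vectors, and the helper names `cosine_similarity` and `most_similar` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, k=2):
    """Return the k (patch_id, score) pairs most similar to the query patch."""
    query = embeddings[query_id]
    scores = [
        (patch_id, cosine_similarity(query, vector))
        for patch_id, vector in embeddings.items()
        if patch_id != query_id  # do not return the query patch itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical toy embeddings keyed by patch id; real ones come from the model.
patch_embeddings = {
    "patch_0": [1.0, 0.0, 0.0],
    "patch_1": [0.9, 0.1, 0.0],
    "patch_2": [0.0, 1.0, 0.0],
    "patch_3": [0.7, 0.7, 0.0],
}

# "patch_0" stands in for the patch that overlaps a ground truth point.
results = most_similar("patch_0", patch_embeddings, k=2)
```

In the notebook's setting, the query would be the patch whose bounding box contains the ground truth point, and the returned ids would map back to patch geometries.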

@weiji14 @yellowcap ready for when you have time to review.

@weiji14
Contributor

weiji14 commented Mar 11, 2024

Thanks @lillythomas! I haven't looked too closely yet, but would it be possible to show where the similarity search results are located? Maybe something like showing the bounding boxes of all the patches on a map, and also overlay where the original quarry points are.

@lillythomas
Contributor Author

> Thanks @lillythomas! I haven't looked too closely yet, but would it be possible to show where the similarity search results are located? Maybe something like showing the bounding boxes of all the patches on a map, and also overlay where the original quarry points are.

Yes! Great idea. Working on this tomorrow.

Member

@yellowcap yellowcap left a comment

Thanks Lilly, this is great to see! I have not yet had time to get to the sim search part, but will get back to it on Monday. Left some comments for now.

Could you add some more context on what the purpose is? Do you think we are going to apply this to a region to search for certain events / features?

"outputs": [],
"source": [
"mrd = gpd.read_file(\n",
" \"../mineral-resource-detection/training_data/draft_inputs/MRD_dissagregated_25.geojson\"\n",
Member

"outputs": [],
"source": [
"DATA_DIR = \"data/minicubes\"\n",
"CKPT_PATH = \"/home/ubuntu/data/checkpoints/mae_epoch-24_val-loss-0.46.ckpt\"\n",
Member

" box_emb = shapely.geometry.box(box_[0], box_[1], box_[2], box_[3])\n",
"\n",
" # Create the GeoDataFrame\n",
" gdf = gpd.GeoDataFrame(data, geometry=[box_emb], crs=f\"EPSG:{epsg}\")\n",
Member

Why make one file per embedding rather than a single one that contains all of them? This would make everything downstream much easier. Would it be possible to create one before the loop and add rows to it?
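
A rough sketch of the single-table idea, for illustration only. It assumes geopandas and shapely are installed; the record layout, the column names, and the values are hypothetical and not taken from the notebook. As a variation on creating the frame before the loop, it collects plain rows inside the loop and builds the GeoDataFrame once afterwards, which avoids repeatedly growing a frame.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical patch records: (source chip, patch index, bounds, embedding).
patches = [
    ("chip_a", 0, (177.50, -17.78, 177.51, -17.77), [0.1, 0.2]),
    ("chip_a", 1, (177.51, -17.78, 177.52, -17.77), [0.3, 0.4]),
    ("chip_b", 0, (177.52, -17.78, 177.53, -17.77), [0.5, 0.6]),
]

rows = []
geometries = []
for source, idx, bounds, embedding in patches:
    # Accumulate attributes and geometry per patch instead of writing a file.
    rows.append({"source": source, "idx": idx, "embeddings": embedding})
    geometries.append(box(*bounds))

# Build the GeoDataFrame once, after the loop, rather than once per patch.
embeddings_gdf = gpd.GeoDataFrame(rows, geometry=geometries, crs="EPSG:4326")

# A single write would then cover every patch (needs pyarrow installed):
# embeddings_gdf.to_parquet("patch_embeddings.gpq")
```

With one table, the downstream `pd.concat` over per-patch frames would no longer be needed.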

" lambda x: Path(x).stem.rsplit(\"/\")[-1].rsplit(\"_\")[0]\n",
" )\n",
" gdf[\"idx\"] = \"_\".join(emb.split(\"/\")[-1].split(\"_\")[2:]).replace(\".gpq\", \"\")\n",
" gdf[\"box\"] = [gdf.geometry[0].bounds]\n",
Member

This did not work for me; I replaced it with the following to get it to work:

gdf["box"] = [box(*geom.bounds) for geom in gdf.geometry]
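
To make the behaviour of that line concrete, here is a small sketch using shapely only. The two geometries are hypothetical stand-ins for the rows of `gdf.geometry`. Unlike indexing `gdf.geometry[0]`, the comprehension yields one rectangle per row, so it does not depend on the frame having exactly one row.

```python
from shapely.geometry import Point, box

# Hypothetical geometries standing in for the rows of `gdf.geometry`.
geometries = [
    Point(177.505, -17.776).buffer(0.001),
    box(177.50, -17.78, 177.51, -17.77),
]

# `geom.bounds` is (minx, miny, maxx, maxy); `box(*bounds)` turns that tuple
# into a rectangular Polygon covering the geometry's extent.
boxes = [box(*geom.bounds) for geom in geometries]
```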

Contributor Author

Oh interesting. It works on my end. Do you have the traceback?

"# Combine patch level geodataframes into one\n",
"embeddings_gdf = pd.concat(gdfs, ignore_index=True)\n",
"# Make a polygon for each patch level bounding box\n",
"embeddings_gdf[\"bbox\"] = embeddings_gdf[\"box\"].apply(lambda bbox: box(*bbox))"
Member

The lambda function is no longer necessary if the comment above is applied.

},
{
"data": {
"image/png": "
Member

Here I got the following error.

ArrowInvalid: Could not convert <POLYGON ((177.507 -17.778, 177.507 -17.775, 177.504 -17.775, 177.504 -17.77...> with type Polygon: did not recognize Python value type when inferring an Arrow data type

Contributor Author

This was introduced by applying your suggested change in #172 (comment), but I have a way to handle it. Essentially, pyarrow cannot convert a polygon object, so I can store a list equivalent in the table by using the bounds property of the box, i.e. changing "box": row["box"] to "box": row["box"].bounds. I am in favor of doing this, as it also allows us to drop the lambda function as you pointed out.

@@ -790,7 +790,7 @@ def __init__( # noqa: PLR0913
wd=0.05,
b1=0.9,
b2=0.95,
embeddings_level: Literal["mean", "patch", "group"] = "mean",
Member

I would drop this from this PR; it does not seem necessary for the notebook to work.

Contributor Author

True. Will do.

… results and ground truth points on rgb xarray, add descriptions, revert change in model_clay.py
@lillythomas lillythomas changed the title Obtain patch level metadata (e.g. geospatial bounds) and demo DEP use case Obtain patch level metadata (e.g. geospatial bounds and cloud cover), save and demo DEP use case (sim search) Mar 19, 2024
@lillythomas lillythomas force-pushed the patch_level_metadata branch 2 times, most recently from b8cdd25 to a87597e Compare March 22, 2024 19:09
@lillythomas lillythomas merged commit 7bda731 into main Mar 24, 2024
6 checks passed
@lillythomas lillythomas deleted the patch_level_metadata branch March 24, 2024 22:18