# **InferenceVision: Provide Geographic Coordinates from Bounding Boxes**

In contemporary scientific research and applications, there is an increasing demand for accurate geospatial analysis to address various real-world challenges, ranging from environmental monitoring to urban planning and disaster response. The ability to precisely locate and identify objects within geographic areas plays a pivotal role in such endeavors. In this scientific project, we aim to enhance geospatial analysis by integrating object detection techniques with geographic coordinate calculations.

## **Problem Statement**
Traditional methods of geospatial analysis often rely on manual identification and mapping of objects within geographical regions. However, these methods are time-consuming, labor-intensive, and prone to errors. Moreover, they may lack the scalability required for large-scale analyses. Therefore, there is a need for automated solutions that can accurately detect and locate objects within geographic areas, enabling efficient and scalable geospatial analysis.

## **Project Objective**
Our project seeks to address the aforementioned challenges by developing an automated system that combines object detection algorithms with geographic coordinate calculations. By integrating these components, we aim to achieve the following objectives:

<br>

1. **Object Detection:** Utilize state-of-the-art object detection algorithms, such as YOLO (You Only Look Once), to automatically identify and localize objects within satellite or aerial imagery.

2. **Geographic Coordinate Calculation:** Develop algorithms to calculate the geographic coordinates (latitude and longitude) of detected objects relative to a given bounding polygon. This involves converting normalized center coordinates of objects within the bounding polygon to precise geographic coordinates.

3. **Integration and Visualization:** Integrate object detection results with calculated geographic coordinates to create a comprehensive geospatial dataset. Visualize the detected objects and their geographic locations on maps for further analysis and interpretation.
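As a rough illustration of the third objective, detections with geographic coordinates can be packaged as GeoJSON, which most mapping tools can display. This is only a sketch: the function name `detections_to_geojson` and the dictionary keys (`latitude`, `longitude`, `object_type`, `probability`) are hypothetical and are not part of the `InferenceVision` API.

```python
import json
from typing import Iterable, Mapping


def detections_to_geojson(detections: Iterable[Mapping]) -> dict:
    """Wrap detections in a GeoJSON FeatureCollection of Point features.

    The input keys used here are hypothetical, chosen for illustration.
    """
    features = []
    for det in detections:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders positions as [longitude, latitude].
                "coordinates": [det["longitude"], det["latitude"]],
            },
            "properties": {
                "object_type": det["object_type"],
                "probability": det["probability"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Illustrative detection record (hypothetical structure).
sample = [{"latitude": 36.209462, "longitude": 36.152872,
           "object_type": "non_collapsed", "probability": 0.81436}]
geojson = detections_to_geojson(sample)
geojson_text = json.dumps(geojson, indent=2)  # ready to write to a .geojson file
```

The resulting text can be saved to a file and opened in a GIS viewer; note that GeoJSON places longitude before latitude.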

## **Methodology**

In this section, we outline the methodology for deriving geographic coordinates from input data within the InferenceVision framework. The approach combines satellite image analysis, object detection, and geographic coordinate calculation to enable precise geospatial analysis and visualization.

**Given a set of inputs, the calculation unfolds as follows:**

<br>

**1- Transform VHR Satellite Image Coordinates to WGS 84 (EPSG:4326) and Extract Polygon Coordinates:** The target Coordinate Reference System (CRS) is WGS 84 (EPSG:4326), a geographic coordinate system that expresses positions as latitude and longitude. Converting every image to this CRS standardizes the data. Reprojection uses nearest-neighbor interpolation, which can give the transformed image a blocky appearance. Transformed coordinates are precise to 6 decimal places. First, we transform the image coordinates to WGS 84. Then, we extract the polygon coordinates that define the image's geographical extent: its top-left and bottom-right corners, which serve as reference points for computing the geographic coordinates of the normalized centers.

<br>

$$ \text{Polygon coordinates} = \{(lat_{top \space left}, lon_{top \space left}), (lat_{bottom \space right}, lon_{bottom \space right})\} $$
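The corner extraction can be sketched as follows, assuming the raster bounds have already been transformed to EPSG:4326 (for example with a reprojection library). The names `Polygon` and `polygon_from_bounds` are hypothetical, and rounding to 6 decimal places is an assumption based on the precision stated above; how `InferenceVision` does this internally is not shown here.

```python
from typing import NamedTuple, Tuple


class Polygon(NamedTuple):
    """Top-left and bottom-right corners as (lat, lon) pairs in WGS 84."""
    top_left: Tuple[float, float]
    bottom_right: Tuple[float, float]


def polygon_from_bounds(left: float, bottom: float, right: float, top: float,
                        decimals: int = 6) -> Polygon:
    """Build the two reference corners from bounds already in EPSG:4326.

    In geographic bounds, left/right are longitudes and bottom/top are latitudes.
    """
    return Polygon(
        top_left=(round(top, decimals), round(left, decimals)),
        bottom_right=(round(bottom, decimals), round(right, decimals)),
    )


# Hypothetical bounds, for illustration only.
poly = polygon_from_bounds(left=20.0000004, bottom=9.0, right=21.0, top=10.1234567)
print(poly.top_left)      # (10.123457, 20.0)
print(poly.bottom_right)  # (9.0, 21.0)
```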

<br>

**2- Calculate Normalized Centers:** Retrieve the pixel coordinates of each bounding box and compute its center. Then normalize each center by the image size, dividing the x-coordinate by the image width and the y-coordinate by the image height. The normalized centers represent the relative positions of objects within the defined polygon and are stored as a matrix with one row $(y_{norm}, x_{norm})$ per center.
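A minimal sketch of this step, assuming each bounding box is given as `[x_min, y_min, x_max, y_max]` in pixels. The function name `normalized_center` is hypothetical; the box values are the first detection from the sample output in the usage section.

```python
from typing import Sequence, Tuple


def normalized_center(box_xyxy: Sequence[float], image_width: int,
                      image_height: int) -> Tuple[float, float]:
    """Return (y_norm, x_norm) for a box given as [x_min, y_min, x_max, y_max]."""
    x_min, y_min, x_max, y_max = box_xyxy
    # Center of the box in pixel coordinates.
    x_center = (x_min + x_max) / 2.0
    y_center = (y_min + y_max) / 2.0
    # Scale to the [0, 1] range using the image size.
    return y_center / image_height, x_center / image_width


box = [23.909339904785156, 87.95037841796875,
       101.81745147705078, 148.59400939941406]
y_norm, x_norm = normalized_center(box, image_width=640, image_height=640)
print(x_norm, y_norm)
```

For this box the result agrees with the normalized center reported for Point 0 in the sample results (approximately 0.0982 for x and 0.1848 for y).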

<br>

<br>

**3- Calculate Geographic Coordinates:** Geographic coordinates are obtained by linear interpolation between the polygon corners. For each normalized center in the image:

<br>

$$ lat = lat_{top \space left} + (lat_{bottom \space right} - lat_{top \space left}) \times y_{norm} $$

$$ lon = lon_{top \space left} + (lon_{bottom \space right} - lon_{top \space left}) \times x_{norm} $$

   
   **Where:**

   - $lat$ represents latitude.
   - $lon$ represents longitude.
   - $y_{norm}$ and $x_{norm}$ are the normalized center coordinates.
   - $lat_{top \space left}, lon_{top \space left}, lat_{bottom \space right},$ and $lon_{bottom \space right}$ are the latitude and longitude of the top-left and bottom-right corners of the polygon, respectively.
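The two equations translate directly into code. The sketch below uses a hypothetical function name, `to_geographic`, and made-up corner coordinates chosen so the arithmetic is easy to check by hand.

```python
from typing import Tuple

LatLon = Tuple[float, float]


def to_geographic(y_norm: float, x_norm: float,
                  top_left: LatLon, bottom_right: LatLon) -> LatLon:
    """Interpolate linearly between the polygon corners."""
    lat_tl, lon_tl = top_left
    lat_br, lon_br = bottom_right
    lat = lat_tl + (lat_br - lat_tl) * y_norm
    lon = lon_tl + (lon_br - lon_tl) * x_norm
    return lat, lon


# Hypothetical corners: latitude decreases downward, longitude increases rightward.
top_left = (10.0, 20.0)
bottom_right = (9.0, 21.0)
lat, lon = to_geographic(y_norm=0.25, x_norm=0.5,
                         top_left=top_left, bottom_right=bottom_right)
print(lat, lon)  # 9.75 20.5
```

A center at $(0, 0)$ maps to the top-left corner and a center at $(1, 1)$ maps to the bottom-right corner, which is a quick sanity check for any implementation.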

## **Scientific Significance**
The proposed project has several scientific implications and contributions:

- **Automation and Efficiency:** By automating the process of object detection and geographic coordinate calculation, our system significantly reduces the time and effort required for geospatial analysis, thereby enhancing efficiency and scalability.

- **Accuracy and Precision:** By pairing an advanced detection algorithm with a direct coordinate calculation, our system aims for high accuracy and precision in both object detection and geographic coordinate calculation, leading to reliable results.

- **Versatility and Adaptability:** The developed system is versatile and adaptable to various applications, including environmental monitoring, urban planning, agriculture, and disaster response. It provides researchers and practitioners with a powerful tool for analyzing geospatial data in diverse contexts.

## **Conclusion**
This calculation shows how geographic coordinates are derived from the given inputs, a pivotal step within the `InferenceVision` framework. Normalized center coordinates, which are relative positions within a bounding area, are converted into latitude and longitude values that pinpoint locations on Earth's surface and can be plotted on a map for analysis and visualization.

In conclusion, this project aims to advance geospatial analysis by combining object detection with geographic coordinate calculation, giving researchers and practitioners an efficient, accurate, and versatile solution for complex geospatial challenges.

# **1. How to Use**

To utilize the `InferenceVision` class for object detection and geographic coordinate calculation, follow these steps:

## **1.1. Clone and Install**

In [None]:
!git clone https://github.com/doguilmak/InferenceVision.git
%cd InferenceVision
!pip install -r requirements.txt -q

## **1.2. Initialization**

Initialize an instance of the `InferenceVision` class by providing the following parameters:
- `tif_path`: Path to the input TIFF image file.
- `model_path`: Path to the YOLO model file for object detection.
- `image_width`: Width of the input image.
- `image_height`: Height of the input image.

In [2]:
from inference_vision import InferenceVision

print(f"InferenceVision version: {InferenceVision.VERSION}")

InferenceVision version: 1.0


**NOTE: The input image must have a CRS set to ensure accurate geographic coordinate calculation.**

In [3]:
width = 640
height = 640

inference = InferenceVision(tif_path="/content/image.tif", # Path to your image.
                            model_path="/content/model.pt", # Path to your model.
                            image_width=width,
                            image_height=height)

## **1.3. Process Image**

Invoke the `process_image` method to perform object detection and geographic coordinate calculation on the input image. Running on a GPU can speed up inference.

In [6]:
inference.process_image(build_csv=True, csv_filename="output.csv")


image 1/1 /content/image.tif: 640x640 5 collapseds, 10 non_collapseds, 5012.9ms
Speed: 9.2ms preprocess, 5012.9ms inference, 1854.9ms postprocess per image at shape (1, 3, 640, 640)


DataFrame saved as output.csv


This method detects objects in the input image, calculates their geographic coordinates, and optionally saves the results to a CSV file.

In [7]:
# Prints results to console
inference.process_image(build_csv=False)


image 1/1 /content/image.tif: 640x640 5 collapseds, 10 non_collapseds, 4596.2ms
Speed: 2.5ms preprocess, 4596.2ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)
------------------------------
Point 0
Latitude: 36.209461546 - Longitude: 36.152872433 (36.209461546, 36.152872433)
Object type: non_collapsed
Object type: 1.0
Coordinates: [23.909339904785156, 87.95037841796875, 101.81745147705078, 148.59400939941406]
Probability: 0.814360499382019
Bounding Box Center: [308.854736328125, 425.17608642578125]
Normalized Bounding Box Center: [0.09822405576705932, 0.18480030298233033]
------------------------------
Point 1
Latitude: 36.208325861 - Longitude: 36.153735035 (36.208325861, 36.153735035)
Object type: collapsed
Object type: 0.0
Coordinates: [267.8592529296875, 502.86468505859375, 352.1070556640625, 585.2774047851562]
Probability: 0.7815790772438049
Bounding Box Center: [308.854736328125, 425.17608642578125]
Normalized Bounding Box Center: [0.4843486785888672, 0.85011

## **1.4. Interpret Results**

Once image processing is complete, you can interpret the results either by printing them to the console or by analyzing the generated CSV file. Now, let's load the CSV file into a DataFrame.

In [8]:
import pandas as pd

df = pd.read_csv("output.csv")

Analyze and visualize the results.

In [9]:
df.head()

| Unnamed: 0 | Image | Point | Latitude | Longitude | Object Type | Coordinates | Probability | Bounding Box Center | Normalized Bounding Box Center |
|---|---|---|---|---|---|---|---|---|---|
| 0 | /content/image.tif | 0 | 36.209462 | 36.152872 | non_collapsed | [23.909339904785156, 87.95037841796875, 101.81... | 0.81436 | [62.86339569091797, 118.2721939086914] | [0.09822405576705932, 0.18480030298233033] |
| 1 | /content/image.tif | 1 | 36.208326 | 36.153735 | collapsed | [267.8592529296875, 502.86468505859375, 352.10... | 0.781579 | [309.983154296875, 544.071044921875] | [0.4843486785888672, 0.8501110076904297] |
| 2 | /content/image.tif | 2 | 36.208898 | 36.153387 | non_collapsed | [172.59658813476562, 296.422607421875, 247.739... | 0.774597 | [210.16781616210938, 329.7408447265625] | [0.3283872127532959, 0.5152200698852539] |
| 3 | /content/image.tif | 3 | 36.208667 | 36.153118 | non_collapsed | [93.84264373779297, 380.179931640625, 172.4491... | 0.767572 | [133.14590072631836, 416.28009033203125] | [0.20804046988487243, 0.6504376411437989] |
| 4 | /content/image.tif | 4 | 36.209662 | 36.153169 | non_collapsed | [107.0134048461914, 9.381965637207031, 188.354... | 0.763969 | [147.68405532836914, 43.02626037597656] | [0.23075633645057678, 0.06722853183746338] |
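One simple way to analyze these results is to count detections per object type above a confidence threshold. The sketch below is illustrative only: `summarize` is a hypothetical helper, and it works on plain dictionaries keyed by the column names shown above, using the five rows from the preview as sample input.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def summarize(records: Iterable[Mapping], min_probability: float = 0.0) -> Dict[str, int]:
    """Count detections per object type at or above a probability threshold."""
    kept = [r for r in records if float(r["Probability"]) >= min_probability]
    return dict(Counter(r["Object Type"] for r in kept))


# The five rows shown in the preview, reduced to the columns used here.
records = [
    {"Point": 0, "Object Type": "non_collapsed", "Probability": 0.81436},
    {"Point": 1, "Object Type": "collapsed", "Probability": 0.781579},
    {"Point": 2, "Object Type": "non_collapsed", "Probability": 0.774597},
    {"Point": 3, "Object Type": "non_collapsed", "Probability": 0.767572},
    {"Point": 4, "Object Type": "non_collapsed", "Probability": 0.763969},
]
counts = summarize(records, min_probability=0.77)
print(counts)  # {'non_collapsed': 2, 'collapsed': 1}
```

With the DataFrame loaded earlier, the same helper can be called as `summarize(df.to_dict("records"), min_probability=0.77)`.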


**Debugging:** In case of any errors or unexpected behavior during image processing, carefully review the input data, model configuration, and method calls. Use debugging tools such as print statements, logging, or interactive debugging to identify and resolve issues.

<br>

**Future Improvements:** Consider incorporating additional features or enhancements to further optimize the performance and usability of the `InferenceVision` class. Potential improvements may include support for alternative object detection models, integration with other geospatial libraries, or optimization of computational efficiency.

<br>
<hr>

*The library will be available as a package on PyPI (Python Package Index).*