<img src="https://raw.githubusercontent.com/doguilmak/InferenceVision/main/assets/Inference%20Vision%20Cover.png" alt="github.com/doguilmak/InferenceVision"/>

In contemporary scientific research and applications, there is an increasing demand for accurate geospatial analysis to address various real-world challenges, ranging from environmental monitoring to urban planning and disaster response. The ability to precisely locate and identify objects within geographic areas plays a pivotal role in such endeavors. In this scientific project, we aim to enhance geospatial analysis by integrating object detection techniques with geographic coordinate calculations.

## Problem Statement

Traditional methods of geospatial analysis often rely on manual identification and mapping of objects within geographical regions. However, these methods are time-consuming, labor-intensive, and prone to errors. Moreover, they may lack the scalability required for large-scale analyses. Therefore, there is a need for automated solutions that can accurately detect and locate objects within geographic areas, enabling efficient and scalable geospatial analysis.

## Project Objective

Our project seeks to address the aforementioned challenges by developing an automated system that combines object detection algorithms with geographic coordinate calculations. By integrating these components, we aim to achieve the following objectives:

<br>

1. **Object Detection:** Utilize state-of-the-art object detection algorithms, such as YOLO (You Only Look Once), to automatically identify and localize objects within satellite or aerial imagery.

2. **Geographic Coordinate Calculation:** Develop algorithms to calculate the geographic coordinates (latitude and longitude) of detected objects relative to a given bounding polygon. This involves converting normalized center coordinates of objects within the bounding polygon to precise geographic coordinates.

3. **Integration and Visualization:** Integrate object detection results with calculated geographic coordinates to create a comprehensive geospatial dataset. Visualize the detected objects and their geographic locations on maps for further analysis and interpretation.

## Methodology

This section outlines how the InferenceVision framework derives geographic coordinates from its inputs. The approach combines satellite image analysis, object detection, and geographic coordinate calculation to support precise geospatial analysis and visualization.

**Given a set of inputs, the calculation unfolds as follows:**

<br>

**1- Transform VHR Satellite Image Coordinates to WGS 84 (EPSG:4326) and Extract Polygon Coordinates:** The target Coordinate Reference System (CRS) is WGS 84, a geographic coordinate system; converting to it standardizes the data. Resampling uses nearest-neighbor interpolation, which can give the output a blocky appearance. Transformed coordinates are reported to 9 decimal places by default (set through the `coord_precision` parameter). First, the image coordinates are transformed to WGS 84. The polygon coordinates are denoted $G$ (geometric shapes), and the following transformation converts them to the geographic coordinate system:

<br>

$$ G_{\mathrm{EPSG:4326}} = \mathrm{transform}(G_{\mathrm{dataset}}, \mathrm{CRS}_{\mathrm{dataset}}) $$

<br>

Then, we extract polygon coordinates, defining the geographical extent with top-left ($TL$) and bottom-right ($BR$) corners as reference points for computing the geographic coordinates of normalized centers.
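
The corner extraction can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes the polygon has already been transformed to WGS 84, is supplied as a list of `(lon, lat)` vertices, and describes a north-up, axis-aligned footprint. The helper name `extract_corners` and the sample ring are hypothetical.

```python
from typing import Iterable, Tuple

Corner = Tuple[float, float]  # (lat, lon)


def extract_corners(ring: Iterable[Tuple[float, float]]) -> Tuple[Corner, Corner]:
    """Return (top_left, bottom_right) as (lat, lon) from (lon, lat) vertices.

    Assumption: a north-up, axis-aligned footprint that does not cross
    the antimeridian.
    """
    lons, lats = zip(*ring)
    top_left = (max(lats), min(lons))      # northernmost latitude, westernmost longitude
    bottom_right = (min(lats), max(lons))  # southernmost latitude, easternmost longitude
    return top_left, bottom_right


# Hypothetical closed ring, for illustration only
ring = [(20.0, 10.0), (24.0, 10.0), (24.0, 8.0), (20.0, 8.0), (20.0, 10.0)]
tl, br = extract_corners(ring)
print(tl, br)
```

For a rotated or skewed footprint, the min/max approach returns the corners of the enclosing box rather than of the image itself, so the interpolation in step 3 would only be approximate.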

<br>

<br>

**2- Calculate Normalized Centers:** In the second stage, the model runs inference on the image and produces detections. The center coordinates of each detected object are then calculated from its bounding box, using the edge coordinates ($x_{min}$, $y_{min}$, $x_{max}$, $y_{max}$). $x_{min}$ and $x_{max}$ are the pixel coordinates of the left and right edges of the bounding box on the x-axis, and $y_{min}$ and $y_{max}$ are the pixel coordinates of the top and bottom edges on the y-axis. The center point of the object is given by:

<br>

$$ (x_{center}, y_{center}) = \left(\frac{x_{min} + x_{max}}{2}, \frac{y_{min} + y_{max}}{2}\right) $$

<br>

The centroids of the bounding boxes are then normalized, which is necessary to convert the pixel coordinates of object locations within the image into a standard format. Normalization can be expressed as:

<br>

$$N_x = \frac{x_{center}}{W}$$

$$N_y = \frac{y_{center}}{H}$$

Where:
- **$N_x$**: The value of the normalized pixel coordinate of the center point along the x-axis.
- **$N_y$**: The value of the normalized pixel coordinate of the center point along the y-axis.
- **$x_{center}$**: The x pixel coordinate of the center of the bounding box.
- **$y_{center}$**: The y pixel coordinate of the center of the bounding box.
- **$W$**: The total width of the raster image.
- **$H$**: The total height of the raster image.
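
The two formulas above can be checked with a short sketch. The helper names `bbox_center` and `normalize_center` are hypothetical and stand apart from the `InferenceVision` class. The bounding box is taken from "Point 0" in the sample output of Section 1.3, and the raster size of 640 × 640 pixels is an assumption that happens to reproduce the normalized values printed there.

```python
from typing import Sequence, Tuple


def bbox_center(box: Sequence[float]) -> Tuple[float, float]:
    """Center of a bounding box given as [x_min, y_min, x_max, y_max] in pixels."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2, (y_min + y_max) / 2


def normalize_center(center: Tuple[float, float], width: float, height: float) -> Tuple[float, float]:
    """Scale a pixel center to the [0, 1] range using the raster width and height."""
    x_center, y_center = center
    return x_center / width, y_center / height


# Bounding box of "Point 0" from the sample output
box = [23.909332275390625, 87.95037841796875, 101.81744384765625, 148.59400939941406]
center = bbox_center(box)

# Assumed raster size: 640 x 640 pixels
n_x, n_y = normalize_center(center, width=640, height=640)
print(center, (n_x, n_y))
```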

<br>

<br>

**3- Calculate Geographic Coordinates:** Finally, the geographic coordinates are calculated from the normalized center coordinates. The corner coordinates of the extracted polygon serve as the reference: each normalized value is used to interpolate linearly between the top-left and bottom-right corners, which gives the object's location within the geographic area. The calculation uses the following formulas:

<br>

$$ lat = lat_{TL} + (lat_{BR} - lat_{TL}) \times N_{y} $$

$$ lon = lon_{TL} + (lon_{BR} - lon_{TL}) \times N_{x} $$

   
   **Where:**

   - $lat$ represents latitude.
   - $lon$ represents longitude.
   - $N_{x}$ and $N_{y}$ are the normalized center coordinates.
   - $lat_{TL}, lon_{TL}, lat_{BR},$ and $lon_{BR}$ are the latitude and longitude of the top-left and bottom-right corners of the polygon, respectively.
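
A minimal sketch of this interpolation is given below. It assumes a north-up image, so the vertical offset $N_y$ moves the point from the top-left latitude toward the bottom-right latitude, and the horizontal offset $N_x$ does the same for longitude. This pairing agrees with the sample output in Section 1.3, where latitude decreases as the normalized y value grows. The function name `to_geographic`, the corner values, and the use of `round` to mimic `coord_precision` are all illustrative assumptions.

```python
from typing import Tuple


def to_geographic(
    n_x: float,
    n_y: float,
    top_left: Tuple[float, float],
    bottom_right: Tuple[float, float],
    precision: int = 9,
) -> Tuple[float, float]:
    """Linearly interpolate (lat, lon) between two corners given as (lat, lon).

    Assumption: north-up image, so n_y scales latitude and n_x scales longitude.
    """
    lat_tl, lon_tl = top_left
    lat_br, lon_br = bottom_right
    lat = lat_tl + (lat_br - lat_tl) * n_y
    lon = lon_tl + (lon_br - lon_tl) * n_x
    return round(lat, precision), round(lon, precision)


# Hypothetical corners, for illustration only
lat, lon = to_geographic(0.25, 0.5, top_left=(10.0, 20.0), bottom_right=(8.0, 24.0))
print(lat, lon)
```

A point at the image center (`n_x = n_y = 0.5`) lands midway between the two corners, which is a quick sanity check for any implementation of this step.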

## Scientific Significance
The proposed project has several scientific implications and contributions:

- **Automation and Efficiency:** By automating the process of object detection and geographic coordinate calculation, our system significantly reduces the time and effort required for geospatial analysis, thereby enhancing efficiency and scalability.

- **Accuracy and Precision:** By pairing a modern object detector with a deterministic coordinate calculation, the system aims for accurate detections and precise coordinates. The reliability of the results depends on the quality of the trained model and on the georeferencing of the input image.

- **Versatility and Adaptability:** The developed system is versatile and adaptable to various applications, including environmental monitoring, urban planning, agriculture, and disaster response. It provides researchers and practitioners with a powerful tool for analyzing geospatial data in diverse contexts.

<br>

## 1. How to Use

If you are using a Colab environment, make sure your runtime is **GPU** (_not_ CPU or TPU) and, if the option is offered, that you are using _Python 3_. You can select these settings under `Runtime -> Change runtime type`: choose the settings mentioned above and press SAVE. To use the `InferenceVision` class for object detection and geographic coordinate calculation, follow these steps:

<br>

### 1.1. Clone and Install

In [None]:
!git clone https://github.com/doguilmak/InferenceVision.git
%cd InferenceVision

Cloning into 'InferenceVision'...
remote: Enumerating objects: 475, done.
remote: Counting objects: 100% (367/367), done.
remote: Compressing objects: 100% (266/266), done.
remote: Total 475 (delta 226), reused 171 (delta 97), pack-reused 108 (from 1)
Receiving objects: 100% (475/475), 20.74 MiB | 24.66 MiB/s, done.
Resolving deltas: 100% (267/267), done.
/content/InferenceVision


In [None]:
!pip install -r requirements.txt -q


<br>

### 1.2. Initialization

Initialize an instance of the `InferenceVision` class by providing the following parameters:
- `tif_path`: Path to the input TIFF image file.
- `model_path`: Path to the YOLO model file for object detection.

In [None]:
from inference_vision import InferenceVision

print(f"InferenceVision version: {InferenceVision.VERSION}")

InferenceVision version: 1.2


In [None]:
help(InferenceVision)

Help on class InferenceVision in module inference_vision:

class InferenceVision(builtins.object)
 |  InferenceVision(tif_path, model_path, coord_precision=9)
 |  
 |  Methods defined here:
 |  
 |  __init__(self, tif_path, model_path, coord_precision=9)
 |      Initialize an InferenceVision instance with configurable precision.
 |      
 |      Parameters
 |      ----------
 |      tif_path : str
 |          The file path to the TIFF image to be processed.
 |      model_path : str
 |          The file path to the YOLO model weights.
 |      coord_precision : int, optional
 |          The number of decimal places to use for geographic coordinates. Default is 9.
 |      
 |      Returns
 |      -------
 |      None
 |          This method initializes the object and does not return any value.
 |      
 |      Example
 |      -------
 |      >>> iv = InferenceVision("image.tif", "model.pt", coord_precision=6)
 |  
 |  calculate_bbox_center(self, coordinates)
 |      Calculate the center o

**NOTE: The input image must have a CRS set to ensure accurate geographic coordinate calculation.**

In [None]:
iv = InferenceVision(
    tif_path="/content/image.tif",  # Path to your image.
    model_path="/content/model.pt",  # Path to your model.
    # coord_precision=6,  # Number of decimal places for geographic coordinates. Default is 9.
)

<br>

### 1.3. Process Image

Invoke the `process_image` method to perform object detection and geographic coordinate calculation on the input image. Using GPU can speed up the process.

In [None]:
# Process the image and save results to a CSV file
csv_filename = "results.csv"
iv.process_image(build_csv=True, csv_filename=csv_filename)


image 1/1 /content/image.tif: 640x640 5 collapseds, 10 non_collapseds, 82.1ms
Speed: 7.5ms preprocess, 82.1ms inference, 1857.0ms postprocess per image at shape (1, 3, 640, 640)

DataFrame saved as results.csv


This method detects objects in the input image, calculates their geographic coordinates, and optionally saves the results to a CSV file.

In [None]:
# Process the image and print results to the console
iv.process_image()


image 1/1 /content/image.tif: 640x640 5 collapseds, 10 non_collapseds, 96.3ms
Speed: 1.6ms preprocess, 96.3ms inference, 1.6ms postprocess per image at shape (1, 3, 640, 640)

Point 0 --------------------
Latitude: 36.209461282 | Longitude: 36.152872430
Object Type: non_collapsed
Coordinates (Bounding Box): [23.909332275390625, 87.95037841796875, 101.81744384765625, 148.59400939941406]
Confidence Score: 0.8144
Bounding Box Center (X, Y): (62.86338806152344, 118.2721939086914)
Normalized Bounding Box Center (X, Y): [0.09822404384613037, 0.18480030298233033]

Point 1 --------------------
Latitude: 36.208325574 | Longitude: 36.153734978
Object Type: collapsed
Coordinates (Bounding Box): [267.8592529296875, 502.86468505859375, 352.1070556640625, 585.2774047851562]
Confidence Score: 0.7816
Bounding Box Center (X, Y): (309.983154296875, 544.071044921875)
Normalized Bounding Box Center (X, Y): [0.4843486785888672, 0.8501110076904297]

Point 2 --------------------
Latitude: 36.208897244 | Lon

<br>

### 1.4. Interpret Results

Once image processing is complete, you can interpret the results either by printing them to the console or by analyzing the generated CSV file. Here, the CSV file is loaded into a DataFrame.

In [None]:
import pandas as pd

df = pd.read_csv("results.csv")

Inspect the first rows of the results.

In [None]:
df.head()

| Unnamed: 0 | Image | Point | Latitude | Longitude | Object Type | Coordinates | Confidence Score | Bounding Box Center | Normalized Bounding Box Center |
|---|---|---|---|---|---|---|---|---|---|
| 0 | /content/image.tif | 0 | 36.209461 | 36.152872 | non_collapsed | [23.909332275390625, 87.95037841796875, 101.81... | 0.81436 | [62.86338806152344, 118.2721939086914] | [0.09822404384613037, 0.18480030298233033] |
| 1 | /content/image.tif | 1 | 36.208326 | 36.153735 | collapsed | [267.8592529296875, 502.86468505859375, 352.10... | 0.781579 | [309.983154296875, 544.071044921875] | [0.4843486785888672, 0.8501110076904297] |
| 2 | /content/image.tif | 2 | 36.208897 | 36.153387 | non_collapsed | [172.59658813476562, 296.422607421875, 247.739... | 0.774597 | [210.16781616210938, 329.7408447265625] | [0.3283872127532959, 0.5152200698852539] |
| 3 | /content/image.tif | 3 | 36.208666 | 36.153118 | non_collapsed | [93.84264373779297, 380.179931640625, 172.4491... | 0.767572 | [133.14590072631836, 416.28009033203125] | [0.20804046988487243, 0.6504376411437989] |
| 4 | /content/image.tif | 4 | 36.209662 | 36.153168 | non_collapsed | [107.0134048461914, 9.3819580078125, 188.35470... | 0.763968 | [147.68405532836914, 43.0262565612793] | [0.23075633645057678, 0.0672285258769989] |
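
As one possible next step, the sketch below counts detections per class above a confidence threshold and collects their coordinates for mapping. It uses only the standard library so that it runs without the notebook environment. The column names come from the output above; the inline sample is an excerpt with a subset of those columns, and the `summarize` helper and the 0.77 threshold are hypothetical choices.

```python
import csv
import io
from collections import Counter

# Excerpt of results.csv (subset of columns), copied from the output above
sample_csv = """Point,Latitude,Longitude,Object Type,Confidence Score
0,36.209461,36.152872,non_collapsed,0.81436
1,36.208326,36.153735,collapsed,0.781579
2,36.208897,36.153387,non_collapsed,0.774597
3,36.208666,36.153118,non_collapsed,0.767572
4,36.209662,36.153168,non_collapsed,0.763968
"""


def summarize(rows, min_confidence=0.0):
    """Count detections per object type and collect (lat, lon) pairs above a threshold."""
    kept = [r for r in rows if float(r["Confidence Score"]) >= min_confidence]
    counts = Counter(r["Object Type"] for r in kept)
    points = [(float(r["Latitude"]), float(r["Longitude"])) for r in kept]
    return counts, points


rows = list(csv.DictReader(io.StringIO(sample_csv)))
counts, points = summarize(rows, min_confidence=0.77)
print(dict(counts))
print(points)
```

To run the same summary on the real file, replace the `io.StringIO(sample_csv)` source with `open("results.csv", newline="")`.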


## Conclusion
This calculation shows how geographic coordinates are derived from the given inputs, a pivotal step within the `InferenceVision` framework. Geographic coordinates, namely latitude and longitude, are indispensable for pinpointing specific locations on Earth's surface. The process outlined here converts normalized center coordinates, which are relative values within a bounding area, into coordinates that can be placed on a map for analysis and visualization.

In conclusion, this project aims to advance geospatial analysis by combining object detection with geographic coordinate calculation, giving researchers and practitioners an efficient, accurate, and versatile tool for addressing complex geospatial challenges.

<br>

**Debugging:** In case of any errors or unexpected behavior during image processing, carefully review the input data, model configuration, and method calls. Use debugging tools such as print statements, logging, or interactive debugging to identify and resolve issues.

<br>

**Future Improvements:** Consider incorporating additional features or enhancements to further optimize the performance and usability of the `InferenceVision` class. Potential improvements may include support for alternative object detection models, integration with other geospatial libraries, or optimization of computational efficiency.

<br>
<hr>

*Library will be available as a package on PyPI (Python Package Index).*