Georeferencing 


**Q3.a** Find each bounding box's center coordinate (longitude, latitude) (Hint: use the information provided in the README file). Explain your method and show the verification with Google Maps Satellite view that your identification is correct. 

# **Procedure for Finding Bounding Box Center Coordinates**

## **Step 1: Extract Tile Information from Filename**
Each label file follows a naming convention that encodes **tile_id, x0, and y0**. 
To extract these values from the filename:
1. Use a regular expression to search for the pattern: `_native_(\d+)__x0_(\d+)_y0_(\d+)`

2. Extract:
- `tile_id`: The ID of the tile.
- `x0`: The x-offset of the chip within the tile, in pixels.
- `y0`: The y-offset of the chip within the tile, in pixels.
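
As a minimal sketch of this step, the pattern can be applied as shown below. The filename is hypothetical and only illustrates the naming convention, and `parse_tile_info` is an illustrative helper name rather than part of the assignment code.

```python
import re

# Pattern taken from the procedure: tile id, then x and y offsets
PATTERN = re.compile(r"_native_(\d+)__x0_(\d+)_y0_(\d+)")

def parse_tile_info(filename):
    """Return (tile_id, x0, y0) as integers, or raise if the name does not match."""
    m = PATTERN.search(filename)
    if m is None:
        raise ValueError(f"Unexpected label filename: {filename}")
    tile_id, x0, y0 = (int(g) for g in m.groups())
    return tile_id, x0, y0

# Hypothetical filename, used only for illustration
example_name = "solarpanels_native_1__x0_0_y0_6845_dxdy_416.txt"
print(parse_tile_info(example_name))  # (1, 0, 6845)
```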

---

## **Step 2: Define Geotransform Parameters**
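Each tile has predefined **geotransform parameters** obtained from the **README file**, given as `(x_origin, x_pixel_size, y_origin, y_pixel_size)` in UTM metres:

- Tile 1: `(307670.04, 0.31, 5434427.10, -0.31)`
- Tile 2: `(312749.08, 0.31, 5403952.86, -0.31)`
- Tile 3: `(312749.08, 0.31, 5363320.54, -0.31)`

These parameters convert **pixel coordinates** to **geographic coordinates**.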

---

## **Step 3: Read Label Files and Parse YOLO Bounding Box Values**
Each label file follows the **YOLO format**, with one bounding box per line: `class x_center y_center width height`

For each line in the file:
1. Ignore lines that do not have exactly **five values**.
2. Extract:
- **x_center (cx_norm)**: Normalized x-coordinate of the bounding box center.
- **y_center (cy_norm)**: Normalized y-coordinate of the bounding box center.
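
The parsing rule above can be sketched as a small function. The sample line is made up for illustration, and `parse_yolo_line` is a hypothetical helper name.

```python
def parse_yolo_line(line):
    """Parse one YOLO label line; return None if it does not have five fields."""
    parts = line.split()
    if len(parts) != 5:
        return None
    class_id = int(parts[0])
    cx_norm, cy_norm, w_norm, h_norm = map(float, parts[1:])
    return class_id, cx_norm, cy_norm, w_norm, h_norm

# Made-up label line: class 0, center at (0.5, 0.25), size 0.1 x 0.05
sample = "0 0.5 0.25 0.1 0.05"
print(parse_yolo_line(sample))   # (0, 0.5, 0.25, 0.1, 0.05)
print(parse_yolo_line("0 0.5"))  # None (malformed line is skipped)
```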

---

## **Step 4: Convert Normalized YOLO Coordinates to Tile Pixel Coordinates**
Since YOLO values are **normalized (0-1)** and each **tile size is 416 × 416 pixels**,  
convert **normalized center coordinates** to **absolute tile pixel values**:
$$
cx = cx_{norm} \times 416
$$
$$
cy = cy_{norm} \times 416
$$

Calculate **full tile pixel coordinates**:
$$
px_{\text{full}} = x0 + cx
$$
$$
py_{\text{full}} = y0 + cy
$$
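
A short worked example of this conversion follows. The normalized center and the chip offsets are assumed values chosen for easy arithmetic, and `to_full_pixels` is a hypothetical helper name.

```python
CHIP_SIZE = 416  # native chip size in pixels, as stated in this step

def to_full_pixels(cx_norm, cy_norm, x0, y0, chip_size=CHIP_SIZE):
    """Map a normalized, chip-relative center to full-tile pixel coordinates."""
    cx = cx_norm * chip_size  # pixels from the chip's left edge
    cy = cy_norm * chip_size  # pixels from the chip's top edge
    return x0 + cx, y0 + cy

# Assumed example: center (0.5, 0.25) in a chip whose offset is (832, 416)
px_full, py_full = to_full_pixels(0.5, 0.25, x0=832, y0=416)
print(px_full, py_full)  # 1040.0 520.0
```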

---

## **Step 5: Convert Tile Pixel Coordinates to UTM Coordinates**
Each tile has a geotransform mapping pixel coordinates to UTM coordinates.  
Using the geotransform:
$$
\text{easting} = x_{\text{origin}} + px_{\text{full}} \times x_{\text{pixel\_size}}
$$
$$
\text{northing} = y_{\text{origin}} + py_{\text{full}} \times y_{\text{pixel\_size}}
$$
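
To make the formulas concrete, the sketch below applies the tile 1 geotransform used in the code cell of this notebook to an assumed pixel position. `pixel_to_utm` is a hypothetical helper name. The y pixel size is negative because pixel rows increase downward while northing increases upward.

```python
def pixel_to_utm(px_full, py_full, geotransform):
    """Apply (x_origin, x_pixel_size, y_origin, y_pixel_size) to a pixel position."""
    x_origin, x_pixel_size, y_origin, y_pixel_size = geotransform
    easting = x_origin + px_full * x_pixel_size
    northing = y_origin + py_full * y_pixel_size
    return easting, northing

# Tile 1 parameters as listed in the notebook's geotransforms dictionary
TILE_1 = (307670.04, 0.31, 5434427.10, -0.31)

# Assumed pixel position (1040, 520) within the full tile
easting, northing = pixel_to_utm(1040.0, 520.0, TILE_1)
print(round(easting, 2), round(northing, 2))  # approximately 307992.44 and 5434265.90
```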

---

## **Step 6: Convert UTM Coordinates to WGS84 (Latitude, Longitude)**
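Using the **pyproj** library, convert the **UTM (easting, northing)** coordinates to **latitude, longitude**. With pyproj's default axis order, a transformer targeting EPSG:4326 returns values as (latitude, longitude):
$$
\text{lat}, \text{lon} = \text{transformer.transform}(\text{easting}, \text{northing})
$$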
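
Axis order is easy to get wrong here, so the sketch below compares the default transformer with one built using `always_xy=True`. It assumes pyproj is installed. The variable names are illustrative, and the input is the tile 1 origin from the notebook's geotransform dictionary.

```python
from pyproj import Transformer

# Default axis order: EPSG:4326 output is (latitude, longitude)
default_tf = Transformer.from_crs("EPSG:32633", "EPSG:4326")

# always_xy=True forces (x, y) order, so the output is (longitude, latitude)
xy_tf = Transformer.from_crs("EPSG:32633", "EPSG:4326", always_xy=True)

easting, northing = 307670.04, 5434427.10  # tile 1 origin

lat, lon = default_tf.transform(easting, northing)
lon_xy, lat_xy = xy_tf.transform(easting, northing)

# Both transformers describe the same point; only the ordering differs
print(f"default:   lat={lat:.5f}, lon={lon:.5f}")
print(f"always_xy: lon={lon_xy:.5f}, lat={lat_xy:.5f}")
```

Whichever variant is used, any code that later reads the output file has to assume the same column order.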

---

## **Step 7: Store the Coordinates**
For each bounding box, write the **latitude and longitude** as one comma-separated line to an output file: `all_coordinates.txt`


In [None]:
import os
import re
import pyproj

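def parse_native_filename(filename):
    """Extract tile_id, x0, y0 from filename."""
    match = re.search(r'_native_(\d+)__x0_(\d+)_y0_(\d+)', filename)
    if match is None:
        # Fail with a clear message instead of an AttributeError on None
        raise ValueError(f"Unexpected label filename: {filename}")
    tile_id = int(match.group(1))
    x0 = int(match.group(2))
    y0 = int(match.group(3))
    return tile_id, x0, y0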

# Geotransform parameters for each tile (from README)
geotransforms = {
    1: (307670.04, 0.31, 5434427.10, -0.31),  # (x_origin, x_pixel_size, y_origin, y_pixel_size)
    2: (312749.08, 0.31, 5403952.86, -0.31),
    3: (312749.08, 0.31, 5363320.54, -0.31)
}

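# UTM zone 33N (EPSG:32633) to WGS84 (EPSG:4326) transformer.
# With the default axis order, transform() returns (lat, lon) for EPSG:4326.
transformer = pyproj.Transformer.from_crs("EPSG:32633", "EPSG:4326")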

# Output file to store coordinates
output_file = "all_coordinates.txt"

# Process all label files
label_dir = r"E:\AISG Assignment 1\extracted_data\labels\labels_native"
with open(output_file, 'w') as f_out:
    for filename in os.listdir(label_dir):
        if filename.endswith(".txt"):
            label_path = os.path.join(label_dir, filename)
            tile_id, x0, y0 = parse_native_filename(filename)
            
            # Get geotransform for the tile
            x_origin, x_pixel_size, y_origin, y_pixel_size = geotransforms[tile_id]
            
            with open(label_path, 'r') as f_in:
                for line in f_in:
                    parts = line.strip().split()
                    if len(parts) != 5:
                        continue
                    
                    # Parse YOLO values
                    _, cx_norm, cy_norm, _, _ = parts
                    cx = float(cx_norm) * 416  # Native chip size = 416x416
                    cy = float(cy_norm) * 416
                    
                    # Convert to full tile pixel coordinates
                    px_full = x0 + cx
                    py_full = y0 + cy
                    
                    # Convert to UTM
                    easting = x_origin + px_full * x_pixel_size
                    northing = y_origin + py_full * y_pixel_size
                    
                    # Convert UTM to WGS84; with the default axis order,
                    # the EPSG:4326 result is (lat, lon)
                    lat, lon = transformer.transform(easting, northing)
                    
                    # Save to file as "lat,lon"
                    f_out.write(f"{lat},{lon}\n")

In [2]:
import random

# File path
file_path = r"E:\AISG Assignment 1\all_coordinates.txt"

# Read all coordinates
with open(file_path, "r") as file:
    coordinates = file.readlines()

# Remove any extra spaces and ensure clean format
coordinates = [line.strip() for line in coordinates if line.strip()]

# Select 20 random points
random_points = random.sample(coordinates, min(20, len(coordinates)))

# Print selected points
for i, point in enumerate(random_points, 1):
    print(f"Point {i}: {point}")


Point 1: 48.73004730168156,12.45652908799204
Point 2: 48.37052146650177,12.52155134658945
Point 3: 48.73083983608908,12.458253554728913
Point 4: 48.74578765641092,12.465444946898065
Point 5: 48.36710991730955,12.515592891041166
Point 6: 48.7301804044884,12.457251757862782
Point 7: 48.36081963076373,12.497499637251853
Point 8: 48.37033063517771,12.521368045378772
Point 9: 48.73074835934688,12.458152764475681
Point 10: 48.37037383170091,12.521495721528261
Point 11: 49.011514270943564,12.376458441063704
Point 12: 48.72898572757172,12.456388726586855
Point 13: 48.75023656044854,12.492917128024725
Point 14: 48.72878435875059,12.45680573322211
Point 15: 48.36801264861669,12.515224535926905
Point 16: 49.002224430681274,12.413451758449854
Point 17: 48.73111295382295,12.457991024354302
Point 18: 48.375764711135396,12.473913700210685
Point 19: 48.36833962745731,12.515512110708821
Point 20: 48.74550968999868,12.46580476030067


For verification, I selected these random geolocations (listed as latitude,longitude) from my *all_coordinates.txt* file and plotted them on [Google Maps](https://www.google.com/maps/d/u/0/edit?mid=1FJaBuuIXWf4xD7uz4jJ1rTFJaneghkc&usp=sharing). Each solar panel coordinate is marked on the map with this icon:
  ![Blue Location Icon](image1.png) 
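
For spot checks of individual points, a link that opens Google Maps in satellite view at a given coordinate can be built as sketched below. This is a sketch that assumes the query parameters of the public Google Maps URLs scheme (`map_action`, `center`, `zoom`, `basemap`). `satellite_view_url` is a hypothetical helper name, and the zoom level is an arbitrary choice.

```python
from urllib.parse import urlencode

def satellite_view_url(lat, lon, zoom=19):
    """Build a Google Maps URL centered on (lat, lon) with the satellite basemap."""
    params = {
        "api": 1,
        "map_action": "map",
        "center": f"{lat},{lon}",  # Google Maps expects latitude first
        "zoom": zoom,
        "basemap": "satellite",
    }
    return "https://www.google.com/maps/@?" + urlencode(params)

# Point 1 from the sample printed in this notebook (latitude, longitude)
url = satellite_view_url(48.73004730168156, 12.45652908799204)
print(url)
```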


**Q3.b** Visualize the geolocations using the leafmap library with SATELLITE basemap. [https://leafmap.org/] Where are the clusters located? [1 mark]

Since I was unable to plot all 29625 solar panel instances in leafmap, I plotted a subset of 1000 points (the first 1000 lines of the coordinates file), marked with red location icons.
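
Reading only the first lines of the file may draw most points from whichever label files were processed first. One alternative, sketched here under that assumption, is to take a reproducible random sample of the lines. `sample_lines` is a hypothetical helper, and the synthetic list only stands in for the real file contents.

```python
import random

def sample_lines(lines, k=1000, seed=42):
    """Return up to k lines chosen uniformly at random, reproducibly."""
    rng = random.Random(seed)  # fixed seed so the subset is repeatable
    return rng.sample(lines, min(k, len(lines)))

# Synthetic stand-in for the 29625 lines of the coordinates file
all_lines = [f"48.{i:05d},12.{i:05d}" for i in range(29625)]

subset = sample_lines(all_lines)
print(len(subset))  # 1000
```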

I then applied a clustering algorithm (HDBSCAN) to find the geolocations of the solar panel clusters.
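
The clustering uses the haversine metric, which measures great-circle distance and expects (latitude, longitude) in radians. As a rough illustration of what that metric computes, the sketch below evaluates it in kilometres for two of the cluster centers reported in this notebook, rounded to four decimals. The Earth radius of 6371 km is the usual mean value, and `haversine_km` is a hypothetical helper name.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Cluster 0 and cluster 1 centers from the output, rounded
d = haversine_km(48.9937, 12.4276, 48.7200, 12.4548)
print(f"{d:.1f} km")  # roughly 30 km
```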




In [12]:
import leafmap
from ipyleaflet import AwesomeIcon

# Read the first 1000 coordinates from the file
coords = []
with open("all_coordinates.txt", "r") as f:
    for _ in range(1000):  # Read only the first 1000 lines
        line = f.readline().strip()
        if line:
            lat, lon = map(float, line.split(","))
            coords.append((lat, lon))  # Store as (latitude, longitude)

# Create a map centered on the first coordinate
m = leafmap.Map(center=coords[0], zoom=14)

# Define a red marker icon
red_icon = AwesomeIcon(
    name="fa-map-marker",  # Font Awesome icon name
    marker_color="red",   # Marker color
    icon_color="white",   # Icon color
    spin=False,           # No spinning animation
)

# Add markers for the first 1000 coordinates
for lat, lon in coords:
    m.add_marker(
        location=(lat, lon),  # Marker location
        icon=red_icon,        # Use the red marker icon
    )

# Add satellite basemap
m.add_basemap("SATELLITE")

# Display the map
m

Map(center=[49.01285760727364, 12.37121075192366], controls=(ZoomControl(options=['position', 'zoom_in_text', …

In [2]:
import pandas as pd
import numpy as np
import hdbscan

# Load CSV file
csv_file = r'E:\AISG Assignment 1\output.csv'  # Raw string so backslashes are not treated as escapes; replace with your file path
df = pd.read_csv(csv_file, header=None, names=['LatLon'])  # Assuming single-column CSV

# Split into separate Latitude & Longitude columns
df[['Latitude', 'Longitude']] = df['LatLon'].str.split(',', expand=True)

# Convert to float & round to 6 decimal places (about 0.1 m, well below the 0.31 m pixel size)
df['Latitude'] = df['Latitude'].astype(float).round(6)
df['Longitude'] = df['Longitude'].astype(float).round(6)

# Extract lat/lon as numpy array
latlon = df[['Latitude', 'Longitude']].values

# Convert lat/lon to radians for Haversine distance
latlon_radians = np.radians(latlon)

# Apply HDBSCAN clustering with tuned parameters
clusterer = hdbscan.HDBSCAN(
    min_cluster_size=100,  # Larger clusters
    min_samples=20,       # Helps refine boundaries
    metric='haversine'
)

# Fit HDBSCAN model
cluster_labels = clusterer.fit_predict(latlon_radians)

# Assign cluster labels to DataFrame
df['Cluster'] = cluster_labels

# Extract only valid clusters (ignore noise labeled as -1)
valid_clusters = df[df['Cluster'] != -1]

# Compute cluster centers (mean lat/lon for each cluster)
cluster_centers = valid_clusters.groupby('Cluster')[['Latitude', 'Longitude']].mean().reset_index()

# Display cluster centers
for idx, row in cluster_centers.iterrows():
    print(f"Cluster {int(row['Cluster'])}: Center at (Latitude, Longitude): ({row['Latitude']}, {row['Longitude']})")


Cluster 0: Center at (Latitude, Longitude): (48.993702133333336, 12.427568716666666)
Cluster 1: Center at (Latitude, Longitude): (48.720047168067225, 12.45481680672269)
Cluster 2: Center at (Latitude, Longitude): (48.724935517985614, 12.471898086330935)
Cluster 3: Center at (Latitude, Longitude): (48.73508636363636, 12.45912412396694)
Cluster 4: Center at (Latitude, Longitude): (48.724306604166664, 12.511736770833332)
Cluster 5: Center at (Latitude, Longitude): (48.74137493370166, 12.502888955801104)
Cluster 6: Center at (Latitude, Longitude): (48.730295720306515, 12.45820864367816)
Cluster 7: Center at (Latitude, Longitude): (48.73080858174387, 12.458193682561307)
Cluster 8: Center at (Latitude, Longitude): (48.74758762937063, 12.463434867132866)
Cluster 9: Center at (Latitude, Longitude): (48.72757339568345, 12.454393595066804)
Cluster 10: Center at (Latitude, Longitude): (48.746929400000006, 12.464957672727273)
Cluster 11: Center at (Latitude, Longitude): (48.75400907608696, 12.4870

In [11]:
import pandas as pd
import numpy as np
import hdbscan
import leafmap
import ipywidgets as widgets  # Import widgets for popups

# Load CSV file
csv_file = 'E:/AISG Assignment 1/output.csv'  # Replace with your file path
df = pd.read_csv(csv_file, header=None, names=['LatLon'])  # Assuming single-column CSV

# Split into separate Latitude & Longitude columns
df[['Latitude', 'Longitude']] = df['LatLon'].str.split(',', expand=True)

# Convert to float & round to 6 decimal places (about 0.1 m, well below the 0.31 m pixel size)
df['Latitude'] = df['Latitude'].astype(float).round(6)
df['Longitude'] = df['Longitude'].astype(float).round(6)

# Extract lat/lon as numpy array
latlon = df[['Latitude', 'Longitude']].values

# Convert lat/lon to radians for Haversine distance
latlon_radians = np.radians(latlon)

# Apply HDBSCAN clustering with tuned parameters
clusterer = hdbscan.HDBSCAN(
    min_cluster_size=100,  # Larger clusters
    min_samples=20,        # Helps refine boundaries
    metric='haversine'
)

# Fit HDBSCAN model
cluster_labels = clusterer.fit_predict(latlon_radians)

# Assign cluster labels to DataFrame
df['Cluster'] = cluster_labels

# Extract only valid clusters (ignore noise labeled as -1)
valid_clusters = df[df['Cluster'] != -1]

# Compute cluster centers (mean lat/lon for each cluster)
cluster_centers = valid_clusters.groupby('Cluster')[['Latitude', 'Longitude']].mean().reset_index()

# Save cluster centers to CSV
cluster_centers.to_csv("cluster_centers.csv", index=False)

# Save cluster centers to TXT file
with open("cluster_centers.txt", "w") as f:
    for idx, row in cluster_centers.iterrows():
        f.write(f"Cluster {int(row['Cluster'])}: ({row['Latitude']}, {row['Longitude']})\n")

# Visualize clusters in Leafmap
m = leafmap.Map(center=[df['Latitude'].mean(), df['Longitude'].mean()], zoom=10, basemap="Google Satellite")

# Add cluster points with proper popups
for _, row in cluster_centers.iterrows():
    popup_widget = widgets.Label(f"Cluster {int(row['Cluster'])}")  # Use a widget instead of string
    m.add_marker(location=[row['Latitude'], row['Longitude']], popup=popup_widget)

# Display map
m


Map(center=[48.70153342653165, 12.457431306497892], controls=(ZoomControl(options=['position', 'zoom_in_text',…

# Analysis of Cluster Centers

Based on the provided cluster centers in the CSV file, the clusters are concentrated around three main regions. Below is the breakdown of the areas where most of the cluster centers belong:

---

## 1. **Straubing Area (Lower Bavaria)**  
   - **Cluster Centers**: 1, 2, 3, 4, 5, 6, 7, 9, 16, 17, 18, 19, 20, 25, 30, 31, 35, 38, 39, 40, 43, 44, 50, 51, 52  
   - **Coordinates**: Most of these clusters are centered around **48.72–48.75 latitude** and **12.45–12.51 longitude**, in the wider **Straubing** region of Lower Bavaria, Germany. The city of Straubing itself lies at roughly 48.88 latitude, 12.57 longitude, so these clusters sit to its southwest rather than in the city itself.

---

## 2. **Regensburg Area**  
   - **Cluster Centers**: 21, 22, 23, 26, 27, 32, 33, 34, 36, 37, 41, 42, 45, 46, 47, 48, 49, 53, 55, 56, 57, 58, 61, 62, 63, 69, 70  
   - **Coordinates**: These clusters are centered around **49.00–49.01 latitude** and **12.37–12.41 longitude**, in the wider **Regensburg** area. The city of Regensburg, a major city in Bavaria, lies at roughly 49.02 latitude, 12.10 longitude, so these clusters sit to its east rather than in the city itself.

---

## 3. **Eggenfelden/Pfarrkirchen Area**  
   - **Cluster Centers**: 12, 13, 14, 15, 24, 28, 29, 54, 59, 60, 64, 65, 66, 67, 68, 71, 72, 73, 74, 75, 76, 77  
   - **Coordinates**: These clusters are centered around **48.35–48.39 latitude** and **12.47–12.52 longitude**. **Eggenfelden** (roughly 48.40 latitude, 12.76 longitude) and **Pfarrkirchen** (roughly 48.43 latitude, 12.93 longitude) are smaller towns in Lower Bavaria that lie to the east of these clusters.

---

## Summary of the Three Main Areas:
1. **Straubing Area** (Lower Bavaria)  
2. **Regensburg Area**  
3. **Eggenfelden/Pfarrkirchen Area** (Lower Bavaria)

These three regions have the highest concentration of cluster centers based on the provided coordinates. For more specific names (e.g., villages or neighborhoods), a detailed map or geocoding tool can be used to pinpoint the exact locations.
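
The grouping above can be approximated programmatically. The sketch below assigns a cluster center to whichever of the three latitude bands is nearest. The band midpoints are assumptions derived from the ranges quoted in this analysis, `nearest_region` is a hypothetical helper, and the sample centers are taken from the printed output, rounded to four decimals.

```python
# Assumed midpoints of the latitude ranges quoted in the analysis
REGION_LATITUDE = {
    "Regensburg Area": 49.005,
    "Straubing Area": 48.735,
    "Eggenfelden/Pfarrkirchen Area": 48.37,
}

def nearest_region(lat):
    """Return the region whose reference latitude is closest to lat."""
    return min(REGION_LATITUDE, key=lambda name: abs(REGION_LATITUDE[name] - lat))

# A few cluster centers from the printed output (cluster id -> latitude)
sample_centers = {0: 48.9937, 1: 48.7200, 8: 48.7476}

for cluster_id, lat in sample_centers.items():
    print(f"Cluster {cluster_id}: {nearest_region(lat)}")
```

Latitude alone separates the three groups here because the bands are more than 0.2 degrees apart. A finer breakdown would need both coordinates or a geocoding tool.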