# Getting the features from the KML File 

A term to keep in mind here:
- Feature -> an individual geographic entity (in this case, a building), not an attribute or property of one

We will need these packages:
- `geopandas` - extends pandas to work with geospatial data
- `xml.etree.ElementTree` - built-in library for XML parsing
- `shapely` - turns coordinate lists into geometric shapes that a GeoDataFrame can store

In [33]:
import geopandas as gpd
from shapely.geometry import Polygon
import xml.etree.ElementTree as ET

This function parses a KML coordinate string into a shapely `Polygon`.

In [34]:
def parse_kml_coordinates(coord_string):
    # Split coordinates and convert to float pairs
    coords = [tuple(map(float, coord.split(',')[:2])) for coord in coord_string.split()]
    return Polygon(coords)
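
To make the string handling concrete, here is a minimal sketch of the same split-and-convert step using only the standard library. The helper name `kml_coords_to_pairs` and the sample coordinates are made up for illustration. KML writes each point as `lon,lat[,alt]` and separates points with whitespace, and a polygon ring repeats its first point at the end.

```python
def kml_coords_to_pairs(coord_string):
    """Hypothetical helper: turn a KML coordinate string into (lon, lat) tuples."""
    pairs = []
    for token in coord_string.split():      # points are whitespace-separated
        lon, lat = token.split(',')[:2]     # keep lon/lat, drop the optional altitude
        pairs.append((float(lon), float(lat)))
    return pairs

# Made-up ring for illustration only; the first point is repeated to close it.
sample = """
    -73.9190,40.8482,0 -73.9193,40.8479,0
    -73.9188,40.8477,0 -73.9190,40.8482,0
"""

pairs = kml_coords_to_pairs(sample)
print(pairs[0])               # first (lon, lat) pair
print(pairs[0] == pairs[-1])  # True when the ring is closed
```

A list like `pairs` is what `Polygon(...)` receives in `parse_kml_coordinates` above.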

If you're wondering why we can use an XML parser on a KML file, it's because KML is an XML-based format. It is designed for geographic annotation, but underneath it is still XML.
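
As a quick check of that point, the sketch below feeds a tiny, made-up KML document to `ElementTree`. It also shows why a namespace mapping is needed: every KML tag lives in the `http://www.opengis.net/kml/2.2` namespace, so a search for a bare `Placemark` finds nothing.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up KML document used only for illustration.
kml_text = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example building</name>
    </Placemark>
  </Document>
</kml>"""

root = ET.fromstring(kml_text)
print(root.tag)  # the namespace URI is part of the tag name

ns = {'kml': 'http://www.opengis.net/kml/2.2'}
without_ns = root.findall('.//Placemark')        # bare tag name: no match
with_ns = root.findall('.//kml:Placemark', ns)   # prefixed tag name: matches
print(len(without_ns), len(with_ns))
```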

This function reads the KML file and returns its contents as a GeoDataFrame.

In [35]:
def read_kml_with_elementtree(file_path):
    # Parse the KML file
    tree = ET.parse(file_path)
    root = tree.getroot()

    # Define namespaces
    namespaces = {
        'kml': 'http://www.opengis.net/kml/2.2'
    }

    # Find all Placemarks from the kml file
    features = []
    for placemark in root.findall('.//kml:Placemark', namespaces):
        # Extract extended data
        extended_data = {}
        for simple_data in placemark.findall('.//kml:SimpleData', namespaces):
            name = simple_data.get('name')
            value = simple_data.text
            extended_data[name] = value

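        # Find the first coordinates element; skip placemarks without one
        coord_elem = placemark.find('.//kml:coordinates', namespaces)
        if coord_elem is None or not coord_elem.text:
            continue

        # strip() removes leading and trailing whitespace from the text
        geom = parse_kml_coordinates(coord_elem.text.strip())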
        
        # Create a feature dictionary
        feature = {
            'geometry': geom,
            **extended_data  # Merge extended data
        }
        features.append(feature)

    # Create GeoDataFrame
    gdf = gpd.GeoDataFrame(features, crs="EPSG:4326")
    
    return gdf
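
One thing to be aware of in the attribute loop above: `ElementTree` returns every `SimpleData` value as text, so numeric-looking fields such as `fid` arrive as strings. The sketch below shows this on a made-up `Placemark` snippet whose field names mirror this dataset. The `int(...)` conversion at the end is an optional step, not part of the function above.

```python
import xml.etree.ElementTree as ET

ns = {'kml': 'http://www.opengis.net/kml/2.2'}

# Made-up Placemark for illustration; real files carry more fields.
placemark_xml = """<Placemark xmlns="http://www.opengis.net/kml/2.2">
  <ExtendedData>
    <SchemaData schemaUrl="#example">
      <SimpleData name="fid">7624</SimpleData>
      <SimpleData name="layer">clip_Bronx</SimpleData>
    </SchemaData>
  </ExtendedData>
</Placemark>"""

placemark = ET.fromstring(placemark_xml)

# Same name -> text mapping that the function builds for each placemark
attrs = {sd.get('name'): sd.text
         for sd in placemark.findall('.//kml:SimpleData', ns)}
print(attrs)
print(type(attrs['fid']).__name__)  # values are plain strings

# Optional: convert a field you know to be numeric
fid_as_int = int(attrs['fid'])
```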

Now we can read the KML file using our function and show the features (buildings) from our dataset.

In [36]:
buildings_gdf = read_kml_with_elementtree('./Dataset/Building_Footprint.kml')
print(buildings_gdf.head())
print("\nTotal features:", len(buildings_gdf))

                                            geometry tessellate extrude  \
0  POLYGON ((-73.91903 40.8482, -73.91933 40.8479...         -1       0   
1  POLYGON ((-73.92195 40.84963, -73.92191 40.849...         -1       0   
2  POLYGON ((-73.9205 40.85011, -73.92045 40.8501...         -1       0   
3  POLYGON ((-73.92056 40.8514, -73.92053 40.8514...         -1       0   
4  POLYGON ((-73.91234 40.85218, -73.91247 40.852...         -1       0   

  visibility               id   fid       layer  \
0         -1    cugir009034.3  7624  clip_Bronx   
1         -1    cugir009034.4  7625  clip_Bronx   
2         -1    cugir009034.5  7626  clip_Bronx   
3         -1    cugir009034.6  7627  clip_Bronx   
4         -1  cugir009034.142  7829  clip_Bronx   

                                                path  
0  /Users/killo/Desktop/Clip_Bronx.kml|layername=...  
1  /Users/killo/Desktop/Clip_Bronx.kml|layername=...  
2  /Users/killo/Desktop/Clip_Bronx.kml|layername=...  
3  /Users/killo/Deskto