# ICT infrastructure data exploratory analysis

Welcome to this hands-on session where we'll learn how to analyze infrastructure data using Python. We'll be working with data from Algeria to understand how to obtain, process and prepare different layers of infrastructure data. This tutorial will teach you how to handle geographic data, create visualizations for exploratory data analysis, and standardize infrastructure data for analysis.

## Setting up our environment

We start by importing the Python libraries we'll need for our analysis:
- geopandas and shapely: For handling geographic data and operations
- pandas: For data manipulation and analysis
- matplotlib and contextily: For creating visualizations and adding map backgrounds
- osmnx: For accessing OpenStreetMap data
- Other utility libraries for various tasks like generating UUIDs and handling country codes

In [1]:
# !pip install osmnx contextily summarytools pycountry

In [2]:
# Standard library imports
import os
import math

# Data manipulation and analysis
import pandas as pd
import numpy as np
import uuid

# Geospatial libraries
import geopandas as gpd
import osmnx as ox
from shapely.ops import unary_union
import pycountry

# Visualization libraries
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import contextily as cx

# Interactive tools and display
import ipywidgets as widgets
from IPython.display import display, HTML
from summarytools import dfSummary
# from google.colab import data_table
import folium

We set `fetch_data = False` to use pre-downloaded data instead of fetching it live during the tutorial.

In [3]:
fetch_data = False

## Get country boundaries

Before analyzing infrastructure within Algeria, we need to define the country's boundaries. We:
1. Load a GeoJSON file containing global UN-recognized country boundaries using geopandas
2. Filter to get just Algeria's boundary
3. Calculate the country's bounding box and UTM projection zone for later use
4. Get Algeria's ISO3 country code for standardization

This boundary data will be crucial for clipping our infrastructure data and ensuring we're analyzing points within Algeria's borders.

In [4]:
un_boundaries = gpd.read_file("https://zstagigaprodeuw1.blob.core.windows.net/gigainframapkit-public-container/country_boundary_data/boundaries.geojson")
algeria = un_boundaries[un_boundaries.romnam == "Algeria"]

In [None]:
algeria.plot(color="green")

In [None]:
algeria_boundary = algeria.total_bounds
algeria_utm = algeria.estimate_utm_crs()
algeria_latitude = algeria.centroid.y.squeeze()
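
To make the UTM step less opaque, here is a minimal sketch of the standard zone formula that a call like `estimate_utm_crs()` automates. The helper name `utm_epsg_from_lon_lat` is hypothetical and written for illustration only; it ignores the irregular zones around Norway and Svalbard. The sample coordinates are the ones used as example values in the POI data dictionary later in this tutorial.

```python
import math


def utm_epsg_from_lon_lat(lon: float, lat: float) -> int:
    """Return the EPSG code of the WGS 84 / UTM zone containing a point.

    Hypothetical helper for illustration; it ignores the irregular
    zones around Norway and Svalbard.
    """
    # UTM zones are 6 degrees wide and numbered 1..60 starting at 180 degrees west
    zone = int(math.floor((lon + 180) / 6)) + 1
    zone = min(zone, 60)  # lon == 180 would otherwise yield zone 61
    # EPSG 326xx covers the northern hemisphere, 327xx the southern one
    base = 32600 if lat >= 0 else 32700
    return base + zone


epsg = utm_epsg_from_lon_lat(lon=3.0588, lat=36.7538)
print(f"EPSG:{epsg}")
```

A projected CRS such as this one measures distances in meters, which is why the notebook reprojects to `algeria_utm` before buffering.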

In [7]:
def get_iso3_country_code(country_name):
    try:
        country = pycountry.countries.get(name=country_name)
        return country.alpha_3
    except AttributeError:
        return None

In [None]:
algeria_iso3 = get_iso3_country_code("Algeria")
print(f"The ISO-3 code for Algeria is {algeria_iso3}")

## Get point of interest (POI) data

<img src="https://wiki.openstreetmap.org/w/images/c/c8/Public-images-osm_logo.png" alt="OpenStreetMap logo" width="20%">

[OpenStreetMap](https://www.openstreetmap.org/) (OSM) is an open-source, community-driven geospatial data project that provides free and editable geographic information at a global scale. Established in 2004, OSM functions as a vast, collaborative mapping database where users contribute data on roads, buildings, land use, natural features, and more, creating a detailed digital representation of the world's infrastructure. Its open data licensing and API access make it highly valuable for applications in GIS analysis, urban planning, transportation modeling, and disaster response.

Thanks to this resource, we can load data about schools in Algeria. The code:
1. Either fetches school data from OpenStreetMap using the osmnx API and the `amenity=school` tag (if fetch_data=True) or loads pre-downloaded data
2. Filters the data to keep only relevant columns (ID, amenity type, city, education level, etc.)
3. Processes the geographical coordinates for each school

This data will help us understand the distribution of educational facilities across the country and identify areas that might be underserved.

In [9]:
if fetch_data:
    place = "Algeria"
    tags = {"amenity": "school"}
    algeria_schools_gdf = ox.features_from_place(place, tags)
else:
    algeria_schools_gdf = gpd.read_file("https://zstagigaprodeuw1.blob.core.windows.net/gigainframapkit-public-container/algeria/algeria-schools.geojson")

In [10]:
algeria_schools_gdf = algeria_schools_gdf[["osmid","amenity","element_type","addr:city","isced:level","operator", "geometry"]]

## Get Ookla speed test data

<img src="https://i.ibb.co/wcB1JHC/Screenshot-2024-10-25-at-21-24-24.png" alt="Ookla Open Data" width="75%">

[Ookla](https://www.ookla.com/ookla-for-good/open-data), known for its Speedtest platform, offers open data on internet speeds, latency, and network quality worldwide. Their datasets, like the Speedtest Global Index, help researchers analyze internet performance and infer knowledge about infrastructure gaps.

Here, we process their internet speed test data. For both mobile and fixed broadband:
1. We load data from Ookla's public dataset hosted as parquet files on Amazon Web Services (or pre-downloaded files)
2. Filter the data to Algeria's geographical bounds obtained earlier
3. Fetch key metrics like average download speed (avg_d_kbps) and latency (avg_lat_ms)
4. For mobile data, we can also create coverage polygons by buffering around test points. The Ookla data is available as tiles which are approximately 610.8 meters by 610.8 meters at the equator.

For more information about processing open data from Ookla, visit their [GitHub](https://github.com/teamookla/ookla-open-data) page.

In [11]:
def get_perf_tiles_parquet_url(service: str, year: int, quarter: int) -> str:
    quarter_start = f"{year}-{(((quarter - 1) * 3) + 1):02}-01"
    url = f"s3://ookla-open-data/parquet/performance/type={service}/year={year}/quarter={quarter}/{quarter_start}_performance_{service}_tiles.parquet"
    return url
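
As a quick check of the date arithmetic in `get_perf_tiles_parquet_url`, the sketch below repeats the function so that it runs on its own and prints the URL it builds for each quarter of 2024. Only the strings are built; nothing is downloaded.

```python
def get_perf_tiles_parquet_url(service: str, year: int, quarter: int) -> str:
    # Each quarter's file is named after the first day of that quarter
    quarter_start = f"{year}-{(((quarter - 1) * 3) + 1):02}-01"
    url = f"s3://ookla-open-data/parquet/performance/type={service}/year={year}/quarter={quarter}/{quarter_start}_performance_{service}_tiles.parquet"
    return url


urls = [get_perf_tiles_parquet_url("mobile", 2024, q) for q in (1, 2, 3, 4)]
for url in urls:
    print(url)
```

The quarter number maps to the months 01, 04, 07 and 10, so the second quarter of 2024 resolves to a file prefixed with `2024-04-01`.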

### Mobile

In [12]:
if fetch_data:
    mobile_perf_tiles_url = get_perf_tiles_parquet_url("mobile", 2024, 2)
    bbox_filters = [('tile_y', '<=', algeria_boundary[3]), ('tile_y', '>=', algeria_boundary[1]),
                ('tile_x', '<=', algeria_boundary[2]), ('tile_x', '>=', algeria_boundary[0])]
    mobile_tiles_df = pd.read_parquet(mobile_perf_tiles_url,
                           filters=bbox_filters,
                           columns=['tile_x', 'tile_y', 'tests', 'avg_d_kbps', 'avg_lat_ms'],
                           storage_options={"s3": {"anon": True}}
                           )
else:
    mobile_tiles_df = pd.read_csv("https://zstagigaprodeuw1.blob.core.windows.net/gigainframapkit-public-container/algeria/algeria-ookla-mobile-tiles.csv",index_col=0)

In [13]:
mobile_tiles_gdf = gpd.GeoDataFrame(mobile_tiles_df, geometry=gpd.points_from_xy(mobile_tiles_df.tile_x, mobile_tiles_df.tile_y), crs="EPSG:4326").drop(columns=["tile_x", "tile_y"])

#### Generate mobile coverage area

We infer mobile coverage areas from Ookla's internet speed test data. We assume that areas where there have been successful mobile speed tests have cellular coverage, and that areas without tests do not. We do not know, however, which cellular technology (3G, 4G, 5G) each test refers to.

In [14]:
tile_size_at_latitude=610.8*np.cos(math.radians(algeria_latitude))
buffers = mobile_tiles_gdf.to_crs(algeria_utm).buffer(tile_size_at_latitude).to_crs("EPSG:4326")
single_polygon = unary_union(buffers)
algeria_mobile_coverage_gdf = gpd.GeoDataFrame(geometry=[single_polygon], crs="EPSG:4326")
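
The buffer distance in the cell above scales the equatorial tile width by the cosine of the latitude, because Web Mercator tiles cover less ground the further they are from the equator. The sketch below works through that arithmetic for a few latitudes. The function name mirrors the variable used above, and the latitudes are illustrative assumptions, not the centroid computed earlier.

```python
import math

# Approximate tile width at the equator, as stated in the Ookla section above
EQUATOR_TILE_SIZE_M = 610.8


def tile_size_at_latitude(latitude_deg: float) -> float:
    """Approximate east-west ground width of a tile at a given latitude."""
    return EQUATOR_TILE_SIZE_M * math.cos(math.radians(latitude_deg))


# Illustrative latitudes only (assumed values for the example)
for lat in (0.0, 28.0, 60.0):
    print(f"{lat:>4.1f} deg: {tile_size_at_latitude(lat):.1f} m")
```

At 60 degrees the width is half the equatorial value, so a single country-wide latitude is only a rough approximation for a country that spans many degrees north to south.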

### Fixed

In [15]:
if fetch_data:
    fixed_perf_tiles_url = get_perf_tiles_parquet_url("fixed", 2024, 2)
    fixed_tiles_df = pd.read_parquet(fixed_perf_tiles_url,
                           filters=bbox_filters,
                           columns=['tile_x', 'tile_y', 'tests', 'avg_d_kbps', 'avg_lat_ms'],
                           storage_options={"s3": {"anon": True}}
                           )
else:
    fixed_tiles_df = pd.read_csv("https://zstagigaprodeuw1.blob.core.windows.net/gigainframapkit-public-container/algeria/algeria-ookla-fixed-tiles.csv")

## Get cell site data

<img src="https://wiki.opencellid.org/images/d/de/OpenCellID_banner_main_page2.png" alt="OpenCellID banner" width="50%">

[OpenCellID](https://wiki.opencellid.org/wiki/What_is_OpenCellID) is the world's largest collaborative community project that collects GPS positions of cell towers, used free of charge, for a multitude of commercial and private purposes. Notably, they publish data on cell site coordinates. To download their data, register on their site and obtain a free API access token. Using this token, you will be able to download [datasets](https://opencellid.org/downloads.php) for each country.

The code below:
1. Loads a CSV file containing cell site coordinates obtained from OpenCellID.
2. Converts it to a GeoDataFrame for spatial analysis

In [16]:
algeria_cell_sites = pd.read_csv("https://zstagigaprodeuw1.blob.core.windows.net/gigainframapkit-public-container/algeria/algeria-cell-sites.csv")

In [17]:
algeria_cell_sites_gdf = gpd.GeoDataFrame(algeria_cell_sites, geometry=gpd.points_from_xy(algeria_cell_sites.lon, algeria_cell_sites.lat), crs="EPSG:4326").drop(columns=["lon", "lat"])

## Get transmission node data

<img src="https://www.itu.int/en/ITU-D/Technology/PublishingImages/bbmaps-snpashot.png" alt="ITU Connectivity Infrastructure Maps snapshot" width="50%">

We obtain data on fiber transmission nodes from the ITU's own [Connectivity Infrastructure Maps](https://bbmaps.itu.int/bbmaps/) platform. Access to the raw data is restricted to registered users, so we fetch a version of the data uploaded onto our servers. The transmission nodes in this dataset represent key internet infrastructure points that form the backbone of Algeria's telecommunications network.

We follow these steps:
1. Read coordinates and attributes of network nodes
2. Convert to a GeoDataFrame for spatial analysis

In [18]:
algeria_nodes = pd.read_csv("https://zstagigaprodeuw1.blob.core.windows.net/gigainframapkit-public-container/algeria/algeria-transmission-nodes.csv")

In [19]:
algeria_nodes_gdf = gpd.GeoDataFrame(algeria_nodes, geometry=gpd.points_from_xy(algeria_nodes.lon, algeria_nodes.lat), crs="EPSG:4326").drop(columns=["lon", "lat"])

# Tabular data analysis

Before spatial analysis, we examine our data in tabular form:
1. Use dfSummary to get statistics about each dataset
2. Check for missing values and data quality issues
3. Understand the distribution of different infrastructure types

This step is crucial for ensuring our data is clean and understanding what insights we can extract.

## Point of interest (POI) data

In [20]:
# data_table.DataTable(algeria_schools_gdf, num_rows_per_page=10)

In [None]:
dfSummary(algeria_schools_gdf)

## Cell site data

In [22]:
# data_table.DataTable(algeria_cell_sites_gdf, num_rows_per_page=10)

In [None]:
dfSummary(algeria_cell_sites_gdf)

## Transmission node data

In [24]:
# data_table.DataTable(algeria_nodes_gdf, num_rows_per_page=10)

In [None]:
dfSummary(algeria_nodes_gdf)

## Mobile speed test data

In [26]:
# data_table.DataTable(mobile_tiles_gdf, num_rows_per_page=10)

In [None]:
dfSummary(mobile_tiles_gdf)

# Geographical analysis

Now we create visualizations of our infrastructure data:
1. Define plotting functions for points and coverage areas
2. Create interactive widgets to toggle between different views
3. Plot infrastructure locations with background maps
4. Identify points that fall outside country boundaries, and remove them.

In [28]:
colors = {"schools": "#e41a1c", "cell_sites": "#377eb8", "nodes": "#ff7f00"}

In [29]:
def plot_points(points_gdf, title="Points", color="red", zoom=6):
    """
    Create a simple interactive map from a GeoDataFrame using Folium

    Parameters:
    points_gdf : GeoDataFrame with point geometry
    title : str, map title
    color : str, color for points
    zoom : int, initial zoom level for the map
    """
    # Convert CRS to WGS84 if needed
    if points_gdf.crs != "EPSG:4326":
        points_gdf = points_gdf.to_crs("EPSG:4326")

    # Create a map centered on the mean position of points
    center_lat = points_gdf.geometry.y.mean()
    center_lon = points_gdf.geometry.x.mean()
    m = folium.Map(location=[center_lat, center_lon], zoom_start=zoom)

    # Add title and point count in styled box
    title_html = f'''
        <div style="position: fixed;
                    top: 10px;
                    left: 50px;
                    z-index: 1000;
                    background-color: white;
                    padding: 10px;
                    border-radius: 5px;">
            <h4>{title}</h4>
            Total Points: {len(points_gdf)}
        </div>
    '''
    m.get_root().html.add_child(folium.Element(title_html))

    # Add points to the map
    for _, row in points_gdf.iterrows():
        folium.CircleMarker(
            location=[row.geometry.y, row.geometry.x],
            radius=5,
            color='black',
            weight=1,
            fill=True,
            fill_color=color,
            fill_opacity=0.7,
            popup=str(row.name)  # Just shows the index as popup
        ).add_to(m)

    return m

In [30]:
def plot_outside_points(points_gdf, boundary_gdf, title="Points Outside Boundary", color="red", zoom=6):
    """
    Create an interactive map showing points that fall outside a boundary

    Parameters:
    points_gdf : GeoDataFrame with point geometry
    boundary_gdf : GeoDataFrame with polygon geometry
    title : str, map title
    color : str, color for points outside boundary
    zoom : int, initial zoom level for the map
    """

    # Find points inside and outside
    points_inside = gpd.sjoin(points_gdf, boundary_gdf, predicate='within', how='inner')
    points_outside = points_gdf[~points_gdf.index.isin(points_inside.index)]

    # Convert to WGS84 if needed
    if points_outside.crs != "EPSG:4326":
        points_outside = points_outside.to_crs("EPSG:4326")
    if boundary_gdf.crs != "EPSG:4326":
        boundary_gdf = boundary_gdf.to_crs("EPSG:4326")

    # Calculate map bounds from outside points
    if len(points_outside) > 0:
        minx, miny, maxx, maxy = points_outside.total_bounds
        center_lat = (miny + maxy) / 2
        center_lon = (minx + maxx) / 2
    else:
        # If no points outside, center on boundary
        minx, miny, maxx, maxy = boundary_gdf.total_bounds
        center_lat = (miny + maxy) / 2
        center_lon = (minx + maxx) / 2

    # Create map
    m = folium.Map(
        location=[center_lat, center_lon],
        zoom_start=zoom
    )

    # Add boundary as a polygon
    folium.GeoJson(
        boundary_gdf,
        style_function=lambda x: {
            'fillColor': 'gray',
            'color': 'black',
            'weight': 1,
            'fillOpacity': 0.1
        }
    ).add_to(m)

    # Add points outside boundary
    for _, row in points_outside.iterrows():
        folium.CircleMarker(
            location=[row.geometry.y, row.geometry.x],
            radius=5,
            color='black',
            weight=1,
            fill=True,
            fill_color=color,
            fill_opacity=0.7,
            popup=str(row.name)
        ).add_to(m)

    # Add title and count
    title_html = f'''
        <div style="position: fixed;
                    top: 10px;
                    left: 50px;
                    z-index: 1000;
                    background-color: white;
                    padding: 10px;
                    border-radius: 5px;">
            <h4>{title}</h4>
            Points outside: {len(points_outside)}
        </div>
    '''
    m.get_root().html.add_child(folium.Element(title_html))

    # Fit map bounds to show all points with buffer
    m.fit_bounds([[miny, minx], [maxy, maxx]], padding=[50, 50])

    return m

In [31]:
def show_plots_with_widgets(points_gdf, boundary_gdf, title_1, title_2, color="red"):
    """
    Display two maps side by side in a Jupyter notebook
    """
    # Create both maps
    map1 = plot_points(points_gdf, title=title_1, color=color)
    map2 = plot_outside_points(points_gdf, boundary_gdf, title=title_2, color=color)
    # Simple HTML to put maps side by side
    display_html = f"""
    <div style="display: flex; width: 100%;">
        <div style="width: 50%;">
            {map1._repr_html_()}
        </div>
        <div style="width: 50%;">
            {map2._repr_html_()}
        </div>
    </div>
    """
    return HTML(display_html)
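
The plotting helpers above only display the points that fall outside the boundary. As a minimal sketch of the removal step, assuming the same spatial-join approach used in `plot_outside_points`, the hypothetical helper `drop_points_outside` below keeps only the points within the boundary. The toy square boundary and the two points are invented for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point, box


def drop_points_outside(points_gdf, boundary_gdf):
    """Return only the points that fall within the boundary polygons.

    Hypothetical helper, not part of the original notebook.
    """
    # Make sure both layers share a CRS before the spatial join
    boundary = boundary_gdf.to_crs(points_gdf.crs)
    inside = gpd.sjoin(points_gdf, boundary[["geometry"]], predicate="within", how="inner")
    # Select by index so the original columns are returned unchanged
    return points_gdf.loc[points_gdf.index.isin(inside.index)]


# Toy data (assumed for illustration): a 10 x 10 degree square and two points
toy_boundary = gpd.GeoDataFrame(geometry=[box(0, 0, 10, 10)], crs="EPSG:4326")
toy_points = gpd.GeoDataFrame(
    {"name": ["inside", "outside"]},
    geometry=[Point(5, 5), Point(20, 5)],
    crs="EPSG:4326",
)

kept = drop_points_outside(toy_points, toy_boundary)
print(kept["name"].tolist())
```

Applied to this notebook, the same call could take a layer such as `algeria_schools_gdf` (after the centroid conversion) together with `algeria`. Whether to drop the points or keep them is a per-dataset judgment.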

## Point of interest (POI) data

First, we notice that some schools have been provided to OpenStreetMap as polygons. To simplify our analysis, we convert those polygons to points by selecting the centroid of each polygon.

In [None]:
algeria_schools_gdf.geometry.type.value_counts()

In [None]:
algeria_schools_gdf.geometry = algeria_schools_gdf.geometry.centroid

We can see that 5 schools have mistakenly been placed in the Mediterranean Sea.


In [None]:
show_plots_with_widgets(algeria_schools_gdf, algeria, "Schools", "Schools outside borders", color=colors["schools"])

## Cell site data

In [None]:
algeria_cell_sites_gdf.geometry.type.value_counts()

We can see that 17 cell sites have been placed in the Mediterranean Sea as well.

In [None]:
show_plots_with_widgets(algeria_cell_sites_gdf, algeria, "Cell sites", "Cell sites outside borders", color=colors["cell_sites"])

## Transmission node data

In [None]:
algeria_nodes_gdf.geometry.type.value_counts()

We can see that one transmission node is on the Moroccan border, and one is in the sea. We decide to keep these in the data.

In [None]:
show_plots_with_widgets(algeria_nodes_gdf, algeria, "Transmission nodes", "Transmission nodes outside borders", color=colors["nodes"])

## Mobile coverage

Next, we need to clip the mobile coverage data we produced by extracting speed test data from Ookla. Since this was obtained using a bounding box, some of the tiles can be from neighbouring countries.

In [39]:
def plot_coverage(gdf, title="Mobile Coverage", fill_color="#1f77b4", alpha=0.6):
    """
    Plot coverage polygons with a basemap
    Parameters:
    gdf : GeoDataFrame with polygon geometry
    title : str, plot title
    fill_color : str, color for polygons (hex code or name)
    alpha : float, transparency level (0 to 1)
    """
    # Create figure
    fig, ax = plt.subplots(figsize=(8, 8))

    # Plot the geodataframe
    gdf.plot(
        ax=ax,
        color=fill_color,
        alpha=alpha,
        edgecolor='white',
        linewidth=0.5
    )

    # Add basemap
    cx.add_basemap(
        ax,
        crs=gdf.crs,
        source=cx.providers.CartoDB.DarkMatter
    )

    # Style the plot
    plt.title(title, pad=20, fontsize=16)
    ax.axis('off')

    # Add a text box with coverage area count
    stats_text = f'Coverage Areas: {len(gdf)}'
    plt.figtext(
        0.02, 0.02, stats_text,
        bbox=dict(facecolor='white', alpha=0.7),
        fontsize=12
    )

    plt.tight_layout()
    plt.show()

In [40]:
def clip_coverage(coverage_gdf, boundary_gdf):
    """
    Clip coverage polygons to the country boundary

    Parameters:
    coverage_gdf : GeoDataFrame with coverage polygons
    boundary_gdf : GeoDataFrame with country boundary

    Returns:
    GeoDataFrame with clipped coverage polygons
    """
    # Ensure same CRS
    if coverage_gdf.crs != boundary_gdf.crs:
        coverage_gdf = coverage_gdf.to_crs(boundary_gdf.crs)

    # Perform the clip operation
    clipped_coverage = gpd.clip(coverage_gdf, boundary_gdf)

    return clipped_coverage

In [41]:
def show_coverage_plots_with_widgets(coverage_gdf, boundary_gdf, fig1_title="Coverage", fig2_title="Clipped Coverage"):
    # Create tab widget
    tab = widgets.Tab()

    # Create output widgets for each plot
    out1 = widgets.Output()
    out2 = widgets.Output()

    # Set tab contents
    tab.children = [out1, out2]

    # Set tab titles
    tab.set_title(0, fig1_title)
    tab.set_title(1, fig2_title)

    # Display plots in respective tabs
    with out1:
        plot_coverage(coverage_gdf, title=fig1_title)
    with out2:
        clipped_coverage = clip_coverage(coverage_gdf, boundary_gdf)
        plot_coverage(clipped_coverage, title=fig2_title)

    display(tab)

In [42]:
algeria_clipped_mobile_coverage_gdf = clip_coverage(algeria_mobile_coverage_gdf, algeria)

In [None]:
show_coverage_plots_with_widgets(algeria_mobile_coverage_gdf, algeria, "Mobile coverage", "Clipped mobile coverage")

# Standardize data

Finally, we standardize all our data into a consistent format, following the [ITU data dictionaries](https://bbmaps.itu.int/geonetwork/srv/eng/catalog.search;jsessionid=4BB00A9A95D58DCCAAD3967DC2DEA0E0#/metadata/d4fce2b9-ed20-4a3e-9312-4f04e1a384ad) for infrastructure data.
1. Create unified schemas for each infrastructure type (POIs, cell sites, nodes, coverage)
2. Generate unique IDs for each feature, using UUIDs (Universally Unique Identifiers). These are long identifiers that are almost guaranteed to be unique every time we generate them.
3. Transform data to match standard schemas
4. Save standardized data to CSV/GeoJSON files

This standardization makes it easier to:
- Share data with other analysts
- Perform consistent analysis across different regions and projects
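
To illustrate what these identifiers look like, the short sketch below generates a batch of random (version 4) UUIDs with Python's standard `uuid` module and checks that none repeat. The batch size of 1000 is an arbitrary choice for the example.

```python
import uuid

# Generate a batch of random (version 4) UUIDs as strings
ids = [str(uuid.uuid4()) for _ in range(1000)]

print(ids[0])                      # 32 hexadecimal digits grouped by hyphens
print(len(ids[0]))                 # length of the canonical text form
print(len(set(ids)) == len(ids))   # True when no identifier repeats
```

Because the values are random, each run prints a different identifier, which is also why IDs should be generated once and stored rather than regenerated on every run.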

In [44]:
def extract_lat_lon(gdf, id_column='id'):
   """
   Create a new dataframe with latitude, longitude and UUID columns
   """
   df = pd.DataFrame({
       id_column: [str(uuid.uuid4()) for _ in range(len(gdf))],
       'dataset_id': str(uuid.uuid4()),
       'lat': gdf.geometry.y,
       'lon': gdf.geometry.x
   })
   return df

## Point of interest (POI) data

In [None]:
poi_metadata = pd.DataFrame({
   'column_name': ['poi_id', 'dataset_id', 'lat', 'lon', 'poi_type', 'is_public', 'poi_subtype', 'country_code', 'is_connected', 'connectivity_type'],
   'column_type': ['UUID', 'UUID', 'float', 'float', 'string', 'boolean', 'string', 'string', 'boolean', 'string'],
   'levels': [''] * 10,
   'example': ['123e4567-e89b-12d3-a456-426614174000', '987fcdeb-51a2-12d3-a456-426614174000', '36.7538', '3.0588', 'school', 'True', 'primary school', 'DZA', 'True', '4G'],
   'mandatory': ['Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No', 'No'],
   'definition': [
       'Unique identifier for the POI',
       'Unique identifier for the dataset',
       'Latitude coordinate',
       'Longitude coordinate',
       'Type of point of interest',
       'Whether the POI is public or private',
       'Specific subtype of the POI',
       'ISO 3166-1 alpha-3 country code',
       'Whether the POI has connectivity',
       'Type of internet connectivity'
   ]
})
styled_df = poi_metadata.style.set_properties(**{
   'text-align': 'left',
   'border': '1px solid black',
   'padding': '8px'
}).set_table_styles([
   {'selector': 'thead', 'props': [('background-color', '#f2f2f2'), ('font-weight', 'bold'), ('border-bottom', '2px solid black')]},
   {'selector': 'tbody tr:nth-of-type(odd)', 'props': [('background-color', '#f9f9f9')]}
])
display(styled_df)

In [46]:
# Create blank dataframe with id, latitude and longitude columns
formatted_algeria_schools = extract_lat_lon(algeria_schools_gdf, id_column='poi_id')

# Fill in other columns
formatted_algeria_schools["country_code"] = algeria_iso3
formatted_algeria_schools["poi_type"] = "school"
formatted_algeria_schools["is_connected"] = False

In [47]:
# data_table.DataTable(formatted_algeria_schools, num_rows_per_page=10)
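
One way to check a standardized table against the data dictionary is to compare its column names with the columns marked as mandatory. The sketch below is a hedged illustration. The helper `missing_mandatory_columns` is hypothetical, and the column lists are copied by hand from the POI dictionary and the formatting cell above rather than read from the live objects.

```python
def missing_mandatory_columns(columns, mandatory):
    """Return the mandatory column names that are absent, in dictionary order.

    Hypothetical helper for illustration.
    """
    present = set(columns)
    return [name for name in mandatory if name not in present]


# Columns flagged 'Yes' under 'mandatory' in the POI data dictionary above
POI_MANDATORY = ["poi_id", "dataset_id", "lat", "lon", "poi_type", "country_code"]

# Column names produced by the formatting cell above (copied by hand)
schools_columns = ["poi_id", "dataset_id", "lat", "lon", "country_code", "poi_type", "is_connected"]

complete = missing_mandatory_columns(schools_columns, POI_MANDATORY)
incomplete = missing_mandatory_columns(["poi_id", "lat", "lon"], POI_MANDATORY)
print(complete)
print(incomplete)
```

With the notebook's own table, the call would be `missing_mandatory_columns(formatted_algeria_schools.columns, POI_MANDATORY)`.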

## Cell site data

In [None]:
cell_metadata = pd.DataFrame({
   'column_name': ['ict_id', 'dataset_id', 'latitude', 'longitude', 'operator_name', 'radio_type', 'antenna_height_m', 'backhaul_type', 'backhaul_throughput_mbps'],
   'column_type': ['UUID', 'UUID', 'float', 'float', 'string', 'string', 'float', 'string', 'float'],
   'levels': [
       '',  # ict_id
       '',  # dataset_id
       '',  # latitude
       '',  # longitude
       '',  # operator_name
       'LTE, UMTS, GSM, CDMA',  # radio_type
       '',  # antenna_height_m
       'fiber, microwave, satellite',  # backhaul_type
       ''   # backhaul_throughput_mbps
   ],
   'example': ['123e4567-e89b-12d3-a456-426614174000', '987fcdeb-51a2-12d3-a456-426614174000', '38.988755', '1.401938', 'TelOperator', 'LTE', '25', 'fiber', '1000'],
   'mandatory': ['Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No'],
   'definition': [
       'Cell tower identifier',
       'Unique identifier for the dataset',
       'Cell tower geographical latitude',
       'Cell tower geographical longitude',
       'Mobile network operator name',
       'Type of radio transmission technology',
       'Antenna height on the tower or building',
       'Type of backhaul connectivity of the cell tower',
       'Equipped throughput of the backhaul'
   ]
})
styled_df = cell_metadata.style.set_properties(**{
   'text-align': 'left',
   'border': '1px solid black',
   'padding': '8px'
}).set_table_styles([
   {'selector': 'thead', 'props': [('background-color', '#f2f2f2'), ('font-weight', 'bold'), ('border-bottom', '2px solid black')]},
   {'selector': 'tbody tr:nth-of-type(odd)', 'props': [('background-color', '#f9f9f9')]}
])
display(styled_df)

In [None]:
algeria_cell_sites_gdf["radio"].value_counts()

We assume that each antenna height is 25 meters.

In [50]:
# Create blank dataframe with id, latitude and longitude columns
formatted_algeria_cell_sites = extract_lat_lon(algeria_cell_sites_gdf, id_column='ict_id')

# Fill in other columns
formatted_algeria_cell_sites["radio_type"] = algeria_cell_sites_gdf["radio"]
formatted_algeria_cell_sites["antenna_height_m"] = 25
formatted_algeria_cell_sites["backhaul_type"] = pd.NA
formatted_algeria_cell_sites["backhaul_throughput_mbps"] = pd.NA
formatted_algeria_cell_sites["operator_name"] = pd.NA

In [51]:
# data_table.DataTable(formatted_algeria_cell_sites, num_rows_per_page=10)

## Transmission node data

In [None]:
node_metadata = pd.DataFrame({
   'column_name': ['ict_id', 'dataset_id', 'latitude', 'longitude', 'operator_name', 'infrastructure_type', 'node_status', 'equipped_capacity_mbps', 'potential_capacity_mbps'],
   'column_type': ['UUID', 'UUID', 'float', 'float', 'string', 'string', 'string', 'float', 'float'],
   'levels': [
       '',  # ict_id
       '',  # dataset_id
       '',  # latitude
       '',  # longitude
       '',  # operator_name
       'fiber, microwave, other',  # infrastructure_type
       'operational, planned, under construction',  # node_status
       '',  # equipped_capacity_mbps
       ''   # potential_capacity_mbps
   ],
   'example': ['123e4567-e89b-12d3-a456-426614174000', '987fcdeb-51a2-12d3-a456-426614174000', '38.988755', '1.401938', 'TelOperator', 'fiber', 'operational', '1000', '2000'],
   'mandatory': ['Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No'],
   'definition': [
       'Node identifier',
       'Unique identifier for the dataset',
       'Geographical latitude',
       'Geographical longitude',
       'Name of the mobile operator',
       'Type of Infrastructure',
       'Status of the node',
       'Equipped bandwidth ready for use to connect subscribers',
       'Total theoretical bandwidth available for subscriber connections'
   ]
})

styled_df = node_metadata.style.set_properties(**{
   'text-align': 'left',
   'border': '1px solid black',
   'padding': '8px'
}).set_table_styles([
   {'selector': 'thead', 'props': [('background-color', '#f2f2f2'), ('font-weight', 'bold'), ('border-bottom', '2px solid black')]},
   {'selector': 'tbody tr:nth-of-type(odd)', 'props': [('background-color', '#f9f9f9')]}
])
display(styled_df)

In [None]:
algeria_nodes_gdf["type_infr"].value_counts()

All our points are fiber nodes, and we assume that they are all operational.

In [54]:
# Create blank dataframe with id, latitude and longitude columns
formatted_algeria_nodes = extract_lat_lon(algeria_nodes_gdf, id_column='ict_id')

# Fill in other columns
formatted_algeria_nodes["operator_name"] = pd.NA
formatted_algeria_nodes["infrastructure_type"] = "fiber"
formatted_algeria_nodes["node_status"] = "operational"
formatted_algeria_nodes["equipped_capacity_mbps"] = pd.NA
formatted_algeria_nodes["potential_capacity_mbps"] = pd.NA

In [55]:
# data_table.DataTable(formatted_algeria_nodes, num_rows_per_page=10)

## Mobile coverage

In [None]:
coverage_metadata = pd.DataFrame({
   'column_name': ['coverage_id', 'dataset_id', 'signal_strength_dbm', 'operator_name', 'geometry', 'coverage'],
   'column_type': ['UUID', 'UUID', 'float', 'string', 'geometry', 'integer'],
   'levels': [
       '',  # coverage_id
       '',  # dataset_id
       '',  # signal_strength
       '',  # operator_name
       'polygon',  # geometry
       '1'
   ],
   'example': [
       '123e4567-e89b-12d3-a456-426614174000',
       '987fcdeb-51a2-12d3-a456-426614174000',
       '-93',
       'TelOperator',
       'POLYGON((...))',
       '1'
   ],
   'mandatory': ['Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes'],
   'definition': [
       'Unique identifier for the coverage area',
       'Unique identifier for the dataset',
       'Mobile signal strength in dBm for coverage',
       'Name of the mobile operator',
       'Polygon geometry of coverage area',
       'Binary value indicating coverage'
   ]
})

styled_df = coverage_metadata.style.set_properties(**{
   'text-align': 'left',
   'border': '1px solid black',
   'padding': '8px'
}).set_table_styles([
   {'selector': 'thead', 'props': [('background-color', '#f2f2f2'), ('font-weight', 'bold'), ('border-bottom', '2px solid black')]},
   {'selector': 'tbody tr:nth-of-type(odd)', 'props': [('background-color', '#f9f9f9')]}
])
display(styled_df)

In [57]:
# Start from a copy of the clipped coverage polygons so the original GeoDataFrame is left unchanged
formatted_algeria_coverage = algeria_clipped_mobile_coverage_gdf.copy()

# Fill in other columns
formatted_algeria_coverage["coverage"] = 1
formatted_algeria_coverage["signal_strength_dbm"] = pd.NA
formatted_algeria_coverage["operator_name"] = pd.NA
formatted_algeria_coverage["coverage_id"] = [str(uuid.uuid4()) for _ in range(len(formatted_algeria_coverage))]
formatted_algeria_coverage["dataset_id"] = str(uuid.uuid4())

In [None]:
formatted_algeria_coverage