# SQL Data Cleaning for Mobile Coverage Dataset

This **SQL** code is designed to clean and **prepare a public dataset** of mobile coverage data in **Google BigQuery**.

The primary goal is **data cleaning**, which includes creating a **new table**, **normalizing categorical features, handling missing data, and removing outliers from numerical features**.

This process is essential for **ensuring data quality and reliability** for subsequent **analysis**.

### Import libraries and modules

In [1]:
import pandas as pd
from google.cloud import bigquery

### Import function: Interactive SQL Query to Pandas DataFrame Converter

Import the custom query_df and run_query functions from the 'query_functions.py' file to execute SQL queries using a pre-configured BigQuery client object.
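
The contents of 'query_functions.py' are not shown in this notebook. As a rough sketch only, the two helpers might look like the code below. It assumes the BigQuery client is passed in as an argument (the original uses a pre-configured client instead, so this signature is hypothetical). The only client calls it relies on are `Client.query()`, `QueryJob.result()` and `QueryJob.to_dataframe()`.

```python
def run_query(query, client):
    """Run a statement that returns no rows (CREATE, ALTER, UPDATE, DELETE)."""
    job = client.query(query)  # submit the job to BigQuery
    job.result()               # block until the job finishes; raises on error
    print("Query successfully executed, and the table has been updated.")


def query_df(query, client):
    """Run a SELECT statement and return its rows as a pandas DataFrame."""
    return client.query(query).to_dataframe()
```

With a real client, this version would be called as `run_query(query, bigquery.Client())`, whereas the cells in this notebook call `run_query(query)` with a single argument.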

In [2]:
from query_functions import query_df  # Execute the query and return the output as a DataFrame
from query_functions import run_query  # Execute the query without returning a DataFrame, used for INSERT, UPDATE, DELETE, etc.

### Dataset and table paths in Google BigQuery

In [3]:
# Catalonian mobile coverage eu (2015-2017) --> mobile_data_2015_2017_cleaned
mobile_data_cleaned = "bq-analyst-230590.project_cat_mobile_coverage_2015_2017.mobile_data_2015_2017_cleaned"

### Creating Table and Columns

The following code creates a **copy of the 'mobile_data_2015_2017' table** by selecting specific columns and saves it as 'mobile_data_2015_2017_cleaned'.

In [4]:
# Datasets: {mobile_data_cleaned}

# SQL query:
query = f"""
CREATE TABLE IF NOT EXISTS `{mobile_data_cleaned}` AS
SELECT
    date,
    hour,
    lat,
    long,
    signal,
    network,
    operator,
    status,
    description,
    net,
    speed,
    satellites,
    precission,
    provider,
    activity,
    downloadSpeed,
    uploadSpeed,
    postal_code,
    town_name,
    position_geom
FROM
  `bigquery-public-data.catalonian_mobile_coverage_eu.mobile_data_2015_2017`
    """

# Execute the query
run_query(query)

Query successfully executed, and the table has been updated.


    - Preview

In [7]:
# Datasets: {mobile_data_cleaned}

# SQL query:
query = f"""
SELECT *
FROM `{mobile_data_cleaned}` 
WHERE
    network IS NOT NULL
    AND postal_code IS NOT NULL
LIMIT 10
    """

# Execute the query
raw_data = query_df(query)

# Display data
raw_data

Unnamed: 0,date,hour,lat,long,signal,network,operator,status,description,net,speed,satellites,precission,provider,activity,downloadSpeed,uploadSpeed,postal_code,town_name,position_geom
0,2015-06-28,08:38:08,41.82873,1.90466,14,LCR,LCR,0,STATE_IN_SERVICE,,122.8,0.0,24.0,gps,ON_BICYCLE,,,81918,Sallent,POINT(1.90466 41.82873)
1,2015-07-08,08:56:02,41.8249,1.89674,14,LCR,LCR,2,STATE_EMERGENCY_ONLY,2G,126.9,1.0,20.0,gps,IN_VEHICLE,,,81918,Sallent,POINT(1.89674 41.8249)
2,2015-08-26,17:58:02,41.92712,2.25851,12,movistar,ONO,2,STATE_EMERGENCY_ONLY,3G,4.2,6.0,14.0,gps,IN_VEHICLE,,,82981,Vic,POINT(2.25851 41.92712)
3,2015-08-30,11:57:07,41.96662,2.24798,12,movistar,ONO,0,STATE_IN_SERVICE,,120.0,3.0,13.0,gps,IN_VEHICLE,,,81000,Gurb,POINT(2.24798 41.96662)
4,2016-03-06,13:49:06,42.17408,2.4877,12,movistar,ONO,2,STATE_EMERGENCY_ONLY,3G,28.6,7.0,4.0,gps,IN_VEHICLE,,,171143,Olot,POINT(2.4877 42.17408)
5,2016-05-31,15:36:33,42.19776,2.49898,8,vodafone,ONO,2,STATE_EMERGENCY_ONLY,3G,3.4,4.0,10.0,gps,IN_VEHICLE,,,171143,Olot,POINT(2.49898 42.19776)
6,2016-05-21,11:00:30,41.97608,2.80048,17,vodafone,ONO,2,STATE_EMERGENCY_ONLY,3G,5.1,8.0,10.0,gps,ON_FOOT,,,171557,Salt,POINT(2.80048 41.97608)
7,2016-05-21,10:59:28,41.97592,2.80151,17,vodafone,ONO,2,STATE_EMERGENCY_ONLY,3G,5.3,8.0,10.0,gps,ON_FOOT,,,171557,Salt,POINT(2.80151 41.97592)
8,2015-08-28,15:42:22,42.15721,1.85916,13,movistar,ONO,2,STATE_EMERGENCY_ONLY,3G,67.6,6.0,27.0,gps,IN_VEHICLE,,,82687,Cercs,POINT(1.85916 42.15721)
9,2015-08-27,11:26:00,42.1768,1.86113,15,movistar,ONO,2,STATE_EMERGENCY_ONLY,3G,46.8,6.0,31.0,gps,IN_VEHICLE,,,82687,Cercs,POINT(1.86113 42.1768)


In [8]:
raw_data.columns

Index(['date', 'hour', 'lat', 'long', 'signal', 'network', 'operator',
       'status', 'description', 'net', 'speed', 'satellites', 'precission',
       'provider', 'activity', 'downloadSpeed', 'uploadSpeed', 'postal_code',
       'town_name', 'position_geom'],
      dtype='object')

    - Add 'province' and 'year' new columns

A new column, **province**, is being added to the BigQuery dataset to include province values, enhancing the dataset for regional analysis.

This new column is currently empty and will be populated based on the structure of Spanish **postal codes**, with the first two digits determining the province name. If the 'postal_code' doesn't match any of these values, it assigns 'Not defined'. Existing rows with pre-defined province values will not be affected by this update.
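
The prefix rule described above can be sketched in plain Python. This is only an illustration of the mapping, not the code that runs in BigQuery. The function name and the sample postal codes are made up for the example.

```python
# First two digits of a Spanish postal code -> Catalan province
PROVINCE_BY_PREFIX = {
    "08": "Barcelona",
    "17": "Girona",
    "25": "Lleida",
    "43": "Tarragona",
}


def province_from_postal_code(postal_code):
    """Return the province for a postal code, or 'Not defined' if no rule matches."""
    if postal_code is None:
        return "Not defined"
    # The code must be kept as a string: an integer such as 8001 would lose
    # its leading zero and no longer start with '08'.
    prefix = str(postal_code)[:2]
    return PROVINCE_BY_PREFIX.get(prefix, "Not defined")


print(province_from_postal_code("08001"))  # Barcelona
print(province_from_postal_code("28001"))  # Not defined
```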

In [9]:
# Datasets: {mobile_data_cleaned}

# SQL query: create and add 'province' column
query = f"""
ALTER TABLE `{mobile_data_cleaned}`
ADD COLUMN province STRING;
    """

# Execute the query
run_query(query)

Query successfully executed, and the table has been updated.


    - Populate 'province' values

In [10]:
# Datasets: {mobile_data_cleaned}

# SQL query: populate 'province' values
query = f"""
UPDATE `{mobile_data_cleaned}`
SET province = CASE
  # First two digits condition (LEFT) for postal code using CAST to maintain a consistent string data type.
  WHEN LEFT(CAST(postal_code AS STRING), 2) = '08' THEN 'Barcelona'
  WHEN LEFT(CAST(postal_code AS STRING), 2) = '25' THEN 'Lleida'
  WHEN LEFT(CAST(postal_code AS STRING), 2) = '17' THEN 'Girona'
  WHEN LEFT(CAST(postal_code AS STRING), 2) = '43' THEN 'Tarragona'
  ELSE 'Not defined'
END
WHERE province IS NULL
    """

# Execute the query
run_query(query)

Query successfully executed, and the table has been updated.


In [15]:
# Datasets: {mobile_data_cleaned}

# SQL query: count rows per province
query = f"""
SELECT
  province,
  COUNT(*) province_counts
FROM `{mobile_data_cleaned}`
GROUP BY 1
ORDER BY 2 DESC
    """

# Execute the query
query_df(query)

Unnamed: 0,province,province_counts
0,Barcelona,7402086
1,Girona,1394942
2,Lleida,1012180
3,Tarragona,1003200
4,Not defined,932506


The 'mobile_data_cleaned' table already has a 'date' column (formatted as YYYY-MM-DD), but most of the later queries only need the year. We therefore **add a new column, 'year'**, extracted from 'date'.

This makes it simpler to filter and group the 'mobile_data_cleaned' table by year.

In [16]:
# SQL query:
query = f"""
    ALTER TABLE `{mobile_data_cleaned}`
    ADD COLUMN IF NOT EXISTS year INT64
    """
# Execute the query
run_query(query)

Query successfully executed, and the table has been updated.


    - Update 'year' values

In [17]:
# SQL query:
query = f"""
    UPDATE `{mobile_data_cleaned}`
    SET year = EXTRACT(YEAR FROM date)
    WHERE year IS NULL
    """
# Execute the query
run_query(query)

Query successfully executed, and the table has been updated.


In [18]:
# SQL query:
query = f"""
    SELECT
        DISTINCT year
    FROM `{mobile_data_cleaned}`
    """
# Execute the query
query_df(query)

Unnamed: 0,year
0,2016
1,2017
2,2015


    - Rename columns to avoid confusion and fix typos

In the original dataset, four crucial columns were present: 'net,' 'network,' 'operator,' and 'provider.' These columns provided information about the type of network (e.g., 4G, 3G, 2G), the network provider (e.g., Movistar, Orange), the specific operator (e.g., Movistar, Orange, ONO, Lowi), and the data provider (e.g., GPS, Fused, Network).

To avoid confusion, particularly between the 'net' and 'network' columns, the following SQL code **renames** the **'network' column to 'net_provider'** and the **'provider' column to 'position_provider'**, so that each column name states its purpose unambiguously.

In [19]:
# Datasets: {mobile_data_cleaned}

# SQL query: rename 'network' for 'net_provider'
query = f"""
ALTER TABLE `{mobile_data_cleaned}`
RENAME COLUMN network TO net_provider
    """

# Execute the query
run_query(query)

Query successfully executed, and the table has been updated.


In [20]:
# Datasets: {mobile_data_cleaned}

# SQL query: rename 'provider' to 'position_provider'
query = f"""
ALTER TABLE `{mobile_data_cleaned}`
RENAME COLUMN provider TO position_provider
    """

# Execute the query
run_query(query)

Query successfully executed, and the table has been updated.


In [21]:
# Datasets: {mobile_data_cleaned}

# SQL query: fix typo in original dataset
query = f"""
ALTER TABLE `{mobile_data_cleaned}`
RENAME COLUMN precission TO precision
    """

# Execute the query
run_query(query)

Query successfully executed, and the table has been updated.


    - Column types

In [22]:
raw_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 20 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   date           10 non-null     object 
 1   hour           10 non-null     object 
 2   lat            10 non-null     float64
 3   long           10 non-null     float64
 4   signal         10 non-null     int64  
 5   network        10 non-null     object 
 6   operator       10 non-null     object 
 7   status         10 non-null     int64  
 8   description    10 non-null     object 
 9   net            8 non-null      object 
 10  speed          10 non-null     float64
 11  satellites     10 non-null     float64
 12  precission     10 non-null     float64
 13  provider       10 non-null     object 
 14  activity       10 non-null     object 
 15  downloadSpeed  0 non-null      object 
 16  uploadSpeed    0 non-null      object 
 17  postal_code    10 non-null     object 
 18  town_name    

In this case, '**date**' and '**hour**' have been transformed to 'object' types. However, in the original Google BigQuery dataset, their formats are 'DATE' and 'TIME', respectively. 

Just in case you need to update these formats in the original table in BigQuery, here are the required queries.

(Please note that we are displaying this data as a Python DataFrame for reference, but the actual changes must be made in the original BigQuery dataset as we will be directly querying it. We won't be working with this DataFrame for further analysis.)

In [23]:
# SQL query: in BigQuery, convert 'date' to DATE and 'hour' to TIME
query = f"""
    # Cast 'date' values to the DATE data type
    # (BigQuery requires a WHERE clause in UPDATE; WHERE TRUE targets every row)
    UPDATE `{mobile_data_cleaned}`
    SET date = CAST(date AS DATE)
    WHERE TRUE
    """
# Execute the query
# run_query(query)


query = f"""
    # Cast 'hour' values to the TIME data type
    UPDATE `{mobile_data_cleaned}`
    SET hour = CAST(hour AS TIME)
    WHERE TRUE
    """

# Execute the query
# run_query(query)

### Categorical Features

#### Standardize data names

**Net provider**

In [24]:
# Datasets: {mobile_data_cleaned}

# SQL query: original count of unique net_provider
query = f"""
SELECT
 COUNT(DISTINCT(net_provider)) original_unique_net_provider
FROM `{mobile_data_cleaned}`
    """

# Execute the query
query_df(query)

Unnamed: 0,original_unique_net_provider
0,250


In [25]:
# Datasets: {mobile_data_cleaned}

# SQL query: example of net_provider names not standardized (Movistar)
query = f"""
SELECT
 DISTINCT(net_provider)
FROM `{mobile_data_cleaned}`
WHERE
 UPPER(net_provider) LIKE 'MO%IS%'
    """

# Execute the query
query_df(query)

Unnamed: 0,net_provider
0,Movistar | Particular
1,Mobistar
2,Movistar | Empresa
3,movistar


    - Unusual characters

In [26]:
# Datasets: {mobile_data_cleaned}

# SQL query: unusual characters to take into account
query = f"""
SELECT DISTINCT net_provider
FROM `{mobile_data_cleaned}`
# Keep only net_provider values that contain characters other than letters, digits, and spaces.
WHERE REGEXP_CONTAINS(net_provider, r'[^a-zA-Z0-9 ]')
    """

# Execute the query
query_df(query)

Unnamed: 0,net_provider
0,E-Plus
1,T-Mobile A
2,Vodafone.de
3,STA-MOBILAND
4,TH 3G+
5,A1 | Orange
6,3 AT | Orange
7,NL KPN | Orange
8,Movistar | Empresa
9,M-AND


The following code **standardizes the 'net_provider' names** in the table. It uses **UPPER** (to convert to uppercase), **TRIM** (to remove leading and trailing spaces), and '**=**' or **LIKE** (with the **%** wildcard for pattern matching) to recognize similar provider names and group them under a common name. The rules are evaluated in order, so the first matching pattern wins.

For example, 'movistar' and 'Mobistar' are both categorized as 'Movistar'. It ensures uniformity in the 'net_provider' column, making the data more consistent and easier to work with.
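
The matching logic described above can be sketched in Python. This is a minimal illustration and not the code that runs in BigQuery: it covers only a handful of the rules, and the helper names (`like_to_regex`, `standardize`) are made up for the example.

```python
import re

# A small, illustrative subset of the rules; order matters, first match wins.
RULES = [
    ("%MO%ISTAR%", "Movistar"),
    ("%ORANGE%", "Orange"),
    ("%VODAFONE%", "Vodafone"),
    ("VF ES", "Vodafone"),
    ("YOIGO%", "Yoigo"),
]


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern ('%' and '_' wildcards) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


COMPILED = [(like_to_regex(pattern), name) for pattern, name in RULES]


def standardize(name):
    """Mimic UPPER(TRIM(name)) LIKE ... with a fallback to the original value."""
    if name is None:
        return None
    key = name.strip().upper()
    for regex, canonical in COMPILED:
        if regex.fullmatch(key):
            return canonical
    return name  # same as ELSE net_provider


for raw in ["movistar", "Mobistar", "Movistar | Empresa", " vf es "]:
    print(repr(raw), "->", standardize(raw))
```

A pattern without wildcards, such as 'VF ES', behaves like the '=' comparisons in the SQL statement because the whole trimmed, uppercased value has to match.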

    - Standardize net_provider names

In [27]:
# Datasets: {mobile_data_cleaned}

# SQL query: Update net_provider names
query = f"""
UPDATE `{mobile_data_cleaned}`
SET net_provider = CASE
    WHEN UPPER(TRIM(net_provider)) LIKE '3%' THEN 'Three'
    WHEN UPPER(TRIM(net_provider)) LIKE '%AIRTEL%' THEN 'Airtel'
    WHEN UPPER(TRIM(net_provider)) LIKE '%B%TEL%' THEN 'Bytel'
    WHEN UPPER(TRIM(net_provider)) LIKE '%BOUYGUES%' THEN 'Bouygues Telecom'
    WHEN UPPER(TRIM(net_provider)) LIKE 'BUSCANDO %' THEN 'Sense Servei'
    WHEN UPPER(TRIM(net_provider)) = 'CABLE MOVIL' THEN 'Cable Movil'
    WHEN UPPER(TRIM(net_provider)) = 'CABLEMOVIL' THEN 'Cable Movil'
    WHEN UPPER(TRIM(net_provider)) LIKE 'CLARO%' THEN 'Claro'
    WHEN UPPER(TRIM(net_provider)) LIKE 'CUBACEL%' THEN 'Cubacel'
    WHEN UPPER(TRIM(net_provider)) LIKE 'E-%' THEN 'EE'
    WHEN UPPER(TRIM(net_provider)) LIKE '%EMER%' THEN 'Nomes Trucades Emergencies'
    WHEN UPPER(TRIM(net_provider)) LIKE 'JAZZTEL%' THEN 'Jazztel'
    WHEN UPPER(TRIM(net_provider)) LIKE 'LOWI%' THEN 'Lowi'
    WHEN LEFT(UPPER(TRIM(net_provider)), 14) = 'FRANCE TELECOM' THEN 'France Telcom Espana SA'
    WHEN UPPER(TRIM(net_provider)) LIKE 'MASMOVIL%' THEN 'Masmovil'
    WHEN UPPER(TRIM(net_provider)) LIKE '%MOBILAND%' THEN 'Mobiland'
    WHEN UPPER(TRIM(net_provider)) LIKE 'MOVILNET%' THEN 'Movilnet'
    WHEN UPPER(TRIM(net_provider)) LIKE '%MO%ISTAR%' THEN 'Movistar'
    WHEN UPPER(TRIM(net_provider)) LIKE 'MTS%' THEN 'MTS'
    WHEN UPPER(TRIM(net_provider)) LIKE '%ORANGE%' THEN 'Orange'
    WHEN UPPER(TRIM(net_provider)) = 'A1' THEN 'Orange'
    WHEN UPPER(TRIM(net_provider)) LIKE 'O2%' THEN 'O2'
    WHEN UPPER(TRIM(net_provider)) LIKE 'PE%EPHONE' THEN 'Pepephone'
    WHEN UPPER(TRIM(net_provider)) = 'PROXIMUS' THEN 'Proximus'
    WHEN UPPER(TRIM(net_provider)) LIKE 'REPUBLICA%' THEN 'Republica Movil'
    WHEN UPPER(TRIM(net_provider)) LIKE 'SENSE %' THEN 'Sense Servei'
    WHEN UPPER(TRIM(net_provider)) LIKE 'SIMYO%' THEN 'Simyo'
    WHEN UPPER(TRIM(net_provider)) LIKE 'SIN %' THEN 'Sense Servei'
    WHEN UPPER(TRIM(net_provider)) LIKE 'TIGO%' THEN 'TIGO'
    WHEN UPPER(TRIM(net_provider)) LIKE 'TIM%' THEN 'TIM'
    WHEN UPPER(TRIM(net_provider)) LIKE 'TELEF%' THEN 'Movistar'
    WHEN UPPER(TRIM(net_provider)) LIKE 'TDC%' THEN 'TDC Mobile'
    WHEN UPPER(TRIM(net_provider)) LIKE 'TELEKOM%' THEN 'Telekom'
    WHEN UPPER(TRIM(net_provider)) LIKE '%TELENOR%' THEN 'Telenor'
    WHEN UPPER(TRIM(net_provider)) LIKE '%T-MOBILE%' THEN 'T-Mobile'
    WHEN UPPER(TRIM(net_provider)) LIKE 'VIVO%' THEN 'Vivo'
    WHEN UPPER(TRIM(net_provider)) LIKE '%VODAFONE%' THEN 'Vodafone'
    WHEN UPPER(TRIM(net_provider)) = 'VF ES' THEN 'Vodafone'
    WHEN UPPER(TRIM(net_provider)) LIKE 'VODACOM%' THEN 'Vodacom'
    WHEN UPPER(TRIM(net_provider)) LIKE 'YOIGO%' THEN 'Yoigo'
    ELSE net_provider
END
WHERE net_provider IS NOT NULL;

    """

# Execute the query
run_query(query)

Query successfully executed, and the table has been updated.


    - Null values in net_provider

In [28]:
# Datasets: {mobile_data_cleaned}

# SQL query: Count null values and calculate the percentage
query = f"""
WITH NetProviderCount AS (
 SELECT
   COUNT(*) AS total_count_net_provider
 FROM `{mobile_data_cleaned}`
),
NullProviderCount AS (
 SELECT
   COUNT(*) AS null_count_net_provider
 FROM `{mobile_data_cleaned}`
 WHERE
   net_provider IS NULL
   OR net_provider = 'null'
)

SELECT
 null_count_net_provider,
 CONCAT(ROUND((null_count_net_provider/total_count_net_provider)*100, 2), " %") AS perc_null_net_provider
FROM NetProviderCount, NullProviderCount
"""

# Execute the query
query_df(query)

Unnamed: 0,null_count_net_provider,perc_null_net_provider
0,53284,0.45 %


     - Remove rows with specified net_provider values and NULL net_providers from the table.

In [29]:
# Datasets: {mobile_data_cleaned}

# SQL query: drop specific rows
query = f"""
DELETE FROM `{mobile_data_cleaned}`
WHERE
 net_provider IS NULL
 OR net_provider IN ('000000', '21303', '21401', '23866', '90118', '?????', 'null')
    """

# Execute the query
run_query(query)

Query successfully executed, and the table has been updated.


    - Remove rows belonging to any net_provider that has fewer than 10 records

In [30]:
# Datasets: {mobile_data_cleaned}

# SQL query: drop specific rows
query = f"""
DELETE FROM `{mobile_data_cleaned}`
WHERE net_provider IN (
    SELECT net_provider
    FROM `{mobile_data_cleaned}`
    GROUP BY net_provider
    HAVING COUNT(*) < 10
)
    """
# Execute the query
run_query(query)

Query successfully executed, and the table has been updated.
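
The effect of the DELETE statement above can be illustrated on a small in-memory sample. This is a sketch only: the `drop_rare` helper and the sample rows are invented for the example.

```python
from collections import Counter


def drop_rare(rows, key, min_count=10):
    """Keep only rows whose value for `key` appears at least `min_count` times."""
    counts = Counter(row[key] for row in rows)
    return [row for row in rows if counts[row[key]] >= min_count]


# 12 records for a frequent provider and 3 for a rare one
sample = [{"net_provider": "Movistar"}] * 12 + [{"net_provider": "XYZ"}] * 3
kept = drop_rare(sample, "net_provider")
print(len(kept))  # 12
```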


In [31]:
# Datasets: {mobile_data_cleaned}

# SQL query: final count of unique net_provider
query = f"""
SELECT
 COUNT(DISTINCT(net_provider)) final_unique_net_provider
FROM `{mobile_data_cleaned}`
    """

# Execute the query
query_df(query)

Unnamed: 0,final_unique_net_provider
0,122


In [32]:
# Datasets: {mobile_data_cleaned}

# SQL query: 
query = f"""
SELECT
 net_provider,
 COUNT(*) activity_count
FROM `{mobile_data_cleaned}`
GROUP BY 1
ORDER BY 2 DESC
LIMIT 20
"""

# Execute the query 
query_df(query)

Unnamed: 0,net_provider,activity_count
0,Movistar,5087640
1,Orange,2932386
2,Vodafone,2822286
3,Yoigo,376842
4,MetroPCS,95162
5,Eroski Movil,40041
6,Jazztel,28894
7,France Telcom Espana SA,28113
8,TICAE,26763
9,Lowi,21742


**Operator**

In [33]:
# Datasets: {mobile_data_cleaned}

# SQL query: original count of unique operator
query = f"""
SELECT
 COUNT(DISTINCT(operator)) original_unique_net_operator
FROM `{mobile_data_cleaned}`
    """

# Execute the query
query_df(query)

Unnamed: 0,original_unique_net_operator
0,243


    - Standardize operator names

In [34]:
# Datasets: {mobile_data_cleaned}

# SQL query: Update operator names
query = f"""
UPDATE `{mobile_data_cleaned}`
SET operator = CASE
    WHEN UPPER(TRIM(operator)) LIKE '3%' THEN 'Three'
    WHEN UPPER(TRIM(operator)) LIKE '%AIRTEL%' THEN 'Airtel'
    WHEN UPPER(TRIM(operator)) LIKE '%B%TEL%' THEN 'Bytel'
    WHEN UPPER(TRIM(operator)) LIKE '%BOUYGUES%' THEN 'Bouygues Telecom'
    WHEN UPPER(TRIM(operator)) LIKE 'BUSCANDO %' THEN 'Sense Servei'
    WHEN UPPER(TRIM(operator)) = 'CABLE MOVIL' THEN 'Cable Movil'
    WHEN UPPER(TRIM(operator)) = 'CABLEMOVIL' THEN 'Cable Movil'
    WHEN UPPER(TRIM(operator)) LIKE 'CLARO%' THEN 'Claro'
    WHEN UPPER(TRIM(operator)) LIKE 'CUBACEL%' THEN 'Cubacel'
    WHEN UPPER(TRIM(operator)) LIKE 'E-%' THEN 'EE'
    WHEN UPPER(TRIM(operator)) LIKE '%EMER%' THEN 'Nomes Trucades Emergencies'
    WHEN UPPER(TRIM(operator)) LIKE 'JAZZTEL%' THEN 'Jazztel'
    WHEN UPPER(TRIM(operator)) LIKE 'LOWI%' THEN 'Lowi'
    WHEN LEFT(UPPER(TRIM(operator)), 14) = 'FRANCE TELECOM' THEN 'France Telcom Espana SA'
    WHEN UPPER(TRIM(operator)) LIKE 'MASMOVIL%' THEN 'Masmovil'
    WHEN UPPER(TRIM(operator)) LIKE '%MOBILAND%' THEN 'Mobiland'
    WHEN UPPER(TRIM(operator)) LIKE 'MOVILNET%' THEN 'Movilnet'
    WHEN UPPER(TRIM(operator)) LIKE '%MO%ISTAR%' THEN 'Movistar'
    WHEN UPPER(TRIM(operator)) LIKE 'MTS%' THEN 'MTS'
    WHEN UPPER(TRIM(operator)) LIKE '%ORANGE%' THEN 'Orange'
    WHEN UPPER(TRIM(operator)) = 'A1' THEN 'Orange'
    WHEN UPPER(TRIM(operator)) LIKE 'O2%' THEN 'O2'
    WHEN UPPER(TRIM(operator)) LIKE 'PE%EPHONE' THEN 'Pepephone'
    WHEN UPPER(TRIM(operator)) = 'PROXIMUS' THEN 'Proximus'
    WHEN UPPER(TRIM(operator)) LIKE 'REPUBLICA%' THEN 'Republica Movil'
    WHEN UPPER(TRIM(operator)) LIKE 'SENSE %' THEN 'Sense Servei'
    WHEN UPPER(TRIM(operator)) LIKE 'SIMYO%' THEN 'Simyo'
    WHEN UPPER(TRIM(operator)) LIKE 'SIN %' THEN 'Sense Servei'
    WHEN UPPER(TRIM(operator)) LIKE 'TIGO%' THEN 'TIGO'
    WHEN UPPER(TRIM(operator)) LIKE 'TIM%' THEN 'TIM'
    WHEN UPPER(TRIM(operator)) LIKE 'TELEF%' THEN 'Movistar'
    WHEN UPPER(TRIM(operator)) LIKE 'TDC%' THEN 'TDC Mobile'
    WHEN UPPER(TRIM(operator)) LIKE 'TELEKOM%' THEN 'Telekom'
    WHEN UPPER(TRIM(operator)) LIKE '%TELENOR%' THEN 'Telenor'
    WHEN UPPER(TRIM(operator)) LIKE '%T-MOBILE%' THEN 'T-Mobile'
    WHEN UPPER(TRIM(operator)) LIKE 'VIVO%' THEN 'Vivo'
    WHEN UPPER(TRIM(operator)) LIKE '%VODAFONE%' THEN 'Vodafone'
    WHEN UPPER(TRIM(operator)) = 'VF ES' THEN 'Vodafone'
    WHEN UPPER(TRIM(operator)) LIKE 'VODACOM%' THEN 'Vodacom'
    WHEN UPPER(TRIM(operator)) LIKE 'YOIGO%' THEN 'Yoigo'
    ELSE operator
END
WHERE operator IS NOT NULL;
    """

# Execute the query
run_query(query)

Query successfully executed, and the table has been updated.


    - Null values in operator

In [35]:
# Datasets: {mobile_data_cleaned}

# SQL query: Count null values and calculate the percentage
query = f"""
WITH NetOperatorCount AS (
 SELECT
   COUNT(*) AS total_count_operator
 FROM `{mobile_data_cleaned}`
),
NullOperatorCount AS (
 SELECT
   COUNT(*) AS null_count_operator
 FROM `{mobile_data_cleaned}`
 WHERE
   operator IS NULL
   OR operator = 'null'
)

SELECT
 null_count_operator,
 CONCAT(ROUND((null_count_operator/total_count_operator)*100, 2), " %") AS perc_null_operator
FROM NetOperatorCount, NullOperatorCount
"""

# Execute the query
query_df(query)

Unnamed: 0,null_count_operator,perc_null_operator
0,0,0 %


In [37]:
# Datasets: {mobile_data_cleaned}

# SQL query: final count of unique operator
query = f"""
SELECT
 COUNT(DISTINCT(operator)) final_unique_operator
FROM `{mobile_data_cleaned}`
    """

# Execute the query
query_df(query)

Unnamed: 0,final_unique_operator
0,129


In [38]:
# Datasets: {mobile_data_cleaned}

# SQL query: 
query = f"""
SELECT
 operator,
 COUNT(*) activity_count
FROM `{mobile_data_cleaned}`
GROUP BY 1
ORDER BY 2 DESC
LIMIT 20
"""

# Execute the query 
query_df(query)

Unnamed: 0,operator,activity_count
0,Movistar,4601611
1,Vodafone,2808951
2,Orange,2130386
3,Jazztel,465242
4,Pepephone,398089
5,Yoigo,376842
6,RACC,145208
7,Simyo,142167
8,MetroPCS,95162
9,PARLEM,48442


**Net**

In [39]:
# Datasets: {mobile_data_cleaned}

# SQL query: 
query = f"""
SELECT
 net,
 COUNT(*) activity_record
FROM `{mobile_data_cleaned}`
GROUP BY 1
"""

# Execute the query 
query_df(query)

Unnamed: 0,net,activity_record
0,,763270
1,2G,2349673
2,4G,4532624
3,3G,4045652


    - Replace null values with 'Undefined net'

In [40]:
# Datasets: {mobile_data_cleaned}

# SQL query: 
query = f"""
UPDATE `{mobile_data_cleaned}`
SET net = 'Undefined net'
WHERE net IS NULL
"""

# Execute the query 
run_query(query)

Query successfully executed, and the table has been updated.


**Position Provider**

In [41]:
# Datasets: {mobile_data_cleaned}

# SQL query: 
query = f"""
SELECT
 position_provider,
 COUNT(*) activity_record
FROM `{mobile_data_cleaned}`
GROUP BY 1
"""

# Execute the query 
query_df(query)

Unnamed: 0,position_provider,activity_record
0,network,286
1,fused,1409817
2,local_database,4
3,19,1
4,gps,10279854
5,GPS,1125
6,disabled,25
7,,3
8,22,1
9,2017-08-28 11:31:10.000000,1


    - Standardize position_provider names

In [42]:
# Datasets: {mobile_data_cleaned}

# SQL query: 
query = f"""
UPDATE `{mobile_data_cleaned}`
SET position_provider = CASE
    WHEN UPPER(TRIM(position_provider)) LIKE '%GPS%' THEN 'GPS'
    WHEN UPPER(TRIM(position_provider)) LIKE '%FUSED%' THEN 'Fused'
    WHEN UPPER(TRIM(position_provider)) LIKE '%NETWORK%' THEN 'Network'
    ELSE 'Undefined'
END
# Every row is rewritten; BigQuery requires a WHERE clause in UPDATE
WHERE TRUE
"""

# Execute the query 
run_query(query)

Query successfully executed, and the table has been updated.


In [43]:
# Datasets: {mobile_data_cleaned}

# SQL query: 
query = f"""
SELECT
 position_provider,
 COUNT(*) activity_record
FROM `{mobile_data_cleaned}`
GROUP BY 1
"""

# Execute the query 
query_df(query)

Unnamed: 0,position_provider,activity_record
0,Network,383
1,Undefined,39
2,Fused,1409817
3,GPS,10280980


**Postal Code** and **Town Names**

This part ensures that there are no discrepancies between postal codes and town names, and removes rows with missing values in both columns.

In [44]:
# Datasets: {mobile_data_cleaned}

# SQL query:
query = f"""
SELECT
 COUNT(*) one_valid_one_null
FROM `{mobile_data_cleaned}`
WHERE
    (postal_code IS NULL AND town_name IS NOT NULL)
    OR (postal_code IS NOT NULL AND town_name IS NULL)
"""

# Execute the query
query_df(query)

Unnamed: 0,one_valid_one_null
0,0


In [45]:
# Datasets: {mobile_data_cleaned}

# SQL query:
query = f"""
SELECT
 COUNT(*) both_features_null
FROM `{mobile_data_cleaned}`
WHERE
    postal_code IS NULL AND town_name IS NULL
"""

# Execute the query
query_df(query)

Unnamed: 0,both_features_null
0,926965


    - This means that all null values in 'postal_code' correspond to null values in 'town_name' (and also to missing 'province' values). Therefore, we are going to drop all rows with null values in both features.

In [46]:
# Datasets: {mobile_data_cleaned}

# SQL query: delete specific rows
query = f"""
DELETE FROM `{mobile_data_cleaned}`
WHERE postal_code IS NULL AND town_name IS NULL
"""

# Execute the query 
run_query(query)

Query successfully executed, and the table has been updated.


**Download Speed** and **Upload Speed**

In [47]:
# Datasets: {mobile_data_cleaned}

# SQL query:
query = f"""
SELECT
 DISTINCT(downloadSpeed)
FROM `{mobile_data_cleaned}`
LIMIT 60
"""

# Execute the query
query_df(query)

Unnamed: 0,downloadSpeed
0,


In [48]:
# Datasets: {mobile_data_cleaned}

# SQL query:
query = f"""
SELECT
 DISTINCT(uploadSpeed)
FROM `{mobile_data_cleaned}`
LIMIT 60
"""

# Execute the query
query_df(query)

Unnamed: 0,uploadSpeed
0,


    - Both columns have no recorded data and only contain null values. Since they do not provide any information, we are going to drop both columns.

In [49]:
# Datasets: {mobile_data_cleaned}

# SQL query: drop specific columns
query = f"""
ALTER TABLE `{mobile_data_cleaned}`
DROP COLUMN downloadSpeed,
DROP COLUMN uploadSpeed
"""

# Execute the query 
run_query(query)

Query successfully executed, and the table has been updated.


**Description** and **Activity**

In [50]:
# Datasets: {mobile_data_cleaned}

# SQL query:
query = f"""
SELECT
 description,
 COUNT(*) activity_record
FROM `{mobile_data_cleaned}`
GROUP BY 1
LIMIT 60
"""

# Execute the query
query_df(query)

Unnamed: 0,description,activity_record
0,STATE_OUT_OF_SERVICE,142
1,STATE_POWER_OFF,26407
2,STATE_IN_SERVICE,1151314
3,STATE_EMERGENCY_ONLY,9586391


In [51]:
# Datasets: {mobile_data_cleaned}

# SQL query:
query = f"""
SELECT
 activity,
 COUNT(*) activity_record
FROM `{mobile_data_cleaned}`
GROUP BY 1
LIMIT 60
"""

# Execute the query
query_df(query)

Unnamed: 0,activity,activity_record
0,UNKNOWN,699712
1,TILTING,1092941
2,IN_VEHICLE,5349968
3,ON_BICYCLE,200495
4,STILL,1355843
5,ON_FOOT,2065263
6,,32


    - Update null values to 'UNKNOWN' for the 'activity' column.

In [52]:
# Datasets: {mobile_data_cleaned}

# SQL query: update NULL values
query = f"""
UPDATE `{mobile_data_cleaned}`
SET activity = 'UNKNOWN'
WHERE activity IS NULL
"""

# Execute the query 
run_query(query)

Query successfully executed, and the table has been updated.


### Numerical Features

#### Summary Statistics

In [53]:
# Datasets: {mobile_data_cleaned}

# SQL query: querying numerical features
query = f"""
SELECT
 status,
 speed,
 precision,
 signal,
 satellites
FROM `{mobile_data_cleaned}`
LIMIT 10
"""

# Execute the query
query_df(query)

Unnamed: 0,status,speed,precision,signal,satellites
0,2,131.0,23.0,17,3.0
1,2,20.4,11.0,15,2.0
2,2,15.3,29.0,4,7.0
3,2,73.6,18.0,7,3.0
4,2,71.0,27.0,7,6.0
5,2,78.1,16.0,7,5.0
6,2,132.4,37.0,8,5.0
7,2,134.3,16.0,10,5.0
8,2,45.7,9.0,10,7.0
9,2,2.7,11.0,14,0.0


    - As 'status' values are directly related to the 'description' (categorical) values, we will drop the 'status' column.

In [54]:
# Datasets: {mobile_data_cleaned}

# SQL query: drop the 'status' column
query = f"""
ALTER TABLE `{mobile_data_cleaned}`
DROP COLUMN status
"""

# Execute the query 
run_query(query)

Query successfully executed, and the table has been updated.


We could load the entire dataset into a DataFrame and call the pandas .describe() function to obtain the **summary statistics for numerical features**. Instead, we calculate some of them directly in BigQuery **using SQL commands**.

In [55]:
# Datasets: {mobile_data_cleaned}

# SQL query: summary statistics
query = f"""
SELECT
  'speed' AS metric,
  MIN(speed) AS min_value,
  APPROX_QUANTILES(speed, 100)[OFFSET(25)] AS percentile_25,
  APPROX_QUANTILES(speed, 100)[OFFSET(50)] AS median,
  APPROX_QUANTILES(speed, 100)[OFFSET(75)] AS percentile_75,
  CAST(MAX(speed) AS INT64) AS max_value,
  ROUND(AVG(speed), 2) AS avg_value,
  CAST(ROUND(STDDEV_POP(speed), 2) AS INT64) AS std_value
FROM `{mobile_data_cleaned}`

UNION ALL

SELECT
  'satellites' AS metric,
  MIN(satellites) AS min_value,
  APPROX_QUANTILES(satellites, 100)[OFFSET(25)] AS percentile_25,
  APPROX_QUANTILES(satellites, 100)[OFFSET(50)] AS median,
  APPROX_QUANTILES(satellites, 100)[OFFSET(75)] AS percentile_75,
  CAST(MAX(satellites) AS INT64) AS max_value,
  ROUND(AVG(satellites), 2) AS avg_value,
  CAST(ROUND(STDDEV_POP(satellites), 2) AS INT64) AS std_value
FROM `{mobile_data_cleaned}`

UNION ALL

SELECT
  'precision' AS metric,
  MIN(precision) AS min_value,
  APPROX_QUANTILES(precision, 100)[OFFSET(25)] AS percentile_25,
  APPROX_QUANTILES(precision, 100)[OFFSET(50)] AS median,
  APPROX_QUANTILES(precision, 100)[OFFSET(75)] AS percentile_75,
  CAST(MAX(precision) AS INT64) AS max_value,
  ROUND(AVG(precision), 2) AS avg_value,
  CAST(ROUND(STDDEV_POP(precision), 2) AS INT64) AS std_value
FROM `{mobile_data_cleaned}`

UNION ALL

SELECT
  'signal' AS metric,
  MIN(signal) AS min_value,
  APPROX_QUANTILES(signal, 100)[OFFSET(25)] AS percentile_25,
  APPROX_QUANTILES(signal, 100)[OFFSET(50)] AS median,
  APPROX_QUANTILES(signal, 100)[OFFSET(75)] AS percentile_75,
  CAST(MAX(signal) AS INT64) AS max_value,
  ROUND(AVG(signal), 2) AS avg_value,
  CAST(ROUND(STDDEV_POP(signal), 2) AS INT64) AS std_value
FROM `{mobile_data_cleaned}`
"""

# Execute the query
query_df(query)

Unnamed: 0,metric,min_value,percentile_25,median,percentile_75,max_value,avg_value,std_value
0,signal,0.0,8.0,12.0,18.0,99,13.2,7
1,speed,0.0,1.4,7.2,39.4,255,25.85,35
2,satellites,0.0,2.0,4.0,7.0,11503299477926,1068783.46,3506348267
3,precision,0.0,10.0,17.0,30.0,201503299477926,28152604.87,68773279817


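- **APPROX_QUANTILES(column, number_of_divisions)[OFFSET(index)]**:

APPROX_QUANTILES returns an array of approximate quantile boundaries for the specified column. With 100 divisions the array has 101 elements: element 0 is the approximate minimum and element 100 the approximate maximum. [OFFSET(index)] then picks one boundary from that array, so OFFSET(25), OFFSET(50), and OFFSET(75) give the approximate 25th percentile, median, and 75th percentile. This is particularly useful for analyzing data distributions and summarizing data into percentiles or quartiles.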
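
To make the array indexing concrete, here is a small Python sketch that builds an exact array of quantile boundaries and reads the quartiles from it. It is an illustration only: BigQuery computes approximate values with its own algorithm, and the `quantile_array` helper and its nearest-rank rule are assumptions made for the example.

```python
def quantile_array(values, n):
    """Return n + 1 boundaries: index 0 is the minimum, index n the maximum."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    last = len(ordered) - 1
    # Nearest-rank position for each of the n + 1 boundaries (integer rounding)
    return [ordered[(k * last + n // 2) // n] for k in range(n + 1)]


data = list(range(101))                # 0, 1, ..., 100
quantiles = quantile_array(data, 100)  # 101 boundaries, like APPROX_QUANTILES(x, 100)
print(quantiles[25], quantiles[50], quantiles[75])  # 25 50 75
```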

    - Handling possible outliers in 'precision' and 'satellites'

In [56]:
# Datasets: {mobile_data_cleaned}

# SQL query: precision values ordered DESC to check maximum values
query = f"""
SELECT
 CAST(precision AS INT64) precision
FROM `{mobile_data_cleaned}`
ORDER BY 1 DESC
LIMIT 10
"""

# Execute the query
query_df(query)

Unnamed: 0,precision
0,201503299477926
1,101503867043000
2,5304
3,3799
4,3400
5,3400
6,3400
7,3400
8,3400
9,3400


In [57]:
# Datasets: {mobile_data_cleaned}

# SQL query: satellites values ordered DESC to check maximum values
query = f"""
SELECT
 CAST(satellites AS INT64) satellites
FROM `{mobile_data_cleaned}`
ORDER BY 1 DESC
LIMIT 10
"""

# Execute the query
query_df(query)

Unnamed: 0,satellites
0,11503299477926
1,42
2,29
3,29
4,29
5,29
6,29
7,29
8,28
9,28


    - Handling outliers in 'precision' column

Next, we calculate the 25th, 50th, 75th, 90th, 95th and 99th percentiles of the 'precision' column to see where the extreme values begin.

In [58]:
# Datasets: {mobile_data_cleaned}

# SQL query: percentiles (25th to 99th) of the 'precision' column
query = f"""
WITH Percentiles AS (
  SELECT
    APPROX_QUANTILES(precision, 100)[OFFSET(25)] AS percentile_25,
    APPROX_QUANTILES(precision, 100)[OFFSET(50)] AS median,
    APPROX_QUANTILES(precision, 100)[OFFSET(75)] AS percentile_75,
    APPROX_QUANTILES(precision, 100)[OFFSET(90)] AS percentile_90,
    APPROX_QUANTILES(precision, 100)[OFFSET(95)] AS percentile_95,
    APPROX_QUANTILES(precision, 100)[OFFSET(99)] AS percentile_99
  FROM `{mobile_data_cleaned}`
)
SELECT percentile_25, median, percentile_75, percentile_90, percentile_95, percentile_99
FROM Percentiles;
"""

# Execute the query
query_df(query)

Unnamed: 0,percentile_25,median,percentile_75,percentile_90,percentile_95,percentile_99
0,10.0,17.0,30.0,48.0,62.0,120.0
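The query above calls APPROX_QUANTILES six times on the same column. A possible variation, sketched below, computes the array once in the CTE and indexes it in the outer SELECT. The helper build_percentile_query and its parameters are hypothetical and not part of 'query_functions.py'; the table path is the one defined at the top of the notebook.

```python
def build_percentile_query(table: str, column: str, offsets: list[int]) -> str:
    """Build a BigQuery query that computes the quantile array once and indexes it.

    Hypothetical helper: not part of query_functions.py.
    """
    selected = ",\n  ".join(
        f"quantiles[OFFSET({offset})] AS percentile_{offset}" for offset in offsets
    )
    return f"""
WITH Percentiles AS (
  SELECT APPROX_QUANTILES({column}, 100) AS quantiles
  FROM `{table}`
)
SELECT
  {selected}
FROM Percentiles
"""


mobile_data_cleaned = "bq-analyst-230590.project_cat_mobile_coverage_2015_2017.mobile_data_2015_2017_cleaned"
query = build_percentile_query(mobile_data_cleaned, "precision", [25, 50, 75, 90, 95, 99])
print(query)
# In the notebook, the result would be fetched with: query_df(query)
```

Because the array is built a single time, every percentile is read from the same approximation.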


In [59]:
# Datasets: {mobile_data_cleaned}

# SQL query: count rows with precision equal to or greater than 120
query = f"""
SELECT
 COUNT(*) precision_eq_or_gr_t_120
FROM `{mobile_data_cleaned}`
WHERE precision >= 120
"""

# Execute the query
query_df(query)

Unnamed: 0,precision_eq_or_gr_t_120
0,108841


    - 99th percentile: roughly the top 1% of rows (108841) have a 'precision' value equal to or greater than 120.

    - How would the removal of these outliers affect the data distribution, and would this impact be consistent across different provinces?

The following query calculates the percentage of outliers per province, treating 'precision' values equal to or above 120 as outliers. It shows how the outliers are distributed among provinces.

In [60]:
# Datasets: {mobile_data_cleaned}

# SQL query: outlier count and percentage per province
query = f"""
SELECT
  province,
  COUNT(*) AS province_records,
  SUM(CASE WHEN precision >= 120 THEN 1 ELSE 0 END) AS province_outliers,
  ROUND((SUM(CASE WHEN precision >= 120 THEN 1 ELSE 0 END) / COUNT(*)) * 100, 2) AS percentage_outliers
FROM `{mobile_data_cleaned}`
GROUP BY 1
ORDER BY 1
"""

# Execute the query
query_df(query)

Unnamed: 0,province,province_records,province_outliers,percentage_outliers
0,Barcelona,7367393,78754,1.07
1,Girona,1391076,14873,1.07
2,Lleida,1003705,4518,0.45
3,Tarragona,1002080,10696,1.07


    - Removing these outliers would affect Barcelona, Girona and Tarragona equally (1.07% of their records each); only Lleida would lose a smaller share (0.45%).
    
    - We therefore proceed to delete the rows with 'precision' values equal to or greater than 120.
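The percentages above can be reproduced from the counts in the result table. The sketch below recomputes them in plain Python and derives how many rows should remain after the deletion. The figures are copied from the query output and the variable names are illustrative.

```python
# Counts copied from the province query output:
# province -> (records, outliers with precision >= 120)
province_counts = {
    "Barcelona": (7367393, 78754),
    "Girona": (1391076, 14873),
    "Lleida": (1003705, 4518),
    "Tarragona": (1002080, 10696),
}

# Share of outliers per province, rounded like ROUND(..., 2) in the query
percentage_outliers = {
    province: round(outliers / records * 100, 2)
    for province, (records, outliers) in province_counts.items()
}

total_records = sum(records for records, _ in province_counts.values())
total_outliers = sum(outliers for _, outliers in province_counts.values())
rows_after_delete = total_records - total_outliers

print(percentage_outliers)  # {'Barcelona': 1.07, 'Girona': 1.07, 'Lleida': 0.45, 'Tarragona': 1.07}
print(total_outliers, rows_after_delete)  # 108841 10655413
```

The total of 108841 matches the count returned by the previous query, and the remaining 10655413 rows match the row count reported for the cleaned table at the end of this notebook.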

In [61]:
# Datasets: {mobile_data_cleaned}

# SQL query: delete outliers in 'precision' column
query = f"""
DELETE FROM `{mobile_data_cleaned}`
WHERE precision >= 120
"""

# Execute the query
run_query(query)

Query successfully executed, and the table has been updated.


    - Deleting the rows with 'precision' outliers also removed the outlier in the 'satellites' column (11503299477926), as the next query confirms.

In [62]:
# Datasets: {mobile_data_cleaned}

# SQL query: satellites values ordered DESC to check maximum values
query = f"""
SELECT
 CAST(satellites AS INT64) satellites
FROM `{mobile_data_cleaned}`
ORDER BY 1 DESC
LIMIT 10
"""

# Execute the query
query_df(query)

Unnamed: 0,satellites
0,42
1,29
2,29
3,29
4,29
5,29
6,29
7,28
8,28
9,28
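The overlap between the 'satellites' outlier and the 'precision' outliers could also be checked directly, ideally before running the DELETE. The sketch below only builds the query text. The helper build_overlap_check and its default thresholds (42 as the largest plausible 'satellites' value seen above, 120 as the 'precision' cut-off) are assumptions made for illustration.

```python
def build_overlap_check(table: str, satellites_max: int = 42, precision_threshold: int = 120) -> str:
    """Build a query that splits 'satellites' outliers by their 'precision' status.

    Hypothetical helper: not part of query_functions.py.
    """
    return f"""
SELECT
  precision >= {precision_threshold} AS is_precision_outlier,
  COUNT(*) AS row_count
FROM `{table}`
WHERE satellites > {satellites_max}
GROUP BY 1
"""


mobile_data_cleaned = "bq-analyst-230590.project_cat_mobile_coverage_2015_2017.mobile_data_2015_2017_cleaned"
query = build_overlap_check(mobile_data_cleaned)
print(query)
# In the notebook, run before the DELETE with: query_df(query)
```

If every returned row has is_precision_outlier set to true, all 'satellites' outliers sit in rows that the 'precision' filter removes.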


    - Summary statistics after removing the outliers

In [63]:
# Datasets: {mobile_data_cleaned}

# SQL query: summary statistics
query = f"""
SELECT
  'speed' AS metric,
  MIN(speed) AS min_value,
  APPROX_QUANTILES(speed, 100)[OFFSET(25)] AS percentile_25,
  APPROX_QUANTILES(speed, 100)[OFFSET(50)] AS median,
  APPROX_QUANTILES(speed, 100)[OFFSET(75)] AS percentile_75,
  CAST(MAX(speed) AS INT64) AS max_value,
  ROUND(AVG(speed), 2) AS avg_value,
  CAST(ROUND(STDDEV_POP(speed), 2) AS INT64) AS std_value
FROM `{mobile_data_cleaned}`

UNION ALL

SELECT
  'satellites' AS metric,
  MIN(satellites) AS min_value,
  APPROX_QUANTILES(satellites, 100)[OFFSET(25)] AS percentile_25,
  APPROX_QUANTILES(satellites, 100)[OFFSET(50)] AS median,
  APPROX_QUANTILES(satellites, 100)[OFFSET(75)] AS percentile_75,
  CAST(MAX(satellites) AS INT64) AS max_value,
  ROUND(AVG(satellites), 2) AS avg_value,
  CAST(ROUND(STDDEV_POP(satellites), 2) AS INT64) AS std_value
FROM `{mobile_data_cleaned}`

UNION ALL

SELECT
  'precision' AS metric,
  MIN(precision) AS min_value,
  APPROX_QUANTILES(precision, 100)[OFFSET(25)] AS percentile_25,
  APPROX_QUANTILES(precision, 100)[OFFSET(50)] AS median,
  APPROX_QUANTILES(precision, 100)[OFFSET(75)] AS percentile_75,
  CAST(MAX(precision) AS INT64) AS max_value,
  ROUND(AVG(precision), 2) AS avg_value,
  CAST(ROUND(STDDEV_POP(precision), 2) AS INT64) AS std_value
FROM `{mobile_data_cleaned}`

UNION ALL

SELECT
  'signal' AS metric,
  MIN(signal) AS min_value,
  APPROX_QUANTILES(signal, 100)[OFFSET(25)] AS percentile_25,
  APPROX_QUANTILES(signal, 100)[OFFSET(50)] AS median,
  APPROX_QUANTILES(signal, 100)[OFFSET(75)] AS percentile_75,
  CAST(MAX(signal) AS INT64) AS max_value,
  ROUND(AVG(signal), 2) AS avg_value,
  CAST(ROUND(STDDEV_POP(signal), 2) AS INT64) AS std_value
FROM `{mobile_data_cleaned}`
"""

# Execute the query
query_df(query)

Unnamed: 0,metric,min_value,percentile_25,median,percentile_75,max_value,avg_value,std_value
0,satellites,0.0,2.0,5.0,7.0,42,4.97,4
1,precision,0.0,10.0,17.0,29.0,119,21.92,18
2,speed,0.0,1.4,7.3,39.7,255,25.97,35
3,signal,0.0,8.0,12.0,18.0,99,13.22,7


### Save Cleaned Table

If required, the fully cleaned dataset can be saved as 'mobile_data_2015_2017_cleaned_final' for subsequent analysis and reporting. The execution line below is commented out, so the table is created only when that line is enabled.

In [64]:
# Datasets: {mobile_data_cleaned}

# SQL query: save the cleaned table under a new name
query = f"""
CREATE OR REPLACE TABLE `bq-analyst-230590.project_cat_mobile_coverage_2015_2017.mobile_data_2015_2017_cleaned_final`
AS
SELECT *
FROM `{mobile_data_cleaned}`
"""

# Execute the query
# run_query(query)

### Cleaned Table

In [65]:
# Datasets: {mobile_data_cleaned}

# SQL query: load the full cleaned table (uncomment LIMIT for a quick preview)
query = f"""
SELECT *
FROM `{mobile_data_cleaned}`
#LIMIT 20
"""

# Execute the query
cleaned_data = query_df(query)

# Display data
display(cleaned_data.head(), cleaned_data.tail(), cleaned_data.shape, cleaned_data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10655413 entries, 0 to 10655412
Data columns (total 19 columns):
 #   Column             Dtype  
---  ------             -----  
 0   date               object 
 1   hour               object 
 2   lat                float64
 3   long               float64
 4   signal             int64  
 5   net_provider       object 
 6   operator           object 
 7   description        object 
 8   net                object 
 9   speed              float64
 10  satellites         float64
 11  precision          float64
 12  position_provider  object 
 13  activity           object 
 14  postal_code        object 
 15  town_name          object 
 16  position_geom      object 
 17  province           object 
 18  year               int64  
dtypes: float64(5), int64(2), object(12)
memory usage: 1.5+ GB


Unnamed: 0,date,hour,lat,long,signal,net_provider,operator,description,net,speed,satellites,precision,position_provider,activity,postal_code,town_name,position_geom,province,year
0,2017-08-17,18:48:48,41.90842,1.87953,4,Nomes Trucades Emergencies,Nomes Trucades Emergencies,STATE_EMERGENCY_ONLY,4G,106.7,5.0,18.0,GPS,IN_VEHICLE,81419,Navàs,POINT(1.87953 41.90842),Barcelona,2017
1,2016-05-22,13:32:46,42.41054,1.12937,9,Nomes Trucades Emergencies,Nomes Trucades Emergencies,STATE_EMERGENCY_ONLY,4G,0.1,0.0,41.0,Fused,ON_FOOT,252094,Sort,POINT(1.12937 42.41054),Lleida,2016
2,2016-09-22,12:59:04,41.53058,2.08809,11,Sense Servei,Sense Servei,STATE_EMERGENCY_ONLY,4G,34.8,12.0,5.0,GPS,IN_VEHICLE,82384,Sant Quirze del Vallès,POINT(2.08809 41.53058),Barcelona,2016
3,2016-04-27,07:44:15,41.55405,2.43134,11,Republica Movil,Republica Movil,STATE_EMERGENCY_ONLY,3G,5.4,2.0,25.0,GPS,STILL,81213,Mataró,POINT(2.43134 41.55405),Barcelona,2016
4,2016-08-12,12:02:28,41.24297,1.86728,0,France Telcom Espana SA,France Telcom Espana SA,STATE_IN_SERVICE,Undefined net,75.9,2.0,24.0,GPS,IN_VEHICLE,82704,Sitges,POINT(1.86728 41.24297),Barcelona,2016


Unnamed: 0,date,hour,lat,long,signal,net_provider,operator,description,net,speed,satellites,precision,position_provider,activity,postal_code,town_name,position_geom,province,year
10655408,2016-01-04,19:24:43,41.39347,2.16628,24,Yoigo,Yoigo,STATE_EMERGENCY_ONLY,3G,11.1,4.0,14.0,GPS,ON_FOOT,80193,Barcelona,POINT(2.16628 41.39347),Barcelona,2016
10655409,2016-05-01,14:20:45,41.4729,2.07955,10,Yoigo,Yoigo,STATE_EMERGENCY_ONLY,3G,2.6,6.0,12.0,GPS,ON_FOOT,82055,Sant Cugat del Vallès,POINT(2.07955 41.4729),Barcelona,2016
10655410,2015-09-18,13:34:34,41.58331,1.94569,19,Yoigo,Yoigo,STATE_EMERGENCY_ONLY,2G,89.6,10.0,7.0,GPS,IN_VEHICLE,82917,Vacarisses,POINT(1.94569 41.58331),Barcelona,2015
10655411,2015-05-06,18:43:55,41.41957,2.14904,15,Lycamobile,Lycamobile,STATE_IN_SERVICE,3G,0.8,0.0,43.0,Fused,TILTING,80193,Barcelona,POINT(2.14904 41.41957),Barcelona,2015
10655412,2016-01-24,13:13:25,41.68968,2.16769,8,Eroski Movil,Eroski Movil,STATE_EMERGENCY_ONLY,3G,23.9,4.0,23.0,GPS,STILL,82107,Sant Feliu de Codines,POINT(2.16769 41.68968),Barcelona,2016


(10655413, 19)

None