# The Rome dataset

In [3]:
import pandas as pd
from scipy.io import loadmat
from tqdm import tqdm

pd.set_option('display.max_columns', None)

## Two papers describe the dataset
1. The first is [A Large-Scale Dataset of 4G, NB-IoT, and 5G Non-Standalone Network Measurements](https://ieeexplore.ieee.org/document/10239125), which provides passive and active versions of the datasets for NB-IoT, 4G, and 5G setups; for the NB-IoT and 4G cases it also includes the estimated positions of the cells.
2. The second is [Positioning by Multicell Fingerprinting in Urban NB-IoT Networks](https://www.mdpi.com/1424-8220/23/9/4266), which is more closely related to the positioning problem. This paper mentions this dataset version, [Second data](https://zenodo.org/records/7674299), where you can find the preprocessed raw data in .mat format.

### The second data

While reading the preprocessing code, I found the part related to the ToA measurement. (I am not sure this matches what we have in the Wair-D case, so it needs to be checked once more.)

This is the description of the preprocessed .mat dataset, as found in the code:
1. latitude
2. longitude
3. a matrix that contains for each row the following info: NPCI; eNodeB ID; RSSI; NSINR; NRSRP; NRSRQ; ToA; operatorID; campaignID
4. a scalar that reports the number of NPCIs with RF data for operator 1
5. a logical column vector that has 1s at positions of the matrix containing a NPCI with RF data for operator 1
6. a scalar that reports the number of NPCIs with ToA data for operator 1
7. a logical column vector that has 1s at positions of the matrix containing a NPCI with ToA data for operator 1
8. a scalar that reports the number of NPCIs with RF data for operator 2
9. a logical column vector that has 1s at positions of the matrix containing a NPCI with RF data for operator 2
10. a scalar that reports the number of NPCIs with ToA data for operator 2
11. a logical column vector that has 1s at positions of the matrix containing a NPCI with ToA data for operator 2
12. a scalar that reports the number of NPCIs with RF data for operator 3
13. a logical column vector that has 1s at positions of the matrix containing a NPCI with RF data for operator 3
14. a scalar that reports the number of NPCIs with ToA data for operator 3
15. a logical column vector that has 1s at positions of the matrix containing a NPCI with ToA data for operator 3
16. a column vector that contains the list of campaign IDs that contributed to the data at the location
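To make the 16 positional fields easier to refer to, here is a minimal sketch that pairs them with labels and checks that each scalar count agrees with its logical mask. The label names (`n_rf_op1`, `mask_toa_op2`, and so on) and the toy record are my own assumptions for illustration; they are not identifiers from the authors' code. A real record loaded with `loadmat` holds NumPy arrays, so the scalars would likely need `.item()` and the masks `.ravel()` first.

```python
# Illustrative labels for the 16 fields listed above (hypothetical names).
FIELD_NAMES = [
    "latitude", "longitude", "measurements",
    "n_rf_op1", "mask_rf_op1", "n_toa_op1", "mask_toa_op1",
    "n_rf_op2", "mask_rf_op2", "n_toa_op2", "mask_toa_op2",
    "n_rf_op3", "mask_rf_op3", "n_toa_op3", "mask_toa_op3",
    "campaign_ids",
]


def name_fields(record):
    """Pair the 16 positional fields of one location record with labels."""
    if len(record) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(record)}")
    return dict(zip(FIELD_NAMES, record))


def counts_match_masks(named):
    """Check that each scalar count equals the number of 1s in its mask."""
    return all(
        named[f"n_{kind}_op{op}"] == sum(named[f"mask_{kind}_op{op}"])
        for op in (1, 2, 3)
        for kind in ("rf", "toa")
    )


# Toy record with two measurement rows; every value here is made up.
toy_record = [
    41.0, 12.0,
    [[10, 300043, -66.0, 22.0, -71.0, -4.7, 3530.0, 1, 1],
     [52, 372017, -58.6, 9.3, -65.0, -6.4, 1221.0, 88, 1]],
    1, [1, 0], 1, [1, 0],   # operator 1: count and mask for RF, then ToA
    0, [0, 0], 0, [0, 0],   # operator 2
    1, [0, 1], 1, [0, 1],   # operator 3
    [1],                    # contributing campaign IDs
]

named = name_fields(toy_record)
consistent = counts_match_masks(named)
```

Running the same check over all rows of the real data would be a cheap way to confirm that my reading of the field order is right.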

In [13]:
mat = loadmat('/auto/home/mkrtchyan/iot/data/NB-IoT_5G_dataset/NB-IoT/Rome.mat')

FileNotFoundError: [Errno 2] No such file or directory: '/auto/home/mkrtchyan/iot/data/NB-IoT_5G_dataset/NB-IoT/Rome.mat'
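The path above is specific to one machine, which is why the cell fails when it is rerun elsewhere. A small sketch of one way to make this more portable: try a list of candidate locations and use the first one that exists. The helper name `first_existing` and the idea of keeping a candidate list are my own assumptions, not part of the dataset tooling.

```python
from pathlib import Path


def first_existing(candidates):
    """Return the first candidate path that points to an existing file."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file():
            return path
    raise FileNotFoundError(f"none of the candidate paths exist: {list(candidates)}")
```

The returned path can then be passed to `loadmat` as a string, for example `loadmat(str(first_existing([...])))`.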

In [3]:
mat["dataSet"].shape

(2670, 16)

In [4]:
# Flatten the per-location records into one row per (location, cell) measurement.
info_keys = ["NPCI", "eNodeB ID", "RSSI", "NSINR", "NRSRP",
             "NRSRQ", "ToA", "operatorID", "campaignID"]

dct = {}
item_id = 0

for idx in tqdm(range(len(mat["dataSet"]))):
    # Columns 0 and 1 hold the coordinates; column 2 holds the measurement matrix.
    latitude = mat["dataSet"][idx, 0].item()
    longitude = mat["dataSet"][idx, 1].item()
    info_matrix = mat["dataSet"][idx, 2]
    for row_i in range(len(info_matrix)):
        info_matrix_i = info_matrix[row_i]
        info_dct = {"latitude": latitude, "longitude": longitude}
        # Name each of the nine values in this measurement row.
        for key_idx, key in enumerate(info_keys):
            info_dct[key] = info_matrix_i[key_idx]
        dct[item_id] = info_dct
        item_id += 1


100%|██████████| 2670/2670 [00:00<00:00, 18066.47it/s]


In [5]:
df = pd.DataFrame.from_dict(dct, orient="index")
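As an alternative to filling a dictionary row by row, the same flattening can be sketched by building one small DataFrame per location and concatenating them. This is only a sketch under my own assumptions: the function name `flatten_locations` and the `(latitude, longitude, info_matrix)` tuple input are hypothetical, and the toy matrices stand in for the real `mat["dataSet"]` entries.

```python
import numpy as np
import pandas as pd

INFO_KEYS = ["NPCI", "eNodeB ID", "RSSI", "NSINR", "NRSRP",
             "NRSRQ", "ToA", "operatorID", "campaignID"]


def flatten_locations(locations):
    """Turn (latitude, longitude, info_matrix) tuples into one long DataFrame."""
    frames = []
    for latitude, longitude, info_matrix in locations:
        # One row per measured cell, nine named columns.
        frame = pd.DataFrame(np.atleast_2d(info_matrix), columns=INFO_KEYS)
        # Prepend the coordinates so the column order matches the table above.
        frame.insert(0, "longitude", longitude)
        frame.insert(0, "latitude", latitude)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Toy stand-in for two locations; the values are illustrative only.
toy_locations = [
    (41.0, 12.0, np.array([
        [0, 316061, -57.78, 5.15, -66.19, -8.4, 5530.9, 88, 1],
        [10, 300043, -66.265, 22.125, -71.03, -4.75, 3530.24, 1, 1],
    ])),
    (41.5, 12.5, np.array([
        [52, 372017, -58.6, 9.35, -64.98, -6.36, 1221.26, 88, 1],
    ])),
]

flat = flatten_locations(toy_locations)
```

With the real data, the input could be produced by iterating over the rows of `mat["dataSet"]` and taking columns 0, 1, and 2, as the loop above does.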

In [6]:
df

Unnamed: 0,latitude,longitude,NPCI,eNodeB ID,RSSI,NSINR,NRSRP,NRSRQ,ToA,operatorID,campaignID
0,41.824214,12.465250,0.0,316061.0,-57.780,5.150,-66.190,-8.400,5530.90,88.0,1.0
1,41.824214,12.465250,10.0,300043.0,-66.265,22.125,-71.030,-4.750,3530.24,1.0,1.0
2,41.824214,12.465250,52.0,372017.0,-58.600,9.350,-64.980,-6.360,1221.26,88.0,1.0
3,41.824214,12.465250,61.0,316716.0,-58.920,-9.400,-78.770,-20.020,4335.62,88.0,1.0
4,41.824214,12.465250,112.0,69046.0,-63.265,-0.840,-75.580,-12.500,2707.04,10.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...
40095,41.870661,12.463569,297.0,318928.0,-61.100,-18.430,-88.290,-27.130,2359.91,88.0,6.0
40096,41.870661,12.463569,334.0,67589.0,-70.890,0.445,-83.365,-12.350,1034.02,10.0,6.0
40097,41.870661,12.463569,360.0,69081.0,-71.840,-13.255,-95.960,-24.055,435.32,10.0,6.0
40098,41.870661,12.463569,426.0,316601.0,-61.100,11.280,-69.075,-7.965,2359.54,88.0,6.0


### The first data

Loading the passive measurements and merging them with the second data on ['Latitude', 'Longitude', 'PCI', 'MNC', 'eNodeB.ID']

In [7]:
nv_passive_df = pd.read_csv("NB-IoT - Passive Measurements.csv", index_col=0)

In [8]:
nv_passive_df

Unnamed: 0,Date,Time,UTC,Latitude,Longitude,Altitude,Speed,EARFCN,Frequency,PCI,MNC,CellIdentity,eNodeB.ID,NSINR-Tx0,NSINR-Tx1,NRSRP-Tx0,NRSRP-Tx1,NRSRQ-Tx0,NRSRQ-Tx1,NSSS-Power,scenario,cellLongitude,cellLatitude,cellPosErrorLambda1,cellPosErrorLambda2,n_CellIdentities,distance,Band,campaign
110097,14.01.2021,09:19:25.380,1.613291e+09,41.896705,12.507339,50.30,3.46,6254,801.4025,412,"""Op""[1]",76860486,300236,6.25,6.59,-59.68,-58.53,-11.54,-10.39,-47.15,OW,12.504280,41.890300,2.051987,0.772278,3,756.719746,20,campaign_6_OW_NB-IoT_gaming
210058,14.01.2021,09:19:25.380,1.613291e+09,41.896705,12.507339,50.30,3.46,6254,801.4025,411,"""Op""[1]",76860488,300236,8.98,5.06,-56.84,-61.35,-8.70,-13.21,-48.05,OW,12.504280,41.890300,2.051987,0.772278,3,756.719746,20,campaign_6_OW_NB-IoT_gaming
310032,14.01.2021,09:19:25.958,1.613291e+09,41.896713,12.507331,49.91,3.92,6254,801.4025,412,"""Op""[1]",76860486,300236,5.39,0.65,-58.49,-63.54,-10.60,-15.83,-49.71,OW,12.504280,41.890300,2.051987,0.772278,3,757.337346,20,campaign_6_OW_NB-IoT_gaming
410000,14.01.2021,09:19:25.958,1.613291e+09,41.896713,12.507331,49.91,3.92,6254,801.4025,411,"""Op""[1]",76860488,300236,8.63,11.68,-58.75,-56.06,-10.86,-8.16,-46.85,OW,12.504280,41.890300,2.051987,0.772278,3,757.337346,20,campaign_6_OW_NB-IoT_gaming
72700,14.01.2021,09:19:27.738,1.613291e+09,41.896722,12.507302,53.66,4.03,6254,801.4025,412,"""Op""[1]",76860486,300236,-0.80,0.43,-56.82,-55.42,-13.42,-12.08,-46.20,OW,12.504280,41.890300,2.051987,0.772278,3,757.483987,20,campaign_6_OW_NB-IoT_gaming
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
280332,10.01.2021,18:12:39.269,1.612977e+09,41.899898,12.449773,47.79,0.04,6254,801.4025,470,"""Op""[1]",76820294,300079,,-8.00,,-102.15,,-20.20,-97.34,OD,12.468988,41.829449,17.550001,17.550001,1,8002.496018,20,campaign_40_OD_NB-IoT
280527,10.01.2021,18:12:39.269,1.612977e+09,41.899898,12.449773,47.79,0.04,6254,801.4025,278,"""Op""[1]",76804934,300019,-1.99,-2.17,-94.96,-96.25,-14.36,-15.88,-83.72,OD,12.451815,41.898635,14.510002,14.510002,3,219.986861,20,campaign_40_OD_NB-IoT
280628,10.01.2021,18:12:39.269,1.612977e+09,41.899898,12.449773,47.79,0.04,6254,801.4025,276,"""Op""[1]",76804935,300019,-14.10,-8.91,-107.62,-102.76,-27.19,-22.36,-93.40,OD,12.451815,41.898635,,0.000000,3,219.986861,20,campaign_40_OD_NB-IoT
280726,10.01.2021,18:12:39.269,1.612977e+09,41.899898,12.449773,47.79,0.04,6254,801.4025,344,"""Op""[1]",77283144,301887,-13.94,-0.54,-104.64,-93.08,-24.34,-12.53,-84.51,OD,12.445553,41.901202,,0.000000,1,378.586016,20,campaign_40_OD_NB-IoT


Changing the types and column names in the second data so that it can be merged with the first

In [9]:
# Convert NPCI, eNodeB ID from float to int
df['NPCI'] = df['NPCI'].astype(int)
df['eNodeB ID'] = df['eNodeB ID'].astype(int)

# Replace operatorID values
operator_id_mapping = {1.0: '"Op"[1]', 10.0: '"Op"[2]', 88.0: '"Op"[3]'}
df['operatorID'] = df['operatorID'].map(operator_id_mapping)

# Rename columns in df to match with nv_passive_df
df_renamed = df.rename(
    columns={
        'latitude': 'Latitude',
        'longitude': 'Longitude',
        'NPCI': 'PCI',
        'eNodeB ID': 'eNodeB.ID',
        'operatorID': 'MNC'
    }
    )
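One thing to watch in the conversion above: `Series.map` with a dictionary returns NaN for any value that is missing from the dictionary, so an unexpected operator ID would silently fail to match in the later merge. A minimal sketch of a stricter version follows; the helper name `map_operator_ids` is hypothetical, and the mapping is the one used in the cell above.

```python
import pandas as pd

operator_id_mapping = {1.0: '"Op"[1]', 10.0: '"Op"[2]', 88.0: '"Op"[3]'}


def map_operator_ids(series, mapping):
    """Map operator IDs, raising instead of silently producing NaN."""
    unknown = sorted(set(series.dropna().unique()) - set(mapping))
    if unknown:
        raise ValueError(f"unmapped operator IDs: {unknown}")
    return series.map(mapping)


# Toy column with the three operator IDs seen in the second data.
toy_ids = pd.Series([88.0, 1.0, 10.0])
mapped = map_operator_ids(toy_ids, operator_id_mapping)
```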

In [10]:
df_renamed

Unnamed: 0,Latitude,Longitude,PCI,eNodeB.ID,RSSI,NSINR,NRSRP,NRSRQ,ToA,MNC,campaignID
0,41.824214,12.465250,0,316061,-57.780,5.150,-66.190,-8.400,5530.90,"""Op""[3]",1.0
1,41.824214,12.465250,10,300043,-66.265,22.125,-71.030,-4.750,3530.24,"""Op""[1]",1.0
2,41.824214,12.465250,52,372017,-58.600,9.350,-64.980,-6.360,1221.26,"""Op""[3]",1.0
3,41.824214,12.465250,61,316716,-58.920,-9.400,-78.770,-20.020,4335.62,"""Op""[3]",1.0
4,41.824214,12.465250,112,69046,-63.265,-0.840,-75.580,-12.500,2707.04,"""Op""[2]",1.0
...,...,...,...,...,...,...,...,...,...,...,...
40095,41.870661,12.463569,297,318928,-61.100,-18.430,-88.290,-27.130,2359.91,"""Op""[3]",6.0
40096,41.870661,12.463569,334,67589,-70.890,0.445,-83.365,-12.350,1034.02,"""Op""[2]",6.0
40097,41.870661,12.463569,360,69081,-71.840,-13.255,-95.960,-24.055,435.32,"""Op""[2]",6.0
40098,41.870661,12.463569,426,316601,-61.100,11.280,-69.075,-7.965,2359.54,"""Op""[3]",6.0


In [11]:
res = pd.merge(
    nv_passive_df, df_renamed[['Latitude', 'Longitude', 'PCI', 'MNC', 'eNodeB.ID', 'ToA', 'campaignID']],
    on=['Latitude', 'Longitude', 'PCI', 'MNC', 'eNodeB.ID'],
    how='left'
    )
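The merge above joins on Latitude and Longitude, which are floats, so two coordinates that differ only far beyond the printed precision would not match. I have not verified whether that happens in this data. A hedged sketch of a more tolerant variant: join on coordinates rounded to a fixed number of decimals, and use `indicator=True` to see which rows found a partner. The helper name, the temporary key columns `lat_key` and `lon_key`, and the choice of 6 decimals (the precision shown in the tables) are all my assumptions.

```python
import pandas as pd


def merge_on_rounded_coords(left, right, decimals=6):
    """Left-merge on rounded coordinates plus PCI, MNC and eNodeB.ID."""
    left = left.assign(
        lat_key=left["Latitude"].round(decimals),
        lon_key=left["Longitude"].round(decimals),
    )
    right = right.assign(
        lat_key=right["Latitude"].round(decimals),
        lon_key=right["Longitude"].round(decimals),
    ).drop(columns=["Latitude", "Longitude"])
    keys = ["lat_key", "lon_key", "PCI", "MNC", "eNodeB.ID"]
    # indicator=True adds a "_merge" column: "both" or "left_only".
    merged = left.merge(right, on=keys, how="left", indicator=True)
    return merged.drop(columns=["lat_key", "lon_key"])


# Toy frames; the first left row differs from the right row only at the 10th decimal.
toy_left = pd.DataFrame({
    "Latitude": [41.8242140001, 41.9],
    "Longitude": [12.46525, 12.5],
    "PCI": [112, 412],
    "MNC": ['"Op"[2]', '"Op"[1]'],
    "eNodeB.ID": [69046, 300236],
})
toy_right = pd.DataFrame({
    "Latitude": [41.824214],
    "Longitude": [12.46525],
    "PCI": [112],
    "MNC": ['"Op"[2]'],
    "eNodeB.ID": [69046],
    "ToA": [2707.04],
})

toy_merged = merge_on_rounded_coords(toy_left, toy_right)
```

Counting the values of the `_merge` column gives a quick summary of how many passive measurements received a ToA value.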

In [12]:
res[~res["ToA"].isna()]

Unnamed: 0,Date,Time,UTC,Latitude,Longitude,Altitude,Speed,EARFCN,Frequency,PCI,MNC,CellIdentity,eNodeB.ID,NSINR-Tx0,NSINR-Tx1,NRSRP-Tx0,NRSRP-Tx1,NRSRQ-Tx0,NRSRQ-Tx1,NSSS-Power,scenario,cellLongitude,cellLatitude,cellPosErrorLambda1,cellPosErrorLambda2,n_CellIdentities,distance,Band,campaign,ToA,campaignID
39407,11.01.2021,17:25:20.450,1.613061e+09,41.824214,12.465250,37.79,0.04,6353,811.2975,112,"""Op""[2]",17675889,69046,-0.20,,-75.01,,-11.83,,-67.08,OD,12.467835,41.830381,0.204817,0.157615,3,719.215819,20,campaign_12_OD_NB-IoT_gaming,2707.04,1.0
39408,11.01.2021,17:25:20.450,1.613061e+09,41.824214,12.465250,37.79,0.04,6353,811.2975,297,"""Op""[2]",17674863,69042,11.19,7.90,-68.98,-71.73,-7.83,-10.57,-59.23,OD,12.464981,41.826373,0.303455,0.188299,2,241.372457,20,campaign_12_OD_NB-IoT_gaming,4461.64,1.0
39409,11.01.2021,17:25:21.168,1.613061e+09,41.824214,12.465250,37.79,0.04,6254,801.4025,10,"""Op""[1]",76811078,300043,,21.74,,-71.84,,-4.78,-63.66,OD,12.463813,41.824584,9.628000,9.628000,1,126.120637,20,campaign_12_OD_NB-IoT_gaming,3530.24,1.0
39410,11.01.2021,17:25:21.168,1.613061e+09,41.824214,12.465250,37.79,0.04,6353,811.2975,112,"""Op""[2]",17675889,69046,-1.48,,-76.15,,-13.17,,-68.02,OD,12.467835,41.830381,0.204817,0.157615,3,719.215819,20,campaign_12_OD_NB-IoT_gaming,2707.04,1.0
39411,11.01.2021,17:25:21.168,1.613061e+09,41.824214,12.465250,37.79,0.04,6353,811.2975,297,"""Op""[2]",17674863,69042,11.14,7.61,-68.48,-71.37,-7.69,-10.58,-58.72,OD,12.464981,41.826373,0.303455,0.188299,2,241.372457,20,campaign_12_OD_NB-IoT_gaming,4461.64,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
262793,06.01.2021,12:11:33.177,1.612610e+09,41.870470,12.463664,96.28,1.76,6353,811.2975,11,"""Op""[2]",17687409,69091,-9.05,-15.10,-88.26,-93.85,-20.66,-26.13,-79.86,OW,12.471007,41.875380,8.730168,8.501727,1,818.064946,20,campaign_32_OW_NB-IoT,9891.67,2.0
262794,06.01.2021,12:11:33.177,1.612610e+09,41.870470,12.463664,96.28,1.76,6353,811.2975,122,"""Op""[2]",17687153,69090,-15.32,-17.04,-92.83,-95.46,-26.69,-29.22,-80.49,OW,12.481681,41.873936,2.472434,0.744794,1,1542.507451,20,campaign_32_OW_NB-IoT,5520.35,2.0
262795,06.01.2021,12:11:33.177,1.612610e+09,41.870470,12.463664,96.28,1.76,6353,811.2975,334,"""Op""[2]",17302896,67589,2.24,-5.93,-75.80,-83.71,-10.44,-17.89,-67.09,OW,12.467131,41.870742,0.282635,0.213173,2,288.985904,20,campaign_32_OW_NB-IoT,4601.28,2.0
262796,06.01.2021,12:11:33.177,1.612610e+09,41.870470,12.463664,96.28,1.76,6353,811.2975,459,"""Op""[2]",17682799,69073,-11.76,-13.13,-88.77,-89.85,-22.58,-23.61,-78.21,OW,12.469537,41.863756,0.975020,0.902929,3,891.988744,20,campaign_32_OW_NB-IoT,3574.14,2.0


#### Finding the map between campaigns in the two datasets

In [13]:
grouped = (res[~res["ToA"].isna()]).groupby('campaignID')['campaign'].apply(set)

# Build a dictionary mapping each campaignID to the set of matching campaign names
my_dict = {key: value for key, value in grouped.items()}

In [14]:
my_dict

{1.0: {'campaign_12_OD_NB-IoT_gaming',
  'campaign_24_OD_NB-IoT',
  'campaign_28_OD_NB-IoT_speedtest',
  'campaign_2_OD_NB-IoT_gaming'},
 2.0: {'campaign_32_OW_NB-IoT'},
 3.0: {'campaign_27_OD_NB-IoT'},
 4.0: {'campaign_23_OD_NB-IoT'},
 5.0: {'campaign_19_OW_NB-IoT'},
 6.0: {'campaign_18_OW_NB-IoT'}}
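For lookups in the other direction (from a campaign name in the first data to the campaignID in the second data), the dictionary above can be inverted. A minimal sketch, where `invert_campaign_map` is a hypothetical helper and `campaign_sets` simply repeats the output shown above:

```python
campaign_sets = {
    1.0: {"campaign_12_OD_NB-IoT_gaming",
          "campaign_24_OD_NB-IoT",
          "campaign_28_OD_NB-IoT_speedtest",
          "campaign_2_OD_NB-IoT_gaming"},
    2.0: {"campaign_32_OW_NB-IoT"},
    3.0: {"campaign_27_OD_NB-IoT"},
    4.0: {"campaign_23_OD_NB-IoT"},
    5.0: {"campaign_19_OW_NB-IoT"},
    6.0: {"campaign_18_OW_NB-IoT"},
}


def invert_campaign_map(id_to_names):
    """Map each campaign name to its campaignID, rejecting ambiguous names."""
    name_to_id = {}
    for campaign_id, names in id_to_names.items():
        for name in names:
            if name in name_to_id and name_to_id[name] != campaign_id:
                raise ValueError(f"{name} maps to more than one campaignID")
            name_to_id[name] = campaign_id
    return name_to_id


name_to_id = invert_campaign_map(campaign_sets)
```

The ambiguity check matters because the inversion is only well defined if no campaign name appears under two different campaignIDs, which holds for the output above.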

# Summary

| Feature                          | Wair-D Dataset | Rome Dataset                  | Questions for Authors                |
|----------------------------------|----------------|-------------------------------|--------------------------------------|
| User Equipment Location (UE LOC) | Present        | Present                       |                                      |
| Map Information                  | Present        | Can be restored               |                                      |
| Time of Arrival (ToA)            | Present        | Present (Confirmation Needed) | Confirm ToA specifics for our needs. |
| Angle of Departure (AoD)         | Present        | Present (Confirmation Needed) | Clarify presence and details of AoD. |
| Angle of Arrival (AoA)           | Present        | Not Specified                 |                                      |
| Base Station Location (BS Loc)   | Present        | Estimated                     |                                      |
