*OpenStreetMap new tag recommendations*

Possible improvements:
-   Use the pyrosm library (https://pyrosm.readthedocs.io/en/latest/).
-   Write the query results to a file (for example results.json) instead of only printing them.
-   Create a function that takes lines from the tsv file and removes some of their tags, so that the removed tags can be used for evaluation.
-   Create a function that takes an evaluation set (a list of lists of strings) as input and runs all of its queries in one go.
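The evaluation helper from the list above could be sketched as follows. This is only a minimal sketch: the names `split_row` and `make_evaluation_set` are hypothetical, and hiding a fixed number of randomly chosen keys per line is an assumption about how the evaluation should work, not something this notebook prescribes.

```python
import csv
import random


def split_row(row: list[str], n_hidden: int, rng: random.Random) -> tuple[list[str], list[str]]:
    """Split one line of tag keys into (kept, hidden) lists."""
    # Always keep at least one key, otherwise there is nothing to query with
    n_hidden = max(0, min(n_hidden, len(row) - 1))
    hidden_idx = set(rng.sample(range(len(row)), n_hidden))
    kept = [key for i, key in enumerate(row) if i not in hidden_idx]
    hidden = [key for i, key in enumerate(row) if i in hidden_idx]
    return kept, hidden


def make_evaluation_set(tsv_path: str, n_hidden: int = 1, seed: int = 0):
    """Read a tsv of tag keys and return (questions, answers) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    questions, answers = [], []
    with open(tsv_path, newline="") as tsvfile:
        for row in csv.reader(tsvfile, delimiter="\t"):
            if len(row) < 2:
                continue  # a single key cannot be split into question and answer
            kept, hidden = split_row(row, n_hidden, rng)
            questions.append(kept)
            answers.append(hidden)
    return questions, answers
```

The kept keys would form the query and the hidden keys the expected recommendations.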

Ideas for RQ or topics:


In [2]:
import osmium
import csv
import subprocess
import time
import pickle

The function convert_tsv takes the path to a .osm.pbf file with geodata and converts it into a tsv file (named filename) that the RecommenderServer can read.
It opens the file with osmium and, for every object that has at least one tag, writes the tag keys (without their values) as one tab-separated line.

In [11]:
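def convert_tsv(path: str, filename: str):
    """
    Converts a .osm.pbf file with geodata from OSM to a tsv file usable for RecommenderServer.
    Each output line holds the tag keys of one tagged OSM object.
    """
    # newline="" lets the csv module control the line endings (avoids extra blank lines on Windows)
    with open(filename, "w", newline="") as tsvfile:
        tsv_writer = csv.writer(tsvfile, delimiter='\t')

        for obj in osmium.FileProcessor(path):
            if len(obj.tags) > 0:
                # Keep only the key of every tag; the values are not used
                keys = [tag.k for tag in obj.tags]
                tsv_writer.writerow(keys)
    # No explicit close() is needed: the with block closes the file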

The function create_tree calls convert_tsv to create a tsv file, then runs the RecommenderServer build-tree command to build a tree from that file.

In [9]:
def create_tree(path_to_source_file: str, tsvfilename: str, path_to_server_dir: str):
    """
    Calls convert_tsv and then builds a tree from the resulting tsv file
    """
    convert_tsv(path_to_source_file, tsvfilename)
    # 'cmd /c cd' prints the current working directory (Windows only);
    # it is used to pass RecommenderServer an absolute path to the tsv file
    result = subprocess.run(['cmd', '/c', 'cd'], capture_output=True, text=True)
    subprocess.run(['RecommenderServer', 'build-tree', 'from-tsv', result.stdout.strip() + '/' + tsvfilename], cwd=path_to_server_dir)

def create_tree_train_set(training_set, tsvfilename: str, path_to_server_dir: str):
    """
    Converts a training set to tsv and builds a tree from it
    training_set: the pois["tags"] column of a pandas geodata object,
    i.e. an iterable with one dict of tags per object
    Meant for use in final experiments
    """
    # newline="" lets the csv module control the line endings (avoids extra blank lines on Windows)
    with open(tsvfilename, "w", newline="") as tsvfile:
        tsv_writer = csv.writer(tsvfile, delimiter='\t')
        for obj in training_set:
            listed = list(obj.keys())
            if len(listed) > 0:
                tsv_writer.writerow(listed)
    # 'cmd /c cd' prints the current working directory (Windows only)
    result = subprocess.run(['cmd', '/c', 'cd'], capture_output=True, text=True)
    subprocess.run(['RecommenderServer', 'build-tree', 'from-tsv', result.stdout.strip() + '/' + tsvfilename], cwd=path_to_server_dir)
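Both tree-building functions obtain the working directory through `cmd /c cd`, which only works on Windows. A possible portable alternative is sketched below. The helper name `build_tree_command` is hypothetical; the `build-tree from-tsv` arguments are the ones already used in this notebook.

```python
from pathlib import Path


def build_tree_command(tsvfilename: str) -> list[str]:
    """Return the build-tree command with an absolute path to the tsv file."""
    # Path.cwd() is the directory the notebook runs in, on any operating system
    tsv_path = Path.cwd() / tsvfilename
    return ["RecommenderServer", "build-tree", "from-tsv", str(tsv_path)]
```

The returned list could then be passed to `subprocess.run(..., cwd=path_to_server_dir)` in place of the hand-built one.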


The query function queries a recommender tree that was already built from the file tsvfilename and is stored in the RecommenderServer directory path_to_server_dir. It sends one property list (a JSON-encoded list of tag keys, passed as a string) and prints the n most probable recommendations.

In [12]:
def query(tsvfilename: str, path_to_server_dir: str, property_list: str, n: int = 1):
    """
    Starts a RecommenderServer and queries it with a property list.
    property_list: JSON-encoded list of tag keys, e.g. '["name", "type"]'
    n: number of recommendations to print
    """
    open_server = subprocess.Popen(['RecommenderServer', 'serve', tsvfilename + '.schemaTree.typed.pb'], cwd=path_to_server_dir)
    try:
        # Give the server a moment to start before sending the request
        time.sleep(1)
        powershell_command = """
        $body = '{"properties": """ + property_list + ""","types":[]}'
        $response = Invoke-WebRequest -Uri "http://localhost:8080/recommender" -Method POST -Body $body -ContentType "application/json"
        $response.Content
        """
        result = subprocess.run(["powershell", "-Command", powershell_command], capture_output=True, text=True)

        output_string = result.stdout
        # Splitting on "{" gives one piece per recommendation;
        # the first two pieces come before the first recommendation
        recommendations_list = output_string.split("{")
        for i in recommendations_list[2:n+2]:
            # A possible improvement is to not print, but store it (perhaps write it in a file)
            print(i)
    finally:
        # Stop the server even if the query raised an exception
        open_server.terminate()
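The request could also be sent from Python directly, which avoids PowerShell and the string splitting. The sketch below uses only the standard library. The endpoint and request body are the ones used in query; the function names are hypothetical, and the response layout (a JSON object whose first list-valued field holds the recommendations) is an assumption inferred from the printed output, not a documented format.

```python
import json
import urllib.request

RECOMMENDER_URL = "http://localhost:8080/recommender"  # endpoint used in this notebook


def build_body(properties: list[str]) -> bytes:
    """Encode a list of tag keys as the JSON request body."""
    return json.dumps({"properties": properties, "types": []}).encode("utf-8")


def parse_recommendations(response_text: str, n: int = 1) -> list[dict]:
    """Return the first n recommendation objects from a JSON response.

    Assumption: the recommendations are either the top-level list or the
    first list-valued field of the top-level object.
    """
    payload = json.loads(response_text)
    if isinstance(payload, list):
        return payload[:n]
    for value in payload.values():
        if isinstance(value, list):
            return value[:n]
    return []


def post_query(properties: list[str], n: int = 1, url: str = RECOMMENDER_URL) -> list[dict]:
    """POST one query to a running server and return the top n recommendations."""
    request = urllib.request.Request(
        url,
        data=build_body(properties),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_recommendations(response.read().decode("utf-8"), n)
```

With this approach the query is a plain Python list such as `["name", "type"]`, so no manual JSON string is needed.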


In [43]:
def multiquery(tsvfilename: str, path_to_server_dir: str, query_list: list[str], n: int = 1) -> list[str]:
    """
    Starts a server and queries it multiple times without restarting it.
    query_list: JSON-encoded property lists, e.g. ['["name", "type"]', '["type"]']
    Returns the raw response fragments in query order.
    All fragments go into one flat list, so they can only be matched to their
    queries when n = 1 and every query returns a recommendation.
    """

    open_server = subprocess.Popen(['RecommenderServer', 'serve', tsvfilename + '.schemaTree.typed.pb'], cwd=path_to_server_dir)
    response_list = []
    try:
        for property_list in query_list:
            time.sleep(1)
            powershell_command = """
            $body = '{"properties": """ + property_list + ""","types":[]}'
            $response = Invoke-WebRequest -Uri "http://localhost:8080/recommender" -Method POST -Body $body -ContentType "application/json"
            $response.Content
            """
            result = subprocess.run(["powershell", "-Command", powershell_command], capture_output=True, text=True)

            output_string = result.stdout
            recommendations_list = output_string.split("{")
            for i in recommendations_list[2:n+2]:
                print("Querying", i)
                response_list.append(i)
    finally:
        # Stop the server even if a query raised an exception
        open_server.terminate()

    return response_list
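Both query functions use time.sleep(1) to give the server time to start, which may be too short for a large tree and is longer than needed for a small one. A possible alternative is to poll the port until it accepts connections. This is a sketch with a hypothetical helper name; the host and port are taken from the URL used above.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8080, timeout: float = 10.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means the server is listening
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.1)  # not ready yet, try again shortly
    return False
```

Calling `wait_for_port()` once after `subprocess.Popen(...)` would replace the fixed sleep before the first query.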


Testing the code and running early experiments

In [27]:
# Change for use:

# Path to a geodata file (.osm.pbf format)
path_to_source_file = 'C:/Users/jotan/Downloads/groningen-latest.osm.pbf'    
# What to call your file (and your tree)
tsvfilename = "groningen.tsv"
# Path to the RecommenderServer folder
path_to_server_dir = 'C:/Users/jotan/SchoolStuffs/2024-25/BachelorProject/RecommenderServer'

# For querying: each query must be a JSON-encoded list of strings, written as a Python string
example_q1 = '["name", "traffic sign", "type"]'
example_q2 = '["type"]'
example_q3 = '["capacity", "capacity:disabled", "fee", "surface"]'

multi_q1 = [example_q1, example_q2]

In [15]:
create_tree(path_to_source_file, tsvfilename, path_to_server_dir)

In [44]:
query(tsvfilename, path_to_server_dir, example_q1, 5)
query(tsvfilename, path_to_server_dir, example_q2, 5)

responses = multiquery(tsvfilename, path_to_server_dir, multi_q1, 1)
print(responses)

"property":"operator","probability":0.4519846350832266},
"property":"network","probability":0.4014084507042254},
"property":"wikidata","probability":0.3886043533930858},
"property":"ref","probability":0.3886043533930858},
"property":"route","probability":0.3649167733674776},
"property":"ref","probability":0.43542393874563673},
"property":"network","probability":0.4324963405021957},
"property":"route","probability":0.4317081409751154},
"property":"network:type","probability":0.36223398265961043},
"property":"source","probability":0.2830762301542619},
Querying "property":"operator","probability":0.4519846350832266},
Querying "property":"ref","probability":0.43542393874563673},
['"property":"operator","probability":0.4519846350832266},', '"property":"ref","probability":0.43542393874563673},']
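The fragments returned by multiquery could be stored in a file such as results.json, as suggested in the list of possible improvements. The sketch below assumes every fragment has the shape shown in the output above (the inside of one JSON object followed by closing punctuation). The function names are hypothetical.

```python
import json


def parse_fragment(fragment: str) -> dict:
    """Turn one fragment such as '"property":"ref","probability":0.4},' into a dict."""
    # Drop the trailing comma, bracket and brace left over from splitting on "{"
    cleaned = fragment.strip().rstrip(",]}")
    return json.loads("{" + cleaned + "}")


def save_results(fragments: list[str], path: str = "results.json") -> None:
    """Write all parsed recommendations to a JSON file."""
    records = [parse_fragment(fragment) for fragment in fragments]
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(records, fp, indent=2)
```

For example, `save_results(responses)` would write the two recommendations above as a list of objects.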


Running the experiments

In [18]:
# Importing the training set to create the tree
with open('trainingset', 'rb') as fp:
    trainingloaded = pickle.load(fp)
# And the test set for querying and answers for checking
with open('testset_questions', 'rb') as fp:
    questions = pickle.load(fp)
with open('testset_answer', 'rb') as fp:
    answers = pickle.load(fp)

# Note: the cells below still pass tsvfilename ("groningen.tsv"), so this name is currently unused
newtsvfilename = "amsterdam.tsv"

In [30]:
create_tree_train_set(trainingloaded["tags"], tsvfilename, path_to_server_dir)

In [33]:
# Send a single query to check if the tree works
query(tsvfilename, path_to_server_dir, example_q3, 5)

"property":"parking","probability":1},
"property":"amenity","probability":1},
"property":"access","probability":0.9705882352941176},
"property":"orientation","probability":0.8235294117647058},
"property":"supervised","probability":0.08823529411764706},


In [51]:
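# Creating a multiquery from our test questions set
# Note: f'{quer}' formats the list with Python's single-quoted repr (see the first printed line below).
# That is not valid JSON, and the single quotes also clash with the single-quoted PowerShell $body string.
# The empty result list below is consistent with these queries failing.
questions_multi = []
for quer in questions[:10]:
    next_quer = f'{quer}'
    #query(tsvfilename, path_to_server_dir, next_quer)
    questions_multi.append(next_quer)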


print(questions_multi[:1])
results = multiquery(tsvfilename, path_to_server_dir, questions_multi, 1)
print(results)

["['description', 'payment:coins', 'payment:contactless', 'payment:credit_cards', 'payment:maestro', 'payment:mastercard', 'source:date', 'wheelchair']"]
[]
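The printed query above uses single quotes, while the working example queries use double quotes. A likely fix is to encode each question with json.dumps, which always produces double-quoted JSON. This is a sketch with a hypothetical helper name, and it assumes that each entry of questions is a list of tag keys. A key that itself contains a single quote would still break the single-quoted PowerShell string.

```python
import json


def to_json_queries(question_lists: list[list[str]]) -> list[str]:
    """Encode every list of tag keys as a JSON string with double quotes."""
    return [json.dumps(keys) for keys in question_lists]
```

For example, `to_json_queries(questions[:10])` could replace the f-string loop when building questions_multi.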
