# Documentation Location.py

An analysis of the output RDF file revealed that two of the declared prefixes, _geonames_ and _wgs84_pos_, were never used.
Since no script was dedicated to generating information about geographical places, one was written to complete the RDF file.

The desired output for RDF enrichment is as follows:

```turtle
<https://w3id.org/moro/enoam/data/alessandria> a dcterms:Location ;
    rdfs:label "Alessandria"^^xsd:string ;
    geonames:featureClass geonames:P ;
    owl:sameAs <http://www.wikidata.org/entity/Q6088> ;
    wgs84_pos:lat "44.90924" ;
    wgs84_pos:long "8.61007" . 
```

### Let's start with the libraries

In [None]:
import urllib.parse
import requests
import os
import json
import re

We import the following libraries:

_urllib.parse_: This module provides functions to manipulate URLs and their components. Here it encodes the query parameters of the API call.

_requests_: This library is used for making HTTP requests in Python. Here it calls the GeoNames API.

_os_: This standard library module provides functions for interacting with the operating system. Here it checks whether the cache file already exists.

_json_: This module is used for working with JSON (JavaScript Object Notation) data. Here it reads and writes the cache and error files.

_re_: This module is used for working with regular expressions. Here it removes text enclosed in parentheses.

### Normalize_location

In [None]:
def normalize_location(location):
    if not location:  # Check for empty input
        return None
    
    normalized_location = location.lower().strip()
    
    # Skip inputs with less than 4 letters
    if len(normalized_location) < 4:
        return None
    else:
        if ',' in normalized_location: # Manage cases like "Roma, Italia"
            normalized_location = normalized_location.split(',')[0]

        # Remove parentheses and text within parentheses like in "Roma (Italia)"
        normalized_location = re.sub(r'\([^)]*\)', '', normalized_location)

        normalized_location = normalized_location.strip()

        return normalized_location

The function follows these steps:

   1. Checks whether the input location string is empty. If it is, returns None.

   2. Otherwise, converts the string to lowercase and removes leading and trailing whitespace.

   3. Returns None if the resulting string has fewer than 4 characters, since it is considered too short.

   4. If the location contains a comma (,), keeps only the text before the first comma. For example, "Roma, Italia" becomes "roma".

   5. Removes any text enclosed in parentheses, together with the parentheses themselves. For example, "Roma (Italia)" becomes "roma".

   6. Strips leading and trailing whitespace again after these modifications.

   7. Returns the normalized location string.
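
The steps above can be checked on a few inputs. The sketch below repeats the function in condensed form so that it runs on its own; the sample strings are illustrative and are not taken from the project data.

```python
import re


def normalize_location(location):
    # Same logic as the function documented above, condensed for this demo
    if not location:
        return None
    normalized_location = location.lower().strip()
    if len(normalized_location) < 4:
        return None
    if ',' in normalized_location:
        normalized_location = normalized_location.split(',')[0]
    normalized_location = re.sub(r'\([^)]*\)', '', normalized_location)
    return normalized_location.strip()


# Illustrative inputs (assumptions, not project data)
examples = ["Roma, Italia", "Roma (Italia)", "  Alessandria ", "Rio", "", "Ro, Italia"]
results = {text: normalize_location(text) for text in examples}

for text, value in results.items():
    print(repr(text), "->", repr(value))
```

Note that the length check runs before the comma and parentheses are handled, so an input such as "Ro, Italia" passes the check and is reduced to "ro".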


### Part 1

In [None]:
def get_coordinates(location):
    json_file_path = 'geonames_data.json' # Path of the JSON file used as a cache

    print(location) # Show progress while "main.py" is running

    if os.path.exists(json_file_path): # If the cache file already exists, load it
        with open(json_file_path, 'r') as json_file:
            json_data = json.load(json_file)
    else:
        json_data = {}

    if os.path.exists('errori.json'): # Load the locations that previously failed, if any
        with open('errori.json', 'r') as error_file:
            errori_location = json.load(error_file)
    else:
        errori_location = {}

    clean_location = normalize_location(location) # Call normalize_location
    
    if not clean_location:  # Check for invalid or empty location
        return None

1. Checks whether the JSON file containing cached data ("geonames_data.json") exists. If it does, loads the data.
2. If the file does not exist, initializes an empty dictionary for storing data.
3. Loads a separate JSON file ("errori.json") that records the locations for which no result was found.
4. Calls the _normalize_location_ function to clean and normalize the input location string.
5. If the normalized location is invalid or empty, returns None.
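
The "load the file if it exists, otherwise start empty" pattern from steps 1 to 3 can be isolated in a small helper. This is a minimal sketch: the helper name `load_json_or_empty` is hypothetical, the files are written to a temporary directory, and the country code and toponym in the sample entry are illustrative values.

```python
import json
import os
import tempfile


def load_json_or_empty(path):
    """Return the parsed JSON object at `path`, or {} if the file is missing."""
    if os.path.exists(path):
        with open(path, 'r') as json_file:
            return json.load(json_file)
    return {}


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, 'geonames_data.json')
    errors_path = os.path.join(tmp_dir, 'errori.json')  # deliberately not created

    # Write one cache entry in the order used by get_coordinates:
    # [longitude, latitude, feature class, country code, toponym name]
    with open(cache_path, 'w') as json_file:
        json.dump({'alessandria': ['8.61007', '44.90924', 'P', 'IT', 'Alessandria']}, json_file)

    cached = load_json_or_empty(cache_path)
    errors = load_json_or_empty(errors_path)

print(cached)
print(errors)
```

Using the same fallback for both files means that a first run, when neither file exists yet, starts from two empty dictionaries.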
    

### Part 2

In [None]:
    if clean_location in json_data: # Check if the location is already stored in the JSON file "geonames_data"
        existing_data = json_data[clean_location]
        return existing_data
    elif clean_location in errori_location: # Check if the location is already stored in "errori"
        return False
    else:
        # Set your GeoNames username
        username = 'aldomorodigitale'

        # Define parameters for the call
        parametri = {'q': clean_location, 'maxRows': 10, 'username': username}

        # Encode parameters
        location_encode = urllib.parse.urlencode(parametri)

        # Construct GeoNames API URL
        url = 'http://api.geonames.org/searchJSON?' + location_encode

6. Checks if the normalized location is already stored in the cached data. If found, returns the stored coordinates.
7. Checks if the normalized location is already marked as an error. If found, returns False.
8. If the location is not cached or marked as an error, proceeds to make an API call to _GeoNames_.
9. Sets the _GeoNames_ username and defines parameters for the API call.
10. Encodes the parameters for the URL.
11. Constructs the GeoNames API URL.
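
Steps 9 to 11 can be tried without sending any request. The sketch below only builds the URL; `build_search_url` is a hypothetical helper name and "demo_user" is a placeholder, not a real GeoNames account.

```python
import urllib.parse


def build_search_url(query, username, max_rows=10):
    """Build a GeoNames searchJSON URL (hypothetical helper, for illustration)."""
    parametri = {'q': query, 'maxRows': max_rows, 'username': username}
    return 'http://api.geonames.org/searchJSON?' + urllib.parse.urlencode(parametri)


# "demo_user" is a placeholder username
url_with_space = build_search_url('reggio emilia', 'demo_user')
url_with_accent = build_search_url('forlì', 'demo_user')

print(url_with_space)
print(url_with_accent)
```

`urlencode` replaces the space with `+` and percent-encodes the accented character as UTF-8, which is why the parameters are encoded rather than concatenated by hand.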


### Part 3

In [None]:
        try:
            # Send GET request
            resp = requests.get(url)
            resp.raise_for_status()  # Raise HTTPError for 4xx/5xx status codes
            
            # Parse JSON response
            data = resp.json()

            result = None  # Stays None if no item has an accepted feature class
            for item in data["geonames"]:
                if item['fcl'] in ('L', 'A', 'P', 'H'):  # Check the feature class
                    if item['fcl'] == 'H':  # Relabel class 'H' as 'A'
                        item['fcl'] = 'A'
                    result = item
                    break

            if result and clean_location != "null": # Create variables of the results 
                latitude = str(result['lat'])
                longitude = str(result['lng'])
                address_type = str(result['fcl'])
                countryCode = str(result.get('countryCode'))
                toponymName = str(result['toponymName'])

                json_data[clean_location] = [longitude, latitude, address_type, countryCode, toponymName]

                with open(json_file_path, 'w') as json_file: # Store results in "geonames_data"
                    json.dump(json_data, json_file, indent=4)

                return json_data[clean_location]

12. Sends a GET request to the GeoNames API.
13. Parses the JSON response.
14. Iterates through the results and selects the first one whose feature class is L, A, P, or H; a result of class H is relabeled as A. These are the feature classes used by the interactive map on the website, where each class corresponds to a specific zoom level and point color.
15. If a relevant result is found and the location is not 'null', extracts the relevant fields and stores them as strings.
16. Saves the retrieved data in the cache JSON file.
17. Returns the retrieved data: longitude, latitude, feature class, country code, and toponym name.
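
The selection in step 14 can be sketched on its own. The items below are invented for illustration and are not real GeoNames results; `pick_first_relevant` is a hypothetical helper that, unlike the original loop, copies the selected item instead of modifying it in place and tolerates a missing 'fcl' key.

```python
ACCEPTED_CLASSES = ('L', 'A', 'P', 'H')


def pick_first_relevant(items):
    """Return the first item with an accepted feature class, or None."""
    for item in items:
        if item.get('fcl') in ACCEPTED_CLASSES:
            selected = dict(item)  # copy so the input list is left unchanged
            if selected['fcl'] == 'H':
                selected['fcl'] = 'A'  # class H is relabeled as A
            return selected
    return None


# Invented sample items (assumptions, not API output)
sample_items = [
    {'fcl': 'S', 'toponymName': 'Example Station', 'lat': '1.0', 'lng': '2.0'},
    {'fcl': 'H', 'toponymName': 'Example Lake', 'lat': '3.0', 'lng': '4.0'},
    {'fcl': 'P', 'toponymName': 'Example Town', 'lat': '5.0', 'lng': '6.0'},
]

selected = pick_first_relevant(sample_items)
no_match = pick_first_relevant([{'fcl': 'S', 'toponymName': 'Example Station'}])

print(selected)
print(no_match)
```

Because the loop stops at the first accepted item, the order in which the API returns its results decides which place is stored.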

### Part 4

In [None]:
            else:
                with open('errori.json', 'w') as error_file:
                    errori_location[clean_location] = ["1"]
                    json.dump(errori_location, error_file)
                print("------------------- " + location)
                return False
            
        except requests.exceptions.RequestException as e:
            print("Error occurred during API request:", e)
            return None
        except KeyError:
            print("Invalid or unexpected API response format.")
            return None

18. If no relevant result is found or the location is 'null', records the location in "errori.json" and returns False. It also prints the name of the failed location so that it can be checked while the script is running.
19. Handles HTTP request errors and unexpected API response formats by printing an error message and returning None.
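
Taken together, the four parts give _get_coordinates_ three kinds of return value: a list on success, False when the location is recorded as an error, and None when the input is invalid or the request fails. The sketch below shows one way a caller might tell them apart; `describe_result` is a hypothetical helper and the sample list uses illustrative values.

```python
def describe_result(result):
    """Classify a value returned by a get_coordinates-style function."""
    if result is None:
        return "skipped"    # invalid input, failed request or unexpected response
    if result is False:
        return "not found"  # location recorded in errori.json
    return "found"          # [longitude, latitude, feature class, country code, toponym name]


outcomes = [
    describe_result(['8.61007', '44.90924', 'P', 'IT', 'Alessandria']),
    describe_result(False),
    describe_result(None),
]
print(outcomes)
```

A plain `if coordinates:` test treats False and None alike, which is sufficient when the caller only needs to know whether triples can be added.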

To create the final triples with this new information in the RDF file, some portions of code are inserted into the _generate_ and _align_ scripts.

_Generate_ processes the metadata derived from _kwickwockwac_ and transforms it into triples. The locations found here are those where a document was created.

### Generate.py

In [None]:
import rdflib
from datetime import datetime
from rdflib import URIRef, Namespace, Literal
from rdflib.namespace import RDF, RDFS, DCTERMS, FOAF, XSD, SKOS, OWL
from lib.location3 import get_coordinates, normalize_location
import os

# ...

# ADD SPATIAL COVERAGE
def add_spatial(g, value, expression, dataset_ns):

    geonames = Namespace('http://www.geonames.org/ontology#')
    wgs84_pos = Namespace('http://www.w3.org/2003/01/geo/wgs84_pos#')
    g.bind('geonames', geonames)
    g.bind('wgs84_pos', wgs84_pos)

    place_text = value.lower()
    place = URIRef(f'{dataset_ns}{value.replace(" ", "-").lower()}')
    g.add((place, RDF.type, DCTERMS.Location))
    g.add((place, RDFS.label, Literal(value, datatype=XSD.string)))
    g.add((expression, DCTERMS.spatial, place))

    if place:

        normalize = normalize_location(place_text)
        coordinates = get_coordinates(normalize)

        if coordinates:
            

            long = coordinates[0]
            lat = coordinates[1]
            key = coordinates[2]
        
            g.add((place, wgs84_pos.lat, Literal(lat)))
            g.add((place, wgs84_pos.long, Literal(long)))
            g.add((place, geonames.featureClass, geonames[key]))

# ...
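
To see how a cached entry maps onto the desired output shown at the top of this document, the sketch below renders one entry as Turtle-like text using plain string formatting. It is only an illustration: the script itself adds the triples through rdflib, `entry_to_turtle` is a hypothetical helper, prefix declarations are omitted, and the country code and toponym in the sample entry are illustrative.

```python
def entry_to_turtle(place_uri, label, entry):
    """Render a cache entry as Turtle-like text (illustration only)."""
    # Entries are stored as [longitude, latitude, feature class, ...]
    longitude, latitude, feature_class = entry[0], entry[1], entry[2]
    return (
        f'<{place_uri}> a dcterms:Location ;\n'
        f'    rdfs:label "{label}"^^xsd:string ;\n'
        f'    geonames:featureClass geonames:{feature_class} ;\n'
        f'    wgs84_pos:lat "{latitude}" ;\n'
        f'    wgs84_pos:long "{longitude}" .'
    )


entry = ['8.61007', '44.90924', 'P', 'IT', 'Alessandria']
turtle = entry_to_turtle('https://w3id.org/moro/enoam/data/alessandria', 'Alessandria', entry)
print(turtle)
```

The longitude comes first in the stored list, so it is read from index 0 and the latitude from index 1, as in _add_spatial_.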

### Align.py

_Align_ processes the data in each HTML file and aligns it with the related metadata.
This script selects all the locations that are mentioned in the documents and that the researchers have marked as relevant. As a result, the final map shows all the locations mentioned by Aldo Moro, the number of occurrences, and links to the related documents.

In [None]:
from bs4 import BeautifulSoup
import rdflib
from rdflib import URIRef, Namespace, Literal
from rdflib.namespace import RDF, RDFS, DCTERMS, FOAF, XSD, SKOS, OWL
from lib.location3 import get_coordinates, normalize_location

#ADD GEONAMES

    # Find all <span> elements with the class "mention place"
    mention_places = soup.find_all('span', class_='mention place')

    # print(mention_places)

    # Check that the mention_places list is not empty
    if mention_places:
        # Iterate over the elements found
        for mention_place in mention_places:
            # Get the text inside the element
            place_text = mention_place.get_text()
            # print(place_text)

            coordinates = get_coordinates(place_text)

            # print(coordinates)

            if coordinates:
                # Get the "resource" attribute, if present
                resource_attribute_uri = mention_place.get('resource')

                # Ensure that resource_attribute is a URIRef
                resource_attribute = URIRef(resource_attribute_uri)

                long = coordinates[0]
                lat = coordinates[1]
                key = coordinates[2]
            
                g.add((resource_attribute, wgs84_pos.lat, Literal(lat)))
                g.add((resource_attribute, wgs84_pos.long, Literal(long)))
                g.add((resource_attribute, geonames.featureClass, geonames[key]))
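
The extraction of the marked places can also be tried in isolation. The sketch below swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra dependencies; the class `MentionPlaceParser`, the sample HTML, and the example.org URI are assumptions made for illustration. It also skips spans that have no "resource" attribute and assumes that the spans are not nested.

```python
from html.parser import HTMLParser


class MentionPlaceParser(HTMLParser):
    """Collect (resource, text) pairs from <span class="mention place"> elements."""

    def __init__(self):
        super().__init__()
        self.mentions = []     # collected (resource, text) pairs
        self._inside = False   # True while reading a matching span
        self._resource = None  # "resource" attribute of the current span
        self._text = []        # text fragments of the current span

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == 'span' and attributes.get('class') == 'mention place':
            self._inside = True
            self._resource = attributes.get('resource')
            self._text = []

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'span' and self._inside:
            self._inside = False
            if self._resource:  # skip mentions without a "resource" attribute
                self.mentions.append((self._resource, ''.join(self._text).strip()))


# Invented sample markup (assumption, not a project document)
sample_html = (
    '<p>Speech in <span class="mention place" '
    'resource="https://example.org/data/bari">Bari</span>, then '
    '<span class="mention place">Roma</span>.</p>'
)

parser = MentionPlaceParser()
parser.feed(sample_html)
mentions = parser.mentions
print(mentions)
```

Each collected pair provides the subject URI and the text that would be passed to _get_coordinates_ before the triples are added.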