
 #  <span style="color: maroon; font-size:50px"> <b> The Country Dictionary From Scratch</b> </span>




#### Dictionaries are made using WIKIDATA SPARQL  
The following code is about building the country dictionary first using SPARQL and then using amidict to validate 

In [1]:
#!pip install SPARQLWrapper 
from SPARQLWrapper import SPARQLWrapper, JSON,  XML
import pandas as pd

In [2]:
sparql = SPARQLWrapper("https://query.wikidata.org/sparql", agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11")


In [3]:
sparql.setQuery("""
## This query contains phrases common to all queries and others which are specific to particular dictionaries.(SPECIFIC) 
## Select Query was used to retrieve specific results (Country name, wiki data number, Synonyms) 

## ?code as ?_iso3166 is optional and is specific to countries 

SELECT ?wikidata ?wikidataLabel ?wikipedia (?wikidataAltLabel as ?alt) ?synonym (?wikidataLabel as ?term) ?wikidataDescription
## (SPECIFIC) links to ?wikidata below
    (?code as ?_iso3166) ?coords { 

## Forcing particular query execution order
  hint:Query hint:optimizer "None" . 

## all ISO countries (SPECIFIC) must link to ?code above. 
  ?wikidata wdt:P297 ?code.
  ?wikidata wdt:P625 ?coords.

## Optional details about the terms like links to wikipaedia pages for each wikipedia page to be presented in a seperate column
  OPTIONAL { ?wikipedia schema:about ?wikidata; schema:isPartOf <https://en.wikipedia.org/> }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".

## Selecting the prefered label 
    ?wikidata skos:altLabel ?wikidataAltLabel ; rdfs:label ?wikidataLabel; schema:description  ?wikidataDescription          
  } 

## (SPECIFIC) Making sure the RGI alphabets of the flags are not rendered as flags and they appear as simple alphabets by specifying the acceptable characters. 
  BIND (REPLACE(REPLACE(?wikidataAltLabel, "(, )?[🇦-🇿]{2}", ""), "^, ", "") AS ?synonym )
      }
""")

sparql.setReturnFormat(XML)
results = sparql.query().convert()

sparql.setReturnFormat(JSON)
results1 = sparql.query().convert()


In [4]:
print("THE SPARL QUERY RESULT LOOKS LIKE THIS: ")
print("*******************************************")
results_df = pd.io.json.json_normalize(results1['results']['bindings'])
results_df[['wikidata.value', 'wikidataLabel.value', '_iso3166.value', 'wikipedia.value', 'alt.value', 'synonym.value', 'coords.value' , 'term.value']].head()

THE SPARL QUERY RESULT LOOKS LIKE THIS: 
*******************************************


Unnamed: 0,wikidata.value,wikidataLabel.value,_iso3166.value,wikipedia.value,alt.value,synonym.value,coords.value,term.value
0,http://www.wikidata.org/entity/Q16,Canada,CA,https://en.wikipedia.org/wiki/Canada,"CA, ca, CDN, can, CAN, British North America, ...","CA, ca, CDN, can, CAN, British North America, ...",Point(-109.0 56.0),Canada
1,http://www.wikidata.org/entity/Q258,South Africa,ZA,https://en.wikipedia.org/wiki/South_Africa,"SA, za, Republic of South Africa, RSA, 🇿🇦, zaf","SA, za, Republic of South Africa, RSA, zaf",Point(24.0 -29.0),South Africa
2,http://www.wikidata.org/entity/Q55,Netherlands,NL,https://en.wikipedia.org/wiki/Netherlands,"NL, Holland, Nederland, NED, nl, the Netherlan...","NL, Holland, Nederland, NED, nl, the Netherlan...",Point(5.55 52.316666666),Netherlands
3,http://www.wikidata.org/entity/Q20,Norway,NO,https://en.wikipedia.org/wiki/Norway,"NO, Norge, no, NOR, Kingdom of Norway, 🇳🇴, Nor...","NO, Norge, no, NOR, Kingdom of Norway, Noreg, ...",Point(11.0 65.0),Norway
4,http://www.wikidata.org/entity/Q265,Uzbekistan,UZ,https://en.wikipedia.org/wiki/Uzbekistan,"Republic of Uzbekistan, uz, 🇺🇿, UZB","Republic of Uzbekistan, uz, UZB",Point(66.0 41.0),Uzbekistan


Save the sparql endpoint in country.xml

In [None]:
with open("dictionary\country.sparql.xml", "w", encoding="utf-8") as f:
    f.write(results.toxml())

### Installing ami
Requirements: JDK, Maven, git

In [6]:
!git clone https://github.com/petermr/ami3.git
!cd ami3
!mvn install -Dmaven.test.skip=true

fatal: destination path 'ami3' already exists and is not an empty directory.


[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  0.059 s
[INFO] Finished at: 2020-12-03T08:01:06+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (C:\Users\eless\Documents\viral_epidemic_country). Please verify you invoked Maven from the correct directory. -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MissingProjectException


##### The Results should look like this 

#### AMIDICT
amidict converts the SPARQL output into the required dictionary format and also to validate it

In [7]:
!amidict --help

Usage: amidict [OPTIONS] COMMAND

`amidict` is a command suite for managing dictionary:

Parameters:
      [@<filename>...]   One or more argument files containing options.
Options:
  -d, --dictionary=<dictionaryNameList>[,<dictionaryNameList>...]...
                         input or output dictionary NAMES/s. for 'create' must be singular; when 'display' or
                           'translate', any number. Names should be lowercase, unique. [a-z][a-z0-9._]. Dots can be
                           used to structure dictionaries intodirectories. Dictionary names are relative to
                           'directory'. If <directory> is absent then dictionary names are absolute. ) This doesn't
                           make sense; it should relate to current working directory.
      --directory=<directory>
                         top directory containing dictionary/s. Subdirectories will use structured names (NYI). Thus
                           dictionary 'animals' is found in '<dire

#### Convert SPARQL Endpoint into the Country Dictionary

In [8]:
!amidict -vv --dictionary country --directory dictionary  --input dictionary/country.sparql.xml create --informat wikisparqlxml --sparqlmap name=wikidataLabel,term=wikidataLabel,description=wikidataDescription,wikidataURL=wikidata,wikidataID=wikidata,wikipediaPage=wikipedia,wikipediaURL=wikipedia,_p297_country=_iso3166,_coords=coords --transformName wikidataID=EXTRACT(wikidataURL,.*/(.*)) --synonyms=synonym


Generic values (DictionaryCreationTool)
--testString        : d      null
--wikilinks         : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@736caf7a
--datacols          : d      null
--hrefcols          : d      null
--informat          : m wikisparqlxml
--linkcol           : d      null
--namecol           : d      null
--outformats        : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$DictionaryFileFormat;@63798ca7
--query             : d      null
--sparqlmap         : m {name=wikidataLabel, term=wikidataLabel, description=wikidataDescription, wikidataURL=wikidata, wikidataID=wikidata, wikipediaPage=wikipedia, wikipediaURL=wikipedia, _p297_country=_iso3166, _coords=coords}
--sparqlquery       : d      null
--synonyms          : m [synonym]
--template          : d      null
--termcol           : d      null
--termfile          : d      null
--terms             : d      null
--transformName     : m {wikidataID=EXTRACT(wikidataURL,.*/(.*))}
--wptype            : d

Version: amidict 2020.08.09_09.54-NEXT-SNAPSHOT
(jar:file:/C:/Users/eless/ami3/target/appassembler/repo/ami3-2020.08.09_09.54-NEXT-SNAPSHOT.jar)
JVM: 14.0.1 (Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 14.0.1+7)
OS: Windows 10 10.0 amd64

dictionaryName: country
sparqlVariables [wikidata, wikidataLabel, wikipedia, alt, synonym, term, wikidataDescription, _iso3166, coords]
sparqlNameByAmiName: {_p297_country=_iso3166, wikidataID=wikidata, name=wikidataLabel, description=wikidataDescription, term=wikidataLabel, wikidataURL=wikidata, wikipediaPage=wikipedia, _coords=coords, wikipediaURL=wikipedia}
amiNames [_p297_country, wikidataID, name, description, term, wikidataURL, wikipediaPage, _coords, wikipediaURL]
WS>[_p297_country, wikidataID, name, description, term, wikidataURL, wikipediaPage, _coords, wikipediaURL]
Personal ami name: _coords
Personal ami name: _p297_country
[_coords, _p297_country, description, name, term, wikidataID, wikidataURL, wikipediaPage, wikipediaURL]
searc

##### The Results should look like this 

#### The dictionary is then validated using amidict

In [9]:
!amidict --dictionary country --directory dictionary display --validate


Generic values (DictionaryDisplayTool)
-v to see generic values

Specific values (DictionaryDisplayTool)
--testString        : d      null
--wikilinks         : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@13d9b21f
--fields            : d        []
--files             : d        []
--maxEntries        : d         3
--remote            : d [https://github.com/petermr/dictionary]
--suffix            : d       xml
--validate          : m      true
--help              : d     false
--version           : d     false
--dictionary        : d [country]
--directory         : d dictionary


    Netherlands
    South Africa
    Canada
    ....


##### The results should look like this: