# `stedsans`

This is a notebook showing the current and most prominent capabilities of `stedsans`. 
It is strongly recommended to run the notebook in Google Colab:
<br>
<br>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MalteHB/stedsans/blob/main/notebooks/stedsans_demo.ipynb)


If you run the notebook on your local machine, consider installing [Anaconda](https://docs.anaconda.com/anaconda/install/) and then installing the `geopandas` package with the `conda` package manager from an Anaconda-integrated terminal, which provides pre-built binaries:

```bash
conda install geopandas
```

## Setup

We will start off by installing `stedsans` using the `pip` package manager.

In [14]:
!pip -q install stedsans

If you are using Google Colab, Linux or macOS, you can also install `geopandas` with `pip`; if you are using Windows, install `geopandas` with the `conda` package manager instead.

In [15]:
# For Google Colab, Linux or MacOS:
#!pip -q install geopandas

# For Windows:
# !conda install geopandas

__Importing packages__

We now import the main `stedsans` class as well as the two data-loading classes `Articles` and `GeoData`.

In [16]:
from stedsans import stedsans
from stedsans.data.load_data import Articles, GeoData

## Language capabilities of `stedsans`

`stedsans` can take either a Danish or an English sentence and extract its named entities using [Ælæctra](https://huggingface.co/Maltehb/-l-ctra-danish-electra-small-cased-ner-dane) or [BERT](https://huggingface.co/dslim/bert-base-NER), respectively.

The intended use of `stedsans` is to initialize a stedsans object with a sentence.

In [17]:
# Define the sentence
danish_sentence = "Malte er mit navn, og jeg bor på Testvej 13, Aarhus C"

# By default stedsans assumes the language is Danish
default_stedsans = stedsans(sentence = danish_sentence)

Once a `stedsans` instance has been initialized with a sentence, simply call its `extract_entities()` method and print the entities.

In [18]:
default_entities = default_stedsans.extract_entities()

print(default_entities)

[('Malte', 'PER'), ('Testvej 13', 'LOC'), ('Aarhus C', 'LOC')]
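As the output shows, `extract_entities()` returns a list of `(entity, label)` tuples, so ordinary list tools apply to it. A minimal sketch of keeping only one entity type, using a hand-copied version of the printed output as a stand-in for the real return value (no `stedsans` call is made here, and `entities_with_label` is a hypothetical helper, not part of `stedsans`):

```python
# Hand-copied from the output above; in the notebook this would be the
# list returned by default_stedsans.extract_entities()
default_entities = [('Malte', 'PER'), ('Testvej 13', 'LOC'), ('Aarhus C', 'LOC')]


def entities_with_label(entities, label):
    """Return the entity strings whose NER tag equals `label` (hypothetical helper)."""
    return [text for text, tag in entities if tag == label]


locations = entities_with_label(default_entities, 'LOC')
people = entities_with_label(default_entities, 'PER')
print(locations)
print(people)
```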


### Multilingual stedsans
#### (bilingual for now...)
By default, `stedsans` assumes the language is Danish, but the language can also be specified with the `language` argument. `stedsans` currently only supports Danish and English sentences, but future enhancements will add more languages.

In [19]:
danish_stedsans = stedsans(danish_sentence, language="danish")

danish_entities = danish_stedsans.extract_entities()

print(danish_entities)

[('Malte', 'PER'), ('Testvej 13', 'LOC'), ('Aarhus C', 'LOC')]


In [20]:
english_sentence = "Hello my name is Malte and i live in Aarhus C"

english_stedsans = stedsans(english_sentence, language="english")

english_entities = english_stedsans.extract_entities()

print(english_entities)

[('Malte', 'PER'), ('Aarhus C', 'LOC')]


Once a `stedsans` instance has been initialized, it can also be used to extract entities from other sentences.

In [21]:
new_danish_sentence = "Jakob er min gode samarbejdspartners navn, og han bor også i Aarhus C"

danish_sentence_entities = default_stedsans.extract_entities(new_danish_sentence)

print(danish_sentence_entities)

[('Jakob', 'PER'), ('Aarhus', 'LOC'), ('C', 'LOC')]


In [22]:
new_english_sentence = "Jakob is the name my good cooperator, and he also lives in Aarhus C"

english_sentence_entities = english_stedsans.extract_entities(new_english_sentence)

print(english_sentence_entities)

[('Jakob', 'PER'), ('Aarhus', 'LOC')]


Notice how the two models behave differently: the Danish Ælæctra model tags 'C' as a location (here as an entity separate from 'Aarhus'), whereas BERT leaves it out.
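The difference can be made explicit by comparing the two outputs as sets. A small sketch, with the printed outputs for the Danish sentence and its English translation copied in by hand rather than recomputed:

```python
# Outputs printed above, copied in by hand (no model is run here)
danish_sentence_entities = [('Jakob', 'PER'), ('Aarhus', 'LOC'), ('C', 'LOC')]
english_sentence_entities = [('Jakob', 'PER'), ('Aarhus', 'LOC')]

danish_set = set(danish_sentence_entities)
english_set = set(english_sentence_entities)

# Set operations show agreement and disagreement between the two models
print("Found by both models:", sorted(danish_set & english_set))
print("Only found by Ælæctra:", sorted(danish_set - english_set))
print("Only found by BERT:", sorted(english_set - danish_set))
```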

## Geographic capabilities of `stedsans`

To show the basic geographical functionalities of `stedsans`, we start off by initializing a `stedsans` object with a Danish text string and printing the location and organization entities it finds.

In [23]:
txt = "Han bor på Testvej 13 Aarhus C. Jakob bor i Testparken Aarhus C. \
            MCH Arena er et legendarisk sted. Hun bor tæt på Dejbjerglund Efterskole. \
            I Randers laver man shawarma. LEGOLAND er det fedeste sted. Skanderborg Bryghus \
            laver gode øl. AGF er et ringe hold. Han bor på Ingerslevs Boulevard. Fjordgaarden \
            er en lækker restaurant. Knebel ligger på Mols Djursland. Vestebro ligger vest for \
            Østerbro og tæt på Amager. Bruuns Galleri og Dokk1 er steder i Aarhus."

geo_demo = stedsans(sentence = txt, language = 'danish')

geo_demo.print_entities()

[   ('Testvej 13 Aarhus C', 'LOC'),
    ('Testparken Aarhus C', 'LOC'),
    ('MCH Arena', 'ORG'),
    ('Dejbjerglund Efterskole', 'LOC'),
    ('Randers', 'LOC'),
    ('LEGOLAND', 'ORG'),
    ('Skanderborg Bryghus', 'LOC'),
    ('AGF', 'ORG'),
    ('Ingerslevs Boulevard', 'LOC'),
    ('Fjordgaarden', 'LOC'),
    ('Knebel', 'LOC'),
    ('Mols Djursland', 'LOC'),
    ('Vestebro', 'LOC'),
    ('Østerbro', 'LOC'),
    ('Amager', 'LOC'),
    ('Bruuns Galleri', 'LOC'),
    ('Dokk1', 'LOC'),
    ('Aarhus', 'LOC')]


We can then use the `get_coordinates()` method to obtain a list of coordinates, a `pandas` dataframe and a `geopandas` dataframe.

In [24]:
coords, df, gdf = geo_demo.get_coordinates()

print("List of coordinates:\n", coords)

List of coordinates:
 [[56.11690805, 8.95090626379648], [56.007349149999996, 8.418548397993728], [56.4800019, 10.0891444], [55.735931050000005, 9.126759387129225], [56.05082275, 9.94509834643894], [44.17395835, 0.5933200933739851], [56.1447115, 10.1950871], [55.666663, 9.697003], [56.21409615, 10.486067290407115], [56.22668755, 10.571935253435392], [55.7050841, 12.5826141], [55.62437565, 12.603853037516306], [56.14901465, 10.204766472313928], [56.15341825, 10.21417248765583], [56.1496278, 10.2134046]]
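Each entry in `coords` is a `[latitude, longitude]` pair in decimal degrees. A quick sanity check is to compute the extent of the geocoded points: one point in the output, at roughly 44° N, 0.6° E (the "Aérodrome d'Agen-La Garenne" row in the dataframe output below), lies in France and stretches the extent far beyond Denmark. The sketch below computes the extent in plain Python on a few entries copied from the output and returns it as `((lat_min, lon_min), (lat_max, lon_max))`; that this matches the layout of the `bounding_box` arguments used later in the notebook is an assumption based on their values. The helper `extent` is hypothetical.

```python
# A few [latitude, longitude] pairs copied from the coords output above
coords = [
    [56.11690805, 8.95090626379648],
    [56.4800019, 10.0891444],
    [44.17395835, 0.5933200933739851],
    [55.7050841, 12.5826141],
]


def extent(points):
    """Return ((lat_min, lon_min), (lat_max, lon_max)) for [lat, lon] points."""
    lats = [lat for lat, lon in points]
    lons = [lon for lat, lon in points]
    return (min(lats), min(lons)), (max(lats), max(lons))


print(extent(coords))
# Dropping the point in France (crude cut-off at 50° N, for illustration only)
print(extent([p for p in coords if p[0] > 50]))
```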


In [25]:
print("The pandas dataframe:\n", df)

The pandas dataframe:
                                                places   latitude  longitude  \
0   (MCH Arena, Kaj Zartows Vej, Messecenter Herni...  56.116908   8.950906   
1   (Dejbjerglund Efterskole, Uglbjergvej, Vester ...  56.007349   8.418548   
2   (Randers NØ, Randers, Randers Kommune, 8930, D...  56.480002  10.089144   
3   (Legoland, Åstvej, Billund, Billund Kommune, R...  55.735931   9.126759   
4   (Skanderborg Bryghus, Danmarksvej, Højvangen, ...  56.050823   9.945098   
5   (Aérodrome d'Agen-La Garenne, Chemin de Labast...  44.173958   0.593320   
6   (Ingerslevs Boulevard, Frederiksbjerg, Aarhus,...  56.144711  10.195087   
7   (Fjordgaarden, 116, Gauerslund Skovvej, Børkop...  55.666663   9.697003   
8   (Knebel, Syddjurs Kommune, Danmark, (56.214096...  56.214096  10.486067   
9   (Besøgscenter Øvre Strandkær, Strandkærvej, St...  56.226688  10.571935   
10  (Østerbro, København, Københavns Kommune, Regi...  55.705084  12.582614   
11  (Amage

In [26]:
print("The geopandas dataframe:\n", gdf)

The geopandas dataframe:
                                                places   latitude  longitude  \
0   (MCH Arena, Kaj Zartows Vej, Messecenter Herni...  56.116908   8.950906   
1   (Dejbjerglund Efterskole, Uglbjergvej, Vester ...  56.007349   8.418548   
2   (Randers NØ, Randers, Randers Kommune, 8930, D...  56.480002  10.089144   
3   (Legoland, Åstvej, Billund, Billund Kommune, R...  55.735931   9.126759   
4   (Skanderborg Bryghus, Danmarksvej, Højvangen, ...  56.050823   9.945098   
5   (Aérodrome d'Agen-La Garenne, Chemin de Labast...  44.173958   0.593320   
6   (Ingerslevs Boulevard, Frederiksbjerg, Aarhus,...  56.144711  10.195087   
7   (Fjordgaarden, 116, Gauerslund Skovvej, Børkop...  55.666663   9.697003   
8   (Knebel, Syddjurs Kommune, Danmark, (56.214096...  56.214096  10.486067   
9   (Besøgscenter Øvre Strandkær, Strandkærvej, St...  56.226688  10.571935   
10  (Østerbro, København, Københavns Kommune, Regi...  55.705084  12.582614   
11  (Am

As we can see, the two dataframes contain essentially the same data; however, their types differ, and the `geopandas` dataframe offers additional geoanalytical features compared to the `pandas` dataframe.
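The key extra ingredient in a `geopandas` dataframe is a geometry column holding point (or polygon) geometries. As a rough, dependency-free illustration of the same idea, and not how `stedsans` actually builds `gdf`, the sketch below turns names and `[latitude, longitude]` pairs into a GeoJSON `FeatureCollection` of `Point` geometries. Note that GeoJSON (RFC 7946), like `geopandas.points_from_xy(x, y)`, expects longitude first, so the order has to be swapped. The helper `to_feature_collection` is hypothetical.

```python
import json


def to_feature_collection(names, coords):
    """Build a GeoJSON FeatureCollection from names and [lat, lon] pairs."""
    features = []
    for name, (lat, lon) in zip(names, coords):
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        })
    return {"type": "FeatureCollection", "features": features}


# Two places and their coordinates, copied from the outputs above
fc = to_feature_collection(
    ["MCH Arena", "Ingerslevs Boulevard"],
    [[56.11690805, 8.95090626379648], [56.1447115, 10.1950871]],
)
print(json.dumps(fc, indent=2, ensure_ascii=False))
```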

## Basic visualisation: Plotting points onto a map

In [None]:
danmark = GeoData.municipalities()
region_m = danmark[danmark["REGIONNAVN"] == "Region Midtjylland"]

Interactive folium map

In [None]:
geo_demo.plot_locations()

Plotting onto a given map layer (shapefile)

In [None]:
shp_map = geo_demo.plot_locations(layer=danmark)

In [None]:
shp_map

## Perform basic statistical point pattern tests

The quadrat statistics (Q-statistics) functions enable a quick statistical analysis of the distribution of the points by testing for complete spatial randomness (CSR).
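The idea behind a quadrat test can be sketched in a few lines: lay a grid of equal-sized cells over the study area, count the points in each cell, and compare the counts with the uniform expectation under CSR using Pearson's chi-square statistic. This is a generic, stdlib-only sketch of the technique, not the `stedsans` implementation; the helper names are hypothetical. Turning the statistic into a p-value needs the chi-square distribution, e.g. `scipy.stats.chi2.sf(stat, dof)`.

```python
def quadrat_counts(points, bbox, nx, ny):
    """Count points per cell of an nx-by-ny grid laid over bbox.

    points: (x, y) pairs, e.g. (longitude, latitude)
    bbox:   ((x_min, y_min), (x_max, y_max)); points outside it are ignored
    """
    (x_min, y_min), (x_max, y_max) = bbox
    width = (x_max - x_min) / nx
    height = (y_max - y_min) / ny
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Clamp so points on the upper edges land in the last column/row
        col = min(int((x - x_min) / width), nx - 1)
        row = min(int((y - y_min) / height), ny - 1)
        counts[row][col] += 1
    # Flatten row by row
    return [c for row in counts for c in row]


def quadrat_chi2(counts):
    """Pearson chi-square statistic and degrees of freedom under CSR."""
    n, k = sum(counts), len(counts)
    if n == 0:
        raise ValueError("no points inside the grid")
    expected = n / k  # same expected count in every equal-sized cell
    stat = sum((obs - expected) ** 2 / expected for obs in counts)
    return stat, k - 1


points = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (0.6, 0.4)]
counts = quadrat_counts(points, ((0.0, 0.0), (1.0, 1.0)), nx=2, ny=2)
stat, dof = quadrat_chi2(counts)
print(counts, stat, dof)
```

A statistic that is large relative to the chi-square distribution with `k - 1` degrees of freedom suggests the points are clustered or dispersed rather than randomly placed.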

In [None]:
# Initialising a stedsans object
example = stedsans(sentence=txt)

# Printing the extracted entities
example.print_entities()


In [None]:
example.get_quad_stats()

In [None]:
# Plotting the points on a grid of quadrats
example.plot_quad_count(squares = 4)

## Plotting region heatmaps on a given map layer

This tool gives a clear visual representation of how the extracted locations are distributed across regions. The level of partitioning can be set with the *group_by* parameter (e.g. `'REGIONNAVN'` for regions or `'DAGI_ID'` for municipalities, as in the examples below).

By default, `plot_choropleth()` plots the world.
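A choropleth of point data needs two steps: assign each point to the polygon (region or municipality) that contains it, then count the points per polygon and colour each polygon by its count. The sketch below shows both steps with a plain ray-casting point-in-polygon test on two made-up square "regions"; it is illustrative only and not the `stedsans` code. With `geopandas`, the assignment step would normally be a spatial join (`geopandas.sjoin`) against real boundaries such as the `danmark` layer. The helper names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) to the right cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_region(points, regions):
    """Count points per region; each point goes to the first region containing it."""
    counts = {name: 0 for name in regions}
    for x, y in points:
        for name, polygon in regions.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
    return counts


# Made-up square "regions" with vertices in (longitude, latitude) order
regions = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points = [(0.5, 0.5), (1.5, 0.5), (1.7, 0.2), (3.0, 3.0)]
print(count_points_per_region(points, regions))
```

The resulting counts are what a choropleth maps to colours; the point at `(3.0, 3.0)` lies outside both regions and is simply not counted.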

In [None]:
geo_demo.plot_choropleth()

One can also use the argument `layer` to specify a geopandas dataframe to plot on.

In [None]:
danmark_cloropleth = geo_demo.plot_choropleth(layer=danmark)
danmark_cloropleth

In [None]:
denmark_heatmap_by_region = geo_demo.plot_choropleth(layer=danmark, group_by='REGIONNAVN')
denmark_heatmap_by_region

In [None]:
region_m_heatmap = geo_demo.plot_choropleth(layer=region_m, title = 'Region Midtjylland', group_by = 'DAGI_ID')
region_m_heatmap

# Aarhus article example for exam paper

In [None]:
aarhus_article = Articles.aarhus()

In [None]:
geo_demo = stedsans(file = aarhus_article, language = 'danish')

In [None]:
coords, df, gdf = geo_demo.get_coordinates()

In [None]:
df

In [None]:
geo_demo.plot_heatmap()

In [None]:
geo_demo.plot_heatmap(limit = 'country', limit_area = 'Danmark')
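How `limit='country'` is applied internally is not shown here. One plausible approach, sketched below purely as an assumption, is to keep only geocoding results whose address names the requested country; Nominatim-style addresses, like those in the dataframe output earlier (e.g. "Knebel, Syddjurs Kommune, Danmark"), typically end with the country name. The helper `limit_to_country` and the second address are hypothetical.

```python
def limit_to_country(rows, country):
    """Keep (address, lat, lon) rows whose address ends with `country` (hypothetical helper)."""
    kept = []
    for address, lat, lon in rows:
        # Assumes the country name is the last comma-separated part
        last_part = address.rsplit(",", 1)[-1].strip()
        if last_part == country:
            kept.append((address, lat, lon))
    return kept


rows = [
    ("Knebel, Syddjurs Kommune, Danmark", 56.214096, 10.486067),  # from the output above
    ("Hypothetical Street 1, Agen, France", 44.173958, 0.593320),  # made-up address
]
print(limit_to_country(rows, "Danmark"))
```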

In [None]:
geo_demo.plot_heatmap(bounding_box=((55.859900,7.630005),(56.613931,10.958862)), bounded=True)

In [None]:
geo_demo.plot_heatmap(bounding_box=((55.9,7.6),(56.6,10.9)), bounded=False)
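Judging from its values, `bounding_box` is given as `((lat_min, lon_min), (lat_max, lon_max))`, i.e. the south-west and north-east corners of the search area. The parameter names also match the `viewbox`/`bounded` options of geopy's Nominatim geocoder, where `bounded=True` restricts results to the box and `bounded=False` only prefers results inside it; whether `stedsans` passes them through this way is an assumption. A minimal stdlib sketch of the restricting behaviour, applied to coordinates copied from the earlier `coords` output (`in_bounding_box` is a hypothetical helper):

```python
# Box used above: ((lat_min, lon_min), (lat_max, lon_max))
bounding_box = ((55.9, 7.6), (56.6, 10.9))


def in_bounding_box(lat, lon, box):
    """True if (lat, lon) lies inside the box, edges included."""
    (lat_min, lon_min), (lat_max, lon_max) = box
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


# [latitude, longitude] pairs copied from the coords output earlier
coords = [
    [56.11690805, 8.95090626379648],    # MCH Arena
    [44.17395835, 0.5933200933739851],  # Aérodrome d'Agen-La Garenne (France)
    [55.7050841, 12.5826141],           # Østerbro (Copenhagen)
    [56.1447115, 10.1950871],           # Ingerslevs Boulevard (Aarhus)
]

inside = [c for c in coords if in_bounding_box(c[0], c[1], bounding_box)]
print(inside)
```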

## Choropleth

In [None]:
geo_demo.plot_choropleth()  # This might take a while :)

In [None]:
danmark_cloropleth = geo_demo.plot_choropleth(layer=danmark)
danmark_cloropleth

In [None]:
denmark_heatmap_by_region = geo_demo.plot_choropleth(layer=danmark, group_by='REGIONNAVN')
denmark_heatmap_by_region

In [None]:
region_m_heatmap = geo_demo.plot_choropleth(layer=region_m, title = 'Region Midtjylland', group_by = 'DAGI_ID')
region_m_heatmap

# Den Store Danske - Jylland

## Reading in the article

In [None]:
jylland_article = Articles.jylland()

## Initialising stedsans object

In [None]:
geo_demo = stedsans(file = jylland_article, language = 'danish')

## Plotting locations

### Plotting on interactive leaflet map

In [None]:
geo_demo.plot_locations()

### Plotting on a shapefile layer

In [None]:
geo_demo.plot_locations(layer=danmark, on_map=True)

In [None]:
geo_demo.plot_locations(layer=region_m, on_map=True)

## Plotting heatmaps

### No restrictions

In [None]:
geo_demo.plot_heatmap()

### Plotting only locations in Denmark

In [None]:
geo_demo.plot_heatmap(limit = 'country', limit_area = 'Danmark')

### Bounding the search area to Region Midtjylland

In [None]:
geo_demo.plot_heatmap(bounding_box=((55.9,7.6),(56.6,10.9)), bounded=True)

## Choropleth maps

### Plotting choropleth map of points bounded to Region Midtjylland on map of Denmark

In [None]:
geo_demo.plot_choropleth(layer=danmark, title='Jylland - Den Store Danske \n Bounded to Region Midtjylland', group_by='DAGI_ID', bounding_box=((55.9,7.6),(56.6, 10.9)), bounded=True)

### Plotting choropleth map grouped by region

In [None]:
geo_demo.plot_choropleth(layer=danmark, title='Jylland - Den Store Danske \n Grouped by Region', group_by='REGIONNAVN', bounding_box=((54.6,7.8),(57.8, 15.2)), bounded=False)

### Plotting choropleth map grouped by municipalities

In [None]:
geo_demo.plot_choropleth(layer=danmark, title='Jylland - Den Store Danske \n Unbounded', group_by='DAGI_ID')

## Quadrat Statistics

In [None]:
geo_demo.get_quad_stats(limit = 'country', limit_area = 'Danmark')

In [None]:
geo_demo.plot_quad_count(limit = 'country', limit_area = 'Danmark')