# Geographics data in Raku demo
### ***JavaScript::D3***

Anton Antonov  
[RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com)   
RakuForPrediction-book at GitHub   
June 2024  

-----

## Introduction

This notebook showcases the data and functionalities of Raku packages for:

- Geographic data and named entity recognition
- Geometric computations
- Data wrangling and summarization
- Visualization

Since the exposition is done in a chat-enabled notebook (***chatbook***), it also showcases data presentation and formatting.

This notebook can alternatively be seen as a showcase of Raku for exploratory analysis of geo-data (similar to [AA1, AAv1]).

------

## Setup

Here we load the packages used in the rest of the notebook:

In [1]:
use Data::Reshapers;
use Data::Summarizers;
use Data::TypeSystem;
use Data::Translators;
use Data::Geographics;
use Math::Nearest;

use DSL::Entity::Geographics;
use DSL::English::DataQueryWorkflows;

use JavaScript::D3;
use WWW::MermaidInk;

Here we prepare the notebook to visualize with JavaScript:

In [2]:
#% javascript
require.config({
     paths: {
     d3: 'https://d3js.org/d3.v7.min'
}});

require(['d3'], function(d3) {
     console.log(d3);
});

Here we verify that the JavaScript setup works by making a plot:

In [3]:
#% js
js-d3-list-line-plot(10.rand xx 40, background => 'none', stroke-width => 2)

Here we set a collection of visualization variables:

In [36]:
my $title-color = 'Ivory';
my $stroke-color = 'SlateGray';
my $tooltip-color = 'LightBlue';
my $tooltip-background-color = 'none';
my $background = '1F1F1F';
my $color-scheme = 'schemeTableau10';
my $mmd-theme = q:to/END/;
%%{
  init: {
    'theme': 'forest',
    'themeVariables': {
      'lineColor': 'Ivory'
    }
  }
}%%
END
my %force = charge => {strength => -30, iterations => 4}, collision => {radius => 50, iterations => 4}, link => {distance => 30};

{charge => {iterations => 4, strength => -30}, collision => {iterations => 4, radius => 50}, link => {distance => 30}}

-------

## Country data

Here is a list of the countries the package "Data::Geographics" has data for:

In [5]:
#% html
country-data().keys.sort ==> to-html(:multicolumn, columns => 3)

0,1,2
Botswana,Hungary,Serbia
Brazil,Iran,Slovakia
Bulgaria,Iraq,SouthAfrica
Canada,Japan,SouthKorea
China,Mexico,Spain
CzechRepublic,NorthKorea,Sweden
Denmark,Poland,Turkey
Finland,Romania,Ukraine
France,Russia,UnitedStates
Germany,SaudiArabia,(Any)


In [6]:
#% html
country-data.head.value.keys.sort.List
==> to-html(:multicolumn, columns => 5, align => 'left')

0,1,2,3,4,5
AMRadioStations,EconomicAid,GDPSectorFractions,MaleAdultPopulation,OilProduction,WaterArea
AdultPopulation,ElderlyPopulation,GiniIndex,MaleChildPopulation,OilReserves,WaterwayLength
AgriculturalProducts,ElectricalGridFrequency,GovernmentConsumption,MaleElderlyPopulation,PavedAirportLengths,(Any)
AgriculturalValueAdded,ElectricalGridPlugs,GovernmentDebt,MaleInfantMortalityFraction,PavedAirports,(Any)
Airports,ElectricalGridSockets,GovernmentExpenditures,MaleLifeExpectancy,PavedRoadLength,(Any)
AlternateNames,ElectricalGridVoltages,GovernmentReceipts,MaleLiteracyFraction,PhoneLines,(Any)
AlternateStandardNames,ElectricityConsumption,GovernmentSurplus,MaleMedianAge,Pipelines,(Any)
AnnualBirths,ElectricityExports,GrossInvestment,MalePopulation,Population,(Any)
AnnualDeaths,ElectricityImports,HIVAIDSDeathRateFraction,ManufacturingValueAdded,PopulationGrowth,(Any)
AnnualHIVAIDSDeaths,ElectricityProduction,HIVAIDSFraction,MaritimeClaims,PovertyFraction,(Any)


In [7]:
#% html
my @field-names = <Name FullNativeName ISOName Area Population GDP ElectricityProduction>;
my @dsCountries = country-data().map({ @field-names Z=> $_.value{|@field-names} })>>.Hash;
@dsCountries = @dsCountries.map({ $_.deepmap({ $_ ~~ Numeric:D ?? $_.round !! $_ }) });
@dsCountries.sort(*<Name>) ==> to-html(:@field-names, align => Whatever)

Name,FullNativeName,ISOName,Area,Population,GDP,ElectricityProduction
Botswana,Republic of Botswana,BOTSWANA,581730,2700000,15781732826,2143597000
Brazil,República Federativa do Brasil,BRAZIL,8514877,216400000,1444733258972,614724000000
Bulgaria,Republika Bŭlgariya,BULGARIA,110879,6700000,69105101090,47550809000
Canada,Canada,CANADA,9984670,38800000,1644037286481,643035676000
China,Zhonghua Renmin Gongheguo,CHINA,9596960,1425700000,14722730697890,7190458000000
Czech Republic,Česká republika,CZECH REPUBLIC,78867,10500000,245349489988,84907272000
Denmark,Kongeriget Danmark,DENMARK,43094,5900000,356084867686,33042851000
Finland,Suomen Tasavalta,FINLAND,338145,5500000,269751312854,71711000000
France,République française,FRANCE,551500,64800000,2630317731455,529100000000
Germany,Bundesrepublik Deutschland,GERMANY,357022,83300000,3846413928654,596195000000


In [8]:
#% js
js-d3-list-plot(
    @dsCountries.map({ %( tooltip => $_<Name>, |(<x y> Z=> $_<GDP ElectricityProduction>) ) }), 
    width => 500,
    height => 500,
    title => "GDP vs Electricity production", 
    x-label => 'lg(GDP)', y-label => 'lg(Electricity production)',
    x-axis-scale => 'log', y-axis-scale => 'log',
    point-size => 8,
    stroke-color => 'Orange',
    :$title-color, 
    :$tooltip-color, 
    :$tooltip-background-color,
    margins => { left => 50},
    :$background, 
    :grid-lines
)

In [9]:
#% js
js-d3-list-line-plot(
    country-data(){'UnitedStates'}<Coordinates>.head.map({ $_.reverse.Array }), 
    title => 'USA contour',
    :$title-color,
    stroke-color => 'Orange',
    stroke-width => 2,
    width => 1000, 
    height => 500,
    :$background, 
    :grid-lines
)

----- 

## City data

In [10]:
my @field-names = <ID Country State City Population Latitude Longitude Elevation LocationLink>;
my @dsCityData = |city-data().grep({ $_<Country> eq 'United States' });

@dsCityData.&dimensions

(32796 9)

In [11]:
#% html
@dsCityData.pick(12) 
==> to-html(:@field-names) 
==> { $_.subst(:g, / <?after '<td>'> ('http' .*?) <before '</td>'> /, { "<a href=\"$0\">link</a>" }) }()

ID,Country,State,City,Population,Latitude,Longitude,Elevation,LocationLink
United_States.Kansas.Park,United States,Kansas,Park,112,39.1118036,-100.3614503,838.0,link
United_States.Missouri.Elsberry,United States,Missouri,Elsberry,1937,39.1683356,-90.7879089,138.0,link
United_States.New_York.Black_Brook,United States,New York,Black Brook,1453,44.5244263,-73.8175127,385.0,link
United_States.Kansas.Bazine,United States,Kansas,Bazine,282,38.4460629,-99.692748,649.0,link
United_States.California.Twentynine_Palms,United States,California,Twentynine Palms,28065,34.1483128,-116.0655262,607.0,link
United_States.California.Fulton,United States,California,Fulton,551,38.4937089,-122.773417,,link
United_States.Wisconsin.Clearfield,United States,Wisconsin,Clearfield,702,43.9376598,-90.1320822,277.0,link
United_States.Ohio.Lafayette,United States,Ohio,Lafayette,206,39.9407874,-83.405527,,link
United_States.Mississippi.Sledge,United States,Mississippi,Sledge,368,34.4326904,-90.2212994,51.0,link
United_States.Wisconsin.Nekoosa,United States,Wisconsin,Nekoosa,2449,44.3131774,-89.9078906,290.0,link


-----

## State distributions for USA

In [12]:
'use @dsCityData; 
 group by "State"; 
 summarize "Population"'
==> ToDataQueryWorkflowCode(target => 'Raku::Reshapers', format => 'code')
==> cbcopy

$obj = @dsCityData ;
$obj = group-by($obj, "State") ;
$obj = $obj.map({ $_.key => summarize-at($_.value, ("Population"), (&elems, &min, &max)) })

In [13]:
my $obj = @dsCityData ;
$obj = group-by($obj, "State") ;
$obj = $obj.map({ $_.key => summarize-at($_.value, ("Population"), (&elems, &sum)) }).Array;

.say for $obj.head(3)

Louisiana => {Population.elems => 472, Population.sum => 3020074}
California => {Population.elems => 1539, Population.sum => 37701022}
Minnesota => {Population.elems => 971, Population.sum => 4804255}
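
The grouping and summarization above rely on `group-by` and `summarize-at` from "Data::Reshapers" and "Data::Summarizers". As a rough illustration of what that step computes, here is a minimal sketch in core Raku only. The records are made-up toy values and the key names `elems` and `sum` are chosen for this example. This is not the packages' implementation.

```raku
# Toy records (made-up values, for illustration only)
my @records =
    %( State => 'A', City => 'a1', Population => 10 ),
    %( State => 'A', City => 'a2', Population => 30 ),
    %( State => 'B', City => 'b1', Population => 5  );

# Group the records by state: each key maps to the array of its records
my %groups = @records.classify({ $_<State> });

# Summarize each group: number of records and total population
my %summary = %groups.map({
    $_.key => %( elems => $_.value.elems, sum => $_.value.map({ $_<Population> }).sum )
});

say %summary<A>;   # {elems => 2, sum => 40}
say %summary<B>;   # {elems => 1, sum => 5}
```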


In [14]:
my @dsStateData = $obj.map({ <State NumberOfCities Population> Z=> [$_.key, |$_.value<Population.elems Population.sum>] })>>.Hash;

deduce-type(@dsStateData)

Vector(Struct([NumberOfCities, Population, State], [Int, Int, Str]), 51)

In [15]:
#% html
@dsStateData = @dsStateData.sort({ - $_<Population> });

@dsStateData.head(12) ==> to-html()

NumberOfCities,Population,State
1539,37701022,California
1756,22456375,Texas
1854,20821585,New York
1000,16813996,Florida
1370,11328893,Illinois
641,8531352,New Jersey
1222,8442287,Ohio
1805,7897347,Pennsylvania
448,6791993,Massachusetts
458,6668804,Arizona


In [16]:
#% js
<linear log>.map({
js-d3-list-plot(@dsStateData.map(*<NumberOfCities Population>), 
    x-label => 'Number of cities', 
    y-label => 'population', 
    x-axis-scale => 'linear', y-axis-scale => $_,
    title => 'USA state cities and populations',
    :$background,
    :$title-color,
    :grid-lines,
    margins => {left => 80, right => 60}
) }).join("\n")

In [17]:
#% js
my @data = @dsStateData.clone.sort(*<NumberOfCities>).reverse.map({ 
    $_<variable> = $_<State>; 
    $_<value> = $_<NumberOfCities>; 
    $_<label> = $_<NumberOfCities>; 
    $_ 
});

@data = select-columns(@data, <value variable label>);
js-d3-bar-chart(@data,
        :horizontal,
        color => 'DarkSlateGray',
        height => 800,
        width => 800,
        plot-labels-font-size => 12,
        margins => %(left => 120, bottom => 40),
        title => 'Number of cities per state',
        :$background,
        :$title-color,    
        grid-lines => (12, Whatever),
    )

In [18]:
@dsStateData = @dsStateData.sort(*<NumberOfCities>).reverse;
my @paretoCities = @dsStateData.map({ "{$_<State>} : {$_<NumberOfCities>}" }) Z=> pareto-principle-statistic(@dsStateData.map(*<NumberOfCities>));
my $k = 0;
@paretoCities = @paretoCities.map({ %( tooltip => $_.key, x => $k++, y => $_.value, group => 'Cities') });

@dsStateData = @dsStateData.sort(*<Population>).reverse;
my @paretoPopulations = @dsStateData.map({ "{$_<State>} : {$_<Population>}" }) Z=> pareto-principle-statistic(@dsStateData.map(*<Population>));
$k = 0;
@paretoPopulations = @paretoPopulations.map({ %( tooltip => $_.key, x => $k++, y => $_.value, group => 'Populations') });

deduce-type(@paretoPopulations)

Vector(Struct([group, tooltip, x, y], [Str, Str, Int, Rat]), 51)

In [19]:
#% js
js-d3-list-plot(
    [|@paretoCities, |@paretoPopulations], 
    title => 'Pareto principle', 
    :$title-color, 
    :grid-lines,
    :$background,
    :$color-scheme
)
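
The curves above are built from the values returned by `pareto-principle-statistic`. Here is a minimal sketch of that kind of statistic, assuming it amounts to the cumulative sums of the values sorted in descending order, divided by the total. The function name `pareto-fractions` is hypothetical and the input is a toy list; the package function may differ in details.

```raku
# Cumulative fractions of the total, taken over the values sorted in descending order
sub pareto-fractions(@values) {
    my @sorted = @values.sort.reverse;
    my $total  = @sorted.sum;
    # [\+] is the triangular reduction: it yields the running sums
    ([\+] @sorted).map({ $_ / $total }).List
}

say pareto-fractions([5, 1, 3, 1]);   # (0.5 0.8 0.9 1)
```

In this toy example the largest value alone accounts for half of the total, and the two largest values account for 80% of it.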

-----

## Cities Geo-locations

In [20]:
my @data = city-data().grep({ $_<Country> eq 'United States'});
@data = @data.grep({ -130 ≤ $_<Longitude> ≤ -60});
@data.&dimensions

(32282 9)

In [21]:
my @plot-data = @data.map({ %(y => $_<Latitude>, x => $_<Longitude>, tooltip => "$_<State> : $_<City>", group => 'data', ID => $_<ID>, Population => $_<Population>) });

@plot-data.&dimensions

(32282 6)

Here we prepare the data for finding the cities nearest to a given geo-location by making a mapping from city identifiers to latitude-longitude pairs:

In [22]:
my %locations = @data.map({ $_<ID> => $_<Latitude Longitude>});
say deduce-type(%locations);
%locations.pick(3)

Assoc(Vector(Atom((Str)), 31599), Tuple([Tuple([Atom((Int)), Atom((Rat))]) => 1, Vector(Atom((Rat)), 2) => 31598], 31599), 31599)


(United_States.South_Dakota.Irene => (43.0837047 -97.1575855) United_States.New_York.Chateaugay => (44.9265894 -74.0803981) United_States.Pennsylvania.Laceyville => (41.6458107 -76.1589704))

Here we make a _nearest_ function (a nearest neighbors finder) for the labeled geo-locations:

In [23]:
my &nf = nearest(%locations.pairs, distance-function => &geo-distance)

Math::Nearest::Finder(Algorithm::KDimensionalTree(points => 31599, distance-function => &geo-distance, labels => 31599))

Find the identifier for Las Vegas, Nevada:

In [24]:
my $id = @data.grep({ $_<ID> ~~ /Nevada .* Las .* Vegas/}).head<ID>;

United_States.Nevada.Las_Vegas

Alternatively, we can do Named Entity Recognition (NER) lookup using a function of the package "DSL::Entity::Geographics":

In [25]:
entity-city-and-state-name('Las Vegas, Nevada', 'Raku::System')

United_States.Nevada.Las_Vegas

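Here we find the cities near Las Vegas, Nevada. The specification `(Whatever, 100_000)` requests all neighbors within a radius of 100,000, in the units returned by `&geo-distance`: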

In [26]:
my @nns-labels = &nf(%locations{$id}, (Whatever, 100_000), prop => <label>).flat;
my @plot-nns-data = @plot-data.grep({ $_<ID> ∈ @nns-labels }).map({ my %h = $_.clone; %h<group> = 'nns'; %h });
@plot-nns-data.&dimensions

(24 6)

Here we plot the USA cities (towns, villages) with populations of at least 1,000 together with the neighbors of Las Vegas found above:

In [27]:
#% js

# Use the record of Las Vegas itself (found via $id) as the search point
my %search-point = @plot-nns-data.first({ $_<ID> eq $id });
%search-point<group> = 'search';

js-d3-list-plot([|@plot-data.grep({ $_<Population> ≥ 1_000}), |@plot-nns-data, %search-point], 
        point-size => 3, 
        :$background, 
        :$tooltip-background-color,
        :$color-scheme, 
        width => 1200, 
        height => 600)

------

## Nearest neighbor graphs

In this section we demonstrate the making of nearest neighbor graphs for Geo-locations.

In [28]:
my %locations-ne = @data.grep({ $_<State> eq 'Nevada' && $_<Population> ≥ 30_000 }).map({ $_<ID> => $_<Latitude Longitude>});
%locations-ne .= map({ $_.key.subst('United_States.Nevada.', '', :g) => $_.value });
say deduce-type(%locations-ne);
%locations-ne.pick(3)

Assoc(Atom((Str)), Vector(Atom((Rat)), 2), 14)


(Reno => (39.4744867 -119.7765384) Henderson => (36.0122334 -115.0374619) Spring_Valley => (36.098721 -115.261921))

### Nearest neighbor graph

Find the edges of the nearest neighbor graph:

In [34]:
my @edges = nearest-neighbor-graph(%locations-ne.pairs, distance-function => &geo-distance);

deduce-type(@edges)

Vector(Pair(Atom((Str)), Atom((Str))), 14)

In [44]:
#%js
@edges ==> js-d3-graph-plot(:$background, :$title-color, height => 800, :%force)

In [47]:
#% js
js-d3-random-mandala(5)

The nearest neighbor graph above was derived by using only one neighbor for each city: its closest neighbor.
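
To make the "closest neighbor" construction concrete, here is a minimal brute-force sketch in core Raku. The names `nn-edges` and `euclidean` are hypothetical, the points are made-up planar coordinates, and the planar Euclidean distance is only a stand-in for a geographic distance. The package function `nearest-neighbor-graph` may be implemented differently.

```raku
# Planar Euclidean distance; a stand-in for a geographic distance function
sub euclidean(@a, @b) { sqrt( (@a Z- @b).map({ $_ ** 2 }).sum ) }

# For each labeled point, make an edge to its single closest other point (brute force)
sub nn-edges(%points, &dist) {
    %points.keys.sort.map(-> $from {
        my $to = %points.keys.grep({ $_ ne $from }).min({ dist(%points{$from}, %points{$_}) });
        $from => $to
    }).List
}

# Toy points (made-up coordinates, for illustration only)
my %points = A => (0, 0), B => (1, 0), C => (5, 0);

say nn-edges(%points, &euclidean);   # (A => B B => A C => B)
```

Each point contributes exactly one edge, which is why the number of edges equals the number of points.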

### Nearest neighbors via proximity disk

Alternatively, we can make a graph based on the neighbors within a certain radius. 
Here we make such a graph using, for each city, all neighbors within an 8-mile radius.
Note that we have to define a variant of `&geo-distance` in order to use miles as units:

In [42]:
#%js
nearest-neighbor-graph(%locations-ne.pairs, (Whatever, 8), distance-function => { &geo-distance($^a, $^b, units => 'miles') })
==> js-d3-graph-plot(:$background, :$title-color, height => 800, :%force)

Let us "verify" the graph by making the contingency table of distances. Here we compute the long form of the distances dataset, and then we cross-tabulate that dataset:

In [31]:
my @tbl = (%locations-ne X %locations-ne).map({ %( from => $_.head.key, to => $_.tail.key, distance => &geo-distance($_.head.value, $_.tail.value, units => 'miles').round(0.1) ) });
my @ct = cross-tabulate(@tbl, 'from', 'to', 'distance').sort(*.key);

deduce-type(@ct)

Vector(Pair(Atom((Str)), Assoc(Atom((Str)), Atom((Rat)), 14)), 14)
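
As an illustration of the long-form to contingency-table step, here is a minimal sketch in core Raku. It assumes at most one record per `from`-`to` pair, so the cell values are simply assigned. The three distances (in miles) are copied from the table shown further below; the package function `cross-tabulate` may aggregate differently.

```raku
# Long-form records; the distances (in miles) are copied from the table in this notebook
my @long =
    %( from => 'Carson', to => 'Reno',   distance => 22.3 ),
    %( from => 'Carson', to => 'Sparks', distance => 28.8 ),
    %( from => 'Reno',   to => 'Sparks', distance => 7.5  );

# Cross-tabulate: rows keyed by 'from', columns keyed by 'to', cells hold 'distance'
my %ct;
for @long -> %r {
    %ct{%r<from>}{%r<to>} = %r<distance>;
    %ct{%r<to>}{%r<from>} = %r<distance>;   # distances are symmetric
}

say %ct<Reno><Sparks>;     # 7.5
say %ct<Sparks><Carson>;   # 28.8
```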

Here is the contingency table in HTML:

In [32]:
#% html
@ct.map({ ['from' => $_.key , |$_.value].Hash }) ==> to-html(field-names => ['from', |@ct>>.key])


from,Carson,Enterprise,Henderson,Las_Vegas,North_Las_Vegas,Pahrump,Paradise,Reno,Sparks,Spring_Valley,Summerlin_South,Sunrise_Manor,Whitney,Winchester
Carson,0.0,329.5,337.2,317.8,322.8,286.5,329.9,22.3,28.8,323.8,319.7,329.2,333.2,327.6
Enterprise,329.5,0.0,10.0,14.7,19.6,47.5,6.2,345.4,347.2,6.1,9.8,14.5,11.4,9.5
Henderson,337.2,10.0,0.0,19.6,18.9,57.1,7.3,352.8,354.4,13.9,18.2,11.5,6.1,10.0
Las_Vegas,317.8,14.7,19.6,0.0,10.5,42.1,12.4,333.4,334.9,8.9,8.1,12.5,15.4,9.8
North_Las_Vegas,322.8,19.6,18.9,10.5,0.0,51.9,14.2,337.9,339.3,16.0,17.5,7.6,12.9,10.4
Pahrump,286.5,47.5,57.1,42.1,51.9,0.0,50.5,303.5,305.7,43.4,39.2,54.3,55.6,50.2
Paradise,329.9,6.2,7.3,12.4,14.2,50.5,0.0,345.5,347.2,7.1,11.3,8.4,5.7,3.9
Reno,22.3,345.4,352.8,333.4,337.9,303.5,345.5,0.0,7.5,339.7,335.6,344.4,348.6,343.1
Sparks,28.8,347.2,354.4,334.9,339.3,305.7,347.2,7.5,0.0,341.4,337.4,345.9,350.1,344.7
Spring_Valley,323.8,6.1,13.9,8.9,16.0,43.4,7.1,339.7,341.4,0.0,4.3,13.1,12.5,7.8
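
As an independent cross-check of the table, here is a minimal sketch of a great-circle distance computation with the haversine formula. It assumes a spherical Earth with mean radius 6,371,000 m, and the function name `haversine-distance` is hypothetical. It is not necessarily how `&geo-distance` is implemented. The two coordinate pairs are copied from the output of `%locations-ne.pick(3)` above.

```raku
# Great-circle (haversine) distance between two (latitude, longitude) pairs given in degrees.
# Assumption: spherical Earth with mean radius 6,371,000 m; the result is in meters.
sub haversine-distance(@a, @b, Numeric :$radius = 6_371_000) {
    my ($lat1, $lon1) = @a.map({ $_ * pi / 180 });
    my ($lat2, $lon2) = @b.map({ $_ * pi / 180 });
    my $h = sin(($lat2 - $lat1) / 2) ** 2
          + cos($lat1) * cos($lat2) * sin(($lon2 - $lon1) / 2) ** 2;
    2 * $radius * asin(sqrt($h))
}

# Coordinates of two Nevada cities, copied from the notebook output
my @henderson     = 36.0122334, -115.0374619;
my @spring-valley = 36.098721,  -115.261921;

# Convert meters to miles (1 mile = 1609.344 m)
my $miles = haversine-distance(@henderson, @spring-valley) / 1609.344;
say $miles.round(0.1);
```

Under these assumptions the result should land close to the 13.9 miles listed for Henderson and Spring_Valley in the table.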


-------

## Future plans

A variety of improvements can be made to the presented packages. Some features require fundamental revisiting of the package implementations. The sub-sections below correspond to the packages considered. Within each sub-section, the features listed closer to the top are of higher priority.

### Data::Geographics

- Figure out how to archive the data
   - One of the reasons for currently having a small set of countries is that the data files are ≈ 30 MB.
- More countries
   - Both country data and city data
- Administrative divisions data

### DSL::Entity::Geographics

- Multi-language recognition of city names
    - Multi-language recognition for continents, countries, and related adjectives is already in place.
- "Drop-in" ingestion (and utilization) of OpenStreetMap geographic entity names files
    - Related to the previous item, since OpenStreetMap has multi-language names of cities.

### Data::Reshapers

- Moving association (hash-map) keys into dataset records and vice versa
- Association key-flatten and de-flatten
  - Similar to:
    - [`AssociationKeyFlatten`](https://resources.wolframcloud.com/FunctionRepository/resources/AssociationKeyFlatten/) 
    - [`AssociationKeyDeflatten`](https://resources.wolframcloud.com/FunctionRepository/resources/AssociationKeyDeflatten/) 


### Math::Nearest

- Working with word collections
- Working with collections of arbitrary objects of the same type
- Octree algorithm

### JavaScript::D3

- Point size per group in multi-group scatter plots
- Point or line per group in scatter plots
    - For example `plot-style => { A => 'joined', B => 'point'}`
- Callouts for points
- Pairwise-plot
- Geo-plots with maps
- Combining the JavaScript code of different plots into one plot
    - The combinations should produce superimposed plots. 
    - The simple combining into one div is already done (e.g. `js-d3-random-mandala`).
- Spatial 2D histograms:
    - Hexagon-based
    - Rectangle-based
- Mosaic plots
- [Choropleth maps](https://en.wikipedia.org/wiki/Choropleth_map)
- Make list-log plot functions
    - `list-log-plot`, `list-log-linear-plot`, `list-log-log-plot`
    - `date-list-log-plot`
    - Currently logarithmic plots are specified with the arguments "x-axis-scale" and "y-axis-scale".
- Animation
    - For a single plot (growing bars, creeping time series, etc.)
    - Combination of plots into an animation.

--------

## References

### Articles, blog posts

[AA1] Anton Antonov, ["Age at creation for programming languages stats"](https://rakuforprediction.wordpress.com/2024/05/25/age-at-creation-for-programming-languages-stats/), (2024), [RakuForPrediction](https://rakuforprediction.wordpress.com).

### Packages


[AAp1] Anton Antonov, [Data::Geographics Raku package](https://github.com/antononcube/Raku-Data-Geographics), (2024), [GitHub/antononcube](https://github.com/antononcube).

[AAp2] Anton Antonov, [Data::Reshapers Raku package](https://github.com/antononcube/Raku-Data-Reshapers), (2021-2024), [GitHub/antononcube](https://github.com/antononcube).

[AAp3] Anton Antonov, [Data::Summarizers Raku package](https://github.com/antononcube/Raku-Data-Summarizers), (2021-2023), [GitHub/antononcube](https://github.com/antononcube).

[AAp4] Anton Antonov, [Data::Translators Raku package](https://github.com/antononcube/Raku-Data-Translators), (2023-2024), [GitHub/antononcube](https://github.com/antononcube).

[AAp5] Anton Antonov, [Data::TypeSystem Raku package](https://github.com/antononcube/Raku-Data-TypeSystem), (2023-2024), [GitHub/antononcube](https://github.com/antononcube).

[AAp6] Anton Antonov, [DSL::Entity::Geographics Raku package](https://github.com/antononcube/Raku-DSL-Entity-Geographics), (2021-2024), [GitHub/antononcube](https://github.com/antononcube).

[AAp7] Anton Antonov, [Math::DistanceFunctions Raku package](https://github.com/antononcube/Raku-Math-DistanceFunctions), (2024), [GitHub/antononcube](https://github.com/antononcube).

[AAp8] Anton Antonov, [Math::Nearest Raku package](https://github.com/antononcube/Raku-Math-Nearest), (2024), [GitHub/antononcube](https://github.com/antononcube).

[AAp9] Anton Antonov, [JavaScript::D3 Raku package](https://github.com/antononcube/Raku-JavaScript-D3), (2022-2024), [GitHub/antononcube](https://github.com/antononcube).


### Videos

[AAv1] Anton Antonov, ["The Raku-ju hijack hack for D3.js"](https://www.youtube.com/watch?v=YIhx3FBWayo), (2022), [YouTube/@AAA4prediction](https://www.youtube.com/@AAA4prediction). (7 min.)

[AAv2] Anton Antonov, ["Random mandalas generation (with D3.js via Raku)"](https://www.youtube.com/watch?v=THNnofZEAn4), (2022), [YouTube/@AAA4prediction](https://www.youtube.com/@AAA4prediction). (2 min.)

[AAv3] Anton Antonov, ["Exploratory Data Analysis with Raku"](https://www.youtube.com/watch?v=YCnjMVSfT8w), (2024), [YouTube/@AAA4prediction](https://www.youtube.com/@AAA4prediction). (21 min.)