
## Working with Mapbox GL JS

Mapbox GL JS is a JavaScript library/API that allows you to make interactive maps that include shapes, markers, pop-up windows and many other built-in interactive capabilities. But, because it's written in JavaScript and runs in the browser, anyone with knowledge of JavaScript can extend its capabilities as far as that knowledge can take them.

What is most important to us is that "data" can be attached to these maps via **geojson format**--making the map, and the rest of the browser an interface for the reader to explore and engage with the output of your research.

For your final projects--the real goal is producing successful, thoughtful, meaningful **output** (that is, dataframes!) that can be explored through the map. More on the specific output you'll need in a moment: first, some basics about JavaScript and Mapbox.

### JavaScript
JavaScript is the programming language that was invented in order to make webpages interactive. JavaScript is an odd and quirky language--it began as a necessity for scripting events in web browsers, and now it has been extended in many directions beyond even the browser. There are many JavaScript tutorials out there -- https://www.w3schools.com/js/ is the most basic, and a decent place to start. With your knowledge of Python you could certainly learn JavaScript via tutorials, books like *JavaScript & JQuery: Interactive Front-End Web Development*, by Jon Duckett, sites/books like http://eloquentjavascript.net/, and, of course, by patrolling Stack Overflow.

A few basic things to know:

-All statements in JavaScript are supposed to end with a semicolon `;`
Not everyone follows this standard, but that's what you're supposed to do.

-All variables must be initialized/declared using one of the following keywords: `var`, `let`, or `const` (each has a different meaning)

-All functions, loops, if statements etc. are enclosed in brackets `{ }`
For example, here's a JavaScript loop:

```
for (let i=0; i < 10; i++) {
   console.log(i);
}
```

Here is a JavaScript loop through an array (a list, in Python terms):

```
const daysofWeek = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'];

for (let i=0; i < daysofWeek.length; i++) {
   console.log(daysofWeek[i]);
}
```


**JavaScript console** Notice the `console.log()`: that is the JavaScript version of `print()`. The runtime environment for JavaScript is the browser, and the console is the JavaScript console that is part of the browser's developer tools. Go to Chrome and select:

`View:Developer:JavaScript Console`

You can then cut-and-paste each of those loops into the console and run it. The console is very helpful for debugging JavaScript. When you have errors on the page, the console tells you where they are by line number (or tries to), and you can also log variables into the console to make sure everything is working in your script. **If you have any trouble loading your map, you should always check the JavaScript console.**

There's a lot more to know about JavaScript, if you want to learn it. Here are a few random things to know:

-Indentations are meaningless in JavaScript (but it's good to use them so your code can be read clearly by a human)

-JavaScript is a messy language, it tries not to be type-specific: so it will automatically convert numerical variables into strings and back--unless it doesn't.

-JavaScript tries not to break--if one part of the script breaks it tries to keep the page going, so sometimes it's hard to debug. 

-JavaScript cares about the DOM -- it reads the page for elements and allows you to change their contents, styles, and lots of other things. For really robust browser-page effects, you can use the jQuery library--it's like short-hand, superpowered JavaScript.

-As I'm sure you all know by now, "lists" in Python are "arrays" in JavaScript, "dictionaries" in Python are "objects" in JavaScript.

-Finally: **you do not need to learn JavaScript to complete this project.** I have built templates that will allow you to build a geojson file that will plug into the mapbox GL page with only a little bit of work and custom changes.
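Because Python dictionaries and lists line up so closely with JavaScript objects and arrays, Python's built-in `json` module can write text that JavaScript reads directly. Here is a minimal sketch of that round trip; the `place` dictionary is made up for illustration and is not part of the templates.

```python
import json

# A Python dictionary holding a list (hypothetical sample data)
place = {"name": "My House", "tags": ["home", "best"], "radius": 7}

# json.dumps turns it into text that is also a valid JavaScript object literal
as_text = json.dumps(place)
print(as_text)

# json.loads reverses the trip: text back into dictionaries and lists
round_trip = json.loads(as_text)
print(round_trip == place)
```

This is the same mechanism that will later turn your dataframes into a geojson document.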



## MAPS ON SCREEN

### SCREEN SPACE
This is the space in the browser window, and it is measured on X/Y axes. Every element placed on the screen has a location in these coordinates. The top left-hand corner of the browser is at (x:0,y:0). The further to the right you go, the larger 'x' is in pixels. The further down you go, the larger 'y' is. To get an idea of locations try this page:

https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_event_mouse_clientxy2

The main thing that a library like Mapbox GL https://docs.mapbox.com/mapbox-gl-js/examples/
does is translate longitudes and latitudes into screen space...
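To make that translation less mysterious, here is a rough Python sketch of the arithmetic for the classic Web Mercator projection. It rests on assumptions: the function name is hypothetical, the world is taken to be 512 pixels wide at zoom 0, and it ignores panning, rotation and the other projections Mapbox supports (including the 'naturalEarth' projection used in the templates). It is not Mapbox's actual code.

```python
import math

def lnglat_to_world_pixels(lng, lat, zoom, tile_size=512):
    """Project longitude/latitude to Web Mercator pixel coordinates.

    (0, 0) is the top-left corner of the world map; x grows to the right
    and y grows downward, just like screen space.
    """
    world_size = tile_size * 2 ** zoom      # width and height of the whole world in pixels
    x = (lng + 180.0) / 360.0 * world_size  # longitude maps linearly to x
    lat_rad = math.radians(lat)
    # latitude is stretched more and more toward the poles
    y = (1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * world_size
    return x, y

# longitude 0, latitude 0 lands in the middle of the world map
x, y = lnglat_to_world_pixels(0.0, 0.0, zoom=0)
print(x, y)
```

Each extra zoom level doubles the size of the world in pixels, which is why the same longitude and latitude end up at different screen positions as you zoom.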

### Templates
I am providing two templates for this project. I have created them so you can go very far with almost no customization at all. Both are available on courseworks:

`map_shapes_template.zip
&
map_points_template.zip`

These both contain two files:

`map.html
geo-data.js`

`map.html` contains all of the HTML, CSS, and JavaScript needed to display the map and make it interactive. How much you customize the styles, layout, etc. is completely optional. You may not even touch this file.

`geo-data.js` contains the geojson document that you will export from your data frames. 95% of the work is in building this.

### Building the map
In mapbox there is one main function that creates the map: It sets the position, zoom and the tiles. In all likelihood this is the only thing you will need to edit.

```

		var map = new mapboxgl.Map({
			container: 'map',
			style: 'mapbox://styles/mapbox/light-v11',
			center: [0, 0],
			zoom: 11,
			projection: 'naturalEarth'
		});

```

'new mapboxgl.Map({' is where the map first is constructed. 'var map' is the container (variable) that holds the map that you're constructing. The rest of the properties (keys) define different aspects of the map.

### Projections
Over the past year or so Mapbox has implemented different projections. These allow us to move away from the Mercator projection, which warps the size of countries unacceptably. Both of the templates default to the 'naturalEarth' projection, which distorts the relative sizes of countries far less. If your map is zoomed in, you might want to consider removing the projection property (above) and returning the map to Mercator, so that the area you're displaying is flatter. For more on projections:
https://docs.mapbox.com/mapbox-gl-js/guides/projections/

### Positioning your map
To make your mapbox project work, you only need to make some small changes on the HTML page (your main goal is generating the proper output for geojson--I'll get to that soon). First here are a few things that you can do to the HTML/Mapbox code.

Center your map and choose your zoom:
`
center: [-21.9270884, 64.1436456], 
zoom: 11
`

The properties `center` and `zoom` tell the browser what longitude and latitude you want the map to be centered on (`[-21.9270884, 64.1436456]`, longitude first) and what zoom level to use (`11`). At zoom 0 you see the whole world, around 12 you start zooming in on a city, and after 20 you start getting very, very close to the street. `map_points_template` uses the `.fitBounds()` method, which automatically sets your map's center and zoom--this can be super helpful or make things more complicated--see the template code for details.
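If you would rather work out a center yourself, a small Python sketch like the following finds the bounding box of your points (a southwest and a northeast corner, which is the kind of box `.fitBounds()` works from) and the middle of that box. The function name and sample points are hypothetical, and the sketch assumes your points do not straddle the 180-degree meridian.

```python
def bounds_and_center(points):
    """points is a list of [lng, lat] pairs; returns (bounds, center)."""
    lngs = [p[0] for p in points]
    lats = [p[1] for p in points]
    # bounds as [southwest corner, northeast corner], longitude first
    bounds = [[min(lngs), min(lats)], [max(lngs), max(lats)]]
    center = [
        (bounds[0][0] + bounds[1][0]) / 2,
        (bounds[0][1] + bounds[1][1]) / 2,
    ]
    return bounds, center

# made-up points around Reykjavik
points = [[-21.95, 64.13], [-21.90, 64.15], [-21.85, 64.14]]
bounds, center = bounds_and_center(points)
print(bounds)
print(center)
```

The resulting `center` pair can be pasted straight into the `center:` property of the map.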

There are tons of ways of using and extending Mapbox GL JS. Here are links to examples which might be helpful but probably a rabbit hole, and the actual API documentation which I suggest you don't read until next semester:

https://docs.mapbox.com/mapbox-gl-js/examples/

https://docs.mapbox.com/mapbox-gl-js/api/


## Tiles
What are tiles? Tiles are the background images that are displayed on an interactive screen map. If you have ever gone to [Google Maps](https://www.google.com/maps), you may have noticed that the world according to Google has a particular look and feel to it--very tan, green and blue. This is the default design for Google Maps' tiles. Notice how when you zoom in or move the mouse around, there is an empty gray space before the details of the map show up (if your internet connection is fast you might not see the blank tiles). These are tiles: different illustrations of maps that have been created for various levels of zoom, for every part of the earth. Your browser doesn't download all of them at once--that would be a huge download. Instead, these images of the earth are split up into small tiles and served dynamically to you depending on what you're looking at (what level of zoom, and what geographic location).

One of the advantages and limitations of Mapbox is that it serves tiles--so your maps can work just like Google Maps. The problem of course is that these tiles greatly influence the look and feel of your map. There are a handful of free tiles that Mapbox provides. (And if you are a designer, and have a lot of time on your hands, you can custom-make tiles in Mapbox--do not do this for this project!!!)

Choose your tiles:

`style: 'mapbox://styles/mapbox/light-v11'`

This line lets you access a free tile style from Mapbox. Other free styles are listed here:

https://docs.mapbox.com/api/maps/#styles

### Access token
Like many APIs, Mapbox requires that you register and have an access token. Please register and get your own access token here:

https://docs.mapbox.com/help/how-mapbox-works/access-tokens/

You will need to make a public token for this project. Please do so rather than using the one that's already in the code.

Once you've made your access token replace this line with your code:
```
<script type="text/javascript">
		mapboxgl.accessToken = 'your_code';
```

Those are the super basics--you can go a lot deeper on your own if you want to pursue this project beyond the next week.

### OUTPUT!

Finally, this is really what matters the next week. You are now scraping, cleaning, and aggregating your data. The question will be, **what do you want people to see?** Here are the main categories you need to focus on in order to get the output you need. All of these outputs will be constructed in Python, and exported to geojson.

## geojson

The **geojson format** is a standardized form of JSON (JavaScript Object Notation)--specifically set up to be read by mapping programs (not just Mapbox but all mapping programs). The main thing to understand is that each point or shape on a map is considered a feature. The features are held in an array (list) called `features`, which sits inside an object of type `FeatureCollection`. Each feature has two important properties (keys)--**geometry**, which contains the longitude and latitude as well as the type of shape--and **properties**, which attaches any additional data to that shape. Here's a simple example of a feature:
```
{
  "type": "Feature",
  "properties": {"party": "Democrat"},
  "geometry": {
    "type": "Polygon",
    "coordinates": [[
      [-109.05, 41.00],
      [-102.06, 40.99],
      [-102.03, 36.99],
      [-109.04, 36.99],
      [-109.05, 41.00]
    ]]
  }
}
```

For your project it is the properties that are 95% of the challenge--but you will need some geometry so that you have interactive shapes on your map.
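In Python, a feature is simply a dictionary, and the full document is a dictionary holding a list of features. The sketch below rebuilds the polygon feature above and wraps it in a `FeatureCollection`; the helper name `make_feature` is hypothetical and not part of any library.

```python
import json

def make_feature(geometry_type, coordinates, properties):
    """Assemble one geojson feature as a plain Python dictionary."""
    return {
        "type": "Feature",
        "properties": properties,
        "geometry": {"type": geometry_type, "coordinates": coordinates},
    }

# the same rectangle as in the example above: one outer ring of [lng, lat] pairs
ring = [
    [-109.05, 41.00],
    [-102.06, 40.99],
    [-102.03, 36.99],
    [-109.04, 36.99],
    [-109.05, 41.00],
]
shape = make_feature("Polygon", [ring], {"party": "Democrat"})

# every feature lives in the "features" list of a FeatureCollection
collection = {"type": "FeatureCollection", "features": [shape]}

# a polygon ring has to end on the point where it started
print(ring[0] == ring[-1])
print(json.dumps(collection)[:60])
```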

## Geometry
This is critical to building your GEOJSON object--**geometry** is the property that uses longitude and latitude to plot points or draw shapes on the map. What kind of geometry will you need? There are two aspects of geometry you need to decide on--first what geographical level are you studying (Country, State, City, Address), and second, what kind of shape do you need?

The two templates for this project do imply that you have to make a choice between two main categories of shapes: **polygons** or **points**. Polygons are shapes like the shape of a state or a country. Points are specific locations (like an address) defined by a single set of longitude and latitude. (Usually these different categories of shapes imply different levels of knowledge. You can try to combine them for the project, and the templates may or may not behave...) Most of you will need polygons and multi polygons for Country/State level projects. Some of you will need points (which are simply a latitude and longitude).

Here is a Mapbox example of what shapes are:

https://docs.mapbox.com/mapbox-gl-js/example/geojson-polygon/

But rather than reading a boring tutorial, let's just make some shapes! In class today we are going to build a very quick point map, and make it work in the points template.

Go to this site, and make some points:

http://geojson.io/

This site allows you to dynamically construct a geojson document. Zoom into whatever you like and start making points. As you might begin to understand as you build your random point map, the real trick with a program like mapbox is that it translates longitude and latitude coordinates to the X and Y grid of the screen space.

Once you have constructed your document's geometry, you can now add the properties from our template:

`{"article": "<p>text</p>", "radius":7, "color": "#FFFF00", "group_id": 1, "group_name": " ", "headline":"", "name": " "}`

While geometry is critical, at this point you only need to know which kind of geometry you'll need--the last step of your project should be locating shapefiles/lng-lats and merging them with your output data that will all be in the "properties" object/dictionary of your final geojson document. (Today's later tutorials begin to take you through those exact steps.)


## Properties 
This is what you should be focusing on building. As you will see, there are only about four or five dictionary keys that you need to build. But you need to build them well.

### Informational properties
**name:** State/Country/City/Court district--the main unit of study/geometry

**headline:** a simple short summary of the information attached to the layer (State/City/Point)--like a headline. This will appear in the pop-up window when you roll over a layer.

**article:** text displayed in the browser, outside of the map--this can be an entire article in HTML. This text will be displayed when you click on a layer.

### Visual grouping properties
**color:** in a hexadecimal ("#660066") or RGB ("rgb(120,0,120)") string -- for more on defining colors, check this out: https://color.adobe.com/create/color-wheel/
Think about how many colors you need, and what kind of colors would be the most representative, appropriate, effective.

**rating:** This is an alternate to using 'color:' You can specify a numerical range and specify numbers between that range, and the mapbox template will automatically generate a proper color based on the range (Soma made this!). If you want to use this you may need to edit some of the JavaScript here:

`
paint: {
`

And go to Soma's explanation of how this works:

https://gist.github.com/jsoma/c91cfa7a1f4f8346d95ac2a907f0cb0c



**radius**: This only works with points geometry. You can set the radius of each point in pixels.

**group_id:** and **group_name:** different groupings of data to be displayed separately as multiple layers on the map--this will allow you to display/study multiple aspects of the data.
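If you would rather stay with the plain **color** property, the idea behind **rating** (a number within a range becomes a color) can also be approximated in Python before you export. This is only a sketch under assumptions: the function names are hypothetical, the two end colors are arbitrary, and it is not how the template or Soma's code works (those generate the color inside the map's JavaScript).

```python
def hex_to_rgb(hex_color):
    """'#00CCE7' -> (0, 204, 231)"""
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))

def rating_to_color(rating, low, high, low_color="#00CCE7", high_color="#007281"):
    """Blend linearly between two colors according to where rating sits in [low, high]."""
    position = (rating - low) / (high - low)
    position = max(0.0, min(1.0, position))  # ratings outside the range are clamped
    start = hex_to_rgb(low_color)
    end = hex_to_rgb(high_color)
    mixed = [round(s + (e - s) * position) for s, e in zip(start, end)]
    return "#{:02X}{:02X}{:02X}".format(*mixed)

print(rating_to_color(0, 0, 10))
print(rating_to_color(5, 0, 10))
print(rating_to_color(10, 0, 10))
```

Applied to a dataframe column, a function like this would produce a ready-made `color` string for every row.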

### geojson row:
Here's an example of one row in geojson format that will work with the template:

```
{
      "type": "Feature",
      "properties":{ "name": "My House", "group_name": "best", "group_id": 1, "headline": "home", "article": "<p>What I like about my house is ...,<\/p>", "color": "#660000", "radius": "7" },
      "geometry": {
        "type": "Point",
        "coordinates": [
          -73.96416664123535,
          40.78950978441437
        ]
      }
    }
```

You will have a number of these rows. You want to build them in pandas, and then export them as a json document--see below for how to do this.
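As a rough sketch of that export for a points project, the code below turns each dataframe row into one feature. The dataframe, its values, and the `lng`/`lat` column names are all made up for illustration; your own columns will differ.

```python
import json
import pandas as pd

# Hypothetical dataframe: one row per point, with made-up "lng" and "lat" columns
df_points = pd.DataFrame([
    {"name": "My House", "headline": "home", "article": "<p>text</p>",
     "color": "#660000", "radius": 7, "group_id": 1, "group_name": "best",
     "lng": -73.964, "lat": 40.789},
    {"name": "The Park", "headline": "trees", "article": "<p>text</p>",
     "color": "#006600", "radius": 9, "group_id": 2, "group_name": "green",
     "lng": -73.968, "lat": 40.785},
])

# Going through to_json converts numpy numbers into plain JSON-friendly values
records = json.loads(df_points.to_json(orient="records"))

features = []
for record in records:
    lng = record.pop("lng")  # whatever is left in record becomes "properties"
    lat = record.pop("lat")
    features.append({
        "type": "Feature",
        "properties": record,
        "geometry": {"type": "Point", "coordinates": [lng, lat]},  # longitude first
    })

collection = {"type": "FeatureCollection", "features": features}
print(json.dumps(collection, indent=2)[:200])
```

Note that `to_json` rounds floats to 10 decimal places by default, which is far more precision than a map needs.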

## Plugging in your geojson document

Normally, a map page would load the geojson document as a separate file. But setting this up entails running a server on your computer that you access through your browser.

So, the much easier way to do this is just to take the geojson document and paste the whole thing into the geo-data.js file, directly following the variable:

`infoData = `

So that your first line begins like this:

`infoData = {"type": "FeatureCollection", "features": `

... Continuing on with the entirety of the exported document.
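If you prefer not to paste by hand, Python can write the file for you. This is a minimal sketch assuming the template only needs the document to follow `infoData = `; the one-feature `collection` is a stand-in for your real export, and the file is written to a temporary folder that you would swap for your template folder.

```python
import json
import os
import tempfile

# Hypothetical one-feature collection standing in for your exported document
collection = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "My House", "headline": "home"},
        "geometry": {"type": "Point", "coordinates": [-73.964, 40.789]},
    }],
}

# The geojson text goes directly after the variable name
js_text = "infoData = " + json.dumps(collection) + ";\n"

# Written to a temporary folder here; point this at your template folder instead
out_path = os.path.join(tempfile.mkdtemp(), "geo-data.js")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(js_text)

print(js_text[:55])
```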





## The process: from shapefiles to dataframes to mapbox

### Shapefiles

Sometimes the biggest challenge is finding the right shapes for your project.

If you simply need country shapes, this is a possible resource:

https://geojson-maps.ash.ms/

If you need to get latitudes and longitudes, your best bet is the Google Maps API:

https://developers.google.com/maps/documentation/geocoding/start

Some other general US shapes are here:

https://www.census.gov/geo/maps-data/data/tiger-cart-boundary.html

If you are doing federal court districts, you are in luck, I found them for you! This is not the easiest search in the world, but eventually I came upon the shape files here:

https://hifld-geoplatform.opendata.arcgis.com/datasets/us-district-court-jurisdictions


### Mapshaper

This is an online tool for processing, formatting, and exporting shape files. 

http://mapshaper.org/

You drag and drop the shapefile that you downloaded--and, most importantly for the District Court, you want to downsize that so it's not too big. 

Then you export it as geojson...

### pandas > geojson > mapbox



### Prepping for Output

Back to the BBC data that we scraped. 

I am first going to use pandas to get it into a proper format to fit into the mapbox template that I've set up.


In [143]:
import numpy as np
import pandas as pd
df = pd.read_csv("backup_BBC2.csv")

In [144]:
df.head(20)

(1770, 6)

#### Grouping

Right now, the only useful group is **critics by country**. There's not much else to work with, no other locations. So as far as location mapping goes, that's what we're doing here. 

(If I were to make this a real project, I would add data to this initial data set. And I would most likely find directors by country because that gives me a lot more to work with and map.)



In [145]:
df.groupby('crit_cn')['critic'].nunique()

crit_cn
Argentina        2
Australia        4
Austria          2
Bangladesh       1
Belgium          1
Brazil           1
Canada           5
Chile            2
China            1
Colombia         4
Cuba             5
Egypt            1
France           5
Germany          5
Hong Kong        1
India            5
Indonesia        1
Israel           4
Italy            4
Japan            1
Kazakhstan       1
Lebanon          3
Mexico           2
Namibia          1
Philippines      1
Qatar            1
Senegal          1
Singapore        2
South Africa     1
South Korea      2
Switzerland      1
Taiwan           1
Turkey           2
UAE              3
UK              18
US              82
Name: critic, dtype: int64

In [146]:
#Narrowing down my data into a smaller frame

In [147]:
df1 = df.groupby('crit_cn')['movie'].value_counts().reset_index(name='count')
df1.head(60)

| | crit_cn | movie | count |
|---|---|---|---|
| 0 | Argentina | Spirited Away | 2 |
| 1 | Argentina | Adventureland | 1 |
| 2 | Argentina | Boyhood | 1 |
| 3 | Argentina | Elephant | 1 |
| 4 | Argentina | Extraordinary Stories | 1 |
| 5 | Argentina | In the Mood for Love | 1 |
| 6 | Argentina | Jersey Boys | 1 |
| 7 | Argentina | Mad Max: Fury Road | 1 |
| 8 | Argentina | Mia Madre | 1 |
| 9 | Argentina | Moulin Rouge! | 1 |


The cell above has the output I want,  but I need it to look nice and be combined in a single column.  This is how we build the **"article:"** field of our geojson doc. You need to combine columns of data into readable text.

In [148]:
#Moving the values from the two right columns into a new column
#One that human readers can understand
df1["string"] = df1["movie"] + ": " + df1["count"].astype(str) + np.where(df1["count"]>1, ' votes', ' vote')
df1[850:900]

| | crit_cn | movie | count | string |
|---|---|---|---|---|
| 850 | UK | Wild Tales | 1 | Wild Tales: 1 vote |
| 851 | UK | Yi Yi: A One and a Two | 1 | Yi Yi: A One and a Two: 1 vote |
| 852 | UK | You, The Living | 1 | You, The Living: 1 vote |
| 853 | UK | Zero Dark Thirty | 1 | Zero Dark Thirty: 1 vote |
| 854 | US | There Will Be Blood | 21 | There Will Be Blood: 21 votes |
| 855 | US | Mulholland Drive | 20 | Mulholland Drive: 20 votes |
| 856 | US | Eternal Sunshine of the Spotless Mind | 19 | Eternal Sunshine of the Spotless Mind: 19 votes |
| 857 | US | In the Mood for Love | 18 | In the Mood for Love: 18 votes |
| 858 | US | Inside Llewyn Davis | 17 | Inside Llewyn Davis: 17 votes |
| 859 | US | Boyhood | 16 | Boyhood: 16 votes |


This is nice, but I need to have only **one row per country**. I do another groupby, and combine everything together. And when I do that I throw some HTML in there.

Note this is the opposite of **unwind** (in a way) it is grouping, but it is carrying all of the text in the group into one cell, and formatting it as HTML!

In [149]:
#this is just two different ways to do this:
#one using % as the wild card
#and one using .format and {0} as the wild card

#output = df1.groupby('crit_cn')['string'].apply(lambda x: "<div id='movie'><h1><b>Top Movies</b></h1><P>%s</P></div>" % '</p><p> '.join(x)).reset_index(name='properties.article')
output = df1.groupby('crit_cn')['string'].apply(lambda x: "<div class='movie_list'><h1><b>Top Movies</b></h1><P>{0}</P></div>".format('</p><p> '.join(x))).reset_index(name='properties.article')
output


Unnamed: 0,crit_cn,properties.article
0,Argentina,<div class='movie_list'><h1><b>Top Movies</b><...
1,Australia,<div class='movie_list'><h1><b>Top Movies</b><...
2,Austria,<div class='movie_list'><h1><b>Top Movies</b><...
3,Bangladesh,<div class='movie_list'><h1><b>Top Movies</b><...
4,Belgium,<div class='movie_list'><h1><b>Top Movies</b><...
5,Brazil,<div class='movie_list'><h1><b>Top Movies</b><...
6,Canada,<div class='movie_list'><h1><b>Top Movies</b><...
7,Chile,<div class='movie_list'><h1><b>Top Movies</b><...
8,China,<div class='movie_list'><h1><b>Top Movies</b><...
9,Colombia,<div class='movie_list'><h1><b>Top Movies</b><...
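One thing the join above does not do is protect against characters that mean something in HTML (such as `&` or `<` inside a title). Here is a small sketch of the same group-and-join idea on a toy dataframe, with each line escaped first; the data and the helper name `to_article` are made up for illustration.

```python
import html
import pandas as pd

# Toy stand-in for df1: one row per movie, already formatted as readable text
toy = pd.DataFrame({
    "crit_cn": ["A", "A", "B"],
    "string": ["Tom & Jerry: 2 votes", "Up: 1 vote", "Her: 1 vote"],
})

def to_article(lines):
    """Wrap every line of one group in <p> tags and return a single HTML string."""
    paragraphs = "".join("<p>{0}</p>".format(html.escape(line)) for line in lines)
    return "<div class='movie_list'><h1><b>Top Movies</b></h1>{0}</div>".format(paragraphs)

# one row per country, with all of that country's lines carried into one cell
articles = toy.groupby("crit_cn")["string"].apply(to_article).reset_index(name="properties.article")
print(articles.loc[0, "properties.article"])
```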


In [None]:
#this is what the UK cell for the article looks like

In [150]:
output.iloc[34]['properties.article']

"<div class='movie_list'><h1><b>Top Movies</b></h1><P>12 Years a Slave: 4 votes</p><p> Boyhood: 4 votes</p><p> Eternal Sunshine of the Spotless Mind: 4 votes</p><p> In the Mood for Love: 4 votes</p><p> Mulholland Drive: 4 votes</p><p> Spirited Away: 4 votes</p><p> The Great Beauty: 4 votes</p><p> The Lives of Others: 4 votes</p><p> There Will Be Blood: 4 votes</p><p> A Separation: 3 votes</p><p> Caché: 3 votes</p><p> Lost in Translation: 3 votes</p><p> Tabu: 3 votes</p><p> Under the Skin: 3 votes</p><p> Birth: 2 votes</p><p> Borat: Cultural Learnings of America for Make Benefit Glorious Nation of Kazakhstan: 2 votes</p><p> Margaret: 2 votes</p><p> Moolaadé: 2 votes</p><p> Red Road: 2 votes</p><p> Synecdoche, New York: 2 votes</p><p> Talk to Her: 2 votes</p><p> The Act of Killing: 2 votes</p><p> The Dark Knight: 2 votes</p><p> The Royal Tenenbaums: 2 votes</p><p> The Tree of Life: 2 votes</p><p> Uncle Boonmee Who Can Recall His Past Lives: 2 votes</p><p> Weekend: 2 votes</p><p> Wendy an

Now we need our **headline**: I'm going to have it be the number of critics in the country. I make a mini dataframe that counts the number of critics, and I merge it with my output dataframe.

In [151]:
crits = df.groupby('crit_cn')['critic'].nunique().reset_index(name='properties.votes')
crits

| | crit_cn | properties.votes |
|---|---|---|
| 0 | Argentina | 2 |
| 1 | Australia | 4 |
| 2 | Austria | 2 |
| 3 | Bangladesh | 1 |
| 4 | Belgium | 1 |
| 5 | Brazil | 1 |
| 6 | Canada | 5 |
| 7 | Chile | 2 |
| 8 | China | 1 |
| 9 | Colombia | 4 |


In [152]:
output = output.merge(crits, how='left', on='crit_cn')

In [153]:
output


Unnamed: 0,crit_cn,properties.article,properties.votes
0,Argentina,<div class='movie_list'><h1><b>Top Movies</b><...,2
1,Australia,<div class='movie_list'><h1><b>Top Movies</b><...,4
2,Austria,<div class='movie_list'><h1><b>Top Movies</b><...,2
3,Bangladesh,<div class='movie_list'><h1><b>Top Movies</b><...,1
4,Belgium,<div class='movie_list'><h1><b>Top Movies</b><...,1
5,Brazil,<div class='movie_list'><h1><b>Top Movies</b><...,1
6,Canada,<div class='movie_list'><h1><b>Top Movies</b><...,5
7,Chile,<div class='movie_list'><h1><b>Top Movies</b><...,2
8,China,<div class='movie_list'><h1><b>Top Movies</b><...,1
9,Colombia,<div class='movie_list'><h1><b>Top Movies</b><...,4


In [154]:
#Turn that number into something readable,
# and keep the votes as a separate column for later
output['properties.headline'] = output['properties.votes'].astype(str) + np.where(output["properties.votes"]>1, ' critics', ' critic')

In [155]:
output

Unnamed: 0,crit_cn,properties.article,properties.votes,properties.headline
0,Argentina,<div class='movie_list'><h1><b>Top Movies</b><...,2,2 critics
1,Australia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics
2,Austria,<div class='movie_list'><h1><b>Top Movies</b><...,2,2 critics
3,Bangladesh,<div class='movie_list'><h1><b>Top Movies</b><...,1,1 critic
4,Belgium,<div class='movie_list'><h1><b>Top Movies</b><...,1,1 critic
5,Brazil,<div class='movie_list'><h1><b>Top Movies</b><...,1,1 critic
6,Canada,<div class='movie_list'><h1><b>Top Movies</b><...,5,5 critics
7,Chile,<div class='movie_list'><h1><b>Top Movies</b><...,2,2 critics
8,China,<div class='movie_list'><h1><b>Top Movies</b><...,1,1 critic
9,Colombia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics


Now let's **set colors** for our map. (There are ways to do this inside the map, but why not do it in Python just for fun.)

In [156]:
def assign_color(votes):
    if votes < 3:
        return "#00CCE7"
    if votes < 10:
        return "#00AEC5"
    if votes < 20:
        return "#0090A3"
    else:
        return "#007281"

In [157]:

output['properties.color'] = output['properties.votes'].apply(assign_color)
output

Unnamed: 0,crit_cn,properties.article,properties.votes,properties.headline,properties.color
0,Argentina,<div class='movie_list'><h1><b>Top Movies</b><...,2,2 critics,#00CCE7
1,Australia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics,#00AEC5
2,Austria,<div class='movie_list'><h1><b>Top Movies</b><...,2,2 critics,#00CCE7
3,Bangladesh,<div class='movie_list'><h1><b>Top Movies</b><...,1,1 critic,#00CCE7
4,Belgium,<div class='movie_list'><h1><b>Top Movies</b><...,1,1 critic,#00CCE7
5,Brazil,<div class='movie_list'><h1><b>Top Movies</b><...,1,1 critic,#00CCE7
6,Canada,<div class='movie_list'><h1><b>Top Movies</b><...,5,5 critics,#00AEC5
7,Chile,<div class='movie_list'><h1><b>Top Movies</b><...,2,2 critics,#00CCE7
8,China,<div class='movie_list'><h1><b>Top Movies</b><...,1,1 critic,#00CCE7
9,Colombia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics,#00AEC5
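The thresholds in `assign_color` can also be written as bins with `pd.cut`, which some people find easier to adjust. This is just an alternative sketch on a made-up series of vote counts, using the same cut-offs and colors as the function above.

```python
import numpy as np
import pandas as pd

# made-up vote counts, chosen to sit on either side of each threshold
votes = pd.Series([1, 2, 3, 9, 10, 19, 20, 82])

# right=False makes each bin include its left edge: [-inf, 3), [3, 10), [10, 20), [20, inf)
colors = pd.cut(
    votes,
    bins=[-np.inf, 3, 10, 20, np.inf],
    right=False,
    labels=["#00CCE7", "#00AEC5", "#0090A3", "#007281"],
).astype(str)

print(pd.DataFrame({"votes": votes, "color": colors}))
```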


Ok, that's some nice output! But we need to join (**merge**) this with a **GeoJson** file.

I will bring the GeoJson file into pandas. But there's just no way all the countries are going to match across both files. In anticipation of that, I'm outputting lists of countries, so I can compare them in Python.

In [158]:
#list of countries in our main df
cn_list = list(output['crit_cn'].unique())
cn_list.sort()
cn_list

['Argentina',
 'Australia',
 'Austria',
 'Bangladesh',
 'Belgium',
 'Brazil',
 'Canada',
 'Chile',
 'China',
 'Colombia',
 'Cuba',
 'Egypt',
 'France',
 'Germany',
 'Hong Kong',
 'India',
 'Indonesia',
 'Israel',
 'Italy',
 'Japan',
 'Kazakhstan',
 'Lebanon',
 'Mexico',
 'Namibia',
 'Philippines',
 'Qatar',
 'Senegal',
 'Singapore',
 'South Africa',
 'South Korea',
 'Switzerland',
 'Taiwan',
 'Turkey',
 'UAE',
 'UK',
 'US']

**Importing GeoJson**

First, we need to load it as JSON.

Then we normalize it in pandas so we get a flat data frame.

Pay very close attention to the column names. They are maintaining the hierarchy of the JSON data.

In [159]:
import json
from pandas import json_normalize
with open('countries50.geo.json') as json_data:
    geometry_data = json.load(json_data)

In [160]:
#json_normalize already returns a flat dataframe
df_geo = json_normalize(geometry_data['features'])


In [161]:
df_geo

Unnamed: 0,type,properties.featurecla,properties.scalerank,properties.labelrank,properties.sovereignt,properties.sov_a3,properties.adm0_dif,properties.level,properties.type,properties.tlc,...,properties.fclass_pl,properties.fclass_gr,properties.fclass_it,properties.fclass_nl,properties.fclass_se,properties.fclass_bd,properties.fclass_ua,properties.filename,geometry.type,geometry.coordinates
0,Feature,Admin-0 country,1,5,Costa Rica,CRI,0,2,Sovereign country,1,...,,,,,,,,CRI.geojson,Polygon,"[[[-82.56357421874999, 9.57666015625], [-82.56..."
1,Feature,Admin-0 country,1,5,Nicaragua,NIC,0,2,Sovereign country,1,...,,,,,,,,NIC.geojson,Polygon,"[[[-83.15751953124999, 14.993066406249994], [-..."
2,Feature,Admin-0 country,3,6,France,FR1,1,2,Dependency,1,...,,,,,,,,MAF.geojson,Polygon,"[[[-63.011181640625, 18.06894531249999], [-63...."
3,Feature,Admin-0 country,3,6,Netherlands,NL1,1,2,Country,1,...,,,,,,,,SXM.geojson,Polygon,"[[[-63.123046875, 18.06894531249999], [-63.011..."
4,Feature,Admin-0 country,1,5,Haiti,HTI,0,2,Sovereign country,1,...,,,,,,,,HTI.geojson,MultiPolygon,"[[[[-71.779248046875, 19.718164062499994], [-7..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
237,Feature,Admin-0 country,3,6,United Kingdom,GB1,1,2,Dependency,1,...,,,,,,,,SHN.geojson,MultiPolygon,"[[[[-5.692138671875, -15.997753906250011], [-5..."
238,Feature,Admin-0 country,1,5,Mauritius,MUS,0,2,Sovereign country,1,...,,,,,,,,MUS.geojson,Polygon,"[[[57.65126953125002, -20.48486328125], [57.52..."
239,Feature,Admin-0 country,3,5,United Kingdom,GB1,1,2,Disputed,1,...,,,,,,,,IOT.geojson,Polygon,"[[[72.49199218750002, -7.37744140625], [72.468..."
240,Feature,Admin-0 country,4,5,Maldives,MDV,0,2,Sovereign country,1,...,,,,,,,,MDV.geojson,MultiPolygon,"[[[[73.4166015625, 3.23125], [73.3953125000000..."
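To see where those dotted column names come from, here is a toy example with a single made-up feature. The nested keys are flattened into names like `properties.sovereignt` and `geometry.type`, which is the hierarchy you will need to rebuild when you export.

```python
import pandas as pd

# one hypothetical feature, shaped like the ones in the countries file
features = [{
    "type": "Feature",
    "properties": {"sovereignt": "Iceland"},
    "geometry": {"type": "Point", "coordinates": [-21.93, 64.14]},
}]

# nested dictionaries become columns named parent.child
flat = pd.json_normalize(features)
print(list(flat.columns))
print(flat.loc[0, "properties.sovereignt"])
```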


In [None]:
# OK now a list of all the countries in the geo file, this is close to every country.

In [162]:
geo_list = list(df_geo['properties.sovereignt'].unique())
geo_list.sort()
geo_list

['Afghanistan',
 'Albania',
 'Algeria',
 'Andorra',
 'Angola',
 'Antarctica',
 'Antigua and Barbuda',
 'Argentina',
 'Armenia',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Benin',
 'Bhutan',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Botswana',
 'Brazil',
 'Brunei',
 'Bulgaria',
 'Burkina Faso',
 'Burundi',
 'Cabo Verde',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Central African Republic',
 'Chad',
 'Chile',
 'China',
 'Colombia',
 'Comoros',
 'Costa Rica',
 'Croatia',
 'Cuba',
 'Cyprus',
 'Czechia',
 'Democratic Republic of the Congo',
 'Denmark',
 'Djibouti',
 'Dominica',
 'Dominican Republic',
 'East Timor',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Eritrea',
 'Estonia',
 'Ethiopia',
 'Federated States of Micronesia',
 'Fiji',
 'Finland',
 'France',
 'Gabon',
 'Gambia',
 'Georgia',
 'Germany',
 'Ghana',
 'Greece',
 'Grenada',
 'Guatemala',
 'Guinea',
 'Guinea-Bissau',
 'Guyana',
 'Haiti',
 'Hond

In [163]:
# get unmatching countries
for cn in cn_list:
    if cn not in geo_list:
        print(cn)

Hong Kong
UAE
UK
US
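The same check can be done inside pandas with a merge and its `indicator` option, which labels every row by whether it found a partner. A sketch on two tiny made-up dataframes (the real ones would be `output` and `df_geo`):

```python
import pandas as pd

# toy stand-ins for the two country columns
left = pd.DataFrame({"crit_cn": ["France", "UK", "US"]})
right = pd.DataFrame({"properties.sovereignt": ["France", "United Kingdom", "United States of America"]})

# indicator=True adds a "_merge" column: "both" for matches, "left_only" for misses
check = left.merge(
    right,
    how="left",
    left_on="crit_cn",
    right_on="properties.sovereignt",
    indicator=True,
)
unmatched = check.loc[check["_merge"] == "left_only", "crit_cn"].tolist()
print(unmatched)
```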


In [164]:
# look for those in the list
import re
for cn in geo_list:
    if re.match("United", cn, re.IGNORECASE):
        print(cn)
    elif re.match("Hong", cn, re.IGNORECASE):
        print(cn)
#unfortunately hong kong isn't in there

United Arab Emirates
United Kingdom
United Republic of Tanzania
United States of America


In [166]:
#rename those unmatching countries in the main dataframe
#(Series.replace with a dictionary matches whole values only,
#so 'US' can never change part of another country's name)
output['crit_cn'] = output['crit_cn'].replace({
    'US': 'United States of America',
    'UK': 'United Kingdom',
    'UAE': 'United Arab Emirates',
})

In [167]:
output

Unnamed: 0,crit_cn,properties.article,properties.votes,properties.headline,properties.color
0,Argentina,<div class='movie_list'><h1><b>Top Movies</b><...,2,2 critics,#00CCE7
1,Australia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics,#00AEC5
2,Austria,<div class='movie_list'><h1><b>Top Movies</b><...,2,2 critics,#00CCE7
3,Bangladesh,<div class='movie_list'><h1><b>Top Movies</b><...,1,1 critic,#00CCE7
4,Belgium,<div class='movie_list'><h1><b>Top Movies</b><...,1,1 critic,#00CCE7
5,Brazil,<div class='movie_list'><h1><b>Top Movies</b><...,1,1 critic,#00CCE7
6,Canada,<div class='movie_list'><h1><b>Top Movies</b><...,5,5 critics,#00AEC5
7,Chile,<div class='movie_list'><h1><b>Top Movies</b><...,2,2 critics,#00CCE7
8,China,<div class='movie_list'><h1><b>Top Movies</b><...,1,1 critic,#00CCE7
9,Colombia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics,#00AEC5


In [168]:
# I'm not OK with deleting Hong Kong, so instead I drew an artificial Hong Kong rectangle at
# https://geojson.io/
# and hand-pasted it into the GeoJSON file.
# Not perfect, but better than deleting.
# Import the new file:
import json
from pandas import json_normalize
with open('countries50.geo2.json') as json_data:
    geometry2_data = json.load(json_data)
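
For reference, a hand-made rectangle feature has roughly the shape sketched below. The bounding-box numbers are rough values I picked for illustration, not the rectangle actually pasted into `countries50.geo2.json`, and the `properties` dictionary is trimmed down to the one field used for joining.

```python
import json

# Hypothetical bounding box roughly around Hong Kong (longitude/latitude in degrees).
west, south, east, north = 113.8, 22.1, 114.5, 22.6

# A GeoJSON linear ring repeats its first position at the end to close the shape.
ring = [
    [west, south],
    [east, south],
    [east, north],
    [west, north],
    [west, south],
]

hong_kong = {
    "type": "Feature",
    "properties": {"sovereignt": "Hong Kong"},
    "geometry": {"type": "Polygon", "coordinates": [ring]},
}

# Put the new feature first in a (toy) FeatureCollection.
collection = {"type": "FeatureCollection", "features": []}
collection["features"].insert(0, hong_kong)

print(json.dumps(collection, indent=2))
```

GeoJSON positions are written longitude first, then latitude, which is easy to get backwards when typing coordinates by hand.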


In [169]:
# json_normalize already returns a DataFrame, so no extra conversion is needed
df_geo2 = json_normalize(geometry2_data['features'])


In [170]:
# there's Hong Kong in the first row
# all of its properties are wrong except sovereignt, but that's the only property we need!
df_geo2.head(10)

Unnamed: 0,type,properties.featurecla,properties.scalerank,properties.labelrank,properties.sovereignt,properties.sov_a3,properties.adm0_dif,properties.level,properties.type,properties.tlc,...,properties.fclass_pl,properties.fclass_gr,properties.fclass_it,properties.fclass_nl,properties.fclass_se,properties.fclass_bd,properties.fclass_ua,properties.filename,geometry.coordinates,geometry.type
0,Feature,Admin-0 country,1,5,Hong Kong,CRI,0,2,Sovereign country,1,...,,,,,,,,CRI.geojson,"[[[114.04369164382706, 22.179169859864487], [1...",Polygon
1,Feature,Admin-0 country,1,5,Costa Rica,CRI,0,2,Sovereign country,1,...,,,,,,,,CRI.geojson,"[[[-82.56357421874999, 9.57666015625], [-82.56...",Polygon
2,Feature,Admin-0 country,1,5,Nicaragua,NIC,0,2,Sovereign country,1,...,,,,,,,,NIC.geojson,"[[[-83.15751953124999, 14.993066406249994], [-...",Polygon
3,Feature,Admin-0 country,3,6,France,FR1,1,2,Dependency,1,...,,,,,,,,MAF.geojson,"[[[-63.011181640625, 18.06894531249999], [-63....",Polygon
4,Feature,Admin-0 country,3,6,Netherlands,NL1,1,2,Country,1,...,,,,,,,,SXM.geojson,"[[[-63.123046875, 18.06894531249999], [-63.011...",Polygon
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
238,Feature,Admin-0 country,3,6,United Kingdom,GB1,1,2,Dependency,1,...,,,,,,,,SHN.geojson,"[[[[-5.692138671875, -15.997753906250011], [-5...",MultiPolygon
239,Feature,Admin-0 country,1,5,Mauritius,MUS,0,2,Sovereign country,1,...,,,,,,,,MUS.geojson,"[[[57.65126953125002, -20.48486328125], [57.52...",Polygon
240,Feature,Admin-0 country,3,5,United Kingdom,GB1,1,2,Disputed,1,...,,,,,,,,IOT.geojson,"[[[72.49199218750002, -7.37744140625], [72.468...",Polygon
241,Feature,Admin-0 country,4,5,Maldives,MDV,0,2,Sovereign country,1,...,,,,,,,,MDV.geojson,"[[[[73.4166015625, 3.23125], [73.3953125000000...",MultiPolygon


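**Now we can merge** the two dataframes without key errors.

Note that this is a **left merge**, so it keeps every row of my `output` dataframe (a country that matches several GeoJSON features, such as Australia, is repeated once per feature) and just adds three columns from the GeoJSON data: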

`properties.sovereignt` because that's what I'm joining on

and `geometry.coordinates` and `geometry.type` because those are what the map needs.

In [171]:
df_final = pd.merge(
    output,
    df_geo2[["properties.sovereignt", "geometry.coordinates", "geometry.type"]],
    left_on='crit_cn',
    right_on='properties.sovereignt',
    how='left',
)

In [172]:
df_final

Unnamed: 0,crit_cn,properties.article,properties.votes,properties.headline,properties.color,properties.sovereignt,geometry.coordinates,geometry.type
0,Argentina,<div class='movie_list'><h1><b>Top Movies</b><...,2,2 critics,#00CCE7,Argentina,"[[[[-57.60888671875, -30.187792968750003], [-5...",MultiPolygon
1,Australia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics,#00AEC5,Australia,"[[[[105.72539062499999, -10.49296875], [105.69...",MultiPolygon
2,Australia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics,#00AEC5,Australia,"[[[[143.17890625, -11.954492187500009], [143.1...",MultiPolygon
3,Australia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics,#00AEC5,Australia,"[[[167.939453125, -29.017675781250006], [167.9...",Polygon
4,Australia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics,#00AEC5,Australia,"[[[123.59453124999999, -12.425683593750009], [...",Polygon
...,...,...,...,...,...,...,...,...
64,United States of America,<div class='movie_list'><h1><b>Top Movies</b><...,82,82 critics,#007281,United States of America,"[[[[-64.84501953124999, 18.330078125], [-64.91...",MultiPolygon
65,United States of America,<div class='movie_list'><h1><b>Top Movies</b><...,82,82 critics,#007281,United States of America,"[[[[-66.12939453125, 18.444921875], [-66.09848...",MultiPolygon
66,United States of America,<div class='movie_list'><h1><b>Top Movies</b><...,82,82 critics,#007281,United States of America,"[[[-170.72626953125, -14.351171875], [-170.769...",Polygon
67,United States of America,<div class='movie_list'><h1><b>Top Movies</b><...,82,82 critics,#007281,United States of America,"[[[144.74179687500003, 13.25927734375], [144.6...",Polygon
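
The repeated Australia and United States rows in this output come from the way a left merge works: each left row is paired with every matching right row. Here is a minimal sketch on toy data (the column names `votes` and `geom_type` and all values are made up). It also passes `indicator=True`, which adds a `_merge` column that makes unmatched keys easy to spot.

```python
import pandas as pd

# Toy stand-ins for the critics output and the GeoJSON dataframe.
critics = pd.DataFrame({"crit_cn": ["Australia", "Hong Kong"], "votes": [4, 1]})
shapes = pd.DataFrame({
    "sovereignt": ["Australia", "Australia", "Chile"],
    "geom_type": ["MultiPolygon", "Polygon", "Polygon"],
})

merged = pd.merge(
    critics,
    shapes,
    left_on="crit_cn",
    right_on="sovereignt",
    how="left",
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)
print(merged)

# Left rows whose key found no match on the right side.
missing = merged.loc[merged["_merge"] == "left_only", "crit_cn"].tolist()
print(missing)
```

Australia appears twice because it matches two shapes, Chile is dropped because it is only on the right, and Hong Kong is kept with empty geometry columns.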


**A custom GeoJson dataframe**

Well, almost. I want the country name to carry over properly into the GeoJSON format, so I rename the column to:

`properties.name`

In [173]:
df_final = df_final.rename(columns={'crit_cn': 'properties.name'})
df_final

Unnamed: 0,properties.name,properties.article,properties.votes,properties.headline,properties.color,properties.sovereignt,geometry.coordinates,geometry.type
0,Argentina,<div class='movie_list'><h1><b>Top Movies</b><...,2,2 critics,#00CCE7,Argentina,"[[[[-57.60888671875, -30.187792968750003], [-5...",MultiPolygon
1,Australia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics,#00AEC5,Australia,"[[[[105.72539062499999, -10.49296875], [105.69...",MultiPolygon
2,Australia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics,#00AEC5,Australia,"[[[[143.17890625, -11.954492187500009], [143.1...",MultiPolygon
3,Australia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics,#00AEC5,Australia,"[[[167.939453125, -29.017675781250006], [167.9...",Polygon
4,Australia,<div class='movie_list'><h1><b>Top Movies</b><...,4,4 critics,#00AEC5,Australia,"[[[123.59453124999999, -12.425683593750009], [...",Polygon
...,...,...,...,...,...,...,...,...
64,United States of America,<div class='movie_list'><h1><b>Top Movies</b><...,82,82 critics,#007281,United States of America,"[[[[-64.84501953124999, 18.330078125], [-64.91...",MultiPolygon
65,United States of America,<div class='movie_list'><h1><b>Top Movies</b><...,82,82 critics,#007281,United States of America,"[[[[-66.12939453125, 18.444921875], [-66.09848...",MultiPolygon
66,United States of America,<div class='movie_list'><h1><b>Top Movies</b><...,82,82 critics,#007281,United States of America,"[[[-170.72626953125, -14.351171875], [-170.769...",Polygon
67,United States of America,<div class='movie_list'><h1><b>Top Movies</b><...,82,82 critics,#007281,United States of America,"[[[144.74179687500003, 13.25927734375], [144.6...",Polygon


In [None]:
#turn it back into JSON (list of dictionaries)

In [174]:
ok_json = json.loads(df_final.to_json(orient='records'))

In [None]:
# note: this is not GeoJSON yet

In [175]:
ok_json

[{'properties.name': 'Argentina',
  'properties.article': "<div class='movie_list'><h1><b>Top Movies</b></h1><P>Spirited Away: 2 votes</p><p> Adventureland: 1 vote</p><p> Boyhood: 1 vote</p><p> Elephant: 1 vote</p><p> Extraordinary Stories: 1 vote</p><p> In the Mood for Love: 1 vote</p><p> Jersey Boys: 1 vote</p><p> Mad Max: Fury Road: 1 vote</p><p> Mia Madre: 1 vote</p><p> Moulin Rouge!: 1 vote</p><p> Mulholland Drive: 1 vote</p><p> Nine Queens: 1 vote</p><p> Open Range: 1 vote</p><p> Right Now, Wrong Then: 1 vote</p><p> The Social Network: 1 vote</p><p> The Son's Room: 1 vote</p><p> Toy Story 3: 1 vote</p><p> Uncle Boonmee Who Can Recall His Past Lives: 1 vote</p><p> WALL-E: 1 vote</P></div>",
  'properties.votes': 2,
  'properties.headline': '2 critics',
  'properties.color': '#00CCE7',
  'properties.sovereignt': 'Argentina',
  'geometry.coordinates': [[[[-57.6088867188, -30.1877929688],
     [-57.6457519531, -30.226953125],
     [-57.6508789062, -30.2950195313],
     [-57.712695312

In [None]:
#this function will convert the flat json into the geojson hierarchy

In [176]:
def process_to_geojson(records):
    """Convert flat records (keys like 'properties.name') into a GeoJSON FeatureCollection."""
    geo_data = {"type": "FeatureCollection", "features": []}
    for row in records:
        this_dict = {"type": "Feature", "properties": {}, "geometry": {}}
        for key, value in row.items():
            # 'geometry.type' -> ['geometry', 'type']
            key_names = key.split('.')
            if key_names[0] in ('geometry', 'properties'):
                this_dict[key_names[0]][key_names[1]] = value
        geo_data['features'].append(this_dict)
    return geo_data


In [177]:
geo_format = process_to_geojson(ok_json)

In [178]:
geo_format

{'type': 'FeatureCollection',
 'features': [{'type': 'Feature',
   'properties': {'name': 'Argentina',
    'article': "<div class='movie_list'><h1><b>Top Movies</b></h1><P>Spirited Away: 2 votes</p><p> Adventureland: 1 vote</p><p> Boyhood: 1 vote</p><p> Elephant: 1 vote</p><p> Extraordinary Stories: 1 vote</p><p> In the Mood for Love: 1 vote</p><p> Jersey Boys: 1 vote</p><p> Mad Max: Fury Road: 1 vote</p><p> Mia Madre: 1 vote</p><p> Moulin Rouge!: 1 vote</p><p> Mulholland Drive: 1 vote</p><p> Nine Queens: 1 vote</p><p> Open Range: 1 vote</p><p> Right Now, Wrong Then: 1 vote</p><p> The Social Network: 1 vote</p><p> The Son's Room: 1 vote</p><p> Toy Story 3: 1 vote</p><p> Uncle Boonmee Who Can Recall His Past Lives: 1 vote</p><p> WALL-E: 1 vote</P></div>",
    'votes': 2,
    'headline': '2 critics',
    'color': '#00CCE7',
    'sovereignt': 'Argentina'},
   'geometry': {'coordinates': [[[[-57.6088867188, -30.1877929688],
       [-57.6457519531, -30.226953125],
       [-57.6508789062, -3
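
Before saving, it can be worth running a quick sanity check on the nested structure. The helper below is a hypothetical sketch, not part of any library, and it assumes only `Polygon` and `MultiPolygon` geometries are expected, since those are the two types that appear in this notebook.

```python
def check_feature_collection(geo):
    """Return a list of problems found in a GeoJSON-like dict (empty if none)."""
    problems = []
    if geo.get("type") != "FeatureCollection":
        problems.append("top-level type is not FeatureCollection")
    for i, feature in enumerate(geo.get("features", [])):
        geometry = feature.get("geometry") or {}
        if feature.get("type") != "Feature":
            problems.append(f"feature {i}: type is not Feature")
        if geometry.get("type") not in ("Polygon", "MultiPolygon"):
            problems.append(f"feature {i}: unexpected geometry type")
        if not geometry.get("coordinates"):
            problems.append(f"feature {i}: missing coordinates")
    return problems


# Toy examples: one well-formed feature, and one whose geometry is empty
# (as would happen for a country that found no match in a left merge).
good = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "X"},
        "geometry": {"type": "Polygon",
                     "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
    }],
}
bad = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "Y"},
        "geometry": {"type": None, "coordinates": None},
    }],
}

print(check_feature_collection(good))
print(check_feature_collection(bad))
```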

**Done!!!** Saving this as a JavaScript file, because I'm lazy and I don't feel like running a server that loads in a .json file.

In [179]:
# write the variable name, then the GeoJSON output, so the file is valid JavaScript
with open('geo-data.js', 'w') as outfile:
    outfile.write("const infoData = ")
    json.dump(geo_format, outfile)
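
A quick way to confirm the saved file is intact is to read it back, strip the `const infoData = ` prefix, and parse the rest as JSON. This sketch writes to a temporary folder and uses a tiny stand-in for `geo_format`, so it does not touch the real `geo-data.js`.

```python
import json
import os
import tempfile

# Toy stand-in for geo_format, so the sketch runs on its own.
geo_format = {"type": "FeatureCollection", "features": []}
prefix = "const infoData = "

# Write prefix + JSON to a throwaway location.
path = os.path.join(tempfile.mkdtemp(), "geo-data.js")
with open(path, "w") as outfile:
    outfile.write(prefix)
    json.dump(geo_format, outfile)

# Read it back and parse everything after the prefix.
with open(path) as infile:
    text = infile.read()

round_trip = json.loads(text[len(prefix):])
print(round_trip == geo_format)
```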
