# GBG Clusters workflow
Steps to analyse and visualise streets and street names:
1. Get the geographical information (coordinates) and street names data in a geojson format by extracting it from OpenStreetMap with the Overpass Turbo API
2. Turn the geojson into a csv with Mapshaper to make the data easier to inspect
3. Make a list of unique street names from the csv
4. Group the street names into thematic clusters
5. Update the geojson with the thematic cluster information
6. Visualise the map in Mapbox and adjust colours and labelling
7. Construct a scrollytelling version of the map
8. Fine-tune the layout
9. Publish on GitHub


## 1. Get the geographical information (coordinates) and street names data

To extract information from OpenStreetMap, I used [Overpass Turbo](https://overpass-turbo.eu/#), a web front end for the Overpass API. Since I wanted *all* possible streets, I used a wildcard (`*`) for the highway value in my query. This selects “residential”, “primary”, “secondary”, “tertiary”, etc., roads. I set the search area to “Göteborg” (since “Gothenburg” leads to a town in Nebraska).

`// fetch area “Göteborg” to search in
{{geocodeArea:Göteborg}}->.searchArea;
// match every element that has a highway tag, whatever its value
nwr["highway"](area.searchArea);`

This query took a long time to run, so I had to increase its timeout. The resulting geojson file was quite big.
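As a rough sketch of where the timeout goes: in Overpass QL it is set in the settings line at the top of the query, e.g. `[timeout:180]`. The helper below only assembles the query text so it can be pasted into Overpass Turbo. The function name `build_highway_query` and the value of 180 seconds are assumptions for illustration, and the `{{geocodeArea:...}}` shortcut only works inside Overpass Turbo.

```python
def build_highway_query(area_name: str, timeout_s: int = 180) -> str:
    """Return Overpass Turbo query text for every element with a highway tag."""
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    lines = [
        # Settings line: JSON output and a raised timeout (in seconds)
        f"[out:json][timeout:{timeout_s}];",
        # Overpass Turbo shortcut that resolves a place name to a search area
        "{{geocodeArea:" + area_name + "}}->.searchArea;",
        # nwr = nodes, ways and relations; ["highway"] matches any value
        'nwr["highway"](area.searchArea);',
        # Print the results, then the nodes needed to draw the geometry
        "out body;",
        ">;",
        "out skel qt;",
    ]
    return "\n".join(lines)


query = build_highway_query("Göteborg", timeout_s=180)
print(query)
```

A larger timeout does not make the query faster; it only stops the server from giving up early.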

## 2. Turn the geojson into a csv

To look at the information more easily, I wanted to convert everything to csv. That was most easily done with Mapshaper, by simply importing the geojson file and exporting it as a csv. The resulting csv was also huge and there was no way to open and handle it in Google Sheets or Excel (as I had originally planned). 
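For comparison, the same conversion could be sketched in plain Python. This is not what Mapshaper does internally, just a minimal illustration that writes one row per feature and one column per property key. The tiny inline geojson and the function name `properties_to_csv` are made up for the example.

```python
import csv
import io
import json

# Tiny stand-in for the exported file (hypothetical features)
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "id": "way/1",
   "properties": {"highway": "residential", "name": "Bondegatan"},
   "geometry": {"type": "LineString", "coordinates": [[11.9, 57.7], [11.91, 57.71]]}},
  {"type": "Feature", "id": "way/2",
   "properties": {"highway": "motorway", "maxspeed": "80"},
   "geometry": {"type": "LineString", "coordinates": [[11.8, 57.6], [11.81, 57.61]]}}
]}
"""


def properties_to_csv(geojson: dict) -> str:
    """Write one csv row per feature; columns are the union of all property keys."""
    rows = [feature.get("properties") or {} for feature in geojson["features"]]
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    # restval fills in an empty cell when a feature lacks a property
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = properties_to_csv(json.loads(geojson_text))
print(csv_text)
```

Because every key that appears on any feature becomes a column, a dataset with many rarely used tags ends up very wide and mostly empty.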


## 3. Make a list of unique street names

The resulting csv had >60 000 rows and >250 columns, including information about the surface of each street (asphalt, gravel,...), direction (one-way, two-way) and other things that I was not really interested in. Since most streets are divided into different fragments that all have their unique ID and geospatial information, the number of rows was much higher than the actual number of individual streets. 

The only thing that I was interested in for now was to get a list of all unique street names, so I imported the csv into Python and used the pandas library to get a csv of unique names.

In [3]:
import pandas as pd
GBG_all = pd.read_csv("pandas/GBG all highways simplified.csv")
#Get the column headers
GBG_all.columns

Index(['@id', 'highway', 'int_ref', 'lanes', 'maxspeed', 'name', 'oneway',
       'ref', 'surface', 'FID',
       ...
       'ford', 'asphalt', 'wikimedia_commons', 'passenger_information_display',
       'parking:condition:left', 'parking:lane:left', 'direction',
       'oneway:foot', 'website', '@relations'],
      dtype='object', length=259)

In [4]:
#Take a slice to get the first 5 rows of the dataframe
#to get a feel for the dataset
GBG_all[:5]

Unnamed: 0,@id,highway,int_ref,lanes,maxspeed,name,oneway,ref,surface,FID,...,ford,asphalt,wikimedia_commons,passenger_information_display,parking:condition:left,parking:lane:left,direction,oneway:foot,website,@relations
0,way/4040302,motorway,E 06,2.0,80,Kungälvsleden,yes,E 6,asphalt,way/4040302,...,,,,,,,,,,
1,way/4040303,motorway,E 06,3.0,100,,yes,E 6,asphalt,way/4040303,...,,,,,,,,,,
2,way/4040436,motorway_link,,1.0,80,,yes,,asphalt,way/4040436,...,,,,,,,,,,
3,way/4040439,motorway,E 20,1.0,70,,yes,E 20,asphalt,way/4040439,...,,,,,,,,,,
4,way/4040441,motorway,E 20,3.0,70,,yes,E 20,asphalt,way/4040441,...,,,,,,,,,,


In [5]:
#Get one specific column, e.g. the names column
GBG_all["name"]

0                        Kungälvsleden
1                                  NaN
2                                  NaN
3                                  NaN
4                                  NaN
                     ...              
61069    Smedmästare Karlssons Gångväg
61070                       Bondegatan
61071                     Borgaregatan
61072                              NaN
61073                              NaN
Name: name, Length: 61074, dtype: object

A lot of the data is either duplicates of street names (since long streets are fragmented into different parts), or other geospatial data that doesn't have a street name. So let's get rid of those rows and just get the unique names.

In [6]:
#Get unique values from this column: with .unique()
#and turn it into a list with .tolist()
#GBG_all.name.unique().tolist()

In [7]:
#Save the unique values in a list, turn the list into a dataframe and export it as a csv
unique_list = GBG_all.name.unique().tolist() #Make a list with the unique values (still includes one NaN entry for the unnamed rows)
dict_unique = {"street_names": unique_list} #Turn the list into a dictionary
unique_dataframe = pd.DataFrame(dict_unique) #Turn the dictionary into a dataframe
unique_dataframe.to_csv("pandas/Unique GBG street names.csv") #Turn the dataframe into a csv (the row index is written as an extra unnamed column)
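A more compact variant of the same step could look like the sketch below. It is an alternative, not the code that produced the files used later: it drops the unnamed rows before deduplicating and skips the row index on export. The small `GBG_sample` dataframe is a stand-in for the full table.

```python
import pandas as pd

# Small stand-in for the big highways table; None marks unnamed fragments
GBG_sample = pd.DataFrame(
    {"name": ["Kungälvsleden", None, "Bondegatan", "Bondegatan", None, "Borgaregatan"]}
)

unique_names = (
    GBG_sample["name"]
    .dropna()                # drop fragments without a street name
    .drop_duplicates()       # keep the first occurrence of each name
    .rename("street_names")  # column header for the exported csv
    .reset_index(drop=True)
)

# index=False avoids an extra "Unnamed: 0" column when the csv is read back in;
# without a path, to_csv returns the csv text instead of writing a file
csv_text = unique_names.to_csv(index=False)
print(csv_text)
```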


## 4. Group the street names into thematic clusters

Now came the manual work of going through the street names and figuring out which thematic clusters I could identify. I did this by importing the csv into Google Sheets - the csv was now sufficiently small to open there - and started by just scrolling and seeing what caught my eye. In a second round, I went through the list more thoroughly. It was actually quite fun, but I got a bit of “tunnel vision” from scrolling and scrolling through the 2000+ street names and had to spread the work over several days.

These were the clusters I came up with: 
- Culture (literature, music, dance)
- Radio
- Professions
- Space
- Colours
- Doctors
- Food
- Weather
- Animals
- Industrial
- Holidays
- Flora
- Gems

I saved the lists of the different clusters as separate csvs and then continued in Python to generate a “category” column. 

I could have done this in Google Sheets as well, so it was a bit pointless to go back to Python, but I wanted to practice using Python and pandas a bit more. The result is not pretty, but it worked:

In [3]:
GBG_unique = pd.read_csv("pandas/Unique GBG street names.csv")
GBG_unique.head(10)

Unnamed: 0.1,Unnamed: 0,street_names
0,0,Kungälvsleden
1,1,
2,2,Boråsleden
3,3,Gråbovägen
4,4,Hjällbovägen
5,5,Alingsåsleden
6,6,Kungsbackaleden
7,7,Stora Nygatan
8,8,Östra Larmgatan
9,9,Östra Hamngatan


In [4]:
#Read in all the csv for the clusters
radio_cluster = pd.read_csv("pandas/cluster_radio.csv")
culture_cluster = pd.read_csv("pandas/cluster_culture.csv")
space_cluster = pd.read_csv("pandas/cluster_space.csv")
professions_cluster = pd.read_csv("pandas/cluster_professions.csv")
colors_cluster = pd.read_csv("pandas/cluster_colors.csv")
animals_cluster = pd.read_csv("pandas/cluster_animals.csv")
industrial_cluster = pd.read_csv("pandas/cluster_industrial.csv")
doctors_cluster = pd.read_csv("pandas/cluster_doctors.csv")
food_cluster = pd.read_csv("pandas/cluster_food.csv")
weather_cluster = pd.read_csv("pandas/cluster_weather.csv")
holidays_cluster = pd.read_csv("pandas/cluster_holidays.csv")
flora_cluster = pd.read_csv("pandas/cluster_flora.csv")
gems_cluster = pd.read_csv("pandas/cluster_gems.csv")

#Check if it worked for the radio cluster
radio_cluster.head()

Unnamed: 0,radio
0,Åttarörsgatan
1,Bandfiltersgatan
2,Bildradiogatan
3,Elektrongatan
4,Ettrörsgatan


In [11]:
#Turn into lists
culture_list = culture_cluster["culture"].tolist()
radio_list = radio_cluster["radio"].tolist()
space_list = space_cluster["space"].tolist()
professions_list = professions_cluster["professions"].tolist()
colors_list = colors_cluster["colors"].tolist()
animals_list = animals_cluster["animals"].tolist()
industrial_list = industrial_cluster["industrial"].tolist()
doctors_list = doctors_cluster["doctors"].tolist()
food_list = food_cluster["food"].tolist()
weather_list = weather_cluster["weather"].tolist()
holidays_list = holidays_cluster["holidays"].tolist()
flora_list = flora_cluster["flora"].tolist()
gems_list = gems_cluster["gems"].tolist()
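The repeated `read_csv` and `tolist` lines above could also be written as one loop over the cluster names. The sketch below assumes each file has a single data column named after its cluster, as in the radio example. To stay self-contained it reads from in-memory text instead of the real files, and the gem street names are illustrative entries only.

```python
import io

import pandas as pd

# Stand-ins for the cluster csv files. With real files, pass the path
# (e.g. f"pandas/cluster_{cluster}.csv") to pd.read_csv instead.
example_files = {
    "radio": ",radio\n0,Åttarörsgatan\n1,Bandfiltersgatan\n",
    "gems": ",gems\n0,Topasgatan\n1,Opalgatan\n",  # illustrative entries
}

cluster_lists = {}
for cluster, csv_text in example_files.items():
    # index_col=0 treats the unnamed first column as the row index
    frame = pd.read_csv(io.StringIO(csv_text), index_col=0)
    cluster_lists[cluster] = frame[cluster].dropna().tolist()

print(cluster_lists)
```

Keeping the lists in one dictionary means a new cluster only needs one new entry rather than two new lines of code.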

In [14]:
#Write a function that goes through each row in the dataframe, checks the street name against the various lists
#and if the name is found in a specific list, the cat variable gets set to that cluster name
def check_category(row):
    if row['street_names'] in culture_list:
        cat = "culture"
    elif row['street_names'] in radio_list:
        cat = "radio"
    elif row['street_names'] in space_list:
        cat = "space"
    elif row['street_names'] in professions_list:
        cat = "professions"
    elif row['street_names'] in colors_list:
        cat = "colors"
    elif row['street_names'] in animals_list:
        cat = "animal"
    elif row['street_names'] in industrial_list:
        cat = "industrial"
    elif row['street_names'] in doctors_list:
        cat = "doctors"
    elif row['street_names'] in food_list:
        cat = "food"
    elif row['street_names'] in weather_list:
        cat = "weather"
    elif row['street_names'] in holidays_list:
        cat = "holidays"
    elif row['street_names'] in flora_list:
        cat = "flora"
    elif row['street_names'] in gems_list:
        cat = "gems"
    else:
        cat = "other"
    return cat

In [15]:
#Now this function gets applied to the dataframe
GBG_unique['category'] = GBG_unique.apply(check_category, axis=1)

In [6]:
#To make it easier later, rename the street_names column to name
GBG_new = GBG_unique.rename(columns={"street_names":"name"})

In [7]:
#Check if it worked
GBG_new.head()

Unnamed: 0.1,Unnamed: 0,name
0,0,Kungälvsleden
1,1,
2,2,Boråsleden
3,3,Gråbovägen
4,4,Hjällbovägen


So now we have a new list that connects all street names to categories. Let's export this as a csv and in the next step, combine it with the geospatial information again.

In [8]:
GBG_new.to_csv("pandas/GBG_streets_with_categories.csv")
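The if/elif chain in `check_category` could also be expressed as a dictionary lookup. The sketch below is one possible rewrite under the same rules (a name gets the first cluster it appears in, everything else becomes "other"). The sample data is a small stand-in, and the gems entry is illustrative.

```python
import pandas as pd

# Cluster lists in the same priority order as the if/elif chain
cluster_lists = {
    "radio": ["Åttarörsgatan", "Elektrongatan"],
    "gems": ["Topasgatan"],  # illustrative entry
}

# Invert to street name -> category; setdefault keeps the first cluster
# if a name happens to be listed in more than one
name_to_category = {}
for cluster, names in cluster_lists.items():
    for name in names:
        name_to_category.setdefault(name, cluster)

streets = pd.DataFrame(
    {"street_names": ["Kungälvsleden", "Elektrongatan", "Topasgatan", None]}
)

# map() returns NaN for names missing from the dictionary, fillna() labels them
streets["category"] = streets["street_names"].map(name_to_category).fillna("other")
print(streets)
```

A dictionary lookup also avoids scanning every list for every row, which matters little for 2000 names but keeps the code short.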

## 5. Update the geojson with the thematic cluster information

Back in [Mapshaper](https://mapshaper.org/), I loaded both the original geojson and the csv with the new category column. Then I joined them by street name 

`join data keys=name,name`

and exported the resulting file as a geojson. I now had a geojson with all the original information about the streets, as well as my new categories. This would be the basis for my map.
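To make clear what the join does, here is a minimal Python sketch of the same idea: every feature gets a `category` property looked up by its `name`. It is an illustration, not Mapshaper's implementation. The function name `add_categories`, the sample features and the choice to label unmatched features "other" are assumptions.

```python
import json

# Hypothetical miniature versions of the two inputs
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Elektrongatan"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Kungälvsleden"}, "geometry": None},
        {"type": "Feature", "properties": {"highway": "motorway"}, "geometry": None},
    ],
}
name_to_category = {"Elektrongatan": "radio", "Kungälvsleden": "other"}


def add_categories(collection: dict, lookup: dict, default: str = "other") -> dict:
    """Add a 'category' property to every feature, matched on its 'name'."""
    for feature in collection["features"]:
        props = feature.get("properties") or {}
        # Unnamed features have no key to join on and get the default label
        props["category"] = lookup.get(props.get("name"), default)
        feature["properties"] = props
    return collection


joined = add_categories(geojson, name_to_category)
print(json.dumps(joined, ensure_ascii=False, indent=2))
```

Since the key is the street name, every fragment of a street receives the same category.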

## 6. Visualise the map in Mapbox and adjust colours and labelling

[Mapbox](https://studio.mapbox.com/) was a bit tricky to get into at first; I didn't find it super intuitive. But here is what I did:

Tileset > New tileset → upload the joined map from Mapshaper. 

Then go to Styles > New style > choose template > customize template.

Go to your area of interest (i.e. Gothenburg). 

The easier way (I found) is to work in the tab "Components": Click + (add new component) > data visualization > source, and select your tileset of the joined map. Select data visualization type > select **data-driven lines**. 
Then adjust what you want to colour and how, e.g. choose "category" and then pick different colours for the different categories.

Unfortunately, you only have a limited number of choices when using Components, and I wanted to show more categories. So I went with "Layers":

In the tab "Layers": click + (to add new layer) > Source > select the tileset of the joined map. Type > select Line (to visualise the streets themselves). 
Filter > create filter > filter by data field "category".
Select the condition to filter for, e.g. "gems". 

![Mapbox categories](mapbox_categories.png)



Next to "select data" there's the "style" tab, where you can choose colours and other appearances of how the streets of the selected category should be visualised.
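For readers who prefer to see this as data: each such layer could be written as a small style-JSON object with a filter expression and a paint colour. The sketch below builds one line layer per category in Python and prints the JSON. The source id, source-layer name and colours are hypothetical placeholders, and Studio may store its filters in a different but equivalent form.

```python
import json

# Hypothetical colours per category
category_colours = {"gems": "#8e44ad", "radio": "#e67e22", "flora": "#27ae60"}


def category_layer(category: str, colour: str) -> dict:
    """Build a line layer that only draws streets of one category."""
    return {
        "id": f"streets-{category}",
        "type": "line",
        "source": "gbg-streets",        # hypothetical source id
        "source-layer": "gbg_streets",  # hypothetical tileset layer name
        # Keep only features whose "category" property equals this category
        "filter": ["==", ["get", "category"], category],
        "paint": {"line-color": colour, "line-width": 2},
    }


layers = [category_layer(cat, colour) for cat, colour in category_colours.items()]
print(json.dumps(layers, indent=2))
```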

For labelling, I created another layer, but instead of line I selected Type > Symbol. Then I repeated the same steps as for the lines, selecting the right tileset and setting a condition for a certain category. Then under the "style" tab, the labelling can be adjusted, e.g. colours, font size, etc. 

So I did this for all the categories that I wanted to show and ended up with a pretty nice map:

[Interactive Gothenburg map with thematic street clusters](https://api.mapbox.com/styles/v1/silfaz/cl0crqri6000c14p3xtxnge8i.html?title=view&access_token=pk.eyJ1Ijoic2lsZmF6IiwiYSI6ImNrenRuY2NrdTEydzEybnBraGszaWpuOHUifQ.f8NFzJ-yyaUfTpP6Vn3maA&zoomwheel=true&fresh=true#15.35/57.709156/11.889073)

![Final map](image-og.png)


## 7. Scrollytelling

Finally, the scrollytelling. 
Mapbox has a tutorial that is pretty simple to follow: [Mapbox interactive storytelling](https://github.com/mapbox/storytelling)

You download the package that includes the necessary .html and .js files. Then you open the config.js file in a text editor and start adding things to it by copying the example chapter, and then adjust the location, the layout of the text box and the text itself.

First, I sketched out the storyline with the different chapters, or anchor points in the map. You need to figure out the coordinates of the different chapters, so you can "fly" to them when you scroll through your story. That was a bit tedious. I ended up with a long list of my chapters, and corresponding coordinates, zoom, pitch and bearing values. For example:

`location: {
center: [11.80677, 57.93543],
zoom: 2.85,
pitch: 45.00,
bearing: 0.00
}`
            
You can follow the progress of your story by opening the index.html file in the same folder. This was super useful, not only to check whether the locations of the chapters were correct, but also to decide whether the text box should be left- or right-aligned, etc. 
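If the list of chapters gets long, the entries could be generated instead of typed by hand. The sketch below builds chapter objects and prints them as JSON, which is also valid JavaScript object syntax. The field names follow the template's example chapter as best I can tell, so check them against your copy of config.js; the helper `make_chapter` and the chapter text are placeholders. Note that `center` is ordered [longitude, latitude].

```python
import json


def make_chapter(chapter_id, title, description, lon, lat, zoom,
                 pitch=0.0, bearing=0.0, alignment="left"):
    """Build one chapter entry; center is [longitude, latitude]."""
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    return {
        "id": chapter_id,
        "alignment": alignment,
        "title": title,
        "description": description,
        "location": {
            "center": [lon, lat],
            "zoom": zoom,
            "pitch": pitch,
            "bearing": bearing,
        },
    }


chapters = [
    # Placeholder text; the location values are the example from above
    make_chapter("overview", "Gothenburg", "An overview of the city.",
                 lon=11.80677, lat=57.93543, zoom=2.85, pitch=45.0),
]
print(json.dumps(chapters, indent=4, ensure_ascii=False))
```

The range checks catch obviously wrong coordinates, but not a swapped pair where both values are valid either way round, so it is still worth checking each chapter in the browser.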

## 8. Fine-tune the story

I wrote the text for the chapters of my story in a simple text editor and copied it over into the config.js file once I was finished. This was much easier than writing everything in the file directly. One thing that tripped me up in the beginning was that I had to change the **'** quotation marks around the text to **"**, otherwise apostrophes in the text would break the code. 
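One way to sidestep the quoting problem is to let a tool write the string literal. A JSON string is also a valid JavaScript string, so `json.dumps` from Python's standard library can turn any text into a double-quoted literal, escaping any double quotes inside it. This is only a sketch of that idea, and the sentence is a made-up example.

```python
import json

# Text with an apostrophe, html tags and double quotes (made-up example)
description = "Gothenburg's <b>Elektrongatan</b> sits in the \"radio\" cluster."

# ensure_ascii=False keeps characters like ä and ö readable in the output
js_literal = json.dumps(description, ensure_ascii=False)
print(js_literal)
```

The printed literal can be pasted after `description:` in config.js as it is, including its surrounding quotation marks.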

Inside the quotation marks, the text can be formatted with html tags, e.g. `<b> </b>` for bold. I used that in a second read-through to make the mentioned street names stand out more, and put a few words in italics for emphasis. 

The resulting story looked good, but I wanted to adjust the font and font size. For this, I had to dig a bit into CSS.

When opening the index.html file, it's possible to change overall things, like font-size or font-family from sans-serif to serif. But I wanted a different font altogether. So I went to [Google Fonts](https://fonts.google.com/), selected a font I liked, clicked "select this style" and then on the right copied the html:

![Google Font selection](google_fonts.png)

I then pasted this html snippet into the `<head>` section of the index.html file, and the CSS rules into the `<style>` section. 

Similarly, you can select a font that is applied only to certain elements, for example the h1 heading. 
I did not change much else - although you could, e.g. opacity, width, margins, etc. - but I inserted some meta tags in the `<head>` section to make a nicer preview when posting the link to my map on social media:

`<meta property="og:title" content="The fantastical street names of Gothenburg" />`

`<meta property="og:description" content="When I moved to Gothenburg, Sweden, I noticed that many streets had really cute names, so I mapped them. " />`

`<meta property="og:url" content="https://silfaz.github.io/gbgclusters/" />`

`<meta property="og:image" content="https://silfaz.github.io/gbgclusters/image-og.png" />`

`<meta property="og:image:width" content="1200" />`

`<meta property="og:image:height" content="630" />`

`<meta property="og:type" content="website" />`
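If the tags are maintained in several places, they could be generated from one dictionary. The sketch below does that and escapes quotation marks in the values so they cannot break the attribute. The helper name `meta_tags` is made up, and the property values are the ones listed above.

```python
from html import escape

og_properties = {
    "og:title": "The fantastical street names of Gothenburg",
    "og:url": "https://silfaz.github.io/gbgclusters/",
    "og:image": "https://silfaz.github.io/gbgclusters/image-og.png",
}


def meta_tags(properties: dict) -> str:
    """Return one <meta> tag per property, with attribute values escaped."""
    return "\n".join(
        f'<meta property="{escape(name, quote=True)}" '
        f'content="{escape(value, quote=True)}" />'
        for name, value in properties.items()
    )


print(meta_tags(og_properties))
```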


Another little trick: when you want to preview an adjustment on your webpage, like a different font size, or find out where in the code you would have to change something to affect a certain element, open the webpage in the browser, right-click on the element and select "Inspect". In the panel that comes up, you can see the code responsible for that formatting, and you can also change values to see the direct effect on the layout. These changes are not saved, but if you like them, you can then change the actual code in the index.html file. 

## 9. Publish on GitHub

I downloaded GitHub Desktop and logged into my account.
I put all files associated with the data analysis and the construction of the scrollytelling story into a folder, and then uploaded this folder to my GitHub repository. 
To work on the files locally, I cloned the online repository to my computer. Now, for adjustments, I can change things (e.g. the index.html file) in my text editor (Visual Studio Code), and when I'm ready I commit the changes ("commit to main") and push the updated file(s) to my online GitHub repo. 

To publish the webpage with the scrollytelling story, I also used GitHub. I went to the repo (online) containing index.html, then to Settings > Pages and selected the main branch as the “Source”. After a few minutes of waiting, the webpage was published. 

It can now be accessed under: [https://silfaz.github.io/gbgclusters/](https://silfaz.github.io/gbgclusters/)