<a href="https://www.kaggle.com/code/ifeanyichukwunwobodo/geospatial-analysis-using-folium-and-plotly?scriptVersionId=130750369" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Introduction

Geospatial data science is becoming a popular field within data science as corporations start to learn about its usefulness. Many stakeholders are now paying attention to the importance of making decisions using spatial data.

According to [IBM](https://www.ibm.com/topics/geospatial-data), geospatial data is information that describes objects, events or other features with a location on or near the surface of the earth. [Wikipedia](https://en.wikipedia.org/wiki/Geographic_data_and_information) defines it as data and information having an implicit or explicit association with a location relative to Earth.

This notebook explores how to create maps and find patterns in them using two very popular data visualization libraries, Folium and Plotly Express. Folium is a specialized library for visualizing maps, while Plotly is a library for data visualization as a whole. That means that if you already know how to visualize data in Plotly, you can easily add geospatial data visualization to what you already do.

Enough with the talk. Let's get our hands dirty.

In [1]:
import math #for basic maths calculation 
import numpy #for linear algebra
import pandas as pd # for data manipulation and basic data visualization

# For Data Visualization
import plotly.express as px

# for geographical analysis we use
import geopandas as gpd #To read our data 
import folium #for interactive maps
from folium import Circle, Marker #to select the maptype we want to use
from folium.plugins import HeatMap, MarkerCluster #for plugins



For this we are going to use a GeoJSON file obtained from GRID which contains geographical data on the various factories and industrial sites in Lagos (for the rest of this notebook, these will be referred to simply as "factories").

In [2]:
factories = gpd.read_file('/kaggle/input/lagos-georeferenced-dataset/factoriesindustrial-sites.geojson')
#Let's take a look at the first few rows
factories.head()

Unnamed: 0,id,state_code,source,name,status,ward_code,global_id,geometry
0,u_fc_poi_factory_industry_site.1,LA,GRID,JJ International Block Industry,In Use,LASIUI05,000b39d0-ba96-4d62-91f3-52bef3d64890,POINT (3.78227 6.47244)
1,u_fc_poi_factory_industry_site.2,LA,GRID,Kadiri Block Industry,In Use,LASIUI04,001a5124-6f66-457f-ac1b-c4f1fde6a61a,POINT (3.68407 6.46090)
2,u_fc_poi_factory_industry_site.4,LA,GRID,Teju Blocks,In Use,LASIKU29,004d7b39-d3b8-4074-b270-d0fc09a20510,POINT (3.54362 6.59287)
3,u_fc_poi_factory_industry_site.11,LA,GRID,Adurumigba Block Industries,In Use,LASBDY08,006bb17c-b0fc-4b09-acc4-59250acdcfce,POINT (3.03417 6.46336)
4,u_fc_poi_factory_industry_site.18,LA,GRID,Abllat Nig Co Ltd,In Use,LASAMO24,0097ec77-7a51-44a4-a83d-c15f67e03fad,POINT (3.25955 6.52691)


In [3]:
print( "The factories dataset contains", factories.shape[0], "rows and", factories.shape[1], "columns")

The factories dataset contains 732 rows and 8 columns


In [4]:
factories.columns.to_list()

['id',
 'state_code',
 'source',
 'name',
 'status',
 'ward_code',
 'global_id',
 'geometry']

In [5]:
# Let's keep only the columns we need (copy so that new columns can be added safely later)
factories = factories[['name', 'status', 'geometry']].copy()



* 'name': the name of the factory, used to identify it.

* 'status': the condition of the factory ('In Use', 'Unused Good' or 'Unused Poor').

* 'geometry': the location of the factory, stored as a point.




In [6]:
#Let's check for missing values
factories.isna().sum()

name        0
status      0
geometry    0
dtype: int64

There are no missing values in our dataset.

## Brief Introduction to Geographical Analysis
Let's create longitude ('lon') and latitude ('lat') columns from the 'geometry' column. 
Longitude is a geographic coordinate that specifies the east–west position of a point on the surface of the Earth, or another celestial body. We are going to focus on Earth in this notebook for obvious reasons.
Latitude, in turn, specifies the north–south position of a point on Earth's surface.

In [7]:
factories['lon'] = factories.geometry.x  # the x coordinate of each point is its longitude
factories['lat'] = factories.geometry.y  # the y coordinate of each point is its latitude
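
To get a feel for what these coordinates mean, here is a minimal sketch that computes the great-circle distance between two latitude/longitude pairs with the haversine formula. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions made for illustration; the two coordinates are the map centre used later in this notebook and the first factory in the table above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation (assumption)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

lagos_centre = (6.5244, 3.3792)     # (lat, lon) used as the map centre later on
first_factory = (6.47244, 3.78227)  # (lat, lon) of POINT (3.78227 6.47244)

distance = haversine_km(*lagos_centre, *first_factory)
print(f"{distance:.1f} km")
```

Note that the point in the 'geometry' column is written as (lon lat), so the values are swapped when building the (lat, lon) pair.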

### Coordinate Reference System

Another key concept is the Coordinate Reference System (CRS for short). The earth is a globe, so representing it on a flat surface requires us to make some assumptions. This is where the CRS comes in. Different CRSs are different map representations that make different assumptions about how the earth is projected onto a flat surface.

In [8]:
print(factories.crs)


epsg:4326


For example, the CRS of our dataset is EPSG:4326 (EPSG stands for European Petroleum Survey Group), which stores positions as longitude and latitude in degrees on the WGS 84 datum.
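
To make the idea of a projection concrete, the sketch below converts longitude and latitude in degrees (EPSG:4326) to Web Mercator metres (EPSG:3857), the projection that web tile maps such as Leaflet use by default. It illustrates one projection only and is not something the dataset requires; the function name `to_web_mercator` is hypothetical, and in practice GeoPandas' `to_crs` method handles reprojection for you.

```python
import math

R = 6378137.0  # WGS 84 semi-major axis in metres, the sphere radius used by Web Mercator

def to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# The map centre used in this notebook, given as (lon, lat)
x, y = to_web_mercator(3.3792, 6.5244)
print(round(x), round(y))
```

The same point now has coordinates in metres on a flat plane, which is what makes distances and areas on a flat map depend on the projection chosen.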

# Visualizing Geospatial Data 

### How Does Mapping in Python Work?

Mapping in Python is just like any other visualization except that the background image is a map. For example, a circle map (which we will talk about later) is the equivalent of a scatter plot of the longitude and latitude of the factories. 

In [9]:
px.scatter(factories, 'lon', 'lat', 
           title='A Scatter Plot of the Latitude and  Longitude of Factories in Lagos'
          )

## Visualizing Geographical Data Using Folium

Folium is one of the most important Python libraries for geospatial data visualization and helps you create several types of Leaflet maps. Folium renders each map as HTML, which displays inline in a notebook and can also be saved to a separate HTML file. The resulting maps are very interactive. This is the library adopted for the Kaggle Learn course on geospatial analysis.

### How to Create a Map in Folium

* Step 1: Create a basemap. The basemap takes the location (latitude and longitude of the area of interest), tiles (which can be thought of as the theme of the map) and zoom_start (how 'focused' you want the initial map to be). Since the map is interactive, the audience can zoom in or out manually to see more or less detail, from country level to state level, street level and so on.

* Step 2: Add the plugins. These can be a marker, circle, heatmap, marker cluster (for bubble charts) and so on, which you add to your basemap using the 'add_to' method. The choice of plugin depends on the goal of the visualization.

* Step 3: Check out the amazing map you just created.
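
One detail in Step 1 that is easy to get wrong is coordinate order: Folium expects `[latitude, longitude]`, while the 'geometry' column stores points as (longitude, latitude). As a rough sketch, a centre for the basemap could also be derived from the data instead of being hard-coded. The example uses only the five points printed by `factories.head()` above, and the helper name `map_centre` is hypothetical.

```python
# (lon, lat) pairs copied from the five rows shown by factories.head()
points_lon_lat = [
    (3.78227, 6.47244),
    (3.68407, 6.46090),
    (3.54362, 6.59287),
    (3.03417, 6.46336),
    (3.25955, 6.52691),
]

def map_centre(points):
    """Return [lat, lon], the order Folium's `location` argument expects."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return [sum(lats) / len(lats), sum(lons) / len(lons)]

centre = map_centre(points_lon_lat)
print(centre)
```

With the full dataset, the same idea would be `[factories['lat'].mean(), factories['lon'].mean()]`.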

In [10]:
# Unhash this comment to learn more about maps in Folium and run cell
# ?? folium.Map

### Markers on Folium (Mark Your Territory)

Markers show individual points in geospatial data. The map places a pin at each location of interest (here, each factory).

In [11]:
# Create a base map
mark = folium.Map(location=[6.5244, 3.3792],
                  tiles='cartodbpositron', zoom_start=12)

# Add plugin, in this case, markers

for idx, row in factories.iterrows():
    Marker([row['lat'], row['lon']], 
           popup=row['name'] # popup shows the factory name when you click the marker
          ).add_to(mark)


# Display the map
mark

Does the distribution of the markers look familiar? It should. That's because they are the same as the distribution on the scatterplot.

In [12]:
# Unhash this comment to learn more about markers
# ?? Marker

If you zoom out you will notice that the number of factories decreases as you move towards the western border (towards Cotonou). 


You can play with the map by zooming in and out on any part of the map you are interested in. If you have ever been to Lagos you can check out if you recognize any street or factory on the map.

**Question 1: What happens to the number of factories as you head towards the eastern border?**

### Bubble Charts on Folium (Burst the Bubble)

The bubble map shows a bubble to indicate a group of locations at different points on the map. One cool thing about the bubble map is that it also shows the number of factories in each group. It gets more granular as you zoom in. 



In [13]:
#create a basemap
bubble = folium.Map(location =[6.5244, 3.3792], 
                       zoom_start =12,tiles ='cartodbpositron')

#Add Plugins
mc = MarkerCluster()
for idx, row in factories.iterrows():
    mc.add_child(Marker([row['lat'], row['lon']]))
bubble.add_child(mc)

#Display Bubble Chart
bubble

In the map above, locations with only one factory are represented by markers. With the default settings, a green bubble means the cluster holds fewer than 10 factories, a yellow bubble means 10 to 99, and an orange bubble means 100 or more. These default colors can be customized, so you may see other color schemes elsewhere. 
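
To see why the bubble counts change with zoom while their total stays the same, here is a simplified sketch that groups points into square grid cells and counts them. This grid binning is a stand-in chosen for illustration, not the algorithm `MarkerCluster` itself uses, and `grid_clusters` is a hypothetical helper. The points are the five rows shown by `factories.head()`.

```python
from collections import Counter
import math

# (lat, lon) pairs for the five rows shown by factories.head()
points = [(6.47244, 3.78227), (6.46090, 3.68407), (6.59287, 3.54362),
          (6.46336, 3.03417), (6.52691, 3.25955)]

def grid_clusters(points, cell_deg):
    """Count the points that fall in each grid cell of side `cell_deg` degrees."""
    return Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in points
    )

fine = grid_clusters(points, 0.1)    # "zoomed in": small cells
coarse = grid_clusters(points, 1.0)  # "zoomed out": large cells

print(len(fine), "clusters holding", sum(fine.values()), "points")
print(len(coarse), "clusters holding", sum(coarse.values()), "points")
```

Larger cells merge nearby points into fewer, bigger groups, but every point is still counted exactly once.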

**Question 2: What is the total number of factories in Lagos?**

Hint: The more you zoom out the less granular the map becomes.


### HeatMap on Folium (Bring In the Heat)

A heatmap shows the concentration of data points on a map.

Note: It is easier to answer Question 1 with the heatmap.

In [14]:
#Learn more
#??HeatMap

In [15]:
heat = folium.Map(location=[6.5244, 3.3792], tiles='cartodbpositron', zoom_start=10)

# Add a heatmap to the base map
HeatMap(data=factories[['lat', 'lon']], radius=10).add_to(heat)

# Display the map
heat

### Circle Map in Folium (Straight to the Point)

In [16]:
status = factories.groupby('status').size().reset_index(name='count') # count the factories in each status category


px.bar(status, x='status', y='count', # the data and the columns we use for our visualization
       color='status', # the variable you want to use to color the bars
       color_discrete_map={'In Use': 'blue', 'Unused Good':'yellow', 'Unused Poor': 'purple'}, # the color for each unique value of the variable 
       title='Status of Factories and Industrial Sites in Lagos') # title of your visualization

You might want to add layers to your map by showing the different categories of the points in your map. That is, by showing a bivariate relationship you can show more detail in a map. A circle map can help by showing the spatial distribution of status of the factories in Lagos shown in the bar chart above. 

In [17]:
#add basemap
circle = folium.Map(location=[6.5244, 3.3792], tiles='cartodbpositron', zoom_start=10)

# create conditions to colour the datapoints on  the map
def color_producer(val):
    if val == 'In Use':
        return 'blue'
    elif val == 'Unused Good':
        return 'yellow'
    else:
        return 'purple'

# Add the circle map to the base map
for i in range(0,len(factories)):
    Circle( #add the plugins
        location=[factories.iloc[i]['lat'], factories.iloc[i]['lon']], #for the different latitude and longitude on the map
        radius=20,
        color=color_producer(factories.iloc[i]['status'])).add_to(circle)

# Display the map
circle

Just as seen in the bar chart, the 'In Use' factories are more densely distributed than the other factories.
As seen above, the circle map presents your data as a scatter plot on a map. You can add more layers of information, for example by varying the radius or shape of the circles. This is easier to do with Plotly Express, which brings us to our next tool.

**Question 3: Is there a pattern in the location of the various factories based on status?**

## Visualising Maps in Plotly

Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures.

Geospatial data is most useful when it can be discovered, shared, analyzed and used in combination with traditional business data. Plotly makes this easier, as you can combine geospatial and other visualizations without jumping between different libraries.

Plotly graphs have more interactive options than Folium. You can save the image, pan, box select, lasso select, zoom in and out, and reset the graph using the icons in the top right corner of the visualization.

In [18]:
# scatter_mapbox creates a circle map; lat, lon, zoom and color mean exactly what their names imply
fig = px.scatter_mapbox(factories, lat='lat', lon='lon', zoom=10, color='status',
                        # the color for each status
                        color_discrete_map={'In Use': 'blue', 'Unused Good':'yellow', 'Unused Poor': 'purple'},
                        # the equivalent of tiles in Folium
                        mapbox_style="carto-positron", 
                        # like tooltips in Tableau: the name shown when you hover over a data point
                        hover_name='name')
# update_layout controls the layout of the figure, here its margins
fig.update_layout(margin=dict(l=0, r=0, t=30, b=10)) 
fig.show()


In this case, l, r, t and b stand for the left, right, top and bottom margins. You can play with the values to find out what they do.
The mapbox_style options available in Plotly include 'white-bg', 'carto-positron', 'carto-darkmatter', 'stamen-terrain', 'stamen-toner' and 'stamen-watercolor'. 

**Question 4: What are the names of the 'Unused Poor' factories?**

In [19]:
#?? px.scatter_mapbox

### Heatmap in Plotly

The `density_mapbox` function is used to create a heatmap in Plotly.

In [20]:
fig = px.density_mapbox(factories, lat='lat', lon='lon', radius=15, #radius determines the size of each point. 
                        #The same as the location in the basemap of Folium
                        center=dict(lat=6.5244, lon= 3.3792), zoom=9,
                        #The color scale you want to use in your map
                        color_continuous_scale='viridis',
                        mapbox_style="carto-positron", hover_name='name')
fig.update_layout(margin=dict(l=0, r=0, t=30, b=10))
fig.show()


In [21]:
#You can learn about different color scales by unhashing the following and running the cells 

#??px.colors.diverging
#??px.colors.sequential
#??px.colors.cyclical

In [22]:
#You can learn more about heatmap in plotly by unhashing the comment and running the cell

#??px.density_mapbox

A quick way to discover which color scales you can use in your visualization is to pass an invalid name (instead of 'viridis' try 'vidiis' or something like that). The error message that appears lists the options available to you.

To the best of my knowledge there is no way to create a bubble map or a map with markers in Plotly (as shown with Folium). If you know of a way, you can drop a link or leave a comment in the comment section.

# What Next?

The best way to learn more is to read the [plotly map documentation](https://plotly.com/python/maps/) and [folium documentation](https://python-visualization.github.io/folium/).

You can cement your knowledge by practising on other data, for example the [geotagged Flickr images dataset](https://www.kaggle.com/datasets/ifeanyichukwunwobodo/tokyo-geotagged-flickr-images). You can also see how to apply your newly gained knowledge to a project in this [CO2 emission notebook](https://www.kaggle.com/code/ifeanyichukwunwobodo/exploratory-analysis-of-co2-emission-using-plotly), which builds on the basics covered here to explore CO2 emissions around the world with several kinds of maps. You can further improve your map visualization skills by adding animation and by scaling the size of the points in your circle map by a continuous value, as is done there.

# Conclusion

Geospatial analytics is used to add timing and location to traditional types of data and to build data visualizations. In this notebook we showcased various ways to visualize spatial data.


Thank you for reading to the end. Leave an upvote if you found the notebook helpful. Drop a comment if you have any questions or contributions. 