![Natural Earth logo](https://www.naturalearthdata.com/wp-content/themes/NEV/images/nev_logo.png "Natural Earth logo")
<div align="center">

## Scraping sea port data.
</div>

This document explains how sea port data is processed and saved in a PostgreSQL database.

Link to the Natural Earth data: [ports data](https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_ports.zip)

<hr>

# 1. Download and load data
## The data is a shapefile compressed into a zip file. We need geopandas to read the spatial data and requests to download the zip file from the web.

In [None]:
import geopandas as gpd
import requests
import os

In [None]:
ports_link = "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_ports.zip"

## Download the zip file and check the status code (200 means OK).

In [None]:
r = requests.get(ports_link, stream=True, headers={"User-Agent": "XY"})
r.status_code

## Save the zip file inside the temp folder (create the folder if it does not exist).

In [None]:
# exist_ok=True makes this a no-op when the folder already exists
os.makedirs('../temp/seaport/', exist_ok=True)

In [None]:
with open('../temp/seaport/ne_10m_ports.zip', 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)
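
The two cells above (create the folder, then stream the response to disk) can also be wrapped in one small function. This is a minimal sketch: `save_chunks` is a hypothetical helper, not part of the original notebook, and it accepts any iterable of `bytes`, so in the notebook you would pass `r.iter_content(chunk_size=128)`.

```python
import os


def save_chunks(chunks, dest_path):
    """Write an iterable of bytes chunks to dest_path.

    Hypothetical helper: creates the parent folder if it is missing
    and returns the number of bytes written.
    """
    parent = os.path.dirname(dest_path)
    if parent:
        # exist_ok=True avoids a separate os.path.exists() check
        os.makedirs(parent, exist_ok=True)
    written = 0
    with open(dest_path, "wb") as fd:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                fd.write(chunk)
                written += len(chunk)
    return written
```

For example, `save_chunks(r.iter_content(chunk_size=128), '../temp/seaport/ne_10m_ports.zip')` should produce the same file as the cells above, and the returned byte count is a quick sanity check that something was actually downloaded.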

In [None]:
zip_file = "zip://../temp/seaport/ne_10m_ports.zip!ne_10m_ports.shp"
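
The `zip://<archive>!<member>` path lets GeoPandas read the shapefile straight out of the archive without extracting it. Because a failed download (for example an HTML error page) would still be saved under the `.zip` name, it can be worth checking the archive before reading it. The sketch below uses a hypothetical helper, `build_zip_uri`, which assumes the shapefile parts sit at the root of the archive (as the path above suggests) and checks for the `.shp`, `.shx` and `.dbf` files a shapefile needs.

```python
import os
import zipfile


def build_zip_uri(zip_path, shp_name):
    """Return 'zip://<archive>!<member>' after checking the archive.

    Hypothetical helper: verifies the file is a readable zip and that the
    .shp member and its required .shx/.dbf companions are at its root.
    """
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid zip archive")
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
    stem, _ = os.path.splitext(shp_name)
    required = [stem + ext for ext in (".shp", ".shx", ".dbf")]
    missing = [name for name in required if name not in names]
    if missing:
        raise ValueError(f"missing from archive: {missing}")
    return f"zip://{zip_path}!{shp_name}"
```

When the checks pass, `build_zip_uri('../temp/seaport/ne_10m_ports.zip', 'ne_10m_ports.shp')` returns the same string as `zip_file` above.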

## Load the shapefile into a GeoPandas GeoDataFrame.

In [None]:
port_shp = gpd.read_file(
    zip_file, layer='ne_10m_ports'
)

## Check the number of rows and columns, and the CRS of the shapefile data.

In [None]:
port_shp.shape

In [None]:
port_shp.crs

## List all columns, then keep only the ones relevant to our dataset.

In [None]:
port_shp.columns.values.tolist()

In [None]:
filter_port = port_shp[["name", "website", "geometry"]]

In [None]:
filter_port.head()

# 2. Save to the database

## Preview the available data on an interactive map.

In [None]:
filter_port.explore(popup=True)

In [None]:
from django.contrib.gis.geos import GEOSGeometry, MultiPoint
from apps.civic_structure.models import BoatTerminal

## Convert the geometry field from the GeoPandas GeoDataFrame to a GEOS MultiPoint.

In [None]:
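def convert_geometry(geometry):
    # str() of a shapely geometry is its WKT, which GEOSGeometry can parse
    geometry = GEOSGeometry(str(geometry))
    # Wrap single points so every row is stored as a MultiPoint
    if geometry.geom_type == 'Point':
        geometry = MultiPoint(geometry)
    return geometry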
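
`convert_geometry` promotes single points to multipoints, presumably because the `geometry` field on `BoatTerminal` is a MultiPoint field (the model is not shown here, so that is an assumption). The same promotion can be illustrated on plain WKT text, which is handy for checking the expected result without Django and GEOS installed. This is a simplified sketch: `promote_point_wkt` is hypothetical, not part of the project, and only handles 2D `POINT (x y)` strings; in the notebook GEOS does the real conversion.

```python
import re

# Matches a 2D WKT point such as "POINT (12.5 41.9)" (case-insensitive)
_POINT_WKT = re.compile(r"^\s*POINT\s*\(\s*([^()]+?)\s*\)\s*$", re.IGNORECASE)


def promote_point_wkt(wkt):
    """Turn 'POINT (x y)' into 'MULTIPOINT ((x y))'.

    Anything else (an existing MULTIPOINT, 'POINT EMPTY', 'POINT Z (...)')
    is returned unchanged.
    """
    match = _POINT_WKT.match(wkt)
    if match is None:
        return wkt
    return f"MULTIPOINT (({match.group(1)}))"
```

For example, `POINT (1.5 -2)` becomes `MULTIPOINT ((1.5 -2))`, while strings that already describe a MultiPoint pass through untouched, mirroring the `geom_type == 'Point'` branch above.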

## Iterate through the dataframe, convert each geometry, and update the matching database entry or create a new one.

In [None]:
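for row in filter_port.itertuples(index=False, name="Pandas"):
    geometry = convert_geometry(row.geometry)

    # Fields to set on an existing terminal, or on a newly created one
    updated_values = {
        "geometry": geometry,
        "url": row.website,
    }

    # Look up the terminal by name: update it if found, otherwise create it
    BoatTerminal.objects.update_or_create(
        name=row.name,
        defaults=updated_values,
    )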
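
`update_or_create` behaves like an upsert: it looks up a `BoatTerminal` by `name`, updates the fields in `defaults` if one exists, and creates a new row otherwise, so re-running the notebook should not create duplicates. The sketch below illustrates the same idea with Python's built-in `sqlite3` module and an in-memory, hypothetical `boat_terminal` table; `create_table` and `upsert_terminal` are illustrative names, not part of the project. It relies on `INSERT ... ON CONFLICT ... DO UPDATE`, which needs SQLite 3.24 or newer and which PostgreSQL also supports.

```python
import sqlite3


def create_table(conn):
    """Create a hypothetical boat_terminal table keyed on a UNIQUE name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS boat_terminal ("
        " name TEXT NOT NULL UNIQUE,"
        " url TEXT,"
        " geometry TEXT)"
    )


def upsert_terminal(conn, name, url, geometry_wkt):
    """Insert a terminal, or update url/geometry when the name already exists.

    Conceptual stand-in for
    BoatTerminal.objects.update_or_create(name=name, defaults={...}).
    """
    conn.execute(
        """
        INSERT INTO boat_terminal (name, url, geometry)
        VALUES (?, ?, ?)
        ON CONFLICT(name) DO UPDATE SET
            url = excluded.url,
            geometry = excluded.geometry
        """,
        (name, url, geometry_wkt),
    )
    conn.commit()
```

Django does not emit `ON CONFLICT` here; `update_or_create` looks the row up with `select_for_update` inside a transaction and then saves or creates it. Either way the lookup key matters: keying on `name` assumes port names are unique, and Django raises `MultipleObjectsReturned` if more than one terminal already shares the same name.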