<a href="https://colab.research.google.com/github/CarlosMendez1997Col/GeoDatabases-And-Cloud-Computing-For-Water-Resources-Management/blob/main/1-Creation%20Geodatabase/Download_and_Geoprocessing_Databases_in_Google_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Download and Geoprocessing Databases in Google Colab

---

> Water Resources Management using PostgreSQL and PgAdmin4

> Area of Interest (South America)

> Developed by MSc Carlos Mendez

TABLES AND DATASETS USED IN GOOGLE COLAB:

1. South America Countries and Boundary [Url Data](https://international.ipums.org/international/gis.shtml)

2. First Level Administrative Units (FLAU) [Url Data](https://www.geoboundaries.org/globalDownloads.html)

3. Second Level Administrative Units (SLAU) [Url Data](https://www.geoboundaries.org/globalDownloads.html)

4. HydroSHEDS [Url Data](https://www.hydrosheds.org/products/hydrosheds)

5. HydroBASINS (Level 1 to 12) [Url Data](https://www.hydrosheds.org/products/hydrobasins)

6. HydroRIVERS [Url Data](https://www.hydrosheds.org/products/hydrorivers)

7. HydroLAKES [Url Data](https://www.hydrosheds.org/products/hydrolakes)


## Install the required libraries

In [6]:
# To install a library, remove the leading "#" from its line and run this cell
#!pip install arcgis
#!pip install geopandas
!pip install rasterio
#!pip install shapely
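
Before uncommenting an install line, it can help to see which libraries the runtime already provides. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, and the list of names simply mirrors the install lines above.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found in this runtime."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names mirroring the pip lines above (an import name can differ from its pip name).
wanted = ["arcgis", "geopandas", "rasterio", "shapely"]
print("Not installed:", missing_packages(wanted))
```

Only the libraries reported as not installed would then need their `!pip install` line uncommented.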



## Import libraries and packages

In [2]:
import numpy as np
import pandas as pd
import geopandas as gpd
import rasterio
import xarray as xr
import matplotlib.pyplot as plt
import math
import zipfile
import os
import time
from datetime import datetime as dt
from osgeo import gdal, ogr, osr
from shapely.geometry import box
from google.colab import output
output.enable_custom_widget_manager()

## Download and extract the databases to Google Drive



### Connect to Google Drive

In [3]:
import os
os.makedirs('/content', exist_ok=True)
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [34]:
# Set Directory or WorkSpace
%cd /content/drive/MyDrive/Geodatabase

/content/drive/MyDrive/Geodatabase
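
The `%cd` command above assumes the `Geodatabase` folder already exists in Google Drive. As a hedged alternative, the sketch below creates the folder when it is missing and then makes it the working directory; `ensure_workspace` is a hypothetical helper name, not part of the notebook.

```python
import os

def ensure_workspace(path):
    """Create `path` if needed, make it the working directory, and return that directory."""
    os.makedirs(path, exist_ok=True)
    os.chdir(path)
    return os.getcwd()

# In this notebook the call would use the Drive folder shown above:
# ensure_workspace('/content/drive/MyDrive/Geodatabase')
```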


### 1. South America Countries and Boundary (SACB)

In [None]:
!wget https://international.ipums.org/international/resources/gis/IPUMSI_world_release2024.zip

--2025-09-11 14:29:02--  https://international.ipums.org/international/resources/gis/IPUMSI_world_release2024.zip
Resolving international.ipums.org (international.ipums.org)... 128.101.163.176
Connecting to international.ipums.org (international.ipums.org)|128.101.163.176|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61705330 (59M) [application/zip]
Saving to: ‘IPUMSI_world_release2024.zip’


2025-09-11 14:29:09 (11.1 MB/s) - ‘IPUMSI_world_release2024.zip’ saved [61705330/61705330]



In [None]:
!unzip "/content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.zip" -d "/content/drive/MyDrive/Geodatabase"

Archive:  /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.zip
 extracting: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.CPG  
  inflating: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.dbf  
  inflating: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.prj  
  inflating: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.sbn  
  inflating: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.sbx  
  inflating: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.shp  
  inflating: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.shp.xml  
  inflating: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.shx  
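
The same extraction can be done from Python with the standard `zipfile` module, which the notebook already imports. This is a minimal sketch rather than the notebook's own method; `extract_archive` is a hypothetical helper name.

```python
import os
import zipfile

def extract_archive(zip_path, dest_dir):
    """Extract every member of `zip_path` into `dest_dir` and return the member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()

# In this notebook the call would mirror the unzip command above:
# extract_archive('/content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.zip',
#                 '/content/drive/MyDrive/Geodatabase')
```

Returning the member names gives a list of extracted files that a later cleanup step can reuse.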


In [None]:
SACB = gpd.read_file('IPUMSI_world_release2024.shp')
#SACB.head()

SACB.drop(['OBJECTID','CNTRY_CODE','BPL_CODE'], axis=1, inplace=True)
SACB.rename(columns={'CNTRY_NAME': 'Country'}, inplace=True)
#SACB.info()

SA_countries =  ['Argentina', 'Bolivia', 'Brazil', 'Chile', 'Colombia', 'Ecuador', 'Guyana', 'French Guiana', 'Paraguay', 'Peru', 'Suriname', 'Uruguay', 'Venezuela']
SACB_SA = SACB[SACB['Country'].isin(SA_countries)]

# If you want to verify the 13 countries and territories of SA listed above
#SACB_SA.head(13)

# If you want to display and visualize the 13 countries and territories of SA
#SACB_SA.plot(column='Country', figsize=(16,8))

# Export data to Google Drive (.shp)
output_path_SACB = '/content/drive/MyDrive/Geodatabase/SA_Countries.shp'
SACB_SA.to_file(output_path_SACB)

# Delete the original files (zip and .shp) to free space on Google Drive
!rm '/content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.zip'

shapefile_prefix = '/content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024'

# List of common shapefile extensions
extensions = ['.CPG', '.dbf', '.prj', '.sbn', '.sbx', '.shp', '.shp.xml', '.shx']

for ext in extensions:
    file_path = shapefile_prefix + ext
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"Deleted: {file_path}")
    else:
        print(f"File not found: {file_path}")


Deleted: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.CPG
Deleted: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.dbf
Deleted: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.prj
Deleted: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.sbn
Deleted: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.sbx
Deleted: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.shp
Deleted: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.shp.xml
Deleted: /content/drive/MyDrive/Geodatabase/IPUMSI_world_release2024.shx
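
The same delete loop is repeated for each dataset in this notebook, so it could be wrapped in a small function. The sketch below is one way to do that; `remove_shapefile_parts` is a hypothetical helper name, and it returns the paths instead of printing them.

```python
import os

def remove_shapefile_parts(prefix, extensions):
    """Delete prefix + ext for each extension; return (deleted, not_found) path lists."""
    deleted, not_found = [], []
    for ext in extensions:
        file_path = prefix + ext
        if os.path.exists(file_path):
            os.remove(file_path)
            deleted.append(file_path)
        else:
            not_found.append(file_path)
    return deleted, not_found

# In this notebook the call would reuse the variables defined above:
# deleted, not_found = remove_shapefile_parts(shapefile_prefix, extensions)
```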


### 2. First Level Administrative Units (FLAU)

In [None]:
!wget https://github.com/wmgeolab/geoBoundaries/raw/main/releaseData/CGAZ/geoBoundariesCGAZ_ADM1.zip

--2025-09-11 14:30:26--  https://github.com/wmgeolab/geoBoundaries/raw/main/releaseData/CGAZ/geoBoundariesCGAZ_ADM1.zip
Resolving github.com (github.com)... 20.27.177.113
Connecting to github.com (github.com)|20.27.177.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://media.githubusercontent.com/media/wmgeolab/geoBoundaries/main/releaseData/CGAZ/geoBoundariesCGAZ_ADM1.zip [following]
--2025-09-11 14:30:26--  https://media.githubusercontent.com/media/wmgeolab/geoBoundaries/main/releaseData/CGAZ/geoBoundariesCGAZ_ADM1.zip
Resolving media.githubusercontent.com (media.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to media.githubusercontent.com (media.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 103470246 (99M) [application/zip]
Saving to: ‘geoBoundariesCGAZ_ADM1.zip’


2025-09-11 14:30:32 (17.3 MB/s) - ‘geoBoundariesCGAZ_ADM1.zip’ saved 

In [None]:
!unzip "/content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM1.zip" -d "/content/drive/MyDrive/Geodatabase"

Archive:  /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM1.zip
  inflating: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM1.shp  
  inflating: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM1.shx  
  inflating: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM1.dbf  
  inflating: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM1.prj  


In [None]:
FLAU = gpd.read_file('geoBoundariesCGAZ_ADM1.shp')
#FLAU.head()

FLAU.drop(['shapeID','shapeGroup', 'shapeType'], axis=1, inplace=True)
FLAU.rename(columns={'shapeName': 'Department'}, inplace=True)
#FLAU.info()

# Check CRSs
#print(FLAU.crs)
#print(SACB_SA.crs)

In [None]:
FLAU_Intersect = FLAU.overlay(SACB_SA, how='intersection')



In [None]:
# If you want to check the Departments or States of SA countries
#FLAU_Intersect.head(30)

# Export data to Google Drive (.shp)
output_path_FLAU = '/content/drive/MyDrive/Geodatabase/SA_FLAU.shp'
FLAU_Intersect.to_file(output_path_FLAU)

# If you want to display and visualize the data
#FLAU_Intersect.plot(column='Department', figsize=(16,8))

# Delete the zip and the extracted shapefile to free space on Google Drive

!rm '/content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM1.zip'

shapefile_prefix = '/content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM1'

# List of common shapefile extensions
extensions = ['.dbf', '.prj', '.shp', '.shx']

for ext in extensions:
    file_path = shapefile_prefix + ext
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"Deleted: {file_path}")
    else:
        print(f"File not found: {file_path}")

Deleted: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM1.dbf
Deleted: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM1.prj
Deleted: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM1.shp
Deleted: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM1.shx


### 3. Second Level Administrative Units (SLAU)

In [None]:
!wget https://github.com/wmgeolab/geoBoundaries/raw/main/releaseData/CGAZ/geoBoundariesCGAZ_ADM2.zip

--2025-09-11 14:33:01--  https://github.com/wmgeolab/geoBoundaries/raw/main/releaseData/CGAZ/geoBoundariesCGAZ_ADM2.zip
Resolving github.com (github.com)... 20.27.177.113
Connecting to github.com (github.com)|20.27.177.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://media.githubusercontent.com/media/wmgeolab/geoBoundaries/main/releaseData/CGAZ/geoBoundariesCGAZ_ADM2.zip [following]
--2025-09-11 14:33:02--  https://media.githubusercontent.com/media/wmgeolab/geoBoundaries/main/releaseData/CGAZ/geoBoundariesCGAZ_ADM2.zip
Resolving media.githubusercontent.com (media.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.109.133, ...
Connecting to media.githubusercontent.com (media.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 155911064 (149M) [application/zip]
Saving to: ‘geoBoundariesCGAZ_ADM2.zip’


2025-09-11 14:33:11 (17.2 MB/s) - ‘geoBoundariesCGAZ_ADM2.zip’ saved

In [None]:
!unzip "/content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM2.zip" -d "/content/drive/MyDrive/Geodatabase"

Archive:  /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM2.zip
  inflating: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM2.shp  
  inflating: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM2.shx  
  inflating: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM2.dbf  
  inflating: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM2.prj  


In [None]:
SLAU = gpd.read_file('geoBoundariesCGAZ_ADM2.shp')
#SLAU.head()

SLAU.drop(['shapeID','shapeGroup', 'shapeType'], axis=1, inplace=True)
SLAU.rename(columns={'shapeName': 'Municipality'}, inplace=True)
#SLAU.info()

In [None]:
SLAU_Intersect = SLAU.overlay(SACB_SA, how='intersection')



In [None]:
# If you want to check the municipalities (second-level units) of SA countries
#SLAU_Intersect.head(30)

# If you want to display and visualize the data
#SLAU_Intersect.plot(column='Municipality', figsize=(16,8))

# Export data to Google Drive (.shp)
output_path_SLAU = '/content/drive/MyDrive/Geodatabase/SA_SLAU.shp'
SLAU_Intersect.to_file(output_path_SLAU)

# Delete the zip and the extracted shapefile to free space on Google Drive

!rm '/content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM2.zip'

shapefile_prefix = '/content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM2'

# List of common shapefile extensions
extensions = ['.dbf', '.prj', '.shp', '.shx']

for ext in extensions:
    file_path = shapefile_prefix + ext
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"Deleted: {file_path}")
    else:
        print(f"File not found: {file_path}")



Deleted: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM2.dbf
Deleted: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM2.prj
Deleted: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM2.shp
Deleted: /content/drive/MyDrive/Geodatabase/geoBoundariesCGAZ_ADM2.shx


### 4. HydroSHEDS

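Converting these raster files to vector features (points, polylines, and polygons) is too demanding in processing capacity and data volume for this notebook, so that conversion is performed with the ArcGIS API for Python. The cells below only download each raster and read it into memory.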


#### Void Filled DEM

In [None]:
!wget https://data.hydrosheds.org/file/hydrosheds-v1-dem/hyd_sa_dem_30s.zip

--2025-09-11 14:44:35--  https://data.hydrosheds.org/file/hydrosheds-v1-dem/hyd_sa_dem_30s.zip
Resolving data.hydrosheds.org (data.hydrosheds.org)... 104.21.14.61, 172.67.158.28, 2606:4700:3036::ac43:9e1c, ...
Connecting to data.hydrosheds.org (data.hydrosheds.org)|104.21.14.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31834815 (30M) [application/zip]
Saving to: ‘hyd_sa_dem_30s.zip’


2025-09-11 14:44:36 (44.8 MB/s) - ‘hyd_sa_dem_30s.zip’ saved [31834815/31834815]



In [None]:
!unzip "/content/drive/MyDrive/Geodatabase/hyd_sa_dem_30s.zip" -d "/content/drive/MyDrive/Geodatabase"

Archive:  /content/drive/MyDrive/Geodatabase/hyd_sa_dem_30s.zip
 extracting: /content/drive/MyDrive/Geodatabase/hyd_sa_dem_30s.tif  
 extracting: /content/drive/MyDrive/Geodatabase/HydroSHEDS_TechDoc_v1_4.pdf  


In [None]:
!rm '/content/drive/MyDrive/Geodatabase/hyd_sa_dem_30s.zip'
!rm '/content/drive/MyDrive/Geodatabase/HydroSHEDS_TechDoc_v1_4.pdf'

path_hydroSHEDS_DEM = '/content/drive/MyDrive/Geodatabase/hyd_sa_dem_30s.tif'
with rasterio.open(path_hydroSHEDS_DEM) as src:
  hydroSHEDS_DEM_SA = src.read(1)

In [None]:
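# If you want to inspect the raster metadata, read the profile while the file is open.
#with rasterio.open(path_hydroSHEDS_DEM) as src:
#    prof_hydroSHEDS_DEM = src.profile
#print("Void Filled DEM Profile:", prof_hydroSHEDS_DEM)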

#### Flow Direction

In [None]:
!wget https://data.hydrosheds.org/file/hydrosheds-v1-dir/hyd_sa_dir_30s.zip

--2025-09-11 14:45:54--  https://data.hydrosheds.org/file/hydrosheds-v1-dir/hyd_sa_dir_30s.zip
Resolving data.hydrosheds.org (data.hydrosheds.org)... 172.67.158.28, 104.21.14.61, 2606:4700:3036::6815:e3d, ...
Connecting to data.hydrosheds.org (data.hydrosheds.org)|172.67.158.28|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9349529 (8.9M) [application/zip]
Saving to: ‘hyd_sa_dir_30s.zip’


2025-09-11 14:45:57 (5.90 MB/s) - ‘hyd_sa_dir_30s.zip’ saved [9349529/9349529]



In [None]:
!unzip "/content/drive/MyDrive/Geodatabase/hyd_sa_dir_30s.zip" -d "/content/drive/MyDrive/Geodatabase"

Archive:  /content/drive/MyDrive/Geodatabase/hyd_sa_dir_30s.zip
 extracting: /content/drive/MyDrive/Geodatabase/hyd_sa_dir_30s.tif  
 extracting: /content/drive/MyDrive/Geodatabase/HydroSHEDS_TechDoc_v1_4.pdf  


In [None]:
!rm '/content/drive/MyDrive/Geodatabase/hyd_sa_dir_30s.zip'
!rm '/content/drive/MyDrive/Geodatabase/HydroSHEDS_TechDoc_v1_4.pdf'

path_hydroSHEDS_Dir = '/content/drive/MyDrive/Geodatabase/hyd_sa_dir_30s.tif'
with rasterio.open(path_hydroSHEDS_Dir) as src:
  hydroSHEDS_Dir_SA = src.read(1)

In [None]:
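# If you want to inspect the raster metadata, read the profile while the file is open.
#with rasterio.open(path_hydroSHEDS_Dir) as src:
#    prof_hydroSHEDS_Dir = src.profile
#print("Flow Direction Profile:", prof_hydroSHEDS_Dir)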

#### Flow Accumulation

In [None]:
!wget https://data.hydrosheds.org/file/hydrosheds-v1-acc/hyd_sa_acc_30s.zip

--2025-09-11 14:47:41--  https://data.hydrosheds.org/file/hydrosheds-v1-acc/hyd_sa_acc_30s.zip
Resolving data.hydrosheds.org (data.hydrosheds.org)... 172.67.158.28, 104.21.14.61, 2606:4700:3036::6815:e3d, ...
Connecting to data.hydrosheds.org (data.hydrosheds.org)|172.67.158.28|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19947429 (19M) [application/zip]
Saving to: ‘hyd_sa_acc_30s.zip’


2025-09-11 14:47:44 (10.5 MB/s) - ‘hyd_sa_acc_30s.zip’ saved [19947429/19947429]



In [None]:
!unzip "/content/drive/MyDrive/Geodatabase/hyd_sa_acc_30s.zip" -d "/content/drive/MyDrive/Geodatabase"

Archive:  /content/drive/MyDrive/Geodatabase/hyd_sa_acc_30s.zip
 extracting: /content/drive/MyDrive/Geodatabase/hyd_sa_acc_30s.tif  
 extracting: /content/drive/MyDrive/Geodatabase/HydroSHEDS_TechDoc_v1_4.pdf  


In [None]:
!rm '/content/drive/MyDrive/Geodatabase/hyd_sa_acc_30s.zip'
!rm '/content/drive/MyDrive/Geodatabase/HydroSHEDS_TechDoc_v1_4.pdf'

path_hydroSHEDS_Flow = '/content/drive/MyDrive/Geodatabase/hyd_sa_acc_30s.tif'
with rasterio.open(path_hydroSHEDS_Flow) as src:
  hydroSHEDS_Flow_SA = src.read(1)

In [None]:
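# If you want to inspect the raster metadata, read the profile while the file is open.
#with rasterio.open(path_hydroSHEDS_Flow) as src:
#    prof_hydroSHEDS_Flow = src.profile
#print("Flow Accumulation Profile:", prof_hydroSHEDS_Flow)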

#### Flow Length Upstream

In [None]:
!wget https://data.hydrosheds.org/file/hydrosheds-v1-lup/hyd_sa_lup_15s.zip

--2025-09-11 14:48:56--  https://data.hydrosheds.org/file/hydrosheds-v1-lup/hyd_sa_lup_15s.zip
Resolving data.hydrosheds.org (data.hydrosheds.org)... 104.21.14.61, 172.67.158.28, 2606:4700:3036::6815:e3d, ...
Connecting to data.hydrosheds.org (data.hydrosheds.org)|104.21.14.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 100988898 (96M) [application/zip]
Saving to: ‘hyd_sa_lup_15s.zip’


2025-09-11 14:49:03 (15.4 MB/s) - ‘hyd_sa_lup_15s.zip’ saved [100988898/100988898]



In [None]:
!unzip "/content/drive/MyDrive/Geodatabase/hyd_sa_lup_15s.zip" -d "/content/drive/MyDrive/Geodatabase"

Archive:  /content/drive/MyDrive/Geodatabase/hyd_sa_lup_15s.zip
 extracting: /content/drive/MyDrive/Geodatabase/hyd_sa_lup_15s.tif  
 extracting: /content/drive/MyDrive/Geodatabase/HydroSHEDS_TechDoc_v1_4.pdf  


In [None]:
!rm '/content/drive/MyDrive/Geodatabase/hyd_sa_lup_15s.zip'
!rm '/content/drive/MyDrive/Geodatabase/HydroSHEDS_TechDoc_v1_4.pdf'

path_hydroSHEDS_Lup = '/content/drive/MyDrive/Geodatabase/hyd_sa_lup_15s.tif'
with rasterio.open(path_hydroSHEDS_Lup) as src:
  hydroSHEDS_Lup_SA = src.read(1)

In [None]:
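# If you want to inspect the raster metadata, read the profile while the file is open.
#with rasterio.open(path_hydroSHEDS_Lup) as src:
#    prof_hydroSHEDS_Lup = src.profile
#print("Flow Length Upstream Profile:", prof_hydroSHEDS_Lup)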

#### Flow Length Downstream

In [None]:
!wget https://data.hydrosheds.org/file/hydrosheds-v1-ldn/hyd_sa_ldn_15s.zip

--2025-09-11 14:49:57--  https://data.hydrosheds.org/file/hydrosheds-v1-ldn/hyd_sa_ldn_15s.zip
Resolving data.hydrosheds.org (data.hydrosheds.org)... 172.67.158.28, 104.21.14.61, 2606:4700:3036::6815:e3d, ...
Connecting to data.hydrosheds.org (data.hydrosheds.org)|172.67.158.28|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 241146941 (230M) [application/zip]
Saving to: ‘hyd_sa_ldn_15s.zip’


2025-09-11 14:50:12 (17.5 MB/s) - ‘hyd_sa_ldn_15s.zip’ saved [241146941/241146941]



In [None]:
!unzip "/content/drive/MyDrive/Geodatabase/hyd_sa_ldn_15s.zip" -d "/content/drive/MyDrive/Geodatabase"

Archive:  /content/drive/MyDrive/Geodatabase/hyd_sa_ldn_15s.zip
 extracting: /content/drive/MyDrive/Geodatabase/hyd_sa_ldn_15s.tif  
 extracting: /content/drive/MyDrive/Geodatabase/HydroSHEDS_TechDoc_v1_4.pdf  


In [None]:
!rm '/content/drive/MyDrive/Geodatabase/hyd_sa_ldn_15s.zip'
!rm '/content/drive/MyDrive/Geodatabase/HydroSHEDS_TechDoc_v1_4.pdf'

path_hydroSHEDS_Ldn = '/content/drive/MyDrive/Geodatabase/hyd_sa_ldn_15s.tif'
with rasterio.open(path_hydroSHEDS_Ldn) as src:
  hydroSHEDS_Ldn_SA = src.read(1)

In [None]:
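# If you want to inspect the raster metadata, read the profile while the file is open.
#with rasterio.open(path_hydroSHEDS_Ldn) as src:
#    prof_hydroSHEDS_Ldn = src.profile
#print("Flow Length Downstream Profile:", prof_hydroSHEDS_Ldn)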
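
The five HydroSHEDS downloads above share one naming pattern: a three-letter product code and a resolution appear in both the URL and the extracted `.tif` name. The sketch below derives both strings from one table so the file that is read always matches the file that was downloaded. The helper names `hydrosheds_url` and `hydrosheds_tif` are hypothetical, and the pattern is inferred only from the five URLs used in this notebook.

```python
# Product code -> resolution, taken from the five download cells above.
HYDROSHEDS_SA_PRODUCTS = {
    "dem": "30s",  # Void Filled DEM
    "dir": "30s",  # Flow Direction
    "acc": "30s",  # Flow Accumulation
    "lup": "15s",  # Flow Length Upstream
    "ldn": "15s",  # Flow Length Downstream
}

def hydrosheds_url(code, resolution):
    """Build the download URL for a South America HydroSHEDS raster."""
    return f"https://data.hydrosheds.org/file/hydrosheds-v1-{code}/hyd_sa_{code}_{resolution}.zip"

def hydrosheds_tif(code, resolution, workspace="/content/drive/MyDrive/Geodatabase"):
    """Build the path of the extracted GeoTIFF inside the workspace folder."""
    return f"{workspace}/hyd_sa_{code}_{resolution}.tif"

for code, res in HYDROSHEDS_SA_PRODUCTS.items():
    print(hydrosheds_url(code, res), "->", hydrosheds_tif(code, res))
```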

### 5. HydroBASINS (Level 1 to 12)

In [None]:
# Create the folders where the HydroBASINS files will be downloaded, unzipped, and stored

basins_download = '/content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download'
os.makedirs(basins_download, exist_ok=True)
print(f"Folder '{basins_download}' created successfully.")

basins_stored = '/content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_stored'
os.makedirs(basins_stored, exist_ok=True)
print(f"Folder '{basins_stored}' created successfully.")

# Set the new working directory for downloading and geoprocessing the basins
%cd /content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download

Folder '/content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download' created successfully.
Folder '/content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_stored' created successfully.
/content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download


In [None]:
!wget https://data.hydrosheds.org/file/hydrobasins/standard/hybas_sa_lev01-12_v1c.zip

--2025-09-11 18:40:41--  https://data.hydrosheds.org/file/hydrobasins/standard/hybas_sa_lev01-12_v1c.zip
Resolving data.hydrosheds.org (data.hydrosheds.org)... 172.67.158.28, 104.21.14.61, 2606:4700:3036::6815:e3d, ...
Connecting to data.hydrosheds.org (data.hydrosheds.org)|172.67.158.28|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 334160720 (319M) [application/zip]
Saving to: ‘hybas_sa_lev01-12_v1c.zip’


2025-09-11 18:40:49 (41.2 MB/s) - ‘hybas_sa_lev01-12_v1c.zip’ saved [334160720/334160720]



In [None]:
!unzip "/content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download/hybas_sa_lev01-12_v1c.zip" -d "/content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download"

Archive:  /content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download/hybas_sa_lev01-12_v1c.zip
  inflating: /content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download/hybas_sa_lev01_v1c.dbf  
  inflating: /content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download/hybas_sa_lev01_v1c.prj  
  inflating: /content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download/hybas_sa_lev01_v1c.sbn  
  inflating: /content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download/hybas_sa_lev01_v1c.sbx  
  inflating: /content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download/hybas_sa_lev01_v1c.shp  
  inflating: /content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download/hybas_sa_lev01_v1c.shp.xml  
  inflating: /content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download/hybas_sa_lev01_v1c.shx  
  inflating: /content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download/hybas_sa_lev02_v1c.dbf  
  inflating: /content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_do

In [None]:
!rm '/content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download/hybas_sa_lev01-12_v1c.zip'
!rm '/content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download/HydroBASINS_TechDoc_v1c.pdf'

#### Basins Level 1 to 12

In [None]:
input_basins_sa = '/content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_download/'
output_basins_sa = '/content/drive/MyDrive/Geodatabase/HydroBASINS_Lv_01_12_stored/'

# Create output folder if it doesn't exist
os.makedirs(output_basins_sa, exist_ok=True)

drop_basins_columns = ['NEXT_DOWN','NEXT_SINK','MAIN_BAS','DIST_SINK','DIST_MAIN','SUB_AREA','UP_AREA','PFAF_ID','ENDO','COAST','ORDER','SORT']

# Loop through all shapefiles (.shp) in the download folder
for filename in os.listdir(input_basins_sa):
    if filename.endswith('.shp'):
        filepath = os.path.join(input_basins_sa, filename)
        gdf = gpd.read_file(filepath)

        # Drop unwanted columns (ignore if not present)
        gdf = gdf.drop(columns=[col for col in drop_basins_columns if col in gdf.columns])

        # Export cleaned shapefile
        output_path = os.path.join(output_basins_sa, filename)
        gdf.to_file(output_path)
        print(f"Processed and saved: {filename}")

Processed and saved: hybas_sa_lev01_v1c.shp
Processed and saved: hybas_sa_lev02_v1c.shp
Processed and saved: hybas_sa_lev03_v1c.shp
Processed and saved: hybas_sa_lev04_v1c.shp
Processed and saved: hybas_sa_lev05_v1c.shp
Processed and saved: hybas_sa_lev06_v1c.shp
Processed and saved: hybas_sa_lev07_v1c.shp
Processed and saved: hybas_sa_lev08_v1c.shp
Processed and saved: hybas_sa_lev09_v1c.shp
Processed and saved: hybas_sa_lev10_v1c.shp
Processed and saved: hybas_sa_lev11_v1c.shp
Processed and saved: hybas_sa_lev12_v1c.shp
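
`os.listdir` does not guarantee any particular order, so the levels may not always be processed from 01 to 12 as in the output above. The sketch below reads the level number from each file name and sorts by it; `basin_level` and `basin_shapefiles` are hypothetical helper names, and the pattern assumes the `hybas_sa_levNN_v1c.shp` naming shown in this notebook.

```python
import re

# Matches names such as 'hybas_sa_lev07_v1c.shp' and captures the two-digit level.
LEVEL_PATTERN = re.compile(r"^hybas_sa_lev(\d{2})_v1c\.shp$")

def basin_level(filename):
    """Return the HydroBASINS level encoded in `filename`, or None if it does not match."""
    match = LEVEL_PATTERN.match(filename)
    return int(match.group(1)) if match else None

def basin_shapefiles(filenames):
    """Return (level, filename) pairs sorted by level, skipping non-matching names."""
    pairs = [(basin_level(name), name) for name in filenames]
    return sorted(pair for pair in pairs if pair[0] is not None)

# In this notebook the list would come from the download folder:
# for level, filename in basin_shapefiles(os.listdir(input_basins_sa)): ...
```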


In [None]:
# Replace "Your_Folder_Name" with the folder you want to delete before running this cell
!rm -rf "/content/drive/MyDrive/Your_Folder_Name"

### 6. HydroRIVERS

In [16]:
# Set Directory or WorkSpace
%cd /content/drive/MyDrive/Geodatabase

/content/drive/MyDrive/Geodatabase


In [None]:
!wget https://data.hydrosheds.org/file/HydroRIVERS/HydroRIVERS_v10_sa_shp.zip

--2025-09-12 02:04:36--  https://data.hydrosheds.org/file/HydroRIVERS/HydroRIVERS_v10_sa_shp.zip
Resolving data.hydrosheds.org (data.hydrosheds.org)... 172.67.158.28, 104.21.14.61, 2606:4700:3036::ac43:9e1c, ...
Connecting to data.hydrosheds.org (data.hydrosheds.org)|172.67.158.28|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 95257204 (91M) [application/zip]
Saving to: ‘HydroRIVERS_v10_sa_shp.zip’


2025-09-12 02:04:38 (37.4 MB/s) - ‘HydroRIVERS_v10_sa_shp.zip’ saved [95257204/95257204]



In [None]:
!unzip "/content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp.zip" -d "/content/drive/MyDrive/Geodatabase"

Archive:  /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp.zip
  inflating: /content/drive/MyDrive/Geodatabase/HydroRIVERS_TechDoc_v10.pdf  
   creating: /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/
  inflating: /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa.dbf  
  inflating: /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa.prj  
  inflating: /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa.sbn  
  inflating: /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa.sbx  
  inflating: /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa.shp  
  inflating: /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa.shx  


In [None]:
hydroRIVERS = gpd.read_file('/content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa.shp')
#hydroRIVERS.head()

hydroRIVERS.drop(['NEXT_DOWN','MAIN_RIV','DIST_DN_KM','ENDORHEIC'], axis=1, inplace=True)
#hydroRIVERS.info()

# If you want to display and visualize data
#hydroRIVERS.plot(column='LENGTH_KM', figsize=(16,8))

#!rm '/content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp.zip'
#!rm '/content/drive/MyDrive/Geodatabase/HydroRIVERS_TechDoc_v10.pdf'

# Export data to Google Drive (.shp)
hydroRIVERS_path = '/content/drive/MyDrive/Geodatabase/SA_hydroRIVERS.shp'
hydroRIVERS.to_file(hydroRIVERS_path)

hydroRIVERS_shp = '/content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa'

# List of common shapefile extensions
extensions = ['.sbx','.sbn','.dbf','.shp','.shx','.prj']

for ext in extensions:
    file_path = hydroRIVERS_shp + ext
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"Deleted: {file_path}")
    else:
        print(f"File not found: {file_path}")

Deleted: /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa.sbx
Deleted: /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa.sbn
Deleted: /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa.dbf
Deleted: /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa.shp
Deleted: /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa.shx
Deleted: /content/drive/MyDrive/Geodatabase/HydroRIVERS_v10_sa_shp/HydroRIVERS_v10_sa.prj


### 7. HydroLAKES

In [18]:
!wget https://data.hydrosheds.org/file/hydrolakes/HydroLAKES_polys_v10_shp.zip

--2025-09-12 16:19:03--  https://data.hydrosheds.org/file/hydrolakes/HydroLAKES_polys_v10_shp.zip
Resolving data.hydrosheds.org (data.hydrosheds.org)... 104.21.14.61, 172.67.158.28, 2606:4700:3036::ac43:9e1c, ...
Connecting to data.hydrosheds.org (data.hydrosheds.org)|104.21.14.61|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 820295132 (782M) [application/zip]
Saving to: ‘HydroLAKES_polys_v10_shp.zip’


2025-09-12 16:19:30 (30.4 MB/s) - ‘HydroLAKES_polys_v10_shp.zip’ saved [820295132/820295132]



In [17]:
!unzip "/content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp.zip" -d "/content/drive/MyDrive/Geodatabase"

unzip:  cannot find or open /content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp.zip, /content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp.zip.zip or /content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp.zip.ZIP.
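
The message above means `unzip` did not find the archive at that path when the cell ran. A quick check before extracting can make this kind of problem easier to diagnose. The following is a minimal sketch using only the standard library; `archive_status` is a hypothetical helper name.

```python
import os
import zipfile

def archive_status(zip_path):
    """Return 'missing', 'not a zip', or 'ok' for the file at `zip_path`."""
    if not os.path.exists(zip_path):
        return "missing"
    if not zipfile.is_zipfile(zip_path):
        return "not a zip"
    return "ok"

# In this notebook the check would use the same path as the unzip command:
# print(archive_status('/content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp.zip'))
```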


In [None]:
hydroLAKES = gpd.read_file('/content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10.shp')
#hydroLAKES.head()
hydroLAKES.drop(['Hylak_id','Lake_name','Continent','Poly_src','Grand_id','Shore_len','Shore_dev','Vol_src','Slope_100','Wshd_area','Pour_long','Pour_lat'], axis=1, inplace=True)
#hydroLAKES.info()

SACB_SA = gpd.read_file('/content/drive/MyDrive/Geodatabase/SA_Countries.shp')
HydroLAKES_Intersect = hydroLAKES.overlay(SACB_SA, how='intersection')
HydroLAKES_Intersect.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 53714 entries, 0 to 53713
Data columns (total 11 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   Country_1  53714 non-null  object  
 1   Lake_type  53714 non-null  int32   
 2   Lake_area  53714 non-null  float64 
 3   Vol_total  53714 non-null  float64 
 4   Vol_res    53714 non-null  float64 
 5   Depth_avg  53714 non-null  float64 
 6   Dis_avg    53714 non-null  float64 
 7   Res_time   53714 non-null  float64 
 8   Elevation  53714 non-null  int32   
 9   Country_2  53714 non-null  object  
 10  geometry   53714 non-null  geometry
dtypes: float64(6), geometry(1), int32(2), object(2)
memory usage: 4.1+ MB
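
The `Country_1` and `Country_2` columns in the output above appear because both input layers have a `Country` column, and the overlay adds `_1` and `_2` suffixes to names shared by the two inputs. Presumably `Country_1` comes from HydroLAKES and `Country_2` from `SA_Countries.shp`. The sketch below builds a rename mapping for such pairs; `overlay_renames` and the labels `_lk` and `_sa` are hypothetical, and the 10-character limit reflects shapefile field names.

```python
def overlay_renames(columns, left_label, right_label, max_len=10):
    """Map each 'name_1'/'name_2' column pair to 'name' + label for the left/right layer."""
    names = set(columns)
    renames = {}
    for col in columns:
        if col.endswith("_1") and col[:-2] + "_2" in names:
            base = col[:-2]
            left_name, right_name = base + left_label, base + right_label
            for new_name in (left_name, right_name):
                if len(new_name) > max_len:
                    raise ValueError(f"'{new_name}' exceeds {max_len} characters")
            renames[col] = left_name
            renames[base + "_2"] = right_name
    return renames

# In this notebook the mapping could be applied before exporting:
# mapping = overlay_renames(list(HydroLAKES_Intersect.columns), "_lk", "_sa")
# HydroLAKES_Intersect = HydroLAKES_Intersect.rename(columns=mapping)
```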


In [None]:
# If you want to display and visualize data
#HydroLAKES_Intersect.plot(column='Lake_area', figsize=(16,8))

!rm '/content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp.zip'
!rm '/content/drive/MyDrive/Geodatabase/HydroLAKES_TechDoc_v10.pdf'

# Export data to Google Drive (.shp)
hydroLAKES_path = '/content/drive/MyDrive/Geodatabase/SA_hydroLAKES.shp'
HydroLAKES_Intersect.to_file(hydroLAKES_path)

hydroLAKES_shp = '/content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10'

# List of common shapefile extensions
extensions = ['.sbx','.sbn','.dbf','.shp','.shx','.prj']

for ext in extensions:
    file_path = hydroLAKES_shp + ext
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"Deleted: {file_path}")
    else:
        print(f"File not found: {file_path}")

Deleted: /content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10.sbx
Deleted: /content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10.sbn
Deleted: /content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10.dbf
Deleted: /content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10.shp
Deleted: /content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10.shx
Deleted: /content/drive/MyDrive/Geodatabase/HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10.prj
