### This file describes the grid that we use for our API, which uses Planet imagery to make MOSAIK features. 

For simplicity of use, we're using an equal-angle grid with a resolution of 0.01 degrees.
One degree of latitude is roughly 111 km, so the height of a grid cell is $\approx 0.01 \times 111 = 1.11$ km everywhere. The width of a grid cell changes with latitude $\theta$ and is $\approx 0.01 \times 111 \times \cos(\theta)$ km. Thus, all grid cells are squares in angle space and rectangles in meter space, with heights (in the N-S direction) longer than widths (in the E-W direction) everywhere except at the equator, where the two are equal.
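
As a quick illustration of the formulas above, here is a minimal sketch that evaluates the cell height and width at a few latitudes. The function name `cell_size_km` and the constant `KM_PER_DEGREE` are hypothetical names chosen for this example, and 111 km per degree is the same approximation used in the text.

```python
import math

KM_PER_DEGREE = 111.0  # approximate length of one degree of latitude (assumption from the text)

def cell_size_km(lat_deg, res=0.01):
    """Return (height_km, width_km) of a grid cell centered at latitude lat_deg."""
    height = res * KM_PER_DEGREE
    width = res * KM_PER_DEGREE * math.cos(math.radians(lat_deg))
    return height, width

for lat in (0.0, 45.0, 73.995):
    h, w = cell_size_km(lat)
    print(f"lat {lat:7.3f}: height {h:.2f} km, width {w:.2f} km")
```

The height is constant, while the width shrinks toward the poles in proportion to $\cos(\theta)$.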

The grid spans -60 to 74 degrees latitude to encompass nearly all global land surface area. The full grid is defined by a vector of longitudes and a vector of latitudes, which when crossed define the full grid. The values within the vectors are the grid cell centroids. The grid "origin" is the grid cell whose NW corner is at -180 longitude, 74 latitude. The grid cells are then indexed W to E and N to S.
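
The indexing convention can be sketched as a pair of helper functions that map between a cell's (row, column) index and its centroid. This is only an illustration: the names `centroid_from_index` and `index_from_point` are hypothetical, and zero-based indices are an assumption, since the text does not say whether counting starts at 0 or 1.

```python
import math

LAT_MAX = 74.0    # northern edge of the grid (from the text)
LON_MIN = -180.0  # western edge of the grid (from the text)
RES = 0.01        # grid resolution in degrees

def centroid_from_index(row, col, res=RES):
    """Centroid (lat, lon) of the cell at zero-based row (N to S) and col (W to E)."""
    lat = LAT_MAX - (row + 0.5) * res
    lon = LON_MIN + (col + 0.5) * res
    return round(lat, 3), round(lon, 3)

def index_from_point(lat, lon, res=RES):
    """Zero-based (row, col) of the cell containing the point (lat, lon).

    Points lying exactly on a cell edge are sensitive to floating-point rounding.
    """
    row = math.floor((LAT_MAX - lat) / res)
    col = math.floor((lon - LON_MIN) / res)
    return row, col

print(centroid_from_index(0, 0))          # the origin cell
print(index_from_point(73.995, -179.995))  # back to the origin cell
```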

In [1]:
%env OMP_NUM_THREADS=24

env: OMP_NUM_THREADS=24


In [2]:
import math
import numpy as np

import pandas as pd
import geopandas as gpd
from shapely.prepared import prep
from shapely.geometry import Point

In [3]:
def log_text(text, mode="a"): # Helps track the job status via a text file
    """
    Prints text and writes it to a log file. Takes text and mode as inputs. Default mode is "a" for append. Also accepts "w", which clears the log first.
    """
    print(text)
    with open("log.txt", mode) as f: # the with-block closes the file even if the write fails
        f.write("\n" + text)
    
log_text("BEGIN LOG", "w")

BEGIN LOG


In [4]:
def make_grid(latmin, latmax, lonmin, lonmax, res = 0.01):
    # Number of cells along each axis. Building the values from integer indices
    # avoids the rounding drift of repeatedly adding res in a loop, which could
    # otherwise decide whether the last cell is included.
    nLat = int(round((latmax - latmin) / res))
    nLon = int(round((lonmax - lonmin) / res))
    
    # Grid cell centers: latitudes run N to S from latmax, longitudes W to E from lonmin
    latVals = latmax - (np.arange(nLat) + 0.5) * res
    lonVals = lonmin + (np.arange(nLon) + 0.5) * res
    
    grid = {"lat" : latVals,
           "lon" : lonVals}
    
    #Return the grid in degrees: 
    return grid

In [5]:
grid = make_grid(-60, 74, -180, 180)

In [6]:
grid

{'lat': array([ 73.995,  73.985,  73.975, ..., -59.975, -59.985, -59.995]),
 'lon': array([-179.995, -179.985, -179.975, ...,  179.975,  179.985,  179.995])}
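
One thing to watch when building a 0.01-degree axis is floating-point drift: adding 0.01 over and over tends to accumulate rounding error, while computing each value from its integer index does not. The following pure-Python sketch compares the two approaches for the longitude axis. It is an illustration only, and the variable names are hypothetical.

```python
RES = 0.01
N_LON = 36000  # number of 0.01-degree cells between -180 and 180 degrees

# Approach 1: start at the first centroid and keep adding the resolution
accumulated = []
value = -180.0 + RES / 2
for _ in range(N_LON):
    accumulated.append(value)
    value += RES

# Approach 2: compute every centroid directly from its integer index
indexed = [-180.0 + (i + 0.5) * RES for i in range(N_LON)]

drift = max(abs(a - b) for a, b in zip(accumulated, indexed))
print(f"largest difference between the two approaches: {drift:.3e} degrees")
print(f"last centroid, accumulated: {accumulated[-1]!r}")
print(f"last centroid, indexed:     {indexed[-1]!r}")
```

Any differences are far below the 0.001-degree precision of the centroids, but they can decide whether a loop bound that sits exactly on a grid line includes or excludes the final cell.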

### Subgrid 1, sparse global 10

In [7]:
# Take every 10th point along each axis. After subsetting to land, this should give closer to 1 million points, as desired.
subLats10 = grid['lat'][::10]
subLons10 = grid['lon'][::10]

log_text("Number of points in grid sampled by every 10: " + str( len(subLats10) * len(subLons10)))

Number of points in grid sampled by every 10: 4824000
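
The logged count can be reproduced by hand. The sketch below assumes the full axes hold 13400 latitudes and 36000 longitudes (134 and 360 degrees divided by 0.01) and that a `[::10]` slice keeps indices 0, 10, 20, and so on. The variable names are hypothetical.

```python
import math

N_LAT, N_LON = 13400, 36000  # axis lengths of the full 0.01-degree grid
STEP = 10                    # keep every 10th value along each axis

# A slice with stride STEP keeps ceil(n / STEP) elements
n_sub_lat = math.ceil(N_LAT / STEP)
n_sub_lon = math.ceil(N_LON / STEP)
print(n_sub_lat, n_sub_lon, n_sub_lat * n_sub_lon)

# The first few latitude centroids that survive the subsampling
first_lats = [round(74.0 - (i + 0.5) * 0.01, 3) for i in range(0, 3 * STEP, STEP)]
print(first_lats)
```

The sampled centroids are 0.1 degrees apart, but each one still refers to a 0.01-degree grid cell.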


### Subgrid 2, sparse global and limited to landmass

Keep only the elements of this subgrid that are over land and toss out the rest. This is done by loading a shapefile of global coastlines and keeping only the tiles whose centroid falls within one of its land polygons.

The function first creates a full grid by crossing the lat and lon lists. Then we intersect each polygon shape object with the full set of grid centroid points, keeping only the intersection points.
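
To make the idea concrete, here is a small pure-Python sketch of the same cross-then-filter logic on a toy example. It is not the notebook's implementation: it swaps in a simple ray-casting test for the shapely `contains` call, and the square "island" and the function name `point_in_polygon` are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside polygon.

    polygon is a list of (lon, lat) vertices; the last vertex connects back
    to the first. Points exactly on the boundary are not handled specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Hypothetical "island": a square from (0, 0) to (1, 1) degrees
island = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

lats = [1.25, 0.75, 0.25, -0.25]  # N to S
lons = [-0.25, 0.25, 0.75, 1.25]  # W to E

# Cross the two vectors and keep only the centroids that fall on land
land = [(lat, lon) for lat in lats for lon in lons
        if point_in_polygon(lon, lat, island)]
print(land)
```

Only the four centroids inside the square survive; the twelve surrounding "ocean" centroids are dropped.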

Finally, I save the subgrid as a csv.
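
When writing the CSV, it helps to format the coordinates to three decimals, because centroids produced by float arithmetic can carry tiny rounding errors. The sketch below shows the effect with the standard-library `csv` module and the printf-style format `"%.3f"`; the sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical centroids; the second row carries a tiny float error on purpose
rows = [(73.995, -179.995), (73.895 - 1e-13, 0.005 + 1e-13)]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["lat", "lon"])
for lat, lon in rows:
    # "%.3f" rounds to exactly three decimals, hiding the float noise
    writer.writerow(["%.3f" % lat, "%.3f" % lon])

csv_text = buf.getvalue()
print(csv_text)
```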

In [8]:
# Import continental land masses and ocean islands, except Antarctica.
# GSHHG database from https://www.soest.hawaii.edu/pwessel/gshhg/

landmassLoRes = gpd.read_file("/shares/maps100/data/raw/grid/landmass/GSHHS_shp/c/GSHHS_c_L1.shp") #This reads a "crude" resolution file. Takes about 2 hours to run.
landmassIntermediateRes = gpd.read_file("/shares/maps100/data/raw/grid/landmass/GSHHS_shp/i/GSHHS_i_L1.shp") #Read this file for "intermediate" resolution. Guessing this might take 36 hours to run

In [9]:
#landmassLoRes.plot() # You can use this to visualize earth landmasses if interested

In [10]:
def landmassOnlyGrid(lats, lons, gpdFile):
    """
    Takes a grid in the form of two arrays, lats and lons. First, crosses the arrays into a flat grid of points.
    Then checks which points fall inside the polygons of the supplied GeoDataFrame.
    
    Returns a pandas DataFrame with "lat" and "lon" columns that holds only the points contained in the GeoDataFrame's geometries.
    
    Needs shapely, geopandas, pandas, and numpy.
    """
    gpdFile["preped"] = gpdFile["geometry"].apply(prep) # prepare the geometry to improve speed
    
    grid_lats, grid_lons = np.meshgrid(lats, lons) # Create grid from input arrays
    flat_lats = grid_lats.flatten() #Making two arrays that together correspond to all of the grid points
    flat_lons = grid_lons.flatten()
    
    points = [Point((flat_lons[i], flat_lats[i])) for i in range(len(flat_lats))] # turn each point into a Shapely object
    
    total = str(len(gpdFile))
    hits = [] # start empty so the function also works when gpdFile has no rows
    for i in range(len(gpdFile)):
        log_text("Loop status: " + str(i) + " out of " + total)
        prepared_polygon = gpdFile["preped"][i]

        # Keep the points that fall inside this polygon
        hits.extend(filter(prepared_polygon.contains, points))

    output_lons = [point.x for point in hits]
    output_lats = [point.y for point in hits]

    landGridFlat = {    # Flat grid: one lat/lon pair per land point
        "lat" : output_lats,
        "lon" : output_lons,
        }
    
    return pd.DataFrame(landGridFlat) # The output is currently not ordered; skipping the sort here improved runtime.


In [11]:
## Run for the grid sampled at one in every 10 points


landmassDF = landmassOnlyGrid(subLats10, subLons10, landmassLoRes) # Maybe 3 hours to run?
landmassDF = landmassDF.sort_values(["lat","lon"])
log_text("Number of points in LoRes landmass grid 10: " + str(len(landmassDF)))

outPath = "/shares/maps100/data/output/grid/LandmassLoResSparseGrid10.csv"

landmassDF.to_csv(outPath, float_format = "%.3f", index=False) # Writes each value with exactly 3 decimals instead of the float approximation pandas holds in memory

log_text("LoRes grid10 complete!")

Loop status: 0 out of 742
Loop status: 1 out of 742
Loop status: 2 out of 742
Loop status: 3 out of 742
Loop status: 4 out of 742
...

In [12]:
## Run for the grid sampled at one in every 10 points, using the intermediate-resolution landmass file

landmassDF = landmassOnlyGrid(subLats10, subLons10, landmassIntermediateRes) # Maybe 8 hours to run?
landmassDF = landmassDF.sort_values(["lat","lon"])
log_text("Number of points in IntermediateRes landmass grid 10: " + str(len(landmassDF)))

outPath = "/shares/maps100/data/output/grid/LandmassIntermediateResSparseGrid10.csv"

landmassDF.to_csv(outPath, float_format = "%.3f", index=False) # Writes each value with exactly 3 decimals instead of the float approximation pandas holds in memory

log_text("IntermediateRes grid10 complete!")

Loop status: 0 out of 32830
Loop status: 1 out of 32830
Loop status: 2 out of 32830
Loop status: 3 out of 32830
Loop status: 4 out of 32830
Loop status: 5 out of 32830
Loop status: 6 out of 32830
Loop status: 7 out of 32830
Loop status: 8 out of 32830
Loop status: 9 out of 32830
Loop status: 10 out of 32830
Loop status: 11 out of 32830
Loop status: 12 out of 32830
Loop status: 13 out of 32830
Loop status: 14 out of 32830
Loop status: 15 out of 32830
Loop status: 16 out of 32830
Loop status: 17 out of 32830
Loop status: 18 out of 32830
Loop status: 19 out of 32830
Loop status: 20 out of 32830
Loop status: 21 out of 32830
Loop status: 22 out of 32830
Loop status: 23 out of 32830
Loop status: 24 out of 32830
Loop status: 25 out of 32830
Loop status: 26 out of 32830
Loop status: 27 out of 32830
Loop status: 28 out of 32830
Loop status: 29 out of 32830
Loop status: 30 out of 32830
Loop status: 31 out of 32830
Loop status: 32 out of 32830
Loop status: 33 out of 32830
Loop status: 34 out of 3