Update documentation
Kontinuation committed Mar 21, 2024
1 parent 1d29bdb commit 5aea3b1
Showing 2 changed files with 52 additions and 5 deletions.
19 changes: 14 additions & 5 deletions docs/setup/compile.md
@@ -73,11 +73,20 @@ For example,
```
export SPARK_HOME=$PWD/spark-3.0.1-bin-hadoop2.7
export PYTHONPATH=$SPARK_HOME/python
```
2. Put the JAI jars in the ==SPARK_HOME/jars/== folder.
```
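# The JAI (Java Advanced Imaging) jars below are needed by the GeoTools-based raster
# functionality and are not bundled with Spark, so they must be added manually.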
export JAI_CORE_VERSION="1.1.3"
export JAI_CODEC_VERSION="1.1.3"
export JAI_IMAGEIO_VERSION="1.1"
wget -P $SPARK_HOME/jars/ https://repo.osgeo.org/repository/release/javax/media/jai_core/${JAI_CORE_VERSION}/jai_core-${JAI_CORE_VERSION}.jar
wget -P $SPARK_HOME/jars/ https://repo.osgeo.org/repository/release/javax/media/jai_codec/${JAI_CODEC_VERSION}/jai_codec-${JAI_CODEC_VERSION}.jar
wget -P $SPARK_HOME/jars/ https://repo.osgeo.org/repository/release/javax/media/jai_imageio/${JAI_IMAGEIO_VERSION}/jai_imageio-${JAI_IMAGEIO_VERSION}.jar
```
3. Compile the Sedona Scala and Java code with `-Dgeotools` and then copy the ==sedona-spark-shaded-{{ sedona.current_version }}.jar== to the ==SPARK_HOME/jars/== folder.
```
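# Illustrative build command (assumed flags; see the compile instructions earlier in this guide):
mvn clean install -DskipTests -Dgeotools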
cp spark-shaded/target/sedona-spark-shaded-xxx.jar $SPARK_HOME/jars/
```
4. Install the following libraries
```
sudo apt-get -y install python3-pip python-dev libgeos-dev
sudo pip3 install -U setuptools
@@ -86,20 +95,20 @@
sudo pip3 install -U virtualenvwrapper
sudo pip3 install -U pipenv
```
On macOS, Homebrew can be used to install GEOS: `brew install geos`
5. Set up pipenv with the desired Python version: 3.7, 3.8, or 3.9
```
cd python
pipenv --python 3.7
```
6. Install PySpark and the other dependencies
```
cd python
pipenv install pyspark
pipenv install --dev
```
`pipenv install pyspark` installs the latest version of PySpark.
To stay consistent with the installed Spark version, use `pipenv install pyspark==<spark_version>`
7. Run the Python tests
```
cd python
pipenv run python setup.py build_ext --inplace
```
38 changes: 38 additions & 0 deletions docs/tutorial/raster.md
@@ -583,6 +583,44 @@ SELECT RS_AsPNG(raster)

Please refer to [Raster writer docs](../../api/sql/Raster-writer) for more details.

## Collecting raster DataFrames and working with them locally in Python

Since `v1.6.0`, Sedona allows collecting DataFrames with raster columns and working with them locally in Python.
The raster objects are represented as `SedonaRaster` objects in Python, which can be used to perform raster operations.

```python
df_raster = sedona.read.format("binaryFile").load("/path/to/raster.tif").selectExpr("RS_FromGeoTiff(content) as rast")
rows = df_raster.collect()
raster = rows[0].rast
raster # <sedona.raster.sedona_raster.InDbSedonaRaster at 0x1618fb1f0>
```

You can retrieve the metadata of the raster by accessing the properties of the `SedonaRaster` object.

```python
raster.width # width of the raster
raster.height # height of the raster
raster.affine_trans # affine transformation matrix
raster.crs_wkt # coordinate reference system as WKT
```
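
For example, the WKT string in `crs_wkt` can be parsed with a CRS library. A minimal sketch, assuming `pyproj` is available in the environment:

```python
from pyproj import CRS

# Parse the WKT string exposed by the SedonaRaster object
crs = CRS.from_wkt(raster.crs_wkt)
print(crs.to_epsg())  # e.g. 4326 for a raster in WGS 84
```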

You can get a numpy array containing the band data of the raster using the `as_numpy` or `as_numpy_masked` method. The
band data is organized in CHW (channel, height, width) order.

```python
raster.as_numpy() # numpy array of the raster
raster.as_numpy_masked() # numpy array with nodata values masked as nan
```
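
As a sketch using plain NumPy (not part of the Sedona API), the array returned by `as_numpy_masked` can be passed directly to NumPy routines, for example to compute per-band statistics:

```python
import numpy as np

data = raster.as_numpy_masked()              # shape (bands, height, width), nodata as nan
band_means = np.nanmean(data, axis=(1, 2))   # per-band mean, ignoring nodata pixels
band_ranges = np.nanmax(data, axis=(1, 2)) - np.nanmin(data, axis=(1, 2))
```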

If you want to work with the raster data using `rasterio`, you can retrieve a `rasterio.DatasetReader` object using the
`as_rasterio` method.

```python
ds = raster.as_rasterio() # rasterio.DatasetReader object
# Work with the raster using rasterio
band1 = ds.read(1) # read the first band
```
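
Because `ds` is a `rasterio.DatasetReader`, its usual dataset attributes are available as well, for example:

```python
print(ds.crs)        # coordinate reference system of the dataset
print(ds.transform)  # affine transform mapping pixel to world coordinates
print(ds.bounds)     # spatial extent as (left, bottom, right, top)
```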

## Performance optimization

When working with large raster datasets, refer to the [documentation on storing raster geometries in Parquet format](../storing-blobs-in-parquet) for recommendations on optimizing performance.
