Update documentation
Kontinuation committed Mar 21, 2024
1 parent 1d29bdb commit 5aea3b1
Showing 2 changed files with 52 additions and 5 deletions.
19 changes: 14 additions & 5 deletions docs/setup/compile.md
@@ -73,11 +73,20 @@ For example,
```
export SPARK_HOME=$PWD/spark-3.0.1-bin-hadoop2.7
export PYTHONPATH=$SPARK_HOME/python
```
2. Put the JAI jars in the ==SPARK_HOME/jars/== folder.
```
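# The JAI (Java Advanced Imaging) jars below are needed by the GeoTools-based raster
# functionality and are not bundled with Spark, so they must be added manually.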
export JAI_CORE_VERSION="1.1.3"
export JAI_CODEC_VERSION="1.1.3"
export JAI_IMAGEIO_VERSION="1.1"
wget -P $SPARK_HOME/jars/ https://repo.osgeo.org/repository/release/javax/media/jai_core/${JAI_CORE_VERSION}/jai_core-${JAI_CORE_VERSION}.jar
wget -P $SPARK_HOME/jars/ https://repo.osgeo.org/repository/release/javax/media/jai_codec/${JAI_CODEC_VERSION}/jai_codec-${JAI_CODEC_VERSION}.jar
wget -P $SPARK_HOME/jars/ https://repo.osgeo.org/repository/release/javax/media/jai_imageio/${JAI_IMAGEIO_VERSION}/jai_imageio-${JAI_IMAGEIO_VERSION}.jar
```
3. Compile the Sedona Scala and Java code with `-Dgeotools` and then copy the ==sedona-spark-shaded-{{ sedona.current_version }}.jar== to the ==SPARK_HOME/jars/== folder.
```
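# Illustrative build command (assumed flags; see the compile instructions earlier in this guide):
mvn clean install -DskipTests -Dgeotools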
cp spark-shaded/target/sedona-spark-shaded-xxx.jar $SPARK_HOME/jars/
```
4. Install the following libraries
```
sudo apt-get -y install python3-pip python-dev libgeos-dev
sudo pip3 install -U setuptools
@@ -86,20 +95,20 @@
sudo pip3 install -U virtualenvwrapper
sudo pip3 install -U pipenv
```
On macOS, Homebrew can be used to install GEOS: `brew install geos`
5. Set up pipenv with the desired Python version: 3.7, 3.8, or 3.9
```
cd python
pipenv --python 3.7
```
6. Install PySpark and the other dependencies
```
cd python
pipenv install pyspark
pipenv install --dev
```
`pipenv install pyspark` installs the latest version of PySpark.
To stay consistent with the installed Spark version, use `pipenv install pyspark==<spark_version>`
7. Run the Python tests
```
cd python
pipenv run python setup.py build_ext --inplace
```
38 changes: 38 additions & 0 deletions docs/tutorial/raster.md
@@ -583,6 +583,44 @@ SELECT RS_AsPNG(raster)

Please refer to [Raster writer docs](../../api/sql/Raster-writer) for more details.

## Collecting raster DataFrames and working with them locally in Python

Since `v1.6.0`, Sedona allows collecting DataFrames with raster columns and working with them locally in Python.
The raster objects are represented as `SedonaRaster` objects in Python, which can be used to perform raster operations.

```python
df_raster = sedona.read.format("binaryFile").load("/path/to/raster.tif").selectExpr("RS_FromGeoTiff(content) as rast")
rows = df_raster.collect()
raster = rows[0].rast
raster # <sedona.raster.sedona_raster.InDbSedonaRaster at 0x1618fb1f0>
```

You can retrieve the metadata of the raster by accessing the properties of the `SedonaRaster` object.

```python
raster.width # width of the raster
raster.height # height of the raster
raster.affine_trans # affine transformation matrix
raster.crs_wkt # coordinate reference system as WKT
```
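
For example, the WKT string in `crs_wkt` can be parsed with a CRS library. A minimal sketch, assuming `pyproj` is available in the environment:

```python
from pyproj import CRS

# Parse the WKT string exposed by the SedonaRaster object
crs = CRS.from_wkt(raster.crs_wkt)
print(crs.to_epsg())  # e.g. 4326 for a raster in WGS 84
```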

You can get a numpy array containing the band data of the raster using the `as_numpy` or `as_numpy_masked` method. The
band data is organized in CHW (channel, height, width) order.

```python
raster.as_numpy() # numpy array of the raster
raster.as_numpy_masked() # numpy array with nodata values masked as nan
```
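
As a sketch using plain NumPy (not part of the Sedona API), the array returned by `as_numpy_masked` can be passed directly to NumPy routines, for example to compute per-band statistics:

```python
import numpy as np

data = raster.as_numpy_masked()              # shape (bands, height, width), nodata as nan
band_means = np.nanmean(data, axis=(1, 2))   # per-band mean, ignoring nodata pixels
band_ranges = np.nanmax(data, axis=(1, 2)) - np.nanmin(data, axis=(1, 2))
```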

If you want to work with the raster data using `rasterio`, you can retrieve a `rasterio.DatasetReader` object using the
`as_rasterio` method.

```python
ds = raster.as_rasterio() # rasterio.DatasetReader object
# Work with the raster using rasterio
band1 = ds.read(1) # read the first band
```
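
Because `ds` is a `rasterio.DatasetReader`, its usual dataset attributes are available as well, for example:

```python
print(ds.crs)        # coordinate reference system of the dataset
print(ds.transform)  # affine transform mapping pixel to world coordinates
print(ds.bounds)     # spatial extent as (left, bottom, right, top)
```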

## Performance optimization

When working with large raster datasets, refer to the [documentation on storing raster geometries in Parquet format](../storing-blobs-in-parquet) for recommendations on optimizing performance.
