# Introduction to GeoST

This quick introduction covers some of the key concepts and basic features of `GeoST` to help you get started. `GeoST` builds heavily on the popular data science libraries [Pandas](https://pandas.pydata.org/docs/index.html) and [GeoPandas](https://geopandas.org/en/stable/index.html), and provides ready-to-use implementations of frequently used selections on data held in [DataFrame](https://pandas.pydata.org/docs/reference/frame.html) or [GeoDataFrame](https://geopandas.org/en/stable/docs/reference/geodataframe.html) objects. This makes GeoST easy to use for less experienced Python users, while more experienced users can directly access the underlying DataFrames to develop their own functionality.

GeoST is designed to work with many different kinds of subsurface data that are available in the Netherlands. GeoST is a work in progress and aims to support an increasing number of data sources. Below is a list of the data sources that are currently supported or planned:

**From local files**:
- Tabular borehole, CPT and other data (.parquet, .csv)
- Geological boreholes xml (BHR-G)
- Geotechnical boreholes xml (BHR-GT)
- Pedological boreholes xml (BHR-P)
- Cone Penetration Test xml/gef (CPT)
- Pedological soil profile descriptions xml (SFR)
- BORIS (TNO borehole description software) xml

**Directly from the [BRO REST-API](https://www.bro-productomgeving.nl/bpo/latest/url-s-publieke-rest-services)**:
- BHR-G
- BHR-GT
- BHR-P
- CPT
- SFR

**BRO models**:
- GeoTOP: from local NetCDF or directly via [OPeNDAP server](https://dinodata.nl/opendap/)

**Planned**:
- BRO/PDOK geopackages: [BHR-G](https://service.pdok.nl/bzk/bro-geologisch-booronderzoek/atom/index.xml), [BHR-GT](https://service.pdok.nl/bzk/bro-geotechnischbooronderzoek/atom/v1_0/index.xml), [BHR-P](https://service.pdok.nl/bzk/brobhrpvolledigeset/atom/v1_1/index.xml), [CPT](https://service.pdok.nl/bzk/brocptvolledigeset/atom/v1_0/index.xml), [SFR](https://service.pdok.nl/bzk/bodem/bro-wandonderzoek/atom/index.xml)
- Well logs LAS/ASCII
- REGIS II
- DINO xml geological boreholes
- BHR-G gef
- Soil map of the Netherlands

Support is also planned for several geophysical data sources, such as seismic, ERT and EM data.

## Concept
At its core, `GeoST` handles data in so-called `Collection` objects, which hold all the spatial information of a data source in a **"header"** attribute and the corresponding data in a **"data"** attribute. For example, a set of 100 boreholes is held in a `BoreholeCollection`, where the **"header"** contains one row per borehole with its ID, location, surface level and depths, and the **"data"** contains the information of each described layer. When working with a `Collection`, GeoST automatically keeps the two attributes aligned, so that each data entry occurs in both the **"header"** and the **"data"**. For example, when a user removes an individual borehole from the **"header"**, the `Collection` ensures it is removed from the **"data"** as well.
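The bookkeeping described above can be illustrated with a minimal, plain-Python sketch. The `ToyCollection` class, its methods and the example values are hypothetical and only mimic the idea of a header and data that stay aligned; the real `Collection` classes in GeoST are built on (Geo)DataFrames and are not implemented like this.

```python
from dataclasses import dataclass


@dataclass
class ToyCollection:
    """Hypothetical, simplified stand-in for a GeoST collection (not the real class)."""

    header: list  # one dict per borehole: id ("nr"), x, y, surface level
    data: list    # one dict per described layer, linked to the header by "nr"

    def ids(self) -> set:
        return {row["nr"] for row in self.header}

    def drop_boreholes(self, nrs) -> None:
        # Remove boreholes from the header, then keep only data rows that still
        # belong to a header entry, so both attributes stay aligned.
        drop = set(nrs)
        self.header = [row for row in self.header if row["nr"] not in drop]
        keep = self.ids()
        self.data = [row for row in self.data if row["nr"] in keep]

    def is_aligned(self) -> bool:
        # True when the header and the data refer to exactly the same boreholes.
        return self.ids() == {row["nr"] for row in self.data}


# Hypothetical example values (illustrative only).
collection = ToyCollection(
    header=[
        {"nr": "B1", "x": 140000.0, "y": 455000.0, "surface": 1.2},
        {"nr": "B2", "x": 140100.0, "y": 455200.0, "surface": 0.8},
    ],
    data=[
        {"nr": "B1", "top": 0.0, "bottom": 1.5, "lith": "K"},
        {"nr": "B1", "top": 1.5, "bottom": 4.0, "lith": "Z"},
        {"nr": "B2", "top": 0.0, "bottom": 2.0, "lith": "V"},
    ],
)

collection.drop_boreholes(["B2"])
print(len(collection.header), len(collection.data), collection.is_aligned())  # 1 2 True
```

Removing "B2" from the header also removes its single layer from the data, which is the behaviour the `Collection` concept guarantees.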

<div class="alert alert-info">
User guide
    
For a more detailed explanation of the types of GeoST objects for different sources of data, check the [Data structures](../user_guide/data_structures.md) in the user guide.
</div>

## The basics
### BoreholeCollection
Data is usually loaded through various reader functions (see the [API reference](../api_reference/io.rst)). For this tutorial, `GeoST` provides a set of example boreholes from the Utrecht Science Park area that can be loaded directly as a `BoreholeCollection`. Let's load the data, print the result and plot the borehole locations to get an idea of where we are:

```python
import geost

usp_boreholes = geost.data.boreholes_usp()
print(usp_boreholes)
usp_boreholes.header.explore()  # Interactive plot of the borehole locations.
```

The output shows that `usp_boreholes` is a [BoreholeCollection](../api_reference/borehole_collection.rst). It also shows `# header = 67`, which means that the collection contains 67 boreholes in total. This refers to the first key attribute of a collection: the "header" attribute.

As described in the Concept section, the "header" attribute of a `BoreholeCollection` contains the general information about each borehole, such as the ID, the x- and y-coordinates and further metadata. It also contains a geometry object for each borehole, which enables the spatial selections and exports to GIS formats that `GeoST` provides. The header attribute is a [GeoPandas GeoDataFrame](https://geopandas.org/en/stable/docs/reference/geodataframe.html) instance. Let's print it to see what it looks like:

```python
print(usp_boreholes.header)
```

Because the header is a `GeoDataFrame`, all GeoDataFrame methods are directly available. This is how the interactive plot of the borehole locations above was created with the `.explore()` method. More experienced Python users can use the header for any custom operation they would normally do with a GeoDataFrame.

The other key attribute of a collection is the "data" attribute, which is a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/frame.html) instance. It contains the actual logged data (i.e. the layer descriptions) of the boreholes. In this case, the "data" attribute contains "layered" data: the boreholes are described in layers (i.e. depth intervals over which the properties are uniform), each with a "top" and a "bottom". Let's see what it looks like:

```python
print(usp_boreholes.data)
```

The "data" attribute likewise gives direct access to all DataFrame methods, so more experienced Python users can use it for any custom operation. In this collection, the "data" attribute has 32 columns that hold the relevant borehole data, describing characteristics such as lithology, sand grain size and plant remains.
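Because each row describes a depth interval, layered data is queried by interval rather than by point. The minimal plain-Python sketch below assumes that "top" and "bottom" are depths in metres below the surface (positive downwards) and uses hypothetical layers and lithology codes; it only shows how a layer thickness and the value at a given depth follow from these two columns, not how GeoST computes them.

```python
# Hypothetical layered data for one borehole (depths in m below surface, assumed).
layers = [
    {"top": 0.0, "bottom": 1.5, "lith": "K"},
    {"top": 1.5, "bottom": 2.1, "lith": "V"},
    {"top": 2.1, "bottom": 6.0, "lith": "Z"},
]


def thickness(layer):
    # A layer's thickness is the difference between its bottom and top depth.
    return layer["bottom"] - layer["top"]


def value_at_depth(layers, depth, column):
    # Each layer covers [top, bottom); its properties are uniform over that interval.
    for layer in layers:
        if layer["top"] <= depth < layer["bottom"]:
            return layer[column]
    return None  # depth lies outside the described interval


total_length = sum(thickness(layer) for layer in layers)
print(round(total_length, 2))               # 6.0
print(value_at_depth(layers, 1.8, "lith"))  # V
```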

### Positional reference
As mentioned above, a collection holds all spatial information about the data, both horizontal and vertical. The references in use can be accessed through the "horizontal_reference" and "vertical_reference" attributes:

```python
print(usp_boreholes.vertical_reference)
print(usp_boreholes.horizontal_reference)
```

The references can also be changed to reproject the data, for example from Dutch "Rijksdriehoekstelsel" (RD) coordinates to WGS 84 coordinates, or from the Dutch vertical datum "NAP" to a "Mean Sea Level" plane. A reprojection automatically updates the coordinates in the collection. Let's change the horizontal reference of `usp_boreholes` and check the "header" again to see this:

```python
usp_boreholes.change_horizontal_reference(4326)  # Change from RD to WGS 84
print(usp_boreholes.header, usp_boreholes.horizontal_reference, sep="\n")
```

Note that the "x" and "y" columns now contain longitude and latitude coordinates, respectively.
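At its core, a change of vertical reference is simple arithmetic on the layer boundaries. The sketch below assumes layer depths in metres below the surface (positive downwards) and a surface level in m NAP taken from the header; the function names and values are hypothetical and do not reflect how GeoST implements its vertical references.

```python
def depth_to_nap(surface_nap, depth):
    """Hypothetical helper: depth below surface (m, positive down) -> elevation in m NAP."""
    return surface_nap - depth


def nap_to_depth(surface_nap, elevation_nap):
    """Hypothetical helper: elevation in m NAP -> depth below surface (m, positive down)."""
    return surface_nap - elevation_nap


surface = 1.25  # hypothetical surface level in m NAP, as stored in a header
layer = {"top": 0.5, "bottom": 2.0}  # hypothetical layer boundaries in m below surface

layer_nap = {key: depth_to_nap(surface, value) for key, value in layer.items()}
print(layer_nap)  # {'top': 0.75, 'bottom': -0.75}
```

Because the top of a layer lies closer to the surface, it ends up with the higher NAP elevation; applying `nap_to_depth` restores the original depths.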

### Selections and slices
There are several ways to make subsets of a collection, such as:

**Spatial selections**
- `select_within_bbox` - Select data points in the collection within a bounding box
- `select_with_points` - Select data points in the collection within a given distance of point geometries
- `select_with_lines` - Select data points in the collection within a given distance of line geometries
- `select_within_polygons` - Select data points in the collection within polygon geometries

**Conditional selections**
- `select_by_values` - Select data points in the collection based on the presence of certain values in one or more of the data columns
- `select_by_length` - Select data points in the collection based on length requirements
- `select_by_depth` - Select data points in the collection based on depth constraints

**Slicing**
- `slice_depth_interval` - Slice the boreholes in the collection to a specified depth interval
- `slice_by_values` - Slice the boreholes in the collection based on values (e.g. keep only sand layers and remove all others)

This quick start does not go through each of these methods; see the [API Reference](../) for more details.
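To give an idea of what such operations do, here is a minimal plain-Python sketch of a bounding-box selection, a selection by values and a depth slice on hypothetical boreholes. The helper names, parameters and data are illustrative only and are not GeoST's API; the actual method signatures are documented in the API Reference. Each helper filters the header and the data together, so that both stay aligned.

```python
# Hypothetical example boreholes (RD-like coordinates in m, depths in m below surface).
header = [
    {"nr": "B1", "x": 140000.0, "y": 455000.0},
    {"nr": "B2", "x": 141500.0, "y": 456500.0},
    {"nr": "B3", "x": 140300.0, "y": 455400.0},
]
data = [
    {"nr": "B1", "top": 0.0, "bottom": 2.0, "lith": "K"},
    {"nr": "B1", "top": 2.0, "bottom": 5.0, "lith": "Z"},
    {"nr": "B2", "top": 0.0, "bottom": 3.0, "lith": "Z"},
    {"nr": "B3", "top": 0.0, "bottom": 1.0, "lith": "V"},
    {"nr": "B3", "top": 1.0, "bottom": 4.0, "lith": "Z"},
]


def select_bbox(header, data, xmin, ymin, xmax, ymax):
    """Illustrative helper: keep boreholes located inside the bounding box."""
    kept = [h for h in header if xmin <= h["x"] <= xmax and ymin <= h["y"] <= ymax]
    ids = {h["nr"] for h in kept}
    return kept, [d for d in data if d["nr"] in ids]


def select_by_values(header, data, column, values):
    """Illustrative helper: keep boreholes with at least one layer whose value is in `values`."""
    ids = {d["nr"] for d in data if d[column] in values}
    return [h for h in header if h["nr"] in ids], [d for d in data if d["nr"] in ids]


def slice_depth(header, data, upper, lower):
    """Illustrative helper: clip all layers to the depth interval [upper, lower]."""
    sliced = []
    for d in data:
        top, bottom = max(d["top"], upper), min(d["bottom"], lower)
        if top < bottom:  # skip layers entirely outside the interval
            sliced.append({**d, "top": top, "bottom": bottom})
    ids = {d["nr"] for d in sliced}
    return [h for h in header if h["nr"] in ids], sliced


h_box, d_box = select_bbox(header, data, 139500, 454500, 140500, 455500)
print([h["nr"] for h in h_box])  # ['B1', 'B3']

h_peat, d_peat = select_by_values(header, data, "lith", {"V"})
print([h["nr"] for h in h_peat])  # ['B3']

h_cut, d_cut = slice_depth(h_box, d_box, 1.5, 3.0)
print([(d["nr"], d["top"], d["bottom"]) for d in d_cut])
# [('B1', 1.5, 2.0), ('B1', 2.0, 3.0), ('B3', 1.5, 3.0)]
```

In this sketch, a borehole with no layers left inside the sliced interval is also dropped from the header; that is a choice made here to keep the toy data aligned, not necessarily GeoST's behaviour.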