Simplify SEG-Y parsing settings and documentation cleanup #129

Merged
merged 3 commits into from
Jun 18, 2024
16 changes: 10 additions & 6 deletions README.md
@@ -79,6 +79,10 @@ It supports reading from local and cloud files (object store). It can read:
- Disjoint sequential regions (fast)
- Random traces (slow)

The library will also try to infer the endianness and the revision of the SEG-Y
file automatically. If it can't, users can override the endianness, revision, and
more parameters using the settings.
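
As a rough illustration of what such inference can look like (a minimal sketch, not the library's actual implementation): the SEG-Y binary header follows the 3200-byte textual header and stores the data sample format code at file bytes 3225-3226 and the revision at bytes 3501-3502. One common heuristic decodes the format code with both byte orders and keeps the order that yields a plausible code. The helper name `guess_endianness_and_revision`, the accepted code range, and the way the revision field is decoded are assumptions made for this example.

```python
import struct

TEXT_HEADER_SIZE = 3200    # textual file header precedes the binary header
BINARY_HEADER_SIZE = 400
FORMAT_CODE_OFFSET = 24    # file bytes 3225-3226 (1-based): data sample format code
REVISION_OFFSET = 300      # file bytes 3501-3502 (1-based): SEG-Y revision
VALID_FORMAT_CODES = frozenset(range(1, 17))  # assumption: accept any code in 1-16


def guess_endianness_and_revision(first_bytes: bytes) -> tuple[str, int]:
    """Hypothetical helper: guess byte order and major revision from the first 3600 bytes."""
    header = first_bytes[TEXT_HEADER_SIZE:TEXT_HEADER_SIZE + BINARY_HEADER_SIZE]
    if len(header) < BINARY_HEADER_SIZE:
        raise ValueError("expected at least 3600 bytes")

    for name, prefix in (("big", ">"), ("little", "<")):
        (format_code,) = struct.unpack_from(prefix + "h", header, FORMAT_CODE_OFFSET)
        if format_code in VALID_FORMAT_CODES:
            # Assumption: the revision field is read as one 16-bit value in the
            # same byte order, with the major revision in the high byte.
            (revision_raw,) = struct.unpack_from(prefix + "H", header, REVISION_OFFSET)
            return name, revision_raw >> 8

    raise ValueError("could not infer byte order; override it via settings")


# Synthetic big-endian header: format code 1, revision field 0x0100 (Rev 1).
sample = bytearray(TEXT_HEADER_SIZE + BINARY_HEADER_SIZE)
struct.pack_into(">h", sample, TEXT_HEADER_SIZE + FORMAT_CODE_OFFSET, 1)
struct.pack_into(">H", sample, TEXT_HEADER_SIZE + REVISION_OFFSET, 0x0100)
print(guess_endianness_and_revision(bytes(sample)))
```

When neither byte order produces a plausible format code, the sketch raises an error, which corresponds to the case where a user would supply the endianness and revision explicitly.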

### High Performance

The performance is high and to be proven with upcoming benchmarks. The initial
@@ -91,13 +95,13 @@ data models and JSON schema parsing and validation.

### Predefined SEG-Y Standards

It supports predefined SEG-Y "standards" for various versions. However,
some versions are still in progress:
It supports predefined SEG-Y "standards" for various versions. However, some versions
are still in progress and not all validation logic is implemented yet:

- [x] Rev 0 (1975)
- [x] Rev 1 (2002)
- [ ] Rev 2 (2017)
- [ ] Rev 2.1 (2023)
- Rev 0 (1975)
- Rev 1 (2002)
- Rev 2 (2017)
- 🔲 Rev 2.1 (2023)

### Custom SEG-Y Standards

19 changes: 2 additions & 17 deletions docs/api_reference.md
@@ -21,26 +21,11 @@
## Configuration

```{eval-rst}
.. autopydantic_settings:: segy.config.SegyFileSettings
.. autopydantic_settings:: segy.config.SegySettings
:inherited-members: BaseModel
```

```{eval-rst}
.. autopydantic_settings:: segy.config.SegyBinaryHeaderSettings
:inherited-members: BaseModel
```

```{eval-rst}
.. autopydantic_settings:: segy.config.ExtendedTextHeaderSetting
:inherited-members: BaseModel
```

```{eval-rst}
.. autopydantic_settings:: segy.config.SampleIntervalSetting
:inherited-members: BaseModel
```

```{eval-rst}
.. autopydantic_settings:: segy.config.SamplesPerTraceSetting
.. autopydantic_settings:: segy.config.BinaryHeaderSettings
:inherited-members: BaseModel
```
4 changes: 2 additions & 2 deletions docs/cli_usage.md
@@ -187,8 +187,8 @@ trace_index
## Configuration Options

When accessing public datasets from S3, we need to set
`SegyFileSettings().storage_options = {"anon": True}`{l=python} for anonymous
access. [SegyFileSettings](#SegyFileSettings) exposes all configuration options
`SegySettings().storage_options = {"anon": True}`{l=python} for anonymous
access. [SegySettings](#SegySettings) exposes all configuration options
as environment variables. We just need to set `storage_options` with the `JSON`
string `{"anon": true}`{l=python}. On Linux you can do this by the command below.
Environment variables can be configured in many ways, please refer to the options
1 change: 0 additions & 1 deletion docs/data_models/data_type.md
@@ -22,7 +22,6 @@
:nosignatures:

ScalarType
DataFormat
HeaderSpec
HeaderField
Endianness
8 changes: 4 additions & 4 deletions docs/data_models/file.md
@@ -50,13 +50,13 @@ It must be set to one of the allowed [`SegyStandard`](#SegyStandard) values.

#### Text File Header

The [`text_file_header`](#SegySpec.text_file_header) stores the information
required to parse the textual file header of the SEG-Y file. This includes important
metadata that pertains to the seismic data in human-readable format.
The [`text_header`](#SegySpec.text_header) stores the information required to parse
the textual file header of the SEG-Y file. This includes important metadata that
pertains to the seismic data in human-readable format.

#### Binary File Header

The [`binary_file_header`](#SegySpec.binary_file_header) item talks about
The [`binary_header`](#SegySpec.binary_header) item talks about
the binary file header of the SEG-Y file. It is a set of structured and important
information about the data in the file, stored in binary format for machines to
read and process quickly and efficiently.
43 changes: 19 additions & 24 deletions docs/settings.md
@@ -15,43 +15,39 @@
:class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light
```

## `SegyFileSettings` Class
## `SegySettings` Class

The [SegyFileSettings] is a configuration object for the
The [SegySettings] is a configuration object for the
[SegyFile] in the environment. It allows you to customize various aspects of
SEG-Y file parsing according to your needs and the specifics of your project.

It is composed of various sub-settings isolated by SEG-Y components and various topics.

- **binary**: The [SegyBinaryHeaderSettings] is used for binary header configuration
while reading a SEG-Y file.
- **endian**: This setting determines the byte order that is being used in the SEG-Y file.
- **binary**: The [BinaryHeaderSettings] is used for binary header overrides
when reading a SEG-Y file.
- **endianness**: This setting determines the byte order that is being used in the SEG-Y file.
The possible options are `"big"` or `"little"` based on [Endianness]. If left as None,
the system defaults to Big Endian (`"big"`).
- **revision**: This setting is used to specify the SEG-Y revision number. If left as
None, the system will automatically use the revision mentioned in the SEG-Y file.
- **use_pandas**: This setting is a boolean that decides whether to use pandas for
headers or not. Does not apply to trace data. The trace data is always returned
as Numpy arrays. The option to use Numpy for headers is currently disabled and will
be available at a later release (as of March 2024).
- **storage_options**: Provides a hook to pass parameters to storage backend. Like
credentials, anonymous access, etc.

## Usage

You initialize an instance of [SegyFileSettings] like any other Python object,
You initialize an instance of [SegySettings] like any other Python object,
optionally providing initial values for the settings. For example:

```python
from segy.config import SegyBinaryHeaderSettings
from segy.config import BinaryHeaderSettings
from segy.config import SegySettings
from segy.schema import Endianness

# Override extended text header count to zero
binary_header_settings = SegyBinaryHeaderSettings(
extended_text_header={"value": 0}
)
bin_overrides = BinaryHeaderSettings(extended_text_header=0)

settings = SegySettings(
binary=binary_header_settings,
binary=bin_overrides,
endian=Endianness.LITTLE,
revision=1,
)
@@ -68,25 +64,24 @@ file = SegyFile(uri="...", settings=settings)
If no settings are provided to [SegyFile], it will take the default values.

```{seealso}
[SegyFileSettings], [SegyFile], [Endianness]
[SegySettings], [SegyFile], [Endianness]
```

## Environment Variables

Environment variables that follow the `SEGY__VARIABLE__SUBVARIABLE` format will be
automatically included in your [SegyFileSettings] instance:
automatically included in your [SegySettings] instance:

```shell
export SEGY__BINARY__SAMPLES_PER_TRACE__VALUE=1001
export SEGY__BINARY__SAMPLE_INTERVAL__KEY="my_custom_key_in_schema"
export SEGY__ENDIAN="big"
export SEGY__REVISION=0.0
export SEGY__BINARY__SAMPLES_PER_TRACE=1001
export SEGY__ENDIANNESS="big"
export SEGY__REVISION=0
```

The environment variables will override the defaults in the [SegyFileSettings]
The environment variables will override the defaults in the [SegySettings]
configuration, unless user overrides it again within Python.
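
To make the naming convention concrete, here is a minimal standard-library sketch of how `SEGY__`-prefixed variables with `__` as the nesting delimiter map onto nested settings. It is an illustration only: the library builds on pydantic settings, whose parsing details may differ, and the helper name `collect_segy_env` and the `SEGY__STORAGE_OPTIONS` variable name are assumptions derived from the pattern described above.

```python
import json
from typing import Any, Mapping

PREFIX = "SEGY__"
DELIMITER = "__"


def collect_segy_env(environ: Mapping[str, str]) -> dict[str, Any]:
    """Hypothetical helper: turn SEGY__A__B=value variables into {"a": {"b": value}}."""
    nested: dict[str, Any] = {}
    for key, raw in environ.items():
        if not key.startswith(PREFIX):
            continue  # unrelated environment variable
        *parents, leaf = key[len(PREFIX):].lower().split(DELIMITER)
        node = nested
        for name in parents:
            node = node.setdefault(name, {})
        try:
            node[leaf] = json.loads(raw)  # numbers and JSON objects
        except json.JSONDecodeError:
            node[leaf] = raw              # plain strings such as "big"
    return nested


example_env = {
    "SEGY__BINARY__SAMPLES_PER_TRACE": "1001",
    "SEGY__ENDIANNESS": "big",
    "SEGY__REVISION": "0",
    "SEGY__STORAGE_OPTIONS": '{"anon": true}',
    "HOME": "/home/user",
}
overrides = collect_segy_env(example_env)
print(overrides)
```

Values passed explicitly in Python would then take precedence over the dictionary collected from the environment.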

[endianness]: #Endianness
[segyfilesettings]: #SegyFileSettings
[segysettings]: #SegySettings
[segyfile]: #SegyFile
[segybinaryheadersettings]: #SegyBinaryHeaderSettings
[segybinaryheadersettings]: #BinaryHeaderSettings
24 changes: 13 additions & 11 deletions docs/tutorials/creation.ipynb
@@ -27,7 +27,7 @@
"outputs": [],
"source": [
"from segy.factory import SegyFactory\n",
"from segy.standards.rev1 import rev1_segy"
"from segy.standards import get_segy_standard"
]
},
{
@@ -49,12 +49,13 @@
"metadata": {},
"outputs": [],
"source": [
"SAMPLE_INTERVAL = 4000 # in microseconds\n",
"SAMPLES_PER_TRACE = 101\n",
"factory_config = {\n",
" \"spec\": get_segy_standard(1.0),\n",
" \"samples_per_trace\": 101,\n",
" \"sample_interval\": 4_000, # in microseconds\n",
"}\n",
"\n",
"factory = SegyFactory(\n",
" rev1_segy, sample_interval=SAMPLE_INTERVAL, samples_per_trace=SAMPLES_PER_TRACE\n",
")\n",
"factory = SegyFactory(**factory_config)\n",
"\n",
"txt = factory.create_textual_header()\n",
"bin_ = factory.create_binary_header()"
@@ -83,13 +84,13 @@
"samples = factory.create_trace_sample_template(size=TRACE_COUNT)\n",
"\n",
"for trace_idx in range(TRACE_COUNT):\n",
" headers[trace_idx][\"trace_seq_file\"] = trace_idx + 1\n",
" headers[trace_idx][\"trace_seq_num_reel\"] = trace_idx + 1\n",
" headers[trace_idx][\"cdp_x\"] = 1_000\n",
" headers[trace_idx][\"cdp_y\"] = 10_000 + trace_idx * 50\n",
" headers[trace_idx][\"inline\"] = 10\n",
" headers[trace_idx][\"crossline\"] = 100 + trace_idx\n",
"\n",
" samples[trace_idx] = range(SAMPLES_PER_TRACE) # sample index\n",
" samples[trace_idx] = range(factory_config[\"samples_per_trace\"]) # sample index\n",
" samples[trace_idx] += trace_idx # trace no"
]
},
@@ -190,7 +191,7 @@
"outputs": [],
"source": [
"show_fields = [\n",
" \"trace_seq_file\",\n",
" \"trace_seq_num_reel\",\n",
" \"cdp_x\",\n",
" \"cdp_y\",\n",
" \"inline\",\n",
@@ -203,7 +204,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ef613e27-20cf-420c-85d5-cfa630b28def",
"id": "96fafaae-894a-447a-adcb-d98ffa70a0ad",
"metadata": {},
"outputs": [],
"source": []
@@ -224,7 +225,8 @@
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
32 changes: 17 additions & 15 deletions docs/tutorials/quickstart.ipynb
@@ -48,7 +48,7 @@
"from segy import SegyFile\n",
"from segy.config import SegySettings\n",
"from segy.schema import HeaderField\n",
"from segy.standards import rev1_segy"
"from segy.standards import get_segy_standard"
]
},
{
@@ -61,13 +61,13 @@
"[http link]: http://s3.amazonaws.com/open.source.geoscience/open_data/newzealand/Taranaiki_Basin/PARIHAKA-3D/Parihaka_PSTM_full_angle.sgy\n",
"\n",
"This link is convenient as the `segy` library supports HTTP and we can directly use it\n",
"without downloading as well. Hovewer, For demonstration purposes, we'll use the \n",
"without downloading as well. Hovewer, For demonstration purposes, we'll use the\n",
"corresponding S3 link (or called bucket and prefix):\n",
"\n",
"`s3://open.source.geoscience/open_data/newzealand/Taranaiki_Basin/PARIHAKA-3D/Parihaka_PSTM_full_angle.sgy`\n",
"\n",
"It's important to note that the file isn't downloaded but rather read on demand from the\n",
"S3 object store with the `segy` library. \n",
"S3 object store with the `segy` library.\n",
"\n",
"The `SegyFile` class uses information from the binary file header to construct a SEG-Y\n",
"descriptor, allowing it to read the file. The SEG-Y Revision is inferred from the binary\n",
@@ -202,7 +202,7 @@
"id": "7ba3bd7911a900ec",
"metadata": {},
"source": [
"We can look at headers (by default it is a Pandas `DataFrame`) in a nicely formatted table. \n",
"We can look at headers (by default it is a Pandas `DataFrame`) in a nicely formatted table.\n",
"\n",
"We can also do typical Pandas analytics (like plots, statistics, etc.) but it won't be shown here."
]
@@ -281,9 +281,9 @@
"Based on the text header lines:\n",
"\n",
"```\n",
"C 2 HEADER BYTE LOCATIONS AND TYPES: \n",
"C 3 3D INLINE : 189-192 (4-BYTE INT) 3D CROSSLINE: 193-196 (4-BYTE INT) \n",
"C 4 ENSEMBLE X: 181-184 (4-BYTE INT) ENSEMBLE Y : 185-188 (4-BYTE INT) \n",
"C 2 HEADER BYTE LOCATIONS AND TYPES:\n",
"C 3 3D INLINE : 189-192 (4-BYTE INT) 3D CROSSLINE: 193-196 (4-BYTE INT)\n",
"C 4 ENSEMBLE X: 181-184 (4-BYTE INT) ENSEMBLE Y : 185-188 (4-BYTE INT)\n",
"```\n",
"\n",
"As we know by the SEG-Y Rev1 definition, the coordinate scalars are at byte 71."
@@ -296,18 +296,19 @@
"metadata": {},
"outputs": [],
"source": [
"custom_spec = rev1_segy.customize(\n",
"rev1 = get_segy_standard(1.0)\n",
"custom_spec = rev1.customize(\n",
" binary_header_fields=[\n",
" HeaderField(name=\"sample_int\", byte=17, format=\"int16\"),\n",
" HeaderField(name=\"num_samples\", byte=21, format=\"int16\"),\n",
" HeaderField(name=\"num_ext_text_headers\", byte=305, format=\"int16\"),\n",
" HeaderField(name=\"sample_interval\", byte=17, format=\"int16\"),\n",
" HeaderField(name=\"samples_per_trace\", byte=21, format=\"int16\"),\n",
" HeaderField(name=\"num_extended_text_headers\", byte=305, format=\"int16\"),\n",
" ],\n",
" trace_header_fields=[\n",
" HeaderField(name=\"inline\", byte=189, format=\"int32\"),\n",
" HeaderField(name=\"crossline\", byte=193, format=\"int32\"),\n",
" HeaderField(name=\"cdp_x\", byte=181, format=\"int32\"),\n",
" HeaderField(name=\"cdp_y\", byte=185, format=\"int32\"),\n",
" HeaderField(name=\"scalar_coord\", byte=71, format=\"int16\"),\n",
" HeaderField(name=\"coordinate_scalar\", byte=71, format=\"int16\"),\n",
" ],\n",
")\n",
"\n",
@@ -398,8 +399,8 @@
"source": [
"trace_headers = traces.header.to_dataframe()\n",
"\n",
"trace_headers[\"cdp_x\"] /= trace_headers[\"scalar_coord\"].abs()\n",
"trace_headers[\"cdp_y\"] /= trace_headers[\"scalar_coord\"].abs()\n",
"trace_headers[\"cdp_x\"] /= trace_headers[\"coordinate_scalar\"].abs()\n",
"trace_headers[\"cdp_y\"] /= trace_headers[\"coordinate_scalar\"].abs()\n",
"\n",
"trace_headers"
]
@@ -468,7 +469,8 @@
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
Expand Down
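
A note on the coordinate scaling in the quickstart cell above: dividing by the absolute value of `coordinate_scalar` matches the case where the stored scalar is negative. Under the SEG-Y convention, a positive scalar is a multiplier and a negative scalar is a divisor. The sketch below spells out that rule in plain Python; the function name `apply_coordinate_scalar` is hypothetical, and treating a zero scalar as "no scaling" is an assumption of this example.

```python
def apply_coordinate_scalar(value: float, scalar: int) -> float:
    """Hypothetical helper: scale a stored coordinate by the trace header scalar."""
    if scalar > 0:
        return float(value * scalar)   # positive scalar: multiply
    if scalar < 0:
        return value / abs(scalar)     # negative scalar: divide
    return float(value)                # assumption: zero means no scaling


# A stored value of 123456 with scalar -100 represents 1234.56.
print(apply_coordinate_scalar(123456, -100))
```

The same rule could be applied column-wise to a header `DataFrame` if a dataset mixes positive and negative scalars.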
14 changes: 7 additions & 7 deletions poetry.lock
