# Introduction to IO (Input/Output)

Sooner or later we need to read data from various sources and formats before we can work with it. This notebook is an overview of some common formats and common ways to read and/or write them. It is absolutely not an exhaustive list of what can be read in Python, so if you have specific requests, please do reach out.

Rather than importing everything upfront, each section imports the libraries it needs. We start with some generic formats, and then move on to more specialised subsurface/geoscience formats.

## CSV or TSV files

A very common format: plain text with a delimiter character (often `,` or `;`) separating the columns, and newlines separating the records. There are a number of ways to load these, depending on the intended use case. NumPy (for purely numeric arrays) and Pandas (for tables with headers and mixed column types) are probably the most common.
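As a minimal sketch of the delimiter point, the snippet below parses the same small semicolon-delimited table with both libraries. The column names and values are made up for illustration, and `io.StringIO` stands in for a file on disk so the example runs without one.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical semicolon-delimited data; io.StringIO behaves like an open text file
text = "depth;gamma;density\n100.0;45.2;2.31\n100.5;47.8;2.35\n"

# Pandas: sep sets the delimiter, and the first row becomes the column names
df = pd.read_csv(io.StringIO(text), sep=";")

# NumPy: skip_header=1 drops the header row, leaving a 2D float array
arr = np.genfromtxt(io.StringIO(text), delimiter=";", skip_header=1)

print(df.columns.tolist())  # ['depth', 'gamma', 'density']
print(arr.shape)            # (2, 3)
```

For tab-separated files the same calls work with `sep="\t"` and `delimiter="\t"` respectively. Pandas keeps the column names alongside the data, whereas `genfromtxt` with its defaults returns a single array of floats.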

In [1]:
import numpy as np
import pandas as pd

In [None]:
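arr = np.genfromtxt("data.csv", delimiter=",", skip_header=1)  # "data.csv" is a hypothetical file with one header row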

In [None]:
df = pd.read_csv("data.csv")  # hypothetical file; pass sep=";" or sep="\t" for other delimiters
df

In [None]:
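np.savetxt("output.csv", np.array([[1.0, 2.0], [3.0, 4.0]]), delimiter=",")  # hypothetical output file and example array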

In [None]:
df.to_csv("output.csv", index=False)  # hypothetical output file; index=False omits the row-index column

## Excel Files

The easiest option for this is definitely pandas. You will also need an optional engine library, which pandas uses in the background: `openpyxl` for modern `.xlsx` files, or `xlrd` for legacy `.xls` files.

In [None]:
df = pd.read_excel("data.xlsx", sheet_name=0)  # hypothetical file; sheet_name=0 reads the first worksheet

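It is worth noting that you can either read individual worksheets or load several into one dictionary. The `sheet_name` argument controls this: a single sheet name or index returns one DataFrame, while a list of sheets, or `None` for every sheet, returns a dictionary of DataFrames.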

In [None]:
df.to_excel("output.xlsx", sheet_name="Sheet1", index=False)  # hypothetical output file

## Databases

There are numerous ways of reading a database, which partly depend on the type of database. Pandas can read and write SQL (`pd.read_sql` and `DataFrame.to_sql`), so it is a reasonable starting point.

For a more powerful and flexible option, consider [sqlalchemy](https://www.sqlalchemy.org/).
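A minimal sketch of the pandas route, assuming a SQLite database so that Python's built-in `sqlite3` module can stand in for a database server. The table name `wells` and its columns are hypothetical. For other databases you would typically pass a SQLAlchemy engine instead of the `sqlite3` connection.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database stands in for a real one; table and values are hypothetical
conn = sqlite3.connect(":memory:")

wells = pd.DataFrame({"name": ["A-1", "B-2", "C-3"], "td_m": [2500.0, 3100.0, 1800.0]})

# Write the DataFrame to a table; pandas accepts a plain sqlite3 connection here
wells.to_sql("wells", conn, index=False)

# Read back only the deeper wells with an ordinary SQL query
deep = pd.read_sql("SELECT name, td_m FROM wells WHERE td_m > 2000 ORDER BY name", conn)
print(deep)

conn.close()
```

Filtering in the SQL query rather than after loading means only the matching rows are transferred, which matters once tables get large.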

## JSON

JavaScript Object Notation (JSON) is a very common format for exchanging information on the internet, so you may get it back from various Application Programming Interfaces (APIs). Its structure closely mirrors a Python `dict`, which is how JSON data is usually handled once it is loaded. There is a built-in library for working with it, logically enough named `json`. It handles JSON held in strings as well as in files: `json.loads` and `json.dumps` work with strings, while `json.load` and `json.dump` work with files.
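To illustrate the string-based functions, here is a short round trip on a JSON string of the kind an API might return. The payload content is hypothetical.

```python
import json

# A JSON string such as an API might return (hypothetical content)
payload = '{"well": "A-1", "location": {"x": 512300.5, "y": 6701200.0}, "logs": ["GR", "RHOB"]}'

# json.loads parses a string; json.load does the same for an open file
record = json.loads(payload)
print(record["location"]["x"])  # 512300.5
print(type(record["logs"]))     # <class 'list'>

# json.dumps serialises back to a string; indent makes it human-readable
text = json.dumps(record, indent=2)

# Round trip: parsing the output gives back an equal dict
assert json.loads(text) == record
```

JSON objects become `dict`s, arrays become `list`s, and `null`/`true`/`false` become `None`/`True`/`False`.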

In [11]:
import json

In [None]:
with open("data.json") as f:  # hypothetical input file
    data = json.load(f)

In [None]:
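with open("output.json", "w") as f:  # hypothetical output file
    json.dump({"well": "A-1", "td_m": 2500.0}, f, indent=2)  # example data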

<hr/>

The remaining sections cover geoscience and subsurface data formats.

## Shapefiles

Shapefiles are a common geographic information system (GIS) format, originally developed by Esri. A simple way to load them is to use geopandas:

In [5]:
import geopandas as gpd

In [None]:
gdf = gpd.read_file("boundaries.shp")  # "boundaries.shp" is a hypothetical path

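Because geopandas uses `fiona` in the background for file handling (recent geopandas releases default to `pyogrio` instead, but both wrap the same GDAL/OGR drivers), it can handle the following formats in addition to shapefiles. Drivers marked `'r'` can be read from, `'w'` can be written to, and `'a'` can be appended to. The exact list depends on the installed GDAL version.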

In [10]:
import fiona
fiona.supported_drivers

{'ARCGEN': 'r',
 'DXF': 'rw',
 'CSV': 'raw',
 'OpenFileGDB': 'r',
 'ESRIJSON': 'r',
 'ESRI Shapefile': 'raw',
 'FlatGeobuf': 'rw',
 'GeoJSON': 'raw',
 'GeoJSONSeq': 'rw',
 'GPKG': 'raw',
 'GML': 'rw',
 'OGR_GMT': 'rw',
 'GPX': 'rw',
 'GPSTrackMaker': 'rw',
 'Idrisi': 'r',
 'MapInfo File': 'raw',
 'DGN': 'raw',
 'PCIDSK': 'rw',
 'OGR_PDS': 'r',
 'S57': 'r',
 'SQLite': 'raw',
 'TopoJSON': 'r'}
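Because each mode is just a string of letters, you can filter this mapping to find, for example, which formats can be written. A minimal sketch follows; `drivers_with_mode` is a hypothetical helper name, and the dictionary is a hand-copied subset of the output above so the example runs without fiona installed.

```python
# A few entries copied from the fiona.supported_drivers output above; in practice,
# pass fiona.supported_drivers itself (its contents depend on the installed GDAL build)
drivers = {
    "ARCGEN": "r",
    "DXF": "rw",
    "ESRI Shapefile": "raw",
    "GeoJSON": "raw",
    "GPKG": "raw",
    "OpenFileGDB": "r",
    "TopoJSON": "r",
}


def drivers_with_mode(drivers, mode):
    """Return the sorted driver names whose mode string contains the given letter."""
    return sorted(name for name, modes in drivers.items() if mode in modes)


print(drivers_with_mode(drivers, "w"))  # ['DXF', 'ESRI Shapefile', 'GPKG', 'GeoJSON']
print(drivers_with_mode(drivers, "a"))  # ['ESRI Shapefile', 'GPKG', 'GeoJSON']
```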

In [None]:
gdf.to_file("output.gpkg", driver="GPKG")  # hypothetical output path; the driver must be one marked 'w' above

## LAS files

`lasio` is a library that can read LAS 2.0 files, but `welly`, a wrapper around it, may be nicer for everyday use:

In [3]:
from welly import Well, Project

In [None]:
w = Well.from_las("well.las")  # "well.las" is a hypothetical path

Welly can also load an entire directory of LAS files into a `Project`:

In [None]:
p = Project.from_las("data/*.las")  # hypothetical glob pattern matching the LAS files to load

## SEG-Y

The SEG-Y format is widely used, although any individual file can be tricky to load. Equinor has written a low-level library named [`segyio`](https://github.com/equinor/segyio) which can (with some effort in some cases) read and write SEG-Y files and headers.

In [12]:
import segyio

In [None]:
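with segyio.open("survey.sgy", "r") as s:  # "survey.sgy" is a hypothetical file
    vol = segyio.tools.cube(s)  # full volume as a 3D NumPy array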

Because `segyio` is intended for relatively low-level operations, getting a file loaded can take a fair amount of work. An alternative built on top of it is the SEGY Swiss Army Knife ([SEGYSAK](https://segysak.readthedocs.io/en/latest/index.html)), which aims to make common operations a little easier. It also interfaces with `xarray`, which adds labelled dimensions and coordinates to NumPy-style arrays and is well worth a look.

In [None]:
from segysak.segy import segy_loader

In [None]:
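seisnc = segy_loader("survey.sgy")  # hypothetical file; header byte locations can be set with e.g. iline=189, xline=193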

## DLIS files

Equinor have written a library named `dlisio` that can read DLIS files:

In [4]:
import dlisio

In [None]:
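with dlisio.dlis.load("logs.dlis") as (f, *tail):  # hypothetical file; a DLIS file can hold several logical files
    curves = f.frames[0].curves()  # structured NumPy array of the first frame's channels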

## Other Assorted Formats

The subsurface world is filled with all sorts of other formats. Agile Scientific has written a library named `gio` that can handle a variety of these, such as OpendTect horizons, Surfer 7 grids, and ZMaps. These are loaded as `xarray`s. The documentation has [more details](https://code.agilescientific.com/gio/index.html).

In [13]:
import gio

In [None]:
data = gio.read_odt("horizon.txt")  # hypothetical path to an OpendTect horizon export