# Read data from the cloud

There's an open data fork of the Groningen project in Software Underground's AWS S3 bucket, `swung-hosted`.

Let's try to read data from it.

First, install `boto3` in your environment:

    pip install boto3

## Anonymous access

If an S3 bucket is public, you can use `boto3` without authentication to read data, get file listings, and so on.

In [15]:
import boto3
from botocore import UNSIGNED
from botocore.client import Config

s3 = boto3.resource('s3', config=Config(signature_version=UNSIGNED))

In [16]:
bucket = s3.Bucket('swung-hosted')

# Print the key of every object in the bucket.
for obj in bucket.objects.all():
    print(obj.key)

groningen/
groningen/3DGrid/3D_Grid_Export_settings.PNG
groningen/3DGrid/3D_Grid_Horizon_order.png
groningen/3DGrid/DeckCache.ptd
groningen/3DGrid/Final_Grid.GRDECL
groningen/3DGrid/Final_Grid_ACTNUM.GRDECL
groningen/3DGrid/Final_Grid_COORD.GRDECL
groningen/3DGrid/Final_Grid_ZCORN.GRDECL
groningen/Cultural_Data/
groningen/Cultural_Data/Discoveries_TM5.dat.dbf
groningen/Cultural_Data/Discoveries_TM5.dat.prj
groningen/Cultural_Data/Discoveries_TM5.dat.shp
groningen/Cultural_Data/Discoveries_TM5.dat.shx
groningen/Cultural_Data/Fields_Reservoirs_TM5.dat.dbf
groningen/Cultural_Data/Fields_Reservoirs_TM5.dat.prj
groningen/Cultural_Data/Fields_Reservoirs_TM5.dat.shp
groningen/Cultural_Data/Fields_Reservoirs_TM5.dat.shx
groningen/Cultural_Data/NLD_Blocks_Concessions_TM5.dat.dbf
groningen/Cultural_Data/NLD_Blocks_Concessions_TM5.dat.prj
groningen/Cultural_Data/NLD_Blocks_Concessions_TM5.dat.shp
groningen/Cultural_Data/NLD_Blocks_Concessions_TM5.dat.shx
groningen/Cultural_Data/NLD_Netherlands_Ou

groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__3.las
groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__4.dev
groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__4.las
groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__5.dev
groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__5.las
groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__6.dev
groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__6.las
groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__6A.dev
groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__6A.las
groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__7.dev
groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__7.las
groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__8.dev
groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__8.las
groningen/
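A full listing like this is long, so it can help to summarize it. The following is a minimal sketch, not part of the original notebook: `summarize_keys` is a hypothetical helper that counts objects per file extension, and the sample keys are copied from the listing above.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_keys(keys):
    """Count objects per file extension, skipping 'folder' placeholder keys."""
    counts = Counter()
    for key in keys:
        if key.endswith('/'):
            continue  # Keys ending in '/' are folder placeholders, not files.
        counts[PurePosixPath(key).suffix.lower()] += 1
    return counts


# A few keys copied from the listing above.
keys = [
    'groningen/',
    'groningen/3DGrid/Final_Grid.GRDECL',
    'groningen/3DGrid/Final_Grid_ACTNUM.GRDECL',
    'groningen/Cultural_Data/Discoveries_TM5.dat.shp',
    'groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__4.dev',
    'groningen/Well_data/Groningen_field/01__Cluster_Wells/Sappemeer/SAP-__4.las',
]

print(summarize_keys(keys))
```

With a live bucket you could pass `(obj.key for obj in bucket.objects.all())` instead of the sample list, or narrow the listing first with `bucket.objects.filter(Prefix='groningen/Well_data/')`.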

## Authenticated access

I used the following code to make a file listing, which is on AWS here: https://swung-hosted.s3.ca-central-1.amazonaws.com/groningen/FILENAMES.txt

In [1]:
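import boto3
import secrets  # A local secrets.py holding your keys; note that it shadows the standard library's `secrets` module.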

session = boto3.Session(
    aws_access_key_id=secrets.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=secrets.AWS_SECRET_ACCESS_KEY,
)

s3 = session.resource('s3')

In [2]:
bucket = s3.Bucket('swung-hosted')

with open('../data/FILENAMES.txt', 'wt') as f:
    for obj in bucket.objects.all():
        f.write(obj.key + '\n')
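If you would rather not keep keys in a local Python module, one alternative is to read them from environment variables. This is a sketch under that assumption, not the approach used above: `session_kwargs` is a hypothetical helper, and it assumes `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are set in your shell.

```python
import os


def session_kwargs(env=os.environ):
    """Build keyword arguments for boto3.Session from environment variables."""
    # A missing variable raises KeyError, so a misconfigured shell fails early.
    return {
        'aws_access_key_id': env['AWS_ACCESS_KEY_ID'],
        'aws_secret_access_key': env['AWS_SECRET_ACCESS_KEY'],
    }
```

You would then call `boto3.Session(**session_kwargs())`. In practice `boto3.Session()` with no arguments also picks up these two variables on its own, so the helper mainly makes the dependency explicit.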

## Read direct from URL

Some libraries, such as `pandas` and `welly`, can read directly from a URL:

In [3]:
import pandas as pd

url = "https://swung-hosted.s3.ca-central-1.amazonaws.com/groningen/Formation_tops/Groningen__Formation_tops__EPSG_28992.csv"

df = pd.read_csv(url)
df.head()

Unnamed: 0,X,Y,Z,TWT picked,TWT auto,Geological age,MD,Type,Surface,Well,...,Used by dep.conv.,Used by geo mod,Zone log,Edited by user,Symbol,Locked to fault,"FLOAT,Continuous","FLOAT,Carb_net2","FLOAT,SH_WS_belowcontact",PVD auto
0,256256.0,591586.0,-2824.0,,-1875.9,,2831.5,Horizon,USS_3.1_T,AMR- 1,...,False,False,0.0,False,0.0,0.0,,,,-2824.0
1,256634.0,591613.0,-2790.0,,,,2818.75,Horizon,USS_3.1_T,AMR- 2,...,False,True,0.0,False,0.0,0.0,,,,-2790.0
2,256627.0,591617.0,-2789.0,,,,2828.31,Horizon,USS_3.1_T,AMR- 3,...,False,True,0.0,False,0.0,0.0,,,,-2789.0
3,256583.0,591606.0,-2786.0,,,,2829.86,Horizon,USS_3.1_T,AMR- 4,...,False,True,0.0,False,0.0,0.0,,,,-2786.0
4,256533.0,591778.0,-2791.0,,,,2888.83,Horizon,USS_3.1_T,AMR- 5B,...,False,True,0.0,False,0.0,0.0,,,,-2791.0


In [4]:
from welly import Well

url = "https://swung-hosted.s3.ca-central-1.amazonaws.com/groningen/Well_data/Oude_Pekela_field/OPK-__1.las"

w = Well.from_las(url)
w



OPK- 1 11000080112101,OPK- 1 11000080112101.1
crs,CRS({})
location,
province,
api,
td,
data,"CAL, DENS, FACIES, FACIES_PP, FACIES_PP_ED, FLDE, FLGR, FLSO, GENERALTIME1, GR, NET_NOV14, NEUT, PERMNET_2015, PERMNET_NOV14, PORNET_NOV14, RESD, RESM, SH, SON"
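The URLs in this section all follow the same pattern: bucket name, region, then the object key. As a small sketch, a hypothetical helper `public_url` can build them from the keys in the file listing; the default bucket and region are taken from the URLs above.

```python
from urllib.parse import quote


def public_url(key, bucket='swung-hosted', region='ca-central-1'):
    """Return the HTTPS URL for an object in a public S3 bucket."""
    # quote() percent-encodes spaces and other unsafe characters; '/' is kept.
    return f'https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}'


url = public_url('groningen/Well_data/Oude_Pekela_field/OPK-__1.las')
```

The result can be passed to `pd.read_csv()` or `Well.from_las()` in the same way as the hard-coded URLs.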


## Download with requests

For everything else, the `requests` library works well. Streaming the response in chunks avoids holding a large file in memory:

In [5]:
2 * 1024 * 1024  # Bytes in 2 MiB.

2097152

In [6]:
import requests

# NB: This file is about 12 GB.
url = "https://swung-hosted.s3.ca-central-1.amazonaws.com/groningen/Seismic_Volume/R3136_15UnrPrDMkD_Full_D_Rzn_RMO_Shp_vG.SEGY"

with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open('../data/NAM/Seismic_Volume/R3136_15UnrPrDMkD_Full_D_Rzn_RMO_Shp_vG.SEGY', 'wb') as f:
        for chunk in r.iter_content(chunk_size=2_097_152):  # Bytes in chunk.
            f.write(chunk)


KeyboardInterrupt: 
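A download this large is easy to interrupt, as happened above. S3 supports HTTP `Range` requests, so one option is to continue from the bytes already on disk. The following is a minimal sketch, not the code used in this notebook: `resume_download` is a hypothetical helper, and its `get` parameter exists only so that the function can be exercised without a network connection.

```python
from pathlib import Path

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MiB, the same chunk size as above.


def resume_download(url, dest, get=None, chunk_size=CHUNK_SIZE):
    """Download url to dest, continuing from any partial file already on disk."""
    if get is None:
        import requests  # Imported lazily so the helper can be tested offline.
        get = requests.get

    dest = Path(dest)
    offset = dest.stat().st_size if dest.exists() else 0
    # Ask only for the bytes we do not have yet.
    headers = {'Range': f'bytes={offset}-'} if offset else {}

    with get(url, stream=True, headers=headers) as r:
        r.raise_for_status()
        # 206 means the server honoured the Range header; otherwise start over.
        mode = 'ab' if r.status_code == 206 else 'wb'
        with open(dest, mode) as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return dest
```

Calling `resume_download(url, path)` again after an interruption appends to the partial file. If the file is already complete, the server should reject the range request and `raise_for_status()` will raise, so you may want to compare sizes first.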