## S3 bucket access

The [s3 source](../guide/sources.rst#s3) provides access to [Amazon S3](https://aws.amazon.com/s3/) buckets.

In this example we will use a publicly available bucket containing ECMWF forecast data in GRIB format.

In [1]:
import earthkit.data

bucket_name = "ecmwf-forecasts"

### Getting a whole object

#### Read as stream

By default the [s3 source](../guide/sources.rst#s3) returns a stream iterator, which we can consume field by field.

In [2]:
key = "20240111/00z/0p4-beta/oper/20240111000000-0h-oper-fc.grib2"
r = {
    "bucket": bucket_name,
    "objects": [
        {"object": key},
    ],
}

ds = earthkit.data.from_source("s3", r)

cnt = 0
for f in ds:
    cnt += 1

cnt

83

The [batch_size](../guide/sources.rst#s3) and [group_by](../guide/sources.rst#s3) options control how the stream is consumed. For example, with [batch_size](../guide/sources.rst#s3)=0 the whole object is loaded into memory.

In [3]:
ds = earthkit.data.from_source("s3", r, batch_size=0)

# ds is a FieldList stored entirely in memory
print(f"len={len(ds)}")
ds.head()

len=83


Unnamed: 0,centre,shortName,typeOfLevel,level,dataDate,dataTime,stepRange,dataType,number,gridType
0,ecmf,gh,isobaricInhPa,200,20240111,0,0,fc,,regular_ll
1,ecmf,gh,isobaricInhPa,925,20240111,0,0,fc,,regular_ll
2,ecmf,gh,isobaricInhPa,500,20240111,0,0,fc,,regular_ll
3,ecmf,r,isobaricInhPa,925,20240111,0,0,fc,,regular_ll
4,ecmf,gh,isobaricInhPa,250,20240111,0,0,fc,,regular_ll
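As a rough mental model of the batch_size and group_by options (not earthkit-data's implementation), batching and grouping a stream can be sketched with plain Python iterators. The helpers `batched_stream` and `grouped_stream` and the dictionaries standing in for fields are hypothetical. The sketch assumes that a positive batch size yields consecutive chunks of that many fields, and that grouping yields runs of consecutive fields sharing the same metadata value.

```python
from itertools import groupby, islice


def batched_stream(fields, batch_size):
    """Yield lists of up to batch_size consecutive items from an iterable."""
    it = iter(fields)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def grouped_stream(fields, key):
    """Yield lists of consecutive items that share the same value for key."""
    for _, group in groupby(fields, key=lambda f: f[key]):
        yield list(group)


# Hypothetical stand-ins for GRIB fields: only the metadata matters here.
fields = [
    {"shortName": "gh", "level": 200},
    {"shortName": "gh", "level": 925},
    {"shortName": "r", "level": 925},
    {"shortName": "r", "level": 500},
    {"shortName": "t", "level": 500},
]

batch_sizes = [len(b) for b in batched_stream(fields, 2)]
group_sizes = [len(g) for g in grouped_stream(fields, "level")]
print(batch_sizes)  # [2, 2, 1]
print(group_sizes)  # [1, 2, 2]
```

Neither helper ever holds more than one batch or group in memory, which is the point of consuming a stream in pieces rather than loading the whole object.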


#### Disk based access

Setting stream=False downloads the object to a file in the earthkit-data cache or temporary directory, and the data is then read from disk.

In [4]:
ds = earthkit.data.from_source("s3", r, stream=False)

# ds is a FieldList read from a grib file on disk
print(f"len={len(ds)}")
ds.head()

len=83


Unnamed: 0,centre,shortName,typeOfLevel,level,dataDate,dataTime,stepRange,dataType,number,gridType
0,ecmf,gh,isobaricInhPa,200,20240111,0,0,fc,,regular_ll
1,ecmf,gh,isobaricInhPa,925,20240111,0,0,fc,,regular_ll
2,ecmf,gh,isobaricInhPa,500,20240111,0,0,fc,,regular_ll
3,ecmf,r,isobaricInhPa,925,20240111,0,0,fc,,regular_ll
4,ecmf,gh,isobaricInhPa,250,20240111,0,0,fc,,regular_ll


### Getting multiple objects

In [5]:
# getting 2 forecast steps (0 and 12 h)
key1 = "20240111/00z/0p4-beta/oper/20240111000000-0h-oper-fc.grib2"
key2 = "20240111/00z/0p4-beta/oper/20240111000000-12h-oper-fc.grib2"
r = {
    "bucket": bucket_name,
    "objects": [
        {"object": key1},
        {"object": key2},
    ],
}

ds = earthkit.data.from_source("s3", r, stream=False)

# ds is a FieldList read from a grib file on disk
len(ds), ds.index("step")

(166, [0, 12])
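When many forecast steps are needed, the keys can be generated rather than typed out. The following is a minimal sketch: `forecast_key` is a hypothetical helper, and the key layout is only inferred from the two keys used above, so it may not hold for other dates, resolutions or streams in the bucket.

```python
from datetime import datetime


def forecast_key(base_time, step, resolution="0p4-beta", stream="oper", kind="fc"):
    """Build an object key following the pattern of the two keys shown above.

    The layout is an assumption inferred from those keys, not a documented rule.
    """
    date = base_time.strftime("%Y%m%d")
    run = base_time.strftime("%H")
    stamp = base_time.strftime("%Y%m%d%H%M%S")
    return f"{date}/{run}z/{resolution}/{stream}/{stamp}-{step}h-{stream}-{kind}.grib2"


base_time = datetime(2024, 1, 11, 0)
keys = [forecast_key(base_time, step) for step in (0, 12)]

# One {"object": ...} entry per key, as expected in the "objects" list.
objects = [{"object": k} for k in keys]
print(keys[0])
print(keys[1])
```

The resulting `objects` list can be used as the "objects" entry of the request dictionary passed to `from_source`.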

### Getting part of an object

A single byte range can be specified for each object.

In [6]:
key = "20240111/00z/0p4-beta/oper/20240111000000-0h-oper-fc.grib2"
r = {
    "bucket": bucket_name,
    "objects": [
        {"object": key, "start": 0, "range": 438714},
    ],
}

# ds is a FieldList read from a grib file on disk
ds = earthkit.data.from_source("s3", r, stream=False)
ds.ls()

Unnamed: 0,centre,shortName,typeOfLevel,level,dataDate,dataTime,stepRange,dataType,number,gridType
0,ecmf,gh,isobaricInhPa,200,20240111,0,0,fc,,regular_ll
1,ecmf,gh,isobaricInhPa,925,20240111,0,0,fc,,regular_ll


### Getting parts of multiple objects

In [7]:
key1 = "20240111/00z/0p4-beta/oper/20240111000000-0h-oper-fc.grib2"
key2 = "20240111/00z/0p4-beta/oper/20240111000000-12h-oper-fc.grib2"
r = {
    "bucket": bucket_name,
    "objects": [
        {"object": key1, "start": 0, "range": 438714},
        {"object": key2, "start": 0, "range": 451531},
    ],
}

# ds is a FieldList read from a grib file on disk
ds = earthkit.data.from_source("s3", r, stream=False)
ds.ls()

Unnamed: 0,centre,shortName,typeOfLevel,level,dataDate,dataTime,stepRange,dataType,number,gridType
0,ecmf,gh,isobaricInhPa,200,20240111,0,0,fc,,regular_ll
1,ecmf,gh,isobaricInhPa,925,20240111,0,0,fc,,regular_ll
2,ecmf,gh,isobaricInhPa,200,20240111,0,12,fc,,regular_ll
3,ecmf,gh,isobaricInhPa,1000,20240111,0,12,fc,,regular_ll
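The request dictionaries used throughout this notebook share one shape, so they can be assembled by a small function. This is a minimal sketch: `build_request` is a hypothetical helper and not part of earthkit-data. It accepts either a plain key (whole object) or a `(key, start, range)` tuple (part of an object).

```python
def build_request(bucket, parts):
    """Build a request dict with the same shape as those used in this notebook.

    Each item in parts is either a key string or a (key, start, range) tuple.
    """
    objects = []
    for part in parts:
        if isinstance(part, str):
            # Whole object: only the key is given.
            objects.append({"object": part})
        else:
            key, start, length = part
            objects.append({"object": key, "start": start, "range": length})
    return {"bucket": bucket, "objects": objects}


key1 = "20240111/00z/0p4-beta/oper/20240111000000-0h-oper-fc.grib2"
key2 = "20240111/00z/0p4-beta/oper/20240111000000-12h-oper-fc.grib2"

request = build_request(
    "ecmwf-forecasts",
    [(key1, 0, 438714), (key2, 0, 451531)],
)
print(request)
```

The returned dictionary matches the one written by hand in the cell above and can be passed to `earthkit.data.from_source("s3", request, stream=False)` in the same way.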
