# Working with Items in Batch

DynamoDB also provides batch operations.

- `BatchGetItem` reads up to 100 items (and up to 16 MB of data) from one or more tables
- `BatchWriteItem` puts (creates or replaces) or deletes up to 25 items in one or more tables

## BatchWriteItem

There are two ways to perform a `BatchWriteItem` operation in Python with boto3. Both call the same underlying AWS API.

- Use the DynamoDB client's `batch_write_item` method
    - Limited to 25 put or delete requests in a single batch
    - Each individual put or delete is atomic, but the batch as a whole is not
        - Check `UnprocessedItems` in the response and resend those requests
- Use the DynamoDB resource table's `batch_writer` method
    - Automatically buffers items and sends them in batches
    - Automatically resends any unprocessed items as needed
    - Returns no response describing the result
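The manual bookkeeping that the first option requires can be sketched as follows. This is a minimal sketch, not part of the original notebook: `batch_write_all`, `max_retries`, and `base_delay` are hypothetical names, and the retry limit and delays are arbitrary. It accepts either a boto3 service resource (`boto3.resource('dynamodb')`, native Python values) or a low-level client (items must then already use the typed `{'S': ...}` format), since both expose `batch_write_item(RequestItems=...)`.

```python
import time


def batch_write_all(dynamodb, table_name, items, max_retries=5, base_delay=0.05):
    """Put every item into table_name with BatchWriteItem (25 requests per call),
    resending whatever DynamoDB reports in UnprocessedItems."""
    for start in range(0, len(items), 25):
        # BatchWriteItem accepts at most 25 put/delete requests per call
        request_items = {
            table_name: [{'PutRequest': {'Item': item}} for item in items[start:start + 25]]
        }
        attempt = 0
        while request_items:
            response = dynamodb.batch_write_item(RequestItems=request_items)
            # UnprocessedItems has the same shape as RequestItems, so it can be resent as-is
            request_items = response.get('UnprocessedItems', {})
            if request_items:
                if attempt >= max_retries:
                    raise RuntimeError(
                        'items still unprocessed after {} retries'.format(max_retries))
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff: 1x, 2x, 4x, ...
                attempt += 1


# Usage (needs AWS credentials):
# batch_write_all(boto3.resource('dynamodb'), 'Starbucks', items)
```

The `batch_writer` option performs the chunking and resending internally, which is why the rest of this section uses it.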

Let's import `data/starbucks.csv` using the `batch_writer` context manager.

```python
# import and get dynamodb resource
import boto3
from boto3.dynamodb.conditions import Key, Attr
from botocore.exceptions import ClientError
from pprint import pprint, pformat
from decimal import Decimal
import time
import multiprocessing as mp
import csv
from datetime import datetime

dynamodb = boto3.resource('dynamodb')
starbucks = dynamodb.Table('Starbucks')
```

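```python
# cleanse raw data
from decimal import InvalidOperation

items = []
fieldnames = ['Brand', 'StoreNumber', 'StoreName', 'OwnershipType', 'StreetAddress', 'City', 'State',
              'Country', 'Postcode', 'PhoneNumber', 'Timezone', 'Longitude', 'Latitude']

with open('data/starbucks.csv', 'r', encoding='utf-8', newline='') as f:
    reader = csv.DictReader(f, fieldnames=fieldnames)
    next(reader, None)  # skip the file's own header row

    for row in reader:
        # omit empty columns instead of storing empty strings
        item = {key: value for key, value in row.items() if value != ''}
        # boto3 rejects Python floats, so coordinates are stored as Decimal
        try:
            item['Longitude'] = Decimal(item['Longitude'])
            item['Latitude'] = Decimal(item['Latitude'])
        except (KeyError, InvalidOperation):
            pass  # missing or malformed coordinates are left as-is
        # composite attribute combining state and city
        if 'State' in item and 'City' in item:
            item['StateCity'] = item['State'] + '::' + item['City']

        items.append(item)

print('Total rows in items list is {}.'.format(len(items)))
print('Here is a sample data: \n{}'.format(pformat(items[0])))
```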

```
Total rows in items list is 25600.
Here is a sample data: 
{'Brand': 'Starbucks',
 'City': 'Andorra la Vella',
 'Country': 'AD',
 'Latitude': Decimal('42.51'),
 'Longitude': Decimal('1.53'),
 'OwnershipType': 'Licensed',
 'PhoneNumber': '376818720',
 'Postcode': 'AD500',
 'State': '7',
 'StateCity': '7::Andorra la Vella',
 'StoreName': 'Meritxell, 96',
 'StoreNumber': '47370-257954',
 'StreetAddress': 'Av. Meritxell, 96',
 'Timezone': 'GMT+1:00 Europe/Andorra'}
```


```python
# batch put items in parallel
def group_items(iterator, n=25):
    """
    Yield successive lists of at most n items from the iterable
    """
    accumulator = []

    for item in iterator:
        accumulator.append(item)
        if len(accumulator) == n:
            yield accumulator
            accumulator = []

    if accumulator:
        yield accumulator

def table_batch_write(table, items):
    """
    Write items with Table.batch_writer(), which splits them into
    BatchWriteItem requests and resends unprocessed items
    """
    ts = time.time()

    # create the resource inside the worker: boto3 resources must not be shared across processes
    with boto3.resource('dynamodb').Table(table).batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)

    te = time.time()
    duration_ms = (te - ts) * 1000
    print('table_batch_write is done in {:.2f} ms'.format(duration_ms))


table = 'Starbucks'
batch_size = 1000  # items handed to each worker call, not the 25-item request size

ts = time.time()

# Note: with the 'spawn' start method (the default on Windows and macOS),
# worker functions must be importable, so define them in a .py module there.
with mp.Pool(processes=mp.cpu_count()) as pool:
    # starmap blocks until every group has been written
    pool.starmap(table_batch_write, [(table, grouped_items) for grouped_items in group_items(items, batch_size)])

te = time.time()
duration_ms = (te - ts) * 1000
print('all is done in {:.2f} ms'.format(duration_ms))
```

```
table_batch_write is done in 932.39 ms
table_batch_write is done in 927.25 ms
table_batch_write is done in 991.20 ms
table_batch_write is done in 1032.37 ms
table_batch_write is done in 1011.40 ms
table_batch_write is done in 1089.11 ms
table_batch_write is done in 1208.03 ms
table_batch_write is done in 1274.58 ms
table_batch_write is done in 1647.18 ms
table_batch_write is done in 1905.23 ms
table_batch_write is done in 1706.88 ms
table_batch_write is done in 2002.59 ms
table_batch_write is done in 2002.82 ms
table_batch_write is done in 2164.59 ms
table_batch_write is done in 2211.30 ms
table_batch_write is done in 1961.16 ms
table_batch_write is done in 1989.15 ms
table_batch_write is done in 1973.75 ms
table_batch_write is done in 1968.24 ms
table_batch_write is done in 2332.76 ms
table_batch_write is done in 1947.73 ms
table_batch_write is done in 1838.47 ms
table_batch_write is done in 2162.88 ms
table_batch_write is done in 2024.52 ms
table_batch_write is done in 540.33 ms
...
```
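To see how the cell above divides the work: with `batch_size = 1000`, the 25,600 cleansed rows become 26 worker calls (25 of 1,000 items and one of 600). Assuming `batch_writer` flushes every 25 items (boto3's default `flush_amount`) and nothing has to be resent, those calls issue 1,024 `BatchWriteItem` requests in total. The hypothetical helper below only reproduces that arithmetic; it does not talk to DynamoDB.

```python
def count_requests(total_items, group_size=1000, request_size=25):
    """Return (worker groups, BatchWriteItem requests), assuming each group
    is flushed in request_size chunks and no items are resent."""
    groups = (total_items + group_size - 1) // group_size  # ceiling division
    full_groups, remainder = divmod(total_items, group_size)
    # each full group needs ceil(group_size / request_size) requests
    requests = full_groups * ((group_size + request_size - 1) // request_size)
    if remainder:
        # the last, smaller group is flushed when its batch_writer context exits
        requests += (remainder + request_size - 1) // request_size
    return groups, requests


print(count_requests(25600))  # (26, 1024)
```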

## BatchGetItem

The `BatchGetItem` operation returns the attributes of one or more items from one or more tables. You identify requested items by primary key.

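A single operation can retrieve up to 16 MB of data, which can contain as many as 100 items. If DynamoDB cannot return everything (for example, because the 16 MB limit is reached or the request is throttled), it returns a partial result and lists the remaining keys in `UnprocessedKeys`.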
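Fetching more than 100 keys therefore means splitting the keys into chunks and re-requesting whatever comes back in `UnprocessedKeys`. This is a minimal sketch, not the notebook's code: `batch_get_all`, `max_retries`, and `base_delay` are hypothetical names, the retry limit is arbitrary, and `dynamodb` is assumed to be a boto3 service resource (or any object with a compatible `batch_get_item` method). DynamoDB does not return items in any particular order, so the result is not aligned with `keys`.

```python
import time


def batch_get_all(dynamodb, table_name, keys, max_retries=5, base_delay=0.05):
    """Fetch every key from table_name with BatchGetItem (100 keys per call),
    re-requesting whatever DynamoDB reports in UnprocessedKeys."""
    results = []
    for start in range(0, len(keys), 100):
        # BatchGetItem accepts at most 100 keys per call
        request_items = {table_name: {'Keys': keys[start:start + 100]}}
        attempt = 0
        while request_items:
            response = dynamodb.batch_get_item(RequestItems=request_items)
            results.extend(response.get('Responses', {}).get(table_name, []))
            # UnprocessedKeys has the same shape as RequestItems
            request_items = response.get('UnprocessedKeys', {})
            if request_items:
                if attempt >= max_retries:
                    raise RuntimeError(
                        'keys still unprocessed after {} retries'.format(max_retries))
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
                attempt += 1
    return results  # order is not guaranteed to match keys
```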

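By default, `BatchGetItem` performs eventually consistent reads on every table in the request. To get strongly consistent reads instead, set `ConsistentRead` to `True` for the relevant table in `RequestItems`.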
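The flag lives inside each table's entry, so one request can mix consistency levels across tables. The sketch below builds such an entry and estimates the capacity cost, using the documented rule that each item in a batch is rounded up to 4 KB separately and that an eventually consistent read costs half as much as a strongly consistent one. `batch_get_request`, `estimated_read_units`, and the 300-byte item sizes are hypothetical; if the two Starbucks items are each under 4 KB, this estimate agrees with the 1.0 capacity unit reported by the eventually consistent request below, and predicts 2.0 with `ConsistentRead=True`.

```python
import math


def batch_get_request(table_name, keys, consistent_read=False):
    """Build the RequestItems argument for batch_get_item.
    ConsistentRead is set per table, so each table in a request can differ."""
    return {table_name: {'Keys': list(keys), 'ConsistentRead': consistent_read}}


def estimated_read_units(item_sizes_bytes, consistent_read=False):
    """Rough read-capacity estimate for one BatchGetItem call.
    Each item is rounded up to the next 4 KB on its own; a strongly consistent
    read costs 1 unit per 4 KB and an eventually consistent read half that."""
    units = sum(math.ceil(size / 4096) for size in item_sizes_bytes)
    return float(units) if consistent_read else units / 2


request_items = batch_get_request(
    'Starbucks',
    [{'StoreNumber': '47370-257954'}, {'StoreNumber': '47370-257955'}],
    consistent_read=True,
)
# response = dynamodb.batch_get_item(RequestItems=request_items)  # needs AWS credentials
print(estimated_read_units([300, 300]))                        # 1.0
print(estimated_read_units([300, 300], consistent_read=True))  # 2.0
```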

```python
# the service resource exposes batch_get_item, so keys use native Python values
response = dynamodb.batch_get_item(
    RequestItems={
        'Starbucks': {
            'Keys': [
                {
                    'StoreNumber': '47370-257954'
                },
                {
                    'StoreNumber': '47370-257955'
                }
            ]
        }
    },
    ReturnConsumedCapacity='INDEXES'
)

pprint(response)
```

```
{'ConsumedCapacity': [{'CapacityUnits': 1.0,
                       'Table': {'CapacityUnits': 1.0},
                       'TableName': 'Starbucks'}],
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-length': '540',
                                      'content-type': 'application/x-amz-json-1.0',
                                      'date': 'Sun, 04 Oct 2020 19:37:22 GMT',
                                      'server': 'Server',
                                      'x-amz-crc32': '2673786153',
                                      'x-amzn-requestid': 'J3H470A1JMSG6MQFMENE3NOE4RVV4KQNSO5AEMVJF66Q9ASUAAJG'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'J3H470A1JMSG6MQFMENE3NOE4RVV4KQNSO5AEMVJF66Q9ASUAAJG',
                      'RetryAttempts': 0},
 'Responses': {'Starbucks': [{'Brand': 'Starbucks',
                              'City': 'Andorra la Vella',
...
```
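`BatchGetItem` does not report keys that have no matching item; they are simply absent from `Responses`. One way to find them is to compare the request with the returned items, as in this hypothetical `missing_keys` helper. The check only makes sense once `UnprocessedKeys` is empty, since throttled keys would otherwise look missing too.

```python
def missing_keys(requested_keys, returned_items, key_names):
    """Return the requested keys that have no matching item in the response."""
    # index returned items by their primary-key values
    returned = {tuple(item[name] for name in key_names) for item in returned_items}
    return [key for key in requested_keys
            if tuple(key[name] for name in key_names) not in returned]


# Usage with the response above (hypothetical):
# missing_keys(keys, response['Responses']['Starbucks'], ['StoreNumber'])
```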

## Batch Operations and Error Handling

A batch operation can tolerate the failure of individual requests in the batch. For example, consider a `BatchGetItem` request to read five items. Even if some of the underlying `GetItem` requests fail, this does not cause the entire `BatchGetItem` operation to fail. However, if all five read operations fail, then the entire `BatchGetItem` fails.

The batch operations return information about individual requests that fail so that you can diagnose the problem and retry the operation. For `BatchGetItem`, the tables and primary keys in question are returned in the `UnprocessedKeys` value of the response. For `BatchWriteItem`, similar information is returned in `UnprocessedItems`.

The most likely cause of a failed read or a failed write is throttling. If DynamoDB returns any unprocessed items, retry the batch operation on those items. Retrying immediately can fail again because the individual tables are still being throttled, so use an exponential backoff algorithm: the requests in a delayed batch are much more likely to succeed.
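One way to compute the retry delays is sketched below. It uses the "full jitter" variant of exponential backoff, where each delay is drawn uniformly between zero and an exponentially growing ceiling; this is one common choice, not a scheme the text above prescribes. `backoff_delays` and its `base`/`cap` defaults are hypothetical, and the usage loop is left commented out because it needs AWS credentials.

```python
import random


def backoff_delays(max_retries, base=0.05, cap=5.0, rng=random):
    """Yield one sleep duration (seconds) per retry using "full jitter":
    a uniformly random delay between 0 and min(cap, base * 2**attempt)."""
    for attempt in range(max_retries):
        yield rng.uniform(0, min(cap, base * 2 ** attempt))


# Usage sketch:
# response = dynamodb.batch_write_item(RequestItems=request_items)
# for delay in backoff_delays(8):
#     request_items = response.get('UnprocessedItems', {})
#     if not request_items:
#         break
#     time.sleep(delay)
#     response = dynamodb.batch_write_item(RequestItems=request_items)
# # anything still in response['UnprocessedItems'] here was never written
```

Randomizing the delay keeps many clients that were throttled at the same moment from retrying in lockstep.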