<b><center>
<span style="font-size: 24pt; line-height: 1.2">
Topics in SW Engineering:<br>Microservices and Cloud Native Applications
</span>
</center></b>
<br>
<p>
<i><center>
<span style="font-size: 20pt; line-height: 1.2">
Lecture 3: Application Design Models, Elastic Beanstalk/PaaS Concept, <br>REST (Continued), DynamoDB, Calling REST APIs, Pub/Sub-SNS, CloudFront<br>
(Draft 0.8)
</span>
</center></i>

# Lecture Overview

1. Questions/Answers.
<br><br>
1. Course changes
<br><br>
1. PaaS/Beanstalk Context<br><br>
1. Application design methodology.
<br><br>
1. REST: Repetition and more detail.
<br><br>
1. A little more about application logic.
<br><br>
1. Serverless, Function-as-a-Service, Lambda Functions.
<br><br>
1. DynamoDB.
<br><br>
1. <del>API Gateway, CloudFront, API Management.</del>
<br><br>
1. Next project phase.


# Questions?

# Course Changes

- I am trying to get the TA situation squared away, and will redouble my effort over the weekend. We have two TAs.


- I am going to be a lot more precise about project milestone dates and deliverables. In addition:
    - We will start having mandatory team meetings.
    - I will set up a system for weekly status report submissions.

# Application Design Methodology: Data-Out/UX-In

| <img src="../../images/e6156-Slides-50.jpg"> |
| :---: |
| __Data-Out/UX-In Design__ |

<br><br>
__We will use data out for the first step.__

- The logical data model that we will start with is ...

| <img src="../../images/microservice_data_model.jpg"> |
| :---: |
| __Data-Out/UX-In Design__ |


- The data linkage will span microservices:
    - Core customer information is the first microservice we are building, and is HW1.
    - Social Information is an independent microservice.
    - Address is a linked, external microservice, but we will access it through a caching adaptor.
    - There is an independent user profile microservice.


- Starting with data means that we need to surface the data through a service model and linked resource model. This will help us understand core concepts in REST.


- This is not a UI course. Our primary focus will be the service, databases, APIs, etc. We will do some basic UI work, and you can do as much on your projects as you want.

- There are other application design methodologies, e.g.
    - [API First](https://swagger.io/resources/articles/adopting-an-api-first-approach/)
    - [Test Driven Development](https://en.wikipedia.org/wiki/Test-driven_development)
    - [Model Driven Development/Engineering](https://en.wikipedia.org/wiki/Model-driven_engineering)

# PaaS and Beanstalk

| <img src="../../images/beanstalk_1.jpg"> |
| :---: |
| __IaaS, PaaS, SaaS__ |

<br><br>

| <img src="../../images/beanstalk_concept.jpg"> |
| :---: |
| [Beanstalk Concept](https://www.youtube.com/watch?v=nRLZZefLDqU) |

- The basic idea is that with virtual machines (IaaS), you own a lot of set up, software installation, configuration, etc.


- With PaaS, the cloud provides all of the infrastructure and platform software, and you just "drop your code" into the platform container.

# REST

## Overview

- "Representational State Transfer (REST) is an architectural style that defines a set of constraints to be used for creating web services. Web Services that conform to the REST architectural style, or RESTful web services, provide interoperability between computer systems on the Internet. REST-compliant web services allow the requesting systems to access and manipulate textual representations of web resources by __using a uniform and predefined set of stateless operations.__ Other kinds of web services, such as SOAP web services, expose their own arbitrary sets of operations." (Emphasis added.) (https://en.wikipedia.org/wiki/Representational_state_transfer)


- Non-RESTful applications surface service/domain specific operations, e.g.
    - ```open_account(...)```
    - ```transfer(...)```
    - ```check_balance(...)```
    

- The uniform, predefined REST operations are the HTTP Methods:
    - GET
    - PUT (or PATCH)
    - POST
    - DELETE
    
    
- These represent Create-Retrieve-Update-Delete operations on __resources__ identified by __URLs.__
    - POST is Create
    - GET is Retrieve
    - PUT (or PATCH) is Update
    - DELETE is Delete.
    
    
- __Note:__ People often confuse the following two things. They are not the same.
    - Remote procedure call/service invocation using HTTP
    - REST
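
- To make the distinction concrete, here is a minimal sketch of the REST style. Instead of `open_account(...)` or `check_balance(...)`, one dispatcher applies the uniform methods to resources named by a path. The in-memory `ACCOUNTS` store, the `/accounts` path and the `handle` function are illustrative assumptions, not part of our project code.

```python
import itertools

# Hypothetical in-memory store; a real service would sit on a database.
ACCOUNTS = {}
_next_id = itertools.count(1)


def handle(method, path, body=None):
    """Apply a uniform method to /accounts or /accounts/{id}.

    Returns a (status_code, payload) tuple.
    """
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "accounts" or len(parts) > 2:
        return 404, None

    if len(parts) == 1:                       # the collection resource
        if method == "GET":                   # Retrieve
            return 200, list(ACCOUNTS.values())
        if method == "POST":                  # Create
            account_id = str(next(_next_id))
            ACCOUNTS[account_id] = {**(body or {}), "id": account_id}
            return 201, ACCOUNTS[account_id]
        return 405, None

    account_id = parts[1]                     # a single resource
    if account_id not in ACCOUNTS:
        return 404, None
    if method == "GET":                       # Retrieve
        return 200, ACCOUNTS[account_id]
    if method == "PUT":                       # Update (full replacement)
        ACCOUNTS[account_id] = {**(body or {}), "id": account_id}
        return 200, ACCOUNTS[account_id]
    if method == "DELETE":                    # Delete
        del ACCOUNTS[account_id]
        return 204, None
    return 405, None
```

- Note that "check balance" does not need its own operation in this style. It is a `GET` on the account resource.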
    
    
- The six core characteristics of the REST style are:
    1. Client–server architecture
    2. Statelessness
    3. Cacheability
    4. Layered system
    5. Code on demand (optional)
    6. Uniform interface


- You may also hear the term __Hypermedia As The Engine Of Application State (HATEOAS).__
    

## Client-Server Architecture

| <img src="../../images/rest-client-server.jpg"> |
| :---: |
| __REST Client Server__ |

- "The client–server model is a distributed application structure that partitions tasks or workloads between the providers of a resource or service, called servers, and service requesters, called clients. Often clients and servers communicate over a computer network on separate hardware, but both client and server may reside in the same system." (https://en.wikipedia.org/wiki/Client%E2%80%93server_model)


- The concept is straightforward.

## Statelessness

- Statelessness is easy to misunderstand.


- The server _clearly_ has long-lived state information, e.g.
    - Account balances.
    - Customer contact information.
    - Product catalog information in a database.
    - etc.
    
    
- Client-Server interactions have two types of state:
    - Resource state
    - Conversation/Session
    
    
- "In computer science, in particular networking, a session is a temporary and interactive information interchange between two or more communicating devices, or between a computer and user." (https://en.wikipedia.org/wiki/Session_(computer_science))


| <img src="../../images/session-state.jpeg"> |
| :---: |
| __Session/Conversation Start__ |

| <img src="../../images/http_session.jpg"> |
| :---: |
| __HTTP Session__ |

- Database cursors are an example of conversation state.


- Example stateful "service" using cursors.

In [15]:
import pymysql.cursors
import pandas as pd
import json


cnx = pymysql.connect(host='localhost',
                             user='dbuser',
                             password='dbuserdbuser',
                             db='lahman2019raw',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)

cursor = cnx.cursor()


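def get_by_last_name(lastName, birthState):
    # Run the query on the shared, module-level cursor and return the first row.
    # The cursor keeps its position between calls: that is the conversation state.
    cursor.execute("select playerID, nameLast, NameFirst, birthCity, birthState, birthYear " +
                   " from people where nameLast=%s and birthState=%s", (lastName, birthState))
    r = cursor.fetchone()
    return r

def get_next():
    # Return the next row of whatever query was executed last.
    r = cursor.fetchone()
    return r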

- Example stateful client for stateful server.

In [16]:


first = get_by_last_name("Williams", "CA")

print("First = ", first)

done = False
while not done:
    # "row" avoids shadowing Python's built-in next().
    row = get_next()
    if row is None or len(row) == 0:
        done = True
    else:
        print("Next = ", row)

First =  {'playerID': 'willibe01', 'nameLast': 'Williams', 'NameFirst': 'Bernie', 'birthCity': 'Alameda', 'birthState': 'CA', 'birthYear': '1948'}
Next =  {'playerID': 'willido02', 'nameLast': 'Williams', 'NameFirst': 'Don', 'birthCity': 'Los Angeles', 'birthState': 'CA', 'birthYear': '1935'}
Next =  {'playerID': 'williji03', 'nameLast': 'Williams', 'NameFirst': 'Jimy', 'birthCity': 'Santa Maria', 'birthState': 'CA', 'birthYear': '1943'}
Next =  {'playerID': 'willike02', 'nameLast': 'Williams', 'NameFirst': 'Ken', 'birthCity': 'Berkeley', 'birthState': 'CA', 'birthYear': '1964'}
Next =  {'playerID': 'willima04', 'nameLast': 'Williams', 'NameFirst': 'Matt', 'birthCity': 'Bishop', 'birthState': 'CA', 'birthYear': '1965'}
Next =  {'playerID': 'willimi02', 'nameLast': 'Williams', 'NameFirst': 'Mitch', 'birthCity': 'Santa Ana', 'birthState': 'CA', 'birthYear': '1964'}
Next =  {'playerID': 'williri02', 'nameLast': 'Williams', 'NameFirst': 'Rinaldo', 'birthCity': 'Santa Cruz', 'birthState': '

- The server side, e.g. the database, remembers the last position with a cursor. This is session state.


- Statelessness in REST means that the server does not maintain conversation state.


- All requests from the client are complete and self-contained.


- The server _may_ return state to the client that the client must return on subsequent requests. $\Rightarrow$<br>The client maintains any conversation state the server requires.


| <img src="../../images/rest_client.jpg">|
| :---: |
| [REST Self-Contained Messages](http://mrbool.com/rest-architectural-elements-and-constraints/29339) |

- Stateless server example

In [25]:
import pymysql.cursors
import pandas as pd
import json


cnx = pymysql.connect(host='localhost',
                             user='dbuser',
                             password='dbuserdbuser',
                             db='lahman2019raw',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)


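def get_by_last_name_offset(lastName, birthState, offset=0):
    # A fresh cursor per call: the server remembers no position between requests.
    cursor = cnx.cursor()
    # ORDER BY makes the row found at a given offset deterministic across calls.
    cursor.execute("select playerID, nameLast, NameFirst, birthCity, birthState, birthYear " +
                   " from people where nameLast=%s and birthState=%s " +
                   "order by playerID limit 1 offset %s", (lastName, birthState, offset))
    r = cursor.fetchone()
    return r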


- Stateless client example.

In [26]:

done = False
offset = 0

while not done:
    # The client supplies the position on every request.
    row = get_by_last_name_offset("Williams", "CA", offset)
    if row is None or len(row) == 0:
        done = True
    else:
        print("Next = ", row)
        offset += 1

Next =  {'playerID': 'willibe01', 'nameLast': 'Williams', 'NameFirst': 'Bernie', 'birthCity': 'Alameda', 'birthState': 'CA', 'birthYear': '1948'}
Next =  {'playerID': 'willido02', 'nameLast': 'Williams', 'NameFirst': 'Don', 'birthCity': 'Los Angeles', 'birthState': 'CA', 'birthYear': '1935'}
Next =  {'playerID': 'williji03', 'nameLast': 'Williams', 'NameFirst': 'Jimy', 'birthCity': 'Santa Maria', 'birthState': 'CA', 'birthYear': '1943'}
Next =  {'playerID': 'willike02', 'nameLast': 'Williams', 'NameFirst': 'Ken', 'birthCity': 'Berkeley', 'birthState': 'CA', 'birthYear': '1964'}
Next =  {'playerID': 'willima04', 'nameLast': 'Williams', 'NameFirst': 'Matt', 'birthCity': 'Bishop', 'birthState': 'CA', 'birthYear': '1965'}
Next =  {'playerID': 'willimi02', 'nameLast': 'Williams', 'NameFirst': 'Mitch', 'birthCity': 'Santa Ana', 'birthState': 'CA', 'birthYear': '1964'}
Next =  {'playerID': 'williri02', 'nameLast': 'Williams', 'NameFirst': 'Rinaldo', 'birthCity': 'Santa Cruz', 'birthState': 'C

- The caller remembers the position, and does not rely on a cursor.

- Is there a concern about the client modifying or tinkering with the state information?


- Yes. The server can sign the session state so that it can detect tampering, and can also encrypt it so that the client cannot read it.
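
- A minimal sketch of tamper detection, using an HMAC signature rather than encryption. This is a swapped-in technique: signing lets the server detect modification but does not hide the contents. The `SECRET_KEY` value, the token layout and the function names are assumptions for illustration only.

```python
import base64
import hashlib
import hmac
import json

# Assumption: a secret known only to the server.
SECRET_KEY = b"replace-with-a-real-secret"


def issue_token(state):
    """Serialize the conversation state and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(
        json.dumps(state, sort_keys=True).encode("utf-8"))
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def read_token(token):
    """Return the state if the signature matches, otherwise raise ValueError."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode("utf-8"),
                        hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(signature.encode("utf-8"),
                               expected.encode("utf-8")):
        raise ValueError("state token was modified")
    return json.loads(base64.urlsafe_b64decode(payload))
```

- The client stores the token and sends it back unchanged, for example as the "where was I" marker in a paginated query. The server keeps nothing between requests except the secret.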


- Facebook example

| <img src="../../images/facebook-request.jpg"> |
| :---: |
| __Sample Facebook Request__ |

- The Facebook request contains an encrypted _access token._ "In computer systems, an access token contains the security credentials for (...) identifies the user, the user's groups, the user's privileges, and, in some cases, a particular application." (https://en.wikipedia.org/wiki/Access_token)


- The response contains hashed and encrypted session state that MUST be returned to continue the conversational interaction.


- We will see concrete examples when we implement security and [pagination](https://docs.microsoft.com/en-us/azure/architecture/best-practices/api-design).

## Cacheability

- Cacheability means exactly what the word implies. There may be several intermediaries between the client and server that cache a result.


- The intermediaries check the cache on a request and return the cached result without forwarding the request to the server.


| <img src="../../images/cacheability.jpg"> |
| :---: |
| __Cacheability__ |

- The client and server can specify cache control headers in requests and responses.

| <img src="../../images/cache_control_headers.jpeg"> |
| :---: |
| [Subset of Cache Control Directives](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control) |
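
- A minimal sketch of how a server might set cache headers and answer a conditional request. `Cache-Control`, `ETag` and `If-None-Match` are standard HTTP headers. The `build_response` function, its return shape and the 60 second default are assumptions for illustration.

```python
import hashlib
import json


def build_response(resource, request_headers, max_age=60):
    """Return (status, headers, body) for a GET on a JSON resource."""
    body = json.dumps(resource, sort_keys=True)
    # The ETag is a fingerprint of the representation.
    etag = '"' + hashlib.sha256(body.encode("utf-8")).hexdigest() + '"'
    headers = {
        "Cache-Control": f"public, max-age={max_age}",
        "ETag": etag,
    }
    # The client (or an intermediary) already holds this exact version.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, ""
    return 200, headers, body
```

- A cache that holds a copy newer than `max-age` seconds can answer without contacting the server at all. After that, it can revalidate with `If-None-Match` and receive an empty `304 Not Modified` instead of the full body.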

## Layered System

- There may be many, many, many things including other microservices between the client and server. For example,
    - Firewalls.
    - Cache servers.
    - Middleware servers.
    - ... ...
    
    
- Part of designing and deploying microservices and cloud applications involves configuring or developing functionality that resides in various layers/intermediaries. We will see examples in the class.

## Code on Demand

- This primarily means that browsers (or devices) may be able to/have to download code to interact with the server.


- JavaScript in the browser is the most common example, and we will do this in our projects.

## Uniform Interface

- Resource identification in requests:
    - URIs, and nothing else, identify a resource.
    - Clients see resources only through representations (JSON, XML, ...), and are unaware of the underlying realization, e.g. relational database, some legacy application, ...
    
    
- Manipulation of Resources Through Representations: When a client holds a representation of a resource, including any metadata attached, it has enough information to modify or delete the resource on the server, provided it has permission to do so. There is no additional information or data necessary, for example in documentation or other services.


- Self-descriptive messages: Each message includes enough information to describe how to process the message.


- <u>Hypermedia as the Engine of Application State (HATEOAS):</u> Clients deliver state via body contents, query-string parameters, request headers and the requested URI (the resource name). Services deliver state to clients via body content, response codes, and response headers. __Responses contain links to related resources.__ Awareness of how to convert data into URIs is not necessary.


- This will become more clear as we build out our services. The best way to learn this vaguely explained concept is by implementing it.
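
- A minimal sketch of a response that carries links to related resources. The `links`/`rel`/`href` layout is one common convention, not a standard, and the `address` and `profile` relations and the `/api/customers` base path are assumptions for illustration.

```python
def customer_representation(customer, base_url="/api/customers"):
    """Return the customer's data plus links to the related resources."""
    self_url = f"{base_url}/{customer['id']}"
    return {
        **customer,
        "links": [
            {"rel": "self", "href": self_url},
            {"rel": "address", "href": f"{self_url}/address"},
            {"rel": "profile", "href": f"{self_url}/profile"},
        ],
    }


print(customer_representation({"id": "c1", "last_name": "Williams"}))
```

- The client follows the `href` values it receives. It does not need to know how to construct the address or profile URL from the customer's data.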

## Defining and Documenting a REST Interface


- The [Open API Specification](https://swagger.io/specification/) provides a model (and some tools) for thinking about APIs and how to model/define them.


- [Open API Explorer](http://openapi-map.apihandyman.io/?version=3.0) is an interactive tool for understanding Open API definitions and the elements.


| <img src="../../images/open_api.jpeg"> |
| :---: |
| [Open API Explorer](http://openapi-map.apihandyman.io/?version=3.0) |

- __Open API Swagger Demo__


- Open API is a systematic, thorough, complete, ... approach to publishing and collaborating on APIs.


- We do not need to do anything this systematic but the model is good to understand.


- The basic pattern focuses on paths.

```
/Customers
    GET /Customers?<query parameters>&fields=<list of fields>
    POST /Customers
    
/Customers/{id}
    GET /Customers/{id}?fields=<list of fields>
    PUT
    DELETE
    
/Customers/{id}/<Some related resource>
```

- The code/logic that implements a resource has the following inputs:
    - Path
    - Path parameters
    - Query parameters
    - Headers
    - Body
   
   
- We will understand these concepts as we complete the ```CustomerInfo``` microservice, and in our next two microservices: ```Addresses``` and ```CustomerProfiles```. We will implement the next two microservices using Lambda functions and DynamoDB.


- But first, we will build a simple set of "functions" for verifying email addresses and phone numbers.

## Summary

- There is no "standard" or set of hard and fixed rules for designing a REST API.


- There are countless descriptions of and opinions on best practices, e.g. https://docs.microsoft.com/en-us/azure/architecture/best-practices/api-design.



- The main requirement is consistency. If you develop 20 microservices, all of their APIs should consistently follow the same pattern and apply the same best practices.

# REST $-$ Some Additional Details

## Structure

<hr style="height:1px;">

| <img src="../../images/hw2_intro.jpeg"> |
| :---: |
| __HW2 Concept__ |

<hr style="height:1px;">



## Resource

<hr style="height:2px;">

| <img src="../../images/rest_concepts.png"> |
| :---: |
| [Resource Concept](https://restful-api-design.readthedocs.io/en/latest/resources.html) |

<hr style="height:2px;">

"The fundamental concept in any RESTful API is the resource. A resource is an object with a type, associated data, relationships to other resources, and a set of methods that operate on it. It is similar to an object instance in an object-oriented programming language, with the important difference that only a few standard methods are defined for the resource (corresponding to the standard HTTP GET, POST, PUT and DELETE methods), while an object instance typically has many methods.

Resources can be grouped into collections. Each collection is homogeneous so that it contains only one type of resource, and unordered. Resources can also exist outside any collection. In this case, we refer to these resources as singleton resources. Collections are themselves resources as well.

Collections can exist globally, at the top level of an API, but can also be contained inside a single resource. In the latter case, we refer to these collections as sub-collections. Sub-collections are usually used to express some kind of “contained in” relationship. We go into more detail on this in Relationships." (https://restful-api-design.readthedocs.io/en/latest/resources.html)

## URLs $-$ Digression

- A little more about ```mysql+pymysql://dbuser:dbuser@localhost/lahman2017```


- The connection specification above is a URL.


- "A Uniform Resource Locator (URL), colloquially termed a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), although many people use the two terms interchangeably. URLs occur most commonly to reference web pages (http), but are also used for file transfer (ftp), email (mailto), database access (JDBC), and many other applications." (https://en.wikipedia.org/wiki/URL)


- A URL has the format


```URI = scheme:[//authority]path[?query][#fragment]```


- The components are:
    1. Scheme: Information about the protocol, connector library, ...
    2. Authority: Optional ```userid:password@``` user information, followed by the host and an optional ```:port```.
    3. Path: File system like folder path to the resource.
    4. Query: Parameters that refine the request. We will cover the query string later.
    5. Fragment: A location or subset of the resource, e.g. a section with heading.
    
    
- We have seen how we connect to MySQL from Python using the ```PyMySQL``` library.

```
default_cnx = pymysql.connect(host='localhost',
                             user='dbuser',
                             password='dbuser',
                             db='lahman2017',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)
```



- Some connector libraries support a single connection string of the form:<br>
```jdbc:mysql://someuserid:somepassword@www.myurl.com:3306```


- Other databases have similar concepts, for example CouchDB.


- The REST (HTTP) URL is the same concept<br> ```http://127.0.0.1:5000/api/lahman2019clean/people?nameLast=williams&birthCity=San%20Diego```


- In this case:
    - ```http``` is the protocol (scheme) instead of ```mysql+pymysql```
    - ```/api/lahman2019clean/people``` is the resource.
    - ```nameLast=williams&birthCity=San%20Diego``` is the query, and analogous to:
        - ```WHERE nameLast='williams' and birthCity='San Diego'```
        - Or the weird Donald Ferguson templates ```{"nameLast": "Williams", "birthCity": "San Diego"}```
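
- To see the components directly, here is a small sketch that splits the two URLs from this section with Python's standard `urllib.parse` module. Only the variable names are mine.

```python
from urllib.parse import parse_qs, urlsplit

# The database connection URL from the start of this section.
db_url = urlsplit("mysql+pymysql://dbuser:dbuser@localhost/lahman2017")
print(db_url.scheme)      # scheme
print(db_url.netloc)      # authority: userinfo@host
print(db_url.username, db_url.password, db_url.hostname)
print(db_url.path)        # path

# The REST URL.
rest_url = urlsplit(
    "http://127.0.0.1:5000/api/lahman2019clean/people"
    "?nameLast=williams&birthCity=San%20Diego")
print(rest_url.scheme, rest_url.hostname, rest_url.port)
print(rest_url.path)
# parse_qs decodes %20 and returns a list of values per parameter.
print(parse_qs(rest_url.query))
```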
        
        
- There is no common convention for _projection,_ i.e. choosing fields. We will use the convention ```fields=x,y,z```

## REST API, Protocol, Formats

<hr style="height:1px;">

| <img src="../../images/http_rest_protocol.jpeg" > |
| :---: |
| __HTTP and REST__ |

<hr style="height:1px;">

- URL paths have a pattern in most applications

| Path | Mapping |
| :---: | :--- |
| /api	| The API entry point |
| /api/:coll	| A top-level collection named “coll” |
| /api/:coll/:id	| The resource “id” inside/related to collection “coll” |
| /api/:coll/:id/:subcoll	| Sub-collection “subcoll” under resource “id” |
| /api/:coll/:id/:subcoll/:subid	| The resource “subid” inside “subcoll” |


- Path examples:
    - ```/api/people/willite01```
    - ```/api/people/willite01/batting```
    - ```/api/people/willite01/batting/BOS_1960_1```
    
    
- Query string:

"On the internet, a query string is the part of a uniform resource locator (URL) which assigns values to specified parameters. The query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML form.

A web server can handle a Hypertext Transfer Protocol request either by reading a file from its file system based on the URL path or by handling the request using logic that is specific to the type of resource. In cases where special logic is invoked, the query string will be available to that logic for use in its processing, along with the path component of the URL." (https://en.wikipedia.org/wiki/Query_string)


- Query string example:
    - ```http://127.0.0.1/api/people?nameLast=Williams&nameFirst=Ted```
    - maps to
    - ```SELECT * FROM people where nameLast='Williams' and nameFirst='Ted'```
    
    
- There is no standard way to specify _projection._ A common convention is ```fields=f1,f2,...```


- Full example:
    - ```http://127.0.0.1/api/people?nameLast=Williams&nameFirst=Ted&fields=playerID,nameLast,nameFirst,throws,bats```
    - maps to
    - ```SELECT playerID,nameLast,nameFirst,throws,bats FROM people where nameLast='Williams' and nameFirst='Ted'```
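
- The mapping above can be sketched as a small function. This is an illustration, not the HW2 solution. `url_to_sql` and `ALLOWED_COLUMNS` are hypothetical names. Values are passed as `%s` parameters, as in the PyMySQL examples. Column names cannot be passed as parameters, so the sketch checks them against an allow-list.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumption: the columns a caller may filter on or project.
ALLOWED_COLUMNS = {"playerID", "nameLast", "nameFirst", "throws", "bats",
                   "birthCity", "birthState"}


def url_to_sql(url, table="people"):
    """Translate query parameters and fields=... into (sql, args)."""
    params = dict(parse_qsl(urlsplit(url).query))
    fields = params.pop("fields", None)
    columns = [f.strip() for f in fields.split(",")] if fields else ["*"]

    # Reject any column name that is not explicitly allowed.
    for name in list(params) + [c for c in columns if c != "*"]:
        if name not in ALLOWED_COLUMNS:
            raise ValueError(f"Unknown column: {name}")

    sql = f"SELECT {','.join(columns)} FROM {table}"
    if params:
        sql += " WHERE " + " AND ".join(f"{name}=%s" for name in params)
    return sql, tuple(params.values())


sql, args = url_to_sql(
    "http://127.0.0.1/api/people?nameLast=Williams&nameFirst=Ted"
    "&fields=playerID,nameLast,nameFirst,throws,bats")
print(sql)
print(args)
```

- The result can be handed to `cursor.execute(sql, args)`.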
    
    
- These are very common conventions, which we will use for HW2. Applications and frameworks also use other conventions.

- ```limit``` and ```offset``` implement _pagination._

"Currently, when an HTTP GET request is issued on ... route, all of the table's rows are returned. This may not be a big deal with only 107 rows in ... table, but imagine what would happen if the table contained thousands or millions of rows. Clients such as mobile and web apps generally consume and display only a fraction of the rows available in the database and then fetch more rows when needed — perhaps when a user scrolls down or clicks the "next" button on some pagination control in the UI.

To allow for this, REST APIs need to support a means of paginating the results returned. Once pagination is supported, sorting capabilities become important as data usually needs to be sorted prior to pagination being applied. Additionally, a means of filtering data is very important for performance. Why send data from the database, through the mid-tier, and all the way to the client if it's not needed?" (https://dzone.com/articles/creating-a-rest-api-manual-pagination-sorting-and)


- You have seen that I have to do pagination when submitting queries to MySQL from Jupyter notebooks. If I do not use ```LIMIT```, the amount of returned data causes the Jupyter notebook/browser to freeze/lock-up.
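
- A minimal sketch of how a service might return pagination links alongside a page of results. The `current`/`next`/`prev` names and the `page_links` function are assumptions. The only convention taken from above is ```limit``` and ```offset```.

```python
from urllib.parse import urlencode


def page_links(path, query, limit, offset, total):
    """Build links for the current, next and previous pages."""
    def link(new_offset):
        return path + "?" + urlencode(
            {**query, "limit": limit, "offset": new_offset})

    links = {"current": link(offset)}
    if offset + limit < total:          # more rows remain after this page
        links["next"] = link(offset + limit)
    if offset > 0:                      # there is an earlier page
        links["prev"] = link(max(offset - limit, 0))
    return links


print(page_links("/api/people", {"nameLast": "Williams"},
                 limit=10, offset=10, total=25))
```

- Because the position travels in the link, the client holds the conversation state and the server stays stateless.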


## Response Codes

See https://restfulapi.net/http-status-codes/


In [2]:
# Display the associated webpage in an embedded frame
from IPython.display import IFrame
url = 'https://restfulapi.net/http-status-codes/'
IFrame(url, width=900, height=750)
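
- A minimal sketch of choosing a response code with the standard library's `http.HTTPStatus`. The mapping is one reasonable convention, not a rule, and the `status_for` function and its flags are hypothetical.

```python
from http import HTTPStatus


def status_for(method, found=True, valid=True):
    """Pick a response code for the outcome of a request."""
    if not valid:
        return HTTPStatus.BAD_REQUEST       # 400: malformed input
    if method == "POST":
        return HTTPStatus.CREATED           # 201: new resource created
    if not found:
        return HTTPStatus.NOT_FOUND         # 404: no such resource
    if method == "DELETE":
        return HTTPStatus.NO_CONTENT        # 204: success, nothing to return
    return HTTPStatus.OK                    # 200: GET or PUT succeeded


code = status_for("GET", found=False)
print(int(code), code.phrase)
```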

# A Little More about Business Logic

- We have seen that I have
    - An application/business logic layer/thingy.
    - A data access object/thingy.
    
    
- Because our applications are simple, a lot of the time my application logic simply calls the data object.


- There typically is a lot of logic in the application layer, and often the layer has to use several other services or data objects.


- A simple example is a simulated application for managing this class' students, teams, ...


- The logic for Registration or creating a student entry minimally needs to:
    - Validate the input parameters.
    - Ensure that the student is actually registered in the class.
    
    
- We can accomplish the second validation by calling the CourseWorks API.

In [6]:
import json
import sys
sys.path.append('/Users/donaldferguson/OneDrive - ANSYS, Inc/Columbia/Projects/ClassTeams2')
import Context.ContextHack

In [7]:
token = Context.ContextHack.token

In [8]:
token

{'Authorization': 'Bearer <redacted>'}

In [9]:
import requests
url = "https://courseworks2.columbia.edu/api/v1/courses/87722/students"
headers = {}

headers["Authorization"] = token['Authorization']
print(headers)
res = requests.get(url=url, headers=headers)
#print("Turn on printing in lecture.")
print(json.dumps(res.json(), indent=2))


{'Authorization': 'Bearer <redacted>'}
[
  {
    "id": 418261,
    "name": "Jie An",
    "created_at": "2019-02-14T09:12:05-05:00",
    "sortable_name": "An, Jie",
    "short_name": "Jie An",
    "sis_user_id": "ja3375",
    "integration_id": null,
    "root_account": "courseworks2.columbia.edu",
    "login_id": "ja3375"
  },
  {
    "id": 361449,
    "name": "Ryan Anderson",
    "created_at": "2017-10-15T08:56:20-04:00",
    "sortable_name": "Anderson, Ryan",
    "short_name": "Ryan Anderson",
    "sis_user_id": "ra2929",
    "integration_id": null,
    "root_account": "courseworks2.columbia.edu",
    "login_id": "ra2929"
  },
  {
    "id": 423960,
    "name": "Sidharth Bambah",
    "created_at": "2019-03-15T16:53:25-04:00",
    "sortable_name": "Bambah, Sidharth",
    "short_name": "Sidharth Bambah",
    "sis_user_id": "sb4283",
    "integration_id": null,
    "root_account": "courseworks2.columbia.edu",
    "login_id": "sb42

- The simple application has an adaptor/module for accessing CourseWorks.


- The create student function uses the API to validate enrollment.

```
    @classmethod
    def create_student(cls, student_info):

        for f in StudentService.required_create_fields:
            v = student_info.get(f, None)
            if v is None:
                raise ServiceException(ServiceException.missing_field,
                                       "Missing field = " + f)

            if f == 'email':
                if v.find('@') == -1:
                    raise ServiceException(ServiceException.bad_data,
                           "Email looks invalid: " + v)

        status_code, cw_info = cw_adaptor.get_students(course_id, student_info['uni'])

        if status_code != 200 or len(cw_info) == 0:
            raise ServiceException(ServiceException.authorization_error,
                                   "UNI not found in class enrollment.")

        result = StudentsDO.create_student(student_info=student_info)

        return result

```

# Function-as-a-Service (Serverless Computing)

## Introduction

"Function as a service (FaaS) is a category of cloud computing services that provides a platform allowing customers to develop, run, and manage application functionalities without the complexity of building and maintaining the infrastructure typically associated with developing and launching an app. Building an application following this model is one way of achieving a "serverless" architecture, and is typically used when building microservices applications." (https://en.wikipedia.org/wiki/Function_as_a_service)

<hr style="height:2px">

| <img src="../../images/faas.png"> |
| :---: |
| [Function as a Service](https://www.quora.com/What-is-FaaS) |

<hr style="height:2px">

"Serverless computing is a cloud-computing execution model in which the cloud provider acts as the server, dynamically managing the allocation of machine resources. Pricing is based on the actual amount of resources consumed by an application, rather than on pre-purchased units of capacity. It is a form of utility computing.

The name "serverless computing" is used because the server management and capacity planning decisions are completely hidden from the developer or operator. Serverless code can be used in conjunction with code deployed in traditional styles, such as microservices. Alternatively, applications can be written to be purely serverless and use no provisioned servers at all." (https://en.wikipedia.org/wiki/Serverless_computing)

<hr style="height:2px">

| <img src="../../images/serverless1.png"> |
| :---: |
| [Serverless](https://deloitte.wsj.com/cio/2017/11/09/serverless-computings-many-potential-benefits/) |

<hr style="height:2px">



- How does "serverless" compare to PaaS, e.g. Elastic Beanstalk? With Beanstalk:
    - The zip file more or less contained an application "server."
    - You are aware of stopping, starting, etc. the application server.
    - There are considerations around scaling levels, etc.
    
    
- With serverless:
    - An event happens.
    - The event is bound to a function (a Lambda function).
    - The environment starts, executes the function and terminates.
    


- This is a perspective pulling all the concepts together. I have no idea what it means, but I like beer and pizza.

<hr style="height:2px;">

| <img src="../../images/pizza_as_a_services.jpeg"> |
| :---: |
| [Huh?](https://medium.com/@pkerrison/pizza-as-a-service-2-0-5085cd4c365e) |

## Email Verification Service $-$ Build a Lambda Function

### Introduction

- The ```CustomerInfo``` microservice requires a customer to provide an email.


- Most applications verify ownership of the email by sending a message with an activation link.


- In our scenario, this will transition the user's account from ```PENDING``` to ```ACTIVE```.


- We will implement the function by:
    1. Modifying the Elastic Beanstalk microservice to emit a ```user_changed_event``` when data associated with a user changes.
    2. Developing a Lambda function that:
        1. Subscribes to ```user_changed_event```.
        2. Sends a verification email containing an activation link using the [Simple Email Service](https://aws.amazon.com/ses/).
        3. Invokes a REST API on the ```CustomerInfo``` service to transition the user to ```ACTIVE``` when the user clicks on the activation link.
        4. Displays a web page on success (or failure) of activation.
        
        
- I will show you how to do some parts of this, but you will have to handle the other parts.


- This use case allows us to start understanding security and authorization.
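
- A minimal sketch of the Lambda side of this flow. It is not the finished function. The `Records[].Sns.Message` layout is how SNS delivers messages to Lambda, and `send_email` is the SES call in `boto3`. Everything else is an assumption for illustration: the message fields (`email`, `status`, `token`), `ACTIVATION_URL`, `SENDER`, and the injectable `send` parameter that lets the handler run locally without AWS.

```python
import json
from urllib.parse import urlencode

ACTIVATION_URL = "https://example.com/activate"   # hypothetical endpoint
SENDER = "no-reply@example.com"                    # hypothetical sender


def send_with_ses(to_address, subject, text):
    """Send a plain-text email through SES (needs AWS credentials)."""
    import boto3  # imported lazily so the sketch runs without AWS installed

    boto3.client("ses").send_email(
        Source=SENDER,
        Destination={"ToAddresses": [to_address]},
        Message={"Subject": {"Data": subject},
                 "Body": {"Text": {"Data": text}}},
    )


def lambda_handler(event, context, send=send_with_ses):
    """Email an activation link for every PENDING user in the SNS event."""
    sent = 0
    for record in event.get("Records", []):
        user = json.loads(record["Sns"]["Message"])
        if user.get("status") != "PENDING":
            continue  # only users awaiting activation get an email
        link = ACTIVATION_URL + "?" + urlencode(
            {"email": user["email"], "token": user["token"]})
        send(user["email"], "Please verify your email",
             "Activate your account: " + link)
        sent += 1
    return {"sent": sent}
```

- Lambda calls the handler with `(event, context)`, so the default `send` is used in the cloud. A local test can pass a stand-in function instead.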

### Sequence Diagram

<hr style="height:2px;">

| <img src="../../images/email_uml.jpg"> |
| :---: |
| __Email Verification Sequence Diagram__ |

<hr style="height:2px;">

- The diagram is a [Unified Modeling Language](https://en.wikipedia.org/wiki/Unified_Modeling_Language) [Sequence Diagram](https://en.wikipedia.org/wiki/Sequence_diagram).


- I teach data/information [entity-relationship modeling](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model) in my database class using Crow's Foot notation.


- UML is a rich, expressive visual notation and formal language for precisely modeling systems. One of the main benefits is consistency and unambiguity. If you use PowerPoint or Google Slides, you have to explain what the symbols mean, and people use them differently.


- I use UML because:
    1. It makes me look smart and cool.
    2. I know and worked with the authors.
    3. Your having basic familiarity is a good skill in case it comes up in interviews, and you can put it on your resume.


- I use a small subset of UML because I am lazy and get bored easily.


## Simple Notification Service

- We briefly discussed [publish-subscribe (pub/sub)](https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern) in the first lecture. We also briefly discussed "event driven" being a core characteristic of microservices.


- This is our first exploration of pub/sub, and we will cover it in more detail later in the course.


- AWS [Simple Notification Service](https://aws.amazon.com/sns/) is the cloud platform API for enabling pub/sub.


- Our first task is to create an SNS _topic._ I use the web admin UI for simplicity.


- __Note to instructor:__ Demo SNS topic creation.

## An Aside $-$ Infrastructure as Code

- More realistic environments use "scripts" to set up the virtual infrastructure for application configuration, deployment and change.


- If you remember the infrastructure as code concept from lecture 1, ...

<hr style="height:2px;"> 

| <img src="../../images/infrastructure_as_code.jpg"> |
| :---: |
| __Infrastructure as Code__ |

<hr style="height:2px;"> 

- This is a simple script that creates an SNS topic. A larger build, deployment, change, ... workflow would invoke the script in addition to building other artifacts, configuring infrastructure, etc.


- This could be a step in a Jenkins pipeline or some other automation.

```
#!/c/Users/dferguso/Python37/python

import argparse
import boto3
import json


def run_it():
    parser = argparse.ArgumentParser(description='Create an SNS topic.')
    parser.add_argument('--name', metavar='topic_name', type=str, required=True,
                        help='Name of the topic to create.')
    parser.add_argument('--display_name', metavar='display_name', type=str,
                        help='Optional human-readable display name for the topic.')

    args = parser.parse_args()

    print("Args = ", args)

    client = boto3.client('sns')

    params = {"Name": args.name}

    # Attribute values must be strings, so only pass DisplayName if one was given.
    if args.display_name:
        params["Attributes"] = {'DisplayName': args.display_name}

    response = client.create_topic(**params)

    print("Result = ", json.dumps(response, indent=2))

if __name__ == "__main__":
    run_it()
```

## Lambda Functions

<hr style="height:2px;">

| <img src="../../images/email_lambda.jpeg"> |
| :---: |
| __Email Lambda Configuration__ |

<hr style="height:2px;">

- There are two event sources for triggering the Lambda function:
    - An event on a specific SNS topic.
    - A REST API call to a specific URL published through the API Gateway.
    
    
- We have configured the Lambda function to have authorization for three types of resources:
    - DynamoDB
    - CloudWatch
    - SES (Email Service)


- The code

```
import json

import boto3
from botocore.exceptions import ClientError

# Replace sender@example.com with your "From" address.
# This address must be verified with Amazon SES.
#SENDER = "Donald F. Ferguson <dff@cs.columbia.edu>"
SENDER = "Info <info@dff-cu.org>"

# Replace recipient@example.com with a "To" address. If your account 
# is still in the sandbox, this address must be verified.
RECIPIENT = "dff@cs.columbia.edu"

# Specify a configuration set. If you do not want to use a configuration
# set, comment the following variable, and the 
# ConfigurationSetName=CONFIGURATION_SET argument below.
CONFIGURATION_SET = "ConfigSet"

# If necessary, replace us-east-1 with the AWS Region you're using for Amazon SES.
AWS_REGION = "us-east-1"

# The subject line for the email.
SUBJECT = "Cool message from Don!!!"

# The email body for recipients with non-HTML email clients.
BODY_TEXT = ("Amazon SES Test (Python)\r\n"
             "This email was sent with Amazon SES using the "
             "AWS SDK for Python (Boto)."
            )
            
# The HTML body of the email.
BODY_HTML = """<html>
<head></head>
<body>
  <h1>Amazon SES Test (SDK for Python)</h1>
  <p>This email was sent with
    <a href='https://aws.amazon.com/ses/'>Amazon SES</a> using the
    <a href='https://aws.amazon.com/sdk-for-python/'>
      AWS SDK for Python (Boto)</a>.</p>
      <form action="http://google.com">
        <input type="submit" value="Go to Google" />
    </form>
</body>
</html>
            """            

# The character encoding for the email.
CHARSET = "UTF-8"

# Create a new SES resource and specify a region.
client = boto3.client('ses',region_name=AWS_REGION)

# Try to send the email.
def send_email(em):
    try:
        print("em = ", em)
        #Provide the contents of the email.
        response = client.send_email(
            Destination={
                'ToAddresses': [
                    em
                ],
            },
            Message={
                'Body': {
                    'Html': {
                        'Charset': CHARSET,
                        'Data': BODY_HTML,
                    },
                    'Text': {
                        'Charset': CHARSET,
                        'Data': BODY_TEXT,
                    },
                },
                'Subject': {
                    'Charset': CHARSET,
                    'Data': SUBJECT,
                },
            },
            Source=SENDER,
            # If you are not using a configuration set, comment or delete the
            # following line
            #ConfigurationSetName=CONFIGURATION_SET,
            )
    # Display an error if something goes wrong.
    except ClientError as e:
        print(e.response['Error']['Message'])
    else:
        print("Email sent! Message ID:", response['MessageId'])
        
def handle_sns_event(records):
    
    sns_event = records[0]['Sns']
    topic_arn = sns_event.get("TopicArn", None)
    topic_subject = sns_event.get("Subject", None)
    topic_msg = sns_event.get("Message", None)
    
    print("SNS Subject = ", topic_subject)
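    if topic_msg:
        try:
            json_msg = json.loads(topic_msg)
            print("Message = ", json.dumps(json_msg, indent=2))
        except json.JSONDecodeError:
            # Without a parsed message there is no address to send to.
            print("Could not parse message.")
            return

        em = json_msg.get("customers_email") if isinstance(json_msg, dict) else None
        if em:
            send_email(em)
        else:
            print("Message does not contain customers_email.")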
        
    
def lambda_handler(event, context):
    
    print("Event = ", json.dumps(event, indent=2))
    
    records = event.get("Records", None)
    print("Records = ", json.dumps(records, indent=2))
  
    if records:
        handle_sns_event(records)
        
       
    # TODO implement
    return {
        "statusCode": 200,
        "body": json.dumps('Hello from Lambda!')
    }



```

__Note:__
1. Walk through the code.
2. Demo using the SNS test event in the Lambda console.
3. Demo using SNS; event data is: {"customers_email": "donald.f.ferguson@gmail.com"}
4. Walk through the email setup steps.

- We will go through pub/sub and event based integration versus API calls in a later lecture.
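
To experiment with the event handling locally, without deploying to Lambda or sending email, a sketch like the following may help. It mirrors the way the handler above pulls ```customers_email``` out of the first SNS record. The function name, the sample ARN, and the sample address are hypothetical; the event shape is reduced to the fields the handler reads.

```python
import json


def extract_customer_email(event):
    """Return customers_email from the first SNS record, or None."""
    records = event.get("Records") or []
    if not records:
        return None
    raw_message = records[0].get("Sns", {}).get("Message")
    try:
        payload = json.loads(raw_message)
    except (TypeError, json.JSONDecodeError):
        # Message missing (None) or not valid JSON.
        return None
    return payload.get("customers_email") if isinstance(payload, dict) else None


# A trimmed-down event with the fields the handler reads.
sample_event = {
    "Records": [{
        "EventSource": "aws:sns",
        "Sns": {
            "TopicArn": "arn:aws:sns:us-east-1:123456789012:user_changed_event",
            "Subject": "user_changed_event",
            "Message": json.dumps({"customers_email": "someone@example.com"}),
        },
    }]
}

print(extract_customer_email(sample_event))
```

Keeping the parsing in a small pure function like this makes it easy to unit test separately from the SES call.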

# DynamoDB

## Motivation

- Consider a post on Piazza.

<hr style="height: 2px"> 

| <img src="../../images/cw1.jpeg"> |
| :---: |
| __Piazza Post__ |

<hr style="height: 2px"> 

- The data is semi-structured text, not well-defined record data/relational data. For example:
    - Two columns are only loosely typed: _Summary_ and _Details._ Both are free text. One of the strengths of the relational model is enforcing integrity, but almost none of its integrity mechanisms apply to these fields:
        - Types: INT, DOUBLE, CHAR, ENUM, ...
        - Check Constraints, e.g. ```salary > 0 AND salary < 250000```
        - Foreign keys.
    - The column ```Folders``` is multi-valued (```project, logistics, officehours```)
    - The user has some flexibility to change the "schema," e.g. allowed folders.
    - The data is linked, but you would never do a ```JOIN.```
    - The data is "document style," e.g. nested. In fact, CourseWorks makes a REST API call to get the data returned in [JSON](https://www.json.org/) format, and then uses JavaScript, CSS and frameworks to render. The raw data is:


<hr style="height: 2px"> 

| <img src="../../images/cw2.jpeg"> |
| :---: |
| __Piazza Post__ |

<hr style="height: 2px"> 

In [1]:
data={"result":{"folders":["officehours","logistics","project"],"nr":15,"data":{"embed_links":[]},"created":"2019-09-12T06:12:09Z","bucket_order":2,"no_answer_followup":2,"change_log":[{"anon":"no","uid":"i05r4bvmhya5n2","data":"k0gapkibwwx4zj","type":"create","when":"2019-09-12T06:12:09Z"},{"anon":"no","uid":"i05r4bvmhya5n2","to":"k0gapki83ds4zi","type":"followup","when":"2019-09-12T06:12:25Z"},{"anon":"no","uid":"i05r4bvmhya5n2","to":"k0gapki83ds4zi","type":"feedback","when":"2019-09-12T06:12:35Z"},{"anon":"no","uid":"i05r4bvmhya5n2","to":"k0gapki83ds4zi","type":"feedback","when":"2019-09-12T06:12:49Z"},{"anon":"no","uid":"i05r4bvmhya5n2","to":"k0gapki83ds4zi","type":"followup","when":"2019-09-12T06:13:06Z"},{"anon":"no","uid":"i05r4bvmhya5n2","to":"k0gapki83ds4zi","type":"feedback","when":"2019-09-12T06:13:16Z"}],"bucket_name":"Today","history":[{"anon":"no","uid":"i05r4bvmhya5n2","subject":"Sample Post","created":"2019-09-12T06:12:09Z","content":"<p>Ignore this post. I will use for an example of a document in class.<\/p>"}],"type":"note","tags":["instructor-note","logistics","officehours","project"],"tag_good":[],"unique_views":4,"children":[{"anon":"no","folders":[],"data":None,"no_upvotes":0,"subject":"<p>Comment 1<\/p>","created":"2019-09-12T06:12:25Z","bucket_order":2,"bucket_name":"Today","type":"followup","uid":"i05r4bvmhya5n2","children":[{"anon":"no","folders":[],"data":None,"subject":"<p>comment 2<\/p>","created":"2019-09-12T06:12:35Z","bucket_order":2,"bucket_name":"Today","type":"feedback","uid":"i05r4bvmhya5n2","children":[],"id":"k0gaq49jdxp5ck","updated":"2019-09-12T06:12:35Z","config":{}},{"anon":"no","folders":[],"data":None,"subject":"<p>Comment 
2<\/p>\n<p><\/p>\n<p>&#64;14<\/p>\n<p>&#64;11<\/p>","created":"2019-09-12T06:12:49Z","bucket_order":2,"bucket_name":"Today","type":"feedback","uid":"i05r4bvmhya5n2","children":[],"id":"k0gaqfcya8u5hm","updated":"2019-09-12T06:12:49Z","config":{}}],"no_answer":1,"id":"k0gapw6xg5x58t","updated":"2019-09-12T06:12:25Z","config":{}},{"anon":"no","folders":[],"data":None,"no_upvotes":0,"subject":"<p>Comment 3<\/p>","created":"2019-09-12T06:13:06Z","bucket_order":2,"bucket_name":"Today","type":"followup","uid":"i05r4bvmhya5n2","children":[{"anon":"no","folders":[],"data":None,"subject":"<p>Comment 4<\/p>","created":"2019-09-12T06:13:16Z","bucket_order":2,"bucket_name":"Today","type":"feedback","uid":"i05r4bvmhya5n2","children":[],"id":"k0gaqzt8w565tw","updated":"2019-09-12T06:13:16Z","config":{}}],"no_answer":1,"id":"k0gaqrw384w5oj","updated":"2019-09-12T06:13:06Z","config":{}}],"tag_good_arr":[],"id":"k0gapki83ds4zi","config":{},"status":"active","upvote_ids":[],"request_instructor":0,"request_instructor_me":False,"bookmarked":1,"num_favorites":0,"my_favorite":False,"is_bookmarked":True,"is_tag_good":False,"q_edits":[],"i_edits":[],"s_edits":[],"t":1568269379649,"default_anonymity":"no","my_post":True},"error":None,"aid":"k0gb3acwuctf"}


import json
print(json.dumps(data, indent=2, default=str))

{
  "result": {
    "folders": [
      "officehours",
      "logistics",
      "project"
    ],
    "nr": 15,
    "data": {
      "embed_links": []
    },
    "created": "2019-09-12T06:12:09Z",
    "bucket_order": 2,
    "no_answer_followup": 2,
    "change_log": [
      {
        "anon": "no",
        "uid": "i05r4bvmhya5n2",
        "data": "k0gapkibwwx4zj",
        "type": "create",
        "when": "2019-09-12T06:12:09Z"
      },
      {
        "anon": "no",
        "uid": "i05r4bvmhya5n2",
        "to": "k0gapki83ds4zi",
        "type": "followup",
        "when": "2019-09-12T06:12:25Z"
      },
      {
        "anon": "no",
        "uid": "i05r4bvmhya5n2",
        "to": "k0gapki83ds4zi",
        "type": "feedback",
        "when": "2019-09-12T06:12:35Z"
      },
      {
        "anon": "no",
        "uid": "i05r4bvmhya5n2",
        "to": "k0gapki83ds4zi",
        "type": "feedback",
        "when": "2019-09-12T06:12:49Z"
      },
      {
        "anon": "no",
        "uid": "i05r

- Getting document data like this out of a relational or other structured database is painful.
    - If you do not believe me, take my section of W4111.
    - There is also little value because you have to implement constraints and integrity in the application, and cannot use the rich RDB capabilities.

- For our solution, we will need the document model for the comments microservice.
<hr style="height:2px">


| <img src="../../images/6156_project_structure.jpeg"> |
| :---: |
| __Target Solution Structure__ |


- We will use DynamoDB for the document database and Lambda functions for implementation.

## DynamoDB Concept

- Topology

<hr style="height:2px;">

| <img src="../../images/dynamodb_partitions.png"> |
| :---: |
| [DynamoDB Concept](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html) |

<hr style="height:2px;">



- A DynamoDB "table" has one or more _partitions_ and must define a _partition key_, which is a data field present on ALL entries in the table.


- DynamoDB processes operations by:
    - Getting the partition key value from the request data.
    - Hashing to determine the partition.
    - Routing the request to the partition that contains (may contain) the data.
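
The three routing steps above can be illustrated with a small sketch. DynamoDB's real hash function and partition management are internal to the service, so the MD5 hash, the fixed partition count, and the function name here are stand-ins chosen only to show the idea.

```python
import hashlib


def partition_for(partition_key_value, num_partitions):
    """Map a partition key value to a partition number.

    MD5 is only a stand-in for DynamoDB's internal hash function.
    """
    digest = hashlib.md5(str(partition_key_value).encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then reduce to a partition.
    return int.from_bytes(digest[:8], "big") % num_partitions


for key in ["dff9", "abc1", "xyz7"]:
    print(key, "->", partition_for(key, num_partitions=4))
```

The same key always lands on the same partition, which is why a request that supplies the partition key can be routed to one partition instead of searching all of them.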
    

## Some Database Server/Data Layer Design Patterns

- There are two [basic approaches to scalability:](https://en.wikipedia.org/wiki/Scalability#Horizontal_and_vertical_scaling)
    - Horizontal aka scale-out: add more (or fewer) systems.
    - Vertical aka scale-up: Use a bigger system (more CPUs, memory, disk, ...)

<hr style="height:2px;">

| <img src="../../images/scale_up_out.jpeg"> |
| :---: |
| __Scale-Up and Scale-Out__ |

<hr style="height:2px;">

- There are pros and cons to each approach.


- RDB's basic pattern is scale-up. DynamoDB's pattern is scale-out.


- You can also think in terms of _shared everything_ versus _shared nothing._


<hr style="height:2px;">

| <img src="../../images/shared-nothing-comparison.jpg" width="900px"> |
| :---: |
| [Shared Everything vs. Shared Nothing](https://www.morpheusdata.com/blog/shared-nothing-architecture) |

<hr style="height:2px;">

- RDBs tend to favor _shared everything._ DynamoDB is _shared nothing._


- Shared nothing/scale out works
    - Extremely well for:
        - Scans
        - Individual item read/write/update.
    - Poorly for:
        - Referential integrity.
        - Multi-table/type operations, e.g. JOIN.
        
        
- Shared everything/scale up works
    - Extremely well for:
        - Complex, multi-table/type queries.
        - Referential integrity.
    - Poorly for:
        - Massive scalability.
        - Availability.
        
        
- I have conflated scale-up versus scale-out with shared everything versus shared nothing. There is some ability to mix and match, but in general the two patterns are scale-out/shared nothing and scale-up/shared everything.

## An Aside: The Microservices Scale Cube

- Microservices completely encapsulate their data, __and__ two different microservices never share a database.


- There is a set of microservice scaling patterns that relate to data scaling patterns.


<hr style="height:2px;">

| <img src="../../images/scale_cube.jpeg"> |
| :---: |
| __Microservices XYZ-Scaling__ |

<hr style="height:2px;">

<hr style="height:2px;">

| <img src="../../images/microservice_scale_cube.jpeg"> |
| :---: |
| __Microservices XYZ-Scaling__ |

<hr style="height:2px;">

- An individual microservice and a set of microservices may apply a combination of the patterns.

## Another Aside $-$ The "CAP" Theorem

"In theoretical computer science, the CAP theorem, also named Brewer's theorem after computer scientist Eric Brewer, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:
- _Consistency:_ Every read receives the most recent write or an error.
- _Availability:_ Every request receives a response that is not an error.
- _Partition tolerance:_ The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
    
In particular, the CAP theorem implies that in the presence of a network partition, one has to choose between consistency and availability. Note that consistency as defined in the CAP theorem is quite different from the consistency guaranteed in ACID database transactions." (https://en.wikipedia.org/wiki/CAP_theorem)

<hr style="height:2px;">

| <img src="../../images/cap_theorem.png" width="900px"> |
| :---: |
| [CAP Theorem and DB Engines/Models](https://blog.flux7.com/blogs/nosql/cap-theorem-why-does-it-matter) |

<hr style="height:2px;">

- Why the _consistency_ versus _availability_ trade-off?
    - If there is a single copy of each data item, the system can ensure consistency for the item but not availability: if the node holding the copy is unreachable, requests for the item fail.
    - Replicated copies ensure availability, but _consistency_ would require some form of _locking_ of the replicas to perform a write. This is impossible if there are network partitions.
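
A toy simulation may make the trade-off concrete. This is a deliberately simplified sketch with two in-memory replicas and two hypothetical write policies ("CP" refuses writes it cannot apply everywhere, "AP" accepts them). It does not model any real database's replication protocol.

```python
class Replica:
    """One copy of a single data item."""

    def __init__(self):
        self.value = None


def write(replicas, reachable, value, mode):
    """Attempt a write while only some replicas are reachable.

    mode "CP": refuse the write unless every replica is reachable.
    mode "AP": apply the write to whichever replicas are reachable.
    Returns True if the write was accepted.
    """
    if mode == "CP" and len(reachable) < len(replicas):
        return False
    for index in reachable:
        replicas[index].value = value
    return True


replicas = [Replica(), Replica()]
write(replicas, reachable=[0, 1], value="v1", mode="CP")  # no partition yet

# A network partition: only replica 0 can be reached.
cp_accepted = write(replicas, reachable=[0], value="v2", mode="CP")
ap_accepted = write(replicas, reachable=[0], value="v2", mode="AP")

print(cp_accepted, ap_accepted, [r.value for r in replicas])
# prints: False True ['v2', 'v1']
```

The "CP" policy stays consistent by rejecting the request (losing availability), while the "AP" policy answers the request but leaves the replicas disagreeing until they are reconciled.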
    


## Another Aside $-$ NoSQL Databases

"A NoSQL (originally referring to "non SQL" or "non relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but did not obtain the "NoSQL" moniker until a surge of popularity in the early twenty-first century, triggered by the needs of Web 2.0 companies. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages, or sit alongside SQL database in a polyglot persistence architecture.

Motivations for this approach include: simplicity of design, simpler "horizontal" scaling to clusters of machines (which is a problem for relational databases), and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, wide column, graph, or document) are different from those used by default in relational databases, making some operations faster in NoSQL. The particular suitability of a given NoSQL database depends on the problem it must solve. Sometimes the data structures used by NoSQL databases are also viewed as "more flexible" than relational database tables" (https://en.wikipedia.org/wiki/NoSQL)


There are two basic reasons to use a "Not Only SQL" database:
- The application does not require RDB functions, which means a reduced function DB can be cheaper, faster, more available, ...
- The application requires capabilities that are hard to achieve in the relational model.

## OK, "Back to our regularly scheduled programming."

## DynamoDB Data Model

<hr style="height:2px;">

| <img src="../../images/dynamo_model_1.jpeg"> |
| :---: |
| [DynamoDB Data Model](https://brewing.codes/2017/11/13/dynamo-data-modeling/) |

- Core concepts:
    - DynamoDB is "schema-less."
    - Table is a collection of Items
    - Item is a collection of Attributes
    
    
- Keys:
    - DynamoDB hashes the partition key to determine the table partition.
    - DynamoDB sorts using the optional sort key within a partition.
    
    
- Attribute Types (basically, think JSON): 
    - Scalars:
        - Numbers − They are limited to 38 digits, and are either positive, negative, or zero.
        - String − They are Unicode using UTF-8, with a minimum length of >0 and maximum of 400KB.
        - Binary − They store any binary data, e.g., encrypted data, images, and compressed text. DynamoDB views its bytes as unsigned.
        - Boolean − They store true or false.
        - Null − They represent an unknown or undefined state.
    - Documents:
        - List − It stores ordered value collections, and uses square ([...]) brackets.
        - Map − It stores unordered name-value pair collections, and uses curly ({...}) braces.
    - Sets: Sets must contain elements of the same type whether number, string, or binary.
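
To see how these types show up on the wire, here is a hand-rolled sketch that converts Python values into DynamoDB's typed JSON descriptors (```S```, ```N```, ```B```, ```BOOL```, ```NULL```, ```L```, ```M```, ```SS```, ```NS```, ```BS```). The function is written for illustration and is not the SDK's serializer; in particular, it accepts only ```int``` and ```Decimal``` for numbers, on the assumption that non-integer numbers are passed as ```Decimal```.

```python
from decimal import Decimal


def to_dynamodb_json(value):
    """Convert a Python value to a DynamoDB typed-JSON descriptor."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # must be checked before int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (bytes, bytearray)):
        return {"B": bytes(value)}
    if isinstance(value, (list, tuple)):
        return {"L": [to_dynamodb_json(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_dynamodb_json(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)) and value:
        # Sets must be homogeneous: all strings, all numbers, or all binary.
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, Decimal)) and not isinstance(v, bool)
               for v in value):
            return {"NS": sorted(str(v) for v in value)}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(value)}
    raise TypeError("Unsupported value: %r" % (value,))


item = {
    "comment_id": "abc-123",
    "date": 1568303334,
    "labels": ["Red Sox", "Greatest Hitter of All Time"],
    "responses": None,
    "tags": {"logistics", "project"},
}
print(to_dynamodb_json(item))
```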

## Another Aside $-$ Consistency Models

- The "classic" approach to databases and data consistency is the [ACID Model](https://en.wikipedia.org/wiki/ACID). The database engine enforces:
    - Atomicity
    - Consistency
    - Isolation
    - Durability
    
<hr style="height:2px;">

| <img src="../../images/acid.jpeg"> |
| :---: |
| __ACID Properties__ |

<hr style="height:2px;">

- The CAP Theorem and common sense tell us that ACID properties are hard to achieve, especially at Internet scales.


- Computer scientists have an odd sense of humor. Since bases are the opposites of acids, there is also "BASE":
    - "(B)asically (A)vailable: basic reading and writing operations are available as much as possible (using all nodes of a database cluster), but without any kind of consistency guarantees (the write may not persist after conflicts are reconciled, the read may not get the latest write)
    - (S)oft state: without consistency guarantees, after some amount of time, we only have some probability of knowing the state, since it may not yet have converged
    - (E)ventually consistent: If the system is functioning and we wait long enough after any given set of inputs, we will eventually be able to know what the state of the database is, and so any further reads will be consistent with our expectations" (https://en.wikipedia.org/wiki/Eventual_consistency)
    
    
- There are many, many variations of the details of BASE semantics and how databases implement the functions.

<hr style="height:2px;">

| <img src="../../images/eventual_consistency.jpeg"> |
| :---: |
| __Consistency Models__ |

<hr style="height:2px;">

<hr style="height:2px;">

| <img src="../../images/eventual_consistency_2.jpeg"> |
| :---: |
| __Consistency Models__ |

<hr style="height:2px;">


- DynamoDB supports:
    - Eventual consistency with a guaranteed response, or
    - Strong consistency, but with the possibility of a read failure.
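
In the API, the choice between the two is made per read through the ```ConsistentRead``` parameter, which defaults to an eventually consistent read. The sketch below only builds the arguments for a ```get_item``` call on a ```boto3``` table resource. The helper name is hypothetical, and it assumes a table whose partition key is ```comment_id```.

```python
def build_get_item_args(comment_id, strongly_consistent=False):
    """Build keyword arguments for Table.get_item.

    Assumes the table's partition key attribute is named 'comment_id'.
    """
    return {
        "Key": {"comment_id": comment_id},
        # False (the default) requests an eventually consistent read.
        "ConsistentRead": strongly_consistent,
    }


print(build_get_item_args("abc-123"))
print(build_get_item_args("abc-123", strongly_consistent=True))

# With boto3 installed and credentials configured:
#   table = boto3.resource("dynamodb").Table("BaseballComments")
#   response = table.get_item(**build_get_item_args("abc-123", True))
```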


## Our Use of DynamoDB

- Our application will not scale to the point where we need DynamoDB.


- We are unlikely to have issues with eventual consistency. But, think about it, ... Does a comment thread really need to be strongly consistent?


- We are using DynamoDB:
    - To get a feel for NoSQL models and document DBs.
    - To understand DynamoDB and similar programming models.
    - Because implementing the comments data model would be "icky" on a relational model.

## DynamoDB API

- There is good [documentation](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.API.html) on the API. 


- There are several good [tutorials.](https://www.tutorialspoint.com/dynamodb/index.htm)


- Countless samples and examples.


- There are a lot of APIs, but there is a relatively straightforward, common core. From the manual:


- "Control Plane: Control plane operations let you create and manage DynamoDB tables. They also let you work with indexes, streams, and other objects that are dependent on tables.
    - CreateTable – Creates a new table. Optionally, you can create one or more secondary indexes, and enable DynamoDB Streams for the table.
    - DescribeTable– Returns information about a table, such as its primary key schema, throughput settings, and index information.
    - ListTables – Returns the names of all of your tables in a list.
    - UpdateTable – Modifies the settings of a table or its indexes, creates or removes new indexes on a table, or modifies DynamoDB Streams settings for a table.
    - DeleteTable – Removes a table and all of its dependent objects from DynamoDB.
    
    
- Data Plane: Data plane operations let you perform create, read, update, and delete (also called CRUD) actions on data in a table. Some of the data plane operations also let you read data from a secondary index.
    - Creating Data
        - PutItem – Writes a single item to a table. You must specify the primary key attributes, but you don't have to specify other attributes.
        - BatchWriteItem – Writes up to 25 items to a table. This is more efficient than calling PutItem multiple times because your application only needs a single network round trip to write the items. You can also use BatchWriteItem for deleting multiple items from one or more tables.
    - Reading Data
        - GetItem – Retrieves a single item from a table. You must specify the primary key for the item that you want. You can retrieve the entire item, or just a subset of its attributes.
        - BatchGetItem – Retrieves up to 100 items from one or more tables. This is more efficient than calling GetItem multiple times because your application only needs a single network round trip to read the items.
        - Query – Retrieves all items that have a specific partition key. You must specify the partition key value. You can retrieve entire items, or just a subset of their attributes. Optionally, you can apply a condition to the sort key values so that you only retrieve a subset of the data that has the same partition key. You can use this operation on a table, provided that the table has both a partition key and a sort key. You can also use this operation on an index, provided that the index has both a partition key and a sort key.
        - Scan – Retrieves all items in the specified table or index. You can retrieve entire items, or just a subset of their attributes. Optionally, you can apply a filtering condition to return only the values that you are interested in and discard the rest.
    - Updating Data
        - UpdateItem – Modifies one or more attributes in an item. You must specify the primary key for the item that you want to modify. You can add new attributes and modify or remove existing attributes. You can also perform conditional updates, so that the update is only successful when a user-defined condition is met. Optionally, you can implement an atomic counter, which increments or decrements a numeric attribute without interfering with other write requests.
    - Deleting Data
        - DeleteItem – Deletes a single item from a table. You must specify the primary key for the item that you want to delete.
        - BatchWriteItem – Deletes up to 25 items from one or more tables. This is more efficient than calling DeleteItem multiple times because your application only needs a single network round trip to delete the items. You can also use BatchWriteItem for adding multiple items to one or more tables.
        
        
- Streams: DynamoDB Streams operations let you enable or disable a stream on a table, and allow access to the data modification records contained in a stream.
    - ListStreams – Returns a list of all your streams, or just the stream for a specific table.
    - DescribeStream – Returns information about a stream, such as its Amazon Resource Name (ARN) and where your application can begin reading the first few stream records.
    - GetShardIterator – Returns a shard iterator, which is a data structure that your application uses to retrieve the records from the stream.
    - GetRecords – Retrieves one or more stream records, using a given shard iterator.


- Transactions: Transactions provide atomicity, consistency, isolation, and durability (ACID) enabling you to maintain data correctness in your applications more easily.
    - TransactWriteItems – A batch operation that allows Put, Update, and Delete operations to multiple items both within and across tables with a guaranteed all-or-nothing result.
    - TransactGetItems – A batch operation that allows Get operations to retrieve multiple items from one or more tables."
    

- __Note:__ This is only "sort of ACID." Full ACID transactions let an application interleave reads and writes across multiple tables within a single transaction.
    - This is more like "atomic multiple read" or "atomic multiple write." Which is useful, and a very common scenario in databases.
    - Also, because of the more flexible data model, the application requires fewer tables. Multiple relational tables would often map to a single DynamoDB table.
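
To make the data plane operations listed above more concrete, here is a sketch of the request shapes for a few of them, as they would be passed to the low-level ```boto3``` DynamoDB client. The helper functions are hypothetical, and the sketch assumes a ```BaseballComments``` table with a single partition key named ```comment_id```. It only builds dictionaries, so it runs without AWS access.

```python
import json

TABLE_NAME = "BaseballComments"


def put_item_request(comment_id, comment_text):
    """PutItem that fails if an item with this key already exists."""
    return {
        "TableName": TABLE_NAME,
        "Item": {
            "comment_id": {"S": comment_id},
            "comment": {"S": comment_text},
        },
        "ConditionExpression": "attribute_not_exists(comment_id)",
    }


def get_item_request(comment_id):
    """GetItem by primary key."""
    return {"TableName": TABLE_NAME, "Key": {"comment_id": {"S": comment_id}}}


def query_request(comment_id):
    """Query for all items with the given partition key value."""
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "comment_id = :id",
        "ExpressionAttributeValues": {":id": {"S": comment_id}},
    }


def update_item_request(comment_id, new_text):
    """UpdateItem that replaces the comment text.

    '#c' aliases the attribute name so the expression does not
    depend on whether 'comment' is a reserved word.
    """
    return {
        "TableName": TABLE_NAME,
        "Key": {"comment_id": {"S": comment_id}},
        "UpdateExpression": "SET #c = :c",
        "ExpressionAttributeNames": {"#c": "comment"},
        "ExpressionAttributeValues": {":c": {"S": new_text}},
    }


print(json.dumps(update_item_request("abc-123", "Ted Williams"), indent=2))

# With boto3 installed and credentials configured:
#   client = boto3.client("dynamodb")
#   client.put_item(**put_item_request("abc-123", "Who is the greatest?"))
#   client.update_item(**update_item_request("abc-123", "Ted Williams"))
```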

## Local DynamoDB

- There is a "local" DynamoDB that is useful for development and testing. See https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.html.




- You can use the AWS Command Line (CLI) to create the local tables.


- The start command is:
```
java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb
```

- You can create and list tables using commands like:
```
aws dynamodb create-table --table-name BaseballComments --attribute-definitions AttributeName=commentID,AttributeType=S AttributeName=commentDate,AttributeType=S --key-schema AttributeName=commentID,KeyType=HASH AttributeName=commentDate,KeyType=RANGE --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 --endpoint-url http://localhost:8000


aws dynamodb list-tables --endpoint-url http://localhost:8000
```

## Some Examples

- Surprisingly, WiFi is working on the airplane halfway across the Atlantic. So, I am going to use the console to create my table.


### Initialize the Connection


In [53]:

import boto3
from boto3.dynamodb.conditions import Key, Attr

# There is some weird stuff in DynamoDB JSON responses. These utils work better.
from dynamodb_json import json_util as jsond

# There are a couple of types of client.
dynamodb = boto3.resource('dynamodb', 
                          # aws_access_key_id=aws_access_key_id,
                          # aws_secret_access_key=aws_secret_access_key,
                          region_name='us-east-1')
other_client = boto3.client("dynamodb")

table = dynamodb.Table('BaseballComments')
print("Table = ", table, "\n")




Table =  dynamodb.Table(name='BaseballComments') 

['BaseballComments', 'CustomerProfile', 'CustomerProfileTypes', 'order_comments', 'orders', 'products', 'users']


- Umm, <br>

<img src="../../images/what-is-this-sorcery.jpg" width="400px">

- ```boto3``` is the Python SDK for Amazon Web Services.


- AWS has "regions" (aka "big honking data centers") all over the place. I use ```us-east-1``` but there are others. Why do I use us-east-1?

<hr height="2px;">

| <img src="../../images/aws-regions.png"> |
| :---: |
| [AWS Regions](https://faasandfurious.com/pages/aws-regions.png) |


- What's that "key stuff?" AWS uses API Keys, which is common in cloud API scenarios.

"Access keys consist of two parts: an access key ID (for example, AKIAIOSFODNN7EXAMPLE) and a secret access key (for example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY). You use access keys to sign programmatic requests that you make to AWS if you use AWS CLI commands (using the SDKs) or using AWS API operations." (https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys)


- Sign a request? What does that mean? Sign, like a permission slip? We will cover security in subsequent lectures.

<hr height="2px;">

| <img src="../../images/Advanced_Potion-Making.png"> |
| :---: |
| [AWS Regions](https://faasandfurious.com/pages/aws-regions.png) |

<hr height="2px;">


- OK. Umm, how do I get and use them?
    - [Get Access Key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html) instructions are pretty good.
    - Use them?
        - You can add them to the API call.
        - But, you do not want people to "see them."
        - So, I put mine in environment variables. You can also use an [AWS credentials](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) file in your home directory (```~/.aws/credentials```).
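
As a small sanity check for the environment variable approach, the following sketch reports whether the two standard variables, ```AWS_ACCESS_KEY_ID``` and ```AWS_SECRET_ACCESS_KEY```, are set, without ever printing their values. The helper name is hypothetical.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credential_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_credential_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    # Never print the key values themselves.
    print("Access key environment variables are set.")
```

When these variables are set, the SDK picks them up automatically, so the keys never need to appear in source code or notebooks.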

## List Tables


In [55]:
def t1():
    tbls = other_client.list_tables()['TableNames']
    print (tbls)

In [56]:
t1()

['BaseballComments', 'CustomerProfile', 'CustomerProfileTypes', 'order_comments', 'orders', 'products', 'users']


## Create an Item

In [72]:
def do_a_put(table_name, partition_key_name, item):
    # Insert the item only if no item with the same partition key already exists.
    table = dynamodb.Table(table_name)

    try:
        response = table.put_item(
            Item=item,
            ConditionExpression='attribute_not_exists(' + partition_key_name + ')'
        )
    except Exception as e:
        print("Exception = ", e)
        raise

    return response

In [68]:
import uuid
import time
import json

commentID = uuid.uuid4()

new_comment = {
    "comment_id": str(commentID),
    "commentor_id": 'dff9',
    "date": int(time.time()), # Round to seconds.
    "referenced_comments": None,
    "labels": ['Red Sox', 'Greatest Hitter of All Time'],
    "responses": None,
    "comment": "If you do not know who the greatest hitter of all time is, take my section of W4111"
}

print("Comment = \n", json.dumps(new_comment, indent=2))

Comment = 
 {
  "comment_id": "92bcbb0c-1142-4a1e-98aa-55370186283a",
  "commentor_id": "dff9",
  "date": 1568303334,
  "referenced_comments": null,
  "labels": [
    "Red Sox",
    "Greatest Hitter of All Time"
  ],
  "responses": null,
  "comment": "If you do not know who the greatest hitter of all time is, take my section of W4111"
}


In [69]:
do_a_put("BaseballComments", "comment_id", new_comment)

{'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
   'content-length': '2',
   'content-type': 'application/x-amz-json-1.0',
   'date': 'Thu, 12 Sep 2019 15:49:02 GMT',
   'server': 'Server',
   'x-amz-crc32': '2745614147',
   'x-amzn-requestid': 'LPC3CNBUQQJ7ORJ59CBGKV5HGFVV4KQNSO5AEMVJF66Q9ASUAAJG'},
  'HTTPStatusCode': 200,
  'RequestId': 'LPC3CNBUQQJ7ORJ59CBGKV5HGFVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'RetryAttempts': 0}}

- What happens if I do it again?

In [74]:
try:
    # Copy the dict so new_comment itself is not modified; the comment_id stays the same.
    another_comment = dict(new_comment)
    another_comment['comment'] = "Babe Ruth?"
    do_a_put("BaseballComments", "comment_id", another_comment)
except Exception as e:
    print("E = ", e)
    print("This did not work because of the condition on UUID")

Exception =  An error occurred (ConditionalCheckFailedException) when calling the PutItem operation: The conditional request failed
E =  An error occurred (ConditionalCheckFailedException) when calling the PutItem operation: The conditional request failed
This did not work because of the condition on UUID


- Conditions let you avoid overwriting an existing item: `attribute_not_exists(comment_id)` makes the put fail if an item with that partition key is already in the table.


- A conflict is almost impossible with UUIDs, but is possible for some types of keys.
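
The put above reports every failure the same way. A hedged sketch of telling "the key already exists" apart from other errors follows. It assumes the exception carries the service error code in `response['Error']['Code']`, as botocore's `ClientError` does. The helper names `is_conditional_failure` and `put_if_absent` are hypothetical, and `table` is any object with a `put_item` method such as `dynamodb.Table(...)`.

```python
def is_conditional_failure(exc):
    """True if exc looks like DynamoDB's ConditionalCheckFailedException."""
    # Read the error code defensively; other exception types have no .response.
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") == "ConditionalCheckFailedException"


def put_if_absent(table, partition_key_name, item):
    """Return True if the item was created, False if the key already existed."""
    try:
        table.put_item(
            Item=item,
            ConditionExpression="attribute_not_exists(" + partition_key_name + ")",
        )
        return True
    except Exception as exc:
        if is_conditional_failure(exc):
            return False
        raise  # Anything else (bad credentials, throttling, ...) is a real error.


# Possible use with the notebook's objects:
#   created = put_if_absent(dynamodb.Table("BaseballComments"), "comment_id", new_comment)
```

A REST service built on this could answer a duplicate create with a conflict status instead of a generic server error.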

## Get an Item

In [87]:
def get_item(table_name, key_value):
    table = dynamodb.Table(table_name)

    response = table.get_item(
        Key=key_value
    )

    # 'Item' is absent from the response when no item matches the key.
    return response.get('Item', None)

In [88]:
get_item("BaseballComments",
         key_value={"comment_id": "92bcbb0c-1142-4a1e-98aa-55370186283a"}
        )

In [90]:
x = get_item("BaseballComments",
         key_value={"comment_id": "cat"}
        )
print(x)

None


## Delete an Item

In [91]:
def delete_item(table_name, key_value):
    table = dynamodb.Table(table_name)

    response = table.delete_item(
        Key=key_value
    )
    
    return response

In [92]:
delete_item("BaseballComments",
         key_value={"comment_id": "92bcbb0c-1142-4a1e-98aa-55370186283a"}
        )

{'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
   'content-length': '2',
   'content-type': 'application/x-amz-json-1.0',
   'date': 'Thu, 12 Sep 2019 16:14:56 GMT',
   'server': 'Server',
   'x-amz-crc32': '2745614147',
   'x-amzn-requestid': 'FD3UULPLMHOJ0IVH8G8OJ4SP8JVV4KQNSO5AEMVJF66Q9ASUAAJG'},
  'HTTPStatusCode': 200,
  'RequestId': 'FD3UULPLMHOJ0IVH8G8OJ4SP8JVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'RetryAttempts': 0}}

In [94]:
delete_item("BaseballComments",
         key_value={"comment_id": "cat"}
        )

{'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
   'content-length': '2',
   'content-type': 'application/x-amz-json-1.0',
   'date': 'Thu, 12 Sep 2019 16:16:12 GMT',
   'server': 'Server',
   'x-amz-crc32': '2745614147',
   'x-amzn-requestid': '787VUQIECBBLCJ21P4E5CRLDEVVV4KQNSO5AEMVJF66Q9ASUAAJG'},
  'HTTPStatusCode': 200,
  'RequestId': '787VUQIECBBLCJ21P4E5CRLDEVVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'RetryAttempts': 0}}

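- Delete works a little weirdly: deleting the nonexistent key `cat` returns the same HTTP 200 response as deleting an item that exists.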

- Let's modify get a little.

In [101]:
def get_item_new(table_name, key_value):
    table = dynamodb.Table(table_name)

    response = table.get_item(
        Key=key_value
    )
    
    response = (response['ResponseMetadata']['HTTPStatusCode'], response.get('Item', None))
    return response


In [102]:
get_item_new("BaseballComments",
         key_value={"comment_id": "cat"}
        )

(200, None)

- Get also works a little weirdly: a key that matches nothing returns HTTP 200 with no `Item`, not an error.
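
A REST resource usually wants to answer 404 for a missing item, so the service layer has to translate. The following is one possible sketch, not the course's implementation. The function names `rest_get` and `rest_delete` are hypothetical, `table` is any object with `get_item` and `delete_item` such as `dynamodb.Table(...)`, and the delete relies on `ReturnValues="ALL_OLD"` returning the old attributes only when an item was actually deleted.

```python
def rest_get(table, key_value):
    """Map a DynamoDB get_item result to (HTTP status, body)."""
    response = table.get_item(Key=key_value)
    item = response.get("Item")
    if item is None:
        return 404, None
    return 200, item


def rest_delete(table, key_value):
    """Delete an item and report 404 if there was nothing to delete."""
    # ALL_OLD asks DynamoDB to return the attributes of the deleted item;
    # no 'Attributes' in the response means no item matched the key.
    response = table.delete_item(Key=key_value, ReturnValues="ALL_OLD")
    if not response.get("Attributes"):
        return 404, None
    return 204, None


# Possible use with the notebook's table:
#   status, body = rest_get(dynamodb.Table("BaseballComments"), {"comment_id": "cat"})
```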

## Update

- This one is a little tricky.


- The typical scenario is that:
    1. I read something.
    2. I change it.
    3. I write it back.
    
    
- How do I avoid a write-write conflict, e.g., when someone else changed the item between my read and my write?
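
To make the problem concrete, here is a small in-memory simulation of a lost update. A plain dictionary stands in for the database, and the names `store`, `read`, and `write` are invented for the illustration.

```python
import copy

# A dictionary standing in for a table with one comment.
store = {"c1": {"comment_id": "c1", "responses": []}}


def read(key):
    return copy.deepcopy(store[key])    # each client gets its own copy


def write(item):
    store[item["comment_id"]] = item    # unconditional overwrite


bob = read("c1")
mary = read("c1")                       # Mary reads before Bob writes

bob["responses"].append("Totally")
write(bob)

mary["responses"].append("Babe Ruth is better")
write(mary)                             # silently replaces Bob's version

print(store["c1"]["responses"])         # ['Babe Ruth is better']
```

Bob's response is gone and neither client was told.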


<hr style="height:2px;">

| <img src="../../images/rw_conflict.jpeg"> |
| :---: |
| __Read/Write Conflict__ |

<hr style="height:2px;">


- This is tricky in REST, and in many databases, because there is no locking or you should not use locking. We will discuss this later in the course.


- What do I do? Well, first, let's change our data a little.

In [103]:
commentID = uuid.uuid4()

new_comment = {
    "comment_id": str(commentID),
    "version_no": 1,
    "commentor_id": 'dff9',
    "date": int(time.time()), # Round to seconds.
    "referenced_comments": None,
    "labels": ['Red Sox', 'Greatest Hitter of All Time'],
    "responses": None,
    "comment": "If you do not know who the greatest hitter of all time is, take my section of W4111"
}

print("Comment = \n", json.dumps(new_comment, indent=2))

Comment = 
 {
  "comment_id": "033b2839-0f16-40a5-9f5e-bf1d01b149dd",
  "version_no": 1,
  "commentor_id": "dff9",
  "date": 1568311540,
  "referenced_comments": null,
  "labels": [
    "Red Sox",
    "Greatest Hitter of All Time"
  ],
  "responses": null,
  "comment": "If you do not know who the greatest hitter of all time is, take my section of W4111"
}


In [104]:
do_a_put("BaseballComments", "comment_id", new_comment)

{'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
   'content-length': '2',
   'content-type': 'application/x-amz-json-1.0',
   'date': 'Thu, 12 Sep 2019 18:06:53 GMT',
   'server': 'Server',
   'x-amz-crc32': '2745614147',
   'x-amzn-requestid': '5F8EHK33GB2M3AMNB3N0QQFOC7VV4KQNSO5AEMVJF66Q9ASUAAJG'},
  'HTTPStatusCode': 200,
  'RequestId': '5F8EHK33GB2M3AMNB3N0QQFOC7VV4KQNSO5AEMVJF66Q9ASUAAJG',
  'RetryAttempts': 0}}

- Now, I am going to simulate two different applications reading the data.

In [113]:
bob = get_item_new("BaseballComments",
         key_value={"comment_id": "033b2839-0f16-40a5-9f5e-bf1d01b149dd"}
        )
bob

(200,
 {'comment': 'If you do not know who the greatest hitter of all time is, take my section of W4111',
  'comment_id': '033b2839-0f16-40a5-9f5e-bf1d01b149dd',
  'commentor_id': 'dff9',
  'date': Decimal('1568311540'),
  'labels': ['Red Sox', 'Greatest Hitter of All Time'],
  'referenced_comments': None,
  'responses': None,
  'version_no': Decimal('1')})

In [107]:
mary = get_item_new("BaseballComments",
         key_value={"comment_id": "033b2839-0f16-40a5-9f5e-bf1d01b149dd"}
        )
mary

(200,
 {'comment': 'If you do not know who the greatest hitter of all time is, take my section of W4111',
  'comment_id': '033b2839-0f16-40a5-9f5e-bf1d01b149dd',
  'commentor_id': 'dff9',
  'date': Decimal('1568311540'),
  'labels': ['Red Sox', 'Greatest Hitter of All Time'],
  'referenced_comments': None,
  'responses': None,
  'version_no': Decimal('1')})

- Bob will now modify the comment.

In [114]:
bob=bob[1]

bob['responses'] = []
rsp = {
    "commentor_id": 'dff9',
    "date": int(time.time()),
    "comment": "Totally"
}
bob['responses'].append(rsp)
bob['version_no'] += 1

In [115]:
bob

{'comment': 'If you do not know who the greatest hitter of all time is, take my section of W4111',
 'comment_id': '033b2839-0f16-40a5-9f5e-bf1d01b149dd',
 'commentor_id': 'dff9',
 'date': Decimal('1568311540'),
 'labels': ['Red Sox', 'Greatest Hitter of All Time'],
 'referenced_comments': None,
 'responses': [{'comment': 'Totally',
   'commentor_id': 'dff9',
   'date': 1568312131}],
 'version_no': Decimal('2')}

- We will now modify put so that the write succeeds only if the stored `version_no` still matches the version we read.

In [122]:
def do_an_update(table_name, partition_key_name, item, old_version):
    # Overwrite the item only if it exists and its stored version_no
    # still equals the version that was read.
    table = dynamodb.Table(table_name)

    try:
        response = table.put_item(
            Item=item,
            ConditionExpression='attribute_exists(' + partition_key_name + ') and ' +
                'version_no = :num',
            ExpressionAttributeValues={
                ':num': old_version
            }
        )
    except Exception as e:
        print("Exception = ", e)
        raise

    return response

In [123]:
rsp = do_an_update('BaseballComments', 'comment_id', bob, 1)

- Now let's simulate Mary's task.

In [125]:
mary=mary[1]

mary['responses'] = []
rsp = {
    "commentor_id": 'mm1',
    "date": int(time.time()),
    "comment": "Babe Ruth is better"
}
mary['responses'].append(rsp)
mary['version_no'] += 1

In [126]:
try:
    rsp = do_an_update('BaseballComments', 'comment_id', mary, 1)
    print("Updated.")
except Exception as e:
    print("Mary got exception = ", e)

Exception =  An error occurred (ConditionalCheckFailedException) when calling the PutItem operation: The conditional request failed
Mary got exception =  An error occurred (ConditionalCheckFailedException) when calling the PutItem operation: The conditional request failed


- At this point, Mary can reread, re-modify the data and try again.


- BTW, any comment suggesting Babe Ruth was better than Ted Williams should fail!
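
The "reread, re-modify, try again" step can be wrapped in a loop. This is a sketch under stated assumptions rather than a definitive implementation. The names `update_with_retry`, `read_item`, `write_item`, and `modify` are hypothetical, and a failed condition is recognized by the error code in `response['Error']['Code']`, as on botocore's `ClientError`.

```python
def is_conditional_failure(exc):
    """True if exc looks like DynamoDB's ConditionalCheckFailedException."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") == "ConditionalCheckFailedException"


def update_with_retry(read_item, write_item, modify, max_attempts=3):
    """Read, modify, and conditionally write, rereading after a version conflict.

    read_item()                   -> current item, a dict with 'version_no'
    write_item(item, old_version) -> must fail if the stored version differs
    modify(item)                  -> applies the change to the item in place
    """
    for _ in range(max_attempts):
        item = read_item()
        old_version = item["version_no"]
        modify(item)
        item["version_no"] = old_version + 1
        try:
            write_item(item, old_version)
            return item
        except Exception as exc:
            if not is_conditional_failure(exc):
                raise
            # Someone else updated the item first: loop, reread, reapply.
    raise RuntimeError("gave up after %d attempts" % max_attempts)


# Possible wiring to the notebook's functions (key is the item's key dict):
#   update_with_retry(
#       lambda: get_item_new("BaseballComments", key)[1],
#       lambda item, v: do_an_update("BaseballComments", "comment_id", item, v),
#       add_my_response)
```

The `modify` function should build on whatever was just read, for example by appending to the existing `responses`, so that the retry keeps the other writer's change.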

## Scan

- Let's put another comment.

In [128]:
commentID = uuid.uuid4()

new_comment = {
    "comment_id": str(commentID),
    "version_no": 1,
    "commentor_id": 'dff9',
    "date": int(time.time()), # Round to seconds.
    "referenced_comments": None,
    "labels": ['Red Sox', 'Greatest Hitter of All Time'],
    "responses": None,
    "comment": "Mary is a twit!"
}

print("Comment = \n", json.dumps(new_comment, indent=2))

Comment = 
 {
  "comment_id": "22af9642-0e02-453f-8b4e-f9bea617ddf8",
  "version_no": 1,
  "commentor_id": "dff9",
  "date": 1568312926,
  "referenced_comments": null,
  "labels": [
    "Red Sox",
    "Greatest Hitter of All Time"
  ],
  "responses": null,
  "comment": "Mary is a twit!"
}


In [129]:
do_a_put('BaseballComments', 'comment_id', new_comment)

{'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
   'content-length': '2',
   'content-type': 'application/x-amz-json-1.0',
   'date': 'Thu, 12 Sep 2019 18:28:50 GMT',
   'server': 'Server',
   'x-amz-crc32': '2745614147',
   'x-amzn-requestid': 'KMMD2CO1374EOT558JI96UAKM7VV4KQNSO5AEMVJF66Q9ASUAAJG'},
  'HTTPStatusCode': 200,
  'RequestId': 'KMMD2CO1374EOT558JI96UAKM7VV4KQNSO5AEMVJF66Q9ASUAAJG',
  'RetryAttempts': 0}}

In [130]:
def do_a_scan(table_name, filterexpression):
    table = dynamodb.Table(table_name)

    if filterexpression is not None:
        # scan accepts keyword arguments only, so the filter must be named.
        response = table.scan(
            FilterExpression=filterexpression
        )
    else:
        response = table.scan()

    print("Scan succeeded")
    #print(json.dumps(response, indent=4))
    return response

In [135]:
response = do_a_scan('BaseballComments', None)
print("Scan = ", json.dumps(response, indent=2, default=str))

Scan succeeded
Scan =  {
  "Items": [
    {
      "comment_id": "bdc0a9f1-ca92-4a5b-ab89-d3ea059514a6",
      "date": "1568303103",
      "responses": null,
      "commentor_id": "dff9",
      "referenced_comments": null,
      "labels": [
        "Red Sox",
        "Greatest Hitter of All Time"
      ]
    },
    {
      "comment_id": "033b2839-0f16-40a5-9f5e-bf1d01b149dd",
      "date": "1568311540",
      "version_no": "2",
      "responses": [
        {
          "commentor_id": "dff9",
          "date": "1568312131",
          "comment": "Totally"
        }
      ],
      "commentor_id": "dff9",
      "labels": [
        "Red Sox",
        "Greatest Hitter of All Time"
      ],
      "comment": "If you do not know who the greatest hitter of all time is, take my section of W4111",
      "referenced_comments": null
    },
    {
      "comment_id": "22af9642-0e02-453f-8b4e-f9bea617ddf8",
      "date": "1568312926",
      "version_no": "1",
      "responses": null,
      "commentor_id

- There is a relatively complete "filter expression" language that allows specification of predicates. I seldom use it.


- There is also a way to specify a subset of the fields that you want returned (a `ProjectionExpression`).


- Finally, you will have to worry about pagination, but we will handle that later.
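
As a preview of pagination: a single scan call returns at most 1 MB of data, and when more remains the response includes a `LastEvaluatedKey` that is passed back as `ExclusiveStartKey` on the next call. A minimal sketch follows. The generator name `scan_all` is hypothetical, and `table` is any object with a `scan` method such as `dynamodb.Table(...)`.

```python
def scan_all(table, **scan_kwargs):
    """Yield every item from a scan, following LastEvaluatedKey across pages."""
    while True:
        response = table.scan(**scan_kwargs)
        for item in response.get("Items", []):
            yield item
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break
        # Resume the next page where the previous one stopped.
        scan_kwargs["ExclusiveStartKey"] = last_key


# Possible use with the notebook's table; the filter is optional and follows
# the same string-plus-values pattern as the condition in do_an_update:
#   for comment in scan_all(dynamodb.Table("BaseballComments"),
#                           FilterExpression="commentor_id = :c",
#                           ExpressionAttributeValues={":c": "dff9"}):
#       print(comment["comment_id"])
```

Note that a filter is applied after the data is read, so a filtered scan still reads the whole table.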


- If I do not use filter expressions, how will I query comments? I will use text search.
    - Dump the "text" into a text search engine.
    - Searches will return matching comments and their IDs, and we can then get the comments.
    - We will try to cover this later in the course.
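
To illustrate the flow of "index the text, search for IDs, then fetch the comments", here is a toy inverted index. It is only a stand-in for a real text search engine, and the class `TinyTextIndex` and its methods are invented for this sketch.

```python
import re
from collections import defaultdict


class TinyTextIndex:
    """Map each word to the set of comment IDs whose text contains it."""

    def __init__(self):
        self._index = defaultdict(set)

    @staticmethod
    def _words(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, comment_id, text):
        for word in self._words(text):
            self._index[word].add(comment_id)

    def search(self, query):
        """Return the IDs of comments containing every word in the query."""
        words = self._words(query)
        if not words:
            return set()
        result = set(self._index.get(words[0], set()))
        for word in words[1:]:
            result &= self._index.get(word, set())
        return result


index = TinyTextIndex()
index.add("033b2839", "If you do not know who the greatest hitter of all time is, take my section of W4111")
index.add("22af9642", "Mary is a twit!")

print(index.search("greatest hitter"))   # {'033b2839'}
print(index.search("Babe Ruth"))         # set()
```

Each returned ID would then be looked up in DynamoDB with a get on `comment_id`.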

# Project Summary