<img src="../docs/images/FirstSlide.png" alt="Build a Replicable Serverless Python NLP App from Scratch" width="1000" />

## Table of Contents

- I. **Introduction**
- II. **The NLP Pipeline**
- III. **Infrastructure Provisioning Workflow**
- IV. **Conclusion**

<br>
<br>
<img src="../docs/images/footer_logo.png" alt="tcin logo" style="width:1200px;height:40px;">

<h1 align="center"> I. Introduction </h1>

- What is our Goal?
- Motivation for the Talk
- Why AWS CLI?
- Why Taskfiles?
- GitHub Code Supporting this Talk


<br>
<br>

<h2 align="center"> What is our Goal?</h2>

> **Automation in infrastructure provisioning** is key to reducing development time. <br>No one is better placed to do this than the developer, so that DevOps engineers do less heavy lifting during the time-crunched release window.


- Write infrastructure provisioning code as and when you write your application code
- Make every small configuration component of your provisioned services traceable and version-controlled


<br>
<br>

In this talk, we will: <br>
> Cover how to build a Python NLP app in the cloud using **replicable infrastructure provisioning code**

> Share transferable lessons on building **serverless apps on any cloud infrastructure** (AWS is the one chosen here).

> Use a combination of **CLI commands and YAML Taskfiles** to provision the AWS infrastructure.


<br>
<br>

#### **What is in it for you?**
By the end of this talk, a developer will 
- start loving the potent combination of [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) + [Taskfile](https://taskfile.dev/installation/) for provisioning replicable cloud infrastructure

- understand how working with the AWS CLI builds an in-depth understanding of the cloud services employed

- help reduce the "hell month" that DevOps engineers face during releases (see [adopting a DevOps mindset](https://stackoverflow.blog/2020/06/10/the-rise-of-the-devops-mindset/))

<br>
<br>

<h2 align="center"> Motivation for the Talk </h2>

- Is DevOps an afterthought for a developer? 
- Does a developer apply the same rigor to `infrastructure provisioning code` as to `application code development`?

<img src="../docs/images/devops_mindset.png" alt="DevOps Mindset" width="500" />
<h5 align="center">Image Source: https://modelthinkers.com/mental-model/devops-mindset </h5>


#### Why Developers Need to Adopt DevOps Principles

- More nimble and responsive product delivery!

> The right DevOps culture ultimately makes you deliver better products faster. <br>

<h5 align="center">Source: https://stackoverflow.blog/2020/06/10/the-rise-of-the-devops-mindset/ </h5>


> “It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change.” <br>
> Commonly attributed to Charles Darwin

<h5 align="center">Source: https://faun.pub/some-popular-devops-quotes-and-what-i-learned-from-it-using-in-my-day-to-day-development-7b299ced7884 </h5>


<br>
<br>

<h2 align="center"> Why AWS CLI? </h2>

We chose the AWS CLI because <br>
- it is a unified interface to all AWS services,
- it mimics the console way of creating AWS services with the right\* level of abstraction
    - meaning you have control over the CRUD operations, and
- it is no different from the traditional bash commands that we are already familiar with

\*an arguable personal opinion


<br>
<br>


- What you pass as arguments is exactly what gets provisioned for you!

```bash
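/path/where/lambda_codes/located % cat aws_cli_command_for_lambda_creation.bash
#!/bin/bash
# Usage: bash aws_cli_command_for_lambda_creation.bash <function_name> <iam_role_arn> <layer_arn>
# Expects <function_name>.zip to exist in the current directory.
set -euo pipefail

aws lambda create-function \
  --function-name "$1" \
  --zip-file "fileb://${1}.zip" \
  --runtime python3.8 \
  --role "$2" \
  --handler lambda_function.lambda_handler \
  --timeout 60 \
  --memory-size 256 \
  --layers "$3" \
  --architectures x86_64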
```


- And it gets executed with just `task <task_name>`

```bash
/path/where/lambda_codes/located % cat Taskfile.yml 
...
...
tasks:
  create_lambda_name:
    cmds:
      - zip -r "${LAMBDA_FUNCTION_NAME}.zip" lambda_function.py
      - bash aws_cli_command_for_lambda_creation.bash "$LAMBDA_FUNCTION_NAME" "$IAM_ROLE_ARN" "$SPACY_LAYER"
```

<h2 align="center"> Why Taskfiles? </h2>

- Taskfiles make each step easy to execute, and the same tasks can be re-run across multiple projects.

```bash
# how to create IAM policy and roles
/path/where/IAM_Taskfile/is/located/ % task create_policy && task create_role && task attach_role_to_policy

# how lambda function is created
/path/where/LAMBDA_Taskfile/is/located/ % task create_lambda_name && task update_lambda_environment

# how to test the lambda
/path/where/Testing_Taskfile/is/located/ % task run_test_event_1 

{
    "StatusCode": 200, "ExecutedVersion": "$LATEST"
},
{
  "output_bucket_name": "pycon-$USER-nlp-output-bkt", "file_key": "email_1.txt",
  "message": "PII Redaction Pipeline worked successfully"
}
```




**Alternative Options**: 
- As a developer, you could also provision cloud services with other tools such as [AWS CDK](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html), [AWS CloudFormation](https://aws.amazon.com/cloudformation/getting-started/), and [Terraform](https://github.com/terraform-aws-modules), among others.
    - Each of these frameworks involves a bit of a learning curve.
- Want to know more about the AWS toolkits? See [aws-toolkit-aws-cli-sdk-and-cdk-6feab9e746b8](https://aws.plainenglish.io/aws-toolkit-aws-cli-sdk-and-cdk-6feab9e746b8)


<br>
<br>

<h1 align="center"> Link to the GitHub Code</h1>

<div style="text-align: center;">
<img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" alt="https://github.com/Toyota-Connected-India/serverless_nlp_app" width="100" />
</div>

---


<h3 align="center"> https://github.com/Toyota-Connected-India/serverless_nlp_app </h3>

<br>

---

<br>
<br>

<h1 align="center"> II. The NLP Pipeline </h1>

- Goal of the NLP Pipeline
- Sample Input and Output
- Pipeline Architecture
- Workflow of the Pipeline `Tasks`


<br>
<br>


Broadly, there are two major types of NLP pipelines: 
1. **NLP Pipeline in the ML World**: Text pre-processing >> Feature embeddings >> Model training and prediction on the numerical embeddings
2. **NLP Pipeline in the Data Engineering World**: Transforming raw text data into useful outputs in a sequence of steps.

Of these two definitions, the second one, the data-engineering pipeline, is what we will build in this talk.
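
To make the data-engineering sense of "pipeline" concrete, here is a minimal sketch in which each step is a plain function from text to text and the pipeline simply chains them. The step names below are illustrative assumptions and are not taken from the talk's repository.

```python
from typing import Callable, Iterable

# A pipeline step is any function that takes text and returns text.
Step = Callable[[str], str]


def run_pipeline(text: str, steps: Iterable[Step]) -> str:
    """Feed the output of each step into the next one, in order."""
    for step in steps:
        text = step(text)
    return text


# Two toy steps, purely for illustration.
def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


def lowercase(text: str) -> str:
    return text.lower()


if __name__ == "__main__":
    print(run_pipeline("  Hello   World ", [collapse_whitespace, lowercase]))
```

In the talk's pipeline the steps are Lambda functions rather than local functions, but the idea of passing each step's output to the next is the same.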


<br>
<br>

<h1 align="center"> Goal of the NLP Pipeline </h1>

- The pipeline will redact personally identifiable information (PII) such as email addresses, phone numbers, and names from email bodies.
    - We will use examples from the Enron email dataset to test the pipeline.
    
<br>
<br>

<h1 align="center"> Pipeline Architecture </h1>


<img src="../docs/images/pii_redaction_pipeline_architecture.png" alt="Pipeline Architecture" width="700" />

Note: This pipeline is intentionally simple. Real-world serverless pipelines can be much more complex.

<br>

<h1 align="center"> Sample Input and Output </h1>


### Sample Input 

<img src="../docs/images/email_1.png" alt="Input" width="400" />


<br>
<br>

### Output of Simple Lambda (phone and email redacted)

<img src="../docs/images/email_1_intermediary_output.png" alt="Intermediary output" width="400" />

<br>
<br>

### Output of Layer Lambda (names redacted)

<img src="../docs/images/email_1_final_output.png" alt="Final output" width="400" />

<br>
<br>

<h1 align="center"> Workflow of the Pipeline Tasks </h1>


#### 1. Setting up S3 Buckets

- Create the S3 trigger bucket (s3_1 in the picture), the intermediary S3 bucket, and the output S3 bucket (s3_2 in the picture)
<br>

[GitHub Codes Reference](https://github.com/Toyota-Connected-India/serverless_nlp_app/blob/master/src/aws/1.create_s3_buckets/Taskfile.yml)
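
Since bucket names must be globally unique, it helps to derive all three from one prefix. The sketch below follows the `pycon-$USER-nlp-output-bkt` pattern seen in the sample test output. The `trigger` and `intermediate` suffixes are assumptions for illustration, and the check covers only the basic S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit).

```python
import re

# Basic S3 bucket naming rules: 3-63 chars, lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def bucket_names(user: str) -> dict:
    """Derive the three pipeline bucket names from a user name."""
    prefix = f"pycon-{user.lower()}-nlp"
    names = {
        "trigger": f"{prefix}-trigger-bkt",            # hypothetical suffix
        "intermediate": f"{prefix}-intermediate-bkt",  # hypothetical suffix
        "output": f"{prefix}-output-bkt",              # pattern from the sample output
    }
    for name in names.values():
        if not _BUCKET_RE.match(name):
            raise ValueError(f"invalid S3 bucket name: {name!r}")
    return names


if __name__ == "__main__":
    for role, name in bucket_names("alice").items():
        print(role, name)
```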

<hr>

#### 2. Create a "Simple Lambda"
- With no special or extra packages, in a standard Python 3.8 Lambda environment,
    - create a Lambda that replaces phone numbers and email addresses, and
    - test it with a sample CSV file

[GitHub Codes Reference](https://github.com/Toyota-Connected-India/serverless_nlp_app/tree/master/src/aws/3.stepfunctions_pipeline/b.lambda_codes/simple_lambda)
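
As a rough idea of what such a Lambda could contain, here is a minimal sketch that uses only the standard library. The regular expressions, the `[EMAIL]` and `[PHONE]` placeholders, and the `{"text": ...}` event shape are assumptions for illustration; the repository's Lambda may be structured differently.

```python
import re

# Simple email pattern; good enough for a demo, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# US-style numbers such as 713-853-1234 or (713) 853-1234.
# The number format is an assumption about the input data.
PHONE_RE = re.compile(r"(?:\(\d{3}\)\s*|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b")


def redact_phone_and_email(text: str) -> str:
    """Replace email addresses and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def lambda_handler(event, context):
    # Hypothetical event shape: {"text": "<email body>"}.
    return {"redacted_text": redact_phone_and_email(event["text"])}


if __name__ == "__main__":
    sample = "Call me at 713-853-1234 or mail jane.doe@enron.com."
    print(lambda_handler({"text": sample}, None))
```

Because it needs nothing beyond `re`, a function like this runs in the stock Lambda runtime without any layer.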

<hr>

<br>
<br>


#### 3. Create a "Layer Lambda" 

- 3.A. Build the spaCy package inside an `amazon/aws-lambda-python:3.8` Docker image and publish it as a Lambda layer
- 3.B. Create a "Layer Lambda" that identifies and replaces names using a spaCy pre-trained model

[GitHub Codes Reference](https://github.com/Toyota-Connected-India/serverless_nlp_app/tree/master/src/aws/3.stepfunctions_pipeline/b.lambda_codes/layer_lambda)
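
The name-redaction step can be sketched independently of spaCy: given entity spans as `(start_char, end_char, label)` tuples, replace the ones labelled `PERSON`. The function name, the `[NAME]` placeholder, and the model name in the comment are assumptions for illustration; the repository's code may differ.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def redact_spans(text: str, spans: Iterable[Span],
                 labels=("PERSON",), mask: str = "[NAME]") -> str:
    """Replace every span whose label is in `labels` with `mask`."""
    # Work from the end of the text so earlier offsets stay valid.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        if label in labels:
            text = text[:start] + mask + text[end:]
    return text


# With spaCy available (for example through the Lambda layer), the spans
# could come from a pre-trained model. The model name is an assumption:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   spans = [(e.start_char, e.end_char, e.label_) for e in nlp(text).ents]

if __name__ == "__main__":
    text = "Hi Ken, please call Jeff Skilling in Houston."
    spans = [
        (text.index("Ken"), text.index("Ken") + 3, "PERSON"),
        (text.index("Jeff"), text.index("Jeff") + len("Jeff Skilling"), "PERSON"),
        (text.index("Houston"), text.index("Houston") + 7, "GPE"),
    ]
    print(redact_spans(text, spans))
```

Keeping the span-replacement logic separate from the model call makes it testable locally without installing spaCy.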

<hr>

#### 4. Create a Step Functions Pipeline

- 4.A. Create a Step Functions state machine JSON definition | [GitHub Codes Reference](https://github.com/Toyota-Connected-India/serverless_nlp_app/tree/master/src/aws/3.stepfunctions_pipeline/a.create_iam_role_policy_state_machine)
- 4.B. Create a "Step Functions Invoke Lambda" and test it | [GitHub Codes Reference](https://github.com/Toyota-Connected-India/serverless_nlp_app/tree/master/src/aws/2.stepfunctions_invoke_lambda)
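
For orientation, a two-step state machine in Amazon States Language could look roughly like the definition generated below. This is a sketch under assumptions: the state names and the placeholder ARNs are hypothetical, and the repository's definition may contain more states or error handling.

```python
import json


def build_state_machine(simple_lambda_arn: str, layer_lambda_arn: str) -> dict:
    """Return an Amazon States Language definition that chains two Lambdas."""
    return {
        "Comment": "PII redaction: phone/email first, then names",
        "StartAt": "SimpleLambda",
        "States": {
            "SimpleLambda": {
                "Type": "Task",
                "Resource": simple_lambda_arn,
                "Next": "LayerLambda",
            },
            "LayerLambda": {
                "Type": "Task",
                "Resource": layer_lambda_arn,
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs; substitute the ARNs of your own functions.
    definition = build_state_machine(
        "arn:aws:lambda:REGION:ACCOUNT_ID:function:simple-lambda",
        "arn:aws:lambda:REGION:ACCOUNT_ID:function:layer-lambda",
    )
    print(json.dumps(definition, indent=2))
```

The printed JSON could be saved to a file and passed to `aws stepfunctions create-state-machine` through its `--definition` argument, which keeps the definition version-controlled like the rest of the provisioning code.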

<hr>

<br>
<br>

#### 5. Test the Whole Pipeline with a Simple Task Command

```bash
# First, set up temporary AWS credentials in your shell.
# `task` then runs the named task from the `Taskfile.yml` in the current directory.
/path/to/serverless_nlp_app/src/aws/2.stepfunctions_invoke_lambda/c.testing  % task run_test_event_1
```
[GitHub Codes Reference](https://github.com/Toyota-Connected-India/serverless_nlp_app/blob/master/src/aws/2.stepfunctions_invoke_lambda/c.testing/Taskfile.yml)
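
A test event for an S3-triggered Lambda is a JSON document in the S3 event notification format. The sketch below shows how a handler might pull the bucket name and object key out of such an event, assuming the invoke Lambda is triggered by an upload to the trigger bucket. The function name, the output field names, and the sample bucket name are assumptions for illustration.

```python
from urllib.parse import unquote_plus


def extract_s3_object(event: dict) -> dict:
    """Pull the bucket name and object key out of an S3 event notification."""
    s3_info = event["Records"][0]["s3"]
    return {
        "bucket_name": s3_info["bucket"]["name"],
        # Object keys arrive URL-encoded in S3 event notifications.
        "file_key": unquote_plus(s3_info["object"]["key"]),
    }


# A trimmed-down test event; real notifications carry many more fields.
SAMPLE_EVENT = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-trigger-bucket"},  # hypothetical name
                "object": {"key": "email_1.txt"},
            }
        }
    ]
}

if __name__ == "__main__":
    print(extract_s3_object(SAMPLE_EVENT))
```

The extracted dictionary is the kind of payload that could be serialized with `json.dumps` and handed to the state machine as its execution input.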

<hr>

<br>
<br>


<h1 align="center"> III. Infrastructure Provisioning Workflow  </h1>

<br>
<br>

<img src="../docs/images/Taskfile_Workflow_Layer_Lambda_Example.png" alt="Taskfile Workflow" width="700"/>

<br>
<br>

### Sample IAM `Taskfile.yml`

<img src="../docs/images/Taskfile_IAM_creation.png" alt="IAM Taskfile" width="500" />


### Sample Lambda `Taskfile.yml`

<img src="../docs/images/Taskfile_Lambda_creation.png" alt="Lambda Taskfile" width="600"/>

<br>
<br>

<h1 align="center"> IV. Conclusion </h1>

- Treat your infrastructure provisioning code with the same rigor as your application code.
    - This is getting easier with the `AWS CLI` + `Taskfile.yml` approach
- As more companies adopt a cloud-first approach, making sure their developers have a `DevOps Mindset` leads to a better software development cycle


<br>
<br>

<h1 align="center"> Appendix: Link to the GitHub codes </h1>

<div style="text-align: center;">
<img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" alt="https://github.com/Toyota-Connected-India/serverless_nlp_app" width="100" />
</div>

---


<h3 align="center"> https://github.com/Toyota-Connected-India/serverless_nlp_app </h3>

<br>

---

<br>
<br>

