# Lambda

Lambda is a serverless computing service from AWS. This means that we can run code without worrying about provisioning the necessary infrastructure.

# Why Is Lambda Awesome?

**It allows us to focus on the important part**

Normally, whenever we want to execute a script, we need a computer to run it. If we want to run this script remotely, we need a server (something like an EC2 instance). However, before we can run the code, we have to set up the server, configure the environment, secure the system and so on. If we want to change our code, we have to access the server, change the code and make sure that everything still runs... it can quickly become a lot of work.

While there are tools on the market that help with this process, AWS Lambda allows us to focus on the most important part: **writing the code**.
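
In practice, "writing the code" means writing a handler function that Lambda calls with the triggering event (a dict for JSON events) and a context object describing the invocation. A minimal sketch of that shape; the `name` key in the event is a hypothetical input used only for illustration:

```python
import json


def lambda_handler(event, context):
    """Minimal Lambda handler: read a value from the event and return a response."""
    # "name" is a hypothetical key used only for this example
    name = event.get("name", "world")
    return {"statusCode": 200, "body": json.dumps(f"Hello, {name}!")}


if __name__ == "__main__":
    # Locally the handler is just a Python function; None stands in for the context object
    print(lambda_handler({"name": "Lambda"}, None))
```

Everything around this function (servers, operating system, scaling) is handled by AWS.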

**Very little configuration overhead (may be) needed**

Configuring Lambda can become tricky, but compared with other services it's much simpler to set up and run. AWS Glue, for example, can require an overwhelming amount of configuration, whereas with Lambda we can simply deploy code. Of course, in the long run, for bigger projects, that initial configuration can pay off (instead of running multiple Lambda functions for every task), but for smaller tasks Lambda handles the job very well.

# Hands-On

Maybe the best way to show the power of Lambda is to demonstrate how it works.

Basically, what we will do is take some CSV files stored in an S3 bucket and load them into a relational database using Lambda.

# S3 (Data Source)

The files in the S3 bucket were downloaded from [Kaggle](https://www.kaggle.com/). They contain [data from a Brazilian e-commerce business](https://www.kaggle.com/olistbr/brazilian-ecommerce).

![olist](Lambda/lambda_olist.png)

# Database (Data Target)

In the relational database, I had already created the tables to hold the data we need. In a "real world" environment, the database schema would usually have been decided beforehand by the database administrator or the person leading the development process.

For this example, I simply followed the structure of the files themselves, as they are organised in a logical way according to the [Kaggle page](https://www.kaggle.com/olistbr/brazilian-ecommerce) from which I downloaded them.

![tables](Lambda/lambda_rds.png)

# Setting Up Lambda

Even though Lambda allows us to run code quickly, we can't get around some "configuration". We have to make sure that our code can access the AWS resources it needs to move data from one place to the other. In this case, we need access to S3 and RDS.

One easy way to do this is to create an IAM user with permissions to access these services. In a company, though, you might have to ask the cloud administrator to grant you the necessary permissions.

Creating the user with programmatic access should generate an access key ID and a secret access key, which our code can use to authenticate.

![iam](Lambda/lambda_iam.png)

*Note: there are many ways to do this, and you might want to do it differently in a company or in a production environment.*

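
Whatever approach is used, it helps to grant only what the code needs. The script in this post only lists the `olist-dataset` bucket and downloads its objects, so a minimal S3 policy can be sketched as a Python dict. This is an assumption about the required actions, not the exact policy used here; also note that the database login itself comes from the connection string, not from IAM, so RDS access mainly depends on database credentials and network settings.

```python
import json


def s3_read_policy(bucket: str) -> dict:
    """Build a least-privilege IAM policy: list one bucket and read its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket ARN itself (needed to iterate objects)
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # GetObject applies to the objects, hence the /* suffix
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Paste the printed JSON into the IAM console when creating the policy
    print(json.dumps(s3_read_policy("olist-dataset"), indent=2))
```
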
# Creating the Function

I'll create a Lambda function from a blank template. The function's name will be S3ToRDS and it'll run on Python 3.7.

![aws_lambda_config](Lambda/lambda_config.png)

# Configuring the Function

Once the function is created, we're taken to a new screen. Here we can pick the events that will trigger the function and also a destination, that is, a service to which Lambda sends the result of the function's asynchronous invocations.

Since this example is a "one-time load", I won't spend more time on this. It's good to know, however, that Lambda functions can react to events in our network or application, and can also simply run at a given time every day, much like a cron job in Linux.

![aws_lambda_head_page](Lambda/lambda_head.png)
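
If, instead of a one-time load, the function were triggered whenever a new file lands in the bucket, the handler would receive the uploaded objects inside the event. A minimal sketch of reading them, assuming the standard S3 notification layout (`Records` → `s3` → `bucket`/`object`) and a trimmed-down sample event; `keys_from_s3_event` is a hypothetical helper name:

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def keys_from_s3_event(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification.

    Object keys arrive URL-encoded (for example, spaces become '+'),
    so they are decoded before use.
    """
    pairs = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs


# Trimmed-down sample event: real notifications carry many more fields
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "olist-dataset"},
                "object": {"key": "olist_orders_dataset.csv"},
            }
        }
    ]
}

if __name__ == "__main__":
    print(keys_from_s3_event(sample_event))
    # [('olist-dataset', 'olist_orders_dataset.csv')]
```

With a trigger like this, the handler would load only the new file instead of looping over the whole bucket.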

At the bottom of the screen, we find other options for configuring our Lambda function. One interesting option is setting environment variables, which lets us pass parameters to our code without altering the code itself.

Here, for the sake of this example, I'll put the credentials of the user we created for our lambda functions.

*Note: is this the right thing to do? [Some people have concerns about storing sensitive information in Lambda environment variables](https://lumigo.io/blog/aws-lambda-vs-ec2/). Lambda does share resources with other AWS users, so we have to rely on AWS to guarantee the privacy of this information. If we want to handle security ourselves, we have to launch an EC2 instance, but then we lose the benefits of the serverless architecture.*

![lambda_env](Lambda/lambda_env.png)
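
Inside the function, these values are read from `os.environ`. A small sketch, using a hypothetical `load_settings` helper and the variable names the script in the next section expects, that checks every variable up front so a missing one produces a single clear error instead of a failure halfway through the load:

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable names used by the script in this post
# ("aws_secret_key_id" holds the access key ID)
REQUIRED_VARS = (
    "aws_secret_key_id",
    "aws_secret_access_key",
    "aws_region",
    "db_conn_string",
)


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required settings, or fail early listing every missing variable."""
    if env is None:
        env = os.environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` at the top of the handler keeps the secrets out of the code; passing a plain dict instead makes the check easy to try locally.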

# The Lambda Script

Finally, we get to the scripting part. Our Lambda function has to access an S3 bucket containing CSV files, read each one of them and load it into the database.

We can obtain this result with the following script:

```python
import io
import json
import os

import boto3
import pandas as pd
import psycopg2
from psycopg2 import sql


def lambda_handler(event, context):

    # Connects to S3 with the credentials stored in the environment variables
    # ("aws_secret_key_id" is the variable holding the access key ID)
    s3 = boto3.resource(
        "s3",
        aws_access_key_id=os.environ["aws_secret_key_id"],
        aws_secret_access_key=os.environ["aws_secret_access_key"],
        region_name=os.environ["aws_region"],
    )

    # Finds the bucket
    bucket = s3.Bucket("olist-dataset")

    # Connects to the database
    conn = psycopg2.connect(os.environ["db_conn_string"])
    cur = conn.cursor()

    try:
        # Loops through the bucket contents
        for obj in bucket.objects.all():
            file_name = obj.key
            print(file_name)

            # Skips anything that is not a CSV file
            if not file_name.endswith(".csv"):
                continue

            # Derives the target table from the file name
            if file_name == "product_category_name_translation.csv":
                target_table_name = "product_category_name_translation"
            else:
                target_table_name = file_name.replace("olist_", "").replace("_dataset.csv", "")
            print(target_table_name)

            # Downloads the file into an in-memory bytes buffer
            bytes_file = io.BytesIO()
            download_start = pd.Timestamp("now")
            print("Starting download")
            bucket.download_fileobj(file_name, bytes_file)
            download_end = pd.Timestamp("now")
            print("Finished downloading. Elapsed time =", download_end - download_start)

            # Rewinds the buffer and exposes it as text for COPY
            bytes_file.seek(0)
            csv_file = io.TextIOWrapper(bytes_file, encoding="utf8")

            # Builds the COPY statement, quoting the table name safely
            query = sql.SQL(
                "COPY {table_name} FROM STDIN WITH (FORMAT csv, HEADER, ENCODING 'UTF8')"
            ).format(table_name=sql.Identifier(target_table_name))

            # Streams the CSV into the table
            upload_start = pd.Timestamp("now")
            print("Starting upload")
            cur.copy_expert(query, csv_file)
            upload_end = pd.Timestamp("now")
            print("Finished upload. Elapsed time =", upload_end - upload_start)

            conn.commit()
            print(f"Done with table {target_table_name}")
    finally:
        # Releases the database connection even if a load fails
        cur.close()
        conn.close()

    return {
        "statusCode": 200,
        "body": json.dumps("Function executed successfully!!!"),
    }
```

Now the only thing left to do is to place this code in the `lambda_handler` function that the blank template provides in the Lambda console's code editor.

![handler](Lambda/lambda_handler.png)

After clicking Deploy to save our changes, we just have to press the Test button to run it.
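
Because the file-to-table naming rule in the script is plain string handling, it can be checked locally before pressing Test. A sketch that pulls the rule into a hypothetical `target_table_for` helper; returning `None` for keys that are not CSV files is an assumption added for this sketch:

```python
from typing import Optional


def target_table_for(key: str) -> Optional[str]:
    """Map an S3 object key to its target table, following the script's rule.

    The translation file keeps its name; the other files drop the 'olist_'
    prefix and the '_dataset.csv' suffix. Keys that are not CSV files map to None.
    """
    if not key.endswith(".csv"):
        return None
    if key == "product_category_name_translation.csv":
        return "product_category_name_translation"
    return key.replace("olist_", "").replace("_dataset.csv", "")


if __name__ == "__main__":
    for key in ["olist_orders_dataset.csv", "product_category_name_translation.csv", "notes.txt"]:
        print(key, "->", target_table_for(key))
```

The special case is needed because the translation file has no `_dataset.csv` suffix, so the generic rule would leave `.csv` in the table name.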

# Or Not...

Not everything is great with Lambda, though. Some packages (like pandas and psycopg2, or any custom packages we need) are not available out of the box. To let Lambda use those packages, we have to add a layer to the function.

There are many tutorials on the internet on how to do this, for example [this one](https://www.gcptutorials.com/post/how-to-use-pandas-in-aws-lambda).

Basically, we have to:

1) Create a Python virtual environment.

2) Install the packages in the virtual environment (in my case, the ones listed in a "requirements.txt" file) and prepare a zip file with all the libraries.

3) Then create a layer and attach it to the Lambda function we want to run.
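
The zip file from step 2 can be built with a short script. Lambda extracts layers into `/opt`, and `/opt/python` is on the import path of the Python runtimes, so the libraries should sit under a top-level `python/` folder in the archive. A sketch, assuming the packages were first installed into a local folder (for example with `pip install -r requirements.txt -t build/`) and using a hypothetical `build_layer_zip` helper. Compiled packages such as pandas and psycopg2 need Linux builds compatible with the Lambda runtime, so installing them on macOS or Windows typically yields binaries the function cannot load.

```python
import os
import zipfile
from typing import List


def build_layer_zip(package_dir: str, zip_path: str) -> List[str]:
    """Zip the contents of package_dir so every file sits under 'python/'.

    zip_path should be outside package_dir, otherwise the archive
    would try to include itself.
    """
    written = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                rel_path = os.path.relpath(full_path, package_dir)
                # Zip entries always use forward slashes
                arcname = "python/" + rel_path.replace(os.sep, "/")
                zf.write(full_path, arcname)
                written.append(arcname)
    return written
```

The resulting zip is what gets uploaded when creating the layer in step 3.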

![layer](Lambda/lambda_layer.png)

Once that's done, we'll see that the function now uses the layer.

![layer_successful](Lambda/lambda_layer_successful.png)

Now the function runs to completion.

![layer_result](Lambda/lambda_result.png)

# Conclusion

Lambda is a great AWS service. It's quite flexible (despite the occasional need to add layers) and allows us to focus on writing code instead of provisioning servers, installing an IDE, creating an environment and so on...

It's one of the most useful AWS services and certainly one worth learning how to use!