# OpenAI tests for Lambda - Azure Function converter

## Testing RAG over Code

In this notebook we follow the LangChain guide to RAG over code, available [here](https://python.langchain.com/docs/use_cases/question_answering/code_understanding).

### Install the necessary libraries

In [None]:
pip install langchain

In [None]:
pip install openai

In [None]:
pip install chromadb

In [None]:
pip install tiktoken

In [None]:
pip install python-dotenv

### Setup

Import the libraries and load the environment variables that hold the OpenAI API key, the base URL, and the API version.

In [1]:
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser
from langchain.text_splitter import Language

import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

api_key = os.environ['OPENAI_API_KEY']
base_url = os.environ['OPENAI_BASE_URL']
api_version = os.environ['OPENAI_API_VERSION']

print(base_url)

https://devsquad-eastus-2.openai.azure.com/
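If one of the three variables is missing from the `.env` file, the cell above stops with a bare `KeyError`. A minimal sketch of a friendlier check follows; the helper name `read_settings` is our own and not part of any library, and it assumes the same three variable names used above.

```python
import os

# The three variables the setup cell reads.
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_BASE_URL", "OPENAI_API_VERSION")


def read_settings(env=None):
    """Return the required settings, or raise one error naming every missing variable."""
    env = os.environ if env is None else env
    # Treat empty strings as missing too.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `read_settings()` right after `load_dotenv(...)` reports all missing names at once instead of failing on the first one.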


### Loading the code

We will load all the Go project files using `GenericLoader` from `langchain.document_loaders.generic`. Let's define the target path with the code we want to use for RAG.

In [2]:
repo_path = "../go-examples"

We load the Go code using `LanguageParser`, which will:

- Keep top-level functions and classes together (in a single document)
- Put the remaining code into a separate document
- Retain metadata about where each split comes from

In [3]:
# Load
loader = GenericLoader.from_filesystem(
    repo_path,
    glob="**/*",
    suffixes=[".go"],
    parser=LanguageParser(language=Language.GO, parser_threshold=500),
)
documents = loader.load()
len(documents)

21
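Because the parser can emit more than one document per file, the count above is not necessarily the number of `.go` files. A small sketch for checking how many documents each file produced is shown below. It relies only on each document exposing a `metadata["source"]` entry, as the loader output does; the sample objects and their paths are stand-ins for illustration.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_source(docs):
    """Count how many documents came from each source file."""
    return Counter(doc.metadata["source"] for doc in docs)


# Stand-in objects shaped like loaded documents (illustrative paths only).
sample = [
    SimpleNamespace(page_content="package main", metadata={"source": "a/main.go"}),
    SimpleNamespace(page_content="func main() {}", metadata={"source": "a/main.go"}),
    SimpleNamespace(page_content="package util", metadata={"source": "b/util.go"}),
]

for source, n in count_by_source(sample).most_common():
    print(f"{n:3d}  {source}")
```

In the notebook, `count_by_source(documents)` would give the same breakdown for the real repository.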

### Splitting

Split the documents into chunks for embedding and vector storage.

We can use `RecursiveCharacterTextSplitter` with the language specified.

In [4]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

go_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.GO, chunk_size=500, chunk_overlap=0
)
texts = go_splitter.split_documents(documents)
len(texts)

66
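To build intuition for what a recursive, language-aware splitter does, here is a simplified sketch in plain Python. It is not the library's implementation: the function `recursive_split` and the short separator list are our own assumptions, chosen so that the text is cut at Go function boundaries first and at finer boundaries only when a piece is still too long.

```python
def recursive_split(text, separators, chunk_size):
    """Split text into chunks of at most chunk_size characters.

    The first separator is tried first; any piece that is still too long
    is split again with the remaining separators. When no separators are
    left, the text is cut into fixed-width slices.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for i, part in enumerate(text.split(sep)):
        # Keep the separator attached to the text that follows it.
        piece = part if i == 0 else sep + part
        if len(current) + len(piece) <= chunk_size:
            current += piece
            continue
        if current.strip():
            chunks.append(current)
        current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, rest, chunk_size))
    if current.strip():
        chunks.append(current)
    return chunks


# Illustrative separators, coarse to fine (an assumption, not the library's list).
GO_SEPARATORS = ["\nfunc ", "\n\n", "\n", " "]

source = "package main\n\nfunc a() {\n\treturn\n}\n\nfunc b() {\n\treturn\n}\n"
chunks = recursive_split(source, GO_SEPARATORS, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

With this input each function ends up in its own chunk, which is the behaviour we want from the real splitter at `chunk_size=500`.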

In [6]:
print(texts[0:5])

[Document(page_content='package main\n\nimport (\n\t"context"\n\t"fmt"\n\n\t"github.com/aws/aws-lambda-go/lambda"\n)\n\ntype MyEvent struct {\n\tName string `json:"name"`\n}\n\ntype MyResponse struct {\n\tMessage string `json:"message"`\n}\n\nfunc HandleRequest(ctx context.Context, event *MyEvent) (*MyResponse, error) {\n\tif event == nil {\n\t\treturn nil, fmt.Errorf("received nil event")\n\t}\n\tmessage := fmt.Sprintf("Hello %s!", event.Name)\n\treturn &MyResponse{Message: message}, nil\n}\n\nfunc main() {\n\tlambda.Start(HandleRequest)\n}', metadata={'source': '..\\go-examples\\examples\\gin\\basic-conversion-1\\input\\main.go', 'language': <Language.GO: 'go'>}), Document(page_content='package main\n\nimport (\n\t"fmt"\n\t"log"\n\t"net/http"\n\t"os"\n\n\t"github.com/gin-gonic/gin"\n)\n\nconst (\n\tEnvVarAzureFunctionPort = "FUNCTIONS_PORT"\n)\n\ntype MyEvent struct {\n\tName string `json:"name"`\n}\n\ntype MyResponse struct {\n\tMessage string `json:"message"`\n}', metadata={'source

### RetrievalQA

We need to store the documents in a way that lets us search their content semantically.

The most common approach is to embed the contents of each document, then store the embedding and the document in a vector store.

When setting up the vectorstore retriever:

- We test maximal marginal relevance (MMR) for retrieval
- We return 8 documents per query

In [8]:
from langchain.embeddings import AzureOpenAIEmbeddings
from langchain.vectorstores import Chroma

embeddings = AzureOpenAIEmbeddings(
    api_key=api_key,
    azure_endpoint=base_url, 
    api_version=api_version,
    azure_deployment="text-embedding-ada-002"
)

db = Chroma.from_documents(texts, embeddings)
retriever = db.as_retriever(
    search_type="mmr",  # Also test "similarity"
    search_kwargs={"k": 8},
)
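The `search_type="mmr"` setting trades relevance against diversity: each new result is rewarded for being close to the query and penalised for being close to results already picked. The sketch below shows the idea on toy 2-D vectors in plain Python. It is an illustration under our own assumptions, not the vector store's code; the parameter name `lambda_mult` is used here only as a label for the relevance weight.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query, candidates, k, lambda_mult=0.5):
    """Return the indices of k candidates chosen greedily by MMR."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],    # very relevant
    [1.0, 0.12],   # near-duplicate of the first
    [0.8, -0.6],   # less relevant, but different
]
print(mmr(query, candidates, k=2))                   # balanced weighting
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # relevance only
```

With balanced weighting the near-duplicate is skipped in favour of the more distinct vector; with relevance only, the two near-identical vectors are returned. For code search this matters because many chunks in a repository look alike.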

### Chat
Test a chat over the retriever, just as we would for a chatbot.

In [9]:
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import AzureChatOpenAI
from langchain.memory import ConversationSummaryMemory

llm = AzureChatOpenAI(
    api_key=api_key,
    azure_endpoint=base_url, 
    api_version=api_version,
    model="gpt-4",
    temperature=0
)
memory = ConversationSummaryMemory(
    llm=llm, memory_key="chat_history", return_messages=True
)
qa = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)

Let's start asking some questions about our code.

In [10]:
question = "What files include lambda code"
result = qa(question)
result["answer"]

"The provided context includes multiple Go source files, and several of them contain AWS Lambda function code. The Lambda function code is identifiable by the use of the `github.com/aws/aws-lambda-go/lambda` package and the invocation of `lambda.Start()` with a handler function. Based on the context provided, the following files contain AWS Lambda function code:\n\n1. The file that contains the `HandleRequest` function, which is a Lambda handler function that takes an `Event` struct and returns a greeting message. This is a typical Lambda function setup in Go.\n\n2. The file that contains the `LambdaHandler` function, which seems to be a handler for a Lambda function, although it's not following the standard naming convention (`HandleRequest` or similar) and it's using `ctx *gin.Context`, which suggests it might be part of an API Gateway integration using the Gin framework. However, it's not clear if this is a complete Lambda function since there's no `lambda.Start()` invocation shown 

In [11]:
questions = [
    "What is the file hierarchy?",
    "What files use github.com/aws/aws-lambda-go/lambda?",
    "What one improvement do you propose to remove lambda code?",
]

for question in questions:
    result = qa(question)
    print(f"-> **Question**: {question} \n")
    print(f"**Answer**: {result['answer']} \n")

-> **Question**: What is the file hierarchy? 

**Answer**: Based on the provided context, it appears that there are snippets of Go (Golang) code intended for AWS Lambda functions. However, the file hierarchy or directory structure is not explicitly provided in the context. Typically, Go projects have a specific structure, but without explicit directory names or paths, I can only speculate on a common structure.

A typical Go project might look like this:

```
/my-lambda-project
  /cmd
    /mylambdafunction
      main.go
  /pkg
    /jokes
      jokes.go
  /internal
    /util
      util.go
  go.mod
  go.sum
```

In this hypothetical structure:

- `/cmd` contains the entry points for the application, with each subdirectory representing a different Lambda function.
- `/pkg` might contain library code that's intended to be used by other applications.
- `/internal` contains private application and library code.
- `go.mod` and `go.sum` are at the root of the project and define the module's de
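The answer above notes that the context contains no paths. The file path of each chunk lives in its `metadata`, and it appears that only the chunk text reaches the model. One possible workaround, sketched below under that assumption, is to prepend the path to the text of every chunk before embedding. The `Doc` class is a hypothetical stand-in for the real document type, and `with_source_header` is our own helper name.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def with_source_header(doc):
    """Return a copy of doc whose text starts with a Go comment naming its source file."""
    source = doc.metadata.get("source", "unknown")
    header = f"// File: {source}\n"
    return Doc(page_content=header + doc.page_content, metadata=dict(doc.metadata))


# Illustrative chunk; the path is made up for the example.
chunk = Doc("package main", {"source": "examples/demo/main.go"})
print(with_source_header(chunk).page_content)
```

Applied to every chunk before building the vector store, this would put the file name into the retrieved context, at the cost of a few extra characters per chunk. Whether it improves answers to file-level questions would still need to be tested.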

In [14]:
question = "name the files that use aws s3"
result = qa(question)
result["answer"]

'The first code snippet contains code for AWS S3 operations. It initializes an S3 client and lists objects in an S3 bucket named "examplebucket". Here is the relevant part of the code that deals with S3:\n\n```go\nimport (\n\t"context"\n\t"github.com/aws/aws-sdk-go-v2/config"\n\t"github.com/aws/aws-sdk-go-v2/service/s3"\n\t"github.com/aws/aws-sdk-go-v2/service/s3/types"\n\t"log"\n)\n\n// ...\n\nvar myObjects []types.Object\n\nfunc init() {\n\t// Load the SDK configuration\n\tcfg, err := config.LoadDefaultConfig(context.TODO())\n\tif err != nil {\n\t\tlog.Fatalf("Unable to load SDK config: %v", err)\n\t}\n\n\t// Initialize an S3 client\n\tsvc := s3.NewFromConfig(cfg)\n\n\t// Define the bucket name as a variable so we can take its address\n\tbucketName := "examplebucket"\n\tinput := &s3.ListObjectsV2Input{\n\t\tBucket: &bucketName,\n\t}\n\n\t// List objects in the bucket\n\tresult, err := svc.ListObjectsV2(context.TODO(), input)\n\tif err != nil {\n\t\tlog.Fatalf("Failed to list objects: