# OpenAI tests for the AWS Lambda to Azure Functions converter

## Testing RAG over Code

In this notebook we follow the LangChain guide for RAG over code, available [here](https://python.langchain.com/docs/use_cases/question_answering/code_understanding).

### Install the necessary libraries

In [None]:
pip install langchain

In [None]:
pip install openai

In [None]:
pip install chromadb

In [None]:
pip install tiktoken

In [None]:
pip install python-dotenv

### Setup

Import the libraries and load the environment variables that hold the OpenAI API key and base URL.

In [1]:
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser
from langchain.text_splitter import Language

import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

api_key=os.environ['OPENAI_API_KEY']
base_url=os.environ['OPENAI_BASE_URL']

print(base_url)

https://devsquad-eastus-2.openai.azure.com/
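
Indexing `os.environ` directly raises a bare `KeyError` when a variable is missing from the `.env` file. A minimal sketch of a friendlier check follows; `require_env` is a hypothetical helper written for this notebook (it is not part of `python-dotenv` or LangChain), and `DEMO_SETTING` is a throwaway variable used only for the demonstration.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a readable message.

    Hypothetical helper for this notebook, not a library function.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell"
        )
    return value


# Demonstration with a throwaway variable so no real secret is needed.
os.environ["DEMO_SETTING"] = "example-value"
print(require_env("DEMO_SETTING"))
```

In the setup cell, the same helper could wrap the lookups for `OPENAI_API_KEY` and `OPENAI_BASE_URL`.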


### Loading the code

We will load all the Go project files with `GenericLoader` from `langchain.document_loaders`. First, define the path to the code we want to use for RAG.

In [4]:
repo_path = "../app/"

We load the Go code using `LanguageParser`, which will:

- Keep top-level functions and classes together (in a single document)
- Put the remaining code into a separate document
- Retain metadata about where each split comes from

In [5]:
# Load
loader = GenericLoader.from_filesystem(
    repo_path,
    glob="**/*",
    suffixes=[".go"],
    parser=LanguageParser(language=Language.GO, parser_threshold=500),
)
documents = loader.load()
len(documents)

7

### Splitting

Split each `Document` into chunks for embedding and vector storage.

We can use `RecursiveCharacterTextSplitter` with the language specified.

In [6]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

go_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.GO, chunk_size=2000, chunk_overlap=200
)
texts = go_splitter.split_documents(documents)
len(texts)

7
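
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch that slides a fixed-size window over the characters of a string. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on language-aware separators). The function name `sliding_window_chunks` and the sample text are made up for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each printed chunk starts with the last three characters of the previous one. A text shorter than `chunk_size` comes back as a single chunk, which would be consistent with the 7 documents above producing 7 chunks.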

In [12]:
print(texts[0])

page_content='package main\n\nimport (\n\t"context"\n\t"encoding/json"\n\t"fmt"\n\t"io"\n\t"net/http"\n\t"os"\n)\n\ntype MyEvent struct {\n\tName string `json:"name"`\n}\n\ntype MyResponse struct {\n\tMessage string `json:"message"`\n}\n\nfunc HandleRequest(ctx context.Context, event *MyEvent) (*MyResponse, error) {\n\tif event == nil {\n\t\treturn nil, fmt.Errorf("received nil event")\n\t}\n\tmessage := fmt.Sprintf("Hello %s!", event.Name)\n\treturn &MyResponse{Message: message}, nil\n}\n\nfunc azureHandler(w http.ResponseWriter, r *http.Request) {\n\tfmt.Printf("This HTTP triggered function executed successfully. Pass a name in the query string for a personalized response.\\n")\n\treqData, err := io.ReadAll(r.Body)\n\tif err != nil {\n\t\tfmt.Printf("error on reading request body: %v\\n", err.Error())\n\t\tw.WriteHeader(http.StatusBadRequest)\n\t\tw.Write([]byte(err.Error()))\n\t\treturn\n\t}\n\tvar event MyEvent\n\terr = json.Unmarshal(reqData, &event)\n\tif err != nil {\n\t\tfmt.Pr

### RetrievalQA

We need to store the documents in a way we can semantically search for their content.

The most common approach is to embed the contents of each document then store the embedding and document in a vector store.

When setting up the vectorstore retriever, we:

- Use max marginal relevance (`mmr`) for retrieval
- Return 8 documents (`k=8`)

In [7]:
from langchain.embeddings import AzureOpenAIEmbeddings
from langchain.vectorstores import Chroma

embeddings = AzureOpenAIEmbeddings(
    api_key=api_key,
    azure_endpoint=base_url, 
    api_version="2023-07-01-preview",
    azure_deployment="text-embedding-ada-002"
)

db = Chroma.from_documents(texts, embeddings)
retriever = db.as_retriever(
    search_type="mmr",  # Also test "similarity"
    search_kwargs={"k": 8},
)
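
Max marginal relevance (MMR) selects documents that are relevant to the query but not redundant with the documents already selected. In LangChain, MMR first fetches `fetch_k` candidates (20 by default) and then selects `k` of them. With only 7 chunks in the index, that is most likely what triggers the `Number of requested results 20 is greater than number of elements in index 7` warning in the outputs further down. The sketch below is a plain-Python illustration of the selection step, not LangChain's or Chroma's implementation; `mmr_select`, `cosine`, and the toy two-dimensional vectors are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Return indices of up to k candidates chosen by max marginal relevance.

    lambda_mult=1.0 ranks by relevance only; lower values favor diversity.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected candidate (0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Toy embeddings: candidates 0 and 1 are near-duplicates, candidate 2 is different.
query = [1.0, 0.0]
candidates = [
    [0.96, 0.28],
    [0.936, 0.352],
    [0.6, -0.8],
]

print(mmr_select(query, candidates, k=2))                   # diversity-aware
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # relevance only
```

With the default `lambda_mult`, the second pick skips the near-duplicate in favor of the less similar but more distinct candidate. With `lambda_mult=1.0`, the selection falls back to plain similarity ranking.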

### Chat

Test the chain in a chat setting with conversation memory, just as we would for a chatbot.

In [8]:
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import AzureChatOpenAI
from langchain.memory import ConversationSummaryMemory

llm = AzureChatOpenAI(
    api_key=api_key,
    azure_endpoint=base_url, 
    api_version="2023-07-01-preview",
    model="gpt-4",
    temperature=0
)
memory = ConversationSummaryMemory(
    llm=llm, memory_key="chat_history", return_messages=True
)
qa = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)
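
As documented, `ConversationalRetrievalChain` works in three steps: it combines the chat history and the new question into a standalone question, retrieves documents for that question, and then answers from the retrieved documents. The sketch below mimics that flow with plain functions so it can run without a model or vector store. Every name in it (`answer_with_history`, the `toy_*` functions, `CORPUS`) is hypothetical, and it keeps raw question/answer turns rather than the running summary that `ConversationSummaryMemory` maintains.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (question, answer)


def answer_with_history(
    question: str,
    history: List[Turn],
    condense: Callable[[str, List[Turn]], str],
    retrieve: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
) -> str:
    """Run one chat turn: rewrite the question, fetch context, answer, record the turn."""
    standalone = condense(question, history) if history else question
    context = retrieve(standalone)
    reply = answer(standalone, context)
    history.append((question, reply))
    return reply


# Toy stand-ins for the LLM and the retriever (illustrative only).
CORPUS = {
    "lambda": "main.go calls lambda.Start(HandleRequest)",
    "azure": "handler.go registers azureHandler with net/http",
}


def toy_condense(question: str, history: List[Turn]) -> str:
    return f"{question} (follow-up to: {history[-1][0]})"


def toy_retrieve(query: str) -> List[str]:
    return [text for keyword, text in CORPUS.items() if keyword in query.lower()]


def toy_answer(question: str, context: List[str]) -> str:
    return f"{len(context)} snippet(s) matched: {question}"


history: List[Turn] = []
first = answer_with_history(
    "Which file uses lambda?", history, toy_condense, toy_retrieve, toy_answer
)
second = answer_with_history(
    "And azure?", history, toy_condense, toy_retrieve, toy_answer
)
print(first)
print(second)
```

The second turn retrieves more context than its wording alone would, because the condensed question carries over the earlier topic. That is the reason a follow-up such as "can you try again?" can still reach the relevant code.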

Let's start asking some questions about our code:

In [9]:
question = "What classes include lambda code"
result = qa(question)
result["answer"]

Number of requested results 20 is greater than number of elements in index 7, updating n_results = 7


'The classes that contain AWS Lambda code are the ones that import the `github.com/aws/aws-lambda-go/lambda` package and use the `lambda.Start` function to start the Lambda handler. Based on the provided context, the following classes contain AWS Lambda code:\n\n1. The first class with the `HandleRequest` function that takes a `MyEvent` struct and returns a `MyResponse` struct.\n2. The second class with the `HandleRequest` function that takes a `SaveRequest` struct and returns a `Response` struct.\n\nBoth of these classes are designed to be deployed as AWS Lambda functions, as indicated by their use of the `lambda.Start` function in their `main` functions.'

In [11]:
questions = [
    "What is the file hierarchy?",
    "What files use github.com/aws/aws-lambda-go/lambda?",
    "What one improvement do you propose to remove lambda code?",
]

for question in questions:
    result = qa(question)
    print(f"-> **Question**: {question} \n")
    print(f"**Answer**: {result['answer']} \n")

Number of requested results 20 is greater than number of elements in index 7, updating n_results = 7


-> **Question**: What is the file hierarchy? 

**Answer**: The provided context does not explicitly define a file hierarchy, but it does include multiple Go source files (`main` packages) that represent different AWS Lambda and Azure Functions handlers. Based on the context, we can infer a simple hierarchy where each file represents a separate function or handler within the project. Here's a conceptual representation of the file hierarchy based on the provided Go source files:

```
/
|-- examples/
|   `-- storage/
|       `-- storage.go
|
|-- aws-lambda-go-handler-1.go
|-- aws-lambda-go-handler-2.go
|-- azure-function-go-handler-1.go
|-- azure-function-go-handler-2.go
|-- azure-function-go-handler-3.go
|-- azure-function-go-handler-4.go
```

Here's a breakdown of the hierarchy:

- The root directory (`/`) contains all the Go source files for different handlers.
- The `examples` directory contains example code or shared libraries used by the handlers.
- Inside the `examples` directory, 

Number of requested results 20 is greater than number of elements in index 7, updating n_results = 7


-> **Question**: What files use github.com/aws/aws-lambda-go/lambda? 

**Answer**: The files in the project that utilize the package `github.com/aws/aws-lambda-go/lambda` are:

1. The first file with the `HandleRequest` function that takes a `MyEvent` struct and returns a `MyResponse` struct.
2. The second file with the `HandleRequest` function that takes a `SaveRequest` struct and returns a `Response` struct. 



Number of requested results 20 is greater than number of elements in index 7, updating n_results = 7


-> **Question**: What one improvement do you propose to remove lambda code? 

**Answer**: One proposed improvement for removing AWS Lambda code from the project is the transition to using Azure Functions with HTTP triggers. This is evident from the provided context where the original AWS Lambda functions, which are designed to handle events in the AWS ecosystem, are being converted to work with Azure's serverless platform.

The code snippets show the evolution from using the AWS Lambda Go SDK (`github.com/aws/aws-lambda-go/lambda`) to using standard Go HTTP handling (`net/http`) and the Gin web framework (`github.com/gin-gonic/gin`) for setting up HTTP endpoints. This allows the functions to be triggered by HTTP requests instead of AWS-specific event sources.

The final code snippets demonstrate the functions being adapted to work as Azure Functions with HTTP triggers, where the `azureHandler` function is set up to handle incoming HTTP requests, process them, and return HTTP responses.

In [13]:
question = "that's not the file structure can you try again?"
result = qa(question)
result["answer"]

Number of requested results 20 is greater than number of elements in index 7, updating n_results = 7


"Based on the provided context, it appears that there are multiple Go files, each containing a `main` package and a `main` function. This suggests that these files are intended to be separate executables, rather than a single Go project with a shared file structure. In a typical Go project, you would have a single `main` package with one `main` function that serves as the entry point of the application.\n\nHowever, if we were to organize these files into a single Go project, we would need to refactor the code to avoid multiple `main` functions and to properly separate concerns. Below is an example of how you might structure a Go project with shared packages and a single entry point:\n\n```\nmy-go-project/\n├── cmd/\n│   └── server/\n│       └── main.go          # Contains the main function and server setup\n├── internal/\n│   ├── handler/\n│   │   └── handler.go       # Contains HTTP and Lambda handlers\n│   └── storage/\n│       └── storage.go       # Contains the storage interface an