# Package Installer Integration for Streamlit


**Overview**
This notebook outlines how to integrate LLMs such as GPT-4 with [Streamlit](https://streamlit.io/) to develop interactive web applications for data analysis. Streamlit converts data analysis scripts into web apps and displays their results in real time on web pages.

**Key Challenge:**
LLMs often produce code that depends on packages that are not pre-installed. This notebook addresses this by enabling the LLM to install required packages dynamically, so that the generated code runs smoothly in Streamlit environments. This approach bypasses the need for sandbox environments, making data analysis code interactively available on the web.

**Solution:**
The solution involves providing dataset information and the user's query to the LLM. The LLM then generates Streamlit-compatible code and installs any Python packages that code imports at runtime.

**Purpose:**
The main goal of this notebook is to demonstrate the dynamic installation of Python packages during runtime. This is particularly useful for scripts generated by LLMs in a Streamlit context, because it streamlines the process of turning data analysis scripts into interactive web applications.
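
To make the idea of runtime installation concrete, here is a minimal sketch that uses only the standard library. It is not the implementation of **PackageInstallTool**; the function name `ensure_package` and its parameters are hypothetical, and the sketch assumes that `pip` is available to the running interpreter.

```python
import importlib
import importlib.util
import subprocess
import sys


def ensure_package(module_name, pip_name=None):
    """Install a package with pip if its top-level module cannot be imported.

    Hypothetical helper for illustration only.
    Returns True if an installation was attempted, False if the module
    was already available.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False  # already importable, nothing to do

    # Use the running interpreter's pip so the package lands in the
    # same environment that will import it.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or module_name]
    )
    # Make the freshly installed package visible to the import system.
    importlib.invalidate_caches()
    return True


# A standard-library module is always present, so no install is triggered.
already_there = ensure_package("json")
print(already_there)  # False
```

The optional `pip_name` argument covers the case where the name used with `pip` differs from the importable module name.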

First, we need to import the following packages:

In [66]:
import io
import re

import pandas as pd
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain.chat_models import ChatOpenAI
from langchain.prompts import (
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
)
from langchain_community.tools import PackageInstallTool
from langchain_core.prompts import ChatPromptTemplate

### Creating a Prompt Template

Then, we create a prompt for our agent:

In [67]:
prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessagePromptTemplate.from_template(
            template_format="jinja2",
            template="""You are a data analyst. Your role is to provide a code snippet in Python tailored to data analysis query for use in Streamlit and install any imported packages using the provided tools. Your output will only contain Python code snippet without any text explanation.
Dataset Information:{{dataset_info}}
Variable Name for DataFrame: df
Libraries pre-imported include: pandas as pd, streamlit as st""",
        ),
        HumanMessagePromptTemplate.from_template("{query}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

### Defining our LLM

An instance of **ChatOpenAI** is created with the model **gpt-4-1106-preview** and a temperature of 0, which makes the generated code as deterministic and repeatable as possible across runs.

In [None]:
llm = ChatOpenAI(model="gpt-4-1106-preview", temperature=0)

### Setting Up Tools

**PackageInstallTool** is included as a tool to handle the installation of required Python packages.

In [69]:
tools = [PackageInstallTool().as_tool()]

### Creating and Configuring the Agent

An agent is created using **create_openai_functions_agent**, combining the LLM and the tools.

**AgentExecutor** is set up with this agent, designed to handle execution and return intermediate steps for debugging or detailed output.

In [70]:
agent = create_openai_functions_agent(llm, tools, prompt)

In [71]:
agent_executor = AgentExecutor(
    agent=agent, tools=tools, verbose=True, return_intermediate_steps=True
)

### Data Preparation
A dataset (in this case, California housing data) is read into a DataFrame.

The information about the dataset is extracted and stored in dataset_info.

The dataset used in this notebook can be found [here](https://storage.cloud.google.com/package-installer-streamlit/california_housing_train.csv).


In [72]:
df = pd.read_csv("~/desktop/california_housing_train.csv")

In [73]:
buffer = io.StringIO()
df.info(buf=buffer)
dataset_info = buffer.getvalue()

### Defining Parameters and Executing the Agent

Parameters for the agent, including the dataset information and a specific data query, are defined.

The agent is then invoked with these parameters to generate a Python code snippet.

In [78]:
params = {
    "dataset_info": dataset_info,
    "query": "Can you please plot population against longitude, latitude",
}

In [79]:
result = agent_executor.invoke(params)



> Entering new AgentExecutor chain...

Invoking: `package_install` with `{'package_names': ['matplotlib']}`

Packages successfully installed: matplotlib.
True
```python
import matplotlib.pyplot as plt

# Plotting population against longitude and latitude
plt.figure(figsize=(10, 8))
plt.scatter(df['longitude'], df['latitude'], c=df['population'], cmap='viridis', s=1)
plt.colorbar(label='Population')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Population Distribution by Geographic Coordinates')
st.pyplot(plt)
```

> Finished chain.
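
Because the executor was created with `return_intermediate_steps=True`, the returned dictionary also carries an `intermediate_steps` entry: a list of (action, observation) pairs, one per tool call. The sketch below shows one way to summarize those pairs. So that it runs without LangChain, it uses a hypothetical stand-in, `FakeAction`, in place of LangChain's `AgentAction`, and a hand-written `result` that mirrors the trace shown in this notebook; `summarize_steps` is likewise an illustrative name.

```python
from collections import namedtuple

# Hypothetical stand-in for LangChain's AgentAction, which exposes
# the attributes `tool`, `tool_input`, and `log`.
FakeAction = namedtuple("FakeAction", ["tool", "tool_input", "log"])

# Hand-written result mirroring the trace in this notebook; a real one
# would come from agent_executor.invoke(params).
result = {
    "output": "(generated code omitted)",
    "intermediate_steps": [
        (FakeAction("package_install", {"package_names": ["matplotlib"]}, ""), True),
    ],
}


def summarize_steps(result):
    """Return (tool name, tool input, observation) for every tool call."""
    return [
        (action.tool, action.tool_input, observation)
        for action, observation in result.get("intermediate_steps", [])
    ]


for tool, tool_input, observation in summarize_steps(result):
    print(f"{tool}({tool_input}) -> {observation}")
```

A summary like this makes it easy to check which packages the agent asked to install before the generated code is run.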


### Extracting the Generated Python Code

The output from the agent is processed using regular expressions to extract the Python code snippet.

If a code snippet is successfully extracted, it is printed out.

In [80]:
extracted_code = re.search(r"```python\s+(.*?)\s+```", result["output"], re.DOTALL)

In [81]:
if extracted_code:
    python_code = extracted_code.group(1)
    print(python_code)

import matplotlib.pyplot as plt

# Plotting population against longitude and latitude
plt.figure(figsize=(10, 8))
plt.scatter(df['longitude'], df['latitude'], c=df['population'], cmap='viridis', s=1)
plt.colorbar(label='Population')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Population Distribution by Geographic Coordinates')
st.pyplot(plt)
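
The extraction step can also be exercised in isolation, including the case where the model's reply contains no fenced code at all. The following sketch wraps the same regular expression in a small helper; `extract_python` is a hypothetical name, and the sample replies are made up for illustration.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime to keep this example easy to embed

# Equivalent to the pattern used in the notebook cell above.
CODE_BLOCK = re.compile(FENCE + r"python\s+(.*?)\s+" + FENCE, re.DOTALL)


def extract_python(text):
    """Return the first fenced Python snippet in `text`, or None if there is none."""
    match = CODE_BLOCK.search(text)
    return match.group(1) if match else None


# Made-up reply containing a fenced snippet.
reply = f"{FENCE}python\nimport math\nst.write(math.pi)\n{FENCE}"
snippet = extract_python(reply)
print(snippet)

# Made-up reply without any code: the helper returns None instead of failing.
no_code = extract_python("I could not produce code for this query.")
print(no_code)  # None
```

Returning `None` explicitly lets the calling code decide what to do when no snippet is found, rather than leaving `python_code` undefined.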
