## **Step 1: Installing Necessary Libraries**

- **langchain**: The core framework for building applications using Large Language Models (LLMs). It provides tools for chaining models, working with prompts, and integrating various models and services.
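- **openai==0.28**: The OpenAI API package pinned to version 0.28, which lets you call OpenAI’s GPT models such as `gpt-3.5-turbo` and `gpt-4`. Note that `langchain_openai` requires `openai>=1.58.1`, so installing it afterwards replaces the pinned version, as the install log below shows.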
- **transformers**: A Hugging Face library for working with a wide range of pre-trained models, including popular models like LLaMA, Falcon, and GPT, for tasks such as text generation and classification.
- **torch**: The PyTorch library, which is essential for running deep learning models, including LLMs, and provides tools for efficient computation and model training.
- **langchain_openai**: A LangChain integration specifically designed to facilitate interaction with OpenAI models, making it easier to use OpenAI's API within LangChain workflows.
- **ipywidgets**: A library for creating interactive widgets in Jupyter notebooks.


In [1]:
!pip install langchain  # Core framework for working with LLMs
!pip install langchain-community # Install the community package containing LLMs
!pip install openai==0.28  # OpenAI API package (version 0.28) for GPT models
!pip install transformers  # Hugging Face's library for open-source models like LLaMA and Falcon
!pip install torch  # PyTorch, required for running deep learning models efficiently
!pip install langchain_openai  # OpenAI integration for LangChain; upgrades openai to >=1.58.1, overriding the 0.28 pin

Collecting openai==0.28
  Using cached openai-0.28.0-py3-none-any.whl.metadata (13 kB)
Using cached openai-0.28.0-py3-none-any.whl (76 kB)
Installing collected packages: openai
  Attempting uninstall: openai
    Found existing installation: openai 1.65.1
    Uninstalling openai-1.65.1:
      Successfully uninstalled openai-1.65.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-openai 0.3.7 requires openai<2.0.0,>=1.58.1, but you have openai 0.28.0 which is incompatible.
Successfully installed openai-0.28.0
Collecting openai<2.0.0,>=1.58.1 (from langchain_openai)
  Using cached openai-1.65.1-py3-none-any.whl.metadata (27 kB)
Using cached openai-1.65.1-py3-none-any.whl (472 kB)
Installing collected packages: openai
  Attempting uninstall: openai
    Found existing installation: openai 0.28.0
    Uninstalling openai-0.28.0:
      Succes
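
The log above shows the pin being undone: `langchain-openai 0.3.7` requires `openai<2.0.0,>=1.58.1`, so installing it after `openai==0.28` swaps the pinned version back out. As a rough illustration of why the resolver complains, the sketch below checks a version against that constraint using only the standard library. The helper names (`parse_version`, `satisfies`) are made up for this example, and the parsing handles only plain dotted numeric versions, not the full PEP 440 grammar.

```python
import operator

# Comparison operators supported by this simplified checker.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

def parse_version(text):
    """Turn '1.58.1' into (1, 58, 1). Plain numeric versions only."""
    return tuple(int(part) for part in text.strip().split("."))

def satisfies(version, constraint):
    """Return True if `version` meets every comma-separated clause."""
    for clause in constraint.split(","):
        clause = clause.strip()
        # Check two-character operators before one-character ones.
        for symbol in (">=", "<=", "==", ">", "<"):
            if clause.startswith(symbol):
                required = parse_version(clause[len(symbol):])
                if not _OPS[symbol](parse_version(version), required):
                    return False
                break
        else:
            raise ValueError(f"Unsupported clause: {clause!r}")
    return True

constraint = "<2.0.0,>=1.58.1"  # taken from the pip error message above
print(satisfies("0.28.0", constraint))  # the pinned version
print(satisfies("1.65.1", constraint))  # the version pip reinstalls
```

Running a check like this before pinning makes it clear that either the pin or the `langchain_openai` install has to go; the two cannot coexist in one environment.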

## **Step 2: Importing Libraries**


In [13]:
# ==================================================
# Data Analysis & Visualization Libraries
# ==================================================
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# ==================================================
# LangChain Libraries for AI and Data Processing
# ==================================================
from langchain.chains import LLMChain
from langchain.chains.mapreduce import MapReduceChain
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate
from langchain.text_splitter import CharacterTextSplitter
from langchain.docstore.document import Document
from langchain.schema import SystemMessage, HumanMessage
from langchain.agents import AgentType
# Community integrations now live in the langchain-community package
from langchain_community.agent_toolkits import create_sql_agent
from langchain_community.document_loaders import TextLoader
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

# ==================================================
# SQL & Database Libraries
# ==================================================
from sqlalchemy import create_engine
import sqlite3

# ==================================================
# Widgets for User Interface
# ==================================================
import ipywidgets as widgets
from IPython.display import display

# ==================================================
# Miscellaneous
# ==================================================
import warnings
import re
from google.colab import files

# Ignore all warnings
warnings.filterwarnings("ignore")


## **Step 3: Hardcoding OpenAI API Key in Google Colab**

In this step, we initialize the OpenAI `gpt-3.5-turbo` model by **hardcoding the API key** directly in the notebook.

### **Implementation:**
```python
from langchain_openai import ChatOpenAI

# Hardcoded OpenAI API key (use cautiously)
openai_api_key = "your-api-key-here"

# Initialize the OpenAI chat model; a low temperature keeps the output more consistent
llm = ChatOpenAI(openai_api_key=openai_api_key, model_name="gpt-3.5-turbo", temperature=0.1)
```


In [18]:
openai_api_key = 'your-openai-api-key'
llm = ChatOpenAI(openai_api_key=openai_api_key, model_name = 'gpt-3.5-turbo', temperature=0.1)
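
Hardcoding the key works for a quick demo, but it is easy to leak when the notebook is shared. One common alternative is to read the key from an environment variable and fall back to an interactive prompt. The sketch below assumes the variable is named `OPENAI_API_KEY`; the helper `load_api_key` and its parameters are hypothetical and not part of the original notebook.

```python
import os
from getpass import getpass

def load_api_key(env=None, prompt=getpass):
    """Return the API key from the environment, or ask for it interactively.

    `env` and `prompt` are injectable so the function can be tried
    without touching the real environment or blocking on input.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # getpass hides the typed value, so the key never appears in cell output.
        key = prompt("Enter your OpenAI API key: ").strip()
    if not key:
        raise ValueError("No OpenAI API key provided.")
    return key

# Example with a stand-in environment (the value is a placeholder, not a real key).
demo_key = load_api_key(env={"OPENAI_API_KEY": "your-api-key-here"})
print(len(demo_key) > 0)
```

In the notebook, `openai_api_key = load_api_key()` could then replace the hardcoded assignment before `ChatOpenAI(...)` is constructed.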

## **Step 4: Function to Set Up the Dataset Context for the LLM**



In [9]:
def context_setting(cleansedVgsales):
    # Get all column names from the dataset
    column_names_string = ', '.join(cleansedVgsales.columns)

    # Building a prompt template
    context_explanation_prompt = (
        "You are an AI assistant trained to generate clear and concise explanations for dataset columns. "
        "Your task is to describe the purpose of each column in a given dataset. For each column, provide a short, direct explanation of what the column represents. "
        "Avoid unnecessary information — focus only on what each column contains or measures.\n\n"
        "Please follow this exact format:\n"
        "1.) column_name: [Brief explanation of the column's purpose]\n"
        "2.) column_name: [Brief explanation of the column's purpose]\n\n"
        "Here are the column names in the dataset:\n"
        f"{column_names_string}\n\n"
        "Generate the explanations now."
    )

    # Use GPT-3.5-Turbo to generate column explanations; .content holds the reply text
    context_explanation = llm.invoke(context_explanation_prompt).content

    # Create a final prompt for AI to understand the dataset's context
    context_explanation_visual = (
        "You are a data analyst working with a dataset. The dataset contains the following columns and their explanations:\n"
        f"{context_explanation}\n\n"
        "Based on this dataset, you will be asked questions about data analysis and visualization. "
        "Your task is to respond by generating clean, efficient Python code that uses the 'df' DataFrame. "
        "The code should focus on data analysis and visualizations, without additional explanations — let the code speak for itself."
    )

    # Confirm the AI has the correct context
    print("✅ Dataset context is set. Ready for data analysis and visualization tasks!")
    return context_explanation_visual
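
One detail is easy to miss in this two-stage flow: a LangChain chat model's `invoke` returns a message object, and the reply text lives in its `.content` attribute. The sketch below mimics the flow with a stand-in model so it runs without an API key. `FakeMessage`, `FakeLLM`, `build_context`, and the column names are all invented for illustration; only the `.invoke(...)` / `.content` shape mirrors the real chat model.

```python
from dataclasses import dataclass

@dataclass
class FakeMessage:
    """Stand-in for the message object a chat model returns."""
    content: str

class FakeLLM:
    """Stand-in model: 'explains' each column named after the marker line."""
    def invoke(self, prompt):
        # The column list sits on the line after this marker (see build_context).
        columns_line = prompt.split("Columns:\n", 1)[1].splitlines()[0]
        columns = [name.strip() for name in columns_line.split(",")]
        lines = [f"{i}.) {name}: description of {name}"
                 for i, name in enumerate(columns, start=1)]
        return FakeMessage(content="\n".join(lines))

def build_context(columns, llm):
    """Stage 1: ask for column explanations. Stage 2: embed them in a system prompt."""
    stage_one_prompt = (
        "Describe each column briefly.\n"
        "Columns:\n"
        f"{', '.join(columns)}\n"
    )
    # Use .content so only the text, not the object's repr, ends up in the prompt.
    explanations = llm.invoke(stage_one_prompt).content
    return (
        "You are a data analyst. The dataset has these columns:\n"
        f"{explanations}\n\n"
        "Respond with Python code that uses the 'df' DataFrame."
    )

# Hypothetical column names for a video game sales table.
context = build_context(["Name", "Platform", "Global_Sales"], FakeLLM())
print(context)
```

Interpolating the message object itself, rather than its `.content`, would push metadata from the object's repr into the second prompt.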


## **Step 5: Creating Custom Functions**


In [10]:
# ==================================================
# 📌 Load and Connect Data to SQL
# ==================================================

# Function to write the cleansed dataframe to a database
def load_data_to_sql(df, db_path):
    engine = create_engine(db_path)
    df.to_sql("Vgsales", engine, if_exists="replace", index=False)
    engine.dispose()

# Initialize SQL connection
def establish_sql_connection(db_uri):
    return SQLDatabase.from_uri(db_uri)

# ==================================================
# 📌 Set Up SQL Agent
# ==================================================

def create_sql_agent_from_db(db, llm):
    # Creating SQL agent directly with db without toolkit
    agent = create_sql_agent(
        llm=llm,
        db=db,  # Provide the database directly here
        agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    )
    return agent

# ==================================================
# 📌 Run Query with Context
# ==================================================

def run_query(agent, question):
    prompt = f"Look at each column and answer the following question: {question}"
    # invoke() returns a dict; the agent's final answer is under "output"
    return agent.invoke({"input": prompt})["output"]

# ==================================================
# 📌 Function to Clean the Output Code
# ==================================================
def extract_python_code(response):
    # Regex to grab code between ```python and ```
    code_blocks = re.findall(r'```python(.*?)```', response, re.DOTALL)
    if code_blocks:
        return code_blocks[0].strip()  # Return the first code block, stripped of whitespace
    return response.strip()  # Fallback to plain response if no code block is found
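
To see what `extract_python_code` does with a typical model reply, the example below runs it on a hand-written response string. The function is repeated so the block runs on its own; the sample reply and the `Genre` column are made up, not real model output.

```python
import re

FENCE = "`" * 3  # three backticks, built programmatically to keep this example readable

def extract_python_code(response):
    """Return the first fenced Python block in `response`, else the stripped text."""
    pattern = FENCE + r"python(.*?)" + FENCE
    code_blocks = re.findall(pattern, response, re.DOTALL)
    if code_blocks:
        return code_blocks[0].strip()
    return response.strip()

# A made-up reply: prose, then a fenced block, then more prose.
reply = (
    "Here is the chart you asked for:\n"
    + FENCE + "python\n"
    + "df['Genre'].value_counts().plot(kind='bar')\n"
    + FENCE + "\n"
    + "Let me know if you need anything else."
)

print(extract_python_code(reply))
print(extract_python_code("  plt.show()  "))
```

Because the regex locates the block wherever it appears, the extraction keeps working when the reply has prose before or after the code, which fixed-offset slicing of the string would not.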

## **Step 6: Building the AI powered Chatbot**



In [19]:
# Function to handle the chatbot with both output options
# Note: relies on the notebook-level `df` and `llm` variables
def start_chatbot(db_path, csv_path, openai_api_key, context_explanation_visual):
    # Load the DataFrame into the SQL database
    load_data_to_sql(df, db_path)

    # Establish SQL connection
    db = establish_sql_connection(db_path)

    # Set up the SQL agent
    agent = create_sql_agent_from_db(db, llm)

    # Chatbot message area (output)
    chat_display = widgets.Output()

    # Function to handle the output and generate response
    def handle_message(change):
        user_question = question_input.value
        output_type = output_type_widget.value  # Get the selected output type

        if user_question.lower() in ['exit', 'quit', 'bye']:
            chat_display.clear_output()
            with chat_display:
                print("Goodbye! Have a great day!")
            return

        with chat_display:
            # Echo the user's question above the response
            print(f"You: {user_question}")

            if output_type == 'Visual':
                print("🤖 Generating visual response...")

                # Prepare the user prompt; the system message already carries the dataset context
                prompt = f"Write clean Python code to perform: {user_question}"

                try:
                    # Get AI-generated code from GPT-3.5-turbo
                    messages = [
                        SystemMessage(content=context_explanation_visual),  # Guides AI behavior
                        HumanMessage(content=prompt),  # User query
                    ]

                    # Get the model's response and pull out the fenced Python block
                    response = llm.invoke(messages)
                    clean_code = extract_python_code(response.content)

                    # Normalize curly quotes to straight quotes so string literals stay valid
                    clean_code = clean_code.replace("‘", "'").replace("’", "'")

                    # print(clean_code)

                    # Save and execute the clean code
                    with open("generated_code.py", "w") as f:
                        f.write(clean_code)

                    # Execute the generated code with only the objects it needs in scope
                    exec_globals = {"df": df, "plt": plt, "pd": pd, "sns": sns}
                    exec(clean_code, exec_globals)

                    # Show the plot
                    plt.show()

                except SyntaxError as se:
                    print(f"❌ Syntax Error in generated code: {se}")
                except Exception as e:
                    print(f"❌ Error executing the code: {e}")

            else:  # For Text response, run SQL query using OpenAI-based agent
                result = run_query(agent, user_question)
                print(f"🤖 {result}")

        # Clear input after sending the message
        question_input.value = ""

    # Text input widget
    question_input = widgets.Text(
        description="You: ",
        placeholder="Ask a question...",
        layout=widgets.Layout(width="80%")
    )

    # Buttons for selecting Text or Visual response
    output_type_widget = widgets.ToggleButtons(
        options=['Text', 'Visual'],
        description='Output Type:',
        style={'description_width': 'initial'}
    )

    # Send button to submit the question
    send_button = widgets.Button(
        description="Send",
        button_style="primary"
    )
    send_button.on_click(handle_message)

    # Display widgets
    display(widgets.VBox([question_input, output_type_widget, send_button, chat_display]))

# Upload file(s)
print("🤖 Please upload a clean dataset file")
uploaded = files.upload()

# List the uploaded files
for filename in uploaded.keys():
    print(f"File Uploaded: {filename}")

# Use the uploaded filename directly
file_name = list(uploaded.keys())[0]  # Automatically use the first uploaded file name
cleansedVgsales = pd.read_csv(file_name)  # Read the CSV from the uploaded file

# Assign the dataframe to 'df' for processing
df = cleansedVgsales

# Generate context based on the dataset
context_explanation_visual = context_setting(cleansedVgsales)  # Assuming context_setting is defined

# Running the chatbot
start_chatbot("sqlite:///./Vgsales.db", file_name, openai_api_key, context_explanation_visual)


🤖 Please upload a clean dataset file


Saving CleansedVgsalesv2.csv to CleansedVgsalesv2.csv
File Uploaded: CleansedVgsalesv2.csv
✅ Dataset context is set. Ready for data analysis and visualization tasks!


VBox(children=(Text(value='', description='You: ', layout=Layout(width='80%'), placeholder='Ask a question...'…
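
The Visual branch boils down to three steps: take the code string, check that it parses, then run it in a namespace that exposes only the objects the code should see. The sketch below isolates that logic so it can be tried without widgets or a model. `run_generated_code` is a hypothetical helper, and a plain list stands in for the DataFrame. Limiting the namespace keeps generated code from touching unrelated notebook variables by accident; it is not a sandbox against malicious code.

```python
def run_generated_code(code, namespace):
    """Compile and execute `code`; return (ok, message) instead of raising."""
    try:
        # compile() surfaces syntax errors before anything runs.
        compiled = compile(code, "<generated>", "exec")
    except SyntaxError as err:
        return False, f"Syntax error in generated code: {err.msg} (line {err.lineno})"
    try:
        exec(compiled, namespace)
    except Exception as err:
        return False, f"Error executing the code: {type(err).__name__}: {err}"
    return True, "ok"

# A list stands in for the DataFrame that the notebook passes as 'df'.
namespace = {"df": [3, 1, 2]}

print(run_generated_code("result = sorted(df)", namespace))
print(namespace["result"])
print(run_generated_code("result = sorted(df", namespace))  # unbalanced parenthesis
print(run_generated_code("result = missing_name + 1", namespace))
```

Returning a status tuple instead of printing inside the handler makes the behavior easy to check in isolation, and the widget callback can decide how to display the message.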

## **Conclusion**

This project shows how an LLM can be combined with data analysis and visualization tools to build a chatbot that answers questions with SQL queries and generates charts with Python. The chatbot works on an uploaded dataset and is driven through a small widget-based interface in the notebook. With LangChain handling the model and the SQL agent, and Matplotlib rendering the plots, it can both answer questions about the dataset and visualize it. 📊🤖

The key points of the project are as follows:

1. **Data Upload and Processing**: Upload a dataset, import it into an SQL database, and establish the context for analysis. 📂🔄
2. **AI-driven Analysis**: Use AI models to parse user queries and interact with the database to provide text-based responses or Python code for visualization. 🤖💬
3. **Dynamic Output Choices**: Users can choose to request either text-based responses or visualizations, offering a dynamic and interactive interface. 🔄✨
4. **Query Execution**: The SQL agent writes and runs the queries against the database, while the generated plotting code runs against the `df` DataFrame with pandas and Matplotlib. ⚙️💻

Overall, this project shows how AI can automate routine data analysis tasks while keeping the interface simple for the end user. It also leaves room for future improvements, such as supporting other kinds of data and more sophisticated queries. 📈🚀
