# Interactive Data Queries with AI: Leveraging OpenAI for CSV Data Insights

## Introduction

Extracting insights from a dataset usually means writing analysis code. This Jupyter notebook takes a different route: it uses OpenAI's chat completion API so that you can ask questions about CSV data in plain language and get answers back, without writing the analysis code yourself.


In [None]:
#| default_exp Ask_your_dataset

## Overview of the Code

The code reads one or more CSV files, converts the combined data to text, and sends that text to OpenAI's API together with a question about the data. It is organized into classes, each of which encapsulates one piece of functionality.

## Main Components

### CSVDataReader Class

This class reads one or more CSV files with pandas, concatenates them into a single DataFrame, and returns the result as a string.
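
The read-and-concatenate step can be sketched on its own as follows. This is a minimal illustration, assuming the inputs are anything `pandas.read_csv` accepts (paths or file-like objects). The helper name `csv_files_to_text` and the in-memory sample data are hypothetical stand-ins, not part of the notebook.

```python
import io

import pandas as pd


def csv_files_to_text(files):
    """Read each CSV, stack the rows, and return the table as plain text."""
    frames = [pd.read_csv(f) for f in files]
    combined = pd.concat(frames, ignore_index=True)
    return combined.to_string(index=False)


# Two small in-memory CSVs stand in for uploaded files (made-up data).
first = io.StringIO("name,score\nAda,90\nLin,85\n")
second = io.StringIO("name,score\nSam,70\n")

table_text = csv_files_to_text([first, second])
print(table_text)
```

The files are assumed to share the same columns; `pd.concat` fills any column missing from one file with `NaN`.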

### OpenAIChatClient Class

This class manages the interaction with the OpenAI API. It sends the processed data from the CSV files along with a user's question to the API and then receives the answer.
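
How the data and the question are combined into one request can be sketched without calling the API. The function name `build_messages` and the shortened system prompt are illustrative assumptions. The two-message layout (data in the system message, question in the user message) mirrors the class in this notebook.

```python
def build_messages(csv_text, question):
    """Return the chat payload: data in the system message, question in the user message."""
    system_message = (
        "You are an assistant skilled in data analysis. "
        "Here is the data you need to work with:\n" + csv_text
    )
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": question},
    ]


# Made-up table text and question, for illustration only.
messages = build_messages("name score\n Ada    90", "How many rows are there?")
for message in messages:
    print(message["role"], "->", len(message["content"]), "characters")
```

Because the entire table travels inside the system message, the size of the request grows with the size of the CSV data.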

### Main Execution Block

This is where the program starts executing. It creates instances of the above classes and uses them to read CSV data and get answers from the OpenAI API.
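
The overall flow can be tried offline with a stand-in client. In this sketch, `EchoClient` and `ask_all` are hypothetical names: `EchoClient` imitates the `query_gpt(csv_data, user_question)` interface but makes no API call, and the sample data is made up.

```python
import io

import pandas as pd


class EchoClient:
    """Hypothetical offline stand-in for OpenAIChatClient; makes no API call."""

    def query_gpt(self, csv_data, user_question):
        line_count = len(csv_data.splitlines())
        return f"(stub) received {line_count} lines of data for: {user_question}"


def ask_all(client, csv_text, questions):
    """Send each question with the same data and collect (question, answer) pairs."""
    return [(q, client.query_gpt(csv_text, q)) for q in questions]


# Made-up sample data in place of a real CSV file.
df = pd.read_csv(io.StringIO("item,qty\napple,3\npear,5\n"))
csv_text = df.to_string(index=False)

results = ask_all(EchoClient(), csv_text, ["What is in the CSV file?"])
for question, answer in results:
    print(f"Question: {question}\nAnswer: {answer}\n")
```

Swapping `EchoClient()` for a real client with the same `query_gpt` method would send the questions to the API instead.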

In [None]:
#| export
import openai
import pandas as pd

# CSVDataReader class, adapted to accept files from Streamlit's file uploader
class CSVDataReader:
    """
    Reads and processes one or more CSV files, such as those returned by
    Streamlit's file uploader. Anything pandas.read_csv accepts will work.
    """
    def read_data(self, uploaded_files):
        """
        Reads and concatenates multiple uploaded CSV files into a single DataFrame.
        Args:
            uploaded_files: List of uploaded files.
        Returns:
            str: String representation of the concatenated CSV data.
        """
        dataframes = [pd.read_csv(file) for file in uploaded_files]
        concatenated_df = pd.concat(dataframes, ignore_index=True)
        return concatenated_df.to_string(index=False)
    
# OpenAIChatClient class
class OpenAIChatClient:
    """
    This class encapsulates the interaction with OpenAI's Chat API.
    """
    def __init__(self, api_key):
        """
        Initializes the OpenAI client with the provided API key.
        Args:
            api_key (str): The OpenAI API key.
        """
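        # Note: this uses the legacy interface of the openai package
        # (versions before 1.0), where ChatCompletion is available.
        openai.api_key = api_key
        self.client = openai.ChatCompletion()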

    def query_gpt(self, csv_data, user_question):
        """
        Sends a query to the OpenAI API and gets a response.
        Args:
            csv_data (str): The CSV data in string format.
            user_question (str): The user's question about the data.
        Returns:
            str: The response from the OpenAI API.
        """
        system_message = ("You are a highly capable assistant skilled in data analysis. "
                          "You can interpret CSV data, perform calculations, summarize information, "
                          "answer specific queries, and even create visual representations if needed. "
                          "Here is the data you need to work with:\n" + csv_data)
        api_response = self.client.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": system_message},
                {"role": "user", "content": user_question}
            ])
        return api_response.choices[0].message['content']

In [None]:
# Main execution (example)
# if __name__ == "__main__":
#     import os

#     # Define CSV file paths
#     file_paths = ['../data/sampledata.csv']

#     # Read and concatenate the CSV data into a single string
#     data_reader = CSVDataReader()
#     csv_data_str = data_reader.read_data(file_paths)

#     # Create an instance of OpenAIChatClient
#     # (assumes the API key is stored in the OPENAI_API_KEY environment variable)
#     chat_client = OpenAIChatClient(os.environ["OPENAI_API_KEY"])

#     # Sample questions for demonstration
#     questions = [
#         "What is in the CSV file?",
#         "How many rows are there in the CSV file?",
#         "How many columns are there in the CSV file?",
#     ]

#     # Ask questions and print responses
#     for question in questions:
#         answer = chat_client.query_gpt(csv_data_str, question)
#         print(f"Question: {question}\nAnswer: {answer}\n")


### Commentary

Data Processing: The reader loads the CSV file with pandas and converts it to text, which covers the basic data processing this workflow needs.

AI Response: The AI provides a concise summary of the CSV file's contents, identifying the columns and their context. This indicates that the model understood the structure of the data.

Accuracy: The AI interpreted the data accurately in this example, which is useful for quick insights.

Practical Use: This example shows that the system can be a useful tool for rapidly extracting information from data files without detailed manual analysis.

Overall, the system works well for straightforward data queries and offers a quick way to gain insights from CSV files.
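
Answers to structural questions, such as the row and column counts asked above, can be cross-checked locally with pandas. The following is a small sketch of such a check, not part of the notebook's pipeline. The sample data is made up and stands in for the real CSV file.

```python
import io

import pandas as pd

# Made-up sample standing in for the notebook's CSV file.
df = pd.read_csv(io.StringIO("item,qty\napple,3\npear,5\nplum,2\n"))

# Values computed directly from the DataFrame, to compare against the AI's answers.
ground_truth = {
    "rows": len(df),
    "columns": len(df.columns),
    "column_names": list(df.columns),
}
print(ground_truth)
```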

### Conclusion

This notebook demonstrates how OpenAI's chat model can be combined with CSV data to query datasets interactively in plain language. The approach makes basic data analysis accessible to people without extensive coding or data science backgrounds, and it could be applied in any domain where quick data insights are valuable.