In [6]:
#| default_exp askDataset

# AI-Assisted Data Analysis Notebook
This notebook is designed for AI-assisted data labeling and querying using OpenAI's APIs. It demonstrates modular programming practices, AI-assisted labeling, and dataset querying functionalities. 

In [2]:
#| export
import requests
import openai
from openai import OpenAI
import pandas as pd

**Explanation**: The `#| default_exp` and `#| export` directives set up the notebook for nbdev, a system that turns Jupyter notebooks into Python modules. The import cell loads the libraries the module needs: `requests` for making HTTP requests (here, to the OpenAI API), `openai` for OpenAI's Python client, and `pandas` for data manipulation.

# Ask Dataset Class

- Initialization (`__init__`): Accepts a data file, attempts to read it as a CSV using pandas, and stores the resulting DataFrame in an instance variable.

- `dataset_string` method: Converts the DataFrame (from the CSV file) to a string.

- `ask_chatgpt` method: Sends a query to the OpenAI ChatGPT model, along with the dataset, and returns the model's response. This is the core functionality: it uses the OpenAI API to analyze the dataset and answer questions about it.
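
As a minimal sketch of the first two steps, the snippet below loads a small CSV, renders it the way `dataset_string` does, and assembles the user prompt with the same template as `ask_chatgpt`. The inline CSV text and its `Text` column are made up for illustration, and `io.StringIO` stands in for a file on disk. No API call is made.

```python
import io

import pandas as pd

# Hypothetical CSV content; stands in for a real data file.
csv_text = "Text,Label\nhello,greeting\nbye,farewell\n"

# pd.read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Same call that dataset_string makes: plain text, no index column.
table = df.to_string(index=False)

# Same prompt template that ask_chatgpt uses for the user message.
question = "How many rows are there?"
prompt = f"Based on {table} solve the Question: {question}"
print(prompt)
```

Printing the prompt before sending it is a quick way to see exactly what the model receives, including how wide tables are laid out as text.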


In [3]:
#| export
import os


class AskDataset:
    def __init__(self, data_file):
        try:
            # Read the uploaded file and store the DataFrame for later use
            self.data_csv = pd.read_csv(data_file)
        except Exception as e:
            # Re-raise with the file name, keeping the original error as the cause
            raise Exception(f"Failed to initialize AskDataset with file: {data_file}. Error: {e}") from e

    def dataset_string(self):
        # Convert the DataFrame to a string and return
        return self.data_csv.to_string(index=False)

    def ask_chatgpt(self, question):
        try:
            # openai_api_key is not defined anywhere in this notebook. As an
            # assumption, read it from the OPENAI_API_KEY environment variable.
            openai_api_key = os.environ.get("OPENAI_API_KEY")
            if not openai_api_key:
                return "OPENAI_API_KEY is not set."

            # API endpoint for ChatGPT
            url = "https://api.openai.com/v1/chat/completions"

            # Headers including the Authorization with your API key
            headers = {
                "Content-Type": "application/json",
                "Authorization": f"Bearer {openai_api_key}"
            }

            # Data payload for the request
            data_payload = {
                "model": "gpt-3.5-turbo",
                "messages": [
                    {"role": "system", "content": "You are a problem solver. Go through the dataset to find the answer."},
                    {"role": "user", "content": f"Based on {self.dataset_string()} solve the Question: {question}"}
                ]
            }

            response = requests.post(url, headers=headers, json=data_payload, timeout=10)
            response.raise_for_status()

            response_json = response.json()

            if 'choices' in response_json and response_json['choices']:
                message = response_json['choices'][0]['message']
                if isinstance(message, str):
                    return message.strip()
                elif isinstance(message, dict) and 'content' in message:
                    # Extracting content if the message is a dictionary
                    return message['content'].strip()
                else:
                    return "Received unexpected response type: " + str(message)
            else:
                return "No choices in response or empty response: " + str(response_json)

        except requests.exceptions.Timeout:
            return "Request timed out. Please try again."
        except requests.exceptions.HTTPError as err:
            return f"HTTP error occurred: {err}"
        except Exception as e:
            return f"An unexpected error occurred: {e}"
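

The response handling in `ask_chatgpt` can be exercised without a network call. The sketch below pulls the answer text out of a dictionary shaped like a Chat Completions response. `extract_answer` and the sample payload are hypothetical, introduced only for illustration; they are not part of the notebook.

```python
from typing import Any, Dict, Optional


def extract_answer(response_json: Dict[str, Any]) -> Optional[str]:
    """Return the first choice's message text, or None if it is missing."""
    choices = response_json.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message")
    if isinstance(message, dict) and isinstance(message.get("content"), str):
        return message["content"].strip()
    return None


# Hand-written sample in the shape the class's parsing code expects.
sample = {"choices": [{"message": {"role": "assistant", "content": " 30 rows \n"}}]}
print(extract_answer(sample))
print(extract_answer({"choices": []}))
```

Returning `None` instead of an error string is one possible design choice: it lets a caller tell a missing answer apart from a real one, whereas `ask_chatgpt` as written returns a string in both cases.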
        





# Usage Example

In [6]:
ask_dataset = AskDataset('/workspaces/ai-assisted-coding_panther/231-3.csv')

response = ask_dataset.ask_chatgpt('how many rows are NaN?')
print(response)

To solve the question, we need to count the number of rows that have NaN values in the "Label" column. Based on the provided conversation, the rows with NaN values are:

1. Rows with index [0, 3, 4, 6, 10, 12, 14, 16, 20, 22, 23, 24, 28, 29, 31, 33, 34, 36, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 49, 50]

There are a total of 30 rows with NaN values in the "Label" column.


**Explanation**: In this example, the `AskDataset` class is instantiated with a CSV file, and a question about that data is then sent to the ChatGPT model. The output is the model's answer, generated from the dataset text included in the prompt.

This approach shows how the `AskDataset` class can use OpenAI's GPT model to analyze a dataset and answer questions about it without manually writing data analysis code. Because the entire dataset is sent as text in the prompt, the file has to fit within the model's context window.
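
Since the answer is generated text rather than a computed result, a count like the one in the example can be cross-checked directly in pandas. This is a minimal sketch, assuming a `Label` column as named in the model's answer; the small DataFrame is made-up data standing in for the real CSV, which is not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV.
df = pd.DataFrame(
    {
        "Text": ["a", "b", "c", "d"],
        "Label": ["x", np.nan, "y", np.nan],
    }
)

# Boolean mask of rows where Label is missing.
missing_label = df["Label"].isna()
print(int(missing_label.sum()))          # number of rows with NaN in Label
print(df.index[missing_label].tolist())  # index values of those rows

# Rows with a NaN in any column.
print(int(df.isna().any(axis=1).sum()))
```

With the real file, replacing the stand-in DataFrame with `pd.read_csv(...)` on the same path would give a count to compare against the model's reply.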