# **Welcome to The Notebook**

### Task 1 - Setting up the project environment

Installing the needed modules

In [None]:
!pip install openai==0.28 python-dotenv

Importing the modules

In [None]:
import pandas as pd
import numpy as np
import os
import openai
from dotenv import load_dotenv
import json
import re
from google.colab import files

Setting up the OpenAI API:

1. Prepare a `.env` file to store the OpenAI API key.
2. Upload the `.env` file to the Colab environment.
3. Load the API key and set up the API.

Upload your `.env` file here. The loading cell below expects it to be named `apikey.env.txt`.

In [None]:
uploaded = files.upload()

Now let's load environment variables and get the API key

In [None]:
load_dotenv(dotenv_path="apikey.env.txt")

APIKEY = os.getenv("APIKEY")
ORGID = os.getenv("ORGID")
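
If the upload failed or the file name does not match, `os.getenv` quietly returns `None` and the problem only surfaces later as an authentication error. A minimal sketch of an early check, using only the standard library; `require_env` is a hypothetical helper, not part of `python-dotenv`:

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Hypothetical guard: stop here instead of failing later inside the API call.
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check the uploaded .env file."
        )
    return value
```

With this in place, `APIKEY = require_env("APIKEY")` could stand in for the plain `os.getenv` call.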

Let's set up our OpenAI API

In [None]:
openai.organization = ORGID
openai.api_key = APIKEY

### Task 2 - Craft Prompts to Communicate with the API

To communicate with the API we need to learn how to craft a prompt.

A prompt object contains two elements:
1. Role: Specifies the communicator's role: `user`, `system`, or `assistant` (lowercase, as the API expects).
2. Content: Contains the text of the communication

Example:
`prompt = {'role': 'user', 'content': 'What is the capital of Italy?'}`

Different Roles:

- **User**: Initiates the conversation, asks questions, and gives instructions to the AI model.
- **System**: Sets the initial context or instructions for the conversation, guiding the AI's behavior.
- **Assistant**: Generates responses based on the user's queries and the context provided by the system, acting as the AI model's replies.

User initiates the conversation, system provides context, and assistant generates responses.
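
The message structure can be sketched with plain dictionaries. The helper below is hypothetical (it is not part of the `openai` package); it only validates the role before building the message, and the resulting list has the shape passed as `messages` in the cells that follow.

```python
VALID_ROLES = {"system", "user", "assistant"}


def make_message(role, content):
    """Build one chat message dict, rejecting unknown roles early."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    return {"role": role, "content": content}


# The system message conventionally comes first, followed by the user message.
messages = [
    make_message("system", "You are a helpful geography tutor."),
    make_message("user", "What is the capital of Italy?"),
]
print(messages)
```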


In [None]:
user_prompt = {'role': 'user', 'content': 'Name the main four ingredients of Beef Wellington.'}
system_prompt = {'role': 'system', 'content': 'You are a professional chef. Provide the user with the vegetarian version of any recipe, and avoid extra elaboration.'}

response = openai.ChatCompletion.create(
    model='gpt-4',
    messages=[system_prompt, user_prompt],
    max_tokens=500
)

content = response.choices[0].message.content
print("Content: ", content)

### Task 3 - Generate and Execute python code

Define a function to generate a chat response using the OpenAI GPT-4 model, given system and user messages as input.

In [None]:
def generate_chat_response(system_content, user_content):
    # Create two message dictionaries, one for the system and one for the user.
    system = {'role': 'system', 'content': system_content}
    user = {'role': 'user', 'content': user_content}

    # Use OpenAI's ChatCompletion API to generate a response.
    response = openai.ChatCompletion.create(
        model='gpt-4',
        messages=[system, user],  # List of messages (system and user)
        max_tokens=1200  # Set a limit on the number of tokens in the response
    )

    # Return the generated response.
    return response

Let's craft a prompt to generate a simple python method

In [None]:
system_content = """
You are a python code generator. The user provides a task or a problem and you will generate a python code to solve the problem.
Please ensure that the code is correct and executable, and return only the code inside triple backticks(```)
"""

user_content = "Generate a python method called check_number that gets a number and returns whether the number is even or odd."

response = generate_chat_response(system_content, user_content)
content = response.choices[0].message.content
print("Content: ", content)

Let's extract the code from the prompt response

In [None]:
def extract_code(response_content):
    # Match text between triple backticks (```), skipping an optional "python" tag after the opening fence
    pattern = r'```(?:python)?(.*?)```'

    # Use re.findall to find all non-overlapping matches of the pattern in the input string
    matches = re.findall(pattern, response_content, re.DOTALL)

    # Fail with a clear message if the response contains no fenced code
    if not matches:
        raise ValueError("No code block found in the response")

    # Return the first match found (only the leading tag is dropped, not every "python" in the code)
    return matches[0]


code = extract_code(content)
print("Code: ", code)
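
Model replies often put a `python` language tag right after the opening fence, and handling it by removing the word everywhere would also alter identifiers or strings that contain it. The sketch below is one possible variant, not the notebook's function: it drops only an optional tag directly after the opening fence and returns `None` when no fenced block is found. `extract_first_code_block` and the sample reply are made up for illustration.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example stays readable

# Optional "python" tag after the opening fence, then the code, then the closing fence.
CODE_BLOCK = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_first_code_block(text):
    """Return the first fenced code block in text, or None if there is none."""
    match = CODE_BLOCK.search(text)
    return match.group(1) if match else None


# Made-up reply standing in for the model output.
sample = f"Here you go:\n{FENCE}python\ndef is_python_fan():\n    return True\n{FENCE}"
print(extract_first_code_block(sample))
```

Note that the function name `is_python_fan` survives intact, because only the tag next to the fence is skipped.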

Now let's execute the generated code and use it

In [None]:
exec(code)
check_number(5)
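
`exec` runs whatever text the model returned, so it is worth reading the printed code before executing it. One way to keep generated definitions out of the notebook's global namespace is to give `exec` its own dictionary. A minimal sketch; the `generated` string is a hand-written stand-in for a model reply, and the name `check_number` is assumed to match what the model produced:

```python
# Hand-written stand-in for code returned by the model.
generated = '''
def check_number(n):
    return "even" if n % 2 == 0 else "odd"
'''

namespace = {}  # definitions created by exec land here, not in globals()
exec(generated, namespace)

check_number = namespace.get("check_number")
if check_number is None:
    raise NameError("The generated code did not define check_number")

print(check_number(5))  # prints "odd"
```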

### Task 4 - Generate Python Code for Data Preparation

Defining `generate_code` helper method

In [None]:
def generate_code(system_content, user_content):
    # Create two message dictionaries, one for the system and one for the user.
    system = {'role': 'system', 'content': system_content}
    user = {'role': 'user', 'content': user_content}

    # Use OpenAI's ChatCompletion API to generate a response.
    response = openai.ChatCompletion.create(
        model='gpt-4',
        messages=[system, user],  # List of messages (system and user)
        max_tokens=1200  # Set a limit on the number of tokens in the response
    )

    # Extract the response content
    response_content = response.choices[0].message.content

    # Match text between triple backticks (```), skipping an optional "python" tag after the opening fence
    pattern = r'```(?:python)?(.*?)```'

    # Use re.findall to find all non-overlapping matches of the pattern in the input string
    matches = re.findall(pattern, response_content, re.DOTALL)

    # Fail with a clear message if the response contains no fenced code
    if not matches:
        raise ValueError("No code block found in the response")

    # Return the first match found (only the leading tag is dropped, not every "python" in the code)
    return matches[0]

Let's load the product sales dataset

In [None]:
sales_data = pd.read_csv("product_sales_dataset.csv")
sales_data.head()

Let's check the data types in our dataset

In [None]:
sales_data.dtypes

**Some Data Preparation**
* Extract month name from the `date` column
* Calculate profit per product


In [None]:
system_content = """
You are a python code generator. The user provides a task or a problem and you will generate a python code to solve the problem.
Please ensure that the code is correct and executable, and return only the code inside triple backticks(```)
"""

user_content = """
Return a python method called get_month_names that gets a pandas series containing dates stored as strings
and returns a pandas series containing the month name of each date.
"""

code = generate_code(system_content, user_content)
print("Code: ", code)

Let's execute this code

In [None]:
exec(code)
#get_month_names(sales_data["date"])
sales_data["month"] = get_month_names(sales_data["date"])
sales_data.head()
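
Because the generated function differs between runs, an independent reference helps when checking its output. A small sketch using only the standard library, assuming the dates are ISO `YYYY-MM-DD` strings (the actual format in the dataset is not shown here); `month_name_reference` is a hypothetical name:

```python
import calendar
from datetime import date


def month_name_reference(date_strings):
    """Return the month name for each ISO-formatted date string."""
    # calendar.month_name is locale-aware; in the default C locale it is English.
    return [calendar.month_name[date.fromisoformat(s).month] for s in date_strings]


print(month_name_reference(["2023-01-15", "2023-07-04"]))
```

Comparing a few rows of `sales_data["month"]` against this reference is a quick sanity check on the generated code.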

Let's calculate profit per product

In [None]:
user_content = """
Return a python method called calculate_product_profit that gets a dataframe and the names of the product cost, product price, and items sold columns as input.
It calculates the profit per product and appends it to the dataframe as a new column called 'product_profit'.
Then it returns the dataframe.
"""

code = generate_code(system_content, user_content)
print("Code: ", code)

Now let's execute this code and use it.

In [None]:
exec(code)
sales_data = calculate_product_profit(sales_data, "product_cost", "product_price", "items_sold")
sales_data.head()
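
The prompt leaves the definition of profit to the model. One plausible reading, stated here as an assumption rather than the notebook's definition, is `(product_price - product_cost) * items_sold`. A hand-checkable sketch on made-up rows:

```python
def profit_reference(product_cost, product_price, items_sold):
    """Assumed definition: margin per item multiplied by the number of items sold."""
    return (product_price - product_cost) * items_sold


# Made-up rows: (product_cost, product_price, items_sold)
rows = [(4.0, 10.0, 3), (2.5, 2.0, 10)]
profits = [profit_reference(*row) for row in rows]
print(profits)  # prints [18.0, -5.0]
```

If the generated `product_profit` column disagrees with this on a few rows, the model has probably chosen a different definition, and the prompt should state the formula explicitly.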

### Task 5 - Generate Python Code for Data Visualization

Defining some helper functions

In [None]:
def generate_chat_response(system_content, user_content):
    # Create two message dictionaries, one for the system and one for the user.
    system = {'role': 'system', 'content': system_content}
    user = {'role': 'user', 'content': user_content}

    # Use OpenAI's ChatCompletion API to generate a response.
    response = openai.ChatCompletion.create(
        model='gpt-4',
        messages=[system, user],  # List of messages (system and user)
        max_tokens=1200  # Set a limit on the number of tokens in the response
    )

    # Return the generated response.
    return response


def extract_code(response_content):
    # Match text between triple backticks (```), skipping an optional "python" tag after the opening fence
    pattern = r'```(?:python)?(.*?)```'

    # Use re.findall to find all non-overlapping matches of the pattern in the input string
    matches = re.findall(pattern, response_content, re.DOTALL)

    # Fail with a clear message if the response contains no fenced code
    if not matches:
        raise ValueError("No code block found in the response")

    # Return the first match found (only the leading tag is dropped, not every "python" in the code)
    return matches[0]


def generate_code_and_execute(user_content, execute=True):
    # Defining system content for code generation
    system_content = """
    You are a python code generator. You know pandas. You answer every question with Python code.
    You return python code wrapped in ``` delimiter. Import any needed python module. And you don't provide any elaborations.
    """

    # generate chat response
    response = generate_chat_response(system_content, user_content)

    # extract response content
    response_content = response.choices[0].message.content

    # let's extract the code from the response content
    code = extract_code(response_content)

    if execute:
        exec(code, globals())  # Execute the generated Python code if execute is set to True

    return code


def update_code(code, user_content, execute = True):
    # Defining system content for code update
    system_content = f"""
    You are a python code generator. You know pandas. You are given the following python method: {code}. Update the code based on the user content. Do not change the method name.
    You return the updated python code wrapped in ``` delimiter. And you don't provide any elaborations.
    """
    # generate chat response
    response_content = generate_chat_response(system_content, user_content).choices[0].message.content

    # extracting the code
    new_code = extract_code(response_content)

    if execute:
        exec(new_code, globals())  # Execute the generated Python code if execute is set to True

    return new_code
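
Both helpers call `exec` and then rely on the generated code defining a function with the requested name. A sketch of a pre-check using the standard `ast` module, which parses the code without running it; `defined_function_names` is a hypothetical helper and the `sample` string is hand-written:

```python
import ast


def defined_function_names(code):
    """Return the names of top-level functions defined in a code string."""
    tree = ast.parse(code)  # raises SyntaxError if the code is not valid Python
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]


sample = "import math\n\ndef plot_average_profit_per_day(df):\n    return None\n"
print(defined_function_names(sample))  # prints ['plot_average_profit_per_day']
```

Checking that the expected name appears in this list before calling the function turns a confusing `NameError` into an early, explicit failure.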

Let's check our data again

In [None]:
sales_data


**Question 1** - How does the daily average profit change over time?

Steps to answer this question:

1. Aggregate our data to get the average `product_profit` per day.
2. Draw a line chart to visualize the data
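
Step 1 is a group-by followed by a mean. To make the expected numbers concrete, here is a plain-Python sketch on made-up rows; the dates and profit values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up (date, product_profit) pairs.
rows = [("2023-01-01", 10.0), ("2023-01-01", 20.0), ("2023-01-02", 6.0)]

profits_by_day = defaultdict(list)
for day, profit in rows:
    profits_by_day[day].append(profit)

# Average profit per day, ordered by date so it can be plotted as a line.
average_per_day = {day: mean(values) for day, values in sorted(profits_by_day.items())}
print(average_per_day)  # prints {'2023-01-01': 15.0, '2023-01-02': 6.0}
```

In pandas the same aggregation is typically written as `df.groupby("date")["product_profit"].mean()`, which is what the generated method is expected to do before plotting.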


In [None]:
user_content = "Return a python method called plot_average_profit_per_day that gets a dataframe, aggregates the dataframe based on the 'date' column and calculates the average 'product_profit'. Then it draws a line chart to visualize how the average daily profit changes over time. Use the seaborn module."

code = generate_code_and_execute(user_content)
print("Code: ", code)

Let's use the generated method

In [None]:
plot_average_profit_per_day(sales_data)

What if we want to update the generated visualization?

In [None]:
new_code = update_code(code, "change the line color to orange. And reduce the line thickness.")

print("New Code: ", new_code)

Running the new code

In [None]:
plot_average_profit_per_day(sales_data)

**Exercise:** Question 2 - Create a bar chart visualization to show the average profit per product category.

In [None]:
user_content = "Return a python method called plot_total_average_per_category that gets a dataframe, aggregates the dataframe by product category, and calculates the average 'product_profit'. Then it draws a bar chart to compare the average profit per product category."

code = generate_code_and_execute(user_content)
print("Code: ", code)

In [None]:
plot_total_average_per_category(sales_data)

Updating the code so that each product category gets a different bar color.

In [None]:
new_code = update_code(code, "Update the code to assign a different color to each product category and make the chart bigger.")

print("New Code: ", new_code)

### Task 6 - Create Visualizations using AI

Let's check out the data again

In [None]:
sales_data

**Question 3** - Which product has the highest total sold items?

In [None]:
user_content = """
Return a python method called plot_sold_items_per_product that gets a dataframe.
First, it aggregates the data based on the 'product_name' column and calculates the sum of the 'items_sold' column.
Then it draws a bar chart to visualize the results.
"""

code = generate_code_and_execute(user_content)
plot_sold_items_per_product(sales_data)
print("Code: ", code)

Let's make it a horizontal bar chart and highlight the product with the highest number of sold items.

In [None]:
new_code = update_code(code, user_content="Make the bar chart horizontal. Highlight the product with the highest number of sold items.")

plot_sold_items_per_product(sales_data)
print("New Code: ", new_code)