# **Welcome to The Notebook**

### Task 1 - Setting up the project environment

Installing the needed modules

In [1]:
# %pip install openai==0.28 python-dotenv

Importing the modules

In [1]:
import pandas as pd
import numpy as np
import os
import openai
from dotenv import load_dotenv
import json
import re

# %pip install google-generativeai
import google.generativeai as genai



Setting up the API keys:

1. Prepare a `.env` file to store the API keys (`GOOGLE_API_KEY` for Gemini and the OpenAI key used in later tasks).
2. Upload the `.env` file to the Colab environment.
3. Load the API keys and set up the APIs.

Upload your `.env` file here

In [2]:
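load_dotenv()  # read KEY=VALUE pairs from the .env file into the environment
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)  # authenticate the google-generativeai client
# print(GOOGLE_API_KEY)  # Debug only; never print secrets in a shared notebook.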

Now let's load environment variables and get the API key

Let's set up our OpenAI API
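If the `.env` file is missing or a key name is misspelled, `os.getenv` silently returns `None`, and the failure only shows up later as a less obvious authentication error. A minimal fail-fast check is sketched below; `require_env` is a hypothetical helper (not part of `python-dotenv`) meant to be called after `load_dotenv()`.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise a clear error.

    Hypothetical helper: call it after load_dotenv() so the .env values are loaded.
    """
    value = os.getenv(name)
    if not value:  # covers both a missing variable and an empty string
        raise RuntimeError(
            f"{name} is not set. Check that your .env file contains a line like {name}=..."
        )
    return value


# Usage in the notebook (after load_dotenv()):
# GOOGLE_API_KEY = require_env('GOOGLE_API_KEY')
```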

### Task 2 - Craft Prompts to Communicate with the API

To communicate with the API, we need to know how to craft a prompt.

A prompt object contains two elements:
1. Role: Specifies who is speaking, one of `user`, `system`, or `assistant`.
2. Content: Contains the text of the message.

Example:
`prompt = {'role': 'user', 'content': 'what is the capital of Italy?'}`

Different Roles:

- **User**: Initiates the conversation, asks questions, and gives instructions to the AI model.
- **System**: Sets the initial context or instructions for the conversation, guiding the AI's behavior.
- **Assistant**: Generates responses based on the user's queries and the context provided by the system, acting as the AI model's replies.

User initiates the conversation, system provides context, and assistant generates responses.
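These roles combine into an ordered list of prompt objects, which is the `messages` format that OpenAI's `ChatCompletion.create` expects (as used in Task 3). Gemini's `generate_content` uses a different shape: to our understanding, each turn carries `role` and `parts`, the assistant role is called `model`, and there is no `system` turn in the list. The sketch below builds a short history and converts it; `make_message` and `to_gemini_contents` are hypothetical helpers, and the Gemini layout is an assumption worth checking against the SDK docs.

```python
def make_message(role, content):
    """Return one prompt object; role must be 'system', 'user' or 'assistant'."""
    allowed = {"system", "user", "assistant"}
    if role not in allowed:
        raise ValueError(f"unknown role: {role!r}")
    return {"role": role, "content": content}


# An OpenAI-style conversation: system context, then alternating user/assistant turns.
messages = [
    make_message("system", "You are a concise geography tutor."),
    make_message("user", "What is the capital of Italy?"),
    make_message("assistant", "Rome."),
    make_message("user", "And of France?"),
]


def to_gemini_contents(messages):
    """Convert OpenAI-style messages into a Gemini-style contents list.

    Assumption: Gemini names the assistant role 'model' and has no 'system' turn,
    so the system text is returned separately.
    """
    system_text = " ".join(m["content"] for m in messages if m["role"] == "system")
    contents = [
        {"role": "model" if m["role"] == "assistant" else "user", "parts": [m["content"]]}
        for m in messages
        if m["role"] != "system"
    ]
    return system_text, contents


system_text, contents = to_gemini_contents(messages)
print(system_text)
print([c["role"] for c in contents])
```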


In [7]:
from PIL import Image

model = genai.GenerativeModel('gemini-pro-vision')
img = Image.open('challenge task/subplots.png')  # forward slash avoids the invalid "\s" escape and works on every OS
response = model.generate_content(["Describe this image", img])
print(response.text)

 The image shows five line charts, each representing the sales of a different category of items over time. The x-axis of each chart is the date, and the y-axis is the number of items sold. The five categories are Art & Crafts, Electronics, Games, Sports & Outdoors, and Toys.

The line chart shows that the sales of Art & Crafts items are the highest, followed by Electronics, Games, Sports & Outdoors, and Toys. The sales of Art & Crafts items are the most consistent throughout the year, while the sales of Toys are the most variable. The sales of Electronics and Games are similar to each other, and the sales of Sports & Outdoors are the lowest.


In [5]:
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content("Write a story about a robot.", stream=True)
for chunk in response:
    print(chunk.text, end='', flush=True)

In the enigmatic depths of a state-of-the-art laboratory, where cutting-edge technology danced in harmony, a creation of extraordinary brilliance was stirring to life. Its metallic chassis shimmered under the glow of iridescent lights, and its intricate circuitry hummed with the promise of unprecedented capabilities. This was RA-01, a marvel of human ingenuity, destined to redefine the boundaries of robotics.

RA-01 possessed an unyielding body, crafted from titanium alloy, impervious to the ravages of time and environment. Its multifaceted sensors scanned its surroundings with unmatched precision, enabling it to perceive the world in ways unimaginable to mere mortals. But it was the robot's cognitive engine that truly set it apart.

Infused with a vast repository of knowledge and a staggering analytical capacity, RA-01 exhibited an uncanny understanding of the human psyche. Its natural language processing abilities allowed it to communicate seamlessly, its synthetic voice effortlessly

In [26]:
embedding_model = 'models/embedding-001'
result = genai.embed_content(
    model=embedding_model,
    content="I'm Tom",
    task_type="retrieval_document"
)
print(result['embedding'])

[0.04047133, -0.011649686, -0.045584477, -0.04300226, 0.07413616, -0.020718645, -0.0029202271, -0.016356498, -0.0031816284, 0.030236812, -0.030348165, 0.0077063073, -0.027783826, 0.013180981, 0.0017774705, -0.025058605, 0.032831006, 0.037285417, 0.04176248, -0.029488526, 0.01184973, 0.01927381, -0.0186322, -0.012131158, 0.044597145, -0.004335161, 0.044522144, -0.034154024, -0.01963716, 0.007113668, -0.04285228, 0.038584493, -0.018372927, 0.027778499, 0.005201795, -0.054620463, -0.012211726, 0.023063896, 0.01748237, 0.0077342736, 0.0070697884, -0.044862293, -0.009603103, 0.028251812, 0.010456886, -0.004836407, -0.03783878, 0.033243455, -0.020023083, -0.019299809, 0.03163989, 0.0070110834, 0.06320513, -0.039913043, -0.00026085018, -0.026784431, 0.036692664, 0.007840465, -0.031190591, 0.015762603, 0.019358214, 0.008844534, 0.051850505, -0.000617393, -0.045531183, -0.062298954, -0.055260137, 0.01040255, 0.020224433, 0.020843754, -0.022485198, -0.045799084, 0.05023066, -0.0041463426, 0.0106

In [30]:
from sklearn.metrics.pairwise import cosine_similarity

# Assuming result['embedding'] is your first document embedding
embedding1 = result['embedding']

# Generate embedding for another document
embedding2 = genai.embed_content(
    model=embedding_model,
    content="Generate Content Response",
    task_type="retrieval_document"
)['embedding']

# Calculate cosine similarity
similarity = cosine_similarity([embedding1], [embedding2])

print(f"Similarity: {similarity[0][0]}")

Similarity: 0.7958216518835025
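`cosine_similarity` compares the direction of two vectors: their dot product divided by the product of their lengths, so 1 means the same direction, 0 means orthogonal and -1 means opposite. The sketch below computes the same quantity directly with NumPy; the toy vectors are made up for illustration and are not real Gemini embeddings.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors: (a . b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy 3-dimensional "embeddings" (illustrative values only).
print(cosine([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal vectors
print(cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same direction, different length

# With the notebook's vectors, cosine(embedding1, embedding2) should agree with
# cosine_similarity([embedding1], [embedding2])[0][0] up to floating-point error.
```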


In [8]:
user = {'role': 'user', 'content': 'what is the capital of Italy?'}

model = genai.GenerativeModel('gemini-pro')
# model = genai.GenerativeModel('gemini-1.5-flash')

# Gemini's generate_content does not take an OpenAI-style `messages=` argument,
# so we pass the text of the user message directly.
response = model.generate_content(user['content'])
response
# To cap the answer length, pass generation_config=genai.types.GenerationConfig(max_output_tokens=500).

response:
GenerateContentResponse(
    done=True,
    iterator=None,
    result=protos.GenerateContentResponse({
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "text": "Rome"
              }
            ],
            "role": "model"
          },
          "finish_reason": "STOP",
          "index": 0,
          "safety_ratings": [
            {
              "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
              "probability": "NEGLIGIBLE"
            },
            {
              "category": "HARM_CATEGORY_HATE_SPEECH",
              "probability": "NEGLIGIBLE"
            },
            {
              "category": "HARM_CATEGORY_HARASSMENT",
              "probability": "NEGLIGIBLE"
            },
            {
              "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
              "probability": "NEGLIGIBLE"
            }
          ]
        }
      ],
      "usage_metadata": {
        "prompt_token_count": 9,
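The printed object nests the answer under `candidates[0].content.parts[0].text`; in the SDK, `response.text` is the shortcut used in the earlier cells. To make the nesting explicit, the sketch below walks a plain dict copied from the output above (trimmed, with `safety_ratings` and `usage_metadata` left out) and joins the text of every part. `text_from_response_dict` is a hypothetical helper that works on this dict shape, not on the SDK object itself.

```python
def text_from_response_dict(data, candidate_index=0):
    """Join the text of all parts of one candidate in a response-shaped dict.

    Assumes the JSON layout printed above: candidates -> content -> parts -> text.
    """
    candidate = data["candidates"][candidate_index]
    parts = candidate["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


# Trimmed copy of the printed response.
sample = {
    "candidates": [
        {
            "content": {"parts": [{"text": "Rome"}], "role": "model"},
            "finish_reason": "STOP",
            "index": 0,
        }
    ]
}

print(text_from_response_dict(sample))
print(sample["candidates"][0]["finish_reason"])
```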

In [18]:
# Calling the generate_content method on the model object.
# This method is expected to generate content based on the input prompt.
response = model.generate_content(
    # The input prompt for the content generation. In this case, it's asking for a story about a magic backpack.
    "Tell me a story about a magic backpack.",
    # The generation_config parameter specifies various configuration options for the generation process.
    generation_config=genai.types.GenerationConfig(
        # candidate_count specifies the number of different outputs to generate for the given input.
        # Here, it's set to 1, meaning only one version of the story will be generated.
        candidate_count=1,
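        # stop_sequences is a list of strings that, when generated, will signal the model to stop generating further content.
        # Here, generation stops at the first "x" the model produces, even in the middle of a word,
        # which likely explains why the story in the output below ends abruptly at "while e".
        stop_sequences=["x"],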
        # max_output_tokens specifies the maximum length of the generated content in terms of tokens.
        # A token can be a word or part of a word, so this doesn't directly translate to word count.
        max_output_tokens=10000,
        # temperature controls the randomness of the output. A lower temperature results in less random completions.
        # Setting it to 0.7 strikes a balance between randomness and coherence.
        temperature=0.7,
    ),
)

response

response:
GenerateContentResponse(
    done=True,
    iterator=None,
    result=protos.GenerateContentResponse({
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "text": "Elara, a girl with eyes the color of a stormy sea and hair like spun moonlight, lived in a village nestled between towering mountains. Life was simple, filled with the rhythm of farm work and the stories whispered around crackling fires. But Elara yearned for something more, a world beyond the familiar peaks.\n\nOne day, while e"
              }
            ],
            "role": "model"
          },
          "finish_reason": "STOP",
          "index": 0,
          "safety_ratings": [
            {
              "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
              "probability": "NEGLIGIBLE"
            },
            {
              "category": "HARM_CATEGORY_HATE_SPEECH",
              "probability": "NEGLIGIBLE"
            },
            {
     

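Because `stop_sequences=["x"]` matches a single letter, generation halts at the first "x" anywhere in the text, including inside words. The sketch below mimics that truncation offline on a plain string; the real cut happens server-side during decoding, the stop sequence itself is not included in the returned text, and the example sentence is a hypothetical continuation of the story, not actual model output.

```python
def apply_stop_sequences(text, stop_sequences):
    """Cut `text` just before the earliest occurrence of any stop sequence.

    The matched stop sequence is dropped from the result.
    """
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Hypothetical continuation: the "x" in "exploring" ends the output right after "e".
continuation = "One day, while exploring the attic, Elara found a dusty backpack."
print(apply_stop_sequences(continuation, ["x"]))
print(apply_stop_sequences(continuation, ["THE END"]))  # no match, so the text is unchanged
```

A multi-character marker that cannot appear in normal prose (for example a sentinel you ask the model to emit) is a safer choice of stop sequence.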
### Task 3 - Generate and Execute python code

Define a function to generate a chat response using the OpenAI GPT-4 model, given system and user messages as input.

In [4]:
def generate_chat_response(system_content, user_content):
    # Create two message dictionaries, one for the system and one for the user.
    system = {'role': 'system', 'content': system_content}
    user = {'role': 'user', 'content': user_content}

    # Use OpenAI's ChatCompletion API to generate a response.
    response = openai.ChatCompletion.create(
        model='gpt-4',
        messages=[system, user],  # List of messages (system and user)
        max_tokens=1200  # Set a limit on the number of tokens in the response
    )

    # Return the generated response.
    return response

Let's craft a prompt to generate a simple python method

Let's extract the code from the response

In [5]:
def extract_code(response_content):
    # Match text between triple backticks (```), skipping an optional "python" language tag
    pattern = r'```(?:python)?\s*(.*?)```'

    # Use re.findall to find all non-overlapping matches of the pattern in the input string
    matches = re.findall(pattern, response_content, re.DOTALL)
    if not matches:
        raise ValueError("No fenced code block found in the response.")

    # Return the first code block found (only the tag is stripped, not every "python" in the code)
    return matches[0]
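Model replies usually put a language tag such as `python` right after the opening backticks, which is why the tag has to go. Calling `.replace("python", "")` on the whole block removes the tag, but it also damages code that legitimately contains the word, for example in a string or a module name. The sketch below compares that naive approach with one that strips only a tag directly after the opening fence; `extract_code_block` is a hypothetical name and the sample reply is made up.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def extract_code_block(response_content, language="python"):
    """Return the first fenced code block, dropping an optional language tag."""
    # Optional tag directly after the opening fence, then the (lazy) code body.
    pattern = FENCE + r"(?:" + re.escape(language) + r")?[ \t]*\n?(.*?)" + FENCE
    matches = re.findall(pattern, response_content, re.DOTALL)
    if not matches:
        raise ValueError("No fenced code block found in the response.")
    return matches[0]


reply = (
    "Here you go:\n"
    + FENCE + "python\n"
    + "import sys\n"
    + "print('Running python', sys.version_info.major)\n"
    + FENCE
)

naive = re.findall(FENCE + r"(.*?)" + FENCE, reply, re.DOTALL)[0].replace("python", "")
print(repr(naive))  # the word inside the string literal is gone as well
print(repr(extract_code_block(reply)))  # only the tag is removed
```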

Now let's execute the generated code and use it
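`exec(code, globals())` makes every name the generated code defines available in the notebook, but it can also silently overwrite existing variables such as your DataFrame. A more contained variant, sketched below, runs the code in its own dictionary and returns only the function you asked for. This is still not a sandbox: generated code runs with full access to your machine, so read it before executing. `run_generated_code` and the sample code string are hypothetical.

```python
def run_generated_code(code, function_name):
    """Execute generated code in a fresh namespace and return one function from it."""
    namespace = {}  # isolated globals for the generated code
    exec(code, namespace)
    if function_name not in namespace:
        raise NameError(f"Generated code did not define {function_name!r}")
    return namespace[function_name]


# Stand-in for code returned by the model.
generated = """
def add_numbers(a, b):
    return a + b
"""

add_numbers = run_generated_code(generated, "add_numbers")
print(add_numbers(2, 3))
```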

### Task 4 - Generate Python Code for Data Preparation

Defining `generate_code` helper method

In [6]:
def generate_code(system_content, user_content):
    # Create two message dictionaries, one for the system and one for the user.
    system = {'role': 'system', 'content': system_content}
    user = {'role': 'user', 'content': user_content}

    # Use OpenAI's ChatCompletion API to generate a response.
    response = openai.ChatCompletion.create(
        model='gpt-4',
        messages=[system, user],  # List of messages (system and user)
        max_tokens=1200  # Set a limit on the number of tokens in the response
    )

    # Extract the response content
    response_content = response.choices[0].message.content

    # Match text between triple backticks (```), skipping an optional "python" language tag
    pattern = r'```(?:python)?\s*(.*?)```'

    # Use re.findall to find all non-overlapping matches of the pattern in the input string
    matches = re.findall(pattern, response_content, re.DOTALL)
    if not matches:
        raise ValueError("No fenced code block found in the response.")

    # Return the first code block found (only the tag is stripped, not every "python" in the code)
    return matches[0]

Let's load the product sales dataset

Let's check the data types in our dataset

**Some Data Preparation**
* Extract month name from the `date` column
* Calculate profit per product
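For reference, each of these preparation steps is a single line in pandas. The sketch below uses hypothetical column names (`Date`, `Unit_Price`, `Unit_Cost`, `Sold_Items`) and a tiny made-up frame; the real dataset's columns may differ, so adjust the names before reusing it. Only `Product_Profit` matches a name used later in this notebook.

```python
import pandas as pd

# Tiny made-up sample; the column names are assumptions, not the real dataset's schema.
df = pd.DataFrame({
    "Date": ["2023-01-15", "2023-02-03", "2023-02-20"],
    "Product_Category": ["Toys", "Games", "Toys"],
    "Unit_Price": [20.0, 35.0, 12.5],
    "Unit_Cost": [12.0, 20.0, 10.0],
    "Sold_Items": [10, 4, 8],
})

# Make sure the date column is a real datetime before using the .dt accessor.
df["Date"] = pd.to_datetime(df["Date"])

# Step 1: month name extracted from the date (e.g. "January").
df["Month"] = df["Date"].dt.month_name()

# Step 2: profit per row/product = margin per unit times units sold.
df["Product_Profit"] = (df["Unit_Price"] - df["Unit_Cost"]) * df["Sold_Items"]

print(df[["Date", "Month", "Product_Profit"]])
```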


Let's execute this code

Let's calculate profit per product

Now let's execute this code and use it.

### Task 5 - Generate Python Code for Data Visualization

Defining some helper functions

In [7]:
def generate_chat_response(system_content, user_content):
    # Create two message dictionaries, one for the system and one for the user.
    system = {'role': 'system', 'content': system_content}
    user = {'role': 'user', 'content': user_content}

    # Use OpenAI's ChatCompletion API to generate a response.
    response = openai.ChatCompletion.create(
        model='gpt-4',
        messages=[system, user],  # List of messages (system and user)
        max_tokens=1200  # Set a limit on the number of tokens in the response
    )

    # Return the generated response.
    return response


def extract_code(response_content):
    # Match text between triple backticks (```), skipping an optional "python" language tag
    pattern = r'```(?:python)?\s*(.*?)```'

    # Use re.findall to find all non-overlapping matches of the pattern in the input string
    matches = re.findall(pattern, response_content, re.DOTALL)
    if not matches:
        raise ValueError("No fenced code block found in the response.")

    # Return the first code block found (only the tag is stripped, not every "python" in the code)
    return matches[0]


def generate_code_and_execute(user_content, execute=True):
    # Defining system content for code generation
    system_content = """
    You are a python code generator. You know pandas. You answer to every question with Python code.
    You return python code wrapped in ``` delimiter. Import any needed Python modules. And you don't provide any elaborations.
    """

    # generate chat response
    response = generate_chat_response(system_content, user_content)

    # extract the response content
    response_content = response.choices[0].message.content

    # let's extract the code from the response content
    code = extract_code(response_content)

    if execute:
        exec(code, globals())  # Execute the generated Python code if execute is set to True

    return code


def update_code(code, user_content, execute = True):
    # Defining system content for code update
    system_content = f"""
    You are a python code generator. You know pandas. You are given the following python method: {code}. Update the code based on the user content. Do not change the method name.
    You return the updated python code wrapped in ``` delimiter. And you don't provide any elaborations.
    """
    # generate chat response
    response_content = generate_chat_response(system_content, user_content).choices[0].message.content

    # extracting the code
    new_code = extract_code(response_content)

    if execute:
        exec(new_code, globals())  # Execute the generated Python code if execute is set to True

    return new_code

Let's check our data again


**Question 1** - How does the average daily profit change over time?

Steps to answer this question:

1. Aggregate our data to have average `Product_Profit` per day.
2. Draw a line chart to visualize the data
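A hand-written version of these two steps, useful for checking what the model generates, might look like the sketch below. It assumes a date column called `Date` (a guess) plus `Product_Profit`, and uses a small made-up frame; matplotlib is imported inside the plotting function so the aggregation step runs on its own.

```python
import pandas as pd

# Made-up sample rows; two sales fall on the same day.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-03"]),
    "Product_Profit": [10.0, 30.0, 25.0, 5.0],
})


def average_daily_profit(data):
    """Step 1: mean Product_Profit per calendar day (times of day are dropped)."""
    return data.groupby(data["Date"].dt.normalize())["Product_Profit"].mean().sort_index()


def plot_average_daily_profit(daily):
    """Step 2: line chart of the aggregated series."""
    import matplotlib.pyplot as plt  # imported here so step 1 works without matplotlib

    ax = daily.plot(kind="line", marker="o", figsize=(10, 4))
    ax.set_xlabel("Date")
    ax.set_ylabel("Average profit")
    ax.set_title("Average daily profit over time")
    plt.tight_layout()
    plt.show()


daily = average_daily_profit(df)
print(daily)
# plot_average_daily_profit(daily)  # uncomment in the notebook to draw the chart
```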


Let's use the generated method

What if we want to update the generated visualization?

Running the new code

**Exercise:** Question 2 - Create a bar chart showing the average profit per product category.

Update the code so that each product category gets a different bar color.
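As a reference point for the generated chart, the sketch below computes the average profit per category and assigns one color per bar from matplotlib's `tab:` palette names. The column names `Product_Category` and `Product_Profit` and the sample rows are assumptions; matplotlib is imported only inside the plotting function.

```python
import pandas as pd

# Made-up sample; `Product_Category` and `Product_Profit` are assumed column names.
df = pd.DataFrame({
    "Product_Category": ["Toys", "Games", "Toys", "Electronics", "Games"],
    "Product_Profit": [20.0, 50.0, 40.0, 90.0, 20.0],
})

# Average profit per category, largest first.
avg_profit = df.groupby("Product_Category")["Product_Profit"].mean().sort_values(ascending=False)

# One color per bar, cycling through matplotlib's "tab10" color names.
palette = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple",
           "tab:brown", "tab:pink", "tab:gray", "tab:olive", "tab:cyan"]
colors = [palette[i % len(palette)] for i in range(len(avg_profit))]


def plot_category_profit(series, bar_colors):
    """Vertical bar chart with a distinct color for each category."""
    import matplotlib.pyplot as plt  # imported lazily; the aggregation above needs only pandas

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(series.index, series.values, color=bar_colors)
    ax.set_xlabel("Product category")
    ax.set_ylabel("Average profit")
    ax.set_title("Average profit per product category")
    plt.tight_layout()
    plt.show()


print(avg_profit)
# plot_category_profit(avg_profit, colors)  # uncomment in the notebook to draw the chart
```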

### Task 6 - Create Visualizations using AI

Let's check out the data again

**Question 3** - Which product has the highest total sold items?

Let's make it a horizontal bar chart and highlight the product with the highest number of sold items.
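One way to build the highlighted horizontal bar chart is sketched below: sum the sold items per product, sort ascending so the best seller ends up as the top bar of `barh`, and color only that bar. The column names `Product_Name` and `Sold_Items` and the sample rows are assumptions; matplotlib is imported only inside the plotting function.

```python
import pandas as pd

# Made-up sample; `Product_Name` and `Sold_Items` are assumed column names.
df = pd.DataFrame({
    "Product_Name": ["Puzzle", "Drone", "Kite", "Puzzle", "Drone"],
    "Sold_Items": [5, 3, 7, 6, 1],
})

# Total sold items per product, ascending so the largest bar is drawn last (at the top).
totals = df.groupby("Product_Name")["Sold_Items"].sum().sort_values()
top_product = totals.idxmax()

# Highlight color for the best seller, neutral gray for everything else.
colors = ["tab:red" if name == top_product else "lightgray" for name in totals.index]


def plot_sold_items(series, bar_colors):
    """Horizontal bar chart of total sold items per product."""
    import matplotlib.pyplot as plt  # imported lazily; the aggregation above needs only pandas

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(series.index, series.values, color=bar_colors)
    ax.set_xlabel("Total sold items")
    ax.set_title(f"Sold items per product (top seller: {series.idxmax()})")
    plt.tight_layout()
    plt.show()


print(totals)
print("Top product:", top_product)
# plot_sold_items(totals, colors)  # uncomment in the notebook to draw the chart
```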