In [None]:
import logging

import boto3

boto3.setup_default_session()
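# Attach a handler to the root logger so INFO messages from chAI and botocore are displayed
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()
logger.setLevel(logging.INFO)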

# Getting started with chAI

chAI is a tool that uses AWS Bedrock and Claude models to analyse data, generate visualisations, and provide insights. This notebook demonstrates how to initialise chAI and use it in your data analysis workflow.

## Initialising chAI
There are two ways to initialise chAI:

### Option 1: Specifying Parameters Directly
You can provide configuration parameters explicitly when initialising the chAI object. For convenience, the `AWSRegion` and `LLMModel` classes define Enum values for commonly used AWS regions and Claude model variants:

In [None]:
from chai import chAI
from chai.constants import AWSRegion, LLMModel

chai = chAI(
    aws_profile="aws-prototype",
    llm_region=AWSRegion.US_EAST_1,
    llm_model=LLMModel.CLAUDE_SONNET_3_5,
)

INFO:chai.chAI:chAI Start
INFO:chai.config:Using user-specified variables: AWS_PROFILE: aws-prototype, LLM_REGION: us-east-1, LLM_MODEL: CLAUDE_SONNET_3_5
INFO:chai.config:Loaded config for agent successfully.
INFO:chai.bedrock:Initialising BedrockHandler
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:chai.bedrock:Creating ChatBedrock LLM instance
INFO:chai.bedrock:Successfully created ChatBedrock LLM instance
INFO:chai.chAI:Setting up chAI agent
INFO:chai.chAI:chAI agent successfully set up


### Option 2: Using Environment Variables
If you prefer to keep profile details out of your code, chAI can load its settings from environment variables instead. Create a `.env` file in your project directory with the following parameters:

- AWS_PROFILE=your-aws-profile
- LLM_REGION=your-aws-region
- LLM_MODEL=your-chosen-model
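
As a rough illustration of how settings like these can be read, the sketch below parses plain `KEY=value` lines (written in the file without the bullet markers) using only the standard library. It is not chAI's own loader, which handles this internally; the helper name `read_env_file` and the example values are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse KEY=value lines from a .env-style file into a dict (hypothetical helper)."""
    settings = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without a key/value separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Write an example file to a temporary directory; the values are placeholders.
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = Path(tmp_dir) / ".env"
    env_path.write_text(
        "AWS_PROFILE=your-aws-profile\n"
        "LLM_REGION=us-east-1\n"
        "LLM_MODEL=CLAUDE_SONNET_3_5\n",
        encoding="utf-8",
    )
    env_settings = read_env_file(env_path)

print(env_settings)
```

Keeping these values in a `.env` file that is excluded from version control means the profile name never appears in the notebook itself.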

In [28]:
from chai import chAI

chai = chAI()

INFO:chai.chAI:chAI Start
INFO:chai.config:Using .env variables: AWS_PROFILE: aws-prototype, LLM_REGION: us-east-1, LLM_MODEL: CLAUDE_SONNET_3_5
INFO:chai.config:Loaded config for agent successfully.
INFO:chai.bedrock:Initialising BedrockHandler
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:chai.bedrock:Creating ChatBedrock LLM instance
INFO:chai.bedrock:Successfully created ChatBedrock LLM instance
INFO:chai.chAI:Setting up chAI agent
INFO:chai.chAI:chAI agent successfully set up


With either approach, the LLM_MODEL variable accepts either the full model identifier (e.g., anthropic.claude-3-sonnet-20240229-v1:0) or the name of a predefined Enum value from constants.py (e.g., CLAUDE_SONNET_3_5).
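
One way to accept both forms is to check the value against the Enum member names first and otherwise treat it as a full identifier. The sketch below shows that idea with a hypothetical `ExampleModel` Enum and `resolve_model_id` helper; chAI's actual `LLMModel` members and lookup logic may differ, and the identifier strings are illustrative.

```python
from enum import Enum


class ExampleModel(Enum):
    """Hypothetical stand-in for chAI's LLMModel; the real members may differ."""

    CLAUDE_SONNET_3 = "anthropic.claude-3-sonnet-20240229-v1:0"
    CLAUDE_SONNET_3_5 = "anthropic.claude-3-5-sonnet-20240620-v1:0"


def resolve_model_id(value):
    """Return a full model identifier for an Enum member name or a raw identifier."""
    if value in ExampleModel.__members__:
        return ExampleModel[value].value
    # Anything else is assumed to already be a full model identifier.
    return value


print(resolve_model_id("CLAUDE_SONNET_3_5"))
print(resolve_model_id("anthropic.claude-3-sonnet-20240229-v1:0"))
```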

## What Happens During Initialisation
When you initialise chAI, several important steps occur:
- Configuration Loading: chAI loads configuration either from your specified parameters or from environment variables.
- AWS Authentication: chAI connects to AWS using the specified profile, automatically handling credentials from your ~/.aws/credentials file.
- Bedrock Handler Setup: A connection to the AWS Bedrock service is established.
- LLM Instance Creation: chAI initialises a ChatBedrock LLM instance using the Claude model you specified to handle agentic functions.
- Agent Setup: Finally, chAI sets up an agent with specialised tools for data analysis and visualisation.

# Functionality Examples

## Intelligent Visualisation Recommendations

Have a dataset but don't know where to start looking for insights? chAI can help! With its data analysis capabilities, chAI examines your dataset and provides tailored visualisation recommendations to uncover meaningful patterns and insights.

Simply pass your DataFrame to the steep() method along with a prompt describing what you're looking for. 

When analysing your dataset, chAI delivers a comprehensive set of visualisation recommendations, each containing:
- 📊 Visualisation Type: Specific chart recommendations tailored to your data
- 🎯 Purpose: Clear explanation of what each visualisation aims to reveal
- 📈 Chart Type: The most appropriate chart type for the analysis
- 🔢 Variables: Which columns to use for x and y axes
- 💡 Expected Insights: What patterns or relationships you might discover

### Sample Data

In [None]:
import numpy as np
import pandas as pd


def create_sample_dataset():
    """Create a sample dataset for testing"""
    np.random.seed(42)
    dates = pd.date_range(start="2023-01-01", periods=100, freq="D")

    sample_df = pd.DataFrame(
        {
            "date": dates,
            "sales": np.random.normal(1000, 100, 100),
            "customers": np.random.randint(50, 200, 100),
            "satisfaction_score": np.random.uniform(3.5, 5.0, 100),
            "category": np.random.choice(["A", "B", "C"], 100),
        }
    )

    return sample_df


sample_df = create_sample_dataset()

### Call chAI

In [7]:
response = chai.steep(
    data=sample_df,
    prompt="Analyse this dataset and suggest visualisations to explore trends and insights.",
)

INFO:chai.chAI:Detected DataFrame input. Preparing to analyse...
INFO:chai.requests:Parsing DataFrame into structured JSON dictionary
INFO:chai.chAI:Sending prompt and data to agent executor...


In [8]:
# Access the visualisation suggestions
print(response.suggestions)

📊 VISUALISATION SUGGESTIONS:


📈 TIME SERIES ANALYSIS OF SALES AND CUSTOMERS
Purpose: To analyze the trend of sales and customer count over time
Chart Type: Line Chart
Variables: date (x-axis), sales and customers (y-axis)
Expected Insights: Identify any patterns, seasonality, or trends in sales and customer numbers over the given time period. This can help in understanding business performance and potential correlations between sales and customer count.

--------------------------------------------------

📈 CORRELATION BETWEEN SALES AND CUSTOMER SATISFACTION
Purpose: To explore the relationship between sales and customer satisfaction scores
Chart Type: Scatter Plot
Variables: sales (x-axis), satisfaction_score (y-axis)
Expected Insights: Determine if there's a correlation between sales performance and customer satisfaction. This can help in understanding if higher sales are associated with higher satisfaction scores or vice versa.

--------------------------------------------------

📈

## Chart Analysis and Replication with chAI

chAI's image analysis capabilities let you extract insights from chart images and recreate them as interactive Plotly visualisations. This is useful for reproducing charts taken from reports, presentations, or publications with full interactivity. You can turn static charts into interactive ones, guide the analysis with specific prompts, or obtain Plotly code that you can modify and extend as required.

When analysing chart images, chAI provides:
- 📊 Comprehensive Analysis: Insights about the data represented in the chart
- 💻 Plotly Code: Ready-to-use Python code that attempts to recreate the visualisation as faithfully as possible using Plotly Graph Objects
- 🔄 Interactive Chart: An HTML file auto-generated with the interactive version of the chart and stored locally.

Each of these components is stored in the `teapot` object as `analysis`, `code`, and `path` respectively.
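
To make the shape of this response concrete, here is a minimal sketch of a container with the same three fields that prints as JSON, similar to the printed response shown later in this notebook. The class name `TeapotSketch` is hypothetical and the field values are placeholders; this is not chAI's actual class.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TeapotSketch:
    """Hypothetical stand-in for the response object; not chAI's actual class."""

    analysis: Optional[str] = None
    code: Optional[str] = None
    path: Optional[str] = None

    def __str__(self):
        # Serialise only the fields that were populated.
        populated = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(populated)


example = TeapotSketch(
    analysis="## Insights\n1. Placeholder insight.",
    code="import plotly.graph_objects as go",
    path="plotly_visualisations/example.html",
)

print(example.analysis)

# The JSON string can be parsed back into a plain dictionary.
restored = json.loads(str(example))
print(sorted(restored))
```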

### How to Analyse a Chart Image
Simply provide an image path and a prompt describing what you want to learn:

In [18]:
prompt = """
    Please analyse this chart and provide a comprehensive review with the following structure:

    1. Key Findings:
    - What are the most significant patterns or insights?
    - What are the highest and lowest values?
    - Are there any notable disparities or trends?

    2. Comparative Analysis:
    - Compare the different outcome groups
    - Identify any significant gaps or differences
    - Highlight any unexpected patterns

    Please provide your analysis in clear, business-friendly language suitable for stakeholders.
    """

In [19]:
image_analysis = chai.steep(prompt=prompt, image_path="../tests/img/satisfaction.png")

INFO:chai.chAI:Detected image location input. Preparing to review...
INFO:chai.requests:Encoding image to base64
INFO:chai.chAI:Sending prompt and data to agent executor...


In [20]:
# Access the analysis
print(image_analysis.analysis)

## Insights
1. The distribution of satisfaction scores is somewhat bimodal, with peaks around 4.2-4.4 and at 5.0.
2. The highest frequency is in the 4.2-4.4 range, with about 17 counts.
3. The lowest frequency is in the 3.4-3.6 range, with about 7 counts.
4. There's a notable increase in frequency for the highest satisfaction score of 5.0.
5. The distribution is generally skewed towards higher satisfaction scores, with most counts occurring above 4.0.


In [21]:
# View the generated Plotly code
print(image_analysis.code)

import plotly.graph_objects as go

satisfaction_scores = [3.4, 3.6, 3.8, 4.0, 4.2, 4.4, 4.6, 4.8, 5.0]
counts = [7, 7, 12, 12, 17, 13, 12, 12, 16]

fig = go.Figure(data=[go.Bar(
    x=satisfaction_scores,
    y=counts,
    marker_color='orange'
)])

fig.update_layout(
    title='Satisfaction Score Distribution',
    xaxis_title='Satisfaction Score',
    yaxis_title='Count',
    template='plotly_white',
    bargap=0
)

fig.show()
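
Because the generated code is returned as a plain string, it can be written to a `.py` file for later editing. The sketch below assumes a hypothetical helper `save_generated_code` and uses a short placeholder string in place of `image_analysis.code`.

```python
import tempfile
from pathlib import Path


def save_generated_code(code, target):
    """Write a generated code string to a .py file and return its path (hypothetical helper)."""
    target_path = Path(target)
    target_path.parent.mkdir(parents=True, exist_ok=True)
    target_path.write_text(code, encoding="utf-8")
    return target_path


# Placeholder standing in for the string held in `image_analysis.code`.
generated_code = "import plotly.graph_objects as go\n\nfig = go.Figure()\n"

output_dir = Path(tempfile.mkdtemp())
saved_path = save_generated_code(
    generated_code, output_dir / "charts" / "satisfaction_chart.py"
)
print(saved_path.name)
```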


In [22]:
# View the path to the locally generated chart
print(image_analysis.path)

/Users/jose.orjales/gds-ideas-chai/examples/plotly_visualisations/visualisation_1742559196.html
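
The saved HTML file can be opened in a browser with the standard library. The sketch below uses a hypothetical helper `open_chart` that checks the file exists first; the demonstration passes a recording function instead of the real browser opener so that nothing is launched when the cell runs.

```python
import tempfile
import webbrowser
from pathlib import Path


def open_chart(path, opener=webbrowser.open):
    """Open a saved HTML chart if the file exists; return whether it was opened."""
    chart_path = Path(path).expanduser()
    if not chart_path.is_file():
        return False
    # as_uri() needs an absolute path, so resolve it first.
    opener(chart_path.resolve().as_uri())
    return True


# Demonstrate with a temporary file and a recording opener (no browser is launched).
recorded = []
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / "visualisation_demo.html"
    demo_file.write_text("<html></html>", encoding="utf-8")
    opened = open_chart(demo_file, opener=recorded.append)

# The temporary directory has been removed, so this path no longer exists.
missing = open_chart(Path(tmp_dir) / "missing.html", opener=recorded.append)
print(opened, missing)
```

In a notebook session, calling `open_chart(image_analysis.path)` with the default opener would hand the file to the system browser.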


In [23]:
print(image_analysis)

{"analysis": "## Insights\n1. The distribution of satisfaction scores is somewhat bimodal, with peaks around 4.2-4.4 and at 5.0.\n2. The highest frequency is in the 4.2-4.4 range, with about 17 counts.\n3. The lowest frequency is in the 3.4-3.6 range, with about 7 counts.\n4. There's a notable increase in frequency for the highest satisfaction score of 5.0.\n5. The distribution is generally skewed towards higher satisfaction scores, with most counts occurring above 4.0.", "path": "/Users/jose.orjales/gds-ideas-chai/examples/plotly_visualisations/visualisation_1742559196.html", "code": "import plotly.graph_objects as go\n\nsatisfaction_scores = [3.4, 3.6, 3.8, 4.0, 4.2, 4.4, 4.6, 4.8, 5.0]\ncounts = [7, 7, 12, 12, 17, 13, 12, 12, 16]\n\nfig = go.Figure(data=[go.Bar(\n    x=satisfaction_scores,\n    y=counts,\n    marker_color='orange'\n)])\n\nfig.update_layout(\n    title='Satisfaction Score Distribution',\n    xaxis_title='Satisfaction Score',\n    yaxis_title='Count',\n    template='p

## Chart Template Generation

chAI can also generate chart templates from natural language prompts. Describe how you want the chart structured and styled (colours, labels, and so on), pass the `chart_type` parameter, and chAI outputs Plotly code and saves a copy of the generated chart locally. Responses are based on a back-end library of "typical" chart templates, so within seconds you get Plotly code that you can modify and extend as required.

When generating charts, chAI provides:
- 💻 Plotly Code: Ready-to-use Python code that attempts to meet your requirements as faithfully as possible using Plotly Graph Objects
- 🔄 Interactive Chart: An HTML file auto-generated with the interactive version of the chart and stored locally.

Each of these components is stored in the `teapot` object as `code` and `path` respectively.

### How to Generate a Chart

In [14]:
chart_prompt = "I want a red chart with bold axis titles and labels on the bars. There should also be a legend showing each distinct category"
chart_generation = chai.steep(prompt=chart_prompt, chart_type="scatter")

INFO:chai.chAI:Processing chart type request: scatter
INFO:chai.chAI:Sending prompt and data to agent executor...


In [15]:
# View the generated Plotly code
print(chart_generation.code)

import plotly.graph_objects as go
import numpy as np

# Generate sample data
np.random.seed(42)
x = np.random.uniform(0, 10, 50)
y = 2 * x + np.random.normal(0, 2, 50)

# Create figure with multiple series
fig = go.Figure()

# Add multiple scatter series with different shades of red
series_data = [
    {'x': x, 'y': y, 'name': 'Series 1', 'color': '#ff0000'},
    {'x': x, 'y': y + 2, 'name': 'Series 2', 'color': '#ff3333'},
    {'x': x, 'y': y - 2, 'name': 'Series 3', 'color': '#ff6666'},
    {'x': x, 'y': y * 0.8, 'name': 'Series 4', 'color': '#ff9999'}
]

for series in series_data:
    fig.add_trace(
        go.Scatter(
            x=series['x'],
            y=series['y'],
            mode='markers',
            name=series['name'],
            marker=dict(
                size=10,
                color=series['color'],
                opacity=0.7,
                line=dict(width=1, color='#444444')
            ),
            hovertemplate='x: %{x:.2f}<br>y: %{y:.2f}<extra></extra>'


In [16]:
# View the path to the locally generated chart
print(chart_generation.path)

/Users/jose.orjales/gds-ideas-chai/examples/plotly_visualisations/visualisation_1742559136.html


In [None]:
print(chart_generation)

{"path": "/Users/jose.orjales/gds-ideas-chai/examples/plotly_visualisations/visualisation_1742559136.html", "code": "import plotly.graph_objects as go\nimport numpy as np\n\n# Generate sample data\nnp.random.seed(42)\nx = np.random.uniform(0, 10, 50)\ny = 2 * x + np.random.normal(0, 2, 50)\n\n# Create figure with multiple series\nfig = go.Figure()\n\n# Add multiple scatter series with different shades of red\nseries_data = [\n    {'x': x, 'y': y, 'name': 'Series 1', 'color': '#ff0000'},\n    {'x': x, 'y': y + 2, 'name': 'Series 2', 'color': '#ff3333'},\n    {'x': x, 'y': y - 2, 'name': 'Series 3', 'color': '#ff6666'},\n    {'x': x, 'y': y * 0.8, 'name': 'Series 4', 'color': '#ff9999'}\n]\n\nfor series in series_data:\n    fig.add_trace(\n        go.Scatter(\n            x=series['x'],\n            y=series['y'],\n            mode='markers',\n            name=series['name'],\n            marker=dict(\n                size=10,\n                color=series['color'],\n                opac