In [1]:
# Import necessary libraries
from typing import List
import os

import pandas as pd
from pydantic import BaseModel

### Create a DataFrame

We define a list of English fruit names and create a pandas DataFrame from it.

In [2]:
# Define a list of entities to translate
fruits: List[str] = ["apple", "banana", "orange", "grape", "kiwi", "mango", "peach", "pear", "pineapple", "strawberry"]
fruits_df = pd.DataFrame({"name": fruits})
fruits_df

Unnamed: 0,name
0,apple
1,banana
2,orange
3,grape
4,kiwi
5,mango
6,peach
7,pear
8,pineapple
9,strawberry


# import openaivec.pandas_ext

This example demonstrates how to integrate the `openaivec.pandas_ext` module with Pandas for text translation tasks. Follow the examples below for single and multi-language translations.

If the environment variable `OPENAI_API_KEY` is set, `pandas_ext` automatically uses an `openai.OpenAI` client.

If the environment variables `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_VERSION` are all set, `pandas_ext` automatically uses an `openai.AzureOpenAI` client.

If you need to use a specific `openai.OpenAI` instance, register it with `pandas_ext.use`.

In [3]:
import openai
from openaivec import pandas_ext

pandas_ext.use(openai.OpenAI())
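
The selection rules described above can be sketched offline as a small decision function. This is only an illustration of the documented behavior, not the library's implementation: `describe_default_client` and `AZURE_VARS` are hypothetical names, and the order in which the two cases are checked is an assumption, since the text does not say which one wins when both sets of variables are present.

```python
import os
from typing import Mapping, Optional

AZURE_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_VERSION")


def describe_default_client(env: Optional[Mapping[str, str]] = None) -> str:
    """Name the client class the rules above would pick (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: OPENAI_API_KEY is checked first; the precedence is not documented here.
    if env.get("OPENAI_API_KEY"):
        return "openai.OpenAI"
    # The Azure client needs all three variables, not just some of them.
    if all(env.get(name) for name in AZURE_VARS):
        return "openai.AzureOpenAI"
    return "none (call pandas_ext.use with an explicit client)"


print(describe_default_client({"OPENAI_API_KEY": "sk-example"}))  # openai.OpenAI
```

Calling `describe_default_client()` with no argument inspects the real environment, which can help when it is unclear why a particular client was picked up.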

# Process the columns with OpenAI

Once `pandas_ext` is imported, a Series can be processed with a single call to `pd.Series.ai.predict`.

In [4]:
# Translate the name column to French; the result is returned as a new Series
s: pd.Series = fruits_df.name.ai.predict("gpt-4o-mini", "translate to French")

s

0     pomme
1    banane
2    orange
3    raisin
4      kiwi
5    mangue
6     pêche
7     poire
8    ananas
9    fraise
Name: name, dtype: object

Embeddings are available as well, through the `pd.Series.ai.embed` method.

In [5]:
e: pd.Series = fruits_df.name.ai.embed("text-embedding-3-small")

e

0    [0.017625168, -0.016837789, -0.041888542, 0.01...
1    [0.008520956, -0.024025958, -0.029065346, -0.0...
2    [-0.025900742, -0.005591442, -0.0061122794, 0....
3    [-0.038733866, 0.009573344, -0.02065904, -0.00...
4    [-0.005749877, -0.021446224, -0.02598345, 0.02...
5    [0.05547354, -0.008819806, -0.01995101, -0.006...
6    [0.030666627, -0.041978087, -0.01391589, 0.037...
7    [0.023707643, -0.022397757, -0.008799582, 0.03...
8    [0.021006476, -0.060446374, -0.002942336, 0.02...
9    [0.020135976, -0.014379032, -0.04067174, -0.02...
Name: name, dtype: object
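
A common next step with embedding vectors is to compare them by cosine similarity. The sketch below uses only the standard library and three invented 3-dimensional vectors (not real model output), so it runs without an API key. `cosine_similarity` is a helper defined here, not part of `openaivec`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors invented for illustration; real embeddings have many more dimensions.
toy = {
    "apple": [1.0, 0.0, 0.0],
    "pear": [1.0, 1.0, 0.0],
    "kiwi": [0.0, 0.0, 1.0],
}
print(cosine_similarity(toy["apple"], toy["pear"]))  # about 0.707
print(cosine_similarity(toy["apple"], toy["kiwi"]))  # 0.0
```

Assuming each element of `e` is a sequence of floats, as the output above suggests, the same helper could be applied to real results, for example `cosine_similarity(e[0], e[7])` for apple and pear.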

# Structured Output with pandas_ext

`pd.Series.ai.predict` also supports structured output: pass a Pydantic model class as `response_format`.

In [6]:
# Define a structured output model for translations (Example: using Pydantic for structured output)
class Translation(BaseModel):
    en: str  # English
    fr: str  # French
    ja: str  # Japanese
    es: str  # Spanish
    de: str  # German
    it: str  # Italian
    pt: str  # Portuguese
    ru: str  # Russian

translations: pd.Series = fruits_df.name.ai.predict(
    model_name="gpt-4o-mini",
    prompt="translate to multiple languages",
    response_format=Translation
)

translations

0    en='apple' fr='pomme' ja='リンゴ' es='manzana' de...
1    en='banana' fr='banane' ja='バナナ' es='plátano' ...
2    en='orange' fr='orange' ja='オレンジ' es='naranja'...
3    en='grape' fr='raisin' ja='ぶどう' es='uva' de='T...
4    en='kiwi' fr='kiwi' ja='キウイ' es='kiwi' de='Kiw...
5    en='mango' fr='mangue' ja='マンゴー' es='mango' de...
6    en='peach' fr='pêche' ja='桃' es='durazno' de='...
7    en='pear' fr='poire' ja='梨' es='pera' de='Birn...
8    en='pineapple' fr='ananas' ja='パイナップル' es='piñ...
9    en='strawberry' fr='fraise' ja='いちご' es='fresa...
Name: name, dtype: object
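
To show what a Pydantic response format buys you, the sketch below validates a hand-written JSON string against a shortened, hypothetical model `MiniTranslation`. It assumes Pydantic v2 (`model_validate_json`) and says nothing about how `openaivec` parses responses internally; it only demonstrates the kind of typed object each element ends up being.

```python
from pydantic import BaseModel, ValidationError


class MiniTranslation(BaseModel):  # hypothetical, shortened version of Translation
    en: str
    fr: str


# Hand-written JSON standing in for a model response.
raw = '{"en": "apple", "fr": "pomme"}'
item = MiniTranslation.model_validate_json(raw)
print(item.fr)  # pomme

# A payload missing a required field is rejected rather than passed through silently.
try:
    MiniTranslation.model_validate_json('{"en": "apple"}')
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} error(s)")
```

Because every field is declared with a type, downstream code can rely on attribute access such as `item.fr` instead of parsing free text.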

Each value in this `pd.Series` is an instance of `Translation`, a `pydantic.BaseModel` subclass.

The `pd.Series.ai.extract` method expands these elements into the columns of a `pd.DataFrame`.

In [7]:
translations.ai.extract()

Unnamed: 0,en,fr,ja,es,de,it,pt,ru
0,apple,pomme,リンゴ,manzana,Apfel,mela,maçã,яблоко
1,banana,banane,バナナ,plátano,Banane,banana,banana,банан
2,orange,orange,オレンジ,naranja,Orange,arancia,laranja,апельсин
3,grape,raisin,ぶどう,uva,Traube,uva,uva,виноград
4,kiwi,kiwi,キウイ,kiwi,Kiwi,kiwi,kiwi,киви
5,mango,mangue,マンゴー,mango,Mango,mango,manga,манго
6,peach,pêche,桃,durazno,Pfirsich,pesca,pêssego,персик
7,pear,poire,梨,pera,Birne,pera,pera,груша
8,pineapple,ananas,パイナップル,piña,Ananas,ananas,abacaxi,ананас
9,strawberry,fraise,いちご,fresa,Erdbeere,fragola,morango,клубника
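
For intuition, a comparable expansion can be done by hand with plain pandas and Pydantic. This is a minimal sketch assuming Pydantic v2 (`model_dump`); the `Pair` model and its sample values are hypothetical, and this is not necessarily how `ai.extract` is implemented.

```python
import pandas as pd
from pydantic import BaseModel


class Pair(BaseModel):  # hypothetical two-field model for illustration
    en: str
    fr: str


models = pd.Series(
    [Pair(en="apple", fr="pomme"), Pair(en="pear", fr="poire")],
    name="name",
)

# model_dump() turns each Pydantic object into a dict; each dict becomes one row.
expanded = pd.DataFrame([m.model_dump() for m in models], index=models.index)
print(expanded)
```

Passing `index=models.index` keeps the rows aligned with the original Series, which matters when the result is joined back onto the source DataFrame.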


# Example: Data Enrichment of the Fruit Table

These interfaces integrate seamlessly with the `pd.DataFrame` APIs, such as `assign` and `pipe`.

Let's enrich the data with the power of LLMs!

In [None]:
fruits_df.assign(
    color=lambda df: df.name.ai.predict("gpt-4o-mini", "Return the color of given fruit"),
    embedding=lambda df: df.name.ai.embed("text-embedding-3-small"),
    translation=lambda df: df.name.ai.predict(
        model_name="gpt-4o-mini",
        prompt="translate to multiple languages",
        response_format=Translation
    )
).pipe(
    # Expand the structured translation column into one column per language
    lambda df: df.ai.extract(column="translation")
)

Unnamed: 0,name,color,embedding,en,fr,ja,es,de,it,pt,ru
0,apple,red,"[0.017625168, -0.016837789, -0.041888542, 0.01...",apple,pomme,リンゴ,manzana,Apfel,mela,maçã,яблоко
1,banana,yellow,"[0.008520956, -0.024025958, -0.029065346, -0.0...",banana,banane,バナナ,plátano,Banane,banana,banana,банан
2,orange,orange,"[-0.025900742, -0.005591442, -0.0061122794, 0....",orange,orange,オレンジ,naranja,Orange,arancia,laranja,апельсин
3,grape,purple,"[-0.038733866, 0.009573344, -0.02065904, -0.00...",grape,raisin,ぶどう,uva,Traube,uva,uva,виноград
4,kiwi,green,"[-0.005749877, -0.021446224, -0.02598345, 0.02...",kiwi,kiwi,キウイ,kiwi,Kiwi,kiwi,kiwi,киви
5,mango,yellow/orange,"[0.05547354, -0.008819806, -0.01995101, -0.006...",mango,mangue,マンゴー,mango,Mango,mango,manga,манго
6,peach,pink/cream,"[0.030666627, -0.041978087, -0.01391589, 0.037...",peach,pêche,桃,durazno,Pfirsich,pesca,pêssego,персик
7,pear,green/yellow,"[0.023707643, -0.022397757, -0.008799582, 0.03...",pear,poire,梨,pera,Birne,pera,pera,груша
8,pineapple,brown/yellow,"[0.021006476, -0.060446374, -0.002942336, 0.02...",pineapple,ananas,パイナップル,piña,Ananas,ananas,abacaxi,ананас
9,strawberry,red,"[0.020135976, -0.014379032, -0.04067174, -0.02...",strawberry,fraise,いちご,fresa,Erdbeere,fragola,morango,клубника
