<img src="assets/data_analysis_with_polars_copyright-1.png" width="500"/>

## Test installation
In this notebook we test your installation and check that you are working in the course virtual environment.

First, run `which python`. The output should be a path inside your `course_env` virtual environment.

If the path is not inside your virtual environment, you need to activate the virtual environment.

If you have a pre-existing `conda` environment activated, you may need to `deactivate` it before activating the `course_env` environment.
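
The same check can be made from inside the notebook, which is useful because the kernel may use a different interpreter from your terminal. The sketch below assumes `course_env` was created with `venv` or `virtualenv`; `in_virtual_env` is a hypothetical helper name. For a `conda` environment the prefix comparison does not apply, so inspect the printed path instead.

```python
import sys


def in_virtual_env():
    """Return True when the interpreter runs inside a venv/virtualenv."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


# The path should contain your environment name (assumed here: course_env).
print("Interpreter:", sys.executable)
print("Inside a venv:", in_virtual_env())
```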

## Test imports
You will need to be able to import Polars and (less importantly) Plotly.

**If this cell runs for more than 15 seconds without finishing, restart the kernel and try again.**

In [1]:
import polars as pl
import pandas as pd
import numpy as np
import plotly.express as px
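
If one of the imports fails, it can help to see which packages the current interpreter can find before reinstalling anything. The following is a minimal sketch using only the standard library; `report_packages` is a hypothetical helper, and the package list simply mirrors the imports in the cell above.

```python
from importlib import metadata, util


def report_packages(names):
    """Map each top-level package name to its version, or None if not found."""
    report = {}
    for name in names:
        if util.find_spec(name) is None:
            report[name] = None  # not importable from this interpreter
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this name
            report[name] = "unknown"
    return report


for name, version in report_packages(["polars", "pandas", "numpy", "plotly"]).items():
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

A package reported as not installed usually means the kernel is running outside the course environment rather than that the installation itself failed.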

## Test data read
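You will need to be able to read the following data files. Each file is pipe-delimited and has no header row, so the column names are supplied through `new_columns`.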

In [16]:
df = pl.read_csv(
    "../case_study1/langgraph/01_article_writer_should_write.csv",
    separator="|",
    has_header=False,  # Default is True
    skip_rows=0,  # Skip rows at beginning
    null_values=["", "NULL"],  # What to treat as null
    encoding="utf8",  # File encoding
    new_columns=[
        "datetime",
        "article_writer",
        "evaluator",
        "model",
        "temperature",
        "query",
        "output",
        "input_tokens",
        "output_tokens",
        "time_taken",
    ],
)
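
The reads in this notebook use paths relative to the notebook's working directory, so a "file not found" error often means the notebook was started from a different folder. A small sketch for checking the three files up front, assuming the paths used in the `read_csv` cells; `find_missing` is a hypothetical helper.

```python
from pathlib import Path

# Paths exactly as they appear in the read_csv cells of this notebook
DATA_FILES = [
    "../case_study1/langgraph/01_article_writer_should_write.csv",
    "../case_study1/langgraph/02_article_writer_translate.csv",
    "../case_study1/langgraph/03_article_writer_expand.csv",
]


def find_missing(paths):
    """Return the paths that do not point at an existing file."""
    return [p for p in paths if not Path(p).is_file()]


missing = find_missing(DATA_FILES)
if missing:
    print(f"Working directory: {Path.cwd()}")
    for path in missing:
        print(f"Missing: {path}")
else:
    print("All data files found.")
```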

In [17]:
df.head(2)

| datetime | article_writer | evaluator | model | temperature | query | output | input_tokens | output_tokens | time_taken |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| str | str | str | str | i64 | str | str | i64 | i64 | f64 |
| "2025-07-20-20-46-12" | "ARTICLE_WRITER" | "EVALUATOR" | "gpt-4o-mini" | 0 | "Planning permission for 49 Eng…" | "no" | 6 | 5 | 1.18 |
| "2025-07-20-20-46-13" | "ARTICLE_WRITER" | "EVALUATOR" | "gpt-4o-mini" | 0 | "AI and Machine learning in mod…" | "yes" | 6 | 5 | 0.6 |


In [20]:
df = pl.read_csv(
    "../case_study1/langgraph/02_article_writer_translate.csv",
    separator="|",
    has_header=False,  # Default is True
    skip_rows=0,  # Skip rows at beginning
    null_values=["", "NULL"],  # What to treat as null
    encoding="utf8",  # File encoding
    new_columns=[
        "datetime",
        "article_writer",
        "evaluator",
        "model",
        "temperature",
        "input",
        "output",
        "result",
        "time_taken",
    ],
)

OK, if all of that worked, you should be good to go!

In [21]:
df.head(1)

| datetime | article_writer | evaluator | model | temperature | input | output | result | time_taken |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| str | str | str | str | i64 | str | str | str | f64 |
| "2025-07-20-20-46-15" | "ARTICLE_WRITER" | "TRANSLATE" | "gpt-4o-mini" | 0 | "AI and Machine learning in mod…" | "content="L'IA et l'apprentissa…" | "content="L'IA et l'apprentissa…" | 0.59 |


In [6]:
print(df["output"])

shape: (1,)
Series: 'output' [str]
[
	"content="L'IA et l'apprentissa…
]


In [7]:
import re
import ast

# Get the "output" column as a list of strings
output_data = df["output"].to_list()

# Parse each row's output string
for i, output_str in enumerate(output_data):
    print(f"Row {i+1} 'output' data:")
    print("-" * 50)

    # Extract content
    content_match = re.search(r'content="([^"]*)"', output_str)
    if content_match:
        print(f"content: {content_match.group(1)}")

    # Extract additional_kwargs
    kwargs_match = re.search(r"additional_kwargs=(\{[^}]*\})", output_str)
    if kwargs_match:
        try:
            kwargs_dict = ast.literal_eval(kwargs_match.group(1))
            print(f"additional_kwargs: {kwargs_dict}")
            for key, value in kwargs_dict.items():
                print(f"  {key}: {value}")
        except Exception:
            print(f"additional_kwargs: {kwargs_match.group(1)}")

    # Extract response_metadata
    metadata_match = re.search(r"response_metadata=(\{.*?\}(?=\s+id=))", output_str)
    if metadata_match:
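        print("response_metadata found (complex nested structure)")
        # Parse token usage specifically.
        # Note: [^}]* stops at the first "}", so a nested dict is cut short
        # and ast.literal_eval falls through to the except branch below.
        token_usage_match = re.search(r"'token_usage':\s*(\{[^}]*\})", output_str)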
        if token_usage_match:
            try:
                token_dict = ast.literal_eval(token_usage_match.group(1))
                print("  token_usage:")
                for key, value in token_dict.items():
                    print(f"    {key}: {value}")
            except Exception:
                print(f"  token_usage: {token_usage_match.group(1)}")

    # Extract usage_metadata (same caveat: a nested dict is cut at the first "}")
    usage_match = re.search(r"usage_metadata=(\{[^}]*\})", output_str)
    if usage_match:
        try:
            usage_dict = ast.literal_eval(usage_match.group(1))
            print("usage_metadata:")
            for key, value in usage_dict.items():
                print(f"  {key}: {value}")
        except Exception:
            print(f"usage_metadata: {usage_match.group(1)}")

    print("\n")

Row 1 'output' data:
--------------------------------------------------
content: L'IA et l'apprentissage automatique dans les affaires modernes
additional_kwargs: {'refusal': None}
  refusal: None
response_metadata found (complex nested structure)
  token_usage: {'completion_tokens': 12, 'prompt_tokens': 45, 'total_tokens': 57, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}
usage_metadata: {'input_tokens': 45, 'output_tokens': 12, 'total_tokens': 57, 'input_token_details': {'audio': 0, 'cache_read': 0}
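
The `token_usage` and `usage_metadata` lines above end without a closing brace. The pattern `\{[^}]*\}` stops at the first `}`, so a dictionary that contains another dictionary is truncated, `ast.literal_eval` fails, and the raw text is printed by the `except` branch. One possible alternative is to scan for the matching brace by counting depth. This is a minimal sketch, not the notebook's method: `extract_braced` is a hypothetical helper, the sample string is invented for illustration, and the scan assumes no braces occur inside string values.

```python
import ast


def extract_braced(text, key):
    """Parse the balanced {...} literal that follows `key`; None if absent."""
    start = text.find(key)
    if start == -1:
        return None
    open_pos = text.find("{", start + len(key))
    if open_pos == -1:
        return None
    depth = 0
    for pos in range(open_pos, len(text)):
        if text[pos] == "{":
            depth += 1
        elif text[pos] == "}":
            depth -= 1
            if depth == 0:
                # Found the brace that closes the outermost dict
                return ast.literal_eval(text[open_pos:pos + 1])
    return None  # braces never balanced


# Invented sample in the same shape as the notebook's output strings
sample = (
    "usage_metadata={'input_tokens': 45, 'output_tokens': 12, "
    "'input_token_details': {'audio': 0, 'cache_read': 0}} id='example'"
)
usage = extract_braced(sample, "usage_metadata=")
print(usage["input_token_details"])
```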




In [8]:
df = pl.read_csv(
    "../case_study1/langgraph/03_article_writer_expand.csv",
    separator="|",
    has_header=False,  # Default is True
    skip_rows=0,  # Skip rows at beginning
    null_values=["", "NULL"],  # What to treat as null
    encoding="utf8",  # File encoding
    new_columns=[
        "datetime",
        "article_writer",
        "evaluator",
        "model",
        "temperature",
        "input",
        "output",
        "result",
        "time_taken",
    ],
)

In [9]:
import re
import ast

# Get the "result" column as a list of strings
output_data = df["result"].to_list()
print(f"Number of rows in output_data: {len(output_data)}")

# Parse each row's result string
for i, output_str in enumerate(output_data):
    print(f"Row {i+1} 'output' data:")
    print("-" * 50)

    # Extract content
    content_match = re.search(r'content="([^"]*)"', output_str)
    if content_match:
        print(f"content: {content_match.group(1)}")

    # Extract additional_kwargs
    kwargs_match = re.search(r"additional_kwargs=(\{[^}]*\})", output_str)
    if kwargs_match:
        try:
            kwargs_dict = ast.literal_eval(kwargs_match.group(1))
            print(f"additional_kwargs: {kwargs_dict}")
            for key, value in kwargs_dict.items():
                print(f"  {key}: {value}")
        except Exception:
            print(f"additional_kwargs: {kwargs_match.group(1)}")

    print("\n")

Number of rows in output_data: 1
Row 1 'output' data:
--------------------------------------------------
content: L'intelligence artificielle (IA) et l'apprentissage automatique jouent un rôle de plus en plus crucial dans les affaires modernes. Ces technologies permettent aux entreprises d'analyser de vastes quantités de données pour en extraire des insights précieux, améliorant ainsi la prise de décision. Par exemple, grâce à l'IA, les entreprises peuvent personnaliser leurs offres, anticiper les besoins des clients et optimiser leurs opérations. De plus, l'automatisation des tâches répétitives libère du temps pour les employés, leur permettant de se concentrer sur des activités à plus forte valeur ajoutée. En intégrant l'IA et l'apprentissage automatique, les entreprises peuvent non seulement accroître leur efficacité, mais aussi rester compétitives dans un marché en constante évolution.
additional_kwargs: {'refusal': None}
  refusal: None
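
The pattern `content="([^"]*)"` works here because the French text contains apostrophes. Assuming the column stores a Python-style repr of the message, a string without an apostrophe would be written with single quotes (`content='...'`) and the pattern would not match it. The sketch below accepts either quote style and lets `ast.literal_eval` decode the literal; `extract_content` is a hypothetical helper and the sample strings are invented.

```python
import ast
import re

# content= followed by a double- or single-quoted Python string literal,
# allowing backslash escapes inside the quotes.
CONTENT_RE = re.compile(r"""content=("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')""")


def extract_content(message_repr):
    """Return the decoded content string, or None when no content= is found."""
    match = CONTENT_RE.search(message_repr)
    if match is None:
        return None
    # The captured group is a quoted literal, so literal_eval unquotes it
    return ast.literal_eval(match.group(1))


# Invented samples covering both quote styles
double_quoted = 'content="L\'IA moderne" additional_kwargs={}'
single_quoted = "content='Modern AI' additional_kwargs={}"
print(extract_content(double_quoted))
print(extract_content(single_quoted))
```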


