# Working with Files

Lesson 11: reading and writing .txt files in Python, storing content in a file, extracting information with LLMs as bullet points into a .txt file, and structuring prompts to extract the desired data from a .txt file.

Python makes it simple to work with plain-text files such as `.txt` or `.md` files.

In [11]:
from ai_tools import ask_ai
from IPython.display import Markdown

In [1]:
# To open a file

with open("./file.txt", "r") as f:
    data = f.read()

print(data)

https://arxiv.org/pdf/2412.14161

The paper titled "TheAgent Company: Benchmarking LLM Agents on Consequential Real World Tasks" introduces a new benchmark, TheAgentCompany, aimed at evaluating the efficacy of large language model (LLM)-powered AI agents in completing real-world professional tasks in a simulated software company environment. The research is conducted by a collaborative team, mainly from Carnegie Mellon University and other institutions, and emphasizes the growing presence of AI in work settings.

### Key Points from the Paper:

1. **Motivation**:
   - The rapid advancements in LLMs are prompting questions about AI's potential to automate or assist in various work-related tasks.
   - Understanding AI agents’ capabilities is crucial for businesses considering AI integration and for policymakers assessing AI’s impact on employment.

2. **Benchmark Overview**:
   - TheAgentCompany simulates a software company environment with 175 diverse professional tasks spanning categor

This particular file contains a summary of the paper ["TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks"](https://arxiv.org/pdf/2412.14161).

We can also create files easily in Python:

In [2]:
content = "This is a file"
with open("summary-notes.txt", "w") as f:
    f.write(content)

In [3]:
# Print the contents of an existing file in the current directory ("!" runs a shell command from the notebook)
!cat ./summary-notes.txt

This is a file

Below are more examples of the reading and writing modes available via the built-in `open()` function:

In [4]:
# Common file modes in Python's open() function:

# "a" - Append - Opens file for appending, creates new file if not exists
with open("summary-notes.txt", "a") as f:
    f.write("append this")

In [5]:
!cat summary-notes.txt

This is a fileappend this
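The appended text lands directly after the existing content because `write()` does not add a newline on its own. A minimal sketch of writing explicit newlines instead, run in a temporary directory so it does not touch the lesson's files (the file name `notes.txt` is hypothetical):

```python
import os
import tempfile

# Work in a throwaway directory; "notes.txt" is a hypothetical file name.
with tempfile.TemporaryDirectory() as tmp_dir:
    notes_path = os.path.join(tmp_dir, "notes.txt")

    with open(notes_path, "w") as f:
        f.write("This is a file\n")  # explicit newline ends the first line

    with open(notes_path, "a") as f:
        f.write("append this\n")  # appended text now starts on its own line

    with open(notes_path, "r") as f:
        lines = f.read().splitlines()

print(lines)
```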

In [8]:
# "x" - Exclusive creation - Opens for writing, fails if file exists

with open("file.txt", "x") as f:
    f.write("new file content")

FileExistsError: [Errno 17] File exists: 'file.txt'
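Because `"x"` mode raises `FileExistsError` when the target already exists, one way to use it safely is to catch that exception. A minimal sketch, where the helper name `create_if_missing` and the file name `demo.txt` are hypothetical:

```python
import os
import tempfile


def create_if_missing(path, content):
    """Create the file with "x" mode; return False if it already exists."""
    try:
        with open(path, "x") as f:
            f.write(content)
        return True
    except FileExistsError:
        # The file is already there, so its contents are left untouched.
        return False


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")
    first_attempt = create_if_missing(demo_path, "new file content")
    second_attempt = create_if_missing(demo_path, "this is never written")
    with open(demo_path, "r") as f:
        final_content = f.read()

print(first_attempt, second_attempt, final_content)
```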

In [9]:
# "+" - Update - combined with another mode, e.g. "r+" opens an existing file for reading and writing
with open("file.txt", "r+") as f:  # Open for both reading and writing
    data = f.read()
    f.write("new data")
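
In `"r+"` mode, reads and writes share a single file position. After `f.read()` the position is at the end of the file, so the write in the cell above lands after the existing content; calling `f.seek(0)` first overwrites from the start instead. A small sketch on a temporary file (the file name `demo.txt` is hypothetical):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")
    with open(demo_path, "w") as f:
        f.write("old data")

    # Read first: the position moves to the end, so the write appends.
    with open(demo_path, "r+") as f:
        f.read()
        f.write(" + new")
    with open(demo_path, "r") as f:
        after_read_then_write = f.read()

    # Seek to the start before writing: existing characters are overwritten.
    with open(demo_path, "r+") as f:
        f.seek(0)
        f.write("NEW")
    with open(demo_path, "r") as f:
        after_seek_then_write = f.read()

print(after_read_then_write)
print(after_seek_then_write)
```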

The nice thing about this is that we can combine AI-generated summaries with reading and writing files in Python to build powerful workflows.

For example, below we write a short summary for each of several files, each containing information about a different paper.

In [12]:
folder_with_papers = "./assets-resources/papers/"
file_names = ["paper1.txt", "paper2.txt", "paper3.txt"]

def summarize_this_paper(paper_contents):
    summary_prompt = f"Summarize this paper in a couple of sentences:\n\n{paper_contents}"
    output_summary = ask_ai(summary_prompt)
    
    return output_summary

paper_summaries_list = []
for file_name in file_names:
    file_path = folder_with_papers + file_name
    with open(file_path, "r") as f:
        contents_of_the_paper = f.read()
    
    paper_summary = summarize_this_paper(contents_of_the_paper)
    paper_summaries_list.append(paper_summary)
            

# Display the markdown content in the notebook
    
markdown_content = "# Paper Summaries\n\n"
for i, summary in enumerate(paper_summaries_list, 1):
    markdown_content += f"## Paper {i}\n\n"
    markdown_content += f"{summary}\n\n"
        
Markdown(markdown_content)

# Paper Summaries

## Paper 1

The paper titled "TheAgent Company: Benchmarking LLM Agents on Consequential Real World Tasks" introduces a new benchmark, TheAgentCompany, aimed at evaluating the efficacy of large language model (LLM)-powered AI agents in completing real-world professional tasks in a simulated software company environment. The research is conducted by a collaborative team, mainly from Carnegie Mellon University and other institutions, and emphasizes the growing presence of AI in work settings.

### Key Points from the Paper:

1. **Motivation**:
   - The rapid advancements in LLMs are prompting questions about AI's potential to automate or assist in various work-related tasks.
   - Understanding AI agents’ capabilities is crucial for businesses considering AI integration and for policymakers assessing AI’s impact on employment.

2. **Benchmark Overview**:
   - TheAgentCompany simulates a software company environment with 175 diverse professional tasks spanning categories like software engineering, project management, and finance.
   - The benchmark allows agents to interact through web browsing, coding, and colleague communication, providing a realistic testing framework.

3. **Performance Findings**:
   - Experiments conducted with several LLMs, including closed (like OpenAI's GPT-4o and Claude) and open-weight models (like Llama), reveal that the top-performing model, Claude-3.5-Sonnet, achieved 24% task completion autonomously, with a score of 34.4% when accounting for partial completions.
   - Despite these advancements, LLM agents struggle significantly with longer, more complex tasks, especially those requiring social interaction and navigation of intricate user interfaces.

4. **Framework and Design**:
   - TheAgentCompany provides a self-hosted and reproducible environment utilizing open-source software.
   - Tasks are structured into parts with defined checkpoints, allowing agents to receive partial credit for incomplete tasks.
   - Evaluators for tasks are tailored to assess not just the success of task completion but also the quality of interactions with simulated colleagues.

5. **Interaction and Collaboration**:
   - A significant component of the benchmark involves the ability of agents to communicate effectively with simulated colleagues within the environment, enhancing the realism and complexity of tasks.

6. **Future Directions**:
   - The paper suggests that while TheAgentCompany provides a foundational step for understanding LLM capabilities in professional settings, there is a need to expand the tasks covered and include more creative or less straightforward tasks.
   - Continuous improvements in LLMs are expected, highlighting their potential for increased efficiency and performance across various domains.

7. **Conclusions**:
   - The research underscores the current limitations of LLM agents in effectively automating diverse professional tasks.
   - The results serve as a litmus test for future developments, pointing towards areas where LLM technology must improve, particularly in tasks involving human-like social interactions and complex decision-making.

In summary, TheAgentCompany represents a significant effort to quantify AI agents' performance in real-world applications and to chart a course for further research in this rapidly evolving field. The study emphasizes the necessity of enhancing both the complexity of tasks and the agents' social interaction capabilities, urging further exploration into LLM agent performance to better align with real-world professional needs.

## Paper 2

The paper presents a novel research area called Automated Design of Agentic Systems (ADAS), which aims to automatically create and optimize agentic systems using Foundation Models. The authors introduce an innovative algorithm called Meta Agent Search, wherein a "meta" agent is tasked with iteratively designing new agents by leveraging code-based representations. Through extensive experiments across various domains (coding, reading comprehension, math), the results demonstrate that agents developed via this automation significantly outperform state-of-the-art hand-designed agents, showcasing robustness across tasks and the potential for ADAS in advancing artificial intelligence research while highlighting the importance of safety in this burgeoning field.

## Paper 3

The paper titled "Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences" introduces EvalGen, a mixed-initiative system designed to enhance the alignment of evaluation metrics generated by Large Language Models (LLMs) with user preferences during the evaluation of LLM outputs. The authors emphasize the challenges posed by the subjective nature of evaluation and the common phenomenon of "criteria drift," where users refine their criteria based on their experiences grading outputs.

EvalGen enables users to generate evaluation criteria and corresponding assertions (either LLM-based or code-based), while also allowing for iterative feedback through user grading. The system utilizes this feedback to select the most aligned assertions, thereby improving the evaluation process. A qualitative study involving industry practitioners highlighted both the positive reception of EvalGen's capabilities and the difficulties posed by criteria drift, indicating that users often need to redefine their evaluation criteria as they interact with outputs.

Key findings suggest that criteria are dynamically dependent on the specific outputs being evaluated, and this highlights the necessity for an iterative design in future LLM evaluation assistants. The authors propose directions for designing these tools, emphasizing the importance of accommodating evolving user standards in interactive settings as well as considering the implications of deploying both code-based and LLM-based assertions in practice. The paper ultimately calls into question the notion of fixed evaluation criteria and encourages ongoing adjustment and refinement throughout the evaluation process.
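
The summaries can also be stored on disk rather than only displayed in the notebook. A minimal sketch, assuming a list of summary strings like `paper_summaries_list`; the placeholder summaries, the helper name `save_summaries`, and the output file name are hypothetical:

```python
import os
import tempfile


def save_summaries(summaries, output_path):
    """Write each summary under its own heading in a single text file."""
    with open(output_path, "w", encoding="utf-8") as f:
        f.write("# Paper Summaries\n\n")
        for i, summary in enumerate(summaries, 1):
            f.write(f"## Paper {i}\n\n{summary}\n\n")


# Placeholder strings stand in for the real model output.
example_summaries = ["Summary of the first paper.", "Summary of the second paper."]

with tempfile.TemporaryDirectory() as tmp_dir:
    output_path = os.path.join(tmp_dir, "paper-summaries.md")
    save_summaries(example_summaries, output_path)
    with open(output_path, "r", encoding="utf-8") as f:
        saved_text = f.read()

print(saved_text)
```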



We can use the same approach to extract specific information from documents. Imagine you have a collection of differently formatted invoices and want to organize their contents by extracting details such as amounts and dates.
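
One possible way to structure such an extraction prompt is sketched below. Everything here is illustrative: the helper names, the prompt wording, and the invoice text are made up, and `fake_ask` is a stand-in so the example runs without a model. In the notebook, a function like `ask_ai` could presumably be passed in its place, assuming it takes a prompt string and returns the reply as text.

```python
def build_extraction_prompt(document_text, fields):
    """Build a prompt asking for the given fields, one bullet per field."""
    field_lines = "\n".join(f"- {field}" for field in fields)
    return (
        "Extract the following fields from the document below.\n"
        "Answer with one bullet point per field, in the form '- field: value'.\n"
        "Write 'not found' if a field is missing.\n\n"
        f"Fields:\n{field_lines}\n\n"
        f"Document:\n{document_text}"
    )


def extract_fields(document_text, fields, ask):
    """Send the prompt through `ask`, any function mapping a prompt to a reply."""
    return ask(build_extraction_prompt(document_text, fields))


def fake_ask(prompt):
    # Stand-in for a real model call so the sketch runs offline.
    return "- amount: 120.00 EUR\n- date: 2024-03-01"


# Made-up invoice text, used only for illustration.
invoice_text = "Invoice 0042, issued 2024-03-01, total due: 120.00 EUR"
prompt = build_extraction_prompt(invoice_text, ["amount", "date"])
bullets = extract_fields(invoice_text, ["amount", "date"], fake_ask)

print(prompt)
print(bullets)
```

Keeping the prompt construction in its own function makes it easy to inspect the exact text sent to the model and to reuse the same field list across many files.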

## Key Takeaways

- Use `with` statements when working with files so they are closed properly, even if an error occurs
- Different file modes serve different purposes:
  - `"r"` for reading
  - `"w"` for writing (creates a new file or overwrites an existing one)
  - `"a"` for appending
  - `"x"` for exclusive creation (fails if the file exists)
  - `"r+"` for reading and writing
- Handle file-related exceptions such as `FileNotFoundError`, `FileExistsError`, and `PermissionError`
- File operations can be combined with data processing for powerful automation
- Consider creating helper functions for common file operations
- Be careful with file paths and permissions
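
Two of these takeaways, handling exceptions and writing helper functions, can be combined. A minimal sketch: `read_text_file` is a hypothetical helper, and returning `None` on failure is just one possible design choice.

```python
import os
import tempfile


def read_text_file(path, encoding="utf-8"):
    """Return the file's text, or None if it cannot be read."""
    try:
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    except FileNotFoundError:
        print(f"File not found: {path}")
    except PermissionError:
        print(f"No permission to read: {path}")
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    # Reading a path that does not exist returns None instead of raising.
    missing = read_text_file(os.path.join(tmp_dir, "does-not-exist.txt"))

    existing_path = os.path.join(tmp_dir, "exists.txt")
    with open(existing_path, "w", encoding="utf-8") as f:
        f.write("hello")
    found = read_text_file(existing_path)

print(missing, found)
```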

In the next lesson, we'll explore how to work with different file formats like CSV and JSON!