In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Standard library
import asyncio
import json

# Third-party
import nest_asyncio
import pandas as pd
from dotenv import load_dotenv
from IPython.display import display, Markdown

# Project modules
from perfeed.data_stores import FeatherStorage
from perfeed.git_providers.github import GithubProvider
from perfeed.llms.ollama_client import OllamaClient
from perfeed.llms.openai_client import OpenAIClient
from perfeed.tools.pr_summarizer import PRSummarizer

load_dotenv()         # load environment variables from a local .env file
nest_asyncio.apply()  # allow asyncio.run() inside the notebook's running event loop

# PR summary

In [3]:
summarizer = PRSummarizer(
    GithubProvider("Perfeed"),
    llm=OpenAIClient("gpt-4o-mini"),
    # llm=OllamaClient("llama3.1"),  # alternative: swap in the Ollama client
)
# Summarize PR 8 of the perfeed repository
pr_summary, metadata = asyncio.run(summarizer.run("perfeed", 8))

{    "type": ["Enhancement"],    "title": "Add BaseClient for llm interface with OpenAI and Ollama impl",    "description": "This PR introduces a new `BaseClient` interface for LLM clients, along with implementations for `OpenAIClient` and `OllamaClient`. The `OllamaClient` allows for chat completions using the Ollama API, while the `OpenAIClient` integrates with OpenAI's API, ensuring flexibility for various LLM clients. Additionally, it updates the `poetry.lock` and `pyproject.toml` files to include the necessary dependencies.",    "pr_files": [        {            "filename": "poetry.lock",            "language": "TOML",            "changes_summary": "Added dependencies for `ollama` and updated existing dependencies.",            "changes_title": "Update dependencies for LLM clients",            "label": "dependencies"        },        {            "filename": "pyproject.toml",            "language": "TOML",            "changes_summary": "Included `ollama` as a dependency for the pr

In [4]:
# gpt-4o-mini
json.loads(pr_summary.model_dump_json())

{'type': ['Enhancement'],
 'title': 'Add BaseClient for llm interface with OpenAI and Ollama impl',
 'description': "This PR introduces a new `BaseClient` interface for LLM clients, along with implementations for `OpenAIClient` and `OllamaClient`. The `OllamaClient` allows for chat completions using the Ollama API, while the `OpenAIClient` integrates with OpenAI's API, ensuring flexibility for various LLM clients. Additionally, it updates the `poetry.lock` and `pyproject.toml` files to include the necessary dependencies.",
 'pr_files': [{'filename': 'poetry.lock',
   'language': 'TOML',
   'changes_summary': 'Added dependencies for `ollama` and updated existing dependencies.',
   'changes_title': 'Update dependencies for LLM clients',
   'label': 'dependencies'},
  {'filename': 'pyproject.toml',
   'language': 'TOML',
   'changes_summary': 'Included `ollama` as a dependency for the project.',
   'changes_title': 'Add ollama dependency',
   'label': 'dependencies'},
  {'filename': 'src/

In [None]:
# llama3.1
json.loads(pr_summary.model_dump_json())

{'type': ['Enhancement'],
 'title': 'Add BaseClient for llm interface with OpenAI and Ollama impl',
 'description': 'This pull request adds a base client class for interacting with various language models (LLMs) like OpenAI and Ollama. It includes two new classes, `OpenAIClient` and `OllamaClient`, which inherit from the `BaseClient` class. These clients provide an interface to interact with their respective LLMs.',
 'pr_files': [{'filename': 'src/llms/base_client.py',
   'language': 'Python',
   'changes_summary': 'Added a base client class for interacting with various language models (LLMs). Two new classes, `OpenAIClient` and `OllamaClient`, were added to inherit from the `BaseClient` class.',
   'changes_title': 'Add Base Client Class',
   'label': 'Enhancement'},
  {'filename': 'src/llms/openai_client.py',
   'language': 'Python',
   'changes_summary': 'Added an OpenAI client class that inherits from the `BaseClient` class. It provides an interface to interact with the OpenAI API.

# Weekly summary

In [3]:
summarizer = PRSummarizer(
    GithubProvider("Perfeed"), 
    llm=OllamaClient("llama3.1")
)
fs = FeatherStorage(data_type="pr_summary", overwrite=False, append=True)
# for pr in [11, 12, 13, 14]:
#     pr_summary, metadata = asyncio.run(summarizer.run("perfeed", pr))
#     fs.save(data=pr_summary, metadata=metadata)
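The commented-out loop above summarizes one PR at a time. Because `summarizer.run` is a coroutine, several PRs could in principle be summarized concurrently with `asyncio.gather`. The sketch below shows only the pattern: `fake_summarize` is a hypothetical stand-in for `summarizer.run`, and whether the real providers tolerate concurrent calls (rate limits, local model throughput) is an assumption that has not been checked here.

```python
import asyncio


async def fake_summarize(repo: str, pr_number: int) -> tuple[dict, dict]:
    """Hypothetical stand-in for summarizer.run; returns (summary, metadata)."""
    await asyncio.sleep(0)  # placeholder for network and LLM latency
    return {"title": f"PR {pr_number}"}, {"repo": repo, "pr_number": pr_number}


async def summarize_all(repo: str, pr_numbers: list[int]) -> list[tuple[dict, dict]]:
    # gather schedules all coroutines at once and returns results in input order
    return await asyncio.gather(*(fake_summarize(repo, n) for n in pr_numbers))


# In a notebook, asyncio.run() needs nest_asyncio.apply() to have been called first.
results = asyncio.run(summarize_all("perfeed", [11, 12, 13, 14]))
for summary, metadata in results:
    print(metadata["pr_number"], summary["title"])
```

Each `(summary, metadata)` pair could then be passed to `fs.save` in the same way as in the sequential loop.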

In [4]:
df = fs.load()
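# Keep only the most recent summary per (PR, provider, model) combination
rank = df.groupby(['pr_number', 'llm_provider', 'model'])['created_at'].rank(ascending=False)
df = df[rank == 1].copy()  # copy so the column assignment below does not trigger SettingWithCopyWarning
# Time from PR creation to merge, in days (86400 seconds per day)
df['review_interval'] = (pd.to_datetime(df['pr_merged_at']) - pd.to_datetime(df['pr_created_at'])).dt.total_seconds() / 86400
# Restrict the weekly summary to a single author
df = df[df['author'] == 'jzxcd']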
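The cell above keeps the newest stored summary for each (PR, provider, model) combination and measures how long each PR stayed open, in days. As a minimal sketch of the same idea on made-up data, the block below uses `sort_values` plus `drop_duplicates` in place of the `rank` filter. The rows and timestamps are invented for illustration and are not what `fs.load()` returns.

```python
import pandas as pd

# Toy records; column names mirror the notebook, values are made up.
toy = pd.DataFrame(
    {
        "pr_number": [13, 13, 14],
        "llm_provider": ["ollama", "ollama", "ollama"],
        "model": ["llama3.1", "llama3.1", "llama3.1"],
        "created_at": [
            "2024-10-28T10:00:00Z",
            "2024-10-29T10:00:00Z",
            "2024-10-29T11:00:00Z",
        ],
        "pr_created_at": [
            "2024-10-26T00:00:00Z",
            "2024-10-26T00:00:00Z",
            "2024-10-27T00:00:00Z",
        ],
        "pr_merged_at": [
            "2024-10-27T12:00:00Z",
            "2024-10-27T12:00:00Z",
            "2024-10-27T06:00:00Z",
        ],
    }
)
toy["created_at"] = pd.to_datetime(toy["created_at"])

# Keep the newest row per key: sort by time, then keep the last duplicate.
keys = ["pr_number", "llm_provider", "model"]
latest = toy.sort_values("created_at").drop_duplicates(subset=keys, keep="last").copy()

# Review interval in days: seconds between creation and merge, divided by 86400.
latest["review_interval"] = (
    pd.to_datetime(latest["pr_merged_at"]) - pd.to_datetime(latest["pr_created_at"])
).dt.total_seconds() / 86400

print(latest[["pr_number", "review_interval"]])
```

Unlike `rank(...) == 1`, which can keep several rows when timestamps tie, `drop_duplicates` always keeps exactly one row per key.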

In [5]:
from jinja2 import Environment, StrictUndefined
from perfeed.config_loader import settings
from perfeed.utils.utils import json_output_curator

llm_llama = OllamaClient("llama3.1")
llm_openai = OpenAIClient("gpt-4o-mini")

In [6]:

variables = {
    "pr_summaries": df.to_json(orient='records'),
}
environment = Environment(undefined=StrictUndefined)
system_prompt = environment.from_string(
    settings.weekly_summary_prompt.system
).render(variables)
user_prompt = environment.from_string(
    settings.weekly_summary_prompt.user
).render(variables)
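The prompts above are Jinja2 templates rendered with `StrictUndefined`, so a variable that the template references but `variables` does not supply raises an error instead of silently rendering as an empty string. A small sketch of that behavior follows; `template_text` is a hypothetical template, not the content of `settings.weekly_summary_prompt`.

```python
from jinja2 import Environment, StrictUndefined
from jinja2.exceptions import UndefinedError

environment = Environment(undefined=StrictUndefined)

# Hypothetical template text, for illustration only.
template_text = "Summarize these PRs as markdown:\n{{ pr_summaries }}"

# All referenced variables are supplied, so rendering succeeds.
rendered = environment.from_string(template_text).render(
    {"pr_summaries": '[{"pr_number": 13}]'}
)
print(rendered)

# A missing variable fails loudly under StrictUndefined.
try:
    environment.from_string(template_text).render({})
    missing_raises = False
except UndefinedError:
    missing_raises = True
print("missing variable raises:", missing_raises)
```

With the default `Undefined` class, the second render would instead produce a prompt with an empty PR list, which is harder to notice.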


In [7]:
# llama3.1
summary = llm_llama.chat_completion(system_prompt, user_prompt)
Markdown(summary)

**PR Summary for the Week**

This week, jzxcd merged 2 PRs that addressed critical issues in Perfeed.

### Overview
The two PRs focused on bug fixes and enhancements to improve performance and functionality.

* **PR 13:** This PR fixed inconsistencies in the `pr_summarizer` output and added asyncio support. It also restructured the `utils` module.
* **PR 14:** This PR added support for feather and SQL DB storage, along with save and load functionality.

### Significant Changes
#### PR 13:
* Renamed `tokenizer.py` to `utils.py` and refactored the `utils` module.
* Added asyncio support to the `pr_summarizer`.
* Refactored the `utils` module to improve maintainability.

#### PR 14:
* Added support for feather and SQL DB storage.
* Introduced save and load functionality.
* Moved the `validate_and_convert` function to the `BaseStorage` class.

### Noteworthy Refactors/Architecture
None

### Review Process
Both PRs had thorough discussions on GitHub, with multiple code changes and suggestions from contributors. The issues were addressed, and the PRs were merged successfully.

**PR 13:**

* Created on: 2024-10-26T23:48:11Z
* Merged on: 2024-10-28T16:13:24Z
* Review interval: 1.6841782407

**PR 14:**

* Created on: 2024-10-26T23:48:11Z
* Merged on: 2024-10-28T16:13:24Z
* Review interval: 1.6841782407

In [9]:
# gpt-4o-mini
summary = llm_openai.chat_completion(system_prompt, user_prompt)
Markdown(summary)

### PR Summary for the Week

This week, the team focused on bug fixes and enhancements, merging two significant pull requests.

---

#### Overview:
- The first PR addressed issues with the `pr_summarizer`, ensuring proper JSON output and adding asyncio support. The second PR introduced support for feather and SQL database storage, along with save and load functionalities.

---

#### Significant Changes:
1. **PR Summarizer Fixes and Improvements**:
   - Fixed JSON output formatting in the `pr_summarizer`.
   - Added asyncio support to enhance performance.
   - Refactored the `utils` module to improve structure and maintainability.

2. **Datastore Enhancements**:
   - Introduced `BaseStorage` class for handling different storage types.
   - Added `FeatherStorage` and `SQLStorage` classes for feather and SQL database support, respectively.
   - Updated `.gitignore` to exclude unnecessary files.

---

#### Refactors/Architecture:
- The `utils` module was restructured, including renaming `tokenizer.py` to `utils.py` and adding the `_output_curator` function.
- The `DataStore` class factory function was removed to simplify the architecture.
- The `validate_and_convert` function was moved to the `BaseStorage` class, promoting better organization of storage-related logic.

---

#### Review Process:
- For the `pr_summarizer` PR, there were discussions around maintaining async functionality in the `PRSummarizer.run` method, which led to the addition of async support.
- In the datastore PR, feedback included the addition of the `PRSummaryMetadata` class and discussions on the `OllamaClient` class enhancements.
- The author addressed all comments effectively, leading to both PRs being merged within 4 hours and 1.5 days, respectively, after submission.

--- 

This summary highlights the team's commitment to improving code quality and functionality, aligning with sprint objectives focused on bug fixes and performance enhancements.

In [10]:
summary

"### PR Summary for the Week\n\nThis week, the team focused on bug fixes and enhancements, merging two significant pull requests.\n\n---\n\n#### Overview:\n- The first PR addressed issues with the `pr_summarizer`, ensuring proper JSON output and adding asyncio support. The second PR introduced support for feather and SQL database storage, along with save and load functionalities.\n\n---\n\n#### Significant Changes:\n1. **PR Summarizer Fixes and Improvements**:\n   - Fixed JSON output formatting in the `pr_summarizer`.\n   - Added asyncio support to enhance performance.\n   - Refactored the `utils` module to improve structure and maintainability.\n\n2. **Datastore Enhancements**:\n   - Introduced `BaseStorage` class for handling different storage types.\n   - Added `FeatherStorage` and `SQLStorage` classes for feather and SQL database support, respectively.\n   - Updated `.gitignore` to exclude unnecessary files.\n\n---\n\n#### Refactors/Architecture:\n- The `utils` module was restructu