In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from perfeed.tools.pr_summarizer import PRSummarizer
from perfeed.git_providers.github import GithubProvider
from perfeed.llms.ollama_client import OllamaClient
from perfeed.llms.openai_client import OpenAIClient
from perfeed.data_stores import FeatherStorage
import pandas as pd
import asyncio
import nest_asyncio
import json

from dotenv import load_dotenv
load_dotenv()
nest_asyncio.apply()

from IPython.display import display, Markdown

# PR summary

In [13]:
summarizer = PRSummarizer(
    GithubProvider("Perfeed"),
    llm=OpenAIClient("gpt-4o-mini"),
    # llm=OllamaClient("llama3.1"),  # alternative backend
)
# Summarize PR 8 of "perfeed"; run() is a coroutine, hence asyncio.run.
pr_summary, metadata = asyncio.run(summarizer.run("perfeed", 8))

{
    "type": ["Enhancement"],
    "title": "Add BaseClient for llm interface with OpenAI and Ollama impl",
    "description": "This PR introduces a new `BaseClient` interface for LLM clients, along with implementations for `OpenAIClient` and `OllamaClient`. This structure allows for more flexible integration of various LLM clients. Additionally, it updates the `pyproject.toml` and `poetry.lock` files to include the new `ollama` package.",
    "pr_files": [
        {
            "filename": "poetry.lock",
            "language": "plaintext",
            "changes_summary": "Added `ollama` package and updated dependencies for `openai` and `scikit_learn`.",
            "changes_title": "Update dependencies and add ollama package",
            "label": "dependencies"
        },
        {
            "filename": "pyproject.toml",
            "language": "plaintext",
            "changes_summary": "Included `ollama` package in the project dependencies.",
            "changes_title": "Add ollama package to p

In [14]:
# 4o-mini
json.loads(pr_summary.model_dump_json())

{'type': ['Enhancement'],
 'title': 'Add BaseClient for llm interface with OpenAI and Ollama impl',
 'description': 'This PR introduces a new `BaseClient` interface for LLM clients, along with implementations for `OpenAIClient` and `OllamaClient`. This structure allows for more flexible integration of various LLM clients. Additionally, it updates the `pyproject.toml` and `poetry.lock` files to include the new `ollama` package.',
 'pr_files': [{'filename': 'poetry.lock',
   'language': 'plaintext',
   'changes_summary': 'Added `ollama` package and updated dependencies for `openai` and `scikit_learn`.',
   'changes_title': 'Update dependencies and add ollama package',
   'label': 'dependencies'},
  {'filename': 'pyproject.toml',
   'language': 'plaintext',
   'changes_summary': 'Included `ollama` package in the project dependencies.',
   'changes_title': 'Add ollama package to project dependencies',
   'label': 'dependencies'},
  {'filename': 'src/llms/base_client.py',
   'language': 'py

In [15]:
# llama 3.1
json.loads(pr_summary.model_dump_json())

{'type': ['Enhancement'],
 'title': 'Add BaseClient for llm interface with OpenAI and Ollama impl',
 'description': 'This PR introduces a new `BaseClient` interface for LLM clients, along with implementations for `OpenAIClient` and `OllamaClient`. This structure allows for more flexible integration of various LLM clients. Additionally, it updates the `pyproject.toml` and `poetry.lock` files to include the new `ollama` package.',
 'pr_files': [{'filename': 'poetry.lock',
   'language': 'plaintext',
   'changes_summary': 'Added `ollama` package and updated dependencies for `openai` and `scikit_learn`.',
   'changes_title': 'Update dependencies and add ollama package',
   'label': 'dependencies'},
  {'filename': 'pyproject.toml',
   'language': 'plaintext',
   'changes_summary': 'Included `ollama` package in the project dependencies.',
   'changes_title': 'Add ollama package to project dependencies',
   'label': 'dependencies'},
  {'filename': 'src/llms/base_client.py',
   'language': 'py

# weekly

In [12]:
summarizer = PRSummarizer(
    GithubProvider("Perfeed"), 
    llm=OllamaClient("llama3.1")
)
fs = FeatherStorage(data_type="pr_summary", overwrite=False, append=True)
# for pr in [11, 12, 13, 14]:
#     pr_summary, metadata = asyncio.run(summarizer.run("perfeed", pr))
#     fs.save(data=pr_summary, metadata=metadata)

In [4]:
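df = fs.load()
# Keep only the most recent summary per (PR, provider, model).
rank = df.groupby(['pr_number', 'llm_provider', 'model'])['created_at'].rank(ascending=False)
df = df[rank == 1].copy()  # copy so the column assignment below does not write to a slice
# Hours between PR creation and merge.
df['review_interval'] = (pd.to_datetime(df['pr_merged_at']) - pd.to_datetime(df['pr_created_at'])).dt.total_seconds() / 3600
df = df[df['author'] == 'jzxcd']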
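
The filtering in the cell above can be checked on a small frame. This is a minimal sketch: the rows, dates, and provider values are hypothetical and are not taken from the stored summaries; only the column names come from the cell above.

```python
import pandas as pd

# Hypothetical rows: PR 11 was summarized twice by the same model.
toy = pd.DataFrame(
    {
        "pr_number": [11, 11, 12],
        "llm_provider": ["ollama", "ollama", "ollama"],
        "model": ["llama3.1", "llama3.1", "llama3.1"],
        "created_at": pd.to_datetime(
            ["2024-09-01 10:00", "2024-09-02 10:00", "2024-09-01 12:00"]
        ),
        "pr_created_at": [
            "2024-08-30T08:00:00Z",
            "2024-08-30T08:00:00Z",
            "2024-08-29T09:00:00Z",
        ],
        "pr_merged_at": [
            "2024-08-30T11:00:00Z",
            "2024-08-30T11:00:00Z",
            "2024-08-30T09:00:00Z",
        ],
    }
)

# Rank summaries within each (PR, provider, model) group, newest first.
rank = toy.groupby(["pr_number", "llm_provider", "model"])["created_at"].rank(
    ascending=False
)
latest = toy[rank == 1].copy()

# Review interval in hours, computed the same way as in the cell above.
latest["review_interval"] = (
    pd.to_datetime(latest["pr_merged_at"]) - pd.to_datetime(latest["pr_created_at"])
).dt.total_seconds() / 3600

print(latest[["pr_number", "created_at", "review_interval"]])
```

One caveat: `rank` uses `method="average"` by default, so two summaries with an identical `created_at` in the same group would both get rank 1.5 and both be dropped by `rank == 1`. Passing `method="first"` is one way to keep exactly one row in that case.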

In [10]:
from jinja2 import Environment, StrictUndefined
from perfeed.config_loader import settings
from perfeed.utils.utils import json_output_curator

llm_llama = OllamaClient("llama3.1")
llm_openai = OpenAIClient("gpt-4o-mini")

In [8]:

variables = {
    "pr_summaries": df.to_json(orient='records'),
}
environment = Environment(undefined=StrictUndefined)
system_prompt = environment.from_string(
    settings.weekly_summary_prompt.system
).render(variables)
user_prompt = environment.from_string(
    settings.weekly_summary_prompt.user
).render(variables)
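
A short sketch of what `StrictUndefined` does during rendering. The template string here is hypothetical; the real prompts come from `settings.weekly_summary_prompt` and are not shown in this notebook.

```python
from jinja2 import Environment, StrictUndefined
from jinja2.exceptions import UndefinedError

environment = Environment(undefined=StrictUndefined)

# Hypothetical template text standing in for the configured prompt.
template = environment.from_string("Summarize these PRs:\n{{ pr_summaries }}")

# With the variable supplied, rendering substitutes it as-is.
rendered = template.render({"pr_summaries": '[{"pr_number": 11}]'})
print(rendered)

# With the variable missing, StrictUndefined raises instead of rendering "".
error_message = None
try:
    template.render({})
except UndefinedError as exc:
    error_message = str(exc)
    print(f"render failed: {error_message}")
```

With the default `Undefined`, a missing `pr_summaries` would render as an empty string and the LLM would receive a prompt with no data, so failing loudly is probably the safer choice here.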


In [9]:
# llama3.1
summary = llm_llama.chat_completion(system_prompt, user_prompt)
Markdown(summary)

**PR Summary for the Week**

### Overview of PRs

This week, there were two pull requests (PRs) merged into the Perfeed repository.

#### PR 1: Fixing pr_summarizer for Proper Format and Asyncio Run
Type: Bug fix
Title: fix pr_summarizer for proper format and asyncio run
Description: This PR fixes the `pr_summarizer` to properly output JSON format and adds asyncio support. It also restructures the utils.

#### PR 2: Adding Support for Feather and SQL DB Storage
Type: Bug fix
Title: Datastore
Description: Added support for feather and SQL DB storage. Also added save and load functionality.

### Significant Changes

**PR 1**

* **Bug Fix**: Updated `pr_summarizer` to properly output JSON format and added asyncio support.
* **Enhancement**: Restructured the utils module, added _output_curator function to the utils module, and renamed tokenizer.py to utils.py.

**PR 2**

* **Bug Fix**: Added support for feather and SQL DB storage, and added save and load functionality.
* **Discussion Points**:
	+ Discussion on async function and PRSummarizer.run()
	+ Discussion on PRSummaryMetadata class
	+ Discussion on OllamaClient class
	+ Discussion on DataStore class and factory function
	+ Discussion on BaseStorage class and return type
	+ Discussion on validate_and_convert function

### Review Interval

The review interval for both PRs was within a reasonable timeframe.

### Conclusion

Two important PRs were merged into the Perfeed repository this week, fixing issues with `pr_summarizer` and adding support for feather and SQL DB storage. The discussions on these PRs provided valuable insights and suggestions from team members.

In [11]:
# gpt-4o-mini
summary = llm_openai.chat_completion(system_prompt, user_prompt)
Markdown(summary)

### Summary of Recent PRs

This week, the team merged 2 PRs focused on bug fixes and enhancements to the `pr_summarizer` and datastore functionalities.

---

**Overview:**
- The first PR addressed issues with the `pr_summarizer`, ensuring proper JSON output and adding asyncio support. The second PR introduced support for feather and SQL database storage, along with save and load functionalities.

---

**Significant Changes:**
- **PR Summarizer Fixes:**
  - Fixed JSON output formatting and added asyncio support to enhance performance.
  - Restructured utility functions for better organization.
  
- **Datastore Enhancements:**
  - Introduced a base class for storage handlers, allowing for easier management of different storage types.
  - Added classes for feather and SQL database storage, enabling flexible data handling.

---

**Refactors/Architecture:**
- Renamed `tokenizer.py` to `utils.py` to better reflect its purpose.
- Updated the `.gitignore` file to exclude unnecessary data files.
- Created an `__init__.py` file for the data stores to facilitate package management.

---

**Review Process:**
- For the `pr_summarizer` PR, discussions included the use of the async keyword and the purpose of the `PRSummaryMetadata` class, with clarifications provided by the author, jzxcd.
- In the datastore PR, user chihangwang raised several questions regarding the design choices, such as the factory function and return types. jzxcd engaged in discussions, agreeing to some suggestions while justifying others based on usability.
- Both PRs were merged efficiently, with the first PR taking approximately 4 hours and the second around 40 hours to address feedback and finalize changes.

--- 

This summary aligns with our sprint objectives by enhancing the functionality and performance of our tools, ensuring a more robust and maintainable codebase.