<a href="https://colab.research.google.com/github/louisbrulenaudet/ragoon/blob/main/RAGoon%20%3A%20Improve%20Large%20Language%20Models%20retrieval%20using%20dynamic%20web-search.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAGoon : Improve Large Language Models retrieval using dynamic web-search ⚡
[![Python](https://img.shields.io/pypi/pyversions/tensorflow.svg)](https://badge.fury.io/py/tensorflow) [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) ![Maintainer](https://img.shields.io/badge/maintainer-@louisbrulenaudet-blue)

![Plot](https://github.com/louisbrulenaudet/ragoon/blob/main/thumbnail.png?raw=true)

RAGoon is a Python library that aims to improve the performance of language models by supplying contextually relevant information through retrieval-based querying, web scraping, and data augmentation. It integrates several APIs so that users can retrieve information from the web, enrich it with domain-specific knowledge, and feed it to a language model for better-informed responses.

RAGoon's core functionality revolves around the concept of few-shot learning, where language models are provided with a small set of high-quality examples to enhance their understanding and generate more accurate outputs. By curating and retrieving relevant data from the web, RAGoon equips language models with the necessary context and knowledge to tackle complex queries and generate insightful responses.
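To make the idea of "equipping a model with context" concrete, here is a minimal sketch of how retrieved text snippets could be assembled into a single prompt. It is not RAGoon's internal implementation: the function name `build_prompt`, the `max_chars` budget, and the prompt wording are all assumptions chosen for illustration.

```python
from typing import Sequence


def build_prompt(query: str, snippets: Sequence[str], max_chars: int = 4000) -> str:
    """Assemble a prompt that places retrieved snippets ahead of the question.

    Hypothetical helper for illustration only; not part of the RAGoon API.
    """
    parts = []
    used = 0
    for index, snippet in enumerate(snippets, start=1):
        # Collapse runs of whitespace left over from HTML extraction.
        cleaned = " ".join(snippet.split())
        if not cleaned:
            continue
        block = f"[Source {index}]\n{cleaned}"
        # Stop once the character budget would be exceeded.
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    context = "\n\n".join(parts) if parts else "(no context retrieved)"
    return (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    demo = build_prompt(
        "How do I do a left join in polars?",
        ["DataFrame.join(other,\n on,\n how='left')"],
    )
    print(demo)
```

Numbering the sources keeps each snippet traceable in the model's answer, and the character budget is a crude stand-in for a token limit.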

## Usage Example
Here's an example of how to use RAGoon:

```python
from groq import Groq
# from openai import OpenAI
from ragoon import RAGoon

# Initialize RAGoon instance
ragoon = RAGoon(
    google_api_key="your_google_api_key",
    google_cx="your_google_cx",
    completion_client=Groq(api_key="your_groq_api_key")
)

# Search and get results
query = "I want to do a left join in python polars"
results = ragoon.search(
    query=query,
    completion_model="Llama3-70b-8192",
    max_tokens=512,
    temperature=1,
)

# Print results
print(results)
```

## Citing this project
If you use this code in your research, please use the following BibTeX entry.

```BibTeX
@misc{louisbrulenaudet2024,
	author = {Louis Brulé Naudet},
	title = {RAGoon : Improve Large Language Models retrieval using dynamic web-search},
	howpublished = {\url{https://github.com/louisbrulenaudet/ragoon}},
	year = {2024}
}
```

## Feedback
If you have any feedback, please reach out at [louisbrulenaudet@icloud.com](mailto:louisbrulenaudet@icloud.com).

# Installation

In [1]:
!pip3 install ragoon groq openai beautifulsoup4 httpx google-api-python-client

Collecting ragoon
  Downloading ragoon-0.0.3-py3-none-any.whl (14 kB)
Collecting groq
  Downloading groq-0.8.0-py3-none-any.whl (105 kB)
Collecting openai
  Downloading openai-1.30.3-py3-none-any.whl (320 kB)
Collecting httpx
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)

# Configuration

In [2]:
from google.colab import userdata
from groq import Groq
from ragoon import RAGoon

# Usage

In [3]:
# Initialize RAGoon instance
ragoon = RAGoon(
    google_api_key=userdata.get("google_api_key"),
    google_cx=userdata.get("google_cx"),
    completion_client=Groq(api_key=userdata.get("groq_api_key"))
)

# Search and get results
query = "I want to do a left join in python polars"
results = ragoon.search(
    query=query,
    completion_model="Llama3-70b-8192",
    max_tokens=512,
    temperature=1,
)

# Print results
print(results)

An error occurred while scraping https://towardsdatascience.com/understand-polars-lack-of-indexes-526ea75e413: Redirect response '307 Temporary Redirect' for url 'https://towardsdatascience.com/understand-polars-lack-of-indexes-526ea75e413'
Redirect location: 'https://medium.com/m/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Funderstand-polars-lack-of-indexes-526ea75e413'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/307
An error occurred while scraping https://www.reddit.com/r/Python/comments/ululk1/i_used_a_new_dataframe_library_polars_to_wrangle/: Client error '403 Blocked' for url 'https://www.reddit.com/r/Python/comments/ululk1/i_used_a_new_dataframe_library_polars_to_wrangle/'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403
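
The messages above suggest that pages which fail to download are reported and skipped, while the remaining results are still returned. Below is a minimal sketch of that skip-and-continue pattern. It is not RAGoon's actual code: `scrape_all` and the injected `fetch` callable are hypothetical names. In practice `fetch` would wrap an HTTP client. Note that httpx does not follow redirects unless `follow_redirects=True` is set, which may explain the 307 error, whereas a 403 typically means the site refuses automated requests.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def scrape_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
) -> Tuple[List[str], Dict[str, str]]:
    """Fetch every URL, keeping successes and recording failures separately.

    Hypothetical helper for illustration only; not part of the RAGoon API.
    """
    pages: List[str] = []
    failures: Dict[str, str] = {}
    for url in urls:
        try:
            pages.append(fetch(url))
        except Exception as error:  # e.g. a 307 redirect or a 403 response
            # Record the failure and move on instead of aborting the search.
            failures[url] = str(error)
    return pages, failures


if __name__ == "__main__":
    def fake_fetch(url: str) -> str:
        # Stand-in for a real HTTP request so the sketch runs offline.
        if "blocked" in url:
            raise RuntimeError("Client error '403 Blocked'")
        return f"contents of {url}"

    pages, failures = scrape_all(
        ["https://ok.example", "https://blocked.example"], fake_fetch
    )
    print(pages)
    print(failures)
```

Passing the fetcher in as a callable keeps the error-handling logic testable without network access.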

['Python API reference\nManipulation/selection\npolars.DataF...\npolars.DataFrame.join\n#\nDataFrame.\njoin\n(\nother\n:\nDataFrame\n,\non\n:\nstr\n|\nExpr\n|\nSequence\n[\nstr\n|\nExpr\n]\n|\nNone\n=\nNone\n,\nhow\n:\nJoinStrategy\n=\n\'inner\'\n,\n*\n,\nleft_on\n:\nstr\n|\nExpr\n|\nSequence\n[\nstr\n|\nExpr\n]\n|\nNone\n=\nNone\n,\nright_on\n:\nstr\n|\nExpr\n|\nSequence\n[\nstr\n|\nExpr\n]\n|\nNone\n=\nNone\n,\nsuffix\n:\nstr\n=\n\'_right\'\n,\nvalidate\n:\nJoinValidation\n=\n\'m:m\'\n,\njoin_nulls\n:\nbool\n=\nFalse\n,\ncoalesce\n:\nbool\n|\nNone\n=\nNone\n,\n)\n→\nDataFrame\n[source]\n#\nJoin in SQL-like fashion.\nParameters\n:\nother\nDataFrame to join with.\non\nName(s) of the join columns in both DataFrames.\nhow\n{‘inner’, ‘left’, ‘full’, ‘semi’, ‘anti’, ‘cross’}\nJoin strategy.\ninner\nReturns rows that have matching values in both tables\nleft\nReturns all rows from the left table, and the matched rows from the\nright table\nfull\nReturns all rows when there is a match in eit