[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/generation/generative-qa/openai/gen-qa-openai/gen-qa-openai.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/generation/generative-qa/openai/gen-qa-openai/gen-qa-openai.ipynb)

# Retrieval Enhanced Generative Question Answering with OpenAI

#### Fixing LLMs that Hallucinate

In this notebook we will learn how to query relevant contexts to our queries from Pinecone, and pass these to a generative OpenAI model to generate an answer backed by real data sources. Required installs for this notebook are:

In [9]:
!pip install -qU openai pinecone-client datasets tqdm

In [36]:
import os
import openai

# get API key from top-right dropdown on OpenAI website
openai.api_key = "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"
openai.Engine.list()  # check we have authenticated

<OpenAIObject list at 0x7f18cdeabf10> JSON: {
  "data": [
    {
      "created": null,
      "id": "whisper-1",
      "object": "engine",
      "owner": "openai-internal",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "babbage",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "davinci",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "text-davinci-edit-001",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "babbage-code-search-code",
      "object": "engine",
      "owner": "openai-dev",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "text-similarity-babbage-001",
      "object": "engine",
      "owner"

For many questions *state-of-the-art (SOTA)* LLMs are more than capable of answering correctly.

In [16]:
query = "IPL 2020 winner?"

# now query text-davinci-003 WITHOUT context
res = openai.Completion.create(
    engine='text-davinci-003',
    prompt=query,
    temperature=0,
    max_tokens=400,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None
)

res['choices'][0]['text'].strip()

'The Mumbai Indians won the 2020 Indian Premier League (IPL) title, defeating the Delhi Capitals by five wickets in the final.'

In [8]:
import pandas as pd
from sqlalchemy import create_engine, text

In [6]:
db=create_engine('sqlite:///:memory')

In [11]:
csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# using the attribute information as the column names
col_names = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width','Class']
iris =  pd.read_csv(csv_url, names = col_names)

However, that isn't always the case. Let's first rewrite the above into a simple function so we're not rewriting this every time.

In [12]:
iris.to_sql(name='iris1',con=db)

150

In [13]:
def query_db(sql_statement):
  with db.connect() as conn:
    response=conn.execute(text(sql_statement))
    return response.all()

In [16]:
query_db('SELECT * FROM iris1 limit 5')

[(0, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
 (1, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
 (2, 4.7, 3.2, 1.3, 0.2, 'Iris-setosa'),
 (3, 4.6, 3.1, 1.5, 0.2, 'Iris-setosa'),
 (4, 5.0, 3.6, 1.4, 0.2, 'Iris-setosa')]

In [39]:
','.join(iris.columns)

'Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Class'

In [20]:
def sql_translate(sql_query_text):
  response=openai.Completion.create(model="text-davinci-003",
        prompt="### sqlite sql table, with their properties:\n#\n# iris1(Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Class)\n#\n### A query to answer : {sql_query_text}\nSELECT",
        temperature=0,
        max_tokens=250,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=["#",";"]
  )
  return 'SELECT' + response['choices'][0]['text']

In [22]:
sql_translate('count for Iris setosa class')

"SELECT * FROM iris1 WHERE Class = 'Iris-setosa'"

In [59]:
messages = [{"role": "system",
                    "content": "You are a strict assistant, transalating natural language to sql queries."},
            {"role": "user",
                    "content": "### sqlite sql table, with their properties:\n#\n# iris1(Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Class)\n#\n### A query to answer : {sql_query_text}\nSELECT"},
            {"role": "user",
                    "content": "all distinct class "}]
response=openai.ChatCompletion.create(model="gpt-3.5-turbo",
                                        messages=messages,
        #prompt="### sqlite sql table, with their properties:\n#\n# iris1(Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Class)\n#\n### A query to answer : {sql_query_text}\nSELECT",
        temperature=0.1,
        max_tokens=250,
        
  )
print(response)
print(response['choices'][0]['message']['content'])
 

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "To select all distinct classes from the `iris1` table, the SQL query would be:\n\n```\nSELECT DISTINCT Class FROM iris1;\n``` \n\nThis will return a list of all unique values in the `Class` column of the `iris1` table.",
        "role": "assistant"
      }
    }
  ],
  "created": 1685612784,
  "id": "chatcmpl-7MZAuMhhixXPLSbXTJHhJWe2ybVHD",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 53,
    "prompt_tokens": 80,
    "total_tokens": 133
  }
}
To select all distinct classes from the `iris1` table, the SQL query would be:

```
SELECT DISTINCT Class FROM iris1;
``` 

This will return a list of all unique values in the `Class` column of the `iris1` table.


In [60]:
query_db("SELECT DISTINCT Class FROM iris1")

[('Iris-setosa',), ('Iris-versicolor',), ('Iris-virginica',)]

In [33]:
import os
import openai
openai.api_key = "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"

In [35]:
','.join(iris.columns)

'Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Class'

In [117]:
messages = [{"role": "system",
                    "content": "You are a strict assistant, transalating natural language to sql queries."}]
response=openai.ChatCompletion.create(model="gpt-3.5-turbo",
                                        messages=messages,
        #prompt="### sqlite sql table, with their properties:\n#\n# iris1(Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Class)\n#\n### A query to answer : {sql_query_text}\nSELECT",
        temperature=0.1,
        max_tokens=250,
        
  )
print(response)
print(response['choices'][0]['message']['content'])
 

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Sure, I can help you with that. What do you need translated?",
        "role": "assistant"
      }
    }
  ],
  "created": 1685562109,
  "id": "chatcmpl-7MLzZ8JLL3nEHsIKApjkhrEPPw74Y",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 15,
    "prompt_tokens": 23,
    "total_tokens": 38
  }
}
Sure, I can help you with that. What do you need translated?


In [1]:

def sql_translate(sql_query_text):
  response=openai.Completion.create(model="code-davinci-002",
        prompt="### sqlite sql table, with their properties:\n#\n# iris1(Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Class)\n#\n### A query to answer : {sql_query_text}\nSELECT",
        temperature=0,
        max_tokens=250,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=["#",";"]
  )
  return 'SELECT' + response['choices'][0]['text']

In [None]:

def sql_translate(sql_query_text):
  response=openai.Completion.create(engine="davinci",
        prompt="### sqlite sql table, with their properties:\n#\n# iris1(Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Class)\n#\n### A query to answer : {sql_query_text}\nSELECT",
        temperature=0.2,
        max_tokens=250,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=["#",";"]
  )
  return 'SELECT' + response['choices'][0]['text']

In [2]:
sql_translate('average for each Class')

NameError: ignored

In [74]:
query_db('SELECT * FROM iris1 WHERE Sepal_length=6.3')

[(56, 6.3, 3.3, 4.7, 1.6, 'Iris-versicolor'),
 (72, 6.3, 2.5, 4.9, 1.5, 'Iris-versicolor'),
 (87, 6.3, 2.3, 4.4, 1.3, 'Iris-versicolor'),
 (100, 6.3, 3.3, 6.0, 2.5, 'Iris-virginica'),
 (103, 6.3, 2.9, 5.6, 1.8, 'Iris-virginica'),
 (123, 6.3, 2.7, 4.9, 1.8, 'Iris-virginica'),
 (133, 6.3, 2.8, 5.1, 1.5, 'Iris-virginica'),
 (136, 6.3, 3.4, 5.6, 2.4, 'Iris-virginica'),
 (146, 6.3, 2.5, 5.0, 1.9, 'Iris-virginica')]

In [9]:
def complete(prompt):
    # query text-davinci-003
    res = openai.Completion.create(
        engine='text-davinci-003',
        prompt=prompt,
        temperature=0,
        max_tokens=400,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None
    )
    return res['choices'][0]['text'].strip()

In [12]:
iris = 'https://gist.github.com/curran/a08a1080b88344b0c8a7#file-iris-csv'


Now let's ask a more specific question about training a specific type of transformer model called a *sentence-transformer*. The ideal answer we'd be looking for is _"Multiple Negatives Ranking (MNR) loss"_.

Don't worry if this is a new term to you, it isn't required to understand what we're doing or demoing here.

In [15]:
csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# using the attribute information as the column names
col_names = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width','Class']
iris =  pd.read_csv(csv_url, names = col_names)

In [16]:
iris.head()

Unnamed: 0,Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [10]:
query = (
    "Which training method should I use for sentence transformers when " +
    "I only have pairs of related sentences?"
)

complete(query)

'If you only have pairs of related sentences, then the best training method to use for sentence transformers is the supervised learning approach. This approach involves providing the model with labeled data, such as pairs of related sentences, and then training the model to learn the relationships between the sentences. This approach is often used for tasks such as natural language inference, semantic similarity, and paraphrase identification.'

One of the common answers I get to this is:

```
The best training method to use for fine-tuning a pre-trained model with sentence transformers is the Masked Language Model (MLM) training. MLM training involves randomly masking some of the words in a sentence and then training the model to predict the masked words. This helps the model to learn the context of the sentence and better understand the relationships between words.
```

This answer seems pretty convincing right? Yet, it's wrong. MLM is typically used in the pretraining step of a transformer model but *cannot* be used to fine-tune a sentence-transformer, and has nothing to do with having _"pairs of related sentences"_.

An alternative answer I recieve is about `supervised learning approach` being the most suitable. This is completely true, but it's not specific and doesn't answer the question.

We have two options for enabling our LLM in understanding and correctly answering this question:

1. We fine-tune the LLM on text data covering the topic mentioned, likely on articles and papers talking about sentence transformers, semantic search training methods, etc.

2. We use **R**etrieval **A**ugmented **G**eneration (RAG), a technique that implements an information retrieval component to the generation process. Allowing us to retrieve relevant information and feed this information into the generation model as a *secondary* source of information.

We will demonstrate option **2**.

---

## Building a Knowledge Base

With open **2** the retrieval of relevant information requires an external _"Knowledge Base"_, a place where we can store and use to efficiently retrieve information. We can think of this as the external _long-term memory_ of our LLM.

We will need to retrieve information that is semantically related to our queries, to do this we need to use _"dense vector embeddings"_. These can be thought of as numerical representations of the *meaning* behind our sentences.

There are many options for creating these dense vectors, like open source [sentence transformers](https://pinecone.io/learn/nlp/) or OpenAI's [ada-002 model](https://youtu.be/ocxq84ocYi0). We will use OpenAI's offering in this example.

We have already authenticated our OpenAI connection, to create an embedding we just do:

In [12]:
embed_model = "text-embedding-ada-002"

res = openai.Embedding.create(
    input=[
        "Sample document text goes here",
        "there will be several phrases in each batch"
    ], engine=embed_model
)

In the response `res` we will find a JSON-like object containing our new embeddings within the `'data'` field.

In [13]:
res.keys()

dict_keys(['object', 'data', 'model', 'usage'])

Inside `'data'` we will find two records, one for each of the two sentences we just embedded. Each vector embedding contains `1536` dimensions (the output dimensionality of the `text-embedding-ada-002` model.

In [14]:
len(res['data'])

2

In [15]:
len(res['data'][0]['embedding']), len(res['data'][1]['embedding'])

(1536, 1536)

We will apply this same embedding logic to a dataset containing information relevant to our query (and many other queries on the topics of ML and AI).

### Data Preparation

The dataset we will be using is the `jamescalam/youtube-transcriptions` from Hugging Face _Datasets_. It contains transcribed audio from several ML and tech YouTube channels. We download it with:

In [16]:
from datasets import load_dataset

data = load_dataset('jamescalam/youtube-transcriptions', split='train')
data

Downloading readme:   0%|          | 0.00/2.13k [00:00<?, ?B/s]

Downloading and preparing dataset json/jamescalam--youtube-transcriptions to /root/.cache/huggingface/datasets/jamescalam___json/jamescalam--youtube-transcriptions-08d889f6a5386b9b/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...


Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/79.8M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/jamescalam___json/jamescalam--youtube-transcriptions-08d889f6a5386b9b/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4. Subsequent calls will reuse this data.


Dataset({
    features: ['title', 'published', 'url', 'video_id', 'channel_id', 'id', 'text', 'start', 'end'],
    num_rows: 208619
})

In [17]:
data[0]

{'title': 'Training and Testing an Italian BERT - Transformers From Scratch #4',
 'published': '2021-07-06 13:00:03 UTC',
 'url': 'https://youtu.be/35Pdoyi6ZoQ',
 'video_id': '35Pdoyi6ZoQ',
 'channel_id': 'UCv83tO5cePwHMt1952IVVHw',
 'id': '35Pdoyi6ZoQ-t0.0',
 'text': 'Hi, welcome to the video.',
 'start': 0.0,
 'end': 9.36}

The dataset contains many small snippets of text data. We will need to merge many snippets from each video to create more substantial chunks of text that contain more information.

In [18]:
from tqdm.auto import tqdm

new_data = []

window = 20  # number of sentences to combine
stride = 4  # number of sentences to 'stride' over, used to create overlap

for i in tqdm(range(0, len(data), stride)):
    i_end = min(len(data)-1, i+window)
    if data[i]['title'] != data[i_end]['title']:
        # in this case we skip this entry as we have start/end of two videos
        continue
    text = ' '.join(data[i:i_end]['text'])
    # create the new merged dataset
    new_data.append({
        'start': data[i]['start'],
        'end': data[i_end]['end'],
        'title': data[i]['title'],
        'text': text,
        'id': data[i]['id'],
        'url': data[i]['url'],
        'published': data[i]['published'],
        'channel_id': data[i]['channel_id']
    })

  0%|          | 0/52155 [00:00<?, ?it/s]

In [28]:
new_data[-3]

{'start': 3742.44,
 'end': 3819.72,
 'title': 'GLOM: How to represent part-whole hierarchies in a neural network (Geoff Hinton&#39;s Paper Explained)',
 'text': "It has a bunch of parts that are maybe not super friendly to hardware at the time like this iterative procedure. But honestly, it is not much more than a neural network. Sorry, a recurrent neural network with very complicated recurrence functions. The video extension might be a bit tricky. And, but the rest and the regularization might be a bit tricky, the exact objective. So the denoising auto encoder objective isn't super detailed in the paper, he simply says, reconstruct a corrupted version of the input. How exactly the input happens, maybe there's a CNN, maybe the CNN feeds information into actually multiple layers. None of that is exactly specified. So there's lots to figure out. I do think the ideas are very cool. And I love idea papers. And therefore I recommend that if you're interested more, give this thing a read, gi

Now we need a place to store these embeddings and enable a efficient _vector search_ through them all. To do that we use Pinecone, we can get a [free API key](https://app.pinecone.io) and enter it below where we will initialize our connection to Pinecone and create a new index.

In [None]:
import pinecone

# initialize connection to pinecone (get API key at app.pinecone.io)
api_key = os.getenv("PINECONE_API_KEY") or "PINECONE_API_KEY"
# find your environment next to the api key in pinecone console
env = os.getenv("PINECONE_ENVIRONMENT") or "PINECONE_ENVIRONMENT"

pinecone.init(api_key=api_key, enviroment=env)
pinecone.whoami()

In [None]:
index_name = 'openai-youtube-transcriptions'

In [None]:
# check if index already exists (it shouldn't if this is first time)
if index_name not in pinecone.list_indexes():
    # if does not exist, create index
    pinecone.create_index(
        index_name,
        dimension=len(res['data'][0]['embedding']),
        metric='cosine',
        metadata_config={'indexed': ['channel_id', 'published']}
    )
# connect to index
index = pinecone.Index(index_name)
# view index stats
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

We can see the index is currently empty with a `total_vector_count` of `0`. We can begin populating it with OpenAI `text-embedding-ada-002` built embeddings like so:

In [None]:
from tqdm.auto import tqdm
from time import sleep

batch_size = 100  # how many embeddings we create and insert at once

for i in tqdm(range(0, len(new_data), batch_size)):
    # find end of batch
    i_end = min(len(new_data), i+batch_size)
    meta_batch = new_data[i:i_end]
    # get ids
    ids_batch = [x['id'] for x in meta_batch]
    # get texts to encode
    texts = [x['text'] for x in meta_batch]
    # create embeddings (try-except added to avoid RateLimitError)
    try:
        res = openai.Embedding.create(input=texts, engine=embed_model)
    except:
        done = False
        while not done:
            sleep(5)
            try:
                res = openai.Embedding.create(input=texts, engine=embed_model)
                done = True
            except:
                pass
    embeds = [record['embedding'] for record in res['data']]
    # cleanup metadata
    meta_batch = [{
        'start': x['start'],
        'end': x['end'],
        'title': x['title'],
        'text': x['text'],
        'url': x['url'],
        'published': x['published'],
        'channel_id': x['channel_id']
    } for x in meta_batch]
    to_upsert = list(zip(ids_batch, embeds, meta_batch))
    # upsert to Pinecone
    index.upsert(vectors=to_upsert)

  0%|          | 0/487 [00:00<?, ?it/s]

Now we search, for this we need to create a _query vector_ `xq`:

In [None]:
res = openai.Embedding.create(
    input=[query],
    engine=embed_model
)

# retrieve from Pinecone
xq = res['data'][0]['embedding']

# get relevant contexts (including the questions)
res = index.query(xq, top_k=2, include_metadata=True)

In [None]:
res

{'matches': [{'id': 'pNvujJ1XyeQ-t418.88',
              'metadata': {'channel_id': 'UCv83tO5cePwHMt1952IVVHw',
                           'end': 568.4,
                           'published': datetime.date(2021, 11, 24),
                           'start': 418.88,
                           'text': 'pairs of related sentences you can go '
                                   'ahead and actually try training or '
                                   'fine-tuning using NLI with multiple '
                                   "negative ranking loss. If you don't have "
                                   'that fine. Another option is that you have '
                                   'a semantic textual similarity data set or '
                                   'STS and what this is is you have so you '
                                   'have sentence A here, sentence B here and '
                                   'then you have a score from from 0 to 1 '
                                   '

In [None]:
limit = 3750

def retrieve(query):
    res = openai.Embedding.create(
        input=[query],
        engine=embed_model
    )

    # retrieve from Pinecone
    xq = res['data'][0]['embedding']

    # get relevant contexts
    res = index.query(xq, top_k=3, include_metadata=True)
    contexts = [
        x['metadata']['text'] for x in res['matches']
    ]

    # build our prompt with the retrieved contexts included
    prompt_start = (
        "Answer the question based on the context below.\n\n"+
        "Context:\n"
    )
    prompt_end = (
        f"\n\nQuestion: {query}\nAnswer:"
    )
    # append contexts until hitting limit
    for i in range(1, len(contexts)):
        if len("\n\n---\n\n".join(contexts[:i])) >= limit:
            prompt = (
                prompt_start +
                "\n\n---\n\n".join(contexts[:i-1]) +
                prompt_end
            )
            break
        elif i == len(contexts)-1:
            prompt = (
                prompt_start +
                "\n\n---\n\n".join(contexts) +
                prompt_end
            )
    return prompt

In [None]:
# first we retrieve relevant items from Pinecone
query_with_contexts = retrieve(query)
query_with_contexts

"Answer the question based on the context below.\n\nContext:\npairs of related sentences you can go ahead and actually try training or fine-tuning using NLI with multiple negative ranking loss. If you don't have that fine. Another option is that you have a semantic textual similarity data set or STS and what this is is you have so you have sentence A here, sentence B here and then you have a score from from 0 to 1 that tells you the similarity between those two scores and you would train this using something like cosine similarity loss. Now if that's not an option and your focus or use case is on building a sentence transformer for another language where there is no current sentence transformer you can use multilingual parallel data. So what I mean by that is so parallel data just means translation pairs so if you have for example a English sentence and then you have another language here so it can it can be anything I'm just going to put XX and that XX is your target language you can 

In [None]:
# then we complete the context-infused query
complete(query_with_contexts)

'You should use Natural Language Inference (NLI) with multiple negative ranking loss.'

And we get a pretty great answer straight away, specifying to use _multiple-rankings loss_ (also called _multiple negatives ranking loss_).

Once we're done with the index we delete it to save resources:

In [None]:
pinecone.delete_index(index_name)

In [13]:
import os
from pprint import pprint

import bitdotio
from dotenv import load_dotenv

#from pg_text_query import get_db_schema, get_default_prompt, generate_query

# Initialize OPENAI_API_KEY and BITIO_KEY
load_dotenv()


DB_NAME = "bitdotio/palmerpenguins"

b = bitdotio.bitdotio(os.getenv("BITIO_KEY"))

# Extract a structured db schema from Postgres
with b.pooled_cursor(DB_NAME) as cur:
    db_schema = get_db_schema(cur, DB_NAME)
pprint(db_schema)

TypeError: ignored

In [9]:
!pip install pg_text_query

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
[31mERROR: Could not find a version that satisfies the requirement pg_text_query (from versions: none)[0m[31m
[0m[31mERROR: No matching distribution found for pg_text_query[0m[31m
[0m

In [40]:
pip install bitdotio

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting bitdotio
  Downloading bitdotio-2.2.1-py3-none-any.whl (14 kB)
Collecting requests>=2.28.1 (from bitdotio)
  Downloading requests-2.31.0-py3-none-any.whl (62 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.6/62.6 kB[0m [31m2.5 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: requests, bitdotio
  Attempting uninstall: requests
    Found existing installation: requests 2.27.1
    Uninstalling requests-2.27.1:
      Successfully uninstalled requests-2.27.1
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests==2.27.1, but you have requests 2.31.0 which is incompatible.[0m[31m
[0mSuccessfully installed bitdotio-2.2.1 requests-2.31.0


---