# S2QA with Ollama - Llama 2

## Setup
First, follow the [readme](https://github.com/jmorganca/ollama#building) to set up and run a local Ollama instance.

This demo runs the following model:
```bash
./ollama run llama2:13b-chat
```

When the Ollama app is running on your local machine:
- All of your local models are automatically served on `localhost:11434`.
- Select a model by setting `llm = Ollama(..., model="<model family>:<version>")`.
- If you omit the version, as in `llm = Ollama(..., model="<model family>")`, Ollama looks for the `latest` tag.
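
The tag rule in the list above can be sketched in plain Python. This is only an illustration of the described behavior, not Ollama's own code: `resolve_model_tag` and `list_local_models` are hypothetical helper names, and the `/api/tags` endpoint is assumed to be available on your Ollama build at the default address.

```python
import json
from urllib import request

# Default address taken from the list above.
OLLAMA_BASE_URL = "http://localhost:11434"


def resolve_model_tag(model: str) -> str:
    """Return '<family>:<version>', falling back to 'latest' when no version is given."""
    family, sep, version = model.partition(":")
    if sep and version:
        return f"{family}:{version}"
    return f"{family}:latest"


def list_local_models(base_url: str = OLLAMA_BASE_URL) -> list[str]:
    """Ask a running Ollama server which models are available locally.

    Assumption: the server exposes GET /api/tags and returns {"models": [{"name": ...}, ...]}.
    This call needs the Ollama app to be running, so it is not invoked here.
    """
    with request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [entry["name"] for entry in payload.get("models", [])]


print(resolve_model_tag("llama2"))           # no version given
print(resolve_model_tag("llama2:13b-chat"))  # explicit version is kept
```

With the server up, comparing `resolve_model_tag(...)` against `list_local_models()` is a quick way to check that the model you name has actually been pulled.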

In [1]:
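import sys
# Make a local llama_index checkout importable.
# This path is specific to the author's machine; change it or skip this cell if llama_index is installed.
sys.path.append('/Users/shaurya/projects/S2QA/llama_index/')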

In [4]:
from llama_index.llms.ollama import Ollama
from llama_index.query_engine import CitationQueryEngine
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.response.notebook_utils import display_response
from llama_hub.semanticscholar.base import SemanticScholarReader


In [5]:
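# No version is given here, so Ollama looks for llama2:latest.
# Pass model="llama2:13b-chat" to use the model started in Setup.
llm = Ollama(model="llama2")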

In [8]:
s2reader = SemanticScholarReader()

# narrow down the search space
query_space = "biases in large language models"

# increase limit to get more documents
documents = s2reader.load_data(query=query_space, limit=10)

service_context = ServiceContext.from_defaults(
    llm=llm
)

In [9]:
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

query_engine = CitationQueryEngine.from_args(
    index,
    similarity_top_k=3,
    citation_chunk_size=512,
)

# ask the citation query engine a question over the index
query_string = "explain all the biases in large language models in a markdown table"
response = query_engine.query(query_string)
# render the answer together with a 100-character preview and the metadata of each source
display_response(response, show_source=True, source_length=100, show_source_metadata=True)


**`Final Response:`** Certainly! Here's a markdown table summarizing the different types of biases in large language models based on the provided sources:

| Bias Type | Description | Sources |
| --- | --- | --- |
| Data selection bias | Bias caused by the choice of texts that make up a training corpus. | Source 1 |
| Social bias | Biases evidenced in the text generated by language models trained on such corpora, ranging from gender to age, from sexual orientation to ethnicity, and from religion to culture. | Source 1, Source 2 |
| Gender bias | Bias towards certain genders. | Source 2 |
| Unintended consequences of biased model outputs | Ethical concerns arising from the unintended consequences of biased model outputs. | Source 3 |
| Mitigating biases | Approaches to identify, quantify, and mitigate biases in language models. | Source 3 |

Here's a brief explanation for each bias type:

* Data selection bias: This occurs when the training data used to train large language models is not representative of the population or domain the model is intended to serve. For example, if a language model is trained on a dataset that contains mostly male names, it may have difficulty generating gender-neutral text.
* Social bias: Large language models can learn biases from the data they are trained on, which can result in generated text that reflects these biases. For instance, a language model trained on a dataset that contains predominantly male authors may generate more masculine than feminine words.
* Gender bias: As mentioned in Source 2, recent studies have shown that language models can learn to be biased towards certain genders. This can result in generated text that reinforces gender stereotypes or discrimination.
* Unintended consequences of biased model outputs: Biases in large language models can lead to unintended consequences, such as perpetuating harmful stereotypes or contributing to social inequalities. For example, a language model trained on a dataset that contains offensive language may generate offensive text even if it is not intended to do so.
* Mitigating biases: There are several approaches to identify, quantify, and mitigate biases in large language models, such as using diverse and representative training data, adjusting model parameters, or incorporating human feedback. Source 3 provides a comprehensive overview of these approaches.

I hope this helps! Let me know if you have any further questions.

---

**`Source Node 1/3`**

**Node ID:** c6bcdf65-c293-450c-91cb-ab352a3fa773<br>**Similarity:** 0.8526282063593319<br>**Text:** Source 1:
Biases in Large Language Models: Origins, Inventory, and Discussion In this article, we...<br>**Metadata:** {'title': 'Biases in Large Language Models: Origins, Inventory, and Discussion', 'venue': 'ACM Journal of Data and Information Quality', 'year': 2023, 'paperId': '6d0656d9bb60a2bea50c4b894fbcc5d1e32134e7', 'citationCount': 4, 'openAccessPdf': 'https://dl.acm.org/doi/pdf/10.1145/3597307', 'authors': ['Roberto Navigli', 'Simone Conia', 'Björn Ross'], 'externalIds': {'DBLP': 'journals/jdiq/NavigliCR23', 'DOI': '10.1145/3597307', 'CorpusId': 258688053}}<br>

---

**`Source Node 2/3`**

**Node ID:** 3ce61a64-9270-4b45-bf40-90e83ac729cb<br>**Similarity:** 0.8406245995128746<br>**Text:** Source 2:
Disclosing the Biases in Large Language Models via Reward Based Interrogation The succe...<br>**Metadata:** {'title': 'Disclosing the Biases in Large Language Models via Reward Based Interrogation', 'venue': '', 'year': 2022, 'paperId': '8be25af7560907713ad9c787b664224edfd35505', 'citationCount': 0, 'openAccessPdf': None, 'authors': ['Ezgi Korkmaz'], 'externalIds': {'CorpusId': 253764622}}<br>

---

**`Source Node 3/3`**

**Node ID:** 10b98fce-ed29-49df-b3b1-485dd1d84972<br>**Similarity:** 0.8333128382605138<br>**Text:** Source 3:
Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models As the ...<br>**Metadata:** {'title': 'Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models', 'venue': 'arXiv.org', 'year': 2023, 'paperId': '16d83e930a4dab2d49f5d276838ddce79df3f787', 'citationCount': 31, 'openAccessPdf': None, 'authors': ['Emilio Ferrara'], 'externalIds': {'DBLP': 'journals/corr/abs-2304-03738', 'ArXiv': '2304.03738', 'DOI': '10.48550/arXiv.2304.03738', 'CorpusId': 258041203}}<br>
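
The source nodes above carry enough metadata to build a short reference list. The sketch below does this with plain dictionaries copied from the output (trimmed to the fields used), so it runs without LlamaIndex. `format_citation` is a hypothetical helper, and the assumption that the same values can be read from `response.source_nodes` (a score plus a metadata dict per node) should be checked against your LlamaIndex version.

```python
def format_citation(metadata: dict) -> str:
    """Build a one-line reference from a Semantic Scholar metadata dict."""
    authors = ", ".join(metadata.get("authors", [])) or "Unknown author"
    venue = metadata.get("venue") or "venue not listed"  # venue can be an empty string
    year = metadata.get("year", "n.d.")
    title = metadata.get("title", "Untitled")
    return f"{authors} ({year}). {title}. {venue}."


# Scores and metadata copied from the three source nodes shown above.
source_nodes = [
    {"score": 0.8406245995128746, "metadata": {
        "title": "Disclosing the Biases in Large Language Models via Reward Based Interrogation",
        "venue": "", "year": 2022, "authors": ["Ezgi Korkmaz"]}},
    {"score": 0.8333128382605138, "metadata": {
        "title": "Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models",
        "venue": "arXiv.org", "year": 2023, "authors": ["Emilio Ferrara"]}},
    {"score": 0.8526282063593319, "metadata": {
        "title": "Biases in Large Language Models: Origins, Inventory, and Discussion",
        "venue": "ACM Journal of Data and Information Quality", "year": 2023,
        "authors": ["Roberto Navigli", "Simone Conia", "Björn Ross"]}},
]

# Highest similarity first, matching the Source 1..3 numbering used in the answer.
ranked = sorted(source_nodes, key=lambda node: node["score"], reverse=True)
for rank, node in enumerate(ranked, start=1):
    print(f"[Source {rank}] {node['score']:.3f}  {format_citation(node['metadata'])}")
```

Sorting by score before numbering keeps the labels aligned with the `Source 1`, `Source 2`, `Source 3` references in the generated table.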