# Trying out embeddings
- FAISS
- ChromaDB
- LangChain
- https://huggingface.co/BAAI/bge-small-en-v1.5


In [1]:
import pandas as pd
import chromadb

In [3]:
governments = pd.read_json("../government_metadata.json", orient="index")

In [5]:
chroma_client = chromadb.PersistentClient(path="../data/vectorstore.db")

In [6]:
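# get_or_create_collection keeps this cell re-runnable; create_collection raises if "civ_game" already exists
collection = chroma_client.get_or_create_collection(name="civ_game")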

In [7]:
ids_summaries = (governments.index + "_summary").tolist()
summaries = governments["summary"].tolist()
ids_ft = (governments.index + "_fulltext").tolist()
fulltexts = governments["full_page_content"].tolist()

In [8]:
## Adding data to a collection
collection.add(documents=summaries, ids=ids_summaries)
collection.add(documents=fulltexts, ids=ids_ft)
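
The two `add` calls above store documents without metadata, which is why later query results show `'metadatas': [[None, None]]`. A minimal sketch of building ids, documents, and metadata together is shown below. The helper `build_records`, the two-entry `sample` dict, and the metadata keys `gov_type` and `type` are assumptions made for illustration (the keys mirror the custom loader later in this notebook); the sketch works on a plain dict rather than the DataFrame.

```python
from typing import Dict, List, Tuple


def build_records(
    governments: Dict[str, Dict[str, str]],
) -> Tuple[List[str], List[str], List[dict]]:
    """Return parallel lists of ids, documents, and metadatas."""
    ids, documents, metadatas = [], [], []
    for name, page in governments.items():
        # Each government contributes one summary record and one full-text record
        for kind, field in (("summary", "summary"), ("fulltext", "full_page_content")):
            ids.append(f"{name}_{kind}")
            documents.append(page[field])
            metadatas.append({"gov_type": name, "type": kind})
    return ids, documents, metadatas


# Hypothetical placeholder data standing in for government_metadata.json
sample = {
    "Technocracy": {"summary": "summary text A", "full_page_content": "full text A"},
    "Ergatocracy": {"summary": "summary text B", "full_page_content": "full text B"},
}

ids, documents, metadatas = build_records(sample)
print(ids)
print(metadatas[0])
```

The three lists could then be passed in a single call such as `collection.add(documents=documents, metadatas=metadatas, ids=ids)`, which would make it possible to tell summaries and full texts apart in query results.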

In [48]:
results = collection.query(query_texts=["What is capitalism?"], n_results=1)

In [51]:
collection.query(query_texts=["Which form of government cares most about resources?"], n_results=2)

{'ids': [['Ergatocracy_fulltext', 'Technocracy_fulltext']],
 'distances': [[1.109222412109375, 1.119309902191162]],
 'metadatas': [[None, None]],
 'embeddings': None,
 'documents': [['This article lists forms of government and political systems, according to a series of different ways of categorizing them. The systems listed are not mutually exclusive, and often have overlapping definitions.According to Yale professor Juan José Linz there are three main types of political systems today: democracies, \ntotalitarian regimes and, sitting between these two, authoritarian regimes with hybrid regimes. Another modern classification system includes monarchies as a standalone entity or as a hybrid system of the main three. Scholars generally refer to a dictatorship as either a form of authoritarianism or  totalitarianism.The ancient Greek philosopher Plato discusses in the Republic five types of regimes: aristocracy, timocracy, oligarchy, democracy, and tyranny. \n The question raised by Plato 
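
The result above is a dict of lists nested one level per query text, so the hits for the first query sit at index 0 of each list. A small sketch for turning that structure into readable rows follows. The helper `flatten_query_result` is hypothetical, and the sample dict copies the ids and distances shown above while replacing the long document texts with short placeholders.

```python
from typing import Any, Dict, List


def flatten_query_result(
    result: Dict[str, Any], query_index: int = 0, preview_chars: int = 60
) -> List[dict]:
    """Turn one query's nested result lists into a list of row dicts."""
    ids = result["ids"][query_index]
    distances = result["distances"][query_index]
    documents = result["documents"][query_index]
    rows = []
    for doc_id, distance, document in zip(ids, distances, documents):
        rows.append(
            {
                "id": doc_id,
                "distance": round(distance, 4),
                "preview": document[:preview_chars],
            }
        )
    return rows


# Shaped like the output above; document texts are placeholders, not the real pages
sample_result = {
    "ids": [["Ergatocracy_fulltext", "Technocracy_fulltext"]],
    "distances": [[1.109222412109375, 1.119309902191162]],
    "metadatas": [[None, None]],
    "embeddings": None,
    "documents": [["(first document text)", "(second document text)"]],
}

rows = flatten_query_result(sample_result)
for row in rows:
    print(row)
```

Hits come back ordered by distance, smallest first, so the first row is the closest match. Both hits here are `_fulltext` entries with nearly identical distances, which suggests the query does not separate the two pages well.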

In [None]:
## Updating data in a collection
# Template only: kept inside a string so the cell does not run
"""
collection.update(
    ids=["id1", "id2", "id3", ...],
    embeddings=[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...],
    metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...],
    documents=["doc1", "doc2", "doc3", ...],
)
"""

In [None]:
## LangChain + Chroma


In [29]:
## Sentence Transformers

In [30]:
## Hugging Face

## Try loading documents into LangChain
The vector store built below uses OpenAI embeddings.

In [1]:
from langchain.document_loaders import WikipediaLoader
# LangChain's built-in JSONLoader is not imported: the custom JSONLoader class defined below would shadow it.

In [12]:
docs = WikipediaLoader(query="Capitalism", load_max_docs=2).load()

In [13]:
docs

[Document(page_content='Capitalism is an economic system based on the private ownership of the means of production and their operation for profit. Central characteristics of capitalism include capital accumulation, competitive markets, price systems, private property, property rights recognition, voluntary exchange, and wage labor. In a market economy, decision-making and investments are determined by owners of wealth, property, or ability to maneuver capital or production ability in capital and financial markets—whereas prices and the distribution of goods and services are mainly determined by competition in goods and services markets.Economists, historians, political economists, and sociologists have adopted different perspectives in their analyses of capitalism and have recognized various forms of it in practice. These include laissez-faire or free-market capitalism, anarcho-capitalism, state capitalism, and welfare capitalism. Different forms of capitalism feature varying degrees o

In [2]:
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional, Union

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class JSONLoader(BaseLoader):
    """Custom JSON loader for loading Wikipedia data into LangChain."""

    def __init__(
        self,
        file_path: Union[str, Path],
        content_key: Optional[str] = None,
        fulltext: bool = False,
    ):
        self.file_path = Path(file_path).resolve()
        self._content_key = content_key  # stored but not used by load()
        self.fulltext = fulltext  # if True, also emit the full page text

    def load(self) -> List[Document]:
        """Load and return documents from the JSON file."""

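        docs = []
        # Load the JSON file: a mapping of government name -> page fields
        with open(self.file_path, encoding="utf-8") as file:
            data = json.load(file)

        # Emit one summary document per government, plus the full text if requested
        for government_name, government_text in data.items():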
            base_metadata = {'gov_type': government_name}
            
            summary = government_text['summary']
            summary_metadata = base_metadata.copy()
            summary_metadata['type'] = 'summary'
            docs.append(Document(page_content=summary, metadata=summary_metadata))
            if self.fulltext:
                full_text = government_text['full_page_content']
                full_text_metadata = base_metadata.copy()
                full_text_metadata['type'] = 'fulltext'
                docs.append(Document(page_content=full_text, metadata=full_text_metadata))

        return docs

In [3]:
loader = JSONLoader(
    file_path='../government_metadata.json'
    )
data = loader.load()

In [4]:
data

[Document(page_content="Anarchy is a society without rulers and gods.\nIn practical terms, anarchy can refer to the curtailment or abolition of traditional forms of government and institutions. It can also designate a nation or any inhabited place that has no system of government or central rule. Anarchy is primarily advocated by individual anarchists who propose replacing government with voluntary institutions. These institutions or free associations are generally modeled on nature since they can represent concepts such as community and economic self-reliance, interdependence, or individualism. In simple terms anarchy means 'without rulers' or 'without authority' in which there is no rule by group or tyrant, rather instead by an individual upon themselves or by the people entirely. It is non-coercive. Although anarchy is often negatively used as a synonym of chaos or societal collapse or anomie, this is not the meaning that anarchists attribute to anarchy, a society without hierarchie

In [5]:
# Try vectorstore
from datetime import datetime
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma

In [6]:
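# Create a persistent Chroma store with LangChain and OpenAI embeddings.
# Note: this reuses the path and collection name from the raw chromadb cells above.
# If that collection still holds vectors from a different embedding model, the
# dimensions will likely not match, so a fresh directory or collection name may be needed.
db = Chroma.from_documents(data, OpenAIEmbeddings(), persist_directory="../data/vectorstore.db", collection_name="civ_game")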

## Example load

In [7]:
# Load the persisted store and run a similarity search
db2 = Chroma(persist_directory="../data/vectorstore.db", embedding_function=OpenAIEmbeddings(), collection_name="civ_game")
query = "What is capitalism?"
docs = db2.similarity_search(query)
print(docs[0].page_content)

Capitalism is an economic system based on the private ownership of the means of production and their operation for profit. Central characteristics of capitalism include capital accumulation, competitive markets, price systems, private property, property rights recognition, voluntary exchange, and wage labor. In a market economy, decision-making and investments are determined by owners of wealth, property, or ability to maneuver capital or production ability in capital and financial markets—whereas prices and the distribution of goods and services are mainly determined by competition in goods and services markets.Economists, historians, political economists, and sociologists have adopted different perspectives in their analyses of capitalism and have recognized various forms of it in practice. These include laissez-faire or free-market capitalism, anarcho-capitalism, state capitalism, and welfare capitalism. Different forms of capitalism feature varying degrees of free markets, public o
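
Conceptually, `similarity_search` embeds the query and returns the stored documents whose vectors lie closest to it. The toy sketch below illustrates that ranking step with hand-made 3-dimensional vectors and cosine similarity. It is not Chroma's implementation: the vectors, the ids, and the helper `top_k` are made up for illustration, and the metric a real store uses depends on its configuration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vector: List[float], doc_vectors: Dict[str, List[float]], k: int = 1
) -> List[Tuple[str, float]]:
    """Rank stored vectors by similarity to the query and keep the best k."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in doc_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hand-made vectors; real embeddings have hundreds or thousands of dimensions
toy_vectors = {
    "Capitalism_summary": [0.9, 0.1, 0.0],
    "Anarchy_summary": [0.1, 0.9, 0.1],
    "Technocracy_summary": [0.2, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]  # pretend embedding of "What is capitalism?"

best = top_k(toy_query, toy_vectors, k=2)
for doc_id, score in best:
    print(doc_id, round(score, 3))
```

A vector store does the same kind of ranking, typically with an approximate index so it does not have to score every document.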