Part 3.5 of the LLM per game TLDR generation project

We collect critic reviews (n=5), then extract the keywords or sentences that describe each aspect of the game.

In [2]:
from langchain_community.vectorstores import Chroma     # Chroma interface for langchain
# save the collection to persistent storage (in the docker container)
import chromadb

chromadb_client = chromadb.HttpClient(host='localhost', port='8000')

In [24]:
chromadb_client.list_collections()

[Collection(name=monster_hunter_world_iceborne),
 Collection(name=dota2),
 Collection(name=monster_hunter_world),
 Collection(name=starfield),
 Collection(name=cyberpunk_2077_phantom_liberty),
 Collection(name=cyberpunk_2077),
 Collection(name=counter-strike_2)]

In [4]:
import os, sys
from collections import deque
from datetime import datetime
MISTRAL_API_KEY = os.environ.get("MISTRAL_API_KEY")          # read your Mistral API key from the environment; never print or commit it


In [5]:
GAME_ASPECTS = ['Gameplay', 'Narrative', 'Accessibility', 'Sound', 'Graphics & Art Design', 'Performance', 'Bug', 'Suggestion', 'Price', 'Overall']

QUESTION_TEMPLATE_01 = \
'''Extract the the following aspect of the game from the reviews. The aspect '''
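
The template is completed later in the notebook by appending "is <aspect>" for each entry of GAME_ASPECTS. A minimal sketch of that composition follows; the template string and aspect list are copied verbatim from the cell above, while `build_question` is a hypothetical helper that does not exist in the notebook.

```python
# Template and aspect list copied verbatim from the notebook cell above.
QUESTION_TEMPLATE_01 = (
    "Extract the the following aspect of the game from the reviews. The aspect "
)
GAME_ASPECTS = ['Gameplay', 'Narrative', 'Accessibility', 'Sound', 'Graphics & Art Design',
                'Performance', 'Bug', 'Suggestion', 'Price', 'Overall']


def build_question(aspect: str, template: str = QUESTION_TEMPLATE_01) -> str:
    """Hypothetical helper: complete the template for a single aspect."""
    return f"{template}is {aspect}"


# One retrieval query per aspect.
questions = {aspect: build_question(aspect) for aspect in GAME_ASPECTS}
print(questions["Gameplay"])
```

Keeping the composition in one function makes it easier to swap in a second template later without touching the retrieval loop.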

In [14]:
# game_steamid = 1716740              # starfield
# game_name = 'starfield'             # also the folder name where the reviews are stored

# game_steamid = 1118010
# game_name = 'monster_hunter_world_iceborne'

# game_steamid = 582010
# game_name = 'monster_hunter_world'

# game_steamid = 2138330          # cyberpunk2077 phantom liberty
# game_name = 'cyberpunk_2077_phantom_liberty'

# game_steamid = 1091500          # cyberpunk2077
# game_name = 'cyberpunk_2077'

game_steamid = 730
game_name = 'counter-strike_2'

# game_steamid = 570
# game_name = "dota2"
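
Switching games by commenting and uncommenting assignments is easy to get wrong. One possible alternative, shown only as a sketch, is a lookup table; the IDs and folder names are the ones listed above, while `GAMES` and `select_game` are hypothetical names introduced here.

```python
# Steam app IDs keyed by folder name, copied from the assignments above.
GAMES = {
    'starfield': 1716740,
    'monster_hunter_world_iceborne': 1118010,
    'monster_hunter_world': 582010,
    'cyberpunk_2077_phantom_liberty': 2138330,
    'cyberpunk_2077': 1091500,
    'counter-strike_2': 730,
    'dota2': 570,
}


def select_game(name: str) -> tuple[int, str]:
    """Hypothetical helper: return (game_steamid, game_name), KeyError if unknown."""
    return GAMES[name], name


game_steamid, game_name = select_game('counter-strike_2')
print(game_steamid, game_name)
```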

---

Create the collection (skip this part if it already exists)

In [15]:
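try:
    # Chroma.get() is an instance method, so check for the collection through the client first
    chromadb_client.get_collection(game_name)
    # also pass embedding_function=... before querying through this handle
    db = Chroma(client=chromadb_client, collection_name=game_name)
except Exception:
    print("No database found for game: ", game_name)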

No database found for game:  counter-strike_2


In [16]:
# create collection by reading critic reviews
from pathlib import Path
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader, DirectoryLoader       
from transformers import AutoTokenizer

file_dir_path = Path(f"../{game_name}/")
if not file_dir_path.exists():
    # raise instead of calling exit(), which would shut down the notebook kernel
    raise FileNotFoundError(f"No reviews found for game: {game_name}")

loader = DirectoryLoader(str(file_dir_path), glob="./*.txt", loader_cls=TextLoader)
docs = loader.load()

# split it into chunks
# text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
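# chunk_size and chunk_overlap are counted in tokens of the tokenizer above
# separators are tried in order and '' always matches, so the trailing '\n' is never used: chunks are cut at spaces, then at characters
text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(tokenizer, chunk_size=360, chunk_overlap=60, separators=[' ', '', '\n'])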
docs = text_splitter.split_documents(docs)

# create documents list and metadata list for Chroma
documents = []
metadata = []
for doc in docs:
    documents.append(doc.page_content)
    metadata.append(doc.metadata)
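
To make the effect of chunk_size=360 and chunk_overlap=60 concrete, here is a simplified sliding-window sketch. It is not the algorithm RecursiveCharacterTextSplitter uses (that one merges separator-delimited pieces and measures length with the Mixtral tokenizer); it assumes whitespace-separated words as stand-ins for tokens, and `chunk_with_overlap` is a hypothetical name.

```python
def chunk_with_overlap(words, chunk_size=360, chunk_overlap=60):
    """Simplified sliding window: consecutive chunks share chunk_overlap items."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


# Made-up input: 1000 placeholder "tokens".
words = [f"w{i}" for i in range(1000)]
chunks = chunk_with_overlap(words)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the tail of one chunk reappears at the head of the next, which is why the second document in the output below starts with text that ends the first.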

In [17]:
docs

[Document(page_content="What I think of Counter-Strike 2 on day 1\nIt's CS:GO Jim, but not as we know it.\nBy Rich Stanton published 28 September 2023\n\nValve announced Counter-Strike 2 at the start of this year and has been running it in a limited beta ever since,  swapping 1-2 maps at a time, adding isolated features, and generally withholding the full package that it unveiled yesterday. That limited rollout made it hard to get a handle on just how different it's going to", metadata={'source': '../counter-strike_2/counter-strike_2_03.txt'}),
 Document(page_content="made it hard to get a handle on just how different it's going to feel from CS:GO, but now we have no option. When CS2 went live, CS:GO became a beta branch on Steam (despite some mixed messages from Valve, CS:GO has not been 'disappeared' Stalin-style and remains accessible albeit with community servers).\n\n\n\nEverything has changed: nothing has changed\n\nPerhaps Counter-Strike 2's biggest problem is that number. In a 

In [18]:
len(documents), len(metadata)

(99, 99)

Save the newly created collection to docker persistent storage

In [19]:
# temporary embedding function (Mistral AI's mistral-embed model)
sys.path.append("../")
from mistralai_embeddings import CustomMistralAIEmbeddings

embedding_stats_deque = deque()

mistralai_embedding = CustomMistralAIEmbeddings(
    model="mistral-embed",
    mistral_api_key=MISTRAL_API_KEY,
    timeout=300,
    embedding_stats_deque=embedding_stats_deque
)
embedding_func = mistralai_embedding        # use mistralAI for embeddings

# a temporary db
db = Chroma.from_documents(
    docs, embedding=embedding_func, collection_name=game_name, client=chromadb_client, persist_directory='../chromadb_storage'
)


In [20]:
len(db.get()['ids'])

99

In [21]:
retriever = db.as_retriever(
    search_type="mmr",
    search_kwargs={'k':5, 'lambda_mult':0.25}
)
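
The "mmr" search type stands for maximal marginal relevance: each pick trades relevance to the query against similarity to documents already picked, and lambda_mult sets the balance (1 favours relevance only, 0 favours diversity only). The sketch below is a plain-Python illustration of that scoring rule with made-up 2-D vectors, not LangChain's implementation; `cosine` and `mmr_select` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=5, lambda_mult=0.25):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Highest similarity to anything already selected (0.0 on the first pick).
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Made-up vectors: documents 0 and 1 are near-duplicates, document 2 is different.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
print(mmr_select(query, docs, k=2, lambda_mult=0.25))  # [0, 2]: skips the near-duplicate
print(mmr_select(query, docs, k=2, lambda_mult=1.0))   # [0, 1]: pure relevance ranking
```

With lambda_mult=0.25, as configured above, the retriever leans toward diversity, so the five returned chunks are less likely to repeat the same passage.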

In [23]:
# popleft() raises IndexError on an empty deque, so guard the pop
embedding_stats = embedding_stats_deque.popleft() if embedding_stats_deque else None
embedding_stats

---

Prompting (reviewing the quality of the extracted documents)

In [15]:
def _process_embedding_stats(embedding_stats: list) -> dict:
    """Sum the token counts over a list of usage objects from one embedding call."""
    usage_info_sum = {
        'prompt_tokens': 0, 'total_tokens': 0, 'completion_tokens': 0
    }

    for usage_info in embedding_stats:
        usage_info_sum['prompt_tokens'] += usage_info.prompt_tokens
        usage_info_sum['total_tokens'] += usage_info.total_tokens
        usage_info_sum['completion_tokens'] += usage_info.completion_tokens

    return usage_info_sum

In [15]:
aspects_response = {k: '' for k in GAME_ASPECTS}
chain_llm_output_json_list = []
embedding_usage_info_list = []

for aspect in GAME_ASPECTS:
    my_question = QUESTION_TEMPLATE_01 + f"is {aspect}"

    # k and lambda_mult are already set through search_kwargs when the retriever is created
    relevant_docs = retriever.get_relevant_documents(query=my_question)
    embedding_stats = embedding_func.embedding_stats_deque.popleft()           # get embedding token usage
    print(f'Embedding stats: {embedding_stats}')
    embedding_usage_info_01 = _process_embedding_stats(embedding_stats)
    embedding_usage_info_list.append(embedding_usage_info_01)

    print(f"Aspect: {aspect}")
    print(f"Question: {my_question}")
    print(f"Response: {relevant_docs}")

    aspects_response[aspect] = relevant_docs
    print('\n\n')

Embedding stats: [UsageInfo(prompt_tokens=20, total_tokens=20, completion_tokens=0)]
Aspect: Gameplay
Question: Extract the the following aspect of the game from the reviews. The aspect is Gameplay
Response: [Document(page_content="explore the best the core RPG mechanics have to offer. These are what carried me through an otherwise disappointing experience.\n\n\n\nThe Good\nThere's plenty of flexibility in the RPG mechanics to build a character to suit your desired playstyle\nQuickhacking combat is satisfying to execute and turns combat scenarios into elaborate puzzles\nSide quests and characters provide the most interesting and human moments in the game\nKeanu Reeves' Johnny Silverhand adds dimension to nearly every quest, forcing you to rethink your decision-making instincts\n\nThe Bad\nThere's", metadata={'source': '../cyberpunk_2077/cyberpunk_2077_04.txt'}), Document(page_content="main story doesn't cohere with the rest of the game, with an urgency that's at odds with everything el

In [16]:
for aspect, response in aspects_response.items():
    print(f"Aspect: {aspect}")
    for doc in response:
        print(f'{doc.page_content}; source: {doc.metadata["source"]}')
        print('----------')
    print('\n\n')

Aspect: Gameplay
explore the best the core RPG mechanics have to offer. These are what carried me through an otherwise disappointing experience.



The Good
There's plenty of flexibility in the RPG mechanics to build a character to suit your desired playstyle
Quickhacking combat is satisfying to execute and turns combat scenarios into elaborate puzzles
Side quests and characters provide the most interesting and human moments in the game
Keanu Reeves' Johnny Silverhand adds dimension to nearly every quest, forcing you to rethink your decision-making instincts

The Bad
There's; source: ../cyberpunk_2077/cyberpunk_2077_04.txt
----------
main story doesn't cohere with the rest of the game, with an urgency that's at odds with everything else
Technical issues, from visual bugs to full-on crashes, are so pervasive that it's impossible to ignore; source: ../cyberpunk_2077/cyberpunk_2077_04.txt
----------
return the game. Here at RPGFan, one reviewer was unable to finish even the game’s prologu