# Preprocessing Notebook

- Load the data and format it into prompts and metadata.
- Metadata:
    - id: id
    - name: name
    - keywords: genres.name, themes.name, keywords.name
    - image_urls: cover.url
    - ratings: total_rating
    - download_urls: websites.url
- Prompts: name, summary, and storyline, concatenated.

In [1]:
import pandas as pd

In [7]:
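df = pd.read_csv('response.csv')
# Columns are renamed by position, so this assumes response.csv has exactly these ten columns in this order.
df.columns = ['id', 'name', 'summary', 'storyline', 'themes', 'keywords', 'genres', 'cover', 'total_rating', 'websites']
df.sample(5)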

Unnamed: 0,id,name,summary,storyline,themes,keywords,genres,cover,total_rating,websites
308,9190,Cyber Troopers Virtual-On Oratorio Tangram,Cyber Troopers Virtual-On: Oratorio Tangram is...,,"Action, Science fiction","vehicular combat, mech, robots, online, vehicl...",Fighting,//images.igdb.com/igdb/image/upload/t_thumb/co...,,https://en.wikipedia.org/wiki/Cyber_Troopers_V...
282,141408,Zaos,International Free to Play Fantasy MMORPG game...,In a war-torn land spanning multiple continent...,Fantasy,,"Role-playing (RPG), Adventure",//images.igdb.com/igdb/image/upload/t_thumb/co...,92.087912,"https://zaos.global, https://facebook.com/g/fa..."
180,21717,LittleBigPlanet 2: Special Edition,"Exclusive for North America, a Special Edition...",Players continue Sackboy's journey after the e...,"Comedy, Sandbox, Kids","3d platformer, user generated content","Platform, Puzzle, Adventure",//images.igdb.com/igdb/image/upload/t_thumb/co...,,https://littlebigplanet.playstation.com/little...
324,254339,Super Mario Bros. Wonder,The next evolution of 2D side-scrolling Super ...,,"Action, Fantasy","casual, collecting, side-scrolling, cute, funn...",Platform,//images.igdb.com/igdb/image/upload/t_thumb/co...,88.425574,"https://supermariobroswonder.nintendo.com/, ht..."
120,178264,Bad End Theater,Welcome to Bad End Theater! Select your protag...,,"Fantasy, Romance","anime, 2d, choose your own adventure, demons, ...","Simulator, Adventure, Indie, Visual Novel",//images.igdb.com/igdb/image/upload/t_thumb/co...,95.009059,https://store.steampowered.com/app/1764390/BAD...


In [8]:
def create_document_column(df):
    # Prompt text: "<name>:", then summary and storyline separated by blank lines;
    # missing fields become empty strings.
    df['document'] = (
        df['name'].astype(str) + ':\n\n'
        + df['summary'].fillna('').astype(str) + '\n\n'
        + df['storyline'].fillna('').astype(str)
    )
    df.drop(columns=['summary', 'storyline'], inplace=True)
    return df

df = create_document_column(df)
print(df.sample(1)['document'].values[0])

Final Fantasy X-2 HD Remaster:

Final Fantasy X-2 HD Remaster was released individually in the Japan and Asia regions. Outside of these regions, only Final Fantasy X has a physical release, but it includes a download code for Final Fantasy X-2. Those who pick up both Final Fantasy X and Final Fantasy X-2 on PlayStation Vita can swap saves between systems to transfer data between the standalone Vita version and the PlayStation 3 counterpart.



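The sample above ends in blank lines because the storyline is missing and the template still emits its separator. A minimal sketch of an alternative that joins only the parts that are present. `build_document` and the toy rows are hypothetical and not part of the notebook; whether trailing separators matter depends on how the prompts are consumed downstream.

```python
import pandas as pd

def build_document(name, summary, storyline):
    """Join the name header and whichever text fields are present with blank lines."""
    parts = [f"{name}:"]
    # Skip NaN/None and whitespace-only strings so no dangling separators are emitted.
    parts += [str(text) for text in (summary, storyline)
              if pd.notna(text) and str(text).strip()]
    return "\n\n".join(parts)

# Toy rows, made up for illustration only.
toy = pd.DataFrame({
    "name": ["Game A", "Game B"],
    "summary": ["A short summary.", None],
    "storyline": [None, "A short storyline."],
})
toy["document"] = [
    build_document(n, s, t)
    for n, s, t in zip(toy["name"], toy["summary"], toy["storyline"])
]
print(toy["document"].tolist())
```

Unlike the notebook's template, this variant produces a different layout for rows with missing fields, so it is a substitute format rather than a drop-in equivalent.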

In [9]:
def create_keyword_column(df):
    # Merge themes, keywords, and genres into one comma-separated string, skipping missing values.
    df['keywords'] = [
        ', '.join(str(v) for v in values if pd.notna(v))
        for values in zip(df['themes'], df['keywords'], df['genres'])
    ]
    df.drop(columns=['themes', 'genres'], inplace=True)
    return df

df = create_keyword_column(df)
df.sample(1)

Unnamed: 0,id,name,keywords,cover,total_rating,websites,document
269,272,Unreal Tournament,"Action, arena, robots, reptilian humanoid, Sho...",//images.igdb.com/igdb/image/upload/t_thumb/co...,84.631503,https://en.wikipedia.org/wiki/Unreal_Tournamen...,Unreal Tournament:\n\nUnreal Tournament is a f...


In [10]:
df.to_csv('game_data.csv', index=False)
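
The list at the top names the metadata fields `image_urls`, `ratings`, and `download_urls`, while the saved file still uses `cover`, `total_rating`, and `websites`. A minimal sketch of how each row could be turned into a prompt plus a metadata record under those names. The helper `to_records`, the `https:` prefix for covers that start with `//`, and splitting `websites` on commas are all assumptions for illustration, not something the notebook does.

```python
import pandas as pd

def to_records(df):
    """Return (prompts, metadata) lists built from the columns of game_data.csv."""
    prompts, metadata = [], []
    for row in df.itertuples(index=False):
        cover = row.cover if pd.notna(row.cover) else None
        # Cover URLs in the sample output start with "//"; add a scheme (assumption).
        if cover and cover.startswith("//"):
            cover = "https:" + cover
        # Websites appear as one comma-separated string (assumption; a URL that
        # itself contains a comma would be split incorrectly).
        sites = [u.strip() for u in row.websites.split(",")] if pd.notna(row.websites) else []
        prompts.append(row.document)
        metadata.append({
            "id": row.id,
            "name": row.name,
            "keywords": row.keywords if pd.notna(row.keywords) else "",
            "image_urls": cover,
            "ratings": row.total_rating if pd.notna(row.total_rating) else None,
            "download_urls": sites,
        })
    return prompts, metadata

# Toy row with placeholder values, shaped like game_data.csv.
toy = pd.DataFrame({
    "id": [1],
    "name": ["Game A"],
    "keywords": ["Action, arena"],
    "cover": ["//example.com/cover.jpg"],
    "total_rating": [90.0],
    "websites": ["https://example.com, https://example.org"],
    "document": ["Game A:\n\nA short summary.\n\n"],
})
prompts, metadata = to_records(toy)
print(metadata[0]["image_urls"], metadata[0]["download_urls"])
```

With the real file, the same function could be called as `to_records(pd.read_csv('game_data.csv'))`.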