**Testing API Endpoint Functions with Scaled-Down Datasets**
- In this notebook, we will walk through the process of testing the functions for the API endpoints. Due to limited memory capacity on the render environment, the datasets used in these tests needed to be scaled down. For the most part, we reduced them to 5000 random rows. 
- It is important to note that when tested with a smaller amount of data, the results from the functions were practically the same as when using larger datasets.

In [125]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

In [126]:
path_items = '../data/processed/df_items_clean.parquet'
path_genres = '../data/processed/df_genres_dummies.parquet'
path_games = '../data/processed/df_games_clean.parquet'
path_reviews = '../data/processed/df_reviews_clean.parquet'


df_items = pd.read_parquet(path_items)
df_genres = pd.read_parquet(path_genres)
df_games = pd.read_parquet(path_games)
df_reviews = pd.read_parquet(path_reviews)

df_items['game_id'] = df_items['game_id'].astype('Int64')
df_genres['game_id'] = df_genres['game_id'].astype('Int64')
df_games['game_id'] = df_games['game_id'].astype('Int64')
df_reviews['game_id'] = df_reviews['game_id'].astype('Int64')
df_games = pd.read_parquet(path_games)

**Function 1:**  
- `PlayTimeGenre`(genre: str) should return the year with the most hours played for a specified genre.

In [127]:

save_path = '../data/processed/'

playtime_genre = pd.merge(df_items, df_games, on='game_id')
playtime_genre = pd.merge(playtime_genre, df_genres, on='game_id')
playtime_genre.to_parquet('playtime_genre.parquet')

In [128]:
columns_items_required = ['game_id', 'playtime_forever']  
columns_games_required = ['game_id', 'release_year']  
columns_genres_required = df_genres.columns  

df_items_reduced = df_items[columns_items_required]
df_games_reduced = df_games[columns_games_required]
df_genres_reduced = df_genres[columns_genres_required]

df_combined = pd.merge(df_items_reduced, df_games_reduced, on='game_id')
df_combined = pd.merge(df_combined, df_genres_reduced, on='game_id')

# Sample 5000 random rows 
sampled_df = df_combined.sample(n=5000, random_state=42)
sampled_df.to_parquet(save_path + 'playtime_genre_combined.parquet', compression='snappy')


In [129]:
def playtime_genre(genre: str):
    
    df_combined = pd.read_parquet(save_path + 'playtime_genre_combined.parquet')
    genre_lower = genre.lower() # case sensitivity

    # Find corresponding genre
    genre_column = next((col for col in df_combined.columns if col.lower() == genre_lower), None)
    
    if not genre_column:
        return {"error": f"Genre '{genre}' not found"}

    genre_df = df_combined[df_combined[genre_column] == 1]

    playtime_by_year = genre_df.groupby('release_year')['playtime_forever'].sum()

    max_playtime_year = playtime_by_year.idxmax()


    return {"Year with most hours played for Genre {}".format(genre): int(max_playtime_year)}


In [130]:

result_1 = playtime_genre("action")
result_2 = playtime_genre("adventure")
result_3 = playtime_genre("strategy")

print(result_1)
print(result_2)
print(result_3)


{'Year with most hours played for Genre action': 2012}
{'Year with most hours played for Genre adventure': 2006}
{'Year with most hours played for Genre strategy': 2012}


**Function 2:**
- `UserForGenre(genre: str)` should return the user who has accumulated the most hours played for the given genre and a list of the accumulation of hours played by year.

In [None]:
columns_items_required = ['game_id', 'user_id', 'playtime_forever']
columns_games_required = ['game_id', 'release_year'] 
columns_genres_required = df_genres.columns

df_items_reduced = df_items[columns_items_required]
df_games_reduced = df_games[columns_games_required]
df_genres_reduced = df_genres[columns_genres_required]

df_items_reduced['playtime_forever'] = df_items_reduced['playtime_forever'].fillna(0).astype(int)

df_combined = pd.merge(df_items_reduced, df_games_reduced, on='game_id')
df_combined = pd.merge(df_combined, df_genres_reduced, on='game_id')

sampled_df = df_combined.sample(n=5000, random_state=42)
sampled_df.to_parquet(save_path + 'user_genre_combined.parquet', compression='snappy')

In [132]:
def user_for_genre(genre: str):
    df_combined = pd.read_parquet('../data/processed/user_genre_combined.parquet')

    genre_lower = genre.lower()
    genre_column = next((col for col in df_combined.columns if col.lower() == genre_lower), None)
    if not genre_column:
        return {"error": f"Genre '{genre}' not found"}
    genre_df = df_combined[df_combined[genre_column] == 1]
    user_playtime = genre_df.groupby('user_id')['playtime_forever'].sum()
    top_user = user_playtime.idxmax()
    playtime_by_year = genre_df[genre_df['user_id'] == top_user].groupby('release_year')['playtime_forever'].sum().reset_index()
    playtime_by_year = playtime_by_year[playtime_by_year['playtime_forever'] > 0]
    playtime_by_year = playtime_by_year.sort_values('release_year', ascending=False)

    return {
        "User with most hours played for Genre {}".format(genre): str(top_user),
        "Hours Played": playtime_by_year.to_dict(orient='records')
    }


In [133]:
test_genres = ["action", "adventure", "strategy"]

for genre in test_genres:
    result = user_for_genre(genre)
    print(f"Results for genre '{genre}':")
    print(result)
    print("\n")


Results for genre 'action':
{'User with most hours played for Genre action': 'webbydee', 'Hours Played': [{'release_year': 2004, 'playtime_forever': 154662}]}


Results for genre 'adventure':
{'User with most hours played for Genre adventure': 'JF-Fuzion', 'Hours Played': [{'release_year': 2006, 'playtime_forever': 135874}]}


Results for genre 'strategy':
{'User with most hours played for Genre strategy': 'webbydee', 'Hours Played': [{'release_year': 2004, 'playtime_forever': 154662}]}




**Function 3:**
- `UsersRecommend(year: int)` returns the top 3 games MOST recommended by users for the given year. 
- These are games with positive or neutral reviews that received the most recommendations from users in that year.

In [134]:

columns_games_required = ['game_id', 'release_year', 'app_name']
columns_reviews_required = ['game_id', 'recommend']

df_games_reduced = df_games[columns_games_required]
df_reviews_reduced = df_reviews[columns_reviews_required]

df_combined_reviews = pd.merge(df_games_reduced, df_reviews_reduced, on='game_id')

sampled_df = df_combined_reviews.sample(n=5000, random_state=42)
sampled_df.to_parquet(save_path + 'games_reviews_combined.parquet', compression='snappy')



In [135]:
def users_recommend(year: int):
    df_combined_reviews = pd.read_parquet(save_path + 'games_reviews_combined.parquet')

    year_games = df_combined_reviews[df_combined_reviews['release_year'] == year]

    if year_games.empty:
        return {"message": f"No data available for the year {year}"}

    recommended_reviews = year_games[year_games['recommend'] == True]

    top_games = recommended_reviews['app_name'].value_counts().head(3).index.tolist()
    top_games_ranked = [{"Rank {}".format(i + 1): game} for i, game in enumerate(top_games)]

    if not top_games_ranked:
        return {"message": f"No recommended games found for the year {year}"}

    return {"Top recommended games for the year {}".format(year): top_games_ranked}



In [136]:
test_years = [2017, 2016, 2015]  

for year in test_years:
    result = users_recommend(year)
    print(f"Results for the year {year}:")
    print(result)
    print("\n")


Results for the year 2017:
{'Top recommended games for the year 2017': [{'Rank 1': 'Unturned'}, {'Rank 2': 'Robocraft'}, {'Rank 3': 'ARK: Survival Evolved'}]}


Results for the year 2016:
{'Top recommended games for the year 2016': [{'Rank 1': 'Starbound'}, {'Rank 2': 'Heroes & Generals'}, {'Rank 3': 'Stardew Valley'}]}


Results for the year 2015:
{'Top recommended games for the year 2015': [{'Rank 1': 'Grand Theft Auto V'}, {'Rank 2': 'Rocket League®'}, {'Rank 3': 'Fallout 4'}]}




In [137]:
df_combined_reviews.head()

Unnamed: 0,game_id,release_year,app_name,recommend
0,282010,1997,Carmageddon Max Pack,True
1,70,1998,Half-Life,True
2,70,1998,Half-Life,True
3,70,1998,Half-Life,True
4,70,1998,Half-Life,True


**Function 4:**
- `UsersWorstDeveloper(year: int)` returns the top 3 developers with the LEAST recommended games by users for the given year. 
- These are developers whose games received negative reviews and had the most negative comments from users in that year.


In [138]:

columns_games_required = ['game_id', 'release_year', 'developer']
columns_reviews_required = ['game_id', 'recommend']

df_games_reduced = df_games[columns_games_required]
df_reviews_reduced = df_reviews[columns_reviews_required]

df_combined_reviews_developer = pd.merge(df_games_reduced, df_reviews_reduced, on='game_id')

sampled_df = df_combined_reviews_developer.sample(n=5000, random_state=42)
sampled_df.to_parquet(save_path + 'games_reviews_developer_combined.parquet', compression='snappy')

In [139]:
def users_worst_developer(year: int):
    
    df_combined_reviews_developer = pd.read_parquet(save_path + 'games_reviews_developer_combined.parquet')
    year_games = df_combined_reviews_developer[df_combined_reviews_developer['release_year'] == year]

    if year_games.empty:
        return {"message": f"No data available for the year {year}"}

    negative_reviews = year_games[year_games['recommend'] == False]

    worst_developers = negative_reviews['developer'].value_counts().nsmallest(3).index.tolist()
    worst_developers_ranked = [{"Rank {}".format(i + 1): dev} for i, dev in enumerate(worst_developers)]

    if not worst_developers_ranked:
        return {"message": f"No negative reviews found for developers in the year {year}"}

    return {"Top 3 developers with the most negative reviews for the year {}".format(year): worst_developers_ranked}

In [140]:
test_years = [2017, 2016, 2015] 

for year in test_years:
    result = users_worst_developer(year)
    print(f"Results for the year {year}:")
    print(result)
    print("\n")


Results for the year 2017:
{'Top 3 developers with the most negative reviews for the year 2017': [{'Rank 1': 'Wolfire Games'}, {'Rank 2': 'Sauropod Studio'}, {'Rank 3': 'oddonegames'}]}


Results for the year 2016:
{'Top 3 developers with the most negative reviews for the year 2016': [{'Rank 1': 'Viswanath Atlu,Laurie Banks,Rohan Bhukan,Nick Burnham,Avinash Kalapala,Yash Kapani,Katharine Marsh,Ankur Rathore,Hardit Singh,Anoop Nihar Srinivas,Ryan Guanyuhao Jiang,Robert Zhu'}, {'Rank 2': 'G2CREW'}, {'Rank 3': 'Capcom'}]}


Results for the year 2015:
{'Top 3 developers with the most negative reviews for the year 2015': [{'Rank 1': 'DotEmu'}, {'Rank 2': 'Darius Bode'}, {'Rank 3': 'EightyEightGames'}]}




**Function 5:**
- `sentiment_analysis(developer: str)` returns a dictionary where the developer's name is the key, and the value is a list containing the total number of records of user reviews categorized with sentiment analysis.

In [149]:
columns_games_required = ['game_id', 'developer']
columns_reviews_required = ['game_id', 'sentiment_analysis']

df_games_reduced = df_games[columns_games_required]
df_reviews_reduced = df_reviews[columns_reviews_required]

df_combined_sentiment = pd.merge(df_games_reduced, df_reviews_reduced, on='game_id')


sampled_df = df_combined_sentiment.sample(n=5000, random_state=42)
sampled_df.to_parquet(save_path + 'games_sentiment_combined.parquet', compression='snappy')


In [150]:
sampled_df.head()

Unnamed: 0,game_id,developer,sentiment_analysis
7168,730,Valve,2
7831,730,Valve,0
42034,92000,Dark Energy Digital Ltd.,2
33403,224260,No More Room in Hell Team,2
6373,730,Valve,1


In [142]:
def sentiment_analysis(developer: str):
    df_combined_sentiment = pd.read_parquet(save_path + 'games_sentiment_combined.parquet')
    developer_lower = developer.lower()

    matched_developer = df_combined_sentiment['developer'].str.lower().eq(developer_lower)
    if not matched_developer.any():
        return {"error": f"Developer '{developer}' not found"}

    dev_reviews = df_combined_sentiment[matched_developer]
    if dev_reviews.empty:
        return {"message": "No reviews found for developer {}".format(developer)}

    sentiment_count = dev_reviews['sentiment_analysis'].value_counts().rename({0: 'Negative', 1: 'Neutral', 2: 'Positive'})

    return {"Sentiment analysis for developer {}".format(developer): sentiment_count.to_dict()}


In [143]:
test_developers = ["Valve", "Stainless Games Ltd", "Test Missing"]

for developer in test_developers:
    result = sentiment_analysis(developer)
    print(f"Results for developer '{developer}':")
    print(result)
    print("\n")


Results for developer 'Valve':
{'Sentiment analysis for developer Valve': {'Positive': 542, 'Neutral': 256, 'Negative': 158}}


Results for developer 'Stainless Games Ltd':
{'error': "Developer 'Stainless Games Ltd' not found"}


Results for developer 'Test Missing':
{'error': "Developer 'Test Missing' not found"}




In [144]:
df_combined_sentiment.head()

Unnamed: 0,game_id,developer,sentiment_analysis
0,282010,Stainless Games Ltd,1
1,70,Valve,0
2,70,Valve,0
3,70,Valve,0
4,70,Valve,2


**ML Function:**
-  recommend_game(product_id: int) takes a product ID as input and should return a list of 5 recommended games that are similar to the input game.



In [145]:
import pandas as pd
from sklearn.preprocessing import StandardScaler


df_merged = df_games.merge(df_genres, on='game_id', how='left')

features = ['release_year'] + list(df_genres.columns[1:]) 

scaler = StandardScaler()
df_merged['release_year'] = scaler.fit_transform(df_merged[['release_year']])

df_final = df_merged[['game_id', 'app_name'] + features]

df_sampled = df_final.sample(n=2000, random_state=42)

save_path = '../data/processed/preprocessed_sample_for_recommendation.parquet'
df_sampled.to_parquet(save_path, compression='snappy')



In [146]:
df_sampled.head()

Unnamed: 0,game_id,app_name,release_year,Accounting,Action,Adventure,Animation & Modeling,Audio Production,Casual,Design & Illustration,...,Photo Editing,RPG,Racing,Simulation,Software Training,Sports,Strategy,Utilities,Video Production,Web Publishing
17084,520641,Ashes of the Singularity: Escalation - Soundtr...,0.657395,0,0,0,0,0,0,0,...,0,0,0,1,0,0,1,0,0,0
18206,542640,RPG Maker MV - Karugamo Contemporary BGM Pack 01,0.351686,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,1
24171,331068,Call of Duty®: Advanced Warfare - Steampunk Ex...,0.045976,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5627,428830,Save the Dodos,0.351686,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
15214,598480,World of One,0.657395,0,1,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [147]:
def recommend_game(game_id, top_n=5):
  
    df_sampled = pd.read_parquet('../data/processed/preprocessed_sample_for_recommendation.parquet')
    df_sampled = df_sampled.reset_index(drop=True)

    # Filter numeric features for the similarity calculation
    numeric_features = df_sampled.select_dtypes(include=[np.number]).columns.tolist()
    similarity_matrix = cosine_similarity(df_sampled[numeric_features].fillna(0))
    similarity_matrix = np.nan_to_num(similarity_matrix)
 
    if game_id not in df_sampled['game_id'].values:
        return f"No recommendations found: {game_id} is not in the data."

    game_idx = df_sampled.index[df_sampled['game_id'] == game_id].tolist()
    if not game_idx:
        return f"No recommendations found: Game with ID {game_id} not found in data."
    game_idx = game_idx[0]

    # Similarity scores & sorting
    similarity_scores = list(enumerate(similarity_matrix[game_idx]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)

    similar_games_indices = [i for i, score in similarity_scores[1:top_n+1]]
    similar_game_names = df_sampled['app_name'].iloc[similar_games_indices].tolist()


    input_game_name = df_sampled['app_name'].iloc[game_idx]
    recommendation_message = f"Recommended games based on game ID {game_id} - {input_game_name}:"
    
    return [recommendation_message] + similar_game_names




In [148]:
test_game_id = 542640  
recommendations = recommend_game(test_game_id, top_n=5)

print("Recommendations for Game ID", test_game_id)
for rec in recommendations:
    print(rec)


Recommendations for Game ID 542640
Recommended games based on game ID 542640 - RPG Maker MV - Karugamo Contemporary BGM Pack 01:
RPG Maker MV - Twilight Shrine: Japanese Resource Pack
RPG Maker MV - Samurai Classics: Temple of Darkness
RPG Maker MV - Karugamo Fantasy BGM Pack 04
RPG Maker VX Ace - Futuristic Characters Pack
RPG Maker VX Ace - Time Fantasy: Winter Tiles
