# General Assembly DSI 9 - Jorge Ramos
# Capstone: New App Evaluator
## Problem Statement
Determine whether a new app is likely to be successful based on its title and description. To reach this result, the model relates the titles and descriptions of existing apps to their respective ratings and builds a recommender that acts as an evaluator for future apps. To be evaluated, a person inputs the title and description of their app.

## Executive Summary
Users will access the web page and input their app's title and description. Pressing the submit button will run the evaluator and output a result. The result contains the predicted rating, computed as the average rating of the most related apps. An app rated 4 or higher is considered successful; a rating below 4 suggests rethinking the app's title and description.
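
The rating step described above could be sketched as follows. This is a minimal illustration and not the capstone's actual model: the bag-of-words cosine similarity, the helper names (`tokenize`, `cosine_similarity`, `predict_rating`) and the toy catalogue are all assumptions made for the example. Only the averaging over the most related apps and the cutoff of 4 come from the summary.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(new_text, catalogue, top_n=10):
    """Average the ratings of the top_n catalogue apps most similar to new_text.

    Assumes the catalogue is not empty.
    """
    new_vec = Counter(tokenize(new_text))
    scored = [
        (cosine_similarity(new_vec, Counter(tokenize(app["text"]))), app["rating"])
        for app in catalogue
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:top_n]
    return sum(rating for _, rating in top) / len(top)


# Hypothetical toy catalogue: "text" stands for title + description.
catalogue = [
    {"text": "Sudoku classic number puzzle", "rating": 4.5},
    {"text": "Crossword word puzzle", "rating": 4.0},
    {"text": "Space shooter arcade action", "rating": 3.0},
]

predicted = predict_rating("Daily sudoku puzzle with numbers", catalogue, top_n=2)
verdict = "successful" if predicted >= 4 else "rethink title and description"
print(predicted, verdict)
```

Any other text similarity measure could be swapped in for `cosine_similarity` without changing the averaging step.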

Additionally, the result will include some recommendations:
- Age Rating:
  - Based on the existing most related apps.
- Genre:
  - Based on the existing most related apps.
  - This may be more than one genre if the app could also be related to another one.
- Categories:
  - Based on the existing most related apps.
  - This will include all the different categories under which the related apps have been classified.
- Free or Paid:
  - Based on the existing most related apps.
  - The percentage of the top ten most similar apps that are free.
  - The average price of the related apps.
  - The maximum price of the related apps.

At the end, there will be a list of the most related apps with their respective ratings and links to the App Store.
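
The recommendations listed above could be assembled from the related apps roughly as shown here. This is a sketch under assumptions: the `summarize_related` helper and its output keys are hypothetical, taking the most common age rating and primary genre is one possible reading of "based on the existing most related apps", and the three records reuse values from the first rows of the dataset.

```python
from collections import Counter


def summarize_related(related_apps):
    """Build a recommendation summary from a list of related-app records."""
    prices = [app["price"] for app in related_apps]
    # Union of every genre label found in the comma-separated "genres" field
    categories = {
        genre.strip()
        for app in related_apps
        for genre in app["genres"].split(",")
    }
    age_ratings = Counter(app["age_rating"] for app in related_apps)
    primary_genres = Counter(app["primary_genre"] for app in related_apps)
    return {
        "age_rating": age_ratings.most_common(1)[0][0],
        "genre": primary_genres.most_common(1)[0][0],
        "categories": sorted(categories),
        "percent_free": 100 * sum(price == 0 for price in prices) / len(prices),
        "average_price": sum(prices) / len(prices),
        "max_price": max(prices),
    }


# Toy stand-in for the most related apps (assumed input format).
related = [
    {"name": "Sudoku", "price": 2.99, "age_rating": "4+",
     "primary_genre": "Games", "genres": "Games, Strategy, Puzzle"},
    {"name": "Reversi", "price": 1.99, "age_rating": "4+",
     "primary_genre": "Games", "genres": "Games, Strategy, Board"},
    {"name": "Morocco", "price": 0.0, "age_rating": "4+",
     "primary_genre": "Games", "genres": "Games, Board, Strategy"},
]

summary = summarize_related(related)
print(summary)
```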

### Import Libraries

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import time

In [2]:
%%capture
# tqdm_notebook is deprecated; tqdm.notebook provides the same progress bar
from tqdm.notebook import tqdm
tqdm.pandas()

### Load CSV

In [3]:
df = pd.read_csv("data/appstore_games.csv")
df.head(3)

Unnamed: 0,URL,ID,Name,Subtitle,Icon URL,Average User Rating,User Rating Count,Price,In-app Purchases,Description,Developer,Age Rating,Languages,Size,Primary Genre,Genres,Original Release Date,Current Version Release Date
0,https://apps.apple.com/us/app/sudoku/id284921427,284921427,Sudoku,,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,4.0,3553.0,2.99,,"Join over 21,000,000 of our fans and download ...",Mighty Mighty Good Games,4+,"DA, NL, EN, FI, FR, DE, IT, JA, KO, NB, PL, PT...",15853568.0,Games,"Games, Strategy, Puzzle",11/07/2008,30/05/2017
1,https://apps.apple.com/us/app/reversi/id284926400,284926400,Reversi,,https://is4-ssl.mzstatic.com/image/thumb/Purpl...,3.5,284.0,1.99,,"The classic game of Reversi, also known as Oth...",Kiss The Machine,4+,EN,12328960.0,Games,"Games, Strategy, Board",11/07/2008,17/05/2018
2,https://apps.apple.com/us/app/morocco/id284946595,284946595,Morocco,,https://is5-ssl.mzstatic.com/image/thumb/Purpl...,3.0,8376.0,0.0,,Play the classic strategy game Othello (also k...,Bayou Games,4+,EN,674816.0,Games,"Games, Board, Strategy",11/07/2008,5/09/2017


### Initial Data Specs
There is data for 17,007 different apps.

In [4]:
df.shape

(17007, 18)

In [5]:
df.dtypes

URL                              object
ID                                int64
Name                             object
Subtitle                         object
Icon URL                         object
Average User Rating             float64
User Rating Count               float64
Price                           float64
In-app Purchases                 object
Description                      object
Developer                        object
Age Rating                       object
Languages                        object
Size                            float64
Primary Genre                    object
Genres                           object
Original Release Date            object
Current Version Release Date     object
dtype: object

### Renaming Columns

In [6]:
df.columns

Index(['URL', 'ID', 'Name', 'Subtitle', 'Icon URL', 'Average User Rating',
       'User Rating Count', 'Price', 'In-app Purchases', 'Description',
       'Developer', 'Age Rating', 'Languages', 'Size', 'Primary Genre',
       'Genres', 'Original Release Date', 'Current Version Release Date'],
      dtype='object')

In [7]:
df.columns = map(str.lower, df.columns)

In [8]:
df.columns = df.columns.str.replace(" ", "_")

In [9]:
df.columns

Index(['url', 'id', 'name', 'subtitle', 'icon_url', 'average_user_rating',
       'user_rating_count', 'price', 'in-app_purchases', 'description',
       'developer', 'age_rating', 'languages', 'size', 'primary_genre',
       'genres', 'original_release_date', 'current_version_release_date'],
      dtype='object')
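
The two renaming steps could also be wrapped in one small helper. This is a sketch only, and `normalize_columns` is a hypothetical name that does not appear in the notebook. It leaves hyphens untouched, so `In-app Purchases` still becomes `in-app_purchases`, matching the output above.

```python
import pandas as pd


def normalize_columns(frame):
    """Return a copy of frame with lowercase, underscore-separated column names."""
    renamed = frame.copy()
    renamed.columns = renamed.columns.str.lower().str.replace(" ", "_", regex=False)
    return renamed


# Empty frame that only carries a few of the original column names
example = pd.DataFrame(columns=["Average User Rating", "In-app Purchases", "ID"])
new_names = list(normalize_columns(example).columns)
print(new_names)
```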

### Managing Null Values

In [10]:
df.isnull().sum()

url                                 0
id                                  0
name                                0
subtitle                        11746
icon_url                            0
average_user_rating              9446
user_rating_count                9446
price                              24
in-app_purchases                 9324
description                         0
developer                           0
age_rating                          0
languages                          60
size                                1
primary_genre                       0
genres                              0
original_release_date               0
current_version_release_date        0
dtype: int64

#### Replacing all NaN values of "Average User Rating" and "User Rating Count" with 0
According to the discussion thread on Kaggle, these values are NaN because the apps have fewer than 5 ratings. Apple presumably treats so few ratings as not representative of a good or bad overall rating, so I will assume these apps all have 0 ratings. \
With this assumption in mind, I will also give all these apps a rating of 0.
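
Before filling, it can help to confirm that both columns are missing on exactly the same rows, since the assumption treats them as one group of apps. The sketch below does that on a hypothetical four-row frame (`toy` is not part of the dataset) and then fills both columns in one call.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the two rating columns
toy = pd.DataFrame(
    {
        "average_user_rating": [4.0, np.nan, 3.5, np.nan],
        "user_rating_count": [3553.0, np.nan, 284.0, np.nan],
    }
)

# Both columns should be missing on exactly the same rows
same_rows = bool(
    (toy["average_user_rating"].isnull() == toy["user_rating_count"].isnull()).all()
)

# Fill both columns at once by passing a column-to-value mapping
filled = toy.fillna({"average_user_rating": 0, "user_rating_count": 0})
remaining_nulls = int(filled.isnull().sum().sum())
print(same_rows, remaining_nulls)
```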

In [11]:
df.loc[df["average_user_rating"] > 0].shape[0]

7561

In [12]:
df.loc[(df["user_rating_count"] >= 5)].shape[0]

7561

In [13]:
# Simple addition to make sure I am considering all 17,007 apps
7561 + 9446 # Value counts + NaN value counts

17007

In [14]:
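# Assign the result back: calling fillna(inplace=True) on a selected column
# is chained assignment and may leave df unchanged in recent pandas versions
df["average_user_rating"] = df["average_user_rating"].fillna(0)
df["user_rating_count"] = df["user_rating_count"].fillna(0)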

In [15]:
df.isnull().sum()

url                                 0
id                                  0
name                                0
subtitle                        11746
icon_url                            0
average_user_rating                 0
user_rating_count                   0
price                              24
in-app_purchases                 9324
description                         0
developer                           0
age_rating                          0
languages                          60
size                                1
primary_genre                       0
genres                              0
original_release_date               0
current_version_release_date        0
dtype: int64

#### Replacing all NaN values of "subtitle" with " "
The NaN values in the subtitle column indicate that there is no subtitle, so these values will be replaced with a single space. This makes sure that we get the correct information when merging title, subtitle and description into one string.
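
The merge mentioned above might look like the sketch below. Concatenating string columns propagates missing values, which is why the subtitle is filled first. The `text` column name and the second app's subtitle are hypothetical, and collapsing repeated whitespace is an extra step assumed here, not something taken from the notebook.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "name": ["Sudoku", "Reversi"],
        # " " is the filled value; the second subtitle is made up for the example
        "subtitle": [" ", "Classic board game"],
        "description": [
            "Join over 21,000,000 of our fans",
            "The classic game of Reversi",
        ],
    }
)

# Concatenate the three text columns, then collapse repeated whitespace
combined = toy["name"] + " " + toy["subtitle"] + " " + toy["description"]
toy["text"] = combined.str.replace(r"\s+", " ", regex=True).str.strip()
print(toy["text"].tolist())
```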

In [16]:
# Number of NaN values
df.loc[df["subtitle"].isnull()].shape[0]

11746

In [17]:
df["subtitle"] = df["subtitle"].fillna(" ")
df.isnull().sum()

url                                0
id                                 0
name                               0
subtitle                           0
icon_url                           0
average_user_rating                0
user_rating_count                  0
price                             24
in-app_purchases                9324
description                        0
developer                          0
age_rating                         0
languages                         60
size                               1
primary_genre                      0
genres                             0
original_release_date              0
current_version_release_date       0
dtype: int64

#### Managing prices and correcting ratings
Since only 24 apps have NaN values for price, I will input the actual values manually and correct the rating information for these apps when it is available on the App Store website. I will also drop the rows for apps that no longer exist.

In [18]:
df.loc[df["price"].isnull(), "name"].shape[0]

24

In [19]:
df.loc[df["price"].isnull(), "name"]

8341                             Germiz
15216                        Gears POP!
16104                 State of Survival
16158              LEAGUE OF WONDERLAND
16351    Western Redemption: Cowboy Gun
16358                       Magic ARena
16628        WW2 Battle Front Simulator
16635                    "Lock's Quest"
16715                     Second Galaxy
16725      Game of Thrones Beyond\u2026
16731                     Color Defense
16760    Crazy Restaurant Cooking Games
16780                   Fleet Chronicle
16782            Idle Mars Colonization
16795        Idle Computer Game Company
16879    Terrorist Shooter: City Missio
16891                           Type II
16902          Island Jurassic Survival
16942            Flying Carpet Shooting
16951      War Shooting Battle Survival
16978          Magic Puzzle Box: 3 in 1
16980              Block Soldier Sniper
17000                         Super kid
17001             Lava Island Adventure
Name: name, dtype: object

In [20]:
df["price"].unique()

array([  2.99,   1.99,   0.  ,   0.99,   5.99,   7.99,   4.99,   3.99,
         9.99,  19.99,   6.99,  11.99,   8.99, 139.99,  12.99,  99.99,
        14.99,  16.99, 179.99,    nan,  37.99,  36.99,  29.99,  18.99,
        59.99])

In [21]:
# Germiz
df.loc[df["name"] == "Germiz", "price"] = 0.0

# Gears POP!
df.loc[df["name"] == "Gears POP!", "price"] = 0.0
df.loc[df["name"] == "Gears POP!", "average_user_rating"] = 4.2
df.loc[df["name"] == "Gears POP!", "user_rating_count"] = 5100.0

# State of Survival
df.loc[df["name"] == "State of Survival", "price"] = 0.0
df.loc[df["name"] == "State of Survival", "average_user_rating"] = 4.7
df.loc[df["name"] == "State of Survival", "user_rating_count"] = 5800.0

# LEAGUE OF WONDERLAND
df.loc[df["name"] == "LEAGUE OF WONDERLAND", "price"] = 0.0
df.loc[df["name"] == "LEAGUE OF WONDERLAND", "average_user_rating"] = 4.1
df.loc[df["name"] == "LEAGUE OF WONDERLAND", "user_rating_count"] = 62.0

# Western Redemption: Cowboy Gun
df.loc[df["name"] == "Western Redemption: Cowboy Gun", "price"] = 0.0

# Magic ARena
df.loc[df["name"] == "Magic ARena", "price"] = 0.0
df.loc[df["name"] == "Magic ARena", "average_user_rating"] = 3.7
df.loc[df["name"] == "Magic ARena", "user_rating_count"] = 70.0

# WW2 Battle Front Simulator
df.loc[df["name"] == "WW2 Battle Front Simulator", "price"] = 0.0
df.loc[df["name"] == "WW2 Battle Front Simulator", "average_user_rating"] = 4.4
df.loc[df["name"] == "WW2 Battle Front Simulator", "user_rating_count"] = 773.0

# Lock's Quest
df.loc[df["name"] == "\"Lock's Quest\"", "price"] = 6.99

# Second Galaxy
df.loc[df["name"] == "Second Galaxy", "price"] = 0.0
df.loc[df["name"] == "Second Galaxy", "average_user_rating"] = 4.6
df.loc[df["name"] == "Second Galaxy", "user_rating_count"] = 8900.0

# Game of Thrones Beyond\u2026 -- Does not exist any more, row will be deleted
# df.loc[df["name"] == "Game of Thrones Beyond\u2026", "price"] = 0.0
df.drop(index=df.loc[df["name"] == "Game of Thrones Beyond\\u2026"].index[0], inplace=True)

# Color Defense
df.loc[df["name"] == "Color Defense", "price"] = 2.99

# Crazy Restaurant Cooking Games
df.loc[df["name"] == "Crazy Restaurant Cooking Games", "price"] = 0.0
df.loc[df["name"] == "Crazy Restaurant Cooking Games", "average_user_rating"] = 4.8
df.loc[df["name"] == "Crazy Restaurant Cooking Games", "user_rating_count"] = 600.0

# Fleet Chronicle
df.loc[df["name"] == "Fleet Chronicle", "price"] = 0.0

# Idle Mars Colonization -- Does not exist any more, row will be deleted
# df.loc[df["name"] == "Idle Mars Colonization", "price"] = 0.0
df.drop(index=df.loc[df["name"] == "Idle Mars Colonization"].index[0], inplace=True)

# Idle Computer Game Company -- Does not exist any more, row will be deleted
# df.loc[df["name"] == "Idle Computer Game Company", "price"] = 0.0
df.drop(index=df.loc[df["name"] == "Idle Computer Game Company"].index[0], inplace=True)

# Terrorist Shooter: City Missio
df.loc[df["name"] == "Terrorist Shooter: City Missio", "price"] = 0.0

# Type II
df.loc[df["name"] == "Type II", "price"] = 5.99

# Island Jurassic Survival
df.loc[df["name"] == "Island Jurassic Survival", "price"] = 0.0

# Flying Carpet Shooting
df.loc[df["name"] == "Flying Carpet Shooting", "price"] = 0.0

# War Shooting Battle Survival
df.loc[df["name"] == "War Shooting Battle Survival", "price"] = 0.0

# Magic Puzzle Box: 3 in 1
df.loc[df["name"] == "Magic Puzzle Box: 3 in 1", "price"] = 0.0

# Block Soldier Sniper
df.loc[df["name"] == "Block Soldier Sniper", "price"] = 0.0

# Super kid -- Does not exist any more, row will be deleted
# df.loc[df["name"] == "Super kid", "price"] = 0.0
df.drop(index=df.loc[df["name"] == "Super kid"].index[0], inplace=True)

# Lava Island Adventure -- Does not exist any more, row will be deleted
# df.loc[df["name"] == "Lava Island Adventure", "price"] = 0.0
df.drop(index=df.loc[df["name"] == "Lava Island Adventure"].index[0], inplace=True)
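
The manual corrections above follow one pattern, so they could also be driven by a lookup table. The sketch below shows the idea on a hypothetical four-row frame: `corrections` and `removed` are illustrative names, and only two of the corrections and one removed app are reproduced.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with the columns the corrections touch
toy = pd.DataFrame(
    {
        "name": ["Germiz", "Gears POP!", "Super kid", "Sudoku"],
        "price": [np.nan, np.nan, np.nan, 2.99],
        "average_user_rating": [0.0, 0.0, 0.0, 4.0],
        "user_rating_count": [0.0, 0.0, 0.0, 3553.0],
    }
)

# Values to overwrite, keyed by app name
corrections = {
    "Germiz": {"price": 0.0},
    "Gears POP!": {
        "price": 0.0,
        "average_user_rating": 4.2,
        "user_rating_count": 5100.0,
    },
}
# Apps that no longer exist in the App Store
removed = ["Super kid"]

for app_name, values in corrections.items():
    for column, value in values.items():
        toy.loc[toy["name"] == app_name, column] = value

# Drop every row whose name is in the removed list
toy = toy.drop(index=toy.index[toy["name"].isin(removed)])
print(toy)
```

Keeping the corrections in one dictionary makes it easier to review which values were entered by hand.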

In [22]:
df.shape

(17002, 18)

In [23]:
df.isnull().sum()

url                                0
id                                 0
name                               0
subtitle                           0
icon_url                           0
average_user_rating                0
user_rating_count                  0
price                              0
in-app_purchases                9320
description                        0
developer                          0
age_rating                         0
languages                         60
size                               0
primary_genre                      0
genres                             0
original_release_date              0
current_version_release_date       0
dtype: int64

#### Managing languages
Since only 60 apps have no language, I will input the data manually in 3 steps.
1. Drop rows with apps that don't exist anymore
2. Set the languages for apps that support more than just English
3. Replace all remaining NaN values with "EN" for English
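
The three steps can be previewed on a small frame before running them on the full data. In this sketch the frame is hypothetical, `removed_ids` and `known_languages` are illustrative names, and the IDs are examples taken from this section.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with missing languages
toy = pd.DataFrame(
    {
        "id": [408198768, 1323959674, 286313771, 284926400],
        "languages": [np.nan, np.nan, np.nan, "EN"],
    }
)

removed_ids = [408198768]                 # step 1: apps that no longer exist
known_languages = {1323959674: "EN, VI"}  # step 2: apps with more than English

toy = toy[~toy["id"].isin(removed_ids)].copy()
for app_id, langs in known_languages.items():
    toy.loc[toy["id"] == app_id, "languages"] = langs

# Step 3: everything still missing defaults to English
toy["languages"] = toy["languages"].fillna("EN")
print(toy)
```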

In [24]:
df.loc[df["languages"].isnull()].shape[0]

60

In [25]:
# Build the subset with a fresh index in one step instead of modifying a slice in place
null_lang = df.loc[df["languages"].isnull()].reset_index(drop=True)

In [26]:
null_lang.head()

Unnamed: 0,url,id,name,subtitle,icon_url,average_user_rating,user_rating_count,price,in-app_purchases,description,developer,age_rating,languages,size,primary_genre,genres,original_release_date,current_version_release_date
0,https://apps.apple.com/us/app/gravitation/id28...,286313771,Gravitation,,https://is5-ssl.mzstatic.com/image/thumb/Purpl...,2.5,35.0,0.0,,"""Gravitation is a new implementation of the pu...",Robert Farnum,4+,,6328320.0,Games,"Games, Entertainment, Puzzle, Strategy",30/07/2008,14/11/2013
1,https://apps.apple.com/us/app/naval-fight/id33...,334183810,Naval Fight,,https://is3-ssl.mzstatic.com/image/thumb/Purpl...,2.5,1604.0,0.0,,"""Naval Fight is a battleship game where you at...",Steve Tranby,4+,,7075285.0,Games,"Games, Board, Strategy, Entertainment",14/10/2009,18/12/2010
2,https://apps.apple.com/us/app/guess-number-fre...,408198768,Guess Number Free,,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,3.5,8.0,0.0,,Guess Number is a fun little game where the ap...,ESP,4+,,1746288.0,Games,"Games, Strategy, Puzzle, Education",15/12/2010,13/05/2011
3,https://apps.apple.com/us/app/bad-air-day-lite...,412654105,Bad Air Day LITE,,https://is1-ssl.mzstatic.com/image/thumb/Purpl...,3.5,701.0,0.0,,"""Game Description:\nHelp aromatic Artie keep h...",GameHouse,4+,,16633132.0,Games,"Games, Strategy, Action, Entertainment",3/02/2011,3/02/2011
4,https://apps.apple.com/us/app/shatranj/id41771...,417715370,Shatranj,,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,0.0,0.0,0.0,,"Chess, formally known as Shatranj (Currently a...",HiddenBrains,4+,,7634750.0,Games,"Games, Strategy",10/02/2011,10/02/2011


In [27]:
base_url = "https://itunes.apple.com/lookup?id="

In [28]:
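def app_store_api_lang_update(sel_df, orig_df, base_url):
    """Look up every app in sel_df and report what the API knows about its languages.

    orig_df is not used; it is kept so the existing call signature still works.
    """
    for i in tqdm(range(len(sel_df))):
        # Pause for 60 seconds after every 20 requests
        if i % 20 == 0 and i > 19:
            time.sleep(60)
        app_id = str(sel_df["id"][i])
        url = base_url + app_id
        res = requests.get(url)
        if res.status_code == 200:
            data = res.json()
            if data["resultCount"] != 0:
                new_data = data["results"][0]["languageCodesISO2A"]
                if new_data == []:
                    # The app exists but the API lists no languages for it
                    print(sel_df["url"][i])
                    print(sel_df["id"][i])
                else:
                    # Report the languages and keep going; the previous
                    # `return` here ended the loop at the first such app
                    print(f"Languages for {app_id}: {', '.join(new_data)}")
            else:
                print(f"This app does not exist anymore: {sel_df['id'][i]}")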
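
The branching inside the function can be isolated into a small helper that works on an already decoded response, which makes it testable without network access. This is a sketch: `classify_lookup` is a hypothetical name, the payloads in the usage lines are made up, and the only response keys it relies on are the ones used above (`resultCount`, `results`, `languageCodesISO2A`).

```python
def classify_lookup(payload):
    """Classify a decoded lookup response.

    Returns "missing" when no app was found, "no_languages" when the app
    exists but lists no language codes, and otherwise a comma-separated
    string of codes in the same style as the languages column.
    """
    if payload.get("resultCount", 0) == 0:
        return "missing"
    codes = payload["results"][0].get("languageCodesISO2A", [])
    if not codes:
        return "no_languages"
    return ", ".join(codes)


# Made-up payloads covering the three cases
gone = classify_lookup({"resultCount": 0, "results": []})
empty = classify_lookup({"resultCount": 1, "results": [{"languageCodesISO2A": []}]})
found = classify_lookup(
    {"resultCount": 1, "results": [{"languageCodesISO2A": ["EN", "VI"]}]}
)
print(gone, empty, found)
```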

In [29]:
app_store_api_lang_update(null_lang, df, base_url)

HBox(children=(IntProgress(value=0, max=60), HTML(value='')))

https://apps.apple.com/us/app/gravitation/id286313771
286313771
https://apps.apple.com/us/app/naval-fight/id334183810
334183810
This app does not exist anymore: 408198768
https://apps.apple.com/us/app/bad-air-day-lite/id412654105
412654105
https://apps.apple.com/us/app/shatranj/id417715370
417715370
https://apps.apple.com/us/app/shatranj-hd/id420374009
420374009
https://apps.apple.com/us/app/monster-war/id789200887
789200887
https://apps.apple.com/us/app/civil-war-battles-ozark/id836652600
836652600
https://apps.apple.com/us/app/civil-war-battles-peninsula/id870615379
870615379
https://apps.apple.com/us/app/civil-war-chancellorsville/id870746172
870746172
https://apps.apple.com/us/app/dont-step-white-tile-edition-piano-tap-style-tiles/id879733874
879733874
https://apps.apple.com/us/app/civil-war-battles-corinth/id891328640
891328640
https://apps.apple.com/us/app/civil-war-battles-atlanta/id893988270
893988270
https://apps.apple.com/us/app/drawtopia-lite/id894430140
894430140
https://ap

Drop rows with apps that don't exist anymore

In [30]:
df.drop(index=df.loc[df["id"] == 408198768].index[0], inplace=True)
df.drop(index=df.loc[df["id"] == 1156789859].index[0], inplace=True)
df.drop(index=df.loc[df["id"] == 1431083580].index[0], inplace=True)
df.drop(index=df.loc[df["id"] == 1434042472].index[0], inplace=True)
df.drop(index=df.loc[df["id"] == 1449929230].index[0], inplace=True)

Set the languages for apps that support more than just English

In [31]:
df.loc[df["id"] == 1323959674, "languages"] = "EN, VI"
df.loc[df["id"] == 1450659337, "languages"] = "EN, DE"

Replace all remaining NaN values with "EN" for English

In [32]:
df["languages"] = df["languages"].fillna("EN")
df.shape

(16997, 18)

In [33]:
df.isnull().sum()

url                                0
id                                 0
name                               0
subtitle                           0
icon_url                           0
average_user_rating                0
user_rating_count                  0
price                              0
in-app_purchases                9317
description                        0
developer                          0
age_rating                         0
languages                          0
size                               0
primary_genre                      0
genres                             0
original_release_date              0
current_version_release_date       0
dtype: int64

#### Managing In-app Purchases
For this part I went through the following steps:
1. Find the apps whose in-app purchase prices sum to 0 (values such as "0" or "0.0, 0.0") and replace them with "0.00" or with their actual prices, to match the rest of the column.
2. Delete apps that don't exist any more.
3. Fill the rest of NaNs with "0.00".
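
The check in step 1 comes down to parsing each comma-separated string and summing its prices. A minimal sketch on a few values that appear in this column (`parse_purchases` is a hypothetical helper name):

```python
def parse_purchases(raw):
    """Turn a comma-separated price string into a list of floats."""
    # float() ignores the space that follows each comma
    return [float(part) for part in raw.split(",")]


samples = ["1.99, 0.99", "0", "0.0, 0.0", "0.99, 2.99, 1.99"]
totals = {raw: sum(parse_purchases(raw)) for raw in samples}

# Strings whose prices add up to zero are the ones to inspect by hand
zero_total = [raw for raw, total in totals.items() if total == 0]
print(zero_total)
```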

In [34]:
# Keep only the rows that have an in-app purchases string and split it into prices
purch_str = df.loc[df["in-app_purchases"].notnull(), "in-app_purchases"]
purch_str_sp = [i.split(",") for i in purch_str]
purch_str_sp

[['1.99'],
 ['0.99'],
 ['0.99'],
 ['1.99', ' 0.99', ' 1.99', ' 0.99', ' 4.99', ' 1.99', ' 1.99'],
 ['0.99', ' 0.99', ' 0.99'],
 ['0.99', ' 0.99', ' 0.99', ' 0.99'],
 ['1.99', ' 0.99'],
 ['0.99', ' 2.99', ' 1.99'],
 ['1.99'],
 ['0.0',
  ' 0.99',
  ' 5.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99'],
 ['2.99',
  ' 2.99',
  ' 5.99',
  ' 0.99',
  ' 0.0',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99'],
 ['9.99',
  ' 9.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99'],
 ['0.99', ' 0.99', ' 0.99'],
 ['0.99', ' 1.99', ' 0.99', ' 0.99'],
 ['9.99',
  ' 2.99',
  ' 4.99',
  ' 19.99',
  ' 5.49',
  ' 23.49',
  ' 49.99',
  ' 36.99',
  ' 13.99',
  ' 99.99'],
 ['0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99'],
 ['4.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99',
  ' 0.99'],
 ['0.99', ' 0.99'],
 ['0.99',
  ' 0.99',
  '

In [35]:
# Sum the in-app purchase prices of each app
sum_list = [sum(float(price) for price in elem) for elem in purch_str_sp]
    
print(len(sum_list))
print(min(sum_list))
print(max(sum_list))

7680
0.0
674.9


In [36]:
sum_list.count(0.0)

18

In [37]:
pd.options.display.max_colwidth = 100

In [38]:
df.loc[df["in-app_purchases"] == "0", ["id", "name"]]

Unnamed: 0,id,name
961,477637460,Reversi Community
3096,786329111,"""Moguu's Nest"""
3169,794672565,Orbs of Eternity
3378,828423465,Resurrace
3680,861896839,Evolserk
4736,925606367,Cupcake Stacker FREE
5389,962184200,Football Overlord
5769,979048674,Dragon Egg Match Free: Best Connecting Puzzle Game
6080,993416526,Sum Of Fifteen
8744,1120945464,"""Moguu's Territory Board"""


In [39]:
df.loc[df["in-app_purchases"] == "0", "url"]

961                                      https://apps.apple.com/us/app/reversi-community/id477637460
3096                                           https://apps.apple.com/us/app/moguus-nest/id786329111
3169                                      https://apps.apple.com/us/app/orbs-of-eternity/id794672565
3378                                             https://apps.apple.com/us/app/resurrace/id828423465
3680                                              https://apps.apple.com/us/app/evolserk/id861896839
4736                                  https://apps.apple.com/us/app/cupcake-stacker-free/id925606367
5389                                     https://apps.apple.com/us/app/football-overlord/id962184200
5769     https://apps.apple.com/us/app/dragon-egg-match-free-best-connecting-puzzle-game/id979048674
6080                                        https://apps.apple.com/us/app/sum-of-fifteen/id993416526
8744                               https://apps.apple.com/us/app/moguus-territory-board/id1

In [40]:
df.loc[df["in-app_purchases"] == "0.0, 0.0", ["id", "name"]]

Unnamed: 0,id,name
3197,799624006,QueueUp: A World of Warcraft PvP Battle Zone Companion


In [41]:
df.loc[df["in-app_purchases"] == "0.0, 0.0", "url"]

3197    https://apps.apple.com/us/app/queueup-a-world-of-warcraft-pvp-battle-zone-companion/id799624006
Name: url, dtype: object

In [42]:
df.loc[df["in-app_purchases"] == "0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0", ["id", "name"]]

Unnamed: 0,id,name
4149,894293034,Celestial Fleet
4854,933504458,Video Walkthrough for Plague Inc.
5993,989456404,Video Walkthrough for Cities Skylines


In [43]:
df.loc[df["in-app_purchases"] == "0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0", "url"]

4149                          https://apps.apple.com/us/app/celestial-fleet/id894293034
4854         https://apps.apple.com/us/app/video-walkthrough-for-plague-inc/id933504458
5993    https://apps.apple.com/us/app/video-walkthrough-for-cities-skylines/id989456404
Name: url, dtype: object

In [44]:
# Reversi Community -- No specified price
df.loc[df["id"] == 477637460, "in-app_purchases"] = "0.00"

# Moguu's Nest
df.loc[df["id"] == 786329111, "in-app_purchases"] = "0.99"

# Orbs of Eternity -- No specified price
df.loc[df["id"] == 794672565, "in-app_purchases"] = "0.00"

# Resurrace
df.loc[df["id"] == 828423465, "in-app_purchases"] = "0.99, 0.99"

# Evolserk
df.loc[df["id"] == 861896839, "in-app_purchases"] = "0.99"

# Cupcake Stacker FREE -- Does not exist any more, row will be deleted
# df.loc[df["id"] == 925606367, "in-app_purchases"] = ""
df.drop(index=df.loc[df["id"] == 925606367].index[0], inplace=True)

# Football Overlord -- No specified price
df.loc[df["id"] == 962184200, "in-app_purchases"] = "0.00"

# Dragon Egg Match Free: Best Connecting Puzzle Game
df.loc[df["id"] == 979048674, "in-app_purchases"] = "0.99, 0.99, 1.99, 4.99, 9.99"

# Sum Of Fifteen -- No specified price
df.loc[df["id"] == 993416526, "in-app_purchases"] = "0.00"

# Moguu's Territory Board -- No specified price
df.loc[df["id"] == 1120945464, "in-app_purchases"] = "0.00"

# Mods for Starbound -- Does not exist any more, row will be deleted
# df.loc[df["id"] == 1148248451, "in-app_purchases"] = ""
df.drop(index=df.loc[df["id"] == 1148248451].index[0], inplace=True)

# Mods for Don't Starve and Don't Starve Together -- No specified price
df.loc[df["id"] == 1199797479, "in-app_purchases"] = "0.00"

# Mods for World of Tanks (WoT) -- No specified price
df.loc[df["id"] == 1203620227, "in-app_purchases"] = "0.00"

# Coverage -- No specified price
df.loc[df["id"] == 1406306272, "in-app_purchases"] = "0.00"

# QueueUp: A World of Warcraft PvP Battle Zone Companion
df.loc[df["id"] == 799624006, "in-app_purchases"] = "0.00"

# Celestial Fleet
df.loc[df["id"] == 894293034, "in-app_purchases"] = "7.99, 7.99, 7.99, 7.99, 7.99, 7.99, 2.99, 2.99, 2.99, 2.99"

# Video Walkthrough for Plague Inc.
df.loc[df["id"] == 933504458, "in-app_purchases"] = "0.00"

# Video Walkthrough for Cities Skylines
df.loc[df["id"] == 989456404, "in-app_purchases"] = "0.00"

In [45]:
df["in-app_purchases"] = df["in-app_purchases"].fillna("0.00")
df.shape

(16995, 18)

In [46]:
df.isnull().sum()

url                             0
id                              0
name                            0
subtitle                        0
icon_url                        0
average_user_rating             0
user_rating_count               0
price                           0
in-app_purchases                0
description                     0
developer                       0
age_rating                      0
languages                       0
size                            0
primary_genre                   0
genres                          0
original_release_date           0
current_version_release_date    0
dtype: int64

In [47]:
pd.options.display.max_colwidth = 50

### Reset index

In [48]:
df.reset_index(drop= True, inplace= True)

### Save to csv

In [49]:
df.to_csv("data/clean_appstore_games.csv", index = False)