# Recommendation System for Google Play Store 

* Recommendations are generated with the Apriori algorithm (association rule mining).

- The result is a set of app pairs that users tend to rate highly together.
- Only apps that users rated above the average value (a good rating) are considered.


- ### About Me
    - #### By Mohamed Saeed Ali bin Omar | Computer Science
    - #### GitHub: https://github.com/MohamedSaeed-dev

## Import the required libraries
- ### Pandas: 
    A Python library for working with data sets. It provides functions for analyzing, cleaning, exploring, and manipulating data.
    
- ### Apyori:
    A Python library that provides a simple implementation of the Apriori algorithm, which is widely used for association rule mining.

In [2]:
import pandas as pd
from apyori import apriori

## Read the Google Play Store datasets

In [3]:
# Google Play Store Dataset

apps_data = pd.read_csv("./googleplaystore.csv")
users_data = pd.read_csv("./users.csv")
display(apps_data)
print(apps_data.shape)

display(users_data)
print(users_data.shape)



Unnamed: 0,App_Name,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,App_ID
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up,1
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,2
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up,3
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up,4
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10836,Sya9a Maroc - FR,FAMILY,4.5,38,53M,"5,000+",Free,0,Everyone,Education,"July 25, 2017",1.48,4.1 and up,10837
10837,Fr. Mike Schmitz Audio Teachings,FAMILY,5.0,4,3.6M,100+,Free,0,Everyone,Education,"July 6, 2018",1.0,4.1 and up,10838
10838,Parkinson Exercices FR,MEDICAL,,3,9.5M,"1,000+",Free,0,Everyone,Medical,"January 20, 2017",1.0,2.2 and up,10839
10839,The SCP Foundation DB fr nn5n,BOOKS_AND_REFERENCE,4.5,114,Varies with device,"1,000+",Free,0,Mature 17+,Books & Reference,"January 19, 2015",Varies with device,Varies with device,10840


(10841, 14)


Unnamed: 0,UserId,AppId,Rating,Timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205
...,...,...,...,...
99999,671,6268,2.5,1065579370
100000,671,6269,4.0,1065149201
100001,671,6365,4.0,1070940363
100002,671,6385,2.5,1070979663


(100004, 4)


### Data Preprocessing
- #### Data Cleaning
    - Feature Selection
        - Keep only the columns needed for the recommendation
        - by removing the unneeded columns from each dataset

In [4]:
apps_data = apps_data.drop(columns=["Category", "Reviews", "Size", "Installs", "Type", "Price", "Content Rating", "Genres","Last Updated", "Current Ver", "Android Ver"])
display(apps_data)

users_data = users_data.drop(columns=["Timestamp"])
display(users_data)

Unnamed: 0,App_Name,Rating,App_ID
0,Photo Editor & Candy Camera & Grid & ScrapBook,4.1,1
1,Coloring book moana,3.9,2
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",4.7,3
3,Sketch - Draw & Paint,4.5,4
4,Pixel Draw - Number Art Coloring Book,4.3,5
...,...,...,...
10836,Sya9a Maroc - FR,4.5,10837
10837,Fr. Mike Schmitz Audio Teachings,5.0,10838
10838,Parkinson Exercices FR,,10839
10839,The SCP Foundation DB fr nn5n,4.5,10840


Unnamed: 0,UserId,AppId,Rating
0,1,31,2.5
1,1,1029,3.0
2,1,1061,3.0
3,1,1129,2.0
4,1,1172,4.0
...,...,...,...
99999,671,6268,2.5
100000,671,6269,4.0
100001,671,6365,4.0
100002,671,6385,2.5


- #### Null values
- Let's check whether the columns in each dataset contain null values

In [5]:
headers_apps = ["App_Name", "Rating"]
for i in headers_apps:
    print(f"{i} = {len(apps_data[apps_data[i].isna()])}")
    
headers_users = ["UserId", "AppId", "Rating"]
for i in headers_users:
    print(f"{i} = {len(users_data[users_data[i].isna()])}")


App_Name = 0
Rating = 1474
UserId = 0
AppId = 0
Rating = 0


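- As we see, the Rating column in the apps dataset has null values (1474 rows)
    - we drop those rows, because an app without an overall rating cannot be compared against the average later on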
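The per-column null counts can also be obtained in a single call. The following is a minimal sketch on a small made-up frame; the `toy` data and its values are hypothetical and only stand in for `apps_data`.

```python
import pandas as pd

# Hypothetical toy frame standing in for apps_data; the values are made up
toy = pd.DataFrame({
    "App_Name": ["a", "b", "c"],
    "Rating": [4.1, None, 3.9],
})

# isna().sum() counts the null values of every column at once
null_counts = toy.isna().sum()
print(null_counts)

# dropna(subset=...) drops a row only when one of the listed columns is null
cleaned = toy.dropna(subset=["Rating"])
print(len(cleaned))
```

Restricting `dropna` to a `subset` is a design choice: it avoids discarding rows because of nulls in columns that are not used by the recommendation.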

In [6]:
# Remove every row that contains a null value
apps_data.dropna(axis=0, how='any', inplace=True)


- Let's check again

In [7]:
for i in headers_apps:
    print(f"{i} = {len(apps_data[apps_data[i].isna()])}")

App_Name = 0
Rating = 0


- #### Duplicated values
- Now let's find the duplicated app names

In [8]:
duplicated_apps = apps_data[apps_data["App_Name"].duplicated()]["App_Name"]
len_duplicated = len(duplicated_apps)
print(duplicated_apps)
print(f"{len_duplicated} Duplicated values")

229            Quick PDF Scanner + OCR FREE
236                                     Box
239                      Google My Business
256                     ZOOM Cloud Meetings
261               join.me - Simple Meetings
                        ...                
10715                    FarmersOnly Dating
10720    Firefox Focus: The privacy browser
10730                           FP Notebook
10753        Slickdeals: Coupons & Shopping
10768                                  AAFP
Name: App_Name, Length: 1170, dtype: object
1170 Duplicated values


- Let's remove them

In [9]:
apps_data = apps_data.drop_duplicates(subset=["App_Name"])
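
It may help to see which copy survives when duplicates are removed. The sketch below uses a small made-up frame (the names and ratings are hypothetical) to show that `duplicated()` flags every occurrence after the first, and that `drop_duplicates` keeps the first occurrence by default.

```python
import pandas as pd

# Hypothetical toy frame; names and ratings are made up
toy = pd.DataFrame({
    "App_Name": ["Box", "Zoom", "Box"],
    "Rating": [4.2, 4.4, 4.0],
})

# duplicated() marks every occurrence after the first one as True
flags = toy["App_Name"].duplicated().tolist()
print(flags)

# drop_duplicates keeps the first occurrence by default (keep="first")
deduped = toy.drop_duplicates(subset=["App_Name"])
kept_ratings = deduped["Rating"].tolist()
print(kept_ratings)
```

If a different copy should win (for example the most recent one), the `keep` parameter can be set to `"last"` instead.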

- Let's check for duplicates again after removing them

In [10]:
duplicated_apps = apps_data[apps_data["App_Name"].duplicated()]["App_Name"]
len_duplicated = len(duplicated_apps)
print(f"{len_duplicated} Duplicated values")

0 Duplicated values


- #### Outlier values
- Let's check the values of the Rating column in each dataset
    - the overall rating of each app
    - each user's rating of an app

In [11]:
app_rating = set(apps_data["Rating"])
users_rating= set(users_data["Rating"])
print(f"Overall Rating : {app_rating} ")
print(f"Users's Rating : {users_rating} ")

Overall Rating : {1.9, 2.6, 3.9, 4.3, 4.5, 4.1, 4.7, 4.4, 3.8, 4.2, 4.6, 3.2, 4.0, 5.0, 2.5, 3.5, 3.0, 2.0, 19.0, 4.9, 1.0, 1.5, 1.6, 2.1, 3.6, 3.1, 1.7, 1.2, 2.8, 2.7, 2.3, 2.2, 3.7, 3.3, 4.8, 1.8, 1.4, 2.9, 2.4, 3.4} 
Users's Rating : {0.5, 1.0, 2.0, 3.5, 2.5, 3.0, 4.0, 5.0, 4.5, 1.5} 


- The Rating column of the apps dataset (overall rating) ranges from 1.0 to 5.0, except for one outlier value, 19.0, which we need to remove.

- The users dataset is fine: its ratings range from 0.5 to 5.0.
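
An alternative to removing one specific value is to filter by the valid range. This is a sketch under the assumption that overall ratings are meant to lie between 1.0 and 5.0; the `toy` frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy frame with one out-of-range rating
toy = pd.DataFrame({
    "App_Name": ["a", "b", "c"],
    "Rating": [4.1, 19.0, 5.0],
})

# between() is inclusive on both ends by default, so 1.0 and 5.0 are kept
valid = toy[toy["Rating"].between(1.0, 5.0)]
valid_ratings = valid["Rating"].tolist()
print(valid_ratings)
```

A range filter also catches any other out-of-range value without having to list each one.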

In [12]:
# Drop the rows whose Rating equals the outlier value found above
outlier_rating = 19.0
apps_data = apps_data[apps_data["Rating"] != outlier_rating]
display(apps_data)


Unnamed: 0,App_Name,Rating,App_ID
0,Photo Editor & Candy Camera & Grid & ScrapBook,4.1,1
1,Coloring book moana,3.9,2
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",4.7,3
3,Sketch - Draw & Paint,4.5,4
4,Pixel Draw - Number Art Coloring Book,4.3,5
...,...,...,...
10834,FR Calculator,4.0,10835
10836,Sya9a Maroc - FR,4.5,10837
10837,Fr. Mike Schmitz Audio Teachings,5.0,10838
10839,The SCP Foundation DB fr nn5n,4.5,10840


- Let's check the Rating column of the apps dataset again

In [13]:
app_rating = set(apps_data["Rating"])
print(f"Overall Rating : {app_rating} ")

Overall Rating : {1.9, 2.6, 3.9, 4.3, 4.5, 4.1, 4.7, 4.4, 3.8, 4.2, 4.6, 3.2, 4.0, 5.0, 2.5, 3.5, 3.0, 2.0, 4.9, 1.0, 1.5, 1.6, 2.1, 3.6, 3.1, 1.7, 1.2, 2.8, 2.7, 2.3, 2.2, 3.7, 3.3, 4.8, 1.8, 1.4, 2.9, 2.4, 3.4} 


## Apriori Algorithm

- Before running the algorithm, we compute an average for each Rating column and use it as a filter threshold.
- Note that the average is taken over the distinct rating values in each column (the code passes a `set`), not over all rows.
- Both datasets are then filtered to the rows whose rating is above that average, so we keep:

    - the apps with a high overall rating,
    - and the apps that users rated highly.
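
Because the averaging cell wraps each column in `set(...)`, every distinct rating value counts once regardless of how many rows carry it. The sketch below contrasts the two averages on a small made-up list, and then recomputes the threshold from the ten distinct user ratings shown earlier in this notebook.

```python
def average_rating(rating):
    return sum(rating) / len(rating)

# Hypothetical ratings: three rows with 4.0 and one row with 2.0
ratings = [4.0, 4.0, 4.0, 2.0]

# Mean over all rows: each row counts once
mean_all = average_rating(ratings)

# Mean over distinct values: 4.0 and 2.0 each count once
mean_distinct = average_rating(set(ratings))

print(mean_all, mean_distinct)

# The ten distinct user ratings printed earlier in this notebook
distinct_user_ratings = {0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0}
user_threshold = average_rating(distinct_user_ratings)
print(user_threshold)
```

If a row-weighted threshold is wanted instead, the column can be passed directly (or `Series.mean()` can be used); that would change the thresholds and therefore the filtered data.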

In [14]:
def average_rating(rating):
    return sum(rating) / len(rating)

In [15]:
apps_rating = set(apps_data["Rating"])
users_rating = set(users_data["Rating"])

app_rating_avg = average_rating(apps_rating)
user_rating_avg = average_rating(users_rating)

print(f"The Average of App Rating is {app_rating_avg}")
print(f"The Average of User's Rating is {user_rating_avg}")



The Average of App Rating is 3.0923076923076924
The Average of User's Rating is 2.75


In [16]:
apps_data = apps_data[apps_data['Rating'] > app_rating_avg].reset_index(drop=True)
users_data = users_data[users_data["Rating"] > user_rating_avg].reset_index(drop=True)
display(apps_data)
display(users_data)

Unnamed: 0,App_Name,Rating,App_ID
0,Photo Editor & Candy Camera & Grid & ScrapBook,4.1,1
1,Coloring book moana,3.9,2
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",4.7,3
3,Sketch - Draw & Paint,4.5,4
4,Pixel Draw - Number Art Coloring Book,4.3,5
...,...,...,...
7831,FR Calculator,4.0,10835
7832,Sya9a Maroc - FR,4.5,10837
7833,Fr. Mike Schmitz Audio Teachings,5.0,10838
7834,The SCP Foundation DB fr nn5n,4.5,10840


Unnamed: 0,UserId,AppId,Rating
0,1,1029,3.0
1,1,1061,3.0
2,1,1172,4.0
3,1,1339,3.5
4,1,1953,4.0
...,...,...,...
82165,671,5991,4.5
82166,671,5995,4.0
82167,671,6269,4.0
82168,671,6365,4.0


- We will create a list of lists
    - The outer list has 100 entries (initially), one for each of the first 100 users
    - Each inner list holds the names of the apps kept for that user
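
The same kind of nested list can also be built with a merge followed by a group-by. This is only a sketch on made-up frames that reuse the notebook's column names; the data is hypothetical.

```python
import pandas as pd

# Hypothetical toy frames reusing the notebook's column names
apps = pd.DataFrame({"App_ID": [1, 2, 3], "App_Name": ["A", "B", "C"]})
users = pd.DataFrame({"UserId": [1, 1, 2, 2], "AppId": [1, 3, 2, 9]})

# An inner merge drops ratings whose app is not present in the apps table
merged = users.merge(apps, left_on="AppId", right_on="App_ID", how="inner")

# Collect one list of app names per user
transactions = merged.groupby("UserId")["App_Name"].apply(list).tolist()
print(transactions)
```

One difference from a per-user loop: a user whose apps were all filtered out does not appear in the group-by result at all, whereas a loop over user IDs yields an empty list for that user. The sketch also does not limit itself to the first 100 users.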

In [17]:
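# Map each App_ID to its App_Name once, instead of filtering apps_data for every rating
id_to_name = apps_data.drop_duplicates(subset="App_ID").set_index("App_ID")["App_Name"].to_dict()

record = []
for user_id in range(1, 101):
    user_rows = users_data[users_data["UserId"] == user_id]
    # Keep only the apps that survived the cleaning and filtering steps
    apps = [id_to_name[app_id] for app_id in user_rows["AppId"] if app_id in id_to_name]
    record.append(apps)
record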

[['SUMMER SONIC app',
  'Nubank',
  'Citi Mobile®',
  'My Chakra Meditation 2',
  'TEKKEN™',
  'LEGO® Juniors Create & Cruise',
  'codeSpark Academy & The Foos',
  'Motorola FM Radio'],
 ['Kids Paint Free - Drawing Fun',
  'Photo Designer - Write your name with shapes',
  'PIP Camera - PIP Collage Maker',
  'Install images with music to make video without Net - 2018',
  'Monster Truck Stunt 3D 2019',
  'Ultimate F1 Racing Championship',
  'CDL Practice Test 2018 Edition',
  'Selfie Camera',
  'Amazon Kindle',
  'FBReader: Favorite Book Reader',
  'Google Play Books',
  'Recipes of Prophetic Medicine for free',
  'Ebook Reader',
  'English to Urdu Dictionary',
  'Azpen eReader',
  'Jobs in Alabama - Jobs in Alba',
  'Myanmar 2D/3D',
  'ATI Cargoes and Transportation',
  'TurboScan: scan documents and receipts in PDF',
  'Crew - Free Messaging and Scheduling',
  'Invoice & Time Tracking - Zoho',
  'Start Meeting',
  'Skype for Business for Android',
  'Verify - Receipts & Expenses',
  'G

- #### Implement the apriori algorithm for our apps

In [18]:
association_rules = apriori(record, min_support=0.2, min_confidence=0.2, min_lift=2, min_length=2)
association_results = list(association_rules)
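
The thresholds passed to `apriori` refer to three standard measures: support, confidence, and lift. The sketch below computes them by hand on a tiny made-up transaction list; the helper functions and the data are illustrative and are not part of apyori.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(transactions, base, add):
    """Support of base and add together, divided by the support of base."""
    both = set(base) | set(add)
    return support(transactions, both) / support(transactions, base)

def lift(transactions, base, add):
    """Confidence of base -> add, divided by the support of add."""
    return confidence(transactions, base, add) / support(transactions, add)

# Hypothetical transactions: one list of app names per user
transactions = [["A", "B"], ["A", "B", "C"], ["A"], ["C"]]

s = support(transactions, ["A", "B"])       # 2 of 4 transactions
c = confidence(transactions, ["A"], ["B"])  # 0.5 / 0.75
l = lift(transactions, ["A"], ["B"])        # confidence / 0.5
print(s, c, l)
```

A lift above 1 means the two apps appear together more often than would be expected if they were independent, which is why a minimum lift of 2 keeps only fairly strong associations.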

In [19]:
print(f"There are {len(association_results)} relations derived.")

There are 35 relations derived.


In [20]:
# The first field of each result is the frozenset of apps in the itemset
for result in association_results:
    print(result[0])

frozenset({'5 Minute Ab Workouts', 'Adobe Premiere Clip'})
frozenset({'Install images with music to make video without Net - 2018', 'A hundred'})
frozenset({'Adobe Premiere Clip', 'ARK: Survival Evolved'})
frozenset({'Golden telegram', 'ARK: Survival Evolved'})
frozenset({'Adobe Premiere Clip', 'Golden telegram'})
frozenset({'Adobe Premiere Clip', 'Meitu – Beauty Cam, Easy Photo Editor'})
frozenset({'All Email Providers', 'Hitwe - meet people and chat'})
frozenset({'My Vodacom SA', 'All Email Providers'})
frozenset({'Hitwe - meet people and chat', 'BBW Dating & Curvy Singles Chat- LargeFriends'})
frozenset({'Chick-fil-A', 'Dairy Queen'})
frozenset({'Chick-fil-A', 'Fat Burning Workout - Home Weight lose'})
frozenset({'Chick-fil-A', 'Running Distance Tracker +'})
frozenset({'Chick-fil-A', 'Starbucks'})
frozenset({'Starbucks', 'Dairy Queen'})
frozenset({'Starbucks', 'Fat Burning Workout - Home Weight lose'})
frozenset({'Mingle - Online Dating App to Chat & Meet People', 'Hitwe - meet peop

- ## Display the results
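
Each result can also be read by field name rather than by index, which makes the direction of a rule explicit. The sketch below defines stand-in namedtuples that are assumed to mirror the field layout of apyori's result records, so that it runs without apyori installed; the `format_rules` helper and the `toy` record are hypothetical.

```python
from collections import namedtuple

# Stand-ins assumed to mirror the layout of apyori's result records
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

def format_rules(results):
    """Return one line per rule, using the base -> add direction of each statistic."""
    lines = []
    for record in results:
        for stat in record.ordered_statistics:
            base = ", ".join(sorted(stat.items_base))
            add = ", ".join(sorted(stat.items_add))
            lines.append(
                f"{base} -> {add} | support={record.support:.2f} "
                f"confidence={stat.confidence:.2f} lift={stat.lift:.2f}"
            )
    return lines

# Hypothetical record with made-up numbers
toy = [
    RelationRecord(
        items=frozenset({"app_a", "app_b"}),
        support=0.25,
        ordered_statistics=[
            OrderedStatistic(frozenset({"app_a"}), frozenset({"app_b"}), 0.5, 2.0)
        ],
    )
]

lines = format_rules(toy)
print("\n".join(lines))
```

Reading `items_base` and `items_add` ties each confidence value to the rule it was computed for, instead of relying on the iteration order of a frozenset.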

In [21]:
for item in association_results:
    # item[0] is the frozenset of apps in the itemset.
    # A frozenset is unordered, so the arrow direction printed here is arbitrary
    # and may not match the rule that the confidence below belongs to.
    pair = item[0]
    items = list(pair)
    print(f"Rule: {items[0]} -> {items[1]}")

    # item[1] is the support of the itemset
    print(f"Support: {item[1]}")

    # item[2] is the list of ordered statistics; item[2][0] is the first one,
    # a tuple of (items_base, items_add, confidence, lift)
    print(f"Confidence: {item[2][0][2]}")
    print(f"Lift: {item[2][0][3]}")
    print("=====================================")

Rule: 5 Minute Ab Workouts -> Adobe Premiere Clip
Support: 0.22
Confidence: 0.9166666666666667
Lift: 2.7777777777777777
Rule: Install images with music to make video without Net - 2018 -> A hundred
Support: 0.21
Confidence: 0.5675675675675675
Lift: 2.364864864864865
Rule: Adobe Premiere Clip -> ARK: Survival Evolved
Support: 0.27
Confidence: 0.9000000000000001
Lift: 2.7272727272727275
Rule: Golden telegram -> ARK: Survival Evolved
Support: 0.21
Confidence: 0.7
Lift: 2.0
Rule: Adobe Premiere Clip -> Golden telegram
Support: 0.24
Confidence: 0.7272727272727272
Lift: 2.0779220779220777
Rule: Adobe Premiere Clip -> Meitu – Beauty Cam, Easy Photo Editor
Support: 0.23
Confidence: 0.696969696969697
Lift: 2.3232323232323235
Rule: All Email Providers -> Hitwe - meet people and chat
Support: 0.23
Confidence: 0.7419354838709677
Lift: 2.318548387096774
Rule: My Vodacom SA -> All Email Providers
Support: 0.21
Confidence: 0.6774193548387096
Lift: 2.7096774193548385
Rule: Hitwe - meet people and chat