### Google Apps. Recommendation Engine.

### As part of this project, I will create a recommendation engine that suggests similar apps from the Google Play Store. I will use the concept of item-to-item collaborative filtering for this project.

#### Let's import the basic libraries for this project.

In [65]:
import pandas                          as     pd
import numpy                           as     np
import matplotlib                      as     mpl
import matplotlib.pyplot               as     plt
import seaborn                         as     sb
from   sklearn                         import preprocessing 
from   sklearn.model_selection         import train_test_split
from   sklearn.feature_extraction.text import CountVectorizer
from   sklearn.metrics.pairwise        import cosine_similarity
from   sklearn.metrics.pairwise        import pairwise_distances

#### Importing the required files. We are using the files available on Kaggle for our project.

In [171]:
csv_path_apps = 'D:/Datasets/google-play-store-apps/googleplaystore.csv'
csv_path_reviews = 'D:/Datasets/google-play-store-apps/googleplaystore_user_reviews.csv'

app_df = pd.read_csv(csv_path_apps)
reviews_df = pd.read_csv(csv_path_reviews)

#### Displaying the top 5 records in the app dataset.

In [172]:
app_df.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


#### Displaying the top 5 records in the reviews dataset.

In [173]:
reviews_df.head()

Unnamed: 0,App,Translated_Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
0,10 Best Foods for You,I like eat delicious food. That's I'm cooking ...,Positive,1.0,0.533333
1,10 Best Foods for You,This help eating healthy exercise regular basis,Positive,0.25,0.288462
2,10 Best Foods for You,,,,
3,10 Best Foods for You,Works great especially going grocery store,Positive,0.4,0.875
4,10 Best Foods for You,Best idea us,Positive,1.0,0.3


#### Visual inspection shows that the app dataset contains duplicate rows. We will remove them so that each app has a single record.

In [174]:
app_df.sort_values('App', inplace=True)

In [175]:
app_df.drop_duplicates(subset='App', inplace=True)
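Before dropping rows, it can help to count how many duplicates there are. The following is a minimal sketch on a made-up frame (`toy_apps` and its values are assumptions for illustration, not rows from the Kaggle file). Note that `drop_duplicates` keeps the first row it sees for each app, so which duplicate survives depends on the row order at that point.

```python
import pandas as pd

# Toy stand-in for app_df; the rows are invented for illustration.
toy_apps = pd.DataFrame({
    'App': ['Alpha', 'Beta', 'Alpha', 'Gamma', 'Beta'],
    'Reviews': [10, 20, 15, 5, 20],
})

# True for every row whose App name already appeared earlier in the frame.
dup_mask = toy_apps.duplicated(subset='App')
n_duplicates = int(dup_mask.sum())

# keep='first' (the default) retains the first row seen for each App.
deduped = toy_apps.drop_duplicates(subset='App')

print('duplicate rows:', n_duplicates)
print(deduped)
```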

In [176]:
app_df.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
8884,"""i DT"" Fútbol. Todos Somos Técnicos.",SPORTS,,27,3.6M,500+,Free,0,Everyone,Sports,"October 7, 2017",0.22,4.1 and up
8532,+Download 4 Instagram Twitter,SOCIAL,4.5,40467,22M,"1,000,000+",Free,0,Everyone,Social,"August 2, 2018",5.03,4.1 and up
324,- Free Comics - Comic Apps,COMICS,3.5,115,9.1M,"10,000+",Free,0,Mature 17+,Comics,"July 13, 2018",5.0.12,5.0 and up
4541,.R,TOOLS,4.5,259,203k,"10,000+",Free,0,Everyone,Tools,"September 16, 2014",1.1.06,1.5 and up
4636,/u/app,COMMUNICATION,4.7,573,53M,"10,000+",Free,0,Mature 17+,Communication,"July 3, 2018",4.2.4,4.1 and up


#### Now, let's create a new dataframe containing the average sentiment polarity and sentiment subjectivity for each app.

In [177]:
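# Average only the numeric sentiment columns; text columns such as Translated_Review cannot be averaged.
reviews_summary_df = reviews_df.groupby('App')[['Sentiment_Polarity', 'Sentiment_Subjectivity']].mean()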
reviews_summary_df.head()

App,Sentiment_Polarity,Sentiment_Subjectivity
10 Best Foods for You,0.470733,0.495455
104 找工作 - 找工作 找打工 找兼職 履歷健檢 履歷診療室,0.392405,0.545516
11st,0.181294,0.443957
1800 Contacts - Lens Store,0.318145,0.591098
1LINE – One Line with One Touch,0.19629,0.557315


In [178]:
reviews_summary_df.sort_values('App', inplace=True)
reviews_summary_df.head()

App,Sentiment_Polarity,Sentiment_Subjectivity
10 Best Foods for You,0.470733,0.495455
104 找工作 - 找工作 找打工 找兼職 履歷健檢 履歷診療室,0.392405,0.545516
11st,0.181294,0.443957
1800 Contacts - Lens Store,0.318145,0.591098
1LINE – One Line with One Touch,0.19629,0.557315


#### Now, let's join the app dataset and the reviews summary dataset, using the app name as the key.

In [179]:
app_review_df = pd.merge(app_df, reviews_summary_df, on='App', how='left')
app_review_df

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Sentiment_Polarity,Sentiment_Subjectivity
0,"""i DT"" Fútbol. Todos Somos Técnicos.",SPORTS,,27,3.6M,500+,Free,0,Everyone,Sports,"October 7, 2017",0.22,4.1 and up,,
1,+Download 4 Instagram Twitter,SOCIAL,4.5,40467,22M,"1,000,000+",Free,0,Everyone,Social,"August 2, 2018",5.03,4.1 and up,,
2,- Free Comics - Comic Apps,COMICS,3.5,115,9.1M,"10,000+",Free,0,Mature 17+,Comics,"July 13, 2018",5.0.12,5.0 and up,,
3,.R,TOOLS,4.5,259,203k,"10,000+",Free,0,Everyone,Tools,"September 16, 2014",1.1.06,1.5 and up,,
4,/u/app,COMMUNICATION,4.7,573,53M,"10,000+",Free,0,Mature 17+,Communication,"July 3, 2018",4.2.4,4.1 and up,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9655,"뽕티비 - 개인방송, 인터넷방송, BJ방송",VIDEO_PLAYERS,,414,59M,"100,000+",Free,0,Mature 17+,Video Players & Editors,"July 18, 2018",4.0.7,4.0.3 and up,,
9656,💎 I'm rich,LIFESTYLE,3.8,718,26M,"10,000+",Paid,$399.99,Everyone,Lifestyle,"March 11, 2018",1.0.0,4.4 and up,,
9657,"💘 WhatsLov: Smileys of love, stickers and GIF",SOCIAL,4.6,22098,18M,"1,000,000+",Free,0,Everyone,Social,"July 24, 2018",4.2.4,4.0.3 and up,,
9658,📏 Smart Ruler ↔️ cm/inch measuring for homework!,TOOLS,4.0,19,3.2M,"10,000+",Free,0,Everyone,Tools,"October 21, 2017",1.0,4.2 and up,,
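A left join keeps every app, including those with no reviews, which is where the empty sentiment columns come from. One way to see how many apps actually found a match is the `indicator` option of `pd.merge`. This is a sketch on invented data (`toy_apps` and `toy_summary` are assumptions, and `App` is kept as a plain column in both frames for simplicity).

```python
import pandas as pd

# Invented stand-ins for app_df and reviews_summary_df.
toy_apps = pd.DataFrame({
    'App': ['Alpha', 'Beta', 'Gamma'],
    'Category': ['TOOLS', 'SOCIAL', 'SPORTS'],
})
toy_summary = pd.DataFrame({
    'App': ['Alpha', 'Gamma'],
    'Sentiment_Polarity': [0.4, 0.1],
})

# indicator=True adds a _merge column: 'both' or 'left_only' for a left join.
merged = pd.merge(toy_apps, toy_summary, on='App', how='left', indicator=True)

n_matched = int((merged['_merge'] == 'both').sum())
n_unmatched = int((merged['_merge'] == 'left_only').sum())
print('matched:', n_matched, 'unmatched:', n_unmatched)
```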


#### Now, let's check for any null values.

In [180]:
app_review_df['Sentiment_Polarity'].isnull()

0       True
1       True
2       True
3       True
4       True
        ... 
9655    True
9656    True
9657    True
9658    True
9659    True
Name: Sentiment_Polarity, Length: 9660, dtype: bool
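The boolean Series is truncated in the display, so a count per column is usually easier to read. A minimal sketch, using an invented frame (`toy` is an assumption, not the merged dataset):

```python
import numpy as np
import pandas as pd

# Invented frame with a few missing values.
toy = pd.DataFrame({
    'App': ['Alpha', 'Beta', 'Gamma'],
    'Rating': [4.5, np.nan, 3.9],
    'Sentiment_Polarity': [np.nan, np.nan, 0.2],
})

# Number of missing values in each column.
null_counts = toy.isnull().sum()

# Share of rows with a missing Sentiment_Polarity (mean of a boolean Series).
null_share = toy['Sentiment_Polarity'].isnull().mean()

print(null_counts)
print(null_share)
```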

#### Based on the results, the Sentiment_Polarity column contains NULL (NaN) values for apps that have no matching reviews.

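#### Let's choose the features on which the recommendations will be based. We will replace NaNs with empty strings so that the selected features can be concatenated as text (the call below applies to the whole dataframe, which includes the selected features).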

In [207]:
selections = ['App','Category','Rating','Type','Genres','Sentiment_Polarity','Sentiment_Subjectivity']

In [208]:
app_review_df.fillna('', inplace=True) 
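If only the selected features should be changed, the fill can be restricted to those columns. The following is a sketch under assumptions: `toy` and its shortened `selections` list are invented, and the numeric columns are cast to `object` first so they can hold empty strings.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative selection list and an invented frame.
selections = ['App', 'Rating', 'Sentiment_Polarity']
toy = pd.DataFrame({
    'App': ['Alpha', 'Beta'],
    'Rating': [4.5, np.nan],
    'Sentiment_Polarity': [np.nan, 0.2],
    'Reviews': [10.0, np.nan],  # not selected, so its NaN is left alone
})

filled = toy.copy()
for col in selections:
    # Cast to object so a float column can hold '' next to numbers.
    filled[col] = filled[col].astype(object).fillna('')

print(filled)
```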

In [246]:
app_review_df[624:627]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Sentiment_Polarity,Sentiment_Subjectivity,comb_features
624,Alarm Clock,TOOLS,4.3,114788,Varies with device,"5,000,000+",Free,0,Everyone,Tools,"January 22, 2018",Varies with device,Varies with device,0.174667,0.371872,Alarm Clock TOOLS 4.3 Free Tools 0.17466728294...
625,Alarm Clock Free,TOOLS,4.0,59973,11M,"10,000,000+",Free,0,Everyone,Tools,"March 16, 2018",1.2.5,4.0 and up,0.155762,0.35684,Alarm Clock Free TOOLS 4.0 Free Tools 0.155761...
626,Alarm Clock Plus★,LIFESTYLE,4.4,155693,Varies with device,"5,000,000+",Free,0,Everyone,Lifestyle,"September 30, 2014",Varies with device,Varies with device,,,Alarm Clock Plus★ LIFESTYLE 4.4 Free Lifestyle


#### None of the features now contains NaNs.

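#### Now, let's create a new column which will contain the combined feature values. The comb_selection function used here is declared in the Additional Functions Declaration section at the end.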

In [218]:
app_review_df['comb_features'] = app_review_df.apply(comb_selection, axis=1)

In [247]:
app_review_df[624:627]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Sentiment_Polarity,Sentiment_Subjectivity,comb_features
624,Alarm Clock,TOOLS,4.3,114788,Varies with device,"5,000,000+",Free,0,Everyone,Tools,"January 22, 2018",Varies with device,Varies with device,0.174667,0.371872,Alarm Clock TOOLS 4.3 Free Tools 0.17466728294...
625,Alarm Clock Free,TOOLS,4.0,59973,11M,"10,000,000+",Free,0,Everyone,Tools,"March 16, 2018",1.2.5,4.0 and up,0.155762,0.35684,Alarm Clock Free TOOLS 4.0 Free Tools 0.155761...
626,Alarm Clock Plus★,LIFESTYLE,4.4,155693,Varies with device,"5,000,000+",Free,0,Everyone,Lifestyle,"September 30, 2014",Varies with device,Varies with device,,,Alarm Clock Plus★ LIFESTYLE 4.4 Free Lifestyle


#### Now, let's convert the new combined feature created to a count matrix.

In [220]:
count = CountVectorizer()
count_matrix = count.fit_transform(app_review_df['comb_features'])

In [221]:
count_matrix

<9660x10337 sparse matrix of type '<class 'numpy.int64'>'
	with 61465 stored elements in Compressed Sparse Row format>
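It is worth checking what `CountVectorizer` actually extracts from a combined feature string. With its default settings it lowercases the text and keeps only tokens of two or more word characters, so a rating such as `4.3` is split at the dot and both single digits are dropped. The sketch below uses two made-up strings shaped like `comb_features` (the sentiment values are shortened for readability).

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up strings in the same layout as comb_features.
docs = [
    'Alarm Clock TOOLS 4.3 Free Tools 0.174 0.371',
    'Alarm Clock Free TOOLS 4.0 Free Tools 0.155 0.356',
]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(docs)

# vocabulary_ maps each token to its column index.
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
print(matrix.toarray())
```

In this example the ratings contribute no tokens at all, while the digits after each decimal point of the sentiment scores become tokens of their own. If numeric closeness should matter, those columns may need different handling, for example binning them into labels before concatenation.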

#### Now, let's create the cosine similarity matrix from the count_matrix that we have created.

In [222]:
cos_similarity = cosine_similarity(count_matrix)
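Cosine similarity between two count vectors is their dot product divided by the product of their lengths. As a small worked check, the sketch below computes it by hand for two invented vectors and compares the result with `cosine_similarity`.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two invented count vectors over a four-word vocabulary.
a = np.array([1, 1, 2, 0])
b = np.array([1, 0, 2, 1])

# Dot product over the product of Euclidean norms: 5 / (sqrt(6) * sqrt(6)).
manual = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# The library call expects 2-D input and returns a matrix of pairwise scores.
library = cosine_similarity([a], [b])[0, 0]

print(manual, library)
```

For the full dataset the result is a dense 9660 x 9660 matrix, so the memory cost grows with the square of the number of apps.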

#### Ask the user for the app for which they want to see similar apps.

In [251]:
app_user_likes = input('Please enter the name of the app: ')
app_index = get_index_from_title(app_user_likes)
similar_apps = list(enumerate(cos_similarity[app_index]))

Please enter the name of the app: 3D Bowling


In [252]:
sorted_similar_apps = sorted(similar_apps,key=lambda x:x[1],reverse=True)[1:]
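Dropping the first element with `[1:]` assumes the queried app sorts to the top of its own list. If another app has an identical feature string and a lower index, that app is dropped instead and the queried app appears among its own recommendations. A sketch of an alternative that removes the queried index explicitly is shown below; `top_n_similar` and `toy_row` are hypothetical names and data, not part of the notebook.

```python
import numpy as np

def top_n_similar(similarity_row, query_index, n=5):
    """Return the indices of the n highest scores, skipping query_index."""
    # Stable ascending sort, then reverse so the highest score comes first.
    order = np.argsort(similarity_row, kind='stable')[::-1]
    # Remove the queried item by index rather than by position.
    order = order[order != query_index]
    return order[:n].tolist()

# Invented similarity row: index 1 is the query, index 3 ties with it at 1.0.
toy_row = np.array([0.2, 1.0, 0.9, 1.0, 0.1, 0.5])
top_three = top_n_similar(toy_row, query_index=1, n=3)
print(top_three)
```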

#### Display the top 5 similar apps.

In [253]:
print('Top 5 similar apps to '+app_user_likes+' are:\n')
# Slice to the first five entries of the ranked list.
for element in sorted_similar_apps[:5]:
    print(get_title_from_index(element[0]))

Top 5 similar apps to 3D Bowling are:

3D Tennis
Beach Volleyball 3D
Bike 3D Configurator
Hunting Safari 3D
PBA® Bowling Challenge


### Additional Functions Declaration.

#### Declaring the function for combining the features selected for recommendation to a single value feature. 

In [215]:
def comb_selection(record):
    # Join the selected features into one space-separated string.
    parts = [record['App'], record['Category'], str(record['Rating']),
             record['Type'], record['Genres'],
             str(record['Sentiment_Polarity']), str(record['Sentiment_Subjectivity'])]
    return ' '.join(parts)

#### Declaring the function for fetching the title of an app from its index.

In [216]:
def get_title_from_index(index):
    return app_review_df[app_review_df.index == index]['App'].values[0]

#### Declaring the function for fetching the index of the entered app.

In [217]:
def get_index_from_title(title):
    return app_review_df[app_review_df.App == title].index.values[0]
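The lookup above raises an `IndexError` when the entered name does not match any app exactly. A minimal sketch of a more forgiving variant follows; `find_index_by_title` and the `toy` frame are hypothetical and only illustrate the idea of returning `None` for an unknown title.

```python
import pandas as pd

def find_index_by_title(df, title):
    """Return the first index label whose App equals title, or None."""
    matches = df.index[df['App'] == title]
    return matches[0] if len(matches) > 0 else None

# Invented frame standing in for app_review_df.
toy = pd.DataFrame({'App': ['3D Bowling', '3D Tennis']})

found = find_index_by_title(toy, '3D Tennis')
missing = find_index_by_title(toy, 'Not An App')
print(found, missing)
```

The caller can then check for `None` and ask the user to re-enter the name instead of failing.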