
# <p style="padding:10px;background-color:#8B0000;margin:0;color:white;font-family:newtimeroman;font-size:150%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">Composite Score Statistical Analysis: Mobile App Performance</p>

 

### <p style="padding:10px;background-color:#0f5c59;margin:0;color:white;font-family:newtimeroman;font-size:150%;text-align:center;border-radius: 5px 20px;overflow:hidden;font-weight:500">Problem Statement</p>

#### Problem Statement:
This project uses a composite-score statistical model to quantify the performance of mobile applications. By combining key indicators (ratings, reviews, and downloads), the model identifies the top and bottom 100 apps and provides a clear metric of app success in the marketplace.

#### Methodology

- **Data Cleaning & Preprocessing**: Address missing values, standardize file sizes, and clean data to ensure consistency and reliability.
- **Feature Engineering**: Develop new features that may influence app performance and categorize them accordingly for in-depth analysis.
- **Statistical Modeling**: Use a composite score model, weighting the individual performance metrics to quantify app success.
- **Normalization**: Normalize each parameter's values to a 0-1 scale for a balanced comparison.
- **Composite Score Calculation**: Multiply normalized values by predetermined weights and sum them to obtain a composite score.
- **Ranking**: Sort apps in descending order based on the composite score to identify the top and bottom performers.
- **Analysis**: Conduct a detailed analysis of the top and bottom 100 apps to draw insights into the attributes that affect mobile app performance.
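
The Normalization, Composite Score Calculation and Ranking steps above can be sketched in a few lines of pandas. This is a minimal illustration under stated assumptions: the helper name `composite_score`, the toy rows, and the 0.5/0.25/0.25 weights are hypothetical and do not reproduce the notebook's own weights. The notebook performs the normalization with scikit-learn's `MinMaxScaler`; here the same `(x - min) / (max - min)` arithmetic is written out by hand.

```python
import pandas as pd


def composite_score(df: pd.DataFrame, weights: dict) -> pd.Series:
    """Min-max normalize each weighted column to [0, 1] and return the weighted sum."""
    score = pd.Series(0.0, index=df.index)
    for col, weight in weights.items():
        col_min, col_max = df[col].min(), df[col].max()
        span = col_max - col_min
        # A constant column carries no ranking information; treat it as 0 to avoid dividing by zero
        normalized = (df[col] - col_min) / span if span else 0.0
        score += weight * normalized
    return score


# Hypothetical toy data and weights, chosen only to illustrate the calculation
apps = pd.DataFrame({
    'Application': ['A', 'B', 'C'],
    'Ratings': [4.5, 3.0, 5.0],
    'Reviews': [1000, 10, 500],
    'Downloads': [100000, 1000, 50000],
})
weights = {'Ratings': 0.5, 'Reviews': 0.25, 'Downloads': 0.25}

apps['Composite Score'] = composite_score(apps, weights)
# Rank in descending order; head/tail give the top and bottom performers
ranked = apps.sort_values('Composite Score', ascending=False)
top_n, bottom_n = ranked.head(2), ranked.tail(2)
print(ranked[['Application', 'Composite Score']])
```

On the real data, `ranked.head(100)` and `ranked.tail(100)` would give the top and bottom 100 apps described in the problem statement.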

# Data Pre-processing and Feature Engineering

In [1]:
import pandas as pd
import numpy as np
import seaborn as sn
import matplotlib.pyplot as plt

In [2]:
pd.set_option('display.max_columns',None)
pd.set_option('display.max_rows',None)

In [3]:
df = pd.read_excel('IIM_Kashipur_Dataset_Application.xlsx')


In [4]:
df.head()

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported
0,Photo Editor & Candy Camera & Grid & ScrapBook,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hid...",4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [5]:
df.shape

(10854, 12)

In [6]:
df.isna().sum()

Application            14
Ratings              1487
Reviews                13
File_Size              12
Downloads              13
Category               14
Purchase_Price         13
Content Rating         14
Genres                 13
Latest Updates         13
App_Version            21
Android_Supported      16
dtype: int64

In [7]:
# Remove exact duplicate rows
df = df.drop_duplicates()
df.shape

(10354, 12)
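
`drop_duplicates()` with its default arguments only removes rows that match in every column. Rows for the same app that differ in any field survive, such as the two Facebook entries with review counts 78,158,306 and 78,128,208 that appear in the ranking further down. A toy illustration (review counts borrowed from those rows; the exact-copy second row is hypothetical):

```python
import pandas as pd

rows = pd.DataFrame({
    'Application': ['Facebook', 'Facebook', 'Facebook'],
    'Reviews': [78158306, 78158306, 78128208],
})
# Only the exact copy (index 1) is dropped; the entry with a different review count remains
deduped = rows.drop_duplicates()
print(deduped)
```

This is why a second, name-based de-duplication step is still needed later on.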

In [8]:
df.isna().sum()

Application             3
Ratings              1467
Reviews                 2
File_Size               1
Downloads               2
Category                3
Purchase_Price          2
Content Rating          3
Genres                  2
Latest Updates          2
App_Version            10
Android_Supported       5
dtype: int64

In [9]:
df.columns

Index(['Application', 'Ratings', 'Reviews', 'File_Size', 'Downloads',
       'Category', 'Purchase_Price', 'Content Rating', 'Genres',
       'Latest Updates', 'App_Version', 'Android_Supported'],
      dtype='object')

In [10]:
df.dtypes

Application           object
Ratings              float64
Reviews               object
File_Size             object
Downloads             object
Category              object
Purchase_Price        object
Content Rating        object
Genres                object
Latest Updates        object
App_Version           object
Android_Supported     object
dtype: object

# Missing value treatment and Data cleaning

In [11]:
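# Fill missing values with 0; a Ratings value of 0 therefore marks an unrated app
# (assigning the result back avoids chained in-place fillna, which recent pandas versions warn about)
cols_to_fill = ['Ratings', 'Reviews', 'File_Size', 'Downloads']
df[cols_to_fill] = df[cols_to_fill].fillna(0)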


In [12]:
df.isna().sum()

Application           3
Ratings               0
Reviews               0
File_Size             0
Downloads             0
Category              3
Purchase_Price        2
Content Rating        3
Genres                2
Latest Updates        2
App_Version          10
Android_Supported     5
dtype: int64

In [13]:
df.dropna(subset=['Application'], inplace=True)
df.isna().sum()

Application          0
Ratings              0
Reviews              0
File_Size            0
Downloads            0
Category             1
Purchase_Price       0
Content Rating       1
Genres               0
Latest Updates       0
App_Version          8
Android_Supported    3
dtype: int64

In [14]:
df1 = df.copy(deep = True)

In [15]:
df1.isna().sum()

Application          0
Ratings              0
Reviews              0
File_Size            0
Downloads            0
Category             1
Purchase_Price       0
Content Rating       1
Genres               0
Latest Updates       0
App_Version          8
Android_Supported    3
dtype: int64

In [16]:
df1.head()

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported
0,Photo Editor & Candy Camera & Grid & ScrapBook,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hid...",4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [17]:
df1.shape

(10351, 12)

In [18]:
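# Drop the single malformed row whose 'Latest Updates' holds a version string ('1.0.19') instead of a date
df1 = df1[df1['Latest Updates'] != '1.0.19']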
df1.shape

(10350, 12)

**Converting KB to MB**

In [19]:
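# Define a function to convert size strings (e.g. "19M", "203k") to megabytes
def convert_to_mb(size_str):
    # Non-string values (e.g. a 0 filled in for a missing size) cannot be parsed
    if not isinstance(size_str, str):
        return np.nan
    if size_str.endswith('k'):
        return round(float(size_str[:-1]) / 1024, 3)
    elif size_str.endswith('M'):
        return float(size_str[:-1])
    # Values such as "Varies with device" have no numeric size
    return np.nan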

df1['File_Size_MB'] = df1['File_Size'].apply(convert_to_mb)

df1.head()

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB
0,Photo Editor & Candy Camera & Grid & ScrapBook,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up,19.0
1,Coloring book moana,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,14.0
2,"U Launcher Lite – FREE Live Cool Themes, Hid...",4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up,8.7
3,Sketch - Draw & Paint,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up,25.0
4,Pixel Draw - Number Art Coloring Book,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up,2.8
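
A vectorized alternative to `convert_to_mb` can be sketched with pandas string methods instead of a row-wise `apply`. The helper name `sizes_to_mb` is hypothetical. It follows the same conventions as `convert_to_mb` (kilobytes divided by 1024, unrecognized values such as "Varies with device" become NaN), with one small difference: it rounds every result to three decimals, whereas `convert_to_mb` only rounds kilobyte values.

```python
import numpy as np
import pandas as pd


def sizes_to_mb(sizes: pd.Series) -> pd.Series:
    # Split e.g. "19M" / "203k" into a number and a unit; anything else yields NaN in both parts
    parts = sizes.astype(str).str.extract(r'^([\d.]+)([kM])$')
    value = pd.to_numeric(parts[0], errors='coerce')
    # Kilobytes are divided by 1024; megabytes pass through unchanged
    factor = parts[1].map({'k': 1 / 1024, 'M': 1.0})
    return (value * factor).round(3)


sizes = pd.Series(['19M', '203k', 'Varies with device', '8.7M'])
print(sizes_to_mb(sizes))
```

Assuming the column names used above, it would be applied as `df1['File_Size_MB'] = sizes_to_mb(df1['File_Size'])`.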


In [20]:
df1.describe()

Unnamed: 0,Ratings,File_Size_MB
count,10350.0,8825.0
mean,3.594976,21.28269
std,1.538004,22.540666
min,0.0,0.008
25%,3.7,4.7
50%,4.2,13.0
75%,4.5,29.0
max,5.0,100.0


In [21]:
df1['Downloads_Cleaned'] = df1['Downloads'].str.replace('+', '', regex=False)
df1.head()

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned
0,Photo Editor & Candy Camera & Grid & ScrapBook,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up,19.0,10000
1,Coloring book moana,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,14.0,500000
2,"U Launcher Lite – FREE Live Cool Themes, Hid...",4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up,8.7,5000000
3,Sketch - Draw & Paint,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up,25.0,50000000
4,Pixel Draw - Number Art Coloring Book,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up,2.8,100000


In [22]:
df1.dtypes

Application           object
Ratings              float64
Reviews               object
File_Size             object
Downloads             object
Category              object
Purchase_Price        object
Content Rating        object
Genres                object
Latest Updates        object
App_Version           object
Android_Supported     object
File_Size_MB         float64
Downloads_Cleaned     object
dtype: object

**Date Conversion**

In [23]:
# Convert 'Latest Updates' to datetime
df1['Latest Updates Date Format'] = pd.to_datetime(df1['Latest Updates'])

# Find the maximum and minimum dates
max_date = df1['Latest Updates Date Format'].max()
min_date = df1['Latest Updates Date Format'].min()

max_date, min_date

(Timestamp('2018-08-08 00:00:00'), Timestamp('2010-05-21 00:00:00'))

In [24]:
df1.head()

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format
0,Photo Editor & Candy Camera & Grid & ScrapBook,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up,19.0,10000,2018-01-07
1,Coloring book moana,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,14.0,500000,2018-01-15
2,"U Launcher Lite – FREE Live Cool Themes, Hid...",4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up,8.7,5000000,2018-08-01
3,Sketch - Draw & Paint,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up,25.0,50000000,2018-06-08
4,Pixel Draw - Number Art Coloring Book,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up,2.8,100000,2018-06-20


In [25]:
from datetime import datetime

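# Reference date: end of August 2018, the month of the most recent update in the data (2018-08-08)
reference_date = datetime(2018, 8, 31)

# Whole calendar months between each update and the reference date (the day of the month is ignored)
df1['Months From Last Update'] = df1['Latest Updates Date Format'].apply(
    lambda x: (reference_date.year - x.year) * 12 + reference_date.month - x.month
)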

df1.head()

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update
0,Photo Editor & Candy Camera & Grid & ScrapBook,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up,19.0,10000,2018-01-07,7
1,Coloring book moana,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,14.0,500000,2018-01-15,7
2,"U Launcher Lite – FREE Live Cool Themes, Hid...",4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up,8.7,5000000,2018-08-01,0
3,Sketch - Draw & Paint,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up,25.0,50000000,2018-06-08,2
4,Pixel Draw - Number Art Coloring Book,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up,2.8,100000,2018-06-20,2
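
The month calculation above can also be written without a per-row lambda by using the `.dt` accessor. This sketch uses three update dates from the rows shown above and passes an explicit `format='%B %d, %Y'`, which is an assumption about how every date in the column is written. Because only year and month enter the formula, it counts calendar-month boundaries rather than elapsed 30-day periods: an update on January 7 and one on January 31 both give 7. The resulting integer dtype may differ from the lambda version depending on the pandas version.

```python
from datetime import datetime

import pandas as pd

reference_date = datetime(2018, 8, 31)
dates = pd.to_datetime(
    pd.Series(['January 7, 2018', 'August 1, 2018', 'June 20, 2018']),
    format='%B %d, %Y',
)
# Same formula as the lambda: 12 months per year of difference plus the month difference
months = (reference_date.year - dates.dt.year) * 12 + (reference_date.month - dates.dt.month)
print(months.tolist())  # [7, 0, 2]
```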


In [26]:

# Convert 'Downloads_Cleaned' strings such as "10,000" to integers;
# non-string values (e.g. NaN) or non-numeric text fall back to 0
def convert_to_int(x):
    try:
        return int(x.replace(',', ''))
    except (AttributeError, ValueError):
        return 0

# Apply the function to the 'Downloads_Cleaned' column
df1['Downloads_Cleaned'] = df1['Downloads_Cleaned'].apply(convert_to_int)
df1['Downloads_Cleaned'] = df1['Downloads_Cleaned'].astype(int)
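
For comparison, the two Downloads cells (stripping "+" and converting to integers) can be collapsed into one vectorized step with `pd.to_numeric(errors='coerce')`. The helper name `downloads_to_int` is hypothetical. Like `convert_to_int`, it maps anything non-numeric to 0. It casts to an explicit `int64`, since `astype(int)` is platform-dependent, which is likely why the dtype listing below shows `int32`.

```python
import pandas as pd


def downloads_to_int(downloads: pd.Series) -> pd.Series:
    # Strip the trailing '+' and the thousands separators, e.g. "10,000+" -> "10000"
    cleaned = downloads.astype(str).str.replace(r'[+,]', '', regex=True)
    # Non-numeric leftovers become NaN, then 0, mirroring the fallback in convert_to_int
    return pd.to_numeric(cleaned, errors='coerce').fillna(0).astype('int64')


raw = pd.Series(['10,000+', '1,000,000,000+', '0+', 0, 'Free'])
print(downloads_to_int(raw).tolist())
```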

In [27]:
df1.dtypes

Application                           object
Ratings                              float64
Reviews                               object
File_Size                             object
Downloads                             object
Category                              object
Purchase_Price                        object
Content Rating                        object
Genres                                object
Latest Updates                        object
App_Version                           object
Android_Supported                     object
File_Size_MB                         float64
Downloads_Cleaned                      int32
Latest Updates Date Format    datetime64[ns]
Months From Last Update                int64
dtype: object

In [28]:
df1.head()

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update
0,Photo Editor & Candy Camera & Grid & ScrapBook,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up,19.0,10000,2018-01-07,7
1,Coloring book moana,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,14.0,500000,2018-01-15,7
2,"U Launcher Lite – FREE Live Cool Themes, Hid...",4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up,8.7,5000000,2018-08-01,0
3,Sketch - Draw & Paint,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up,25.0,50000000,2018-06-08,2
4,Pixel Draw - Number Art Coloring Book,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up,2.8,100000,2018-06-20,2


# Creating a New Normalized target feature

In [29]:
df2=df1.copy(deep=True)

In [30]:
from sklearn.preprocessing import MinMaxScaler

# df2 holds the columns 'Ratings', 'Reviews' and 'Downloads_Cleaned'

# Initialize the MinMaxScaler (rescales each column to the 0-1 range)
scaler = MinMaxScaler()

# Scale the specified columns
df2[['Ratings_N', 'Reviews_N', 'Downloads_Cleaned_N']] = scaler.fit_transform(df2[['Ratings', 'Reviews', 'Downloads_Cleaned']])

# Sum the three normalized columns (equal weights) into a single score
df2['Normalized Score'] = df2['Ratings_N'] + df2['Reviews_N'] + df2['Downloads_Cleaned_N']


In [31]:
df2.head()

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score
0,Photo Editor & Candy Camera & Grid & ScrapBook,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up,19.0,10000,2018-01-07,7,0.82,2e-06,1e-05,0.820012
1,Coloring book moana,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,14.0,500000,2018-01-15,7,0.78,1.2e-05,0.0005,0.780512
2,"U Launcher Lite – FREE Live Cool Themes, Hid...",4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up,8.7,5000000,2018-08-01,0,0.94,0.00112,0.005,0.94612
3,Sketch - Draw & Paint,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up,25.0,50000000,2018-06-08,2,0.9,0.002759,0.05,0.952759
4,Pixel Draw - Number Art Coloring Book,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up,2.8,100000,2018-06-20,2,0.86,1.2e-05,0.0001,0.860112
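
As a sanity check on the first row above, the min-max arithmetic can be reproduced by hand. The column ranges are read from the outputs: Ratings spans 0 (the value filled in for missing ratings) to 5, Reviews 0 to 78,158,306 (Facebook's count), and Downloads 0 to 1,000,000,000. The check also shows why, with equal weights, Ratings dominates the score for most apps: a typical app's reviews and downloads are tiny fractions of the maxima.

```python
def min_max(x, lo, hi):
    # (x - min) / (max - min), the per-column transform MinMaxScaler applies by default
    return (x - lo) / (hi - lo)


# First row: Photo Editor & Candy Camera & Grid & ScrapBook
ratings_n = min_max(4.1, 0.0, 5.0)
reviews_n = min_max(159, 0, 78_158_306)
downloads_n = min_max(10_000, 0, 1_000_000_000)
score = ratings_n + reviews_n + downloads_n
print(round(ratings_n, 2), f"{reviews_n:.1e}", downloads_n, round(score, 6))
```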


In [32]:
# Sorting the DataFrame in descending order by 'Normalized Score'
df2_sorted = df2.sort_values(by='Normalized Score', ascending=False)
df2_sorted.head(20)

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score
2544,Facebook,4.1,78158306,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"August 3, 2018",Varies with device,Varies with device,,1000000000,2018-08-03,0,0.82,1.0,1.0,2.82
3943,Facebook,4.1,78128208,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"August 3, 2018",Varies with device,Varies with device,,1000000000,2018-08-03,0,0.82,0.999615,1.0,2.819615
336,WhatsApp Messenger,4.4,69119316,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 3, 2018",Varies with device,Varies with device,,1000000000,2018-08-03,0,0.88,0.88435,1.0,2.76435
3904,WhatsApp Messenger,4.4,69109672,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 3, 2018",Varies with device,Varies with device,,1000000000,2018-08-03,0,0.88,0.884227,1.0,2.764227
2604,Instagram,4.5,66577446,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device,,1000000000,2018-07-31,1,0.9,0.851828,1.0,2.751828
2545,Instagram,4.5,66577313,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device,,1000000000,2018-07-31,1,0.9,0.851826,1.0,2.751826
3909,Instagram,4.5,66509917,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device,,1000000000,2018-07-31,1,0.9,0.850964,1.0,2.750964
382,Messenger – Text and Video Chat for Free,4.0,56646578,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device,,1000000000,2018-08-01,0,0.8,0.724767,1.0,2.524767
335,Messenger – Text and Video Chat for Free,4.0,56642847,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device,,1000000000,2018-08-01,0,0.8,0.724719,1.0,2.524719
1872,Subway Surfers,4.5,27725352,76M,"1,000,000,000+",Free,0,Everyone 10+,Arcade,"July 12, 2018",1.90.0,4.1 and up,76.0,1000000000,2018-07-12,1,0.9,0.354733,1.0,2.254733


In [33]:
df2.shape

(10350, 20)

In [34]:
df2_sorted['Reviews'] = df2_sorted['Reviews'].astype(int)

**Removing remaining duplicate entries with the same app name**

In [35]:
# For each application, keep only the row with the most reviews (drops the remaining duplicate entries)
unique_apps_df = df2_sorted.loc[df2_sorted.groupby('Application')['Reviews'].idxmax()]

unique_apps_df.head()

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score
8884,"""i DT"" Fútbol. Todos Somos Técnicos.",0.0,27,3.6M,500+,Free,0,Everyone,Sports,"October 7, 2017",0.22,4.1 and up,3.6,500,2017-10-07,10,0.0,3.454527e-07,5e-07,8.454527e-07
8532,+Download 4 Instagram Twitter,4.5,40467,22M,"1,000,000+",Free,0,Everyone,Social,"August 2, 2018",5.03,4.1 and up,22.0,1000000,2018-08-02,0,0.9,0.0005177569,0.001,0.9015178
4541,.R,4.5,259,203k,"10,000+",Free,0,Everyone,Tools,"September 16, 2014",1.1.06,1.5 and up,0.198,10000,2014-09-16,47,0.9,3.313787e-06,1e-05,0.9000133
4636,/u/app,4.7,573,53M,"10,000+",Free,0,Mature 17+,Communication,"July 3, 2018",4.2.4,4.1 and up,53.0,10000,2018-07-03,1,0.94,7.331275e-06,1e-05,0.9400173
5940,058.ba,4.4,27,14M,100+,Free,0,Everyone,News & Magazines,"July 6, 2018",1,4.2 and up,14.0,100,2018-07-06,1,0.88,3.454527e-07,1e-07,0.8800004
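
The `groupby(...).idxmax()` pattern above returns, for each application, the index label of its row with the most reviews; `.loc` then selects exactly those rows. The toy frame below reuses index labels and review counts from the Facebook and Instagram duplicates in the sorted output. Casting Reviews to `int` beforehand ensures the comparison is numeric rather than on mixed object values; on ties, `idxmax` keeps the first occurrence.

```python
import pandas as pd

apps = pd.DataFrame(
    {
        'Application': ['Facebook', 'Facebook', 'Instagram', 'Instagram', 'Instagram'],
        'Reviews': [78158306, 78128208, 66577446, 66577313, 66509917],
    },
    index=[2544, 3943, 2604, 2545, 3909],
)
# Index label of the most-reviewed row per application
keep = apps.groupby('Application')['Reviews'].idxmax()
unique_apps = apps.loc[keep]
print(unique_apps)
```

The result is ordered by application name (groupby sorts its keys), which is why the cell below re-sorts by score.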


In [36]:
unique_apps_df = unique_apps_df.sort_values(by='Normalized Score', ascending=False)
unique_apps_df.head(20)

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score
2544,Facebook,4.1,78158306,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"August 3, 2018",Varies with device,Varies with device,,1000000000,2018-08-03,0,0.82,1.0,1.0,2.82
336,WhatsApp Messenger,4.4,69119316,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 3, 2018",Varies with device,Varies with device,,1000000000,2018-08-03,0,0.88,0.88435,1.0,2.76435
2604,Instagram,4.5,66577446,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device,,1000000000,2018-07-31,1,0.9,0.851828,1.0,2.751828
382,Messenger – Text and Video Chat for Free,4.0,56646578,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device,,1000000000,2018-08-01,0,0.8,0.724767,1.0,2.524767
1872,Subway Surfers,4.5,27725352,76M,"1,000,000,000+",Free,0,Everyone 10+,Arcade,"July 12, 2018",1.90.0,4.1 and up,76.0,1000000000,2018-07-12,1,0.9,0.354733,1.0,2.254733
3665,YouTube,4.3,25655305,Varies with device,"1,000,000,000+",Free,0,Teen,Video Players & Editors,"August 2, 2018",Varies with device,Varies with device,,1000000000,2018-08-02,0,0.86,0.328248,1.0,2.188248
2884,Google Photos,4.5,10859051,Varies with device,"1,000,000,000+",Free,0,Everyone,Photography,"August 6, 2018",Varies with device,Varies with device,,1000000000,2018-08-06,0,0.9,0.138937,1.0,2.038937
4005,Clean Master- Space Cleaner & Antivirus,4.7,42916526,Varies with device,"500,000,000+",Free,0,Everyone,Tools,"August 3, 2018",Varies with device,Varies with device,,500000000,2018-08-03,0,0.94,0.549097,0.5,1.989097
411,Google Chrome: Fast & Secure,4.3,9643041,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device,,1000000000,2018-08-01,0,0.86,0.123378,1.0,1.983378
3234,Google,4.4,8033493,Varies with device,"1,000,000,000+",Free,0,Everyone,Tools,"August 3, 2018",Varies with device,Varies with device,,1000000000,2018-08-03,0,0.88,0.102785,1.0,1.982785


In [37]:
unique_apps_df.tail(20)

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score
8509,Dr D K Olukoya,0.0,0,3.3M,1+,Free,0,Teen,Lifestyle,"July 25, 2018",1,4.1 and up,3.3,1,2018-07-25,1,0.0,0.0,1e-09,1e-09
6624,BP Fitness Lead Scanner,0.0,0,6.7M,1+,Paid,$109.99,Everyone,Events,"August 23, 2017",0.0.1,4.1 and up,6.7,1,2017-08-23,12,0.0,0.0,1e-09,1e-09
8612,DN Snacks,0.0,0,7.6M,1+,Free,0,Everyone,Business,"April 9, 2018",1.0.0,4.0 and up,7.6,1,2018-04-09,4,0.0,0.0,1e-09,1e-09
10101,Amleen Ey,0.0,0,19M,1+,Free,0,Everyone,Social,"July 24, 2017",1,4.1 and up,19.0,1,2017-07-24,13,0.0,0.0,1e-09,1e-09
6277,Bi-Tank Ads Free,0.0,0,Varies with device,1+,Paid,$0.99,Everyone,Arcade,"July 31, 2017",1.1,4.1 and up,,1,2017-07-31,13,0.0,0.0,1e-09,1e-09
5307,Ak Parti Yardım Toplama,0.0,0,8.7M,0+,Paid,$13.99,Teen,Social,"July 28, 2017",3.4.4.3.3,4.1 and up,8.7,0,2017-07-28,13,0.0,0.0,0.0,0.0
5945,Ain Arabic Kids Alif Ba ta,0.0,0,33M,0+,Paid,$2.99,Everyone,Education,"April 15, 2016",1,3.0 and up,33.0,0,2016-04-15,28,0.0,0.0,0.0,0.0
8081,CX Network,0.0,0,10M,0+,Free,0,Everyone,Business,"August 6, 2018",1.3.1,4.1 and up,10.0,0,2018-08-06,0,0.0,0.0,0.0,0.0
4465,Popsicle Launcher for Android P 9.0 launcher,0.0,0,5.5M,0+,Paid,$1.49,Everyone,Personalization,"July 11, 2018",1.1,4.2 and up,5.5,0,2018-07-11,1,0.0,0.0,0.0,0.0
9337,EG | Explore Folegandros,0.0,0,56M,0+,Paid,$3.99,Everyone,Travel & Local,"January 22, 2017",1.1.1,4.1 and up,56.0,0,2017-01-22,19,0.0,0.0,0.0,0.0


In [38]:
unique_apps_df.isna().sum()

Application                      0
Ratings                          0
Reviews                          0
File_Size                        0
Downloads                        0
Category                         1
Purchase_Price                   0
Content Rating                   0
Genres                           0
Latest Updates                   0
App_Version                      8
Android_Supported                2
File_Size_MB                  1228
Downloads_Cleaned                0
Latest Updates Date Format       0
Months From Last Update          0
Ratings_N                        0
Reviews_N                        0
Downloads_Cleaned_N              0
Normalized Score                 0
dtype: int64

In [39]:
unique_apps_df[unique_apps_df['File_Size_MB'].isna()].head(10)


Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score
2544,Facebook,4.1,78158306,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"August 3, 2018",Varies with device,Varies with device,,1000000000,2018-08-03,0,0.82,1.0,1.0,2.82
336,WhatsApp Messenger,4.4,69119316,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 3, 2018",Varies with device,Varies with device,,1000000000,2018-08-03,0,0.88,0.88435,1.0,2.76435
2604,Instagram,4.5,66577446,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device,,1000000000,2018-07-31,1,0.9,0.851828,1.0,2.751828
382,Messenger – Text and Video Chat for Free,4.0,56646578,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device,,1000000000,2018-08-01,0,0.8,0.724767,1.0,2.524767
3665,YouTube,4.3,25655305,Varies with device,"1,000,000,000+",Free,0,Teen,Video Players & Editors,"August 2, 2018",Varies with device,Varies with device,,1000000000,2018-08-02,0,0.86,0.328248,1.0,2.188248
2884,Google Photos,4.5,10859051,Varies with device,"1,000,000,000+",Free,0,Everyone,Photography,"August 6, 2018",Varies with device,Varies with device,,1000000000,2018-08-06,0,0.9,0.138937,1.0,2.038937
4005,Clean Master- Space Cleaner & Antivirus,4.7,42916526,Varies with device,"500,000,000+",Free,0,Everyone,Tools,"August 3, 2018",Varies with device,Varies with device,,500000000,2018-08-03,0,0.94,0.549097,0.5,1.989097
411,Google Chrome: Fast & Secure,4.3,9643041,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device,,1000000000,2018-08-01,0,0.86,0.123378,1.0,1.983378
3234,Google,4.4,8033493,Varies with device,"1,000,000,000+",Free,0,Everyone,Tools,"August 3, 2018",Varies with device,Varies with device,,1000000000,2018-08-03,0,0.88,0.102785,1.0,1.982785
3223,Maps - Navigate & Explore,4.3,9235373,Varies with device,"1,000,000,000+",Free,0,Everyone,Travel & Local,"July 31, 2018",Varies with device,Varies with device,,1000000000,2018-07-31,1,0.86,0.118162,1.0,1.978162


In [40]:
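# Label apps without a numeric size; note this turns File_Size_MB into an object column
unique_apps_df['File_Size_MB'] = unique_apps_df['File_Size_MB'].fillna('Varies with device')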


In [41]:
unique_apps_df.shape

(9658, 20)

# DataFrame after removing applications with missing ratings

In [42]:
# Missing ratings were filled with 0 earlier, so a rating of 0 marks an unrated app
unique_apps_df_without_missing_ratings = unique_apps_df[unique_apps_df['Ratings'] != 0]

In [43]:
unique_apps_df_without_missing_ratings.shape

(8195, 20)

In [44]:
unique_apps_df_without_missing_ratings.dtypes

Application                           object
Ratings                              float64
Reviews                                int32
File_Size                             object
Downloads                             object
Category                              object
Purchase_Price                        object
Content Rating                        object
Genres                                object
Latest Updates                        object
App_Version                           object
Android_Supported                     object
File_Size_MB                          object
Downloads_Cleaned                      int32
Latest Updates Date Format    datetime64[ns]
Months From Last Update                int64
Ratings_N                            float64
Reviews_N                            float64
Downloads_Cleaned_N                  float64
Normalized Score                     float64
dtype: object

In [45]:
unique_apps_df_without_missing_ratings.head(100)

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score
2544,Facebook,4.1,78158306,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"August 3, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-03,0,0.82,1.0,1.0,2.82
336,WhatsApp Messenger,4.4,69119316,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 3, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-03,0,0.88,0.88435,1.0,2.76435
2604,Instagram,4.5,66577446,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-07-31,1,0.9,0.851828,1.0,2.751828
382,Messenger – Text and Video Chat for Free,4.0,56646578,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-01,0,0.8,0.724767,1.0,2.524767
1872,Subway Surfers,4.5,27725352,76M,"1,000,000,000+",Free,0,Everyone 10+,Arcade,"July 12, 2018",1.90.0,4.1 and up,76.0,1000000000,2018-07-12,1,0.9,0.354733,1.0,2.254733
3665,YouTube,4.3,25655305,Varies with device,"1,000,000,000+",Free,0,Teen,Video Players & Editors,"August 2, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-02,0,0.86,0.328248,1.0,2.188248
2884,Google Photos,4.5,10859051,Varies with device,"1,000,000,000+",Free,0,Everyone,Photography,"August 6, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-06,0,0.9,0.138937,1.0,2.038937
4005,Clean Master- Space Cleaner & Antivirus,4.7,42916526,Varies with device,"500,000,000+",Free,0,Everyone,Tools,"August 3, 2018",Varies with device,Varies with device,Varies with device,500000000,2018-08-03,0,0.94,0.549097,0.5,1.989097
411,Google Chrome: Fast & Secure,4.3,9643041,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-01,0,0.86,0.123378,1.0,1.983378
3234,Google,4.4,8033493,Varies with device,"1,000,000,000+",Free,0,Everyone,Tools,"August 3, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-03,0,0.88,0.102785,1.0,1.982785


In [46]:
unique_apps_df_without_missing_ratings.tail(50)

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score
7416,CJ Fallon eBook Reader,1.9,76,14M,"5,000+",Free,0,Everyone,Education,"February 13, 2014",2.2.1,3.0 and up,14.0,5000,2014-02-13,54,0.38,9.723855e-07,5e-06,0.380006
4906,Kenmore Smart AC,1.9,71,24M,"5,000+",Free,0,Everyone,Lifestyle,"May 8, 2017",1.0.11,4.4 and up,24.0,5000,2017-05-08,15,0.38,9.084127e-07,5e-06,0.380006
4995,ESS Mobile for AE,1.9,63,1.8M,"5,000+",Free,0,Everyone,Productivity,"September 27, 2016",2.0.1027,3.0 and up,1.8,5000,2016-09-27,23,0.38,8.060564e-07,5e-06,0.380006
9890,TAXLANDIA,1.9,141,65M,"1,000+",Free,0,Everyone,Educational,"March 6, 2018",1.8.2,4.3 and up,65.0,1000,2018-03-06,5,0.38,1.804031e-06,1e-06,0.380003
6685,BR Chat Bot,1.9,16,2.6M,"1,000+",Free,0,Everyone,Social,"June 30, 2016",1,4.0 and up,2.6,1000,2016-06-30,26,0.38,2.047127e-07,1e-06,0.380001
7633,Co Checker,1.9,12,2.0M,"1,000+",Free,0,Everyone,Tools,"October 20, 2017",1.8.6,4.3 and up,2.0,1000,2017-10-20,10,0.38,1.535345e-07,1e-06,0.380001
5941,Loreal - BA Makeup,1.9,8,6.7M,"1,000+",Free,0,Everyone,Tools,"April 18, 2018",1.1.1,4.0.3 and up,6.7,1000,2018-04-18,4,0.38,1.023564e-07,1e-06,0.380001
4904,Smart-AC Universal Remote Free,1.8,3270,1.8M,"500,000+",Free,0,Everyone,Entertainment,"August 18, 2015",1,2.2 and up,1.8,500000,2015-08-18,36,0.36,4.183816e-05,0.0005,0.360542
4522,Q-Tech Companion App,1.8,17,3.1M,"10,000+",Free,0,Everyone,Food & Drink,"March 8, 2017",1.6.3,4.4 and up,3.1,10000,2017-03-08,17,0.36,2.175073e-07,1e-05,0.36001
9800,Furrion ES Control,1.8,26,2.7M,"5,000+",Free,0,Everyone,Video Players & Editors,"September 29, 2016",1,4.2 and up,2.7,5000,2016-09-29,23,0.36,3.326582e-07,5e-06,0.360005


In [47]:
unique_apps_df_without_missing_ratings.describe()

Unnamed: 0,Ratings,Reviews,Downloads_Cleaned,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score
count,8195.0,8195.0,8195.0,8195.0,8195.0,8195.0,8195.0,8195.0
mean,4.17335,255531.9,9190562.0,9.299573,0.83467,0.003269415,0.009190562,0.84713
std,0.536234,1985828.0,58266200.0,13.527404,0.107247,0.02540777,0.0582662,0.134881
min,1.0,1.0,1.0,0.0,0.2,1.279455e-08,1e-09,0.2
25%,4.0,127.0,10000.0,1.0,0.8,1.624907e-06,1e-05,0.800052
50%,4.3,3017.0,100000.0,3.0,0.86,3.860114e-05,0.0001,0.860197
75%,4.5,43993.5,1000000.0,12.0,0.9,0.0005628768,0.001,0.915839
max,5.0,78158310.0,1000000000.0,99.0,1.0,1.0,1.0,2.82





### <p style="padding:10px;background-color:#0f5c59;margin:0;color:white;font-family:newtimeroman;font-size:150%;text-align:center;border-radius: 5px 20px;overflow:hidden;font-weight:500">Composite Score Model</p>

In [48]:
composite_score_df=unique_apps_df_without_missing_ratings.copy(deep=True)

In [49]:
# Composite score: weighted sum of the normalized metrics (weights 0.35 + 0.30 + 0.35 = 1.0)
composite_score_df['Composite Score'] = (0.35 * composite_score_df['Ratings_N'] + 0.30 * composite_score_df['Reviews_N']
                                         + 0.35 * composite_score_df['Downloads_Cleaned_N'])
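As a quick check of the weighting, the sketch below recomputes the composite score for the Facebook and WhatsApp Messenger rows from the normalized values printed in the outputs. The helper name `composite_score` and the `WEIGHTS` dict are hypothetical; only the three weights come from the cell above.

```python
# Weights used for the composite score (0.35 + 0.30 + 0.35 = 1.0).
WEIGHTS = {"Ratings_N": 0.35, "Reviews_N": 0.30, "Downloads_Cleaned_N": 0.35}


def composite_score(ratings_n: float, reviews_n: float, downloads_n: float) -> float:
    """Weighted sum of the three normalized metrics (hypothetical helper)."""
    return (WEIGHTS["Ratings_N"] * ratings_n
            + WEIGHTS["Reviews_N"] * reviews_n
            + WEIGHTS["Downloads_Cleaned_N"] * downloads_n)


# Normalized values copied from the Facebook and WhatsApp Messenger output rows.
facebook = composite_score(0.82, 1.0, 1.0)
whatsapp = composite_score(0.88, 0.88435, 1.0)

print(round(facebook, 6))  # 0.937
print(round(whatsapp, 6))  # 0.923305
```

Both values match the `Composite Score` column in the output (0.937 and 0.923305), which confirms the score is a weighted sum rather than a product.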

In [50]:
composite_score_df.head(100)

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score,Composite Score
2544,Facebook,4.1,78158306,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"August 3, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-03,0,0.82,1.0,1.0,2.82,0.937
336,WhatsApp Messenger,4.4,69119316,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 3, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-03,0,0.88,0.88435,1.0,2.76435,0.923305
2604,Instagram,4.5,66577446,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-07-31,1,0.9,0.851828,1.0,2.751828,0.920548
382,Messenger – Text and Video Chat for Free,4.0,56646578,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-01,0,0.8,0.724767,1.0,2.524767,0.84743
1872,Subway Surfers,4.5,27725352,76M,"1,000,000,000+",Free,0,Everyone 10+,Arcade,"July 12, 2018",1.90.0,4.1 and up,76.0,1000000000,2018-07-12,1,0.9,0.354733,1.0,2.254733,0.77142
3665,YouTube,4.3,25655305,Varies with device,"1,000,000,000+",Free,0,Teen,Video Players & Editors,"August 2, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-02,0,0.86,0.328248,1.0,2.188248,0.749474
2884,Google Photos,4.5,10859051,Varies with device,"1,000,000,000+",Free,0,Everyone,Photography,"August 6, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-06,0,0.9,0.138937,1.0,2.038937,0.706681
4005,Clean Master- Space Cleaner & Antivirus,4.7,42916526,Varies with device,"500,000,000+",Free,0,Everyone,Tools,"August 3, 2018",Varies with device,Varies with device,Varies with device,500000000,2018-08-03,0,0.94,0.549097,0.5,1.989097,0.668729
411,Google Chrome: Fast & Secure,4.3,9643041,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-01,0,0.86,0.123378,1.0,1.983378,0.688013
3234,Google,4.4,8033493,Varies with device,"1,000,000,000+",Free,0,Everyone,Tools,"August 3, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-03,0,0.88,0.102785,1.0,1.982785,0.688835


In [51]:
composite_score_df.tail(20)

Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score,Composite Score
6008,BD Provider App,1.4,45,13M,"5,000+",Free,0,Everyone,Health & Fitness,"September 12, 2016",1,4.1 and up,13.0,5000,2016-09-12,23,0.28,5.757545e-07,5e-06,0.280006,0.098002
8258,Quiz DC,1.4,33,3.1M,"1,000+",Free,0,Everyone,Trivia,"December 25, 2017",1.1,4.1 and up,3.1,1000,2017-12-25,8,0.28,4.2222e-07,1e-06,0.280001,0.098
7677,Hercules CP Mobile,1.4,20,4.1M,"1,000+",Free,0,Everyone,Tools,"July 6, 2018",2.1.5,4.3 and up,4.1,1000,2018-07-06,1,0.28,2.558909e-07,1e-06,0.280001,0.098
10082,EY TaxChat,1.2,44,27M,"1,000+",Free,0,Everyone,Finance,"April 9, 2018",3.24.0,6.0 and up,27.0,1000,2018-04-09,4,0.24,5.6296e-07,1e-06,0.240002,0.084001
7926,Tech CU Card Manager,1.0,2,7.2M,"1,000+",Free,0,Everyone,Finance,"July 25, 2017",1.0.1,4.0 and up,7.2,1000,2017-07-25,13,0.2,2.558909e-08,1e-06,0.200001,0.07
10324,FE Mechanical Engineering Prep,1.0,2,21M,"1,000+",Free,0,Everyone,Education,"July 27, 2018",5.33.3669,5.0 and up,21.0,1000,2018-07-27,1,0.2,2.558909e-08,1e-06,0.200001,0.07
6319,BJ Bridge Standard American 2018,1.0,1,4.9M,"1,000+",Free,0,Everyone,Card,"May 21, 2018",6.2-sayc,4.0 and up,4.9,1000,2018-05-21,3,0.2,1.279455e-08,1e-06,0.200001,0.07
7144,CB Mobile Biz,1.0,3,8.4M,500+,Free,0,Everyone,Finance,"February 22, 2016",4.4.1255,4.0 and up,8.4,500,2016-02-22,30,0.2,3.838364e-08,5e-07,0.200001,0.07
10591,Lottery Ticket Checker - Florida Results & Lotto,1.0,3,41M,500+,Free,0,Everyone,Tools,"December 12, 2017",1,4.2 and up,41.0,500,2017-12-12,8,0.2,3.838364e-08,5e-07,0.200001,0.07
8820,DS Creator 2.0,1.0,2,4.4M,500+,Free,0,Everyone,Tools,"March 23, 2018",2.0.180226.1,4.0 and up,4.4,500,2018-03-23,5,0.2,2.558909e-08,5e-07,0.200001,0.07


In [52]:
composite_score_df.describe()

Unnamed: 0,Ratings,Reviews,Downloads_Cleaned,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score,Composite Score
count,8195.0,8195.0,8195.0,8195.0,8195.0,8195.0,8195.0,8195.0,8195.0
mean,4.17335,255531.9,9190562.0,9.299573,0.83467,0.003269415,0.009190562,0.84713,0.296332
std,0.536234,1985828.0,58266200.0,13.527404,0.107247,0.02540777,0.0582662,0.134881,0.046584
min,1.0,1.0,1.0,0.0,0.2,1.279455e-08,1e-09,0.2,0.07
25%,4.0,127.0,10000.0,1.0,0.8,1.624907e-06,1e-05,0.800052,0.280018
50%,4.3,3017.0,100000.0,3.0,0.86,3.860114e-05,0.0001,0.860197,0.301064
75%,4.5,43993.5,1000000.0,12.0,0.9,0.0005628768,0.001,0.915839,0.320246
max,5.0,78158310.0,1000000000.0,99.0,1.0,1.0,1.0,2.82,0.937


In [53]:
composite_score_df_1 =composite_score_df.copy(deep=True)

In [54]:
# Process the 'Genres' column: keep only the primary genre (the text before the first ';')
composite_score_df_1['New_Genre'] = composite_score_df_1['Genres'].apply(lambda x: x.split(';')[0])
composite_score_df_1.head()


Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score,Composite Score,New_Genre
2544,Facebook,4.1,78158306,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"August 3, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-03,0,0.82,1.0,1.0,2.82,0.937,Social
336,WhatsApp Messenger,4.4,69119316,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 3, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-03,0,0.88,0.88435,1.0,2.76435,0.923305,Communication
2604,Instagram,4.5,66577446,Varies with device,"1,000,000,000+",Free,0,Teen,Social,"July 31, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-07-31,1,0.9,0.851828,1.0,2.751828,0.920548,Social
382,Messenger – Text and Video Chat for Free,4.0,56646578,Varies with device,"1,000,000,000+",Free,0,Everyone,Communication,"August 1, 2018",Varies with device,Varies with device,Varies with device,1000000000,2018-08-01,0,0.8,0.724767,1.0,2.524767,0.84743,Communication
1872,Subway Surfers,4.5,27725352,76M,"1,000,000,000+",Free,0,Everyone 10+,Arcade,"July 12, 2018",1.90.0,4.1 and up,76.0,1000000000,2018-07-12,1,0.9,0.354733,1.0,2.254733,0.77142,Arcade
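The genre step keeps the text before the first `;`. A vectorized equivalent using pandas string methods is sketched below; the sample values are made up for illustration (e.g. a hypothetical `Casual;Action & Adventure` entry).

```python
import pandas as pd

# Hypothetical sample of the 'Genres' column; multi-genre entries are ';'-separated.
genres = pd.Series(["Social", "Casual;Action & Adventure", "Education;Brain Games", "Tools"])

# str.split(';') yields a list per row and .str[0] takes its first element.
# A value without ';' splits into a one-element list, so it comes back unchanged.
primary_genre = genres.str.split(";").str[0]

print(primary_genre.tolist())  # ['Social', 'Casual', 'Education', 'Tools']
```

Unlike the `lambda`, the `.str` accessor passes missing values through as `NaN` instead of raising an error.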


In [55]:
composite_score_df_1.tail(100)


Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score,Composite Score,New_Genre
10043,Remote EX for NISSAN,2.3,223,1.0M,"5,000+",Paid,$1.49,Everyone,Communication,"July 26, 2014",1.3,3.0 and up,1.0,5000,2014-07-26,49,0.46,2.853184e-06,5e-06,0.460008,0.161003,Communication
7978,CV-RECORD Pro,2.3,42,6.0M,"1,000+",Paid,$0.99,Everyone,Communication,"January 12, 2017",1.0.5,4.1 and up,6.0,1000,2017-01-12,19,0.46,5.373709e-07,1e-06,0.460002,0.161001,Communication
9283,EF Go Ahead Tour Companion,2.3,23,43M,"1,000+",Free,0,Everyone,Travel & Local,"June 18, 2018",3.0.7,4.4 and up,43.0,1000,2018-06-18,2,0.46,2.942745e-07,1e-06,0.460001,0.161,Travel & Local
2403,5 Minute Veterinary Consult: Canine & Feline,2.3,17,6.8M,"1,000+",Free,0,Everyone,Medical,"March 29, 2018",2.6,4.2 and up,6.8,1000,2018-03-29,5,0.46,2.175073e-07,1e-06,0.460001,0.161,Medical
7789,CR Family,2.3,10,5.2M,"1,000+",Free,0,Everyone,Education,"February 8, 2017",1.0.0,4.1 and up,5.2,1000,2017-02-08,18,0.46,1.279455e-07,1e-06,0.460001,0.161,Education
8478,DK Eyewitness Audio Walks,2.3,9,70M,"1,000+",Free,0,Everyone,Travel & Local,"April 6, 2018",1.0.1.8,4.2 and up,70.0,1000,2018-04-06,4,0.46,1.151509e-07,1e-06,0.460001,0.161,Travel & Local
6640,Bono’s Pit Bar-B-Q,2.3,3,36M,"1,000+",Free,0,Everyone 10+,Entertainment,"January 23, 2018",2.4,2.3.3 and up,36.0,1000,2018-01-23,7,0.46,3.838364e-08,1e-06,0.460001,0.161,Entertainment
6961,Benefit Extras Mobile App,2.3,3,2.4M,"1,000+",Free,0,Everyone,Health & Fitness,"July 18, 2018",6,5.0 and up,2.4,1000,2018-07-18,1,0.46,3.838364e-08,1e-06,0.460001,0.161,Health & Fitness
5102,Ag PhD Soils,2.3,3,16M,"1,000+",Free,0,Everyone,Productivity,"July 16, 2018",1.24.1,4.4 and up,16.0,1000,2018-07-16,1,0.46,3.838364e-08,1e-06,0.460001,0.161,Productivity
9711,EP Lab Digest,2.3,11,12M,500+,Free,0,Everyone,Medical,"July 12, 2018",1000000.6,4.1 and up,12.0,500,2018-07-12,1,0.46,1.4074e-07,5e-07,0.460001,0.161,Medical


In [56]:
composite_score_df_1['New_Genre'].value_counts()

Tools                      720
Entertainment              502
Education                  498
Action                     304
Finance                    302
Lifestyle                  302
Productivity               301
Personalization            298
Medical                    290
Sports                     270
Photography                263
Business                   263
Communication              257
Health & Fitness           246
Casual                     217
News & Magazines           204
Social                     203
Simulation                 200
Travel & Local             187
Arcade                     185
Shopping                   180
Books & Reference          171
Video Players & Editors    150
Dating                     133
Puzzle                     123
Maps & Navigation          118
Role Playing               111
Racing                     103
Strategy                    95
Food & Drink                94
Educational                 93
Adventure                   78
Auto & V

In [57]:
# Remove the dollar sign and convert the column to a numeric (float) type
composite_score_df_1['Purchase_Price'] = composite_score_df_1['Purchase_Price'].replace({r'\$': ''}, regex=True).astype(float)
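Because `Purchase_Price` mixes numeric zeros for free apps with strings such as `$1.49` for paid ones, another way to do this conversion is to cast every value to a string, strip the `$`, and let `pd.to_numeric` turn anything unparseable into `NaN`. The sample values below are illustrative, not taken from the dataset.

```python
import pandas as pd

# Mixed-type column: free apps stored as 0, paid apps as '$x.xx' strings (illustrative values).
prices = pd.Series([0, "$1.49", "$0.99", 0, "$2.99"], dtype=object)

# Cast to str so every element supports .str, remove the literal '$' (no regex needed),
# then convert; errors='coerce' maps any leftover non-numeric text to NaN.
clean = pd.to_numeric(prices.astype(str).str.replace("$", "", regex=False), errors="coerce")

print(clean.tolist())  # [0.0, 1.49, 0.99, 0.0, 2.99]
```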


In [58]:
composite_score_df_1.tail(10)


Unnamed: 0,Application,Ratings,Reviews,File_Size,Downloads,Category,Purchase_Price,Content Rating,Genres,Latest Updates,App_Version,Android_Supported,File_Size_MB,Downloads_Cleaned,Latest Updates Date Format,Months From Last Update,Ratings_N,Reviews_N,Downloads_Cleaned_N,Normalized Score,Composite Score,New_Genre
7427,CJ DVD Rentals,1.0,5,13M,100+,Free,0.0,Everyone,Communication,"October 6, 2017",1,4.1 and up,13.0,100,2017-10-06,10,0.2,6.397273e-08,1e-07,0.2,0.07,Communication
10400,Familial Hypercholesterolaemia Handbook,1.0,2,33M,100+,Free,0.0,Everyone,Medical,"July 2, 2018",2.0.1,4.1 and up,33.0,100,2018-07-02,1,0.2,2.558909e-08,1e-07,0.2,0.07,Medical
7383,Thistletown CI,1.0,1,6.6M,100+,Free,0.0,Everyone,Productivity,"March 15, 2018",41.9,4.1 and up,6.6,100,2018-03-15,5,0.2,1.279455e-08,1e-07,0.2,0.07,Productivity
7806,CR Magazine,1.0,1,7.8M,100+,Free,0.0,Everyone,Business,"July 23, 2014",2.4.2,2.3.3 and up,7.8,100,2014-07-23,49,0.2,1.279455e-08,1e-07,0.2,0.07,Business
6490,MbH BM,1.0,1,2.3M,100+,Free,0.0,Everyone,Medical,"December 14, 2016",1.1.3,4.3 and up,2.3,100,2016-12-14,20,0.2,1.279455e-08,1e-07,0.2,0.07,Medical
5978,Truck Driving Test Class 3 BC,1.0,1,2.0M,50+,Paid,1.49,Everyone,Education,"April 9, 2012",1,2.1 and up,2.0,50,2012-04-09,76,0.2,1.279455e-08,5e-08,0.2,0.07,Education
5151,Clarksburg AH,1.0,1,28M,50+,Free,0.0,Everyone,Medical,"May 1, 2017",300000.0.81,4.0.3 and up,28.0,50,2017-05-01,15,0.2,1.279455e-08,5e-08,0.2,0.07,Medical
8875,DT future1 cam,1.0,1,24M,50+,Free,0.0,Everyone,Tools,"March 27, 2018",3.1,2.2 and up,24.0,50,2018-03-27,5,0.2,1.279455e-08,5e-08,0.2,0.07,Tools
625,House party - live chat,1.0,1,9.2M,10+,Free,0.0,Mature 17+,Dating,"July 31, 2018",3.52,4.0.3 and up,9.2,10,2018-07-31,1,0.2,1.279455e-08,1e-08,0.2,0.07,Dating
4127,Speech Therapy: F,1.0,1,16M,10+,Paid,2.99,Everyone,Education,"October 7, 2016",1,2.3.3 and up,16.0,10,2016-10-07,22,0.2,1.279455e-08,1e-08,0.2,0.07,Education


# Drop Unnecessary Columns

In [59]:
# Columns to be dropped
columns_to_drop = ['File_Size', 'Downloads', 'Genres', 'Latest Updates', 'App_Version', 'Latest Updates Date Format', 'Ratings_N', 'Reviews_N', 'Downloads_Cleaned_N']

# Dropping the columns
composite_score_df_1.drop(columns_to_drop, axis=1, inplace=True)

In [60]:
composite_score_df_1.head()

Unnamed: 0,Application,Ratings,Reviews,Category,Purchase_Price,Content Rating,Android_Supported,File_Size_MB,Downloads_Cleaned,Months From Last Update,Normalized Score,Composite Score,New_Genre
2544,Facebook,4.1,78158306,Free,0.0,Teen,Varies with device,Varies with device,1000000000,0,2.82,0.937,Social
336,WhatsApp Messenger,4.4,69119316,Free,0.0,Everyone,Varies with device,Varies with device,1000000000,0,2.76435,0.923305,Communication
2604,Instagram,4.5,66577446,Free,0.0,Teen,Varies with device,Varies with device,1000000000,1,2.751828,0.920548,Social
382,Messenger – Text and Video Chat for Free,4.0,56646578,Free,0.0,Everyone,Varies with device,Varies with device,1000000000,0,2.524767,0.84743,Communication
1872,Subway Surfers,4.5,27725352,Free,0.0,Everyone 10+,4.1 and up,76.0,1000000000,1,2.254733,0.77142,Arcade


In [61]:
composite_score_df_1.dtypes

Application                 object
Ratings                    float64
Reviews                      int32
Category                    object
Purchase_Price             float64
Content Rating              object
Android_Supported           object
File_Size_MB                object
Downloads_Cleaned            int32
Months From Last Update      int64
Normalized Score           float64
Composite Score            float64
New_Genre                   object
dtype: object

In [62]:
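# Use -1 as a numeric sentinel for 'Varies with device' so the column can be cast to float
composite_score_df_1['File_Size_MB'] = composite_score_df_1['File_Size_MB'].replace('Varies with device', -1).astype(float)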

In [63]:
composite_score_df_1.describe()

Unnamed: 0,Ratings,Reviews,Purchase_Price,File_Size_MB,Downloads_Cleaned,Months From Last Update,Normalized Score,Composite Score
count,8195.0,8195.0,8195.0,8195.0,8195.0,8195.0,8195.0,8195.0
mean,4.17335,255531.9,1.035573,18.510443,9190562.0,9.299573,0.84713,0.296332
std,0.536234,1985828.0,16.858269,22.500018,58266200.0,13.527404,0.134881,0.046584
min,1.0,1.0,0.0,-1.0,1.0,0.0,0.2,0.07
25%,4.0,127.0,0.0,2.8,10000.0,1.0,0.800052,0.280018
50%,4.3,3017.0,0.0,9.4,100000.0,3.0,0.860197,0.301064
75%,4.5,43993.5,0.0,27.0,1000000.0,12.0,0.915839,0.320246
max,5.0,78158310.0,400.0,100.0,1000000000.0,99.0,2.82,0.937
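The -1 sentinel keeps `File_Size_MB` numeric, but it also enters the summary statistics: the `min` above is -1.0, and the mean and quartiles include those placeholder rows. A sketch of an alternative that stores `NaN` instead (which `describe()` and other reductions skip) and keeps a separate boolean flag is below; the flag name `size_varies` and the sample values are hypothetical.

```python
import pandas as pd

# Illustrative raw sizes: parsed numbers plus the 'Varies with device' text.
sizes = pd.Series([76.0, "Varies with device", 14.0, "Varies with device", 2.6], dtype=object)

# Hypothetical flag column remembering which apps report a device-dependent size.
size_varies = sizes.eq("Varies with device")

# errors='coerce' turns the text into NaN, giving a float64 column without a -1 placeholder.
sizes_mb = pd.to_numeric(sizes, errors="coerce")

# NaN is skipped by describe(), so count, min and mean reflect real sizes only.
print(sizes_mb.describe()[["count", "min", "mean"]])
```

With `NaN` in place of -1, `pd.cut` would still leave those rows uncategorized, so the later `fillna('Varies with device')` step works the same way. Note that `errors="coerce"` would also hide any genuinely malformed size strings, which the flag does not distinguish.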


In [64]:
composite_score_df_1['File_Size_MB_Bins']=composite_score_df_1['File_Size_MB']

In [65]:
composite_score_df_1.head(100)

Unnamed: 0,Application,Ratings,Reviews,Category,Purchase_Price,Content Rating,Android_Supported,File_Size_MB,Downloads_Cleaned,Months From Last Update,Normalized Score,Composite Score,New_Genre,File_Size_MB_Bins
2544,Facebook,4.1,78158306,Free,0.0,Teen,Varies with device,-1.0,1000000000,0,2.82,0.937,Social,-1.0
336,WhatsApp Messenger,4.4,69119316,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.76435,0.923305,Communication,-1.0
2604,Instagram,4.5,66577446,Free,0.0,Teen,Varies with device,-1.0,1000000000,1,2.751828,0.920548,Social,-1.0
382,Messenger – Text and Video Chat for Free,4.0,56646578,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.524767,0.84743,Communication,-1.0
1872,Subway Surfers,4.5,27725352,Free,0.0,Everyone 10+,4.1 and up,76.0,1000000000,1,2.254733,0.77142,Arcade,76.0
3665,YouTube,4.3,25655305,Free,0.0,Teen,Varies with device,-1.0,1000000000,0,2.188248,0.749474,Video Players & Editors,-1.0
2884,Google Photos,4.5,10859051,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.038937,0.706681,Photography,-1.0
4005,Clean Master- Space Cleaner & Antivirus,4.7,42916526,Free,0.0,Everyone,Varies with device,-1.0,500000000,0,1.989097,0.668729,Tools,-1.0
411,Google Chrome: Fast & Secure,4.3,9643041,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,1.983378,0.688013,Communication,-1.0
3234,Google,4.4,8033493,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,1.982785,0.688835,Tools,-1.0


In [66]:
composite_score_df_2 =composite_score_df_1.copy(deep=True)

In [67]:

# Define the bins
bins = [0, 10, 30, 70, float('inf')]
# Define the labels for the bins
labels = ['0-10MB', '10-30MB', '30-70MB', '70+MB']

# Apply the binning
composite_score_df_2['File_Size_MB_Bins_Categorized'] = pd.cut(composite_score_df_2['File_Size_MB_Bins'], bins=bins, labels=labels, right=False)
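With `right=False` each bin includes its left edge and excludes its right edge, so the intervals are [0, 10), [10, 30), [30, 70) and [70, ∞). The sketch below runs the same bins and labels on illustrative sizes, including exact edges and the -1 sentinel, and then applies the same "Varies with device" fill used a few cells later.

```python
import pandas as pd

bins = [0, 10, 30, 70, float("inf")]
labels = ["0-10MB", "10-30MB", "30-70MB", "70+MB"]

# Illustrative sizes, including exact bin edges and the -1 sentinel.
sizes = pd.Series([-1.0, 0.0, 9.9, 10.0, 30.0, 70.0, 250.0])

# right=False -> left-closed intervals: [0,10), [10,30), [30,70), [70,inf)
binned = pd.cut(sizes, bins=bins, labels=labels, right=False)

# -1 lies outside every interval and becomes NaN; give it an explicit category.
binned = binned.cat.add_categories("Varies with device").fillna("Varies with device")

print(binned.tolist())
# ['Varies with device', '0-10MB', '0-10MB', '10-30MB', '30-70MB', '70+MB', '70+MB']
```

An app of exactly 70 MB therefore falls into `70+MB`, and 10 MB falls into `10-30MB`.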


In [68]:
composite_score_df_2.head(100)

Unnamed: 0,Application,Ratings,Reviews,Category,Purchase_Price,Content Rating,Android_Supported,File_Size_MB,Downloads_Cleaned,Months From Last Update,Normalized Score,Composite Score,New_Genre,File_Size_MB_Bins,File_Size_MB_Bins_Categorized
2544,Facebook,4.1,78158306,Free,0.0,Teen,Varies with device,-1.0,1000000000,0,2.82,0.937,Social,-1.0,
336,WhatsApp Messenger,4.4,69119316,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.76435,0.923305,Communication,-1.0,
2604,Instagram,4.5,66577446,Free,0.0,Teen,Varies with device,-1.0,1000000000,1,2.751828,0.920548,Social,-1.0,
382,Messenger – Text and Video Chat for Free,4.0,56646578,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.524767,0.84743,Communication,-1.0,
1872,Subway Surfers,4.5,27725352,Free,0.0,Everyone 10+,4.1 and up,76.0,1000000000,1,2.254733,0.77142,Arcade,76.0,70+MB
3665,YouTube,4.3,25655305,Free,0.0,Teen,Varies with device,-1.0,1000000000,0,2.188248,0.749474,Video Players & Editors,-1.0,
2884,Google Photos,4.5,10859051,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.038937,0.706681,Photography,-1.0,
4005,Clean Master- Space Cleaner & Antivirus,4.7,42916526,Free,0.0,Everyone,Varies with device,-1.0,500000000,0,1.989097,0.668729,Tools,-1.0,
411,Google Chrome: Fast & Secure,4.3,9643041,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,1.983378,0.688013,Communication,-1.0,
3234,Google,4.4,8033493,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,1.982785,0.688835,Tools,-1.0,


In [69]:
composite_score_df_2.dtypes

Application                        object
Ratings                           float64
Reviews                             int32
Category                           object
Purchase_Price                    float64
Content Rating                     object
Android_Supported                  object
File_Size_MB                      float64
Downloads_Cleaned                   int32
Months From Last Update             int64
Normalized Score                  float64
Composite Score                   float64
New_Genre                          object
File_Size_MB_Bins                 float64
File_Size_MB_Bins_Categorized    category
dtype: object

In [70]:
# Add 'Varies with device' as a new category to the column
composite_score_df_2['File_Size_MB_Bins_Categorized'] = composite_score_df_2['File_Size_MB_Bins_Categorized'].cat.add_categories('Varies with device')

# Replace missing values in the column
composite_score_df_2['File_Size_MB_Bins_Categorized'] = composite_score_df_2['File_Size_MB_Bins_Categorized'].fillna('Varies with device')


In [71]:
composite_score_df_2.head(20)

Unnamed: 0,Application,Ratings,Reviews,Category,Purchase_Price,Content Rating,Android_Supported,File_Size_MB,Downloads_Cleaned,Months From Last Update,Normalized Score,Composite Score,New_Genre,File_Size_MB_Bins,File_Size_MB_Bins_Categorized
2544,Facebook,4.1,78158306,Free,0.0,Teen,Varies with device,-1.0,1000000000,0,2.82,0.937,Social,-1.0,Varies with device
336,WhatsApp Messenger,4.4,69119316,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.76435,0.923305,Communication,-1.0,Varies with device
2604,Instagram,4.5,66577446,Free,0.0,Teen,Varies with device,-1.0,1000000000,1,2.751828,0.920548,Social,-1.0,Varies with device
382,Messenger – Text and Video Chat for Free,4.0,56646578,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.524767,0.84743,Communication,-1.0,Varies with device
1872,Subway Surfers,4.5,27725352,Free,0.0,Everyone 10+,4.1 and up,76.0,1000000000,1,2.254733,0.77142,Arcade,76.0,70+MB
3665,YouTube,4.3,25655305,Free,0.0,Teen,Varies with device,-1.0,1000000000,0,2.188248,0.749474,Video Players & Editors,-1.0,Varies with device
2884,Google Photos,4.5,10859051,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.038937,0.706681,Photography,-1.0,Varies with device
4005,Clean Master- Space Cleaner & Antivirus,4.7,42916526,Free,0.0,Everyone,Varies with device,-1.0,500000000,0,1.989097,0.668729,Tools,-1.0,Varies with device
411,Google Chrome: Fast & Secure,4.3,9643041,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,1.983378,0.688013,Communication,-1.0,Varies with device
3234,Google,4.4,8033493,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,1.982785,0.688835,Tools,-1.0,Varies with device





### <p style="padding:10px;background-color:#0f5c59;margin:0;color:white;font-family:newtimeroman;font-size:150%;text-align:center;border-radius: 5px 20px;overflow:hidden;font-weight:500">Top 100 and Bottom 100 dataframe</p>

In [72]:
# Create a new dataframe from the first 100 and last 100 records
# .copy() makes independent frames, so adding a column does not raise SettingWithCopyWarning
first_100 = composite_score_df_2.head(100).copy()
last_100 = composite_score_df_2.tail(100).copy()

# Label the first and last 100 records
first_100['App Hit Category'] = 'Top 100'
last_100['App Hit Category'] = 'Bottom 100'

# Combine the dataframes
top_bottom_100_df = pd.concat([first_100, last_100], ignore_index=True)
#top_bottom_100_df.head()  # Displaying the first few rows of the new dataframe


In [73]:
top_bottom_100_df.head(300)

Unnamed: 0,Application,Ratings,Reviews,Category,Purchase_Price,Content Rating,Android_Supported,File_Size_MB,Downloads_Cleaned,Months From Last Update,Normalized Score,Composite Score,New_Genre,File_Size_MB_Bins,File_Size_MB_Bins_Categorized,App Hit Category
0,Facebook,4.1,78158306,Free,0.0,Teen,Varies with device,-1.0,1000000000,0,2.82,0.937,Social,-1.0,Varies with device,Top 100
1,WhatsApp Messenger,4.4,69119316,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.76435,0.923305,Communication,-1.0,Varies with device,Top 100
2,Instagram,4.5,66577446,Free,0.0,Teen,Varies with device,-1.0,1000000000,1,2.751828,0.920548,Social,-1.0,Varies with device,Top 100
3,Messenger – Text and Video Chat for Free,4.0,56646578,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.524767,0.84743,Communication,-1.0,Varies with device,Top 100
4,Subway Surfers,4.5,27725352,Free,0.0,Everyone 10+,4.1 and up,76.0,1000000000,1,2.254733,0.77142,Arcade,76.0,70+MB,Top 100
5,YouTube,4.3,25655305,Free,0.0,Teen,Varies with device,-1.0,1000000000,0,2.188248,0.749474,Video Players & Editors,-1.0,Varies with device,Top 100
6,Google Photos,4.5,10859051,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.038937,0.706681,Photography,-1.0,Varies with device,Top 100
7,Clean Master- Space Cleaner & Antivirus,4.7,42916526,Free,0.0,Everyone,Varies with device,-1.0,500000000,0,1.989097,0.668729,Tools,-1.0,Varies with device,Top 100
8,Google Chrome: Fast & Secure,4.3,9643041,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,1.983378,0.688013,Communication,-1.0,Varies with device,Top 100
9,Google,4.4,8033493,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,1.982785,0.688835,Tools,-1.0,Varies with device,Top 100


In [74]:
top_bottom_100_df.shape

(200, 16)
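`head(100)` and `tail(100)` take rows in the frame's existing order, and the outputs show that this order follows `Normalized Score` rather than `Composite Score` (Clean Master, composite 0.668729, is listed before Google Chrome, 0.688013). If the groups are meant to be defined by composite-score rank, as the methodology describes, a sketch using `nlargest`/`nsmallest` is shown below on a small made-up frame, with `n=2` standing in for 100.

```python
import pandas as pd

# Small made-up frame; in the notebook this would be composite_score_df_2 with n = 100.
df = pd.DataFrame({
    "Application": ["A", "B", "C", "D", "E", "F"],
    "Normalized Score": [2.0, 1.9, 1.5, 0.9, 0.5, 0.4],
    "Composite Score": [0.70, 0.72, 0.40, 0.30, 0.10, 0.12],
})
n = 2

# Rank explicitly by composite score; assign() returns new frames, so no copy warning.
top_n = df.nlargest(n, "Composite Score").assign(**{"App Hit Category": f"Top {n}"})
bottom_n = df.nsmallest(n, "Composite Score").assign(**{"App Hit Category": f"Bottom {n}"})

top_bottom = pd.concat([top_n, bottom_n], ignore_index=True)
print(top_bottom[["Application", "Composite Score", "App Hit Category"]])
```

`nsmallest` lists the lowest score first, whereas `tail(100)` keeps the descending order; sort `bottom_n` if the display order matters.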


### <p style="padding:10px;background-color:#90EE90;margin:0;color:white;font-family:newtimeroman;font-size:150%;text-align:center;border-radius: 5px 20px;overflow:hidden;font-weight:500">Top 100 Apps</p>

In [75]:
#top_bottom_100_df.to_excel('top_bottom_100_records.xlsx', index=False)

In [85]:
df_styled = first_100.style.applymap(lambda x: 'background-color : lightgreen', subset=['Application'])
df_styled

Unnamed: 0,Application,Ratings,Reviews,Category,Purchase_Price,Content Rating,Android_Supported,File_Size_MB,Downloads_Cleaned,Months From Last Update,Normalized Score,Composite Score,New_Genre,File_Size_MB_Bins,File_Size_MB_Bins_Categorized,App Hit Category
2544,Facebook,4.1,78158306,Free,0.0,Teen,Varies with device,-1.0,1000000000,0,2.82,0.937,Social,-1.0,Varies with device,Top 100
336,WhatsApp Messenger,4.4,69119316,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.76435,0.923305,Communication,-1.0,Varies with device,Top 100
2604,Instagram,4.5,66577446,Free,0.0,Teen,Varies with device,-1.0,1000000000,1,2.751828,0.920548,Social,-1.0,Varies with device,Top 100
382,Messenger – Text and Video Chat for Free,4.0,56646578,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.524767,0.84743,Communication,-1.0,Varies with device,Top 100
1872,Subway Surfers,4.5,27725352,Free,0.0,Everyone 10+,4.1 and up,76.0,1000000000,1,2.254733,0.77142,Arcade,76.0,70+MB,Top 100
3665,YouTube,4.3,25655305,Free,0.0,Teen,Varies with device,-1.0,1000000000,0,2.188248,0.749474,Video Players & Editors,-1.0,Varies with device,Top 100
2884,Google Photos,4.5,10859051,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,2.038937,0.706681,Photography,-1.0,Varies with device,Top 100
4005,Clean Master- Space Cleaner & Antivirus,4.7,42916526,Free,0.0,Everyone,Varies with device,-1.0,500000000,0,1.989097,0.668729,Tools,-1.0,Varies with device,Top 100
411,Google Chrome: Fast & Secure,4.3,9643041,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,1.983378,0.688013,Communication,-1.0,Varies with device,Top 100
3234,Google,4.4,8033493,Free,0.0,Everyone,Varies with device,-1.0,1000000000,0,1.982785,0.688835,Tools,-1.0,Varies with device,Top 100



### <p style="padding:10px;background-color:#ffcccb;margin:0;color:white;font-family:newtimeroman;font-size:150%;text-align:center;border-radius: 5px 20px;overflow:hidden;font-weight:500">Bottom 100 Apps</p>

In [86]:
df_styled = last_100.style.applymap(lambda x: 'background-color : #ffcccb', subset=['Application'])
df_styled


Unnamed: 0,Application,Ratings,Reviews,Category,Purchase_Price,Content Rating,Android_Supported,File_Size_MB,Downloads_Cleaned,Months From Last Update,Normalized Score,Composite Score,New_Genre,File_Size_MB_Bins,File_Size_MB_Bins_Categorized,App Hit Category
10043,Remote EX for NISSAN,2.3,223,Paid,1.49,Everyone,3.0 and up,1.0,5000,49,0.460008,0.161003,Communication,1.0,0-10MB,Bottom 100
7978,CV-RECORD Pro,2.3,42,Paid,0.99,Everyone,4.1 and up,6.0,1000,19,0.460002,0.161001,Communication,6.0,0-10MB,Bottom 100
9283,EF Go Ahead Tour Companion,2.3,23,Free,0.0,Everyone,4.4 and up,43.0,1000,2,0.460001,0.161,Travel & Local,43.0,30-70MB,Bottom 100
2403,5 Minute Veterinary Consult: Canine & Feline,2.3,17,Free,0.0,Everyone,4.2 and up,6.8,1000,5,0.460001,0.161,Medical,6.8,0-10MB,Bottom 100
7789,CR Family,2.3,10,Free,0.0,Everyone,4.1 and up,5.2,1000,18,0.460001,0.161,Education,5.2,0-10MB,Bottom 100
8478,DK Eyewitness Audio Walks,2.3,9,Free,0.0,Everyone,4.2 and up,70.0,1000,4,0.460001,0.161,Travel & Local,70.0,70+MB,Bottom 100
6640,Bono’s Pit Bar-B-Q,2.3,3,Free,0.0,Everyone 10+,2.3.3 and up,36.0,1000,7,0.460001,0.161,Entertainment,36.0,30-70MB,Bottom 100
6961,Benefit Extras Mobile App,2.3,3,Free,0.0,Everyone,5.0 and up,2.4,1000,1,0.460001,0.161,Health & Fitness,2.4,0-10MB,Bottom 100
5102,Ag PhD Soils,2.3,3,Free,0.0,Everyone,4.4 and up,16.0,1000,1,0.460001,0.161,Productivity,16.0,10-30MB,Bottom 100
9711,EP Lab Digest,2.3,11,Free,0.0,Everyone,4.1 and up,12.0,500,1,0.460001,0.161,Medical,12.0,10-30MB,Bottom 100




### <p style="padding:10px;background-color:#0f5c59;margin:0;color:white;font-family:newtimeroman;font-size:150%;text-align:center;border-radius: 5px 20px;overflow:hidden;font-weight:500">Hypothesis Testing</p>

In [76]:
hyp_test_df=top_bottom_100_df.copy(deep=True)

# T Test

In [77]:
from scipy import stats

# Split the dataframe into two groups based on the 'App Hit Category'
group1 = hyp_test_df[hyp_test_df['App Hit Category'] == 'Top 100']['Months From Last Update']
group2 = hyp_test_df[hyp_test_df['App Hit Category'] == 'Bottom 100']['Months From Last Update']

# Perform the two-sample t-test
t_stat, p_value = stats.ttest_ind(group1, group2)

# Print the p-value
print(f'P-value: {p_value}')

P-value: 3.930089771162395e-13


- **The very small p-value (about 3.9e-13) indicates a statistically significant difference in mean 'Months From Last Update' between the Top 100 and Bottom 100 apps, suggesting that time since the last update is associated with an app's market success (an association, not proof of a causal effect).**
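`stats.ttest_ind` assumes equal variances by default (Student's t-test). The rows displayed above suggest very different spreads (Top 100 rows show 0 or 1 month, Bottom 100 rows range from 1 to 49), so Welch's version, `equal_var=False`, is a common and more robust choice. The sketch uses only the ten rows displayed for each group, purely as an illustration.

```python
from scipy import stats

# 'Months From Last Update' for the ten rows displayed per group (illustration only).
top = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
bottom = [49, 19, 2, 5, 18, 4, 7, 1, 1, 1]

student = stats.ttest_ind(top, bottom)                 # pooled variance (default)
welch = stats.ttest_ind(top, bottom, equal_var=False)  # Welch: no equal-variance assumption

print(f"Student p = {student.pvalue:.4g}, Welch p = {welch.pvalue:.4g}")
```

With equal group sizes (10 here, 100 in the notebook) the t statistic is the same for both versions; Welch only lowers the degrees of freedom, which makes its p-value equal to or larger than Student's.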








In [78]:
from scipy import stats

# Adjusting the two-sample t-test for a one-tailed test

# Split the dataframe into two groups based on the 'App Hit Category'
group1 = hyp_test_df[hyp_test_df['App Hit Category'] == 'Top 100']['Months From Last Update']
group2 = hyp_test_df[hyp_test_df['App Hit Category'] == 'Bottom 100']['Months From Last Update']

# Perform the two-sample t-test
t_stat, p_value = stats.ttest_ind(group1, group2)

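# ttest_ind is two-tailed by default. Halving the p-value gives a one-tailed
# p-value only when t_stat has the sign predicted by the alternative hypothesis.
# NOTE: this condition tests "Top 100 mean > Bottom 100 mean"; the hypothesis
# stated below (Top 100 apps are updated more recently) predicts t_stat < 0 instead.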
if t_stat > 0:
    p_value /= 2

# Print the p-value
print(f'P-value: {p_value}')


P-value: 3.930089771162395e-13


- **The printed value is identical to the two-tailed p-value from the previous cell, which means `t_stat` was negative: the Top 100 apps have fewer months since their last update on average, so the `if t_stat > 0` branch never ran. For the hypothesis that Top 100 apps are updated more recently than Bottom 100 apps, the correct one-tailed p-value is half the two-tailed value (about 1.97e-13). This is still far below any conventional significance level, so the difference is statistically significant and in the hypothesized direction.**
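Since SciPy 1.6, `stats.ttest_ind` accepts an `alternative` argument, so the one-tailed test can be requested directly instead of adjusting the two-tailed p-value by hand. For the hypothesis that Top 100 apps are updated more recently (a smaller mean 'Months From Last Update'), the matching choice is `alternative='less'` with the Top 100 group passed first. The arrays below are hypothetical; the manual line shows the sign-aware conversion, including the case where t has the "wrong" sign.

```python
import numpy as np
from scipy import stats

# Hypothetical samples of months since last update (not from the dataset)
top = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 1.0, 2.0])
bottom = np.array([6.0, 9.0, 12.0, 7.0, 15.0, 10.0])

# H1: Top 100 apps were updated more recently, i.e. mean(top) < mean(bottom)
t_two, p_two = stats.ttest_ind(top, bottom)                       # two-sided
t_one, p_one = stats.ttest_ind(top, bottom, alternative='less')   # one-sided

# Manual equivalent: halve only if t has the hypothesised (negative) sign;
# otherwise the one-sided p-value is 1 - p_two / 2, not p_two.
p_manual = p_two / 2 if t_two < 0 else 1 - p_two / 2

print(f"two-sided p={p_two:.3g}  one-sided p={p_one:.3g}  manual p={p_manual:.3g}")
```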








In [79]:
from scipy import stats

# Split the dataframe into two groups based on the 'App Hit Category'
group1 = hyp_test_df[hyp_test_df['App Hit Category'] == 'Top 100']['Purchase_Price']
group2 = hyp_test_df[hyp_test_df['App Hit Category'] == 'Bottom 100']['Purchase_Price']

# Perform the two-sample t-test
t_stat, p_value = stats.ttest_ind(group1, group2)

# Print the p-value
print(f'P-value: {p_value}')

P-value: 0.020831681270575355


- **This cell runs a two-sample t-test on 'Purchase_Price' for the Top 100 and Bottom 100 apps. The p-value of about 0.021 is below the conventional 0.05 threshold (though not below 0.01), so mean purchase price differs significantly between the two groups at the 5% level. Because app prices are typically concentrated at zero (free apps), the distribution is strongly skewed, which strains the t-test's normality assumption; this result is worth confirming with a rank-based test before treating price as a factor in an app's Top or Bottom 100 placement.**
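As a cross-check on the price result, the sketch below runs Welch's t-test and the Mann-Whitney U test (`scipy.stats.mannwhitneyu`), which compares the two distributions by ranks without assuming normality. This is a swapped-in technique, not the notebook's own analysis, and `top_prices` and `bottom_prices` are hypothetical arrays chosen to mimic a mostly-free price distribution.

```python
import numpy as np
from scipy import stats

# Hypothetical purchase prices; most apps are free (0.0).
# Illustrative values only, not taken from the dataset.
top_prices = np.array([0.0] * 18 + [0.99, 2.99])
bottom_prices = np.array([0.0] * 19 + [4.99])

# Welch's t-test: compares means without assuming equal variances
t_stat, p_welch = stats.ttest_ind(top_prices, bottom_prices, equal_var=False)

# Mann-Whitney U: rank-based, no normality assumption;
# tied values (the many zeros) receive mid-ranks.
u_stat, p_mwu = stats.mannwhitneyu(top_prices, bottom_prices, alternative='two-sided')

print(f"Welch p={p_welch:.3f}  Mann-Whitney U={u_stat:.1f}  p={p_mwu:.3f}")
```

If the two tests disagree sharply on the real data, that is a sign the t-test result is being driven by a few priced apps rather than by a shift in the whole distribution.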

In [80]:

# Split the dataframe into two groups based on the 'App Hit Category'
group1 = hyp_test_df[hyp_test_df['App Hit Category'] == 'Top 100']['Composite Score']
group2 = hyp_test_df[hyp_test_df['App Hit Category'] == 'Bottom 100']['Composite Score']

# Perform the two-sample t-test
t_stat, p_value = stats.ttest_ind(group1, group2)

# Print the p-value
print(f'P-value: {p_value}')

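P-value: 3.39329425072269e-62

- **This result is expected by construction: the Top 100 and Bottom 100 groups were selected by ranking apps on 'Composite Score', so a large difference in that score is guaranteed and the test provides no independent evidence.**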


In [81]:
hyp_test_df.columns

Index(['Application', 'Ratings', 'Reviews', 'Category', 'Purchase_Price',
       'Content Rating', 'Android_Supported', 'File_Size_MB',
       'Downloads_Cleaned', 'Months From Last Update', 'Normalized Score',
       'Composite Score', 'New_Genre', 'File_Size_MB_Bins',
       'File_Size_MB_Bins_Categorized', 'App Hit Category'],
      dtype='object')

P-values from the same Top 100 vs. Bottom 100 comparison, by metric:

| Metric | P-value |
|---|---|
| Ratings | 3.198246529020563e-123 |
| Downloads | 4.063420948205263e-21 |
| Reviews | 2.761396006708284e-13 |
| Composite Score | 3.39329425072269e-62 |
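The near-zero p-values above partly follow from how the groups were formed: selecting the highest- and lowest-ranked apps guarantees a large gap in the ranking variable. The simulation below illustrates this with hypothetical scores drawn from a single uniform distribution, so there is no true difference between "good" and "bad" apps, yet the Top 100 vs. Bottom 100 t-test still returns a tiny p-value. Ratings, reviews and downloads are inputs to the composite score, so the same selection effect inflates their results to some degree.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical composite scores drawn from ONE distribution:
# there is no real difference between apps in this simulated data.
scores = rng.uniform(0, 1, size=5000)

# Form groups the same way the analysis does: rank, then take the extremes.
ranked = np.sort(scores)[::-1]
top_100, bottom_100 = ranked[:100], ranked[-100:]

t_stat, p_value = stats.ttest_ind(top_100, bottom_100)
print(f"p-value on pure noise: {p_value:.3g}")
```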


# Chi-Square Test

In [82]:
from scipy.stats import chi2_contingency

# Create a contingency table
contingency_table = pd.crosstab(hyp_test_df['App Hit Category'], hyp_test_df['New_Genre'])

# Run Chi-Square test
chi2, p, dof, expected = chi2_contingency(contingency_table)

p # this is the p-value of the test

3.279294677712606e-08

**The p-value of about 3.3e-08 indicates a statistically significant association between App Hit Category and genre ('New_Genre'): the mix of genres differs between the Top 100 and Bottom 100 apps. This is an association rather than a causal effect, so genre may be related to whether an app ranks in the Top 100 or Bottom 100, but the test does not show that genre drives the ranking.**
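`chi2_contingency` computes, for each cell, the count expected if the two variables were independent (row total * column total / grand total), then sums (observed - expected)^2 / expected and compares the result with a chi-square distribution on (r - 1)(c - 1) degrees of freedom. The sketch below reproduces those steps on a hypothetical 2 x 3 table (the counts are made up, not taken from the 'New_Genre' crosstab) and checks them against SciPy.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist, chi2_contingency

# Hypothetical 2 x 3 table: rows = Top 100 / Bottom 100,
# columns = three made-up genre groups. Counts are illustrative only.
observed = np.array([[40, 35, 25],
                     [20, 30, 50]])

# Expected count under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
n = observed.sum()
expected_manual = row_totals * col_totals / n

# Pearson statistic and p-value on (r - 1) * (c - 1) degrees of freedom
stat_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2_dist.sf(stat_manual, dof_manual)

# SciPy's result for comparison
stat, p, dof, expected = chi2_contingency(observed)
print(f"manual: chi2={stat_manual:.3f} p={p_manual:.3g}")
print(f"scipy:  chi2={stat:.3f} p={p:.3g} dof={dof}")
```

For 2 x 2 tables SciPy applies Yates' continuity correction by default (`correction=True`), so a manual calculation would only match there if the correction were applied as well; with more than one degree of freedom, as here, no correction is used.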

In [83]:
# Create a contingency table
contingency_table = pd.crosstab(hyp_test_df['App Hit Category'], hyp_test_df['File_Size_MB_Bins_Categorized'])

# Run Chi-Square test
chi2, p, dof, expected = chi2_contingency(contingency_table)

p # this is the p-value of the test

6.668794117154246e-25

### File Size Bins vs. App Hit Category:
- The cell above (In [83]) cross-tabulates 'App Hit Category' against 'File_Size_MB_Bins_Categorized' and runs a chi-square test of independence. The very low p-value (about 6.7e-25) indicates a statistically significant association between an app's file-size bin and whether it falls in the Top 100 or Bottom 100, suggesting that app size is related to its performance ranking in this dataset.
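A chi-square p-value says whether an association is likely to exist, not how strong it is. Cramér's V, sqrt(chi2 / (n * (min(r, c) - 1))), is a common effect-size measure that ranges from 0 (no association) to 1 (perfect association). The helper `cramers_v` below is a hypothetical sketch, applied to a made-up 2 x 4 table whose columns mimic size bins such as `0-10MB`, `10-30MB`, `30-70MB` and `70+MB`; the counts are not the notebook's. Recent SciPy releases also provide `scipy.stats.contingency.association(table, method='cramer')` for the same quantity.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramér's V effect size for an r x c contingency table (0 to 1)."""
    table = np.asarray(table)
    # correction=False so the statistic matches the usual definition of V
    stat, _, _, _ = chi2_contingency(table, correction=False)
    n = table.sum()
    k = min(table.shape) - 1
    return np.sqrt(stat / (n * k))

# Hypothetical 2 x 4 table: Top 100 / Bottom 100 by four file-size bins
# (illustrative counts only).
table = np.array([[10, 20, 30, 40],
                  [45, 30, 15, 10]])
print(f"Cramér's V = {cramers_v(table):.3f}")
```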

In [84]:
# Create a contingency table
contingency_table = pd.crosstab(hyp_test_df['App Hit Category'], hyp_test_df['Content Rating'])

# Run Chi-Square test
chi2, p, dof, expected = chi2_contingency(contingency_table)

p # this is the p-value of the test

9.819641417678042e-06

### Content Rating vs. App Hit Category:
- The cell above (In [84]) runs the same chi-square test on 'App Hit Category' against 'Content Rating'. The low p-value (about 9.8e-06) indicates a statistically significant association between content rating and performance ranking, suggesting that content rating is related to an app's likelihood of being a top or bottom performer in this dataset.
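The chi-square test relies on a large-sample approximation. A common rule of thumb (Cochran's) asks that no more than about 20% of cells have an expected count below 5 and that none fall below 1, and rarely used content ratings may break this with only 200 apps. The sketch below uses a hypothetical crosstab (counts are illustrative, not the notebook's) to measure the share of sparse cells from the `expected` array that `chi2_contingency` returns, then merges rare ratings into an 'Other' column before re-running the test. Merging is one option; Fisher's exact test for small 2 x 2 tables is another.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical crosstab of App Hit Category by Content Rating
# (illustrative counts only; rare ratings produce sparse cells).
table = pd.DataFrame(
    {"Everyone": [70, 80], "Everyone 10+": [10, 6], "Teen": [15, 8],
     "Mature 17+": [4, 5], "Adults only 18+": [1, 0], "Unrated": [0, 1]},
    index=["Top 100", "Bottom 100"],
)

def sparse_share(tab):
    """Fraction of cells whose expected count under independence is below 5."""
    _, _, _, expected = chi2_contingency(tab)
    return (expected < 5).mean()

print(f"before: {sparse_share(table):.0%} of cells have expected count < 5")

# One common remedy: merge rare columns into an 'Other' category
rare = ["Mature 17+", "Adults only 18+", "Unrated"]
collapsed = table.drop(columns=rare)
collapsed["Other"] = table[rare].sum(axis=1)
print(f"after:  {sparse_share(collapsed):.0%} of cells have expected count < 5")

chi2_stat, p, dof, _ = chi2_contingency(collapsed)
print(f"collapsed table: chi2={chi2_stat:.2f}, dof={dof}, p={p:.3g}")
```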