# Data description
| Column Name        | Description                                                             |
|--------------------|-------------------------------------------------------------------------|
| Application name   | Name of the application                                                 |
| Category           | Category the app belongs to                                             |
| Rating             | Overall user rating of the app (at the time of scraping)                |
| Reviews            | Number of user reviews for the app (at the time of scraping)            |
| Size               | Size of the app (at the time of scraping)                               |
| Installs           | Number of user downloads/installs for the app (at the time of scraping) |
| Type               | Whether the app is paid or free                                         |
| Price              | Price of the app (at the time of scraping)                              |
| Content Rating     | Age group the app is targeted at (e.g., Children / Mature 21+ / Adult)  |
| Genres             | An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to multiple genres. |


# Data Cleaning

In [1]:
import pandas as pd
import numpy as np

In [2]:
data = pd.read_csv("googleplaystore.csv")
df = pd.DataFrame(data)

In [3]:
df.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10841 non-null  object 
 1   Category        10841 non-null  object 
 2   Rating          9367 non-null   float64
 3   Reviews         10841 non-null  object 
 4   Size            10841 non-null  object 
 5   Installs        10841 non-null  object 
 6   Type            10840 non-null  object 
 7   Price           10841 non-null  object 
 8   Content Rating  10840 non-null  object 
 9   Genres          10841 non-null  object 
 10  Last Updated    10841 non-null  object 
 11  Current Ver     10833 non-null  object 
 12  Android Ver     10838 non-null  object 
dtypes: float64(1), object(12)
memory usage: 1.1+ MB


In [5]:
df.describe()

Unnamed: 0,Rating
count,9367.0
mean,4.193338
std,0.537431
min,1.0
25%,4.0
50%,4.3
75%,4.5
max,19.0


In [6]:
# Remove duplicate rows
df.drop_duplicates(inplace=True)
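
`drop_duplicates()` with no arguments only removes rows that match in every column. The sketch below, on a small made-up frame (the app names and review counts are hypothetical), contrasts that with de-duplicating on the `App` column alone. Whether the second variant suits this dataset is an assumption that would need checking against the data.

```python
import pandas as pd

# Hypothetical toy data: the same app listed twice with different review counts
apps = pd.DataFrame({
    'App': ['Alpha', 'Alpha', 'Beta'],
    'Reviews': [10, 12, 5],
})

# Exact-row de-duplication keeps both 'Alpha' rows because Reviews differs
exact = apps.drop_duplicates()

# De-duplicating on the app name keeps only the first 'Alpha' row
by_name = apps.drop_duplicates(subset='App', keep='first')
```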

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10358 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10358 non-null  object 
 1   Category        10358 non-null  object 
 2   Rating          8893 non-null   float64
 3   Reviews         10358 non-null  object 
 4   Size            10358 non-null  object 
 5   Installs        10358 non-null  object 
 6   Type            10357 non-null  object 
 7   Price           10358 non-null  object 
 8   Content Rating  10357 non-null  object 
 9   Genres          10358 non-null  object 
 10  Last Updated    10358 non-null  object 
 11  Current Ver     10350 non-null  object 
 12  Android Ver     10355 non-null  object 
dtypes: float64(1), object(12)
memory usage: 1.1+ MB


In [8]:
df.describe()

Unnamed: 0,Rating
count,8893.0
mean,4.189542
std,0.545452
min,1.0
25%,4.0
50%,4.3
75%,4.5
max,19.0


In [9]:
# Fill missing Rating values with the median, which is robust to the outlier (max = 19)
df['Rating'] = df['Rating'].fillna(df['Rating'].median())

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10358 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10358 non-null  object 
 1   Category        10358 non-null  object 
 2   Rating          10358 non-null  float64
 3   Reviews         10358 non-null  object 
 4   Size            10358 non-null  object 
 5   Installs        10358 non-null  object 
 6   Type            10357 non-null  object 
 7   Price           10358 non-null  object 
 8   Content Rating  10357 non-null  object 
 9   Genres          10358 non-null  object 
 10  Last Updated    10358 non-null  object 
 11  Current Ver     10350 non-null  object 
 12  Android Ver     10355 non-null  object 
dtypes: float64(1), object(12)
memory usage: 1.1+ MB


In [11]:
df[df['Rating']>=5]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
329,Hojiboy Tojiboyev Life Hacks,COMICS,5.0,15,37M,"1,000+",Free,0,Everyone,Comics,"June 26, 2018",2.0,4.0.3 and up
612,American Girls Mobile Numbers,DATING,5.0,5,4.4M,"1,000+",Free,0,Mature 17+,Dating,"July 17, 2018",3.0,4.0.3 and up
615,Awake Dating,DATING,5.0,2,70M,100+,Free,0,Mature 17+,Dating,"July 24, 2018",2.2.9,4.4 and up
633,Spine- The dating app,DATING,5.0,5,9.3M,500+,Free,0,Teen,Dating,"July 14, 2018",4.0,4.0.3 and up
636,Girls Live Talk - Free Text and Video Chat,DATING,5.0,6,5.0M,100+,Free,0,Mature 17+,Dating,"August 1, 2018",8.2,4.0.3 and up
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10721,Mad Dash Fo' Cash,GAME,5.0,14,16M,100+,Free,0,Everyone,Arcade,"June 19, 2017",2.5a,4.1 and up
10742,GKPB FP Online Church,LIFESTYLE,5.0,32,7.9M,"1,000+",Free,0,Everyone,Lifestyle,"December 31, 2017",0.7.1,4.4 and up
10776,Monster Ride Pro,GAME,5.0,1,24M,10+,Free,0,Everyone,Racing,"March 5, 2018",2.0,2.3 and up
10820,Fr. Daoud Lamei,FAMILY,5.0,22,8.6M,"1,000+",Free,0,Teen,Education,"June 27, 2018",3.8.0,4.1 and up


In [12]:
# Handle outliers: replace ratings above 5 with the median
df.loc[df['Rating'] >5, 'Rating'] = df['Rating'].median()

In [13]:
df['Rating'].max()

5.0
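
The two Rating steps above (fill missing values, then replace out-of-range values) can also be done in one pass. This is a minimal sketch on made-up numbers, assuming valid ratings lie between 1 and 5; it computes the median from the valid values only, which is a slight variation on the cells above.

```python
import numpy as np
import pandas as pd

# Toy ratings with one missing value and one impossible value (19.0)
ratings = pd.Series([4.1, np.nan, 3.9, 19.0, 4.5])

# Median of the ratings that fall inside the assumed valid range [1, 5]
in_range = ratings.between(1, 5)  # NaN compares as False
valid_median = ratings[in_range].median()

# Keep valid ratings; replace missing or out-of-range ones with that median
cleaned = ratings.where(in_range, valid_median)
```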

In [14]:
# The Rating column is now clean

# Fix Size

In [15]:
df.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [16]:
# Replace 'Varies with device' with NaN to handle missing values
df['Size'] = df['Size'].replace("Varies with device", np.nan)

In [17]:
df['Size'].info()

<class 'pandas.core.series.Series'>
Int64Index: 10358 entries, 0 to 10840
Series name: Size
Non-Null Count  Dtype 
--------------  ----- 
8832 non-null   object
dtypes: object(1)
memory usage: 161.8+ KB


In [18]:
# Extract numeric values from 'Size' (e.g., "19M" → "19")
df['Size_Value'] = df['Size'].str.extract(r'([\d.]+)').astype(float)

In [19]:
# Extract unit ('M' or 'K') from 'Size'
df['Size_Unit'] = df['Size'].str.extract(r'([MK])')

In [20]:
# Convert MB (M) to KB (1M = 1024K)
df.loc[df['Size_Unit'] == 'M', 'Size_Value'] *= 1024 

In [21]:
df['Size_Unit'].head()

0    M
1    M
2    M
3    M
4    M
Name: Size_Unit, dtype: object

In [22]:
df['Size_Value'].head()

0    19456.0
1    14336.0
2     8908.8
3    25600.0
4     2867.2
Name: Size_Value, dtype: float64
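
Note that the pattern `[MK]` is case-sensitive, so a size written with a lowercase `k` would get no unit. The sketch below is one possible way to handle both cases in a single helper; `size_to_kb` is a hypothetical name, and the sample strings are made up for illustration.

```python
import numpy as np
import pandas as pd

def size_to_kb(sizes: pd.Series) -> pd.Series:
    """Convert strings such as '19M' or '201k' to kilobytes; anything else becomes NaN."""
    # Named groups give a DataFrame with 'value' and 'unit' columns
    parts = sizes.str.extract(r'^(?P<value>[\d.]+)(?P<unit>[MmKk])$')
    value = pd.to_numeric(parts['value'], errors='coerce')
    # 1M = 1024K, as in the cell above; K stays as is
    factor = parts['unit'].str.upper().map({'M': 1024.0, 'K': 1.0})
    return value * factor

sample = pd.Series(['19M', '8.7M', '201k', 'Varies with device'])
sizes_kb = size_to_kb(sample)
```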

In [23]:
# Overwrite 'Size' with its numeric part only (no unit conversion; the KB values are in 'Size_Value')
df['Size'] = df['Size'].str.extract(r'([\d.]+)').astype(float)

In [24]:
df['Size'].info()

<class 'pandas.core.series.Series'>
Int64Index: 10358 entries, 0 to 10840
Series name: Size
Non-Null Count  Dtype  
--------------  -----  
8832 non-null   float64
dtypes: float64(1)
memory usage: 161.8 KB


In [25]:
df['Size_Value'].head()

0    19456.0
1    14336.0
2     8908.8
3    25600.0
4     2867.2
Name: Size_Value, dtype: float64

In [26]:
df['Size'].info()

<class 'pandas.core.series.Series'>
Int64Index: 10358 entries, 0 to 10840
Series name: Size
Non-Null Count  Dtype  
--------------  -----  
8832 non-null   float64
dtypes: float64(1)
memory usage: 161.8 KB


In [27]:
# Fill missing values with the overall median size (in KB)
df['Size_Value'] = df['Size_Value'].fillna(df['Size_Value'].median())

# Print the first 10 rows to verify the changes
df.head(10)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Size_Value,Size_Unit
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19.0,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up,19456.0,M
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14.0,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,14336.0,M
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up,8908.8,M
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25.0,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up,25600.0,M
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up,2867.2,M
5,Paper flowers instructions,ART_AND_DESIGN,4.4,167,5.6,"50,000+",Free,0,Everyone,Art & Design,"March 26, 2017",1.0,2.3 and up,5734.4,M
6,Smoke Effect Photo Maker - Smoke Editor,ART_AND_DESIGN,3.8,178,19.0,"50,000+",Free,0,Everyone,Art & Design,"April 26, 2018",1.1,4.0.3 and up,19456.0,M
7,Infinite Painter,ART_AND_DESIGN,4.1,36815,29.0,"1,000,000+",Free,0,Everyone,Art & Design,"June 14, 2018",6.1.61.1,4.2 and up,29696.0,M
8,Garden Coloring Book,ART_AND_DESIGN,4.4,13791,33.0,"1,000,000+",Free,0,Everyone,Art & Design,"September 20, 2017",2.9.2,3.0 and up,33792.0,M
9,Kids Paint Free - Drawing Fun,ART_AND_DESIGN,4.7,121,3.1,"10,000+",Free,0,Everyone,Art & Design;Creativity,"July 3, 2018",2.8,4.0.3 and up,3174.4,M


In [28]:
df['Size'] = df['Size'].fillna(df['Size'].median())
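
The cells above fill missing sizes with the overall median. If a per-category median is preferred instead, `groupby(...).transform('median')` is one way to do it. This is a sketch on toy data, not the approach used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy data: one missing size in each category
toy = pd.DataFrame({
    'Category': ['GAME', 'GAME', 'GAME', 'TOOLS', 'TOOLS'],
    'Size_Value': [100.0, 300.0, np.nan, 10.0, np.nan],
})

# transform('median') returns one value per row, aligned with the original index
category_median = toy.groupby('Category')['Size_Value'].transform('median')

# Fill each missing size with the median of its own category
toy['Size_Value'] = toy['Size_Value'].fillna(category_median)
```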

# Fix Price

In [29]:
df['Price']= df['Price'].str.extract(r'([\d.]+)').astype(float)

In [30]:
df['Price'] = df['Price'].fillna(df['Price'].mean())

In [31]:
df['Price'].isnull().sum()

0

In [32]:
df['Price'].info()

<class 'pandas.core.series.Series'>
Int64Index: 10358 entries, 0 to 10840
Series name: Price
Non-Null Count  Dtype  
--------------  -----  
10358 non-null  float64
dtypes: float64(1)
memory usage: 161.8 KB
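
An alternative to the regex extraction is to strip the currency symbol and let `pd.to_numeric` coerce anything non-numeric to NaN. The sketch assumes paid prices are written like `$4.99`; the sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical raw price strings, including one non-numeric entry
raw_price = pd.Series(['0', '$4.99', 'Everyone'])

# Remove a literal '$' and convert; unparseable entries become NaN
price = pd.to_numeric(raw_price.str.replace('$', '', regex=False), errors='coerce')
```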


# Fix Category

In [33]:
# Fill missing values with the most common category (mode)
df['Category'] = df['Category'].fillna(df['Category'].mode()[0])
# Convert all category names to lowercase to ensure consistency
df['Category'] = df['Category'].str.strip().str.lower()

In [34]:
df['Category'].head()

0    art_and_design
1    art_and_design
2    art_and_design
3    art_and_design
4    art_and_design
Name: Category, dtype: object

In [35]:
df.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Size_Value,Size_Unit
0,Photo Editor & Candy Camera & Grid & ScrapBook,art_and_design,4.1,159,19.0,"10,000+",Free,0.0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up,19456.0,M
1,Coloring book moana,art_and_design,3.9,967,14.0,"500,000+",Free,0.0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,14336.0,M
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",art_and_design,4.7,87510,8.7,"5,000,000+",Free,0.0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up,8908.8,M
3,Sketch - Draw & Paint,art_and_design,4.5,215644,25.0,"50,000,000+",Free,0.0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up,25600.0,M
4,Pixel Draw - Number Art Coloring Book,art_and_design,4.3,967,2.8,"100,000+",Free,0.0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up,2867.2,M


# Fix Android Version

In [36]:
df['Android Ver'].info()

<class 'pandas.core.series.Series'>
Int64Index: 10358 entries, 0 to 10840
Series name: Android Ver
Non-Null Count  Dtype 
--------------  ----- 
10355 non-null  object
dtypes: object(1)
memory usage: 161.8+ KB


In [37]:
df['Android Ver'] = df['Android Ver'].apply(lambda x: x.split('and')[0] if isinstance(x, str) else x)

In [38]:
df['Android Ver'].head()

0    4.0.3 
1    4.0.3 
2    4.0.3 
3      4.2 
4      4.4 
Name: Android Ver, dtype: object

In [39]:
df['Android Ver'].info()

<class 'pandas.core.series.Series'>
Int64Index: 10358 entries, 0 to 10840
Series name: Android Ver
Non-Null Count  Dtype 
--------------  ----- 
10355 non-null  object
dtypes: object(1)
memory usage: 161.8+ KB


In [40]:
df[['Android Ver']].head()

Unnamed: 0,Android Ver
0,4.0.3
1,4.0.3
2,4.0.3
3,4.2
4,4.4
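
Splitting on `'and'` leaves a trailing space (visible in the `head()` output above) and leaves strings without `'and'` untouched. A regex that captures only the leading version number is one alternative; the sample strings below are illustrative, and the range format `'4.1 - 7.1.1'` is an assumed example.

```python
import pandas as pd

android = pd.Series(['4.0.3 and up', 'Varies with device', '4.1 - 7.1.1'])

# Capture the leading dotted version number; non-matching rows become NaN
min_version = android.str.extract(r'^(\d+(?:\.\d+)*)')[0]
```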


# Handle Missing Values

In [41]:
#App
df['App'].isnull().sum()

0

In [42]:
df['Category'].isnull().sum()

0

In [43]:
df.isnull().sum()

App                  0
Category             0
Rating               0
Reviews              0
Size                 0
Installs             0
Type                 1
Price                0
Content Rating       1
Genres               0
Last Updated         0
Current Ver          8
Android Ver          3
Size_Value           0
Size_Unit         1842
dtype: int64

In [44]:
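# Drop every row that still has a missing value; this also removes the
# 1842 rows whose Size_Unit is NaN (see the counts above)
df.dropna(inplace=True)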

In [45]:
df.isnull().sum()

App               0
Category          0
Rating            0
Reviews           0
Size              0
Installs          0
Type              0
Price             0
Content Rating    0
Genres            0
Last Updated      0
Current Ver       0
Android Ver       0
Size_Value        0
Size_Unit         0
dtype: int64

In [46]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8510 entries, 0 to 10840
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             8510 non-null   object 
 1   Category        8510 non-null   object 
 2   Rating          8510 non-null   float64
 3   Reviews         8510 non-null   object 
 4   Size            8510 non-null   float64
 5   Installs        8510 non-null   object 
 6   Type            8510 non-null   object 
 7   Price           8510 non-null   float64
 8   Content Rating  8510 non-null   object 
 9   Genres          8510 non-null   object 
 10  Last Updated    8510 non-null   object 
 11  Current Ver     8510 non-null   object 
 12  Android Ver     8510 non-null   object 
 13  Size_Value      8510 non-null   float64
 14  Size_Unit       8510 non-null   object 
dtypes: float64(4), object(11)
memory usage: 1.0+ MB


# Fix Remaining Columns

In [47]:
#convert from object to int
df['Reviews'] = df['Reviews'].astype(int)

In [48]:
df['Reviews'].isnull().sum()

0

In [49]:
#convert from object to int
df['Installs'] = df['Installs'].str.replace('[+,]', '', regex=True).astype(int)

In [50]:
df['Installs'].isnull().sum()

0

In [51]:
Q1 = df['Reviews'].quantile(0.25)
Q3 = df['Reviews'].quantile(0.75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# replace outliers by median
median_reviews = df['Reviews'].median()
df['Reviews'] = df['Reviews'].mask((df['Reviews'] < lower_bound) | (df['Reviews'] > upper_bound), median_reviews)
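
Since the same IQR rule is applied to more than one column, it can be wrapped in a small helper. `iqr_bounds` is a hypothetical name and the numbers are made up; the logic mirrors the cell above.

```python
import pandas as pd

def iqr_bounds(values: pd.Series, k: float = 1.5):
    """Return (lower, upper) bounds of the k * IQR rule."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

reviews = pd.Series([10, 12, 14, 16, 18, 1000])
low, high = iqr_bounds(reviews)

# Flag values outside the bounds and replace them with the median
is_outlier = (reviews < low) | (reviews > high)
cleaned = reviews.mask(is_outlier, reviews.median())
```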


In [52]:
Q1 = df['Rating'].quantile(0.25)
Q3 = df['Rating'].quantile(0.75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = df[(df['Rating'] < lower_bound) | (df['Rating'] > upper_bound)]
print(outliers)


                                                     App             Category  \
87                         RST - Sale of cars on the PCT    auto_and_vehicles   
159                                       Cloud of Books  books_and_reference   
176                                     Free Book Reader  books_and_reference   
311    comico Popular Original Cartoon Updated Everyd...               comics   
312                        Daily Manga - Comic & Webtoon               comics   
...                                                  ...                  ...   
10743                                     FP BW LCD View               family   
10757                       Fisher-Price® Smart Connect™                tools   
10766                             FreedomPop Diagnostics                tools   
10819                                          Fanfic-FR  books_and_reference   
10828                            Manga-FR - Anime Vostfr               comics   

       Rating  Reviews  Siz

# Analytical Questions:

In [53]:
df['Price'].describe() 

count    8510.000000
mean        1.144521
std        17.414951
min         0.000000
25%         0.000000
50%         0.000000
75%         0.000000
max       400.000000
Name: Price, dtype: float64

In [54]:
#1.What is the most expensive app on the Play Store?
max_price = df['Price'].max()
print(f"the most expensive app on the Play Store = {max_price}")


the most expensive app on the Play Store = 400.0
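
The cell above reports the highest price but not which app carries it. A minimal sketch of looking up the app name with `idxmax`, on made-up data (the app names are hypothetical):

```python
import pandas as pd

apps = pd.DataFrame({
    'App': ['Alpha', 'Beta', 'Gamma'],
    'Price': [0.0, 4.99, 1.99],
})

# idxmax returns the index label of the highest price; loc fetches that row
most_expensive = apps.loc[apps['Price'].idxmax(), ['App', 'Price']]
```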


In [55]:
#2.Which genre has the highest number of apps?
top_genre = df['Genres'].value_counts().idxmax()  # Get the most popular genre
top_genre_count = df['Genres'].value_counts().max()  # Get the number of apps in this genre
print(f"The most popular genre is '{top_genre}' with {top_genre_count} apps.")

The most popular genre is 'Tools' with 658 apps.
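
`value_counts()` on `Genres` treats a combined entry such as `Art & Design;Pretend Play` as its own genre. If each individual genre should be counted separately, splitting on `;` and exploding is one option. The sketch uses a few values taken from the `head()` output plus one illustrative `Tools` entry.

```python
import pandas as pd

genres = pd.Series([
    'Art & Design',
    'Art & Design;Pretend Play',
    'Tools',
    'Art & Design;Creativity',
])

# Split multi-genre entries into lists, then give each genre its own row
genre_counts = genres.str.split(';').explode().value_counts()
top_single_genre = genre_counts.idxmax()
```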


In [56]:
#3.What is the average size of free vs. paid apps?
avg_size = df.groupby('Type')['Size'].mean()
print(avg_size)

Type
Free    22.134753
Paid    21.229355
Name: Size, dtype: float64


In [57]:
#4.What are the top 5 most expensive apps with a perfect rating (5)?
top_expensive_perfect_apps = df[df['Rating'] == 5.0].sort_values(by='Price', ascending=False).head(5)
print(top_expensive_perfect_apps[['App', 'Price', 'Rating']])

                                            App  Price  Rating
5489                  AP Art History Flashcards  29.99     5.0
7477                 USMLE Step 2 CK Flashcards  19.99     5.0
5246                      Hey AJ! It's Bedtime!   4.99     5.0
7204  TI-84 CE Graphing Calculator Manual TI 84   4.99     5.0
5237                     Hey AJ! It's Saturday!   3.99     5.0


In [58]:
#5.How many apps have received more than 50K reviews?
num_apps_high_reviews = (df['Reviews'] > 50000).sum()
print("Number of apps with more than 50K reviews:", num_apps_high_reviews)

Number of apps with more than 50K reviews: 122


In [59]:
#6.What is the average price of apps, grouped by genre and number of installs?
avg_price_by_genre_installs = df.groupby(['Genres', 'Installs'])['Price'].mean().reset_index()
print(avg_price_by_genre_installs)

     Genres  Installs   Price
0    Action        10  0.0000
1    Action        50  0.9950
2    Action       100  1.3700
3    Action       500  0.0000
4    Action      1000  0.3725
..      ...       ...     ...
868    Word    100000  0.0000
869    Word   1000000  0.0000
870    Word   5000000  0.0000
871    Word  10000000  0.0000
872    Word  50000000  0.0000

[873 rows x 3 columns]


In [60]:
#7.How many apps have a rating higher than 4.7, and what is their average price?
high_rating_apps = df[df['Rating'] > 4.7]  # Filter apps with a rating higher than 4.7

num_high_rating_apps = high_rating_apps.shape[0]  # Count the number of apps
avg_price_high_rating = high_rating_apps['Price'].mean()  # Calculate the average price
print("Number of apps with a rating > 4.7:", num_high_rating_apps)
print("Average price of these apps:", avg_price_high_rating)

Number of apps with a rating > 4.7: 550
Average price of these apps: 0.42949090909090903


In [61]:
#8.Estimated Google revenue (30% share) from apps with 5,000,000+ installs
# Filter apps with at least 5,000,000 installs
high_installs_apps = df[df['Installs'] >= 5000000]

# Calculate total revenue for these apps
total_revenue = (high_installs_apps['Price'] * high_installs_apps['Installs']).sum()

# Calculate Google's share (30%)
google_revenue = total_revenue * 0.30

print("Estimated Google revenue from apps with 5M+ installs: $", google_revenue)

Estimated Google revenue from apps with 5M+ installs: $ 2970000.0
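
The arithmetic behind this estimate can be checked on a tiny made-up frame. The sketch assumes, as the cell above does, that every install is a purchase at the listed price and that `Installs` is an exact count rather than a lower bound.

```python
import pandas as pd

toy = pd.DataFrame({
    'Price': [0.0, 0.99, 2.0],
    'Installs': [10_000_000, 5_000_000, 1_000],
})

# Keep apps with at least 5,000,000 installs (the first two rows)
popular = toy[toy['Installs'] >= 5_000_000]

# Revenue per app is price * installs; the store's share is assumed to be 30%
total_revenue = (popular['Price'] * popular['Installs']).sum()
store_share = total_revenue * 0.30
```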


In [62]:
#9.What are the maximum and minimum sizes of free vs. paid apps?
min_size = df.groupby('Type')['Size'].min()
max_size = df.groupby('Type')['Size'].max()
print(f"Min size :{min_size} , Max size :{max_size}")

Min size :Type
Free    1.0
Paid    1.0
Name: Size, dtype: float64 , Max size :Type
Free    100.0
Paid    100.0
Name: Size, dtype: float64


In [63]:
#10.Is there a correlation between an app’s rating, number of reviews, size, and its price?
correlation_matrix = df[['Rating', 'Reviews', 'Size', 'Price']].corr()
correlation_matrix 

Unnamed: 0,Rating,Reviews,Size,Price
Rating,1.0,0.047298,0.056569,-0.016887
Reviews,0.047298,1.0,0.106465,-0.02288
Size,0.056569,0.106465,1.0,-0.023249
Price,-0.016887,-0.02288,-0.023249,1.0
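
`corr()` defaults to Pearson correlation, which measures linear association only. For heavily skewed columns, a rank-based (Spearman) correlation is a common complement; this is a swapped-in technique shown on toy data, not something computed in the notebook.

```python
import pandas as pd

# Toy data: Size grows monotonically with Reviews, but far from linearly
toy = pd.DataFrame({
    'Reviews': [1, 2, 3, 4, 5],
    'Size': [1.0, 4.0, 9.0, 16.0, 1000.0],
})

pearson = toy.corr(method='pearson').loc['Reviews', 'Size']
spearman = toy.corr(method='spearman').loc['Reviews', 'Size']
```

Here the Spearman coefficient is 1 because the ranks agree perfectly, while the Pearson coefficient is lower.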


In [64]:
#11.How many apps exist for each type (free/paid) across different content ratings?
type_content_counts = df.groupby(['Content Rating', 'Type'])['App'].count()
print(type_content_counts)


Content Rating   Type
Adults only 18+  Free       2
Everyone         Free    6360
                 Paid     526
Everyone 10+     Free     283
                 Paid      29
Mature 17+       Free     351
                 Paid      15
Teen             Free     892
                 Paid      50
Unrated          Free       2
Name: App, dtype: int64
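
The same counts can be laid out as a two-way table with `pd.crosstab`, which also shows zero for combinations that do not occur. A sketch on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    'Content Rating': ['Everyone', 'Everyone', 'Teen', 'Everyone'],
    'Type': ['Free', 'Paid', 'Free', 'Free'],
})

# Rows are content ratings, columns are app types, cells are app counts
counts = pd.crosstab(toy['Content Rating'], toy['Type'])
```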


In [65]:
#12.How many apps are compatible with Android version 4.x?
android_4x_apps = df[df['Android Ver'].str.contains(r'^4\.', na=False, regex=True)]
num_android_4x_apps = android_4x_apps.shape[0]
print(f"Number of apps compatible with Android 4.x: {num_android_4x_apps}")


Number of apps compatible with Android 4.x: 6326
