# <b> PROJECT IN PROGRESS ⚠️ </b>

# <b>Project: Exploring Google Play Market Apps</b>


## <b>Project description</b>

The goal of the project is to study the popularity of apps on the Google Play Market platform. <br>To do this, we analyze several attributes of each app: name, category, rating, number of reviews, size, number of installs, type (paid/free), price, age group, genre, date of the last update, and minimum required Android version.<br> These data should help us understand which apps are most popular among Google Play users and identify the factors that affect their popularity.

## <b>Data description</b>
- <b>App</b> - application name;
- <b>Category</b> - application category;
- <b>Rating</b> - overall user rating;
- <b>Reviews</b> - number of reviews;
- <b>Size</b> - application size;
- <b>Installs</b> - number of downloads/installs;
- <b>Type</b> - paid or free;
- <b>Price</b> - application price;
- <b>Content Rating</b> - age group;
- <b>Genres</b> - application genre;
- <b>Last Updated</b> - date of the last update;
- <b>Current Ver</b> - current available version;
- <b>Android Ver</b> - minimum required Android version.

## <b>Research tasks</b>

<b>Data preprocessing</b>
- Bring the column names into a consistent form.
- Check for missing values and data types, and correct them where needed.
- Check the data for duplicates.

<b>Exploratory data analysis</b>

<b>Overall conclusion</b>

In [1]:
!gdown --id 13YMcVmYMpoNnmMPXizpkj4yt9p_1J_5g

Downloading...
From: https://drive.google.com/uc?id=13YMcVmYMpoNnmMPXizpkj4yt9p_1J_5g
To: /content/googleplaystore.csv
100% 1.36M/1.36M [00:00<00:00, 154MB/s]


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv('googleplaystore.csv')

In [4]:
df.head()

| | App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19M | 10,000+ | Free | 0 | Everyone | Art & Design | January 7, 2018 | 1.0.0 | 4.0.3 and up |
| 1 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up |
| 2 | U Launcher Lite – FREE Live Cool Themes, Hide ... | ART_AND_DESIGN | 4.7 | 87510 | 8.7M | 5,000,000+ | Free | 0 | Everyone | Art & Design | August 1, 2018 | 1.2.4 | 4.0.3 and up |
| 3 | Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644 | 25M | 50,000,000+ | Free | 0 | Teen | Art & Design | June 8, 2018 | Varies with device | 4.2 and up |
| 4 | Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967 | 2.8M | 100,000+ | Free | 0 | Everyone | Art & Design;Creativity | June 20, 2018 | 1.1 | 4.4 and up |


## <b>Data preprocessing</b>

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10841 non-null  object 
 1   Category        10841 non-null  object 
 2   Rating          9367 non-null   float64
 3   Reviews         10841 non-null  object 
 4   Size            10841 non-null  object 
 5   Installs        10841 non-null  object 
 6   Type            10840 non-null  object 
 7   Price           10841 non-null  object 
 8   Content Rating  10840 non-null  object 
 9   Genres          10841 non-null  object 
 10  Last Updated    10841 non-null  object 
 11  Current Ver     10833 non-null  object 
 12  Android Ver     10838 non-null  object 
dtypes: float64(1), object(12)
memory usage: 1.1+ MB


In [6]:
df.columns = df.columns.str.replace(' ', '_').str.lower()

In [7]:
df.columns

Index(['app', 'category', 'rating', 'reviews', 'size', 'installs', 'type',
       'price', 'content_rating', 'genres', 'last_updated', 'current_ver',
       'android_ver'],
      dtype='object')

In [8]:
df.isna().sum()

app                  0
category             0
rating            1474
reviews              0
size                 0
installs             0
type                 1
price                0
content_rating       1
genres               0
last_updated         0
current_ver          8
android_ver          3
dtype: int64

In [9]:
df.dropna(inplace=True)

In [10]:
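# Note: the dropna() call in the previous cell already removed every row with a
# missing value, so the fillna() calls below leave the data unchanged.
df['rating'] = df['rating'].fillna(0)
df['type'] = df['type'].fillna('Not Defined')
df['content_rating'] = df['content_rating'].fillna('Unrated')
df['current_ver'] = df['current_ver'].fillna('No Data')
df['android_ver'] = df['android_ver'].fillna('No Data')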
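
The order of the two cells above matters: `dropna()` without arguments removes every row that has any missing value, so a later `fillna()` finds nothing to fill. If the intent were to keep apps that have no rating, one option is to fill first and drop afterwards. A minimal sketch on a toy frame (the data and the name `toy` are invented for illustration and are not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "app": ["A", "B", "C"],
    "rating": [4.1, np.nan, 3.9],
    "type": ["Free", "Paid", np.nan],
})

placeholders = {"rating": 0, "type": "Not Defined"}

# Variant 1: drop first. Rows B and C disappear, so fillna has nothing to do.
dropped_first = toy.dropna().fillna(placeholders)

# Variant 2: fill first. All three rows survive with placeholder values.
filled_first = toy.fillna(placeholders).dropna()

print(len(dropped_first), len(filled_first))
```

Note that a placeholder of `0` would pull down any average computed over `rating`, so leaving missing ratings as `NaN` may be the safer choice for that column.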

In [11]:
df.duplicated().sum()

474

In [12]:
df.drop_duplicates(inplace=True)

In [13]:
df.duplicated().sum()

0
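
`duplicated()` without arguments flags only rows that match in every column. An app listed twice with, for example, a different review count would not be caught. A hedged sketch of a stricter check by app name, on invented data (whether deduplicating by name is appropriate for this dataset is an assumption that should be verified first):

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "app": ["A", "A", "B", "B"],
    "reviews": [100, 100, 50, 75],
})

full_dupes = toy.duplicated().sum()               # identical in every column
name_dupes = toy.duplicated(subset="app").sum()   # same app name only

# Keep the row with the most reviews for each app name
deduped = (toy.sort_values("reviews", ascending=False)
              .drop_duplicates(subset="app")
              .sort_index())
```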

In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8886 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   app             8886 non-null   object 
 1   category        8886 non-null   object 
 2   rating          8886 non-null   float64
 3   reviews         8886 non-null   object 
 4   size            8886 non-null   object 
 5   installs        8886 non-null   object 
 6   type            8886 non-null   object 
 7   price           8886 non-null   object 
 8   content_rating  8886 non-null   object 
 9   genres          8886 non-null   object 
 10  last_updated    8886 non-null   object 
 11  current_ver     8886 non-null   object 
 12  android_ver     8886 non-null   object 
dtypes: float64(1), object(12)
memory usage: 971.9+ KB


In [19]:
df['reviews'].unique()

array([   159,    967,  87510, ...,    603,   1195, 398307])

In [15]:
df['reviews'] = df['reviews'].astype(int)

In [23]:
df['size'].unique()

array([ 19. ,  14. ,   8.7,  25. ,   2.8,   5.6,  29. ,  33. ,   3.1,
        28. ,  12. ,  20. ,  21. ,  37. ,   5.5,  17. ,  39. ,  31. ,
         4.2,  23. ,   6. ,   6.1,   4.6,   9.2,   5.2,  11. ,  24. ,
         0. ,   9.4,  15. ,  10. ,   1.2,  26. ,   8. ,   7.9,  56. ,
        57. ,  35. ,  54. , 201. ,   3.6,   5.7,   8.6,   2.4,  27. ,
         2.7,   2.5,   7. ,  16. ,   3.4,   8.9,   3.9,   2.9,  38. ,
        32. ,   5.4,  18. ,   1.1,   2.2,   4.5,   9.8,  52. ,   9. ,
         6.7,  30. ,   2.6,   7.1,  22. ,   6.4,   3.2,   8.2,   4.9,
         9.5,   5. ,   5.9,  13. ,  73. ,   6.8,   3.5,   4. ,   2.3,
         2.1,  42. ,   9.1,  55. ,   7.3,   6.5,   1.5,   7.5,  51. ,
        41. ,  48. ,   8.5,  46. ,   8.3,   4.3,   4.7,   3.3,  40. ,
         7.8,   8.8,   6.6,   5.1,  61. ,  66. ,  79. ,   8.4,   3.7,
       118. ,  44. , 695. ,   1.6,   6.2,  53. ,   1.4,   3. ,   7.2,
         5.8,   3.8,   9.6,  45. ,  63. ,  49. ,  77. ,   4.4,  70. ,
         9.3,   8.1,

In [16]:
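# Strip the unit suffixes and treat 'Varies with device' as 0.
# Caveat: removing 'k' without rescaling leaves kilobyte values on the
# megabyte scale (for example, '201k' becomes 201.0).
df['size'] = df['size'].str.replace('M', '').str.replace('k', '').str.replace('Varies with device', '0')
df['size'] = df['size'].astype(float)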
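
Stripping both `M` and `k` puts megabytes and kilobytes on the same scale: `'201k'` becomes `201.0`, which is indistinguishable from 201 MB. A minimal sketch of a unit-aware conversion to megabytes follows. The helper name `size_to_mb` is hypothetical, the sketch assumes 1 MB = 1024 kB, and it maps `Varies with device` to `NaN` rather than `0`:

```python
import numpy as np
import pandas as pd

def size_to_mb(value):
    """Convert strings like '19M' or '201k' to megabytes; anything else becomes NaN."""
    if isinstance(value, str):
        if value.endswith("M"):
            return float(value[:-1])
        if value.endswith("k"):
            return float(value[:-1]) / 1024  # assumption: 1 MB = 1024 kB
    return np.nan

# Invented sample values in the same style as the column
sizes = pd.Series(["19M", "8.7M", "201k", "Varies with device"])
sizes_mb = sizes.map(size_to_mb)
```

Using `NaN` for unknown sizes keeps them out of means and histograms, whereas `0` would be counted as a real measurement.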

In [17]:
df['type'].unique()

array(['Free', 'Paid'], dtype=object)

In [24]:
df['price'].unique()

array(['0', '$4.99', '$3.99', '$6.99', '$7.99', '$5.99', '$2.99', '$3.49',
       '$1.99', '$9.99', '$7.49', '$0.99', '$9.00', '$5.49', '$10.00',
       '$24.99', '$11.99', '$79.99', '$16.99', '$14.99', '$29.99',
       '$12.99', '$2.49', '$10.99', '$1.50', '$19.99', '$15.99', '$33.99',
       '$39.99', '$3.95', '$4.49', '$1.70', '$8.99', '$1.49', '$3.88',
       '$399.99', '$17.99', '$400.00', '$3.02', '$1.76', '$4.84', '$4.77',
       '$1.61', '$2.50', '$1.59', '$6.49', '$1.29', '$299.99', '$379.99',
       '$37.99', '$18.99', '$389.99', '$8.49', '$1.75', '$14.00', '$2.00',
       '$3.08', '$2.59', '$19.40', '$3.90', '$4.59', '$15.46', '$3.04',
       '$13.99', '$4.29', '$3.28', '$4.60', '$1.00', '$2.95', '$2.90',
       '$1.97', '$2.56', '$1.20'], dtype=object)

In [25]:
# regex=False treats '$' as a literal character rather than a regex anchor
df['price'] = df['price'].str.replace('$', '', regex=False).astype(float)
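
In a regular expression `$` is the anchor for the end of a string, so it is safer to tell pandas explicitly to treat it as a plain character. A small sketch on invented values:

```python
import pandas as pd

# Invented sample values in the same style as the column
prices = pd.Series(["0", "$4.99", "$399.99"])

# regex=False makes '$' a literal character instead of a regex anchor
prices_num = prices.str.replace("$", "", regex=False).astype(float)
```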


In [44]:
df['genres'].unique()

array(['Art & Design', 'Art & Design;Pretend Play',
       'Art & Design;Creativity', 'Auto & Vehicles', 'Beauty',
       'Books & Reference', 'Business', 'Comics', 'Comics;Creativity',
       'Communication', 'Dating', 'Education;Education', 'Education',
       'Education;Creativity', 'Education;Music & Video',
       'Education;Action & Adventure', 'Education;Pretend Play',
       'Education;Brain Games', 'Entertainment',
       'Entertainment;Music & Video', 'Entertainment;Brain Games',
       'Entertainment;Creativity', 'Events', 'Finance', 'Food & Drink',
       'Health & Fitness', 'House & Home', 'Libraries & Demo',
       'Lifestyle', 'Lifestyle;Pretend Play',
       'Adventure;Action & Adventure', 'Arcade', 'Casual', 'Card',
       'Casual;Pretend Play', 'Action', 'Strategy', 'Puzzle', 'Sports',
       'Music', 'Word', 'Racing', 'Casual;Creativity',
       'Casual;Action & Adventure', 'Simulation', 'Adventure', 'Board',
       'Trivia', 'Role Playing', 'Simulation;Education',
 

In [46]:
df['genres'] = df['genres'].str.split(';').str[0]
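
`.str.split(';').str[0]` keeps the first genre and discards the rest, so `'Art & Design;Pretend Play'` becomes `'Art & Design'`. If the secondary genre is needed later, it could be kept in a separate column instead. A sketch on invented rows (the column names `primary_genre` and `secondary_genre` are hypothetical):

```python
import pandas as pd

# Invented sample values in the same style as the column
genres = pd.Series(["Art & Design;Pretend Play", "Beauty"])

# expand=True returns one column per part; rows with fewer parts get a missing value
parts = genres.str.split(";", expand=True)
toy = pd.DataFrame({
    "primary_genre": parts[0],
    "secondary_genre": parts[1],
})
```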

In [31]:
df['last_updated'] = pd.to_datetime(df['last_updated'])
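
The dates in this column look like `January 7, 2018`. Passing an explicit format is one way to make parsing stricter, and `errors='coerce'` turns values that do not match into `NaT` instead of raising an error. A sketch on invented values:

```python
import pandas as pd

# Invented sample values; the last one is deliberately not a date
raw = pd.Series(["January 7, 2018", "August 1, 2018", "1.0.19"])

# %B = full month name, %d = day of month, %Y = four-digit year
parsed = pd.to_datetime(raw, format="%B %d, %Y", errors="coerce")
```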

In [34]:
df['current_ver'] = df['current_ver'].str.replace('Varies with device', '0')

In [38]:
# Remove the 'and up' suffix and the 'W' marker, then trim the leftover trailing space
df['android_ver'] = df['android_ver'].str.replace('and up', '').str.replace('W', '').str.replace('Varies with device', '0').str.strip()
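
The `installs` column is not converted by any of the cells above and remains text such as `'10,000+'`. Assuming the values always consist of digits, thousands separators, and an optional trailing plus sign, a numeric lower bound could be obtained as follows (invented values):

```python
import pandas as pd

# Invented sample values in the same style as the column
installs = pd.Series(["10,000+", "500,000+", "0"])

# Remove thousands separators and the trailing '+', then convert to integers
installs_num = (installs.str.replace(",", "", regex=False)
                        .str.replace("+", "", regex=False)
                        .astype(int))
```

The result is a lower bound rather than an exact count, because `'10,000+'` means "at least 10,000".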

## <b> Exploratory data analysis </b>

In [52]:
df.head(10)

| | app | category | rating | reviews | size | installs | type | price | content_rating | genres | last_updated | current_ver | android_ver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19.0 | 10,000+ | Free | 0.0 | Everyone | Art & Design | 2018-01-07 | 1.0.0 | 4.0.3 |
| 1 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14.0 | 500,000+ | Free | 0.0 | Everyone | Art & Design | 2018-01-15 | 2.0.0 | 4.0.3 |
| 2 | U Launcher Lite – FREE Live Cool Themes, Hide ... | ART_AND_DESIGN | 4.7 | 87510 | 8.7 | 5,000,000+ | Free | 0.0 | Everyone | Art & Design | 2018-08-01 | 1.2.4 | 4.0.3 |
| 3 | Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644 | 25.0 | 50,000,000+ | Free | 0.0 | Teen | Art & Design | 2018-06-08 | 0 | 4.2 |
| 4 | Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967 | 2.8 | 100,000+ | Free | 0.0 | Everyone | Art & Design | 2018-06-20 | 1.1 | 4.4 |
| 5 | Paper flowers instructions | ART_AND_DESIGN | 4.4 | 167 | 5.6 | 50,000+ | Free | 0.0 | Everyone | Art & Design | 2017-03-26 | 1.0 | 2.3 |
| 6 | Smoke Effect Photo Maker - Smoke Editor | ART_AND_DESIGN | 3.8 | 178 | 19.0 | 50,000+ | Free | 0.0 | Everyone | Art & Design | 2018-04-26 | 1.1 | 4.0.3 |
| 7 | Infinite Painter | ART_AND_DESIGN | 4.1 | 36815 | 29.0 | 1,000,000+ | Free | 0.0 | Everyone | Art & Design | 2018-06-14 | 6.1.61.1 | 4.2 |
| 8 | Garden Coloring Book | ART_AND_DESIGN | 4.4 | 13791 | 33.0 | 1,000,000+ | Free | 0.0 | Everyone | Art & Design | 2017-09-20 | 2.9.2 | 3.0 |
| 9 | Kids Paint Free - Drawing Fun | ART_AND_DESIGN | 4.7 | 121 | 3.1 | 10,000+ | Free | 0.0 | Everyone | Art & Design | 2018-07-03 | 2.8 | 4.0.3 |
