# YouTube channel analysis for business

The goal of the project is to build a final dashboard that displays the required metrics; the client will use the dashboard to analyze content from YouTube.

**Research plan:**

We will load the per-video statistics and the per-day statistics for October 2022 from the files `Content 2022-10-01_2022-11-01 RomanSergeevCom - Table data.csv` and `Date 2022-10-01_2022-11-01 RomanSergeevCom - Table data.csv`, respectively.

Nothing is known about the quality of the data, so we need to review it before the exploratory analysis. We will check the data for errors and then fix them at the preprocessing stage.

**Tasks of the first stage:**
1. Review and preprocess the data:
- examine missing values;
- check that the data types are appropriate;
- check for duplicates;
- check that the column names are well-formed;
- rename columns where necessary.
2. Compare the data with the metrics requested by the client to determine whether they can be displayed in the dashboard.

## Data overview

In [1]:
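import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import time as tm
import datetime as dt
import warnings
# display() is available implicitly in Jupyter; the explicit import keeps the code runnable elsewhere
from IPython.display import display
warnings.filterwarnings("ignore")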

### Statistics for all videos for October 2022

In [2]:
# helper for a first look at a dataset
def primary_analysis(data):
    pd.options.display.max_columns = None
    display(data.head(5))
    data.info()
    display(data.describe())
    print('\nКоличество уникальных значений в столбцах:')
    print(data.nunique())
    print("\nКоличество явных дубликатов: {}, что составляет {:,.1%} от общего объема.".format(data.duplicated().sum(), data.duplicated().mean()))

In [3]:
content_stat = pd.read_csv('Content 2022-10-01_2022-11-01 RomanSergeevCom - Table data.csv')
primary_analysis(content_stat)

Unnamed: 0,Content,Video title,Video publish time,Clicks per end screen element shown (%),End screen element clicks,Teaser clicks per card teaser shown (%),Card teaser clicks,Card clicks,Clicks per card shown (%),Subscribers gained,Unique viewers,Returning viewers,New viewers,Average percentage viewed (%),Subscribers,Comments added,Shares,Dislikes,Likes,Views,Watch time (hours),Average view duration,Impressions,Impressions click-through rate (%)
0,Total,,,1.83,944.0,0.29,57.0,17.0,11.41,2041.0,62738,15424,47314,31.92,101.0,64.0,2816.0,83.0,3419.0,107075.0,8217.0323,0:04:36,1424477,4.47
1,ZFcv558-PDo,«Подсознание может всё!» Джон Кехо | Саммари ®,"Feb 6, 2019",0.87,25.0,0.15,2.0,0.0,0.0,151.0,6807,1824,4983,36.1,145.0,6.0,183.0,5.0,179.0,8090.0,483.5454,0:03:35,167318,2.98
2,73eE66Hk_1A,«От нуля к единице». Питер Тиль | Саммари ®,"Oct 4, 2022",1.27,33.0,,0.0,0.0,,25.0,6312,4154,2158,30.36,-48.0,12.0,76.0,14.0,291.0,7239.0,362.0123,0:03:00,130379,3.51
3,65Xci_uI_e0,«Магия утра». Хэл Элрод | Саммари,"Jan 30, 2017",1.36,25.0,0.1,1.0,1.0,20.0,89.0,4156,702,3454,31.1,87.0,1.0,174.0,1.0,98.0,5685.0,842.7552,0:08:53,21762,18.7
4,OSjaA86Z4Ho,"«Богатый папа, бедный папа». Роберт Кийосаки ...","Oct 23, 2017",1.57,39.0,0.18,3.0,0.0,0.0,60.0,4247,1276,2971,34.46,59.0,1.0,96.0,4.0,138.0,4817.0,358.6836,0:04:28,103528,2.78


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 221 entries, 0 to 220
Data columns (total 24 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   Content                                  221 non-null    object 
 1   Video title                              220 non-null    object 
 2   Video publish time                       191 non-null    object 
 3   Clicks per end screen element shown (%)  179 non-null    float64
 4   End screen element clicks                196 non-null    float64
 5   Teaser clicks per card teaser shown (%)  177 non-null    float64
 6   Card teaser clicks                       196 non-null    float64
 7   Card clicks                              196 non-null    float64
 8   Clicks per card shown (%)                40 non-null     float64
 9   Subscribers gained                       196 non-null    float64
 10  Unique viewers                           221 non-n

Unnamed: 0,Clicks per end screen element shown (%),End screen element clicks,Teaser clicks per card teaser shown (%),Card teaser clicks,Card clicks,Clicks per card shown (%),Subscribers gained,Unique viewers,Returning viewers,New viewers,Average percentage viewed (%),Subscribers,Comments added,Shares,Dislikes,Likes,Views,Watch time (hours),Impressions,Impressions click-through rate (%)
count,179.0,196.0,177.0,196.0,196.0,40.0,196.0,221.0,221.0,221.0,195.0,196.0,196.0,196.0,196.0,196.0,196.0,196.0,221.0,218.0
mean,1.656704,9.632653,0.314859,0.581633,0.173469,10.26425,17.887755,682.714932,218.411765,464.307692,35.838462,7.408163,0.653061,28.734694,0.846939,34.887755,1092.602041,83.84727,12891.21,3.881927
std,1.938613,67.748765,0.912532,4.190156,1.328392,21.186487,146.396433,4280.4262,1085.933075,3217.269064,10.961237,19.575409,4.679693,202.105992,6.041484,245.260338,7686.057578,591.432422,96971.33,2.615019
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.63,-48.0,0.0,0.0,-2.0,-1.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,42.0,18.0,19.0,27.215,0.0,0.0,2.0,0.0,3.0,83.0,6.426125,893.0,2.135
50%,1.17,1.0,0.0,0.0,0.0,0.0,2.0,157.0,61.0,86.0,35.61,2.0,0.0,5.0,0.0,7.5,238.0,15.15325,2401.0,4.005
75%,2.195,5.0,0.0,0.0,0.0,11.185,6.0,357.0,150.0,224.0,43.565,6.0,0.0,15.0,0.0,16.0,560.25,38.966875,4924.0,4.9525
max,11.59,944.0,6.47,57.0,17.0,100.0,2041.0,62738.0,15424.0,47314.0,67.98,145.0,64.0,2816.0,83.0,3419.0,107075.0,8217.0323,1424477.0,18.7



Количество уникальных значений в столбцах:
Content                                    221
Video title                                218
Video publish time                         186
Clicks per end screen element shown (%)    107
End screen element clicks                   27
Teaser clicks per card teaser shown (%)     33
Card teaser clicks                           6
Card clicks                                  5
Clicks per card shown (%)                    9
Subscribers gained                          33
Unique viewers                             169
Returning viewers                          138
New viewers                                149
Average percentage viewed (%)              191
Subscribers                                 36
Comments added                               8
Shares                                      43
Dislikes                                    10
Likes                                       52
Views                                      176
Watch time (hour

Column descriptions for `Content 2022-10-01_2022-11-01 RomanSergeevCom - Table data.csv` according to the documentation:
* `Content` – content identifier;
* `Video title` – video title;
* `Video publish time` – video publication date;
* `Clicks per end screen element shown (%)` – ratio of clicks on an end screen element to the number of times it was shown;
* `End screen element clicks` – number of clicks on an end screen element;
* `Teaser clicks per card teaser shown (%)` – ratio of teaser clicks to the total number of times the teaser was shown;
* `Card teaser clicks` – number of teaser clicks; if a viewer clicks the card icon, it counts as a click on the last teaser shown;
* `Card clicks` – number of times viewers clicked a card;
* `Clicks per card shown (%)` – ratio of card clicks to the total number of times the card was shown;
* `Subscribers gained` – new subscribers; the total number of users who subscribed to the channel during the selected period in the specified region;
* `Unique viewers` – estimated number of users who watched the content within the selected period;
* `Returning viewers` – number of returning viewers;
* `New viewers` – number of users who discovered the channel for the first time by watching this video;
* `Average percentage viewed (%)` – average percentage of a video that is watched per playback;
* `Subscribers` – number of subscribers;
* `Comments added` – number of comments added to a video or channel; this number may include deleted comments as well as chat messages;
* `Shares` – number of users who clicked the "Share" button;
* `Dislikes` – number of users who clicked the "Dislike" button;
* `Likes` – change in the total number of likes, calculated by subtracting "likes removed" from "likes added" for the selected period, region, and other filters;
* `Views` – total number of views for the selected period, region, and other filters;
* `Watch time (hours)` – total time that users spent watching the video;
* `Average view duration` – average watch time for the selected video or date range;
* `Impressions` – number of times your video thumbnails were shown to viewers; only impressions on YouTube are counted;
* `Impressions click-through rate (%)` – how often viewers watched a video after seeing its thumbnail.

The dataset contains statistics for all videos for October 2022. Every column except `Content`, `Unique viewers`, `Returning viewers`, `New viewers`, and `Impressions` has missing values; there are no exact duplicates.
In the columns `Video publish time`, `End screen element clicks`, `Card teaser clicks`, `Card clicks`, `Subscribers gained`, `Subscribers`, `Comments added`, `Shares`, `Dislikes`, `Likes`, `Views`, and `Average view duration`, the data type does not match the documentation: the date and the duration are stored as strings, and the counts are stored as floats.
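
The type mismatches listed above could be addressed along the following lines. This is only a minimal sketch on a small stand-in frame (`sample` is a hypothetical name, with values copied from the preview above); it assumes the date strings follow the `Feb 6, 2019` pattern and the durations follow `H:MM:SS`, as in the preview, and it is not necessarily the conversion used later in the project.

```python
import pandas as pd

# Small stand-in for content_stat (hypothetical frame; values taken from the preview).
sample = pd.DataFrame({
    "Video publish time": ["Feb 6, 2019", "Oct 4, 2022", None],
    "Views": [8090.0, 7239.0, None],
    "Average view duration": ["0:03:35", "0:03:00", None],
})

# Dates such as "Feb 6, 2019" parse with an explicit format; missing values become NaT.
sample["Video publish time"] = pd.to_datetime(
    sample["Video publish time"], format="%b %d, %Y"
)

# The nullable Int64 dtype stores whole-number counts and keeps missing values as <NA>.
sample["Views"] = sample["Views"].astype("Int64")

# "H:MM:SS" strings convert to timedeltas; missing values become NaT.
sample["Average view duration"] = pd.to_timedelta(sample["Average view duration"])

print(sample.dtypes)
```

The nullable `Int64` dtype is chosen here because a plain `int64` cast would fail on columns that still contain missing values.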

### Daily statistics for October 2022

In [4]:
day_stat = pd.read_csv('Date 2022-10-01_2022-11-01 RomanSergeevCom - Table data.csv')
primary_analysis(day_stat)

Unnamed: 0,Date,Clicks per end screen element shown (%),End screen element clicks,Teaser clicks per card teaser shown (%),Card teaser clicks,Clicks per card shown (%),Card clicks,Comments added,Shares,Dislikes,Likes,Subscribers gained,Returning viewers,New viewers,Unique viewers,Impressions click-through rate (%),Impressions,Videos published,Videos added,Subscribers,Average percentage viewed (%),Views,Watch time (hours),Average view duration
0,Total,1.83,944,0.29,57,11.41,17,64,2816,83,3419,2041,15424,47314,62738,4.47,1424477,1,1,101,31.92,107075,8217.0323,0:04:36
1,2022-10-14,2.55,36,0.32,2,20.0,1,0,107,3,95,55,810,1420,2230,4.32,42037,0,0,7,30.67,3085,236.706,0:04:36
2,2022-10-19,2.45,46,0.25,2,16.67,1,0,99,1,115,77,929,1725,2654,4.83,46610,0,0,21,32.68,3824,307.4319,0:04:49
3,2022-10-01,2.29,33,0.54,3,25.0,2,1,65,1,98,62,811,1172,1983,3.73,44269,0,0,-2,30.91,2860,204.9421,0:04:17
4,2022-10-20,2.27,38,0.16,1,0.0,0,3,91,4,86,57,944,1527,2471,4.62,44581,0,0,3,32.65,3429,275.8032,0:04:49


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 24 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   Date                                     32 non-null     object 
 1   Clicks per end screen element shown (%)  32 non-null     float64
 2   End screen element clicks                32 non-null     int64  
 3   Teaser clicks per card teaser shown (%)  32 non-null     float64
 4   Card teaser clicks                       32 non-null     int64  
 5   Clicks per card shown (%)                29 non-null     float64
 6   Card clicks                              32 non-null     int64  
 7   Comments added                           32 non-null     int64  
 8   Shares                                   32 non-null     int64  
 9   Dislikes                                 32 non-null     int64  
 10  Likes                                    32 non-null

Unnamed: 0,Clicks per end screen element shown (%),End screen element clicks,Teaser clicks per card teaser shown (%),Card teaser clicks,Clicks per card shown (%),Card clicks,Comments added,Shares,Dislikes,Likes,Subscribers gained,Returning viewers,New viewers,Unique viewers,Impressions click-through rate (%),Impressions,Videos published,Videos added,Subscribers,Average percentage viewed (%),Views,Watch time (hours)
count,32.0,32.0,32.0,32.0,29.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0
mean,1.838438,59.0,0.28375,3.5625,12.150345,1.0625,4.0,176.0,5.1875,213.6875,127.5625,1400.5625,2975.6875,4376.25,4.4925,89029.81,0.0625,0.0625,6.3125,31.884687,6692.1875,513.564522
std,0.356462,161.640781,0.222606,9.866651,13.976038,2.983152,11.167349,482.083047,14.414235,586.063436,349.287656,2580.670892,8093.691304,10662.586296,0.353991,243886.1,0.245935,0.245935,23.088103,0.996665,18330.056941,1406.370489
min,1.23,18.0,0.0,0.0,0.0,0.0,0.0,59.0,-1.0,57.0,49.0,757.0,1172.0,1983.0,3.73,38401.0,0.0,0.0,-49.0,29.1,2766.0,200.8965
25%,1.615,26.0,0.15,1.0,0.0,0.0,1.0,74.75,1.0,89.0,60.0,814.75,1467.25,2283.75,4.2775,41723.5,0.0,0.0,-3.25,31.45,3093.75,246.10285
50%,1.78,30.5,0.25,2.0,11.41,0.0,2.0,90.5,2.0,109.5,66.0,883.5,1528.0,2472.0,4.605,43765.5,0.0,0.0,4.0,31.83,3395.0,262.8328
75%,2.1225,36.0,0.3575,2.25,20.0,1.0,3.0,107.25,5.0,125.25,73.25,945.0,1609.75,2577.75,4.685,46569.5,0.0,0.0,16.25,32.4425,3577.5,277.83515
max,2.55,944.0,0.75,57.0,50.0,17.0,64.0,2816.0,83.0,3419.0,2041.0,15424.0,47314.0,62738.0,5.2,1424477.0,1.0,1.0,101.0,33.77,107075.0,8217.0323



Количество уникальных значений в столбцах:
Date                                       32
Clicks per end screen element shown (%)    28
End screen element clicks                  21
Teaser clicks per card teaser shown (%)    19
Card teaser clicks                          8
Clicks per card shown (%)                   9
Card clicks                                 4
Comments added                              9
Shares                                     26
Dislikes                                   10
Likes                                      26
Subscribers gained                         26
Returning viewers                          32
New viewers                                31
Unique viewers                             30
Impressions click-through rate (%)         29
Impressions                                32
Videos published                            2
Videos added                                2
Subscribers                                22
Average percentage viewed (%)       

Column descriptions for `Date 2022-10-01_2022-11-01 RomanSergeevCom - Table data.csv` according to the documentation:

* `Date` - record date;
* `Clicks per end screen element shown (%)` - ratio of clicks on an end screen element to the number of times it was shown;
* `End screen element clicks` - number of clicks on an end screen element;
* `Teaser clicks per card teaser shown (%)` - ratio of teaser clicks to the total number of times the teaser was shown;
* `Card teaser clicks` - number of teaser clicks; if a viewer clicks the card icon, it counts as a click on the last teaser shown;
* `Clicks per card shown (%)` - ratio of card clicks to the total number of times the card was shown;
* `Card clicks` - number of times viewers clicked a card;
* `Comments added` - number of comments added to a video or channel; this number may include deleted comments as well as chat messages;
* `Shares` - number of users who clicked the "Share" button;
* `Dislikes` - number of users who clicked the "Dislike" button;
* `Likes` - change in the total number of likes, calculated by subtracting "likes removed" from "likes added" for the selected period, region, and other filters;
* `Subscribers gained` - new subscribers; the total number of users who subscribed to the channel during the selected period in the specified region;
* `Returning viewers` - number of returning viewers;
* `New viewers` - number of users who discovered the channel for the first time by watching this video;
* `Unique viewers` - estimated number of users who watched the content within the selected period;
* `Impressions click-through rate (%)` - how often viewers watched a video after seeing its thumbnail;
* `Impressions` - number of times your video thumbnails were shown to viewers; only impressions on YouTube are counted;
* `Videos published` - number of videos published;
* `Videos added` - number of videos added;
* `Subscribers` - number of subscribers;
* `Average percentage viewed (%)` - average percentage of a video that is watched per playback;
* `Views` - total number of views for the selected period, region, and other filters;
* `Watch time (hours)` - total time that users spent watching the video;
* `Average view duration` - average watch time for the selected video or date range.

The dataset contains daily statistics for October 2022. The `Clicks per card shown (%)` column has missing values; there are no exact duplicates. The data type of the `Average view duration` column does not match the documentation.
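
The preview above also shows that the first row of the daily export holds the `Total` aggregate rather than a date, and that `Average view duration` is stored as text. One possible way to handle both is sketched below on a small stand-in frame. It assumes the aggregate row should be kept apart from the daily rows, which the source does not state; the names `sample`, `totals`, and `daily` are hypothetical.

```python
import pandas as pd

# Small stand-in for day_stat (hypothetical frame; values taken from the preview).
sample = pd.DataFrame({
    "Date": ["Total", "2022-10-14", "2022-10-19"],
    "Views": [107075, 3085, 3824],
    "Average view duration": ["0:04:36", "0:04:36", "0:04:49"],
})

# Separate the aggregate row from the per-day rows.
is_total = sample["Date"] == "Total"
totals = sample[is_total]
daily = sample[~is_total].copy()

# With the aggregate row removed, the remaining values parse as ISO dates.
daily["Date"] = pd.to_datetime(daily["Date"], format="%Y-%m-%d")

# "H:MM:SS" strings convert to timedeltas.
daily["Average view duration"] = pd.to_timedelta(daily["Average view duration"])

# The export is not ordered by date, so sort chronologically.
daily = daily.sort_values("Date").reset_index(drop=True)

print(daily.dtypes)
```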

### Conclusion

Preliminarily, the data appear sufficient for the study. The next step is to preprocess the data and check it for errors.

## Data preprocessing

### Renaming columns

In [5]:
# normalize the column names in content_stat:
content_stat.columns = [x.lower() for x in content_stat.columns] 
content_stat.columns = [x.replace(" (%)" , "") for x in content_stat.columns]
content_stat.columns = [x.replace(" (hours)" , "") for x in content_stat.columns]
content_stat.columns = [x.replace(' ' , '_') for x in content_stat.columns]
content_stat.columns = [x.replace('-' , '_') for x in content_stat.columns]
display(content_stat.head(5))

Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
0,Total,,,1.83,944.0,0.29,57.0,17.0,11.41,2041.0,62738,15424,47314,31.92,101.0,64.0,2816.0,83.0,3419.0,107075.0,8217.0323,0:04:36,1424477,4.47
1,ZFcv558-PDo,«Подсознание может всё!» Джон Кехо | Саммари ®,"Feb 6, 2019",0.87,25.0,0.15,2.0,0.0,0.0,151.0,6807,1824,4983,36.1,145.0,6.0,183.0,5.0,179.0,8090.0,483.5454,0:03:35,167318,2.98
2,73eE66Hk_1A,«От нуля к единице». Питер Тиль | Саммари ®,"Oct 4, 2022",1.27,33.0,,0.0,0.0,,25.0,6312,4154,2158,30.36,-48.0,12.0,76.0,14.0,291.0,7239.0,362.0123,0:03:00,130379,3.51
3,65Xci_uI_e0,«Магия утра». Хэл Элрод | Саммари,"Jan 30, 2017",1.36,25.0,0.1,1.0,1.0,20.0,89.0,4156,702,3454,31.1,87.0,1.0,174.0,1.0,98.0,5685.0,842.7552,0:08:53,21762,18.7
4,OSjaA86Z4Ho,"«Богатый папа, бедный папа». Роберт Кийосаки ...","Oct 23, 2017",1.57,39.0,0.18,3.0,0.0,0.0,60.0,4247,1276,2971,34.46,59.0,1.0,96.0,4.0,138.0,4817.0,358.6836,0:04:28,103528,2.78


In [6]:
# normalize the column names in day_stat:
day_stat.columns = [x.lower() for x in day_stat.columns] 
day_stat.columns = [x.replace(" (%)" , "") for x in day_stat.columns]
day_stat.columns = [x.replace(" (hours)" , "") for x in day_stat.columns]
day_stat.columns = [x.replace(' ' , '_') for x in day_stat.columns]
day_stat.columns = [x.replace('-' , '_') for x in day_stat.columns]
display(day_stat.head(5))

Unnamed: 0,date,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,clicks_per_card_shown,card_clicks,comments_added,shares,dislikes,likes,subscribers_gained,returning_viewers,new_viewers,unique_viewers,impressions_click_through_rate,impressions,videos_published,videos_added,subscribers,average_percentage_viewed,views,watch_time,average_view_duration
0,Total,1.83,944,0.29,57,11.41,17,64,2816,83,3419,2041,15424,47314,62738,4.47,1424477,1,1,101,31.92,107075,8217.0323,0:04:36
1,2022-10-14,2.55,36,0.32,2,20.0,1,0,107,3,95,55,810,1420,2230,4.32,42037,0,0,7,30.67,3085,236.706,0:04:36
2,2022-10-19,2.45,46,0.25,2,16.67,1,0,99,1,115,77,929,1725,2654,4.83,46610,0,0,21,32.68,3824,307.4319,0:04:49
3,2022-10-01,2.29,33,0.54,3,25.0,2,1,65,1,98,62,811,1172,1983,3.73,44269,0,0,-2,30.91,2860,204.9421,0:04:17
4,2022-10-20,2.27,38,0.16,1,0.0,0,3,91,4,86,57,944,1527,2471,4.62,44581,0,0,3,32.65,3429,275.8032,0:04:49
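
Both DataFrames go through the same five renaming steps, so the logic could be factored into one helper. The sketch below is an illustration only: `normalize_columns` is a hypothetical name that does not appear in the original notebook, and it returns a renamed copy instead of modifying the frame in place.

```python
import pandas as pd


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with lower-case, snake_case column names (hypothetical helper)."""
    renamed = (
        df.columns.str.lower()
        .str.replace(" (%)", "", regex=False)      # drop the percent suffix
        .str.replace(" (hours)", "", regex=False)  # drop the unit suffix
        .str.replace(" ", "_", regex=False)        # spaces to underscores
        .str.replace("-", "_", regex=False)        # hyphens to underscores
    )
    out = df.copy()
    out.columns = renamed
    return out


# Demonstration on an empty frame that has a few of the original column names.
sample = pd.DataFrame(
    columns=["Impressions click-through rate (%)", "Watch time (hours)", "Video title"]
)
result = normalize_columns(sample)
print(list(result.columns))
```

With such a helper, each cell would reduce to a single assignment, for example `content_stat = normalize_columns(content_stat)`.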


### Missing values
Let us analyze and handle the missing values in the `content_stat` dataset:

In [7]:
tmp = content_stat.isna().sum().reset_index()
tmp.columns = ['column', 'NaN_count']  
tmp['NaN_percentage'] =  tmp['NaN_count'] / len(content_stat)
display(tmp.style.format({'NaN_percentage': '{:,.1%}'.format}))

Unnamed: 0,column,NaN_count,NaN_percentage
0,content,0,0.0%
1,video_title,1,0.5%
2,video_publish_time,30,13.6%
3,clicks_per_end_screen_element_shown,42,19.0%
4,end_screen_element_clicks,25,11.3%
5,teaser_clicks_per_card_teaser_shown,44,19.9%
6,card_teaser_clicks,25,11.3%
7,card_clicks,25,11.3%
8,clicks_per_card_shown,181,81.9%
9,subscribers_gained,25,11.3%
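
Several columns in the summary above share the same count of 25 missing values, which suggests that the gaps may fall on the same rows. A minimal sketch of how this could be checked follows; it uses a small stand-in frame with placeholder identifiers (`sample`, `cols`, and the `content` values are hypothetical), and it is not the check performed in the notebook.

```python
import numpy as np
import pandas as pd

# Small stand-in for content_stat with placeholder identifiers (hypothetical data).
sample = pd.DataFrame({
    "content": ["video_a", "video_b", "video_c"],
    "card_clicks": [0.0, np.nan, np.nan],
    "subscribers_gained": [151.0, np.nan, np.nan],
    "impressions": [167318, 2, 13],
})

# Columns suspected of being missing on the same rows.
cols = ["card_clicks", "subscribers_gained"]

# True where every listed column is missing in the row.
missing_together = sample[cols].isna().all(axis=1)
# True where at least one listed column is missing in the row.
missing_any = sample[cols].isna().any(axis=1)

# If the two masks coincide, the gaps always occur together.
same_rows = bool((missing_together == missing_any).all())
print(same_rows, int(missing_together.sum()))
```

If the masks coincide on the real data, those rows can be reviewed as one group instead of column by column.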


In [8]:
# inspect the rows with missing values in each column
col_names = content_stat.columns
for name in col_names:
    print("\nСтолбец {}:".format(name))
    display(content_stat[content_stat[name].isna()].head(3))


Столбец content:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate



Столбец video_title:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
0,Total,,,1.83,944.0,0.29,57.0,17.0,11.41,2041.0,62738,15424,47314,31.92,101.0,64.0,2816.0,83.0,3419.0,107075.0,8217.0323,0:04:36,1424477,4.47



Столбец video_publish_time:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
0,Total,,,1.83,944.0,0.29,57.0,17.0,11.41,2041.0,62738,15424,47314,31.92,101.0,64.0,2816.0,83.0,3419.0,107075.0,8217.0323,0:04:36,1424477,4.47
182,McNSJKUm0CA,«Сила воли». Келли Макгонигал | Видео Саммари,,,0.0,0.0,0.0,0.0,,0.0,18,3,15,19.96,0.0,0.0,0.0,0.0,0.0,21.0,2.2945,0:06:33,52,5.77
183,4EU5_xOlLBQ,«Тренируем мозг». Рюта Кавашима | (АНИМАЦИЯ),,,0.0,,0.0,0.0,,0.0,18,4,14,46.2,0.0,0.0,1.0,0.0,0.0,19.0,0.7632,0:02:24,1,0.0



Столбец clicks_per_end_screen_element_shown:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
14,SGoiX_O7MVA,«Как завоёвывать друзей и оказывать влияние на...,"Jun 12, 2019",,0.0,1.2,3.0,1.0,11.11,20.0,1300,587,713,38.8,20.0,0.0,18.0,2.0,55.0,1526.0,110.1869,0:04:19,32526,2.66
22,GR9x_mT3PyQ,«Тренируем мозг». Рюта Кавашима | Саммари ®,"Aug 30, 2018",,0.0,,0.0,0.0,,16.0,827,195,632,43.47,16.0,0.0,26.0,1.0,40.0,995.0,38.4425,0:02:19,9158,7.23
25,Vn79HcNTr6Q,"«Как разговаривать с кем угодно, когда угодно,...","Dec 11, 2019",,0.0,1.85,1.0,0.0,0.0,18.0,796,219,577,37.23,18.0,0.0,8.0,0.0,29.0,893.0,48.4875,0:03:15,9287,6.55



Столбец end_screen_element_clicks:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
196,1lwVlx9jZWo,Yurii 02 Epic violin | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,2,0.0
197,2YAuw4WQqeI,Yurii 11 Inspirational | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,13,0.0
198,75TQj-QlTuU,Yurii 08 Dramatic Biography | Music by Roma0Se...,,,,,,,,,0,0,0,,,,,,,,,,17,0.0



Столбец teaser_clicks_per_card_teaser_shown:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
2,73eE66Hk_1A,«От нуля к единице». Питер Тиль | Саммари ®,"Oct 4, 2022",1.27,33.0,,0.0,0.0,,25.0,6312,4154,2158,30.36,-48.0,12.0,76.0,14.0,291.0,7239.0,362.0123,0:03:00,130379,3.51
6,F-jx_VF74y8,«Магия Утра». Хэл Элрод | Саммари ®,"May 8, 2018",1.79,51.0,,0.0,0.0,,48.0,3733,1483,2250,38.58,48.0,2.0,147.0,2.0,77.0,4455.0,297.4182,0:04:00,86804,3.34
13,pPenBqd2_Gs,«Как Запоминать (ПОЧТИ) Всё и Всегда». Роб Ист...,"May 12, 2018",0.96,11.0,,0.0,0.0,,16.0,1391,246,1145,47.24,16.0,0.0,44.0,3.0,69.0,1533.0,50.6933,0:01:59,12962,8.83



Столбец card_teaser_clicks:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
196,1lwVlx9jZWo,Yurii 02 Epic violin | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,2,0.0
197,2YAuw4WQqeI,Yurii 11 Inspirational | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,13,0.0
198,75TQj-QlTuU,Yurii 08 Dramatic Biography | Music by Roma0Se...,,,,,,,,,0,0,0,,,,,,,,,,17,0.0



Столбец card_clicks:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
196,1lwVlx9jZWo,Yurii 02 Epic violin | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,2,0.0
197,2YAuw4WQqeI,Yurii 11 Inspirational | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,13,0.0
198,75TQj-QlTuU,Yurii 08 Dramatic Biography | Music by Roma0Se...,,,,,,,,,0,0,0,,,,,,,,,,17,0.0



Столбец clicks_per_card_shown:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
2,73eE66Hk_1A,«От нуля к единице». Питер Тиль | Саммари ®,"Oct 4, 2022",1.27,33.0,,0.0,0.0,,25.0,6312,4154,2158,30.36,-48.0,12.0,76.0,14.0,291.0,7239.0,362.0123,0:03:00,130379,3.51
6,F-jx_VF74y8,«Магия Утра». Хэл Элрод | Саммари ®,"May 8, 2018",1.79,51.0,,0.0,0.0,,48.0,3733,1483,2250,38.58,48.0,2.0,147.0,2.0,77.0,4455.0,297.4182,0:04:00,86804,3.34
7,m5tZweOIXB4,«Квадрант денежного потока». Роберт Кийосаки |...,"Oct 10, 2018",2.93,57.0,0.0,0.0,0.0,,85.0,2563,946,1617,47.79,84.0,3.0,128.0,2.0,128.0,3251.0,198.5316,0:03:39,41230,3.67



Столбец subscribers_gained:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
196,1lwVlx9jZWo,Yurii 02 Epic violin | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,2,0.0
197,2YAuw4WQqeI,Yurii 11 Inspirational | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,13,0.0
198,75TQj-QlTuU,Yurii 08 Dramatic Biography | Music by Roma0Se...,,,,,,,,,0,0,0,,,,,,,,,,17,0.0



Столбец unique_viewers:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate



Столбец returning_viewers:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate



Столбец new_viewers:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate



Столбец average_percentage_viewed:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
195,amAL38R6mpA,Дудл видео для бизнеса | 18+,"Jul 8, 2019",,0.0,,0.0,0.0,,0.0,0,0,0,,0.0,0.0,3.0,0.0,0.0,0.0,0.0,,0,
196,1lwVlx9jZWo,Yurii 02 Epic violin | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,2,0.0
197,2YAuw4WQqeI,Yurii 11 Inspirational | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,13,0.0



Столбец subscribers:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
196,1lwVlx9jZWo,Yurii 02 Epic violin | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,2,0.0
197,2YAuw4WQqeI,Yurii 11 Inspirational | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,13,0.0
198,75TQj-QlTuU,Yurii 08 Dramatic Biography | Music by Roma0Se...,,,,,,,,,0,0,0,,,,,,,,,,17,0.0



Column comments_added:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
196,1lwVlx9jZWo,Yurii 02 Epic violin | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,2,0.0
197,2YAuw4WQqeI,Yurii 11 Inspirational | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,13,0.0
198,75TQj-QlTuU,Yurii 08 Dramatic Biography | Music by Roma0Se...,,,,,,,,,0,0,0,,,,,,,,,,17,0.0



Column shares:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
196,1lwVlx9jZWo,Yurii 02 Epic violin | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,2,0.0
197,2YAuw4WQqeI,Yurii 11 Inspirational | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,13,0.0
198,75TQj-QlTuU,Yurii 08 Dramatic Biography | Music by Roma0Se...,,,,,,,,,0,0,0,,,,,,,,,,17,0.0



Column dislikes:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
196,1lwVlx9jZWo,Yurii 02 Epic violin | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,2,0.0
197,2YAuw4WQqeI,Yurii 11 Inspirational | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,13,0.0
198,75TQj-QlTuU,Yurii 08 Dramatic Biography | Music by Roma0Se...,,,,,,,,,0,0,0,,,,,,,,,,17,0.0



Column likes:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
196,1lwVlx9jZWo,Yurii 02 Epic violin | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,2,0.0
197,2YAuw4WQqeI,Yurii 11 Inspirational | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,13,0.0
198,75TQj-QlTuU,Yurii 08 Dramatic Biography | Music by Roma0Se...,,,,,,,,,0,0,0,,,,,,,,,,17,0.0



Column views:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
196,1lwVlx9jZWo,Yurii 02 Epic violin | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,2,0.0
197,2YAuw4WQqeI,Yurii 11 Inspirational | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,13,0.0
198,75TQj-QlTuU,Yurii 08 Dramatic Biography | Music by Roma0Se...,,,,,,,,,0,0,0,,,,,,,,,,17,0.0



Column watch_time:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
196,1lwVlx9jZWo,Yurii 02 Epic violin | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,2,0.0
197,2YAuw4WQqeI,Yurii 11 Inspirational | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,13,0.0
198,75TQj-QlTuU,Yurii 08 Dramatic Biography | Music by Roma0Se...,,,,,,,,,0,0,0,,,,,,,,,,17,0.0



Column average_view_duration:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
195,amAL38R6mpA,Дудл видео для бизнеса | 18+,"Jul 8, 2019",,0.0,,0.0,0.0,,0.0,0,0,0,,0.0,0.0,3.0,0.0,0.0,0.0,0.0,,0,
196,1lwVlx9jZWo,Yurii 02 Epic violin | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,2,0.0
197,2YAuw4WQqeI,Yurii 11 Inspirational | Music by RomanSergeevCom,,,,,,,,,0,0,0,,,,,,,,,,13,0.0



Column impressions:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate



Column impressions_click_through_rate:


Unnamed: 0,content,video_title,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
192,1nPqKbeV2K4,«Deadline». Том ДеМарко | Видео Саммари,,,0.0,,0.0,0.0,,0.0,5,0,5,8.43,0.0,0.0,0.0,0.0,0.0,5.0,0.036,0:00:25,0,
194,ycmpeKVSGfI,Модернизация образования | Игорь Рыбаков,,,0.0,,0.0,0.0,,0.0,1,0,1,23.91,0.0,0.0,0.0,0.0,0.0,1.0,0.0087,0:00:31,0,
195,amAL38R6mpA,Дудл видео для бизнеса | 18+,"Jul 8, 2019",,0.0,,0.0,0.0,,0.0,0,0,0,,0.0,0.0,3.0,0.0,0.0,0.0,0.0,,0,


One missing value in `video_publish_time` belongs to the summary row Total, which must be dropped. According to the customer, the other missing values in `video_publish_time` belong to videos uploaded via private links, which are not publicly available. These rows should also be excluded from the dataset.
Missing values in the remaining columns correspond to zero-valued metrics, so we fill them with `0`:

In [9]:
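# drop rows without a title or publish date (the Total row and private-link videos)
content_stat = content_stat.dropna(subset=['video_title', 'video_publish_time'])
# keep the duration column as a time string so it can still be parsed later
content_stat['average_view_duration'] = content_stat['average_view_duration'].fillna("00:00:00")
# the remaining gaps are zero-valued metrics
content_stat = content_stat.fillna(0)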

In [10]:
content_stat.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 191 entries, 1 to 195
Data columns (total 24 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   content                              191 non-null    object 
 1   video_title                          191 non-null    object 
 2   video_publish_time                   191 non-null    object 
 3   clicks_per_end_screen_element_shown  191 non-null    float64
 4   end_screen_element_clicks            191 non-null    float64
 5   teaser_clicks_per_card_teaser_shown  191 non-null    float64
 6   card_teaser_clicks                   191 non-null    float64
 7   card_clicks                          191 non-null    float64
 8   clicks_per_card_shown                191 non-null    float64
 9   subscribers_gained                   191 non-null    float64
 10  unique_viewers                       191 non-null    int64  
 11  returning_viewers               

Let us look at the missing values in the `clicks_per_card_shown` column of the `day_stat` dataset:

In [11]:
display(day_stat[day_stat['clicks_per_card_shown'].isna()].head(3))

Unnamed: 0,date,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,clicks_per_card_shown,card_clicks,comments_added,shares,dislikes,likes,subscribers_gained,returning_viewers,new_viewers,unique_viewers,impressions_click_through_rate,impressions,videos_published,videos_added,subscribers,average_percentage_viewed,views,watch_time,average_view_duration
23,2022-10-08,1.62,20,0.0,0,,0,5,95,2,124,60,833,1179,2012,3.9,42394,0,0,-10,29.1,2766,200.8965,0:04:21
25,2022-10-29,1.53,24,0.0,0,,0,2,59,0,79,63,771,1440,2211,4.6,41362,0,0,8,33.15,3062,238.8405,0:04:40
29,2022-10-30,1.29,21,0.0,0,,0,1,68,1,106,73,948,1589,2537,4.8,46261,0,0,-9,30.88,3463,249.3828,0:04:19


The missing values correspond to a zero value of the metric, so we fill them with `0`:

In [12]:
day_stat = day_stat.fillna(0)
day_stat.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 24 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   date                                 32 non-null     object 
 1   clicks_per_end_screen_element_shown  32 non-null     float64
 2   end_screen_element_clicks            32 non-null     int64  
 3   teaser_clicks_per_card_teaser_shown  32 non-null     float64
 4   card_teaser_clicks                   32 non-null     int64  
 5   clicks_per_card_shown                32 non-null     float64
 6   card_clicks                          32 non-null     int64  
 7   comments_added                       32 non-null     int64  
 8   shares                               32 non-null     int64  
 9   dislikes                             32 non-null     int64  
 10  likes                                32 non-null     int64  
 11  subscribers_gained                
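
The assumption that gaps stand for zero-valued metrics can be checked before a blanket `fillna(0)`. A minimal sketch on a toy frame (the values are illustrative, not taken from the export), assuming a missing `clicks_per_card_shown` should only occur on days where `card_clicks` is zero:

```python
import numpy as np
import pandas as pd

# Toy stand-in for day_stat; the numbers are made up for illustration.
toy = pd.DataFrame({
    "date": ["2022-10-08", "2022-10-09", "2022-10-10"],
    "card_clicks": [0, 1, 0],
    "clicks_per_card_shown": [np.nan, 20.0, np.nan],
})

# Rows where the ratio is missing.
missing = toy["clicks_per_card_shown"].isna()

# True only if every missing ratio sits on a row with zero card clicks.
gaps_are_zero_metrics = bool((toy.loc[missing, "card_clicks"] == 0).all())

# Fill the gaps only after the assumption has been confirmed.
if gaps_are_zero_metrics:
    toy = toy.fillna(0)
```

If the check fails, the rows in question deserve a closer look before any value is imputed.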

### Converting data types

Convert the data types in the `content_stat` dataset:

In [13]:
# to_seconds takes a row and converts its average_view_duration string (H:M:S) into seconds
def to_seconds(df):
    a = tm.strptime(df['average_view_duration'], "%H:%M:%S")
    return dt.timedelta(hours=a.tm_hour, minutes=a.tm_min, seconds=a.tm_sec).seconds

content_stat['video_publish_time'] = pd.to_datetime(content_stat['video_publish_time'], format='%b %d, %Y', errors='ignore')
content_stat['end_screen_element_clicks'] = content_stat['end_screen_element_clicks'].astype('int16')
content_stat['card_teaser_clicks'] = content_stat['card_teaser_clicks'].astype('int8')
content_stat['card_clicks'] = content_stat['card_clicks'].astype('int16')
content_stat['subscribers_gained'] = content_stat['subscribers_gained'].astype('int16')
content_stat['subscribers'] = content_stat['subscribers'].astype('int8')
content_stat['shares'] = content_stat['shares'].astype('int16')
content_stat['dislikes'] = content_stat['dislikes'].astype('int8')
content_stat['likes'] = content_stat['likes'].astype('int16')
content_stat['views'] = content_stat['views'].astype('int32')
#content_stat['average_view_duration'] = pd.to_datetime(content_stat['average_view_duration'], format='%H:%M:%S').dt.time
content_stat['average_view_duration'] = content_stat.apply(to_seconds, axis=1)
content_stat.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 191 entries, 1 to 195
Data columns (total 24 columns):
 #   Column                               Non-Null Count  Dtype         
---  ------                               --------------  -----         
 0   content                              191 non-null    object        
 1   video_title                          191 non-null    object        
 2   video_publish_time                   191 non-null    datetime64[ns]
 3   clicks_per_end_screen_element_shown  191 non-null    float64       
 4   end_screen_element_clicks            191 non-null    int16         
 5   teaser_clicks_per_card_teaser_shown  191 non-null    float64       
 6   card_teaser_clicks                   191 non-null    int8          
 7   card_clicks                          191 non-null    int16         
 8   clicks_per_card_shown                191 non-null    float64       
 9   subscribers_gained                   191 non-null    int16         
 10  unique_viewers
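
Casting to a narrow integer type such as `int8` is only safe when every value fits its range. A minimal sketch of such a range check, using a hypothetical helper `fits_dtype` (not part of the notebook) and the extremes that `describe()` reports for `subscribers` and `views`:

```python
import numpy as np
import pandas as pd

def fits_dtype(series: pd.Series, dtype: str) -> bool:
    """Return True if every value of `series` lies within the range of integer `dtype`."""
    info = np.iinfo(dtype)
    return bool(series.min() >= info.min and series.max() <= info.max)

# Illustrative series built from the reported minimum and maximum values.
subscribers = pd.Series([-124.0, 0.0, 87.0])
views = pd.Series([0.0, 245.0, 8090.0])

subscribers_fit_int8 = fits_dtype(subscribers, "int8")  # int8 covers -128..127
views_fit_int8 = fits_dtype(views, "int8")              # 8090 is outside int8
views_fit_int32 = fits_dtype(views, "int32")
```

With a range this close to the `int8` limits, a wider type may be the safer choice if the same code is reused on other months.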

Convert the data types in the `day_stat` dataset:

In [14]:
# drop the summary row labelled Total; .copy() avoids assigning into a slice of the original frame
day_stat = day_stat[day_stat.date != 'Total'].copy()

day_stat['date'] = pd.to_datetime(day_stat['date'], format='%Y-%m-%d')
#day_stat['average_view_duration'] = pd.to_datetime(day_stat['average_view_duration'], format='%H:%M:%S').dt.time
day_stat['average_view_duration'] = day_stat.apply(to_seconds, axis=1)
day_stat.info()

# sort by the date column in chronological order
day_stat = day_stat.sort_values(by='date')

<class 'pandas.core.frame.DataFrame'>
Int64Index: 31 entries, 1 to 31
Data columns (total 24 columns):
 #   Column                               Non-Null Count  Dtype         
---  ------                               --------------  -----         
 0   date                                 31 non-null     datetime64[ns]
 1   clicks_per_end_screen_element_shown  31 non-null     float64       
 2   end_screen_element_clicks            31 non-null     int64         
 3   teaser_clicks_per_card_teaser_shown  31 non-null     float64       
 4   card_teaser_clicks                   31 non-null     int64         
 5   clicks_per_card_shown                31 non-null     float64       
 6   card_clicks                          31 non-null     int64         
 7   comments_added                       31 non-null     int64         
 8   shares                               31 non-null     int64         
 9   dislikes                             31 non-null     int64         
 10  likes           
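
The row-wise `to_seconds` conversion can also be written as a vectorized operation. A minimal sketch, assuming the durations are `H:MM:SS` strings as in the export (the sample values are illustrative):

```python
import pandas as pd

# Illustrative duration strings in the same H:MM:SS form as average_view_duration.
durations = pd.Series(["0:04:21", "0:04:40", "00:00:00"])

# Parse the strings as timedeltas and convert them to whole seconds.
seconds = pd.to_timedelta(durations).dt.total_seconds().astype("int64")
```

Unlike the `.seconds` attribute of `datetime.timedelta`, `total_seconds()` also counts whole days, which only matters for durations of 24 hours or more.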

### Errors and implicit duplicates

Let us check the datasets for errors and implicit duplicates:

In [15]:
content_stat.describe()

Unnamed: 0,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
count,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0,191.0
mean,1.543037,4.942408,0.290262,0.298429,0.089005,2.089843,7.670157,461.256545,171.926702,289.335079,35.906178,4.39267,0.335079,14.73822,0.434555,17.900524,560.361257,43.00487,263.968586,7451.492147,4.377068
std,1.92237,9.573639,0.882367,1.085768,0.559544,10.451806,18.498964,909.49572,372.983201,607.92756,11.079621,17.894655,1.115807,28.619179,1.323735,33.838868,1097.049109,94.995489,107.773393,18601.492084,2.358015
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-124.0,0.0,0.0,-2.0,-1.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,75.0,31.0,36.0,27.24,0.0,0.0,2.0,0.0,3.0,89.0,7.4811,191.0,1272.5,2.815
50%,1.04,1.0,0.0,0.0,0.0,0.0,2.0,192.0,78.0,102.0,35.61,2.0,0.0,5.0,0.0,8.0,245.0,15.5651,255.0,2739.0,4.23
75%,2.17,5.0,0.0,0.0,0.0,0.0,6.0,432.5,163.5,230.5,43.565,6.0,0.0,15.0,0.0,16.0,561.5,39.24875,323.0,5726.5,5.2
max,11.59,62.0,6.47,13.0,7.0,100.0,151.0,6807.0,4154.0,4983.0,67.98,87.0,12.0,183.0,14.0,291.0,8090.0,842.7552,574.0,167318.0,18.7


In [16]:
day_stat.describe()

Unnamed: 0,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,clicks_per_card_shown,card_clicks,comments_added,shares,dislikes,likes,subscribers_gained,returning_viewers,new_viewers,unique_viewers,impressions_click_through_rate,impressions,videos_published,videos_added,subscribers,average_percentage_viewed,views,watch_time,average_view_duration
count,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0
mean,1.83871,30.451613,0.283548,1.83871,10.998387,0.548387,2.064516,90.83871,2.677419,110.290323,65.83871,948.193548,1545.419355,2493.612903,4.493226,45950.870968,0.032258,0.032258,3.258065,31.883548,3454.032258,265.065561,276.83871
std,0.362351,6.999232,0.226282,1.529636,13.988611,0.675214,2.235106,18.348654,2.521733,37.475497,9.504724,339.370144,219.19364,530.574637,0.359818,9907.676898,0.179605,0.179605,15.56699,1.013119,682.066345,43.396222,13.092229
min,1.23,18.0,0.0,0.0,0.0,0.0,0.0,59.0,-1.0,57.0,49.0,757.0,1172.0,1983.0,3.73,38401.0,0.0,0.0,-49.0,29.1,2766.0,200.8965,239.0
25%,1.61,26.0,0.15,1.0,0.0,0.0,1.0,74.5,1.0,88.0,60.0,814.5,1464.5,2283.5,4.235,41603.0,0.0,0.0,-4.5,31.43,3091.5,244.3997,270.0
50%,1.77,30.0,0.25,2.0,0.0,0.0,2.0,90.0,2.0,109.0,65.0,878.0,1527.0,2471.0,4.61,43525.0,0.0,0.0,3.0,31.83,3389.0,261.7245,279.0
75%,2.145,35.5,0.365,2.0,20.0,1.0,2.5,107.0,5.0,124.5,72.5,936.5,1606.0,2555.0,4.69,46484.5,0.0,0.0,14.5,32.455,3574.0,277.21785,284.0
max,2.55,46.0,0.75,6.0,50.0,2.0,11.0,128.0,8.0,277.0,84.0,2684.0,2445.0,5129.0,5.2,94418.0,1.0,1.0,25.0,33.77,6776.0,451.4227,300.0


In [17]:
print('Start date of the day_stat sample:', day_stat.date.min().strftime("%Y-%m-%d"))
print('End date of the day_stat sample:', day_stat.date.max().strftime("%Y-%m-%d"))

Start date of the day_stat sample: 2022-10-01
End date of the day_stat sample: 2022-10-31
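
The first and last dates do not show whether any day in between is missing or repeated. A minimal sketch of a continuity check on illustrative dates (the gap on 2022-10-03 is deliberate and not from the export):

```python
import pandas as pd

# Illustrative dates with one day deliberately left out.
dates = pd.to_datetime(pd.Series(["2022-10-01", "2022-10-02", "2022-10-04"]))

# Every calendar day between the first and the last observed date.
expected = pd.date_range(dates.min(), dates.max(), freq="D")

# Days that should be present but are not, and days that occur more than once.
missing_days = expected.difference(pd.DatetimeIndex(dates))
duplicated_days = dates[dates.duplicated()]
```

For a complete month both `missing_days` and `duplicated_days` would be empty.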


Let us check the `video_title` column for implicit duplicates:

In [18]:
content_stat.sort_values(by='video_title').video_title.unique()

array(['«10 Законов увеличения прибыли». Ирина Нарчемашвили | Саммари ®',
       '«13 вещей, которые не делают сильные духом люди». Эми Морин | Саммари',
       '«30 правил настоящего мечтателя». Ева Кац | Саммари ®',
       '«45 татуировок менеджера». Часть 1. Максим Батырев | Саммари ®',
       '«45 татуировок менеджера». Часть 2. Максим Батырев | Саммари ®',
       '«Alibaba». Джек Ма | Саммари ®',
       '«Deadline». Том ДеМарко | Саммари ®',
       '«SCRUM. Революционный метод управления проектами» Джефф Сазерленд | Саммари ®',
       '«Scrum». Джефф Сазерленд | Саммари',
       '«YouTube для бизнеса». Майкл Миллер | Саммари ®',
       '«YouTube. Как упаковать канал и бесплатно выйти в топ». Роман Сергеев | Саммари ®',
       '«Администратор Instagram». Дмитрий Кудряшов | Саммари ®',
       '«Альберт Эйнштейн». Уолтер Айзексон | Саммари',
       '«Антихрупкость». Нассим Талеб | Саммари ®',
       '«Без компромиссов». Стивен Хилл | Саммари',
       '«Бенджамин Франклин». Уолтер Айз

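**Conclusions:** no errors or implicit duplicates were found. The negative minimums of `subscribers`, `likes`, and `dislikes` are presumably net changes over the period (for example, more unsubscribes than new subscriptions), so they are not treated as errors.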
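
Scanning a sorted list of titles by eye works for a couple of hundred rows. The same check can be sketched programmatically by normalizing case and whitespace first. The titles below are hypothetical variants made up for illustration:

```python
import pandas as pd

# Hypothetical titles: the first two differ only in case and spacing.
titles = pd.Series([
    "«Scrum». Джефф Сазерленд | Саммари",
    "«SCRUM».  Джефф Сазерленд | Саммари ",
    "«Alibaba». Джек Ма | Саммари ®",
])

# Lowercase, collapse runs of whitespace, and trim the ends.
normalized = (
    titles.str.lower()
          .str.replace(r"\s+", " ", regex=True)
          .str.strip()
)

# Original titles whose normalized form occurs more than once.
implicit_duplicates = titles[normalized.duplicated(keep=False)]
```

This only catches differences in case and spacing. Titles that differ in wording still need a manual review.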

## Adding parameters

In [19]:
# add a column with the link to each video, placed right after video_title
content_stat.insert(2, 'link', 'https://www.youtube.com/watch?v=' + content_stat['content'])
display(content_stat.head(5))

Unnamed: 0,content,video_title,link,video_publish_time,clicks_per_end_screen_element_shown,end_screen_element_clicks,teaser_clicks_per_card_teaser_shown,card_teaser_clicks,card_clicks,clicks_per_card_shown,subscribers_gained,unique_viewers,returning_viewers,new_viewers,average_percentage_viewed,subscribers,comments_added,shares,dislikes,likes,views,watch_time,average_view_duration,impressions,impressions_click_through_rate
1,ZFcv558-PDo,«Подсознание может всё!» Джон Кехо | Саммари ®,https://www.youtube.com/watch?v=ZFcv558-PDo,2019-02-06,0.87,25,0.15,2,0,0.0,151,6807,1824,4983,36.1,-111,6.0,183,5,179,8090,483.5454,215,167318,2.98
2,73eE66Hk_1A,«От нуля к единице». Питер Тиль | Саммари ®,https://www.youtube.com/watch?v=73eE66Hk_1A,2022-10-04,1.27,33,0.0,0,0,0.0,25,6312,4154,2158,30.36,-48,12.0,76,14,291,7239,362.0123,180,130379,3.51
3,65Xci_uI_e0,«Магия утра». Хэл Элрод | Саммари,https://www.youtube.com/watch?v=65Xci_uI_e0,2017-01-30,1.36,25,0.1,1,1,20.0,89,4156,702,3454,31.1,87,1.0,174,1,98,5685,842.7552,533,21762,18.7
4,OSjaA86Z4Ho,"«Богатый папа, бедный папа». Роберт Кийосаки ...",https://www.youtube.com/watch?v=OSjaA86Z4Ho,2017-10-23,1.57,39,0.18,3,0,0.0,60,4247,1276,2971,34.46,59,1.0,96,4,138,4817,358.6836,268,103528,2.78
5,H3tIiHXFV3s,«Выйди из зоны комфорта». Брайан Трейси | Саммари,https://www.youtube.com/watch?v=H3tIiHXFV3s,2017-10-11,0.23,4,0.15,1,0,0.0,132,3635,682,2953,33.11,-124,1.0,162,2,186,4618,501.196,390,22263,15.03


In [20]:
# prepare the data for export: reshape the viewer counts from wide to long format
column_names = list(day_stat.columns)
column_names.remove('new_viewers')
column_names.remove('returning_viewers')

day_stat_rot = day_stat.melt(id_vars=column_names, value_vars=['new_viewers', 'returning_viewers'])
day_stat_rot = day_stat_rot.rename(columns={'variable': 'viewers_type', 'value': 'viewers_count'})

day_stat_rot = day_stat_rot.sort_values(by='date')
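
The wide-to-long reshape above can be illustrated on a toy frame. This is a minimal sketch with made-up numbers. It passes `var_name` and `value_name` to `melt` directly instead of renaming afterwards:

```python
import pandas as pd

# Toy stand-in for day_stat with only a few columns; the values are illustrative.
wide = pd.DataFrame({
    "date": pd.to_datetime(["2022-10-01", "2022-10-02"]),
    "views": [100, 120],
    "new_viewers": [60, 70],
    "returning_viewers": [30, 40],
})

# One row per (date, viewer type) instead of one column per viewer type.
long = wide.melt(
    id_vars=["date", "views"],
    value_vars=["new_viewers", "returning_viewers"],
    var_name="viewers_type",
    value_name="viewers_count",
).sort_values(by=["date", "viewers_type"]).reset_index(drop=True)
```

Identifier columns such as `views` are repeated once per viewer type in the long table, so a dashboard should not sum them across all rows.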

### Conclusion:

1. The data are ready for building the dashboard and for exploratory analysis.
2. The data cover the metrics requested by the customer, so those metrics can be displayed in the dashboard.

## Exporting data for the dashboard

In [21]:
content_stat.to_csv('content_stat_data.csv', index=False, encoding='utf-8')
day_stat_rot.to_csv('day_stat_rot_data.csv', index=False, encoding='utf-8')
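
Reading an exported file back is a quick way to confirm that non-ASCII titles and dates survive the export. A minimal sketch with a temporary file and illustrative data (the file name `export_check.csv` is hypothetical):

```python
import os
import tempfile

import pandas as pd

# Illustrative frame with a Cyrillic title and a date column.
frame = pd.DataFrame({
    "date": pd.to_datetime(["2022-10-01", "2022-10-02"]),
    "video_title": ["«Alibaba». Джек Ма | Саммари ®", "plain title"],
    "views": [10, 20],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "export_check.csv")
    frame.to_csv(path, index=False, encoding="utf-8")
    # Dates are stored as text in CSV, so they are parsed again on load.
    restored = pd.read_csv(path, encoding="utf-8", parse_dates=["date"])

titles_ok = restored["video_title"].tolist() == frame["video_title"].tolist()
dates_ok = (
    restored["date"].dt.strftime("%Y-%m-%d").tolist()
    == frame["date"].dt.strftime("%Y-%m-%d").tolist()
)
```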