## TV Ratings Analysis in Korea


#### First step: Collect Data from Nielsen Korea

Nielsen Korea publishes the top 20 TV programs in Korea by rating, updated every day (www.nielsenkorea.co.kr).

We can collect the data from the Nielsen Korea website with requests and BeautifulSoup.

In [None]:
# import libraries
from bs4 import BeautifulSoup
import requests


- Nielsen's web page loads its data from the URL "http://www.nielsenkorea.co.kr/tv_terrestrial_day.asp?menu=Tit_1", with sub_menu (TV category), area, and begin_date (date) as query parameters.
- We can fetch the page with requests and parse it with BeautifulSoup.

In [3]:
# set URL and parameters for sample data
url = "http://www.nielsenkorea.co.kr/tv_terrestrial_day.asp?menu=Tit_1"
url += "&sub_menu=1_1"  # terrestrial TV
url += "&area=01"  # Greater Seoul Area
url += "&begin_date=20260207"  # date (YYYYMMDD)
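
As an alternative to concatenating the query string by hand, the same URL could be assembled with the standard library's urlencode. This is only a minimal sketch: build_url and BASE_URL are hypothetical names that do not appear in the notebook, and the parameter values are the sample ones used above.

```python
from urllib.parse import urlencode

# Base address of the daily ranking page (taken from the URL above)
BASE_URL = "http://www.nielsenkorea.co.kr/tv_terrestrial_day.asp"


def build_url(sub_menu, area, begin_date):
    """Hypothetical helper: assemble the query string for the daily ranking page."""
    params = {
        "menu": "Tit_1",
        "sub_menu": sub_menu,      # e.g. "1_1" for terrestrial TV
        "area": area,              # e.g. "01" for the Greater Seoul Area
        "begin_date": begin_date,  # date as YYYYMMDD
    }
    # urlencode joins the pairs with "&" and escapes unsafe characters
    return f"{BASE_URL}?{urlencode(params)}"


sample_url = build_url("1_1", "01", "20260207")
print(sample_url)
```

requests.get also accepts a params dictionary, so the same dictionary could be passed there directly instead of building the string first.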

![Nielsen Korea Webpage](./images/webpage.jpg "Nielsen Korea")

In [8]:
# get page and parse html with BS4
response = requests.get(url)        
parsed_html = BeautifulSoup(response.content.decode(
    'utf-8', 'replace'), 'html.parser')


![Data Table](./images/data_table.jpg "Data Table on the page")

In [9]:
# find table and get rows for table
ranking_table = parsed_html.body.find_all(
    'table', attrs={'class': 'ranking_tb'})

rows = ranking_table[0].find_all('tr', attrs={'class': None})


In [None]:
# check the first row => title row ("Household Ratings TOP 20")
rows[0]

<tr>
<td class="tb_title" colspan="4"> 가구시청률 TOP 20 </td>
</tr>

In [19]:
# check the second row => additional information for the table
# (basis: Seoul metropolitan area, households; unit: %)
rows[1]

<tr>
<td align="right" class="txt_9pt" colspan="4">
                
                                (분석기준: 수도권, 가구, 단위:%)
                            

                </td></tr>

In [21]:
# from the third row on, each row holds the data we need
# Rank / Channel / Program / Rating
rows[2]

<tr>
<td class="tb_txt_center">1	</td>
<td class="tb_txt_center">KBS2	</td>
<td class="tb_txt">KBS2주말드라마(사랑을처방해드립니다)	</td>
<td align="center" class="percent">
                        14.7                   
                        </td>
</tr>
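
The row structure shown above can be tried out offline on a small HTML string. The sketch below is not the notebook's own method: SAMPLE_HTML is a trimmed, hand-copied version of the three rows printed above, and parse_row is a hypothetical helper. Instead of skipping a fixed number of header rows, it assumes that data rows are exactly the rows with four cells.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the rows shown above (not fetched from the site)
SAMPLE_HTML = """
<table class="ranking_tb">
<tr><td class="tb_title" colspan="4"> 가구시청률 TOP 20 </td></tr>
<tr><td align="right" class="txt_9pt" colspan="4">
    (분석기준: 수도권, 가구, 단위:%)
</td></tr>
<tr>
<td class="tb_txt_center">1 </td>
<td class="tb_txt_center">KBS2 </td>
<td class="tb_txt">KBS2주말드라마(사랑을처방해드립니다) </td>
<td align="center" class="percent">
    14.7
</td>
</tr>
</table>
"""


def parse_row(row):
    """Hypothetical helper: turn one <tr> into a dict, or None for header rows."""
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) != 4:
        # title and note rows have a single cell spanning four columns
        return None
    rank, channel, programme, rating = cells
    return {
        "rank": int(rank),
        "channel": channel,
        "programme": programme,
        "rating": float(rating),
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", attrs={"class": "ranking_tb"})
parsed = [item for item in map(parse_row, table.find_all("tr")) if item is not None]
print(parsed)
```

Checking the cell count rather than the row position may be a little more tolerant if the number of header rows ever changes, but that is an assumption about the page, not something the site documents.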

In [None]:
# print all ratings in the table
# skip the two header rows (title and notes)
for row in rows[2:]:

    # extract row
    cells = row.find_all('td')
    rank = cells[0].text.strip()
    channel = cells[1].text.strip()
    programme = cells[2].text.strip()
    rating = cells[3].text.strip()

    # convert data to dictionary
    item = {
        'rank': int(rank),
        'channel': channel,
        'rating': float(rating),
        'programme': programme,
    }

    print(
        f"{rank} {channel} {programme} {rating}")


1 KBS2 KBS2주말드라마(사랑을처방해드립니다) 14.7
2 MBC MBC금토드라마(판사이한영) 12.1
3 KBS2 KBS2토일미니시리즈(은애하는도적님아) 6.8
4 MBC MBC뉴스데스크 5.8
5 KBS2 불후의명곡 5.3
6 KBS2 살림하는남자들 4.6
7 KBS1 KBS9시뉴스 4.4
7 MBC 놀면뭐하니 4.4
9 KBS1 동네한바퀴 4.3
10 KBS1 시니어토크쇼황금연못 4.1
11 KBS1 KBS뉴스(09:30) 3.9
12 MBC 전지적참견시점 3.8
13 SBS SBS8뉴스 3.7
14 SBS 그것이알고싶다 3.6
15 KBS1 걸어서세계속으로 3.5
15 KBS1 남북의창 3.5
17 KBS1 특파원보고세계는지금 3.2
17 MBC MBC금토드라마(판사이한영<재>) 3.2
19 KBS1 KBS뉴스광장2부 3.1
19 KBS1 KBS뉴스(19:00) 3.1


In [24]:
# define a reusable get_ratings function
def get_ratings(date, category, area, test=False):
    """Get ratings from Nielsen Korea.

    Returns the top-ranked programs for the given date, category, and area.
    date     : string (YYYYMMDD)
    category : integer, channel category (1: terrestrial, 2: general, 3: cable)
    area     : integer, area code (0: nationwide, 1: Greater Seoul Area)
    test     : if True, print each row as it is parsed
    """

    url_string = f"http://www.nielsenkorea.co.kr/tv_terrestrial_day.asp?menu=Tit_1"
    url_string += f"&sub_menu={category}_1"
    # zero-pad the area code so that 1 becomes "01", as in the sample URL above
    url_string += f"&area={int(area):02d}&begin_date={date}"

    try:
        response = requests.get(url_string)
        parsed_html = BeautifulSoup(response.content.decode(
            'utf-8', 'replace'), 'html.parser')
        ranking_tb = parsed_html.body.find_all(
            'table', attrs={'class': 'ranking_tb'})

        rows = ranking_tb[0].find_all('tr', attrs={'class': None})
    except Exception:
        print("Error occurred while accessing data on Nielsen:", date, category, area)
        return []

    result = []
    # skip the two header rows (title and notes)
    for row in rows[2:]:

        # extract row
        items = row.find_all('td')
        rank = items[0].text.strip()
        channel = items[1].text.strip()
        programme = items[2].text.strip()
        rating = items[3].text.strip()

        # convert data to dictionary
        item = {
            'rank': int(rank),
            'channel': channel,
            'rating': float(rating),
            'programme': programme,
            'date': date[:4] + '-' + date[4:6] + '-' + date[6:8]
        }

        if test:
            print(
                f"TEST MODE : {date} {rank} {channel} {programme} {rating}")

        result.append(item)

    return result
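
Since get_ratings takes the date as a YYYYMMDD string, collecting several days means generating those strings first. The sketch below shows one way this might be done; date_strings is a hypothetical helper that is not part of the notebook, and the call to get_ratings is left commented out because it needs network access.

```python
from datetime import date, timedelta


def date_strings(start, end):
    """Hypothetical helper: YYYYMMDD strings for each day from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=n)).strftime("%Y%m%d") for n in range(days + 1)]


dates = date_strings(date(2026, 2, 5), date(2026, 2, 7))
print(dates)  # ['20260205', '20260206', '20260207']

# With get_ratings defined as above, the results could be collected like this:
# all_ratings = []
# for d in dates:
#     all_ratings.extend(get_ratings(d, 1, 1))
```

Because get_ratings returns an empty list on failure, extending the list is safe even for dates on which the request fails.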
