## Merge Circle Chart Data

Album sales figures on Circle Chart (https://circlechart.kr/) are updated at 10 a.m. on the second Thursday of each month.

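Circle Chart is a Korean music chart that receives official K-pop data feeds from domestic and global music service platforms. Its monthly album sales figures are a good indicator for estimating entertainment agencies' month-to-month performance.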

In [1]:
CHART_PATH = "./data/gaon_chart_all.csv"
PRODUCER_PATH = "./data/producer_all.csv"
OUTPUT_PATH = "./data/gaon_chart_all_cleanup.xlsx"

In [2]:
import pandas as pd
import numpy as np

sales = pd.read_csv(
    CHART_PATH, usecols=["selector", "production", "title", "artist", "sales_volume"]
)
sales.rename(columns={"selector": "month"}, inplace=True)
sales = sales.astype({"month": "int32"}, errors="raise")
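# "sales_volume" holds "monthly/annual"; split it into two columns.
# Pass n by keyword: newer pandas versions reject the positional form.
# Both new columns remain strings after the split.
sales[["monthly_sales", "annual_sales"]] = sales["sales_volume"].str.split(
    "/", n=1, expand=True
)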
sales = sales.drop(columns=["sales_volume"]).drop_duplicates(
    ["title", "artist", "month"]
)
sales = sales.reindex(
    columns=["month", "title", "artist", "production", "monthly_sales", "annual_sales"]
)
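sales = sales.sort_values(by=["month", "monthly_sales"])
# This second sort replaces the ordering above, so the output below is ordered
# by monthly_sales alone (compared as strings, since the column is not numeric).
sales = sales.sort_values(by=["monthly_sales"])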
sales.head()

| | month | title | artist | production | monthly_sales | annual_sales |
|---|---|---|---|---|---|---|
| 4914 | 1803 | Max&Match (Repackage) | 오드아이써클 (이달의 소녀) | 윈드밀미디어 | 1000 | 1000 |
| 4356 | 1802 | 종현 소품집 `이야기 Op.2` (SMC) | 종현 (JONGHYUN) | SM Entertainment | 1000 | 1142 |
| 3446 | 1908 | The Movie Star | 비와이(BewhY) | Kakao Entertainment | 1000 | 1000 |
| 3299 | 1802 | 좋아 - The 1st Album (SMC) | 종현 (JONGHYUN) | SM Entertainment | 1000 | 1247 |
| 3224 | 1802 | Married To The Music - The 4th Album R... | 샤이니 (SHINee) | SM Entertainment | 1000 | 1000 |
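
Because the split leaves `monthly_sales` and `annual_sales` as strings, sorting compares them character by character. The sketch below uses made-up rows in the same `monthly/annual` format to show the difference once the columns are converted with `pd.to_numeric`. The `toy` frame and its values are hypothetical, and the sketch assumes the raw figures contain no thousands separators.

```python
import pandas as pd

# Hypothetical rows in the same "monthly/annual" format as `sales_volume`.
toy = pd.DataFrame({"sales_volume": ["9697/75921", "224341/224341", "1000/1142"]})

# Split once on "/" and convert both parts to numbers.
# errors="coerce" turns anything unparsable into NaN instead of raising.
parts = toy["sales_volume"].str.split("/", n=1, expand=True)
toy["monthly_sales"] = pd.to_numeric(parts[0], errors="coerce")
toy["annual_sales"] = pd.to_numeric(parts[1], errors="coerce")

# Sorting the raw strings is lexicographic; sorting the numbers is by value.
as_text = parts[0].sort_values().tolist()
as_number = toy["monthly_sales"].sort_values().tolist()

print(as_text)    # string order
print(as_number)  # numeric order
```

With numeric columns, a later sum or sort operates on sales counts rather than on text.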


In [3]:
# Load producer (agency) data from the raw Gaon chart export
producer = pd.read_csv(PRODUCER_PATH, usecols=["link", "artist", "producer"])
producer.rename(columns={"link": "month"}, inplace=True)
producer = producer.astype({"month": "int32"}, errors="raise")
# Keep only the text before the first "|" in the artist field
producer["artist"] = producer["artist"].str.split("|", n=1).str[0]
producer = producer.drop_duplicates(subset=["artist", "producer"], keep="first")
# Caution: some artists changed agencies, so they appear with more than one producer
producer = producer.reindex(columns=["artist", "month", "producer"])
producer = producer.sort_values(by=["artist", "month"])
producer.head()

| | artist | month | producer |
|---|---|---|---|
| 24 | #안녕 | 2011 | 시애틀뮤직 |
| 181 | (여자)아이들 | 2004 | Stone Music Entertainment |
| 60 | (여자)아이들 | 2101 | 큐브엔터테인먼트 |
| 2741 | (여자)아이들, Madison Beer, Jaira Burns | 1811 | 라이엇 게임즈 |
| 103 | 10cm | 1803 | 매직스트로베리사운드 |
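
The `month` values in both frames look like two-digit year plus two-digit month codes (for example, 1801 for January 2018). The notebook never states this, so treat it as an assumption. Under that assumption, a minimal sketch for turning the integer codes into monthly periods could look like this; the `codes` series is hypothetical.

```python
import pandas as pd

# Assumption: codes are YYMM integers, e.g. 1801 = January 2018.
codes = pd.Series([1801, 1908, 2208])

# Parse as two-digit year + month, then reduce to a monthly period.
periods = pd.to_datetime(codes.astype(str), format="%y%m").dt.to_period("M")
labels = periods.astype(str).tolist()

print(labels)
```

Period values sort chronologically and make the month columns easier to read in later tables.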


In [4]:
# Look up each artist's producer and attach it to the sales rows
new_sales = pd.merge(
    left=sales, right=producer, how="left", on="artist"
)  # left join: artists without a producer record are kept
# Clean up the merged data
new_sales["month_y"] = new_sales["month_y"].fillna(1800)
new_sales["producer"] = new_sales["producer"].fillna("미상")  # "미상" = unknown
new_sales = new_sales.astype({"month_y": "int32"}, errors="raise")
# Intended to find the producer the artist belonged to before the album's release.
# NOTE: the result of this filter is never assigned, so it has no effect;
# the sort and drop_duplicates below keep the earliest producer record per album.
new_sales[new_sales["month_x"] >= new_sales["month_y"]]
new_sales = new_sales.sort_values("month_y", ascending=True).drop_duplicates(
    ["title", "artist", "month_x"]
)
new_sales.rename(columns={"month_x": "month"}, inplace=True)
new_sales = new_sales.drop(columns=["month_y"])
new_sales = new_sales.reindex(
    columns=[
        "month",
        "title",
        "artist",
        "producer",
        "production",
        "monthly_sales",
        "annual_sales",
    ]
)
new_sales = new_sales.sort_values(by=["month", "monthly_sales"])
new_sales

| | month | title | artist | producer | production | monthly_sales | annual_sales |
|---|---|---|---|---|---|---|---|
| 78 | 1801 | Hey Mama! - The 1st Mini Album | EXO-CBX (첸백시) | SM Entertainment | 지니뮤직 | 1045 | 1045 |
| 112 | 1801 | Spotlight | VAV | 미상 | 지니뮤직 | 1062 | 1062 |
| 184 | 1801 | The 5th Mini Album Repackage `RAINBOW`... | 여자친구 (GFRIEND) | 쏘스뮤직 | Kakao Entertainment | 1100 | 1100 |
| 283 | 1801 | The 5th Mini Album Repackage `RAINBOW` | 여자친구 (GFRIEND) | 쏘스뮤직 | Kakao Entertainment | 1140 | 1140 |
| 364 | 1801 | Red Diary Page.1 | 볼빨간사춘기 | 더하기미디어 | Kakao Entertainment | 1179 | 1179 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 7967 | 2208 | THE ALBUM | BLACKPINK | YG Entertainment | YG PLUS | 9697 | 75921 |
| 7976 | 2208 | Sequence : 7272 | 첫사랑(CSR) | 미상 | Kakao Entertainment | 9714 | 17414 |
| 8003 | 2208 | NCT #127 WE ARE SUPERHUMAN - The 4th Mini Albu... | NCT 127 | SM Entertainment | Dreamus | 9780 | 10070 |
| 8054 | 2208 | SPECIAL ALBUM [Storage of ONF] (META) | 온앤오프 (ONF) | WM엔터테인먼트 | Sony Music | 9963 | 9963 |
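
The cell above aims to attach the producer an artist belonged to at the time of each release. One possible way to express that lookup is `pd.merge_asof`, which matches each sales row to the most recent producer record at or before its month. This is a different technique from the notebook's merge followed by `drop_duplicates`, shown here only as a sketch. The frames `sales_toy` and `producer_toy` and the agency names are hypothetical.

```python
import pandas as pd

# Hypothetical sales rows: artist "A" releases before and after an agency change.
sales_toy = pd.DataFrame({
    "month": [2012, 2203, 1801],
    "artist": ["A", "A", "B"],
    "title": ["first", "second", "debut"],
})
# Hypothetical producer records: "A" moves agencies in 2101; "B" has no record.
producer_toy = pd.DataFrame({
    "artist": ["A", "A"],
    "month": [2004, 2101],
    "producer": ["Old Agency", "New Agency"],
})

# merge_asof needs both frames sorted by the `on` key.
# direction="backward" picks the latest producer month <= the sales month,
# matched within each artist via `by`.
matched = pd.merge_asof(
    sales_toy.sort_values("month"),
    producer_toy.sort_values("month"),
    on="month",
    by="artist",
    direction="backward",
)
# Rows without any earlier producer record get the same placeholder as the notebook.
matched["producer"] = matched["producer"].fillna("미상")

result = dict(zip(matched["title"], matched["producer"]))
print(result)
```

Both `month` columns must share one dtype for `merge_asof`; in the notebook both are cast to `int32`.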


In [5]:
sales_table = pd.pivot_table(
    new_sales,
    values="monthly_sales",
    index=["producer", "artist", "title"],
    columns="month",
    aggfunc="sum",
).fillna(0)
sales_table

| producer | artist | title | 1801 | 1802 | 1803 | 1804 | 1805 | 1806 | 1807 | 1808 | 1809 | 1810 | ... | 2111 | 2112 | 2201 | 2202 | 2203 | 2204 | 2205 | 2206 | 2207 | 2208 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AOMG | GRAY (그레이) | grayground. | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AOMG | 로꼬 (LOCO) | HELLO | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AOMG | 사이먼 도미닉 (Simon Dominic) | DARKROOM | 0 | 0 | 0 | 0 | 0 | 0 | 1000 | 1705 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AOMG | 우원재 | BLACK OUT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AOMG | 우원재 | af | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 하이업엔터테인먼트 | STAYC(스테이씨) | WE NEED LOVE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 224341 | 31990 |
| 하이업엔터테인먼트 | STAYC(스테이씨) | YOUNG-LUV.COM | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 212211 | 19810 | 8078 | 4366 | 0 | 0 | 0 |
| 후크엔터테인먼트 | 이선희 | 르 데르니에 아무르 (마지막 사랑) | 0 | 0 | 0 | 0 | 0 | 3829 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 후크엔터테인먼트 | 이선희 | 안부 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
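
Since the stated goal is to estimate each agency's monthly performance, the per-title table can be rolled up to one row per producer. The sketch below uses a small hypothetical frame with numeric `monthly_sales`; the agency, artist, and title names are placeholders, and it assumes the sales column has already been converted to numbers.

```python
import pandas as pd

# Hypothetical rows with the same columns as `new_sales`, sales already numeric.
toy = pd.DataFrame({
    "producer": ["Agency X", "Agency X", "Agency Y"],
    "artist": ["a1", "a2", "b1"],
    "title": ["t1", "t2", "t3"],
    "month": [2207, 2207, 2208],
    "monthly_sales": [100, 50, 30],
})

# One row per producer, one column per month, summing across all titles.
# fill_value=0 marks months in which a producer had no charted sales.
by_agency = toy.pivot_table(
    values="monthly_sales",
    index="producer",
    columns="month",
    aggfunc="sum",
    fill_value=0,
)
print(by_agency)
```

Keeping `artist` and `title` out of the index is what collapses the table to agency totals.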


In [6]:
# Write each DataFrame to its own sheet, using XlsxWriter as the engine.
# The context manager saves and closes the file on exit
# (ExcelWriter.save() was removed in pandas 2.0).
with pd.ExcelWriter(OUTPUT_PATH, engine="xlsxwriter") as writer:
    sales_table.to_excel(writer, sheet_name="cleanup")
    new_sales.to_excel(writer, sheet_name="sales_with_producer")
    sales.to_excel(writer, sheet_name="raw_sales")
    producer.to_excel(writer, sheet_name="raw_producer")