# 2022 Instagram Data Report for NCT Members

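Instagram is one of the largest overseas social platforms, and its like and comment counts are often used as an important reference for an artist's commercial value.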

Today we'll analyze the Instagram data of the NCT members.

## Tool setup
There are many tools online for scraping Instagram data. I picked a package called `instaloader`.

In [None]:
# Install instaloader
!pip3 install instaloader

Collecting instaloader
  Downloading instaloader-4.9.tar.gz (59 kB)
Building wheels for collected packages: instaloader
  Building wheel for instaloader (setup.py) ... done
  Created wheel for instaloader: filename=instaloader-4.9-py3-none-any.whl size=61065 sha256=ad1fceedf64b744392d95e51dca87dcb1fc61a18d8e5965e0904a73b0e0c2891
  Stored in directory: /root/.cache/pip/wheels/85/fb/29/7d540da52b65c8d4718cbb0e24a057d2c0071174716391bd85
Successfully built instaloader
Installing collected packages: instaloader
Successfully installed instaloader-4.9


Besides `instaloader`, we also need to load a few other packages; their uses are covered later.

In [None]:
import instaloader
import pandas as pd
import time

The code below is something I found online. Its main features include scraping a given account's followees, followers, posts, and so on.

Of these, `get_user_info` is the one we care about most; we will use it later to fetch each artist's profile.

In [43]:
from datetime import datetime
from itertools import dropwhile, takewhile
import csv

class GetInstagramProfile():
    def __init__(self) -> None:
        self.L = instaloader.Instaloader()
        self.L.login(input("Your Instagram ID: "), input("Your Instagram password: "))

    def download_users_profile_picture(self,username):
        self.L.download_profile(username, profile_pic_only=True)

    def download_users_posts_with_periods(self,username):
        posts = instaloader.Profile.from_username(self.L.context, username).get_posts()
        SINCE = datetime(2021, 8, 28)
        UNTIL = datetime(2021, 9, 30)

        for post in takewhile(lambda p: p.date > SINCE, dropwhile(lambda p: p.date > UNTIL, posts)):
            self.L.download_post(post, username)

    def download_hastag_posts(self, hashtag):
        for post in instaloader.Hashtag.from_name(self.L.context, hashtag).get_posts():
            self.L.download_post(post, target='#'+hashtag)

    def get_user_info(self, user_name):
        return instaloader.Profile.from_username(self.L.context, user_name)
        
    def get_users_followers(self,user_name):
        profile = instaloader.Profile.from_username(self.L.context, user_name)
        for followee in profile.get_followers():
            username = followee.username
            print(username)

    def get_users_followings(self,user_name):
        profile = instaloader.Profile.from_username(self.L.context, user_name)
        for followee in profile.get_followees():
            username = followee.username
            print(username)

    def get_post_comments(self,username):
        posts = instaloader.Profile.from_username(self.L.context, username).get_posts()
        for post in posts:
            for comment in post.get_comments():
                print("comment.id  : "+str(comment.id))
                print("comment.owner.username  : "+comment.owner.username)
                print("comment.text  : "+comment.text)
                print("comment.created_at_utc  : "+str(comment.created_at_utc))
                print("************************************************")

    def get_post_info_csv(self,username):
        with open(username+'.csv', 'w', newline='', encoding='utf-8') as file:
            writer = csv.writer(file)
            posts = instaloader.Profile.from_username(self.L.context, username).get_posts()
            for post in posts:
                print("post date: "+str(post.date))
                print("post profile: "+post.profile)
                print("post caption: "+str(post.caption))  # caption can be None for posts without text
                print("post location: "+str(post.location))
                
                posturl = "https://www.instagram.com/p/"+post.shortcode
                print("post url: "+posturl)
                writer.writerow(["post",post.mediaid, post.profile, post.caption, post.date, post.location, posturl,  post.typename, post.mediacount, post.caption_hashtags, post.caption_mentions, post.tagged_users, post.likes, post.comments,  post.title,  post.url ])
            
                for comment in post.get_comments():
                    writer.writerow(["comment",comment.id, comment.owner.username,comment.text,comment.created_at_utc])
                    print("comment username: "+comment.owner.username)
                    print("comment text: "+comment.text)
                    print("comment date : "+str(comment.created_at_utc))
                print("\n\n")

## Creating the scraper and fetching data
Now let's create a scraper. After you run the line below, you will be prompted for your Instagram account and password.

In [44]:
loader = GetInstagramProfile()

At the time of writing (May 21, Beijing time), 19 of NCT's 23 members have opened Instagram accounts. Their Instagram IDs are:

In [None]:
nct_member_instagram_ids = {
    'lucas_xx444',
    'wwiinn_7',
    'yangyang_x2',
    'i_m_hendery',
    'djxiao_888',
    'kun11xd',
    'tenlee_1001',
    'sugaringcandy',
    'haechanahceah',
    'mo.on_air',
    'yellow_3to3',
    '_shotaroo_',
    'taeoxo_nct',
    'onyourm__ark',
    'do0_nct',
    'na.jaemin0813',
    'yuu_taa_1026',
    '_jeongjaehyun',
    'johnnyjsuh'
}
len(nct_member_instagram_ids)

19

Next we fetch each member's profile and store it in a `dict` named `profiles`.

In [None]:
profiles = {id: loader.get_user_info(id) for id in nct_member_instagram_ids}
profiles

{'_jeongjaehyun': <Profile _jeongjaehyun (30819697525)>,
 '_shotaroo_': <Profile _shotaroo_ (47383683179)>,
 'djxiao_888': <Profile djxiao_888 (26884960585)>,
 'do0_nct': <Profile do0_nct (43802283829)>,
 'haechanahceah': <Profile haechanahceah (52136302807)>,
 'i_m_hendery': <Profile i_m_hendery (29833349541)>,
 'johnnyjsuh': <Profile johnnyjsuh (30836608367)>,
 'kun11xd': <Profile kun11xd (26471022476)>,
 'lucas_xx444': <Profile lucas_xx444 (21465052749)>,
 'mo.on_air': <Profile mo.on_air (48418985942)>,
 'na.jaemin0813': <Profile na.jaemin0813 (31025529530)>,
 'onyourm__ark': <Profile onyourm__ark (45321619558)>,
 'sugaringcandy': <Profile sugaringcandy (52651073589)>,
 'taeoxo_nct': <Profile taeoxo_nct (45766699924)>,
 'tenlee_1001': <Profile tenlee_1001 (21680090333)>,
 'wwiinn_7': <Profile wwiinn_7 (38738692372)>,
 'yangyang_x2': <Profile yangyang_x2 (32337725717)>,
 'yellow_3to3': <Profile yellow_3to3 (48210345716)>,
 'yuu_taa_1026': <Profile yuu_taa_1026 (30670957326)>}

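Here we only want to analyze the members' posts from 2022 onward, so I wrote a function that filters each member's posts since the start of 2022 and returns the post time, like count, and comment count of each post.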

In [None]:
def get_data_later_than_2022(uid):
  res = []
  posts = profiles[uid].get_posts()  # get all posts
  for post in posts:
    date = post.date
    if date >= datetime(2022, 1, 1):  # keep posts dated 2022-01-01 or later
      # store the poster, post date, like count, and comment count
      res.append((uid, date, post.likes, post.get_comments().count))
    else:
      return res
  return res
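
The early `return` relies on the posts arriving newest first: once one post is older than the cutoff, the loop assumes every remaining post is older too. The sketch below expresses the same idea with `itertools.takewhile` on stand-in objects, so it runs without logging in. `FakePost`, `collect_since`, and the sample values are hypothetical; they only mimic the `date` and `likes` attributes used here plus a plain comment count.

```python
from collections import namedtuple
from datetime import datetime
from itertools import takewhile

# Hypothetical stand-in for an instaloader Post; only the fields used below.
FakePost = namedtuple("FakePost", ["date", "likes", "comments"])

def collect_since(uid, posts, cutoff):
    """Collect (uid, date, likes, comments) for posts dated on or after cutoff.

    Assumes `posts` is ordered newest first, so iteration can stop at the
    first post older than the cutoff.
    """
    recent = takewhile(lambda p: p.date >= cutoff, posts)
    return [(uid, p.date, p.likes, p.comments) for p in recent]

sample_posts = [
    FakePost(datetime(2022, 3, 1), 300, 30),
    FakePost(datetime(2022, 1, 1), 200, 20),
    FakePost(datetime(2021, 12, 31, 23, 59), 100, 10),
    FakePost(datetime(2022, 2, 1), 999, 99),  # out of order: never reached
]

rows = collect_since("demo_user", sample_posts, datetime(2022, 1, 1))
print(len(rows))  # 2
```

The last sample post shows the limitation: if the input is not strictly newest first, anything after the first old post is silently skipped.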

Now we can scrape all of the members' 2022 posts. This step takes a while.

In [None]:
data_dict = {}
for uid in nct_member_instagram_ids:
  print("Processing", uid, "...")
  data_dict[uid] = get_data_later_than_2022(uid)
  time.sleep(5)  # scraping too frequently may be detected by the platform,
  # so we pause for five seconds after fetching each member's posts

print("Done!")

Processing onyourm__ark ...
Processing sugaringcandy ...
Processing tenlee_1001 ...
Processing wwiinn_7 ...
Processing i_m_hendery ...
Processing johnnyjsuh ...
Processing _jeongjaehyun ...
Processing djxiao_888 ...
Processing yangyang_x2 ...
Processing do0_nct ...
Processing _shotaroo_ ...
Processing mo.on_air ...
Processing yellow_3to3 ...
Processing kun11xd ...
Processing na.jaemin0813 ...
Processing yuu_taa_1026 ...
Processing lucas_xx444 ...
Processing haechanahceah ...
Processing taeoxo_nct ...
Done!
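
The pause between accounts can be factored into a small helper. The following is a minimal sketch, not part of the original notebook: `fetch_with_pause`, its parameters, and the random jitter are assumptions for illustration, and the `fetch` argument stands in for `get_data_later_than_2022`. Passing the sleep function in as a parameter makes it possible to try the loop without actually waiting.

```python
import random
import time

def fetch_with_pause(ids, fetch, base_delay=5.0, jitter=2.0, sleep=time.sleep):
    """Call fetch(uid) for each id, pausing between consecutive calls.

    The pause is base_delay plus a random extra of up to `jitter` seconds,
    so requests are not evenly spaced. No pause follows the last id.
    """
    ids = list(ids)
    results = {}
    for i, uid in enumerate(ids):
        results[uid] = fetch(uid)
        if i < len(ids) - 1:
            sleep(base_delay + random.uniform(0, jitter))
    return results

# Dry run with a fake fetch function and a sleep that only records the delay.
pauses = []
demo = fetch_with_pause(["a", "b", "c"], fetch=lambda uid: uid.upper(),
                        sleep=pauses.append)
print(demo)         # {'a': 'A', 'b': 'B', 'c': 'C'}
print(len(pauses))  # 2
```

In the notebook this would be called as `fetch_with_pause(nct_member_instagram_ids, get_data_later_than_2022)`, which leaves `sleep` at its real default.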


We now have a `dict` named `data_dict` that contains the posts made since the start of 2022 by all 19 members.

## Processing the data

### Per-member analysis
Here we create an empty `list` to hold each member's average like count and average comment count.

In [None]:
avg = []

#### Yuta

In [None]:
id = 'yuu_taa_1026'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1341344.8 Comments: 12543.466666666667


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,yuu_taa_1026,2022-05-14 14:23:47,1207893,10650
1,yuu_taa_1026,2022-05-09 03:18:49,1586188,14020
2,yuu_taa_1026,2022-04-21 01:29:02,1465000,41593
3,yuu_taa_1026,2022-04-07 13:16:48,755008,4120
4,yuu_taa_1026,2022-03-28 16:31:25,1268805,7101
5,yuu_taa_1026,2022-03-28 05:34:40,1819436,18210
6,yuu_taa_1026,2022-03-26 14:04:08,1520531,10918
7,yuu_taa_1026,2022-03-24 06:21:30,1515681,14924
8,yuu_taa_1026,2022-02-26 03:08:43,911458,6636
9,yuu_taa_1026,2022-02-26 03:08:01,1016805,5889
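
The cell above is repeated for every account below with only the ID changed. One possible way to avoid the repetition is a small helper. This is a sketch rather than the notebook's own code: `summarize_member`, `COLUMNS`, and the toy `demo_data` are hypothetical names, and the sample numbers are made up.

```python
import pandas as pd

COLUMNS = ['User ID', 'Publish Date', 'Likes', 'Comments']

def summarize_member(uid, data, averages):
    """Build the per-member table and record its mean likes and comments.

    `data` maps a user ID to a list of (uid, date, likes, comments) tuples,
    like `data_dict`; `averages` is a list such as `avg` that receives one
    (uid, mean_likes, mean_comments) tuple per call.
    """
    df = pd.DataFrame(data[uid], columns=COLUMNS)
    mean_likes = df['Likes'].mean()
    mean_comments = df['Comments'].mean()
    averages.append((uid, mean_likes, mean_comments))
    return df

# Toy data standing in for data_dict (values are made up).
demo_data = {'demo_user': [('demo_user', '2022-01-02', 100, 10),
                           ('demo_user', '2022-01-01', 300, 30)]}
demo_avg = []
demo_df = summarize_member('demo_user', demo_data, demo_avg)
print(demo_avg)
```

With this helper, each per-member cell would shrink to a single call such as `summarize_member('yuu_taa_1026', data_dict, avg)`, and the name `id` would no longer shadow Python's built-in function.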


#### Renjun

In [None]:
id = 'yellow_3to3'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1951307.5714285714 Comments: 38836.357142857145


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,yellow_3to3,2022-05-20 07:20:46,1774194,19910
1,yellow_3to3,2022-05-14 16:44:26,2159761,47677
2,yellow_3to3,2022-04-02 15:20:08,1377805,15342
3,yellow_3to3,2022-04-02 15:19:29,1200223,9452
4,yellow_3to3,2022-04-02 15:18:52,1705026,19707
5,yellow_3to3,2022-03-31 10:35:59,2042448,30812
6,yellow_3to3,2022-03-28 06:48:25,2293252,46701
7,yellow_3to3,2022-03-28 04:52:11,1930269,21492
8,yellow_3to3,2022-03-27 09:15:17,1972935,26013
9,yellow_3to3,2022-03-26 04:59:29,2188309,47049


#### Yangyang

In [None]:
id = 'yangyang_x2'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1053591.8 Comments: 8616.55


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,yangyang_x2,2022-05-19 06:59:41,752298,4308
1,yangyang_x2,2022-05-16 09:36:08,897234,6029
2,yangyang_x2,2022-05-15 05:04:22,771347,3771
3,yangyang_x2,2022-05-14 01:03:09,1013937,9392
4,yangyang_x2,2022-05-05 07:56:36,1288112,17054
5,yangyang_x2,2022-04-20 02:40:58,1190005,9308
6,yangyang_x2,2022-04-11 13:15:36,1062995,8876
7,yangyang_x2,2022-03-15 06:44:52,1153544,8181
8,yangyang_x2,2022-03-07 14:29:20,1118448,8151
9,yangyang_x2,2022-02-22 08:05:52,1105928,8830


#### Winwin

In [None]:
id = 'wwiinn_7'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1448837.12 Comments: 13114.08


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,wwiinn_7,2022-05-18 17:07:45,1171346,10410
1,wwiinn_7,2022-05-18 17:01:55,1391265,13699
2,wwiinn_7,2022-05-16 11:49:29,1488600,17302
3,wwiinn_7,2022-05-09 04:04:37,1000468,4180
4,wwiinn_7,2022-05-01 18:04:26,1243204,7071
5,wwiinn_7,2022-04-25 09:27:23,1424815,8788
6,wwiinn_7,2022-04-21 08:13:24,1111908,5301
7,wwiinn_7,2022-04-11 10:45:57,1380826,10981
8,wwiinn_7,2022-04-08 06:58:59,1578412,15260
9,wwiinn_7,2022-03-06 13:50:18,1506703,13335


#### Ten

In [None]:
id = 'tenlee_1001'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1332710.4838709678 Comments: 15059.967741935483


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,tenlee_1001,2022-05-18 04:42:58,605349,3565
1,tenlee_1001,2022-05-05 07:48:56,1743406,22275
2,tenlee_1001,2022-05-01 14:50:19,893573,16680
3,tenlee_1001,2022-04-24 09:22:06,773247,10457
4,tenlee_1001,2022-04-23 14:17:03,593958,10458
5,tenlee_1001,2022-04-23 13:37:04,970580,8546
6,tenlee_1001,2022-04-22 08:24:32,1441476,16410
7,tenlee_1001,2022-04-20 09:17:19,655934,5219
8,tenlee_1001,2022-04-16 09:46:11,1632260,20297
9,tenlee_1001,2022-04-14 14:26:51,848562,5333


#### Taeyong

In [None]:
id = 'taeoxo_nct'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1778088.1612903227 Comments: 18326.709677419356


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,taeoxo_nct,2022-05-15 06:47:50,1550191,11650
1,taeoxo_nct,2022-05-12 09:21:15,1351086,13919
2,taeoxo_nct,2022-05-05 14:19:52,1987923,25828
3,taeoxo_nct,2022-05-05 08:56:39,2006071,22118
4,taeoxo_nct,2022-04-30 09:16:07,1685918,11066
5,taeoxo_nct,2022-04-27 12:35:39,1438306,11458
6,taeoxo_nct,2022-04-14 09:31:48,1762496,17042
7,taeoxo_nct,2022-04-12 11:41:55,1435202,11647
8,taeoxo_nct,2022-04-10 10:22:04,1690677,11095
9,taeoxo_nct,2022-04-09 12:38:38,1795288,10819


#### Jungwoo

In [None]:
id = 'sugaringcandy'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1464492.8 Comments: 41361.6


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,sugaringcandy,2022-05-12 15:08:28,1401675,25807
1,sugaringcandy,2022-05-07 08:05:35,1325807,18251
2,sugaringcandy,2022-04-11 07:58:17,1334018,12603
3,sugaringcandy,2022-04-02 08:15:32,1499917,41349
4,sugaringcandy,2022-04-01 05:19:06,1761047,108798


#### Mark

In [None]:
id = 'onyourm__ark'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 2080521.1851851852 Comments: 33040.07407407407


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,onyourm__ark,2022-05-16 08:12:49,1444516,22979
1,onyourm__ark,2022-05-13 10:50:57,1368323,16475
2,onyourm__ark,2022-05-06 07:45:28,2275322,29616
3,onyourm__ark,2022-05-04 13:11:07,2162367,30532
4,onyourm__ark,2022-04-21 13:18:29,1415125,25120
5,onyourm__ark,2022-04-18 07:51:06,2037518,23994
6,onyourm__ark,2022-04-17 10:54:38,1985435,24703
7,onyourm__ark,2022-04-11 08:59:47,2317963,25555
8,onyourm__ark,2022-04-06 01:25:44,2113300,21006
9,onyourm__ark,2022-04-03 11:33:10,2140415,28623


#### Jaemin

In [None]:
id = 'na.jaemin0813'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 2361902.5 Comments: 41282.92857142857


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,na.jaemin0813,2022-05-20 07:18:50,2229806,43508
1,na.jaemin0813,2022-05-13 19:48:11,2438952,53529
2,na.jaemin0813,2022-04-18 05:22:05,2297712,39382
3,na.jaemin0813,2022-04-18 05:21:19,2378944,37784
4,na.jaemin0813,2022-04-09 12:52:47,1947185,22973
5,na.jaemin0813,2022-04-09 12:52:31,2338131,30830
6,na.jaemin0813,2022-04-03 08:15:40,2298001,29716
7,na.jaemin0813,2022-04-03 08:15:15,1692676,9247
8,na.jaemin0813,2022-03-28 15:24:20,2713940,34956
9,na.jaemin0813,2022-03-24 07:50:26,2052205,25356


#### Taeil

In [None]:
id = 'mo.on_air'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1337274.2222222222 Comments: 20140.11111111111


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,mo.on_air,2022-05-14 23:22:34,1121184,15071
1,mo.on_air,2022-04-29 06:30:25,1207469,12262
2,mo.on_air,2022-04-29 06:20:49,1094463,12840
3,mo.on_air,2022-04-10 13:49:21,1209592,8371
4,mo.on_air,2022-03-14 06:44:48,1442628,23110
5,mo.on_air,2022-02-23 23:41:08,1501496,39594
6,mo.on_air,2022-01-29 02:38:21,1536377,20797
7,mo.on_air,2022-01-23 15:54:06,1596773,29686
8,mo.on_air,2022-01-17 15:53:48,1325486,19530


#### Kun

In [None]:
id = 'kun11xd'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1012977.4 Comments: 11072.333333333334


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,kun11xd,2022-05-18 04:31:08,948397,10302
1,kun11xd,2022-04-27 10:12:17,1390779,15903
2,kun11xd,2022-04-11 12:37:38,901497,5192
3,kun11xd,2022-03-19 10:08:08,858300,5286
4,kun11xd,2022-03-14 13:03:11,1038885,7193
5,kun11xd,2022-03-05 10:13:17,1094389,9161
6,kun11xd,2022-02-21 13:14:02,1162219,12490
7,kun11xd,2022-02-19 12:39:50,953606,5585
8,kun11xd,2022-02-15 07:00:49,749690,15917
9,kun11xd,2022-02-14 13:20:53,996009,10651


#### Johnny

In [None]:
id = 'johnnyjsuh'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1751860.1739130435 Comments: 20240.17391304348


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,johnnyjsuh,2022-05-15 03:41:06,1228791,7997
1,johnnyjsuh,2022-05-09 04:06:22,1703541,10358
2,johnnyjsuh,2022-05-05 05:03:10,2169149,20679
3,johnnyjsuh,2022-05-04 15:06:45,1744242,13928
4,johnnyjsuh,2022-05-03 15:45:12,2150711,29152
5,johnnyjsuh,2022-05-02 00:23:46,1784278,11407
6,johnnyjsuh,2022-05-01 21:24:12,1529259,11476
7,johnnyjsuh,2022-05-01 14:02:42,1090448,15685
8,johnnyjsuh,2022-05-01 04:25:55,1600990,13756
9,johnnyjsuh,2022-04-30 23:16:22,1630313,17709


#### Hendery

In [None]:
id = 'i_m_hendery'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1316323.111111111 Comments: 21429.833333333332


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,i_m_hendery,2022-05-20 13:16:29,1224824,32215
1,i_m_hendery,2022-05-18 07:29:18,1288658,33589
2,i_m_hendery,2022-05-13 10:09:43,1102963,15618
3,i_m_hendery,2022-05-05 08:07:38,1646179,30058
4,i_m_hendery,2022-04-28 09:45:52,1158933,14699
5,i_m_hendery,2022-04-21 09:51:57,1192957,13469
6,i_m_hendery,2022-04-13 10:34:46,1426267,59825
7,i_m_hendery,2022-04-12 06:30:07,1704689,19912
8,i_m_hendery,2022-03-08 08:55:05,1622025,29829
9,i_m_hendery,2022-02-18 03:27:50,1225340,15833


#### Haechan

In [None]:
id = 'haechanahceah'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 2060586.625 Comments: 87502.625


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,haechanahceah,2022-05-14 06:18:56,1665507,33056
1,haechanahceah,2022-05-07 14:14:25,2017568,69715
2,haechanahceah,2022-04-17 10:31:57,1797706,53335
3,haechanahceah,2022-04-01 04:08:15,1933550,28028
4,haechanahceah,2022-03-28 06:48:27,2136654,63295
5,haechanahceah,2022-03-23 11:30:28,2200731,94598
6,haechanahceah,2022-03-19 04:11:08,2182229,108752
7,haechanahceah,2022-03-16 11:30:06,2550748,249242


#### Doyoung

In [None]:
id = 'do0_nct'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1865820.0232558139 Comments: 21673.79069767442


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,do0_nct,2022-05-16 13:38:28,1741133,14636
1,do0_nct,2022-05-13 14:47:03,1840985,17017
2,do0_nct,2022-05-11 14:05:28,1891893,23784
3,do0_nct,2022-05-07 09:03:24,1432466,20004
4,do0_nct,2022-05-05 11:46:12,2149977,28718
5,do0_nct,2022-04-28 14:19:58,2094578,18122
6,do0_nct,2022-04-24 13:46:21,1663125,8941
7,do0_nct,2022-04-24 02:16:29,1191909,4902
8,do0_nct,2022-04-24 02:06:44,1419726,7957
9,do0_nct,2022-04-22 13:45:50,2138225,23107


#### Xiaojun

In [None]:
id = 'djxiao_888'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1386002.0833333333 Comments: 17317.25


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,djxiao_888,2022-05-13 12:57:16,1339701,11473
1,djxiao_888,2022-04-28 09:43:37,1500891,20511
2,djxiao_888,2022-04-24 09:02:06,693524,4896
3,djxiao_888,2022-04-10 08:35:44,900067,5610
4,djxiao_888,2022-04-06 06:29:06,1443175,8439
5,djxiao_888,2022-04-03 08:53:25,1502697,15771
6,djxiao_888,2022-03-17 11:51:34,2637558,75917
7,djxiao_888,2022-02-28 05:49:56,1338064,12798
8,djxiao_888,2022-02-14 09:11:06,1187113,22962
9,djxiao_888,2022-02-13 07:53:43,1151226,9503


#### Jaehyun

In [None]:
id = '_jeongjaehyun'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 2877069.8666666667 Comments: 50356.4


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,_jeongjaehyun,2022-05-20 13:17:01,1082063,23669
1,_jeongjaehyun,2022-05-16 11:36:33,3165312,66024
2,_jeongjaehyun,2022-05-05 06:09:21,2923793,56540
3,_jeongjaehyun,2022-04-26 04:39:30,2372903,37919
4,_jeongjaehyun,2022-04-04 09:08:34,3119168,53725
5,_jeongjaehyun,2022-03-24 08:18:21,2741318,46441
6,_jeongjaehyun,2022-03-11 09:13:37,3333830,48419
7,_jeongjaehyun,2022-03-10 07:52:40,3426659,74800
8,_jeongjaehyun,2022-02-28 14:17:56,2605838,31316
9,_jeongjaehyun,2022-02-24 09:28:53,3205081,53105


#### Shotaro

In [None]:
id = '_shotaroo_'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 1202098.4210526317 Comments: 7369.105263157895


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,_shotaroo_,2022-05-18 10:02:03,1285014,7723
1,_shotaroo_,2022-05-09 04:48:58,917249,3722
2,_shotaroo_,2022-05-02 07:57:55,1220446,5592
3,_shotaroo_,2022-04-29 04:57:30,1260262,7195
4,_shotaroo_,2022-04-21 11:10:50,1080408,7373
5,_shotaroo_,2022-04-17 09:52:00,1466667,6700
6,_shotaroo_,2022-04-09 11:52:16,1031916,3302
7,_shotaroo_,2022-03-28 06:54:51,1135073,3864
8,_shotaroo_,2022-03-28 06:53:08,1156821,3397
9,_shotaroo_,2022-03-25 12:36:36,1586720,9195


#### Lucas

In [None]:
id = 'lucas_xx444'
df = pd.DataFrame(data_dict[id])
df.columns = ['User ID', 'Publish Date', 'Likes', 'Comments']
print('Likes:', df['Likes'].mean(), 'Comments:', df['Comments'].mean())
avg.append((id, df['Likes'].mean(),  df['Comments'].mean()))
df

Likes: 3997843.0 Comments: 1028979.0


Unnamed: 0,User ID,Publish Date,Likes,Comments
0,lucas_xx444,2022-02-09 13:46:16,3997843,1028979


### Comparing members

We now have each member's average likes and average comments per post for 2022.

In [None]:
avg

[('yuu_taa_1026', 1341344.8, 12543.466666666667),
 ('yellow_3to3', 1951307.5714285714, 38836.357142857145),
 ('yangyang_x2', 1053591.8, 8616.55),
 ('wwiinn_7', 1448837.12, 13114.08),
 ('tenlee_1001', 1332710.4838709678, 15059.967741935483),
 ('taeoxo_nct', 1778088.1612903227, 18326.709677419356),
 ('sugaringcandy', 1464492.8, 41361.6),
 ('onyourm__ark', 2080521.1851851852, 33040.07407407407),
 ('na.jaemin0813', 2361902.5, 41282.92857142857),
 ('mo.on_air', 1337274.2222222222, 20140.11111111111),
 ('kun11xd', 1012977.4, 11072.333333333334),
 ('johnnyjsuh', 1751860.1739130435, 20240.17391304348),
 ('i_m_hendery', 1316323.111111111, 21429.833333333332),
 ('haechanahceah', 2060586.625, 87502.625),
 ('do0_nct', 1865820.0232558139, 21673.79069767442),
 ('djxiao_888', 1386002.0833333333, 17317.25),
 ('_jeongjaehyun', 2877069.8666666667, 50356.4),
 ('_shotaroo_', 1202098.4210526317, 7369.105263157895),
 ('lucas_xx444', 3997843.0, 1028979.0)]

We can also sort the members by average likes.

In [None]:
df_avg = pd.DataFrame(avg)
df_avg.columns = ['User ID', 'Average Likes', 'Average Comments']
pd.set_option('display.float_format', lambda x: '%.5f' % x)
df_avg.sort_values(by = ['Average Likes'], ascending=False)

Unnamed: 0,User ID,Average Likes,Average Comments
18,lucas_xx444,3997843.0,1028979.0
16,_jeongjaehyun,2877069.86667,50356.4
8,na.jaemin0813,2361902.5,41282.92857
7,onyourm__ark,2080521.18519,33040.07407
13,haechanahceah,2060586.625,87502.625
1,yellow_3to3,1951307.57143,38836.35714
14,do0_nct,1865820.02326,21673.7907
5,taeoxo_nct,1778088.16129,18326.70968
11,johnnyjsuh,1751860.17391,20240.17391
6,sugaringcandy,1464492.8,41361.6


Or by average comments:

In [None]:
df_avg.sort_values(by = ['Average Comments'], ascending=False)

Unnamed: 0,User ID,Average Likes,Average Comments
18,lucas_xx444,3997843.0,1028979.0
13,haechanahceah,2060586.625,87502.625
16,_jeongjaehyun,2877069.86667,50356.4
6,sugaringcandy,1464492.8,41361.6
8,na.jaemin0813,2361902.5,41282.92857
1,yellow_3to3,1951307.57143,38836.35714
7,onyourm__ark,2080521.18519,33040.07407
14,do0_nct,1865820.02326,21673.7907
12,i_m_hendery,1316323.11111,21429.83333
11,johnnyjsuh,1751860.17391,20240.17391


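### Combined analysis

We can merge everyone's posts into a single table and sort it by comment count.

The top entry is the post by `lucas_xx444` on February 9, 2022, with nearly four million likes and more than one million comments. The second is from `haechanahceah`, with close to 250,000 comments.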

In [None]:
data_all_in_one = []
for key, val in data_dict.items():
  for entry in val:
    data_all_in_one.append(entry)
df = pd.DataFrame(data_all_in_one)
df.columns = ['User ID', 'Publish Time', 'Likes', 'Comments']
pd.set_option('display.max_rows', None)
df.sort_values(by = ['Comments'], ascending=False)

Unnamed: 0,User ID,Publish Time,Likes,Comments
305,lucas_xx444,2022-02-09 13:46:16,3997843,1028979
313,haechanahceah,2022-03-16 11:30:06,2550748,249242
31,sugaringcandy,2022-04-01 05:19:06,1761047,108798
312,haechanahceah,2022-03-19 04:11:08,2182229,108752
259,yellow_3to3,2022-01-15 13:03:45,2279576,104785
288,na.jaemin0813,2022-03-14 15:42:42,3017839,102301
311,haechanahceah,2022-03-23 11:30:28,2200731,94598
140,_jeongjaehyun,2022-02-14 13:41:43,3427500,93706
25,onyourm__ark,2022-01-19 08:41:30,2777572,87970
150,djxiao_888,2022-03-17 11:51:34,2637558,75917


We can also sort by likes:

In [None]:
df.sort_values(by = ['Likes'], ascending=False)

Unnamed: 0,User ID,Publish Time,Likes,Comments
305,lucas_xx444,2022-02-09 13:46:16,3997843,1028979
140,_jeongjaehyun,2022-02-14 13:41:43,3427500,93706
136,_jeongjaehyun,2022-03-10 07:52:40,3426659,74800
135,_jeongjaehyun,2022-03-11 09:13:37,3333830,48419
138,_jeongjaehyun,2022-02-24 09:28:53,3205081,53105
130,_jeongjaehyun,2022-05-16 11:36:33,3165312,66024
133,_jeongjaehyun,2022-04-04 09:08:34,3119168,53725
143,_jeongjaehyun,2022-01-07 10:43:29,3022431,44370
288,na.jaemin0813,2022-03-14 15:42:42,3017839,102301
139,_jeongjaehyun,2022-02-15 02:32:50,3006388,40201
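
When only the top few posts are of interest, the flattening and ranking can also be done without building and sorting the full table. The sketch below uses only the standard library and assumes the same `(uid, date, likes, comments)` tuple layout as `data_dict`; `top_posts` and the toy data are hypothetical.

```python
from heapq import nlargest
from itertools import chain

def top_posts(data, field, n=10):
    """Return the n posts with the largest value in the given tuple field.

    `data` maps user IDs to lists of (uid, date, likes, comments) tuples;
    `field` is 2 for likes or 3 for comments.
    """
    all_posts = chain.from_iterable(data.values())
    return nlargest(n, all_posts, key=lambda row: row[field])

# Toy data standing in for data_dict (values are made up).
demo_data = {
    'user_a': [('user_a', '2022-03-01', 500, 5), ('user_a', '2022-02-01', 100, 90)],
    'user_b': [('user_b', '2022-01-15', 300, 40)],
}
by_comments = top_posts(demo_data, field=3, n=2)
by_likes = top_posts(demo_data, field=2, n=2)
print(by_comments)  # [('user_a', '2022-02-01', 100, 90), ('user_b', '2022-01-15', 300, 40)]
print(by_likes)     # [('user_a', '2022-03-01', 500, 5), ('user_b', '2022-01-15', 300, 40)]
```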
