## Introduction
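<h3>The dataset contains a list of video games released from 1980 to 2023, with information such as release date, user review ratings, and critic review ratings. The goal is to analyze how popular game genres changed over the years and to identify the most popular genre of each era.</h3>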

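Variable descriptions:

- `Title`: game title
- `Release Date`: release date of the game's first version
- `Team`: development team(s) of the game
- `Rating`: average rating
- `Times Listed`: number of users who listed the game
- `Number of Reviews`: number of reviews submitted by users
- `Genres`: all genres the game belongs to
- `Summary`: summary/overview provided by the team
- `Reviews`: user reviews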

## Reading the Data

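Import the libraries needed for the analysis, then use Pandas' `read_csv` function to parse the contents of the raw data file "games.csv" into a DataFrame and assign it to the variable `original_data`.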

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib
from scipy.stats import ttest_ind
import ast

In [2]:
pd.set_option('display.max_columns', 1000)

In [3]:
pd.set_option('display.max_rows',1000)

In [4]:
pd.set_option("display.max_colwidth", 100)

In [5]:
original_data = pd.read_csv(r"../数据集/原始数据/games.csv")

In [6]:
original_data.head()

Unnamed: 0.1,Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist
0,0,Elden Ring,"Feb 25, 2022","['Bandai Namco Entertainment', 'FromSoftware']",4.5,3.9K,3.9K,"['Adventure', 'RPG']","Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""The first playthrough of elden ring is one of the best eperiences gaming can offer you but aft...",17K,3.8K,4.6K,4.8K
1,1,Hades,"Dec 10, 2019",['Supergiant Games'],4.3,2.9K,2.9K,"['Adventure', 'Brawler', 'Indie', 'RPG']","A rogue-lite hack and slash dungeon crawler in which Zagreus, son of Hades the Greek god of the ...",['convinced this is a roguelike for people who do not like the genre. The art is technically goo...,21K,3.2K,6.3K,3.6K
2,2,The Legend of Zelda: Breath of the Wild,"Mar 03, 2017","['Nintendo', 'Nintendo EPD Production Group No. 3']",4.4,4.3K,4.3K,"['Adventure', 'RPG']",The Legend of Zelda: Breath of the Wild is the first 3D open-world game in the Zelda series. Lin...,['This game is the game (that is not CS:GO) that I have played the most ever. I have played this...,30K,2.5K,5K,2.6K
3,3,Undertale,"Sep 15, 2015","['tobyfox', '8-4']",4.2,3.5K,3.5K,"['Adventure', 'Indie', 'RPG', 'Turn Based Strategy']","A small child falls into the Underground, where monsters have long been banished by humans and a...",['soundtrack is tied for #1 with nier automata. a super charming story and characters which hav...,28K,679,4.9K,1.8K
4,4,Hollow Knight,"Feb 24, 2017",['Team Cherry'],4.4,3K,3K,"['Adventure', 'Indie', 'Platform']",A 2D metroidvania with an emphasis on close combat and exploration in which the player enters th...,"[""this games worldbuilding is incredible, with its amazing soundtrack and gorgeous art direction...",21K,2.4K,8.3K,2.3K


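## Assessing the Data

In this section, I assess the data contained in the `original_data` DataFrame.

The assessment covers two dimensions, structure and content, that is, tidiness and cleanliness. Structural problems mean the data violates the three rules "each column is a variable, each row is an observation, each cell is a value". Content problems include missing, duplicated, and invalid data.

## Tidiness Assessment

### Checking tidiness with a random sample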

In [7]:
original_data.sample(10)

Unnamed: 0.1,Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist
694,694,Fire Emblem: Radiant Dawn,"Feb 22, 2007","['Nintendo', 'Nintendo SPD Group No. 2']",4.0,459,459,"['RPG', 'Strategy', 'Tactical']","Radiant Dawn is a turn-based tactics RPG and a direct sequel, 3 years have passed, to the game ""...","[""It's both better and worse than PoR, but I came out of it really loving it, even if I think it...",1.8K,69,1.1K,497
558,558,Star Wars: Knights of the Old Republic,"Jul 15, 2003","['BioWare', 'LucasArts']",4.2,839,839,"['Adventure', 'RPG']","Choose Your Path! Set 4,000 years before the Galactic Empire, you are the last hope for the Jedi...",['I get why a lot of people love this game but fundamentally I find that no element of the game ...,5.3K,180,2.3K,705
1444,1444,Phantom Brigade,"Feb 28, 2023",['Brace Yourself Games'],2.8,16,16,"['Indie', 'RPG', 'Strategy', 'Tactical', 'Turn Based Strategy']","Phantom Brigade is a hybrid real-time and turn-based tactical RPG, focusing on in-depth customiz...","[""Played it for a short session, and considering all of the reviews I think I had enough.\n ...",16,10,30,51
299,299,Super Smash Bros. for Nintendo 3DS,"Sep 13, 2014","['Nintendo', 'Sora']",3.3,545,545,['Fighting'],"Super Smash Bros. for Nintendo 3DS is the first portable entry in the renowned series, in which ...",['The 3ds stick is already flimsy and playing smash on it for more than a week is guaranteed to ...,10K,23,273,96
1077,1077,Sonic CD,"Sep 23, 1993","['Sonic Team', 'Sega']",3.4,629,629,['Platform'],Sonic travels to the distant shores of Never Lake for the once-a-year appearance of Little Plane...,"['Recomendadisimo (cualquier version).', 'this games ost is so good it carried the whole experie...",4.9K,43,656,223
1260,1260,Crysis,"Nov 13, 2007","['Crytek Frankfurt', 'Electronic Arts']",3.2,224,224,"['Adventure', 'Shooter']","From the makers of Far Cry, Crysis offers FPS fans the best-looking, most highly-evolving gamepl...","[""admittedly, I'm a sucker for the core idea of Crysis but the suit feels so half-baked and the ...",3K,26,561,241
1105,1105,Saints Row IV,"Aug 19, 2013","['Volition', 'Deep Silver']",3.1,386,386,"['Adventure', 'Shooter']","Unlike the first three games in the franchise, Saints Row 4 does not center around the main char...","['infamous at home:', ""Finally got back to finish this, it's stupid, but mindless fun!"", 'putari...",7.2K,64,775,176
211,211,Returnal,"Apr 30, 2021","['Housemarque', 'Sony Interactive Entertainment']",4.0,725,725,['Shooter'],"After crash-landing on a shape-shifting alien planet, Selene finds herself fighting tooth and na...","['holy shit', 'The biggest budget rogue like ever. Big money for big loops', ""As they become inc...",2.4K,430,1.8K,1.6K
227,227,Immortality,"Aug 30, 2022",['Half Mermaid'],3.9,461,461,"['Indie', 'Point-and-Click', 'Simulator']",Marissa Marcel was a film star. She made three movies. But none of the movies was ever released....,"['Boa produção mas é um sufoco jogar este jogo', 'The lastest game from San Barlow, the creator ...",1.2K,95,587,564
1130,1130,Star Wars Jedi: Survivor,"Apr 28, 2023","['Respawn Entertainment', 'Electronic Arts']",,250,250,['Adventure'],"The story of Cal Kestis continues in Star Wars Jedi: Survivor, a third-person, galaxy-spanning, ...",[],13,2,367,1.4K


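- In the 10 sampled rows, every row satisfies "each column is a variable, each row is an observation", but the cells in `Team` and `Genres` do not meet the "each cell is a value" rule. This is a structural problem, so these cells need to be split
- `Unnamed: 0` is a redundant column and should be dropped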
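
For reference, one common way to reach "one value per cell" for a list-valued column is `DataFrame.explode`. The sketch below uses a hypothetical two-row sample rather than the real file, and it shows an alternative to the approach taken in this notebook, which keeps only the first value of each list.

```python
import ast
import pandas as pd

# Hypothetical sample that mimics the raw `Genres` strings.
sample = pd.DataFrame({
    "Title": ["Elden Ring", "Hades"],
    "Genres": ["['Adventure', 'RPG']",
               "['Adventure', 'Brawler', 'Indie', 'RPG']"],
})

# Parse each string into a real Python list.
sample["Genres"] = sample["Genres"].apply(ast.literal_eval)

# explode() emits one row per list element and repeats the other columns.
tidy = sample.explode("Genres").reset_index(drop=True)
print(tidy)
```

The exploded frame has one row per (game, genre) pair, which suits per-genre counting, at the cost of repeating each game several times.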

### Tidying the data

### Splitting `Team`

Check whether `Team` has missing values

In [8]:
not_Tstring = original_data['Team'][original_data['Team'].apply(lambda x: not isinstance(x, str))]
print(not_Tstring)

1245    NaN
Name: Team, dtype: object


In [9]:
original_data[original_data['Team'].isnull()]

Unnamed: 0.1,Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist
1245,1245,NEET Girl Date Night,"Oct 21, 2022",,2.7,21,21,['Visual Novel'],Your friend sets you up on a date with his NEET Cousin Kara.,"['this sucked. ""Omg she is literally me"" is not a reason to like something.', 'brain hurt when s...",106,1,44,42


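The output shows that the `Team` value at index `1245` is missing. Dropping this single row does not affect the dataset as a whole and makes later cleaning easier, so we delete it.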

In [10]:
not_Gstring = original_data['Genres'][original_data['Genres'].apply(lambda x: not isinstance(x, str))]
print(not_Gstring)

Series([], Name: Genres, dtype: object)


`Genres` has no missing values

Drop the row with the missing `Team` value

In [11]:
original_data = original_data.dropna(subset=["Team"])

Confirm the deletion

In [12]:
original_data[original_data['Team'].isnull()]

Unnamed: 0.1,Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist


In [13]:
# Convert the string values in Team to lists
original_data['Team'] = original_data['Team'].apply(ast.literal_eval)

In [14]:
# Join the list items with `,` using str.join(), which removes the surrounding [] and ''
original_data['Team'] = original_data['Team'].apply(lambda x: ','.join(x))

In [15]:
original_data.head()

Unnamed: 0.1,Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist
0,0,Elden Ring,"Feb 25, 2022","Bandai Namco Entertainment,FromSoftware",4.5,3.9K,3.9K,"['Adventure', 'RPG']","Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""The first playthrough of elden ring is one of the best eperiences gaming can offer you but aft...",17K,3.8K,4.6K,4.8K
1,1,Hades,"Dec 10, 2019",Supergiant Games,4.3,2.9K,2.9K,"['Adventure', 'Brawler', 'Indie', 'RPG']","A rogue-lite hack and slash dungeon crawler in which Zagreus, son of Hades the Greek god of the ...",['convinced this is a roguelike for people who do not like the genre. The art is technically goo...,21K,3.2K,6.3K,3.6K
2,2,The Legend of Zelda: Breath of the Wild,"Mar 03, 2017","Nintendo,Nintendo EPD Production Group No. 3",4.4,4.3K,4.3K,"['Adventure', 'RPG']",The Legend of Zelda: Breath of the Wild is the first 3D open-world game in the Zelda series. Lin...,['This game is the game (that is not CS:GO) that I have played the most ever. I have played this...,30K,2.5K,5K,2.6K
3,3,Undertale,"Sep 15, 2015","tobyfox,8-4",4.2,3.5K,3.5K,"['Adventure', 'Indie', 'RPG', 'Turn Based Strategy']","A small child falls into the Underground, where monsters have long been banished by humans and a...",['soundtrack is tied for #1 with nier automata. a super charming story and characters which hav...,28K,679,4.9K,1.8K
4,4,Hollow Knight,"Feb 24, 2017",Team Cherry,4.4,3K,3K,"['Adventure', 'Indie', 'Platform']",A 2D metroidvania with an emphasis on close combat and exploration in which the player enters th...,"[""this games worldbuilding is incredible, with its amazing soundtrack and gorgeous art direction...",21K,2.4K,8.3K,2.3K


In [16]:
# Select the entries that contain `,`
original_data['Team'][original_data['Team'].str.contains(',')]

0                                Bandai Namco Entertainment,FromSoftware
2                           Nintendo,Nintendo EPD Production Group No. 3
3                                                            tobyfox,8-4
6                                                         OMOCAT,PLAYISM
7                                                  Nintendo,MercurySteam
                                      ...                               
1502                                                     Sonic Team,Sega
1505                       Capcom,Virgin Interactive Entertainment, Inc.
1506    Virgin Interactive Entertainment (Europe) Ltd.,Delphine Software
1508                                                   Sumo Digital,Sega
1511                                                  WB Games,TT Fusion
Name: Team, Length: 1204, dtype: object

In [17]:
# Split on "," with expand=True so that each part gets its own column
Team = original_data['Team'][original_data['Team'].str.contains(',')].str.split(",", expand=True)
Team

Unnamed: 0,0,1,2
0,Bandai Namco Entertainment,FromSoftware,
2,Nintendo,Nintendo EPD Production Group No. 3,
3,tobyfox,8-4,
6,OMOCAT,PLAYISM,
7,Nintendo,MercurySteam,
...,...,...,...
1502,Sonic Team,Sega,
1505,Capcom,Virgin Interactive Entertainment,Inc.
1506,Virgin Interactive Entertainment (Europe) Ltd.,Delphine Software,
1508,Sumo Digital,Sega,


In [18]:
Team_1 = original_data['Team'][~original_data['Team'].str.contains(',')]
Team_1.head()

1     Supergiant Games
4          Team Cherry
5       Mojang Studios
8           InnerSloth
19            Nintendo
Name: Team, dtype: object

In [19]:
Main_team = Team[0]
Secondary_teams = Team[1]
Main_team

0                           Bandai Namco Entertainment
2                                             Nintendo
3                                              tobyfox
6                                               OMOCAT
7                                             Nintendo
                             ...                      
1502                                        Sonic Team
1505                                            Capcom
1506    Virgin Interactive Entertainment (Europe) Ltd.
1508                                      Sumo Digital
1511                                          WB Games
Name: 0, Length: 1204, dtype: object

In [20]:
merged_Team = pd.concat([Main_team, Team_1])

In [21]:
merged_Team.sort_index()

0       Bandai Namco Entertainment
1                 Supergiant Games
2                         Nintendo
3                          tobyfox
4                      Team Cherry
                   ...            
1507                Telltale Games
1508                  Sumo Digital
1509                        Capcom
1510                Larian Studios
1511                      WB Games
Length: 1511, dtype: object

In [22]:
original_data['Main_team'] = merged_Team

In [23]:
original_data['Secondary_teams'] = Secondary_teams

In [24]:
original_data.head()

Unnamed: 0.1,Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Main_team,Secondary_teams
0,0,Elden Ring,"Feb 25, 2022","Bandai Namco Entertainment,FromSoftware",4.5,3.9K,3.9K,"['Adventure', 'RPG']","Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""The first playthrough of elden ring is one of the best eperiences gaming can offer you but aft...",17K,3.8K,4.6K,4.8K,Bandai Namco Entertainment,FromSoftware
1,1,Hades,"Dec 10, 2019",Supergiant Games,4.3,2.9K,2.9K,"['Adventure', 'Brawler', 'Indie', 'RPG']","A rogue-lite hack and slash dungeon crawler in which Zagreus, son of Hades the Greek god of the ...",['convinced this is a roguelike for people who do not like the genre. The art is technically goo...,21K,3.2K,6.3K,3.6K,Supergiant Games,
2,2,The Legend of Zelda: Breath of the Wild,"Mar 03, 2017","Nintendo,Nintendo EPD Production Group No. 3",4.4,4.3K,4.3K,"['Adventure', 'RPG']",The Legend of Zelda: Breath of the Wild is the first 3D open-world game in the Zelda series. Lin...,['This game is the game (that is not CS:GO) that I have played the most ever. I have played this...,30K,2.5K,5K,2.6K,Nintendo,Nintendo EPD Production Group No. 3
3,3,Undertale,"Sep 15, 2015","tobyfox,8-4",4.2,3.5K,3.5K,"['Adventure', 'Indie', 'RPG', 'Turn Based Strategy']","A small child falls into the Underground, where monsters have long been banished by humans and a...",['soundtrack is tied for #1 with nier automata. a super charming story and characters which hav...,28K,679,4.9K,1.8K,tobyfox,8-4
4,4,Hollow Knight,"Feb 24, 2017",Team Cherry,4.4,3K,3K,"['Adventure', 'Indie', 'Platform']",A 2D metroidvania with an emphasis on close combat and exploration in which the player enters th...,"[""this games worldbuilding is incredible, with its amazing soundtrack and gorgeous art direction...",21K,2.4K,8.3K,2.3K,Team Cherry,
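
Splitting the joined string on `,` also splits company names that contain a comma themselves, such as "Virgin Interactive Entertainment, Inc." in the output above. A minimal sketch of one way around this, assuming the parsed lists are still available, is to pick list elements by position instead. The sample values and the names `parsed`, `main_team`, and `secondary_team` are illustrative and not part of the notebook.

```python
import ast
import pandas as pd

# Hypothetical sample; the second entry has a comma inside a company name.
teams = pd.Series([
    "['Bandai Namco Entertainment', 'FromSoftware']",
    "['Capcom', 'Virgin Interactive Entertainment, Inc.']",
    "['Supergiant Games']",
])

# Parse each string into a list of team names.
parsed = teams.apply(ast.literal_eval)

# Pick elements by position, so commas inside names are left untouched.
main_team = parsed.apply(lambda names: names[0] if names else None)
secondary_team = parsed.apply(lambda names: names[1] if len(names) > 1 else None)

print(main_team.tolist())
print(secondary_team.tolist())
```

Games with a single team get a missing value in `secondary_team`, which matches the behaviour of the `Secondary_teams` column above.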


### Splitting `Genres`

In [25]:
not_Tstring = original_data['Genres'][original_data['Genres'].apply(lambda x: not isinstance(x, str))]
print(not_Tstring)

Series([], Name: Genres, dtype: object)


In [26]:
# Convert the string values in Genres to lists
original_data['Genres'] = original_data['Genres'].apply(ast.literal_eval)

In [107]:
# Check the maximum list length
original_data['Genres'].apply(len).max()

18

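The longest `Genres` list holds 7 values. The recorded output of 18 most likely comes from a later re-run of this cell (note the execution count 107): once `Genres` had been reduced to a single string, `len` counts characters, and `'Real Time Strategy'` is 18 characters long. To keep the split simple, we take the first value as the main genre.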

In [68]:
original_data['Genres'].value_counts()


Genres
Adventure             1014
RPG                     95
Shooter                 81
Indie                   53
Arcade                  51
Platform                47
Fighting                46
Brawler                 39
Racing                  17
Simulator               16
Card & Board Game       13
Puzzle                  11
Music                    7
Sport                    7
MOBA                     3
[]                       3
Point-and-Click          3
Real Time Strategy       3
Strategy                 1
Visual Novel             1
Name: count, dtype: int64

In [51]:
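def frist_Genres(x):
    # Return the first genre of the list; an empty list is returned
    # unchanged and shows up later as the invalid value `[]`.
    if len(x) > 0:
        return x[0]
    else:
        return x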

In [53]:
original_data['Genres'] = original_data['Genres'].apply(frist_Genres)

In [54]:
original_data

Unnamed: 0.1,Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Main_team,Secondary_teams
0,0,Elden Ring,"Feb 25, 2022","Bandai Namco Entertainment,FromSoftware",4.5,3.9K,3.9K,Adventure,"Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""The first playthrough of elden ring is one of the best eperiences gaming can offer you but aft...",17K,3.8K,4.6K,4.8K,Bandai Namco Entertainment,FromSoftware
1,1,Hades,"Dec 10, 2019",Supergiant Games,4.3,2.9K,2.9K,Adventure,"A rogue-lite hack and slash dungeon crawler in which Zagreus, son of Hades the Greek god of the ...",['convinced this is a roguelike for people who do not like the genre. The art is technically goo...,21K,3.2K,6.3K,3.6K,Supergiant Games,
2,2,The Legend of Zelda: Breath of the Wild,"Mar 03, 2017","Nintendo,Nintendo EPD Production Group No. 3",4.4,4.3K,4.3K,Adventure,The Legend of Zelda: Breath of the Wild is the first 3D open-world game in the Zelda series. Lin...,['This game is the game (that is not CS:GO) that I have played the most ever. I have played this...,30K,2.5K,5K,2.6K,Nintendo,Nintendo EPD Production Group No. 3
3,3,Undertale,"Sep 15, 2015","tobyfox,8-4",4.2,3.5K,3.5K,Adventure,"A small child falls into the Underground, where monsters have long been banished by humans and a...",['soundtrack is tied for #1 with nier automata. a super charming story and characters which hav...,28K,679,4.9K,1.8K,tobyfox,8-4
4,4,Hollow Knight,"Feb 24, 2017",Team Cherry,4.4,3K,3K,Adventure,A 2D metroidvania with an emphasis on close combat and exploration in which the player enters th...,"[""this games worldbuilding is incredible, with its amazing soundtrack and gorgeous art direction...",21K,2.4K,8.3K,2.3K,Team Cherry,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1507,1507,Back to the Future: The Game,"Dec 22, 2010",Telltale Games,3.2,94,94,Adventure,Back to the Future: The Game is one of Telltale Games' popular episodic games. It follows the st...,['Very enjoyable game. The story adds onto the movies without ruining anything from them. The ga...,763,5,223,67,Telltale Games,
1508,1508,Team Sonic Racing,"May 21, 2019","Sumo Digital,Sega",2.9,264,264,Arcade,Team Sonic Racing combines the best elements of arcade and fast-paced competitive style racing a...,"['jogo morto mas bom', 'not my cup of tea', ""Compared to the previous two sonic kart racers from...",1.5K,49,413,107,Sumo Digital,Sega
1509,1509,Dragon's Dogma,"May 22, 2012",Capcom,3.7,210,210,Brawler,"Set in a huge open world, Dragon’s Dogma: Dark Arisen presents a rewarding action combat experie...","['Underrated.', 'A grandes rasgos, es como un MMO pero para un jugador. Me explico:', 'peak kino...",1.1K,45,487,206,Capcom,
1510,1510,Baldur's Gate 3,"Oct 06, 2020",Larian Studios,4.1,165,165,Adventure,"An ancient evil has returned to Baldur's Gate, intent on devouring it from the inside out. The f...","['Bu türe bu oyunla girmeye çalışmak hataydı sanırım.', ""Even if this turns out to be a perfect ...",269,79,388,602,Larian Studios,


Drop the `Unnamed: 0` column

In [113]:
original_data = original_data.drop('Unnamed: 0',axis=1)

In [114]:
original_data.head()

Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Main_team,Secondary_teams
0,Elden Ring,"Feb 25, 2022","Bandai Namco Entertainment,FromSoftware",4.5,3.9K,3.9K,Adventure,"Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""The first playthrough of elden ring is one of the best eperiences gaming can offer you but aft...",17K,3.8K,4.6K,4.8K,Bandai Namco Entertainment,FromSoftware
1,Hades,"Dec 10, 2019",Supergiant Games,4.3,2.9K,2.9K,Adventure,"A rogue-lite hack and slash dungeon crawler in which Zagreus, son of Hades the Greek god of the ...",['convinced this is a roguelike for people who do not like the genre. The art is technically goo...,21K,3.2K,6.3K,3.6K,Supergiant Games,
2,The Legend of Zelda: Breath of the Wild,"Mar 03, 2017","Nintendo,Nintendo EPD Production Group No. 3",4.4,4.3K,4.3K,Adventure,The Legend of Zelda: Breath of the Wild is the first 3D open-world game in the Zelda series. Lin...,['This game is the game (that is not CS:GO) that I have played the most ever. I have played this...,30K,2.5K,5K,2.6K,Nintendo,Nintendo EPD Production Group No. 3
3,Undertale,"Sep 15, 2015","tobyfox,8-4",4.2,3.5K,3.5K,Adventure,"A small child falls into the Underground, where monsters have long been banished by humans and a...",['soundtrack is tied for #1 with nier automata. a super charming story and characters which hav...,28K,679,4.9K,1.8K,tobyfox,8-4
4,Hollow Knight,"Feb 24, 2017",Team Cherry,4.4,3K,3K,Adventure,A 2D metroidvania with an emphasis on close combat and exploration in which the player enters th...,"[""this games worldbuilding is incredible, with its amazing soundtrack and gorgeous art direction...",21K,2.4K,8.3K,2.3K,Team Cherry,


## Cleanliness Assessment

In [116]:
original_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1511 entries, 0 to 1511
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Title              1511 non-null   object 
 1   Release Date       1511 non-null   object 
 2   Team               1511 non-null   object 
 3   Rating             1498 non-null   float64
 4   Times Listed       1511 non-null   object 
 5   Number of Reviews  1511 non-null   object 
 6   Genres             1511 non-null   object 
 7   Summary            1510 non-null   object 
 8   Reviews            1511 non-null   object 
 9   Plays              1511 non-null   object 
 10  Playing            1511 non-null   object 
 11  Backlogs           1511 non-null   object 
 12  Wishlist           1511 non-null   object 
 13  Main_team          1511 non-null   object 
 14  Secondary_teams    1204 non-null   object 
dtypes: float64(1), object(14)
memory usage: 221.2+ KB


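From the output:  
1. The data has 1511 rows, and `Rating` has missing values (1498 non-null).  
2. `Times Listed` and `Number of Reviews` should be numeric, and `Release Date` should be a date type, so these columns need to be converted.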
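
The conversions noted above could look like the following sketch. It assumes counts are written either as plain integers or with a `K` suffix meaning thousands (as in the sample rows) and that dates follow the `Feb 25, 2022` pattern. The helper `parse_count` and the sample frame are hypothetical.

```python
import pandas as pd

def parse_count(text):
    """Convert strings such as '3.9K' or '679' to an integer count."""
    text = str(text).strip()
    if text.upper().endswith("K"):
        # '3.9K' -> 3900; round() guards against floating-point noise.
        return int(round(float(text[:-1]) * 1000))
    return int(float(text))

# Hypothetical sample with the formats seen in the data.
sample = pd.DataFrame({
    "Times Listed": ["3.9K", "679", "3K"],
    "Release Date": ["Feb 25, 2022", "releases on TBD", "Sep 15, 2015"],
})

sample["Times Listed"] = sample["Times Listed"].apply(parse_count)

# errors="coerce" turns unparseable strings such as 'releases on TBD' into NaT.
sample["Release Date"] = pd.to_datetime(
    sample["Release Date"], format="%b %d, %Y", errors="coerce"
)
print(sample)
```

Coercing unparseable dates to `NaT` keeps unreleased titles in the frame so they can be inspected or dropped explicitly later.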

### Assessing missing values

`Rating` has missing values, so we extract the affected observations with a condition

In [117]:
original_data[original_data['Rating'].isnull()]

Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Main_team,Secondary_teams
587,Final Fantasy XVI,"Jun 22, 2023","Square Enix,Square Enix Creative Business Unit III",,422,422,RPG,Final Fantasy XVI is an upcoming action role-playing game developed and published by Square Enix...,[],37,10,732,2.4K,Square Enix,Square Enix Creative Business Unit III
649,Death Stranding 2,releases on TBD,Kojima Productions,,105,105,Adventure,,[],3,0,209,644,Kojima Productions,
713,Final Fantasy VII Rebirth,"Dec 31, 2023",Square Enix,,192,192,[],This next standalone chapter in the FINAL FANTASY VII remake trilogy,[],20,3,354,1.1K,Square Enix,
719,Lies of P,"Aug 01, 2023","NEOWIZ,Round8 Studio",,175,175,RPG,"Inspired by the familiar story of Pinocchio, Lies of P is an action souls-like game set in a dar...",[],5,0,260,939,NEOWIZ,Round8 Studio
726,Judas,"Mar 31, 2025",Ghost Story Games,,90,90,Adventure,A disintegrating starship. A desperate escape plan. You are the mysterious and troubled Judas. Y...,[],1,0,92,437,Ghost Story Games,
746,Like a Dragon Gaiden: The Man Who Erased His Name,"Dec 31, 2023","Ryū Ga Gotoku Studios,Sega",,118,118,Adventure,This game covers Kiryu's story between Yakuza 7 and Like a Dragon 8.,[],2,1,145,588,Ryū Ga Gotoku Studios,Sega
972,The Legend of Zelda: Tears of the Kingdom,"May 12, 2023","Nintendo,Nintendo EPD Production Group No. 3",,581,581,Adventure,The Legend of Zelda: Tears of the Kingdom is the sequel to The Legend of Zelda: Breath of the Wi...,[],72,6,1.6K,5.4K,Nintendo,Nintendo EPD Production Group No. 3
1130,Star Wars Jedi: Survivor,"Apr 28, 2023","Respawn Entertainment,Electronic Arts",,250,250,Adventure,"The story of Cal Kestis continues in Star Wars Jedi: Survivor, a third-person, galaxy-spanning, ...",[],13,2,367,1.4K,Respawn Entertainment,Electronic Arts
1160,We Love Katamari Reroll + Royal Reverie,"Jun 02, 2023","Bandai Namco Entertainment,MONKEYCRAFT Co. Ltd",,51,51,Adventure,We Love Katamari Reroll + Royal Reverie is a remake of the PS2 game We Love Katamari.,[],3,0,74,291,Bandai Namco Entertainment,MONKEYCRAFT Co. Ltd
1202,Earthblade,"Dec 31, 2024",Extremely OK Games,,83,83,Adventure,"You are Névoa, an enigmatic child of Fate returning at long last to Earth, in this explor-action...",[],0,1,103,529,Extremely OK Games,


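This analysis aims to find the most popular genre of each era, and `Rating` is a key metric. Deleting these rows could distort the overall data, so we fill the missing values with the overall mean of `Rating`.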
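
A minimal sketch of mean imputation on a hypothetical three-row sample; in the notebook the same two calls would be applied to the `Rating` column of the cleaned copy of the data.

```python
import pandas as pd

# Hypothetical sample with one missing rating.
sample = pd.DataFrame({"Rating": [4.5, None, 3.5]})

# Series.mean() skips NaN by default, so only known ratings are averaged.
mean_rating = sample["Rating"].mean()

# Replace the missing entries with that mean.
sample["Rating"] = sample["Rating"].fillna(mean_rating)
print(sample)
```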

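### Assessing duplicate data

Based on the analysis, three variables must not contain duplicated data:  
  
1. `Title`: each game title should be a unique identifier and must not repeat  
2. `Team`: although one team can develop several games, the team recorded for a given game should be consistent and unambiguous  
3. `Release Date`: the release date should be the same across all records of one game, so each game has a single first release date  

Check whether duplicated data exists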

In [118]:
# Find duplicated rows based on the 'Title', 'Team' and 'Release Date' columns
duplicates = original_data[original_data.duplicated(subset=['Title', 'Team', 'Release Date'], keep=False)]

In [119]:
duplicates

Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Main_team,Secondary_teams
0,Elden Ring,"Feb 25, 2022","Bandai Namco Entertainment,FromSoftware",4.5,3.9K,3.9K,Adventure,"Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""The first playthrough of elden ring is one of the best eperiences gaming can offer you but aft...",17K,3.8K,4.6K,4.8K,Bandai Namco Entertainment,FromSoftware
1,Hades,"Dec 10, 2019",Supergiant Games,4.3,2.9K,2.9K,Adventure,"A rogue-lite hack and slash dungeon crawler in which Zagreus, son of Hades the Greek god of the ...",['convinced this is a roguelike for people who do not like the genre. The art is technically goo...,21K,3.2K,6.3K,3.6K,Supergiant Games,
2,The Legend of Zelda: Breath of the Wild,"Mar 03, 2017","Nintendo,Nintendo EPD Production Group No. 3",4.4,4.3K,4.3K,Adventure,The Legend of Zelda: Breath of the Wild is the first 3D open-world game in the Zelda series. Lin...,['This game is the game (that is not CS:GO) that I have played the most ever. I have played this...,30K,2.5K,5K,2.6K,Nintendo,Nintendo EPD Production Group No. 3
3,Undertale,"Sep 15, 2015","tobyfox,8-4",4.2,3.5K,3.5K,Adventure,"A small child falls into the Underground, where monsters have long been banished by humans and a...",['soundtrack is tied for #1 with nier automata. a super charming story and characters which hav...,28K,679,4.9K,1.8K,tobyfox,8-4
4,Hollow Knight,"Feb 24, 2017",Team Cherry,4.4,3K,3K,Adventure,A 2D metroidvania with an emphasis on close combat and exploration in which the player enters th...,"[""this games worldbuilding is incredible, with its amazing soundtrack and gorgeous art direction...",21K,2.4K,8.3K,2.3K,Team Cherry,
5,Minecraft,"Nov 18, 2011",Mojang Studios,4.3,2.3K,2.3K,Adventure,"Minecraft focuses on allowing the player to explore, interact with, and modify a dynamically-gen...","['Minecraft is what you make of it. Unfortunately theres no reason to do anything.', 'muito bom,...",33K,1.8K,1.1K,230,Mojang Studios,
6,Omori,"Dec 25, 2020","OMOCAT,PLAYISM",4.2,1.6K,1.6K,Adventure,"A turn-based surreal horror RPG in which a child traverses various mundane, quirky, humourous, m...","[""The best game I've played in my life"", ""omori is a game held up by it's heartbreaking, emotion...",7.2K,1.1K,4.5K,3.8K,OMOCAT,PLAYISM
7,Metroid Dread,"Oct 07, 2021","Nintendo,MercurySteam",4.3,2.1K,2.1K,Adventure,Join intergalactic bounty hunter Samus Aran in her first new 2D Metroid story in 19 years. Samus...,"['Have only been a Metroid fan for couple of years but I think this was worth the 19 year wait',...",9.2K,759,3.4K,3.3K,Nintendo,MercurySteam
8,Among Us,"Jun 15, 2018",InnerSloth,3.0,867,867,Indie,Join your crew-mates in a multiplayer game of teamwork and betrayal! Play with 4-15 players onli...,"[""it's a solid party game. i'm bad at lying though and it makes me feel bad; not a complaint wit...",25K,470,776,126,InnerSloth,
9,NieR: Automata,"Feb 23, 2017","PlatinumGames,Square Enix",4.3,2.9K,2.9K,Brawler,"NieR: Automata tells the story of androids 2B, 9S and A2 and their battle to reclaim the machine...","['Holy shit', 'im carrying the weight of the wooooooooooooooooooorld', 'no me llamaba la atenció...",18K,1.1K,6.2K,3.6K,PlatinumGames,Square Enix


Verify the duplicates

In [120]:
original_data.query("Title == 'Elden Ring'")

Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Main_team,Secondary_teams
0,Elden Ring,"Feb 25, 2022","Bandai Namco Entertainment,FromSoftware",4.5,3.9K,3.9K,Adventure,"Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""The first playthrough of elden ring is one of the best eperiences gaming can offer you but aft...",17K,3.8K,4.6K,4.8K,Bandai Namco Entertainment,FromSoftware
326,Elden Ring,"Feb 25, 2022","Bandai Namco Entertainment,FromSoftware",4.5,3.9K,3.9K,Adventure,"Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""The first playthrough of elden ring is one of the best eperiences gaming can offer you but aft...",17K,3.8K,4.6K,4.8K,Bandai Namco Entertainment,FromSoftware
776,Elden Ring,"Feb 25, 2022","Bandai Namco Entertainment,FromSoftware",4.5,3.9K,3.9K,Adventure,"Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""By far one of the most disappointing game I've ever played. I went into it with such high hope...",17K,3.8K,4.6K,4.8K,Bandai Namco Entertainment,FromSoftware


The check confirms that the data contains duplicates.
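
Removing such duplicates can be sketched with `drop_duplicates` on the same three key columns. The sample frame is hypothetical, and keeping the first occurrence (`keep="first"`) is an assumption rather than a choice stated in the notebook.

```python
import pandas as pd

# Hypothetical sample in which the first and third rows describe the same game.
sample = pd.DataFrame({
    "Title": ["Elden Ring", "Hades", "Elden Ring"],
    "Team": ["Bandai Namco Entertainment,FromSoftware",
             "Supergiant Games",
             "Bandai Namco Entertainment,FromSoftware"],
    "Release Date": ["Feb 25, 2022", "Dec 10, 2019", "Feb 25, 2022"],
})

# Rows count as duplicates only when all three key columns match.
deduplicated = sample.drop_duplicates(
    subset=["Title", "Team", "Release Date"], keep="first"
)
print(deduplicated)
```

Restricting the comparison to the key columns matters here because the duplicated rows above differ in other columns such as `Reviews`.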

### Assessing inconsistent data

Inconsistent data may exist in `Team` and `Genres`, so we check whether several different values refer to the same entity.

In [121]:
team_counts = original_data['Main_team'].value_counts()
team_counts

Main_team
Nintendo                                          142
Capcom                                             64
Square Enix                                        54
Sega                                               30
Bandai Namco Entertainment                         24
Konami                                             21
Atlus                                              21
Electronic Arts                                    20
Activision                                         19
Square                                             19
Nintendo EAD                                       18
FromSoftware                                       18
Sony Computer Entertainment                        17
Naughty Dog                                        16
Sony Interactive Entertainment                     15
Ubisoft Entertainment                              15
Intelligent Systems Co.                            14
Ubisoft Montreal                                   13
WB Games          

In [122]:
genres_counts = original_data['Genres'].value_counts()
genres_counts

Genres
Adventure             1014
RPG                     95
Shooter                 81
Indie                   53
Arcade                  51
Platform                47
Fighting                46
Brawler                 39
Racing                  17
Simulator               16
Card & Board Game       13
Puzzle                  11
Music                    7
Sport                    7
MOBA                     3
[]                       3
Point-and-Click          3
Real Time Strategy       3
Strategy                 1
Visual Novel             1
Name: count, dtype: int64

An AI-assisted review found no cases where several values refer to the same entity, so no handling is needed.

### Assessing invalid data

In [123]:
genres_counts = original_data['Genres'].value_counts()
genres_counts

Genres
Adventure             1014
RPG                     95
Shooter                 81
Indie                   53
Arcade                  51
Platform                47
Fighting                46
Brawler                 39
Racing                  17
Simulator               16
Card & Board Game       13
Puzzle                  11
Music                    7
Sport                    7
MOBA                     3
[]                       3
Point-and-Click          3
Real Time Strategy       3
Strategy                 1
Visual Novel             1
Name: count, dtype: int64

The output shows the invalid value `[]`, which would affect this assessment and needs to be removed.

In [124]:
original_data.describe()

Unnamed: 0,Rating
count,1498.0
mean,3.720027
std,0.532133
min,0.7
25%,3.4
50%,3.8
75%,4.1
max,4.8


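The output shows that both the maximum and minimum values fall within a normal range, so no processing is needed.

## Data cleaning

Based on the conclusions from the assessment above, the data cleaning needs to:  

- Convert the `Times Listed` and `Number of Reviews` variables to a numeric type
- Convert the `Release Date` variable to a date type
- Fill missing `Rating` values with the mean of `Rating`
- Remove rows where `Title`, `Team`, and `Release Date` are all duplicated
- Remove `[]` values from `Genres`


To keep the cleaned data separate from the original data, we create a new variable `cleaned_data` as a copy of `original_data`. All subsequent cleaning steps are applied to `cleaned_data`.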

In [125]:
cleaned_data = original_data.copy()
cleaned_data.head()

Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Main_team,Secondary_teams
0,Elden Ring,"Feb 25, 2022","Bandai Namco Entertainment,FromSoftware",4.5,3.9K,3.9K,Adventure,"Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""The first playthrough of elden ring is one of the best eperiences gaming can offer you but aft...",17K,3.8K,4.6K,4.8K,Bandai Namco Entertainment,FromSoftware
1,Hades,"Dec 10, 2019",Supergiant Games,4.3,2.9K,2.9K,Adventure,"A rogue-lite hack and slash dungeon crawler in which Zagreus, son of Hades the Greek god of the ...",['convinced this is a roguelike for people who do not like the genre. The art is technically goo...,21K,3.2K,6.3K,3.6K,Supergiant Games,
2,The Legend of Zelda: Breath of the Wild,"Mar 03, 2017","Nintendo,Nintendo EPD Production Group No. 3",4.4,4.3K,4.3K,Adventure,The Legend of Zelda: Breath of the Wild is the first 3D open-world game in the Zelda series. Lin...,['This game is the game (that is not CS:GO) that I have played the most ever. I have played this...,30K,2.5K,5K,2.6K,Nintendo,Nintendo EPD Production Group No. 3
3,Undertale,"Sep 15, 2015","tobyfox,8-4",4.2,3.5K,3.5K,Adventure,"A small child falls into the Underground, where monsters have long been banished by humans and a...",['soundtrack is tied for #1 with nier automata. a super charming story and characters which hav...,28K,679,4.9K,1.8K,tobyfox,8-4
4,Hollow Knight,"Feb 24, 2017",Team Cherry,4.4,3K,3K,Adventure,A 2D metroidvania with an emphasis on close combat and exploration in which the player enters th...,"[""this games worldbuilding is incredible, with its amazing soundtrack and gorgeous art direction...",21K,2.4K,8.3K,2.3K,Team Cherry,


In [126]:
def convert_to_number(s):
    if 'K' in s:
        return float(s.replace('K', '')) * 1000
    return float(s) * 1000

In [127]:
cleaned_data['Number of Reviews'] = cleaned_data['Number of Reviews'].apply(convert_to_number)

In [128]:
cleaned_data['Times Listed'] = cleaned_data['Times Listed'].apply(convert_to_number)
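`convert_to_number` multiplies every value by 1000, including values without a `K` suffix, so a raw value such as `94` becomes `94000.0`. The sketch below shows one possible variant that scales only suffixed values. The name `parse_count` and the sample values are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd


def parse_count(value):
    """Convert strings such as '3.9K' or '679' to a float count.

    Hypothetical alternative to convert_to_number: only values that
    carry a 'K' suffix are multiplied by 1000.
    """
    text = str(value).strip()
    if text.upper().endswith("K"):
        return float(text[:-1]) * 1000
    return float(text)


# Made-up sample in the same format as the dataset columns.
sample = pd.Series(["3.9K", "679", "94", "3K"])
converted = sample.apply(parse_count)
print(converted.tolist())
```

The same function could then be applied to `Times Listed` and `Number of Reviews` in place of `convert_to_number`.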

In [129]:
cleaned_data

Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Main_team,Secondary_teams
0,Elden Ring,"Feb 25, 2022","Bandai Namco Entertainment,FromSoftware",4.5,3900.0,3900.0,Adventure,"Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""The first playthrough of elden ring is one of the best eperiences gaming can offer you but aft...",17K,3.8K,4.6K,4.8K,Bandai Namco Entertainment,FromSoftware
1,Hades,"Dec 10, 2019",Supergiant Games,4.3,2900.0,2900.0,Adventure,"A rogue-lite hack and slash dungeon crawler in which Zagreus, son of Hades the Greek god of the ...",['convinced this is a roguelike for people who do not like the genre. The art is technically goo...,21K,3.2K,6.3K,3.6K,Supergiant Games,
2,The Legend of Zelda: Breath of the Wild,"Mar 03, 2017","Nintendo,Nintendo EPD Production Group No. 3",4.4,4300.0,4300.0,Adventure,The Legend of Zelda: Breath of the Wild is the first 3D open-world game in the Zelda series. Lin...,['This game is the game (that is not CS:GO) that I have played the most ever. I have played this...,30K,2.5K,5K,2.6K,Nintendo,Nintendo EPD Production Group No. 3
3,Undertale,"Sep 15, 2015","tobyfox,8-4",4.2,3500.0,3500.0,Adventure,"A small child falls into the Underground, where monsters have long been banished by humans and a...",['soundtrack is tied for #1 with nier automata. a super charming story and characters which hav...,28K,679,4.9K,1.8K,tobyfox,8-4
4,Hollow Knight,"Feb 24, 2017",Team Cherry,4.4,3000.0,3000.0,Adventure,A 2D metroidvania with an emphasis on close combat and exploration in which the player enters th...,"[""this games worldbuilding is incredible, with its amazing soundtrack and gorgeous art direction...",21K,2.4K,8.3K,2.3K,Team Cherry,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1507,Back to the Future: The Game,"Dec 22, 2010",Telltale Games,3.2,94000.0,94000.0,Adventure,Back to the Future: The Game is one of Telltale Games' popular episodic games. It follows the st...,['Very enjoyable game. The story adds onto the movies without ruining anything from them. The ga...,763,5,223,67,Telltale Games,
1508,Team Sonic Racing,"May 21, 2019","Sumo Digital,Sega",2.9,264000.0,264000.0,Arcade,Team Sonic Racing combines the best elements of arcade and fast-paced competitive style racing a...,"['jogo morto mas bom', 'not my cup of tea', ""Compared to the previous two sonic kart racers from...",1.5K,49,413,107,Sumo Digital,Sega
1509,Dragon's Dogma,"May 22, 2012",Capcom,3.7,210000.0,210000.0,Brawler,"Set in a huge open world, Dragon’s Dogma: Dark Arisen presents a rewarding action combat experie...","['Underrated.', 'A grandes rasgos, es como un MMO pero para un jugador. Me explico:', 'peak kino...",1.1K,45,487,206,Capcom,
1510,Baldur's Gate 3,"Oct 06, 2020",Larian Studios,4.1,165000.0,165000.0,Adventure,"An ancient evil has returned to Baldur's Gate, intent on devouring it from the inside out. The f...","['Bu türe bu oyunla girmeye çalışmak hataydı sanırım.', ""Even if this turns out to be a perfect ...",269,79,388,602,Larian Studios,


Convert the `Release Date` variable to a date type

In [130]:
cleaned_data['Release Date'] = pd.to_datetime(cleaned_data['Release Date']).dt.strftime('%Y/%m/%d')

ValueError: time data "releases on TBD" doesn't match format "%b %d, %Y", at position 644. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

The code raised an error, which suggests that the data contains values whose format differs from `Oct 14, 2008`.
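One way to locate such values directly, rather than scanning `value_counts()`, is to parse with `errors="coerce"` and inspect the rows that come back as `NaT`. This is a minimal sketch on made-up sample values, assuming the expected format is `%b %d, %Y` as reported in the error message:

```python
import pandas as pd

# Made-up sample mixing the expected format with one placeholder string.
dates = pd.Series(["Feb 25, 2022", "releases on TBD", "Oct 14, 2008"])

# errors="coerce" turns values that do not match the format into NaT
# instead of raising ValueError.
parsed = pd.to_datetime(dates, format="%b %d, %Y", errors="coerce")

# Rows where parsing failed are the ones with an unexpected format.
bad_values = dates[parsed.isna()]
print(bad_values)
```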

In [131]:
cleaned_data['Release Date'].value_counts()

Release Date
Nov 12, 2020       8
Nov 19, 2006       7
Jun 16, 2022       6
Oct 28, 2022       5
Nov 01, 2011       5
Nov 13, 2007       5
Oct 31, 2019       5
Sep 10, 2021       5
Jun 11, 2021       5
Nov 15, 2019       4
Sep 20, 2019       4
Jan 25, 2019       4
Sep 18, 2010       4
Sep 24, 2001       4
Mar 25, 2022       4
Mar 19, 2020       4
Jan 26, 2010       4
Nov 17, 2009       4
Oct 18, 2011       4
Nov 18, 2022       4
Oct 07, 2021       4
Jun 15, 2018       4
Oct 09, 2007       4
Jul 19, 2022       4
Jan 21, 2016       3
Jun 23, 2015       3
Oct 15, 2019       3
Sep 15, 2005       3
Apr 02, 2015       3
Jun 23, 1991       3
Feb 10, 2023       3
Apr 10, 2008       3
Apr 03, 2020       3
Nov 02, 2021       3
Sep 22, 2017       3
May 18, 2010       3
Jul 06, 2015       3
Jul 22, 2022       3
Jul 26, 2014       3
Sep 14, 2021       3
Dec 10, 1993       3
Mar 28, 2002       3
Mar 13, 2012       3
Dec 01, 2017       3
Sep 09, 2022       3
Sep 01, 2014       3
Apr 18, 2019       3


Inspection confirms that `Release Date` contains the value `releases on TBD`. This is invalid data and should be removed.

In [132]:
cleaned_data[cleaned_data['Release Date'] == 'releases on TBD']

Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Main_team,Secondary_teams
644,Deltarune,releases on TBD,tobyfox,4.3,313000.0,313000.0,Adventure,"UNDERTALE's parallel story, DELTARUNE. Meet new and old characters in a tale that steps closer t...","['Spamton is so hot, I want to kiss him in the mouth', ""The scary thing about a video game that ...",1.3K,83,468,617,tobyfox,
649,Death Stranding 2,releases on TBD,Kojima Productions,,105000.0,105000.0,Adventure,,[],3,0,209,644,Kojima Productions,
1252,Elden Ring: Shadow of the Erdtree,releases on TBD,"FromSoftware,Bandai Namco Entertainment",4.8,18000.0,18000.0,Adventure,An expansion to Elden Ring setting players on a new adventure in the Lands Between.,['I really loved that they integrated Family Guy shorts that plays constantly at the corner of y...,1,0,39,146,FromSoftware,Bandai Namco Entertainment


In [133]:
cleaned_data = cleaned_data.drop([644,649,1252])
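Hard-coded index labels only work for this exact file. A boolean mask expresses the same intent without listing the labels. The following is a sketch on a made-up frame; `df`, `is_tbd`, and `df_known` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "Title": ["Hades", "Deltarune", "Undertale"],
    "Release Date": ["Dec 10, 2019", "releases on TBD", "Sep 15, 2015"],
})

# Keep every row whose release date is not the placeholder string.
is_tbd = df["Release Date"] == "releases on TBD"
df_known = df[~is_tbd].copy()
print(df_known)
```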

In [134]:
cleaned_data['Release Date'] = pd.to_datetime(cleaned_data['Release Date']).dt.strftime('%Y/%m/%d')
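Because `.dt.strftime` returns strings, the column keeps the `object` dtype after this step, as the later `info()` output shows. If a true date type is wanted for grouping by year or decade, one option is to keep the parsed datetimes. This is a minimal sketch on made-up rows; the `Year` and `Decade` column names are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"Release Date": ["Feb 25, 2022", "Dec 10, 2019", "Jun 23, 1991"]}
)

# Parse once and keep the datetime64 dtype instead of formatting back to text.
df["Release Date"] = pd.to_datetime(df["Release Date"], format="%b %d, %Y")

# Hypothetical helper columns for grouping by year or decade.
df["Year"] = df["Release Date"].dt.year
df["Decade"] = (df["Year"] // 10) * 10
print(df.dtypes)
print(df)
```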

In [135]:
cleaned_data.head()

Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Main_team,Secondary_teams
0,Elden Ring,2022/02/25,"Bandai Namco Entertainment,FromSoftware",4.5,3900.0,3900.0,Adventure,"Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""The first playthrough of elden ring is one of the best eperiences gaming can offer you but aft...",17K,3.8K,4.6K,4.8K,Bandai Namco Entertainment,FromSoftware
1,Hades,2019/12/10,Supergiant Games,4.3,2900.0,2900.0,Adventure,"A rogue-lite hack and slash dungeon crawler in which Zagreus, son of Hades the Greek god of the ...",['convinced this is a roguelike for people who do not like the genre. The art is technically goo...,21K,3.2K,6.3K,3.6K,Supergiant Games,
2,The Legend of Zelda: Breath of the Wild,2017/03/03,"Nintendo,Nintendo EPD Production Group No. 3",4.4,4300.0,4300.0,Adventure,The Legend of Zelda: Breath of the Wild is the first 3D open-world game in the Zelda series. Lin...,['This game is the game (that is not CS:GO) that I have played the most ever. I have played this...,30K,2.5K,5K,2.6K,Nintendo,Nintendo EPD Production Group No. 3
3,Undertale,2015/09/15,"tobyfox,8-4",4.2,3500.0,3500.0,Adventure,"A small child falls into the Underground, where monsters have long been banished by humans and a...",['soundtrack is tied for #1 with nier automata. a super charming story and characters which hav...,28K,679,4.9K,1.8K,tobyfox,8-4
4,Hollow Knight,2017/02/24,Team Cherry,4.4,3000.0,3000.0,Adventure,A 2D metroidvania with an emphasis on close combat and exploration in which the player enters th...,"[""this games worldbuilding is incredible, with its amazing soundtrack and gorgeous art direction...",21K,2.4K,8.3K,2.3K,Team Cherry,


In [136]:
cleaned_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1508 entries, 0 to 1511
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Title              1508 non-null   object 
 1   Release Date       1508 non-null   object 
 2   Team               1508 non-null   object 
 3   Rating             1496 non-null   float64
 4   Times Listed       1508 non-null   float64
 5   Number of Reviews  1508 non-null   float64
 6   Genres             1508 non-null   object 
 7   Summary            1508 non-null   object 
 8   Reviews            1508 non-null   object 
 9   Plays              1508 non-null   object 
 10  Playing            1508 non-null   object 
 11  Backlogs           1508 non-null   object 
 12  Wishlist           1508 non-null   object 
 13  Main_team          1508 non-null   object 
 14  Secondary_teams    1203 non-null   object 
dtypes: float64(3), object(12)
memory usage: 188.5+ KB


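### Missing values

Because the goal is to analyze how popular game genres change across years and to find the most popular genre of each era, we weight the parameters as follows:  
  
Required parameters: `Release Date`, `Genres`, `Times Listed`, `Rating`  
Auxiliary parameters: `Title`, `Number of Reviews`, `Team`  
Unnecessary parameters: `Summary`, `Reviews`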

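The `info()` output above shows missing values in `Rating` and `Secondary_teams`. `Rating` is a required parameter, so its missing values are filled with the mean in the invalid-data step below. `Secondary_teams` only supplements the auxiliary parameter `Team` and does not affect this analysis, so it needs no processing.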

### Duplicate values

Remove rows where `Title`, `Team`, and `Release Date` are all duplicated

In [137]:
cleaned_data.drop_duplicates(subset=['Title', 'Team', 'Release Date'], inplace=True)

Check whether any duplicates remain

In [138]:
duplicates = cleaned_data[cleaned_data.duplicated(subset=['Title', 'Team', 'Release Date'], keep=False)]
duplicates

Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Main_team,Secondary_teams
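The difference between the two calls can be seen on a small made-up frame: `duplicated(keep=False)` flags every member of a duplicated group for inspection, while `drop_duplicates` keeps the first row of each group by default. All values below are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Title": ["Hades", "Hades", "Undertale"],
    "Team": ["Supergiant Games", "Supergiant Games", "tobyfox,8-4"],
    "Release Date": ["2019/12/10", "2019/12/10", "2015/09/15"],
})
key = ["Title", "Team", "Release Date"]

# keep=False marks all rows of a duplicated group, not just the later ones.
print(df[df.duplicated(subset=key, keep=False)])

# drop_duplicates keeps the first row of each group by default.
deduped = df.drop_duplicates(subset=key)

# After dropping, no row should be flagged as a duplicate on the same key.
remaining = deduped.duplicated(subset=key, keep=False).sum()
print(len(deduped), remaining)
```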


### Invalid data

Fill missing `Rating` values with the mean of `Rating`

In [144]:
cleaned_data['Rating'] = cleaned_data['Rating'].fillna(round(cleaned_data['Rating'].mean(),1))

In [145]:
cleaned_data['Rating'][cleaned_data['Rating'].isnull()]

Series([], Name: Rating, dtype: float64)

### Remove `[]` values from `Genres`

In [141]:
cleaned_data['Genres'][cleaned_data['Genres'].apply(lambda x: len(x) == 0)]

713     []
1309    []
1475    []
Name: Genres, dtype: object

In [142]:
cleaned_data.drop([713,1309,1475],inplace=True)

In [143]:
cleaned_data['Genres'][cleaned_data['Genres'].apply(lambda x: len(x) == 0)]

Series([], Name: Genres, dtype: object)
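Whether `len(x) == 0` matches depends on how `Genres` is stored: it is true for an empty list, but the literal string `'[]'` has length 2. The sketch below handles both representations and filters with a mask instead of hard-coded index labels. The helper `is_empty_genre` and the sample frame are assumptions for illustration:

```python
import pandas as pd


def is_empty_genre(value):
    """Return True for an empty list or for the literal string '[]'."""
    if isinstance(value, str):
        return value.strip() in ("", "[]")
    return len(value) == 0


# Made-up rows covering a normal genre, the string '[]', and an empty list.
df = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Genres": ["Adventure", "[]", []],
})

mask = df["Genres"].apply(is_empty_genre)
df_with_genre = df[~mask]
print(df_with_genre)
```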

## Save the cleaned data

In [146]:
cleaned_data.head()

Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Main_team,Secondary_teams
0,Elden Ring,2022/02/25,"Bandai Namco Entertainment,FromSoftware",4.5,3900.0,3900.0,Adventure,"Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""The first playthrough of elden ring is one of the best eperiences gaming can offer you but aft...",17K,3.8K,4.6K,4.8K,Bandai Namco Entertainment,FromSoftware
1,Hades,2019/12/10,Supergiant Games,4.3,2900.0,2900.0,Adventure,"A rogue-lite hack and slash dungeon crawler in which Zagreus, son of Hades the Greek god of the ...",['convinced this is a roguelike for people who do not like the genre. The art is technically goo...,21K,3.2K,6.3K,3.6K,Supergiant Games,
2,The Legend of Zelda: Breath of the Wild,2017/03/03,"Nintendo,Nintendo EPD Production Group No. 3",4.4,4300.0,4300.0,Adventure,The Legend of Zelda: Breath of the Wild is the first 3D open-world game in the Zelda series. Lin...,['This game is the game (that is not CS:GO) that I have played the most ever. I have played this...,30K,2.5K,5K,2.6K,Nintendo,Nintendo EPD Production Group No. 3
3,Undertale,2015/09/15,"tobyfox,8-4",4.2,3500.0,3500.0,Adventure,"A small child falls into the Underground, where monsters have long been banished by humans and a...",['soundtrack is tied for #1 with nier automata. a super charming story and characters which hav...,28K,679,4.9K,1.8K,tobyfox,8-4
4,Hollow Knight,2017/02/24,Team Cherry,4.4,3000.0,3000.0,Adventure,A 2D metroidvania with an emphasis on close combat and exploration in which the player enters th...,"[""this games worldbuilding is incredible, with its amazing soundtrack and gorgeous art direction...",21K,2.4K,8.3K,2.3K,Team Cherry,


In [150]:
cleaned_data.to_csv("../数据集/整洁数据/game_cleaned.csv", index=False)

In [151]:
cleaned_data.to_excel("../数据集/整洁数据/game_cleaned.xlsx", index=False)

In [152]:
pd.read_csv("../数据集/整洁数据/game_cleaned.csv").head()

Unnamed: 0,Title,Release Date,Team,Rating,Times Listed,Number of Reviews,Genres,Summary,Reviews,Plays,Playing,Backlogs,Wishlist,Main_team,Secondary_teams
0,Elden Ring,2022/02/25,"Bandai Namco Entertainment,FromSoftware",4.5,3900.0,3900.0,Adventure,"Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and...","[""The first playthrough of elden ring is one of the best eperiences gaming can offer you but aft...",17K,3.8K,4.6K,4.8K,Bandai Namco Entertainment,FromSoftware
1,Hades,2019/12/10,Supergiant Games,4.3,2900.0,2900.0,Adventure,"A rogue-lite hack and slash dungeon crawler in which Zagreus, son of Hades the Greek god of the ...",['convinced this is a roguelike for people who do not like the genre. The art is technically goo...,21K,3.2K,6.3K,3.6K,Supergiant Games,
2,The Legend of Zelda: Breath of the Wild,2017/03/03,"Nintendo,Nintendo EPD Production Group No. 3",4.4,4300.0,4300.0,Adventure,The Legend of Zelda: Breath of the Wild is the first 3D open-world game in the Zelda series. Lin...,['This game is the game (that is not CS:GO) that I have played the most ever. I have played this...,30K,2.5K,5K,2.6K,Nintendo,Nintendo EPD Production Group No. 3
3,Undertale,2015/09/15,"tobyfox,8-4",4.2,3500.0,3500.0,Adventure,"A small child falls into the Underground, where monsters have long been banished by humans and a...",['soundtrack is tied for #1 with nier automata. a super charming story and characters which hav...,28K,679,4.9K,1.8K,tobyfox,8-4
4,Hollow Knight,2017/02/24,Team Cherry,4.4,3000.0,3000.0,Adventure,A 2D metroidvania with an emphasis on close combat and exploration in which the player enters th...,"[""this games worldbuilding is incredible, with its amazing soundtrack and gorgeous art direction...",21K,2.4K,8.3K,2.3K,Team Cherry,
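CSV files do not store dtypes, so date columns are read back as plain strings unless `read_csv` is told otherwise. If a later analysis step needs real dates, `parse_dates` can restore them on load. This is a minimal sketch using an in-memory buffer in place of the actual file, with made-up data:

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Title": ["Hades"],
    "Release Date": pd.to_datetime(["2019-12-10"]),
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates asks read_csv to convert the named column back to datetime64.
restored = pd.read_csv(buffer, parse_dates=["Release Date"])
print(restored.dtypes)
```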
