# Project: Assessing and Cleaning Video Game Sales Data

## Analysis Goal

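This dataset supports analysis of market trends, player behavior, and sales performance, which makes it useful to developers, publishers, and analysts in the games industry. The goal of this project is to find the best-selling game genre in each sales region.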
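The goal stated above can be sketched roughly as follows. This is a minimal illustration, not the notebook's own analysis: the `sample` frame is a hypothetical miniature that only borrows the dataset's column names, and grouping by `Genre` then taking `idxmax` is one assumed way to pick the top genre per region.

```python
import pandas as pd

# Hypothetical miniature sample using the dataset's column names.
sample = pd.DataFrame(
    {
        "Genre": ["Sports", "Platform", "Racing", "Role-Playing", "Sports"],
        "NA_Sales": [41.49, 29.08, 15.85, 11.27, 15.75],
        "EU_Sales": [29.02, 3.58, 12.88, 8.89, 11.01],
        "JP_Sales": [3.77, 6.81, 3.79, 10.22, 3.28],
        "Other_Sales": [8.46, 0.77, 3.31, 1.00, 2.96],
    }
)

region_columns = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Sum sales per genre within each region.
genre_totals = sample.groupby("Genre")[region_columns].sum()

# For each region column, the index label (genre) with the largest total.
top_genre_by_region = genre_totals.idxmax()
print(top_genre_by_region)
```

On the full dataset the same two lines would run against the cleaned frame instead of `sample`.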

## Introduction

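The video game sales and downloads dataset contains data on cross-platform video game sales, downloads, and user engagement. It includes details such as game title, genre, platform, release date, total sales (physical and digital), regional distribution, revenue, ratings, and in-game purchases.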

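The columns have the following meanings:
- `Rank`: position in the sales ranking.
- `Name`: name of the game.
- `Platform`: console or PC system.
- `Year`: year of release.
- `Genre`: game genre (action, role-playing, and so on).
- `Publisher`: company that published the game.
- `NA_Sales`: sales in North America (millions).
- `EU_Sales`: sales in Europe (millions).
- `JP_Sales`: sales in Japan (millions).
- `Other_Sales`: sales in other regions (millions).
- `Global_Sales`: total worldwide sales (millions).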
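Since `Global_Sales` is described as the worldwide total, a quick consistency check is to compare it with the sum of the four regional columns. The sketch below uses three rows copied from the data preview; the `tolerance` of 0.02 is an assumption chosen to absorb rounding to two decimals, not a value from the dataset's documentation.

```python
import pandas as pd

# Three rows taken from the top of the dataset preview.
rows = pd.DataFrame(
    {
        "Name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii"],
        "NA_Sales": [41.49, 29.08, 15.85],
        "EU_Sales": [29.02, 3.58, 12.88],
        "JP_Sales": [3.77, 6.81, 3.79],
        "Other_Sales": [8.46, 0.77, 3.31],
        "Global_Sales": [82.74, 40.24, 35.82],
    }
)

region_columns = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Row-wise sum of the regional figures.
regional_sum = rows[region_columns].sum(axis=1)

# Absolute gap between the regional sum and the reported global total.
difference = (regional_sum - rows["Global_Sales"]).abs()

tolerance = 0.02  # assumed allowance for rounding error
consistent = difference <= tolerance
print(rows.assign(Regional_Sum=regional_sum, Consistent=consistent))
```

Rows where `consistent` is false would be worth inspecting by hand before any aggregation.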

## Reading the Data

#### **Import pandas and read the data**

In [1]:
import pandas as pd

In [2]:
Orgin_Data = pd.read_csv("./video games sales.csv")

In [3]:
Orgin_Data

,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.00,31.37
...,...,...,...,...,...,...,...,...,...,...,...
16593,16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.00,0.00,0.00,0.01
16594,16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.00,0.00,0.00,0.01
16595,16598,SCORE International Baja 1000: The Official Game,PS2,2008.0,Racing,Activision,0.00,0.00,0.00,0.00,0.01
16596,16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.00,0.01,0.00,0.00,0.01


#### **Set Rank as the index**

In [4]:
Orgin_Data.set_index("Rank",inplace=True)
Orgin_Data

Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00
5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.00,31.37
...,...,...,...,...,...,...,...,...,...,...
16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.00,0.00,0.00,0.01
16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.00,0.00,0.00,0.01
16598,SCORE International Baja 1000: The Official Game,PS2,2008.0,Racing,Activision,0.00,0.00,0.00,0.00,0.01
16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.00,0.01,0.00,0.00,0.01


## Assessing the Data

#### **Initial inspection**

In [5]:
Orgin_Data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 16598 entries, 1 to 16600
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Name          16598 non-null  object 
 1   Platform      16598 non-null  object 
 2   Year          16327 non-null  float64
 3   Genre         16598 non-null  object 
 4   Publisher     16540 non-null  object 
 5   NA_Sales      16598 non-null  float64
 6   EU_Sales      16598 non-null  float64
 7   JP_Sales      16598 non-null  float64
 8   Other_Sales   16598 non-null  float64
 9   Global_Sales  16598 non-null  float64
dtypes: float64(6), object(4)
memory usage: 1.4+ MB


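**Missing values appear in `Publisher` and `Year`. The index runs from 1 to 16600 but holds only 16598 entries, so two rank values are absent. `Year` is stored as `float64` and should be converted to a time type.**

Random spot check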
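The observations above can be reproduced on a small scale. The sketch below is an illustration under assumptions: `sample` is a hypothetical three-row frame, and the two conversions shown (nullable `Int64`, and `datetime64` for rows whose year is known) are possible options rather than what the notebook itself does.

```python
import pandas as pd

# Hypothetical sample with the same kinds of gaps as the real data.
sample = pd.DataFrame(
    {
        "Name": ["Wii Sports", "Madden NFL 2004", "Sonic the Hedgehog"],
        "Year": [2006.0, float("nan"), float("nan")],
        "Publisher": ["Nintendo", "Electronic Arts", None],
    }
)

# Count missing values per column.
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# Option 1: nullable integer keeps missing years as <NA> and drops the ".0".
year_as_int = sample["Year"].astype("Int64")

# Option 2: convert only the known years to datetimes (January 1 of each year).
known_years = sample["Year"].dropna().astype(int).astype(str)
year_as_datetime = pd.to_datetime(known_years, format="%Y")
print(year_as_int)
print(year_as_datetime)
```

A nullable integer is often enough when only the year is known, while a datetime column is more convenient for time-based resampling.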

In [6]:
Orgin_Data.sample(10)

Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
3808,F1 2009,PSP,2009.0,Racing,Codemasters,0.08,0.29,0.0,0.16,0.53
14227,James Pond: Codename Robocod,GBA,2005.0,Platform,Play It,0.02,0.01,0.0,0.0,0.03
15285,Jake Hunter: Detective Chronicles,DS,2008.0,Adventure,Aksys Games,0.02,0.0,0.0,0.0,0.02
2625,FIFA Soccer 2004,XB,2003.0,Sports,Electronic Arts,0.24,0.49,0.0,0.05,0.79
8765,American Chopper,XB,2004.0,Racing,Zoo Digital Publishing,0.11,0.03,0.0,0.01,0.15
9598,The Sky Crawlers: Innocent Aces,Wii,2008.0,Simulation,Namco Bandai Games,0.09,0.02,0.01,0.01,0.13
15224,Tennis no Oji-Sama: Driving Smash! Side King,DS,2008.0,Sports,Konami Digital Entertainment,0.0,0.0,0.02,0.0,0.02
1225,Call of Duty: Modern Warfare: Reflex Edition,Wii,2009.0,Shooter,Activision,0.95,0.43,0.0,0.14,1.53
16579,Rugby Challenge 3,XOne,2016.0,Sports,Alternative Software,0.0,0.01,0.0,0.0,0.01
6641,Spy Kids Challenger,GBA,2002.0,Platform,Disney Interactive Studios,0.18,0.07,0.0,0.0,0.25


Check for missing values

In [7]:
Orgin_Data[Orgin_Data["Publisher"].isnull()]

Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
471,wwe Smackdown vs. Raw 2006,PS2,,Fighting,,1.57,1.02,0.0,0.41,3.0
1305,Triple Play 99,PS,,Sports,,0.81,0.55,0.0,0.1,1.46
1664,Shrek / Shrek 2 2-in-1 Gameboy Advance Video,GBA,2007.0,Misc,,0.87,0.32,0.0,0.02,1.21
2224,Bentley's Hackpack,GBA,2005.0,Misc,,0.67,0.25,0.0,0.02,0.93
3161,Nicktoons Collection: Game Boy Advance Video V...,GBA,2004.0,Misc,,0.46,0.17,0.0,0.01,0.64
3168,SpongeBob SquarePants: Game Boy Advance Video ...,GBA,2004.0,Misc,,0.46,0.17,0.0,0.01,0.64
3768,SpongeBob SquarePants: Game Boy Advance Video ...,GBA,2004.0,Misc,,0.38,0.14,0.0,0.01,0.53
4147,Sonic the Hedgehog,PS3,,Platform,,0.0,0.48,0.0,0.0,0.48
4528,The Fairly Odd Parents: Game Boy Advance Video...,GBA,2004.0,Misc,,0.31,0.11,0.0,0.01,0.43
4637,The Fairly Odd Parents: Game Boy Advance Video...,GBA,2004.0,Misc,,0.3,0.11,0.0,0.01,0.42


58 rows are missing a publisher.

In [8]:
Orgin_Data[Orgin_Data["Year"].isnull()]

Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
180,Madden NFL 2004,PS2,,Sports,Electronic Arts,4.26,0.26,0.01,0.71,5.23
378,FIFA Soccer 2004,PS2,,Sports,Electronic Arts,0.59,2.36,0.04,0.51,3.49
432,LEGO Batman: The Videogame,Wii,,Action,Warner Bros. Interactive Entertainment,1.86,1.02,0.00,0.29,3.17
471,wwe Smackdown vs. Raw 2006,PS2,,Fighting,,1.57,1.02,0.00,0.41,3.00
608,Space Invaders,2600,,Shooter,Atari,2.36,0.14,0.00,0.03,2.53
...,...,...,...,...,...,...,...,...,...,...
16310,Freaky Flyers,GC,,Racing,Unknown,0.01,0.00,0.00,0.00,0.01
16330,Inversion,PC,,Shooter,Namco Bandai Games,0.01,0.00,0.00,0.00,0.01
16369,Hakuouki: Shinsengumi Kitan,PS3,,Adventure,Unknown,0.01,0.00,0.00,0.00,0.01
16430,Virtua Quest,GC,,Role-Playing,Unknown,0.01,0.00,0.00,0.00,0.01


271 rows are missing a year.

These missing values do not affect the analysis.
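One way to read the note above is that the incomplete rows are a small share of the data. The arithmetic below uses only the counts reported by `info()` (16598 entries, 16327 non-null `Year`, 16540 non-null `Publisher`); whether that share is small enough to ignore is a judgment call, not something the data settles.

```python
# Counts reported by Orgin_Data.info().
total_rows = 16598
non_null_year = 16327
non_null_publisher = 16540

# Missing counts are the difference from the total.
missing_year = total_rows - non_null_year
missing_publisher = total_rows - non_null_publisher

# Share of rows affected, as a percentage.
share_year = missing_year / total_rows * 100
share_publisher = missing_publisher / total_rows * 100

print(f"Year missing: {missing_year} rows ({share_year:.2f}%)")
print(f"Publisher missing: {missing_publisher} rows ({share_publisher:.2f}%)")
```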

In [9]:
Orgin_Data.duplicated()

Rank
1        False
2        False
3        False
4        False
5        False
         ...  
16596    False
16597    False
16598    False
16599    False
16600    False
Length: 16598, dtype: bool

In [11]:
Orgin_Data[Orgin_Data.duplicated()]

Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
15002,Wii de Asobu: Metroid Prime,Wii,,Shooter,Nintendo,0.0,0.0,0.02,0.0,0.02


In [12]:
Orgin_Data.loc[15002]

Name            Wii de Asobu: Metroid Prime
Platform                                Wii
Year                                    NaN
Genre                               Shooter
Publisher                          Nintendo
NA_Sales                                0.0
EU_Sales                                0.0
JP_Sales                               0.02
Other_Sales                             0.0
Global_Sales                           0.02
Name: 15002, dtype: object

In [13]:
Orgin_Data[14990:15003]

Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
14993,Bomberman Fantasy Race,PS,1998.0,Racing,Virgin Interactive,0.01,0.01,0.0,0.0,0.02
14994,Colin McRae Rally 2.0,GBA,2002.0,Racing,Codemasters,0.02,0.01,0.0,0.0,0.02
14995,Carnage Heart EXA,PSP,2010.0,Strategy,ArtDink,0.0,0.0,0.02,0.0,0.02
14996,Tropico 4: Modern Times,PC,2012.0,Strategy,Kalypso Media,0.0,0.02,0.0,0.0,0.02
14997,Europa Universalis III Complete,PC,2008.0,Strategy,Paradox Interactive,0.0,0.02,0.0,0.0,0.02
14998,Bugriders: The Race of Kings,PS,1997.0,Racing,GT Interactive,0.01,0.01,0.0,0.0,0.02
14999,Bust-A-Move,3DO,1994.0,Puzzle,Micro Cabin,0.0,0.0,0.02,0.0,0.02
15000,Wii de Asobu: Metroid Prime,Wii,,Shooter,Nintendo,0.0,0.0,0.02,0.0,0.02
15001,Payout Poker & Casino,PSP,,Misc,Unknown,0.02,0.0,0.0,0.0,0.02
15002,Wii de Asobu: Metroid Prime,Wii,,Shooter,Nintendo,0.0,0.0,0.02,0.0,0.02
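The listing above shows that Rank 15000 and Rank 15002 carry identical values. A minimal sketch of how to surface both copies and drop the repeat explicitly is given below; `sample` is a hypothetical three-row frame with a subset of the columns, built from the rows shown above.

```python
import pandas as pd

# Hypothetical sample built from the three rows around the duplicate.
sample = pd.DataFrame(
    {
        "Name": [
            "Wii de Asobu: Metroid Prime",
            "Payout Poker & Casino",
            "Wii de Asobu: Metroid Prime",
        ],
        "Platform": ["Wii", "PSP", "Wii"],
        "Year": [float("nan"), float("nan"), float("nan")],
        "JP_Sales": [0.02, 0.0, 0.02],
    },
    index=pd.Index([15000, 15001, 15002], name="Rank"),
)

# keep=False marks every member of a duplicate group, not only the repeats.
all_copies = sample[sample.duplicated(keep=False)]
print(all_copies)

# drop_duplicates keeps the first occurrence and removes later ones.
deduplicated = sample.drop_duplicates()
print(deduplicated)
```

Note that `duplicated()` compares column values only, so the differing `Rank` index labels do not prevent the two rows from matching.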


In [15]:
Orgin_Data.dropna()

Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00
5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.00,31.37
...,...,...,...,...,...,...,...,...,...,...
16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.00,0.00,0.00,0.01
16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.00,0.00,0.00,0.01
16598,SCORE International Baja 1000: The Official Game,PS2,2008.0,Racing,Activision,0.00,0.00,0.00,0.00,0.01
16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.00,0.01,0.00,0.00,0.01


In [19]:
New_Data = Orgin_Data.dropna()

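Cleaning is complete. Dropping rows with missing values also removes the duplicated entry: both copies (Rank 15000 and Rank 15002) have no `Year`, so `dropna()` discards them.

## Saving the Data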
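Before saving, it can help to verify the result of `dropna()`. The sketch below checks for remaining nulls and also shows one caveat: the missing-year listing contains publishers recorded as the literal string `Unknown`, which `dropna()` does not treat as missing. The `sample` frame is hypothetical, and `Example Game` at Rank 99999 is an invented row added only to illustrate the caveat.

```python
import pandas as pd

# Hypothetical sample; "Example Game" is invented for illustration.
sample = pd.DataFrame(
    {
        "Name": ["Wii Sports", "Freaky Flyers", "Triple Play 99", "Example Game"],
        "Year": [2006.0, float("nan"), float("nan"), 2010.0],
        "Publisher": ["Nintendo", "Unknown", None, "Unknown"],
    },
    index=pd.Index([1, 16310, 1305, 99999], name="Rank"),
)

# dropna() returns a new frame and removes rows with any missing value.
cleaned = sample.dropna()
print(cleaned.isnull().sum())

# Optional stricter pass: treat the placeholder "Unknown" as missing too.
publisher_masked = sample["Publisher"].mask(sample["Publisher"] == "Unknown")
strict = sample.assign(Publisher=publisher_masked).dropna()
print(strict)
```

Whether to treat `Unknown` as missing depends on the analysis; for a genre-by-region question the publisher may not matter at all.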

In [20]:
New_Data.head()

Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


In [22]:
New_Data.to_csv("Finalwork.csv")
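Because `Rank` is the index, `to_csv` writes it as the first column of the file. A minimal sketch of the round trip is shown below; it writes to an in-memory buffer instead of `Finalwork.csv` so that it runs without touching the file system, and the two-row `cleaned` frame is a hypothetical stand-in for `New_Data`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the cleaned frame, indexed by Rank.
cleaned = pd.DataFrame(
    {"Name": ["Wii Sports", "Super Mario Bros."], "Global_Sales": [82.74, 40.24]},
    index=pd.Index([1, 2], name="Rank"),
)

# Write to an in-memory buffer; the index is saved as a "Rank" column.
buffer = io.StringIO()
cleaned.to_csv(buffer)

# Read it back, telling pandas to restore Rank as the index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="Rank")
print(restored)
```

Without `index_col="Rank"`, the reloaded frame would get a fresh 0-based index and `Rank` would become an ordinary column.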