This article analyzes video games at three levels. The first level is all observations that remain after rows with missing values are deleted; it shows the general composition and development trend of the game market. The second level is the top 100 games in the world, which this article treats as the excellent games in the market. The third level is the top 10 games in the world, which this article treats as the classics. Each game-related variable is analyzed at all three levels separately, and the differences between the levels are used to draw more varied conclusions.

# 1. Preprocessing

First, import the packages and the data.

In [None]:
import numpy as np
import pandas as pd 
import re
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.figure_factory as ff
import plotly
from scipy.stats import chi2_contingency
from wordcloud import wordcloud
import matplotlib.pyplot as plt

In [None]:
data = pd.read_csv("../input/videogamesales/vgsales.csv")

Preview the first ten rows of data.

In [None]:
data.head(10)

View the data types and missing values of each column.

In [None]:
data.info()

It can be seen that there are missing values in the `Year` and `Publisher` columns, but the proportions are low, which will not affect the relevant research on the variables, so the samples with missing values are directly deleted. 
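
The claim that the missing proportions are low can be checked numerically before any rows are dropped. The following is a minimal sketch on a made-up toy table (the values are not from the real dataset); on the real data the same two expressions would be applied to `data` before calling `dropna()`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (values are made up for illustration).
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Year": [2006.0, np.nan, 2008.0, 2009.0, 2010.0],
    "Publisher": ["P1", "P2", None, "P1", "P3"],
})

# Fraction of missing values in each column.
missing_share = toy.isna().mean()

# Fraction of rows with at least one missing value, i.e. rows dropna() would remove.
rows_lost = toy.isna().any(axis=1).mean()

print(missing_share)
print(f"rows removed by dropna(): {rows_lost:.0%}")
```

If `rows_lost` is small on the real data, deleting the incomplete rows is unlikely to distort the later analysis.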

In [None]:
data = data.dropna()

Since some samples were deleted, the index needs to be reset. 

In [None]:
data = data.reset_index(drop=True)

After the rows are deleted, the values in the `Rank` column are no longer consecutive, so `Rank` is recomputed from the row order.

In [None]:
data.loc[:,'Rank'] = np.arange(data.shape[0])+1

Convert the `Year` column from float to integer for cleaner display later.

In [None]:
data['Year'] = data['Year'].astype(int)

# 2. Data Analysis

## 2.1 Overview of overall sales data 

Get an overview of the sales distribution across the whole dataset.

In [None]:
fig = px.box(data, y="Global_Sales", points="all", height = 400, color_discrete_sequence=['#A4CCD9'])
fig.show()

Since sales span a very wide range while most values are concentrated between 0 and 10, a single box plot or histogram over the whole range is hard to read. The data is therefore split at `Global_Sales` = 10, and each part is shown in its own histogram.

In [None]:
temp1 = data[data['Global_Sales'] < 10 ]
fig = px.histogram(temp1, x="Global_Sales", height = 400, color_discrete_sequence=['#A4CCD9'])
fig.show()

In [None]:
temp2 = data[data['Global_Sales'] >= 10]  # >= so that a game with exactly 10 is not left out of both plots
fig = px.histogram(temp2, x="Global_Sales", height = 400, color_discrete_sequence=['#A4CCD9'])
fig.show()

The first histogram shows that the number of games with total sales between 0 and 1 is extremely high. The second shows that the top-selling game is about 40 sales units ahead of the runner-up. Taken together, the two figures show that relatively few games have total sales above 10, while most games fall in the 0-1 range, which to some extent reflects the 80/20 rule (Pareto principle) in game sales.
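
The 80/20 observation can be quantified by computing the share of total sales contributed by the best-selling 20% of games. The sketch below uses made-up sales figures; `top_share` is a hypothetical helper name, and in the notebook the input would be `data['Global_Sales']`.

```python
import pandas as pd

# Toy sales figures (made up); in the notebook this would be data['Global_Sales'].
sales = pd.Series([50.0, 20.0, 10.0, 5.0, 5.0, 4.0, 3.0, 1.0, 1.0, 1.0])

def top_share(values, top_fraction=0.2):
    """Share of total sales contributed by the best-selling `top_fraction` of games."""
    ordered = values.sort_values(ascending=False)
    n_top = max(1, int(round(len(ordered) * top_fraction)))
    return ordered.iloc[:n_top].sum() / ordered.sum()

share = top_share(sales)
print(f"top 20% of games account for {share:.0%} of sales")
```

A share close to 80% on the real data would support the 80/20 reading; a much lower value would suggest a flatter distribution than the histograms imply.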

Plot how the number of releases and total sales change over time.

In [None]:
data_yearcount = data.groupby(data['Year'])[['Rank']].count().rename(columns={'Rank':'counts'})
data_yearsales = data.groupby(data['Year'])[['Global_Sales']].sum()

fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(
    go.Bar(x=data_yearcount.index, y=data_yearcount['counts'], marker=dict(color='rgba(17, 145, 171, 0.6)'), name = 'counts'),
    secondary_y=False,
)
fig.add_trace(
    go.Scatter(x=data_yearsales.index, y=data_yearsales['Global_Sales'], name='Global_Sales'),
    secondary_y=True,
)
fig.update_xaxes(title_text="Year")
fig.update_yaxes(title_text="<b>counts</b>", secondary_y=False)
fig.update_yaxes(title_text="<b>Global_Sales</b>", secondary_y=True)
fig.show()

The figure shows that 2008 and 2009 had both the most game releases and the highest total sales, so these years can be regarded as the golden age of video games. Total sales in a given year generally move with the number of games released. In 2004, however, fewer games were released than in either of the previous two years, yet total sales exceeded both. Two explanations are possible: sales per game were generally higher, or a single hit game appeared. A detailed check of the 2004 sales data points to the former. In addition, before 1995 the number of games released each year was extremely low, but total sales were comparatively high. One possible reason is that these games gradually gained collection value over the years and were purchased by players later.
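
One way to tell the two explanations for 2004 apart is to compare, per year, the average sales per game with the share of the year's total that comes from its single best-selling title. This is a sketch on made-up numbers, not the author's actual check; the column names follow the dataset, and the aggregate names (`counts`, `total`, `mean`, `top`, `top_share`) are chosen here for illustration.

```python
import pandas as pd

# Toy data (made up): 2004 has fewer games than 2003 but higher sales per game.
toy = pd.DataFrame({
    "Year": [2003] * 5 + [2004] * 4,
    "Global_Sales": [1.0] * 5 + [2.0] * 4,
})

yearly = toy.groupby("Year")["Global_Sales"].agg(
    counts="count", total="sum", mean="mean", top="max"
)
# Share of the year's total sales that comes from its single best-selling title.
yearly["top_share"] = yearly["top"] / yearly["total"]
print(yearly)
```

A year whose `mean` rises while `top_share` stays modest points to generally higher sales per game; a year with a very large `top_share` points to one hit title.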

Next, analyze the characteristics of the game itself. 

## 2.2 Platform

Analyze the number of releases and the total sales of each platform at the three levels:

In [None]:
data_platformcount = data.groupby(data['Platform'])[['Rank']].count().rename(columns = {'Rank':'counts'}).sort_values('counts', ascending = False)

fig = px.bar(data_platformcount, x=data_platformcount.index, y='counts', color='counts',color_continuous_scale=['rgba(17, 171, 122, 0.6)', 'rgba(17, 145, 171, 0.6)'],
              height=400)
fig.show()

Across the whole game market there are about 31 game platforms. The DS and PS2 platforms have released the most games. The counts for the remaining platforms decrease gradually without any sharp gaps, which indicates that no single platform monopolizes the share of releases.

In [None]:
data_platformcount100 = data[0:100].groupby(data['Platform'])[['Rank']].count().rename(columns = {'Rank':'counts'}).sort_values('counts', ascending = False)

fig = px.bar(data_platformcount100, x=data_platformcount100.index, y='counts', color='counts',color_continuous_scale=['rgba(17, 171, 122, 0.6)', 'rgba(17, 145, 171, 0.6)'],
              height=400)
fig.show()

A total of 17 platforms have released second-level games. The X360 and Wii platforms stand out at this level. Looking back at the first chart, these two platforms belong to the second echelon in terms of total releases. The comparison suggests that they pursue game quality while also building market size.

In [None]:
data_platformcount10 = data[0:10].groupby(data['Platform'])[['Rank']].count().rename(columns = {'Rank':'counts'}).sort_values('counts', ascending = False)

fig = px.bar(data_platformcount10, x=data_platformcount10.index, y='counts', color='counts',color_continuous_scale=['rgba(17, 171, 122, 0.6)', 'rgba(17, 145, 171, 0.6)'],
              height=400)
fig.show()

At the third level, it is striking that half of the top ten best-selling games are from the Wii platform, which shows the dominance of Wii among the best games. It is also worth noting that the GB and NES platforms each have two games in the top ten, even though they have only 98 and 97 games, respectively, at the first level. At first glance, the two platforms seem to set a high quality bar for the games they release. After a closer look at the games on these two platforms, however, I found that their release years are concentrated in 1983-2001 and that they include many masterpieces with total sales of 10 or more. The consoles for both platforms were discontinued around 2000 because they were replaced by platforms with better overall performance. Even a smash hit eventually becomes a thing of the past. This also shows that the game industry is deeply affected by technological development: the most popular platform of its day is gradually eliminated by technological innovation.

Plot the total sales of each platform.

In [None]:
data_platformsales = data.groupby(data['Platform'])[['Global_Sales']].sum().sort_values('Global_Sales', ascending = False)

fig = px.bar(data_platformsales, x=data_platformsales.index, y='Global_Sales', color='Global_Sales',color_continuous_scale=['rgba(17, 171, 122, 0.6)', 'rgba(17, 145, 171, 0.6)'],
              height=400)
fig.show()

Comparing total sales by platform with the three count charts above shows that platforms follow different operating strategies. Among the five platforms with the highest sales, PS2 and DS mainly win on volume: a large number of releases sustains the platform. In contrast, X360, Wii and PS3 maintain a baseline number of releases while paying attention to game quality, which also brings them both reputation and revenue.
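
The "volume versus quality" contrast can be expressed as average sales per released title on each platform. The following sketch uses made-up numbers; the platform labels are only placeholders borrowed from the text, and `sales_per_title` is a name introduced here.

```python
import pandas as pd

# Toy data (made up): one platform with many modest sellers, one with few strong sellers.
toy = pd.DataFrame({
    "Platform": ["PS2", "PS2", "PS2", "PS2", "Wii", "Wii"],
    "Global_Sales": [1.0, 0.5, 0.5, 1.0, 4.0, 2.0],
})

per_platform = toy.groupby("Platform")["Global_Sales"].agg(counts="count", total="sum")
# Average sales per released title: a rough proxy for "quality over volume".
per_platform["sales_per_title"] = per_platform["total"] / per_platform["counts"]
print(per_platform.sort_values("sales_per_title", ascending=False))
```

On the real data, a platform with a high `counts` but low `sales_per_title` fits the volume strategy described above, and the reverse fits the quality strategy.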

## 2.3 Genre

Plot the number of releases and the sales of each game theme (the `Genre` column).

In [None]:
data_genrecount = data.groupby(data['Genre'])[['Rank']].count().rename(columns = {'Rank':'counts'}).sort_values('counts', ascending = False)
data_genresales = data.groupby(data['Genre'])[['Global_Sales']].sum().sort_values('Global_Sales', ascending = False)
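# Reorder sales to match the genre order of names/values (sorted by counts)
genre_color = data_genresales.loc[data_genrecount.index, 'Global_Sales']
fig = px.treemap(
    names=data_genrecount.index, parents=['total']*len(data_genrecount),
    values=data_genrecount['counts'], color=genre_color,
    color_continuous_scale='RdBu',
    color_continuous_midpoint=np.average(data_genresales['Global_Sales'])
)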
fig.show()

In [None]:
data_genrecount100 = data[0:100].groupby(data['Genre'])[['Rank']].count().rename(columns = {'Rank':'counts'}).sort_values('counts', ascending = False)
data_genresales100 = data[0:100].groupby(data['Genre'])[['Global_Sales']].sum().sort_values('Global_Sales', ascending = False)
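# Reorder sales to match the genre order of names/values (sorted by counts)
genre_color100 = data_genresales100.loc[data_genrecount100.index, 'Global_Sales']
fig = px.treemap(
    names=data_genrecount100.index, parents=['total']*len(data_genrecount100),
    values=data_genrecount100['counts'], color=genre_color100,
    color_continuous_scale='RdBu',
    color_continuous_midpoint=np.average(data_genresales100['Global_Sales'])
)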
fig.show()

In [None]:
data_genrecount10 = data[0:10].groupby(data['Genre'])[['Rank']].count().rename(columns = {'Rank':'counts'}).sort_values('counts', ascending = False)
data_genresales10 = data[0:10].groupby(data['Genre'])[['Global_Sales']].sum().sort_values('Global_Sales', ascending = False)
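# Reorder sales to match the genre order of names/values (sorted by counts)
genre_color10 = data_genresales10.loc[data_genrecount10.index, 'Global_Sales']
fig = px.treemap(
    names=data_genrecount10.index, parents=['total']*len(data_genrecount10),
    values=data_genrecount10['counts'], color=genre_color10,
    color_continuous_scale='RdBu',
    color_continuous_midpoint=np.average(data_genresales10['Global_Sales'])
)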
fig.show()

The three treemaps above use area to represent the number of releases and color to represent sales. Games are divided into 12 themes, and the market is receptive to all of them. Action games have the largest share of both total releases and total sales. Among the excellent games, shooting and platform games perform even better. Overall, however, the differences in releases and sales between themes are not extreme, and the top ten best-selling games span seven themes, which suggests that a game's success is not limited by its theme.

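Next, the three themes with the highest total sales (sports, shooting, and action) are selected, and their yearly sales are plotted to explore how the popularity of each theme changes over time. Note that the code below also restricts the data to the five platforms with the highest total sales.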

In [None]:
genresam = list(data.groupby(data['Genre'])[['Global_Sales']].sum().sort_values('Global_Sales', ascending = False)[0:3].index)
platsam = list(data.groupby(data['Platform'])[['Global_Sales']].sum().sort_values('Global_Sales', ascending = False)[0:5].index)
temp3 = data[(data['Genre'].isin(genresam)) & (data['Platform'].isin(platsam))]

data_genreyear = temp3.groupby(['Year', 'Genre'])[['Global_Sales']].sum().sort_values('Year', ascending = False).reset_index(drop = False)

fig = px.line(data_genreyear, x="Year", y="Global_Sales",
             color='Genre',color_discrete_sequence=['rgba(17, 171, 122, 0.6)',  'rgba(17, 104, 171, 0.6)', 'rgba(93, 81, 240, 0.6)'],
             height=400)
fig.show()

The line chart shows that action games have been on the market since 1985, while sports and shooting games only began to appear in 2000. The themes also peak at different times: sales of sports games peaked in 2006 and 2009, while sales of shooting games peaked in 2011. All of them are nevertheless affected by the overall game market: 2008 and 2009, the golden age of the game market, were also high points for each theme.

Finally, the five platforms with the highest total sales and the three themes of action, shooting, and sports are used to explore whether platforms favor particular themes in the games they release, and whether the popularity of each theme differs between platforms.

In [None]:
data_genreplatcount = temp3.groupby(['Platform', 'Genre'])[['Rank']].count().rename(columns = {'Rank':'counts'}).sort_values('Platform', ascending = False).reset_index(drop = False)

fig = px.bar_polar(data_genreplatcount, r="counts", theta="Platform", 
                   color='Genre', color_discrete_sequence= ['rgba(17, 171, 122, 0.6)',  'rgba(17, 104, 171, 0.6)', 'rgba(17, 145, 171, 0.6)'])
fig.show()

The wind rose chart shows that platforms are selective about game themes. For example, the Wii and PS2 platforms mainly release sports games, while the PS3 and DS platforms mainly release action games. On the X360 platform, releases are spread relatively evenly across the three themes.
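
Whether platform and theme are associated can also be tested with `chi2_contingency`, which the notebook imports from `scipy.stats`. The sketch below is an illustration on made-up counts, not a result from the real data; in the notebook the contingency table would be built with `pd.crosstab(temp3['Platform'], temp3['Genre'])`.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy long-form data (made up): platform A leans towards Action, platform B towards Sports.
toy = pd.DataFrame({
    "Platform": ["A"] * 40 + ["B"] * 40,
    "Genre": ["Action"] * 30 + ["Sports"] * 10 + ["Action"] * 10 + ["Sports"] * 30,
})

# Contingency table of release counts: rows are platforms, columns are genres.
table = pd.crosstab(toy["Platform"], toy["Genre"])

# Chi-square test of independence between Platform and Genre.
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4g}")
```

A small p-value indicates that the mix of themes differs between platforms more than chance alone would explain. The test treats each game as an independent observation, so it should be read as a rough check rather than a definitive result.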

In [None]:
data_genreplat = temp3.groupby(['Platform', 'Genre'])[['Global_Sales']].sum().sort_values('Platform', ascending = False).reset_index(drop = False)

fig = px.bar_polar(data_genreplat, r='Global_Sales', theta="Platform", 
                   color='Genre', color_discrete_sequence= ['rgba(17, 171, 122, 0.6)',  'rgba(17, 104, 171, 0.6)', 'rgba(17, 145, 171, 0.6)'])
fig.show()

The wind rose chart of sales shows the differences between platforms even more clearly. PS3 has similar sales across the three themes. X360 releases a similar number of games in each theme, but shooting games bring it the largest sales. Wii and PS2 both achieve high sales through sports games, their main theme. The DS platform does not perform well in any of the three themes; perhaps it is stronger in other themes.

## 2.4 Publisher

Since there are many publishers in the market, let's first look at the number of releases of the top ten publishers at the first level.

In [None]:
data_pubcount = data.groupby(data['Publisher'])[['Rank']].count().rename(columns = {'Rank':'counts'}).sort_values('counts', ascending = False)[:10]

fig = px.bar(data_pubcount, x=data_pubcount.index, y='counts', color='counts',color_continuous_scale=['rgba(17, 171, 122, 0.6)', 'rgba(17, 145, 171, 0.6)'],
              height=400)
fig.show()

In terms of the number of releases, the major publishers are extremely close to one another. Next, look at the number of second-level games per publisher.

In [None]:
data_pubcount100 = data[:100].groupby(data['Publisher'])[['Rank']].count().rename(columns = {'Rank':'counts'}).sort_values('counts', ascending = False)

fig = px.pie(data_pubcount100 , names=data_pubcount100.index, values='counts', template='seaborn')
fig.update_traces(pull=[0.06,0.06,0.06,0.06,0.06], textinfo="percent+label")
fig.show()

The pie chart clearly shows that Nintendo accounts for more than half of the second-level games. At the third level, the top ten best-selling games are all from Nintendo, which shows Nintendo's dominant position in the game industry.

Take a closer look at the sales of the top ten publishers 

In [None]:
data_pubsales = data.groupby(data['Publisher'])[['Global_Sales']].sum().sort_values('Global_Sales', ascending = False)[0:10]

fig = px.bar(data_pubsales, x=data_pubsales.index, y='Global_Sales', color='Global_Sales',color_continuous_scale=['rgba(17, 171, 122, 0.6)', 'rgba(17, 145, 171, 0.6)'],
              height=400)
fig.show()

Although Nintendo's total number of releases is not outstanding, its sales rank first and are close to the combined sales of the second- and third-place publishers, which shows that Nintendo has succeeded in both overall game quality and sales.

## 2.5 Market

View the proportion of total sales in each market through the pie chart: 

In [None]:
region_sec = data[['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']].sum()
region_sum = pd.DataFrame.from_dict(region_sec.to_dict(), orient = 'index', columns = ['sum']).sort_values('sum', ascending = False)

fig = px.pie(region_sum , names=region_sum.index, values='sum', template='seaborn')
fig.update_traces(pull=[0,0.01,0.01,0.01],textinfo="percent+label")

fig.show()

Since other markets have a small share of sales, this section focuses on the comparative analysis of the North American market, the European market and the Japanese market. First look at the sales changes in the three markets over time: 

In [None]:
region_sales = data.groupby(data['Year'])[['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']].sum()

fig = px.bar(region_sales, x=region_sales.index, y=['NA_Sales', 'EU_Sales', 'JP_Sales'], labels={'value':'Sales'},
             color_discrete_sequence=['rgba(17, 171, 122, 0.6)',  'rgba(17, 104, 171, 0.6)', 'rgba(17, 145, 171, 0.6)'])
fig.show()

The figure shows that, apart from the period before 1990, when almost all game sales came from North American players, the proportions of the three markets remained basically unchanged.

Next, we will explore whether there are differences in the popularity of different game publishing platforms, genres and publishers in the three markets. (The bubble charts in this section all use the horizontal axis to represent sales in the North American market, the vertical axis to represent sales in the European market, the size of the bubbles to represent sales in Japan, and the colors to represent different types of qualitative variables) 

In [None]:
region_platsales = data.groupby(data['Platform'])[['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales']].sum().sort_values('Global_Sales', ascending = False)[:10]

fig = px.scatter(region_platsales, x="NA_Sales", y="EU_Sales",
         size="JP_Sales", color=region_platsales.index,
                 hover_name=region_platsales.index, size_max=60)
fig.show()

The figure shows significant regional differences in the sales of some platforms. For example, the GBA platform has high sales in the North American and European markets, but its sales in Japan are weak. The PS3 platform has high sales in Europe and Japan, but relatively low sales in the North American market. The PS2 platform performs well in all three markets.

In [None]:
region_genresales = data.groupby(data['Genre'])[['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales']].sum().sort_values('Global_Sales', ascending = False)[:10]

fig = px.scatter(region_genresales, x="NA_Sales", y="EU_Sales",
         size="JP_Sales", color=region_genresales.index,
                 hover_name=region_genresales.index, size_max=60)
fig.show()

For themes, the bubbles lie roughly on a straight line, which suggests that the North American and European markets have similar preferences across themes. However, bubble size does not grow steadily along that line, so the Japanese market's preferences differ from those of North America and Europe. The most obvious example is the role-playing theme: it is the best-selling theme in Japan, but its sales in North America and Europe are comparatively low.
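
The regional preferences read off the bubble chart can be made explicit by normalizing each theme's sales so that the three regional shares sum to 1. This is a sketch on made-up numbers (column names follow the dataset; `genre_share` is a name introduced here).

```python
import pandas as pd

# Toy data (made up): Role-Playing sells mostly in Japan, Action mostly in NA/EU.
toy = pd.DataFrame({
    "Genre": ["Action", "Action", "Role-Playing", "Role-Playing"],
    "NA_Sales": [4.0, 4.0, 1.0, 1.0],
    "EU_Sales": [3.0, 3.0, 1.0, 1.0],
    "JP_Sales": [1.0, 1.0, 3.0, 3.0],
})

regions = ["NA_Sales", "EU_Sales", "JP_Sales"]
genre_sales = toy.groupby("Genre")[regions].sum()

# Divide each row by its own total so every genre's regional shares sum to 1.
genre_share = genre_sales.div(genre_sales.sum(axis=1), axis=0)
print(genre_share.sort_values("JP_Sales", ascending=False))
```

Sorting by the `JP_Sales` share puts the themes that depend most on the Japanese market at the top, independent of their absolute sales.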

In [None]:
region_pubsales = data.groupby(data['Publisher'])[['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales']].sum().sort_values('Global_Sales', ascending = False)[:10]

fig = px.scatter(region_pubsales, x="NA_Sales", y="EU_Sales",
         size="JP_Sales", color=region_pubsales.index,
                 hover_name=region_pubsales.index, size_max=60)
fig.show()

For publishers, Nintendo has the highest sales in all three markets, but the markets differ in their preferences for the other publishers. For example, Electronic Arts is popular in Europe and North America but has extremely low sales in Japan. Ubisoft is more popular in Japan, but its sales are poor in North America and Europe.

## 2.6 Game Name

This section mines the game names using text analysis and explores their characteristics.

First, extract the game name text, split it into words, remove stop words, and count word frequencies. A word cloud weighted by word frequency is then generated for each of the three levels.

In [None]:
# Load the stop-word list, one word per line
with open('../input/stopword/stopword.txt', "r", encoding='utf-8') as f:
    stopWordList = f.read().splitlines()
stopWordSet = set(stopWordList)  # set membership tests are faster than list lookups

def get_words(n):
    """Split the names of the top-n games into words and drop stop words."""
    words = []
    for name in data[:n]['Name']:
        for word in re.split(' |/|:|!', name):
            # Skip empty strings produced by adjacent delimiters such as ": "
            if word and word not in stopWordSet:
                words.append(word)
    return words
all_words = get_words(data.shape[0])

count = {}
for item in all_words:
    count[item] = count.get(item, 0) + 1

word_count = pd.DataFrame.from_dict(count, orient = 'index', columns = ['counts']).sort_values('counts', ascending = False)[0:10]


WC = wordcloud.WordCloud(background_color='white',
    width=2000,
    height=1200,
    max_font_size=400, 
    min_font_size=40,
    max_words=500)

con = WC.generate_from_frequencies(count)
plt.imshow(con)
plt.axis("off")
plt.show()

In [None]:
words100 = get_words(100)

count100 = {}
for item in words100:
    count100[item] = count100.get(item, 0) + 1
    
word_count100 = pd.DataFrame.from_dict(count100, orient = 'index', columns = ['counts']).sort_values('counts', ascending = False)[:10]

WC = wordcloud.WordCloud(background_color='white',
    width=2000,
    height=1200,
    max_font_size=400, 
    min_font_size=40,
    max_words=500)

con = WC.generate_from_frequencies(count100)
plt.imshow(con)
plt.axis("off")
plt.show()

In [None]:
words10 = get_words(10)

count10 = {}
for item in words10:
    count10[item] = count10.get(item, 0) + 1
    
word_count10 = pd.DataFrame.from_dict(count10, orient = 'index', columns = ['counts']).sort_values('counts', ascending = False)

WC = wordcloud.WordCloud(background_color='white',
    width=2000,
    height=1200,
    max_font_size=400, 
    min_font_size=40,
    max_words=500)

con = WC.generate_from_frequencies(count10)
plt.imshow(con)
plt.axis("off")
plt.show()

In the first-level word cloud, the high-frequency words are "star", "super", "dragon", "Game", "heroes", "pro", and so on. The first five have an animation or gaming flavor, so they are widely used in game titles. "pro" means professional and is often appended to a game's original name to indicate a professional edition, so it also appears frequently. In the second-level word cloud, the high-frequency words are "Pokemon", "Mario", "Wii", "call", and "duty". The first two, Pokémon and Mario, are characters that often appear in video games. "Wii" is the name of a game platform and appears in a title to indicate the Wii version of a game, so its high frequency is also understandable. The remaining two words, along with "duck" and "blue" in the third-level word cloud, are different: their meanings alone do not explain their high frequency, and they are not proper nouns. I first guessed that, like "Pokemon", each might be a distinct game IP, but a search found no game IP named by any of these single words. The author then guessed that they might be parts of IP names made up of several words, which the word segmentation split apart so that the IP could no longer be recognized.

To verify this conjecture, the author selected all words with a frequency of more than 20 (541 words in total), traversed the game name texts, and built a co-occurrence matrix that records how often two words appear in the same game name. The element $X_{ij}$ of the matrix is the number of game titles that contain both the $i$-th and the $j$-th word. Because the matrix is two-dimensional, it can only count pairs of words. However, an IP name longer than two words still shows up in the matrix through any two of its words, so most IPs can be identified from pairwise co-occurrence frequencies alone. In the resulting matrix, the pair "call" and "duty" turns out to have the highest co-occurrence frequency. Searching for the two words together leads to the well-known IP "Call of Duty", which explains the high frequency of both "call" and "duty".
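
The construction described above can be sketched as follows. This is not the author's original code, which the notebook does not show. It assumes a handful of example titles in place of `data['Name']`, a hypothetical two-entry stop-word set, lowercasing of words, and a threshold of 2 instead of 20 so that the toy example stays small.

```python
import re
from collections import Counter
from itertools import combinations

import numpy as np
import pandas as pd

# Example titles standing in for data['Name'].
names = [
    "Call of Duty: Black Ops",
    "Call of Duty: Modern Warfare",
    "Call of Duty 4",
    "Super Mario Bros.",
    "Mario Kart Wii",
    "Duck Hunt",
]
stop_words = {"of", ""}  # hypothetical stop-word list; "" drops empty split results
min_count = 2            # assumption for the toy data; the article uses words seen more than 20 times

def tokenize(name):
    """Split a title on the notebook's delimiters, lowercase, and drop stop words."""
    return [w.lower() for w in re.split(' |/|:|!', name) if w.lower() not in stop_words]

# Keep only sufficiently frequent words as rows/columns of the matrix.
word_counts = Counter(w for name in names for w in tokenize(name))
vocab = sorted(w for w, c in word_counts.items() if c >= min_count)
position = {w: i for i, w in enumerate(vocab)}

# X[i, j] = number of titles that contain both word i and word j (diagonal left at 0).
X = np.zeros((len(vocab), len(vocab)), dtype=int)
for name in names:
    present = sorted(set(tokenize(name)) & position.keys())
    for a, b in combinations(present, 2):
        X[position[a], position[b]] += 1
        X[position[b], position[a]] += 1

cooc = pd.DataFrame(X, index=vocab, columns=vocab)

# The matrix is symmetric, so search only the upper triangle for the largest entry.
upper = np.triu(X, k=1)
i, j = np.unravel_index(upper.argmax(), upper.shape)
top_pair = (vocab[i], vocab[j])
print(cooc)
print(top_pair, upper[i, j])
```

Using a set of words per title means that each title contributes at most once to a given pair, which matches the definition of $X_{ij}$ as a count of titles. Sorting the upper-triangle entries instead of taking only the maximum would give a ranked list of the most frequent pairs.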

The co-occurrence matrix can do more than find the word most closely related to a given word: searching the whole matrix for the largest co-occurrence frequencies also reveals IP names hidden in the game titles. Taking the ten word pairs with the highest co-occurrence frequency and looking them up on the Internet yields the following IPs:

![](https://storage.googleapis.com/kagglesdsdata/datasets/1119874/1880581/result.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20210126%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20210126T100118Z&X-Goog-Expires=172799&X-Goog-SignedHeaders=host&X-Goog-Signature=11cbb500f2392bd17caafd2b26913eed45802774707c8ebb693b43fb7272a0641083a52468b7154128ef555b9b28ed2ac8f9e39dcbb99ede6b102ccbf18131988f072b36f1ba5011d049fdf6d0817873439c10bcfe44323ab024991ba942bac264029ac332800788875813e456296ab5da06201bd5230e60cad0f6d4ae086fb0d1449decf3370bfb25d1a2f818d2d5e8506ecab8d936d23c98daca922b883a3f77d3afbe05a86265c63992276b5bd403bfba94a68b1d32890d4758947535c484dbfa99c25ac9fa74d6be0c244fee9a44065bb696ea8bd3847705a4f51d75244ec3bb12e6e7dae3637e8fe8d5da3f953ea99bdf1fe2b00a719d6d75566b153df4)

# 3. Summary

Based on video game sales data, this article starts from the release counts and sales of game platforms, game themes, and publishers, and compares these attributes across the three levels of games. It then segments the market and analyzes whether different regional markets have different preferences for these attributes. Finally, it applies text mining to the game names. By studying single-word frequencies and the co-occurrence matrix of word pairs, it finds that game names usually contain the name of the game IP.