<center>

# Our Python is in another castle

### Videogame data analysis

    
#### Alan Verdugo
#### (Marketing Systems, CIO)

</center>

### A (brief) introduction to Python

Python is a popular and friendly programming language.

It has *MANY* use cases, including:
- Data science (AI, ML, statistics, etc.)
- Other sciences (physics, biology, chemistry, electromagnetism, etc.)
- System administration
- Automation
- Web development
- Backend
- Desktop and mobile apps!
- Games!!
- Much more!!!

<center>

![use_cases](use_cases.png)

</center>

### The importance of data analysis

<center>

![ghost_map1](ghost_map1.jpg)

</center>

<center>

![ghost_map2](ghost_map2.jpg)

</center>

### Let's analyze video game sales data

### Import modules.

In [20]:
import pandas as pd
import json

In [17]:
dataframe = pd.read_csv("/home/alan/git/game_data_analysis/vgsales.csv")
dataframe = dataframe.drop(dataframe[dataframe['Year'] > 2015].index)
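
The filter above keeps only the rows whose `Year` is 2015 or earlier. As a minimal sketch, here is the same pattern on a toy frame (the sample rows are invented for illustration and do not come from `vgsales.csv`). It also shows a detail that matters if the dataset has rows with a missing `Year`: those rows survive the filter, because `NaN > 2015` evaluates to `False`.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; not taken from vgsales.csv.
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Year": [2006.0, 2016.0, np.nan, 2015.0],
})

# Same pattern as above: drop the rows whose Year is after 2015.
filtered = toy.drop(toy[toy["Year"] > 2015].index)

# Rows with a missing Year are kept, because NaN > 2015 is False.
kept_names = filtered["Name"].tolist()
print(kept_names)
```

If rows without a release year should be excluded as well, calling `dropna(subset=["Year"])` on the result would be one way to do it.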

In [18]:
import numpy as np
import chart_studio.plotly as py
from plotly.offline import init_notebook_mode,iplot
import plotly.graph_objs as go
import plotly.figure_factory as ff
import matplotlib.pyplot as plt

# Set up Plotly's offline notebook rendering before the first iplot call.
init_notebook_mode()

### We can display the dataframe as a table

In [24]:
top = dataframe.head(10)
table = ff.create_table(top, colorscale="ice")
# Other built-in colorscales:
# mint        orrd        oranges     oryel       peach       pinkyl      plasma      plotly3
# rdpu        redor       reds        sunset      sunsetdark  teal        tealgrn     viridis
# gray        haline      ice         matter      solar       speed       tempo       thermal
# turbid      armyrose    brbg        earth       fall        geyser      prgn        piyg
# Shrink the font of every cell so the table fits on the slide.
for annotation in table.layout.annotations:
    annotation.font.size = 8
iplot(table)

### Bar charts!

In [26]:
df = dataframe.head(100)
# Count how many of the top 100 games were released on each platform.
trace = go.Histogram(x=df.Platform,
                     marker=dict(color="crimson", line=dict(color='black', width=2)),
                     opacity=0.75)
layout = go.Layout(
    title='Number of top 100 videogames per platform',
    xaxis=dict(title='Platform'),
    yaxis=dict(title="Count"),
    bargap=0.2,
    bargroupgap=0.1,
    paper_bgcolor="rgb(243, 243, 243)",
    plot_bgcolor="rgb(243, 243, 243)")
iplot(go.Figure(data=[trace], layout=layout))
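
When `go.Histogram` receives a categorical column as `x`, each bar shows how many rows carry that value. As a rough sketch, the same counts can be checked in pandas with `value_counts` (the toy rows below are invented for illustration):

```python
import pandas as pd

# Toy data, invented for illustration.
toy = pd.DataFrame({"Platform": ["Wii", "NES", "Wii", "GB"]})

# One count per distinct value, which is what each histogram bar displays.
counts = toy["Platform"].value_counts()
print(counts)
```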

### Rank the top 100 games in a scatter plot

In [27]:
trace1 = go.Scatter(x = df.Rank,
                    y = df.NA_Sales,
                    mode = "markers",
                    name = "North America",
                    marker = dict(color = 'rgba(28, 149, 249, 0.8)',size=8),
                    text = df.Name)
trace2 = go.Scatter(x = df.Rank,
                    y = df.EU_Sales,
                    mode = "markers",
                    name = "Europe",
                    marker = dict(color = 'rgba(249, 94, 28, 0.8)',size=8),
                    text = df.Name)
trace3 = go.Scatter(x = df.Rank,
                    y = df.JP_Sales,
                    mode = "markers",
                    name = "Japan",
                    marker = dict(color = 'rgba(150, 26, 80, 0.8)',size=8),
                    text = df.Name)
trace4 = go.Scatter(x = df.Rank,
                    y = df.Other_Sales,
                    mode = "markers",
                    name = "Other",
                    marker = dict(color = 'lime',size=8),
                    text = df.Name)
data = [trace1, trace2, trace3, trace4]
layout = dict(title = 'North America, Europe, Japan and Other Sales of top 100 videogames',
              xaxis = dict(title='Rank', ticklen=5, zeroline=False, zerolinewidth=1, gridcolor="white"),
              yaxis = dict(title='Sales (In millions)', ticklen=5, zeroline=False, zerolinewidth=1, gridcolor="white"),
              paper_bgcolor = 'rgb(243, 243, 243)',
              plot_bgcolor = 'rgb(243, 243, 243)')
iplot(dict(data = data, layout = layout))

### Another bar chart

#### (Now with the proportion of genres in each region)

In [28]:
# Total sales per genre, for each region and globally.
genre = dataframe.groupby("Genre")[["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"]].sum().reset_index()
# Number of games per genre. Naming the index and the value column explicitly
# keeps the resulting column names the same across pandas versions.
genrecount = dataframe["Genre"].value_counts().rename_axis("Genre").reset_index(name="Counts")
genre = pd.merge(genre, genrecount, on="Genre")

table_data=genre[["Genre","NA_Sales","EU_Sales","JP_Sales","Other_Sales","Global_Sales"]]
table_data = table_data.rename(columns = {"NA_Sales": "North America", 
                                  "EU_Sales":"Europe", 
                                  "JP_Sales": "Japan","Other_Sales":"Other","Global_Sales":"Total"})

x=genre.Genre
NA_Perce=list(genre["NA_Sales"]/genre["Global_Sales"]*100)
EU_Perce=list(genre["EU_Sales"]/genre["Global_Sales"]*100)
JP_Perce=list(genre["JP_Sales"]/genre["Global_Sales"]*100)
Other_Perce=list(genre["Other_Sales"]/genre["Global_Sales"]*100)

trace1 = go.Bar(
    x=x,
    y=NA_Perce,
    name="North America" ,
    xaxis='x2', yaxis='y2',
    marker=dict(
        color='rgb(158,202,225)',
        line=dict(
            color='rgb(8,48,107)',
            width=3),
        ),
    opacity=0.75)
trace2 = go.Bar(
    x=x,
    y=EU_Perce,
    xaxis='x2', yaxis='y2',
    marker=dict(
        color='red',
        line=dict(
            color='rgb(8,48,107)',
            width=3),
        ),
    opacity=0.75,
    name = "Europe",
    )
trace3 = go.Bar(
    x=x,
    y=JP_Perce,
    xaxis='x2', yaxis='y2',
  
    marker=dict(
        color='orange',
        line=dict(
            color='rgb(8,48,107)',
            width=3),
        ),
    opacity=0.75,
    name = "Japan",
    )
trace4 = go.Bar(
    x=x,
    y=Other_Perce,
    xaxis='x2', yaxis='y2',
    
    marker=dict(
        color='purple',
        line=dict(
            color='rgb(8,48,107)',
            width=3),
        ),
    opacity=0.75,
    name = "Other",)
trace5=go.Table(
  header = dict(
    values = table_data.columns,
    line = dict(color = 'rgb(8,48,107)',width=3),
    fill = dict(color = ["darkslateblue","blue","red", "orange","purple","green"]),
    align = ['left','center'],
    font = dict(color = 'white', size = 12),
     height=30,
  ),
  cells = dict(
    values = [table_data.Genre,round(table_data["North America"]),round(table_data["Europe"]), round(table_data["Japan"]), round(table_data["Other"]),round(table_data["Total"])],
    height=30,
    line = dict(color = 'rgb(8,48,107)',width=3),
    fill = dict(color = ["silver","rgb(158,202,225)","darksalmon", "gold","mediumorchid","yellowgreen"]),
    align = ['left', 'center'],
    font = dict(color = '#506784', size = 12)),
    domain=dict(x=[0.60,1],y=[0,0.95]),
)

data = [trace1, trace2,trace3,trace4,trace5]
courier = 'Courier New, monospace'
layout = go.Layout(
    barmode='stack', autosize=False, width=1200, height=650,
    legend=dict(x=.58, y=0, orientation="h",
                font=dict(family=courier, size=11, color='#000'),
                bgcolor='beige', bordercolor='beige', borderwidth=1),
    # title=dict(text=..., font=...) replaces the deprecated titlefont attribute.
    title=dict(text='North America, Europe, Japan and Other Sales Percentage and Amounts According to Genre',
               font=dict(family=courier, size=17, color='black')),
    xaxis2=dict(domain=[0, 0.50], anchor="y2",
                title=dict(text='Genre', font=dict(family=courier)),
                tickfont=dict(family=courier)),
    yaxis2=dict(domain=[0, 1], anchor='x2',
                title=dict(text="Total Percentage", font=dict(family=courier)),
                tickfont=dict(family=courier)),
    paper_bgcolor='beige', plot_bgcolor='beige',
    annotations=[
        dict(text='Sales Percentage According to Region', x=0.08, y=1.02,
             xref="paper", yref="paper", showarrow=False,
             font=dict(size=15, family=courier),
             bgcolor="lightyellow", borderwidth=5),
        dict(text='Total Sales (In Millions)', x=0.9, y=1.02,
             xref="paper", yref="paper", showarrow=False,
             font=dict(size=15, family=courier),
             bgcolor="lightyellow", borderwidth=5),
    ],
)
iplot(go.Figure(data=data, layout=layout))
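
The percentages in the stacked bars come from dividing each region's summed sales by the genre's global total. A small sketch with made-up numbers (the column names follow the dataset, the values do not) shows the calculation for two regions:

```python
import pandas as pd

# Toy data, invented for illustration; sales are in millions.
toy = pd.DataFrame({
    "Genre": ["Action", "Action", "Puzzle"],
    "NA_Sales": [4.0, 2.0, 1.0],
    "EU_Sales": [3.0, 1.0, 1.0],
    "Global_Sales": [8.0, 4.0, 4.0],
})

# Sum the sales of every game within each genre.
totals = toy.groupby("Genre")[["NA_Sales", "EU_Sales", "Global_Sales"]].sum()

# Share of each genre's global sales that came from each region, in percent.
na_share = totals["NA_Sales"] / totals["Global_Sales"] * 100
eu_share = totals["EU_Sales"] / totals["Global_Sales"] * 100
print(na_share)
print(eu_share)
```

In this toy example, North America accounts for half of the Action sales and a quarter of the Puzzle sales.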

### Now with extra dimensions!

#### We can create a 3D visualization of the data by release year, publisher and global sales.

In [30]:
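# Despite the variable name, this takes the top 100 rows. The copy avoids
# pandas' SettingWithCopyWarning when columns are added or changed below.
df1000 = dataframe.iloc[:100, :].copy()

# Min-max normalization: scale global sales to the range [0, 1].
df1000["normsales"] = (df1000["Global_Sales"] - np.min(df1000["Global_Sales"]))/(np.max(df1000["Global_Sales"])-np.min(df1000["Global_Sales"]))

# Convert to strings so the values can be concatenated into the hover text.
df1000.Rank = df1000.Rank.astype("str")
df1000.Global_Sales = df1000.Global_Sales.astype("str")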
trace1 = go.Scatter3d(
    y=df1000["Publisher"],
    x=df1000["Year"],
    z=df1000["normsales"],
    text="Name:"+ df1000.Name +","+" Rank:" + df1000.Rank + " Global Sales: " + df1000["Global_Sales"] +" millions",
    mode='markers',
    marker=dict(
        size=df1000['NA_Sales'],
        color = df1000['normsales'],
        colorscale = "Rainbow",
        colorbar = dict(title = 'Global Sales'),
        line=dict(color='rgb(140, 140, 170)'),
       
    )
)

data=[trace1]

# title=dict(text=..., font=...) replaces the deprecated titlefont attribute.
layout=go.Layout(height=800, width=800,
            title=dict(text='Top 100 Video Games, Release Years, Publishers and Sales',
                       font=dict(color='rgb(20, 24, 54)')),
            scene = dict(xaxis=dict(title=dict(text='Year',
                                               font=dict(color='rgb(20, 24, 54)'))),
                            yaxis=dict(title=dict(text='Publisher',
                                                  font=dict(color='rgb(20, 24, 54)'))),
                            zaxis=dict(title=dict(text='Global Sales',
                                                  font=dict(color='rgb(20, 24, 54)'))),
                            bgcolor = 'whitesmoke'
                           ))
 
fig=go.Figure(data=data, layout=layout)
iplot(fig)
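
The `normsales` column used for the z axis and the marker color is a min-max normalization: each value is shifted by the minimum and divided by the range, so the lowest sale maps to 0 and the highest to 1. A quick sketch on invented values:

```python
import pandas as pd

# Toy sales figures in millions, invented for illustration.
sales = pd.Series([10.0, 20.0, 40.0])

# Min-max normalization: (value - min) / (max - min).
normalized = (sales - sales.min()) / (sales.max() - sales.min())
print(normalized.tolist())
```

Note that this formula divides by zero when every value is identical, so a real pipeline would probably want to guard against that case.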

### Analyze StarCraft II replays with Jupyter Notebooks

#### https://developer.ibm.com/technologies/analytics/patterns/analyze-starcraft-ii-replays-with-jupyter-notebooks/

#### https://github.com/IBM/starcraft2-replay-analysis

### The best Mario Kart character according to data science

#### https://medium.com/civis-analytics/the-best-mario-kart-character-according-to-data-science-7dfb65d4c18e

### Join the Python community!

#### Slack:
#### #pythonmeetupsgdl (Workspace: Campus Tecnológico GDL)

### Thank you

#### alan.verdugo.munoz1@ibm.com

#### GitHub: https://github.com/alanverdugo/

#### This notebook: https://github.com/alanverdugo/game_data_analysis