### __Python_DataWranglingPandasPivotTables__

Pandas also offers pivot tables as an alternative method for grouping and analyzing data.

Pivot tables are a great tool for summarizing data sets and exploring their different dimensions. They're very popular in spreadsheet applications like Excel, and pandas lets you build them programmatically.

Let's take another look at the video game sales data below:

In [2]:
import pandas as pd

df = pd.read_csv('DataSets/vg_sales.csv')
print(df.head())

                       name platform  year_of_release         genre publisher  \
0                Wii Sports      Wii           2006.0        Sports  Nintendo   
1         Super Mario Bros.      NES           1985.0      Platform  Nintendo   
2            Mario Kart Wii      Wii           2008.0        Racing  Nintendo   
3         Wii Sports Resort      Wii           2009.0        Sports  Nintendo   
4  Pokemon Red/Pokemon Blue       GB           1996.0  Role-Playing  Nintendo   

  developer  na_sales  eu_sales  jp_sales  critic_score user_score  
0  Nintendo     41.36     28.96      3.77          76.0          8  
1       NaN     29.08      3.58      6.81           NaN        NaN  
2  Nintendo     15.68     12.76      3.79          82.0        8.3  
3  Nintendo     15.61     10.93      3.28          80.0          8  
4       NaN     11.27      8.89     10.22           NaN        NaN  


Let's say we want to determine the total sales in Europe for each genre on each platform. __Pivot tables__ offer a quick and convenient way to do this. First, let's look at the code and then walk through it. To keep things simple, we'll drop rows with missing values.

In [2]:
import pandas as pd

df = pd.read_csv('DataSets/vg_sales.csv')
df.dropna(inplace=True)

pivot_data = df.pivot_table(index='genre',
                            columns='platform',
                            values='eu_sales',
                            aggfunc='sum'
                           )
print(pivot_data)
print()
print(type(pivot_data))

platform       3DS    DC     DS    GBA    GC     PC     PS    PS2    PS3  \
genre                                                                      
Action        8.50   NaN  13.98   9.13  6.48  15.32  21.61  64.09  94.27   
Adventure     0.50  0.24   3.11   1.36  1.04   1.60   0.35   3.38   6.34   
Fighting      0.84  0.00   0.28   0.90  2.79   0.11   5.83  16.07  13.69   
Misc          1.21   NaN  25.46   2.86  2.15   1.19   1.73  14.57   8.24   
Platform      8.41  0.00  14.17  12.57  5.62   0.33   6.09  17.13   7.26   
Puzzle        1.05   NaN  19.86   1.98  0.76   0.13   0.10   1.57   0.04   
Racing        4.45  0.00   8.49   4.03  2.32   2.59  14.46  38.73  28.12   
Role-Playing  3.10  0.00   6.04   3.99  2.37  24.32   8.70  16.36  16.81   
Shooter       0.27  0.00   0.44   0.69  2.79  18.28   2.34  31.91  65.43   
Simulation    4.40  0.00  13.17   0.58  1.74  22.35   0.79  10.44   2.82   
Sports        0.61  0.05   2.27   2.73  4.40   6.47   7.32  51.50  30.88   
Strategy    

We created a pivot table using the pivot_table() method. The parameters we used were:

- index=: the column whose values become the row index of the pivot table;

- columns=: the column whose values become the columns of the pivot table;

- values=: the column whose values we want to aggregate in the pivot table;

- aggfunc=: the aggregation function we want to apply to the values in each row/column group.

Each cell in the pivot table above contains the total sales in Europe for one genre/platform combination. We also printed the type of the pivot table to show that it's a pandas DataFrame, which we're already very familiar with.
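To make the role of each parameter concrete, here is a minimal sketch on made-up data. The `toy` DataFrame and its values are hypothetical and are not taken from vg_sales.csv; only the column names mirror the real dataset.

```python
import pandas as pd

# Hypothetical toy data (not from vg_sales.csv), small enough to check by hand
toy = pd.DataFrame({
    'genre':    ['Action', 'Action', 'Action', 'Sports', 'Sports'],
    'platform': ['PC',     'PC',     'Wii',    'Wii',    'Wii'],
    'eu_sales': [1.0,      2.0,      0.5,      3.0,      4.0],
})

toy_pivot = toy.pivot_table(index='genre',       # one row per genre
                            columns='platform',  # one column per platform
                            values='eu_sales',   # the numbers to aggregate
                            aggfunc='sum'        # how to combine them
                           )
print(toy_pivot)
```

The two Action/PC rows are added together into a single cell. There are no Sports/PC rows at all, so that cell is NaN, which is also why NaN shows up in the larger table above.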

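Using a pivot table here is convenient because it lets us leave out all the columns from df that we're not interested in for our analysis. It can also be easier to read than the output of a comparable groupby() call, as you can see below. Note that this groupby() example computes the mean rather than the sum, so its numbers are not directly comparable to the pivot table above.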

In [3]:
groupby_data = df.groupby(['genre', 'platform'])['eu_sales'].mean()
print(groupby_data)
print()
print(type(groupby_data))

genre     platform
Action    3DS         0.146552
          DS          0.103556
          GBA         0.090396
          GC          0.076235
          PC          0.113481
                        ...   
Strategy  Wii         0.037500
          WiiU        0.320000
          X360        0.107368
          XB          0.027647
          XOne        0.030000
Name: eu_sales, Length: 197, dtype: float64

<class 'pandas.core.series.Series'>


A very different format, right? Also note that this __groupby() call returns a Series__ indexed by both genre and platform, while __pivot_table() returns a DataFrame__. Whether you use groupby() or pivot_table() is largely a matter of personal preference, and over time you'll develop an intuition for which tool best fits the task at hand.
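The two approaches are closely related. As a rough sketch on made-up data (the `toy` DataFrame is hypothetical, not the real dataset), calling unstack() on the grouped Series moves the platform level into the columns and gives the same layout as the pivot table, provided both use the same aggregation function:

```python
import pandas as pd

# Hypothetical toy data (not from vg_sales.csv)
toy = pd.DataFrame({
    'genre':    ['Action', 'Action', 'Action', 'Sports', 'Sports'],
    'platform': ['PC',     'PC',     'Wii',    'Wii',    'Wii'],
    'eu_sales': [1.0,      2.0,      0.5,      3.0,      4.0],
})

# Long format: a Series with a two-level (genre, platform) index
grouped = toy.groupby(['genre', 'platform'])['eu_sales'].sum()

# Wide format: move the inner index level ('platform') into the columns
unstacked = grouped.unstack()

# The same summary built directly as a pivot table
pivot = toy.pivot_table(index='genre',
                        columns='platform',
                        values='eu_sales',
                        aggfunc='sum'
                       )

print(type(grouped))
print(unstacked)
print(pivot)
```

The long Series is handy when you want to keep filtering or sorting by group, while the wide table is usually easier to scan by eye.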

##### __Exercise 01__

We've filtered the video game dataset to only contain games released in 2000 or later. Create a pivot table from the filtered dataset containing the average value for sales in Japan for each combination of genre and release year.

- The genres will serve as indexes.

- The pivot table columns will be the release years.

- Use the corresponding column as the values to be aggregated.

- Use the appropriate aggregation function.

- Assign the result to a variable named df_pivot and then display it.

In [6]:
import pandas as pd

df = pd.read_csv('DataSets/vg_sales.csv')
df = df[df['year_of_release'] >= 2000]

print(df.columns)
print()

df_pivot = df.pivot_table(index='genre',
                          columns='year_of_release',
                          values='jp_sales',
                          aggfunc='mean'
                         )

print(df_pivot)

Index(['name', 'platform', 'year_of_release', 'genre', 'publisher',
       'developer', 'na_sales', 'eu_sales', 'jp_sales', 'critic_score',
       'user_score'],
      dtype='object')

year_of_release    2000.0    2001.0    2002.0    2003.0    2004.0    2005.0  \
genre                                                                         
Action           0.085000  0.089403  0.040800  0.029097  0.038560  0.032917   
Adventure        0.069375  0.050952  0.076905  0.035833  0.032368  0.017619   
Fighting         0.105172  0.151667  0.058148  0.067045  0.034359  0.071395   
Misc             0.138500  0.048462  0.064444  0.067547  0.028588  0.069217   
Platform         0.106667  0.082326  0.058701  0.025517  0.092576  0.013494   
Puzzle           0.081667  0.046667  0.008500  0.142500  0.102000  0.188182   
Racing           0.026744  0.056338  0.001942  0.025963  0.028028  0.057500   
Role-Playing     0.544828  0.232927  0.248000  0.200625  0.224833  0.129859   
Shooter          0.010000