In [1]:
import pandas as pd
import numpy as np

# Groupby

As I pointed out in the first part of this lesson, tidy data is only useful if we have tools that work with it in a consistent and reproducible manner. One such tool is the `groupby` method of `DataFrame`, which provides a powerful interface for applying any operation based on grouping variables. We will talk about it in detail in this section.

Very frequently we need to perform some operation based on a grouping variable. A common example is calculating the mean of each group (e.g. the performance of each subject, or the performance on each type of stimulus). This can be thought of as 3 separate actions:
- Splitting the data based on one or more grouping variables
- Applying a function to each group separately
- Combining the resulting values back together

Based on these 3 actions, this approach is called *Split-Apply-Combine* (SAC) [1].

[1] Wickham, Hadley. "The split-apply-combine strategy for data analysis." Journal of Statistical Software 40.1 (2011): 1-29.

<img src="http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/figures/03.08-split-apply-combine.png"></img>
From ["Aggregation and Grouping" chapter](http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.08-Aggregation-and-Grouping.ipynb) of ["Python Data Science Handbook"](http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/Index.ipynb) by Jake VanderPlas
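
The three steps in the figure can also be spelled out by hand. The sketch below is only an illustration on a made-up `toy` frame (the names `toy`, `pieces`, `sums` and `manual` are hypothetical and not part of the lesson's data), and it is not how `pandas` implements grouping internally:

```python
import pandas as pd

# Made-up data: one grouping column and one value column
toy = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                    'data': range(6)})

# Split: one piece of the 'data' column per distinct key
pieces = {key: toy.loc[toy['key'] == key, 'data']
          for key in toy['key'].unique()}

# Apply: reduce each piece to a single number
sums = {key: piece.sum() for key, piece in pieces.items()}

# Combine: put the per-group results back into one Series
manual = pd.Series(sums, name='data')
print(manual)
```

Every SAC operation follows this pattern; only the function used in the "apply" step changes.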

A lot of operations on data can be thought of as SAC operations. These include calculating sums, means, standard deviations and other parameters of the groups' distributions; transformations of data, such as normalization or detrending; plotting based on group, e.g. boxplots; and many others. (Some operations cannot be thought of as purely SAC, most prominently those in which data from the same group are used several times, e.g. rolling window means.)

A traditional way of doing these operations is with loops, where on each iteration a subset of the data is selected and processed. Loops, however, are slow and usually require a lot of code, which makes them difficult to read, and they are not easily extended from one to several grouping variables.

`groupby` is a method of `DataFrame` which makes any SAC operation easy to perform and read.

>**Note**: Tidy data is the most convenient form for SAC operations, because the grouping variables are always stored in separate columns, so you always have access to any combination of them.

Let's see a toy example of using a `groupby` operation instead of a loop.

In [64]:
df = pd.DataFrame({'group': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data': range(6)})
df

Unnamed: 0,data,group
0,0,A
1,1,B
2,2,C
3,3,A
4,4,B
5,5,C


Let's say I want to calculate the sum of the `data` column for each value of the `group` variable and save it in a `Series`. I can do it with a loop:

In [65]:
result = pd.Series(dtype='int64')

groups = df['group'].unique()
for g in groups:
    data = df.loc[df['group']==g, 'data']
    result[g] = np.sum(data)

result

A    3
B    5
C    7
dtype: int64

This code does the job, but it is quite long. If I try to shorten it, it will become very difficult to read:

In [66]:
result = pd.Series(dtype='int64')
for g in df['group'].unique():
    result[g] = np.sum(df.loc[df['group']==g, 'data'])

result

A    3
B    5
C    7
dtype: int64

Now let's try to do the same thing with `groupby`:

In [67]:
df.groupby('group')['data'].sum()

group
A    3
B    5
C    7
Name: data, dtype: int32

Note how short and readable this is. Moreover, let's say I have a more complicated example with several grouping variables:

In [71]:
df = pd.DataFrame({'group1': ['A', 'B', 'C']*3,
                   'group2': ['A']*4 + ['B']*1 + ['C']*4,
                   'data': range(9)})
df

Unnamed: 0,data,group1,group2
0,0,A,A
1,1,B,A
2,2,C,A
3,3,A,A
4,4,B,B
5,5,C,C
6,6,A,C
7,7,B,C
8,8,C,C


Calculating a sum based on several grouping variables requires significantly more code with loops. With `groupby` it is as easy as adding another grouping variable to the list passed to `groupby`:

In [73]:
result = df.groupby(['group1','group2'])['data'].sum()
result

group1  group2
A       A          3
        C          6
B       A          1
        B          4
        C          7
C       A          2
        C         13
Name: data, dtype: int32
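
For comparison, a loop-based version of the same two-variable sum might look like the sketch below. It rebuilds the toy data under the hypothetical name `df2` so that it runs on its own; `loop_result` and `subset` are likewise names chosen just for this illustration.

```python
import pandas as pd

# Same toy data as above, rebuilt so the sketch is self-contained
df2 = pd.DataFrame({'group1': ['A', 'B', 'C'] * 3,
                    'group2': ['A'] * 4 + ['B'] * 1 + ['C'] * 4,
                    'data': range(9)})

loop_result = {}
for g1 in df2['group1'].unique():
    for g2 in df2['group2'].unique():
        # select the rows that belong to this combination of groups
        subset = df2.loc[(df2['group1'] == g1) & (df2['group2'] == g2), 'data']
        # skip combinations that never occur in the data
        if len(subset) > 0:
            loop_result[(g1, g2)] = subset.sum()

# tuple keys become the two levels of the index
loop_result = pd.Series(loop_result).sort_index()
print(loop_result)
```

Each additional grouping variable adds another nested loop and another condition to the row selection, whereas the `groupby` call only gains one more name in its list.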

>**Pro-tip**: You may notice that the index of the resulting `Series` has 2 levels: `group1` and `group2`. This is referred to as a *hierarchical index* or `MultiIndex`, and it is a way to stack several dimensions of data. We won't go into the details of `MultiIndex` (if you wish to learn more, you may refer to [this section](http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.05-Hierarchical-Indexing.ipynb) of [Python Data Science Handbook](http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/Index.ipynb) and to the [MultiIndex](http://pandas.pydata.org/pandas-docs/stable/advanced.html) section of the `pandas` documentation). For our purposes we just need to know 2 things: how to index a `MultiIndex` and how to *unstack* dimensions to turn it into a table:

In [76]:
# get an element with group1 = A and group2 = C
result[('A','C')]

6

In [79]:
# unstack levels of multiindex (turn one of them into a column)
result.unstack()

group2,A,B,C
group1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
A,3.0,,6.0
B,1.0,4.0,7.0
C,2.0,,13.0
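
A few more ways of reading from a two-level index are sketched below, again on the toy data (rebuilt here under the hypothetical name `toy` so the cell runs on its own). The use of `xs` and of `fill_value` goes slightly beyond what this lesson needs and is shown only as an option.

```python
import pandas as pd

toy = pd.DataFrame({'group1': ['A', 'B', 'C'] * 3,
                    'group2': ['A'] * 4 + ['B'] * 1 + ['C'] * 4,
                    'data': range(9)})
result = toy.groupby(['group1', 'group2'])['data'].sum()

# all group2 values for group1 == 'A' (selects on the first level)
print(result.loc['A'])

# all group1 values for group2 == 'C' (selects on the second level)
print(result.xs('C', level='group2'))

# unstack, filling combinations that never occur with 0 instead of NaN
table = result.unstack(fill_value=0)
print(table)
```

Whether a missing combination should become `0` or stay `NaN` depends on the analysis: for sums and counts `0` is often reasonable, for means it usually is not.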


Overall, `groupby` is an extremely useful tool that makes group-based operations quicker to write and more readable. Let's see some concrete examples of how you can use it. We will work on the data from the food preferences task provided by Paolo Garlasco. Let's load it first and do some cleanup:

In [234]:
df = pd.read_csv('data/Paolo.csv')
# drop old index column
df = df.drop('Unnamed: 0', axis='columns')
# replace numeric codes with readable labels
# (assigning the result back avoids in-place changes on a selected column)
df['cond'] = df['cond'].replace({1: 'high vs high', 2: 'low vs low',
                                 3: 'high vs low', 4: 'low vs high'})
df['congr'] = df['congr'].replace({0: 'same', 1: 'different'})
df['session'] = df['session'].replace({0: 'fed', 1: 'hungry'})
print(df.shape)
df.head()

(12460, 10)


Unnamed: 0,item,subj_num,session,pref_b,freq,cal,cond,congr,response,rt
0,ciliegie,12,hungry,8,7,38,low vs low,different,1,559.000015
1,anguria-02,12,hungry,4,3,16,low vs high,same,1,496.999979
2,caramelle,12,hungry,9,2,394,high vs high,different,0,496.999979
3,melone-01,12,hungry,7,5,33,low vs low,different,1,575.000048
4,ananas,12,hungry,4,3,40,low vs low,different,0,512.000084


The data contains 4 subjects:

In [83]:
df['subj_num'].unique()

array([12,  3,  6,  8], dtype=int64)

Let's calculate mean reaction time for each subject:

In [86]:
df.groupby('subj_num')['rt'].mean()

subj_num
3     759.796249
6     782.063034
8     908.831453
12    562.563121
Name: rt, dtype: float64

Subjects also seem to have more than 1 session, so we might want to compute the mean for each session separately:

In [88]:
rt_subject_session = df.groupby(['subj_num','session'])['rt'].mean()
rt_subject_session

subj_num  session
3         0           709.919997
          1           809.672502
6         0           750.098065
          1           811.027536
8         0          1022.428570
          1           808.440000
12        0           622.272497
          1           502.853746
Name: rt, dtype: float64

# <font color='DarkSeaGreen '>Exercise</font>
In the cell below calculate mean response for each food item.



As we saw above, `pandas` provides shortcuts for applying some frequently used functions, such as `mean()`, `std()`, `count()`, `min()`, `max()`. However, we can apply any function to the groups. To do that, there are 3 methods: `aggregate()`, `transform()` and `apply()`. Each of these methods requires a function (the one you want to apply to the data) as an argument.

## Aggregate
`aggregate()` can apply any function that returns a single value for each group (in other words, it *aggregates* a group to a single value). This is what mean, std, count, min, max, and others do. Instead of writing `df.groupby('subj_num')['rt'].mean()` we could have passed the `np.mean` function to calculate means:

In [106]:
df.groupby('subj_num')['rt'].aggregate(np.mean)

subj_num
3     759.796249
6     782.063034
8     908.831453
12    562.563121
Name: rt, dtype: float64

You can also specify several functions in a list, and `aggregate()` will return results of all of them in a neat table:

In [108]:
df.groupby('subj_num')['rt'].aggregate([np.mean, np.std, np.median])

Unnamed: 0_level_0,mean,std,median
subj_num,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
3,759.796249,247.464557,701.999903
6,782.063034,311.811562,671.000004
8,908.831453,402.683958,779.999971
12,562.563121,178.90862,515.000105
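
`aggregate()` also accepts the names of common functions as strings. The sketch below uses a small made-up frame (`toy` and its reaction times are invented for illustration, not taken from the food preference data):

```python
import pandas as pd

# Made-up reaction times for two hypothetical subjects
toy = pd.DataFrame({'subj_num': [1, 1, 1, 2, 2, 2],
                    'rt': [500., 600., 700., 800., 900., 1000.]})

# function names given as strings instead of NumPy functions
summary = toy.groupby('subj_num')['rt'].aggregate(['mean', 'std', 'median'])
print(summary)
```

The column names of the resulting table are taken from the function names.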


More importantly, you can write any function and pass it to `aggregate()`, and the function will be applied to each group. The only limitation is that `aggregate()` assumes the function returns a single value. For example, let's calculate half of the mean:

In [104]:
def half_mean(x):
    """Calculate half of the mean"""
    mean = np.mean(x)
    return mean/2

df.groupby('subj_num')['rt'].aggregate(half_mean)

subj_num
3     379.898125
6     391.031517
8     454.415726
12    281.281561
Name: rt, dtype: float64
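
Custom functions can also be mixed with built-in ones in the same list. In the sketch below, `value_range` is a hypothetical helper written for this illustration, and `toy` is made-up data:

```python
import pandas as pd

toy = pd.DataFrame({'subj_num': [1, 1, 1, 2, 2, 2],
                    'rt': [500., 600., 700., 800., 900., 1000.]})

def value_range(x):
    """Difference between the largest and the smallest value of a group"""
    return x.max() - x.min()

# one built-in function (by name) and one custom function
summary = toy.groupby('subj_num')['rt'].aggregate(['mean', value_range])
print(summary)
```

The custom function's column is labeled with the function's own name, so a descriptive name pays off in the output.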

## Transform
`transform()` is used in the same way as `aggregate()`, but it expects the function to return a `Series` or an array of the same size as the input. It handles the cases when you want to transform the data. For example, we could subtract each subject's mean reaction time from that subject's reaction times:

In [102]:
def subtract_mean(x):
    return x - np.mean(x)

df['rt_minus_mean'] = df.groupby('subj_num')['rt'].transform(subtract_mean)
df.head()

Unnamed: 0,item,subj_num,session,pref_b,freq,cal,cond,congr,response,rt,rt_z,rt_minus_mean
0,ciliegie,12,1,8,7,38,2,1,1,559.000015,-0.019919,-3.563106
1,anguria-02,12,1,4,3,16,4,0,1,496.999979,-0.366519,-65.563142
2,caramelle,12,1,9,2,394,1,1,0,496.999979,-0.366519,-65.563142
3,melone-01,12,1,7,5,33,2,1,1,575.000048,0.069526,12.436926
4,ananas,12,1,4,3,40,2,1,0,512.000084,-0.282664,-50.563038
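
The difference in output shape between `aggregate()` and `transform()` is easiest to see on a tiny example. The frame `toy` below is made up for illustration:

```python
import pandas as pd

toy = pd.DataFrame({'subj': ['a', 'a', 'b', 'b'],
                    'rt': [1., 3., 10., 20.]})

# aggregate: one value per group (length 2)
per_group = toy.groupby('subj')['rt'].aggregate('mean')

# transform: one value per original row (length 4),
# here the group mean repeated for every row of the group
per_row = toy.groupby('subj')['rt'].transform('mean')

# because per_row is aligned with the original rows,
# it can be combined with the original column directly
centered = toy['rt'] - per_row

print(per_group)
print(per_row)
print(centered)
```

Because the output of `transform()` has the same index as the input, it can be assigned straight to a new column of the original `DataFrame`.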


# <font color='DarkSeaGreen '>Exercise</font>
In the cell below calculate standard score (*z-score*) on reaction time for each subject using `groupby` and `transform`. Save z scores to a new column. 

Then find the 10 items with the highest mean reaction time across all subjects.

All the other cases, which don't fall within `aggregate` and `transform`, can be handled by the `apply` method. In fact, `apply` can act as both `aggregate` and `transform` in most circumstances, but it is slower (because it cannot assume the output shape) and cannot do certain things, for example apply several functions at once like the `aggregate` method can.
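
A typical case for `apply` is a function whose output is neither a single value nor the same size as the input. The sketch below, on made-up data, keeps the 2 largest reaction times of each subject:

```python
import pandas as pd

toy = pd.DataFrame({'subj': ['a', 'a', 'a', 'b', 'b', 'b'],
                    'rt': [5., 1., 3., 20., 40., 30.]})

# each group of 3 values is reduced to its 2 largest values,
# so the result has 2 rows per group
top2 = toy.groupby('subj')['rt'].apply(lambda s: s.nlargest(2))
print(top2)
```

The result has 4 rows: fewer than the 6 rows of the input, but more than one per group, so neither `aggregate` nor `transform` would fit here.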

## Looping with groupby
The `groupby`-`apply` combination lets us avoid loops in general, but sometimes you might still need them, for example when you want to plot by group. `groupby` can simplify that too, because a groupby object can be iterated over. On each iteration it gives 2 values: the name of the group (the value of the grouping variable) and the data of the group.

In [133]:
# assign a groupby object to a variable
grouped = df.groupby('subj_num')['rt']
# iterate through the groupby object
for name, data in grouped:
    # value of the grouping variable
    print(name)
    # shape of the data: in this case the 'rt' values for each group
    print(data.shape)

3
(3200,)
6
(3046,)
8
(3014,)
12
(3200,)
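
When grouping by several variables, the name of each group is a tuple with one value per grouping variable. A minimal sketch on the toy data from the beginning of this section (rebuilt here as `toy`; `sizes` is a hypothetical name):

```python
import pandas as pd

toy = pd.DataFrame({'group1': ['A', 'B', 'C'] * 3,
                    'group2': ['A'] * 4 + ['B'] * 1 + ['C'] * 4,
                    'data': range(9)})

sizes = {}
# the group name is unpacked into its two components
for (g1, g2), chunk in toy.groupby(['group1', 'group2']):
    # chunk is a DataFrame with all columns for this combination
    print(g1, g2, chunk.shape)
    sizes[(g1, g2)] = len(chunk)
```

Only the combinations that actually occur in the data are visited, so there is no need to check for empty subsets inside the loop.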


# `DataFrame` and `Series` transformations
Now that you know the power of `groupby` and of having data in a tidy format, let's talk about how to get there. In general, you should become comfortable with transforming your data into any shape you want, because the tools you might want to use won't necessarily work with tidy data. `pandas` provides a lot of ways to transform `Series` and `DataFrame` objects.

## `Set`, `reset` index
The index is very useful for retrieving values, but also for other things. For example, as we will see in the visualization lesson, when plotting a `Series`, the index will be automatically assumed to be the X axis, and the values will become the Y axis. This is useful for quick exploratory visualization.

The main methods for interacting with the index are `set_index()` and `reset_index()`. The first takes a column and makes it the new index:

In [147]:
df_items = df.set_index('item')
df_items.head()

Unnamed: 0_level_0,subj_num,session,pref_b,freq,cal,cond,congr,response,rt,rt_z,rt_minus_mean
item,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
ciliegie,12,1,8,7,38,2,1,1,559.000015,-0.019919,-3.563106
anguria-02,12,1,4,3,16,4,0,1,496.999979,-0.366519,-65.563142
caramelle,12,1,9,2,394,1,1,0,496.999979,-0.366519,-65.563142
melone-01,12,1,7,5,33,2,1,1,575.000048,0.069526,12.436926
ananas,12,1,4,3,40,2,1,0,512.000084,-0.282664,-50.563038


In [146]:
# if you say append=True, you can keep the old index too, which will result in a MultiIndex
df.set_index('item', append=True).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,subj_num,session,pref_b,freq,cal,cond,congr,response,rt,rt_z,rt_minus_mean
Unnamed: 0_level_1,item,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
0,ciliegie,12,1,8,7,38,2,1,1,559.000015,-0.019919,-3.563106
1,anguria-02,12,1,4,3,16,4,0,1,496.999979,-0.366519,-65.563142
2,caramelle,12,1,9,2,394,1,1,0,496.999979,-0.366519,-65.563142
3,melone-01,12,1,7,5,33,2,1,1,575.000048,0.069526,12.436926
4,ananas,12,1,4,3,40,2,1,0,512.000084,-0.282664,-50.563038


`reset_index()` turns the old index into a column and creates a new index with values from `0` to the number of rows minus 1:

In [148]:
# our DataFrame indexed by items
df_items.head()

Unnamed: 0_level_0,subj_num,session,pref_b,freq,cal,cond,congr,response,rt,rt_z,rt_minus_mean
item,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
ciliegie,12,1,8,7,38,2,1,1,559.000015,-0.019919,-3.563106
anguria-02,12,1,4,3,16,4,0,1,496.999979,-0.366519,-65.563142
caramelle,12,1,9,2,394,1,1,0,496.999979,-0.366519,-65.563142
melone-01,12,1,7,5,33,2,1,1,575.000048,0.069526,12.436926
ananas,12,1,4,3,40,2,1,0,512.000084,-0.282664,-50.563038


In [149]:
# let's reset index
df_items.reset_index().head()

Unnamed: 0,item,subj_num,session,pref_b,freq,cal,cond,congr,response,rt,rt_z,rt_minus_mean
0,ciliegie,12,1,8,7,38,2,1,1,559.000015,-0.019919,-3.563106
1,anguria-02,12,1,4,3,16,4,0,1,496.999979,-0.366519,-65.563142
2,caramelle,12,1,9,2,394,1,1,0,496.999979,-0.366519,-65.563142
3,melone-01,12,1,7,5,33,2,1,1,575.000048,0.069526,12.436926
4,ananas,12,1,4,3,40,2,1,0,512.000084,-0.282664,-50.563038


These two methods make working with index very dynamic -- you can set it and reset it to become a normal column again whenever you need. You can also set several columns (pass them as a list to `set_index`) and create a `MultiIndex`.
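
Setting several columns as the index and then resetting them can be sketched as follows. The frame `toy` and its values are made up for illustration:

```python
import pandas as pd

toy = pd.DataFrame({'item': ['apple', 'pear', 'apple', 'pear'],
                    'session': ['fed', 'fed', 'hungry', 'hungry'],
                    'rt': [500., 520., 480., 510.]})

# a list of columns creates a MultiIndex with one level per column
indexed = toy.set_index(['item', 'session'])
print(indexed)

# a full key (one value per level) retrieves a single row
print(indexed.loc[('apple', 'hungry'), 'rt'])

# reset_index turns both levels back into ordinary columns
restored = indexed.reset_index()
print(restored)
```

In this sketch `restored` has the same columns and values as the original `toy`, so the two operations undo each other.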

## Melt
The concept of melting is related to tidying the data. By default, the `melt` function takes all columns of the `DataFrame` and creates 2 columns from them: one with the grouping variable (the former column names) and another with the values. If applied correctly, the resulting *molten* `DataFrame` will be tidy.

Let's see a toy example:

In [174]:
untidy = pd.DataFrame({'treatment_a':[np.nan, 16, 3],'treatment_b':[2,11,1]})
untidy

Unnamed: 0,treatment_a,treatment_b
0,,2
1,16.0,11
2,3.0,1


In [176]:
# let's melt
pd.melt(untidy)

Unnamed: 0,variable,value
0,treatment_a,
1,treatment_a,16.0
2,treatment_a,3.0
3,treatment_b,2.0
4,treatment_b,11.0
5,treatment_b,1.0


Note how the data is reshaped. The column names of the untidy `DataFrame` (`treatment_a` and `treatment_b`) are now the values of the grouping variable. The values inside the table are now all in the single `value` column.

In [177]:
# you can also specify the names of the resulting columns
pd.melt(untidy, var_name='treatment', value_name='measurement')

Unnamed: 0,treatment,measurement
0,treatment_a,
1,treatment_a,16.0
2,treatment_a,3.0
3,treatment_b,2.0
4,treatment_b,11.0
5,treatment_b,1.0


Frequently you want to melt only certain columns, because some are already grouping variables. Specify them as `id_vars` in the `melt` function and they will not be changed:

In [163]:
# in this example "person" is already a separated variable
untidy = pd.DataFrame({'treatment_a':[np.nan, 16, 3],'treatment_b':[2,11,1], 
                      'person':['John Smith', 'Jane Doe','Mary Johnson']})
untidy

Unnamed: 0,person,treatment_a,treatment_b
0,John Smith,,2
1,Jane Doe,16.0,11
2,Mary Johnson,3.0,1


In [164]:
pd.melt(untidy, id_vars='person', var_name='treatment', value_name='measurement')

Unnamed: 0,person,treatment,measurement
0,John Smith,treatment_a,
1,Jane Doe,treatment_a,16.0
2,Mary Johnson,treatment_a,3.0
3,John Smith,treatment_b,2.0
4,Jane Doe,treatment_b,11.0
5,Mary Johnson,treatment_b,1.0
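
Once the data is molten, the `groupby` tools from the previous section apply directly. A minimal sketch, rebuilding the same toy data so that it runs on its own (`tidy` and `means` are names chosen for this illustration):

```python
import numpy as np
import pandas as pd

untidy = pd.DataFrame({'treatment_a': [np.nan, 16, 3],
                       'treatment_b': [2, 11, 1],
                       'person': ['John Smith', 'Jane Doe', 'Mary Johnson']})

tidy = pd.melt(untidy, id_vars='person',
               var_name='treatment', value_name='measurement')

# mean measurement per treatment; missing values are skipped by default
means = tidy.groupby('treatment')['measurement'].mean()
print(means)
```

Adding a third treatment column to the untidy table would not require any change to the `groupby` line, which is one of the practical benefits of the tidy form.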


Let's see another example, taken directly from the [lesson on tidy data](http://nbviewer.jupyter.org/github/antopolskiy/sciprog/blob/master/002_data_organization_00_slides.ipynb):

In [171]:
income_untidy = pd.read_csv('data/pew.csv')
print(income_untidy.shape)
income_untidy.head()

(18, 11)


Unnamed: 0,religion,<$10k,$10-20k,$20-30k,$30-40k,$40-50k,$50-75k,$75-100k,$100-150k,>150k,Don't know/refused
0,Agnostic,27,34,60,81,76,137,122,109,84,96
1,Atheist,12,27,37,52,35,70,73,59,74,76
2,Buddhist,27,21,30,34,33,58,62,39,53,54
3,Catholic,418,617,732,670,638,1116,949,792,633,1489
4,Don’t know/refused,15,14,15,11,10,35,21,17,18,116


In this case all columns except for `religion` hold the same variable (the count of people who belong to that income group), so we keep `religion` and melt all other columns:

In [173]:
income_tidy = pd.melt(income_untidy,id_vars='religion',var_name='income',value_name='count')
print(income_tidy.shape)
income_tidy.head()

(180, 3)


Unnamed: 0,religion,income,count
0,Agnostic,<$10k,27
1,Atheist,<$10k,12
2,Buddhist,<$10k,27
3,Catholic,<$10k,418
4,Don’t know/refused,<$10k,15


## Pivot table
Pivoting is another way of transforming `DataFrame` objects, usually used to transform a tidy `DataFrame` into some other form. For example, it can be used to undo melting. Using the `pivot_table` method is easy: simply think about which column you want to have as the index and which as the columns.

In [186]:
# molten dataframe
income_tidy.head()

Unnamed: 0,religion,income,count
0,Agnostic,<$10k,27
1,Atheist,<$10k,12
2,Buddhist,<$10k,27
3,Catholic,<$10k,418
4,Don’t know/refused,<$10k,15


In [198]:
# pivoting to undo melting
income_tidy.pivot_table(columns='income', index='religion')

Unnamed: 0_level_0,count,count,count,count,count,count,count,count,count,count
income,$10-20k,$100-150k,$20-30k,$30-40k,$40-50k,$50-75k,$75-100k,<$10k,>150k,Don't know/refused
religion,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2
Agnostic,34,109,60,81,76,137,122,27,84,96
Atheist,27,59,37,52,35,70,73,12,74,76
Buddhist,21,39,30,34,33,58,62,27,53,54
Catholic,617,792,732,670,638,1116,949,418,633,1489
Don’t know/refused,14,17,15,11,10,35,21,15,18,116
Evangelical Prot,869,723,1064,982,881,1486,949,575,414,1529
Hindu,9,48,7,9,11,34,47,1,54,37
Historically Black Prot,244,81,236,238,197,223,131,228,78,339
Jehovah's Witness,27,11,24,24,21,30,15,20,6,37
Jewish,19,87,25,25,30,95,69,19,151,162
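
The round trip from wide to long and back can be checked on a small frame. The sketch below uses only two religions and two income columns copied from the table above; the names `wide`, `long` and `back` are hypothetical:

```python
import pandas as pd

wide = pd.DataFrame({'religion': ['Agnostic', 'Atheist'],
                     '<$10k': [27, 12],
                     '$10-20k': [34, 27]})

# wide -> long
long = wide.melt(id_vars='religion', var_name='income', value_name='count')

# long -> wide again; naming the value column explicitly
# keeps the column labels on a single level
back = long.pivot_table(index='religion', columns='income', values='count')
print(back)
```

Two differences from the original wide table remain: `religion` is now the index rather than a column, and the income columns are sorted alphabetically rather than kept in their original order.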


But pivoting can achieve much more than that. Let's look at another example. This dataset contains number of births for each day from 1969 to 2008:

In [202]:
births = pd.read_csv('data/births.csv')
births.head()

Unnamed: 0,year,month,day,gender,births
0,1969,1,1,F,4046
1,1969,1,1,M,4440
2,1969,1,2,F,4454
3,1969,1,2,M,4548
4,1969,1,3,F,4548


Let's say we want to calculate the total number of births for each year for boys and girls to see how the gender proportions change over the years. We could achieve it with `groupby`:

In [206]:
births.groupby(['year','gender'])['births'].sum().head(10)

year  gender
1969  F         1753634
      M         1846572
1970  F         1819164
      M         1918636
1971  F         1736774
      M         1826774
1972  F         1592347
      M         1673888
1973  F         1533102
      M         1613023
Name: births, dtype: int64

We could then use `unstack` on the resulting `Series` to create a nice table:

In [207]:
births.groupby(['year','gender'])['births'].sum().unstack()

gender,F,M
year,Unnamed: 1_level_1,Unnamed: 2_level_1
1969,1753634,1846572
1970,1819164,1918636
1971,1736774,1826774
1972,1592347,1673888
1973,1533102,1613023
1974,1543005,1627626
1975,1535546,1618010
1976,1547613,1628863
1977,1623363,1708796
1978,1626324,1711976


A pivot table can do the same, and in some cases it is more readable, because when we pivot we don't need to think about grouping; instead we think about what kind of table we want to get in the end. In this case I think to myself: "I want *year* to be the index and *gender* to be the columns. I will take the *births* column and *sum* it up for each resulting group". The syntax of `pivot_table` repeats this thinking almost exactly:

In [244]:
births_year_gender = births.pivot_table(index='year', columns='gender', values='births', aggfunc=np.sum)
births_year_gender

gender,F,M
year,Unnamed: 1_level_1,Unnamed: 2_level_1
1969,1753634,1846572
1970,1819164,1918636
1971,1736774,1826774
1972,1592347,1673888
1973,1533102,1613023
1974,1543005,1627626
1975,1535546,1618010
1976,1547613,1628863
1977,1623363,1708796
1978,1626324,1711976
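
That the two approaches give the same table can be verified on a small frame. In the sketch below the 1969 rows are copied from the head of the dataset shown above, while the 1970 rows are made up; `births_toy`, `by_groupby` and `by_pivot` are hypothetical names:

```python
import pandas as pd

births_toy = pd.DataFrame({'year': [1969, 1969, 1969, 1969, 1970, 1970],
                           'gender': ['F', 'M', 'F', 'M', 'F', 'M'],
                           'births': [4046, 4440, 4454, 4548, 10, 20]})

# groupby, aggregate, then unstack the inner index level into columns
by_groupby = births_toy.groupby(['year', 'gender'])['births'].sum().unstack()

# the same table described directly by its desired layout
by_pivot = births_toy.pivot_table(index='year', columns='gender',
                                  values='births', aggfunc='sum')

print(by_groupby)
print(by_pivot)
```

Which form to use is largely a matter of taste: `groupby` reads as a sequence of steps, `pivot_table` as a description of the final table.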


Let's see another example on Paolo's food preference data.

In [252]:
df.head()

Unnamed: 0,item,subj_num,session,pref_b,freq,cal,cond,congr,response,rt
0,ciliegie,12,hungry,8,7,38,low vs low,different,1,559.000015
1,anguria-02,12,hungry,4,3,16,low vs high,same,1,496.999979
2,caramelle,12,hungry,9,2,394,high vs high,different,0,496.999979
3,melone-01,12,hungry,7,5,33,low vs low,different,1,575.000048
4,ananas,12,hungry,4,3,40,low vs low,different,0,512.000084


I want to create a table with mean reaction times with rows being session type and columns being the condition.

In [254]:
df.pivot_table(values='rt', index='session', columns='cond', aggfunc=np.mean)

cond,high vs high,high vs low,low vs high,low vs low
session,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
fed,798.722372,786.981337,753.064153,737.271185
hungry,764.59684,728.000004,715.874999,725.789469


# <font color='DarkSeaGreen '>Exercise</font>
Using the `births` dataset, create a table with the total number of births for each month of each year. Do it once using `groupby-aggregate-unstack` and once using `pivot_table`.

# <font color='DarkSeaGreen '>Exercise</font>
Using the food preference dataset, create a table in which the index is the items, the columns are the session types and the values are the mean response.