# Code Example: Computing a Marginal Distribution in Pandas

If your data is already in a pandas data frame, computing a marginal distribution works much like computing the multivariate distribution earlier.
As with the pure Python example above, the details depend on whether you start from raw data rows or from a probability distribution.

For reference, here is the earlier code for computing the multivariate distribution.


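```python
# Joint distribution over every column: count rows per unique combination, divide by the row count.
(abalone.groupby(list(abalone.columns)).size() / len(abalone)).rename("probability").reset_index()
```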

| | Sex | Length | Diameter | Height | Whole_weight | Shucked_weight | Viscera_weight | Shell_weight | Rings | probability |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | F | 0.275 | 0.195 | 0.070 | 0.0800 | 0.0310 | 0.0215 | 0.0250 | 5 | 0.000239 |
| 1 | F | 0.290 | 0.210 | 0.075 | 0.2750 | 0.1130 | 0.0675 | 0.0350 | 6 | 0.000239 |
| 2 | F | 0.290 | 0.225 | 0.075 | 0.1400 | 0.0515 | 0.0235 | 0.0400 | 5 | 0.000239 |
| 3 | F | 0.305 | 0.225 | 0.070 | 0.1485 | 0.0585 | 0.0335 | 0.0450 | 7 | 0.000239 |
| 4 | F | 0.305 | 0.230 | 0.080 | 0.1560 | 0.0675 | 0.0345 | 0.0480 | 7 | 0.000239 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4172 | M | 0.770 | 0.605 | 0.175 | 2.0505 | 0.8005 | 0.5260 | 0.3550 | 11 | 0.000239 |
| 4173 | M | 0.770 | 0.620 | 0.195 | 2.5155 | 1.1155 | 0.6415 | 0.6420 | 12 | 0.000239 |
| 4174 | M | 0.775 | 0.570 | 0.220 | 2.0320 | 0.7350 | 0.4755 | 0.6585 | 17 | 0.000239 |
| 4175 | M | 0.775 | 0.630 | 0.250 | 2.7795 | 1.3485 | 0.7600 | 0.5780 | 12 | 0.000239 |


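**Code Notes:**
* Compared with the multivariate version, the only change is which columns are grouped on. Grouping on just `Length` and `Diameter` sums over (marginalizes out) every other column.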

```python
# Distribution of Length and Diameter only: group on just those two columns.
(abalone.groupby(["Length", "Diameter"]).size() / len(abalone)).rename("probability").reset_index()
```


| | Length | Diameter | probability |
|---|---|---|---|
| 0 | 0.075 | 0.055 | 0.000239 |
| 1 | 0.110 | 0.090 | 0.000239 |
| 2 | 0.130 | 0.095 | 0.000239 |
| 3 | 0.130 | 0.100 | 0.000239 |
| 4 | 0.135 | 0.130 | 0.000239 |
| ... | ... | ... | ... |
| 1175 | 0.775 | 0.630 | 0.000239 |
| 1176 | 0.780 | 0.600 | 0.000239 |
| 1177 | 0.780 | 0.630 | 0.000239 |
| 1178 | 0.800 | 0.630 | 0.000239 |
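Because only the grouping columns change between the multivariate and the marginal versions, the pattern can be wrapped in a small function. The sketch below is one way to package it, assuming a hypothetical helper name `empirical_distribution` and a made-up toy data frame standing in for `abalone`; neither comes from the original lesson.

```python
import pandas as pd


def empirical_distribution(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Hypothetical helper: empirical distribution over `columns`.

    Counts the rows for each combination of values in `columns` and divides
    by the total number of rows. Columns not listed are marginalized out.
    """
    return (df.groupby(columns).size() / len(df)).rename("probability").reset_index()


# Made-up toy data standing in for the abalone data frame.
toy = pd.DataFrame({
    "Sex": ["F", "F", "M", "M", "M"],
    "Length": [0.3, 0.3, 0.5, 0.5, 0.7],
    "Diameter": [0.2, 0.2, 0.4, 0.5, 0.5],
})

joint = empirical_distribution(toy, list(toy.columns))  # multivariate: every column
length_diameter = empirical_distribution(toy, ["Length", "Diameter"])  # Sex summed out
length_only = empirical_distribution(toy, ["Length"])  # Sex and Diameter summed out

# Length 0.3 and 0.5 each get probability 0.4; Length 0.7 gets 0.2.
print(length_only)
```

Passing every column reproduces the multivariate distribution, and passing a subset marginalizes out the rest. One caveat: by default `groupby` drops rows whose grouping values are missing (`NaN`), while `len(df)` still counts them, so with missing data the probabilities would sum to less than 1.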


If you already have a pandas data frame of probabilities, the calculation is even easier.

```python
# Save the Length/Diameter distribution so it can be marginalized further.
abalone_length_diameter_probabilities = (abalone.groupby(["Length", "Diameter"]).size() / len(abalone)).rename("probability").reset_index()
```


```python
# Sum out Diameter: add up the probabilities of all rows that share a Length.
abalone_length_diameter_probabilities.groupby("Length")["probability"].sum().reset_index()
```

| | Length | probability |
|---|---|---|
| 0 | 0.075 | 0.000239 |
| 1 | 0.110 | 0.000239 |
| 2 | 0.130 | 0.000479 |
| 3 | 0.135 | 0.000239 |
| 4 | 0.140 | 0.000479 |
| ... | ... | ... |
| 129 | 0.770 | 0.000718 |
| 130 | 0.775 | 0.000479 |
| 131 | 0.780 | 0.000479 |
| 132 | 0.800 | 0.000239 |


**Code Notes:**
* This calculation groups on the columns you want to keep (here just `Length`) and sums the probabilities within each group.
* Selecting the `probability` column right after the `groupby` means only the probabilities are summed. The remaining columns, such as `Diameter` in this example, are the ones being marginalized out, so they do not appear in the result.
* As before, the `reset_index` method turns `Length` back into a regular column instead of leaving it as the index.
  * Don't worry about the index for now.
  * You will learn when and how to use it in module 2.
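The two routes in this section should agree: summing the joint Length/Diameter probabilities over Diameter gives the same marginal as grouping the raw rows on `Length` alone. The sketch below checks this on a small, made-up data frame standing in for `abalone`; the values are hypothetical, and `pd.testing.assert_frame_equal` compares the float columns up to a small tolerance.

```python
import pandas as pd

# Made-up toy data standing in for the abalone data frame.
toy = pd.DataFrame({
    "Length": [0.3, 0.3, 0.5, 0.5, 0.7],
    "Diameter": [0.2, 0.2, 0.4, 0.5, 0.5],
})

# Route 1: marginal of Length computed straight from the raw rows.
from_rows = (toy.groupby("Length").size() / len(toy)).rename("probability").reset_index()

# Route 2: build the joint Length/Diameter distribution first...
joint = (toy.groupby(["Length", "Diameter"]).size() / len(toy)).rename("probability").reset_index()
# ...then sum out Diameter by adding the probabilities within each Length.
from_joint = joint.groupby("Length")["probability"].sum().reset_index()

# Both routes should produce the same table (raises AssertionError if not).
pd.testing.assert_frame_equal(from_rows, from_joint)

# A marginal distribution is still a distribution, so its probabilities sum to 1.
print(from_joint["probability"].sum())
```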
