# Statistical Computation with pandas
**pandas** is, after all, a toolkit for data analysis. It provides methods that compute common statistics for you, so you rarely have to calculate anything by hand.

The Titanic dataset is used here again.

```python
import pandas as pd

# Load the Titanic passenger data into a DataFrame
titanic = pd.read_csv("titanic.csv")
titanic.head()
```

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


Summary statistics can be computed by calling convenience methods such as `mean()`, `median()` and `describe()`.

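```python
# Mean of a single column (returns a scalar)
titanic["Age"].mean()
# Median of several columns (returns a Series)
titanic[["Age", "Fare"]].median()
# describe() reports count, mean, std, min, quartiles and max in one call;
# as the last expression in the cell, only this result is displayed
titanic[["Age", "Fare"]].describe()
```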

|       | Age | Fare |
|---|---|---|
| count | 714.0 | 891.0 |
| mean | 29.699118 | 32.204208 |
| std | 14.526497 | 49.693429 |
| min | 0.42 | 0.0 |
| 25% | 20.125 | 7.9104 |
| 50% | 28.0 | 14.4542 |
| 75% | 38.0 | 31.0 |
| max | 80.0 | 512.3292 |
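In the table above, the `count` row is lower for `Age` (714) than for `Fare` (891). These statistics skip missing values (`NaN`) by default. The sketch below uses a small made-up DataFrame in place of the Titanic file, with one missing `Age`, to show the effect of the `skipna` parameter:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the third Age value is missing (NaN)
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1],
})

age_count = df["Age"].count()                  # counts only non-missing values
age_mean = df["Age"].mean()                    # NaN is skipped: (22 + 38 + 35) / 3
age_mean_strict = df["Age"].mean(skipna=False) # NaN propagates instead

# describe() returns a DataFrame indexed by statistic name
summary = df[["Age", "Fare"]].describe()
print(summary.loc["count"])
```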


With `DataFrame.agg()` you can be *precise* about which statistics to compute for each column, and skip the rest to save computing power.

```python
# Keys are column names; values list the statistics to compute for that column
titanic.agg(
    {
        "Age": ["min", "max", "median", "skew"],
        "Fare": ["min", "max", "median", "mean"],
    }
)
```

|        | Age | Fare |
|---|---|---|
| max | 80.0 | 512.3292 |
| mean | NaN | 32.204208 |
| median | 28.0 | 14.4542 |
| min | 0.42 | 0.0 |
| skew | 0.389108 | NaN |
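A statistic that was requested for one column but not for another shows up as `NaN` in the other column. The sketch below uses a small made-up DataFrame in place of the Titanic data to show this behaviour:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1],
})

# Only min/max for Age and only mean for Fare; the result's rows are the
# union of all requested statistics, with NaN where one was not requested
result = df.agg({
    "Age": ["min", "max"],
    "Fare": ["mean"],
})
print(result)
```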


You can also group your data before aggregating.
This follows the **split-apply-combine** pattern:

* **Split** the data into groups
* **Apply** a function independently on each group
* **Combine** the result back together into a data structure
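The three steps above can be spelled out by hand and compared with `groupby`. The sketch below uses a small made-up DataFrame. The dictionary comprehension illustrates the pattern; it is not how pandas implements `groupby` internally.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

# Split: iterating a GroupBy yields (key, sub-DataFrame) pairs
# Apply: compute the mean Age of each sub-DataFrame
# Combine: gather the per-group results into one Series
manual = pd.Series(
    {sex: group["Age"].mean() for sex, group in df.groupby("Sex")}
)

# groupby produces the same result in a single expression
builtin = df.groupby("Sex")["Age"].mean()
```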

```python
titanic[["Sex", "Age"]].groupby("Sex").mean()
```

| Sex | Age |
|---|---|
| female | 27.915709 |
| male | 30.726645 |


However, averaging every column does not always make sense (the mean of `PassengerId`, for example, is meaningless). Instead, you can group first and then select only the column(s) you want to aggregate.

```python
# Mean Age per Sex
titanic.groupby("Sex")["Age"].mean()
# Mean Fare per (Sex, Pclass) combination; only this result is displayed
titanic.groupby(["Sex", "Pclass"])["Fare"].mean()
```

```text
Sex     Pclass
female  1         106.125798
        2          21.970121
        3          16.118810
male    1          67.226127
        2          19.741782
        3          12.661633
Name: Fare, dtype: float64
```
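Grouping by two keys returns a Series with a two-level `MultiIndex` (`Sex`, then `Pclass`). The sketch below uses a small made-up DataFrame to show how single values or whole sub-groups can be picked out with `.loc`:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male"],
    "Pclass": [3, 1, 3, 1, 3],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# One row per (Sex, Pclass) combination present in the data
fares = df.groupby(["Sex", "Pclass"])["Fare"].mean()

# A full (Sex, Pclass) tuple selects a single value
male_third = fares.loc[("male", 3)]

# An outer-level label alone selects every Pclass for that Sex
female_fares = fares.loc["female"]
```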

Besides computing statistics, you can also count how many records fall into each category.

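```python
# Passengers per class, sorted by frequency (most common first)
titanic["Pclass"].value_counts()
# The same counts, ordered by class instead of by frequency;
# only this last expression is displayed
titanic.groupby("Pclass")["Pclass"].count()
```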

```text
Pclass
1    216
2    184
3    491
Name: Pclass, dtype: int64
```
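`value_counts()` and `groupby(...).count()` give the same counts, but in a different order. The sketch below uses a small made-up column of classes to show this. It also shows the `normalize=True` option of `value_counts()`, which returns proportions instead of counts.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"Pclass": [3, 1, 3, 2, 3, 1]})

# Sorted by frequency, most common class first
by_frequency = df["Pclass"].value_counts()

# Ordered by the group key (the class number)
by_key = df.groupby("Pclass")["Pclass"].count()

# Once sorted by index, the two agree
same = by_frequency.sort_index().tolist() == by_key.tolist()

# Share of rows in each class instead of raw counts
shares = df["Pclass"].value_counts(normalize=True)
```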