# Advanced `pandas`

This notebook covers more advanced operations in Pandas:

- `split-apply-combine` pipeline,
- operations on string columns (string operations, replacement),
- joins on Pandas dataframes.

In [18]:
%pylab inline
plt.style.use("bmh")

Populating the interactive namespace from numpy and matplotlib


In [19]:
import re

import numpy as np
import pandas as pd

In [20]:
url_test = 'https://raw.githubusercontent.com/dsindy/kaggle-titanic/master/data/test.csv'
url_train = 'https://raw.githubusercontent.com/dsindy/kaggle-titanic/master/data/train.csv'
titanic_train = pd.read_csv(url_train, index_col="PassengerId")
titanic_test = pd.read_csv(url_test, index_col="PassengerId")


# titanic_train = pd.read_csv("train.csv", index_col="PassengerId")
# titanic_test = pd.read_csv("test.csv", index_col="PassengerId")
titanic = pd.concat([titanic_train, titanic_test], sort=False)

In [21]:
titanic.head()

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0.0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
2,1.0,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,1.0,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,1.0,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
5,0.0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


# Joining Pandas dataframes (`JOIN` in Pandas)

We start with a synthetic example:

In [22]:
a = pd.DataFrame(np.arange(8).reshape((4,2)),
                 columns=["a", "b"],
                 index=["a", "b", "a", "b"])
b = pd.DataFrame(10 + np.arange(4).reshape((4,-1)),
                 columns=["d"],
                 index=["d", "b", "c", "b"])

In [23]:
a

,a,b
a,0,1
b,2,3
a,4,5
b,6,7


In [24]:
b

,d
d,10
b,11
c,12
b,13


In [25]:
a.join(b) # default is left join

,a,b,d
a,0,1,
a,4,5,
b,2,3,11.0
b,2,3,13.0
b,6,7,11.0
b,6,7,13.0


In [26]:
a.join(b, how="inner")

,a,b,d
b,2,3,11
b,2,3,13
b,6,7,11
b,6,7,13
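
The row counts above follow from how duplicate keys are matched: for every key, each left row is paired with each right row that carries the same key. The sketch below recomputes the expected sizes from the index counts alone. The names `left_counts`, `right_counts`, `expected_inner`, and `expected_left` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Same frames as above, with duplicated index labels on both sides.
a = pd.DataFrame(np.arange(8).reshape((4, 2)),
                 columns=["a", "b"],
                 index=["a", "b", "a", "b"])
b = pd.DataFrame(10 + np.arange(4).reshape((4, -1)),
                 columns=["d"],
                 index=["d", "b", "c", "b"])

left_counts = a.index.value_counts()    # how often each key occurs in `a`
right_counts = b.index.value_counts()   # how often each key occurs in `b`

# Inner join: only shared keys survive, each contributing n_left * n_right rows.
expected_inner = int((left_counts * right_counts).dropna().sum())

# Left join: keys missing on the right still keep their left rows (factor 1).
right_for_left = right_counts.reindex(left_counts.index).fillna(1)
expected_left = int((left_counts * right_for_left).sum())

print(expected_inner, len(a.join(b, how="inner")))
print(expected_left, len(a.join(b)))
```

Each printed pair should agree (4 rows for the inner join and 6 for the left join with these frames), which is a cheap sanity check before joining on keys that are not unique.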


In [27]:
a

,a,b
a,0,1
b,2,3
a,4,5
b,6,7


In [28]:
b

,d
d,10
b,11
c,12
b,13


In [29]:
b.join(a, how="right")

,d,a,b
a,,0,1
a,,4,5
b,11.0,2,3
b,13.0,2,3
b,11.0,6,7
b,13.0,6,7


In [30]:
a.join(b, how="outer")

,a,b,d
a,0.0,1.0,
a,4.0,5.0,
b,2.0,3.0,11.0
b,2.0,3.0,13.0
b,6.0,7.0,11.0
b,6.0,7.0,13.0
c,,,12.0
d,,,10.0
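
An index-on-index `join` is closely related to `DataFrame.merge` with `left_index=True, right_index=True`. As a sketch of how to see where each row of an outer join came from, `merge` accepts `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. The variable names `merged` and `origin_counts` are illustrative.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(8).reshape((4, 2)),
                 columns=["a", "b"],
                 index=["a", "b", "a", "b"])
b = pd.DataFrame(10 + np.arange(4).reshape((4, -1)),
                 columns=["d"],
                 index=["d", "b", "c", "b"])

# Outer join on the index, tagging every row with its origin.
merged = a.merge(b, left_index=True, right_index=True,
                 how="outer", indicator=True)

# Number of rows per origin category.
origin_counts = merged["_merge"].value_counts().to_dict()

print(merged)
print(origin_counts)
```

Rows tagged `left_only` or `right_only` are exactly the ones that carry `NaN` in the columns of the other frame.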


We can also perform a join operation on multi-indexed dataframes:

In [None]:
c = pd.DataFrame(np.arange(8).reshape((4,2)),
                 columns=["a", "b"],
                 index=pd.MultiIndex.from_tuples([("a", "A"), ("b", "E"), ("a", "Y"), ("b", "R")],
                                                 names=("lower", "upper")))

In [None]:
c

In [None]:
a

In [None]:
c.join(a, on="lower")  # This one will fail: both frames have columns "a" and "b" and no suffix is given

In [None]:
c.join(a, on="lower", rsuffix="_right")
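
A self-contained sketch of the same situation, using small hypothetical frames `left` and `right` (a right frame with a unique index keeps the result at four rows). It shows that `on` may name an index level of the caller, that overlapping column names raise a `ValueError` when no suffix is given, and that `rsuffix` resolves the clash.

```python
import pandas as pd

# Hypothetical toy frames: `left` has a two-level index, `right` a flat one.
left = pd.DataFrame(
    {"a": [0, 2, 4, 6], "b": [1, 3, 5, 7]},
    index=pd.MultiIndex.from_tuples(
        [("a", "A"), ("b", "E"), ("a", "Y"), ("b", "R")],
        names=("lower", "upper")))
right = pd.DataFrame({"a": [100, 200]}, index=["a", "b"])

# Both frames have a column "a", so joining without a suffix is rejected.
try:
    left.join(right, on="lower")
    overlap_error = None
except ValueError as err:
    overlap_error = str(err)

# With a suffix, the right-hand column is renamed to "a_right".
joined = left.join(right, on="lower", rsuffix="_right")

print(overlap_error)
print(joined)
```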

# Joining dataframes for EDA

## Problem: get (almost) all couples on board 

In [34]:
titanic[["Name", "Sex"]].head()

PassengerId,Name,Sex
1,"Braund, Mr. Owen Harris",male
2,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female
3,"Heikkinen, Miss. Laina",female
4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female
5,"Allen, Mr. William Henry",male


We start by noting the pattern: married females are listed as `<FAMILY_NAME>, Mrs. <HUSBANDS_FIRST_NAME> (<WIFES_FULL_NAME>)`. Let's play with it a bit:

In [69]:
family_names = (titanic
                .replace(re.compile(r'\s+\(.*\)'), '')
                .replace(re.compile(r"Mrs\."), "Mr."))[["Name", "Sex"]]

In [70]:
family_names

PassengerId,Name,Sex
1,"Braund, Mr. Owen Harris",male
2,"Cumings, Mr. John Bradley",female
3,"Heikkinen, Miss. Laina",female
4,"Futrelle, Mr. Jacques Heath",female
5,"Allen, Mr. William Henry",male
...,...,...
1305,"Spector, Mr. Woolf",male
1306,"Oliva y Ocana, Dona. Fermina",female
1307,"Saether, Mr. Simon Sivertsen",male
1308,"Ware, Mr. Frederick",male
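
The same normalization can be sketched on the `Name` column alone with the `.str` accessor. This is a minimal example on three names copied from the table above; the variable names `names` and `normalized` are illustrative. Restricting the replacement to one column avoids touching other string columns such as `Ticket` or `Cabin`.

```python
import pandas as pd

names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
    "Heikkinen, Miss. Laina",
], name="Name")

normalized = (names
              # Drop the parenthesised part (the wife's own name).
              .str.replace(r"\s+\(.*\)", "", regex=True)
              # Turn "Mrs." into "Mr." so that spouses share one key.
              .str.replace(r"Mrs\.", "Mr.", regex=True))

print(normalized.tolist())
```

Note the escaped dot in `r"Mrs\."`: an unescaped `.` in a regular expression matches any character.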


Stripping the wife's own name (the parenthesised part) altogether:

In [92]:
titanic.replace(re.compile(r'\s+\(.*\)'), '')

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0.0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
2,1.0,1,"Cumings, Mrs. John Bradley",female,38.0,1,0,PC 17599,71.2833,C85,C
3,1.0,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
4,1.0,1,"Futrelle, Mrs. Jacques Heath",female,35.0,1,0,113803,53.1000,C123,S
5,0.0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...
1305,,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S
1306,,1,"Oliva y Ocana, Dona. Fermina",female,39.0,0,0,PC 17758,108.9000,C105,C
1307,,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S
1308,,3,"Ware, Mr. Frederick",male,,0,0,359309,8.0500,,S


In [None]:
family_names

We can now get the passenger IDs and husbands' names (not all of the husbands are on board!) of all married women:

In [74]:
family_names = family_names[(family_names.Sex == "female") & family_names.Name.str.contains("Mr.", regex=False)]

In [75]:
family_names.head()

PassengerId,Name,Sex
2,"Cumings, Mr. John Bradley",female
4,"Futrelle, Mr. Jacques Heath",female
9,"Johnson, Mr. Oscar W",female
10,"Nasser, Mr. Nicholas",female
16,"Hewlett, Mr.",female


In [73]:
family_names.shape[0]

197

We now want to join this back to the original dataframe (a very common pattern if you need some **pairs**):

In [76]:
family_names.reset_index().set_index("Name")["PassengerId"]

Name
Cumings, Mr. John Bradley              2
Futrelle, Mr. Jacques Heath            4
Johnson, Mr. Oscar W                   9
Nasser, Mr. Nicholas                  10
Hewlett, Mr.                          16
                                    ... 
McNamee, Mr. Neal                   1275
Lines, Mr. Ernest H                 1283
Smith, Mr. Lucien Philip            1287
Frolicher-Stehli, Mr. Maxmillian    1289
Minahan, Mr. William Edward         1303
Name: PassengerId, Length: 197, dtype: int64

In [77]:
titanic.join(family_names.reset_index().set_index("Name")["PassengerId"],
             on="Name", how="inner", rsuffix="_Spouse")

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,PassengerId
14,0.0,3,"Andersson, Mr. Anders Johan",male,39.0,1,5,347082,31.2750,,S,611
35,0.0,1,"Meyer, Mr. Edgar Joseph",male,28.0,1,0,PC 17604,82.1708,,C,376
36,0.0,1,"Holverson, Mr. Alexander Oskar",male,42.0,1,0,113789,52.0000,,S,384
63,0.0,1,"Harris, Mr. Henry Birkhardt",male,45.0,1,0,36973,83.4750,C83,S,231
93,0.0,1,"Chaffee, Mr. Herbert Fuller",male,46.0,1,0,W.E.P. 5734,61.1750,E31,S,906
...,...,...,...,...,...,...,...,...,...,...,...,...
1208,,1,"Spencer, Mr. William Augustus",male,57.0,1,0,PC 17569,146.5208,B78,C,32
1245,,2,"Herman, Mr. Samuel",male,49.0,1,2,220845,65.0000,,S,755
1258,,3,"Caram, Mr. Joseph",male,,1,0,2689,14.4583,,C,579
1286,,3,"Kink-Heilmann, Mr. Anton",male,29.0,3,1,315153,22.0250,,S,1057


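Note that the `PassengerId` **column** was not renamed! `rsuffix` is only applied to overlapping *column* names, and on the left side `PassengerId` is the index, not a column.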
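
One way to sidestep this, sketched here on a toy frame, is to rename the lookup Series before joining, so the new column gets its final name right away. The frames `people` and `lookup` are hypothetical stand-ins for `titanic` and the name-to-ID Series; the IDs 138, 5, and 4 are taken from the tables in this notebook.

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["Futrelle, Mr. Jacques Heath", "Allen, Mr. William Henry"]},
    index=pd.Index([138, 5], name="PassengerId"))

# Normalized name of a married woman mapped to her own passenger ID.
lookup = pd.Series([4],
                   index=pd.Index(["Futrelle, Mr. Jacques Heath"], name="Name"),
                   name="PassengerId")

# rsuffix has nothing to rename: the left "PassengerId" is the index.
plain = people.join(lookup, on="Name", how="inner", rsuffix="_Spouse")

# Renaming the Series first gives the column its intended name.
renamed = people.join(lookup.rename("PassengerId_Spouse"),
                      on="Name", how="inner")

print(list(plain.columns))
print(list(renamed.columns))
```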

In [89]:
couples = (titanic.join(family_names
                        .reset_index()
                        .set_index("Name")["PassengerId"],
                        on="Name", how="inner", rsuffix="_Spouse"))

In [79]:
couples

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,PassengerId
14,0.0,3,"Andersson, Mr. Anders Johan",male,39.0,1,5,347082,31.2750,,S,611
35,0.0,1,"Meyer, Mr. Edgar Joseph",male,28.0,1,0,PC 17604,82.1708,,C,376
36,0.0,1,"Holverson, Mr. Alexander Oskar",male,42.0,1,0,113789,52.0000,,S,384
63,0.0,1,"Harris, Mr. Henry Birkhardt",male,45.0,1,0,36973,83.4750,C83,S,231
93,0.0,1,"Chaffee, Mr. Herbert Fuller",male,46.0,1,0,W.E.P. 5734,61.1750,E31,S,906
...,...,...,...,...,...,...,...,...,...,...,...,...
1208,,1,"Spencer, Mr. William Augustus",male,57.0,1,0,PC 17569,146.5208,B78,C,32
1245,,2,"Herman, Mr. Samuel",male,49.0,1,2,220845,65.0000,,S,755
1258,,3,"Caram, Mr. Joseph",male,,1,0,2689,14.4583,,C,579
1286,,3,"Kink-Heilmann, Mr. Anton",male,29.0,3,1,315153,22.0250,,S,1057


In [90]:
couples.rename({"PassengerId":"PassengerId_Spouse"},
               axis=1, inplace=True)

In [91]:
couples.head()

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,PassengerId_Spouse
14,0.0,3,"Andersson, Mr. Anders Johan",male,39.0,1,5,347082,31.275,,S,611
35,0.0,1,"Meyer, Mr. Edgar Joseph",male,28.0,1,0,PC 17604,82.1708,,C,376
36,0.0,1,"Holverson, Mr. Alexander Oskar",male,42.0,1,0,113789,52.0,,S,384
63,0.0,1,"Harris, Mr. Henry Birkhardt",male,45.0,1,0,36973,83.475,C83,S,231
93,0.0,1,"Chaffee, Mr. Herbert Fuller",male,46.0,1,0,W.E.P. 5734,61.175,E31,S,906


In [82]:
couples = couples.join(titanic[["Name", "Age"]],
                       on="PassengerId_Spouse", rsuffix="_Spouse")

In [83]:
couples

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,PassengerId_Spouse,Name_Spouse,Age_Spouse
14,0.0,3,"Andersson, Mr. Anders Johan",male,39.0,1,5,347082,31.2750,,S,611,"Andersson, Mrs. Anders Johan (Alfrida Konstant...",39.0
35,0.0,1,"Meyer, Mr. Edgar Joseph",male,28.0,1,0,PC 17604,82.1708,,C,376,"Meyer, Mrs. Edgar Joseph (Leila Saks)",
36,0.0,1,"Holverson, Mr. Alexander Oskar",male,42.0,1,0,113789,52.0000,,S,384,"Holverson, Mrs. Alexander Oskar (Mary Aline To...",35.0
63,0.0,1,"Harris, Mr. Henry Birkhardt",male,45.0,1,0,36973,83.4750,C83,S,231,"Harris, Mrs. Henry Birkhardt (Irene Wallach)",35.0
93,0.0,1,"Chaffee, Mr. Herbert Fuller",male,46.0,1,0,W.E.P. 5734,61.1750,E31,S,906,"Chaffee, Mrs. Herbert Fuller (Carrie Constance...",47.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1208,,1,"Spencer, Mr. William Augustus",male,57.0,1,0,PC 17569,146.5208,B78,C,32,"Spencer, Mrs. William Augustus (Marie Eugenie)",
1245,,2,"Herman, Mr. Samuel",male,49.0,1,2,220845,65.0000,,S,755,"Herman, Mrs. Samuel (Jane Laver)",48.0
1258,,3,"Caram, Mr. Joseph",male,,1,0,2689,14.4583,,C,579,"Caram, Mrs. Joseph (Maria Elias)",
1286,,3,"Kink-Heilmann, Mr. Anton",male,29.0,3,1,315153,22.0250,,S,1057,"Kink-Heilmann, Mrs. Anton (Luise Heilmann)",26.0


In [85]:
titanic.Pclass.value_counts()

3    709
1    323
2    277
Name: Pclass, dtype: int64

In [84]:
couples.Pclass.value_counts()

1    41
2    23
3    22
Name: Pclass, dtype: int64

In [86]:
couples.Sex.value_counts()

male    86
Name: Sex, dtype: int64

In [87]:
(couples.Age - couples.Age_Spouse).groupby(couples.Pclass).agg(["min", "max", "mean", "median", "std", "count", "size"])

Pclass,min,max,mean,median,std,count,size
1,-40.0,14.0,2.805556,3.0,9.211234,36,41
2,-2.0,20.0,6.195652,4.0,6.793612,23,23
3,-4.0,12.0,3.472222,3.5,3.798112,18,22


In [88]:
couples[(couples.Age - couples.Age_Spouse)<0][["PassengerId_Spouse", "Name", "Age", "Name_Spouse", "Age_Spouse"]]

PassengerId,PassengerId_Spouse,Name,Age,Name_Spouse,Age_Spouse
93,906,"Chaffee, Mr. Herbert Fuller",46.0,"Chaffee, Mrs. Herbert Fuller (Carrie Constance...",47.0
207,86,"Backstrom, Mr. Karl Alfred",32.0,"Backstrom, Mrs. Karl Alfred (Maria Mathilda Gu...",33.0
249,872,"Beckwith, Mr. Richard Leonard",37.0,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",47.0
315,441,"Hart, Mr. Benjamin",43.0,"Hart, Mrs. Benjamin (Esther Ada Bloomfield)",45.0
622,936,"Kimball, Mr. Edwin Nelson Jr",42.0,"Kimball, Mrs. Edwin Nelson Jr (Gertrude Parsons)",45.0
646,53,"Harper, Mr. Henry Sleeper",48.0,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",49.0
725,810,"Chambers, Mr. Norman Campbell",27.0,"Chambers, Mrs. Norman Campbell (Bertha Griggs)",33.0
742,988,"Cavendish, Mr. Tyrell William",36.0,"Cavendish, Mrs. Tyrell William (Julia Florence...",76.0
861,1201,"Hansen, Mr. Claus Peter",41.0,"Hansen, Mrs. Claus Peter (Jennie L Howard)",45.0
993,134,"Weisz, Mr. Leopold",27.0,"Weisz, Mrs. Leopold (Mathilde Francoise Pede)",29.0


In [None]:
titanic.loc[742]

In [None]:
titanic.loc[988]

Although this is only a heuristic, and we may need to dig deeper (e.g., to find some uncommon naming patterns), it is already useful. Think about which features you could add to describe a passenger (say, `is wife/husband on board?`, which may complement `SibSp`).
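
As one possible sketch of such a feature, a boolean flag can be derived from the index and the `PassengerId_Spouse` column of a couples table. The frames below are toy stand-ins for `titanic` and `couples`, and the column name `SpouseOnBoard` is an assumption made for illustration.

```python
import pandas as pd

# Toy stand-in for `titanic`, indexed by PassengerId.
passengers = pd.DataFrame(
    {"Name": ["Futrelle, Mrs. Jacques Heath (Lily May Peel)",
              "Allen, Mr. William Henry",
              "Futrelle, Mr. Jacques Heath"]},
    index=pd.Index([4, 5, 138], name="PassengerId"))

# Toy stand-in for `couples`: husband's ID in the index, wife's ID in a column.
couples = pd.DataFrame({"PassengerId_Spouse": [4]},
                       index=pd.Index([138], name="PassengerId"))

# A passenger has a spouse on board if they appear on either side of a couple.
passengers["SpouseOnBoard"] = (
    passengers.index.isin(couples.index)
    | passengers.index.isin(couples["PassengerId_Spouse"]))

print(passengers["SpouseOnBoard"])
```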

Think about how you might find entire **families**, and which features you could extract by knowing them. EDA is about data-driven creativity, so play with it.

P.S. There is **not a single loop** above.

### Intermezzo: on self-joins

In [35]:
cabin_counts = titanic.Cabin.value_counts()
cabin_counts[cabin_counts>1]

C23 C25 C27        6
G6                 5
B57 B59 B63 B66    5
C22 C26            4
F33                4
                  ..
D15                2
E67                2
D35                2
C93                2
C125               2
Name: Cabin, Length: 79, dtype: int64

In [36]:
cabin_counts = cabin_counts[cabin_counts>1]

In [37]:
titanic.loc[titanic.Cabin.isin(cabin_counts.index), ["Name", "Cabin"]]#.merge(titanic, on="Cabin", how="inner")

PassengerId,Name,Cabin
2,"Cumings, Mrs. John Bradley (Florence Briggs Th...",C85
4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",C123
7,"McCarthy, Mr. Timothy J",E46
11,"Sandstrom, Miss. Marguerite Rut",G6
28,"Fortune, Mr. Charles Alexander",C23 C25 C27
...,...,...
1287,"Smith, Mrs. Lucien Philip (Mary Eloise Hughes)",C31
1289,"Frolicher-Stehli, Mrs. Maxmillian (Margaretha ...",B41
1292,"Bonnell, Miss. Caroline",C7
1299,"Widener, Mr. George Dunton",C80


In [43]:
cabins = (titanic
          .loc[titanic.Cabin.isin(cabin_counts.index),
               ["Name", "Cabin"]]
          .reset_index())

In [40]:
cabins.merge(cabins, on="Cabin", suffixes=("_first", "_second"))

,PassengerId_first,Name_first,Cabin,PassengerId_second,Name_second
0,2,"Cumings, Mrs. John Bradley (Florence Briggs Th...",C85,2,"Cumings, Mrs. John Bradley (Florence Briggs Th..."
1,2,"Cumings, Mrs. John Bradley (Florence Briggs Th...",C85,1126,"Cumings, Mr. John Bradley"
2,1126,"Cumings, Mr. John Bradley",C85,2,"Cumings, Mrs. John Bradley (Florence Briggs Th..."
3,1126,"Cumings, Mr. John Bradley",C85,1126,"Cumings, Mr. John Bradley"
4,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",C123,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)"
...,...,...,...,...,...
499,1299,"Widener, Mr. George Dunton",C80,1299,"Widener, Mr. George Dunton"
500,1144,"Clark, Mr. Walter Miller",C89,1144,"Clark, Mr. Walter Miller"
501,1144,"Clark, Mr. Walter Miller",C89,1164,"Clark, Mrs. Walter Miller (Virginia McDowell)"
502,1164,"Clark, Mrs. Walter Miller (Virginia McDowell)",C89,1144,"Clark, Mr. Walter Miller"


In [41]:
companions = cabins.merge(cabins, on="Cabin", suffixes=("_first", "_second"))
companions = companions[companions.PassengerId_first != companions.PassengerId_second]

In [44]:
companions

,PassengerId_first,Name_first,Cabin,PassengerId_second,Name_second
1,2,"Cumings, Mrs. John Bradley (Florence Briggs Th...",C85,1126,"Cumings, Mr. John Bradley"
2,1126,"Cumings, Mr. John Bradley",C85,2,"Cumings, Mrs. John Bradley (Florence Briggs Th..."
5,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",C123,138,"Futrelle, Mr. Jacques Heath"
6,138,"Futrelle, Mr. Jacques Heath",C123,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)"
9,7,"McCarthy, Mr. Timothy J",E46,1038,"Hilliard, Mr. Herbert Henry"
...,...,...,...,...,...
494,1162,"McCaffry, Mr. Thomas Francis",C6,1010,"Beattie, Mr. Thomson"
497,1110,"Widener, Mrs. George Dunton (Eleanor Elkins)",C80,1299,"Widener, Mr. George Dunton"
498,1299,"Widener, Mr. George Dunton",C80,1110,"Widener, Mrs. George Dunton (Eleanor Elkins)"
501,1144,"Clark, Mr. Walter Miller",C89,1164,"Clark, Mrs. Walter Miller (Virginia McDowell)"


We can now clean this up and get another interesting source of information (`travelling with a family member in the same cabin?`, etc.).
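
One possible cleanup step, sketched on a toy frame: the filter `!=` keeps every pair twice (once in each order), whereas `<` on the IDs keeps each unordered pair exactly once. The passenger IDs and cabins are taken from the tables above; the names `pairs`, `ordered_pairs`, and `unique_pairs` are illustrative.

```python
import pandas as pd

cabins = pd.DataFrame({"PassengerId": [2, 1126, 4, 138],
                       "Cabin": ["C85", "C85", "C123", "C123"]})

# Self-join: every passenger is paired with everyone in the same cabin,
# including themselves.
pairs = cabins.merge(cabins, on="Cabin", suffixes=("_first", "_second"))

# Drops self-pairs only: (2, 1126) and (1126, 2) both remain.
ordered_pairs = pairs[pairs.PassengerId_first != pairs.PassengerId_second]

# Drops self-pairs and mirrored duplicates in one step.
unique_pairs = pairs[pairs.PassengerId_first < pairs.PassengerId_second]

print(len(pairs), len(ordered_pairs), len(unique_pairs))
```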

In [60]:
companions.groupby('Cabin').first()

Cabin,PassengerId_first,Name_first,PassengerId_second,Name_second
A34,446,"Dodge, Master. Washington",1185,"Dodge, Dr. Washington"
B18,330,"Hippach, Miss. Jean Gertrude",524,"Hippach, Mrs. Louis Albert (Ida Sophia Fischer)"
B20,691,"Dick, Mr. Albert Adrian",782,"Dick, Mrs. Albert Adrian (Vera Gillespie)"
B22,541,"Crosby, Miss. Harriet R",746,"Crosby, Capt. Edward Gifford"
B28,62,"Icard, Miss. Amelie",830,"Stone, Mrs. George Nelson (Martha Evelyn)"
...,...,...,...,...
F G73,76,"Moen, Mr. Sigurd Hansen",716,"Soholt, Mr. Peter Andreas Lauritz Andersen"
F2,149,"Navratil, Mr. Michel (""Louis M Hoffman"")",194,"Navratil, Master. Michel M"
F33,67,"Nye, Mrs. (Elizabeth Ramell)",346,"Brown, Miss. Amelia ""Mildred"""
F4,184,"Becker, Master. Richard F",619,"Becker, Miss. Marion Louise"


In [54]:
companions.groupby('Cabin').Cabin.count()

Cabin
A34       6
B18       2
B20       2
B22       2
B28       2
         ..
F G73     2
F2       12
F33      12
F4       12
G6       20
Name: Cabin, Length: 79, dtype: int64
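
These counts are numbers of ordered pairs, not passengers: a cabin with n occupants contributes n * (n - 1) rows to `companions` (for example, 20 = 5 * 4 for `G6`). A minimal sketch with a hypothetical cabin `X1` and made-up passenger IDs shows the relation and how to recover the number of occupants with `nunique`:

```python
import pandas as pd

# Hypothetical cabin with three occupants (IDs 1, 2, 3 are made up).
cabins = pd.DataFrame({"PassengerId": [1, 2, 3],
                       "Cabin": ["X1", "X1", "X1"]})

companions = cabins.merge(cabins, on="Cabin", suffixes=("_first", "_second"))
companions = companions[companions.PassengerId_first != companions.PassengerId_second]

# Rows per cabin: ordered pairs, i.e. n * (n - 1).
pair_rows = companions.groupby("Cabin").size()

# Distinct passengers per cabin: n.
occupants = companions.groupby("Cabin").PassengerId_first.nunique()

print(pair_rows["X1"], occupants["X1"])
```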

In [68]:
companions.loc[companions.Cabin == 'A34']
companions.Cabin.unique()

array(['C85', 'C123', 'E46', 'G6', 'C23 C25 C27', 'B78', 'D33', 'C52',
       'B28', 'C83', 'F33', 'F G73', 'E31', 'D10 D12', 'D26', 'B58 B60',
       'E101', 'F2', 'C2', 'E33', 'F4', 'D36', 'D15', 'C93', 'C78', 'D35',
       'B77', 'E67', 'C125', 'B49', 'D', 'C22 C26', 'C106', 'C65', 'C54',
       'B57 B59 B63 B66', 'C7', 'E34', 'C32', 'B18', 'C124', 'D37', 'B35',
       'E50', 'B96 B98', 'E44', 'A34', 'C92', 'D21', 'D20', 'E25', 'B22',
       'C86', 'C101', 'C68', 'B41', 'D19', 'C126', 'B71', 'B51 B53 B55',
       'B5', 'B20', 'F G63', 'C62 C64', 'E24', 'E8', 'C46', 'D30', 'E121',
       'D17', 'B69', 'D28', 'B45', 'C31', 'C55 C57', 'C116', 'C6', 'C80',
       'C89'], dtype=object)