# Manipulation of Non-Numeric Data, Pt. II
In the last section we discussed manipulating datetime columns, which are, sadly, still piles of numbers that contradict human intuition. And now comes the real part! Text! Sweet plain text, which even the most sophisticated NLP model can't comprehend 100%!

Here are our ingredients: libraries and spreadsheets.

In [2]:
import pandas as pd
titanic = pd.read_csv("titanic.csv")
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


Plain text is *plain* text, and we can't ask pandas to perform magic on it. But we can still do some simple operations through the `.str` accessor of a column.

In [7]:
titanic["Name"].str.lower().head()

0                              braund, mr. owen harris
1    cumings, mrs. john bradley (florence briggs th...
2                               heikkinen, miss. laina
3         futrelle, mrs. jacques heath (lily may peel)
4                             allen, mr. william henry
Name: Name, dtype: object
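
A few more element-wise operations work the same way as `lower()`. The sketch below uses a tiny hand-made Series as a stand-in for the `Name` column, so it runs without `titanic.csv`. The variable names are only illustrative.

```python
import pandas as pd

# Stand-in for titanic["Name"]: two values copied from the table above.
names = pd.Series(["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"], name="Name")

upper = names.str.upper()   # upper-case every character of every name
lengths = names.str.len()   # number of characters in each name

# strip() removes leading and trailing whitespace from each element.
stripped = pd.Series(["  Allen  "]).str.strip()

print(upper.tolist())
print(lengths.tolist())
print(stripped.tolist())
```

Each call returns a new Series and leaves the original column untouched.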

We can also split strings and organize the pieces by assigning the results to new columns.

*HINT* - Don't be confused by **get(0)**. It acts on each element, picking the first item of every row's split result, instead of extracting the first row of the table!

In [6]:
titanic["Surname"] = titanic["Name"].str.split(",").str.get(0)
titanic["Surname"].head()

0       Braund
1      Cumings
2    Heikkinen
3     Futrelle
4        Allen
Name: Surname, dtype: object
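
To make the hint concrete, here is a minimal sketch on a hand-made Series (not the real dataset) that contrasts `.str.get(0)` with taking the first row. It also shows `expand=True`, an option of `str.split` that spreads the pieces into columns. The column names `Surname` and `Rest` are made up for this example.

```python
import pandas as pd

# Stand-in for titanic["Name"]: two values copied from the table above.
names = pd.Series(["Braund, Mr. Owen Harris", "Allen, Mr. William Henry"], name="Name")

parts = names.str.split(",")   # each row becomes a list: [surname, rest]
surname = parts.str.get(0)     # first item of EVERY row's list
first_row = parts.iloc[0]      # by contrast: the whole list of the FIRST row

# n=1 splits at the first comma only; expand=True returns a DataFrame.
columns = names.str.split(",", n=1, expand=True)
columns.columns = ["Surname", "Rest"]          # hypothetical column names
columns["Rest"] = columns["Rest"].str.strip()  # drop the space after the comma

print(surname.tolist())
print(first_row)
print(columns)
```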

Of course, text can be used for logical operations as well.
Let's find the heroine of the Titanic disaster, <a href="https://www.bing.com/search?q=No%C3%ABl+Leslie%2C+Countess+of+Rothess">Noël Leslie, Countess of Rothes</a>!

In [5]:
titanic[titanic["Name"].str.contains("Countess")]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Surname
759,760,1,1,"Rothes, the Countess. of (Lucy Noel Martha Dye...",female,33.0,0,0,110152,86.5,B77,S,Rothes
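
`str.contains` returns a boolean mask, and a few of its options are worth knowing. The sketch below runs on a small hand-made Series: the first name is shortened for illustration, and the `None` stands in for a missing value.

```python
import pandas as pd

# Hand-made stand-in for the Name column, including one missing value.
names = pd.Series(["Rothes, the Countess. of", "Allen, Mr. William Henry", None])

# case=False ignores upper/lower case; na=False treats missing names as "no match",
# so the result is a clean True/False mask that is safe to use for filtering.
mask = names.str.contains("countess", case=False, na=False)
matches = names[mask]

# By default the pattern is a regular expression, so "." would match any character.
# regex=False searches for the literal text instead.
literal = names.str.contains("Mr.", regex=False, na=False)

print(mask.tolist())
print(matches.tolist())
print(literal.tolist())
```

Without `na=False`, rows with missing text produce missing values in the mask, which cannot be used directly for boolean indexing.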


Okay, enough serious talk. Let's find the passenger with **the longest name** on the Titanic.

In [8]:
titanic.loc[titanic["Name"].str.len().idxmax(), "Name"]

'Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)'
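
The one-liner above chains three steps, and it may help to see them one at a time. This is a sketch on a hand-made Series with deliberately non-default index labels (10, 20, 30), to show that `idxmax()` returns an index *label* rather than a position.

```python
import pandas as pd

# Stand-in for titanic["Name"], with made-up index labels.
names = pd.Series(
    ["Heikkinen, Miss. Laina", "Allen, Mr. William Henry", "Braund, Mr. Owen Harris"],
    index=[10, 20, 30],
)

lengths = names.str.len()            # step 1: length of every name
longest_label = lengths.idxmax()     # step 2: index label of the largest length
longest_name = names.loc[longest_label]  # step 3: look the name up by that label

print(lengths.tolist())
print(longest_label)
print(longest_name)
```

Because the result is a label, it is paired with `.loc` here, just as in the cell above.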

And we can derive new text columns by replacing values.

In [10]:
titanic["Sex_short"] = titanic["Sex"]\
    .replace({"male": "M", "female": "F"})
titanic["Sex_short"].head()

0    M
1    F
2    F
3    F
4    M
Name: Sex_short, dtype: object
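
Note that `Series.replace` with a dictionary swaps *whole* cell values, while `Series.str.replace` swaps *substrings* inside each string. A small sketch of the difference, on a hand-made Series rather than the real column:

```python
import pandas as pd

# Stand-in for titanic["Sex"].
sex = pd.Series(["male", "female", "female"], name="Sex")

# Whole-value replacement: a cell is changed only if it equals a dictionary key.
whole = sex.replace({"male": "M", "female": "F"})

# Substring replacement: every occurrence of "male" inside a string is changed,
# including the one hiding at the end of "female".
partial = sex.str.replace("male", "M", regex=False)

print(whole.tolist())
print(partial.tolist())
```

For recoding categories such as this one, the whole-value form is usually what you want.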

Congrats on finishing this lesson on a handy **pandas** tool!
Do you feel like a *real data scientist* now?
No? Then please read from the ~~Induction~~ *Introduction* again ;)