# Table of Contents
1. [to a string](#to-a-string)
2. [to a numeric](#to-a-numeric)

Data types govern what you can and cannot do with a variable.
Here we cover converting values from one type to another,
e.g., strings to numerics.

In [1]:
import pandas as pd

In [1]:
# we will be using the tips dataset that ships with seaborn
import seaborn as sns

In [3]:
# load the tips dataset
tips = sns.load_dataset('tips')

In [4]:
tips.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [5]:
# look at the types
tips.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
total_bill    244 non-null float64
tip           244 non-null float64
sex           244 non-null category
smoker        244 non-null category
day           244 non-null category
time          244 non-null category
size          244 non-null int64
dtypes: category(4), float64(2), int64(1)
memory usage: 7.2 KB
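Besides `info()`, the `dtypes` attribute returns one dtype per column as a Series, which is easier to work with in code. A minimal sketch, assuming a small hand-built frame (`demo` is a hypothetical stand-in for `tips`, so nothing needs to be downloaded):

```python
import pandas as pd

# hypothetical stand-in for the tips data
demo = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01],
    'sex': pd.Categorical(['Female', 'Male', 'Male']),
    'size': [2, 3, 3],
})

# one dtype per column, returned as a Series
print(demo.dtypes)

# keep only the names of the numeric columns
numeric_cols = demo.select_dtypes(include='number').columns.tolist()
print(numeric_cols)
```

`select_dtypes` is handy when a later step should only touch columns of a particular type.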


# to a string

In [6]:
tips['sex_str'] = tips['sex'].astype(str)

In [7]:
tips.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 8 columns):
total_bill    244 non-null float64
tip           244 non-null float64
sex           244 non-null category
smoker        244 non-null category
day           244 non-null category
time          244 non-null category
size          244 non-null int64
sex_str       244 non-null object
dtypes: category(4), float64(2), int64(1), object(1)
memory usage: 9.1+ KB
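Once a column holds strings, the `.str` accessor methods work on its values. A minimal sketch, assuming a hand-built categorical Series (`sex` here is a hypothetical stand-in for `tips['sex']`):

```python
import pandas as pd

# hypothetical stand-in for tips['sex']
sex = pd.Series(['Female', 'Male', 'Male'], dtype='category')

# astype(str) converts every value to a Python string
sex_str = sex.astype(str)

print(sex.dtype)      # category
print(sex_str.dtype)  # a string dtype (shown as "object" in the info() output above)

# string methods are available through the .str accessor
print(sex_str.str.upper().tolist())
```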


# to a numeric

In [8]:
# subset our data for the example
tips_sub_miss = tips.head(10)

In [9]:
# turn a few values into the string "missing"
tips_sub_miss.loc[[1, 3, 5, 7], 'total_bill'] = 'missing'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
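The warning above appears because `tips.head(10)` may be a view of `tips`, so pandas cannot tell whether the assignment is meant to change the original data. One common way to avoid it is to take an explicit copy of the subset before modifying it. A minimal sketch, assuming a hand-built frame (`demo` and `sub` are hypothetical names); the cast to `object` is an extra precaution so the column can hold both floats and strings:

```python
import pandas as pd

# hypothetical stand-in for the tips data
demo = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68],
    'tip': [1.01, 1.66, 3.50, 3.31],
})

# .copy() makes the subset an independent DataFrame
sub = demo.head(3).copy()

# cast to object first so the column can hold floats and strings
sub['total_bill'] = sub['total_bill'].astype(object)
sub.loc[[1], 'total_bill'] = 'missing'

print(sub)
print(demo)  # the original frame is unchanged
```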


In [10]:
tips_sub_miss

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,sex_str
0,16.99,1.01,Female,No,Sun,Dinner,2,Female
1,missing,1.66,Male,No,Sun,Dinner,3,Male
2,21.01,3.5,Male,No,Sun,Dinner,3,Male
3,missing,3.31,Male,No,Sun,Dinner,2,Male
4,24.59,3.61,Female,No,Sun,Dinner,4,Female
5,missing,4.71,Male,No,Sun,Dinner,4,Male
6,8.77,2.0,Male,No,Sun,Dinner,2,Male
7,missing,3.12,Male,No,Sun,Dinner,4,Male
8,15.04,1.96,Male,No,Sun,Dinner,2,Male
9,14.78,3.23,Male,No,Sun,Dinner,2,Male


In [15]:
# total_bill now has dtype "object", which pandas uses for strings and mixed values
tips_sub_miss.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 8 columns):
total_bill    10 non-null object
tip           10 non-null float64
sex           10 non-null category
smoker        10 non-null category
day           10 non-null category
time          10 non-null category
size          10 non-null int64
sex_str       10 non-null object
dtypes: category(4), float64(1), int64(1), object(2)
memory usage: 920.0+ bytes


In [11]:
# try to convert the column to float
# this raises a ValueError because 'missing' cannot be parsed as a number
tips_sub_miss['total_bill'].astype(float)

ValueError: could not convert string to float: 'missing'

In [17]:
# use pd.to_numeric to convert to numeric values
# this also raises a ValueError, because errors='raise' is the default
pd.to_numeric(tips_sub_miss['total_bill'])

ValueError: Unable to parse string "missing" at position 1
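The error message only reports the first value that failed to parse. To see every offending value before deciding how to handle them, one option is to compare the original column with a coerced version. A minimal sketch, assuming a hand-built Series (`bills`, `parsed`, and `bad` are hypothetical names):

```python
import pandas as pd

# hypothetical stand-in for tips_sub_miss['total_bill']
bills = pd.Series([16.99, 'missing', 21.01, 'missing', 24.59], name='total_bill')

# values that cannot be parsed become NaN under errors='coerce'
parsed = pd.to_numeric(bills, errors='coerce')

# rows that had a value originally but failed to parse
bad = bills[parsed.isna() & bills.notna()]

print(bad)
print(bad.unique().tolist())
```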

In [20]:
# pass errors='coerce' to turn values that cannot be parsed into NaN
pd.to_numeric(tips_sub_miss['total_bill'], errors='coerce')

0    16.99
1      NaN
2    21.01
3      NaN
4    24.59
5      NaN
6     8.77
7      NaN
8    15.04
9    14.78
Name: total_bill, dtype: float64
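`pd.to_numeric` returns a new Series, so the result has to be assigned back for the column's type to change. A minimal sketch, assuming a hand-built frame (`df` is a hypothetical stand-in for `tips_sub_miss`):

```python
import pandas as pd

# hypothetical stand-in for tips_sub_miss
df = pd.DataFrame({'total_bill': [16.99, 'missing', 21.01, 'missing']})

# assign the converted values back to the column
df['total_bill'] = pd.to_numeric(df['total_bill'], errors='coerce')

# count how many values were coerced to NaN
n_missing = int(df['total_bill'].isna().sum())

# numeric methods now work; mean() skips NaN by default
mean_bill = df['total_bill'].mean()

print(df.dtypes)
print(n_missing, mean_bill)
```

Checking the NaN count after coercing is a quick way to confirm how many values were dropped from the numeric column.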