## Data Types

In [2]:
import pandas as pd
import seaborn as sns

In [3]:
tips = sns.load_dataset('tips')

In [5]:
tips.dtypes

total_bill     float64
tip            float64
sex           category
smoker        category
day           category
time          category
size             int64
dtype: object

In [6]:
# To convert values to a different data type, use the .astype() method.
tips['sex_str'] = tips['sex'].astype(str)
tips.dtypes

total_bill     float64
tip            float64
sex           category
smoker        category
day           category
time          category
size             int64
sex_str         object
dtype: object
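As a small aside, `.astype()` can also be called on a whole DataFrame with a dict that maps column names to target types. The sketch below uses a hypothetical three-row frame (not the real `tips` data) so that it runs without downloading anything.

```python
import pandas as pd

# Hypothetical toy data standing in for a few columns of tips.
df = pd.DataFrame({
    "size": [2, 3, 3],
    "sex": pd.Categorical(["Female", "Male", "Male"]),
    "tip": [1.01, 1.66, 3.50],
})

# A dict maps column names to target dtypes; unlisted columns are left unchanged.
converted = df.astype({"size": "float64", "sex": str})

# astype returns a new object, so df itself keeps its original dtypes.
print(converted.dtypes)
print(df.dtypes)
```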

In [9]:
# While astype(float) works, the pandas to_numeric function handles non-numeric values better.
import warnings
warnings.filterwarnings('ignore')

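# Copy the slice so the assignment below modifies an independent DataFrame, not a view of tips.
tips_sub_miss = tips.head(10).copy()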
tips_sub_miss.loc[[1, 3, 5, 7], 'total_bill'] = 'missing'
tips_sub_miss

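| | total_bill | tip | sex | smoker | day | time | size | sex_str |
|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | Female |
| 1 | missing | 1.66 | Male | No | Sun | Dinner | 3 | Male |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 | Male |
| 3 | missing | 3.31 | Male | No | Sun | Dinner | 2 | Male |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | Female |
| 5 | missing | 4.71 | Male | No | Sun | Dinner | 4 | Male |
| 6 | 8.77 | 2.00 | Male | No | Sun | Dinner | 2 | Male |
| 7 | missing | 3.12 | Male | No | Sun | Dinner | 4 | Male |
| 8 | 15.04 | 1.96 | Male | No | Sun | Dinner | 2 | Male |
| 9 | 14.78 | 3.23 | Male | No | Sun | Dinner | 2 | Male |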


In [10]:
# Notice that total_bill is now an object column because we inserted strings into that series.
tips_sub_miss.dtypes

total_bill      object
tip            float64
sex           category
smoker        category
day           category
time          category
size             int64
sex_str         object
dtype: object

In [11]:
# ... therefore, this will raise an error:
tips_sub_miss['total_bill'].astype(float)

ValueError: could not convert string to float: 'missing'

`to_numeric` will raise an error as well unless we make use of the `errors` parameter:

* `raise` (default) raises an error if a value cannot be converted to a number.
* `coerce` returns **NaN** for values that cannot be converted.
* `ignore` returns the input unchanged if any value cannot be converted (i.e., does nothing). This option is deprecated as of pandas 2.2.
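The difference between `raise` and `coerce` can be seen on a small standalone Series. This is a minimal sketch; the `raw` values are made up for illustration, and `ignore` is left out because it is deprecated.

```python
import pandas as pd

# Hypothetical string data with one value that is not a number.
raw = pd.Series(["16.99", "missing", "21.01"])

# errors='raise' (the default) fails as soon as a value cannot be parsed.
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True

# errors='coerce' converts what it can and puts NaN where parsing fails.
coerced = pd.to_numeric(raw, errors="coerce")

print(raised)
print(coerced)
```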

In [13]:
tips_sub_miss['total_bill'] = pd.to_numeric(tips_sub_miss['total_bill'], errors='coerce')
tips_sub_miss.dtypes

total_bill     float64
tip            float64
sex           category
smoker        category
day           category
time          category
size             int64
sex_str         object
dtype: object

In [14]:
tips_sub_miss['total_bill'].sample(10)

8    15.04
9    14.78
2    21.01
5      NaN
3      NaN
1      NaN
6     8.77
4    24.59
0    16.99
7      NaN
Name: total_bill, dtype: float64

The `to_numeric` function has another parameter, `downcast`, which converts the result to the smallest numeric `dtype` that can hold the values.  
The default is **None**; other possible values are 'integer', 'signed', 'unsigned' and 'float'.
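A short sketch of `downcast` on made-up data follows. The variable names are illustrative only, and the exact resulting dtypes depend on the values being converted.

```python
import pandas as pd

# Hypothetical small integers and floats, stored with the default 64-bit dtypes.
counts = pd.Series([2, 3, 4], dtype="int64")
prices = pd.Series([16.99, 21.01], dtype="float64")

# downcast='integer' picks the smallest signed integer dtype that fits the values.
small_counts = pd.to_numeric(counts, downcast="integer")

# downcast='float' goes no smaller than float32.
small_prices = pd.to_numeric(prices, downcast="float")

print(counts.dtype, small_counts.dtype)
print(prices.dtype, small_prices.dtype)
```

Downcasting saves memory, but float32 carries less precision than float64, so it is worth checking that the rounding is acceptable for the data at hand.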

### Categorical Data
Where a column holds a small set of repeated values, storing it as categorical data can reduce memory use and speed up some operations.  
Refer to these URLs for more information:

https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html    
https://pandas.pydata.org/pandas-docs/stable/api.html#api-categorical  
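To make the memory claim concrete, here is a minimal sketch that compares a string column with its categorical version. The `days` Series is made-up data, not the `tips` dataset, and the byte counts will vary by pandas version and platform.

```python
import pandas as pd

# Hypothetical column: four distinct labels repeated many times.
days = pd.Series(["Thur", "Fri", "Sat", "Sun"] * 1000)

# The categorical version stores each label once plus a small integer code per row.
as_cat = days.astype("category")

# deep=True counts the memory of the string values themselves.
obj_bytes = days.memory_usage(deep=True)
cat_bytes = as_cat.memory_usage(deep=True)

print(obj_bytes, cat_bytes)
print(as_cat.cat.categories)
```

The saving shrinks as the number of distinct values approaches the number of rows, so the conversion pays off mainly for low-cardinality columns.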


In [16]:
tips.dtypes

total_bill     float64
tip            float64
sex           category
smoker        category
day           category
time          category
size             int64
sex_str         object
dtype: object