# Pandas tip #14: What's your number?
When importing data I regularly come across strings that are actually numbers. It is not too hard to convert them into their true data type. A few years ago I would simply apply a small helper function containing a try/except statement to fix errors in the dataset. As with many things, Pandas already has this implemented, in the .to_numeric() function.

Just like .to_datetime(), the .to_numeric() function is called straight from the Pandas library, i.e. it is not a method of the DataFrame or Series object. By default, it will raise an exception when a value cannot be converted into a numeric value. This behaviour can be altered using the `errors` parameter. Besides the default `errors='raise'`, we can set it to `errors='ignore'` to get the input back unchanged (note that this option is deprecated since pandas 2.2), or `errors='coerce'` to set values that cannot be parsed to NaN.
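To make the difference between the error modes concrete, here is a minimal sketch on a tiny hand-made Series (the values and the variable names `raw` and `coerced` are made up for illustration):

```python
import pandas as pd

# Hypothetical toy input: two valid numbers and one string that is not a number
raw = pd.Series(['1', '2.5', 'Nope'])

# errors='raise' (the default) raises a ValueError on the unparseable value
try:
    pd.to_numeric(raw)
except ValueError as exc:
    print(f'raise: {exc}')

# errors='coerce' replaces the unparseable value with NaN instead
coerced = pd.to_numeric(raw, errors='coerce')
print(coerced)
```

Because NaN is a float, the coerced result has a float dtype even when all valid values are whole numbers.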

Another nice feature is the `downcast` parameter, which casts the numeric values to the smallest data type of a given kind. This is helpful when you expect a certain data type, e.g. signed or unsigned integers. It does not round values, but simply type casts them. Downcasting happens after parsing, so any errors it raises are not caught by the `errors` parameter. Another thing to keep in mind is that the standard integer data type cannot represent NA; therefore, if your data contains any NA, it will by default be cast to float.
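A small sketch of how `downcast='integer'` behaves in three situations, using made-up toy values (the variable names are only for illustration):

```python
import pandas as pd

# Whole numbers only: the result is downcast to a small signed integer dtype
whole = pd.to_numeric(pd.Series(['1', '2', '300']), downcast='integer')
print(whole.dtype)

# A fractional value is not rounded, so the result remains a float
fractional = pd.to_numeric(pd.Series(['1.5', '2']), downcast='integer')
print(fractional.dtype)

# A coerced NaN cannot be stored in a standard integer, so this remains a float too
with_missing = pd.to_numeric(
    pd.Series(['1', 'Nope']), errors='coerce', downcast='integer'
)
print(with_missing.dtype)
```

In other words, asking for an integer downcast is a request, not a guarantee: it is worth checking the resulting dtype afterwards.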

The .to_numeric() function is my first go-to for string to numeric conversion!

Let's generate some random data:

In [None]:
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_rows = 100
str_numbers = [str(x) for x in rng.integers(0, 1000, size=n_rows)]
str_numbers += ['-', 'Nope', 'Poop']  # add some random noise

df = pd.DataFrame({
    'id': np.arange(n_rows),
    'number': rng.choice(str_numbers, size=n_rows),
})

We have these unique values:

In [None]:
df.number.unique()

Now we can easily convert them into numbers using Pandas, including error handling:

In [None]:
pd.to_numeric(df.number, errors='coerce')

We could try to create integers using the `downcast` parameter:

In [None]:
pd.to_numeric(df.number, errors='coerce', downcast='integer')

However, the standard integer data type cannot handle NA; therefore, the result is cast to float.
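If you really need integers alongside missing values, one possible workaround (not part of the tip above, so treat it as a sketch) is to convert the coerced result to the nullable `Int64` extension dtype that Pandas provides. The toy values below are made up for illustration:

```python
import pandas as pd

# Hypothetical toy input with one value that cannot be parsed
raw = pd.Series(['12', '7', 'Nope'])

# Coerce first (gives floats with NaN), then cast to the nullable integer dtype
nullable = pd.to_numeric(raw, errors='coerce').astype('Int64')
print(nullable)
```

Note the capital `I` in `'Int64'`: it refers to the nullable extension type, not the standard NumPy `int64`. The cast only works when all non-missing values are whole numbers.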

If you have any questions, comments, or requests, feel free to [contact me on LinkedIn](https://linkedin.com/in/dennisbakhuis).