
# FORMATTING - numeric values



Let me ***scrape*** this [table](https://www.cia.gov/the-world-factbook/field/real-gdp-purchasing-power-parity/country-comparison/) from the CIA World Factbook.

```r
# load the package used for scraping (it also provides the %>% pipe)
library(rvest)

# link to the CIA World Factbook comparison table
linkCiaPurchPower='https://www.cia.gov/the-world-factbook/field/real-gdp-purchasing-power-parity/country-comparison/'

# read the page, get all its tables, keep the first one (there is only one)
ciapupow = read_html(linkCiaPurchPower)%>%html_table()%>%.[[1]]

# see the scraped table
head(ciapupow)
```


The data need cleaning before they can be formatted. Review the previous contents on cleaning; the steps below apply them to this table.

### I.1 Cleaning before formatting

The currency symbols and commas above tell you that R has not identified the 3rd column as numeric.

The first step is to check whether '**$**' and '**,**' are the only characters present besides the digits:

```r
# function: replace any digit by 'nothing' ('')
byeDigits=function(x) gsub('\\d','',x)

# check: show the unique non-digit patterns left in the 3rd column
# (gsub is vectorized, so it works on the whole column at once)
unique(byeDigits(ciapupow[[3]]))
```
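
The visual check above can also be turned into a logical test. Here is a minimal sketch on hypothetical sample values (`sample_gdp` is made up for illustration, not taken from the Factbook): splitting the leftovers into single characters lists each symbol only once, and `grepl()` with a pattern confirms that every value is a '$' followed only by digits and commas.

```r
# Hypothetical values written in the same style as the scraped column
sample_gdp <- c("$21,433,000,000,000", "$2,500,000,000", "$45,000,000")

# Remove the digits (same idea as byeDigits), then split what is left
# into single characters and keep each distinct character once
leftover_chars <- unique(unlist(strsplit(gsub("\\d", "", sample_gdp), "")))
print(leftover_chars)

# Stricter check: TRUE only if every value is '$' followed by digits/commas
only_dollar_and_commas <- all(grepl("^\\$[0-9,]+$", sample_gdp))
print(only_dollar_and_commas)
```

On the real data, the same two expressions can be run on `ciapupow[[3]]` instead of `sample_gdp`.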


This confirms that the column only needs a simple cleaning: removing those characters. First, let me change that column name:

```r
names(ciapupow)[3]='RealGDP'
```

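Let's create a function that trims leading and trailing spaces and then removes every character that is not a digit. Note that this also deletes decimal points and minus signs, which is safe here only because the previous check found nothing but '**$**' and '**,**' besides the digits: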

```r
# trim spaces, then delete every non-digit character (\\D)
keepDigits=function(x) gsub('\\D','',trimws(x))

# apply it to the whole column (gsub is vectorized)
ciapupow$RealGDP=keepDigits(ciapupow$RealGDP)

# Notice 'RealGDP' is still character type
str(ciapupow)
```
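
Deleting everything that is not a digit (`\\D`) is convenient but blunt. A minimal sketch with hypothetical values and illustrative function names (`keep_digits`, `drop_dollar_commas`) contrasts that approach with a narrower pattern that removes only '$' and ','.

```r
# Same idea as keepDigits: trim, then delete every non-digit character
keep_digits <- function(x) gsub("\\D", "", trimws(x))

# Narrower alternative: trim, then delete only '$' and ','
drop_dollar_commas <- function(x) gsub("[$,]", "", trimws(x))

# Hypothetical values containing a decimal point and a minus sign
values <- c(" $1,234.5 ", "-$2,000")

print(keep_digits(values))         # decimal point and minus sign are lost
print(drop_dollar_commas(values))  # decimal point and minus sign are kept
print(as.numeric(drop_dollar_commas(values)))
```

The narrower pattern is the safer default when a column may hold decimals or negative numbers; `\\D` is fine for whole, non-negative values like the ones checked above.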

```r
# display the cleaned table
ciapupow
```

Although *RealGDP* is now clean, it does not have the right format yet (it is still text):

```r
summary(ciapupow$RealGDP)
```


### I.2 Formatting numbers using **as.numeric**

This is the function that will solve our problem.

Apply this function only when you are sure your numeric column is clean.
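
To see why cleaning must come first, here is a minimal sketch on two hypothetical values, one still carrying '$' and ',' and one already clean (`raw`, `from_raw` and `from_clean` are illustrative names):

```r
raw   <- c("$1,234", "5678")     # hypothetical: one uncleaned, one clean value
clean <- gsub("[$,]", "", raw)   # remove '$' and ','

# as.numeric() cannot parse "$1,234": it returns NA and warns
# "NAs introduced by coercion"; suppressWarnings() hides that warning here
from_raw   <- suppressWarnings(as.numeric(raw))
from_clean <- as.numeric(clean)

print(from_raw)    # first value is NA
print(from_clean)  # both values converted
```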

Let's see:


```r
ciapupow$RealGDP=as.numeric(ciapupow$RealGDP)

# Notice 'RealGDP' is numeric now
str(ciapupow)
```

```r
summary(ciapupow$RealGDP)
```

If a cell isn't clean, R coerces its value to NA (and warns "NAs introduced by coercion"). Handling such values explicitly during cleaning is preferable to relying on this automatic coercion.
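
One way to handle this explicitly is to compare the result of `as.numeric()` with its input and report every value that became NA only because of coercion. The helper below, `to_numeric_checked()`, is a hypothetical function written for illustration, not part of base R or rvest:

```r
# Hypothetical helper: convert to numeric, but name the values that failed
to_numeric_checked <- function(x) {
  out <- suppressWarnings(as.numeric(x))
  # NA created by coercion, as opposed to a value that was already missing
  failed <- is.na(out) & !is.na(x)
  if (any(failed)) {
    warning("Could not convert: ", paste(unique(x[failed]), collapse = ", "))
  }
  out
}

# Hypothetical column with a stray "n/a" and a genuinely missing value
vals <- c("1200", "3400", "n/a", NA)
res <- to_numeric_checked(vals)  # warns: Could not convert: n/a
print(res)
```

The warning points you to the exact cells to fix, so you can clean them and rerun the conversion instead of discovering unexplained NAs later.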