# DATAPREP.AI

Data exploration and preparation are needed in every data engineering project, and the same techniques come up again and again. As a result, a number of high-level, low-code tools and libraries have emerged in recent years. One of these, which can be used for both data exploration and data cleaning, is dataprep.ai. You can install it in your Python environment with `pip install dataprep`.

## 1. Data exploration

With very little code you can generate a complete report to explore your data:

In [None]:
import pandas as pd
from dataprep.eda import create_report

In [None]:
df = pd.read_csv('datasets/penguins.csv')
report = create_report(df)

In [None]:
report

In [None]:
report.save('Penguin Report') # save report to local disk
report.show_browser() # show report in the browser

In [None]:
df.info()

In [None]:
df.head(10)

## 2. Data cleaning

`dataprep.ai` also offers functionality for data validation and cleaning; see https://docs.dataprep.ai/user_guide/user_guide.html for the documentation. For example, it can check the validity of Belgian VAT or IBAN numbers.

In [None]:
from dataprep.clean import clean_be_vat
btw = pd.DataFrame({"vat": ['BE403019261','BE431150351','BE0255647755']})
clean_be_vat(btw,'vat')
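
To give an idea of what a format check like this involves: a Belgian VAT number consists of ten digits (older nine-digit numbers get a leading zero), and the last two digits must equal 97 minus the remainder of the first eight digits divided by 97. The sketch below illustrates that rule with the standard library only. It is not how `dataprep.ai` implements `clean_be_vat`, and the function name `is_valid_be_vat_format` is made up for this example.

```python
import re


def is_valid_be_vat_format(number: str) -> bool:
    """Return True if `number` passes the Belgian VAT length and checksum test.

    Hypothetical helper for illustration; not part of dataprep.
    """
    # Drop spaces, dots and dashes, and normalise the optional "BE" prefix.
    compact = re.sub(r"[\s.\-]", "", number).upper()
    match = re.fullmatch(r"(?:BE)?([0-9]{9,10})", compact)
    if match is None:
        return False
    digits = match.group(1)
    # Old nine-digit numbers are written with a leading zero.
    if len(digits) == 9:
        digits = "0" + digits
    # The last two digits must equal 97 - (first eight digits mod 97).
    return 97 - int(digits[:8]) % 97 == int(digits[8:])


for vat in ["BE403019261", "BE431150351", "BE0255647755"]:
    print(vat, is_valid_be_vat_format(vat))
```

Note that a check like this only looks at the digits themselves; it says nothing about whether the number was ever issued.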

The first and third VAT numbers match the format of Belgian VAT numbers; the second does not. However, this only checks the format, not whether the VAT number actually exists. For that, you have to use country-specific tools, if available. For Belgian VAT numbers there is a public web service that can be used to check whether a number really exists (of course, this has nothing to do with `dataprep.ai`).

In [5]:
import requests
x = requests.get("https://controleerbtwnummer.eu/api/validate/BE403019261.json")
print(x.text)

{"valid":false,"countryCode":"BE","vatNumber":"403019261","name":null,"address":{"street":null,"number":null,"zip_code":null,"city":null,"country":"België","countryCode":"BE"},"strAddress":null}


In [None]:
x = requests.get("https://controleerbtwnummer.eu/api/validate/BE0255647755.json")
print(x.text)

If the VAT number exists, as in the second example, a JSON object containing the company details is returned. Otherwise `valid` is `false` and the detail fields are `null`, as in the first example.
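
Because the response is JSON, you do not have to read the raw text; the relevant fields can be extracted with the standard `json` module. The sketch below uses an abridged copy of the first response shown above. The helper name `summarize_vat_response` is made up for this example, and it assumes that the `name` field is filled in whenever `valid` is `true`, which the output above does not show.

```python
import json


def summarize_vat_response(text: str) -> str:
    """Turn the JSON text returned by the web service into a one-line summary.

    Hypothetical helper; the field names are taken from the response shown above.
    """
    data = json.loads(text)
    number = f"{data.get('countryCode', '')}{data.get('vatNumber', '')}"
    if not data.get("valid"):
        return f"{number}: no company found"
    # Assumption: 'name' holds the company name for existing numbers.
    return f"{number}: {data.get('name')}"


# Abridged copy of the response for BE403019261 (address fields left out).
sample = '{"valid":false,"countryCode":"BE","vatNumber":"403019261","name":null}'
print(summarize_vat_response(sample))  # prints: BE403019261: no company found
```

In the notebook you would call it as `summarize_vat_response(x.text)` on the result of `requests.get(...)`.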