# Uniformity Problems

Your data might contain values recorded in different units or formats, as in the examples below. Before analysing the data, unify these values so that each column uses a single unit and a single format.

-	**Temperature:** `32°C` is also `89.6°F`

-	**Weight:** `70 kg` is about `11 st`

-	**Date:** `26-11-2019` is also `26, November, 2019`

-	**Money:** `$100` is also `4,886.05 EGP` (at one particular exchange rate, which changes over time)
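One way to unify a unit is to detect values that must be in the other unit and convert them. The sketch below uses a hypothetical `temperatures` DataFrame whose `temperature` column mixes Celsius and Fahrenheit readings. It assumes, as a heuristic rather than a rule, that any value above 40 is Fahrenheit. That threshold only works if genuine Celsius readings in your data never exceed it.

```python
import pandas as pd

# Hypothetical data: readings recorded in a mix of Celsius and Fahrenheit
temperatures = pd.DataFrame({"temperature": [32.0, 89.6, 21.5, 70.7]})

# Assumption: no plausible Celsius reading here exceeds 40, so larger values are Fahrenheit
is_fahrenheit = temperatures["temperature"] > 40

# Convert only the Fahrenheit values to Celsius: C = (F - 32) * 5 / 9
temperatures.loc[is_fahrenheit, "temperature"] = (
    (temperatures.loc[is_fahrenheit, "temperature"] - 32) * 5 / 9
)

# Sanity check: every value should now be on the Celsius scale
assert temperatures["temperature"].max() <= 40
```

Checking the maximum afterwards is a cheap way to confirm that the threshold caught every Fahrenheit value.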

<br>

### Treating Currency Data: (Unit Uniformity)

```python
""" Dataset Description:

The banking dataset contains data on the amount of money stored in accounts (acct_amount), their currency (acct_cur), amount invested (inv_amount), account opening date (account_opened), and last transaction date (last_transaction) that were consolidated from American and European branches.
"""

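import pandas as pd

# `banking` is assumed to be a DataFrame already loaded with the columns described above

# Find rows where acct_cur is 'euro'
acct_eu = banking['acct_cur'] == 'euro'

# Convert acct_amount from euros to dollars (1.1 is an approximate, fixed euro-to-dollar rate)
banking.loc[acct_eu, 'acct_amount'] *= 1.1

# Unify the acct_cur column by changing 'euro' values to 'dollar'
banking.loc[acct_eu, 'acct_cur'] = 'dollar'

# Assert that only the dollar currency remains
# (check every row; asserting on the array returned by .unique() raises an error if several values remain)
assert (banking['acct_cur'] == 'dollar').all()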
```
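When more than two currencies appear, the same idea generalizes by mapping each currency to a conversion rate. The sketch below builds a small hypothetical stand-in for the `banking` DataFrame so that it runs on its own. The rates in `rates_to_dollar` are made-up placeholders, not real exchange rates; in practice you would use the rate that applied when each amount was recorded.

```python
import pandas as pd

# Small hypothetical stand-in for the banking dataset
banking = pd.DataFrame({
    "acct_amount": [100.0, 200.0, 50.0],
    "acct_cur": ["dollar", "euro", "pound"],
})

# Placeholder conversion rates (assumed for illustration only)
rates_to_dollar = {"dollar": 1.0, "euro": 1.1, "pound": 1.3}

# Look up each row's rate; a currency missing from the dict becomes NaN
rate = banking["acct_cur"].map(rates_to_dollar)
assert rate.notna().all(), "found a currency without a conversion rate"

# Convert every amount to dollars, then relabel the currency column
banking["acct_amount"] = banking["acct_amount"] * rate
banking["acct_cur"] = "dollar"

# Every account is now in dollars
assert (banking["acct_cur"] == "dollar").all()
```

Failing loudly on an unknown currency is deliberate: silently leaving those rows unconverted would reintroduce the very inconsistency this step is meant to remove.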

<br>

### Treating Date Data: (Format Uniformity)

```python
import pandas as pd

# Convert account_opened to datetime.
# errors='coerce' returns NaT (a missing datetime) for values that cannot be parsed,
# instead of raising an error.
# (Older pandas versions also took infer_datetime_format=True; since pandas 2.0,
# inferring a consistent format is the default and that argument is deprecated.)
banking['account_opened'] = pd.to_datetime(banking['account_opened'], errors='coerce')

# `.dt.strftime(format)` turns each datetime in a column into a string in the format of your choice
# Examples: "%d-%m-%Y" → "25-12-2019", "%m-%d-%Y" → "12-25-2019", "%B %d, %Y" → "December 25, 2019"
# Get the year each account was opened
banking['acct_year'] = banking['account_opened'].dt.strftime("%Y")
```
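When a column mixes a few known formats, such as the two date formats listed at the top of this section, one option is to parse each format explicitly with `format=` and `errors='coerce'`, then fill the gaps left by one attempt with the results of the other. The `dates` Series below is hypothetical, and this is one approach among several (pandas 2.x also accepts `format='mixed'`).

```python
import pandas as pd

# Hypothetical column mixing "26-11-2019" and "26, November, 2019" styles
dates = pd.Series(["26-11-2019", "26, November, 2019", "01-02-2020"])

# Try each known format; rows that don't match become NaT instead of raising
numeric = pd.to_datetime(dates, format="%d-%m-%Y", errors="coerce")
written = pd.to_datetime(dates, format="%d, %B, %Y", errors="coerce")

# Keep whichever parse succeeded for each row
parsed = numeric.fillna(written)

# Rows matching no known format stay NaT and can be reviewed separately
unparsed = dates[parsed.isna()]

# Render every date in one consistent string format
unified = parsed.dt.strftime("%Y-%m-%d")
```

Listing the formats explicitly makes the day-first assumption visible in the code, rather than leaving it to pandas' inference.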

<br>

### Treating Ambiguous Date Data

Is `"2019-03-08"` the 8th of March (`YYYY-MM-DD`) or the 3rd of August (`YYYY-DD-MM`)?

**Possible Solutions:**

-	Convert ambiguous values to missing (`NA`) and handle them like any other missing data

-	Infer the format from what you know about the data source

-	Infer the format from the previous and subsequent values in the DataFrame
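The first option can be sketched by parsing each string under both readings, year-month-day and year-day-month, and flagging a value as ambiguous when both readings are valid but give different dates. The `dates` Series is hypothetical, and the sketch assumes these two orders are the only ones that can occur and that `YYYY-MM-DD` is the preferred reading.

```python
import pandas as pd

# Hypothetical strings that could be YYYY-MM-DD or YYYY-DD-MM
dates = pd.Series(["2019-03-08", "2019-03-25", "2019-25-03", "2019-07-07"])

# Parse under both readings; impossible dates (e.g. month 25) become NaT
as_ymd = pd.to_datetime(dates, format="%Y-%m-%d", errors="coerce")
as_ydm = pd.to_datetime(dates, format="%Y-%d-%m", errors="coerce")

# Ambiguous: both readings are valid dates, but they are different dates
ambiguous = as_ymd.notna() & as_ydm.notna() & (as_ymd != as_ydm)

# Where only one reading is valid, keep it; then set ambiguous values to NaT
cleaned = as_ymd.fillna(as_ydm).mask(ambiguous)
```

Note that `"2019-07-07"` is not flagged: both readings agree, so the order does not matter.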