Country name cleaning failed example #939

Open
@yibenhuang

Describe the bug
Hi, I just found that the country name "Virgin Islands (British)" fails to be cleaned to the correct name: clean_country returns NaN for it.

To Reproduce

import pandas as pd
from dataprep.clean import clean_country

df = pd.DataFrame({"country": ["Virgin Islands (British)", "Virgin Islands (U.S.)"]})
clean_country(df, column="country", output_format="name")

Output:

                    country                 country_clean
0  Virgin Islands (British)                           NaN
1     Virgin Islands (U.S.)  United States Virgin Islands

Expected behavior
Both names should be cleaned. The project this is based on, country_converter, handles them as shown below.

import country_converter as coco

names = ["Virgin Islands (British)", "Virgin Islands (U.S.)"]
cc = coco.CountryConverter()

cc.convert(names=names, to="name_short")
# Output: ['British Virgin Islands', 'United States Virgin Islands']
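
Until this is fixed, a possible interim workaround is to fill the NaN results from a second converter. The following is only a minimal sketch, not part of dataprep or country_converter: `is_missing` and `fill_missing_with_fallback` are hypothetical helper names, and the demo uses a stand-in dictionary lookup in place of a real converter so that it runs without either library installed.

```python
import math
from typing import Callable, List, Optional, Sequence


def is_missing(value: object) -> bool:
    """Return True for None or a float NaN (how a failed clean shows up above)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_missing_with_fallback(
    names: Sequence[str],
    cleaned: Sequence[object],
    fallback: Callable[[str], Optional[str]],
) -> List[object]:
    """Keep each cleaned value; where it is missing, ask `fallback` with the raw name."""
    result = []
    for raw, value in zip(names, cleaned):
        result.append(fallback(raw) if is_missing(value) else value)
    return result


# Demo data mirroring the output reported in this issue.
demo_names = ["Virgin Islands (British)", "Virgin Islands (U.S.)"]
demo_cleaned = [float("nan"), "United States Virgin Islands"]

# Stand-in fallback (assumption): a plain dict lookup instead of a real converter.
demo_fallback = {"Virgin Islands (British)": "British Virgin Islands"}.get

demo_result = fill_missing_with_fallback(demo_names, demo_cleaned, demo_fallback)
print(demo_result)
```

With both libraries installed, `cleaned` could presumably be taken from the `country_clean` column (for example via `.tolist()`), and `fallback` could wrap `cc.convert(names=name, to="name_short")` from the snippet above. I have not verified that combination against dataprep 0.4.5.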

Desktop (please complete the following information):

  • OS: macOS
  • Browser: Chrome
  • Platform: Jupyter Notebook
  • Platform Version: 6.4.12
  • Python Version: 3.10.5
  • Dataprep Version: 0.4.5

Labels

type: bug (Something isn't working)
