Change FullName to PersonFullName #740

Merged
merged 6 commits into from Mar 29, 2021
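
Since this rename is listed as a breaking change, callers that still pass the old strings will need updating. The following is a minimal sketch of one way to do that, not something this PR ships: `RENAMED_LOGICAL_TYPES` and `migrate_logical_types` are hypothetical names introduced here for illustration, and the mapping covers only the camel-case and snake-case spellings that appear in this diff.

```python
# Hypothetical helper, not part of Woodwork: maps the pre-rename logical type
# strings to the names introduced by this PR.
RENAMED_LOGICAL_TYPES = {
    "FullName": "PersonFullName",      # camel-case class name
    "full_name": "person_full_name",   # snake-case alias
}


def migrate_logical_types(logical_types):
    """Return a copy of a column -> logical type mapping with old names replaced.

    Only string values are rewritten. Column names (the keys) and non-string
    values such as LogicalType classes are left untouched.
    """
    migrated = {}
    for column, ltype in logical_types.items():
        if isinstance(ltype, str):
            ltype = RENAMED_LOGICAL_TYPES.get(ltype, ltype)
        migrated[column] = ltype
    return migrated


old = {"full_name": "FullName", "country": "Categorical"}
print(migrate_logical_types(old))
# {'full_name': 'PersonFullName', 'country': 'Categorical'}
```

Note that the helper rewrites values only, so a column that happens to be called `full_name` (as in the test fixtures below) keeps its name. The resulting dictionary could then be passed as `logical_types` to `df.ww.init` or `df.ww.set_types`, as the documentation changes in this PR do.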
6 changes: 3 additions & 3 deletions README.md
@@ -50,7 +50,7 @@ df = pd.read_csv("https://api.featurelabs.com/datasets/online-retail-logs-2018-0
df.ww.init(name='retail', make_index=True, index='order_product_id')
df.ww.set_types(logical_types={
'quantity': 'Integer',
'customer_name': 'FullName',
'customer_name': 'PersonFullName',
'country': 'Categorical',
'order_id': 'Categorical'
})
@@ -59,15 +59,15 @@ df.ww

```
Physical Type Logical Type Semantic Tag(s)
Column
Column
order_product_id Int64 Integer ['index']
order_id category Categorical ['category']
product_id category Categorical ['category']
description string NaturalLanguage []
quantity Int64 Integer ['numeric']
order_date datetime64[ns] Datetime []
unit_price float64 Double ['numeric']
customer_name string FullName []
customer_name string PersonFullName []
country category Categorical ['category']
total float64 Double ['numeric']
cancelled boolean Boolean []
2 changes: 1 addition & 1 deletion docs/source/api_reference.rst
@@ -118,12 +118,12 @@ Logical Types
Double
EmailAddress
Filepath
FullName
Integer
IPAddress
LatLong
NaturalLanguage
Ordinal
PersonFullName
PhoneNumber
PostalCode
SubRegionCode
16 changes: 8 additions & 8 deletions docs/source/guides/understanding_types_and_tags.ipynb
@@ -171,7 +171,7 @@
"logical_types = {\n",
" 'integers': 'Integer',\n",
" 'bools': 'Boolean',\n",
" 'names': 'FullName'\n",
" 'names': 'PersonFullName'\n",
"}\n",
"\n",
"df.ww.init(logical_types=logical_types)\n",
@@ -192,7 +192,7 @@
"outputs": [],
"source": [
"logical_types = {\n",
" 'names': 'FullName'\n",
" 'names': 'PersonFullName'\n",
"}\n",
"df.ww.init(logical_types=logical_types)\n",
"df.ww"
@@ -202,7 +202,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"With that input, you get the same results. Woodwork used the `FullName` logical type you assigned to the `names` column and then correctly inferred the logical types for the `integers` and `bools` columns.\n",
"With that input, you get the same results. Woodwork used the `PersonFullName` logical type you assigned to the `names` column and then correctly inferred the logical types for the `integers` and `bools` columns.\n",
"\n",
"Next, look at what happens if we do not specify any logical types."
]
@@ -221,9 +221,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case, Woodwork correctly inferred type for the `integers` and `bools` columns, but failed to recognize the `names` column should have a logical type of `FullName`. In situations like this, Woodwork provides users the ability to change the logical type.\n",
"In this case, Woodwork correctly inferred type for the `integers` and `bools` columns, but failed to recognize the `names` column should have a logical type of `PersonFullName`. In situations like this, Woodwork provides users the ability to change the logical type.\n",
"\n",
"Update the logical type of the `names` column to be `FullName`."
"Update the logical type of the `names` column to be `PersonFullName`."
]
},
{
@@ -232,17 +232,17 @@
"metadata": {},
"outputs": [],
"source": [
"df.ww.set_types(logical_types={'names': 'FullName'})\n",
"df.ww.set_types(logical_types={'names': 'PersonFullName'})\n",
"df.ww"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you look carefully at the output, you can see that several things happened to the `names` column. First, the correct `FullName` logical type has been applied. Second, the physical type of the column has changed from `category` to `string` to match the standard physical type for the `FullName` logical type. Finally, the standard tag of `category` that was previously set for the `names` column has been removed because it no longer applies.\n",
"If you look carefully at the output, you can see that several things happened to the `names` column. First, the correct `PersonFullName` logical type has been applied. Second, the physical type of the column has changed from `category` to `string` to match the standard physical type for the `PersonFullName` logical type. Finally, the standard tag of `category` that was previously set for the `names` column has been removed because it no longer applies.\n",
"\n",
"When setting the LogicalType for a column, the type can be specified by passing a string representing the camel-case name of the LogicalType class as you have done in previous examples. Alternatively, you can pass the class directly instead of a string or the snake-case name of the string. All of these would be valid values to use for setting the FullName Logical type: `FullName`, `\"FullName\"` or `\"full_name\"`. \n",
"When setting the LogicalType for a column, the type can be specified by passing a string representing the camel-case name of the LogicalType class as you have done in previous examples. Alternatively, you can pass the class directly instead of a string or the snake-case name of the string. All of these would be valid values to use for setting the PersonFullName Logical type: `PersonFullName`, `\"PersonFullName\"` or `\"person_full_name\"`. \n",
"\n",
"Note—in order to use the class name, first you have to import the class."
]
4 changes: 2 additions & 2 deletions docs/source/guides/using_woodwork_with_dask_and_koalas.ipynb
@@ -86,7 +86,7 @@
" 'quantity': 'Integer',\n",
" 'order_date': 'Datetime',\n",
" 'unit_price': 'Double',\n",
" 'customer_name': 'FullName',\n",
" 'customer_name': 'PersonFullName',\n",
" 'country': 'Categorical',\n",
" 'total': 'Double',\n",
" 'cancelled': 'Boolean',\n",
@@ -210,7 +210,7 @@
" 'quantity': 'Integer',\n",
" 'order_date': 'Datetime',\n",
" 'unit_price': 'Double',\n",
" 'customer_name': 'FullName',\n",
" 'customer_name': 'PersonFullName',\n",
" 'country': 'Categorical',\n",
" 'total': 'Double',\n",
" 'cancelled': 'Boolean',\n",
4 changes: 3 additions & 1 deletion docs/source/release_notes.rst
@@ -6,15 +6,17 @@ Release Notes
* Enhancements
* Fixes
* Changes
* Rename ``FullName`` logical type to ``PersonFullName`` (:pr:`740`)
* Rename ``ZIPCode`` logical type to ``PostalCode`` (:pr:`741`)
* Documentation Changes
* Testing Changes

Thanks to the following people for contributing to this release:
:user:`thehomebrewnerd`
:user:`jeff-hernandez`, :user:`thehomebrewnerd`

**Breaking Changes**
* The ``ZIPCode`` logical type has been renamed to ``PostalCode``
* The ``FullName`` logical type has been renamed to ``PersonFullName``

**v0.1.0 March 22, 2021**
* Enhancements
6 changes: 3 additions & 3 deletions docs/source/start.ipynb
@@ -106,7 +106,7 @@
"metadata": {},
"source": [
"## Updating Logical Types\n",
"If the initial inference was not to our liking, the logical type can be changed to a more appropriate value. Let's change some of the columns to a different logical type to illustrate this process. In this case, set the logical type for the `order_product_id` and `country` columns to be `Categorical` and set `customer_name` to have a logical type of `FullName`."
"If the initial inference was not to our liking, the logical type can be changed to a more appropriate value. Let's change some of the columns to a different logical type to illustrate this process. In this case, set the logical type for the `order_product_id` and `country` columns to be `Categorical` and set `customer_name` to have a logical type of `PersonFullName`."
]
},
{
@@ -116,7 +116,7 @@
"outputs": [],
"source": [
"df.ww.set_types(logical_types={\n",
" 'customer_name': 'FullName',\n",
" 'customer_name': 'PersonFullName',\n",
" 'country': 'Categorical',\n",
" 'order_id': 'Categorical'\n",
"})\n",
@@ -492,7 +492,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
"version": "3.8.8"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion woodwork/logical_types.py
@@ -156,7 +156,7 @@ class Filepath(LogicalType):
primary_dtype = 'string'


class FullName(LogicalType):
class PersonFullName(LogicalType):
"""Represents Logical Types that may contain first, middle and last names,
including honorifics and suffixes.

4 changes: 2 additions & 2 deletions woodwork/tests/accessor/test_statistics.py
@@ -10,12 +10,12 @@
Double,
EmailAddress,
Filepath,
FullName,
Integer,
IPAddress,
LatLong,
NaturalLanguage,
Ordinal,
PersonFullName,
PhoneNumber,
PostalCode,
SubRegionCode,
@@ -229,7 +229,7 @@ def test_describe_accessor_method(describe_df):
formatted_datetime_ltypes = [Datetime(datetime_format='%Y~%m~%d')]
timedelta_ltypes = [Timedelta]
numeric_ltypes = [Double, Integer]
natural_language_ltypes = [EmailAddress, Filepath, FullName, IPAddress,
natural_language_ltypes = [EmailAddress, Filepath, PersonFullName, IPAddress,
PhoneNumber, URL]
latlong_ltypes = [LatLong]

46 changes: 23 additions & 23 deletions woodwork/tests/accessor/test_table_accessor.py
@@ -23,12 +23,12 @@
Double,
EmailAddress,
Filepath,
FullName,
Integer,
IPAddress,
LatLong,
NaturalLanguage,
Ordinal,
PersonFullName,
PhoneNumber,
PostalCode,
SubRegionCode
@@ -667,7 +667,7 @@ def test_sets_string_dtype_on_init():

logical_types = [
Filepath,
FullName,
PersonFullName,
IPAddress,
NaturalLanguage,
PhoneNumber,
@@ -1085,7 +1085,7 @@ def test_accessor_with_falsy_column_names(falsy_names_df):

def test_get_invalid_schema_message(sample_df):
schema_df = sample_df.copy()
schema_df.ww.init(name='test_schema', index='id', logical_types={'id': 'Double', 'full_name': 'FullName'})
schema_df.ww.init(name='test_schema', index='id', logical_types={'id': 'Double', 'full_name': 'PersonFullName'})
schema = schema_df.ww.schema

assert _get_invalid_schema_message(schema_df, schema) is None
@@ -1131,7 +1131,7 @@ def test_get_invalid_schema_message_index_checks(sample_df):
pytest.xfail('Index validation not performed for Dask or Koalas DataFrames')

schema_df = sample_df.copy()
schema_df.ww.init(name='test_schema', index='id', logical_types={'id': 'Double', 'full_name': 'FullName'})
schema_df.ww.init(name='test_schema', index='id', logical_types={'id': 'Double', 'full_name': 'PersonFullName'})
schema = schema_df.ww.schema

different_underlying_index_df = schema_df.copy()
@@ -1274,7 +1274,7 @@ def test_get_subset_df_with_schema(sample_df):
schema_df.ww.init(time_index='signup_date',
index='id',
name='df_name',
logical_types={'full_name': FullName,
logical_types={'full_name': PersonFullName,
'email': EmailAddress,
'phone_number': PhoneNumber,
'age': Double,
@@ -1323,7 +1323,7 @@ def test_get_subset_df_use_dataframe_order(sample_df):

def test_select_ltypes_no_match_and_all(sample_df):
schema_df = sample_df.copy()
schema_df.ww.init(logical_types={'full_name': FullName,
schema_df.ww.init(logical_types={'full_name': PersonFullName,
'email': EmailAddress,
'phone_number': PhoneNumber,
'age': Double,
@@ -1342,50 +1342,50 @@ def test_select_ltypes_no_match_and_all(sample_df):

def test_select_ltypes_strings(sample_df):
schema_df = sample_df.copy()
schema_df.ww.init(logical_types={'full_name': FullName,
schema_df.ww.init(logical_types={'full_name': PersonFullName,
'email': EmailAddress,
'phone_number': PhoneNumber,
'age': Double,
'signup_date': Datetime,
})

df_multiple_ltypes = schema_df.ww.select(['FullName', 'email_address', 'double', 'Boolean', 'datetime'])
df_multiple_ltypes = schema_df.ww.select(['PersonFullName', 'email_address', 'double', 'Boolean', 'datetime'])
assert len(df_multiple_ltypes.columns) == 5
assert 'phone_number' not in df_multiple_ltypes.columns
assert 'id' not in df_multiple_ltypes.columns

df_single_ltype = schema_df.ww.select('full_name')
df_single_ltype = schema_df.ww.select('person_full_name')
assert set(df_single_ltype.columns) == {'full_name'}


def test_select_ltypes_objects(sample_df):
schema_df = sample_df.copy()
schema_df.ww.init(logical_types={'full_name': FullName,
schema_df.ww.init(logical_types={'full_name': PersonFullName,
'email': EmailAddress,
'phone_number': PhoneNumber,
'age': Double,
'signup_date': Datetime,
})

df_multiple_ltypes = schema_df.ww.select([FullName, EmailAddress, Double, Boolean, Datetime])
df_multiple_ltypes = schema_df.ww.select([PersonFullName, EmailAddress, Double, Boolean, Datetime])
assert len(df_multiple_ltypes.columns) == 5
assert 'phone_number' not in df_multiple_ltypes.columns
assert 'id' not in df_multiple_ltypes.columns

df_single_ltype = schema_df.ww.select(FullName)
df_single_ltype = schema_df.ww.select(PersonFullName)
assert len(df_single_ltype.columns) == 1


def test_select_ltypes_mixed(sample_df):
schema_df = sample_df.copy()
schema_df.ww.init(logical_types={'full_name': FullName,
schema_df.ww.init(logical_types={'full_name': PersonFullName,
'email': EmailAddress,
'phone_number': PhoneNumber,
'age': Double,
'signup_date': Datetime,
})

df_mixed_ltypes = schema_df.ww.select(['FullName', 'email_address', Double])
df_mixed_ltypes = schema_df.ww.select(['PersonFullName', 'email_address', Double])
assert len(df_mixed_ltypes.columns) == 3
assert 'phone_number' not in df_mixed_ltypes.columns

@@ -1395,7 +1395,7 @@ def test_select_ltypes_table(sample_df):
schema_df.ww.init(name='testing',
index='id',
time_index='signup_date',
logical_types={'full_name': FullName,
logical_types={'full_name': PersonFullName,
'email': EmailAddress,
'phone_number': PhoneNumber,
'age': Double,
@@ -1413,7 +1413,7 @@ def test_select_ltypes_table(sample_df):
assert df_with_indices.ww.index == 'id'
assert df_with_indices.ww.time_index == 'signup_date'

df_values = schema_df.ww.select(['FullName'])
df_values = schema_df.ww.select(['PersonFullName'])
assert df_values.ww.name == schema_df.ww.name
assert df_values.ww.columns['full_name'] == schema_df.ww.columns['full_name']

@@ -1465,7 +1465,7 @@ def test_select_single_inputs(sample_df):
index='id',
name='df_name',
logical_types={
'full_name': FullName,
'full_name': PersonFullName,
'email': EmailAddress,
'phone_number': PhoneNumber,
'signup_date': Datetime(datetime_format='%Y-%m-%d')
@@ -1476,7 +1476,7 @@ def test_select_single_inputs(sample_df):
'signup_date': 'date_of_birth'
})

df_ltype_string = schema_df.ww.select('full_name')
df_ltype_string = schema_df.ww.select('person_full_name')
assert len(df_ltype_string.columns) == 1
assert 'full_name' in df_ltype_string.columns

@@ -1500,7 +1500,7 @@ def test_select_list_inputs(sample_df):
index='id',
name='df_name',
logical_types={
'full_name': FullName,
'full_name': PersonFullName,
'email': EmailAddress,
'phone_number': PhoneNumber,
'signup_date': Datetime(datetime_format='%Y-%m-%d'),
@@ -1513,14 +1513,14 @@ def test_select_list_inputs(sample_df):
'is_registered': 'category'
})

df_just_strings = schema_df.ww.select(['FullName', 'index', 'tag2', 'boolean'])
df_just_strings = schema_df.ww.select(['PersonFullName', 'index', 'tag2', 'boolean'])
assert len(df_just_strings.columns) == 4
assert 'id' in df_just_strings.columns
assert 'full_name' in df_just_strings.columns
assert 'email' in df_just_strings.columns
assert 'is_registered' in df_just_strings.columns

df_mixed_selectors = schema_df.ww.select([FullName, 'index', 'time_index', Integer])
df_mixed_selectors = schema_df.ww.select([PersonFullName, 'index', 'time_index', Integer])
assert len(df_mixed_selectors.columns) == 4
assert 'id' in df_mixed_selectors.columns
assert 'full_name' in df_mixed_selectors.columns
@@ -1540,7 +1540,7 @@ def test_select_semantic_tags_no_match(sample_df):
index='id',
name='df_name',
logical_types={
'full_name': FullName,
'full_name': PersonFullName,
'email': EmailAddress,
'phone_number': PhoneNumber,
'signup_date': Datetime(datetime_format='%Y-%m-%d'),
@@ -1566,7 +1566,7 @@ def test_select_repetitive(sample_df):
index='id',
name='df_name',
logical_types={
'full_name': FullName,
'full_name': PersonFullName,
'email': EmailAddress,
'phone_number': PhoneNumber,
'signup_date': Datetime(datetime_format='%Y-%m-%d'),