LatLong type #57

Seth-Rothschild · 2018-01-03T15:50:28Z

The issue in testing comes from mock_ds.py where the mock retail entityset is made with es.entity_from_csv(entity, (line 292). This makes the latlong type in that entityset a string rather than a tuple. The options as I understand them are:

Modify Latitude and Longitude to check if the latlong is a string
Modify entity_from_csv to convert certain strings to tuples
Change the test to do the pandas _from_csv, modify the dataframe and then make entity_from_dataframe.
Leave Latitude and Longitude with no real tests for now.

My gut is to go with 3. Do you have a preference @kmax12?

codecov-io · 2018-01-03T15:53:13Z

Codecov Report

Merging #57 into master will decrease coverage by 0.56%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           master     #57      +/-   ##
=========================================
- Coverage   87.37%   86.8%   -0.57%     
=========================================
  Files          74      74              
  Lines        6993    7041      +48     
=========================================
+ Hits         6110    6112       +2     
- Misses        883     929      +46

Impacted Files	Coverage Δ
tests/testing_utils/mock_ds.py	`85.95% <0%> (-8.65%)`	⬇️
entityset/entityset.py	`83.6% <0%> (-6.42%)`	⬇️
tests/entityset_tests/test_es.py	`92.85% <0%> (-3.51%)`	⬇️
utils/gen_utils.py	`64.44% <0%> (-2.23%)`	⬇️
tests/entityset_tests/test_pandas_es.py	`99.73% <0%> (-0.01%)`	⬇️
synthesis/deep_feature_synthesis.py	`92.11% <0%> (ø)`	⬆️
variable_types/variable.py	`78.82% <0%> (+0.25%)`	⬆️
primitives/transform_primitive.py	`97.75% <0%> (+0.28%)`	⬆️
.../feature_function_tests/test_transform_features.py	`86.12% <0%> (+0.67%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cbc82d5...737dae8. Read the comment docs.

kmax12 · 2018-01-03T16:10:24Z

@Seth-Rothschild I agree. Let's go with 3

Seth-Rothschild · 2018-01-03T16:29:11Z

Changed entity_from_csv to entity_from_dataframe in mock_ds.py which lets us put an honest tuple into the make_ecommerce_entityset function. This makes the test pass for Latitude and Longitude but induces 4 new entityset type errors. Three seem to deal with compression of the entityset and the fourth is unicode. They are:

test_glob_entityset in test_es.py
test_gzip_entityset in test_es.py
test_gzip_glob_entityset in test_es.py and
test_combine_variables in test_pandas_es.py

@rwedge any idea what's causing these?

rwedge · 2018-01-03T19:35:12Z

featuretools/tests/testing_utils/mock_ds.py

-                           time_index=ti_name,
-                           secondary_time_index=secondary)
+
+        df = pd.read_csv(filenames[entity])


adding encoding='utf-8' to this read_csv call should fix problem 4.

rwedge · 2018-01-03T22:27:23Z

Errors 1-3 stem from tests for how entity_from_csv handles csv paths with glob patterns, csvs compressed with gzip, and a combination of the two situations. Changing the testing entityset to be loaded in using pandas.read_csv instead of entity_from_csv has caused these tests to error.

Taking a step back, if we don't want to support entity_from_csv working with all variable types we could remove/deprecate it and have entity_from_dataframe be the standard way to create entities. These test cases would no longer be necessary.

Seth-Rothschild · 2018-01-12T16:30:48Z

There are three more explicit uses of entityset.from_csv in the codebase.

Two of the examples in the docs (Creating entities and the retail data demo)
The API reference
The function definition itself

As a way forward I'd propose not testing from_csv, replacing the uses in the docs and putting a warning that it will be depreciated in the function call.

The two other things we might want to do with this are to add to the docs that a latlong type has to be explicitly constructed in the dataframe and add a primitive like Haversine to take advantage of the new type. Does that sound like a good plan?

kmax12 · 2018-01-12T17:52:36Z

sounds like a good plan to me!

Seth-Rothschild · 2018-01-12T18:43:07Z

Ok - this should be good to go after review. Overall changes to master:

Add two columns to mock_ds for testing: latlong and latlong2
Add Latitude variable type and description
Add Latitude, Longitude and Haversine primitives and tests
Add warning to entity_from_csv, remove tests gzip, glob, glob_gzip and test_time_index_components
Remove entity_from_csv from docs notebook and example

kmax12 · 2018-01-17T20:22:23Z

@Seth-Rothschild what do you think about adding these as default primitives in DFS? And if we do, we should probably add NumCharacters and NumWords as well

Seth-Rothschild · 2018-01-17T20:35:27Z

@kmax12 I think NumCharacters and NumWords are probably worth adding as defaults but Latitude, Longitude and Haversine might not be.

A reason to add them would be that latlongs don't show up in a feature matrix, because they're tuples. That being said, by the time a user has made a LatLong variable (i.e. combined the two columns and marked the variable type as LatLong) they're probably at a stage in processing where they'd want to actively choose the trans_primitives and they'd already have a choice of primitives in mind.

kmax12 · 2018-01-17T20:38:46Z

got it. what if we just add the 2 text primitives and Haversine? I think if someone provides two latlong's it useful to automatically return Haversine

Seth-Rothschild · 2018-01-17T21:12:21Z

That sounds like a reasonable plan. Should probably implement this after merging so that I can do both at once, since NumWords and NumCharacters came after this branch.

Seth-Rothschild · 2018-01-18T00:04:54Z

With the merges there's the new error which stems from latlong1 no longer being an array of tuples in python3. Not sure which change has this breaking, but here's the output in python3 versus python2.

python3

(Pdb) latlong1
array([<map object at 0x10fcf6860>, <map object at 0x10fcf6860>,
       <map object at 0x10fcf6860>, <map object at 0x10fcf6860>,
       <map object at 0x10fcf6860>, <map object at 0x10fcf6860>,
       <map object at 0x10fcf6860>, <map object at 0x10fcf6860>,
       <map object at 0x10fcf6860>, <map object at 0x10fcf6860>,
       <map object at 0x10fcf6860>, <map object at 0x10fcf6860>,
       <map object at 0x10fcf6860>, <map object at 0x10fcf6860>,
       <map object at 0x10fcf6860>, <map object at 0x10fcf6860>,
       <map object at 0x10fcf6860>], dtype=object)

python2

(Pdb) latlong1
array([(0.0, 0.0), (5.0, 2.0), (10.0, 4.0), (15.0, 6.0), (20.0, 8.0),
       (0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (0.0, 0.0),
       (0.0, 0.0), (5.0, 2.0), (0.0, 0.0), (7.0, 3.0), (14.0, 6.0)], dtype=object)

kmax12 · 2018-01-18T00:19:02Z

in python 3 map is an iterator. look for usages of map(...) and do list(map(...)).

kmax12 · 2018-01-18T15:07:21Z

featuretools/tests/testing_utils/mock_ds.py

-            df['latlong'] = map(lambda x: latlong_unstringify(x), df['latlong'])
-            df['latlong2'] = map(lambda x: latlong_unstringify(x), df['latlong2'])
+            df['latlong'] = list(map(
+                lambda x: latlong_unstringify(x), df['latlong']))


i think this line can actually be

df['latlong'] = df['latlong'].apply(latlong_unstringify)

kmax12 · 2018-01-18T18:41:04Z

can we add the new primitives and variable types to the api reference as well as remove entity_from_csv?

kmax12 · 2018-01-18T19:37:53Z

This looks good to merge (once Circle CI passes). @Seth-Rothschild can you put up a PR for removing entity_from_csv when you get a chance?

Seth-Rothschild added 8 commits December 13, 2017 11:49

Add LatLong variable type, Lat and Long trans prims

cef72b7

Modify get_function

3cbfa31

Add latlong to test es

645a7eb

latlong_string workaround

42e3379

linting

3ff65c2

Write primitives with latlong as tuple

8a7bbcd

es.entity_from_csv -> pd.read_csv

f9a2819

Comment out breaking tests. 3 glob/gzip 1 unicode to fix

90238bf

rwedge reviewed Jan 3, 2018

View reviewed changes

Add encoding='utf-8'

9810bd7

Seth-Rothschild added 3 commits January 12, 2018 13:03

Add warnings about from_csv, add desc to latlong

5067022

Add haversine primitive and test

bee50ed

Remove entity_from_csv from notebook

a290740

Seth-Rothschild changed the title ~~(WIP) LatLong type~~ LatLong type Jan 12, 2018

Seth-Rothschild added 4 commits January 17, 2018 17:46

Merge branch 'master' into latlong-type

3b7707a

Add Haversine, NumWords and NumCharacters to default primitives

c108211

lint

3091ba7

isort

4654c70

list(map() in mock_ds

856d8e8

kmax12 reviewed Jan 18, 2018

View reviewed changes

Seth-Rothschild added 2 commits January 18, 2018 10:31

map -> apply

b6a5069

Merge branch 'master' into latlong-type

051d27a

Seth-Rothschild added 2 commits January 18, 2018 14:14

Update API reference

1dd7501

Add LatLong, Drop from_csv to API ref

737dae8

kmax12 merged commit 75d05ec into master Jan 18, 2018

Seth-Rothschild mentioned this pull request Jan 18, 2018

Remove entity_from_csv, tests, and lint #71

Merged

rwedge mentioned this pull request Jan 18, 2018

Release v0.1.17 #72

Merged

Seth-Rothschild deleted the latlong-type branch January 18, 2018 20:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LatLong type #57

LatLong type #57

Seth-Rothschild commented Jan 3, 2018

codecov-io commented Jan 3, 2018 •

edited

kmax12 commented Jan 3, 2018

Seth-Rothschild commented Jan 3, 2018

rwedge Jan 3, 2018

rwedge commented Jan 3, 2018

Seth-Rothschild commented Jan 12, 2018

kmax12 commented Jan 12, 2018

Seth-Rothschild commented Jan 12, 2018

kmax12 commented Jan 17, 2018

Seth-Rothschild commented Jan 17, 2018

kmax12 commented Jan 17, 2018 •

edited

Seth-Rothschild commented Jan 17, 2018

Seth-Rothschild commented Jan 18, 2018

kmax12 commented Jan 18, 2018 •

edited

kmax12 Jan 18, 2018

kmax12 commented Jan 18, 2018 •

edited

kmax12 commented Jan 18, 2018

LatLong type #57

LatLong type #57

Conversation

Seth-Rothschild commented Jan 3, 2018

codecov-io commented Jan 3, 2018 • edited

Codecov Report

kmax12 commented Jan 3, 2018

Seth-Rothschild commented Jan 3, 2018

rwedge Jan 3, 2018

Choose a reason for hiding this comment

rwedge commented Jan 3, 2018

Seth-Rothschild commented Jan 12, 2018

kmax12 commented Jan 12, 2018

Seth-Rothschild commented Jan 12, 2018

kmax12 commented Jan 17, 2018

Seth-Rothschild commented Jan 17, 2018

kmax12 commented Jan 17, 2018 • edited

Seth-Rothschild commented Jan 17, 2018

Seth-Rothschild commented Jan 18, 2018

kmax12 commented Jan 18, 2018 • edited

kmax12 Jan 18, 2018

Choose a reason for hiding this comment

kmax12 commented Jan 18, 2018 • edited

kmax12 commented Jan 18, 2018

codecov-io commented Jan 3, 2018 •

edited

kmax12 commented Jan 17, 2018 •

edited

kmax12 commented Jan 18, 2018 •

edited

kmax12 commented Jan 18, 2018 •

edited