-
Notifications
You must be signed in to change notification settings - Fork 897
LatLong type #57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LatLong type #57
Conversation
Codecov Report
@@ Coverage Diff @@
## master #57 +/- ##
=========================================
- Coverage 87.37% 86.8% -0.57%
=========================================
Files 74 74
Lines 6993 7041 +48
=========================================
+ Hits 6110 6112 +2
- Misses 883 929 +46
Continue to review full report at Codecov.
|
@Seth-Rothschild I agree. Let's go with 3 |
Changed
@rwedge any idea what's causing these? |
time_index=ti_name, | ||
secondary_time_index=secondary) | ||
|
||
df = pd.read_csv(filenames[entity]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding encoding='utf-8'
to this read_csv
call should fix problem 4.
Errors 1-3 stem from tests for how Taking a step back, if we don't want to support |
There are three more explicit uses of
As a way forward I'd propose not testing The two other things we might want to do with this are to add to the docs that a latlong type has to be explicitly constructed in the dataframe and add a primitive like Haversine to take advantage of the new type. Does that sound like a good plan? |
sounds like a good plan to me! |
Ok - this should be good to go after review. Overall changes to master:
|
@Seth-Rothschild what do you think about adding these as default primitives in DFS? And if we do, we should probably add |
@kmax12 I think A reason to add them would be that latlongs don't show up in a feature matrix, because they're tuples. That being said, by the time a user has made a |
got it. what if we just add the 2 text primitives and |
That sounds like a reasonable plan. Should probably implement this after merging so that I can do both at once, since |
With the merges there's the new error which stems from latlong1 no longer being an array of tuples in python3. Not sure which change has this breaking, but here's the output in python3 versus python2. python3
python2
|
in python 3 map is an iterator. look for usages of |
df['latlong'] = map(lambda x: latlong_unstringify(x), df['latlong']) | ||
df['latlong2'] = map(lambda x: latlong_unstringify(x), df['latlong2']) | ||
df['latlong'] = list(map( | ||
lambda x: latlong_unstringify(x), df['latlong'])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this line can actually be
df['latlong'] = df['latlong'].apply(latlong_unstringify)
can we add the new primitives and variable types to the api reference as well as remove |
This looks good to merge (once Circle CI passes). @Seth-Rothschild can you put up a PR for removing entity_from_csv when you get a chance? |
The issue in testing comes from
mock_ds.py
where the mock retail entityset is made withes.entity_from_csv(entity,
(line 292). This makes the latlong type in that entityset a string rather than a tuple. The options as I understand them are:Latitude
andLongitude
to check if the latlong is a stringentity_from_csv
to convert certain strings to tuples_from_csv
, modify the dataframe and then makeentity_from_dataframe
.Latitude
andLongitude
with no real tests for now.My gut is to go with 3. Do you have a preference @kmax12?