

Update 'Find village "49"' exercise to use only interview date, because GPS isn't distinct #43

Open
brownsarahm opened this issue Jan 11, 2019 · 3 comments
Labels
help wanted Looking for Contributors

Comments

@brownsarahm
Contributor

brownsarahm commented Jan 11, 2019

This lesson uses GPS coordinates and interview date to correct a mislabeled village. However, the GPS locations of the 3 villages are not distinct. A scatter plot shows that the GPS locations fall into 3 clusters, but each cluster contains responses from multiple villages.

Is this a bad copy of the data, or is the actual GPS data bad?

If the GPS data is actually bad, maybe we should change the last exercise of episode 3 to rely only on interview date?

Edit: updated the link to the exercise.


Conclusion by @bencomp from the discussion below: let's update the exercise to rely only on interview date.

@ha0ye

ha0ye commented Feb 14, 2021

The only explanation I can come up with is that the surveys were collected at sites, and the GPS coordinates record the location of the device used to collect the data, while the survey's questions about household and village may not correspond to where the data was collected.

The figshare entry is unclear about this: it states that the province, district, ward, and village all refer to where the survey was conducted.

The closest reference I could find about the survey methodology was Bont et al. 2019, but this didn't have useful details either.

I agree with using interview date to resolve the mislabeled village.

@brownsarahm
Contributor Author

@ha0ye thanks for your effort in researching this! If some location data relates to the survey site and some relates to the farm, that could explain it.

I'm a maintainer for this lesson now (I submitted the issue as someone who had taught it). Could you submit a pull request?

@bencomp
Contributor

bencomp commented Jun 27, 2022

I would also ask learners to find the correct name for village 49 using only interview dates. That makes the exercise in episode 3 smaller and quicker, although it removes the need to sort on multiple columns. Maybe we can add an exercise that sorts on multiple values to find errors in the ward and district columns?
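A minimal Python sketch of the date-only approach, assuming (as the proposed exercise does) that each village was interviewed on its own days, so a shared interview date identifies the village. The record layout, the `infer_village` helper, and all sample values below are hypothetical and are not taken from the lesson dataset.

```python
from collections import Counter
from datetime import date

# Hypothetical records for illustration only, NOT rows from the lesson data.
# Each record is (record_id, village, interview_date).
records = [
    ("1", "Village A", date(2000, 1, 1)),
    ("2", "Village A", date(2000, 1, 1)),
    ("3", "Village B", date(2000, 1, 2)),
    ("4", "49", date(2000, 1, 2)),  # the mislabeled village
    ("5", "Village C", date(2000, 1, 3)),
]


def infer_village(records, target_id):
    """Guess the village of target_id from other records with the same interview date.

    Assumes villages were never interviewed on the same day. Returns None when
    no other record shares the date, or when several villages share it.
    """
    target_day = next(day for rid, _, day in records if rid == target_id)
    votes = Counter(
        village
        for rid, village, day in records
        if rid != target_id and day == target_day
    )
    if len(votes) != 1:
        # Either no matching date or an ambiguous one: date alone cannot decide.
        return None
    return next(iter(votes))


print(infer_village(records, "4"))  # Village B
```

This mirrors what learners would do by sorting on the interview date column and reading the village of neighboring rows; it needs only one sort key, which is the trade-off noted above.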

@bencomp bencomp added the help wanted Looking for Contributors label Jun 27, 2022
@bencomp bencomp changed the title GPS isn't distinct Update 'Find village "49"' exercise to use only interview date, because GPS isn't distinct Oct 31, 2023