Automatic Data Inferring #30

Open
Varunram opened this issue Sep 19, 2019 · 0 comments

Data inferences

A problem that arose while consuming data from public-facing sites was that each site labelled its data with different names, and it was non-trivial to identify which names corresponded to standard measurable values. The problem compounds when multiple providers upload data and the platform cannot work out where that data belongs. One approach would be to publish a standard list and ask uploaders to convert their data from their own format into the format we define. But, as past efforts have shown, this is unsustainable: companies and countries are not incentivised to do the conversion and, as a result, will not do it.

Assume there are three inputs (Input1, Input2, and Input3), each with three fields to report:

  • Input1 defines them to be Field1, Field2, Field3
  • Input2 defines them to be F1, F2, F3
  • Input3 defines them to be f1, f2, f3

Assume that the platform expects these fields to be named field1, field2, and field3. The platform must be able to infer, by parsing the incoming names, which expected field each one maps to. This inference could be powered by a simple text parser, an ML-based model, etc. The idea is that the parsing layer is a black box: whatever goes into it must come out cleanly formatted.
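To make the "simple text parser" option concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `CANONICAL_FIELDS`, `infer_field`, and `normalize_record` are hypothetical, and the matching rule (same trailing number, and the incoming letters are a prefix of the expected name's letters) is just one heuristic that happens to cover the Field1 / F1 / f1 example above. A real provider's naming would likely need something richer.

```python
import re

# Hypothetical list of names the platform expects (taken from the example above).
CANONICAL_FIELDS = ["field1", "field2", "field3"]

_NAME_RE = re.compile(r"^([a-z]*)(\d*)$")


def _split(name):
    """Lowercase, drop separators, and split into (letters, digits).

    Returns None when the name is not of the form <letters><digits>.
    """
    cleaned = re.sub(r"[^a-z0-9]", "", name.lower())
    match = _NAME_RE.match(cleaned)
    if match is None:
        return None
    return match.group(1), match.group(2)


def infer_field(name, canonical=CANONICAL_FIELDS):
    """Return the single canonical field that `name` plausibly abbreviates, else None."""
    parts = _split(name)
    if parts is None or not parts[0]:
        return None
    letters, digits = parts
    candidates = []
    for target in canonical:
        target_parts = _split(target)
        if target_parts is None:
            continue
        target_letters, target_digits = target_parts
        # Heuristic: numbers must agree and the incoming letters must be a prefix.
        if digits == target_digits and target_letters.startswith(letters):
            candidates.append(target)
    # Refuse to guess when the match is missing or ambiguous.
    return candidates[0] if len(candidates) == 1 else None


def normalize_record(record, canonical=CANONICAL_FIELDS):
    """Split a raw record into (cleanly named fields, fields that could not be mapped)."""
    clean, unmapped = {}, {}
    for name, value in record.items():
        target = infer_field(name, canonical)
        if target is None or target in clean:
            unmapped[name] = value
        else:
            clean[target] = value
    return clean, unmapped
```

For example, `normalize_record({"F1": 10, "F2": 20, "F3": 30})` maps all three keys onto `field1`, `field2`, and `field3`, while a name such as `temperature` ends up in the unmapped part rather than being guessed at. Returning the unmapped fields separately keeps the layer a black box from the platform's point of view while still exposing what it could not place.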

This black box could also be used in other places where we need inferential analysis (API endpoints, names, etc.). It would make a nice side project: it can be plugged into the platform easily and does not depend on the platform making any changes (one could write a parser that works on 100 examples and then run it on the platform).
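One way to develop such a parser independently of the platform is to score it against a small set of labelled examples first. The sketch below is only an illustration of that workflow: `evaluate`, `lowercase_inferrer`, and `EXAMPLES` are hypothetical names, the examples are made up from the field names in this issue, and the inferrer is a deliberately naive stand-in for whatever parser or model is actually used.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Any inference layer is treated as a black box: raw name in, canonical name (or None) out.
Inferrer = Callable[[str], Optional[str]]


def evaluate(
    infer: Inferrer,
    examples: Iterable[Tuple[str, str]],
) -> Tuple[float, List[Tuple[str, str, Optional[str]]]]:
    """Run `infer` over (raw_name, expected_name) pairs.

    Returns (accuracy, failures), where each failure is
    (raw_name, expected_name, predicted_name).
    """
    total = 0
    failures = []
    for raw, expected in examples:
        total += 1
        predicted = infer(raw)
        if predicted != expected:
            failures.append((raw, expected, predicted))
    accuracy = (total - len(failures)) / total if total else 0.0
    return accuracy, failures


def lowercase_inferrer(name: str) -> Optional[str]:
    """Naive stand-in: only recognises names that differ from the target by case."""
    lowered = name.lower()
    return lowered if lowered in {"field1", "field2", "field3"} else None


# Hypothetical labelled examples; a real set would come from actual provider data.
EXAMPLES = [
    ("Field1", "field1"),
    ("F2", "field2"),
    ("f3", "field3"),
    ("FIELD2", "field2"),
]

accuracy, failures = evaluate(lowercase_inferrer, EXAMPLES)
```

Here the naive inferrer gets the case-only variants right and misses the abbreviated ones, so the failure list shows exactly which naming patterns a better parser needs to handle. Because `evaluate` only depends on the callable's signature, a text parser and an ML-based model can be compared on the same examples before either is plugged into the platform.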
