Commit
Added a sentence about demonstrating consumer demand to show increased value of enrichments.
agreiner committed Jun 16, 2016
1 parent ce1b1a8 commit 540ed3b
Showing 1 changed file with 2 additions and 3 deletions.
5 changes: 2 additions & 3 deletions bp.html
@@ -3887,11 +3887,10 @@ <h4 class="subhead">Intended Outcome</h4>
<section class="how">
<h4 class="subhead">Possible Approaches to Implementation</h4>
<p>Techniques for data enrichment are complex and go well beyond the scope of this document, which can only highlight the possibilities.</p>
-<p>Machine learning can readily be applied to the enrichment of data. Methods include those focused on data categorization, disambiguation, entity recognition, sentiment analysis and topification, among others. New data values may be derived as simply as performing a mathematical calculation across existing columns. Other examples include visual inspection to identify features in spatial data and cross-reference to external databases for demographic information.</p>
+<p>Machine learning can readily be applied to the enrichment of data. Methods include those focused on data categorization, disambiguation, entity recognition, sentiment analysis and topification, among others. New data values may be derived as simply as performing a mathematical calculation across existing columns. Other examples include visual inspection to identify features in spatial data and cross-reference to external databases for demographic information. Lastly, generation of new data may be demand-driven, where missing values are calculated or otherwise determined by direct means.</p>
<p>Values generated by inference-based techniques should be labeled as such, and it should be possible to retrieve any original values replaced by enrichment.</p>
<p>Whenever licensing permits, the code used to enrich the data should be made available along with the dataset. Sharing such code is particularly important for scientific data.</p>
-<p>Prioritization of enrichment activities should be based on value to the data consumer as well as the effort required.</p>
-
+<p>Prioritization of enrichment activities should be based on value to the data consumer as well as the effort required. Value to the consumer can be gauged by measurement of demand (e.g., through surveys or tracking direct requests). Documenting how you measure demand can make the increased value demonstrable.</p>
<aside class="example"><ol>
<li>The MyCity transport agency has street addresses for each of its transit stops. It wants to make it easier for consumers of its data to combine the data with maps, so it adds latitude and longitude information for each stop by utilizing a geographic database.</li>
<li>The transit agency has a large collection of email correspondence from transit riders. Some of the correspondence is complimentary, some emails are complaints, and some are requests for information. The agency conducts a combination of sentiment analysis and categorization to extract metadata for each of the messages, such as transit mode, route number, and rider positivity, to create a semi-structured dataset.</li></ol>
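The guidance in this hunk that inferred values be labeled and that replaced originals remain retrievable could be sketched roughly as follows. This is only an illustration and is not part of the commit: the `_enrichment` key, the function names, the field `riders_per_day`, and the sample numbers are all hypothetical, and mean imputation is named purely as a stand-in for whatever technique a publisher actually uses.

```python
from copy import deepcopy


def enrich_field(record, field, new_value, method):
    """Return a copy of `record` with `field` set to `new_value`.

    The replaced value and the name of the technique are stored under a
    hypothetical "_enrichment" key, so the change is labeled and reversible.
    """
    enriched = deepcopy(record)
    notes = enriched.setdefault("_enrichment", {})
    if field not in notes:
        # Keep the first original only, even if the field is enriched again.
        notes[field] = {"original": record.get(field)}
    notes[field]["method"] = method
    enriched[field] = new_value
    return enriched


def restore_field(record, field):
    """Return a copy of `record` with the pre-enrichment value of `field`."""
    restored = deepcopy(record)
    note = restored.get("_enrichment", {}).pop(field, None)
    if note is not None:
        restored[field] = note["original"]
    return restored


# Hypothetical transit-stop record with a missing value.
stop = {"stop_id": "S1", "riders_per_day": None}
enriched = enrich_field(stop, "riders_per_day", 420.0, method="mean-imputation")
restored = restore_field(enriched, "riders_per_day")
```

Keeping the provenance next to the value is one design choice among several; a publisher might equally record it in separate dataset-level metadata. Note that this sketch restores a field that was absent before enrichment as `None` rather than deleting it.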