Commit

Added text to enrichment BP to address enrichments other than machine learning. Simplified the tests for the enrichment BP.
agreiner committed Jun 16, 2016
1 parent 46f549a commit ce1b1a8
Showing 1 changed file with 5 additions and 4 deletions.
9 changes: 5 additions & 4 deletions bp.html
@@ -3875,21 +3875,22 @@ <h3>Data Enrichment</h3>
<!-- begin of BP Enrich data -->
<div class="practice">
<p><span id="EnrichData" class="practicelab">Enrich data by generating new data</span></p>
<p class="practicedesc">Enrich your data by generating new data from the raw data when doing so will enhance its value.</p>
<p class="practicedesc">Enrich your data by generating new data when doing so will enhance its value.</p>
<section class="axioms">
<h4 class="subhead">Why</h4>
<p>Enrichment can greatly enhance processability, particularly for unstructured data. Under some circumstances, missing values can be filled in, and new attributes and measures can be added. Publishing more complete datasets can enhance trust, if done properly and ethically. Deriving additional values that are of general utility saves users time and encourages more kinds of reuse. There are many intelligent techniques that can be used to enrich data, making the dataset an even more valuable asset.</p>
<p>Enrichment can greatly enhance processability, particularly for unstructured data. Under some circumstances, missing values can be filled in, and new attributes and measures can be added from the existing raw data. Datasets can also be enriched by gathering additional results in the same fashion as the original data, or by combining the original data with other datasets. Publishing more complete datasets can enhance trust, if done properly and ethically. Deriving additional values that are of general utility saves users time and encourages more kinds of reuse. There are many intelligent techniques that can be used to enrich data, making the dataset an even more valuable asset.</p>
</section>
<section class="outcome">
<h4 class="subhead">Intended Outcome</h4>
<p>Datasets with missing values will be enhanced by filling those values. Structure will be conferred and utility enhanced if relevant measures or attributes are added, but only if the addition does not distort analytical results, significance, or statistical power.</p>
<p>Datasets with missing values will be enhanced by filling in those values. Structure will be conferred and utility enhanced if relevant measures or attributes are added, but only if the addition does not distort analytical results, significance, or statistical power.</p>
</section>
<section class="how">
<h4 class="subhead">Possible Approaches to Implementation</h4>
<p>Techniques for data enrichment are complex and go well beyond the scope of this document, which can only highlight the possibilities.</p>
<p>Machine learning can readily be applied to the enrichment of data. Methods include those focused on data categorization, disambiguation, entity recognition, sentiment analysis and topification, among others. New data values may be derived as simply as performing a mathematical calculation across existing columns. Other examples include visual inspection to identify features in spatial data and cross-reference to external databases for demographic information.</p>
<p>Values generated by inference-based techniques should be labeled as such, and it should be possible to retrieve any original values replaced by enrichment.</p>
<p>Whenever licensing permits, the code used to enrich the data should be made available along with the dataset. Sharing such code is particularly important for scientific data.</p>
<p>Prioritization of enrichment activities should be based on value to the data consumer as well as the effort required.</p>
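<p>As a minimal sketch of the labeling and retrievability points above, the following Python fragment fills a missing value, flags it as inferred, keeps the original value, and derives a new value from existing columns. The field names (<code>boardings</code>, <code>alightings</code>, <code>boardings_inferred</code>, <code>boardings_original</code>, <code>total_riders</code>), the sample records, and the use of mean imputation are all assumptions made for illustration. They are not part of this best practice, and mean imputation in particular may distort analytical results for some datasets.</p>

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"stop": "A", "boardings": 120, "alightings": 95},
    {"stop": "B", "boardings": None, "alightings": 60},
    {"stop": "C", "boardings": 80, "alightings": 70},
]


def enrich(rows):
    """Return enriched copies of the rows; the input rows are not modified."""
    known = [r["boardings"] for r in rows if r["boardings"] is not None]
    # Simple mean imputation, chosen only to keep the illustration short.
    fill_value = mean(known)

    enriched = []
    for r in rows:
        new = dict(r)
        # Keep the original value so it can be retrieved after enrichment.
        new["boardings_original"] = r["boardings"]
        # Label whether the published value was inferred rather than observed.
        new["boardings_inferred"] = r["boardings"] is None
        if r["boardings"] is None:
            new["boardings"] = fill_value
        # Derived value: a calculation across existing columns.
        new["total_riders"] = new["boardings"] + new["alightings"]
        enriched.append(new)
    return enriched


enriched_rows = enrich(raw_rows)
```

<p>A consumer of such a dataset could filter on the hypothetical <code>boardings_inferred</code> flag to exclude imputed values, or read <code>boardings_original</code> to recover what was actually recorded.</p>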

<aside class="example"><ol>
<li>The MyCity transport agency has street addresses for each of its transit stops. It wants to make it easier for consumers of its data to combine the data with maps, so it adds latitude and longitude information for each stop by utilizing a geographic database.</li>
@@ -3898,7 +3899,7 @@ <h4 class="subhead">Possible Approaches to Implementation</h4>
</section>
<section class="test">
<h4 class="subhead">How to Test</h4>
<p>Look for missing values in the dataset or additional fields likely to be needed by others. Check that any data added by inferential enrichment techniques is identified as such and that any replaced data is still available. Check that code used to enrich the data is available. Check whether the metadata being extracted is in accordance with human knowledge and readable by humans.</p>
<p>Verify that there are no missing values in the dataset, or additional fields likely to be needed by others, that could readily be provided. Check that any data added by inferential enrichment techniques is identified as such and that any replaced data is still available.</p>
</section>
<section class="ucr">
<h4 class="subhead">Evidence</h4>
