Signals/Noise Issue #74

sschneiderman · 2013-04-24T15:42:42Z

Andrew, We previously discussed methods for promoting or demoting source documents based on analyst judgment. This was an interest of both Aveshka and CGS. Pls advise if there is any follow up on how this might work.
Thanks,
Scott

astrite · 2013-04-24T16:24:00Z

That's partially implemented currently via Tag Weighting. When a user
creates a source, they can set a number of user-defined tags. These tags
are propagated to each document coming in on that source's harvest. If
you give each source a unique tag, you can then define weights to
apply to query scoring on the Advanced Options pane. The format is "Tag1":
number, "Tag2": number, etc., where the number is the weighting factor you
want on the score. So for an RSS feed of CNN sources, you can tag it with
"CNN", and then if you want all CNN documents weighted x 2, you'd put
"CNN": 2 in the tag weighting. When you run a query, each document is
assigned an overall score based on how well it matches the query terms,
and that score is then weighted further by whatever geo / time / tag
weighting parameters are set. Note that in the current implementation,
you can update a source's tags, but this will only affect new documents;
it's not retroactive. There's an open issue to make this retroactive,
but we do not have an ETA for when it might be worked into an upcoming
build.
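
To make the weighting idea concrete, here is a minimal Python sketch. It is not the platform's actual scoring code: the function names are hypothetical, and it assumes the weight is a simple multiplier on the base relevance score and that multiple matching tags multiply together. Only the "Tag": number format comes from the description above.

```python
import json


def parse_tag_weights(text):
    """Parse a string such as '"CNN": 2, "Blogs": 0.5' into a dict.

    Assumption: the Advanced Options format is a comma-separated list of
    "Tag": number pairs, so wrapping it in braces yields valid JSON.
    """
    return {tag: float(weight) for tag, weight in json.loads("{" + text + "}").items()}


def weighted_score(base_score, doc_tags, tag_weights):
    """Multiply the base relevance score by the weight of each matching tag.

    Assumption: untagged or unweighted documents keep a factor of 1.0.
    """
    score = base_score
    for tag in doc_tags:
        score *= tag_weights.get(tag, 1.0)
    return score


weights = parse_tag_weights('"CNN": 2, "Blogs": 0.5')

# Hypothetical documents with base scores from query-term matching.
docs = [
    {"title": "A", "score": 0.6, "tags": ["CNN"]},
    {"title": "B", "score": 0.9, "tags": ["Blogs"]},
    {"title": "C", "score": 0.8, "tags": []},
]

ranked = sorted(
    docs,
    key=lambda d: weighted_score(d["score"], d["tags"], weights),
    reverse=True,
)
print([d["title"] for d in ranked])
```

In this sketch the CNN document overtakes the two documents with higher base scores, which is the promote/demote effect being described.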

From a functional perspective, the case management layer would also
partially resolve the issue you're describing, because once an analyst flags
a document as relevant to a case, it can be moved into the supporting evidence
folder. At that level, you'll only be working with documents deemed
relevant by an analyst, while the analysis / collection layer retains granular
query-specific relevance.

Andrew Strite
Intelligence Solutions Architect | IKANOW http://www.ikanow.com
Email: astrite@ikanow.com
Mobile: 301.514.1384

sschneiderman · 2013-04-24T16:30:34Z

Can you provide training on Thursday on how Tag Weighting would be applied to reduce false positives on similar names (John Smith the target versus John Smith the innocent bystander)? I understand the principle but not the implementation.
Thanks.

astrite · 2013-04-24T16:48:36Z

That's a slightly different issue. Tag weighting is appropriate for
inflating the score of a particular kind of document (e.g. all those from CNN
or Databot), which will ensure that certain kinds of documents show up
before others.

"False positives" like the one you describe are better solved using
alternative query strategies and query qualifiers, and to a lesser extent
aliasing. Selecting documents that match the correct John Smith and
finding associated entities will give you additional query parameters.
These terms, if included in the query for John Smith, should push the
relevant documents up to the top.

e.g. John Smith AND (Company A OR Company B OR Associate A OR Associate B)
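
As a rough illustration of how such a query could be assembled once the associated entities are known, here is a small Python sketch. The helper name is hypothetical and it only builds the query string in the form shown above; whether terms need quoting or escaping in the actual query box is not covered here.

```python
def build_disambiguation_query(name, associated_entities):
    """Combine a target name with entities associated with the correct person.

    Returns a string of the form: NAME AND (E1 OR E2 OR ...).
    With no associated entities, the bare name is returned unchanged.
    """
    if not associated_entities:
        return name
    qualifiers = " OR ".join(associated_entities)
    return f"{name} AND ({qualifiers})"


query = build_disambiguation_query(
    "John Smith",
    ["Company A", "Company B", "Associate A", "Associate B"],
)
print(query)
# prints: John Smith AND (Company A OR Company B OR Associate A OR Associate B)
```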

Alternatively, if you have a scenario where you have John Smith (incorrect
person) and John B. Smith (correct person), you can either discard one of
the entities so it no longer displays or run queries like:

e.g. (John B. Smith OR "John Smith") NOT John Smith

A certain amount of experimentation is probably required to develop an
effective query.

As an aside, John Smith (the accountant) vs. John Smith (the priest) isn't
a true false positive. In both cases, a query for John Smith should bring
back matches for "John Smith" (of whatever entity type you define).
A false positive would be documents getting labeled with John
Smith when they are not actually about that entity. That is more like the
situation where an advertisement causes a document to be flagged as being
about a company that is not actually mentioned in the text.
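
A simplified way to picture that distinction in Python: treat a label as a suspected false positive when the entity's name never appears in the document's main text. This is only a sketch under that assumption, with a hypothetical function name and made-up sample data; real entity extraction is more involved than a substring check.

```python
def unsupported_labels(doc_text, labeled_entities):
    """Return labeled entities whose name never appears in the document text."""
    text = doc_text.lower()
    return [name for name in labeled_entities if name.lower() not in text]


# Hypothetical article body and the entity labels attached to it.
article = "John Smith, an accountant, filed the quarterly report on Tuesday."
labels = ["John Smith", "Acme Corp"]  # "Acme Corp" stands in for a label picked up from an ad

suspects = unsupported_labels(article, labels)
print(suspects)
```

Here "John Smith" is a legitimate match regardless of which John Smith it is, while "Acme Corp" is the kind of label that would count as a true false positive.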

sschneiderman · 2013-04-24T16:52:49Z

Understood. Let's discuss again Thursday.

ghost assigned astrite Apr 24, 2013

sschneiderman unassigned astrite Mar 19, 2015
