Initial spice implementation for Expatistan #5

Merged
merged 3 commits into from Oct 28, 2011


@pardocz

I just needed to create a very straightforward function in spice.js, since the API I created before already has the abstract in a usable form.

In the last message you wrote "We will also want to include (I can update the format README) a Perl file that reads in a txt file (with the tokens) and then a regex that triggers the call to the api." I'm a little lost. Could you provide some clarification and (ideally) an example of the txt file with one or two lines?

@yegg
DuckDuckGo member

Yes, that makes it easier :).

What I mean is how this gets triggered. It potentially doesn't need a text file, but it is something we need to think about, and you are of course the best person to determine it.

It probably belongs in a file called spice.pl or spice.js, depending on how it gets triggered. In this case, I think it is spice.pl (for Perl) and will depend on some regex, much like we do for WolframAlpha: http://weinbergalpha.com/

For example, it could be:

if ($q_check =~ /cost of living in (.*)/) {

}

That would capture the last block and then we'd send that to the API.

The text file (which we could call spice.txt) comes in if we want to further qualify what pops out of the regex, i.e. make sure it is a city you have good data for. The API could do that on your side, so we may not need the file at all. I'm just not sure.

You would read in the file like this:

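my %spice = ();
# File name assumed from the suggestion above (spice.txt); one city per line.
open(IN, "<", "spice.txt") or die "Cannot open spice.txt: $!";
while (my $line = <IN>) {
    chomp($line);
    # Store lower-cased city names as hash keys for quick lookup.
    $spice{lc $line} = 1;
}
close(IN);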

That would be one city per line and then you could check it via

exists $spice{lc $1}

If you wanted to store more info in the hash you could make a tab delimited file and split it -- I could show you how.

@pardocz

I don't think that the text file would be necessary, since the API returns information about the quality of the data in the city being queried, and spice.js only populates a result if the quality is OK.

Regarding spice.pl, can it have multiple regexes or does it need to be only one? We are dealing with two different kinds of queries for these results: "cost of living in London" and "cost of living London vs New York" (among many others). It would simplify things if we could have multiple regexes.

In my first attempts at writing these regexes I ran into some problems, mainly because city names may be single or multiple words (new york, los angeles). In a query like "cost of living kansas city vs new york" or even "cost of living kansas city new york", how do you set the boundaries for each city? These regexes, for example, will have problems matching some of those queries properly:

/cost of living (?:comparison|)(?:in\s|between\s|)(.+?) (?:vs\s|and\s|to\s|)(.+)/i
/(.+?) (?:vs\s|and\s|to\s|)(.+) cost of living/i

Any suggestions on how to better approach this problem?
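To make the boundary problem concrete, here is a minimal sketch that runs the first regex above against a few sample queries. It is written in JavaScript only for illustration (the pattern uses syntax common to Perl and JavaScript), and the helper name `captures` is hypothetical.

```javascript
// The first regex from above, unchanged.
const costOfLiving =
  /cost of living (?:comparison|)(?:in\s|between\s|)(.+?) (?:vs\s|and\s|to\s|)(.+)/i;

// Hypothetical helper: returns the two captured groups, or null if no match.
function captures(query) {
  const m = query.match(costOfLiving);
  return m ? [m[1], m[2]] : null;
}

// Single-word first city followed by "vs": the split is correct.
console.log(captures("cost of living London vs New York"));
// -> [ 'London', 'New York' ]

// Multi-word first city: the lazy (.+?) stops at the first space.
console.log(captures("cost of living kansas city vs new york"));
// -> [ 'kansas', 'city vs new york' ]

// Single-city query: the regex still demands two groups, so "in" is
// captured as if it were the first city.
console.log(captures("cost of living in London"));
// -> [ 'in', 'London' ]
```

The lazy first group has no way of knowing where a city name ends, so any first city with a space in it is cut short.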

@yegg
DuckDuckGo member

I usually write out examples of what needs to be matched and then build it up until it all works. There can be multiple regex, but I try to consolidate when possible.

However, let me offer another possible alternative that may be better. What we do on this side is just send you any query that has 'cost of living' in it. Then on your side, you run a regex of all your 700 cities (very quick if optimized). If you see one or two cities then you process accordingly.
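A minimal sketch of that idea, assuming the list of supported cities is available as an array: build one alternation regex from the known names and collect whatever matches in the query. The city list, `escapeRegExp`, and `extractCities` are all hypothetical names for illustration, and JavaScript is used only as an example language; this is not Expatistan's actual implementation.

```javascript
// Hypothetical sample; the real list would hold the ~700 supported cities.
const CITIES = ["new york", "kansas city", "los angeles", "london", "moscow"];

// Escape characters that have a special meaning inside a regex.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One alternation of all city names. Longer names are listed first so that
// a name which is a prefix of a longer one cannot shadow it.
const cityPattern = new RegExp(
  "\\b(?:" +
    [...CITIES]
      .sort((a, b) => b.length - a.length)
      .map(escapeRegExp)
      .join("|") +
    ")\\b",
  "gi"
);

// Returns the known cities found in the query, in order of appearance.
function extractCities(query) {
  const found = query.match(cityPattern) || [];
  return found.map((c) => c.toLowerCase());
}

console.log(extractCities("cost of living kansas city vs new york"));
// -> [ 'kansas city', 'new york' ]
console.log(extractCities("cost of living kansas city new york"));
// -> [ 'kansas city', 'new york' ]
console.log(extractCities("cost of living in London"));
// -> [ 'london' ]
```

Because the city names themselves define the boundaries, separators such as "vs", "and", or "to" no longer matter: one match means a single-city lookup, two mean a comparison.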

@pardocz
@yegg
DuckDuckGo member

Awesome, sounds great. The volume will be very low. While we get about 10M queries a month, when you cut it up that small it is still very small in any one bucket. So really prob more like a few a day at this point. But we can add it to the goodies list and whatnot, and then it should grow a bit over time; we can also expand the regex we use to trigger it.

@pardocz

I've just pushed into production a new version of the API for spice. Now it will parse a full query and it will try to extract city names from the query. The method assumes that the query is relevant for expatistan (i.e. it does not check again for "cost of living" or anything similar, only for city names). It will work both for the single case (cost of living in London) and the comparison case (cost of living New York vs Moscow).

@yegg
DuckDuckGo member

Awesome. I just integrated and updated the goodies repos, and I hope to do the same for spice this week!

@yegg yegg merged commit d3f57f1 into duckduckgo:master Oct 28, 2011
@yegg
DuckDuckGo member

Hmm, maybe that was premature :).

What is missing is the callback function in spice.html that calls nra appropriately. I just wrote up more extensive documentation and there are now some good live examples (in the other directories) to look at.
