Hey @zachthompson thanks a lot for submitting this. We'll give it a look and provide some feedback shortly |
@zachthompson Thanks for this. Everything works! |
What do you think of the design of it? |
I could replace parens with double quotes since it seems like that's how they're using them. Also, now that I see the output, should I remove the redundant title in the abstract so that it starts with "MPG:..."? |
Yeah try double quotes and removing that repeated title. |
Those two changes are made. Let me know if we need any others. |
@zachthompson looks great thanks! I added a few trigger words to it: mpg, fuel economy, gas mileage. Are there any others you can think of that should trigger it? |
@jdorweiler Looks good. All I can think of is maybe expanding mpg, e.g. miles per gallon, miles/gallon, etc., or "fuel mileage" which seems to be propagated, for example, here - http://www.fuelmileage.com/ BTW, any thoughts on the disambiguation? |
Consider also "(fuel|gas|petrol)efficiency"
For my part, I would love to see them in a Spice-style tile/detail view, but I don't know if Fatheads can do that today. |
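The trigger phrases suggested above can be collected into one pattern. This is a minimal sketch in Python (the PR itself is Perl), assuming a hypothetical `TRIGGER` pattern and `strip_trigger` helper. How DuckDuckGo actually matches triggers is not shown in this thread, and the pattern is slightly broader than the listed phrases (it also accepts combinations such as "petrol economy").

```python
import re

# Hypothetical pattern covering the phrases mentioned in this thread:
# mpg, miles per gallon, miles/gallon, and (fuel|gas|petrol) economy/mileage/efficiency.
TRIGGER = re.compile(
    r"\b(?:mpg|miles\s*(?:per|/)\s*gallon|"
    r"(?:fuel|gas|petrol)\s*(?:economy|mileage|efficiency))\b",
    re.IGNORECASE,
)

def strip_trigger(query):
    """Return the query without its trigger phrase, or None if no trigger is present."""
    if not TRIGGER.search(query):
        return None
    remainder = TRIGGER.sub(" ", query)
    # Collapse the whitespace left behind by the removed phrase.
    return " ".join(remainder.split())
```

The remainder (for example the year, make, and model) is what would then be looked up against article titles and redirects.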
This looks super cool! |
@mwmiller I can change the shebang to whatever best works in the ddg environment. I like the tile idea as well. I'm working on another fathead and was wondering the same thing for certain searches it could provide. |
No fathead templates yet but that's something I'd like to have too. For now you can just use |
Well, there are a couple of options.
|
Let's just leave it as-is for now. I kinda like seeing the multiple vehicles as long as it has some logical limit to the number that will show. I'm wondering if we can fix the triggering on this. The titles are so specific that I have to go into the output.txt file to see what to search for. I thought What do you think about adding additional redirect entries that have some of the common words stripped off? i.e. awd, 2wd, 4wd, pickup ... I'm sure there are others. That way This page has some info about redirects if you didn't see it already https://duck.co/duckduckhack/fathead_overview#data-file-format. Let me know if you need help. |
If you look at the bottom half of output.txt you should see about 20k redirects. I played around with several ways to remove these types of words to make them easier to trigger. In the specific case you mention, it will work, since Subaru doesn't make 2WD versions. In most, however, removing the AWD/2WD will create an ambiguous redirect (e.g. go to the search http://www.fueleconomy.gov/feg/findacar.shtml, bring up 2009 Ford, and scan the models.) We can do it and just see how many additional, unique redirects it generates. I could also just run through all of the combinations of the words in the model. |
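The ambiguity problem described above can be sketched as follows. This is an illustrative Python sketch, not the PR's Perl code: `STRIP_WORDS` and `stripped_redirects` are hypothetical names, and the titles in the usage note are examples only. A shortened key is kept only when exactly one title reduces to it. The sketch does not check whether a shortened key collides with an existing full title.

```python
from collections import defaultdict

# Hypothetical list of qualifiers to strip; the thread mentions awd, 2wd, 4wd, pickup.
STRIP_WORDS = {"awd", "2wd", "4wd", "pickup"}

def stripped_redirects(titles):
    """Map each shortened key to its title, keeping only keys that point at exactly one title."""
    targets = defaultdict(set)
    for title in titles:
        kept = [t for t in title.lower().split() if t not in STRIP_WORDS]
        key = " ".join(kept)
        if key != title.lower():  # only record keys that actually got shorter
            targets[key].add(title)
    # Drop ambiguous keys (more than one title reduces to the same key).
    return {key: next(iter(found)) for key, found in targets.items() if len(found) == 1}
```

With titles such as "2009 Ford Ranger 2WD", "2009 Ford Ranger 4WD", and "2009 Subaru Forester AWD", only "2009 subaru forester" survives, because both Ranger titles reduce to the same key.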
BTW, to be clear, I was only talking about changing the display for any redirects that referenced multiple vehicles. It wouldn't change the display of multiple configurations, which is what we've been looking at. Multiple vehicles would just be vertically stacked, rather than horizontally as was suggested. |
131k additional redirects with the update. Any variation that's not completely ambiguous should work. |
ah thanks. I didn't notice the redirects at the bottom. I'll check out the new ones and see how it works now. |
The only item not part of the variations is the year. It has to be first. I could change it to allow the year to be anywhere as well. |
Actually, that last statement is incorrect. Both the year and make are in fixed positions. If we allow for these two terms to be in any position, require that a year be present, and allow for any number of terms, as long as it's unique, the redirects balloon to over 5.2M. Not allowing the year or make to appear between two terms of the model reduces the output to a much more manageable 1.25M redirects or so. |
…terator less restrictive.
* Significant memory (~1.4G on FreeBSD amd64 Perl 5.16.3) and run time (~3 minutes) reqs
* Clarified some variable usage
* Some memory tweaks
…to reduce duplication
@jdorweiler Updated. Only 75 vehicles where the volumes come into play. |
@zachthompson great! I updated with your new changes. |
use DDG::Fathead;

primary_example_queries '2014 Honday Fit fuel economy', '2014 Prius mpg';
honday -> honda
2014 prius mpg -> 2014 prius v mpg (I guess that's the model name now?)
Looks great! I just want to see if @chrismorast has any design ideas |
* Make transmission in the configuration optional
* Fix some typos
@jdorweiler A couple of additional tweaks for electric vehicles. |
fyi, records for the upcoming year start showing up in Spring and are continuing to be added even now. According to my contact 15 new records arrived just this afternoon. Might need to run this somewhat regularly. |
@zachthompson oh nice! I didn't even think of electric cars. The names for the Tesla cars are kinda long. Thanks for looking into when they do updates. I'll make a note to update this a few times a year. New changes are up if you want to try and trigger some of the electric cars. |
@jdorweiler yeah, unfortunately some models have extensive qualifiers in parens like that. Looks good. I tried the 2012 nissan leaf and a few without transmissions specified, e.g. 2001 th!nk and 2001 hyper-mini. For models with a single configuration like these, we might collapse the summary and configuration into one line. Since there's no range it's sort of redundant. However, I could also see leaving it just to be consistent. |
@zachthompson Could you limit the number of variations? When I search for |
@jdorweiler With that particular example I'm not sure I see a good way to reduce the redirects any further. The redirects are generated as follows: Year: 2013 The terms of model are 1) permuted and 2) reduced in number, requiring at least one term. Each model is then inserted into each permutation of year/make/model. It only requires two of the latter but forces the year to be one of them (eliminating rare cases where make/model, or even just model, would work.) So it allows for reasonable variety while also requiring minimal terms, e.g. "2013 60" will work. Ways I can think of to reduce redirects:
Let me know if you have other ideas. None of the above seem like great tradeoffs with respect to flexibility or effort to me. |
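The generation scheme described above can be sketched roughly like this. It is one reading of the description, written in Python rather than the PR's Perl: `model_variants` and `candidate_keys` are hypothetical names, the model string in the usage note is a shortened illustration, and the uniqueness filtering that removes ambiguous keys is left out.

```python
from itertools import permutations

def model_variants(model):
    """Every ordering of every non-empty subset of the model's terms."""
    terms = model.lower().split()
    return {" ".join(p) for n in range(1, len(terms) + 1) for p in permutations(terms, n)}

def candidate_keys(year, make, model):
    """Keys made of at least two of year/make/model, always including the year.

    Each model variant is inserted as a single unit, so the year and make
    never land between two model terms.
    """
    make = make.lower()
    # Year plus make alone (likely ambiguous in practice and filtered later).
    keys = {" ".join(p) for p in permutations((year, make))}
    for variant in model_variants(model):
        for parts in ((year, variant), (year, make, variant)):
            keys.update(" ".join(p) for p in permutations(parts))
    return keys
```

For a model string like "Model S 60" there are already 15 model variants, and each one yields 8 orderings with the year and make, which is why the totals in this thread climb so quickly. A key as short as "2013 60" is among the candidates.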
@zachthompson Makes sense. I agree though there doesn't seem like an easy way to fix that. I think this is good to go though. I'm going to post it up internally for testing. |
@jdorweiler A variation of #3 above would be to prevent "and", "of", "the", etc., from appearing first or last in the model permutation. However, I generated all of the unique first and last terms and it would only save us ~15k redirects, mostly from "and", "inc", and "incl.". |
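A sketch of the first/last-term restriction mentioned above, in Python rather than the PR's Perl. `STOPWORDS` and `filtered_variants` are hypothetical names, and the word list only contains the terms named in this thread.

```python
from itertools import permutations

# Hypothetical stopword list; the thread names "and", "of", "the", "inc", and "incl.".
STOPWORDS = {"and", "of", "the", "inc", "incl."}

def filtered_variants(model):
    """Orderings of non-empty term subsets that neither start nor end with a stopword."""
    terms = model.lower().split()
    variants = set()
    for n in range(1, len(terms) + 1):
        for perm in permutations(terms, n):
            if perm[0] in STOPWORDS or perm[-1] in STOPWORDS:
                continue
            variants.add(" ".join(perm))
    return variants
```

For "Town and Country" this keeps 6 of the 15 unrestricted variants. Across the whole data set the saving is small, as noted above, because few models contain such terms.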
Looks good! Although, I tried some other vehicles but couldn't get it to trigger. 2007 Nissan Frontier |
@zachthompson After getting some feedback on this I think we're going to have to cut back on the number of redirects. All of the current fatheads have fewer than 1M entries combined. I'm not sure of the best way to fix it but a few ideas that could work:
I think a better option would be to try this as a longtail and see how that works. A longtail does relevancy searching on the title so it's not as strict as key:value searching for a fathead. I think either option could work so don't let me discourage you from trying the others. Let me know if you want to try the longtail and I can give you a quick summary. It's been a while since anyone made one so our docs need an update (https://duck.co/duckduckhack/longtail_overview). |
@chrismorast Both of those have multiple models (2007 Nissan Frontier 2WD, 2007 Nissan Frontier V6 4WD, 2007 Nissan Frontier V6 2WD, etc.) I suppose one way to reduce redirects would be to try and determine the most generic model name and group the specific models under it like this. From some of the model names though it might be harder to figure out than it appears. @jdorweiler ok. So it sounds like fatheads should only be used for keys with a couple of terms. Most of them seem to be essentially definitions that don't really require extensive redirects. I'll check out the longtail when I have a chance. I'm guessing some magic is performed on the title field instead of the redirects. Shall I close this? |
@zachthompson Let's leave this open. What about reducing it down to [year][make][model] by stripping off the common words? |
@jdorweiler It already is year/make/model. In the case of the Frontier above, that's what the EPA considers a model. All of the descriptors like "2WD", "AWD", "V6", etc., are what distinguish them with respect to fuel economy. Though we visually comprehend that "Nissan Frontier X" all refer to the same basic model, the data aren't that way. So if we were to remove the common words, do you mean in the article or just for redirects? If we attempt that in the article, there are a few challenges. First, how to do it in a general way? Even if the model were split apart and rebuilt term by term, how do we know when the base model has been found? For example, if there were a "Town and Country *" and a "Town Car *", are these both model "Town" or completely separate? There are a lot of other models with numbers and letters where it just isn't obvious where the base model stops. Just removing common terms from articles leads to duplicates which have to be somehow resolved. In the case of the Frontier, for example, we would assume that they all refer to the same model. The removed terms would likely have to be relocated to the configurations below to distinguish them. We can't do this only for redirects since they would become ambiguous. If you or anyone else has experience with this type of thing or additional thoughts, I'm game for giving it a go. The longtail does sound like a better solution and addresses my earlier concern about generating redirects in a uniform way. However, I'm not sure how extensive the relevancy search is on the title(s). I'm assuming it does terms in any order, minimum number of terms to identify a single item, etc.? Or can longtails display multiple items if a single item isn't found? |
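The base-model question raised above can be illustrated with a naive shared-prefix grouping. This is a Python sketch with a hypothetical helper name (`common_prefix`), not something from the PR; it is here to show where the simple approach goes wrong.

```python
def common_prefix(models):
    """Longest run of leading terms shared by every model name."""
    split = [m.lower().split() for m in models]
    prefix = []
    for terms in zip(*split):
        if len(set(terms)) != 1:
            break
        prefix.append(terms[0])
    return " ".join(prefix)
```

For the Frontier variants this returns "frontier", which is what a reader would expect. For "Town and Country 2WD" and "Town Car" it returns "town", which wrongly merges two unrelated models, so a prefix rule alone cannot tell where the base model stops.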
Just to chime in, maybe we should reconsider the Spice route? If we can build a hash that maps the car names to IDs, or we can cleverly parse a query to get the make, model, and year, we can form an API request, I think? |
@moollaza I was looking at this a bit. The year, make, model API is pretty inflexible. You have to have the exact model for it to generate a hit as far as I can tell, e.g. Town and Country/Voyager/Grand Voy. 2WD. You might be able to utilize the year/make API, if they can be identified, and try to match the rest against one of the models returned. That seems pretty involved to do on the fly. The hash idea with names to id mappings sounds interesting. Were you thinking something like all of the articles + redirects (~1.25M) in the current output mapped to specific IDs? That probably wouldn't be too intense. However, it should be noted that each configuration within each article has a separate ID. For example, the 2005 Jetta above maps to 11 IDs, not one. In order to derive a single ID for the API we would need to generate the unique redirects for each configuration! |
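The name-to-ID hash discussed above might look like the following Python sketch. `build_id_index` and `resolve` are hypothetical names, and the IDs in the usage note are made up for illustration. It shows why an article with several configurations cannot be resolved to a single ID.

```python
from collections import defaultdict

def build_id_index(records):
    """Map a lowercased article title to the list of configuration IDs behind it."""
    index = defaultdict(list)
    for title, config_id in records:
        index[title.lower()].append(config_id)
    return dict(index)

def resolve(index, query):
    """Return the single ID for a query, or None when zero or several IDs match."""
    ids = index.get(query.lower(), [])
    return ids[0] if len(ids) == 1 else None
```

With records such as `("2012 Nissan Leaf", 101)`, `("2005 Volkswagen Jetta", 201)`, and `("2005 Volkswagen Jetta", 202)`, the Leaf resolves to 101 while the Jetta resolves to `None`, since a title alone does not identify one configuration.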
@zachthompson Good point. That really makes me think that this is better for a longtail. With the longtail things like
You can even search for If you want to try that out the output for a single entry would look like this:
You can just repeat the |
@moollaza Yeah, that would be awesome if it tiled on searches like that. I'll start converting it to a longtail and we can see how it compares. Should make the parsing much easier. |
Great, thanks for trying this out. 👍 |
@jdorweiler, as far as the multiple models go, is it possible to add a dropdown (similar to the nutrition one) for disambiguation? |
@chrismorast no but @zachthompson resubmitted this as a longtail which will show multiple models in a tile view. duckduckgo/zeroclickinfo-longtail#9 |
@zachthompson @jdorweiler do we still need this PR? Or are we indefinitely going with the Longtail? Just want to make sure we don't have any lingering PRs that need our attention :) |
@moollaza I think the fathead route has limitations that can't be overcome for the data. I'm ok with closing it unless @jdorweiler has other reasons not to. |
@zachthompson @moollaza Thanks. The longtail solves all the troubles we had here so let's go with that one. |
What does your Instant Answer do?
Fathead for EPA Fuel Economy data. Downloads source, parses it, and creates the standard output.txt file for fatheads.
What problem does your Instant Answer solve (Why is it better than organic links)?
-Displays city/hwy fuel economy directly.
-Gives ranges for city/hwy for models with multiple vehicle configurations
-Lists individual model configuration fuel economies.
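The city/hwy range summary listed above could be computed along these lines. This is a Python sketch, not the PR's Perl; `summarize` is a hypothetical name and the MPG values in the usage note are made up.

```python
def summarize(configs):
    """Return (city_range, hwy_range) strings for a list of (city_mpg, hwy_mpg) pairs."""
    def span(values):
        low, high = min(values), max(values)
        # A single value is shown on its own instead of as a degenerate range.
        return str(low) if low == high else f"{low}-{high}"
    city = span([c for c, _ in configs])
    hwy = span([h for _, h in configs])
    return city, hwy
```

For two configurations rated (27, 33) and (28, 35) this gives "27-28" city and "33-35" hwy. A model with a single configuration gets plain numbers rather than a range.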
What is the data source for your Instant Answer? (Provide a link if possible)
http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip
Why did you choose this data source?
Suggested in idea request
Are there any other alternative (better) data sources?
Not that I found with comprehensive fuel economy data.
What are some example queries that trigger this Instant Answer?
2011 Honda Fit fuel economy
2011 Honda Fit mpg (this could be optional)
Which communities will this Instant Answer be especially useful for? (gamers, book lovers, etc)
Anyone researching mileage for a vehicle
Is this Instant Answer connected to a DuckDuckHack Instant Answer idea?
Yes - https://duck.co/ideas/idea/4514/vehicle-fuel-efficiency
Which existing Instant Answers will this one supersede/overlap with?
None that I know of.
Are you having any problems? Do you need our help with anything?
How to best handle disambiguation pages and/or display is TBD. For example, a search for "1993 Colt fuel economy" could reference 1993 Dodge Colt or 1993 Plymouth Colt. Should the answer list links to those models or just list both models so no link is necessary? For now, ambiguous redirects like this are simply deleted.
Where did you hear about DuckDuckHack? (For first time contributors)
jobs-subscribe@perl.org I believe.
What does the Instant Answer look like? (Provide a screenshot for new or updated Instant Answers)
http://withoutopus.org/fueleconomy.htm
Checklist
Please place an 'X' where appropriate.