
Commit

Ignore any fragment identifier in URIs when setting info['location'] and info['content_location'] (closes #67)
nevali committed Nov 23, 2016
1 parent 38cd34e commit f5a5006
Showing 1 changed file with 15 additions and 2 deletions.
libcrawl/fetch.c (17 changes: 15 additions, 2 deletions)
@@ -450,8 +450,21 @@ crawl_generate_info_(struct crawl_fetch_data_struct *data, json_t *dict)
 	curl_easy_getinfo(data->ch, CURLINFO_EFFECTIVE_URL, &ptr);
 	if(ptr)
 	{
-		json_object_set_new(dict, "location", json_string(ptr));
-		json_object_set_new(dict, "content_location", json_string(ptr));
+		t = strchr(ptr, '#');
+		if(t)
+		{
+			/* If there's a fragment, store only the characters prior to it
+			 * (because fragments are a facet of user-agent behaviour, they
+			 * don't make any sense in Location or Content-Location headers)
+			 */
+			json_object_set_new(dict, "location", json_stringn(ptr, t - ptr));
+			json_object_set_new(dict, "content_location", json_stringn(ptr, t - ptr));
+		}
+		else
+		{
+			json_object_set_new(dict, "location", json_string(ptr));
+			json_object_set_new(dict, "content_location", json_string(ptr));
+		}
 	}
 	ptr = NULL;
 	curl_easy_getinfo(data->ch, CURLINFO_CONTENT_TYPE, &ptr);
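The diff handles the "fragment" and "no fragment" cases with strchr() and an if/else. A minimal sketch of an alternative, assuming only the C standard library: strcspn() returns the length of the prefix before the first '#', or the full string length when there is no '#', so both cases collapse into one path. Searching for the first '#' is safe because RFC 3986 reserves a literal '#' as the fragment delimiter; elsewhere in a URI it must be percent-encoded as %23. The helper names below are hypothetical and are not part of libcrawl.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libcrawl): length of the URI prefix
 * that precedes any '#' fragment delimiter. strcspn() returns the whole
 * string length when no '#' is present, so no separate branch is needed. */
static size_t uri_length_without_fragment(const char *uri)
{
	assert(uri != NULL);
	return strcspn(uri, "#");
}

/* Hypothetical helper: returns a newly allocated, NUL-terminated copy of
 * uri with any fragment removed, or NULL if allocation fails.
 * The caller is responsible for calling free() on the result. */
static char *uri_strip_fragment(const char *uri)
{
	size_t len = uri_length_without_fragment(uri);
	char *out = malloc(len + 1);

	if(!out)
	{
		return NULL;
	}
	memcpy(out, uri, len);
	out[len] = '\0';
	return out;
}
```

With Jansson 2.7 or later, the length alone is enough: passing it to json_stringn(ptr, uri_length_without_fragment(ptr)) would store the fragment-free prefix for both keys without an intermediate copy or a second branch.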

