# Count the search queries

Generate a count of how many times each search query (set of query words) has been submitted to the site.

## For Example
Users have submitted three search queries to the site: "Indiana", "Indiana" and "Indiana River". The code reports the following counts (queries appear in lowercase, as explained under Limitations):
- the query "indiana" was submitted 2 times
- the query "indiana river" was submitted 1 time

## Limitations
- Because users submit the same query with different casing (e.g. "Indiana" vs "indiana"), the code converts every query to lowercase before comparing. The side effect is that proper names are harder to spot in the output.
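
One possible way to soften this limitation, offered only as a sketch and not as what the code below does, is to group queries by their lowercase form but display each group under its most frequently submitted original spelling. The function name `count_preserving_case` and the sample list are hypothetical.

```python
from collections import Counter, defaultdict

def count_preserving_case(queries):
    """Count queries case-insensitively, labelling each group with
    its most common original spelling (hypothetical helper)."""
    # lowercase key -> Counter of the original spellings seen for it
    spellings = defaultdict(Counter)
    for query in queries:
        spellings[query.lower()][query] += 1

    counts = {}
    for variants in spellings.values():
        label = variants.most_common(1)[0][0]   # most frequent spelling
        counts[label] = sum(variants.values())  # total across all casings
    return counts

print(count_preserving_case(["Indiana", "indiana", "Indiana", "Indiana River"]))
# {'Indiana': 3, 'Indiana River': 1}
```

When two spellings are equally common, the one seen first is used as the label.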

In [3]:
from csv import reader
from operator import itemgetter

searches_filename = 'searchresults.csv'
delimiter_character = ';'
query_string_position_in_searches_file = 1 # search query strings appear in 2nd column

search_queries = []  # every query string, in file order
results = {}         # lowercase query -> number of submissions

with open(searches_filename, 'r', newline='') as s:  # newline='' is what the csv module recommends
    for row in reader(s, delimiter = delimiter_character, skipinitialspace = True):
        search_queries.append(row[query_string_position_in_searches_file])

unique_search_queries_set = {i.lower() for i in search_queries}

unique_search_queries = list(unique_search_queries_set)

for unique_query in unique_search_queries[:99]: # slicing so that testing doesn't take forever, until the double-for-loop gets fixed
    occurrences = 0
    for query in search_queries:
        if unique_query == query.lower():  # compare in lowercase, matching the lowercased unique queries
            occurrences += 1
    results[unique_query] = occurrences # add to the dictionary
#print(results)

sorted_results = sorted(results.items(), key=itemgetter(1), reverse=True)
#print(sorted_results)

for result in sorted_results:
    print(f'The query "{result[0]}" was submitted {result[1]} times')

# TODO: start attacking the "for in for" performance issue - try building a dictionary of {unique, count} as early as possible
# TODO: -- to make this easier, start carving off trivial functions 
# TODO: -- ultimately boil this down to a single for (or while) loop, which will radically improve performance
# TODO: determine why sorted_results prints as a list, not a dictionary
# TODO: remove leading space(s) from search queries e.g. " usgs"
# TODO: remove inner punctuation from search queries such as comma
# TODO: strip out those search queries that simply submitted one or more URLs (prefixed with http://)

The query "middle fork" was submitted 26 times
The query "black river" was submitted 20 times
The query "waiver" was submitted 11 times
The query "blue river colorado" was submitted 6 times
The query "facts" was submitted 6 times
The query "frying pan" was submitted 6 times
The query "nisqually" was submitted 5 times
The query "rafts for sale" was submitted 4 times
The query "water release" was submitted 4 times
The query "ogden" was submitted 4 times
The query "georgia rafting" was submitted 4 times
The query "marshall nc" was submitted 3 times
The query "cfm" was submitted 3 times
The query "northwestmountiansports" was submitted 3 times
The query "sandy beach" was submitted 3 times
The query "floating" was submitted 2 times
The query "st. francis" was submitted 2 times
The query "pacific northwest" was submitted 2 times
The query "rafting in georgia" was submitted 2 times
The query "tarriffville" was submitted 2 times
The query "new stanton " was submitted 2 times
The query "red riv
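
The TODO items in the code cell above (replace the nested loops with a single pass, trim leading spaces, skip queries that are only URLs) could be approached roughly as follows. This is a minimal sketch under stated assumptions, not the notebook's implementation: `normalize`, `count_queries` and the `sample` list are hypothetical, and skipping `https://` as well as `http://` is an assumption beyond what the TODO asks for.

```python
from collections import Counter

def normalize(query):
    """Lowercase a query and trim surrounding whitespace (hypothetical helper)."""
    return query.strip().lower()

def count_queries(queries):
    """Count normalized queries in a single pass, skipping empty
    queries and queries that start with a URL scheme."""
    counts = Counter()
    for query in queries:
        key = normalize(query)
        # Assumption: https:// is skipped too, not only http://
        if not key or key.startswith(("http://", "https://")):
            continue
        counts[key] += 1
    return counts

sample = ["Indiana", " indiana", "Indiana River", "http://example.com"]
for query, count in count_queries(sample).most_common():
    print(f'The query "{query}" was submitted {count} times')
```

`Counter.most_common()` returns a list of `(query, count)` tuples ordered by descending count. The built-in `sorted()` likewise always returns a list, which is why `sorted_results` prints as a list rather than a dictionary.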

## References
- http://stackoverflow.com/questions/12897374/get-unique-values-from-a-list-in-python
- http://stackoverflow.com/questions/613183/sort-a-python-dictionary-by-value