## Usage Example - Localizer()

In this notebook we use an example sentence to show how the Localizer class, which is built on Wikipedia pages, can be used. Along the way we will review the different functions of the class.

### Needed packages

"pip install wikipedia pandas numpy scikit-learn"

### Imports

The module handles its own imports, so we only need to import localizer (localizer.py) to get started.

In [1]:
import localizer

### Localizations

To work with this class you need a list of localizations, which are the places you want to search for using the Localizer class. In our case we will work with the sample_localizations that the class provides by default.

### Localizer class 

We will now instantiate an empty Localizer object, to which we will add the default localizations.

In [2]:
L = localizer.Localizer()
L.add_listLocation()
# L.add_listLocation(list) could be used if you want to add your own list of localizations

### get_WikiText()

This function fetches the content of each localization's page using the [wikipedia API](https://pypi.python.org/pypi/wikipedia) and removes all punctuation from the page text. (The same cleaning is applied to the input sentence, as described under search_for.)

In [3]:
L.get_WikiText()
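
The punctuation-stripping step can be sketched as follows. This is a minimal illustration, not the class's actual code: the helper name `strip_punctuation` is hypothetical, and it assumes that "punctuation" means the ASCII characters in `string.punctuation`, each replaced by a space.

```python
import string

# Hypothetical helper; the real get_WikiText() may clean text differently.
# Map every ASCII punctuation character to a space so that the words
# on either side of it stay separated.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def strip_punctuation(text):
    """Replace punctuation with spaces and collapse runs of whitespace."""
    return " ".join(text.translate(_PUNCT_TABLE).split())

print(strip_punctuation("Alabama (AL) is a state; its capital is Montgomery."))
# Alabama AL is a state its capital is Montgomery
```

Replacing punctuation with a space rather than deleting it keeps tokens such as "state;its" from being merged into a single word.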

### vectorizer()

This function applies a [tf-idf](https://en.wikipedia.org/wiki/Tf–idf) weighting using scikit-learn's CountVectorizer class, with English stop words by default (multilingual support is imaginable, since the function already takes the language as a parameter).
It returns a vector of tf-idf values and a corresponding vector of features (words). Both are also stored in the class directly.

In [4]:
L.vectorizer();
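
To make the weighting concrete, here is a dependency-free sketch of textbook tf-idf. It is not the class's implementation: the function name `tf_idf` is hypothetical, tokens are split on whitespace, no stop words are removed, and the inverse document frequency is the plain `log(N / df)`. Note that in scikit-learn, CountVectorizer on its own produces raw term counts, and the tf-idf weighting comes from TfidfTransformer or TfidfVectorizer, which by default use a smoothed idf and normalize each row, so their numbers will differ from this sketch.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per document (textbook tf-idf)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # term frequency * inverse document frequency
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

# Toy stand-ins for two cleaned Wikipedia pages.
pages = [
    "alabama is a southern state",
    "iowa is a midwestern state",
]
for page_weights in tf_idf(pages):
    print(page_weights)
```

Words that occur in every page ("is", "a", "state") get a weight of zero, while words unique to one page ("alabama", "iowa") get a positive weight. This is why tf-idf tends to surface terms that characterize a single localization.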

### make_map(top)

This function takes the top n tf-idf scores for each localization and returns the corresponding words as a pandas DataFrame.

In [5]:
L.make_map(25)
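
Selecting the top n words amounts to sorting by weight, as in the sketch below. The helper name `top_words` and the weights are invented for illustration, and the result is a plain list rather than the pandas DataFrame that make_map returns.

```python
def top_words(weights, top):
    """Return the `top` highest-weighted (word, weight) pairs."""
    # Sort by descending weight, then alphabetically so ties are deterministic.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]

# Illustrative weights only, not real tf-idf values.
alabama_weights = {"alabama": 0.42, "montgomery": 0.31, "state": 0.05, "tuscaloosa": 0.31}
print(top_words(alabama_weights, 3))
# [('alabama', 0.42), ('montgomery', 0.31), ('tuscaloosa', 0.31)]
```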

### search_for(sentence, top=10)

This function takes a sentence and outputs the top n localizations with the highest score for that sentence. It first removes all punctuation from the sentence, lowercases everything, and then checks whether each word is among the top tf-idf words of each localization.

In [6]:
print(L.search_for("Alabama, my home, my state"))

[('Alabama', 4), ('Iowa state', 3), ('New York state', 3), ('Rhode Island state', 3), ('South Dakota state', 3), ('Wisconsin state', 3), ('Alaska state', 2), ('Arkansas state', 2), ('California state', 2), ('Connecticut state', 2)]
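
The matching logic described above can be sketched as a word-overlap count. The function name `rank_places` and the toy index are hypothetical, and counting every token occurrence (repeats included) is an assumption, so the scores need not match those printed by the real class.

```python
import string

def rank_places(sentence, top_words_by_place, top=10):
    """Rank places by how many sentence tokens appear in their top words."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = sentence.translate(table).lower().split()
    scores = {
        place: sum(token in words for token in tokens)
        for place, words in top_words_by_place.items()
    }
    # Highest score first; ties are broken alphabetically by place name.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]

# Toy index: each place maps to a set of its top tf-idf words (invented here).
toy_index = {
    "Alabama": {"alabama", "state", "home"},
    "Iowa": {"iowa", "state"},
}
print(rank_places("Alabama, my home, my state", toy_index))
# [('Alabama', 3), ('Iowa', 1)]
```

Because every match counts as 1 regardless of its tf-idf weight, generic words such as "state" raise the score of many places at once, which is consistent with the many near-ties in the output above.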


### score(sentence, correct_value, top=10)

This function returns True or False depending on whether the predicted localizations (the top n) contain the correct localization.

In [7]:
print(L.score('Alabama, my home, my state', 'Alabama'))

True
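
In essence this is a top-n membership test. A minimal sketch, assuming the predictions are `(name, score)` pairs ordered best first, like the output of search_for (the helper name `in_top_n` is hypothetical):

```python
def in_top_n(predictions, correct_value, top=10):
    """Return True if correct_value is among the first `top` predicted names."""
    return any(name == correct_value for name, _ in predictions[:top])

# First three entries of the search_for output shown earlier.
predictions = [("Alabama", 4), ("Iowa state", 3), ("New York state", 3)]
print(in_top_n(predictions, "Alabama"))        # True
print(in_top_n(predictions, "Iowa state", 1))  # False
```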


## Possible applications

The class has been built to take easy-to-use inputs (strings). Be careful with the text encoding if you read the input text from a file. For analytic use, the score function can be used to check whether the classifier was correct or not.
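
When sentences come from a file, decoding them explicitly avoids surprises from the platform's default encoding. The sketch below assumes a UTF-8 file with one sentence per line; the helper name `read_sentences` and the file name in the usage comment are hypothetical.

```python
from pathlib import Path

def read_sentences(path, encoding="utf-8"):
    """Read one sentence per line, decoding the file explicitly."""
    text = Path(path).read_text(encoding=encoding)
    # Drop blank lines and surrounding whitespace.
    return [line.strip() for line in text.splitlines() if line.strip()]

# Possible usage with the Localizer instance L from this notebook:
# for sentence in read_sentences("sentences.txt"):
#     print(L.search_for(sentence))
```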

An interesting point of this workflow is that it could be used in any language and/or in multilanguage applications, since the information is retrieved from Wikipedia and the tf-idf step could be set to remove the stop words of another language.

### Further improvements

* Multilanguage support
* Non-linear prediction model
* Use the tf-idf value to get the score of each state in the search_for function