# Index
1. Import Section
2. EDA
3. Baseline Model
4. Model Enhancements
    <br>4.1 dealing with abbreviation issue:
    <br>4.2 dealing with contractions issue:
    <br>4.3 dealing with URL's/HTML tags issue:
    <br>4.4 dealing with Emojis issue:
    <br>4.5 dealing with numbers issue:
    <br>4.6 dealing with punctuation issue:
    <br>4.7 removing stopwords:
    <br>4.8 lemmatizing/stemming:
5. Result
6. Other tasks
7. Future Enhancements
8. References

## 1. Import section

In [1140]:
#!pip install inflect
#!pip install num2word
#!pip install num2words
#!pip install contractions
#!pip install emoji

In [1141]:
import pandas as pd
import numpy as np
from Mo import moFunctions
from sklearn.feature_extraction.text import CountVectorizer,HashingVectorizer,TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, f1_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import RidgeClassifier
import re,csv
import string
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import word_tokenize
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from spellchecker import SpellChecker
from sklearn.naive_bayes import MultinomialNB
import inflect, num2word,num2words,contractions, emoji 
from nltk.tokenize import word_tokenize


## 2. EDA

In [1142]:
trainDF = pd.read_csv('train.csv')
testDF = pd.read_csv('test.csv')
submitDF = pd.read_csv('sample_submission.csv')

In [1143]:
trainDF.head()

Unnamed: 0,id,keyword,location,text,target
0,1,,,Our Deeds are the Reason of this #earthquake M...,1
1,4,,,Forest fire near La Ronge Sask. Canada,1
2,5,,,All residents asked to 'shelter in place' are ...,1
3,6,,,"13,000 people receive #wildfires evacuation or...",1
4,7,,,Just got sent this photo from Ruby #Alaska as ...,1


In [1144]:
trainDF.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7613 entries, 0 to 7612
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        7613 non-null   int64 
 1   keyword   7552 non-null   object
 2   location  5080 non-null   object
 3   text      7613 non-null   object
 4   target    7613 non-null   int64 
dtypes: int64(2), object(3)
memory usage: 297.5+ KB


In [1145]:
testDF.head()

Unnamed: 0,id,keyword,location,text
0,0,,,Just happened a terrible car crash
1,2,,,"Heard about #earthquake is different cities, s..."
2,3,,,"there is a forest fire at spot pond, geese are..."
3,9,,,Apocalypse lighting. #Spokane #wildfires
4,11,,,Typhoon Soudelor kills 28 in China and Taiwan


In [1146]:
testDF.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3263 entries, 0 to 3262
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        3263 non-null   int64 
 1   keyword   3237 non-null   object
 2   location  2158 non-null   object
 3   text      3263 non-null   object
dtypes: int64(1), object(3)
memory usage: 102.1+ KB


In [1147]:
submitDF.head()

Unnamed: 0,id,target
0,0,0
1,2,0
2,3,0
3,9,0
4,11,0


In [1148]:
trainDF.target.value_counts()

0    4342
1    3271
Name: target, dtype: int64

#### Checking null values

In [1149]:
moFunctions.checkNulls(trainDF)

keyword       61
location    2533
dtype: int64

In [1150]:
moFunctions.checkNulls(testDF)

keyword       26
location    1105
dtype: int64

In [1151]:
moFunctions.checkNulls(submitDF)

Series([], dtype: int64)

In [1152]:
trainDF.keyword.value_counts()

fatalities               45
armageddon               42
deluge                   42
body%20bags              41
harm                     41
                         ..
forest%20fire            19
epicentre                12
threat                   11
inundation               10
radiation%20emergency     9
Name: keyword, Length: 221, dtype: int64

In [1153]:
trainDF.location.value_counts()

USA                     104
New York                 71
United States            50
London                   45
Canada                   29
                       ... 
Tama, Iowa                1
Lake Highlands            1
Minneapolis/St. Paul      1
Metro Manila              1
Halfrica                  1
Name: location, Length: 3341, dtype: int64

In [1154]:
trainDF["text"].values[2]

"All residents asked to 'shelter in place' are being notified by officers. No other evacuation or shelter in place orders are expected"

In [1155]:
trainDF[trainDF['id'] == 48]

Unnamed: 0,id,keyword,location,text,target
31,48,ablaze,Birmingham,@bbcmtd Wholesale Markets ablaze http://t.co/l...,1


In [1156]:
testDF['keyword'] = testDF['keyword'].replace(np.nan, '')
trainDF['keyword'] = trainDF['keyword'].replace(np.nan, '')
testDF['location'] = testDF['location'].replace(np.nan, '')
trainDF['location'] = trainDF['location'].replace(np.nan, '')

## 3. Baseline Model
Since we are going to predict that this tweet is disaster or not then we need first to create the sparse matrix that is needed for the classification model

we'll use a CountVectorizer as a start

In [1114]:
countVec = CountVectorizer()
trainSparseMat = countVec.fit_transform(trainDF["text"])

In [1115]:
trainSparseMat

<7613x21637 sparse matrix of type '<class 'numpy.int64'>'
	with 111497 stored elements in Compressed Sparse Row format>

In [1116]:
print(trainSparseMat[0:1].todense())

[[0 0 0 ... 0 0 0]]


Similarily we create the testSparseMat

In [1117]:
testSparseMat = countVec.transform(testDF["text"])

In [1118]:
print(trainSparseMat.todense().shape)
print(trainSparseMat.todense())

(7613, 21637)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]


In [1119]:
print(testSparseMat.todense().shape)
print(testSparseMat.todense())

(3263, 21637)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]


then using logistic regression i'm going to use it as my base model that i'll try to enhance afterwards

lets first form our train/test X&y


In [1157]:
countVec = CountVectorizer()
trainSparseMat = countVec.fit_transform(trainDF["text"])
testSparseMat = countVec.transform(testDF["text"])
Xtrain = trainSparseMat
ytrain = trainDF['target']
Xtest = testSparseMat
ytest = submitDF['target']
logreg = LogisticRegression()
logreg.fit(Xtrain, ytrain)
y_pred = logreg.predict(Xtest)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(
    logreg.score(Xtest, ytest)))
##----
predictions = logreg.predict(Xtest)
print(classification_report(ytest, predictions))
print('Average f1_score: {:.2f}'.format(
    f1_score(ytest, predictions, average='weighted')))

Accuracy of logistic regression classifier on test set: 0.63
              precision    recall  f1-score   support

           0       1.00      0.63      0.78      3263
           1       0.00      0.00      0.00         0

    accuracy                           0.63      3263
   macro avg       0.50      0.32      0.39      3263
weighted avg       1.00      0.63      0.78      3263

Average f1_score: 0.78


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))


then our base accuracy is **63%** and f1 score is **0.78** and let's see how we can improve this

## 4. Model Enhancements 
**(reload dataframes again ie. run step 1 then 3)**

#### 4.1 Appreviations issue

In [1088]:
def translator(user_string):
    user_string = user_string.split(" ")
    j = 0
    for _str in user_string:
        # File path which consists of Abbreviations.
        fileName = "slang.txt"

        # File Access mode [Read Mode]
        with open(fileName, "r") as myCSVfile:
            # Reading file as CSV with delimiter as "=", so that abbreviation are stored in row[0] and phrases in row[1]
            dataFromFile = csv.reader(myCSVfile, delimiter="=")
            # Removing Special Characters.
            _str = re.sub('[^a-zA-Z0-9]+', '', _str)
            for row in dataFromFile:
                # Check if selected word matches short forms[LHS] in text file.
                if _str.upper() == row[0]:
                    # If match found replace it with its appropriate phrase in text file.
                    user_string[j] = row[1]
            myCSVfile.close()
        j = j + 1
    return ' '.join(user_string)


In [1089]:
trainDF['text'] = trainDF['text'].apply(lambda x:  translator(x)  ) 
testDF['text'] = testDF['text'].apply(lambda x:  translator(x)  ) 

## Cleaning Text

In [1090]:
def cleanText(text):
    #emojies to text
    #text = emoji.demojize(text)
    #text = ' '.join(word_tokenize(text))
    # fixing contraction issue ie. i'm, don't,..
    text = contractions.fix(text)

    # removingURL's
    text = re.sub(r'http\S+', '', text)
    # removing mentioned
    #text = re.sub("@[A-Za-z0-9]+","",text)
    # removingHtmlTags
    html=re.compile(r'<.*?>')
    # removingweird characters
    text = html.sub(r'',text)
    text = ''.join([x for x in text if x in string.printable])
    # removingEmojies
#     emojies = re.compile("["
#                            u"\U0001F600-\U0001F64F"
#                            u"\U0001F300-\U0001F5FF"
#                            u"\U0001F680-\U0001F6FF"
#                            u"\U0001F1E0-\U0001F1FF"
#                            u"\U00002702-\U000027B0"
#                            u"\U000024C2-\U0001F251"
#                            "]+", flags=re.UNICODE)
#     text = emojies.sub(r'', text)
    # remove numbers
    # text = "".join([i for i in text if not i.isdigit()])
    # trying to replace numbers
    new_words = []
    for word in text:
        if word.isdigit():
            new_word = num2words.num2words(word)
            new_words.append(new_word)
        else:
            new_words.append(word)
    text = new_words
     # remove punctuation
    text = "".join([i for i in text if i not in string.punctuation])
    # remove stopwords
    #text = "".join([i for i in text if i not in stopwords.words('english')])
    #text_tokens = word_tokenize(text)
    #text = (" ").join([word for word in text_tokens if not word in stopwords.words()])
    # Lemmatize
    #lemmatizer = WordNetLemmatizer()
    #text = "".join([lemmatizer.lemmatize(i) for i in text])
    porter = PorterStemmer()
    text = "".join([porter.stem(i) for i in text])
    return text

In [1091]:
trainDF['text'] = trainDF['text'].map( lambda text : cleanText(text))
testDF['text'] = testDF['text'].map( lambda text : cleanText(text))
#trainDF['text'] = trainDF['text'].map(lambda x : ' '.join(word_tokenize(x)))
#testDF['text'] = testDF['text'].map(lambda x : ' '.join(word_tokenize(x)))

In [1092]:
# trying to see lowercasing and \n replacement
trainDF['text']=trainDF['text'].str.lower()
testDF['text']=testDF['text'].str.lower()
trainDF['text']=trainDF['text'].str.replace("\n"," ")
testDF['text']=testDF['text'].str.replace("\n"," ")

In [1093]:
# trainDF[trainDF['id'] == 48]

In [1094]:
# for index, row in testDF.iterrows():
#     print(row['text'])

In [1095]:
# best 0.80879

countVec = CountVectorizer(ngram_range=(1,2), binary=True, stop_words='english')
trainSparseMat = countVec.fit_transform(trainDF["text"])
testSparseMat = countVec.transform(testDF["text"])
Xtrain = trainSparseMat
ytrain = trainDF['target']
Xtest = testSparseMat
ytest = submitDF['target']
logreg = LogisticRegression()
logreg.fit(Xtrain, ytrain)
y_pred = logreg.predict(Xtest)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(
    logreg.score(Xtest, ytest)))
##----
predictions = logreg.predict(Xtest)
print(classification_report(ytest, predictions))

print('Average f1_score: {:.2f}'.format(
    f1_score(ytest, predictions, average='weighted')))


Accuracy of logistic regression classifier on test set: 0.67
              precision    recall  f1-score   support

           0       1.00      0.67      0.80      3263
           1       0.00      0.00      0.00         0

    accuracy                           0.67      3263
   macro avg       0.50      0.33      0.40      3263
weighted avg       1.00      0.67      0.80      3263

Average f1_score: 0.80


  _warn_prf(average, modifier, msg_start, len(result))


In [1096]:
submitDF['target'] = predictions
submitDF.to_csv('my_submission24.csv',index=False)

Best f1-score  so far (**0.80879**).
<img src="bestscore3.png">

# 5. Other tasks

In [1097]:
# #Trying other ML/Vectorizer


# words = countVec.get_feature_names()
# words

In [933]:
# def convert_emojis(text):
#     for emot in UNICODE_EMO:
#         text = re.sub(r'('+emot+')', "_".join(UNICODE_EMO[emot].replace(",","").replace(":","").split()), text)
#     return text

In [934]:
# text = convert_emojis(testDF['text'][80] ) 
# text

In [894]:
# now check if we can unify the countries as much as we can

unify_country = { '  Glasgow ' : 'GBR'
, '  Melbourne, Australia' : 'AUS'
, ' 616 ¬â√õ¬¢ Kentwood , MI ' : 'USA'
, ' Alberta' : 'CAN'
, ' Eugene, Oregon' : 'USA'
, ' Indiana' : 'USA'
, ' Little Rock, AR' : 'USA'
, ' Miami Beach' : 'USA'
, ' Nevada Carson City,Freeman St' : 'USA'
, ' Neverland ' : 'USA'
, ' New Delhi ' : 'IND'
, ' New England' : 'UK'
, ' Nxgerxa' : 'USA'
, ' Quantico Marine Base, VA.' : 'USA'
, ' Queensland, Australia' : 'AUS'
, ' Tropical SE FLorida' : 'USA'
, '? miranda ? 521 mi' : 'USA'
, '???????, Texas' : 'USA'
, '?????????????, Thailand ' : 'THA'
, '\'Merica' : 'USA'
, '\'SAN ANTONIOOOOO\'' : 'USA'
, '(Spain)' : 'ESP'
, '#EngleWood CHICAGO ' : 'USA'
, '#FLIGHTCITY UK  ' : 'UK'
, '#goingdownthetoilet Illinois' : 'USA'
, '#iminchina' : 'CHN'
, '#NewcastleuponTyne #UK' : 'UK'
, '#SOUTHAMPTON ENGLAND' : 'UK'
, '#WashingtonState #Seattle' : 'USA'
, '|| c h i c a g o ||' : 'USA'
, '√•_√•_Los Mina City¬â√£¬¢' : 'USA'
, '√•¬°√•¬°Midwest ¬â√õ¬¢¬â√õ¬¢' : 'USA'
, '√å√∏√•√Ä√•_T: 40.736324,-73.990062' : 'USA'
, '√å√èT: -26.695807,27.837865' : 'ZAF'
, '√å√èT: 1.50225,103.742992' : 'MYS'
, '√å√èT: 10.614817868480726,12.195582811791382' : 'NGA'
, '√å√èT: 19.123127,72.825133' : 'IND'
, '√å√èT: 27.9136024,-81.6078532' : 'USA'
, '√å√èT: 30.307558,-81.403118' : 'USA'
, '√å√èT: 33.209923,-87.545328' : 'USA'
, '√å√èT: 35.223347,-80.827834' : 'USA'
, '√å√èT: 36.142163,-95.979189' : 'USA'
, '√å√èT: 39.982988,-75.261624' : 'USA'
, '√å√èT: 40.562796,-75.488849' : 'USA'
, '√å√èT: 40.707762,-74.014213' : 'USA'
, '√å√èT: 41.252426,-96.072013' : 'USA'
, '√å√èT: 42.910975,-78.865828' : 'USA'
, '√å√èT: 43.631838,-79.55807' : 'CAN'
, '√å√èT: 6.4682,3.18287' : 'NGA'
, '√å√èT: 6.488400524109015,3.352798039832285' : 'NGA'
, '11th dimension, los angeles' : 'USA'
, '1313 W.Patrick St, Frederick' : 'USA'
, '140920-21 & 150718-19 BEIJING' : 'CHN'
, '1648 Queen St. West, Toronto.' : 'CAN'
, '19.600858, -99.047821' : 'MEX'
, '21.462446,-158.022017' : 'USA'
, '261 5th Avenue New York, NY ' : 'USA'
, '2B Hindhede Rd, Singapore' : 'SGP'
, '412 NW 5th Ave. Portland OR' : 'USA'
, '48.870833,2.399227' : 'FRA'
, '50% Queanbeyan - 50% Sydney' : 'AUS'
, '518 √•√° NY' : 'USA'
, '52.479722, 62.184971' : 'KAZ'
, '570 Vanderbilt; Brooklyn, NY' : 'USA'
, '828/704(Soufside)/while looking goofy in NJ' : 'USA'
, 'Ab, Canada' : 'CAN'
, 'Aberdeenshire' : 'UK'
, 'Absecon, NJ' : 'USA'
, 'Abuja' : 'NGA'
, 'Abuja, Nigeria' : 'NGA'
, 'Abuja,Nigeria' : 'NGA'
, 'ACCRA GHANA' : 'GHA'
, 'Accra,Ghana' : 'GHA'
, 'Adelaide, Australia' : 'AUS'
, 'Adelaide, South Australia' : 'AUS'
, 'Afghanistan' : 'AFG'
, 'Afghanistan, USA' : 'USA'
, 'Aix-en-Provence, France' : 'FRA'
, 'AKRON OHIO USA' : 'USA'
, 'Alabama' : 'USA'
, 'Alabama, USA' : 'USA'
, 'Alameda and Pleasanton, CA' : 'USA'
, 'Alameda, CA' : 'USA'
, 'Alaska' : 'USA'
, 'Alaska, USA' : 'USA'
, 'Albany/NY' : 'USA'
, 'Alberta ' : 'CAN'
, 'Alberta | Sask. | Montana' : 'CAN'
, 'Alberta Pack' : 'CAN'
, 'alberta, canada' : 'CAN'
, 'Alberta, VA' : 'CAN'
, 'Albuquerque New Mexico' : 'MEX'
, 'Alexandria, Egypt.' : 'EGY'
, 'Alexandria, VA' : 'USA'
, 'Alexandria, VA, USA' : 'USA'
, 'Alger-New York-San Francisco' : 'USA'
, 'Alicante, Spain' : 'ESP'
, 'Alicante, Valencia' : 'ESP'
, 'Alliston Ontario' : 'CAN'
, 'Alphen aan den Rijn, Holland' : 'NLD'
, 'Alvin, TX' : 'USA'
, 'Amazon Seller , Propagandist' : 'USA'
, 'America' : 'USA'
, 'America | New Zealand ' : 'USA'
, 'America of Founding Fathers' : 'USA'
, 'Americas Newsroom' : 'USA'
, 'Ames, IA' : 'USA'
, 'Ames, Iowa' : 'USA'
, 'Amman, Jordan' : 'JOR'
, 'Amman,Jordan' : 'JOR'
, 'Amsterdam' : 'NLD'
, 'amsterdayum 120615 062415' : 'NLD'
, 'Anaheim' : 'USA'
, 'Anchorage, AK' : 'USA'
, 'Anderson, SC' : 'USA'
, 'Ankara - Malatya - ad Orontem' : 'TUR'
, 'Anna Maria, FL' : 'USA'
, 'Annapolis, MD' : 'USA'
, 'Antigua ?? NYC ' : 'USA'
, 'Antioch, CA ' : 'USA'
, 'antioch, california' : 'USA'
, 'anzio,italy' : 'ITA'
, 'ARGENTINA' : 'ARG'
, 'ARIZONA' : 'USA'
, 'Arizona ' : 'USA'
, 'Arkansas' : 'USA'
, 'Arkansas, Jonesboro' : 'USA'
, 'Arlington, TX' : 'USA'
, 'Arlington, VA' : 'USA'
, 'Arlington, VA and DC' : 'USA'
, 'Arnhem, the Netherlands' : 'NLD'
, 'Arthas US' : 'USA'
, 'Arundel ' : 'USA'
, 'Arvada, CO' : 'USA'
, 'Asgard' : 'USA'
, 'Ashburn, VA' : 'USA'
, 'Asheboro, NC' : 'USA'
, 'Asheville, NC' : 'USA'
, 'Ashford, Kent, United Kingdom' : 'UK'
, 'Ashland, Oregon' : 'USA'
, 'Asunci√å_n-PY / T√å_bingen-GER' : 'GER'
, 'Athens - Nicosia' : 'GRC'
, 'Athens, Greece' : 'GRC'
, 'Athens,Greece' : 'GRC'
, 'ATL ? SEA ' : 'USA'
, 'ATL ??' : 'USA'
, 'ATL, GA' : 'USA'
, 'ATL??AL??' : 'USA'
, 'Atlanta' : 'USA'
, 'Atlanta - FAU class of \'18' : 'USA'
, 'ATLANTA , GEORGIA ' : 'USA'
, 'Atlanta g.a.' : 'USA'
, 'Atlanta Georgia' : 'USA'
, 'Atlanta Georgia ' : 'USA'
, 'Atlanta, GA' : 'USA'
, 'Atlanta, Georgia' : 'USA'
, 'Atlanta, Georgia USA' : 'USA'
, 'Atlanta,Ga' : 'USA'
, 'Atlanta(ish), GA' : 'USA'
, 'Atlantic Highlands, NJ' : 'USA'
, 'Atlantic, IA' : 'USA'
, 'Auburn ' : 'USA'
, 'Auburn, AL' : 'USA'
, 'Auckland' : 'USA'
, 'Auckland, New Zealand' : 'USA'
, 'Augusta, GA' : 'USA'
, 'Augusta, Maine, 04330' : 'USA'
, 'Aurora, IL' : 'USA'
, 'Aurora, Ontario ' : 'CAN'
, 'AUS' : 'AUS'
, 'Austin' : 'USA'
, 'Austin | San Diego' : 'USA'
, 'Austin TX' : 'USA'
, 'Austin, Texas' : 'USA'
, 'Austin, TX' : 'USA'
, 'Austin/Los Angeles' : 'USA'
, 'Australia' : 'AUS'
, 'Australia ' : 'AUS'
, 'AUSTRALIA-SOUTHAFRICA-CAMBODIA' : 'AUS'
, 'Australian Capital Territory' : 'AUS'
, 'Aveiro, Portugal' : 'PRT'
, 'Avon' : 'USA'
, 'Avon, OH' : 'USA'
, 'AZ' : 'USA'
, 'Back East in PA' : 'USA'
, 'Bahrain' : 'BHR'
, 'Baker City Oregon' : 'USA'
, 'Bakersfield, CA' : 'USA'
, 'Bakersfield, California' : 'USA'
, 'Balikesir - Eskisehir' : 'USA'
, 'Baltimore' : 'USA'
, 'Baltimore, MD' : 'USA'
, 'Bandar Lampung, Indonesia' : 'IDN'
, 'Bandung' : 'IDN'
, 'bangalore' : 'IND'
, 'Bangalore City, India' : 'IND'
, 'Bangalore, India' : 'IND'
, 'Bangalore. India' : 'IND'
, 'Bangkok' : 'THA'
, 'Bangkok Thailand' : 'THA'
, 'Bangor, Co.Down' : 'UK'
, 'Barcelona, Spain' : 'ESP'
, 'Bartholomew County, Indiana' : 'USA'
, 'Based in CA - Serve Nationwide' : 'USA'
, 'Based out of Portland, Oregon' : 'USA'
, 'Basketball City, USA ' : 'USA'
, 'Basking Ridge, NJ' : 'USA'
, 'Bathtub de Bett ' : 'USA'
, 'Baton Rouge' : 'USA'
, 'Baton Rouge, LA' : 'USA'
, 'Bay Area' : 'USA'
, 'Bay Area, CA' : 'USA'
, 'Baydestrian' : 'USA'
, 'Bayonne, NJ' : 'USA'
, 'BC' : 'USA'
, 'Beacon Hills' : 'USA'
, 'beacon hills ' : 'USA'
, 'Beaumont, TX' : 'USA'
, 'Beautiful British Columbia' : 'CAN'
, 'Bedford IN ' : 'USA'
, 'Bedford, England' : 'UK'
, 'beijing .China' : 'CHN'
, 'Beirut, Lebanon' : 'LBN'
, 'Beirut/Toronto' : 'CAN'
, 'Belbroughton, England' : 'UK'
, 'Belfast' : 'UK'
, 'Belgium' : 'BEL'
, 'Belgrade' : 'SRB'
, 'belleville' : 'USA'
, 'Belleville, Illinois' : 'USA'
, 'Bellevue NE' : 'USA'
, 'Bellville, Ohio' : 'USA'
, 'Bend, Oregon' : 'USA'
, 'Benicia, CA ' : 'USA'
, 'Benton City, Washington' : 'USA'
, 'Berlin - Germany' : 'GER'
, 'Berlin, Germany' : 'GER'
, 'BIG D  HOUSTON/BOSTON/DENVER' : 'USA'
, 'Biloxi, Mississippi' : 'USA'
, 'Birdland, New Meridian, FD' : 'USA'
, 'Birmingham' : 'UK'
, 'Birmingham & Bristol' : 'UK'
, 'Birmingham and the Marches' : 'UK'
, 'Birmingham UK' : 'UK'
, 'Birmingham, England' : 'UK'
, 'Birmingham, UK' : 'UK'
, 'Birmingham, United Kingdom' : 'UK'
, 'Bishops Lydeard, England' : 'UK'
, 'Bishops Stortford, England' : 'UK'
, 'Black Canyon New River, AZ' : 'USA'
, 'Blackpool' : 'UK'
, 'Blackpool, England, UK.' : 'UK'
, 'Bleak House' : 'UK'
, 'Bloomington, IN' : 'USA'
, 'Bloomington, Indiana' : 'USA'
, 'Boise, Idaho' : 'USA'
, 'Bolivar, MO' : 'USA'
, 'Bolton & Tewkesbury, UK' : 'UK'
, 'Bombardment Bay' : 'USA'
, 'Bon Temps Louisiana' : 'USA'
, 'Books Published, USA' : 'USA'
, 'Born in Baltimore Living in PA' : 'USA'
, 'Boston' : 'USA'
, 'Boston ¬â√õ¬¢ Cape Cod ?' : 'USA'
, 'Boston MA' : 'USA'
, 'Boston, MA' : 'USA'
, 'Boston, Massachusetts' : 'USA'
, 'Boston/Montreal ' : 'USA'
, 'Boulder' : 'USA'
, 'Boulder, CO' : 'USA'
, 'Bournemouth' : 'USA'
, 'Bournemouth, Dorset, UK' : 'UK'
, 'Bow, NH' : 'USA'
, 'Bozeman, Montana' : 'USA'
, 'Brackley Beach, PE, Canada' : 'CAN'
, 'Brasil' : 'BRA'
, 'Brasil, Fortaleza ce' : 'BRA'
, 'Brasil,SP' : 'BRA'
, 'Brazil' : 'BRA'
, 'Brazil ' : 'BRA'
, 'Brazos Valley, Texas' : 'USA'
, 'Brecksville, OH' : 'USA'
, 'Bremerton, WA' : 'USA'
, 'Brentwood, NY' : 'USA'
, 'Brentwood,TN' : 'USA'
, 'Bridport, England' : 'UK'
, 'Brighton and Hove' : 'UK'
, 'Brisbane' : 'AUS'
, 'Brisbane Australia' : 'AUS'
, 'brisbane, australia' : 'AUS'
, 'Brisbane, Queensland' : 'AUS'
, 'Brisbane.' : 'AUS'
, 'Bristol' : 'UK'
, 'Bristol, England' : 'UK'
, 'Bristol, UK' : 'UK'
, 'British Columbia, Canada' : 'CAN'
, 'British girl in Texas' : 'USA'
, 'Bronx NY' : 'USA'
, 'Bronx NYC / M-City NY' : 'USA'
, 'Bronx, New York' : 'USA'
, 'Bronx, NY' : 'USA'
, 'Brooklyn' : 'USA'
, 'Brooklyn, New York' : 'USA'
, 'Brooklyn, NY' : 'USA'
, 'BROOKLYN, NYC' : 'USA'
, 'Broomfield, CO' : 'USA'
, 'BrowardCounty // Florida ' : 'USA'
, 'Bucks County, Pa' : 'USA'
, 'Budapest, Hungary' : 'HUN'
, 'Buenos Aires' : 'ARG'
, 'buenos aires argentina' : 'ARG'
, 'Buenos Aires, Argentina' : 'ARG'
, 'Buffalo NY' : 'USA'
, 'Buffalo, NY' : 'USA'
, 'Burbank,CA' : 'USA'
, 'Bushkill pa' : 'USA'
, 'CA' : 'USA'
, 'CA physically- Boston Strong?' : 'USA'
, 'Cairo, Egypt' : 'EGY'
, 'Cairo, Egypt.' : 'EGY'
, 'Calgary' : 'CAN'
, 'Calgary, AB' : 'CAN'
, 'Calgary, AB, Canada' : 'CAN'
, 'Calgary, Alberta' : 'CAN'
, 'Calgary, Alberta, Canada' : 'CAN'
, 'Calgary, Canada' : 'CAN'
, 'calgary,ab' : 'CAN'
, 'Calgary,AB, Canada' : 'CAN'
, 'Calgary/Airdrie/RedDeer/AB' : 'CAN'
, 'California' : 'USA'
, 'California ' : 'USA'
, 'california | oregon | peru |' : 'USA'
, 'california mermaid ? ' : 'USA'
, 'California or Colorado' : 'USA'
, 'California, United States' : 'USA'
, 'California, USA' : 'USA'
, 'CAMARILLO, CA' : 'USA'
, 'Cambridge, Massachusetts' : 'USA'
, 'Cambridge, Massachusetts, U.S.' : 'USA'
, 'Canada' : 'CAN'
, 'Canada ' : 'CAN'
, 'Canada BC' : 'CAN'
, 'Canada Eh! ' : 'CAN'
, 'Canberra, Australian Capital Territory' : 'AUS'
, 'Cape Cod, Massachusetts USA' : 'USA'
, 'Cape Town' : 'ZAF'
, 'Cape Town, Khayelitsha' : 'ZAF'
, 'Cardiff, UK' : 'UK'
, 'Carol Stream, Illinois' : 'USA'
, 'Caserta-Roma, Italy ' : 'ITA'
, 'Cassadaga Florida' : 'USA'
, 'Castaic, CA' : 'USA'
, 'Catalonia, Spain' : 'ESP'
, 'Central Coast, California' : 'USA'
, 'Central Florida' : 'USA'
, 'Central Illinois' : 'USA'
, 'Chandler, AZ' : 'USA'
, 'Chapel Hill, NC' : 'USA'
, 'Chappaqua NY and Redlands CA' : 'USA'
, 'Charleston, IL' : 'USA'
, 'Charlotte' : 'USA'
, 'Charlotte ' : 'USA'
, 'Charlotte County Florida' : 'USA'
, 'Charlotte NC' : 'USA'
, 'Charlotte, N.C.' : 'USA'
, 'Charlotte, NC' : 'USA'
, 'Charlotte, NC | K√å¬¶ln, NRW' : 'USA'
, 'Charlotte, North Carolina' : 'USA'
, 'Charlottetown' : 'USA'
, 'Chatham, IL' : 'USA'
, 'Cherry Creek Denver CO' : 'USA'
, 'Cheshire. London. #allover' : 'UK'
, 'Chester, IL' : 'USA'
, 'Chicago' : 'USA'
, 'Chicago - Lake Buena Vista' : 'USA'
, 'CHICAGO (312)' : 'USA'
, 'Chicago Area' : 'USA'
, 'Chicago Heights, IL' : 'USA'
, 'Chicago IL' : 'USA'
, 'Chicago, but Philly is home' : 'USA'
, 'Chicago, IL' : 'USA'
, 'Chicago, IL ' : 'USA'
, 'Chicago, IL 60607' : 'USA'
, 'Chicago, Illinois' : 'USA'
, 'Chicago,Illinois' : 'USA'
, 'Chicagoland' : 'USA'
, 'ChicagoRObotz' : 'USA'
, 'China' : 'CHN'
, 'Chippenham/Bath, UK' : 'UK'
, 'Chiswick, London' : 'UK'
, 'Chorley, Lancashire, UK' : 'UK'
, 'City of Angels, CA' : 'USA'
, 'City of London, London' : 'UK'
, 'Ciudad Aut√å_noma de Buenos Aires, Argentina' : 'ARG'
, 'Clayton, NC' : 'USA'
, 'Clearwater, FL' : 'USA'
, 'Cleveland, OH' : 'USA'
, 'Cleveland, OH - San Diego, CA' : 'USA'
, 'Coasts of Maine & California' : 'USA'
, 'Cochrane, Alberta, Canada' : 'CAN'
, 'Coconut Creek, Florida' : 'USA'
, 'Colchester Essex ' : 'UK'
, 'College Station, TX' : 'USA'
, 'Colombia' : 'USA'
, 'Colombo,Sri Lanka.' : 'USA'
, 'Colonial Heights, VA' : 'USA'
, 'ColoRADo' : 'USA'
, 'Colorado Springs' : 'USA'
, 'Colorado, USA' : 'USA'
, 'Columbia Heights, MN' : 'USA'
, 'Columbia, SC' : 'USA'
, 'Columbus' : 'USA'
, 'Columbus ?? North Carolina' : 'USA'
, 'columbus ohio' : 'USA'
, 'Columbus, Georgia' : 'USA'
, 'Columbus, OH' : 'USA'
, 'Concord, CA' : 'USA'
, 'Concord, N.C.' : 'USA'
, 'Concord, NH ' : 'USA'
, 'Connecticut' : 'USA'
, 'Conroe, TX' : 'USA'
, 'Coolidge, AZ' : 'USA'
, 'CORNFIELDS' : 'USA'
, 'Cornwall' : 'USA'
, 'Corpus - Las Vegas - Houston' : 'USA'
, 'Corpus Christi' : 'USA'
, 'Corpus Christi, Texas' : 'USA'
, 'Cosmic Oneness' : 'USA'
, 'Costa Rica' : 'USA'
, 'Cottonwood Arizona' : 'USA'
, 'County Durham, United Kingdom' : 'UK'
, 'Coventry' : 'USA'
, 'Coventry, Rhode Island' : 'USA'
, 'Coventry, UK' : 'UK'
, 'CPT & JHB, South Africa' : 'ZAF'
, 'Crato - CE ' : 'USA'
, 'Crayford, London' : 'UK'
, 'Crouch End, London' : 'UK'
, 'Croydon' : 'USA'
, 'CT & NY' : 'USA'
, 'CT, USA' : 'USA'
, 'D√å_sseldorf, Germany' : 'GER'
, 'DaKounty, Pa' : 'USA'
, 'dallas' : 'USA'
, 'Dallas Fort-Worth' : 'USA'
, 'Dallas, Tejas' : 'USA'
, 'Dallas, Texas. ' : 'USA'
, 'Dallas, TX' : 'USA'
, 'Dallas, TX ' : 'USA'
, 'Dammam- KSA' : 'SAU'
, 'Danbury, CT' : 'USA'
, 'Danville, VA' : 'USA'
, 'Dappar (Mohali) Punjab' : 'IND'
, 'Davao City' : 'USA'
, 'Davidson, NC' : 'USA'
, 'Davis, California' : 'USA'
, 'Dayton, OH' : 'USA'
, 'Dayton, Ohio' : 'USA'
, 'DC' : 'USA'
, 'DC Metro area' : 'USA'
, 'DC, frequently NYC/San Diego' : 'USA'
, 'Deadend, UK' : 'UK'
, 'death star' : 'USA'
, 'Decatur, GA' : 'USA'
, 'Delhi ' : 'IND'
, 'denmark' : 'DEN'
, 'Denton, Texas' : 'USA'
, 'denver colorado' : 'USA'
, 'Denver Colorado. Fun Times' : 'USA'
, 'Denver, CO' : 'USA'
, 'Denver, Colorado' : 'USA'
, 'Derbyshire, United Kingdom' : 'UK'
, 'Des Moines, IA' : 'USA'
, 'Des Moines, Iowa ' : 'USA'
, 'Desde Republica Argentina' : 'ARG'
, 'Desert Storm?? |BCHS|' : 'USA'
, 'Detroit' : 'USA'
, 'Detroit Tigers Dugout' : 'USA'
, 'Detroit, MI' : 'USA'
, 'Detroit, MI, United States' : 'USA'
, 'Detroit, Michigan' : 'USA'
, 'Detroit/Windsor' : 'USA'
, 'Devon/London ' : 'UK'
, 'DFW, Texas' : 'USA'
, 'Dhaka' : 'BGD'
, 'Dhaka, Bangladsh' : 'BGD'
, 'Displaced Son of TEXAS!' : 'USA'
, 'Dorset, UK' : 'UK'
, 'Dorset, United Kingdom' : 'UK'
, 'Dover, DE' : 'USA'
, 'Downtown Churubusco, Indiana' : 'USA'
, 'Downtown Oklahoma City' : 'USA'
, 'Dreieich, Germany' : 'GER'
, 'Dubai' : 'UAE'
, 'dubai ' : 'UAE'
, 'Dubai, UAE' : 'UAE'
, 'Dubai, United Arab Emirates' : 'UAE'
, 'Dublin' : 'IRL'
, 'dublin ' : 'IRL'
, 'Dublin City, Ireland' : 'IRL'
, 'Dublin, Ireland' : 'IRL'
, 'ducked off . . . ' : 'IRL'
, 'Dudetown' : 'IRL'
, 'Duncan' : 'IRL'
, 'dundalk ireland' : 'IRL'
, 'Dundas, Ontario' : 'CAN'
, 'Dundee' : 'CAN'
, 'Dundee, UK' : 'UK'
, 'Dunwoody, GA' : 'USA'
, 'Durand, MI' : 'USA'
, 'Durban, South Africa' : 'ZAF'
, 'Durham N.C ' : 'USA'
, 'Durham, NC' : 'USA'
, 'Dutch/English/German' : 'GER'
, 'Duval, WV 25573, USA ?' : 'USA'
, 'Eagle Mountain, Texas ' : 'USA'
, 'Eagle Pass, Texas' : 'USA'
, 'Eagle River Alaska' : 'USA'
, 'Ealing, London' : 'UK'
, 'East Atlanta, Georgia' : 'USA'
, 'East Aurora, NY' : 'USA'
, 'East Islip, NY' : 'USA'
, 'East Lansing, MI' : 'USA'
, 'East London' : 'UK'
, 'East London. ' : 'UK'
, 'EastAtlanta ??#WestGeorgia\'18' : 'USA'
, 'Eastbourne England' : 'UK'
, 'EastCarolina' : 'USA'
, 'Eastern Iowa' : 'USA'
, 'Eastlake, OH' : 'USA'
, 'Eau Claire, Wisconsin' : 'USA'
, 'Eaubonne, 95, France' : 'FRA'
, 'eBooks, North America' : 'USA'
, 'Edinburgh' : 'UK'
, 'Edinburgh, Scotland' : 'UK'
, 'Edmonton, Alberta' : 'CAN'
, 'Edmonton, Alberta - Treaty 6' : 'CAN'
, 'EGYPT' : 'EGY'
, 'EIU  Chucktown/LaSalle IL' : 'USA'
, 'El Dorado, Arkansas' : 'USA'
, 'El Dorado, KS' : 'USA'
, 'El Paso, Texas' : 'USA'
, 'El Paso, TX' : 'USA'
, 'Elizabeth, NJ' : 'USA'
, 'Elk Grove, CA, USA' : 'USA'
, 'Elkhart, IN' : 'USA'
, 'Ellensburg to Spokane' : 'USA'
, 'Elmwood Park, NJ' : 'USA'
, 'Elsewhere, NZ' : 'USA'
, 'Emirates' : 'UAE'
, 'Enfield, UK' : 'UK'
, 'England' : 'UK'
, 'England ' : 'UK'
, 'England & Wales Border, UK' : 'UK'
, 'England, Great Britain.' : 'UK'
, 'England, United Kingdom' : 'UK'
, 'England,UK,Europe,Sol 3.' : 'UK'
, 'England.' : 'UK'
, 'English Midlands' : 'UK'
, 'Enterprise, Alabama' : 'USA'
, 'Enterprise, NV' : 'USA'
, 'Erbil' : 'IRQ'
, 'Erie, PA' : 'USA'
, 'Escondido, CA' : 'USA'
, 'Espa√å¬±a - Spain - Espagne' : 'ESP'
, 'Espa√å¬±a, Spain' : 'ESP'
, 'Espoo, Finland' : 'FIN'
, 'Essex' : 'UK'
, 'Essex, England' : 'UK'
, 'Essex/Brighton' : 'UK'
, 'Est. September 2012 - Bristol' : 'UK'
, 'Estados Unidos' : 'USA'
, 'Eugene, Oregon' : 'USA'
, 'Eureka, California, USA' : 'USA'
, 'Evanston, IL' : 'USA'
, 'Evansville, IN' : 'USA'
, 'Everett, WA' : 'USA'
, 'Evergreen Colorado' : 'USA'
, 'Ewa Beach, HI' : 'USA'
, 'Fairfax, VA' : 'USA'
, 'Fairfield, California' : 'USA'
, 'Fife, WA' : 'USA'
, 'Finland' : 'FIN'
, 'fl' : 'USA'
, 'Fleet/Oxford, UK' : 'UK'
, 'Florida' : 'USA'
, 'Florida but I wanna be n Texas' : 'USA'
, 'Florida Forever' : 'USA'
, 'Florida USA' : 'USA'
, 'Florida, USA' : 'USA'
, 'Fort Calhoun, NE' : 'USA'
, 'Fort Collins, CO' : 'USA'
, 'Fort Fizz, Ohio' : 'USA'
, 'Fort Knox, KY 40121' : 'USA'
, 'Fort Lauderdale, FL' : 'USA'
, 'Fort Myers, Florida' : 'USA'
, 'Fort Smith, AR' : 'USA'
, 'Fort Valley,GA/Fayetteville,AR' : 'USA'
, 'Fort Walton Beach, Fl' : 'USA'
, 'Fort Wayne, IN' : 'USA'
, 'Fort Worth,  Texas ' : 'USA'
, 'Fort Worth, Texas' : 'USA'
, 'Fountain City, IN ' : 'USA'
, 'Fountain Valley, CA' : 'USA'
, 'France' : 'FRA'
, 'Frankfort, KY' : 'USA'
, 'Franklin, TN near Nashville' : 'USA'
, 'Frascati' : 'USA'
, 'Fredonia,NY' : 'USA'
, 'Free State, South Africa' : 'ZAF'
, 'Freeport IL. USA' : 'USA'
, 'Freeport Ny' : 'USA'
, 'Fresno, CA' : 'USA'
, 'Fresno, California' : 'USA'
, 'Friendswood, TX' : 'USA'
, 'Frisco, TX' : 'USA'
, 'From a torn up town MANCHESTER' : 'UK'
, 'From NY. In Scranton, PA' : 'USA'
, 'Frome, Somerset, England' : 'UK'
, 'Fukuoka, Japan' : 'JPN'
, 'Fukushima city Fukushima.pref' : 'JPN'
, 'Gages Lake, IL' : 'USA'
, 'Gainesville, FL' : 'USA'
, 'Gainesville/Tampa, FL' : 'USA'
, 'Galveston, Texas' : 'USA'
, 'Garden City, NY' : 'USA'
, 'Georgia, USA' : 'USA'
, 'germany' : 'GER'
, 'Gettysburg, PA' : 'USA'
, 'Glendale, CA' : 'USA'
, 'Gloucestershire , UK' : 'UK'
, 'Goa, India' : 'IND'
, 'Gold Coast, Australia' : 'AUS'
, 'Gold Coast, Qld, Australia' : 'AUS'
, 'Gotham City,USA' : 'USA'
, 'Grand Rapids MI' : 'USA'
, 'Greater Manchester, UK' : 'UK'
, 'Greenfield, Massachusetts' : 'USA'
, 'Greensboro, NC' : 'USA'
, 'Greensburg, PA' : 'USA'
, 'Greenville' : 'USA'
, 'Greenville, S.C.' : 'USA'
, 'Greenville,SC' : 'USA'
, 'Guelph Ontario Canada' : 'CAN'
, 'Guildford, UK' : 'UK'
, 'Hackney, London' : 'UK'
, 'Haddonfield, NJ' : 'USA'
, 'Hagerstown, MD' : 'USA'
, 'Haiku, Maui, Hawaii' : 'USA'
, 'Hailing from Dayton ' : 'USA'
, 'Halfrica' : 'USA'
, 'Halifax' : 'USA'
, 'Halifax, Nouvelle-√å√§cosse' : 'USA'
, 'Halifax, Nova Scotia' : 'USA'
, 'Halifax, NS, Canada' : 'CAN'
, 'Halton Region' : 'USA'
, 'Halton, Ontario' : 'CAN'
, 'Hamburg, DE' : 'USA'
, 'Hame' : 'USA'
, 'Hamilton County, IN' : 'USA'
, 'Hamilton, ON' : 'USA'
, 'Hamilton, Ontario CA' : 'USA'
, 'Hamilton, Ontario Canada' : 'CAN'
, 'Hammersmith, London' : 'UK'
, 'Hampshire UK' : 'UK'
, 'Hampshire, UK' : 'UK'
, 'Hampstead, London.' : 'UK'
, 'Hampton Roads, VA' : 'USA'
, 'Hannover, Germany' : 'GER'
, 'Happily Married with 2 kids ' : 'USA'
, 'Harbour Heights, FL' : 'USA'
, 'Harlem, New York' : 'USA'
, 'Harlem, NY or Chocolate City' : 'USA'
, 'Harlingen, TX' : 'USA'
, 'Harper Woods, MI' : 'USA'
, 'Harpurhey, Manchester, UK' : 'UK'
, 'Harris County, Texas' : 'USA'
, 'Hartford,  connecticut' : 'USA'
, 'Hartford, Connecticut' : 'USA'
, 'hatena bookmark' : 'USA'
, 'Hatteras, North Carolina' : 'USA'
, 'Hattiesburg, MS' : 'USA'
, 'Hawaii' : 'USA'
, 'Hawaii USA' : 'USA'
, 'Hawaii, USA' : 'USA'
, 'Hawthorne, NE' : 'USA'
, 'Haysville, KS' : 'USA'
, 'Head Office: United Kingdom' : 'UK'
, 'Heathrow' : 'UK'
, 'Helsinki' : 'FIN'
, 'Helsinki, Finland' : 'FIN'
, 'Henderson, Nevada' : 'USA'
, 'Henderson, NV' : 'USA'
, 'Hendersonville, NC' : 'USA'
, 'Hermitage, PA' : 'USA'
, 'Hermosa Beach, CA' : 'USA'
, 'Hickville, USA' : 'USA'
, 'Highland Park, CA' : 'USA'
, 'Hillsville/Lynchburg, VA' : 'USA'
, 'Holland MI via Houston, CLE' : 'USA'
, 'Holly Springs, NC ' : 'USA'
, 'Holly, MI' : 'USA'
, 'Hollywood' : 'USA'
, 'Hollywood, CA' : 'USA'
, 'hollywoodland ' : 'USA'
, 'Honolulu, Hawaii' : 'USA'
, 'Honolulu,Hawaii ' : 'USA'
, 'Horsemind, MI' : 'USA'
, 'houstn' : 'USA'
, 'houston' : 'USA'
, 'Houston ' : 'USA'
, 'Houston |??| Corsicana' : 'USA'
, 'Houston TX' : 'USA'
, 'Houston,  TX' : 'USA'
, 'Houston, Texas' : 'USA'
, 'Houston, Texas ! ' : 'USA'
, 'Houston, TX' : 'USA'
, 'Houston, TX  ' : 'USA'
, 'Hoxton, London' : 'UK'
, 'Huber Heights, OH' : 'USA'
, 'Hudson Valley, NY' : 'USA'
, 'Hughes, AR' : 'USA'
, 'Huntington, WV' : 'USA'
, 'Huntley, IL' : 'USA'
, 'Huntsville AL' : 'USA'
, 'Huntsville, AL' : 'USA'
, 'Huntsville, Alabama' : 'USA'
, 'Hustletown' : 'USA'
, 'hyderabad' : 'PAK'
, 'Hyderabad Telangana INDIA' : 'IND'
, 'I-75 in Florida' : 'USA'
, 'IG : Sincerely_TSUNAMI' : 'USA'
, 'Iliff,Colorado  ' : 'USA'
, 'ill yorker' : 'USA'
, 'Illinois' : 'USA'
, 'Illinois, USA' : 'USA'
, 'illinois. united state ' : 'USA'
, 'Im Around ... Jersey' : 'USA'
, 'India' : 'IND'
, 'indiana' : 'USA'
, 'Indiana, USA' : 'USA'
, 'Indianapolis, IN' : 'USA'
, 'IndiLand ' : 'USA'
, 'Indonesia' : 'IDN'
, 'Indonesia' : 'IDN'
, 'Inglewood, CA' : 'USA'
, 'Iowa, USA' : 'USA'
, 'Ireland' : 'IRL'
, 'Irving , Texas' : 'USA'
, 'Islamabad' : 'PAK'
, 'Island Lake, IL' : 'USA'
, 'Isle of Man' : 'UK'
, 'Istanbul' : 'TUR'
, 'italy' : 'ITA'
, 'Jacksonville Beach, FL' : 'USA'
, 'Jacksonville, FL' : 'USA'
, 'Jaipur, India' : 'IND'
, 'Jaipur, Rajasthan, India' : 'IND'
, 'Jakarta' : 'IDN'
, 'Jakarta, Indonesia' : 'IDN'
, 'Jakarta/Kuala Lumpur/S\'pore' : 'IDN'
, 'Japan' : 'JPN'
, 'japon' : 'JPN'
, 'Jeddah_Saudi Arabia.' : 'SAU'
, 'Jersey' : 'USA'
, 'jersey ' : 'USA'
, 'Jersey - C.I' : 'USA'
, 'Jersey City, New Jersey' : 'USA'
, 'Jersey City, NJ' : 'USA'
, 'Jersey Shore' : 'USA'
, 'Jerseyville, IL' : 'USA'
, 'Johannesburg, South Africa' : 'ZAF'
, 'Johannesburg, South Africa ' : 'ZAF'
, 'Jonesboro, AR MO, IOWA USA' : 'USA'
, 'Jonesboro, Arkansas USA' : 'USA'
, 'Joshua Tree, CA' : 'USA'
, 'Jubail IC, Saudi Arabia' : 'SAU'
, 'Jubail IC, Saudi Arabia.' : 'SAU'
, 'Kalamazoo, Michigan' : 'USA'
, 'Kalimantan Timur, Indonesia' : 'IDN'
, 'Kama | 18 | France ' : 'FRA'
, 'Kamloops, BC' : 'USA'
, 'kansas' : 'USA'
, 'Kansas City' : 'USA'
, 'Kansas City, MO' : 'USA'
, 'Kansas City, Mo.' : 'USA'
, 'Kansas, The Free State! ~ KC' : 'USA'
, 'Karachi' : 'PAK'
, 'Karachi ' : 'PAK'
, 'Karachi Pakistan' : 'PAK'
, 'Karachi, Pakistan' : 'PAK'
, 'Kashmir!' : 'PAK'
, 'Kauai, Hawaii' : 'USA'
, 'Kawartha Lakes, Ontario, Canad' : 'CAN'
, 'Keighley, England' : 'UK'
, 'Kelowna, BC' : 'USA'
, 'Kenosha, WI 53143' : 'USA'
, 'Kensington, MD' : 'USA'
, 'Kent' : 'USA'
, 'Kenton, Ohio' : 'USA'
, 'Kentucky, USA' : 'USA'
, 'Kenya' : 'KEN'
, 'Kettering, OH' : 'USA'
, 'Kingston, Pennsylvania' : 'USA'
, 'Knoxville, TN' : 'USA'
, 'Kodiak, AK' : 'USA'
, 'Kokomo, In' : 'USA'
, 'Kolkata, India' : 'IND'
, 'Kutztown, PA' : 'USA'
, 'L. A.' : 'USA'
, 'La Grange Park, IL' : 'USA'
, 'La Puente, CA' : 'USA'
, 'LA/OC/Vegas' : 'USA'
, 'Lagos' : 'NGA'
, 'Lake Monticello, VA' : 'USA'
, 'Lancashire, United Kingdom' : 'UK'
, 'Lancaster California' : 'USA'
, 'Lancaster, CA' : 'USA'
, 'Lancaster, Pennsylvania, USA' : 'USA'
, 'Laredo, TX' : 'USA'
, 'Las Vegas' : 'USA'
, 'Las Vegas aka Hell' : 'USA'
, 'Las Vegas, Nevada' : 'USA'
, 'Las Vegas, NV' : 'USA'
, 'Las Vegas, NV ' : 'USA'
, 'Las Vegas, NV USA' : 'USA'
, 'Lawrence, KS via Emporia, KS' : 'USA'
, 'LEALMAN, FLORIDA' : 'USA'
, 'Leduc, Alberta, Canada' : 'CAN'
, 'lee london' : 'UK'
, 'Leeds' : 'USA'
, 'Leeds, England' : 'USA'
, 'Leeds, U.K.' : 'USA'
, 'Leeds, UK' : 'UK'
, 'Leeds, United Kingdom' : 'UK'
, 'Leesburg, FL' : 'USA'
, 'Lehigh Valley, PA' : 'USA'
, 'Leicester' : 'UK'
, 'Leicester, England' : 'UK'
, 'Lethbridge, AB, Canada' : 'CAN'
, 'Lethbridge, Alberta, Canada' : 'CAN'
, 'Lima, OH' : 'USA'
, 'Lima, Ohio' : 'USA'
, 'Lincoln' : 'USA'
, 'Lincoln City Oregon' : 'USA'
, 'Lincoln, IL' : 'USA'
, 'Lincoln, NE' : 'USA'
, 'Lindenhurst' : 'USA'
, 'Linton Hall, VA' : 'USA'
, 'Lisbon, Portugal' : 'PRT'
, 'Littleton, CO' : 'USA'
, 'Littleton, CO, USA' : 'USA'
, 'LITTLETON, CO, USA, TERRAN' : 'USA'
, 'LiVE M√å¬ÅS' : 'USA'
, 'Live m√å√Ås' : 'USA'
, 'Live Oak, TX' : 'USA'
, 'Live On Webcam' : 'USA'
, 'LIVERPOOL' : 'UK'
, 'liverpool ' : 'UK'
, 'Lives in London' : 'UK'
, 'Livingston, IL  U.S.A.' : 'USA'
, 'Livingston, MT' : 'USA'
, 'Livonia, MI' : 'USA'
, 'Lizzy\'s Knee' : 'USA'
, 'London' : 'UK'
, 'London ' : 'UK'
, 'London / Birmingham' : 'UK'
, 'london / st catharines ?' : 'UK'
, 'london essex england uk' : 'UK'
, 'london town..' : 'UK'
, 'London UK' : 'UK'
, 'London, England' : 'UK'
, 'London, Greater London, UK' : 'UK'
, 'London, Kent & SE England.' : 'UK'
, 'London, UK' : 'UK'
, 'London, United Kingdom' : 'UK'
, 'London.' : 'UK'
, 'London/Bristol/Guildford' : 'UK'
, 'London/Lagos/FL √å√èT: 6.6200132,' : 'UK'
, 'London/New York' : 'USA'
, 'London/Outlaw Country ' : 'UK'
, 'London/Surrey ' : 'UK'
, 'Londonstan' : 'UK'
, 'Long Beach, CA' : 'USA'
, 'Long Eaton √•√° Derbyshire √•√° UK' : 'UK'
, 'Long Island' : 'USA'
, 'Long Island NY & San Francisco' : 'USA'
, 'LONG ISLAND, NY' : 'USA'
, 'Los Angeles' : 'USA'
, 'Los Angeles ' : 'USA'
, 'Los Angeles for now' : 'USA'
, 'Los Angeles New York' : 'USA'
, 'Los Angeles, CA' : 'USA'
, 'Los Angeles, Calif.' : 'USA'
, 'Los Angeles, California' : 'USA'
, 'Los Angeles,CA, USA' : 'USA'
, 'Los Angeles... CA... USA' : 'USA'
, 'Los Angles, CA' : 'USA'
, 'Louavul, KY' : 'USA'
, 'Loughborough, England' : 'UK'
, 'Loughborough.' : 'UK'
, 'Loughton, Essex, UK' : 'UK'
, 'Louisiana' : 'USA'
, 'Louisiana, USA' : 'USA'
, 'louisville, kentucky' : 'USA'
, 'Louisville, KY' : 'USA'
, 'Louisville, KY ' : 'USA'
, 'Loveland Colorado' : 'USA'
, 'Lowell, MA' : 'USA'
, 'LP, MN USA' : 'USA'
, 'Lubbock, Texas' : 'USA'
, 'Lubbock, TX' : 'USA'
, 'Lucknow, India' : 'IND'
, 'Lynchburg, VA' : 'USA'
, 'Lynwood, CA' : 'USA'
, 'MA via PA' : 'USA'
, 'Macclesfield' : 'USA'
, 'Mackay, QLD, Australia' : 'AUS'
, 'Mackem in Bolton' : 'USA'
, 'Macon, GA' : 'USA'
, 'Macon, Georgia' : 'USA'
, 'MAD as Hell' : 'USA'
, 'Made Here In Detroit ' : 'USA'
, 'Made in America' : 'USA'
, 'Madison, GA' : 'USA'
, 'Madison, WI' : 'USA'
, 'Madison, WI & St. Louis MO' : 'USA'
, 'Madison, Wisconsin, USA' : 'USA'
, 'Madisonville TN' : 'USA'
, 'Madrid' : 'ESP'
, 'Madrid, Comunidad de Madrid' : 'ESP'
, 'mainly California' : 'USA'
, 'Manchester' : 'UK'
, 'Manchester UK' : 'UK'
, 'Manchester, England' : 'UK'
, 'Manchester, NH' : 'USA'
, 'Manchester, UK' : 'UK'
, 'manchester, uk.' : 'UK'
, 'Manhattan' : 'USA'
, 'Manhattan, NY' : 'USA'
, 'Marbella. Spain' : 'ESP'
, 'Maricopa, AZ' : 'USA'
, 'Maryland' : 'USA'
, 'Maryland ' : 'USA'
, 'Maryland, USA' : 'USA'
, 'Maryland,Baltimore' : 'USA'
, 'marysville ca ' : 'USA'
, 'Marysville, MI' : 'USA'
, 'Mass' : 'USA'
, 'Massachusetts' : 'USA'
, 'Massachusetts ' : 'USA'
, 'Massachusetts, USA' : 'USA'
, 'McLean, VA' : 'USA'
, 'Medford, NJ' : 'USA'
, 'Medford, Oregon' : 'USA'
, 'Meereen ' : 'USA'
, 'melbourne' : 'AUS'
, 'Melbourne Australia' : 'AUS'
, 'Melbourne-ish' : 'AUS'
, 'Melbourne, Australia' : 'AUS'
, 'Melbourne, Australia.' : 'AUS'
, 'Melbourne, FL' : 'USA'
, 'Melbourne, Florida' : 'USA'
, 'Melbourne, Victoria' : 'AUS'
, 'Melrose' : 'AUS'
, 'Melton, GA' : 'USA'
, 'Memphis' : 'USA'
, 'Memphis, in the Tennessees' : 'USA'
, 'Memphis, TN' : 'USA'
, 'Menasha, WI' : 'USA'
, 'Mentor OH' : 'USA'
, 'Merica!' : 'USA'
, 'Mesa, AZ' : 'USA'
, 'Methville, CA' : 'USA'
, 'Metro Manila' : 'USA'
, 'mexico' : 'USA'
, 'Mexico City' : 'USA'
, 'Mexico! ^_^' : 'USA'
, 'MI' : 'USA'
, 'MI - CA' : 'USA'
, 'MI,USA' : 'USA'
, 'Miami' : 'USA'
, 'Miami ??' : 'USA'
, 'Miami Beach, Fl' : 'USA'
, 'Miami via Lima' : 'USA'
, 'miami x dallas ' : 'USA'
, 'Miami, FL' : 'USA'
, 'Miami, Florida' : 'USA'
, 'Miami,FL' : 'USA'
, 'Miami,Fla' : 'USA'
, 'Miami?Gainesville' : 'USA'
, 'Michel Delving.' : 'USA'
, 'Michigan' : 'USA'
, 'Michigan ' : 'USA'
, 'Michigan, USA' : 'USA'
, 'Mid north coast of NSW' : 'USA'
, 'Mid West' : 'USA'
, 'Middle Earth / Asgard / Berk' : 'USA'
, 'middle eastern palace' : 'USA'
, 'midwest' : 'USA'
, 'Midwest City, OK' : 'USA'
, 'Midwestern USA' : 'USA'
, 'milky way' : 'USA'
, 'Milky Way galaxy ' : 'USA'
, 'Milton keynes' : 'USA'
, 'Milton Keynes ' : 'USA'
, 'Milton Keynes, England' : 'UK'
, 'Milton/Tallahassee' : 'USA'
, 'Milwaukee County' : 'USA'
, 'Milwaukee WI' : 'USA'
, 'Milwaukee, WI' : 'USA'
, 'Minneapolis - St. Paul' : 'USA'
, 'Minneapolis, MN' : 'USA'
, 'Minneapolis,MN,US' : 'USA'
, 'Minneapolis/St. Paul' : 'USA'
, 'Minority Privilege, USA' : 'USA'
, 'Mississauga, Ontario' : 'CAN'
, 'missouri USA' : 'USA'
, 'Missouri, USA' : 'USA'
, 'Mogadishu, New Jersey' : 'USA'
, 'Montana, USA' : 'USA'
, 'Mooresville, NC' : 'USA'
, 'Morganville, Texas.' : 'USA'
, 'Morioh, Japan' : 'JPN'
, 'Morris, IL' : 'USA'
, 'Mount Vernon, NY' : 'USA'
, 'Mumbai' : 'IND'
, 'Mumbai , India' : 'IND'
, 'Mumbai india' : 'IND'
, 'Mumbai, India' : 'IND'
, 'Murray Hill, New Jersey' : 'USA'
, 'N. California USA' : 'USA'
, 'Nairobi' : 'KEN'
, 'NAIROBI  KENYA ' : 'KEN'
, 'Nairobi , Kenya' : 'KEN'
, 'Nairobi-KENYA' : 'KEN'
, 'Nairobi, Kenya' : 'KEN'
, 'Nairobi, Kenya ' : 'KEN'
, 'Nakhon Si Thammarat' : 'KEN'
, 'Nanaimo, BC, Canada' : 'CAN'
, 'Nantes, France' : 'FRA'
, 'Napa, CA' : 'USA'
, 'Nashua NH' : 'USA'
, 'Nashville' : 'USA'
, 'Nashville, Tennessee' : 'USA'
, 'Nashville, TN' : 'USA'
, 'nashville, tn ' : 'USA'
, 'nbc washington' : 'USA'
, 'NC' : 'USA'
, 'Near Richmond, VA' : 'USA'
, 'Nelspruit, South Africa' : 'ZAF'
, 'Netherlands' : 'NZL'
, 'Netherlands,Amsterdam-Virtual ' : 'NZL'
, 'Nevada (wishing for Colorado)' : 'USA'
, 'Nevada, USA' : 'USA'
, 'New Britain, CT' : 'USA'
, 'New Brunswick, NJ' : 'USA'
, 'New Chicago' : 'USA'
, 'New Delhi' : 'IND'
, 'New Delhi, Delhi' : 'IND'
, 'New Delhi, India' : 'IND'
, 'New Delhi,India' : 'IND'
, 'New England' : 'UK'
, 'New Hampshire' : 'UK'
, 'New Hampshire, USA' : 'USA'
, 'New Hanover County, NC' : 'USA'
, 'New Haven, Connecticut' : 'USA'
, 'New Jersey' : 'USA'
, 'New Jersey ' : 'USA'
, 'New Jersey, USA' : 'USA'
, 'New Jersey, usually' : 'USA'
, 'New Jersey/ D.R.' : 'USA'
, 'New Jersey/New York' : 'USA'
, 'New Mexico, USA' : 'USA'
, 'New Orleans ,Louisiana' : 'USA'
, 'New Orleans, LA' : 'USA'
, 'New Orleans, Louisiana' : 'USA'
, 'New South Wales, Australia' : 'AUS'
, 'New Sweden' : 'SWE'
, 'New York' : 'USA'
, 'New York ' : 'USA'
, 'New York - Connecticut' : 'USA'
, 'New York ? ATL' : 'USA'
, 'New York / Worldwide' : 'USA'
, 'New York 2099' : 'USA'
, 'New York Brooklyn' : 'USA'
, 'New York City' : 'USA'
, 'New York City ,NY' : 'USA'
, 'New York City, NY' : 'USA'
, 'New York NYC' : 'USA'
, 'New York, New York' : 'USA'
, 'New York, NY' : 'USA'
, 'New York, NY ' : 'USA'
, 'New York, United States' : 'USA'
, 'New York, USA' : 'USA'
, 'New York. NY' : 'USA'
, 'New Your' : 'USA'
, 'New Zealand' : 'NZL'
, 'Newark, NJ' : 'USA'
, 'Newcastle' : 'UK'
, 'Newcastle upon Tyne' : 'USA'
, 'Newcastle Upon Tyne, England' : 'UK'
, 'Newcastle, England' : 'UK'
, 'Newcastle, England ' : 'UK'
, 'Newcastle, OK' : 'USA'
, 'Newport, Wales, UK' : 'UK'
, 'Newton Centre, Massachusetts' : 'USA'
, 'Newton, NJ 07860' : 'USA'
, 'NH via Boston, MA' : 'USA'
, 'Nicoma Park, OK' : 'USA'
, 'Nigeria' : 'NGA'
, 'nj/ny' : 'USA'
, 'Noida, NCR, India' : 'IND'
, 'NOLA ?? TX' : 'USA'
, 'Nomad, USA' : 'USA'
, 'Norman, Oklahoma' : 'USA'
, 'North America' : 'USA'
, 'North Carolina, USA' : 'USA'
, 'North Dartmouth, Massachusetts' : 'USA'
, 'North East / Middlesbrough ' : 'UK'
, 'North East Unsigned Radio' : 'UK'
, 'North East USA' : 'USA'
, 'North East, England' : 'UK'
, 'North Ferriby, East Yorkshire' : 'UK'
, 'North Highlands, CA' : 'USA'
, 'North Jersey' : 'USA'
, 'North London' : 'UK'
, 'North Port, FL' : 'USA'
, 'North Vancouver, BC' : 'CAN'
, 'North West England UK' : 'UK'
, 'North West London' : 'UK'
, 'Northampton, MA' : 'USA'
, 'Northern California U.S.A.' : 'USA'
, 'Northern Colorado' : 'USA'
, 'Northern Ireland' : 'USA'
, 'Northern Kentucky, USA' : 'USA'
, 'Norwalk, CT' : 'USA'
, 'Not Los Angeles, Not New York.' : 'USA'
, 'Nottingham, United Kingdom' : 'UK'
, 'Nova Scotia, Canada' : 'CAN'
, 'Novi, MI' : 'USA'
, 'Numa casa de old yellow bricks' : 'USA'
, 'NY' : 'USA'
, 'NY || live easy? ' : 'USA'
, 'NY Capital District' : 'USA'
, 'NY, CT & Greece' : 'USA'
, 'NY, NY' : 'USA'
, 'nyc' : 'USA'
, 'NYC :) Ex- #Islamophobe' : 'USA'
, 'NYC / International' : 'USA'
, 'NYC area' : 'USA'
, 'NYC metro' : 'USA'
, 'NYC-LA-MIAMI' : 'USA'
, 'NYC, New York' : 'USA'
, 'NYC,US - Cali, Colombia' : 'USA'
, 'NYC&NJ' : 'USA'
, 'NYHC' : 'USA'
, 'Oakland, CA' : 'USA'
, 'Ocean City, NJ' : 'USA'
, 'Odawara, Japan' : 'JPN'
, 'OES 4th Point. sisSTAR & TI' : 'USA'
, 'Ohio' : 'USA'
, 'Ohio, USA' : 'USA'
, 'OK' : 'USA'
, 'Okanagan Valley, BC' : 'USA'
, 'Oklahoma' : 'USA'
, 'Oklahoma City' : 'USA'
, 'Oklahoma City, OK' : 'USA'
, 'Oklahoma, USA' : 'USA'
, 'Olathe, KS' : 'USA'
, 'Oldenburg // London' : 'UK'
, 'Olympia, WA' : 'USA'
, 'Oneonta, NY/ Staten Island, NY' : 'USA'
, 'Ontario Canada' : 'CAN'
, 'Ontario, Canada' : 'CAN'
, 'Ontario, Canada. ' : 'CAN'
, 'Orange County, CA' : 'USA'
, 'Orange County, California' : 'USA'
, 'Orange County, NY' : 'USA'
, 'Orbost, Victoria, Australia' : 'AUS'
, 'Oregon and Washington' : 'USA'
, 'Oregon, USA' : 'USA'
, 'Orlando' : 'USA'
, 'Orlando ' : 'USA'
, 'Orlando, FL' : 'USA'
, 'Orlando,FL  USA' : 'USA'
, 'Orlando/Cocoa Beach, FL' : 'USA'
, 'Ormond By The Sea, FL' : 'USA'
, 'Oshawa, Canada' : 'CAN'
, 'Oshawa/Toronto' : 'CAN'
, 'Otsego, MI' : 'USA'
, 'Ottawa, Canada' : 'CAN'
, 'Ottawa, Ontario' : 'CAN'
, 'Ottawa,Ontario Canada' : 'CAN'
, 'Overland Park, KS' : 'USA'
, 'oxford' : 'USA'
, 'Oxford / bristol' : 'UK'
, 'Oxford, MS' : 'USA'
, 'Oxford, OH' : 'USA'
, 'PA' : 'USA'
, 'PA, USA' : 'USA'
, 'PA.USA' : 'USA'
, 'Pacific Northwest' : 'USA'
, 'Paducah, KY' : 'USA'
, 'Palestine Texas' : 'USA'
, 'Palm Beach County, FL' : 'USA'
, 'Palm Desert, CA' : 'USA'
, 'Palo Alto, CA' : 'USA'
, 'Palo Alto, California' : 'USA'
, 'paradise' : 'USA'
, 'Paradise City' : 'USA'
, 'Paradise, NV' : 'USA'
, 'Paranaque City' : 'USA'
, 'Paris' : 'FRA'
, 'Paris ' : 'FRA'
, 'Paris (France)' : 'FRA'
, 'Paris, France' : 'FRA'
, 'Paris.' : 'FRA'
, 'Park Ridge, Illinois' : 'USA'
, 'Paterson, New Jersey ' : 'USA'
, 'peekskill. new york, 10566 ' : 'USA'
, 'Penn Hills, PA' : 'USA'
, 'Pennsylvania, PA' : 'USA'
, 'Pennsylvania, USA' : 'USA'
, 'Pensacola, FL' : 'USA'
, 'Perth, Australia' : 'AUS'
, 'perth, australia ' : 'AUS'
, 'Perth, Western Australia' : 'AUS'
, 'Petaluma, CA' : 'USA'
, 'Peterborough, Ontario, Canada' : 'CAN'
, 'pettyville, usa' : 'USA'
, 'ph' : 'USA'
, 'Phila.' : 'USA'
, 'Philadelphia' : 'USA'
, 'Philadelphia, PA' : 'USA'
, 'Philadelphia, PA ' : 'USA'
, 'Philadelphia, PA USA' : 'USA'
, 'Philadelphia, Pennsylvania' : 'USA'
, 'Philadelphia, Pennsylvania USA' : 'USA'
, 'philly' : 'USA'
, 'philly ' : 'USA'
, 'Phoenix' : 'USA'
, 'Phoenix Az' : 'USA'
, 'Phoenix, Arizona, USA' : 'USA'
, 'Phoenix, AZ' : 'USA'
, 'Piedmont Triad, NC' : 'USA'
, 'Pig Symbol, Alabama' : 'USA'
, 'Pittsburgh PA' : 'USA'
, 'Plain O\' Texas' : 'USA'
, 'Plano, IL' : 'USA'
, 'Plano, Texas' : 'USA'
, 'Plano,TX' : 'USA'
, 'Pleasanton, CA' : 'USA'
, 'Pompano Beach, FL' : 'USA'
, 'Pontefract UK' : 'UK'
, 'Poplar, London' : 'UK'
, 'Port Charlotte, FL' : 'USA'
, 'Port Jervis, NY' : 'USA'
, 'port matilda pa' : 'USA'
, 'Port Orange, FL' : 'USA'
, 'Portland, OR' : 'USA'
, 'Portsmouth, UK' : 'UK'
, 'Portsmouth, VA' : 'USA'
, 'proudly South African' : 'ZAF'
, 'Purgatory, USA' : 'USA'
, 'QLD Australia' : 'AUS'
, 'Quantico, VA' : 'USA'
, 'Queen Creek AZ' : 'USA'
, 'Queens New York' : 'USA'
, 'Queens, NY' : 'USA'
, 'Queensland, Australia' : 'AUS'
, 'Raleigh (Garner/Cleveland) NC' : 'USA'
, 'Raleigh Durham, NC' : 'USA'
, 'Raleigh, NC' : 'USA'
, 'Rapid City, Black Hills, SD' : 'USA'
, 'Rapid City, South Dakota' : 'USA'
, 'Reading UK' : 'UK'
, 'Redding, California, USA' : 'USA'
, 'Redondo Beach, CA' : 'USA'
, 'Renfrew, Scotland' : 'UK'
, 'Republic of Texas' : 'USA'
, 'Reston, VA, USA' : 'USA'
, 'Rheinbach / Germany' : 'GER'
, 'Rhode Island' : 'USA'
, 'RhodeIsland' : 'USA'
, 'Richardson TX' : 'USA'
, 'Richmond, VA' : 'USA'
, 'rio de janeiro | brazil' : 'BRA'
, 'Riverside, CA' : 'USA'
, 'Riverside, California.' : 'USA'
, 'Riverview, FL ' : 'USA'
, 'Roanoke VA' : 'USA'
, 'Roanoke, VA' : 'USA'
, 'Rochester Hills, MI' : 'USA'
, 'Rochester, NY' : 'USA'
, 'Rockford, IL' : 'USA'
, 'Rockland County, NY' : 'USA'
, 'Rome, Italy' : 'ITA'
, 'rowyso dallas ' : 'USA'
, 'Rutherfordton, NC' : 'USA'
, 'S√å¬£o Paulo SP,  Brasil' : 'BRA'
, 'S√å¬£o Paulo, Brasil' : 'BRA'
, 'Sacramento, CA' : 'USA'
, 'Sacramento, California' : 'USA'
, 'Saint Louis, Missouri' : 'USA'
, 'Saint Lucia' : 'USA'
, 'Saint Marys, GA' : 'USA'
, 'Saint Paul' : 'USA'
, 'Saipan, CNMI' : 'USA'
, 'Saline, MI' : 'USA'
, 'San Antonio-ish, TX' : 'USA'
, 'San Antonio, TX' : 'USA'
, 'San Diego' : 'USA'
, 'San Diego CA' : 'USA'
, 'San Diego California 92101' : 'USA'
, 'San Diego, CA' : 'USA'
, 'San Diego, Calif.' : 'USA'
, 'San Diego, California' : 'USA'
, 'San Diego, Texas.' : 'USA'
, 'San Francisco' : 'USA'
, 'San Francisco , CA' : 'USA'
, 'San Francisco Bay Area' : 'USA'
, 'San Francisco, CA' : 'USA'
, 'San Fransokyo' : 'USA'
, 'san gabriel la union' : 'USA'
, 'San Jose' : 'USA'
, 'San Jose, CA' : 'USA'
, 'San Jose, CA, USA' : 'USA'
, 'San Jose, California' : 'USA'
, 'San Luis Obispo, CA' : 'USA'
, 'San Mateo County, CA' : 'USA'
, 'Sand springs oklahoma' : 'USA'
, 'Sandton, South Africa' : 'ZAF'
, 'Santa Clara, CA' : 'USA'
, 'Santa Cruz, CA' : 'USA'
, 'Santa Maria, CA' : 'USA'
, 'Santa Monica, CA' : 'USA'
, 'Sao Paulo, Brazil' : 'BRA'
, 'Sarasota, FL' : 'USA'
, 'Saskatchewan, Canada' : 'CAN'
, 'Saudi Arabia' : 'SAU'
, 'SaudI arabia - riyadh ' : 'SAU'
, 'Savage States of America' : 'USA'
, 'Scotland' : 'UK'
, 'Scotland, United Kingdom' : 'UK'
, 'Scotts Valley, CA' : 'USA'
, 'Scottsdale, AZ' : 'USA'
, 'Scottsdale. AZ' : 'USA'
, 'Screwston, TX' : 'USA'
, 'SE London(heart is by the sea)' : 'UK'
, 'Seattle' : 'USA'
, 'Seattle native in Prescott, AZ' : 'USA'
, 'Seattle, WA' : 'USA'
, 'SEATTLE, WA USA' : 'USA'
, 'Seattle, Washington' : 'USA'
, 'SF Bay Area, California / Greater Phoenix, AZ' : 'USA'
, 'Sharkatraz/Bindle\'s Cleft, PA' : 'USA'
, 'Sherwood, Brisbane, Australia' : 'AUS'
, 'Shirley, NY' : 'USA'
, 'Singapore' : 'SGP'
, 'sitting on the fence, New York' : 'USA'
, 'Somerset, UK' : 'UK'
, 'Somewhere between Chicago & Milwaukee' : 'USA'
, 'Somewhere in China.' : 'CHN'
, 'Somewhere in Jersey' : 'USA'
, 'Somewhere in Spain' : 'ESP'
, 'Somewhere in the Canada' : 'CAN'
, 'somewhere USA ' : 'USA'
, 'South Africa' : 'ZAF'
, 'south africa eastern cape' : 'ZAF'
, 'South Bloomfield, OH' : 'USA'
, 'South Carolina, USA' : 'USA'
, 'South Florida' : 'USA'
, 'South Pasadena, CA' : 'USA'
, 'South, USA' : 'USA'
, 'Southern California' : 'USA'
, 'SOUTHERN CALIFORNIA DESERT' : 'USA'
, 'southwest, Tx' : 'USA'
, 'Spain' : 'ESP'
, 'Spain but Opa-Locka, FL' : 'USA'
, 'Spokane, Washington' : 'USA'
, 'Spokane, Washington 99206' : 'USA'
, 'Spring Grove, IL' : 'USA'
, 'Spring Tx' : 'USA'
, 'St Austell, Cornwall' : 'USA'
, 'St Charles, MD' : 'USA'
, 'St Louis, MO' : 'USA'
, 'St Paul, MN' : 'USA'
, 'St PetersburgFL' : 'USA'
, 'St. Catharines, Ontario' : 'CAN'
, 'St. John\'s, NL, Canada' : 'CAN'
, 'St. Joseph, Minnesota' : 'USA'
, 'St. Louis' : 'USA'
, 'St. Louis Mo.' : 'USA'
, 'St. Louis, Missouri' : 'USA'
, 'St. Louis, MO' : 'USA'
, 'St. Patrick\'s Purgatory' : 'USA'
, 'St.Cloud, MN' : 'USA'
, 'State College, PA' : 'USA'
, 'Stockton on tees Teesside UK' : 'UK'
, 'Street of Dallas' : 'USA'
, 'Sugar Land, TX' : 'USA'
, 'Sunny South florida ' : 'USA'
, 'Sunnyvale, CA' : 'USA'
, 'Sutton, London UK' : 'UK'
, 'Sydney' : 'AUS'
, 'Sydney Australia' : 'AUS'
, 'Sydney, Australia' : 'AUS'
, 'Sylacauga, Alabama' : 'USA'
, 'Tacoma,Washington' : 'USA'
, 'Tallahassee Florida' : 'USA'
, 'Tallahassee, FL' : 'USA'
, 'Tampa' : 'USA'
, 'Tampa-St. Petersburg, FL' : 'USA'
, 'Tampa, FL' : 'USA'
, 'Temecula, CA' : 'USA'
, 'Tennessee' : 'USA'
, 'Tennessee, USA' : 'USA'
, 'Terlingua, Texas' : 'USA'
, 'Texas' : 'USA'
, 'Texas ' : 'USA'
, 'texas a&m university' : 'USA'
, 'Texas af' : 'USA'
, 'Texas-USA¬â√£¬¢ ?' : 'USA'
, 'Texas, USA' : 'USA'
, 'texasss' : 'USA'
, 'The Great State of Texas' : 'USA'
, 'The Hammock, FL, USA' : 'USA'
, 'The land of New Jersey. ' : 'USA'
, 'The UK' : 'UK'
, 'The windy plains of Denver' : 'USA'
, 'Topeka, KS' : 'USA'
, 'Tornado Alley, USA ' : 'USA'
, 'Toronto' : 'CAN'
, 'toronto ¬â√õ¬¢ dallas' : 'USA'
, 'Toronto-Citizen of Canada & US' : 'CAN'
, 'Toronto, Bob-Lo, Miami Beach' : 'CAN'
, 'Toronto, Canada' : 'CAN'
, 'Toronto, ON' : 'CAN'
, 'Toronto, ON, Canada' : 'CAN'
, 'Toronto, Ontario' : 'CAN'
, 'Torrance, CA' : 'USA'
, 'Trackside California' : 'USA'
, 'Tractor land aka Bristol' : 'UK'
, 'trapped in America' : 'USA'
, 'Traverse City, MI' : 'USA'
, 'Tri-Cities, Wash.' : 'USA'
, 'Tring, UK' : 'UK'
, 'Trinity, Bailiwick of Jersey' : 'USA'
, 'Tucson, Az' : 'USA'
, 'Tulalip, Washington' : 'USA'
, 'Tulsa, Oklahoma' : 'USA'
, 'TX' : 'USA'
, 'Tyler, TX' : 'USA'
, 'U.K.' : 'UK'
, 'U.S' : 'USA'
, 'U.S.' : 'USA'
, 'U.S. Northern Virginia' : 'USA'
, 'U.S.A' : 'USA'
, 'U.S.A.   FEMA Region 5' : 'USA'
, 'U.S.A. - Global Members Site' : 'USA'
, 'UAE,Sharjah/ AbuDhabi' : 'UAE'
, 'UK' : 'UK'
, 'UK Great Britain ' : 'UK'
, 'Ukraine' : 'UKR'
, 'United Kingdom' : 'UK'
, 'United Kingdom,Fraserburgh' : 'UK'
, 'United States' : 'USA'
, 'United States of America' : 'USA'
, 'United States where it\'s warm' : 'USA'
, 'Unites States' : 'USA'
, 'University of Chicago' : 'USA'
, 'University of South Florida' : 'USA'
, 'University of Toronto' : 'CAN'
, 'Upper manhattan, New York' : 'USA'
, 'Upper St Clair, PA' : 'USA'
, 'Upstate New York' : 'USA'
, 'US' : 'USA'
, 'US, PA' : 'USA'
, 'USA' : 'USA'
, 'USA ' : 'USA'
, 'USA , AZ' : 'USA'
, 'USA (Formerly @usNOAAgov)' : 'USA'
, 'USA, Alabama' : 'USA'
, 'USA, Haiti, Nepal' : 'USA'
, 'USA, North Dakota' : 'USA'
, 'USA, WA' : 'USA'
, 'USA/SO FLORIDA via BROOKLYN NY' : 'USA'
, 'USAoV' : 'USA'
, 'Utah, USA' : 'USA'
, 'Utica NY' : 'USA'
, 'va' : 'USA'
, 'Van Buren, MO' : 'USA'
, 'Vancouver' : 'CAN'
, 'Vancouver BC' : 'CAN'
, 'Vancouver Canada' : 'CAN'
, 'vancouver usa' : 'USA'
, 'Vancouver, BC' : 'CAN'
, 'Vancouver, BC, Canada' : 'CAN'
, 'Vancouver, BC.' : 'CAN'
, 'Vancouver, British Columbia' : 'CAN'
, 'Vancouver, Canada' : 'CAN'
, 'Vancouver, Colombie-Britannique' : 'CAN'
, 'Ventura, Ca' : 'USA'
, 'Vermont, USA' : 'USA'
, 'Vero Beach , FL' : 'USA'
, 'Very SW CA, USA....Draenor' : 'USA'
, 'Victoria, Australia, Earth' : 'AUS'
, 'Victoria, BC' : 'USA'
, 'Victoria, BC  Canada' : 'CAN'
, 'Victoria, British Columbia' : 'CAN'
, 'Victoria, Canada' : 'CAN'
, 'Victoria, Tx.' : 'USA'
, 'Victorville, CA' : 'USA'
, 'Virginia' : 'USA'
, 'Virginia, United States' : 'USA'
, 'Virginia, USA' : 'USA'
, 'Vista, CA' : 'USA'
, 'Waco TX' : 'USA'
, 'Waco, Texas' : 'USA'
, 'WAISTDEEP, TX' : 'USA'
, 'Wales, United Kingdom' : 'UK'
, 'Walker County, Alabama' : 'USA'
, 'Walthamstow, London' : 'UK'
, 'Wandsworth, London' : 'UK'
, 'Warm Heart Of Africa' : 'USA'
, 'Warrandyte, Australia' : 'AUS'
, 'Washington' : 'USA'
, 'Washington D.C.' : 'USA'
, 'Washington DC' : 'USA'
, 'Washington DC / Nantes, France' : 'USA'
, 'Washington State' : 'USA'
, 'washington, d.c.' : 'USA'
, 'Washington, D.C. ' : 'USA'
, 'Washington, D.C., area' : 'USA'
, 'Washington, DC' : 'USA'
, 'Washington, DC & Charlotte, NC' : 'USA'
, 'Washington, DC 20009' : 'USA'
, 'Washington, DC NATIVE' : 'USA'
, 'Washington, Krasnodar (Russia)' : 'USA'
, 'Washington, USA' : 'USA'
, 'WASHINGTON,DC' : 'USA'
, 'Waterford MI' : 'USA'
, 'Waterloo, ON' : 'USA'
, 'Waterloo, Ont' : 'USA'
, 'Waukesha, WI' : 'USA'
, 'Wausau, Wisconsin' : 'USA'
, 'Waverly, IA' : 'USA'
, 'Webster, TX' : 'USA'
, 'West Chester, PA' : 'USA'
, 'West Coast, Cali USA' : 'USA'
, 'West Coast, USA' : 'USA'
, 'West Hollywood, CA' : 'USA'
, 'West Lancashire, UK.' : 'UK'
, 'West Palm Beach, Florida' : 'USA'
, 'West Richland, WA' : 'USA'
, 'West Vancouver, B.C.' : 'CAN'
, 'West Virginia, USA' : 'USA'
, 'Western New York' : 'USA'
, 'Western Washington' : 'USA'
, 'wherever-the-fuck washington' : 'USA'
, 'White Plains, NY' : 'USA'
, 'Wilbraham, MA' : 'USA'
, 'Wild Wild Web' : 'USA'
, 'Wildomar, CA' : 'USA'
, 'Williamsbridge, Bronx, New Yor' : 'USA'
, 'Williamsburg, VA' : 'USA'
, 'Williamstown, VT' : 'USA'
, 'Wilmington, DE' : 'USA'
, 'Wilmington, Delaware' : 'USA'
, 'Wilmington, NC' : 'USA'
, 'Wiltshire' : 'USA'
, 'Windsor,Ontario' : 'CAN'
, 'Winnipeg' : 'USA'
, 'Winnipeg, Manitoba' : 'USA'
, 'Winnipeg, MB, Canada' : 'CAN'
, 'Winston Salem, North Carolina' : 'USA'
, 'winston-salem north carolina' : 'USA'
, 'Winston-Salem, NC' : 'USA'
, 'Winter Park, Colorado' : 'USA'
, 'Wisconsin' : 'USA'
, 'Wisconsin, USA' : 'USA'
, 'wny' : 'USA'
, 'Wolverhampton/Brum/Jersey' : 'USA'
, 'Wood Buffalo, Alberta' : 'CAN'
, 'Woodcreek HS, Roseville, CA' : 'USA'
, 'Wynne, AR' : 'USA'
, 'Wyoming, MI (Grand Rapids)' : 'USA'
, 'Xi\'an, China' : 'CHN'
, 'Yadkinville, NC' : 'USA'
, 'Yellowknife, NT' : 'USA'
, 'Yewa zone' : 'USA'
, 'Ylisse' : 'USA'
, 'Yobe State' : 'USA'
, 'Yogya Berhati Nyaman' : 'USA'
, 'Youngstown, OH' : 'USA'
, 'Yuba City, CA' : 'USA'
, 'Yulee, FL' : 'USA'
, 'zboyer@washingtontimes.com' : 'USA'
, 'Zeerust, South Africa' : 'ZAF'
    
}


def real_country(x):
    if x in unify_country.keys():
        realCountry = unify_country[x]
    else:
        realCountry = x
    return realCountry


In [None]:
trainDF['Country'] = trainDF['location'].map( lambda text : real_country(text))
testDF['Country'] = testDF['location'].map( lambda text : real_country(text))

In [895]:
trainDF['text'] = trainDF['text']+' '+trainDF['keyword']+' '+trainDF['location']
testDF['text'] = testDF['text']+' '+testDF['keyword']+' '+trainDF['location']

In [258]:
Xtrain = pd.DataFrame(trainSparseMat.todense(), columns=countVec.get_feature_names())
Xtest = pd.DataFrame(testSparseMat.todense(), columns=countVec.get_feature_names())

In [259]:
Xtest.head()

Unnamed: 0,aba,aba woman,abandoned,abc,abc news,ablaze,able,absolutely,abstorm,access,...,youtube,youtube playlist,youtube video,youve,youve home,yr,yr old,yrs,yyc,zone
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [260]:
Xtrainmodified = pd.concat([trainDF['Country'], Xtrain], axis=1, sort=False)
Xtestmodified = pd.concat([testDF['Country'], Xtest], axis=1, sort=False)

In [261]:
Xtrainmodified = pd.get_dummies(Xtrainmodified,drop_first=True)
Xtestmodified = pd.get_dummies(Xtestmodified,drop_first=True)
common_cols = [col for col in set(Xtrainmodified.columns).intersection(Xtestmodified.columns)]
Xtrainmodified = Xtrainmodified[common_cols]
Xtestmodified = Xtestmodified[common_cols]


In [262]:
Xtrainmodified.head()

Unnamed: 0,investigators families,heat,read,fan,making,rly,attacked,obliterated,football,outrage amid,...,derailment freakiest,fog,played,ap,sinking feeling,rising,looking,confirms,mhtwfnet,imagine
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [263]:
Xtestmodified.head()

Unnamed: 0,investigators families,heat,read,fan,making,rly,attacked,obliterated,football,outrage amid,...,derailment freakiest,fog,played,ap,sinking feeling,rising,looking,confirms,mhtwfnet,imagine
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [264]:
scaler = StandardScaler()

Xstrainmodified = scaler.fit_transform(Xtrainmodified)
Xstestmodified = scaler.fit_transform(Xtestmodified)

In [265]:
Xtrain = Xstrainmodified
ytrain = trainDF['target']
Xtest = Xstestmodified
ytest = submitDF['target']
logreg = LogisticRegression()
logreg.fit(Xtrain, ytrain)
y_pred = logreg.predict(Xtest)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(
    logreg.score(Xtest, ytest)))
##----
predictions = logreg.predict(Xtest)
print(classification_report(ytest, predictions))
print('Average f1_score: {:.2f}'.format(
    f1_score(ytest, predictions, average='weighted')))

Accuracy of logistic regression classifier on test set: 0.58
              precision    recall  f1-score   support

           0       1.00      0.58      0.73      3263
           1       0.00      0.00      0.00         0

    accuracy                           0.58      3263
   macro avg       0.50      0.29      0.37      3263
weighted avg       1.00      0.58      0.73      3263

Average f1_score: 0.73


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))


In [130]:
submitDF['target'] = predictions
submitDF.to_csv('my_submission18.csv',index=False)

In [38]:
trainDF['text']

0       Our Deeds are the Reason of this earthquake Ma...
1                   Forest fire near La Ronge Sask Canada
2       All residents asked to shelter in place are be...
3        people receive wildfires evacuation orders in...
4       Just got sent this photo from Ruby Alaska as s...
                              ...                        
7608    Two giant cranes holding a bridge collapse int...
7609    ariaahrary TheTawniest The out of control wild...
7610                        M  UTCkm S of Volcano Hawaii 
7611    Police investigating after an ebike collided w...
7612    The Latest More Homes Razed by Northern Califo...
Name: text, Length: 7613, dtype: object

In [41]:
result[['Country','our','reason','earthquake','this']]

Unnamed: 0,Country,our,reason,earthquake,this
0,World,1,1,1,1
1,World,0,0,0,0
2,World,0,0,0,0
3,World,0,0,0,0
4,World,0,0,0,1
...,...,...,...,...,...
7608,World,0,0,0,0
7609,World,0,0,0,0
7610,World,0,0,0,0
7611,World,0,0,0,0


### HashingVectorizer

now, let's see if the HashingVectorizer will enhance this or not with still the same base model

In [994]:

hashVec = HashingVectorizer(ngram_range=(1,2), binary=True, norm='l2')
trainSparseMat = hashVec.fit_transform(trainDF["text"])
testSparseMat = hashVec.transform(testDF["text"])
Xtrain = trainSparseMat
ytrain = trainDF['target']
Xtest = testSparseMat
logreg = LogisticRegression()
logreg.fit(Xtrain, ytrain)
y_pred = logreg.predict(Xtest)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(
    logreg.score(Xtest, ytest)))
##----
predictions = logreg.predict(Xtest)
print(classification_report(ytest, predictions))
print('Average f1_score: {:.2f}'.format(
    f1_score(ytest, predictions, average='weighted')))

Accuracy of logistic regression classifier on test set: 0.68
              precision    recall  f1-score   support

           0       1.00      0.68      0.81      3263
           1       0.00      0.00      0.00         0

    accuracy                           0.68      3263
   macro avg       0.50      0.34      0.41      3263
weighted avg       1.00      0.68      0.81      3263

Average f1_score: 0.81


  _warn_prf(average, modifier, msg_start, len(result))


In [910]:
submitDF['target'] = predictions
submitDF.to_csv('my_submission22.csv',index=False)

Accuracy is **66%** using HashingVectorizer which is really an amazing improvement, let's see how we can improve this using different vectorizer

### Term frequency - inverse document frequency (tf-idf)

In [269]:
trainDF["text"]

0       Our Deeds are the Reason of this earthquake Ma...
1                   Forest fire near La Ronge Sask Canada
2       All residents asked to shelter in place are be...
3        people receive wildfires evacuation orders in...
4       Just got sent this photo from Ruby Alaska as s...
                              ...                        
7608    Two giant cranes holding a bridge collapse int...
7609    ariaahrary TheTawniest The out of control wild...
7610                        M  UTCkm S of Volcano Hawaii 
7611    Police investigating after an ebike collided w...
7612    The Latest More Homes Razed by Northern Califo...
Name: text, Length: 7613, dtype: object

In [912]:
termVec = TfidfVectorizer(min_df = 8, ngram_range = (1,2),binary=True ,stop_words='english') 
trainSparseMat = termVec.fit_transform(trainDF["text"])
testSparseMat = termVec.transform(testDF["text"])
Xtrain = trainSparseMat
ytrain = trainDF['target']
Xtest = testSparseMat
logreg = LogisticRegression()
logreg.fit(Xtrain, ytrain)
y_pred = logreg.predict(Xtest)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(
    logreg.score(Xtest, ytest)))
##----
predictions = logreg.predict(Xtest)
print(classification_report(ytest, predictions))
print('Average f1_score: {:.2f}'.format(
    f1_score(ytest, predictions, average='weighted')))

Accuracy of logistic regression classifier on test set: 0.65
              precision    recall  f1-score   support

           0       1.00      0.65      0.79      3263
           1       0.00      0.00      0.00         0

    accuracy                           0.65      3263
   macro avg       0.50      0.32      0.39      3263
weighted avg       1.00      0.65      0.79      3263

Average f1_score: 0.79


  _warn_prf(average, modifier, msg_start, len(result))


Same Resuilts as Hashing Vectorizer **65%** accuracy

now let's use decission tree classifier

In [271]:
dt = DecisionTreeClassifier()
Xtrain = trainSparseMat
ytrain = trainDF['target']
Xtest = testSparseMat
print("DT CV training score:\t", cross_val_score(dt, Xtrain, ytrain, cv=5,n_jobs=1).mean())
dt.fit(Xtrain, ytrain)


DT CV training score:	 0.5972685144041906


DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=None, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=None, splitter='best')

In [272]:
termVec = TfidfVectorizer(stop_words='english')
trainSparseMat = termVec.fit_transform(trainDF["text"])
testSparseMat = termVec.transform(testDF["text"])
Xtrain = trainSparseMat
ytrain = trainDF['target']
Xtest = testSparseMat
dt = DecisionTreeClassifier()
dt.fit(Xtrain, ytrain)
y_pred = dt.predict(Xtest)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(
    dt.score(Xtest, ytest)))
##----
predictions = dt.predict(Xtest)
print(classification_report(ytest, predictions))
print('Average f1_score: {:.2f}'.format(
    f1_score(ytest, predictions, average='weighted')))

Accuracy of logistic regression classifier on test set: 0.59
              precision    recall  f1-score   support

           0       1.00      0.59      0.74      3263
           1       0.00      0.00      0.00         0

    accuracy                           0.59      3263
   macro avg       0.50      0.30      0.37      3263
weighted avg       1.00      0.59      0.74      3263

Average f1_score: 0.74


  _warn_prf(average, modifier, msg_start, len(result))


### Bagging Classifier

In [273]:
termVec = TfidfVectorizer(min_df = 8, ngram_range = (1,2),stop_words='english') 
trainSparseMat = termVec.fit_transform(trainDF["text"])
testSparseMat = termVec.transform(testDF["text"])
Xtrain = trainSparseMat
ytrain = trainDF['target']
Xtest = testSparseMat
DT = DecisionTreeClassifier()
BaggingModel = BaggingClassifier(base_estimator=DT, n_estimators=50, 
                          max_features=0.5,max_samples=0.5, oob_score=True)
BaggingModel.fit(Xtrain, ytrain)
y_pred = BaggingModel.predict(Xtest)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(
    BaggingModel.score(Xtest, ytest)))
##----
predictions = BaggingModel.predict(Xtest)
print(classification_report(ytest, predictions))
print('Average f1_score: {:.2f}'.format(
    f1_score(ytest, predictions, average='weighted')))

Accuracy of logistic regression classifier on test set: 0.64
              precision    recall  f1-score   support

           0       1.00      0.64      0.78      3263
           1       0.00      0.00      0.00         0

    accuracy                           0.64      3263
   macro avg       0.50      0.32      0.39      3263
weighted avg       1.00      0.64      0.78      3263

Average f1_score: 0.78


  _warn_prf(average, modifier, msg_start, len(result))


### Trying DTC  

In [24]:
countVec = CountVectorizer()
trainSparseMat = countVec.fit_transform(trainDF["text"])
testSparseMat = countVec.transform(testDF["text"])
Xtrain = trainSparseMat
ytrain = trainDF['target']
Xtest = testSparseMat
ytest = submitDF['target']
dt = DecisionTreeClassifier()

print("DT CV training score:\t", cross_val_score(dt, Xtrain, ytrain, cv=5,
                                                 n_jobs=1).mean())
dt.fit(Xtrain, ytrain)
print("DT test score:\t", dt.score(Xtest, ytest))


# gridsearch params
dtc_params = {
    'max_depth': range(1,20),
    'max_features': [None, 'log2', 'sqrt'],
    'min_samples_split': range(5,30),
    'max_leaf_nodes': [None],
    'min_samples_leaf': range(1,10)
}

from sklearn.model_selection import GridSearchCV
# set the gridsearch
dtc_gs = GridSearchCV(dt,
                      dtc_params, cv=5, verbose=1, n_jobs=-1)







DT CV training score:	 0.6521782083394089
DT test score:	 0.8072326080294208


In [25]:
dtc_gs.fit(Xtrain, ytrain)


dtc_best = dtc_gs.best_estimator_
print(dtc_gs.best_params_)
print(dtc_gs.best_score_)

Fitting 5 folds for each of 12825 candidates, totalling 64125 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:    5.4s
[Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed:   11.3s
[Parallel(n_jobs=-1)]: Done 442 tasks      | elapsed:   21.0s
[Parallel(n_jobs=-1)]: Done 792 tasks      | elapsed:   35.9s
[Parallel(n_jobs=-1)]: Done 3632 tasks      | elapsed:  1.2min
[Parallel(n_jobs=-1)]: Done 4256 tasks      | elapsed:  1.6min
[Parallel(n_jobs=-1)]: Done 7222 tasks      | elapsed:  2.2min
[Parallel(n_jobs=-1)]: Done 8960 tasks      | elapsed:  2.7min
[Parallel(n_jobs=-1)]: Done 11078 tasks      | elapsed:  3.4min
[Parallel(n_jobs=-1)]: Done 14284 tasks      | elapsed:  4.2min
[Parallel(n_jobs=-1)]: Done 17590 tasks      | elapsed:  5.2min
[Parallel(n_jobs=-1)]: Done 20996 tasks      | elapsed:  7.1min
[Parallel(n_jobs=-1)]: Done 24502 tasks      | elapsed:  8.5min
[Parallel(n_jobs=-1)]: Done 28108 tasks      | elapsed: 10.1min
[Parallel(n_jobs=-1)]: Done 34038 tasks 

{'max_depth': 19, 'max_features': None, 'max_leaf_nodes': None, 'min_samples_leaf': 1, 'min_samples_split': 15}
0.6629477231724162


In [27]:
predictions = dtc_best.predict(Xtest)
print(classification_report(ytest, predictions))
print('Average f1_score: {:.2f}'.format(
    f1_score(ytest, predictions, average='weighted')))

              precision    recall  f1-score   support

           0       0.77      0.89      0.83      2068
           1       0.74      0.54      0.63      1195

    accuracy                           0.76      3263
   macro avg       0.76      0.72      0.73      3263
weighted avg       0.76      0.76      0.75      3263



In [28]:
submitDF['target'] = predictions
submitDF.to_csv('my_submission12.csv',index=False)

## 6. Summary

In this assignment, I started to load all data followed by some EDA tasked in order to see the contents of the given data and if prospect features that is useful to be used for my predictions and I'm going to describe the steps i used as follows:
<br><br>**1. EDA** <br>
In this phase, I had a deep look into my data to see my dataframe info and for the train dataframe I have **7613** lines whearas **3263** entries for the test dataframe. By looking into my dataframes I found null values for **keyword** and **location** fields with (**61**, **2533**),(**26**, **1105**) null entries for the trainDF, testDF respectively. My target is almost **43%** disasterious and **57%** for the rest. Also, some of the values was repeated for the location such as **USA, New York, United States**,.. etc. and for the keyword **fatalities, deluge, armageddon**,.. etc as which I decided later as i will decsibe to try to see if they are useful for my model or not(I'll explain later on). for simplicity and benefits later on i decided to replace all **NaN** values in these two fields with white space.
<br><br>**2. Baseline Model** <br>
in my baseline model I choose to user CountVectorizer along with logistic regression classifier with only using the text column, so i creates the sparse matrix for both the train and test dataframe and the accuracy for this model was **63%** with **0.78** for my **f1 score**

<br>**3. Model Enhancements** <br><br>
In order to enhance my model and I started to look into my text and followed the following steps to have it cleaned:<br>
    **3.1 dealing with abbreviation issue:** here I started to look into all appreviations and i found some functions in the internet to replace these abbrevations with normal text.
<br>**3.2 dealing with contractions issue:** here I used python module **contractions** to fix all I'm,  don't,. words inside my text and the results was pretty good to return I am , do not, and did not.
<br>**3.3 dealing with URL's/HTML tags issue:** here i used regular expression python module to remove all URL's /HTML tagsfrom my text.
<br>**3.4 dealing with Emojis issue:** here I used regular expression python module to remove all emojies such as :) , ;) etc.
<br>**3.5 dealing with numbers issue:** here I used num2words python module to replace all numbers with related text but it didnt really work evectively as it replace for ex 21 to twoone which is totally different from twenty one, and despite of this issue I found a slight increase in f1 score after i used this technique but i believe i need to enhance this more and I'm pretty sure it will enhance my results dramatically.
<br>**3.6 dealing with punctuation issue:** here I used punctuations submodule in string module remove all punctuations.
<br>**3.7 removing stopwords:** and I have done this directly inside the vectorizer.
<br>**3.8 lemmatizing/stemming:** I have tried bith techniques and i ended up with using only the stemming through PorterStemmer as the enhancements was obvious.
<br><br>**4. Result** <br>
After executing the previous tasks i manages to move my f1-score to the best results i have so far (**0.80879**).
<img src="bestscore3.png">
<br>It is worth to mention here as well that i have tried other vectorizers like countvectorizer Hashing and tf and tried different hyperparameters and the best results was so far is CountVectorizer(ngram_range=(1,2), binary=True, stop_words='english')

<br><br>**5. Other tasks** <br>
I also tried the following but it that was of no significance on my model:
5.1 adding location and keyword to the text and then follow the step# 3 again
5.1 tried to generate a country feature from the location and use these as feature inside my model
5.1 tried to combine similar keywords a keyword feature.
5.1 tried to use gridsearching a decision tree classifier and see if there is an improvment
5.1 tried to use bagging classifier with DTC as a base and see if there is an improvment


## 7. Future Enhancements

1. Try to replace all emojies with related text emotion, for ex :) to be replace to Smiling. etc. (for the record i tried this a lot but i didnt find a ready made module for this and i think i will need to write my own function to replace all emojies with related text expression)
2. Try to fix the number to text issue to get more meanningful from the text
3. Trying Neural Networks

## 8. References
https://gist.github.com/slowkow/7a7f61f495e3dbb7e3d767f97bd7304b <br>
https://github.com/rishabhverma17/sms_slang_translator <br>
https://sondosatwi.wordpress.com/2017/08/01/using-text-data-and-dataframemapper-in-python/