# Short Flask Tutorial

Start with http://flask.pocoo.org/docs/0.12/quickstart/. Your task is to write an HTTP service that receives a string and returns its length. Using `requests.get` (`requests` is a third-party library, installed with `pip install requests`), write a Python client that communicates with the service. This will serve as a stub for your real use of Flask.

Example: http://localhost:5500/get_length?str=supercalifragilisticexpialidocious should return 34 
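
As a quick sanity check of the expected answer, independent of Flask, the sketch below pulls the `str` parameter out of the example URL with the standard library and measures it. The helper name `length_from_url` is hypothetical; it only illustrates what the endpoint is expected to compute.

```python
from urllib.parse import urlparse, parse_qs

def length_from_url(url):
    """Return the length of the 'str' query parameter, or None if it is missing."""
    query = parse_qs(urlparse(url).query)
    values = query.get('str')
    return len(values[0]) if values else None

example = 'http://localhost:5500/get_length?str=supercalifragilisticexpialidocious'
print(length_from_url(example))  # 34
```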


In [1]:
from werkzeug.wrappers import Request, Response
from werkzeug.serving import run_simple
from flask import Flask, request, redirect, url_for
import pandas as pd
import numpy as np

server_name = 'http://localhost:5000'

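def shutdown_server():
    # Relies on the 'werkzeug.server.shutdown' hook of Werkzeug's development server.
    # Newer Werkzeug releases (2.1 and later) no longer provide it, so func is None there.
    func = request.environ.get('werkzeug.server.shutdown')
    if func is None:
        raise RuntimeError('Not running with the Werkzeug Server')
    func()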

In [None]:
app = Flask(__name__)

# Server shutdown endpoint
@app.route('/shutdown')
def shutdown():
    shutdown_server()
    print('Shutting down server')
    return 'Server shutting down...'

@app.route('/get_length')
def get_length():
    word = request.args.get('str', None)
    if word is not None:
        return "<H3><font color='blue' face='arial'>String '%s' has a length of %d</font></H3>" % (word, len(word))
    else:
        return """
            <H3><font color='red' face='arial'>
            Missing 'str' argument: <br><br>http://localhost:5500/get_length?str=xxx
            </font></H3>
            """

if __name__ == '__main__':
    run_simple('localhost', 5500, app)

# Prepare the data

Download https://raw.githubusercontent.com/hadley/data-baby-names/master/baby-names.csv

Load it with pandas and build a table with the following columns:

1. Name
1. Mean Percent of boys over the years
1. Mean Percent of girls over the years
1. Total percent (Column2+Column3) / 2
1. IsGirl (= Column3 > Column2)

Sort by total percent and take the top 2000 names.

In [52]:
baby_data_filename = 'baby-names.csv'

# Load the file and show first rows
raw_data = pd.read_csv(baby_data_filename)
raw_data.head()

Unnamed: 0,year,name,percent,sex
0,1880,John,0.081541,boy
1,1880,William,0.080511,boy
2,1880,James,0.050057,boy
3,1880,Charles,0.045167,boy
4,1880,George,0.043292,boy


In [59]:
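all_baby_names = pd.pivot_table(raw_data, values='percent', index=['name'], columns=['sex'], aggfunc=np.mean, fill_value=0)
# Sum of the two mean percents. The task statement divides by 2, but that is a
# constant factor and does not change the ranking.
all_baby_names['total_percent'] = all_baby_names['boy']+all_baby_names['girl']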

# Add Is_Girl column
all_baby_names['is_girl'] = all_baby_names['girl'] > all_baby_names['boy']
all_baby_names = all_baby_names.sort_values('total_percent', axis=0, ascending=False)

# Clean Index names
all_baby_names = all_baby_names.reset_index()
all_baby_names.index.name = None
all_baby_names.columns.name = None

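# Select top 2000 names (copied so feature columns can be added later without warnings)
baby_names = all_baby_names.iloc[:2000].copy()
# Everything after the top 2000; starts at position 2000 so that no row is skipped
not_in_top = all_baby_names.iloc[2000:]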

In [60]:
# Display the first 100 rows in style
def custom_style(val):
    return 'background-color:pink; color:black' if val else 'background-color:steelblue; color:white'

(baby_names[:100].style
 .applymap(custom_style, subset=pd.IndexSlice[:, ['is_girl']])
)


Unnamed: 0,name,boy,girl,total_percent,is_girl
0,John,0.0410821,0.000190596,0.0412727,False
1,James,0.035465,0.000171445,0.0356365,False
2,Mary,0.000239301,0.0349757,0.035215,True
3,William,0.0341818,0.000139852,0.0343217,False
4,Robert,0.0296253,0.000142066,0.0297674,False
5,Charles,0.0195205,0.000112667,0.0196332,False
6,Michael,0.0183419,0.000190088,0.018532,False
7,Joseph,0.0177712,8.49535e-05,0.0178562,False
8,David,0.0167366,0.000100773,0.0168373,False
9,George,0.0162539,0.000138763,0.0163926,False


# Create features

Using the `nltk` package, or your own function, create 2-grams (https://en.wikipedia.org/wiki/N-gram) of the characters in each name.
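
For the "your own function" option, a minimal sketch without `nltk` could look like the following. The name `char_ngrams` is hypothetical; it slides a window of size `n` over the lowercased name.

```python
def char_ngrams(name, n):
    """Return the list of lowercase character n-grams of a name, in order."""
    text = name.lower()
    # A name shorter than n yields an empty list
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams('William', 2))  # ['wi', 'il', 'll', 'li', 'ia', 'am']
print(char_ngrams('John', 3))     # ['joh', 'ohn']
```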

In [14]:
import nltk
from nltk import ngrams
from sklearn.preprocessing import MultiLabelBinarizer

def to_n_gram(x, n):
    return [''.join(grams) for grams in ngrams(x.lower(), n)]

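def to_letter_list(x):
    # Count occurrences of each letter a-z; other characters (hyphens, accents, ...) are ignored
    char_list = [0] * 26
    for c in x.lower():
        if 'a' <= c <= 'z':
            char_list[ord(c)-ord('a')] += 1

    return char_list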

def compute_features(df, name_column):
    df['bigram'] = df[name_column].apply(lambda x: to_n_gram(x,2))
    df['trigram'] = df[name_column].apply(lambda x: to_n_gram(x,3))
    
    df['letter_list'] = df[name_column].apply(to_letter_list)
    letter_list_col_names = list(chr(c) for c in range(ord('a'), ord('z')+1))
    letter_list_features = pd.DataFrame(df['letter_list'].values.tolist(), index= df.index, columns=letter_list_col_names)
    
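    # Binary indicator matrices for bigrams and trigrams (dense arrays, mostly zeros).
    # index=df.index keeps the rows aligned with df when concatenating below.
    mlb = MultiLabelBinarizer()
    bigrams = mlb.fit_transform(df['bigram'])
    bg_features = pd.DataFrame(bigrams, index=df.index, columns=mlb.classes_)
    trigrams = mlb.fit_transform(df['trigram'])
    tg_features = pd.DataFrame(trigrams, index=df.index, columns=mlb.classes_)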
    df = pd.concat([df, letter_list_features, bg_features, tg_features], axis=1, sort=False)
    
    new_features = pd.concat([letter_list_features, bg_features, tg_features], axis=1, sort=False)
    
    return df, new_features

enriched_baby_names, new_features = compute_features(baby_names, 'name')
enriched_baby_names.head()

Unnamed: 0,name,boy,girl,total_percent,is_girl,bigram,trigram,letter_list,a,b,...,zet,zie,zio,zly,zmi,zoe,zoi,zul,zzi,zzy
0,John,0.041082,0.000191,0.041273,False,"[jo, oh, hn]","[joh, ohn]","[0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, ...",0,0,...,0,0,0,0,0,0,0,0,0,0
1,James,0.035465,0.000171,0.035636,False,"[ja, am, me, es]","[jam, ame, mes]","[1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, ...",1,0,...,0,0,0,0,0,0,0,0,0,0
2,Mary,0.000239,0.034976,0.035215,True,"[ma, ar, ry]","[mar, ary]","[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...",1,0,...,0,0,0,0,0,0,0,0,0,0
3,William,0.034182,0.00014,0.034322,False,"[wi, il, ll, li, ia, am]","[wil, ill, lli, lia, iam]","[1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 2, 1, 0, 0, ...",1,0,...,0,0,0,0,0,0,0,0,0,0
4,Robert,0.029625,0.000142,0.029767,False,"[ro, ob, be, er, rt]","[rob, obe, ber, ert]","[0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, ...",0,1,...,0,0,0,0,0,0,0,0,0,0


What are good reasons to lower case the names before producing the ngrams?

<font color='green'>Lowercasing the names creates fewer features, reduces data sparsity and results in more densely populated features.</font>

What are the downsides of lowercasing the names before producing the n-grams?

<font color='green'>After lowercasing, the n-grams no longer differentiate whether a 2-letter sequence sits at the beginning of a name or later in it, which results in a loss of information.</font>
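
One possible way to keep the positional information while still lowercasing is to wrap each name in boundary markers before slicing it. This is a sketch of an alternative, not what the notebook does; the function name and the `^` / `$` markers are assumptions chosen for illustration.

```python
def char_ngrams_with_boundaries(name, n, start='^', end='$'):
    """Lowercase the name, wrap it in boundary markers, then slide a window of size n."""
    text = start + name.lower() + end
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# '^a' marks a name starting with "a", 'a$' a name ending with "a"
print(char_ngrams_with_boundaries('Anna', 2))  # ['^a', 'an', 'nn', 'na', 'a$']
```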

What is the percent of boys in the data?

In [72]:
# is_girl.mean() is the share of girls, so the share of boys is its complement
print('Percent of boys in the data = %.2f%%' % ((1 - enriched_baby_names['is_girl'].mean())*100))

Percent of boys in the data = 35.95%


What is the sparsity of the data? What's the percent of non-zero cells in the feature matrix you created? 

In [73]:
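# Note: the letter-count columns can hold values above 1, so summing the cells slightly
# overestimates the number of non-zero cells; (new_features != 0).sum().sum() counts them exactly.
print('Percent of non-zero cells in the n-gram feature matrix = %.2f%%' % 
      (100*new_features.sum().sum()/new_features.count().sum()))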

Percent of non-zero cells in the n-gram feature matrix = 0.69%
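
The cell above sums the cell values, which matches the count of non-zero cells only when every value is 0 or 1. A small pure-Python sketch on a made-up toy matrix shows the difference; both helper names are hypothetical.

```python
def nonzero_percent(rows):
    """Percent of cells that are non-zero in a list of equal-length rows."""
    total = sum(len(row) for row in rows)
    nonzero = sum(1 for row in rows for value in row if value != 0)
    return 100.0 * nonzero / total

def sum_percent(rows):
    """Percent obtained by summing the cell values instead of counting them."""
    total = sum(len(row) for row in rows)
    return 100.0 * sum(sum(row) for row in rows) / total

toy = [
    [0, 2, 0, 0],   # a letter that appears twice in one name
    [1, 0, 0, 1],
]
print(nonzero_percent(toy))  # 37.5
print(sum_percent(toy))      # 50.0
```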


# Train a model

Sort by the name column and take every fifth name as the test data. Train a model using Logistic Regression or any other model you like. Evaluate the model using Accuracy, AUC and Mean Average Precision (scikit: `average_precision_score`) on the train and test sets. Think about regularization: you have a lot of features. If you are running out of time, do this quickly and move to the next section. Come back to this later.
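
The split described here can be sketched in plain Python as follows. The helper `every_fifth_split` is hypothetical and works on a list of names rather than on the DataFrame.

```python
def every_fifth_split(names):
    """Sort names, send every fifth one to the test set and the rest to the training set."""
    ordered = sorted(names)
    test = ordered[4::5]
    train = [name for i, name in enumerate(ordered) if i % 5 != 4]
    return train, test

names = ['John', 'Mary', 'James', 'William', 'Robert', 'Charles',
         'Michael', 'Joseph', 'David', 'George']
train, test = every_fifth_split(names)
print(test)        # ['John', 'William']
print(len(train))  # 8
```

Because the names are sorted first, the split is deterministic and does not depend on the popularity ranking.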

In [74]:
# Model Imports
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import average_precision_score

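# Every fifth row goes to the test set. Rows are taken by position in the
# popularity-sorted table here, not after sorting by name.
train_indices = enriched_baby_names.index[enriched_baby_names.index % 5 != 4].values
test_indices = enriched_baby_names.index[enriched_baby_names.index % 5 == 4].values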

X_train, y_train = new_features.iloc[train_indices].copy(), enriched_baby_names['is_girl'].iloc[train_indices]
X_test, y_test = new_features.iloc[test_indices].copy(), enriched_baby_names['is_girl'].iloc[test_indices]

lr_clf = LogisticRegression(class_weight='balanced', solver='lbfgs')
lr_clf.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight='balanced', dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='warn', n_jobs=None, penalty='l2', random_state=None,
          solver='lbfgs', tol=0.0001, verbose=0, warm_start=False)

In [75]:
# Test model against training set
# Note: the score printed below is average precision computed from hard 0/1 predictions;
# lr_clf.predict_proba(X_train)[:, 1] would score the ranking of the probabilities instead.
y_train_predict = lr_clf.predict(X_train)
print('First name sex prediction against training set is %.2f%%' % (100*average_precision_score(y_train, y_train_predict)))

# Test model against test set
y_test_predict = lr_clf.predict(X_test)
print('First name sex prediction against test set is %.2f%%' % (100*average_precision_score(y_test, y_test_predict)))

First name sex prediction against training set is 96.20%
First name sex prediction against test set is 85.35%
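
The task asks for accuracy, AUC and average precision, while the cell above reports only average precision. To make the three metrics concrete, here is a pure-Python sketch on made-up labels and scores. The function names are hypothetical, and the average precision helper assumes distinct scores; in the notebook itself, scikit-learn's `accuracy_score`, `roc_auc_score` and `average_precision_score` would be the natural choice.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of the precision measured at each positive, ranked by descending score."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)

y_true = [1, 0, 1, 1, 0]                  # made-up labels (1 = girl)
scores = [0.9, 0.8, 0.7, 0.4, 0.2]        # made-up predicted probabilities
y_pred = [int(s >= 0.5) for s in scores]  # hard predictions at a 0.5 threshold

print('accuracy = %.3f' % accuracy(y_true, y_pred))                     # 0.600
print('AUC = %.3f' % roc_auc(y_true, scores))                           # 0.667
print('average precision = %.3f' % average_precision(y_true, scores))  # 0.806
```

Accuracy uses the thresholded predictions, whereas AUC and average precision are ranking metrics and are normally fed the scores.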


# Save your model

Using `pickle`, save your model to disk and load it again in a different script. Make sure you get the same results.

In [2]:
import pickle
model_filename = 'finalized_model.sav'
feature_filename = 'finalized_model_features.sav'

In [None]:
# Save the model together with the training feature names
with open(model_filename, 'wb') as f:
    pickle.dump((lr_clf, X_train.columns), f)

In [3]:
# Load the model from disk (only unpickle files you trust)
with open(model_filename, 'rb') as f:
    loaded_model, feature_names = pickle.load(f)

In [76]:
y_train_predict = loaded_model.predict(X_train)
print('First name sex prediction against training set is %.2f%%' % (100*average_precision_score(y_train, y_train_predict)))
# Test model against test set
y_test_predict = loaded_model.predict(X_test)
print('First name sex prediction against test set is %.2f%%' % (100*average_precision_score(y_test, y_test_predict)))

First name sex prediction against training set is 96.20%
First name sex prediction against test set is 85.35%
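
The "same results" check can also be done on the pickled object itself. The sketch below round-trips a stand-in object through a temporary file; the dictionary and the file name `model.sav` are placeholders for the real `(lr_clf, X_train.columns)` tuple and path.

```python
import os
import pickle
import tempfile

# Stand-in for (lr_clf, X_train.columns): any picklable object works the same way
saved = ({'weights': [0.5, -1.25, 3.0]}, ['a', 'b', 'jo'])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'model.sav')
    with open(path, 'wb') as f:
        pickle.dump(saved, f)
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == saved)  # True
```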


# Serve your model

Using `flask`, create an API that takes a name and decides if it's a boy or a girl. Also have an endpoint that receives a list of names and returns a list of genders. Use the model you saved with `pickle` in the previous section.


e.g. http://localhost:5000/guess_the_gender?name=JeanLuc

In [4]:
def predict_genders(name_list):
    # name_list is a dict of the form {'name': [...]}
    df_names = pd.DataFrame(name_list)
    df_names, new_features = compute_features(df_names, 'name')

    # Align with the training features: n-grams the model has never seen are dropped,
    # training features absent from these names are filled with 0, and the column order is restored
    new_features = new_features.reindex(columns=feature_names, fill_value=0)
    return loaded_model.predict(new_features), loaded_model.predict_proba(new_features).max(axis=1)
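
The alignment step matters because the model expects exactly the columns it was trained on, in the same order. A plain-Python sketch of the idea, with a made-up vocabulary and a hypothetical helper `align_features`:

```python
def align_features(feature_counts, training_columns):
    """Return values in training-column order; unseen features are dropped, missing ones are 0."""
    return [feature_counts.get(column, 0) for column in training_columns]

training_columns = ['a', 'n', 'an', 'na', 'jo']         # hypothetical training vocabulary
new_name_features = {'a': 1, 'n': 1, 'an': 1, 'xq': 1}  # 'xq' was never seen in training

print(align_features(new_name_features, training_columns))  # [1, 1, 1, 0, 0]
```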

In [5]:
import json
app = Flask(__name__)

# Define flask endpoints
@app.route('/shutdown')
def shutdown():
    shutdown_server()
    print('Shutting down server')
    return 'Server shutting down...'

@app.route('/guess_the_gender')
def guess_the_gender():
    name = request.args.get('name', None)
    
    if name is None:
        return json.dumps("Missing Argument")
    
    prediction, proba = predict_genders({'name': [name]})
    # Both results are arrays of length 1; take the first element explicitly
    return_dict = {'name': name, 
                   'gender': 'girl' if prediction[0] else 'boy',
                   'proba': float(proba[0])}
        
    return json.dumps(return_dict)

@app.route('/guess_the_genders')
def guess_the_genders():
    name_list = request.args.getlist('name')

    # getlist returns an empty list (never None) when the argument is missing
    if not name_list:
        return json.dumps("Missing Argument")
    
    predictions, probas = predict_genders({'name': name_list})
    return_dict = [{'name': name, 
                    'gender': 'girl' if predictions[i] else 'boy',
                   'proba': probas[i]} for i, name in enumerate(name_list)]
        
    return json.dumps(return_dict)

In [6]:
# Run the server! (can't run anything else simultaneously)
if __name__ == '__main__':
    from werkzeug.serving import run_simple
    run_simple('localhost', 5000, app)

 * Running on http://localhost:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [30/Dec/2018 14:50:32] "GET /shutdown HTTP/1.1" 200 -


Shutting down server


# Consume your model with Python

Using `requests`, send requests to your model.

In [19]:
import requests

def get_name_gender(name):
    global server_name
    name_param_in_url = 'name=%s' % name
    url = '%s/guess_the_gender?%s' % (server_name, name_param_in_url)
    print('Requesting url: %s' % url)
    return requests.get(url).json()

def get_name_genders(name_list):
    global server_name
    name_param_in_url = 'name=' + '&name='.join(name_list)
    url = '%s/guess_the_genders?%s' % (server_name, name_param_in_url)
    print('Requesting url: %s' % url)
    return requests.get(url).json()
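
The helpers above build the query string by plain concatenation, which breaks for names containing spaces, `&` or non-ASCII letters. A possible alternative, sketched with the standard library only, is to let `urlencode` do the escaping; the helper name `build_genders_url` is hypothetical.

```python
from urllib.parse import urlencode

def build_genders_url(server, names):
    """Build the /guess_the_genders URL with one percent-encoded 'name' parameter per name."""
    # doseq=True expands the list into repeated name=... pairs
    query = urlencode({'name': list(names)}, doseq=True)
    return '%s/guess_the_genders?%s' % (server, query)

print(build_genders_url('http://localhost:5000', ['Alice', 'Jean Luc', 'Zoë']))
# http://localhost:5000/guess_the_genders?name=Alice&name=Jean+Luc&name=Zo%C3%AB
```

With `requests`, passing `params={'name': names}` to `requests.get` performs the same encoding.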

In [79]:
server_name = 'http://localhost:5000'

# Local Server needs to be running on a different process (eg. pycharm)
print('Request:\n%s\n' % get_name_gender('jeanne'))
print('Request:\n%s\n' % get_name_gender('Jeremy'))
print('Request:\n%s\n' % get_name_genders(['Alice', 'JeanLouis', 'Annie', 'Yair', 'Sarah', 
                        'Simcha', 'Daniel', 'Danielle', 'Nathan', 'Benjamin',
                       'Gary', 'Michael', 'Bar', 'Arie', 'Olivier']))

Requesting url: http://localhost:5000/guess_the_gender?name=jeanne
Request:
{'name': 'jeanne', 'gender': 'girl', 'proba': 0.974317325874994}

Requesting url: http://localhost:5000/guess_the_gender?name=Jeremy
Request:
{'name': 'Jeremy', 'gender': 'boy', 'proba': 0.8287490991794317}

Requesting url: http://localhost:5000/guess_the_genders?name=Alice&name=JeanLouis&name=Annie&name=Yair&name=Sarah&name=Simcha&name=Daniel&name=Danielle&name=Nathan&name=Benjamin&name=Gary&name=Michael&name=Bar&name=Arie&name=Olivier
Request:
[{'name': 'Alice', 'gender': 'girl', 'proba': 0.80528871992616}, {'name': 'JeanLouis', 'gender': 'boy', 'proba': 0.6860290852166158}, {'name': 'Annie', 'gender': 'girl', 'proba': 0.8926453758867244}, {'name': 'Yair', 'gender': 'girl', 'proba': 0.7679896968314022}, {'name': 'Sarah', 'gender': 'girl', 'proba': 0.8088943856844397}, {'name': 'Simcha', 'gender': 'girl', 'proba': 0.5547897345564683}, {'name': 'Daniel', 'gender': 'boy', 'proba': 0.7142862953027947}, {'name': '

# Bonus: Create a friendly interface

In [None]:
# This is the code for my friendly interface. You could run it locally
# with the Main below or just go to my heroku page

@app.route('/gender_predictor', methods=['GET'])
def gender_predictor():
    # fetch results from previous request
    name = request.args.get('name')

    # show userform
    html_code = """
         <html>
         <body style="
        background: rgb(240,249,255); /* Old browsers */
        background: -moz-linear-gradient(top, rgba(240,249,255,1) 0%, rgba(203,235,255,1) 47%, rgba(161,219,255,1) 100%); 
        background: -webkit-linear-gradient(top, rgba(240,249,255,1) 0%,rgba(203,235,255,1) 47%,rgba(161,219,255,1) 100%); 
        background: linear-gradient(to bottom, rgba(240,249,255,1) 0%,rgba(203,235,255,1) 47%,rgba(161,219,255,1) 100%); 
        filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#f0f9ff', endColorstr='#a1dbff',GradientType=0 );" >
         <br><br><br>
         <form action = 'http://localhost:5000/gender_predictor' method = 'get'>
         <p align='center'><font face="Verdana" size="16">Enter 1 or more names</font><br>
         <font face="Verdana" size="4">(space separated)</font></p>
         <p align='center'><input type='text' name='name' style='width: 600px;height: 80px;font-size:28pt;'></p>
         <p align='center'><input type='submit' value='GUESS!!' style='width: 600px;height: 80px;font-size:92px;'></p><br>"""

    if name is not None:
        name_list = name.split()
        if len(name_list) == 1:
            results = [get_name_gender(name_list[0])]
        else:
            results = get_name_genders(name_list)

        html_code  += "<p align='center'>"
        for result in results:
            html_code += "<font face='Verdana' size='4' color='pink'>" \
                            if result['gender'] == 'girl' else "<font face='Verdana' size='4' color='blue'>"
            html_code += "%s is a %s with a probability of %.2f%%</font><br>" % \
                         (result['name'], result['gender'], 100*result['proba'])

    html_code += '</p></body></html>'

    return html_code

In [16]:
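# Run the server
# threaded=True lets the page handler call the prediction endpoints on this same
# server while its own request is still being processed.
if __name__ == '__main__':
    from werkzeug.serving import run_simple
    app.run(threaded=True)
    # Only reached after the first server has been shut down
    run_simple('localhost', 5000, app)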

 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [30/Dec/2018 14:52:06] "GET /guess_the_genders?name=jonas&name=simon&name=avi HTTP/1.1" 200 -
127.0.0.1 - - [30/Dec/2018 14:52:06] "GET /gender_predictor?name=jonas+simon+avi HTTP/1.1" 200 -
127.0.0.1 - - [30/Dec/2018 14:52:12] "GET /guess_the_gender?name=jeremy HTTP/1.1" 200 -
127.0.0.1 - - [30/Dec/2018 14:52:12] "GET /gender_predictor?name=jeremy HTTP/1.1" 200 -
127.0.0.1 - - [30/Dec/2018 14:52:29] "GET /guess_the_gender?name=jeremie HTTP/1.1" 200 -
127.0.0.1 - - [30/Dec/2018 14:52:29] "GET /gender_predictor?name=jeremie HTTP/1.1" 200 -
127.0.0.1 - - [30/Dec/2018 14:52:35] "GET /guess_the_genders?name=jeremy&name=benjamin&name=test HTTP/1.1" 200 -
127.0.0.1 - - [30/Dec/2018 14:52:35] "GET /gender_predictor?name=jeremy+benjamin+test HTTP/1.1" 200 -
127.0.0.1 - - [30/Dec/2018 14:52:40] "GET /guess_the_gender?name=liora HTTP/1.1" 200

Shutting down server


 * Running on http://localhost:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [30/Dec/2018 14:55:09] "GET /shutdown HTTP/1.1" 200 -


Shutting down server


# Bonus: Put it on Heroku

### <font color='green'>Feel free to play the guessing game on my Heroku page: https://bensous.herokuapp.com</font>

### <font color='green'>Or to consume the API as demonstrated below</font>

Use your client to consume the public model. Follow https://devcenter.heroku.com/articles/getting-started-with-python until step 3 (deploy your app). You will have to use https://github.com/heroku-python/conda-buildpack to work with scikit.

In [21]:
server_name = 'https://bensous.herokuapp.com'
print('Request:\n%s\n' % get_name_gender('jeanne'))
print('Request:\n%s\n' % get_name_gender('Jeremy'))
print('Request:\n%s\n' % get_name_genders(['Alice', 'JeanLouis', 'Annie', 'Yair', 'Sarah', 
                        'Simcha', 'Daniel', 'Danielle', 'Nathan', 'Benjamin',
                       'Gary', 'Michael', 'Bar', 'Arie', 'Olivier']))

Requesting url: https://bensous.herokuapp.com/guess_the_gender?name=jeanne
Request:
{'name': 'jeanne', 'gender': 'girl', 'proba': 0.974317325874994}

Requesting url: https://bensous.herokuapp.com/guess_the_gender?name=Jeremy
Request:
{'name': 'Jeremy', 'gender': 'boy', 'proba': 0.8287490991794317}

Requesting url: https://bensous.herokuapp.com/guess_the_genders?name=Alice&name=JeanLouis&name=Annie&name=Yair&name=Sarah&name=Simcha&name=Daniel&name=Danielle&name=Nathan&name=Benjamin&name=Gary&name=Michael&name=Bar&name=Arie&name=Olivier
Request:
[{'name': 'Alice', 'gender': 'girl', 'proba': 0.80528871992616}, {'name': 'JeanLouis', 'gender': 'boy', 'proba': 0.6860290852166158}, {'name': 'Annie', 'gender': 'girl', 'proba': 0.8926453758867247}, {'name': 'Yair', 'gender': 'girl', 'proba': 0.7679896968314022}, {'name': 'Sarah', 'gender': 'girl', 'proba': 0.8088943856844397}, {'name': 'Simcha', 'gender': 'girl', 'proba': 0.5547897345564683}, {'name': 'Daniel', 'gender': 'boy', 'proba': 0.71428

# Evaluate through web


Evaluate your model and your friends' models using names not in the top 2000.

In [68]:
# Get random names
random_names = not_in_top.sample(10)
print(random_names[['name', 'is_girl']])

          name  is_girl
2464     Lacie     True
2491   Brynlee     True
4786      Wirt    False
6705   Arnoldo    False
5656     Jerel    False
3780   Jasmyne     True
4640  Annabell     True
3618   Dillion    False
5695       Cam    False
4921     Abdul    False


In [71]:
# Saved for future reference:
"""
          name  is_girl
2464     Lacie     True
2491   Brynlee     True
4786      Wirt    False
6705   Arnoldo    False
5656     Jerel    False
3780   Jasmyne     True
4640  Annabell     True
3618   Dillion    False
5695       Cam    False
4921     Abdul    False"""

# Try these names in my model
print('Request:\n%s\n' % get_name_genders(random_names['name']))

# Results with Simon's model:
# https://simon-gender-guesser.herokuapp.com/guess_the_gender?name=Lacie,Brynlee,Wirt,Arnoldo,Jerel,Jasmyne,Annabell,Dillion,Cam,Abdul
"""    
Lacie: Girl
Brynlee: Girl
Wirt: Boy
Arnoldo: Boy
Jerel: Girl
Jasmyne: Girl
Annabell: Girl
Dillion: Boy
Cam: Boy
Abdul: Boy"""

# Comparison of scores Simon vs Jeremy:
"""
Lacie - 1 vs 1
Brynlee - 1 vs 1
Wirt - 1 vs 1
Arnoldo - 1 vs 1
Jerel - 0 vs 1
Jasmyne - 1 vs 1
Annabell - 1 vs 1
Dillion - 1 vs 1
Cam - 1 vs 1
Abdul - 1 vs 1

Final Scores!! 
Simon 9/10
Jeremy 10/10
Names were randomly picked!"""

Requesting url: https://bensous.herokuapp.com/guess_the_genders?name=Lacie&name=Brynlee&name=Wirt&name=Arnoldo&name=Jerel&name=Jasmyne&name=Annabell&name=Dillion&name=Cam&name=Abdul
Request:
[{'name': 'Lacie', 'gender': 'girl', 'proba': 0.9694444347960257}, {'name': 'Brynlee', 'gender': 'girl', 'proba': 0.9251626282164802}, {'name': 'Wirt', 'gender': 'boy', 'proba': 0.9694483694839638}, {'name': 'Arnoldo', 'gender': 'boy', 'proba': 0.9601498241034645}, {'name': 'Jerel', 'gender': 'boy', 'proba': 0.8585606158450109}, {'name': 'Jasmyne', 'gender': 'girl', 'proba': 0.8827693758125652}, {'name': 'Annabell', 'gender': 'girl', 'proba': 0.9952787607708213}, {'name': 'Dillion', 'gender': 'boy', 'proba': 0.6909228822662401}, {'name': 'Cam', 'gender': 'boy', 'proba': 0.7356358078780434}, {'name': 'Abdul', 'gender': 'boy', 'proba': 0.8177641338455629}]
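
The manual comparison can be reproduced with a few lines of Python. The labels and predictions below are transcribed from the outputs shown in this section; the `score` helper is hypothetical.

```python
# True labels of the ten sampled names (is_girl column above)
truth = {'Lacie': 'girl', 'Brynlee': 'girl', 'Wirt': 'boy', 'Arnoldo': 'boy', 'Jerel': 'boy',
         'Jasmyne': 'girl', 'Annabell': 'girl', 'Dillion': 'boy', 'Cam': 'boy', 'Abdul': 'boy'}

# Simon's model answered "Girl" for Jerel and agreed with the labels elsewhere
simon = dict(truth, Jerel='girl')
# Every answer in the JSON response above matches the labels
jeremy = dict(truth)

def score(predictions, labels):
    """Number of names whose predicted gender equals the true label."""
    return sum(predictions[name] == gender for name, gender in labels.items())

print('Simon %d/%d' % (score(simon, truth), len(truth)))    # Simon 9/10
print('Jeremy %d/%d' % (score(jeremy, truth), len(truth)))  # Jeremy 10/10
```

Ten names is a very small sample, so a one-name difference says little about which model is better.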




# Discussion

If this were a commercial service, how could you improve it?

1. Data preprocessing
1. Output type
1. Interface

Give some examples


1. This list seems to focus on names used in the US, and my model performs much worse on Indian or Israeli names, for example. One way to improve this would be to enrich the data with the first Google Images results for a given first name, passed through a computer vision model trained to guess the gender.
1. If we wanted to sell this as a service, we should enrich the output with, for example, the year a name was most used, its average use, and so on. Regarding the API, we chose to output JSON because it is the most common format for web APIs these days, but many existing processes still expect other formats (XML, for example), so supporting those as well would be a must if we wanted to sell the service.
1. We built a pretty basic interface, but we could add previous searches, a ranking of the most searched names, and many other improvements to the information, layout and interactivity of the page, etc.