Run the following two cells first.

In [None]:
import pandas as pd
import csv
import tarfile

In [None]:
pd.set_option('display.max_colwidth', None)  # Display the full content of each column (None = no limit; -1 is rejected by recent pandas)
# pd.set_option('display.max_rows', None)  # Uncomment to display ALL rows of a DataFrame (or pass a number to set the maximum)
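If you only want full-width output for a single cell, pandas can also apply display options temporarily with `pd.option_context`. The sketch below uses a made-up one-column frame to show that the option is in effect inside the `with` block and reverts afterwards (to the pandas default of 50 characters, in a fresh session).

```python
import pandas as pd

# Hypothetical one-column frame with a long text cell
frame = pd.DataFrame({'Details': ['x' * 80]})

# These options apply only inside the `with` block
with pd.option_context('display.max_colwidth', None, 'display.max_rows', 20):
    full_text = repr(frame)

# Outside the block the previous settings apply again, so the cell is truncated
truncated_text = repr(frame)
```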

# Languages of users

First, run the following two cells to load the data.

In [None]:
def read_user_languages():
    # Tab-separated file without a header; \N marks a missing value
    data = pd.read_csv('./user_languages.csv',
                       sep='\t',
                       header=None,
                       names=['Language', 'Level', 'Username', 'Details'],
                       dtype=str,  # keep Level as text so comparisons with '5' below work
                       quoting=csv.QUOTE_NONE)
    # Remove unknown users
    data = data[data['Username'] != r'\N']
    data = data.dropna(subset=['Username'])
    return data.fillna('')
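To see what the reader does without the real export, here is a sketch that applies the same parsing steps to a small inline sample via `io.StringIO`. The rows and usernames are invented, and the format (four tab-separated columns, `\N` for missing values) is assumed from the reader above rather than from any file specification.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: Language, Level, Username, Details separated by tabs,
# with a literal backslash-N marking missing values
SAMPLE = (
    "fra\t5\talice\t\\N\n"
    "eng\t4\talice\tLived in London\n"
    "jpn\t\\N\t\\N\t\\N\n"
    "eng\t\\N\tbob\t\\N\n"
)

def parse_user_languages(text):
    data = pd.read_csv(io.StringIO(text),
                       sep='\t',
                       header=None,
                       names=['Language', 'Level', 'Username', 'Details'],
                       dtype=str,  # keep every column as text
                       quoting=csv.QUOTE_NONE)
    # Drop rows whose user is unknown, then blank out remaining NaN cells
    data = data[data['Username'] != r'\N']
    data = data.dropna(subset=['Username'])
    return data.fillna('')

frame = parse_user_languages(SAMPLE)
print(frame)
```

The third sample row is dropped because its username is `\N`; note that `\N` values in other columns (such as Bob's level) are kept as literal text.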

The cell below displays 10 random rows to give you an overview of the structure of the data.

In [None]:
user_infos = read_user_languages()
user_infos.sample(10)

### Languages of a specific user

In [None]:
def languages_of_user(username, user_frame):
    return user_frame[user_frame['Username'] == username].sort_values(by='Level', ascending=False)

Replace the value of `username` with the username you want to check, then run the following cell. Results are sorted by `Level` in descending order.

In [None]:
username = '08pb80'
languages_of_user(username, user_infos)
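Because `Level` is stored as text, the descending sort is lexicographic. That matches numeric order for single-digit levels, but a row whose level is the literal `\N` sorts before `5`, since a backslash comes after the digits in character order. The sketch below reuses the same function on a hypothetical toy frame to show this.

```python
import pandas as pd

def languages_of_user(username, user_frame):
    # Same logic as the notebook's helper: filter by user, highest level first
    return user_frame[user_frame['Username'] == username].sort_values(by='Level', ascending=False)

# Hypothetical toy data with Level stored as strings, including one unset level
toy = pd.DataFrame({
    'Language': ['fra', 'eng', 'jpn', 'deu', 'eng'],
    'Level': ['5', '3', '1', r'\N', '4'],
    'Username': ['alice', 'alice', 'alice', 'alice', 'bob'],
    'Details': ['', '', '', '', ''],
})

result = languages_of_user('alice', toy)
print(result[['Language', 'Level']])
```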

### Users of a specific language

In [None]:
def users_of_language(iso, user_frame):
    return user_frame[user_frame['Language'] == iso].sort_values(by='Username')

Set `language` to your target language as a 3-letter ISO 639-3 code (`cmn`, `fra`, `jpn`, `eng`, etc.).

In [None]:
language = 'fra'
users_of_language(language, user_infos)
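A code that does not appear in the data (for example the 2-letter `fr`) silently returns an empty frame. If you prefer an explicit error, a variant along these lines could work; `users_of_language_checked` is a hypothetical name, not part of the notebook, and the toy data is invented.

```python
import pandas as pd

def users_of_language_checked(iso, user_frame):
    # Hypothetical variant of users_of_language that rejects codes absent from the data
    if iso not in set(user_frame['Language']):
        raise ValueError(f'No rows for language code {iso!r}; '
                         'expected a 3-letter ISO code such as "fra"')
    return user_frame[user_frame['Language'] == iso].sort_values(by='Username')

toy = pd.DataFrame({
    'Language': ['fra', 'eng', 'fra'],
    'Level': ['5', '4', '3'],
    'Username': ['erin', 'alice', 'bob'],
    'Details': ['', '', ''],
})

print(users_of_language_checked('fra', toy))
```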

### Natives of a specific language

In [None]:
def natives_of_language(iso, user_frame):
    frame = users_of_language(iso, user_frame)
    return frame[frame['Level'] == '5']

Set `language` to your target language as a 3-letter ISO 639-3 code (`cmn`, `fra`, `jpn`, `eng`, etc.).

In [None]:
language = 'fra'
natives_of_language(language, user_infos)
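`natives_of_language` compares `Level` with the string `'5'`, so it only works while the column holds text. If the column were ever parsed as integers, the comparison would match nothing. A sketch of a type-tolerant alternative, assuming (as the notebook does) that level 5 means native, converts the levels to numbers first; the helper name and toy data are hypothetical.

```python
import pandas as pd

def natives_of_language_numeric(iso, user_frame, native_level=5):
    # Hypothetical helper: compare levels as numbers instead of the string '5'.
    # Non-numeric entries such as \N or '' become NaN and never match.
    levels = pd.to_numeric(user_frame['Level'], errors='coerce')
    mask = (user_frame['Language'] == iso) & (levels == native_level)
    return user_frame[mask].sort_values(by='Username')

text_levels = pd.DataFrame({
    'Language': ['fra', 'fra', 'fra', 'eng'],
    'Level': ['5', '4', r'\N', '5'],
    'Username': ['bob', 'carol', 'dave', 'erin'],
})
int_levels = text_levels.assign(Level=[5, 4, 0, 5])

print(natives_of_language_numeric('fra', text_levels))
print(natives_of_language_numeric('fra', int_levels))
```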

### Natives of X speaking Y

Run the following cell to define the function.

In [None]:
def natives_speaking_other(main_language, other_language, user_frame):
    # Users who are native (Level '5') in main_language
    native_frame = users_of_language(main_language, user_frame)
    native_users = native_frame[native_frame['Level'] == '5'].Username.tolist()
    # Users who list other_language at any level
    second_frame = user_frame[user_frame['Language'] == other_language]
    second_users = second_frame.Username.tolist()
    target_users = list(set(native_users).intersection(second_users))
    # Keep only the rows for the two languages of interest
    result = user_frame[user_frame['Username'].isin(target_users)
                        & user_frame['Language'].isin([main_language, other_language])].sort_values(by='Username')
    print(f'{len(target_users)} users found.')
    return result

Fetch users who are native speakers of `main_language` and also speak `other_language`.  
Choose your target languages as 3-letter ISO 639-3 codes (`cmn`, `fra`, `jpn`, `eng`, etc.).

In [None]:
main_language = 'fra'
other_language = 'eng'
natives_speaking_other(main_language, other_language, user_infos)
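The set intersection in `natives_speaking_other` can also be expressed as an inner join with `DataFrame.merge`, which keeps the whole computation in pandas. The sketch below is a hypothetical merge-based variant (not the notebook's method) checked against invented toy data; it assumes, like the original, that level `'5'` marks a native speaker.

```python
import pandas as pd

def natives_speaking_other_merge(main_language, other_language, user_frame):
    # Hypothetical merge-based variant of natives_speaking_other
    natives = user_frame[(user_frame['Language'] == main_language)
                         & (user_frame['Level'] == '5')]
    speakers = user_frame[user_frame['Language'] == other_language]
    # An inner join on Username keeps only users present in both frames
    matched = natives[['Username']].merge(speakers[['Username']], on='Username')
    rows = user_frame[user_frame['Username'].isin(matched['Username'])
                      & user_frame['Language'].isin([main_language, other_language])]
    return rows.sort_values(by='Username')

toy = pd.DataFrame({
    'Language': ['fra', 'eng', 'fra', 'fra', 'eng', 'eng', 'fra', 'eng'],
    'Level': ['5', '3', '5', '4', '5', '5', '5', '1'],
    'Username': ['alice', 'alice', 'bob', 'carol', 'carol', 'dave', 'erin', 'erin'],
})

result = natives_speaking_other_merge('fra', 'eng', toy)
print(result)
```

Bob is excluded because he lists no English, and Carol because her French level is not `5`. Since the final filter uses `isin`, duplicate matches from the join do not duplicate rows in the result.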

You can get the list of usernames in any frame you built by appending `.Username.tolist()`. For example:

In [None]:
natives_speaking_other(main_language, other_language, user_infos).Username.tolist()
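The frame returned by `natives_speaking_other` has one row per user for each of the two languages, so `.Username.tolist()` lists every user twice. `Series.unique()` keeps one entry per user, in order of first appearance. The frame below is a hypothetical stand-in for that output.

```python
import pandas as pd

# Hypothetical frame shaped like natives_speaking_other's output:
# one row per user for each of the two languages
result = pd.DataFrame({
    'Language': ['fra', 'eng', 'fra', 'eng'],
    'Level': ['5', '3', '5', '1'],
    'Username': ['alice', 'alice', 'erin', 'erin'],
})

all_names = result.Username.tolist()              # each user appears twice
unique_names = result.Username.unique().tolist()  # one entry per user
print(unique_names)
```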