# Preprocessing for Experiment Ideas

## Ideas

### Modified Apriori

Create a set for each province containing *percentile* items for each language (0%, 10%, etc.).

Find predictive rules, but avoid the following trivial rules:

* N% in language A predicting M% in language A
* N% in language A predicting N% in language B, where B is a parent category of A or vice versa

*Requires: Parent Hierarchy, Merging Province Datasets per Year, Percentage-Wise Data*
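A minimal sketch of the rule search follows. It assumes each province's itemset holds `(language, decile)` pairs, and that a hypothetical `ancestors` mapping (language label to the set of categories it sits under) has been built from the parent hierarchy. It only mines single-item rules A → B, i.e. Apriori's first two passes (frequent single items, then frequent pairs), and skips the two trivial rule shapes listed above. All function names are illustrative, not part of this notebook.

```python
from collections import Counter
from itertools import permutations


def build_transactions(shares):
    """shares: {province: {language: proportion}} -> {province: {(language, decile)}}.

    Hypothetical input shape; deciles are 0, 10, ..., 100 (percent).
    """
    return {
        province: {(language, int(proportion * 10) * 10) for language, proportion in languages.items()}
        for province, languages in shares.items()
    }


def is_trivial(lhs, rhs, ancestors):
    """True for the two rule shapes the plan wants to skip."""
    (lang_a, pct_a), (lang_b, pct_b) = lhs, rhs
    if lang_a == lang_b:
        return True  # N% in A predicting M% in A
    related = lang_b in ancestors.get(lang_a, set()) or lang_a in ancestors.get(lang_b, set())
    return related and pct_a == pct_b  # N% in A predicting N% in a parent/child of A


def pairwise_rules(transactions, min_support, min_confidence, ancestors):
    """Single-antecedent rules lhs -> rhs passing support and confidence thresholds."""
    n = len(transactions)
    counts = Counter(item for items in transactions.values() for item in items)
    # Apriori pruning: only items frequent on their own can appear in frequent pairs
    frequent = sorted(item for item, count in counts.items() if count / n >= min_support)

    rules = []
    for lhs, rhs in permutations(frequent, 2):
        if is_trivial(lhs, rhs, ancestors):
            continue
        both = sum(1 for items in transactions.values() if lhs in items and rhs in items)
        support, confidence = both / n, both / counts[lhs]
        if support >= min_support and confidence >= min_confidence:
            rules.append((lhs, rhs, support, confidence))
    return rules


# Toy shares for three provinces, for illustration only (not census values)
shares = {
    'ON': {'English': 0.75, 'French': 0.05},
    'AB': {'English': 0.75, 'French': 0.02},
    'QC': {'English': 0.05, 'French': 0.85},
}
rules = pairwise_rules(build_transactions(shares), min_support=0.5, min_confidence=0.8, ancestors={})
for lhs, rhs, support, confidence in rules:
    print(lhs, '->', rhs, round(support, 2), confidence)
```

Extending to longer antecedents would mean generating candidate k-itemsets from frequent (k-1)-itemsets, as full Apriori does; the triviality filter would then apply to every antecedent/consequent pair.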


### Trends

Explore the following trends over time:

1. For each language family, adoption of English and French over time
2. For each sex, adoption of English and French over time
3. For each language, the most sudden percentage influx of that language, and whether notable events occurred in that period
    * Over all languages or language groups, did any specific period see a rapid increase? E.g., a war or a change in law may have brought refugees to Canada in one period

Prediction: Can we find, percentage-wise, which sexes, provinces, and language families adopt English and French most readily over time?

*Requires: Parent Hierarchy, Merging Province Datasets over all years, Percentage-Wise Data*
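For item 3, one simple way to define the "most sudden influx" is the largest increase in a language's share between consecutive census years. The sketch below assumes a hypothetical `{year: proportion}` series per language (for example one `Speakers CA` value per year from the master table) and measures absolute change in proportion; relative change would be a reasonable alternative. Counting which period holds each language's largest jump gives a first look at whether any period saw a broad rapid increase.

```python
from collections import Counter


def largest_influx(series):
    """series: {year: proportion} with at least two years.

    Returns (start_year, end_year, increase) for the biggest rise between
    consecutive census years.
    """
    years = sorted(series, key=int)
    steps = [(start, end, series[end] - series[start]) for start, end in zip(years, years[1:])]
    return max(steps, key=lambda step: step[2])


def influx_periods(shares_by_language):
    """Count how many languages had their largest rise in each census period."""
    return Counter(largest_influx(series)[:2] for series in shares_by_language.values())


# Toy numbers for illustration only (not census values)
toy = {
    'Language A': {'1991': 0.010, '1996': 0.012, '2001': 0.020, '2006': 0.021},
    'Language B': {'1991': 0.003, '1996': 0.009, '2001': 0.010, '2006': 0.010},
    'Language C': {'1991': 0.001, '1996': 0.001, '2001': 0.004, '2006': 0.005},
}
print(influx_periods(toy))
```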


### Clustering Analysis

Do language families affect how languages spread?

Represent each language by its proportion of speakers in each province (one feature dimension per province) and perform k-means.

Do the resulting clusters mirror the actual language families?

*Requires: Parent Hierarchy, Averaging Province Datasets over all years, Percentage-Wise Data*
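A minimal, dependency-free sketch of this experiment: plain Lloyd's k-means over languages, where each point is a language's vector of per-province speaker shares, plus a cluster *purity* score against known families as one simple way to check whether clusters mirror families. The naive first-k initialisation and the purity metric are choices of this sketch; a library implementation such as scikit-learn's `KMeans` with several random restarts would be the more usual route.

```python
import math
from collections import Counter


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm. points: equal-length lists of per-province proportions."""
    centroids = [list(point) for point in points[:k]]  # naive init: first k points
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(point, centroids[c])) for point in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [point for point, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids


def purity(labels, families):
    """Share of points whose family is the majority family of their cluster."""
    matched = 0
    for cluster in set(labels):
        members = [family for family, label in zip(families, labels) if label == cluster]
        matched += Counter(members).most_common(1)[0][1]
    return matched / len(labels)


# Toy two-province vectors for illustration only (not census values)
points = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0]]
families = ['Family X', 'Family Y', 'Family X', 'Family Y']
labels, _ = kmeans(points, k=2)
print(labels, purity(labels, families))
```

A purity near 1 would suggest families drive the geographic spread; purity alone rewards many small clusters, so it is best compared at a fixed `k` close to the number of families.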


---

## Data Representation

Given the above goals, census data should be reformatted to contain:

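* One row per (language or language category, sex, year) combination

With the following non-key attributes:

* One column per province: the language's speakers as a decimal proportion of the province's population
* One column per province: decimal proportion of those speakers who know English only
* One column per province: decimal proportion who know French only
* One column per province: decimal proportion who know both English and French
* One column per province: decimal proportion who know neither

Each row therefore has 65 such attributes (5 for each of the 13 regions in the data, counting Canada as a whole); the last 52, covering knowledge of official languages, may be dropped if desired.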

Use this master table to create variants by merging on the three key columns (year, language group, sex).
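The wide layout can be sketched by generating its column names. This assumes the thirteen region codes used later in the notebook (Canada as a whole plus the twelve provinces and territories present in the data) and the `Speakers XX` / `XX EN` naming that `record_province_data` produces; `REGION_CODES`, `KEY_COLUMNS` and `wide_columns` are illustrative names.

```python
REGION_CODES = ['AB', 'BC', 'CA', 'MB', 'NB', 'NL', 'NT', 'NS', 'ON', 'PE', 'QC', 'SK', 'YT']
KEY_COLUMNS = ['Year', 'Language Group', 'Sex']  # composite key used when merging variants


def wide_columns(region_codes):
    """Non-key attribute names: one speaker share per region, then four official-language shares."""
    speakers = [f'Speakers {code}' for code in region_codes]
    knowledge = [f'{code} {suffix}' for code in region_codes for suffix in ('EN', 'FR', 'BOTH', 'N/A')]
    return speakers + knowledge


columns = wide_columns(REGION_CODES)
print(len(columns))      # → 65
print(columns[13:17])    # → ['AB EN', 'AB FR', 'AB BOTH', 'AB N/A']
```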


## Master Table

### Loading Dataframes

The CSV files must be read with `latin1` encoding so that French accented characters decode correctly.
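A small illustration of why the encoding matters, assuming the files store accented characters as single latin1 bytes: the byte for `é` decodes under `latin1` but is not valid UTF-8, which is what `pd.read_csv` expects by default.

```python
raw = 'Québec'.encode('latin1')  # é is stored as the single byte 0xE9
print(raw.decode('latin1'))      # → Québec

try:
    raw.decode('utf-8')
except UnicodeDecodeError as error:
    # 0xE9 opens a multi-byte UTF-8 sequence, but the following 'b' is not a continuation byte
    print('utf-8 failed:', error.reason)
```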


```python
import pandas as pd

years = ['1991', '1996', '2001', '2006', '2011', '2016']

# File slug for each region -> two-letter code used in column names
codes = {
    'alberta': 'AB',
    'british-columbia': 'BC',
    'canada': 'CA',
    'manitoba': 'MB',
    'new-brunswick': 'NB',
    'newfoundland-and-labrador': 'NL',
    'northwest-territories': 'NT',
    'nova-scotia': 'NS',
    'ontario': 'ON',
    'prince-edward-island': 'PE',
    'quebec': 'QC',
    'saskatchewan': 'SK',
    'yukon': 'YT'
}

regions = list(codes.keys())


def read_region(year, region):
    """Load one census CSV; latin1 is needed to decode French accents."""
    base = f'../datasets/canada-census-{year}/canada-census-{year}-geocode'
    try:
        return pd.read_csv(f'{base}-{region}.csv', encoding='latin1')
    except FileNotFoundError:
        # Some census years name the Newfoundland file without "-and-labrador"
        if region == 'newfoundland-and-labrador':
            return pd.read_csv(f'{base}-newfoundland.csv', encoding='latin1')
        raise


dataset = {year: {region: read_region(year, region) for region in regions} for year in years}

alberta = dataset['1991']['alberta']

alberta
```

| | Geography | Sex (3) | Detailed Mother | Total - Knowledge of Official Languages | English only | French only | Both English and French | Neither English nor French |
|---|---|---|---|---|---|---|---|---|
| 0 | Alberta (00048) 20000 | Total - Sex | Total - Detailed Mother Tongue | 2519180 | 2318935 | 1940 | 167155 | 31150 |
| 1 | Alberta (00048) 20000 | Total - Sex | Single responses | 2489005 | 2294165 | 1895 | 162160 | 30790 |
| 2 | Alberta (00048) 20000 | Total - Sex | English | 2031115 | 1931740 | 20 | 98715 | 640 |
| 3 | Alberta (00048) 20000 | Total - Sex | French | 53715 | 4145 | 1670 | 47890 | 10 |
| 4 | Alberta (00048) 20000 | Total - Sex | Non-official languages | 404180 | 358285 | 210 | 15550 | 30140 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 463 | Alberta (00048) 20000 | Female | Multiple responses | 15370 | 12545 | 20 | 2620 | 190 |
| 464 | Alberta (00048) 20000 | Female | English and French | 2725 | 785 | 10 | 1925 | 10 |
| 465 | Alberta (00048) 20000 | Female | English and non-official language | 12235 | 11570 | 0 | 495 | 165 |
| 466 | Alberta (00048) 20000 | Female | French and non-official language | 290 | 130 | 0 | 135 | 20 |


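```python
# TODO: Make this merge samples from all census years, deferring to newer ones as truest

def make_hierarchy(source_frame, column='Detailed Mother'):
    """Map each language label to the (third-level, second-level, top-level) category it falls under.

    The census encodes nesting as leading spaces (two per level), so the
    indentation of each label decides which category slots it overwrites.
    A label at one of the first three levels is its own category at that level.
    """
    top_level = second_level = third_level = 'Total - Detailed Mother Tongue'

    hierarchy = {}

    # Iterate over the frame passed in (the original loop read the global `alberta`)
    for lang in source_frame[column]:
        if lang in hierarchy:
            continue  # labels repeat once per sex; keep the first occurrence

        indent = len(lang) - len(lang.lstrip(' '))
        if indent < 1:
            top_level = lang
        if indent < 3:
            second_level = lang
        if indent < 5:
            third_level = lang

        hierarchy[lang] = (third_level, second_level, top_level)

    return hierarchy


top_hierarchy = make_hierarchy(alberta)

top_hierarchy
```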

```
{'Total - Detailed Mother Tongue': ('Total - Detailed Mother Tongue',
  'Total - Detailed Mother Tongue',
  'Total - Detailed Mother Tongue'),
 'Single responses': ('Single responses',
  'Single responses',
  'Single responses'),
 '  English': ('  English', '  English', 'Single responses'),
 '  French': ('  French', '  French', 'Single responses'),
 '  Non-official languages': ('  Non-official languages',
  '  Non-official languages',
  'Single responses'),
 '    Aboriginal languages': ('    Aboriginal languages',
  '  Non-official languages',
  'Single responses'),
 '      Algonquian languages': ('    Aboriginal languages',
  '  Non-official languages',
  'Single responses'),
 '        Algonquin': ('    Aboriginal languages',
  '  Non-official languages',
  'Single responses'),
 '        Attikamek': ('    Aboriginal languages',
  '  Non-official languages',
  'Single responses'),
 '        Blackfoot': ('    Aboriginal languages',
  '  Non-official languages',
  'Single responses'),
 ...
```
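`make_hierarchy` keeps only three levels. The Apriori triviality filter needs to know whether one category is *any* ancestor of another, so a stack-based pass over the same indented labels can collect every enclosing category. A sketch, assuming two spaces per nesting level as in the labels above; `ancestor_sets` is a hypothetical helper, not part of the notebook.

```python
def ancestor_sets(labels, indent_width=2):
    """Map each stripped label to the set of stripped labels it is nested under."""
    stack = []  # (depth, name) pairs along the current branch of the hierarchy
    ancestors = {}
    for label in labels:
        depth = (len(label) - len(label.lstrip(' '))) // indent_width
        # Pop siblings and deeper entries; whatever remains encloses this label
        while stack and stack[-1][0] >= depth:
            stack.pop()
        name = label.strip()
        ancestors.setdefault(name, {parent for _, parent in stack})  # first occurrence wins
        stack.append((depth, name))
    return ancestors


labels = [
    'Single responses',
    '  English',
    '  French',
    '  Non-official languages',
    '    Aboriginal languages',
    '      Algonquian languages',
    '        Algonquin',
]
ancestors = ancestor_sets(labels)
print(sorted(ancestors['Algonquin']))
```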

```python
def to_count(value):
    """Census tables use '-' for an empty cell; treat it as zero."""
    return 0 if value == '-' else int(value)


def proportionalize(source_string, total_count):
    """Convert a raw count into a proportion of total_count (0 if the total is 0)."""
    if total_count == 0:
        return 0

    return to_count(source_string) / total_count


def pick_column(frame, *candidates):
    """Return the first header that exists; wording and capitalization vary by census year."""
    for name in candidates:
        if name in frame.columns:
            return name
    raise KeyError(f'none of {candidates} found')


def record_province_data(source_frame, year, province, record_dict):
    if isinstance(source_frame, tuple):
        print(f'COULD NOT EXTRACT DATA FROM {year}, {province}')
        return record_dict

    # Resolve header variants once per frame instead of once per row
    total_col = pick_column(source_frame, 'Total - Knowledge of Official Languages',
                            'Total - Knowledge of official languages')
    lang_col = pick_column(source_frame, 'Detailed Mother', 'Detailed mother')
    both_col = pick_column(source_frame, '  Both English and French', '  English and French')

    # The first row holds the province-wide total
    province_total = to_count(source_frame[total_col].iloc[0])

    for _, row in source_frame.iterrows():
        key = (year, row[lang_col], row['Sex (3)'])
        speaker_total = to_count(row[total_col])

        new_values = {
            f'Speakers {province}': speaker_total / province_total,
            f'{province} EN': proportionalize(row['  English only'], speaker_total),
            f'{province} FR': proportionalize(row['  French only'], speaker_total),
            f'{province} BOTH': proportionalize(row[both_col], speaker_total),
            f'{province} N/A': proportionalize(row['  Neither English nor French'], speaker_total),
        }
        # Values already recorded for this key take precedence (first occurrence wins)
        record_dict[key] = {**new_values, **record_dict.get(key, {})}

    return record_dict


master_dict = {}

# TODO: Make this handle 2016; 2016 has different column names
for year in years[:-1]:
    for region in regions:
        # print(f'{year} {region}')
        record_province_data(dataset[year][region], year, codes[region], master_dict)


master_dict
```

```
{('1991',
  'Total - Detailed Mother Tongue',
  'Total - Sex'): {'Speakers YT': 1.0, 'YT EN': 0.9052612547459772, 'YT FR': 0.0009039956608208281, 'YT BOTH': 0.09293075393238112, 'YT N/A': 0.0010847947929849937, 'Speakers SK': 1.0, 'SK EN': 0.941636314271517, 'SK FR': 0.0004610490402495812, 'SK BOTH': 0.052047313877063836, 'SK N/A': 0.005860445578283566, 'Speakers QC': 1.0, 'QC EN': 0.054880842253645215, 'QC FR': 0.5813150668839846, 'QC BOTH': 0.3543140537127586, 'QC N/A': 0.009489302967563838, 'Speakers PE': 1.0, 'PE EN': 0.896135831381733, 'PE FR': 0.002107728337236534, 'PE BOTH': 0.10109289617486339, 'PE N/A': 0.000624512099921936, 'Speakers ON': 1.0, 'ON EN': 0.8613398442726836, 'ON FR': 0.005436975139457486, 'ON BOTH': 0.11388581099332418, 'ON N/A': 0.01933736959453466, 'Speakers NS': 1.0, 'NS EN': 0.6707686825001589, 'NS FR': 0.15226710187376513, 'NS BOTH': 0.16294908747466338, 'NS N/A': 0.014014942925374837, 'Speakers NT': 1.0, 'NT EN': 0.8506137372682162, 'NT FR': 0.001392878906
...
```

```python
import math

main_cols = ['Year', 'Language Group', 'Sex', 'Log Percentiles']

rows = []

for (year, language, sex), values in master_dict.items():
    # One item per region: its code plus a base-5 logarithmic bin of the speaker share.
    # A log scale is used because English and French are huge while few other
    # languages exceed even 1% of a province.
    percentiles = [
        f"{name[len('Speakers '):]}_{math.floor(math.log(share * 10000000 + 1) / math.log(5))}"
        for name, share in values.items()
        if name.startswith('Speakers ')
    ]
    rows.append({'Year': year, 'Language Group': language, 'Sex': sex,
                 'Log Percentiles': percentiles, **values})

# Building from records aligns values by column name, so a language missing from
# one province becomes NaN instead of shifting that province's column
master_table = pd.DataFrame(rows)

# Column order: key columns, then speaker shares, then official-language columns
speaker_cols = [c for c in master_table.columns if c.startswith('Speakers ')]
other_cols = [c for c in master_table.columns if c not in main_cols + speaker_cols]
master_table = master_table[main_cols + speaker_cols + other_cols]

master_table
```

| | Year | Language Group | Sex | Log Percentiles | Speakers YT | Speakers SK | Speakers QC | Speakers PE | Speakers ON | Speakers NS | ... | CA BOTH | CA N/A | BC EN | BC FR | BC BOTH | BC N/A | AB EN | AB FR | AB BOTH | AB N/A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1991 | Total - Detailed Mother Tongue | Total - Sex | [YT_10, SK_10, QC_10, PE_10, ON_10, NS_10, NT_... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 0.162949 | 0.014015 | 0.916497 | 0.000351 | 0.063794 | 0.019358 | 0.920512 | 0.000770 | 0.066353 | 0.012365 |
| 1 | 1991 | Single responses | Total - Sex | [YT_10, SK_10, QC_10, PE_10, ON_10, NS_10, NT_... | 0.989333 | 0.988510 | 0.989351 | 0.997190 | 0.986936 | 0.988620 | ... | 0.160774 | 0.014067 | 0.917438 | 0.000336 | 0.062787 | 0.019438 | 0.921720 | 0.000761 | 0.065151 | 0.012370 |
| 2 | 1991 | English | Total - Sex | [YT_9, SK_9, QC_8, PE_9, ON_9, NS_9, NT_9, NL_... | 0.882661 | 0.826927 | 0.087976 | 0.941374 | 0.739734 | 0.599016 | ... | 0.082279 | 0.000400 | 0.948124 | 0.000006 | 0.051499 | 0.000367 | 0.951074 | 0.000010 | 0.048601 | 0.000315 |
| 3 | 1991 | French | Total - Sex | [YT_7, SK_7, QC_9, PE_8, ON_8, NS_9, NT_7, NL_... | 0.031278 | 0.021398 | 0.815839 | 0.043638 | 0.048651 | 0.240900 | ... | 0.385686 | 0.000211 | 0.103501 | 0.015868 | 0.880324 | 0.000205 | 0.077167 | 0.031090 | 0.891557 | 0.000186 |
| 4 | 1991 | Non-official languages | Total - Sex | [YT_8, SK_8, QC_8, PE_7, ON_9, NS_8, NT_9, NL_... | 0.075393 | 0.140179 | 0.085537 | 0.012178 | 0.198551 | 0.148704 | ... | 0.112612 | 0.091569 | 0.855451 | 0.000469 | 0.044693 | 0.099396 | 0.886449 | 0.000520 | 0.038473 | 0.074571 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2176 | 2011 | Multiple responses | Female | [YT_6, SK_6, QC_7, PE_6, ON_7, NS_6, NT_6, NL_... | 0.007280 | 0.006413 | 0.009771 | 0.002817 | 0.011705 | 0.003629 | ... | 0.334995 | 0.006093 | 0.831283 | 0.001973 | 0.158389 | 0.008355 | 0.819792 | 0.003418 | 0.170732 | 0.006214 |
| 2177 | 2011 | English and French | Female | [YT_6, SK_5, QC_6, PE_6, ON_6, NS_6, NT_5, NL_... | 0.001931 | 0.000903 | 0.004146 | 0.001734 | 0.001949 | 0.001795 | ... | 0.863539 | 0.000736 | 0.229327 | 0.004410 | 0.765160 | 0.001103 | 0.227950 | 0.006873 | 0.765178 | 0.000000 |
| 2178 | 2011 | English and non-official language | Female | [YT_6, SK_6, QC_5, PE_5, ON_7, NS_6, NT_6, NL_... | 0.004903 | 0.005033 | 0.001532 | 0.000903 | 0.008853 | 0.001609 | ... | 0.093770 | 0.007096 | 0.933742 | 0.000418 | 0.057350 | 0.008491 | 0.951669 | 0.000388 | 0.041149 | 0.006405 |
| 2179 | 2011 | French and non-official language | Female | [YT_4, SK_5, QC_6, PE_4, ON_5, NS_4, NT_4, NL_... | 0.000149 | 0.000383 | 0.003305 | 0.000108 | 0.000579 | 0.000165 | ... | 0.431060 | 0.011347 | 0.569863 | 0.024658 | 0.383562 | 0.024658 | 0.510417 | 0.045139 | 0.427083 | 0.017361 |


```python
master_table.columns
```

```
Index(['Year', 'Language Group', 'Sex', 'Log Percentiles', 'Speakers YT',
       'Speakers SK', 'Speakers QC', 'Speakers PE', 'Speakers ON',
       'Speakers NS', 'Speakers NT', 'Speakers NL', 'Speakers NB',
       'Speakers MB', 'Speakers CA', 'Speakers BC', 'Speakers AB', 'YT EN',
       'YT FR', 'YT BOTH', 'YT N/A', 'SK EN', 'SK FR', 'SK BOTH', 'SK N/A',
       'QC EN', 'QC FR', 'QC BOTH', 'QC N/A', 'PE EN', 'PE FR', 'PE BOTH',
       'PE N/A', 'ON EN', 'ON FR', 'ON BOTH', 'ON N/A', 'NS EN', 'NS FR',
       'NS BOTH', 'NS N/A', 'NT EN', 'NT FR', 'NT BOTH', 'NT N/A', 'NL EN',
       'NL FR', 'NL BOTH', 'NL N/A', 'NB EN', 'NB FR', 'NB BOTH', 'NB N/A',
       'MB EN', 'MB FR', 'MB BOTH', 'MB N/A', 'CA EN', 'CA FR', 'CA BOTH',
       'CA N/A', 'BC EN', 'BC FR', 'BC BOTH', 'BC N/A', 'AB EN', 'AB FR',
       'AB BOTH', 'AB N/A'],
      dtype='object')
```

Minimum proportion needed to reach each bin under the current *Log Percentile* formula; these thresholds could define the items for a modified Apriori:

```python
# Smallest proportion that lands in bin i: (5**i - 1) / 10**7
dict(zip(range(11), [((5 ** i) - 1) / 10000000 for i in range(11)]))
```

```
{0: 0.0,
 1: 4e-07,
 2: 2.4e-06,
 3: 1.24e-05,
 4: 6.24e-05,
 5: 0.0003124,
 6: 0.0015624,
 7: 0.0078124,
 8: 0.0390624,
 9: 0.1953124,
 10: 0.9765624}
```
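Written out, the *Log Percentile* of a speaker proportion $p$ (as computed when building `master_table`) is a base-5 logarithmic bin rather than a true percentile, and the thresholds tabulated above follow from inverting it:

$$
b(p) = \left\lfloor \log_5\left(10^{7}\,p + 1\right) \right\rfloor,
\qquad
b(p) \ge i \iff p \ge \frac{5^{i} - 1}{10^{7}}
$$

For example, $p = 0.05$ gives $\log_5(500\,001) \approx 8.15$, so it falls in bin 8, between the bin-8 and bin-9 thresholds of 0.0390624 and 0.1953124. A region where a language covers the whole population ($p = 1$) gives $\log_5(10^{7} + 1) \approx 10.01$, i.e. bin 10.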

### Sexless Table

```python
# Keep only the rows aggregated over both sexes
sexless = master_table[master_table['Sex'] == 'Total - Sex']

sexless
```

| | Year | Language Group | Sex | Log Percentiles | Speakers YT | Speakers SK | Speakers QC | Speakers PE | Speakers ON | Speakers NS | ... | CA BOTH | CA N/A | BC EN | BC FR | BC BOTH | BC N/A | AB EN | AB FR | AB BOTH | AB N/A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1991 | Total - Detailed Mother Tongue | Total - Sex | [YT_10, SK_10, QC_10, PE_10, ON_10, NS_10, NT_... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 0.162949 | 0.014015 | 0.916497 | 0.000351 | 0.063794 | 0.019358 | 0.920512 | 0.000770 | 0.066353 | 0.012365 |
| 1 | 1991 | Single responses | Total - Sex | [YT_10, SK_10, QC_10, PE_10, ON_10, NS_10, NT_... | 0.989333 | 0.988510 | 0.989351 | 0.997190 | 0.986936 | 0.988620 | ... | 0.160774 | 0.014067 | 0.917438 | 0.000336 | 0.062787 | 0.019438 | 0.921720 | 0.000761 | 0.065151 | 0.012370 |
| 2 | 1991 | English | Total - Sex | [YT_9, SK_9, QC_8, PE_9, ON_9, NS_9, NT_9, NL_... | 0.882661 | 0.826927 | 0.087976 | 0.941374 | 0.739734 | 0.599016 | ... | 0.082279 | 0.000400 | 0.948124 | 0.000006 | 0.051499 | 0.000367 | 0.951074 | 0.000010 | 0.048601 | 0.000315 |
| 3 | 1991 | French | Total - Sex | [YT_7, SK_7, QC_9, PE_8, ON_8, NS_9, NT_7, NL_... | 0.031278 | 0.021398 | 0.815839 | 0.043638 | 0.048651 | 0.240900 | ... | 0.385686 | 0.000211 | 0.103501 | 0.015868 | 0.880324 | 0.000205 | 0.077167 | 0.031090 | 0.891557 | 0.000186 |
| 4 | 1991 | Non-official languages | Total - Sex | [YT_8, SK_8, QC_8, PE_7, ON_9, NS_8, NT_9, NL_... | 0.075393 | 0.140179 | 0.085537 | 0.012178 | 0.198551 | 0.148704 | ... | 0.112612 | 0.091569 | 0.855451 | 0.000469 | 0.044693 | 0.099396 | 0.886449 | 0.000520 | 0.038473 | 0.074571 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1712 | 2011 | Multiple responses | Total - Sex | [YT_7, SK_7, QC_7, PE_6, ON_7, NS_6, NT_7, NL_... | 0.015154 | 0.012275 | 0.019552 | 0.005418 | 0.022582 | 0.006957 | ... | 0.336163 | 0.005832 | 0.836655 | 0.001945 | 0.153500 | 0.007839 | 0.823054 | 0.002960 | 0.168067 | 0.005840 |
| 1713 | 2011 | English and French | Total - Sex | [YT_6, SK_6, QC_7, PE_6, ON_6, NS_6, NT_6, NL_... | 0.004160 | 0.001694 | 0.008291 | 0.003178 | 0.003663 | 0.003333 | ... | 0.866538 | 0.000726 | 0.231977 | 0.004651 | 0.762791 | 0.000581 | 0.227705 | 0.004756 | 0.766944 | 0.000595 |
| 1714 | 2011 | English and non-official language | Total - Sex | [YT_7, SK_7, QC_6, PE_6, ON_7, NS_6, NT_7, NL_... | 0.009954 | 0.009668 | 0.002998 | 0.001842 | 0.017248 | 0.003130 | ... | 0.088260 | 0.006913 | 0.939317 | 0.000509 | 0.052035 | 0.008140 | 0.956474 | 0.000500 | 0.036422 | 0.006504 |
| 1715 | 2011 | French and non-official language | Total - Sex | [YT_5, SK_5, QC_6, PE_4, ON_5, NS_5, NT_5, NL_... | 0.000446 | 0.000737 | 0.006606 | 0.000253 | 0.001073 | 0.000346 | ... | 0.458686 | 0.010345 | 0.549254 | 0.022388 | 0.404478 | 0.020896 | 0.495756 | 0.037351 | 0.451613 | 0.013582 |


### Year-Averaged Table

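```python
# Average each region's speaker share over the census years.
# Selecting the speaker columns explicitly avoids averaging the Year, Sex and
# list-valued columns, which recent pandas versions reject with an error.
speaker_cols = [c for c in sexless.columns if c.startswith('Speakers ')]

year_avg = sexless.groupby('Language Group')[speaker_cols].mean()

year_avg
```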

| Language Group | Speakers YT | Speakers SK | Speakers QC | Speakers PE | Speakers ON | Speakers NS | Speakers NT | Speakers NL | Speakers NB | Speakers MB | Speakers CA | Speakers BC | Speakers AB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cree, n.i.e. | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000004 | 3.019217e-07 | 0.000000 | 0.000000 |
| Cree, n.o.s. | 0.000446 | 0.021550 | 0.001936 | 0.000000 | 3.093051e-04 | 0.000027 | 0.003289 | 0.000020 | 0.000014 | 0.015900 | 2.351970e-03 | 0.000219 | 0.004638 |
| Plains Cree | 0.000000 | 0.000025 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000004 | 4.528825e-06 | 0.000000 | 0.000033 |
| Swampy Cree | 0.000000 | 0.000015 | 0.000000 | 0.000000 | 7.860359e-07 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000042 | 2.113452e-06 | 0.000000 | 0.000000 |
| Woods Cree | 0.000000 | 0.000049 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000004 | 2.113452e-06 | 0.000000 | 0.000004 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Single responses | 0.988332 | 0.989914 | 0.984606 | 0.995292 | 9.828112e-01 | 0.994648 | 0.988576 | 0.998325 | 0.991234 | 0.983870 | 9.850865e-01 | 0.985222 | 0.986825 |
| Multiple responses | 0.012520 | 0.012810 | 0.012487 | 0.003080 | 1.462642e-02 | 0.008021 | 0.017804 | 0.001163 | 0.007642 | 0.017093 | 1.274545e-02 | 0.012047 | 0.012307 |
| Single responses | 0.987570 | 0.987190 | 0.987513 | 0.996920 | 9.853733e-01 | 0.991979 | 0.982152 | 0.998832 | 0.992355 | 0.982907 | 9.872545e-01 | 0.987954 | 0.987694 |
| Total - Detailed Mother Tongue | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000e+00 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000e+00 | 1.000000 | 1.000000 |


---

## Experiment: Clustering Analysis

http://www.bigendiandata.com/2017-04-18-Jupyter_Customer360/