# Demos for analyzing World Color Survey (WCS)

COG 260: Data, Computation, and The Mind (Yang Xu)

Data source: http://www1.icsi.berkeley.edu/wcs/data.html

______________________________________________

Import helper function file for WCS data analysis.

In [1]:
from wcs_helper_functions import *

Import relevant Python libraries.

In [35]:
import numpy as np
import pandas as pd
from scipy import stats
from random import random
%matplotlib inline

## Demo 3: Import color naming data
    
> Each of the 330 color chips was named by speakers of 110 different languages.

______________________________________________

Load naming data. 

`namingData` is a hierarchical dictionary organized as follows:

**language _(1 - 110)_ &rarr; speaker _(1 - *range varies per language*)_ &rarr; chip index _(1 - 330)_ &rarr; color term**

In [12]:
namingData = readNamingData('term.txt')
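
The nested layout described above can be walked with ordinary dictionary indexing. The sketch below uses a tiny hand-made dictionary (`toy_naming`, with made-up values rather than WCS data) that has the same language &rarr; speaker &rarr; chip index &rarr; color term shape. `terms_for` is an illustrative helper and is not part of `wcs_helper_functions`.

```python
# Toy stand-in for namingData: language -> speaker -> chip index -> color term.
# The values are made up for illustration; they are not WCS data.
toy_naming = {
    1: {1: {1: 'LB', 2: 'LB', 3: 'LE'},
        2: {1: 'LB', 2: 'F', 3: 'LE'}},
    2: {1: {1: 'YN', 2: 'YN', 3: 'NR'}},
}

def terms_for(naming, language, speaker):
    """Return one speaker's color terms as a list ordered by chip index."""
    chips = naming[language][speaker]
    return [chips[chip] for chip in sorted(chips)]

# Walk every level of the hierarchy.
for language, speakers in toy_naming.items():
    for speaker, chips in speakers.items():
        print(language, speaker, terms_for(toy_naming, language, speaker))
```

The same indexing pattern should apply to the real `namingData`, assuming it follows the structure stated above.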

For example, to obtain naming data from language 1 for every speaker and all 330 color chips (index further with `namingData[1][1]` to get speaker 1 alone):

In [143]:
namingData[1]

{1: {1: 'LB',
  2: 'LB',
  3: 'LE',
  4: 'WK',
  5: 'LF',
  6: 'LE',
  7: 'F',
  8: 'LE',
  9: 'LE',
  10: 'LB',
  11: 'LB',
  12: 'F',
  13: 'LB',
  14: 'LB',
  15: 'LF',
  16: 'LF',
  17: 'LE',
  18: 'LB',
  19: 'LF',
  20: 'LB',
  21: 'LE',
  22: 'LF',
  23: 'LF',
  24: 'LB',
  25: 'LB',
  26: 'LB',
  27: 'LB',
  28: 'LF',
  29: 'LE',
  30: 'LB',
  31: 'LE',
  32: 'LF',
  33: 'LE',
  34: 'LB',
  35: 'LB',
  36: 'LE',
  37: 'LB',
  38: 'LB',
  39: 'LE',
  40: 'LB',
  41: 'LB',
  42: 'LE',
  43: 'F',
  44: 'LB',
  45: 'LF',
  46: 'LB',
  47: 'LF',
  48: 'LB',
  49: 'LE',
  50: 'LB',
  51: 'F',
  52: 'LF',
  53: 'LE',
  54: 'LB',
  55: 'LB',
  56: 'LE',
  57: 'LB',
  58: 'F',
  59: 'LF',
  60: 'LB',
  61: 'LE',
  62: 'F',
  63: 'LE',
  64: 'LB',
  65: 'LE',
  66: 'LF',
  67: 'F',
  68: 'LE',
  69: 'F',
  70: 'LF',
  71: 'F',
  72: 'F',
  73: 'F',
  74: 'LF',
  75: 'LB',
  76: 'LE',
  77: 'LB',
  78: 'LF',
  79: 'F',
  80: 'LB',
  81: 'F',
  82: 'LB',
  83: 'LF',
  84: 'LE',
  85: 'LB',

## Demo 5: Import speaker demographic information

> Most speakers' age _(integer)_ and gender _(M/F)_ information was recorded.

______________________________________________

Load speaker information.

`speakerInfo` is a hierarchical dictionary organized as follows:

**language &rarr; speaker &rarr; (age, gender)**

In [338]:
naming_df = pd.DataFrame.from_dict(namingData, orient= 'index').reset_index()
naming_df = naming_df.rename(columns = {'index':'Language'})
naming_df.head(4)
#naming_df = naming_df.fillna(0)


Unnamed: 0,Language,1,2,3,4,5,6,7,8,9,...,26,27,28,29,30,31,32,33,34,35
0,1,"{1: 'LB', 2: 'LB', 3: 'LE', 4: 'WK', 5: 'LF', ...","{1: 'LB', 2: 'F', 3: 'LE', 4: 'WK', 5: 'F', 6:...","{1: 'F', 2: 'F', 3: 'WK', 4: 'LB', 5: 'F', 6: ...","{1: 'LB', 2: 'LF', 3: 'LE', 4: 'WK', 5: 'LF', ...","{1: 'LF', 2: 'F', 3: 'LE', 4: 'WK', 5: 'LF', 6...","{1: 'G', 2: 'F', 3: 'LE', 4: 'WK', 5: 'F', 6: ...","{1: 'G', 2: 'G', 3: 'LE', 4: 'S', 5: 'G', 6: '...","{1: 'G', 2: 'WK', 3: 'LE', 4: 'WK', 5: 'F', 6:...","{1: 'G', 2: 'F', 3: 'LE', 4: 'LB', 5: 'F', 6: ...",...,,,,,,,,,,
1,2,"{1: 'YN', 2: 'YN', 3: 'NR', 4: 'TK', 5: 'YN', ...","{1: 'IT', 2: 'AT', 3: 'NR', 4: 'IR', 5: 'YN', ...","{1: 'PN', 2: 'PN', 3: 'NR', 4: 'PN', 5: 'YN', ...","{1: 'IT', 2: 'EP', 3: 'NR', 4: 'TK', 5: 'IT', ...","{1: 'IT', 2: 'YN', 3: 'NR', 4: 'TK', 5: 'IT', ...","{1: 'IT', 2: 'YN', 3: 'NR', 4: 'TK', 5: 'YN', ...","{1: 'TK', 2: 'EP', 3: 'NR', 4: 'IT', 5: 'IT', ...","{1: 'IT', 2: 'YN', 3: 'NR', 4: 'KR', 5: 'IT', ...","{1: 'YN', 2: 'MP', 3: 'NR', 4: 'TK', 5: 'TK', ...",...,,,,,,,,,,
2,3,"{1: '*', 2: '*', 3: 'ED', 4: '*', 5: '*', 6: '...","{1: 'ID', 2: 'EL', 3: 'AA', 4: 'NG', 5: 'ID', ...","{1: 'AA', 2: '*', 3: 'AT', 4: 'PA', 5: 'BA', 6...","{1: 'BA', 2: 'AA', 3: 'AT', 4: 'NG', 5: 'AA', ...","{1: 'LU', 2: 'EL', 3: 'AT', 4: 'NG', 5: 'GA', ...","{1: 'AA', 2: 'BA', 3: 'AT', 4: 'IN', 5: 'AA', ...","{1: 'BA', 2: 'ID', 3: 'ED', 4: 'PA', 5: '*', 6...","{1: 'AA', 2: 'ID', 3: 'AT', 4: 'BA', 5: 'IN', ...","{1: 'BA', 2: 'EL', 3: 'AT', 4: 'NG', 5: 'AA', ...",...,,,,,,,,,,
3,4,"{1: 'CE', 2: 'TA', 3: 'CY', 4: 'TX', 5: 'TA', ...","{1: 'CE', 2: 'SA', 3: 'KA', 4: 'TX', 5: 'SA', ...","{1: 'TX', 2: 'SA', 3: 'CY', 4: 'TX', 5: 'TA', ...","{1: 'CE', 2: 'TA', 3: 'CY', 4: 'TX', 5: 'TA', ...","{1: 'SP', 2: 'SW', 3: 'KA', 4: 'CY', 5: 'SA', ...","{1: 'CE', 2: 'SA', 3: 'CY', 4: 'TX', 5: 'KA', ...","{1: 'SP', 2: 'KA', 3: 'LA', 4: 'TX', 5: 'KA', ...","{1: 'XE', 2: 'SA', 3: 'CY', 4: 'TX', 5: 'SA', ...","{1: 'SP', 2: 'SA', 3: 'LA', 4: 'KE', 5: 'SA', ...",...,"{1: 'CE', 2: 'TA', 3: 'CY', 4: 'TX', 5: 'TA', ...","{1: 'CE', 2: 'SA', 3: 'CY', 4: 'TX', 5: 'TA', ...","{1: 'XE', 2: 'SA', 3: 'CY', 4: 'KE', 5: 'TA', ...","{1: 'XE', 2: 'CE', 3: 'CY', 4: 'TA', 5: 'SA', ...","{1: 'CE', 2: 'TA', 3: 'CY', 4: 'CY', 5: 'TA', ...","{1: 'TA', 2: 'SP', 3: 'LA', 4: 'TX', 5: 'TA', ...","{1: '*', 2: 'TA', 3: 'CY', 4: 'TX', 5: 'TA', 6...","{1: 'TA', 2: 'TA', 3: 'CY', 4: 'TX', 5: 'TA', ...","{1: 'TA', 2: 'TA', 3: 'CY', 4: 'KA', 5: 'TA', ...","{1: 'TA', 2: 'SW', 3: 'CY', 4: 'CY', 5: 'TA', ..."


In [344]:
#naming_df.notnull()

In [325]:
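# Print every language row that is missing at least one speaker column.
for index, row in naming_df.iterrows():
    # notnull is a method, so `row.notnull == False` is never true; test the values instead
    if row.isnull().any():
        print(row)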


In [397]:
#how to iterate over dic: dict.items(): 
#create a list of the dict values 

#type(naming_df[1])
#list(naming_df[1][0].values())
#ndf = naming_df.notna
for index, row in naming_df.iterrows():
    print(row)
    # row.notnull() is a boolean Series; reduce it with .all() before using it in `if`
    if row.notnull().all():
        print(row)


Language                                                    1
1           {1: 'LB', 2: 'LB', 3: 'LE', 4: 'WK', 5: 'LF', ...
2           {1: 'LB', 2: 'F', 3: 'LE', 4: 'WK', 5: 'F', 6:...
3           {1: 'F', 2: 'F', 3: 'WK', 4: 'LB', 5: 'F', 6: ...
4           {1: 'LB', 2: 'LF', 3: 'LE', 4: 'WK', 5: 'LF', ...
5           {1: 'LF', 2: 'F', 3: 'LE', 4: 'WK', 5: 'LF', 6...
6           {1: 'G', 2: 'F', 3: 'LE', 4: 'WK', 5: 'F', 6: ...
7           {1: 'G', 2: 'G', 3: 'LE', 4: 'S', 5: 'G', 6: '...
8           {1: 'G', 2: 'WK', 3: 'LE', 4: 'WK', 5: 'F', 6:...
9           {1: 'G', 2: 'F', 3: 'LE', 4: 'LB', 5: 'F', 6: ...
10          {1: 'G', 2: 'F', 3: 'LE', 4: 'S', 5: 'F', 6: '...
11          {1: 'F', 2: 'FU', 3: 'LE', 4: 'WK', 5: 'LF', 6...
12          {1: 'LB', 2: 'LF', 3: 'LE', 4: 'F', 5: 'LF', 6...
13          {1: 'G', 2: 'F', 3: 'LE', 4: 'LB', 5: 'F', 6: ...
14          {1: 'G', 2: 'FU', 3: 'LE', 4: 'G', 5: 'F', 6: ...
15          {1: 'G', 2: 'F', 3: 'LE', 4: 'F', 5: 'F', 6: '...
16      

In [311]:
ndf = []
for i in range(1, 8):
    # notnull is a method: call it, then reduce the boolean Series with .all()
    if naming_df[i].notnull().all():
        print(naming_df[i])
        #ndf.append(naming_df[i].apply(lambda x: x.values())) #for i in range(1,36)]
#ndf

In [370]:
print(naming_df[27])

0                                                    NaN
1                                                    NaN
2                                                    NaN
3      {1: 'CE', 2: 'SA', 3: 'CY', 4: 'TX', 5: 'TA', ...
4                                                    NaN
                             ...                        
105                                                  NaN
106                                                  NaN
107                                                  NaN
108                                                  NaN
109                                                  NaN
Name: 27, Length: 110, dtype: object
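
The `NaN` cells above mark languages that have no speaker number 27. A minimal sketch of how such gaps can be counted and tested follows. It uses a small hand-made frame (`toy_df`, made-up values) shaped like `naming_df`, so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like naming_df: one row per language, one column per speaker.
# Cells hold a speaker's {chip: term} dict, or NaN when the language has no such speaker.
toy_df = pd.DataFrame({
    'Language': [1, 2, 3],
    1: [{1: 'LB'}, {1: 'YN'}, {1: 'CE'}],
    2: [{1: 'F'}, np.nan, {1: 'TA'}],
    3: [np.nan, np.nan, {1: 'SA'}],
})

speaker_cols = [c for c in toy_df.columns if c != 'Language']

# notna() gives a boolean table; summing across columns counts speakers per language.
speakers_per_language = toy_df[speaker_cols].notna().sum(axis=1)
print(speakers_per_language.tolist())

# A boolean Series has no single truth value, so reduce it with .all() or .any().
print(toy_df[2].notna().all())
print(toy_df[2].notna().any())
```

Reducing with `.all()` or `.any()` is what avoids the "truth value of a Series is ambiguous" error when a Series is used in an `if` statement.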


In [387]:
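def dict_values(x):
    """Return the color terms of one speaker's dict, or None for a missing speaker (NaN)."""
    try:
        return list(x.values())
    except AttributeError:  # NaN is a float and has no .values()
        return None

In [388]:
# Test the cell itself rather than v[1]: missing speakers are NaN floats and cannot be indexed.
naming_df[7].apply(lambda v: dict_values(v) if isinstance(v, dict) else v)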

In [373]:
naming_df[1]

0      {1: 'LB', 2: 'LB', 3: 'LE', 4: 'WK', 5: 'LF', ...
1      {1: 'YN', 2: 'YN', 3: 'NR', 4: 'TK', 5: 'YN', ...
2      {1: '*', 2: '*', 3: 'ED', 4: '*', 5: '*', 6: '...
3      {1: 'CE', 2: 'TA', 3: 'CY', 4: 'TX', 5: 'TA', ...
4      {1: '7', 2: '2', 3: '4', 4: '6', 5: '7', 6: '9...
                             ...                        
105    {1: 'BI', 2: 'G', 3: 'KU', 4: 'T', 5: 'G', 6: ...
106    {1: 'DA', 2: 'DA', 3: 'OS', 4: 'FI', 5: 'DA', ...
107    {1: 'IA', 2: 'IA', 3: 'QA', 4: 'QA', 5: 'IA', ...
108    {1: 'S', 2: 'S', 3: 'K', 4: 'K', 5: 'S', 6: 'Y...
109    {1: 'G', 2: 'Y', 3: 'R', 4: 'BL', 5: 'G', 6: '...
Name: 1, Length: 110, dtype: object

In [None]:
lang = []
for s in range(1, 18):           # speaker columns 1-17
    for l in range(110):         # language rows 0-109
        terms = naming_df[s][l]
        if not isinstance(terms, dict):  # NaN: this language has fewer than s speakers
            continue
        for i in range(1, 331):  # chip indices 1-330
            if i in terms:
                lang.append(terms[i])
# Print only the size: printing the full list on every pass exceeds the notebook's IOPub data rate limit.
print(len(lang))



In [249]:
gender.head(2)
gender[1][0]

[('90', 'M')]

In [251]:
gender1 = []
for i in range(110):
    gender1.append(gender[1][i])
print(gender1)

[[('90', 'M')], [('20', 'F')], [('26', 'F')], [('15', 'F')], [('29', 'M')], [('40', 'M')], [('22', 'F')], [('30', 'M')], [('0', 'F')], [('64', 'M')], [('17', 'F')], [('23', 'M')], [('26', 'M')], [('65', 'F')], [('25', 'F')], [('21', 'F')], [('18', 'F')], [('16', 'F')], [('18', 'M')], [('25', 'F')], [('16', 'F')], [('17', 'F')], [('25', 'F')], [('56', 'F')], [('17', 'F')], [('27', 'F')], [('15', 'F')], [('19', 'F')], [('18', 'M')], [('22', 'M')], [('14', 'F')], [('16', 'F')], [('29', 'F')], [('16', 'F')], [('25', 'F')], [('70', 'M')], [('12', 'F')], [('14', 'F')], [('18', 'F')], [('16', 'F')], [('13', 'F')], [('17', 'F')], [('18', 'F')], [('24', 'F')], [('22', 'F')], [('23', 'F')], [('19', 'F')], [('20', 'F')], [('16', 'F')], [('25', 'F')], [('20', 'F')], [('28', 'M')], [('16', 'F')], [('16', 'F')], [('40', 'F')], [('15', 'F')], [('47', 'F')], [('30', 'F')], [('18', 'F')], [('15', 'F')], [('20', 'F')], [('48', 'F')], [('16', 'F')], [('30', 'M')], [('30', 'F')], [('0', 'F')], [('14', 'F'

In [243]:
gender.head(2)

Unnamed: 0,index,1,2,3,4,5,6,7,8,9,...,26,27,28,29,30,31,32,33,34,35
0,1,"[(90, M)]","[(26, M)]","[(38, M)]","[(35, M)]","[(80, M)]","[(48, M)]","[(26, M)]","[(39, M)]","[(47, F)]",...,,,,,,,,,,
1,2,"[(20, F)]","[(40, F)]","[(45, F)]","[(45, F)]","[(50, F)]","[(50, F)]","[(50, F)]","[(55, F)]","[(55, F)]",...,,,,,,,,,,


In [232]:
naming_df[1].apply(pd.Series)

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,321,322,323,324,325,326,327,328,329,330
0,LB,LB,LE,WK,LF,LE,F,LE,LE,LB,...,LE,G,G,LF,WK,WK,LF,G,WK,LF
1,YN,YN,NR,TK,YN,EP,YN,NR,KA,TK,...,NR,IT,YN,EP,TK,AA,EP,YN,KA,EP
2,*,*,ED,*,*,*,IN,*,*,LU,...,ED,*,LU,EL,*,DA,EL,IN,*,EL
3,CE,TA,CY,TX,TA,CY,TA,TX,XK,XE,...,CY,SA,TA,SA,TX,KA,SA,TA,XE,SA
4,7,2,4,6,7,9,2,6,1,7,...,4,7,2,9,6,5,7,2,1,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
105,BI,G,KU,T,G,KU,G,J,S,BI,...,P,BI,G,K,T,B,PO,G,S,G
106,DA,DA,OS,FI,DA,PA,CH,OS,FI,FI,...,OS,DA,FI,OS,FI,CH,OX,DA,FI,OX
107,IA,IA,QA,QA,IA,JE,IA,QA,*,IA,...,QA,IA,IA,JA,QA,JE,JA,IA,JE,JA
108,S,S,K,K,S,Y,S,K,T,S,...,K,S,S,Q,K,Y,Q,S,Y,Q


In [221]:
# Expand each speaker column into a (language x chip) table of color terms

for i in range(1, 36):
    print("Speaker number : {}".format(i))
    print(naming_df[i].apply(pd.Series))

Speaker number : 1
    1   2   3   4   5   6   7   8   9   10   ... 321 322 323 324 325 326 327  \
0    LB  LB  LE  WK  LF  LE   F  LE  LE  LB  ...  LE   G   G  LF  WK  WK  LF   
1    YN  YN  NR  TK  YN  EP  YN  NR  KA  TK  ...  NR  IT  YN  EP  TK  AA  EP   
2     *   *  ED   *   *   *  IN   *   *  LU  ...  ED   *  LU  EL   *  DA  EL   
3    CE  TA  CY  TX  TA  CY  TA  TX  XK  XE  ...  CY  SA  TA  SA  TX  KA  SA   
4     7   2   4   6   7   9   2   6   1   7  ...   4   7   2   9   6   5   7   
..   ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ...  ..  ..  ..  ..  ..  ..  ..   
105  BI   G  KU   T   G  KU   G   J   S  BI  ...   P  BI   G   K   T   B  PO   
106  DA  DA  OS  FI  DA  PA  CH  OS  FI  FI  ...  OS  DA  FI  OS  FI  CH  OX   
107  IA  IA  QA  QA  IA  JE  IA  QA   *  IA  ...  QA  IA  IA  JA  QA  JE  JA   
108   S   S   K   K   S   Y   S   K   T   S  ...   K   S   S   Q   K   Y   Q   
109   G   Y   R  BL   G   Y   G  BL   C   B  ...   R   G   G   W   P   Y   W   

    328 329 330  
0 

     0    1    2    3    4    5    6    7    8    9    ...  321  322  323  \
0    NaN    G   WK   LE   WK    F    F   WK   LE    F  ...   LE    G    G   
1    NaN   IT   YN   NR   KR   IT   EP   AW   KR    *  ...   NR   TK   YN   
2    NaN   AA   ID   AT   BA   IN   KA   AA    *   LA  ...   AT   BA   KY   
3    NaN   XE   SA   CY   TX   SA   SA   TA   CY   XK  ...   CY   CE   TA   
4    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...  NaN  NaN  NaN   
..   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...   
105  NaN   BI    G    P   BI    G   KU    G    J    S  ...    J    A    G   
106  NaN   DA   OX   OS   FI   DA   CH   DA   OS   CH  ...   OS   OX   FI   
107  NaN   IA   IA   QA   IP   IA    *   JE   IP   CA  ...   QA   IA   IA   
108  NaN    S    S    K    E    S    Q    S    K    T  ...    K    S    S   
109  NaN    G    G    R    G    P    Y    G    P    Y  ...    R    G    G   

     324  325  326  327  328  329  330  
0     LF   WK   WK    F    G   WK 

[110 rows x 331 columns]
Speaker number : 15
     0    1    2    3    4    5    6    7    8    9    ...  321  322  323  \
0    NaN    G    F   LE    F    F   WK   LB    F    F  ...   LE    G   LB   
1    NaN   IT   YN   NR   TK   YN   KA   YN   KR   AT  ...   NR   PN   YN   
2    NaN   ID   BU   AT   PA   BU   AS   ID   AT   KA  ...   AT   EL   ID   
3    NaN   CE   TA   CY   TX   TA   KA   TA   CY   XK  ...   CY   CE   TA   
4    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...  NaN  NaN  NaN   
..   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...   
105  NaN   BI    G    P    S    G   KU    G    P    S  ...    P   BI    G   
106  NaN   DA   DA   OS    *    *   PA   PA   CH   CH  ...   OS   DA    *   
107  NaN   IA   IA   QA   QA   IA   JA   IA   CA    *  ...   QA   IA   IA   
108  NaN    S    S    K    E    S    Y    S    E   MA  ...    K    Q    S   
109  NaN   BL    G    R    P    G    Y    G    C    Y  ...    R    G    G   

     324  325  326  327  328  

     0    1    2    3    4    5    6    7    8    9    ...  321  322  323  \
0    NaN    G    F   LE   LB    G   WK   LF    S    F  ...   LE    G    G   
1    NaN   YN   YN   NR   TK   YN    *   YN   NR   AT  ...   NR   EP   YN   
2    NaN   AA   ID   AT    *   ID    *   ID   AA   KA  ...   AT   UT   AA   
3    NaN   CE   TA   CY   TA   TX   KA   TA   CY   KA  ...   CY   CE   TA   
4    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...  NaN  NaN  NaN   
..   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...   
105  NaN   BI    G    P    S    G    B    G    J    S  ...    J   BI    G   
106  NaN   DA   OX   OS   FI   DA   CH   DA   OS   DA  ...   OS   DA   FI   
107  NaN   IA   IA   QA   IP   IA   IP   IA   JE    *  ...   QA   IA   IA   
108  NaN    S    S    K    E    S    K    S    K    T  ...    K    S    S   
109  NaN    G    G    R    P    G    Y    G    P   BL  ...    R   BL    G   

     324  325  326  327  328  329  330  
0      F   LE   WK   LF   WK   WK 

     0    1    2    3    4    5    6    7    8    9    ...  321  322  323  \
0    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...  NaN  NaN  NaN   
1    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...  NaN  NaN  NaN   
2    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...  NaN  NaN  NaN   
3    NaN   CE   TA   CY   CY   TA   SA   TA   CY   CY  ...   CY   SP   TA   
4    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...  NaN  NaN  NaN   
..   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...   
105  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...  NaN  NaN  NaN   
106  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...  NaN  NaN  NaN   
107  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...  NaN  NaN  NaN   
108  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...  NaN  NaN  NaN   
109  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...  NaN  NaN  NaN   

     324  325  326  327  328  329  330  
0    NaN  NaN  NaN  NaN  NaN  NaN 

[110 rows x 331 columns]


In [245]:
gender = pd.DataFrame.from_dict(speakerInfo, orient = 'index').reset_index()
gender = gender.rename(columns = {'index':'Language'})
gender.head(2)
#gender.loc[0,13]

Unnamed: 0,Language,1,2,3,4,5,6,7,8,9,...,26,27,28,29,30,31,32,33,34,35
0,1,"[(90, M)]","[(26, M)]","[(38, M)]","[(35, M)]","[(80, M)]","[(48, M)]","[(26, M)]","[(39, M)]","[(47, F)]",...,,,,,,,,,,
1,2,"[(20, F)]","[(40, F)]","[(45, F)]","[(45, F)]","[(50, F)]","[(50, F)]","[(50, F)]","[(55, F)]","[(55, F)]",...,,,,,,,,,,


In [185]:
for i in range(1,26): 
    print("Speaker{}: {}".format(i, len(lang1[i].unique())))

Speaker1: 6
Speaker2: 7
Speaker3: 6
Speaker4: 6
Speaker5: 7
Speaker6: 6
Speaker7: 7
Speaker8: 7
Speaker9: 7
Speaker10: 7
Speaker11: 8
Speaker12: 6
Speaker13: 8
Speaker14: 8
Speaker15: 6
Speaker16: 7
Speaker17: 7
Speaker18: 6
Speaker19: 7
Speaker20: 5
Speaker21: 7
Speaker22: 7
Speaker23: 6
Speaker24: 7
Speaker25: 6
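
The same per-speaker counts can be obtained in one call with `DataFrame.nunique()`. The sketch below uses made-up data for a two-speaker language (`toy_language`) laid out like `lang1`, with chips as rows and speakers as columns.

```python
import pandas as pd

# Toy data for one language: speaker -> chip index -> color term (made-up values).
toy_language = {
    1: {1: 'LB', 2: 'LB', 3: 'LE', 4: 'WK'},
    2: {1: 'LB', 2: 'F', 3: 'LE', 4: 'WK'},
}

# A dict of dicts becomes a frame with chips as rows and speakers as columns.
toy_lang_df = pd.DataFrame(toy_language)

# nunique() counts distinct values column by column, i.e. per speaker.
unique_per_speaker = toy_lang_df.nunique()
print(unique_per_speaker.to_dict())
```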


In [156]:
lang1= pd.DataFrame.from_dict(namingData[1], orient= 'index').transpose()
lang1.describe()

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,16,17,18,19,20,21,22,23,24,25
count,330,330,330,330,330,330,330,330,330,330,...,330,330,330,330,330,330,330,330,330,330
unique,6,7,6,6,7,6,7,7,7,7,...,7,7,6,7,5,7,7,6,7,6
top,LB,LB,LB,LB,LF,LB,LB,S,F,LB,...,LB,LB,LB,LB,LB,G,G,S,F,LB
freq,78,99,139,100,90,72,80,69,71,122,...,90,124,112,79,86,88,79,102,72,89


In [None]:
#speaker profile with their age , gender and unique colour naming 
#create subplot for each language 

In [None]:
#three scatter subplots to visualize the gender distribution across languages, 
#gender vs number of unique colour names, 
#age vs number of unique colour names and identify the trends with linear regression lines across 110 languages. 
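
As a rough starting point for the trend lines planned above, the sketch below fits a least-squares line with `np.polyfit`. The ages and term counts are made up for illustration and are not WCS results; a real analysis would first build these arrays from `speakerInfo` and `namingData`.

```python
import numpy as np

# Made-up speaker profiles: age and number of unique color terms. Not WCS data.
ages = np.array([15, 23, 30, 42, 60, 67], dtype=float)
n_terms = np.array([6, 7, 6, 8, 7, 8], dtype=float)

# Degree-1 least-squares fit: n_terms is approximated by slope * age + intercept.
slope, intercept = np.polyfit(ages, n_terms, 1)

# Points on the fitted line, e.g. for drawing over a scatter plot of the same data.
fitted = slope * ages + intercept
print(round(slope, 3), round(intercept, 3))
```

`scipy.stats.linregress` would be an alternative if a correlation coefficient and p-value are also wanted.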


In [140]:
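dic = []
for i in range(1, len(namingData)):  # note: stops at language 109; len(namingData) + 1 would include 110
    dic.append([])  # one empty placeholder per language
    #print(type(namingData[i]))
    for j in range(1, len(namingData[i])):  # note: skips each language's last speaker
        dic.append(namingData[i][j])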

In [141]:
len(dic)

2591

In [119]:
# Collect the per-language dictionaries in a list (list_dic[0] is language 1).
list_dic = [namingData[i] for i in range(1, 111)]

# Each element maps speaker -> {chip index -> color term}.
print(type(list_dic[1]))


<class 'dict'>


In [138]:
# For each language, list its speaker indices.
lst = []
for lang_dict in list_dic:
    # range(1, len(...)) dropped the last speaker; take the keys directly instead
    lst.append(list(lang_dict.keys()))

In [88]:
demo = pd.DataFrame.from_dict(list_dic[1][1], orient='index', columns= ['colour_term'])
demo['chip_index'] = demo.index
demo= demo[['chip_index', 'colour_term']]

In [96]:
demo.head(10)

Unnamed: 0,chip_index,colour_term
1,1,YN
2,2,YN
3,3,NR
4,4,TK
5,5,YN
6,6,EP
7,7,YN
8,8,NR
9,9,KA
10,10,TK
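
Once a speaker's terms are in this form, counting how many chips each term covers is a one-liner. The sketch below uses only the first six chip/term pairs shown in the table above (`toy_speaker`), not the full 330 chips.

```python
from collections import Counter

# First six chip -> term pairs from the table above.
toy_speaker = {1: 'YN', 2: 'YN', 3: 'NR', 4: 'TK', 5: 'YN', 6: 'EP'}

# Count how many chips each term covers.
term_counts = Counter(toy_speaker.values())

# most_common(1) returns the most frequent term with its count.
print(term_counts.most_common(1))
```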


In [70]:
# Flatten language 2 (list_dic[1]) into one row per (speaker, chip).
rows = []
for speaker, chips in list_dic[1].items():
    for chip, term in chips.items():
        rows.append({'speaker_index': speaker, 'chip_index': chip, 'colour_term': term})

In [None]:
# using data frame
df = pd.DataFrame(rows)
  
# print(df)

In [None]:
pd.DataFrame([{"language_index": lang, "speaker_index": spk, "age": age, "gender": g}
              for lang, spks in speakerInfo.items() for spk, info in spks.items() for age, g in info])

In [34]:
namingData[1] # add a trailing semicolon to suppress the long output

For example, to see how many speakers language 1 has:

In [15]:
len(namingData[1])

25

In [18]:
speakerInfo = readSpeakerData('spkr-lsas.txt')
speakerInfo

{1: {1: [('90', 'M')],
  2: [('26', 'M')],
  3: [('38', 'M')],
  4: [('35', 'M')],
  5: [('80', 'M')],
  6: [('48', 'M')],
  7: [('26', 'M')],
  8: [('39', 'M')],
  9: [('47', 'F')],
  10: [('49', 'M')],
  11: [('40', 'F')],
  12: [('45', 'M')],
  13: [('50', 'M')],
  14: [('30', 'M')],
  15: [('21', 'M')],
  16: [('60', 'F')],
  17: [('32', 'M')],
  18: [('67', 'M')],
  19: [('15', 'M')],
  20: [('42', 'M')],
  21: [('40', 'M')],
  22: [('47', 'M')],
  23: [('23', 'F')],
  24: [('45', 'F')],
  25: [('30', 'F')]},
 2: {1: [('20', 'F')],
  2: [('40', 'F')],
  3: [('45', 'F')],
  4: [('45', 'F')],
  5: [('50', 'F')],
  6: [('50', 'F')],
  7: [('50', 'F')],
  8: [('55', 'F')],
  9: [('55', 'F')],
  10: [('55', 'F')],
  11: [('60', 'F')],
  12: [('60', 'F')],
  13: [('35', 'M')],
  14: [('40', 'M')],
  15: [('45', 'M')],
  16: [('45', 'M')],
  17: [('0', 'M')],
  18: [('0', 'M')],
  19: [('0', 'M')],
  20: [('0', 'M')],
  21: [('0', 'M')],
  22: [('0', 'M')],
  23: [('65', 'M')],
  24: [('

In [168]:
gender = pd.DataFrame.from_dict(speakerInfo, orient = 'index').reset_index()

In [169]:
gender

Unnamed: 0,index,1,2,3,4,5,6,7,8,9,...,26,27,28,29,30,31,32,33,34,35
0,1,"[(90, M)]","[(26, M)]","[(38, M)]","[(35, M)]","[(80, M)]","[(48, M)]","[(26, M)]","[(39, M)]","[(47, F)]",...,,,,,,,,,,
1,2,"[(20, F)]","[(40, F)]","[(45, F)]","[(45, F)]","[(50, F)]","[(50, F)]","[(50, F)]","[(55, F)]","[(55, F)]",...,,,,,,,,,,
2,3,"[(26, F)]","[(26, F)]","[(30, F)]","[(35, F)]","[(38, F)]","[(40, F)]","[(43, F)]","[(46, F)]","[(46, F)]",...,,,,,,,,,,
3,4,"[(15, F)]","[(17, F)]","[(18, F)]","[(20, F)]","[(20, F)]","[(22, F)]","[(23, F)]","[(24, F)]","[(24, F)]",...,"[(24, M)]","[(25, M)]","[(29, M)]","[(32, M)]","[(37, M)]","[(37, M)]","[(50, M)]","[(58, M)]","[(77, M)]","[(0, M)]"
4,5,"[(29, M)]","[(20, M)]","[(40, F)]","[(16, F)]","[(24, M)]","[(30, M)]",,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
105,106,"[(21, F)]","[(40, M)]","[(20, F)]","[(65, M)]","[(57, M)]","[(20, M)]","[(22, F)]","[(39, F)]","[(48, M)]",...,,,,,,,,,,
106,107,"[(15, F)]","[(16, F)]","[(16, F)]","[(17, F)]","[(21, F)]","[(22, F)]","[(0, F)]","[(30, F)]","[(33, F)]",...,,,,,,,,,,
107,108,"[(27, F)]","[(31, F)]","[(25, F)]","[(52, F)]","[(0, X)]","[(14, F)]","[(0, X)]","[(25, F)]","[(20, M)]",...,,,,,,,,,,
108,109,"[(65, M)]","[(60, M)]","[(49, F)]","[(56, F)]","[(73, F)]","[(48, M)]","[(42, F)]","[(64, M)]","[(61, M)]",...,,,,,,,,,,


For example, to access _(age, gender)_ information for all speakers from language 1:

In [19]:
speakerInfo[1]

{1: [('90', 'M')],
 2: [('26', 'M')],
 3: [('38', 'M')],
 4: [('35', 'M')],
 5: [('80', 'M')],
 6: [('48', 'M')],
 7: [('26', 'M')],
 8: [('39', 'M')],
 9: [('47', 'F')],
 10: [('49', 'M')],
 11: [('40', 'F')],
 12: [('45', 'M')],
 13: [('50', 'M')],
 14: [('30', 'M')],
 15: [('21', 'M')],
 16: [('60', 'F')],
 17: [('32', 'M')],
 18: [('67', 'M')],
 19: [('15', 'M')],
 20: [('42', 'M')],
 21: [('40', 'M')],
 22: [('47', 'M')],
 23: [('23', 'F')],
 24: [('45', 'F')],
 25: [('30', 'F')]}

For example, to access _(age, gender)_ information for speaker 1 from language 1:

In [20]:
speakerInfo[1][1]

[('90', 'M')]
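
Each entry is a one-element list holding an `(age, gender)` tuple, with the age stored as a string. A minimal sketch of unpacking such entries follows. `toy_speakers` and `unpack` are illustrative names, and treating an age of `'0'` as "not recorded" is an assumption of this sketch rather than something the data source states here.

```python
# Made-up entries with the same shape as speakerInfo[language]:
# speaker -> [(age, gender)], where age is stored as a string.
toy_speakers = {1: [('90', 'M')], 2: [('26', 'M')], 3: [('0', 'F')]}

def unpack(entry, missing_age='0'):
    """Split a [(age, gender)] entry into (age as int or None, gender).

    Treating age '0' as "not recorded" is an assumption of this sketch.
    """
    age, gender = entry[0]
    return (None if age == missing_age else int(age)), gender

profiles = {speaker: unpack(entry) for speaker, entry in toy_speakers.items()}
print(profiles)
```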

## Demo 6: Visualize color naming from an individual speaker

> Naming patterns from a speaker can be visualized in the stimulus palette _(Munsell space)_.

______________________________________________

Extract an example speaker datum from an example language.

In [21]:
lg61_spk5 = namingData[61][5]

Extract color terms used by that speaker.

In [22]:
terms = lg61_spk5.values()

Encode the color terms into random numbers (for plotting purposes).

In [23]:
encoded_terms = map_array_to(terms, generate_random_values(terms))
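
The internals of `map_array_to` and `generate_random_values` are not shown in this notebook. The sketch below illustrates the general idea of the encoding step with two hypothetical stand-ins, `random_codes` and `encode`; it should not be read as the helpers' actual implementation.

```python
import random

# Made-up terms for a handful of chips, in chip order.
terms = ['LB', 'LB', 'LE', 'WK', 'LF', 'LE']

def random_codes(values, seed=None):
    """Assign one random float in [0, 1) to each distinct value."""
    rng = random.Random(seed)
    return {value: rng.random() for value in sorted(set(values))}

def encode(values, codes):
    """Replace every value by its code, keeping the original order."""
    return [codes[value] for value in values]

codes = random_codes(terms, seed=0)
encoded = encode(terms, codes)
print(encoded)
```

Chips that share a term receive the same number, so the partition of the chart stays the same even though the numbers (and hence the plotted colors) change from run to run.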

Visualize the color naming pattern for that speaker: each color patch corresponds to the extension of a color term. The color scheme is randomized, but the partition of the color space is invariant.

In [24]:
plotValues(encoded_terms)
#these are random colours 

FileNotFoundError: [Errno 2] No such file or directory: './WCS_data_core/chip.txt'

**Note**: `plotValues()` is a generic function for visualizing various kinds of information on the chart, and it can be adapted to your needs.

Now you are in a position to start exploring this data set - enjoy!