# Anomalous durations of /a/ and /o/

The universal tendency is for low vowels (here, /a/) to be produced with longer durations than higher vowels. This prediction holds in the DIMEx100 corpus but not in CBAS, where /o/ has the longest duration, followed by /a/.

In [1]:
import pandas as pd

In [3]:
vowels = pd.read_csv("plot_model/vowel_model.csv")
ao = vowels[(vowels["Vowel"]=="a") | (vowels["Vowel"]=="o")].copy()
len(ao)

2201

In [5]:
ao.groupby(["Corpus", "Vowel", "stress"]).Dur_norm.mean()

Corpus    Vowel  stress    
CBAS      a      stressed      187.500780
                 unstressed    131.886186
          o      stressed      188.656261
                 unstressed    172.060115
DIMEx100  a      stressed      168.764524
                 unstressed    179.254139
          o      stressed      144.655878
                 unstressed    146.746008
Name: Dur_norm, dtype: float64

In CBAS, stressed /a/ and /o/ have nearly equal durations (about 187.5 and 188.7), but /o/ shows a much weaker effect of stress than /a/ does (unstressed 172.1 vs. 131.9). This pulls the overall /o/ average above that of /a/.
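
One way to make the stress effect explicit is to subtract the unstressed mean from the stressed mean for each corpus and vowel. The sketch below rebuilds the group means printed above as a small Series and pivots the stress level into columns; the names `means`, `wide`, and `stress_effect` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Group means of Dur_norm, copied from the groupby output above.
index = pd.MultiIndex.from_product(
    [["CBAS", "DIMEx100"], ["a", "o"], ["stressed", "unstressed"]],
    names=["Corpus", "Vowel", "stress"],
)
means = pd.Series(
    [187.500780, 131.886186, 188.656261, 172.060115,
     168.764524, 179.254139, 144.655878, 146.746008],
    index=index,
    name="Dur_norm",
)

# Move the stress level into columns, then take stressed minus unstressed.
wide = means.unstack("stress")
stress_effect = wide["stressed"] - wide["unstressed"]
print(stress_effect.round(2))
```

On these means, the CBAS difference is about 55.6 for /a/ and about 16.6 for /o/, while both DIMEx100 differences are negative (about -10.5 and -2.1).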

In [15]:
# now with raw duration
ao.groupby(["Corpus", "Vowel", "stress"]).Dur_ms.mean()

Corpus    Vowel  stress    
CBAS      a      stressed      128.595745
                 unstressed     90.395257
          o      stressed      130.442478
                 unstressed    118.436019
DIMEx100  a      stressed       78.364055
                 unstressed     84.008197
          o      stressed       67.398810
                 unstressed     68.768382
Name: Dur_ms, dtype: float64

In [17]:
# Now check for following segment, where in English, duration is longest before a voiced consonant
# first group consonants according to voicing
ao.next_ph.unique()

array(['x', 'k', 's', 'm', 'l', 'n', 'sp', 'T', 'rf', 'b', 'G', 'L', 'p',
       't', 'D', 'ng', 'f', 'r', 'tS', nan, 'g', '.sil', 'd', 'r(', 'Z',
       '.bn', '.0', 'n~'], dtype=object)

In [33]:
# inspect the rows that contain NaN
nan_df = ao[ao.isna().any(axis=1)]
nan_df[["word", "prev_ph", "next_ph"]]

,word,prev_ph,next_ph
660,hablas,,b
675,asado,,s
717,afuera,rf,
756,bajo,x,
796,velatorio,j,
1292,música,k,
1293,lucha,tS,
1411,barometro,rf,
1420,vecino,n,
1426,polvo,b,


In [34]:
# replace NaN in the neighbouring-phone columns with ".sil"
ao[["prev_ph", "next_ph"]] = ao[["prev_ph", "next_ph"]].fillna(".sil")
len(ao)

2201

In [35]:
ao.next_ph.unique()

array(['x', 'k', 's', 'm', 'l', 'n', 'sp', 'T', 'rf', 'b', 'G', 'L', 'p',
       't', 'D', 'ng', 'f', 'r', 'tS', '.sil', 'g', 'd', 'r(', 'Z', '.bn',
       '.0', 'n~'], dtype=object)

In [43]:
import numpy as np

# phone labels grouped by the voicing of the following segment
voiceless = ["x", "k", "s", "p", "t", "f", "tS", "T"]
voiced = ["m", "l", "n", "rf", "b", "G", "D", "ng", "r", "d", "g", "r(", "n~", "Z", "L"]
# pause, silence, and other non-phone labels
other = ["sp", ".sil", ".bn", ".0"]

# one boolean mask per group, in the same order as the labels passed to np.select
conditions = [
    ao["next_ph"].isin(voiceless),
    ao["next_ph"].isin(voiced),
    ao["next_ph"].isin(other),
]

# create a new column; a string default keeps the result a string array
# and marks any label that falls outside the three groups
ao["voicing_next_ph"] = np.select(conditions, ["voiceless", "voiced", "other"], default="unknown")
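
Because the grouping is written by hand, it may be worth checking that every observed label falls into exactly one group. The following is a minimal sketch of such a check, using the labels printed by `ao.next_ph.unique()` after the NaN fill; the names `observed`, `groups`, `unclassified`, and `overlaps` are introduced here for illustration.

```python
from collections import Counter

# Labels copied from the ao.next_ph.unique() output above (after the NaN fill).
observed = {
    "x", "k", "s", "m", "l", "n", "sp", "T", "rf", "b", "G", "L", "p",
    "t", "D", "ng", "f", "r", "tS", ".sil", "g", "d", "r(", "Z", ".bn",
    ".0", "n~",
}

# The three groups used to build the np.select conditions.
groups = {
    "voiceless": {"x", "k", "s", "p", "t", "f", "tS", "T"},
    "voiced": {"m", "l", "n", "rf", "b", "G", "D", "ng", "r", "d", "g",
               "r(", "n~", "Z", "L"},
    "other": {"sp", ".sil", ".bn", ".0"},
}

# Observed labels that no group covers.
unclassified = observed - set().union(*groups.values())

# Labels that appear in more than one group.
label_counts = Counter(ph for members in groups.values() for ph in members)
overlaps = {ph for ph, n in label_counts.items() if n > 1}

print("unclassified:", sorted(unclassified))
print("overlaps:", sorted(overlaps))
```

Both sets come out empty for the labels listed above, so the three groups partition the observed following-phone labels.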

In [44]:
ao.groupby(["Corpus", "Vowel", "stress", "voicing_next_ph"]).Dur_norm.mean()

Corpus    Vowel  stress      voicing_next_ph
CBAS      a      stressed    other              264.321776
                             voiced             151.949004
                             voiceless          140.234874
                 unstressed  other              252.154884
                             voiced             118.005110
                             voiceless          114.932389
          o      stressed    other              259.078968
                             voiced             147.670264
                             voiceless          125.270581
                 unstressed  other              281.651933
                             voiced             120.221974
                             voiceless          135.970964
DIMEx100  a      stressed    other              248.904100
                             voiced             169.888851
                             voiceless          163.698588
                 unstressed  other              275.348358
           

In [45]:
# remove .bn and .0 contexts
ao_fixed = ao[(ao["prev_ph"]!=".bn") & (ao["next_ph"]!=".bn")
             & (ao["next_ph"]!=".0")]
len(ao_fixed)

2183

In [66]:
ao_fixed.groupby(["Corpus", "Vowel", "stress"]).Dur_norm.mean()

Corpus    Vowel  stress    
CBAS      a      stressed      187.500780
                 unstressed    131.886186
          o      stressed      188.656261
                 unstressed    172.060115
DIMEx100  a      stressed      168.751121
                 unstressed    179.098699
          o      stressed      142.106495
                 unstressed    146.795241
Name: Dur_norm, dtype: float64

In [47]:
# remove pre-pausal contexts (next segment is a pause, silence, or noise label)
# to avoid duration measurement issues from background periodicity
ao_fixed_no_fin = ao[ao["voicing_next_ph"]!= "other"]
len(ao_fixed_no_fin)

1863

In [55]:
ao_fixed_no_fin.groupby(["Corpus", "Vowel"]).Dur_norm.mean()

Corpus    Vowel
CBAS      a        124.149024
          o        132.117995
DIMEx100  a        171.340289
          o        139.924971
Name: Dur_norm, dtype: float64

In [53]:
ao_fixed.groupby(["Corpus", "Vowel", "is_wdfin_ph"]).Gender.count()

Corpus    Vowel  is_wdfin_ph
CBAS      a      False          602
                 True           139
          o      False          281
                 True           156
DIMEx100  a      False          416
                 True           154
          o      False          327
                 True           108
Name: Gender, dtype: int64

In [54]:
ao_fixed.groupby(["Corpus", "Vowel", "is_wdfin_ph"]).Dur_norm.mean()

Corpus    Vowel  is_wdfin_ph
CBAS      a      False          124.149024
                 True           259.420007
          o      False          132.117995
                 True           268.050275
DIMEx100  a      False          171.447345
                 True           185.522575
          o      False          138.806967
                 True           163.818599
Name: Dur_norm, dtype: float64

# CBAS female production of /i/

In [4]:
i = vowels[vowels["Vowel"]=="i"].copy()
len(i)

596

In [5]:
i.groupby(["Corpus", "Gender"]).stress.count()

Corpus    Gender
CBAS      Female    182
          Male       82
DIMEx100  Female    246
          Male       86
Name: stress, dtype: int64

In [6]:
i.groupby(["Corpus", "Gender", "stress"]).Participant.count()

Corpus    Gender  stress    
CBAS      Female  stressed       53
                  unstressed    129
          Male    stressed       29
                  unstressed     53
DIMEx100  Female  stressed       74
                  unstressed    172
          Male    stressed       27
                  unstressed     59
Name: Participant, dtype: int64

In [7]:
i.groupby(["Corpus", "Gender", "stress"])["F2.50_norm"].mean()

Corpus    Gender  stress    
CBAS      Female  stressed      1.698654
                  unstressed    1.432272
          Male    stressed      2.114636
                  unstressed    2.120478
DIMEx100  Female  stressed      2.220036
                  unstressed    2.178350
          Male    stressed      2.205195
                  unstressed    2.186619
Name: F2.50_norm, dtype: float64

In [9]:
# Is it all females or just one?
i_fem_cbas = i[(i["Gender"]=="Female") & (i["Corpus"]=="CBAS")].copy()
i_fem_cbas.groupby(["Participant", "stress"])["F2.50_norm"].mean()

Participant  stress    
p113         stressed      1.764646
             unstressed    1.422183
p115         stressed      1.099619
             unstressed    1.001715
p120         stressed      1.916068
             unstressed    1.329856
p124         stressed      2.029584
             unstressed    2.035795
Name: F2.50_norm, dtype: float64

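For some female CBAS speakers (p113 and p120), stress seems to play a large role in /i/ F2 production: unstressed /i/ is more backed (more central) than stressed /i/. p124 shows no such difference. However, most of the backing of /i/ seen globally is due to the productions of a single participant, p115, whose normalized F2 is low (about 1.0 to 1.1) in both stress conditions.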
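
A rough way to gauge how much one speaker drives the group value is to recompute the mean with each participant left out in turn. The sketch below does this on the per-participant means printed above. It is unweighted (token counts per participant are not shown here), so its values will not match the pooled group means exactly; `part_means` and `leave_one_out` are hypothetical names.

```python
import pandas as pd

# Per-participant means of F2.50_norm for CBAS female /i/, copied from the output above.
part_means = pd.DataFrame(
    {
        "stressed": [1.764646, 1.099619, 1.916068, 2.029584],
        "unstressed": [1.422183, 1.001715, 1.329856, 2.035795],
    },
    index=pd.Index(["p113", "p115", "p120", "p124"], name="Participant"),
)


def leave_one_out(means: pd.DataFrame) -> pd.DataFrame:
    """Unweighted mean across participants, dropping each participant in turn."""
    rows = {p: means.drop(index=p).mean() for p in means.index}
    return pd.DataFrame(rows).T.rename_axis("excluded")


print(part_means.mean().round(3))          # all four participants
print(leave_one_out(part_means).round(3))  # one row per excluded participant
```

On these figures, excluding p115 moves the unweighted unstressed mean from about 1.45 to about 1.60, and the stressed mean from about 1.70 to about 1.90.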

In [11]:
# check whether p115's productions differ for all vowels

fem_cbas = vowels[(vowels["Gender"]=="Female") & (vowels["Corpus"]=="CBAS")].copy()
fem_cbas.groupby(["Vowel", "Participant"])[["F1.50_norm", "F2.50_norm"]].mean()

Vowel,Participant,F1.50_norm,F2.50_norm
a,p113,0.692598,1.460948
a,p115,0.682383,1.537743
a,p120,0.717317,1.346907
a,p124,0.662301,1.423849
e,p113,0.557605,1.75044
e,p115,0.53351,1.454207
e,p120,0.484634,1.636097
e,p124,0.516169,1.747837
i,p113,0.411296,1.533856
i,p115,0.413449,1.029383
