/
Discriminant_analysis.html
107 lines (106 loc) · 6.96 KB
/
Discriminant_analysis.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
<html><head><meta name="robots" content="index,follow">
<title>Discriminant analysis</title></head><body bgcolor="#FFFFFF">
<table border=0 cellpadding=0 cellspacing=0><tr><td bgcolor="#CCCC00"><table border=4 cellpadding=9><tr><td align=middle bgcolor="#000000"><font face="Palatino,Times" size=6 color="#999900"><b>
Discriminant analysis
</b></font></table></table>
<p>
This tutorial will show you how to perform discriminant analysis with P<font size=-1>RAAT</font></p>
<p>
As an example, we will use the dataset from <a href="Pols_et_al___1973_.html">Pols et al. (1973)</a> with the frequencies and levels of the first three formants from the 12 Dutch monophthongal vowels as spoken in /h_t/ context by 50 male speakers. This data set has been incorporated into Praat and can be called into play with the <a href="Create_TableOfReal__Pols_1973____.html">Create TableOfReal (Pols 1973)...</a> command that can be found in the "New / TableOfReal" menu.</p>
<p>
In the list of objects a new TableOfReal object will appear with 6 columns and 600 rows (50 speakers × 12 vowels). The first three columns contain the formant frequencies in Hz, the last three columns contain the levels of the first three formants given in decibels below the overall sound pressure level of the measured vowel segment. Each row is labelled with a vowel label.</p>
<p>
Pols et al. use logarithms of frequency values, we will too. Because the measurement units in the first three columns are in Hz and in the last three columns in dB, it is probably better to standardize the columns. The following script summarizes our achievements up till now:</p>
<code>
Create TableOfReal (Pols 1973)... yes<br></code>
<code>
Formula... if col < 4 then log10 (self) else self fi<br></code>
<code>
Standardize columns<br></code>
<code>
# change the column labels too, for nice plot labels.<br></code>
<code>
Set column label (index)... 1 standardized log (%F__1_)<br></code>
<code>
Set column label (index)... 2 standardized log (%F__2_)<br></code>
<code>
Set column label (index)... 3 standardized log (%F__3_)<br></code>
<code>
Set column label (index)... 4 standardized %L__1_<br></code>
<code>
Set column label (index)... 5 standardized %L__2_<br></code>
<code>
Set column label (index)... 6 standardized %L__3_<br></code>
<p>
To get an indication of what these data look like, we make a scatter plot of the first standardized log-formant-frequency against the second standardized log-formant-frequency. With the next script fragment you can reproduce the following picture.</p>
<code>
Viewport... 0 5 0 5<br></code>
<code>
select TableOfReal pols_50males<br></code>
<code>
Draw scatter plot... 1 2 0 0 -2.9 2.9 -2.9 2.9 10 y + y<br></code>
<p><font size=-2>[sorry, no pictures yet in the web version of this manual]</font></p>
<p>
Apart from a difference in scale this plot is the same as fig. 3 in the Pols et al. article.</p>
<h3>
1. How to perform a discriminant analysis</h3>
<p>
Select the TableOfReal and choose from the dynamic menu the option <a href="TableOfReal__To_Discriminant.html">To Discriminant</a>. This command is available in the "Multivariate statistics" action button. The resulting Discriminant object will bear the same name as the TableOfReal object. The following script summarizes:</p>
<code>
select TableOfReal pols_50males<br></code>
<code>
To Discriminant<br></code>
<h3>
2. How to project data on the discriminant space</h3>
<p>
You select a TableOfReal and a Discriminant object together and choose: <a href="Discriminant___TableOfReal__To_Configuration___.html">To Configuration...</a>. One of the options of the newly created Configuration object is to draw it. The following picture shows how the data look in the plane spanned by the first two dimensions of this Configuration. The directions in this configuration are the eigenvectors from the Discriminant.</p>
<p><font size=-2>[sorry, no pictures yet in the web version of this manual]</font></p>
<p>
The following script summarizes:</p>
<code>
select TableOfReal pols_50males<br></code>
<code>
plus Discriminant pols_50males<br></code>
<code>
To Configuration... 0<br></code>
<code>
Viewport... 0 5 0 5<br></code>
<code>
Draw... 1 2 -2.9 2.9 -2.9 2.9 12 y + y<br></code>
<p>
If you are only interested in this projection, there also is a short cut without an intermediate Discriminant object: select the TableOfReal object and choose <a href="TableOfReal__To_Configuration__lda____.html">To Configuration (lda)...</a>.</p>
<h3>
3. How to draw concentration ellipses</h3>
<p>
Select the Discriminant object and choose <a href="Discriminant__Draw_sigma_ellipses___.html">Draw sigma ellipses...</a>. In the form you can fill out the coverage of the ellipse by way of the <i>Number of sigmas</i> parameter. You can also select the projection plane. The next figure shows the 1-<i></i>σ concentration ellipses in the standardized log <i>F</i><sub>1</sub> vs log <i>F</i><sub>2</sub> plane. When the data are multinormally distributed, a 1-<i></i>σ ellipse will cover approximately 39.3% of the data. The following code summarizes:</p>
<code>
select Discriminant pols_50males<br></code>
<code>
Draw sigma ellipses... 1.0 no 1 2 -2.9 2.9 -2.9 2.9 12 yes<br></code>
<p><font size=-2>[sorry, no pictures yet in the web version of this manual]</font></p>
<h3>
4. How to classify</h3>
<p>
Select together the Discriminant object (the classifier), and a TableOfReal object (the data to be classified). Next you choose <a href="Discriminant___TableOfReal__To_ClassificationTable___.html">To ClassificationTable</a>. Normally you will enable the option <i>Pool covariance matrices</i> and the pooled covariance matrix will be used for classification.</p>
<p>
The ClassificationTable can be converted to a <a href="Confusion.html">Confusion</a> object and its fraction correct can be queried with: <a href="Confusion__Get_fraction_correct.html">Confusion: Get fraction correct</a>.</p>
<p>
In general you would separate your data into two independent sets, <font size=-1>TRAIN</font> and <font size=-1>TEST</font>. You would use <font size=-1>TRAIN</font> to train the discriminant classifier and <font size=-1>TEST</font> to test how well it classifies. Several possibilities for splitting a dataset into two sets exist. We mention the <a href="Jackknife.html">jackknife</a> ("leave-one-out") and the <a href="Bootstrap.html">bootstrap</a> methods ("resampling").</p>
<h3>Links to this page</h3>
<ul>
<li><a href="Acknowledgments.html">Acknowledgments</a>
<li><a href="Canonical_correlation_analysis.html">Canonical correlation analysis</a>
<li><a href="Configuration.html">Configuration</a>
<li><a href="Discriminant.html">Discriminant</a>
<li><a href="Feedforward_neural_networks_3__FFNet_versus_discriminan.html">Feedforward neural networks 3. FFNet versus discriminant classifier</a>
<li><a href="Intro.html">Intro</a>
<li><a href="Statistics.html">Statistics</a>
<li><a href="Types_of_objects.html">Types of objects</a>
<li><a href="What_s_new_.html">What's new?</a>
</ul>
<hr>
<address>
<p>© djmw, February 27, 1999</p>
</address>
</body>
</html>