The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) has a number of variables, listed after the concepts below. To understand each of them, let's first review some audio concepts:

A **sample** is the value of the audio waveform measured over a very small time interval.

The **sample rate** is the number of samples taken per second. It is a frequency, measured in hertz (Hz) and often quoted in kilohertz (kHz). The more often the original audio is sampled, the closer the digital version can get to the original.

To understand the difference between a sample and a frame, consider these formulas:

* Sample rate = number of samples / second
* Frame = 1 sample from each channel (PCM)
* Frame Size = Sample size * Channels
* Frame Rate = frames / second

For PCM (pulse-code modulation), which is a digital representation of an analog signal, the sample rate and the frame rate are the same, since a frame consists of one sample from each channel.
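The formulas above can be checked with a minimal sketch that uses Python's standard `wave` module. The parameter values (2 channels, 2 bytes per sample, 48000 Hz) are assumptions chosen for illustration, not values taken from the dataset.

```python
import io
import wave

# Write one second of silent 16-bit stereo PCM into an in-memory WAV file.
# All parameter values here are illustrative assumptions.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(2)      # stereo
    writer.setsampwidth(2)      # 2 bytes = 16 bits per sample
    writer.setframerate(48000)  # frames per second
    writer.writeframes(b"\x00" * (2 * 2 * 48000))

# Read the file back and inspect its parameters.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    channels = reader.getnchannels()
    sample_width = reader.getsampwidth()  # bytes per sample
    frame_rate = reader.getframerate()    # equals the sample rate for PCM
    frame_count = reader.getnframes()

# Frame Size = Sample size * Channels
frame_width = sample_width * channels
# Bytes of audio data per second of sound
bytes_per_second = frame_width * frame_rate

print(frame_width, frame_count, bytes_per_second)
```

With these assumed values, one frame holds two 2-byte samples, so the frame size is 4 bytes.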

The **sample size** is the size of an individual sample, also called **bit depth** or **sample width**. It indicates how many bits of information each sample contains and is an important factor in the quality (resolution) of the audio.

An **audio channel** is the path through which a signal or data is delivered, i.e., it is how a sound signal is conveyed from the player source to the speaker. With one channel we talk about **mono**; with two channels we talk about **stereo**. For instance, in stereo sound there are two audio sources, one speaker on the left and one on the right, and each of these is represented by one channel.

An **audio frame** is a data record that contains the samples of all the channels available in an audio signal at the same point in time.

The **zero-crossing rate** is the rate at which a signal changes from positive to zero or negative, and from negative to zero or positive. It is a measure of the smoothness of the signal. It can be used as a basic pitch detection algorithm for monophonic tonal signals and is a key feature for classifying percussive sounds.
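As a rough illustration of the pitch-detection idea, the sketch below counts zero crossings of a synthetic 100 Hz sine wave. The signal, the 8000 Hz frame rate and the convention of treating an exact zero as non-negative are all assumptions made for this example.

```python
import math

def zero_crossings(signal):
    """Count sign changes between consecutive samples.

    Convention (an assumption): a sample equal to zero counts as non-negative.
    """
    signs = [sample >= 0 for sample in signal]
    return sum(1 for previous, current in zip(signs, signs[1:]) if previous != current)

# One second of a synthetic 100 Hz sine wave, sampled at 8000 Hz.
# The small phase offset keeps samples from landing exactly on zero.
frame_rate = 8000
frequency = 100
signal = [
    math.sin(2 * math.pi * frequency * n / frame_rate + 0.1)
    for n in range(frame_rate)
]

crossings = zero_crossings(signal)
duration_s = len(signal) / frame_rate

# A pure tone crosses zero twice per period, so pitch ~ crossings / (2 * duration).
estimated_pitch = crossings / (2 * duration_s)
print(crossings, estimated_pitch)
```

This estimate only works for clean, monophonic tones. Noisy or percussive signals cross zero far more often, which is why the count is also useful for telling such sounds apart.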

**Mel-Frequency Cepstral Coefficients** (MFCCs) are a small set of features (usually about 10-20) that concisely describe the overall shape of a spectral envelope.

**Spectral Centroid** indicates where the center of mass of the spectrum is located. It is a good predictor of the 'brightness' of a sound, which depends on how the total power is distributed between high and low frequencies. It can also be seen as the amplitude-weighted mean of the frequency components.
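The "amplitude-weighted mean" reading can be sketched in a few lines. The example below uses a naive discrete Fourier transform on two synthetic signals. The signals, the 1000 Hz frame rate and the helper names are assumptions for illustration. An audio library would normally compute the centroid per short frame rather than over the whole signal.

```python
import cmath
import math

def magnitude_spectrum(signal):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for a short example)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]

def spectral_centroid(signal, frame_rate):
    """Amplitude-weighted mean of the frequency components, in Hz."""
    magnitudes = magnitude_spectrum(signal)
    n = len(signal)
    frequencies = [k * frame_rate / n for k in range(len(magnitudes))]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("centroid is undefined for a silent signal")
    return sum(f * m for f, m in zip(frequencies, magnitudes)) / total

frame_rate = 1000
n = 200
# A single 50 Hz tone, and the same tone plus an equally loud 300 Hz tone.
low = [math.sin(2 * math.pi * 50 * t / frame_rate) for t in range(n)]
bright = [x + math.sin(2 * math.pi * 300 * t / frame_rate) for t, x in enumerate(low)]

centroid_low = spectral_centroid(low, frame_rate)
centroid_bright = spectral_centroid(bright, frame_rate)
print(round(centroid_low, 3), round(centroid_bright, 3))
```

Adding high-frequency energy pulls the centroid upwards, which is the sense in which a higher centroid corresponds to a "brighter" sound.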

**STFT chromagram**: The Fourier transform converts a time-dependent signal into a frequency-dependent one. Applying it to local sections of an audio signal gives the short-time Fourier transform (STFT). The chroma feature, or chromagram, of an audio signal represents the intensity of each of the twelve distinct pitch classes that are used to study music.
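To make the idea of pitch classes concrete, here is a minimal sketch that folds a few spectral peaks into twelve chroma bins. It assumes equal temperament with A4 = 440 Hz, and the `(frequency, magnitude)` pairs are hypothetical values. A real chromagram would apply the same folding to every STFT bin of every time frame.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(frequency_hz, a4_hz=440.0):
    """Index (0 = C, ..., 11 = B) of the nearest equal-tempered pitch class."""
    midi_note = round(69 + 12 * math.log2(frequency_hz / a4_hz))
    return midi_note % 12

def chroma_vector(peaks):
    """Fold (frequency, magnitude) pairs into 12 bins, scaled so the largest is 1."""
    chroma = [0.0] * 12
    for frequency_hz, magnitude in peaks:
        chroma[pitch_class(frequency_hz)] += magnitude
    largest = max(chroma)
    return [value / largest for value in chroma] if largest > 0 else chroma

# Hypothetical spectral peaks: A3, A4 and (approximately) C4.
peaks = [(220.0, 1.0), (440.0, 0.5), (261.63, 0.75)]
chroma = chroma_vector(peaks)

for name, value in zip(PITCH_CLASSES, chroma):
    if value > 0:
        print(name, value)
```

Notes an octave apart (220 Hz and 440 Hz here) land in the same bin, so the chroma vector describes which pitch classes are present regardless of octave.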

The dataset has the following variables:

* 'Modality': media file types (audio-only).
* 'Actor' and 'Sex': identifier of the actor (01 to 24) and the actor's sex (M or F).
* 'statement': phrase repeated by the actors ("Kids are talking by the door", "Dogs are sitting by the door").
* 'repetition': number of repetitions (1st repetition, 2nd repetition).
* 'vocal channel': type of channel (speech or song).
* 'Emotion': the emotion of the speaker (neutral, calm, happy, sad, angry, fearful, disgust, surprised).
* 'Emotional intensity': level of emotion of each expression (normal, strong). NOTE: There is no strong intensity for the 'neutral' emotion.
* sample_width: number of bytes of storage needed to save the sample (1 means 8-bit, 2 means 16-bit).
* frame_rate: frequency of samples used (in Hertz).
* frame_width: Number of bytes for each frame. One frame contains a sample for each channel.
* length_ms: audio file length in milliseconds.
* frame_count: number of frames in the audio file.
* intensity: loudness in dBFS, which is dB relative to the maximum possible loudness.
* zero_crossings_sum: sum of the zero-crossing rate.
* 'mean', 'std', 'min', 'max', 'kur', 'skew': statistics of the original audio signal.
* mfcc_ 'mean', 'std', 'min', 'max': statistics of the Mel-Frequency Cepstral Coefficients.
* sc_ 'mean', 'std', 'min', 'max', 'kur', 'skew': statistics of the spectral centroid.
* stft_ 'mean', 'std', 'min', 'max', 'kur', 'skew': statistics of the stft chromagram.
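Two of the variables above follow directly from the others. The sketch below assumes that `length_ms` is derived from `frame_count` and `frame_rate`, and that `intensity` is computed from the RMS amplitude relative to full scale. Both are assumptions about how the dataset was built, and the function names are hypothetical.

```python
import math

def length_ms(frame_count, frame_rate):
    """Duration in milliseconds from the number of frames and the frame rate."""
    return 1000 * frame_count / frame_rate

def dbfs(samples, sample_width):
    """Loudness in dBFS, assuming it is based on the RMS of signed integer samples."""
    max_amplitude = 2 ** (8 * sample_width - 1)  # 32768 for 16-bit audio
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(rms / max_amplitude)

# Illustrative values: a square wave at half of full scale, 16-bit samples.
samples = [16384, -16384] * 100
print(length_ms(96000, 48000))        # duration of 96000 frames at 48000 Hz
print(round(dbfs(samples, 2), 2))     # roughly -6 dBFS: half of full scale
```

Because 0 dBFS is the maximum possible level, values computed this way are zero or negative, and quieter recordings have more negative values.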

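Measures to understand the shape of the data:

**Skewness** (skew) measures the asymmetry of the distribution: a value near zero indicates a roughly symmetric distribution, a positive value a longer right tail, and a negative value a longer left tail.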

**Kurtosis** (kur) measures the heaviness of the distribution tails, i.e., provides an indication of the presence of outliers.
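Both measures can be written in terms of central moments. The sketch below uses the population (biased) definitions and reports excess kurtosis, i.e. kurtosis minus 3. Whether the dataset's 'kur' columns use exactly this convention is an assumption, and the sample values are made up.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    """Third standardized moment: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]  # one large value stretches the right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed), excess_kurtosis(right_tailed))
```

The single large value in `right_tailed` produces a clearly positive skewness and a higher kurtosis than the evenly spread `symmetric` list.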

### Data semantics

#### Classification of the variables
* **Nominal/Categorical:** actor, sex, modality, statement, repetition, vocal channel.
* **Ordinal:** emotional intensity, emotion.
* **Numeric:**
* **Ratio-Scaled:**
  * length_ms, zero_crossings_sum, frame_rate, frame_width, sample_width
  * stft_mean, stft_std, stft_min, stft_max, stft_kur, stft_skew
  * sc_mean, sc_std, sc_min, sc_max, sc_kur, sc_skew
  * mfcc_mean, mfcc_std, mfcc_min, mfcc_max, mfcc_kur, mfcc_skew
  * 'std', 'min', 'max', 'kur', 'skew'

* Does the classification of the variables depend on the actual data?
* What does "the characteristics of the variables" mean? Is it just the classification and the domain?
* Is emotion nominal or ordinal? What about partially ordered sets?

Possible tuples of features we can analyze:

* EI with SW (sample width). This may not make sense, since SW always has the value 2.


**Distributions in Claudio's notebook**:
* emotion x sex (s/c)
* emotion x intensity (box plot)
* emotion (pie chart, histogram)
* audio length (hist)
* vocal channel x sex (stacked chart)
* sex with emotional intensity (EI)
* SW - Frame width (this doesn't make sense: SW always has the same value and frame width almost always has the same value)
* Frame rate - ZC (this doesn't make sense: frame rate always has the same value)
* Frame count - ZC
* Length_ms - Frame count
* EI with statement (stat), length, intensity (I), Zero-crossing sum (ZC)

**Tasks for next meeting**
1. Understand the meaning of each variable.
2. Think about the characteristics of each variable, for instance _what does it mean that an sc_skew value is higher or lower than the others?_
3. If you have time, think about the statistical analyses we can associate with pairs of variables.
