# Feature Extraction and Representation

Feature extraction and representation:

- converts text at various levels (word, sentence, document) into a set of features (e.g., continuous vectors or matrices)

- captures essential information from the text while discarding irrelevant details

- serves as input to a language model for downstream tasks
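
As a minimal sketch of the idea in the bullets above, the snippet below turns raw text into a fixed-length count vector that a downstream model could consume. The whitespace tokenizer, the helper names `tokenize`, `build_vocab`, and `featurize`, and the toy corpus are all illustrative assumptions, not the API of any particular library.

```python
def tokenize(text):
    # Naive tokenizer (an assumption for illustration): lowercase, split on whitespace.
    return text.lower().split()

def build_vocab(corpus):
    # Map each distinct token to a column index; sorting keeps the mapping reproducible.
    tokens = sorted({tok for doc in corpus for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}

def featurize(text, vocab):
    # Count occurrences of each vocabulary word; out-of-vocabulary words are ignored.
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec

corpus = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocab(corpus)          # columns: cat, dog, mat, on, sat, the
print(featurize(corpus[1], vocab))   # [0, 1, 1, 1, 1, 2]
```

Every input, whatever its length, maps to a vector with one entry per vocabulary word, which is what lets a model treat texts as comparable numeric inputs. Word order is discarded in this particular representation.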

<table>
    <tr>
        <th>Level</th>
        <th>Model</th>
        <th>Description</th>
        <th>Pros</th>
        <th>Cons</th>
    </tr>
    <tr>
        <td rowspan="3">Word</td>
        <td>One-hot Encoding</td>
        <td>Represents words as binary vectors with a 1 at the index corresponding to the word and 0s elsewhere</td>
        <td>Simple and easy to implement</td>
        <td>High-dimensional, sparse vectors; doesn't capture semantic similarity</td>
    </tr>
    <tr>
        <td>Distributed Word Embeddings (e.g., Word2Vec, GloVe)</td>
        <td>Represents words as dense vectors that capture semantic information</td>
        <td>Lower-dimensional, dense vectors; captures semantic similarity</td>
        <td>Requires pre-training on a large text corpus</td>
    </tr>
    <tr>
        <td>Contextualized Word Embeddings (e.g., BERT, ELMo)</td>
        <td>Represents each word occurrence as a dense vector that depends on its surrounding context</td>
        <td>Captures context-dependent meanings; adaptable to various NLP tasks</td>
        <td>Requires pre-training on a large text corpus; computationally expensive</td>
    </tr>
    <tr>
        <td rowspan="3">Sentence</td>
        <td>Averaged Word Embeddings</td>
        <td>Represents sentences as the average of their word embeddings</td>
        <td>Simple and fast; captures some semantic information</td>
        <td>Loses word order information; may not capture complex sentence structure</td>
    </tr>
    <tr>
        <td>Contextualized Embeddings (e.g., BERT, GPT)</td>
        <td>Represents sentences using contextualized token embeddings (e.g., pooled into a single vector), which capture each word's meaning within its context</td>
        <td>Captures word order and semantic information; adaptable to various NLP tasks</td>
        <td>Requires pre-training on a large text corpus; computationally expensive</td>
    </tr>
    <tr>
        <td>Parse Tree</td>
        <td>Represents sentences as trees that capture their syntactic structure</td>
        <td>Captures word order and syntactic relationships between words</td>
        <td>May not capture semantic information; parsing can be computationally expensive</td>
    </tr>
    <tr>
        <td rowspan="3">Document</td>
        <td>Bag-of-Words (BoW)</td>
        <td>Represents documents as frequency vectors of words, disregarding word order</td>
        <td>Simple and easy to implement; works well for some tasks</td>
        <td>High-dimensional, sparse vectors; loses word order and grammar information</td>
    </tr>
    <tr>
        <td>TF-IDF</td>
        <td>Represents documents as weighted frequency vectors that emphasize important and discriminative words</td>
        <td>Often improves on BoW in classification and clustering tasks by down-weighting words that are common across documents</td>
        <td>High-dimensional, sparse vectors; loses word order and grammar information</td>
    </tr>
    <tr>
        <td>Doc2Vec</td>
        <td>Represents documents as dense vectors that capture semantic information and relationships between words</td>
        <td>Lower-dimensional, dense vectors; captures semantic information</td>
        <td>Requires pre-training on a large text corpus; may not capture complex document structure</td>
    </tr>
</table>
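
A few rows of the table can be made concrete with a short sketch in plain Python. It shows one-hot encoding, averaged word embeddings (including the loss of word order), and TF-IDF weighting. The function names, the two-dimensional `toy_embeddings`, and the unsmoothed `tf * log(N / df)` formula are assumptions chosen for illustration. Real embeddings come from a trained model, and libraries commonly use smoothed or normalized TF-IDF variants.

```python
import math
from collections import Counter

def one_hot(word, vocab):
    # Binary vector with a single 1 at the word's index.
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

def average_embedding(tokens, embeddings):
    # Sentence vector = element-wise mean of the known word vectors.
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def tf_idf(docs):
    # docs: list of non-empty token lists.
    # tf = count / document length, idf = log(N / document frequency); no smoothing.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vocab = sorted(df)
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] / len(doc) * math.log(n / df[t]) for t in vocab])
    return vocab, rows

# One-hot: sparse, and no two distinct words are ever "close" to each other.
word_index = {"dog": 0, "bites": 1, "man": 2}
print(one_hot("bites", word_index))  # [0, 1, 0]

# Averaging: hypothetical toy vectors; both word orders give the same result.
toy_embeddings = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [1.0, 1.0]}
a = average_embedding(["dog", "bites", "man"], toy_embeddings)
b = average_embedding(["man", "bites", "dog"], toy_embeddings)
print(a == b)  # True

# TF-IDF: "the" occurs in every document, so its weight drops to zero.
vocab, rows = tf_idf([["the", "cat", "sat"], ["the", "dog", "barked"]])
print(rows[0][vocab.index("the")], rows[0][vocab.index("cat")] > 0)  # 0.0 True
```

The averaging example illustrates the "loses word order" entry in the table: "dog bites man" and "man bites dog" collapse to the same vector. The TF-IDF example shows why a word shared by all documents contributes nothing to telling them apart under this weighting.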