In [None]:
import numpy as np
import os
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import HTML
def css_styling():
    # Load the custom stylesheet that styles the heading/content blocks below.
    with open("../input/titlestyle/style2.css", "r") as f:
        styles = f.read()
    return HTML(styles)
css_styling()

<div class="heading">
   <h1><span style="color: white">Intro</span></h1>
</div>
<div class="content">

<u>📔 Public notebooks - 942+</u><br>

<u>🥇 Gold medals - 56+</u><br>

<u>🥈 Silver medals - 64+</u><br>

<u>🥉 Bronze medals - 235+</u><br>
    
<u>1st place solution - https://www.kaggle.com/c/commonlitreadabilityprize/discussion/257844</u><br>
    
<u>2nd place solution - https://www.kaggle.com/c/commonlitreadabilityprize/discussion/258328</u><br>
    
<u>3rd place solution - https://www.kaggle.com/c/commonlitreadabilityprize/discussion/258095</u><br>


Wordcloud made of notebook titles:
<img src="https://i.imgur.com/FDr7zla.png" alt="img1"/>
</div>

<div style = "font-family: Arial;font-size:1.6em;color: #0a6121;background: #ace6bc;padding:5px;border-style: solid;border-color:#0a6121;">
<b>Summa summarum: Transformers, ensemble methods and meta-labeling</b> 
</div>

<div class="heading">
   <h1><span style="color: white">Goal</span></h1>
</div>
<div class='content'>
    <h3><b>📖 Build algorithms to rate the complexity of reading passages for grade 3-12 classroom use.</b></h3>
    <h3><b>📖 The complexity rating is a number between -3.67 and 1.71 that results from a Bradley-Terry analysis of more than 111,000 pairwise comparisons. The task is very similar to simple sentiment prediction, except that the target covers a different range.</b></h3>
</div>
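
To give a feel for what a Bradley-Terry score means, here is a minimal sketch. It assumes the standard Bradley-Terry form, in which the probability that one item is preferred over another depends only on the difference of their scores. Whether the competition targets sit exactly on this log-strength scale is an assumption, and the function name `bt_preference_probability` is made up for illustration.

```python
import math

def bt_preference_probability(score_a, score_b):
    """Probability that item A is preferred over item B in one comparison,
    under the standard Bradley-Terry model with log-strength scores
    (assumed form, not taken from the competition description)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

# Equal scores give a coin flip.
print(bt_preference_probability(0.0, 0.0))

# The two extremes of the target range quoted above.
p_extremes = bt_preference_probability(1.71, -3.67)
print(round(p_extremes, 4))
```

Under these assumptions, the gap between the two extremes of the target range corresponds to a comparison that almost always goes the same way, while excerpts with close scores are hard to tell apart.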

In [None]:
train_df = pd.read_csv("../input/commonlitreadabilityprize/train.csv")
print('Sample with minimum target value ({})'.format(train_df['target'].min()))
print(train_df[train_df['target'] == train_df['target'].min()]['excerpt'].values)
print('_'*100)
print('Sample with maximum target value ({})'.format(train_df['target'].max()))
print(train_df[train_df['target'] == train_df['target'].max()]['excerpt'].values)

In [None]:
train_df.head()

<div class="heading">
   <h1><span style="color: white">Transformers</span></h1>
</div>
<div class='content'>
    
A <b>transformer</b> is a deep learning model built on the attention mechanism, which weighs the significance of each part of the input data differently. It is used primarily in the field of natural language processing (NLP). The most popular transformer-based model is <b>BERT (Bidirectional Encoder Representations from Transformers)</b>.
<br><br>
🔥<b>Comprehensive notebook related to BERT and Transformer models</b> with different ways of utilizing layers and outputs, finetuning stability, LIT (Language Interpretability Tool), speeding up transformers, etc. This notebook also provides many content-related references. <a href="https://www.kaggle.com/rhtsingh/utilizing-transformer-representations-efficiently">https://www.kaggle.com/rhtsingh/utilizing-transformer-representations-efficiently</a>
<br><br>
🔥<b>BERT, theoretical explanation from scratch</b> <a href="https://www.kaggle.com/mdfahimreshm/bert-in-depth-understanding">https://www.kaggle.com/mdfahimreshm/bert-in-depth-understanding</a>
<br><br>
🔥<b>Ideas to improve BERT model</b> <a href="https://www.kaggle.com/c/commonlitreadabilityprize/discussion/241029">https://www.kaggle.com/c/commonlitreadabilityprize/discussion/241029</a>
<br><br>
A simpler theoretical explanation of the BERT model, with an example using Keras <a href="https://www.kaggle.com/krishna1997gopal/understand-bert-in-depth-theory-implementation">https://www.kaggle.com/krishna1997gopal/understand-bert-in-depth-theory-implementation</a>
<br><br>
One modification of the BERT model is <b>RoBERTa</b>. It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining objective and training with much larger mini-batches and learning rates. A RoBERTa base solution in PyTorch <a href="https://www.kaggle.com/andretugan/lightweight-roberta-solution-in-pytorch">https://www.kaggle.com/andretugan/lightweight-roberta-solution-in-pytorch</a>
<br><br>
<b>RoBERTa with TF</b> implementation can be found here <a href="https://www.kaggle.com/dimitreoliveira/commonlit-readability-eda-roberta-tf-baseline">https://www.kaggle.com/dimitreoliveira/commonlit-readability-eda-roberta-tf-baseline</a>
<br><br>
Similarly, a user-friendly implementation of an <b>ensemble with BERT, RoBERTa and distilBERT</b> <a href="https://www.kaggle.com/eneszvo/bert-roberta-distilbert-ensemble-5-fold-cv">https://www.kaggle.com/eneszvo/bert-roberta-distilbert-ensemble-5-fold-cv</a>
<br><br>
<b>PyTorch BERT</b> step by step for beginners <a href="https://www.kaggle.com/chumajin/pytorch-bert-beginner-s-room">https://www.kaggle.com/chumajin/pytorch-bert-beginner-s-room</a>
<br><br>
A user-friendly <b>BERT model with Keras</b>, as well as many other models <a href="https://www.kaggle.com/donmarch14/commonlit-detailed-guide-to-learn-nlp">https://www.kaggle.com/donmarch14/commonlit-detailed-guide-to-learn-nlp</a>
<br><br>
One interesting word-embedding method that was popular before transformers and BERT is <b>GloVe</b>. It focuses on word co-occurrences over the whole corpus, and its embeddings relate to the probability that two words appear together. <b>GloVe with LSTM</b> is explained here <a href="https://www.kaggle.com/andreshg/commonlit-a-complete-analysis">https://www.kaggle.com/andreshg/commonlit-a-complete-analysis</a>
</div>
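
The attention mechanism mentioned above can be sketched in a few lines of NumPy. This is a simplified, single-head scaled dot-product attention on random toy embeddings. A real transformer such as BERT additionally applies learned projections to obtain queries, keys and values, and uses several heads and layers, all of which are left out here. The function and variable names are illustrative only.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head attention: each query row becomes a weighted mix of value rows."""
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(d_k)
    # Row-wise softmax (subtract the row max for numerical stability).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy input: 4 tokens with 8-dimensional embeddings (random, for illustration).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))

# Self-attention: queries, keys and values all come from the same tokens.
output, weights = scaled_dot_product_attention(tokens, tokens, tokens)
print(weights.round(2))  # each row sums to 1: how much a token attends to the others
print(output.shape)
```

Each row of `weights` is the "differential weighing" of the input: it says how much every other token contributes to the new representation of one token.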

<div class="heading">
   <h1><span style="color: white">Non neural network approach</span></h1>
</div>
<div class='content'>

<b>Feature engineering</b> is the process of extracting features from raw data for use in a machine learning model. These features expose relationships between the text and the target that the model can learn from. Using the `readability` library we can extract 24 traditional features such as <b>words per sentence and the Flesch or Kincaid readability scores.</b> In addition, a way of creating more than <b>300 diverse features from text</b> using the `spacy` library, together with a Ridge regression model, is presented here <a href="https://www.kaggle.com/ravishah1/readability-feature-engineering-non-nn-baseline">https://www.kaggle.com/ravishah1/readability-feature-engineering-non-nn-baseline</a>
<br><br>
XGBRFRegressor with feature importance can be found here <a href="https://www.kaggle.com/andradaolteanu/i-commonlit-explore-xgbrf-repeatedfold-model">https://www.kaggle.com/andradaolteanu/i-commonlit-explore-xgbrf-repeatedfold-model</a>
<br><br>
One way of converting a collection of text documents to a matrix of token counts is <b>CountVectorizer</b>. It builds a single vocabulary over the whole corpus and counts how many times each vocabulary word appears in each document.
<img src="https://i.imgur.com/sFToffN.png" alt="img1"/>
A slightly more complex representation is TF-IDF (term frequency-inverse document frequency), which down-weights words that appear in many documents.
<img src="https://i.imgur.com/72ZeW1X.png" alt="img1"/>
Both approaches are explained here <a href="https://www.kaggle.com/andreshg/commonlit-a-complete-analysis">https://www.kaggle.com/andreshg/commonlit-a-complete-analysis</a>

Very simple <b>Ridge</b> model with spacy features <a href="https://www.kaggle.com/konradb/linear-baseline-with-cv">https://www.kaggle.com/konradb/linear-baseline-with-cv</a>
</div>
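
The two bag-of-words representations described above can be tried on a toy corpus with scikit-learn. This is a minimal sketch with default settings, and the two example sentences are made up. Note that scikit-learn's default tokenizer lowercases the text and drops single-character tokens.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus (made up for illustration).
docs = ["the cat sat", "the cat sat on the mat"]

# CountVectorizer: one column per vocabulary word, one row per document.
count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs).toarray()
# Recover the column order from the fitted vocabulary (word -> column index).
vocab = sorted(count_vec.vocabulary_, key=count_vec.vocabulary_.get)
print(vocab)
print(counts)

# TfidfVectorizer: same columns, but counts are reweighted so that words
# occurring in every document contribute less than rarer words.
tfidf_vec = TfidfVectorizer()
tfidf = tfidf_vec.fit_transform(docs).toarray()
print(tfidf.round(2))
```

In the second document, "mat" occurs as often as "cat", but it receives a larger TF-IDF weight because it appears in only one of the two documents. Either matrix can then be passed to a linear model such as Ridge.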

<div class="heading">
   <h1><span style="color: white">Text preprocessing</span></h1>
</div>
<div class='content'>
    
Text preprocessing is the first step in Natural Language Processing (NLP). It typically includes removing stop words, punctuation, and numbers from the text. The primary goal is to remove content that does not carry any semantic meaning, which can reduce processing time and make the later steps more accurate.
<br><br>
<b>Stopword removal</b> removes a list of common words (e.g., "a", "an", "the") that are not useful for building any meaningful relationships between different input sentences or phrases.
<b>Lemmatizing</b> is the process of reducing words to their base forms, which are more easily processed by computers. For example, the words "writing", "written" and "writes" might all be assigned the lemma "write."
In order to analyze textual content, we can observe the most common <b>unigrams</b> (single words), <b>bigrams</b> (groups of two words), <b>trigrams</b> (groups of three words), etc.
<br><br>
Text <b>part-of-speech tagging</b> is the process by which a machine assigns parts of speech to words according to linguistic rules and patterns. The most common POS tags are Noun, Verb, Adjective, Adverb, Preposition, Conjunction, Interjection.
<br><br>
EDA, stopword removal, lemmatizing and part-of-speech tagging are presented here <a href="https://www.kaggle.com/ruchi798/commonlit-readability-prize-eda-baseline">https://www.kaggle.com/ruchi798/commonlit-readability-prize-eda-baseline</a>
</div>
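
Stopword removal and n-gram counting can be illustrated without any NLP library. The sketch below uses a tiny hand-written stopword list and a made-up sentence, both purely for illustration. In practice a library such as NLTK or spaCy would supply the stopword list, and lemmatizing needs such a library as well, so it is not shown here.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "the", "on", "and", "of"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (drops punctuation and numbers)."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def ngrams(tokens, n):
    """All runs of n consecutive tokens: n=1 unigrams, n=2 bigrams, n=3 trigrams."""
    return list(zip(*(tokens[i:] for i in range(n))))

text = "The cat sat on the mat and the dog sat on the rug."
tokens = tokenize(text)
content = remove_stopwords(tokens)

print(content)
print(Counter(ngrams(tokens, 2)).most_common(2))  # most frequent bigrams
print(ngrams(content, 3)[:2])                     # first two trigrams after stopword removal
```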

<div class="heading">
   <h1><span style="color: white">Other useful notebooks</span></h1>
</div>
<div class='content'>
    
Not directly related to this competition but interesting: a step-by-step implementation of the <b>n-gram</b> approach, where the idea is to predict the next word given a sequence of words <a href="https://www.kaggle.com/alincijov/nlp-starter-logsoftmax-nlloss-cross-entropy">https://www.kaggle.com/alincijov/nlp-starter-logsoftmax-nlloss-cross-entropy</a>
<br><br>
<b>PyTorch data samplers</b> and how to control batches <a href="https://www.kaggle.com/shahules/guide-pytorch-data-samplers-sequence-bucketing">https://www.kaggle.com/shahules/guide-pytorch-data-samplers-sequence-bucketing</a>
<br><br>
Comprehensive notebook with <b>data cleaning</b> techniques <a href="https://www.kaggle.com/mpwolke/dataprep-clean-literature">https://www.kaggle.com/mpwolke/dataprep-clean-literature</a>
</div>