# FINAL PROJECT 2024
The goal of this project is to implement trending topic detection algorithms in X messages, as proposed in the article "Sensing Trending Topics in Twitter" by L. M. Aiello et al., IEEE Transactions on Multimedia, vol. 15, no. 6, pp. 1268-1282, Oct. 2013, doi: 10.1109/TMM.2013.2265080.

•	The suggested Databricks runtime version is 12.2 LTS ML (Scala 2.12, Spark 3.3.2) instead of the Standard version.

•	To use the SparkNLP library in Databricks, install it by following the instructions on this webpage:

https://sparknlp.org/docs/en/install#databricks-support


In [0]:
from pyspark.sql.functions import col, udf, explode, from_json
from pyspark.sql.types import *
import sparknlp
import json
import urllib.request
import nltk

from sparknlp.base import DocumentAssembler, Pipeline
from sparknlp.annotator import (
    LanguageDetectorDL
)
import pyspark.sql.functions as F

### Function to retrieve data
The following function retrieves the data (tweets) for a given time range.

You must specify the start date (year, month, day, hour, minute) and the length, in minutes, of the period to retrieve.

The function builds the URL for the given parameters and then makes the request. Ranges longer than 30 minutes are split into several requests.
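The chunking and URL construction described above can be sketched in isolation, without Spark or network access. This is only an illustrative sketch: the helper names `split_into_chunks` and `build_url` are hypothetical, and the 30-minute limit per request is taken from the `MAX_MINUTES` constant used in this notebook. It uses `total_seconds()` so that ranges longer than one day are still covered correctly.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

BASE_URL = "http://mcomputing.tsc.uc3m.es/get_tweets.php"  # endpoint used in this notebook
MAX_MINUTES = 30  # per-request limit, same value as in get_tweets


def split_into_chunks(start_time, minutes_length, max_minutes=MAX_MINUTES):
    """Return (chunk_start, chunk_minutes) pairs covering the requested range.

    Hypothetical helper: assumes minutes_length is a positive integer.
    """
    end_time = start_time + timedelta(minutes=minutes_length)
    chunks = []
    current = start_time
    while current < end_time:
        # total_seconds() includes whole days, unlike timedelta.seconds
        remaining = int((end_time - current).total_seconds() // 60)
        step = min(max_minutes, remaining)
        chunks.append((current, step))
        current += timedelta(minutes=step)
    return chunks


def build_url(chunk_start, chunk_minutes):
    """Build the request URL for one chunk, zero-padding the date fields."""
    params = {
        "start_year": chunk_start.year,
        "start_month": f"{chunk_start.month:02d}",
        "start_day": f"{chunk_start.day:02d}",
        "start_hour": f"{chunk_start.hour:02d}",
        "start_minute": f"{chunk_start.minute:02d}",
        "minutes_length": chunk_minutes,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Example: a 70-minute range is split into three requests
for chunk_start, chunk_minutes in split_into_chunks(datetime(2019, 8, 1, 2, 0), 70):
    print(build_url(chunk_start, chunk_minutes))
```

Keeping the splitting logic separate from the download makes it possible to check the request boundaries before fetching any data.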




In [0]:
from pyspark.sql import SparkSession
from datetime import datetime, timedelta
import urllib.request
import os

def get_tweets(start_year=2019, start_month=8, start_day=1, start_hour=2, start_minute=0, minutes_length=5):
    """
    Fetch tweets from the given URL based on the specified parameters and return a filtered DataFrame.

    Parameters:
        start_year (int): The starting year for fetching tweets.
        start_month (int): The starting month for fetching tweets.
        start_day (int): The starting day for fetching tweets.
        start_hour (int): The starting hour for fetching tweets.
        start_minute (int): The starting minute for fetching tweets.
        minutes_length (int): The total duration in minutes to fetch tweets.

    Returns:
        pyspark.sql.DataFrame: Filtered DataFrame with selected tweet fields.
    """
    # Initialize SparkSession
    spark = SparkSession.builder.appName("GetTweets").getOrCreate()
    
    # Define the maximum duration for a single request
    MAX_MINUTES = 30

    # Start time
    start_time = datetime(start_year, start_month, start_day, start_hour, start_minute)
    end_time = start_time + timedelta(minutes=minutes_length)

    # Initialize a list to collect data from all chunks
    all_data = []

    # Loop through the time intervals in chunks of MAX_MINUTES
    current_time = start_time
    while current_time < end_time:
        # Calculate the duration for this request (total_seconds() also counts whole days)
        request_minutes = min(MAX_MINUTES, int((end_time - current_time).total_seconds() // 60))
        
        # Construct the URL for the current chunk
        datafile = (
            f"http://mcomputing.tsc.uc3m.es/get_tweets.php?start_year={current_time.year}"
            f"&start_month={current_time.month:02d}&start_day={current_time.day:02d}"
            f"&start_hour={current_time.hour:02d}&start_minute={current_time.minute:02d}"
            f"&minutes_length={request_minutes}"
        )
        
        try:
            # Download the JSON data for the current chunk
            with urllib.request.urlopen(datafile) as url:
                data = url.read().decode()
                all_data.append(data)

        except Exception as e:
            print(f"Failed to fetch data for interval starting at {current_time}: {e}")

        # Increment the current time
        current_time += timedelta(minutes=request_minutes)

    # If no data was fetched, display a message and return
    if not all_data:
        print("No data was fetched for the given time range.")
        return None

    # Save all data into a single temporary file
    temp_file_path = "/tmp/temporal.json"
    with open(temp_file_path, "w") as f:
        for chunk in all_data:
            f.write(chunk + "\n")  # Write each chunk on a new line

    # Read the combined JSON file into a Spark DataFrame
    tweet_df = spark.read.json(f"file://{temp_file_path}")
    
    # Filter and select the desired fields
    filtered_df = tweet_df.select("created_at", "user.id", "user.name", "user.screen_name", "user.lang", "text")

    return filtered_df



In [0]:
tweets = get_tweets()

In [0]:
tweets.count()

Out[10]: 11206

In [0]:
tweets.columns

Out[11]: ['created_at', 'id', 'name', 'screen_name', 'lang', 'text']

In [0]:
display(tweets.select("created_at","id","text").head(10))

created_at,id,text
Thu Aug 01 08:00:00 +0000 2019,2506848924.0,Чому не варто терти очі кулаками: думка експертів
Thu Aug 01 08:00:00 +0000 2019,1.110566818576388e+18,kamala harris fucked up bgt parah
Thu Aug 01 08:00:00 +0000 2019,7.810562051793265e+17,RT @mgmgnet: 朝早くドトールに行ったら白髪のおじさまが「このタピ…ってやつはあるかい？」「俺みたいなんが飲んだら喉につまらせるかね？」「楽しみだなぁ、テレビで見て飲んでみたかったんだ」とニコニコしていて大変よかった　ドトールでタピオカする意味ある
Thu Aug 01 08:00:00 +0000 2019,1.132529419099095e+18,RT @niwaniwa_28ko: ポコに「ちょうだい」を教えてみたよ♪(๑’ᵕ’๑)♪ #ニワトリは飼うと可愛い https://t.co/Dbh5X9Ixd7
Thu Aug 01 08:00:00 +0000 2019,1.1068552395780874e+18,@elementslover 乗ります！
Thu Aug 01 08:00:00 +0000 2019,613459404.0,2019-08-01T07:59:15.000Z1156836779833483264
Thu Aug 01 08:00:00 +0000 2019,1388797110.0,RT @loopsjongin: https://t.co/uzS31kV1mS
,,
Thu Aug 01 08:00:00 +0000 2019,524872493.0,RT @lament1ess: 手取り23万、食費4万。僕はただ働き続け、気付けば日々弾力を失っていく心がひたすらつらかった。 そしてある朝、かつてあれほどまでに真剣で切実だった思いがきれいに失われていることに僕は気づき、もう限界だと知った時、会社を辞めた。 https://…
Thu Aug 01 08:00:00 +0000 2019,1867085616.0,@pcyegan wkwkwk sedi bre kita kepisah


### Function performing initial data cleaning
Here, we clean the tweets by converting timestamps and removing nulls and unwanted characters.

First, we convert the created_at column into a proper timestamp format.
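As a quick way to check the date format outside Spark, the same `created_at` layout can be parsed with the Python standard library. This is a minimal sketch: `parse_created_at` is a hypothetical helper, the `strptime` pattern is the Python counterpart of the Spark pattern used in this notebook, and it assumes the default C/English locale for day and month names.

```python
from datetime import datetime, timezone

# Python equivalent of the Spark pattern "EEE MMM dd HH:mm:ss Z yyyy"
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(raw):
    """Parse a created_at string; return None when it is missing or malformed."""
    if not raw:
        return None
    try:
        return datetime.strptime(raw, TWITTER_FORMAT)
    except ValueError:
        return None


print(parse_created_at("Thu Aug 01 08:00:00 +0000 2019"))
```

Returning `None` for unparseable values mimics rows whose timestamp cannot be converted, which are then discarded by the null filter.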

Then we remove rows with null values in the relevant columns (created_at, id, and text).

For further refinement, we remove every character that is not alphanumeric, an underscore, or whitespace (@ and # are also kept if special_characters is set to True). This removes punctuation such as ! and , as well as symbols belonging to other alphabets. We also convert the text to lowercase. Stripping @ and # does not change the results much: #love and @mike32 simply become love and mike32, which still show up as relevant if they are repeated often.

After applying this filter, any rows that are left empty are removed.
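The effect of the character filter can be reproduced on single strings with Python's `re` module. This is a sketch under one stated assumption: Spark's `regexp_replace` uses Java regular expressions, where `\w` matches only ASCII letters, digits, and underscore by default, so `re.ASCII` is passed here to approximate that behavior. The helper names `clean_text` and `is_non_empty` are hypothetical.

```python
import re


def clean_text(text, special_characters=False):
    """Strip non-word characters and lowercase the text.

    re.ASCII restricts \\w to [a-zA-Z0-9_], approximating the Java regex
    default used by Spark (an assumption of this sketch).
    """
    pattern = r"[^\w\s@#]" if special_characters else r"[^\w\s]"
    cleaned = re.sub(pattern, "", text, flags=re.ASCII)
    return cleaned.lower()


def is_non_empty(text):
    """True when the text still contains a non-whitespace character."""
    return re.search(r"\S", text) is not None


sample = "RT @mike32: I #love this!!"
print(clean_text(sample))                           # mentions and hashtags lose @ and #
print(clean_text(sample, special_characters=True))  # @ and # are kept
print(is_non_empty(clean_text("乗ります！")))         # non-ASCII text is emptied
```

This also explains why tweets written entirely in non-Latin scripts end up empty and are dropped in the last step.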

In [0]:
from pyspark.sql import functions as F

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY") # important for the time column conversion


def inital_cleaning_prep(tweets, special_characters=False, column_name='text'):
    """
    Performs initial cleaning on a DataFrame of tweets:
    - Converts `created_at` to a timestamp.
    - Removes rows with null values in `created_at`, `id`, or `text`.
    - Removes symbols and special characters from the `text` column.
    - Converts the `text` column to lowercase.
    - Removes rows where the `text` column is empty after cleaning.

    Parameters:
        tweets (DataFrame): The input PySpark DataFrame.
        special_characters (bool): If True, keep `@` and `#` in the text (default is False).
        column_name (str): The name of the column to clean (default is 'text').

    Returns:
        DataFrame: The cleaned PySpark DataFrame.
    """
    # Step 1: Convert `created_at` to a timestamp
    tweets = tweets.withColumn(
        "created_at",
        F.to_timestamp("created_at", "EEE MMM dd HH:mm:ss Z yyyy")  # Legacy version
    )
    
    # Step 2: Remove rows with null values in `created_at`, `id`, or `text`
    cleaned_tweets = tweets.dropna(subset=["created_at", "id", column_name])

    # Step 3.1: Remove symbols and special characters from the `text` column
    if special_characters:
        cleaned_tweets = cleaned_tweets.withColumn(
            column_name,
            F.regexp_replace(column_name, r"[^\w\s@#]", "")  # Remove all non-word or number characters except whitespace, `@` and `#` 
        )
    else:
        cleaned_tweets = cleaned_tweets.withColumn(
            column_name,
            F.regexp_replace(column_name, r"[^\w\s]", "")  # Remove all non-word or number characters except whitespace
        )

    # Step 3.2: Convert the `text` column to lowercase
    cleaned_tweets = cleaned_tweets.withColumn(
        column_name,
        F.lower(F.col(column_name))  # Convert to lowercase
    )

    # Step 4: Remove rows where the `text` column is empty after cleaning
    cleaned_tweets = cleaned_tweets.filter(F.col(column_name).rlike(r"\S"))  # Keep rows where `text` has non-whitespace characters

    return cleaned_tweets



In [0]:
tweets = inital_cleaning_prep(tweets)

In [0]:
display(tweets.select("created_at","id","text").head(10))

created_at,id,text
2019-08-01T08:00:00.000+0000,1110566818576388096,kamala harris fucked up bgt parah
2019-08-01T08:00:00.000+0000,781056205179326464,rt mgmgnet
2019-08-01T08:00:00.000+0000,1132529419099095040,rt niwaniwa_28ko httpstcodbh5x9ixd7
2019-08-01T08:00:00.000+0000,1106855239578087424,elementslover
2019-08-01T08:00:00.000+0000,613459404,20190801t075915000z1156836779833483264
2019-08-01T08:00:00.000+0000,1388797110,rt loopsjongin httpstcouzs31kv1ms
2019-08-01T08:00:00.000+0000,524872493,rt lament1ess 234  https
2019-08-01T08:00:00.000+0000,1867085616,pcyegan wkwkwk sedi bre kita kepisah
2019-08-01T08:00:00.000+0000,1964776824,rt onthecccccc httpstcosylbr1tm7d
2019-08-01T08:00:00.000+0000,1121798073808613376,rt jnkchanels this is so sad httpstcooqr6apodz1


In [0]:
tweets.count()

Out[19]: 9311

### Function to filter English tweets
With this function we keep only the tweets written in English, as in the original example provided to us.

In [0]:
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import LanguageDetectorDL
from pyspark.ml import Pipeline

def filter_english_tweets(filteredDF):
    """
    Filters English tweets using Spark NLP's LanguageDetectorDL.

    Args:
    - filteredDF: Spark DataFrame with a `text` column.

    Returns:
    - DataFrame: Filtered DataFrame containing only English tweets with required columns.
    """
    # Step 1: Language Detection Pipeline
    document_assembler = DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")

    language_detector = LanguageDetectorDL.pretrained() \
        .setInputCols("document") \
        .setOutputCol("language")

    nlp_pipeline = Pipeline(stages=[document_assembler, language_detector])
    detected_languages = nlp_pipeline.fit(filteredDF).transform(filteredDF)

    # Step 2: Filter for English Language
    def is_english(language_data):
        # Guard against rows with no detected language
        return language_data is not None and 'en' in language_data

    udf_is_english = udf(is_english, BooleanType())

    english_tweets = detected_languages.filter(
        udf_is_english("language.result")
    ).select("created_at", "text")  # Select only relevant columns

    return english_tweets



In [0]:
tweets = filter_english_tweets(tweets)

ld_wiki_tatoeba_cnn_21 download started this may take some time.
Approximate size to download 7.1 MB
[ | ][OK!]


In [0]:
display(tweets.select("created_at","text").head(10))

created_at,text
2019-08-01T08:00:00.000+0000,kamala harris fucked up bgt parah
2019-08-01T08:00:00.000+0000,rt jnkchanels this is so sad httpstcooqr6apodz1
2019-08-01T08:00:00.000+0000,auceiram ako sana kaya lang walang pera wait hanap buhay muna ako
2019-08-01T08:00:00.000+0000,rt sam56786260 zitudiary amen
2019-08-01T08:00:00.000+0000,nerf everywhere
2019-08-01T08:00:00.000+0000,01082019 085801 httpstcombaecjem9y blastfmcountry buck johnson country rockin amp reelin
2019-08-01T08:00:00.000+0000,mayazzam_ almasryalyoum
2019-08-01T08:00:00.000+0000,rt hotladysite you will simply choose a woman and ask to fuck httpstco0g4uladojt httpstcozw8ktl8jkf
2019-08-01T08:00:00.000+0000,thank goodness for waterproof mattress covers
2019-08-01T08:00:00.000+0000,predicar la unitat s incompatible amb dedicar retrets constants als socis que noms faran descarrilar la respos httpstcook3gh2rwff


In [0]:
tweets.count()

Out[23]: 3587

## Data preparation - BNGRAM

### Data preprocessing for BNgram

Now that the first steps are complete (retrieving the data, the initial cleaning of null or empty rows, and the language filter), we perform further data preparation before applying the topic detection algorithm.

Since we have chosen to apply BNgram, these steps have been refined for this specific case.

These steps include:

- **Tokenization** - splits the text into tokens and removes stop words; punctuation and hyphens were already stripped during the initial cleaning.

- **Temporal aggregation** - groups the documents with the clean text (processed tokens) into super-documents based on a time window.

Afterwards, empty rows are removed.
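The temporal aggregation step can be illustrated with plain Python on a handful of documents. This is a simplified sketch, not the Spark implementation used in this notebook: `window_start` and `aggregate_by_window` are hypothetical helpers, timestamps are naive datetimes, and windows are aligned to the Unix epoch, which is also how Spark's `window` function aligns tumbling windows by default.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def window_start(ts, window_minutes=1):
    """Floor a timestamp to the start of its fixed-size window."""
    width = timedelta(minutes=window_minutes)
    return EPOCH + ((ts - EPOCH) // width) * width


def aggregate_by_window(docs, window_minutes=1):
    """Group (timestamp, tokens) pairs into super-documents per time window.

    Documents whose token list is empty are skipped.
    """
    windows = defaultdict(list)
    for ts, tokens in docs:
        if tokens:
            windows[window_start(ts, window_minutes)].append(tokens)
    return dict(windows)


docs = [
    (datetime(2019, 8, 1, 8, 0, 5), ["kamala", "harris"]),
    (datetime(2019, 8, 1, 8, 0, 40), ["nerf", "everywhere"]),
    (datetime(2019, 8, 1, 8, 1, 10), []),
    (datetime(2019, 8, 1, 8, 1, 30), ["looks", "cool"]),
]
for start, super_document in sorted(aggregate_by_window(docs).items()):
    print(start, super_document)
```

Each value in the resulting dictionary plays the role of one super-document: the list of token lists of all tweets that fall into that window.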

One aspect of this process is specific to **BNgram**. To improve its performance, **NER** (Named Entity Recognition) can be applied. NER extracts the words that represent known entities, such as names or places, so that they can later be used when scoring the n-grams (groups of words): n-grams recognized as entities receive more weight, which helps identify topics. Because NER increases the running time considerably, we let the user choose whether to apply it.
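To make the role of the entities concrete, the sketch below shows one way an entity boost can enter an n-gram score. It assumes the df-idf_t score described in the cited article, with a boost factor of 1.5 for n-grams that contain a named entity. Both the formula and the boost value are assumptions of this sketch, not taken from this notebook, and should be checked against the paper. The function name `df_idf_t` and its parameters are hypothetical.

```python
import math

ENTITY_BOOST = 1.5  # assumed boost for n-grams containing a named entity


def df_idf_t(df_current, past_dfs, contains_entity=False, boost=ENTITY_BOOST):
    """Score an n-gram by its current document frequency relative to its past.

    df_current: number of documents containing the n-gram in the current window.
    past_dfs:   document frequencies of the n-gram in the previous windows.
    """
    t = len(past_dfs)
    avg_past = sum(past_dfs) / t if t else 0.0
    score = (df_current + 1) / (math.log(avg_past + 1) + 1)
    # N-grams that contain a named entity get a higher weight
    return score * boost if contains_entity else score


# An n-gram that was absent from the past windows and suddenly appears
print(df_idf_t(9, [0, 0]))
print(df_idf_t(9, [0, 0], contains_entity=True))
# The same frequency is less remarkable if the n-gram was already common
print(df_idf_t(9, [9, 9]))
```

The score grows with the current frequency and shrinks when the n-gram was already frequent in earlier windows, so bursty n-grams, and especially bursty entities, rank first.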

We had also implemented stemming, but later removed it because, as stated in the paper, it worsens the performance of BNgram.



### Step by step

We will go through the steps, showing the intermediate results of each part of the preprocessing and the algorithm.


#### NER

We extract named entities from the tweets using a pretrained NER pipeline, if requested by the user.

As noted above, this step takes some time to run when enabled.

In [0]:
from pyspark.sql import Row
from pyspark.sql.functions import col, window, collect_list, lower, size, expr
from sparknlp.annotator import Tokenizer, StopWordsCleaner
from sparknlp.pretrained import PretrainedPipeline

# NER part separate - given that it is a heavy process

def extract_entities(input_df):
    """
    Apply a pretrained NER pipeline to extract entities from text data.

    Parameters:
    - input_df: Spark DataFrame with columns ["created_at", "text"].

    Returns:
    - enriched_df: Spark DataFrame with additional "entities" column.
    """
    # Initialize the pretrained NER pipeline
    ner_pipeline = PretrainedPipeline("onto_recognize_entities_sm", lang="en")

    # Extract entities row by row locally
    enriched_rows = []
    for row in input_df.toLocalIterator():  # Process locally to avoid serialization issues
        ner_result = ner_pipeline.annotate(row["text"])
        enriched_rows.append(Row(created_at=row["created_at"], text=row["text"], entities=ner_result.get("entities", [])))

    # Convert enriched rows back to a DataFrame
    enriched_df = spark.createDataFrame(enriched_rows)

    return enriched_df



We prepare the text data for n-gram generation and temporal aggregation, following the steps described above.

In [0]:
def preprocess_for_bngram(input_df, time_window="1 minutes", apply_ner=True):
    """
    Preprocess data for the BNgram algorithm.

    Parameters:
    - input_df: Spark DataFrame containing the raw text data.
    - time_window: Time window for temporal aggregation (default: '1 minutes').
    - apply_ner: Whether to apply the NER step (default: True).

    Returns:
    - aggregated_data: Spark DataFrame with temporally aggregated text data.
    """
    # Apply NER if requested
    if apply_ner:
        input_df = extract_entities(input_df)

    # Step 1: Tokenization
    document_assembler = DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")

    tokenizer = Tokenizer() \
        .setInputCols(["document"]) \
        .setOutputCol("token")

    stopwords_cleaner = StopWordsCleaner() \
        .setInputCols(["token"]) \
        .setOutputCol("clean_tokens") \
        .setCaseSensitive(False)

    # Create text cleaning pipeline
    text_pipeline = Pipeline(stages=[document_assembler, tokenizer, stopwords_cleaner])
    processed_data = text_pipeline.fit(input_df).transform(input_df)

    # Filter out rows with empty clean_tokens
    non_empty_text = processed_data.filter(size(col("clean_tokens.result")) > 0)

    # Step 2: Temporal Aggregation
    if apply_ner:
        aggregated_data = non_empty_text.groupBy(window("created_at", time_window)) \
            .agg(collect_list("clean_tokens.result").alias("super_documents"),
                 collect_list("entities").alias("entities_list"))
    else:
        aggregated_data = non_empty_text.groupBy(window("created_at", time_window)) \
            .agg(collect_list("clean_tokens.result").alias("super_documents"))
    # Filter out empty super-documents
    aggregated_data = aggregated_data.filter(size(col("super_documents")) > 0)

    return aggregated_data




In [0]:
aggregated_tweets = preprocess_for_bngram(tweets, time_window='1 minutes', apply_ner=True)

onto_recognize_entities_sm download started this may take some time.
Approx size to download 159 MB
[ | ][OK!]


In [0]:
# display(aggregated_tweets.head(10)) # the super-documents produced by the temporal aggregation make this too slow to run

display(aggregated_tweets.head(1))

window,super_documents,entities_list
"List(2019-08-01T08:00:00.000+0000, 2019-08-01T08:01:00.000+0000)","List(List(kamala, harris, fucked, bgt, parah), List(rt, jnkchanels, sad, httpstcooqr6apodz1), List(auceiram, ako, sana, kaya, lang, walang, pera, wait, hanap, buhay, muna, ako), List(rt, sam56786260, zitudiary, amen), List(nerf, everywhere), List(01082019, 085801, httpstcombaecjem9y, blastfmcountry, buck, johnson, country, rockin, amp, reelin), List(mayazzam_, almasryalyoum), List(rt, hotladysite, simply, choose, woman, ask, fuck, httpstco0g4uladojt, httpstcozw8ktl8jkf), List(thank, goodness, waterproof, mattress, covers), List(predicar, la, unitat, incompatible, amb, dedicar, retrets, constants, als, socis, que, noms, faran, descarrilar, la, respos, httpstcook3gh2rwff), List(kane, wootten, said, personal, issues, time, incident, httpstco9qmjn7yrig), List(backlash, views, transgender, children, lopez, backtracks, httpstcox8x598ystw), List(httpstcoumtknu4afr, panther, dan, httpstcorftcbca6bl), List(trump, administration, says, set, system, allowing, americans, access, lowercost, prescription, drugs, c, httpstcob4pevdsnhd), List(2016, case, la, manada, spain, set, massive, protests, call, reform, legislation, two, recent, cases, httpstcowbkmyx1ikt), List(consumers, rank, cybersecurity, ahead, price, attractiveness, choosing, primary, retailer, httpstco0ooc2z8fus), List(looking, today, todays, worldwidewebday, httpstcol4n7qsxkv3), List(_, _, _, httpstcomugqt6wpps), List(futureofldns, making, londons, waterways, conference, took, place, 20th, june, 2019, crps, director, susann, httpstcowlvwmojgvb), List(httpstcoaukvtnxkbc, big, sis, takes, squirt, thecadencelux, lillillylitxxx, httpstco3vcgukjzeb), List(road, work, sr, 86ajo, way, reduced, one, lane, way, overnight, week, 9, pm, 5, i1, httpstcogrg7wfxgi0), List(r, xoxox1004, sbryytsyo, httpstcosfcmvctpgi), List(321eshop, 3d, floor, sticker, wall, painting, stone, steps, small, road, bathroom, httpstcooqq7i1xcea), List(use, computer, home, without, 
help, clicksilver, participant, find, httpstcoummlcxkxin, httpstco3bdnr7cyxv), List(countdown, begun, 3500, attendees, counting, book, seat, today, httpstcofwitohrt7x, httpstco5lozt6cue1), List(new, review, biblio_kitten, enjoyed, fascinating, characters, clever, suspenseful, plot, good, girl, bad, girl, httpstcool4foppary), List(loreal, committed, making, profound, transformation, towards, lowcarbon, business, model, ambitious, httpstcockruj1lvys), List(httpstcorelcl9wkfe), List(cs01081912214, scored, 80, dont, click), List(rt, cheminahsayang, king, httpstcomjmawhsnnb), List(rt, halalcoholism, orange, man, actually, bad, tho), List(rt, cheesewhis, httpstcoxiiy5cn8fv), List(rt, hmpbarlvisitors, beautifully, written, piece, family, experiences, someone, prison, httpstcoha0xbomvas, hear), List(rt, trumpwarroom, kamala, harris, says, know, predators, fact, check, true, top, staffer, 14, years, accused, sexual, harassment), List(subwayyellow), List(rt, asg1956, growth, 8, core, sectors, dips, 02, june, near, 4year, low, times, india, httpstcov7fnelkbco, mr, narendramodi, ke), List(hashtag4, women, vintage, wear, work, elegant, business, party, bodycon, sheath, office, ruffle, female, dress, httpstcoy0utdq0qp9), List(iiateef, httpstcoe3fl5ayymb), List(1st, aug, 2019, holybible, httpstcoarognyesya), List(rayk_ea11r), List(rt, kangminehee, early, po, x1, debut, album, estimasi, harga, rp, 225000, sd, rp, 325000, harga, fix, bisa, diketahui, ketika, barang, sudah, rilis, b), List(rt, gameofryuk007, happybirthday, 23, 2019, 2019, 81, rt), List(rt, tradielover1, wow, nice, piece, meat, httpstcodfnzeprjjm), List(rt, meyou57449329, proudresister, racisttrump), List(rt, iluvmayward1, minsan, still, pinch, na, eto, na, ang, babies, natin, rubbing, elbows, stars, na, idols, lang, din, nila, da), List(rt, meyou57449329, proudresister, racisttrump), List(rt, weverseforbts, weversetrans, v, bts_twt, op, taehyung, still, exercise, days, v, dont, became, small, muscl), List(rt, 
thedlcc, 97, days, blue, wave, virginia, flipvablue), List(good, til, goes, bad), List(rt, br31_icecream, 31, 31, 81100, br31_icecream, rt, httpst), List(stream, itzys, icy), List(rt, x1picsby9, happy, august, everyone, minimalism_mh, kangminhee, produce_x_101, x101, 1014, x1members, httpstco9qgavjcxn4), List(huntingfield, wx, 0900, 187c, 10155, hpa, 88, hum, wind, 29, mph, wind, w, 00, mm, rain, magnacartabarons, magnacarta), List(0100, aku, 497d, 645, 2, users, load, average, 276, 300, 302, hydra, 497d, 818, 4, users, load, average, 006, httpstcougnz304vgw), List(screaming), List(vvsshae, showed, definition, man, whos, life, comes, lot, willing, someo, httpstcovkebxzsxvh), List(rt, championsleague, thebest, fifa, mens, player, 2019, winner, ucl, httpstcooskcrcpwst), List(youre, sweet), List(typicalgamer, typical, gamer, wake, alarm, went, start, anytime, hype, season, x), List(absconds, tes, grands, morts, httpstco7waakr8lyx), List(youre, sweet), List(cs01081912214, scored, 80, dont, click), List(rt, mydearjanx, httpstcocz4ioeqhhp), List(rt, shamindan1, sad, think, lost, world, identifying, country, birth, religio), List(tijd, 1000, temp, 188c, wind, 274, kmh, zw, luchtdruk, 10140, hpa, regen, 16, mm, wind, chill, 188c, rv, 8110cmtemp, 186c), List(rt, essexview, lbc, q2, rajar, show, expectjames, oboring, haemorrhaging, listeners, 10am, iromg, talkradio, inherits), List(feel, bad, bc, miss, past, time, right), List(dont, mind, wasting, time, thats, wasting, time), List(rt, kelseypaige__, dude, 10, seconds, sign, said, mom, im, fine, looking, like, fine, httpstcoqftisiiz42), List(rt, ewarren, read, mueller, report, day, came, clear, referral, impeachment, con), List(men, city, said, unto, seventh, day, sun, went, sweeter, honey, httpstcocxvapjesvh), List(time, machine, take, place, time, past, yo, httpstcohpeita15ko), List(rt, userbrook, hate, family, stops, giving, money, birthday, get, older, like, need, w), List(prowlingbeast, either, one), List(rt, 
babemayonnaise, 9, httpst), List(rt, myghowd, frangel, meet, skyiana, perfect, match, httpstcosj7htgnyqa), List(rt, george_t_bear, dodo, better, twitter, account, dodo), List(rt, psychologydoc, imagine, babysitter, telling, two, dogs, faced, deadly, cobra, keep, 1yearold, daughter, saf), List(rt, sujuwings_, sjofficial, blacksuit), List(rt, overflow_meme, passing, nuget, pack, arguments, azure, devops, build, pipeline, httpstcocc0120wcr1, httpstcocrhmoz2hdy), List(charges, redistributing, providing, access, work, 62376442), List(nazirahel, crime, junkies, receipts, soulful, convos, oprah), List(rt, heavenbrat, im, anyones, girlfriend, happy, national, girlfriends, day, u, pretty, bitches), List(looks, cool), List(bigdata, analyticsweek, futureofdata, podcast, juan, gorricho, disney, httpstcoz7piizgdxc), List(rt, ayeyocookie, love, ride, face, dig, face, deeper, wrap, ya, thighs, around, get, couple, ice, cubes, htt), List(rt, stfuiol, mom, l, 5, apples, amp, eat, 2, many, left, 4, mom, 5, apples, httpstcoze81zt4cbi), List(femi_sorry, time, got, proper, job, learn, much, britains, industry, gre, httpstcobk0saaixwa), List(6000, httpstcobqfnpwgfdr), List(rt, bananagoose62), List(yohanesaries_, askmenfess, astagfirullah, koko), List(melomelomew), List(httpstcogqdcywtcog), List(0948, temp, 186c, hum, 69, dewp, 117c, bar, 10175, hpa, wind, 0, kmh), List(rt, lotives, dating, like, finding, song, always, skipped, actually, fire), List(d7dea36a929764, aimaix, mera_dgsks, sachiel69, ww), List(rt, thefreakies, httpstcoovuzjduy6s), List(rt, angueira77, know, happened, fucking, love, wataru, god, love, ocean, man, spiritliliii, tigressceline, thank), List(rt, pxm__, interested, inconsistent), List(rt, daynnaflo, fuck, allllll, people, boat, smh, hope, boat, flips, httpstcoe77jt6ifmo), List(rt, may14thday, 181111, im, monsta_x, officialmonstax, httpstcovoznapf6y1), List(trouble, picking, password, try, lightbulb223), List(httpstcompdxdaf88p, look, made, wikipedia, news, 
httpstcobf5enw5fej), List(rt, jlist, today, day, rt, httpstcodmburm4kgx), List(theyve, named, stranger, things, characters, httpstco0tytwmgh17), List(traffic, tour, w, dj, canale, starting, listen, live, httpstcovjt4lcahtn), List(fortnite_br, xblcrossfire), List(august, 01, 2019, 0500pm), List(well, since, start, fabulous, huddersfield, food, drink, festival, thought, wed, httpstcol5c2usrula), List(great, see, magic, money, tree, hiding, standby, nhs, paperwork, finish, 350m, wee, httpstcof5tzfmd5al), List(conservatives, jamescleverly, borisjohnson, army, non, working, vent, jealousy, pathe, httpstcovvs1pn9qxy), List(rt, candicebrishun_, difference, everybody, dumb, httpstcohjxtnnqts7), List(rt, deer__lover, httpstcob4hxio6lnl), List(rt, tkollywood24x7, akshaykumars, upcoming, kaththi, hindi, remake, reportedly, titled, ikka, directed, missionmangal, fame, director), List(head, heart), List(rt, viivps, happy9thanniversarymb, httpstcofmvnngrgob), List(rt, hellsunderworld, people, act, play, rockmetalindie, bops, car, httpstcoqloky86zqk), List(irrefutablematt, ngl, definitely, live, rest, life, muumuus), List(lunaloverosies, httpstcozu166csyh0), List(rt, mynamesjannica, buggin, u, lyinnnnnnn), List(gun, shop, buys, 4, horsemen, billboard, insulting, squad, httpstco5ukejk1dpn), List(amadeus, finds, growth, businesses, outside, distribution, httpstcongxbkn5kig), List(go, see, doctor), List(know, uks, first, commercial, asphalt, product, incorporating, waste, tyre, rubber, launched, httpstcowuh5mppgqv), List(rt, asteroidxyou), List(rt, vt_cosmetics, vt), List(jale_ibrahim, bs, ny, xeyirdi, d), List(posted, picture, many, guys, dmed, calling, hot, asking, shit, like, feel, stupid), List(rt, lcveblanchett, baby, cate, uwus, httpstcok583gthlt6), List(hi, 1156836920997154816), List(sftstdmby, cuteplesss), List(biliary, transporter, gene, mutations, severe, intrahepatic, cholestasis, pregnancy, diagnostic, management, impli, httpstcodjbjrebzao), List(lynnieyankee, 
jockmcintosh, bodskiboodle, mum, learning, well, hoopup, communicate, without, words, httpstcoqdr82qt0rk), List(spotify, better, send, something, good, sue), List(individual, giving, amp, legacy, expert, brilliant, leader, want, work, amazing, charity, httpstco3pbjudaet0), List(rt, numazu_city_pr, 81, happy, birthday, httpstcobxkmrrqw7x, lovelive, 2019, 2019, numa), List(dadeen__, thank, mara, kunya, master), List(rt, abematimes, cinema, fighters, project, 51on, way, 1, abematv, abematimes), List(potato_crisp, fam), List(rt, amourkm13, eng, sorry, isnt, jikook, doesnt, love, kindest, considerate, king, jhope, httpstcofpvrzta3xu), List(rt, corncakegirl, ifi, pretend, gf, national, gf, dayand, u, take, datethen, get, know), List(another, day, august, 01, 2019, 0400pm, boy, lovemarriottrewards, mrpoints), List(rt, lizzyymcguire, aint, different, dating, someone, younger, stupid, httpstcozkii4xm40z), List(one, person, unfollowed, automatically, checked, httpstcooavohkszjz), List(rt, mateo_private, black, monday, big, deal, black, monday, mateo, subscribe, onlyfans, 4, next, days, y), List(threadiise, mxm, order, priority, day6, milan, httpstconrgofceqcl, dean, milan, httpstcoonsk2xdi1o, httpstcojgoxp5isfe), List(happy, birthday, one, honest, colleague, magjeee, long, live, prosperity, bro, httpstcofdzmfcs9ly), List(shut, tf, look, httpstcougfokrjf23), List(dean, arjagose, sets, pg, students, homework, take, small, social, risks, extend, friendship, encouraging, student, httpstcoyj9cfoti08), List(dont, take, chances, create, big, payouts, show, register, bet, lottostar, today, httpstcopznttwshg3), List(really, aikatsu), List(rt, 9gag, youre, attentionseeking, dont, want, obvious, httpstcotxubpsp0sl), List(rt, mtuan93, hows, everyone, today), List(rt, stevie_emerson, wolf, wall, street, getting, laid, httpstcocy6ejdsq08), List(rt, haywired50, sharing, lipstick, httpstcofs7crlmqtl), List(rt, doonungvonpai, 26, 96), List(rt, ninadschick, astonishing, headline, bought, 
brussels, little, irelands, ridiculous, leaders, landed, brexit, crisis), List(ladydurrant, borisjohnson, weve, already, wait, three, years, ample, time, prepare, theresa, mays, fir, httpstcol6st9s32a1), List(drawing, rihanna, listening, rihanna, httpstcosnslhnkfy3, httpstcosnslhnkfy3, httpstcosnslhnkfy3, httpstcogt5ojuec4l), List(websites, content, audience, relate, need, help, contact, us, httpstcolmejqgwkzc), List(come, taster, session, meet, barbri, representatives, answer, questions, bar, httpstcoo337rycupv), List(rt, shirtwhere, fyserias, kihae_129), List(rt, dessertmood, httpstcoat3zapx1kf), List(thermoaddict, bulu, ayam), List(tbc), List(rt, the_zemeckises, 2442, 44mm2), List(rt, bushi_creative, bkub_comic, http), List(rt, chayahyunxx, 555555555, https), List(rt, worrierprincess, one, time, trying, talk, hot, farmer, farmers, market, said, wow, love, merch, table), List(rt, msblairewhite, transgender, 3, year, old, like, vegan, cat, know, whos, making, lifestyle, choices), List(rt, allenakinkunle, buy, idea, must, constantly, push, body, near, breaking, point, want, successful), List(rt, afccommunity, without, arsenal, community, isnt, quite, without, community, arsenal, congratu), List(rt, jayerosex, monday, morning, httpstcowflcqu8mpc), List(loveiyjimin, ios, never), List(happens, determined, activists, meet, police, unwilling, file, firs, httpstcobofzu9lxox, day, 15, 16, httpstco6dxwxzrpny), List(rt, thickancreamy11, since, ya, love, big, balls, much, httpstco5u7mq8usf9), List(ian_bell, englandcricket, much, wish, still, team, ian, best, rehab, fr, httpstcoagtqwboy4q), List(people, think, tall, guys, see, orebes, carrying, know, devil, httpstcogdpxsudzay), List(poverty, actually, start, getting, eradicated, nation, day, politicians, hues, share, secret, astron, httpstcosdckuraoic), List(prepare, mornings, training, session, reflect, bold, new, office, paradigm, agile, lea, httpstco2mrusncq0c), List(justice, httpstcomnkc3a7unf), List(rt, weareoneexo, amp, 
closer, mv, behind, photos, sehun, chanyeol, sehun_chanyeol, exo_sc, whatalife, exo, w), List(rt, nipasan_tibikab, fortnitex, httpstconxoo5oavn7), List(rt, russiaactually), List(joetoshi2, blitz_stream, help, p), List(rt, thecakechancery, took, bike, ikolaba, dugbe, bike, man, started, singing, die, make, u, cry, e, jen, ku, e, je, nki), List(pete, bardensheart, hearta, httpstco8uwotixdg1), List(lugey6, absolutely, spontaneity, key, life, fulfillment), List(rt, is_salsu, retweet, expecting, august, come, blessings), List(alphabets, ai, might, able, predict, kidney, disease, smb, httpstcoslk2cwjdx0), List(rt, diorfred, terapia, cara, este, video, gratis, httpstcoacwcabeosp), List(rt, bringahappymeal, wanna, crunch, crunch, sum, ice, ice), List(rt, russdiemon, went, petra, today, really, wild, httpstconytjwzs2hp), List(rt, chloejm89, coss_birmingham, researchers, want, involve, policy, makers, projects, equally, interested, doesn), List(2, free, pathways, weight, management, programme, call, us, find, suit, best, 01709, 7, httpstcodxdsnhk12t), List(8, httpstcoysrpadyrrr), List(rt, tae_20130613, rt, follow, 10000, httpstco4p6vfuqw8b), List(prelim, aqi, 0100, 94550, pm2534, good, o19, good, cawx), List(rt, rynford_, sexy, love, httpstcoahqhj6uqbq, httpstcovdlas5heja), List(rt, cropout, u, ever, met, human, version, headache), List(rt, trackingsm, congratulations, proposed, httpstcotvmzcxgsf3), List(rt, koreaboo, happy, birthday, izones, fairy, chaewon, heres, another, year, health, happiness, happychaewonday, chaew), List(rt, xpressanny, rachelreevesmp, friend, labour, party, another, tory, labour, clothing, httpstcodsjjgb8vuf), List(rt, op109, edward, hopper, 18821967, house, shore, httpstconewpwqbn3i), List(0851, temp, 179c, hum, 82, dewp, 14c, bar, 10153, hpa, rain, today, 0, mm), List(isaiah, rashad, f, jean, deaux, menthol), List(kevgrab8, chic_vegan), List(another, hour, august, 01, 2019, 0500pm), List(avenue_kobby, funny, youve, ever, victim, mob, action, wont, 
even, joke), List(rt, scallysex, amateur, 100, bareback, california, shot, buy, guys, hooked, condomfree, humping, featuring, cute, young, usa, guys), List(international, news, diet, already, key, part, managing, diseases, like, diabetes, hypertension, new, resear, httpstco46edyzxvfu), List(airbus, says, germanys, ongoing, suspension, export, licenses, defense, equipment, saudiarabia, cost, co, httpstcoudxbtgpo8x), List(help, save, edna, fire, cat, plz, sign, httpstco1xnwfrkdsq, httpstcoemkyt2minj), List(httpstconl4cinziyi), List(rt, velumania, business, 1, dont, borrow, start, work, till, earn, 2, must, borrow, 10, times, profits, 3, borrow), List(lala_onefamily, im_nameless_16), List(kemphospice, thank, kemphospice), List(rt, tinabobuk, im, calling, people, pretend, anxiety, wait, bus, mum), List(rt, evilbart24, cleaning, ur, entire, room, sad, different, type, glow), List(rt, pixieely, middle, class, kids, cant, afford, college, get, nothing, fafsa, like, wanna, go, college, parents, m), List(rt, 182appleboy, rt, plz, apple, boy, 5, 6, 301set, 2000, 2days, special, gift, 8182, rt), List(tweeterofbabel8729031827703603201156836943763783680), List(rt, 1130_1026, httpstcomlgxkerzzw), List(lirryayewewan556420221156836944669749248), List(rt, asadprincess_, favourite, thing, people, remember, little, things, told, like, seriously, actually, listened, thank), List(rt, yoongibam9793, httpstco2goix7gliz), List(rt, 3c5u3r5r3y5, httpst), List(rt, ajakobian, let, tell, girl, httpstcokne90lzbp4), List(katenv_, thank, youuu), List(zallah__, ha, ha, ha, ha, use, something, hold, mouth, ooo), List(kristenbell, reveals, hulu, surprised, everyone, veronicamars, premiere, httpstcozemuvryxgi), List(rt, y2kcarti, watching, mum, cry, dad, cheating, knowing, woman, httpstcotvfqa2cuyo), List(hes, better, options, point, time, uyabona, cyrilramaphosa, trying, part, httpstcotgapp41uxp), List(job, alert, igc, currently, looking, contracts, assistant, based, london, hub, deadline, tomor, 
httpstcomofccozj4d), List(rt, natiilights, httpstco5awzg2suh9), List(ciara_rimando, hellobatikawsad), List(rt, idreamofghouls, thats, humans, came, existence, undoubtedly, case, incest, family, members, procreated, via, sex), List(rt, lucymdonoghue, insulting, labelled, dole, bludger, youre, aged, 65, workforce, paying, taxes), List(rt, zainaazra, mom, asked, dad, ever, got, plastic, surgery, dad, said, ever, feel, like, chan), List(rt, not_my_brother, congratulations, humahabib6, 10k, followers), List(1200, peing, httpstcoxhfrtpgn4y), List(rt, ebekun123456, oreratuyoi, esports, team, kwl), List(another, day, august, 01, 2019, 0400pm, one, day, lovemarriottrewards, mrpoints), List(rt, aceface4ever, httpstcoqjvcyu3cnc), List(genarayjdm, lolll, yes, 400, goodnight, crazy, ladylololol), List(rt, aco_peda, httpstcopzc8vdg45p), List(magicfirefox, wants, find, place, world, triss, like, kovir, detail, wa, httpstcousunh0todx), List(rt, h_zett_m, 64, speed, music, ep2, httpstcowce5q91eqp), List(bryceegibbs, buzzrothfield, moley, geeing, fool), List(qvadfeedsbtw, fortnitebr, dont, know, new, rust, lord, selectable, styles, well, see), List(super6, joinjeff), List(httpstcohu2lco4s4e), List(jongoofel21484817111156835684885250048), List(moneylens, investors, around, world, making, mistakes, steering, clear, httpstcobhxwzrbgsh, httpstcow3fygegkua), List(rt, _nojam_nolife, httpstco8rgso0rcps), List(rt, rina67708264, fairy, tail, itunes150020), List(rt, 4kumi0, fgo, fatego, httpstcon2rqt3czdb), List(alextweeterman27583134591156835685107523584), List(giveawayspanda, fazerasion, iphone), List(rt, bharathtarak99, requesting, tarak9999fans, unofficial, trend, today, evening, 7pm, get, ready, tag, reveal), List(rt, harveybookr, making, money, youre, sleep, one, greatest, assets), List(neoslaps, yeah, actually), List(cups, hand, around, mouth, fight, fight, fight, fight), List(rt, samarsamy21, __, ____, __, ____), List(rt, fnbrhq, season, x, loading, screen, fortnite, 
httpstcowh452r66jv), List(rt, lee_thanat, 1, tawan_v), List(kanaseeeeeeeeee, wwwwww), List(alexandria, ocasiocortez, agrees, antisemitic, comments, amp, justifies, palestinian, terrorism, httpstcop2kg2tyolt), List(sekerlibicay, gnaydn, hayrl, sabahlar), List(rt, _macrime, doe, department, employment), List(rt, soieange, favorite, looks, cushnie, et, ochsfall, winter, 2018, httpstcoihmvpi1cpf), List(rt, arjmxrell, pain, changes, people), List(rt, bainjal, know, zomato, delivers, food, cook, maybe, sheepish, bigoted, ignorance), List(really, post, robot, smut, since, robots, 2, interest, shota, ao3, page, shows, httpstcoqzbarqyzbi), List(rt, _tony116, told, anything, wants, asked, vanish, im, till, pained, httpstcousfiqe7vy0), List(genpact, hiring, inviting, graduates, customer, service, role, 26th, june, httpstcod7uboq8htb, httpstcoz0bj9sqrpx), List(httpstcoqqeg7l4ux3), List(hello, color, right, middle, every, luxe, card, greatgoodcreative, chose, sunny, yellow, seam, color, httpstco6uavdbtaws), List(rt, dnpreport, 1100), List(ryosuke_nea_, w), List(tsunderensung, likely, yeah, bc, weve, never, met, always, bails, meeting, person, theres, much, thats, right), List(rt, q_cupid, unicef, 60020), List(rt, justegerton, picture, httpstcoriq7jtv5bh), List(proman, acquires, temporary, employment, firm, epos, httpstcouu0z1u6fkt, proman, recruitment), List(nooooo, accidbelty, clicked, kn, ig, dm, bc, searching, sum, twt), List(thursday, 01, august, 2019, 0858, bst, temperature, 168c, wind, nnw, 3, mph, ave, 8, mph, gust, humidity, 86, rain, toda, httpstcouac3bfpcko), List(also, people, think, great, job, nothing, life, twitting, httpstcoi9scrnu98l), List(lynne47718354, tomfitton, judicialwatch, realdonaldtrump, jsolomonreports, tom, keeps, lying, plea, httpstcoxrzzfjlkzl), List(steps, store, transfer, backup, files, cpanel, httpstcodaauyf9mll), List(httpstcou1mkzq6sux), List(competition, time, celebrate, launch, brand, new, autumn, 2019, collection, bella, natura, givin, 
httpstcoyym1rpiba5), List(right, coastway, digest, brightonargus, talking, newspaper, hour, httpstcobkcsf0zmac, brighton, httpstcoctq8q4kyhx), List(httpstco8kh17bgeys), List(newprofilepic), List(rt, jenniferxramos, tomorrow, national, girlfriend, day, im, still, nobodys, girlfriend), List(im, desperate, crazy), List(sometimes, walk, away, want, find, deserve), List(rt, awwkim_, help, mom, gais, show, twitter, power, httpstco6l3osi8wgn), List(braillescreen, long, tweet, option), List(rt, tulag1996, ask, shark, gyms, thats, racist, httpstcoseiqsvf0wy), List(rt, tboywonder, dear, august, nice, tweet, fam, happy, new, month, fam, may, month, better, previous, month, august1s), List(etenergyworld, five, years, households, solar, water, heaters, double, pune, httpstcoaubagthbhs), List(national, gf, day), List(day, arrived, remains, risen, crypt, terrorise, uk, australasia, huge, thank, httpstcomztkzlifoq), List(seriously, good, opportunity, pay, want, tickets, still, available, first, time, ive, excit, httpstcopwlhhkn5sq), List(rt, sanrio_news, 810831, 6, httpstcoca8on9jhal), List(rt, americangayboys, watch, fulllength, hd, helixstudios, twink, videos, forjust, 295, httpstcoutzb2yem4z, euroboys, httpst), List(rt, real_defender, realdonaldtrump, trump, greatest, supporter, military, president, ever, seen, absolutely, wond), List(httpstcobg0q1gkp47), List(rt, bewafa_tum_, many, worldwide, followers, want, 100m, 200m, 300m, 400m, 500m, 600m, 700m, 800m, 900m, 999m, reply, hello), List(rt, sugar_fairy_), List(current, time, august, 01, 2019, 0300am, oclock, morning), List(njhighways, amazing, suggestion), List(rt, shounantk, 5, 10), List(hattanast, d7em_otb, 9toty, cfc_rashed, httpstcom6asgmfato), List(rt, muralikrishnae1, yes, historic, day, one, step, close, uniform, civil, code, one, nation, one, law, day, mourning, tu), List(torihiromi, dm, httpstcom49gawckxy), List(rt, pdoctortomy), List(yasusunsun3, www), List(u, nobody), List(rt, thestandardth, 26), 
List(httpstco5ofm0wnvqw), List(thoyeebarh, follow, back), List(rt, olufeazy, zitudiary, raelsammyworld, amen), List(barry_scharneck, caster, internal, testes, female, reproductive, organs, possibly, argue, caster, female), List(rt, jpnstuffs, dont, forget, pokemon, trainer, collection, releasing, saturday, preorders, available, httpstcoslorquqstf), List(rt, 904skrilla, yall, please, watch, beat, ass, httpstco1zny1qoixp), List(rt, nanciemohamed, take, notes, future, husband, take, notes, httpstconiszyp3fl9), List(20x3a, whats, song, name, singer, please), List(rt, bgsnezana, squirrel, spa, credit, babygirlbabysquirrel, httpstcoibssqfrpkt), List(ahh, love, content), List(cyberpunk, 2077, hong, kong, dlc, looking, dope), List(rt, _a1dan, fortnitegame, prove, im, legit, giving, everyone, guaranteed, working, code, retweet, like, f), List(lil, kesh, nkan, nbe, ft, mayorkun, published, latest, nigeria, news, httpstcoeuno9hfzcd, httpstcohgngds5j0b), List(34fgo, fgo, fgo), List(rt, aaaaagghhhh, httpstcoctmmku4try), List(rt, bobprice101, please, sign, petition, introduction, cyclist, road, tax, fair, go, towards, paying, reduction), List(_alrightbutera, girl, imma, keep, grinding, dm, limits), List(rt, cctv_idiots, teamwork, rt, buitengebieden_, httpstco3acxloz6w0), List(kunertantje, similar, reason, find, difficult, stay, connected, brother), List(rt, stfutena, dont, realize, convenience, get, 1010, httpstconeenjezycy), List(rt, meluvfrotatl, current, mood, httpstcookejiyfjtc), List(rt, keziahcole_, ncforlife34, church, hospital, judge, whos, attendance, job, restore, lives, equip, saints), List(mbidaiii, atiku_b, omojuwa, hi, deserve, follow, kindly, follow, back, thanks), List(boffman_, soldier, youll, get, promised, land, ive, cigarette, free, since, october, 17, last, year, mana, httpstcoxmjkcvfrvx), List(rt, notcapnamerica, really, thought, white, woman, put, law, hate, see, httpstcokixvash6pv), List(rt, mudpiefridays, ad, blog, today, talking, relationship, 
sunscreen, signed, sunsen), List(rt, okamburrr2, ahhh, bitchhh, august), List(rt, carolineglick, shame, kamalaharris, indifference, innocent, blood, man, spilled, crown, heights, thi), List(also, figured, today, u, talk, screen, recording, u, turn, mic, im, wow), List(nybr_11, httpstcoulf7xtpgym), List(httpstcoyzv4orbwcn), List(littlefoxyc, womanfeeds, di, shopee, banyak, yg, jual, kalo, offline, store, nya, aku, kurang, tauu), List(rt, shahfaesal, appeal, jammu, youth, httpstconnzxv4qejg), List(rt, _itsmissbre, skinny, httpstcoc7hxrxxijm), List(rt, afneil, scores, celebrities, rich, arrived, sicily, google, conference, came, 114, private, jets, flotilla), List(womanfeeds, jennie), List(rt, alessabocchi, hong, kong, protestors, another, level, theyre, using, lasers, avoid, facial, recognition, cameras, cyber, war, aga), List(biscvit_, thank, much, ginger, snack), List(rt, sazzymanny, nephew, sia, httpstcosjdmbdnqhc), List(rt, halsey, know, good, die, young, must, better, think), List(rt, starlightseve, parents, saw, wanted, get, crib, httpstcocniqodl0h2), List(leeseojin, jakomo, jakomo, repost, jakomo_style, httpstcon2wkel1zut), List(rt, uenozoogardens, 729, https), List(rt, iambobongquotes, silly, thought, actually, cared), List(natsuki_s_sh), List(atrusto1, americans), List(rt, divine_oketa, place, christians, come, worship, godohhand, also, gist, friends, havent, seen), List(fnbrleaks, flint), List(rt, na_tha_niel, create, another, app, called, sco, pa, tu, manaa, yall, leave, twitter, go, sco, pa, tu, manaa, hell), List(rt, userbrook, hate, family, stops, giving, money, birthday, get, older, like, need, w), List(youre, convinced, interactive, content, b2b, space, work, well, first, check, interac, httpstcorapivwybth), List(hi, august), List(rt, karinabuddies, 150, replies, tagline, kaya, ba, karjon, ouronlinechoice, mskarinab), List(rt, 2ton89hina, httpstcoyobb9fboek), List(natsuki_s_sh, _c), List(rt, br31_icecream, 31, 31, 81100, br31_icecream, rt, httpst), 
List(nefestetl, gnaydn), List(yorkshireday, never, north, yorkshire, point, may, come, close, going, vancouver, shocking), List(with_bravado, bravado, friends, playing, mpg, radios, innovative, music, mix, httpstcopq26avjoqk), List(rt, nse_plc, excited, joined, nse, ibuka, program, strong, intent, one, kenyas, remarkable, listings), List(190801, 12, sujis, ig, post, 5, trans, precious, bond, made, 5, mont, httpstco6xdoo6h9zp), List(ikaw, paren, naman), List(rt, cheminahsayang, my_crimewatch), List(food, fantsy), List(rt, im__pine, 190731, httpstcosv9yrx38cf, httpstcoa6twi4ks52), List(rt, united_cinemas, 41500, united_cinemas, rt, httpstcojmrl6fyfay), List(lots, shit, black, thats, fact), List(go, even, heart, still, want), List(rt, hanosuke, 7, httpstcop1qvpkyhfd), List(rt, yuha_twice, a4), List(rt, rinkimikaaa, restingkpop, triciaxsalcedo, httpstcoddfgg5rw0x), List(rt, yasminayuni, something, crazy, hair, today, fell, love, httpstcovjdmesbzez), List(rt, mirzaaznin, math, easy, right, teacher, httpstcofuyc0spqib), List(rt, mrkenshabby, schools, area, talking, close, certain, days, due, lack, government, funding, httpstcomdkmy), List(hooooooy), List(ascunn, hi, andrew, clicking, start, private, conversation, rebeccah, httpstcoronopr15qb), List(owenpaterson, trouble, remoaners, choose, see), List(maybe, day, wont, floods, traffic), List(nowplaying, metal, meyhem, radio, right, good, riddance, time, life, greenday, httpstcohvz1ddersj), List(next, welcome, pompey, montgomery, waters, meadow, opening, skybetleagueone, game, ticket, sales, hav, httpstcov9kgqfkoy9), List(india, doctors, protest, proposed, law, allowing, nondoctors, practice, medicine, legalnews, legalupdate, india, doctor, httpstco0kimymrsst), List(rt, tiinexile, disclaimer, dear, folks, free, choice, eat, drink, wear, whatever, want, id, never, judge), List(rt, dailytaykupdate, jail), List(nel1011, toasted, stained), List(slutjln, bruh, tell, poor, sod, ur), List(coys100, nlahamilton, thinking, hoping, 
friday, give, people, feel, good, going, weekend), List(weeeeeesh, mest, dj, arriv), List(rt, brazilegend10, 1981, paris, bob, paisley, lfc, wins, third, european, cup, 5, yearsferguson, won, two, 22, years, lfchistory, lfchist), List(hope, nurse, work, one, day, share, experience, get, along, well, httpstcoksolfxuioe), List(httpstcow6ulpqb8hw, httpstcow6ulpqb8hw), List(httpstco54qabwb1s4), List(keshika_su, wwww), List(aish), List(rt, tulirosak, im, addicted, kaorhys, angaugustoko, kaorhys, angaugustoko, kaorhys, angaugustoko, kaorhys, angaugustoko, kaorhys, angaugusto), List(rt, ddm_the, bj), List(rt, marsnightcore, illumi, ignores, bodys, warnings, needs, remove, shapeshifting, needles, begin, reject, like, bad), List(oyebandhu, bandhureviews, okay, oyebandhu), List(rt, zackfox, freaky, bed, httpstcool8nuygcv7), List(rt, stfuhurt, hate, fact, dont, start, conversation, wont, one), List(new, profile, pic, credit, loli, gang, throws, gang, signs), List(rachels, wheel, spin, buy, emily, bottle, wine, believe, treating, people, way, like, httpstcohiqrlzavkk), List(theofficertatum, diamondandsilk, realdonaldtrump, everybody, opposes, democrats, racist, fascist, httpstcov9ul9pvcw4), List(biggest, fans, week, andrew_s_hatton, thank, via, httpstcouy8cdhhpbj, httpstco0pdjfnvlb3), List(rt, visunavi, diaura, 81ndgofficial, siteofficial, funclub, 1016finalelast, rebellio), List(rt, rekltwi, 25, httpstcorfab6edtic), List(rt, layzhang, stay, cool, everyone, icecream, httpstco3poxqr6pjl), List(luis05379469, masochistdevil, follow), List(rt, _ameangirl, niggas, destinys, child, tryna, tell, us, cater), List(rt, ferrifrump, dear, bbcarchive, enjoyment, clip, spoiled, fact, ended, please, httpstcoq3fa8w), List(rt, taylorswift13, guys, stop, smiling, im, trying, loud, 652, million, views, 652), List(rt, ippatel, gua, rakshak, gopal, killed, peaceful, cow, butchers, tried, stop, cow, trafficking, hodalpalwal, haryana), List(rt, tentayuriu, still, sketch, refinement, yes, corset, 
piercings, im, madly, love, idea, v), List(dis, niggka, blueface, fooo), List(rt, takyahla, something, mak, something, something, laundry, something, satanist, httpstcor0wgxrj63s), List(rt, trueeyethespy, oral, arguments, epstein, take, place, oct, 28, 2019, million, pages, evidence, review, j), List(rt, rollajabi, woke, 640am, got, ready, school, scraped, wavy, brown, hair, ponytail, looked, mirro), List(grecya_mx, ranferide, nombregracias), List(hitdaweed_dee, sleep, imma), List(bjkwqghukgc7pwj, amphttpstconfourk2nwn, httpstcowsqfrgx9fq), List(ramuda_mushi), List(jessica, driving, hard, bargains, pearson), List(rt, taeyeonchart, gaon, digital, chart, 20190721, 20190727, 1, 54157121, new, 23, four, seasons, 16187081, total, 55), List(askmenfess, penting, banget, banget, banget), List(rt, nctsmtown_127, nct, 127, takes, singapore, 1st, world, tour, _nct, 127, world, neocityinsingapore, nct127insingapore, nct127totheworld), List(ethandolan, ooooffffffff), List(rt, yahoo_weather, 7, 4, httpstco6dgcii9izy, httpstcodjnuaw5gbl), List(rt, joy997fm, mahama, promises, new, regions, six, hospitals, httpstcozqnwafuuef, joysms, httpstco15aqg9gieo), List(lucas7yoshi, wont, work), List(cheatingchi_chi, httpstcozkfupfry5o), List(rt, brahmslover, bstbs1930), List(rt, hhana_a, rt, dm, foreverything), List(rt, blxcknicotine, maybe, hard, love, love, hard), List(limestone, hills, weather, 800, pm, temperature, 4, wind, 0, wnw, daily, rain, 20, pressure, 100440), List(rt, clevvertv, big, happy, 21st, birthday, bretmanrock, inspires, prettiest, version, every, day, httpstcoyffhpbx), List(bloom_now_, w), List(100tdogbone, rate, 5, stars, awake), List(rt, nataliebots1, scale, 110, bitch, 11, httpstco7uipkgoya3), List(rt, redmakuzawa, one, day, every, year, post, httpstcotocszxpriy), List(yall, doin), List(rt, guyfieri, need, know, got, overalls, lilnasx, httpstcoot6eceshy1), List(rt, kopper_ls, amp, settle, le, kopangoa, keng, hun, really, first, httpstcodmlvxrrt2q), 
List(httpstco575yx5cg0o, share2steem), List(rt, soleyjararts, never, stop, talking, much, love, skinks, thicc, throated, boy, red, sided, skink, name, peek), List(break, duluaugust, 01, 2019, 0300pm), List(sadnibbi, hmm, married), List(rt, olori_joszedey, parents, give, time, children, listen, encourage, call, beautiful, names, dont, let, anyone), List(rt, derekrays, yang, genius, spin, everything, back, ubi, america, never, better, people, jobs, money, limited, governm), List(rt, ukskies, crazy, saw, thegreathack, last, night, thank, ian, great, host, profound, film, ive, ever, seen, wake), List(biggest, fans, week, tunjiogunoye, _shyone, tomiiide, thank, via, httpstcorutqaqlmzu, httpstcow7hdwf7hyn), List(naked, protected, time, purchase, whitestonedomeglass, thebest, screen, protector, phone, amazonus, httpstcof2s07v6tzx), List(bad, apple, feat, nomico, no1pv, pv, ex), List(rt, beamkawee, d2binfinity2019, httpstcov5i1ecs0ud), List(rt, shannonjessie, stop, available, people, dont, even, know, want), List(rt, exile_news__, generations, live, tour, 2019, httpstcovieghars0k, httpstcooqepfnausy), List(jhonzou22, screen, x, ha, httpstcogfm4ymfkpy), List(gomorezvidinha, breakthrough, series), List(shygothexhib, sideboob, public, aereas, people, near, religion, shea, lexo, sister, beauty, httpstcoa6klcaqmbe), List(bold, saying, one, comes, close, yoongi, except, like, im, bold, cause, facts, httpstcom6dmw8sk4g), List(several, factors, lead, wildfires, including, humans, weather, volatility, extreme, conditions, httpstcokv2it0xcob), List(1gottlob, comment, think, shit, httpstco7tidmw1gvz), List(areamfs, mbin), List(give, eta), List(august, 01, 2019, 1100am), List(rt, kyono_iyashi, httpstcoth62a7tw09), List(rt, aoshimamegu, httpstcozutgdphkwk), List(rt, btsw_official, bts, world, notice, maintenance, finished, purple, day, bts, w), List(rt, eevapaavilainen, kyungds79, iyunjunho, hanwooabi, hwangjuhong, stop, dividing, dogs, pet, dogs, amp, meat, dogs, please, pass, 1), 
List(woeg, july, profitable, glad, shit), List(rt, pmnewsnigeria, help, military, fight, insurgency, lawan, httpstcowt3ycw1pwa), List(rt, lolde_acid, httpstcof9lfrsv04d), List(rt, fanbookofficial, nct, dream, boom, httpst), List(rt, rozierhistorian, daisy, puts, bed, like, least, three, four, times, day, finally, caught, good, camera, nev), List(rt, osasdreal, got, nice, voice, pronounce, name, else, sound, like, one, police, terrorist, group), List(rt, kshitijwrites_, less, life, law, take, bow, arvindkejriwal, pehlehalfabmaaf, httpstco), List(ivtherapy, reduce, pain, often, something, recommend, higher, doses, vitamin, c, also, lend, lowering, httpstcouoztjjvcwg), List(di, nagrereply, si, ibasco, oh), List(willblackmusic, tobokerun, jyaneeeee), List(rt, arisaralive, satien_pptv), List(sanpabloking, marcjiws, cheers, hela), List(rt, withpun027, ig, jennisbnk48official, punbnk48, jennisbnk48, bnk48, withpun, httpstcoim2pbkfg), List(rt, evertonrshite, hahahaha, wtf, shite, hahahaha, httpstcomt5mbaaceq), List(rt, fanbookofficial, nct, dream, boom, httpst), List(virgin, ass, self, singing, along, tweaking, hair, tie, like, relate, issa, bop, wtf, gabcake, nikidemar), List(pbenavidez19, shut, patrick), List(rt, hlebbyke, im, glad, dont, child, mekshubakushubela, mina, ngedwa), List(rt, ippatel, food, religion, halal, meat, deliver, hindus, zomatoin, httpstcoqfyaj1puj5), List(rt, fanbookofficial, nct, dream, boom, httpst), List(nathanielblow, amen), List(amid, complaints, fayette, county, gop, takes, dartboard, pasted, pictures, democratic, squad, pittsburgh, po, httpstco8qniuve4ez), List(chrismurphyct, many, things, disagree, mr, murphy, one, things, get, co, httpstcouewdatqmmv), List(event, internet, things, business, model, disruptor, join, us, half, day, conference, explore, iot, httpstco8kwjgmuagp), List(rt, oricon, yoshiki1000, httpstco9terqqufjs, yoshiki, yoshikiofficial), List(gwapongalien01, ay, wait), List(rt, 1stindianews, jaipur, strike, 
rajasthanonfirstindia, rag), List(isabellapinedaa, dm, gotta, ask), List(2, 56, httpstcoesccs6qtfn), List(rt, noelaquino, dansantos8, binays, enrile, gringo, digong, original, dilawans, back, days), List(rt, cupcake_aisyah, reason, sleep, days, get, see, dreams), List(hydrangealove, walked, past, previous, clients, front, door, yesterday, welcome, home, end, day, h, httpstco0pn9oal3cj), List(providing, quality, ed, millions, students, sea, topicaglobal, edtech, edtechatscale, httpstcobqz3mttoza), List(rt, ygjapanofficial, ikon, jay, amp, june, jayampjune, ikon, photo, magazine, https), List(rt, _peachxslate, super, daddy), List(rt, fanbookofficial, nct, dream, boom, httpst), List(rt, starcdnpoli, heather, scoffield, signs, mounting, competitive, protectionism, stay, httpstcoqfqa42ydct), List(rt, jbknockout, mom, church, pastor, talking, disobedient, children, httpstcohcfwxodqef), List(2, mil, wey, yo, si, uwu), List(carolinelucas, guardian, give, china, india, turkey, call, let, us, know, say), List(prettyylele, kingstoney_, understand, point, 100, valid, im, chilling, though, im, wondering, httpstcosiabd0hlzz), List(wsheepdog), List(biggest, fans, week, brenso_, thank, via, httpstcojxkuyzaaqf, httpstcopclbwtg5lj), List(cold, drinks, shopcooking, games, app, apk, download, android, amp, iosphones, httpstcoio693zvdo9, httpstcovji6vygnhq), List(check, current, temperatures, across, newjersey, fios1news, weather, httpstcopfndfgmytd), List(rt, mfspedia, mfs, temenan, yok, rtrep, ya, jfb, jgn, lupa, irene), List(rt, yourdimplegirl, httpstcos4spuachin), List(alwaysgoodluck, www), List(trabzonspor, cas, ikayet, edin), List(rt, minorujoeling, feed, got, strength, stabbed, later, chenqingling, jinguangyao, lanx), List(keithboykin, arlenparsa, uncivilized, monkeys, whats, racist, telling, truth), List(rt, sjaju1, autonomousunmanned, systems, huge, economicstrategic, potential, negative, spinoffs, nuanced, approach, needed, stress), List(rt, ggiittiikkaa, librat, refused, 
board, uber, cab, angry, hanuman, sticker, hailed, champion, sorts, bha), List(rt, celebinterviewr, thank, devonfranklin, starting, week, prayer, devonfranklin, httpstcov6wyvkvxhb), List(rt, lucyzodion, smart, street, lighting, making, switch, connected, streetlights, help, unlock, potential, future, cities, int), List(chemistryworld, usnistgov, first, clear, current, co2, measurements, good, enough, confident, theyr, httpstco6bc6osivcc), List(marvellous, cant, good, sunday, lunch, without, food2remember, visitmalton, maltoncookery, httpstcoymllucugxr), List(rt, a__chxn, 81, 1800, 3000, 1500, white0314kouya, a__chxn, rt), List(youtube), List(another, hour, august, 01, 2019, 1100am, httpstcogpacxk38zi), List(httpstcoloa1icncun), List(ayinyanng, ikaw, pa, talaga, ang, nagyaaack, ha, huhuhuhuhu), List(rt, ms_thiccgiselle, destinys, child, know, soldiers, six, year, old, ass, httpstcokpyn), List(rt, xxl, needs, corrected, expeditiously, drake, album, make, httpstcovzl8qdyu2c), List(rt, eleanorxneale, bold, assume, earth, wont, die, im, 30, httpstcovdnoyeghjt), List(mcd, one, section, ilead, youth, summit, felt, moved, something, ive, never, seen, done, asked, httpstcohafcayznri), List(shit, necessary), List(24, 4, httpstcopf3tsr7lky), List(get, found, using, inbound, marketing, httpstcodar3pwwywk, wealth, enthusiast, httpstcobezs0axy3z), List(august, 01, 2019, 1200pm, httpstco7hhwfd4iq8), List(biggest, fans, week, twiztidmojo, stephen34184311, hsmith198730, thank, via, httpstcot9xqcoi15u, httpstcoed4ddbouir), List(httpstcoc10wxoq552, httpstcournlypg6o5), List(biggest, fans, week, hullhour, philwhiteradio, judson_ian, thank, via, httpstcov7whhbijsn, httpstcocww52wvlbh), List(f1, furutachi_bot), List(rt, masteraqua_txt, dear, mickey, afraid, sent, ipad), List(rt, lilyskeery, dont, choose, gay, watch, video, zendaya, makes, gay, httpstcoykrhsstcuw), List(rt, sir_saydaat, son, approves, httpstcoiztyuq0lju), List(rt, realdonaldtrump, soon, time, choose, keep, build, upon, 
prosperity, success, let, go, respected), List(time, go2019, tell, us, tales, amp, share, topic, suggestions, ai, zettabytes, skys, clou, httpstco8blidt8agx), List(onmyoji, arena, httpstcoglipsxnlf6), List(strive, provide, outstanding, learning, environment, connected, community, celebrate, cult, httpstcofynl1sgolt), List(18, httpstcosonargtcyr), List(xa, plsssss), List(httpstcoksuea5ph5j), List(islamic, extremely, worst, nightmare, probably, football, cant, even, mention, know, power), List(rt, solelunastro, sun, leo, moon, leo, mercury, cancer, venus, leo, mars, leo, take, whew, wh), List(give, up55555555555555), List(rt, idillionaire, august, going, breakthrough, month), List(rt, doctor_oxford, 21, billion, spent, 420k, hip, ops, 45k, nurses, 28k, doctors, 2100, ct, scanners, 6, entire, nhs, hosp), List(poshpresh, poshpresh), List(szavante, aint, tired, yet, ass, never, sleep), List(rt, _hermie29, humanda, ka, na, reign, makukulong, ja, na, itssophiealbert, bihagrebelasyon, kapusobrigade, encabattalionkb, httpstcoyb3xhm), List(rt, pierrebotte1, feel, good, black, httpstcodx4dfeg7gs), List(shake, like, coppa, x, currency, risenjam, thegoodmorningshow, w, wongiwongi, x, e_bukkie, x, httpstco372yngcjpi), List(rt, cxmeronmacc_, facebook, gift, keeps, giving, httpstconref4syowo), List(100correct, sadly), List(refresh, pressure, cleaning, google, httpstcoj7vv2olrbw), List(biggest, fans, week, jenmishy, sorrowfuldean, cozmiccowboy1, thank, via, httpstcoizzh3o5fjh, httpstcocoki5c41b2), List(dhivehinge, leybonee, storm, dhiraagu, silentgethering, friday, 16hrs, 2august, batelco, httpstcowrgyvqmbkc), List(notaluckygrl, astagaaku, kira, isinya, yg, sama), List(aysebulut334, merhaba, iyi, gnler, dilerim, aye, hanim), List(rt, visunavi, diaura, 81ndgofficial, siteofficial, funclub, 1016finalelast, rebellio), List(another, day, august, 01, 2019, 0400pmlovemarriottrewards, mrpointsrenhotels, mrpoints, autographhotels, mrpoints), List(rt, nctconfess, nctea, haechan, hi, nct, 
12dream, httpstcopdivtktsk2), List(rt, guclunecmi2, paths, leave, two, strangers, matter, hard, hard, everything, yes, must, forget), List(hello, august, weve, expecting, httpstcoehqrsoqyq4), List(rt, greying_, httpstcobjxv1xfi8f), List(rt, fanbookofficial, nct, dream, boom, httpst), List(rt, lowlifekev, mood, cause, life, really, beating, ass, httpstcoamvpi97dao), List(rt, callmeshylo, praising, furu, call, top, didnt, rt, baby, put, hard, work, give, u), List(new, networking, group, ipswich, growing, want, attend, httpstcotsqsnd8zqa), List(__fy), List(rt, fnjpnews, x, httpstcowugsmyou45), List(lovilyguks, gooegie, lets, go, clowns), List(rt, bradbirda113, 20, years, ago, today, iron, giant, premiered, graumans, chinese, theater, hollywood, ig, team, proud, wed, ma), List(rt, hottestcapital, thinks, handle, add, snapchat, show, httpstcotkml3urpq1), List(rt, cl69420, last, two, brain, cells, trying, math, bretmanrock, nikitadragun, httpstcorjmwqt6f8t), List(dogmanrespecter, examiner, underclothing), List(rt, fnbrleaks, vaulted, items, season, x, ballers, quad, crashers, flint, knock, pistols, shadow, bomb, semiauto, snipers, tactical, assau), List(rt, opsylabs, easter, around, corner, please, patronize, mom, stay, shagam, amp, environs, sells, fruit, winesjuicesoft), List(rt, oricon, yoshiki1000, httpstco9terqqufjs, yoshiki, yoshikiofficial), List(rt, 054758373, httpstcoj46qm6xxlk), List(ow_pote, lynn_grbr, blackpants34), List(rt, canpnn, amecha_0312, httpstconmvgza1b), List(mashaallriaz, rhubarbsncarbs, umm), List(rt, cimmarley, gnaydn, httpstcoknaw6g9d5h), List(rt, pyenon, khemsterven, cute, boy), List(accessmostareas, janinemayjames, british, courts, dont, use, gavel, auctioneers), List(phonepe_, totally, server), List(rt, muzvarebetty, lest, forget, lost, young, men, women, 1, august, 2018, zimbabwe, army, responsible, loss, lives, thi), List(rt, goal, fifa, revealed, nominees, thebest, mens, player, cristiano, ronaldo, frenkie, de, jong, matthijs, de, ligt, 
eden, h), List(rt, adamconover, bizarre, myth, basis, medical, science, trivial, disprove, modern, research, tells, us, women), List(rt, ashtanofcpang, fave, batch, ultimate, threats, tanredroncal, ashdlmundo, ashtan, isourchoice, ashtan, isourchoice, ashtan, isourc), List(rt, tolusaba, shut, absolute, fuck, httpstcofftrx2gfnh), List(free, newspaper, wordpress, themes, news, magazine, blog, websites, httpstcojs4cghhnnj, wordpress, themes, httpstcot8m0euoegj), List(httpstcofww1ownyvc), List(rt, unsubtledesi, ubereats_ind, tweeted, saying, zomatoin, said, food, religion, uber, silent, liberal), List(rt, btsvotingcrew, ltvideo, music, awardsgt, dont, forget, vote, boys, vmas, 10, votesaccount, per, day, power, hour, voting), List(ios, 13, beta5, 10, httpstcolynx5he1nq), List(rt, blissbooksph, another, prince, arrived, prince, hell, devils, stolen, heritage, vixenneanne, available, https), List(back, regularly, scheduled, programming), List(rt, inspirestagram, dont, afraid, losing, someone, doesnt, feel, blessed), List(ladyjubes, ever, penthouse, living, wont, come, anyone, else, except, wont, eat, anyone, else, p, httpstcocuamflfmh9), List(rt, tidal, official, nickiminaj, remix, popsmoke10s, welcome, party, look, tidalxpopsmoke, https), List(rt, dailybennet, stills, chloe, bennet, daisy, johnson, two, part, finale, agentsofshield, airing, friday, 22, httpst), List(earlier, month, essex, hosted, national, landscapes, life, conference, bringing, together, voices, representing, httpstco65iwnznpoh), List(rt, savvyrinu, august, filled, blessings, august, filled, blessings, august, filled, blessings, august, b), List(aklile_solomon, fear, forming, separate, party, may, marginalize, womens, political, participation, push, elect, httpstcosbrxtknux7), List(greyhound, station, 4, jerry, springer, marathon, playing, monitors, food, options, httpstcozizkfego9o), List(instagram, httpstcoieplcezi0y, backgroundmusic, royaltyfreemusic, musicformedia, musicforvideos, 
httpstco4fsau9kzpu), List(biggest, fans, week, itvcorrie, bairstowliam, zoraidapalacios, thank, via, httpstcovyj8ccsrwp, httpstco9wwjlqsbj7), List(rt, shino__hajime_, 1rt_, 2rt_, 3rt_, 4rt_, 5rt_, 6rt_, 7rt_, 8rt_, 9rt_), List(gs), List(rt, layzhang, stay, cool, everyone, icecream, httpstco3poxqr6pjl), List(rt, selenacarti, miss, pnd, httpstcowxpapdbwm5), List(rt, llcupidll, 1, mtvhottest, bts, bts_twt, https), List(rt, ultmino_, found, 8d, versions, minos, im, mino, httpstcoxu5kxzyqah, httpstcou4kzcf4p9y), List(rt, erodingarchaeo1, exciting, day, amazing, archaeology, swandroorkney, excavation, complex, multi, period, site, destroye), List(lizkershawdj, nicolasturgeon, narrow, minded, tweet, liz, head, hot, wash), List(rt, weverseforbts, weversetrans, jin, bts_twt, op, world, wide, lot, golden, hands, talented, people, jin), List(rt, gisexllee, told, boyfriend, show, pictures, outfits, ordered, sure, expecting, httpstco9), List(aar494, problem, constantly, linked, players, causes, hope, desire, seeing, thes, httpstcovppvhyktpr), List(rt, fanbookofficial, nct, dream, boom, httpst), List(einstkells, chanez_nene, bbnaija, stood, leave, going, ease, himselfshola, wrongother, hoh, httpstcocbfirql5uk), List(throwbackthursday, former, employee, bill, millss, restoration, downton, erf, bill, showcasing, fully, restore, httpstcocb4ri65sgr), List(teaqah, quote, tu, lebih, kepada, nak, tayang, muka, actually), List(rt, autotls2, tls, mutualan, yuk, rt, ya, fb, jan, lupa), List(fortnitebr, excited), List(rt, dailyhangyul, cheekies, httpstcocdoi1xjo7m), List(rt, tundetash, ever, wanted, score, resume, see, dashboard, analysis, presentation, errors, amp, key, pointers, improv), List(par1395, httpstcovyamed0ecs), List(rt, syas_ow, never, forget, still, scared, httpstcozsxztjjhsr), List(latest, big, time, city, daily, httpstcop3lj7b3ula), List(rt, lkmats, highest, national, debt, ever, boy, trump, really, good, bankruptcies, hes, good, robbing, people, give), List(percentage, 
uks, exports, imports, australia, count, exactly), List(bidmake, im, talking, tweet, making, sense, even, get, 56, predicted, things, httpstcoasjw9w22zc), List(day, 22, spotify, psycho, stayup, ice, queen, amp, diamond, bayagnibaekhyun, b_hundred_hyun, layzhang, httpstcogwyombiml4), List(rt, vishweshwarbhat, good, move, chief, minister, bsybjp, today, paid, surprise, visit, offices, ground, floor, vidh), List(rt, trodrawle, httpstcooatsuq2lff), List(rt, fourthraybeauty, family, affair, giveaway, welcoming, solbodyco, family, giveaway, enter, win, face, mi), List(rt, tatjana_loor, xo_mids, im, stealing), List(rt, btsanalytics, blood, sweat, amp, tears, joins, dna, fake, love, boy, luv, become, bts_twts, 4th, song, surpass, 200, million, global, sp), List(rt, rapplerdotcom, kathryn, bernardo, alden, richards, receiving, congratulatory, messages, abscbn, gma, 7, celebrities), List(rt, brgnhp, thread, jr), List(rt, fanbookofficial, nct, dream, boom, httpst), List(mmm, chips), List(bbcworld, defector, trusted, planted, spy, absolutely, wierd, leave, without, co, httpstco1h9h0fijqa), List(rt, ldontgiveafuck, sex, intimate, sacred, body, temple, shouldnt, share, anyone, doesnt, listen, kehlani, h), List(im, playing, identity, v, fancy, game, httpstcooctjbldq5t), List(rt, weverseforbts, weversetrans, jin, bts_twt, op, see, bts, real, life, least, regularly, every, 6, months, many, m), List(rt, youthawc), List(rt, gretchenunder18, whatsapp, hookup, cool, girls, interested, drop, number, dm, group, link), List(gtjooajnzzuugde, gamewith_fn, youtube), List(rt, fanbookofficial, nct, dream, boom, httpst), List(badrista7, moath_1919), List(scroll_in, terrorists, respect, antiterrorist, officer, natural, phenomenon), List(rt, jdaiey, youre, reading, successful, dont, give), List(rt, pablovazquez_, backpack, got, stolen, laptop, camera, passport, things, brazil, venezuela, mxico), List(rt, prettygirlkaee, dont, know, yall, cant, sit, toilet, im, wet, httpstcokbviymhgfh), 
List(legend, like, say, coz, kkk, expressive, even, stand, n, look, beautiful, scene, httpstco4cf1f5nqjc), List(rt, r4visingh, town3rdkit, alysias1ngh, proud, launching, kit, htafc, seanmjarvis, team, thanks, katemallin1, amp, char), List(httpstco7pfqfw7yhi), List(postmixigreeskypelinemixigreelineskypedm), List(fooo), List(rt, chinesetvshow), List(rt, zoclvokua2tptwx, __), List(rt, bringbackkalas, oh, goddddd, pedro, done, salzburg, 14, chelsea, httpstcoutdlxo1ste), List(rt, humorandanimals, sarah, special, needs, walks, dance, hopes, encourage, best, version, sup), List(friendship, never, favour, never, beg, one, think, feels, re, u, favour, ur, friend, httpstcoqltmgnemhb), List(rt, gradle, reminder, need, new, gradle, logos, plugin, website, blog, post, conference, talk, download), List(rt, eplbible, lukaku, gets, juventus, training, ronaldo, gives, ball, httpstcog5vnt7ehbr), List(shamaiyak, lmao, coming, juicy, crabs, movies), List(fortnitebr, time, wait, 4, hours), List(wayv_official, hadeh, ini, anak, pejaten, village, cobain, kuy), List(rt, wings_phoenix), List(fnjpnews, guille_gag), List(rt, asthedearr, markleelioncub, httpstcowofaq4yoab), List(fortnitegame, happening, bois), List(rt, sproutlyohan, lee, sejin, ig, boyfriend, want, httpstco3rkijixal5), List(rt, hyphynano, nothings, better, cracking, tf, outta, foward, httpstcoljqd5ecqpl), List(hotytylcr, happy, birthday), List(smithysu43, skynews, borisjohnson, laid, 20000, 6000, cover, natural, wastage, resignatio, httpstco9uhkc4d4uq), List(31k_followers, keep, rocking, httpstcom8kwoasqaa), List(rt, d2megaten, 10050, rt819, httpstcog5l2be0ccu, 88d2), List(dapump53585120, dapump), List(rt, _konatsuami, httpstcokjlc3uy3gh), List(rt, animatehonten, 7f731cd, 6th7f, 2l), List(need, learn, lemon, crusades, disenchantments, season, 2, cities, decimated, part, httpstcokrdwp2yjmw), List(tgiseok, us), List(sad_omega, psycrowforprez, thank, im, happy, boyfriend, adorable), List(imdone_waitin, dont, know, aint), 
List(rt, marleen032013, good, morning, good, day, httpstcozsbv6o5ev3), List(wanna, boy, girl), List(biggest, fans, week, charlo51691592, thank, via, httpstcoyao5hu9lty, httpstco08v3sdlnhd), List(httpstcomdrswibtpf), List(holding, application, form, btch, started, raining, stress), List(sarap, makakita, ng, good, carry, sa, test, paper, sa, calculus, sarap, sa, eyes), List(httpstcocpnps03rrc), List(rt, sindivanzyl, paternitytest, saliva, swab, test, available, toga, labs, consent, mother, needed, case, going, court, r1), List(19, trash), List(rt, emojimashupbot, smilingthreehearts, unamused, httpstcoyyxl2ohhfg), List(rt, jeremycorbyn, perhaps, nhs, money, back, sued, httpstcomziqks2cqz), List(new, post, london, stock, exchange, buy, refinitiv, 27, bln, published, fundswift, httpstcohvgcdssshn), List(rt, royalbiink, since, army, love, screaming, mediaplay, youtube, ads, let, remind, bts, used, use, bigbang, tag, thei), List(rt, getfandom, comicbook, thor, spinoff, rather, see), List(rt, adrii_n2, one, shawn, mendes, camila, cabello, public, httpstcolvocy2jita), List(hotpointsupport, jlandpcustserv, yes, repair, accept, many, opportunit, httpstcopqe1ddqqdc), List(rt, sunshlnekth, yall, dont, give, wishing, star, credit, deserves, song, masterpiece, yall, pay, dust, httpstcolv), List(kim, jongun, repeatedly, expressed, satisfaction, result, testfire, said, kcna, httpstcokrbe80sfc0), List(httpstcovmgh5cu7sp, todays, weather, 900, whaley, bridge, peakdistrict, weather, whaleybridge, httpstco5ew0sewdm7), List(actually, people, like, piss, really, piss, kid, aint, gonna, remember, trip, u, httpstcoihuvs78lpt), List(fan, cy), List(17, satanfps), List(evening, five, oclock), List(rt, noahcent, im, even, religious), List(rt, nhk_news, nhk_news, httpstcoe8c0ia6xgf), List(rt, jaojho, flirt, date, date, marry, flirt, date, date, ghost, httpstcoyujwu120x5), List(rt, getup, theyve, paid, weekly, rent, people, newstart, almost, nothing, left, meet, basic, needs, isnt), List(rt, 
tais_affin, hello, august, good, morning, friends, httpstcouisngqdxqx), List(rt, wamagaisa, names, silvia, maphosa, 53, ishmael, kumire, 41, gavin, dean, charles, 45, jealous, chikandira, 21, brian, zhuwao, 26, chal), List(rt, snowrealm, fella, joke, around, character, card, done, korboryn, featuring, barbarian, class, character, griff, ht), List(another, day, august, 01, 2019, 0400pmrewardspoints, membersgetit, marriottrewards), List(clim8resistance, every, years, unimaginably, wealthy, climate, salvationists, virtuesignalling, celebs, rentseeking, httpstco7xipjsodsf), List(rt, askanshul, 2016, ujjain, madrasas, refuse, midday, meals, hindu, organisations, hmmmmm, food, religion, consum), List(rt, gabselin, fucking, moron, decided, leak, aas, assets, og, politcal, views, staff, give, shit), List(good, morning, spot, whats, wrong, picture, theres, coffee, cup, yetbut, read, httpstcoqumw6x95yz), List(rt, iroirokininaru, lovdeblov), List(rt, atshopde, rtamp, 50, httpstcofkwwfmfpeb), List(rt, fgoproject, fategrand, order1034, fgo, httpstcoycswlbyd3g, httpstco), List(rt, jalensyourking, coercion, rape, said, yes, 15th, time, asking, doesnt, mean, wanted, httpstcoqi), List(rt, cfcpys, christian, pulisic, going, turn, fantastic, signing, chelsea, fast, dribble, run, behind), List(movie, starring, hogan, wed, happy, watch, kids, httpstconvna4w2kce, wwe, wwa, badmovies), List(great, idea, need, dig, cool, race, tshirts, arent, full, adverts), List(rt, lootybag, personalised, swim, bags, drawstring, backpacks, pe, bags, handmade, cotton, canvas, waterproof, lined, embroidered), List(right, protiviti, view, talking, companies, changing, way, get, work, done, httpstcovilipppegx), List(httpstcofw8ovlftdr, lian, chos, cute, water, colour, illustration, lighten, day, httpstcomyjlzpgyx2), List(httpstcol3newvpcrw), List(rt, ss__34_, rt, follw, n, httpstcolqtsr2cyof), List(rt, moneybaggyo, u, gotta, quick, putting, yo, trust, people, u, kno, fold), List(rt, tennistv, victory, 
achieved, yoshihitotennis, battles, past, david, goffin, 675, 62, 765, reaches, round, 16, first), List(rt, cherzinga, people, think, gamer, girls, game, vs, actually, game, httpstcos542iczipq), List(rt, rangerpikachu, araw, araw, laging, nasa, ph, trends, ang, ht, natinang, saya, lang, ang, swerte, naming, keepers, sa, solid, k, amp, solid, amay, ibang), List(rt, ustrwa, feel, disgusted, n, dejected, politicl, apathyaftr, runng, fm, pillar, 2, post, jst, 2, gt, road, repaired, pol, reppd, heed), List(rt, na_tha_niel, create, another, app, called, sco, pa, tu, manaa, yall, leave, twitter, go, sco, pa, tu, manaa, hell), List(congratulations, designstrapz, winning, battle, pass, giveaway), List(rt, juanlovescock, enough, talk, lets, get, business, bro, senior_nobody2, httpstco8xftwef77e), List(rt, smanthaax, girl, soulmate, im, losing, hope, httpstcok9o5cxlgpi), List(rt, fentyfairies, emptypools, httpstco3ecrn4akiv), List(know, thats, answer, youd, like, give, answer), List(great, ruling, immigrant, asylum, barr, httpstcoqzwxwhuxn7), List(rt, kevinwada, bucky, barnes, natasha, romanoff, redesigns, funsies, nomoresleeves, httpstcov0h86ucacz), List(rt, john_houseof308, father, son, goal, funniest, thing, youll, ever, see, today, happy, new, month, httpstcoqfzas8sbma), List(club, employs, stadium, managerwhose, job, manage, stadium, budget, setwhy, isnt, fau, httpstcoceaz8ptedx), List(found, old, passport, photos, couldnt, help, compare, tiger, transformation, year, c, httpstco1rlvugocnd))","List(List(harris), List(), List(), List(sam56786260 zitudiary), List(), List(01082019 085801, johnson), List(), List(), List(), List(), List(), List(), List(), List(), List(2016, two), List(), List(today), List(), List(20th june 2019, susann), List(), List(86ajo, one, 9 pm to 5 am), List(), List(), List(), List(3500, today), List(), List(), List(), List(80), List(), List(), List(), List(), List(harris, 14 years), List(), List(8, 02, june, 4year, narendramodi ke), List(), List(), 
List(1st aug 2019), List(), List(225000, 325000), List(23 2019 2019 81), List(), List(), List(), List(), List(), List(97 days), List(), List(31 31 81100), List(), List(x101 1014), List(0900, 187c 10155 hpa 88 hum, 29 mph, 00 mm), List(497d 645 2, 276 300 302, 497d 818 4, 006), List(), List(), List(2019), List(), List(), List(), List(), List(80), List(mydearjanx httpstcocz4ioeqhhp), List(), List(1000, 188c, 274 kmh, 10140 hpa, 16 mm), List(q2, 10am), List(), List(), List(10 seconds), List(mueller, the day), List(the seventh day), List(), List(), List(one), List(9), List(), List(), List(two), List(), List(), List(), List(), List(), List(), List(), List(), List(5, 2, 4, 5), List(), List(6000), List(), List(), List(), List(), List(0948, 186c, 69, 117c, 10175 hpa, 0 kmh), List(), List(), List(), List(), List(), List(), List(181111), List(), List(), List(today), List(), List(), List(), List(01 2019), List(), List(350m), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(4), List(), List(), List(first), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(81, 2019 2019), List(), List(51on, 1), List(), List(), List(), List(august 01 2019, 0400pm), List(), List(one), List(monday, monday, the 4 next days), List(), List(one), List(), List(), List(), List(), List(9gag), List(), List(), List(), List(26 96), List(), List(three years), List(), List(), List(), List(), List(), List(), List(), List(2442 44mm2), List(), List(555555555), List(), List(3 year old), List(), List(), List(monday morning), List(), List(15, 16), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(today), List(), List(2, 7), List(8), List(tae_20130613, 10000), List(0100 94550, pm2534, o19), List(), List(), List(), List(), List(), List(op109 edward hopper 18821967), List(0851, 179c, 82, 14c, 10153 hpa, 0 mm), List(isaiah rashad f jean), List(), 
List(august 01 2019), List(), List(100), List(), List(), List(), List(), List(1, 2, only 10, 3), List(), List(), List(), List(), List(), List(5 6 301set 2000, 2days, 8182), List(), List(1130_1026), List(), List(), List(), List(3c5u3r5r3y5), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(65), List(), List(), List(1200), List(ebekun123456 oreratuyoi), List(another day its august 01 2019, 0400pm one more day), List(), List(400), List(), List(), List(64), List(), List(), List(), List(), List(), List(), List(), List(rina67708264), List(4kumi0 fgo fatego httpstcon2rqt3czdb), List(), List(), List(today evening, 7pm), List(), List(), List(), List(), List(season), List(1), List(), List(), List(), List(), List(2018), List(), List(), List(2), List(), List(26th june), List(), List(), List(1100), List(), List(), List(60020), List(), List(), List(), List(thursday 01 august 2019 0858, 168c, 3 mph, 8 mph, 86), List(), List(), List(), List(), List(2019), List(), List(), List(), List(tomorrow), List(), List(), List(), List(), List(tulag1996), List(this month), List(five years), List(), List(), List(), List(810831, 6), List(295), List(), List(), List(300m 400m 500m 600m, 700m), List(), List(august 01 2019, 0300am oclock of the morning), List(), List(5 10), List(), List(one, one, one), List(), List(), List(), List(), List(26), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(2077), List(_a1dan), List(), List(), List(), List(), List(), List(), List(), List(1010), List(), List(), List(), List(october 17 last year), List(), List(today), List(august), List(), List(today), List(), List(), List(jual kalo), List(), List(), List(114), List(jennie), List(), List(), List(), List(), List(), List(), List(729), List(), List(), List(), List(), List(), List(), List(), List(first), List(), List(150), List(2ton89hina), List(), List(31 31 81100), List(), List(), List(), List(one), List(190801 12, 5, 5 mont), List(), List(), 
List(), List(190731), List(41500), List(), List(), List(7), List(yuha_twice a4), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(nel1011), List(), List(), List(), List(1981, third, 5, two, 22 years), List(one), List(), List(), List(), List(), List(), List(ddm_the bj), List(), List(oyebandhu bandhureviews), List(), List(one), List(), List(), List(democrats), List(this week), List(), List(25), List(), List(), List(), List(), List(652 million, 652), List(), List(), List(), List(), List(oct 28 2019, more than a million), List(), List(), List(), List(bjkwqghukgc7pwj amphttpstconfourk2nwn httpstcowsqfrgx9fq), List(), List(), List(20190721 20190727 1, about you 54157121, 23, four, 16187081, 55), List(), List(127, 1st, 127), List(), List(7 4), List(joy997fm mahama, six), List(), List(), List(), List(), List(), List(800 pm, 4, 20, 100440), List(21st), List(), List(5), List(110, 11), List(the one day), List(), List(), List(first), List(), List(), List(01 2019), List(), List(), List(), List(last night), List(this week), List(), List(), List(d2binfinity2019), List(), List(2019), List(), List(), List(), List(), List(), List(), List(), List(), List(01 2019), List(), List(), List(), List(1), List(), List(), List(), List(), List(at least three, four), List(one), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(one), List(half day), List(), List(), List(), List(), List(2 56), List(), List(), List(yesterday, the end of a day), List(millions), List(jay amp), List(), List(), List(), List(), List(2 mil wey), List(), List(100), List(), List(), List(), List(), List(temenan yok), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(first), List(), List(81 1800 3000  1500), List(), List(august 01 2019), List(), List(), List(six year old), List(), List(30), List(), List(), List(4), List(), List(01 2019), List(this week), 
List(), List(this week), List(), List(), List(zendaya), List(), List(), List(go2019), List(), List(), List(18), List(), List(), List(), List(), List(), List(), List(21 billion, 420k, 45k, 28k, 2100, 6), List(), List(), List(), List(), List(), List(), List(100correct), List(), List(this week), List(16hrs 2august batelco), List(), List(), List(), List(august 01 2019), List(), List(two), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(20 years ago, today), List(), List(cl69420, two), List(), List(season), List(), List(), List(054758373), List(), List(canpnn amecha_0312 httpstconmvgza1b), List(), List(), List(), List(), List(), List(1 august 2018), List(), List(), List(), List(), List(), List(), List(), List(10), List(13 beta5 10, httpstcolynx5he1nq), List(), List(), List(), List(), List(), List(daisy johnson, two, this friday 22), List(month), List(), List(), List(4 am), List(), List(this week), List(), List(), List(), List(), List(1), List(), List(), List(), List(jin), List(), List(), List(), List(), List(), List(tu lebih, muka), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(56), List(day 22), List(today), List(), List(), List(), List(4th, 200 million), List(kathryn bernardo, alden, 7), List(), List(), List(), List(), List(), List(), List(every 6 months), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(), List(14), List(), List(), List(), List(lukaku), List(), List(4 hours), List(), List(), List(), List(), List(), List(ig), List(), List(), List(20000, 6000), List(), List(10050 rt819, 88d2), List(), List(_konatsuami httpstcokjlc3uy3gh), List(7f731cd), List(2), List(), List(), List(), List(morning), List(), List(this week, charlo51691592), List(), List(), List(), List(), List(), List(19), List(), List(), List(27 bln), List(), List(), List(), List(), List(), List(kim jongun), List(900), List(), List(), List(17), List(five oclock), 
List(), List(nhk_news nhk_news httpstcoe8c0ia6xgf), List(), List(weekly), List(), List(53, 41, 45, 21, zhuwao, 26), List(), List(august 01 2019), List(), List(2016), List(), List(), List(), List(50), List(), List(15th), List(), List(hogan), List(), List(), List(), List(), List(), List(), List(), List(675 62 765, 16, first), List(), List(araw araw), List(2, 2 gt), List(), List(), List(), List(), List(fentyfairies emptypools httpstco3ecrn4akiv), List(), List(barr httpstcoqzwxwhuxn7), List(), List(today), List(), List())"


### Algorithm implementation - BNGRAM

Now that we have our superdocuments, with the clean tokens per time window (and the entities, if NER was applied), the next steps in the implementation of the BNgram algorithm are the following:

- **Generate n-grams:** Create n-grams (sequences of `n` words) from `super_documents` in the aggregated data.
- **Compute temporal relevance:** Calculate relevance scores for n-grams over time to identify trends, based on the frequency of occurrence in each time window, taking into account the current and past frequency in each window.
- **Boost named entities:** Boost n-grams containing named entities if `apply_ner=True`; otherwise the boosted score is the same as the relevance calculated in the previous step.
- **Cluster n-grams:** Group similar n-grams using KMeans clustering for a certain number of clusters (`num_clusters`), first converting the words into vector embeddings to create numerical features which can be clustered.
- **Rank topics:** In the final steps we rank n-grams within clusters based on boosted scores and relevance.

These ranked n-grams per cluster will be the final result. Each cluster is meant to represent a group of semantically similar topics, and within each group the specific n-grams are ordered by relevance to show the most popular topics in that cluster.

Let's go through each of the functions that compose the overall algorithm.


We generate n-grams (sequences of n words) from text data.
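
To make the sliding-window idea concrete before the Spark version, here is a minimal plain-Python sketch. The helper name `sliding_ngrams` is hypothetical and is not part of the notebook; it only illustrates the same logic on an in-memory list of tokens.

```python
def sliding_ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens`, joined by single spaces."""
    # range() is empty when len(tokens) < n, so short inputs yield no n-grams
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy input: the tokens of one cleaned tweet from the data shown above
tokens = ["rt", "fanbookofficial", "nct", "dream", "boom"]
trigrams = sliding_ngrams(tokens, 3)
print(trigrams)
```

Note that the Spark function flattens all tweets of a window into one token array before sliding over it, so some of its n-grams span the boundary between two consecutive tweets.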

In [0]:
from pyspark.sql.functions import col, explode, flatten, sequence, concat_ws, expr

def generate_ngrams(aggregated_data, n=5):
    """
    Generate n-grams from aggregated super_documents.

    Parameters:
    - aggregated_data: Spark DataFrame with `super_documents`.
    - n: The size of n-grams to generate.

    Returns:
    - DataFrame with `window` and generated `ngrams`.
    """

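    # Flatten super_documents to a single array and keep only windows with at
    # least n tokens; otherwise sequence(1, size - n + 1) would count downwards
    # and slice() would receive an invalid start position
    processed_data = aggregated_data.withColumn(
        "flat_documents", flatten(col("super_documents"))
    ).filter(expr(f"size(flat_documents) >= {n}"))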

    # Generate n-grams for each document
    ngrams_df = processed_data.withColumn(
        "ngrams",
        explode(
            expr(f"""
                transform(
                    sequence(1, size(flat_documents) - {n} + 1),
                    pos -> concat_ws(" ", slice(flat_documents, pos, {n}))
                )
            """)
        )
    ).select("window", "ngrams")

    return ngrams_df



In [0]:
ngrams_df = generate_ngrams(aggregated_tweets)

In [0]:
ngrams_df.printSchema()

root
 |-- window: struct (nullable = false)
 |    |-- start: timestamp (nullable = true)
 |    |-- end: timestamp (nullable = true)
 |-- ngrams: string (nullable = false)



We calculate a temporal relevance score for each n-gram.
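
As a worked illustration of the score computed in the next cell, `current_count / (average_count + 1)`, here is a plain-Python sketch on toy data. The function name `relevance_scores` and the toy counts are assumptions made for this example only.

```python
from collections import Counter, defaultdict


def relevance_scores(ngrams_per_window):
    """Map (window, ngram) -> current_count / (average_count + 1)."""
    counts = {w: Counter(ngrams) for w, ngrams in ngrams_per_window.items()}

    # Average count per n-gram over the windows in which it occurs
    per_ngram = defaultdict(list)
    for window_counts in counts.values():
        for ngram, c in window_counts.items():
            per_ngram[ngram].append(c)
    averages = {ng: sum(cs) / len(cs) for ng, cs in per_ngram.items()}

    return {
        (w, ng): c / (averages[ng] + 1)
        for w, window_counts in counts.items()
        for ng, c in window_counts.items()
    }


# Toy data: one n-gram seen once in window 0 and five times in window 60
toy = {0: ["nct dream boom"], 60: ["nct dream boom"] * 5}
scores = relevance_scores(toy)
print(scores)
```

With this formula, an n-gram that occurs exactly once in a single window scores 1 / (1 + 1) = 0.5, so scores only move away from 0.5 for n-grams that repeat.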

In [0]:
from pyspark.sql.functions import col, avg, count, unix_timestamp, struct

def compute_temporal_relevance(ngrams_df):
    """
    Compute temporal relevance scores for n-grams.

    Parameters:
    - ngrams_df: DataFrame with `ngrams` and `window`.

    Returns:
    - DataFrame with `ngrams`, `window_start`, `current_count`, `average_past_count`, and `relevance_score`.
    """

    # Convert window.start to a numeric timestamp
    ngrams_df = ngrams_df.withColumn("window_start", unix_timestamp(col("window.start")))

    # Count occurrences of n-grams in each window
    current_counts = ngrams_df.groupBy("window_start", "ngrams").agg(
        count("*").alias("current_count")
    )

    # Calculate the average count of each n-gram across all windows
    # (note: as written, this average also includes the current window)
    past_counts = current_counts.groupBy("ngrams").agg(
        avg("current_count").alias("average_past_count")
    )

    # Join current and past counts
    relevance_df = current_counts.join(
        past_counts, on="ngrams", how="left"
    ).withColumn(
        "relevance_score",
        col("current_count") / (col("average_past_count") + 1)  # Avoid division by zero
    )

    return relevance_df



In [0]:
relevance_df = compute_temporal_relevance(ngrams_df)

In [0]:
relevance_df.printSchema()

root
 |-- ngrams: string (nullable = false)
 |-- window_start: long (nullable = true)
 |-- current_count: long (nullable = false)
 |-- average_past_count: double (nullable = true)
 |-- relevance_score: double (nullable = true)



Here we aim to enhance the relevance scores of n-grams containing named entities when NER is applied.
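
The boosting rule itself is simple and can be sketched in plain Python. The helper `boost` below is hypothetical; it approximates the SQL `like` test used in the next cell with a substring check (unlike SQL `like`, it does not treat `%` or `_` as wildcards), and it uses the same factor of 1.5.

```python
def boost(ngram, relevance_score, entities, factor=1.5):
    """Multiply the score by `factor` if any entity string occurs in the n-gram."""
    # Plain substring match; empty entity strings are ignored so that
    # they do not boost every n-gram
    has_entity = any(e and e in ngram for e in entities)
    return relevance_score * factor if has_entity else relevance_score


# Toy scores; the entity string is taken from the NER output shown earlier
boosted = boost("kim jongun testfire", 0.5, ["kim jongun"])
unchanged = boost("mmm chips today", 0.5, ["kim jongun"])
print(boosted, unchanged)
```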

In [0]:
from pyspark.sql.functions import col, struct, from_unixtime, expr, explode, concat

def boost_named_entities(relevance_df, aggregated_data, apply_ner=True, time_window_seconds=300):
    """
    Boost relevance scores for n-grams containing named entities.

    Parameters:
    - relevance_df: DataFrame with computed relevance scores.
    - aggregated_data: Original aggregated data with `entities_list`.
    - apply_ner: Boolean indicating whether to apply NER boosting.
    - time_window_seconds: Window duration in seconds (default: 300 seconds).

    Returns:
    - DataFrame with boosted scores.
    """
    if not apply_ner:
        # If NER is not applied, return relevance_df with a boosted_score column equal to relevance_score
        boosted_df = relevance_df.withColumn("boosted_score", col("relevance_score"))
        return boosted_df

    # Step 1: Explode entities_list into individual entities and ensure the entity is a string
    entities_df = aggregated_data.select(
        "window",
        explode(col("entities_list")).alias("entity")
    ).withColumn(
        "entity", col("entity").cast("string")  # Ensure entity is a string
    )

    # Step 2: Reconstruct `window` column in relevance_df
    relevance_df = relevance_df.withColumn(
        "window",
        struct(
            from_unixtime(col("window_start")).cast("timestamp").alias("start"),
            (from_unixtime(col("window_start")).cast("timestamp") + expr(f"INTERVAL {time_window_seconds} SECONDS")).alias("end")
        )
    )

    # Step 3: Perform the join and calculate boosted scores
    boosted_df = relevance_df.join(
        entities_df, on="window", how="left"
    ).withColumn(
        "boosted_score",
        expr("relevance_score * (case when ngrams like concat('%', entity, '%') then 1.5 else 1 end)")
    )

    return boosted_df



In [0]:
boosted_df = boost_named_entities(relevance_df, aggregated_data=aggregated_tweets, apply_ner=True)

We cluster n-grams into topics using Word2Vec embeddings and KMeans clustering.

In [0]:
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import Word2Vec
from pyspark.sql.functions import collect_list, explode

def cluster_ngrams(boosted_df, num_clusters=5):
    """
    Cluster related n-grams into topics using Word2Vec and KMeans.

    Parameters:
    - boosted_df: DataFrame with boosted relevance scores.
    - num_clusters: Number of clusters for KMeans.

    Returns:
    - DataFrame with clusters assigned to n-grams.
    """
    # Aggregate n-grams into lists for each window
    ngram_docs = boosted_df.groupBy("window_start").agg(
        collect_list("ngrams").alias("ngram_list")
    )

    # Fit Word2Vec model to get embeddings
    word2vec = Word2Vec(vectorSize=50, inputCol="ngram_list", outputCol="features")
    w2v_model = word2vec.fit(ngram_docs)

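    # Transform each window's n-gram list into one embedding
    # (each n-gram string is a single Word2Vec token; the vectors are averaged per window)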
    ngram_embeddings = w2v_model.transform(ngram_docs)

    # Apply KMeans clustering
    kmeans = KMeans(k=num_clusters, featuresCol="features", predictionCol="cluster")
    kmeans_model = kmeans.fit(ngram_embeddings)

    # Assign clusters
    clustered_df = kmeans_model.transform(ngram_embeddings)

    # Explode ngrams and add cluster labels
    final_df = clustered_df.select(
        "window_start", explode("ngram_list").alias("ngram"), "cluster"
    )

    return final_df


In [0]:
clustered_df = cluster_ngrams(boosted_df)

We rank clusters (topics) based on the average boosted scores of their n-grams.
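
The ranking step can be sketched in plain Python as follows. This is an illustration under assumptions, not the notebook's implementation: the function name `rank_clusters` and the scores are made up, and the sketch sorts the n-grams inside each cluster explicitly, whereas `collect_list` in the Spark version gives no guarantee about the order of the collected n-grams.

```python
from collections import defaultdict


def rank_clusters(rows):
    """rows: iterable of (cluster, ngram, boosted_score) tuples.

    Returns a list of (cluster, ngrams_sorted_by_score, average_score),
    ordered by average_score in descending order.
    """
    by_cluster = defaultdict(list)
    for cluster, ngram, score in rows:
        by_cluster[cluster].append((score, ngram))

    ranked = []
    for cluster, scored in by_cluster.items():
        scored.sort(key=lambda pair: pair[0], reverse=True)  # best n-gram first
        average = sum(score for score, _ in scored) / len(scored)
        ranked.append((cluster, [ngram for _, ngram in scored], average))

    ranked.sort(key=lambda item: item[2], reverse=True)  # best cluster first
    return ranked


# Toy rows with made-up scores
rows = [
    (0, "way overnight week", 0.5),
    (0, "face dig face", 1.5),
    (1, "hmmmmm food religion", 0.75),
]
ranked = rank_clusters(rows)
print(ranked)
```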

In [0]:
from pyspark.sql.functions import col, avg, collect_list, desc

def rank_topics(clustered_df, boosted_df):
    """
    Rank topics based on the average boosted scores of their n-grams.

    Parameters:
    - clustered_df: DataFrame with clusters assigned to n-grams.
    - boosted_df: DataFrame with boosted scores for n-grams.

    Returns:
    - DataFrame with ranked topics, their associated n-grams, and average scores.
    """
    # Join clustered_df with boosted_df to include boosted_score
    ranked_df = clustered_df.join(
        boosted_df.select(col("ngrams").alias("ngram"), "boosted_score"),
        on="ngram",
        how="inner"
    )

    # Group by cluster and calculate aggregated information
    ranked_topics = ranked_df.groupBy("cluster").agg(
        collect_list("ngram").alias("ngrams"),
        avg("boosted_score").alias("average_score")
    ).orderBy(desc("average_score"))

    return ranked_topics



In [0]:
ranked_topics = rank_topics(clustered_df, boosted_df)

Let's see what the top 3 n-grams per cluster are.

In [0]:
from pyspark.sql.functions import col, slice

def get_top_ngrams_per_cluster(ranked_topics, top_n=3):
    """
    Extract the top n-grams for each cluster.
    
    Parameters:
    - ranked_topics: Spark DataFrame containing ranked topics with "ngrams" and "cluster".
    - top_n: Number of top n-grams to extract per cluster.
    
    Returns:
    - Spark DataFrame with top n-grams for each cluster.
    """
    top_n_topics_with_ngrams = ranked_topics.withColumn(
        "top_ngrams", slice(col("ngrams"), 1, top_n)
    ).select("cluster", "top_ngrams", "average_score")
    return top_n_topics_with_ngrams


In [0]:
top_n_topics_with_ngrams = get_top_ngrams_per_cluster(ranked_topics, top_n=3)

display(top_n_topics_with_ngrams)

cluster,top_ngrams,average_score
2,"List(chanyeol sehun_chanyeol exo_sc, think babies mature, opinion httpstcosicxdyj4rc rt)",0.5077807896778362
4,"List(opened gate httpstcoyzlabbgvvb, shouldnt tell someo, 20s baby life)",0.507086928801416
1,"List(way overnight week, face dig face, britains industry gre)",0.5063540922014329
3,"List(chanyeol sehun_chanyeol exo_sc, karlousm damn good, rt peryiat go)",0.5059210868267039
0,"List(hmmmmm food religion, hassystants 12 816, fave look unbuttoned)",0.5046124569067216


If we ignore the cluster they belong to, the overall top n-grams would be the following:
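
For comparison, a global top-n selection by score can be sketched in plain Python. The helper `top_overall` and the toy scores are assumptions for this example; the Spark function in the next cell instead takes the first `top_n` rows after exploding the clusters, without sorting individual n-grams by score.

```python
import heapq


def top_overall(scored_ngrams, top_n=5):
    """scored_ngrams: iterable of (ngram, score). Returns the top_n pairs by score."""
    best = {}
    for ngram, score in scored_ngrams:
        # Keep the highest score seen for each distinct n-gram
        best[ngram] = max(score, best.get(ngram, float("-inf")))
    return heapq.nlargest(top_n, best.items(), key=lambda item: item[1])


# Toy input with made-up scores; "a b c" appears in two windows
top2 = top_overall(
    [("a b c", 0.5), ("d e f", 1.25), ("a b c", 0.75), ("g h i", 0.25)],
    top_n=2,
)
print(top2)
```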

In [0]:
from pyspark.sql.functions import col, explode

def get_top_overall_ngrams(ranked_topics, top_n=5):
    """
    Extract the top n-grams across all clusters.
    
    Parameters:
    - ranked_topics: Spark DataFrame containing ranked topics with "ngrams".
    - top_n: Number of overall top n-grams to extract.
    
    Returns:
    - Spark DataFrame with top n-grams.
    """
    top_ngrams = ranked_topics.select(explode(col("ngrams")).alias("ngram")).limit(top_n)
    return top_ngrams


In [0]:
top_ngrams = get_top_overall_ngrams(ranked_topics, top_n=5)

display(top_ngrams)

ngram
chanyeol sehun_chanyeol exo_sc
think babies mature
opinion httpstcosicxdyj4rc rt
top bgt masih
pickup ssr3 rt


### Final BNgram function

In the next function we integrate all the steps to extract and rank trending topics.

In [0]:
def bngram_algorithm(aggregated_data, n=5, apply_ner=False, num_clusters=5):
    """
    Apply the BNgram algorithm to aggregated data.

    Parameters:
    - aggregated_data: Aggregated Spark DataFrame with `super_documents` and optionally `entities_list`.
    - n: Number of words in the n-grams.
    - apply_ner: Boolean flag indicating whether NER was applied.
    - num_clusters: Number of clusters for KMeans clustering.

    Returns:
    - DataFrame with ranked topics and their n-grams.
    """
    # Step 1: Generate n-grams
    ngrams_df = generate_ngrams(aggregated_data, n)

    # Step 2: Compute temporal relevance scores
    relevance_df = compute_temporal_relevance(ngrams_df)

    # Step 3: Boost named entities (conditionally)
    boosted_df = boost_named_entities(relevance_df, aggregated_data, apply_ner)

    # Step 4: Cluster n-grams
    clustered_df = cluster_ngrams(boosted_df, num_clusters)

    # Step 5: Rank topics
    ranked_topics = rank_topics(clustered_df, boosted_df)

    return ranked_topics



In [0]:
topics = bngram_algorithm(aggregated_tweets, n=5, apply_ner=True, num_clusters=5)

In [0]:
top_topics_per_cluster = get_top_ngrams_per_cluster(topics, top_n=3)

display(top_topics_per_cluster)

cluster,top_ngrams,average_score
2,"List(chanyeol sehun_chanyeol exo_sc, think babies mature, opinion httpstcosicxdyj4rc rt)",0.5077807896778362
4,"List(opened gate httpstcoyzlabbgvvb, shouldnt tell someo, 20s baby life)",0.507086928801416
1,"List(way overnight week, face dig face, britains industry gre)",0.5063540922014329
3,"List(chanyeol sehun_chanyeol exo_sc, karlousm damn good, rt peryiat go)",0.5059210868267039
0,"List(hmmmmm food religion, hassystants 12 816, fave look unbuttoned)",0.5046124569067216


In [0]:
top_ngrams_topics = get_top_overall_ngrams(topics, top_n=5)

display(top_ngrams_topics)

ngram
chanyeol sehun_chanyeol exo_sc
think babies mature
opinion httpstcosicxdyj4rc rt
top bgt masih
pickup ssr3 rt


## Final function
This final function integrates all the previous functions used to detect trending topics.

We get the necessary data, preprocess it, and apply the topic detection algorithm.

In [0]:
def final_function(start_year=2019, start_month=8, start_day=1, start_hour=2, start_minute=0, 
                   minutes_length=5, 
                   n=3,
                   num_clusters = 5,
                   special_characters=False,
                   time_window='1 minutes', 
                   apply_ner=True,
                   n_top_clusters=3,
                   n_top_ngrams=5):
    
    """

    Final function that retrieves the tweets, preprocesses them, and applies the bngram algorithm to extract topics (scored n-grams in their clusters).

    Parameters:
    -----------
    start_year (int): Starting year of the time range for the tweets being fetched (Default = 2019).
    start_month (int): Starting month of the time range for the tweets being fetched (Default = 8).
    start_day (int): Starting day of the time range for the tweets being fetched (Default = 1).
    start_hour (int): Starting hour of the time range for the tweets being fetched (Default = 2).
    start_minute (int): Starting minute of the time range for the tweets being fetched (Default = 0).
    minutes_length (int): Length of the time window in minutes to fetch tweets (Default = 5).
    n (int): Size of n-grams (e.g., 2 for bigrams, 3 for trigrams) to extract from the text (Default = 3).
    num_clusters (int): Number of clusters to generate during clustering (Default = 5).
    special_characters (bool): Whether to include special characters such as # (hashtags) and @ (mentions) in the filtering (Default = False).
    time_window (str): Aggregation window size (e.g., '5 minutes') for preprocessing tweets (Default = '1 minutes').
    apply_ner (bool): Whether to apply Named Entity Recognition (NER) and boost named entities during processing (Default = True).
    n_top_clusters (int): Number of top n-grams to extract per cluster from the final ranked results (Default = 3).
    n_top_ngrams (int): Number of top n-grams to extract as results from final ranked results (Default = 5).

    Returns:
    --------
    result_df (DataFrame): A Spark DataFrame containing the results of the bngram algorithm, including clusters and n-grams.
    top_topics_per_cluster (DataFrame): A Spark DataFrame containing the top n-grams for each of the clusters.
    top_ngrams_topics (DataFrame): A Spark DataFrame containing the top overall n-grams extracted from the dataset, over all clusters.

    """
  
    # Fetch tweets
    tweets = get_tweets(start_year=start_year, start_month=start_month, start_day=start_day, start_hour=start_hour, 
                        start_minute=start_minute, minutes_length=minutes_length)
    
    # Handle case where no tweets are fetched
    if tweets is None:
        print("No tweets fetched for the given time range.")
        # Return one None per documented output so callers can still unpack three values
        return None, None, None

    # Cleaning
    cleaned_tweets = inital_cleaning_prep(tweets, special_characters=special_characters)

    # Filter English tweets
    english_tweets = filter_english_tweets(cleaned_tweets)

    # Preprocess tweets for bngram
    aggregated_tweets = preprocess_for_bngram(english_tweets, time_window=time_window, apply_ner=apply_ner)

    # Run the bngram algorithm
    result_df = bngram_algorithm(aggregated_tweets, n=n, apply_ner=apply_ner, num_clusters=num_clusters)

    # Get top ngrams per cluster
    top_topics_per_cluster = get_top_ngrams_per_cluster(result_df, top_n=n_top_clusters)

    # Get top overall ngrams
    top_ngrams_topics = get_top_overall_ngrams(result_df, top_n=n_top_ngrams)
  
    return result_df, top_topics_per_cluster, top_ngrams_topics

In [0]:
result, top_clusters, top_ngrams = final_function(start_year=2019, start_month=8, start_day=1, start_hour=2, start_minute=0, 
                   minutes_length=5, 
                   n=5,
                   num_clusters=5,
                   special_characters=False,
                   time_window='1 minutes', 
                   apply_ner=False,
                   n_top_clusters=3,
                   n_top_ngrams=5)
result.show()


ld_wiki_tatoeba_cnn_21 download started this may take some time.
Approximate size to download 7.1 MB
[ | ][OK!]
+-------+--------------------+------------------+
|cluster|              ngrams|     average_score|
+-------+--------------------+------------------+
|      2|[chanyeol sehun_c...|0.5077807896778362|
|      4|[opened gate http...| 0.507086928801416|
|      1|[way overnight we...|0.5063540922014329|
|      3|[chanyeol sehun_c...|0.5059210868267039|
|      0|[hmmmmm food reli...|0.5046124569067216|
+-------+--------------------+------------------+



In [0]:
display(top_clusters)

cluster,top_ngrams,average_score
2,"List(chanyeol sehun_chanyeol exo_sc, think babies mature, opinion httpstcosicxdyj4rc rt)",0.5077807896778362
4,"List(opened gate httpstcoyzlabbgvvb, shouldnt tell someo, 20s baby life)",0.507086928801416
1,"List(way overnight week, face dig face, britains industry gre)",0.5063540922014329
3,"List(chanyeol sehun_chanyeol exo_sc, karlousm damn good, rt peryiat go)",0.5059210868267039
0,"List(hmmmmm food religion, hassystants 12 816, fave look unbuttoned)",0.5046124569067216


In [0]:
display(top_ngrams)

ngram
chanyeol sehun_chanyeol exo_sc
think babies mature
opinion httpstcosicxdyj4rc rt
top bgt masih
pickup ssr3 rt


The algorithm could also be improved by generating n-grams of several sizes (3, 4, 5, 6, ...), which may improve the relevance scores. The obstacles are the time this takes to run and the fact that we do not have a list of ground-truth topics to compare the results against.
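As a rough illustration of the multi-size idea, the sketch below builds n-grams of several sizes from one tokenized tweet in plain Python. The function name `ngrams_of_sizes` and its default sizes are hypothetical and are not part of the notebook's `generate_ngrams`; a Spark version would need the same loop over sizes applied to the token column.

```python
def ngrams_of_sizes(tokens, sizes=(3, 4, 5)):
    """Return all n-grams of the requested sizes as space-joined strings.

    `tokens` is a list of already cleaned tokens; sizes larger than the
    number of tokens simply contribute nothing.
    """
    result = []
    for n in sizes:
        # Slide a window of width n over the token list
        for start in range(len(tokens) - n + 1):
            result.append(" ".join(tokens[start:start + n]))
    return result


# Example: bigrams and trigrams of a four-token tweet
example = ngrams_of_sizes(["think", "babies", "mature", "fast"], sizes=(2, 3))
```

Each additional size adds roughly one more n-gram per token, so the scoring and clustering steps grow accordingly, which is the running-time cost mentioned above.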

Many other factors could be tuned in the same way if we had something to verify our results against: for instance the number of clusters, or even the time windows (which vary in the paper). Many combinations could be tested to see what works best and when, but unfortunately no such ground truth is available to us.
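If a ground-truth list of topics were available, such a comparison could look roughly like the sketch below. Everything here is an assumption made for illustration: topics are represented as sets of keywords, `topic_recall` is a simplified recall measure (a ground-truth topic counts as found when all of its keywords appear in one detected topic), and `run` stands for any hypothetical callable that executes the pipeline for a given number of clusters and time window.

```python
from itertools import product


def topic_recall(detected, ground_truth):
    """Fraction of ground-truth topics fully covered by some detected topic."""
    if not ground_truth:
        return 0.0
    hits = sum(
        any(truth <= topic for topic in detected)  # subset test on keyword sets
        for truth in ground_truth
    )
    return hits / len(ground_truth)


def grid_search(run, ground_truth, num_clusters_options, time_windows):
    """Try every (num_clusters, time_window) pair and return the best one."""
    results = []
    for k, window in product(num_clusters_options, time_windows):
        detected = run(k, window)  # hypothetical: returns a list of keyword sets
        results.append(((k, window), topic_recall(detected, ground_truth)))
    # max keeps the first configuration among ties
    return max(results, key=lambda item: item[1])
```

With the real pipeline, `run` could wrap `final_function` and convert its top n-grams into keyword sets; each call would still carry the full running time of the pipeline.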

Additionally, with more computational power and more time, a more refined version could have been implemented, since the running time is also a major limitation.




## Conclusion

We started this project by reading the paper "Sensing Trending Topics in Twitter", which explores different algorithms for detecting trending topics in social media. After reviewing these methods, we decided that our best option was to implement the BNgram algorithm because of its performance in the paper and its ability to capture patterns of trending topics through n-gram analysis.

Throughout this project, we collected the data by retrieving tweets for specified time windows. During preprocessing we cleaned and filtered the tweets and turned them into tokenized n-grams. Using the BNgram algorithm, n-grams were evaluated based on their occurrence within and outside the defined time window to compute relevance scores.
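The relevance score can be illustrated with a minimal sketch of the df-idf_t measure from the paper, as we understand it: the document frequency of an n-gram in the current time slot is divided by a damped average of its frequency in the previous slots. The function name `df_idf_t`, the use of the natural logarithm, and the example counts are assumptions for illustration and do not reproduce the notebook's `compute_temporal_relevance`.

```python
import math


def df_idf_t(df_current, df_history):
    """Score one n-gram.

    df_current: number of tweets containing the n-gram in the current slot.
    df_history: list with the same count for each of the t previous slots.
    """
    t = len(df_history)
    # Average past frequency; an n-gram with no history gets 0
    avg_past = sum(df_history) / t if t else 0.0
    # The +1 terms avoid division by zero and damp very rare n-grams
    return (df_current + 1) / (math.log(avg_past + 1) + 1)


# A bursty n-gram (rare before, frequent now) versus a steady one
bursty = df_idf_t(30, [2, 1, 3])
steady = df_idf_t(30, [30, 30, 30])
```

An n-gram that suddenly becomes frequent scores higher than one that has been equally frequent all along, which is what makes the measure suitable for trend detection.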

In addition, NER was optionally applied to boost the scores of named entities, followed by clustering the n-grams into topics. Finally, clusters were ranked based on their average scores to identify the most relevant trends. All of this was incorporated into the final function, which allows flexibility in parameters such as the number of clusters or the inclusion of special characters.
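The boosting and ranking steps can be sketched in plain Python as follows. The row layout, the function name `rank_clusters`, and the boost factor of 1.5 are assumptions made for this illustration; the notebook's `boost_named_entities` and `rank_topics` operate on Spark DataFrames and may differ in detail.

```python
from collections import defaultdict

BOOST = 1.5  # assumed boost factor for n-grams containing a named entity


def rank_clusters(rows, boost=BOOST):
    """Rank clusters by the average (optionally boosted) score of their n-grams.

    rows: iterable of (ngram, score, cluster_id, has_entity) tuples.
    Returns a list of (cluster_id, average_score), best cluster first.
    """
    scores = defaultdict(list)
    for _ngram, score, cluster_id, has_entity in rows:
        # Multiply the score only when the n-gram contains a named entity
        scores[cluster_id].append(score * boost if has_entity else score)
    averages = {cid: sum(vals) / len(vals) for cid, vals in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


rows = [
    ("way overnight week", 2.0, 0, False),
    ("face dig face", 4.0, 0, False),
    ("chanyeol sehun_chanyeol exo_sc", 4.0, 1, True),
    ("think babies mature", 3.0, 1, False),
]
ranking = rank_clusters(rows)
```

In this toy input, cluster 1 ends up first because its entity-bearing n-gram is boosted before averaging.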

While we consider the implementation successful, we believe there are many areas for improvement. First, the algorithm was tested primarily with fixed n-gram sizes. Expanding to multiple sizes could increase the diversity of captured patterns, but computational time made this difficult to implement. Moreover, the time required for n-gram generation, NER boosting, and clustering proved to be a bottleneck. With more computational power, larger datasets could be processed and more complex configurations could be tested to refine our results. During this project we tested different parameter variations, such as including or excluding special characters (@, #), applying NER, and using different window sizes.

Despite the challenges, we believe that our project successfully demonstrates trending topic detection with the BNgram algorithm. In future work, we could use multiple n-gram sizes to improve relevance scoring and validate the results against a ground-truth dataset for more reliable benchmarking.

In conclusion, this project shows that the BNgram algorithm can process large-scale social media data and uncover meaningful patterns in real time.