# Markov Chains for D. Trump Tweet Prediction

This example shows how to train a Markov chain on D. Trump's tweets. The trained model then generates ("predicts") a new sentence that reflects what is said about him on social media.

## 0. - Running the example

In [None]:
# 0.1 - Load the module Tweeter
from tweeter import Tweeter

# 0.2 - Load the source to feed the Markov Chains algorithm
trump_tweeter = Tweeter('trump_tweets.txt')

In [1]:
# 0.3 - See the results of the prediction
trump_tweeter.new_tweet()

'"@Amber_Sadler22: Donald gets it!'

Here the result shows one of the most common user tweets about Trump.
Download or clone this repository and run it again to get a different result!

## 1. - A deeper look at the code

This section walks through the Tweeter module step by step to show how it works.
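
Before reading the module itself, it may help to see the kind of table it builds. The sketch below is only an illustration: the two-line `corpus` and the variable names are made up for this example and are not part of the repository. It counts, for every pair of consecutive words, which word follows it, and uses `None` to mark the end of a line.

```python
from collections import Counter, defaultdict

# Hypothetical two-line corpus, used only for illustration.
corpus = [
    "make america great again",
    "make america safe again",
]
order = 2  # assumed to match the default model_order used by Tweeter

model = defaultdict(Counter)
for line in corpus:
    tokens = line.split()
    for i in range(len(tokens) - order + 1):
        ngram = tuple(tokens[i:i + order])
        # The last n-gram of a line is followed by None (end of tweet).
        next_word = tokens[i + order] if i + order < len(tokens) else None
        model[ngram][next_word] += 1

for ngram, followers in model.items():
    print(ngram, dict(followers))
```

In this toy table the pair `('make', 'america')` has two possible followers, which is where the randomness in the generated text comes from.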

In [2]:
# Load the libraries required by the Tweeter module

from collections import Counter, defaultdict
import random

In [3]:

# Function 1

def list_lines(fpath):
    """
    Converts a text file to a list split by lines.
    :param fpath: filepath of text file
    :return: list of lines of text file
    """
    with open(fpath, 'r', encoding='utf-8-sig') as f:
        return f.readlines()

# Class 1
    
class Tweeter:
    def __init__(self, fpath, model_order=2):
        self.fpath = fpath

        self.model_order = model_order  # length of ngrams the model looks at to find next word
        self.markov_model = defaultdict(Counter)
        self.start_ngrams = Counter()
        self.__load_model()

    def __load_model(self):
        # load model (prepare data)
        # nth order (depends on self.model_order) markov model of words (no preprocessing)
        # to add to this model, markov_model[current_ngram][next_word] += 1
        tweets = list_lines(self.fpath)

        for tweet in tweets:
            tokens = tweet.split()

            # add each ngram except the last ngram in the tweet to the markov model
            for i in range(len(tokens[:-self.model_order])):
                current_ngram = tuple(tokens[i:i+self.model_order])
                next_word = tokens[i + self.model_order]
                self.markov_model[current_ngram][next_word] += 1
            # for the last ngram, nothing follows
            if len(tokens) >= self.model_order:
                current_ngram = tuple(tokens[-self.model_order:])
                next_word = None
                self.markov_model[current_ngram][next_word] += 1

                start_ngram = tuple(tokens[:self.model_order])
                self.start_ngrams[start_ngram] += 1

    def new_tweet(self):
        # create new tweet from model
        new_tweet_list = []

        start_ngrams = list(self.start_ngrams.keys())
        start_values = list(self.start_ngrams.values())
        start_ngram = random.choices(start_ngrams, weights=start_values)[0]

        current_ngram = start_ngram

        while None not in current_ngram:
            new_tweet_list.append(current_ngram[0])

            next_words = list(self.markov_model[current_ngram].keys())
            next_values = list(self.markov_model[current_ngram].values())
            next_word = random.choices(next_words, weights=next_values)[0]

            current_ngram = tuple(current_ngram[1:] + (next_word,))
        new_tweet_list.extend(current_ngram[:-1])

        return ' '.join(new_tweet_list)
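
`new_tweet` picks both the starting n-gram and each following word with `random.choices`, weighted by the counts stored in the model. The sketch below isolates that weighted draw. The `followers` counts are hypothetical and not taken from the dataset, and the fixed seed is only there to make the run repeatable.

```python
import random
from collections import Counter

# Hypothetical follower counts for a single n-gram (illustration only).
followers = Counter({"great": 3, "safe": 1})

random.seed(0)  # fixing the seed makes the sequence of draws repeatable

words = list(followers.keys())
weights = list(followers.values())

# Draw 1000 times; each word is picked in proportion to its count.
draws = [random.choices(words, weights=weights)[0] for _ in range(1000)]
tally = Counter(draws)
print(tally)
```

With these weights, "great" should come up roughly three times as often as "safe". The same mechanism explains why running the notebook again produces a different tweet.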

In [4]:


# 1.1 Loading source into Tweeter class
trump_tweeter = Tweeter('trump_tweets.txt')

# 1.2 Call new_tweet on the Tweeter object and see the result!
print(trump_tweeter.new_tweet())

"@DannyBo4455: @hamishjoy Mr Trump? The Mayor in U.S. strongly in to everyone out all over the so many other alternatives.


NOTE: The results do not need to make sense. This is just an experiment to show how a Markov chain model can be used to make sentence predictions after being trained on tweets about one topic (in this case, Donald Trump).