## Obtaining Tweets

### Using snscrape

Note that Colab runs a Python 3.7 environment, whereas snscrape requires Python 3.8 or later.

Hence, snscrape won't work in Colab, but if you'd like to reproduce the results on your own machine (for example, to try different usernames), the code is below.

In [None]:
!pip install git+https://github.com/JustAnotherArchivist/snscrape.git

In [None]:
import snscrape.modules.twitter as sntwitter

tweets = []

username = "elonmusk"

# Retrieve the 2000 most recent tweets from the specified user
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(f'from:{username}').get_items()):
    if i >= 2000:
        break
    # Preprocess each tweet: remove newlines, then drop non-ASCII characters
    # by encoding to ASCII bytes (ignoring errors) and decoding back to str
    text = tweet.content.replace('\n', '')
    text = text.encode('ascii', errors='ignore')
    text = text.decode("utf-8")
    tweets.append(text)

tweets = ".".join(tweets)
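
The next section reads the scraped text back from `tweets.txt`, but the notebook does not show how that file was written. A minimal sketch of one way to do it follows; the helper names `save_tweets` and `load_tweets` are hypothetical, and the demo writes to a temporary directory rather than the working directory.

```python
import tempfile
from pathlib import Path


def save_tweets(text, path):
    """Write the joined tweet string to a UTF-8 text file."""
    Path(path).write_text(text, encoding="utf-8")


def load_tweets(path):
    """Read the joined tweet string back from a UTF-8 text file."""
    return Path(path).read_text(encoding="utf-8")


# Round-trip demo with placeholder text (not real scraped data)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "tweets.txt"
    save_tweets("First tweet.Second tweet", demo_path)
    restored = load_tweets(demo_path)

print(restored)
```

On a local machine, `save_tweets(tweets, "tweets.txt")` after the scraping cell would produce a file that can be uploaded to Colab.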

### Loading from .txt file

In [1]:
# Read .txt file and load into variable
with open("tweets.txt", "r") as f:
  tweets = f.read()

In [3]:
tweets



## Tweets Preprocessing

In [4]:
# Regex for replacing all urls with custom url
import re
def replace_URLs(string, custom_URL):
    modified_string = re.sub(
        r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+",
        f" {custom_URL} ",
        string,
    )
    return modified_string

In [5]:
# Define phishing URL here
EVIL_URL = "http://evil.url"

In [6]:
# Preprocess the dataset
preprocessed = replace_URLs(tweets, EVIL_URL)

In [8]:
preprocessed
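
To see what the replacement step does on a small input, here is a minimal sketch. It uses a simplified pattern (an assumption for illustration, not the regex from the cell above) that treats everything from `http://` or `https://` up to the next whitespace as a URL. The name `replace_urls_simple` and the sample text are hypothetical.

```python
import re

# Simplified pattern (an assumption for this sketch): a URL runs from the
# scheme up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def replace_urls_simple(text, custom_url):
    """Replace every URL-like token in text with custom_url."""
    return URL_PATTERN.sub(custom_url, text)


sample = "New post at https://t.co/abc123 and http://example.com/x?y=1 today"
result = replace_urls_simple(sample, "http://evil.url")
print(result)  # New post at http://evil.url and http://evil.url today
```

Because every link in the training text becomes the same token, the Markov model later learns that token in all the contexts where links originally appeared.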



## Markov Model

In [9]:
# Install markov model for python
!pip install markovify

Collecting markovify
  Downloading markovify-0.9.3.tar.gz (28 kB)
Collecting unidecode
  Downloading Unidecode-1.3.2-py3-none-any.whl (235 kB)
     |████████████████████████████████| 235 kB 7.9 MB/s 
Building wheels for collected packages: markovify
  Building wheel for markovify (setup.py) ... done
  Created wheel for markovify: filename=markovify-0.9.3-py3-none-any.whl size=18622 sha256=61d1bdde6b1d5403d5c389471709b6514f0a91ea8a018145e7b7cc7815fc86f7
  Stored in directory: /root/.cache/pip/wheels/d9/f0/5b/748a27bdf2496bd4df51acb9442dae516efce507ff4849813e
Successfully built markovify
Installing collected packages: unidecode, markovify
Successfully installed markovify-0.9.3 unidecode-1.3.2


In [10]:
import markovify

In [11]:
# Fit markov model on text
model = markovify.Text(preprocessed)

In [14]:
# Generate sentence
model.make_sentence()

'Orbital launch tower arm, using the power of irony.'
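
For intuition about what a Markov text model does, here is a toy word-level chain in plain Python. It is only a sketch of the general idea: markovify's actual implementation differs (for example, it tracks sentence boundaries and uses a configurable state size), and the names `build_chain` and `generate` as well as the toy corpus are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, start, max_words=10, rng=random):
    """Walk the chain from start, picking a random successor at each step."""
    output = [start]
    while len(output) < max_words and output[-1] in chain:
        output.append(rng.choice(chain[output[-1]]))
    return " ".join(output)


toy_chain = build_chain("the rocket lands the rocket flies the booster lands")
print(toy_chain["the"])  # ['rocket', 'rocket', 'booster']
print(generate(toy_chain, "the", max_words=5))
```

Successors that occur more often in the corpus appear more often in the list, so they are proportionally more likely to be picked.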

In [24]:
# Keep generating sentences until one contains the URL
found = False

# make_sentence() can return None, so check the result before searching it
while not found:
  sent = model.make_sentence()
  if sent and EVIL_URL in sent:
    print(sent)
    found = True

@Teslarati @KlenderJoey http://evil.url ~1000 improvements.
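
The loop above runs until a match appears, which may take a long time if the URL token is rare in the training text. A bounded variant is sketched below; `find_sentence_with` and `max_attempts` are hypothetical names, and the demo uses a stub generator in place of a fitted model so that it runs without markovify.

```python
def find_sentence_with(make_sentence, needle, max_attempts=10_000):
    """Call make_sentence() up to max_attempts times.

    Returns the first non-empty sentence containing needle,
    or None if no attempt produced one.
    """
    for _ in range(max_attempts):
        sentence = make_sentence()
        if sentence and needle in sentence:
            return sentence
    return None


# Stub standing in for model.make_sentence (placeholder outputs, not real data)
stub_outputs = iter([None, "nothing to see", "read http://evil.url today"])
match = find_sentence_with(lambda: next(stub_outputs), "http://evil.url")
print(match)  # read http://evil.url today
```

With the fitted model from this notebook, the call would be `find_sentence_with(model.make_sentence, EVIL_URL)`.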
