# Text Mining in Social Networks - Preprocessing Demo


Instructions:

1️⃣ You can enter one or more tweets manually.
   - Type each tweet and press "Enter" after each one.
   - When you’re done, press "Enter" again on a blank line.

2️⃣ If you don’t want to enter anything, just press "Enter" and the program will run on the default dataset.


________________________________________________________
Example:
I love my new iPhone!! It's absolutely amazing 😍

  Life is great… said no one during exams 😒
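
As a rough preview of what the cleaning step later in this notebook does to the two example tweets, the sketch below applies the same regular expressions using only the standard library. `preview_clean` is a hypothetical helper name, not part of the notebook, and the sketch stops before tokenization, stopword removal, and lemmatization, so the notebook's final output will differ.

```python
import re

def preview_clean(text):
    """Hypothetical helper mirroring the notebook's regex cleaning (assumption: same patterns)."""
    text = re.sub(r'http\S+', '', text)   # drop URLs
    text = re.sub(r'@\w+', '', text)      # drop mentions
    text = re.sub(r'#\w+', '', text)      # drop hashtags, tag text included
    text = re.sub(r'[^\w\s]', '', text)   # drop punctuation and symbols, emoji included
    text = text.lower()
    return re.sub(r'\s+', ' ', text).strip()  # collapse runs of whitespace

examples = [
    "I love my new iPhone!! It's absolutely amazing 😍",
    "  Life is great… said no one during exams 😒",
]
cleaned = [preview_clean(t) for t in examples]

for before, after in zip(examples, cleaned):
    print(f"{before!r} -> {after!r}")
```

Note that the apostrophe in "It's" is removed along with the rest of the punctuation, so the word becomes "its" before any later step sees it.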


In [None]:
import pandas as pd
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import TweetTokenizer
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True
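
One thing to keep in mind when reading the pipeline in the following cell: it removes punctuation (apostrophes included) before it filters stopwords, so a contraction such as "didn't" reaches the stopword step as "didnt". Whether that matters depends on the stopword list in use. The sketch below illustrates the effect with a tiny hand-picked stopword set, which is an assumption standing in for NLTK's English list, and with plain `str.split` standing in for the tokenizer. The helper names are hypothetical.

```python
import re

# Assumption: a toy stopword set, not NLTK's actual English list.
TOY_STOPWORDS = {"i", "you", "a", "didn't", "don't"}

def strip_punct(text):
    """Lowercase and remove every non-word, non-space character (apostrophes too)."""
    return re.sub(r"[^\w\s]", "", text.lower())

def drop_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopwords]

tweet = "I didn't mean you get a car"

# Order used by the notebook: strip punctuation first, then filter stopwords.
stripped_first = drop_stopwords(strip_punct(tweet).split(), TOY_STOPWORDS)

# Alternative order: filter stopwords on lowercase tokens, then strip punctuation.
filtered_first = [
    strip_punct(t) for t in drop_stopwords(tweet.lower().split(), TOY_STOPWORDS)
]

print(stripped_first)
print(filtered_first)
```

With this toy list, the first ordering keeps "didnt" because it no longer matches "didn't", while the second ordering drops the contraction before the apostrophe is removed.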

## Enter your tweets (leave blank and run to use default dataset)

In [None]:
print("      TEXT MINING IN SOCIAL NETWORKS - PREPROCESSING DEMO")
print("="*65)



# STEP 0: Input or Default Dataset

user_inputs = []

while True:
    tweet = input("Enter tweet (or press Enter to finish): ").strip()
    if tweet == "":
        break
    user_inputs.append(tweet)

if user_inputs:
    data = pd.DataFrame({
        'id': range(1, len(user_inputs)+1),
        'user': ['@user'] * len(user_inputs),
        'text': user_inputs
    })
else:
    print("\nNo input provided - running on default dataset...\n")
    data = pd.DataFrame({
        'id': range(1, 11),
        'user': [
            '@elonmusk','@taylorswift13','@BillGates','@Oprah','@JeffBezos',
            '@NASA','@BarackObama','@elonmusk','@neiltyson','@ladygaga'
        ],
        'text': [
            "Good thing I never tweet anything controversial… except for the time I said Tesla stock was too high, or Dogecoin was the future, or when I named my kid after an encryption algorithm 🤷‍♂️.",
            "Every single one of you made this album possible. From late-night writing sessions to surprise releases - your love made the magic real. Forever grateful ❤️ #Swifties #ThankYou",
            "People think innovation is just about ideas. It’s also about staying calm when your prototype catches fire, your investor ghosts you, and your code breaks five minutes before the demo. Fun times.",
            "When I said ‘You get a car!’, I didn’t mean you get a global pandemic. Life’s surprises have a strange sense of humor. But keep your spirits high - gratitude is the best immunity.",
            "I love reading tweets about how easy it must be to run a space company. Sure, because rockets, physics, and orbital trajectories are just weekend hobbies, right? 😏 #SarcasmModeOn",
            "After traveling nearly 300 million miles, Perseverance has safely landed on Mars! Data looks great, instruments are stable, and we just received the first panoramic image. #Mars2020 #MissionAccomplished",
            "I keep reminding young people: post with purpose. The internet never forgets, even when you wish it would. Choose your words wisely - they echo longer than you think.",
            "Just for fun, I changed the Twitter logo to a Shiba Inu for a day. Didn’t expect crypto markets to explode. Maybe memes are the most powerful economic forces of our time 😂 #DogeDay",
            "Every time I see someone arguing that Earth is flat, I remember - ships disappear bottom-first over the horizon, not because they’re shy, but because physics still works.",
            "Performing live again after two years feels unreal - the lights, the energy, the fans singing every word. My heart is full and my voice is hoarse. Wouldn’t trade this for anything 💫 #Grateful"
        ]
    })

print("\nOriginal Data:\n")
print(data[['user', 'text']])


# STEP 1: Cleaning Function

def clean_text(text):
    text = re.sub(r'http\S+', '', text)                  # Remove URLs
    text = re.sub(r'@\w+', '', text)                     # Remove mentions
    text = re.sub(r'#\w+', '', text)                     # Remove hashtags (tag text included)
    text = re.sub(r'[^\w\s]', '', text)                  # Remove punctuation and symbols (this also strips emojis)
    text = re.sub(r'[\U00010000-\U0010ffff]', '', text)  # Remove any remaining characters outside the Basic Multilingual Plane
    text = text.lower()                                  # Lowercase
    text = re.sub(r'\s+', ' ', text).strip()             # Collapse extra whitespace
    return text

data['clean_text'] = data['text'].apply(clean_text)


# STEP 2: Tokenization

tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
data['tokens'] = data['clean_text'].apply(tokenizer.tokenize)


# STEP 3: Stopword Removal

stop_words = set(stopwords.words('english'))
data['tokens'] = data['tokens'].apply(lambda tokens: [t for t in tokens if t not in stop_words])


# STEP 4: Lemmatization

lemmatizer = WordNetLemmatizer()
data['tokens'] = data['tokens'].apply(lambda tokens: [lemmatizer.lemmatize(t) for t in tokens])


# STEP 5: Rejoin Clean Text

data['final_text'] = data['tokens'].apply(lambda x: ' '.join(x))


# DISPLAY RESULTS

print("\n================== CLEANED DATA (Before & After) ==================\n")
for i, (original, cleaned) in enumerate(zip(data['text'], data['final_text']), start=1):
    print(f"Tweet {i}: {original}")
    print(f"Cleaned : {cleaned}")
    print("-"*70)




      TEXT MINING IN SOCIAL NETWORKS - PREPROCESSING DEMO
Enter tweet (or press Enter to finish): 

No input provided - running on default dataset...


Original Data:

             user                                               text
0       @elonmusk  Good thing I never tweet anything controversia...
1  @taylorswift13  Every single one of you made this album possib...
2      @BillGates  People think innovation is just about ideas. I...
3          @Oprah  When I said ‘You get a car!’, I didn’t mean yo...
4      @JeffBezos  I love reading tweets about how easy it must b...
5           @NASA  After traveling nearly 300 million miles, Pers...
6    @BarackObama  I keep reminding young people: post with purpo...
7       @elonmusk  Just for fun, I changed the Twitter logo to a ...
8      @neiltyson  Every time I see someone arguing that Earth is...
9       @ladygaga  Performing live again after two years feels un...


Tweet 1: Good thing I never tweet anything controversial… e