# Scraping Twitter without the API

Social media analytics is one of the hot topics in data science. People enjoy these analyses because everyone knows this world: much of our time is spent on Twitter, Instagram, Facebook, and other social media apps. Social media analysis is often used for relationship analysis. In this notebook I will not only scrape Twitter with Python, but also do some relationship analysis based on the scraped data.

In [1]:
!pip3 uninstall twint


Uninstalling twint-2.1.21:
  Would remove:
    /root/.local/bin/twint
    /root/.local/lib/python3.7/site-packages/twint-2.1.21.dist-info/*
    /root/.local/lib/python3.7/site-packages/twint/*
Proceed (y/n)? y
  Successfully uninstalled twint-2.1.21


In [2]:
!pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint

Collecting twint
  Cloning https://github.com/twintproject/twint.git (to revision origin/master) to /tmp/pip-install-blh7dfmf/twint
  Running command git clone -q https://github.com/twintproject/twint.git /tmp/pip-install-blh7dfmf/twint
  Running command git checkout -q origin/master
Building wheels for collected packages: twint
  Building wheel for twint (setup.py) ... done
  Created wheel for twint: filename=twint-2.1.21-cp37-none-any.whl size=38872 sha256=d45d3b6611b5af5326f5be90c204cfa56eea1fb45454db6b7b61c8edfb612c0a
  Stored in directory: /tmp/pip-ephem-wheel-cache-y65ncv8j/wheels/4f/3b/75/62d04b3b446658ba85401e8868d3cd1d4bc22f17ad755460a6
Successfully built twint
Installing collected packages: twint
Successfully installed twint-2.1.21


In [4]:
!pip install nest_asyncio



In [3]:
import nest_asyncio
nest_asyncio.apply()
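Why is `nest_asyncio.apply()` needed? twint drives its requests with `asyncio`, and a Jupyter kernel is already running an event loop of its own, so trying to start or drive another loop from inside a cell raises a `RuntimeError`. `nest_asyncio.apply()` patches `asyncio` so such nested calls are allowed. The stdlib-only sketch below reproduces the failure without the patch; `fetch_page` and `notebook_cell` are hypothetical stand-ins, and twint's internals may use a different call (such as `run_until_complete`) that fails in the same way. Run it as a plain script, not in a notebook cell, since the outer `asyncio.run` needs a loop-free thread.

```python
import asyncio


async def fetch_page():
    # Hypothetical stand-in for an async scraping step
    await asyncio.sleep(0)
    return "page"


async def notebook_cell():
    # Simulates code that runs while an event loop is already running,
    # as it does inside a Jupyter kernel
    coro = fetch_page()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never started; close it to avoid a warning
        return f"RuntimeError: {exc}"


message = asyncio.run(notebook_cell())
print(message)

# Outside a running loop the same coroutine works fine
print(asyncio.run(fetch_page()))  # page
```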

In [4]:
import twint
import pandas as pd
from collections import Counter

In [6]:
# setting up the twint config
c = twint.Config()

# keyword(s) to search for
c.Search = 'cricket wtc final'
# stop after roughly 100 tweets
c.Limit = 100
# only collect tweets posted since this date
c.Since = '2021-06-10'
# store the results in a pandas DataFrame
c.Pandas = True
# run the search
twint.run.Search(c)

1403633602257211392 2021-06-12 08:41:43 +0000 <newslivereport> WTC Final: Virender Sehwag “Looking Forward” To Trent Boult vs Rohit Sharma Contest | Cricket News  https://t.co/lMnn8Kgygt
1403632514682421255 2021-06-12 08:37:24 +0000 <Imchenul> De Kock dedicates his ton to a friend who lost his finger in Afghanistan and for Rhino conservation. "One of my friends got his fingers shot off in Afghanistan, And I said I will salute to him" #cricketnl #Cricket #WTCFinal #WTCFinals #WTC #IPL2021 #Proteas #ENGvNZ #INDvNZ #PSL6  https://t.co/zjF5ssvR54
1403632306322030592 2021-06-12 08:36:34 +0000 <jyostna59883008> WTC Final: Virender Sehwag “Looking Forward” To Trent Boult vs Rohit Sharma Contest | Cricket News  https://t.co/nJltUCwesd
1403631736253140996 2021-06-12 08:34:18 +0000 <realirfanjuneja> Practice on top gears… #WTCFinal2021 #WTCFinal #WTC21 #INDvsNZ #ICCWTCFinal #Cricket @BCCI @ICC @BLACKCAPS @imVkohli @NotNossy @mohsinaliisb  https://t.co/D6EsdfbaRh
1403631601309818880 2
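With `c.Pandas = True` the results also land in a DataFrame, but twint echoes each tweet to the console in the layout above. If only that printed text is available (for example, a saved log), a regular expression can recover the fields. This is a minimal sketch: the layout `<id> <date> <time> <utc offset> <username> <text>` is inferred from the lines above rather than from twint's documentation, and `TWINT_LINE` / `parse_twint_line` are hypothetical names.

```python
import re

# Assumed console layout, inferred from the printed output above:
# <tweet id> <date> <time> <utc offset> <username> <tweet text>
TWINT_LINE = re.compile(
    r"^(?P<id>\d+) (?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<tz>[+-]\d{4}) <(?P<username>[^>]+)> (?P<tweet>.*)$"
)


def parse_twint_line(line):
    """Return the fields of one printed tweet, or None if the line doesn't match."""
    match = TWINT_LINE.match(line.strip())
    return match.groupdict() if match else None


# One line from the output above (hashtags shortened)
sample = ("1403631736253140996 2021-06-12 08:34:18 +0000 <realirfanjuneja> "
          "Practice on top gears… #WTCFinal2021 #WTCFinal #Cricket")
row = parse_twint_line(sample)
print(row["username"], row["date"], row["time"])  # realirfanjuneja 2021-06-12 08:34:18

# A truncated line, like the last one above, is rejected instead of half-parsed
print(parse_twint_line("1403631601309818880 2"))  # None
```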

In [7]:
# returns all column names
def column_names():
    return twint.output.panda.Tweets_df.columns

# select the given columns from twint's results DataFrame
def twint_to_df(columns):
    return twint.output.panda.Tweets_df[columns]

In [8]:
column_names()

Index(['id', 'conversation_id', 'created_at', 'date', 'timezone', 'place',
       'tweet', 'language', 'hashtags', 'cashtags', 'user_id', 'user_id_str',
       'username', 'name', 'day', 'hour', 'link', 'urls', 'photos', 'video',
       'thumbnail', 'retweet', 'nlikes', 'nreplies', 'nretweets', 'quote_url',
       'search', 'near', 'geo', 'source', 'user_rt_id', 'user_rt',
       'retweet_id', 'reply_to', 'retweet_date', 'translate', 'trans_src',
       'trans_dest'],
      dtype='object')

In [14]:
# keep only the columns I am interested in
tweet_df = twint_to_df(['date', 'tweet', 'hashtags', 'username','nlikes', 'nreplies', 'nretweets'])

In [15]:
tweet_df.sample(5)

| | date | tweet | hashtags | username | nlikes | nreplies | nretweets |
|---|---|---|---|---|---|---|---|
| 57 | 2021-06-16 12:58:47 | Ravindra Jadeja 46 runs away from achieving ra... | [wtcfinal] | IndiaTVSports | 15 | 0 | 1 |
| 27 | 2021-06-16 13:32:01 | I hope Virat Kohli will play this kind of agre... | [viratkohli, virat, wtcfinals, wtcfinal, wtcfi... | AbdullahNeaz | 13 | 1 | 3 |
| 95 | 2021-06-16 12:23:17 | For all this worrying about the NZ pacers, we ... | [wtcfinal, cricket] | banglani | 15 | 1 | 0 |
| 19 | 2021-06-16 13:44:05 | Mohammed Shami has taken India's pace attack t... | [wtcfinal, cricket, mohammedshami] | CricCrazyDebu | 1 | 0 | 0 |
| 58 | 2021-06-16 12:57:08 | Spinners Ravichandran Ashwin and Ravindra Jade... | [wtc] | cricketpakcompk | 1 | 0 | 1 |
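`Counter` was imported earlier but isn't used yet; one natural use is counting hashtags across the collected tweets. A minimal sketch, assuming the `hashtags` column holds Python lists as the sample above shows (if the DataFrame is reloaded from CSV those lists become strings and would need parsing first). `count_hashtags` is a hypothetical helper, and the data are the hashtag lists from the five sampled rows, with the truncated list in row 27 cut to its visible, complete tags.

```python
from collections import Counter


def count_hashtags(hashtag_lists):
    """Flatten an iterable of hashtag lists (e.g. tweet_df['hashtags']) into a Counter."""
    return Counter(tag.lower() for tags in hashtag_lists for tag in tags)


# Hashtag lists from the five sampled rows above
sample_hashtags = [
    ["wtcfinal"],
    ["viratkohli", "virat", "wtcfinals", "wtcfinal"],
    ["wtcfinal", "cricket"],
    ["wtcfinal", "cricket", "mohammedshami"],
    ["wtc"],
]

counts = count_hashtags(sample_hashtags)
print(counts.most_common(2))  # [('wtcfinal', 4), ('cricket', 2)]
```

On the full DataFrame the same call would be `count_hashtags(tweet_df['hashtags']).most_common(10)`.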


# Sentiment analysis using Transformers

In [17]:
!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/d5/43/cfe4ee779bbd6a678ac6a97c5a5cdeb03c35f9eaebbb9720b036680f9a2d/transformers-4.6.1-py3-none-any.whl (2.2MB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading https://files.pythonhosted.org/packages/d4/e2/df3543e8ffdab68f5acc73f613de9c2b155ac47f162e725dcac87c521c11/tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3MB)
Collecting huggingface-hub==0.0.8
  Downloading https://files.pythonhosted.org/packages/a1/88/7b1e45720ecf59c6c6737ff332f41c955963090a18e72acbcbeac6b25e86/huggingface_hub-0.0.8-py3-none-any.whl
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packag

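The next cell feeds the raw tweet text to the classifier, t.co links and @mentions included. If that noise is unwanted, the text could be cleaned first. This is an optional preprocessing sketch, not something the original analysis does, and I haven't measured whether it changes the predictions; `clean_tweet` and the two regexes are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+")  # links such as https://t.co/...
MENTION_RE = re.compile(r"@\w+")      # @username mentions


def clean_tweet(text, drop_mentions=False):
    """Remove links (and optionally @mentions), then collapse repeated whitespace."""
    text = URL_RE.sub("", text)
    if drop_mentions:
        text = MENTION_RE.sub("", text)
    return " ".join(text.split())


# Tweets taken from the outputs in this notebook
cleaned = clean_tweet("WTC Final: Virender Sehwag “Looking Forward” To Trent Boult "
                      "vs Rohit Sharma Contest | Cricket News  https://t.co/lMnn8Kgygt")
question = clean_tweet("@StarSportsIndia what is byju's cricket live start time for WTC Final?",
                       drop_mentions=True)
print(cleaned)
print(question)  # what is byju's cricket live start time for WTC Final?
```

Applied to the whole column, this would be `tweet_list = [clean_tweet(t) for t in tweet_df['tweet']]`.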
In [18]:
tweet_list = tweet_df['tweet'].tolist()

# the default model behind Hugging Face's 'sentiment-analysis' pipeline is a
# binary POSITIVE/NEGATIVE classifier, so there is no neutral class
from transformers import pipeline

sentiment_classifier = pipeline('sentiment-analysis')

results = sentiment_classifier(tweet_list)

In [24]:
# print each tweet with its predicted label and confidence score
for tweet, result in zip(tweet_list, results):
    print('====================================================================')
    print(tweet + '\n')
    print(f"label : {result['label']}, with score : {round(result['score'], 4)}")
    print()

Some all-time greats with the mic will make sure we enjoy the whole broadcast of the #WTCFinal and not just bits and pieces 🙃  #WTC21

label : POSITIVE, with score : 0.9997

Pic 1 : - WTC Final is just 2 days away but Jay Shah just sits in AC room and just signs some documents .  Pic 2 : - Meanwhile Omar abdullah discussing about cricket with former Indian cricketers.  "CHOOSE YOUR BCCI SECRETARY WISELY 😓"  https://t.co/fPzWUQv1yj

label : NEGATIVE, with score : 0.9792

@StarSportsIndia what is byju's cricket live start time for WTC Final?

label : NEGATIVE, with score : 0.9978

Let’s talk about the #WorldTestChampionship Final with Sports Journalists, Sana Ullah Khan and Nitin Bharadwaj, today at 9:30pm IST | 12pm EDT, on Bakstage.  . . . #Cricket #WTC #SportsCentral  https://t.co/eepHYfSEp5

label : NEGATIVE, with score : 0.9521

#INDvNZ #WTCFinal2021 #NZvIND 'Swing, Seam or Spin': @virendersehwag Backs #India Bowlers For to Outclass #NewZealand in @ICC #WTCFinal   #INDvsNZ #
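Because the classifier has no neutral class, informational tweets like the Bakstage announcement above still get a POSITIVE or NEGATIVE label. One common workaround, which is my own heuristic and not part of `transformers`, is to treat low-confidence predictions as NEUTRAL and then summarise the label counts. A minimal sketch: the 0.96 threshold is an arbitrary assumption, `label_with_neutral` is a hypothetical helper, and the data are the four label/score pairs printed above. Note it doesn't rescue everything: the byju's question is NEGATIVE with score 0.9978, well above any sensible threshold.

```python
from collections import Counter


def label_with_neutral(result, threshold=0.96):
    """Keep the pipeline's label if it is confident enough, else return NEUTRAL.

    The 0.96 threshold is an arbitrary assumption, not a transformers default.
    """
    return result["label"] if result["score"] >= threshold else "NEUTRAL"


# Label/score pairs copied from the four results printed above
shown_results = [
    {"label": "POSITIVE", "score": 0.9997},
    {"label": "NEGATIVE", "score": 0.9792},
    {"label": "NEGATIVE", "score": 0.9978},
    {"label": "NEGATIVE", "score": 0.9521},
]

raw_counts = Counter(r["label"] for r in shown_results)
adjusted = Counter(label_with_neutral(r) for r in shown_results)
print(raw_counts["NEGATIVE"], adjusted["NEGATIVE"], adjusted["NEUTRAL"])  # 3 2 1
```

With the full run, the same summary is `Counter(label_with_neutral(r) for r in results)`.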
