# Twitter Scraping

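This notebook is a simple example of how to scrape tweets from Twitter using the `twikit` library. The code uses twikit's synchronous interface; recent twikit releases (2.x) are asynchronous, so there the same `Client` methods must be called with `await`.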

#### Key points
- No API key is required;
- It is free to use;
- You only need a Twitter account.

#### References
- https://blog.apify.com/how-to-scrape-tweets-and-more-on-twitter-59330e6fb522/
- https://github.com/d60/twikit?tab=readme-ov-file#no-api-key-required
- https://twikit.readthedocs.io/en/latest/index.html


### Log in to Twitter

In [1]:
from twikit import Client
import json
import os
import pandas as pd

client = Client('en-US')

# Log in only on the first run; later runs reuse the session saved in `cookies.json`.
# `auth_info_1` accepts your username, email or phone number.
if os.path.exists('cookies.json'):
    client.load_cookies(path='cookies.json')
else:
    client.login(auth_info_1='user', password='password')
    client.save_cookies('cookies.json')
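Hard-coding the password in the notebook makes it easy to leak when the notebook is shared. A minimal sketch that reads the credentials from environment variables instead; the names `TWITTER_USER` and `TWITTER_PASSWORD` and the helper `get_credentials` are hypothetical choices for this example, not something twikit looks up itself:

```python
import os


def get_credentials(env=os.environ):
    """Read Twitter login credentials from environment variables.

    TWITTER_USER and TWITTER_PASSWORD are hypothetical names chosen for this
    example; twikit does not read them on its own.
    """
    user = env.get("TWITTER_USER")
    password = env.get("TWITTER_PASSWORD")
    if not user or not password:
        raise RuntimeError("Set TWITTER_USER and TWITTER_PASSWORD before logging in")
    return user, password


# Usage inside the login cell:
# user, password = get_credentials()
# client.login(auth_info_1=user, password=password)
```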

### Searching and filtering tweets

In [2]:
num_tweets_to_fetch = 5
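`count` only caps how many tweets a single request returns. To collect more, twikit's result objects provide a `next()` method that fetches the following page (check the twikit documentation for your version; in 2.x it is a coroutine). A sketch of a paging loop, where `collect_tweets` is a hypothetical helper that only assumes each page is iterable and has a synchronous `next()`:

```python
def collect_tweets(first_page, limit, max_pages=10):
    """Gather up to `limit` tweets by following next() across result pages.

    Assumes each page is iterable and exposes a synchronous next() that
    returns the following page (as twikit 1.x result objects do).
    `max_pages` caps the number of requests, which helps with rate limits.
    """
    collected = []
    page = first_page
    for page_number in range(1, max_pages + 1):
        batch = list(page)
        if not batch:
            break  # an empty page means there are no more results
        collected.extend(batch)
        if len(collected) >= limit or page_number == max_pages:
            break  # stop before requesting a page we will not use
        page = page.next()
    return collected[:limit]
```

For example, `collect_tweets(client.search_tweet(query='python', product='Latest', count=20), limit=100)` would request pages of 20 until 100 tweets are gathered or the results run out.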

Fetch tweets posted by a specific user:

In [3]:
user_name = 'BillGates'
user = client.get_user_by_screen_name(user_name)
# tweet_type can be 'Tweets', 'Replies', 'Media' or 'Likes'
tweets = user.get_tweets('Tweets', count=num_tweets_to_fetch)

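Alternatively, search for tweets matching a text query. Run only one of these two cells: both assign to `tweets`, so the one run last overwrites the other.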

In [None]:
search_input = "Python x Java"
# product can be 'Top', 'Latest' or 'Media'
tweets = client.search_tweet(query=search_input, product='Top', count=num_tweets_to_fetch)
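The `query` string uses the same syntax as the Twitter search box, so standard advanced-search operators such as `from:`, `lang:`, `since:`/`until:` and `-filter:retweets` can narrow the results. A small hypothetical helper that assembles such a query:

```python
def build_query(text, from_user=None, lang=None, since=None, until=None,
                exclude_retweets=False):
    """Compose a Twitter search query from optional advanced-search operators.

    `since` and `until` are expected as 'YYYY-MM-DD' strings.
    """
    parts = [text]
    if from_user:
        parts.append(f"from:{from_user}")
    if lang:
        parts.append(f"lang:{lang}")
    if since:
        parts.append(f"since:{since}")
    if until:
        parts.append(f"until:{until}")
    if exclude_retweets:
        parts.append("-filter:retweets")
    return " ".join(parts)


query = build_query("Python x Java", lang="en", since="2024-01-01",
                    exclude_retweets=True)
print(query)  # Python x Java lang:en since:2024-01-01 -filter:retweets
```

The result can be passed straight to `client.search_tweet(query=query, product='Latest', count=num_tweets_to_fetch)`.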

### Building the tweet dataset

In [4]:
tweets_to_store = []

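def format_medias(medias) -> list:
    """Return the URLs of the photos and videos attached to a tweet.

    For videos and animated GIFs, the URL of every available variant
    (different formats and bitrates) is included.
    """
    if not medias:
        return []
    urls = []
    for media in medias:
        if media.get('type') in ('video', 'animated_gif') and 'video_info' in media:
            for variant in media['video_info']['variants']:
                urls.append(variant['url'])
        elif 'media_url_https' in media:
            urls.append(media['media_url_https'])
    return urls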

for tweet in tweets:
    tweets_to_store.append({
        'id': tweet.id,
        'created_at': tweet.created_at,
        'user': tweet.user.screen_name,
        'favorite_count': tweet.favorite_count,
        'retweet_count': tweet.retweet_count,
        'urls': tweet.urls,
        'full_text': tweet.full_text,
        'hashtags': tweet.hashtags,
        'media': format_medias(tweet.media)
    })
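`format_medias` keeps every video variant, which usually means several MP4 files at different bitrates plus an HLS playlist (`.m3u8`). If one file per video is enough, here is a sketch that picks the highest-bitrate MP4, assuming the media dictionaries follow the `video_info` / `variants` layout used above, with MP4 variants carrying `content_type` and `bitrate` keys. The helper name `best_video_url` and the sample URLs are illustrative only:

```python
def best_video_url(media):
    """Return the URL of the highest-bitrate MP4 variant, or None.

    Assumes a raw media dict with 'video_info' -> 'variants'; variants
    without a 'bitrate' key (such as the HLS playlist) are skipped.
    """
    variants = media.get("video_info", {}).get("variants", [])
    mp4s = [
        v for v in variants
        if v.get("content_type") == "video/mp4" and "bitrate" in v
    ]
    if not mp4s:
        return None
    return max(mp4s, key=lambda v: v["bitrate"])["url"]


sample = {
    "type": "video",
    "video_info": {
        "variants": [
            {"content_type": "application/x-mpegURL", "url": "https://example.com/v.m3u8"},
            {"content_type": "video/mp4", "bitrate": 632000, "url": "https://example.com/low.mp4"},
            {"content_type": "video/mp4", "bitrate": 2176000, "url": "https://example.com/high.mp4"},
        ]
    },
}
print(best_video_url(sample))  # https://example.com/high.mp4
```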

### Creating a DataFrame and exporting it to JSON

In [5]:
df = pd.DataFrame(tweets_to_store)
df.to_json('tweets.json', orient='records')
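`orient='records'` writes a single JSON array. For larger or growing collections, a JSON Lines file (one tweet per line) is easier to append to and stream. A sketch using pandas' `lines=True` option: `force_ascii=False` keeps accented characters readable in the file, and reading back with `dtype=False` stops pandas from converting string IDs into numbers. The rows and the file name `tweets_sample.jsonl` are placeholders:

```python
import pandas as pd

# Placeholder rows shaped like `tweets_to_store` (subset of the columns).
rows = [
    {"id": "1", "user": "BillGates", "favorite_count": 10, "full_text": "Olá"},
    {"id": "2", "user": "BillGates", "favorite_count": 3, "full_text": "Hello"},
]
sample_df = pd.DataFrame(rows)

# lines=True requires orient='records': one JSON object per line.
sample_df.to_json("tweets_sample.jsonl", orient="records", lines=True,
                  force_ascii=False)

# dtype=False disables dtype inference, so 'id' stays a string.
back = pd.read_json("tweets_sample.jsonl", orient="records", lines=True,
                    dtype=False)
print(back.shape)  # (2, 4)
```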

### Sorting the tweets (optional)

In [None]:
# Sort tweets by number of likes, most liked first
print(df.sort_values(by='favorite_count', ascending=False))

# Print the raw records as formatted JSON (ensure_ascii=False keeps emoji and accents readable)
print(json.dumps(tweets_to_store, indent=4, ensure_ascii=False))
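Filtering works much like sorting. A sketch on a small made-up DataFrame (`sample_df`, so the notebook's `df` is not overwritten) with the same columns the notebook builds. It assumes `hashtags` holds a list of strings per tweet and that `created_at` uses Twitter's classic timestamp format (for example `Wed Oct 10 20:19:24 +0000 2018`); check a few of your own rows before relying on that format string:

```python
import pandas as pd

# Made-up rows with the columns used in the notebook.
sample_df = pd.DataFrame([
    {"id": "1", "created_at": "Wed Oct 10 20:19:24 +0000 2018",
     "favorite_count": 120, "full_text": "Learning #Python today", "hashtags": ["Python"]},
    {"id": "2", "created_at": "Thu Oct 11 08:00:00 +0000 2018",
     "favorite_count": 4, "full_text": "Java vs Python?", "hashtags": ["Java", "Python"]},
    {"id": "3", "created_at": "Fri Oct 12 12:30:00 +0000 2018",
     "favorite_count": 55, "full_text": "No hashtags here", "hashtags": []},
])

# Parse the timestamp strings into timezone-aware datetimes.
sample_df["created_at"] = pd.to_datetime(sample_df["created_at"],
                                         format="%a %b %d %H:%M:%S %z %Y")

# Keep tweets with at least 50 likes that mention Python (case-insensitive).
popular = sample_df[(sample_df["favorite_count"] >= 50)
                    & sample_df["full_text"].str.contains("python", case=False)]
print(popular["id"].tolist())  # ['1']

# Count hashtag occurrences: explode gives one row per hashtag.
print(sample_df["hashtags"].explode().value_counts())
```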