# Twitter Sentiment Analysis Scraper
This notebook sets up a Twitter scraper using the `ntscraper` package to collect tweets related to the Palestine-Israel conflict for sentiment analysis purposes.

The `ntscraper` package is an unofficial Python library designed for scraping Twitter data using Nitter instances. Here's how it works:

1. **Initialization**: To begin using `ntscraper`, you first import the Nitter class from the package and create an instance of the scraper. You have the option to set the log level and whether to skip checking the Nitter instances during script execution.

2. **Scraping Tweets**: The primary function of `ntscraper` is to scrape tweets. You can search for a term or hashtag, or scrape tweets from a specific user profile. The scraper accepts several parameters: the number of tweets, the date range (`since` and `until`), location (`near`), language, filters to apply or exclude, and the maximum number of retries per page. Unless an instance is specified, the scraper picks one at random.

3. **Multiprocessing**: `ntscraper` supports multiprocessing, allowing multiple terms to be scraped simultaneously, each in a different process. However, this feature should only be used within an `if __name__ == "__main__"` block to avoid errors.

4. **Profile Information**: Besides tweets, the scraper can also fetch profile information of Twitter users, returning details like display name, username, number of tweets, and profile picture.

5. **Handling Nitter Instances**: `ntscraper` can use a random public Nitter instance or a specific one if provided. Due to recent changes on Twitter's side, some instances might not work as expected.

6. **Installation**: The package can be easily installed using pip (`pip install ntscraper`).
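
The workflow above can be sketched without network access. The snippet below is a minimal illustration, not the library's own code: `scrape_terms` and `FakeScraper` are hypothetical names, and the stub only mimics the `get_tweets(...)` call and the `{'tweets': [...]}` result shape that this notebook relies on. Looping over the terms in a single process is one way to avoid depending on multiprocessing.

```python
def scrape_terms(scraper, terms, **search_kwargs):
    """Scrape each term separately and return {term: list_of_tweets}."""
    tweets_by_term = {}
    for term in terms:
        # One call per term; keyword arguments (since, until, ...) pass through unchanged.
        result = scraper.get_tweets(term, mode="term", **search_kwargs)
        tweets_by_term[term] = result["tweets"]
    return tweets_by_term


class FakeScraper:
    """Hypothetical offline stand-in for ntscraper's Nitter; returns canned data."""

    def get_tweets(self, term, mode="term", **search_kwargs):
        return {"tweets": [{"text": f"sample tweet about {term}"}]}


if __name__ == "__main__":
    # With the real library this would be: scraper = Nitter()
    scraper = FakeScraper()
    tweets = scrape_terms(scraper, ["palestine", "ghaza"], language="en", number=1)
    for term, items in tweets.items():
        print(term, len(items))
```

Passing the scraper in as an argument keeps the loop testable: the same function should work with a real `Nitter()` instance, assuming its results have the shape shown here.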

In [None]:

import pandas as pd

# Check if ntscraper is installed, if not, install it
try:
    from ntscraper import Nitter
except ImportError:
    !pip install ntscraper
    from ntscraper import Nitter
    

## Import Required Libraries
Here we import the necessary libraries for scraping and data manipulation. If `ntscraper` is not installed, it will be installed via `pip`.

In [None]:

scraper = Nitter()
    

## Initialize Scraper
We create an instance of the `Nitter` scraper.

In [None]:

terms = ["palestine", "ghaza"]
    

## Define Search Terms
We define the search terms 'palestine' and 'ghaza' to collect relevant tweets. The scrape in the next cell is restricted to 2-11 October 2023, roughly five days before and after the Hamas operation, and to tweets posted near the USA (United States).
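
The `since` and `until` strings can be derived from an event date instead of being typed by hand. This is a minimal sketch, assuming the event date is 7 October 2023; `date_window` is a hypothetical helper, not part of `ntscraper`.

```python
from datetime import date, timedelta


def date_window(event_day, days_before, days_after):
    """Return (since, until) as ISO 'YYYY-MM-DD' strings around event_day."""
    since = event_day - timedelta(days=days_before)
    until = event_day + timedelta(days=days_after)
    return since.isoformat(), until.isoformat()


# Assumption: the event date is 7 October 2023.
since, until = date_window(date(2023, 10, 7), days_before=5, days_after=4)
print(since, until)  # 2023-10-02 2023-10-11
```

Under that assumption, the notebook's `until='2023-10-11'` sits four days after the event, so a full five days after it would need a later `until`. Whether `until` itself is included in the results depends on the search backend, which this sketch does not assume.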

In [None]:

results = scraper.get_tweets(terms, mode='term', until='2023-10-11', since='2023-10-02',
                             language='en', near='USA', number=6000)
    

## Scrape Tweets
Execute the scraper to fetch tweets with the specified terms, within the given date range, language, and geographical proximity to the USA.

In [None]:

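# `results` holds one entry per search term; combine the tweets from all terms,
# not just the first one
df_raw = pd.DataFrame([tweet for result in results for tweet in result['tweets']])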
    

## Dataframe Creation
Convert the raw results into a pandas DataFrame for easier manipulation and analysis.
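
When several terms are scraped, it can help to record which term produced each row and to drop tweets matched by more than one term. The sketch below uses made-up sample data shaped like `results` (one dict per term, each with a `'tweets'` list). The `'link'` and `'text'` keys are assumptions for illustration, as is the result order matching the order of `terms`.

```python
import pandas as pd

# Hypothetical sample shaped like the notebook's `results`: one dict per term,
# each holding a 'tweets' list. The 'link' and 'text' keys are assumptions.
terms = ["palestine", "ghaza"]
results = [
    {"tweets": [{"link": "t/1", "text": "first"}, {"link": "t/2", "text": "second"}]},
    {"tweets": [{"link": "t/2", "text": "second"}, {"link": "t/3", "text": "third"}]},
]

frames = []
for term, result in zip(terms, results):
    frame = pd.DataFrame(result["tweets"])
    frame["term"] = term  # remember which search produced the row
    frames.append(frame)

# Stack all terms, then keep the first occurrence of each tweet link.
df_all = pd.concat(frames, ignore_index=True)
df_all = df_all.drop_duplicates(subset="link").reset_index(drop=True)
print(df_all)
```

Deduplicating on a link-like identifier rather than on the text keeps distinct tweets that happen to share the same wording.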

In [None]:

# Separate user information into new columns
user_df = df_raw['user'].apply(pd.Series)
df = pd.concat([df_raw, user_df], axis=1)
    

## Process User Information
Extract user information into separate columns for detailed analysis.

In [None]:

# Separate 'stats' column into new columns
stats_df = df['stats'].apply(pd.Series)
df = pd.concat([df, stats_df], axis=1)

# Drop the original 'stats' and 'user' columns
df = df.drop(['stats', 'user'], axis=1)
    

## Process Tweet Statistics
Separate tweet statistics into individual columns, then drop the original nested 'stats' and 'user' columns.
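
As an alternative to calling `apply(pd.Series)` once per nested column, `pd.json_normalize` can flatten the raw tweet dictionaries in one step. This is a sketch on made-up records: only the `'user'` and `'stats'` keys come from this notebook, and the inner field names are assumptions.

```python
import pandas as pd

# Hypothetical tweet records; only the 'user' and 'stats' keys come from the notebook,
# the inner field names are assumptions for illustration.
tweets = [
    {"text": "hello", "user": {"name": "Ann", "username": "@ann"},
     "stats": {"likes": 3, "retweets": 1}},
    {"text": "world", "user": {"name": "Bob", "username": "@bob"},
     "stats": {"likes": 0, "retweets": 2}},
]

# Nested dicts become prefixed columns such as 'user_name' and 'stats_likes'.
df_flat = pd.json_normalize(tweets, sep="_")
print(df_flat.columns.tolist())
```

The prefixed column names also guard against collisions if a tweet-level field and a nested field ever share the same name.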

In [None]:

df.to_csv('twitter_raw.csv', index=False)
    