### Twitter scraper 

This notebook scrapes tweets with the snscrape module. The parameters below work as follows:
- **maxTweets**: Maximum number of tweets to download per day (the goal was to scrape all tweets, but I capped the amount at 1M)
- **datelist**: List of dates to query, one day per entry
- **query_text**: Text to search for (Ukraine, in this case)

The goal of this notebook was to scrape all tweets for 4 days in December (from 01/12/21 to 04/12/21) as a "control group" and for 4 days during the conflict (from 23/02/22 to 26/02/22) for comparison.
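
As a minimal sketch of how each entry in `datelist` becomes a one-day search window: the date string is parsed, converted to a Unix timestamp, and paired with the timestamp 24 hours later. The helper name `build_day_query` is hypothetical and used only for illustration. It assumes, as the notebook does, that the search accepts Unix timestamps after `since:` and `until:`.

```python
from datetime import datetime, timedelta


def build_day_query(query_text, date_str):
    """Return a search string covering one calendar day (hypothetical helper)."""
    day_start = datetime.strptime(date_str, '%b %d %Y')
    day_end = day_start + timedelta(days=1)
    # timestamp() interprets a naive datetime in the machine's local timezone,
    # so the exact numbers depend on where the notebook runs.
    sd = int(day_start.timestamp())
    ed = int(day_end.timestamp())
    return f'{query_text} since:{sd} until:{ed}'


print(build_day_query('Ukraine', 'Dec 01 2021'))
```

Because the window is anchored to local midnight, running the notebook on machines in different timezones may shift which tweets fall into each day.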

In [None]:
from datetime import datetime, timedelta
import snscrape.modules.twitter as sntwitter
import pandas as pd 

In [None]:
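from itertools import islice

maxTweets = 1000000  # upper bound on the number of tweets kept per day
datelist = ['Dec 01 2021', 'Dec 02 2021', 'Dec 03 2021', 'Dec 04 2021',
            'Feb 23 2022', 'Feb 24 2022', 'Feb 25 2022', 'Feb 26 2022']
query_text = 'Ukraine'

for date in datelist:
    # Build a one-day window as Unix timestamps (local timezone of the machine)
    datetime_object = datetime.strptime(date, '%b %d %Y')
    sd = str(int(datetime_object.timestamp()))
    ed = str(int((datetime_object + timedelta(days=1)).timestamp()))

    # Stop after maxTweets items; without this cap maxTweets was never applied
    scraper = sntwitter.TwitterSearchScraper(f'{query_text} since:{sd} until:{ed}')
    tweets = islice(scraper.get_items(), maxTweets)
    df_coords = pd.DataFrame(tweets)[['date', 'content', 'user', 'coordinates', 'place']]

    # One tilde-separated file per day
    df_coords.to_csv(f'df_with_coors_Ukraine_{date}.csv', sep='~')
    print(len(df_coords), date)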

Notes on saving the files:
- Scraping is slow, and we do not want to start over after a memory error, so I save a separate dataframe for each day and reuse the same variable name (df_coords). That way only one day of tweets is held in memory at a time.
- Files are saved with a tilde (~) separator so that a tweet is not split up when the user included a '\n' or a ';' in the text. I suggest keeping it this way and reading the CSV back with the same separator.
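
Reading a saved file back could look like the sketch below. The sample rows and the temporary file path are invented for illustration and stand in for one of the per-day CSVs; the only part taken from the notebook is the `sep='~'` setting.

```python
import os
import tempfile

import pandas as pd

# Stand-in data (invented for illustration), including a newline and a ';'
sample = pd.DataFrame({
    'date': ['2021-12-01', '2021-12-01'],
    'content': ['first line\nsecond line', 'semi;colon in text'],
})

# Write to a temporary location with the same separator the notebook uses
path = os.path.join(tempfile.mkdtemp(), 'df_with_coors_Ukraine_sample.csv')
sample.to_csv(path, sep='~')

# index_col=0 restores the index column that to_csv writes by default
restored = pd.read_csv(path, sep='~', index_col=0)
print(restored)
```

Passing `index_col=0` avoids ending up with an extra `Unnamed: 0` column, since the notebook saves the dataframe index along with the data.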

More info at: 

https://github.com/igorbrigadir/twitter-advanced-search \
https://medium.com/swlh/how-to-scrape-tweets-by-location-in-python-using-snscrape-8c870fa6ec25