# Using a package

The code below uses the Python [ntscraper](https://github.com/bocchilorenzo/ntscraper) library, which provides a simple set of commands for scraping tweets and profiles from Nitter, an alternative front-end for Twitter. (Note that Nitter instances can go down at any moment, and even when one is available, scraping runs very slowly.)

In [2]:
import pandas as pd
from ntscraper import Nitter
import json
import os
import time
from IPython.display import clear_output

We start with a list of congressional X handles from [this GitHub repository](https://github.com/unitedstates/congress-legislators).

In [3]:
handles = pd.read_json('https://unitedstates.github.io/congress-legislators/legislators-social-media.json')
# keep only entries that actually list a Twitter/X handle (skip missing ones)
socials = [i.get('twitter') for i in handles['social'] if i.get('twitter')]
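
Some entries in the social-media file may not include a `twitter` key, in which case `.get('twitter')` returns `None` and later string operations on the handle would fail. Here is a minimal sketch of dropping those entries, using a few made-up records in place of the real download. The sample handles and the `extract_handles` helper are hypothetical.

```python
# Hypothetical records shaped like the entries of the "social" column.
sample_social = [
    {"twitter": "ExampleSenator", "facebook": "examplesenator"},
    {"facebook": "no.twitter.here"},  # no "twitter" key, so .get() gives None
    {"twitter": "ExampleRep"},
]


def extract_handles(social_records):
    """Return the Twitter handles present in the records, skipping missing ones."""
    handles = (record.get("twitter") for record in social_records)
    return [h for h in handles if h]


sample_handles = extract_handles(sample_social)
print(sample_handles)
```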

Next, make a directory to hold the tweet data:

In [4]:
os.makedirs('congress_tweets', exist_ok=True)

Then we'll set up our scraper and initialize an "errors" counter that we can use to break the loop if we keep encountering errors.

In [5]:
scraper = Nitter(log_level=1, skip_instance_check=False)
errors = 0 

Testing instances: 100%|█████████████████████████████████████████████████████████████████| 6/6 [00:04<00:00,  1.25it/s]
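
The error-counter idea can be sketched on its own, without touching the network. This is only an illustration: `run_with_error_budget` and `flaky_fetch` are hypothetical names, and `flaky_fetch` stands in for a real scraper call by failing on purpose for some inputs.

```python
MAX_ERRORS = 3  # illustrative threshold; pick whatever budget suits your run


def run_with_error_budget(items, fetch, max_errors=MAX_ERRORS):
    """Call fetch(item) for each item, stopping once errors exceed max_errors."""
    results, errors = {}, 0
    for item in items:
        try:
            results[item] = fetch(item)
        except Exception:
            errors += 1
            if errors > max_errors:
                print("max errors exceeded, quitting")
                break
    return results, errors


def flaky_fetch(handle):
    """Hypothetical stand-in for a scraper call: fails for handles starting with 'bad'."""
    if handle.startswith("bad"):
        raise ConnectionError("simulated failure for " + handle)
    return {"handle": handle}


results, errors = run_with_error_budget(
    ["a", "bad1", "b", "bad2", "bad3", "bad4", "c"], flaky_fetch
)
```

Counting errors across the whole run, rather than retrying one item forever, keeps a dead instance from stalling the loop indefinitely.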


Finally, we'll run the loop:

In [None]:


for current_handle in socials:
    filename = "congress_tweets/" + current_handle + ".json"
    # skip handles whose tweets are already saved, so the loop can be re-run safely
    if os.path.exists(filename):
        print("profile already scraped")
        continue
    try:
        print("retrieving data for " + current_handle)
        # only scraping 20, but could be increased to get up to 800 or so
        member_tweets = scraper.get_tweets(current_handle, mode='user', instance='https://nitter.privacyredirect.com', number=20, max_retries=1)
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(member_tweets, f, ensure_ascii=False, indent=4)
        clear_output()
    except Exception:
        # on any failure, rebuild the scraper and count the error
        print("error, attempting to reconnect")
        scraper = Nitter(log_level=1, skip_instance_check=False)
        errors = errors + 1
    # give up once the error budget is exhausted
    if errors > 20:
        print("max errors exceeded, quitting")
        break
    # pause between requests to go easy on the instance
    time.sleep(1)
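
Once the loop has written some files, they can be read back for analysis. The sketch below assumes each saved file is a dictionary with a `"tweets"` list of per-tweet dictionaries, which is how I understand ntscraper's `get_tweets` output to be shaped; check one of your own files before relying on it. The `load_saved_tweets` helper is a hypothetical name, not part of the library.

```python
import json
from pathlib import Path


def load_saved_tweets(directory="congress_tweets"):
    """Read every .json file in `directory` and return one row per tweet."""
    rows = []
    for path in sorted(Path(directory).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: each file is a dict with a "tweets" list; adjust if yours differs.
        tweets = data.get("tweets", []) if isinstance(data, dict) else []
        for tweet in tweets:
            # The file name (minus .json) is the handle the tweets were scraped for.
            rows.append({"handle": path.stem, **tweet})
    return rows
```

Since the result is a list of dictionaries, `pd.DataFrame(load_saved_tweets())` should turn it into a table with one row per tweet.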
    


