## Grabbing Wordle games posted to Twitter

dt

5/2/22

This notebook uses the requests library to search through Twitter and retrieve messages with Wordle result posts. People often tack on commentary, so I don't necessarily want to exclude posts with that. I could try to guess which Wordle is being solved by the date, but I will try to limit to posts that begin with the Wordle # info to easily decode what the solution is.

[Tweet interpreter](#Tweet-interpreter)

# A game of Wordle

I'll play a [game of Wordle](https://www.nytimes.com/games/wordle/index.html) now to show the input and output

![wordle may 2, 2022](./pix/wordleex.png)

Not too bad if I may say so myself!

In [2]:
#example of raw share result
'''Wordle 317 3/6

⬜🟨⬜🟨⬜
🟨⬜🟨🟨🟩
🟩🟩🟩🟩🟩'''

'Wordle 317 3/6\n\n⬜🟨⬜🟨⬜\n🟨⬜🟨🟨🟩\n🟩🟩🟩🟩🟩'

The game needs to interpret any missing rows due to _expertly_ solving the game ahead of the 6-row limit. I like this setup. I could explicitly only search for posts containing exactly this format, but it's possible people will cut out the wordle title, number, number of rows, or tack on something at the end. I'd like to grab information out of the text at the top, but most important is converting the grid of boxes into a simple array of ints.

# Tweet interpreter

I need a function that takes in the raw tweet text, checks if it contains the kind of grid I am looking for, extracts that grid and any other relevant info like the date and time, any other twitter info, the wordle number, anything else really.

In [3]:
#the first search I saved using the function search_twitter defined in a section below
first_twitter_search_example = "search_twitter('🟩', 'tweet.fields=text', my_bearer_token)"

In [4]:
first_result_dict = {'data': [{'id': '1516628016776060931',
   'text': 'RT @bigbangnw: 🟩 Esto es empoderamiento\n\n🚨 Monjas de Salta denunciaron al arzobispo y a dos curas por Violencia de Género\n\nhttps://t.co/2MH…'},
  {'id': '1516628014905184257',
   'text': 'Wordle 305 3/6\n\n🟨🟨⬛⬛⬛\n🟩⬛🟨⬛🟩\n🟩🟩🟩🟩🟩'},
  {'id': '1516628011419725824',
   'text': '@IrishGirl2117 Wordle 305 3/6\n\n⬜⬜🟨🟨⬜\n🟨🟨⬜🟩⬜\n🟩🟩🟩🟩🟩\nDeep sigh….of satisfaction'},
  {'id': '1516628011067613189',
   'text': 'Poeltl 55 - 4/8 - 👤4\n\n⬛⬛🟩⬛⬛⬛🟨⬛\n⬛⬛⬛⬛⬛⬛⬛⬛\n⬛⬛🟩🟩🟨🟨🟩⬛\n🟩🟩🟩🟩🟩🟩🟩🟩'},
  {'id': '1516628009976967168',
   'text': 'THE BOYZ Heardle #14\n\n🔉🟩⬜️⬜️⬜️⬜️⬜️\n\nhttps://t.co/5C760y9GNl'},
  {'id': '1516628008773521415',
   'text': 'RT @matrix_movieJP: ＼📢ブルーレイ＆DVD発売記念／\n\n🟩その手で未来を選べ！🟩\n『マトレザ』フォロー&amp;RTキャンペーン\n\n今目の前に『#マトリックス』の真実がある…\n\n問おう。\nあなたはどちらの未来を選ぶ？\n\n💊フォロー\n💊RT\n💊抽選で🎁\n\n──答えはも…'},
  {'id': '1516628008408436738',
   'text': 'Twenty One Pilots Heardle #6\n\n🔊🟩⬜️⬜️⬜️⬜️⬜️\n\nBELOVED..................'},
  {'id': '1516628005090693127',
   'text': 'Wordle 305 2/6\n\n🟨🟩⬛🟩⬛\n🟩🟩🟩🟩🟩\n\nno one talk to me'},
  {'id': '1516628004935708673',
   'text': 'RT @matrix_movieJP: ＼📢ブルーレイ＆DVD発売記念／\n\n🟩その手で未来を選べ！🟩\n『マトレザ』フォロー&amp;RTキャンペーン\n\n今目の前に『#マトリックス』の真実がある…\n\n問おう。\nあなたはどちらの未来を選ぶ？\n\n💊フォロー\n💊RT\n💊抽選で🎁\n\n──答えはも…'},
  {'id': '1516628004780507136',
   'text': '#ことのはたんご 第89回  7/10\n \n⬜🟨⬜⬜🟨 608 \n🟨🟨🟨⬜⬜ 25 \n⬜⬜⬜🟩🟩 15 \n⬜🟩🟩🟩🟩 7 \n⬜🟩🟩🟩🟩 6 \n⚪🟢🟢🟢🟢 5 \n🟢🟢🟢🟢🟢 1'}],
 'meta': {'newest_id': '1516628016776060931',
  'oldest_id': '1516628004780507136',
  'result_count': 10,
  'next_token': 'b26v89c19zqg8o3fpytnjsqegfjyl5wp9hdb6vqs8cghp'}}

### interpret_row

In [9]:
# takes in a string of wordle color emojis and gives an array of ints with 0, 1, 2 as gray, yellow, green. Can also tell if
# the row ends with a \n or not, or convert squares or other elements to other entries.

# I'm going to be doing a LOT of these replacements, so I am going to try to use as pythony a way as I know how.
# there are functions like str.replace() and str.translate(), but the first seems too simple and would need a repeated
# series of passes through and rewriting of the string. This sounds costly. The second involves creating an 
# explicit mapper from an iterable (like a dict) first, and then passing that through the string's translate method.
# this also strikes me as costly for such a simple conversion. I'm just going to use a dict and list comp. This also has
# the benefit of setting up the list format I want.

default_square_converter = {
    '⬜':0,
    '⬛':0,
    '🟨':1,
    '🟩':2
}

def interpret_row(row_string, converter = default_square_converter):
    interpreted_row = [
        converter[char]
        for char in row_string
    ]
    return interpreted_row
        

# def interpret_grid(grid_string):
    

In [10]:
# test it out.
interpret_row('⬜🟨⬜⬜🟨')

[0, 1, 0, 0, 1]

In [11]:
interpret_row('🟨🟩⬛🟩⬛')

[1, 2, 0, 2, 0]

Looks good to me!

### Interpret 

# Twitter Search

The following function was copied from a [search tutorial](https://towardsdatascience.com/searching-for-tweets-with-python-f659144b225f) on Towards Data Science written by Martin Šiklar, accessed April 2022.

In [12]:
key_file_path = '../../APIKEYS/Twitter.txt'
bearer_token_file_path = '../../APIKEYS/twitter_bearer_token.txt'
with open(bearer_token_file_path, 'r') as bearer_token_file:
    my_bearer_token = bearer_token_file.read()

import requests
import json
#its bad practice to place your bearer token directly into the script (this is just done for illustration purposes)
BEARER_TOKEN = my_bearer_token
#define search twitter function
def search_twitter(query, tweet_fields, bearer_token = BEARER_TOKEN):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}

    url = "https://api.twitter.com/2/tweets/search/recent?query={}&{}".format(
        query, tweet_fields
    )
    response = requests.request("GET", url, headers=headers)

    print(response.status_code)

    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()

### TWEET FIELDS
[parameters for tweet fields from Twitter developer docs](https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent)

- attachments
- author_id
- context_annotations
- conversation_id
- created_at
- entities
- geo
- id
- in_reply_to_user_id
- lang
- non_public_metrics
- public_metrics
- organic_metrics,
- promoted_metrics
- possibly_sensitive
- referenced_tweets
- reply_settings
- source
- text
- withheld

Any of these parameters can be added as a string of the form `tweet.fields=text,author_id,created_at`, which simply gets sent as part of the http request. It may be worth reading through the page above to familiarize myself with each of these fields.

### Tweet caps
Unfortunately, there are a [number of limitations](https://developer.twitter.com/en/docs/twitter-api/tweet-caps) for what and how much can be searched for for each project. A full archive search does not appear to be possible, but there are surely many "recent" tweets to work with.

According to my [Twitter developer dashboard](https://developer.twitter.com/en/portal/dashboard), I've pulled 30 Tweets. That was in the `TweepyPractice.ipynb` where I failed to use Tweepy but was able to pull some tweets with http requests. This feels like a lot of data, and the first tweets I was able to grab were 30% of the format I wanted. Others all had the box I searched for but were for different games or possibly unrelated. At least 4 appear to be in foreign languages --- perhaps I can leverage the `lang` tweet field to stick just to english? or perhaps I can do something with the language data as well.

Perhaps most importantly here, the search is only done on tweets from the last _**seven  days**_. This means I will have a short collection of Wordle answers.


### regex search

I should look into whether I can use regex to search for tweets with any row of five of the squares. There are four that I want to search for: 🟩 (correct letter and placement), 🟨 (letter elsewhere in word), and ⬜ and ⬛ (letter not in word, light vs dark modes). Searching for only rows of 5 squares will greatly reduce the number of guff posts like some of those above.

A regex search would have logic like (🟩 or 🟨 or ⬜ or ⬛)x5, perhaps with the inclusion of an \n at the end, but this will exclude wins on first try.


[Unfortunately, it looks like Twitter API doesn't support regex searches](https://stackoverflow.com/questions/23363940/using-regular-expression-in-twitter-api#:~:text=Twitter%20unfortunately%20doesn't%20support,using%20regex%20(including%20me)) as of 2016. This is for Twitter API v1, but a quick google search seems to say the same for v2 as well.
[Twitter itself has a Ruby script to search Tweets with regex](https://github.com/twitter/twitter-text/blob/master/rb/lib/twitter-text/regex.rb), but I believe this is for already-gathered text.

In [13]:
first_result_dict

{'data': [{'id': '1516628016776060931',
   'text': 'RT @bigbangnw: 🟩 Esto es empoderamiento\n\n🚨 Monjas de Salta denunciaron al arzobispo y a dos curas por Violencia de Género\n\nhttps://t.co/2MH…'},
  {'id': '1516628014905184257',
   'text': 'Wordle 305 3/6\n\n🟨🟨⬛⬛⬛\n🟩⬛🟨⬛🟩\n🟩🟩🟩🟩🟩'},
  {'id': '1516628011419725824',
   'text': '@IrishGirl2117 Wordle 305 3/6\n\n⬜⬜🟨🟨⬜\n🟨🟨⬜🟩⬜\n🟩🟩🟩🟩🟩\nDeep sigh….of satisfaction'},
  {'id': '1516628011067613189',
   'text': 'Poeltl 55 - 4/8 - 👤4\n\n⬛⬛🟩⬛⬛⬛🟨⬛\n⬛⬛⬛⬛⬛⬛⬛⬛\n⬛⬛🟩🟩🟨🟨🟩⬛\n🟩🟩🟩🟩🟩🟩🟩🟩'},
  {'id': '1516628009976967168',
   'text': 'THE BOYZ Heardle #14\n\n🔉🟩⬜️⬜️⬜️⬜️⬜️\n\nhttps://t.co/5C760y9GNl'},
  {'id': '1516628008773521415',
   'text': 'RT @matrix_movieJP: ＼📢ブルーレイ＆DVD発売記念／\n\n🟩その手で未来を選べ！🟩\n『マトレザ』フォロー&amp;RTキャンペーン\n\n今目の前に『#マトリックス』の真実がある…\n\n問おう。\nあなたはどちらの未来を選ぶ？\n\n💊フォロー\n💊RT\n💊抽選で🎁\n\n──答えはも…'},
  {'id': '1516628008408436738',
   'text': 'Twenty One Pilots Heardle #6\n\n🔊🟩⬜️⬜️⬜️⬜️⬜️\n\nBELOVED..................'},
  {'id': '1516628005090693127',
   'tex

In [14]:
first_twitter_search_example

"search_twitter('🟩', 'tweet.fields=text', my_bearer_token)"

Since I can't search Twitter by regex, I will perform the search, and then use regex on the results to find what I need.

In [20]:
search_twitter('🟩', 'tweet.fields=text,geo,entities,lang,source', my_bearer_token)

200


{'data': [{'text': 'Wordle 317 5/6\n\n⬛⬛⬛🟨🟨\n🟩⬛⬛🟨⬛\n🟨⬛⬛⬛⬛\n🟩⬛🟩⬛⬛\n🟩🟩🟩🟩🟩',
   'source': 'Twitter for iPhone',
   'id': '1521337377083604993',
   'lang': 'en'},
  {'text': 'Wordle 318 5/6\n\n⬜⬜🟨🟨⬜\n🟨🟨⬜⬜🟩\n⬜🟩🟨⬜🟩\n🟩🟩⬜⬜🟩\n🟩🟩🟩🟩🟩',
   'source': 'Twitter for Android',
   'id': '1521337373086580738',
   'lang': 'en'},
  {'text': 'Little Mix Heardle #12\n\n🔊🟩⬜️⬜️⬜️⬜️⬜️',
   'source': 'Twitter for iPhone',
   'id': '1521337368762060800',
   'lang': 'en'},
  {'entities': {'hashtags': [{'start': 0, 'end': 8, 'tag': 'ことのはたんご'}]},
   'text': '#ことのはたんご 第102回  X/10\n \n⬜⬜🟩⬜⬜ 934 \n⬜🟩🟩⬜⬜ 384 \n🟩🟩🟩⬜🟩 17 \n🟩🟩🟩⬜🟩 16 \n🟩🟩🟩⬜🟩 15 \n🟢🟢🟢⚪🟢 14 \n🟢🟢🟢⚪🟢 13 \n🟢🟢🟢⚪🟢 12 \n🟢🟢🟢⚪🟢 11 \n🟢🟢🟢⚪🟢 10\n\nあたらーん。',
   'source': 'Twitter for iPhone',
   'id': '1521337360046501889',
   'lang': 'ja'},
  {'text': '@AndriaBitton Me too! \n\nWordle 317 3/6\n\n⬛⬛⬛⬛⬛\n⬛🟨🟨⬛🟩\n🟩🟩🟩🟩🟩',
   'source': 'Twitter for iPhone',
   'id': '1521337358213337088',
   'entities': {'mentions': [{'start': 0,
      'end': 13,
      'username': 'AndriaBitto

In [21]:
search_twitter('🟩 Wordle', 'tweet.fields=text,geo,entities,lang,source', my_bearer_token)

200


{'data': [{'text': 'Wordle 318 5/6\n\n⬛⬛⬛⬛⬛\n⬛🟨⬛🟨🟨\n⬛🟨🟨⬛🟨\n⬛🟩🟨⬛🟩\n🟩🟩🟩🟩🟩',
   'source': 'Twitter for iPhone',
   'id': '1521337903129202688',
   'lang': 'en'},
  {'text': 'Wordle 317 3/6\n\n🟩🟩⬛🟩⬛\n🟩🟩⬛🟩⬛\n🟩🟩🟩🟩🟩',
   'source': 'Twitter for iPhone',
   'id': '1521337895537283073',
   'lang': 'en'},
  {'text': 'Wordle 318 5/6\n\n⬜🟩🟨⬜🟩\n🟨🟩⬜⬜🟩\n⬜🟩🟩🟩🟩\n⬜🟩🟩🟩🟩\n🟩🟩🟩🟩🟩',
   'source': 'Twitter for iPhone',
   'id': '1521337893284761605',
   'lang': 'en'},
  {'text': 'Wordle Türkçe 318 3/6\n\n🟨⬜⬜🟩⬜\n🟨⬜🟨🟩⬜\n🟩🟩🟩🟩🟩\n\nhttps://t.co/9xejrWxtwf',
   'entities': {'urls': [{'start': 42,
      'end': 65,
      'url': 'https://t.co/9xejrWxtwf',
      'expanded_url': 'https://www.wordleturkce.com/',
      'display_url': 'wordleturkce.com',
      'status': 200,
      'title': 'Wordle Türkçe - Günlük Kelime Oyunu',
      'description': 'Wordle Türkçe Tr Oyna. Günün kelimesini bul. Wordle nasıl oynanır herkese göster.',
      'unwound_url': 'https://www.wordleturkce.com/'}]},
   'source': 'Twitter for iPhone',
  

Ok, this is good. I want a DataFrame that has columns...

| id (index) | text | geo | entities | lang | source |

What else?

In [None]:
attachments
author_id
context_annotations
conversation_id
created_at
entities
geo
id
in_reply_to_user_id
lang
non_public_metrics
public_metrics
possibly_sensitive
referenced_tweets
reply_settings
source
text
withheld

In [23]:
search_twitter('🟩 Wordle', 'tweet.fields=attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,possibly_sensitive,referenced_tweets,reply_settings,source,text,withheld', my_bearer_token)

200


{'data': [{'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'lang': 'en',
   'text': 'Wordle 318 4/6\n\n⬛⬛🟨🟨⬛\n⬛🟩⬛🟩🟩\n⬛🟩🟩🟩🟩\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521351866545811458',
   'created_at': '2022-05-03T04:52:04.000Z',
   'author_id': '1080339364432044032',
   'source': 'Twitter for iPhone',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}],
   'id': '1521351866545811458',
   'possibly_sensitive': False,
   'reply_settings': 'everyone'},
  {'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'lang': 'en',
   'text': 'Wordle (ES)  #117 3/6\n\n⬜⬜⬜⬜🟩\n⬜⬜🟩⬜🟩\n🟩🟩🟩🟩🟩\n https://t.co/KW7ZOLuywJ https://t.co/rmpYwYxw9B',
   'entities': {'urls': [{'start

In [24]:
dat = Out[23]['data']

In [27]:
import pandas as pd

In [28]:
df = pd.DataFrame(dat)

In [30]:
df.head()

Unnamed: 0,public_metrics,lang,text,conversation_id,created_at,author_id,source,context_annotations,id,possibly_sensitive,reply_settings,entities,attachments,in_reply_to_user_id,referenced_tweets
0,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 318 4/6\n\n⬛⬛🟨🟨⬛\n⬛🟩⬛🟩🟩\n⬛🟩🟩🟩🟩\n🟩🟩🟩🟩🟩,1521351866545811458,2022-05-03T04:52:04.000Z,1080339364432044032,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351866545811458,False,everyone,,,,
1,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle (ES) #117 3/6\n\n⬜⬜⬜⬜🟩\n⬜⬜🟩⬜🟩\n🟩🟩🟩🟩🟩\n...,1521351849911193600,2022-05-03T04:52:01.000Z,2933210357,Twitter for Android,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351849911193600,False,everyone,"{'urls': [{'start': 42, 'end': 65, 'url': 'htt...",{'media_keys': ['16_1521351843950972929']},,
2,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",ja,日本語のWordle（ローマ字）\n#111\n4/6\nhttps://t.co/wqPI...,1521351849206763520,2022-05-03T04:52:00.000Z,1417413144058556427,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351849206763520,False,everyone,"{'urls': [{'start': 26, 'end': 49, 'url': 'htt...",,,
3,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle (ES) #117 5/6\n\n⬜⬜⬜⬜⬜\n🟨🟩⬜⬜🟩\n⬜🟩🟩⬜🟩\n...,1521351829464006662,2022-05-03T04:51:56.000Z,109343037,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351829464006662,False,everyone,"{'urls': [{'start': 54, 'end': 77, 'url': 'htt...",,,
4,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 318 5/6\n\n⬜⬜🟨⬜⬜\n⬜⬜🟨⬜⬜\n⬜🟩⬜⬜🟨\n⬜🟩🟨⬜🟨\n...,1521351825966120960,2022-05-03T04:51:55.000Z,25260021,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351825966120960,False,everyone,,,,


In [31]:
df.loc[0, 'public_metrics']

{'retweet_count': 0, 'reply_count': 0, 'like_count': 0, 'quote_count': 0}

In [42]:
# takes a DataFrame and replaces it with a new column for each key in a dict/json column
def expand_json_col(tweet_df, column, inplace = False):
    try:
        col_series = tweet_df[column]
        return col_series
    except KeyError:
        print(f'Key error! A column called "{column}" may not be in the DataFrame.')
    except TypeError:
        print(f'Type error! Use a pandas DataFrame with a dict / JSON column.')

In [45]:
expand_json_col(df,'public_metrics')[0]

{'retweet_count': 0, 'reply_count': 0, 'like_count': 0, 'quote_count': 0}

In [34]:
'''Wordle 318 4/6

🟨🟨🟩⬜⬜
🟨🟩🟩⬜🟩
⬜🟩🟩🟩🟩
🟩🟩🟩🟩🟩'''

'Wordle 318 4/6\n\n🟨🟨🟩⬜⬜\n🟨🟩🟩⬜🟩\n⬜🟩🟩🟩🟩\n🟩🟩🟩🟩🟩'

In [48]:
df['lang'].value_counts()

en     7
ja     2
und    1
Name: lang, dtype: int64

### lang
The language is automatically determined by Twitter. My guess is there is a default to English based on more information than just the text --- the only word in some of the English-labeled tweets is "Wordle." According to the [Search Tweets](https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent) page, `lang` is returned as a "BCP47 language tag."

> Phillips, A., Ed., and M. Davis, Ed., "Matching of Language Tags", BCP 47, RFC 4647, September 2006.

> Phillips, A., Ed., and M. Davis, Ed., "Tags for Identifying Languages", BCP 47, RFC 5646, September 2009.

> https://www.rfc-editor.org/info/bcp47

My guess is that Twitter adheres to this strictly. For now I expect the values to all come from a common standard list. The pdf doc in the link above doesn't seem to have a list for corresponding languages. They might be Twitter-specific, or listed in a matching (RFC?) doc.

In [49]:
first_result_dict

{'data': [{'id': '1516628016776060931',
   'text': 'RT @bigbangnw: 🟩 Esto es empoderamiento\n\n🚨 Monjas de Salta denunciaron al arzobispo y a dos curas por Violencia de Género\n\nhttps://t.co/2MH…'},
  {'id': '1516628014905184257',
   'text': 'Wordle 305 3/6\n\n🟨🟨⬛⬛⬛\n🟩⬛🟨⬛🟩\n🟩🟩🟩🟩🟩'},
  {'id': '1516628011419725824',
   'text': '@IrishGirl2117 Wordle 305 3/6\n\n⬜⬜🟨🟨⬜\n🟨🟨⬜🟩⬜\n🟩🟩🟩🟩🟩\nDeep sigh….of satisfaction'},
  {'id': '1516628011067613189',
   'text': 'Poeltl 55 - 4/8 - 👤4\n\n⬛⬛🟩⬛⬛⬛🟨⬛\n⬛⬛⬛⬛⬛⬛⬛⬛\n⬛⬛🟩🟩🟨🟨🟩⬛\n🟩🟩🟩🟩🟩🟩🟩🟩'},
  {'id': '1516628009976967168',
   'text': 'THE BOYZ Heardle #14\n\n🔉🟩⬜️⬜️⬜️⬜️⬜️\n\nhttps://t.co/5C760y9GNl'},
  {'id': '1516628008773521415',
   'text': 'RT @matrix_movieJP: ＼📢ブルーレイ＆DVD発売記念／\n\n🟩その手で未来を選べ！🟩\n『マトレザ』フォロー&amp;RTキャンペーン\n\n今目の前に『#マトリックス』の真実がある…\n\n問おう。\nあなたはどちらの未来を選ぶ？\n\n💊フォロー\n💊RT\n💊抽選で🎁\n\n──答えはも…'},
  {'id': '1516628008408436738',
   'text': 'Twenty One Pilots Heardle #6\n\n🔊🟩⬜️⬜️⬜️⬜️⬜️\n\nBELOVED..................'},
  {'id': '1516628005090693127',
   'tex

In [50]:
type(first_result_dict)

dict

In [52]:
df.dtypes

public_metrics         object
lang                   object
text                   object
conversation_id        object
created_at             object
author_id              object
source                 object
context_annotations    object
id                     object
possibly_sensitive       bool
reply_settings         object
entities               object
attachments            object
in_reply_to_user_id    object
referenced_tweets      object
dtype: object

In [59]:
pd.DataFrame.from_dict(first_result_dict['data'], orient = 'index')

AttributeError: 'list' object has no attribute 'values'

In [64]:
pd.DataFrame(first_result_dict['data'])

Unnamed: 0,id,text
0,1516628016776060931,RT @bigbangnw: 🟩 Esto es empoderamiento\n\n🚨 M...
1,1516628014905184257,Wordle 305 3/6\n\n🟨🟨⬛⬛⬛\n🟩⬛🟨⬛🟩\n🟩🟩🟩🟩🟩
2,1516628011419725824,@IrishGirl2117 Wordle 305 3/6\n\n⬜⬜🟨🟨⬜\n🟨🟨⬜🟩⬜\...
3,1516628011067613189,Poeltl 55 - 4/8 - 👤4\n\n⬛⬛🟩⬛⬛⬛🟨⬛\n⬛⬛⬛⬛⬛⬛⬛⬛\n⬛⬛...
4,1516628009976967168,THE BOYZ Heardle #14\n\n🔉🟩⬜️⬜️⬜️⬜️⬜️\n\nhttps:...
5,1516628008773521415,RT @matrix_movieJP: ＼📢ブルーレイ＆DVD発売記念／\n\n🟩その手で未...
6,1516628008408436738,Twenty One Pilots Heardle #6\n\n🔊🟩⬜️⬜️⬜️⬜️⬜️\n...
7,1516628005090693127,Wordle 305 2/6\n\n🟨🟩⬛🟩⬛\n🟩🟩🟩🟩🟩\n\nno one talk ...
8,1516628004935708673,RT @matrix_movieJP: ＼📢ブルーレイ＆DVD発売記念／\n\n🟩その手で未...
9,1516628004780507136,#ことのはたんご 第89回 7/10\n \n⬜🟨⬜⬜🟨 608 \n🟨🟨🟨⬜⬜ 25 \...


### Tweet rate for this short range of time
These ten tweets are from May 3.

In [77]:
last = df['created_at'].max()
last

'2022-05-03T04:52:04.000Z'

In [78]:
first = df['created_at'].min()
first

'2022-05-03T04:51:49.000Z'

In [80]:
pd.to_datetime(last)-pd.to_datetime(first)

Timedelta('0 days 00:00:15')

This is about 10 tweets in 15 seconds. I'm sure the rate changes over the course of a day.

In [90]:
864000/15

57600.0

That's about 58k tweets a day if this rate is typical. This might just be able to grab all the results from the last week, given the 7-day and 500k tweet limits. How exciting.

# Gathering data

~~The Twitter search only retrieves 100 maximum tweets at a time. This will require some finesse to repeatedly sample evenly through time. If I want the ones immediately before or after a collected ranage, I can start by using~~

The api call supports pagination, which allows for repeated pages to be gathered using the next_token parameter. Update this value with the returned token in the meta tab.


- Create a DataFrame to fill with tweets
    - index by tweet id. These are unique.
- Repeatedly get tweets and tack them onto the end of the DataFrame.
    - evenly sample throughout the last seven days?
        - I think I will just search backwards until I get a null.
    - I can pull <500k total. How many does that mean per...
        - day: 71429
        - hour: 2976
        - minute: 50
    - Useful params:
        - '&max_results=100'
        - '&end_time="YYYY-MM-DDTHH:mm:ssZ"'
        - '&since_id=43252345792853213' (tweet id)
        - '&next_token=b26v89c19zqg8o3fobd8v73egzbdt3qao235oql' 
            - use to grab the next page of results

### Searching by time, number of tweets, getting user data...

### User fields

- created_at 
- description
- entities
- id
- location
- name
- pinned_tweet_id
- profile_image_url
- protected
- public_metrics 
- url 
- username
- verified
- withheld

In [82]:
df.created_at[0]

'2022-05-03T04:52:04.000Z'

In [83]:
search_twitter('🟩 Wordle', 
                'start_time=2022-05-01T04:52:04.000Z&user.fields=description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld&tweet.fields=attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,possibly_sensitive,referenced_tweets,reply_settings,source,text,withheld',
               my_bearer_token)


200


{'data': [{'source': 'Twitter for Android',
   'reply_settings': 'everyone',
   'created_at': '2022-05-03T22:19:24.000Z',
   'lang': 'en',
   'id': '1521615433845063680',
   'possibly_sensitive': False,
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': 'Wordle 318 4/6\n\n⬜🟨⬜⬜⬜\n🟨🟩⬜⬜🟩\n⬜🟩🟨⬜🟩\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521615433845063680',
   'author_id': '2582877667',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}]},
  {'source': 'Twitter for Android',
   'entities': {'urls': [{'start': 51,
      'end': 74,
      'url': 'https://t.co/clmTyJITbL',
      'expanded_url': 'http://wordle.gelozp.com',
      'display_url': 'wordle.gelozp.com',
      'images': [{'url': 'https://pbs.twimg.com/ne

In [84]:
sec_response = {'data': [{'source': 'Twitter for Android',
   'reply_settings': 'everyone',
   'created_at': '2022-05-03T22:19:24.000Z',
   'lang': 'en',
   'id': '1521615433845063680',
   'possibly_sensitive': False,
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': 'Wordle 318 4/6\n\n⬜🟨⬜⬜⬜\n🟨🟩⬜⬜🟩\n⬜🟩🟨⬜🟩\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521615433845063680',
   'author_id': '2582877667',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}]},
  {'source': 'Twitter for Android',
   'entities': {'urls': [{'start': 51,
      'end': 74,
      'url': 'https://t.co/clmTyJITbL',
      'expanded_url': 'http://wordle.gelozp.com',
      'display_url': 'wordle.gelozp.com',
      'images': [{'url': 'https://pbs.twimg.com/news_img/1520479054184468480/yryaukBy?format=png&name=orig',
        'width': 400,
        'height': 400},
       {'url': 'https://pbs.twimg.com/news_img/1520479054184468480/yryaukBy?format=png&name=150x150',
        'width': 150,
        'height': 150}],
      'status': 200,
      'title': 'WordleCAT - Un mot diari',
      'description': 'Endevina el mot amagat en 6 intents. Cada dia hi ha un nou mot per endevinar.',
      'unwound_url': 'https://wordle.gelozp.com/'}],
    'hashtags': [{'start': 0, 'end': 10, 'tag': 'WordleCAT'}]},
   'reply_settings': 'everyone',
   'created_at': '2022-05-03T22:19:22.000Z',
   'lang': 'und',
   'id': '1521615426949718017',
   'possibly_sensitive': False,
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': '#WordleCAT 124 5/6\n\n🟨⬜⬜⬜⬜\n⬜🟨🟨⬜🟩\n⬜🟩🟩⬜🟩\n🟩🟩🟩⬜🟩\n🟩🟩🟩🟩🟩\n\nhttps://t.co/clmTyJITbL',
   'conversation_id': '1521615426949718017',
   'author_id': '2539425801'},
  {'source': 'Twitter Web App',
   'entities': {'urls': [{'start': 24,
      'end': 47,
      'url': 'https://t.co/M1MsQYlofi',
      'expanded_url': 'http://k-wordle.com',
      'display_url': 'k-wordle.com',
      'status': 200,
      'title': 'Korean Wordle - 한글 워들',
      'description': 'Korean Wordle clone',
      'unwound_url': 'https://nakosung.github.io/wordle/'}],
    'hashtags': [{'start': 0, 'end': 3, 'tag': '한글'},
     {'start': 4, 'end': 7, 'tag': '워들'},
     {'start': 8, 'end': 15, 'tag': 'Korean'},
     {'start': 16, 'end': 23, 'tag': 'Wordle'}]},
   'in_reply_to_user_id': '786905959',
   'reply_settings': 'everyone',
   'created_at': '2022-05-03T22:19:21.000Z',
   'lang': 'und',
   'id': '1521615421039955968',
   'possibly_sensitive': False,
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': '#한글 #워들 #Korean #Wordle https://t.co/M1MsQYlofi 196 5/6\n\n⬜⬜⬜🟩🟨\n⬜⬜🟩🟩🟩\n🟩⬜🟩🟩🟩\n🟩⬜🟩🟩🟩\n🟩🟩🟩🟩🟩',
   'conversation_id': '1515799682433359878',
   'referenced_tweets': [{'type': 'replied_to', 'id': '1521615310847201281'}],
   'author_id': '786905959',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}]},
  {'source': 'TweetDeck',
   'reply_settings': 'everyone',
   'created_at': '2022-05-03T22:19:19.000Z',
   'lang': 'en',
   'id': '1521615414966427648',
   'possibly_sensitive': False,
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': 'Wordle 319 3/6\n\n⬜🟨⬜🟩⬜\n⬜⬜🟩🟩🟨\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521615414966427648',
   'author_id': '928594029637386240',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}]},
  {'source': 'Twitter Web App',
   'reply_settings': 'everyone',
   'created_at': '2022-05-03T22:19:19.000Z',
   'lang': 'en',
   'id': '1521615414542880774',
   'possibly_sensitive': False,
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': 'And he does it again!!\n\nWordle 318 3/6\n\n⬜🟩⬜⬜⬜\n⬜🟩⬜⬜🟨\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521615414542880774',
   'author_id': '992029237606600707',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}]},
  {'source': 'Twitter Web App',
   'entities': {'urls': [{'start': 34,
      'end': 57,
      'url': 'https://t.co/LjxErHthR0',
      'expanded_url': 'http://wordle.gelozp.com',
      'display_url': 'wordle.gelozp.com',
      'images': [{'url': 'https://pbs.twimg.com/news_img/1520479054184468480/yryaukBy?format=png&name=orig',
        'width': 400,
        'height': 400},
       {'url': 'https://pbs.twimg.com/news_img/1520479054184468480/yryaukBy?format=png&name=150x150',
        'width': 150,
        'height': 150}],
      'status': 200,
      'title': 'WordleCAT - Un mot diari',
      'description': 'Endevina el mot amagat en 6 intents. Cada dia hi ha un nou mot per endevinar.',
      'unwound_url': 'https://wordle.gelozp.com/'}],
    'hashtags': [{'start': 0, 'end': 10, 'tag': 'WordleCAT'}]},
   'reply_settings': 'everyone',
   'created_at': '2022-05-03T22:19:19.000Z',
   'lang': 'und',
   'id': '1521615412252790784',
   'possibly_sensitive': False,
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': '#WordleCAT 124 2/6*\n\n⬜🟩🟩⬜🟩\n🟩🟩🟩🟩🟩\n\nhttps://t.co/LjxErHthR0',
   'conversation_id': '1521615412252790784',
   'author_id': '920703379713781760'},
  {'source': 'Twitter for iPhone',
   'reply_settings': 'everyone',
   'created_at': '2022-05-03T22:19:18.000Z',
   'lang': 'en',
   'id': '1521615407504928771',
   'possibly_sensitive': False,
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': 'Wordle 318 4/6*\n\n⬛⬛🟨🟩⬛\n🟨⬛⬛🟩⬛\n⬛🟨🟨🟩⬛\n🟩🟩🟩🟩🟩\n\nGot emmmmm',
   'conversation_id': '1521615407504928771',
   'author_id': '749985910830338048',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}]},
  {'source': 'Twitter for Android',
   'reply_settings': 'everyone',
   'created_at': '2022-05-03T22:19:17.000Z',
   'lang': 'en',
   'id': '1521615403562188800',
   'possibly_sensitive': False,
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': 'Wordle 318 6/6\n\n⬜⬜🟨⬜⬜\n⬜🟩🟨⬜⬜\n⬜🟩⬜🟨🟨\n⬜🟩🟩🟩🟩\n⬜🟩🟩🟩🟩\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521615403562188800',
   'author_id': '1003673538388267009',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}]},
  {'source': 'Twitter Web App',
   'entities': {'urls': [{'start': 42,
      'end': 65,
      'url': 'https://t.co/vH2gLXRzXA',
      'expanded_url': 'https://wordle.mega-yadoran.jp/',
      'display_url': 'wordle.mega-yadoran.jp',
      'images': [{'url': 'https://pbs.twimg.com/news_img/1520627181159784450/oEzApr9k?format=png&name=orig',
        'width': 200,
        'height': 200},
       {'url': 'https://pbs.twimg.com/news_img/1520627181159784450/oEzApr9k?format=png&name=150x150',
        'width': 150,
        'height': 150}],
      'status': 200,
      'title': 'ポケモンWordle',
      'description': 'ポケモンWordle (ポケモンワードル)は単語当てゲーム「Wordle」のポケモン版。１０回のチャレンジでポケモンの名前を当てよう！',
      'unwound_url': 'https://wordle.mega-yadoran.jp/'}],
    'hashtags': [{'start': 67, 'end': 78, 'tag': 'ポケモンWordle'}]},
   'reply_settings': 'everyone',
   'created_at': '2022-05-03T22:19:15.000Z',
   'lang': 'ja',
   'id': '1521615398659330048',
   'possibly_sensitive': False,
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': 'ポケモンWordle 4/10\n\n⬛⬛⬛⬛🟨\n⬛🟨⬛🟨⬛\n⬛⬛🟨⬛🟨\n🟩🟩🟩🟩🟩\n\nhttps://t.co/vH2gLXRzXA\n #ポケモンWordle',
   'conversation_id': '1521615398659330048',
   'author_id': '1624485012',
   'context_annotations': [{'domain': {'id': '45',
      'name': 'Brand Vertical',
      'description': 'Top level entities that describe a Brands industry'},
     'entity': {'id': '781974597310615553', 'name': 'Entertainment'}},
    {'domain': {'id': '46',
      'name': 'Brand Category',
      'description': 'Categories within Brand Verticals that narrow down the scope of Brands'},
     'entity': {'id': '781974597218340864', 'name': 'Video Games'}},
    {'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}},
    {'domain': {'id': '130',
      'name': 'Multimedia Franchise',
      'description': "Franchises which span multiple forms of media like 'Harry Potter'"},
     'entity': {'id': '10045599546',
      'name': 'Pokémon',
      'description': 'This entity includes all conversation about the franchise, as well as any individual installments in the series, if applicable. **NOTE: Annotations redirected to domain 130 ONLY on 8/7/18.'}}]},
  {'source': 'Twitter for iPhone',
   'reply_settings': 'everyone',
   'created_at': '2022-05-03T22:19:13.000Z',
   'lang': 'ja',
   'id': '1521615390182612992',
   'possibly_sensitive': False,
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': 'ぐれいと\nWordle 319 5/6\n\n⬛⬛🟨⬛⬛\n🟩🟩⬛🟨⬛\n🟩🟩🟩⬛⬛\n🟩🟩🟩⬛⬛\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521615390182612992',
   'author_id': '23082176',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}]}],
 'meta': {'newest_id': '1521615433845063680',
  'oldest_id': '1521615390182612992',
  'result_count': 10,
  'next_token': 'b26v89c19zqg8o3fpywl7wpl0esoe8b2aqdwhj7k2xmkd'}}

In [92]:
sec_response['data'][1]

{'source': 'Twitter for Android',
 'entities': {'urls': [{'start': 51,
    'end': 74,
    'url': 'https://t.co/clmTyJITbL',
    'expanded_url': 'http://wordle.gelozp.com',
    'display_url': 'wordle.gelozp.com',
    'images': [{'url': 'https://pbs.twimg.com/news_img/1520479054184468480/yryaukBy?format=png&name=orig',
      'width': 400,
      'height': 400},
     {'url': 'https://pbs.twimg.com/news_img/1520479054184468480/yryaukBy?format=png&name=150x150',
      'width': 150,
      'height': 150}],
    'status': 200,
    'title': 'WordleCAT - Un mot diari',
    'description': 'Endevina el mot amagat en 6 intents. Cada dia hi ha un nou mot per endevinar.',
    'unwound_url': 'https://wordle.gelozp.com/'}],
  'hashtags': [{'start': 0, 'end': 10, 'tag': 'WordleCAT'}]},
 'reply_settings': 'everyone',
 'created_at': '2022-05-03T22:19:22.000Z',
 'lang': 'und',
 'id': '1521615426949718017',
 'possibly_sensitive': False,
 'public_metrics': {'retweet_count': 0,
  'reply_count': 0,
  'like_count

In [93]:
search_twitter('🟩 Wordle', 
                'start_time=2022-05-01T04:52:04.000Z&user.fields=description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld&tweet.fields=attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,possibly_sensitive,referenced_tweets,reply_settings,source,text,withheld&expansions=author_id',
               my_bearer_token)

200


{'data': [{'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'lang': 'en',
   'text': 'Wordle 121 3/6\n\n⬛🟨⬛⬛🟩\n🟨🟨⬛⬛🟩\n🟩🟩🟩🟩🟩\nHappy Wednesday 🍀',
   'conversation_id': '1521619805735903232',
   'created_at': '2022-05-03T22:36:46.000Z',
   'author_id': '1475139411448844288',
   'source': 'Twitter for iPhone',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}],
   'id': '1521619805735903232',
   'possibly_sensitive': False,
   'reply_settings': 'everyone'},
  {'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'lang': 'en',
   'text': 'Wordle 319 5/6\n\n⬛⬛🟨⬛⬛\n⬛⬛🟨⬛⬛\n⬛⬛🟩⬛⬛\n⬛⬛🟨⬛🟨\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521619803076632577',
   'created_at': '

In [94]:
third_result = {'data': [{'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'lang': 'en',
   'text': 'Wordle 121 3/6\n\n⬛🟨⬛⬛🟩\n🟨🟨⬛⬛🟩\n🟩🟩🟩🟩🟩\nHappy Wednesday 🍀',
   'conversation_id': '1521619805735903232',
   'created_at': '2022-05-03T22:36:46.000Z',
   'author_id': '1475139411448844288',
   'source': 'Twitter for iPhone',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}],
   'id': '1521619805735903232',
   'possibly_sensitive': False,
   'reply_settings': 'everyone'},
  {'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'lang': 'en',
   'text': 'Wordle 319 5/6\n\n⬛⬛🟨⬛⬛\n⬛⬛🟨⬛⬛\n⬛⬛🟩⬛⬛\n⬛⬛🟨⬛🟨\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521619803076632577',
   'created_at': '2022-05-03T22:36:46.000Z',
   'author_id': '119910827',
   'source': 'Twitter for iPhone',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}],
   'id': '1521619803076632577',
   'possibly_sensitive': False,
   'reply_settings': 'everyone'},
  {'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'lang': 'no',
   'text': 'Wordle 319 2/6\n\n⬜🟨🟩🟨⬜\n🟩🟩🟩🟩🟩\n\nMahjong Handle 91 5/6\nhttps://t.co/Lc6jLs25J0\n\n⬜⬜⬜⬜⬜⬜⬜⬜🟨⬜🟨⬜⬜⬜\n⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜🟨🟨🟩\n🟩⬜⬜🟩⬜🟨⬜🟨⬜⬜⬜⬜🟩🟨\n🟩🟩🟩🟩🟩🟩🟩🟩🟩🟨🟩🟩🟩🟨\n🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩',
   'entities': {'urls': [{'start': 51,
      'end': 74,
      'url': 'https://t.co/Lc6jLs25J0',
      'expanded_url': 'https://mahjong-handle.update.sh/',
      'display_url': 'mahjong-handle.update.sh',
      'status': 200,
      'title': 'Mahjong Handle',
      'description': 'Mahjong Handle',
      'unwound_url': 'https://mahjong-handle.update.sh/'}]},
   'conversation_id': '1521619773825908736',
   'created_at': '2022-05-03T22:36:39.000Z',
   'author_id': '3230367980',
   'source': 'Twitter for Android',
   'context_annotations': [{'domain': {'id': '66',
      'name': 'Interests and Hobbies Category',
      'description': 'A grouping of interests and hobbies entities, like Novelty Food or Destinations'},
     'entity': {'id': '872578743771963392',
      'name': 'Tabletop gaming',
      'description': 'Board Games'}},
    {'domain': {'id': '67',
      'name': 'Interests and Hobbies',
      'description': 'Interests, opinions, and behaviors of individuals, groups, or cultures; like Speciality Cooking or Theme Parks'},
     'entity': {'id': '1146508557044469760', 'name': 'Mahjong'}},
    {'domain': {'id': '67',
      'name': 'Interests and Hobbies',
      'description': 'Interests, opinions, and behaviors of individuals, groups, or cultures; like Speciality Cooking or Theme Parks'},
     'entity': {'id': '1175097393290735616',
      'name': 'Traditional games',
      'description': 'Traditional tabletop games like poker, chess, and mahjong.'}},
    {'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}],
   'id': '1521619773825908736',
   'possibly_sensitive': False,
   'reply_settings': 'everyone'},
  {'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'lang': 'en',
   'text': 'Wordle 319 2/6\n\n⬜🟩⬜🟨⬜\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521619770868736001',
   'created_at': '2022-05-03T22:36:38.000Z',
   'author_id': '139377369',
   'source': 'Twitter for iPhone',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}],
   'id': '1521619770868736001',
   'possibly_sensitive': False,
   'reply_settings': 'everyone'},
  {'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'lang': 'en',
   'text': 'Regulinchi...\nWordle (ES)  #117 6/6\n\n⬜⬜🟨⬜🟩\n🟨⬜🟨⬜🟩\n⬜⬜⬜🟩🟩\n⬜⬜🟨🟩🟩\n⬜🟩🟩🟩🟩\n🟩🟩🟩🟩🟩\n https://t.co/03T9R48vbt',
   'entities': {'urls': [{'start': 74,
      'end': 97,
      'url': 'https://t.co/03T9R48vbt',
      'expanded_url': 'https://wordle.danielfrg.com/',
      'display_url': 'wordle.danielfrg.com',
      'images': [{'url': 'https://pbs.twimg.com/news_img/1519133738936188928/9bYmUl75?format=jpg&name=orig',
        'width': 1200,
        'height': 630},
       {'url': 'https://pbs.twimg.com/news_img/1519133738936188928/9bYmUl75?format=jpg&name=150x150',
        'width': 150,
        'height': 150}],
      'status': 200,
      'title': 'Un juego de palabras diario',
      'description': 'Adivina la palabra oculta en 6 intentos. Un nuevo puzzle cada día.',
      'unwound_url': 'https://wordle.danielfrg.com/'}]},
   'conversation_id': '1521619752505864192',
   'created_at': '2022-05-03T22:36:33.000Z',
   'author_id': '2422157442',
   'source': 'Twitter for Android',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}],
   'id': '1521619752505864192',
   'possibly_sensitive': False,
   'reply_settings': 'everyone'},
  {'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'lang': 'en',
   'text': 'Wordle 318 4/6\n\n🟨⬛🟩⬛⬛\n⬛⬛🟨⬛🟩\n🟨🟩🟩⬛🟩\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521619740111867905',
   'created_at': '2022-05-03T22:36:31.000Z',
   'author_id': '336226200',
   'source': 'Twitter for Android',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}],
   'id': '1521619740111867905',
   'possibly_sensitive': False,
   'reply_settings': 'everyone'},
  {'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'lang': 'en',
   'text': 'Wordle 319 4/6\n\n⬛⬛🟩⬛⬛\n🟨⬛⬛🟨⬛\n⬛🟨⬛🟨⬛\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521619739847524352',
   'created_at': '2022-05-03T22:36:30.000Z',
   'author_id': '1320944011914309633',
   'source': 'Twitter for iPhone',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}],
   'id': '1521619739847524352',
   'possibly_sensitive': False,
   'reply_settings': 'everyone'},
  {'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'lang': 'en',
   'text': 'Wordle 319 3/6\n\n⬜🟩⬜⬜⬜\n⬜🟩🟩🟩🟩\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521619735057559553',
   'created_at': '2022-05-03T22:36:29.000Z',
   'author_id': '269615980',
   'source': 'Twitter for iPhone',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}],
   'id': '1521619735057559553',
   'possibly_sensitive': False,
   'reply_settings': 'everyone'},
  {'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'lang': 'en',
   'text': 'Wordle 319 2/6\n\n⬛🟩⬛🟨⬛\n🟩🟩🟩🟩🟩',
   'conversation_id': '1521619730578116609',
   'created_at': '2022-05-03T22:36:28.000Z',
   'author_id': '2825619056',
   'source': 'Twitter for iPhone',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}],
   'id': '1521619730578116609',
   'possibly_sensitive': False,
   'reply_settings': 'everyone'},
  {'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'in_reply_to_user_id': '41868633',
   'lang': 'en',
   'text': 'Wordle 318 4/6\n\n⬛⬛🟩⬛⬛\n⬛⬛🟩⬛⬛\n⬛🟨🟩⬛⬛\n🟩🟩🟩🟩🟩',
   'conversation_id': '1485658173331714056',
   'referenced_tweets': [{'type': 'replied_to', 'id': '1521120671115395072'}],
   'created_at': '2022-05-03T22:36:22.000Z',
   'author_id': '41868633',
   'source': 'Twitter for iPhone',
   'context_annotations': [{'domain': {'id': '30',
      'name': 'Entities [Entity Service]',
      'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
     'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}],
   'id': '1521619705345318913',
   'possibly_sensitive': False,
   'reply_settings': 'everyone'}],
 'includes': {'users': [{'verified': False,
    'entities': {'description': {'hashtags': [{'start': 47,
        'end': 60,
        'tag': 'slavarussian'}]}},
    'description': '19 🇺🇸🇮🇹 Lombardia ••ALWAYS GO FORWARD•• him/he #slavarussian🇷🇺',
    'username': 'andreeee002',
    'url': '',
    'protected': False,
    'name': 'andrea',
    'id': '1475139411448844288',
    'public_metrics': {'followers_count': 28,
     'following_count': 179,
     'tweet_count': 1123,
     'listed_count': 0},
    'profile_image_url': 'https://pbs.twimg.com/profile_images/1492634721498583041/593wibnK_normal.jpg'},
   {'location': '横浜',
    'verified': False,
    'entities': {'url': {'urls': [{'start': 0,
        'end': 23,
        'url': 'https://t.co/95OYdZykYl',
        'expanded_url': 'http://twilog.org/xapy_ya',
        'display_url': 'twilog.org/xapy_ya'}]},
     'description': {'urls': [{'start': 26,
        'end': 49,
        'url': 'https://t.co/aGfLZUS6Dy',
        'expanded_url': 'http://www.facebook.com/haruhiko.yamaguti',
        'display_url': 'facebook.com/haruhiko.yamag…'}]}},
    'description': '山本恭司/本城未紗子/井上喜久子 ライブ会場出没中 https://t.co/aGfLZUS6Dy',
    'username': 'xapy_ya',
    'url': 'https://t.co/95OYdZykYl',
    'protected': False,
    'name': 'はるや\u3000ライブ報告は2週ディレイ',
    'id': '119910827',
    'public_metrics': {'followers_count': 180,
     'following_count': 123,
     'tweet_count': 7206,
     'listed_count': 0},
    'profile_image_url': 'https://pbs.twimg.com/profile_images/469077065526820865/jVPfMhjJ_normal.jpeg'},
   {'location': '音ゲー/TRPG/麻雀/ラノベ',
    'verified': False,
    'entities': {'url': {'urls': [{'start': 0,
        'end': 23,
        'url': 'https://t.co/OaI45QSx3T',
        'expanded_url': 'http://twpf.jp/lieselotte_nimi',
        'display_url': 'twpf.jp/lieselotte_nimi'}]},
     'description': {'mentions': [{'start': 16,
        'end': 28,
        'username': 'moon_ganbar'}]}},
    'description': 'ほよ 逃げるのが得意 アイコン:@moon_ganbar',
    'username': 'lieselotte_nimi',
    'url': 'https://t.co/OaI45QSx3T',
    'protected': False,
    'pinned_tweet_id': '1521059595544051723',
    'name': 'なゃん',
    'id': '3230367980',
    'public_metrics': {'followers_count': 1219,
     'following_count': 2101,
     'tweet_count': 151926,
     'listed_count': 25},
    'profile_image_url': 'https://pbs.twimg.com/profile_images/1490173532630036482/7i0xmIw__normal.jpg'},
   {'location': 'Eastern Cape',
    'verified': False,
    'description': 'I will not win immediately but I will certainly win.',
    'username': 'qhami_m',
    'url': '',
    'protected': False,
    'pinned_tweet_id': '1305046123049103360',
    'name': '𝒬𝒽𝒶𝓂𝒶𝓃𝒾',
    'id': '139377369',
    'public_metrics': {'followers_count': 406,
     'following_count': 369,
     'tweet_count': 7190,
     'listed_count': 2},
    'profile_image_url': 'https://pbs.twimg.com/profile_images/1462870489005711370/LpU--Adu_normal.jpg'},
   {'location': 'La Boca, C.A.B.A.',
    'verified': False,
    'entities': {'description': {'hashtags': [{'start': 68,
        'end': 74,
        'tag': 'madre'}]}},
    'description': 'Cocinera 🔪- Artivista 💚\n- Feminista en formación 💜 \n- Hija de Tita \n#madre',
    'username': 'ivovitry',
    'url': '',
    'protected': False,
    'pinned_tweet_id': '1502291207027843073',
    'name': 'La Viuda de ʇɹoɯǝploʌ 🖤',
    'id': '2422157442',
    'public_metrics': {'followers_count': 137,
     'following_count': 131,
     'tweet_count': 6300,
     'listed_count': 0},
    'profile_image_url': 'https://pbs.twimg.com/profile_images/1311327269978337281/8HzSetPv_normal.jpg'},
   {'location': 'Chicago, IL',
    'verified': False,
    'entities': {'url': {'urls': [{'start': 0,
        'end': 23,
        'url': 'https://t.co/AKaYTwJ4pL',
        'expanded_url': 'http://instagram.com/ginaaxmariee',
        'display_url': 'instagram.com/ginaaxmariee'}]},
     'description': {'hashtags': [{'start': 95, 'end': 99, 'tag': 'BLM'}]}},
    'description': "26 🇵🇷🏳️\u200d🌈 She/They | Bloom '18 JTHS '14 |\nTikTok: Sailor-Phoenix95 Twitch: Sailor_Phoenix95 🎮  #BLM",
    'username': 'ginaaxmariee',
    'url': 'https://t.co/AKaYTwJ4pL',
    'protected': False,
    'name': 'gina ❄',
    'id': '336226200',
    'public_metrics': {'followers_count': 1783,
     'following_count': 1769,
     'tweet_count': 10947,
     'listed_count': 9},
    'profile_image_url': 'https://pbs.twimg.com/profile_images/1480082918139736067/uDGrX-N-_normal.jpg'},
   {'verified': False,
    'description': 'APEX、DBD、スプラなどやってます へたっぴですがよろしくです！のんびり楽しみたい',
    'username': 'kar_apex_dbd',
    'url': '',
    'protected': False,
    'pinned_tweet_id': '1324400752752619520',
    'name': 'かー',
    'id': '1320944011914309633',
    'public_metrics': {'followers_count': 16,
     'following_count': 20,
     'tweet_count': 436,
     'listed_count': 1},
    'profile_image_url': 'https://pbs.twimg.com/profile_images/1329817868377100289/Sw4it26P_normal.jpg'},
   {'location': 'Melbourne, Victoria',
    'verified': False,
    'description': 'Registered nurse working in health practitioner regulation',
    'username': 'petrina13',
    'url': '',
    'protected': False,
    'name': 'Petrina Halloran',
    'id': '269615980',
    'public_metrics': {'followers_count': 29,
     'following_count': 181,
     'tweet_count': 282,
     'listed_count': 0},
    'profile_image_url': 'https://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png'},
   {'verified': False,
    'description': '',
    'username': 'blainetan_',
    'url': '',
    'protected': False,
    'name': '🙇🏻\u200d♀️',
    'id': '2825619056',
    'public_metrics': {'followers_count': 521,
     'following_count': 359,
     'tweet_count': 10374,
     'listed_count': 0},
    'profile_image_url': 'https://pbs.twimg.com/profile_images/1488495773935304707/EYVtHCAv_normal.jpg'},
   {'location': 'ms → nyc',
    'verified': False,
    'entities': {'url': {'urls': [{'start': 0,
        'end': 23,
        'url': 'https://t.co/LmdYIYnocf',
        'expanded_url': 'http://instagram.com/peter.atthepark',
        'display_url': 'instagram.com/peter.atthepark'}]}},
    'description': 'not in this economy',
    'username': 'peteratthepark',
    'url': 'https://t.co/LmdYIYnocf',
    'protected': False,
    'pinned_tweet_id': '1484715489179885571',
    'name': 'Bussy Philipps',
    'id': '41868633',
    'public_metrics': {'followers_count': 587,
     'following_count': 966,
     'tweet_count': 49125,
     'listed_count': 19},
    'profile_image_url': 'https://pbs.twimg.com/profile_images/1516197652676890635/DybGL4Sf_normal.jpg'}]},
 'meta': {'newest_id': '1521619805735903232',
  'oldest_id': '1521619705345318913',
  'result_count': 10,
  'next_token': 'b26v89c19zqg8o3fpywl7wpl6i4s54cnebnrnw60l29vh'}}

In [95]:
third_result['data'][0]

{'public_metrics': {'retweet_count': 0,
  'reply_count': 0,
  'like_count': 0,
  'quote_count': 0},
 'lang': 'en',
 'text': 'Wordle 121 3/6\n\n⬛🟨⬛⬛🟩\n🟨🟨⬛⬛🟩\n🟩🟩🟩🟩🟩\nHappy Wednesday 🍀',
 'conversation_id': '1521619805735903232',
 'created_at': '2022-05-03T22:36:46.000Z',
 'author_id': '1475139411448844288',
 'source': 'Twitter for iPhone',
 'context_annotations': [{'domain': {'id': '30',
    'name': 'Entities [Entity Service]',
    'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'},
   'entity': {'id': '1480827185451573250', 'name': 'Wordle'}}],
 'id': '1521619805735903232',
 'possibly_sensitive': False,
 'reply_settings': 'everyone'}

In [98]:
pd.DataFrame(third_result['data'])

Unnamed: 0,public_metrics,lang,text,conversation_id,created_at,author_id,source,context_annotations,id,possibly_sensitive,reply_settings,entities,in_reply_to_user_id,referenced_tweets
0,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 121 3/6\n\n⬛🟨⬛⬛🟩\n🟨🟨⬛⬛🟩\n🟩🟩🟩🟩🟩\nHappy W...,1521619805735903232,2022-05-03T22:36:46.000Z,1475139411448844288,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521619805735903232,False,everyone,,,
1,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 319 5/6\n\n⬛⬛🟨⬛⬛\n⬛⬛🟨⬛⬛\n⬛⬛🟩⬛⬛\n⬛⬛🟨⬛🟨\n...,1521619803076632577,2022-05-03T22:36:46.000Z,119910827,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521619803076632577,False,everyone,,,
2,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",no,Wordle 319 2/6\n\n⬜🟨🟩🟨⬜\n🟩🟩🟩🟩🟩\n\nMahjong Hand...,1521619773825908736,2022-05-03T22:36:39.000Z,3230367980,Twitter for Android,"[{'domain': {'id': '66', 'name': 'Interests an...",1521619773825908736,False,everyone,"{'urls': [{'start': 51, 'end': 74, 'url': 'htt...",,
3,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 319 2/6\n\n⬜🟩⬜🟨⬜\n🟩🟩🟩🟩🟩,1521619770868736001,2022-05-03T22:36:38.000Z,139377369,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521619770868736001,False,everyone,,,
4,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Regulinchi...\nWordle (ES) #117 6/6\n\n⬜⬜🟨⬜🟩\...,1521619752505864192,2022-05-03T22:36:33.000Z,2422157442,Twitter for Android,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521619752505864192,False,everyone,"{'urls': [{'start': 74, 'end': 97, 'url': 'htt...",,
5,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 318 4/6\n\n🟨⬛🟩⬛⬛\n⬛⬛🟨⬛🟩\n🟨🟩🟩⬛🟩\n🟩🟩🟩🟩🟩,1521619740111867905,2022-05-03T22:36:31.000Z,336226200,Twitter for Android,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521619740111867905,False,everyone,,,
6,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 319 4/6\n\n⬛⬛🟩⬛⬛\n🟨⬛⬛🟨⬛\n⬛🟨⬛🟨⬛\n🟩🟩🟩🟩🟩,1521619739847524352,2022-05-03T22:36:30.000Z,1320944011914309633,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521619739847524352,False,everyone,,,
7,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 319 3/6\n\n⬜🟩⬜⬜⬜\n⬜🟩🟩🟩🟩\n🟩🟩🟩🟩🟩,1521619735057559553,2022-05-03T22:36:29.000Z,269615980,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521619735057559553,False,everyone,,,
8,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 319 2/6\n\n⬛🟩⬛🟨⬛\n🟩🟩🟩🟩🟩,1521619730578116609,2022-05-03T22:36:28.000Z,2825619056,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521619730578116609,False,everyone,,,
9,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 318 4/6\n\n⬛⬛🟩⬛⬛\n⬛⬛🟩⬛⬛\n⬛🟨🟩⬛⬛\n🟩🟩🟩🟩🟩,1485658173331714056,2022-05-03T22:36:22.000Z,41868633,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521619705345318913,False,everyone,,41868633.0,"[{'type': 'replied_to', 'id': '152112067111539..."


This is good for now. The JSON entries should be expanded later, but it will be good to save this directly so I don't chance losing anything.

## saving

I want to see if these save "correctly". It looks like `to_csv` preserves non-ascii characters including emojis and foreign language characters just fine.

In [123]:
df.to_csv('./my_first.csv')

In [125]:
df

Unnamed: 0,public_metrics,lang,text,conversation_id,created_at,author_id,source,context_annotations,id,possibly_sensitive,reply_settings,entities,attachments,in_reply_to_user_id,referenced_tweets
0,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 318 4/6\n\n⬛⬛🟨🟨⬛\n⬛🟩⬛🟩🟩\n⬛🟩🟩🟩🟩\n🟩🟩🟩🟩🟩,1521351866545811458,2022-05-03T04:52:04.000Z,1080339364432044032,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351866545811458,False,everyone,,,,
1,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle (ES) #117 3/6\n\n⬜⬜⬜⬜🟩\n⬜⬜🟩⬜🟩\n🟩🟩🟩🟩🟩\n...,1521351849911193600,2022-05-03T04:52:01.000Z,2933210357,Twitter for Android,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351849911193600,False,everyone,"{'urls': [{'start': 42, 'end': 65, 'url': 'htt...",{'media_keys': ['16_1521351843950972929']},,
2,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",ja,日本語のWordle（ローマ字）\n#111\n4/6\nhttps://t.co/wqPI...,1521351849206763520,2022-05-03T04:52:00.000Z,1417413144058556427,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351849206763520,False,everyone,"{'urls': [{'start': 26, 'end': 49, 'url': 'htt...",,,
3,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle (ES) #117 5/6\n\n⬜⬜⬜⬜⬜\n🟨🟩⬜⬜🟩\n⬜🟩🟩⬜🟩\n...,1521351829464006662,2022-05-03T04:51:56.000Z,109343037,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351829464006662,False,everyone,"{'urls': [{'start': 54, 'end': 77, 'url': 'htt...",,,
4,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 318 5/6\n\n⬜⬜🟨⬜⬜\n⬜⬜🟨⬜⬜\n⬜🟩⬜⬜🟨\n⬜🟩🟨⬜🟨\n...,1521351825966120960,2022-05-03T04:51:55.000Z,25260021,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351825966120960,False,everyone,,,,
5,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,@almuckersie Wordle 318 3/6\n\n🟩⬛⬛⬛⬛\n⬛🟨🟨🟨⬛\n🟩...,1521269608501719040,2022-05-03T04:51:53.000Z,295803948,Twitter for iPad,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351820358168576,False,everyone,"{'mentions': [{'start': 0, 'end': 12, 'usernam...",,228408417.0,"[{'type': 'replied_to', 'id': '152126960850171..."
6,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",und,@theonclejack @Lisanlussapa @MonicaBarcelona @...,1521348481268428800,2022-05-03T04:51:52.000Z,1194873103618363392,Twitter for iPhone,,1521351815018729476,False,everyone,"{'hashtags': [{'start': 143, 'end': 153, 'tag'...",,1.3121525279248753e+18,"[{'type': 'replied_to', 'id': '152134848126842..."
7,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,@buzwepama Wordle 318 5/6\n\n⬛🟨🟨🟨⬛\n🟨🟩🟩⬛🟩\n⬛🟩🟩...,1521351452903542789,2022-05-03T04:51:52.000Z,1500715044027473921,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351813492051970,False,everyone,"{'mentions': [{'start': 0, 'end': 10, 'usernam...",,52375003.0,"[{'type': 'replied_to', 'id': '152135145290354..."
8,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",ja,ポケモンWordle 4/10\n\n⬛⬛⬛⬛⬛\n⬛🟨⬛⬛⬛\n🟩⬛⬛⬛🟩\n🟩🟩🟩🟩🟩\...,1521351801714651136,2022-05-03T04:51:49.000Z,1501897458443780100,Twitter for iPhone,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1521351801714651136,False,everyone,"{'hashtags': [{'start': 67, 'end': 78, 'tag': ...",,,
9,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle (ES) #117 5/6\n\n⬜🟩🟩🟨⬜\n⬜🟩🟩⬜🟩\n⬜🟩🟩⬜🟩\n...,1521351801156550657,2022-05-03T04:51:49.000Z,374172093,Twitter Web App,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351801156550657,False,everyone,"{'urls': [{'start': 54, 'end': 77, 'url': 'htt...",,,


In [124]:
df2 = pd.read_csv('./my_first.csv')
df2

Unnamed: 0.1,Unnamed: 0,public_metrics,lang,text,conversation_id,created_at,author_id,source,context_annotations,id,possibly_sensitive,reply_settings,entities,attachments,in_reply_to_user_id,referenced_tweets
0,0,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 318 4/6\n\n⬛⬛🟨🟨⬛\n⬛🟩⬛🟩🟩\n⬛🟩🟩🟩🟩\n🟩🟩🟩🟩🟩,1521351866545811458,2022-05-03T04:52:04.000Z,1080339364432044032,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351866545811458,False,everyone,,,,
1,1,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle (ES) #117 3/6\n\n⬜⬜⬜⬜🟩\n⬜⬜🟩⬜🟩\n🟩🟩🟩🟩🟩\n...,1521351849911193600,2022-05-03T04:52:01.000Z,2933210357,Twitter for Android,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351849911193600,False,everyone,"{'urls': [{'start': 42, 'end': 65, 'url': 'htt...",{'media_keys': ['16_1521351843950972929']},,
2,2,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",ja,日本語のWordle（ローマ字）\n#111\n4/6\nhttps://t.co/wqPI...,1521351849206763520,2022-05-03T04:52:00.000Z,1417413144058556427,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351849206763520,False,everyone,"{'urls': [{'start': 26, 'end': 49, 'url': 'htt...",,,
3,3,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle (ES) #117 5/6\n\n⬜⬜⬜⬜⬜\n🟨🟩⬜⬜🟩\n⬜🟩🟩⬜🟩\n...,1521351829464006662,2022-05-03T04:51:56.000Z,109343037,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351829464006662,False,everyone,"{'urls': [{'start': 54, 'end': 77, 'url': 'htt...",,,
4,4,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle 318 5/6\n\n⬜⬜🟨⬜⬜\n⬜⬜🟨⬜⬜\n⬜🟩⬜⬜🟨\n⬜🟩🟨⬜🟨\n...,1521351825966120960,2022-05-03T04:51:55.000Z,25260021,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351825966120960,False,everyone,,,,
5,5,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,@almuckersie Wordle 318 3/6\n\n🟩⬛⬛⬛⬛\n⬛🟨🟨🟨⬛\n🟩...,1521269608501719040,2022-05-03T04:51:53.000Z,295803948,Twitter for iPad,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351820358168576,False,everyone,"{'mentions': [{'start': 0, 'end': 12, 'usernam...",,228408400.0,"[{'type': 'replied_to', 'id': '152126960850171..."
6,6,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",und,@theonclejack @Lisanlussapa @MonicaBarcelona @...,1521348481268428800,2022-05-03T04:51:52.000Z,1194873103618363392,Twitter for iPhone,,1521351815018729476,False,everyone,"{'hashtags': [{'start': 143, 'end': 153, 'tag'...",,1.312153e+18,"[{'type': 'replied_to', 'id': '152134848126842..."
7,7,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,@buzwepama Wordle 318 5/6\n\n⬛🟨🟨🟨⬛\n🟨🟩🟩⬛🟩\n⬛🟩🟩...,1521351452903542789,2022-05-03T04:51:52.000Z,1500715044027473921,Twitter for iPhone,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351813492051970,False,everyone,"{'mentions': [{'start': 0, 'end': 10, 'usernam...",,52375000.0,"[{'type': 'replied_to', 'id': '152135145290354..."
8,8,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",ja,ポケモンWordle 4/10\n\n⬛⬛⬛⬛⬛\n⬛🟨⬛⬛⬛\n🟩⬛⬛⬛🟩\n🟩🟩🟩🟩🟩\...,1521351801714651136,2022-05-03T04:51:49.000Z,1501897458443780100,Twitter for iPhone,"[{'domain': {'id': '45', 'name': 'Brand Vertic...",1521351801714651136,False,everyone,"{'hashtags': [{'start': 67, 'end': 78, 'tag': ...",,,
9,9,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Wordle (ES) #117 5/6\n\n⬜🟩🟩🟨⬜\n⬜🟩🟩⬜🟩\n⬜🟩🟩⬜🟩\n...,1521351801156550657,2022-05-03T04:51:49.000Z,374172093,Twitter Web App,"[{'domain': {'id': '30', 'name': 'Entities [En...",1521351801156550657,False,everyone,"{'urls': [{'start': 54, 'end': 77, 'url': 'htt...",,,


# Procedure


- Create a DataFrame to fill with tweets
    - perform a priming search
    - make the dataframe from the priming data
    - index df by tweet id
- Repeatedly get tweets and tack them onto the end of the DataFrame.
    - repeatedly:
        - get next page of search with the next_token from previous meta
        - turn data into a new dataframe
        - reindex that by tweet id
        - concatenate new df onto the bottom of old one. Indices are 
        - if we've reached some number checkpoint, print a progress update
        - update "last" next_token with new one. If there isn't one, end this loop. Or, if we've grabbed a tweet limit or everything in a certain range. Perhaps a day at a time?
        - delete the new dataframe to preserve memory.
        

In [99]:
third_result['meta']

{'newest_id': '1521619805735903232',
 'oldest_id': '1521619705345318913',
 'result_count': 10,
 'next_token': 'b26v89c19zqg8o3fpywl7wpl6i4s54cnebnrnw60l29vh'}

In [101]:
'next_token' in first_result_dict['meta'].keys()

True

In [104]:
import os
from IPython.display import clear_output
from time import sleep
print("hi")
sleep(4)

# Clearing the Screen
clear_output()
print("hello again")

hello again


In [105]:
"[======================================================================..............................]"



In [159]:
# a function that formats a search parameter string
default_user_fields = [
   'description',
    'entities',
    'id',
    'location',
    'name',
    'pinned_tweet_id',
    'profile_image_url',
    'protected',
    'public_metrics',
    'url',
    'username',
    'verified',
    'withheld'
]

default_tweet_fields = [
    'attachments',
    'author_id',
    'context_annotations',
    'conversation_id',
    'created_at',
    'entities',
    'geo',
    'id',
    'in_reply_to_user_id',
    'lang',
    'public_metrics',
    'possibly_sensitive',
    'referenced_tweets',
    'reply_settings',
    'source',
    'text',
    'withheld'
]



def format_twitter_search_parameters(tweet_fields = default_tweet_fields,
                                     user_fields = default_user_fields,
                                     max_results = 10,
                                     next_token = '',
                                     end_time = '',
                                     until_id = 0
                                    ):
    #optionally search up to an end time
    if end_time != '':
        end_time_bit = '&end_time=' + end_time
    else:
        end_time_bit = end_time
    #optionally search after a previous search using its next_token value (from 'meta')
    if next_token != '':
        next_token_bit = '&next_token=' + next_token
    else:
        next_token_bit = next_token
    #optionally search only before a certain tweet id
    if until_id == 0:
        until_id_bit = ''
    else:
        until_id_bit = '&until_id=' + str(until_id)
    #assemble the string
    parameter_string = 'tweet.fields=' + ','.join(tweet_fields) + '&user.fields=' + ','.join(user_fields) + '&max_results='+ str(max_results)+ until_id_bit +  next_token_bit + end_time_bit
    return parameter_string

# takes a twitter search parameter string with a next_token bit on it, removes the token, and adds the new one.
# returns the new string
def replace_next_token(twitter_search_parameters, next_token):
    next_token_label = '&next_token='
    split_parameters = twitter_search_parameters.split(next_token_label)
    new_parameter_string = split_parameters[0] + next_token_label + str(next_token)
    return new_parameter_string

In [162]:
replace_next_token(format_twitter_search_parameters()+'&next_token=3456247', 111111)

'tweet.fields=attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,possibly_sensitive,referenced_tweets,reply_settings,source,text,withheld&user.fields=description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld&max_results=10&next_token=111111'

In [152]:
','.join(default_user_fields)

'description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld'

In [154]:
format_twitter_search_parameters(until_id = 1234)

'tweet.fields=attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,possibly_sensitive,referenced_tweets,reply_settings,source,text,withheld&user.fields=description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld&max_results=10&until_id=1234'

In [224]:
# takes in a query and search parameters, formats it and performs a series of searches. returns a dict with
# aggregated results and a token for the next page if you desire to continue searching
def repeat_twitter_search(search_query, tweet_fields, user_fields, search_size = 100, until_id = 0, num_searches = 10, first_next_token = ''):
    # do a "priming search", preparing the dataframe and filling it with first values
    search_parameters = format_twitter_search_parameters(tweet_fields = tweet_fields, user_fields = user_fields, max_results = search_size, until_id = 0)+ '&expansions=author_id'
    # this can start with a next_token already defined.
    if first_next_token != '':
        search_parameters = search_parameters  + '&next_token=' + first_next_token 
    search_result = search_twitter(search_query, search_parameters)
    tweets_df = pd.DataFrame(search_result['data'])
    users_df = pd.DataFrame(search_result['includes']['users'])
    tweets_df.set_index('id', inplace = True)
    #store the next token for reading the second page of results. the loop ahead grabs successive pages.
    next_token = search_result['meta']['next_token']
    #edit the parameters to get the next page.
    # add in a next_token= to the parameters if it doesn't already have one.
    if first_next_token == '':
        search_parameters = search_parameters + '&next_token=' + search_result['meta']['next_token']
    #loop through the following searches, concatting their results to the dataframe.
    for search_number in range(num_searches-1):
        #edit parameters to get the next page
        #expansions is needed to get the user data.
        search_parameters = replace_next_token(search_parameters, next_token)
        search_result = search_twitter(search_query, search_parameters)
        new_tweets_df = pd.DataFrame(search_result['data'])
        new_tweets_df.set_index('id', inplace = True)
        tweets_df = pd.concat([tweets_df, new_tweets_df])
        new_users_df = pd.DataFrame(search_result['includes']['users'])
        new_users_df.set_index('id', inplace = True)
        users_df = pd.concat([users_df, new_users_df])
        next_token = search_result['meta']['next_token']
    return tweets_df, users_df, next_token

In [222]:
pd.DataFrame.join()
pd.concat()

TypeError: join() missing 2 required positional arguments: 'self' and 'other'

In [223]:
kar_dat, kar_usrs, kar_tok = repeat_twitter_search('karkat', tweet_fields = default_tweet_fields, user_fields = default_user_fields, search_size = 10, num_searches=4)

200
Index(['reply_settings', 'entities', 'context_annotations', 'text', 'source',
       'public_metrics', 'referenced_tweets', 'lang', 'author_id',
       'possibly_sensitive', 'created_at', 'conversation_id',
       'in_reply_to_user_id', 'attachments'],
      dtype='object')
200
Index(['reply_settings', 'entities', 'context_annotations', 'text', 'source',
       'public_metrics', 'referenced_tweets', 'lang', 'author_id',
       'possibly_sensitive', 'created_at', 'conversation_id',
       'in_reply_to_user_id', 'attachments'],
      dtype='object')
200
Index(['reply_settings', 'entities', 'context_annotations', 'text', 'source',
       'public_metrics', 'referenced_tweets', 'lang', 'author_id',
       'possibly_sensitive', 'created_at', 'conversation_id',
       'in_reply_to_user_id', 'attachments'],
      dtype='object')
200


In [225]:
kar_tok

'b26v89c19zqg8o3fpywl7wr2x0u0z6cro94oi9oxb3czh'

In [226]:
kar_dat.info()

<class 'pandas.core.frame.DataFrame'>
Index: 40 entries, 1521709966393962497 to 1521664283452444673
Data columns (total 14 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   reply_settings       40 non-null     object
 1   entities             29 non-null     object
 2   context_annotations  39 non-null     object
 3   text                 40 non-null     object
 4   source               40 non-null     object
 5   public_metrics       40 non-null     object
 6   referenced_tweets    17 non-null     object
 7   lang                 40 non-null     object
 8   author_id            40 non-null     object
 9   possibly_sensitive   40 non-null     bool  
 10  created_at           40 non-null     object
 11  conversation_id      40 non-null     object
 12  in_reply_to_user_id  4 non-null      object
 13  attachments          13 non-null     object
dtypes: bool(1), object(13)
memory usage: 4.4+ KB


In [229]:
kar_usrs.info()

<class 'pandas.core.frame.DataFrame'>
Index: 36 entries, 0 to 1168215609798053889
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   public_metrics     36 non-null     object
 1   username           36 non-null     object
 2   description        36 non-null     object
 3   url                36 non-null     object
 4   verified           36 non-null     bool  
 5   profile_image_url  36 non-null     object
 6   name               36 non-null     object
 7   id                 10 non-null     object
 8   protected          36 non-null     bool  
 9   pinned_tweet_id    23 non-null     object
 10  location           21 non-null     object
 11  entities           18 non-null     object
dtypes: bool(2), object(10)
memory usage: 3.2+ KB


In [179]:
replace_next_token(format_twitter_search_parameters(tweet_fields = default_tweet_fields, user_fields = default_user_fields, max_results = 10, until_id = 0) + '&next_token=','awe4tq3')

'tweet.fields=attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,possibly_sensitive,referenced_tweets,reply_settings,source,text,withheld&user.fields=description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld&max_results=10&next_token=awe4tq3'

In [181]:
search_twitter('karkat', format_twitter_search_parameters())

200


{'data': [{'reply_settings': 'everyone',
   'entities': {'hashtags': [{'start': 32, 'end': 38, 'tag': 'Nepal'}],
    'urls': [{'start': 106,
      'end': 129,
      'url': 'https://t.co/7hdLj3QC0v',
      'expanded_url': 'https://twitter.com/Alan10074652/status/1518628989610430465',
      'display_url': 'twitter.com/Alan10074652/s…'}],
    'mentions': [{'start': 40,
      'end': 55,
      'username': 'Karkat_Kashyap',
      'id': '702892708575125505'},
     {'start': 56,
      'end': 71,
      'username': 'Bpatil60949451',
      'id': '1430188633701511169'},
     {'start': 72,
      'end': 83,
      'username': 'XtreyAnita',
      'id': '759000527711514624'},
     {'start': 84,
      'end': 95,
      'username': 'upretiu806',
      'id': '702693945952718848'},
     {'start': 96, 'end': 105, 'username': 'tiluac38', 'id': '1309168922'}]},
   'id': '1521695908014043136',
   'text': 'सारै मन पर्यो साफलताको शुभ कमना #Nepal \n@Karkat_Kashyap @Bpatil60949451 @XtreyAnita @upretiu806 @tiluac38 

# Gather first "real" data

I can now use `repeat_twitter_search` to repeatedly tack onto a single pair of user and tweet DataFrames. I should collect a large number of these rows, then save them to csvs. I can always search behind an oldest tweet by getting its id and searching earlier. That will involve some editing of the `repeat_twitter_search` function to include `&end_id` or something like that.

In [235]:
%%time
#Get what I estimated is 24 hours worth of tweets. 60,000 tweets!
#first_word_search_tweets, first_word_search_users, first_word_search_token = repeat_twitter_search(search_query = '🟩 Wordle',
#                                                                                                   tweet_field = default_tweet_fields,
#                                                                                                   user_fields= default_user_fields,
#                                                                                                    search_size=100,
#                                                                                                    until_id=0,
#                                                                                                    num_searches=600,
#                                                                                                    first_next_token='')

Wall time: 0 ns


On second thought, let's do this in a new notebook.