# Assignment 2

## Jade Benson

This notebook is a pilot study for my final project in the 2022 class "Cultural Patterns" where I use computational methods to identify cultural trends. 

## Research Question and Motivation

I am interested in how different groups discuss their relationships with food online, particularly how specific foods are moralized and constructed to be "healthy." The fitness community has many subgroups that demonize certain food groups, primarily carbs, as seen in diet trends like "Keto," "Paleo," and "Counting Macros." Even something as simple as describing highly processed foods as "junk" conveys a moral argument about the worth of certain foods and, by extension, the worth of the people who regularly consume them. This moralistic rhetoric, often accompanied by images of idealized bodies or bodies to shame (or both, in before-and-afters), can inspire restrictive dieting and fears surrounding certain foods and bodies. But opposition groups, like the body positive movement, have arisen to combat this ideology. This community tries to celebrate bodies of diverse sizes, shapes, and abilities while removing moralistic associations with food to allow for an "intuitive" eating relationship that reduces the stress of other eating practices. There is perhaps even a third community of professionals (dietitians, public health officials, doctors) who acknowledge the need for moderate diets to address the consequences of obesity (particularly in the US), but who also stress that the problem is larger than just eating and includes food deserts, economic inequality, easy access to nutrient-deficient foods, and corporate incentives. These communities have competing definitions of what constitutes "healthy."

In this research project, I aim to use Twitter data to identify how people describe and conceptualize "healthy" food and eating and how this differs across subgroups. I am also interested in whether these subgroups are interconnected: is there cross-talk across these groups, or are they relatively distinct, with separate cultures and understandings of health? Does the professional group attempt to infiltrate fitness influencer or body positive conversations as part of public health campaigns to give people access to other types of food/diet information? Do certain fitness/diet groups co-opt the language of "obesity" and "obesity crisis" as moralistic terms aimed at scaring people into particular ways of eating? Does the body positive movement discredit the metrics and tools of other communities (BMI, calorie counting) to re-establish its own understanding of healthy? I am interested in computationally analyzing these conversations to better understand how healthy food and eating are constructed online.

## Data Sources and Methods 

I plan to use the Twitter API to collect tweets related to these groups and to the topics of healthy eating and food. I think Twitter is perhaps the best social media site for this project because it hosts so many diverse groups on the same platform. It also centers primarily on text conversations, which makes it easier to see how people talk about these subjects. Replies and mentions can be easily analyzed too, so we can see whether there is cross-talk or whether the groups are distinct. I have also conducted a prior project that scraped data with the Twitter API, so I feel more comfortable conducting this analysis in the short time frame.

I think TikTok may skew a little younger and more informal, so it would likely have fitness and body positive content but perhaps not professionals. It also centers on videos more than text, and I'm interested in how people talk about these concepts, though it would be interesting to expand my skill set by working with video transcriptions or examining how the images in videos relate to their short descriptions/tags. Reddit would be another option, but I am less familiar with the pages that I would need to scrape to effectively cover these conversations. This might make a good backup plan, though, as I am a little concerned that searching Twitter for "healthy" right now may return more COVID content than food content (I'll try to address this with search conditions). Reddit might allow for more granular searches.

## Data Gathering

In this section, I'll use the Twitter API to scrape tweets for a preliminary analysis. In the final version, I plan to scrape more tweets, perhaps broaden the search criteria, and maybe collect more in-depth information about the users and their networks.

In [17]:
import tweepy
from tweepy import OAuthHandler

import requests

import pandas as pd
import csv
import re
import string
import time


In [None]:
#have to update this for new Twitter API 
#https://towardsdatascience.com/an-extensive-guide-to-collecting-tweets-from-twitter-api-v2-for-academic-research-using-python-3-518fcb71df2a
#https://github.com/twitterdev/Twitter-API-v2-sample-code/blob/1fd23117345cd1dc3e75c7d69efae994e929c279/Full-Archive-Search/full-archive-search.py

In [49]:
#Jade's Twitter 2.0 info 

#credentials are read from the environment so they are never stored in the notebook
#any token that has been pasted into a notebook should be regenerated
import os

#TWITTER_BEARER_TOKEN is an assumed variable name; set it before starting the notebook
bearer_token = os.environ["TWITTER_BEARER_TOKEN"]

In [50]:
def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    """

    r.headers["Authorization"] = f"Bearer {bearer_token}"
    r.headers["User-Agent"] = "v2FullArchiveSearchPython"
    return r

In [71]:
#create the URL to search the API (search/recent)

#build a query guide 
#https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query

#params for recent search 
#https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent

def connect_to_endpoint(my_query, max_results=100, next_token=None):
    
    search_url = "https://api.twitter.com/2/tweets/search/recent"
    
    #max_results must be between 10 and 100 for recent search
    params = {'query': my_query,
            'max_results': max_results,
            'expansions': "author_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,in_reply_to_user_id,geo.place_id",
            'tweet.fields': 'id,text,author_id,in_reply_to_user_id,geo,conversation_id,created_at,lang,public_metrics,referenced_tweets,reply_settings',
            'user.fields': 'description,id,location,name,url,username,verified'}
    
    #only send next_token when asking for a later page of results
    if next_token is not None:
        params['next_token'] = next_token
    
    response = requests.request("GET", search_url, auth=bearer_oauth, params=params)
    
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()
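
Since the final version is meant to collect more than a single page of tweets, here is a minimal sketch of how paging could work. It assumes the search response carries a `meta.next_token` field whenever more results are available, which is how the v2 search endpoints paginate. `collect_pages` and `fetch_page` are hypothetical names, not part of the notebook: `fetch_page` stands for any callable that takes a token (or `None` for the first page), sends it as the `next_token` request parameter, and returns the parsed JSON.

```python
import time
from typing import Callable, Optional


def collect_pages(fetch_page: Callable[[Optional[str]], dict],
                  max_pages: int = 5,
                  pause_seconds: float = 1.0) -> list:
    """Follow meta.next_token until results run out or max_pages is reached."""
    tweets = []
    next_token = None
    for _ in range(max_pages):
        page = fetch_page(next_token)
        # a page with no matches has no 'data' key at all
        tweets.extend(page.get("data", []))
        next_token = page.get("meta", {}).get("next_token")
        if next_token is None:
            break
        # pause between requests to stay under the rate limit
        time.sleep(pause_seconds)
    return tweets
```

Passing the fetcher in as an argument keeps the paging logic separate from the HTTP call, so it can be tried out on canned responses before spending any API quota.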

In [92]:
#this query will need to be refined
my_query = "((healthy food) OR (diet eat) OR (lose weight) OR (bodypositive OR intuitive eating) OR (obesity) OR (keto OR paleo OR macros OR calories )) lang:en"
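
As the query gets refined, it may be easier to assemble it from lists than to edit one long string. The sketch below is one possible helper, not the notebook's method: `build_query` is a hypothetical name, and the excluded terms are placeholders for the COVID-related conditions mentioned earlier. It relies on two documented pieces of the v2 query syntax: a leading `-` negates a keyword or operator, and `-is:retweet` drops retweets.

```python
def build_query(term_groups, exclude_terms=(), lang="en", drop_retweets=False):
    """Join term groups with OR, then append exclusions and filters."""
    # each group is wrapped in parentheses so multi-word groups stay together
    positive = " OR ".join(f"({group})" for group in term_groups)
    parts = [f"({positive})"]
    # a leading minus negates a keyword in the v2 query syntax
    parts.extend(f"-{term}" for term in exclude_terms)
    if drop_retweets:
        parts.append("-is:retweet")
    if lang:
        parts.append(f"lang:{lang}")
    return " ".join(parts)


example_query = build_query(
    ["healthy food", "diet eat", "lose weight", "obesity"],
    exclude_terms=["covid", "vaccine"],  # hypothetical exclusions
    drop_retweets=True,
)
print(example_query)
# ((healthy food) OR (diet eat) OR (lose weight) OR (obesity)) -covid -vaccine -is:retweet lang:en
```

Query length is capped by the API, with the cap depending on the access level, so a long list of term groups may need to be split across several requests.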

In [93]:
my_json  = connect_to_endpoint(my_query)

In [109]:
df_tweets = pd.DataFrame(columns = ['tweet_id',
                                    'author_id',
                                    'tweet_created_at',
                                    'conversation_id',
                                    'text',
                                    'retweets', 
                                    'replies', 
                                    'likes', 
                                    'quotes', 
                                    'type_reference',
                                    'referenced_tweet',
                                    'replied_to_user', 
                                    'mentions_dict',
                                    'thread_dict'
                                   ])

for r in my_json['data']: 
    tweet_id = r['id']
    author_id = r['author_id']
    tweet_time = r['created_at']
    convo_id = r['conversation_id']
    
    tweet_text = r['text']
    
    num_retweets = r['public_metrics']['retweet_count']
    num_replies = r['public_metrics']['reply_count']
    num_likes = r['public_metrics']['like_count']
    num_quotes = r['public_metrics']['quote_count']
    
    #reset the reference fields for every tweet so values never carry over from the previous one
    type_reference = None
    referenced_tweet = None
    replied_to_user = None
    
    if 'referenced_tweets' in r: 
        #only the first referenced tweet is kept for now
        type_reference = r['referenced_tweets'][0]['type']
        referenced_tweet = r['referenced_tweets'][0]['id']
        
        if type_reference == 'replied_to':
            replied_to_user = r.get('in_reply_to_user_id')
    
    #just add the dictionaries to df for now 
    #can figure out if/how to keep these later
    if 'entities' in r: 
        mentions_dict = r['entities']
    else: 
        mentions_dict = None 
    
    #'includes' sits at the top level of the response (my_json['includes']), not on each tweet,
    #so this stays None for every row until the expansions are joined in
    if 'includes' in r: 
        includes_dict = r['includes']
    else: 
        includes_dict = None
    
    # Collect the 14 values for this tweet, in column order (includes_dict fills thread_dict)
    ith_tweet = [tweet_id,
                 author_id,
                 tweet_time,
                 convo_id, 
                 tweet_text,
                 num_retweets, 
                 num_replies, 
                 num_likes, 
                 num_quotes, 
                 type_reference, 
                 referenced_tweet, 
                 replied_to_user, 
                 mentions_dict, 
                 includes_dict
                 ]
    # Append to dataframe - df_tweets
    df_tweets.loc[len(df_tweets)] = ith_tweet
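
To look at cross-talk between groups, the mentions stored in each tweet's `entities` could be flattened into an edge list of who mentions whom. The sketch below assumes each tweet dictionary has the same shape as the items in `my_json['data']`, with an optional `entities['mentions']` list whose items carry a `username` key. `mention_edges` is a hypothetical helper and the sample tweets are made up.

```python
def mention_edges(tweets):
    """Return one (tweet_id, author_id, mentioned_username) tuple per mention."""
    edges = []
    for tweet in tweets:
        # tweets without any entities simply contribute no edges
        entities = tweet.get("entities") or {}
        for mention in entities.get("mentions", []):
            edges.append((tweet["id"], tweet["author_id"], mention["username"]))
    return edges


# made-up tweets in the same shape as the items of my_json['data']
sample_tweets = [
    {"id": "1", "author_id": "10", "text": "RT @a: hi @b",
     "entities": {"mentions": [{"start": 3, "end": 5, "username": "a"},
                               {"start": 10, "end": 12, "username": "b"}]}},
    {"id": "2", "author_id": "11", "text": "no mentions here"},
]
edges = mention_edges(sample_tweets)
print(edges)
```

The resulting tuples can be passed to `pd.DataFrame` with three column names, or loaded into a graph library later, once each author has been assigned to a community.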
            

In [111]:
len(df_tweets)

100

In [110]:
df_tweets.head()

| | tweet_id | author_id | tweet_created_at | conversation_id | text | retweets | replies | likes | quotes | type_reference | referenced_tweet | replied_to_user | mentions_dict | thread_dict |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1493332804993699841 | 1417926026831794185 | 2022-02-14T21:14:20.000Z | 1493332804993699841 | RT @ProfPatch83: More than 40% of food in the ... | 1 | 0 | 0 | 0 | retweeted | 1.493049922932859e+18 | | {'mentions': [{'start': 3, 'end': 15, 'usernam... | |
| 1 | 1493332804876349443 | 794162692311265281 | 2022-02-14T21:14:20.000Z | 1493332804876349443 | Want the benefits of a healthy #WholeFood diet... | 0 | 0 | 0 | 0 | | | | | |
| 2 | 1493332803349630976 | 852039149037076480 | 2022-02-14T21:14:19.000Z | 1493332803349630976 | The clean eating plan ebook: You'll Find Out H... | 0 | 0 | 0 | 0 | | | | | |
| 3 | 1493332796516884482 | 373561824 | 2022-02-14T21:14:18.000Z | 1493332796516884482 | RT @docsquiddy: at least when disney removes t... | 1 | 0 | 0 | 0 | retweeted | 1.4933324043537777e+18 | | {'mentions': [{'start': 3, 'end': 14, 'usernam... | |
| 4 | 1493332760458678279 | 1148667397303349254 | 2022-02-14T21:14:09.000Z | 1493332760458678279 | RT @favouryuwa: Everybody is loosing weight in... | 2 | 0 | 0 | 0 | retweeted | 1.4929323956701635e+18 | | {'mentions': [{'start': 3, 'end': 14, 'usernam... | |
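
The `referenced_tweet` values in the output above appear in scientific notation, which suggests that the column was held as floating point at some stage (this is an inference from the output, not something the notebook states). Tweet IDs are too large for a 64-bit float to represent exactly, so an ID that passes through a float can no longer be matched against `tweet_id`. The sketch below checks this for one ID from the table; `survives_float_round_trip` is a hypothetical helper.

```python
def survives_float_round_trip(id_string):
    """Check whether an ID is unchanged after being stored as a float."""
    return str(int(float(id_string))) == id_string


tweet_id = "1493332804993699841"  # first tweet_id in the output above
small_id = "373561824"            # the short author_id in the output above

print(survives_float_round_trip(tweet_id))   # False: the float rounds the last digits
print(survives_float_round_trip(small_id))   # True: small integers are exact
```

Keeping every ID column as a string avoids the problem. If the table is ever written to CSV and read back, the `dtype` argument of `pd.read_csv` can request string columns.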


Next, I want to look at whether "moralistic" language is being used.
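
One simple starting point would be to count how often words from a small lexicon of moralized food terms appear in each tweet. The sketch below is only an illustration under stated assumptions: the lexicon is a hypothetical starter list (only "junk" and "clean" appear in the text and sample output above), the sample tweets are made up, and `moral_term_counts` is not part of the notebook. Plain word matching ignores context, so "good" in "good morning" would still be counted.

```python
import re
from collections import Counter

# hypothetical starter lexicon; only "junk" and "clean" come from this notebook
MORAL_TERMS = {"junk", "clean", "guilt", "guilty", "cheat", "sinful", "bad", "good"}


def moral_term_counts(text):
    """Count lexicon terms in one tweet, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token in MORAL_TERMS)


sample_texts = [  # made-up examples
    "Had a cheat day and feel so guilty about all the junk food",
    "Meal prep for the week: rice, beans, and roasted vegetables",
]
for text in sample_texts:
    print(dict(moral_term_counts(text)))
```

Applied to the `text` column of `df_tweets`, the per-tweet counts could then be compared across the fitness, body positive, and professional groups.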