# Diet Classification - Part 1

Part 1 covers data extraction. EDA of the vegan and keto datasets is in Part 2, and preprocessing and modelling are in Part 3.

## Background

The term "vegan" was coined in 1944 by Dorothy Morgan and Donald Watson. Veganism is, in essence, a lifestyle defined by what one consumes. It means abstaining from meat, eggs, cheese, honey, and any other animal produce. It also extends to clothing and accessories made from animals or their by-products (e.g. leather goods, ivory, pearls). [(Time, 2008)](https://time.com/3958070/history-of-veganism/)

Interest in veganism has reached an all-time high in recent years. Its popularity has been further boosted by global climate change, which has given veganism a deeper meaning as one way to reduce the use of finite resources. [(National Geographic, 2014)](https://www.nationalgeographic.com/culture/article/vegetarianism-more-than-meats-the-eye)

Ideally, creating a conducive environment for new consumers to try vegan products and adopt veganism would turn them into long-term customers.

## Problem Statement 

As more and more vegan products have hit the market in recent years, we seek to find the idiosyncrasies of a temporary fad diet versus a purposeful lifestyle diet from the perspective of consumers.

Using classification models built with logistic regression and naive Bayes, we want to investigate the terms that associate and/or differentiate the vegan lifestyle from high-fat, adequate-protein, low-carbohydrate diets (the keto diet). The models will be evaluated by accuracy score.

With these findings, we can learn what drives the popularity of modern diets and redirect publicity campaign resources to increase customer acquisition.

In [1]:
# import libraries
import requests
import pandas as pd
import time

# show up to 100 columns when displaying a DataFrame
pd.set_option('display.max_columns', 100)

### Data Collection with API

In [10]:
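def data_collect(s_reddit, num=100):
    '''
    Collect submissions from a subreddit through the Pushshift API.

    s_reddit : name of the subreddit to query
    num      : number of requests to send (each returns up to 100 posts)

    Returns a list of DataFrames, one per successful request.
    '''
    subreddit = []
    utc_bef = None
    #endpoint for submission search
    url = 'https://api.pushshift.io/reddit/search/submission'

    for i in range(num):
        #url params: up to 100 posts older than utc_bef
        url_params = {
            'subreddit' : s_reddit,
            'size' : 100,
            'before' : utc_bef
            }
        #send request
        req = requests.get(url=url, params=url_params)

        #read json data
        if req.status_code == 200:
            data = req.json()
            post = data['data']

            #stop when no older posts are returned (avoids IndexError on post[-1])
            if not post:
                break

            #timestamp of the oldest post on this page, used as 'before' in the next request
            utc_bef = post[-1]['created_utc']

            #append the page to the list as a DataFrame
            post = pd.DataFrame(post)
            subreddit.append(post)
        #pause between requests
        time.sleep(2)
    return subreddit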
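
The `before` cursor is what lets each request continue from where the previous one stopped. The sketch below replays that logic offline, with no network calls, so the behaviour can be checked quickly. `fake_fetch`, `paginate`, and the toy timestamps are hypothetical stand-ins and not part of Pushshift or of this notebook. The sketch assumes, as `data_collect` does, that each page arrives sorted newest first, and that `before` excludes the boundary timestamp.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the API: seven toy posts, newest first.
POSTS = [{'id': f'p{i}', 'created_utc': 1000 - 10 * i} for i in range(7)]

def fake_fetch(before=None, size=3):
    """Return up to `size` posts strictly older than `before`."""
    older = [p for p in POSTS if before is None or p['created_utc'] < before]
    return older[:size]

def paginate(fetch, num=10):
    """Walk backwards in time, one page per iteration."""
    pages, before = [], None
    for _ in range(num):
        page = fetch(before=before)
        if not page:                      # nothing older left, stop early
            break
        before = page[-1]['created_utc']  # oldest post on this page
        pages.append(page)
    return pages

pages = paginate(fake_fetch)
page_sizes = [len(p) for p in pages]
collected_ids = [p['id'] for page in pages for p in page]

# created_utc is a Unix timestamp; this value comes from the r/vegan sample output
first_post_time = datetime.fromtimestamp(1618711390, tz=timezone.utc)
print(page_sizes, first_post_time.isoformat())
```

If the real endpoint treated `before` inclusively, the boundary post would repeat across pages, which is one reason to drop duplicates after concatenating.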


### Vegan Subreddit

In [6]:
#collect posts from the vegan subreddit
vegan = data_collect('vegan')

#concatenate the per-request DataFrames into one
vegan_df = pd.concat(vegan)

#drop posts with duplicate titles
vegan_df.drop_duplicates(subset='title',inplace=True)
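
Dropping duplicates on `title` keeps only the first row for each title, so two different posts that happen to share a title collapse into one. The toy frame below uses made-up rows, not collected data, to illustrate this. It also shows deduplicating on `id` as an alternative, under the assumption that `id` uniquely identifies a submission.

```python
import pandas as pd

# Made-up rows: p1 appears twice (same post fetched twice),
# p3 is a different post that reuses p2's title.
toy = pd.DataFrame({
    'id':    ['p1', 'p1', 'p2', 'p3'],
    'title': ['Tofu scramble', 'Tofu scramble', 'Help', 'Help'],
})

# Keeps the first row per title: p3 is dropped along with the repeated p1.
by_title = toy.drop_duplicates(subset='title')

# Keeps the first row per id: only the repeated p1 is dropped.
by_id = toy.drop_duplicates(subset='id')

print(len(by_title), len(by_id))
```

Which key to use depends on the goal: `title` removes reposted text, which may suit a text classifier, while `id` only removes rows fetched twice.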


In [7]:
vegan_df.head()

Unnamed: 0,all_awardings,allow_live_comments,author,author_flair_css_class,author_flair_richtext,author_flair_template_id,author_flair_text,author_flair_text_color,author_flair_type,author_fullname,author_patreon_flair,author_premium,awarders,can_mod_post,contest_mode,created_utc,domain,full_link,gildings,id,is_crosspostable,is_meta,is_original_content,is_reddit_media_domain,is_robot_indexable,is_self,is_video,link_flair_background_color,link_flair_css_class,link_flair_richtext,link_flair_template_id,link_flair_text,link_flair_text_color,link_flair_type,locked,media_only,no_follow,num_comments,num_crossposts,over_18,parent_whitelist_status,permalink,pinned,post_hint,preview,pwls,removed_by_category,retrieved_on,score,selftext,send_replies,spoiler,stickied,subreddit,subreddit_id,subreddit_subscribers,subreddit_type,thumbnail,thumbnail_height,thumbnail_width,title,total_awards_received,treatment_tags,upvote_ratio,url,url_overridden_by_dest,whitelist_status,wls,is_gallery,media,media_embed,secure_media,secure_media_embed,crosspost_parent,crosspost_parent_list,author_flair_background_color,banned_by,poll_data,media_metadata,author_cakeday,distinguished,edited,gallery_data,gilded
0,[],False,Elise_93,,[],349bd530-7bcd-11e9-9a04-0ee954ab499e,vegan 8+ years,dark,text,t2_224i01sh,False,False,[],False,False,1618711390,i.redd.it,https://www.reddit.com/r/vegan/comments/mt3zoq...,{},mt3zoq,False,False,False,True,False,False,False,,Funny,[],4983f5d2-e206-11e4-9f95-22000bb2c21d,Funny,dark,text,False,False,True,0,0,False,all_ads,/r/vegan/comments/mt3zoq/a_summary_of_carnist_...,False,image,"{'enabled': True, 'images': [{'id': 'De33IZ6ef...",6,automod_filtered,1618711401,1,,True,False,False,vegan,t5_2qhpm,590954,public,https://b.thumbs.redditmedia.com/cTOm6m5yfgut1...,15.0,140.0,A summary of carnist logic about PETA:,0,[],1.0,https://i.redd.it/hezjrlks9ut61.png,https://i.redd.it/hezjrlks9ut61.png,all_ads,6,,,,,,,,,,,,,,,,
1,[],False,SeviathanVonEldritch,,[],70cd74e0-f01f-11e2-be4e-12313d05241f,vegan,dark,text,t2_b1twfs3g,False,False,[],False,False,1618711076,i.redd.it,https://www.reddit.com/r/vegan/comments/mt3wse...,{},mt3wse,False,False,False,True,False,False,False,,Rant,[],93c0fe64-689e-11e5-b5be-0e7eccdaeabd,Rant,dark,text,False,False,True,0,0,False,all_ads,/r/vegan/comments/mt3wse/underestimated_it_as_...,False,image,"{'enabled': True, 'images': [{'id': 'yoc4iZ2JI...",6,automod_filtered,1618711086,1,,True,False,False,vegan,t5_2qhpm,590952,public,https://b.thumbs.redditmedia.com/exsHq-ceI36Vq...,94.0,140.0,Underestimated it as well...,0,[],1.0,https://i.redd.it/15kexfke9ut61.png,https://i.redd.it/15kexfke9ut61.png,all_ads,6,,,,,,,,,,,,,,,,
2,[],False,quidiuris,,[],,,,text,t2_2n3ymh1l,False,False,[],False,False,1618710905,self.vegan,https://www.reddit.com/r/vegan/comments/mt3vde...,{},mt3vde,False,False,False,False,False,True,False,,,[],,,dark,text,False,False,True,0,0,False,all_ads,/r/vegan/comments/mt3vde/hair_conditioner_bars...,False,,,6,automod_filtered,1618710916,1,[removed],True,False,False,vegan,t5_2qhpm,590952,public,self,,,Hair conditioner bars - any recommendations?,0,[],1.0,https://www.reddit.com/r/vegan/comments/mt3vde...,,all_ads,6,,,,,,,,,,,,,,,,
3,[],False,lunanabiki,,[],,,,text,t2_bjhroi5v,False,False,[],False,False,1618710671,self.vegan,https://www.reddit.com/r/vegan/comments/mt3tar...,{},mt3tar,False,False,False,False,False,True,False,,,[],,,dark,text,False,False,True,0,0,False,all_ads,/r/vegan/comments/mt3tar/nonvegan_logic_that_s...,False,,,6,automod_filtered,1618710682,1,[removed],True,False,False,vegan,t5_2qhpm,590953,public,self,,,nonvegan logic that surprises me (that i haven...,0,[],1.0,https://www.reddit.com/r/vegan/comments/mt3tar...,,all_ads,6,,,,,,,,,,,,,,,,
4,[],False,pastelprincess1,,[],,,,text,t2_81akpcg0,False,False,[],False,False,1618710606,self.vegan,https://www.reddit.com/r/vegan/comments/mt3sqg...,{},mt3sqg,False,False,False,False,False,True,False,,,[],,,dark,text,False,False,True,0,0,False,all_ads,/r/vegan/comments/mt3sqg/sustainable_ethical_a...,False,self,"{'enabled': False, 'images': [{'id': 'k862a5mZ...",6,automod_filtered,1618710617,1,[removed],True,False,False,vegan,t5_2qhpm,590953,public,self,,,Sustainable + Ethical Alternatives Directory,0,[],1.0,https://www.reddit.com/r/vegan/comments/mt3sqg...,,all_ads,6,,,,,,,,,,,,,,,,


In [8]:
#check collected data volume
print(f'Total Rows: {vegan_df.shape[0]}\nTotal Columns: {vegan_df.shape[1]}')

#export csv for data cleaning
vegan_df.to_csv('../data/vegan.csv',index=False)

Total Rows: 9273
Total Columns: 84
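
The export passes `index=False` so that the DataFrame index is not written as an extra column. A minimal sketch of the difference, using a made-up two-row frame and an in-memory buffer instead of a file on disk:

```python
import io
import pandas as pd

# Made-up frame standing in for the collected posts.
toy = pd.DataFrame({'title': ['a', 'b'], 'score': [1, 2]})

# With the index written out, read_csv brings it back as an 'Unnamed: 0' column.
with_index = pd.read_csv(io.StringIO(toy.to_csv()))

# With index=False, the round trip preserves the original columns.
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(with_index.columns.tolist())
print(without_index.columns.tolist())
```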


### Keto Subreddit

In [11]:
keto = data_collect('keto')

In [12]:
#concatenate the per-request DataFrames into one
keto_df = pd.concat(keto)

#drop posts with duplicate titles
keto_df.drop_duplicates(subset='title',inplace=True)


Total Rows: 9557
Total Columns: 76


In [15]:
keto_df.head()

Unnamed: 0,all_awardings,allow_live_comments,author,author_flair_css_class,author_flair_richtext,author_flair_text,author_flair_type,author_fullname,author_patreon_flair,author_premium,awarders,can_mod_post,contest_mode,created_utc,domain,full_link,gildings,id,is_crosspostable,is_meta,is_original_content,is_reddit_media_domain,is_robot_indexable,is_self,is_video,link_flair_background_color,link_flair_css_class,link_flair_richtext,link_flair_template_id,link_flair_text,link_flair_text_color,link_flair_type,locked,media_only,no_follow,num_comments,num_crossposts,over_18,parent_whitelist_status,permalink,pinned,post_hint,preview,pwls,removed_by_category,retrieved_on,score,selftext,send_replies,spoiler,stickied,subreddit,subreddit_id,subreddit_subscribers,subreddit_type,thumbnail,title,total_awards_received,treatment_tags,upvote_ratio,url,whitelist_status,wls,author_flair_background_color,author_flair_text_color,author_flair_template_id,author_cakeday,crosspost_parent,crosspost_parent_list,url_overridden_by_dest,banned_by,edited,gilded,thumbnail_height,thumbnail_width,distinguished
0,[],False,Doo77m,,[],,text,t2_3eie5xox,False,False,[],False,False,1618714160,self.keto,https://www.reddit.com/r/keto/comments/mt4o1m/...,{},mt4o1m,False,False,False,False,False,True,False,,SOS,"[{'e': 'text', 't': 'Help'}]",715fe9ae-abe7-11e8-8938-0ef2dfd48a7e,Help,dark,richtext,False,False,True,0,0,False,all_ads,/r/keto/comments/mt4o1m/opinion_on_keto_ebook/,False,self,"{'enabled': False, 'images': [{'id': '0U59H9Ps...",6,moderator,1618714171,1,[removed],True,False,False,keto,t5_2rske,2367546,public,self,Opinion on Keto e-book,0,[],1.0,https://www.reddit.com/r/keto/comments/mt4o1m/...,all_ads,6,,,,,,,,,,,,,
1,[],False,Uzzay-69,,[],,text,t2_32eqarkg,False,False,[],False,False,1618713289,self.keto,https://www.reddit.com/r/keto/comments/mt4gl8/...,{},mt4gl8,False,False,False,False,False,True,False,,,[],,,dark,text,False,False,True,0,0,False,all_ads,/r/keto/comments/mt4gl8/first_keto_meal_hopefu...,False,self,"{'enabled': False, 'images': [{'id': 'Fb5utqKd...",6,moderator,1618713301,1,[removed],True,False,False,keto,t5_2rske,2367526,public,self,First Keto meal. Hopefully the first of many.,0,[],1.0,https://www.reddit.com/r/keto/comments/mt4gl8/...,all_ads,6,,,,,,,,,,,,,
2,[],False,makemillionsnow_,,[],,text,t2_bk1996jt,False,False,[],False,False,1618711447,self.keto,https://www.reddit.com/r/keto/comments/mt4071/...,{},mt4071,False,False,False,False,False,True,False,,,[],,,dark,text,True,False,True,0,0,False,all_ads,/r/keto/comments/mt4071/i_discovered_the_faste...,False,self,"{'enabled': False, 'images': [{'id': '-0-pH_Y3...",6,automod_filtered,1618711458,1,[removed],True,False,False,keto,t5_2rske,2367497,public,self,I DISCOVERED THE FASTEST WAY TO MELT 30 POUNDS...,0,[],1.0,https://www.reddit.com/r/keto/comments/mt4071/...,all_ads,6,,,,,,,,,,,,,
3,[],False,oscjr,,[],,text,t2_blpj2h66,False,False,[],False,False,1618710930,self.keto,https://www.reddit.com/r/keto/comments/mt3vkm/...,{},mt3vkm,False,False,False,False,False,True,False,,,[],,,dark,text,False,False,True,0,0,False,all_ads,/r/keto/comments/mt3vkm/sharing_one_case_of_su...,False,self,"{'enabled': False, 'images': [{'id': 'AzUGM-s6...",6,moderator,1618710941,1,[removed],True,False,False,keto,t5_2rske,2367493,public,self,Sharing one case of success with keto diet,0,[],1.0,https://www.reddit.com/r/keto/comments/mt3vkm/...,all_ads,6,,,,,,,,,,,,,
4,[],False,ghibliburrito,,[],,text,t2_4e2cpf6t,False,False,[],False,False,1618710487,self.keto,https://www.reddit.com/r/keto/comments/mt3rnr/...,{},mt3rnr,False,False,False,False,False,True,False,,,[],,,dark,text,False,False,True,0,0,False,all_ads,/r/keto/comments/mt3rnr/need_some_accountabili...,False,,,6,moderator,1618710498,1,[removed],True,False,False,keto,t5_2rske,2367488,public,self,"Need some accountability, starting tomorrow!",0,[],1.0,https://www.reddit.com/r/keto/comments/mt3rnr/...,all_ads,6,,,,,,,,,,,,,


In [13]:
#check collected data volume
print(f'Total Rows: {keto_df.shape[0]}\nTotal Columns: {keto_df.shape[1]}')

#export csv for data cleaning
keto_df.to_csv('../data/keto.csv',index=False)