# Problem Statement

As part of Nike's commercial analytics team, we have noticed that it is a challenge to keep up with consumer conversations when multiple collections launch every season. Our stakeholders, the Merchandise Planning and Marketing departments, have tasked us with finding a solution to support their planning for future seasons. The solution needs to be able to:

- Understand current competitor trends
- Understand current consumer responses and comments
- Check whether the merchandise planned as high-demand matches what consumers are saying in their comments
- Be reused every season (a dynamic, scalable solution)
- Differentiate Nike posts from competitor posts using a classification model

# 1. Data Collection

In [5]:
import requests
import pandas as pd

Create a function that filters out posts with fewer than 20 words during collection, to streamline the data cleaning process. The word count is taken from `selftext` when the post has body text, and from `title` otherwise.

In [12]:
def more_than_20(data):
    filter_data = []
    # loop over every post returned by one API call
    for row in data:
        if 'selftext' in row and row['selftext'] != '':
            # the post has body text: count the words in the body
            text_word_count = len(row['selftext'].split())
            if text_word_count >= 20:
                filter_data.append(row)
        else:
            # no body text: fall back to the words in the title
            title_word_count = len(row['title'].split())
            if title_word_count >= 20:
                filter_data.append(row)
    return filter_data
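To make the filter's behaviour concrete, here is a minimal, self-contained sketch that applies the same rule to a few made-up posts. The helper name `keep_post` and the sample rows are hypothetical and exist only for illustration; unlike the function above, this sketch also treats a missing or `None` body as empty.

```python
def keep_post(row, min_words=20):
    """Return True when the body (or, failing that, the title) has enough words."""
    body = row.get('selftext') or ''
    # use the body when present, otherwise fall back to the title
    text = body if body != '' else row.get('title', '')
    return len(text.split()) >= min_words

# made-up posts, not real Reddit data
sample_posts = [
    {'title': 'short title', 'selftext': ' '.join(['word'] * 25)},  # long body
    {'title': 'short title', 'selftext': 'too short'},              # short body
    {'title': ' '.join(['word'] * 20), 'selftext': ''},             # long title, empty body
    {'title': 'image post with a short title'},                     # no body key, short title
]

kept = [post for post in sample_posts if keep_post(post)]
print(len(kept))
```

Only the first and third sample posts pass, because a short body is not rescued by a long title: the title is consulted only when the body is empty or missing.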

Incorporate the function into our recursive loop and cut off at 5,000 posts.

In [11]:
# Parameters:
#   dataframe   - DataFrame accumulated so far (start with an empty DataFrame)
#   subreddit   - name of the subreddit to query
#   target_post - number of posts to collect before stopping
#   latest_date - only fetch posts created before this epoch timestamp
#   size        - number of posts requested per API call
async def recursive_get_data(dataframe=None, subreddit='', target_post=5000, latest_date=None, size=250):
    if dataframe is None:
        dataframe = pd.DataFrame()
    # api url
    url = 'https://api.pushshift.io/reddit/search/submission'
    # requests omits parameters whose value is None, so the first call has no 'before' cutoff
    params = {'subreddit': subreddit, 'size': size, 'before': latest_date}
    try:
        res = requests.get(url, params=params)

        if res.status_code == 200:
            data = res.json()
            posts = data['data']
            # filter posts to specification (>= 20 words)
            filtered_post = more_than_20(posts)
            df = pd.DataFrame(data=filtered_post, columns=['author','subreddit','selftext','title','created_utc'])

            # the oldest post in this batch becomes the 'before' cutoff of the next call
            latest_date = min(post['created_utc'] for post in posts)
            dataframe = pd.concat([dataframe, df], axis=0)

            if len(dataframe) >= target_post:
                dataframe.reset_index(drop=True, inplace=True)  # reset the index
                return dataframe.iloc[:target_post, :]
            else:
                print(f'New DataFrame Size: {len(dataframe)}')
                return await recursive_get_data(dataframe=dataframe,
                                                latest_date=latest_date,
                                                target_post=target_post,
                                                size=size,
                                                subreddit=subreddit)
        else:
            print(f'Request failed with status code: {res.status_code}')
    except Exception as e:
        print(f'Something went wrong: {e}')
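The paging idea used above (fetch a batch, keep what passes the filter, then move the `before` cutoff to the oldest timestamp seen) can also be written as a plain loop. The sketch below is one possible iterative variant, not the notebook's implementation: `collect_posts`, `fetch_page`, and `fake_fetch` are hypothetical names, and the fake page source replaces the real API so the example runs offline. It assumes the `before` cutoff is exclusive.

```python
def collect_posts(fetch_page, target_post, keep=lambda post: True, max_pages=1000):
    """Page backwards in time until target_post posts are kept or pages run out."""
    kept, before = [], None
    for _ in range(max_pages):
        posts = fetch_page(before)
        if not posts:
            break  # nothing older is available
        kept.extend(post for post in posts if keep(post))
        # the oldest post seen becomes the cutoff for the next page
        before = min(post['created_utc'] for post in posts)
        if len(kept) >= target_post:
            break
    return kept[:target_post]

def fake_fetch(before, page_size=3):
    """Stand-in for the API: posts have timestamps 100, 99, ..., 1 (made-up data)."""
    newest = 100 if before is None else before - 1
    oldest = max(newest - page_size + 1, 1)
    return [{'title': f'post {ts}', 'created_utc': ts}
            for ts in range(newest, oldest - 1, -1)]

demo = collect_posts(fake_fetch, target_post=7)
print([post['created_utc'] for post in demo])
```

A loop avoids growing the call stack with every page, and `max_pages` bounds the work if the filter rejects most posts. The returned list of dictionaries can be passed to `pd.DataFrame` with the same `columns` argument used above.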

In [13]:
init_df = pd.DataFrame()

In [14]:
nike = await recursive_get_data(dataframe=init_df, subreddit='Nike', target_post=5000)

New DataFrame Size: 66
New DataFrame Size: 116
New DataFrame Size: 166
New DataFrame Size: 216
New DataFrame Size: 266
New DataFrame Size: 316
New DataFrame Size: 366
New DataFrame Size: 416
New DataFrame Size: 466
New DataFrame Size: 516
New DataFrame Size: 566
New DataFrame Size: 616
New DataFrame Size: 666
New DataFrame Size: 716
New DataFrame Size: 766
New DataFrame Size: 816
New DataFrame Size: 866
New DataFrame Size: 916
New DataFrame Size: 966
New DataFrame Size: 1016
New DataFrame Size: 1066
New DataFrame Size: 1116
New DataFrame Size: 1166
New DataFrame Size: 1216
New DataFrame Size: 1266
New DataFrame Size: 1316
New DataFrame Size: 1366
New DataFrame Size: 1416
New DataFrame Size: 1466
New DataFrame Size: 1516
New DataFrame Size: 1566
New DataFrame Size: 1616
New DataFrame Size: 1666
New DataFrame Size: 1716
New DataFrame Size: 1766
New DataFrame Size: 1816
New DataFrame Size: 1866
New DataFrame Size: 1916
New DataFrame Size: 1966
New DataFrame Size: 2016
New DataFrame Size: 

In [15]:
nike

| | author | subreddit | selftext | title | created_utc |
|---|---|---|---|---|---|
| 0 | yeet_zeehond | Nike | Im trying to login with paypal on nike snkrs b... | nike snkrs paypal error | 1668065959 |
| 1 | Alone_Painter_7326 | Nike | | Our local courier here in the Philippines lost... | 1668060481 |
| 2 | CumMilkshake69 | Nike | | What jacket is this? I know this is vintage an... | 1668048611 |
| 3 | Rudy5860 | Nike | So I’m just getting into sneakers and have bou... | Burned on the dunk drop today. | 1668036353 |
| 4 | ness-main | Nike | I got a pair of air forces that I wear almost ... | Is there anyway to restore the grip on a pair ... | 1668029200 |
| ... | ... | ... | ... | ... | ... |
| 4995 | TONYFAWNTANAA | Nike | | Hello everyone, cam you please subscribe to my... | 1667170337 |
| 4996 | Voodaji | Nike | I don't know if this is the right subreddit fo... | Squeaky shoes | 1667155551 |
| 4997 | KanniPro | Nike | | Please give suggestions :) I know nothing abou... | 1667140982 |
| 4998 | NesquikBoi | Nike | I've been wearing Air Max 95's for some time n... | Is Air Max Plus sizing identical as Air Max 95... | 1667140117 |


In [16]:
adidas = await recursive_get_data(dataframe=init_df, subreddit='adidas', target_post=5000)

New DataFrame Size: 88
New DataFrame Size: 176
New DataFrame Size: 264
New DataFrame Size: 352
New DataFrame Size: 440
New DataFrame Size: 528
New DataFrame Size: 616
New DataFrame Size: 704
New DataFrame Size: 792
New DataFrame Size: 880
New DataFrame Size: 968
New DataFrame Size: 1056
New DataFrame Size: 1144
New DataFrame Size: 1232
New DataFrame Size: 1320
New DataFrame Size: 1408
New DataFrame Size: 1496
New DataFrame Size: 1584
New DataFrame Size: 1672
New DataFrame Size: 1760
New DataFrame Size: 1848
New DataFrame Size: 1936
New DataFrame Size: 2024
New DataFrame Size: 2112
New DataFrame Size: 2200
New DataFrame Size: 2288
New DataFrame Size: 2376
New DataFrame Size: 2464
New DataFrame Size: 2552
New DataFrame Size: 2640
New DataFrame Size: 2728
New DataFrame Size: 2816
New DataFrame Size: 2904
New DataFrame Size: 2992
New DataFrame Size: 3080
New DataFrame Size: 3168
New DataFrame Size: 3256
New DataFrame Size: 3344
New DataFrame Size: 3432
New DataFrame Size: 3520
New DataFram

In [17]:
adidas

| | author | subreddit | selftext | title | created_utc |
|---|---|---|---|---|---|
| 0 | KiryuuShino | adidas | Hey ive been wondering if the NMD v3 is tts? F... | Adidas NMD V3 sizing. Is it TTS (True to size) ? | 1668067730 |
| 1 | kurokageidris | adidas | | I was wondering if you can use this Adidas Str... | 1668053913 |
| 2 | ten-lbs-over | adidas | | Anyone know what shoes these are? Worn by Evan... | 1668053651 |
| 3 | UltramanTiga_52 | adidas | My TTS is US9 (27cm). Narrow feet\n\nWhat size... | Adilette 22 sizing | 1668048085 |
| 4 | Jpdelgado | adidas | It’s been crazy to get a simple answer from th... | Custom Gear (Adiclub) | 1668040604 |
| ... | ... | ... | ... | ... | ... |
| 4995 | Ok_Improvement_2429 | adidas | Just what the title says, I love my Ultra 22s ... | Ultraboost 22 vs 4DFWD2? | 1666463989 |
| 4996 | StrictCardiologist7 | adidas | So yesterday I won the yeezy 450’s off confirm... | No order confirmation | 1666458452 |
| 4997 | Alternative_Coconut6 | adidas | | Im thinking about buying the firebird primeblu... | 1666440159 |
| 4998 | AgitatedWhereas7384 | adidas | I’m thinking of getting the 4DFWD. Any views o... | Adidas 4DFWD Long term review needed | 1666420005 |


#### Final Collection
- The Nike posts were created between 30 October 2022 and 10 November 2022
- The Adidas posts were created between 22 October 2022 and 10 November 2022
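The date ranges above can be checked by converting the `created_utc` epoch timestamps to calendar dates. The sketch below does this with the standard library only, using the newest and oldest timestamps visible in the two DataFrame previews; it assumes those visible rows bound each dataset, and `epoch_to_date` and `visible_bounds` are hypothetical names.

```python
from datetime import datetime, timezone

def epoch_to_date(ts):
    """Convert a Unix epoch timestamp (seconds) to an ISO date string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()

# (oldest, newest) created_utc values shown in the previews above
visible_bounds = {
    'nike': (1667140117, 1668065959),
    'adidas': (1666420005, 1668067730),
}

date_ranges = {
    name: (epoch_to_date(oldest), epoch_to_date(newest))
    for name, (oldest, newest) in visible_bounds.items()
}
print(date_ranges)
# {'nike': ('2022-10-30', '2022-11-10'), 'adidas': ('2022-10-22', '2022-11-10')}
```

On the full DataFrames, the same check could be run on `created_utc.min()` and `created_utc.max()` rather than on hand-copied values.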


In [18]:
nike.to_csv('../datasets/nike.csv', index=False)

In [19]:
adidas.to_csv('../datasets/adidas.csv', index=False)