# Project 3: Web APIs & Classification Part 1

## Problem Statement

The Pokemon Company would like to understand what players of its different games are discussing on online forums. Analysing these discussions can inform updates to existing games and the development of future ones. The first step in this analysis is to classify each online post according to the Pokemon game it is about.

## Executive Summary

The first Pokemon game was released in 1996. The main Pokemon series is now in its 8th generation, and the platforms the games are played on have evolved over the years. Counting spin-offs, mobile apps and PC games, there are likely to be a few hundred Pokemon games today. This project studies Reddit posts from two subreddits, r/pokemongo (Pokemon GO) and r/PokkenGame (Pokken), with the goal of developing a classification model that predicts which game a post is associated with.

## Importing data

In [1]:
import requests
import pandas as pd
import time
import random

In [2]:
url1 = 'https://www.reddit.com/r/pokemongo/new.json'

In [3]:
url2 = 'https://www.reddit.com/r/PokkenGame/new.json'

In [4]:
posts_pogo = []
after1 = None

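# Page through the listing: each response holds one page of posts plus an
# 'after' token that identifies where the next page starts
for i in range(39):
    if after1 is None:
        current_url1 = url1
    else:
        current_url1 = url1 + '?after=' + after1
    print(current_url1)
    res1 = requests.get(current_url1, headers={'User-agent': 'Greninja Inc 1.0'})

    # stop on any non-OK response instead of trying to parse an error page
    if res1.status_code != 200:
        print('Status error', res1.status_code)
        break

    current_dict1 = res1.json()
    current_posts1 = [p['data'] for p in current_dict1['data']['children']]
    posts_pogo.extend(current_posts1)
    after1 = current_dict1['data']['after']

    # no 'after' token means the listing is exhausted; without this check the
    # next iteration would request the first page again and add duplicates
    if after1 is None:
        break

    # generate a random sleep duration to look more 'natural' and space out requests
    sleep_duration1 = random.randint(2, 60)
    print(sleep_duration1)
    time.sleep(sleep_duration1)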

https://www.reddit.com/r/pokemongo/new.json
15
https://www.reddit.com/r/pokemongo/new.json?after=t3_k46crw
23
https://www.reddit.com/r/pokemongo/new.json?after=t3_k44g8i
43
https://www.reddit.com/r/pokemongo/new.json?after=t3_k43283
11
https://www.reddit.com/r/pokemongo/new.json?after=t3_k41idl
27
https://www.reddit.com/r/pokemongo/new.json?after=t3_k3ydg0
14
https://www.reddit.com/r/pokemongo/new.json?after=t3_k3uis9
12
https://www.reddit.com/r/pokemongo/new.json?after=t3_k3r52k
18
https://www.reddit.com/r/pokemongo/new.json?after=t3_k3nsuz
36
https://www.reddit.com/r/pokemongo/new.json?after=t3_k3huze
12
https://www.reddit.com/r/pokemongo/new.json?after=t3_k3egmn
60
https://www.reddit.com/r/pokemongo/new.json?after=t3_k3atas
44
https://www.reddit.com/r/pokemongo/new.json?after=t3_k36563
54
https://www.reddit.com/r/pokemongo/new.json?after=t3_k30z5b
58
https://www.reddit.com/r/pokemongo/new.json?after=t3_k2y1c3
4
https://www.reddit.com/r/pokemongo/new.json?after=t3_k2vgex
59
https://w

In [5]:
pogo_df = pd.DataFrame(posts_pogo)

In [6]:
pogo_df.to_csv('pogo.csv', index=False)
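
Because the listing is fetched page by page while new posts keep arriving, the same post can in principle show up on two pages. A minimal sketch of a duplicate check is shown below. It assumes each post dict carries a `name` field holding the post's fullname (the same `t3_...` style value that appears as the `after` token in the URLs above); `dedupe_posts` is a hypothetical helper, not part of the original notebook, and the sample ids are made up for illustration.

```python
def dedupe_posts(posts, key='name'):
    """Return posts with repeated `key` values removed, keeping first occurrences.

    Assumes each post is a dict; posts that lack `key` are kept as they are.
    """
    seen = set()
    unique = []
    for post in posts:
        post_id = post.get(key)
        if post_id is None:
            # cannot identify this post, so keep it rather than guess
            unique.append(post)
            continue
        if post_id in seen:
            continue
        seen.add(post_id)
        unique.append(post)
    return unique


# made-up sample data: the third entry repeats the first
sample = [
    {'name': 't3_a', 'title': 'first'},
    {'name': 't3_b', 'title': 'second'},
    {'name': 't3_a', 'title': 'first'},
]
print(len(dedupe_posts(sample)))  # 2
```

In this notebook the helper could be applied as `pd.DataFrame(dedupe_posts(posts_pogo))` before writing the CSV, so that repeated posts do not inflate one class in the later classification step.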

In [7]:
posts_pokken = []
after2 = None

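# Same pagination loop as for r/pokemongo, applied to r/PokkenGame
for j in range(39):
    if after2 is None:
        current_url2 = url2
    else:
        current_url2 = url2 + '?after=' + after2
    print(current_url2)
    res2 = requests.get(current_url2, headers={'User-agent': 'Greninja Inc 1.0'})

    # stop on any non-OK response instead of trying to parse an error page
    if res2.status_code != 200:
        print('Status error', res2.status_code)
        break

    current_dict2 = res2.json()
    current_posts2 = [q['data'] for q in current_dict2['data']['children']]
    posts_pokken.extend(current_posts2)
    after2 = current_dict2['data']['after']

    # no 'after' token means the listing is exhausted; without this check the
    # next iteration would request the first page again and add duplicates
    if after2 is None:
        break

    # generate a random sleep duration to look more 'natural' and space out requests
    sleep_duration2 = random.randint(2, 60)
    print(sleep_duration2)
    time.sleep(sleep_duration2)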

https://www.reddit.com/r/PokkenGame/new.json
42
https://www.reddit.com/r/PokkenGame/new.json?after=t3_ju6idy
32
https://www.reddit.com/r/PokkenGame/new.json?after=t3_j889c3
5
https://www.reddit.com/r/PokkenGame/new.json?after=t3_imwoia
2
https://www.reddit.com/r/PokkenGame/new.json?after=t3_icto61
46
https://www.reddit.com/r/PokkenGame/new.json?after=t3_i6zjqj
47
https://www.reddit.com/r/PokkenGame/new.json?after=t3_i2l55h
26
https://www.reddit.com/r/PokkenGame/new.json?after=t3_i0x7tk
35
https://www.reddit.com/r/PokkenGame/new.json?after=t3_hyvx1s
42
https://www.reddit.com/r/PokkenGame/new.json?after=t3_hq4s9v
35
https://www.reddit.com/r/PokkenGame/new.json?after=t3_hgcixe
4
https://www.reddit.com/r/PokkenGame/new.json?after=t3_gq36iz
45
https://www.reddit.com/r/PokkenGame/new.json?after=t3_gdcb4a
45
https://www.reddit.com/r/PokkenGame/new.json?after=t3_ftgp4z
2
https://www.reddit.com/r/PokkenGame/new.json?after=t3_f6v59n
38
https://www.reddit.com/r/PokkenGame/new.json?after=t3_eqscho

In [8]:
pokken_df = pd.DataFrame(posts_pokken)

In [9]:
pokken_df.to_csv('pokken.csv', index=False)
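
Cells 4 and 7 run the same pagination loop with different variable names. One possible refactoring, sketched below under some assumptions, separates the paging logic from the HTTP call so the loop can be reused for both subreddits and exercised without network access. The names `collect_posts` and `make_reddit_fetcher` are hypothetical and not part of the original notebook; the fetcher passes the `after` token through the `params` argument of `requests.get` instead of building the query string by hand.

```python
import random
import time


def collect_posts(fetch_page, max_pages=39, sleep_range=(2, 60), sleep=time.sleep):
    """Follow 'after' tokens through a listing and return all post dicts.

    fetch_page(after) must return the parsed JSON of one listing page.
    """
    posts = []
    after = None
    for _ in range(max_pages):
        listing = fetch_page(after)
        posts.extend(child['data'] for child in listing['data']['children'])
        after = listing['data']['after']
        if after is None:
            # no further pages; stop instead of restarting from page one
            break
        # random pause between requests, as in the original loops
        sleep(random.randint(*sleep_range))
    return posts


def make_reddit_fetcher(url, user_agent='Greninja Inc 1.0'):
    """Build a fetch_page function for one subreddit listing URL."""
    import requests  # imported here so collect_posts works without it

    def fetch_page(after):
        params = {'after': after} if after else {}
        res = requests.get(url, headers={'User-agent': user_agent}, params=params)
        res.raise_for_status()  # raise on any HTTP error status
        return res.json()

    return fetch_page
```

With these helpers, the two scraping cells would reduce to calls such as `posts_pogo = collect_posts(make_reddit_fetcher(url1))` and `posts_pokken = collect_posts(make_reddit_fetcher(url2))`. Unlike the original loops, this version raises an exception on a failed request instead of printing the status code and stopping quietly, which is a design choice, not a requirement.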