# Initial Data Creation

This notebook contains the scripts I used for the initial data setup in my Splatoon 3 data science project. None of this belongs in the final project, and it doesn't feel appropriate for a `utils` file either. Nevertheless, I wanted a place to save this work in case I need it later, and to show how I started the project by gathering the data I was missing.

## Required Packages

In [1]:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
import os
import pandas as pd
import json
import requests

## Getting My Initial Battle Data

To start, I downloaded my initial battle data from my account. I logged in and navigated to the download page using `selenium`, then parsed the JSON file using `pandas`, which let me easily save it to a CSV file.

In [3]:
# read the secrets information
with open("secrets.json") as f:
    secrets = json.load(f)

# set the variables accordingly
username = secrets['USERNAME']
password = secrets['PASSWORD']
api_key = secrets['API_KEY']

# initialize the web driver
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver',options=chrome_options)
driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': "./data"}}
command_result = driver.execute("send_command", params)

# Go to the json website
driver.get("https://stat.ink/user/download3?type=user-json")

# Find the username/email field and send the username to the input field.
uname = driver.find_element("id", "loginform-screen_name") 
uname.send_keys(username)

# Find the password input field and send the password to the input field.
pword = driver.find_element("id", "loginform-password") 
pword.send_keys(password)

# Click sign in button to login the website.
driver.find_element("xpath", "/html/body/main/div/div[1]/div[1]/div/div[2]/form/div[4]/button").click()

# Wait for login process to complete. 
WebDriverWait(driver=driver, timeout=10).until(
    lambda x: x.execute_script("return document.readyState === 'complete'")
)

# Verify that the login was successful.
error_message = "Incorrect username or password."
# Retrieve any errors found. 
errors = driver.find_elements(By.CLASS_NAME, "flash-error")

# When errors are found, the login will fail. 
if any(error_message in e.text for e in errors): 
    print("[!] Login failed")
else:
    print("[+] Login successful")

# give the driver time to download the file, then shut it down
time.sleep(5)
driver.quit()  # quit() ends the whole browser session; close() only closes the window

# unpack the .gz file so we can read the JSON objects
current_dir = os.getcwd()
os.chdir('./data')
os.system('gzip -d statink-super64guy.json.gz')
os.system('rm -rf statink-super64guy.json.gz')
os.chdir(current_dir)


df = pd.DataFrame()
with open("./data/statink-super64guy.json") as f:
    for json_obj in f:
        data = json.loads(json_obj)
        tmp_df = pd.DataFrame.from_dict(data, orient="index").T
        df = pd.concat([df, tmp_df])

print(df.head)
df.to_csv('./data/statink-super64guy-battle.csv', index=False)
time.sleep(3)
os.remove('./data/statink-super64guy.json')

[+] Login successful
<bound method NDFrame.head of                                       id  \
0   f1e7fb92-7141-40c8-b2ba-6bc6e7c50282   
0   4522004d-d1aa-45f2-b1c6-18ee954e48f4   
0   1c77c7ee-4e67-44d4-9912-07b0f162cc17   
0   884431e9-22ac-4afd-a379-061b67cadaae   
0   550bbe6b-77dc-4922-9f1f-af1b98885da7   
..                                   ...   
0   e499ff4e-6fa6-4a0b-b59b-78e07c9062a9   
0   b5f5c855-2d94-479b-a3aa-95d92e0fda3a   
0   cf370a7c-4743-4b3c-adbb-815c9a0ce958   
0   805ac97f-5d8b-489d-8606-59c0c190f8ce   
0   b0e0f7d5-e41b-4a24-aed8-304d46ee1e3d   

                                                  url  \
0   https://stat.ink/@super64guy/spl3/f1e7fb92-714...   
0   https://stat.ink/@super64guy/spl3/4522004d-d1a...   
0   https://stat.ink/@super64guy/spl3/1c77c7ee-4e6...   
0   https://stat.ink/@super64guy/spl3/884431e9-22a...   
0   https://stat.ink/@super64guy/spl3/550bbe6b-77d...   
..                                                ...   
0   https://stat.ink/
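
The unpack-and-parse step above could also be done without shelling out to `gzip`. The following is a minimal sketch, not the code that produced this notebook's output: `read_json_lines_gz` is a hypothetical helper name, and it assumes the download is a gzip-compressed file with one JSON object per line, as the loop above implies. Unlike the cell above, it builds the DataFrame once from a list of records, so the index runs 0..n-1 instead of repeating 0.

```python
import gzip
import json

import pandas as pd


def read_json_lines_gz(path):
    """Read a gzip-compressed file with one JSON object per line into a DataFrame.

    Hypothetical helper: assumes each non-blank line is a complete JSON object.
    """
    records = []
    # gzip.open in text mode decompresses on the fly, so no shell call is needed
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    # build the frame once instead of concatenating inside the loop
    return pd.DataFrame(records)
```

Under those assumptions, usage would look like `df = read_json_lines_gz("./data/statink-super64guy.json.gz")`, followed by the same `df.to_csv(...)` call as above.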

## Getting My Initial Salmon Run Battle Data

Getting my initial Salmon Run data is way easier. There's a JSON endpoint that returns every Salmon Run battle I have played. The JSON can then be parsed with `pandas`, which can also write the result to a CSV file.

In [2]:
# read the secrets information
with open("secrets.json") as f:
    secrets = json.load(f)

# set the variables accordingly
username = secrets['USERNAME']
password = secrets['PASSWORD']
api_key = secrets['API_KEY']

# make the request for the Salmon Run JSON
r = requests.get('https://stat.ink/@super64guy/salmon3.json')
json_obj = json.loads(r.text)

# generate the df
df = pd.DataFrame()

for obj in json_obj:
    tmp_df = pd.DataFrame.from_dict(obj, orient="index").T
    df = pd.concat([df, tmp_df])

print(df.head(5))
df.to_csv('./data/statink-super64guy-salmonrun.csv', index=False)

                                     id  \
0  75577082-0fd7-4471-9be2-05df1139b0d7   
0  345d8a1a-e5d6-452f-aecf-d4ce76310bd7   
0  22613700-9080-4fb7-adfc-f250c07d3c62   
0  2f1c2fe6-1757-4439-bdbe-81203018980b   
0  876d47c9-d931-4f0c-8dcc-02f1e63c7ef6   

                                                 url  \
0  https://stat.ink/@super64guy/salmon3/75577082-...   
0  https://stat.ink/@super64guy/salmon3/345d8a1a-...   
0  https://stat.ink/@super64guy/salmon3/22613700-...   
0  https://stat.ink/@super64guy/salmon3/2f1c2fe6-...   
0  https://stat.ink/@super64guy/salmon3/876d47c9-...   

                                                user  \
0  {'id': 15092, 'name': 'super64guy', 'screen_na...   
0  {'id': 15092, 'name': 'super64guy', 'screen_na...   
0  {'id': 15092, 'name': 'super64guy', 'screen_na...   
0  {'id': 15092, 'name': 'super64guy', 'screen_na...   
0  {'id': 15092, 'name': 'super64guy', 'screen_na...   

                                   uuid private big_run  \
0  f5af5
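
The `user` column in the output above holds a nested dictionary in every row. If those fields are needed as separate columns, `pd.json_normalize` is one option. This is only a sketch: the `sample` records below are made up to mimic the shape of the response and are not real data from the endpoint.

```python
import pandas as pd

# hypothetical records shaped like the output above; real responses have many more fields
sample = [
    {"id": "a1", "user": {"id": 1, "name": "example"}},
    {"id": "b2", "user": {"id": 1, "name": "example"}},
]

# json_normalize expands nested dicts into dotted column names
flat = pd.json_normalize(sample)
print(sorted(flat.columns))  # ['id', 'user.id', 'user.name']
```

With the real response, the same call would be `pd.json_normalize(json_obj)` in place of the loop, assuming `json_obj` is a list of dictionaries as the loop above treats it.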

## Getting The Initial Worldwide Data

There are [public CSV files](https://dl-stats.stat.ink/splatoon-3/battle-results-csv/) that contain every battle since September 26th (technically Splatoon 3 released on September 9th, but the tool that I am using to pull data wasn't functioning until a few weeks later). Additionally, there is a `battle-results.zip` file that contains all of the data to date. I manually downloaded this file and extracted the resulting CSV files into the `./data/worldwide` directory.
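
Once the files are extracted, they can be combined into a single DataFrame. The sketch below is one way to do that; `load_worldwide` is a hypothetical helper, and it assumes every CSV in the directory shares the same columns.

```python
from pathlib import Path

import pandas as pd


def load_worldwide(directory="./data/worldwide"):
    """Concatenate every CSV in `directory` into one DataFrame.

    Hypothetical helper: assumes all files have the same columns.
    """
    # sort so the files are read in a stable, name-based order
    paths = sorted(Path(directory).glob("*.csv"))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {directory}")
    frames = [pd.read_csv(p) for p in paths]
    # ignore_index gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

Reading everything at once may use a lot of memory if the archive is large, so loading a subset of files first is a reasonable way to check the schema.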

## Manually Gathering My Data

Using the functions that I wrote in my `pull_latest_data.py` script, I can import my data manually. Until my cron job is running, I run each of these cells so that I still have my data.

This cell also tests that inserting the data works as intended.

In [3]:
import pull_latest_data

# stuff to help with testing
b_ids = pull_latest_data.get_missing_battle_ids()
s_ids = pull_latest_data.get_missing_salmon_run_ids()
yesterday = pull_latest_data.get_yesterday_date()

# run the actual code
pull_latest_data.download_my_battle_data()
pull_latest_data.get_missing_salmon_run_data()
pull_latest_data.get_worldwide_data()

# check that the code works
battle_ids = pd.read_csv('./data/statink-super64guy.csv')['id'].tolist()
salmon_ids = pd.read_csv('./data/statink-super64guy-salmonrun.csv')['uuid'].tolist()
battle_flag = True
salmon_flag = True

for b in b_ids:
    if b not in battle_ids:
        print('ADDING PERSONAL BATTLE DATA FAILED')
        battle_flag = False
        break

if battle_flag:
    print('ADDING PERSONAL BATTLE DATA PASSED')

for s in s_ids:
    if s not in salmon_ids:
        print('ADDING SALMON RUN DATA FAILED')
        salmon_flag = False
        break

if salmon_flag:
    print('ADDING SALMON RUN DATA PASSED')

if os.path.exists('./data/worldwide/'+yesterday+'.csv'):
    print('ADDING WORLDWIDE DATA PASSED')
else:
    print('ADDING WORLDWIDE DATA FAILED')

[+] Login successful


ADDING PERSONAL BATTLE DATA PASSED
ADDING SALMON RUN DATA PASSED
ADDING WORLDWIDE DATA PASSED
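
The two ID loops in the cell above can also be written as a set difference, which additionally reports which IDs are missing. This is a sketch with made-up IDs, and `find_missing` is a hypothetical helper that is not part of `pull_latest_data.py`.

```python
def find_missing(expected_ids, stored_ids):
    """Return the expected IDs that are absent from stored_ids."""
    return set(expected_ids) - set(stored_ids)


# made-up IDs for illustration only
missing = find_missing(["a", "b", "c"], ["b", "c", "d"])
if missing:
    print("ADDING PERSONAL BATTLE DATA FAILED:", sorted(missing))
else:
    print("ADDING PERSONAL BATTLE DATA PASSED")
```

In the cell above, the equivalent calls would be `find_missing(b_ids, battle_ids)` and `find_missing(s_ids, salmon_ids)`.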
