# Premium ground vehicles

## Gather URLs from War Thunder wiki

Caches URLs in a text file. At some point this will use S3.

In [1]:
import os
import importlib
import utils
importlib.reload(utils)

urls_filename = os.path.join(utils.data_dir(), 'premium_ground_vehicles', 'urls.txt')
vehicle_urls = await utils.generate_premium_ground_vehicle_urls()
await utils.write_file(urls_filename, "\n".join(vehicle_urls))

print(await utils.read_file(urls_filename))

https://wiki.warthunder.com/LVT(A)(4)
https://wiki.warthunder.com/M2A4_(1st_Arm.Div.)
https://wiki.warthunder.com/M3A1_(USMC)
https://wiki.warthunder.com/Stuart_VI_(5th_CAD)_(USA)
https://wiki.warthunder.com/M8_LAC
https://wiki.warthunder.com/M8A1_GMC
https://wiki.warthunder.com/M18_%22Black_Cat%22
https://wiki.warthunder.com/Super_Hellcat
https://wiki.warthunder.com/T18E2
https://wiki.warthunder.com/T114
https://wiki.warthunder.com/M1128_Wolfpack
https://wiki.warthunder.com/Grant_I_(USA)
https://wiki.warthunder.com/M4A5
https://wiki.warthunder.com/Calliope
https://wiki.warthunder.com/T20
https://wiki.warthunder.com/M26_T99
https://wiki.warthunder.com/M26E1
https://wiki.warthunder.com/M46_%22Tiger%22
https://wiki.warthunder.com/T54E1
https://wiki.warthunder.com/Magach_3_(ERA)_(USA)
https://wiki.warthunder.com/M728_CEV
https://wiki.warthunder.com/XM-1_(GM)
https://wiki.warthunder.com/XM-1_(Chrysler)
https://wiki.warthunder.com/M1_KVT
https://wiki.warthunder.com/T14
https://wiki.warthund
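`utils.write_file` and `utils.read_file` are not shown in this notebook. The sketch below is a hypothetical stand-in. It assumes they only write and read UTF-8 text without blocking the event loop, and it uses the standard library's `asyncio.to_thread`, which may not be what `utils` actually does. `read_urls` is an extra hypothetical helper that drops blank lines. Without it, a trailing newline or an empty file can produce an empty URL.

```python
import asyncio
import os


def _write_text(path: str, text: str) -> None:
    # Create the cache directory on first use (e.g. data/premium_ground_vehicles/).
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def _read_text(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


async def write_file(path: str, text: str) -> None:
    """Hypothetical stand-in: blocking file I/O runs in a worker thread."""
    await asyncio.to_thread(_write_text, path, text)


async def read_file(path: str) -> str:
    """Hypothetical stand-in: returns the whole file as a string."""
    return await asyncio.to_thread(_read_text, path)


async def read_urls(path: str) -> list[str]:
    """Hypothetical helper: one URL per line, blank lines ignored."""
    text = await read_file(path)
    return [line.strip() for line in text.splitlines() if line.strip()]
```

In the next cell, `urls = await read_urls(urls_filename)` could replace the `strip`/`split` chain.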

## Gather HTML from vehicle URLs

This caches the HTML from each vehicle's wiki page so that additional data points can be extracted later without scraping the wiki again. At some point this will use S3.
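The notebook does not show how `utils.cache_vehicle_html` names its files. A cache only helps if each URL maps to a stable, filesystem-safe filename, and if already-downloaded pages are skipped. The sketch below shows one way to do that. The helpers `cache_filename` and `is_cached` are hypothetical. Percent-encoding the last path segment is an assumption, chosen because names like `LVT(A)(4)` and `M18_%22Black_Cat%22` contain characters that some filesystems reject.

```python
import os
from urllib.parse import quote, unquote, urlparse


def cache_filename(url: str, html_dir: str) -> str:
    """Map a wiki URL to a filesystem-safe cache path (hypothetical helper)."""
    # The vehicle name is the last path segment, e.g. "LVT(A)(4)".
    segment = urlparse(url).path.rsplit("/", 1)[-1]
    # Decode any existing escapes, then re-encode everything except letters,
    # digits and "_.-~" so the same URL always yields the same safe name.
    safe_name = quote(unquote(segment), safe="")
    return os.path.join(html_dir, safe_name + ".html")


def is_cached(url: str, html_dir: str) -> bool:
    """True if this URL's HTML has already been written to disk."""
    return os.path.isfile(cache_filename(url, html_dir))
```

Checking `is_cached` before each request makes re-running the cell cheap: only pages that are missing from disk get fetched.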

In [2]:
import os
import asyncio
import aiohttp
import importlib
import utils
importlib.reload(utils)

vehicle_data_path = os.path.join(utils.data_dir(), 'premium_ground_vehicles')
urls_filename = os.path.join(vehicle_data_path, 'urls.txt')
urls = (await utils.read_file(urls_filename)).strip('\n').split('\n')

async with aiohttp.ClientSession() as session:
    html_path = os.path.join(vehicle_data_path, 'html')
    await asyncio.gather(*[utils.cache_vehicle_html(url, html_path, session=session) for url in urls])
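The cell above starts one request per URL, all at the same time. If the wiki throttles bursts, the number of in-flight requests can be capped with an `asyncio.Semaphore`. A minimal sketch follows; `gather_bounded` is a hypothetical helper, and the limit of 8 is an arbitrary default, not a tested value.

```python
import asyncio
from typing import Any, Coroutine, Iterable, TypeVar

T = TypeVar("T")


async def gather_bounded(
    coros: Iterable[Coroutine[Any, Any, T]], limit: int = 8
) -> list[T]:
    """Await coroutines concurrently, at most `limit` at a time.

    Results keep the input order, like asyncio.gather. Coroutine objects do
    not start running until awaited, so creating them up front is fine.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(coro: Coroutine[Any, Any, T]) -> T:
        # Wait for a free slot before the request actually starts.
        async with semaphore:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))
```

In the cell above this would be `await gather_bounded([utils.cache_vehicle_html(url, html_path, session=session) for url in urls], limit=8)`. Another approach is to pass an `aiohttp.TCPConnector(limit_per_host=...)` to the `ClientSession`, which caps connections at the HTTP layer.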

## Extract ground vehicle data points

This extracts data points out of the previously cached HTML files.

Each data point has its own extraction function, because the way each value is located in the HTML can differ slightly. Adding a new data point therefore means adding a new extraction function.
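The `extraction` module is not shown. Below is a minimal sketch of one way to keep "one function per data point" manageable: a registry that maps each column name to its extraction function, so the CSV header and each row are always built in the same order. The names `EXTRACTORS`, `data_point` and `extract_from_html` are hypothetical. The two extractors use placeholder regexes, not the wiki's real markup.

```python
import re
from typing import Callable

# Hypothetical registry: column name -> function that pulls it out of raw HTML.
EXTRACTORS: dict[str, Callable[[str], object]] = {}


def data_point(name: str):
    """Decorator that registers an extraction function under a column name."""
    def register(func: Callable[[str], object]) -> Callable[[str], object]:
        EXTRACTORS[name] = func
        return func
    return register


@data_point("Vehicle name")
def extract_name(html: str) -> str:
    # Placeholder: assumes the vehicle name is the page <title>.
    match = re.search(r"<title>(.*?)</title>", html, re.S)
    return match.group(1).strip() if match else ""


@data_point("Rank")
def extract_rank(html: str) -> str:
    # Placeholder: assumes the text "Rank" is followed by a Roman numeral.
    match = re.search(r"Rank\s*([IVX]+)", html)
    return match.group(1) if match else ""


# dicts preserve insertion order, so the header matches the row order below.
DATA_POINTS = list(EXTRACTORS)


def extract_from_html(html: str) -> list:
    """Build one CSV row by running every registered extractor in order."""
    return [EXTRACTORS[name](html) for name in DATA_POINTS]
```

With this layout, adding a data point only requires one decorated function. The header and the rows pick it up automatically.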

In [15]:
import os
from concurrent.futures import ProcessPoolExecutor
import asyncio
import importlib
import utils
import extraction
import csv
import aiofiles

importlib.reload(utils)
importlib.reload(extraction)

html_path = os.path.join(utils.data_dir(), 'premium_ground_vehicles', 'html')
html_filenames = [os.path.join(html_path, f) for f in os.listdir(html_path) if os.path.isfile(os.path.join(html_path, f))]

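async def extract_and_write_row(filename: str, writer, executor: ProcessPoolExecutor):
    html = await utils.read_file(filename)
    # Parse in a worker process so CPU-bound extraction doesn't block the event loop
    loop = asyncio.get_running_loop()
    extracted_data = await loop.run_in_executor(executor, extraction.GroundVehicleExtraction.extract_from_html, html)
    # csv.writer hands the formatted row to the aiofiles handle's write(), whose awaitable is returned here
    await writer.writerow(extracted_data)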

csv_filename = os.path.join(utils.data_dir(), 'premium_ground_vehicles', 'data.csv')

# newline='' lets the csv module control line endings, as the csv docs recommend
async with aiofiles.open(csv_filename, mode='w', newline='') as f:
    w = csv.writer(f)
    await w.writerow(extraction.GroundVehicleExtraction.DATA_POINTS)
    with ProcessPoolExecutor() as pool:
        await asyncio.gather(*[extract_and_write_row(filename, w, pool) for filename in html_filenames])

print(f'Extracted {len(html_filenames)} vehicles')

Extracted 209 vehicles
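The cell above relies on a CPython detail: `csv.writer.writerow` returns whatever the underlying `write()` returns, and with aiofiles that is an awaitable. It works, but rows land in whatever order the extractions finish. The sketch below is an alternative under the same assumptions. It collects rows in input order and then writes the file once with the plain `csv` module. `extract_rows` and `write_csv` are hypothetical helpers.

```python
import asyncio
import csv
from concurrent.futures import Executor
from typing import Callable, Sequence


async def extract_rows(
    html_docs: Sequence[str], extract: Callable[[str], list], executor: Executor
) -> list[list]:
    """Run `extract` on every document in `executor`; rows keep input order."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, extract, html) for html in html_docs]
    # gather returns results in the order the futures were passed in.
    return await asyncio.gather(*futures)


def write_csv(path: str, header: Sequence[str], rows: Sequence[Sequence]) -> None:
    """Write the header and all rows in one pass (newline='' per the csv docs)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

In the notebook, `extract` would be `extraction.GroundVehicleExtraction.extract_from_html` and the executor a `ProcessPoolExecutor`. Sorting `html_filenames` first would also make the CSV identical across runs.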



The following vehicle data points will be gathered (tracking with ✅/❌):
- ✅ Vehicle name
- ✅ Country
- ✅ Classification of tank (light, medium, heavy, TD, AAA)
  - TODO: check that this selector works for all tanks
- ✅ Rank
- ✅ BR for each game mode
- ✅ Purchase (pulled from "Purchase" on the page, if it exists)
  - Implement a custom method to clean the data
- ❌ Wheels vs treads
  - This may be hard to determine; it might have to be cut
- ✅ Hull armor (front/side/back)
- ✅ Turret armor (front/side/back)
- ✅ Crew members
- ✅ Visibility
  - TODO: clean and make decimal
- ❌ Horizontal guidance
- ❌ Vertical guidance
- ❌ Is amphibious
- ❌ Forward speed (AB)
- ❌ Forward speed (RB/SB)
- ❌ Back speed (AB)
- ❌ Back speed (RB/SB)
- ❌ Engine power (AB)
- ❌ Engine power (RB/SB)
- ❌ Power-to-weight ratio (AB)
- ❌ Power-to-weight ratio (RB/SB)
- ❌ Weight (tons)
- ❌ Repair cost (AB)
- ❌ Repair cost (RB/SB)
- ❌ Crew training
- ❌ Crew training (Expert)
- ❌ Crew training (Aces)
- ❌ Crew training (Research Aces)
- ❌ Modifications list
- ❌ First stage ammunition amount (maybe?)
- ❌ Reload time
- ❌ Max ammo
- ❌ Has stabilizer
- ❌ Fire rate
- ❌ Ammunition
  - ❌ name
  - ❌ type
  - ❌ pen @ 10m
  - ❌ pen @ 100m
  - ❌ pen @ 500m
  - ❌ pen @ 1000m
  - ❌ pen @ 1500m
  - ❌ pen @ 2000m
  - ❌ projectile velocity
  - ❌ projectile mass
  - ❌ fuse delay
  - ❌ fuse sensitivity
  - ❌ explosive mass
  - ❌ degrees for 0% ricochet chance
  - ❌ degrees for 50% ricochet chance
  - ❌ degrees for 100% ricochet chance
- ❌ Coaxial machine gun caliber
- ❌ Has mounted MG
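Several items above still need cleaning, such as the Purchase note, the Visibility TODO and the front/side/back armor values. Below is a minimal sketch of cleaning helpers. It assumes the raw cell text looks like `"66 %"`, `"2 990 Golden Eagles"` and `"45 / 25 / 25"`. Those formats are guesses that have not been checked against the wiki, and "make decimal" is read here as a fraction such as `0.66`. All three function names are hypothetical.

```python
import re
from decimal import Decimal


def parse_percent(text: str) -> Decimal:
    """'66 %' -> Decimal('0.66'); assumes visibility is shown as a percentage."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number in {text!r}")
    return Decimal(match.group()) / 100


def parse_int(text: str) -> int:
    """'2 990 Golden Eagles' -> 2990; for whole-number prices and counts only.

    Every non-digit is dropped, so thousands separators and units vanish
    (a decimal point would vanish too, hence whole numbers only).
    """
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"no digits in {text!r}")
    return int(digits)


def parse_armor(text: str) -> tuple[int, ...]:
    """'45 / 25 / 25' -> (45, 25, 25): front/side/back, assuming whole mm."""
    return tuple(int(part) for part in re.findall(r"\d+", text))
```

Raising on text with no digits, rather than returning 0, keeps a missing value from being confused with a real zero in the CSV.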