# <span style="color:blue">Project X4 </span>

### Index 

* [Introduction](#id_0)  
  * [Libraries](#id_1)
  * [Description](#id_1a)
  * [Files](#id_1b)
  * [Functions](#id_1c)
* [Getting data](#id_2)
  * [Create the list of needed files](#id_2a)  
  * [How ship names are stored](#id_2c)
  * [Creating the data table for all ships](#id_2d)
    * [Making the table](#id_2d_1)
    * [Data clearance](#id_2d_2)
  * [Code to find names from descriptions](#id_2e)
  * [Finding data about ship configuration](#id_2f) 
* [Getting data about ship modules](#id_3)
  * [Shields](#id_3a) 
  * [Engines and thrusters](#id_3b)
    * [Engines](#id_3b_1)
    * [Thrusters](#id3b_2)
* [Saving tables](#id_4)

## Introduction <a class="anchor" id="id_0"></a> 

### Libraries <a class="anchor" id="id_1"></a> 

In [61]:
import pandas as pd
import os
import re
from bs4 import BeautifulSoup
import lxml
pd.set_option('display.max_columns', None)
#pd.set_option('max_colwidth', 300)

### Description <a class="anchor" id="id_1a"></a> 

X4: Foundations - space sandbox game, where a player can build factories, trade, wage ware, board ships and complete quests in an evolving world with beatiful graphic and visual effects. The game was released by the company Egosoft in 2018 and byhte moment has 4 dlcs.  
Ship data gathered for all ships accessible or not to a player.  
Shield data gathered for all types of shields.  
Engine data gathered for all ships, except mines, drones, missiles, spacesuits, xs_objects.

### Files <a class="anchor" id="id_1b"></a> 

In [62]:
#Folders with game objects
units_folder = r'H:\Steam\X4\Unpacked\assets\units' # The main folder with xml files containing data about game objects
boron_dlc_folder = r'H:\Steam\X4\Unpacked\extensions\boron_dlc\assets\units' # The folder with game objects from dlc "Kingdom End"
avarice_dlc_folder = r'H:\Steam\X4\Unpacked\extensions\avarice_dlc\assets\units' # The folder with game objects from dlc "Tides of Avarice"
terran_dlc_folder = r'H:\Steam\X4\Unpacked\extensions\terran_dlc\assets\units'  # # The folder with game objects from dlc "Cradle of humanity"
split_dlc_folder = r'H:\Steam\X4\Unpacked\extensions\split_dlc\assets\units' # The folder with game objects from dlc "Split Vendetta"
# Folders with text informations (names, descriptions etc)
descriptions_eng = r'H:\Steam\X4\Unpacked\t\0001-l044.xml' #for english version
descriptions_ru = r'H:\Steam\X4\Unpacked\t\0001-l007.xml' #for  russian version
# A very problematic file without useful information but caused many errors
problemic_file = r'H:\Steam\X4\Unpacked\extensions\boron_dlc\assets\units\size_l\ship_bor_l_miner_solid_01_macro.xml'

save_folder = r'H:\Steam\X4\SavedTables' # Folder to save extracted tables

Shield data catalogs:

In [63]:
#base game catalog with ship equipment
main_catalog = r'h:/Steam/X4/Unpacked/assets/props'
# Base game shields catalog
cat_shields = r'h:\Steam\X4\Unpacked\assets\props\SurfaceElements\macros'
#dlc ships shields 
split_shields = r'h:\Steam\X4\Unpacked\extensions\split_dlc\assets\props\surfaceelements\macros'
avarice_shields = r'h:\Steam\X4\Unpacked\extensions\avarice_dlc\assets\props\surfaceelements\macros'
terran_shields = r'h:\Steam\X4\Unpacked\extensions\terran_dlc\assets\props\surfaceelements\macros'
boron_shields = r'h:\Steam\X4\Unpacked\extensions\boron_dlc\assets\props\surfaceelements\macros'

Engine data files:

In [64]:
# Base game filepath to the catalog with engine and thruster files
cat_engines  = r'h:/Steam/X4/Unpacked/assets/props/Engines/macros'
#dlc ships engine catalogs 
split_engines = r'h:\Steam\X4\Unpacked\extensions\split_dlc\assets\props\Engines\macros'
avarice_engines = r'h:\Steam\X4\Unpacked\extensions\avarice_dlc\assets\props\engines\macros'
terran_engines = r'h:\Steam\X4\Unpacked\extensions\terran_dlc\assets\props\engines\macros'
boron_engines = r'h:\Steam\X4\Unpacked\extensions\boron_dlc\assets\props\engines\macros'

0001-l044.xml - dictionary of names and descriptions for english version.  
For example   the string
```   
 <page id="20101" title="Ships" descr="Names and descriptions of ships" voice="yes">
 ```    
have information about  names of ships {pade_id, ship_id}.   
The ship Albatros Vanguard has name stored in the form  like "{20101,21102}: 
    ```   
    <identification name="{20101,21102}" basename="{20101,21101}" description="{20101,21112}" variation="{20111,1101}" shortvariation="{20111,1103}" icon="ship_xl_build_01" /> ```
   where page id = 20101 and name id = 21102.

### Functions <a class="anchor" id="id_1c"></a> 

In [65]:
#Function to level list of lists to one dimensional list
def flatten(l):
    return [item for sublist in l for item in sublist]

In [66]:
# Fuction to show missed data in a table
def missing_values_tab(df):
    #Count missed values
    mis_val = df.isnull().sum()
    #Count share of missed values of all data 
    mis_val_percent = round(100 * df.isnull().sum() / len(df),2)
    
    mv_table = pd.concat([mis_val, mis_val_percent], axis=1)
    
    mv_table = mv_table.rename(columns = {0 : 'missed_values', 1 : '%_of_all'})
    
    mv_table['data_type'] = df.dtypes
    mv_table = mv_table[mv_table.iloc[:,1]!=0].sort_values(by='missed_values',ascending=False)
    print ("Dataframe contains " + str(df.shape[1]) + " columns and " + str(df.shape[0]) + " strings.\n")
    print("It has  " + str(mv_table.shape[0]) +" columns with missed values")    
    return display(mv_table) 

## Getting the data <a class="anchor" id="id_2"></a> 

### Create the list of needed files <a class="anchor" id="id_2a"></a> 

In [67]:
list_of_folders = [units_folder, boron_dlc_folder, avarice_dlc_folder, terran_dlc_folder, split_dlc_folder]
list_of_files = [] # list of  files
list_of_XMLfiles = [] #list of xml files
for folder in list_of_folders:
    for root, dirs, files in os.walk(folder):
        for file in files:
            if file.endswith(".xml"):
                list_of_XMLfiles.append( file)
                #print(os.path.join(root, file))
                list_of_files.append(os.path.join(root, file))
print('List of folders in units_folder: ', os.listdir(units_folder))

List of folders in units_folder:  ['size_l', 'size_m', 'size_s', 'size_xl', 'size_xs', 'xref_parts']


In [68]:
#Remove unnecessary files
skip_files = [r'H:\Steam\X4\Unpacked\extensions\boron_dlc\assets\units\size_l\ship_bor_l_destroyer_01_macro.xml', 
r'H:\Steam\X4\Unpacked\extensions\boron_dlc\assets\units\size_l\ship_bor_l_miner_solid_01_macro.xml',
r'H:\Steam\X4\Unpacked\extensions\boron_dlc\assets\units\size_l\ship_bor_l_trans_container_01_macro.xml',
r'H:\Steam\X4\Unpacked\extensions\boron_dlc\assets\units\size_s\macros\ship_bor_s_miner_solid_01_story_macro.xml',
r'H:\Steam\X4\Unpacked\extensions\boron_dlc\assets\units\size_xl\ship_bor_xl_carrier_01_macro.xml',
r'H:\Steam\X4\Unpacked\extensions\boron_dlc\assets\units\size_l\ship_bor_l_miner_solid_01_macro.xml']
for file in list_of_files:
    if file in skip_files:
        list_of_files.remove(file)
print(len(list_of_files))

725


Needed tags: 
```
<explosiondamage shield="5000" value="1000"/>
<storage missile="160" unit="10"/>
<hull max="93000"/>
<secrecy level="2"/>
<purpose primary="fight"/>
<people capacity="44"/>
<shipdetail ref="shipdetail_ship_l_01"/>
<physics mass="196.016">
<inertia pitch="96.271" roll="77.016" yaw="96.271"/>
<drag forward="99.004" horizontal="73.005" pitch="106.203" reverse="396.016" roll="106.203" vertical="73.005" yaw="106.203"/>
</physics>
<thruster tags="large"/>
<ship type="destroyer"/>
</properties> 
``` 
- Equipment 
```
<software>
<software compatible="1" ware="software_dockmk2"/>
<software default="1" ware="software_flightassistmk1"/>
<software default="1" ware="software_scannerlongrangemk1"/>
<software compatible="1" ware="software_scannerlongrangemk2"/>
<software default="1" ware="software_scannerobjectmk1"/>
<software compatible="1" ware="software_scannerobjectmk2"/>
<software default="1" ware="software_targetmk1"/>
<software compatible="1" ware="software_trademk1"/>
</software> 
```
```
      
<connection ref="con_storage01">  # ссылка на файл, содержащий значение емкости грузового отсека
    <macro ref="storage_arg_l_destroyer_01_a_macro" connection="ShipConnection" />
</connection>   
    ```

The tags to include into table:

In [69]:
needed_tags = ['macro', 'explosiondamage', 'storage', 'hull', 'secrecy', 'purpose', 'people', 'physics', 'inertia', 'drag', 'ship']

In [70]:
list_of_ships = [] # список файлов с данными кораблей
list_of_ships_configs = [] #список файлов с данными по геометрии кораблей, нужен для подсчета слотов оборудования
keywords = ['ship','macro','spacesuit','cockpit','storage', 'thruster']
for file in list_of_files:
    if keywords[0] in file and keywords[1] in file and keywords[2] not in file:
        list_of_ships.append(file)
    elif (
        keywords[0] in file and keywords[1] not in file and keywords[3] not in file and keywords[4] not in file 
                and keywords[5] not in file 
    ):
        list_of_ships_configs.append(file)

print(len(list_of_ships))     
print(len(list_of_ships_configs))  

321
260


### How ship names are stored <a class="anchor" id="id_2c"></a>

0001-l044.xml - file with textual information for english version.
The information is divided into pages, each of them contains multiple ids. For example the string  
  
```<page id="20101" title="Ships" descr="Names and descriptions of ships" voice="yes"> ```  
    contains the page id responsible for ship names in the form of {pade_id, ship_id}. The ship Albatros Vanguard have name="{20101,21102}:

```<identification name="{20101,21102}" basename="{20101,21101}" description="{20101,21112}" variation="{20111,1101}" shortvariation="{20111,1103}" icon="ship_xl_build_01" />```  
where  
 20101 - page id, 21102 - id on that page:  
 
```<t id="21101">Albatross</t>```</br>
```<t id="21102">(Albatross Vanguard){20101,21101} {20111,1101}</t>```

In [71]:
with open(descriptions_eng, 'r') as user_file:
    file_contents = user_file.read()
#Create Beatiful soup object of the file     
descriptions_Sobject = BeautifulSoup(file_contents, features="xml")

list_of_tags_in_textfile = []
for tag in descriptions_Sobject.findAll('page', attrs = {'id':20101,'title':'Ships','descr':'Names and descriptions of ships'}):    
    list_of_tags_in_textfile.append(tag)    
#print(list_of_tags_in_textfile)

### Creating the data table for all ships <a class="anchor" id="id_2d"></a>

#### Making the table <a class="anchor" id="id_2d_1"></a>

In [72]:
## 
temp = [] # list of lists with ship data
text_ids_list = [] # dictionary with lists of  page_id and name_id
files = [] # list with filenames
print(len(list_of_ships))
for file in list_of_ships:
     if problemic_file in file:
         continue
     else:
               
        ship_data = {} 
        with open(file, 'r') as user_file:
            file_contents = user_file.read()
        #Create Beatiful soup object of the file     
            file_BStree = BeautifulSoup(file_contents, features="xml")
        
       
            for tag in file_BStree.findAll():
            # looking for needed tags
                if tag.name in needed_tags and not'ref' in tag.attrs:                
                    ship_data[tag.name] = tag.attrs 
                ## Create list of names
            ship_data['component'] = file_BStree.select_one('component').get('ref')
            temp.append(ship_data) 
            files.append(file)
            for tag in file_BStree.findAll('identification'):                  
                   
                    D = {}
                    name_string = tag.attrs['name']
                    page_id = re.findall('\d+',name_string)[0]
                    name_id = re.findall('\d+',name_string)[1]
                    D['pageID'] = page_id
                    D['nameID'] = name_id
                    text_ids_list.append(D)       

## Checking for equal length:
if len(temp) == len(text_ids_list):
    print('Good')
else:
    print('The lists have different length')  
    print(len(temp), len(text_ids_list))
#print(text_ids_list)

321
Good


Finding ship names from text_ids_lists:

In [73]:
ship_names = []
names_ids_list =  []
for element in text_ids_list:    
    ex = descriptions_Sobject.find('page', attrs = {'id':20101, 'title':'Ships'}).find('t', attrs = {'id':element['nameID']})
    try: 
        name = re.search('\D+',ex.text).group()
        name = re.split('[^A-Za-z]',name)
        name = list(filter(None, name))
        new_name = " ".join([ele for ele in name if  ele[0].isupper()])
    except AttributeError:
        continue
    finally:
        ship_names.append(new_name)
        names_ids_list.append(element['nameID'])
    
#print(ship_names)   

In [74]:
## Creating table    
ship_data = pd.DataFrame.from_dict(temp)
#ship_data.columns = columns4 
display(ship_data.head(2))

Unnamed: 0,macro,explosiondamage,storage,hull,secrecy,purpose,people,physics,inertia,drag,ship,component
0,"{'name': 'ship_arg_l_destroyer_01_a_macro', 'c...","{'value': '1000', 'shield': '5000'}","{'missile': '160', 'unit': '10'}",{'max': '93000'},{'level': '2'},{'primary': 'fight'},{'capacity': '44'},{'mass': '196.016'},"{'pitch': '96.271', 'yaw': '96.271', 'roll': '...","{'forward': '99.004', 'reverse': '396.016', 'h...",{'type': 'destroyer'},ship_arg_l_destroyer_01
1,"{'name': 'ship_arg_l_destroyer_01_b_macro', 'c...","{'value': '1000', 'shield': '5000'}","{'missile': '160', 'unit': '10'}",{'max': '111000'},{'level': '2'},{'primary': 'fight'},{'capacity': '36'},{'mass': '235.22'},"{'pitch': '103.378', 'yaw': '103.378', 'roll':...","{'forward': '108.805', 'reverse': '435.22', 'h...",{'type': 'destroyer'},ship_arg_l_destroyer_01


In [75]:
component = ship_data['component']

In [76]:
columns_to_expand = ['macro', 'explosiondamage',	'storage',	'hull',	 'secrecy',	'purpose',	'people',	'physics',	'inertia',	'drag','ship']
for column in columns_to_expand:
    ship_data = pd.concat([ship_data.drop([column], axis = 1), ship_data[column].apply(pd.Series)], axis = 1)   

In [77]:
ship_data = ship_data.drop(['alias',0,'time','countermeasure'],axis = 1)
ship_data['component'] = component
display(ship_data.head(2))

Unnamed: 0,component,name,class,value,shield,missile,unit,max,level,primary,capacity,mass,pitch,yaw,roll,forward,reverse,horizontal,vertical,pitch.1,yaw.1,roll.1,type
0,ship_arg_l_destroyer_01,ship_arg_l_destroyer_01_a_macro,ship_l,1000,5000,160,10,93000,2,fight,44,196.016,96.271,96.271,77.016,99.004,396.016,73.005,73.005,106.203,106.203,106.203,destroyer
1,ship_arg_l_destroyer_01,ship_arg_l_destroyer_01_b_macro,ship_l,1000,5000,160,10,111000,2,fight,36,235.22,103.378,103.378,82.702,108.805,435.22,87.605,87.605,114.044,114.044,114.044,destroyer


Renaiming columns:

In [78]:
column_names = ['component','filename','ship_class','explosiondamage_shield',
                'explosiondamage_value','storage_missile','storage_unit','hull',
                'secrecy_level','purpose','people_capacity','ship_mass',
                'inertia_pitch', 'inertia_roll', 'inertia_yaw',
                'drag_forward', 'drag_reverse', 'drag_horizontal','drag_vertical',
                'drag_pitch','drag_yaw', 'drag_roll', 'ship_type']
ship_data.columns = column_names

Add the column with ship names to the table:

In [79]:
# Check for equal length of list before joining:
if len(ship_data) == len(ship_names):
    ship_data['name'] = ship_names
    #Do some rearranging of columns' order
    cols = ship_data.columns.to_list()
    cols = cols[-1:] + cols[:-1]
    ship_data_new = ship_data[cols]
else:
    print('Длины не равны')

Adding name id to the table:

In [80]:
ship_data['name_id'] = names_ids_list

Display first 5 rows of the newly created table:

In [81]:
display(ship_data.head())
print(ship_data.info())

Unnamed: 0,component,filename,ship_class,explosiondamage_shield,explosiondamage_value,storage_missile,storage_unit,hull,secrecy_level,purpose,people_capacity,ship_mass,inertia_pitch,inertia_roll,inertia_yaw,drag_forward,drag_reverse,drag_horizontal,drag_vertical,drag_pitch,drag_yaw,drag_roll,ship_type,name,name_id
0,ship_arg_l_destroyer_01,ship_arg_l_destroyer_01_a_macro,ship_l,1000,5000,160,10,93000,2,fight,44,196.016,96.271,96.271,77.016,99.004,396.016,73.005,73.005,106.203,106.203,106.203,destroyer,Behemoth Vanguard,11002
1,ship_arg_l_destroyer_01,ship_arg_l_destroyer_01_b_macro,ship_l,1000,5000,160,10,111000,2,fight,36,235.22,103.378,103.378,82.702,108.805,435.22,87.605,87.605,114.044,114.044,114.044,destroyer,Behemoth Sentinel,11003
2,ship_arg_l_miner_liquid_01,ship_arg_l_miner_liquid_01_a_macro,ship_l,800,4000,30,10,26000,1,mine,46,205.27,133.749,133.749,106.999,56.738,324.216,126.666,126.666,140.897,140.897,140.897,largeminer,Magnetar Gas Vanguard,11104
3,ship_arg_l_miner_liquid_01,ship_arg_l_miner_liquid_01_b_macro,ship_l,800,4000,30,10,32000,1,mine,38,246.324,147.778,147.778,118.223,62.485,357.059,151.999,151.999,155.677,155.677,155.677,largeminer,Magnetar Gas Sentinel,11105
4,ship_arg_l_miner_solid_01,ship_arg_l_miner_solid_01_a_macro,ship_l,800,4000,30,10,26000,1,mine,46,204.245,132.733,132.733,106.186,56.594,323.396,127.239,127.239,140.528,140.528,140.528,largeminer,Magnetar Mineral Vanguard,11102


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 321 entries, 0 to 320
Data columns (total 25 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   component               321 non-null    object
 1   filename                321 non-null    object
 2   ship_class              321 non-null    object
 3   explosiondamage_shield  166 non-null    object
 4   explosiondamage_value   164 non-null    object
 5   storage_missile         232 non-null    object
 6   storage_unit            106 non-null    object
 7   hull                    321 non-null    object
 8   secrecy_level           318 non-null    object
 9   purpose                 321 non-null    object
 10  people_capacity         243 non-null    object
 11  ship_mass               321 non-null    object
 12  inertia_pitch           320 non-null    object
 13  inertia_roll            320 non-null    object
 14  inertia_yaw             320 non-null    object
 15  drag_f

#### Data clearance <a class="anchor" id="id_2d_2"></a>

Display info about missed data:

In [82]:
missing_values_tab(ship_data)

Dataframe contains 25 columns and 321 strings.

It has  15 columns with missed values


Unnamed: 0,missed_values,%_of_all,data_type
storage_unit,215,66.98,object
explosiondamage_value,157,48.91,object
explosiondamage_shield,155,48.29,object
storage_missile,89,27.73,object
people_capacity,78,24.3,object
secrecy_level,3,0.93,object
inertia_pitch,1,0.31,object
inertia_roll,1,0.31,object
inertia_yaw,1,0.31,object
drag_horizontal,1,0.31,object


Replacing missed data with zeroes because it means that a ship doesnt have that parameter (for example number of units, missile capacity, drones) and changing column types from string data type to numeric:

In [83]:
#Column list to change to float
cols_to_float = ['ship_mass',	'inertia_pitch',	'inertia_yaw',	'inertia_roll',	'drag_forward',
                 'drag_reverse',	'drag_horizontal',	'drag_vertical',	'drag_pitch',	'drag_yaw',	'drag_roll'	]
try:
    for column in cols_to_float:
        ship_data[column] = ship_data[column].fillna(0)
        ship_data[column] = ship_data[column].astype('float')

except BaseException:
    print(column)
#list of columns to change to int
cols_to_int = ['explosiondamage_value',	'explosiondamage_shield',	'storage_missile',
               'storage_unit',	'hull',	'secrecy_level', 'people_capacity']
try:
    for column in cols_to_int:
        ship_data[column] = ship_data[column].fillna(0)        
        ship_data[column] = ship_data[column].astype('int')
except BaseException:
    print(column)

Misses in categorical data i replace with the keyword  "unknown":

In [84]:
ship_data['purpose'] = ship_data['purpose'].fillna('unknown')
ship_data['ship_type'] = ship_data['ship_type'].fillna('unknown')

In [85]:
ship_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 321 entries, 0 to 320
Data columns (total 25 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   component               321 non-null    object 
 1   filename                321 non-null    object 
 2   ship_class              321 non-null    object 
 3   explosiondamage_shield  321 non-null    int32  
 4   explosiondamage_value   321 non-null    int32  
 5   storage_missile         321 non-null    int32  
 6   storage_unit            321 non-null    int32  
 7   hull                    321 non-null    int32  
 8   secrecy_level           321 non-null    int32  
 9   purpose                 321 non-null    object 
 10  people_capacity         321 non-null    int32  
 11  ship_mass               321 non-null    float64
 12  inertia_pitch           321 non-null    float64
 13  inertia_roll            321 non-null    float64
 14  inertia_yaw             321 non-null    fl

### Code to find names from descriptions <a class="anchor" id="id_2e"></a>

In [86]:
##find name
#ex = descriptions_Sobject.find('t', attrs = {'id':1})
ex = descriptions_Sobject.find('page', attrs = {'id':20101, 'title':'Ships'}).find('t', attrs = {'id':10412})
#type(ex.contents)
name = re.search('\D+',ex.text).group()
name = re.split('[^A-Za-z]',name)
name = list(filter(None, name))
new_name = " ".join([ele for ele in name if  ele[0].isupper()])
new_name

''

### Finding data about ship configuration  <a class="anchor" id="id_2f"></a>

The information about the number of shield, engine, weapon slots is stored in files without "macro" in its name:   
h:/Steam/X4/Unpacked/assets/units/size_xl/ship_arg_xl_carrier_01.xml  
```  
 <connection name="con_shieldgen_xl_01" tags="extralarge shield standard ">
				<offset>
					<position x="179.2407" y="140.9466" z="-461.2684"/>
				</offset>
			</connection>  
```

In [87]:
with open(r'h:/Steam/X4/Unpacked/assets/units/size_xl/ship_arg_xl_carrier_01.xml', 'r') as user_file:
    file_contents = user_file.read()
#create beatiful soup tree of xml file      
Sobject = BeautifulSoup(file_contents, features="xml")

In [88]:
tags = Sobject.findAll('connection', attrs = {'tags':"extralarge shield standard "})
print("The number of slots for XL shields: ",len(tags))

The number of slots for XL shields:  3


Count the number of M turrets, engines, shields on Cerberus (argon frigate):

In [89]:
tags="turret medium standard platformcollision unhittable  combat " # m turrets attribute
tags_e = "engine medium  standard " # engines attribute
tags_s="medium shield unhittable platformcollision standard " #medium shields attribute (on M ships, not on M turrets of L?XL ships)
with open(r'h:\Steam\X4\Unpacked\assets\units\size_m\ship_arg_m_frigate_01.xml', 'r') as user_file:
    file_contents = user_file.read()
     
Sobject = BeautifulSoup(file_contents, features="xml")
t = Sobject.findAll('connection', attrs = {'tags':tags})
e = Sobject.findAll('connection', attrs = {'tags':tags_e})
s = Sobject.findAll('connection', attrs = {'tags':tags_s})
print("The number of M turrets: ",len(t))
print("The number of engines: ",len(e))
print("The number of shields: ",len(s))

The number of M turrets:  4
The number of engines:  2
The number of shields:  3


Finding the number of engines on argon S fighter:

In [90]:
# finding the name
##find name
#ex = descriptions_Sobject.find('t', attrs = {'id':1})
ex = descriptions_Sobject.find('page', attrs = {'id':20101}).find('t',attrs = {'id':10302})
#type(ex.contents)
name = re.search('\D+',ex.text).group()
name = re.split('[^A-Za-z]',name)
name = list(filter(None, name))
new_name = " ".join([ele for ele in name if  ele[0].isupper()])
# count engines
tag = "engine small platformcollision  standard "
with open(r'h:\Steam\X4\Unpacked\assets\units\size_s\ship_arg_s_fighter_01.xml', 'r') as user_file:
    file_contents = user_file.read()
Sobject = BeautifulSoup(file_contents, features="xml")
t = Sobject.findAll('connection', attrs = {'tags':tag})
print(f'The number of engines on {new_name} is ', len(t))

The number of engines on Nova Vanguard is  2


The algorithm of data gathering:
1. Iterate through folders in ```units_folder``` except for folders xref_parts as they contain information about civilian objects (drones, police, escapepods etc).
2. Find xml files with "ship" in their name.
3. Create beatifulsoup object of each file.    
4. Do counting


With the help of component attribute we can link xml files in situated in '../units/size_s' folder  
```<component name="ship_arg_l_destroyer_01" class="ship_l">```   
with xml files with ship data in '../units/size_s/macros/' folder:    
```<macros>
  <macro name="ship_arg_l_destroyer_01_a_macro" class="ship_l">
    <component ref="ship_arg_l_destroyer_01" />
    ```            
And add data about slots in ship_data table.

List of needed attributes:

In [91]:
# Boron ships
boron_shield_tags = ["boron extralarge shield", # XL shields
                     "boron large shield",  # L shields
                     "boron hittable medium shield", # M shields on turrets of L/XL shields
                     "boron medium shield unhittable", #  M shield on M ships 
                     "boron shield small unhittable" # S shield
                     ]
boron_turret_tags = ["boron combat large missile turret",   #  L turrets              
                     "boron combat hittable medium missile turret", # М turrets
                     "boron combat medium turret unhittable",  # М turrets on M ships
                     "boron large mining turret" # L turrets on boron L solid miner ships
                     ]
boron_engine_tags = ["boron engine extralarge", "boron engine large", "boron engine medium", "boron engine small"]
# Other ships
shield_tags = ["extralarge shield standard", # XL shields
                "large shield  standard",  # L shields
                "medium shield hittable  standard", #M shields on turrets of L/XL shields
                "medium shield unhittable platformcollision standard", # M shield on M ships 
                "small shield unhittable  standard" # S shield
                ]
turret_tags = ["turret large standard missile  combat", #L turrets on  XL/L ships
               "turret medium standard missile hittable  combat", # M turrets on  XL/L ships
                "turret medium standard platformcollision unhittable  combat"  # М turrets on M ships
                "turret large mining standard" #  L turrets on L solid miner ships
               ]
engine_tags = ["engine extralarge standard", # XL engines
               "engine large  standard",  # L engines
                 "engine medium  standard",  # M engines
                 "engine small platformcollision  standard" # S engines
                 ]

In [92]:
ship_config = {} # The dictionary of dictionaries where key is the component name and its value is the dictionary with extracted information
for file in list_of_ships_configs:
    with open(file, 'r') as user_file:
        file_contents = user_file.read()
        #create a beatiful soup object      
        file_Sobject = BeautifulSoup(file_contents, features="xml")
        

        file_data = {} # dictionary for store information about equipment slots
        # Get the component name
        component =  file_Sobject.select_one('component').get('name').strip()
        file_data['class'] = file_Sobject.select_one('component').get('class')
        # Count shield slots
        try:
           # If it is an XL ship
            if  file_data['class'] =='ship_xl':
               #count XL shields
               tags_xls = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*extralarge)(?=.*shield)')})
               file_data['primary_shield_slots'] = len(tags_xls)
               #count M shields 
               tags_xms = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*medium)(?=.*shield)(?=.*hittable)')})
               file_data['turret_shield_slots'] = len(tags_xms)
               # count L turrets               
               tags_xlt = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*large)(?=.*turret)')})
               file_data['l_turret_slots'] = len(tags_xlt)
               #count M turrets
               tags_xmt = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*medium)(?=.*turret)(?=.*hittable)')})
               file_data['m_turret_slots'] = len(tags_xmt)
               #count engines
               tags_xe = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*extralarge)(?=.*engine)')})
               file_data['engines_slots'] = len(tags_xe)
               # Store file_data into ship_config dictionary
               ship_config[component] = file_data
             # in case of L ship
            if  file_data['class'] =='ship_l':
               
               #count L shields
               tags_ls = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*large)(?=.*shield)')})
               file_data['primary_shield_slots'] = len(tags_ls)
                #count M shields
               tags_lms = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*medium)(?=.*shield)(?=.*hittable)')})
               file_data['turret_shield_slots'] = len(tags_lms)
               # count L turrets              
               tags_lt = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*large)(?=.*turret)')})
               file_data['l_turret_slots'] = len(tags_lt)
               #count M turrets
               tags_lmt = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*medium)(?=.*turret)(?=.*hittable)')})
               file_data['m_turret_slots'] = len(tags_lmt)
               #count engines
               tags_le = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*large)(?=.*engine)')})
               file_data['engines_slots'] = len(tags_le)
               ship_config[component] = file_data
               
               # in case of M ships
            if  file_data['class'] =='ship_m':               
                #count M shields
               tags_ms = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*medium)(?=.*shield)(?=.*unhittable)')})
               file_data['primary_shield_slots'] = len(tags_ms) 
               # The number of shields on turrets is zero
               file_data['turret_shield_slots'] = 0
               # the number of L turrets is zero
               file_data['l_turret_slots'] = 0          
               #count M turrets
               tags_mt = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*medium)(?=.*turret)(?=.*unhittable)')})
               file_data['m_turret_slots'] = len(tags_mt)
               #count engines
               tags_me = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*medium)(?=.*engine)')})
               file_data['engines_slots'] = len(tags_me)
               ship_config[component] = file_data
             
             # if it is S ship
            if  file_data['class'] =='ship_s':
               # count shields
               tags_ss = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*small)(?=.*shield).*$')})
               file_data['primary_shield_slots'] = len(tags_ss)               
               #count engines
               tags_se = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*small)(?=.*engine).*$')})
               file_data['engines_slots'] = len(tags_se) 
               # the rest parameters are equal to zero
               file_data['turret_shield_slots'] = 0
               file_data['l_turret_slots'] = 0
               file_data['m_turret_slots'] = 0
               # saving info into main dictionary
               ship_config[component] = file_data
            # if it is a civilian object
            if  file_data['class'] =='ship_xs':                            
               #they have only engines
               tags_se = file_Sobject.findAll('connection', attrs = {'tags':re.compile('^(?=.*small)(?=.*engine).*$')})
               file_data['engines_slots'] = len(tags_se) 
               file_data['primary_shield_slots'] = 0 
               file_data['turret_shield_slots'] = 0
               file_data['l_turret_slots'] = 0
               file_data['m_turret_slots'] = 0
               ship_config[component] = file_data

        except:           
           print('error!!!',file)
#print(ship_config)

Making table from dictionary of dictionaries:

In [93]:
ship_configs_df = pd.DataFrame.from_dict(ship_config,orient = 'index').reset_index().rename(columns = {'index':'component'})
display(ship_configs_df.head())

Unnamed: 0,component,class,primary_shield_slots,turret_shield_slots,l_turret_slots,m_turret_slots,engines_slots
0,ship_arg_l_destroyer_01,ship_l,3,9,2,8,3
1,ship_arg_l_miner_liquid_01,ship_l,2,7,0,6,2
2,ship_arg_l_miner_solid_01,ship_l,2,9,1,6,2
3,ship_arg_l_trans_container_01,ship_l,2,5,0,7,2
4,ship_arg_l_trans_container_02,ship_l,2,5,0,7,2


##### Merging ship_data and ship_configs tables

In [94]:
ship_data = ship_data.merge(ship_configs_df, on = 'component', how = 'left')
# Избавление от дублирующейся колонки
ship_data = ship_data.drop('class', axis =1)
#Пропуски можно заполнить нулями, так как их наличие означает  отсутствие слотов 
ship_data = ship_data.fillna(0)

In [95]:
#Изменение типа колонок на int
cols_to_int = ['primary_shield_slots',	'turret_shield_slots',	'l_turret_slots',	'm_turret_slots',	'engines_slots']
try:
    for column in cols_to_int:
        ship_data[column] = ship_data[column].fillna(0)        
        ship_data[column] = ship_data[column].astype('int')
except BaseException:
    print(column)
ship_data.head()

Unnamed: 0,component,filename,ship_class,explosiondamage_shield,explosiondamage_value,storage_missile,storage_unit,hull,secrecy_level,purpose,people_capacity,ship_mass,inertia_pitch,inertia_roll,inertia_yaw,drag_forward,drag_reverse,drag_horizontal,drag_vertical,drag_pitch,drag_yaw,drag_roll,ship_type,name,name_id,primary_shield_slots,turret_shield_slots,l_turret_slots,m_turret_slots,engines_slots
0,ship_arg_l_destroyer_01,ship_arg_l_destroyer_01_a_macro,ship_l,1000,5000,160,10,93000,2,fight,44,196.016,96.271,96.271,77.016,99.004,396.016,73.005,73.005,106.203,106.203,106.203,destroyer,Behemoth Vanguard,11002,3,9,2,8,3
1,ship_arg_l_destroyer_01,ship_arg_l_destroyer_01_b_macro,ship_l,1000,5000,160,10,111000,2,fight,36,235.22,103.378,103.378,82.702,108.805,435.22,87.605,87.605,114.044,114.044,114.044,destroyer,Behemoth Sentinel,11003,3,9,2,8,3
2,ship_arg_l_miner_liquid_01,ship_arg_l_miner_liquid_01_a_macro,ship_l,800,4000,30,10,26000,1,mine,46,205.27,133.749,133.749,106.999,56.738,324.216,126.666,126.666,140.897,140.897,140.897,largeminer,Magnetar Gas Vanguard,11104,2,7,0,6,2
3,ship_arg_l_miner_liquid_01,ship_arg_l_miner_liquid_01_b_macro,ship_l,800,4000,30,10,32000,1,mine,38,246.324,147.778,147.778,118.223,62.485,357.059,151.999,151.999,155.677,155.677,155.677,largeminer,Magnetar Gas Sentinel,11105,2,7,0,6,2
4,ship_arg_l_miner_solid_01,ship_arg_l_miner_solid_01_a_macro,ship_l,800,4000,30,10,26000,1,mine,46,204.245,132.733,132.733,106.186,56.594,323.396,127.239,127.239,140.528,140.528,140.528,largeminer,Magnetar Mineral Vanguard,11102,2,9,1,6,2


In [96]:
missing_values_tab(ship_data)

Dataframe contains 30 columns and 321 strings.

It has  0 columns with missed values


Unnamed: 0,missed_values,%_of_all,data_type


## Getting data about ship modules <a class="anchor" id="id_3"></a>

#### Shields <a class="anchor" id="id_3a"></a>

Корабль Behemot Vanguard , Id = 11002

page id in descriptions_eng responsible for shield names:    
```<page id="20106" title="Shields" descr="Names and descriptions of ship and station shields" voice="no">```

Example of an shield xml file (shield_arg_l_standard_01_mk1_macro.xml):  
```
<macros>
  <macro name="shield_arg_l_standard_01_mk1_macro" class="shieldgenerator">
    <component ref="shield_arg_l_standard_01_mk1" />
    <properties>
      <identification name="{20106,3004}" basename="{20106,3001}" shortname="{20106,3005}" makerrace="argon" description="{20106,3002}" mk="1" />
      <recharge max="38844" rate="173" delay="0" />
      <hull max="2000" threshold="0.2" />
    </properties>
  </macro>
</macros>```

##### Capturing various shield parameters on the example of Argon L shield mk1

In [97]:
##find name
#ex = descriptions_Sobject.find('t', attrs = {'id':1})
ex = descriptions_Sobject.find('page', attrs = {'id':20106}).find('t',attrs = {'id':3004})
#type(ex.contents)
name = re.search('[^()]+',ex.text).group()
name

'ARG L Shield Generator Mk1'

In [98]:
with open(r'h:\Steam\X4\Unpacked\assets\props\SurfaceElements\macros\shield_arg_l_standard_01_mk1_macro.xml', 'r') as user_file:
    file_contents = user_file.read()
#create beatiful soup object of the file      
argon_shield = BeautifulSoup(file_contents, features="xml")

In [99]:
shield_page = descriptions_Sobject.find('page', attrs = {'id':20106})
shield_data = {} # a dictionary to store gathered info
# get the filename of the shield
shield_data['filename'] = argon_shield.select_one('macro').get('name')+".xml"
#get page_id and id of the shield to find its name in descriptions_eng
temp = argon_shield.select_one('identification').get('name')
name = re.search('[^{}]+',temp).group() # name is stored in figure brackets, to extract only list of numbers i use regular expression
                                        # which instructs extract from attribute name all characters except brackets
name_list = name.split(',')             # split string into the  list of 2 numbers: page id and id
shield_page_id = int(name_list[0]) #page_id
shield_id = int(name_list[1])       #  id щита 
shield_data['page_id'] = shield_page_id  # store value of page id in the dict
shield_data['shield_id'] = shield_id     # store id value
# get info about shield maker
shield_data['maker'] = argon_shield.select_one('identification').get('makerrace')
# get info about shield grade
shield_data['version'] = argon_shield.select_one('identification').get('mk')
shield_data['version'] = int(shield_data['version'])
# get info about shield points
shield_data['shield_value'] = argon_shield.select_one('recharge').get('max')
shield_data['shield_value'] = int(shield_data['shield_value'])
#shield regeneration
shield_data['recharge_rate'] = argon_shield.select_one('recharge').get('rate')
shield_data['recharge_rate'] = int(shield_data['recharge_rate'])
#shield delay
shield_data['recharge_delay'] = argon_shield.select_one('recharge').get('delay')
shield_data['recharge_delay'] = int(shield_data['recharge_delay'])
# shield hitpoints (for  L/XL ships)
try:

    shield_data['hull'] = argon_shield.select_one('hull').get('max')
    shield_data['hull'] = int(shield_data['hull'])
except:
    shield_data['hull'] = 0

# Dictionary of dictionaries, where key - name of the shield from descriptions_eng and its value - dictionary with gathered data
shield_info = {}
try:
    tmp = shield_page.find('t',attrs = {'id':shield_data['shield_id']})
    name = re.search('[^()]+',tmp.text).group()   
    shield_info[name] = shield_data
except:    
    print("Error")
    print(shield_data)
print(shield_data)
print(shield_info)

{'filename': 'shield_arg_l_standard_01_mk1_macro.xml', 'page_id': 20106, 'shield_id': 3004, 'maker': 'argon', 'version': 1, 'shield_value': 38844, 'recharge_rate': 173, 'recharge_delay': 0, 'hull': 2000}
{'ARG L Shield Generator Mk1': {'filename': 'shield_arg_l_standard_01_mk1_macro.xml', 'page_id': 20106, 'shield_id': 3004, 'maker': 'argon', 'version': 1, 'shield_value': 38844, 'recharge_rate': 173, 'recharge_delay': 0, 'hull': 2000}}


The algorithm of getting the data:  
1. Go to the catalog ```cat_shields```
2. Find files with "shield" in their names and store them in a list.
3. Iterate through the list and create  dictionary ```shield_info```.   
4. Make a dataframe from the dictionary.   
As stated on egosoft wiki M shield on turret and engine slots of L/XL ships have the "hull" parameter not equal to null.


##### Create list of shield files

In [100]:
list_of_shield_files = []
# List of catalogs to traverse
shield_folders = [cat_shields,split_shields,avarice_shields,boron_shields,terran_shields ]
for folder in shield_folders:
        for root, dirs, files in os.walk(folder):                
                for file in files:
                        if  file.startswith("shield"):                 
                                #print(os.path.join(root, file))
                                list_of_shield_files.append(os.path.join(root, file))
# Get rid of trash files:
list_of_shield_files.remove(r'h:\Steam\X4\Unpacked\extensions\avarice_dlc\assets\props\surfaceelements\macros\shield_gen_m_yacht_01_mk1_video_macro.xml')
list_of_shield_files.remove(r'h:\Steam\X4\Unpacked\extensions\boron_dlc\assets\props\surfaceelements\macros\shield_bor_s_standard_01_mk1_video_macro.xml')
list_of_shield_files.remove(r'h:\Steam\X4\Unpacked\extensions\boron_dlc\assets\props\surfaceelements\macros\shield_bor_s_standard_01_mk2_video_macro.xml')
list_of_shield_files.remove(r'h:\Steam\X4\Unpacked\extensions\boron_dlc\assets\props\surfaceelements\macros\shield_bor_s_standard_01_mk3_video_macro.xml')
print(len(list_of_shield_files))


82


In [101]:
# Beatiful soup object of the part of descriptions eng where page id = 20106 to get shield names
shield_page = descriptions_Sobject.find('page', attrs = {'id':20106})
shield_info = {} # main dict to store data
count = 0 # file counter
#Iterate through list of shield files:
for file in list_of_shield_files: 
    with open(file, 'r') as user_file:
        file_contents = user_file.read()
        #Create beatiful soup object   for each file   
        shield_Sobject = BeautifulSoup(file_contents, features="xml")
        
         
        shield_data = {} # create an empty dictionary to store data from the open file
        # get filename of the file 
        shield_data['filename'] =  shield_Sobject.select_one('macro').get('name')+".xml"
        #  get page id and id
        try:
            temp =  shield_Sobject.select_one('identification').get('name')
            name = re.search('[^{}]+',temp).group()
            name_list = name.split(',')
            shield_page_id = int(name_list[0]) #page_id
            shield_id = int(name_list[1])       #  id
            shield_data['page_id'] = shield_page_id  # store page_id value
            shield_data['shield_id'] = shield_id     # store id value
        except:
            print('identification name error')
            print(file)
        # get info about shield maker
        shield_data['maker'] =  shield_Sobject.select_one('identification').get('makerrace')
        # get info about shield grade
        try:
            shield_data['version'] =  shield_Sobject.select_one('identification').get('mk')
            shield_data['version'] = int(shield_data['version'])
        except:
            print('version error')
            print(file)
        # get info about shield points
        try:
            shield_data['shield_value'] =  shield_Sobject.select_one('recharge').get('max')
            shield_data['shield_value'] = int(shield_data['shield_value'])
        except:
            print('shield value error')
            print(file)
        #extract shield regeneration value
        try:
            shield_data['recharge_rate'] =  shield_Sobject.select_one('recharge').get('rate')
            shield_data['recharge_rate'] = int(shield_data['recharge_rate'])
        except:
            print('recharge rate error')
            print(file)
        #extract shield delay
        try:
            shield_data['recharge_delay'] =  shield_Sobject.select_one('recharge').get('delay')
            shield_data['recharge_delay'] = float(shield_data['recharge_delay'])
        except:
            print('recharge delay error')
            print(file)
        # get info about shield hitpoints
        try:

            shield_data['hull'] =  shield_Sobject.select_one('hull').get('max')
            shield_data['hull'] = int(shield_data['hull'])
        except:
            shield_data['hull'] = 0
        
        count +=1

# Store shield_data into shield_info under the key of its name got from descriptions_eng

        try:
            tmp = shield_page.find('t',attrs = {'id':shield_data['shield_id']})
            name = re.search('[^()]+',tmp.text).group()   
            shield_info[name] = shield_data
        except:    
            print("Error")
            print(shield_data)
#print(shield_data)
#print(shield_info)
print(len(shield_info))
print( count)

65
82


##### Creating and saving of the shield table

In [102]:
shields_df = pd.DataFrame.from_dict(shield_info,orient = 'index').reset_index().rename(columns = {'index':'name'})
display(shields_df.head())

Unnamed: 0,name,filename,page_id,shield_id,maker,version,shield_value,recharge_rate,recharge_delay,hull
0,ARG L Shield Generator Mk1,shield_arg_l_standard_01_mk1_macro.xml,20106,3004,argon,1,38844,173,0.0,2000
1,ARG L Shield Generator Mk2,shield_arg_l_standard_01_mk2_macro.xml,20106,3044,argon,2,46282,268,0.0,2000
2,ARG M Shield Generator Mk1,shield_arg_m_standard_02_mk1_macro.xml,20106,2004,argon,1,5147,26,0.5,500
3,ARG M Shield Generator Mk2,shield_arg_m_standard_02_mk2_macro.xml,20106,2044,argon,2,6133,41,0.5,500
4,ARG S Shield Generator Mk1,shield_arg_s_standard_01_mk1_macro.xml,20106,1004,argon,1,827,82,12.1,0


In [103]:
#Show info
missing_values_tab(shields_df)
shields_df.info()


Dataframe contains 10 columns and 65 strings.

It has  0 columns with missed values


Unnamed: 0,missed_values,%_of_all,data_type


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65 entries, 0 to 64
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   name            65 non-null     object 
 1   filename        65 non-null     object 
 2   page_id         65 non-null     int64  
 3   shield_id       65 non-null     int64  
 4   maker           65 non-null     object 
 5   version         65 non-null     int64  
 6   shield_value    65 non-null     int64  
 7   recharge_rate   65 non-null     int64  
 8   recharge_delay  65 non-null     float64
 9   hull            65 non-null     int64  
dtypes: float64(1), int64(6), object(3)
memory usage: 5.2+ KB


!!17 Missed files are files with the names like ```shield_***_m_standard_01_mk1_macro``` which were replaced with files with names like```shield_***_m_standard_02_mk1_macro``` containing the same gathered data and different in the parameter "integrated". It caused by separation of M shields on shields of M ships and M shields on L/XL ships. As the parameters i need are the same in the both versions that kind of result is acceptable. 

In [104]:
# checking for errors
display(shields_df.query('maker=="split"'))

Unnamed: 0,name,filename,page_id,shield_id,maker,version,shield_value,recharge_rate,recharge_delay,hull
33,SPL L Shield Generator Mk1,shield_spl_l_standard_01_mk1_macro.xml,20106,3084,split,1,33018,140,0.0,3000
34,SPL L Shield Generator Mk2,shield_spl_l_standard_01_mk2_macro.xml,20106,3094,split,2,39340,217,0.0,3000
35,SPL M Shield Generator Mk1,shield_spl_m_standard_02_mk1_macro.xml,20106,2084,split,1,4375,21,0.36,500
36,SPL M Shield Generator Mk2,shield_spl_m_standard_02_mk2_macro.xml,20106,2094,split,2,5213,33,0.36,500
37,SPL S Shield Generator Mk1,shield_spl_s_standard_01_mk1_macro.xml,20106,1124,split,1,703,67,8.9,0
38,SPL S Shield Generator Mk2,shield_spl_s_standard_01_mk2_macro.xml,20106,1134,split,2,840,103,8.9,0
39,SPL S Shield Generator Mk3,shield_spl_s_standard_01_mk3_macro.xml,20106,1144,split,3,1200,177,8.9,0
40,SPL XL Shield Generator Mk1,shield_spl_xl_standard_01_mk1_macro.xml,20106,4044,split,1,110058,398,0.0,9000


###  Engines and thrusters <a class="anchor" id="id_3b"></a>

#### Engines <a class="anchor" id="id_3b_1"></a>

Engine page in  descriptions_eng:  
```<page id="20107" title="Engines Thrusters" descr="Names and descriptions of ship engines" voice="no">```

Example of file ```engine_arg_l_allround_01_mk1_macro.xml``` with data about argon L allround engine mk1:  
```<macros>
  <macro name="engine_arg_l_allround_01_mk1_macro" class="engine">
    <component ref="engine_arg_l_allround_01_mk1" />
    <properties>
      <identification name="{20107,3004}" basename="{20107,3001}" shortname="{20107,3005}" makerrace="argon" description="{20107,3002}" mk="1" />
      <boost duration="29" thrust="2" attack="10" release="1" />
      <travel charge="20" thrust="31" attack="75" release="22.5" />
      <thrust forward="4206" reverse="4627" />
      <angular />
      <hull max="4033" threshold="0.3" />
      <effects>
        <boosting ref="arg_boost_fx_l" />
      </effects>
      <sounds>
        <enginedetail ref="enginedetail_ship_l_01" />
      </sounds>
    </properties>
  </macro>
</macros>```

Snapshot of values of Behemot Vanguard with 'ARG L All-round Engine Mk1' from the game:  
speed 127     
acceleration 64    
boost speed 255  
travel speed 3951 

Behemot Vanguard without engine equipped:  
acceleration = 10 

In [105]:
# argon destoryer data from ship_data (nothing equipped)
ship_data[ship_data['name_id']=='11002']

Unnamed: 0,component,filename,ship_class,explosiondamage_shield,explosiondamage_value,storage_missile,storage_unit,hull,secrecy_level,purpose,people_capacity,ship_mass,inertia_pitch,inertia_roll,inertia_yaw,drag_forward,drag_reverse,drag_horizontal,drag_vertical,drag_pitch,drag_yaw,drag_roll,ship_type,name,name_id,primary_shield_slots,turret_shield_slots,l_turret_slots,m_turret_slots,engines_slots
0,ship_arg_l_destroyer_01,ship_arg_l_destroyer_01_a_macro,ship_l,1000,5000,160,10,93000,2,fight,44,196.016,96.271,96.271,77.016,99.004,396.016,73.005,73.005,106.203,106.203,106.203,destroyer,Behemoth Vanguard,11002,3,9,2,8,3


In [106]:
##find name
#ex = descriptions_Sobject.find('t', attrs = {'id':1})
ex = descriptions_Sobject.find('page', attrs = {'id':20107}).find('t',attrs = {'id':3004})
#type(ex.contents)
name = re.search('[^()]+',ex.text).group()
name

'ARG L All-round Engine Mk1'

In [107]:
##find name
#ex = descriptions_Sobject.find('t', attrs = {'id':1})
ex = descriptions_Sobject.find('page', attrs = {'id':20107}).find('t',attrs = {'id':3124})
#type(ex.contents)
name = re.search('[^()]+',ex.text).group()
name

'BOR L All-round Engine Mk1'

#### Thrusters <a class="anchor" id="id_3b_2"></a>

Thruster tags from thruster_gen_l_allround_01_mk1_macro.xml:       
```<macros>
  <macro name="thruster_gen_l_allround_01_mk1_macro" class="engine">
    <component ref="thruster_gen_l_allround_01_mk1" />
    <properties>
      <identification name="{20107,12004}" basename="{20107,12001}" shortname="{20107,12005}" description="{20107,12002}" mk="1" type="thrustertypes" />
      <component virtual="1" />
      <thrust strafe="775.774" pitch="666.62" yaw="920.014" roll="717.299" />
      <angular roll="20" pitch="60" />
      <hull integrated="1" />
    </properties>
  </macro>
</macros>```

In [108]:
##find name of thruster from descriptions_eng
ex = descriptions_Sobject.find('page', attrs = {'id':20107}).find('t',attrs = {'id':12004})
name = re.search('[^()]+',ex.text).group()
name

'L All-round Thrusters Mk1'

Behemot Vanguard with L All-round Thrusters Mk1:  
acceleration = 10(m/s)  
strafe acceleration 10.0    
yaw (degrees/sec) = 8.7  
pitch (degrees/sec) = 6.3    
roll (degrees/sec) = 6.8  

:
```  
  <thrust strafe="775.774" 
  pitch="666.62"  
  yaw="920.014"    
  roll="717.299" />
  <angular roll="20" pitch="60" />
  ```

In [109]:
list_of_engine_files = [] #List of needed files with engines
# List of keywords to ignore files with mines, drones, civilian transport, missiles
keywords_to_ignore = ['_xs_','_spacesuit_','_mine_','_missile_']
# list of engine catalogs
engine_folders = [cat_engines,split_engines,avarice_engines,boron_engines,terran_engines ]
for folder in engine_folders:
        for root, dirs, files in os.walk(folder):                
                for file in files:
                        if  (file.startswith("engine_") 
                             and '_video_' not in file 
                             and '_xs_' not in file 
                             and '_spacesuit_' not in file 
                             and '_mine_' not in  file
                             and '_missile_' not in file):                 
                                #print(os.path.join(root, file))
                                list_of_engine_files.append(os.path.join(root, file))
print(len(list_of_engine_files))

132


Example of extracting data from ```engine_bor_l_allround_01_mk1_macro.xml``` about boron L allround engine:

In [110]:
with open(r'h:\Steam\X4\Unpacked\extensions\boron_dlc\assets\props\engines\macros\engine_bor_l_allround_01_mk1_macro.xml', 'r') as user_file:
    file_contents = user_file.read()
#    
engine_Sobject = BeautifulSoup(file_contents, features="xml")

In [111]:
engine_page = descriptions_Sobject.find('page', attrs = {'id':20107})
engine_data = {} #dictionary to store data
# get filename
try:
    engine_data['filename'] = engine_Sobject.select_one('macro').get('name')+".xml"
    engine_data['class'] = engine_Sobject.select_one('macro').get('class')
    #  page_id и id двигателя из descriptions_eng
    temp =  engine_Sobject.select_one('identification').get('name')
    name = re.search('[^{}]+',temp).group()
    name_list = name.split(',')
    engine_page_id = int(name_list[0]) #page_id
    engine_id = int(name_list[1])       #  id  
    engine_data['page_id'] = engine_page_id  # сохранение page_id в словарь
    engine_data['engine_id'] = engine_id     # сохранение id в словарь
    # parameters of boost 
    engine_data['boost_duration'] = engine_Sobject.select_one('boost').get('duration')
    engine_data['boost_thrust'] = engine_Sobject.select_one('boost').get('thrust')
    engine_data['boost_attack'] = engine_Sobject.select_one('boost').get('attack')
    engine_data['boost_release'] = engine_Sobject.select_one('boost').get('release')
    #parameters of travel
    engine_data['travel_charge'] = engine_Sobject.select_one('travel').get('charge')
    engine_data['travel_thrust'] = engine_Sobject.select_one('travel').get('thrust')
    engine_data['travel_attack'] = engine_Sobject.select_one('travel').get('attack')
    engine_data['travel_release'] = engine_Sobject.select_one('travel').get('release')
    # parameters of thrust
    engine_data['thrust_forward'] = engine_Sobject.select_one('thrust').get('forward')
    engine_data['thrust_reverse'] = engine_Sobject.select_one('thrust').get('reverse')
    # engine durability
    engine_data['hull'] = engine_Sobject.select_one('hull').get('max')
except:
    print('error')
print(engine_data)

{'filename': 'engine_bor_l_allround_01_mk1_macro.xml', 'class': 'engine', 'page_id': 0, 'engine_id': 0, 'boost_duration': '30', 'boost_thrust': '2', 'boost_attack': '10', 'boost_release': '1', 'travel_charge': '15', 'travel_thrust': '30', 'travel_attack': '75', 'travel_release': '23', 'thrust_forward': '4500', 'thrust_reverse': '5000', 'hull': '5000'}


Gathering data about all engines

!!! In place of boron engine in the file 'engine_bor_l_allround_01_mk1_macro.xml' the game use 'BOR L All-round Engine Mk1' from the file 'engine_bor_l_travel_01_mk1_macro.xml'.

In [112]:
# get beatiful soup object of descriptions_eng with page id = 20107 
engine_page = descriptions_Sobject.find('page', attrs = {'id':20107})
engine_info = {} # main dictionary to store data
#Iterate through needed files
for file in list_of_engine_files:    
    with open(file, 'r') as user_file:
        file_contents = user_file.read()
        #create beatiful soup object of each file    
        engine_Sobject = BeautifulSoup(file_contents, features="xml") 
        engine_data = {} # create dictionary to store information from the opened file
       # get filename 
        try:
            engine_data['filename'] = engine_Sobject.select_one('macro').get('name')+".xml"
        except:
            print('Error with filename', file)
        # get class engine/thruster
        try:
            engine_data['class'] = engine_Sobject.select_one('macro').get('class')
        except:
            print("Error with class", file)
            #  get page_id и id 
        
        try:
            temp =  engine_Sobject.select_one('identification').get('name')
            name = re.search('[^{}]+',temp).group()
            name_list = name.split(',')
            engine_page_id = int(name_list[0]) #page_id
            engine_id = int(name_list[1])       #  id  
            engine_data['page_id'] = engine_page_id  # store page_id in the dict
            engine_data['engine_id'] = engine_id     # store id value           
            if temp == "{0,0,#'engine_bor_l_allround_01_mk1'}":
                    engine_data['page_id'] = 0  
                    engine_data['engine_id'] = 0                    
        except:
            print("Error with page id", file)               
            # maker
        try:
            engine_data['maker'] = engine_Sobject.select_one('identification').get('makerrace')
        except:
            print("Error with maker", file)
                #version
        try:
            engine_data['version'] = engine_Sobject.select_one('identification').get('mk')
        except:
            print("Error with version", file)
                # parameters of boost
        try: 
            engine_data['boost_duration'] = engine_Sobject.select_one('boost').get('duration')
            engine_data['boost_thrust'] = engine_Sobject.select_one('boost').get('thrust')
            engine_data['boost_attack'] = engine_Sobject.select_one('boost').get('attack')
            engine_data['boost_release'] = engine_Sobject.select_one('boost').get('release')
        except:
            print('Errors with boost parameters', file)
                #parameters of travel
        try:
            engine_data['travel_charge'] = engine_Sobject.select_one('travel').get('charge')
            engine_data['travel_thrust'] = engine_Sobject.select_one('travel').get('thrust')
            engine_data['travel_attack'] = engine_Sobject.select_one('travel').get('attack')
            engine_data['travel_release'] = engine_Sobject.select_one('travel').get('release')
        except:
            print("Errors with travel parameters", file)
            # parameters of thrust
        try:
            engine_data['thrust_forward'] = engine_Sobject.select_one('thrust').get('forward')
            engine_data['thrust_reverse'] = engine_Sobject.select_one('thrust').get('reverse')
        except:
            print("Errors with thrust parameters", file)           
            # store "engine_data" dictionary into "engine_info" dictionary under the key of its name got from descriptions_eng
        try:
            if engine_data['page_id']==0 and engine_data['engine_id']==0:  # в случае {0,0,#'engine_bor_l_allround_01_mk1'}
                continue
            elif engine_data['engine_id']==3124:
                name = 'BOR L All-round Engine Mk1'
                engine_info[name] = engine_data 

            else:        
                tmp = engine_page.find('t',attrs = {'id':engine_data['engine_id']})
                name = re.search('[^()]+',tmp.text).group()   
                engine_info[name] = engine_data           
        except:
            print("Errors while assigning to key in engine_info", file)
            
print(len(engine_info))

130


Creating the table of engines:

In [113]:
engines_df = pd.DataFrame.from_dict(engine_info,orient = 'index').reset_index().rename(columns = {'index':'name'})
display(engines_df.head())

Unnamed: 0,name,filename,class,page_id,engine_id,maker,version,boost_duration,boost_thrust,boost_attack,boost_release,travel_charge,travel_thrust,travel_attack,travel_release,thrust_forward,thrust_reverse
0,ARG L All-round Engine Mk1,engine_arg_l_allround_01_mk1_macro.xml,engine,20107,3004,argon,1,29,2,10.0,1,20,31,75,22.5,4206,4627
1,ARG L Travel Engine Mk1,engine_arg_l_travel_01_mk1_macro.xml,engine,20107,3044,argon,1,26,2,10.0,1,20,33,85,37.5,4006,3605
2,ARG M All-round Engine Mk1,engine_arg_m_allround_01_mk1_macro.xml,engine,20107,2004,argon,1,7,8,0.25,1,1,9,30,20.0,1002,952
3,ARG M All-round Engine Mk2,engine_arg_m_allround_01_mk2_macro.xml,engine,20107,2044,argon,2,7,8,0.25,1,1,9,30,20.0,1212,1228
4,ARG M All-round Engine Mk3,engine_arg_m_allround_01_mk3_macro.xml,engine,20107,2084,argon,3,7,8,0.25,1,1,9,30,20.0,1353,1413


In [114]:
engines_df.query('maker=="boron"')

Unnamed: 0,name,filename,class,page_id,engine_id,maker,version,boost_duration,boost_thrust,boost_attack,boost_release,travel_charge,travel_thrust,travel_attack,travel_release,thrust_forward,thrust_reverse
100,BOR L All-round Engine Mk1,engine_bor_l_travel_01_mk1_macro.xml,engine,20107,3124,boron,1,34,2.0,14.0,0.04,0,45,178.5,1.25,3004,2704
101,BOR M All-round Engine Mk1,engine_bor_m_allround_01_mk1_macro.xml,engine,20107,2814,boron,1,9,11.2,0.35,0.04,0,13,42.0,1.0,751,714
102,BOR M All-round Engine Mk2,engine_bor_m_allround_01_mk2_macro.xml,engine,20107,2824,boron,2,9,11.2,0.35,0.04,0,13,42.0,1.0,909,921
103,BOR M All-round Engine Mk3,engine_bor_m_allround_01_mk3_macro.xml,engine,20107,2834,boron,3,9,11.2,0.35,0.04,0,13,42.0,1.0,1015,1060
104,BOR S All-round Engine Mk1,engine_bor_s_allround_01_mk1_macro.xml,engine,20107,1804,boron,1,9,11.2,0.35,0.04,0,20,42.0,1.0,297,312
105,BOR S All-round Engine Mk2,engine_bor_s_allround_01_mk2_macro.xml,engine,20107,1814,boron,2,9,11.2,0.35,0.04,0,20,42.0,1.0,359,430
106,BOR S All-round Engine Mk3,engine_bor_s_allround_01_mk3_macro.xml,engine,20107,1824,boron,3,9,11.2,0.35,0.04,0,20,42.0,1.0,401,509
107,BOR XL All-round Engine Mk1,engine_bor_xl_travel_01_mk1_macro.xml,engine,20107,4124,boron,1,34,2.0,14.0,0.04,0,45,178.5,1.25,7912,7120


Checking for presence of duplicate names:

In [115]:
display(engines_df.groupby('name')['name'].value_counts())

name
ARG L All-round Engine Mk1     1
ARG L Travel Engine Mk1        1
ARG M All-round Engine Mk1     1
ARG M All-round Engine Mk2     1
ARG M All-round Engine Mk3     1
                              ..
XEN L All-round Engine Mk1     1
XEN M Combat Engine Mk1        1
XEN M Travel Engine Mk1        1
XEN S Combat Engine Mk1        1
XEN XL All-round Engine Mk1    1
Name: count, Length: 130, dtype: int64

Checking for missed data:

In [116]:
missing_values_tab(engines_df)
engines_df.info()

Dataframe contains 17 columns and 130 strings.

It has  0 columns with missed values


Unnamed: 0,missed_values,%_of_all,data_type


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130 entries, 0 to 129
Data columns (total 17 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   name            130 non-null    object
 1   filename        130 non-null    object
 2   class           130 non-null    object
 3   page_id         130 non-null    int64 
 4   engine_id       130 non-null    int64 
 5   maker           130 non-null    object
 6   version         130 non-null    object
 7   boost_duration  130 non-null    object
 8   boost_thrust    130 non-null    object
 9   boost_attack    130 non-null    object
 10  boost_release   130 non-null    object
 11  travel_charge   130 non-null    object
 12  travel_thrust   130 non-null    object
 13  travel_attack   130 non-null    object
 14  travel_release  130 non-null    object
 15  thrust_forward  130 non-null    object
 16  thrust_reverse  130 non-null    object
dtypes: int64(2), object(15)
memory usage: 17.4+ KB


Change type of columns from string to numeric:

In [117]:
#list of columns to change to int
cols_to_int = ['boost_duration','travel_charge','travel_thrust','thrust_forward', 'thrust_reverse']
#list of cilumns to change to float
cols_to_float = ['boost_thrust','boost_attack','boost_release', 'travel_attack', 'travel_release']
try:
    for column in cols_to_int:               
        engines_df[column] = engines_df[column].astype('int')
    for column in cols_to_float:               
        engines_df[column] = engines_df[column].astype('float')
except BaseException:
    print(column)
engines_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130 entries, 0 to 129
Data columns (total 17 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   name            130 non-null    object 
 1   filename        130 non-null    object 
 2   class           130 non-null    object 
 3   page_id         130 non-null    int64  
 4   engine_id       130 non-null    int64  
 5   maker           130 non-null    object 
 6   version         130 non-null    object 
 7   boost_duration  130 non-null    int32  
 8   boost_thrust    130 non-null    float64
 9   boost_attack    130 non-null    float64
 10  boost_release   130 non-null    float64
 11  travel_charge   130 non-null    int32  
 12  travel_thrust   130 non-null    int32  
 13  travel_attack   130 non-null    float64
 14  travel_release  130 non-null    float64
 15  thrust_forward  130 non-null    int32  
 16  thrust_reverse  130 non-null    int32  
dtypes: float64(5), int32(5), int64(2), 

## Saving tables <a class="anchor" id="id_4"></a>

Saving the table with ship data

In [118]:
#Make a save copy of the table
saveCopy_df = ship_data.copy(deep = True)
#save to csv format
saveCopy_df.to_csv(save_folder + r'\raw_X4data.csv', index=False)
#save to excel format
saveCopy_df.to_excel(save_folder + r'\raw_X4data.xlsx',  sheet_name='ship_data',index=False)

Saving the table with shield data

In [119]:
#Make a save copy of the table
shields_saveCopy_df = shields_df.copy(deep = True)
#saving data to csv file
shields_saveCopy_df.to_csv(save_folder + r'\shields_X4data.csv', index=False)
#saving to excel format
shields_saveCopy_df.to_excel(save_folder + r'\shields_X4data.xlsx',  sheet_name='shield_data',index=False)

Saving the table with engine data

In [120]:
#Make a save copy of the table
engines_saveCopy_df = engines_df.copy(deep = True)
#saving data to csv file
engines_saveCopy_df.to_csv(save_folder + r'\engines_X4data.csv', index=False)
#saving to excel format
engines_saveCopy_df.to_excel(save_folder + r'\engines_X4data.xlsx',  sheet_name='engine_data',index=False)