Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loading keygroups in cstrike/resource crashes the server #379

Closed
kamikazekuh opened this issue Jan 22, 2021 · 11 comments
Closed

Loading keygroups in cstrike/resource crashes the server #379

kamikazekuh opened this issue Jan 22, 2021 · 11 comments
Labels

Comments

@kamikazekuh
Copy link
Contributor

Using this code to load the cstrike_english.txt keygroup crashes the server on windows

from core import GAME_PATH
from _keyvalues import KeyValues

messages = KeyValues.load_from_file(GAME_PATH / 'resource' / 'cstrike_english.txt')
@jordanbriere
Copy link
Contributor

jordanbriere commented Jan 23, 2021

The issue appears to be the encoding of those files causing an access violation when the engine parses them (by failing to decode them and internally handling a NULL pointer). If I change the encoding of that file to UTF-8, it works as intended. Same if I export LoadFromBuffer and parse it as UTF-16:

kv = KeyValues('lang')
with open(GAME_PATH / 'resource' / 'cstrike_english.txt', encoding='utf-16') as f:
    kv.load_from_buffer(f.read())
print(kv)

@Ayuto
Copy link
Member

Ayuto commented Jan 31, 2021

There also seems to be an issue with parsing that file:

from keyvalues import KeyValues
from paths import GAME_PATH

kv = KeyValues('lang')
with open(GAME_PATH / 'resource' / 'cstrike_english.txt', encoding='utf-16') as f:
    kv.load_from_buffer2(f.read())
    
print(kv.as_dict())
[SP] Caught an Exception:
Traceback (most recent call last):
  File "..\addons\source-python\packages\source-python\plugins\command.py", line 164, in load_plugin
    plugin = self.manager.load(plugin_name)
  File "..\addons\source-python\packages\source-python\plugins\manager.py", line 207, in load
    plugin._load()
  File "..\addons\source-python\packages\source-python\plugins\instance.py", line 74, in _load
    self.module = import_module(self.import_name)
  File "..\addons\source-python\plugins\test7\test7.py", line 8, in <module>
    print(kv.as_dict())

KeyError: "Key 'Radio Responses/Reports\n\n1. \\' does not exist."

@jordanbriere
Copy link
Contributor

jordanbriere commented Jan 31, 2021 via email

@Ayuto
Copy link
Member

Ayuto commented Jan 31, 2021

Ah yes, that solves the problem. Thanks!

@Ayuto Ayuto added the bug label Feb 6, 2021
@Ayuto
Copy link
Member

Ayuto commented Feb 9, 2021

Are we going to change anything or do we keep it like it is?

@jordanbriere
Copy link
Contributor

jordanbriere commented Feb 9, 2021

Are we going to change anything or do we keep it like it is?

I guess we could simply use Python to parse the file. Perhaps something like this:

.def("load_from_file",
	raw_function(&KeyValuesExt::LoadFromFile, 1),
	"Load KeyValues data from a file and return a new KeyValues instance on success."
).staticmethod("load_from_file")

.def("load_from_file2",
	raw_method(&KeyValuesExt::LoadFromFile2, 1),
	"Load KeyValues data from a file into an existing KeyValues instance."
)
static boost::shared_ptr<KeyValues> LoadFromFile(boost::python::tuple args, dict kwargs)
{ 
	KeyValues* pKeyValues = new KeyValues("");
	pKeyValues->UsesEscapeSequences(extract<bool>(kwargs.attr("pop")("uses_escape_sequences", object(false))));
	if (!LoadFromFile2(pKeyValues, args, kwargs)) {
		pKeyValues->deleteThis();
		BOOST_RAISE_EXCEPTION(PyExc_ValueError, "Failed to load from file.")
		return NULL;
	}
	return boost::shared_ptr<KeyValues>(pKeyValues, &__del__);
}

static bool LoadFromFile2(KeyValues* pKeyValues, boost::python::tuple args, dict kwargs)
{ 
	static object io_open = import("io").attr("open");
	object f = io_open(*args, **kwargs);
	bool result = LoadFromBuffer2(pKeyValues, extract<const char *>(f.attr("read")()));
	f.attr("close")();
	return result;
}

Now one could do something like this:

# Raises an encoding error, but ignore it and produce an empty keyvalues
kv = KeyValues.load_from_file(path, errors='ignore', encoding='utf-8')

# Works
kv = KeyValues.load_from_file(path, encoding='utf-16', uses_escape_sequences=True)

# Engine produce a parse error instead of crashing:
#   KeyValues Error: LoadFromBuffer: missing { in file
kv = KeyValues.load_from_file(path)

Simple yet efficient! Only downside, would probably be the path resolution (e.g. resource/cstrike_english.txt would no longer resolves). Perhaps there's a way to resolve relative paths from the filesystem but haven't looked into that.

Feel free to improve/commit/whatever if you are fine with something like that.

@Ayuto
Copy link
Member

Ayuto commented Feb 10, 2021

Goo idea! But we have to remember that resource files could also be located in VPK files. Maybe our SourceFile class can help? That should also fix resolving the paths.

@jordanbriere
Copy link
Contributor

jordanbriere commented Feb 10, 2021

Goo idea! But we have to remember that resource files could also be located in VPK files. Maybe our SourceFile class can help? That should also fix resolving the paths.

Ran a quick test, and yes, it seems to do the trick:

.def("load_from_file",
	&KeyValuesExt::LoadFromFile,
	"Load KeyValues data from a file and return a new KeyValues instance on success.",
	(arg("file_name"), arg("encoding")="utf-8", arg("errors")="strict", arg("uses_escape_sequences")=false)
).staticmethod("load_from_file")

.def("load_from_file2",
	&KeyValuesExt::LoadFromFile2,
	"Load KeyValues data from a file into an existing KeyValues instance.",
	(arg("file_name"), arg("encoding")="utf-8", arg("errors")="strict")
)
static boost::shared_ptr<KeyValues> LoadFromFile(const char *szFile, const char *szEncoding, const char *szErrors, bool bUsesEscapeSequences)
{
	KeyValues* pKeyValues = new KeyValues("");
	pKeyValues->UsesEscapeSequences(bUsesEscapeSequences);

	try {
		LoadFromFile2(pKeyValues, szFile, szEncoding, szErrors);
	}
	catch (error_already_set &) {
		pKeyValues->deleteThis();
		throw_error_already_set();
	}

	return boost::shared_ptr<KeyValues>(pKeyValues, &__del__);
}

static bool LoadFromFile2(KeyValues* pKeyValues, const char *szFile, const char *szEncoding, const char *szErrors)
{
	SourceFile *pFile = SourceFile::Open(szFile, "rb");

	bool bResult;
	try {
		object content = object(handle<>(pFile->Read(-1))).attr("decode")(szEncoding, szErrors);
		bResult = KeyValuesExt::LoadFromBuffer2(pKeyValues, extract<const char *>(content));
	}
	catch (error_already_set &) {
		bResult = false;
	}

	pFile->Close();
	delete pFile;

	if (PyErr_Occurred())
		throw_error_already_set();

	return bResult;
}
from keyvalues import KeyValues

path = 'resource/cstrike_english.txt'

# UnicodeDecodeError
kv = KeyValues.load_from_file(path)

# {}
kv = KeyValues.load_from_file(path, errors='ignore')
print(kv.as_dict())

# KeyError: "Key 'Radio Responses/Reports\r\n\r\n1. \\' does not exist."
kv = KeyValues.load_from_file(path, 'utf-16')
print(kv.as_dict())

# Works
kv = KeyValues.load_from_file(path, encoding='utf-16', uses_escape_sequences=True)
print(kv.as_dict())

Also, instead of the 2 naming perhaps we could do:

KeyValues.from_<file/buffer> # Get a new KeyValues
keyvalues.load_from_<file/buffer> # Load into existing self

@Ayuto
Copy link
Member

Ayuto commented Feb 10, 2021

That looks good! Though, it's a little bit slow, so I tried using PyUnicode_Decode and the filesystem directly, but didn't notice any speed improvements. So, the call back to Python doesn't seem to be the costly part. However, it's not like you need to load thousands of files every second, so it's probably fine as it is.

Also, instead of the 2 naming perhaps we could do:

Hmm, that doesn't really explain the difference to me. I would rather use something like this:

KeyValues.load_from_<file/buffer> # Get a new KeyValues
keyvalues.load_from_<file/buffer>_in_place # Load into existing self

@jordanbriere
Copy link
Contributor

That looks good! Though, it's a little bit slow, so I tried using PyUnicode_Decode and the filesystem directly, but didn't notice any speed improvements. So, the call back to Python doesn't seem to be the costly part. However, it's not like you need to load thousands of files every second, so it's probably fine as it is.

I used decode mainly so that boost manage the reference. Though, the costly part is likely related to the size of that file (thousands of lines dissected/tokenized into keys/values).

Hmm, that doesn't really explain the difference to me. I would rather use something like this:

KeyValues.load_from_<file/buffer> # Get a new KeyValues
keyvalues.load_from_<file/buffer>_in_place # Load into existing self

I'm fine with whatever. I have a personal preference for KeyValues.from_file though, since it is consistent with other static constructors (Entity.from_inthandle, Player.from_userid, etc.) and read well as "returns a KeyValues from file" but I'm fine with either regardless. Nothing can really be worst than the current 2 to me. 😄

@Ayuto
Copy link
Member

Ayuto commented Feb 10, 2021

'm fine with whatever. I have a personal preference for KeyValues.from_file though, since it is consistent with other static constructors (Entity.from_inthandle, Player.from_userid, etc.) and read well as "returns a KeyValues from file" but I'm fine with either regardless. Nothing can really be worst than the current 2 to me. 😄

That's true! In that case we should discard the load_ from their names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants