Code Injection #400

Closed
0x96e63 opened this issue May 20, 2023 · 3 comments

0x96e63 commented May 20, 2023

Is there any way to prevent code injection when using the search function?

I was also wondering whether malicious code could be injected to modify the data set.

Thanks.

Contributor

ch2ohch2oh commented Oct 10, 2023

I don't think so. If Solr has a code injection vulnerability, people can still send maliciously constructed requests to exploit it via curl, etc., and pysolr cannot help with that.


rmayer-sst commented May 9, 2024

Releases before Jan 2021 have an injection vulnerability because they do not correctly escape their parameters. For example:

self.solr.delete(q='id:*</query><query> id:999 AND id:9999')

This should not delete all documents.
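
To illustrate the general class of bug, here is a minimal sketch of what happens when a value is interpolated into a Solr XML delete message without escaping. This is not pysolr's actual code (I haven't checked how the affected releases build the message); the function names are hypothetical, and only the `<delete><query>...</query></delete>` message shape is taken from Solr's XML update format.

```python
import xml.etree.ElementTree as ET
from typing import List
from xml.sax.saxutils import escape as xml_escape


def delete_message_unescaped(query: str) -> str:
    # Naive interpolation: any markup inside `query` becomes part of the message.
    return f"<delete><query>{query}</query></delete>"


def delete_message_escaped(query: str) -> str:
    # Escaping &, < and > keeps the whole value inside a single <query> element.
    return f"<delete><query>{xml_escape(query)}</query></delete>"


def queries_in(message: str) -> List[str]:
    # Parse the message as an XML consumer would and list each <query> text.
    return [q.text or "" for q in ET.fromstring(message).iter("query")]


payload = "id:*</query><query> id:999 AND id:9999"
injected = queries_in(delete_message_unescaped(payload))
contained = queries_in(delete_message_escaped(payload))
```

In the unescaped case the parser sees two separate queries, the first of which is `id:*`; in the escaped case it sees a single query whose text is the original string.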

acdha added the Q&A label May 9, 2024
Collaborator

acdha commented May 9, 2024

This is a somewhat complicated topic which pysolr can't easily help with. The problem is that the classic Solr syntax has a variety of features, and pysolr doesn't know what context you're escaping things in, how you have configured Solr, which query parser you're using, or what features you want to expose to your users. For example, do you want to support boolean searches like "apples -bananas" by allowing the user to enter that minus sign directly, or does your application have a higher-level interface to express that concept, so that any hyphens in user-entered data should be treated as literal values? Do you let users control whether quotes are used to group words into phrases? You really have to decide what is allowed for your public interface and validate that at input rather than trying to escape it on the backend.
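
As a rough sketch of what validating at input could look like, assuming a made-up policy that allows word characters, whitespace, paired double quotes for phrases, and a minus sign for exclusion (the names `ALLOWED_INPUT_RE`, `MAX_QUERY_LENGTH` and `validate_search_input` are hypothetical, not part of pysolr):

```python
import re

# Hypothetical policy: word characters, whitespace, double quotes for phrases,
# and a minus sign for exclusion. Anything else is rejected outright.
ALLOWED_INPUT_RE = re.compile(r'[\w\s"-]+')
MAX_QUERY_LENGTH = 200  # arbitrary limit chosen for this sketch


def validate_search_input(raw: str) -> str:
    """Return the trimmed input if it fits the policy, else raise ValueError."""
    text = raw.strip()
    if not text or len(text) > MAX_QUERY_LENGTH:
        raise ValueError("Search input is empty or too long")
    if ALLOWED_INPUT_RE.fullmatch(text) is None:
        raise ValueError("Search input contains unsupported characters")
    if text.count('"') % 2 != 0:
        raise ValueError("Unpaired quotes are not allowed")
    return text
```

With this policy, `apples -bananas` and `"red apple"` pass through, while something like `title:foo` is rejected before it ever reaches Solr. Your own allowlist will depend on which features you decide to expose.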

django-haystack has a simple clean() method, but it's important to remember that Haystack operates in a context where it's using a Django ORM-style interface for all of the complicated features. Complex queries are generally constructed using chained methods which do not expose the Solr syntax directly, so there's no question about the distinction between the query syntax and the values. For example, in sqs.filter(title=x).exclude(subject=y) you know you can fully escape x and y to be valid values on the right side of Solr's field:value syntax.
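
The same idea can be sketched without Haystack: if your code builds the field:value structure itself, the user-supplied part is only ever a value and can be escaped wholesale. This is not Haystack's implementation; `ALLOWED_FIELDS`, `quote_value` and `field_filter` are hypothetical names, and it assumes the standard query parser, where (as far as I understand) only backslashes and double quotes need escaping inside a double-quoted phrase. Note that quoting also turns the value into a phrase match.

```python
ALLOWED_FIELDS = {"title", "subject"}  # hypothetical schema fields


def quote_value(value: str) -> str:
    # Assumption: inside a quoted phrase only \ and " need escaping.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def field_filter(field: str, value: str, negate: bool = False) -> str:
    # Field names come from our own code, so check them against an allowlist.
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"Unknown field: {field!r}")
    clause = f"{field}:{quote_value(value)}"
    return f"-{clause}" if negate else clause


# A hostile value stays inside the quoted phrase instead of adding clauses.
fq = [
    field_filter("title", 'x" OR *:*'),
    field_filter("subject", "y", negate=True),
]
```

The resulting strings could then be passed as filter queries, for example via the `fq` parameter of a search request.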

In a different project, I use this code, customized so that users can use quotes to search for phrases. It simply rejects unpaired quotes because we don't need to support someone searching for literal quote characters:

import re

# Source: https://solr.apache.org/guide/8_11/the-standard-query-parser.html#escaping-special-characters
# This is modified to allow the use of quotes for phrases with a check that
# they're paired (see escape).
SOLR_ESCAPE_RE = re.compile(
    r"""
    (
        [&]{2}|
        [|]{2}|
        [\\\+!(){}[\]^~:/]|
        \b-
    )
    """,
    flags=re.VERBOSE,
)


def escape(search_value: str) -> str:
    """
    Escape a user-provided value suitably for use as a Solr query term's value
    """

    # Matching unpaired quotes is hard without a regex engine which supports
    # variable-width negative lookbehind expressions. Since we don't have any
    # reason to support unpaired quotes in normal usage, we simply confirm
    # that quotes are paired:
    if search_value.count('"') % 2 != 0:
        raise ValueError("Unpaired quotes are not allowed")

    search_value = re.sub(r"\s+", " ", search_value)

    return SOLR_ESCAPE_RE.sub(r"\\\1", search_value)

acdha closed this as completed May 9, 2024