Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate that a meta_value is a recognizable date value before storing #1703

Merged
merged 11 commits into from Apr 23, 2021

Conversation

moraleida
Copy link
Contributor

@moraleida moraleida commented Mar 17, 2020

Description of the Change

Introduce proper date validation for strings added to post-meta. Right now, any string passed in as a meta value to a Post, User or Term is first converted to a timestamp and then manipulated to store date, datetime and time values.

This change strenghtens the logic around that check by moving from a simple strtotime conversion to creating a proper DateTime object. If PHP is able to initialize a DateTime object from $meta_value, then we know the manipulations will be successful. If not, catch the Exception thrown and move on, as this is not a valid date.

Cherry on the top, we move away from doing timezone calculations for each value, as we initialize the object with the correct timezone. This will save us some microseconds in processing.

Fixes #1556

Verification Process

This was tested by saving custom fields with values matching PHP Supported Date and Time Formats and verifying that all valid formats got properly processed in Elasticsearch, receiving the date, datetime and time keys, as opposed to invalid formats, which were saved as simple strings ( getting value and raw fields only ).

Checklist:

  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my change.
  • All new and existing tests passed.

Applicable Issues

Changelog Entry

Ricardo Moraleida added 2 commits March 16, 2020 23:42
…In turn, only store date, datetime, time values if they are meaningful
@moraleida
Copy link
Contributor Author

@tlovett1 this is ready for review

@tlovett1 tlovett1 added this to the 3.4.1 milestone Mar 17, 2020
@tlovett1 tlovett1 self-requested a review March 17, 2020 15:45
tlovett1
tlovett1 previously approved these changes Mar 17, 2020
@tlovett1
Copy link
Member

@moraleida were you able to reproduce the reported bug in fixing this?

@moraleida
Copy link
Contributor Author

moraleida commented Mar 17, 2020

@moraleida were you able to reproduce the reported bug in fixing this?

@tlovett1 I'm able to reproduce the same results in storage, but none of the indexing errors. That may or may not be related to versions/settings in the ES cluster. See the results below:

these were stored using simple custom_fields in WP 5.3.2 and ElasticPress develop, on a cluster running Elasticsearch 5.6.16. For all the results below that have date values, those dates are valid in PHP. I'll do some more testing with different mappings and ES versions. I can see a scenario in which a mapped date field, with a strict format specified, might yield errors.

    "meta": {
      "date_today": [
        {
          "value": "today",
          "raw": "today",
          "boolean": false,
          "date": "2020-03-17",
          "datetime": "2020-03-17 00:00:00",
          "time": "00:00:00"
        }
      ],
      "date_in_year": [
        {
          "value": "+1 year",
          "raw": "+1 year",
          "boolean": false,
          "date": "2021-03-17",
          "datetime": "2021-03-17 17:03:17",
          "time": "17:03:17"
        }
      ],
      "non-date": [
        {
          "value": "ohmybbq",
          "raw": "ohmybbq",
          "boolean": false
        }
      ],
      "float": [
        {
          "value": "20.000001",
          "raw": "20.000001",
          "long": 20,
          "double": 20.000001,
          "boolean": false,
          "date": "0001-03-17",
          "datetime": "0001-03-17 20:00:00",
          "time": "20:00:00"
        }
      ],
      "invalid": [
        {
          "value": "9000-30-20",
          "raw": "9000-30-20",
          "boolean": false
        }
      ]
    },

If we want to me more strict about dates we would have to make decisions around which dates EP will consider valid.

@tlovett1
Copy link
Member

Do we know what dates Elasticsearch considers invalid?

@tlovett1 tlovett1 self-requested a review March 24, 2020 14:13
@tlovett1
Copy link
Member

@moraleida in theory this looks good. My only thought is currently we are storing some default values even if $meta_value is not a date/time. Your PR changes that. Was that intentional? What is the reasoning?

@tlovett1 tlovett1 removed this from the 3.4.1 milestone Mar 31, 2020
@moraleida
Copy link
Contributor Author

@tlovett1 this is ready for testing again. The default dates are back and will be used whenever the meta_value is a string that can't be parsed into a proper date.

@Rahmon
Copy link
Contributor

Rahmon commented Mar 22, 2021

Hi @moraleida . I pushed some commits to your branch to update the tests and make some changes.

Looking at the code, only meta values that are string are checked and have meta.date, meta.datetime, and meta.time. I would like to confirm if this behavior is expected.

cc: @brandwaffle

@mckdemps mckdemps removed the request for review from tlovett1 April 16, 2021 17:37
@oscarssanchez
Copy link
Contributor

Hi @Rahmon can you let me know how were you testing?

I did the following:

1.- Install ACF
2.- Create two new fields called prueba1 and prueba2, one is set to save the data as string and the other one as a number.
3.- Create a post and input some random number
4.- Reindex
5.- Check debug bar:

                        "prueba1": [
                            {
                                "value": "4443",
                                "raw": "4443",
                                "long": 4443,
                                "double": 4443,
                                "boolean": false,
                                "date": "4443-04-21",
                                "datetime": "4443-04-21 22:30:50",
                                "time": "22:30:50"
                            }
                        ],
                        "prueba2": [
                            {
                                "value": "4443",
                                "raw": "4443",
                                "long": 4443,
                                "double": 4443,
                                "boolean": false,
                                "date": "4443-04-21",
                                "datetime": "4443-04-21 22:30:50",
                                "time": "22:30:50"

Shouldn't this solve #1556 (comment) ? I want to make sure I am testing correctly.

@Rahmon
Copy link
Contributor

Rahmon commented Apr 22, 2021

@oscarssanchez I've tested in a similar way.

The PR tests if a date is valid here $new_date = new \DateTime( $meta_value, \wp_timezone() );. However, values like '4443' or any that follows the pattern [0-9]{4} is considered valid by PHP. So, an exception is no throwned.

Regarding #1556 (comment), I've tested and the date and datetime were not rejected by Elasticsearch (5.6).

@Rahmon Rahmon removed their assignment Apr 22, 2021
@oscarssanchez
Copy link
Contributor

Thanks @Rahmon ,

Indeed, It looks like this should be correct as the Datetime is defined to start with the year: https://github.com/10up/ElasticPress/pull/1703/files#diff-f5aaa70470d85e353af0ee1d50c73f04d65e2c2a78336a7ac660ead2d8e61d27R477

This is looking good to me!

@brandwaffle brandwaffle merged commit 72848eb into 10up:develop Apr 23, 2021
@Rahmon Rahmon added this to the 3.6.0 milestone Jun 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants