
Consider adding type converters #50

Closed
ivan-kleshnin opened this issue May 4, 2015 · 20 comments

@ivan-kleshnin

I'm researching Joi alternatives and found this lib. I like the cross-platform aim you set for this project (Joi is painful on the frontend, weighing in at 2+ MiB). Unfortunately, there is little point in validation without type coercion. If a value is checked to be an integer, it should be possible to use it as an integer afterwards.

Not: validate it as a string, then apply the type conversion manually, then use it. That breaks the whole point of declarative validation: you end up with a second deeply-nested declarative object with the same keys but different values describing type conversion rules (semantically redundant, since you've already said you want an integer)...

@ansman
Owner

ansman commented May 4, 2015

I don't really understand the problem here; the numericality validator has an option called noStrings which rejects non-numbers.

The reason I opted for supporting strings by default is that when you read values from a form you get them as strings.
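For example (the attribute names here are just illustrative):

var constraints = {
  age: {
    numericality: {
      noStrings: true // "42" is rejected; only real numbers pass
    }
  }
};

validate({age: "42"}, constraints); // => an error for age (exact message may vary)
validate({age: 42}, constraints);   // => undefined (no errors)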

@ivan-kleshnin
Author

  1. I don't want to reject values by type. I want to get a real native JS number (same with Date, etc.) after validation.
  2. Sometimes you need to validate a JSON object coming from a 3rd-party source. It may easily contain non-string values.

So the input may be a string, a number, or a Number object. The output should be a plain number in all cases.
In other words, I expected this library to provide not only a boolean result (valid / invalid) but also an object with the converted values. Joi does exactly this.
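For reference, this is roughly what the Joi API of the time looks like (sketched from memory; exact details may differ):

var Joi = require('joi');

var result = Joi.validate("42", Joi.number().integer());
result.error; // null
result.value; // 42, a real number coerced from the string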

@ansman
Owner

ansman commented May 9, 2015

I like the idea.

Version 0.8.0 will ship with a validate.cleanAttributes function that currently just removes unknown keys, so I've already started doing some work to help you with this.
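Roughly like this (shape based on the upcoming 0.8.0 docs; exact semantics may still change):

validate.cleanAttributes(
  {name: "Ada", admin: true},
  {name: true} // keys missing from the whitelist are removed
);
// => {name: "Ada"}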

I could add a function that does type conversion for you (automatically when using promises, as with the attribute cleaning).
The biggest problem is what the syntax would look like. I suppose most types could be inferred from the constraints, but not all.
Perhaps the constraints could contain a new key called type that would be used for type conversion. How does this sound?
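Something like this, perhaps (purely hypothetical syntax, nothing is implemented yet):

var constraints = {
  age: {
    presence: true,
    numericality: {onlyInteger: true},
    type: "integer" // would drive the conversion: "42" -> 42
  }
};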

@ivan-kleshnin
Author

@ansman let me think about it a bit more. This turns out to be a harder question than I thought.
I really want to build a clearer picture for myself of what I want and expect from the "perfect" validation library.

I once wrote a form-builder library in Python, but that was for a backend-driven app. My current requirements differ significantly. I promise to write a big post about all this here.

@ivan-kleshnin
Author

Intro

OK, it's me again. I spent the last few days thinking about and experimenting with validation, and the final picture is getting clearer to me.

The JS community desperately needs an environment-agnostic data validation library.
As you rightly mention in the docs:

There are already many validation libraries out there today but most of them are very tightly coupled to a language or framework.

It's ridiculous; I mean, we ought to have a lot of competing solutions by now. But, having spent a lot of hours searching, I found only 3 projects worth considering: Joi, TComb-Validation, and this one.

TComb-Validation, while very interesting in itself, is a very specific beast, and I'd like to exclude it from today's rant. So we have Joi and ValidateJS. Joi lacks two crucial components that ValidateJS has: custom validators and async validators. The Joi devs are not going to support the second one at all, and the first is only planned (since 2014...). That being said, Joi has 1200 stars against 120 here... No less important is the fact that Joi is very, very big: 2.5 MiB uncompressed, against 30 KB for ValidateJS. So, IMO, Joi is a very overrated library, and ValidateJS is an underrated one.

Validation is one of the few areas that supports code sharing between frontend and backend. We should be eager for such opportunities; they are, above all, one of the topmost marketing benefits of JS. That's why a lot of people, including me, have asked the Joi authors to reconsider their priorities. But as already said, the Joi authors are very dismissive of requests outside their own vision.

A bit about myself. I once wrote a big form validation and form building library in Python (https://github.com/ivan-kleshnin/flask-paqforms). It's OOP... crap – I know. But at least I've been there, and I know a bit about the problems you're facing. And I still think I managed to build my libs better than the much more popular (mainstream) WTForms, for objective reasons... but who cares? Let's get back to JS.

@ivan-kleshnin
Author

Parse, Validate, Format

There are three tightly connected areas that are often lumped together under Validation: parsing, validation, and formatting.
Imagine an age field. You have a string value from a form; you have to do some cleanup, convert it to a number, and validate it against min and max. You also have to convert the initial (model) data from a number back to a string. You see, it's all very connected. Parsing is often called sanitization, but those who coined that term totally forgot about the reverse process. Will we call it desanitization? Of course not. That's why I propose calling the two processes Parse and Format.

It's becoming obvious that we have two layers: HTML and Business. Let's call those two kinds of values form values and business values.

But should we couple or decouple these three aspects?
Why not just make three different libs for them and be happy?

OK, let's make a broad and shallow list of our requirements first:

  • parsing and formatting should be symmetrical, because that's neater :)
  • the parse layer should convert empty strings to undefined. In HTML forms, empty values mean "no data", and "no data" is null in a DB and undefined in JS.
  • format should do the opposite: convert undefined (and perhaps null) to an empty string.
  • parsing and formatting may be quite complex. Imagine a textarea where each line represents a phone number. Parsing means splitting those lines. Validation should check the array against a length limit and validate each item against a phone format. Formatting should, of course, be able to join the business value back into a string (I repeat, this is required only to put the initial data into the form; we don't need to split and join dynamically, because we don't want to replace input values dynamically: that would be bad UX). See the sketch after this list.
  • messages, including keys, should be customizable
  • input data should pass through an L10n layer (a special section on this is below)
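A quick sketch of the phone-textarea case from the list above (helper names are illustrative):

function parsePhones(formValue) {          // form value -> business value
  return formValue
    .split("\n")
    .map(function(line) { return line.trim() })
    .filter(function(line) { return line !== "" })
}

function formatPhones(businessValue) {     // business value -> form value
  return (businessValue || []).join("\n")
}

parsePhones("555-0100\n555-0199\n")     // => ["555-0100", "555-0199"]
formatPhones(["555-0100", "555-0199"])  // => "555-0100\n555-0199"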

So why not isolate formatting and parsing in one library and validation in another? That's decoupling, and decoupling is good, right? Yes, but there is one very tiny detail everyone seems to forget.

Parsing and formatting also can fail!

Formatting failures are very simple. They can be caused only by a serious data mismatch. Therefore, they should be thought of as programmer errors, so we can just throw an exception and crash the process. A parsing failure, on the other hand, is the same as a validation failure. This is super important, so I will repeat it: a parsing failure is the same as a validation failure. That's why we can't and shouldn't totally isolate one from the other.

Validation can be decoupled, but it should be aware of Parsing.

A very simple and reliable API suggests itself here.
Let's be more specific now.

parseString(formValue: String): String
// throws if value is not string, converts '' to undefined

formatString(businessValue: Maybe<String>): String
// throws if value is not string or undefined, converts undefined to ''

parseInteger(formValue: String): Number
// throws if value is not string

formatInteger(businessValue: Maybe<Number>): String
// throws if value is not number or undefined, converts undefined to ''
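For instance, the string pair could be implemented like this (a minimal sketch; the trimming is an extra assumption, consistent with the tags example below):

function parseString(formValue) {
  if (typeof formValue !== "string") throw new TypeError("expected a string")
  var trimmed = formValue.trim()
  return trimmed === "" ? undefined : trimmed // '' means "no data"
}

function formatString(businessValue) {
  if (businessValue === undefined) return ""
  if (typeof businessValue !== "string") throw new TypeError("expected a string or undefined")
  return businessValue
}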

So the Validation library should somehow accept a Parse handler. This handler can solve any user-specific task, like the aforementioned "task of splitting phones". You can provide some sane defaults, but you can't just go with a restricted set of predefined types here. That was Joi's biggest mistake: form cases can be very, very different, and a callback can handle any of them. Formatting doesn't really participate here; it will be used somewhere in the HTML form.

A shortened example from a real React component:

<input type="date"
  value={formatDate(form.birthDate)}
  onBlur={() => this.validate("birthDate")}
  onChange={event => this.handleChange("birthDate", event.currentTarget.value, parseDate(event.currentTarget.value))}
  id="birthDate" ref="birthDate"
  className="form-control"/>

All three aspects of parsing, formatting, and validation meet here.

@ivan-kleshnin
Author

L10N

Everyone remembers to localise error messages. Unfortunately, it seems quite common to forget about localising the input data. I've met this issue in every validation library in every programming language. Date formats differ across cultures, and the same applies to numeric formats and a lot of other data.

Here's how I solved this with converters (merged OOP parsers + formatters):

import babel.numbers

class IntConverter:
    def parse(self, data, locale='en'):
        if type(data) == int:
            return data # I would throw in this case now
        elif type(data) == float:
            return int(round(data)) # I would throw in this case now
        elif type(data) == str:
            if data:
                data = data.strip().replace(" ", "\u00A0") # non-breaking-space (for correct number parsing)
                try:
                    return babel.numbers.parse_number(data, locale=locale)
                except Exception:
                    raise ValueError
            else:
                return None
        elif data is None:
            return None
        else:
            raise TypeError

I used the great Babel Python library here. In JS, I'd like to rely on something similar, like http://www.localeplanet.com/
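For example, modern JS could lean on the standard Intl API (which formats but does not parse) to discover a locale's separators. A minimal sketch for integers, with an illustrative function name:

function parseLocalizedInteger(input, locale) {
  if (typeof input !== "string") throw new TypeError("expected a string")
  var trimmed = input.trim()
  if (trimmed === "") return undefined // "no data"

  // Ask Intl how this locale groups digits, e.g. "1,000" vs "1 000"
  var parts = new Intl.NumberFormat(locale || "en").formatToParts(1000)
  var group = ","
  parts.forEach(function(part) { if (part.type === "group") group = part.value })

  var normalized = trimmed.split(group).join("")
  if (!/^[+-]?\d+$/.test(normalized)) throw new Error("parse failed")
  return Number(normalized)
}

parseLocalizedInteger("1,000")  // => 1000
parseLocalizedInteger(" 42 ")   // => 42
parseLocalizedInteger("")       // => undefined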

@ivan-kleshnin
Author

Outro

An important question remains: how to wire up parsers and validators. We can approach this in several ways:

  1. Parsers (parse functions) are attached to the validation schema. The validator reads them and applies them before validation. The validator has to return [errors, parsedValue] or an alternative data structure holding both the errors and the parsed value.

  2. Completely decouple validation and parsing. This really suits only form validation, where we can manage to keep two different states: the first is the formData and the second is the businessData, and we produce the second from the first manually, as a distinct step, without involving validation. The ugliest part here is the manual error merging: as we remember, errors can come from both the parse step and the validation step.

  3. Same as the previous, but in two global steps. First apply parsing, collecting all errors. Then apply validation, collecting all errors. Merge the errors. 😕

As I stated in the first post, I believe the first option is the best one. It's the most performance-friendly and declarative of them.

Here is a code example for getting tags from a textarea, validating them by the number of tags (up to 10) and by the length of each tag (up to 100). The resulting array should not contain undefined, and items should be trimmed of whitespace.

A code example a-la ValidateJS:

import R from "ramda"

{
  tags: {
    presence: true,
    parser: R.pipe(
      R.split("\n"),
      R.map(parseString),  // parseString trims and maps '' to undefined
      R.reject(R.isNil)    // drop the undefined items
    ),
    type: Array,
    maxLength: 10,
    item: {
      parser: parseString,
      type: String,
      maxLength: 100
    }
  }
}

A code example a-la Joi:

import R from "ramda"

{
  tags: Joi.Array().of(Joi.String().maxLength(100))
    .required()
    .parseWith(R.pipe(
      R.split("\n"),
      R.map(parseString),
      R.reject(R.isNil)
    ))
    .maxLength(10)
}

Anyhow, you can seriously cut the code size of any validation library by decoupling the questions of type into a separate lib. Validation will then validate only one type (or two, counting undefined). Validation will mean "apply the validation rules to the passed value", not "convert, then apply the validation rules to the resulting value". The single responsibility principle in action.

One question remains: how to incorporate errors from the parsing step into the validation layer's result.
I solved this in Paqforms by using the same inner error representation (exceptions) (https://github.com/ivan-kleshnin/paqforms/blob/master/paqforms/fields.py#L205).

Both the parseXyz and the inner doValidate functions can throw, and there is one general catch wrapping them both. Whether to keep the original messages from the parsing layer or replace them with a generic Parse failed / Validation failed is up to you. That's a minor question, though.
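For illustration, the first option could look roughly like this (a minimal sketch, not validate.js API; the parser and validators keys are the hypothetical schema fields from the discussion above):

function runField(rawValue, spec) {
  var value
  try {
    value = spec.parser ? spec.parser(rawValue) : rawValue // a parse failure is a validation failure
  } catch (error) {
    return {errors: ["parse failed: " + error.message], value: undefined}
  }
  var errors = (spec.validators || [])
    .map(function(check) { return check(value) })          // each returns a message or null
    .filter(function(message) { return message != null })
  return {errors: errors, value: value}
}

runField("42", {
  parser: parseInteger, // e.g. a parseInteger following the signature above
  validators: [function(n) { return n >= 0 ? null : "must be non-negative" }]
})
// => {errors: [], value: 42}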

@Jokero
Contributor

Jokero commented May 26, 2015

I agree that validation is a complex task and should consist of 3 parts:

  1. parsing (I call it "pre-validation filtering")
  2. validation
  3. formatting ("post-validation filtering")

I think they are indivisible and should be described in one place.

Also, some fields may be optional and have a default value.

The validated object may be deeply nested (see #46).

A simplified example of a validated object in Node.js:

var filter = require('validator');

var fromAndTo = {
    startTime: {
        $default: function() { // value or function returning value
            return new Date();
        },
        $validators: {
            presence: true,
            datetime: true
        }
    },

    endTime: {
        $validators: {
            presence: true,
            datetime: true
        }
    },

    isTerminal: {
        $parsers: filter.toBoolean // function or array of functions
    },

    address: {
        cityId: {
            $validators: {
                presence: true
            }
        },

        terminalId: {
            $validators: {
                presence: function(value, attributes, attributeName, attributePath) {
                    // attributePath is array ['from', 'address', 'terminalId']
                    // or string 'from.address.terminalId'
                    var from = attributes[attributePath[0]];
                    return from.isTerminal;
                }
            }
        },

        value: {
            $parsers:    filter.trim,
            $validators: {
                presence: function(value, attributes, attributeName, attributePath) {
                    var from = attributes[attributePath[0]];
                    return !from.isTerminal;
                }
            }
        }
    },

    geo: {
        $validators: {
            presence: function(value, attributes, attributeName, attributePath) {
                var from = attributes[attributePath[0]];
                return !from.isTerminal;
            }
        },

        latitude: {
            $validators: {
                presence:     true,
                numericality: {
                    greaterThanOrEqualTo: -90,
                    lessThanOrEqualTo:    90
                }
            }
        },

        longitude: {
            $validators: {
                presence:     true,
                numericality: {
                    greaterThanOrEqualTo: -180,
                    lessThanOrEqualTo:    180
                }
            }
        }
    },

    note: {
        $parsers: filter.trim
    }
};

var needValidate = {
    from: fromAndTo,
    to:   fromAndTo
};

I like the idea of cleanAttributes, because the validated object may contain extra data. I think that if a field has $default, $parsers, $validators, $formatters, or nested properties, it should be kept in the resulting object.

@ivan-kleshnin
Author

  1. parsing (I call it "pre-validation filtering")
  2. validation
  3. formatting ("post-validation filtering")

Hi @Jokero! Since the parsing step can and often does include conversion from a localised to a universal format, and the formatting step, conversely, includes conversion from a universal to a localised format, the term filtering really isn't as good as parsing / formatting. We don't just filter out "bad data", as people often simplify this process; things are much more complex.

Your API is the third of the possible ways to organize this, yeah.

I think they are indivisible and should be described in one place.

Well, they can be decoupled into different libs, but they are indivisible in terms of requirements.

I agree with the rest of your points.

@tamtakoe

I think this format is justified, but it is necessary to separate the levels of abstraction.

Level 1. The validation framework. It has a config with special fields:
$default – the default value;
$parsers, $validators, $formatters – arrays of functions that the framework calls in series.
These can be asynchronous if they return a promise (the framework wraps the returns in q.when). The processing of these arrays is the same in all cases and is implemented by one unified library. The difference is that the functions return a value in parsers and formatters, and return a boolean in validators (that's the functions' logic, not the framework's). Formatters may want to be called in reverse order, as is done in AngularJS (why do that?).

Level 2. Filters (parsers and formatters) and validators. Filters can include localisation and other concerns. Validate.js would ship a library of basic filters and validators, but the user can create custom ones. The user determines what their custom filters should return (e.g. convert empty strings to undefined). If someone needs a special bundle of filters or validators for their application, they can set it in the default config for all requests/responses:

defaults.$parsers.push(emptyStringToUndefined);
defaults.date.$parsers.push(localToUTC);

@ivan-kleshnin
Author

$parsers, $validators, $formatters – arrays of functions that the framework calls in series.

With chaining, be ready to meet corner cases like "how do we deal with a promise in the second element if the first and third calls were sync?" and so on. I still think function composition is better.
No questions – no endless arguing...

Formatters may want to be called in reverse order, as is done in AngularJS (why do that?).

Yeah, yeah – exactly one of those questions. Piping is just not flexible enough.

Validate.js would ship a library of basic filters and validators, but the user can create custom ones.

Why include such stuff in Validate.js itself? It should be a separate library, because all these parsing functions may be reused in other areas, like HTML data scraping.

@Jokero
Contributor

Jokero commented May 27, 2015

@ivan-kleshnin, I reread your comments and realized that I initially didn't understand the meaning of formatting 😃

You mean:

form data -> parsing -> validation -> business data
form data <- formatting <- business data

And I:

form data -> parsing -> validation -> formatting (additional transformation if necessary) -> business data

@Jonahss

Jonahss commented May 27, 2015

I'm taking a look at validate.js for parsing the desiredCapabilities object that gets passed into Appium. This is a case of needing some pretty simple JS object validation that isn't tied to web forms or anything like that (so I need to verify that a property is typeof 'string' and such). The proposal in this thread would probably be the ideal thing for us (we also need the filtering and parsing steps).

This also looks like a related project: https://github.com/molnarg/js-schema/
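For the typeof checks mentioned above, a custom validator along these lines should work already (validate.js supports custom validators through the validate.validators registry; the "type" validator itself is just a sketch, not a built-in):

validate.validators.type = function(value, options) {
  if (value === null || value === undefined) return undefined // leave missing values to `presence`
  return typeof value === options ? undefined : "must be of type " + options
}

validate({platformName: 42}, {platformName: {type: "string"}})
// => roughly {platformName: ["Platform name must be of type string"]}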

@ivan-kleshnin
Author

@Jokero aren't those diagrams equivalent, just presented in different ways 😃?

@tamtakoe

how do we deal with a promise in the second element if the first and third calls were sync

Easy!

var q = require('q');

function chainEvaluator(filters, input) {
    // q.when lifts the initial value into a promise; each filter is then
    // applied in series, whether it returns a plain value or a promise.
    return filters.reduce(function(previousPromise, currentFn) {
        return previousPromise.then(currentFn);
    }, q.when(input));
}

var fn1 = function(value) { return value + '-1'; };
var fn2 = function(value) { return q.when(value + '-2'); }; // returns a promise
var fn3 = function(value) { return value + '-3'; };

chainEvaluator([fn1, fn2, fn3], 'smth').then(function(result) {
    console.log(result); // "smth-1-2-3"
});

Why include such stuff in Validate.js itself?

Because without basic validators no one will use the bare framework. I mean validators like pristine, required, etc., and maybe parsers/formatters like toLowerCase. It would be a separate library, but validate.js would use it by default.

@ivan-kleshnin
Author

Because without basic validators no one will use the bare framework. I mean validators like pristine, required, etc., and maybe parsers/formatters like toLowerCase. It would be a separate library, but validate.js would use it by default.

I'm for a separate library as a dependency as well.

Anyway, it's good to see that people mostly agree here.
Details are details. The subject we're discussing is cross-platform, so I think this is quite an important talk.

@tamtakoe

I'm for a separate library as a dependency as well.

Good.

@ansman What do you think?

@ansman
Owner

ansman commented May 30, 2015

To be honest, I don't really understand the problem you are talking about.

The scope of validate.js has been and will always be a fast, simple and easy way of validating a set of attributes against a set of constraints.

As it stands today there is a simple form parser, but only because it's such a commonplace thing to do.
There is also a simple attribute cleaner for ease of use with promises.

I'm all for things such as Promises and Reactive programming, which make things like validation easy:

Promise.resolve(document.querySelector("form#signup"))
  .then(validate.collectFormValues) // This is the parsing step, could easily be replaced with another lib
  .then(function(formValues) { // This is validation, the core of validate.js
    return validate.async(formValues, constraints);
  })
  .then(api.signup)
  .then(handleSignupSuccess)
  .catch(ValidationError, handleValidationError) // This is formatting
  .catch(ServerError, handleServerError)
  .catch(handleGenericError);

@ansman
Owner

ansman commented Aug 30, 2015

I'm closing this due to inactivity but feel free to comment and I'll reopen it.

@ansman ansman closed this as completed Aug 30, 2015