DO UI analysis #11885

Merged
nobuti merged 167 commits into data-observatory-analysis-ui from 11802-do-requests on Jun 20, 2017

Conversation

@nobuti (Contributor) commented Mar 29, 2017

This PR implements #11802, #11621 and #11619.

[Screenshot: do-ui]

Things to do:

  • More tests
  • Column name for every measurement
  • Get rid of the move cursor for the measurement item
  • Test behaviour thoroughly

How to test

  • Check that we can add/remove multiple measurements.
  • Check that every measurement works independently.
  • Check that the new columns are generated properly after an analysis is executed. The new column name is a combination of the selected measurement, denominator and timespan values (see the sketch after this list).
  • Check that quota limits work properly for multiple measurements. The rule is that each analysis consumes n measurements * y rows.
  • Check that we can edit the analysis once saved.
  • Check the form's validation. Only the measurement field is required.
  • Check that the save button works properly:
    • A user can't save an invalid form.
    • Once saved, the button is disabled until the form changes.
  • This PR introduces a new component for this analysis. The first dropdown allows filtering on the frontend side, showing up to 100 items to select. We should check that performance is good.
  • Check that tracking CSS classes are generated properly.
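
As a quick illustration of the column naming and quota rules above, here is a minimal sketch. This is not the actual Builder code; the names and the exact sanitisation rules are assumptions.

```typescript
// Illustrative sketch only: how a generated column name could be derived from
// the selected measurement, denominator and timespan, and how the quota cost
// of an analysis could be estimated as measurements * rows.
interface MeasurementSelection {
  measurement: string;
  denominator?: string;
  timespan?: string;
}

function generatedColumnName(selection: MeasurementSelection): string {
  // e.g. { measurement: 'Hispanic Population', denominator: 'Population', timespan: '2011 - 2015' }
  //      -> 'hispanic_population_population_2011_2015'
  return [selection.measurement, selection.denominator, selection.timespan]
    .filter((part): part is string => Boolean(part))
    .join('_')
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_')
    .replace(/^_+|_+$/g, '');
}

function estimatedQuotaCost(selections: MeasurementSelection[], rowCount: number): number {
  // "Each analysis will consume n measurements * y rows."
  return selections.length * rowCount;
}
```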

@ethervoid (Contributor)

I'm working on the NULL issue right now

@@ -4,6 +4,7 @@
@import '../cartoassets/entry';
@import '../colorpicker/bootstrap-colorpicker/bootstrap-colorpicker';

@import 'vars';
Review comment on the diff above (Contributor):

👍

@saleiva (Contributor) commented Jun 15, 2017

The thing is that then we will lose access to the analysis that works in favour of an analysis that doesn't work. To me it sounds too early, but it's up to @kevin-reilly to decide.

@xavijam (Contributor) commented Jun 15, 2017

Agreed @saleiva, we should wait until we have the NULLs issue fixed 👍. Go go go go!

@ethervoid (Contributor)

I've created a PR that fixes the reported NULL problem. Returning NULL can be driven by multiple causes; in this case we were creating the "boundaries" of the dataset and including the NULL values, which seems to cause a misbehaviour that leads the DO to think it's dealing with polygons.

In any case, using the DO with polygon datasets may not be possible: we need to know whether we're able to interpolate the measure with the polygons. If you want to know more about this, please ask me.
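
For illustration only, the kind of guard described above might look like the following sketch. This is not the actual DO fix, and the row shape is an assumption.

```typescript
// Conceptual sketch (not the real DO code): exclude NULL values before
// deriving the dataset "boundaries" that are handed to the Data Observatory,
// so NULL rows cannot skew what the DO thinks it is dealing with.
interface Row {
  lng: number | null;
  lat: number | null;
  value: number | null;
}

type Bounds = { minLng: number; minLat: number; maxLng: number; maxLat: number };

function datasetBounds(rows: Row[]): Bounds | null {
  const valid = rows.filter((r) => r.lng !== null && r.lat !== null && r.value !== null);
  if (valid.length === 0) return null;

  const bounds: Bounds = { minLng: Infinity, minLat: Infinity, maxLng: -Infinity, maxLat: -Infinity };
  for (const r of valid) {
    bounds.minLng = Math.min(bounds.minLng, r.lng as number);
    bounds.minLat = Math.min(bounds.minLat, r.lat as number);
    bounds.maxLng = Math.max(bounds.maxLng, r.lng as number);
    bounds.maxLat = Math.max(bounds.maxLat, r.lat as number);
  }
  return bounds;
}
```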

@saleiva (Contributor) commented Jun 15, 2017

About the polygons part... @ethervoid, can you elaborate? Not sure I follow.

@ethervoid (Contributor)

Sure. The DO can interpolate the measure value using polygons, but we have some special cases where we can't (you can't see it here).

So in some cases (I don't fully understand how the DO data works yet) we're not able to return a value, so we return NULL. Talking with @xavijam, we both agree on giving the user some kind of feedback in these cases, or filtering those measures out directly (we would have to analyze the dataset beforehand). I've added a note to the roadmap to take it into consideration.

Hope I've explained it well 🤞

@kevin-reilly

Did the staging version revert to the production version somehow? Am I missing something?

I'm seeing this:

[Screenshot: kr test carto 2017-06-15 17-52-06]

@xavijam (Contributor) commented Jun 16, 2017

Sorry @kevin-reilly, something happened with the new feature flag and it was not enabled. It is enabled again. Apologies again :(.

@xavijam (Contributor) commented Jun 16, 2017

There is something I've run into several times (also reported by @kevin-reilly), but I'm not sure how to manage it. Steps:

  • Create a new DO analysis with a measurement A and timespan B.
  • The analysis will generate a column called A_B (for example).
  • After finishing it successfully, the user styles the layer by that column (obtaining a choropleth map 💃).
  • It turns out he/she wants to change that DO analysis and selects another timespan, in this case C.
  • So the analysis will generate another column, in this case called A_C.
  • When the user applies the new analysis, the Map instantiation will fail because the style will be referencing a non-existent column, A_B.

What should we do in this case, @saleiva @kevin-reilly? I'm not sure whether we should reset the styles of the applied analysis.
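
One possible way to handle the steps above, sketched under assumptions (these are not Builder's real modules or names): when the analysis output changes, check whether the styled column still exists and fall back to the default style if it doesn't.

```typescript
// Minimal sketch of one option: reset to the default style when the column
// driving the style (e.g. A_B) disappears from the analysis output (A_C).
interface LayerStyle {
  column: string | null; // column driving the choropleth, if any
}

function reconcileStyle(
  style: LayerStyle,
  outputColumns: string[],
  defaultStyle: LayerStyle
): LayerStyle {
  if (style.column && !outputColumns.includes(style.column)) {
    // Styled column is gone after the analysis changed: reset to the default
    // style instead of breaking map instantiation.
    return { ...defaultStyle };
  }
  return style;
}
```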

@ethervoid (Contributor)

Fix for the NULL issue reported by @kevin-reilly is deployed.

@kevin-reilly

Regarding @xavijam's comment about formatting: ideally, if A_B is used for formatting, that formatting should be applied to A_C when the analysis is changed.

If that is not possible, and A_B no longer exists because of the change to A_C, then the formatting should return to the default.

How is this handled outside of Analysis? For example, if I have a column called Sales that I use for formatting, what happens if I drop that column using SQL?

@kevin-reilly

A couple more things I'm seeing after testing today. All tests use this map and the lightrailstations dataset unless otherwise noted.

Normalizing

Using the Hispanic Population 2015 dataset:

If I do not select Normalize, I cannot select a boundary; the selector defaults to "No values." The analysis completes with different values for most points but with some null values.

If I do select Normalize > Population, the selector allows me to choose a boundary, but the Analysis returns a null for each point. I tried this with both US Census Blocks and Shoreline clipped Elementary School Districts.

If I do select Normalize > Per sq km, I cannot select a boundary; the selector defaults to "No values" and the Analysis returns a null for each point.

Boundaries

  • These should be called "Geographic Units".
  • If there is both a Shoreline clipped and a non-Shoreline clipped geography, we should only present the Shoreline clipped option and remove the "Shoreline clipped" prefix from the description. This is confusing.
  • How are these sorted? It should be largest to smallest, but I'm not sure it works that way.

Timespans

The Timespan options are a bit strange for many datasets. Looking at Hispanic Population again, I see:

  • 2015-2015
  • 2014-2014
  • 2011-2015
  • 2010-2013
  • 2010-2010
  • 2006-2010

The single-year timespans should just show "2015", not "2015-2015", for example.
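
For what it's worth, the display tweak could be as small as the following sketch (illustrative only; the function name is hypothetical):

```typescript
// Collapse a "YYYY-YYYY" timespan label to a single year when the start and
// end years are the same, e.g. "2015-2015" -> "2015".
function formatTimespan(label: string): string {
  const match = label.match(/^(\d{4})\s*-\s*(\d{4})$/);
  if (match && match[1] === match[2]) {
    return match[1];
  }
  return label;
}

// formatTimespan('2015-2015') === '2015'
// formatTimespan('2011-2015') === '2011-2015'
```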

Region

Using the lightrailstations dataset, which is only in the US, the Region selector presents only that as an option when I click the pulldown. This is correct, but if there is only one option, it should be populated by default rather than making me select it.

Using the Irish Counties dataset (which is only in Ireland...), when I select the Region menu, it only offers me "United Kingdom." I believe this should offer EU datasets as well.

I created a dataset called Dummy which only includes two points, one in Texas and one in Ireland. When I try to create an Analysis for this data set, the Region menu offers me United States, Canada, Spain, and European Union but not United Kingdom. This seems wrong.

@jorgesancha (Contributor)

Hey @kevin-reilly

I'm not seeing the same results that you are seeing regarding normalization. I was able to select boundaries with that dataset and I always got results back, except in the first case, where there were sometimes null values.

Not normalized
[screenshot]

Normalized
[screenshot]

Normalized > Per Square Km
[screenshot]

Regarding everything else, while I agree with pretty much all of your feedback, I don't think we should use this ticket to solve what are fundamentally bad decisions on the design of the DO architecture or inconsistencies in the data. Bear in mind that what we show in the UI ("Shoreline clipped", etc... or the actual Timespans) is what we get from the DO API, and if we want to change something, we should change it there. Making adjustments on the client to cater for these things is not the right way to go.

Regarding "Boundaries" vs "Geographic Units", I would think twice before changing it; or if we do, we should probably change it everywhere else (in the DO, I mean). So far, everyone has referred to this as boundaries, so is quite a significant change.

If there are no more client-side issues to fix, I would suggest we put this in production under a feature flag and open it to the team, and use this interface to:

  • Better explore the DO data and potential inconsistencies
  • Come up with
    • a) a list of issues we want to prioritise on the DO (whether changes to the DO or the actual Data) before we announce it
    • b) any potential changes to the UI to make things clearer or better based on the feedback we get

What do you think? Should we get together to discuss briefly this afternoon?

cc/ @juanignaciosl @saleiva

@xavijam (Contributor) commented Jun 19, 2017

> If that is not possible, and A_B no longer exists because of the change to A_C, then the formatting should return to the default.

Right now we don't have any logic for when an analysis has changed and a "conditional" style was set. But we do have it for when an analysis is removed, where we try to keep the previous style.

> How is this handled outside of Analysis? For example, if I have a column called Sales that I use for formatting, what happens if I drop that column using SQL?

If the user styles the layer by a column called "example" (sketched below):

  • and drops that column, the map instantiation will fail (an error will appear), and they will need to change the style in order to visualize it again.
  • and changes the SQL to select only cartodb_id, the_geom and the_geom_webmercator, it will "reset" the style and apply the default one.
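
A minimal sketch of the behaviour described above, with hypothetical names (not Builder's actual code):

```typescript
// Decide what happens to the layer style after a SQL change, per the two
// cases described above.
const BASE_COLUMNS = ['cartodb_id', 'the_geom', 'the_geom_webmercator'];

type Outcome = 'keep' | 'reset-to-default' | 'instantiation-error';

function styleOutcome(styledColumn: string | null, queryColumns: string[]): Outcome {
  const onlyBaseColumns = queryColumns.every((c) => BASE_COLUMNS.includes(c));
  if (onlyBaseColumns) {
    // Query reduced to cartodb_id/the_geom/the_geom_webmercator:
    // the style is reset and the default one is applied.
    return 'reset-to-default';
  }
  if (styledColumn && !queryColumns.includes(styledColumn)) {
    // Styled column was dropped: map instantiation fails and an error appears
    // until the user changes the style.
    return 'instantiation-error';
  }
  return 'keep';
}
```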

@saleiva (Contributor) commented Jun 19, 2017

Hi guys, I might be arriving late to the party 🎉, but remember we should ideally show the default timespan as selected in the input. Otherwise, it looks like an extra required step.

@juanignaciosl (Contributor)

I agree with jsancha: +1000 to feature-flagging this and opening issues for fixes and improvements.

@ethervoid (Contributor)

@saleiva right now, AFAIK, the DO doesn't have a way to define a timespan as the default for a numerator measurement. We could add a new column for it in the metadata part of the database that marks the latest timespan as the default, but the closest we can get today is to order them and pick the first one.

The problem with this approach is that we only have strings for the timespans, so it's not 100% reliable. I've added a new issue that would make it possible for us to filter/sort the timespans.
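
A rough sketch of that ordering heuristic (illustrative only; as noted, string timespans make it unreliable):

```typescript
// Pick a default timespan by ordering the strings most-recent first.
// Timespans are plain strings such as "2011 - 2015" or "2015 - 2015".
function defaultTimespan(timespans: string[]): string | undefined {
  const key = (t: string): number => {
    const years = (t.match(/\d{4}/g) || []).map(Number);
    // Rank by end year, then by start year, so "2015 - 2015" beats "2011 - 2015".
    return years.length ? years[years.length - 1] * 10000 + years[0] : -1;
  };
  // Sort most recent first and take the first entry as the default.
  return [...timespans].sort((a, b) => key(b) - key(a))[0];
}

// defaultTimespan(['2006 - 2010', '2011 - 2015', '2015 - 2015']) === '2015 - 2015'
```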

@kevin-reilly

I'm ok with opening this up to more internal users via feature flag. I know there are people looking forward to testing this so that will help expose any other issues.

I am a bit concerned that @jorgesancha was not able to reproduce my normalization issue. I checked this exact thing a number of times on Friday to make sure I wasn't doing something wrong. What would cause us to see different results?

I agree absolutely that Builder should present all datasets made available by the API. The API is the correct place to address my comment about the 'shoreline' clipped datasets. We should put that in the backlog.

@ethervoid (Contributor) commented Jun 19, 2017

@kevin-reilly we're currently working on the clipped issue here

Regarding the normalization issue, it's weird and concerning, so we should take a look at it and open an issue if we can come up with steps to reproduce.

@nobuti (Contributor, Author) commented Jun 20, 2017

Guys, this PR has grown too much and has become unmanageable, so we are creating individual issues for every fix we need to make in order to improve the DO analysis. So, I'm merging this into the data-observatory-analysis-ui branch and opening PRs from there.

@nobuti nobuti merged commit a7d88f9 into data-observatory-analysis-ui Jun 20, 2017
@nobuti nobuti deleted the 11802-do-requests branch June 20, 2017 09:41