New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remember attempted price list upload submissions #1491

Merged
merged 24 commits into from Apr 12, 2017

Conversation

Projects
None yet
4 participants
@toolness
Contributor

toolness commented Mar 14, 2017

We've had a number of failed attempted price list upload submissions, but the only way we've really known about them (and had enough information to debug them) is due to the submitter emailing us and sending their price list.

This is an experimental PR that attempts to record attempted price list upload submissions and remember various metadata about them (such as number of valid/invalid rows), so that we can easily see what folks are trying to upload, what they're constantly having to fix up the most often, and things like that.

Following a link from the admin index page, users who are part of the new Technical Support Specialists group can choose from a list of attempted submissions:

screen shot 2017-03-15 at 1 03 37 pm

When they choose one, they're taken to a custom detail page that provides more information and allows them to "replay" the submission:

screen shot 2017-03-15 at 1 01 50 pm

Replaying the submission basically restores the original submitter's data_capture:price_list session key and simulates choosing the file they uploaded and submitting it to step 3 of data capture. An alert is now displayed at the top of every step, however, informing the user that they are in "replay mode":

this-is-a-replay

Also, while the user can submit the price list for review in replay mode, it probably isn't a good idea. There might be edge cases where it's useful to do so, however, so we'll simply display the following warning above the submit button:

reminder

Notes

  • This functionality may also be useful for local development, as we can just re-play some pathological cases to manually test things out (rather than e.g. constantly filling out the forms and dragging files to upload widgets over and over). We might even want to add a management command that seeds the DB with some of these pathological cases to make manual testing even easier.

  • In case of errors, it might be nice to provide a "ticket number" that users can mention if they want to email us about something. This would just be the pk of the AttemptedPriceListSubmission record that was created for them. That way if they contact us for tech support, we can easily see exactly what it was they uploaded, and how they filled out the form and stuff.

  • Viewing the AttemptedPriceListSubmission records will only be possible by superusers--not by Data Admins, at least not unless we have compelling reasons to do so. We can put anyone who we want to access this functionality into the Technical Support Specialists group and mark them as staff, and they'll be able to do it.

  • We might want to add a periodic task that regularly deletes old AttemptedPriceListSubmission records (and their associated UploadedFile records) to save space.

  • When replaying an attempted submission, we don't change the UI of the price list upload views in any way. It might be a good idea to prevent the user from actually submitting the price list for review, though, and it might also be good to have a banner at the top of the page that says "This is a replay of an attempted submission from {{ date }} by {{ email }}" for context. Added an alert, which is pictured earlier in this PR's description.

To do

  • Make it possible to simulate the re-submission of the price list, so that we can easily see the same things the user saw when they uploaded their price list.

  • Make it possible to easily re-download the uploaded price list from the admin UI.

  • Review the security of re-downloading the uploaded price list from the admin UI. To be extra safe, we might even want to put the uploaded file in a ZIP file, to ensure that old content-sniffing browsers can't accidentally interpret the file as HTML and become vectors for XSS attacks. Relevant security guides include OWASP's guide on unrestricted file upload and Django's security guide on user-uploaded content.

  • Figure out exactly how we want to store the uploaded files. Right now we're using Django's standard/default file storage, which might not be a great idea since the filesystem on cloud.gov is assumed to be ephemeral. On the other hand, since this functionality is only really intended for debugging shortly after a failed upload, having the files disappear after some amount of time might not actually be that bad--depending on what we mean by "some amount of time". An alternative might be to use django-storages's DatabaseStorage backend. For now I've just implemented a basic file storage backend that uses the database for storage. While this is indeed not a smart thing to do 99% of the time, I think it's OK here. Will comment below for more rationale.

  • Add more tests.

  • Add a new "Tech Support Specialists" group that has permission to view and replay attempted submissions, rather than limiting the functionality to superusers only.

  • Either ensure that we're using at least Postgres 9.4 in dev/staging/production, or revert 5f75a04 to not use a Django JSONField.

@jseppi

This comment has been minimized.

Contributor

jseppi commented Mar 14, 2017

Yeah, I think file system storage is no bueno for us. The app is running on two instances, so who knows which actual container the files will end up on. The containers also get restarted fairly often (nightly almost it seems), so we could end up losing a lot and having this whole feature not be very useful to us.

@toolness

This comment has been minimized.

Contributor

toolness commented Mar 15, 2017

Good points @jseppi. I've implemented a really basic Django file storage backend that just saves in a DB model called SlowpokeStorage in this PR. I didn't use django-storages' database backend because it does all kinds of funky things to make database storage performant, which require extra configuration on our end. I think we can safely ignore most of the oft-cited performance issues with storing files in the DB because:

  • Price lists aren't uploaded that often, so the write inefficiency won't hurt us too much.
  • We will read the price lists very rarely--only in situations where we need to support COs who are having problems--so the read inefficiencies won't hurt us too much.
  • Similarly, since we only need the files around for supporting COs, we can always delete files after a week or a month, so it's not like we need to store massive amounts of aggregate data in the DB.
  • Price lists don't get that large--we already load them into memory in their entirety when reading their Excel data, so any weaknesses in regards to storing really massive files in a DB shouldn't affect us.

Thoughts @jseppi? In the worst case, we can switch to django-storages, it just felt like a big hassle after reading the instructions for their database backend.

@toolness

This comment has been minimized.

Contributor

toolness commented Mar 15, 2017

Hmm, one flow this doesn't address at all is the price list replacement functionality. Adding similar functionality to that flow would involve extra work... I wish there was an abstraction that could make this easy to apply to both flows.

invalid_row_count = models.IntegerField(null=True, blank=True)
session_state = JSONField()

This comment has been minimized.

@toolness

toolness Mar 16, 2017

Contributor

Since we're in Django 1.9-land now, we can actually use this! yay!

This comment has been minimized.

@toolness

toolness Mar 17, 2017

Contributor

Err, update: this requires Postgres 9.4. We gotta make sure that dev/prod/staging support this, because until now, Travis, at the very least, was running on 9.3!

This comment has been minimized.

@jseppi

jseppi Mar 17, 2017

Contributor

We are on 9.4 in cloud.gov:

'PostgreSQL 9.4.7 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit'

@toolness

This comment has been minimized.

Contributor

toolness commented Mar 17, 2017

Ok, I think this is good to go. The only question remaining is whether we should also add support for recording/replaying attempted submissions in the price list replace workflow in this PR, too. I think it will be straightforward, but will likely involve modifying the AttemptedPriceListSubmission model to indicate which workflow the user was going through.

@jseppi, one option is for you to add support for the other workflow as a way to familiarize yourself w/ this code... of course, we could just punt on that too, since it's not super likely that folks will be using it really soon.

@toolness toolness changed the title from [WIP] Remember attempted price list upload submissions to Remember attempted price list upload submissions Mar 17, 2017

</form>
</div>
{% endif %}

This comment has been minimized.

@toolness

toolness Mar 17, 2017

Contributor

I feel kinda weird adding this to data_capture/step.html, which is used as the base for a number of workflows that are unrelated to the replay functionality (e.g., region 10 bulk upload).

I think we made step.html back when price list upload was the only step-based workflow that existed, but I wonder if it might make sense to make a data_capture/price_list/step.html base template at this point, and move this replay code in there...

This comment has been minimized.

@jseppi

jseppi Mar 29, 2017

Contributor

Yes, I agree there should be a price list upload-specific step.html.

This comment has been minimized.

@toolness

toolness Mar 30, 2017

Contributor

Cool, just filed #1534 to address this later.

@toolness toolness requested a review from jseppi Mar 17, 2017

@jseppi

This comment has been minimized.

Contributor

jseppi commented Mar 22, 2017

A screenhero-ing might be good for reviewing this since a lot of it is quite outside of my Django knowledge. I'm primarily concerned so far with how the storage stuff works -- it looks like it gets uploaded to disk somewhere (?), but that could be bad since we are on multiple "servers."

@toolness

This comment has been minimized.

Contributor

toolness commented Mar 22, 2017

Ah good idea.

It technically might get uploaded to disk if it's bigger than around 2.5 mb, but so do our price lists--that's just how django deals with file uploads. However, the SlowpokeStorage storage backend I wrote means that the file gets transferred to the db, which is notoriously advised against, but which I think is OK for our particular use case.

@jseppi

Some thoughts I had while reading over all this:

  • I love the functionality and think it will be super helpful
  • What about using S3 for the storage backend? It would be straightforward to set this up on cloud.gov, though I think it will also have some ATO documentation burden since it's another "system" that we'd be going in and out of. We should check with @abisker.
  • I think I recall you mentioning this somewhere, but it would be cool if we could somehow abstract the price list upload flow in way that we could extend it to add the logic for doing the replay.

My biggest concern currently is understanding step_3 from a development perspective.

from contracts.models import (Contract, CashField, EDUCATION_CHOICES,
MIN_ESCALATION_RATE, MAX_ESCALATION_RATE)
@deconstructible
class SlowpokeStorage(Storage):

This comment has been minimized.

@jseppi

jseppi Mar 29, 2017

Contributor

Where'd the name for this come from?

This comment has been minimized.

@toolness

toolness Mar 29, 2017

Contributor

Oh I wanted to make it obvious that it'd be slow and not intended for e.g. storing avatar images that we'd pull out on every request or something, so I thought Slowpoke sounded reasonable.

This comment has been minimized.

@jseppi

This comment has been minimized.

@jseppi

This comment has been minimized.

@jseppi

This comment has been minimized.

@toolness

toolness Mar 30, 2017

Contributor

whoa

This comment has been minimized.

@toolness

toolness Mar 31, 2017

Contributor

For future reference, the above animations are of a Pokemon named Slowpoke, whose existence I was previously unaware of.

def store(cls, f):
hex_hash = cls.get_hex_hash(f)
q = cls.objects.filter(hex_hash=hex_hash)
if q.exists():

This comment has been minimized.

@jseppi

jseppi Mar 29, 2017

Contributor

When would this happen?

This comment has been minimized.

@toolness

toolness Mar 29, 2017

Contributor

Oh it would happen if the user uploaded the same price list multiple times in a row or something. Uh, I guess that might not happen very often given that the same price list probably has the same contract number--I was actually mostly thinking of it from my perspective as a developer, where I am constantly uploading the same price list for development purposes, lol.

This comment has been minimized.

@toolness

toolness Mar 29, 2017

Contributor

Oh, I guess it could also happen if the user goes back to earlier steps and manually uploads the exact same price list in step 3, too. Probably won't happen often but who knows...

</form>
</div>
{% endif %}

This comment has been minimized.

@jseppi

jseppi Mar 29, 2017

Contributor

Yes, I agree there should be a price list upload-specific step.html.

@@ -29,6 +30,10 @@
'data_capture.add_submittedpricelistrow',
])
ROLES['Technical Support Specialists'] = set([

This comment has been minimized.

@jseppi

jseppi Mar 29, 2017

Contributor

I hate to bring this up yet again, but this could require some slight ATO work. In our ATO docs we had to specify all the roles for CALC users, so I'm guessing we'll need to add this one too: https://docs.google.com/document/d/1-UD0cX_Nv_oeRmGy1btUSsaJCM8KaQVQ5CbYAdPmYVI/edit#heading=h.3o7alnk

This comment has been minimized.

@jseppi
@@ -156,6 +183,24 @@ def step_2(request, step):
@require_http_methods(["GET", "POST"])
@handle_cancel
def step_3(request, step):
request_files = request.FILES
if (request.method == 'POST' and

This comment has been minimized.

@jseppi

jseppi Mar 29, 2017

Contributor

This step_3 view is really long and hard to follow :\ It was already that way, but now adding all this replay logic is making it even more so. Any way we could split apart all the branches into maybe separate view functions that could be called from the main step_3 function? Or, do you have any other ideas for making it at least more readable?

This comment has been minimized.

@toolness

toolness Mar 29, 2017

Contributor

yeah blerggg I agree. I will noodle on this and get back to you!

This comment has been minimized.

@jseppi

jseppi Mar 29, 2017

Contributor

<3 Thanks!!

submitter=request.user,
session_state=session_pl
)
if 'file' in request_files:

This comment has been minimized.

@jseppi

jseppi Mar 29, 2017

Contributor

If we're recording the attempt, under what condition would the file not be in the request? I don't doubt that this condition exists, I just can't figure it out in my head.

This comment has been minimized.

@jseppi

jseppi Mar 29, 2017

Contributor

Any why would we create the AttemptedPriceListSubmission model if there isn't a file present?

This comment has been minimized.

@toolness

toolness Mar 29, 2017

Contributor

Oh it's been a while since I looked at this code but I think it would happen if the user had a "pre-filled" file in there. This was added back in #1110... basically, the user may have gotten to step 4 but then realized "oh shoot i want to change something", gone back to step 1 or 2 (rather than e.g. editing the info in-line in step 4), and then just hit "next" on step 3 because they wanted to keep the same price list they uploaded earlier.

I think #1110 might have been added before we added the in-line editing, so it may have been a more compelling use case then?

Anyhow, I created an AttemptedPriceListSubmission because I thought it might be useful to essentially record every time the user passes step 3, regardless of the reason, as there might be something happening under-the-hood that warrants further examination or something. I could be wrong though, so I'm fine w/ not creating the AttemptedPriceListSubmission if they haven't actually uploaded a file.

@toolness

This comment has been minimized.

Contributor

toolness commented Mar 29, 2017

Oh, regarding using S3 for the storage backend, I think that's definitely a possibility, though we'd have to make sure that the S3 buckets aren't publicly accessible. Actually, given that they can't be publicly accessible, I'm not even sure how much of a performance benefit there'd be in using S3 over SlowpokeStorage, if any, especially since S3 actually seems really slow for writes--which we'll be doing often--and very fast for reads--which we'll be doing very infrequently. Hmmmmmmm.

@jseppi

This comment has been minimized.

Contributor

jseppi commented Mar 29, 2017

The broker allows configuration for public vs private and encrypted or not. I think the default is private actually. But very good point regarding the performance.

@toolness

This comment has been minimized.

Contributor

toolness commented Mar 29, 2017

Ah cool!

Another thing I forgot to mention about the storage (and the model in general): the whole recording/playback system is intended to be a pretty transient thing, in this PR at least, meaning that it should be fine for us to delete everything every so often. This makes me fairly unconcerned about the exact storage mechanism we use, because we won't need to actually migrate the data from one backend to another if we decide to switch storages (doing that would probably be a big hassle). So I think it's OK if we just go with this slowpoke thing for now but switch to a different storage later if it turns out to suck, because the switch shouldn't be that painful.

(Famous last words though, maybe? I've never actually done anything like this before...)

@abisker

This comment has been minimized.

Member

abisker commented Mar 29, 2017

@jseppi @toolness can we grab 5 minutes so you can talk me through the change/we can talk through how it might affect the ATO? Maybe sometime Thursday afternoon?

@toolness

This comment has been minimized.

Contributor

toolness commented Mar 29, 2017

sure! should we also talk about the slack integration we did in #1505 then too?

@abisker

This comment has been minimized.

Member

abisker commented Mar 29, 2017

Sure! Looks like Friday is best, grabbed a slot

@toolness

This comment has been minimized.

Contributor

toolness commented Mar 30, 2017

@jseppi what do you think of the refactoring I did in 9957ad7? Does it make step 3 a bit simpler?

I need to add more docs to it though. I'm hoping it might be straightforward to add support to price list replacement after this ol' refactoring.

@jseppi

This is looking really good 🎉
I like the new Recorder/Replayer implementation -- helps a ton with readability.

Um, I think all my questions/concerns are in review comments. Let me know if anything I said was unclear/dumb :)

@@ -60,11 +62,12 @@ def setUp(self):
registry._init()
def login(self, **kwargs):
kwargs['permissions'] = [PRICE_LIST_UPLOAD_PERMISSION]
perms = kwargs.get('permissions', [])
kwargs['permissions'] = perms + [PRICE_LIST_UPLOAD_PERMISSION]

This comment has been minimized.

@jseppi

jseppi Mar 31, 2017

Contributor

omg I hadn't realized you could join arrays like this in python

@@ -72,6 +75,7 @@ def clear_gleaned_data_if_different_schedule(request):
@login_required
@permission_required(PRICE_LIST_UPLOAD_PERMISSION, raise_exception=True)
@require_http_methods(["GET", "POST"])
@handle_cancel

This comment has been minimized.

@jseppi

jseppi Mar 31, 2017

Contributor

Maybe a little comment that this is # for canceling a replay

<p>
The user uploaded
<a href="{% url 'admin:data_capture_send_uploaded_file' id=original.id %}">{{ original.uploaded_file_name }}</a>
({{ original.uploaded_file.contents.size }} bytes).

This comment has been minimized.

@jseppi

jseppi Mar 31, 2017

Contributor

Use filesizeformat filter to make this more human-friendly?

This comment has been minimized.

@toolness

toolness Mar 31, 2017

Contributor

Omg I didn't know about that filter!!! AWESOME

This comment has been minimized.

@jseppi

jseppi Mar 31, 2017

Contributor

Haha, I didn't either, but Django has filters for everything so I took a look and was happy to find that it did indeed exist :)

return obj
class AttemptedPriceListSubmission(models.Model):

This comment has been minimized.

@jseppi

jseppi Mar 31, 2017

Contributor

What about adding a was_completed (or similarly named) field to indicate if the price list submission made it all the way to to the end? Or I guess actually you could make an association to the SubmittedPriceList model at the end, and then use the presence of that relation/FK existing in order to derive the boolean value. Then in the admin views (especially the list view), you could add a column to show a green or red checkbox if it the upload was submitted. I think this would help us "Technical Support Specialists" easily scan the list view.

This comment has been minimized.

@jseppi

jseppi Mar 31, 2017

Contributor

Oooo, in a similar vein we could also track if the user hit "Cancel" (and completed the modal), or if they just left it incomplete. And we could also track the furthest step they made it to... 🤔

This comment has been minimized.

@toolness

toolness Mar 31, 2017

Contributor

Yeah that would be cool! Should they wait for a v2 though? This PR is already kinda big as-is and I'm wondering if we should save enhancements for future PRs, especially with our billable hours limited as they currently are. It'd be great to have some form of this PR deployed when we roll this out to Schedule 70, so that we can help them better with any issues they run into...

This comment has been minimized.

@jseppi

jseppi Mar 31, 2017

Contributor

Yes, that's true!

<form method="post">
{% csrf_token %}
<button type="submit" name="cancel" class="button-outline" formnovalidate>Exit this replay</button>

This comment has been minimized.

@jseppi

jseppi Mar 31, 2017

Contributor

Hrm, we should make this redirect back to the admin view instead of the CALC home. I think that should be handled as a separate issue/PR though.

def __enter__(self):
return self
def __exit__(self, type, value, traceback):

This comment has been minimized.

@jseppi

jseppi Mar 31, 2017

Contributor

Neat, didn't know about __enter__ and __exit__ either. I always learn something reading your code!

final_ctx = {'step': self}
final_ctx.update(self.steps.extra_ctx_vars)
if request is not None:

This comment has been minimized.

@jseppi

jseppi Mar 31, 2017

Contributor

What do you think about throwing an exception for dev assistance if extra_ctx_processors exists but no request is provided?

'submitter',
'uploaded_file_name',
'valid_row_count',
'invalid_row_count',

This comment has been minimized.

@jseppi

jseppi Mar 31, 2017

Contributor

See other review comment about adding a column to indicate if the submission was successful or not.

})
self.assertRedirects(res, Step4Tests.url)
# Ensure that an additional replay wasn't created as a result

This comment has been minimized.

@jseppi

jseppi Mar 31, 2017

Contributor

👍

session_state=request.session[replayer.base_session_key]
)
if 'file' in request.FILES:
self.attempt.set_uploaded_file(request.FILES['file'])

This comment has been minimized.

@jseppi

jseppi Mar 31, 2017

Contributor

Are we setting a "max upload size" anywhere, or is there a sensible default as part of django?

@jseppi

jseppi approved these changes Mar 31, 2017

@jseppi

This comment has been minimized.

Contributor

jseppi commented Mar 31, 2017

:shipit:

@toolness toolness merged commit 8ac4c0a into develop Apr 12, 2017

5 checks passed

QuantifiedCode 25 minor issues introduced. 13 issue(s) fixed.
Details
codeclimate no new or fixed issues
Details
codeclimate/coverage 95.75% (+0.3%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@toolness toolness deleted the remember-uploaded-files branch Apr 12, 2017

@NoahKunin NoahKunin removed the in progress label Apr 12, 2017

@jseppi jseppi referenced this pull request Apr 14, 2017

Merged

Tag and release v2.7.0 #1545

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment