
Upload HTML directly to S3 bucket, do not dump in database #273

Closed
btbonval opened this issue Jan 13, 2014 · 58 comments

Comments

@btbonval
Member

The S3 static buckets for beta and prod now have an 'html' directory in the base.

When we have HTML, instead of storing it into the database, we want to write it as a file onto S3 (possibly name the file by hash). Instead of storing the html in the database, we want to store the relative static path into something like Note.static_relpath.

The Note detail template would then use an IFRAME pointing at {{ STATIC_URL }}html/{{Note.static_relpath}}.

@ghost ghost assigned btbonval Jan 13, 2014
@btbonval
Member Author

We will also need to post-process the HTML already in Production and on the VM and push that out to S3.

This will be a one time deal rather than a recurring thing, so a quick script that doesn't need to hang around should suffice.

This was referenced Jan 13, 2014
@btbonval
Member Author

Did a quick search to see how out of fashion IFRAMEs are and found a question about IFRAMEs and SEO. Since SEO is a pretty recent topic, there is a good comment in here: http://productforums.google.com/forum/#!topic/webmasters/Y6DyIR7wLXg

Make sure there is an anchor link to IFRAME content on the page with the IFRAME. That sounds like good practice anyway, in case someone turns off IFRAMEs because they're so 1995.
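A minimal sketch of what that could look like in the Note detail template (variable names assumed from the earlier comment, not actual project code):

```html
{# Sketch only: embed the static HTML and keep an anchor fallback for #}
{# crawlers and for browsers with IFRAMEs disabled. #}
<iframe src="{{ STATIC_URL }}html/{{ note.static_relpath }}"></iframe>
<a href="{{ STATIC_URL }}html/{{ note.static_relpath }}">View the note HTML directly</a>
```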

@btbonval
Member Author

sanitize_html parses HTML in place on the model: it loads self.html and saves back to self.html. We probably want to change this into a filter.

@btbonval
Member Author

We probably won't need to batch process HTML across notes, and if we do, the current function will need to be rewritten anyway. Should remove this:
https://github.com/FinalsClub/karmaworld/blob/b7ebe2b1d390232a16618977fb3b19cfa790f7b9/karmaworld/apps/notes/management/commands/process_note_html.py

BeautifulSoup is already part of the requirements, while lxml is used for exactly one thing: sanitize_html. I have to rewrite sanitize_html to be a filter anyway, so if I replace lxml while I'm at it, the world will be a better, brighter place.
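For reference, the shape of that refactor might look like this (a sketch only; `_clean` is a stand-in placeholder for the real sanitizing pass, not the project's code):

```python
# Sketch: _clean stands in for the real lxml/BeautifulSoup cleaning logic.
# The point is the signature change from in-place mutation to a pure filter.
def _clean(html):
    return html.strip()  # placeholder transformation

# Before (in-place on the model):
#     def sanitize_html(self):
#         self.html = _clean(self.html)

# After (a filter usable on any string, with no model coupling):
def sanitize_html(html):
    """Return a sanitized copy of the given HTML; touches no model."""
    return _clean(html)
```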

@btbonval
Member Author

No need to store a URL for the HTML snippet. Note.slug is supposed to be unique. I'm adding unique, not-null to Document.slug which will inherit to Note.slug. The static S3 filename will be based on the Note slug.
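In sketch form (a hypothetical helper; the actual convention is whatever Note.get_relative_s3_path() ends up doing):

```python
def get_relative_s3_path(slug):
    # Hypothetical: static S3 filename derived from the unique Note slug,
    # rooted under the html/ directory mentioned at the top of this issue.
    return 'html/{0}.html'.format(slug)
```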

@btbonval
Member Author

was trying from django.core.files.storage import default_storage to write files.

unfortunately:

>>> default_storage.bucket_acl
'public-read'

Our static_s3.py configs are for a read-only API interface, which would mean there's no uploading? How the heck does collectstatic work if it can't actually write to the S3 bucket using the static_s3.py settings?

I am missing something key.

@btbonval
Member Author

I was trying to create a file by simply opening it and writing to it, as per http://django-storages.readthedocs.org/en/latest/backends/amazon-S3.html#storage

That gives me an IOError even though open is set to write/create mode:

>>> somefile = default_storage.open('bryantestfile.html', 'w')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/django/core/files/storage.py", line 33, in open
    return self._open(name, mode)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/storages/backends/s3boto.py", line 177, in _open
    raise IOError('File does not exist: %s' % name)
IOError: File does not exist: bryantestfile.html

Maybe Django sets the default_storage to read-only mode for static hosting reasons, but switches it for collectstatic. Clearly the bucket has everything it needs:

>>> default_storage.bucket.get_acl()
<Policy: Andrew (owner) = FULL_CONTROL>

@btbonval
Member Author

Nothing in docs about default_storage.acl. Nothing about ACL in django-storages. Only thing about ACL is in s3boto, but we can see that bucket ACL from s3boto is just fine.

guess who has two thumbs and has to read source code. this guy.


@btbonval
Member Author

>>> import storages.backends.s3boto
>>> protected_storage = storages.backends.s3boto.S3BotoStorage(acl='private')
>>> with protected_storage.open('html/bryantest.html', 'w') as s3file:
...     s3file.write(html)
... 
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/django/core/files/storage.py", line 33, in open
    return self._open(name, mode)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/storages/backends/s3boto.py", line 177, in _open
    raise IOError('File does not exist: %s' % name)
IOError: File does not exist: html/bryantest.html
>>> protected_storage.acl
'private'
>>> protected_storage.bucket_acl
'public-read'
>>> protected_storage.bucket.get_acl()
<Policy: Andrew (owner) = FULL_CONTROL>

DO NOT WANT


@btbonval
Member Author

According to the above link, the acl comes from here: http://docs.aws.amazon.com/AmazonS3/latest/dev/ACLOverview.html

'public-read' should still give the owner full control, but the allusers group gets read.

It would seem like a bad idea to change the S3 ACL from 'public-read'. Not sure how to access this S3boto stuff as the owner.

@btbonval
Member Author

Files are called Keys in the raw s3boto bucket. e.g. default_storage.bucket.get_key('img/asc.gif'). new_key() creates a theoretical file on the S3 bucket. Key.open*() commands don't work, which would be nice for writing directly to the S3 file. Key.send_file() does work. Wrap up the HTML in a little StringIO file-like object and BAM, I just uploaded to S3.

Tested and confirmed. Ugly as junk.

>>> flo = StringIO(html)
>>> nk = default_storage.bucket.new_key('html/bryantest.html')
>>> nk.exists()
False
>>> nk.send_file(flo)
>>> nk.exists()
True
>>> with default_storage.open('html/bryantest.html', 'r') as s3file:
...     print s3file.read()
... 

<html>
<body>
<a href="whaaaat">the</a>
<a href="test" target="_blank">
woop
</a>
<a href="nope" target="werrird">wa</a>
</body>
</html>

@btbonval
Member Author

Most of the code is written now. I tried to kick off a process to convert HTML in the database to files on S3, but failed:

(venv)vagrant@vagrant-ubuntu-precise-32:~/karmaworld$ python manage.py populate_s3
Traceback (most recent call last):
...
  File "/home/vagrant/karmaworld/karmaworld/apps/notes/management/commands/populate_s3.py", line 42, in handle
    htmlflo = StringIO(note.html)
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue001' in position 10111407: ordinal not in range(128)

"The StringIO object can accept either Unicode or 8-bit strings, but mixing the two may take some care. If both are used, 8-bit strings that cannot be interpreted as 7-bit ASCII (that use the 8th bit) will cause a UnicodeError to be raised when getvalue() is called."
http://docs.python.org/2/library/stringio.html

might as well pass the HTML into BeautifulSoup to see if it can read in the data and output it in consistent UTF-8.
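In other words: encode to UTF-8 bytes before wrapping in a file-like object. A sketch (Python 3 shown for clarity; in this thread the Python 2 fix ends up being soup.prettify("utf-8"), which likewise yields a byte string):

```python
from io import BytesIO

def html_to_filelike(html):
    """Wrap HTML in a file-like object that is safe to upload.

    Encoding up front avoids the mixed unicode/8-bit errors described
    above, because the buffer only ever holds UTF-8 bytes.
    """
    if isinstance(html, str):
        html = html.encode('utf-8')
    return BytesIO(html)
```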

@btbonval
Member Author

liar liar pants on fire. It turns out BeautifulSoup does not output UTF-8 by default, even though all the docs say it does. Gotta run soup.prettify("utf-8") and suddenly StringIO is pleased.

@btbonval
Member Author

oh good. random disconnection errors or something. More or less exactly what I want to deal with right now.

(venv)vagrant@vagrant-ubuntu-precise-32:~/karmaworld$ python manage.py populate_s3
Processing html/mit6_007s11_lec07pdf.html
Traceback (most recent call last):
...
  File "/home/vagrant/karmaworld/karmaworld/apps/notes/management/commands/populate_s3.py", line 48, in handle
    newkey.send_file(htmlflo)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/boto/connection.py", line 910, in make_request
    return self._mexe(http_request, sender, override_num_retries)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/boto/connection.py", line 872, in _mexe
    raise e
socket.error: [Errno 32] Broken pipe

@btbonval
Member Author

well I guess I won't be running this overnight to process.

Can't test if anything worked until I get one Note onto S3 to see if my VM hosts it properly. Can't get one Note onto S3 because broken pipe.

Pushing WIP to origin as feature_html_on_s3 with commit HEAD 87bf8e2

@btbonval
Member Author

rebased master into branch and ran tests.

... still running.

still running?

@btbonval
Member Author

top says the CPU is mostly running SSHD and top. tests deadlocked?

@btbonval
Member Author

Looks like the manage.py tests are stuck running Xvfb, which is in turn not running anything (although it should run firefox). Time to double check master still works.

vagrant@vagrant-ubuntu-precise-32:~$ ps ax | grep python
 3219 pts/1    S+     0:02 python manage.py test
 3286 pts/0    S+     0:00 grep --color=auto python
vagrant@vagrant-ubuntu-precise-32:~$ pstree -p | grep -C 3 3219
        |-rsyslogd(828)-+-{rsyslogd}(837)
        |               |-{rsyslogd}(838)
        |               `-{rsyslogd}(839)
        |-sshd(799)-+-sshd(1158)---sshd(1244)---bash(1245)---python(3219)---Xvfb(3242)
        |           `-sshd(2078)---sshd(2164)---bash(2165)-+-grep(3289)
        |                                                  `-pstree(3288)
        |-udevd(323)-+-udevd(399)

@btbonval
Member Author

Tests completed on master branch in ~4 minutes.

Something tripped up feature_html_on_s3 branch so that tests deadlock :( No backtraces to help.

@btbonval
Member Author

python manage.py test -v 2 seems to be giving better output. Looks to be hung up on Evernote.

Test searching for a school by partial name ... ok
Test upload of an Evernote note ...

Same pstree as before with the dangling Xvfb. Definitely stuck here.

Code:

def testEvernoteConversion(self):
    """Test upload of an Evernote note"""
    self.doConversionForPost({'fp_file': 'https://www.filepicker.io/api/file/vOtEo0FrSbu2WDbAOzLn',
                              'course': str(self.course.id),
                              'name': 'KarmaNotes test 3',
                              'tags': '',
                              'mimetype': 'text/enml'})

calls

def doConversionForPost(self, post, user=None, session_key=None):
    self.assertEqual(Note.objects.count(), 0)
    r_d_f = RawDocumentForm(post)
    self.assertTrue(r_d_f.is_valid())
    raw_document = r_d_f.save(commit=False)
    raw_document.fp_file = post['fp_file']
    convert_raw_document(raw_document, user=user, session_key=session_key)
    self.assertEqual(Note.objects.count(), 1)

Only place I can imagine it hanging is on convert_raw_document?

@btbonval
Member Author

The feature_html_on_s3 branch has no changes in the raw_document app.

@btbonval
Member Author

Double ctrl-c got a super long backtrace!

Test upload of an Evernote note ... ^C^CTraceback (most recent call last):
  File "manage.py", line 14, in <module>
    execute_from_command_line(sys.argv)
...
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/django/core/management/base.py", line 255, in execute
    output = self.handle(*args, **options)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/south/management/commands/test.py", line 8, in handle
    super(Command, self).handle(*args, **kwargs)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/django/core/management/commands/test.py", line 89, in handle
    failures = test_runner.run_tests(test_labels)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/django_nose/runner.py", line 155, in run_tests
    result = self.run_suite(nose_argv)
...
  File "/usr/lib/python2.7/unittest/case.py", line 327, in run
    testMethod()
  File "/home/vagrant/karmaworld/karmaworld/apps/document_upload/tests.py", line 53, in testEvernoteConversion
    'mimetype': 'text/enml'})
  File "/home/vagrant/karmaworld/karmaworld/apps/document_upload/tests.py", line 36, in doConversionForPost
    convert_raw_document(raw_document, user=user, session_key=session_key)
  File "/home/vagrant/karmaworld/karmaworld/apps/notes/gdrive.py", line 244, in convert_raw_document
    newkey.send_file(htmlflo)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/boto/s3/key.py", line 727, in send_file
    query_args=query_args)

Ahh that'd certainly be unique to this branch. Hanging on direct upload to S3. The html folder on the appropriate S3 is empty. Guess I'll play with this feature a little more, it's still leaving cake on the toothpick.

@btbonval
Member Author

note for later: It seems worth moving this one function for uploading to S3 from gdrive.py into Note.

@btbonval
Member Author

Testing a PDF that renders to 2.87 MiB of HTML using (mostly) what would be performed right now. Upload seems to do zilch.

In [7]: rds = RawDocument.objects.all()
In [14]: fp_file = rds[1].get_file()
In [19]: html = pdf2html(fp_file.read())
Preprocessing: 88/88
Working: 88/88
In [20]: len(html)
Out[20]: 3012503
In [21]: fhtml = notes[0].filter_html(html)
In [22]: len(fhtml)
Out[22]: 3365756
In [23]: filepath = notes[0].get_relative_s3_path()
In [24]: filepath
Out[24]: 'html/certificate-path-validation-testingpdf.html'
In [28]: fhtmlflo = StringIO(fhtml)
In [29]: newkey = default_storage.bucket.new_key(filepath)
In [30]: newkey.exists()
Out[30]: False
In [33]: fhtmlflo.seek(0)
In [35]: def status_update(transmit, maximum): print "transferred {0} / {1}".format(transmit, maximum)
In [36]: newkey.send_file(fhtmlflo, cb=status_update)
transferred 0 / 0
transferred 0 / 0
transferred 0 / 0
...

@btbonval
Member Author

Rewrote upload code to use set_contents_from_string. Moved upload code into Note. Replaced copy pasta in gdrive.py and process_s3.py to make use of the upload code in Note. commit 7b61d07

Running tests again.

@btbonval
Member Author

a number of tests errored. It looks like the tests hung, but firefox is actively running at the moment. It's been 5 minutes. :/

@btbonval
Member Author

karmaworld.apps.notes.models: ERROR: Error with IndexDen:
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/indextank/client.py", line 131, in create_index
    raise TooManyIndexes(e.msg)
TooManyIndexes: "Too many indexes for this account"

also made a copy/paste mistake.

@btbonval
Member Author

A few errors showing up, hanging on the firefox test as before.

This time, however, there are three HTML files on the S3!

The hanging thing bothers me. I'll have to use verbose output to see where that is happening.

@btbonval
Member Author

Test upload of an Evernote note ... ok
Test upload of a file with a bogus mimetype ... ok

No files in S3 after these.

The later upload tests have files in S3 after they run.

@btbonval
Member Author

Tests didn't hang using verbose output. How bizarre.

Test that Note.save() doesn't make a slug ... ERROR
Search for a note within IndexDen ... ERROR
Test that the slug field is slugifying unicode Note.names ... ok
ERROR
testCreateCourse (test_selenium.AddCourseTest) ... ok

This test appears moot now that slug is unique and not nullable.

======================================================================
ERROR: Test that Note.save() doesn't make a slug
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vagrant/karmaworld/karmaworld/apps/notes/tests.py", line 85, in test_save_no_slug
    self.note.save() # re-save the note
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 54, in execute
    return self.cursor.execute(query, args)
IntegrityError: null value in column "slug" violates not-null constraint

I'm guessing this is due to IndexDen not adding any more indices right now.

======================================================================
ERROR: test suite for <class 'karmaworld.apps.notes.tests.TestNoes'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/nose/suite.py", line 227, in run
    self.tearDown()
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/nose/suite.py", line 350, in tearDown
...
  File "/home/vagrant/karmaworld/karmaworld/apps/notes/tests.py", line 58, in tearDownClass
    api.delete_index(secret.INDEX)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/indextank/client.py", line 38, in delete_index
    self.get_index(index_name).delete_index()
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/indextank/client.py", line 152, in delete_index
    _request('DELETE', self.__index_url)
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/indextank/client.py", line 457, in _request
    raise HttpException(response.status, response.body)
HttpException: HTTP 404: ["No index existed for the given name"]

Three failures from error, no true failures.

Time to check it by hand!

@btbonval
Member Author

Removed obsolete null Note.slug test, down to 2 errors caused by IndexDen. Can't get much further than this for now.

@btbonval
Member Author

Uploaded objects on S3 do not grant permission to open/download them.

Need to do what is in this comment: #68 (comment)

@btbonval
Member Author

Figured out the IndexDen problem. Back to using Beta's IndexDen and all the tests ran just fine.

@btbonval
Member Author

These docs are about as helpful as a bag of wet socks. I guess there are uses for a bag of wet socks, but not many.
http://boto.readthedocs.org/en/latest/ref/s3.html

Here's what an Everyone Open/Download policy looks like in s3boto:

In [35]: policy.acl.grants[4].permission
Out[35]: u'READ'
In [36]: policy.acl.grants[4].display_name
In [37]: policy.acl.grants[4].type
Out[37]: u'Group'
In [38]: policy.acl.grants[4].uri
Out[38]: u'http://acs.amazonaws.com/groups/global/AllUsers'
In [39]: policy.acl.grants[4].id
In [42]: policy.acl.grants[4].__class__
Out[42]: boto.s3.acl.Grant

So to make that, it'd be something like

from boto.s3.acl import Grant
# once key exists
policy = newkey.get_acl()
policy.acl.add_grant(Grant(permission=u'READ', type=u'GROUP', uri=u'http://acs.amazonaws.com/groups/global/AllUsers'))

@btbonval
Member Author

Permission attempt failed. No errors, but the permissions according to S3 do not include Everyone.

Time for guess and check.

@btbonval
Member Author

I think the first problem is that changing the policy as noted above does not save that policy remotely. Probably need to call one of the newkey.set_*acl() commands.

In [12]: newkey.set_acl(policy)
S3ResponseError: S3ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>MalformedACLError</Code><Message>The XML you provided was not well-formed or did not validate against our published schema</Message><RequestId>3E57DBBC88D03C8E</RequestId><HostId>W1O4/vy8nDyXEhcgawGHyJrCFmGsaYpqwPcE5CwaLVWVXhuSfB/Suhq/6w0YFMSu</HostId></Error>

Here's a problem. Converting the permission into XML ignores the AllUsers URI.

In [23]: all_read.uri
Out[23]: u'http://acs.amazonaws.com/groups/global/AllUsers'
In [24]: all_read.to_xml()
Out[24]: u'<Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="GROUP"><EmailAddress>None</EmailAddress></Grantee><Permission>READ</Permission></Grant>'

@btbonval
Member Author

type was set to "GROUP", but per the boto source code it is case sensitive: it must be 'Group'.
https://github.com/boto/boto/blob/develop/boto/s3/acl.py#L155-L156

I'm tempted to write a ticket over there, but it's probably one of those things where the standard for the XML or whatever is case sensitive, therefore the Python must be as well.
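So the corrected construction would be something like the following sketch (note that even with this fix, the thread goes on to find the full policy round-trip still fails and falls back to set_xml_acl):

```python
# Grant arguments with the case-sensitive 'Group' type. Per boto's
# s3/acl.py, 'GROUP' doesn't match and the Grantee is serialized via the
# EmailAddress branch instead, dropping the AllUsers URI from the XML.
ALL_USERS_READ = dict(
    permission=u'READ',
    type=u'Group',  # must be 'Group', not 'GROUP'
    uri=u'http://acs.amazonaws.com/groups/global/AllUsers',
)

# Usage, assuming boto is available:
# from boto.s3.acl import Grant
# policy.acl.add_grant(Grant(**ALL_USERS_READ))
```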

@btbonval
Member Author

Here's what the grant XML should look like when it's correct vs what is being generated (identical):

In [48]: oldkey.get_xml_acl()
Out[48]: '...<Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Group"><URI>http://acs.amazonaws.com/groups/global/AllUsers</URI></Grantee><Permission>READ</Permission></Grant>...'
In [50]: all_read.to_xml()
Out[50]: u'<Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Group"><URI>http://acs.amazonaws.com/groups/global/AllUsers</URI></Grantee><Permission>READ</Permission></Grant>'

So the problem appears to be with boto's ability to generate either the ACL XML or the Policy XML in a way that satisfies S3.

As an experiment, let's just take the preexisting acl text and write it to the new key.

In [51]: newkey.set_xml_acl(oldkey.get_xml_acl())
In [52]:

Looks good on the S3 management page.
I guess I'll just grab that raw XML and put that into the source code. :(

@btbonval
Member Author

Fugly fugly fugly but it worked. That XML ACL is huge to be dropping in as a string, but boto is too messed up to do anything else I guess. I see the file on S3 with proper ACLs.

When viewing on the site, the URL asks if I want to download it, rather than showing it in the IFRAME.

Changed over to static S3 properly, and it still pops up a download question. It's an HTML file! Maybe the meta data is wrong?

@btbonval
Member Author

Yup. Metadata problem.
content-type: application/octet-stream

Gotta make sure these things all get uploaded with content-type as text/html.

That fixes the problem, but it takes forever to download from S3! Also the one I'm looking at looks terrible.

@btbonval
Member Author

DIEEEEEE BOTOOOOO!!!! (read as: boto.s3 doesn't do nothin with metadata!?)

In [5]: oldkey = default_storage.bucket.new_key('html/14_motor1pdf.html')
In [6]: oldkey.exists()
Out[6]: True
In [7]: oldkey.metadata
Out[7]: {}
In [8]: oldkey.get_metadata()
---------------------------------------------------------------------------
TypeError: get_metadata() takes exactly 2 arguments (1 given)
In [9]: oldkey.get_metadata('content-type')
In [10]: oldkey.get_metadata('Content-Type')
In [11]: help(oldkey.get_metadata)
Help on method get_metadata in module boto.s3.key:

get_metadata(self, name) method of boto.s3.key.Key instance
In [15]: oldkey.get_metadata(oldkey.name)
In [16]:

btw there is absolutely a content-type on every single object, but especially this one, where I explicitly set it.

@btbonval
Member Author

Also tried the above with lookup instead of new_key, but I suspect they are exactly the same thing.

@btbonval
Member Author

get_metadata is just a wrapper around metadata attribute.
https://github.com/boto/boto/blob/develop/boto/s3/key.py#L523-L524

Here's where it gets metadata, during open_read() (not during __init__(), of course!). Not even a memoized fetching dict, just a plain dict.
https://github.com/boto/boto/blob/develop/boto/s3/key.py#L274-L275

I don't have enough middle fingers for this.

In [25]: oldkey.open_read()
In [26]: oldkey.metadata
Out[26]: {}
In [27]: oldkey.metadata.__class__
Out[27]: dict

@btbonval
Member Author

So even if I /read/ the metadata, it'd just be a local cached dict that gets updated.
https://github.com/boto/boto/blob/develop/boto/s3/key.py#L526-L534
https://github.com/boto/boto/blob/develop/boto/s3/key.py#L536-L537

It doesn't push that stuff anywhere. ever.

@btbonval
Member Author

Two types of metadata.
http://www.bucketexplorer.com/documentation/amazon-s3--amazon-s3-objects-metadata-http-header.html

Looks like HTTP Headers are used to set the Metadata for Files. But when? on upload?

@btbonval
Member Author

From above link:
"HTTP Headers: You can specify metadata for Amazon S3 Objects (Files), which are Name- Value pairs, which can be sent along with Amazon S3 PUT Request, similar to other standard HTTP headers. Once you upload the S3 Object, you cannot update the Object metadata on Amazon S3. The only way to modify the Object Metadata is to make a copy of the Object and set the Metadata."

They can be changed at the S3 console. So it looks like headers needs a dict with Content-Type. Let's try!

@btbonval
Member Author

Well I guess I dun gone shoopted some woops. Passed {'Content-Type': 'text/html'} into Key.set_contents_from_string() headers parameter. Wouldn't you know it, S3 manager shows the right content type.

Better still, the file uploaded with a preview ready to go. The HTML still looks like junk, but that's the fault of pdf2html or something. Not the problem of this ticket.
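The fix boils down to a tiny helper (a sketch; `key` is assumed to be a boto 2 Key, whose set_contents_from_string accepts a headers dict as used above):

```python
def upload_html(key, html):
    """Upload an HTML string to an S3 key with the right Content-Type.

    `key` is assumed to be a boto Key. Without the header, S3 stores the
    object as application/octet-stream and the browser offers a download
    instead of rendering the page inside the IFRAME.
    """
    key.set_contents_from_string(html, headers={'Content-Type': 'text/html'})
```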

@btbonval
Member Author

pulled in master and running tests.

In the meantime, clicking around the VM site. Got a weird javascript error:

TypeError: $(...).dataTable is not a function

Testing finished. One error. Seems gdrive auth was refused? Guess I'll run it all again.

======================================================================
ERROR: Test setting the user of an uploaded document
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vagrant/karmaworld/karmaworld/apps/document_upload/tests.py", line 117, in testSessionUserAssociation3
    session_key=s.session_key)
  File "/home/vagrant/karmaworld/karmaworld/apps/document_upload/tests.py", line 36, in doConversionForPost
    convert_raw_document(raw_document, user=user, session_key=session_key)
  File "/home/vagrant/karmaworld/karmaworld/apps/notes/gdrive.py", line 203, in convert_raw_document
    service = build_api_service()
  File "/home/vagrant/karmaworld/karmaworld/apps/notes/gdrive.py", line 73, in build_api_service
    return build('drive', 'v2', http=credentials.authorize(httplib2.Http()))
...
  File "/var/www/karmaworld/venv/local/lib/python2.7/site-packages/oauth2client/client.py", line 629, in _do_refresh_request
    raise AccessTokenRefreshError(error_msg)
AccessTokenRefreshError: Invalid response 403.

@btbonval
Member Author

Random fluke. Second test run finished fine.

Removed all pyc files and restarted the VM web system. Javascript error cleaned up.

Created a course. Uploaded a PDF. Viewed the PDF. All good.

Deleted course from moderator page. Cascaded down to note and tags. Evyting be irie.

Beta can't handle this kind of awesome, so holding back merge until tomorrow's meeting.

@AndrewMagliozzi
Member

Sweet. I can't wait to see it in action later today.


@btbonval
Member Author

Merged into master at commit 5319934

@btbonval
Member Author

How to delete all files in the html/ directory, for reference.
http://stackoverflow.com/questions/11426560/amazon-s3-boto-how-to-delete-folder

from django.core.files.storage import default_storage
keys = []
for key in default_storage.bucket.list('html/'):
    if key.name[-1] == '/':
        # placeholder for a directory, don't delete
        continue
    keys.append(key)
    if len(keys) >= 250:
        default_storage.bucket.delete_keys(keys)
        keys = []
if len(keys):
    default_storage.bucket.delete_keys(keys)
