Handle huge CSV files with streaming reads #2
For this to work I need to move away from the approach that copies the content of the CSV file directly into the textarea (datasette_import/templates/import_create_table.html, lines 342 to 349 in 0b294dc).
Instead I'm going to treat files (both opened and drag-dropped) slightly differently - I'll hide the textarea and replace it with a static element that previews the first X bytes of the file, plus a button to cancel the file upload which switches back to the paste area. (This is why I renamed the plugin.)

I'm going to need to read the file twice - once for the 100 row preview, and then again for the actual import.
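A rough sketch of that preview step, assuming a file.slice() prefix read (the PREVIEW_BYTES constant and previewEl element are illustrative placeholders here, not plugin code):

// Sketch: read only the first PREVIEW_BYTES of the file for the preview element,
// so huge files never get loaded into memory in full.
const PREVIEW_BYTES = 64 * 1024; // hypothetical preview size

function showFilePreview(file, previewEl) {
  const reader = new FileReader();
  reader.onload = (e) => {
    previewEl.textContent = e.target.result; // show just the prefix of the file
  };
  reader.readAsText(file.slice(0, PREVIEW_BYTES));
}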
Made an incomplete start on this here:

diff --git a/datasette_import/templates/import_create_table.html b/datasette_import/templates/import_create_table.html
index b7b1d0d..9492958 100644
--- a/datasette_import/templates/import_create_table.html
+++ b/datasette_import/templates/import_create_table.html
@@ -241,6 +241,8 @@ function updated() {
const limited = rateLimiter(updated, 1000);
+let selectedFile = null;
+
contentTa.addEventListener('change', limited);
contentTa.addEventListener('keyup', limited);
limited();
@@ -340,16 +342,27 @@ function parseJsonArray(string) {
function setupTextareaWithFileInput(textarea) {
function readFileAndUpdateTextarea(file) {
- const reader = new FileReader();
- reader.onload = (e) => {
- textarea.value = e.target.result;
- limited();
- };
- reader.readAsText(file);
+ // Special handling for tsv/csv
+ if (["text/tab-separated-values", "text/csv"].includes(file.type)) {
+ selectedFile = file;
+ textarea.value = 'Selected file: ' + file.name;
+ textarea.disabled = true;
+ document.querySelector('.import-file-input').value = file;
+ fileInput.value = file;
+ } else {
+ console.log(file);
+ const reader = new FileReader();
+ reader.onload = (e) => {
+ textarea.value = e.target.result;
+ limited();
+ };
+ reader.readAsText(file);
+ }
}
// Create a file input element
const fileInput = document.createElement('input');
+ fileInput.className = 'import-file-input';
fileInput.type = 'file';
fileInput.style.display = 'block';
fileInput.addEventListener('change', (event) => {
On Mobile Safari on my iPhone trying to import a 250MB CSV file crashed the browser, because it tried to dump the entire thing into the <textarea>.

I think Papaparse can handle these better than that - by only loading chunks of the CSV into memory at a time and writing those to Datasette without loading the whole thing.
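A minimal sketch of that chunked approach with Papa Parse - writeRowsToDatasette() here is a hypothetical helper standing in for whatever ends up posting each batch of rows to Datasette:

// Sketch: stream the CSV in chunks so the whole file never sits in memory.
// writeRowsToDatasette() is hypothetical, not existing plugin code.
function streamCsvToDatasette(file) {
  Papa.parse(file, {
    header: true,
    chunk: async (results, parser) => {
      parser.pause();                           // hold parsing while we write
      await writeRowsToDatasette(results.data); // e.g. POST this batch of rows
      parser.resume();
    },
    complete: () => console.log('Finished importing', file.name),
    error: (err) => console.error('Parse error', err)
  });
}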
I built a tiny prototype and tested that on my iPhone here: https://static.simonwillison.net/static/2024/csv-row-count.html (counting one row at a time) and https://static.simonwillison.net/static/2024/csv-row-count-chunk.html (counting rows in chunks) - in both cases it could handle a giant CSV file without crashing, although here it was just incrementing a row counter.
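For comparison, the one-row-at-a-time counting in the first prototype boils down to roughly this (a sketch, not the exact code on that page):

// Sketch: count rows one at a time with Papa Parse's step callback.
function countCsvRows(file) {
  let count = 0;
  Papa.parse(file, {
    step: () => { count += 1; },               // called once per parsed row
    complete: () => console.log('Rows:', count)
  });
}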