
[Feature Request]: CSV import/export #64

Closed
Mega-Volti opened this issue May 26, 2023 · 24 comments · Fixed by #204

Comments

@Mega-Volti

Mega-Volti commented May 26, 2023

For general import/export functionality, CSV is the most flexible. Ultimately, most data from other services can be converted to CSV format (more or less easily), making it a near-universal import/export format.

CSV import is the harder of the two, I assume. Ideal features for it are:

  • A well-defined input format, where name, IGDB/TVDB/IMDB/Goodreads/etc. IDs, date watched/read/played/etc., status etc. can be provided
  • Only either name or ID should be mandatory; if one is provided, the other should be looked up automatically. For other fields, sane defaults should be set (e.g. if no date watched is set, import with today's date)
  • Name matching necessarily needs to be fuzzy; although I think the DB APIs do this already, the first match to the name should be fine
  • There should be some sort of reasonably easy check for name matching, to see whether the fuzzy match got the right item

As an example, I think Simkl (https://simkl.com/) does this nearly perfectly. It does name matching, presents a tabular-style view for easy manual checking of important items, and it creates a CSV error report for items that couldn't be matched, which enables reasonably easy manual import of the rest.

CSV export on the other hand should be reasonably easy I think. That should just be a dump of all relevant fields in the database, written to a text file, using the same column headers as the import functionality. Both Simkl and Goodreads have excellent CSV export functionality which might be a good template.

@IgnisDa
Owner

IgnisDa commented May 29, 2023

I agree with needing import/export but am not sure about the export format being CSV since it does not support nested data structures. I think JSON would be better.

@Mega-Volti
Author

Mega-Volti commented May 29, 2023

I think CSV is the easiest and most universal but yes, if some functionality actually does require nested data structures, then creating a JSON file from a CSV isn't all that complicated anyway, so either will work.

What use case do you see for nested data structures? Both Goodreads and Simkl offer CSV exports (which makes CSV kind of a universal import format as well), so at least for media tracking there doesn't seem to be a need for anything more complicated.

@IgnisDa
Owner

IgnisDa commented May 30, 2023

What use case do you see for nested data structures?

For example, a media item can be added to the seen history multiple times. How can I represent this in a CSV?
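To make the tradeoff concrete, here is a small Python sketch (with made-up illustrative data) of how repeated views nest naturally in JSON, while a flat CSV row either drops rewatches or has to duplicate the item across rows:

```python
import json

# One media item watched twice: JSON nests the full seen history.
item = {
    "title": "Sherlock",
    "seen_history": [
        {"ended_on": "2017-01-15"},
        {"ended_on": "2021-03-02"},  # rewatch
    ],
}
print(json.dumps(item, indent=2))

# A single CSV row can only hold one date, so it must either keep
# only the latest one (as Simkl's CSV does) or repeat the item on
# multiple rows, losing the one-row-per-item property.
latest = max(s["ended_on"] for s in item["seen_history"])
csv_row = f'{item["title"]};{latest}'
print(csv_row)  # Sherlock;2021-03-02
```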

@Mega-Volti
Author

I think Simkl "solves" this by only saving the latest view date in the CSV, plus potentially a view count. But you are correct, the view history (dates) for the other times would be lost in that case, while they could be preserved within a JSON structure. Simkl also offers JSON export functionality which preserves all information.

It's certainly a tradeoff, complete information vs simplicity of the format. Since either one is text-based and it's easy enough to convert one to the other manually if necessary, I think either would be a great option.

@IgnisDa
Owner

IgnisDa commented May 30, 2023

Cool. I think I'll go with JSON exports then.

@Mega-Volti
Author

Mega-Volti commented May 30, 2023

Awesome :) Imports as well? That's probably the harder part to get right (name matching and reports on what got matched), although personally I think it's the more important one. With a huge existing collection, entering data manually might not be feasible at all, which means switching to Ryot might be impossible in that case.

@IgnisDa
Owner

IgnisDa commented May 31, 2023

The matching idea is pretty good and I'd like to implement it for Goodreads too. I have an idea for an import view in mind too. I'll draw it out in Excalidraw and upload it here later.

@IgnisDa
Owner

IgnisDa commented May 31, 2023

Here is a mockup:

[Excalidraw mockup image: Untitled-2023-02-06-1728]

(You can also load this file into Excalidraw directly and edit it there to make changes)

@Mega-Volti
Author

Mega-Volti commented May 31, 2023

Nice, I really like Excalidraw!

It probably needs either more columns for more types of IDs (IGDB for example), or maybe a column "ID type" and another "ID" so that any type of ID can be handled flexibly and the format doesn't need to change when new types of tracking are added. Or maybe some type of linked tables, since one item should probably be able to have multiple IDs (e.g. both an IMDB and a TVDB ID).

I like the "is perfect match" part. Maybe instead of having it boolean, make it "match type" with 3 options, "perfect", "best effort", "none". For practical usability, it would also be awesome to have the list easily copy&pastable so that the "best effort" ones can be checked manually and the "none" ones can be added manually.

The ideal solution would of course be a kind of dropdown menu that shows all matches (for the fuzzy name match), so if the ideal one wasn't found, the correct one can be selected manually right in the import section. But that's probably too complicated to offer a UI for; being able to copy&paste the names into some spreadsheet to go through them manually should be enough.

The way Simkl handles it is that after the import, it saves a CSV file with all errors, which then is a great reference for the manual checks / corrections. Not sure whether that is feasible for Ryot, though.

I also just checked the Simkl export. They use the following columns:

  • SIMKL_ID
  • Title
  • Type (movie, tv show etc.)
  • Year (release year)
  • Watchlist (completed, currently watching, dropped etc.)
  • LastEpWatched (episode number - since it's just CSV it doesn't do the accordion thing but only saves the last ep)
  • WatchedDate
  • Rating
  • Memo
  • TVDB (ID, blank if not a show)
  • TMDB (ID, blank if not a movie)
  • IMDB (ID, blank if not a movie)

@IgnisDa
Owner

IgnisDa commented May 31, 2023

Would it be possible for you to share this Simkl export with me? My email ID is in my GitHub profile.

@Mega-Volti
Author

Mega-Volti commented May 31, 2023

Sure, no problem at all. I assume you don't need my full history for everything, a few lines for TV shows, movies and anime (that's what Simkl tracks, sadly no games) are enough, right? I trimmed it down to examples which are representative I think, this easily fits into a comment here:

SIMKL_ID;Title;Type;Year;Watchlist;LastEpWatched;WatchedDate;Rating;Memo;TVDB;TMDB;IMDB
53080;Finding Nemo;movie;2003;completed;;2020-06-17 08:54;;;;12;tt0266543
20953;Sherlock;tv show;2010;completed;s4e3;2020-06-17 08:55;;;176941;;tt1475582
1188702;Arcane;tv show;2021;watching;s1e9;2020-06-17 08:56;;;371028;;tt11126994
1155406;She-Hulk: Attorney at Law;tv show;2022;dropped;s1e3;2022-11-01 12:42;;;368613;;tt10857160
580212;Marvel's Inhumans;tv show;2017;completed;s1e8;2020-06-20 12:05;;;320558;394623;tt4154858
1377443;Cyberpunk: Edgerunners;anime;2022;completed;s1e10;2022-12-17 22:48;;;;;tt12590266

While their export is only free for regular users, their import is free right away for everyone. The CSV import (which is the best I've found for any media tracking tool so far, probably a good role model even for a JSON import) is here: https://simkl.com/apps/import/csv/
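The sample above is semicolon-delimited, so Python's stdlib `csv` module can read it directly once the delimiter is set; a minimal sketch (using the first two sample rows from the export above):

```python
import csv
import io

# Two rows from the Simkl export sample, semicolon-delimited.
SAMPLE = """SIMKL_ID;Title;Type;Year;Watchlist;LastEpWatched;WatchedDate;Rating;Memo;TVDB;TMDB;IMDB
53080;Finding Nemo;movie;2003;completed;;2020-06-17 08:54;;;;12;tt0266543
20953;Sherlock;tv show;2010;completed;s4e3;2020-06-17 08:55;;;176941;;tt1475582"""

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
for row in rows:
    # Empty strings mean "not applicable" (e.g. no TVDB ID for movies).
    print(row["Title"], row["Type"], row["IMDB"])
```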

@IgnisDa
Owner

IgnisDa commented May 31, 2023

It would be great if you could share the entire export so that I could test this feature to the fullest.

This was referenced Jun 6, 2023
@IgnisDa
Owner

IgnisDa commented Jun 9, 2023

@Mega-Volti should the json export be a file or an endpoint?

@Mega-Volti
Author

Personally, I'd be perfectly happy with a file. The main use for the export would be backup/storage of data, so being able to save a file and put that somewhere safe is the main use case I had in mind.
I am not a web dev, though, so I have no idea whether an endpoint might enable other use cases that I'm currently not thinking of.

@IgnisDa
Owner

IgnisDa commented Jun 9, 2023

I think I'll go with an endpoint since the response could be written to a file if needed and also be used for live data imports somewhere else.

@IgnisDa
Owner

IgnisDa commented Jul 23, 2023

@Mega-Volti I have decided to go with a json import to allow for nested data. I have come up with the following import format:

interface Review {
  text?: string;
  /** Score out of 100 */
  rating?: number;
  spoiler?: boolean;
}

interface Seen {
  ended_on?: Date;
  /** if a show, which season was seen? */
  show_season_number?: number;
  /** if a show, which episode was seen? */
  show_episode_number?: number;
  /** if a podcast, which episode was seen? */
  podcast_episode_number?: number;
}

export interface ImportItem {
  source_id: number; // In this case, this will just be the serial number (index in array)
  source: 'anilist' | 'audible' | 'google_books' | 'igdb' | 'itunes' | 'listennotes' | 'openlibrary' | 'tmdb';
  identifier: string;
  lot: 'audio_book' | 'anime' | 'book' | 'podcast' | 'manga' | 'movie' | 'show' | 'video_game';
  collections: string[];
  reviews: Review[];
  seen_history: Seen[];
}

Does this look good to you?

The reason I want to go with this structure is that Ryot already knows how to import this format:

#[derive(Debug)]
pub struct ImportItem {
    source_id: String,
    lot: MetadataLot,
    source: MetadataSource,
    identifier: ImportItemIdentifier,
    seen_history: Vec<ImportItemSeen>,
    reviews: Vec<ImportItemRating>,
    collections: Vec<String>,
}

@Mega-Volti
Author

That does seem reasonable, yes. It also looks like a CSV could be converted into this JSON structure quite easily. It might even be possible to offer both imports, by simply having a small CSV->JSON converter run beforehand?
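A converter along these lines could be quite short. The sketch below maps Simkl's CSV columns onto the `ImportItem` shape proposed above; note that the source/lot mapping (treating both movies and shows as `tmdb`) and the fallback order of ID columns are assumptions made for illustration, not Ryot's actual rules:

```python
import csv
import io
import json

def simkl_row_to_import_item(index, row):
    """Map one Simkl CSV row to the proposed ImportItem JSON shape.

    The source/lot mapping here is an assumption for this sketch.
    """
    is_movie = row["Type"] == "movie"
    return {
        "source_id": index,  # serial number, per the proposed format
        "source": "tmdb",    # assumed: movies and shows resolve via TMDB
        "identifier": row["TMDB"] or row["TVDB"] or row["IMDB"],
        "lot": "movie" if is_movie else "show",
        "collections": [],
        "reviews": [],
        "seen_history": [{"ended_on": row["WatchedDate"] or None}],
    }

SAMPLE = """SIMKL_ID;Title;Type;Year;Watchlist;LastEpWatched;WatchedDate;Rating;Memo;TVDB;TMDB;IMDB
53080;Finding Nemo;movie;2003;completed;;2020-06-17 08:54;;;;12;tt0266543"""

items = [
    simkl_row_to_import_item(i, row)
    for i, row in enumerate(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
]
print(json.dumps(items, indent=2))
```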

@IgnisDa
Owner

IgnisDa commented Jul 23, 2023

I don't think I will provide that, since it is dead easy to convert between those two formats. It would be better if we create a discussion with those scripts.

@IgnisDa
Owner

IgnisDa commented Jul 24, 2023

@Mega-Volti I did not include a dedicated UI since it would be too big a feature. If the definitions of the expected types are not enough, you can read the comments here for full explanations:

#[derive(Debug, Serialize, Deserialize, Clone, Type)]
pub struct ImportOrExportItemSeen {
    /// The timestamp when started watching.
    pub started_on: Option<DateTimeUtc>,
    /// The timestamp when finished watching.
    pub ended_on: Option<DateTimeUtc>,
    /// If for a show, the season which was seen.
    pub show_season_number: Option<i32>,
    /// If for a show, the episode which was seen.
    pub show_episode_number: Option<i32>,
    /// If for a podcast, the episode which was seen.
    pub podcast_episode_number: Option<i32>,
}

#[derive(Debug, Serialize, Deserialize, Clone, Type)]
pub struct ImportOrExportItemReview {
    /// The date the review was posted.
    pub date: Option<DateTimeUtc>,
    /// Whether to mark the review as a spoiler. Defaults to false.
    pub spoiler: Option<bool>,
    /// Actual text for the review.
    pub text: Option<String>,
}

#[derive(Debug, Serialize, Deserialize, Clone, Type)]
pub struct ImportOrExportItemRating {
    /// Data about the review.
    pub review: Option<ImportOrExportItemReview>,
    /// The score of the review.
    pub rating: Option<Decimal>,
    /// If for a show, the season for which this review was for.
    pub show_season_number: Option<i32>,
    /// If for a show, the episode for which this review was for.
    pub show_episode_number: Option<i32>,
    /// If for a podcast, the episode for which this review was for.
    pub podcast_episode_number: Option<i32>,
}

/// Details about a specific media item that needs to be imported.
#[derive(Debug, Serialize, Deserialize, Clone, Type)]
pub struct ImportOrExportItem<T> {
    /// A string to help identify it in the original source.
    pub source_id: String,
    /// The type of media.
    pub lot: MetadataLot,
    /// The source of media.
    pub source: MetadataSource,
    /// The provider identifier. For eg: TMDB-ID, Openlibrary ID and so on.
    pub identifier: T,
    /// The seen history for the user.
    pub seen_history: Vec<ImportOrExportItemSeen>,
    /// The review history for the user.
    pub reviews: Vec<ImportOrExportItemRating>,
    /// The collections to add this media to.
    pub collections: Vec<String>,
}

Let me know how your importing goes when you're done!

@Mega-Volti
Author

I don't think I will provide that since it is dead easy to convert b/w those two formats. It would be better if we create a discussion with those scripts.

Good point, I'll be looking forward to having a nice script to do that, manually converting my giant CSV into a JSON file would be quite a hassle :)

I tried testing the import with sample video games. To see an example of the format, I added a test game ("God of War (2018)" in my case) within Ryot and exported it, to see the exported JSON. I then wanted to just copy that and add my games-to-import. This is where I hit the first roadblock: I only see IDs there, no field for the game name. The point of importing would be to import with game names; I don't have IGDB IDs for all these games. How exactly would an import JSON be formatted in order to allow importing a list of games, of which I have the name as well as the completion date, but nothing else?

@dimon222

dimon222 commented Jul 24, 2023

@Mega-Volti
I saw the same with other types of content; the problem is that it's not 1-to-1 matching, and sometimes the game name might differ ("and" vs "&", extra spaces etc.).
The way I got around matching most of my stuff (TV shows in my case) was by making search requests to find the closest match and then recording the identifier, collecting a list of items that didn't match (for manual inspection). The resulting identifiers were then compiled with their respective contents and only then added/imported.

Of course, if this was possible on the backend it would be better, but then resolution of conflicts isn't straightforward.

@Mega-Volti
Author

Right, having that functionality built in would be amazing. Exactly as you describe it: if there is a perfect match then display it, otherwise display the closest match with some sort of marker that it was only a close match, and enable picking the game from the regular search UI in all other cases.

How did you match your shows to get all the IDs? My list is very, very long, doing that manually is not feasible, and I don't know of any UI that does it for game names.

And without being able to provide game IDs, I won't be able to use the importer properly, right?

@IgnisDa
Owner

IgnisDa commented Jul 24, 2023

The UI you are describing would be pretty complicated and as such would take me a lot of time to build; something I am loath to do since this will not be a highly used feature. I can think of two solutions:

  • Bite the bullet and do it manually. This is what I did back when I started using MediaTracker.
  • Write a script that takes the game name, searches from the IGDB API and matches it to the first result.
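The second option could look roughly like the sketch below. `search_provider` is a stand-in for whatever search call is available (the IGDB API, or Ryot's own search), here stubbed with fake data so the sketch is self-contained; the difflib-based scoring and the 0.8 threshold are just one possible way to pick the closest name:

```python
from difflib import SequenceMatcher

def search_provider(name):
    # Placeholder: a real script would query the IGDB API (or Ryot's
    # search) here and return (title, identifier) candidates.
    fake_db = [("God of War", 6045), ("God of War: Ascension", 1151)]
    first_word = name.lower().split()[0]
    return [(t, i) for t, i in fake_db if first_word in t.lower()]

def best_match(name, threshold=0.8):
    """Return (title, identifier) of the closest candidate, or None."""
    scored = [
        (SequenceMatcher(None, name.lower(), title.lower()).ratio(), title, ident)
        for title, ident in search_provider(name)
    ]
    if not scored:
        return None
    score, title, ident = max(scored)
    return (title, ident) if score >= threshold else None

matched, unmatched = {}, []
for name in ["God of War", "Some Obscure Game"]:
    hit = best_match(name)
    if hit:
        matched[name] = hit[1]
    else:
        unmatched.append(name)  # emit these as an error report for manual review

print(matched)    # names resolved to identifiers
print(unmatched)  # names needing manual matching
```

Collecting the failures into a separate list mirrors Simkl's error-report approach discussed earlier in the thread.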

@dimon222

dimon222 commented Jul 25, 2023

How did you match your shows to get all the IDs? My list is very, very long, doing that manually is not feasible, and I don't know of any UI that does it for game names.

@Mega-Volti
Same way as the guy above suggested: try to match to the closest possible name among the top 10 search results. Typically it's the first result. If there's a mismatch, I collect a separate list that I inspect manually (and this list is very small, less than 5% of all the items I import, which is hundreds).

When I say search, I mean I searched the Ryot GraphQL API, but I suppose that's probably a very rough approach and it could instead be done at the source (IGDB/TMDB etc.).
