
[Feature Request]: CSV import/export #64

Closed
Mega-Volti opened this issue May 26, 2023 · 24 comments · Fixed by #204

Comments

@Mega-Volti

Mega-Volti commented May 26, 2023

For general import/export functionality, CSV is the most flexible. Ultimately, most data from other services can be converted to CSV format (more or less easily), making it a near-universal import/export format.

CSV import is the harder of the two, I assume. Ideal features for it are:

  • A well-defined input format, where name, IGDB/TVDB/IMDB/Goodreads/etc. IDs, date watched/read/played/etc., status etc. can be provided
  • Only either name or ID should be mandatory; if one is provided, the other should be looked up automatically. For other fields, sane defaults should be set (e.g. if no date watched is set, import with today's date)
  • Name matching necessarily needs to be fuzzy; although I think the DB APIs do this already, the first match to the name should be fine
  • There should be some sort of reasonably easy check for name matching, to see whether the fuzzy match got the right item

As an example, I think Simkl (https://simkl.com/) does this nearly perfectly. It does name matching, presents a tabular-style view for easy manual checking of important items, and it creates a CSV error report for items that couldn't be matched, which enables reasonably easy manual import of the rest.

CSV export on the other hand should be reasonably easy I think. That should just be a dump of all relevant fields in the database, written to a text file, using the same column headers as the import functionality. Both Simkl and Goodreads have excellent CSV export functionality which might be a good template.

@IgnisDa
Owner

IgnisDa commented May 29, 2023

I agree with needing import/export but am not sure about the export format being CSV since it does not support nested data structures. I think JSON would be better.

@Mega-Volti
Author

Mega-Volti commented May 29, 2023

I think CSV is the easiest and most universal but yes, if some functionality actually does require nested data structures, then creating a JSON file from a CSV isn't all that complicated anyway, so either will work.

What use case do you see for nested data structures? Both Goodreads and Simkl offer CSV exports (which makes CSV kind of a universal import format as well), so at least for media tracking there doesn't seem to be a need for anything more complicated.

@IgnisDa
Owner

IgnisDa commented May 30, 2023

What use case do you see for nested data structures?

For example, a media item can be added to the seen history multiple times. How can I represent this in a CSV?
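To make the tradeoff concrete, here is a small Python sketch (with made-up illustrative data) of how repeated views nest naturally in JSON, while a flat CSV row either drops rewatches or has to duplicate the item across rows:

```python
import json

# One media item watched twice: JSON nests the full seen history.
item = {
    "title": "Sherlock",
    "seen_history": [
        {"ended_on": "2017-01-15"},
        {"ended_on": "2021-03-02"},  # rewatch
    ],
}
print(json.dumps(item, indent=2))

# A single CSV row can only hold one date, so it must either keep
# only the latest one (as Simkl's CSV does) or repeat the item on
# multiple rows, losing the one-row-per-item property.
latest = max(s["ended_on"] for s in item["seen_history"])
csv_row = f'{item["title"]};{latest}'
print(csv_row)  # Sherlock;2021-03-02
```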

@Mega-Volti
Author

I think Simkl "solves" this by only saving the latest view date in the CSV, plus potentially a view count. But you are correct, the view history (dates) for the other times would be lost in that case, while they could be preserved within a JSON structure. Simkl also offers JSON export functionality which preserves all information.

It's certainly a tradeoff, complete information vs simplicity of the format. Since either one is text-based and it's easy enough to convert one to the other manually if necessary, I think either would be a great option.

@IgnisDa
Owner

IgnisDa commented May 30, 2023

Cool. I think I'll go with JSON exports then.

@Mega-Volti
Author

Mega-Volti commented May 30, 2023

Awesome :) Imports as well? That's probably the harder part to get right (name matching and reports on what got matched), although personally I think it's the more important one. With a huge existing collection, entering data manually might not be feasible at all, which means switching to Ryot might be impossible in that case.

@IgnisDa
Owner

IgnisDa commented May 31, 2023

The matching idea is pretty good and I'd like to implement it for Goodreads too. I have an idea for an import view in mind too. I'll draw it out in Excalidraw and upload it here later.

@IgnisDa
Owner

IgnisDa commented May 31, 2023

Here is a mockup:

[Excalidraw mockup image: Untitled-2023-02-06-1728]

(You can also load this file into Excalidraw directly and edit it there to make changes)

@Mega-Volti
Author

Mega-Volti commented May 31, 2023

Nice, I really like Excalidraw!

It probably needs either more columns for more types of IDs (IGDB for example), or maybe a column "ID type" and another "ID" so that any type of ID can be handled flexibly and the format doesn't need to change when new types of tracking are added. Or maybe some type of linked tables, since one item should probably be able to have multiple IDs (e.g. both an IMDB and a TVDB ID).

I like the "is perfect match" part. Maybe instead of having it boolean, make it "match type" with 3 options, "perfect", "best effort", "none". For practical usability, it would also be awesome to have the list easily copy&pastable so that the "best effort" ones can be checked manually and the "none" ones can be added manually.

The ideal solution would of course be a kind of dropdown menu that shows all matches (for the fuzzy name match), so if the ideal one wasn't found, the correct one can be selected manually right in the import section. But that's probably too complicated to offer a UI for; being able to copy&paste the names into some spreadsheet to go through them manually should be enough.

The way Simkl handles it is that after the import, it saves a CSV file with all errors, which then is a great reference for the manual checks / corrections. Not sure whether that is feasible for Ryot, though.

I also just checked the Simkl export. They use the following columns:

  • SIMKL_ID
  • Title
  • Type (movie, tv show etc.)
  • Year (release year)
  • Watchlist (completed, currently watching, dropped etc.)
  • LastEpWatched (episode number - since it's just CSV it doesn't do the accordion thing but only saves the last ep)
  • WatchedDate
  • Rating
  • Memo
  • TVDB (ID, blank if not a show)
  • TMDB (ID, blank if not a movie)
  • IMDB (ID, blank if not a movie)

@IgnisDa
Owner

IgnisDa commented May 31, 2023

Would it be possible for you to share this Simkl export with me? My email ID is in my GitHub profile.

@Mega-Volti
Author

Mega-Volti commented May 31, 2023

Sure, no problem at all. I assume you don't need my full history for everything, a few lines for TV shows, movies and anime (that's what Simkl tracks, sadly no games) are enough, right? I trimmed it down to examples which are representative I think, this easily fits into a comment here:

SIMKL_ID;Title;Type;Year;Watchlist;LastEpWatched;WatchedDate;Rating;Memo;TVDB;TMDB;IMDB
53080;Finding Nemo;movie;2003;completed;;2020-06-17 08:54;;;;12;tt0266543
20953;Sherlock;tv show;2010;completed;s4e3;2020-06-17 08:55;;;176941;;tt1475582
1188702;Arcane;tv show;2021;watching;s1e9;2020-06-17 08:56;;;371028;;tt11126994
1155406;She-Hulk: Attorney at Law;tv show;2022;dropped;s1e3;2022-11-01 12:42;;;368613;;tt10857160
580212;Marvel's Inhumans;tv show;2017;completed;s1e8;2020-06-20 12:05;;;320558;394623;tt4154858
1377443;Cyberpunk: Edgerunners;anime;2022;completed;s1e10;2022-12-17 22:48;;;;;tt12590266

While their export is only free for regular users, their import is free right away for everyone. The CSV import (which is the best I've found for any media tracking tool so far, probably a good role model even for a JSON import) is here: https://simkl.com/apps/import/csv/
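The sample above is semicolon-delimited, so Python's stdlib `csv` module can read it directly once the delimiter is set; a minimal sketch (using the first two sample rows from the export above):

```python
import csv
import io

# Two rows from the Simkl export sample, semicolon-delimited.
SAMPLE = """SIMKL_ID;Title;Type;Year;Watchlist;LastEpWatched;WatchedDate;Rating;Memo;TVDB;TMDB;IMDB
53080;Finding Nemo;movie;2003;completed;;2020-06-17 08:54;;;;12;tt0266543
20953;Sherlock;tv show;2010;completed;s4e3;2020-06-17 08:55;;;176941;;tt1475582"""

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
for row in rows:
    # Empty strings mean "not applicable" (e.g. no TVDB ID for movies).
    print(row["Title"], row["Type"], row["IMDB"])
```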

@IgnisDa
Owner

IgnisDa commented May 31, 2023

It would be great if you could share the entire export so that I could test this feature to the fullest.

This was referenced Jun 6, 2023
@IgnisDa
Owner

IgnisDa commented Jun 9, 2023

@Mega-Volti should the json export be a file or an endpoint?

@Mega-Volti
Author

Personally, I'd be perfectly happy with a file. The main use for the export would be backup/storage of data, so being able to save a file and put that somewhere safe is the main use case I had in mind.
I am not a web dev, though, so I have no idea whether an endpoint might enable other use cases that I'm currently not thinking of.

@IgnisDa
Owner

IgnisDa commented Jun 9, 2023

I think I'll go with an endpoint since the response could be written to a file if needed and also be used for live data imports somewhere else.

@IgnisDa
Owner

IgnisDa commented Jul 23, 2023

@Mega-Volti I have decided to go with a json import to allow for nested data. I have come up with the following import format:

interface Review {
  text?: string;
  /** Score out of 100 */
  rating?: number;
  spoiler?: boolean;
}

interface Seen {
  ended_on?: Date;
  /** if a show, which season was seen? */
  show_season_number?: number;
  /** if a show, which episode was seen? */
  show_episode_number?: number;
  /** if a podcast, which episode was seen? */
  podcast_episode_number?: number;
}

export interface ImportItem {
  source_id: number; // In this case, this will just be the serial number (index in array)
  source: 'anilist' | 'audible' | 'google_books' | 'igdb' | 'itunes' | 'listennotes' | 'openlibrary' | 'tmdb';
  identifier: string;
  lot: 'audio_book' | 'anime' | 'book' | 'podcast' | 'manga' | 'movie' | 'show' | 'video_game';
  collections: string[];
  reviews: Review[];
  seen_history: Seen[];
}

Does this look good to you?

The reason I want to go with this structure is that Ryot already knows how to import this format:

#[derive(Debug)]
pub struct ImportItem {
    source_id: String,
    lot: MetadataLot,
    source: MetadataSource,
    identifier: ImportItemIdentifier,
    seen_history: Vec<ImportItemSeen>,
    reviews: Vec<ImportItemRating>,
    collections: Vec<String>,
}

@Mega-Volti
Author

That does seem reasonable, yes. It also looks like a CSV could be converted into this JSON structure quite easily. It might even be possible to offer both imports, by simply having a small CSV->JSON converter run beforehand?
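A converter along these lines could be quite short. The sketch below maps Simkl's CSV columns onto the `ImportItem` shape proposed above; note that the source/lot mapping (treating both movies and shows as `tmdb`) and the fallback order of ID columns are assumptions made for illustration, not Ryot's actual rules:

```python
import csv
import io
import json

def simkl_row_to_import_item(index, row):
    """Map one Simkl CSV row to the proposed ImportItem JSON shape.

    The source/lot mapping here is an assumption for this sketch.
    """
    is_movie = row["Type"] == "movie"
    return {
        "source_id": index,  # serial number, per the proposed format
        "source": "tmdb",    # assumed: movies and shows resolve via TMDB
        "identifier": row["TMDB"] or row["TVDB"] or row["IMDB"],
        "lot": "movie" if is_movie else "show",
        "collections": [],
        "reviews": [],
        "seen_history": [{"ended_on": row["WatchedDate"] or None}],
    }

SAMPLE = """SIMKL_ID;Title;Type;Year;Watchlist;LastEpWatched;WatchedDate;Rating;Memo;TVDB;TMDB;IMDB
53080;Finding Nemo;movie;2003;completed;;2020-06-17 08:54;;;;12;tt0266543"""

items = [
    simkl_row_to_import_item(i, row)
    for i, row in enumerate(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
]
print(json.dumps(items, indent=2))
```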

@IgnisDa
Owner

IgnisDa commented Jul 23, 2023

I don't think I will provide that, since it is dead easy to convert between those two formats. It would be better if we create a discussion with those scripts.

@IgnisDa
Owner

IgnisDa commented Jul 24, 2023

@Mega-Volti I did not include a dedicated UI since it would be too big a feature. If the definitions of the expected types are not enough, you can read the comments here for full explanations:

#[derive(Debug, Serialize, Deserialize, Clone, Type)]
pub struct ImportOrExportItemSeen {
    /// The timestamp when started watching.
    pub started_on: Option<DateTimeUtc>,
    /// The timestamp when finished watching.
    pub ended_on: Option<DateTimeUtc>,
    /// If for a show, the season which was seen.
    pub show_season_number: Option<i32>,
    /// If for a show, the episode which was seen.
    pub show_episode_number: Option<i32>,
    /// If for a podcast, the episode which was seen.
    pub podcast_episode_number: Option<i32>,
}

#[derive(Debug, Serialize, Deserialize, Clone, Type)]
pub struct ImportOrExportItemReview {
    /// The date the review was posted.
    pub date: Option<DateTimeUtc>,
    /// Whether to mark the review as a spoiler. Defaults to false.
    pub spoiler: Option<bool>,
    /// Actual text for the review.
    pub text: Option<String>,
}

#[derive(Debug, Serialize, Deserialize, Clone, Type)]
pub struct ImportOrExportItemRating {
    /// Data about the review.
    pub review: Option<ImportOrExportItemReview>,
    /// The score of the review.
    pub rating: Option<Decimal>,
    /// If for a show, the season for which this review was for.
    pub show_season_number: Option<i32>,
    /// If for a show, the episode for which this review was for.
    pub show_episode_number: Option<i32>,
    /// If for a podcast, the episode for which this review was for.
    pub podcast_episode_number: Option<i32>,
}

/// Details about a specific media item that needs to be imported.
#[derive(Debug, Serialize, Deserialize, Clone, Type)]
pub struct ImportOrExportItem<T> {
    /// A string to help identify it in the original source.
    pub source_id: String,
    /// The type of media.
    pub lot: MetadataLot,
    /// The source of media.
    pub source: MetadataSource,
    /// The provider identifier. For eg: TMDB-ID, Openlibrary ID and so on.
    pub identifier: T,
    /// The seen history for the user.
    pub seen_history: Vec<ImportOrExportItemSeen>,
    /// The review history for the user.
    pub reviews: Vec<ImportOrExportItemRating>,
    /// The collections to add this media to.
    pub collections: Vec<String>,
}

Let me know how your importing goes when you're done!

@Mega-Volti
Author

I don't think I will provide that since it is dead easy to convert b/w those two formats. It would be better if we create a discussion with those scripts.

Good point, I'll be looking forward to having a nice script to do that, manually converting my giant CSV into a JSON file would be quite a hassle :)

I tried testing the import with sample video games. To see an example of the format, I added a test game ("God of War (2018)" in my case) within Ryot and exported it, to see the exported JSON. I then wanted to just copy that and add my games-to-import. This is where I hit the first roadblock: I only see IDs there, no field for the game name. The point of importing would be to import with game names; I don't have IGDB IDs for all these games. How exactly would an import JSON be formatted in order to allow importing a list of games, of which I have the name as well as the completion date, but nothing else?

@dimon222

dimon222 commented Jul 24, 2023

@Mega-Volti
I saw the same with other types of content; the problem is that it's not 1-to-1 matching, and sometimes the game name might differ ("and" vs "&", extra spaces etc.).
The way I got around matching most of my stuff (TV shows in my case) was by making search requests to find the closest match and then recording the identifier, collecting a list of items that didn't match (for manual inspection). The resulting identifiers were then compiled with their respective contents and only then added/imported.

Of course, if this was possible on the backend it would be better, but then resolution of conflicts isn't straightforward.

@Mega-Volti
Author

Right, having that functionality built in would be amazing. Exactly as you describe it: if there is a perfect match then display it, otherwise display the closest match with some sort of marker that it was only a close match, and enable picking the game from the regular search UI in all other cases.

How did you match your shows to get all the IDs? My list is very, very long, doing that manually is not feasible, and I don't know of any UI that does it for game names.

And without being able to provide game IDs, I won't be able to use the importer properly, right?

@IgnisDa
Owner

IgnisDa commented Jul 24, 2023

The UI you are describing would be pretty complicated and as such would take me a lot of time to build; something I am loath to do since this will not be a highly used feature. I can think of two solutions:

  • Bite the bullet and do it manually. This is what I did back when I started using MediaTracker.
  • Write a script that takes the game name, searches from the IGDB API and matches it to the first result.
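The second option could look roughly like the sketch below. `search_provider` is a stand-in for whatever search call is available (the IGDB API, or Ryot's own search), here stubbed with fake data so the sketch is self-contained; the difflib-based scoring and the 0.8 threshold are just one possible way to pick the closest name:

```python
from difflib import SequenceMatcher

def search_provider(name):
    # Placeholder: a real script would query the IGDB API (or Ryot's
    # search) here and return (title, identifier) candidates.
    fake_db = [("God of War", 6045), ("God of War: Ascension", 1151)]
    first_word = name.lower().split()[0]
    return [(t, i) for t, i in fake_db if first_word in t.lower()]

def best_match(name, threshold=0.8):
    """Return (title, identifier) of the closest candidate, or None."""
    scored = [
        (SequenceMatcher(None, name.lower(), title.lower()).ratio(), title, ident)
        for title, ident in search_provider(name)
    ]
    if not scored:
        return None
    score, title, ident = max(scored)
    return (title, ident) if score >= threshold else None

matched, unmatched = {}, []
for name in ["God of War", "Some Obscure Game"]:
    hit = best_match(name)
    if hit:
        matched[name] = hit[1]
    else:
        unmatched.append(name)  # emit these as an error report for manual review

print(matched)    # names resolved to identifiers
print(unmatched)  # names needing manual matching
```

Collecting the failures into a separate list mirrors Simkl's error-report approach discussed earlier in the thread.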

@dimon222

dimon222 commented Jul 25, 2023

How did you match your shows to get all the IDs? My list is very, very long, doing that manually is not feasible, and I don't know of any UI that does it for game names.

@Mega-Volti
Same way as the guy above suggested: try to match to the closest possible name among the top 10 search results. Typically it's the first result. If there's a mismatch, I collect a separate list that I inspect manually (and this list is very small, less than 5% of all the items I import, which is hundreds).

When I say search, I mean I searched the Ryot GraphQL API, but I suppose that's probably a very rough approach and it could instead be done at the source (IGDB/TMDB etc.).
