ImportableUniqueObject takes a long time (on the first load) #502

Open
loicleser opened this issue Apr 11, 2024 · 4 comments

@loicleser

loicleser commented Apr 11, 2024

Hello,

I'm testing your solution in my app. Everything looks really good and can be very helpful for me.

However, I'm encountering issues with loading the data. I'm trying to load a large set of movies from a JSON file (40,000 movies). Everything works well with ImportableObject, but it duplicates the objects on each download. So I implemented ImportableUniqueObject, and it takes a very long time to save the data.

With ImportableObject, it takes 1 or 2 seconds, and with ImportableUniqueObject, it takes 4 to 5 minutes.

Am I doing something wrong?

Here is my code:

import CoreData
import CoreStore

class CoreDataManager {
    
    static let shared = CoreDataManager()
    
    let dataStack = DataStack(
        xcodeModelName: "___",
        migrationChain: []
    )
    
    init() {
        do {
            try dataStack.addStorageAndWait(
                SQLiteStore(
                    fileName: "MyDataModel.sqlite",
                    configuration: "Default",
                    localStorageOptions: .recreateStoreOnModelMismatch
                )
            )
        } catch {
            print("Error configuring CoreStore: \(error)")
        }
    }
}
func downloadMovies() {
    let url = URL(string: "MYJSONURL")!
    let task = URLSession.shared.dataTask(with: url) { (data, response, error) in
        guard let data = data, error == nil else {
            print("error: \(error)")
            return
        }

        do {
            let decoder = JSONDecoder()
            let movies = try decoder.decode([MovieJSON].self, from: data)
            self.dataStack.perform(
                asynchronous: { (transaction) -> Void in
                    try! transaction.importUniqueObjects(
                        Into<Movie>(),
                        sourceArray: movies
                    )
                },
                completion: { (result) -> Void in
                    switch result {
                    case .success:
                        print("success")
                    case .failure(let error):
                        print("error: \(error)")
                    }
                }
            )
        } catch {
            print("error: \(error)")
        }
    }
    task.resume()
}

class Movie: NSManagedObject, ImportableUniqueObject {
    static func uniqueID(from source: MovieJSON, in transaction: CoreStore.BaseDataTransaction) throws -> String? {
        return source.name
    }
    
    func update(from source: MovieJSON, in transaction: CoreStore.BaseDataTransaction) throws {
        
    }
    
    func didInsert(from source: ImportSource, in transaction: BaseDataTransaction) throws {
        self.name = source.name
    }
    
    typealias UniqueIDType = String
    
    static var uniqueIDKeyPath: String {
        return #keyPath(Movie.name)
    }
    
    typealias ImportSource = MovieJSON
    
}

Thank you for your help.

@JohnEstropia
Owner

That sounds about right, since uniquing requires a fetch operation in addition to the write operations. Note that importUniqueObjects() already optimizes this for you by doing only one fetch operation to map existing unique IDs before doing the write operations, but a large dataset is still a large dataset. I suspect that the performance bottleneck you're seeing is due to the high volume of memory use and disk I/O being done in one whole transaction.

My suggestion would be to split the decoded movies array into batches. Something like this:

extension Array {
    func chunked(into size: Int) -> [[Element]] {
        return stride(from: 0, to: count, by: size).map {
            Array(self[$0 ..< Swift.min($0 + size, count)])
        }
    }
}
...

            let movies = try decoder.decode([MovieJSON].self, from: data)
            for chunk in movies.chunked(into: 100) { // adjust depending on how large each object import is
                self.dataStack.perform(
                    asynchronous: { (transaction) -> Void in
                        try! transaction.importUniqueObjects(
                            Into<Movie>(),
                            sourceArray: chunk
                        )
                    },
                    completion: { (result) -> Void in
                        switch result {
                        case .success:
                            print("success")
                            // Merge your completion blocks somehow with something like a DispatchGroup
                        case .failure(let error):
                            print("error: \(error)")
                        }
                    }
                )
            }
...
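
As a rough sketch, the completion blocks could be merged with a DispatchGroup like this (assuming the same movies array, chunked(into:) extension, and dataStack as above; adjust the error handling as needed):

let group = DispatchGroup()
for chunk in movies.chunked(into: 100) {
    group.enter()
    self.dataStack.perform(
        asynchronous: { (transaction) -> Void in
            try transaction.importUniqueObjects(
                Into<Movie>(),
                sourceArray: chunk
            )
        },
        completion: { (result) -> Void in
            defer { group.leave() } // leave the group whether the chunk succeeded or failed
            switch result {
            case .success:
                break
            case .failure(let error):
                print("error: \(error)")
            }
        }
    )
}
// Called once after every chunk's transaction has completed
group.notify(queue: .main) {
    print("all chunks imported")
}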

@loicleser
Author

Thank you for your response. I will run a test, but since I have other data to load, I don't think that solution will be sufficient; I need it to remain very fast. I will try to load the existing data in bulk and filter the new data before saving it, to avoid using ImportableUniqueObject.

Thanks

@JohnEstropia
Owner

> I will try to load the existing data in bulk and filter the new data before saving it

If you check the implementation of importUniqueObjects(...), I think that's already what it does for you.
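
As a rough illustration of that fetch-then-map idea (a simplified sketch only, not the actual library code; it assumes the same movies and dataStack as in your snippets, a non-optional Movie.name, and it skips the shouldInsert/shouldUpdate checks the real implementation also performs):

let names = movies.map { $0.name }
self.dataStack.perform(
    asynchronous: { (transaction) -> Void in
        // One fetch that maps all existing objects by their unique ID
        let existing = try transaction.fetchAll(
            From<Movie>(),
            Where<Movie>("%K IN %@", #keyPath(Movie.name), names)
        )
        let existingByName = Dictionary(
            uniqueKeysWithValues: existing.map { ($0.name, $0) }
        )
        // Then update or insert without any further fetches
        for source in movies {
            if let movie = existingByName[source.name] {
                try movie.update(from: source, in: transaction)
            } else {
                let movie = transaction.create(Into<Movie>())
                try movie.didInsert(from: source, in: transaction)
            }
        }
    },
    completion: { _ in }
)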

Let me know if you find an approach that works for your case so I can consider adopting it internally.

@loicleser
Author

loicleser commented Apr 12, 2024

Hello @JohnEstropia,

I've started running some tests, but I've encountered an issue with the following code:

class Movie: NSManagedObject, ImportableUniqueObject {
    static func uniqueID(from source: MovieJSON, in transaction: CoreStore.BaseDataTransaction) throws -> String? {
        return source.name
    }
    
    func shouldUpdate(from source: ImportSource, in transaction: BaseDataTransaction) -> Bool {
        return false
    }
    
    func shouldInsert(from source: ImportSource, in transaction: BaseDataTransaction) -> Bool {
        return false
    }
    
    func update(from source: MovieJSON, in transaction: CoreStore.BaseDataTransaction) throws {
        
    }
    
    func didInsert(from source: ImportSource, in transaction: BaseDataTransaction) throws {
        //self.name = source.name
    }
    
    typealias UniqueIDType = String
    
    static var uniqueIDKeyPath: String {
        return #keyPath(Movie.name)
    }
    
    typealias ImportSource = MovieJSON
}

Even though I've set shouldUpdate and shouldInsert to return false, sources are still being inserted during didInsert. Is this behavior normal? I'd like to have precise control over when objects are added or updated.

Regarding my problem, I've come up with an initial, faster solution:


let decoder = JSONDecoder()
let movies = try decoder.decode([MovieJSON].self, from: data)
DispatchQueue.main.async {
    do {
        let existingMovies = try self.dataStack.fetchAll(From<Movie>())
        let existingMovieNames = Set(existingMovies.map { $0.name })
        let filteredMovies = movies.filter { !existingMovieNames.contains($0.name) }

        self.dataStack.perform(
            asynchronous: { (transaction) -> Void in
                try! transaction.importUniqueObjects(
                    Into<Movie>(),
                    sourceArray: filteredMovies
                )
            },
            completion: { (result) -> Void in
                switch result {
                case .success:
                    print("success")
                case .failure(let error):
                    print("error: \(error)")
                }
            }
        )
    } catch {
        print("error: \(error)")
    }
}

However, in some cases, I will need to update the names of some movies, so this solution doesn't completely resolve my problem.

Thank you for your assistance.
