Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize importer #50

Closed
dabreegster opened this issue Apr 25, 2020 · 7 comments · Fixed by #55
Closed

Parallelize importer #50

dabreegster opened this issue Apr 25, 2020 · 7 comments · Fixed by #55
Labels
good first issue Good for newcomers

Comments

@dabreegster
Copy link
Collaborator

The importer tool converts one map at a time. Easy opportunity to make use of all CPUs. Simple change in one layer of the code. Makes logging harder to read, but eh. Only trick is that ensure_popdat_exists has to be called before the parallelization starts.

@dabreegster dabreegster added the good first issue Good for newcomers label Apr 25, 2020
@JavedNissar
Copy link
Contributor

This is the primary relevant file for this issue right?

abstreet/importer/src/main.rs 

Are there any others I should be mindful of as well? @dabreegster

@dabreegster
Copy link
Collaborator Author

That's the main one. The other modules in importer are worth a look too, but hopefully they're pretty simple. I think the only easy opportunity for parallelization is the for name in names loop.

The only potential race conditions would be attempting to produce the same file at the same time. This could happen by two callers trying to do seattle::ensure_popdat_exists, since this first fully builds the huge_seattle map and then produces a file from it. I guess within each of the per-city modules, there's an input() function that downloads all input files needed -- we also don't want to duplicate that work at the same time. Ah, and the utility download always produces tmp_output before renaming; probably have to deal with that. Maybe this is a little more complex than I first estimated.

(By the way, if you know of a simple way to express a dependency graph of tasks and be idempotent / detect work that's already been done, there may be a much better way to express everything in importer.)

@JavedNissar
Copy link
Contributor

JavedNissar commented Apr 30, 2020

Okay, so from looking at it seems like the only jobs that can be run in parallel within for name in names would be job.raw_to_map,job.scenario, and job.scenario_everyone. So, in line with that, I should only spawn threads for those jobs and leave job.osm_to_raw to run sequentially given those concerns you've expressed about the input() function. So, with that in mind, the solution I am considering is the following:

    let mut was_ensure_popdate_exists_called = false;
    for name in names {
        //Leave job.osm_to_raw alone
        if job.osm_to_raw {
            match job.city.as_ref() {
                "austin" => austin::osm_to_raw(&name),
                "los_angeles" => los_angeles::osm_to_raw(&name),
                "seattle" => seattle::osm_to_raw(&name),
                x => panic!("Unknown city {}", x),
            }
        }

        //Spawn a thread to manage this
        if job.raw_to_map {
            utils::raw_to_map(&name, job.use_fixes);
        }
 
        if job.scenario {
            assert_eq!(job.city, "seattle");
            //make sure ensure_popdat_exists is only called once
            if !was_ensure_popdat_exists{
                seattle::ensure_popdat_exists(job.use_fixes);
                was_ensure_popdat_exists = true;
            }

            //Spawn a thread to manage this
            let mut timer = abstutil::Timer::new(format!("Scenario for {}", name));
            let map = map_model::Map::new(abstutil::path_map(&name), job.use_fixes, &mut timer);
            soundcast::make_weekday_scenario(&map, &mut timer).save();
        }

        if job.scenario_everyone {
            assert_eq!(job.city, "seattle");
            if !was_ensure_popdat_exists_called{
                seattle::ensure_popdat_exists(job.use_fixes);
            }

            //Spawn a thread to manage this
            let mut timer = abstutil::Timer::new(format!("Scenario for {}", name));
            let map = map_model::Map::new(abstutil::path_map(&name), job.use_fixes, &mut timer);
            soundcast::make_weekday_scenario_with_everyone(&map, &mut timer).save();
        }
    }
    //Wait for all threads to complete

@dabreegster
Copy link
Collaborator Author

This should almost work. The only problem is that ensure_popdat_exists may call raw_to_map, so there may be a case when two threads are both doing raw_to_map(huge_seattle) at the same time. That's technically correct, provided writing the output file is atomic.

I haven't done much parallelism in Rust before, but it could be nice to use the type system to statically enforce correctness. What if we had some marker type that does not implement Sync to prevent accidentally calling osm_to_raw concurrently?

Also, the generic dependency graph executor I was thinking of is https://github.com/salsa-rs/salsa. Probably overkill for this tool, though.

@JavedNissar
Copy link
Contributor

JavedNissar commented May 5, 2020

Agreed that the dependency graph executor is probably a bad move. Plus, it seems unstable and I imagine adding unstable dependencies wouldn't be healthy for this project. I'll take a stab at a marker type but other than that I'll look to execute the plan from my earlier comment.

@JavedNissar
Copy link
Contributor

JavedNissar commented May 5, 2020

Maybe I'm missing something but I keep getting the following compile error on the latest revision of the repo @dabreegster :

error[E0599]: no associated item named `MAX` found for type `f64` in the current scope
  --> geom/src/bounds.rs:16:25
   |
16 |             min_x: f64::MAX,
   |                         ^^^ associated item not found in `f64`
   |
   = help: items from traits can only be used if the trait is in scope
   = note: the following trait is implemented but not in scope; perhaps add a `use` for it:
           `use rand::distributions::weighted::alias_method::Weight;`
help: you are looking for the module in `std`, not the primitive type
   |
16 |             min_x: std::f64::MAX,
   |                    ^^^^^^^^^^^^^

error[E0599]: no associated item named `MAX` found for type `f64` in the current scope
  --> geom/src/bounds.rs:17:25
   |
17 |             min_y: f64::MAX,
   |                         ^^^ associated item not found in `f64`
   |
   = help: items from traits can only be used if the trait is in scope
   = note: the following trait is implemented but not in scope; perhaps add a `use` for it:
           `use rand::distributions::weighted::alias_method::Weight;`
help: you are looking for the module in `std`, not the primitive type
   |
17 |             min_y: std::f64::MAX,
   |                    ^^^^^^^^^^^^^

error[E0599]: no associated item named `MIN` found for type `f64` in the current scope
  --> geom/src/bounds.rs:18:25
   |
18 |             max_x: f64::MIN,
   |                         ^^^ associated item not found in `f64`
   |
help: you are looking for the module in `std`, not the primitive type
   |
18 |             max_x: std::f64::MIN,
   |                    ^^^^^^^^^^^^^

error[E0599]: no associated item named `MIN` found for type `f64` in the current scope
  --> geom/src/bounds.rs:19:25
   |
19 |             max_y: f64::MIN,
   |                         ^^^ associated item not found in `f64`
   |
help: you are looking for the module in `std`, not the primitive type
   |
19 |             max_y: std::f64::MIN,
   |                    ^^^^^^^^^^^^^

error[E0599]: no associated item named `MAX` found for type `f64` in the current scope
  --> geom/src/bounds.rs:96:27
   |
96 |             min_lon: f64::MAX,
   |                           ^^^ associated item not found in `f64`
   |
   = help: items from traits can only be used if the trait is in scope
   = note: the following trait is implemented but not in scope; perhaps add a `use` for it:
           `use rand::distributions::weighted::alias_method::Weight;`
help: you are looking for the module in `std`, not the primitive type
   |
96 |             min_lon: std::f64::MAX,
   |                      ^^^^^^^^^^^^^

error[E0599]: no associated item named `MAX` found for type `f64` in the current scope
  --> geom/src/bounds.rs:97:27
   |
97 |             min_lat: f64::MAX,
   |                           ^^^ associated item not found in `f64`
   |
   = help: items from traits can only be used if the trait is in scope
   = note: the following trait is implemented but not in scope; perhaps add a `use` for it:
           `use rand::distributions::weighted::alias_method::Weight;`
help: you are looking for the module in `std`, not the primitive type
   |
97 |             min_lat: std::f64::MAX,
   |                      ^^^^^^^^^^^^^

error[E0599]: no associated item named `MIN` found for type `f64` in the current scope
  --> geom/src/bounds.rs:98:27
   |
98 |             max_lon: f64::MIN,
   |                           ^^^ associated item not found in `f64`
   |
help: you are looking for the module in `std`, not the primitive type
   |
98 |             max_lon: std::f64::MIN,
   |                      ^^^^^^^^^^^^^

error[E0599]: no associated item named `MIN` found for type `f64` in the current scope
  --> geom/src/bounds.rs:99:27
   |
99 |             max_lat: f64::MIN,
   |                           ^^^ associated item not found in `f64`
   |
help: you are looking for the module in `std`, not the primitive type
   |
99 |             max_lat: std::f64::MIN,
   |                      ^^^^^^^^^^^^^

error: aborting due to 8 previous errors

For more information about this error, try `rustc --explain E0599`.
error: could not compile `geom`.

@dabreegster
Copy link
Collaborator Author

I upgraded to Rust 1.43 on April 23, which made those imports unnecessary. Try rustup update stable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants