# Parallel Processing SimpleDirectoryReader

In this notebook, we demonstrate how to use parallel processing when loading data with `SimpleDirectoryReader`. Parallel processing can help with heavier workloads, e.g., loading from a directory that contains many files. (NOTE: on Windows you may see smaller gains from parallel loading. This is due to differences in how `multiprocessing` works on Linux/macOS versus Windows; see, e.g., [here](https://pythonforthelab.com/blog/differences-between-multiprocessing-windows-and-linux/) or [here](https://stackoverflow.com/questions/52465237/multiprocessing-slower-than-serial-processing-in-windows-but-not-in-linux).)
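One commonly cited factor behind the Windows caveat is the process *start method*. Windows supports only `spawn`, where each worker launches a fresh interpreter and re-imports the main module, while Linux has historically defaulted to `fork`, which clones the already-initialized parent process cheaply. The standard-library sketch below simply reports which method your platform uses; `describe_start_method` is a hypothetical helper name used only for illustration.

```python
import multiprocessing as mp
import sys


def describe_start_method() -> str:
    """Return the platform name and its default multiprocessing start method."""
    # "spawn" (the only option on Windows, and the macOS default since Python 3.8)
    # starts a new interpreter per worker and re-imports the main module,
    # which adds start-up cost to every worker process.
    # "fork" (the historical Linux default) copies the parent process instead.
    return f"{sys.platform}: {mp.get_start_method()}"


if __name__ == "__main__":
    # With "spawn", code that creates worker processes must sit behind this guard
    # when run as a script, otherwise each worker would re-run it on import.
    print(describe_start_method())
    print("available start methods:", mp.get_all_start_methods())
```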

In [None]:
import cProfile, pstats
from pstats import SortKey

In this demo, we'll use the `PatronusAIFinanceBenchDataset` llama-dataset from [llamahub](https://llamahub.ai). This dataset is based on a set of 32 PDF files, which are included in the download from llamahub.

In [None]:
!llamaindex-cli download-llamadataset PatronusAIFinanceBenchDataset --download-dir ./data_parallel
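Before timing anything, it can be worth confirming that the download produced the expected source files. The sketch below assumes the PDFs land directly in `./data_parallel/source_files` (the directory the reader uses in the next cell); `count_pdfs` is a hypothetical helper, and it only looks at the top level of the directory, not subdirectories.

```python
from pathlib import Path


def count_pdfs(directory: str) -> int:
    """Count regular files ending in .pdf (any case) directly inside `directory`."""
    return sum(
        1
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    source_dir = Path("./data_parallel/source_files")
    if source_dir.is_dir():
        # The dataset description says 32 PDF files should be present.
        print(count_pdfs(str(source_dir)))
    else:
        print(f"{source_dir} not found; run the download cell first")
```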

In [None]:
from llama_index.core import SimpleDirectoryReader

# define our reader with the directory containing the 32 pdf files
reader = SimpleDirectoryReader(input_dir="./data_parallel/source_files")

### Sequential Load

Sequential loading is the default behaviour and can be executed via the `load_data()` method.

In [None]:
documents = reader.load_data(show_progress=True)
len(documents)

In [None]:
cProfile.run("reader.load_data()", "oldstats")
p = pstats.Stats("oldstats")
p.strip_dirs().sort_stats(SortKey.CUMULATIVE).print_stats(15)

### Parallel Load

To load using parallel processes, we pass `num_workers` (a positive integer) to `load_data()`.
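Conceptually, parallel loading fans the per-file work out to a pool of worker processes and collects the results. The standard-library sketch below illustrates that general pattern with `concurrent.futures.ProcessPoolExecutor`; it is not a description of how `SimpleDirectoryReader` is implemented internally. `load_file` is a hypothetical stand-in for per-file parsing (it just reads bytes), and `load_sequential` / `load_parallel` are illustrative names.

```python
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import List, Tuple


def load_file(path: str) -> Tuple[str, int]:
    """Hypothetical per-file loader: return (file name, number of bytes read)."""
    # Stand-in for the expensive work (e.g., PDF parsing) done for each file.
    return os.path.basename(path), len(Path(path).read_bytes())


def load_sequential(paths: List[str]) -> List[Tuple[str, int]]:
    """Load files one after another in the current process."""
    return [load_file(p) for p in paths]


def load_parallel(paths: List[str], num_workers: int) -> List[Tuple[str, int]]:
    """Spread files across `num_workers` processes; map() preserves input order."""
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(load_file, paths))


if __name__ == "__main__":
    # The guard matters under "spawn" (Windows/macOS): workers re-import this module.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(8):
            p = Path(tmp) / f"doc_{i}.txt"
            p.write_text("x" * (i + 1))
            paths.append(str(p))
        # Both strategies should produce identical results in the same order.
        assert load_parallel(paths, num_workers=4) == load_sequential(paths)
        print(load_parallel(paths, num_workers=4)[:2])
```

Parallelism only pays off when the per-file work outweighs the cost of starting workers and passing results back between processes, which is one reason small directories (or `spawn`-based platforms) may show little gain.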

In [None]:
documents = reader.load_data(num_workers=10, show_progress=True)

In [None]:
len(documents)

In [None]:
cProfile.run("reader.load_data(num_workers=10)", "newstats")
p = pstats.Stats("newstats")
p.strip_dirs().sort_stats(SortKey.CUMULATIVE).print_stats(15)
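A caveat when comparing the two profiles: `cProfile` instruments only the process that calls it, so in the parallel run the parsing done inside worker processes is not profiled, while in the sequential run all of that work pays the profiler's overhead. This may exaggerate the apparent speedup. A plain wall-clock measurement avoids that asymmetry; the helper below is a minimal sketch (the name `timed` is hypothetical) that wraps any call with `time.perf_counter`.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# In the notebook (assumes `reader` from the cells above):
# docs_seq, t_seq = timed(reader.load_data)
# docs_par, t_par = timed(reader.load_data, num_workers=10)
# print(f"sequential: {t_seq:.1f}s, parallel: {t_par:.1f}s")

if __name__ == "__main__":
    total, seconds = timed(sum, range(1_000_000))
    print(total, f"{seconds:.4f}s")
```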

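### In Conclusion

The cell below divides the sequential load time by the parallel load time recorded when this notebook was originally run, giving a speedup of roughly 16x. Your numbers will vary with hardware, operating system, and `num_workers`.

In [None]:
# approximate speedup: sequential load time / parallel load time
821 / 51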
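To repeat this comparison with your own measurements, a small helper can report both the speedup and the share of runtime saved. This is a sketch: the function names are illustrative, and both inputs are assumed to be load times in the same unit.

```python
def speedup(sequential_time: float, parallel_time: float) -> float:
    """How many times faster the parallel run was than the sequential one."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return sequential_time / parallel_time


def percent_time_saved(sequential_time: float, parallel_time: float) -> float:
    """Percentage of the sequential runtime eliminated by running in parallel."""
    if sequential_time <= 0:
        raise ValueError("sequential_time must be positive")
    return 100.0 * (1.0 - parallel_time / sequential_time)


if __name__ == "__main__":
    print(f"{speedup(821, 51):.1f}x faster")  # 16.1x faster
    print(f"{percent_time_saved(821, 51):.1f}% less time")  # 93.8% less time
```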