# Read Multiple Parquet Files and Write to Single CSV using Python

This post shows you how to use Dask to read multiple Parquet files from a single folder into your Python session and write them to a single CSV file.

The main thing to note here is that Dask accepts an asterisk (`*`) as a glob character to match related filenames.
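To see which filenames a pattern like `dummy_df_*.parquet` would pick up, here is a minimal sketch using Python's standard-library `fnmatch` module. The directory listing is made up for illustration, and Dask's own matching may differ in details such as ordering.

```python
from fnmatch import fnmatch

# Hypothetical directory listing; only the first three follow the naming scheme
filenames = [
    "dummy_df_0.parquet",
    "dummy_df_1.parquet",
    "dummy_df_2.parquet",
    "dummy_df_all.csv",
    "other_data.parquet",
]

pattern = "dummy_df_*.parquet"

# The asterisk stands for any run of characters (including none)
matched = [name for name in filenames if fnmatch(name, pattern)]
print(matched)
```

Only the names that start with `dummy_df_` and end with `.parquet` pass the filter, which is why the CSV and the unrelated Parquet file are left out.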


## Create Multiple Parquet Files

Let's create some dummy dataframes and write them to multiple Parquet files.

In [1]:
import pandas as pd
import numpy as np

In [24]:
# use the recommended method for generating random integers with NumPy
rng = np.random.default_rng()

In [7]:
# generate 3 dummy dataframes with similar filenames
for i in range(3):
    df = pd.DataFrame(rng.integers(0, 100, size=(10, 4)), columns=list('ABCD'))
    df.to_parquet(f"dummy_df_{i}.parquet")

## Load Multiple Parquet Files with Dask

In [9]:
import dask.dataframe as dd

In [10]:
ddf = dd.read_parquet('dummy_df_*.parquet', index=False)

In [11]:
len(ddf)

30

In [12]:
ddf

npartitions=3,A,B,C,D
,int64,int64,int64,int64
,...,...,...,...
,...,...,...,...
,...,...,...,...


In [13]:
ddf.head()

Unnamed: 0,A,B,C,D
0,44,77,59,68
1,98,23,42,85
2,42,37,23,88
3,6,18,86,76
4,49,78,23,81


## Reset the Index

In [None]:
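# Dask resets the index within each partition, so the values restart at 0 for every file
ddf = ddf.reset_index(drop=True)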

In [89]:
ddf.compute()

Unnamed: 0,a,b,c
0,1,4,7
1,2,5,8
2,3,6,9
0,1,4,7
1,2,5,8
2,3,6,9
0,1,4,7
1,2,5,8
2,3,6,9


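### Open question: how to set the index correctly
- `set_index`: needs the name of a column to use as the new index
- `reset_index`: in Dask, restarts the index at 0 within each partition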

## Concat in different order

## Write Parquet Files to CSV

In [21]:
ddf.to_csv("dummy_df_all.csv", 
           single_file=True, 
           index=False
)

['/Users/rpelgrim/Documents/git/coiled-resources/parquet-csv/dummy_df_all.csv']

In [22]:
test = pd.read_csv("dummy_df_all.csv")

In [23]:
test

Unnamed: 0,A,B,C,D
0,44,77,59,68
1,98,23,42,85
2,42,37,23,88
3,6,18,86,76
4,49,78,23,81
5,49,8,26,22
6,55,19,99,24
7,1,57,58,6
8,34,94,38,54
9,82,81,83,55
