# Reading multiple files

The easiest way is to create a Python list of file paths and then iterate over it with a list comprehension, which returns a list of pandas DataFrames.

In [1]:
import pandas as pd
import numpy as np

# depends on all the csv files having the same format, e.g. delimiter, so that 
# the same options can be applied to 'read_csv'
files = [
    './data/Summer Olympic medals/Gold.csv',
    './data/Summer Olympic medals/Silver.csv',
    './data/Summer Olympic medals/Bronze.csv'
]

data_frames = [pd.read_csv(file) for file in files]
data_frames

[     NOC                Country   Total
 0    USA          United States  2088.0
 1    URS           Soviet Union   838.0
 2    GBR         United Kingdom   498.0
 3    FRA                 France   378.0
 4    GER                Germany   407.0
 5    AUS              Australia   293.0
 6    ITA                  Italy   460.0
 7    HUN                Hungary   400.0
 8    SWE                 Sweden   347.0
 9    NED            Netherlands   212.0
 10   ROU                Romania   155.0
 11   JPN                  Japan   206.0
 12   RUS                 Russia   192.0
 13   CAN                 Canada   154.0
 14   GDR           East Germany   329.0
 15   POL                 Poland   103.0
 16   FIN                Finland   124.0
 17   CHN                  China   234.0
 18   FRG           West Germany   143.0
 19   BRA                 Brazil    59.0
 20   DEN                Denmark   147.0
 21   BEL                Belgium    91.0
 22   NOR                 Norway   194.0
 23   SUI       

This approach is not restricted to CSV files. As long as pandas has a suitable import function, e.g. `read_table`, `read_json`, `read_excel`, etc., you can use a loop or comprehension to build a list of DataFrames from the source files.
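
As a minimal sketch of the same comprehension with a different reader, the block below reads two tab-separated files with `pd.read_table`. The file names, column names, and values are hypothetical and are written to a temporary directory only so the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical demo files, created in a temporary directory for illustration only.
tmp_dir = Path(tempfile.mkdtemp())
tsv_files = [tmp_dir / "a.tsv", tmp_dir / "b.tsv"]
tsv_files[0].write_text("NOC\tTotal\nUSA\t1\nGBR\t2\n")
tsv_files[1].write_text("NOC\tTotal\nFRA\t3\n")

# Same pattern as with read_csv; read_table uses a tab delimiter by default.
tsv_frames = [pd.read_table(path) for path in tsv_files]
```

Only the reader function changes; the list of paths and the comprehension stay the same.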

Where the file names share a common pattern, we can use Python's `glob` module to return a list of all the matching filenames, which can then be iterated over.

In [2]:
from glob import glob

# match for all files starting with 'sales', and ending with '.csv'
filenames = glob('./data/Sales/sales*.csv')
print(type(filenames))
filenames

<class 'list'>


['./data/Sales/sales-feb-2015.csv',
 './data/Sales/sales-jan-2015.csv',
 './data/Sales/sales-mar-2015.csv']

In [3]:
dfs = [pd.read_csv(f) for f in filenames]
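
A plain list of DataFrames loses track of which file each one came from. One possible variation, sketched here under assumptions, is a dict comprehension keyed by file name. The demo files, the `Product` and `Units` columns, and the names `demo_filenames` and `frames_by_name` are hypothetical; `glob` makes no promise about the order of its results, so the list is sorted first.

```python
import tempfile
from glob import glob
from pathlib import Path

import pandas as pd

# Hypothetical demo files mimicking the 'sales*.csv' naming pattern.
tmp_dir = Path(tempfile.mkdtemp())
for month in ("jan", "feb", "mar"):
    (tmp_dir / f"sales-{month}-2015.csv").write_text("Product,Units\nWidget,1\n")

# Sort the matches so the order is reproducible across platforms.
demo_filenames = sorted(glob(str(tmp_dir / "sales*.csv")))

# Key each DataFrame by its file name without the directory or extension.
frames_by_name = {Path(f).stem: pd.read_csv(f) for f in demo_filenames}
```

A single month can then be looked up directly, e.g. `frames_by_name['sales-jan-2015']`.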

When DataFrames share the same structure, a 'quick and dirty' way to combine them is to copy the required columns from one or more DataFrames into a single DataFrame that holds all the required data.

In [4]:
# copy the 'gold' dataframe and rename its columns
medals = data_frames[0].copy()
medals.columns = ['NOC', 'Country', 'Gold']

medals.head()

   NOC         Country    Gold
0  USA   United States  2088.0
1  URS    Soviet Union   838.0
2  GBR  United Kingdom   498.0
3  FRA          France   378.0
4  GER         Germany   407.0


In [5]:
# add the columns of interest from the other dataframes
medals['Silver'] = data_frames[1]['Total']
medals['Bronze'] = data_frames[2]['Total']
medals.head()

   NOC         Country    Gold  Silver  Bronze
0  USA   United States  2088.0  1195.0  1052.0
1  URS    Soviet Union   838.0   627.0   584.0
2  GBR  United Kingdom   498.0   591.0   505.0
3  FRA          France   378.0   461.0   475.0
4  GER         Germany   407.0   350.0   454.0
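
The column-copy approach works only when the rows of every DataFrame line up, because pandas matches rows by index label when a Series is assigned as a new column. The sketch below uses small made-up frames (`gold`, `silver`, and their values are hypothetical, not the medal data) to show what happens when the row order differs, and one possible way to align on `NOC` instead.

```python
import pandas as pd

# Hypothetical toy data: same countries, but listed in a different order.
gold = pd.DataFrame({"NOC": ["USA", "GBR"], "Total": [10.0, 5.0]})
silver = pd.DataFrame({"NOC": ["GBR", "USA"], "Total": [7.0, 8.0]})

# Quick-and-dirty copy: rows are matched on the default index (0, 1),
# so row 0 (USA) receives the value from silver's row 0 (GBR).
combined = gold.rename(columns={"Total": "Gold"})
combined["Silver"] = silver["Total"]

# Using NOC as the index makes the assignment match on country code instead.
aligned = gold.set_index("NOC").rename(columns={"Total": "Gold"})
aligned["Silver"] = silver.set_index("NOC")["Total"]
```

In `combined` the USA row ends up with GBR's silver count, while `aligned` pairs each country with its own value.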
