# Working with Multiple DataFrames

- In most real-life projects, you will not get all your data from a single source. You will often need to combine data gathered from multiple sources.

**We have already worked with the Big Mart sales data. Here, the data has been split into separate files by outlet size. Let's see how to work with multiple files.**

### Read The Data

```python
# Importing the pandas library
import pandas as pd
```

### Read the three files outlet_size_small.csv, outlet_size_medium.csv and outlet_size_high.csv stored in the datasets folder.

```python
# Read the datasets
outlet_size_small = pd.read_csv('datasets/outlet_size_small.csv')
outlet_size_medium = pd.read_csv('datasets/outlet_size_medium.csv')
oulet_size_large = pd.read_csv('datasets/outlet_size_high.csv')
```
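When there are many files, reading each one by hand gets repetitive. Below is a minimal sketch, assuming the files follow the `outlet_size_<size>.csv` naming pattern used above: it loops over the sizes, reads each file, and stores the dataframes in a dictionary keyed by size. To keep the sketch self-contained, it first writes two tiny hypothetical CSV files to a temporary folder instead of using the real `datasets` folder.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the real 'datasets' folder: write two tiny CSVs
# so the sketch runs anywhere.
folder = tempfile.mkdtemp()
pd.DataFrame({'Item_Identifier': ['FDA03', 'FDS46'],
              'Outlet_Identifier': ['OUT046', 'OUT046']}).to_csv(
    os.path.join(folder, 'outlet_size_small.csv'), index=False)
pd.DataFrame({'Item_Identifier': ['NCD19'],
              'Outlet_Identifier': ['OUT013']}).to_csv(
    os.path.join(folder, 'outlet_size_high.csv'), index=False)

# Read every file that follows the outlet_size_<size>.csv pattern into a dict
sizes = ['small', 'high']
frames = {
    size: pd.read_csv(os.path.join(folder, f'outlet_size_{size}.csv'))
    for size in sizes
}

for size, frame in frames.items():
    print(size, frame.shape)
```

A dictionary keeps each dataframe addressable by name, and `list(frames.values())` is already in the list form that `pd.concat` expects.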

### Let's check the shape of the data

```python
outlet_size_small.shape, outlet_size_medium.shape, oulet_size_large.shape
```

```
((2388, 9), (2793, 9), (932, 9))
```

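### So, there are 2388 rows for small outlets, 2793 for medium outlets and 932 for large outlets, each with 9 columns. Each row is an item sold at an outlet, not a separate outlet.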
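Since `shape` counts rows, counting the actual outlets needs the number of distinct values in `Outlet_Identifier`. Below is a minimal sketch on the five rows shown in the small-outlet preview, keeping only two of the columns for brevity.

```python
import pandas as pd

# Five rows copied from the small-outlet preview (only two columns kept)
small = pd.DataFrame({
    'Item_Identifier': ['FDA03', 'FDS46', 'FDP49', 'FDU02', 'NCB30'],
    'Outlet_Identifier': ['OUT046', 'OUT046', 'OUT046', 'OUT035', 'OUT035'],
})

n_rows = small.shape[0]                            # item-outlet records
n_outlets = small['Outlet_Identifier'].nunique()   # distinct outlets
print(n_rows, n_outlets)   # 5 2
```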

### Let's have a look at the data

### OUTLET SIZE SMALL

```python
outlet_size_small.head()
```

|   | Item_Identifier | Item_Weight | Item_Fat_Content | Item_Visibility | Item_Type | Item_MRP | Outlet_Identifier | Outlet_Type | Item_Outlet_Sales |
|---|---|---|---|---|---|---|---|---|---|
| 0 | FDA03 | 18.5 | Regular | 0.045464 | Dairy | 144.1102 | OUT046 | Supermarket Type1 | 2187.153 |
| 1 | FDS46 | 17.6 | Regular | 0.047257 | Snack Foods | 119.6782 | OUT046 | Supermarket Type1 | 2145.2076 |
| 2 | FDP49 | 9.0 | Regular | 0.069089 | Breakfast | 56.3614 | OUT046 | Supermarket Type1 | 1547.3192 |
| 3 | FDU02 | 13.35 | Low Fat | 0.102492 | Dairy | 230.5352 | OUT035 | Supermarket Type1 | 2748.4224 |
| 4 | NCB30 | 14.6 | Low Fat | 0.025698 | Household | 196.5084 | OUT035 | Supermarket Type1 | 1587.2672 |


### OUTLET SIZE MEDIUM

```python
outlet_size_medium.head()
```

|   | Item_Identifier | Item_Weight | Item_Fat_Content | Item_Visibility | Item_Type | Item_MRP | Outlet_Identifier | Outlet_Type | Item_Outlet_Sales |
|---|---|---|---|---|---|---|---|---|---|
| 0 | FDA15 | 9.3 | Low Fat | 0.016047 | Dairy | 249.8092 | OUT049 | Supermarket Type1 | 3735.138 |
| 1 | DRC01 | 5.92 | Regular | 0.019278 | Soft Drinks | 48.2692 | OUT018 | Supermarket Type2 | 443.4228 |
| 2 | FDN15 | 17.5 | Low Fat | 0.01676 | Meat | 141.618 | OUT049 | Supermarket Type1 | 2097.27 |
| 3 | FDP36 | 10.395 | Regular | 0.0 | Baking Goods | 51.4008 | OUT018 | Supermarket Type2 | 556.6088 |
| 4 | FDP10 | NaN | Low Fat | 0.12747 | Snack Foods | 107.7622 | OUT027 | Supermarket Type3 | 4022.7636 |


### OUTLET SIZE LARGE

```python
oulet_size_large.head()
```

|   | Item_Identifier | Item_Weight | Item_Fat_Content | Item_Visibility | Item_Type | Item_MRP | Outlet_Identifier | Outlet_Type | Item_Outlet_Sales |
|---|---|---|---|---|---|---|---|---|---|
| 0 | NCD19 | 8.93 | Low Fat | 0.0 | Household | 53.8614 | OUT013 | Supermarket Type1 | 994.7052 |
| 1 | FDO10 | 13.65 | Regular | 0.012741 | Snack Foods | 57.6588 | OUT013 | Supermarket Type1 | 343.5528 |
| 2 | FDF32 | 16.35 | Low Fat | 0.068024 | Fruits and Vegetables | 196.4426 | OUT013 | Supermarket Type1 | 1977.426 |
| 3 | FDN22 | 18.85 | Regular | 0.13819 | Snack Foods | 250.8724 | OUT013 | Supermarket Type1 | 3775.086 |
| 4 | DRJ59 | 11.65 | low fat | 0.019356 | Hard Drinks | 39.1164 | OUT013 | Supermarket Type1 | 308.9312 |


## CONCATENATE ALL THE DATAFRAMES

**We will use the `pd.concat` function to concatenate all the dataframes. You just need to pass it a list of the dataframes to concatenate.**
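`pd.concat` also accepts a `keys` argument that labels each input, which helps you remember which file a row came from once the dataframes are stacked. Below is a minimal sketch with two tiny hypothetical dataframes standing in for the outlet files; the column name `Outlet_Size` is an illustrative choice, not one from the original data.

```python
import pandas as pd

small = pd.DataFrame({'Item_Identifier': ['FDA03', 'FDS46'],
                      'Item_MRP': [144.1102, 119.6782]})
large = pd.DataFrame({'Item_Identifier': ['NCD19'],
                      'Item_MRP': [53.8614]})

# keys adds an outer index level naming the source of each row
combined = pd.concat([small, large], keys=['small', 'large'])

# Name that outer level, then move it into an ordinary column
combined = (combined
            .rename_axis(['Outlet_Size', None])
            .reset_index(level='Outlet_Size'))
print(combined)
```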

## FOR ROW-WISE CONCATENATION USE AXIS=0

```python
# dataframes list
all_dataframes = [outlet_size_small, outlet_size_medium, oulet_size_large]
```

```python
# concatenate all the dataframes
data = pd.concat(all_dataframes, axis=0)
```

```python
# shape of the data
data.shape
```

```
(6113, 9)
```
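The row count checks out: 2388 + 2793 + 932 = 6113. Note, however, that by default `pd.concat` keeps each dataframe's original index, so labels such as 0, 1, 2, ... repeat once per file in `data`. Passing `ignore_index=True` builds a fresh 0..n-1 index instead. Below is a minimal sketch with tiny hypothetical dataframes.

```python
import pandas as pd

a = pd.DataFrame({'Item_Outlet_Sales': [2187.153, 2145.2076]})
b = pd.DataFrame({'Item_Outlet_Sales': [3735.138]})
c = pd.DataFrame({'Item_Outlet_Sales': [994.7052, 343.5528]})
frames = [a, b, c]

stacked = pd.concat(frames, axis=0)
print(stacked.index.tolist())   # [0, 1, 0, 0, 1]  (labels repeat)

clean = pd.concat(frames, axis=0, ignore_index=True)
print(clean.index.tolist())     # [0, 1, 2, 3, 4]

# Row-wise concatenation should neither lose nor add rows
assert len(clean) == sum(len(f) for f in frames)
```

Repeated labels matter later: `stacked.loc[0]` returns three rows, not one.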

## FOR COLUMN-WISE CONCATENATION USE AXIS=1

**Concatenating dataframes column-wise is generally not advised. If you do need it, run a few checks first: both dataframes must have the same number of rows, and their indexes must match, because `pd.concat` with axis=1 pairs rows by index label rather than by position. Once these checks pass, you can simply use axis=1 to do the job.**
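The index check matters because column-wise `pd.concat` performs an outer join on the index labels. Below is a minimal sketch with two hypothetical dataframes whose indexes differ: rows sharing a label are joined, and labels present in only one dataframe get NaN in the other's columns. When you know the rows are already in matching order, resetting both indexes restores positional pairing.

```python
import pandas as pd

names = pd.DataFrame({'name': ['Arvind', 'Rahul', 'Prateek']})     # index 0, 1, 2
phones = pd.DataFrame({'phone_no': [212202, 202021, 212334]},
                      index=[1, 2, 3])                            # index 1, 2, 3

# Aligned on labels: label 0 has no phone, label 3 has no name -> NaN cells
misaligned = pd.concat([names, phones], axis=1)
print(misaligned.shape)   # (4, 2)

# Same-length dataframes with matching row order: drop the old labels first
aligned = pd.concat([names.reset_index(drop=True),
                     phones.reset_index(drop=True)], axis=1)
print(aligned.shape)      # (3, 2)
```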

**Let's see this with an example.**

```python
sample_dataframe = pd.DataFrame({
    'roll_no' : [102, 101, 104, 103, 105],
    'name' : ['Arvind', 'Rahul', 'Prateek', 'Piyush', 'Kartik'],
    'grade' : ['B', 'B', 'A', 'C', 'A'],
    'marks' : [15, 15, 20, 4, 22],
    'city' : ['Gurugram', 'Delhi', 'Delhi', 'Gurugram', 'Hyderabad']
})
sample_dataframe
```

|   | roll_no | name | grade | marks | city |
|---|---|---|---|---|---|
| 0 | 102 | Arvind | B | 15 | Gurugram |
| 1 | 101 | Rahul | B | 15 | Delhi |
| 2 | 104 | Prateek | A | 20 | Delhi |
| 3 | 103 | Piyush | C | 4 | Gurugram |
| 4 | 105 | Kartik | A | 22 | Hyderabad |


## Let's create a dataframe with a single column, phone_no. Here, we assume that the phone numbers are in the same order as the names in sample_dataframe.

```python
# another sample dataframe
phone_no = pd.DataFrame({'phone_no' : [212202, 202021, 212334, 213431, 211721]})
phone_no
```

|   | phone_no |
|---|---|
| 0 | 212202 |
| 1 | 202021 |
| 2 | 212334 |
| 3 | 213431 |
| 4 | 211721 |


```python
combined = pd.concat([sample_dataframe, phone_no], axis=1)
combined
```

|   | roll_no | name | grade | marks | city | phone_no |
|---|---|---|---|---|---|---|
| 0 | 102 | Arvind | B | 15 | Gurugram | 212202 |
| 1 | 101 | Rahul | B | 15 | Delhi | 202021 |
| 2 | 104 | Prateek | A | 20 | Delhi | 212334 |
| 3 | 103 | Piyush | C | 4 | Gurugram | 213431 |
| 4 | 105 | Kartik | A | 22 | Hyderabad | 211721 |
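The checks recommended for column-wise concatenation can be wrapped in a small helper that refuses to concatenate when the row counts or indexes differ. `concat_columns_checked` is a hypothetical name used only for illustration; it is not part of pandas, and requiring identical indexes in identical order is a deliberately strict assumption.

```python
import pandas as pd


def concat_columns_checked(left, right):
    """Hypothetical helper: column-wise concat only if the rows line up."""
    # Same number of rows ...
    if len(left) != len(right):
        raise ValueError(f'row counts differ: {len(left)} vs {len(right)}')
    # ... and identical index labels in the same order
    if not left.index.equals(right.index):
        raise ValueError('indexes differ; align or reset them first')
    return pd.concat([left, right], axis=1)


students = pd.DataFrame({'roll_no': [102, 101], 'name': ['Arvind', 'Rahul']})
phones = pd.DataFrame({'phone_no': [212202, 202021]})
print(concat_columns_checked(students, phones))

try:
    concat_columns_checked(students, phones.iloc[:1])   # one row short
except ValueError as err:
    print('refused:', err)
```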
