# Renaming and Combining

## Introduction

Data often comes to us with column names, index names, or other naming conventions that we are not satisfied with. For these cases, pandas provides functions to change the names of the offending entries to something better.

## Renaming

The first function we'll introduce here is ```rename()```, which lets you change index names and/or column names. For example, to change the ```DY``` column in our dataset to ```D/Y```, we would do:

In [2]:
import pandas as pd

br_small_caps = pd.read_csv('statusinvest-busca-avancada.csv', delimiter=';')
br_small_caps.rename(columns={'DY': 'D/Y'})

Unnamed: 0,TICKER,PRECO,D/Y,P/L,P/VP,P/ATIVOS,MARGEM BRUTA,MARGEM EBIT,MARG. LIQUIDA,P/EBIT,...,PATRIMONIO / ATIVOS,PASSIVOS / ATIVOS,GIRO ATIVOS,CAGR RECEITAS 5 ANOS,CAGR LUCROS 5 ANOS,LIQUIDEZ MEDIA DIARIA,VPA,LPA,PEG Ratio,VALOR DE MERCADO
0,AGRO3,27.55,11.66,11.95,1.45,0.85,27.98,15.46,21.58,16.68,...,0.59,0.41,0.33,27.22,13.38,6.609.378.40,19.06,2.31,0.04,2.828.928.882.20
1,ATOM3,2.03,,2.5,1.23,0.91,91.76,-3.39,84.24,-62.11,...,0.74,0.21,0.43,31.18,22.25,13.557.37,1.65,0.81,0.01,48.323.942.94
2,BLAU3,11.29,2.48,8.51,1.0,0.64,34.09,22.18,16.14,6.19,...,0.64,0.36,0.47,11.91,14.06,2.278.661.06,11.26,1.33,-0.26,2.025.357.571.31
3,BOAS3,7.95,,15.22,1.83,1.7,56.53,19.25,32.39,25.6,...,0.93,0.07,0.34,8.81,74.34,53.310.276.50,4.34,0.52,5.59,4.212.250.617.75
4,BRBI11,14.5,9.1,8.88,1.84,0.14,100.0,8.88,61.44,61.47,...,0.07,0.93,0.02,-16.03,27.88,3.910.827.37,7.87,1.63,0.4,1.522.437.708.00
5,BRIT3,4.24,2.06,12.01,1.25,0.58,45.24,18.58,12.48,8.07,...,0.47,0.53,0.39,41.71,44.43,2.218.034.23,3.4,0.35,0.11,1.904.162.443.84
6,CAMB3,11.15,1.32,6.52,1.86,1.29,48.26,21.89,15.87,4.73,...,0.69,0.31,1.24,13.16,34.14,363.010.12,5.98,1.71,1.58,471.367.142.00
7,CAMB4,6.25,0.43,3.66,1.05,0.72,48.26,21.89,15.87,2.65,...,0.69,0.31,1.24,13.16,34.14,,5.98,1.71,0.89,471.367.142.00
8,CAML3,8.71,2.95,4.41,1.02,0.33,19.82,6.85,4.12,2.66,...,0.33,0.67,1.82,18.83,13.77,6.617.186.57,8.57,1.97,0.01,3.048.500.000.00
9,CEBR3,20.93,9.07,8.64,1.43,1.02,50.78,61.36,49.19,6.93,...,0.71,0.11,0.24,-32.89,14.18,72.491.66,14.59,2.42,-0.52,1.450.093.620.45
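
Note that the cell above only displays the result of ```rename()``` without assigning it. As a minimal sketch on a made-up two-row frame (the name ```toy``` and its values are illustrative and not taken from the dataset), the following shows that ```rename()``` returns a new DataFrame and leaves the original untouched, so the result has to be assigned if we want to keep it:

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
toy = pd.DataFrame({'TICKER': ['AAA3', 'BBB4'], 'DY': [1.5, 2.5]})

# rename() returns a new DataFrame; `toy` itself is not modified.
renamed = toy.rename(columns={'DY': 'D/Y'})

print(list(toy.columns))      # the original keeps its column names
print(list(renamed.columns))  # the returned copy carries the new name
```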


```rename()``` lets you rename index *or* column values by specifying an ```index``` or ```columns``` keyword parameter, respectively. It supports a variety of input formats, but usually a Python dictionary is the most convenient. Here is an example using it to rename some elements of the index.

In [3]:
br_small_caps.rename(index={0 : 'firstEntry', 1: 'secondEntry'})

Unnamed: 0,TICKER,PRECO,DY,P/L,P/VP,P/ATIVOS,MARGEM BRUTA,MARGEM EBIT,MARG. LIQUIDA,P/EBIT,...,PATRIMONIO / ATIVOS,PASSIVOS / ATIVOS,GIRO ATIVOS,CAGR RECEITAS 5 ANOS,CAGR LUCROS 5 ANOS,LIQUIDEZ MEDIA DIARIA,VPA,LPA,PEG Ratio,VALOR DE MERCADO
firstEntry,AGRO3,27.55,11.66,11.95,1.45,0.85,27.98,15.46,21.58,16.68,...,0.59,0.41,0.33,27.22,13.38,6.609.378.40,19.06,2.31,0.04,2.828.928.882.20
secondEntry,ATOM3,2.03,,2.5,1.23,0.91,91.76,-3.39,84.24,-62.11,...,0.74,0.21,0.43,31.18,22.25,13.557.37,1.65,0.81,0.01,48.323.942.94
2,BLAU3,11.29,2.48,8.51,1.0,0.64,34.09,22.18,16.14,6.19,...,0.64,0.36,0.47,11.91,14.06,2.278.661.06,11.26,1.33,-0.26,2.025.357.571.31
3,BOAS3,7.95,,15.22,1.83,1.7,56.53,19.25,32.39,25.6,...,0.93,0.07,0.34,8.81,74.34,53.310.276.50,4.34,0.52,5.59,4.212.250.617.75
4,BRBI11,14.5,9.1,8.88,1.84,0.14,100.0,8.88,61.44,61.47,...,0.07,0.93,0.02,-16.03,27.88,3.910.827.37,7.87,1.63,0.4,1.522.437.708.00
5,BRIT3,4.24,2.06,12.01,1.25,0.58,45.24,18.58,12.48,8.07,...,0.47,0.53,0.39,41.71,44.43,2.218.034.23,3.4,0.35,0.11,1.904.162.443.84
6,CAMB3,11.15,1.32,6.52,1.86,1.29,48.26,21.89,15.87,4.73,...,0.69,0.31,1.24,13.16,34.14,363.010.12,5.98,1.71,1.58,471.367.142.00
7,CAMB4,6.25,0.43,3.66,1.05,0.72,48.26,21.89,15.87,2.65,...,0.69,0.31,1.24,13.16,34.14,,5.98,1.71,0.89,471.367.142.00
8,CAML3,8.71,2.95,4.41,1.02,0.33,19.82,6.85,4.12,2.66,...,0.33,0.67,1.82,18.83,13.77,6.617.186.57,8.57,1.97,0.01,3.048.500.000.00
9,CEBR3,20.93,9.07,8.64,1.43,1.02,50.78,61.36,49.19,6.93,...,0.71,0.11,0.24,-32.89,14.18,72.491.66,14.59,2.42,-0.52,1.450.093.620.45
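
A dictionary is not the only input format. As a small sketch on an invented frame (```toy``` and its values are assumptions made for illustration), ```rename()``` can also take a function that is applied to every label, and ```index``` and ```columns``` can be passed in the same call:

```python
import pandas as pd

# Hypothetical toy data, not the real dataset.
toy = pd.DataFrame({'PRECO': [10.0, 20.0], 'DY': [1.5, 2.5]})

# A callable is applied to each label; here every column name is lowercased.
lowered = toy.rename(columns=str.lower)

# index and columns mappings can be combined in a single call.
both = toy.rename(index={0: 'firstEntry'}, columns={'DY': 'D/Y'})

print(list(lowered.columns))
print(list(both.index), list(both.columns))
```

Labels that do not appear in a mapping, such as row ```1``` here, are simply left unchanged.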


You'll probably rename columns very often, but rename index values very rarely. For that, ```set_index()``` is usually more convenient. <br />
Both the row index and the column index can have their own ```name``` attribute. The complementary ```rename_axis()``` method may be used to change these names. For example:

In [4]:
br_small_caps.rename_axis("stocks", axis='rows').rename_axis("info", axis='columns')

info,TICKER,PRECO,DY,P/L,P/VP,P/ATIVOS,MARGEM BRUTA,MARGEM EBIT,MARG. LIQUIDA,P/EBIT,...,PATRIMONIO / ATIVOS,PASSIVOS / ATIVOS,GIRO ATIVOS,CAGR RECEITAS 5 ANOS,CAGR LUCROS 5 ANOS,LIQUIDEZ MEDIA DIARIA,VPA,LPA,PEG Ratio,VALOR DE MERCADO
stocks,,,,,,,,,,,,,,,,,,,,,
0,AGRO3,27.55,11.66,11.95,1.45,0.85,27.98,15.46,21.58,16.68,...,0.59,0.41,0.33,27.22,13.38,6.609.378.40,19.06,2.31,0.04,2.828.928.882.20
1,ATOM3,2.03,,2.5,1.23,0.91,91.76,-3.39,84.24,-62.11,...,0.74,0.21,0.43,31.18,22.25,13.557.37,1.65,0.81,0.01,48.323.942.94
2,BLAU3,11.29,2.48,8.51,1.0,0.64,34.09,22.18,16.14,6.19,...,0.64,0.36,0.47,11.91,14.06,2.278.661.06,11.26,1.33,-0.26,2.025.357.571.31
3,BOAS3,7.95,,15.22,1.83,1.7,56.53,19.25,32.39,25.6,...,0.93,0.07,0.34,8.81,74.34,53.310.276.50,4.34,0.52,5.59,4.212.250.617.75
4,BRBI11,14.5,9.1,8.88,1.84,0.14,100.0,8.88,61.44,61.47,...,0.07,0.93,0.02,-16.03,27.88,3.910.827.37,7.87,1.63,0.4,1.522.437.708.00
5,BRIT3,4.24,2.06,12.01,1.25,0.58,45.24,18.58,12.48,8.07,...,0.47,0.53,0.39,41.71,44.43,2.218.034.23,3.4,0.35,0.11,1.904.162.443.84
6,CAMB3,11.15,1.32,6.52,1.86,1.29,48.26,21.89,15.87,4.73,...,0.69,0.31,1.24,13.16,34.14,363.010.12,5.98,1.71,1.58,471.367.142.00
7,CAMB4,6.25,0.43,3.66,1.05,0.72,48.26,21.89,15.87,2.65,...,0.69,0.31,1.24,13.16,34.14,,5.98,1.71,0.89,471.367.142.00
8,CAML3,8.71,2.95,4.41,1.02,0.33,19.82,6.85,4.12,2.66,...,0.33,0.67,1.82,18.83,13.77,6.617.186.57,8.57,1.97,0.01,3.048.500.000.00
9,CEBR3,20.93,9.07,8.64,1.43,1.02,50.78,61.36,49.19,6.93,...,0.71,0.11,0.24,-32.89,14.18,72.491.66,14.59,2.42,-0.52,1.450.093.620.45
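
To make the difference between the two methods concrete, here is a minimal sketch on an invented frame (```toy``` and its values are illustrative assumptions). ```set_index()``` swaps the index values for those of a column, while ```rename_axis()``` only changes the ```name``` attached to an axis:

```python
import pandas as pd

# Hypothetical toy data, not the real dataset.
toy = pd.DataFrame({'TICKER': ['AAA3', 'BBB4'], 'PRECO': [10.0, 20.0]})

# set_index() replaces the index with the values of the TICKER column.
by_ticker = toy.set_index('TICKER')

# rename_axis() leaves the labels alone and only renames each axis.
named = by_ticker.rename_axis('stocks', axis='rows').rename_axis('info', axis='columns')

print(named.index.name)    # name of the row index
print(named.columns.name)  # name of the column index
```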


## Combining

When performing operations on a dataset, we will sometimes need to combine different DataFrames and/or Series in non-trivial ways. Pandas has three core methods for doing this. In order of increasing complexity, these are ```concat()```, ```join()```, and ```merge()```. Most of what ```merge()``` can do can also be done more simply with ```join()```, so we will omit it and focus on the first two functions here. <br />
The simplest combining method is ```concat()```. Given a list of elements, this function concatenates those elements along an axis. <br />
This is useful when the data lives in different DataFrame or Series objects that share the same fields (columns).

In [7]:
br_dividends = pd.read_csv('statusinvest-busca-avancada-dividends.csv', delimiter=';')

pd.concat([br_small_caps, br_dividends])

Unnamed: 0,TICKER,PRECO,DY,P/L,P/VP,P/ATIVOS,MARGEM BRUTA,MARGEM EBIT,MARG. LIQUIDA,P/EBIT,...,PATRIMONIO / ATIVOS,PASSIVOS / ATIVOS,GIRO ATIVOS,CAGR RECEITAS 5 ANOS,CAGR LUCROS 5 ANOS,LIQUIDEZ MEDIA DIARIA,VPA,LPA,PEG Ratio,VALOR DE MERCADO
0,AGRO3,27.55,11.66,11.95,1.45,0.85,27.98,15.46,21.58,16.68,...,0.59,0.41,0.33,27.22,13.38,6.609.378.40,19.06,2.31,0.04,2.828.928.882.20
1,ATOM3,2.03,,2.5,1.23,0.91,91.76,-3.39,84.24,-62.11,...,0.74,0.21,0.43,31.18,22.25,13.557.37,1.65,0.81,0.01,48.323.942.94
2,BLAU3,11.29,2.48,8.51,1.0,0.64,34.09,22.18,16.14,6.19,...,0.64,0.36,0.47,11.91,14.06,2.278.661.06,11.26,1.33,-0.26,2.025.357.571.31
3,BOAS3,7.95,,15.22,1.83,1.7,56.53,19.25,32.39,25.6,...,0.93,0.07,0.34,8.81,74.34,53.310.276.50,4.34,0.52,5.59,4.212.250.617.75
4,BRBI11,14.5,9.1,8.88,1.84,0.14,100.0,8.88,61.44,61.47,...,0.07,0.93,0.02,-16.03,27.88,3.910.827.37,7.87,1.63,0.4,1.522.437.708.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19,SYNE3,604,5532,165,084,029,4141,6211,4200,112,...,034,041,042,2359,7627,"2.467.846,55",715,366,000,"915.866.670,00"
20,TAEE11,3487,785,713,165,057,6345,7356,4245,411,...,035,065,019,1568,1096,"46.641.964,14",2118,489,031,"12.026.456.843,37"
21,TAEE4,1167,782,716,165,057,6345,7356,4245,413,...,035,065,019,1568,1096,"1.108.898,18",706,163,032,"12.026.456.843,37"
22,VALE3,5516,862,796,124,052,3545,2535,1488,467,...,042,056,044,733,,"989.261.751,32",4452,693,-043,"249.600.026.824,20"
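
By default, ```concat()``` keeps each input's own row labels, so labels can repeat in the result. As a hedged sketch on two invented frames (```a```, ```b``` and their values are assumptions for illustration), passing ```ignore_index=True``` asks pandas to renumber the rows instead:

```python
import pandas as pd

# Two hypothetical frames with the same columns.
a = pd.DataFrame({'TICKER': ['AAA3', 'BBB4'], 'PRECO': [10.0, 20.0]})
b = pd.DataFrame({'TICKER': ['CCC3'], 'PRECO': [30.0]})

# Default behaviour: the original row labels are preserved.
stacked = pd.concat([a, b])

# ignore_index=True discards the old labels and numbers the rows afresh.
renumbered = pd.concat([a, b], ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```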


The middlemost combiner in terms of complexity is ```join()```. ```join()``` lets you combine different DataFrame objects which have an index in common. For example, to pull together the stocks from both DataFrames that share the same ticker:

In [10]:
left = br_small_caps.set_index('TICKER')
right = br_dividends.set_index('TICKER')

left.join(right, lsuffix='_SC', rsuffix='_DV')

TICKER,PRECO_SC,DY_SC,P/L_SC,P/VP_SC,P/ATIVOS_SC,MARGEM BRUTA_SC,MARGEM EBIT_SC,MARG. LIQUIDA_SC,P/EBIT_SC,EV/EBIT_SC,...,PATRIMONIO / ATIVOS_DV,PASSIVOS / ATIVOS_DV,GIRO ATIVOS_DV,CAGR RECEITAS 5 ANOS_DV,CAGR LUCROS 5 ANOS_DV,LIQUIDEZ MEDIA DIARIA_DV,VPA_DV,LPA_DV,PEG Ratio_DV,VALOR DE MERCADO_DV
AGRO3,27.55,11.66,11.95,1.45,0.85,27.98,15.46,21.58,16.68,21.62,...,55.0,45.0,34.0,1378.0,1182.0,"3.190.795,23",2113.0,302.0,22.0,"2.096.795.926,48"
ATOM3,2.03,,2.5,1.23,0.91,91.76,-3.39,84.24,-62.11,-51.38,...,,,,,,,,,,
BLAU3,11.29,2.48,8.51,1.0,0.64,34.09,22.18,16.14,6.19,6.67,...,,,,,,,,,,
BOAS3,7.95,,15.22,1.83,1.7,56.53,19.25,32.39,25.6,18.69,...,,,,,,,,,,
BRBI11,14.5,9.1,8.88,1.84,0.14,100.0,8.88,61.44,61.47,12.66,...,,,,,,,,,,
BRIT3,4.24,2.06,12.01,1.25,0.58,45.24,18.58,12.48,8.07,11.59,...,,,,,,,,,,
CAMB3,11.15,1.32,6.52,1.86,1.29,48.26,21.89,15.87,4.73,4.45,...,,,,,,,,,,
CAMB4,6.25,0.43,3.66,1.05,0.72,48.26,21.89,15.87,2.65,4.45,...,,,,,,,,,,
CAML3,8.71,2.95,4.41,1.02,0.33,19.82,6.85,4.12,2.66,5.23,...,,,,,,,,,,
CEBR3,20.93,9.07,8.64,1.43,1.02,50.78,61.36,49.19,6.93,3.47,...,,,,,,,,,,


The ```lsuffix``` and ```rsuffix``` parameters are necessary here because both datasets have the same column names. If the column names did not overlap, we would not need them.
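
The role of the suffixes can be checked on a pair of invented frames (the names and values below are illustrative assumptions, not the real data). Without suffixes, joining frames whose column names overlap raises a ```ValueError```. With them, each side keeps its own column, and tickers missing from the right-hand frame get ```NaN``` because ```join()``` performs a left join by default:

```python
import pandas as pd

# Hypothetical frames sharing the column name PRECO.
left = pd.DataFrame({'TICKER': ['AAA3', 'BBB4'], 'PRECO': [10.0, 20.0]}).set_index('TICKER')
right = pd.DataFrame({'TICKER': ['AAA3'], 'PRECO': [11.0]}).set_index('TICKER')

# Overlapping column names without suffixes are rejected.
try:
    left.join(right)
    overlap_error = None
except ValueError as err:
    overlap_error = err

# The suffixes disambiguate the two PRECO columns.
joined = left.join(right, lsuffix='_SC', rsuffix='_DV')

print(overlap_error)
print(joined)
```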