**RENAMING AND COMBINING**

Oftentimes data will come to us with column names, index names, or other naming conventions that we are not satisfied with. 
We're going to see how to use pandas functions to change the names of the offending entries to something better.
We'll also explore how to combine data from multiple DataFrames and/or Series.

**Renaming**

**rename()** --> lets you change index names and/or column names.
For example, we can change the 'points' column to 'score':

In [25]:
points_renamed = reviews.rename(columns={'points' : 'score'})
points_renamed.columns

Index(['country', 'description', 'designation', 'score', 'price', 'province',
       'region_1', 'region_2', 'taster_name', 'taster_twitter_handle', 'title',
       'variety', 'winery'],
      dtype='object')

rename() lets you rename index or column values by specifying an index or columns keyword parameter, respectively. It supports a variety of input formats, but usually a Python dictionary is the most convenient. Here is an example using it to rename some elements of the index.

In [41]:
reviews.rename(index={0: 'firstEntry', 1: 'secondEntry'}).head(2)

,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
firstEntry,Italy,"Aromas include tropical fruit, broom, and mine...",Vulkà Bianco,87,20.0,Sicily & Sardinia,Etna,Eastern Sicily,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
secondEntry,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,Douro,Northern Portugal,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
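
As a small illustration of the "variety of input formats" mentioned above, the sketch below passes a function instead of a dictionary. The two-column frame `df` is a hypothetical stand-in for the reviews data, not part of the original notebook.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the wine reviews data.
df = pd.DataFrame({'points': [87, 90], 'price': [20.0, 15.0]})

# A function is applied to every column label.
upper = df.rename(columns=str.upper)

# A dictionary only touches the labels it mentions; keys that match
# no label are ignored by default.
partial = df.rename(columns={'points': 'score', 'not_a_column': 'x'})

print(list(upper.columns))    # ['POINTS', 'PRICE']
print(list(partial.columns))  # ['score', 'price']
print(list(df.columns))       # ['points', 'price'] (original is unchanged)
```

Note that rename() returns a new object, so the result has to be assigned to a name if we want to keep it.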


You'll probably rename columns very often, but rename index values very rarely. For that, set_index() is usually more convenient.
**set_index()** --> turns one or more columns into the index of the DataFrame.
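
A minimal sketch of set_index(), assuming a hypothetical frame with a 'title' column (the wine names here are made up for illustration):

```python
import pandas as pd

# Hypothetical data; the titles are placeholders.
df = pd.DataFrame({
    'title': ['Wine A', 'Wine B'],
    'points': [87, 90],
})

# The 'title' column becomes the row index and is removed from the columns.
by_title = df.set_index('title')

print(by_title.index.name)               # title
print(list(by_title.columns))            # ['points']
print(by_title.loc['Wine B', 'points'])  # 90
```

This tends to be more convenient than renaming index values one by one, because the labels come straight from the data.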

Both the row index and the column index can have their own name attribute. The complementary **rename_axis()** method may be used to change these names. For example:

In [43]:
reviews.rename_axis("wines", axis='rows').rename_axis("fields", axis='columns').head(2)

fields,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
wines,,,,,,,,,,,,,
0,Italy,"Aromas include tropical fruit, broom, and mine...",Vulkà Bianco,87,20.0,Sicily & Sardinia,Etna,Eastern Sicily,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,Douro,Northern Portugal,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
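
To make the "name attribute" concrete, here is a short sketch on a hypothetical two-column frame. It checks the names directly instead of reading them off the printed table.

```python
import pandas as pd

# Hypothetical stand-in for the reviews data.
df = pd.DataFrame({'points': [87, 90], 'price': [20.0, 15.0]})

named = df.rename_axis('wines', axis='rows').rename_axis('fields', axis='columns')

# rename_axis() changes the name of each axis, not the labels on it.
print(named.index.name)    # wines
print(named.columns.name)  # fields
print(list(named.index))   # [0, 1]
print(df.index.name)       # None (the original frame is untouched)
```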


**Combining**

When performing operations on a dataset, we will sometimes need to combine different DataFrames and/or Series in non-trivial ways.
Pandas has three core methods for doing this. In order of increasing complexity, these are **concat()**, **join()**, and **merge()**. Most of what merge() can do can also be done more simply with join(), so we will omit it and focus on the first two functions here.

The simplest combining method is concat(). Given a list of elements, this function will smush those elements together along an axis.

**concat()** --> joins (concatenates) several pandas objects, such as DataFrames or Series, along an axis (rows or columns).
```
pd.concat([df1, df2], axis=0)  # By rows (one below the other). NOTE: the indices are not reset automatically
pd.concat([df1, df2], axis=1)  # By columns (side by side)
```

This is useful when we have data in different DataFrame or Series objects that share the same fields (columns). One example is the YouTube Videos dataset, which splits the data up by country of origin (Canada and the UK, in this example). If we want to study multiple countries simultaneously, we can use concat() to smush them together:

In [None]:
canadian_youtube = pd.read_csv("../input/youtube-new/CAvideos.csv")
british_youtube = pd.read_csv("../input/youtube-new/GBvideos.csv")

pd.concat([canadian_youtube, british_youtube])
# returns a single DataFrame with one stacked below the other
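
Because concat() keeps each frame's original row labels, the result can contain repeated index values. The sketch below uses two tiny hypothetical frames (`ca` and `gb`, not the real YouTube files) to show the default behaviour and two standard options, `ignore_index` and `keys`.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the Canadian and British tables.
ca = pd.DataFrame({'title': ['a', 'b'], 'views': [10, 20]})
gb = pd.DataFrame({'title': ['c', 'd'], 'views': [30, 40]})

# Default: row labels are carried over, so 0 and 1 appear twice.
stacked = pd.concat([ca, gb])
print(list(stacked.index))     # [0, 1, 0, 1]

# ignore_index=True renumbers the rows from 0.
renumbered = pd.concat([ca, gb], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]

# keys= adds an outer index level recording which frame each row came from.
labelled = pd.concat([ca, gb], keys=['CA', 'GB'])
print(labelled.loc['GB']['views'].tolist())  # [30, 40]
```

Which option fits depends on whether the original row labels carry meaning; if they are just row numbers, `ignore_index=True` is usually the simpler choice.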

The middlemost combiner in terms of complexity is join(). **join()** lets you combine different DataFrame objects **which have an index in common**. For example, to pull down videos that happened to be trending on the same day in both Canada and the UK, we could do the following:

In [None]:
left = canadian_youtube.set_index(['title', 'trending_date'])
right = british_youtube.set_index(['title', 'trending_date'])

left.join(right, lsuffix='_CAN', rsuffix='_UK')
# returns one DataFrame next to the other; the suffixes tell apart the columns that come from each one

The lsuffix and rsuffix parameters are necessary here because the data has the same column names in both the British and Canadian datasets. If this weren't true (because, say, we'd renamed them beforehand), we wouldn't need them.
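
One detail worth checking on a small example: join() defaults to `how='left'`, so it keeps every row of the left frame and fills in NaN where the right frame has no match. To keep only index values present in both frames, `how='inner'` can be passed. The frames below are hypothetical miniatures, not the real YouTube data.

```python
import pandas as pd

# Hypothetical frames that share only the index value 'b'.
left = pd.DataFrame(
    {'title': ['a', 'b'], 'views': [10, 20]}
).set_index('title')
right = pd.DataFrame(
    {'title': ['b', 'c'], 'views': [200, 300]}
).set_index('title')

# Default how='left': 'a' is kept even though right has no such row.
left_join = left.join(right, lsuffix='_CAN', rsuffix='_UK')
print(list(left_join.index))     # ['a', 'b']

# how='inner': only index values found in both frames survive.
inner_join = left.join(right, lsuffix='_CAN', rsuffix='_UK', how='inner')
print(list(inner_join.index))    # ['b']
print(list(inner_join.columns))  # ['views_CAN', 'views_UK']
```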

**EXERCISES**

In [3]:
import pandas as pd

In [51]:
reviews = pd.DataFrame([
    {
        'country': 'Italy',
        'description': 'Aromas include tropical fruit, broom, and minerals.',
        'designation': 'Vulkà Bianco',
        'points': 87,
        'price': 20.0,
        'province': 'Sicily & Sardinia',
        'region_1': 'Etna',
        'region_2': 'Eastern Sicily',
        'taster_name': 'Kerin O’Keefe',
        'taster_twitter_handle': '@kerinokeefe',
        'title': 'Nicosia 2013 Vulkà Bianco (Etna)',
        'variety': 'White Blend',
        'winery': 'Nicosia'
    },
    {
        'country': 'Portugal',
        'description': 'This is ripe and fruity, a wine that is smooth and balanced.',
        'designation': 'Avidagos',
        'points': 87,
        'price': 15.0,
        'province': 'Douro',
        'region_1': 'Douro',
        'region_2': 'Northern Portugal',
        'taster_name': 'Roger Voss',
        'taster_twitter_handle': '@vossroger',
        'title': 'Quinta dos Avidagos 2011 Avidagos Red (Douro)',
        'variety': 'Portuguese Red',
        'winery': 'Quinta dos Avidagos'
    },
    {
        'country': 'France',
        'description': 'A dry style of Pinot Gris, crisp with acidity and minerality.',
        'designation': 'Classic',
        'points': 100,
        'price': 32.0,
        'province': 'Alsace',
        'region_1': 'Alsace',
        'region_2': 'Northeast France',
        'taster_name': 'Roger Voss',
        'taster_twitter_handle': '@vossroger',
        'title': 'Domaine Marcel Deiss 2012 Pinot Gris (Alsace)',
        'variety': 'Pinot Gris',
        'winery': 'Domaine Marcel Deiss'
    },
    {
        'country': 'France',
        'description': 'Big, rich and off-dry, with intensity and floral notes.',
        'designation': 'Lieu-dit Harth Cuvée Caroline',
        'points': 90,
        'price': 21.0,
        'province': 'Berona',
        'region_1': 'Alsace',
        'region_2': 'Northeast France',
        'taster_name': 'Roger Voss',
        'taster_twitter_handle': '@vossroger',
        'title': 'Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caroline (Alsace)',
        'variety': 'Gewürztraminer',
        'winery': 'Domaine Schoffit'
    },
    {
        'country': 'Spain',
        'description': 'Dark cherry, spice and leather aromas dominate this classic Rioja.',
        'designation': 'Reserva',
        'points': 89,
        'price': 18.0,
        'province': 'Rioja',
        'region_1': 'Rioja Alta',
        'region_2': 'Northern Spain',
        'taster_name': 'Michael Schachner',
        'taster_twitter_handle': '@wineschach',
        'title': 'Marqués de Cáceres 2011 Reserva (Rioja)',
        'variety': 'Tempranillo',
        'winery': 'Marqués de Cáceres'
    },
    {
        'country': 'US',
        'description': 'Fruity and soft, with hints of raspberry and vanilla.',
        'designation': 'Estate',
        'points': 88,
        'price': 25.0,
        'province': 'California',
        'region_1': 'Napa Valley',
        'region_2': 'North Coast',
        'taster_name': 'Jim Gordon',
        'taster_twitter_handle': '@jimgordonwine',
        'title': 'Robert Mondavi 2014 Cabernet Sauvignon (Napa Valley)',
        'variety': 'Cabernet Sauvignon',
        'winery': 'Robert Mondavi'
    },
    {
        'country': 'Argentina',
        'description': 'Bold and structured, offering black fruit and mocha.',
        'designation': 'Gran Reserva',
        'points': 92,
        'price': 30.0,
        'province': 'Mendoza Province',
        'region_1': 'Uco Valley',
        'taster_name': 'Alejandro Iglesias',
        'taster_twitter_handle': '@aliglesiaswine',
        'title': 'Trapiche 2015 Gran Reserva Malbec (Uco Valley)',
        'variety': 'Malbec',
        'winery': 'Trapiche'
    },
    {
        'country': 'Chile',
        'description': 'Smooth, with red berries and a touch of herbs.',
        'designation': 'Reserva Especial',
        'points': 86,
        'price': 12.0,
        'province': 'Maipo Valley',
        'region_2': 'Central Valley',
        'taster_name': 'Patricio Tapia',
        'taster_twitter_handle': '@ptapiawine',
        'title': 'Concha y Toro 2016 Carmenere (Maipo Valley)',
        'variety': 'Carmenere',
        'winery': 'Concha y Toro'
    },
    {
        'country': 'Germany',
        'description': 'Lively and fresh, with notes of green apple and lime.',
        'designation': 'Kabinett',
        'points': 91,
        'price': 22.0,
        'province': 'Mosel',
        'region_1': 'Mosel',
        'region_2': 'Western Germany',
        'taster_name': 'Anne Krebiehl',
        'taster_twitter_handle': '@annewine',
        'title': 'Dr. Loosen 2015 Riesling Kabinett (Mosel)',
        'variety': 'Riesling',
        'winery': 'Dr. Loosen'
    },
    {
        'country': 'South Africa',
        'description': 'Aromas of citrus and melon, fresh and vibrant.',
        'designation': 'Signature',
        'points': 85,
        'price': 10.0,
        'province': 'Western Cape',
        'region_1': 'Stellenbosch',
        'region_2': 'Coastal Region',
        'taster_name': 'Lauren Buzzeo',
        'taster_twitter_handle': '@laurenbuzzeo',
        'title': 'Spier 2016 Chenin Blanc (Western Cape)',
        'variety': 'Chenin Blanc',
        'winery': 'Spier'
    },
    {
        'country': 'South Africa',
        'description': 'Aromas of citrus and melon, fresh and vibrant.',
        'designation': 'Signature',
        'points': 87,
        'price': 10.0,
        'province': 'Western Cape',
        'region_1': 'Stellenbosch',
        'region_2': 'Coastal Region',
        'taster_name': 'Lauren Buzzeo',
        'taster_twitter_handle': '@laurenbuzzeo',
        'title': 'Spier 2016 Chenin Blanc (Western Cape)',
        'variety': 'Chenin Blanc',
        'winery': 'Spier'
    }
])

**Exercise 1**

region_1 and region_2 are pretty uninformative names for locale columns in the dataset. Create a copy of reviews with these columns renamed to region and locale, respectively.

In [77]:
renamed = reviews.rename(columns={"region_1": "region", "region_2": "locale"})
renamed.head(1)

,country,description,designation,points,price,province,region,locale,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, and mine...",Vulkà Bianco,87,20.0,Sicily & Sardinia,Etna,Eastern Sicily,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia


In [None]:
#also valid:
renamed = reviews.rename(columns=dict(region_1='region', region_2='locale'))

**Exercise 2**

Set the index name in the dataset to wine.

In [81]:
reviews.rename_axis("wine", axis='rows').head(1)

wine,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, and mine...",Vulkà Bianco,87,20.0,Sicily & Sardinia,Etna,Eastern Sicily,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia


**Exercise 3**

The Things on Reddit dataset includes product links from a selection of top-ranked forums ("subreddits") on reddit.com. Run the cell below to load a dataframe of products mentioned on the /r/gaming subreddit and another dataframe for products mentioned on the /r/movies subreddit.
Create a DataFrame of products mentioned on either subreddit.

In [None]:
combined_products = pd.concat([movie_products, gaming_products])

**Exercise 4**

The Powerlifting Database dataset on Kaggle includes one CSV table for powerlifting meets and a separate one for powerlifting competitors. 
Both tables include references to a MeetID, a unique key for each meet (competition) included in the database. Using this, generate a dataset combining the two tables into one.

**join()** --> joins two DataFrames horizontally (by columns) using:
the index, by default,
or a key column of the calling DataFrame, if specified with on=

I need to have 'MeetID' as the index in both of them!

In [None]:
powerlifting_combined = powerlifting_meets.set_index('MeetID').join(powerlifting_competitors.set_index('MeetID'))
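
For comparison, here is a sketch of both routes on hypothetical miniature tables (the `MeetName` and `Name` columns and their values are made up). With `on=`, the named column of the calling DataFrame is matched against the index of the other one, so only the other frame needs MeetID as its index.

```python
import pandas as pd

# Hypothetical miniature versions of the two powerlifting tables.
meets = pd.DataFrame({'MeetID': [0, 1], 'MeetName': ['Open', 'Classic']})
competitors = pd.DataFrame({'MeetID': [0, 0, 1], 'Name': ['Ana', 'Ben', 'Cy']})

# Route used in the exercise: MeetID is the index on both sides.
by_index = meets.set_index('MeetID').join(competitors.set_index('MeetID'))

# Alternative: keep MeetID as a column in competitors and match it
# against the index of meets via on=.
by_column = competitors.join(meets.set_index('MeetID'), on='MeetID')

print(len(by_index))                   # 3 (one row per competitor)
print(by_column['MeetName'].tolist())  # ['Open', 'Open', 'Classic']
```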