**SUMMARY FUNCTIONS AND MAPS**

Data does not always come out of memory in the format we want right off the bat. Sometimes we have to do some more work ourselves to reformat it for the task at hand. This tutorial covers different operations we can apply to our data to get the input "just right".

**Summary functions**

Pandas provides many simple "summary functions" (not an official name) which restructure the data in some useful way. For example, consider the describe() method, as in reviews.points.describe():
This method generates a high-level summary of the attributes of the given column. It is type-aware, meaning that its output changes based on the data type of the input.
With numerical data --> count, mean, std, min, 25%, 50%, 75%, max, plus the Series name and dtype
That output only makes sense for numerical data; for string data here's what we get --> count, unique, top (the most common value), freq (how often it appears), plus the name and dtype
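A minimal sketch of that type-awareness, using two small hypothetical Series instead of the tutorial's reviews data (the names points and tasters are illustrative only):

```python
import pandas as pd

# Hypothetical toy columns; the tutorial's `reviews` DataFrame is not used here.
points = pd.Series([87, 87, 90, 90, 98], name="points")
tasters = pd.Series(["Roger Voss", "Roger Voss", "Kerin O'Keefe"], name="taster_name")

numeric_summary = points.describe()  # count, mean, std, min, 25%, 50%, 75%, max
text_summary = tasters.describe()    # count, unique, top, freq

print(numeric_summary)
print(text_summary)
```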


If you want to get some particular simple summary statistic about a column in a DataFrame or a Series, there is usually a helpful pandas function that makes it happen.
For example, to see the mean of the points awarded (i.e. the score of an average wine), we can use the **mean()** method:
```
reviews.points.mean()
```

To see a list of unique values we can use the **unique()** function:
```
reviews.taster_name.unique()
```

To see a list of unique values and how often they occur in the dataset, we can use the **value_counts()** method:
```
reviews.taster_name.value_counts()
```

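The three methods above can be tried on toy data. This sketch assumes two small made-up Series standing in for reviews.points and reviews.taster_name:

```python
import pandas as pd

# Made-up stand-ins for reviews.points and reviews.taster_name
points = pd.Series([87, 87, 90, 90, 98])
tasters = pd.Series(["Roger Voss", "Kerin O'Keefe", "Roger Voss", "Roger Voss", "Jim Gordon"])

print(points.mean())           # 90.4
print(tasters.unique())        # distinct names, in order of first appearance
print(tasters.value_counts())  # each distinct name with its count, most frequent first
```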

**Maps**


A map is a term, borrowed from mathematics, for a function that takes one set of values and "maps" them to another set of values. In data science we often have a need for creating new representations from existing data, or for transforming data from the format it is in now to the format that we want it to be in later. Maps are what handle this work, making them extremely important for getting your work done!

--> In other words, a map is a way of transforming one set of data into another by **applying a function to each element**.
👉 Think of it as an automatic translator that takes a list of things and turns it into another list, changing each item with the same rule.

There are two mapping methods that you will use often.

**map()** is the first, and slightly simpler one. For example, suppose that we wanted to remean the scores the wines received to 0. We can do this as follows:

```
review_points_mean = reviews.points.mean()
# Subtract the mean from every value; returns a new Series
reviews.points.map(lambda p: p - review_points_mean)
```
This applies a function to each element of reviews.points.
lambda p: defines an anonymous function that receives a value p (a single point value).
p - review_points_mean is what that function returns: the difference between the value p and the mean.
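A self-contained version of the same idea, assuming a five-value toy Series in place of reviews.points:

```python
import pandas as pd

points = pd.Series([87, 87, 90, 90, 98])  # toy stand-in for reviews.points
review_points_mean = points.mean()        # 90.4

# map() calls the lambda once per element and collects the results in a new Series
remeaned = points.map(lambda p: p - review_points_mean)
print(remeaned)
```

Every remeaned value is the original minus the mean, so the new Series sums to (approximately) zero.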

What is lambda?
A lambda function is simply a quick, anonymous way to define a small function.
🧠 It is like a regular function, but written on one line, without def and without a name.
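To make the equivalence concrete, here is a sketch that passes a def-based function and a lambda with the same body to map(). The mean value 90.4 is an assumed constant, not computed from the real dataset:

```python
import pandas as pd

review_points_mean = 90.4  # assumed value, for illustration only

def remean(p):
    """Named version: subtract the mean from one value."""
    return p - review_points_mean

points = pd.Series([87, 98])

by_def = points.map(remean)                               # pass a named function
by_lambda = points.map(lambda p: p - review_points_mean)  # pass an equivalent lambda

print(by_def.equals(by_lambda))  # True
```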

**Let's compare:**

*With a for loop:*

review_points_mean = reviews.points.mean()
new_points = []
for p in reviews.points:
    new_points.append(p - review_points_mean)
✔️ It does the same thing, but:

It is longer

You have to create a new list manually (and the result is a plain list, not a Series)

You have to write more

*With lambda and .map():*
reviews.points.map(lambda p: p - review_points_mean)
✔️ It does the same thing, but:
In a single line
More expressive and readable
More "Pythonic" (the idiomatic, professional Python style)
SUMMARY:
lambda is a quick way to define small functions.
.map() applies that function to each element of a Series.

--> The function you pass to map() should expect a **single value** from the Series (a point value, in the above example), and return a transformed version of that value. map() returns a new Series where all the values have been transformed by your function.
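A sketch of that contract with a named function instead of a lambda. The Series, its labels, and the to_cents helper are hypothetical. Note that the result keeps the original index:

```python
import pandas as pd

prices = pd.Series([20.0, 15.0, 32.0], index=["a", "b", "c"])  # hypothetical labels

def to_cents(price):
    # `price` is one scalar value from the Series, not the whole Series
    return int(round(price * 100))

cents = prices.map(to_cents)  # new Series, same index, one output per input value
print(cents)
```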

**apply()** is the equivalent method if we want to transform a whole DataFrame by calling a custom method on each row.

If we had called reviews.apply() with axis='index', then instead of passing a function to transform each row, we would need to give a function to transform each column.
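A small sketch of the axis='index' case on a hypothetical two-column DataFrame: the function receives one whole column at a time and returns one value per column.

```python
import pandas as pd

df = pd.DataFrame({"points": [87, 90, 98], "price": [20.0, 32.0, 18.0]})

# axis='index' (the default, same as axis=0): the function receives one COLUMN at a time
col_ranges = df.apply(lambda col: col.max() - col.min(), axis="index")
print(col_ranges)  # one value per column: points -> 11, price -> 14
```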


```
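def remean_points(row):
    # `row` is a Series holding one row of reviews; its index is the column names.
    # review_points_mean was computed earlier with reviews.points.mean()
    row.points = row.points - review_points_mean
    return row

reviews.apply(remean_points, axis='columns')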
```
apply(function, axis='columns') ⇒ applies the function to each row.
axis='columns' or axis=1 → iterates over the rows
(by contrast: axis='index' or axis=0 iterates over the columns)
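A runnable sketch of the row-wise remean on a hypothetical three-row DataFrame. It copies each row before changing it, since the pandas documentation warns that functions which mutate the object passed to apply() are not supported:

```python
import pandas as pd

df = pd.DataFrame({"points": [87, 90, 98], "price": [20.0, 32.0, 18.0]})
review_points_mean = df["points"].mean()

def remean_points(row):
    # `row` is a Series whose index is the column names
    row = row.copy()  # defensive copy so the object pandas passed in is never modified
    row["points"] = row["points"] - review_points_mean
    return row

remeaned = df.apply(remean_points, axis="columns")  # function called once per row
print(remeaned)
```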

**COMPARISON: map() VS apply()**
.map() --> for Series: goes one value at a time and applies the function to each element of the Series (it cannot access other columns)
.apply() --> for DataFrames, goes row by row, and can access multiple columns
How does apply() know which columns to use?
*You don't pass them to apply() directly.
👉 It is the function you write (the one you give to apply) that decides which columns to use.*

What if you wanted to access several columns?
Just do it inside the function:
```
def add_ratio(row):
    # Reads two columns of the same row and stores their ratio under a new label
    row["price_per_point"] = row["price"] / row["points"]
    return row

reviews.apply(add_ratio, axis='columns')
```
✅ Here you choose to use price and points,
✅ and you create a new column price_per_point.
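A self-contained sketch of add_ratio on two made-up rows, alongside the same column computed with plain Series division (reviews_toy is a hypothetical stand-in for reviews):

```python
import pandas as pd

reviews_toy = pd.DataFrame({"points": [87, 85], "price": [20.0, 10.0]})

def add_ratio(row):
    row = row.copy()  # avoid mutating the row object pandas passes in
    row["price_per_point"] = row["price"] / row["points"]
    return row

with_ratio = reviews_toy.apply(add_ratio, axis="columns")
print(with_ratio)

# The same values without apply(): element-wise division of two columns
vectorized = reviews_toy["price"] / reviews_toy["points"]
```

The vectorized line is usually the faster choice when the logic is plain arithmetic; apply() earns its place when the per-row logic is more involved.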

*IMPORTANT*
Note that map() and apply() return new, transformed Series and DataFrames, respectively. They don't modify the original data they're called on. If we look at the first row of reviews, we can see that it still has its original points value.
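A quick way to check this on toy data (reviews_toy is hypothetical): map() hands back a new Series, while the original column keeps its values.

```python
import pandas as pd

reviews_toy = pd.DataFrame({"points": [87, 98], "price": [20.0, 18.0]})
mean_points = reviews_toy["points"].mean()  # 92.5

remeaned = reviews_toy["points"].map(lambda p: p - mean_points)

print(remeaned.iloc[0])               # -5.5 (transformed copy)
print(reviews_toy["points"].iloc[0])  # 87 (original is unchanged)
```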

Pandas provides many common mapping operations as built-ins. For example, here's a faster way of remeaning our points column:
```
review_points_mean = reviews.points.mean()
reviews.points - review_points_mean
```
In this code we are performing an operation between a lot of values on the left-hand side (everything in the Series) and a single value on the right-hand side (the mean value). Pandas looks at this expression and figures out that we must mean to subtract that mean value from every value in the dataset.
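A sketch comparing the broadcast subtraction with the map() version on a toy Series; both should give the same values, the first without a Python-level function call per element:

```python
import pandas as pd

points = pd.Series([87, 87, 90, 90, 98])  # toy stand-in for reviews.points
review_points_mean = points.mean()

vectorized = points - review_points_mean               # scalar broadcast to every element
mapped = points.map(lambda p: p - review_points_mean)  # one Python call per element

print(vectorized)
print(mapped)
```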

Pandas will also understand what to do if we perform these operations between Series of equal length. For example, an easy way of combining country and region information in the dataset would be to do the following:
```
reviews.country + " - " + reviews.region_1
```
These operators are faster than map() or apply() because they use vectorized operations built into pandas (run in compiled code rather than calling a Python function for each element). All of the standard Python operators (>, <, ==, and so on) work in this manner.
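A toy illustration (hypothetical values) of element-wise operations between two Series of equal length and between a Series and a scalar:

```python
import pandas as pd

country = pd.Series(["Italy", "Portugal", "France"])
region_1 = pd.Series(["Etna", "Douro", "Alsace"])

# String concatenation is applied pair-wise, aligned on the index
country_region = country + " - " + region_1
print(country_region)  # Italy - Etna, Portugal - Douro, France - Alsace

# Comparison operators also work element-wise and return a Boolean Series
points = pd.Series([87, 87, 90])
print(points >= 90)  # False, False, True
```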

However, they are not as flexible as map() or apply(), which can do more advanced things, like applying conditional logic, which cannot be done with addition and subtraction alone.
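For example, assigning a label based on a threshold is a one-line conditional inside map(). The 90-point threshold and the labels below are arbitrary, chosen only for illustration:

```python
import pandas as pd

points = pd.Series([87, 92, 98, 60])

# An if/else per value: easy inside the function passed to map(),
# not expressible with + and - alone
labels = points.map(lambda p: "high" if p >= 90 else "low")
print(labels)
```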


REMEMBER:
What if you wanted to save it?
```
reviews["points_demeaned"] = reviews.points - review_points_mean
```
This creates a new column called "points_demeaned" and stores in it the result of subtracting the mean from each value.
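The same pattern on a hypothetical two-row DataFrame, showing that the new column sits next to the untouched original:

```python
import pandas as pd

reviews_toy = pd.DataFrame({"points": [87, 98]})
review_points_mean = reviews_toy.points.mean()  # 92.5

# Store the result as a new column; the original 'points' column is kept
reviews_toy["points_demeaned"] = reviews_toy.points - review_points_mean
print(reviews_toy)
```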

**EXERCISES**

In [3]:
import pandas as pd

In [89]:
reviews = pd.DataFrame([
    {
        'country': 'Italy',
        'description': 'Aromas include tropical fruit, broom, and minerals.',
        'designation': 'Vulkà Bianco',
        'points': 87,
        'price': 20.0,
        'province': 'Sicily & Sardinia',
        'region_1': 'Etna',
        'region_2': 'Eastern Sicily',
        'taster_name': 'Kerin O’Keefe',
        'taster_twitter_handle': '@kerinokeefe',
        'title': 'Nicosia 2013 Vulkà Bianco (Etna)',
        'variety': 'White Blend',
        'winery': 'Nicosia'
    },
    {
        'country': 'Portugal',
        'description': 'This is ripe and fruity, a wine that is smooth and balanced.',
        'designation': 'Avidagos',
        'points': 87,
        'price': 15.0,
        'province': 'Douro',
        'region_1': 'Douro',
        'region_2': 'Northern Portugal',
        'taster_name': 'Roger Voss',
        'taster_twitter_handle': '@vossroger',
        'title': 'Quinta dos Avidagos 2011 Avidagos Red (Douro)',
        'variety': 'Portuguese Red',
        'winery': 'Quinta dos Avidagos'
    },
    {
        'country': 'France',
        'description': 'A dry style of Pinot Gris, crisp with acidity and minerality.',
        'designation': 'Classic',
        'points': 90,
        'price': 32.0,
        'province': 'Alsace',
        'region_1': 'Alsace',
        'region_2': 'Northeast France',
        'taster_name': 'Roger Voss',
        'taster_twitter_handle': '@vossroger',
        'title': 'Domaine Marcel Deiss 2012 Pinot Gris (Alsace)',
        'variety': 'Pinot Gris',
        'winery': 'Domaine Marcel Deiss'
    },
    {
        'country': 'France',
        'description': 'Big, rich and off-dry, with intensity and floral notes.',
        'designation': 'Lieu-dit Harth Cuvée Caroline',
        'points': 90,
        'price': 21.0,
        'province': 'Alsace',
        'region_1': 'Alsace',
        'region_2': 'Northeast France',
        'taster_name': 'Roger Voss',
        'taster_twitter_handle': '@vossroger',
        'title': 'Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caroline (Alsace)',
        'variety': 'Gewürztraminer',
        'winery': 'Domaine Schoffit'
    },
    {
        'country': 'Spain',
        'description': 'Dark cherry, spice and leather aromas dominate this classic Rioja.',
        'designation': 'Reserva',
        'points': 98,
        'price': 18.0,
        'province': 'Rioja',
        'region_1': 'Rioja Alta',
        'region_2': 'Northern Spain',
        'taster_name': 'Michael Schachner',
        'taster_twitter_handle': '@wineschach',
        'title': 'Marqués de Cáceres 2011 Reserva (Rioja)',
        'variety': 'Tempranillo',
        'winery': 'Marqués de Cáceres'
    },
    {
        'country': 'Canada',
        'description': 'Fruity and soft, with hints of raspberry and vanilla.',
        'designation': 'Estate',
        'points': 60,
        'price': 25.0,
        'province': 'California',
        'region_1': 'Napa Valley',
        'region_2': 'North Coast',
        'taster_name': 'Jim Gordon',
        'taster_twitter_handle': '@jimgordonwine',
        'title': 'Robert Mondavi 2014 Cabernet Sauvignon (Napa Valley)',
        'variety': 'Cabernet Sauvignon',
        'winery': 'Robert Mondavi'
    },
    {
        'country': 'Argentina',
        'description': 'Bold and structured, offering black fruit and mocha.',
        'designation': 'Gran Reserva',
        'points': 92,
        'price': 30.0,
        'province': 'Mendoza Province',
        'region_1': 'Uco Valley',
        'region_2': 'Mendoza',
        'taster_name': 'Alejandro Iglesias',
        'taster_twitter_handle': '@aliglesiaswine',
        'title': 'Trapiche 2015 Gran Reserva Malbec (Uco Valley)',
        'variety': 'Malbec',
        'winery': 'Trapiche'
    },
    {
        'country': 'Chile',
        'description': 'Smooth, with red berries and a touch of herbs.',
        'designation': 'Reserva Especial',
        'points': 86,
        'price': 12.0,
        'province': 'Maipo Valley',
        'region_1': 'Maipo Valley',
        'region_2': 'Central Valley',
        'taster_name': 'Patricio Tapia',
        'taster_twitter_handle': '@ptapiawine',
        'title': 'Concha y Toro 2016 Carmenere (Maipo Valley)',
        'variety': 'Carmenere',
        'winery': 'Concha y Toro'
    },
    {
        'country': 'Germany',
        'description': 'Lively and fresh, with notes of green apple and lime.',
        'designation': 'Kabinett',
        'points': 91,
        'price': 22.0,
        'province': 'Mosel',
        'region_1': 'Mosel',
        'region_2': 'Western Germany',
        'taster_name': 'Anne Krebiehl',
        'taster_twitter_handle': '@annewine',
        'title': 'Dr. Loosen 2015 Riesling Kabinett (Mosel)',
        'variety': 'Riesling',
        'winery': 'Dr. Loosen'
    },
    {
        'country': 'South Africa',
        'description': 'Aromas of citrus and melon, fresh and vibrant.',
        'designation': 'Signature',
        'points': 85,
        'price': 10.0,
        'province': 'Western Cape',
        'region_1': 'Stellenbosch',
        'region_2': 'Coastal Region',
        'taster_name': 'Lauren Buzzeo',
        'taster_twitter_handle': '@laurenbuzzeo',
        'title': 'Spier 2016 Chenin Blanc (Western Cape)',
        'variety': 'Chenin Blanc',
        'winery': 'Spier'
    },
    {
        'country': 'South Africa',
        'description': 'Aromas of citrus and melon, fresh and vibrant.',
        'designation': 'Signature',
        'points': 85,
        'price': 10.0,
        'province': 'Western Cape',
        'region_1': 'Stellenbosch',
        'region_2': 'Coastal Region',
        'taster_name': 'Lauren Buzzeo',
        'taster_twitter_handle': '@laurenbuzzeo',
        'title': 'Spier 2016 Chenin Blanc (Western Cape)',
        'variety': 'Chenin Blanc',
        'winery': 'Spier'
    }
])

print(reviews.head())

    country                                        description  \
0     Italy  Aromas include tropical fruit, broom, and mine...   
1  Portugal  This is ripe and fruity, a wine that is smooth...   
2    France  A dry style of Pinot Gris, crisp with acidity ...   
3    France  Big, rich and off-dry, with intensity and flor...   
4     Spain  Dark cherry, spice and leather aromas dominate...   

                     designation  points  price           province  \
0                   Vulkà Bianco      87   20.0  Sicily & Sardinia   
1                       Avidagos      87   15.0              Douro   
2                        Classic      90   32.0             Alsace   
3  Lieu-dit Harth Cuvée Caroline      90   21.0             Alsace   
4                        Reserva      98   18.0              Rioja   

     region_1           region_2        taster_name taster_twitter_handle  \
0        Etna     Eastern Sicily      Kerin O’Keefe          @kerinokeefe   
1       Douro  Northern Port

In [None]:
dir(reviews.points)

In [20]:
reviews["points_mean"] = reviews.points.mean()
print(reviews.head(5))

    country                                        description  \
0     Italy  Aromas include tropical fruit, broom, and mine...   
1  Portugal  This is ripe and fruity, a wine that is smooth...   
2    France  A dry style of Pinot Gris, crisp with acidity ...   
3    France  Big, rich and off-dry, with intensity and flor...   
4     Spain  Dark cherry, spice and leather aromas dominate...   

                     designation  points  price           province  \
0                   Vulkà Bianco      87   20.0  Sicily & Sardinia   
1                       Avidagos      87   15.0              Douro   
2                        Classic      90   32.0             Alsace   
3  Lieu-dit Harth Cuvée Caroline      90   21.0             Alsace   
4                        Reserva      98   18.0              Rioja   

     region_1           region_2        taster_name taster_twitter_handle  \
0        Etna     Eastern Sicily      Kerin O’Keefe          @kerinokeefe   
1       Douro  Northern Port

In [24]:
reviews["median_points"] = reviews.points.median()
print(reviews.head())

    country                                        description  \
0     Italy  Aromas include tropical fruit, broom, and mine...   
1  Portugal  This is ripe and fruity, a wine that is smooth...   
2    France  A dry style of Pinot Gris, crisp with acidity ...   
3    France  Big, rich and off-dry, with intensity and flor...   
4     Spain  Dark cherry, spice and leather aromas dominate...   

                     designation  points  price           province  \
0                   Vulkà Bianco      87   20.0  Sicily & Sardinia   
1                       Avidagos      87   15.0              Douro   
2                        Classic      90   32.0             Alsace   
3  Lieu-dit Harth Cuvée Caroline      90   21.0             Alsace   
4                        Reserva      98   18.0              Rioja   

     region_1           region_2        taster_name taster_twitter_handle  \
0        Etna     Eastern Sicily      Kerin O’Keefe          @kerinokeefe   
1       Douro  Northern Port

In [58]:
# see how many different wineries appear in the data
print(reviews.winery.unique())  # array of the distinct entries in the column
print(reviews.winery.count())  # number of non-null entries, duplicates included
print(reviews.winery.value_counts())  # each distinct value and how often it occurs

['Nicosia' 'Quinta dos Avidagos' 'Domaine Marcel Deiss' 'Domaine Schoffit'
 'Marqués de Cáceres' 'Robert Mondavi' 'Trapiche' 'Concha y Toro'
 'Dr. Loosen' 'Spier']
11
winery
Spier                   2
Nicosia                 1
Quinta dos Avidagos     1
Domaine Marcel Deiss    1
Domaine Schoffit        1
Marqués de Cáceres      1
Robert Mondavi          1
Trapiche                1
Concha y Toro           1
Dr. Loosen              1
Name: count, dtype: int64
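Note that count() above returns the number of non-missing entries (11), not the number of different wineries (10 distinct names are listed). A sketch on a made-up Series, assuming nunique() is what you want for distinct values:

```python
import pandas as pd

winery = pd.Series(["Nicosia", "Spier", "Spier", None])  # hypothetical, with one missing value

print(winery.count())    # 3 -> non-null entries, duplicates included
print(winery.nunique())  # 2 -> distinct non-null values
print(len(winery))       # 4 -> all rows
```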


In [64]:
reviews.head(11)

,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, and mine...",Vulkà Bianco,87,20.0,Sicily & Sardinia,Etna,Eastern Sicily,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,Douro,Northern Portugal,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,France,"A dry style of Pinot Gris, crisp with acidity ...",Classic,90,32.0,Alsace,Alsace,Northeast France,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
3,France,"Big, rich and off-dry, with intensity and flor...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,Northeast France,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit
4,Spain,"Dark cherry, spice and leather aromas dominate...",Reserva,98,18.0,Rioja,Rioja Alta,Northern Spain,Michael Schachner,@wineschach,Marqués de Cáceres 2011 Reserva (Rioja),Tempranillo,Marqués de Cáceres
5,Canada,"Fruity and soft, with hints of raspberry and v...",Estate,60,25.0,California,Napa Valley,North Coast,Jim Gordon,@jimgordonwine,Robert Mondavi 2014 Cabernet Sauvignon (Napa V...,Cabernet Sauvignon,Robert Mondavi
6,Argentina,"Bold and structured, offering black fruit and ...",Gran Reserva,92,30.0,Mendoza Province,Uco Valley,Mendoza,Alejandro Iglesias,@aliglesiaswine,Trapiche 2015 Gran Reserva Malbec (Uco Valley),Malbec,Trapiche
7,Chile,"Smooth, with red berries and a touch of herbs.",Reserva Especial,86,12.0,Maipo Valley,Maipo Valley,Central Valley,Patricio Tapia,@ptapiawine,Concha y Toro 2016 Carmenere (Maipo Valley),Carmenere,Concha y Toro
8,Germany,"Lively and fresh, with notes of green apple an...",Kabinett,91,22.0,Mosel,Mosel,Western Germany,Anne Krebiehl,@annewine,Dr. Loosen 2015 Riesling Kabinett (Mosel),Riesling,Dr. Loosen
9,South Africa,"Aromas of citrus and melon, fresh and vibrant.",Signature,85,10.0,Western Cape,Stellenbosch,Coastal Region,Lauren Buzzeo,@laurenbuzzeo,Spier 2016 Chenin Blanc (Western Cape),Chenin Blanc,Spier
10,South Africa,"Aromas of citrus and melon, fresh and vibrant.",Signature,85,10.0,Western Cape,Stellenbosch,Coastal Region,Lauren Buzzeo,@laurenbuzzeo,Spier 2016 Chenin Blanc (Western Cape),Chenin Blanc,Spier


In [68]:
reviews.country.value_counts()

country
France          2
South Africa    2
Italy           1
Portugal        1
Spain           1
Canada          1
Argentina       1
Chile           1
Germany         1
Name: count, dtype: int64

**Exercise 4:** Create variable centered_price containing a version of the price column with the mean price subtracted.
(Note: this 'centering' transformation is a common preprocessing step before applying various machine learning algorithms.) 

In [72]:
centered_price = reviews.price - reviews.price.mean()
print(centered_price)

0      0.454545
1     -4.545455
2     12.454545
3      1.454545
4     -1.545455
5      5.454545
6     10.454545
7     -7.545455
8      2.454545
9     -9.545455
10    -9.545455
Name: price, dtype: float64


In [9]:
mean_price = reviews.price.mean()
cent_price = reviews.price.map(lambda p: p - mean_price)
print(cent_price)

0      0.454545
1     -4.545455
2     12.454545
3      1.454545
4     -1.545455
5      5.454545
6     10.454545
7     -7.545455
8      2.454545
9     -9.545455
10    -9.545455
Name: price, dtype: float64
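Both cells compute the same thing. A sketch that checks this on a five-price toy Series and confirms that centering leaves a mean of (approximately) zero:

```python
import pandas as pd

price = pd.Series([20.0, 15.0, 32.0, 21.0, 18.0])  # toy subset of reviews.price
mean_price = price.mean()  # computed once, reused below

centered_vectorized = price - mean_price
centered_mapped = price.map(lambda p: p - mean_price)

print((centered_vectorized - centered_mapped).abs().max() < 1e-12)  # True
print(abs(centered_vectorized.mean()) < 1e-9)  # True: centered data has mean ~0
```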


**Exercise 5:** I'm an economical wine buyer. Which wine is the "best bargain"? Create a variable bargain_wine with the title of the wine with the highest points-to-price ratio in the dataset.

In [15]:
bargain_wine = reviews.title[(reviews.points/reviews.price).idxmax()]
print(bargain_wine)

Spier 2016 Chenin Blanc (Western Cape)


In [90]:
bargain_wine = reviews["winery"][(reviews.points/reviews.price).idxmax()]
print(bargain_wine)

Spier


In [96]:
#Solution:
bargain_idx = (reviews.points / reviews.price).idxmax()
bargain_wine = reviews.loc[bargain_idx, 'title']
print(bargain_wine)
#REMEMBER Both loc and iloc are row-first, column-second!!!

Spier 2016 Chenin Blanc (Western Cape)
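idxmax() returns the index label of the largest value, which is why the solution pairs it with the label-based loc. A sketch on a hypothetical DataFrame with non-default labels (the titles are made up):

```python
import pandas as pd

toy = pd.DataFrame(
    {"title": ["Wine A", "Wine B", "Wine C"],  # hypothetical titles
     "points": [87, 85, 90],
     "price": [15.0, 10.0, 32.0]},
    index=[10, 20, 30],  # non-default labels on purpose
)

ratio = toy.points / toy.price       # points per dollar, element-wise
best_label = ratio.idxmax()          # index LABEL of the maximum, not its position
print(best_label)                    # 20
print(toy.loc[best_label, "title"])  # Wine B (loc: row label first, then column)
```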


**Exercise 6:**
There are only so many words you can use when describing a bottle of wine. Is a wine more likely to be "tropical" or "fruity"? Create a Series `descriptor_counts` counting how many times each of these two words appears in the `description` column in the dataset. (For simplicity, let's ignore the capitalized versions of these words.)

In [352]:
#define functions that split the string into words, clean them and then search for the word:
#note: .lower() makes this count capitalized forms too (e.g. "Fruity"), which the exercise says to ignore
def tropical_search(text):
    tropical = 0
    clean_list = [i.lower().strip('.,') for i in text.split(' ')]
    if 'tropical' in clean_list:
        tropical += 1
    return tropical
def fruity_search(text):
    fruity = 0
    clean_list = [i.lower().strip('.,') for i in text.split(' ')]
    if 'fruity' in clean_list:
        fruity += 1
    return fruity
#now use map() to apply each function to every entry in the 'description' column, then add up the results:
result = pd.Series([reviews.description.map(tropical_search).sum(), reviews.description.map(fruity_search).sum()], index=['tropical', 'fruity'])
print(result)


tropical    1
fruity      2
dtype: int64


In [350]:
search_fruity = reviews.description.map(lambda description: 'fruity' in description)
print(search_fruity)

0     False
1      True
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
Name: description, dtype: bool


In [340]:
search_fruity = reviews.description.map(lambda description: 'fruity' in description).sum()
search_tropical = reviews.description.map(lambda description: 'tropical' in description).sum()
results = pd.Series([search_fruity, search_tropical], index = ['Fruity', 'Tropical'])
print(results)
# .sum() at the end adds up the Boolean Series (True counts as 1), giving the number of descriptions that contain the word

Fruity      1
Tropical    1
dtype: int64


In [318]:
#Solution:
n_trop = reviews.description.map(lambda desc: "tropical" in desc).sum()
n_fruity = reviews.description.map(lambda desc: "fruity" in desc).sum()
descriptor_counts = pd.Series([n_trop, n_fruity], index=['tropical', 'fruity'])
print(descriptor_counts)

tropical    1
fruity      1
dtype: int64
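The first cell reports fruity 2 because it lowercases every word, so "Fruity and soft..." is also counted; the substring checks with `in` are case-sensitive. A sketch of the same counts with the vectorized Series.str.contains, on three sample descriptions taken from the dataset above:

```python
import pandas as pd

descriptions = pd.Series([
    "This is ripe and fruity, a wine that is smooth and balanced.",
    "Fruity and soft, with hints of raspberry and vanilla.",
    "Aromas include tropical fruit, broom, and minerals.",
])

# Case-sensitive, as the exercise asks: "Fruity" is NOT counted
n_fruity = descriptions.str.contains("fruity").sum()
n_tropical = descriptions.str.contains("tropical").sum()
# Case-insensitive variant, in case capitalized forms should count too
n_fruity_any_case = descriptions.str.contains("fruity", case=False).sum()

descriptor_counts = pd.Series([n_tropical, n_fruity], index=["tropical", "fruity"])
print(descriptor_counts)
print(n_fruity_any_case)
```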


**Exercise 7:**
We'd like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand - we'd like to translate them into simple star ratings. A score of 95 or higher counts as 3 stars, a score of at least 85 but less than 95 is 2 stars. Any other score is 1 star.

Also, the Canadian Vintners Association bought a lot of ads on the site, so any wines from Canada should automatically get 3 stars, regardless of points.

Create a series star_ratings with the number of stars corresponding to each review in the dataset.

In [95]:
def rating(row):
    if row.country == 'Canada':
        return '***'
    elif row.points >= 95:
        return '***'
    elif row.points >= 85:  # no need to check points < 95: that case was already handled above
        return '**'
    else:
        return '*'
star_ratings = reviews.apply(rating, axis='columns')

print(star_ratings)

0      **
1      **
2      **
3      **
4     ***
5     ***
6      **
7      **
8      **
9      **
10     **
dtype: object
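A variant returning integer star counts rather than strings, sketched on a hypothetical four-row DataFrame; it folds the Canada rule and the 95-point rule into one condition:

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["Italy", "Spain", "Canada", "Chile"],
    "points": [87, 98, 60, 80],
})

def stars(row):
    # Canada always gets 3 stars, regardless of points
    if row["country"] == "Canada" or row["points"] >= 95:
        return 3
    if row["points"] >= 85:  # points < 95 is already guaranteed here
        return 2
    return 1

star_ratings = toy.apply(stars, axis="columns")
print(star_ratings.tolist())  # [2, 3, 3, 1]
```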


In [75]:
three = reviews.title[(reviews.points >= 95)].count()
two = reviews.title[(reviews.points >= 85) & (reviews.points < 95)].count()
one = reviews.title[(reviews.points < 85)].count()
print(three)
print(two)
print(one)

1
9
1
