# Introduction to Pandas

*Data frame* (an object of type `DataFrame`): a table that organizes data into rows (*records*) and columns with a header.

Each column contains data of a homogeneous type.

Each row is associated with an index label (a primary key), which by default is a progressive integer starting from 0.

Individual rows and columns are objects of type `Series`.

### Features

1. construction
1. querying
1. updating

A *data frame* can be built from several file formats, including `csv`, `excel` and `json`.

---

Import `pandas`.

In [1]:
import pandas as pd

---

## Creating a *data frame* from a dictionary

       df = pd.DataFrame(data_dictionary)

In [2]:
data_dictionary = {'Cognome' : ['Rossi', 'Bianchi', 'Verdi', 'Neri'],
                   'Nome' : ['Andrea', 'Sara', 'Tommaso', 'Anna'],
                  'AnnoNascita' : [2000, 2001, 1999, 2004]}

In [3]:
pd.DataFrame(data_dictionary)

Unnamed: 0,Cognome,Nome,AnnoNascita
0,Rossi,Andrea,2000
1,Bianchi,Sara,2001
2,Verdi,Tommaso,1999
3,Neri,Anna,2004


---

## Creating a *data frame* from a list

       df = pd.DataFrame(data_list, columns = column_names)
       


In [5]:
data_list = []
data_list.append(['Rossi', 'Andrea', 2000])
data_list.append(['Bianchi', 'Sara', 2001])
data_list.append(['Verdi', 'Tommaso', 1999])
data_list.append(['Neri', 'Anna', 2004])

In [6]:
pd.DataFrame(data_list, columns = ['Cognome', 'Nome', 'AnnoNascita'])

Unnamed: 0,Cognome,Nome,AnnoNascita
0,Rossi,Andrea,2000
1,Bianchi,Sara,2001
2,Verdi,Tommaso,1999
3,Neri,Anna,2004


---

## Creating a *data frame* from a `csv` file

    df = pd.read_csv(csv_file_name)

### Reading the file `2017-german-election-overall.csv`

In [9]:
df = pd.read_csv('./2017-german-election-overall.csv')

---

## Retrieving information about the *data frame*

        df.info()

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 299 entries, 0 to 298
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   area_id               299 non-null    int64 
 1   area_names            299 non-null    object
 2   state                 299 non-null    object
 3   registered.voters     299 non-null    int64 
 4   total_votes           299 non-null    int64 
 5   invalid_first_votes   299 non-null    int64 
 6   invalid_second_votes  299 non-null    int64 
 7   valid_first_votes     299 non-null    int64 
 8   valid_second_votes    299 non-null    int64 
dtypes: int64(7), object(2)
memory usage: 21.1+ KB


---

## Getting the first/last `n` rows

    df.head(n)
    df.tail(n)

In [11]:
df.head(10)

Unnamed: 0,area_id,area_names,state,registered.voters,total_votes,invalid_first_votes,invalid_second_votes,valid_first_votes,valid_second_votes
0,1,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1647,1509,170258,170396
1,2,Nordfriesland – Dithmarschen Nord,Schleswig-Holstein,186384,139200,1299,1125,137901,138075
2,3,Steinburg – Dithmarschen Süd,Schleswig-Holstein,175950,132016,1133,1141,130883,130875
3,4,Rendsburg-Eckernförde,Schleswig-Holstein,199632,157387,1285,1119,156102,156268
4,5,Kiel,Schleswig-Holstein,204650,151463,1657,1290,149806,150173
5,6,Plön – Neumünster,Schleswig-Holstein,174934,131710,1221,1196,130489,130514
6,7,Pinneberg,Schleswig-Holstein,237474,187711,1616,1339,186095,186372
7,8,Segeberg – Stormarn-Mitte,Schleswig-Holstein,245826,193318,1628,1381,191690,191937
8,9,Ostholstein – Stormarn-Nord,Schleswig-Holstein,181480,138415,1151,1142,137264,137273
9,10,Herzogtum Lauenburg – Stormarn-Süd,Schleswig-Holstein,244503,193359,1597,1388,191762,191971


---

## Making a copy of a *data frame*

        df.copy()

In [12]:
df.copy()

Unnamed: 0,area_id,area_names,state,registered.voters,total_votes,invalid_first_votes,invalid_second_votes,valid_first_votes,valid_second_votes
0,1,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1647,1509,170258,170396
1,2,Nordfriesland – Dithmarschen Nord,Schleswig-Holstein,186384,139200,1299,1125,137901,138075
2,3,Steinburg – Dithmarschen Süd,Schleswig-Holstein,175950,132016,1133,1141,130883,130875
3,4,Rendsburg-Eckernförde,Schleswig-Holstein,199632,157387,1285,1119,156102,156268
4,5,Kiel,Schleswig-Holstein,204650,151463,1657,1290,149806,150173
...,...,...,...,...,...,...,...,...,...
294,295,Zollernalb – Sigmaringen,Baden-Württemberg,183202,139544,2354,1818,137190,137726
295,296,Saarbrücken,Saarland,199887,147602,2304,2171,145298,145431
296,297,Saarlouis,Saarland,207500,160371,2422,2831,157949,157540
297,298,St. Wendel,Saarland,177468,141378,2620,2668,138758,138710


---

## The `shape` attribute



In [13]:
df.shape

(299, 9)

## The `columns` attribute

In [15]:
list(df.columns)

['area_id',
 'area_names',
 'state',
 'registered.voters',
 'total_votes',
 'invalid_first_votes',
 'invalid_second_votes',
 'valid_first_votes',
 'valid_second_votes']

---

## Renaming columns

    df.rename(columns = name_dict, inplace = False)


In [16]:
df.rename(columns = {'registered.voters' : 'voters', 'area_names' : 'area'}, inplace = True)

In [17]:
df

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_first_votes,invalid_second_votes,valid_first_votes,valid_second_votes
0,1,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1647,1509,170258,170396
1,2,Nordfriesland – Dithmarschen Nord,Schleswig-Holstein,186384,139200,1299,1125,137901,138075
2,3,Steinburg – Dithmarschen Süd,Schleswig-Holstein,175950,132016,1133,1141,130883,130875
3,4,Rendsburg-Eckernförde,Schleswig-Holstein,199632,157387,1285,1119,156102,156268
4,5,Kiel,Schleswig-Holstein,204650,151463,1657,1290,149806,150173
...,...,...,...,...,...,...,...,...,...
294,295,Zollernalb – Sigmaringen,Baden-Württemberg,183202,139544,2354,1818,137190,137726
295,296,Saarbrücken,Saarland,199887,147602,2304,2171,145298,145431
296,297,Saarlouis,Saarland,207500,160371,2422,2831,157949,157540
297,298,St. Wendel,Saarland,177468,141378,2620,2668,138758,138710


---

## Removing rows or columns

    df.drop(list_to_remove, axis = 1|0, inplace = False)
    
---

**EXERCISE**: remove the columns `valid_first_votes` and `invalid_first_votes`, and then the rows with index `2`, `9` and `10`.

In [18]:
df.drop(['valid_first_votes', 'invalid_first_votes'], axis = 1, inplace = True)

In [19]:
df

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes
0,1,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1509,170396
1,2,Nordfriesland – Dithmarschen Nord,Schleswig-Holstein,186384,139200,1125,138075
2,3,Steinburg – Dithmarschen Süd,Schleswig-Holstein,175950,132016,1141,130875
3,4,Rendsburg-Eckernförde,Schleswig-Holstein,199632,157387,1119,156268
4,5,Kiel,Schleswig-Holstein,204650,151463,1290,150173
...,...,...,...,...,...,...,...
294,295,Zollernalb – Sigmaringen,Baden-Württemberg,183202,139544,1818,137726
295,296,Saarbrücken,Saarland,199887,147602,2171,145431
296,297,Saarlouis,Saarland,207500,160371,2831,157540
297,298,St. Wendel,Saarland,177468,141378,2668,138710


In [20]:
df.drop([2,9,10], axis = 0, inplace = True)

In [22]:
df.head(20)

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes
0,1,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1509,170396
1,2,Nordfriesland – Dithmarschen Nord,Schleswig-Holstein,186384,139200,1125,138075
3,4,Rendsburg-Eckernförde,Schleswig-Holstein,199632,157387,1119,156268
4,5,Kiel,Schleswig-Holstein,204650,151463,1290,150173
5,6,Plön – Neumünster,Schleswig-Holstein,174934,131710,1196,130514
6,7,Pinneberg,Schleswig-Holstein,237474,187711,1339,186372
7,8,Segeberg – Stormarn-Mitte,Schleswig-Holstein,245826,193318,1381,191937
8,9,Ostholstein – Stormarn-Nord,Schleswig-Holstein,181480,138415,1142,137273
11,12,Schwerin – Ludwigslust-Parchim I – Nordwestmec...,Mecklenburg-Vorpommern,216743,157070,1658,155412
12,13,Ludwigslust-Parchim II – Nordwestmecklenburg I...,Mecklenburg-Vorpommern,205757,146781,1673,145108


---

## Extracting a column

    df[column_name]
    df.column_name
    
---

**EXERCISE**: extract the column `voters` and set all the values of the column `area_id` to 0.

In [25]:
df['voters']

0      225659
1      186384
3      199632
4      204650
5      174934
        ...  
294    183202
295    199887
296    207500
297    177468
298    192408
Name: voters, Length: 296, dtype: int64

In [26]:
df.voters

0      225659
1      186384
3      199632
4      204650
5      174934
        ...  
294    183202
295    199887
296    207500
297    177468
298    192408
Name: voters, Length: 296, dtype: int64

In [28]:
df['area_id'] = 0

In [29]:
df

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes
0,0,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1509,170396
1,0,Nordfriesland – Dithmarschen Nord,Schleswig-Holstein,186384,139200,1125,138075
3,0,Rendsburg-Eckernförde,Schleswig-Holstein,199632,157387,1119,156268
4,0,Kiel,Schleswig-Holstein,204650,151463,1290,150173
5,0,Plön – Neumünster,Schleswig-Holstein,174934,131710,1196,130514
...,...,...,...,...,...,...,...
294,0,Zollernalb – Sigmaringen,Baden-Württemberg,183202,139544,1818,137726
295,0,Saarbrücken,Saarland,199887,147602,2171,145431
296,0,Saarlouis,Saarland,207500,160371,2831,157540
297,0,St. Wendel,Saarland,177468,141378,2668,138710


---

## Extracting multiple columns


    df[column_list]
    
---

**EXERCISE**: extract the columns `voters` and `total_votes`.

In [30]:
df[['voters', 'total_votes']]

Unnamed: 0,voters,total_votes
0,225659,171905
1,186384,139200
3,199632,157387
4,204650,151463
5,174934,131710
...,...,...
294,183202,139544
295,199887,147602
296,207500,160371
297,177468,141378


In [31]:
df[['voters']]

Unnamed: 0,voters
0,225659
1,186384
3,199632
4,204650
5,174934
...,...
294,183202
295,199887
296,207500
297,177468


---

## Getting the index (primary keys) of a data frame (or of a column)
    
    df.index
    column.index

---

In [33]:
df.state.index

Index([  0,   1,   3,   4,   5,   6,   7,   8,  11,  12,
       ...
       289, 290, 291, 292, 293, 294, 295, 296, 297, 298],
      dtype='int64', length=296)

---

## Updating the values of a column

    df[column_name] = series_obj
    
---

In [34]:
df

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes
0,0,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1509,170396
1,0,Nordfriesland – Dithmarschen Nord,Schleswig-Holstein,186384,139200,1125,138075
3,0,Rendsburg-Eckernförde,Schleswig-Holstein,199632,157387,1119,156268
4,0,Kiel,Schleswig-Holstein,204650,151463,1290,150173
5,0,Plön – Neumünster,Schleswig-Holstein,174934,131710,1196,130514
...,...,...,...,...,...,...,...
294,0,Zollernalb – Sigmaringen,Baden-Württemberg,183202,139544,1818,137726
295,0,Saarbrücken,Saarland,199887,147602,2171,145431
296,0,Saarlouis,Saarland,207500,160371,2831,157540
297,0,St. Wendel,Saarland,177468,141378,2668,138710


**EXERCISE**: update the values of the column `area_id` with the corresponding index.

In [35]:
df['area_id'] = df.index

In [36]:
df

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes
0,0,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1509,170396
1,1,Nordfriesland – Dithmarschen Nord,Schleswig-Holstein,186384,139200,1125,138075
3,3,Rendsburg-Eckernförde,Schleswig-Holstein,199632,157387,1119,156268
4,4,Kiel,Schleswig-Holstein,204650,151463,1290,150173
5,5,Plön – Neumünster,Schleswig-Holstein,174934,131710,1196,130514
...,...,...,...,...,...,...,...
294,294,Zollernalb – Sigmaringen,Baden-Württemberg,183202,139544,1818,137726
295,295,Saarbrücken,Saarland,199887,147602,2171,145431
296,296,Saarlouis,Saarland,207500,160371,2831,157540
297,297,St. Wendel,Saarland,177468,141378,2668,138710


---

## Adding a column

    df[new_column_name] = series_obj
    
---

**EXERCISE**: add a column containing the total votes as a percentage of the registered voters.

In [41]:
df['vote_percentage'] = (df['total_votes'] / df['voters'] * 100).round(2)

In [42]:
df

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes,vote_percentage
0,0,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1509,170396,76.18
1,1,Nordfriesland – Dithmarschen Nord,Schleswig-Holstein,186384,139200,1125,138075,74.68
3,3,Rendsburg-Eckernförde,Schleswig-Holstein,199632,157387,1119,156268,78.84
4,4,Kiel,Schleswig-Holstein,204650,151463,1290,150173,74.01
5,5,Plön – Neumünster,Schleswig-Holstein,174934,131710,1196,130514,75.29
...,...,...,...,...,...,...,...,...
294,294,Zollernalb – Sigmaringen,Baden-Württemberg,183202,139544,1818,137726,76.17
295,295,Saarbrücken,Saarland,199887,147602,2171,145431,73.84
296,296,Saarlouis,Saarland,207500,160371,2831,157540,77.29
297,297,St. Wendel,Saarland,177468,141378,2668,138710,79.66


---

## Selecting rows by *slicing*


    df[start_pos_index:end_pos_index:step]
    
---

**EXERCISE**: extract the rows from the fourth to the eleventh.

In [71]:
df[3:11]

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes,vote_percentage
4,4,Kiel,Schleswig-Holstein,204650,151463,1290,150173,74.01
5,5,Plön – Neumünster,unknown,174934,131710,1196,130514,75.29
6,6,Pinneberg,Schleswig-Holstein,237474,187711,1339,186372,79.04
7,7,Segeberg – Stormarn-Mitte,Schleswig-Holstein,245826,193318,1381,191937,78.64
8,8,Ostholstein – Stormarn-Nord,Schleswig-Holstein,181480,138415,1142,137273,76.27
11,11,Schwerin – Ludwigslust-Parchim I – Nordwestmec...,Mecklenburg-Vorpommern,216743,157070,1658,155412,72.47
12,12,Ludwigslust-Parchim II – Nordwestmecklenburg I...,Mecklenburg-Vorpommern,205757,146781,1673,145108,71.34
13,13,Rostock – Landkreis Rostock II,Mecklenburg-Vorpommern,222718,164037,1731,162306,73.65


---

## Selecting rows based on a condition

    df[mask]
    
`mask` is a `Series` object containing the boolean values `True` and `False`.

The result is a *data frame* containing only the rows for which `mask` is `True`.

---

In [44]:
df

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes,vote_percentage
0,0,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1509,170396,76.18
1,1,Nordfriesland – Dithmarschen Nord,Schleswig-Holstein,186384,139200,1125,138075,74.68
3,3,Rendsburg-Eckernförde,Schleswig-Holstein,199632,157387,1119,156268,78.84
4,4,Kiel,Schleswig-Holstein,204650,151463,1290,150173,74.01
5,5,Plön – Neumünster,Schleswig-Holstein,174934,131710,1196,130514,75.29
...,...,...,...,...,...,...,...,...
294,294,Zollernalb – Sigmaringen,Baden-Württemberg,183202,139544,1818,137726,76.17
295,295,Saarbrücken,Saarland,199887,147602,2171,145431,73.84
296,296,Saarlouis,Saarland,207500,160371,2831,157540,77.29
297,297,St. Wendel,Saarland,177468,141378,2668,138710,79.66


**EXERCISE**: extract the *records* that refer to `Berlin`.

In [47]:
mask = df['state'] == 'Berlin'
df[mask]

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes,vote_percentage
74,74,Berlin-Mitte,Berlin,206706,151634,2214,149420,73.36
75,75,Berlin-Pankow,Berlin,237071,188607,2198,186409,79.56
76,76,Berlin-Reinickendorf,Berlin,182392,137084,2171,134913,75.16
77,77,Berlin-Spandau – Charlottenburg Nord,Berlin,184602,132934,2276,130658,72.01
78,78,Berlin-Steglitz-Zehlendorf,Berlin,221209,180827,2110,178717,81.74
79,79,Berlin-Charlottenburg-Wilmersdorf,Berlin,198672,158245,1429,156816,79.65
80,80,Berlin-Tempelhof-Schöneberg,Berlin,235250,181259,2162,179097,77.05
81,81,Berlin-Neukölln,Berlin,202616,143790,2980,140810,70.97
82,82,Berlin-Friedrichshain-Kreuzberg – Prenzlauer B...,Berlin,223426,173504,1664,171840,77.66
83,83,Berlin-Treptow-Köpenick,Berlin,205105,157461,2124,155337,76.77


---

**EXERCISE**: extract the data frame containing only the areas and the number of registered voters for `Berlin` and `Bayern`.

In [51]:
mask = (df['state'] == 'Berlin') | (df['state'] == 'Bayern')
df[mask][['area', 'voters']]

Unnamed: 0,area,voters
74,Berlin-Mitte,206706
75,Berlin-Pankow,237071
76,Berlin-Reinickendorf,182392
77,Berlin-Spandau – Charlottenburg Nord,184602
78,Berlin-Steglitz-Zehlendorf,221209
79,Berlin-Charlottenburg-Wilmersdorf,198672
80,Berlin-Tempelhof-Schöneberg,235250
81,Berlin-Neukölln,202616
82,Berlin-Friedrichshain-Kreuzberg – Prenzlauer B...,223426
83,Berlin-Treptow-Köpenick,205105


---

## Selecting a row with `iloc[]`

    df.iloc[pos_index]
    

In [53]:
df

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes,vote_percentage
0,0,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1509,170396,76.18
1,1,Nordfriesland – Dithmarschen Nord,Schleswig-Holstein,186384,139200,1125,138075,74.68
3,3,Rendsburg-Eckernförde,Schleswig-Holstein,199632,157387,1119,156268,78.84
4,4,Kiel,Schleswig-Holstein,204650,151463,1290,150173,74.01
5,5,Plön – Neumünster,Schleswig-Holstein,174934,131710,1196,130514,75.29
...,...,...,...,...,...,...,...,...
294,294,Zollernalb – Sigmaringen,Baden-Württemberg,183202,139544,1818,137726,76.17
295,295,Saarbrücken,Saarland,199887,147602,2171,145431,73.84
296,296,Saarlouis,Saarland,207500,160371,2831,157540,77.29
297,297,St. Wendel,Saarland,177468,141378,2668,138710,79.66


In [73]:
third_row = df.iloc[2]
third_row

area_id                                     3
area                    Rendsburg-Eckernförde
state                      Schleswig-Holstein
voters                                 199632
total_votes                            157387
invalid_second_votes                     1119
valid_second_votes                     156268
vote_percentage                         78.84
Name: 3, dtype: object

---

## Selecting contiguous rows with `iloc[]`

    df.iloc[start_pos_index:end_pos_index]
    

In [74]:
df.iloc[3:11]

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes,vote_percentage
4,4,Kiel,Schleswig-Holstein,204650,151463,1290,150173,74.01
5,5,Plön – Neumünster,unknown,174934,131710,1196,130514,75.29
6,6,Pinneberg,Schleswig-Holstein,237474,187711,1339,186372,79.04
7,7,Segeberg – Stormarn-Mitte,Schleswig-Holstein,245826,193318,1381,191937,78.64
8,8,Ostholstein – Stormarn-Nord,Schleswig-Holstein,181480,138415,1142,137273,76.27
11,11,Schwerin – Ludwigslust-Parchim I – Nordwestmec...,Mecklenburg-Vorpommern,216743,157070,1658,155412,72.47
12,12,Ludwigslust-Parchim II – Nordwestmecklenburg I...,Mecklenburg-Vorpommern,205757,146781,1673,145108,71.34
13,13,Rostock – Landkreis Rostock II,Mecklenburg-Vorpommern,222718,164037,1731,162306,73.65


---

## Selecting non-contiguous rows with `iloc[]`


    df.iloc[pos_index_list]
    
---

In [75]:
df.iloc[[0,5,7,10]]

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes,vote_percentage
0,0,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1509,170396,76.18
6,6,Pinneberg,Schleswig-Holstein,237474,187711,1339,186372,79.04
8,8,Ostholstein – Stormarn-Nord,Schleswig-Holstein,181480,138415,1142,137273,76.27
13,13,Rostock – Landkreis Rostock II,Mecklenburg-Vorpommern,222718,164037,1731,162306,73.65


---

## Selecting a row with `loc[]`


        df.loc[index]
        
---

In [54]:
df

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes,vote_percentage
0,0,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1509,170396,76.18
1,1,Nordfriesland – Dithmarschen Nord,Schleswig-Holstein,186384,139200,1125,138075,74.68
3,3,Rendsburg-Eckernförde,Schleswig-Holstein,199632,157387,1119,156268,78.84
4,4,Kiel,Schleswig-Holstein,204650,151463,1290,150173,74.01
5,5,Plön – Neumünster,Schleswig-Holstein,174934,131710,1196,130514,75.29
...,...,...,...,...,...,...,...,...
294,294,Zollernalb – Sigmaringen,Baden-Württemberg,183202,139544,1818,137726,76.17
295,295,Saarbrücken,Saarland,199887,147602,2171,145431,73.84
296,296,Saarlouis,Saarland,207500,160371,2831,157540,77.29
297,297,St. Wendel,Saarland,177468,141378,2668,138710,79.66


**EXERCISE**: extract the `area` field of the row with index `4`.

In [56]:
df.loc[4]['area']

'Kiel'

---

## Selecting rows with `loc[]`


    df.loc[index_list]
    
---

**EXERCISE**: extract the rows with indices `5` and `8` (index `10` was removed earlier, so `df.loc[[5, 8, 10]]` would raise a `KeyError`).

In [58]:
df.loc[[5,8]]

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes,vote_percentage
5,5,Plön – Neumünster,Schleswig-Holstein,174934,131710,1196,130514,75.29
8,8,Ostholstein – Stormarn-Nord,Schleswig-Holstein,181480,138415,1142,137273,76.27


---

## Selecting rows that satisfy a condition with `loc[]`


        df.loc[mask]
        
---

**EXERCISE**: extract the rows whose `state` column has the value `Berlin`.

In [59]:
df.loc[df['state'] == 'Berlin']

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes,vote_percentage
74,74,Berlin-Mitte,Berlin,206706,151634,2214,149420,73.36
75,75,Berlin-Pankow,Berlin,237071,188607,2198,186409,79.56
76,76,Berlin-Reinickendorf,Berlin,182392,137084,2171,134913,75.16
77,77,Berlin-Spandau – Charlottenburg Nord,Berlin,184602,132934,2276,130658,72.01
78,78,Berlin-Steglitz-Zehlendorf,Berlin,221209,180827,2110,178717,81.74
79,79,Berlin-Charlottenburg-Wilmersdorf,Berlin,198672,158245,1429,156816,79.65
80,80,Berlin-Tempelhof-Schöneberg,Berlin,235250,181259,2162,179097,77.05
81,81,Berlin-Neukölln,Berlin,202616,143790,2980,140810,70.97
82,82,Berlin-Friedrichshain-Kreuzberg – Prenzlauer B...,Berlin,223426,173504,1664,171840,77.66
83,83,Berlin-Treptow-Köpenick,Berlin,205105,157461,2124,155337,76.77


---

## Selecting a field of a row with `loc[]`


    df.loc[index, column_name]
    
---

In [61]:
df.loc[5, 'state'] = 'unknown'

In [62]:
df

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes,vote_percentage
0,0,Flensburg – Schleswig,Schleswig-Holstein,225659,171905,1509,170396,76.18
1,1,Nordfriesland – Dithmarschen Nord,Schleswig-Holstein,186384,139200,1125,138075,74.68
3,3,Rendsburg-Eckernförde,Schleswig-Holstein,199632,157387,1119,156268,78.84
4,4,Kiel,Schleswig-Holstein,204650,151463,1290,150173,74.01
5,5,Plön – Neumünster,unknown,174934,131710,1196,130514,75.29
...,...,...,...,...,...,...,...,...
294,294,Zollernalb – Sigmaringen,Baden-Württemberg,183202,139544,1818,137726,76.17
295,295,Saarbrücken,Saarland,199887,147602,2171,145431,73.84
296,296,Saarlouis,Saarland,207500,160371,2831,157540,77.29
297,297,St. Wendel,Saarland,177468,141378,2668,138710,79.66


---

## Selecting a field of multiple rows with `loc[]`


    df.loc[index_list, column_name]
    
---

In [66]:
df.loc[[5,11], 'area']

5                                     Plön – Neumünster
11    Schwerin – Ludwigslust-Parchim I – Nordwestmec...
Name: area, dtype: object

---

**EXERCISE**: set to 0 all the values below 1000 in the column `invalid_second_votes`.

In [68]:
mask = df['invalid_second_votes'] < 1000
df.loc[mask, 'invalid_second_votes'] = 0

In [69]:
df[mask]

Unnamed: 0,area_id,area,state,voters,total_votes,invalid_second_votes,valid_second_votes,vote_percentage
18,18,Hamburg-Altona,Hamburg,185942,146025,0,145076,78.53
19,19,Hamburg-Eimsbüttel,Hamburg,192399,156001,0,155141,81.08
20,20,Hamburg-Nord,Hamburg,217226,178302,0,177403,82.08
29,29,Stade I – Rotenburg II,Niedersachsen,197298,152485,0,151639,77.29
30,30,Mittelems,Niedersachsen,230151,179790,0,178827,78.12
34,34,Rotenburg I – Heidekreis,Niedersachsen,168081,127071,0,126097,75.6
35,35,Harburg,Niedersachsen,199081,161838,0,160858,81.29
36,36,Lüchow-Dannenberg – Lüneburg,Niedersachsen,180866,141000,0,140138,77.96
38,38,Stadt Osnabrück,Niedersachsen,196475,151988,0,151006,77.36
40,40,Stadt Hannover I,Niedersachsen,179883,134842,0,133867,74.96


---

## Selecting multiple fields of multiple rows with `loc[]`


        df.loc[index_list, column_list]
        
---
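
As a minimal sketch of selecting several fields of several rows with `loc[]`, the example below assumes a small toy data frame (`toy`, a name introduced here purely for illustration) built from a few rows of the election data shown earlier; it is not part of the original notebook.

```python
import pandas as pd

# Toy data frame: a few rows copied from the election data, keeping their index labels.
toy = pd.DataFrame(
    {
        "area": ["Kiel", "Pinneberg", "Berlin-Mitte", "Berlin-Pankow"],
        "state": ["Schleswig-Holstein", "Schleswig-Holstein", "Berlin", "Berlin"],
        "voters": [204650, 237474, 206706, 237071],
    },
    index=[4, 6, 74, 75],
)

# Rows are chosen by index label, columns by name; the result is a DataFrame.
subset = toy.loc[[4, 75], ["area", "voters"]]
print(subset)
```

Passing a list for both arguments yields a `DataFrame`, even when each list contains a single element.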

---

## Getting a value with `at[]`

    df.at[index, column_name]
    
---
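
The sketch below shows `at[]` used both to read and to write a single cell. It assumes a hypothetical two-row data frame named `toy`, whose values are copied from the election data shown earlier.

```python
import pandas as pd

# Toy data frame (hypothetical name) with two rows and their original index labels.
toy = pd.DataFrame(
    {"area": ["Kiel", "Berlin-Mitte"], "voters": [204650, 206706]},
    index=[4, 74],
)

area_name = toy.at[74, "area"]  # read one value by index label and column name
toy.at[4, "voters"] = 0         # write one value in place
print(area_name, toy.at[4, "voters"])
```

`at[]` addresses exactly one cell; selecting lists of labels or applying masks requires `loc[]`.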

---

## Checking for null values


    pd.isnull(df|column)
    
---
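
A possible way to use `pd.isnull()` is sketched below. The data frame `toy` is a made-up example in which two values have deliberately been left missing; the election data itself contains no null values according to `df.info()`.

```python
import pandas as pd

# Toy data with two values deliberately left missing (an assumption for illustration).
toy = pd.DataFrame(
    {
        "area": ["Kiel", None, "Saarlouis"],
        "voters": [204650.0, 237474.0, float("nan")],
    }
)

null_mask = pd.isnull(toy)              # DataFrame of booleans, True where a value is missing
nulls_per_column = null_mask.sum()      # number of missing values in each column
voters_null = pd.isnull(toy["voters"])  # boolean Series for a single column
print(nulls_per_column)
```

The boolean result can be used directly as a mask, for instance `toy[voters_null]` to inspect the incomplete rows.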

---

## Getting summary statistics

    df.describe()
    column.describe()

---
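
As an illustration of `describe()`, the sketch below uses a hypothetical three-row data frame `toy` with values taken from the election data shown earlier.

```python
import pandas as pd

# Toy data frame (hypothetical name): one text column and two numeric columns.
toy = pd.DataFrame(
    {
        "area": ["Kiel", "Pinneberg", "Berlin-Mitte"],
        "voters": [204650, 237474, 206706],
        "total_votes": [151463, 187711, 151634],
    }
)

stats = toy.describe()                   # one column of statistics per numeric column
voters_stats = toy["voters"].describe()  # the same statistics for a single Series
print(stats)
```

With mixed column types, `describe()` summarizes only the numeric columns by default, so `area` does not appear in `stats`.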

## Some useful methods...

`DataFrame` and `Series` objects provide methods such as `max()`, `min()`, `count()`, `var()`, `std()`, `mean()`, `sum()`.

`DataFrame` objects provide the `corr()` method to compute the correlation matrix (the covariance matrix is returned by `cov()`).

---
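
The aggregation methods and `corr()` can be tried on a small example. The sketch below assumes a hypothetical data frame `toy` holding only the numeric columns `voters` and `total_votes`, with values copied from the election data.

```python
import pandas as pd

# Toy data frame (hypothetical name) with numeric columns only.
toy = pd.DataFrame(
    {
        "voters": [204650, 237474, 206706, 237071],
        "total_votes": [151463, 187711, 151634, 188607],
    }
)

column_max = toy.max()              # on a DataFrame: one value per column (a Series)
voters_sum = toy["voters"].sum()    # on a Series: a single scalar
corr_matrix = toy.corr()            # pairwise correlation between the columns
print(column_max, voters_sum, corr_matrix, sep="\n")
```

Restricting the data frame to numeric columns before calling `corr()` avoids errors that text columns can cause in recent pandas versions.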

### The `unique()` method

The `unique()` method of `Series` objects returns the array of distinct values present in the calling object.

---
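
A small sketch of `unique()`, assuming a hypothetical `Series` named `states` with a few repeated values taken from the `state` column:

```python
import pandas as pd

# Toy Series (hypothetical name) with repeated values.
states = pd.Series(
    ["Schleswig-Holstein", "Schleswig-Holstein", "Berlin", "Berlin", "Saarland"]
)

distinct_states = states.unique()  # distinct values, in order of first appearance
print(distinct_states)
```

The values are returned in order of first appearance rather than sorted.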

**EXERCISE**: determine the distinct values of the column `state`.

### The `value_counts()` method

    column.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
    df.value_counts(subset=None, normalize=False, sort=True, ascending=False, dropna=True)

---
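
The effect of the `normalize` parameter can be seen on a toy example. The `Series` named `states` below is hypothetical and only mimics the `state` column.

```python
import pandas as pd

# Toy Series (hypothetical name) with repeated values.
states = pd.Series(
    ["Schleswig-Holstein", "Schleswig-Holstein", "Berlin", "Berlin", "Saarland"]
)

counts = states.value_counts()                      # absolute frequencies
frequencies = states.value_counts(normalize=True)   # relative frequencies, summing to 1
print(counts)
print(frequencies)
```

In both cases the result is a `Series` indexed by the distinct values, sorted by decreasing frequency unless `ascending=True` is passed.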

**EXERCISE**: determine the relative frequencies of the `state` field and list them in ascending order.

### The `sort_values()` method


    df.sort_values(column|column_list, ascending = True, inplace = False)
    
---
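
As a sketch of `sort_values()` with one and with several sort keys, the example below assumes a hypothetical data frame `toy` built from a few rows of the election data.

```python
import pandas as pd

# Toy data frame (hypothetical name) keeping the original index labels.
toy = pd.DataFrame(
    {
        "area": ["Kiel", "Pinneberg", "Berlin-Mitte", "Berlin-Pankow", "Saarbrücken"],
        "state": ["Schleswig-Holstein", "Schleswig-Holstein", "Berlin", "Berlin", "Saarland"],
        "voters": [204650, 237474, 206706, 237071, 199887],
        "total_votes": [151463, 187711, 151634, 188607, 147602],
    },
    index=[4, 6, 74, 75, 295],
)

# Single key, ascending (the default); toy itself is left untouched.
by_voters = toy.sort_values("voters")

# Two keys with a different direction for each one.
by_state_votes = toy.sort_values(["state", "total_votes"], ascending=[True, False])
print(by_state_votes)
```

Since `inplace` defaults to `False`, both calls return new data frames and the original row order of `toy` is preserved.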

**EXERCISE**: sort the rows by the columns `state` and `area`, in descending order (without modifying the calling *data frame*).

---

**EXERCISE**: sort the rows by total number of votes, in descending order (without modifying the calling *data frame*).

---

## Grouping values

    df.groupby(column_list)
   
---
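
`groupby()` on its own only builds the groups; an aggregation is usually applied afterwards. The sketch below assumes a hypothetical data frame `toy` with a few rows of the election data and sums `total_votes` within each state.

```python
import pandas as pd

# Toy data frame (hypothetical name) built from a few rows of the election data.
toy = pd.DataFrame(
    {
        "area": ["Kiel", "Pinneberg", "Berlin-Mitte", "Berlin-Pankow", "Saarbrücken"],
        "state": ["Schleswig-Holstein", "Schleswig-Holstein", "Berlin", "Berlin", "Saarland"],
        "total_votes": [151463, 187711, 151634, 188607, 147602],
    }
)

groups = toy.groupby("state")                  # a GroupBy object, nothing computed yet
votes_per_state = groups["total_votes"].sum()  # one aggregated value per group
areas_per_state = groups.size()                # number of rows in each group
print(votes_per_state)
```

The result is a `Series` indexed by the group keys, which are sorted by default.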

**EXERCISE**: extract the total number of registered voters for each value of the `state` field.

---

**EXERCISE**: display the number of registered voters for each area, grouping the areas by the `state` field.

---

## Applying a function with the `apply()` method

        df.apply(function)
        column.apply(function)


---

**EXERCISE**: extract the *data frame* with the two columns containing the number of registered voters and of total votes, with all the values decreased by 1000.

---

**EXERCISE**: extract the column with the number of registered voters converted to a floating-point number.

---

## Iterating over the rows of a *data frame*


    for (index, record) in df.iterrows():
        block

---
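
A minimal sketch of the loop above, assuming a hypothetical data frame `toy` with a few rows of the election data: each iteration receives the index label and the row as a `Series`.

```python
import pandas as pd

# Toy data frame (hypothetical name) keeping the original index labels.
toy = pd.DataFrame(
    {
        "area": ["Kiel", "Pinneberg", "Berlin-Mitte"],
        "voters": [204650, 237474, 206706],
        "total_votes": [151463, 187711, 151634],
    },
    index=[4, 6, 74],
)

turnout = {}
for index, record in toy.iterrows():
    # record is a Series: its fields are accessed by column name.
    turnout[index] = round(record["total_votes"] / record["voters"] * 100, 2)
print(turnout)
```

Row-by-row iteration is convenient for inspection but slow on large data frames; column-wise expressions such as the one used for `vote_percentage` are usually preferable.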

## Saving a *data frame* in `csv` format


        df.to_csv(file_name, index = False)
        
---
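
The sketch below writes a hypothetical data frame `toy` to a temporary file and reads it back. The file name `toy.csv` and the use of a temporary directory are assumptions made to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

# Toy data frame (hypothetical name) with non-default index labels.
toy = pd.DataFrame(
    {"area": ["Kiel", "Pinneberg", "Berlin-Mitte"], "voters": [204650, 237474, 206706]},
    index=[4, 6, 74],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, "toy.csv")  # hypothetical file name
    toy.to_csv(file_name, index=False)            # the index is not written to the file
    reloaded = pd.read_csv(file_name)
print(reloaded)
```

With `index = False` the index labels are discarded, so the reloaded data frame gets a fresh default index starting from 0.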

---

## Calling `matplotlib` from Pandas

---

## Changing the index (primary keys)
    
    df.set_index(column|columns, drop = True, inplace = False)
 

---
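
The role of the `drop` parameter of `set_index()` can be sketched as follows, assuming a hypothetical data frame `toy` with a few rows of the election data.

```python
import pandas as pd

# Toy data frame (hypothetical name) keeping the original index labels.
toy = pd.DataFrame(
    {"area": ["Kiel", "Pinneberg", "Berlin-Mitte"], "voters": [204650, 237474, 206706]},
    index=[4, 6, 74],
)

by_area = toy.set_index("area")                  # 'area' becomes the index and leaves the columns
area_kept = toy.set_index("area", drop=False)    # 'area' is both the index and a column
print(by_area.loc["Kiel", "voters"])
```

Because `inplace` defaults to `False`, `toy` keeps its original integer index and both calls return new data frames.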

**EXERCISE**: produce a copy of the data frame whose index is given by the `area` field followed by a `-` and the current index.