# Pandas

Pandas lets us manipulate and transform data.

## Series

A Series is an ordered sequence of values, much like a Python list. It plays the role of a single column in a database table.
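As a small illustration of the comparison with a list, the sketch below builds a Series from a list. It assumes pandas is already installed; the variable names and values are made up for the example.

```python
import pandas as pd

# A plain Python list and a Series built from it (illustrative values).
ages_list = [24, 27, 31]
ages_series = pd.Series(ages_list, name="Age")

# Like a list, a Series keeps its order and supports positional access.
print(ages_series.iloc[0])      # first value
print(len(ages_series))         # number of values

# Unlike a list, every value also has an index label, and the Series has a dtype.
print(list(ages_series.index))  # default labels: 0, 1, 2
print(ages_series.dtype)
```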

## DataFrame

A DataFrame is a collection of Series, like a database table or an Excel sheet.

The first thing to do when working with pandas is to install it. It is already installed in this environment.
If it were not, we would run:

    $ pip install pandas

Next, we import the library:



In [1]:
import pandas as pd

# Create a DataFrame.

We could get a DataFrame from a database, a CSV file or an Excel sheet, but in this case we are going to create an empty DataFrame from scratch.

In [2]:
empty_dataframe = pd.DataFrame()

print(empty_dataframe)

Empty DataFrame
Columns: []
Index: []
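For comparison with the empty DataFrame, here is a minimal sketch of loading one from CSV data with `pd.read_csv`. The CSV content is invented for the example, and `io.StringIO` stands in for a real file so the snippet runs on its own.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would be a path to a real file.
csv_text = "Surname,Age\nAlvaro,24\nAna,27\n"

# read_csv accepts a file path or any file-like object.
csv_dataframe = pd.read_csv(io.StringIO(csv_text))
print(csv_dataframe)
```

The first line of the CSV is used as the column names, and the rows get the default index 0, 1, ...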


In [3]:
# The index holds the row labels (the id of each row)
names = ('Alvaro', 'Ana', 'Luia', 'James', 'Brendan', 'Sarah')
names_dataframe = pd.DataFrame(names)
print(names_dataframe)

         0
0   Alvaro
1      Ana
2     Luia
3    James
4  Brendan
5    Sarah


In [4]:
names_dataframe.columns

RangeIndex(start=0, stop=1, step=1)

In [5]:
names_dataframe.index

RangeIndex(start=0, stop=6, step=1)

In [6]:
# Give names to the columns

names_dataframe.columns=['Surname']
names_dataframe

Unnamed: 0,Surname
0,Alvaro
1,Ana
2,Luia
3,James
4,Brendan
5,Sarah


In [7]:
names2_dataframe = pd.DataFrame(names, columns=['Surname'])
names2_dataframe

Unnamed: 0,Surname
0,Alvaro
1,Ana
2,Luia
3,James
4,Brendan
5,Sarah


In [8]:
people = ( ('Alvaro', 24), ('Ana', 27), ('Luzia', 31), ('James', 32), ('Brendan', 30), ('Sarah', 27) )
people_dataframe = pd.DataFrame(people)
people_dataframe

Unnamed: 0,0,1
0,Alvaro,24
1,Ana,27
2,Luzia,31
3,James,32
4,Brendan,30
5,Sarah,27


In [9]:
people_dataframe.columns=['Surname', 'Age']
people_dataframe

Unnamed: 0,Surname,Age
0,Alvaro,24
1,Ana,27
2,Luzia,31
3,James,32
4,Brendan,30
5,Sarah,27


In [10]:
people_dataframe['Age']

0    24
1    27
2    31
3    32
4    30
5    27
Name: Age, dtype: int64

In [11]:
people_dataframe['Surname']

0     Alvaro
1        Ana
2      Luzia
3      James
4    Brendan
5      Sarah
Name: Surname, dtype: object

In [12]:
people_dataframe.iloc[3]

Surname    James
Age           32
Name: 3, dtype: object

In [13]:
people_dataframe

Unnamed: 0,Surname,Age
0,Alvaro,24
1,Ana,27
2,Luzia,31
3,James,32
4,Brendan,30
5,Sarah,27


In [14]:
people_dataframe.index = people_dataframe['Surname']
people_dataframe

Surname,Surname,Age
Alvaro,Alvaro,24
Ana,Ana,27
Luzia,Luzia,31
James,James,32
Brendan,Brendan,30
Sarah,Sarah,27


In [15]:
people_dataframe.loc['James']

Surname    James
Age           32
Name: James, dtype: object

In [16]:
people_dataframe.loc['James']['Age']

32

In [17]:
people_dataframe['Age'].loc['James']

32
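The two lookups above each take two steps (first the row, then the column, or the other way round). As a possible alternative, `.loc` also accepts the row label and the column label together. The sketch below rebuilds a smaller version of the data and uses `set_index`, which is not used elsewhere in these notes, so that `Surname` is moved into the index instead of being duplicated.

```python
import pandas as pd

people = (('Alvaro', 24), ('Ana', 27), ('Luzia', 31), ('James', 32))
people_dataframe = pd.DataFrame(people, columns=['Surname', 'Age'])

# set_index moves the column into the index instead of copying it,
# so 'Surname' no longer appears twice.
by_surname = people_dataframe.set_index('Surname')

# Row label and column label in a single .loc call.
james_age = by_surname.loc['James', 'Age']
print(james_age)

# .at reads or writes one single cell.
by_surname.at['James', 'Age'] = 33
print(by_surname.loc['James', 'Age'])
```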

In [18]:
people = {
    "names": ('Alvaro', 'Ana', 'Luzia', 'James', 'Brendan', 'Sarah'),
    "ages": (25, 27, 31, 32, 34, 35)
}
new_people_dataframe = pd.DataFrame(people)
new_people_dataframe

Unnamed: 0,names,ages
0,Alvaro,25
1,Ana,27
2,Luzia,31
3,James,32
4,Brendan,34
5,Sarah,35


In [19]:
# Discover the size

new_people_dataframe.ndim       # Number of dimensions (axes): rows and columns

2

In [20]:
new_people_dataframe.shape       

(6, 2)

In [22]:
new_people_dataframe.shape[0] # Number of rows

6

In [23]:
new_people_dataframe.shape[1] # Number of columns

2

In [24]:
new_people_dataframe.size  

12
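The relation between `ndim`, `shape` and `size` can be checked with a short sketch. It uses a smaller, made-up DataFrame so the numbers are easy to follow.

```python
import pandas as pd

frame = pd.DataFrame({"names": ("Alvaro", "Ana", "Luzia"), "ages": (25, 27, 31)})
rows, columns = frame.shape

print(frame.ndim)                    # a DataFrame has 2 axes: rows and columns
print(frame["ages"].ndim)            # a single column (a Series) has 1 axis
print(frame.size == rows * columns)  # size is the total number of cells
print(len(frame) == rows)            # len() counts rows
```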

In [25]:
new_people_dataframe.head(3)

Unnamed: 0,names,ages
0,Alvaro,25
1,Ana,27
2,Luzia,31


In [26]:
new_people_dataframe.tail(3)

Unnamed: 0,names,ages
3,James,32
4,Brendan,34
5,Sarah,35


In [28]:
for each_row in new_people_dataframe.index:
    print(each_row)

0
1
2
3
4
5


In [29]:
for each_column in new_people_dataframe.columns:
    for each_row in new_people_dataframe.index:
        print( new_people_dataframe[each_column][each_row] )

Alvaro
Ana
Luzia
James
Brendan
Sarah
25
27
31
32
34
35


In [30]:
for each_row in new_people_dataframe.index:
    for each_column in new_people_dataframe.columns:
        print( new_people_dataframe[each_column][each_row] )

Alvaro
25
Ana
27
Luzia
31
James
32
Brendan
34
Sarah
35
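The nested loops above reach every cell through the index and the column names. As a sketch of another option, pandas also offers `itertuples()` to walk the rows and `items()` to walk the columns. The DataFrame below is a reduced copy of the one used in these notes.

```python
import pandas as pd

frame = pd.DataFrame({"names": ("Alvaro", "Ana"), "ages": (25, 27)})

# Row by row: each row arrives as a named tuple with one field per column.
visited = []
for row in frame.itertuples(index=False):
    visited.append((row.names, row.ages))
    print(row.names, row.ages)

# Column by column: items() yields (column name, Series) pairs.
for column_name, column in frame.items():
    print(column_name, column.tolist())
```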


In [31]:
new_people_dataframe.dtypes

names    object
ages      int64
dtype: object

In [34]:
new_people_dataframe.iloc[1]

names    Ana
ages      27
Name: 1, dtype: object

In [36]:
new_people_dataframe[2:5]

Unnamed: 0,names,ages
2,Luzia,31
3,James,32
4,Brendan,34


In [37]:
new_people_dataframe.iloc[2:5]

Unnamed: 0,names,ages
2,Luzia,31
3,James,32
4,Brendan,34


In [40]:
new_people_dataframe.iloc[2:5,[0,1]]

Unnamed: 0,names,ages
2,Luzia,31
3,James,32
4,Brendan,34


# Summary: DataFrames

Start using pandas:

    $ pip install pandas

```
import pandas as pd  # Tell Python that we want to use this library
```

```
new_dataframe = pd.DataFrame(some_data)  # Creates a new DataFrame
```

`some_data` can be:

- nothing -> empty DataFrame
- tuple   -> DataFrame with 1 column and 1 row per element
- tuple of tuples -> 1 row per subtuple, 1 column per position inside the subtuples
- dictionary of tuples -> 1 column per key, named after the key
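The inputs listed above can be compared by looking at the `shape` each one produces. This is a minimal sketch with made-up values:

```python
import pandas as pd

empty = pd.DataFrame()
from_tuple = pd.DataFrame(('Alvaro', 'Ana', 'Luzia'))
# Each inner tuple becomes one row.
from_tuples = pd.DataFrame((('Alvaro', 24), ('Ana', 27), ('Luzia', 31)))
# Each dictionary key becomes one column.
from_dict = pd.DataFrame({"names": ('Alvaro', 'Ana', 'Luzia'), "ages": (24, 27, 31)})

print(empty.shape)        # (0, 0)
print(from_tuple.shape)   # (3, 1)
print(from_tuples.shape)  # (3, 2)
print(from_dict.shape)    # (3, 2)
print(list(from_dict.columns))
```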

## Properties

```
my_dataframe.shape   -> returns a tuple with the number of rows and columns
my_dataframe.ndim    -> returns the number of dimensions (always 2 for a DataFrame)
my_dataframe.size    -> returns the total number of cells (rows multiplied by columns)
my_dataframe.dtypes  -> returns the data type of each column
my_dataframe.index   -> returns or assigns the labels used as the ids of the rows. We can also use it to loop through the rows
my_dataframe.columns -> returns or assigns the names of the columns. We can also use it to loop through the columns
```

## Recover data

```
my_dataframe['Name of a column'] -> returns a column (a Series)
my_dataframe['Name of a column'][Label of the row] -> returns a single value (with the default index, the label is the numerical id)
my_dataframe['Name of a column'].iloc[Position of the row] -> returns a single value
my_dataframe.iloc[Position of the row]['Name of a column'] -> returns a single value
my_dataframe.iloc[Start row position:end row position (not included)] -> returns a range of rows
my_dataframe.iloc[Start row position:end row position (not included), [list with the positions of the columns to retrieve]] -> returns those rows and columns
my_dataframe.head(number) -> returns the first n rows of the DataFrame
my_dataframe.tail(number) -> returns the last n rows of the DataFrame
```
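With the default index, the row label and the row position are the same number, so the forms above look interchangeable. The sketch below uses a made-up index (10, 20, 30) to show where they differ: square brackets on a column and `.loc` look up the label, while `.iloc` counts positions from 0.

```python
import pandas as pd

frame = pd.DataFrame(
    {"names": ("Alvaro", "Ana", "Luzia"), "ages": (25, 27, 31)},
    index=[10, 20, 30],  # custom row labels, chosen only for this example
)

print(frame["ages"][20])      # by row label
print(frame["ages"].iloc[1])  # by row position
print(frame.loc[20, "ages"])  # label-based, row and column together
print(frame.iloc[1, 1])       # position-based, row and column together
```

All four lines reach the same cell, the age of Ana.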




In [41]:
new_people_dataframe

Unnamed: 0,names,ages
0,Alvaro,25
1,Ana,27
2,Luzia,31
3,James,32
4,Brendan,34
5,Sarah,35


In [45]:
new_people_dataframe['ages']

0    25
1    27
2    31
3    32
4    34
5    35
Name: ages, dtype: int64

In [48]:
new_column=new_people_dataframe['ages'].apply( lambda age: age*2 ) # apply() does not modify the DataFrame
# It returns a new Series that we can later store as a new column
new_column

0    50
1    54
2    62
3    64
4    68
5    70
Name: ages, dtype: int64

In [49]:
new_people_dataframe['double_ages']=new_column

In [50]:
new_people_dataframe

Unnamed: 0,names,ages,double_ages
0,Alvaro,25,50
1,Ana,27,54
2,Luzia,31,62
3,James,32,64
4,Brendan,34,68
5,Sarah,35,70


In [51]:
new_people_dataframe['trible_ages']=new_people_dataframe['ages'].apply( lambda age: age*3 )
new_people_dataframe

Unnamed: 0,names,ages,double_ages,trible_ages
0,Alvaro,25,50,75
1,Ana,27,54,81
2,Luzia,31,62,93
3,James,32,64,96
4,Brendan,34,68,102
5,Sarah,35,70,105


In [52]:
removed_column = new_people_dataframe.pop('trible_ages')

In [53]:
new_people_dataframe

Unnamed: 0,names,ages,double_ages
0,Alvaro,25,50
1,Ana,27,54
2,Luzia,31,62
3,James,32,64
4,Brendan,34,68
5,Sarah,35,70


In [54]:
removed_column

0     75
1     81
2     93
3     96
4    102
5    105
Name: trible_ages, dtype: int64
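`pop` changes the DataFrame in place. If the original should stay untouched, one option is `drop`, which returns a new DataFrame. The sketch below shows both side by side on a small, made-up DataFrame.

```python
import pandas as pd

frame = pd.DataFrame(
    {"names": ("Alvaro", "Ana"), "ages": (25, 27), "double_ages": (50, 54)}
)

# drop returns a new DataFrame without the column; `frame` still has it.
without_double = frame.drop(columns=["double_ages"])
print(list(without_double.columns))
print(list(frame.columns))

# pop removes the column from `frame` itself and hands it back as a Series.
popped = frame.pop("double_ages")
print(list(frame.columns))
print(popped.tolist())
```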

In [57]:
new_people_dataframe['names']=new_people_dataframe['names'].apply( lambda name: name.upper() )

In [58]:
new_people_dataframe

Unnamed: 0,names,ages,double_ages
0,ALVARO,25,50
1,ANA,27,54
2,LUZIA,31,62
3,JAMES,32,64
4,BRENDAN,34,68
5,SARAH,35,70


In [59]:
new_people_dataframe['names'].str.upper()

0     ALVARO
1        ANA
2      LUZIA
3      JAMES
4    BRENDAN
5      SARAH
Name: names, dtype: object

In [60]:
new_people_dataframe["ages"]*2

0    50
1    54
2    62
3    64
4    68
5    70
Name: ages, dtype: int64
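Multiplying the column directly gives the same values as the earlier `apply` with a lambda. A quick check on a shortened, made-up version of the `ages` column:

```python
import pandas as pd

ages = pd.Series((25, 27, 31), name="ages")

doubled_with_apply = ages.apply(lambda age: age * 2)  # one call per value
doubled_vectorised = ages * 2                         # whole column at once

print(doubled_with_apply.equals(doubled_vectorised))
print(ages.tolist())  # the original Series is unchanged in both cases
```

For simple arithmetic the direct form is usually shorter and faster; `apply` remains useful when the logic does not map to a built-in operation.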

In [None]:
# I want a new column that groups people by age:
# Older than 30? True | False

In [61]:
new_people_dataframe['older_than_30']=new_people_dataframe['ages'].apply( lambda age: age > 30 )
new_people_dataframe

Unnamed: 0,names,ages,double_ages,older_than_30
0,ALVARO,25,50,False
1,ANA,27,54,False
2,LUZIA,31,62,True
3,JAMES,32,64,True
4,BRENDAN,34,68,True
5,SARAH,35,70,True


In [69]:
filtering_column = new_people_dataframe['ages'].apply( lambda age: age < 30 )
filtering_column

0     True
1     True
2    False
3    False
4    False
5    False
Name: ages, dtype: bool

In [70]:
new_people_dataframe[ filtering_column ]

Unnamed: 0,names,ages,double_ages,older_than_30
0,ALVARO,25,50,False
1,ANA,27,54,False


In [74]:
new_people_dataframe['ages'] != 34 # The other comparison operators work the same way: >=, >, <=, <, ==

0     True
1     True
2     True
3     True
4    False
5     True
Name: ages, dtype: bool
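Since a comparison on a column already returns a boolean Series, it can be used as a filter directly, and several conditions can be combined. The following sketch uses a reduced, made-up copy of the data:

```python
import pandas as pd

frame = pd.DataFrame(
    {"names": ("Alvaro", "Ana", "Luzia", "James"), "ages": (25, 27, 31, 32)}
)

# Comparing a column gives the boolean mask directly; no apply() is needed.
younger_than_30 = frame["ages"] < 30
print(frame[younger_than_30])

# Combine masks with & (and), | (or) and ~ (not).
# Each comparison needs its own parentheses.
late_twenties = frame[(frame["ages"] > 25) & (frame["ages"] < 30)]
not_younger = frame[~younger_than_30]
print(late_twenties)
print(not_younger)
```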

In [77]:
import re

def myTransformationFunction(value):
    # Upper-case the names that contain an "n" or "N"; lower-case the rest.
    # Long form of the same logic:
    #   if re.match(".*[Nn]", value):
    #       return value.upper()
    #   else:
    #       return value.lower()
    return value.upper() if re.match(".*[Nn]", value) else value.lower()

# apply() accepts a named function as well as a lambda
new_people_dataframe['names'].apply( myTransformationFunction )

0     alvaro
1        ANA
2      luzia
3      james
4    BRENDAN
5      sarah
Name: names, dtype: object
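The same transformation can probably be written without `re` or `apply`, using the `.str` accessor together with `Series.where`. This is only a sketch of one alternative; the Series below recreates the upper-cased names from the cells above.

```python
import pandas as pd

names = pd.Series(
    ("ALVARO", "ANA", "LUZIA", "JAMES", "BRENDAN", "SARAH"), name="names"
)

# True for every name that contains an "n", ignoring case.
has_n = names.str.contains("n", case=False)

# where() keeps the upper-case value when the mask is True
# and takes the lower-case value otherwise.
transformed = names.str.upper().where(has_n, names.str.lower())
print(transformed)
```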