# 📘 **DataFrames**

A DataFrame is the second core data structure provided by the pandas library, after the Series. It is a two-dimensional structure made up of rows and columns.

**Key Characteristics**

* **2-Dimensional:** Data is organized in rows and columns
* **Heterogeneous:** Different columns can hold values of different types
* **Mutable:** Values in a DataFrame can be changed after it is created
* **Labeled Axes:** DataFrames have both a row index and a column index
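
These characteristics can be checked on a small DataFrame. A minimal sketch; the names, values and row labels below are hypothetical and only for illustration:

```python
import pandas as pd

# Hypothetical toy data: each column holds a different type
df = pd.DataFrame(
    {"name": ["Ana", "Luis"], "age": [30, 41], "active": [True, False]},
    index=["r1", "r2"],  # row labels
)

# Heterogeneous: each column has its own dtype
print(df.dtypes)

# Labeled axes: a row index and a column index
print(df.index.tolist())    # ['r1', 'r2']
print(df.columns.tolist())  # ['name', 'age', 'active']

# Mutable: change a single value by row label and column label
df.loc["r2", "age"] = 42
print(df.loc["r2", "age"])  # 42
```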

In [1]:
import pandas as pd

## **Create DataFrames**

**DataFrame from a List of Lists**

In [2]:
data_list = [
    ["Alice", 28, "Data Engineer"],
    ["Dante", 35, "Finance Analyst"],
    ["Bob", 21, "HR Analyst Junior"]
]

df_from_list = pd.DataFrame(data=data_list, columns=["name", "age", "position"])
df_from_list

| | name | age | position |
|---|---|---|---|
| 0 | Alice | 28 | Data Engineer |
| 1 | Dante | 35 | Finance Analyst |
| 2 | Bob | 21 | HR Analyst Junior |
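
Each inner list is read as one row, so the values must follow the order of `columns`. A small sketch (with a shortened, hypothetical version of the data) showing what happens when `columns` is omitted:

```python
import pandas as pd

rows = [["Alice", 28], ["Dante", 35]]

# Without columns=, pandas labels the columns 0, 1, ...
df_default = pd.DataFrame(rows)
print(df_default.columns.tolist())  # [0, 1]

# With columns=, the labels are applied in order to the values of each row
df_named = pd.DataFrame(rows, columns=["name", "age"])
print(df_named.loc[1, "name"])  # Dante
```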


**DataFrame from a Dictionary**

In [3]:
data_dict = {
    "product_name": ["Play Station 5", "Xbox Series X", "Tv LG 32\""],
    "category": ["Console", "Console", "Tvs"],
    "price": [599.99, 599.99, 242.99]
}

df_dict = pd.DataFrame(data_dict)
df_dict

| | product_name | category | price |
|---|---|---|---|
| 0 | Play Station 5 | Console | 599.99 |
| 1 | Xbox Series X | Console | 599.99 |
| 2 | Tv LG 32" | Tvs | 242.99 |
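
With a dictionary, each key becomes a column name and each list becomes that column's values. A short sketch using part of the data above: `columns=` can select and reorder keys, and lists of different lengths are rejected.

```python
import pandas as pd

data = {
    "product_name": ["Play Station 5", "Xbox Series X"],
    "category": ["Console", "Console"],
    "price": [599.99, 599.99],
}

df = pd.DataFrame(data)
print(df.columns.tolist())  # ['product_name', 'category', 'price']

# columns= picks (and orders) the keys that become columns
df_subset = pd.DataFrame(data, columns=["price", "product_name"])
print(df_subset.columns.tolist())  # ['price', 'product_name']

# Every list must have the same length, otherwise pandas raises ValueError
length_error = None
try:
    pd.DataFrame({"product_name": ["Play Station 5", "Xbox Series X"], "price": [599.99]})
except ValueError as err:
    length_error = err
print(type(length_error).__name__)  # ValueError
```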


**DataFrame with a Specific Index**

In [4]:
champions_league_results = {
    'matches': [
        
        {
            'local_team': 'Real Madrid',
            'guest_team': 'Olympique de Marseille',
            'local_goals': 2,
            'guest_goals': 1,
            'date_time': '2025-09-16 20:00',
            'stadium': 'Santiago Bernabéu Stadium',
            'country': 'Spain'
        },
        {
            'local_team': 'Juventus',
            'guest_team': 'Borussia Dortmund',
            'local_goals': 4,
            'guest_goals': 4,
            'date_time': '2025-09-16 20:00',
            'stadium': 'Allianz Stadium',
            'country': 'Italy'
        },
        {
            'local_team': 'S.L. Benfica',
            'guest_team': 'Qarabağ FK',
            'local_goals': 2,
            'guest_goals': 3,
            'date_time': '2025-09-16 20:00',
            'stadium': 'Estádio da Luz',
            'country': 'Portugal'
        },
        {
            'local_team': 'Tottenham Hotspur F.C.',
            'guest_team': 'Villarreal CF',
            'local_goals': 1,
            'guest_goals': 0,
            'date_time': '2025-09-16 20:00',
            'stadium': 'Tottenham Hotspur Stadium',
            'country': 'England'
        },
        {
            'local_team': 'Athletic Club',
            'guest_team': 'Arsenal F.C.',
            'local_goals': 0,
            'guest_goals': 2,
            'date_time': '2025-09-16 20:00',
            'stadium': 'San Mamés Stadium',
            'country': 'Spain'
        },
        {
            'local_team': 'PSV',
            'guest_team': 'Royale Union Saint-Gilloise',
            'local_goals': 1,
            'guest_goals': 3,
            'date_time': '2025-09-16 20:00',
            'stadium': 'Philips Stadion',
            'country': 'Netherlands'
        },
        
        {
            'local_team': 'Liverpool',
            'guest_team': 'Atletico de Madrid',
            'local_goals': 3,
            'guest_goals': 2,
            'date_time': '2025-09-17 20:00',
            'stadium': 'Anfield',
            'country': 'England'
        },
        {
            'local_team': 'FC Bayern Munich',
            'guest_team': 'Chelsea F.C.',
            'local_goals': 3,
            'guest_goals': 1,
            'date_time': '2025-09-17 20:00',
            'stadium': 'Allianz Arena',
            'country': 'Germany'
        },
        {
            'local_team': 'Paris Saint-Germain F.C.',
            'guest_team': 'Atalanta BC',
            'local_goals': 4,
            'guest_goals': 0,
            'date_time': '2025-09-17 20:00',
            'stadium': 'Parc des Princes',
            'country': 'France'
        },
        {
            'local_team': 'AFC Ajax',
            'guest_team': 'Inter Milan',
            'local_goals': 0,
            'guest_goals': 2,
            'date_time': '2025-09-17 20:00',
            'stadium': 'Johan Cruyff Arena',
            'country': 'Netherlands'
        },
        {
            'local_team': 'SK Slavia Prague',
            'guest_team': 'FK Bodø/Glimt',
            'local_goals': 2,
            'guest_goals': 2,
            'date_time': '2025-09-17 20:00',
            'stadium': 'Fortuna Arena',
            'country': 'Czech Republic'
        },
        {
            'local_team': 'Olympiacos F.C.',
            'guest_team': 'Pafos FC',
            'local_goals': 0,
            'guest_goals': 0,
            'date_time': '2025-09-17 20:00',
            'stadium': 'Karaiskakis Stadium',
            'country': 'Greece'
        }
    ]
}

In [5]:
df_ucl = pd.DataFrame(
    champions_league_results["matches"],
    index=["A", "B", "C", "D", "E", "f", "G", "H", "I", "J", "K", "L"],
)

In [6]:
df_ucl

| | local_team | guest_team | local_goals | guest_goals | date_time | stadium | country |
|---|---|---|---|---|---|---|---|
| A | Real Madrid | Olympique de Marseille | 2 | 1 | 2025-09-16 20:00 | Santiago Bernabéu Stadium | Spain |
| B | Juventus | Borussia Dortmund | 4 | 4 | 2025-09-16 20:00 | Allianz Stadium | Italy |
| C | S.L. Benfica | Qarabağ FK | 2 | 3 | 2025-09-16 20:00 | Estádio da Luz | Portugal |
| D | Tottenham Hotspur F.C. | Villarreal CF | 1 | 0 | 2025-09-16 20:00 | Tottenham Hotspur Stadium | England |
| E | Athletic Club | Arsenal F.C. | 0 | 2 | 2025-09-16 20:00 | San Mamés Stadium | Spain |
| f | PSV | Royale Union Saint-Gilloise | 1 | 3 | 2025-09-16 20:00 | Philips Stadion | Netherlands |
| G | Liverpool | Atletico de Madrid | 3 | 2 | 2025-09-17 20:00 | Anfield | England |
| H | FC Bayern Munich | Chelsea F.C. | 3 | 1 | 2025-09-17 20:00 | Allianz Arena | Germany |
| I | Paris Saint-Germain F.C. | Atalanta BC | 4 | 0 | 2025-09-17 20:00 | Parc des Princes | France |
| J | AFC Ajax | Inter Milan | 0 | 2 | 2025-09-17 20:00 | Johan Cruyff Arena | Netherlands |
| K | SK Slavia Prague | FK Bodø/Glimt | 2 | 2 | 2025-09-17 20:00 | Fortuna Arena | Czech Republic |
| L | Olympiacos F.C. | Pafos FC | 0 | 0 | 2025-09-17 20:00 | Karaiskakis Stadium | Greece |
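
When the data is a list of dictionaries, each dictionary becomes one row and `index=` supplies one label per row. A minimal sketch with two of the matches above; it also shows `set_index()` as an alternative that turns an existing column into the row index, and the error raised when the number of labels does not match the number of rows:

```python
import pandas as pd

matches = [
    {"local_team": "Real Madrid", "local_goals": 2},
    {"local_team": "Juventus", "local_goals": 4},
]

# One label per row
df = pd.DataFrame(matches, index=["A", "B"])
print(df.index.tolist())  # ['A', 'B']

# Alternative: promote an existing column to be the row index
df_by_team = pd.DataFrame(matches).set_index("local_team")
print(df_by_team.loc["Juventus", "local_goals"])  # 4

# The number of labels must equal the number of rows
index_error = None
try:
    pd.DataFrame(matches, index=["A"])
except ValueError as err:
    index_error = err
print(type(index_error).__name__)  # ValueError
```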


## **Selection and Indexing**

In [7]:
# select a specific column
df_ucl["local_goals"]

A    2
B    4
C    2
D    1
E    0
f    1
G    3
H    3
I    4
J    0
K    2
L    0
Name: local_goals, dtype: int64
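
Selecting one column returns a Series named after that column, and it keeps the row labels of the DataFrame. A small sketch with hypothetical values; attribute access (`df.local_goals`) is a shortcut that only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method:

```python
import pandas as pd

df = pd.DataFrame({"local_goals": [2, 4], "guest_goals": [1, 4]}, index=["A", "B"])

goals = df["local_goals"]
print(goals.name)  # local_goals
print(goals["B"])  # 4 (the row labels are preserved)

# Attribute access returns the same column
print(goals.equals(df.local_goals))  # True
```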

In [8]:
# select a specific row by integer position (0-based)
df_ucl.iloc[2]

local_team         S.L. Benfica
guest_team           Qarabağ FK
local_goals                   2
guest_goals                   3
date_time      2025-09-16 20:00
stadium          Estádio da Luz
country                Portugal
Name: C, dtype: object
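
`.iloc` works only with integer positions, whatever the row labels are. A short sketch with three of the matches above: negative positions count from the end, and a second position selects a column.

```python
import pandas as pd

df = pd.DataFrame(
    {"local_team": ["Real Madrid", "Juventus", "S.L. Benfica"],
     "local_goals": [2, 4, 2]},
    index=["A", "B", "C"],
)

first = df.iloc[0]
last = df.iloc[-1]       # negative positions count from the end
cell = df.iloc[1, 1]     # row position 1, column position 1

print(first["local_team"])  # Real Madrid
print(last["local_team"])   # S.L. Benfica
print(cell)                 # 4
```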

In [9]:
# select a specific row by index label
df_ucl.loc["K"]

local_team     SK Slavia Prague
guest_team        FK Bodø/Glimt
local_goals                   2
guest_goals                   2
date_time      2025-09-17 20:00
stadium           Fortuna Arena
country          Czech Republic
Name: K, dtype: object
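
`.loc` works with labels and accepts a column label (or a list of labels) as a second argument, so a single value or a subset can be selected in one call. A sketch with three of the matches above:

```python
import pandas as pd

df = pd.DataFrame(
    {"local_team": ["Real Madrid", "Juventus", "S.L. Benfica"],
     "stadium": ["Santiago Bernabéu Stadium", "Allianz Stadium", "Estádio da Luz"]},
    index=["A", "B", "C"],
)

# One row label and one column label give a single value
cell = df.loc["B", "stadium"]
print(cell)  # Allianz Stadium

# A list of row labels and one column label give a Series
teams = df.loc[["A", "C"], "local_team"]
print(teams.tolist())  # ['Real Madrid', 'S.L. Benfica']
```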

In [10]:
# select a range of rows by position (the end position is excluded)
df_ucl[0:4]

| | local_team | guest_team | local_goals | guest_goals | date_time | stadium | country |
|---|---|---|---|---|---|---|---|
| A | Real Madrid | Olympique de Marseille | 2 | 1 | 2025-09-16 20:00 | Santiago Bernabéu Stadium | Spain |
| B | Juventus | Borussia Dortmund | 4 | 4 | 2025-09-16 20:00 | Allianz Stadium | Italy |
| C | S.L. Benfica | Qarabağ FK | 2 | 3 | 2025-09-16 20:00 | Estádio da Luz | Portugal |
| D | Tottenham Hotspur F.C. | Villarreal CF | 1 | 0 | 2025-09-16 20:00 | Tottenham Hotspur Stadium | England |
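
An integer slice inside `[]` selects rows by position, just like `.iloc`, so the end position is excluded. A label slice with `.loc` behaves differently: it includes the end label. A sketch with hypothetical goal values:

```python
import pandas as pd

df = pd.DataFrame({"local_goals": [2, 4, 2, 1]}, index=["A", "B", "C", "D"])

by_position = df[0:2]           # rows at positions 0 and 1
by_iloc = df.iloc[0:2]          # same rows, more explicit
by_label = df.loc["A":"C"]      # label slice: "C" is included

print(by_position.index.tolist())  # ['A', 'B']
print(by_iloc.index.tolist())      # ['A', 'B']
print(by_label.index.tolist())     # ['A', 'B', 'C']
```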


In [16]:
# select multiple columns by passing a list of column names
df_ucl[["local_team", "local_goals"]]

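| | local_team | local_goals |
|---|---|---|
| A | Real Madrid | 2 |
| B | Juventus | 4 |
| C | S.L. Benfica | 2 |
| D | Tottenham Hotspur F.C. | 1 |
| E | Athletic Club | 0 |
| f | PSV | 1 |
| G | Liverpool | 3 |
| H | FC Bayern Munich | 3 |
| I | Paris Saint-Germain F.C. | 4 |
| J | AFC Ajax | 0 |
| K | SK Slavia Prague | 2 |
| L | Olympiacos F.C. | 0 |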
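
The number of brackets decides the result type: a single label returns a Series, while a list of labels (even a list with one label) returns a DataFrame whose columns follow the order of the list. A small sketch:

```python
import pandas as pd

df = pd.DataFrame({"local_team": ["Real Madrid"], "local_goals": [2], "country": ["Spain"]})

one_col = df["local_team"]                       # Series
as_frame = df[["local_team"]]                    # DataFrame with one column
reordered = df[["local_goals", "local_team"]]    # columns in the order given

print(type(one_col).__name__)       # Series
print(type(as_frame).__name__)      # DataFrame
print(reordered.columns.tolist())   # ['local_goals', 'local_team']
```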


In [17]:
# select the first n rows with head()
df_ucl.head(3)

| | local_team | guest_team | local_goals | guest_goals | date_time | stadium | country |
|---|---|---|---|---|---|---|---|
| A | Real Madrid | Olympique de Marseille | 2 | 1 | 2025-09-16 20:00 | Santiago Bernabéu Stadium | Spain |
| B | Juventus | Borussia Dortmund | 4 | 4 | 2025-09-16 20:00 | Allianz Stadium | Italy |
| C | S.L. Benfica | Qarabağ FK | 2 | 3 | 2025-09-16 20:00 | Estádio da Luz | Portugal |
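
`head()` returns 5 rows when no number is given, and `tail()` is its counterpart for the last rows. A quick sketch on a hypothetical 12-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"n": range(12)})

first_default = df.head()   # 5 rows by default
first_three = df.head(3)
last_two = df.tail(2)

print(len(first_default))           # 5
print(first_three["n"].tolist())    # [0, 1, 2]
print(last_two["n"].tolist())       # [10, 11]
```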


### **Filter Data**

In [18]:
df_ucl[df_ucl["local_goals"] < 3]

| | local_team | guest_team | local_goals | guest_goals | date_time | stadium | country |
|---|---|---|---|---|---|---|---|
| A | Real Madrid | Olympique de Marseille | 2 | 1 | 2025-09-16 20:00 | Santiago Bernabéu Stadium | Spain |
| C | S.L. Benfica | Qarabağ FK | 2 | 3 | 2025-09-16 20:00 | Estádio da Luz | Portugal |
| D | Tottenham Hotspur F.C. | Villarreal CF | 1 | 0 | 2025-09-16 20:00 | Tottenham Hotspur Stadium | England |
| E | Athletic Club | Arsenal F.C. | 0 | 2 | 2025-09-16 20:00 | San Mamés Stadium | Spain |
| f | PSV | Royale Union Saint-Gilloise | 1 | 3 | 2025-09-16 20:00 | Philips Stadion | Netherlands |
| J | AFC Ajax | Inter Milan | 0 | 2 | 2025-09-17 20:00 | Johan Cruyff Arena | Netherlands |
| K | SK Slavia Prague | FK Bodø/Glimt | 2 | 2 | 2025-09-17 20:00 | Fortuna Arena | Czech Republic |
| L | Olympiacos F.C. | Pafos FC | 0 | 0 | 2025-09-17 20:00 | Karaiskakis Stadium | Greece |
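
The filter works in two steps: the comparison produces a boolean Series (a "mask") with the same index as the DataFrame, and indexing with that mask keeps only the rows where it is `True`. A sketch with hypothetical values:

```python
import pandas as pd

df = pd.DataFrame({"local_goals": [2, 4, 2, 1]}, index=["A", "B", "C", "D"])

# Step 1: a boolean Series aligned on the row index
mask = df["local_goals"] < 3
print(mask.tolist())  # [True, False, True, True]

# Step 2: keep only the rows where the mask is True
filtered = df[mask]
print(filtered.index.tolist())  # ['A', 'C', 'D']
```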


In [19]:
df_ucl[df_ucl["local_goals"] < 3]["local_team"]

A               Real Madrid
C              S.L. Benfica
D    Tottenham Hotspur F.C.
E             Athletic Club
f                       PSV
J                  AFC Ajax
K          SK Slavia Prague
L           Olympiacos F.C.
Name: local_team, dtype: object
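
Chaining `[mask]` and `["local_team"]` is fine for reading, but `.loc` can take the mask and the column label in a single call. This form also works for assignment, whereas chained indexing such as `df[mask]["local_team"] = ...` assigns into an intermediate object and may not update `df` (pandas typically warns about it). A sketch with hypothetical values:

```python
import pandas as pd

df = pd.DataFrame(
    {"local_team": ["Real Madrid", "Juventus", "PSV"], "local_goals": [2, 4, 1]},
    index=["A", "B", "f"],
)

mask = df["local_goals"] < 3

# Rows by mask and a column by label in one .loc call
selected = df.loc[mask, "local_team"].tolist()
print(selected)  # ['Real Madrid', 'PSV']

# The same form updates df itself
df.loc[mask, "local_team"] = df.loc[mask, "local_team"].str.upper()
print(df["local_team"].tolist())  # ['REAL MADRID', 'Juventus', 'PSV']
```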

In [21]:
df_ucl[df_ucl["country"] == "Spain"]

| | local_team | guest_team | local_goals | guest_goals | date_time | stadium | country |
|---|---|---|---|---|---|---|---|
| A | Real Madrid | Olympique de Marseille | 2 | 1 | 2025-09-16 20:00 | Santiago Bernabéu Stadium | Spain |
| E | Athletic Club | Arsenal F.C. | 0 | 2 | 2025-09-16 20:00 | San Mamés Stadium | Spain |


In [24]:
df_ucl[(df_ucl["country"] == "Spain") | (df_ucl["country"] == "Germany")]

| | local_team | guest_team | local_goals | guest_goals | date_time | stadium | country |
|---|---|---|---|---|---|---|---|
| A | Real Madrid | Olympique de Marseille | 2 | 1 | 2025-09-16 20:00 | Santiago Bernabéu Stadium | Spain |
| E | Athletic Club | Arsenal F.C. | 0 | 2 | 2025-09-16 20:00 | San Mamés Stadium | Spain |
| H | FC Bayern Munich | Chelsea F.C. | 3 | 1 | 2025-09-17 20:00 | Allianz Arena | Germany |
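
Conditions are combined with `|` (or), `&` (and) and `~` (not), not with Python's `or`/`and`/`not`. Each condition needs its own parentheses because `|` and `&` bind more tightly than comparisons such as `==` and `>`. A sketch with hypothetical values:

```python
import pandas as pd

df = pd.DataFrame(
    {"country": ["Spain", "Italy", "Spain", "Germany"], "local_goals": [2, 4, 0, 3]},
    index=["A", "B", "E", "H"],
)

# | keeps rows where at least one condition is True
spain_or_germany = (df["country"] == "Spain") | (df["country"] == "Germany")
print(df[spain_or_germany].index.tolist())  # ['A', 'E', 'H']

# & keeps rows where both conditions are True
spain_and_scored = (df["country"] == "Spain") & (df["local_goals"] > 0)
print(df[spain_and_scored].index.tolist())  # ['A']

# ~ negates a mask
print(df[~spain_or_germany].index.tolist())  # ['B']
```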


In [26]:
df_ucl[df_ucl["country"].isin(["Germany", "France"])]

| | local_team | guest_team | local_goals | guest_goals | date_time | stadium | country |
|---|---|---|---|---|---|---|---|
| H | FC Bayern Munich | Chelsea F.C. | 3 | 1 | 2025-09-17 20:00 | Allianz Arena | Germany |
| I | Paris Saint-Germain F.C. | Atalanta BC | 4 | 0 | 2025-09-17 20:00 | Parc des Princes | France |
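
`.isin()` is a compact replacement for several `==` conditions joined with `|`, and `~` turns it into an "is not in" filter. A sketch with hypothetical rows:

```python
import pandas as pd

df = pd.DataFrame(
    {"country": ["Spain", "Germany", "France", "Italy"]},
    index=["A", "H", "I", "B"],
)

by_isin = df["country"].isin(["Germany", "France"])
by_or = (df["country"] == "Germany") | (df["country"] == "France")

# Both masks are identical
print(by_isin.equals(by_or))  # True

# Rows whose country is NOT in the list
print(df[~by_isin].index.tolist())  # ['A', 'B']
```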


### **Add New Columns**

In [28]:
df_ucl["diff_local_goals"] = df_ucl["local_goals"] - df_ucl["guest_goals"]
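
Arithmetic between columns is applied element by element, with values matched by row label, so no loop is needed. A sketch with hypothetical values; it also shows that a scalar is copied to every row, and that `.assign()` returns a new DataFrame instead of modifying the original (the `season` and `total_goals` column names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"local_goals": [2, 4, 1], "guest_goals": [1, 4, 3]})

# Element-wise difference, row by row
df["diff_local_goals"] = df["local_goals"] - df["guest_goals"]
print(df["diff_local_goals"].tolist())  # [1, 0, -2]

# A scalar is broadcast to every row
df["season"] = "2025/26"

# assign() returns a NEW DataFrame; df itself is unchanged
df2 = df.assign(total_goals=df["local_goals"] + df["guest_goals"])
print(df2["total_goals"].tolist())  # [3, 8, 4]
print("total_goals" in df.columns)  # False
```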

### **Apply Custom Functions**

We can create new columns by applying custom functions with the **.apply()** method. Its main arguments are the function to apply and the `axis` along which to apply it:

* **axis=1:** Apply the function to each row (use this when the function needs values from two or more columns of the same row)
* **axis=0 (default):** Apply the function to each column
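
The difference between the two axes is what the function receives: with `axis=0` it gets one column at a time, with `axis=1` it gets one row at a time (indexed by the column names). A sketch with hypothetical values and simple lambda functions:

```python
import pandas as pd

df = pd.DataFrame({"local_goals": [2, 4, 1], "guest_goals": [1, 4, 3]})

# axis=0: one value per column
col_max = df.apply(lambda col: col.max(), axis=0)
print(col_max.to_dict())  # {'local_goals': 4, 'guest_goals': 4}

# axis=1: one value per row, combining several columns of that row
row_total = df.apply(lambda row: row["local_goals"] + row["guest_goals"], axis=1)
print(row_total.tolist())  # [3, 8, 4]
```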

In [31]:
# using a custom function
def diff_guest_goals(row):
    return row["guest_goals"] - row["local_goals"]

In [34]:
df_ucl["diff_guest_goals"] = df_ucl.apply(diff_guest_goals, axis=1)
df_ucl.head()

| | local_team | guest_team | local_goals | guest_goals | date_time | stadium | country | diff_local_goals | diff_guest_goals |
|---|---|---|---|---|---|---|---|---|---|
| A | Real Madrid | Olympique de Marseille | 2 | 1 | 2025-09-16 20:00 | Santiago Bernabéu Stadium | Spain | 1 | -1 |
| B | Juventus | Borussia Dortmund | 4 | 4 | 2025-09-16 20:00 | Allianz Stadium | Italy | 0 | 0 |
| C | S.L. Benfica | Qarabağ FK | 2 | 3 | 2025-09-16 20:00 | Estádio da Luz | Portugal | -1 | 1 |
| D | Tottenham Hotspur F.C. | Villarreal CF | 1 | 0 | 2025-09-16 20:00 | Tottenham Hotspur Stadium | England | 1 | -1 |
| E | Athletic Club | Arsenal F.C. | 0 | 2 | 2025-09-16 20:00 | San Mamés Stadium | Spain | -2 | 2 |
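
Because `diff_guest_goals` only subtracts two columns, the same result can be obtained without `.apply()` by column arithmetic, which pandas computes on whole columns at once and which is usually faster than calling a Python function for every row. `.apply()` remains useful when the per-row logic cannot be written as column operations. A sketch comparing both on hypothetical values:

```python
import pandas as pd

df = pd.DataFrame({"local_goals": [2, 4, 2], "guest_goals": [1, 4, 3]})

def diff_guest_goals(row):
    return row["guest_goals"] - row["local_goals"]

with_apply = df.apply(diff_guest_goals, axis=1)      # one Python call per row
vectorized = df["guest_goals"] - df["local_goals"]   # whole-column operation

print(with_apply.equals(vectorized))  # True
print(vectorized.tolist())            # [-1, 0, 1]
```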


### **Delete Columns**

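The **.drop()** method deletes columns or rows, depending on the axis. It returns a new DataFrame and leaves the original unchanged unless the result is reassigned, which is why `diff_local_goals` still appears in `df_ucl` in the next section.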

* **axis=1:** Drop columns
* **axis=0 (default):** Drop rows
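
A sketch of both axes on hypothetical values. The `columns=` keyword is an equivalent way to drop columns without writing `axis=1`, and reassigning the result is what makes the change permanent:

```python
import pandas as pd

df = pd.DataFrame(
    {"local_goals": [2, 4], "guest_goals": [1, 4], "diff_local_goals": [1, 0]},
    index=["A", "B"],
)

# axis=1 and columns= give the same result
a = df.drop("diff_local_goals", axis=1)
b = df.drop(columns=["diff_local_goals"])
print(a.equals(b))  # True

# axis=0 (the default) drops rows by index label
without_a = df.drop("A")
print(without_a.index.tolist())  # ['B']

# df still has all its columns, because drop() returned new objects
original_columns = df.columns.tolist()
print(original_columns)  # ['local_goals', 'guest_goals', 'diff_local_goals']

# Reassign to keep the change
df = df.drop(columns="diff_local_goals")
print(df.columns.tolist())  # ['local_goals', 'guest_goals']
```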

In [41]:
df_ucl.drop("diff_local_goals", axis=1)

| | local_team | guest_team | local_goals | guest_goals | date_time | stadium | country | diff_guest_goals |
|---|---|---|---|---|---|---|---|---|
| A | Real Madrid | Olympique de Marseille | 2 | 1 | 2025-09-16 20:00 | Santiago Bernabéu Stadium | Spain | -1 |
| B | Juventus | Borussia Dortmund | 4 | 4 | 2025-09-16 20:00 | Allianz Stadium | Italy | 0 |
| C | S.L. Benfica | Qarabağ FK | 2 | 3 | 2025-09-16 20:00 | Estádio da Luz | Portugal | 1 |
| D | Tottenham Hotspur F.C. | Villarreal CF | 1 | 0 | 2025-09-16 20:00 | Tottenham Hotspur Stadium | England | -1 |
| E | Athletic Club | Arsenal F.C. | 0 | 2 | 2025-09-16 20:00 | San Mamés Stadium | Spain | 2 |
| f | PSV | Royale Union Saint-Gilloise | 1 | 3 | 2025-09-16 20:00 | Philips Stadion | Netherlands | 2 |
| G | Liverpool | Atletico de Madrid | 3 | 2 | 2025-09-17 20:00 | Anfield | England | -1 |
| H | FC Bayern Munich | Chelsea F.C. | 3 | 1 | 2025-09-17 20:00 | Allianz Arena | Germany | -2 |
| I | Paris Saint-Germain F.C. | Atalanta BC | 4 | 0 | 2025-09-17 20:00 | Parc des Princes | France | -4 |
| J | AFC Ajax | Inter Milan | 0 | 2 | 2025-09-17 20:00 | Johan Cruyff Arena | Netherlands | 2 |
| K | SK Slavia Prague | FK Bodø/Glimt | 2 | 2 | 2025-09-17 20:00 | Fortuna Arena | Czech Republic | 0 |
| L | Olympiacos F.C. | Pafos FC | 0 | 0 | 2025-09-17 20:00 | Karaiskakis Stadium | Greece | 0 |


## **Getting Information About DataFrames**

In [35]:
# see the shape as (rows, columns)
df_ucl.shape

(12, 9)

In [36]:
# print a concise summary: index, columns, non-null counts, dtypes and memory usage
df_ucl.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12 entries, A to L
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   local_team        12 non-null     object
 1   guest_team        12 non-null     object
 2   local_goals       12 non-null     int64 
 3   guest_goals       12 non-null     int64 
 4   date_time         12 non-null     object
 5   stadium           12 non-null     object
 6   country           12 non-null     object
 7   diff_local_goals  12 non-null     int64 
 8   diff_guest_goals  12 non-null     int64 
dtypes: int64(4), object(5)
memory usage: 1.2+ KB
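
`.info()` prints its report and returns `None`. The non-null counts are the quickest way to spot missing values: a count lower than the number of entries means that column has gaps. A sketch on a hypothetical DataFrame with one missing value, capturing the report through the `buf` parameter:

```python
import io
import pandas as pd

df = pd.DataFrame({"team": ["Real Madrid", "Juventus", None], "goals": [2, 4, 1]})

buffer = io.StringIO()
result = df.info(buf=buffer)   # writes the report into buffer instead of stdout
report = buffer.getvalue()

print(result)                  # None
print("3 entries" in report)   # True
print("2 non-null" in report)  # True: one value is missing in "team"
```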


In [37]:
# see data types
df_ucl.dtypes

local_team          object
guest_team          object
local_goals          int64
guest_goals          int64
date_time           object
stadium             object
country             object
diff_local_goals     int64
diff_guest_goals     int64
dtype: object
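
Note that `date_time` is stored as `object`, that is, as plain strings. Converting it with `pd.to_datetime()` gives it a datetime dtype, which enables date parts through the `.dt` accessor. A sketch on the two kick-off times used above:

```python
import pandas as pd

df = pd.DataFrame({"date_time": ["2025-09-16 20:00", "2025-09-17 20:00"]})

# Parse the strings into timestamps
df["date_time"] = pd.to_datetime(df["date_time"])
print(df["date_time"].dtype)  # a datetime64 dtype

# Date parts are now available through .dt
print(df["date_time"].dt.day.tolist())   # [16, 17]
print(df["date_time"].dt.hour.tolist())  # [20, 20]
```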

In [38]:
# see the column names
df_ucl.columns

Index(['local_team', 'guest_team', 'local_goals', 'guest_goals', 'date_time',
       'stadium', 'country', 'diff_local_goals', 'diff_guest_goals'],
      dtype='object')