<h1>DataFrames</h1>

<p>A DataFrame is the two-dimensional data structure in pandas. It stores data in labeled columns, each of which can hold a different type, and labels its rows with an index. This layout supports efficient selection, filtering, grouping, sorting, and joining of tables. A DataFrame resembles an Excel spreadsheet or a database table, with additional features that make it easier to work with data in Python.</p>

In [1]:
import pandas as pd
from numpy.random import randn
# randn is a NumPy function that draws random numbers from the standard normal distribution

In [2]:
df = pd.DataFrame(randn(5, 4), index=["A", "B", "C", "D", "E"], columns="W X Y Z".split())
# Using randn, I created a two-dimensional array of random numbers with 5 rows and 4 columns,
# and then specified the row labels (index) and the column labels (columns).
df

          W         X         Y         Z
A -1.236257  0.497522 -1.404977 -1.111220
B -0.185678  0.603814 -0.772985  0.872938
C  1.227227  0.403007  0.825448  0.732480
D  0.273832  0.212442  1.245242 -1.550229
E -0.673642  0.931925  1.202262  0.135763
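
<p>Because randn draws new values on every run, the numbers shown above will not repeat. The sketch below shows one way to build the same 5x4 layout reproducibly. The seed value 101 and the names rng and df_demo are illustrative assumptions, not part of the original notebook.</p>

```python
import numpy as np
import pandas as pd

# Assumption: the seed 101 is arbitrary; any fixed integer makes the draw repeatable.
rng = np.random.default_rng(101)

# standard_normal((5, 4)) plays the role of randn(5, 4): 5 rows, 4 columns.
df_demo = pd.DataFrame(
    rng.standard_normal((5, 4)),
    index=["A", "B", "C", "D", "E"],
    columns=["W", "X", "Y", "Z"],
)

print(df_demo.shape)          # (5, 4)
print(list(df_demo.columns))  # ['W', 'X', 'Y', 'Z']
```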


In [3]:
# To select elements from a DataFrame, I pass the column label,
# unlike with a Series, where I pass the index label.
# Selecting a single column returns a Series; a DataFrame is a collection of Series.

df['W']

A   -1.236257
B   -0.185678
C    1.227227
D    0.273832
E   -0.673642
Name: W, dtype: float64

In [4]:
# If I make the same query but put the column name in a list,
# even for just one column, I get back a DataFrame with a single column.

df[['W']]

          W
A -1.236257
B -0.185678
C  1.227227
D  0.273832
E -0.673642
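
<p>The difference between the two selections can be checked directly by inspecting the type and shape of each result. This is a minimal sketch on a small hypothetical frame named demo, with made-up values rather than the notebook's random data.</p>

```python
import pandas as pd

# Hypothetical frame; the values are illustrative only.
demo = pd.DataFrame(
    {"W": [1.0, 2.0, 3.0], "X": [4.0, 5.0, 6.0]},
    index=["A", "B", "C"],
)

single = demo["W"]    # one label -> Series
framed = demo[["W"]]  # list of labels -> DataFrame, even with one column

print(type(single).__name__, single.shape)  # Series (3,)
print(type(framed).__name__, framed.shape)  # DataFrame (3, 1)
```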


In [5]:
# To create a new column, I assign values to a column name that doesn't exist yet.
# In this case, I created a copy of column W.

df['New Column'] = df['W']
df

          W         X         Y         Z  New Column
A -1.236257  0.497522 -1.404977 -1.111220   -1.236257
B -0.185678  0.603814 -0.772985  0.872938   -0.185678
C  1.227227  0.403007  0.825448  0.732480    1.227227
D  0.273832  0.212442  1.245242 -1.550229    0.273832
E -0.673642  0.931925  1.202262  0.135763   -0.673642


In [6]:
#I can create new columns by performing arithmetic operations.

df['Another column'] = df['W'] + df['Y']
df

          W         X         Y         Z  New Column  Another column
A -1.236257  0.497522 -1.404977 -1.111220   -1.236257       -2.641234
B -0.185678  0.603814 -0.772985  0.872938   -0.185678       -0.958663
C  1.227227  0.403007  0.825448  0.732480    1.227227        2.052675
D  0.273832  0.212442  1.245242 -1.550229    0.273832        1.519075
E -0.673642  0.931925  1.202262  0.135763   -0.673642        0.528620
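
<p>Column arithmetic is applied element by element, row for row. The sketch below repeats the idea on a small hypothetical frame and also shows assign, a pandas method that returns a new frame instead of modifying the existing one. The names demo, W_plus_Y, and W_times_Y are illustrative assumptions.</p>

```python
import pandas as pd

# Hypothetical frame; the values are illustrative only.
demo = pd.DataFrame(
    {"W": [1.0, 2.0, 3.0], "Y": [10.0, 20.0, 30.0]},
    index=["A", "B", "C"],
)

# Element-wise sum, stored as a new column on the same frame.
demo["W_plus_Y"] = demo["W"] + demo["Y"]

# assign() returns a new frame and leaves `demo` untouched.
extended = demo.assign(W_times_Y=demo["W"] * demo["Y"])

print(demo["W_plus_Y"].tolist())       # [11.0, 22.0, 33.0]
print("W_times_Y" in demo.columns)     # False
print(extended["W_times_Y"].tolist())  # [10.0, 40.0, 90.0]
```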


In [7]:
# I can delete a column using the 'del' statement.
del df['Another column']
df

          W         X         Y         Z  New Column
A -1.236257  0.497522 -1.404977 -1.111220   -1.236257
B -0.185678  0.603814 -0.772985  0.872938   -0.185678
C  1.227227  0.403007  0.825448  0.732480    1.227227
D  0.273832  0.212442  1.245242 -1.550229    0.273832
E -0.673642  0.931925  1.202262  0.135763   -0.673642


In [8]:
# I can also delete a column using the 'drop' method.
# axis=1 refers to columns and axis=0 refers to rows.
df.drop('New Column', axis=1)

          W         X         Y         Z
A -1.236257  0.497522 -1.404977 -1.111220
B -0.185678  0.603814 -0.772985  0.872938
C  1.227227  0.403007  0.825448  0.732480
D  0.273832  0.212442  1.245242 -1.550229
E -0.673642  0.931925  1.202262  0.135763


In [9]:
# By default, 'drop' returns a modified copy of the DataFrame, which is what was displayed above,
# while the original remains intact.

df

          W         X         Y         Z  New Column
A -1.236257  0.497522 -1.404977 -1.111220   -1.236257
B -0.185678  0.603814 -0.772985  0.872938   -0.185678
C  1.227227  0.403007  0.825448  0.732480    1.227227
D  0.273832  0.212442  1.245242 -1.550229    0.273832
E -0.673642  0.931925  1.202262  0.135763   -0.673642


In [10]:
# I can use drop to save a modified copy in another variable, creating a new DataFrame.

df2 = df.drop('New Column', axis=1)
df2

          W         X         Y         Z
A -1.236257  0.497522 -1.404977 -1.111220
B -0.185678  0.603814 -0.772985  0.872938
C  1.227227  0.403007  0.825448  0.732480
D  0.273832  0.212442  1.245242 -1.550229
E -0.673642  0.931925  1.202262  0.135763


In [11]:
# To make the drop method modify the original DataFrame, I need to pass the inplace argument as True.
df.drop('New Column', axis=1, inplace=True)
df

          W         X         Y         Z
A -1.236257  0.497522 -1.404977 -1.111220
B -0.185678  0.603814 -0.772985  0.872938
C  1.227227  0.403007  0.825448  0.732480
D  0.273832  0.212442  1.245242 -1.550229
E -0.673642  0.931925  1.202262  0.135763
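
<p>The copy-by-default behaviour of drop can be verified on a small hypothetical frame. The sketch also uses the columns= and index= keywords, which pandas accepts as an alternative to passing a label together with axis. The names demo, trimmed, and without_a are illustrative assumptions.</p>

```python
import pandas as pd

# Hypothetical frame; the values are illustrative only.
demo = pd.DataFrame(
    {"W": [1, 2], "X": [3, 4], "extra": [5, 6]},
    index=["A", "B"],
)

# drop() returns a new frame by default; `demo` keeps all three columns.
trimmed = demo.drop(columns="extra")  # same effect as demo.drop("extra", axis=1)

# Dropping a row works the same way along axis 0.
without_a = demo.drop(index="A")      # same effect as demo.drop("A", axis=0)

print(list(demo.columns))     # ['W', 'X', 'extra']
print(list(trimmed.columns))  # ['W', 'X']
print(list(without_a.index))  # ['B']
```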


In [12]:
# I can use "loc" to select rows by index label.
# With a single label, it returns the row as a Series.

df.loc['A']

W   -1.236257
X    0.497522
Y   -1.404977
Z   -1.111220
Name: A, dtype: float64

In [13]:
# With the label inside a list, it returns the row as a DataFrame.

df.loc[['A']]

          W         X         Y        Z
A -1.236257  0.497522 -1.404977 -1.11122


In [14]:
df.loc[['A','B']]

          W         X         Y         Z
A -1.236257  0.497522 -1.404977 -1.111220
B -0.185678  0.603814 -0.772985  0.872938
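
<p>As with column selection, the shape of the argument passed to loc decides the type of the result. A minimal sketch on a hypothetical frame named demo, with made-up values:</p>

```python
import pandas as pd

# Hypothetical frame; the values are illustrative only.
demo = pd.DataFrame({"W": [1.0, 2.0], "X": [3.0, 4.0]}, index=["A", "B"])

row_series = demo.loc["A"]   # single label -> Series named after the row
row_frame = demo.loc[["A"]]  # list of labels -> DataFrame with one row

print(type(row_series).__name__, row_series.name)  # Series A
print(type(row_frame).__name__, row_frame.shape)   # DataFrame (1, 2)
```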


In [15]:
# With iloc I can select a specific element by passing the integer positions of its row and column.

df.iloc[0,2] # row A and column Y

-1.4049772197075003
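
<p>Positions in iloc start at zero, accept negative values that count from the end, and can be sliced like a Python list. The sketch below uses a hypothetical frame of small integers so the selected values are easy to follow.</p>

```python
import pandas as pd

# Hypothetical frame; the values are illustrative only.
demo = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["A", "B", "C"],
    columns=["W", "X", "Y"],
)

print(demo.iloc[0, 2])   # 3  (row A, column Y)
print(demo.iloc[-1, 0])  # 7  (last row, first column)

# Slices exclude the end position, as in ordinary Python slicing.
block = demo.iloc[0:2, 1:3]
print(block.values.tolist())  # [[2, 3], [5, 6]]
```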

In [16]:
# In loc, the second argument refers to the columns,
# so I can pass rows and columns just as in iloc;
# the difference is that iloc takes integer positions, while loc takes index and column labels.

df.loc[['A','B'],'X']

A    0.497522
B    0.603814
Name: X, dtype: float64

In [17]:
df.loc[['A','B'],['X']]

          X
A  0.497522
B  0.603814
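
<p>One difference between the two indexers is worth checking by hand: label slices in loc include both endpoints, while position slices in iloc exclude the end. A minimal sketch on a hypothetical frame, where both expressions select the same 2x2 block:</p>

```python
import pandas as pd

# Hypothetical frame; the values are illustrative only.
demo = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["A", "B", "C"],
    columns=["W", "X", "Y"],
)

by_label = demo.loc["A":"B", "W":"X"]  # label slices include both endpoints
by_position = demo.iloc[0:2, 0:2]      # position slices exclude the end

print(by_label.equals(by_position))  # True
print(demo.loc["B", "Y"])            # 6
```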


In [18]:
df

          W         X         Y         Z
A -1.236257  0.497522 -1.404977 -1.111220
B -0.185678  0.603814 -0.772985  0.872938
C  1.227227  0.403007  0.825448  0.732480
D  0.273832  0.212442  1.245242 -1.550229
E -0.673642  0.931925  1.202262  0.135763
